• Carnegie Mellon statistics professor Cosma Shalizi considers the differences and similarities between statistics and data science.

    If people want to call those who do such jobs “data scientists” rather than “statisticians” because it sounds more dignified, or gets them more money, or makes them easier to hire, then more power to them. If they want to avoid the suggestion that you need a statistics degree to do this work, they have a point but it seems a clumsy way to make it. If, however, the name “statistician” is avoided because that connotes not a powerful discipline which transforms profound ideas about learning from experience into practical tools, but rather, a meaningless conglomeration of rituals better conducted with twenty-sided dice, then we as a profession have failed ourselves and, more importantly, the public, and the blame lies with us. Since what we have to offer is really quite wonderful, we should not let that happen.

    Sometime during the past couple of years, statistics became data science’s older, more boring sibling who always plays by the rules. There are a lot of statisticians who now call themselves data scientists. I still call myself a statistician.

    But I think we’re getting closer to that part in the movie when the older, more stuffy character learns from the young whippersnapper that loosening up could be a good thing, and when the young one realizes that some elbow grease and tradition can go a long way.

  • Adam Cole and Nelson Hsu for NPR plotted the percentage of people, ages 15 to 49, living with HIV from 1990 to 2009.

    By 1990, the world had a pandemic on its hands. In 1997, the peak of the epidemic, more than 3 million people became newly infected with HIV.

    Then science struck back. Drugs approved for HIV treatment in the mid-1990s proved profoundly effective, transforming AIDS from a death sentence to a chronic illness. Those treatments, combined with an international commitment to manage the disease by providing access to free drug therapy, led to a steep drop in new HIV infections.

    Countries in middle, eastern, and southern Africa stand out in the chart, such as Swaziland at a whopping 25.9%, while most areas cluster well below five percent. Although the drop-down filters help somewhat with country selection, the data probably would’ve benefited from a chart with a self-updating vertical axis.

  • Using the same tech Martin Wattenberg and Fernanda Viegas created to show wind flow, the NOAA Great Lakes Environmental Research Laboratory mapped water flow in the Great Lakes, based on forecasting simulations.

    The “Latest” and “3hrs Previous” visualizations depict water motion corresponding to a snapshot of lake currents at the present time and three hours previous to the present time. Lake currents can change rapidly with changing wind conditions.

    Surface currents tend to follow the wind direction more closely than currents at depth. Depth-averaged currents represent the average water motion from surface to bottom and tend to follow shoreline and bottom contours.

    The default map is semi-live, but you can also see flows for previous months. For example, the patterns during February 2011 are kinda cool, with a lot of swirling and well-defined currents.

  • If you want to learn visualization, you should learn data. To learn data, you should learn statistics. Where to begin? The free analysis courses offered on Coursera by Johns Hopkins professors are probably a good place to start. Currently available: Computing for Data Analysis with biostatistics professor Roger D. Peng and Data Analysis with Jeff Leek, also a biostatistics professor.

    There’s also a handful of data-related courses from other university professors that might be worth a look.

  • Designer Matthew Olin unmasked the characters behind the typeface characters for his MFA thesis. Others include sans serif as Batman, slab serif as the Hulk, and handwriting as the Flash.

  • Geographers James Cheshire and Oliver O’Brien visualized life expectancy in London as a tube map.

    Whilst the average life expectancy predictions show that today’s children are expected to live longer, the range is startling. For the stations mapped, it is over 20 years with those around Star Lane (on the DLR) predicted to live, on average, for 75.3 years in contrast to 96.38 years for those around Oxford Circus. The smaller disparities are no less striking. For example, between Lancaster Gate and Mile End (20 minutes on the Central line) life expectancy decreases by 12 years and crossing the Thames between Pimlico and Vauxhall sees life expectancy drop by 6 years. The stations serving the Olympic Park fare badly and contrast with the Olympic volleyball venue at Earl’s Court whose spectators will be passing through areas with far higher life expectancies and lower child poverty.

    The tube map metaphor is typically a stretch, in line with the periodic table of whatever, but this actually works. [via Guardian]