From statistics to data science, and vice versa

July 26, 2012

Topic

Statistics / data science

Carnegie Mellon statistics professor Cosma Shalizi considers the differences and similarities between statistics and data science.

If people want to call those who do such jobs “data scientists” rather than “statisticians” because it sounds more dignified, or gets them more money, or makes them easier to hire, then more power to them. If they want to avoid the suggestion that you need a statistics degree to do this work, they have a point but it seems a clumsy way to make it. If, however, the name “statistician” is avoided because that connotes not a powerful discipline which transforms profound ideas about learning from experience into practical tools, but rather, a meaningless conglomeration of rituals better conducted with twenty-sided dice, then we as a profession have failed ourselves and, more importantly, the public, and the blame lies with us. Since what we have to offer is really quite wonderful, we should not let that happen.

Sometime during the past couple of years, statistics became data science’s older, more boring sibling that always plays by the rules. There are a lot of statisticians who now call themselves data scientists. I still call myself a statistician.

But I think we’re getting closer to that part in the movie when the older, stuffier character learns from the young whippersnapper that loosening up could be a good thing, and when the young one realizes that some elbow grease and tradition can go a long way.

Tracking the spread of AIDS

July 26, 2012

Topic

Statistical Visualization / health, interactive, NPR

Adam Cole and Nelson Hsu for NPR plotted the percentage of people, ages 15 to 49, living with HIV from 1990 to 2009.

By 1990, the world had a pandemic on its hands. In 1997, the peak of the epidemic, more than 3 million people became newly infected with HIV.

Then science struck back. Drugs approved for HIV treatment in the mid-1990s proved profoundly effective, transforming AIDS from a death sentence to a chronic illness. Those treatments, combined with an international commitment to manage the disease by providing access to free drug therapy, led to a steep drop in new HIV infections.

The countries in middle, eastern, and southern Africa stand out in the chart, like Swaziland with a whopping 25.9%, but most areas cluster well below five percent. Although the drop-down filters help some with country selection, the data probably would’ve benefitted from a chart that had a self-updating vertical axis.
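As a rough sketch of what a self-updating vertical axis could do, the snippet below recomputes the axis limit from whichever countries are currently selected. The function name `padded_axis_max`, the padding factor, and every value except the 25.9% quoted above are hypothetical and only for illustration; this is not how the NPR chart works.

```python
def padded_axis_max(values, padding=0.1):
    """Return an upper y-axis limit slightly above the largest selected value."""
    if not values:
        raise ValueError("need at least one value to scale the axis")
    return max(values) * (1 + padding)


# Hypothetical prevalence percentages. Only 25.9 comes from the post;
# the remaining numbers are made up for illustration.
all_countries = [25.9, 4.8, 1.2, 0.6, 0.3]
low_prevalence_selection = [1.2, 0.6, 0.3]

# With a fixed axis, the low-prevalence lines are squeezed near zero.
fixed_limit = padded_axis_max(all_countries)

# With a self-updating axis, the limit shrinks to fit the selection.
rescaled_limit = padded_axis_max(low_prevalence_selection)
```

With the axis rescaled to the selection, differences among countries below five percent would fill the plot instead of sitting flat along the bottom.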

Great Lakes currents map

July 25, 2012

Topic

Maps / animation, water

Using the same tech Martin Wattenberg and Fernanda Viegas created to show wind flow, the NOAA Great Lakes Environmental Research Laboratory mapped water flow in the Great Lakes, based on forecasting simulations.

The “Latest” and “3hrs Previous” visualizations depict water motion corresponding to a snapshot of lake currents at the present time and three hours previous to the present time. Lake currents can change rapidly with changing wind conditions.

Surface currents tend to follow the wind direction more closely than currents at depth. Depth-averaged currents represent the average water motion from surface to bottom and tend to follow shoreline and bottom contours.
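To make the idea of a depth-averaged current concrete, here is a minimal sketch that averages velocity components over depth layers. It assumes equally thick layers and uses a made-up velocity profile; the function name `depth_averaged_current` is hypothetical, and this is not the laboratory's forecasting code.

```python
def depth_averaged_current(layers):
    """Average (u, v) velocity components over equally thick depth layers.

    `layers` is a list of (eastward, northward) velocities in m/s,
    ordered from surface to bottom. Equal layer thickness is assumed.
    """
    if not layers:
        raise ValueError("need at least one depth layer")
    count = len(layers)
    mean_u = sum(u for u, _ in layers) / count
    mean_v = sum(v for _, v in layers) / count
    return (mean_u, mean_v)


# Hypothetical profile: a wind-driven surface layer with a weaker,
# partly reversed flow near the bottom. Values are illustrative only.
profile = [(0.30, 0.10), (0.10, 0.05), (-0.10, 0.00)]

surface_current = profile[0]
averaged_current = depth_averaged_current(profile)
```

In this made-up profile the averaged flow is much weaker than the surface flow, which is one way the two views of the same lake can look different.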

The default map is semi-live, but you can also see flows for previous months. For example, the patterns during February 2011 are kinda cool, with a lot of swirling and well-defined currents.

Computing for data analysis

July 24, 2012

Topic

Statistics / courses, Johns Hopkins

If you want to learn visualization, you should learn data. To learn data, you should learn statistics. Where to begin? The free analysis courses offered on Coursera by Johns Hopkins professors are probably a good place to start. Currently available: Computing for Data Analysis with biostatistics professor Roger D. Peng and Data Analysis with Jeff Leek, also a biostatistics professor.

There’s also a handful of data-related courses from other university professors that might be worth a look.

Typographic superheroes

July 23, 2012

Topic

Miscellaneous / heros, typography

Designer Matthew Olin unmasked the characters behind the typeface characters for his MFA thesis. Others include sans serif as Batman, slab serif as the Hulk, and handwriting as the Flash.

Life expectancy and child poverty as a tube map

July 23, 2012

Topic

Maps / health, poverty, tube

Geographers James Cheshire and Oliver O’Brien visualized life expectancy in London as a tube map.

Whilst the average life expectancy predictions show that today’s children are expected to live longer, the range is startling. For the stations mapped, it is over 20 years with those around Star Lane (on the DLR) predicted to live, on average, for 75.3 years in contrast to 96.38 years for those around Oxford Circus. The smaller disparities are no less striking. For example, between Lancaster Gate and Mile End (20 minutes on the Central line) life expectancy decreases by 12 years and crossing the Thames between Pimlico and Vauxhall sees life expectancy drop by 6 years. The stations serving the Olympic Park fare badly and contrast with the Olympic volleyball venue at Earl’s Court whose spectators will be passing through areas with far higher life expectancies and lower child poverty.

The tube map metaphor is typically a stretch, in line with the periodic table of whatever, but this actually works. [via Guardian]
