From statistics to data science, and vice versa

Posted to Statistics  |  Tags:  |  Nathan Yau

Carnegie Mellon statistics professor Cosma Shalizi considers the differences and similarities between statistics and data science.

If people want to call those who do such jobs “data scientists” rather than “statisticians” because it sounds more dignified, or gets them more money, or makes them easier to hire, then more power to them. If they want to avoid the suggestion that you need a statistics degree to do this work, they have a point but it seems a clumsy way to make it. If, however, the name “statistician” is avoided because that connotes not a powerful discipline which transforms profound ideas about learning from experience into practical tools, but rather, a meaningless conglomeration of rituals better conducted with twenty-sided dice, then we as a profession have failed ourselves and, more importantly, the public, and the blame lies with us. Since what we have to offer is really quite wonderful, we should not let that happen.

Some time during the past couple of years, statistics became data science’s older, more boring sibling that always plays by the rules. There are a lot of statisticians who now call themselves data scientists. I still call myself a statistician.

But I think we’re getting closer to that part in the movie when the older, more stuffy character learns from the young whipper snapper that loosening up could be a good thing, and when the young one realizes that some elbow grease and tradition can go a long way.


How to Spot Visualization Lies

Many charts don’t tell the truth. This is a simple guide to spotting them.

The Best Data Visualization Projects of 2011

I almost didn’t make a best-of list this year, but as I clicked through the year’s post, it was hard …

Most popular porn searches, by state

We’ve seen that we can learn from what people search for, through the eyes of Google suggestions: state stereotypes, national …

Years You Have Left to Live, Probably

The individual data points of life are much less predictable than the average. Here’s a simulation that shows you how much time is left on the clock.