Data Science is catching on

Posted to Statistics  |  Nathan Yau

Maybe there's something to this whole data science thing after all. Mike Loukides describes data science and where it's headed on O'Reilly Radar. It's a good read, but statisticians get clumped into suits crunching numbers like actuarial drones:

Using data effectively requires something different from traditional statistics, where actuaries in business suits perform arcane but fairly well-defined kinds of analysis. What differentiates data science from statistics is that data science is a holistic approach. We're increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.

What is data science? It's what well-rounded statisticians do.


  • hey careful! some of us actuarial drones read this blog too you know :)

  • The quote from Mike Driscoll (@dataspora) and the point that modelling is key and “[d]ata science isn’t just about the existence of data, or making guesses about what that data might mean; it’s about testing hypotheses and making sure that the conclusions you’re drawing from the data are valid” make up for the jab at “plain” statisticians.

  • ‘I love it when a plan comes together’. I like this part: ;-)

    “Where do you find the people this versatile? According to DJ Patil, chief scientist at LinkedIn (@dpatil), the best data scientists tend to be “hard scientists,” particularly physicists, rather than computer science majors. Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data. They have to think about the big picture, the big problem. When you’ve just spent a lot of grant money generating data, you can’t just throw the data out if it isn’t as clean as you’d like. You have to make it tell its story. You need some creativity for when the story the data is telling isn’t what you think it’s telling.”


The Best Data Visualization Projects of 2011

I almost didn’t make a best-of list this year, but as I clicked through the year’s post, it was hard …

Watching the growth of Walmart – now with 100% more Sam’s Club

The ever so popular Walmart growth map gets an update, and yes, it still looks like a wildfire. Sam’s Club follows soon after, although not nearly as vigorously.

Real Chart Rules to Follow

There are rules—usually for specific chart types meant to be read in a specific way—that you shouldn’t break. When they are, everyone loses. This is that small handful.

Reviving the Statistical Atlas of the United States with New Data

Due to budget cuts, there is no plan for an updated atlas. So I recreated the original 1870 Atlas using today’s publicly available data.