The important parts of data analysis

Posted to Statistics  |  Nathan Yau

There’s plenty of software to muck around with data, but gaining the skills to really get something out of it takes time and experience. Mikio Braun, a postdoc in machine learning, explains.

For a number of reasons, I don’t think that you can “toolify” data analysis that easily. I wished it would be, but from my hard-won experience with my own work and teaching people this stuff, I’d say it takes a lot of experience to be done properly and you need to know what you’re doing. Otherwise you will do stuff which breaks horribly once put into action on real data.

And I don’t write this because I don’t like the projects which exist, but because I think it is important to understand that you can’t just give a few coders new tools and they will produce something which works. And depending on how you want to use data analysis in your company, this might break or make your company.

Braun breaks it down into four bullet points worth a read, but the tl;dr version is that analysis isn’t simple, and no tool is going to do everything for you. It’s simple with simple data, but you can almost always go deeper with more data, and it takes experience to ask the right questions. So try not to be too content with that software output.

