Think Like a Statistician – Without the Math

I call myself a statistician, because, well, I’m a statistics graduate student. However, ask me specific questions about hypothesis tests or required sample size, and my answer probably won’t be very good.

The other day I was trying to think of the last time I did an actual hypothesis test or formal analysis. I couldn’t remember. I actually had to dig up old course listings to figure out when it was. It was four years ago during my first year of graduate school. I did well in those courses, and I’m confident I could do that stuff with a quick refresher, but it’s a no go off the cuff. It’s just not something I do regularly.

Instead, the most important things I’ve learned are less formal, but have proven extremely useful when working/playing with data. Here they are in no particular order.

Attention to Detail

Oftentimes it’s the little things that end up being the most important. There was this one time in class when my professor put up a graph on the projector. It was a bunch of data points with a smooth fitted line. He asked what we saw. Well, there was an increase in the beginning, a leveling off in the middle, and then another increase. However, what I missed was the little blip in the curve in the first increase. That was what we were after.

The point is that trends and patterns are important, but so are outliers, missing data points, and inconsistencies.
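For anyone who wants to try this on their own numbers, here is a minimal sketch of one way to surface the easy-to-miss stuff before you start looking at trends. Everything in it is an assumption for illustration: the `summarize_oddities` name, the made-up `readings` list, and the choice of a median-based rule (the modified z-score with the commonly used 3.5 cutoff) to flag unusual points. It is not the method from that class, just one reasonable starting point.

```python
from statistics import median


def summarize_oddities(values, threshold=3.5):
    """Return (indices of missing entries, indices of unusually distant points).

    Missing entries are represented as None. A point is flagged when its
    modified z-score, based on the median absolute deviation (MAD),
    exceeds the threshold.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    present = [(i, v) for i, v in enumerate(values) if v is not None]

    center = median(v for _, v in present)
    mad = median(abs(v - center) for _, v in present)

    outliers = []
    if mad > 0:  # if MAD is zero, this rule cannot rank the points
        outliers = [
            i for i, v in present
            if 0.6745 * abs(v - center) / mad > threshold
        ]
    return missing, outliers


# Hypothetical data: one gap and one value that sits far from the rest.
readings = [10.1, 10.4, None, 10.2, 25.0, 10.3, 10.0]
missing, outliers = summarize_oddities(readings)
print("missing at:", missing)
print("possible outliers at:", outliers)
```

A flagged point is not necessarily wrong. It is just a spot worth a second look, which is the whole idea.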

See the Big Picture

With that said, it’s important not to get too caught up in individual data points or a tiny section of a really big dataset. We saw this in the recent recovery graph. As some pointed out, if we take a step back and look at a larger time frame, the Obama/Bush contrast doesn’t look so shocking.

No Agendas

This should go without saying, but approach data as objectively as possible. I’m not saying you shouldn’t have a hunch about what you’re looking for, but don’t let your preconceived ideas influence the results. If you go to great lengths looking for some specific pattern, you’re probably going to find it. It’ll just come at the expense of accurate results.

Look Outside the Data

Context, context, context. Sometimes this will come in the form of metadata. Other times it’ll come from more data.

The more you know about how the data was collected, where it came from, when it happened, and what was going on at the time, the more informative your results and the more confident you can be about your findings.

Ask Why

Finally, and this is the most important thing I’ve learned, always ask why. When you see a blip in a graph, you should wonder why it’s there. If you find some correlation, you should think about whether or not it makes any sense. If it does make sense, then cool, but if not, dig deeper. Numbers are great, but you have to remember that when humans are involved, errors are always a possibility.

*Photo by misterbisson
