I call myself a statistician, because, well, I’m a statistics graduate student. However, ask me specific questions about hypothesis tests or required sample size, and my answer probably won’t be very good.
The other day I was trying to think of the last time I did an actual hypothesis test or formal analysis. I couldn’t remember. I actually had to dig up old course listings to figure out when it was. It was four years ago during my first year of graduate school. I did well in those courses, and I’m confident I could do that stuff with a quick refresher, but it’s a no go off the cuff. It’s just not something I do regularly.
Instead, the most important things I’ve learned are less formal, but have proven extremely useful when working/playing with data. Here they are in no particular order.
Attention to Detail
Oftentimes it’s the little things that end up being the most important. There was this one time in class when my professor put up a graph on the projector. It was a bunch of data points with a smooth fitted line. He asked what we saw. Well, there was an increase in the beginning, a leveling off in the middle, and then another increase. However, what I missed was the little blip in the curve in the first increase. That was what we were after.
The point is that trends and patterns are important, but so are outliers, missing data points, and inconsistencies.
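As a rough illustration of checking for the small stuff, here is a minimal sketch that flags missing values and outliers in a list of numbers. The function name `flag_oddities`, the readings, and the 3.5 cutoff are all assumptions made up for this example. The outlier check uses the modified z-score based on the median absolute deviation, which is one common rule of thumb and not the only way to do it.

```python
from statistics import median

def flag_oddities(values, threshold=3.5):
    """Return (missing, outliers): index lists for None entries and for
    values whose modified z-score (MAD-based) exceeds the threshold."""
    missing = [i for i, v in enumerate(values) if v is None]
    present = [v for v in values if v is not None]
    med = median(present)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(v - med) for v in present)
    outliers = []
    if mad > 0:  # with zero spread the score is undefined, so flag nothing
        outliers = [
            i for i, v in enumerate(values)
            if v is not None and 0.6745 * abs(v - med) / mad > threshold
        ]
    return missing, outliers

# Made-up readings: one gap and one value far from the rest.
readings = [10.1, 9.8, 10.3, None, 10.0, 25.0, 9.9, 10.2]
missing, outliers = flag_oddities(readings)
print("missing at:", missing)
print("outliers at:", outliers)
```

A flag like this only tells you where to look. Whether the odd point is an error or the most interesting thing in the dataset is still a judgment call.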
See the Big Picture
With that said, it’s important not to get too caught up in individual data points or a tiny section of a really big dataset. We saw this in the recent recovery graph. As some pointed out, if we take a step back and look at a larger time frame, the Obama/Bush contrast doesn’t look so shocking.
This should go without saying, but approach data as objectively as possible. I’m not saying you shouldn’t have a hunch about what you’re looking for, but don’t let your preconceived ideas influence the results. If you go to great lengths looking for some specific pattern, you’re probably going to find it, but it’ll come at the expense of accurate results.
Look Outside the Data
Context, context, context. Sometimes this will come in the form of metadata. Other times it’ll come from more data.
The more you know about how the data was collected, where it came from, when it happened, and what was going on at the time, the more informative your results will be and the more confident you can be about your findings.
Finally, and this is the most important thing I’ve learned, always ask why. When you see a blip in a graph, you should wonder why it’s there. If you find some correlation, you should think about whether it makes any sense. If it does, then cool, but if not, dig deeper. Numbers are great, but you have to remember that when humans are involved, errors are always a possibility.
*Photo by misterbisson