Now that we’ve covered the seven basic rules of graph design, it’s time to go deeper, starting with the first: check the data.
I have to admit, data checking is definitely my least favorite part of graph-making. I mean, when a person, a group, or a service provides you with a bunch of data, it should be up to them to make sure all of their data is legit, goshdarnit. But checking anyway is what good graph-makers do. After all, reliable builders don’t use shoddy cement for a house’s foundation, and you shouldn’t use shoddy data to build your data graphic.
Data checking and verification is one of the most important parts of graph design, if not the most important.
What to look for
Basically, what you’re looking for is stuff that makes no sense. Maybe there was an error at data entry and someone added an extra zero (or missed one). Maybe there were connectivity issues during a data scrape, and some bits got mucked up in random spots. Whatever it is, you’ll want to verify with the source if anything looks funky.
The person who supplied the data usually has a sense of what to expect. If you were the one who collected the data, then just ask yourself if it makes sense. Say one state is at 90% of whatever and all the other states are only in the 10% to 20% range. What’s going on there?
Oftentimes, an anomaly is simply a typo. Other times it’s actually an interesting point in your dataset, one that drives your whole story. Just make sure you know which one it is.
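If you'd rather not trust your eyes alone, here's a minimal sketch in Python of one way to flag values that sit far from the rest. It uses Tukey's fences (1.5 times the interquartile range beyond the quartiles), which is a common rule of thumb and not the only option. The function name flag_suspects and the state labels and rates are made up for illustration.

```python
import statistics

def flag_suspects(values, k=1.5):
    """Return the entries that fall outside Tukey's fences.

    values: dict mapping a label (e.g. a state) to a number.
    k: how many IQRs beyond the quartiles counts as "far out" (1.5 is conventional).
    """
    nums = list(values.values())
    # Quartiles of the data; "inclusive" treats the data as the whole population.
    q1, _, q3 = statistics.quantiles(nums, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {label: v for label, v in values.items() if v < low or v > high}

# Hypothetical percentages: one state at 90%, the rest between 10% and 20%.
rates = {"A": 12.0, "B": 15.0, "C": 18.0, "D": 90.0, "E": 11.0, "F": 17.0}
print(flag_suspects(rates))
```

Anything this returns is only a candidate for a closer look. Whether it's a typo or the interesting part of your story is still a question for you and the source.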
When you only have a few data points, you can probably just eyeball it. Otherwise, simple graphs in something like Excel or R will do the trick. Usually it’ll be best to make a bare-bones version of what you want your final result to be. It doesn’t matter if the graphs don’t look sexy. They’re just for fact-checking.
If you’re using R or some other stat software, you can look at the summary numbers (like mean, median, and your quartiles) pretty easily. In R, you use the summary() command. Imagine that.
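If you don't have R handy, the same kind of summary numbers are easy to get elsewhere. Below is a rough Python sketch using only the standard library. The function name summary_numbers and the sample values are hypothetical, and the output is a plain dict, not a copy of R's formatting.

```python
import statistics

def summary_numbers(nums):
    """Min, quartiles, median, mean, and max for a list of numbers."""
    data = sorted(nums)
    # "inclusive" interpolates between the observed values in the data.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(data),
        "q3": q3,
        "max": data[-1],
    }

# Hypothetical readings where one entry picked up an extra digit (1310 vs. ~130).
readings = [120, 135, 128, 1310, 122]
for name, value in summary_numbers(readings).items():
    print(f"{name:>6}: {value}")
```

A mean that sits far from the median, or a max that dwarfs the third quartile, is the kind of thing worth taking back to the source.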
Everything check out? Great, your base is strong, and you can be confident in your final result.
Stay tuned for rule #2: explain your encodings.
Also, remember not to stop checking the data as you go through the rest of the graph design process. At the very least, after you are finished, compare the results you show on your graph with the original data set. There are plenty of opportunities to screw up the data during manipulation and transfer and you can’t always blame the source.
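One way to do that last comparison, sketched in Python under the assumption that both the original data and the values behind the finished graph can be loaded as label-to-number mappings. The name compare_to_source and the sample numbers are made up for illustration.

```python
import math

def compare_to_source(source, plotted, rel_tol=1e-9):
    """List the differences between the source data and the plotted values."""
    problems = []
    for key in sorted(source.keys() | plotted.keys()):
        if key not in plotted:
            problems.append(f"{key}: missing from plotted data")
        elif key not in source:
            problems.append(f"{key}: not in source data")
        elif not math.isclose(source[key], plotted[key], rel_tol=rel_tol):
            problems.append(f"{key}: source={source[key]} plotted={plotted[key]}")
    return problems

# Hypothetical example: a decimal point slipped and a row got dropped along the way.
source = {"2006": 4.6, "2007": 4.6, "2008": 5.8}
plotted = {"2006": 4.6, "2007": 46.0}
for problem in compare_to_source(source, plotted):
    print(problem)
```

An empty list means the graph still matches what you started with. It says nothing about whether the source was right in the first place.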
Oh the fun that can be found in the outliers. As a marketing guy and data geek I have found many interesting stories in the customers and prospects that fall in the outlier category. Some are indeed typos, but others are legit and have provided some fascinating ideas for new products and marketing strategy.