Graph Design Rule #1: Check the data

Posted to Guides  |  Nathan Yau

Now that we’ve covered the 7 basic rules of graph design, it’s time to go deeper, starting with the first: check the data.

I have to admit, data checking is definitely my least favorite part of graph-making. I mean, when a person, a group, or a service provides you with a bunch of data, it should be up to them to make sure all of their data is legit, goshdarnit. But checking anyway is what good graph-makers do. After all, reliable builders don’t use shoddy cement for a house’s foundation, and you shouldn’t use shoddy data to build your data graphic.

Data checking and verification is one of the most important parts of graph design, if not the most important.

What to look for

Basically, what you’re looking for is stuff that makes no sense. Maybe there was an error at data entry and someone added an extra zero (or missed one). Maybe there were connectivity issues during a data scrape, and some bits got mucked up in random spots. Whatever it is, you’ll want to verify with the source if anything looks funky.

The person who supplied the data usually has a sense of what to expect. If you were the one who collected the data, then just ask yourself if it makes sense. Say one state is at 90% of whatever and all the other states are only in the 10% to 20% range. What’s going on there?

Oftentimes, an anomaly is simply a typo. Other times it’s actually an interesting point in your dataset, one that becomes the whole drive for your story. Just make sure you know which one it is.
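If eyeballing isn’t enough, here is one rough way to let the computer point at the funky values. This is only a sketch in Python, not a method prescribed here: it uses the modified z-score (built on the median and the median absolute deviation) with the commonly cited 3.5 cutoff, and the state labels and percentages are made up for illustration. It flags candidates to look into. It can’t tell you whether a value is a typo or your story.

```python
from statistics import median

def flag_suspects(values, threshold=3.5):
    """Return (label, value) pairs whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so one wild value does not hide itself by
    dragging the average toward it.
    """
    nums = list(values.values())
    med = median(nums)
    mad = median(abs(v - med) for v in nums)
    if mad == 0:
        # More than half the values are identical; this score is undefined.
        return []
    return [
        (label, v)
        for label, v in values.items()
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical data: percent of "whatever" by state.
shares = {"A": 12.0, "B": 15.0, "C": 90.0, "D": 18.0, "E": 11.0, "F": 14.0}

suspects = flag_suspects(shares)
print(suspects)  # [('C', 90.0)]
```

Anything it flags is a question to take back to the source, not an answer.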

Useful tools

When you only have a few data points, you can probably just eyeball it. Otherwise, simple graphs in something like Excel or R will do the trick. Usually it’ll be best to make the bare minimum of what you want your final result to be. It doesn’t matter if they don’t look sexy. They’re just for fact-checking.

If you’re using R or some other stat software, you can look at the summary numbers (like mean, median, and your quartiles) pretty easily. In R, you use the summary() command. Imagine that.
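If you’re not in R, the same numbers are easy to get elsewhere. Below is a minimal Python sketch that produces the six figures R’s summary() prints for a numeric vector. The function name and the sample readings are made up for illustration, and the "inclusive" quartile method is my choice for matching R’s default, so treat small differences in the quartiles as possible.

```python
from statistics import mean, quantiles

def summary(values):
    """Return min, quartiles, mean, and max for a list of numbers."""
    data = sorted(values)
    # method="inclusive" treats the data as the whole population and
    # interpolates between the observed minimum and maximum.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": med,
        "mean": mean(data),
        "q3": q3,
        "max": data[-1],
    }

# Hypothetical readings; 230 looks like a 23 with an extra zero.
readings = [21, 18, 25, 230, 19, 22, 20]

stats = summary(readings)
for name, value in stats.items():
    print(f"{name:>6}: {value:.1f}")
```

A mean that sits far above the median, or a max that is ten times the third quartile, is exactly the kind of thing that makes no sense until you ask about it.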

Everything check out? Great, your base is strong, and you can be confident in your final result.

Stay tuned for rule #2: explain your encodings.


  • Also, remember not to stop checking the data as you go through the rest of the graph design process. At the very least, after you are finished, compare the results you show on your graph with the original data set. There are plenty of opportunities to screw up the data during manipulation and transfer and you can’t always blame the source.

  • Greg Timpany August 20, 2010 at 8:00 am

    Oh the fun that can be found in the outliers. As a marketing guy and data geek I have found many interesting stories in the customers and prospects that fall in the outlier category. Some are indeed typos, but others are legit and have provided some fascinating ideas for new products and marketing strategy.


