Last month, I had the pleasure of spending a week at the Census Bureau as a “visiting scholar.” They’re looking to boost their visualization efforts across all departments, and I put in my two cents on how to go about doing it. For a place with so much data, the visual side of things is, generally speaking, still in its early stages.
Throughout the meetings, the same themes kept coming up about what visualization is and what it's used for. Some people really got it, but others were new to the subject, and we ran into a few misconceptions that I think are worth sharing.
Here we go, in no particular order.
Visualization is for making data flashy
This is probably the most common one. It’s easy to look at a lot of the best visualization projects and want your data to look and feel the same way. So people ask, “I have such and such data. Is there a visualization technique that I can use to make it look cooler?”
Well, maybe, but not if you only have five data points. You can spend a lot of time on icons or fancy type, but in the end, a graphic is interesting because the data it represents is interesting.
For example, I mapped the growth of Walmart a while back (it’s amazing how much mileage I get out of this graphic), and people seem to like it because of the organic growth pattern. It starts in one area and spreads outward like a virus.
Okay, compared to Toby Segaran’s original, I did add some interactive flourishes, but even without them, the growth pattern is what makes the animation interesting.
By comparison, here’s a map in the same style as my Walmart one, but it shows the spread of Target. It’s not nearly as fun to watch, because Target took a more opportunistic approach to expansion. Locations pop up somewhat randomly at times. It’s mostly interesting as a contrast to the Walmart map.
It should always be data first. The graphics that get eyeballs do so because they show something you wouldn’t see in a table.
Software does everything
There are a lot of options for visualization, and the “best” one will change depending on who you ask.
My main point is that there is no one piece of software that will do everything for you.
Some software is good for analysis, some is good for specific types of analysis, and some is good for storytelling.
The more information in a single graphic, the better
A misstep a lot of people take when they’re trying to advance “beyond Excel” is to layer too much information on top of their basic graphic. I’m all for providing context and highlighting interesting spots in your data, but at some point it’s better to split your one chart into two or three charts.
Some people try to be clever by using multiple axes on a single plot or multiple visual cues in a single chart to save space. Again, this works sometimes, but a lot of the time it doesn’t. Simple and clear is usually better than clever and compact.
My favorite test is to show a graphic to someone who doesn’t know the data and isn’t a visualization expert and see what they take away from the visual.
Visualization is too biased to be useful
There’s a certain amount of subjectivity that goes into any visualization as you choose what data to show and how to show it. By focusing on one part of the data, you might inadvertently obscure another. However, if you’re careful, get to know the data that you’re dealing with, and stay true to what’s there, then it should be easier to overcome bias.
After all, statistics is somewhat subjective, too. You choose what to analyze, which methods to use, and what to point out in reports.
News organizations, for example, have to do this all the time. They get a dataset and decide what story they want to tell (or find the story the data has to tell). Browse through graphics by The New York Times, and you’ll see how they add a layer of information that objectively describes what the data is about.
It has to be exact
If you’re using visualization to show the exact value of every single data point, along with every standard error, you’re probably using it wrong. Accuracy is important, yes. But visualization is less about the individual values and more about their distribution over time and space. You’re looking for (or showing) patterns. You’re comparing and contrasting.
If all you care about are individual data points, you might as well put them in a table.
Are there other common misconceptions that you can think of?