Data Visualization is Only Part of the Answer to Big Data

How can we now cope with a large amount of data and still do a thorough job of analysis so that we don’t miss the Nobel Prize?

— Bill Cleveland, Getting Past the Pie Chart, SEED Magazine, 2.18.2009

For the past year, I’ve been slowly drifting away from my statistical roots – more interested in design and aesthetics than in whether a particular graphic works or in the more numeric tools at my disposal. I’ve always had more fun experimenting with a bunch of different things than really knuckling down on a particular problem. This works for a lot of things – like online musings – but you miss a lot of the important technical points in the process, so I’ve been (slowly) working my way back to the analytical side of the river.

If you really want to learn about a large dataset, visualization is only part of the answer. It’s an exploratory process. You create a graph. You create a whole bunch of graphs. Notice anything interesting? Okay, let’s look over there. This process is called exploratory data analysis, a term coined by famed statistician John Tukey back in the 1970s. Too often we settle on a particular graphic because it looks pretty, or worse, because it helps prove our point. We get so blinded by outside motivations that we forget to listen to and look at what else the data have to say. On the flip side, we often like to visualize everything at once and leave it at that. This works to an extent, but we miss out on a lot of details.
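To make the "only part of the answer" point a little more concrete, here is a rough sketch of the kind of numeric check that could sit next to a graph during exploration: a five-number summary plus the usual 1.5 × IQR fences for flagging points worth a second look. The function names and the sample numbers are made up for illustration, and the quartile convention is an assumption (different tools compute quartiles slightly differently), so treat it as one possible approach rather than the way to do it.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    ordered = sorted(values)
    # Quartile conventions differ between tools; "inclusive" is one of the
    # two methods the standard library offers (an assumption of this sketch).
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1  # interquartile range
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    sample = [2, 4, 4, 5, 6, 7, 8, 9, 40]  # invented numbers, not real data
    print(five_number_summary(sample))  # (2, 4.0, 6.0, 8.0, 40)
    print(tukey_outliers(sample))       # [40]
```

A flagged value is not automatically an error. It is a prompt to go back to the graph and look over there.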

Basically, what I’m trying to say is that design can do wonders for visualization, yes, but so can analysis. Put the two together, and you’re going to gain a much better understanding of a dataset than if you were to have just one or the other. In my experience, designers are afraid of statistical methods and statisticians are oblivious to design. I say – put the two together. Learn both, and we’ll all be that much better at understanding the even bigger data to come.


  • While charts and other data visualizations are probably sometimes intentionally manipulated, I think a lot of the time the errors are simply mistakes or oversights. Like you said, the designers and the statisticians like their own jobs, but not each other’s.

    One site that I’ve found quite interesting has fun with the errors in informational graphics. Perhaps Flowing Data readers will find it interesting too.

    Check out Junk Charts.

  • simianmenace March 20, 2009 at 7:18 am

    Visualisations that work best for me are those where the presentation layer elucidates relationships within the data, easy on the eye yet still a lens. Large datasets are often samples of even more massive populations with sampling error still present.

  • himan powered March 20, 2009 at 3:32 pm

    Great intro to the subject. I would suspect the human factors field has started looking into this. Certainly, as we continue to need to make sense of huge data sets, there will be real research into which visualizations do the best job of increasing usability versus which are merely pretty design. I wish I had the time to do this myself.

  • I suggest you have a look at HCE – it is a nice tool that tries to fill this gap between statistics and visualization.

  • I have come to the exact same conclusion: use raw data visualization to define hypotheses before doing the statistical analysis, which leads to new visualizations again, and so on.

  • I couldn’t agree more. We (a design research group) recently formed a partnership with a department of statistics so that we can work on both analysis and visualization, integrating these domains as much as possible. I can say that the relationship works very well, and there are more common elements than one might expect.

  • @Paolo – that’s definitely something I’d like to hear more about as that relationship develops.

