How can we now cope with a large amount of data and still do a thorough job of analysis so that we don’t miss the Nobel Prize?— Bill Cleveland, Getting Past the Pie Chart, SEED Magazine, 2.18.2009
For the past year, I’ve been slowly drifting away from my statistical roots – more interested in design and aesthetics than in whether a particular graphic works or in the more numeric tools at my disposal. I’ve always had more fun experimenting on a bunch of different things than really knuckling down on a particular problem. This works for a lot of things – like online musings – but you miss a lot of the important technical points in the process, so I’ve been (slowly) working my way back to the analytical side of the river.
If you really want to learn about a large dataset, visualization is only part of the answer. Learning from data is an exploratory process. You create a graph. You create a whole bunch of graphs. Notice anything interesting? Okay, let’s look over there. This process is called exploratory data analysis, a term coined by famed statistician John Tukey back in the 1970s. Too often we settle on a particular graphic because it looks pretty, or worse, because it helps prove our point. We get so blinded by outside motivations that we forget to listen and look at what else the data have to say. On the flip side, we often like to visualize everything at once and leave it at that. This works to an extent, but we miss out on a lot of details.
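To make that loop a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the toy `records` data is invented, and `summarize` and `bin_counts` are hypothetical helper names, not part of any particular tool. It looks at everything at once first, then splits by group and looks again, which is roughly the "okay, let's look over there" step.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical toy data: (group, value) pairs, invented for illustration only.
records = [
    ("a", 2.0), ("a", 3.0), ("a", 2.5), ("a", 3.5),
    ("b", 9.0), ("b", 10.0), ("b", 9.5), ("b", 40.0),
]


def summarize(values):
    """Return a few quick summary numbers for a list of values."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def bin_counts(values, bins=4):
    """Count values falling into equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid dividing by zero when all values match
    counts = [0] * bins
    for v in values:
        # The maximum value lands exactly on the upper edge, so clamp it
        # into the last bin.
        idx = min(int((v - lo) / step), bins - 1)
        counts[idx] += 1
    return counts


# First look: everything at once.
all_values = [value for _, value in records]
overall = summarize(all_values)

# Second look: the same summaries, split by group.
grouped = defaultdict(list)
for group, value in records:
    grouped[group].append(value)
by_group = {group: summarize(values) for group, values in grouped.items()}

print("overall", overall)
for group, stats in sorted(by_group.items()):
    print(group, stats)
    # A crude text histogram, one row of '#' marks per bin.
    for count in bin_counts(grouped[group]):
        print("  " + "#" * count)
```

In this made-up data the overall mean sits between the two groups and describes neither of them well, and one large value in group `b` pulls that group's mean far from its median. Neither detail shows up if you stop at the first look.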
Basically, what I’m trying to say is that design can do wonders for visualization, yes, but so can analysis. Put the two together, and you’re going to gain a much better understanding of a dataset than if you were to have just one or the other. In my experience, designers are afraid of statistical methods and statisticians are oblivious to design. I say – put the two together. Learn both, and we’ll all be that much better at understanding the even bigger data to come.