Going Beyond Collaborative Visual Analytics with Statistics

Jeffrey Heer et al. write in Design Considerations for Collaborative Visual Analytics about a couple of models for social visualization: the information visualization reference model and the sensemaking model. The former is the simpler, more straightforward of the two, running from raw data -> processed data -> visual structures -> actual visualization, while the latter is a bit more complicated, with similar stages plus feedback loops. My main reflections aren't so much about the ideas proposed by the paper. Rather, I'm more interested in what was not mentioned, not only in this paper but in other social data analysis papers.

Social data analysis (SDA) so far has mostly stayed inside the visualization bubble, with little talk of statistics or traditional analysis. Don't get me wrong. I love data visualization and am all for “harnessing the power of human cognition,” but I think some quantitative analysis needs to be in the loop. As most of the SDA models stand now, analysis starts with the visualization, a group of people interprets the viz, and then somehow they come to some kind of consensus, maybe.

Think Like a Statistician

What if we started with some visualization, then some stat, then back to viz, and kept cycling? I'm just thinking of how I would approach a large dataset with a group of stat people. It's almost always exploratory data analysis (EDA) first: find something interesting in the viz, run some analysis on it, go back to the viz, and repeat.

With powerful data visualization coupled with statistics, there's definitely something there, especially when it's all wrapped up in the ideas of socialized data. Maybe? At the very least, we can put statistics at the end of the flow chart as some kind of validation of the group's findings. Visualization and EDA on their own, as things are now, can only take us so far: they surface patterns, but they can't tell us how reliable those patterns are.
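
To make that "validation at the end of the flow chart" idea a bit more concrete, here is a minimal sketch of one way it could look. Suppose a group has looked at a chart and agreed that one set of values looks higher than another. A permutation test is one simple check on that impression: shuffle the group labels many times and count how often chance alone produces a gap at least as big. The function name, the two groups, and all the numbers below are made up for illustration. None of this comes from the paper, and a permutation test is just one of many checks that could fill this slot.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated p-value: the share of random relabelings whose
    absolute difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate from being exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical values for two groups that "looked different" in a chart.
group_a = [12.1, 13.4, 11.8, 14.2, 13.9, 12.7]
group_b = [10.2, 11.1, 9.8, 10.9, 11.5, 10.4]

p_value = permutation_test(group_a, group_b)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value suggests the gap the group spotted would be unusual under random labeling, and a large one suggests going back to the viz for another look. Either way the number feeds the next round of the loop rather than ending the conversation.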


  • Great point. Have you tried Statcrunch or Covariable? I would be interested to hear your opinions on those.

  • The other problem with “social” data analysis is that it often seems to be a case of the blind leading the blind (off the edge of a cliff). I’m all for making visualisation and data analysis more available to the general public, but it seems important that these sites help people to learn what good data analysis is.

  • @anon: I came across Statcrunch a while back, but for some reason I didn’t look any deeper than the homepage. I think maybe because of the membership fee. Covariable is new to me. I think it’s time I take a deeper look into both. Thanks for the pointers.

    @Hadley: I agree, and I think at the root of that problem is that the people creating these sites don’t really know what _good_ data analysis is either (with some exceptions of course). I like to think though that people eventually will learn.