• # 3 Worthwhile Alternatives to the Pie Chart

August 19, 2008  |  Statistical Visualization

A while back I asked what you wanted to see more of on FlowingData. Thanks to the 447 of you who responded.

I was actually kind of surprised that there were so many votes for statistical visualization. I thought there would be more of a balance between design, art viz, and stat viz. I was, however, happy to see that the second most voted-on choice was "All of the Above." I must be doing something right! So by popular demand, here's some statistical visualization.

## Pie Chart Alternatives

Since the above pie chart is making some of you cringe in agony (although I can't imagine why), let's take a look at a few alternatives to the pie chart using the same poll results.

### Bar Chart

How about a horizontal bar chart? The results are sorted and you can easily see the difference in voting counts.

### Stacked Bar Chart

The above bar chart is missing a little something though. It doesn't explicitly show that each bar is really a part of a whole - in this case, all the people who voted. How about a stacked bar chart then? It shows the groupings and is a little easier to read than the pie chart in the sense that it's linear differences as opposed to radial.

### Bubble Chart

Let's not forget our friends the bubbles. Carrying the same "problems" as a pie chart, the bubbles on the left are essentially a table with some flavor.

Personally, I still like the pie. Which one do you think is best? Or is there something else that might have been better than the above? How about a mosaic plot? Donut graph? A plain table?

• # Watch the Rise of Gasoline Retail Prices, 1993 – 2008

August 8, 2008  |  Projects, Statistical Visualization

Gas prices have been pretty crazy lately. I'm not used to paying over \$45 for a tank of gas in my fuel-efficient Honda Civic. I mean, come on, what the heck?

So naturally, we want to know, "What do the data look like for gasoline prices?" The Energy Information Administration has this data available for download. They have historic gas prices for certain states (not all, unfortunately) as well as for U.S. regions. Check out the animation showing the rise and fall... and rise.. and fall and rise of U.S. gas prices from 1993 up until now. Things started going crazy in 2006.

• # Can You Improve this Graph Showing Suicide Rates in Japan?

July 30, 2008  |  Statistical Visualization

Are you ready for another deconstruct/reconstruct exercise? I just posted a time series plot in the FlowingData forums that shows suicide rates and unemployment rates in Japan. Here are questions worth considering:

• What is the graph trying to show? Does it succeed?
• Is this the appropriate type of plot for this type of data?
• What would make the data more clear?

At a glance, the graph almost looks fine, but on closer inspection, there are some clear problems.

• # There’s More Than One Way to Skin a Dataset

July 25, 2008  |  Statistical Visualization

Last week I asked if you could improve a mediocre bar chart showing party majorities by county. There was a resounding yes as many of you deconstructed and then reconstructed your own graphs. For reference, here's the original chart:

Here are the key flaws to the original that you all caught:

1. The x-axis tick marks were in really weird places;
2. The y-axis label was misleading because the data were number of counties;
3. Red and blue would make more sense for Democrats and Republicans;
4. Counts for counties don't match the years, because they are reversed;
5. We see a different story when we bring in data for undecided "other" and "declined to declare."

What was the graph trying to show? It was trying to show party registration in California over the past five presidential elections. Did it succeed? No. It failed miserably; however, you did much better. Here are all the reworks.

Brijesh made a stacked chart for Democrats and Republicans:

Tyler made a horizontal stacked bar chart with a useful majority line down the middle:

Blair provided some R code:

David used a tornado chart, which turned out well:

Amos went with a stacked line chart:

Kevin sent this one in:

John put together a few versions - this being one of about five:

Jorge went with simplicity:

Stack created a time series for the Dems and Reps:

Jake put up a fan favorite:

Nate, the graphic designer, embedded a stacked line chart inside the California boundaries:

This is the one I made at the workshop:

Personally, I like Jake and David's the best, but who gets the golden star for best graph? I'll let you be the judge.

• # Can You Improve this Mediocre Statistical Graphic?

July 18, 2008  |  Statistical Visualization

I'm on my way back home from the workshop Integrating Computing into the Statistics Curricula in Berkeley (and this time I managed to get through the line without getting yelled at). During one of the labs, there was an assignment called Deconstruct-Reconstruct which was a great way to learn how to improve statistical graphics. Basically, we picked apart (deconstruct) a graphic from Swivel and then created a better version (reconstruct).

## Your Mission, If You Choose to Accept it...

As I was making my own version, I thought to myself, "I bet FlowingData readers would do really well with this exercise." Let's see if I'm right. Can you deconstruct-reconstruct the above graphic? Here are questions worth considering:

• What is the graphic (trying) to show?
• Does the graphic achieve its goal?
• Are there other data that could make the plot more informative?
• How can we improve the bar chart?

I'll put my version a little later...This post will self-destruct in ten seconds...

• # Is Napoleon’s March the Greatest Statistical Graphic Ever?

July 17, 2008  |  Statistical Visualization

I'm starting to hear about Charles Minard's map of Napoleon's march time and time again - almost to the point of exhaustion. Is the map really that awesome, or is it just because Edward Tufte said so? Here is my question to all of you:

## Is Minard's map the best statistical graphic ever drawn?

I have my own thoughts about this, but more importantly, I want to know what you all think. If you don't think it's the best ever, what is? If you do think it's the greatest of all time, what's second best?

• # Reflecting On the Data Viz VI Conference

July 9, 2008  |  Statistical Visualization

A little over a week ago, I was in Bremen for the Data Viz VI conference. Read that as Data Viz 6 - not Data Viz V.I., as I thought through the first three days.

I asked, "Is this the first one of these?"

"What do you mean? This is the sixth one. That's why it's called Data Viz SIX."

"Ah, ok, I did not get that."

Anyways, Adalbert and company put together an excellent conference, and I'm glad I was lucky enough to attend. It was the absolute best statistical conference I've ever been to. That's saying a lot, because it's the only statistical conference I've ever been to. But seriously, it was a good conference.

## Looking Backward, Looking Forward

Michael Friendly opened up with the almost obligatory talk on the history of statistical graphics and where the field is headed. Anyone who's opened up a Tufte book will have seen a lot of the examples he used (e.g. Napoleon's march and John Snow's map), but the history behind some of the graphics was interesting. Sometimes statistical graphics tend to lose that back story and become all about the values, so it's always nice to hear the human part of datasets.

## Visual Analytics Tools for Analysis of Movement Data

My ears perked up when I saw "analysis of movement data" in Gennady Andrienko's talk. I work with a lot of GPS data. I was reminded of the many ways to split up spatio-temporal data - by geographic section, by chunks of time, etc. It's easy to get caught up in the literal GPS traces on the map, so the talk was a good reminder. I do, however, wish Andrienko used more dynamic examples and branched out from Google Maps as the primary mapping tool. This was probably because his work is more computation-heavy than focused on interaction. Because of that, I was left wanting more than I got.
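To make the "many ways to split" idea concrete, here's a minimal Python sketch that groups the same GPS fixes two ways: by hour of day and by grid cell. This is not from Andrienko's talk. The sample fixes, the function names, and the 0.05-degree cell size are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical GPS fixes: (ISO timestamp, latitude, longitude).
fixes = [
    ("2008-06-25T09:05:00", 53.0793, 8.8017),
    ("2008-06-25T09:40:00", 53.0801, 8.8052),
    ("2008-06-25T13:15:00", 53.1050, 8.8600),
    ("2008-06-25T13:50:00", 53.1049, 8.8612),
]


def split_by_hour(points):
    """Group fixes by hour of day (a 'chunk of time')."""
    groups = defaultdict(list)
    for stamp, lat, lon in points:
        groups[datetime.fromisoformat(stamp).hour].append((lat, lon))
    return dict(groups)


def split_by_cell(points, cell_size=0.05):
    """Group fixes by grid cell (a 'geographic section').

    cell_size is in degrees and is an arbitrary choice for this sketch.
    """
    groups = defaultdict(list)
    for stamp, lat, lon in points:
        cell = (int(lat // cell_size), int(lon // cell_size))
        groups[cell].append(stamp)
    return dict(groups)


by_hour = split_by_hour(fixes)
by_cell = split_by_cell(fixes)
```

Same data, two different stories: one view asks "when was I moving?" and the other asks "where did I spend my time?"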

## GGobi for Exploratory Data Analysis

I had the chance to chat a bit with the group behind GGobi, an exploratory tool that lets you "tour" multidimensional data via different projections. (That is one nice group of people, let me tell you.) Off the top of my head, there were four separate talks from the group, showing the various applications GGobi can be applied to. It's kind of hard to explain in brief, so I'd encourage you to check out the free software from the GGobi site. If anything, it's fun to see your data move ala John Tukey.

## Parallel Coordinates - Good or Bad?

Al Inselberg promoted parallel coordinate plots (PCP) as the ultimate of statistical graphics. I got the sense that not everyone feels the same way. I remember during my second quarter as a graduate student, I proposed PCPs for a project. I was quickly rebuffed with a no way, those are horrible, and I simply moved on. After getting a personal demo from Inselberg though, I might have to take another look. Although, PCPs are certainly no panacea.

## Collaboration Wanted

Still, my main takeaway from Data Viz VI was the need for collaboration between design, computer science, and statistics. As we've seen on FlowingData, there's a lot of great visualization coming from all three camps, but I wish there were more collaboration between all. As Di pointed out, this can sometimes be difficult because statisticians need certain tools (e.g. R) to be tightly coupled with whatever visualization they're developing. But outside the pure analytical tool, I see a sweet spot at the epicenter of statistics, design, and computer science, which is certainly something to get excited about.

• # Statistical Graphics Conference – Jet Lag Wins. I Lose.

June 27, 2008  |  Statistical Visualization

As you might have noticed, I haven't been live blogging the Data Viz VI conference here in Bremen. I arrived Tuesday evening and on Wednesday, the first day of the conference, I woke up at 9:00am (which is midnight PDT), and my body said, "Nathan, I hate you. Go back to bed." I said no, and now I'm being punished. That's pretty much how it's been.

The actual conference, however, has been really interesting. Di Cook demoed GGobi via high school dropout salary data; Michael Friendly gave a nice talk on the golden age of statistical graphics; Gennady Andrienko talked a bit on clustering spatio-temporal data; and there have been plenty of other interesting ones in the mix. One criticism - Minard's map, showing the march of Napoleon, has been mentioned at least five times. Enough already.

## My Talk

I gave my talk on visualization for self-surveillance. I felt slightly off-topic talking more on design than on traditional statistical visualization, but no one threw any tomatoes at me, so that's okay. The emphasis was on collecting data about ourselves, looking for patterns, and gaining some insight on the way we live with my current project as the case in point.

## Animation in R

Yesterday, Andreas Buja got the audience's attention by using R for animation. He used R to show fishing boat activity off the Pacific coast simply using getGraphicsEvent(). The coding syntax was very similar to Actionscript where there is a listener, and when an event fires off, a function is called. For example, you can tell R to do something when the user clicks on the mouse. The animated map amazed a lot of people. I was mildly amused.
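For anyone who hasn't seen the listener pattern before, here's a rough Python analogy. To be clear, this is not Buja's code and not R's API. The `EventLoop` class and the handler names are hypothetical, and the only point is to show "register a function, then call it when the event fires."

```python
class EventLoop:
    """Tiny dispatcher: register one handler per event name, then fire events."""

    def __init__(self):
        self.handlers = {}

    def on(self, event_name, handler):
        # Remember which function should run for this event.
        self.handlers[event_name] = handler

    def fire(self, event_name, *args):
        # Call the registered handler, or do nothing if there isn't one.
        handler = self.handlers.get(event_name)
        return handler(*args) if handler else None


clicks = []


def on_mouse_down(x, y):
    # Runs only when a "mouse_down" event fires.
    clicks.append((x, y))
    return f"clicked at ({x}, {y})"


loop = EventLoop()
loop.on("mouse_down", on_mouse_down)
message = loop.fire("mouse_down", 0.25, 0.75)
```

In R, the handlers are passed to `getGraphicsEvent()` as arguments (for example `onMouseDown`) rather than registered on an object, but the idea is the same.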

## Design and Statistics

I've always known about the big divide between statistics and design for data visualization, but I didn't really know how big the gap was until now. For example, Processing, which is the default tool for a lot of designers, is foreign to statisticians. At the same time, most designers have never touched or heard of R. From where I sit, I see two separate worlds trying to do the same thing - tell stories with data. Both sides have much to learn from the other. They just don't know it yet.

This is not to say that the two haven't done great things separately, because they have. But the potential is high when they merge. Throw computer science in there, which has found its way into seemingly everything as a necessity, and you've got something good on its way.

• # Voting Breakdown for Democratic Presidential Primaries

June 5, 2008  |  Statistical Visualization

The above New York Times graphic shows where each candidate got his or her support from. The x-axis (horizontal) represents strength of support and the y-axis shows the number of states.

On the surface, it's a stacked bar chart, but the animation as you browse the groups (e.g. under age 30, whites, blacks), makes things interesting. Highlight a state and watch it move left to right and right to left or just click on "blacks" and watch all the states shoot to the right in support of Obama. FlowingData readers will recognize the names of the skilled graphics editors who made the graphic - Shan Carter and Amanda Cox.

[Thanks, Chris]

• # Quickie Visualizations for Debugging

May 15, 2008  |  Statistical Visualization

This guest post is by Rahul Bhargava, a Senior Software Engineer at nTAG Interactive, makers of interactive name badges for conferences and meetings. Email him : rahul [ @ ] ntag . com

• # Poverty Statistics that Make Sense – Welcome to Povertyville and Slumtown

April 25, 2008  |  Statistical Visualization

Dan Beech represents worldwide poverty in this video, which is actually a 3-dimensional bar chart with some flair:

Welcome to Povertyville, Slumtown, and Low Income city. I'm not sure what to think. Should I laugh? Should I cry? I don't know. What do you think?

In this genre of over-produced graphs, Povertyville reminds me of the real estate roller coaster, a dramatic 3-D time series plot:

• # Rolling Out Your Own Online Maps and Graphs with HTML/CSS

April 24, 2008  |  Mapping, Statistical Visualization

Wilson Miner and Paul Smith, two co-founders of Everyblock, post tutorials and a little bit of their own experiences rolling out their own maps and creating graphs with web standards.

## Why Not Go With Google Maps?

Paul gets into the mechanics of how you can use your own maps discussing the map stack - browser UI, tile cache, map server, and finally, the data. My favorite part though was his reasons for going with their own maps:

Ask yourself this question: why would you, as a website developer who controls all aspects of your site, from typography to layout, to color palette to photography, to UI functionality, allow a big, alien blob to be plopped down in the middle of your otherwise meticulously designed application? Think about it. You accept whatever colors, fonts, and map layers Google chooses for their map tiles. Sure, you try to rein it back in with custom markers and overlays, but at the root, the core component - the map itself - is out of your hands.

Because it's so easy to put in Google Maps instead of make your own (although it is getting a little easier), everything starts to look and feel the same and we get stuck in this Google Maps-confined interaction funk. Don't get me wrong. Google Maps does have its uses and it is a great application. I look up directions with it all the time, but we should also keep in mind that there's more to mapping than bubble markers all in the color of the Google flag.

Remember: a little bit of design goes a long way.

## Data Visualization with Web Standards

Wilson provides a tutorial for horizontal bar charts and sparklines with nothing but HTML and CSS. Why would you want to do this when you could use some fancy graphing API? Using Everyblock as an example, data visualization can serve as part of a navigation system as opposed to a standalone graphic:
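If you're wondering what "nothing but HTML and CSS" might look like, here's a minimal Python sketch that writes out the markup for a horizontal bar chart. This is my own guess at the general idea, not Wilson's actual markup: the `html_bar_chart` function, the `bar-chart` class name, and the sample counts are all assumptions. Each bar is just a `span` whose width is set with inline CSS.

```python
from html import escape


def html_bar_chart(rows, max_width_pct=100):
    """Return an HTML list where each bar is a span sized with inline CSS."""
    largest = max(value for _, value in rows)
    items = []
    for label, value in rows:
        # Scale each bar relative to the largest value.
        width = round(value / largest * max_width_pct, 1)
        items.append(
            f"<li>{escape(label)} "
            f'<span style="display:inline-block;height:1em;'
            f'width:{width}%;background:#999"></span> {value}</li>'
        )
    return '<ul class="bar-chart">\n' + "\n".join(items) + "\n</ul>"


# Made-up counts, for illustration only.
sample_counts = [("Category A", 40), ("Category B", 10), ("Category C", 20)]
markup = html_bar_chart(sample_counts)
```

Because the output is plain markup, the bars can sit inside links or list navigation like any other element, which is what makes this approach handy when the chart is part of the interface.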

Sometimes the visualization isn't at the center of attention.

Make sure you check out Everyblock, a site that is all about the data in your very own neighborhood, to see these maps and graphs in action.

[Thanks, Jodi]

• # Chernoff Faces to Display Baseball Managers From 2007 MLB Season

April 4, 2008  |  Statistical Visualization

Check out this lovely use of Chernoff Faces by Steve Wang of Swarthmore College. This method of visualization was developed by none other than mathematician-statistician-physicist Herman Chernoff in 1973. These faces were designed on the premise that people could easily understand facial expressions. With that in mind, Chernoff used facial characteristics to represent multivariate data.

If you like, you can make your own Chernoff faces with this R library.

• # Is the New Google Visualization API Going to Limit Our Data Imagination?

March 21, 2008  |  Statistical Visualization

Google recently released a visualization API that allows you to share embeddable visualization on your website, create Google Gadgets that can be shared and reused, and create extensions for existing Google products. Andrew asks, "Will this shape the future of data visualization online?"

On one side, this is exciting for the visualization field, because when Google talks, everyone listens. On the opposing side, could this be another Google Maps type of thing? Google Maps was cool at first, but now, mashup after mashup has left me bored and disillusioned. Ultimately though, I like to think that this API is going to benefit all of us.

## What the API Offers

There's a slew of charts, graphs, gidgets, and gadgets available that you'll see in the gallery.

### Time Series

I'm sure this Google Finance-looking graph will make a lot of people happy.

### Gauges

These are, um, interesting.

### Maps

We've seen this before, but the difference here is that it's now in widget form, which means a hook into Google Docs and other apps.

## How We Will Benefit

If Google visualization becomes popular, visualization, in general, grows in popularity. People who weren't exposed will now know more, and if all goes according to plan, data awareness has a chance to develop.

As an example, Google Maps made online mapping what it is now - commonplace. Remember when online mapping was only limited to the big boys? Now everyone can mashup to their heart's content. People know how to use it and similar mapping applications and because of that, more "idea people" ask for mapping. As a result there is more opportunity.

Similarly, with the data viz API, we'll see data mashups outside of the map. Data visualization will no longer just be for the big boys, but at the same time, we'll still be able to make our own designs with a wider audience ready to experiment and play.

What do you think? Is the Google visualization API going to limit our imagination where we get stuck in a Google-ish funk; or is data and visualization awareness ready to rise to a point where we all benefit?

• # 17 Ways to Visualize the Twitter Universe

I just created a new Twitter account, and it got me to thinking about all the data visualization I've seen for Twitter tweets. I felt like I'd seen a lot, and it turns out there are quite a few. Here they are grouped into four categories - network diagrams, maps, analytics, and abstract.

## Network Diagrams

Twitter is a social network with friends (and strangers) linking up with each other and sharing tweets aplenty. These network diagrams attempt to show the relationships that exist among users.

The ebiquity group did some cluster analysis and managed to group tweets by topic.

I'm not completely sure how to read this one. It looks like it starts from a single user and then shoots out into the network.

• # Explore Your del.icio.us Tags and Bookmarks On 6pli

March 4, 2008  |  Statistical Visualization

Santiago, who I met at the Visualizar workshop, forwarded me his work on the visualization of del.icio.us tags and bookmarks called 6pli. Normally, I'm not a big fan of network diagrams, because I always seem to get lost in all the nodes and edges cluttering up the place. I feel differently about 6pli though.

6pli sets itself apart with really smooth, responsive interaction and three views - elastic net 3-d, elastic net 2-d, and circle 2-d. All three views rely on a metric of tag-similarity. So the more co-tags that a single tag has with its neighbors, the closer the tags will be in proximity.
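I don't know exactly how 6pli computes tag-similarity, so treat this as a sketch under my own assumptions: count how many bookmarks each pair of tags shares, and call that the similarity. The bookmarks, tags, and function names below are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bookmarks mapped to their tags.
bookmarks = {
    "flowingdata.com": {"visualization", "statistics", "blog"},
    "r-project.org": {"statistics", "software"},
    "processing.org": {"visualization", "software", "design"},
    "ggobi.org": {"visualization", "statistics", "software"},
}


def co_tag_counts(tagged):
    """Count how many bookmarks each pair of tags appears on together."""
    counts = Counter()
    for tags in tagged.values():
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts


def similarity(counts, a, b):
    """Look up the co-occurrence count for a pair, in either order."""
    return counts[tuple(sorted((a, b)))]


counts = co_tag_counts(bookmarks)
```

A layout algorithm could then pull pairs with higher counts closer together, which is roughly the behavior you see in the three views.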

Was that confusing? OK, it'll be more clear with pretty pictures.

## Elastic Net 3-D

The elastic net 3-D (pictured above) shows tags and bookmarks in a 3-dimensional view. Tags are in rectangles and bookmarks are circles. A bookmark (or circle) will be closer to another bookmark (or circle) if it has more tags in common. Similarly, if a tag is often grouped with other tags, it will appear closer to that group. Click on a tag, and a list of bookmarks shows up on the right.

The cool part is when you start playing with the 3-D network blobby. You can rotate it like a globe and the movement is controlled by spring action. The visualization's response is immediate and really smooth with nice transitions from one view to the next, unlike this paragraph.

## Elastic Net 2-D

The 2-dimensional view is the same principle as the 3-D. The only difference is the 2-D is a projection of the 3-D view onto a flat plane. Smooth interaction still applies here.
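As a rough illustration, the simplest way to project 3-D points onto a flat plane is to drop the depth coordinate. I'm assuming an orthographic projection here; 6pli may well do something different (perspective, for example), and the node coordinates are made up.

```python
def project_orthographic(points_3d):
    """Flatten (x, y, z) points onto the x-y plane by dropping z."""
    return [(x, y) for x, y, _z in points_3d]


# Hypothetical node positions in the 3-D layout.
nodes_3d = [(0.0, 1.0, 2.5), (1.5, -0.5, 0.0), (-2.0, 0.25, -1.0)]
nodes_2d = project_orthographic(nodes_3d)
```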

## Circle 2-D

Finally, the circle view arranges tags and bookmarks into their del.icio.us bundles. Each circle is divided homogeneously and the radius of the circle can be manually modified.

One thing I would recommend for the beta release is some kind of input to type in a tag or the name of a bookmark. Right now, the starting point feels kind of random, but if I could specify where I wanted to explore, I think the viz would be that much more useful.

Check out my 6pli del.icio.us tags viz here.

• # Can We Improve this Graphic Showing History of Bipartisan Senate?

February 28, 2008  |  Statistical Visualization

David forwarded me his graphic on the modern two party system in the United States senate which essentially shows the senate's bipartisanship over time. It made me happy to see someone in political science using R, playing around with data, and taking a stab at creating a useful graphic.

## Improving the Graphic

While the graphic is indeed useful, I think there are some things that could make it even better. Here are thoughts that I sent to David.

• I wasn't immediately sure what each visual cue represented e.g. size of state abbrev. until I reached the bottom. It might be worth making the annotation more prominent either by position, size, or color or all three.
• To me, the congress numbers don't matter so much, but that might just be because I don't know much about the history of American government.
• I'm wondering if there's some way to make the labeling of the years more concise? If you just labeled with the first year of the two-year term, would it be obvious that you're describing a two-year term? What if you took away the alternating gray background and just made it all white and then had a bar timeline-type thing on top (and bottom)?
• What if you tried to use a color scheme? I mean, you have the red and blue for the reps and dems (which I think is right), but the gradient for the senate counts turns very bright pink and purple which doesn't go too well. Then there's the cyan, yellow, and green which doesn't seem to have any specific significance other than each color represents something. What I mean is... is there a reason you chose those colors?
• It might be worth making the annotations bigger so that you don't have to "zoom in" to read.
• I think I would make the median lines a bit more prominent, but that's just me.
• There's a lot of cool stuff getting represented here, and I wonder if anything might benefit as a separate graph. Would this benefit at all as a series of graphs instead of one large graphic?

So that's my opinion. What do you think? Judging from our FlowingData Facebook group (which I'm happy to see is growing), we have a very diverse bunch from design, statistics, computer science, and some other areas, so I'm eager to hear what the rest of you think about this visualization.

• # Is an Animated Transition From a Scatter Plot to a Bar Graph Effective?

February 20, 2008  |  Statistical Visualization

Statistical graphics are kind of stuck in a static funk where you create a plot in R, Excel, or whatever, and you can't really interact with it. If you want another graphic, you manually create it. Hence, Jeffrey Heer and George G. Robertson investigated the benefits of using animation in statistical graphics.

• # How to Read (and Use) a Box-and-Whisker Plot

February 15, 2008  |  Statistical Visualization

The box-and-whisker plot is an exploratory graphic, created by John W. Tukey, used to show the distribution of a dataset (at a glance). Think of the type of data you might use a histogram with, and the box-and-whisker (or box plot, for short) could probably be useful.

The box plot, although very useful, seems to get lost in areas outside of Statistics, but I'm not sure why. It could be that people don't know about it or maybe are clueless on how to interpret it. In any case, here's how you read a box plot.

Let's say we ask 2,852 people (and they miraculously all respond) how many hamburgers they've consumed in the past week. We'll sort those responses from least to greatest and then graph them with our box-and-whisker.

Take the top 50% of the group (1,426) who ate more hamburgers; they are represented by everything above the median (the white line). Those in the top 25% of hamburger eating (713) are shown by the top "whisker" and dots. Dots represent those who ate a lot more than normal or a lot less than normal (outliers). If more than one outlier ate the same number of hamburgers, dots are placed side by side.
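If you'd like to see the numbers behind the picture, here's a minimal Python sketch that computes the pieces of a box plot. A couple of assumptions: the hamburger counts are made up, and I'm using the common 1.5 × IQR rule to decide what counts as an outlier, which the description above doesn't specify.

```python
from statistics import quantiles


def box_plot_stats(values, whisker=1.5):
    """Return quartiles, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Points beyond these fences are drawn as dots (outliers).
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical hamburgers eaten in a week by ten people.
burgers = [0, 1, 1, 2, 2, 3, 3, 4, 5, 14]
stats = box_plot_stats(burgers)
```

With this sample, the person who ate 14 burgers lands outside the upper fence and would show up as a dot above the top whisker.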

## Find Skews in the Data

The box-and-whisker of course shows you more than just four split groups. You can also see which way the data sways. For example, if there are more people who eat a lot of burgers than eat a few, the median is going to be higher or the top whisker could be longer than the bottom one. Basically, it gives you a good overview of the data's distribution.

That's all there is to it, so the next time you're thinking of making a bar graph or a histogram, think about using Tukey's beloved box-and-whisker plot too.