• Can You Improve this Mediocre Statistical Graphic?

    July 18, 2008  |  Statistical Visualization

    I'm on my way back home from the workshop Integrating Computing into the Statistics Curricula in Berkeley (and this time I managed to get through the line without getting yelled at). During one of the labs, there was an assignment called Deconstruct-Reconstruct, which was a great way to learn how to improve statistical graphics. Basically, we picked apart (deconstruct) a graphic from Swivel and then created a better version (reconstruct).

    Your Mission, If You Choose to Accept it...

    As I was making my own version, I thought to myself, "I bet FlowingData readers would do really well with this exercise." Let's see if I'm right. Can you deconstruct-reconstruct the above graphic? Here are questions worth considering:

    • What is the graphic (trying) to show?
    • Does the graphic achieve its goal?
    • Are there other data that could make the plot more informative?
    • How can we improve the bar chart?

    I'll put my version a little later...This post will self-destruct in ten seconds...

  • Is Napoleon’s March the Greatest Statistical Graphic Ever?

    July 17, 2008  |  Statistical Visualization

    I'm starting to hear about Charles Minard's map of Napoleon's march time and time again - almost to the point of exhaustion. Is the map really that awesome, or is it just because Edward Tufte said so? Here is my question to all of you:

    Is Minard's map the best statistical graphic ever drawn?

    I have my own thoughts about this, but more importantly, I want to know what you all think. If you don't think it's the best ever, what is? If you do think it's the greatest of all time, what's second best?

  • Reflecting On the Data Viz VI Conference

    July 9, 2008  |  Statistical Visualization

    A little over a week ago, I was in Bremen for the Data Viz VI conference. Read that as Data Viz 6, not Data Viz V.I., which is how I read it for the first three days.

    I asked, "Is this the first one of these?"

    "What do you mean? This is the sixth one. That's why it's called Data Viz SIX."

    "Ah, ok, I did not get that."

    Anyways, Adalbert and company put together an excellent conference, and I'm glad I was lucky enough to attend. It was the absolute best statistical conference I've ever been to. That's saying a lot, because it's the only statistical conference I've ever been to. But seriously, it was a good conference.

    Looking Backward, Looking Forward

    Michael Friendly opened with the almost obligatory talk on the history of statistical graphics and where the field is headed. Anyone who's opened a Tufte book will have seen a lot of the examples he used (e.g. Napoleon's march and John Snow's map), but the history behind some of the graphics was interesting. Sometimes statistical graphics tend to lose that back story and become all about the values, so it's always nice to hear the human part of datasets.

    Visual Analytics Tools for Analysis of Movement Data

    My ears perked up when I saw "analysis of movement data" in Gennady Andrienko's talk. I work with a lot of GPS data. I was reminded of the many ways to split up spatio-temporal data: by geographic section, by chunks of time, etc. It's easy to get caught up in the literal GPS traces on the map, so the talk was a good reminder. I do, however, wish Andrienko had used more dynamic examples and branched out from Google Maps as the primary mapping tool. This was probably because his work is more computation-heavy than interaction-focused. Because of that, I was left wanting more than I got.
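    Splitting traces up by place and by time mostly comes down to binning. Here's a minimal Python sketch, assuming each GPS fix is a hypothetical (lat, lon, unix_timestamp) tuple; the 0.01-degree grid and one-hour window are arbitrary illustration values, not anything from Andrienko's talk.

```python
from collections import defaultdict

def bin_fixes(fixes, cell_deg=0.01, window_s=3600):
    """Group GPS fixes by (grid cell, time window).

    fixes: iterable of (lat, lon, unix_ts) tuples (a hypothetical format).
    Returns a dict mapping (row, col, window) -> list of fixes in that bin.
    """
    bins = defaultdict(list)
    for lat, lon, ts in fixes:
        row = int(lat // cell_deg)    # geographic section: latitude band
        col = int(lon // cell_deg)    # geographic section: longitude band
        window = int(ts // window_s)  # chunk of time
        bins[(row, col, window)].append((lat, lon, ts))
    return dict(bins)

# Made-up fixes: two in the same cell and hour, a third an hour later.
fixes = [(34.0522, -118.2437, 0), (34.0525, -118.2440, 600), (34.0522, -118.2437, 3700)]
for key, points in sorted(bin_fixes(fixes).items()):
    print(key, len(points))
```

    Holding the cell fixed and looping over windows gives a time series for one place; holding the window fixed gives a snapshot map.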

    GGobi for Exploratory Data Analysis

    I had the chance to chat a bit with the group behind GGobi, an exploratory tool that lets you "tour" multidimensional data via different projections. (That is one nice group of people, let me tell you.) Off the top of my head, there were four separate talks from the group, showing the various problems GGobi can be applied to. It's kind of hard to explain briefly, so I'd encourage you to check out the free software from the GGobi site. If nothing else, it's fun to see your data move à la John Tukey.
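    A grand tour shows a sequence of 2-D projections of multidimensional data. The core step, projecting p-dimensional points onto a random orthonormal 2-D basis, can be sketched in plain Python. This is an illustration of the idea, not GGobi's implementation, and it omits the smooth interpolation between successive bases that makes a real tour look like motion.

```python
import math
import random

def random_basis(p, rng):
    """Return two orthonormal p-dimensional vectors via Gram-Schmidt."""
    a = [rng.gauss(0, 1) for _ in range(p)]
    b = [rng.gauss(0, 1) for _ in range(p)]
    norm_a = math.sqrt(sum(x * x for x in a))
    a = [x / norm_a for x in a]
    dot = sum(x * y for x, y in zip(a, b))
    b = [y - dot * x for x, y in zip(a, b)]  # remove the component along a
    norm_b = math.sqrt(sum(y * y for y in b))
    b = [y / norm_b for y in b]
    return a, b

def project(rows, basis):
    """Project each p-dimensional row onto the plane spanned by the basis."""
    a, b = basis
    return [(sum(x * u for x, u in zip(row, a)),
             sum(x * v for x, v in zip(row, b))) for row in rows]

rng = random.Random(42)
data = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 0.0, 3.0], [0.5, 0.5, 2.5, 1.0]]
for step in range(3):  # each step is one "frame" of a crude, jumpy tour
    print(project(data, random_basis(4, rng)))
```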

    Parallel Coordinates - Good or Bad?

    Al Inselberg promoted parallel coordinate plots (PCPs) as the ultimate statistical graphic. I got the sense that not everyone feels the same way. I remember proposing PCPs for a project during my second quarter as a graduate student. I was quickly rebuffed with "No way, those are horrible," and I simply moved on. After getting a personal demo from Inselberg, though, I might have to take another look. That said, PCPs are certainly no panacea.
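    In a parallel coordinate plot, each variable gets its own vertical axis, and each observation becomes a polyline that crosses every axis at its value. A minimal Python sketch of the layout math follows; rescaling each variable to [0, 1] per axis is a common convention, not necessarily what Inselberg's software does. It returns vertex coordinates that any plotting tool could draw as lines.

```python
def parallel_coordinates(rows):
    """Turn each observation into a polyline for a parallel coordinate plot.

    Variable j gets a vertical axis at x = j; values are rescaled to [0, 1]
    per variable so that all axes share one vertical scale.
    """
    n_vars = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_vars)]
    highs = [max(r[j] for r in rows) for j in range(n_vars)]
    lines = []
    for r in rows:
        points = []
        for j, v in enumerate(r):
            span = highs[j] - lows[j]
            y = (v - lows[j]) / span if span else 0.5  # constant column -> middle
            points.append((j, y))
        lines.append(points)
    return lines

rows = [[1, 10, 100], [2, 30, 50], [3, 20, 75]]
for line in parallel_coordinates(rows):
    print(line)
```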

    Collaboration Wanted

    Still, my main takeaway from Data Viz VI was the need for collaboration among design, computer science, and statistics. As we've seen on FlowingData, there's a lot of great visualization coming from all three camps, but I wish there were more collaboration among them. As Di pointed out, this can sometimes be difficult because statisticians need certain tools (e.g. R) to be tightly coupled with whatever visualization they're developing. But beyond the pure analytical tool, I see a sweet spot at the intersection of statistics, design, and computer science, which is certainly something to get excited about.

  • Statistical Graphics Conference – Jet Lag Wins. I Lose.

    June 27, 2008  |  Statistical Visualization

    As you might have noticed, I haven't been live blogging the Data Viz VI conference here in Bremen. I arrived Tuesday evening and on Wednesday, the first day of the conference, I woke up at 9:00am (which is midnight PDT), and my body said, "Nathan, I hate you. Go back to bed." I said no, and now I'm being punished. That's pretty much how it's been.

    The actual conference, however, has been really interesting. Di Cook demoed GGobi via high school dropout salary data; Michael Friendly gave a nice talk on the golden age of statistical graphics; Gennady Andrienko talked a bit about clustering spatio-temporal data; and there have been plenty of other interesting ones in the mix. One criticism: Minard's map, showing the march of Napoleon, has been mentioned at least five times. Enough already.

    My Talk

    I gave my talk on visualization for self-surveillance. I felt slightly off-topic talking more about design than about traditional statistical visualization, but no one threw any tomatoes at me, so that's okay. The emphasis was on collecting data about ourselves, looking for patterns, and gaining some insight into the way we live, with my current project as the case in point.

    Animation in R

    Yesterday, Andreas Buja got the audience's attention by using R for animation. He showed fishing boat activity off the Pacific coast using little more than R's getGraphicsEvent(). The approach was very similar to ActionScript: you register a listener, and when an event fires, a function is called. For example, you can tell R to do something when the user clicks the mouse. The animated map amazed a lot of people. I was mildly amused.
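    The listener idea is easy to sketch outside of R. In R itself, getGraphicsEvent() takes handler functions through arguments such as onMouseDown and onKeybd, and not every graphics device supports it. The Python below is only a generic illustration of the pattern, not Buja's code, and the event names are hypothetical.

```python
class EventSource:
    """Minimal listener registry: handlers are called when an event fires."""

    def __init__(self):
        self._listeners = {}

    def add_listener(self, event_name, handler):
        """Register handler to be called whenever event_name fires."""
        self._listeners.setdefault(event_name, []).append(handler)

    def fire(self, event_name, *args):
        # Call every handler registered for this event, in registration
        # order, and collect their return values.
        return [handler(*args) for handler in self._listeners.get(event_name, [])]

# Hypothetical usage: advance an animated map when the user clicks.
canvas = EventSource()
canvas.add_listener("mouse_down", lambda x, y: f"clicked at ({x}, {y}); advance one frame")
print(canvas.fire("mouse_down", 120, 45))
```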

    Design and Statistics

    I've always known about the big divide between statistics and design for data visualization, but I didn't really know how big the gap was until now. For example, Processing, which is the default tool for a lot of designers, is foreign to statisticians. At the same time, most designers have never touched or heard of R. From where I sit, I see two separate worlds trying to do the same thing - tell stories with data. Both sides have much to learn from the other. They just don't know it yet.

    This is not to say that the two haven't done great things separately, because they have. But the potential is high when they merge. Throw in computer science, which has found its way into seemingly everything as a necessity, and you've got something good on its way.

  • Voting Breakdown for Democratic Presidential Primaries

    June 5, 2008  |  Statistical Visualization

    The above New York Times graphic shows where each candidate got his or her support from. The x-axis (horizontal) represents strength of support and the y-axis shows the number of states.

    On the surface, it's a stacked bar chart, but the animation as you browse the groups (e.g. under age 30, whites, blacks) makes things interesting. Highlight a state and watch it move left to right and right to left, or just click on "blacks" and watch all the states shoot to the right in support of Obama. FlowingData readers will recognize the names of the skilled graphics editors who made the graphic: Shan Carter and Amanda Cox.

    [Thanks, Chris]

  • Quickie Visualizations for Debugging

    May 15, 2008  |  Statistical Visualization

    This guest post is by Rahul Bhargava, a Senior Software Engineer at nTAG Interactive, makers of interactive name badges for conferences and meetings. Email him: rahul [ @ ] ntag . com

    A common thread in many of the great visualizations Nathan shares on FlowingData is that they are created for external consumption: someone designs a neat way to represent a dataset to a larger, naive audience. I want to talk about the underappreciated utility of writing quick visualizations for yourself, to help you debug your own complicated or data-dense problems. This is not a new discussion, but I want to remind all the programmers out there that a speedily-created visual representation of your debugging log data might be the quickest way to find your problem! Below are some examples of what we've done at nTAG, and some techniques we've found particularly useful. Please post a comment about what you do.
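    As a taste of the idea, here's a minimal Python sketch that turns a log into a text histogram of events per minute, the kind of throwaway view that can make a burst of errors jump out. The log format (HH:MM:SS LEVEL message) and the sample lines are hypothetical, not nTAG's.

```python
from collections import Counter

def ascii_histogram(log_lines, width=40):
    """Count log lines per minute and draw one text bar per minute.

    Assumes a hypothetical log format: 'HH:MM:SS LEVEL message'.
    Bars are scaled so the busiest minute is `width` characters long.
    """
    counts = Counter(line[:5] for line in log_lines if line.strip())  # 'HH:MM'
    if not counts:
        return []
    peak = max(counts.values())
    rows = []
    for minute in sorted(counts):
        bar = "#" * max(1, round(width * counts[minute] / peak))
        rows.append(f"{minute} {bar} {counts[minute]}")
    return rows

log = [
    "10:00:01 INFO sync ok",
    "10:00:07 WARN retry",
    "10:01:15 INFO sync ok",
    "10:02:02 ERROR timeout",
    "10:02:03 ERROR timeout",
    "10:02:04 ERROR timeout",
]
print("\n".join(ascii_histogram(log, width=10)))
```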

  • Poverty Statistics that Make Sense – Welcome to Povertyville and Slumtown

    April 25, 2008  |  Statistical Visualization

    Dan Beech represents worldwide poverty in this video, which is actually a 3-dimensional bar chart with some flair:

    Welcome to Povertyville, Slumtown, and Low Income city. I'm not sure what to think. Should I laugh? Should I cry? I don't know. What do you think?

    In this genre of over-produced graphs, Povertyville reminds me of the real estate roller coaster, a dramatic 3-D time series plot:

  • Rolling Out Your Own Online Maps and Graphs with HTML/CSS

    April 24, 2008  |  Mapping, Statistical Visualization

    Wilson Miner and Paul Smith, two co-founders of Everyblock, post tutorials and share some of their own experiences rolling out custom maps and creating graphs with web standards.

    Why Not Go With Google Maps?

    Paul gets into the mechanics of how you can use your own maps, walking through the map stack: browser UI, tile cache, map server, and finally, the data. My favorite part, though, was his reasons for going with their own maps:

    Ask yourself this question: why would you, as a website developer who controls all aspects of your site, from typography to layout, to color palette to photography, to UI functionality, allow a big, alien blob to be plopped down in the middle of your otherwise meticulously designed application? Think about it. You accept whatever colors, fonts, and map layers Google chooses for their map tiles. Sure, you try to rein it back in with custom markers and overlays, but at the root, the core component—the map itself—is out of your hands.

    Because it's so easy to drop in Google Maps instead of making your own (although making your own is getting a little easier), everything starts to look and feel the same, and we get stuck in this Google Maps-confined interaction funk. Don't get me wrong. Google Maps does have its uses, and it is a great application. I look up directions with it all the time, but we should also keep in mind that there's more to mapping than bubble markers all in the color of the Google flag.

    Remember: a little bit of design goes a long way.
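    For the curious, the tile cache layer in Paul's map stack typically stores pre-rendered images addressed by zoom level and tile column/row. A minimal Python sketch of that lookup, assuming the common Web Mercator "slippy map" scheme used by Google Maps and OpenStreetMap; Everyblock's own stack may address tiles differently.

```python
import math

def latlon_to_tile(lat, lon, zoom):
    """Return (x, y) indices of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # tiles per side at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 or extreme latitudes still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

print(latlon_to_tile(41.8781, -87.6298, 12))  # a point in Chicago at zoom 12
```

    A tile cache can then key its stored images on (zoom, x, y) and only ask the map server to render tiles it hasn't seen yet.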

    Data Visualization with Web Standards

    Wilson provides a tutorial for horizontal bar charts and sparklines with nothing but HTML and CSS. Why would you want to do this when you could use some fancy graphing API? Using Everyblock as an example, data visualization can serve as part of a navigation system as opposed to a standalone graphic:

    Sometimes the visualization isn't at the center of attention.
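    The core trick in a pure HTML/CSS bar chart is setting each bar's width as a percentage of the largest value. Below is a minimal Python sketch that emits such markup. The class names are hypothetical and this isn't Wilson's code; a stylesheet rule along the lines of `.bar { display: inline-block; background: #36c; }` would still be needed to make the spans show up as bars.

```python
from html import escape

def html_bar_chart(data):
    """Render horizontal bars as plain HTML with inline CSS widths.

    data: non-empty list of (label, value) pairs. Each bar's width is a
    percentage of the largest value, so the chart scales with its container.
    """
    peak = max(v for _, v in data) or 1  # avoid dividing by zero
    rows = ['<ul class="bar-chart">']
    for label, value in data:
        pct = round(100 * value / peak)
        rows.append(
            f'  <li><span class="label">{escape(label)}</span>'
            f'<span class="bar" style="width: {pct}%">{value}</span></li>'
        )
    rows.append("</ul>")
    return "\n".join(rows)

print(html_bar_chart([("Crime", 40), ("Permits", 20), ("Restaurant inspections", 10)]))
```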

    Make sure you check out Everyblock, a site that is all about the data in your very own neighborhood, to see these maps and graphs in action.

    [Thanks, Jodi]

  • Chernoff Faces to Display Baseball Managers From 2007 MLB Season

    April 4, 2008  |  Statistical Visualization

    Check out this lovely use of Chernoff Faces by Steve Wang of Swarthmore College. This method of visualization was developed by none other than mathematician-statistician-physicist Herman Chernoff in 1973. These faces were designed on the premise that people could easily understand facial expressions. With that in mind, Chernoff used facial characteristics to represent multivariate data.

    If you like, you can make your own Chernoff faces with this R library.
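    The idea boils down to rescaling each variable and assigning it to one facial feature. A minimal Python sketch of that mapping step, assuming a hypothetical set of four features (Steve Wang's actual mapping isn't described here); it only computes the feature parameters and leaves the drawing to whatever plotting tool you use.

```python
# Hypothetical feature list; the pairing of variables to features is arbitrary.
FEATURES = ["face_width", "eye_size", "eyebrow_slant", "mouth_curve"]

def chernoff_features(values, ranges):
    """Rescale one observation's variables to [0, 1], one per facial feature.

    values: one number per variable, in the same order as FEATURES.
    ranges: (min, max) of each variable across the whole dataset.
    """
    params = {}
    for feature, v, (lo, hi) in zip(FEATURES, values, ranges):
        params[feature] = (v - lo) / (hi - lo) if hi > lo else 0.5
    return params

# Made-up numbers: a drawing routine might render mouth_curve 0 as a frown
# and 1 as a smile.
print(chernoff_features([5, 0, 10, 3], [(0, 10), (0, 4), (0, 10), (0, 6)]))
```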

  • Is the New Google Visualization API Going to Limit Our Data Imagination?

    March 21, 2008  |  Statistical Visualization

    Google recently released a visualization API that allows you to share embeddable visualizations on your website, create Google Gadgets that can be shared and reused, and create extensions for existing Google products. Andrew asks, "Will this shape the future of data visualization online?"

    On one side, this is exciting for the visualization field, because when Google talks, everyone listens. On the opposing side, could this be another Google Maps type of thing? Google Maps was cool at first, but now, mashup after mashup has left me bored and disillusioned. Ultimately though, I like to think that this API is going to benefit all of us.

    What the API Offers

    There's a slew of charts, graphs, gidgets, and gadgets available that you'll see in the gallery.

    Time Series

    I'm sure this Google Finance-looking graph will make a lot of people happy.

    Gauges

    These are, um, interesting.

    Maps

    We've seen this before, but the difference here is that it's now in widget form, which means a hook into Google Docs and other apps.

    How We Will Benefit

    If Google visualization becomes popular, visualization, in general, grows in popularity. People who weren't exposed will now know more, and if all goes according to plan, data awareness has a chance to develop.

    As an example, Google Maps made online mapping what it is now: commonplace. Remember when online mapping was limited to the big boys? Now everyone can mash up to their heart's content. People know how to use it and similar mapping applications, and because of that, more "idea people" ask for mapping. As a result, there is more opportunity.

    Similarly, with the data viz API, we'll see data mashups outside of the map. Data visualization will no longer just be for the big boys, but at the same time, we'll still be able to make our own designs with a wider audience ready to experiment and play.

    Good or Bad?

    What do you think? Is the Google visualization API going to limit our imagination, leaving us stuck in a Google-ish funk? Or is data and visualization awareness ready to rise to a point where we all benefit?

  • 17 Ways to Visualize the Twitter Universe

    I just created a new Twitter account, and it got me thinking about all the data visualizations I've seen for Twitter tweets. I felt like I'd seen a lot, and it turns out there are quite a few. Here they are grouped into four categories: network diagrams, maps, analytics, and abstract.

    Network Diagrams

    Twitter is a social network with friends (and strangers) linking up with each other and sharing tweets aplenty. These network diagrams attempt to show the relationships that exist among users.

    Twitter Browser

    Twitter Social Network Analysis

    The ebiquity group did some cluster analysis and managed to group tweets by topic.

    Twitter Vrienden

    Twitter in Red

    I'm not completely sure how to read this one. It looks like it starts from a single user and then shoots out into the network.

    Twitter Network

  • Explore Your del.icio.us Tags and Bookmarks On 6pli

    March 4, 2008  |  Statistical Visualization

    Santiago, who I met at the Visualizar workshop, forwarded me his work on the visualization of del.icio.us tags and bookmarks called 6pli. Normally, I'm not a big fan of network diagrams, because I always seem to get lost in all the nodes and edges cluttering up the place. I feel differently about 6pli though.

    6pli sets itself apart with really smooth, responsive interaction and three views: elastic net 3-D, elastic net 2-D, and circle 2-D. All three views rely on a tag-similarity metric: the more co-tags a tag shares with its neighbors, the closer together they appear.

    Was that confusing? OK, it'll be more clear with pretty pictures.

    Elastic Net 3-D

    The elastic net 3-D (pictured above) shows tags and bookmarks in a 3-dimensional view. Tags are in rectangles and bookmarks are circles. A bookmark (or circle) will be closer to another bookmark (or circle) if it has more tags in common. Similarly, if a tag is often grouped with other tags, it will appear closer to that group. Click on a tag, and a list of bookmarks shows up on the right.

    The cool part is when you start playing with the 3-D network blobby. You can rotate it like a globe and the movement is controlled by spring action. The visualization's response is immediate and really smooth with nice transitions from one view to the next, unlike this paragraph.
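    The proximity rule boils down to counting co-occurrences. Here's a minimal Python sketch, assuming raw co-tag counts as the similarity (6pli's exact metric isn't described here); a spring layout could then use something like 1 / (1 + count) as a hypothetical target distance between two tags.

```python
from collections import Counter
from itertools import combinations

def cotag_counts(bookmarks):
    """Count how often each pair of tags appears on the same bookmark.

    bookmarks: list of tag collections, one per bookmark. Pairs are stored
    alphabetically; higher counts mean the tags should sit closer together.
    """
    pairs = Counter()
    for tags in bookmarks:
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs

bookmarks = [["viz", "stats", "r"], ["viz", "design"], ["viz", "stats"], ["r", "stats"]]
print(cotag_counts(bookmarks).most_common(3))
```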

    Elastic Net 2-D

    The 2-dimensional view works on the same principle as the 3-D one. The only difference is that the 2-D view is a projection of the 3-D view onto a flat plane. Smooth interaction still applies here.

    Circle 2-D

    Finally, the circle view arranges tags and bookmarks into their del.icio.us bundles. Each circle is divided evenly, and the radius of the circle can be manually modified.
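    Dividing a circle evenly just means spacing items at equal angles, with the radius as a free parameter. A minimal Python sketch of such a layout (not 6pli's code):

```python
import math

def circle_layout(items, radius=1.0, center=(0.0, 0.0)):
    """Place items at equal angular intervals around a circle.

    Returns a dict mapping each item to its (x, y) position; changing
    radius grows or shrinks the circle without changing the spacing.
    """
    n = len(items)
    cx, cy = center
    return {
        item: (cx + radius * math.cos(2 * math.pi * i / n),
               cy + radius * math.sin(2 * math.pi * i / n))
        for i, item in enumerate(items)
    }

print(circle_layout(["design", "maps", "stats", "viz"], radius=2.0))
```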

    One thing I would recommend for the beta release is some kind of input to type in a tag or the name of a bookmark. Right now, the starting point feels kind of random, but if I could specify where I wanted to explore, I think the viz would be that much more useful.

    Check out my 6pli del.icio.us tags viz here.

  • Can We Improve this Graphic Showing History of Bipartisan Senate?

    February 28, 2008  |  Statistical Visualization

    David forwarded me his graphic on the modern two-party system in the United States Senate, which essentially shows the Senate's bipartisanship over time. It made me happy to see someone in political science using R, playing around with data, and taking a stab at creating a useful graphic.

    Improving the Graphic

    While the graphic is indeed useful, I think there are some things that could make it even better. Here are thoughts that I sent to David.

    • I wasn't immediately sure what each visual cue represented (e.g. the size of the state abbreviations) until I reached the bottom. It might be worth making the annotation more prominent by position, size, color, or all three.
    • To me, the Congress numbers don't matter so much, but that might just be because I don't know much about the history of American government.
    • I'm wondering if there's some way to make the labeling of the years more concise? If you just labeled with the first year of the two-year term, would it be obvious that you're describing a two-year term? What if you took away the alternating gray background and just made it all white and then had a bar timeline-type thing on top (and bottom)?
    • What if you tried to use a color scheme? I mean, you have the red and blue for the Republicans and Democrats (which I think is right), but the gradient for the Senate counts turns very bright pink and purple, which doesn't go too well. Then there's the cyan, yellow, and green, which don't seem to have any specific significance other than each color representing something. What I mean is... is there a reason you chose those colors?
    • It might be worth making the annotations bigger so that you don't have to "zoom in" to read.
    • I think I would make the median lines a bit more prominent, but that's just me.
    • There's a lot of cool stuff getting represented here, and I wonder if anything might benefit as a separate graph. Would this benefit at all as a series of graphs instead of one large graphic?
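    On the labeling question: because each Congress lasts two years and the 1st Congress convened in 1789, a term's first year follows directly from its number, so first-year labels lose no information. A small Python sketch of that mapping (the function names are hypothetical):

```python
def congress_years(n):
    """Return the first and second calendar years of the nth U.S. Congress.

    The 1st Congress convened in 1789 and each Congress lasts two years,
    so Congress n starts in 1789 + 2 * (n - 1).
    """
    start = 1789 + 2 * (n - 1)
    return start, start + 1

def short_label(n):
    """Concise axis label: just the first year of the two-year term."""
    return str(congress_years(n)[0])

print([short_label(n) for n in range(106, 111)])
```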

    Now It's Your Turn

    So that's my opinion. What do you think? Judging from our FlowingData Facebook group (which I'm happy to see is growing), we have a very diverse bunch from design, statistics, computer science, and some other areas, so I'm eager to hear what the rest of you think about this visualization.

  • Is an Animated Transition From a Scatter Plot to a Bar Graph Effective?

    February 20, 2008  |  Statistical Visualization

    Statistical graphics are kind of stuck in a static funk: you create a plot in R, Excel, or whatever, and you can't really interact with it. If you want another graphic, you create it manually. Jeffrey Heer and George G. Robertson investigated the benefits of using animation in statistical graphics.
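    The basic ingredient of such a transition is interpolating each mark from its scatter-plot position to its bar-chart position over a number of frames. Below is a minimal Python sketch with smoothstep easing. Heer and Robertson's study covers more than this (for example, how to stage a transition), so treat it as the simplest building block, not their method.

```python
def ease_in_out(t):
    """Smoothstep easing for t in [0, 1]: slow start, slow finish."""
    return t * t * (3 - 2 * t)

def tween(start, end, frames):
    """Yield the positions of every mark for frames + 1 animation steps.

    start, end: lists of (x, y) positions for the same marks, e.g. a point's
    location in the scatter plot and the top of its bar in the bar chart.
    """
    for f in range(frames + 1):
        t = ease_in_out(f / frames)
        yield [(x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
               for (x0, y0), (x1, y1) in zip(start, end)]

for frame in tween([(0, 0), (10, 10)], [(10, 0), (0, 0)], 4):
    print(frame)
```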

  • How to Read (and Use) a Box-and-Whisker Plot

    February 15, 2008  |  Statistical Visualization

    The box-and-whisker plot is an exploratory graphic, created by John W. Tukey, used to show the distribution of a dataset at a glance. Think of the type of data you might use a histogram for, and the box-and-whisker plot (or box plot, for short) could probably be useful.

    The box plot, although very useful, seems to get lost in areas outside of Statistics, but I'm not sure why. It could be that people don't know about it or maybe are clueless on how to interpret it. In any case, here's how you read a box plot.

    Reading a Box-and-Whisker Plot

    Let's say we ask 2,852 people (and they miraculously all respond) how many hamburgers they've consumed in the past week. We'll sort those responses from least to greatest and then graph them with our box-and-whisker.

    Take the top 50% of the group (1,426) who ate more hamburgers; they are represented by everything above the median (the white line). Those in the top 25% of hamburger eating (713) are shown by the top "whisker" and dots. Dots represent those who ate a lot more than normal or a lot less than normal (outliers). If more than one outlier ate the same number of hamburgers, dots are placed side by side.

    Find Skews in the Data

    The box-and-whisker plot, of course, shows you more than just four split groups. You can also see which way the data sways. For example, if a few people eat far more burgers than everyone else, the top whisker will be longer than the bottom one and the median will sit closer to the bottom of the box. Basically, it gives you a good overview of the data's distribution.
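    To make the reading rules concrete, here's a minimal Python sketch that computes what the plot draws. The post doesn't define "a lot more than normal"; this assumes Tukey's usual rule of flagging points more than 1.5 times the interquartile range beyond the quartiles. Quartile conventions also differ between programs; this uses the "inclusive" method of Python's statistics.quantiles (Python 3.8+). The burger counts are made up.

```python
import statistics

def box_plot_stats(data):
    """Compute the numbers a box-and-whisker plot draws.

    Points more than 1.5 * IQR beyond the quartiles are outliers (dots);
    whiskers stop at the most extreme values that are not outliers.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }

burgers = [0, 1, 1, 1, 2, 2, 2, 3, 4, 5, 7, 15]  # hypothetical weekly counts
s = box_plot_stats(burgers)
print(s)
# Compare how far each whisker reaches beyond the box.
print(s["whisker_high"] - s["q3"], s["q1"] - s["whisker_low"])
```

    With these made-up numbers, the top whisker reaches further above the box than the bottom whisker reaches below it, the median sits closer to the bottom of the box, and the value 15 shows up as an outlier dot: the right skew described above.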

    That's all there is to it, so the next time you're thinking of making a bar graph or a histogram, think about using Tukey's beloved box-and-whisker plot too.

  • New Hampshire Graphic from The Times

    January 11, 2008  |  Statistical Visualization

    This graphic is from The New York Times graphics department. It matches the FlowingData colors. That is all. Oh, and it's excellent, but that's a given, right? Note the use of each bar's two dimensions.

  • One Day in the Life of the Average American

    December 17, 2007  |  Statistical Visualization

    Time Magazine's multimedia section has a fun little piece showing some statistics for a day in the life of the average American. There's some mapping for average commute time, annual traffic delays, and city population shifts. I'm not a huge fan of the map in the third dimension, but oh well.

    There's also a simple grid ranking jobs by level of happiness. Priests are apparently the happiest, with gas station attendants at the very bottom. Poor gas station attendants. I guess I might classify myself as a computer programmer, which is somewhere in between waiters and dressmakers. Maybe I should consider a change in focus. Although, I could also consider myself an engineer, which is towards the top of the rankings. Alright, I'm an engineer. The title of "computer programmer" has a weird stigma attached to it anyways.

  • A Magazine Dedicated Entirely to Visualizing Something Useful

    October 19, 2007  |  Statistical Visualization

    GOOD Magazine is "media for people who give a damn."

    While so much of today's media is taking up our space, dumbing us down, and impeding our productivity, GOOD exists to add value. Through a print magazine, feature and documentary films, original multimedia content and local events, GOOD is providing a platform for the ideas, people, and businesses that are driving change in the world.

    My favorite part of the magazine is the transparency section, which is a series of graphics displaying data in one way or another. The graphic (or video, I guess) above shows what companies are paying to advertise in New York City. The Walmart graphic I talked about earlier is in the most recent GOOD.

    What if...

    What if instead of just a section, there was an entire magazine that was a transparency section? Now that would be awesome. It could be a mix of the media & design in GOOD with some real statistical graphics. It would be a complete visual experience with, of course, a short blurb on each, but the magazine would focus on the graphics to inspire change and improve awareness. (Picture good. Words.... baaaad.)

    Each issue would hover around a specific theme like the environment or economics; or even better, each issue could be more specific covering U.S. pollution or the decline of toy sales. I wonder how hard it would be to start something like that. Online first, print second? Is there a magazine already like this? If there isn't, there needs to be.

  • Presidential Nomination Polls With Smoothers

    October 11, 2007  |  Statistical Visualization

    It almost feels like I see a new poll every day for who's leading in the presidential race. There's usually a good amount of fluctuation within a single poll because of sampling error, and then, of course, the numbers vary across multiple polls. This can be confusing at times, so Pollster put all the results in one scatter plot. Then they stuck a smoother through all the points for each candidate, and just like that, the viewer gets a general sense of how each candidate has been doing.

    Keep in mind that the amount of noise (or bumps in the curve) will vary depending on the estimation method you use, so I wouldn't place the smaller bumps under too much scrutiny. I'm not sure what method Pollster is using, but it's interesting to see the overall trends. Could we be looking at a double New Yorker election?

    Pollster also offers the raw poll data, so in case you want to have some of your own fun, there's data waiting for you.
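    If you do grab the data, a kernel smoother is one simple way to draw such curves and to see how the bandwidth controls the bumps. A minimal Python sketch; Pollster's own method isn't stated, so this is just one illustrative choice, and the poll numbers are made up.

```python
import math

def kernel_smooth(xs, ys, grid, bandwidth):
    """Gaussian-kernel weighted average of ys, evaluated at each grid point.

    A small bandwidth follows individual polls closely (more bumps);
    a large one averages over more polls (a smoother curve).
    """
    out = []
    for g in grid:
        weights = [math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        # Guard against every weight underflowing to zero far from the data.
        out.append(sum(w * y for w, y in zip(weights, ys)) / total if total > 0 else float("nan"))
    return out

days = [0, 3, 5, 9, 12, 15]          # hypothetical poll dates
support = [30, 34, 28, 33, 36, 31]   # hypothetical support, in percent
grid = list(range(0, 16, 5))
print(kernel_smooth(days, support, grid, bandwidth=1.0))
print(kernel_smooth(days, support, grid, bandwidth=10.0))
```

    The bandwidth-1 curve chases individual polls, while the bandwidth-10 curve flattens toward the overall average.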

    [via Mike Love]

  • iPod Design and Apple Stock Over the Years

    September 9, 2007  |  Statistical Visualization

    The Wall Street Journal put up a nice little graphic showing the evolution of the iPod along with Apple's stock price. Semi-informative, I guess. Probably more of a fun graphic than anything else. I think it's slightly misleading, suggesting the iPod was the only reason Apple's stock changed. Let's not forget about the iBook, iMac, MacBook, etc. releases. Nevertheless, it's cool to see Apple's sexy design over the years.

    [link via Core77]

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.