• Who else has been enjoying the Olympics as much as I have? I think I might have developed an unhealthy obsession with the games these past few days with the 800 kajillion hours of NBC coverage.

  • I’m sure you’ve seen Wordle by now, which puts an artistic spin on the traditional tag cloud. An application by Jonathan Feinberg, Wordle lets you put any text or RSS/Atom feed in as input and get a cloud of words sized by frequency and arranged every which way. Above is a Wordle cloud of the current FlowingData feed.

    Many Eyes recently added Feinberg’s visualization to their slew of other visualization tools.

    Wordle marks a departure from the more analytical visualizations on Many Eyes. Why bring a self-described “toy” to a site for social data analysis? People have reported finding value beyond entertainment in creating these word clouds. Teachers have used Wordles in classrooms as conversation catalysts; others have created them to express their identities, and scholars have used them to visualize the output of statistical explorations of texts.

    No doubt Many Eyes, with Martin Wattenberg and Fernanda Viégas (who know a thing or two about design) at the helm, recognizes that data visualization isn’t always about analytics and exactness. Sometimes visualization is just about getting people to think.
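
    The "words sized by frequency" idea behind a cloud like Wordle's can be sketched in a few lines. This is only an illustration, not Feinberg's implementation: the stop-word list, the point-size range, and the linear scaling are all assumptions, and the layout step (arranging words every which way) is left out entirely.

```python
import re
from collections import Counter

# Hypothetical stop-word list; Wordle's own list is not described in the post.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def word_sizes(text, min_pt=10, max_pt=72, top_n=50):
    """Map the top_n most frequent words to font sizes scaled by frequency."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]   # count of the most frequent word
    lowest = counts[-1][1]   # count of the least frequent word kept
    span = highest - lowest
    sizes = {}
    for word, n in counts:
        # Linear scaling (an assumption); all words get max_pt if counts are equal.
        scale = (n - lowest) / span if span else 1.0
        sizes[word] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


if __name__ == "__main__":
    print(word_sizes("data data data chart chart map"))
```

    A real word cloud would likely use a non-linear scale (square root or log) so that one very common word does not dwarf everything else.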

  • The FlowingData forums have been up for a few weeks now, and it’s been fun watching them slowly grow. Thanks to those who’ve joined the community and a big thank you to those who have helped the community develop by posting topics, questions, and data.

    Some Recent Topics

    Visualizing Comic Book Sales – hijinx, a comic book retailer, uses data to decide how to stock his inventory. What books sell out in the first week? Which ones sell at a slower rate, but for a longer period of time?

    The Complex Venn Diagram Problem – Venn diagrams can be useful and intuitive, but once you have a need to show more than 3 categories, things start to get messy. JulianF wonders if there are any more creative options out there.

    Do You Want My Personal Fluids (Data)? – Tim collects data about himself like no other. He offers his data for all the drinks he’s consumed over the past 6 months. I’m sure someone can have some fun with that data.

    Visualize Flight Delays. Win $1,000. – Hadley posts details for Data Expo 2009. It’s 20 years and 12 gigabytes of flights in the United States. Can you make sense of it all? Keep an eye out here on FlowingData as I have some fun with this dataset in the coming weeks.

    If you haven’t joined yet, why not join now? It’s free and ridiculously awesome.

  • The Shirt Project, by Rich Watts and Louise Ma, takes the infographics out of the newspaper and puts them onto brightly colored T-shirts. What a great idea. They put out a new shirt every couple of months and topics range from the New York steam explosion to a bit of pop culture celebrating the birthdays of Michael Jackson, Prince, and Madonna. Subscribe to T-shirts the same way you subscribe to newspapers. For the low price of $20, you can be both stylish and educational.

  • The BBC has a gorgeous documentary series that started yesterday — Britain from Above. They take a look at Britain from the skies using satellite technology and GPS data. Watch patterns emerge as taxis, ships, and planes travel back and forth and information and data pass through Britain’s national telephone network. The imagery is beautiful. I love visualization that brings data to life.

    The high-resolution videos don’t seem to be working right now, but here’s just a small sample:

  • Gas prices have been pretty crazy lately. I’m not used to paying over $45 for a tank of gas in my fuel-efficient Honda Civic. I mean, come on, what the heck?

    So naturally, we want to know, “What do the data look like for gasoline prices?” The Energy Information Administration has this data available for download. They have historic gas prices for certain states (not all, unfortunately) as well as for U.S. regions. Check out the animation showing the rise and fall… and rise… and fall and rise of U.S. gas prices from 1993 up until now. Things started going crazy in 2006.

  • Remember SimCity 2000? That was a great game. That was probably the last computer game I played for any significant length of time, and if my MacBook Pro were able to read 5-inch floppies, I’d totally pop it in and build myself a city called Yau Town.

    Put the look of SimCity 2000 together with Google Maps, and you get OnionMap. Most of the site is in Korean, but from what I gather it aims to be something of a tourist guide with a little bit of social network mixed in. That part of OnionMap is a little fuzzy, but it was worth the five minutes for the maps.

    [Thanks, Tim]

  • Google announced Insights for Search yesterday. Think Google Trends, but more useful and with more information. Type in some search terms and get the rundown on interest over time based on search volume, regional interest, and related searches. It’s geared towards advertisers using AdWords, but it can still be interesting to outsiders.

    For example, I put in a search for data + visualization + design + statistics and got the above. Apparently interest in all of those subjects (i.e. FlowingData) is on the decline and India sure loves its data. I’m packing my bags for India as we speak.

    [via TechCrunch]

  • Lee Byron, Amanda Cox and Matthew Ericson of the New York Times graphics department map Olympic medals starting from the first Games hosted by the International Olympic Committee in 1896 up to the most recent Games in Athens. It looks like someone has an affinity for the colliding ball effect. Not that that’s bad or anything.

  • Circos is a project by Martin Krzywinski that lets you upload genomic data and visualize it as a network like the one above.

    It is easy to plot, format and layer your data with Circos. A large variety of plot and feature parameters are customizable, helping you make the image that best communicates your data. You supply your data to Circos as flat files (e.g. GFF format), tell Circos what you want plotted using the configuration file, and then create the image.

    While Circos is developed in the interest of visualizing genomic data, it is general enough that you can use it with other types of data that show relationships. The New York Times debate graphic is the first thing that comes to mind. Anyone want to give Circos a spin? Post a link to your image in the comments.

    [Thanks, Max]

  • Our FlowingData community went up from 2,641 subscribers last month to about 4,100, so more than a third of you are new. Welcome (and thanks to those of you who have obviously been spreading the word :). As a new reader, you might not know where to begin, so let me show you around.

  • Christopher Nolan’s Dark Knight, starring Christian Bale and the late Heath Ledger, has been breaking records left and right. After only 10 days, the movie passed the $300 million mark – faster than any movie before it. Pirates of the Caribbean: Dead Man’s Chest was the previous record holder. Pirates did it in 16 days.

    So the next record that everyone’s wondering about is: will Dark Knight make more than $600 million to beat Titanic as the highest grossing film of all time? So far it’s been 12 days, and the movie has grossed $333,929,159. Punch your answer in the poll below.

    How much do you think Dark Knight will make (domestically)? I say it won’t do it — $525 million tops.

  • A new version of Flare, the data visualization toolkit for ActionScript (which means it runs in Flash), was released yesterday with a number of major improvements over the previous version. The toolkit was created and is maintained by the UC Berkeley Visualization Lab and was one of the first bits of ActionScript that I got my hands on. The effort-to-output ratio was pretty satisfying, so if you want to learn ActionScript for data visualization, check out Flare. The tutorial is a good place to start.

    Here are some sample applications created with Flare:

    [Thanks, Jeff]

  • Are you ready for another deconstruct/reconstruct exercise? I just posted a time series plot in the FlowingData forums that shows suicide rates and unemployment rates in Japan. Here are questions worth considering:

    • What is the graph trying to show? Does it succeed?
    • Is this the appropriate type of plot for this type of data?
    • What would make the data more clear?

    At a glance, the graph almost looks fine, but a slightly closer look reveals some clear problems.

  • Through the Internet, sharing data has — you know what, I’m not even going to try to make this relevant. A car exploded in my driveway!!!!

    It was 6am and I was lying in bed. There was a continuous honking horn that was annoying the crap out of me. I figured someone was trying to get someone else to move their car so that they could pull out, but after a minute of one long honk, there was a huge BOOOOMMMM!

    I ran to my office window, and I saw a car on fire!! I managed to get some of it on camera:

    It was quite the sight – and now my apartment smells like smoke. Luckily no one was hurt.

  • Barcodes. We all know what they look like. They’re the black stripes that vary in thickness with numbers that indicate something or another, but what is that something? Every product has a unique barcode number and when you pass it through an international key database, you get information about the product and the country of origin. Daniel Becker uses this data to create art in Barcode Plantage.

    Once a bar code is keyed or scanned in, the program sends a request to the database, which returns a master file data. This master file data is then analysed to define positions, curves and colours of Bezier curves of the tree structure.

    The number of these curves will vary correspondence to the number of figures in the code. Simultaneously, the user will hear a melody, which is based on the figures of the bar code.

    Because every barcode is unique, so is the resulting tree. Pretty.

    [via swissmiss]

  • I’ve always liked twittervision. I’m not sure what it is, but it’s strangely mesmerizing, getting a tiny peek into others’ lives. This weekend, I recreated twittervision with a little bit of style for good measure. Say hello to Twitter World.

    The Data

    Twitter World shows updates from the Twitter public timeline, and makes use of the twittervision API for location. Until I get whitelisted for the Twitter API, I’m polling Twitter and twittervision every six minutes to keep things fresh. Hopefully neither peters out.

    The Implementation

    Like my visualization showing the spread of Walmart, I used Modest Maps (+ OpenStreetMap) to map things out, and I used TweenFilterLite to animate. I had all the gears in place and everything working nicely a couple of hours in – but that was with a flat XML file. The hard part was feeding the thing live data and then making sure everything was synchronized. That took, um, too much time.

    In any case, not bad for a weekend project.

    PS. Don’t forget to follow me on Twitter :)
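
    For the curious, the poll-every-six-minutes approach from the data section above can be sketched like this. It is a rough Python illustration, not the actual Twitter World code (which was built for Flash): `fetch`, `handle`, and the `"id"` field are hypothetical stand-ins, and no real Twitter or twittervision API call is shown.

```python
import time


def poll(fetch, handle, interval_s=360, max_polls=None, sleep=time.sleep):
    """Call fetch() every interval_s seconds and pass unseen items to handle().

    fetch  -- hypothetical callable returning a list of dicts with an "id" key
    handle -- hypothetical callable invoked once per new item
    """
    seen = set()  # ids already shown, so overlapping polls do not repeat updates
    polls = 0
    while max_polls is None or polls < max_polls:
        for item in fetch():
            if item["id"] not in seen:
                seen.add(item["id"])
                handle(item)
        polls += 1
        # Wait before the next poll, but not after the final one.
        if max_polls is None or polls < max_polls:
            sleep(interval_s)


if __name__ == "__main__":
    # Canned batches stand in for live data; the second overlaps the first.
    batches = iter([[{"id": 1}, {"id": 2}], [{"id": 2}, {"id": 3}]])
    poll(lambda: next(batches), print, max_polls=2, sleep=lambda s: None)
```

    Keeping track of ids already seen is one simple way to stay synchronized when consecutive polls return overlapping updates. A long-running version would also want to cap the size of `seen`.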

  • Last week I asked if you could improve a mediocre bar chart showing party majorities by county. There was a resounding yes as many of you deconstructed and then reconstructed your own graphs. For reference, here’s the original chart:

    Here are the key flaws in the original that you all caught:

    1. The x-axis tick marks were in really weird places;
    2. The y-axis label was misleading because the data were number of counties;
    3. Blue and red would make more sense for Democrats and Republicans;
    4. Counts for counties don’t match the years, because they are reversed;
    5. We see a different story when we bring in data for undecided “other” and “declined to declare.”

    What was the graph trying to show? It was trying to show party registration in California over the past five presidential elections. Did it succeed? No. It failed miserably; however, you did much better. Here are all the reworks.

    Brijesh made a stacked chart for Democrats and Republicans:

    Tyler made a horizontal stacked bar chart with a useful majority line down the middle:

    Blair provided some R code:

    David used a tornado chart, which turned out well:

    Amos went with a stacked line chart:

    Kevin sent this one in:

    John put together a few versions – this being one of about five:

    Jorge went with simplicity:

    Stack created a time series for the Dems and Reps:

    Jake put up a fan favorite:

    Nate, the graphic designer, embedded a stacked line chart inside the California boundaries:

    This is the one I made at the workshop:

    Personally, I like Jake’s and David’s the best, but who gets the golden star for best graph? I’ll let you be the judge.

  • Lee Byron, recent Carnegie Mellon grad and newly inducted New York Times graphics intern, maps walkability in San Francisco. He scraped Walk Score for, uh, walk scores, which are scores from 0-100 based on the amenities around a location like “nearby stores, restaurants, schools, parks, etc” – in other words, how easy it is to live without a car.

    Color was calculated on a per pixel basis using bicubic interpolation. From there he let Processing do the graphical labor to construct a map overlay. The result, which is accurate to the block, is a pretty one.

    If you want data (sans map) for your own neighborhood, Lee has kindly provided the scraper.
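
    To give a feel for the per-pixel bicubic interpolation mentioned above, here is a minimal sketch. It is not Lee's code (he used Processing): it assumes the scores sit on a regular grid, uses the Catmull-Rom kernel as one common choice of bicubic, and clamps at the edges. All function names are made up for illustration.

```python
import math


def catmull_rom(p0, p1, p2, p3, t):
    """Cubic (Catmull-Rom) interpolation between p1 and p2 for t in [0, 1]."""
    return 0.5 * (
        2 * p1
        + (p2 - p0) * t
        + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
        + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3
    )


def bicubic(grid, x, y):
    """Interpolate grid (a list of rows) at fractional column x and row y."""
    rows, cols = len(grid), len(grid[0])
    x0, y0 = math.floor(x), math.floor(y)
    tx, ty = x - x0, y - y0

    def at(r, c):
        # Clamp indices so samples near the border reuse the edge values.
        return grid[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    # Interpolate along x in four neighbouring rows, then along y.
    row_vals = [
        catmull_rom(at(y0 + dr, x0 - 1), at(y0 + dr, x0),
                    at(y0 + dr, x0 + 1), at(y0 + dr, x0 + 2), tx)
        for dr in (-1, 0, 1, 2)
    ]
    return catmull_rom(*row_vals, ty)


def clamp_score(value):
    """Keep an interpolated score in 0-100; cubic kernels can overshoot."""
    return min(max(value, 0.0), 100.0)


if __name__ == "__main__":
    scores = [[10 * c for c in range(4)] for _ in range(4)]  # made-up grid
    print(clamp_score(bicubic(scores, 1.5, 1.5)))
```

    Each pixel's interpolated score would then be mapped to a color. The clamp matters because cubic kernels can overshoot the 0-100 range near sharp changes.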

  • In the FlowingData forums, Ryan asks a really good question about data design:

    What simple rules should we all follow when we present data?

    I came up with three rules of thumb a while back, but surely there are more. Context, clarity, and real data are clear winners, but what else is there? Those are really broad and can be broken down a few ways; for example, reducing the number of variables could contribute to clarity. If you have any ideas, please do post them to the forum thread.

    Ah yes, I can hear you flipping through your Tufte books.