• Data Visualization is Only Part of the Answer to Big Data

    March 20, 2009  |  Design, Exploratory Data Analysis

    How can we now cope with a large amount of data and still do a thorough job of analysis so that we don't miss the Nobel Prize?

    — Bill Cleveland, Getting Past the Pie Chart, SEED Magazine, 2.18.2009

    For the past year, I've been slowly drifting from my statistical roots, more interested in design and aesthetics than in whether a particular graphic works or in the more numeric tools at my disposal. I've always had more fun experimenting with a bunch of different things than really knuckling down on a particular problem. That works for a lot of things, like online musings, but you miss many of the important technical points in the process, so I've been (slowly) working my way back to the analytical side of the river.

  • John Tukey and the Beginning of Interactive Graphics

    January 1, 2008  |  Exploratory Data Analysis

    With the start of a new year, it only seems right to open with John Tukey and his work on interactive graphics. In 1972, when computers were giant and screens were green, John Tukey came up with PRIM-9, the first program to use interactive dynamic graphics to explore multivariate data. PRIM-9 allowed picturing, rotation, isolation, and masking. In other words, PRIM-9 let users see multivariate data from different angles and identify structures in a dataset that might otherwise have gone undiscovered (kind of like the more recent GGobi).

    To fully appreciate the revolutionary nature of PRIM-9 one has to view it against the backdrop of its time. When Statistics was widely taken to be synonymous with inference and hypotheses testing, PRIM-9 was a purely descriptive instrument designed for data exploration. When statistics research meant research in statistical theory, employing the tools of mathematics, the research content of PRIM-9 was in the area of computer-human interfaces, drawing on tools from computer science. When the product of statistical research was theorems published in journals, PRIM-9 was a program documented in a movie.

    John W. Tukey's Work on Interactive Graphics. The Annals of Statistics, Vol. 30 No. 6. 2002.

    Luckily, you can appreciate Tukey's work in the ASA video library. It's even more amazing when you consider where computers and technology were back then. Who knows where Statistics would be without Tukey's brilliance and creativity? I can't imagine, or maybe I just don't want to.

    Tukey was someone who truly understood data -- structure, patterns, and what to look for -- and because of that, he was able to create something amazing.

  • Netflix Prize Dataset Visualization

    December 11, 2007  |  Exploratory Data Analysis


    Most of you are probably familiar with the Netflix Prize. If you're not: Netflix has offered a one million dollar prize to whoever improves the accuracy of its movie recommendation system by 10 percent. The contest has been running for a little over a year, and there's still no grand prize winner. The dataset contains 100 million ratings.

    This visualization maps the Netflix dataset. Each dot represents a movie, and the closer two dots are, the more similar the corresponding movies are based on Netflix ratings. I'm guessing the placement of the dots was decided by some variant of multidimensional scaling.

    It's kind of fun to mouse over the clusters. In the bottom right, for example, Babylon 5, Buffy the Vampire Slayer, Alias, and Battlestar Galactica are clumped together. The giant blob in the middle, however, is pretty useless; it would probably benefit from some zoom functionality.
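    To make the multidimensional scaling guess concrete, here is a minimal sketch of classical (Torgerson) MDS with NumPy. It is not the method behind the Netflix visualization, which isn't documented here. It assumes, purely for illustration, that each movie is a vector of ratings from the same users with no missing values, and that movie similarity is plain Euclidean distance between those vectors. The `ratings` matrix and the `classical_mds` helper are hypothetical.

    ```python
    import numpy as np

    def classical_mds(dist: np.ndarray, n_components: int = 2) -> np.ndarray:
        """Embed points so that Euclidean distances approximate the entries of `dist`."""
        n = dist.shape[0]
        # Double-centre the squared distances: B = -1/2 * J D^2 J
        J = np.eye(n) - np.ones((n, n)) / n
        B = -0.5 * J @ (dist ** 2) @ J
        # eigh handles symmetric matrices and returns eigenvalues in ascending order
        eigvals, eigvecs = np.linalg.eigh(B)
        order = np.argsort(eigvals)[::-1][:n_components]
        top_vals = np.clip(eigvals[order], 0, None)  # guard against tiny negative values
        return eigvecs[:, order] * np.sqrt(top_vals)

    # Hypothetical toy data: rows are movies, columns are ratings from the same 4 users.
    ratings = np.array([
        [5, 4, 1, 1],  # movie A
        [4, 5, 1, 2],  # movie B, rated much like A
        [1, 1, 5, 4],  # movie C
        [2, 1, 4, 5],  # movie D, rated much like C
    ], dtype=float)

    # Pairwise Euclidean distances between the movies' rating vectors
    diff = ratings[:, None, :] - ratings[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    coords = classical_mds(dist)  # one (x, y) dot per movie
    print(coords.round(2))
    ```

    Movies that similar users rate similarly land close together, which is the kind of clustering visible in the bottom right of the Netflix map. A real version would have to cope with 100 million mostly missing ratings, which is where the choice of distance (or a different embedding method altogether) really matters.
    
    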

    The Need to Explore

    I'm kind of surprised that I haven't seen more Netflix visualizations like this (or better ones), because I'm pretty sure they would reveal relationships that typical analysis won't. While browsing the forum, I saw someone ask whether others had had success loading the 100-million-observation dataset into R. Silly undergrad.

    A computer scientist, designer, and statistician walk into a bar; they discuss how they would boost the Netflix recommendation system. The punchline is that they win a million dollars, but I'm not sure what happens in between.

  • Transcript Analyzer for Republican Debate

    December 4, 2007  |  Exploratory Data Analysis

    New York Times Transcript Analyzer

    The New York Times recently put up a cool data exploration tool for sifting through the transcript of the most recent Republican debate. They call it the Transcript Analyzer. It has three key features:

    1. See where candidates put in their two cents, indicated by the highlighted blue rectangles
    2. Read the actual chunk of transcript for each block
    3. Search the transcript to see when specific words and phrases were used, indicated by the smaller highlighted gray rectangles

    My favorite is the search feature, because it really lets readers dig into the transcript: a reader can find out which candidate is (or isn't) talking about a topic of interest and when in the debate it was discussed. The intuitive text scrolling is pretty awesome too. Good job, New York Times!
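    The search feature boils down to a simple idea: scan the transcript turn by turn and note who was speaking whenever a phrase appears. Below is a minimal Python sketch of that idea, not the Times' implementation. It assumes the transcript is already split into speaker turns; the `Turn` class, the helper functions, and the sample turns are all hypothetical placeholders.

    ```python
    from __future__ import annotations

    from dataclasses import dataclass

    @dataclass
    class Turn:
        speaker: str
        text: str

    def find_mentions(turns: list[Turn], phrase: str) -> list[tuple[int, str]]:
        """Return (turn index, speaker) for every turn containing `phrase`, ignoring case."""
        needle = phrase.lower()
        return [(i, t.speaker) for i, t in enumerate(turns) if needle in t.text.lower()]

    def mentions_by_speaker(turns: list[Turn], phrase: str) -> dict[str, int]:
        """Count how many turns by each speaker mention `phrase`."""
        counts: dict[str, int] = {}
        for _, speaker in find_mentions(turns, phrase):
            counts[speaker] = counts.get(speaker, 0) + 1
        return counts

    # Hypothetical placeholder transcript, not real debate text
    turns = [
        Turn("Moderator", "Let's turn to the economy."),
        Turn("Candidate A", "The economy is my first priority."),
        Turn("Candidate B", "I want to talk about immigration."),
        Turn("Candidate A", "Immigration and the economy are linked."),
    ]

    print(find_mentions(turns, "economy"))
    print(mentions_by_speaker(turns, "immigration"))
    ```

    The turn index stands in for "when in the debate"; drawing a gray mark at each index along the transcript's timeline gives roughly the view the Analyzer offers.
    
    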

    [via Jon Udell]

  • Exploring Twitter with Blocks

    September 2, 2007  |  Exploratory Data Analysis


    Twitter Blocks is now available for viewing and use in Twitter's new exploration section. The viz is built in Flash and is supposed to let you explore your neighbors as well as your neighbors' neighbors. I think the higher up a block is, the more recent it is, but it's hard to say. Other than that, I'm not really sure what I'm looking at. I thought it might be because I'm not following many people, but I viewed the blocks for the public timeline and still had trouble deciphering them. Maybe others will have better luck.

    Update: Michal posted a response to the feedback they've been getting on Twitter Blocks, and it's certainly worth reading:

    So we get this a lot: "Beautiful! But useless!". We've heard it in response to most projects we've done over the past few years (one exception has been Oakland Crimespotting, whose stock yokel response is: "no way am I moving to Oakland!").

    This kinda surprises me. I think their other projects are pretty useful and informative.

Copyright © 2007-2014 FlowingData. All rights reserved.