• Visual Website Analytics in Video Game Format

    Posted to Visualization

    For a while now, I've been interested in how we can apply interaction principles of video games to visualization and exploratory data analysis (although admittedly, gaming is still a very foreign concept to me). VisitorVille is an example of how the fun of video games can be applied to analytics. It looks a lot like the awesome classic SimCity (whose source code was recently released, by the way).

    VisitorVille applies video game principles to help you easily visualize and better understand your web site traffic statistics.

    It's easy: each building represents a web page; each bus a search engine; and each animated character a real visitor to your site.

    Just paste our tracking code into your web pages, then launch VisitorVille for Windows to analyze your stats, watch your traffic in real time, provide Live Help, track your PPC campaigns in real time -- and more.

    Using our unique Virtual VCR, you can even play back traffic from any day or time, at any speed.

    Learning From Video Games

    We certainly have a lot to learn from video games -- interaction, user engagement, graphics, and fun. Seriously, statistical visualization could stand to have a little bit o' fun tossed in. At least that's what I tell my wife when I try to convince her to buy me an Xbox 360.

    Somewhat related note -- there was an interesting talk at Journalism 3G on using video games to tell stories, which I'll be discussing some time in the near future once I get all my notes together.

    [via Water Cooler Games | Thanks, Iman]

  • Talk to the New York Times Graphics Director, Steve Duenes

    Posted to Site News

    Everyone knows that The New York Times produces great graphics. I bet you're interested in how those graphics get made. What's the process of making a graphic? What makes a good visual journalist? What's a day in the life of a New York Times graphics editor? Now you can find out.

    From February 25 (um, yesterday) until this Friday, you can talk to The New York Times graphics director, Steve Duenes. Go ahead. I know you want to.

    Looking very dashing in that picture there, Steve.

  • IBM Visual Communications Lab and Stamen Design Are at the NYC MoMA

    Posted to Visualization

    Congratulations to two of my favorite visualization / design groups - IBM Visual Communications Lab and Stamen Design - who now officially have their work featured at the Museum of Modern Art in New York. Really incredible and well deserved.

    From this past Sunday to May 12, VCL's History Flow and Thinking Machine and Stamen's Cabspotting are featured in Design and the Elastic Mind.

    Design and the Elastic Mind

    The exhibition will highlight examples of successful translation of disruptive innovation, examples based on ongoing research, as well as reflections on the future responsibilities of design. Of particular interest will be the exploration of the relationship between design and science and the approach to scale. The exhibition will include objects, projects, and concepts offered by teams of designers, scientists, and engineers from all over the world, ranging from the nanoscale to the cosmological scale. The objects range from nanodevices to vehicles, from appliances to interfaces, and from pragmatic solutions for everyday use to provocative ideas meant to influence our future choices.


  • Welcome to FlowingData, Boing Boing Readers

    Posted to Site News

    Welcome, Boing Boing readers. If you're new to FlowingData, you might want to read the about page to find out what FlowingData is all about. Essentially, I like to cover how people from different fields -- statistics, computer science, design, etc -- are using data to explore ourselves and the environment around us, mainly with data visualization.

    Oftentimes, data (or information) just gets overlooked or misinterpreted. We should work on changing that, and I think that data visualization is the way to make people see.

    Feel free to take a look at the archive or some of the more popular posts listed on the sidebar, and of course, if you like what you see, you can stay updated by subscribing to the feed.

    Thanks to Boing Boing for linking here and to Mike for making the suggestion!

  • Ebb and Flow of Box Office Receipts Over Past 20 Years

    Posted to Infographics

    This graphic from The New York Times kind of caught me off guard. I guess we're starting to gain a bit more faith in the public's ability to understand visualization (yay). The graphic was created by the usual suspects -- Matthew Bloch, Shan Carter and Amanda Cox -- and as usual, great work.

  • Weekend Minis – Online Video, Visualization Types, Poverty, Digital Life

    Posted to Miscellaneous

    Weekend Treats

    A Tale of Two Types of Visualization and Much Confusion - Depending on who you talk to, data visualization can have very different meanings.

    It's Official. People Love Online Videos. Billions Of ‘Em. - 141 million unique viewers watched 10,156,199,000 videos this past December.

    Global Poverty Maps - Explores the political economy of aid, examining the contributions made by developed country governments and their role in development.

    My Trails Network - Inventing new ways to manage your digital life.

  • Live Webcast of Journalism 3G: A Symposium on Computation + Journalism

    Posted to Site News

    By now, if everything has gone to plan, I should have gone on my short 2-hour flight and be at Georgia Tech in Atlanta listening to the welcoming address at Journalism 3G: The Future of Technology in the Field. All 230 seats were sold out, so it should be pretty interesting. If you're not at the event and would like to listen in (and watch), lucky for you the talks will be webcast live (that is, if all the tech works, which we all know never seems to go exactly as planned).

    UPDATE: Things did not go according to plan. Security took an abnormally long time, and I missed my flight by 5 minutes. My only option was to rebook for an extra $1,000 (thieves!). That flight would have gotten me into Atlanta around midnight, which just wasn't worth it. So I'm going to miss the symposium. So disappointed. At least I can still watch the webcast.

  • Rambo Kill Counts From Parts I, II, III, and IV

    Posted to Data Sources

    I don't think I've seen a single Rambo all the way through nor do I remember the premise of any of the movies, but I still found these kill counts amusing. Notice the near doubling of deaths each sequel. Yo, Adrian!!! Yeah, I know, wrong movie, but come on, is there really a difference?

    Here's a graph showing kill counts (mostly for my own entertainment):

    Rambo Kill Counts Graph

    Mr. Rambo may have gotten more violent in the latest installment, but it looks like he also grew more modest.

    [via Geekstir]

  • What Impact Does Our Country Have on Climate Change?

    Posted to Mapping

    BreathingEarth is an animated map that represents death rate data from September 2005 and birth rate data from August 2006 compiled by the World Factbook and 2002 carbon dioxide emission rates from the United Nations. The frying sound is kind of a nice touch.

    Pretty But Not Very Useful

    I think that BreathingEarth, like many maps before it, communicates an important point (in this case, CO2 emissions), but doesn't do a particularly good job of showing it. I watched BreathingEarth for a few minutes, but I didn't get much of a sense of which country had more deaths, had more births, or created more CO2 emissions. It's one of those projects where a statistician could have lent a useful hand.

    So to answer the question - What Impact Does Our Country Have on Climate Change? - I'm not sure. It is a pretty map though.

  • Join the FlowingData Facebook Group

    Posted to Site News

    I just created a FlowingData Facebook group where (I hope) readers can discuss and post interesting goodies about data visualization and statistics. Honestly, I'm only half-expecting like two people to join, but hey, it's a start. I'm a Facebook addict, so I'll be checking it regularly whether anyone joins or not. Please do join though :). I'd like to know who's reading and what fun things you all are up to.

    P.S. On a completely unrelated note, on Hadley's request, you can now subscribe to the FlowingData comment feed.

  • Is an Animated Transition From a Scatter Plot to a Bar Graph Effective?

    Statistical graphics are kind of stuck in a static funk where you create a plot in R, Excel, or whatever, and you can't really interact with it. If you want another graphic, you manually create it. Hence, Jeffrey Heer and George G. Robertson investigated the benefits of using animation in statistical graphics.

  • Putting Analysis Online With StatCrunch and Covariable [Review]

    Posted to Online Applications, Reviews

    StatCrunch and Covariable aim to put statistical analysis on the Web via a graphical user interface (GUI). The former is meant for students in an introduction to statistics course while the latter wants to be a little more; however, both have a lot in common. Here are my thoughts.

    Trying to Simplify Analysis With Toolbox

    Through undergrad and graduate school, I've always used R for analysis, so performing analyses through a GUI has always seemed a little strange to me. Although I suppose I don't really have any good reason to feel that way.

    I think the main difference between programmatic and clickety analysis is that when you're doing something programmatically, you need to know what method or tool you want to use before you actually use it.

    With a GUI, you tend to have a list of methods (e.g. ANOVA, multiple linear regression) in a menu and you just click on the one you want to use. It's kind of like a big toolbox of statistical tools that should make analysis easier (since it allows you to avoid all code), but I'm still a bit skeptical.

  • Grandma, Thank You For Giving Us Something to Smile About

    Posted to Site News

    My grandma, Jane Yau, passed away a couple of weeks ago, and I attended her funeral this past weekend. It was tough at first seeing her lying there lifeless, because the last time I saw her was about 8 months ago, healthy and smiling. I had to walk away with eyes full of tears. I wondered how in the world I was going to deliver her eulogy.

    I went up again though and just looked at her for a long time. She was peaceful, almost like she was sleeping, and I felt this calm come over me. My heartbeat slowed and the sadness left. That was the effect my grandma always had on me.

    I'll miss you, grandma. I hope I can make you proud.

  • How to Read (and Use) a Box-and-Whisker Plot

    Posted to Statistical Visualization

    The box-and-whisker plot is an exploratory graphic, created by John W. Tukey, used to show the distribution of a dataset (at a glance). Think of the type of data you might use a histogram with, and the box-and-whisker (or box plot, for short) could probably be useful.

    The box plot, although very useful, seems to get lost in areas outside of Statistics, but I'm not sure why. It could be that people don't know about it or maybe are clueless on how to interpret it. In any case, here's how you read a box plot.

    Reading a Box-and-Whisker Plot

    Let's say we ask 2,852 people (and they miraculously all respond) how many hamburgers they've consumed in the past week. We'll sort those responses from least to greatest and then graph them with our box-and-whisker.

    Take the top 50% of the group (1,426) who ate more hamburgers; they are represented by everything above the median (the white line). The box itself covers the middle 50% of responses, from the first quartile to the third. Those in the top 25% of hamburger eating (713) are shown by the top "whisker" and dots. Dots represent those who ate a lot more than normal or a lot less than normal (outliers). If more than one outlier ate the same number of hamburgers, dots are placed side by side.
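
    To make the pieces above concrete, here is a minimal sketch of the numbers a box plot draws. The responses are made up for illustration (they are not survey data), and the 1.5 x IQR fence for deciding what counts as an outlier is an assumption on my part: it is the usual Tukey convention, but plotting tools can be configured differently.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the numbers a box-and-whisker plot draws for a list of values."""
    data = sorted(values)
    # Quartiles split the sorted responses into four equal-sized groups.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box
    # Assumption: points beyond 1.5 * IQR from the box are drawn as dots.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "whisker_low": min(inside),   # bottom whisker ends here
        "q1": q1,                     # bottom edge of the box
        "median": median,             # line inside the box
        "q3": q3,                     # top edge of the box
        "whisker_high": max(inside),  # top whisker ends here
        "outliers": outliers,         # drawn as individual dots
    }


# Hypothetical answers to "how many hamburgers did you eat last week?"
hamburgers = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 14]
summary = box_plot_summary(hamburgers)
for name, value in summary.items():
    print(f"{name}: {value}")
```

    In this made-up sample, the one person who ate 14 burgers lands outside the upper fence, so they would show up as a dot rather than stretching the top whisker.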

    Find Skews in the Data

    The box-and-whisker of course shows you more than just four split groups. You can also see which way the data sways. For example, if there are more people who eat a lot of burgers than eat a few, the median is going to be higher or the top whisker could be longer than the bottom one. Basically, it gives you a good overview of the data's distribution.
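
    If you want a number for which way the data sways, one option is to compare how far each edge of the box sits from the median. The sketch below uses Bowley's quartile skewness for that; the choice of measure is my assumption rather than anything a box plot requires, and both samples are invented for illustration.

```python
from statistics import quantiles


def quartile_skew(values):
    """Bowley's quartile skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1).

    Positive means the upper half of the box is stretched (a tail toward
    big values), negative means the lower half is, and 0 means the median
    sits in the middle of the box.
    """
    q1, median, q3 = quantiles(sorted(values), n=4, method="inclusive")
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Hypothetical weekly hamburger counts, not real survey data.
balanced = [1, 2, 3, 4, 5, 6, 7, 8, 9]
long_upper_tail = [0, 1, 1, 1, 2, 4, 6, 8, 9]

print("balanced:", quartile_skew(balanced))
print("long upper tail:", quartile_skew(long_upper_tail))
```

    The measure only looks at the box, so it ignores whisker length and outliers. It is a quick check to go alongside the picture, not a replacement for looking at it.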

    That's all there is to it, so the next time you're thinking of making a bar graph or a histogram, think about using Tukey's beloved box-and-whisker plot too.

    Want to learn more about making data graphics? Become a member.

  • Mapping Manhattan’s Skyscraper Districts Through Time

    Posted to Mapping

    Manhattan Timeformations looks like a series of interactive schematics from a video game, but really it's a computer model that allows you to look at the relationships between the developments of the lower Manhattan skyline and other urban factors like farms, urban renewal, subways, and commercial zones. The visualization provides different views in the form of the traditional 2-dimensional map views as well as rotations, fly-throughs, and layers.

    It's nice to step out of that Google mashup look every once in a while.

  • Spamology From Visualizar is Available for Exploration

    Posted to Data Art

    Spamology, by Irad Lee, was one of my favorite projects at the Visualizar Workshop, and it's now available online for others to play with. I talked about Spamology a little bit when the showcase was officially opened in Madrid, but the piece wasn't online yet.

  • Headed to Computational Journalism at Georgia Tech

    Posted to Site News

    I'm headed to Journalism 3G: The Future of Technology in the Field February 22-23.

    The spreadsheet, word processor, web browser, digital audio and video, blogs–each an example of the vaunted killer software application–have all become valuable, some would say essential, tools of journalism. Now Web 2.0 has forever altered the nature of software innovation, while at the very same time the news industry undergoes historic change. Those two points taken together mean one thing: Time lags which used to buffer innovations in computation from their inevitable impacts on newsrooms are poised to disappear. Who’s ready for this? We plan to see.

    Some of the participating organizations include Digg, The New York Times, and Reuters with some really interesting-looking panels over the two days:

    • Advances in News Gathering
    • Improving Journalism Workflow: Automation & Productivity
    • Social Computing and Journalism
    • Ubiquitous Journalism
    • Participant Journalism & Journalism Participation: Authoring and Interacting in New Media
    • Sensemaking & Visualization
    • Information Mashups: Aggregation, Syndication, and Web Services
    • 21st Century Editor in Chief

    Naturally, I'm most excited about Sensemaking & Visualization. Is anyone else planning on going?

  • A Lesson in Recycling Chartjunk as Junk Art

    Posted to Miscellaneous

    This guest post is by Kaiser Fung, from Junk Charts and Data Matter. He answers my question - "What is data and why should we care about it?"

    Who's got more data? The largest retailer in the world or the largest library in the world?

    Walmart tends to over 500 terabytes of data (see here, here, etc.) while the Library of Congress, largest according to the Guinness Book of World Records, has a paltry 20 terabytes, dwarfed by comparison.

    To hear it from data warehouse vendors, data mining academics, data savvy politicians, or data fixated citizens, Walmart versus the LOC is like New World versus Old World, the future versus the past, fast versus slow, wired versus tired.

    The more things change, the more they stay the same. The flood of data has not washed away these two age-old truisms.

  • Understanding Data, Not Just the Realm of Scientists in Ivory Towers

    Posted to Miscellaneous

    This guest post is by Hadley Wickham, a Statistics PhD candidate and a part of the GGobi team. He answers my question -- "What is data and why should we care about it?"

    For me, most data comes in the form of a data frame: a rectangular set of values with observations in rows and variables in columns. Most values are continuous (e.g. real numbers) or categorical (e.g. colours, treatments, subject ids), but are sometimes more esoteric (images, sounds, intervals). Each variable contains values of only one type and may also contain missing values. Missing values are particularly important for statisticians, and are often encoded as . or NA (encoding them as special numeric values, like 99, is generally a bad idea). Most data is "messy" and cleaning it up requires you to ensure that observations are in rows and variables in columns, as well as spending plenty of time to make sure that the values actually make sense (visualisation is really useful for this!).
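
    As a rough illustration of the layout described above, here is a small sketch in plain Python, with a list of dicts standing in for a data frame. The column names and values are hypothetical, and None plays the role that NA plays in R. It also shows why a sentinel like 99 is risky: it gets counted as if it were a real measurement.

```python
from statistics import mean

# Hypothetical data frame: each dict is one observation (row),
# each key is one variable (column). Names and values are made up.
rows = [
    {"subject_id": "s1", "treatment": "A", "burgers": 2},
    {"subject_id": "s2", "treatment": "B", "burgers": 4},
    {"subject_id": "s3", "treatment": "A", "burgers": None},  # missing value
    {"subject_id": "s4", "treatment": "B", "burgers": 3},
]


def column(rows, name):
    """Pull one variable out of the rows, keeping missing values as None."""
    return [row[name] for row in rows]


def mean_observed(values):
    """Average only the values that were actually recorded."""
    return mean(v for v in values if v is not None)


burgers = column(rows, "burgers")
mean_with_none = mean_observed(burgers)

# Same column, but with the missing value encoded as the number 99.
burgers_sentinel = [99 if v is None else v for v in burgers]
mean_with_sentinel = mean(burgers_sentinel)

print("missing encoded as None:", mean_with_none)
print("missing encoded as 99:", mean_with_sentinel)
```

    With an explicit missing marker the code has to decide what to do about the gap. With the sentinel, nothing complains and the average is quietly wrong.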

    Data Helps Illuminate Patterns

    To me, caring about the message in data is the essence of science, where we perform some action on the world and record its response in our data. This isn't just the realm of scientists in ivory towers, but something that we do every day, whether it's trying to understand the impact of a new marketing campaign, figuring out which house to buy or exploring why a new cancer drug isn't working. Recording and examining the data that matters not only supports rational decision making, but also reveals the unexpected and helps illuminate underlying patterns.

  • Comparing Roger Clemens to Hall of Fame Pitchers

    Posted to Statistics

    Andrew had some comments about the graphs on Freakonomics that showed a seemingly odd "change of fortune" for Roger Clemens.

    Roger Clemens - NYT

    You can see that Clemens almost followed an opposite pattern from all other pitchers in the league. As Andrew notes though, there seems to be a lot riding on the quadratic fit and average values when we know that Clemens has been anything but ordinary throughout his long career.

    Graphing Without Smoothing

    For fun, I tried graphing the ERA data for Clemens against the ERAs for the 16 most recent Hall of Fame pitchers (that I could get data for). My thinking was that the Hall of Famers' performances might be a better indicator of what should be "normal" for great pitchers. The results are a little less compelling. However, one thing to note is that most players who played past age 40 saw an increase in ERA while Clemens had a pretty significant improvement in ERA from age 40 to 43.

    Whether this is due to performance-enhancing drugs or just a change in pitching strategy, coaching, or some other factor, I can't say. There are probably only a few people who can know for sure.

    Anyways, if anyone has a different take on the data, I'd love to hear it in the comments.