• It’s been a little over six months since I put up my first FlowingData post about creating effective visualization. Going through the archive, I’m amazed by how much this blog has developed and more importantly, by the people I’ve found who have many of the same academic interests that I do. For that, I’m extremely grateful.

    I’m also pretty impressed with how consistent I’ve been with the posts, because to be honest, I wasn’t sure if I’d be able to keep it up when I first started. Had I known about all of the interesting data visualization work and research going on, I wouldn’t have had such sour thoughts. Now I know better, and I hope others are benefiting.

    So here we are — the top 10 most viewed posts for 2007:

    1. Three Designers, a Statistician, and Migration Inflows Data
    2. What is the Best Way to Learn Flash/Actionscript for Data Visualization?
    3. News Flowing Through Moveable Type at The New York Times Building
    4. Visualizar Showcase Officially Opened at Medialab
    5. Yahoo Charts Control Library Now Available
    6. Sharing Personal Data to Push Social Data Analysis
    7. Netflix Prize Dataset Visualization
    8. 100 Reasons You Should Be Interested in, Want to Share, and Get Excited About Data
    9. Bars as an Alternative to Bubble Charts
    10. Use Flare Visualization Toolkit to Build Interactive Viz for the Web

    Happy new year! See you in 2008.

  • Zachary Pousman et al. write in their paper Casual Information Visualization: Depictions of Data in Everyday Life

    Information visualization has often focused on providing deep insight for expert user populations and on techniques for amplifying cognition through complicated interactive visual models. This paper proposes a new subdomain for infovis research that complements the focus on analytic tasks and expert use. Instead of work-related and analytically driven infovis, we propose Casual Information Visualization (or Casual Infovis) as a complement to more traditional infovis domains. Traditional infovis systems, techniques, and methods do not easily lend themselves to the broad range of user populations, from expert to novices, or from work tasks to more everyday situations.

  • I stumbled across this article about Aili Malm, a GIS specialist (I think) who uses social network analysis to find the most probable locations of organized crime.

    “I look at where organized crime groups are located and I study how these groups are linked to one another,” she explained. “I can chart their cell phone use or e-mail communication or with whom they co-offend. Based on these connections, I try to isolate the important players. Then I take the social and make it spatial. I look at individuals important to the criminal network and map where they live and where they commit their crimes.”

    It’s just like that show Numb3rs on CBS. Granted, the math and statistics are a bit glorified on the show, but hey, at least it’s loosely based on reality.
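
    The article doesn’t say which measures Malm actually uses to isolate the important players, so take this as a minimal sketch of one common starting point in social network analysis: degree centrality, which simply counts how many distinct people each person is linked to. The names and links below are made up for illustration.

```python
from collections import defaultdict


def degree_centrality(links):
    """Return each person's share of possible connections (0.0 to 1.0).

    `links` is an iterable of (person_a, person_b) pairs, e.g. two people
    who called each other or co-offended. Repeated pairs count once.
    """
    neighbors = defaultdict(set)
    for a, b in links:
        if a == b:
            continue  # ignore self-links
        neighbors[a].add(b)
        neighbors[b].add(a)

    n = len(neighbors)
    if n < 2:
        return {}
    # Divide by the maximum possible number of neighbors (everyone else).
    return {person: len(others) / (n - 1) for person, others in neighbors.items()}


# Hypothetical links between four made-up people.
links = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
scores = degree_centrality(links)
most_connected = max(scores, key=scores.get)
```

    A real analysis would likely look at more than raw link counts (who bridges otherwise separate groups, for example), but ranking by degree is an easy first pass before mapping where the top-ranked people live and offend.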

  • It’s a Wonderful Life

    Merry Christmas Bedford Falls! Merry Christmas you old Savings and Loan! Merry Christmas Mr. Potter! Merry Christmas! Gosh, I love that movie. I watch it every year, and it never gets old. That scene where he comes home so happy to be alive, his children are hanging off of him, and he’s embracing his wife… wonderful.

    On that note, posts here on FlowingData will be sparse through January 1 as I buckle down and focus on relaxing and having fun. I can’t wait to see what Santa brings me. I am going to make sure I leave him extra cookies and a big glass of milk. I suggest you do the same. Santa wasn’t so nice last year. He gave me a pair of used socks, a half-eaten candy cane, and a note that asked, “Where are my cookies and milk?” I am sorry Santa. It will never happen again.

    Merry Christmas and have a happy new year!

  • Baseball (or all sports for that matter) statistics are all over the place. You can easily find data for pretty much whatever sport and for whichever player you want at any given time. The problem is that if you want to download all of the data at once, you usually have to write a script and do some parsing. Who wants to do that? I don’t.

  • Jonathan Corum and Farhana Hossain created a network visualization that shows readers who has spoken about who in presidential debates. Scroll over each candidate name to isolate the connections; important/interesting points are highlighted. Candidates are colored blue and red for their respective political parties.

    The graphic shows three main things: who has spoken about who (lines), who has been talking the most (circle segments), and attention by party (red and blue). In usual fashion, The New York Times churns out another beautiful graphic. Not only is the visualization attractive, but unlike so many network diagrams before it, this graphic is also useful and informative.

  • Noah Kalina took a picture of himself every day for six years (and still going); above are all of the pictures put together into a time lapse. Now that’s diligence.

    When I was collecting my own step data with a pedometer, I would constantly forget, and eventually, I just got bored with it. I think my interest faded because collecting one number per day wasn’t satisfying enough. This, on the other hand, seems more personal and takes a little less effort, since it only takes a second to take a picture. And like they say, a picture is worth a thousand words. String them together and you get a story.

  • YouTube (or should I say Google) released their visualization for related videos. It’s essentially a ball and stick graph without the sticks. The above is a screenshot of the videos related to Marty McFly playing Johnny B. Goode in Back to the Future, the greatest movie of all time.

    Some of the video bubbles that circle around the Marty clip are the same as those in the “Related Videos” section of the usual page while others are not. Place the cursor over a bubble for about two seconds, and related videos for the one you have your mouse over will bubble up.

    I’m not sure if the distance between the bubbles has anything to do with similarity level. So far it seems not, because I’ve refreshed the Marty visualization a few times and the bubbles’ initial positions have always been different.

  • Jobs by Happiness from Time Magazine

    Time Magazine’s multimedia section has a fun, little piece showing some statistics for a day in the life of the average American. There’s some mapping for average commute time, annual traffic delays, and city population shifts. I’m not a huge fan of the map on the third dimension, but oh well.

    There’s also a simple grid ranking jobs by level of happiness. Priests are apparently the happiest with gas station attendants at the very bottom. Poor gas station attendants. I guess I might classify myself as a computer programmer which is somewhere in between waiters and dress makers. Maybe I should consider a change in focus. Although, I could also consider myself an engineer which is towards the top of the rankings. Alright, I’m an engineer. The title of “computer programmer” has a weird stigma attached to it anyways.

  • United Nations and Migration Information

    For our humanflows visualization, we used data from the United Nations Common Database and the Migration Information Source. The great thing about these types of sources is that they are publicly available so that everyone gets to have fun with the data. The downside is that the data is accessible via a user interface that often makes it a chore to get all of the data.

    Hence, to save you some time, you can now download the migration database that we used. I don’t see any reason why you have to go through the whole data importing process when we already did it. Enjoy!

    Disclaimer: Keep in mind that the data is from the United Nations and Migration Information Source, so you should refer to the two sites for any documentation. In a nutshell, the inflows table is from MIS and the rest is from United Nations. If you’re looking for more, you might also want to check out OECD. I really wanted to use their data at the time, but was having trouble accessing it from Spain.

  • You used to only be able to get a small thumbnail to “share” the visualization you found or created on Many Eyes (well, outside of taking screenshots and emailing), but Many Eyes just announced the embed feature. In the same way you can embed YouTube videos, you can embed Many Eyes visualizations. This is a really big step forward, because users can share what they’ve found or seen more easily and as a result, it’s more likely others will become drawn in. You know, it’s that whole viral marketing thing.

    Just one weird thing. I had to change the single quotes in the copy and paste snippet to double quotes for the embedding to work, because my version (or all versions?) of WordPress escapes the single quotes.

  • Studies on names and performance seem to be all the rage right now:

    We like our names. And that preference can have negative repercussions, according to research published last month. Major leaguers with “K” initials tend to strike out more, perhaps reflecting the batters’ unconscious pull to appear next to the strikeout symbol “K” on scorecards. Students with initials C and D have worse grades than the A’s and B’s and everyone else, gravitating toward the grades their initials represent.

    Of course, I’m a little skeptical about all of these studies, and with tiny effects like 0.02, these studies probably deserve it. In any case, they’re still interesting to read about. I wonder how one could get his hands on such data. The data’s probably just an email away, but in my current half-asleep stupor, I’ll leave that for another time. I’m sure it’d be really interesting to play with though.

    Have you read Freakonomics? If you have, all of these name studies remind me of that chapter about the two brothers named Winner and Loser. If you haven’t read the book, uh, there’s a chapter on two brothers named Winner and Loser.

  • Most are familiar with the Netflix Prize. If you’re not, Netflix has offered a one million dollar prize to whoever improves the accuracy of their movie recommendations by 10 percent. It’s been going on for a little over a year with still no grand prize winner. The dataset is 100 million ratings.

    The above is a visualization of the Netflix dataset. Each dot represents a movie, and the closer two dots are, the more similar the two corresponding movies are based on Netflix ratings. I’m guessing the placement of the dots was decided by some variant of multidimensional scaling.
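
    The visualization doesn’t say how “similar” was computed, so here is a minimal sketch of one plausible choice: cosine similarity over the users who rated both movies. The user IDs and ratings are made up, and both the similarity measure and the function names are assumptions for illustration.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two movies, using only shared raters.

    Each argument maps a user ID to that user's rating (1 to 5) for the movie.
    Returns 0.0 when no user rated both movies.
    """
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[u] * ratings_b[u] for u in shared)
    norm_a = sqrt(sum(ratings_a[u] ** 2 for u in shared))
    norm_b = sqrt(sum(ratings_b[u] ** 2 for u in shared))
    return dot / (norm_a * norm_b)


def dissimilarity(ratings_a, ratings_b):
    """Turn similarity into a distance-like value: 0 means identical taste."""
    return 1.0 - cosine_similarity(ratings_a, ratings_b)


# Hypothetical ratings for two of the shows in the bottom-right cluster.
battlestar = {"user1": 5, "user2": 4, "user3": 1}
babylon5 = {"user1": 5, "user2": 5, "user4": 2}
d = dissimilarity(battlestar, babylon5)
```

    Compute that for every pair of movies and you have a dissimilarity matrix, which is the kind of input multidimensional scaling takes to place dots so that screen distance roughly matches dissimilarity.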

    It’s kind of fun to scroll over the clusters. Like in the bottom right we see Babylon 5, Buffy the Vampire Slayer, Alias, and Battlestar Galactica clumped together. The giant blob in the middle, however, is pretty useless; it’d probably benefit from some zoom functionality.

    The Need to Explore

    I’m kind of surprised that I haven’t seen more Netflix visualizations like this (or ones better than this), because I’m pretty sure it would help see some relationships that typical analysis won’t provide. I was browsing the forum and saw someone ask if others had had success loading the 100 million observation dataset into R. Silly undergrad.

    A computer scientist, designer, and statistician walk into a bar; they discuss how they would boost the Netflix recommendation system. The punchline is that they win a million dollars, but I’m not sure what happens in between.

  • The Visualizar Showcase is officially open and ready for public viewing, so if you’re in Madrid (and I’m about 80% sure you will be) from now until January 5, 2008 check out the projects spawned from two weeks of hard work. You can find a complete list of the projects at the Visualizar website, but here are a few of my favorites in no particular order.

    Mail Garden

    Mail Garden Poster

    Mail Garden, from Kjen Wilkens, explores emails under a garden metaphor with the implication that our email is in some way living (like all data). In the visualization, emails exist as plants and as you scroll over you can read each email. The best part of Mail Garden though is probably when you’re not using it. When the system is idle, you can watch your plants (your emails) gently sway back and forth in the wind.

    TweetPad

    TweetPad presentation

    As if Twitter weren’t playful enough, TweetPad, by Elie Zananiri, is a visualization that lets you playfully explore the live Twitter feed. Elie’s main interest was in word interaction, and you can see that clearly in the TweetPad. Move the cursor clockwise for synonyms, back and forth to shuffle words, and counterclockwise to revert back to the original tweet, all while the Twitter feed is coming at you live.

    Spamology

    Spamology Presentation

    This visualization, as you might have guessed, explores one of the most popular canned meats in the world. No, just kidding. Spamology, by Irad Lee, explores email spam. The visualization is nice as you explore the small and giant buildings of spam, but it’s the sound accompaniment that really makes it. Sound corresponds to the height of each spam building. Usually, pieces like this end up sounding like noise, but this was more like beautiful music.

    Now before I cover every work, which I’m a little tempted to do, I’ll stop here. If you happen to be in Madrid, Spain, go check it out. If you read this blog, you’re more than likely to enjoy the projects on display at the Medialab… or you can watch it on the news. Visualizar was also featured on some news show in Madrid. Be patient. The segment on the workshop starts around ten and a half minutes in.

  • Yahoo: Look Google, I’ve got a Flash charts API now. I make it easier for people to plot their data, and look, pretty colors.

    Google: So what. Look what I’ve got. I have URL-based chart creation with fun, cartoon-ish Google colors. My API is way easier, and plus, since I’m Google, everyone will use my API and not yours.

    Y: Why are you so mean to me? We both have two O’s in our name. Can’t we be friends?

    G: No. That’s right, you heard me. I’m better. Now kiss my feet.

    Sigh, poor Yahoo. Right after Yahoo released their Flash-based charting API, Google proudly announced a super simple charting API of their own. The idea is very straightforward. It all starts with the URL http://chart.apis.google.com/chart and from there

    1. Add parameters to URL
    2. Link to URL as an image

    That’s it.

    For example, this URL

    http://chart.apis.google.com/chart?chs=520x225&chd=s:helloWorld&cht=lc&chxt=x,y&chxl=0:|Mar|Apr|May|June|July|1:||50+Kb

    gives you a line chart with the months along the x-axis and a “50 Kb” label on the y-axis.

    You have the usual options of line, bar, pie, venn, and scatter; and you can change the colors, labels, size, etc.
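
    To make the two steps concrete, here is a minimal Python sketch that assembles the example URL from a dictionary of parameters and wraps it in an image tag. The parameter names and values come straight from the example above; the helper names `build_chart_url` and `chart_img_tag` are made up for illustration. The sketch only builds strings and never contacts the chart service.

```python
import html
from urllib.parse import urlencode

BASE_URL = "http://chart.apis.google.com/chart"


def build_chart_url(params):
    """Step 1: add parameters to the base URL.

    `safe` keeps ':', ',' and '|' unescaped so the result reads like the
    hand-written example; spaces in values become '+'.
    """
    return BASE_URL + "?" + urlencode(params, safe=":,|")


def chart_img_tag(url, alt="chart"):
    """Step 2: link to the URL as an image (escaping '&' for HTML)."""
    return '<img src="{}" alt="{}">'.format(html.escape(url), html.escape(alt))


params = {
    "chs": "520x225",        # chart size in pixels, width x height
    "chd": "s:helloWorld",   # the data, in the API's simple text encoding
    "cht": "lc",             # chart type: line chart
    "chxt": "x,y",           # which axes to show
    "chxl": "0:|Mar|Apr|May|June|July|1:||50 Kb",  # axis labels
}

url = build_chart_url(params)
tag = chart_img_tag(url, alt="Line chart from the example")
```

    Keeping the parameters in a dictionary makes it easy to swap `cht` for another chart type or change the labels without hand-editing a long query string.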

    With all the charting available, could this be a sign that data is becoming more popular?

    [via Blogoscoped]

  • Yahoo User Interface 2.4.0 was recently released, and it includes the new YUI Charts Control.

    Josh Tynjala of the Yahoo! Flash Platform team contributes the new YUI Charts Control, a hybrid JavaScript/Flash component that supports bar, line, and pie charts. The Charts Control draws data from the same DataSource Utility that underpins the YUI DataTable Control, making it possible to do combined chart/table visualizations. The Charts Control accepts CSS style information, allowing you to skin the chart itself without touching the underlying .swf file. But if you do want to dig into the Flash side of this project, you can get full access to those assets on the ASTRA site.

    What does this mean? It means that we’re probably going to see a lot more hack-ish looking charts online (example above); but we might also see some nice-looking charts since it seems like they’re potentially customizable. In any case, it’s good to see this. There are some cruddy Flash-based chart libraries that people are actually charging money for. This free and open library should have some positive effects.

  • I made a few tweaks and our humanflows visualization prototypes are now online. There’s a bit of information on how humanflows came about, who was involved, and a day-by-day recap of the design process. Once you get to the prototypes section, give the applets a few seconds to load and hopefully you’re not disappointed. The interaction is pretty intuitive. All you have to do is click and hold to browse the flow lines and the map. Also, if you can, go full screen on your browser. It looks much better that way (and how it was intended to be shown).

    Again, I’d like to thank Miguel, Iman, and Monica for making my trip to Spain and the Visualizar workshop a memorable experience. Thank you!

  • The New York Times recently put up a cool data exploration tool to sift through the transcript of the most recent Republican debate. They call it the transcript analyzer. There are three key features:

    1. View where candidates put in their two cents indicated by the blue, highlighted rectangles
    2. Read the actual chunks of transcript for each block
    3. Search the transcript to see when specific words and phrases were used indicated by the smaller gray highlighted rectangles

    My particular favorite is the search feature because it really lets readers dig into the transcript: a reader can find out which candidate is (or isn’t) talking about his or her point of interest and when in the debate the topic was discussed. The intuitive text scrolling is pretty awesome too. Good job, New York Times!

    [via Jon Udell]

  • After two weeks at Visualizar, I’m back in the United States. It’s good to be back. I don’t know how many people know this (because I certainly didn’t), but the people in Madrid (or all of Spain?) eat a ridiculous number of sandwiches. I spoke to a couple of locals who said it’s pretty common to eat two sandwiches a day every day. I’m all sandwiched out.

    Anyways, the Visualizar symposium / workshop was a lot of fun, really interesting, and I ended up learning a lot more than I expected from some incredibly talented people. During my two weeks, I had the opportunity to work with designers Miguel Cabanzo, Iman Moradi, and Monica Sanchez and we managed to build a visualization framework that shows migration data with economic indicators. We call the piece humanflows.

    Human Flows, the Piece

    humanflows Poster by Miguel Cabanzo

    I just tried putting humanflows online, but of course it’s not working on my server right now (because all computers are against me), so I settled for a couple of screencasts. You’ll just have to take my word for it that the whole thing came together really nicely with a kiosk-looking type setup and a designer’s touch (three of them, actually). The visualization itself was done in Processing.

    Here’s the first one that just shows the flows. Right off the bat, you can see the huge rush to the United States (especially immigrants from Mexico).

    This one shows the flows with unemployment rate.

    We also did one with GDP, but you get the idea.

    Of course, now that we have a framework, there are so many other things that I can think of adding. Functionality like specific country selection and the ability to browse through other indicators would really allow some serious data exploration, and since we were working with data from the United Nations Common Database, which has hundreds of publicly available datasets, there’s a lot to work with.

    So there it is. Humanflows.

    Through the development process, I learned a lot about what I can do with Processing as well as gained an entirely different perspective on data visualization — a designer’s perspective. Simple concepts like color and more complex ideas like how to approach a large dataset are some of the things that I learned that I think are important for statisticians and the more technically-involved data people to know. I’ll cover that stuff in later posts though.

    For now, I’d appreciate any comments on our visualization and any ideas on how to improve it. How would you visualize migration data?

  • I feel like it’s been forever since my last post, so I just wanted to let everyone know that I am not dead.

    It’s the last few days here at Visualizar so I’ve got a couple of late nights ahead to make sure we get our project done, and on Wednesday, we set things up for the one-month exhibition. That should be fun. It’ll be especially nice to see everyone else’s work out on display.

    The most interesting part about this workshop has probably been working with and talking to designers about data visualization. I’m a statistician. Everyone else is a designer of some sort. With a statistics background and just coming off my New York Times internship, it felt really strange for the first week to go from the very literal and straightforward representation of data to the artsy, metaphorical data visualizations.

    The defining moment — when I saw a huge difference between designers and statisticians’ views on visualization — was what followed after a talk from someone from the GapMinder foundation.

    I’ll get into all of this stuff I’ve learned once I return to the lovely United States of America. In the meantime though, there was a short blurb about the Visualizar workshop on We Make Money Not Art. There’s a picture of my back. I’m famous.

    Oh, and if you’re really bored, the MediaLab has a Flickr stream. They’ve been taking tons of photos.