[via xkcd | Thanks, Justin]
FlowingData posts will slow down this holiday week. I'm going to be busy watching all the movies coming out this Christmas and eating a lot of food that will inevitably cause extended hours of sleep. I hope all of you get to do the same or something similar. Merry Christmas and a happy new year!
Regular posting will resume on January 1, 2009 to satisfy your data visualization needs. Try not to take it too hard, but if it's too much to handle, try the FlowingData archives. There's lots of good stuff in there.
Philip from infochimps posts the results of some heavy Twitter scraping. Data for 2.7 million users, 10 million tweets, and 58 million edges (i.e. connections between users) to satisfy your data hunger are available for download. I know a lot of you social network researchers will especially appreciate the big dataset, and best of all, Twitter gave Philip permission to release it. Yes, you could use the Twitter API, but isn't it better when someone does it for you?
Download the data here. The password is the Ramanujan taxicab number followed by the word 'kennedy' - all one word. Google is your friend, if that doesn't make sense.
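If the taxicab number doesn't ring a bell: it is the smallest number that can be written as a sum of two positive cubes in two different ways. Here is a minimal brute-force sketch in Python; the function name and the search limit of 20 are my own choices for illustration, not anything from infochimps.

```python
from collections import defaultdict

def smallest_taxicab(limit: int = 20) -> int:
    """Smallest number that is a sum of two positive cubes in two different ways.

    `limit` is an assumed upper bound on the cube roots to try; 20 is
    more than enough to find the first such number.
    """
    ways = defaultdict(list)
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):  # b >= a so each pair is counted once
            ways[a**3 + b**3].append((a, b))
    # Keep only totals reachable by at least two distinct pairs.
    return min(total for total, pairs in ways.items() if len(pairs) >= 2)

print(smallest_taxicab())  # 1729 = 1^3 + 12^3 = 9^3 + 10^3
```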
It's hard to believe that another year has come and gone, but as I looked back on the FlowingData archives, it feels like ages since I wrote up some of these posts. I give you the most popular posts of 2008:
- 17 Ways to Visualize the Twitter Universe
- Winner of the Personal Visualization Project is...
- Watching the Growth of Walmart Across America, Interactive Edition
- 21 Ways to Visualize and Explore Your Email Inbox
- 12 Cool Visualizations to Explore Books
- Showing the Obama-Clinton Divide in Decision Tree Infographic
- 10 Largest Data Breaches Since 2000 - Millions Affected
- 23 Personal Tools to Learn More About Yourself
- Watch the Rise of Gasoline Retail Prices, 1993 - 2008
- 40 Essential Tools and Resources to Visualize Data
At the beginning of this year, on January 1, 2008, FlowingData had 126 subscribers. Compare that to now... wow. Thanks again for sharing FlowingData, everyone. Thank you for the comments, the suggestions, contest entries, and forum topics. FlowingData is what it is because of its readers. Lastly, thank you to the FlowingData sponsors, who help me keep up with FlowingData's growth.
Here's to an exciting 2009.
During major events, people use their mobile phones to share their emotions: the euphoria of a football match in Spain or Romania, World Music Day in France, or Saint John's night in Poland. We want to share our excitement, so we call up our close friends and family. Urban Mobs allows us to see this activity in four major European cities - this "urban heartbeat" so to speak.
So when is someone going to do something for the United States?
I made a few updates to the FlowingData forums last night - the main one being the ability to upload attachments. You've always been able to post images, but now you don't have to link to some remote server. That means it's much easier to share visualizations with everyone now. It's also where I will host all future contests and giveaways. So go register now!
Oh, and one more thing - I added two new sections for Events and Finding a Job, so if you're looking for or want to publicize a workshop or conference; or if you're looking for work or looking for someone to work for you, that'd be the place to do it.
From the Forums
Tis the season for... infographics contests - Win $500 from GOOD Magazine or a free subscription to Chance Magazine [Thanks, Alberto].
UK Wired Article Help - johno is writing a feature for UK Wired Magazine (launching this spring) on lifetracking and wants to talk to those involved with self-tracking.
Associate in Research for Software Development - There's a data visualization position open at Duke University.
Visualization software used... - A short discussion of what people use to visualize data.
Data visualization continues to grow online and in the real world. It exists as masterful art pieces and amazingly useful analysis tools. In both cases though it brings data -- which is oftentimes cryptic -- to the masses and shows that data is more than a bucket of numbers. Data is interesting. As we collect more and more data about ourselves and our surroundings, the data and the visualization will only get more interesting. On that note, I give you FlowingData's picks for the top 5 data visualization projects of 2008. Visualizations were judged based on the use of data, aesthetics, overall effect on the visualization arena, and how well they told a story.
Honorable Mention: Wordle
Wordle, by Jonathan Feinberg, is the word cloud revamped. Wordle caught on like wildfire across the Web as people were putting in their RSS/Atom feeds, cutting and pasting snippets, and visualizing presidential speeches. It was even added to the Many Eyes visualization toolbox. It's hard to say what exactly made Wordle so popular, but I think it was a mix of randomness, aesthetics, and customization options.
5. Decision Tree: The Obama-Clinton Divide
Amanda Cox of The New York Times has a knack for creating excellent graphics. She managed to make regression trees interesting and spark some heated debate with her Obama-Clinton graphic. I would also like to note that Amanda has, yes, a statistics degree. Excuse me while I beam with pride.
4. Radiohead "House of Cards" Music Video
The Radiohead "House of Cards" music video was a bit different in that no cameras were used to "film" it. Instead, they used a rotating scanner and lasers to collect 3D data. What you see in the music video (below) is a visualization of all that data. The group behind the video also made the data freely available, which is icing on the cake. You don't have to be a Radiohead fan to appreciate that.
3. Last.fm and Movie Box Office Streamgraphs
Lee Byron was certainly on to something when he created Streamgraphs to visualize music listening history on last.fm. They are a variant of stacked graphs and an improvement on Havre et al.'s ThemeRiver in the way the baseline is chosen, layer ordering, and color choice. In February 2008, Amanda Cox (yes, again), Matthew Bloch, and Shan Carter of The New York Times, together with Lee, used a similar technique to show the ebb and flow of box office receipts for 7,500 movies over 21 years. Discussion burst out across the Web -- about the technique and what people were seeing in the data -- that I am convinced would not have come about if, instead of a Streamgraph, they had used, say, a stacked bar chart.
Read more about the Streamgraph in Lee Byron and Martin Wattenberg's paper: Stacked Graphs - Geometry & Aesthetics.
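To make the baseline idea a little more concrete, here is a rough Python sketch of two ways to pick the bottom edge of a stacked graph. The first centers the stack around zero, ThemeRiver-style. The second is the weighted "wiggle-reducing" baseline as I understand it from the Stacked Graphs paper; treat that formula as my reading rather than gospel. Layer ordering and color are left out, and the function names and numbers are made up for illustration.

```python
def symmetric_baseline(layers):
    """ThemeRiver-style baseline: center the whole stack around zero.

    `layers` is a list of series, one per layer, all the same length.
    """
    return [-0.5 * sum(column) for column in zip(*layers)]

def wiggle_baseline(layers):
    """Weighted baseline that gives lower layers more influence.

    Assumed form: g0 = -1/(n+1) * sum over layers of (n - i + 1) * f_i,
    with layers numbered 1..n from the bottom.
    """
    n = len(layers)
    return [
        # enumerate is 0-based, so (n - i) here equals (n - i + 1) for 1-based i
        -sum((n - i) * value for i, value in enumerate(column)) / (n + 1)
        for column in zip(*layers)
    ]

# Two made-up layers over three time steps.
layers = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
print(symmetric_baseline(layers))
print(wiggle_baseline(layers))
```

Each layer is then drawn on top of the baseline plus the layers beneath it, the same as in any stacked graph.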
2. I Want You to Want Me
I Want You to Want Me was commissioned by New York's Museum of Modern Art and created by Jonathan Harris and Sep Kamvar, who you probably know from past projects, We Feel Fine and Lovelines. The two are best known for the ability to tell stories with data, and it shows in IWYTWM, which explores the world of online dating. Individuals float in balloons hoping to find their match.
Here's the video, so you can more fully appreciate the work:
This blend of art, computer science, and mathematics is beautiful.
1. Britain From Above
When I first caught a glimpse of a clip from Britain from Above, I was immediately impressed, and it only left me wanting more. It was a special series on the BBC with beautiful visuals produced by 422 South. GPS traces from taxi cabs and airline flights scurried to locations; telephone communications glowed in the sky; ground lights twinkled as if the roles of sky and earth were switched; and internet traffic burst from computer to computer. With all that data on display, patterns emerged - zero air traffic in no-fly zones and taxis taking alternate routes to avoid heavy traffic.
There you have it - FlowingData's top 5 data visualizations for 2008. It's going to be interesting to see what comes out in 2009. Now it's your turn. What's on your list?
The great thing about being a graduate student is that you get to experiment. Anita Lillie, from the MIT Media Lab, demos MusicBox, her master's thesis project that visualizes and maps music collections based on songs' acoustic features. As might be expected, she uses principal components analysis to arrange songs. Each dot represents a song. If two songs sound similar, they should appear close to each other. As an example, the above dots are colored by music genre. Rap songs appear on the left in red while classical appears on the right.
As an aside, Anita's project reminds me a lot of a GGobi demo by Di Cook. She used the tuneR library in R to quantify Beatles songs and then used GGobi to do something similar to MusicBox. R and GGobi are free to use, so if you're interested in visualizing your own music library, you might want to check them out.
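For the curious, here is a bare-bones Python sketch of the principal components idea: take a few numeric features per song and squash them down to two coordinates you can plot. This is not MusicBox's code. The features (tempo, loudness, brightness) and the numbers are invented, and the power-iteration approach is just a dependency-free stand-in for what a stats package would do for you.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def power_iteration(matrix, iterations=200):
    """Approximate the dominant eigenvector of a symmetric matrix."""
    d = len(matrix)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = dot(w, w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def pca_2d(rows):
    """Project each row onto its first two (approximate) principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    # Sample covariance matrix of the features.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(2):
        v = power_iteration(cov)
        eigenvalue = dot(v, [dot(row, v) for row in cov])
        components.append(v)
        # Deflate: remove the variance already explained by v.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[dot(row, c) for c in components] for row in centered]

def fake_songs(center, count):
    """Hypothetical songs: noisy copies of one (tempo, loudness, brightness) point."""
    return [[random.gauss(c, 0.5) for c in center] for _ in range(count)]

random.seed(0)
songs = fake_songs([90.0, -5.0, 0.8], 5) + fake_songs([60.0, -20.0, 0.2], 5)
coords = pca_2d(songs)
for x, y in coords:
    print(f"{x:8.2f} {y:8.2f}")
```

Songs that "sound" alike (here, the two made-up groups) should land near each other in the two-dimensional output, which is the whole point of the layout.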
In his latest data sculptures, Andreas Nicolas Fischer places data visualization in a physical space when we're so used to seeing it on a computer monitor. Above is a piece of two layers - the bottom is gross domestic product for 2007 (made of plywood) and the top maps "the derivatives volume, alloted to the coordinates of the countries on a map." I don't know what derivatives volume is, and I probably should, but I'm too lazy to look it up (a lil' help please?).
Most of us create graphs with actual graphing software. Maybe it's Microsoft Excel. Maybe it's R. Whatever it is though it's usually specialized for analysis. What if you want to make a graphic for a publication or a presentation that's polished and fully customized? Adobe Illustrator gives you the control you need to do this. It's not graphing software. It's illustration software, but once you get the hang of things, Adobe Illustrator can be a valuable tool in your visualization arsenal.
Photo by Darwin Bell
It happened again. I told someone I study statistics. He told me that he hated statistics in college. It doesn't annoy me like it used to - I've come to expect it - but why do so many people have this beef with stat? Is it really that boring? Confusing? What is it about statistics that turns people off? So I reach out to all of you:
What is it that makes statistics so uninteresting?
I'm going to assume that the icky factor is less for FlowingData readers (obviously), but still, I implore you - tell me why statistics sucks. I must know.
I thought this was a joke when I first read it, but scientists from Japan's ATR Computational Neuroscience Laboratories have developed software that can map brain activity to an image. Subjects were shown letters from the word neuron and images were reconstructed and displayed on a computer screen.
A spokesman at ATR Computational Neuroscience Laboratories said: "It was the first time in the world that it was possible to visualise what people see directly from the brain activity.
"By applying this technology, it may become possible to record and replay subjective images that people perceive like dreams." The scientists, led by chief researcher Yukiyasu Kamitani, focused on the image recognition procedures in the retina of the human eye.
It is while looking at an object that the eye's retina is able to recognise an image, which is subsequently converted into electrical signals sent into the brain's visual cortex.
The research investigated how electrical signals are captured and reconstructed into images, according to the study, which will be published in the US journal Neuron.
I'm not sure how much brain activity from the retina has to do with activity during dreams, but it's interesting nevertheless (although I am sure - like all interesting science - it is slightly hyped by the media).
One of the more painful parts of analysis or visualization is that you have to get the data in a proper format. Real data almost never comes how you want it. Magic/Replace from DabbleDB lets you reformat data via their spreadsheet interface and a few sprinkles of magic. The solution is really quite elegant.
You copy and paste CSV or TSV from a spreadsheet and submit. You then see a column editor and a preview window. This is where the magic happens. In the column editor, you can edit a column so that it fits a certain format and Magic/Replace will show you a preview of what the others will look like. For example, say you have a column of phone numbers and they're in the (555) 555-5555 format, but what you really want is 555-555-5555. Change a single row, and voila, Magic/Replace does the rest. It really is "data cleanup for everyone" - not that the data were dirty to begin with.
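To be clear about what's being automated, here is roughly what you'd write by hand in Python for the phone number example. Magic/Replace works out the rule from your one edited row; this sketch just hard-codes it with a regular expression, so it shows the transformation, not how DabbleDB does it. The pattern and function name are my own.

```python
import re

# Assumed input format: "(555) 555-5555". Anything else is left alone.
PHONE_PATTERN = re.compile(r"^\((\d{3})\)\s*(\d{3})-(\d{4})$")

def reformat_phone(value: str) -> str:
    """Rewrite '(555) 555-5555' as '555-555-5555'; return other values unchanged."""
    match = PHONE_PATTERN.match(value.strip())
    if match is None:
        return value
    return "-".join(match.groups())

column = ["(555) 555-5555", "(212) 867-5309", "not a number"]
cleaned = [reformat_phone(v) for v in column]
print(cleaned)
```

Rows that don't fit the expected pattern pass through untouched, which is usually what you want while you're still figuring out how messy a column is.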
I ran a contest last week to improve a graph from Swivel that showed immigration to the United States. FlowingData readers sent in lots of different approaches (that took me forever to get organized for this post), and I still stand by my statement that there's always more than one way to skin a dataset.
I got an email from Harald asking, "How does the job market for DV developers work?" I find this question, or some variation of it, in my inbox every now and then, so I thought I'd give it a shot. I am after all a graduate student who will graduate eventually, so let's take a look at some of the options. I'd like to expand on the question though, and not just focus on developers. What's the job market like for anyone who wants to do data visualization for a living?
In the News
Infographics in the news have been commonplace for a while now. Maps, charts, graphs, plots, etc. are in the newspaper every day, and as news on the Web continues to expand, so do the types of interactive visualizations. In fact The New York Times has its own graphics department as well as a group dedicated to online interactives. It's only a matter of time before the other big news organizations follow suit (unless they go bankrupt first).
There are a lot of data visualization specialists who masquerade as graphic designers. As a result, there are lots of design studios that do data visualization (although they don't focus on just that alone) that do work for the Web or a slew of other things like company branding, physical installations, or simply art pieces. I can only think of a handful of design groups that are specifically known for data visualization. Either way though, most of what the studios push out is more on the artistic end of things, naturally.
Analytics is on the opposite side of the spectrum. It's all about decision-making. Businesses are starting to rack up terabytes of data per day, but aren't sure what to do with it. Basic Microsoft Excel skills will only take you so far. You'll also hear about dashboards pretty often. Think lots of graphs and lots of charts and lots of data, which takes a certain statistical expertise to manage effectively.
While the analytics groups tend to be more about application of existing visualization techniques, there are research labs that primarily think of ways to improve the existing or new representations of data. They design, experiment, analyze, and then write papers. It's like getting paid to be a graduate student, I imagine. Visualization software companies not dissimilar to FlowingData sponsors might also be bundled into this group.
I visited AT&T research labs a few months ago, and there was a small group focused on the best way to show network graphs. The IBM Visual Communications Lab does a lot with social data analysis.
This one is sort of obvious I guess. Academics is similar to working in a research lab, and really, a lot of academic groups call themselves a research lab anyways. Often you'll see collaboration between the two. The only difference is, uh, professors have to put up with graduate students like me. Tough nookies.
A lot of businesses aren't looking for a full-time visualization person. They just need some help with things here and there. There are also a lot of online developments that can benefit from having some visualization. Some have already got developers, but want some aesthetics, while others might have a specific data set that they want realized - might be just for show or actually something quantitative. There's certainly a wide variety out there.
What About You?
That covers a good bit, but I'm almost certain that I've missed something. If your expertise is data visualization, what do you do for money? I, among many others, would be interested to know in the comments.
The Washington Post recently put up TimeSpace: World, which is an interactive map that shows articles, video, photos, and commentary as they happen around the world (through the Washington Post's eyes). Similar to Trulia Snapshot, by Stamen Design, news items are arranged with a force-directed graph and can be filtered by time with a timeline at the bottom. Adjust the time range to find news stories from a given time of day. You get a breakdown of number of images, articles, etc. Photos seem to dominate. Here is the embedded version (which seems a little buggy):
One thing that I really liked about Trulia Snapshot, which isn't included as a part of TimeSpace: World, is a play button. It'd be like watching the news unfold over time - or even better, make TimeSpace self-updating. Maybe in the next iteration.
FlowingData will be transferred to a bigger and better server tonight between 10pm and 2am, during which the site will be down for about 30 minutes. Hopefully everything goes as planned, but in case you don't hear from me tomorrow, you'll know why. Keep your fingers crossed.
UUorld (pronounced "world") is a 4-dimensional mapping tool that lets you explore geographic data - the fourth dimension being time. The interface will remind you a bit of Google Earth with the map, pan, zoom, etc. However, UUorld isn't trying to replace Google Earth. In fact, it'll probably be better if you use it with Google Earth. Think of it as another tool to add to your box of mapping toys.
UUorld's focus is on finding trends over space and time. Load your own data or import data from UUorld's data portal, and then play it out over time. Spatial boundaries undulate up and down as land masses look a bit like skyscrapers. Color and boundary lines are customizable. When you're satisfied with the results, record it as video or export as KML, and then import into Google Earth or whatever else you want.
How effective is this method of visualization though? There's the usual argument of area perception, but do color-coding and the vertical dimension make up for that? Discuss amongst yourselves.