I'm sure a lot of you love The Beatles. I'm not a huge fan myself, but for those who are, you will love these graphics from designer Michael Deal.
Today is Data Privacy Day 2010, apparently. Was there a Data Privacy Day 2009? I dunno. To be honest, I didn't even care all that much about data privacy a couple years ago.
But it's grown in importance as everything goes digital (and Google acquires every business under the sun).
Take a moment and think about what Google knows about you. Correspondence and contacts via email, schedule via calendar, interests via feed reader, purchases via Checkout, and most importantly, your day-to-day via search. How do you feel about a single company knowing that much about you? Don't you want to know how they use all that data and, more importantly, how they protect it?
(Thanks, Michael, for the idea)
Since you'll be trying every single drink recipe in the engineer's guide this weekend, you're most likely going to drop some food on the ground. Consult this flowchart to decide whether to eat it. Results may vary by individual.
Food on the ground, food on the ground. Looking like a fool with your food on the ground.
Seeing as the weekend is just about here, I'm sure many of you can find a use for this guide. It's drink recipes hand-drawn like schematics to some circuitry system. I like how color wasn't an option, so instead they used 42 stripe and dot patterns to differentiate ingredients.
See the full version here [pdf].
My sister sent this one along, but I couldn't find the original source. Anyone know?
In case you don't know what a heatmap is, it's basically a table that has colors in place of numbers. Colors correspond to the level of the measurement. Each column can be a different metric like above, or it can be all the same like this one. It's useful for finding highs and lows and sometimes, patterns.
On to the tutorial.
Step 0. Download R
We're going to use R for this. It's a statistical computing language and environment, and it's free. Get it for Windows, Mac, or Linux. It's a simple one-click install for Windows and Mac. I've never tried Linux.
Did you download and install R? Okay, let's move on.
Step 1. Load the data
Like all visualization, you should start with the data. No data? No visualization for you.
For this tutorial, we'll use NBA basketball statistics from last season that I downloaded from databaseBasketball. I've made it available here as a CSV file. You don't have to download it though. R can do it for you.
I'm assuming you started R already. You should see a blank window.
Initial R window when you open it. Exciting, I know.
Now we'll load the data using read.csv():
nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep=",")
We've read a CSV file from a URL and specified the field separator as a comma. The data is stored in nba. Type nba in the window, and you can see the data.
What the data looks like when you load it into R
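If you want to see what read.csv() does before pointing it at a URL, here's a minimal sketch that parses a few rows from a string instead. The players and numbers are made up for illustration. Only Name and PTS are column names that appear in this tutorial; AST is an assumed name for assists.

```r
# Made-up rows standing in for the real CSV. Name and PTS appear in the
# tutorial; AST is an assumed column name for assists.
csv_text <- "Name,PTS,AST
Player A,30.2,4.9
Player B,28.4,7.5
Player C,26.8,7.2"

# text= lets read.csv() parse a string instead of a file or URL.
# sep="," is already the default for read.csv(), so it can be left out.
toy <- read.csv(text = csv_text)

str(toy)    # column types: Name is text (a factor in R before 4.0), the rest numeric
head(toy)   # first few rows, handy when the full table is too long to print
dim(toy)    # number of rows, then number of columns
```

The same str(), head(), and dim() calls work on nba once it's loaded.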
Step 2. Sort data
The data is sorted by points per game, greatest to least. Let's make it the other way around so that it's least to greatest.
nba <- nba[order(nba$PTS),]
We could just as easily have chosen to order by assists, blocks, etc.
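Here's a small sketch of what order() is doing in that line, on a made-up data frame (the players, numbers, and the AST column name are assumptions for illustration). order() doesn't sort the data itself. It returns the row positions that would sort it, and you use those positions to rearrange the rows.

```r
# Toy data; values are invented and AST is an assumed column name.
toy <- data.frame(
  Name = c("Player A", "Player B", "Player C"),
  PTS  = c(30.2, 28.4, 26.8),
  AST  = c(4.9, 7.5, 7.2)
)

# Row positions from smallest PTS to largest.
order(toy$PTS)   # 3 2 1

# Use those positions as row indices to rearrange the data frame.
by_points  <- toy[order(toy$PTS), ]                      # least to greatest
by_assists <- toy[order(toy$AST, decreasing = TRUE), ]   # greatest to least
```

Don't forget the comma after order(...). It tells R you're picking rows and keeping all the columns.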
Step 3. Prepare data
As is, the column names match the CSV file's header. That's what we want.
But we also want to name the rows by player name instead of row number, so type this in the window:
row.names(nba) <- nba$Name
Now the rows are named by player, and we don't need the first column anymore so we'll get rid of it:
nba <- nba[,2:20]
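Picking columns by position like 2:20 works for this file, but it breaks quietly if the number of columns ever changes. As a hedged alternative, here's a sketch on made-up data (AST is an assumed column name) that drops the Name column by name instead.

```r
# Toy data; values are invented and AST is an assumed column name.
toy <- data.frame(
  Name = c("Player A", "Player B", "Player C"),
  PTS  = c(30.2, 28.4, 26.8),
  AST  = c(4.9, 7.5, 7.2)
)

# Row names have to be unique, so this only works if no name repeats.
row.names(toy) <- toy$Name

# Keep every column except Name, however many columns there are.
toy <- toy[, names(toy) != "Name"]

# Rows can now be looked up by player.
toy["Player B", "PTS"]
```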
Step 4. Prepare data, again
Are you noticing something here? It's important to note that a lot of visualization involves gathering and preparing data. Rarely do you get data exactly how you need it, so you should expect to do some data munging before the visuals. Anyways, moving on.
The data was loaded into a data frame, but it has to be a data matrix to make your heatmap. The difference between a frame and a matrix is not important for this tutorial. You just need to know how to change it.
nba_matrix <- data.matrix(nba)
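If you're curious anyway, the short version is that a data frame can hold a different type in each column, while a matrix holds one type throughout. Here's a minimal sketch on made-up data (AST is an assumed column name) that shows the conversion and a couple of quick checks.

```r
# Toy data frame with player names as row names; values are invented.
toy <- data.frame(
  PTS = c(30.2, 28.4, 26.8),
  AST = c(4.9, 7.5, 7.2),
  row.names = c("Player A", "Player B", "Player C")
)

toy_matrix <- data.matrix(toy)

class(toy)               # "data.frame"
is.matrix(toy_matrix)    # TRUE
is.numeric(toy_matrix)   # TRUE: every cell is a number now
rownames(toy_matrix)     # the player names carry over
```

One thing to watch for: data.matrix() turns text columns into integer codes instead of throwing an error, so it's worth dropping non-numeric columns first, like we did with Name.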
Step 5. Make a heatmap
It's time for the finale. In just one line of code, build the heatmap (remove the line break):
nba_heatmap <- heatmap(nba_matrix, Rowv=NA, Colv=NA, col = cm.colors(256), scale="column", margins=c(5,10))
You should get a heatmap that looks something like this:
Default cyan to purple heatmap
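The scale="column" part matters because each column is a different metric with its own range. As far as I can tell from the heatmap() help page, it centers and scales the values within each column before coloring, which is what base R's scale() does too. Here's a sketch on a made-up matrix (values invented, AST an assumed column name) so you can see the effect on its own.

```r
# Toy matrix: two metrics on very different ranges.
m <- matrix(c(30.2, 28.4, 26.8,   # PTS
              4.9,  7.5,  7.2),   # AST
            ncol = 2,
            dimnames = list(c("Player A", "Player B", "Player C"),
                            c("PTS", "AST")))

# Subtract each column's mean and divide by its standard deviation.
scaled <- scale(m)
round(colMeans(scaled), 10)   # each column is now centered on 0
apply(scaled, 2, sd)          # and has a standard deviation of 1

# Same arguments as the tutorial call, just on the toy matrix.
toy_heatmap <- heatmap(m, Rowv = NA, Colv = NA, col = cm.colors(256),
                       scale = "column", margins = c(5, 10))
```

Without the scaling, a points column in the 20s and 30s would drown out an assists column in single digits, and the colors would mostly tell you which metric has bigger numbers.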
Step 6. Color selection
Maybe you want a different color scheme. Just change the argument to col, which is cm.colors(256) in the line of code we just executed. Type ?cm.colors for help on what colors R offers. For example, you could use more heat-looking colors:
nba_heatmap <- heatmap(nba_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column", margins=c(5,10))
Changing to heat colors
For the heatmap at the beginning of this post, I used the RColorBrewer library. Really, you can choose any color scheme you want. The col argument accepts any vector of hexadecimal-coded colors.
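Here's a minimal sketch of two ways to build that vector of hex colors. The first uses colorRampPalette(), which comes with base R. The second uses RColorBrewer if you have it installed. The "Blues" palette and the two endpoint colors are assumptions for illustration, not necessarily the scheme used for the graphic in this post.

```r
# colorRampPalette() returns a function that interpolates between the
# colors you give it. The endpoint colors here are an arbitrary choice.
white_to_blue <- colorRampPalette(c("#FFFFFF", "#08306B"))
blues <- white_to_blue(256)
head(blues, 3)   # hex strings, starting at "#FFFFFF"

# RColorBrewer palettes are hex vectors too. "Blues" is an assumed pick.
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  brewer_blues <- RColorBrewer::brewer.pal(9, "Blues")
  print(brewer_blues)
}

# Toy matrix (invented values) to try the palette on.
m <- matrix(c(30.2, 28.4, 26.8, 4.9, 7.5, 7.2), ncol = 2,
            dimnames = list(c("Player A", "Player B", "Player C"),
                            c("PTS", "AST")))
heatmap(m, Rowv = NA, Colv = NA, col = blues,
        scale = "column", margins = c(5, 10))
```

Either vector can go straight into col in the heatmap call for the NBA data.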
Step 7. Clean it up - optional
If you're using the heatmap to simply see what your data looks like, you can probably stop. But if it's for a report or presentation, you'll probably want to clean it up. You can fuss around with the options in R or you can save the graphic as a PDF and then import it into your favorite illustration software.
I personally use Adobe Illustrator, but you might prefer Inkscape, the open source (free) solution. Illustrator is kind of expensive, but you can probably find an old version on the cheap. I still use CS2. Adobe's up to CS4 already.
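To save the graphic as a PDF from code, you can open a PDF device, draw the heatmap, and close the device. Here's a sketch on a made-up matrix. The file name and the 7-by-7-inch size are assumptions, so change them to whatever suits you.

```r
# Toy matrix (invented values) standing in for nba_matrix.
m <- matrix(c(30.2, 28.4, 26.8, 4.9, 7.5, 7.2), ncol = 2,
            dimnames = list(c("Player A", "Player B", "Player C"),
                            c("PTS", "AST")))

# Hypothetical output file in R's temporary directory.
out_file <- file.path(tempdir(), "toy_heatmap.pdf")

pdf(out_file, width = 7, height = 7)   # open the device; sizes are in inches
heatmap(m, Rowv = NA, Colv = NA, col = heat.colors(256),
        scale = "column", margins = c(5, 10))
dev.off()                              # close it so the file gets written

file.exists(out_file)
```

Because PDF is a vector format, the cells and labels should come through as editable shapes and text when you open the file in Illustrator or Inkscape.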
For the final basketball graphic, I used a blue color scheme from RColorBrewer and then lightened the blue shades, added a white border, changed the font, and organized the labels in Illustrator. Voila.
Updated heatmap in Illustrator with clearer labels and a blue-white color scale
Rinse and repeat to use with your own data. Have fun heatmapping.
For more on custom heat maps to visualize your data, check out the members-only tutorial.
Update: I had scheduled this post for next week, but apparently, Data.gov.uk launched today. The site isn't loading for me right now though. I guess they weren't prepared for traffic.
Data.gov, a catalog of US data, launched last year. Now it's the UK's turn. Well, not yet. But soon. Data.gov.uk is still under lock and key, but it has granted access to some developers. Ito Labs, the group behind mapping a year of OpenStreetMap edits, posted screenshots of their maps that show vehicle counts (above).
Here are some comparison maps between 2001 and 2008, by vehicle type.
I'm admittedly not very good with historical precedent, but I think we can all agree it's important to know about the work others have done before us. It makes your own work better and lets you appreciate what others do more (or less).
Thank you, sponsors. I wouldn't be able to do what I do on this blog without you. It seems like FlowingData is growing faster every month, and you guys make that possible.
Check out what these fine groups have to offer. They help you understand your data:
Tableau Software – Data exploration and visual analytics in an easy-to-use analysis tool.
InstantAtlas – Create and present compelling data reports on geographic maps.
NetCharts – Agile Performance Dashboarding™ for business users.
Xcelsius Engage – Create insightful and engaging dashboards from any data source with point-and-click ease.
Business Intelligence – Visual data analysis made easy. Try 30 days for free.
FusionCharts – Convert all your boring data to stunning charts. Download your free trial now.
Xcelsius Present – Transform spreadsheets into professional, interactive presentations.
Email me at nathan [at] flowingdata [dot] com if you'd like to sponsor FlowingData, and I'll send you the details.
In 1903, Crayola had eight colors in its standard package. Today, there are 120, along with special packs like Gem Tones and Silver Swirls. What happened? The chart above, from Weather Sealed, shows the growing color selection (and a few color retirements) in the standard package from 1903 to now.
In 2101, Crayola will hit a color peak and revert to a simpler time. The standard pack will have just two colors: black and Tickle Me Pink (#FC89AC).
[via Waxy Links]
It's funny how data is finding its way into everyday objects. There was jewelry a few months ago and coins last month. Now we've got this experiment with Christmas ornaments from Really Interesting Group (RIG). The snowman's head is sized by the number of followers on Twitter; the (rain) bars represent miles traveled per month on Dopplr; the red shows listening habits on last.fm; and finally, the blue one shows apertures you've used over the year for photos uploaded to Flickr.
Unless you live under a rock inside a cave in the remotest area in the world, you know a huge quake struck Haiti on Tuesday, and much lies in ruins. The New York Times just posted some before and after satellite images, and it's a horrible thing to see. Buildings gone. People gone.
It pains me to think about what it would be like if that happened to me or my family.
To this end, I'm donating all proceeds from World Progress Report orders, along with this month's FlowingData revenues, to UNICEF's relief efforts. The Report, after all, is an effort to relate to the rest of the world. It only seems fitting. It's not much in the grand scheme of things, I guess, but at least it's something. As they say, every little bit counts.
Again, I'm taking orders for one week - through January 21. Do some good and get something good too. I'm including How America Learns with all orders now. Buy a print now.
Or if the World Progress Report just isn't your thing, you can donate directly to UNICEF.
I mean, seriously, there are 27,000 of you + me. We can make a big difference together.
UNdata provides a catalog of 27 United Nations statistical databases and 60 million records about the past, present, and future state of the world. Topics include demographics, life expectancy, labor levels, poverty, and a lot more. What does all that data mean though? World Progress Report, the latest from FlowingPrints, offers a look into the expansive UN collection.
In whole, the report tells a story of how we live and die, and the stuff in between.
How do you compare music visually? You can break it down into data by quantifying the notes, volume, etc., and then visualize it with timescapes (above). The horizontal axis represents musical time, from the beginning to end of a piece. Large blocks show similarities to other pieces and smaller noisy chunks show more "fleeting" similarities.
Randall of xkcd has been having fun with data visualization lately. In his latest data-ish comic, Randall explores gravity wells. The height of each well is sized relative to the amount of energy (on Earth) it would take to escape that planet's gravity. The widths of the wells are scaled by planet size.
So you'd need one big arse rocket to escape Jupiter.
I know it's a comic, hand-drawn, and all stick-figurey and stuff, but Randall actually explains the concepts really well. There's good annotation, clear examples, and he's made an obscure topic easy to understand.
It's also entertaining in the Bill Nye the Science Guy (i.e. best Saturday morning show ever) sort of way.
[Thanks, Ricki and Thomas]