How to visualize data with cartoonish faces à la Chernoff
The goal of Chernoff faces is to show a bunch of variables at once via facial features like lips, eyes, and nose size. Most of the time there are better solutions, but the faces can be interesting to work with.
FlowingData reader Chris asks:
I was wondering, have you ever considered doing a Chernoff faces tutorial for R? I think Chernoff faces are pretty interesting and I haven’t seen much about them on the web.
This wasn’t the first time someone’s asked how to make Chernoff faces, so I did a quick search. Guess what. There’s an R package for that. This tutorial describes how to apply Chernoff faces to your own data.
The point of Chernoff faces is to display multiple variables at once by positioning parts of the human face, such as ears, hair, eyes, and nose, based on numbers in a dataset. The assumption is that we can read people’s faces easily in real life, so we should be able to recognize small differences when they represent data. Now that’s a pretty big assumption, but debate aside, they’re fun to make.
We’ve seen them applied to baseball players and judge ratings. In this tutorial, we’ll look at US crime rate by state.[1]
[1] Because these are faces rather than abstract geometric shapes, be careful what you show with this method and who you show it to. As was the case in this tutorial, those who aren’t familiar with the method might take the faces literally and take offense.
Like in previous tutorials, we’ll be using R (surprise, surprise), the software environment for statistical computing and graphics, to make our Chernoff faces, so if you haven’t already, download and install R first before moving on. It’s free, open-source, and a one-click install. Go on, I’ll wait for you.
Step 1. Install package
Once you’ve opened up R, the first thing we need to do is install the aplpack (Another Plot Package) package by Peter Wolf. Go to the “Packages & Data” menu in R, and select the “Package Installer.” Select “CRAN (binaries)” in the dropdown menu if it’s not already on that, and then click on “Get List.” Scroll down to “aplpack,” click on the “Install Selected” button, and installation should begin.
The Another Plot Package will do most of the grunt work.
Alternatively, you can also just type this in the R console:
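Assuming aplpack is still distributed on CRAN under that name, the console route is most likely the standard install.packages() call. The requireNamespace() guard around it is an optional addition that skips the download when the package is already present.

```r
# Install aplpack from CRAN only if it isn't already installed.
# (The guard is an optional convenience, not a required step.)
if (!requireNamespace("aplpack", quietly = TRUE)) {
  install.packages("aplpack")
}
```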
Step 2. Load the data
Next we need to load the data into the R environment. Like I said, we’ll be looking at crime rates by state. I got the data from Infochimps, which is actually from Table 301 of the 2008 US Statistical Abstract, but it’s typically a headache going through dot gov navigation, so I avoid it when I can.
I cleaned the datafile I got from Infochimps a little bit more so it only includes the numbers we’re interested in. You can find it here, but you don’t need to download it. We’ll load it directly into R via the URL using the read.csv() function:
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState-formatted.csv")
To view the data, type the following:
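A minimal sketch of that step, assuming base R's head() is an acceptable way to peek at the data (indexing with crime[1:6, ] would show the same rows). The read.csv() line is repeated only so the snippet runs on its own.

```r
# Re-read the file so this snippet is self-contained
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState-formatted.csv")

head(crime)   # prints the first six rows
ncol(crime)   # number of columns
```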
This shows you the first six lines of our dataset. Note that there are eight columns. The first column is state name, with the exception of the row for US average and District of Columbia later on. The rest of the columns are seven categories of crime.
Step 3. Make some faces
Once the data is in, it's actually really easy to make some faces using the faces() function from the aplpack package. So far we've only installed the package, so now we'll load it:
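Loading a package in R is normally done with library(); this assumes the install in Step 1 went through.

```r
# Attach aplpack so that faces() is available in this session
library(aplpack)
```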
If you get errors when you try to load, you might want to check to see if you installed the package correctly.
Okay, let's make some faces:
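A sketch of the call, inferred from the description of columns 2 through 8. The load steps are repeated so it runs standalone.

```r
library(aplpack)
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState-formatted.csv")

# Draw one face per row, driven by the seven crime-rate columns
# (column 1 holds the state name, so it is left out)
faces(crime[, 2:8])
```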
Here we're telling R to use the faces() function, using columns 2 through 8 of our crime data. Remember, the first column is state name. You get something that looks like this:
Default Chernoff faces.
Step 4. Change features
This is pretty much what we want except for two things. The first is that the faces are labeled with numbers. That isn't of much use without a key. The second is that some of the faces are smiling. For more positive datasets like quality of life or baseball stats, that would make sense. The higher the value, the better. This is crime data though. The higher the value, the worse. Smiles for rate of larceny theft don't seem quite right.
The faces() function doesn't let us choose what face parts to associate with each metric, so we need to find a workaround. According to the documentation (view by typing ?faces), the curve of the smile is applied to the sixth column in the input matrix, which in this case is columns 2 through 8 of crime.
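For reference, here is the feature order as listed in the aplpack help page, written out as a plain character vector. Treat the exact wording as an assumption and check ?faces in your installed version; face_features is only an illustrative name, not something the package provides.

```r
# Order in which faces() assigns input columns to facial features,
# as described in the aplpack help page (verify with ?faces).
# `face_features` is an illustrative name, not part of the package.
face_features <- c(
  "height of face", "width of face", "shape of face",
  "height of mouth", "width of mouth", "curve of smile",
  "height of eyes", "width of eyes",
  "height of hair", "width of hair", "styling of hair",
  "height of nose", "width of nose",
  "width of ears", "height of ears"
)

face_features[6]   # the feature driven by the sixth input column
```

Reading the vector by position makes the workaround easier to plan: whatever sits in the sixth input column ends up controlling the smile.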
Ah. Here's what we'll do. We slot in a column where every value is the same, so that it becomes the sixth column of the matrix we pass to faces(). That way all smile curves will be neutral. Here's how we can do that:
crime_filled <- cbind(crime[,1:6], rep(0, length(crime$state)), crime[,7:8])
The cbind() function combines multiple columns side by side into one table. In the above, we combine the first six columns of crime, stick in a column of zeros whose length matches the number of rows in our crime data, and then end with the last two columns in crime. We save the result into a variable called crime_filled. As in Step 2, you can type the following to see the first rows of crime_filled:
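As in Step 2, head() is one way to do this (an assumption; crime_filled[1:6, ] would work too). The earlier lines are repeated so the snippet stands alone.

```r
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState-formatted.csv")
crime_filled <- cbind(crime[,1:6], rep(0, length(crime$state)), crime[,7:8])

head(crime_filled)   # the seventh column should be all zeros
```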
Notice the new column of zeros?
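To redraw the faces with the filler column in the smile slot, a call along these lines should work. Selecting columns 2 through 8 of crime_filled is an assumption that mirrors Step 3; it puts the zeros sixth but leaves out the last crime column, so use 2:9 instead if you want every crime category included.

```r
library(aplpack)
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState-formatted.csv")
crime_filled <- cbind(crime[,1:6], rep(0, length(crime$state)), crime[,7:8])

# Columns 2:8 put the zero filler in the sixth (smile) position
faces(crime_filled[, 2:8])
```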
We get similar faces, but with no more smiles:
Using different features to indicate variables
Step 5. Add labels
Instead of numbers, it'd be much more useful to include state names. Easy.
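A sketch of the labeled version, assuming the same column selection as in Step 4 and the labels argument of faces():

```r
library(aplpack)
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState-formatted.csv")
crime_filled <- cbind(crime[,1:6], rep(0, length(crime$state)), crime[,7:8])

# Same faces as before, labeled with state names instead of row numbers
faces(crime_filled[, 2:8], labels = crime_filled$state)
```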
It's the same as before, but we use the labels argument to label each face with the state column in crime_filled.
Add state name labels so it's not so ambiguous.
Much more useful now. We can easily associate the faces with a state. It's a little cluttered, but we can fix that up easy in Illustrator.
Step 6. Fix up in Illustrator (optional)
You can pretty much stop here if you like, but as most of you know, I like to save the image as a PDF, bring it into Adobe Illustrator, and clean things up to make it more readable. You can also try Inkscape, the open-source alternative, although I've never used it myself.
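If you go the PDF route, base R's pdf() device is one way to write the file. The file name and page size below are placeholders, not values from the original workflow, and the column selection is the same assumption as in the earlier steps.

```r
library(aplpack)
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState-formatted.csv")
crime_filled <- cbind(crime[,1:6], rep(0, length(crime$state)), crime[,7:8])

# Open a PDF device, draw the faces into it, then close the file
pdf("crime-faces.pdf", width = 10, height = 10)  # placeholder name and size
faces(crime_filled[, 2:8], labels = crime_filled$state)
dev.off()
```

Because PDF output is vector-based, the labels and face outlines stay editable as separate objects once the file is opened in a drawing program.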
After some label cleanup and some annotation, here's our final result. What's going on there Washington, D.C.?
Not too bad, right?
Read the R documentation on faces() for more details on what else you can do with the function. Remember, documentation is your friend when it comes to making full use of R.
Now go on. Have some fun with your new Chernoff toy.