Beeswarm Plot in R, to Show Distributions

Try the more element-based approach instead of your traditional histogram or boxplot.

Sometimes it can be beneficial to visualize the individual people, places, or things in your data rather than binning them into abstract encodings. The histogram groups data together with rectangles, and the box-and-whisker plot summarizes a dataset with quartiles. Other traditional chart types, meant to visualize distributions, do the same.

The beeswarm plot on the other hand plots all of your points in a single space. It plots the data on a single axis and then offsets in the other direction to show volume or counts.

For example, let’s say you have annual incomes for 1,000 people in 2014. You could plot the data as a histogram to show the distribution of incomes. Bar height represents the number of people whose annual income fell within a given income range:

Default histogram
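If you want to try the histogram before downloading anything, here is a minimal sketch in base R. The incomes are simulated from a log-normal distribution, which is an assumption for illustration and not the survey sample used in this tutorial.

```r
# Simulated stand-in for 1,000 annual incomes (assumption: log-normal shape,
# not the actual survey sample used in this tutorial)
set.seed(2014)
incomes <- round(rlnorm(1000, meanlog = 10.5, sdlog = 0.8))

# Bin the incomes; hist() draws the chart and returns the bins and counts
h <- hist(incomes, breaks = 30,
          main = "Simulated incomes", xlab = "Annual income, dollars")

# Each bar height is a count of people, so the heights add up to the sample size
sum(h$counts)  # 1000
```

The individual people disappear into the bars here, which is the abstraction the beeswarm avoids.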

Here is the beeswarm equivalent, which represents each individual out of the 1,000-person sample with a dot. The size of each cluster of dots represents the number of people at that income:

Beeswarm default

It’s like a stripchart but tries to avoid overlapping dots.

Stripchart
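To see the overlap problem for yourself, you can compare the three layout methods of base R's stripchart() on the same data. This is a small sketch with simulated values (an assumption, not the income sample). The "stack" method is the closest base R gets to a beeswarm, but it only separates points that share exactly the same value.

```r
# Simulated values, rounded so that many points share the same position
set.seed(1)
x <- round(rnorm(200, mean = 50, sd = 10))

# Three panels, one per stripchart method
op <- par(mfrow = c(3, 1), mar = c(3, 4, 2, 1))
stripchart(x, method = "overplot", pch = 19, main = "overplot")            # ties hide each other
stripchart(x, method = "jitter", jitter = 0.2, pch = 19, main = "jitter")  # random vertical noise
stripchart(x, method = "stack", offset = 0.5, pch = 19, main = "stack")    # ties pile up
par(op)  # restore the previous layout settings
```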

In this quick R tutorial, you go over the beeswarm, making use of Aron Charles Eklund’s R package. It’s called — wait for it — beeswarm.

Setup

If you don’t have R installed yet, go ahead and do that first. It’s free, runs on the major platforms, and is typically a straightforward installation.

Assuming you have R installed, open the console and install the beeswarm package like so:

install.packages("beeswarm")

Then load it:

library("beeswarm")

That’s all you need. The rest relies on base R.

Default Beeswarm

On to the chart-making. Well, after you load your data with read.csv(). After all, no data means no chart. The data is included in the data folder in the tutorial download. Be sure to set your working directory in R to the tutorial’s folder on your computer.

# Load data
workers <- read.csv("data/income-sample-2014.tsv", sep="\t", stringsAsFactors=FALSE)
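Before plotting, it can help to confirm the file loaded the way you expect. The sketch below runs the same kind of checks on a tiny made-up TSV written to a temporary file, so it works without the download. The three rows and their values are hypothetical; only the column names INCTOT and main_occ come from this tutorial.

```r
# Hypothetical miniature of the tutorial's TSV, written to a temp file
tsv <- tempfile(fileext = ".tsv")
writeLines(c("INCTOT\tmain_occ",
             "52000\t1",
             "18000\t2",
             "73000\t1"), tsv)

# Same call pattern as above: tab separator, strings left as strings
demo_workers <- read.csv(tsv, sep = "\t", stringsAsFactors = FALSE)

str(demo_workers)              # column names and types
summary(demo_workers$INCTOT)   # range and quartiles of income
table(demo_workers$main_occ)   # number of rows per occupation category
```

With the real file, swap demo_workers for workers and run the same three checks.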

The data is a semi-random sample of 1,000 responses from the 2014 American Community Survey about total annual income. There are multiple variables, but INCTOT is the one of interest. With Eklund’s package, it’s straightforward to make a beeswarm chart with the beeswarm() function.

# Beeswarm
beeswarm(workers$INCTOT)

Here’s what you get (same chart as above):

Beeswarm default

It clutters pretty quick since it’s 1,000 points in one space, but you can categorize the incomes (INCTOT) by occupation (main_occ).

beeswarm(INCTOT ~ main_occ, data=workers, method="swarm")

Notice the tilde (~) notation and the “swarm” method specification. The beeswarm() function also provides three other methods, which I’ll let you poke around with via the documentation.

You can always view the documentation by typing a question mark followed by the function name. So in the console, enter ?beeswarm.

Beeswarm categories
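If the formula notation is new to you, it reads as "values, split by group." Here is a rough sketch on simulated data (three made-up groups, not the survey sample) that shows the split and then draws one panel per layout method. The method names "swarm", "center", "hex", and "square" are the four I am assuming from the package documentation; newer releases may offer more, so check ?beeswarm for your version.

```r
# Simulated stand-in data: three hypothetical groups, not the survey sample
set.seed(42)
demo_data <- data.frame(
  value = c(rnorm(60, 10, 2), rnorm(60, 14, 3), rnorm(60, 8, 1)),
  group = rep(c("a", "b", "c"), each = 60)
)

# value ~ group means "value, split by group"; split() shows the same grouping
groups <- split(demo_data$value, demo_data$group)
sapply(groups, length)

# One panel per layout method (skipped if the beeswarm package is not installed)
methods <- c("swarm", "center", "hex", "square")
if (requireNamespace("beeswarm", quietly = TRUE)) {
  op <- par(mfrow = c(2, 2))
  for (m in methods) {
    beeswarm::beeswarm(value ~ group, data = demo_data, method = m,
                       pch = 19, cex = 0.6, main = m)
  }
  par(op)  # restore the previous layout settings
}
```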

It’s still kind of cluttered, eh? Let’s mess with more of the options to make this thing more readable.

Beeswarm Options

At this point, you can modify the beeswarm chart like you would any chart from base graphics. You can change colors (col), symbols (pch), and symbol size (cex) like so:

# Beeswarm options
beeswarm(INCTOT ~ main_occ, data=workers, col=sample(colors(), 27), pch=19, method="swarm", cex=0.5)

That’s a bit better.

If you’re interested in what the job categories actually are, you can find them on IPUMS.

Beeswarm options
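One thing to know about sample(colors(), 27): it picks different colors every time you run it, and some of them are near-white or gray and hard to see. Here is a small sketch of two ways around that, using only base R. The seed value and the gray/white filter are my own choices for illustration, not part of the original code.

```r
n_categories <- 27  # same count as in the beeswarm() call above

# Option 1: fix the seed so the random palette is the same on every run,
# and drop gray and white shades before sampling
set.seed(27)
visible <- grep("gr[ae]y|white", colors(), invert = TRUE, value = TRUE)
random_cols <- sample(visible, n_categories)

# Option 2: skip randomness and use evenly spaced hues
even_cols <- rainbow(n_categories)

head(random_cols)
head(even_cols)
```

Either vector can be passed as the col argument in place of sample(colors(), 27).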

But of course you can do more. MOAR! Let’s make it horizontal, change the axis labels, add a title, change the category labels, and remove the box. Because we can.

# More parameters
par(las=1)
beeswarm(INCTOT ~ main_occ, data=workers, col=sample(colors(), 27), pch=19, method="swarm", cex=0.5, horizontal=TRUE, xlab="Annual Income, Dollars", ylab="Occupation Category", main="Distribution of Income, by Occupation Category", labels=c(LETTERS, "AA"), bty="n")

The result:

Beeswarm more options
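A note on par(las=1): it changes the axis label orientation for every chart you draw afterward in the same session, not just this one. If you would rather keep the change local, a common base R pattern is to save the old settings and restore them when you are done. The plot() call below is just a placeholder chart for illustration.

```r
before <- par("las")   # current label orientation

op <- par(las = 1)     # set horizontal labels; par() returns the old settings
plot(1:10, main = "Horizontal tick labels")  # placeholder chart
par(op)                # put the old settings back

after <- par("las")    # same as before the change
```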

And that’s it. Like I said, there are more things you can fiddle with, which you can find in the documentation. Just enter ?beeswarm in the R console. You can also find more examples on Eklund’s page.

About the Author

Nathan Yau is a statistician who works primarily with visualization. He earned his PhD in statistics from UCLA, is the author of two best-selling books — Data Points and Visualize This — and runs FlowingData. Introvert. Likes food. Likes beer. Follow him @flowingdata.
