40 years of boxplots

Dec 6, 2011

Famed statistician John Tukey created the boxplot in 1970. It summarizes a distribution in a small amount of space. Hadley Wickham and Lisa Stryjewski look back on the old standby and its evolution up to the present. Keep in mind that the boxplot, while still used today, was designed to be drawn with pencil and paper.

One of the original constraints on the boxplot was that it was designed to be computed and drawn by hand. As every statistician now has a computer on their desk, this constraint can be relaxed, allowing variations of the boxplot that are substantially more complex. These variations attempt to display more information about the distribution, maintaining the compact size of the boxplot, but bringing in the richer distributional summary of the histogram or density plot. These plots can overcome problems in the original such as the failure to display multi-modality, or the excessive number of “outliers” when n is large.

Alright, computers are useful. I guess.

[40 years of boxplots]
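
To make the "outliers when n is large" point concrete, here is a minimal sketch of the numbers behind a boxplot. It is an illustration only, not code from the paper: the function name `boxplot_stats` is made up, quartiles come from Python's `statistics.quantiles` rather than Tukey's hand-computed hinges (the two can differ slightly), and the fences use the usual 1.5 × IQR rule. For normally distributed data, roughly 0.7% of points land beyond those fences, so the count of flagged points grows in proportion to the sample size.

```python
import random
import statistics


def boxplot_stats(values):
    """Return quartiles, whisker ends, and points beyond the 1.5 * IQR fences.

    Hypothetical helper for illustration; quartiles use the "inclusive"
    method of statistics.quantiles, not Tukey's original hinges.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outside = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outside,
    }


# Draw standard normal samples of increasing size and count flagged points.
rng = random.Random(0)
for n in (100, 10_000, 100_000):
    sample = [rng.gauss(0, 1) for _ in range(n)]
    flagged = len(boxplot_stats(sample)["outliers"])
    print(f"n={n}: {flagged} points beyond the fences")
```

None of those flagged points is unusual for a normal distribution; there are simply more of them to draw as n grows, which is the problem the newer variations try to address.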


  • I’ve been using beanplots a lot lately. 99% of the graphs I draw are distributional visualizations, and beanplots are particularly good for comparing multiple pairs of distributions (e.g., diversity in two classes of sites by region).

    Thanks for sharing this article!

    • Hey Raphael!
      Could you tell me what the outer edges of beanplots imply and why they are of such varying shapes?

  • John Tukey deserves enormous credit for his energetic and enthusiastic advocacy of box plots (and, naturally, much, much else in statistical graphics, statistical science, and science, generally).

    But the claim that he invented the box plot, although passed on from course to course and text to text as an invariable meme, is at best a half-truth. Re-invention, very likely.

    Box plots were used in climatology and geography from at least 1933, usually under the dull name “dispersion diagram”. Later, Mary Ellen Spear included them in 1952 as “range bars” in a text on graphics, as this paper acknowledges. Such diagrams showed median, quartiles and extremes, and often _more_ detail about other data points than many box plots do at present. (That box plots often leave out too much is a frequent discovery.)

    The name “box plot” is, so far as I can gather, 100% Tukey, as are his rules on when to show individual data points beyond the “whiskers”.

    • Should be: Mary Eleanor Spear. Also, according to a report by John Bibby, A.L. Bowley was using box plots in his lectures about 1897.

