40 years of boxplots

Famed statistician John Tukey created the boxplot in 1970. It summarizes a distribution in a small amount of space. Hadley Wickham and Lisa Stryjewski look back on the old standby and its evolution up to the present. Keep in mind that, while it's still used today, the boxplot was designed to be drawn with pencil and paper.
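To make "distribution summary" concrete, here is a minimal Python sketch of the numbers a boxplot draws: quartiles, median, whiskers, and the points plotted individually beyond the whiskers. The function name `boxplot_stats` is hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`; Tukey's hand-computed "hinges" can differ slightly from this interpolation. The 1.5 × IQR fence is the common convention, assumed here.

```python
import statistics

def boxplot_stats(data):
    """Hypothetical helper: boxplot summary with 1.5 * IQR fences (a sketch)."""
    xs = sorted(data)
    # Quartiles by linear interpolation; Tukey's hinges may differ a little.
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers reach the most extreme observations still inside the fences.
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    # Everything beyond the fences is drawn as an individual point.
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(stats)  # median 5.5, quartiles 3.25 and 7.75, outliers [40]
```

Everything here is sorting, a few medians, and one subtraction, which is why the original boxplot was practical to compute by hand.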

One of the original constraints on the boxplot was that it was designed to be computed and drawn by hand. As every statistician now has a computer on their desk, this constraint can be relaxed, allowing variations of the boxplot that are substantially more complex. These variations attempt to display more information about the distribution, maintaining the compact size of the boxplot but bringing in the richer distributional summary of the histogram or density plot. These plots can overcome problems in the original, such as the failure to display multi-modality or the excessive number of “outliers” when n is large.

Alright, computers are useful. I guess.
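The "excessive number of outliers when n is large" point is easy to check with a quick calculation. This sketch assumes normally distributed data and the usual 1.5 × IQR fences; `normal_outlier_rate` is a hypothetical name. It uses `statistics.NormalDist` (Python 3.8+) to find the chance that a single observation lands beyond a fence.

```python
from statistics import NormalDist

def normal_outlier_rate(k=1.5):
    """Chance that one draw from a normal distribution falls outside
    fences at Q1 - k*IQR and Q3 + k*IQR (hypothetical helper)."""
    std_normal = NormalDist()          # mean 0, sd 1; the rate doesn't depend on either
    q3 = std_normal.inv_cdf(0.75)      # upper quartile, about 0.674 sd
    iqr = 2 * q3                       # symmetric distribution, so Q1 = -q3
    upper_fence = q3 + k * iqr         # about 2.70 sd when k = 1.5
    return 2 * (1 - std_normal.cdf(upper_fence))  # add both tails

rate = normal_outlier_rate()
for n in (10, 100, 1_000, 10_000, 100_000):
    # Expected number of points a boxplot would draw individually.
    print(f"n = {n:>7,}: about {rate * n:.1f} flagged points")
```

The rate is fixed at about 0.7%, so the count of flagged points grows linearly with n even when the data contain nothing unusual.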



  • I’ve been using beanplots a lot lately. 99% of the graphs I draw are distributional visualizations, and beanplots are particularly good for comparing multiple pairs of distributions (e.g., diversity in two classes of sites by region).

    Thanks for sharing this article!

    • Hey Raphael!
      Could you tell me what the outer edges of bean plots imply and why they are of such varying shapes.

  • John Tukey deserves enormous credit for his energetic and enthusiastic advocacy of box plots (and, naturally, much, much else in statistical graphics, statistical science, and science, generally).

    But the claim that he invented the box plot, although passed on from course to course and text to text as an invariable meme, is at best a half-truth. Re-invention, very likely.

    Box plots were used in climatology and geography from at least 1933, usually under the dull name “dispersion diagram”. Later, Mary Ellen Spear included them in 1952 as “range bars” in a text on graphics, as this paper acknowledges. Such diagrams showed median, quartiles and extremes, and often _more_ detail about other data points than many box plots do at present. (That box plots often leave out too much is a frequent discovery.)

    The name “box plot” is, so far as I can gather, 100% Tukey, as are his rules on when to show individual data points beyond the “whiskers”.

    • Should be: Mary Eleanor Spear. Also, according to a report by John Bibby, A.L. Bowley was using box plots in his lectures about 1897.

