40 years of boxplots

Famed statistician John Tukey created the boxplot in 1970. It summarizes a distribution in a small amount of space. Hadley Wickham and Lisa Stryjewski look back on the old standby and its evolution up to the present. Keep in mind that the boxplot, while still used today, was designed to be drawn with pencil and paper.

One of the original constraints on the boxplot was that it was designed to be computed and drawn by hand. As every statistician now has a computer on their desk, this constraint can be relaxed, allowing variations of the boxplot that are substantially more complex. These variations attempt to display more information about the distribution, maintaining the compact size of the boxplot, but bringing in the richer distributional summary of the histogram or density plot. These plots can overcome problems in the original such as the failure to display multi-modality, or the excessive number of “outliers” when n is large.

Alright, computers are useful. I guess.
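To make the "excessive outliers" point concrete, here is a rough sketch of the numbers behind a Tukey-style boxplot, using only the Python standard library. It is not taken from the paper. The function names tukey_summary and count_flagged are made up for illustration, and the quartiles come from statistics.quantiles, which can differ slightly from the hand-computed hinges Tukey described. The 1.5 × IQR rule for the fences is the usual convention.

```python
import random
import statistics


def tukey_summary(values):
    """Return the pieces of a Tukey-style boxplot for a list of numbers."""
    data = sorted(values)
    # Quartiles via the standard library (assumption: not Tukey's exact hinges).
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Conventional fences at 1.5 * IQR beyond the quartiles.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


def count_flagged(n, seed=0):
    """Count points drawn as 'outliers' in a standard normal sample of size n."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return len(tukey_summary(sample)["outliers"])


if __name__ == "__main__":
    # Same distribution every time; only the sample size changes.
    for n in (100, 10_000, 100_000):
        print(n, count_flagged(n))
```

For normally distributed data, roughly 0.7% of points fall outside these fences, so the number of flagged points grows in proportion to n even though nothing unusual is going on. That is a handful of dots when working by hand, and hundreds on a large dataset.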

[40 years of boxplots]

4 Comments

  • I’ve been using beanplots a lot lately. 99% of the graphs I draw are distributional visualizations, and beanplots are particularly good for comparing multiple pairs of distributions (e.g., diversity in two classes of sites by region).

    Thanks for sharing this article!

    • Hey Raphael!
      Could you tell me what the outer edges of bean plots imply and why they have such varying shapes?

  • John Tukey deserves enormous credit for his energetic and enthusiastic advocacy of box plots (and, naturally, much, much else in statistical graphics, statistical science, and science, generally).

    But the claim that he invented the box plot, although passed on from course to course and text to text as an invariable meme, is at best a half-truth. Re-invention, very likely.

    Box plots were used in climatology and geography from at least 1933, usually under the dull name “dispersion diagram”. Later, Mary Ellen Spear included them in 1952 as “range bars” in a text on graphics, as this paper acknowledges. Such diagrams showed median, quartiles and extremes, and often _more_ detail about other data points than many box plots do at present. (That box plots often leave out too much is a frequent discovery.)

    The name “box plot” is, so far as I can gather, 100% Tukey, as are his rules on when to show individual data points beyond the “whiskers”.

    • Should be: Mary Eleanor Spear. Also, according to a report by John Bibby, A.L. Bowley was using box plots in his lectures about 1897.
