Visualizing the Uncertainty in Data

Data is a representation of real life. It’s an abstraction, and it’s impossible to encapsulate everything in a spreadsheet, which leads to uncertainty in the numbers.

How well does a sample represent a full population? How likely is it that a dataset represents the truth? How much do you trust the numbers?

Statistics is a game where you figure out these uncertainties and make estimated judgements based on your calculations. But standard errors, confidence intervals, and likelihoods often lose their visual space in data graphics, which leads to judgements based on simplified summaries expressed as means, medians, or extremes.

That’s no good. You miss out on the interesting stuff. The important stuff. So here are some visualization options for the uncertainties in your data, each with its pros, cons, and examples.


Ranges

Let’s start with the traditional visualization approach, which at minimum shows a range or confidence interval. A point in the middle represents a mean or median, and a bar or line shows other possible values or coverage.
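
To make the numbers behind such a chart concrete, here is a minimal Python sketch. The measurements are made up, and `mean_with_interval` is a hypothetical helper that assumes a normal approximation (z = 1.96 for roughly 95 percent), which may not suit small or skewed samples.

```python
import math
import statistics

def mean_with_interval(values, z=1.96):
    """Return (low, mean, high) for an approximate interval around the mean.

    Assumption: a normal approximation, with z=1.96 for roughly 95 percent.
    """
    n = len(values)
    center = statistics.mean(values)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    std_error = statistics.stdev(values) / math.sqrt(n)
    return center - z * std_error, center, center + z * std_error

# Hypothetical measurements for two categories (made-up numbers).
groups = {
    "A": [12, 15, 11, 14, 13, 16, 12, 15],
    "B": [14, 18, 10, 20, 13, 19, 11, 17],
}

intervals = {name: mean_with_interval(vals) for name, vals in groups.items()}
for name, (low, center, high) in intervals.items():
    # The dot goes at `center`; the bar runs from `low` to `high`.
    print(f"{name}: {low:.1f} [{center:.1f}] {high:.1f}")
```

The two means differ, but the intervals overlap, which is the kind of thing a chart of means alone would hide.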


Pros

Lines or bars represent a range of values, so you can see that a mean or median represents only part of an estimate. The range is especially useful when you compare multiple estimates, because you can see overlap between categories. You don’t get this from means.


Cons

If you have a full distribution of values, a range hides most of the details in the data. Also, a lot of people don’t understand the concept of confidence intervals or what standard error bars are, so you need to explain clearly with annotation.


Examples

FiveThirtyEight often does a good job of showing uncertainty in their work. In their basketball player ratings and projections, they show range with light gray bars behind a black dot to represent possible player impact over time.

See also: the classic box-and-whisker plot, salary percentiles by industry and my attempt at animating potential values. And of course, how can one forget the jittering gauge.


Distributions

Show the spread of possible values with a histogram or a variant of it. You might see something a median would never show.
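
As a rough illustration of what a median can miss, here is a small Python sketch that bins values and prints a text histogram. The ages are made up, and `text_histogram` is a hypothetical helper, not part of any library.

```python
import statistics

def text_histogram(values, bin_width):
    """Count values per bin; return {bin_start: count} sorted by bin start."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical ages with two clusters (made-up numbers).
ages = [18, 19, 19, 20, 21, 21, 22, 34, 35, 35, 36, 36, 37, 38]

bins = text_histogram(ages, bin_width=5)
print("median:", statistics.median(ages))
for start, count in bins.items():
    # One '#' per value that falls in the bin.
    print(f"{start:>2}-{start + 4:<2} {'#' * count}")
```

With these numbers the median lands in a gap between two clusters, where there is no data at all. The histogram shows the two peaks right away.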


Pros

By showing the variation in a sample, you or a reader can make a more educated judgement about whether a sample is trustworthy. Is it oddly skewed? Are there multiple peaks? Or is it an expected bell curve?


Cons

Again, many people don’t understand distributions, so you need to explain what’s going on. Sometimes variation is just noise, or the details might make it hard to see the forest for the trees.


Examples

There’s a ton of variance in the ages at which people experience a “first” in their relationship lives, so instead of showing just average ages, I used distributions.

See also: How people spend their time visualized with parallel coordinates.

Multiple Outcomes

When it comes to projections and forecasts, it helps to look at a range of outcomes to see what might happen. Key word: might.
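
One way to generate a set of possible paths is a simple random walk, sketched below in Python. This is an assumption for illustration only: a real forecast would use an actual model, and `simulate_paths`, the starting value, and the step size are all hypothetical.

```python
import random

def simulate_paths(start, steps, n_paths, step_sd, seed=0):
    """Generate n_paths random walks, each a list of steps + 1 values."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    paths = []
    for _ in range(n_paths):
        path = [start]
        for _ in range(steps):
            # Each step adds normally distributed noise to the last value.
            path.append(path[-1] + rng.gauss(0, step_sd))
        paths.append(path)
    return paths

paths = simulate_paths(start=100.0, steps=12, n_paths=200, step_sd=2.0)

# Summarize the spread of outcomes at the final step.
finals = sorted(path[-1] for path in paths)
k = len(finals) // 20  # drop the lowest and highest 5 percent
low, high = finals[k], finals[-k - 1]
print(f"middle ~90% of final values: {low:.1f} to {high:.1f}")
```

Drawing every path as a thin line gives the "bunch of possible paths" look, and the sorted final values are the raw material for something like a fan chart.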


Pros

Uncertainty is displayed more explicitly. People can see that there is no set path, and instead they see a bunch of possible paths.


Cons

If there’s too much noise or there are too many possibilities, the chart might not provide anything of use. But that might be a problem with the forecasting more than the chart choice.


Examples

To show simulation uncertainty for the election, The Upshot displayed multiple delegate outcomes at the same time using various models.

See also: Hurricane tracking and the fan chart for time series data, and bootstrap density curves.


Simulations

Similar to showing multiple outcomes, watching results occur one by one and build up to an overall picture provides intuition for the fuzziness of predictions.


Pros

When data appears all at once or in aggregate, it can be a challenge for many to interpret the results and link them back to what the data actually represents. By showing simulations, you get a sense of build-up and a link with individual outcomes.


Cons

Too much weight might be placed on individual outcomes, which obscures the overall picture.


Examples

The Social Security Administration puts out life expectancy and probabilities of death at any given age. I used those numbers to simulate how many years you might have left to live.

See also: how you will die, the day of 1,000 Americans, and Parable of the Polygons.
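
The years-left simulation mentioned above can be sketched in a few lines of Python. This is not the original implementation. The `death_probability` function is a made-up curve standing in for the Social Security Administration table, and the starting age of 40 is arbitrary.

```python
import random

def death_probability(age):
    """Made-up annual probability of death; NOT the SSA table."""
    return min(1.0, 0.0002 * 1.085 ** age)

def simulate_years_left(current_age, rng, max_age=120):
    """Step one year at a time until a simulated death; return years lived."""
    age = current_age
    while age < max_age and rng.random() >= death_probability(age):
        age += 1
    return age - current_age

rng = random.Random(1)  # seeded so the sketch is repeatable
outcomes = []
for i in range(1, 1001):
    outcomes.append(simulate_years_left(40, rng))
    # Report as results accumulate, to mimic the one-by-one build-up.
    if i in (1, 10, 100, 1000):
        average = sum(outcomes) / len(outcomes)
        print(f"after {i:>4} runs: average {average:.1f} years left")
```

The first run is a single outcome and can land almost anywhere. The average only settles down as more runs pile up.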


Obscurity

The more uncertain an estimate is, the harder you make it to see, so that it is less visually prominent than more certain estimates. You can achieve this effect in a number of ways, such as with transparency, color scale, or blurriness.
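
For transparency, the mapping can be as simple as a linear scale from standard error to opacity, as in this Python sketch. The estimates are made up, `opacity_for` is a hypothetical helper, and the linear scale and the 0.15 floor are arbitrary choices.

```python
def opacity_for(std_error, max_error, floor=0.15):
    """Map a standard error to an alpha value between floor and 1.0.

    Zero error gives full opacity; max_error or more gives the floor.
    """
    share = min(max(std_error / max_error, 0.0), 1.0)
    return 1.0 - share * (1.0 - floor)

# Hypothetical estimates as (label, value, standard error).
estimates = [("north", 42.0, 0.5), ("south", 39.0, 2.0), ("west", 45.0, 4.0)]

max_error = max(err for _, _, err in estimates)
alphas = {label: opacity_for(err, max_error) for label, _, err in estimates}
for label, alpha in alphas.items():
    print(f"{label}: alpha={alpha:.2f}")
```

The floor keeps the least certain estimate faintly visible instead of making it disappear. The resulting alpha values can be passed to whatever charting tool you use.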


Pros

The metaphor makes sense. If you’re less certain about an estimate, make it less visually prominent. The data that’s less up in the air gets more attention as a result.


Cons

How is fuzziness or obscurity perceived? Are various levels actually interpreted, or is it a binary thing, where readers only see certain versus uncertain? This requires more research.


Examples

I haven’t seen this done much, but the wind prediction map by Moritz Stefaner comes to mind.

Lines represent wind predictions, and opacity represents the strength of the predictions.


Words

Maybe visualization isn’t what you’re looking for at all. After all, you don’t have to visualize everything. You can add uncertainty to your writing by avoiding absolutes when you describe numbers. Treat estimates as such when you use them, and account for the uncertainty in the numbers.
