Why context is as important as the data itself

Posted to Design, Statistics  |  Nathan Yau

John Allen Paulos, a math professor at Temple University, explains in the New York Times why what you do before and after you get that data blobby thing in your hands matters so much.

The problem isn’t with statistical tests themselves but with what we do before and after we run them. First, we count if we can, but counting depends a great deal on previous assumptions about categorization. Consider, for example, the number of homeless people in Philadelphia, or the number of battered women in Atlanta, or the number of suicides in Denver. Is someone homeless if he’s unemployed and living with his brother’s family temporarily? Do we require that a woman self-identify as battered to count her as such? If a person starts drinking day in and day out after a cancer diagnosis and dies from acute cirrhosis, did he kill himself?

In a nutshell, statistics is a game of estimation. More often than not, the numbers in front of you aren’t an exact count. They could easily change if you shift the criteria of what was counted. As a result, there’s always some amount of uncertainty attached to your data, and it’s the statistician, analyst, and data scientist’s job to minimize that uncertainty.
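To make the point about shifting criteria concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the seven records, the category labels, and the two definitions of "homeless" are hypothetical and do not come from Paulos' article or from any real survey.

```python
# Hypothetical toy records, invented purely for illustration (not real data).
# Each entry is one person's living situation.
records = [
    "own_home",
    "with_family_temporarily",
    "shelter",
    "own_home",
    "street",
    "with_family_temporarily",
    "own_home",
]

# Two made-up definitions of "homeless". The broad one also counts people
# staying temporarily with family, as in Paulos' example question.
STRICT = {"shelter", "street"}
BROAD = STRICT | {"with_family_temporarily"}


def count_homeless(living_situations, definition):
    """Count the records whose living situation falls under the definition."""
    return sum(1 for situation in living_situations if situation in definition)


strict_count = count_homeless(records, STRICT)
broad_count = count_homeless(records, BROAD)

print(f"strict definition: {strict_count} of {len(records)}")
print(f"broad definition:  {broad_count} of {len(records)}")
```

The records never change, yet the count doubles when the definition widens. Neither number is wrong; each one answers a slightly different question, which is why the definition belongs next to the number.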

So the next time you see a list of rankings like “fattest city” or “dumbest town,” don’t take it for absolute truth. Instead, think of it as an educated guess. Similarly, when you analyze and visualize, remember the context of your data.

Catch Paulos’ full article here.

5 Comments

  • Agreed! Context can make a number mean something, or something else… A take on web operations and context:
    http://www.kitchensoap.com/2009/05/10/context-and-operational-metrics/

  • Yep, context is essential for meaningful comparison. Starts me thinking about ontologies, linked data and the semantic web.

  • That’s one of the main issues I have with pop-infographics: they rarely provide an estimate of uncertainty. It’s like reporting regression coefficients without standard errors.

  • True words indeed … I always find it important to realize that all our (statistical) tests are associated with a large number of assumptions that are excluded from the test itself. Indeed, these assumptions include the data collection and coding, assumptions regarding the substantive theory being used, and of course the statistical procedure itself.

  • We agree data needs a context in order to be valuable. Numbers only tell a part of the story; we need meaningful ways to define, categorize and interpret the data. This is especially important when it translates into real-world decisions and actions. We wrote a feature on how we can give meaning to data through beautiful, communicative designs: http://www.healthymagination.com/stories/decoding-data/. Check it out if you have the chance.
