Consequences of big data exclusions

Posted to Statistics  |  Nathan Yau

Big data, in all its glory, promises insights into the soul of humankind. There's a hefty restriction, though. Data only tells you about the people and actions it represents, and it inevitably excludes part of the population. Jonas Lerman considers two hypothetical people. The first:

The first is a thirty-year-old white-collar resident of Manhattan. She participates in modern life in all the ways typical of her demographic: smartphone, Google, Gmail, Netflix, Spotify, Amazon. She uses Facebook, with its default privacy settings, to keep in touch with friends. She dates through the website OkCupid. She travels frequently, tweeting and posting geotagged photos to Flickr and Instagram. Her wallet holds a debit card, credit cards, and a MetroCard for the subway and bus system. On her keychain are plastic barcoded cards for the “customer rewards” programs of her grocery and drugstore. In her car, a GPS sits on the dash, and an E‑ZPass transponder (for bridge, tunnel, and highway tolls) hangs from the windshield.

That's a lot of data. The second person:

He lives two hours southwest of Manhattan, in Camden, New Jersey, America’s poorest city. He is underemployed, working part-time at a restaurant, paid under the table in cash. He has no cell phone, no computer, no cable. He rarely travels and has no passport, car, or GPS. He uses the Internet, but only at the local library on public terminals. When he rides the bus, he pays the fare in cash.

The second person has fewer data flows.

These days, big data exclusion almost sounds like a good thing, if you're intent on avoiding marketing-related data collection. But when policy-making, fund allocation, and similar decisions come into play, the excluded might not be counted at all. That's not to say people should hurriedly sign up for Facebook and opt in to every tracking study. It's the opposite. Those in charge of the data, and those who make decisions based on what they see in it, are responsible for knowing the background of their sources.