Consequences of big data exclusions

Posted to Statistics

Big data, in all its glory, promises insights into the soul of humankind. There's a hefty restriction, though. Data only tells you about the people and actions it represents, which inevitably leaves part of the population out. Jonas Lerman considers two hypothetical people. The first one:

The first is a thirty-year-old white-collar resident of Manhattan. She participates in modern life in all the ways typical of her demographic: smartphone, Google, Gmail, Netflix, Spotify, Amazon. She uses Facebook, with its default privacy settings, to keep in touch with friends. She dates through the website OkCupid. She travels frequently, tweeting and posting geotagged photos to Flickr and Instagram. Her wallet holds a debit card, credit cards, and a MetroCard for the subway and bus system. On her keychain are plastic barcoded cards for the “customer rewards” programs of her grocery and drugstore. In her car, a GPS sits on the dash, and an E‑ZPass transponder (for bridge, tunnel, and highway tolls) hangs from the windshield.

That's a lot of data. The second person:

He lives two hours southwest of Manhattan, in Camden, New Jersey, America’s poorest city. He is underemployed, working part-time at a restaurant, paid under the table in cash. He has no cell phone, no computer, no cable. He rarely travels and has no passport, car, or GPS. He uses the Internet, but only at the local library on public terminals. When he rides the bus, he pays the fare in cash.

The second person has fewer data flows.

These days, big data exclusion almost sounds like a good thing if you're intent on avoiding all marketing-related data collection. But when policy-making, fund allocation, and the like come into play, it's possible the excluded aren't counted. That's not to say people should hurriedly sign up for Facebook and opt in to every tracking study. It's the opposite. Those in charge of the data and those who make decisions based on what they see in it are responsible for knowing the background of their source.