6 Influential Datasets That Changed the Way We Think

Posted to Visualization  |  Nathan Yau

The thing about data is that it can be very convincing. Maybe it’s because it’s so hard to argue against numbers, or maybe it’s just that there’s so much of it. In any case, here are six datasets that undoubtedly changed the way some people behave or showed us something that brought about a different way of thinking.

Baseball Statistics

In 2003, Michael M. Lewis’ book, Moneyball: The Art of Winning an Unfair Game, was released. As a result, the way baseball teams were built changed completely. Before Moneyball, teams relied on insider information, and the choice of players was highly subjective. In 2002, however, a year before the book was published, the Oakland A’s had a $41 million payroll and had to figure out how to compete against teams like the New York Yankees and the Boston Red Sox, which spent over $100 million on salaries.

Global Warming

Inconvenient Truth Graph

By now, everyone’s seen the dramatic presentation of weather data in An Inconvenient Truth. That data has got everyone thinking about the environment and has transformed “global warming” into a buzz term for this year’s elections. I encourage you to watch the documentary, if you haven’t already.

AOL Search Data

America Online released three months of keyword searches from 650,000 users as a contribution to improving online search. There was, however, a huge backlash. Even though users were identified by an ID number rather than their actual names, many argued that the release was a huge breach of privacy. While the data could have been (well, actually it still is) extremely useful, the incident forced a new take on privacy on the Internet.

Megan’s Law

Since 1994, those who have been convicted of sex crimes against children have been required to register with local law enforcement. That data is made public so that people know about sex offenders in their area. Mash that data with Google Maps, and lo and behold, parents became instantly aware of caution areas. Some might never look at their neighbors the same way again, while some sex offenders started declaring themselves homeless.

Enron Emails

In the 1990s, Fortune named Enron “America’s Most Innovative Company” six years in a row. In 2001, Enron filed for bankruptcy amid financial fraud. The email dataset played a role in showing the communications among executives at Enron.

Blackjack Simulations

For the Blackjack we know nowadays, with the multiple decks and automatic shufflers, we can thank Edward O. Thorp. Thorp’s book, Beat the Dealer: A Winning Strategy for the Game of Twenty-One (1962), showed us how to shift the odds in our favor. Thorp ran computer simulation after simulation to find out when to hit, when to bet big, and so on. If we know what cards have already been dealt, we can calculate the odds of getting a favorable card later on. That is card counting. Card counting was later exemplified by a group of MIT students who won hundreds of thousands of dollars and were soon barred from all Las Vegas casinos.
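
To make the arithmetic behind that idea concrete, here is a minimal sketch in Python. It is not the method from Beat the Dealer, just the basic calculation: track which cards have left the shoe and work out the chance that the next card is a favorable one. The function names and the encoding of card values (1 for an ace, 10 for any ten or face card) are my own assumptions for illustration.

```python
from collections import Counter
from fractions import Fraction


def remaining_counts(num_decks, dealt):
    """Count the cards still in the shoe, keyed by blackjack value.

    Assumed encoding: 1 = ace, 2..9 = pip cards, 10 = ten, jack, queen, king.
    """
    shoe = Counter()
    for value in range(1, 10):
        shoe[value] = 4 * num_decks   # four suits per deck
    shoe[10] = 16 * num_decks         # 10, J, Q, K in four suits
    shoe.subtract(Counter(dealt))
    if any(count < 0 for count in shoe.values()):
        raise ValueError("dealt cards do not fit in the shoe")
    return shoe


def prob_next(num_decks, dealt, favorable):
    """Exact probability that the next card has a value in `favorable`."""
    shoe = remaining_counts(num_decks, dealt)
    total = sum(shoe.values())
    if total == 0:
        raise ValueError("the shoe is empty")
    good = sum(shoe[value] for value in favorable)
    return Fraction(good, total)


if __name__ == "__main__":
    fresh = prob_next(1, [], {10})
    after_low_cards = prob_next(1, [2, 3, 4, 5, 6], {10})
    print(f"Fresh deck:           {float(fresh):.3f}")
    print(f"After five low cards: {float(after_low_cards):.3f}")
```

With a single fresh deck, the chance of a ten-valued card is 16 in 52. Once five low cards are gone, it rises to 16 in 47. That shift is small, which is why counters keep a running tally over many hands rather than reacting to a single deal.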

What data has influenced you lately?
