6 Influential Datasets That Changed the Way We Think

Posted Jan 24, 2008 to Featured by Nathan  /  12 responses

The thing about data is that it can be very convincing. Maybe it's because it's so hard to argue against numbers, or maybe it's just that there's so much of it. In any case, here are six datasets that changed the way some people behave or showed us something that brought about a different way of thinking.

Baseball Statistics

In 2003, Michael Lewis's book, Moneyball: The Art of Winning an Unfair Game, was released, and it changed the way many baseball teams were built. Before Moneyball, teams relied on insider information, and the choice of players was highly subjective. In 2002, a year before the book was published, the Oakland A's had $41 million in salary and had to figure out how to compete against teams like the New York Yankees and the Boston Red Sox, who spent over $100 million in salaries. Their answer was to lean on statistics to find undervalued players.

Global Warming

Inconvenient Truth Graph

By now, everyone's seen the dramatic presentation of weather data in An Inconvenient Truth. That data has got everyone thinking about the environment and has transformed "global warming" into a buzz term for this year's elections. I encourage you to watch the documentary, if you haven't already.

AOL Search Data

America Online released three months of keyword searches from 650,000 users as a contribution to search research. There was, however, a huge backlash. Even though users were identified by an ID number (rather than their actual name), many argued that it was a huge breach of privacy. While the data could have been (well, actually it still is) extremely useful, the release forced a new take on privacy over the Internet.

Megan's Law

Since 1994, those convicted of sex crimes against children have been required to register with local law enforcement. That data is made public so that people know about sex offenders in their area. Mash that data with Google Maps, and lo and behold, parents became instantly aware of caution areas. Some might never look at their neighbors the same way again, while some sex offenders started declaring themselves homeless.

Enron Emails

Starting in the 1990s, Fortune named Enron "America's Most Innovative Company" six years in a row. In 2001, Enron filed for bankruptcy amid financial fraud. The email dataset, made public during the investigation that followed, gave a look at the communications among Enron executives.

Blackjack Simulations

For the Blackjack we know nowadays, with the multiple decks and automatic shufflers, we can thank Edward O. Thorp. Thorp's book, Beat the Dealer: A Winning Strategy for the Game of Twenty-One (1962), showed us how to shift the odds in our favor. Thorp ran computer simulation after simulation to find out when to hit, when to bet big, etc. If we knew what cards were already dealt, we could calculate the odds of getting a favorable card later on -- card counting. Card counting was later exemplified by a group of MIT students who won hundreds of thousands of dollars and were soon barred from all Las Vegas casinos.
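To make the card-counting idea a little more concrete, here is a minimal Python sketch of the underlying arithmetic. It is not Thorp's method or his simulations, only an illustration: the function names (`remaining_cards`, `prob_next_is_ten`) are made up for this example, and it assumes a standard 52-card deck per shoe and tracks only how likely the next card is to be worth ten.

```python
from collections import Counter
from fractions import Fraction

# Ranks in a standard 52-card deck; 10, J, Q and K all count as ten in blackjack.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
TEN_VALUED = {"10", "J", "Q", "K"}


def remaining_cards(num_decks, dealt):
    """Return a Counter of the cards left in the shoe after `dealt` have been seen."""
    shoe = Counter({rank: 4 * num_decks for rank in RANKS})
    for rank in dealt:
        if shoe[rank] == 0:
            raise ValueError(f"more {rank}s dealt than the shoe contains")
        shoe[rank] -= 1
    return shoe


def prob_next_is_ten(num_decks, dealt):
    """Probability that the next card is worth ten, given the cards already dealt."""
    shoe = remaining_cards(num_decks, dealt)
    total = sum(shoe.values())
    tens = sum(shoe[rank] for rank in TEN_VALUED)
    return Fraction(tens, total)


if __name__ == "__main__":
    # Fresh single deck versus the same deck after five low cards have come out.
    print(prob_next_is_ten(1, []))
    print(prob_next_is_ten(1, ["2", "3", "4", "5", "6"]))
```

With a fresh single deck the chance of a ten-valued card is 16 in 52. After five low cards are dealt it rises to 16 in 47, which is the kind of shift a card counter is watching for.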

What data has influenced you lately?

Replies

12 responses to "6 Influential Datasets That Changed the Way We Think"
  • Tony
    Jan 24, 2008, 8:36 am

    I would add:

    Zillow - http://www.zillow.com
    Enron - I would add - Sarbanes-Oxley
    iTunes/IPod - Music, Video, Movies, Books, etc

    BTW - I love the BlackJack reference. I am a huge fan and have just about all of the MIT/Thorpe materials.


  • Alex Tolley
    Feb 27, 2008, 6:08 pm

    Why is the BlackJack simulation called a data set? Surely the data was the actual runs of results the computer generated?

    I would argue that the private data set in Amazon that tracks user viewing and buying habits is highly influential in how we think about data mining and our exposure to the results that influences our buying habits.

    For the future, the astronomy data created by the LSST will be one hell of a great data set, allowing the interested public to analyze the data in all sorts of ways.


  • Nathan
    Feb 27, 2008, 11:57 pm
    Author

    current amazon recommendations: A Handbook of Statistical Analyses Using R, Physical Computing: Sensing and Controlling the Physical World with Computers, The R Book, Linear Models with R.

    yup, i’d say that’s a pretty good dataset :)


  • Anonymous
    Feb 28, 2008, 3:52 pm

    As someone who routinely works with extremely large (but non-public) data sets I have to say that neither of the last two would be data sets as I understand the term.

    And if we don’t want to limit it to publicly available data, then certainly Google’s internal data sets that they undoubtedly use to refine and improve the search algorithm deserve a place on this list.


  • Steve Taylor
    Feb 28, 2008, 4:11 pm

    You’re kidding me? Six datasets and you didn’t have room for the Broad Street pump outbreak and the birth of epidemiology?

    Have a look at: http://www.ph.ucla.edu/epi/sno.....tpump.html


  • Nathan
    Feb 28, 2008, 5:26 pm
    Author

    well, i gave mr. snow his very own post a while back http://flowingdata.com/2007/09.....olera-map/ so i thought cholera should step aside, if just for a moment, to highlight some others.


  • Steve Taylor
    Feb 28, 2008, 6:37 pm

    I guess if he gets his own post he’s doing fairly well then…


  • Stilgherrian · Saturday Reading, 1 March 2008
    Feb 29, 2008, 9:46 pm

    […] 6 Influential Datasets That Changed the Way We Think. Hat-tip to O’Reilly Radar. […]


  • links for 2008-03-02 | Daily EM
    Mar 1, 2008, 11:19 pm

    […] 6 Influential Datasets That Changed the Way We Think | FlowingData “What data has influenced you lately?” (tags: data statistics information science datasets research datamining visualization crowdsourcing) […]


  • Jeanne Breault
    Mar 16, 2008, 11:49 pm

    How about google reader…shows our blog reading trends AND lets us see what our friends are reading!


  • 21 Ways to Visualize and Explore Your Email Inbox | FlowingData
    Mar 19, 2008, 12:21 pm

    […] posted about this in 6 Influential Datasets that Changed the Way We Think. When did Enron start going […]


  • JMG3Y
    Mar 21, 2008, 8:37 pm

    The Keeling Curve(s) (and the serendipity in the story behind it)

    http://www.gapminder.org/

    http://rs.resalliance.org/wp-c.....rldMap.jpg
    http://beta.sedac.ciesin.colum.....global.jsp
    http://www.census.gov/ipc/www/idb/pyramids.html
    https://eed.llnl.gov/flow/02flow.php
    http://www-personal.umich.edu/~mejn/cartograms/

    Not a dataset in the classical sense:
    http://visibleearth.nasa.gov/v.....oryID=2364

