The thing about data is that it can be very convincing. Maybe it’s because it’s so hard to argue against numbers, or maybe it’s just that there’s so much of it. In any case, here are six datasets that undoubtedly changed the way some people behave or showed us something that led to a different way of thinking.
Baseball Statistics
In 2003, Michael M. Lewis’ book, Moneyball: The Art of Winning an Unfair Game, was released, and the way baseball teams were built changed dramatically. Before Moneyball, teams relied on scouts’ judgment and insider information, and the choice of players was highly subjective. The book follows the 2002 Oakland A’s, who had only $41 million in salary and had to figure out how to compete against teams like the New York Yankees and the Boston Red Sox, which spent over $100 million. Their answer was statistics: the A’s used measures such as on-base percentage to find players that other teams undervalued.
Global Warming
By now, most people have seen the dramatic presentation of climate data in An Inconvenient Truth. That data has gotten a lot of people thinking about the environment and has turned “global warming” into a buzz term for this year’s elections. I encourage you to watch the documentary if you haven’t already.
AOL Search Data
In 2006, America Online released three months of search queries from about 650,000 users, intending the data as a contribution to research on improving online search. There was, however, a huge backlash. Users were identified by an ID number rather than by name, but critics called the release a major breach of privacy: taken together, one person’s queries could reveal who they were, and New York Times reporters used one user’s searches to track her down. The data could have been (well, actually it still is) extremely useful, but the episode forced a new way of thinking about privacy on the Internet.
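Why wasn’t an ID number enough? Every query from the same person carried the same ID, so anyone with the file could reassemble that person’s entire search history. The minimal sketch below assumes a simplified log of (ID, query) pairs with invented rows; the real release’s exact layout may differ.

```python
from collections import defaultdict

# Hypothetical, invented log rows: (pseudonymous_user_id, query).
# These are NOT records from the real AOL release.
search_log = [
    (1001, "plumbers in springfield"),
    (2002, "cheap flights to denver"),
    (1001, "springfield high school class of 1985 reunion"),
    (1001, "dog groomer near elm street"),
    (2002, "denver hotels downtown"),
]

def queries_by_user(log):
    """Group each query under its user ID, keeping the original order."""
    profiles = defaultdict(list)
    for user_id, query in log:
        profiles[user_id].append(query)
    return dict(profiles)

profiles = queries_by_user(search_log)
for user_id, queries in profiles.items():
    # Each ID now maps to one person's search history: a town, a school,
    # a street. Together, those clues can narrow down who the person is.
    print(user_id, queries)
```

No names appear anywhere, yet user 1001’s grouped queries point to one town, one school, and one street, which is the kind of trail that makes pseudonymous data re-identifiable.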
Megan’s Law
Since 1994, people convicted of sex crimes against children have been required to register with local law enforcement, and that data is made public so that people know about sex offenders in their area. Mash that data up with Google Maps and, lo and behold, parents instantly became aware of areas to be cautious about. Some might never look at their neighbors the same way again, while some sex offenders started declaring themselves homeless.
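Under the hood, a “caution area” on such a map comes down to a distance check between a home and each registered address. A minimal sketch, assuming the registry has already been geocoded to latitude/longitude (the coordinates here are invented) and using the haversine great-circle formula rather than any Google Maps API:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def entries_within(home, entries, radius_miles):
    """Return (entry_id, distance) pairs within radius_miles of home, nearest first."""
    hits = []
    for entry_id, lat, lon in entries:
        dist = haversine_miles(home[0], home[1], lat, lon)
        if dist <= radius_miles:
            hits.append((entry_id, round(dist, 2)))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical, invented coordinates; not real registry entries.
home = (40.0, -75.0)
registry = [
    ("entry-A", 40.01, -75.0),   # about 0.7 miles north
    ("entry-B", 40.10, -75.0),   # about 6.9 miles north
    ("entry-C", 40.0, -75.02),   # about 1.1 miles west
]

nearby = entries_within(home, registry, radius_miles=2)
print(nearby)
```

With a 2-mile radius, entries A and C are returned and B is not. A map mashup does the same filtering visually by dropping a pin for every address and letting the reader judge distance by eye.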
Enron Emails
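Fortune named Enron “America’s Most Innovative Company” six years in a row, from 1996 through 2001. Then, in late 2001, Enron filed for bankruptcy amid revelations of accounting fraud. The company’s email archive, made public by the Federal Energy Regulatory Commission during its investigation, showed how executives communicated with one another, and it has since become a standard dataset for research on email and social networks.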
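One common way to analyze an email archive like Enron’s is as a network: who wrote to whom, and how often. A minimal sketch of that idea, assuming the From and To fields have already been parsed out of each message; the names are invented placeholders.

```python
from collections import Counter

# Hypothetical, invented message headers: (sender, [recipients]).
# These are NOT records from the real Enron corpus.
messages = [
    ("alice", ["bob", "carol"]),
    ("bob", ["alice"]),
    ("alice", ["bob"]),
    ("carol", ["alice", "dave"]),
]

def edge_counts(msgs):
    """Count how many times each sender wrote to each recipient."""
    edges = Counter()
    for sender, recipients in msgs:
        for recipient in recipients:
            edges[(sender, recipient)] += 1
    return edges

def activity(edges):
    """Total messages each person sent or received: a crude measure of centrality."""
    totals = Counter()
    for (sender, recipient), n in edges.items():
        totals[sender] += n
        totals[recipient] += n
    return totals

edges = edge_counts(messages)
print(edges.most_common(2))
print(activity(edges).most_common(1))
```

In this toy data, alice sits at the center of the traffic; on a real archive, the same tally points to the people most messages flow through.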
Blackjack Simulations
For the blackjack we know nowadays, with multiple decks and automatic shufflers, we can thank Edward O. Thorp: casinos adopted those measures to counter the card counting his work made famous. Thorp’s book, Beat the Dealer: A Winning Strategy for the Game of Twenty-One (1962), showed players how to shift the odds in their favor. Thorp ran computer simulation after simulation to find out when to hit, when to bet big, and so on. The key insight: if you know which cards have already been dealt, you can calculate the odds of getting a favorable card later on. That’s card counting. It was later exemplified by a group of MIT students who won hundreds of thousands of dollars and were soon barred from Las Vegas casinos.
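The core idea, that cards already dealt change the odds for the cards still to come, is easy to make concrete. A minimal sketch, assuming standard 52-card decks: it computes the chance that the next card is ten-valued (in the spirit of Thorp’s ten-count, which tracked the tens left in the deck) and keeps a running count under the widely used Hi-Lo point system. It illustrates the principle rather than reproducing Thorp’s actual method.

```python
from collections import Counter

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
TEN_VALUED = {"10", "J", "Q", "K"}
LOW = {"2", "3", "4", "5", "6"}   # Hi-Lo: +1 when one of these is dealt
NEUTRAL = {"7", "8", "9"}         # Hi-Lo: 0
# Everything else (10, J, Q, K, A) counts -1 under Hi-Lo.

def fresh_shoe(decks=1):
    """Rank counts for a shoe of standard 52-card decks (4 of each rank per deck)."""
    return Counter({rank: 4 * decks for rank in RANKS})

def remaining_after(shoe, dealt):
    """Remove the cards already seen from a copy of the shoe."""
    left = shoe.copy()
    for card in dealt:
        left[card] -= 1
    return left

def prob_next_ten(left):
    """Chance that the next card is ten-valued, given the cards left."""
    tens = sum(left[rank] for rank in TEN_VALUED)
    return tens / sum(left.values())

def hi_lo_value(card):
    if card in LOW:
        return 1
    if card in NEUTRAL:
        return 0
    return -1

def running_count(dealt):
    """Hi-Lo running count: positive means the cards left are rich in tens and aces."""
    return sum(hi_lo_value(card) for card in dealt)

dealt = ["2", "3", "4", "5", "6", "5"]   # six low cards seen so far
left = remaining_after(fresh_shoe(decks=1), dealt)
print(round(prob_next_ten(fresh_shoe()), 3), round(prob_next_ten(left), 3))
print(running_count(dealt))
```

Dealing six low cards from a single deck raises the chance of a ten from 16/52 (about 0.308) to 16/46 (about 0.348) and leaves a running count of +6. A positive count is the signal to bet bigger.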
What data has influenced you lately?
I would add:
Zillow – http://www.zillow.com
Enron – I would add – Sarbanes-Oxley
iTunes/iPod – music, video, movies, books, etc.
BTW – I love the blackjack reference. I am a huge fan and have just about all of the MIT/Thorp materials.
Why is the BlackJack simulation called a data set? Surely the data was the actual runs of results the computer generated?
I would argue that Amazon’s private data set tracking users’ viewing and buying habits is highly influential, both in how we think about data mining and in exposing us to results that influence our buying habits.
For the future, the astronomy data created by the LSST will be one hell of a great data set, allowing the interested public to analyze the data in all sorts of ways.
current amazon recommendations: A Handbook of Statistical Analyses Using R, Physical Computing: Sensing and Controlling the Physical World with Computers, The R Book, Linear Models with R.
yup, i’d say that’s a pretty good dataset :)
As someone who routinely works with extremely large (but non-public) data sets I have to say that neither of the last two would be data sets as I understand the term.
And if we don’t want to limit it to publicly available data, then certainly Google’s internal data sets that they undoubtedly use to refine and improve the search algorithm deserve a place on this list.
You’re kidding me? Six datasets and you didn’t have room for the Broad Street pump outbreak and the birth of epidemiology?
Have a look at: http://www.ph.ucla.edu/epi/snow/broadstreetpump.html
well, i gave mr. snow his very own post a while back https://flowingdata.com/2007/09/12/john-snows-famous-cholera-map/ so i thought cholera should step aside, if just for a moment, to highlight some others.
I guess if he gets his own post he’s doing fairly well then…
How about Google Reader? It shows our blog-reading trends AND lets us see what our friends are reading!
The Keeling Curve(s) (and the serendipity in the story behind it)
http://www.gapminder.org/
http://rs.resalliance.org/wp-content/uploads/2006/02/worldMap.jpg
http://beta.sedac.ciesin.columbia.edu/gpw/global.jsp
http://www.census.gov/ipc/www/idb/pyramids.html
https://eed.llnl.gov/flow/02flow.php
http://www-personal.umich.edu/~mejn/cartograms/
Not a dataset in the classical sense:
http://visibleearth.nasa.gov/view_set.php?categoryID=2364