Statistics
More than mean and standard deviation.

Big data, same statistical challenges
Tim Harford for the Financial Times on big data and how the same problems for small data still apply: The multiple-comparisons problem arises when a researcher looks at many possible patterns.…
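
The multiple-comparisons problem Harford describes can be made concrete with a small simulation. This is only a sketch under stated assumptions: every test is independent, every null hypothesis is true, and the function names and the 0.05 threshold are illustrative choices, not anything taken from the article.

```python
import random


def family_wise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_positives(num_tests, alpha=0.05, trials=10_000, seed=0):
    """Estimate the same quantity by simulation.

    Under a true null hypothesis a p-value is uniform on [0, 1], so each
    test is modeled as a single uniform draw (an assumption of this sketch).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


# Looking at 20 patterns at the usual 0.05 threshold.
exact = family_wise_error_rate(20)
estimate = simulate_false_positives(20)

# A Bonferroni-style correction divides the threshold by the number of tests.
corrected = family_wise_error_rate(20, alpha=0.05 / 20)
```

With 20 patterns examined, the chance of at least one spurious "finding" is roughly 64 percent, and dividing the threshold by the number of tests brings it back under 5 percent.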

Bike share data in New York, animated
Citi Bike, also known as NYC Bike Share, is releasing monthly data dumps for station checkouts and checkins, which gives you a sense of where and when people move about…

Dead links on the Million Dollar Homepage
Remember the Million Dollar Homepage from 2005? It sold ad space to anyone who was interested for one dollar per pixel, and there were one million pixels available. All spots…

Gambling data as a proxy for excitement in sports
After he noticed gambling odds fluctuate wildly at the end of a football game, Todd Schneider saw a correlation between betting odds and game excitement. The Gambletron 2000 is a…

Where time comes from
The Atlantic interviewed Dr. Demetrios Matsakis, Chief Scientist for Time Services at the US Naval Observatory, about where time comes from, the precision required and how they obtain it, and…

How people really read and share online
Tony Haile discusses how we read and share online, based on actual data. It’s not as click- and pageview-based as you might think. A widespread assumption is that the more…

The important parts of data analysis
There’s plenty of software to muck around with data, but to gain the skills to really get something out of it, that takes time and experience. Mikio Braun, a post…

Statistical concepts explained through dance
Forget bell curves, jellybeans, and coin flips to explain statistical concepts. Dancing Statistics is a video series that demonstrates variance, correlation, and sampling through choreographed movements. The dance below explains…

ProPublica opened a data store
One of the main challenges of any data project is getting the data. It seems obvious, but the effort to get the right data to answer a question seems to…

Game theory to win game shows
I like how a little bit of game theory has crept into Jeopardy! with contestant Arthur Chu. He bounces around the board in search of Daily Doubles and bets to…

A visual explanation of conditional probability
Victor Powell, who has visualized the Central Limit Theorem and Simpson’s Paradox, most recently provided a visual explainer for conditional probability. Two bars, one blue and one red, represent two…
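
For readers who prefer numbers to bars, conditional probability can also be sketched by counting outcomes. The example below is not Powell's visualization. It assumes a hypothetical sample space of two fair dice and applies the definition P(A | B) = P(A and B) / P(B) over equally likely outcomes.

```python
from fractions import Fraction


def probability(outcomes, event):
    """P(event) over a list of equally likely outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))


def conditional_probability(outcomes, event_a, event_b):
    """P(A | B): restrict the sample space to B, then count A inside it."""
    in_b = [o for o in outcomes if event_b(o)]
    if not in_b:
        raise ValueError("conditioning event has probability zero")
    in_both = [o for o in in_b if event_a(o)]
    return Fraction(len(in_both), len(in_b))


# Hypothetical sample space: all 36 rolls of two fair dice.
rolls = [(first, second) for first in range(1, 7) for second in range(1, 7)]


def sum_is_eight(roll):
    return roll[0] + roll[1] == 8


def first_is_three(roll):
    return roll[0] == 3


p_sum_eight = probability(rolls, sum_is_eight)
p_sum_eight_given_three = conditional_probability(rolls, sum_is_eight, first_is_three)
```

Knowing that the first die shows a three changes the chance of a total of eight from 5/36 to 1/6.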

Basketball analytics
Kirk Goldsberry talks about the rise of analytics usage in the NBA. With cameras above every court recording player movements, higher-granularity analysis is now possible, beyond the…

Texting data to save lives
Remember that TED talk from a couple of years ago on texting patterns to a crisis hotline? The TED talker Nancy Lublin proposed the analysis of these text messages to…

How R came to be
Statistician John Chambers, the creator of S and a member of the R core team, talks about how R came to be in the short video below. Warning: Super nerdy waters ahead.…

Facebook debunks Princeton study
Researchers at Princeton released a study that said that Facebook was on the way out, based primarily on Google search data. Naturally, Facebook didn’t appreciate it much and followed up…

Using data to find a girlfriend
Remember when Amy Webb created a bunch of fake male profiles to scrape data from two dating sites and analyze it to find a husband? Mathematician Chris McKinlay took a…

Disney MagicBands track your theme park activities
You can now wear a MagicBand when you enter Disneyland to get a more personalized experience, and in return, the park gets to know what their customers are up to.…

How Netflix creates movie microgenres
Alexis Madrigal and Ian Bogost for The Atlantic reverse engineered the Netflix genre generator, analyzed the data, and then made their own. Then they talked to Todd Yellin, the guy…

Clusters of single malt Scotch whiskies
Luba Gloukhov of Revolution Analytics used k-means clustering to find groups of single malt Scotch whiskies. Because you know, New Year’s morning is when whisky is on everyone’s mind. The…
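
As a rough idea of what k-means does, here is a minimal pure-Python sketch of the standard assign-then-update loop (Lloyd's algorithm). It is not Gloukhov's analysis: the toy points, the function names, and the optional `initial_centroids` parameter are all assumptions made for illustration, and no whisky data is involved.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k, initial_centroids=None, max_iter=100, seed=0):
    """Alternate between assigning points and moving centroids to cluster means."""
    rng = random.Random(seed)
    if initial_centroids is not None:
        centroids = list(initial_centroids)
    else:
        # Default start: k distinct points chosen at random.
        centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(centroids[i])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical toy data: two well-separated groups in two dimensions.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(toy_points, 2, initial_centroids=[(0, 0), (10, 10)])
```

On real data the result depends on the starting centroids, so it is common to rerun with several seeds and keep the tightest clustering.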

Statistics Done Wrong, a guide to common analysis mistakes
Alex Reinhart, a PhD statistics student at Carnegie Mellon University, covers some of the common analysis mistakes in Statistics Done Wrong. Statistics Done Wrong is a guide to the most…

Iron Maiden uses piracy data for tour locations
When you hear “piracy data” and “music” in the same sentence, it usually ends with exorbitant fines. Iron Maiden took a different route. In the case of Iron Maiden, still…

Data scientist surpasses statistician on Google Trends
The relative interest in data scientist surpassed statistician this month. It was also higher in April and September of this year, so it’s not new, but it does seem like…

Easy text classification
Text can be a great source of data, but it can be a challenge to glean information from an analysis standpoint. etcML can help you with that. Browse Twitter trends,…

Prediction of sexual orientation through Facebook friends
Carter Jernigan and Behram F.T. Mistree found that the sexual orientation of an individual is strongly correlated with the sexual orientation of the individual’s friends on Facebook.…

Up all night to get data, a music video parody
Neuroscience students at the University of California, San Diego made a music video parody of Daft Punk’s “Get Lucky.” It’s about gathering data in the lab. Graduate students are such…

Global status tracker for open government data
The Open Knowledge Foundation launched the Open Data Index, so you can see what data countries provide to their citizens. An increasing number of governments have committed to open up…

Cancer data for the U.S. released
The Centers for Disease Control and Prevention released their most recent cancer data a few days ago. It’s the numbers for 2010, which feels dated. However, the annual data goes…

U.S. Open Data Institute
With a $250,000 grant from the Knight Foundation, Waldo Jaquith pushes forward with the U.S. Open Data Institute, an effort to link government data sources and organizations over the next…

Monty Hall xkcd
Nice one, xkcd.…

Degrees of separation between athletes from different sports
You’ve probably heard of the six degrees of Kevin Bacon. The idea is that you can name any actor and trace back to Kevin Bacon through actors who have worked…