Posted to Statistics

Hiding a pregnancy from advertisers

You probably remember how Target used purchase histories to predict pregnancies among their customer base (although don't forget the false positives). Janet Vertesi, an assistant…

A principal component analysis step-by-step

Sebastian Raschka offers a step-by-step tutorial on principal component analysis in Python. The main purposes of a principal component analysis are the analysis of…
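For a rough idea of what those steps usually are, here is a minimal NumPy sketch of the textbook recipe: center the data, compute the covariance matrix, take its eigendecomposition, and project onto the top components. It is not Raschka's code, and the function name `pca` and the random test data are made up for illustration.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto their top-k principal components (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    # 1. Center each feature (column) at zero mean.
    X_centered = X - X.mean(axis=0)
    # 2. Sample covariance matrix of the features (ddof=1 by default).
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigendecomposition; eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort components by explained variance, largest first, and keep the top k.
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]
    # 5. Project the centered data onto the retained components.
    return X_centered @ components, eigvals[order]

# Random data, purely for demonstration.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))
projected, variances = pca(data, k=2)
print(projected.shape)  # (100, 2)
```

A quick sanity check is that the variance of each projected column matches the corresponding eigenvalue, which is what "explained variance" means here.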

Analysis of Bob Ross paintings

To teach himself conditional probability, Walt Hickey watched 403 episodes of "The Joy of Painting" with Bob Ross, tagged them with keywords…
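Once episodes are tagged, a conditional probability such as "the chance a painting has clouds, given that it has a tree" is just a ratio of counts. The sketch below assumes each episode is a set of tag strings; the four episodes are invented for illustration and are not Hickey's data.

```python
def conditional_probability(episodes, target, given):
    # P(target | given): among episodes tagged `given`, the share also tagged `target`.
    with_given = [tags for tags in episodes if given in tags]
    if not with_given:
        raise ValueError(f"no episodes tagged {given!r}")
    return sum(target in tags for tags in with_given) / len(with_given)

# Hypothetical tag sets, made up for illustration.
episodes = [
    {"tree", "clouds", "mountain"},
    {"tree", "lake"},
    {"tree", "clouds"},
    {"mountain", "snow"},
]
print(conditional_probability(episodes, "clouds", "tree"))  # 2 of 3 tree episodes
```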

Porn views for red versus blue states

Pornhub continues their analysis of porn viewing demographics in their latest comparison of pageviews per capita between red and blue states (SFW for most, I…

Using Census survey data properly

The American Community Survey, an ongoing survey that the Census administers to millions per year, provides detailed information about how Americans live now and decades…

Bracket picks of the masses versus sports pundits

Stephen Pettigrew and Reuben Fischer-Baum, for Regressing, compared 11 million brackets on ESPN.com against those of pundits. To evaluate how much better (or worse) the…

Fox News bar chart gets it wrong

Because Fox News. See also this, this, and this. [Thanks, Meron]…

Big data, same statistical challenges

Tim Harford, for the Financial Times, on big data and why the problems of small data still apply: The multiple-comparisons problem arises when a researcher…

Bike share data in New York, animated

Citi Bike, also known as NYC Bike Share, is releasing monthly data dumps for station check-outs and check-ins, which gives you a sense of where…
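One simple way to get that sense of where bikes go is a station's net flow: check-ins minus check-outs. The sketch assumes each trip has already been reduced to a (start station, end station) pair; the trips and station names below are placeholders, not the actual Citi Bike data format.

```python
from collections import Counter

def net_flow(trips):
    # trips: iterable of (start_station, end_station) pairs, one per ride.
    # Positive values mean a station gains bikes over the period; negative, it loses them.
    flow = Counter()
    for start, end in trips:
        flow[start] -= 1
        flow[end] += 1
    return flow

# Hypothetical trips between placeholder stations.
trips = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
print(net_flow(trips))
```

Stations with a persistently negative flow are the ones that need rebalancing trucks.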

Dead links on the Million Dollar Homepage

Remember the Million Dollar Homepage from 2005? It sold ad space to anyone who was interested for one dollar per pixel, and there were one…

Gambling data as a proxy for excitement in sports

After noticing that gambling odds fluctuate wildly at the end of a football game, Todd Schneider realized there was a correlation between betting odds and game excitement.…
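The excerpt does not give Schneider's exact formula, so here is one plausible version, labeled as an assumption: convert the odds to an implied win probability, then score a game by how far that probability moves in total. A blowout drifts in one direction; a thriller swings back and forth. The function names and both probability sequences are made up for illustration.

```python
def implied_win_probability(home_decimal_odds, away_decimal_odds):
    # Decimal odds d imply a raw probability of 1/d. The bookmaker's margin makes the
    # raw values sum to more than 1, so normalize them to a proper probability.
    home = 1 / home_decimal_odds
    away = 1 / away_decimal_odds
    return home / (home + away)

def excitement_index(win_probabilities):
    # Assumed metric: total distance the home team's win probability travels.
    return sum(abs(b - a) for a, b in zip(win_probabilities, win_probabilities[1:]))

# Hypothetical in-game win probabilities for the home team.
blowout = [0.60, 0.70, 0.85, 0.95, 0.99, 1.00]
thriller = [0.60, 0.35, 0.70, 0.30, 0.80, 1.00]
print(round(excitement_index(blowout), 2))   # 0.4
print(round(excitement_index(thriller), 2))  # 1.7
```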

Where time comes from

The Atlantic interviewed Dr. Demetrios Matsakis, Chief Scientist for Time Services at the US Naval Observatory, about where time comes from, the precision required…

How people really read and share online

Tony Haile discusses how we read and share online, based on actual data. It's not as click- and pageview-based as you might think. A widespread…

The important parts of data analysis

There's plenty of software to muck around with data, but gaining the skills to really get something out of it takes time and…

Statistical concepts explained through dance

Forget bell curves, jellybeans, and coin flips to explain statistical concepts. Dancing Statistics is a video series that demonstrates variance, correlation, and sampling through choreographed…

ProPublica opened a data store

One of the main challenges of any data project is getting the data. It seems obvious, but the effort to get the right data to…

Game theory to win game shows

I like how a little bit of game theory has crept into Jeopardy! with contestant Arthur Chu. He bounces around the board in search of…

A visual explanation of conditional probability

Victor Powell, who has visualized the Central Limit Theorem and Simpson's Paradox, most recently provided a visual explainer for conditional probability. Two bars, one blue…
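A rough numerical companion, assuming the two bars can be modeled as intervals on [0, 1] and that a ball lands at a uniformly random spot (a simplification, not necessarily how Powell's explainer works): P(A | B) is the overlap of the bars divided by the width of bar B, and counting only the balls that land in B gives the same answer.

```python
import random

def overlap(a, b):
    # Length of the intersection of two intervals (start, end) on [0, 1].
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def exact_conditional(a, b):
    # For a uniformly random point: P(A | B) = P(A and B) / P(B).
    return overlap(a, b) / (b[1] - b[0])

def simulated_conditional(a, b, n=100_000, seed=0):
    # Drop n balls; among those landing in B, count the share also landing in A.
    rng = random.Random(seed)
    in_b = in_both = 0
    for _ in range(n):
        x = rng.random()
        if b[0] <= x < b[1]:
            in_b += 1
            if a[0] <= x < a[1]:
                in_both += 1
    return in_both / in_b

# Hypothetical bar positions.
A = (0.0, 0.5)
B = (0.3, 0.7)
print(exact_conditional(A, B), simulated_conditional(A, B))
```

Conditioning on B simply shrinks the world to bar B; A only matters through the part of it that overlaps B.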

Basketball analytics

Kirk Goldsberry talks about the rise of analytics in the NBA. With cameras above every court recording player movements, there's a higher-granularity analysis that…

Texting data to save lives

Remember that TED talk from a couple of years ago on texting patterns to a crisis hotline? The speaker, Nancy Lublin, proposed the analysis…

How R came to be

Statistician John Chambers, the creator of S and a member of the R Core Team, talks about how R came to be in the short video below.…

Facebook debunks Princeton study

Researchers at Princeton released a study that said that Facebook was on the way out, based primarily on Google search data. Naturally, Facebook didn't appreciate…