Posted to Statistics

Flexible data

Data is an abstraction of something that happened in the real world. How people move. How they spend money. How a computer works. The tendency…

Problematic databases used to track employee theft

Employee theft accounts for billions of dollars of lost merchandise per year, so it's a huge concern for retailers, but it often goes unreported as…

How to become a password cracker in a day

Deputy editor at Ars Technica Nate Anderson was curious if he could learn to crack passwords in a day. Although there's definitely a difference between…

Odds of a perfect NCAA March Madness bracket

Math professor Jeff Bergen explains the odds of picking a perfect bracket. The first probability is based on a 50/50 split of correct picks, which…

Declining songwriter ratings with age

Do singer-songwriters age well like a fine wine, or does quality decline with age? Kyle Biehle analyzed fan ratings by age. I understand all of…

Data hackathon challenges and why questions are important

Jake Porway, executive director of DataKind, on data hackathons and why they require careful planning to actually work: Any data scientist worth their salary will…

What data brokers know about you

Lois Beckett for ProPublica has a thorough piece on data brokers — companies that collect and sell information about you — and what they know…

Using search data to find drug side effects

Along the same lines as Google Flu Trends, researchers at Microsoft, Stanford and Columbia University are investigating whether search data can be used to find…

Netflix data and puppets

Andrew Leonard for Salon fears what might come of the creative process if movies are based on algorithms and data and that we might turn…

This pie chart is amazing.

From the Winnipeg Sun. Something isn't right here. [via]…

Porn star demographics

Jon Millward explored porn star demographics using a data scrape from the Internet Adult Film Database: hair color, race, and birthplace, among other things. (There…

Analysis of LEGO brick prices over the years

Reality Prose has an excellent analysis of the changing price of LEGO bricks over the years and the misconception that the cost has gone up. According…

Philosophy of data

David Brooks for The New York Times on the philosophy of data and what the future holds: If you asked me to describe the rising…

The most poisoned name in US history

Biostatistics PhD candidate Hilary Parker dived into the most poisoned names in US history. Her own name topped the list. There were several fad names…

Using data to find a husband

When it was time to settle down with the right man, Amy Webb joined two dating sites, created a profile, and went on some horrible…

Data Analysis (with R) on Coursera

Jeff Leek, an Assistant Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, is teaching a course on data analysis on…

Statistical network of basketball

By now, everyone's heard of Moneyball. Applying statistics to baseball to build the best team for the buck. Naturally, there's a lot of interest these…

The differences between machine learning, data mining, and statistics

From machine learning to data mining. From statistics to probability. A lot of it seems similar, so what are the differences? Statistician William Briggs explains…

A new kind of resource

Jer Thorp talks ethics in the data-as-new-oil metaphor: [W]e need to change the way that we collectively think about data, so that it is not…

Machines and built-in morality

With Google's driverless cars now street legal in California, Florida, and Nevada, Gary Marcus for the New Yorker ponders a world where machines need a…

Archive of datasets bundled with R

R comes with a lot of datasets, some with the core distribution and others with packages, but you'd never know which ones unless you went…

Incredibly divided nation in a map

I knew things were bad, but I didn't know they were this bad. Obama has his work cut out for him. [Thanks, @adamsinger]…