Using data to find a girlfriend

Posted to Statistics  |  Tags: , ,  |  Nathan Yau

Remember when Amy Webb created a bunch of fake male profiles to scrape data from two dating sites and analyze it to find a husband? Mathematician Chris McKinlay took a similar route to find a girlfriend (and now fiancee). However, unlike Webb who used a relatively small sample, McKinlay scraped data for thousands of profiles in his area and analyzed the data more thoroughly, in search of the perfect mate.

For McKinlay’s plan to work, he’d have to find a pattern in the survey data—a way to roughly group the women according to their similarities. The breakthrough came when he coded up a modified Bell Labs algorithm called K-Modes. First used in 1998 to analyze diseased soybean crops, it takes categorical data and clumps it like the colored wax swimming in a Lava Lamp. With some fine-tuning he could adjust the viscosity of the results, thinning it into a slick or coagulating it into a single, solid glob.

He played with the dial and found a natural resting point where the 20,000 women clumped into seven statistically distinct clusters based on their questions and answers. “I was ecstatic,” he says. “That was the high point of June.”

He selected the two clusters most to his liking, looked at what interested the women, and then adjusted his profile accordingly. He didn’t lie. He just emphasized the traits that he possessed and that women tended to like. Then he waited for women to notice him.

It’s kind of like he built a targeted advertising system for himself and then cast a really wide net. Even though McKinlay is engaged now, I still wonder if it actually worked or if something similar might have happened if he left it to chance. I like to believe in the latter. He did after all go on dates with 87 other people before finding a match.

Favorites

Graphical perception – learn the fundamentals first

Before you dive into the advanced stuff – like just about everything in your life – you have to learn the fundamentals before you know when you can break the rules.

The Best Data Visualization Projects of 2011

I almost didn’t make a best-of list this year, but as I clicked through the year’s post, it was hard …

Life expectancy changes

The data goes back to 1960 and up to the most current estimates for 2009. Each line represents a country.

How You Will Die

So far we’ve seen when you will die and how other people tend to die. Now let’s put the two together to see how and when you will die, given your sex, race, and age.