"Target doesn't just know when you're buying sheets. They know what you're doing in between them."
In the 1980s, students and researchers at UCLA, led by marketing professor Alan Andreasen, found some interesting patterns in how people spend as they approach major life events.
[W]hen some customers were going through a major life event, like graduating from college or getting a new job or moving to a new town, their shopping habits became flexible in ways that were both predictable and potential gold mines for retailers. The study found that when someone marries, he or she is more likely to start buying a new type of coffee. When a couple move into a new house, they're more apt to purchase a different kind of cereal. When they divorce, there’s an increased chance they'll start buying different brands of beer.
These findings turned out to be the backbone of work by statistician Andrew Pole, who was hired by Target to analyze their data and increase sales. Somewhere along the way, the marketing department at Target asked Pole if there was a way to predict that a customer was expecting a child. Birth records are freely available, so it's easy to send baby-related coupons and advertisements to new mothers, but Target wanted first dibs, before that baby came out.
As you might expect, Pole found 25 products that were strong indicators, and soon he had a pregnancy prediction score for estimating which shoppers were likely pregnant.
Pole applied his program to every regular female shopper in Target's national database and soon had a list of tens of thousands of women who were most likely pregnant. If they could entice those women or their husbands to visit Target and buy baby-related products, the company’s cue-routine-reward calculators could kick in and start pushing them to buy groceries, bathing suits, toys and clothing, as well. When Pole shared his list with the marketers, he said, they were ecstatic. Soon, Pole was getting invited to meetings above his paygrade. Eventually his paygrade went up.
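For the curious, here is a minimal sketch of what a score built from indicator products could look like. None of this comes from Target or Pole: the product names, weights, baseline, and the choice of a logistic function are all assumptions made up for illustration.

```python
import math

# Hypothetical indicator products and weights. The actual 25 products and
# how they were weighted are not given in the article.
WEIGHTS = {
    "product_a": 1.2,
    "product_b": 0.9,
    "product_c": 0.6,
}
BASELINE = -2.0  # assumed: with no indicators in the basket, the score is low


def prediction_score(basket):
    """Map the indicator products found in a basket to a score in (0, 1)."""
    total = BASELINE + sum(w for item, w in WEIGHTS.items() if item in basket)
    return 1.0 / (1.0 + math.exp(-total))


# Made-up shoppers and baskets.
shoppers = {
    "shopper_1": {"product_a", "product_b", "product_c", "milk"},
    "shopper_2": {"milk", "bread"},
}
scores = {name: prediction_score(basket) for name, basket in shoppers.items()}

# Highest score first: this is the kind of list the marketers would receive.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Sorting by score is what turns a per-shopper number into a mailing list.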
Creepy or just good marketing? I say the latter.
Nate Silver looks at past players who have scored 20 or more points, had 6 or more assists, and shot better than 50 percent in four or more games in a row. It's an illustrious list of all-stars, including Jordan, Bird, and Magic, with only a handful who were just so-so.
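The filter is easy enough to sketch. This is not Silver's code, and the box scores below are made up; the field names (`points`, `assists`, `fg_pct`) are my own assumptions. It just finds the longest run of consecutive games that clear all three thresholds.

```python
def longest_streak(games, min_points=20, min_assists=6, min_fg_pct=0.5):
    """Length of the longest run of consecutive games meeting every threshold.

    Points and assists use "or more"; shooting must be strictly better
    than min_fg_pct, matching "shot better than 50 percent".
    """
    best = current = 0
    for game in games:
        qualifies = (
            game["points"] >= min_points
            and game["assists"] >= min_assists
            and game["fg_pct"] > min_fg_pct
        )
        current = current + 1 if qualifies else 0
        best = max(best, current)
    return best


# Made-up box scores for one hypothetical player, in game order.
games = [
    {"points": 21, "assists": 6, "fg_pct": 0.55},
    {"points": 30, "assists": 9, "fg_pct": 0.60},
    {"points": 22, "assists": 7, "fg_pct": 0.51},
    {"points": 24, "assists": 6, "fg_pct": 0.52},
    {"points": 10, "assists": 3, "fg_pct": 0.40},  # streak ends here
    {"points": 26, "assists": 8, "fg_pct": 0.50},  # exactly 50 percent: no
]

makes_the_list = longest_streak(games) >= 4
```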
Like everyone else, I was skeptical. I saw him play with the Warriors, and it was never that impressive. However, watching last night's game against the Lakers, it was hard not to buy into Linsanity. We'll see if he can extend the streak tonight against Minnesota, but even if the Knicks do win, should we read that much into it? Remember, there aren't that many other scoring options on the Knicks right now, two of the past four wins were against horrible teams (New Jersey and Washington), and the other two were against the Lakers and the Jazz, teams just slightly above .500.
Data science has been covered at length during the past couple of years, and we tend to think of it as a field of study just a couple of years older than that. Jeff Hammerbacher and DJ Patil have played roles in further propagating the term as an actual profession in roughly the same timespan. So I was surprised to come across this rarely-cited 2001 paper by statistician William Cleveland, Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics [pdf].
This document describes a plan to enlarge the major areas of technical work of the field of statistics. Because the plan is ambitious and implies substantial change, the altered field will be called "data science."
You would think that something so concrete, carefully recorded by authorities, wouldn't be too tough to tabulate, even if at a large scale. Not so.
Homicide is a "serious crime that many people are concerned with, it is well-measured, and it is to a large degree well-reported and -recorded," says Alfred Blumstein, a criminologist at Carnegie Mellon University. "That is not to say that there aren't a variety of ways for fudging the measurement."
Among the factors that cloud homicide numbers: gaps between police-reported numbers and counts by public-health organizations. The discrepancy is wide in many African countries and some Caribbean ones. The United Nations attributes the disparity to several factors, including definitional differences—whether honor killings should count—a lack of public-health infrastructure in some countries, and undercounting—possibly deliberate—by police.
I think this is something the general public often doesn't understand about data. The numbers are entered and analyzed on a computer, so it's easy to mistake data for mechanical output. It must be accurate, right? That's usually not the case though, especially when it comes to data collection outside a controlled lab setting.
The game always changes when humans are involved. Not everyone responds to surveys, definitions of events vary across organizations, estimation methods change every year, and the list goes on.
If you work with data, you have to deal with that uncertainty, and as a data consumer, you have to remember that numbers don't automatically mean fact.
I thought this riveting post on the New York Times Bits blog about the rise of the toilet texter deserved a graphic. Since their graphics department is no doubt busy with elections, I took the liberty. I am — the 91 percent.
I got the numbers straight from the Bits post, but you can download the full report from 11mark for all the demographics. You have to register though, and I didn't want to be the guy who creates an online account to just read a report on what people do while they make dooty. I have standards.
Data is hot right now, so as you would expect, more people are signing up and applying to learn about it. Quentin Hardy for The New York Times reports.
At North Carolina State, an advanced analytics program lasting 10 months has, since its founding in 2006, placed over 90 percent of its students annually. The average graduate’s starting salary for an entry-level job is $73,000. Its current class of 40 students had 185 applicants, and next year’s applications are already twice that. In 2009, Harvard awarded four undergraduate degrees in statistics. Two graduates went into finance, one to political polling and one became a substitute teacher. There were nine graduates in 2010, 13 last year. They headed into Google, biosciences and Wall Street, as well as Stanford's literature department.
And in 2011, just about everywhere.
Priceonomics takes the association between fixie bikes and hipsters and creates the Fixie Bike Index. After starting with New York, they branch out to national numbers.
In short, fixed gear bikes = hipsters, and New York boroughs that have more fixies per capita should have more hipsters per capita. We sampled our data to see the number of used bikes for sale per capita in each borough with the term "fixie" or "fixed gear" in the product title to create the Fixie Index.
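The index itself is just a keyword match and a division. Here is a rough sketch of that calculation, not Priceonomics' actual code: the listings, area names, and populations are invented, and scaling to listings per 100,000 residents is my own choice.

```python
KEYWORDS = ("fixie", "fixed gear")


def fixie_index(listings, population, per=100_000):
    """Matching listings per `per` residents, for each area.

    listings:   iterable of (area, listing_title) pairs
    population: dict mapping area -> number of residents
    """
    counts = {}
    for area, title in listings:
        if any(keyword in title.lower() for keyword in KEYWORDS):
            counts[area] = counts.get(area, 0) + 1
    return {
        area: per * counts.get(area, 0) / residents
        for area, residents in population.items()
    }


# Made-up listings and populations for two hypothetical areas.
listings = [
    ("area_x", "Fixie bike, barely used"),
    ("area_x", "Road bike, 21 speed"),
    ("area_x", "FIXED GEAR track frame"),
    ("area_y", "Fixed gear commuter"),
    ("area_y", "Mountain bike"),
]
population = {"area_x": 200_000, "area_y": 50_000}

index = fixie_index(listings, population)
```

Note that area_y has fewer fixie listings but the higher index, which is the whole point of going per capita.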
I don't know about these numbers. I lived in Modesto for a year and don't remember people riding bikes — or hipsters, and riding your bike in Los Angeles kind of sucks.
In working with tenants to help their city attorney convict a group of slumlords, an economic justice organization collected public data on housing violations that were going unfixed. They tried standard mind mapping and organization software, but the relationships were too complex to unearth anything useful. So they eventually used social network analysis, revealing money exchanging hands in such a way that allowed owners to strip the value from buildings without actually fixing them.
The analysis results, combined with the city's investigation, led to key convictions and court-awarded funds for tenants to move elsewhere.
Sounds like a good reason for Data Without Borders.
Legos are the best toys ever invented. That's an indisputable fact. So it's no surprise that Mark Changizi et al. at Duke University used the toys in their study of the growing complexity of systems and networks. They looked at 389 Lego sets and compared the number of pieces in the set to the number of piece types, as shown above.
Tarot cards don't cut it anymore as predictors. We turn to data for a look into the future:
"We're finally in a position where people volunteer information about their specific activities, often their location, who they're with, what they're doing, how they're feeling about what they're doing, what they're talking about," said Johan Bollen, a professor at the School of Informatics and Computing at Indiana University Bloomington who developed a way to predict the ups and downs of the stock market based on Twitter activity. "We've never had data like that before, at least not at that level of granularity." Bollen added: "Right now it’s a gold rush."
Or you could just get yourself a flux capacitor and save yourself some time.
Team lead David Ferrucci recalls the early days of putting together the team that built Watson:
Likewise, the scientists would have to reject an ego-driven perspective and embrace the distributed intelligence that the project demanded. Some were still looking for that silver bullet that they might find all by themselves. But that represented the antithesis of how we would ultimately succeed. We learned to depend on a philosophy that embraced multiple tracks, each contributing relatively small increments to the success of the project.
As I sit here reading about egos within IBM, with the NFL playoffs in front of me, I can't help but smirk.
Jon Kleinberg, whose work influenced Google's PageRank, is working on ranking something else. Kleinberg et al. developed an algorithm that ranks people, based on how they speak to each other.
"We show that in group discussions, power differentials between participants are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to," say Kleinberg and co.
The key to this is an idea called linguistic co-ordination, in which speakers naturally copy the style of their interlocutors. Human behaviour experts have long studied the way individuals can copy the body language or tone of voice of their peers, some have even studied how this effect reveals the power differences between members of the group.
Now Kleinberg and co say the same thing happens with language style.
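The excerpt doesn't say how the echoing is measured, so take this as one simple way to put a number on it, not the authors' method. The assumption: pick a class of style words (articles here), then compare how often a reply uses them when the preceding utterance did against how often replies use them overall. The exchanges are made up.

```python
def contains_marker(text, markers):
    """True if any word in text, lowercased and stripped of punctuation,
    belongs to the marker set."""
    return any(word.strip(".,!?") in markers for word in text.lower().split())


def coordination(exchanges, markers):
    """P(reply uses marker | utterance used it) minus P(reply uses marker).

    exchanges: list of (utterance, reply) pairs from one replier.
    Returns None when the conditional probability is undefined.
    """
    reply_has = [contains_marker(reply, markers) for _, reply in exchanges]
    after_trigger = [
        has
        for (utterance, _), has in zip(exchanges, reply_has)
        if contains_marker(utterance, markers)
    ]
    if not after_trigger:
        return None
    return sum(after_trigger) / len(after_trigger) - sum(reply_has) / len(reply_has)


ARTICLES = {"a", "an", "the"}

# Made-up (utterance, reply) pairs.
exchanges = [
    ("Send me the report.", "I will send the report today."),
    ("Is the draft ready?", "The draft is ready."),
    ("Thanks.", "No problem."),
    ("Call me later.", "Sure, will do."),
]

score = coordination(exchanges, ARTICLES)
```

A positive score means the replier uses articles more often right after hearing them than they do in general. Under the quoted claim, you would expect bigger scores from people talking to someone with more power.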
That's why I just don't talk at all. Introvert to the max.
Carl Bialik, for The Wall Street Journal, reports on PSAs and the use of scary numbers:
The Ad Council usually avoids statistics in PSAs. "We know from our experience that effective advertising has to have an emotional component and statistics-based campaigns can be very rational," Conlon said. "We’ve also found that people tend not to believe statistics."
And sometimes they just don’t care much about them. "When we were developing our underage drinking prevention campaign," Conlon recalled, "we found that it doesn't resonate with parents to learn about how many children are drinking underage. It's too easy for them to say 'it's not my child.' We found that it was much more compelling to include a statistic that was more about the consequences of underage drinking: Those who start drinking before age 15 are six times more likely to have alcohol problems as adults than those who start drinking at age 21 or older."
The well-known Stalin quote comes to mind.
Stop global warming. Decrease the National Science Foundation's R&D budget. It's so easy. More lessons on correlation and causation found here.
Facebook logs and saves a lot of data about you and what you do on their site. This shouldn't be surprising given that the more time people spend on Facebook, the greater the cash flow, but just how much data do they store? Because European law gives citizens the right to do so, Austrian law student Max Schrems requested all the data Facebook had about him. He got back a CD with 1,222 PDF files.
Charts and graphs are great, because they can let you see a pattern that you might not see in a spreadsheet, but they only work when you use the actual data. Fox News isn't doing itself any favors by putting up this chart. It shows the recently announced drop in unemployment rate to 8.6 percent as a non-change.
Testing the idea of six degrees of separation, first proposed by Frigyes Karinthy, the Facebook Data Team and researchers at the Università degli Studi di Milano found that most of us are connected by even fewer degrees, and average separation is getting smaller:
While we will never know if it was true in 1929, the scale and international reach of Facebook allows us to finally perform this study on a global scale. Using state-of-the-art algorithms developed at the Laboratory for Web Algorithmics of the Università degli Studi di Milano, we were able to approximate the number of hops between all pairs of individuals on Facebook. We found that six degrees actually overstates the number of links between typical pairs of users: While 99.6% of all pairs of users are connected by paths with 5 degrees (6 hops), 92% are connected by only four degrees (5 hops). And as Facebook has grown over the years, representing an ever larger fraction of the global population, it has become steadily more connected. The average distance in 2008 was 5.28 hops, while now it is 4.74.
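At Facebook's scale you need the approximation algorithms mentioned in the quote, but on a tiny graph you can count hops exactly with a breadth-first search. The sketch below does that. It is not the researchers' method, and the friendship graph is made up.

```python
from collections import deque


def hops_from(graph, source):
    """Breadth-first search: fewest hops from source to each reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def average_hops(graph):
    """Mean hop count over all connected pairs of distinct nodes."""
    total = pairs = 0
    for source in graph:
        for target, hops in hops_from(graph, source).items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs if pairs else None


# Made-up friendships: a chain of four people, each link mutual.
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
}

avg = average_hops(friends)
```

Each pair gets counted twice, once from each end, which leaves the average unchanged.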
So when you see random strangers, shake their hands and say hello. You're practically best friends.
Too bad there isn't an interactive where we can enter random names to see how close we are.
There's so much emphasis and attention on Black Friday, the day of sales after Thanksgiving in the States. People line up for hours before stores open at midnight in hopes that they'll be able to get the best deal, but it looks like Black Friday isn't even the day to get the best deals:
For higher-end electronics, Mr. de Grandpre’s trends show, shoppers should wait until the week after Thanksgiving.
"Black Friday is about cheap stuff at cheap prices, and I mean cheap in every connotation of the word," Mr. de Grandpre said. Manufacturers like Dell or HP will allow their cheap laptops to be discounted via retailers on that Friday, but they will reserve markdowns through their own sites for later.
When later? Cyber Monday is a good day to buy.
On a whim, we found ourselves at a midnight Black Friday at the mall. I was like, "Eh, it shouldn't be that busy this late at night." So wrong. The avoidance of large crowds is enough of an incentive for me to wait. Although if I were a young, teenage girl in the market for a nice pair of boots, I suppose I might sing a different tune.
Saturday Morning Breakfast Cereal on significant digits and statisticians' natural disbelief in numbers. Life is so hard. [Thanks, Michael]