  • Lego mathematics and growing complexity in networks

    January 12, 2012  |  Statistics

    lego curve

    Legos are the best toys ever invented. That's an indisputable fact. So it's no surprise that Mark Changizi et al. at Duke University used the toys in their study of the growing complexity of systems and networks. They looked at 389 Lego sets and compared the number of pieces in each set to the number of piece types, as shown above.

  • Predicting the future of prediction

    January 9, 2012  |  Statistics

    Tarot cards don't cut it anymore as predictors. We turn to data for a look at the future:

    "We're finally in a position where people volunteer information about their specific activities, often their location, who they're with, what they're doing, how they're feeling about what they're doing, what they're talking about," said Johan Bollen, a professor at the School of Informatics and Computing at Indiana University Bloomington who developed a way to predict the ups and downs of the stock market based on Twitter activity. "We've never had data like that before, at least not at that level of granularity." Bollen added: "Right now it’s a gold rush."

    Or you could just get yourself a flux capacitor and save yourself some time.

    [Boston]

  • Teamwork and collaboration that built Watson

    January 8, 2012  |  Statistics

    Team lead David Ferrucci recalls the early days of putting together the team that built Watson:

    Likewise, the scientists would have to reject an ego-driven perspective and embrace the distributed intelligence that the project demanded. Some were still looking for that silver bullet that they might find all by themselves. But that represented the antithesis of how we would ultimately succeed. We learned to depend on a philosophy that embraced multiple tracks, each contributing relatively small increments to the success of the project.

    As I sit here reading about egos within IBM, with the NFL playoffs in front of me, I can't help but smirk.

    [New York Times via Simply Statistics]

  • Algorithm estimates who’s in control

    January 4, 2012  |  Statistics

    Jon Kleinberg, whose work influenced Google's PageRank, is working on ranking something else. Kleinberg et al. developed an algorithm that ranks people, based on how they speak to each other.

    "We show that in group discussions, power differentials between participants are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to," say Kleinberg and co.

    The key to this is an idea called linguistic co-ordination, in which speakers naturally copy the style of their interlocutors. Human behaviour experts have long studied the way individuals can copy the body language or tone of voice of their peers, some have even studied how this effect reveals the power differences between members of the group.

    Now Kleinberg and co say the same thing happens with language style.

    That's why I just don't talk at all. Introvert to the max.

    [Technology Review]

  • When numbers are too factual

    December 19, 2011  |  Statistics

    Carl Bialik, for The Wall Street Journal, reports on PSAs and the use of scary numbers:

    The Ad Council usually avoids statistics in PSAs. "We know from our experience that effective advertising has to have an emotional component and statistics-based campaigns can be very rational," Conlon said. "We’ve also found that people tend not to believe statistics."

    And sometimes they just don’t care much about them. "When we were developing our underage drinking prevention campaign," Conlon recalled, "we found that it doesn't resonate with parents to learn about how many children are drinking underage. It's too easy for them to say 'it's not my child.' We found that it was much more compelling to include a statistic that was more about the consequences of underage drinking: Those who start drinking before age 15 are six times more likely to have alcohol problems as adults than those who start drinking at age 21 or older."

    The well-known Stalin quote comes to mind.

    [The Numbers Guy]

  • Causation is real, people

    December 15, 2011  |  Statistics

    amusing correlations

    Stop global warming. Decrease the National Science Foundation's R&D budget. It's so easy. More lessons on correlation and causation found here.

  • What Facebook knows about you

    December 14, 2011  |  Data Sources

    Facebook privacy

    Facebook logs and saves a lot of data about you and what you do on their site. This shouldn't be surprising, given that the more time people spend on Facebook, the greater the cash flow. But just how much data do they store? Austrian law student Max Schrems requested all the data Facebook had about him, which European law entitles citizens to do. He got back a CD with 1,222 PDF files.

  • Fox News still makes awesome charts

    December 12, 2011  |  Mistaken Data

    unemployment chart by fox news

    Charts and graphs are great, because they can let you see a pattern that you might not see in a spreadsheet, but they only work when you use the actual data. Fox News isn't doing itself any favors by putting up this chart. It shows the recently announced drop in the unemployment rate to 8.6 percent as a non-change.

  • Four degrees of separation

    November 30, 2011  |  Statistics

    Facebook four degrees of separation

    Testing the idea of six degrees of separation, first proposed by Frigyes Karinthy, the Facebook Data Team and researchers at the Università degli Studi di Milano found that most of us are connected by even fewer degrees, and average separation is getting smaller:

    While we will never know if it was true in 1929, the scale and international reach of Facebook allows us to finally perform this study on a global scale. Using state-of-the-art algorithms developed at the Laboratory for Web Algorithmics of the Università degli Studi di Milano, we were able to approximate the number of hops between all pairs of individuals on Facebook. We found that six degrees actually overstates the number of links between typical pairs of users: While 99.6% of all pairs of users are connected by paths with 5 degrees (6 hops), 92% are connected by only four degrees (5 hops). And as Facebook has grown over the years, representing an ever larger fraction of the global population, it has become steadily more connected. The average distance in 2008 was 5.28 hops, while now it is 4.74.

    So when you see random strangers, shake their hands and say hello. You're practically best friends.

    Too bad there isn't an interactive where we can enter random names to see how close we are.
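
    For a rough sense of what "average distance in hops" means, here is a minimal sketch on a toy graph. The graph, names, and function names are made up for illustration. It runs an exact breadth-first search from every person, which only works on small networks; the Facebook study approximated the distances instead, so this is not their method.

```python
from collections import deque

def hops_from(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def average_hops(graph):
    """Mean hop count over all ordered pairs of distinct, connected nodes."""
    total = 0
    pairs = 0
    for source in graph:
        for target, d in hops_from(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical friendship graph: a chain of four people.
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
}

print(average_hops(friends))
```

    In the quoted passage, degrees are counted as one less than hops, so a pair that is 5 hops apart is separated by four degrees.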

    [Facebook]

  • Finding best deals: Black Friday is for retailers

    November 28, 2011  |  Statistics

    There's so much emphasis and attention on Black Friday, the day of sales after Thanksgiving in the States. People line up for hours before stores open at midnight in hopes that they'll be able to get the best deal, but it looks like Black Friday isn't even the day to get the best deals:

    For higher-end electronics, Mr. de Grandpre’s trends show, shoppers should wait until the week after Thanksgiving.

    "Black Friday is about cheap stuff at cheap prices, and I mean cheap in every connotation of the word,” Mr. de Grandpre said. Manufacturers like Dell or HP will allow their cheap laptops to be discounted via retailers on that Friday, but they will reserve markdowns through their own sites for later.

    When later? Cyber Monday is a good day to buy.

    On a whim, we found ourselves at a midnight Black Friday at the mall. I was like, "Eh, it shouldn't be that busy this late at night." So wrong. The avoidance of large crowds is enough of an incentive for me to wait. Although if I were a young, teenage girl in the market for a nice pair of boots, I suppose I might sing a different tune.

    [New York Times via @drewconway]

  • Statisticians and significant digits

    November 23, 2011  |  Statistics

    Saturday Morning significance

    Saturday Morning Breakfast Cereal on significant digits and statisticians' natural disbelief in numbers. Life is so hard. [Thanks, Michael]

  • Analysis of Steve Jobs tribute messages

    October 25, 2011  |  Statistics

    Apple has a page dedicated to Steve Jobs that displays messages from friends, colleagues, and fans. Neil Kodner downloaded those messages and extracted overall themes:

    I wanted to see what how people were speaking about Steve Jobs and especially what terms were being used to describe him. There was no point in performing sentiment analysis on this text as all of the texts were not only obviously positive but were also vetted by Apple for content. Using NLTK, I performed part-of-speech tagging on every word in each tribute message and then wrote some code to total the adjectives and adverbs used in the tribute messages.

    The top descriptors? Not surprisingly: great, many, first, sad, better, best, and visionary. About one in five messages referenced an Apple product.

    The message data and Kodner's code are available on GitHub.
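
    Kodner's actual code is in his repository. The following is only a rough sketch of the tallying step he describes, assuming Penn Treebank-style tags where adjectives start with JJ and adverbs with RB. The function names, the toy tagger, and the sample messages are all hypothetical; the toy tagger stands in for a real one so the example runs without any downloads.

```python
from collections import Counter

# Penn Treebank tag prefixes: JJ* = adjectives, RB* = adverbs.
DESCRIPTOR_PREFIXES = ("JJ", "RB")

def count_descriptors(messages, tag_message):
    """Total the adjectives and adverbs across messages.

    tag_message is any callable that maps a string to (word, tag) pairs.
    With NLTK installed and its tokenizer and tagger data downloaded,
    one option is:
        lambda text: nltk.pos_tag(nltk.word_tokenize(text))
    """
    counts = Counter()
    for message in messages:
        for word, tag in tag_message(message):
            if tag.startswith(DESCRIPTOR_PREFIXES):
                counts[word.lower()] += 1
    return counts

def toy_tagger(message):
    """Hypothetical stand-in tagger: looks words up in a tiny lexicon
    and tags everything else as a noun."""
    lexicon = {"great": "JJ", "sad": "JJ", "visionary": "JJ", "truly": "RB"}
    return [(word, lexicon.get(word.lower(), "NN")) for word in message.split()]

# Made-up messages, not taken from the tribute page.
messages = ["A truly great man", "Great and visionary", "So sad"]

descriptor_counts = count_descriptors(messages, toy_tagger)
print(descriptor_counts.most_common(3))
```

    Passing the tagger in as an argument keeps the counting logic separate from whichever tagging library is used.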

    [Thanks, Guy]

  • When data guys triumph

    October 12, 2011  |  Statistics

    Cade Massey and Bob Tedeschi for The New York Times on the book, now turned movie, "Moneyball" and how it's made data-backed thinking sound less crazy:

    At its heart, of course, "Moneyball" isn’t about baseball. It’s not even about statistics. Rather, it’s about challenging conventional wisdom with data. By embedding this lesson in the story of Billy Beane and the Oakland A's, the book has lured millions of readers into the realm of the geek. Along the way, it converted many into empirical evangelists.

    Good. Sure makes my life a lot easier.

    Is the movie worth the 2 hours and 10 bucks in the theatre? The movie seems right up my alley, but for some reason the previews left me uninterested.

    [New York Times via @alexlundry]

  • A global mood ring called Twitter

    October 7, 2011  |  Statistics

    Twitter moods

    In a follow-up to their mood maps, Scott Golder and Michael Macy of Cornell University look at mood cycles during the hours of the day:

    They found that, on average, people wake up in a good mood, which falls away over the course of the day. Positive feelings peak early in the morning and again nearer midnight, while negative feelings peak between 9pm and 3am. Unsurprisingly, people get happier as the week goes on. They’re most positive on Saturdays and Sundays and they tend to lie in for an extra two hours, as shown by the delayed peak in their positive feelings. The United Arab Emirates provide an interesting exception. There, people work from Sunday to Thursday, and their tweets are most positive on Friday and Saturday.

    It's strange that good mood peaks around midnight. Maybe the people who are in a bad mood slowly go to sleep, leaving only those in a good mood to tweet. Then again, negative mood also seems to peak around midnight. Peculiar. I don't have access to the full article, so if anyone does, I'd be interested to hear Golder and Macy's interpretations.

    [Discover Magazine via @albertocairo]

  • PDF data woes

    September 14, 2011  |  Data Sharing

    We do not provide these tables in Excel or CSV format. You will have to cut and paste from the pdf.

    — A government group that provides a lot of data

    If you're going to provide a dataset to the public, or anyone for that matter, please don't use PDF as your one and only format. At the very least, provide it in Excel. You can easily export spreadsheets to PDF. I don't hold anything against the person who sent me this message. She was just doing her job. But organizations need to get with the times and provide data in a way that is actually usable.

  • Teaching math with context and applications

    September 1, 2011  |  Statistics

    Most of us have gone through the paces of algebra through calculus in high school. I remember lots of problems and fact sheets. Sol Garfunkel and David Mumford imagine a math education system that teaches skills for the real world and increases quantitative literacy:

    Imagine replacing the sequence of algebra, geometry and calculus with a sequence of finance, data and basic engineering. In the finance course, students would learn the exponential function, use formulas in spreadsheets and study the budgets of people, companies and governments. In the data course, students would gather their own data sets and learn how, in fields as diverse as sports and medicine, larger samples give better estimates of averages. In the basic engineering course, students would learn the workings of engines, sound waves, TV signals and computers. Science and math were originally discovered together, and they are best learned together now.

    [New York Times]

  • Geo API from Infochimps brings you closer to mapping fun

    August 31, 2011  |  Data Sources

    Summarizer from infochimps

    Mostly because of the popularity of smartphones, location data is all the rage nowadays. You're almost always connected no matter where you are. Rich location data can help provide you a new sense of place, and at the same time, this sort of data can paint an interesting picture of what's going on in your country or around the world. Hence, Infochimps, the one-stop shop for data folk and developers, just announced their new Geo API.

  • Why one death is more moving than a million

    August 31, 2011  |  Statistics

    We read the story about the suffering of an individual, and we're moved. We read in the paper that millions have died over the years due to hunger, and we're not quite as moved. This is due in part to our inability to imagine big numbers, but as David Ropeik for Psychology Today explains, the way we perceive risk also is a factor:

    Paul Slovic, one of the pioneers of research into the way we perceive risk, calls this greater concern for the one than the many "a fundamental deficiency in our humanity." As the world watches but, insufficiently moved, fails to act to prevent mass starvation or stop genocides in Congo or Kosovo or Cambodia or so many more, who would not agree with such a lament. But as heartless as it seems to care more about the one than the many, it makes perfect sense in terms of human psychology. You are a person, not a number. You don't see digits in the mirror, you see a face. And you don't see a crowd. You see an individual. So you and I relate more powerfully to the reality of a single person than to the numbing faceless nameless lifeless abstraction of numbers. "Statistics," as Slovic put it in a paper titled "Psychic Numbing and Genocide", "are human beings with the tears dried off." This tendency to relate more emotionally to the reality of a single person than to two or more people, or to the abstraction of statistics, is especially powerful when it comes to the way we perceive risk and danger, because what might happen to a single real person, might happen to you. As the familiar adage puts it, "There but for the grace of God go I."

    [Psychology Today via @alexlundry]

  • Reporters make it easier to access Census data

    August 29, 2011  |  Data Sources

    Census data can provide valuable information, but the datasets are not always the easiest to access. So you often end up spending a lot of time getting your data in order before you actually get to do anything with it. Investigative Reporters and Editors has released the next phase in their Census project to make Census 2010 more accessible via a simple interface. Easily download data in bulk as CSV or shapefiles or build it into your applications with the API.

    [census.ire.org via @bryanboyer]

  • Statisticians as a tribe

    August 23, 2011  |  Statistics

    Peter Curran for BBC Radio 4 puts the tribe of statisticians under the anthropological microscope. At the Royal Statistical Society Awards and Summer Reception, Curran interviews a number of statisticians on what they do and what statistics is really about. I mainly post this, though, for the part where he whispers about what he is seeing as if he were in a jungle studying a tribe of monkeys. Cracked me up.

    What's statistics to you?

    [BBC via @TimHarford]

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.