  • How do people use Firefox?

    November 30, 2010  |  Data Sources, News

    Mozilla Labs just released a bunch of anonymized browsing data for their open data visualization competition:

    This competition is based on Mozilla's own open data program, Test Pilot. Test Pilot is a user research platform that collects structured user data through Firefox. All data is gathered through pre-defined Test Pilot studies, which aim to explore how people use their web browser and the Internet.

    There are two datasets in various formats. The first is browsing behavior from 27,000 users, including the private browsing on/off data that we saw a few months ago. The second dataset is from 160,000 users and covers how they actually use the Firefox interface.

    Additionally, both sets have survey answers to questions like "How long have you used Firefox?" which could make for some fun and interesting breakdowns.

    The deadline is December 17.

    [Mozilla Labs]

  • Statistics vs. Stories

    November 29, 2010  |  Statistics

    John Allen Paulos, professor of mathematics at Temple University, describes the differences between statistics and stories:

    [T]here is a tension between stories and statistics, and one under-appreciated contrast between them is simply the mindset with which we approach them. In listening to stories we tend to suspend disbelief in order to be entertained, whereas in evaluating statistics we generally have an opposite inclination to suspend belief in order not to be beguiled.

    And he concludes:

    The focus of stories is on individual people rather than averages, on motives rather than movements, on point of view rather than the view from nowhere, context rather than raw data. Moreover, stories are open-ended and metaphorical rather than determinate and literal.

    Which way do we go when we start telling stories with data?

    [New York Times via @joandimicco]

  • R is the need-to-know stat software

    November 17, 2010  |  Software, Statistics

    This Forbes post on the greatness that is R is being passed around by every statistician and his mother today.

    It's not that this type of analysis wasn't possible before — statisticians have existed, and commercial software has been available to support them, for decades. The fact that R is free to use, free to modify, and its source is open to view, extend and improve means students, stock traders-in-training and fantasy football junkies can familiarize themselves with the software. They can write programs against it. They're likely to continue that usage into their professional lives. When they share their work, the community, down the line, benefits. And the virtuous cycle strengthens.

    What's your favorite (graphical) use of R?

  • Simple analysis makes Expedia extra $12m

    November 5, 2010  |  Statistics

    There was a problem on Expedia where a lot of people were choosing their itinerary, entering their information and then dropping off after they clicked on the Buy Now button. It's like getting to the cash register at a store, and the cashier says they can't take your money.

    So analysts took a look and found that the field to enter your company was confusing people, leading to the input of an incorrect address. "After we realised that we just went onto the site and deleted that field — overnight there was a step function [change], resulting in $12m of profit a year, simply by deleting a field."

    Not bad for a little bit of data digging. I hope the analysts got a bonus.

    That said, not every decision has to be driven by data. Balance is good.

    [Silicon via @jpmarcum]

  • Stat concepts to the tune of Gershwin

    October 29, 2010  |  Statistics

    Stat people will probably find this amusing. For the rest, this might make your head explode. Gurdeep Stephens and Michael Greenacre perform classic songs but use statistical concepts for lyrics. Here's Summertime, originally by George Gershwin, turned into a song about statistical modeling (video below).

    It's summertime,
    Statistical modelling is easy,
    Data are fitting,
    Explained variance is high.
    Your data are rich,
    And your model's good-looking,
    So hush, statisticians, don't you cry...


  • Opportunities in Government 2.0

    October 27, 2010  |  Data Sources

    Vivek Wadhwa talks government data and the (financial) opportunities ripe for the picking:

    What is happening with the opening up of government data is nothing less than a silent revolution. There are literally thousands of new opportunities to improve government and to improve society—and to make a fortune while doing it. Unlike the Web 2.0 space, which is overcrowded, Gov 2.0 is uncharted territory: a new frontier to explore, grow things on, and settle on. It’s fresh soil for unlikely seedling ideas that, if they take root, could lead to very successful ventures. So I encourage entrepreneurs to stake their claims as soon as they can.

    Wait a minute. Hold up. You can do more with government data than awkward dashboards? Bring it.

    [TechCrunch via @ucdatalab]

  • A different analytical wall

    October 14, 2010  |  Statistics

    In reference to the wall between reporting data and understanding it, Martin Theus proposes a different one:

    Once you start to explore the data, the whole thing stops to be linear but gets to be very iterative, jumping over the wall every now and then. I.e., you may find out that the data cleaning is insufficient, or the model you have in mind needs some other transformation of the data, or you might want to collect additional or other data altogether.

    The wall does exist, but I think it is more separating two kinds of people / thinking.

    Theus finishes:

    One thing is for sure: we won’t succeed if analysts continue to build useful but technically insufficient tools and computer scientists still build fancy tools that merely help the analysts.

    Or even better: analyst and tool builder become the same person. That'll take much longer though, so communication is a good place to start.

    [Theusrus]

  • OkCupid explores gay and straight stereotypes

    October 12, 2010  |  Statistics

    Online dating site OkCupid dives into their data for 3.2 million users again, this time to explore gay and straight stereotypes. Many are false. Some are true. Among the findings: who's gay curious in the United States and who thinks the earth is bigger than the sun.

  • The simple truth about statistics

    October 10, 2010  |  Quicklinks, Statistics

    Matt Parker explains why no one should be fooled by a misuse of statistics just like no one was fooled by "I did not have sexual relations with that woman."

  • The real stuff white people like

    September 13, 2010  |  Statistics

    Stuff white males like according to OkCupid

    Online dating site OkCupid continues their run of amusing yet thorough analysis of their users. This time: the real stuff white people like. Well actually, the stuff that all races like:

    We selected 526,000 OkCupid users at random and divided them into groups by their (self-stated) race. We then took all these people's profile essays (280 million words in total!) and isolated the words and phrases that made each racial group's essays statistically distinct from the others'.
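    The quote above does not say exactly how OkCupid scored "statistically distinct" phrases, so the following is only a minimal sketch of one common approach: compare how often a word appears in one group's essays against how often it appears in everyone else's, with add-one smoothing. The function name `distinctive_words`, the smoothing choice, and the toy essays are all assumptions for illustration, not OkCupid's method or data.

```python
from collections import Counter


def distinctive_words(essays_by_group, top_n=3, smoothing=1.0):
    """Rank words by smoothed relative frequency in a group vs. all other groups.

    essays_by_group maps a group label to a list of essay strings.
    This is an illustrative scoring rule, not OkCupid's actual method.
    """
    # Word counts per group, using naive whitespace tokenization.
    counts = {
        group: Counter(word for text in texts for word in text.lower().split())
        for group, texts in essays_by_group.items()
    }
    vocab = set().union(*counts.values())

    result = {}
    for group, own in counts.items():
        # Pool the counts of every other group.
        rest = Counter()
        for other, other_counts in counts.items():
            if other != group:
                rest.update(other_counts)

        own_total = sum(own.values()) + smoothing * len(vocab)
        rest_total = sum(rest.values()) + smoothing * len(vocab)

        # Ratio > 1 means the word is over-represented in this group.
        scores = {
            word: ((own[word] + smoothing) / own_total)
            / ((rest[word] + smoothing) / rest_total)
            for word in own
        }
        # Sort by score descending, breaking ties alphabetically.
        result[group] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return result


# Hypothetical toy essays, invented purely to exercise the function.
toy_essays = {
    "a": ["i love hiking and hiking trips", "hiking is great"],
    "b": ["i love cooking", "cooking and baking is great"],
}
print(distinctive_words(toy_essays, top_n=2))
```

    Words that both groups use about equally ("love", "great") score near 1 and drop out, which is the basic idea behind isolating what makes each group's essays distinct. A real analysis would work on phrases as well as single words and on far more text.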

    Top phrase for white males? Tom Clancy. White female? The Red Sox. Black males? Soul food. Black females? Soul food. Asian males? Taiwan. Asian females? Coz. Yeah, I don't know what that is either.

    [Thanks, John]

  • Various ways to rate a college

    September 8, 2010  |  Network Visualization, Statistics

    Measures for different college ratings

    There are a bunch of college ratings out there to help students decide what college to apply to (and to give alumni something to gloat about). The tough part is that there doesn't seem to be any agreement on what makes a good college. Alex Richards and Ron Coddington describe the discrepancies.

  • Simple data converter from Excel

    September 6, 2010  |  Online Applications, Statistics

    If you've ever created an interactive graphic or anything else that requires that you feed in data, you will love this barebones data conversion tool by Shan Carter. Copy and paste data from Excel, which I feel like I've done a billion times, and then take your pick from ActionScript, JSON, XML, and Ruby. Simple, but a potential time saver. [via]
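    For a sense of what such a conversion involves, here is a minimal sketch in Python, not Shan Carter's tool. It assumes the pasted text is tab-separated with a header row, which is what Excel typically puts on the clipboard. The function names and the sample data are hypothetical.

```python
import csv
import io
import json


def tsv_to_records(pasted):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(pasted.strip("\n")), delimiter="\t")
    return [dict(row) for row in reader]


def to_json(pasted):
    """Convert pasted tab-separated text to a pretty-printed JSON array."""
    return json.dumps(tsv_to_records(pasted), indent=2)


# Hypothetical sample, standing in for cells copied out of a spreadsheet.
sample = "name\tvisits\nFirefox\t120\nChrome\t95\n"
print(to_json(sample))
```

    Every value comes through as a string here. Deciding which columns should become numbers is the kind of detail a dedicated converter handles for you.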

  • Statistical literacy guides for the basics

    September 3, 2010  |  Statistics

    Guide to statistical charts - before and after

    You can get pretty far with data graphics with just limited statistical knowledge, but if you want to take your skills, resume, and portfolio to the next level, you should learn standard data practices. Of all places, the UK Parliament has some short and free guides to help you with basic statistical concepts. They provide 13 notes, each only two or three pages long, that can help you with stuff like how to adjust for inflation, confidence intervals and statistical significance, or basic graph suggestions [pdf]. I like.

    [via | Thanks, @joemako]

  • How people use private browsing

    August 25, 2010  |  Data Sources, Statistics

    Time of day people use private browsing

    Private browsing. All the modern browsers have it. Turn it on, and the browser won't keep your history during the session. Sometimes it's used to pay bank bills on a public computer. Sometimes it's used for other stuff. In an opt-in study looking at a week in the life of a browser, Mozilla looked at how people use private browsing.

    Again, it's worth noting that people opted in to this study (about 4,000 of them), and Mozilla only recorded when users started and stopped private browsing. Nothing in between.

    That said, they came up with two basic findings. The first is when people typically use private browsing (above).

    They saw usage spikes during the lunch hours as well as just before the work day ended. Other spikes came after the dinner hours and, finally, in the late hours of the night.
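    Since the study only logged when private browsing started and stopped, a time-of-day chart like this comes down to counting start events per hour. Below is a minimal sketch of that tally. The event format (an ISO timestamp plus a "start" or "stop" label) and the sample events are assumptions for illustration, not Mozilla's actual data layout.

```python
from collections import Counter
from datetime import datetime


def starts_by_hour(events):
    """Count 'start' events for each hour of the day (index 0 through 23).

    events is an iterable of (iso_timestamp, kind) pairs, where kind is
    "start" or "stop". This layout is hypothetical.
    """
    hours = Counter()
    for timestamp, kind in events:
        if kind == "start":
            hours[datetime.fromisoformat(timestamp).hour] += 1
    return [hours.get(hour, 0) for hour in range(24)]


# Invented sample events, only to show the shape of the input.
sample_events = [
    ("2010-08-02T12:05:00", "start"),
    ("2010-08-02T12:20:00", "stop"),
    ("2010-08-02T12:40:00", "start"),
    ("2010-08-02T22:10:00", "start"),
]
counts = starts_by_hour(sample_events)
print(counts[12], counts[22])
```

    Plotting the resulting 24 counts as bars would give the kind of daily profile described here.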

  • Harvard scientist found guilty of misconduct

    August 22, 2010  |  Mistaken Data

    Shady research from Harvard scientist Marc Hauser is confirmed:

    On Friday, Michael D. Smith, dean of the Harvard faculty of arts and sciences, issued a letter to the faculty confirming the inquiry and saying the eight instances of scientific misconduct involved problems of “data acquisition, data analysis, data retention, and the reporting of research methodologies and results.” No further details were given.

    This is why we don't just accept any old data and why we care about the methodology behind the numbers. Stuff like this always reminds me of an exam question that asked us to investigate the data from an article in a prominent scientific journal. The analysis was all wrong.

    Sometimes data is wrong out of ignorance. Other times it's wrong because people make stuff up. I can understand the former, but why you would ever do the latter is beyond me.

    [via]

    Update: More details on what happened from research assistants' point of view on the Chronicle. [thx, Winawer]

  • How weather data became open data

    August 18, 2010  |  Data Sources

    Weather in the private sector is a more than $1.5 billion industry, and it's largely because of the government's open weather data. You can find out what the weather is just about anywhere with just a few clicks of the mouse. It wasn't always like that though. Clay Johnson, former director of Sunlight Labs, describes the history of open weather data, starting with Thomas Jefferson in the late 1700s.

  • How data will improve health care

    August 12, 2010  |  Statistics

    How data will improve health care

    My wife is an ER doc, so I hear about this sort of stuff all the time. Hospitals are going all-digital, and the exchange of data from doctor to doctor, from hospital to hospital, from patient to doctor, and doctor to patient is only going to get easier.

    This expedited exchange of information will bring advantages such as fewer prescription errors and easier hospital transfers. And through sensors and mobile devices, health practitioners will be able to provide better care to those with chronic health conditions. This illustration from Chris Luongo explains a bit more.

    Naturally, with all these benefits come plenty of challenges. Data privacy is huge here. Can you imagine if your medical charts ended up in some random hacker's hands and were then sold to the highest bidder? At least we might get more useful spam. I want big discounts on misspelled drugs that I actually need.

    Seriously though. Data is blowing up, and there's going to be monster demand for data scientists in the next ten years. See that wagon? Better jump on it while there's still room.

    [via Smarter Planet]

  • iPhone users are more promiscuous

    August 11, 2010  |  Statistics

    Sex and Smart Phones By Age

    I should just automatically bring the OkTrends feed into FlowingData. In their never-ending quest to understand humankind, the group from online dating site OkCupid analyzes 11.4 million opinions on what makes a "great" photo, as in one that makes people want to date you. Some of the findings include: photos from Panasonic Micro 4/3s were best received, "photo attractiveness" decreased by age, and the flash adds seven years.

    There's one finding that's got everyone buzzing though. iPhone users have more sexual partners. See the graph above and below for the numbers.

  • Lies people tell in online dating

    August 5, 2010  |  Statistics

    Male height distribution graph on OkCupid

    Online dating site OkCupid continues with amusing yet thorough analysis of their 1.51 million users. This time around, they cover the lies people tell:

    People do everything they can in their OkCupid profiles to make themselves seem awesome, and surely many of our users genuinely are. But it's very hard for the casual browser to tell truth from fiction. With our behind-the-scenes perspective, we're able to shed some light on some typical claims and the likely realities behind them.

    Among the findings:

    • People exaggerate their height by about two inches.
    • If someone says they make $100k per year, they probably mean $80k.
    • The more attractive a picture, the older it is.
    • Most self-identified bisexuals (80%) only like one gender.

    Buyer beware.

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.