  • OkCupid explores gay and straight stereotypes

    October 12, 2010  |  Statistics

    Online dating site OkCupid dives into their data for 3.2 million users again, this time to explore gay and straight stereotypes. Many are false. Some are true. Among the findings: who's gay curious in the United States and who thinks the earth is bigger than the sun.

  • The simple truth about statistics

    October 10, 2010  |  Quicklinks, Statistics

    Matt Parker explains why no one should be fooled by a misuse of statistics just like no one was fooled by "I did not have sexual relations with that woman."

  • The real stuff white people like

    September 13, 2010  |  Statistics

    Stuff white males like according to OkCupid

    Online dating site OkCupid continues their run of amusing yet thorough analysis of their users. This time: the real stuff white people like. Well actually, the stuff that all races like:

    We selected 526,000 OkCupid users at random and divided them into groups by their (self-stated) race. We then took all these people's profile essays (280 million words in total!) and isolated the words and phrases that made each racial group's essays statistically distinct from the others'.

    Top phrase for white males? Tom Clancy. White females? The Red Sox. Black males? Soul food. Black females? Soul food. Asian males? Taiwan. Asian females? Coz. Yeah, I don't know what that is either.
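
    The quoted description boils down to counting words per group and then looking for the ones a group uses far more often than everyone else. Here's a rough Python sketch of that idea. It is not OkCupid's actual method, which isn't spelled out here. The function name `distinctive_words`, the add-one smoothing, and the toy essays are all assumptions made for illustration.

```python
from collections import Counter


def distinctive_words(essays_by_group, top_n=3):
    """Rank each group's words by how much more often that group uses them
    than all other groups combined (relative-frequency ratio)."""
    # Word counts per group, lowercased and split on whitespace.
    counts = {
        group: Counter(w for essay in essays for w in essay.lower().split())
        for group, essays in essays_by_group.items()
    }
    result = {}
    for group, own in counts.items():
        # Pool the counts of every other group.
        rest = Counter()
        for other, other_counts in counts.items():
            if other != group:
                rest.update(other_counts)
        own_total = sum(own.values())
        rest_total = sum(rest.values())
        # Add-one smoothing (an assumption) avoids division by zero
        # for words that never appear outside the group.
        scores = {
            word: ((n + 1) / (own_total + 1)) / ((rest[word] + 1) / (rest_total + 1))
            for word, n in own.items()
        }
        result[group] = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return result


# Toy, made-up essays (hypothetical data, not from OkCupid).
toy = {
    "group_a": ["soul food soul food music"],
    "group_b": ["music music taiwan taiwan"],
}
print(distinctive_words(toy, top_n=2))
```

    With real essays you'd also want to strip punctuation and score multi-word phrases, not just single words.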

    [Thanks, John]

  • Various ways to rate a college

    September 8, 2010  |  Network Visualization, Statistics

    Measures for different college ratings

    There are a bunch of college ratings out there to help students decide what college to apply to (and to give alumni something to gloat about). The tough part is that there doesn't seem to be any agreement on what makes a good college. Alex Richards and Ron Coddington describe the discrepancies.

  • Simple data converter from Excel

    September 6, 2010  |  Online Applications, Statistics

    If you've ever created an interactive graphic or anything else that requires you to feed in data, you will love this barebones data conversion tool by Shan Carter. Copy and paste data from Excel, which I feel like I've done a billion times, and then take your pick from ActionScript, JSON, XML, and Ruby. Simple, but a potential time saver. [via]
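
    To get a feel for what a converter like this does, here's a minimal Python sketch for the JSON case. It is not Shan Carter's tool or its code. It assumes that data copied from Excel arrives as tab-separated text with a header row, and the function name `excel_paste_to_json` is hypothetical.

```python
import csv
import io
import json


def excel_paste_to_json(pasted: str) -> str:
    """Convert tab-separated text (first row = column headers)
    into a JSON array of objects, one object per data row."""
    reader = csv.DictReader(io.StringIO(pasted), delimiter="\t")
    rows = [dict(row) for row in reader]
    # Values stay as strings; no type guessing is attempted here.
    return json.dumps(rows, indent=2)


# Made-up sample, shaped like a small copied spreadsheet range.
sample = "name\tvalue\nalpha\t1\nbeta\t2\n"
print(excel_paste_to_json(sample))
```

    Every value comes out as a string, so a real converter would probably add an optional step to turn numeric-looking cells into numbers.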

  • Statistical literacy guides for the basics

    September 3, 2010  |  Statistics

    Guide to statistical charts - before and after

    You can get pretty far with data graphics with just limited statistical knowledge, but if you want to take your skills, resume, and portfolio to the next level, you should learn standard data practices. Of all places, the UK Parliament has some short and free guides to help you with basic statistical concepts. They provide 13 notes, each only two or three pages long, that can help you with stuff like how to adjust for inflation, confidence intervals and statistical significance, or basic graph suggestions [pdf]. I like.

    [via | Thanks, @joemako]

  • How people use private browsing

    August 25, 2010  |  Data Sources, Statistics

    Time of day people use private browsing

    Private browsing. All the modern browsers have it. Turn it on, and the browser won't keep your history during the session. Sometimes it's used to pay bank bills on a public computer. Sometimes it's used for other stuff. In an opt-in study looking at a week in the life of a browser, Mozilla looked at how people use private browsing.

    Again, it's worth noting that people opted in to this study (about 4,000 of them), and Mozilla only recorded when users started and stopped private browsing. Nothing in between.

    That said, they came up with two basic findings. The first is when people typically use private browsing (above).

    They saw usage spikes during the lunch hours as well as just before the work day ended. Other spikes came after the dinner hours and, finally, in the late hours of the night.

  • Harvard scientist found guilty of misconduct

    August 22, 2010  |  Mistaken Data

    Shady research from Harvard scientist Marc Hauser is confirmed:

    On Friday, Michael D. Smith, dean of the Harvard faculty of arts and sciences, issued a letter to the faculty confirming the inquiry and saying the eight instances of scientific misconduct involved problems of “data acquisition, data analysis, data retention, and the reporting of research methodologies and results.” No further details were given.

    This is why we don't just accept any old data and why we care about the methodology behind the numbers. Stuff like this always reminds me of an exam question that asked us to investigate the data from an article in a prominent scientific journal. The analysis was all wrong.

    Sometimes data is wrong out of ignorance. Other times it's wrong because people make stuff up. I can understand the former, but why you would ever do the latter is beyond me.

    [via]

    Update: More details on what happened from research assistants' point of view on the Chronicle. [thx, Winawer]

  • How weather data became open data

    August 18, 2010  |  Data Sources

    Weather in the private sector is a more than $1.5 billion industry, and it's largely because of the government's open weather data. You can find what the weather is just about anywhere with just a few clicks of the mouse. It wasn't always like that though. Clay Johnson, former director of Sunlight Labs, describes the history of open weather data, starting with Thomas Jefferson in the late 1700s.

  • How data will improve health care

    August 12, 2010  |  Statistics

    How data will improve health care

    My wife is an ER doc, so I hear about this sort of stuff all the time. Hospitals are going all-digital, and the exchange of data from doctor to doctor, from hospital to hospital, from patient to doctor, and doctor to patient is only going to get easier.

    This expedited exchange of information will bring advantages such as fewer prescription errors and easier hospital transfers. Through sensors and mobile devices, health practitioners will also be able to provide better care to those with chronic health conditions. This illustration from Chris Luongo explains a bit more.

    Naturally, with all these benefits come plenty of challenges. Data privacy is huge here. Can you imagine if your medical charts ended up in some random hacker's hands and were then sold to the highest bidder? At least we might get more useful spam. I want big discounts on misspelled drugs that I actually need.

    Seriously though. Data is blowing up, and there's going to be monster demand for data scientists in the next ten years. See that wagon? Better jump on it while there's still room.

    [via Smarter Planet]

  • iPhone users are more promiscuous

    August 11, 2010  |  Statistics

    Sex and Smart Phones By Age

    I should just automatically bring the OkTrends feed into FlowingData. In their never-ending quest to understand humankind, the group from online dating site OkCupid analyzes 11.4 million opinions on what makes a "great" photo, as in one that makes people want to date you. Some of the findings include: photos from Panasonic Micro 4/3s were best received, "photo attractiveness" decreased with age, and the flash adds seven years.

    There's one finding that's got everyone buzzing though. iPhone users have more sexual partners. See the graph above and below for the numbers.

  • Lies people tell in online dating

    August 5, 2010  |  Statistics

    Male height distribution graph on OkCupid

    Online dating site OkCupid continues with amusing yet thorough analysis of their 1.51 million users. This time around, they cover the lies people tell:

    People do everything they can in their OkCupid profiles to make themselves seem awesome, and surely many of our users genuinely are. But it's very hard for the casual browser to tell truth from fiction. With our behind-the-scenes perspective, we're able to shed some light on some typical claims and the likely realities behind them.

    Among the findings:

    • People exaggerate their height by about two inches.
    • If someone says they make $100k per year, they probably mean $80k.
    • The more attractive a picture, the older it is.
    • Most self-identified bisexuals (80%) only like one gender.

    Buyer beware.

  • Afghanistan war logs revealed and mapped

    July 27, 2010  |  Data Sources, Mapping

    Afghanistan incidents from war logs

    This past Sunday, well-known whistle-blower site WikiLeaks released over 91,000 secret US military reports covering the war in Afghanistan. Each report contains the time, geographic location, and details of an event the US military thought was important enough to put on paper.

  • Tardiness solves statistics theorems

    July 21, 2010  |  Statistics

    Yeah, you read that right. Tardiness makes the world go 'round:

    One day in 1939, Berkeley doctoral candidate George Dantzig arrived late for a statistics class taught by Jerzy Neyman. He copied down the two problems on the blackboard and turned them in a few days later, apologizing for the delay — he’d found them unusually difficult. Distracted, Neyman told him to leave his homework on the desk.

    On a Sunday morning six weeks later, Neyman banged on Dantzig’s door. The problems that Dantzig had assumed were homework were actually unproved statistical theorems that Neyman had been discussing with the class — and Dantzig had proved both of them. Both were eventually published, with Dantzig as coauthor.

    Other benefits include more hours of sleep, exercise while power-walking to your destination, and all-around warm, fuzzy feelings knowing that you live by nobody's schedule. You might also supposedly inspire films like Good Will Hunting.

    Who knew?

    [via Bobulate]

  • Open data doesn’t empower communities

    July 5, 2010  |  Data Sharing

    internet.artizans reflects on the usefulness of open data:

    I'm inspired by the idea that nuggets of opened data could seed guerilla public services, plugging gaps left by government, but i don't see any of that in the data.gov.uk apps list. The reasons aren't technical but psychosocial - the people and communities who could use this data to help tackle their own disadvantage and marginalisation don't have the self-confident sense of entitlement that makes for successful civic hacktivism.

    The groups that really need it often also lack the tech or know-how to make use of the data, or even to collect useful data, to make a case for anything. People like us, the data- and tech-savvy, can help.

    [via migurski]

  • Data and its impact on journalism

    June 7, 2010  |  Data Sources, Statistics

    With regard to the UK's recent boom in open data, Simon Rogers of the Guardian ponders data's role in journalism and the opportunities this newfound information could bring:

    The impact on journalism is expected to be great. The Chicago-based web developer and founder of the neighbourhood news site EveryBlock, Adrian Holovaty, says it's going to be challenging but exciting for journalists. "As more governments open their data, journalists lose privileged status as gatekeepers of information – but the need for their work as curators and explainers increases. The more data that's available in the world, the more essential it is for somebody to make sense of it."

    This need creates not only a fresh brand of news, but also a new type of journalist:

    I once prided myself on my lack of maths knowledge. Now I find myself editing a datajournalism site, the Guardian's datablog: a site where we use Google Spreadsheets to post key datasets. We make the data properly accessible, then encourage our users to take the numbers, produce graphics and applications and help us look for stories.

    Priding yourself on not knowing how to deal with data is a little weird, but okay.

    In any case, people always ask me how to get into information design, infographics, visualization, etc. Journalism is one of those choices, and there's a lot of opportunity there if you've got the skills.

  • Egregious Citations Issued to BP

    June 6, 2010  |  Data Sources

    BP processes about 1.5 million barrels of crude oil per day, across six refineries in the United States. In total, 150 refineries in the United States process just under 18 million barrels per day, so BP processes about 8.5 percent of the total. However, as reported by the Center for Public Integrity, 97 percent of the most dangerous violations found by OSHA were on BP properties.
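
    The percentage is simple division. Here's a quick check in Python, using 18 million as a rounded stand-in for the "just under 18 million" total. The exact figure isn't given here, so that value is an assumption.

```python
def share_of_total(part: float, total: float) -> float:
    """Return part as a fraction of total."""
    return part / total


bp_barrels = 1.5e6  # BP, barrels per day (from the post)
us_barrels = 18e6   # assumption: rounded up from "just under 18 million"

share = share_of_total(bp_barrels, us_barrels)
print(f"BP share of US refining: {share:.1%}")  # prints "BP share of US refining: 8.3%"
```

    Because the real total is a bit under 18 million, the real share comes out a bit above this, in line with the roughly 8.5 percent quoted.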

  • Data Science is catching on

    June 2, 2010  |  Statistics

    Maybe there's something to this whole data science thing after all. Mike Loukides describes data science and where it's headed on O'Reilly Radar. It's a good read, but statisticians get lumped in with suits crunching numbers like actuarial drones:

    Using data effectively requires something different from traditional statistics, where actuaries in business suits perform arcane but fairly well-defined kinds of analysis. What differentiates data science from statistics is that data science is a holistic approach. We're increasingly finding data in the wild, and data scientists are involved with gathering data, massaging it into a tractable form, making it tell its story, and presenting that story to others.

    What is data science? It's what well-rounded statisticians do.

  • Live webcast: Community Health Data Initiative

    June 2, 2010  |  Data Sources, News

    Health and Human Services (HHS) is announcing the launch of their Community Health Data Initiative over in DC right now. The point is to make health data more usable for consumers and communities.

    Today groups will be presenting how they've made use of the data in the past few weeks from about 9:30 to 10:30 - as in right now. I've embedded the live webcast below.

    They're just going through the formalities of thank yous and intros right now, but the good stuff should start soon.

  • Junk food equivalents of sugary drinks

    May 28, 2010  |  Statistics

    Men's Health takes a look at America's most sugary drinks and their junk food equivalents. A Peppermint White Chocolate Mocha with whipped cream (venti size) from Starbucks has the same amount of sugar as 8½ scoops of Edy’s Slow Churned Rich and Creamy Coffee Ice Cream. Calorie-wise, the picture might look a little different. Still though, that's a lot of sugar.

    Be careful what you drink, boys and girls.

    [via Boing Boing]

Copyright © 2007-2014 FlowingData. All rights reserved.