• Pew Research raw survey data now available

    May 25, 2011  |  Data Sources

    The Pew Research Center churns out a lot of interesting results from a number of surveys about online and American culture, but it has usually shared only aggregated results and pre-made charts and graphs. This is well and good for the information-consuming public; however, these results can spawn curiosities that are fun to dig into. Luckily, the center launched a Data Sets section that provides raw survey responses and the questions in a variety of easy-to-use data formats.

    Our raw data, previously posted only as SPSS files, is now available in comma-delimited (.csv) format for all reports going back to 2003. We hope that making our data available in this open-source format will make analysis easier for researchers who don’t own a copy of SPSS to analyze our data.

    This should be fun. Recent datasets include the social side of the Internet, health tracking habits, and reputation management.
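
    With the responses in CSV, the Python standard library is enough for a first look. Here is a minimal sketch that tallies the answers to one question. The column names (respondent_id, q1) and the four sample rows are made-up placeholders, not Pew's actual layout or data; swap in a real file and its column names.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration only; NOT real Pew data or column names.
SAMPLE = """respondent_id,q1
1,Yes
2,No
3,Yes
4,Don't know
"""


def tally(csv_file, column):
    """Count how often each answer appears in one column of a CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# io.StringIO stands in for open("some_survey.csv", newline="").
counts = tally(io.StringIO(SAMPLE), "q1")
for answer, n in counts.most_common():
    print(answer, n)
```

    For a real download, pass an opened file instead of the in-memory sample. Weighted survey results need more care than a raw count like this.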

    [Pew Research via @kzickhur]

  • Open thread: Can you spot the wrongness in this tax graph?

    May 17, 2011  |  Mistaken Data

    middle class taxes

    The argument behind this graph in The Wall Street Journal is that the middle class has most of the money, which ties into a larger argument about who should be taxed what. There is, after all, a spike in the middle. Is that really the case, though? Sound off in the comments.

    (Cheat sheet: Jonathan Chait explains what's going on and Kevin Drum improves the graph to show more truth, although his graph can be improved, too. Grab the data here [Excel spreadsheet] from the IRS, and give it a go.)

    [Wall Street Journal via @joandimicco]

  • Plush statistical distribution pillows

    May 13, 2011  |  Statistics

    Distribution pillows

    For the statistical nerd in you or for the child you are raising as one, Nausicaa Distribution on Etsy sells handmade gifts inspired by statistical distributions. Above shows the dastardly gang of five evil distribution plushies: Weibull, Cauchy, Poisson, Gumbel, and Erlang. Judging by their moustaches, you better watch out when they're around.

  • Charts about sex

    April 21, 2011  |  Statistics

    Portion looking for sex

    OkCupid adds another report to their growing list of analyses on relationships. This time around, they look at sex and how ideas vary by demographic. The above graph shows per capita GDP versus portion of people looking for casual sex.

    We were amazed at this result—money seems to be a more powerful influence on sex drive than culture or even religion.

    You have, for example, Portugal, Oman, Slovenia, and Taiwan within a few pixels of each other on the right side of the graph, and Syria, Sri Lanka, and Guatemala almost stacked on the left, and all of them sit along the trend line.

    Interesting as usual. What amazes me more is that so many people answer such private questions. Have any of you tried OkCupid? Are these questions part of the matching process?

    See OkCupid for more findings on sex such as drive and body type and Twitter usage and commitment.

    [OkCupid]

  • Map your location – that your iPhone secretly records

    April 20, 2011  |  Data Sources, Mapping

    iphone gps trace

    Researchers Alasdair Allan and Pete Warden have found that the iPhone records cell tower access, and hence your location, in an easy-to-read file that is transferred as you switch devices. And it does this whether you like it or not.

    The more fundamental problem is that Apple are collecting this information at all. Cell-phone providers collect similar data almost inevitably as part of their operations, but it’s kept behind their firewall. It normally requires a court order to gain access to it, whereas this is available to anyone who can get their hands on your phone or computer.

    Allan and Warden provide an open-source application, iPhone Tracker, that maps that data. The good news is that the data doesn't seem to be going anywhere other than your own backups and devices. Privacy concerns aside, this kind of makes me wish I had an iPhone, although I suspect my map would be painfully boring.

    [iPhone Tracker via Marco]

  • Recession and rise in antidepressant prescriptions

    April 11, 2011  |  Statistics

    Over the past four years there was a 43 percent increase in prescriptions for antidepressants. Some news outlets attribute this rise to the recession. People more depressed equals more drugs. Ben Goldacre of Bad Science explains why said outlets need to be more careful with their analyses.

    From what I can tell, all the reports took an aggregate (the 43 percent) and then made a big assumption to explain it. I'm all for data journalism, but statistics is rarely that straightforward.

  • Thoughts on end of Data.gov

    April 5, 2011  |  Statistics

    In a guest post for the guardian.co.uk Datablog, I thought out loud about the possible end of Data.gov and what it means for open government data. Let me know what you think.

    Update: Funding might not be cut completely (for now).

  • Data.gov and other transparency sites to be shut down due to budget cuts

    March 31, 2011  |  Data Sources

    Last week, there were rumblings over the end of the Statistical Abstract, and I suggested that it was just a sign of changing technologies. I thought that Data.gov and similar sites were the natural progression. Here's the problem with that argument. Congress is planning on shutting down Data.gov and other transparency sites in the next few months.

  • Tell-all telephone reveals politician’s life

    March 30, 2011  |  Data Sources, Mapping

    Tell-all telephone

    Not many people understand the importance of data privacy. They don't get how little bits of information sent from your phone every now and then can show a lot about your day-to-day life.

    As the German government tries to come to a consensus about its data retention rules, Green party politician Malte Spitz retrieved six months of phone data from Deutsche Telekom (by suing them), to show what you can get from a little bit of private mobile data. He handed the data to Zeit Online, and they in turn mapped and animated practically every one of Spitz' moves over half a year and combined it with publicly available information from sources such as his appointment website, blog, and Twitter feed for more context.

  • Think Quarterly from Google UK on data

    March 27, 2011  |  Statistics

    “The problem isn’t that specialised companies lack the data they need, it’s that they don’t go and look for it, they don’t understand how to handle it.”

    —Hans Rosling, A Data State of Mind, March 2011

    Google UK produced a short book called Think Quarterly to distribute to partners and advertisers, but it's actually pretty interesting for a more general audience. Articles feature Hans Rosling, Hal Varian, and others. Also a hat tip to FlowingData in Simon Rogers' list of sexy resources.

  • The Like Log Study: Buzzwords and engagement

    March 16, 2011  |  Statistics

    cnn like terms

    The Web is a game of pageviews, and outlets such as Twitter and Facebook are a way to rack up the counts. The more people who share your posts and articles, the more new people who visit your site. So what kind of articles are shared more often? How do people interact with these articles? Yahoo! research scientist Yury Lifshits digs into Facebook likes for some ideas, using data collected from 45 sites, 100k+ articles, and 40 million reactions, between October 2010 and January 2011.

  • Why sports statisticians should be more involved in games

    March 14, 2011  |  Statistics

    Hot off the MIT Sloan Sports Analytics Conference, Sean Gregory argues for more "stats geeks" on the sidelines and in the huddle during the game.

    [S]itting next to your team's manager, a scruffy baseball lifer, in the dugout is not just another scruffy baseball lifer, spitting tobacco. Instead, by his side is a guy with a Ph.D. in theoretical physics, a beautiful mind who can calculate complex probabilities, in real time, in his head. He can tell you the odds of so-and-so throwing such-and-such a pitch to so-and-so on such-and-such a count.

    It's a fluffy article with not much on what the stat person would actually do, so you'll have to imagine. Honestly, I hope sports statistics doesn't come to that though. Unpredictability is what makes games so fun to watch.

    [Time via @amstatnews]

  • Test your Rock-Paper-Scissors strategy against the machine

    March 7, 2011  |  Statistics

    Rock paper scissors

    We learned the strategy to win Rock-Paper-Scissors every time, but does it really work? For the New York Times, Gabriel Dance and Tom Jackson give you your chance:

    Computers mimic human reasoning by building on simple rules and statistical averages. Test your strategy against the computer in this rock-paper-scissors game illustrating basic artificial intelligence. Choose from two different modes: novice, where the computer learns to play from scratch, and veteran, where the computer pits over 200,000 rounds of previous experience against you.

    Be sure to play at least five rounds, and then click on the button to see what the computer is thinking. In veteran mode, the computer searches its database for sequences that match your last five moves and its last five moves and then tries to predict what you'll throw next.
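
    The matching idea is easy to sketch. The version below is a rough guess at the approach, not the New York Times implementation: it remembers what the player threw after each run of recent rounds, then plays the move that beats the most common follow-up. The class name, method names, and the window parameter are all hypothetical.

```python
import random
from collections import Counter, defaultdict

MOVES = ("rock", "paper", "scissors")
# BEATS[x] is the move that defeats x.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


class HistoryPredictor:
    """Hypothetical sketch: guess the player's next throw from recent rounds."""

    def __init__(self, window=5, seed=None):
        self.window = window  # how many past rounds form the lookup key
        self.rounds = []  # finished rounds as (player_move, computer_move)
        self.table = defaultdict(Counter)  # context -> player's next moves
        self.rng = random.Random(seed)

    def _context(self):
        """Return the last `window` rounds as a hashable key, or None."""
        if len(self.rounds) < self.window:
            return None
        return tuple(self.rounds[-self.window:])

    def choose(self):
        """Pick the computer's throw for the upcoming round."""
        context = self._context()
        counts = self.table.get(context) if context is not None else None
        if not counts:
            return self.rng.choice(MOVES)  # no matching history yet
        predicted = counts.most_common(1)[0][0]
        return BEATS[predicted]

    def record(self, player_move, computer_move):
        """Store a finished round and what the player did after the context."""
        context = self._context()
        if context is not None:
            self.table[context][player_move] += 1
        self.rounds.append((player_move, computer_move))


if __name__ == "__main__":
    bot = HistoryPredictor(window=2)
    history = [("rock", "paper"), ("rock", "paper"), ("scissors", "rock"),
               ("rock", "paper"), ("rock", "paper")]
    for player, computer in history:
        bot.record(player, computer)
    # The last two rounds match the first two, which were followed by scissors.
    print(bot.choose())  # prints "rock"
```

    A novice mode would start with an empty table like this one. A veteran mode would presumably load counts from many earlier games before the first throw.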

    Are you good enough to beat the basic artificial intelligence?

    [New York Times]

  • Lots of health data released via Health Indicators Warehouse

    March 1, 2011  |  Data Sources

    Health indicators warehouse

    The government has been making a big push for more open health-related data, and a couple of weeks ago, they released a whole bunch of it with the launch of HealthData.gov. It's the same interface as Data.gov, but for health. Additionally, the Health Indicators Warehouse launched with different data and a slightly more useable interface.

    A quick scan of the data available, however, does seem to indicate that a lot of it is spotty or outdated (like on data.gov), which doesn't make it especially useful. For example, some data sets are only one data point, while others are only a single year. At least it's a start.

    [Health Indicators Warehouse via @periscopic]

  • Best Picture vs. most popular – Oscar statistics

    February 26, 2011  |  Statistics

    William Briggs and John Briggs examine the differences between movies that have won Best Picture and those that were top at the Box Office, based on money, gender, age, and genre. "There was only one Oscar winning movie with a leading actress older than 50: Jessica Tandy in Driving Miss Daisy. Eight women were at least 40 in Oscar winning movies, e.g. Myrna Loy, Bette Davis, Sandra Bullock. However, half of these were just 40 or 41."

    [via]

  • Million song dataset available for download

    February 24, 2011  |  Data Sources

    Need music data? Get all the data you want and more from the freely available million song dataset, offered by LabROSA at Columbia University and Echo Nest. There's lots of metadata on song features and your standard stuff like year and artist. There are also several code wrappers and samples to help researchers make use of the data right away.

    [Million Song Dataset via @MacDivaONA]

  • Sunlight Labs opens up Real Time Congress API

    February 17, 2011  |  Data Sources

    Sunlight Labs continues its work for a more open government with its recent release of the Real Time Congress API.

    Today we're making available the Real Time Congress API, a service we've been working on for several months, and will be continuing to expand.

    The Real Time Congress API (RTC) is a RESTful API over the artifacts of Congress, kept up to date in as close to real time as possible. It consists of several live feeds of data, available in JSON or XML. These feeds are filterable and sortable and sliceable in all sorts of different ways, and you can read the docs to see how.

    There are seven data types the API will report:

    • Bills
    • Votes
    • Amendments
    • Videos
    • Floor Updates
    • Committee Hearings
    • Documents

    Now someone has to do something with all of this data coming in. Can you think of a useful application for what is essentially an automated government Twitter feed?

    [Real Time Congress API]

  • A close look at troll comments versus real ones

    February 17, 2011  |  Statistics

    Troll slide

    There's something very strange about the anonymity of the Web that brings out the worst in some people. I don't get it, but it's something we have to deal with for now. Courtney Stanton, who wrote a couple of posts that drew the ire of a bunch of trolls, had a look at the troll comments versus the sincere ones.

    I should warn you that the following content does contain adult themes, but it's the contrast between the groups that's most interesting (and the good use of Many Eyes).

  • How tech tools have changed today’s prostitution business

    February 16, 2011  |  Statistics

    Sexwork map

    Sudhir Venkatesh, a professor of sociology at Columbia University, along with his students, has been studying the sex work industry since the 1990s. In a recent article for Wired, Venkatesh describes how the business has changed over the past couple of decades.

  • OkCupid: Best questions to ask on a first date

    February 9, 2011  |  Statistics

    BeerGoggles

    OkCupid continues their analysis on the mysteries of the dating world, this time on the best questions to ask on a first date, or rather, the best questions to ask when you actually want to find out something else. Will your date have sex on the first date? Ask your date if he or she likes the taste of beer, because:

    Among all our casual topics, whether someone likes the taste of beer is the single best predictor of if he or she has sex on the first date.

    Well, okay, not entirely correct. The question is if they would consider sex, not if they'd actually do it. I've considered buying just about every Apple product, but all I have is the one MacBook. Still interesting though.

    I'm still waiting for LinkedIn to start doing this sort of analysis. I mean it's more or less the same thing, except you're trying to find a company to work for rather than a partner to, uh, have beer with. Who's with me?

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.