  • Lots of health data released via Health Indicators Warehouse

    March 1, 2011  |  Data Sources

    Health indicators warehouse

    The government has been making a big push for more open health-related data, and a couple of weeks ago, it released a whole bunch with the launch of HealthData.gov. It's the same interface as Data.gov, but for health. Additionally, the Health Indicators Warehouse launched with different data and a slightly more usable interface.

    A quick scan of the data available, however, suggests that a lot of it is spotty or outdated (like on Data.gov), which doesn't make it especially useful. For example, some data sets are only one data point, while others cover only a single year. At least it's a start.

    [Health Indicators Warehouse via @periscopic]

  • Best Picture vs. most popular – Oscar statistics

    February 26, 2011  |  Statistics

    William Briggs and John Briggs examine the differences between movies that have won Best Picture and those that were top at the Box Office, based on money, gender, age, and genre. "There was only one Oscar winning movie with a leading actress older than 50: Jessica Tandy in Driving Miss Daisy. Eight women were at least 40 in Oscar winning movies, e.g. Myrna Loy, Bette Davis, Sandra Bullock. However, half of these were just 40 or 41."


  • Million song dataset available for download

    February 24, 2011  |  Data Sources

    Need music data? Get all the data you want and more from the freely available Million Song Dataset, offered by LabROSA at Columbia University and The Echo Nest. There's lots of metadata on song features, plus your standard stuff like year and artist. There are also several code wrappers and samples to help researchers make use of the data right away.

    [Million Song Dataset via @MacDivaONA]

  • Sunlight Labs opens up Real Time Congress API

    February 17, 2011  |  Data Sources

    Sunlight Labs continues its work for a more open government with its recent release of the Real Time Congress API.

    Today we're making available the Real Time Congress API, a service we've been working on for several months, and will be continuing to expand.

    The Real Time Congress API (RTC) is a RESTful API over the artifacts of Congress, kept up to date in as close to real time as possible. It consists of several live feeds of data, available in JSON or XML. These feeds are filterable and sortable and sliceable in all sorts of different ways, and you can read the docs to see how.

    There are seven data types the API will report:

    • Bills
    • Votes
    • Amendments
    • Videos
    • Floor Updates
    • Committee Hearings
    • Documents
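
    To make the "filterable and sortable" part concrete, here is a minimal sketch of how a client might assemble a request for one of these feeds. The host, path layout, collection names, and the chamber/order parameters are all placeholders made up for illustration, not the documented interface, so check the official docs before relying on any of them.

```python
from urllib.parse import urlencode

# Placeholder address: NOT the real service; the actual base URL is in the docs.
BASE_URL = "https://api.example.org/rtc/v1"

# The seven data types from the announcement, as hypothetical collection names.
COLLECTIONS = {
    "bills", "votes", "amendments", "videos",
    "floor_updates", "committee_hearings", "documents",
}


def build_feed_url(collection, fmt="json", **filters):
    """Return a request URL for one feed with optional filter/sort parameters."""
    if collection not in COLLECTIONS:
        raise ValueError(f"unknown collection: {collection!r}")
    if fmt not in ("json", "xml"):
        raise ValueError("format must be 'json' or 'xml'")
    url = f"{BASE_URL}/{collection}.{fmt}"
    if filters:
        # Sort the parameters so the same filters always yield the same URL.
        url += "?" + urlencode(sorted(filters.items()))
    return url


if __name__ == "__main__":
    # Hypothetical filter and sort parameters, for illustration only.
    print(build_feed_url("votes", chamber="senate", order="voted_at"))
```

    The sketch only builds the URL and never touches the network, so it is safe to run while you work out which real parameters the service accepts.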

    Now someone has to do something with all of this data coming in. Can you think of a useful application for what is essentially an automated government Twitter feed?

    [Real Time Congress API]

  • A close look at troll comments versus real ones

    February 17, 2011  |  Statistics

    Troll slide

    There's something very strange about the anonymity of the Web that brings out the worst in some people. I don't get it, but it's something we have to deal with for now. Courtney Stanton, who wrote a couple of posts that drew the ire of a bunch of trolls, had a look at the troll comments versus the sincere ones.

    I should warn you that the following content does contain adult themes, but it's the contrast between the groups that's most interesting (and the good use of Many Eyes).

  • How tech tools have changed today’s prostitution business

    February 16, 2011  |  Statistics

    Sexwork map

    Sudhir Venkatesh, a professor of sociology at Columbia University, along with his students, has been studying the sex work industry since the 1990s. In a recent article for Wired, Venkatesh describes how the business has changed over the past couple of decades.

  • OkCupid: Best questions to ask on a first date

    February 9, 2011  |  Statistics


    OkCupid continues their analysis on the mysteries of the dating world, this time on the best questions to ask on a first date, or rather, the best questions to ask when you actually want to find out something else. Will your date have sex on the first date? Ask your date if he or she likes the taste of beer, because:

    Among all our casual topics, whether someone likes the taste of beer is the single best predictor of if he or she has sex on the first date.

    Well, okay, not entirely correct. The question is whether they would consider sex, not whether they'd actually do it. I've considered buying just about every Apple product, but all I have is the one MacBook. Still interesting though.

    I'm still waiting for LinkedIn to start doing this sort of analysis. I mean it's more or less the same thing, except you're trying to find a company to work for rather than a partner to, uh, have beer with. Who's with me?

  • Statistician cracks the scratch lottery code

    February 7, 2011  |  Statistics

    Statisticians everywhere are squealing in delight over this story on fellow statistician Mohan Srivastava, who used his know-how to crack the code of a tic-tac-toe scratcher lottery game. After winning three dollars on a scratcher ticket that was given to him as a gag gift, Srivastava got to wondering about how the tickets were made. As a geological consultant who figures out if areas are worth mining for gold, he wondered if he could do the same with this scratcher.

  • Stock market predictions with Twitter

    February 3, 2011  |  Statistics

    Apparently moods on Twitter can be used to predict the ups and downs of the stock market, according to work from Johan Bollen and Huina Mao of Indiana University-Bloomington: "Measuring how calm the Twitterverse is on a given day can foretell the direction of changes to the Dow Jones Industrial Average three days later with an accuracy of 86.7 percent."

    I can't wait until Twitter is used to predict when I want to eat and sleep, and my robot can cook me gourmet meals and provide turndown service accordingly. And it better be accurate to the minute. Anything less is failure.

  • Predicting crime before it happens

    February 3, 2011  |  Statistics

    Christopher Beam for Slate explains research being done at UCLA in collaboration with the LAPD on predictive policing:

    Predictive policing is based on the idea that some crime is random—but a lot isn't. For example, home burglaries are relatively predictable. When a house gets robbed, the likelihood of that house or houses near it getting robbed again spikes in the following days. Most people expect the exact opposite, figuring that if lightning strike once, it won't strike again. "This type of lightning does strike more than once," says Brantingham. Other crimes, like murder or rape, are harder to predict. They're more rare, for one thing, and the crime scene isn't always stationary, like a house. But they do tend to follow the same general pattern. If one gang member shoots another, for example, the likelihood of reprisal goes up.

    This happened in my neighborhood when I was in fifth grade. We lived in a pretty quiet neighborhood, but one morning a window was open. Someone had come into our house while we were sleeping and stolen whatever was in immediate reach. They also stole my dad's brand new bicycle from the garage. The same thing happened to my neighbor two days later.

    [Slate via @amstatnews]

  • Find more of the data you need with DataMarket

    January 31, 2011  |  Data Sources, Online Applications

    Add another online destination to find the data that you need. DataMarket launched back in May with Icelandic data, but just a few days ago relaunched with data of the international variety. They tout 100 million time series datasets and 600 million facts. I'm not totally sure what that means (100 million lines, sets of lines?), but I take it that means a lot.

    Just over 2 years and countless cups of coffee after we started coding, DataMarket.com launches with international data. You can now find, visualize and download data from many of the world’s most important data providers on our site.

    At first glance, DataMarket feels a lot like the now-defunct Swivel. Search for the data you want and you get back a list of datasets. The focus on only time series, though, is actually a plus in that they can provide more specific tools to visualize and explore. The current toolset isn't going to blow you away, but it's not bad.

  • Open thread: Charts during the State of the Union address

    January 26, 2011  |  Discussion, Mistaken Data

    Bubble chart during SOTU

    President Barack Obama delivered his State of the Union address yesterday, and this year it was "enhanced" by charts and graphs. Basically, as Obama spoke, graphics that you could equate to PowerPoint slides showed up on the side. What'd you think of the enhancement? Did it add to or detract from the message? Were the graphics used honestly and effectively?

    One thing's for sure: there's something wrong with that bubble chart. Uh oh.

  • Tracking space garbage with Space Fence

    January 20, 2011  |  Statistics

    Space Fence

    Lockheed Martin's Space Fence, expected to be in initial operation in 2015, will track the junk floating in space:

    Space Fence is envisioned as a network of ground-based S-band radars that will detect, track, measure and catalog thousands of objects in low-Earth orbit. Expected to begin initial operation in 2015, the system will replace the existing Air Force Space Surveillance System, or VHF Fence, which has been in service since the early 1960s. A leader in S-band radar development, Lockheed Martin's high-powered radar systems will find and follow the course of thousands of pieces of space debris to an accuracy of just meters.

    They provide this video (below) to explain the concept, most of which I'm pretty sure is fake, but let's pretend it's real. It's more exciting that way.

  • A guide for scraping data

    January 17, 2011  |  Data Sources

    Data is rarely in the format you want it. Dan Nguyen, for ProPublica, provides a thorough guide on how to scrape data from Flash, HTML, and PDF. [via @JanWillemTulp]
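
    For the HTML case, a minimal sketch of the idea using only Python's standard library is below. It is not taken from Nguyen's guide, which may use different tools entirely. The TableScraper class, the scrape_table helper, and the sample markup are all made up for illustration.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return all table rows found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    sample = ("<table><tr><th>State</th><th>Count</th></tr>"
              "<tr><td>CA</td><td> 12 </td></tr></table>")
    for row in scrape_table(sample):
        print(row)
```

    Real pages are messier than this sample (nested tables, missing closing tags), so treat it as a starting point rather than a general-purpose scraper.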

  • The Joy of Stats available in its entirety

    December 30, 2010  |  Statistics

    The Joy of Stats with Hans Rosling

    The Joy of Stats, hosted by Hans Rosling, is now viewable in its entirety (video below):

    Hans Rosling says there’s nothing boring about stats, and then goes on to prove it. Only with statistics can we make sense of the world and harness the data deluge to serve us rather than drown in its confusion.
    A one-hour long documentary produced by Wingspan Productions and broadcast by BBC, 2010.

    Originally, it was only viewable in the UK, and then there were some clips, but finally, you can watch the whole thing. It's an hour long so you might want to bookmark it for later, but it's entertaining all the way through. Plus, it's the week between Christmas and New Year's so I know you're not working.

  • Why the other lines always seem to move faster than yours

    December 24, 2010  |  Statistics

    Waiting lines

    Why does it almost always seem like you're in the slow line at the grocery store or in the driving lane with the most cars on the freeway? Bill "Engineer Guy" Hammack explains in terms of queuing theory in the video below:

    Bill reveals how "queueing theory" - developed by engineers to route phone calls - can be used to find the most efficient arrangement of cashiers and check out lines. He reports on the work of Agner Erlang, a Danish engineer who, at the opening of the 20th century, helped the Copenhagen Telephone Company provide the best level of service at the lowest price.

    Erlang found out how many telephone lines the company needed, given the average number of calls per hour. Similarly, you can figure out how many checkout lines you need, given the average number of customers. It turns out the best arrangement is to have a single line, and the next customer goes to the next available register. There's less chance of blockage from a single delay.
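
    The "how many lines do you need" question can be sketched with the Erlang B formula, the standard result that carries Erlang's name. The video may not present it in this form. The sketch assumes calls arrive at random (Poisson arrivals) and that a call finding every line busy is simply lost. The function names are mine.

```python
def erlang_b(offered_load, lines):
    """Probability that a call finds all lines busy (Erlang B).

    offered_load is in erlangs: average calls per hour times average
    call length in hours.
    """
    blocking = 1.0  # with zero lines, every call is blocked
    for m in range(1, lines + 1):
        # Standard recurrence: B(m) = E * B(m-1) / (m + E * B(m-1))
        blocking = offered_load * blocking / (m + offered_load * blocking)
    return blocking


def lines_needed(calls_per_hour, avg_call_hours, max_blocking):
    """Smallest number of lines that keeps blocking at or below max_blocking."""
    offered_load = calls_per_hour * avg_call_hours
    lines = 0
    while erlang_b(offered_load, lines) > max_blocking:
        lines += 1
    return lines


if __name__ == "__main__":
    # 60 calls an hour, three minutes each, at most 1 percent blocked.
    print(lines_needed(60, 0.05, 0.01))
```

    Swap calls for customers and lines for registers and you get a rough version of the checkout question, although shoppers wait in line rather than hang up, so a proper checkout model would use a waiting-time formula instead.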

    But people don't like doing that apparently, and so assuming random selection, ending up in the slow line comes down to simple probability.
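
    Here is that probability as a quick sketch, under the simplifying assumption that each line is equally likely to be the fastest. With n lines, yours is the fastest one time in n, so some other line beats you with probability (n - 1) / n. The simulation is only there to check the arithmetic. The function names are mine.

```python
import random


def chance_another_line_is_faster(n_lines):
    """Probability that at least one other line finishes before yours."""
    return (n_lines - 1) / n_lines


def simulate(n_lines, trials=100_000, seed=0):
    """Estimate the same probability by drawing random waiting times."""
    rng = random.Random(seed)
    beaten = 0
    for _ in range(trials):
        times = [rng.random() for _ in range(n_lines)]
        # times[0] is your line; you are beaten if another line is quicker.
        if min(times) != times[0]:
            beaten += 1
    return beaten / trials


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, chance_another_line_is_faster(n), simulate(n))
```

    With three lines, you should expect to watch someone else get out first about two times out of three.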

  • Right versus wrong bubble size

    December 17, 2010  |  Mistaken Data

    Subsidize This from Good Magazine

    I was going to post this graphic from Good when it came out, but decided not to. It was another case of wrongly sized bubbles, the same mistake I made when I first started out. But they fixed the problem, so now we can see what a big difference it makes.
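
    The usual version of this mistake is scaling a bubble's radius by the value, when it's the area that people read. I'm assuming that's what happened here. Below is a minimal sketch of the two approaches. The function names and the numbers are made up for illustration.

```python
import math


def radius_by_area(value, max_value, max_radius):
    """Radius that makes the circle's AREA proportional to the value."""
    return max_radius * math.sqrt(value / max_value)


def radius_by_value(value, max_value, max_radius):
    """The common mistake: radius proportional to the value."""
    return max_radius * value / max_value


if __name__ == "__main__":
    # A value one quarter the size of the largest, largest radius 40 pixels.
    right = radius_by_area(25, 100, 40)
    wrong = radius_by_value(25, 100, 40)
    print("area-scaled radius:", right)
    print("radius-scaled radius:", wrong)
    # Share of the largest bubble's area each one ends up covering.
    print("area share, right:", (right / 40) ** 2)
    print("area share, wrong:", (wrong / 40) ** 2)
```

    Because area grows with the square of the radius, scaling the radius directly makes a value that is a quarter as large look a sixteenth as large.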

  • Data analysis is the future of journalism

    December 8, 2010  |  Statistics

    Tim Berners-Lee, credited with inventing the Web, says analyzing data is the future of journalism:

    "Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you'll do it that way some times.

    "But now it's also going to be about poring over data and equipping yourself with the tools to analyse it and picking out what's interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what's going on in the country."

    The Guardian post focuses on current journalists learning new skills, but what we're also going to see is a new type of storyteller: computer scientists, statisticians, and interaction designers.

  • Jon Stewart explains Wikileaks’ Cablegate

    December 2, 2010  |  Data Sources, News

    You've probably already heard and read about Wikileaks' Cablegate. If not, Andy Baio has a fine roundup with significant coverage and events to get you caught up quick. Alternatively, you can watch Jon Stewart and The Daily Show explain in the clip below (slightly NSFW, because it mentions a body part).

  • The Joy of Stats with Hans Rosling

    November 30, 2010  |  Statistics, Visualization

    Hans Rosling on development

    The Joy of Stats, a one-hour documentary, hosted by none other than the charismatic Hans Rosling, explores the growing importance of statistics:

    [W]ithout statistics we are cast adrift on an ocean of confusion, but armed with stats we can take control of our lives, hold our rulers to account and see the world as it really is. What's more, Hans concludes, we can now collect and analyse such huge quantities of data and at such speeds that scientific method itself seems to be changing.

    From the description, it sounds like they'll touch on Crimespotting by Stamen and Google Translation, among other data-driven projects. Whatever they cover, it's bound to be interesting with Rosling at the front.

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.