• Instant electric bike and data collector

    May 26, 2010  |  Data Sharing, Self-surveillance

    When you ride your bicycle around, I bet you always wish for two things. First: "I wish this were electric so that I didn't have to pedal so much." Second: "I wish I could use my bicycle as a data collection device." Well, guess what. Your dreams have come true. The Copenhagen Wheel, conceived by the MIT SENSEable City Lab, will do just that. With everything rolled up into one hub, a quick and simple installation turns your plain old bicycle into an electric data collection device.

  • Why context is as important as the data itself

    May 21, 2010  |  Design, Statistics

    John Allen Paulos, a math professor at Temple University, explains in the New York Times why what you do before and after you get your hands on the data matters so much.

    The problem isn’t with statistical tests themselves but with what we do before and after we run them. First, we count if we can, but counting depends a great deal on previous assumptions about categorization. Consider, for example, the number of homeless people in Philadelphia, or the number of battered women in Atlanta, or the number of suicides in Denver. Is someone homeless if he’s unemployed and living with his brother’s family temporarily? Do we require that a woman self-identify as battered to count her as such? If a person starts drinking day in and day out after a cancer diagnosis and dies from acute cirrhosis, did he kill himself?

    In a nutshell, statistics is a game of estimation. More often than not, the numbers in front of you aren't an exact count, and they could easily change if you shift the criteria for what gets counted. As a result, there's always some uncertainty attached to your data, and it's the job of statisticians, analysts, and data scientists to minimize it.
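    To make the categorization point concrete, here is a minimal Python sketch. The records, the field names (`has_own_residence`, `staying_with_relatives`), and the function `count_homeless` are all hypothetical, invented for illustration. The point is only that the same four records give different totals depending on whether someone temporarily living with relatives counts as homeless.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical fields, invented for illustration only.
    has_own_residence: bool
    staying_with_relatives: bool


def count_homeless(people, include_doubled_up):
    """Count people without their own residence.

    When include_doubled_up is False, people temporarily staying with
    relatives are left out of the count.
    """
    total = 0
    for person in people:
        if person.has_own_residence:
            continue
        if person.staying_with_relatives and not include_doubled_up:
            continue
        total += 1
    return total


# Four hypothetical records; the data never changes, only the criterion does.
people = [
    Person(has_own_residence=True, staying_with_relatives=False),
    Person(has_own_residence=False, staying_with_relatives=True),
    Person(has_own_residence=False, staying_with_relatives=False),
    Person(has_own_residence=False, staying_with_relatives=True),
]

print(count_homeless(people, include_doubled_up=False))  # 1
print(count_homeless(people, include_doubled_up=True))   # 3
```

    Same people, same data, and the count triples because of one definitional choice.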

    So the next time you see a list of rankings like "fattest city" or "dumbest town," don't take it for absolute truth. Instead, think of it as an educated guess. Similarly, when you analyze and visualize, remember the context of your data.

    Catch Paulos' full article here.

  • Wait. Something isn’t right here…

    May 14, 2010  |  Mistaken Data

    [Image: All the same]

    No clue where this is from, but something seems sort of off, no? I guess we should take the title literally. By the numbers... only.

    I'm going to give the benefit of the doubt, though, and assume this was just an honest mistake. Here's my guess about what happened. A deadline was coming up fast, and a graphics editor put this together to get a feel for what the final design would look like. He then saved it as a different file and went to work on the real version. But when it came time to send the file to the printer, he sent the wrong one. Actually, now that I think about it, I'm surprised this doesn't happen more often.

    [via @EagerEyes]

  • Write your own TED talk with lies, damned lies and statistics

    May 12, 2010  |  Statistics

    Sebastian Wernicke, an engagement manager at Oliver Wyman and a former bioinformatics researcher, explains the findings from his pseudo-analysis of TED talks. The result: a guide to giving the ultimate TED talk. Go as long as you can, grow your hair out and wear glasses, and cover happy ideas that are easy to relate to. Or better yet, use Wernicke's tedPAD to formulaically write your own talk that will drive the audience wild, or make them boo you emphatically.

  • How open data saved $3.2 billion

    May 12, 2010  |  Statistics

    This is a story of fake charities and tax shelters. An analysis of data from the Canada Revenue Agency (CRA) found that billions of dollars in donations were collected by fraudulent organizations, with only a tiny portion going to the actual causes. In one case, only $1 out of every $100 went to helping the homeless; the rest went to a tax shelter. Shameful.

    All told, my colleague estimated that these illegally operating charities alone sheltered roughly half a billion dollars in 2005. Indeed, newspapers later confirmed that in 2007, fraudulent donations were closer to a billion dollars a year, with some 3.2 billion dollars illegally sheltered, a sum that accounts for 12% of all charitable giving in Canada.

    This exposed not only the fraud but also negligence on the part of the CRA charity division (now under new leadership). How did this go on for so long? A simple sort on the data would have raised questions immediately. Instead, it took a freelance consultant poking around out of curiosity, plus journalists who were aware of the fishy behavior, to move things along.
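    Here is roughly what that kind of simple sort might look like, as a minimal Python sketch. The record layout (`Charity` with `receipts` and `program_spending`), the function `lowest_program_ratio`, and the three sample charities are all hypothetical; real CRA filings are structured differently. Sorting by the share of receipts actually spent on charitable programs puts an organization like the one passing along $1 of every $100 at the top of the list.

```python
from typing import NamedTuple


class Charity(NamedTuple):
    # Hypothetical record layout; real CRA data uses its own fields.
    name: str
    receipts: float          # total donations receipted
    program_spending: float  # amount spent on charitable activities


def lowest_program_ratio(charities, n=3):
    """Return the n charities that spend the smallest share of receipts on programs."""
    with_receipts = [c for c in charities if c.receipts > 0]  # avoid dividing by zero
    return sorted(with_receipts, key=lambda c: c.program_spending / c.receipts)[:n]


# Made-up example records.
charities = [
    Charity("Charity A", 1_000_000, 800_000),
    Charity("Charity B", 5_000_000, 50_000),  # $1 of every $100 reaches the cause
    Charity("Charity C", 200_000, 150_000),
]

for c in lowest_program_ratio(charities, n=2):
    print(f"{c.name}: {c.program_spending / c.receipts:.0%}")
```

    With these made-up records, the sketch prints Charity B (1%) first, then Charity C (75%). An ascending sort like this is about as basic as analysis gets, which is the point.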

    [via @datamarket]

  • How men and women label colors

    May 4, 2010  |  Infographics, Statistics

    Along the same lines as Dolores Labs' color experiment, Randall Munroe of xkcd reveals the results of his color survey. He took a slightly different approach, though. Here are some of the basic findings:

    If you ask people to name colors long enough, they go totally crazy.

    “Puke” and “vomit” are totally real colors.

    Colorblind people are more likely than non-colorblind people to type “fuck this” (or some variant) and quit in frustration.

    Indigo was totally just added to the rainbow so it would have 7 colors and make that “ROY G. BIV” acronym work, just like you always suspected. It should really be ROY GBP, with maybe a C or T thrown in there between G and B depending on how the spectrum was converted to RGB.

    A couple dozen people embedded SQL ‘drop table’ statements in the color names. Nice try, kids.

    Nobody can spell “fuchsia”.


  • Twitter data buffet is back in business

    April 28, 2010  |  Data Sources

    Almost a year and a half ago, Infochimps, the data repository slash marketplace, released a giant scrape of Twitter data representing 2.7 million users, 10 million tweets, and 58 million connections. Twitter soon requested that they take it down while they figured out how they wanted to handle licensing, privacy, etc.

    That was in 2008, before Twitter really started booming. Fast forward to now. Twitter and Infochimps have figured out what they want to do, and the Twitter census data is back up. It's no longer a measly 2.7 million users, though. The population has grown to 35 million.

  • R is an ‘epic fail’ – or how to make statisticians mad

    April 22, 2010  |  Software, Statistics

    Statisticians are mad and out for blood. Someone called R an epic fail and said it wasn't the next big thing.

    I know that R is free and I am actually a Unix fan and think Open Source software is a great idea. However, for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.

    How dare she, right? Here's the thing: she's right. Wait, wait, hear me out. R is not for the general audience, the people who use Excel as their analysis tool. For non-statistician analysts like these, R, as they say, is an epic fail (and that is the last time I will say that stupid phrase).

    However, R wasn't designed to let everyday users dig into data. It was designed to give statisticians computing power. It's a statistical computing language largely based on S, which was developed in the 1970s by the super smart John Chambers of Bell Labs. The 1970s. Weren't people still using slide rules? Or maybe it was the abacus. Can't remember. Oh wait, I wasn't born yet. In any case, there's really no need to get into the whole R-for-general-audience conversation, just like we don't need to talk about why The SpongeBob SquarePants Movie lacked emotional depth.

  • World data released ‘is a dream come true’

    April 20, 2010  |  Data Sources

    [Image: mortality]

    In another step toward open data and all that jazz, the World Bank today released World Development Indicators 2010, which is meant to serve as a progress report on the world.

    “The WDI provides a valuable statistical picture of the world and how far we've come in advancing development,” said Justin Yifu Lin, the World Bank’s Chief Economist and Senior Vice President for Development Economics. “Making this comprehensive data free for all is a dream come true.”

    More importantly, this comes with the launch of a free online database and a public API covering 1,000+ indicators. There used to be a big fee for this data. I can't speak for the API, but the website is well designed. It has profile pages for each country, links to download the indicators in Excel and XML, and hey, are those graphs implemented in HTML5? I spy <canvas> tags.

  • TransparencyData makes campaign finance data easier to access

    April 14, 2010  |  Data Sources

    Anyone who's looked at campaign finance data knows it can get messy really quickly (especially if you're getting it directly from the FEC). Sunlight Labs' newly launched TransparencyData aims to make the process a lot easier.

    They've merged state data from FollowTheMoney and federal data from OpenSecrets and made it easy to search with a clickable interface. Select from a number of filters such as amount, recipient, or contributor, and then download data in bulk or make use of the API.

  • Twitter predicts the future?

    April 13, 2010  |  Statistics

    [Image: Twitter prediction]

    A recent study [pdf] by Sitaram Asur and Bernardo A. Huberman at HP Labs found that Twitter chatter can predict first-weekend box office revenues based simply on the volume of tweets. The predictions were even more accurate when they added sentiment analysis (i.e., classifying tweets as positive or negative).
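    The basic idea, predicting revenue from tweet volume, can be sketched as a one-variable linear regression. This is a minimal Python sketch assuming a straight-line model fit by ordinary least squares; the function names `fit_line` and `predict` are hypothetical, and it does not reproduce the paper's actual model, data, or sentiment features.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("xs and ys must have equal length, with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope: change in y per unit of x
    a = mean_y - b * mean_x   # intercept
    return a, b


def predict(model, x):
    """Predicted y (e.g. first-weekend revenue) for a new x (e.g. tweet rate)."""
    a, b = model
    return a + b * x


# Synthetic points lying exactly on y = 1 + 2x, just to exercise the code.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(model)              # (1.0, 2.0)
print(predict(model, 5))  # 11.0
```

    In practice you would pass tweet rates and first-weekend revenues for past films, then call `predict` with a new film's tweet rate. Folding in sentiment would mean extending this to more than one predictor.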

  • IBM data propaganda – babies and old guys with glasses

    March 27, 2010  |  Statistics

    IBM has been spreading the whole "smarter planet" spiel for a while now, but in the past few days, they've revealed the punchline. It's data. The key to a smarter planet is learning how to process and extract information from the 15 petabytes of data we generate per day.

    Surprising? No, not at all. Data's the hot thing right now, and that's where the money's at. Learn how to process all of it and you're gold.

  • Buy and sell data at Data Marketplace

    March 22, 2010  |  Data Sources

    Add another site to the list of places to find the data you need. Data Marketplace connects people who want data to people who can find, scrape, and cull data.

    Here's how it works. If you want data, you put in a request and optionally, a deadline and budget. A provider can then go find that data for you, maybe through scraping a difficult-to-parse website, and then post it online. You then have the option to purchase the tabular data.

    For Data Marketplace to work, though, it has to get over three big humps.

  • Tim Berners-Lee with an update on open data

    March 15, 2010  |  Data Sharing

    If people put data on the Web - government data, scientific data, community data - whatever it is, it will be used by other people to do wonderful things in ways they never could have imagined.

    — Tim Berners-Lee, TED, February 2010

    Tim Berners-Lee, credited with inventing the World Wide Web, comes back to TED a year after his call for open, structured data with a quick update. Spoiler alert: things are looking good, and they're only going to get a lot better. But you already knew that, right?

  • Is Jeff Bridges most likely to win best actor?

    March 7, 2010  |  Statistics

    [Image: Oscar time]

    There's this article on CNN, from The Frisky, that has this little theory about who is most likely to win the Oscar for best actor:

    [T]he Oscar generally goes to the dude who has the most best actor and best supporting nominations under his belt already.

    That seemed like a curious statement. Didn't Forest Whitaker, Philip Seymour Hoffman, and Jamie Foxx recently win on their first nominations for the coveted award? Okay, so Hoffman was actually up against a bunch of other newbies, but what about the rest?

    Only 10 out of the past 29 winners, or just over a third, had the most nominations their year. Take a look at the data since 1980. Is the theory valid? You decide.
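    A quick check of the arithmetic behind "just over a third," using only the counts given above:

```latex
\frac{10}{29} \approx 0.345 > \frac{1}{3} \approx 0.333,
\qquad
\frac{29 - 10}{29} = \frac{19}{29} \approx 0.655
```

    So the remaining 19 of the 29 winners, roughly two thirds, did not have the most nominations in their year.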

  • Think like a statistician – without the math

    March 4, 2010  |  Design, Statistics

    [Image: Think like a Statistician]

    I call myself a statistician because, well, I'm a statistics graduate student. However, ask me specific questions about hypothesis tests or required sample size, and my answer probably won't be very good.

    The other day I was trying to think of the last time I did an actual hypothesis test or formal analysis. I couldn't remember. I actually had to dig up old course listings to figure out when it was: four years ago, during my first year of graduate school. I did well in those courses, and I'm confident I could do that stuff with a quick refresher, but off the cuff it's a no-go. It's just not something I do regularly.

    Instead, the most important things I've learned are less formal, but have proven extremely useful when working/playing with data. Here they are in no particular order.

  • Spirit of graph and dance is alive

    February 24, 2010  |  Statistics

    A good portion of my time in high school was spent trying to get into college. The rest of the time I was trying to look cool while doing it. Now of course I know better and fully embrace the inner geek. I'll never know what life would've been like had I thrown caution to the wind back then, but I'm guessing it would've been something like this.

  • Get a Date With Your Online Profile Pic – Myths Debunked

    February 10, 2010  |  Statistics

    The online dating world can be a confusing place. How do you interact with others? Who should you contact? What should you say about yourself? There are a lot of decisions to make, but when it comes to grabbing the attention of potential dates, it all starts with your profile picture. The online dating site OkCupid analyzed over 7,000 profile pictures, debunking four myths:

    1. It's better to smile
    2. You shouldn’t take your picture with your phone or webcam
    3. Guys should keep their shirts on
    4. Make sure your face is showing

    Some of the results are pretty surprising. For example, men's photos were most effective when the men weren't looking at the camera and weren't smiling.

    It was the opposite for women. A flirty face, or a smile while looking at the camera, was most effective.

    Catch the full analysis here.

    [Thanks, Tom]

  • Data.gov.uk versus Data.gov – Which wins?

    [Image: Data.gov.uk homepage]

    Back in May last year, the US government launched Data.gov as a statement of transparency, and the Internet rejoiced. After the launch, excitement kind of…

  • Understanding risk – play it safe or eat a bacon sandwich?

    January 27, 2010  |  Statistics

    David Spiegelhalter is a Professor of the Public Understanding of Risk at Cambridge University. He studies the choices we make, and how those choices can have an effect later on.

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.