Category: Statistics

  • NYC BigApps Competition – $20k In Prize Money

    Posted Oct 6, 2009 to Data Sources / 4 comments

    These are exciting times for data heads. The launch of Data.gov back in May got things jump-started; San Francisco recently announced DataSF; and now New York is getting in on the party with the announcement of its own Data Mine (live at 1pm EST today) and the NYC BigApps competition.

    Here's the idea. NYC releases 170 datasets. Whoever can best make use of the data will win part of the $20,000 in prize money. The individual or startup with the best Web or mobile application gets an invite to a dinner with NYC Mayor Michael R. Bloomberg.

    The 170 datasets include:

    • Restaurant inspection results
    • Extensive property data
    • Citywide events
    • Directories of recreation facilities and businesses
    • City budget data
    • Traffic updates
    • Alternate side parking updates

    Not too shabby, right? And that's just a small subset.

    No doubt this is going to be an interesting competition. I don't know about you, but I'm going to be keeping an eye on NYC BigApps from December to January. If the competition is a hit, other big cities will follow.

    At the very least, we're going to see some cool stuff coming out of The New York Times graphics department :).

    Subscribe to the RSS feed or follow on Twitter to stay updated on what's new in data visualization. All the cool people are doing it.

  • 30 Resources to Find the Data You Need

    Posted Oct 1, 2009 to Data Sources, Featured / 68 comments

    Let's say you have this idea for a visualization or application, or you're just curious about some trend. But you have a problem. You can't find the data, and without the data, you can't even start. This is a guide and a list of sources for where you can find that data you're looking for. There's a lot out there.

    Universities

    Being a graduate student, I always look to the library for books and resources. Many libraries are amping up their technology and have some expansive data archives. Many statistics departments also tend to keep a list of data somewhere.

  • Share and Sell Data with Infochimps (100 Invites)

    Posted Sep 25, 2009 to Data Sources / 2 comments

    There's a lot of data on the Web, but it's all very scattered. At the same time, there's a lot of data sitting on people's hard drives that we don't have access to. There are various reasons why people don't share, but mainly, they just don't see the point.

    Infochimps tries to solve both of these problems with an open data marketplace.

    Find Data

    If you're looking for data, search the Infochimps catalog, and you might find what you're looking for. The system is loosely structured and meant to be a publicly curated data place with a mix of open data and for-sale data. Some data sources are simply outgoing links while others are stored in Infochimps infrastructure.

    Sell Data

    If you're on the other side, and you have data to offer, you can put your dataset up for sale. Fill out some forms, specify your price, and let Infochimps handle the rest like storage and cataloging. Infochimps takes a 20% commission on each sale for their service.

    Selling data is of course nothing new. Search for databases for sale, and you'll get plenty of results, but this makes it easier for individuals and small groups to make their data available. Oh, and you can make your data open also.

    Quality Assurance

    The main challenge I see here is making sure the cataloged data are of good quality. It's one thing when the data are open and free, but when you're paying money, you want to make sure you're buying a product that's worth the price.

    Currently, there's a star rating system, but it's unclear who decides how many stars go on a dataset. There's also no way to get a data sample, so all you get is a description pre-purchase.

    Clearly, there's still a lot of work to be done with the application, but there's plenty of potential.

    Infochimps is currently beta testing. If you'd like to try it out, there are invites for the first 100 FlowingData readers who sign up. Use the code 'dataflowing' when you register.

    UPDATE: Infochimps has kindly provided 100 more invites in case you missed the first batch. Use this code when you sign up: flowswithdata.

  • Online Dating Service Analyzes Intro Messages – How to Get a Response

    Posted Sep 24, 2009 to Statistics / 4 comments

    Online dating can be tricky. What do you say? How do you reply to people? What should you put in your profile? Should you use that profile picture from 15 years ago?

    Well, fret no more, because OkCupid, an online dating service, analyzed over 500,000 introduction messages and whether or not they got a response from the message receiver. For example, the graph above shows reply rates for intro messages that used netspeak. Here's a tip: don't use it, probably because it makes you sound like an idiot or like you take writing advice from the comments on YouTube.

    Other fine tips include: avoid compliments on physical appearance (because it's the inside that counts) and don't try to bring the conversation outside the service (because that's creepy).

    [via Waxy]

  • TV Size Over the Past 8 Years

    Posted Sep 23, 2009 to Statistics / 21 comments

    Apparently the average television size is going to be 60 inches by 2015. Do we really need that much television? I mean, come on.

    I used to watch my mom's old 9-inch black-and-white television in my room, and I thought it was the greatest thing ever. PacMan on my cousin's hand-me-down Atari couldn't look any better. Things are a little different now, yeah? I wonder what my Xbox games would look like on that old TV.

    Anyways, I scraped some television size data representing the past eight years or so, and actually, growth isn't as dramatic as you might think.

    I remember when my 9-inch black and white with manual dial was good enough.

  • What Cell Phone Provider is Best For You?

    Posted Sep 15, 2009 to Statistics / 33 comments

    Picking a cell phone plan is confusing, but it doesn't have to be.

    Providers purposely make it confusing, so you don't see all that you're forking over per month until you're locked into a horrible 2-year plan. It doesn't have to be like this though. Let's look at the data to find which cell phone provider has the best price.

    Prices are a little different if you have an iPhone, but it's not like you have a choice in provider anyways ;).

    Okay, so now you've seen the numbers. Are you on the right cell phone plan?

  • Low Income Hinders College Attendance, Even for Top Students

    Posted Sep 1, 2009 to Statistics / 10 comments

    What if you were a good student but knew you weren't going to be able to go to college?

    I was fortunate enough for most of my life to know that if I wanted to get a higher education, I would be able to. Thanks, Mom and Dad. It's hard for me to imagine working hard in middle school and high school if I didn't have that goal in mind, but that's the path that many grow up with.

    The graph above shows the results of a Department of Education study that started in 1988. It shows that low-income students are unlikely to complete college, even when they did well in 8th grade. It's a much different story for high-income students.

    The Department tracked student progress in 8th grade and through high school and college over the next 12 years. Only 3% of students from low-income families with low 8th grade math performance completed college. Compare that to students with the same math performance but from high-income families: thirty percent finished college. That's ten times the rate of the former group.

    What's worse is that many low-income students who had high math performance still didn't complete college. The college completion rate for low-income, high-math students was still lower than the rate for high-income, low-math students.

    [via @golan]

  • Data is the New Hot, Drop-dead Gorgeous Field

    Posted Aug 7, 2009 to Statistics / 12 comments

    We all know this already, but it's nice to get some backing from The New York Times every now and then. In this NYT article, which I'm sure has spread to every statistician's email inbox by now, Steve Lohr describes the dead sexy that is statistics:

    The rising stature of statisticians, who can earn $125,000 at top companies in their first year after getting a doctorate, is a byproduct of the recent explosion of digital data. In field after field, computing and the Web are creating new realms of data to explore — sensor signals, surveillance tapes, social network chatter, public records and more. And the digital data surge only promises to accelerate, rising fivefold by 2012, according to a projection by IDC, a research firm.

    I've got about one more year (hopefully) until I finish graduate school. Hmm, things are looking up, yeah? Of course, it's never been about the money. The profession of statistician didn't seem nearly so hot when I started school. The best news here is that we data folk are going to get paid for doing what we enjoy, and as time goes on there's only going to be more data to play with, and we're going to be in high demand:

    Yet data is merely the raw material of knowledge. “We’re rapidly entering a world where everything can be monitored and measured,” said Erik Brynjolfsson, an economist and director of the Massachusetts Institute of Technology’s Center for Digital Business. “But the big problem is going to be the ability of humans to use, analyze and make sense of the data.”

    Wait, but it's not just statisticians who can interpret data:

    Though at the fore, statisticians are only a small part of an army of experts using modern statistical techniques for data analysis. Computing and numerical skills, experts say, matter far more than degrees. So the new data sleuths come from backgrounds like economics, computer science and mathematics.

    Like a... data scientist? Excellent.

  • IT Dashboard and Data from USAspending.gov

    Posted Jul 22, 2009 to Data Sources / 9 comments

    Taking another step towards data transparency, the US government provides the IT dashboard via USAspending.gov:

    The IT Dashboard provides the public with an online window into the details of Federal information technology investments and provides users with the ability to track the progress of investments over time. The IT Dashboard displays data received from agency reports to the Office of Management and Budget (OMB), including general information on over 7,000 Federal IT investments and detailed data for nearly 800 of those investments that agencies classify as "major." The performance data used to track the 800 major IT investments is based on milestone information displayed in agency reports to OMB called "Exhibit 300s." Agency CIOs are responsible for evaluating and updating select data on a monthly basis, which is accomplished through interfaces provided on the website.

    Along with a page to filter and download spending data, there are several views into the IT spending data, all of which provide a pretty good level of interaction.

    One thing I can't really figure out is if "IT investments" means investments in the traditional sense like stocks, or if it's something else. I was a little surprised that the government is making investments at all but I guess I didn't have any good reason to think that. I don't know. Maybe someone can explain it to me.

    [Thanks Justin & Preston]

  • Taking a Closer Look at Airplane-Bird Collisions

    Posted Jul 16, 2009 to Data Sources / 10 comments

    While we're on the subject of flight, ever since that plane landed in the Hudson River a few months ago, the thought of bird-airplane collisions hasn't strayed too far from the media (or my mind each time I fly). In light of all the hoopla, the Federal Aviation Administration (FAA) finally gave in and opened up its bird strike database to the public.

    Below is an interactive that explores this data, breaking things down by bird type, location, phase of flight, and time of day. Click through to this post to view.

    Beware of the Canada Goose and gulls, rats of the sea. The sparrow, Mourning Dove, and European Starling seem to get in the way plenty also, but don't cause nearly as much damage.

    On the flip side - poor birds. What a way to go.

    Do you see anything interesting?

  • Explore World Data with Factbook eXplorer from OECD

    The Organisation for Economic Co-operation and Development (OECD) makes a lot of world indicators available (e.g. world population and birth rate). Much of it goes unnoticed, because most people just see a bunch of numbers. However, the Factbook eXplorer from the OECD, in collaboration with the National Center for Visual Analytics, is a visualization tool that helps you see and explore the data.

    Those who have seen Hans Rosling's Gapminder presentations - and I imagine most of us have - will recognize the style with a play button and a motion graph in sync with parallel coordinates and a map. Choose an indicator, or several of them, press play, and watch the visualization move through time.

    Also, if you've got your own data, you can load that too, which is certainly a nice touch.

    [via BBC News | Thanks, Lawrie & Liam]

  • The Devil is in the Digits?

    Posted Jun 22, 2009 to Statistics / 8 comments
    Photo by Leo Reynolds

    Undoubtedly you've been seeing a lot of headlines about the stuff going on in Iran. If you haven't, you must be living under a rock.

    One of the huge issues right now is whether or not fraud was involved in the election of Mahmoud Ahmadinejad.

    Wait a minute. Voting? Results? Numbers?

    Oh, we have to look at the data for this one. Bernd Beber and Alexandra Scacco, Ph.D. candidates in political science at Columbia University, discuss in their op-ed for the Washington Post:

    The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran's provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. Two such departures from the average -- a spike of 17 percent or more in one digit and a drop to 4 percent or less in another -- are extremely unlikely. Fewer than four in a hundred non-fraudulent elections would produce such numbers.

    Why does this matter? Well, humans are bad at making up sequences of numbers. Made-up number sequences look different from real random sequences (e.g. numbers from McCain/Obama). Beber and Scacco go on to describe the details of why the data look fishy. Those of us who've read Freakonomics will recognize the discussion.

    The result?

    The probability that a fair election would produce both too few non-adjacent digits and the suspicious deviations in last-digit frequencies described earlier is less than .005. In other words, a bet that the numbers are clean is a one in two-hundred long shot.
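
    For the curious, here's a rough sketch of the last-digit idea in Python. It is not Beber and Scacco's actual analysis. It just counts how often each final digit shows up and compares that to the 10 percent you'd expect, using a plain chi-square statistic. The function name and the vote counts are made up for illustration, and the cutoff of 16.919 is the standard 5% critical value for 9 degrees of freedom.

```python
from collections import Counter

# 5% critical value of the chi-square distribution with 9 degrees of freedom
CRITICAL_95_DF9 = 16.919


def last_digit_chi_square(counts):
    """Return (digit frequencies, chi-square statistic) for the last digits of counts."""
    digits = [abs(n) % 10 for n in counts]
    observed = Counter(digits)
    expected = len(digits) / 10  # uniform: each digit should end 10% of the counts
    stat = sum((observed.get(d, 0) - expected) ** 2 / expected for d in range(10))
    freqs = {d: observed.get(d, 0) / len(digits) for d in range(10)}
    return freqs, stat


# Hypothetical vote counts, invented for this example (NOT the Iranian results).
vote_counts = [
    10457, 23817, 5127, 88347, 41207, 9917, 30277, 1203, 7757, 64081,
    2337, 51897, 12046, 7784, 39127, 4560, 27718, 83362, 6199, 15007,
]

freqs, stat = last_digit_chi_square(vote_counts)
for digit in range(10):
    print(f"{digit}: {freqs[digit]:.0%}")
print(f"chi-square = {stat:.2f}; suspicious at the 5% level: {stat > CRITICAL_95_DF9}")
```

    With only a handful of counts, the chi-square approximation is shaky, so treat this as a toy. The op-ed looks at specific deviations (a spike in one digit and a drop in another) and at digit adjacency too, which this sketch doesn't cover.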

    Now what?

    [via Statistical Modeling]

  • The Current State of Social Data

    Posted Jun 16, 2009 to Social Data Analysis / 7 comments

    Check out my guest post on The Guardian's Data Blog on the current state of social data applications. There seem to be a ton of them, but none of them have really taken off (yet).

    While the post is more of an overview of what's available, I'd like to start a little discussion here on why these data apps haven't gained more popularity. There always seems to be a lot of buzz around launch time, but then it fizzles.

    Are people just not interested in interacting with data or do we need to approach the whole social data puzzle from a different angle?

  • Poll: Will Data Always Be Just For Geeks?

    Posted Jun 10, 2009 to Polls, Statistics / 20 comments
    Photo by penmachine

    I threw out a random thought a couple of months back. I tweeted, "Remember when computers used to be just for geeks? Now they're ubiquitous. We can do the same for data."

    To be honest, I was just babbling, but I've been giving it some thought, and you know, now I'm not so sure. There are so many applications popping up every day that promise to socialize data. To make it the YouTube of data. None of them have really taken off though.

    Is it because the visualization tools aren't advanced enough to make data accessible to the common user or is data simply meant to stay in the hands of experts?

    So this raises the question:

    Will Data Always Be Just For Geeks?

    If yes, what do you think makes data so distant to non-experts? If no, what will it take for non-experts to start interacting with data? Or are they already?

  • Rise of the Data Scientist

    Posted Jun 4, 2009 to Data Design Tips, Featured, Statistics / 44 comments

    Photo by majamarko

    As we've all read by now, Google's chief economist Hal Varian commented in January that the sexy job in the next 10 years would be statistician. Obviously, I wholeheartedly agree. Heck, I'd go a step further and say they're sexy now - mentally and physically.

    However, if you went on to read the rest of Varian's interview, you'd know that by statisticians, he actually meant it as a general title for someone who is able to extract information from large datasets and then present something of use to non-data experts.