• Virtual Slot Machine Teaches the Logic of Loss

    December 18, 2009  |  Infographics, Statistics

    This interactive by the Las Vegas Sun explains how, in the long run, you're going to lose every penny of the hard-earned money you throw into a slot machine. In the short term, though, it is possible to win. It's all probability. It's also why statisticians don't gamble. Nobody plays a game they're practically guaranteed to lose, unless they're a masochist or Al Pacino in that one horrible sports gambling movie with Matthew McConaughey.

    One clarification on the snippet about payout percentage.

    Here's what the graphic reads:

    This is the ratio of money a player will get back to the amount of money he bets, which is programmed into the slot machine. If a machine has payout percentage of 90 percent, that means 90 percent of the money someone bets should statistically be won back. It means a player is not likely to lose 10 percent of the amount initially put into the machine, but rather 10 percent, on average, over time.

    The wording is a bit confusing. To be clearer: over time, you'd lose, on average, 10% of the money you put in on each bet. This is an important point, because it's how casinos make money. For example, when you play blackjack perfectly (without card counting), you'll lose on average 2% (or something like that) per hand, so if you play long enough, you're going to lose all your money.

    Imagine you have two buckets. One is filled with water. The other is empty. Transfer the water back and forth between the two buckets. Some of the water drips out during some of the transfers. Eventually, all the water is on the ground.
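    The payout math and the bucket analogy can be sketched in a few lines of Python. This is a minimal illustration under assumed numbers, not the Las Vegas Sun's model: the hypothetical machine pays back 9 times the bet with probability 0.1 and nothing otherwise, which works out to a 90 percent payout. `expected_bankroll` shows the bucket effect (re-betting everything loses 10 percent per pass on average, so the expected balance shrinks geometrically), and `simulate_session` plays fixed-size bets until the money runs out or the spins are used up.

```python
import random


def expected_bankroll(start: float, payout: float, rounds: int) -> float:
    """Expected money left after cycling the whole bankroll through the machine.

    Each pass returns `payout` of what went in, on average, so the expected
    balance shrinks geometrically: start * payout ** rounds.
    """
    return start * payout ** rounds


def simulate_session(start: float, bet: float, win_prob: float,
                     win_multiplier: float, max_spins: int,
                     rng: random.Random) -> float:
    """Play fixed-size bets on a hypothetical machine; return the final bankroll.

    A spin pays back `win_multiplier * bet` with probability `win_prob` and
    nothing otherwise, so the payout percentage is win_prob * win_multiplier.
    """
    bankroll = start
    for _ in range(max_spins):
        if bankroll < bet:  # can't cover another bet: the bucket is empty
            break
        bankroll -= bet
        if rng.random() < win_prob:
            bankroll += win_multiplier * bet
    return bankroll


if __name__ == "__main__":
    # Assumed machine: 10% chance of a 9x payback, i.e. a 90% payout.
    print(expected_bankroll(100, 0.9, 10))  # about 34.87 left of 100

    rng = random.Random(42)
    finals = [simulate_session(100, 1, 0.1, 9, 1000, rng) for _ in range(2000)]
    print(sum(finals) / len(finals))                 # average final bankroll
    print(sum(f < 1 for f in finals) / len(finals))  # share of sessions that went broke
```

    Individual sessions can end ahead, which is the short-term win; the average across many sessions is what drips onto the ground.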

    Ah yes, intro probability is fun. Play the virtual slot machine and do some learning for yourself.

    [Thanks, Tyson]

  • Fox News Makes the Best Pie Chart. Ever.

    November 26, 2009  |  Mistaken Data, Ugly Charts

    Fox News pie chart

    What? I don't see anything wrong with it.

  • Choose Your Own Adventure – Watch the Stories Unfold (Updated)

    November 19, 2009  |  Infographics, Statistics

    Interaction designer Christian Swinehart takes a careful look at the popular Choose Your Own Adventure books from the 1980s. We saw something like this before, but Swinehart takes it a step further.

  • Class Size and SAT Scores By State

    November 10, 2009  |  Statistics

    Are there any differences in student performance between schools with small classes (as in students per teacher) and those with large classes?

    The natural response is yeah, of course, because if there are fewer students per teacher, each student gets more individual attention. Then again, I went to pretty big elementary and high schools where some classes were in the high thirties. It didn't seem all that bad.

  • Unemployment, 2004 to Present – The Country is Bleeding

    November 4, 2009  |  Data Sources, Mapping

    The Bureau of Labor Statistics released the most recent unemployment numbers last week. Things aren't looking good for the unemployed, I'm afraid.

    I showed my younger sister the maps. Her response: "It looks like the country is bleeding."

  • Target Store Openings Since the First in 1962 – Data Now Available

    October 22, 2009  |  Data Sources

    FlowingData readers who have been around for a while will remember the map I made earlier this year showing the growth of Target stores across America. It starts with the first store in 1962 and goes from there. It was a follow-up to the Walmart map, for which I shared the code and data.

  • NYC BigApps Competition – $20k In Prize Money

    October 6, 2009  |  Data Sources

    These are exciting times for data heads. The launch of Data.gov back in May jump-started things; San Francisco recently announced DataSF; and now New York is getting in on the party with the announcement of its own Data Mine (live at 1pm EST today) and the NYC BigApps competition.

  • 30 Resources to Find the Data You Need

    October 1, 2009  |  Data Sources

    Let's say you have this idea for a visualization or application, or you're just curious about some trend. But you have a problem. You can't find the data, and without the data, you can't even start. This is a guide and a list of sources for finding the data you're looking for. There's a lot out there.

    Universities

    Being a graduate student, I always look to the library for books and resources. Many libraries are amping up their technology and have some expansive data archives. Many statistics departments also keep a list of datasets somewhere.

  • Share and Sell Data with Infochimps (100 Invites)

    September 25, 2009  |  Data Sources

    There's a lot of data on the Web, but it's all very scattered. At the same time, there's a lot of data sitting on people's hard drives that we don't have access to. There are various reasons why people don't share, but mainly, they just don't see the point.

    Infochimps tries to solve both of these problems with an open data marketplace.

  • Online Dating Service Analyzes Intro Messages – How to Get a Response

    September 24, 2009  |  Statistics


    Online dating can be tricky. What do you say? How do you reply to people? What should you put in your profile? Should you use that profile picture from 15 years ago?

    Well, fret no more, because OkCupid, an online dating service, analyzed over 500,000 introduction messages and whether or not they got a response from the recipient. For example, the graph above shows reply rates for intro messages that used netspeak. Here's a tip: don't use it. It probably makes you sound like an idiot, or like you take writing advice from YouTube comments.

    Other fine tips include: avoid compliments on physical appearance (because it's the inside that counts) and don't try to bring the conversation outside the service (because that's creepy).

    [via Waxy]

  • What Cell Phone Provider is Best For You?

    September 15, 2009  |  Statistics

    Picking a cell phone plan is confusing, but it doesn't have to be.

    Providers make it that way on purpose, so you don't see everything you're forking over each month until you're locked into a horrible two-year contract. Let's look at the data to find which cell phone provider has the best price.

  • Low Income Hinders College Attendance, Even for Top Students

    September 1, 2009  |  Statistics


    What if you were a good student but knew you weren't going to be able to go to college?

    I was fortunate enough for most of my life to know that if I wanted to get a higher education, I would be able to. Thanks, Mom and Dad. It's hard for me to imagine working hard in middle school and high school if I didn't have that goal in mind, but that's the path that many grow up with.

    The graph above shows the results of a Department of Education study that began in 1988. Low-income students are the least likely to complete college, even when they did well in 8th grade. It's a much different story for high-income students.

    The Department tracked students from 8th grade through high school and college over the next 12 years. Only 3% of students from low-income families with low 8th-grade math performance completed college. Compare that to students with the same math performance from high-income families: 30% finished college. That's ten times the rate of the former group.

    What's worse is that many low-income students with high math performance still didn't complete college. The college completion rate for low-income, high-math students was still lower than that of high-income, low-math students.

    [via @golan]

  • Data is the New Hot, Drop-dead Gorgeous Field

    August 7, 2009  |  Statistics

    We all know this already, but it's nice to get some backing from The New York Times every now and then. In this NYT article, which I'm sure has landed in every statistician's inbox by now, Steve Lohr describes just how dead sexy statistics is:

    The rising stature of statisticians, who can earn $125,000 at top companies in their first year after getting a doctorate, is a byproduct of the recent explosion of digital data. In field after field, computing and the Web are creating new realms of data to explore: sensor signals, surveillance tapes, social network chatter, public records and more. And the digital data surge only promises to accelerate, rising fivefold by 2012, according to a projection by IDC, a research firm.

    I've got about one more year (hopefully) until I finish graduate school. Hmm, things are looking up, yeah? Of course, it's never been about the money. The profession of statistician didn't seem nearly so hot when I started school. The best news here is that we data folk are going to get paid for doing what we enjoy, and as time goes on, there's only going to be more data to play with, and we're going to be in high demand:

    Yet data is merely the raw material of knowledge. "We're rapidly entering a world where everything can be monitored and measured," said Erik Brynjolfsson, an economist and director of the Massachusetts Institute of Technology's Center for Digital Business. "But the big problem is going to be the ability of humans to use, analyze and make sense of the data."

    Wait, but it's not just statisticians who can interpret data:

    Though at the fore, statisticians are only a small part of an army of experts using modern statistical techniques for data analysis. Computing and numerical skills, experts say, matter far more than degrees. So the new data sleuths come from backgrounds like economics, computer science and mathematics.

    Like a... data scientist? Excellent.

  • IT Dashboard and Data from USAspending.gov

    July 22, 2009  |  Data Sources


    Taking another step towards data transparency, the US government provides the IT dashboard via USAspending.gov:

    The IT Dashboard provides the public with an online window into the details of Federal information technology investments and provides users with the ability to track the progress of investments over time. The IT Dashboard displays data received from agency reports to the Office of Management and Budget (OMB), including general information on over 7,000 Federal IT investments and detailed data for nearly 800 of those investments that agencies classify as "major." The performance data used to track the 800 major IT investments is based on milestone information displayed in agency reports to OMB called "Exhibit 300s." Agency CIOs are responsible for evaluating and updating select data on a monthly basis, which is accomplished through interfaces provided on the website.

    Along with a page to filter and download spending data, there are a variety of views into the IT spending data, all of which offer a pretty good level of interaction.

  • Taking a Closer Look at Airplane-Bird Collisions

    July 16, 2009  |  Data Sources

    While we're on the subject of flight, ever since that plane landed in the Hudson River a few months ago, bird-airplane collisions haven't strayed far from the media (or from my mind each time I fly). In light of all the hoopla, the Federal Aviation Administration (FAA) finally gave in and opened its bird strike database to the public.

    Below is an interactive that explores this data, breaking it down by bird type, location, phase of flight, and time of day. Click through to this post to view it.

  • Explore World Data with Factbook eXplorer from OECD


    The Organization for Economic Co-operation and Development (OECD) makes a lot of world indicators available (e.g. world population and birth rate). Much of it goes unnoticed, because most people just see a bunch of numbers. However, the Factbook eXplorer from the OECD, in collaboration with the National Center for Visual Analytics, is a visualization tool that helps you see and explore the data.

    Those who have seen Hans Rosling's Gapminder presentations (and I imagine most of us have) will recognize the style: a play button and a motion graph in sync with parallel coordinates and a map. Choose an indicator, or several of them, press play, and watch the visualization move through time.

    Also, if you've got your own data, you can load that too, which is certainly a nice touch.

    [via BBC News | Thanks, Lawrie & Liam]

  • The Devil is in the Digits?

    June 22, 2009  |  Statistics
    Photo by Leo Reynolds

    Undoubtedly you've been seeing a lot of headlines about the stuff going on in Iran. If you haven't, you must be living under a rock.

    One of the huge issues right now is whether or not fraud was involved in the re-election of Mahmoud Ahmadinejad.

    Wait a minute. Voting? Results? Numbers?

    Oh, we have to look at the data for this one. Bernd Beber and Alexandra Scacco, Ph.D. candidates in political science at Columbia University, discuss the numbers in their op-ed for the Washington Post:

    The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran's provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. Two such departures from the average -- a spike of 17 percent or more in one digit and a drop to 4 percent or less in another -- are extremely unlikely. Fewer than four in a hundred non-fraudulent elections would produce such numbers.
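    To make the last-digit argument concrete, here is a rough Python sketch of how you might tally last-digit frequencies and estimate how often fair counts would show a spike and a drop like the ones described. It assumes the last digits of honest vote counts are uniformly distributed, and it uses a Monte Carlo simulation, which may differ from whatever calculation Beber and Scacco actually ran. `n_counts` should be the number of provincial results; the 116 in the usage line is an assumed, illustrative value.

```python
import random
from collections import Counter


def last_digit_shares(vote_counts):
    """Fraction of vote counts ending in each digit 0 through 9."""
    tally = Counter(abs(c) % 10 for c in vote_counts)
    n = len(vote_counts)
    return [tally[d] / n for d in range(10)]


def spike_and_drop(shares, high=0.17, low=0.04):
    """True if some digit reaches `high` and some other digit falls to `low`.

    Because high > low, the same digit can't satisfy both conditions.
    """
    return max(shares) >= high and min(shares) <= low


def prob_under_fair_counts(n_counts, trials=20_000, seed=0):
    """Monte Carlo estimate of P(spike_and_drop) when last digits are uniform."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the fairness assumption each count's last digit is uniform
        # on 0-9, so drawing the digits directly is enough.
        digits = [rng.randrange(10) for _ in range(n_counts)]
        hits += spike_and_drop(last_digit_shares(digits))
    return hits / trials


if __name__ == "__main__":
    print(prob_under_fair_counts(116))  # 116 is an assumed, illustrative count
```

    A small estimated probability means honest counts rarely look this lopsided; it doesn't by itself prove fraud.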

    Why does this matter? Well, humans are bad at making up sequences of numbers. Made-up sequences look different from genuinely random ones (e.g. the real vote counts from the McCain/Obama election). Beber and Scacco go on to describe in detail why the data look fishy. Those of us who've read Freakonomics will recognize the discussion.

    The result?

    The probability that a fair election would produce both too few non-adjacent digits and the suspicious deviations in last-digit frequencies described earlier is less than .005. In other words, a bet that the numbers are clean is a one in two-hundred long shot.
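    Here is a rough Python sketch of the non-adjacent-digits check mentioned in the quote. It assumes "non-adjacent" means the last two digits of a count are distinct and differ by more than one, so 64 qualifies but 23 and 55 don't; with uniform digits that gives an expected share of 72 percent (a convention that also treats 0 and 9 as neighbors would give 70 percent). The binomial tail shows how unlikely a low share would be by chance, assuming the counts are independent. The "70 of 116" in the usage line is purely hypothetical, and combining this with the last-digit test into one probability, as the op-ed does, is not attempted here.

```python
from itertools import product
from math import comb


def is_distinct_nonadjacent(a, b):
    """True if digits a and b differ and are not next to each other (e.g. 6, 4)."""
    return a != b and abs(a - b) != 1


def expected_nonadjacent_share():
    """Share of the 100 equally likely (tens, ones) digit pairs that qualify."""
    pairs = list(product(range(10), repeat=2))
    return sum(is_distinct_nonadjacent(a, b) for a, b in pairs) / len(pairs)


def observed_nonadjacent_share(vote_counts):
    """Share of counts (with at least two digits) whose last two digits qualify."""
    eligible = [abs(c) for c in vote_counts if abs(c) >= 10]
    if not eligible:
        raise ValueError("need at least one vote count with two or more digits")
    hits = sum(is_distinct_nonadjacent(c // 10 % 10, c % 10) for c in eligible)
    return hits / len(eligible)


def prob_at_most(k, n, p):
    """Binomial tail P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


if __name__ == "__main__":
    p = expected_nonadjacent_share()
    print(p)  # 0.72
    print(observed_nonadjacent_share([64, 17, 23, 55]))  # 0.5
    # Hypothetical: chance that at most 70 of 116 fair counts qualify.
    print(prob_at_most(70, 116, p))
```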

    Now what?

    [via Statistical Modeling]

  • The Current State of Social Data

    June 16, 2009  |  Social Data Analysis

    Check out my guest post on The Guardian's Data Blog on the current state of social data applications. There seem to be a ton of them, but none have really taken off (yet).

    While the post is more of an overview of what's available, I'd like to start a little discussion here on why these data apps haven't gained more popularity. There always seems to be a lot of buzz around launch time, but then it fizzles.

    Are people just not interested in interacting with data or do we need to approach the whole social data puzzle from a different angle?

  • Poll: Will Data Always Be Just For Geeks?

    June 10, 2009  |  Polls, Statistics
    Photo by penmachine

    I threw out a random thought a couple of months back. I tweeted, "Remember when computers used to be just for geeks? Now they're ubiquitous. We can do the same for data."

    To be honest, I was just babbling, but I've been giving it some thought, and you know, now I'm not so sure. So many applications pop up every day promising to socialize data, to become the YouTube of data. None of them have really taken off, though.

    Is it because the visualization tools aren't advanced enough to make data accessible to the common user or is data simply meant to stay in the hands of experts?

    So this raises the question:

    Will data always be just for geeks?

    If yes, what do you think makes data so distant to non-experts? If no, what will it take for non-experts to start interacting with data? Or are they already?

  • Rise of the Data Scientist

    June 4, 2009  |  Design, Statistics

    Photo by majamarko

    As we've all read by now, Google's chief economist Hal Varian commented in January that statisticians would have the sexy job of the next 10 years. Obviously, I wholeheartedly agree. Heck, I'd go a step further and say statisticians are sexy now, mentally and physically.

    However, if you went on to read the rest of Varian's interview, you'd know that by "statisticians" he meant a general title for anyone who can extract information from large datasets and then present something useful to people who aren't data experts.

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.