Update: I had scheduled this post for next week, but apparently, Data.gov.uk launched today. The site isn't loading for me right now though. I guess they weren't prepared for traffic.
Data.gov, a catalog of US data, launched last year. Now it's the UK's turn. Well, not yet. But soon. Data.gov.uk is still under lock and key, but it has granted access to some developers. Ito Labs, the group behind mapping a year of OpenStreetMap edits, posted screenshots of their maps that show vehicle counts (above).
Here are some comparison maps between 2001 and 2008, by vehicle type.
This interactive by the Las Vegas Sun describes how, in the long run, you're going to lose every single penny when you throw your hard-earned money into a slot machine. In the short term, though, it is possible to win. It's all probability. It's also why statisticians don't gamble. Nobody plays a game that he's practically guaranteed to lose, unless he's a masochist - or he's Al Pacino in that one horrible sports gambling movie with Matthew McConaughey.
One clarification on the snippet about payout percentage.
Here's what the graphic reads:
This is the ratio of money a player will get back to the amount of money he bets, which is programmed into the slot machine. If a machine has payout percentage of 90 percent, that means 90 percent of the money someone bets should statistically be won back. It means a player is not likely to lose 10 percent of the amount initially put into the machine, but rather 10 percent, on average, over time.
The wording is kind of confusing. To be clear: over time, on average, you lose 10% of every bet you make, not just 10% of what you first put in. This is an important note, because it's how casinos make money. For example, when you play Blackjack perfectly (sans card-counting), you'll still lose a small percentage on average per hand (somewhere around 0.5% to 2%, depending on the house rules), so play long enough, and you're going to lose all your money.
Imagine you have two buckets. One is filled with water. The other is empty. Transfer the water back and forth between the two buckets. Some of the water drips out during some of the transfers. Eventually, all the water is on the ground.
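To put rough numbers on the leaky buckets, here's a minimal Python sketch. It is not how the Las Vegas Sun interactive or any real slot machine works. The function names, the 10% win probability, the bet size, and the spin cap are all assumptions I made up for illustration; the only number taken from the graphic is the 90 percent payout.

```python
import random


def expected_bankroll(start, payout, rounds):
    """Average bankroll left after re-betting everything `rounds` times.

    Each full pass through the machine returns `payout` (e.g. 0.90) of
    what went in, on average, so the average shrinks geometrically.
    """
    return start * payout ** rounds


def simulate_session(bankroll, bet, payout, win_prob, rng, max_spins=100_000):
    """Play fixed-size bets until the bankroll can't cover one more bet.

    The prize is sized so the average return per bet equals `payout`.
    `win_prob` is an assumed value, not taken from any real machine.
    Returns (spins played, bankroll left).
    """
    prize = bet * payout / win_prob
    spins = 0
    while bankroll >= bet and spins < max_spins:
        bankroll -= bet                # pour the water into the other bucket
        if rng.random() < win_prob:
            bankroll += prize          # some of it comes back
        spins += 1
    return spins, bankroll


if __name__ == "__main__":
    # Average money left after cycling $100 through a 90% machine 10 times.
    print(expected_bankroll(100, 0.90, 10))

    # One simulated player: $100 bankroll, $1 bets, assumed 10% win chance.
    spins, left = simulate_session(100, 1, 0.90, 0.10, random.Random(2009))
    print(spins, left)
```

A single session can run ahead for a while, which is the short-term luck part. The average only goes one direction.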
Ah yes, intro probability is fun. Play the virtual slot machine and do some learning for yourself.
What? I don't see anything wrong with it.
Are there any differences in student performance between schools with small classes (as in students per teacher) and those with large classes?
The natural response is yeah, of course, because if there are fewer students per teacher, each student gets more individual attention from the teacher. Then again, I went to pretty big elementary and high schools where some classes were in the high thirties. It didn't seem all that bad.
FlowingData readers who have been around for a while will remember I made a map early this year that showed the growth of Target stores across America. It starts with the first one in 1962 and then goes from there. It was a follow-up to the Walmart map, which I shared the code and data for.
These are exciting times for data heads. The launch of Data.gov back in May got things jump-started; San Francisco recently announced DataSF; and now New York is getting in on the party with the announcement of their own Data Mine (live at 1pm EST today) and the NYC Big Apps competition.
Let's say you have this idea for a visualization or application, or you're just curious about some trend. But you have a problem. You can't find the data, and without the data, you can't even start. This is a guide and a list of sources for where you can find that data you're looking for. There's a lot out there.
Being a graduate student, I always look to the library for books and resources. Many libraries are amping up their technology and have some expansive data archives. Many statistics departments also tend to keep a list of data somewhere.
There's a lot of data on the Web, but it's all very scattered. At the same time, there's a lot of data sitting on people's hard drives that we don't have access to. There are various reasons why people don't share, but mainly, they just don't see the point.
Online dating can be tricky. What do you say? How do you reply to people? What should you put in your profile? Should you use that profile picture from 15 years ago?
Well, fret no more, because OkCupid, an online dating service, analyzed over 500,000 introduction messages and whether or not they got a response from the message receiver. For example, the above graph shows reply rates for intro messages that used netspeak. Here's a tip: don't use it, probably because it makes you sound like an idiot or like you take writing advice from the comments on YouTube.
Other fine tips include: avoid compliments on physical appearance (because it's the inside that counts) and don't try to bring the conversation outside the service (because that's creepy).
Picking a cell phone plan is confusing, but it doesn't have to be.
Providers purposely make it that way, so you don't see all that you're forking over per month until you're locked into a horrible 2-year plan. It doesn't have to be like this though. Let's look at the data to find which cell phone provider has the best price.
What if you were a good student but knew you weren't going to be able to go to college?
I was fortunate enough for most of my life to know that if I wanted to get a higher education, I would be able to. Thanks, Mom and Dad. It's hard for me to imagine working hard in middle school and high school if I didn't have that goal in mind, but that's the path that many grow up with.
The above graph shows the results of a study by the Department of Education started in 1988. It shows that low-income students are most likely not to complete college - despite doing well in 8th grade. It's a much different story for high-income students.
The Department tracked student progress in 8th grade and through high school and college over the next 12 years. Only 3% of students from low-income families with low 8th grade math performance completed college. Compare that to students with the same math performance but from high-income families. Thirty percent finished college. That's ten times the rate of the former.
What's worse is that many low-income students who had high math performance still didn't complete college. The percentage of college completion for low-income, high math students was still lower than that for high-income, low math students.
We all know this already, but it's nice to get some backing from The New York Times every now and then. In this NYT article, which I'm sure has spread to every statistician's email inbox by now, Steve Lohr describes the dead sexy that is statistics:
The rising stature of statisticians, who can earn $125,000 at top companies in their first year after getting a doctorate, is a byproduct of the recent explosion of digital data. In field after field, computing and the Web are creating new realms of data to explore - sensor signals, surveillance tapes, social network chatter, public records and more. And the digital data surge only promises to accelerate, rising fivefold by 2012, according to a projection by IDC, a research firm.
I've got about one more year (hopefully) until I finish graduate school. Hmm, things are looking up, yeah? Of course, it's never been about the money. The profession of statistician didn't seem nearly so hot when I started school. The best news here is that we data folk are going to get paid for doing what we enjoy, and as time goes on there's only going to be more data to play with, and we're going to be in high demand:
Yet data is merely the raw material of knowledge. "We're rapidly entering a world where everything can be monitored and measured," said Erik Brynjolfsson, an economist and director of the Massachusetts Institute of Technology's Center for Digital Business. "But the big problem is going to be the ability of humans to use, analyze and make sense of the data."
Wait, but it's not just statisticians who can interpret data:
Though at the fore, statisticians are only a small part of an army of experts using modern statistical techniques for data analysis. Computing and numerical skills, experts say, matter far more than degrees. So the new data sleuths come from backgrounds like economics, computer science and mathematics.
Like a... data scientist? Excellent.
Taking another step towards data transparency, the US government provides the IT dashboard via USAspending.gov:
The IT Dashboard provides the public with an online window into the details of Federal information technology investments and provides users with the ability to track the progress of investments over time. The IT Dashboard displays data received from agency reports to the Office of Management and Budget (OMB), including general information on over 7,000 Federal IT investments and detailed data for nearly 800 of those investments that agencies classify as "major." The performance data used to track the 800 major IT investments is based on milestone information displayed in agency reports to OMB called "Exhibit 300s." Agency CIOs are responsible for evaluating and updating select data on a monthly basis, which is accomplished through interfaces provided on the website.
While we're on the subject of flight, ever since that plane landed in the Hudson River a few months ago, the thought of bird-airplane collisions hasn't strayed too far from the media (or my mind each time I fly). In light of all the hoopla, the Federal Aviation Administration (FAA) finally gave in and opened up their bird strike database to the public.
Below is an interactive exploring this data, breaking things down by bird type, location, phase of flight, and time of day. Click through to this post to view.
The Organisation for Economic Co-operation and Development (OECD) makes a lot of world indicators available (e.g. world population and birth rate). Much of that data goes unnoticed, because most people just see a bunch of numbers. However, the Factbook eXplorer from the OECD, in collaboration with the National Center for Visual Analytics, is a visualization tool that helps you see and explore the data.
Those who have seen Hans Rosling's Gapminder presentations - and I imagine most of us have - will recognize the style with a play button and a motion graph in sync with parallel coordinates and a map. Choose an indicator, or several of them, press play, and watch the visualization move through time.
Also, if you've got your own data, you can load that too, which is certainly a nice touch.
Photo by Leo Reynolds
Undoubtedly you've been seeing a lot of headlines about the stuff going on in Iran. If you haven't, you must be living under a rock.
One of the huge issues right now is whether or not fraud was involved in the election of Mahmoud Ahmadinejad.
Wait a minute. Voting? Results? Numbers?
Oh, we have to look at the data for this one. Bernd Beber and Alexandra Scacco, Ph.D. candidates in political science at Columbia University, discuss in their op-ed for the Washington Post:
The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran's provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. Two such departures from the average -- a spike of 17 percent or more in one digit and a drop to 4 percent or less in another -- are extremely unlikely. Fewer than four in a hundred non-fraudulent elections would produce such numbers.
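If you want to poke at last digits yourself, here's a minimal Python sketch of one common way to check them: tally the final digit of each count and compute a chi-square statistic against a uniform 10 percent per digit. To be clear, this is a swapped-in technique of my choosing, not the calculation Beber and Scacco describe, and the function names are made up for illustration. You'd feed it your own list of vote counts; no real election data is included here.

```python
from collections import Counter


def last_digit_counts(vote_counts):
    """Tally how often each digit 0-9 shows up as the last digit."""
    tally = Counter(abs(n) % 10 for n in vote_counts)
    return [tally.get(digit, 0) for digit in range(10)]


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts.

    Under the assumption that every last digit is equally likely,
    each of the ten bins expects total / 10 observations.
    """
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


if __name__ == "__main__":
    # Placeholder numbers only, NOT real vote counts.
    made_up_counts = [1207, 3417, 987, 45217, 6627, 3310, 12047, 8857]
    digits = last_digit_counts(made_up_counts)
    print(digits)
    print(chi_square_uniform(digits))
```

With ten digits there are 9 degrees of freedom, so a statistic above roughly 16.9 would be unusual at the 5 percent level if the last digits really were uniform. With only a few dozen counts, though, the chi-square approximation gets shaky, so treat the result as a nudge to look closer rather than proof of anything.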
Why does this matter? Well, humans are bad at making up sequences of numbers. Made-up number sequences look different from real random sequences (e.g. numbers from McCain/Obama). Beber and Scacco go on to describe the details of why the data look fishy. Those of us who've read Freakonomics will recognize the discussion.
The probability that a fair election would produce both too few non-adjacent digits and the suspicious deviations in last-digit frequencies described earlier is less than .005. In other words, a bet that the numbers are clean is a one in two-hundred long shot.
[via Statistical Modeling]