Undoubtedly you've been seeing a lot of headlines about the stuff going on in Iran. If you haven't, you must be living under a rock.
One of the huge issues right now is whether or not fraud was involved in the election of Mahmoud Ahmadinejad.
Wait a minute. Voting? Results? Numbers?
Oh, we have to look at the data for this one. Bernd Beber and Alexandra Scacco, Ph.D. candidates in political science at Columbia University, discuss in their Op-ed for the Washington Post:
The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran's provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. Two such departures from the average -- a spike of 17 percent or more in one digit and a drop to 4 percent or less in another -- are extremely unlikely. Fewer than four in a hundred non-fraudulent elections would produce such numbers.
Why does this matter? Well humans are bad at making up sequences of numbers. Made-up number sequences look different from real random sequences (e.g. numbers from McCain/Obama). Beber and Scacco go on to describe the details of why the data look fishy. For those of us who've read Freakonomics will recognize the discussion.
The probability that a fair election would produce both too few non-adjacent digits and the suspicious deviations in last-digit frequencies described earlier is less than .005. In other words, a bet that the numbers are clean is a one in two-hundred long shot.
[via Statistical Modeling]