<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: The Devil is in the Digits?</title>
	<atom:link href="http://flowingdata.com/2009/06/22/the-devil-is-in-the-digits/feed/" rel="self" type="application/rss+xml" />
	<link>http://flowingdata.com/2009/06/22/the-devil-is-in-the-digits/</link>
	<description>Strength in Numbers</description>
	<lastBuildDate>Thu, 24 May 2012 07:38:42 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Joe H.</title>
		<link>http://flowingdata.com/2009/06/22/the-devil-is-in-the-digits/#comment-31222</link>
		<dc:creator>Joe H.</dc:creator>
		<pubDate>Wed, 24 Jun 2009 02:35:01 +0000</pubDate>
		<guid isPermaLink="false">http://flowingdata.com/?p=1930#comment-31222</guid>
		<description>Just because the numbers don&#039;t fit one&#039;s tidy little theory doesn&#039;t make them false. Just because the chances aren&#039;t great about something doesn&#039;t mean it didn&#039;t happen.

You say, &quot;Made-up number sequences look different from real random sequences (e.g. numbers from McCain/Obama).&quot; It&#039;s hard to decide whether the e.g. refers to made-up seq&#039;s or real seq&#039;s but if you think McCain/Obama was real it shows how far into anal darkness your head is.</description>
		<content:encoded><![CDATA[<p>Just because the numbers don&#8217;t fit one&#8217;s tidy little theory doesn&#8217;t make them false. Just because the chances aren&#8217;t great about something doesn&#8217;t mean it didn&#8217;t happen.</p>
<p>You say, &#8220;Made-up number sequences look different from real random sequences (e.g. numbers from McCain/Obama).&#8221; It&#8217;s hard to decide whether the e.g. refers to made-up seq&#8217;s or real seq&#8217;s but if you think McCain/Obama was real it shows how far into anal darkness your head is.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Zach</title>
		<link>http://flowingdata.com/2009/06/22/the-devil-is-in-the-digits/#comment-31214</link>
		<dc:creator>Zach</dc:creator>
		<pubDate>Tue, 23 Jun 2009 16:41:30 +0000</pubDate>
		<guid isPermaLink="false">http://flowingdata.com/?p=1930#comment-31214</guid>
		<description>Visio&#039;s critique isn&#039;t on the right track; the authors are right that the phenomenon they&#039;ve identified is rare (although they get the probability wrong - it&#039;ll happen in 0.15% and not 0.5% of elections).

However, go look at the authors&#039; previous work on elections in Nigeria and, assuming you have ever honestly used statistical tests, you&#039;ll see the flaw in their analysis.  They apply a different test to the Nigerian data; in fact, the psychological studies they cite in that work directly belie the Iranian data.  They posit that a fraudulent set of random numbers will have too many 1s, 2s, and 3s and not enough high digits.  Yet here they see an excess of 7s and a paucity of 9s - neither number is discussed in their earlier work.  The nonadjacent numbers test was discussed in that work but not applied to the Nigerian data - why is it used here?

The most likely answer is that the authors here looked at every possible thing they&#039;ve identified as diagnostic of a fraudulent election, picked the ones that occurred, and calculated the probability of that occurrence.  As a statistical test of fraud, that&#039;s completely worthless.

Ask yourself: had this phenomenon occurred in the penultimate digit instead of the last digit, would the authors have written the same article?

In fact, in the 2008 Obama/McCain election (set of 102 state totals from Wikipedia), 20% of the numbers have a 7 in the penultimate digit and only 5% have an 8.  This is rarer than the 7s and 9s observation here.

If you look at any set of 116 similar, random numbers, you&#039;ll find an apparently paradoxical pattern; it would be much, much more unlikely, in fact, not to see such a pattern.  Would fraud be less likely if each digit were used exactly 11 or 12 times out of the 116 numbers?  That&#039;s a much less likely outcome than having one digit occur 17% of the time and another occur 4% of the time.</description>
		<content:encoded><![CDATA[<p>Visio&#8217;s critique isn&#8217;t on the right track; the authors are right that the phenomenon they&#8217;ve identified is rare (although they get the probability wrong &#8211; it&#8217;ll happen in 0.15% and not 0.5% of elections).</p>
<p>However, go look at the authors&#8217; previous work on elections in Nigeria and, assuming you have ever honestly used statistical tests, you&#8217;ll see the flaw in their analysis.  They apply a different test to the Nigerian data; in fact, the psychological studies they cite in that work directly belie the Iranian data.  They posit that a fraudulent set of random numbers will have too many 1s, 2s, and 3s and not enough high digits.  Yet here they see an excess of 7s and a paucity of 9s &#8211; neither number is discussed in their earlier work.  The nonadjacent numbers test was discussed in that work but not applied to the Nigerian data &#8211; why is it used here?</p>
<p>The most likely answer is that the authors here looked at every possible thing they&#8217;ve identified as diagnostic of a fraudulent election, picked the ones that occurred, and calculated the probability of that occurrence.  As a statistical test of fraud, that&#8217;s completely worthless.</p>
<p>Ask yourself: had this phenomenon occurred in the penultimate digit instead of the last digit, would the authors have written the same article?</p>
<p>In fact, in the 2008 Obama/McCain election (set of 102 state totals from Wikipedia), 20% of the numbers have a 7 in the penultimate digit and only 5% have an 8.  This is rarer than the 7s and 9s observation here.</p>
<p>If you look at any set of 116 similar, random numbers, you&#8217;ll find an apparently paradoxical pattern; it would be much, much more unlikely, in fact, not to see such a pattern.  Would fraud be less likely if each digit were used exactly 11 or 12 times out of the 116 numbers?  That&#8217;s a much less likely outcome than having one digit occur 17% of the time and another occur 4% of the time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Myrddin Emrys</title>
		<link>http://flowingdata.com/2009/06/22/the-devil-is-in-the-digits/#comment-31192</link>
		<dc:creator>Myrddin Emrys</dc:creator>
		<pubDate>Mon, 22 Jun 2009 22:07:33 +0000</pubDate>
		<guid isPermaLink="false">http://flowingdata.com/?p=1930#comment-31192</guid>
		<description>Meant to be humorous, I know, but alphabetic characters are not normal data (in the mathematical meaning of the word &#039;normal&#039;). The last digit or two of elections should, in theory, be normal data.</description>
		<content:encoded><![CDATA[<p>Meant to be humorous, I know, but alphabetic characters are not normal data (in the mathematical meaning of the word &#8216;normal&#8217;). The last digit or two of elections should, in theory, be normal data.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Vote totals from Iran were statistically&#160;improbable &#124; MNpublius.com</title>
		<link>http://flowingdata.com/2009/06/22/the-devil-is-in-the-digits/#comment-31191</link>
		<dc:creator>Vote totals from Iran were statistically&#160;improbable &#124; MNpublius.com</dc:creator>
		<pubDate>Mon, 22 Jun 2009 21:58:06 +0000</pubDate>
		<guid isPermaLink="false">http://flowingdata.com/?p=1930#comment-31191</guid>
		<description>[...] [via&#160;FlowingData]    June 22, 2009, 4:57 pm &#124; Category: General &#124; [...]</description>
		<content:encoded><![CDATA[<p>[...] [via&nbsp;FlowingData]    June 22, 2009, 4:57 pm | Category: General | [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nathan Yau</title>
		<link>http://flowingdata.com/2009/06/22/the-devil-is-in-the-digits/#comment-31190</link>
		<dc:creator>Nathan Yau</dc:creator>
		<pubDate>Mon, 22 Jun 2009 20:06:47 +0000</pubDate>
		<guid isPermaLink="false">http://flowingdata.com/?p=1930#comment-31190</guid>
		<description>no comment.</description>
		<content:encoded><![CDATA[<p>no comment.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Visio Guy</title>
		<link>http://flowingdata.com/2009/06/22/the-devil-is-in-the-digits/#comment-31187</link>
		<dc:creator>Visio Guy</dc:creator>
		<pubDate>Mon, 22 Jun 2009 17:46:38 +0000</pubDate>
		<guid isPermaLink="false">http://flowingdata.com/?p=1930#comment-31187</guid>
		<description>Maybe the Iranian ballot-stuffers used FlowingData text to generate those false election results.

I took the text from this post, then looked at the last digit of the ASCII code for each column (i.e., the ones column). Here is the histogram for that data:

00. &#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;
01. &#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;
02. &#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;
03. &#124;&#124;&#124;&#124;&#124;&#124;&#124;
04. &#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;
05. &#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;
06. &#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;
07. &#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;
08. &#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;
09. &#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;&#124;

Clearly, there are too many characters that end with a code of &quot;2&quot; in this post, which makes me suspect that it wasn&#039;t generated from a valid sample.

Hmmmm....</description>
		<content:encoded><![CDATA[<p>Maybe the Iranian ballot-stuffers used FlowingData text to generate those false election results.</p>
<p>I took the text from this post, then looked at the last digit of the ASCII code for each column (i.e., the ones column). Here is the histogram for that data:</p>
<p>00. |||||||||||||||||||||||||||||||||<br />
01. ||||||||||||||||||||||||||||||||||||||||||||||||<br />
02. ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||<br />
03. |||||||<br />
04. |||||||||||||||||||||||||<br />
05. ||||||||||||||||||||||||||||||||<br />
06. |||||||||||||||||||||||<br />
07. ||||||||||||||||||||||||||||<br />
08. ||||||||||||||||<br />
09. |||||||||||||||||</p>
<p>Clearly, there are too many characters that end with a code of &#8220;2&#8221; in this post, which makes me suspect that it wasn&#8217;t generated from a valid sample.</p>
<p>Hmmmm&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael O.</title>
		<link>http://flowingdata.com/2009/06/22/the-devil-is-in-the-digits/#comment-31185</link>
		<dc:creator>Michael O.</dc:creator>
		<pubDate>Mon, 22 Jun 2009 14:34:26 +0000</pubDate>
		<guid isPermaLink="false">http://flowingdata.com/?p=1930#comment-31185</guid>
		<description>Hi Nathan,

Somewhat related to this, I just finished an &lt;a href=&quot;http://vis.cs.ucdavis.edu/~ogawa/research/iran-election-map/&quot; rel=&quot;nofollow&quot;&gt;election map of Iran&lt;/a&gt;. I found it odd that I didn&#039;t see anything like it in the news, so I decided to make one myself. Geographic visualization isn&#039;t my usual thing, but the data is really compelling.</description>
		<content:encoded><![CDATA[<p>Hi Nathan,</p>
<p>Somewhat related to this, I just finished an <a href="http://vis.cs.ucdavis.edu/~ogawa/research/iran-election-map/" rel="nofollow">election map of Iran</a>. I found it odd that I didn&#8217;t see anything like it in the news, so I decided to make one myself. Geographic visualization isn&#8217;t my usual thing, but the data is really compelling.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Thomas Lotze</title>
		<link>http://flowingdata.com/2009/06/22/the-devil-is-in-the-digits/#comment-31182</link>
		<dc:creator>Thomas Lotze</dc:creator>
		<pubDate>Mon, 22 Jun 2009 08:59:54 +0000</pubDate>
		<guid isPermaLink="false">http://flowingdata.com/?p=1930#comment-31182</guid>
		<description>I actually prefer Professor Mebane&#039;s analysis, which can be found &lt;a href=&quot;http://www-personal.umich.edu/~wmebane/&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.  He has been updating results daily over the last week.  He looks at the &lt;em&gt;second&lt;/em&gt; digit for Benford anomalies.  Previous work of his has examined why first digit distributions can be naturally different from Benford results, a result recently referenced by &lt;a href=&quot;http://www.fivethirtyeight.com/2009/06/unconvincing-to-me-use-of-benfords-law.html&quot; rel=&quot;nofollow&quot;&gt;fivethirtyeight.com&lt;/a&gt;, which also had &lt;a href=&quot;http://www.fivethirtyeight.com/2009/06/karroubis-unlucky-7s.html&quot; rel=&quot;nofollow&quot;&gt;another analysis of the 7s argument&lt;/a&gt;.  The paper is fairly accessible, but I&#039;ve also put up some visualizations of his results &lt;a href=&quot;http://www.math.umd.edu/~lotze&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;.  Also available there is the full data used for the analysis and the R files used--I invite people here to find other ways to show the results, or find other aspects to the data.</description>
		<content:encoded><![CDATA[<p>I actually prefer Professor Mebane&#8217;s analysis, which can be found <a href="http://www-personal.umich.edu/~wmebane/" rel="nofollow">here</a>.  He has been updating results daily over the last week.  He looks at the <em>second</em> digit for Benford anomalies.  Previous work of his has examined why first digit distributions can be naturally different from Benford results, a result recently referenced by <a href="http://www.fivethirtyeight.com/2009/06/unconvincing-to-me-use-of-benfords-law.html" rel="nofollow">fivethirtyeight.com</a>, which also had <a href="http://www.fivethirtyeight.com/2009/06/karroubis-unlucky-7s.html" rel="nofollow">another analysis of the 7s argument</a>.  The paper is fairly accessible, but I&#8217;ve also put up some visualizations of his results <a href="http://www.math.umd.edu/~lotze" rel="nofollow">here</a>.  Also available there is the full data used for the analysis and the R files used&#8211;I invite people here to find other ways to show the results, or find other aspects to the data.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

