<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>FlowingData &#187; Statistics</title>
	<atom:link href="http://flowingdata.com/category/statistics/feed/" rel="self" type="application/rss+xml" />
	<link>http://flowingdata.com</link>
	<description>Strength in Numbers</description>
	<lastBuildDate>Thu, 24 May 2012 07:48:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<atom:link rel="next" href="http://flowingdata.com/category/statistics/feed/?page=2" />

		<item>
		<title>Why are so many men pregnant?</title>
		<link>http://flowingdata.com/2012/05/17/why-are-so-many-men-pregnant/</link>
		<comments>http://flowingdata.com/2012/05/17/why-are-so-many-men-pregnant/#comments</comments>
		<pubDate>Thu, 17 May 2012 07:01:32 +0000</pubDate>
		<dc:creator>Kim Rees</dc:creator>
				<category><![CDATA[Mistaken Data]]></category>
		<category><![CDATA[BMJ]]></category>
		<category><![CDATA[men]]></category>
		<category><![CDATA[pregnant]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=24324</guid>
		<description><![CDATA[Garbage in, garbage out, as the old adage goes. Nigel Hawkes, Director of Straight Statistics, describes a sort of statistical whistleblowing &#8230;]]></description>
			<content:encoded><![CDATA[<p>Garbage in, garbage out, as the old adage goes. Nigel Hawkes, Director of <a href="http://www.straightstatistics.org" title="Straight Statistics" target="_blank">Straight Statistics</a>, describes a sort of <a href="http://www.straightstatistics.org/blog/2012/04/06/why-are-so-many-men-pregnant" title="Why are so Many Men Pregnant?">statistical whistleblowing letter</a> to the British Medical Journal.</p>
<blockquote><p>A team from Imperial College found that in 2009-10, nearly 20,000 adults were coded as having attended paediatric outpatient services, and 3,000 patients under 19 were apparently treated in geriatric clinics. Even more striking, between 15,000 and 20,000 men have been admitted to obstetric wards each year since 2003, and almost 10,000 to gynaecology wards.</p></blockquote>
<p>It's hard to put your faith in analysis, visualization, policy, and anything else that comes out of data with reports like these. With human error being a known issue, we have to find better ways of inputting and double-checking data. Unfortunate mistakes at the outset only lead to bigger problems down the line.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/05/17/why-are-so-many-men-pregnant/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>A Future Without Key Social and Economic Statistics for the Country</title>
		<link>http://flowingdata.com/2012/05/13/a-future-without-key-social-and-economic-statistics-for-the-country/</link>
		<comments>http://flowingdata.com/2012/05/13/a-future-without-key-social-and-economic-statistics-for-the-country/#comments</comments>
		<pubDate>Sun, 13 May 2012 07:46:23 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Data Sources]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=24113</guid>
		<description><![CDATA[Robert Groves, director of the U.S. Census Bureau, on the Appropriations Bill: The Appropriations Bill eliminates the Economic Census, which &#8230;]]></description>
			<content:encoded><![CDATA[<p>Robert Groves, director of the U.S. Census Bureau, <a href="http://directorsblog.blogs.census.gov/2012/05/11/a-future-without-key-social-and-economic-statistics-for-the-country/">on the Appropriations Bill</a>:</p>
<blockquote><p>The Appropriations Bill eliminates the Economic Census, which measures the health of our economy. It terminates the American Community Survey, which produces the social and demographic information that monitors the impact of economic trends on communities throughout the country. It halts crucial development of ways to save money on the next decennial census. In the last three years the Census Bureau has reacted to budget and technological challenges by mounting aggressive operational efficiency programs to make these key statistical cornerstones of the country more cost efficient. Eliminating them halts all the progress to build 21st century statistical tools through those innovations. This bill thus devastates the nation’s statistical information about the status of the economy and the larger society.</p></blockquote>
<p>A lot of the negative comments following the post are from people who have never used Census data, or any substantial amount of data for that matter, and have no clue how a dataset can feed into a model to make other estimates. Then there are the people who don't want to answer questions about their toilets. I wonder what their Facebook profiles look like.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/05/13/a-future-without-key-social-and-economic-statistics-for-the-country/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>TV anachronisms</title>
		<link>http://flowingdata.com/2012/05/11/tv-anachronisms/</link>
		<comments>http://flowingdata.com/2012/05/11/tv-anachronisms/#comments</comments>
		<pubDate>Fri, 11 May 2012 18:05:06 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[anachronism]]></category>
		<category><![CDATA[television]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=24095</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/05/11/tv-anachronisms/"><img width="566" height="591" src="http://flowingdata.com/wp-content/uploads/2012/05/Modern-to-period-use-ratio.png" class="attachment-medium wp-post-image" alt="Modern to period use ratio" title="Modern to period use ratio" /></a></p>Princeton history graduate student Benjamin Schmidt explores changes in language through TV anachronisms. In Schmidt's most recent analysis, he examines &#8230;]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingdata.com/2012/05/11/tv-anachronisms/"><img width="566" height="591" src="http://flowingdata.com/wp-content/uploads/2012/05/Modern-to-period-use-ratio.png" class="attachment-medium wp-post-image" alt="Modern to period use ratio" title="Modern to period use ratio" /></a></p><p>Princeton history graduate student Benjamin Schmidt explores changes in language through TV anachronisms. In Schmidt's most recent analysis, he <a href="http://www.prochronism.com/2012/05/callbacks.html">examines Megan's use of "callback" in the last episode of Mad Men</a>. Above is the ratio of modern use to period use. Notice callback sticking out in the top left.</p>
<blockquote><p>The big one from the charts: Megan gets "a callback for" an audition. This is, the data says, a candidate for the worst anachronism of the season. The word "callback" is about 100x more common by the 1990s, and "callback for" is even worse. The OED doesn't have any examples of a theater-oriented use of "callback" until the 1970s; although I bet one could find some examples somewhere earlier in the New York theater scene, that may not save it. It wouldn't really suite Megan's generally dilettantish attitude towards the theater, or the office staff's lack of knowledge of it, for them to be so au courant. "call-back" and "call back" don't seem much more likely.</p></blockquote>
<p>Other anachronisms include the use of "pay phone" and a frequent use of "on the phone with," which didn't peak until the 1970s.</p>
<p>Don't miss the look into <a href="http://www.prochronism.com/2012/04/making-downton-more-traditional.html">Downton Abbey anachronisms</a>. Also, <a href="http://sappingattention.blogspot.com/2012/02/poor-mans-sentiment-analysis.html">more details</a> from Schmidt on his methodology. </p>
<p>[via <a href="http://blog.revolutionanalytics.com/2012/05/on-the-language-of-mad-men.html">Revolutions</a>]</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/05/11/tv-anachronisms/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why the American Community Survey is worth keeping</title>
		<link>http://flowingdata.com/2012/05/10/why-the-american-community-survey-is-worth-keeping/</link>
		<comments>http://flowingdata.com/2012/05/10/why-the-american-community-survey-is-worth-keeping/#comments</comments>
		<pubDate>Thu, 10 May 2012 17:02:04 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=24074</guid>
		<description><![CDATA[Jerzy Wieczorek, a statistician with the U.S. Census Bureau, explains why the American Community Survey is worthwhile. Besides the direct &#8230;]]></description>
			<content:encoded><![CDATA[<p>Jerzy Wieczorek, a statistician with the U.S. Census Bureau, <a href="http://civilstat.com/?p=319">explains why the American Community Survey is worthwhile</a>.</p>
<blockquote><p>Besides the direct estimates from the ACS itself, the Census Bureau uses ACS data as the backbone of several other programs. For example, the Small Area Income and Poverty Estimates program provides annual data to the Department of Education for use in allocating funds to school districts, based on local counts and rates of children in poverty. Without the ACS we would be limited to using smaller surveys (and thus less accurate information about poverty in each school district) or older data (which can become outdated within a few years, such as during the recent recession). Either way, it would hurt our ability to allocate resources fairly to schoolchildren nationwide.</p>
<p>Similarly, the Census Bureau uses the ACS to produce other timely small-area estimates required by Congressional legislation or requested by other agencies: the number of people with health insurance, people with disabilities, minority language speakers, etc. The legislation requires a data source like the ACS not only so that it can be carried out well, but also so its progress can be monitored.</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/05/10/why-the-american-community-survey-is-worth-keeping/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>House votes to cut the American Community Survey</title>
		<link>http://flowingdata.com/2012/05/10/house-votes-to-cut-the-american-community-survey/</link>
		<comments>http://flowingdata.com/2012/05/10/house-votes-to-cut-the-american-community-survey/#comments</comments>
		<pubDate>Thu, 10 May 2012 16:48:08 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[census]]></category>
		<category><![CDATA[survey]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=24067</guid>
		<description><![CDATA[Last month Republicans were pushing a bill to get rid of the American Community Survey, an 11-page questionnaire about housing, &#8230;]]></description>
			<content:encoded><![CDATA[<p>Last month Republicans were <a href="http://flowingdata.com/2012/04/03/fear-of-big-brother-and-government-surveys/">pushing a bill</a> to get rid of the American Community Survey, an 11-page questionnaire about housing, education, and other things. Yesterday, <a href="http://www.huffingtonpost.com/2012/05/09/house-votes-cut-census-survey_n_1504748.html">a bill passed to cut the survey in a 232 to 190 vote</a>.</p>
<blockquote><p>Republicans, acknowledging its usefulness, attacked the survey as an unconstitutional invasion of privacy, arguing that the government has no business knowing how many flush toilets someone has, for instance.</p>
<p>"It would seem that these questions hardly fit the scope of what was intended or required by the Constitution," said Rep. Daniel Webster (R-Fla.), author of the amendment.</p>
<p>"This survey is inappropriate for taxpayer dollars," Webster added. "It's the definition of a breach of personal privacy. It's the picture of what's wrong in Washington, D.C. It's unconstitutional."</p></blockquote>
<p>The ACS is <em>the</em> picture of what's wrong in Washington? This is idiocy.</p>
<h4>Related</h4><p><ul>
<li><a href='http://flowingdata.com/2012/05/10/why-the-american-community-survey-is-worth-keeping/' rel='bookmark' title='Why the American Community Survey is worth keeping'>Why the American Community Survey is worth keeping</a></li>
<li><a href='http://flowingdata.com/2011/08/03/survey-of-the-universe-fly-through/' rel='bookmark' title='Fly through a survey of the universe'>Fly through a survey of the universe</a></li>
<li><a href='http://flowingdata.com/2011/06/17/in-pursuit-of-the-american-dream-house/' rel='bookmark' title='In pursuit of the American dream (house)'>In pursuit of the American dream (house)</a></li>
</ul></p>]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/05/10/house-votes-to-cut-the-american-community-survey/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>CNN transcript collection, 2000-2012</title>
		<link>http://flowingdata.com/2012/05/09/cnn-transcript-collection-2000-2012/</link>
		<comments>http://flowingdata.com/2012/05/09/cnn-transcript-collection-2000-2012/#comments</comments>
		<pubDate>Wed, 09 May 2012 18:25:55 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Data Sources]]></category>
		<category><![CDATA[archive]]></category>
		<category><![CDATA[CNN]]></category>
		<category><![CDATA[transcripts]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=24053</guid>
		<description><![CDATA[Thanks to the Internet Archive and CNN, thirteen years of transcripts, about a gigabyte compressed, is available to download as &#8230;]]></description>
			<content:encoded><![CDATA[<p>Thanks to the Internet Archive and CNN, thirteen years of transcripts, about a gigabyte compressed, is <a href="http://archive.org/details/cnn-transcripts-2000-2012">available to download as one file</a>.</p>
<blockquote><p>For over a decade, CNN (Cable News Network) has been providing transcripts of shows, events and newscasts from its broadcasts. The archive has been maintained and the text transcripts have been dependably available at transcripts.cnn.com. This is a just-in-case grab of the years of transcripts for later study and historical research.</p></blockquote>
<p>Changes in news coverage and CNN's focus over the years, anyone?</p>
<p>[via @<a href="https://twitter.com/#!/A_L">A_L</a>]</p>
<h4>Related</h4><p><ul>
<li><a href='http://flowingdata.com/2008/03/14/10-largest-data-breaches-since-2000-millions-affected/' rel='bookmark' title='10 Largest Data Breaches Since 2000 &#8211; Millions Affected'>10 Largest Data Breaches Since 2000 &#8211; Millions Affected</a></li>
<li><a href='http://flowingdata.com/2007/12/04/transcript-analyzer-for-republican-debate/' rel='bookmark' title='Transcript Analyzer for Republican Debate'>Transcript Analyzer for Republican Debate</a></li>
<li><a href='http://flowingdata.com/2010/07/27/afghanistan-war-logs-revealed-and-mapped/' rel='bookmark' title='Afghanistan war logs revealed and mapped'>Afghanistan war logs revealed and mapped</a></li>
</ul></p>]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/05/09/cnn-transcript-collection-2000-2012/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Common statistical fallacies</title>
		<link>http://flowingdata.com/2012/05/03/common-statistical-fallacies/</link>
		<comments>http://flowingdata.com/2012/05/03/common-statistical-fallacies/#comments</comments>
		<pubDate>Thu, 03 May 2012 09:53:16 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[education]]></category>
		<category><![CDATA[fallacies]]></category>
		<category><![CDATA[Joan Garfield]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=23904</guid>
		<description><![CDATA[I've been reading papers on how people learn statistics (and thoughts on teaching the subject) and came across the frequently cited &#8230;]]></description>
			<content:encoded><![CDATA[<p>I've been reading papers on how people learn statistics (and thoughts on teaching the subject) and came across the frequently cited work of mathematical psychologists Amos Tversky and Daniel Kahneman. In 1972, they studied statistical misconceptions. It doesn't seem much has changed. Joan Garfield (1995) summarizes in <a href="http://www.stat.auckland.ac.nz/~iase/publications/isr/95.Garfield.pdf">How Students Learn Statistics</a> [pdf].</p>
<p><strong>Representativeness:</strong></p>
<blockquote><p>People estimate the likelihood of a sample based on how closely it resembles the population.</p></blockquote>
<p>You can't always judge how likely or improbable a sample is based on how it compares to a known population. For example, let's say you flip a coin four times and get four tails in a row (TTTT). Then you flip four more times and get HTHT. In the long run, heads and tails are going to be split 50/50, but that doesn't mean the second sequence is more likely. With a fair coin, each specific sequence of four flips has the same 1-in-16 chance.</p>
<p>Similarly, a sequence of ten heads in a row isn't the same as getting a million heads in a row.</p>
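<p>A minimal sketch of the coin example, assuming a fair coin and independent flips. The function name <code>sequence_probability</code> is made up for illustration. It multiplies the chance of each flip to get the chance of one exact sequence.</p>

```python
from fractions import Fraction

def sequence_probability(sequence, p_heads=Fraction(1, 2)):
    """Probability of one exact sequence of independent coin flips."""
    prob = Fraction(1)
    for flip in sequence:
        # Multiply in the chance of this flip; flips do not affect each other.
        prob *= p_heads if flip == "H" else 1 - p_heads
    return prob

print(sequence_probability("TTTT"))  # 1/16
print(sequence_probability("HTHT"))  # 1/16
```

<p>Both sequences come out the same. HTHT only looks more likely because it resembles the 50/50 split you expect in the long run.</p>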
<p><strong>Gambler's fallacy:</strong></p>
<blockquote><p>Use of the representative heuristic leads to the view that chance is a self-correcting process.</p></blockquote>
<p>The history boards at roulette tables mean nothing. They're just for show. Just because a red hasn't come up in a while doesn't mean the roulette wheel is due for a red soon. Each spin is independent of the spins that came before it.</p>
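<p>You can check this with a quick simulation. The sketch below assumes a single-zero wheel (18 red, 18 black, 1 green) and a made-up helper named <code>red_rates</code>. It compares how often red comes up overall with how often it comes up right after a run of spins with no red.</p>

```python
import random

# Assumed single-zero wheel: 18 red pockets, 19 non-red (18 black plus 1 green).
WHEEL = [True] * 18 + [False] * 19

def red_rates(n_spins=200_000, streak=5, seed=0):
    """Return (overall red rate, red rate right after `streak` non-red spins)."""
    rng = random.Random(seed)
    spins = [rng.choice(WHEEL) for _ in range(n_spins)]
    overall = sum(spins) / n_spins
    # Keep only spins that follow a drought of `streak` spins without red.
    after_drought = [
        spins[i]
        for i in range(streak, n_spins)
        if not any(spins[i - streak:i])
    ]
    conditional = sum(after_drought) / len(after_drought)
    return overall, conditional

overall, conditional = red_rates()
print(f"red overall: {overall:.3f}, red after a drought: {conditional:.3f}")
```

<p>Both rates should land near 18/37, or about 0.486. The drought doesn't make red any more likely on the next spin.</p>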
<p><strong>Base-rate fallacy:</strong></p>
<blockquote><p>People ignore the relative sizes of population subgroups when judging the likelihood of contingent<br />
events involving the subgroups.</p></blockquote>
<p>You have to consider the base population for comparison. Maybe a company is composed of 80 percent men and 20 percent women. If your base is the US population, you might consider that an inequality, but what if the applicant breakdown was 90 percent men and 10 percent women? In the latter case, a higher percentage of female applicants than male applicants was actually hired.</p>
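<p>Here is the arithmetic with hypothetical counts. The numbers (1,000 applicants, 100 hires) are assumptions picked only to match the percentages above, and the sketch treats the company's 80/20 makeup as the split among people hired.</p>

```python
from fractions import Fraction

# Hypothetical counts chosen only to match the percentages in the example.
applicants = {"men": 900, "women": 100}  # 90/10 applicant split
hired = {"men": 80, "women": 20}         # 80/20 split among those hired

# Hire rate per group: people hired divided by people who applied.
hire_rate = {group: Fraction(hired[group], applicants[group]) for group in applicants}

for group, rate in hire_rate.items():
    print(f"{group}: {float(rate):.1%} of applicants hired")
```

<p>Under these assumptions, about 9 percent of the men who applied were hired, compared to 20 percent of the women.</p>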
<p><strong>Availability:</strong></p>
<blockquote><p>Strength of association is used as a basis for judging how likely an event will occur.</p></blockquote>
<p>Just because some percentage of your friends are designers doesn't mean that the same percentage of people are designers elsewhere (obviously). Or the example that Garfield uses: a ten percent divorce rate among people you know isn't necessarily the same nationwide or globally.</p>
<p><strong>Conjunction fallacy:</strong></p>
<blockquote><p>The conjunction of two correlated events is judged to be more likely than either of the events themselves.</p></blockquote>
<p>The <a href="http://en.wikipedia.org/wiki/Conjunction_fallacy">common example</a> from Tversky and Kahneman: </p>
<p>"Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations." A group of people were asked if it was more probable that Linda was a bank teller or a bank teller active in the feminist movement (a sign of the times this poll was taken). </p>
<p>Eighty-five percent of respondents chose the latter, but the probability of two things happening together is always less than or equal to the probability of either one happening on its own.</p>
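<p>A small counting sketch shows why. The population and the rates below are made up, and the two traits are drawn independently for simplicity. The inequality holds whether or not the traits are correlated, because everyone counted as a feminist bank teller is also counted as a bank teller.</p>

```python
import random

def count_tellers(n_people=10_000, seed=1):
    """Count bank tellers, and tellers who are also feminists, in a made-up population."""
    rng = random.Random(seed)
    tellers = both = 0
    for _ in range(n_people):
        is_teller = rng.random() > 0.95    # assumed 5% rate, purely illustrative
        is_feminist = rng.random() > 0.70  # assumed 30% rate, purely illustrative
        if is_teller:
            tellers += 1
            # The "both" group is drawn from inside the teller group.
            if is_feminist:
                both += 1
    return tellers, both

tellers, both = count_tellers()
print(f"bank tellers: {tellers}, feminist bank tellers: {both}")
```

<p>However you set the rates, the second count can never be larger than the first.</p>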
<p>Notice that there's still not much math involved in these examples. It's logic that plays into <a href="http://flowingdata.com/2010/03/04/think-like-a-statistician-without-the-math/">thinking like a statistician without the math</a> (with statistical foundations). You can get a lot done just by thinking critically about your data.</p>
<h4>Related</h4><p><ul>
<li><a href='http://flowingdata.com/2010/06/09/strata-of-common-and-not-so-common-colors/' rel='bookmark' title='Strata of common and not so common colors'>Strata of common and not so common colors</a></li>
<li><a href='http://flowingdata.com/2011/08/16/the-sexperience-1000-shows-a-statistical-view-of-what-goes-on-in-the-bedroom/' rel='bookmark' title='The Sexperience 1000 shows a (statistical) view of what goes on in the bedroom'>The Sexperience 1000 shows a (statistical) view of what goes on in the bedroom</a></li>
<li><a href='http://flowingdata.com/2011/06/20/most-common-iphone-passcodes/' rel='bookmark' title='Most common iPhone passcodes'>Most common iPhone passcodes</a></li>
</ul></p>]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/05/03/common-statistical-fallacies/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Hans Rosling makes Time 100 Most Influential</title>
		<link>http://flowingdata.com/2012/04/18/hans-rosling-makes-time-100-most-influential/</link>
		<comments>http://flowingdata.com/2012/04/18/hans-rosling-makes-time-100-most-influential/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 19:04:20 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=23332</guid>
		<description><![CDATA[It was bound to happen at some point. Doctor and statistician Hans Rosling, best known for his sword-swallowing TED talk, &#8230;]]></description>
			<content:encoded><![CDATA[<p><img src="http://flowingdata.com/wp-content/uploads/2011/03/Hans-Rosling-210x112.png" alt="" title="Hans Rosling" width="210" height="112" class="alignright size-thumbnail wp-image-15652" />It was bound to happen at some point. Doctor and statistician Hans Rosling, best known for his sword-swallowing TED talk, among plenty of <a href="http://flowingdata.com/index.php?s=hans+rosling">other things</a>, made the <a href="http://www.time.com/time/specials/packages/article/0,28804,2111975_2111976_2112170,00.html">Time 100 Most Influential list</a> this year.</p>
<blockquote><p>What does Rosling make of his statistical analysis of worldwide trends? "I am not an optimist," he says. "I'm a very serious possibilist. It's a new category where we take emotion apart and we just work analytically with the world." We can all, Rosling thinks, become healthy and wealthy. What a promising thought, so eloquently rendered with data.</p></blockquote>
<p>[Thanks, wife]</p>
<h4>Related</h4><p><ul>
<li><a href='http://flowingdata.com/2010/11/30/the-joy-of-stats-with-hans-rosling/' rel='bookmark' title='The Joy of Stats with Hans Rosling'>The Joy of Stats with Hans Rosling</a></li>
<li><a href='http://flowingdata.com/2007/07/06/hans-rosling-providing-data-inspiring-change/' rel='bookmark' title='Hans Rosling: Providing Data, Inspiring Change'>Hans Rosling: Providing Data, Inspiring Change</a></li>
<li><a href='http://flowingdata.com/2010/07/13/gapminder-makes-its-way-to-the-desktop/' rel='bookmark' title='Gapminder makes its way to the desktop'>Gapminder makes its way to the desktop</a></li>
</ul></p>]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/04/18/hans-rosling-makes-time-100-most-influential/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why $1m Netflix algorithm never went to production</title>
		<link>http://flowingdata.com/2012/04/17/why-1m-netflix-algorithm-never-went-to-production/</link>
		<comments>http://flowingdata.com/2012/04/17/why-1m-netflix-algorithm-never-went-to-production/#comments</comments>
		<pubDate>Tue, 17 Apr 2012 07:04:10 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[netflix]]></category>
		<category><![CDATA[recommendations]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=23269</guid>
		<description><![CDATA[Five and a half years ago, Netflix offered data and a $1 million prize to improve their recommendation system by &#8230;]]></description>
			<content:encoded><![CDATA[<p>Five and a half years ago, Netflix offered data and a $1 million prize to improve their recommendation system by at least ten percent. In 2009, a statistics team at AT&T Labs, <a href="http://www2.research.att.com/~volinsky/netflix/">BellKor</a>, did that. Unfortunately, <a href="http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html">Netflix never integrated the algorithm into production</a>.</p>
<blockquote><p>If you followed the Prize competition, you might be wondering what happened with the final <a href="http://www.netflixprize.com//prize?id=1">Grand Prize ensemble</a> that won the $1M two years later. This is a truly impressive compilation and culmination of years of work, blending hundreds of predictive models to finally cross the finish line. We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment. Also, our focus on improving Netflix personalization had shifted to the next level by then.</p></blockquote>
<p>That's too bad. Netflix knows their business better than anyone, but I sure wish <em>Keeping Up with the Kardashians</em> wasn't listed in my top 10 right now.</p>
<p>[via <a href="http://www.techdirt.com/blog/innovation/articles/20120409/03412518422/why-netflix-never-implemented-algorithm-that-won-netflix-1-million-challenge.shtml">Techdirt</a>]</p>
<h4>Related</h4><p><ul>
<li><a href='http://flowingdata.com/2007/12/11/netflix-prize-dataset-visualization/' rel='bookmark' title='Netflix Prize Dataset Visualization'>Netflix Prize Dataset Visualization</a></li>
<li><a href='http://flowingdata.com/2011/07/15/netflix-favorites-by-location/' rel='bookmark' title='Netflix favorites by location'>Netflix favorites by location</a></li>
<li><a href='http://flowingdata.com/2010/01/11/the-geography-of-netflix-rentals/' rel='bookmark' title='The Geography of Netflix Rentals'>The Geography of Netflix Rentals</a></li>
</ul></p>]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/04/17/why-1m-netflix-algorithm-never-went-to-production/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Accidental Statistician</title>
		<link>http://flowingdata.com/2012/04/06/the-accidental-statistician/</link>
		<comments>http://flowingdata.com/2012/04/06/the-accidental-statistician/#comments</comments>
		<pubDate>Fri, 06 Apr 2012 07:09:53 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Statistics]]></category>
		<category><![CDATA[George Box]]></category>
		<category><![CDATA[John Tukey]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=23116</guid>
		<description><![CDATA[George E.P. Box, a statistician known for his body of work in time series analysis and Bayesian inference (and his &#8230;]]></description>
			<content:encoded><![CDATA[<p>George E.P. Box, a statistician known for his body of work in time series analysis and Bayesian inference (and his <a href="http://flowingdata.com/2012/03/22/incorrect/">quotes</a>), <a href="http://www.stat.wisc.edu/~yandell/stat/50-year/Box_George.html">recounts how he became a statistician while trying to solve actual problems</a>. He was a 19-year-old college student studying chemistry. Instead of finishing, he joined the army, fed up with what the British government was doing to stop Hitler.</p>
<blockquote><p>Before I could actually do any of that I was moved to a highly secret experimental station in the south of England. At the time they were bombing London every night and our job was to help to find out what to do if, one night, they used poisonous gas.</p>
<p>Some of England's best scientists were there. There were a lot of experiments with small animals, I was a lab assistant making biochemical determinations, my boss was a professor of physiology dressed up as a colonel, and I was dressed up as a staff sergeant.</p>
<p>The results I was getting were very variable and I told my colonel that what we really needed was a statistician.</p>
<p>He said "we can't get one, what do you know about it?" I said "Nothing, I once tried to read a book about it by someone called R. A. Fisher but I didn't understand it". He said "You've read the book so you better do it", so I said, "Yes sir".</p></blockquote>
<p>Box eventually worked with Fisher, studied under E. S. Pearson in college after his discharge from the army, and started the Statistical Techniques Research Group at Princeton at the insistence of one <a href="http://flowingdata.com/2008/01/01/john-tukey-and-the-beginning-of-interactive-graphics/">John Tukey</a>.</p>
<h4>Related</h4><p><ul>
<li><a href='http://flowingdata.com/2010/03/04/think-like-a-statistician-without-the-math/' rel='bookmark' title='Think like a statistician &#8211; without the math'>Think like a statistician &#8211; without the math</a></li>
<li><a href='http://flowingdata.com/2011/02/07/statistician-cracks-the-scratch-lottery-code/' rel='bookmark' title='Statistician cracks the scratch lottery code'>Statistician cracks the scratch lottery code</a></li>
<li><a href='http://flowingdata.com/2012/03/06/thomas-the-train-and-friends-accidental-chart/' rel='bookmark' title='Thomas the Tank Engine and Friends, accidental chart'>Thomas the Tank Engine and Friends, accidental chart</a></li>
</ul></p>]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/04/06/the-accidental-statistician/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

