**Tags:** R

Extra point for incorporating The Simpsons:

Heat maps have become a popular way to label a surface representation of data that occurs at discrete points. On one hand, the search for a better way of showing point-based data that avoids death by push-pin is a sound cartographic approach. Imagine simply looking at a map of points and trying to make sense of the patterns. Chief Clarence 'Clancy' Wiggum would certainly struggle to make sense of the pattern of crime in Springfield just from coloured dots.

**Tags:** heatmap

**Tags:** Jeopardy

In the football video game Madden, NFL players are scored based on skill, which determines how they play in the game. Neil Paine, with graphics by Reuben Fischer-Baum, describes more than you ever thought you wanted to know about the scoring process in an in-depth look for FiveThirtyEight. At the heart of the process is Donny Moore, who is in charge of most of the (subjective) number assignments.

In that role, Moore is tasked with assigning more than 40 numerical grades to each of the NFL's roughly 2,600 players, evaluating them in categories ranging from passing accuracy to tackling ability. Moore's process has largely been a black box, and yet it shapes how more than 5 million gamers simulate pro football — particularly because there's no official alternative to his numbers. A decade after signing a controversial exclusivity deal with the league and the players union, Madden is still the only licensed NFL game in town.

The graphic above shows the weights for each skill (rows) and position (columns). For example, offensive line positions place a lot of weight on pass block skills.
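To make the weights idea concrete, here's a minimal sketch of how a position-specific weighted rating could be computed. The weights, skill names, and the `overall_rating` function are all made up for illustration; they are not Madden's actual numbers or formula.

```python
# Hypothetical weights per position (columns) and skill (rows).
# These values are invented for illustration only.
WEIGHTS = {
    "offensive_line": {"pass_block": 0.5, "run_block": 0.4, "speed": 0.1},
    "quarterback": {"throw_accuracy": 0.6, "throw_power": 0.3, "speed": 0.1},
}


def overall_rating(position, skills):
    """Weighted average of a player's skill grades for one position."""
    weights = WEIGHTS[position]
    total_weight = sum(weights.values())
    weighted_sum = sum(w * skills[name] for name, w in weights.items())
    return weighted_sum / total_weight


# A made-up lineman: strong pass blocker, slow runner.
lineman = {"pass_block": 90, "run_block": 80, "speed": 50}
rating = overall_rating("offensive_line", lineman)
```

With these invented weights, pass blocking dominates the lineman's overall score and speed barely matters, which is the kind of pattern the graphic shows.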

Towards the end, there's a fun interactive that lets you see how you would stack up against players in-game.

There's no shame in being a punter.

Parse tables into data frames, navigate around a website, and of course, extract bits from a page. I'll stick to BeautifulSoup, but I'm saving this for later. I'm sure it'll come in handy sooner rather than later.

**Tags:** R

From the original article, a fine use of quotation marks:

All this powerful scenario-testing machinery right there on the desktop induces some people to experiment with elaborate models. They talk of "playing" with the numbers, "massaging" the model. Computer "hackers" lose themselves in the intricacies of programming; spreadsheet hackers lose themselves in the world of what-if.

Sound familiar? Have a listen below. It's not nearly as boring as it sounds.

**Tags:** Excel, Planet Money, spreadsheet

Software engineer Chris Beaumont visualized the strength of opponent hands in Texas hold 'em, given any other hand. This is based on counting about 1.3 trillion possible combinations.

Simply enter a card combination, and the grid shows the win-loss percentage differences for all possible opponent hands. Each pair of card values (e.g. a 2 and a King) is represented as a four-by-four grid to show each suit (e.g. heart or spade). The grid above shows the strength of an opponent's hand given that you hold a four of hearts and a queen of spades. Red indicates higher chances of you losing and blue indicates higher chances of you winning.

See also the grids for hand frequency and average hand strength.

Michael Beuoy's win probability model plotted on FiveThirtyEight starts all NBA teams at a 50% chance of winning. Then the probability of winning a game increases and decreases from there. However, practically speaking, we know something about the teams before each game, and we don't give even chances to the worst and best team at the zero-minute mark.

So Todd Schneider took a different approach to minute-by-minute win probability — from a gambling perspective. Each line in the time series starts closer to the end probability as gamblers wager based on what they think the final outcome will be.

I like the plot that shows gamblers' expectations against actual win percentage.

The Atlanta Hawks currently lead the league in "wins above gamblers' expectations", with an actual winning percentage of 78.2% compared to an expected winning rate of 60.6%. The Milwaukee Bucks and Memphis Grizzlies have also both performed significantly above gamblers' expectations. The lowly New York Knicks, in addition to having the worst absolute record in the league, are performing the worst relative to gamblers' expectations. The Knicks have been expected to win 32.2% of their games, and yet have only managed to win 19.1%.

The takeaway: Yes, you expect the Knicks to play poorly, but they go beyond your expectations and play even worse than you imagined.
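For a quick sense of the gaps, here's a small sketch that takes the quoted percentages and subtracts expected from actual. It assumes "wins above gamblers' expectations" is just that difference in percentage points, which may not be exactly how Schneider computes it. The function name is mine.

```python
# (actual win %, expected win %) as quoted above.
teams = {
    "Atlanta Hawks": (78.2, 60.6),
    "New York Knicks": (19.1, 32.2),
}


def wins_above_expectation(actual_pct, expected_pct):
    """Difference in percentage points (assumed definition)."""
    return round(actual_pct - expected_pct, 1)


diffs = {
    team: wins_above_expectation(actual, expected)
    for team, (actual, expected) in teams.items()
}

for team, diff in diffs.items():
    print(f"{team}: {diff:+.1f} points")
```

Under that assumption, the Hawks come out about 17.6 points ahead of expectations and the Knicks about 13.1 points behind.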

See the probabilities for professional football and baseball too. [Thanks, Todd]

**Tags:** basketball, FiveThirtyEight

Michael Beuoy made a win probability model for NBA teams and games, based on play-by-play data from 2000 to 2012. The basic calculator lets you punch in the game state, such as time left and the score difference, and it spits out the probability of a win.

Or, for a team-centric view, you can see the chart from Beuoy and Allison McCann for FiveThirtyEight, which plots the average probability using the same model. A steady rise means a steady pull towards a win, whereas spikes and steeper, positive slopes mean a tendency towards scoring spurts.

**Tags:** basketball, FiveThirtyEight

Jonathan Dushoff had issues with students in his population biology class cheating on his exams. One year there was suspicious behavior, but Dushoff and the proctors weren't able to prove the students cheated as it happened. So he looked closely at the test results to find the guilty students.

The final is entirely multiple choice. I got the results files from the scantron office. I figured that I wouldn't quite know what to do with a comparison just between these two kids (unless the tests were identical), and that it would be just about as easy (and far more informative) to compare everybody to everybody else. It's still kind of hard for me to get used to the fact that we have computers now and can really do stuff like this. I calculated the number of identical right answers and the number of identical wrong answers for each pair of students (~18K pairs), and plotted it out.

The diagonal line indicates two students who had the exact same wrong and right answers. No pair of students did this, but there were four outlying pairs that got close, shown in red. And looking back at the seating arrangements, in a class of 200 students, all four pairs were students who sat adjacent to each other.
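The everybody-versus-everybody comparison is simple to sketch. The version below is not Dushoff's code. It's a minimal illustration that assumes each student's answers and the answer key are strings of the same length, and the student IDs and data are made up.

```python
from itertools import combinations


def pairwise_matches(answers, key):
    """For each pair of students, count identical right and identical wrong answers.

    answers: dict mapping a student ID to a string of responses.
    key: string of correct responses, same length as each answer string.
    Returns a dict mapping (id_a, id_b) to (same_right, same_wrong).
    """
    results = {}
    for (a, resp_a), (b, resp_b) in combinations(sorted(answers.items()), 2):
        same_right = 0
        same_wrong = 0
        for x, y, correct in zip(resp_a, resp_b, key):
            if x == y:
                if x == correct:
                    same_right += 1
                else:
                    same_wrong += 1
        results[(a, b)] = (same_right, same_wrong)
    return results


# Made-up five-question exam with three students.
key = "ABCDA"
answers = {
    "s1": "ABCDA",
    "s2": "ABCCB",
    "s3": "ABDCB",
}
pairs = pairwise_matches(answers, key)
```

With n students there are n(n-1)/2 pairs to check, which is trivial for a computer at class size. Plotting identical wrong answers against identical right answers for every pair then gives a scatterplot like the one described, where pairs sharing many wrong answers stand out.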
