Parse tables into data frames, navigate around a website, and of course, extract bits from a page. I'll stick to BeautifulSoup, but I'm saving this for later. I'm sure it'll come in handy sooner rather than later.

**Tags:** R

From the original article, a fine use of quotation marks:

All this powerful scenario-testing machinery right there on the desktop induces some people to experiment with elaborate models. They talk of "playing" with the numbers, "massaging" the model. Computer "hackers" lose themselves in the intricacies of programming; spreadsheet hackers lose themselves in the world of what-if.

Sound familiar? Have a listen below. It's not nearly as boring as it sounds.

**Tags:** Excel, Planet Money, spreadsheet

Software engineer Chris Beaumont visualized the strength of opponent hands in Texas hold 'em, given any other hand. This is based on counting about 1.3 trillion possible combinations.

Simply enter a card combination, and the grid shows the win-loss percentage differences for all possible opponent hands. Each pair of card values (e.g. a 2 and a King) is represented as a four-by-four grid to show each suit combination (e.g. heart or spade). The grid above shows the strength of each possible opponent hand given that you hold a four of hearts and a queen of spades. Red indicates a higher chance of you losing, and blue indicates a higher chance of you winning.

See also the grids for hand frequency and average hand strength.
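A figure on the order of 1.3 trillion can be sanity-checked with a few binomial coefficients. The sketch below is only one plausible way to count, and the definition of a "combination" here (two unordered two-card hands plus a five-card board) is an assumption, not necessarily how Beaumont counted.

```python
from math import comb

# Assumption: a "combination" is one unordered pair of two-card hands
# plus one five-card board, all drawn from a single 52-card deck.
first_hand = comb(52, 2)    # two hole cards for one player
second_hand = comb(50, 2)   # two hole cards for the other player
board = comb(48, 5)         # five community cards from the remaining 48

# Divide by 2 because swapping the two players gives the same matchup.
total_matchups = first_hand * second_hand * board // 2

print(f"{total_matchups:,}")
```

Under these assumptions the total lands a little under 1.4 trillion, which is in the same range as the number quoted above.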

Michael Beuoy's win probability model plotted on FiveThirtyEight starts all NBA teams at a 50% chance of winning. Then the probability of winning a game increases and decreases from there. However, practically speaking, we know something about the teams before each game, and we don't give even chances to the worst and best team at the zero-minute mark.

So Todd Schneider took a different approach to minute-by-minute win probability — from a gambling perspective. Each line in the time series starts closer to the end probability as gamblers wager based on what they think the final outcome will be.

I like the plot that shows gamblers' expectations against actual win percentage.

The Atlanta Hawks currently lead the league in "wins above gamblers' expectations", with an actual winning percentage of 78.2% compared to an expected winning rate of 60.6%. The Milwaukee Bucks and Memphis Grizzlies have also both performed significantly above gamblers' expectations. The lowly New York Knicks, in addition to having the worst absolute record in the league, are performing the worst relative to gamblers' expectations. The Knicks have been expected to win 32.2% of their games, and yet have only managed to win 19.1%.

The takeaway: Yes, you expect the Knicks to play poorly, but they go beyond your expectations and play even worse than you imagined.
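To make the size of the gap concrete, here is a small sketch using only the percentages quoted above. Treating "wins above gamblers' expectations" as a plain difference in percentage points is an assumption; Schneider's own calculation may be defined differently, and the function name is made up for illustration.

```python
# Figures quoted above: (actual win %, gamblers' expected win %).
teams = {
    "Atlanta Hawks": (78.2, 60.6),
    "New York Knicks": (19.1, 32.2),
}


def wins_above_expectation(actual_pct, expected_pct):
    """Hypothetical helper: gap in percentage points, rounded to one decimal."""
    return round(actual_pct - expected_pct, 1)


gaps = {
    name: wins_above_expectation(actual, expected)
    for name, (actual, expected) in teams.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.1f} percentage points")
```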

See the probabilities for professional football and baseball too. [Thanks, Todd]

**Tags:** basketball, FiveThirtyEight

Michael Beuoy made a win probability model for NBA teams and games, based on play-by-play data from 2000 to 2012. The basic calculator lets you punch in the game state, such as time left and the score difference, and it spits out the probability of a win.

Or, for a team-centric view, you can see the chart from Beuoy and Allison McCann for FiveThirtyEight, which plots the average probability using the same model. A steady rise means a steady pull towards a win, whereas spikes and steeper, positive slopes mean a tendency towards scoring spurts.

**Tags:** basketball, FiveThirtyEight

Jonathan Dushoff had issues with students in his population biology class cheating on his exams. One year there was suspicious behavior, but Dushoff and the proctors weren't able to prove the students cheated as it happened. So he looked closely at the test results to find the guilty students.

The final is entirely multiple choice. I got the results files from the scantron office. I figured that I wouldn't quite know what to do with a comparison just between these two kids (unless the tests were identical), and that it would be just about as easy (and far more informative) to compare everybody to everybody else. It's still kind of hard for me to get used to the fact that we have computers now and can really do stuff like this. I calculated the number of identical right answers and the number of identical wrong answers for each pair of students (~18K pairs), and plotted it out.

The diagonal line indicates two students who had the exact same wrong and right answers. No pair of students did this, but there were four outlying pairs that got close, shown in red. And looking back at the seating arrangements, in a class of 200 students, all four pairs were students who sat adjacent to each other.
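The everybody-against-everybody comparison is easy to sketch. The version below is a minimal illustration, not Dushoff's actual code: the function names, the string-per-student data layout, and the toy answers are all made up, and blank answers are not handled.

```python
from itertools import combinations


def shared_answers(answers_a, answers_b, key):
    """Return (identical right answers, identical wrong answers) for one pair."""
    same_right = same_wrong = 0
    for a, b, correct in zip(answers_a, answers_b, key):
        if a != b:
            continue  # only identical answers are counted
        if a == correct:
            same_right += 1
        else:
            same_wrong += 1
    return same_right, same_wrong


def compare_everyone(responses, key):
    """Compare every pair of students; the keys are (student, student) tuples."""
    return {
        (s1, s2): shared_answers(responses[s1], responses[s2], key)
        for s1, s2 in combinations(sorted(responses), 2)
    }


# Made-up toy data: one answer letter per question.
answer_key = "ABCDA"
responses = {
    "s1": "ABCDB",
    "s2": "ABCDB",
    "s3": "ACCDA",
}

pair_counts = compare_everyone(responses, answer_key)
for pair, (right, wrong) in pair_counts.items():
    print(pair, "same right:", right, "same wrong:", wrong)
```

Plotting identical wrong answers against identical right answers for every pair would give a scatterplot like the one described, where pairs with unusually many shared wrong answers stand out.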

Bayes' theorem is covered in introductory statistics and probability courses, but I think a lot of people starting out don't understand it conceptually. They see a formula that you plug numbers into. Here's an example using LEGO bricks that hopefully clears up the confusion.
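For a feel of how counting bricks maps onto the formula, here is a small sketch. The stud counts are made up for illustration and are not taken from the linked example: picture a 60-stud area with some red studs, plus a yellow brick stacked on top that covers a few of them.

```python
from fractions import Fraction

# Made-up stud counts (assumptions, not the numbers from the LEGO example).
total_studs = 60
red_studs = 20       # studs that belong to red bricks
yellow_studs = 6     # studs covered by a yellow brick on top
yellow_on_red = 4    # yellow-covered studs that sit over red

p_red = Fraction(red_studs, total_studs)
p_yellow = Fraction(yellow_studs, total_studs)
p_yellow_given_red = Fraction(yellow_on_red, red_studs)

# Bayes' theorem: P(red | yellow) = P(yellow | red) * P(red) / P(yellow)
p_red_given_yellow = p_yellow_given_red * p_red / p_yellow

# Counting directly: of the studs under yellow, what share is red?
direct_count = Fraction(yellow_on_red, yellow_studs)

print(p_red_given_yellow, direct_count)
```

Both routes give the same fraction, which is the point: the formula is just bookkeeping for counting one region inside another.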

Here's Patil, with an introduction by President Barack Obama, on what's in store and a recruitment note for the U.S. Digital Service.

**Tags:** data science, government

**Tags:** photo, remembering

Artist Alberto Frigo took a picture of every object he used with his right hand for the past 11 years. Averaging 76 photos per day, the project — *Images of the artifact used by the main hand* — is low-tech, with just a small, hand-held camera. No internet connection, tagging, or documentation. Just a stream of photos.

Frigo aims to do this until age 60, so he has only 25 more years to go. Yep.

**Tags:** photo, remembering

Quoctrung Bui for Planet Money plotted the average income of the top one percent of earners against the average income of the bottom 90 percent, from 1920 to 2012. Through the 1970s, the animation shows income rising for the bottom group while staying relatively flat for the top; after that, the pattern reverses.

Of course, now all I want to see is everything in between: the distribution of earnings within these two groups and for the group between the bottom 90 percent and the top one percent. Good thing you can download some of that data yourself from the World Top Incomes Database.
