R site updated

March 5, 2015

Topic

Software / R

The R site has a new face. It looked dated for years — maybe decades — so I’m glad it got a refresh (with further updates in the coming months I assume). I felt like the old look was such a turn off that I had to reassure newcomers that even though the site looks like crap, the computing language itself is in fact not crap.
Read More

How to Make and Use Bar Charts in R

The chart type seems simple enough, but there sure are a lot of bad ones out there. Get yourself out of default mode.

Probably not a heat map

March 4, 2015

Topic

Maps / heatmap

A heat map is a grid of numbers colored by value. I wrote a quick tutorial on how to make the now common statistical visualization. But at some point in the past few years, a heat map came to mean a geographic map with stuff on it. Cartographer Kenneth Field explains what these maps with stuff on it actually are and provides you with the “more established, more accurate and perfectly good terms.”

Extra point for incorporating The Simpsons:

Heat maps have become a popularist way to label a surface representation of data that occurs at discrete points. On one hand the search for a better way of showing point based data which avoids death by push-pin is a sound cartographic approach. Imagine simply looking at a map of points and trying to make sense of the patterns. Chief Clarence ‘Clancy’ Wiggum would certainly struggle to make sense of the pattern of crime in Springfield just from coloured dots.

Data Underload / Jeopardy

Where to Find Jeopardy! Daily Doubles

Placement of Daily Double clues, from season 1 to 31. Watch them play out.

Madden ratings formula

March 2, 2015

Topic

Statistics / gaming, Madden, sports

In football video game Madden, NFL players are scored based on skill, which determines how they play in the game. Neil Paine, with graphics by Reuben Fischer-Baum, describes more than you ever thought you wanted to know about the scoring process in an in-depth look for FiveThirtyEight. At the heart of the process is Donny Moore, who is in charged of most of the (subjective) number assignments.
Read More

rvest: R package to scrape web data

March 2, 2015

Topic

Software / R

Inspired by the Python libraries RoboBrowser and BeautifulSoup, the rvest package by Hadley Wickham helps you scrape web data via R in a similar way.

Parse tables into data frames, navigate around a website, and of course, extract bits from a page. I’ll stick to BeautifulSoup, but I’m saving this for later. I’m sure it’ll come in handy sooner rather than later.

Spreadsheets for life

February 27, 2015

Topic

Software / Excel, Planet Money, spreadsheet

Planet Money goes back to a 1984 article by Steven Levy that discusses this new thing called a spreadsheet. It was taking the place of the paper version that accountants manually edited, added to, and taped together.
Read More

Texas hold ’em win probabilities

February 27, 2015

Topic

Statistical Visualization / gambling, poker

Software engineer Chris Beaumont visualized the strength of opponent hands in Texas hold ’em, given any other hand. This is based on counting about 1.3 trillion possible combinations.
Read More

Gambler’s perspective on sports team win probabilities

February 26, 2015

Topic

Statistical Visualization / gambling, sports

Michael Beuoy’s win probability model plotted on FiveThirtyEight starts all NBA teams at a 50% chance of winning. Then the probability of winning a game increases and decreases from there. However, practically speaking, we know something about the teams before each game, and we don’t give even chances to the worst and best team at the zero-minute mark.

So Todd Schneider took a different approach to minute-by-minute win probability — from a gambling perspective. Each line in the time series starts closer to the end probability as gamblers wager based on what they think the final outcome will be.
Read More

Every NBA team’s chances of winning, by game minute

February 26, 2015

Topic

Statistical Visualization / basketball, FiveThirtyEight

Michael Beuoy made a win probability model for NBA teams and games, based on play-by-play data from 2000 to 2012. The basic calculator lets you punch in the game state, such as time left and the score difference, and it spits out the probability of a win.

Or, for a team-centric view, you can see the chart from Beuoy and Allison McCann for FiveThirtyEight, which plots the average probability using the same model. Steady rise means a steady pull towards a win, whereas spikes and steeper, positive slopes mean a tendency towards scoring spurts.

Identifying cheaters in test results, a simple method

February 25, 2015

Topic

Statistical Visualization / cheating, education

Jonathan Dushoff had issues with students in his population biology class cheating on his exams. One year there was suspicious behavior, but Dushoff and the proctors weren’t able to prove the students cheated as it happened. So he looked closely at the test results to find the guilty students.
Read More

Bayes’ theorem explained with LEGO bricks

February 24, 2015

Topic

Visualization / Bayes, learning, LEGO

Bayes’ theorem is covered in introduction to statistics and probability courses, but I think a lot of people starting out don’t understand it conceptually. They see a formula that you plug numbers into. Here’s an example using LEGO bricks that clarifies the confusion, hopefully.

White House appoints first US Chief Data Scientist

February 23, 2015

Topic

News / data science, government

Did you hear the news? The White House officially appointed DJ Patil as the federal government’s first ever Chief Data Scientist. Awesome.

Here’s Patil, with an introduction by President Barack Obama, on what’s in store and a recruitment note for the US Digital Services.
Read More

A photo of everything touched, for 11 years

February 23, 2015

Topic

Self-surveillance / photo, remembering

Artist Alberto Frigo took a picture of every object he used with his right hand for the past 11 years. Averaging 76 photos per day, the project — Images of the artifact used by the main hand — is low-tech, with just a small, hand-held camera. No internet connection, tagging, or documentation. Just a stream of photos.

Frigo aims to do this until age 60, so he has only 25 more years to go. Yep.

Top 1% earners versus bottom 90%

February 20, 2015

Topic

Statistical Visualization / economics, income

Quoctrung Bui for Planet Money plotted average income for the top one percent of earners against the average income of the bottom 90%, from 1920 to 2012. Through the 1970s, the animation shows rising income for the bottom and relatively static for the top and then vice versa after that.
Read More

Making an Interactive Map with Category Filters

Let readers focus on the regions they care about to make their own comparisons and conclusions.

Impact of vaccines throughout history

February 19, 2015

Topic

Infographics / vaccination, Wall Street Journal

Not that anyone who does not vaccinate their kids cares, but Tynan DeBold and Dov Friedman for the Wall Street Journal show the change in number of cases for various diseases after a vaccination is introduced.
Read More

Loading Data and Basic Formatting in R

It might not be sexy, but you have to load your data and get it in the right format before you can visualize it. Here are the basics, which might be all you need.

Automated Tinder and the Eigenface

February 18, 2015

Topic

Statistics / dating, Eigenface, Tinder

Because using Tinder takes up oh so much time swiping, swiping, and swiping, Justin Long made a bot that swipes and starts conversations for him. Step 1: Use his existing preferences to create two Eigenfaces, commonly used in face recognition, that represent a yes and a no. Step 2: Automate everything else with the Tinder API.
Read More

Seagull skytrails

February 17, 2015

Topic

Visualization / After Effects, compositing, video

Watch one bird fly around, and it’s hard to make out its flight pattern. Time shift multiple copies of that bird, creating an echo effect, and it’s easy. Parker Paul did this with seagulls flapping around at the beach and After Effects.
Read More