Twitter bot generates biographies via Census data

Posted to Statistics  |  Nathan Yau

We usually see Census data in aggregate. It comes in choropleth maps or as statistics about various subpopulations and geographies. Is there value in seeing the numbers as individuals? What about the people behind the numbers? FiveThirtyEight intern Jia Zhang experiments on Twitter.

[I] built a Twitter bot that mines for details in the data. Called censusAmericans, it tweets short biographies of Americans based on data they provided to the U.S. Census Bureau between 2009 and 2013. Using a small Python program, the bot reconstitutes numbers and codes from the data into mini-narratives. Once an hour, it turns a row of data into a real person.

Here are a couple of examples:

Fairly straightforward but an interesting exercise. I have a hunch someone is going to expand on this idea soon enough.

In case you’re interested, I’m guessing Zhang used the Public Use Microdata Sample (PUMS) from the Census Bureau, which is a granular dataset based on responses to the American Community Survey. Or maybe I’m thinking about it too hard. It would also be possible to simply create “estimated” individuals with the aggregate data. Either way, this is fun. I want to see more things like this, please.
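For the curious, here is a rough sketch of how a single record might be turned into a mini-narrative. This is not Zhang's actual code. The column names (AGEP, SEX, MAR, WKHP) are modeled on PUMS variables, the code-to-phrase tables are my own illustrative guesses that should be checked against the PUMS data dictionary, and the sample row is made up.

```python
# Illustrative lookup tables (assumptions): verify codes against the PUMS data dictionary.
SEX = {"1": "man", "2": "woman"}
MARITAL = {
    "1": "am married",
    "2": "am widowed",
    "3": "am divorced",
    "4": "am separated",
    "5": "have never been married",
}


def row_to_bio(row):
    """Turn one record (a dict of string values) into a short first-person bio.

    Fields that are missing or unrecognized are skipped rather than guessed.
    """
    parts = []

    age = row.get("AGEP", "")
    sex = SEX.get(row.get("SEX", ""))
    if age.isdigit() and sex:
        parts.append(f"I am a {sex}, {int(age)} years old.")

    marital = MARITAL.get(row.get("MAR", ""))
    if marital:
        parts.append(f"I {marital}.")

    hours = row.get("WKHP", "")
    if hours.isdigit() and int(hours) > 0:
        parts.append(f"I work {int(hours)} hours a week.")

    return " ".join(parts)


# A made-up record, not real Census data.
sample_row = {"AGEP": "34", "SEX": "2", "MAR": "5", "WKHP": "40"}
print(row_to_bio(sample_row))
```

In practice you would read rows from a PUMS CSV file with something like `csv.DictReader` and pass each one to `row_to_bio`. Most of the real work is in writing lookup tables for the many coded fields.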
