Using Census survey data properly

Posted to Statistics  |  Nathan Yau

The American Community Survey, an ongoing survey that the Census Bureau administers to millions each year, provides detailed information about how Americans live now and how they lived decades ago. There are tons of data tables on topics such as housing situations, education, and commutes. The natural thing to do is to download the data, take it at face value, and carry on with your analysis or visualization.

However, as is usually the case with data, there’s more to it than that. Paul Overberg, a database editor at USA Today, explains in a practical guide how to get the most out of the survey data. Much of his advice generalizes to other survey results.

Journalists who use ACS a lot have a helpful slogan: “Don’t make a big deal out of small differences.” Journalists have all kinds of old-fashioned tools to deal with this kind of challenge, starting with adverbs: “about,” “nearly,” “almost,” etc. It’s also a good idea to round ACS numbers as a signal to users and to improve readability.

In tables and visualizations, the job is tougher. These introduce ranking and cutpoints, which create potential pitfalls. For tables, it’s often better to avoid rankings and instead create groups—high, middle, low. In visualizations, one workaround is to adapt high-low-close stock charts to show a number and its error margins. Interactive data can provide important details on hover or click.
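To make the "small differences" advice concrete, here is a minimal sketch of how you might check whether two ACS estimates really differ. It assumes the published margins of error are at the 90 percent confidence level (the ACS convention) and that the two estimates are independent. The function name and the numbers are made up for illustration and are not from Overberg's guide.

```python
import math

# ACS margins of error are published at the 90% confidence level,
# which corresponds to a z-value of 1.645.
Z_90 = 1.645


def difference_is_significant(est_a, moe_a, est_b, moe_b, z_crit=Z_90):
    """Return True if two estimates differ by more than sampling error
    would plausibly explain. Assumes the estimates are independent."""
    # Convert each margin of error back to a standard error.
    se_a = moe_a / Z_90
    se_b = moe_b / Z_90
    # Standard error of the difference between independent estimates.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0:
        return est_a != est_b
    return abs(est_a - est_b) / se_diff > z_crit


# Hypothetical median incomes with margins of error for three places.
small_gap = difference_is_significant(52000, 1500, 53000, 1800)
large_gap = difference_is_significant(52000, 1500, 58000, 1800)
print(small_gap, large_gap)
```

With these made-up numbers, the $1,000 gap falls inside the combined error, so ranking one place above the other would be a stretch. The $6,000 gap does not. When most pairs in a table fail this check, that is a sign to group the rows rather than rank them.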

If you do any kind of data reporting, whatever field it’s in, you should be familiar with most of what Overberg describes. If not, better get your learn on.
