Data cleaning tips

Posted to Statistics  |  Nathan Yau

When you first learn statistics, visualization, or any data-related subject, the data is usually given to you in a ready-to-use format, so that you can spend most of your time on the topic of interest. But once you step outside the learning bubble, data rarely comes in the format you want.

Marc Bellemare, an associate professor in the Department of Applied Economics at the University of Minnesota, provides some practical tips on how to deal with this. Bellemare’s parting advice:

Really, there is no big secret to cleaning data other than “Document everything” and to save everything in different files and in different locations (i.e., your computer, Dropbox, Google Drive), and there is no other way to learn data cleaning than by doing it.

Yep.

Some of the tips are in the context of a specific software environment, but you can easily apply them to more general situations.
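To make the "document everything" and "save everything in different files and locations" advice concrete outside any one tool, here is a minimal Python sketch. It is not from Bellemare's post. The file names, the `id` and `value` columns, the two cleaning rules, and the backup folder names are all made up for illustration, and the code assumes every row has a value for every column.

```python
import csv
import shutil
import tempfile
from pathlib import Path


def clean_rows(rows, log):
    """Strip whitespace and drop rows with no 'value', recording each change in log."""
    cleaned = []
    # Data rows are numbered from 1; the header row is not counted.
    for number, row in enumerate(rows, start=1):
        stripped = {key: text.strip() for key, text in row.items()}
        if stripped != row:
            log.append(f"row {number}: stripped whitespace")
        if stripped["value"] == "":
            log.append(f"row {number}: dropped, missing value")
            continue
        cleaned.append(stripped)
    return cleaned


def clean_file(raw_path, out_dir, backup_dirs):
    """Clean raw_path into new files in out_dir, then copy them to each backup dir."""
    raw_path, out_dir = Path(raw_path), Path(out_dir)
    with raw_path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)

    log = [f"source: {raw_path.name}"]
    cleaned = clean_rows(rows, log)

    # Write to new files so the raw data is never overwritten.
    clean_path = out_dir / f"{raw_path.stem}_clean.csv"
    log_path = out_dir / f"{raw_path.stem}_cleaning_log.txt"
    with clean_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(cleaned)
    log_path.write_text("\n".join(log) + "\n")

    # Keep copies in other locations (for example, a synced Dropbox or Drive folder).
    for backup in backup_dirs:
        backup = Path(backup)
        backup.mkdir(parents=True, exist_ok=True)
        for path in (raw_path, clean_path, log_path):
            shutil.copy2(path, backup / path.name)
    return cleaned, log


# Demo in a temporary directory with made-up data.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    raw = tmp / "survey.csv"
    raw.write_text("id,value\n1, 3.5 \n2,\n3,4.0\n")
    cleaned, log = clean_file(raw, tmp, [tmp / "backup_a", tmp / "backup_b"])
    backup_files = sorted(p.name for p in (tmp / "backup_a").iterdir())
```

The raw file is only ever read, the cleaned data and the log go to separate new files, and all three are copied to each backup location. In real use you would point `backup_dirs` at folders that actually live somewhere else.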
