When we first learn how to deal with data in school, it’s nicely formatted and fits perfectly into a rectangular spreadsheet. Then when we start to deal with real data, we find missing values, inconsistencies, and for some reason it doesn’t plug straight into our software. What the heck?
The caveman way to fix this problem is to open Excel and manually edit everything. Ad hoc code can often do the job too, but it still takes time and can be a pain. Google Refine, the Googley evolution of Freebase Gridworks, can help you.
Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.
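To make "cleaning up inconsistencies" concrete, here is a minimal sketch of the kind of ad hoc cleanup code you might otherwise write by hand. It is plain Python, not Refine's own code or API, and the function names (`fingerprint`, `group_variants`) and the sample values are made up for illustration. The idea is to reduce each value to a normalized key so that spelling variants of the same thing land in the same group.

```python
import re
from collections import defaultdict


def fingerprint(value):
    """Hypothetical helper: build a normalized key for a messy string.

    Lowercases, trims, drops punctuation, then sorts the unique tokens so
    that word order and stray symbols no longer matter.
    """
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def group_variants(values):
    """Group raw values that share the same normalized key."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


# Made-up sample column with inconsistent entries.
raw = ["New York", "new york ", "York, New", "Boston", "boston."]
groups = group_variants(raw)

for key, variants in groups.items():
    print(key, "<-", variants)
```

Each group can then be collapsed to a single preferred spelling. Even a small script like this takes some trial and error to get right, which is exactly the chore a tool like Refine is meant to take off your hands.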
The project is open-source, and you don’t have to upload your data to some other server. Refine runs on your desktop.
Watch the demo in the video below. Then download the application for yourself. I’m looking forward to taking it for a spin.
Have you tried it out (or its predecessor, Freebase Gridworks)? Leave your thoughts in the comments.