Extract CSV data from PDF files with Tabula

Posted to Software  |  Nathan Yau

Tabula, by Manuel Aristarán, came out months ago, but I’ve been poking at government data recently and came back to this useful piece of free software to get the data tables out of countless free-floating PDF files.

If you’ve ever tried to do anything with data provided to you in PDFs, you know how painful it is: you can’t easily copy and paste rows of data out of PDF files. Tabula lets you extract that data in CSV format through a simple interface.

It’s not the fastest software in the world, but it really is simple to use and it sure beats manual entry. You just load a PDF file into Tabula, which runs on your computer, highlight the table to extract, and the program does the rest. Save as a CSV and do what you want with it.
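Once the table is saved, the "do what you want with it" part usually starts with reading the file back in. Here is a minimal sketch in Python, using only the standard library. The sample data, the column names, and the helper names `parse_number` and `read_exported_table` are made up for illustration and are not part of Tabula. It assumes the export has a header row and that numbers may carry thousands separators.

```python
import csv
import io


def parse_number(cell):
    """Return a float for cells like '1,234' or ' 56.7 ', else the stripped text."""
    text = cell.strip()
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return text


def read_exported_table(handle):
    """Read a CSV export into a list of dicts keyed by the header row."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]
    rows = []
    for record in reader:
        if not any(cell.strip() for cell in record):
            continue  # skip blank rows, if the export contains any
        rows.append({key: parse_number(cell) for key, cell in zip(header, record)})
    return rows


# Hypothetical sample standing in for a file saved from Tabula.
sample = io.StringIO('State,Population\nOhio,"11,799,448"\n\nUtah,"3,271,616"\n')
table = read_exported_table(sample)
print(table)
```

For a real file, you would pass `open("your-export.csv", newline="")` in place of the in-memory sample. Extracted tables often need a bit of cleanup like this before they are ready for charts or analysis.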

Download Tabula here. Find out a little more about it on Source.
