Jeremy B. Merrill is working on the problem of too much data being locked away in PDF files. “My pattern solves this problem using tabula-extractor, the Ruby library (and command-line tool) that powers Tabula. It’s built to output data to CSVs or to a MySQL database.”