Practical tips for scraping data

Aug 4, 2016

It’s an unpleasant feeling when you have an idea for a project and the data you need is sitting right in front of you on a bunch of random-looking webpages instead of a nice, delimited file. You could either forget about your idea (which is what most people do), you can record manually, or you can take an automated route with a bit of scraping know-how.

I often find myself taking the tedious, manual route out, but sometimes scraping is clearly the best option. David Eads from the NPR Visuals Team describes how they use a model-control approach to scraping data.

Step 1: Find the data and figure out the HTML and/or JavaScript format and pattern. Step 2: Setup a way to parse and spit out the formatted data. Step 3: Optimize.

Oh, and before all that, make sure it’s legal.

Favorites

10 Best Data Visualization Projects of 2015

These are my picks for the best of 2015. As usual, they could easily appear in a different order on a different day, and there are projects not on the list that were also excellent.

Years You Have Left to Live, Probably

The individual data points of life are much less predictable than the average. Here’s a simulation that shows you how much time is left on the clock.

Graphical perception – learn the fundamentals first

Before you dive into the advanced stuff – like just about everything in your life – you have to learn the fundamentals before you know when you can break the rules.

Pizza Place Geography

Most of the major pizza chains are within a 5-mile …