How to ask for datasets

Posted to Data Sources  |  Nathan Yau

There’s a lot of data readily available online. We know this. However, there’s also a lot of data available that’s offline, sitting on people’s hard drives. You just have to ask for it — in the right way. Christian Kreibich, a researcher for the International Computer Science Institute, provides a guide.

Make sure you’ve done your homework. This, too, is key. You need to demonstrate that you know your stuff and have good reasons for the inquiry. To give a negative example, I frequently receive requests for “botnet data”. That could mean anything. Are you interested in malware binaries, traffic captures, NetFlow data? Why, and why would you need mine? Understand the meaning and potential of the data you’re asking for, and be concrete. Understand the implications of obtaining certain datasets, such as privacy concerns, risks to others, or repeatability of the experiment.

Other tips: don’t be a jerk, make your affiliation clear, and make your purpose clear. But the first tip, don’t be shy, is the best. Just because the data isn’t online doesn’t mean the source doesn’t want to share, and it never hurts to ask.

This tweet from Kevin Quealy came to mind:
