Download data for 1.7 billion Reddit comments

Posted to Data Sources  |  Nathan Yau

There’s been all sorts of weird stuff going on at Reddit lately, but who’s got time for that when you can download roughly 1.65 billion comments left on Reddit from October 2007 through May 2015?

This is an archive of Reddit comments from October of 2007 until May of 2015 (complete month). This reflects 14 months of work and a lot of API calls. This dataset includes nearly every publicly available Reddit comment. Approximately 350,000 comments out of ~1.65 billion were unavailable due to Reddit API issues.

Each record includes the timestamp, comment ID, controversiality score, and of course the comment text. It’s 5 gigabytes compressed and available over torrent.
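If you want to poke at the data once it's downloaded, here is a minimal sketch in Python for streaming the fields mentioned above. It assumes the archive unpacks to bz2-compressed files with one JSON object per line, and that the fields are named `id`, `created_utc`, `controversiality`, and `body`. Those names, the file layout, and the `iter_comments` helper are assumptions for illustration, so check them against the actual files first.

```python
import bz2
import json
from datetime import datetime, timezone


def iter_comments(path):
    """Yield one dict per comment from a bz2-compressed, newline-delimited JSON file.

    Field names (id, created_utc, controversiality, body) are assumed,
    not confirmed by the post; adjust them to match the real files.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            yield {
                "id": record.get("id"),
                # created_utc is assumed to be Unix seconds (string or number)
                "created": datetime.fromtimestamp(
                    int(record["created_utc"]), tz=timezone.utc
                ),
                "controversiality": record.get("controversiality"),
                "body": record.get("body", ""),
            }
```

Reading one line at a time keeps memory use flat, which matters with this many comments. You can loop over `iter_comments("some_month.bz2")` (a hypothetical file name) and count, filter, or sample as you go.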

Git er done.
