Download data for 1.7 billion Reddit comments

Posted to Data Sources  |  Nathan Yau

There’s been all sorts of weird stuff going on at Reddit lately, but who’s got time for that when you can download roughly 1.65 billion comments left on Reddit from 2007 through May 2015?

This is an archive of Reddit comments from October of 2007 until May of 2015 (complete month). This reflects 14 months of work and a lot of API calls. This dataset includes nearly every publicly available Reddit comment. Approximately 350,000 comments out of ~1.65 billion were unavailable due to Reddit API issues.

You get timestamps, comment ids, controversiality scores, and of course the comment text. It’s 5 gigabytes compressed and available over torrent.
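If you want to poke at the data once it's downloaded, here's a minimal Python sketch that counts comments per month. It rests on assumptions the description above doesn't confirm: that each file is bz2-compressed with one JSON object per line, and that the timestamp lives in a field named `created_utc` as a Unix timestamp. The function names and the file path in the usage note are made up for illustration, so check them against the actual files before relying on this.

```python
import bz2
import json
from collections import Counter
from datetime import datetime, timezone


def iter_comments(path):
    """Yield one comment dict per line from a bz2-compressed JSON-lines file.

    Assumption: the archive stores one JSON object per line.
    """
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def comments_per_month(comments):
    """Count comments by calendar month in UTC, keyed as 'YYYY-MM'.

    Assumption: each comment has a 'created_utc' field holding a Unix
    timestamp, stored either as an int or as a numeric string.
    """
    counts = Counter()
    for comment in comments:
        created = datetime.fromtimestamp(int(comment["created_utc"]), tz=timezone.utc)
        counts[created.strftime("%Y-%m")] += 1
    return counts
```

You would call it with something like `comments_per_month(iter_comments("RC_2015-05.bz2"))`, where the file name is a placeholder. Streaming line by line keeps memory use flat, which matters with this many comments.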

Git er done.
