Download data for 1.7 billion Reddit comments

Posted to Data Sources  |  Nathan Yau

There’s been all sorts of weird stuff going on at Reddit lately, but who’s got time for that when you can download 1.7 billion comments left on Reddit, from 2007 through May 2015?

This is an archive of Reddit comments from October of 2007 until May of 2015 (complete month). This reflects 14 months of work and a lot of API calls. This dataset includes nearly every publicly available Reddit comment. Approximately 350,000 comments out of ~1.65 billion were unavailable due to Reddit API issues.

Each record includes the timestamp, comment ID, controversiality score, and of course the comment text. It’s 250 gigabytes compressed and available over torrent.
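If you want to poke at the archive without loading it all into memory, something like the following might be a starting point. It is only a sketch under assumptions the post does not confirm: that the files are bzip2-compressed with one JSON object per line, and that the fields are named `id`, `created_utc`, `controversiality`, and `body`. The helper names `iter_comments` and `count_controversial` are made up for illustration.

```python
import bz2
import json
from collections import Counter


def iter_comments(source):
    """Yield one comment dict per non-empty line.

    `source` is a path or a binary file object holding bzip2-compressed,
    newline-delimited JSON (an assumed layout, not confirmed by the post).
    """
    with bz2.open(source, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_controversial(source):
    """Tally comments by their controversiality score (assumed field name)."""
    counts = Counter()
    for comment in iter_comments(source):
        # Treat a missing score as 0 so incomplete records don't break the tally.
        counts[comment.get("controversiality", 0)] += 1
    return counts
```

Because the comments are streamed one line at a time, memory use stays flat regardless of how big a monthly file is. Point `count_controversial` at a downloaded file path to try it, and adjust the field names if the real records differ.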

Git er done.

