Subreddit math with r/The_Donald helps show topic breakdowns

Posted to Statistics  |  Nathan Yau

Trevor Martin for FiveThirtyEight used latent semantic analysis to do math with subreddits, specifically r/The_Donald.

We’ve adapted a technique that’s used in machine learning research — called latent semantic analysis — to characterize 50,323 active subreddits based on 1.4 billion comments posted from Jan. 1, 2015, to Dec. 31, 2016, in a way that allows us to quantify how similar in essence one subreddit is to another. At its heart, the analysis is based on commenter overlap: Two subreddits are deemed more similar if many commenters have posted often to both. This also makes it possible to do what we call “subreddit algebra”: adding one subreddit to another and seeing if the result resembles some third subreddit, or subtracting out a component of one subreddit’s character and seeing what’s left.
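To make the idea of commenter overlap and "subreddit algebra" concrete, here is a minimal sketch in Python. It is not the FiveThirtyEight implementation: it skips whatever weighting and dimensionality reduction the full latent semantic analysis involves and simply compares raw comment-count vectors with cosine similarity. The subreddit names, the six commenters, and every count in the matrix are made up for illustration, and the helper names `similarity` and `algebra` are hypothetical.

```python
import numpy as np

# Hypothetical toy data, not real Reddit counts.
# Rows are subreddits, columns are six imaginary commenters,
# values are how many comments each commenter left in each subreddit.
subreddits = ["nba", "chicago", "chicagobulls", "politics"]
counts = np.array([
    [5, 5, 0, 0, 3, 0],   # nba
    [0, 0, 5, 5, 3, 0],   # chicago
    [2, 2, 2, 2, 6, 0],   # chicagobulls
    [1, 0, 1, 0, 0, 7],   # politics
], dtype=float)

# Scale each row to unit length so a dot product equals cosine similarity.
vectors = counts / np.linalg.norm(counts, axis=1, keepdims=True)
index = {name: i for i, name in enumerate(subreddits)}


def similarity(a, b):
    """Cosine similarity between two subreddits, based on shared commenters."""
    return float(vectors[index[a]] @ vectors[index[b]])


def algebra(plus, minus=()):
    """Add and subtract subreddit vectors, then rank the remaining subreddits."""
    combined = np.zeros(vectors.shape[1])
    for name in plus:
        combined += vectors[index[name]]
    for name in minus:
        combined -= vectors[index[name]]

    norm = np.linalg.norm(combined)
    if norm == 0:
        raise ValueError("the combined vector is zero, so nothing can be ranked")
    combined /= norm

    # Leave out the subreddits that went into the sum or difference.
    used = set(plus) | set(minus)
    scores = vectors @ combined
    ranked = [
        (name, float(score))
        for name, score in zip(subreddits, scores)
        if name not in used
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


result = algebra(plus=["nba", "chicago"])
print(result)
```

With this toy matrix, adding the nba and chicago vectors ranks chicagobulls first, because its commenters overlap with both. Real data would need a much larger, sparse matrix and some care about very active or very rare commenters.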



Unemployment in America, Mapped Over Time

Watch the regional changes across the country from 1990 to 2016.

This is an American Workday, By Occupation

I simulated a day for employed Americans to see when and where they work.

How We Spend Our Money, a Breakdown

We know spending changes when you have more money. Here’s by how much.

The Most Unisex Names in US History

Moving on from the most trendy names in US history, let’s look at the most unisex ones. Some names have …