Subreddit math with r/The_Donald helps show topic breakdowns

Mar 31, 2017

Trevor Martin for FiveThirtyEight used latent semantic analysis to do math with subreddits, specifically r/The_Donald.

We’ve adapted a technique that’s used in machine learning research — called latent semantic analysis — to characterize 50,323 active subreddits based on 1.4 billion comments posted from Jan. 1, 2015, to Dec. 31, 2016, in a way that allows us to quantify how similar in essence one subreddit is to another. At its heart, the analysis is based on commenter overlap: Two subreddits are deemed more similar if many commenters have posted often to both. This also makes it possible to do what we call “subreddit algebra”: adding one subreddit to another and seeing if the result resembles some third subreddit, or subtracting out a component of one subreddit’s character and seeing what’s left.

Hm.
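
To make the quoted idea concrete, here is a minimal sketch of commenter-overlap vectors and "subreddit algebra" in Python with NumPy. It is not FiveThirtyEight's actual pipeline. The `comments` data is invented, the helper names (`vector`, `cosine`, `most_similar`) are hypothetical, and overlap is simplified to "commented in both at least once" rather than weighting by how often someone posted.

```python
import numpy as np

# Toy data, invented for illustration: comment counts per user per subreddit.
comments = {
    "user_a": {"nba": 12, "sports": 5},
    "user_b": {"nba": 3, "chicago": 9, "chicagobulls": 4},
    "user_c": {"chicago": 7, "chicagobulls": 6},
    "user_d": {"sports": 8, "nba": 2, "chicagobulls": 1},
    "user_e": {"chicago": 4, "food": 10},
    "user_f": {"food": 6, "sports": 1},
}

subreddits = sorted({s for counts in comments.values() for s in counts})
index = {name: i for i, name in enumerate(subreddits)}

# cooc[i, j] = number of users who commented in both subreddit i and j.
# Simplifying assumption: comment counts are ignored, only presence matters.
cooc = np.zeros((len(subreddits), len(subreddits)))
for counts in comments.values():
    active = [index[s] for s in counts]
    for i in active:
        for j in active:
            if i != j:
                cooc[i, j] += 1


def vector(name):
    """Row of the overlap matrix describing one subreddit."""
    return cooc[index[name]]


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


def most_similar(query, exclude=()):
    """Rank subreddits by cosine similarity to an arbitrary query vector."""
    scores = {
        name: cosine(query, vector(name))
        for name in subreddits
        if name not in exclude
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# "Subreddit algebra": add two vectors, then see which subreddit is nearest.
combined = vector("nba") + vector("chicago")
ranked = most_similar(combined, exclude=("nba", "chicago"))
for name, score in ranked:
    print(f"{name}: {score:.3f}")

# Subtraction works the same way, e.g. vector("nba") - vector("sports").
```

With six made-up users the ranking itself means nothing; the point is only the mechanics of building vectors from shared commenters and then adding, subtracting, and comparing them.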
