Hip hop vocabulary compared between artists

Posted to Statistics  |  Nathan Yau

Matt Daniels compared rappers' vocabularies to find out who knows the most words.

Literary elites love to rep Shakespeare's vocabulary: across his entire corpus, he uses 28,829 words, suggesting he knew over 100,000 words and arguably had the largest vocabulary, ever.

I decided to compare this data point against the most famous artists in hip hop. I used each artist's first 35,000 lyrics. That way, prolific artists, such as Jay-Z, could be compared to newer artists, such as Drake.

As two points of reference, Daniels also counted the number of unique words in the first 5,000 used words from seven of Shakespeare's works and the number of uniques from the first 35,000 words of Herman Melville's Moby-Dick.

I'm not sure how much stock I would put into these literary comparisons though, because this is purely a keyword count. So "pimps", "pimp", "pimping", and "pimpin" count as four words in a vocabulary, and I have a hunch that variants of a single word are more common in rap lyrics than in Shakespeare and Melville. Again, I'm guessing here.
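To make the keyword-count issue concrete, here is a rough sketch of counting unique words in the first N words of a text. The tokenizing rule (lowercase, split on anything that isn't a letter or apostrophe) and the function name `unique_words` are my assumptions for illustration, not necessarily what Daniels did.

```python
import re

def unique_words(text, limit=35000):
    """Count distinct tokens among the first `limit` words of `text`."""
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # No stemming: every spelling variant is treated as its own word.
    return len(set(tokens[:limit]))

sample = "Pimps pimp, pimping and pimpin: a pimp pimps."
print(unique_words(sample))  # 6
```

The four variants each add to the total, along with "and" and "a", so a count like this probably rewards slang spellings and inflections as much as it rewards a broad vocabulary.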

That said, although there could be similar issues within the rapper comparisons, I bet the counts are more comparable.
