A collection of small datasets

Posted to Data Sources  |  Nathan Yau

Sometimes you need data, any data, to test or mess around with. Sometimes you just want to make weird crap. Corpora is a collection of small datasets that might suit your needs.

From the project description: “This project is a collection of static corpora (plural of ‘corpus’) that are potentially useful in the creation of weird internet stuff. I’ve found that, as a creator, sometimes I am making something that needs access to a lot of adjectives, but not necessarily every adjective in the English language. So for the last year I’ve been copy/pasting an adjs.json file from project to project. This is kind of awful, so I’m hoping that this project will at least help me keep everything in one place.”

Some of the sets: animals, colors, corporations, and foods.
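
If you want to play with a set like this, here is a minimal Python sketch of the adjs.json idea from the description: load a list of words from a JSON file and pull a few at random. The file layout is an assumption. The "adjs" key, the sample words, and the helper names `load_corpus` and `sample_words` are made up for illustration, so check the actual files for their real structure.

```python
import json
import random
import tempfile
from pathlib import Path


def load_corpus(path, key):
    """Read a JSON file and return the list stored under `key`."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data[key]


def sample_words(words, k, seed=None):
    """Pick k distinct words at random; pass a seed for repeatable output."""
    rng = random.Random(seed)
    return rng.sample(words, k)


# Stand-in data so the sketch runs without downloading anything.
# The "adjs" key and these words are hypothetical, not the project's real file.
sample = {"adjs": ["brittle", "gleaming", "hollow", "mossy", "velvet"]}
corpus_path = Path(tempfile.mkdtemp()) / "adjs.json"
corpus_path.write_text(json.dumps(sample), encoding="utf-8")

adjectives = load_corpus(corpus_path, "adjs")
print(sample_words(adjectives, 3))
```

Swap `corpus_path` and the key for a real file and you have a few adjectives on tap, without needing every adjective in the English language.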
