A collection of small datasets

Posted to Data Sources  |  Nathan Yau

Sometimes you need data, any data, to test or mess around with. Sometimes you just want to make weird crap. Corpora is a collection of small datasets that might suit your needs.

This project is a collection of static corpora (plural of “corpus”) that are potentially useful in the creation of weird internet stuff. I’ve found that, as a creator, sometimes I am making something that needs access to a lot of adjectives, but not necessarily every adjective in the English language. So for the last year I’ve been copy/pasting an adjs.json file from project to project. This is kind of awful, so I’m hoping that this project will at least help me keep everything in one place.

Some of the sets: animals, colors, corporations, and foods.
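As a rough illustration of the "weird internet stuff" use case, here is a minimal sketch that combines two small word lists into random names. It assumes each corpus file is a JSON object holding a description plus a word list under a named key. The inline JSON strings, their contents, and the helper names `load_words` and `weird_name` are hypothetical stand-ins, not the project's actual files or code, so check the real files' keys before adapting it.

```python
import json
import random

# Hypothetical stand-ins for two corpus files. The real files in the
# Corpora project may use different keys or contents; check before use.
ADJS_JSON = '{"description": "A small list of adjectives", "adjs": ["fuzzy", "grumpy", "luminous"]}'
ANIMALS_JSON = '{"description": "A small list of animals", "animals": ["otter", "yak", "heron"]}'


def load_words(json_text: str, key: str) -> list[str]:
    """Parse a corpus file's text and return the word list stored under `key`."""
    data = json.loads(json_text)
    return list(data[key])


def weird_name(adjs: list[str], animals: list[str], rng: random.Random) -> str:
    """Pick one random adjective and one random animal, e.g. 'grumpy yak'."""
    return f"{rng.choice(adjs)} {rng.choice(animals)}"


if __name__ == "__main__":
    adjs = load_words(ADJS_JSON, "adjs")
    animals = load_words(ANIMALS_JSON, "animals")
    rng = random.Random(42)  # seeded so repeated runs print the same names
    for _ in range(3):
        print(weird_name(adjs, animals, rng))
```

Passing in a seeded `random.Random` rather than using the module-level functions keeps the output reproducible, which helps when testing a generator built on these lists.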
