A collection of small datasets

Nov 14, 2014

Sometimes you need data, any data, to test or mess around with. Sometimes you just want to make weird crap. Corpora is a collection of small datasets that might suit your needs.

This project is a collection of static corpora (plural of “corpus”) that are potentially useful in the creation of weird internet stuff. I’ve found that, as a creator, sometimes I am making something that needs access to a lot of adjectives, but not necessarily every adjective in the English language. So for the last year I’ve been copy/pasting an adjs.json file from project to project. This is kind of awful, so I’m hoping that this project will at least help me keep everything in one place.

The collection includes sets for animals, colors, corporations, and foods, among others.
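If you want to do the adjs.json thing without the copy/pasting, here is a minimal sketch in Python of loading one of these small JSON files and pulling a few random entries from it. The file layout (a single object with a list under the key `adjs`), the sample words, and the helper names `load_corpus` and `pick` are all assumptions for illustration, not the project's actual schema, so check the real file before relying on it.

```python
import json
import random
import tempfile
from pathlib import Path


def load_corpus(path, key):
    """Read a JSON file and return the list stored under `key`."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data[key]


def pick(items, k, seed=None):
    """Return up to k distinct items chosen at random."""
    rng = random.Random(seed)
    return rng.sample(items, min(k, len(items)))


# Stand-in file so the sketch runs on its own.
# The key name and the words are made up, not taken from the project.
sample = {"adjs": ["weird", "small", "static", "useful", "awful"]}
corpus_path = Path(tempfile.mkdtemp()) / "adjs.json"
corpus_path.write_text(json.dumps(sample), encoding="utf-8")

adjectives = load_corpus(corpus_path, "adjs")
print(pick(adjectives, 3))
```

Point `load_corpus` at a real file and swap in whatever key that file uses. Passing a `seed` makes the picks repeatable, which helps when you are testing.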
