A collection of small datasets

Posted to Data Sources  |  Nathan Yau

Sometimes you need data, any data, to test or mess around with. Sometimes you just want to make weird crap. Corpora is a collection of small datasets that might suit your needs.

From the project description:

This project is a collection of static corpora (plural of “corpus”) that are potentially useful in the creation of weird internet stuff. I’ve found that, as a creator, sometimes I am making something that needs access to a lot of adjectives, but not necessarily every adjective in the English language. So for the last year I’ve been copy/pasting an adjs.json file from project to project. This is kind of awful, so I’m hoping that this project will at least help me keep everything in one place.

Some of the sets: animals, colors, corporations, and foods.
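If you want a feel for how one of these small files might get used, here is a minimal Python sketch that loads a JSON list and picks a few random entries. The file name adjs.json comes from the description above; the JSON layout (a single "adjs" key holding a list), the sample words, and the helper names load_corpus and pick are assumptions made for illustration, so check the actual files before relying on any of it.

```python
import json
import random
import tempfile
from pathlib import Path


def load_corpus(path, key):
    """Read a JSON file and return the list stored under `key`."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data[key]


def pick(items, k, seed=None):
    """Return k distinct items chosen at random (a seed makes it repeatable)."""
    rng = random.Random(seed)
    return rng.sample(items, k)


# Stand-in data so the sketch runs without downloading anything.
# The layout ({"adjs": [...]}) is an assumption, not a documented schema.
sample = {"adjs": ["brisk", "hollow", "gleaming", "timid", "crooked"]}
tmp_path = Path(tempfile.mkdtemp()) / "adjs.json"
tmp_path.write_text(json.dumps(sample), encoding="utf-8")

adjectives = load_corpus(tmp_path, "adjs")
print(pick(adjectives, 2))
```

Swap the temporary file for a real one and the same two helpers should work for animals, colors, or whatever list you need, provided the file holds a list under a known key.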

