
Data Sources

A collection of small datasets

Sometimes you need data, any data, to test or mess around with. Sometimes you just want to make weird crap. Corpora is a collection of small datasets that might suit…

Unclaimed remains

People die, and for various reasons many bodies go unclaimed. In Los Angeles county, the bodies go to the county crematory. The Los Angeles Times reports, along with a searchable…

Jeopardy! clues data

Here's some weekend project data for you. Reddit user trexmatt dumped a dataset of 216,930 Jeopardy! questions and answers in JSON and CSV formats, a scrape from the J! Archive.…

Crisis Text Line releases trends and data

Crisis Text Line is a service that troubled teens can use to find help with suicidal thoughts, depression, anxiety, and other issues via text messaging. The long-term hope was to…

A more visual world data portal

One of the most annoying parts of downloading data from large portals is that you never quite know what you're gonna get. It's a box of chocolates. It's government data…

Large-ish data packages in R

If you've played around with R enough, there comes a time when you just need some data to mess around with. Maybe it's to learn a new method or to…

Most underrated films

Ben Moore was curious about overrated and underrated films. "Overrated" and "underrated" are slippery terms to try to quantify. An interesting way of looking at this, I thought, would be…

Bike share data in New York, animated

Citi Bike, also known as NYC Bike Share, is releasing monthly data dumps for station check-outs and check-ins, which gives you a sense of where and when people move about…

ProPublica opened a data store

One of the main challenges of any data project is getting the data. It seems obvious, but the effort to get the right data to answer a question seems to…

Texting data to save lives

Remember that TED talk from a couple of years ago on texting patterns to a crisis hotline? The TED talker Nancy Lublin proposed the analysis of these text messages to…

Cancer data for the U.S. released

The Centers for Disease Control and Prevention released their most recent cancer data a few days ago. The numbers are for 2010, which feels dated. However, the annual data goes…

Government data shutdown

When you go to the United States Census site, Data.gov, or similar government-run sites, you see this. "Due to the lapse in government funding, census.gov sites, services, and all online…

Data.gov revamp

After budget cuts a couple of years ago, I assumed Data.gov was all but dead, but apparently there's a new site in the works. The original version of Data.gov was…

Medicare provider charge data released

The Centers for Medicare and Medicaid Services released billing data for more than 3,000 U.S. hospitals, showing high variance in the cost of health care across the country and even between…

Archive of datasets bundled with R

R comes with a lot of datasets, some with the core distribution and others with packages, but you'd never know which ones unless you went through all the examples found…

Data on decades of Boy Scout expulsions released

The Los Angeles Times released nearly 5,000 records of allegations from the Boy Scouts of America as a browseable map and searchable list. You can also download the data. This…

Losing American Community Survey would be ‘disastrous’

Many want to get rid of the American Community Survey, a Census program which releases region-specific data annually. University of Michigan professor William Frey explains why cutting the survey would…

A Future Without Key Social and Economic Statistics for the Country

Robert Groves, director of the U.S. Census Bureau, on the Appropriations Bill: The Appropriations Bill eliminates the Economic Census, which measures the health of our economy. It terminates the American…

CNN transcript collection, 2000-2012

Thanks to the Internet Archive and CNN, thirteen years of transcripts, about a gigabyte compressed, is available to download as one file. For over a decade, CNN (Cable News Network)…

1940 Census Individual Records Released

The 72-year mark has arrived, and the United States Census released individual records from 1940 yesterday. So you can now, for example, see that J.D. Salinger lived at 1133 Park…

Texting on the toilet

I thought this riveting post on the New York Times Bits blog about the rise of the toilet texter deserved a graphic. Since their graphics department is no doubt busy…

What Facebook knows about you

Facebook logs and saves a lot of data about you and what you do on their site. This shouldn't be surprising given the more time people spend on Facebook, the…