Large-ish data packages in R

Posted to Data Sources  |  Nathan Yau

If you’ve played around with R enough, there comes a time when you just need some data to mess around with, maybe to learn a new method or to try out one of your own. R ships with some small-ish, clean datasets to poke at, but sometimes you need bigger, messier data. Hadley Wickham of RStudio has released four large-ish datasets as R packages to help with that. From his announcement:

I’ve released four new data packages to CRAN: babynames, fueleconomy, nasaweather and nycflights13. The goal of these packages is to provide some interesting, and relatively large, datasets to demonstrate various data analysis challenges in R. The package source code (on GitHub) is fully reproducible so that you can see some data tidying in action, or make your own modifications to the data.
