Large-ish data packages in R

Posted to Data Sources  |  Tags: ,  |  Nathan Yau

If you’ve played around with R enough, there comes a time when you just need some data to mess around with. Maybe it’s to learn a new method or to make one of your own. R offers some small-ish, clean datasets to poke at, but sometimes you need bigger, messier data. Hadley Wickham from RStudio released four popular large-ish datasets in package form to help you with that.

I’ve released four new data packages to CRAN: babynames, fueleconomy, nasaweather and nycflights13. The goal of these packages is to provide some interesting, and relatively large, datasets to demonstrate various data analysis challenges in R. The package source code (on github, linked above) is fully reproducible so that you can see some data tidying in action, or make your own modifications to the data.

Good.

Favorites

A Day in the Life of Americans

I wanted to see how daily patterns emerge at the individual level and how a person’s entire day plays out. So I simulated 1,000 of them.

How You Will Die

So far we’ve seen when you will die and how other people tend to die. Now let’s put the two together to see how and when you will die, given your sex, race, and age.

Graphical perception – learn the fundamentals first

Before you dive into the advanced stuff – like just about everything in your life – you have to learn the fundamentals before you know when you can break the rules.

Causes of Death

There are many ways to die. Cancer. Infection. Mental. External. This is how different groups of people died over the past 10 years, visualized by age.