Understanding Data, Not Just the Realm of Scientists in Ivory Towers

Posted to Miscellaneous  |  Nathan Yau

What is Data and Why Should We Care About It?

This guest post is by Hadley Wickham, a Statistics PhD candidate and a part of the GGobi team. He answers my question: “What is data and why should we care about it?”

For me, most data comes in the form of a data frame: a rectangular set of values with observations in rows and variables in columns. Most values are continuous (e.g. real numbers) or categorical (e.g. colours, treatments, subject ids), but some are more esoteric (images, sounds, intervals). Each variable contains values of only one type and may also contain missing values. Missing values are particularly important for statisticians, and are often encoded as "." or NA (encoding them as special numeric values, like 99, is generally a bad idea, because such a value can be mistaken for a real measurement). Most data is “messy”, and cleaning it up means making sure that observations are in rows and variables in columns, then spending plenty of time checking that the values actually make sense (visualisation is really useful for this!).
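To make the data frame idea and the warning about 99 concrete, here is a minimal sketch in plain Python. It is only an illustration: the subjects, treatments and ages are made up, the helper names (column, mean_skipping_missing) are hypothetical, and None stands in for NA.

```python
from statistics import mean

# A tiny "data frame": each dict is one observation (row) and each key is
# one variable (column). All values are invented for illustration.
rows = [
    {"subject": "a", "treatment": "control", "age": 34.0},
    {"subject": "b", "treatment": "drug", "age": None},  # age not recorded
    {"subject": "c", "treatment": "drug", "age": 41.0},
    {"subject": "d", "treatment": "control", "age": 29.0},
]


def column(rows, name):
    """Pull one variable out of the rows as a list (hypothetical helper)."""
    return [row[name] for row in rows]


def mean_skipping_missing(values):
    """Average the observed values, ignoring explicit missing markers (None)."""
    observed = [v for v in values if v is not None]
    return mean(observed)


ages = column(rows, "age")
n_missing = sum(v is None for v in ages)
honest_mean = mean_skipping_missing(ages)

# The same column with the missing value coded as 99. Nothing marks it as
# missing any more, so it is averaged in as if it were a real age.
ages_sentinel = [99.0 if v is None else v for v in ages]
distorted_mean = mean(ages_sentinel)

print(f"missing ages: {n_missing}")
print(f"mean age, missing value skipped: {honest_mean:.2f}")
print(f"mean age, missing coded as 99:   {distorted_mean:.2f}")
```

With an explicit missing marker, the code has to decide what to do about the gap. With 99, the gap quietly turns into a 99-year-old subject and drags the average up.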

Data Helps Illuminate Patterns

To me, caring about the message in data is the essence of science, where we perform some action on the world and record its response in our data. This isn’t just the realm of scientists in ivory towers, but something that we do every day, whether it’s trying to understand the impact of a new marketing campaign, figuring out which house to buy or exploring why a new cancer drug isn’t working. Recording and examining the data that matters not only supports rational decision making, but also reveals the unexpected and helps illuminate underlying patterns.
