Deep fried data

Posted to Statistics  |  Nathan Yau

Maciej Ceglowski, the creator of bookmarking site Pinboard, spoke at the Library of Congress a couple of weeks ago about deep-fried data.

In our case, the deep-fryer is a toolbox of statistical techniques. The names keep changing—it used to be unsupervised learning, now it’s called big data or deep learning or AI. Next year it will be called something else. But the core ideas don’t change. You train a computer on lots of data, and it learns to recognize structure.

These techniques are effective, but the fact that the same generic approach works across a wide range of domains should make you suspicious about how much insight it’s adding.

And in any deep frying situation, a good question to ask is: what is this stuff being fried in?

Deep-fried food is often delicious, regardless of what’s inside the golden breading. But deep inside, or really right there in the front of our minds, we know it’s not good for us. Moderation. Or at least make sure you go to the good food stand at the carnival, the one that uses clean oil.

See the talk and many others here.
