Kaggle Datasets, a place to converge on public data

Jan 21, 2016

Kaggle just opened up a Datasets section where you can download and analyze public data.

“At Kaggle, we want to help the world learn from data. This sounds bold and grandiose, but the biggest barriers to this are incredibly simple. It’s tough to access data. It’s tough to understand what’s in the data once you access it. We want to change this. That’s why we’ve created a home for high quality public datasets, Kaggle Datasets.”

It’s still really new and only has a handful of datasets, but it looks interesting. The key is that it’s not just a place to download data. Kaggle also provides analysis environments and makes it easy to share both the code that uses the data and the results.

Oftentimes, it’s the getting-started hurdle that gets in the way of working with a large-ish dataset. Maybe this will help set things on the right path.

