Open-source Data Science Toolkit

Posted to Software  |  Tags: ,  |  Nathan Yau

Pete Warden does the data community a solid and wraps up a collection of open-source tools in the Data Science Toolkit to parse, geocode, and process data.

A collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces. Available as a self-contained VM or EC2 AMI that you can deploy yourself.

Many of the services are available via public APIs, but the usual benefits apply of running your own service such as privacy, independence, and no limits. Hit your machine with as many requests as you want. The code is available in its entirety on GitHub.

[Data Science Toolkit via @JanWillemTulp]

3 Comments

Favorites

The Best Data Visualization Projects of 2011

I almost didn’t make a best-of list this year, but as I clicked through the year’s post, it was hard …

A Day in the Life of Americans

I wanted to see how daily patterns emerge at the individual level and how a person’s entire day plays out. So I simulated 1,000 of them.

10 Best Data Visualization Projects of 2015

These are my picks for the best of 2015. As usual, they could easily appear in a different order on a different day, and there are projects not on the list that were also excellent.

Where Bars Outnumber Grocery Stores

A closer look at the age old question of where there are more bars than grocery stores, and vice versa.