Open-source Data Science Toolkit

Posted to Software  |  Nathan Yau

Pete Warden does the data community a solid and wraps up a collection of open-source tools in the Data Science Toolkit to parse, geocode, and process data.

A collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces. Available as a self-contained VM or EC2 AMI that you can deploy yourself.

Many of the services are available via public APIs, but running your own service brings the usual benefits: privacy, independence, and no usage limits. Hit your machine with as many requests as you want. The code is available in its entirety on GitHub.
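To give a feel for what hitting your own instance might look like, here is a minimal Python sketch that uses only the standard library. The host `http://localhost:8080`, the helper names `build_request` and `call`, and the `street2coordinates` endpoint with a list-of-addresses JSON body are all assumptions for illustration, so check the toolkit's own documentation for the actual paths and input formats.

```python
import json
import urllib.request

# Hypothetical address of a self-hosted instance (your VM or EC2 host).
BASE_URL = "http://localhost:8080"


def build_request(endpoint, payload, base_url=BASE_URL):
    """Build a POST request that carries `payload` as a JSON body."""
    url = base_url.rstrip("/") + "/" + endpoint.strip("/")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(endpoint, payload, base_url=BASE_URL, timeout=10):
    """Send the request and decode the JSON response."""
    request = build_request(endpoint, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (needs a running instance; endpoint name is assumed):
# result = call("street2coordinates", ["1600 Pennsylvania Ave NW, Washington, DC"])
# print(result)
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before pointing the code at a real server.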

[Data Science Toolkit via @JanWillemTulp]


