The Bureau of Labor Statistics released the most recent unemployment numbers last week. Things aren't looking good for the unemployed, I'm afraid.
FlowingData readers who have been around for a while will remember I made a map early this year that showed the growth of Target stores across America. It starts with the first one in 1962 and then goes from there. It was a follow-up to the Walmart map, which I shared the code and data for.
These are exciting times for data heads. The launch of Data.gov back in May got things jump-started; San Francisco recently announced DataSF; and now New York is getting in on the party with the announcement of their own Data Mine (live at 1pm EST today) and the NYC Big Apps competition.
Let's say you have this idea for a visualization or application, or you're just curious about some trend. But you have a problem. You can't find the data, and without the data, you can't even start. This is a guide and a list of sources for where you can find that data you're looking for. There's a lot out there.
Being a graduate student, I always look to the library for books and resources. Many libraries are amping up their technology and have some expansive data archives. Many statistics departments also tend to keep a list of data somewhere.
There's a lot of data on the Web, but it's all very scattered. At the same time, there's a lot of data sitting on people's hard drives that we don't have access to. There are various reasons why people don't share, but mainly, they just don't see the point.
Taking another step towards data transparency, the US government provides the IT dashboard via USAspending.gov:
The IT Dashboard provides the public with an online window into the details of Federal information technology investments and provides users with the ability to track the progress of investments over time. The IT Dashboard displays data received from agency reports to the Office of Management and Budget (OMB), including general information on over 7,000 Federal IT investments and detailed data for nearly 800 of those investments that agencies classify as "major." The performance data used to track the 800 major IT investments is based on milestone information displayed in agency reports to OMB called "Exhibit 300s." Agency CIOs are responsible for evaluating and updating select data on a monthly basis, which is accomplished through interfaces provided on the website.
While we're on the subject of flight, ever since that plane landed in the Hudson River a few months ago, the thought of bird-airplane collisions hasn't strayed too far from the media (or my mind each time I fly). In light of all the hoopla, the Federal Aviation Administration (FAA) finally gave in and opened up their bird strike database to the public.
Below is an interactive exploring this data, breaking things down by bird type, location, phase of flight, and time of day. Click through to this post to view.
The Organization for Economic Co-operation and Development (OECD) makes a lot of world indicators available (e.g. world population and birth rate). Much of it goes unnoticed, because most people just see a bunch of numbers. However, the Factbook eXplorer from the OECD, in collaboration with the National Center for Visual Analytics, is a visualization tool that helps you see and explore the data.
Those who have seen Hans Rosling's Gapminder presentations - and I imagine most of us have - will recognize the style with a play button and a motion graph in sync with parallel coordinates and a map. Choose an indicator, or several of them, press play, and watch the visualization move through time.
Also, if you've got your own data, you can load that too, which is certainly a nice touch.
Big news. Data.gov is now live. Government data is at your fingertips.
The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. Although the initial launch of Data.gov provides a limited portion of the rich variety of Federal datasets presently available, we invite you to actively participate in shaping the future of Data.gov by suggesting additional datasets and site enhancements to provide seamless access and use of your Federal data. Visit today with us, but come back often. With your help, Data.gov will continue to grow and change in the weeks, months, and years ahead.
I was actually expecting an API of some sort, but it's a searchable catalog that makes it easier to find the datasets scattered across all the U.S. agency sites. I still need to explore more to figure out what exactly is there, but this is big news for data fans. What do you think of the new site? Discuss in the comments below.
Google announced today that they have made a small subset of public datasets searchable. Search for unemployment rate and you'll see a thumbnail at the top of the results. Click on it, and you get a very Google-y chart like the one above, so instead of searching for unemployment rates for multiple years, you can get it all at once.
The Center for Responsive Politics (CRP), a research group well-known for its tracking of monetary influence on United States politics, announced some great news. Their expansive dataset is now available to the public via OpenSecrets.
Politicians, prepare yourselves. Lobbyists, look out. Today the nonpartisan Center for Responsive Politics is putting 200 million data records from the watchdog group's archive directly into the hands of citizens, activists, journalists and anyone else interested in following the money in U.S. politics.
Yeah, 200 million data records. Correction. 200 million cleaned, formatted, and documented data records. Awesome. They've got data on campaign finances, lobbying, personal finances, and 527 organizations, which can be downloaded as CSV files or via the RESTful API. Let the mashups begin.
[via Ben Fry | Thanks, Gegtik]
Facebook started as a spinoff of Hot or Not in 2003. Now Facebook is the world's biggest online social network. It's certainly come a long way with millions of users around the world, the opening of the Facebook Platform, and quite possibly a personal data gold mine. All Facebook, the unofficial Facebook resource, provides news, and more importantly, data on growth, demographics, pages, and applications. A lot of it is locked behind a not so pretty widget, but interesting nevertheless. The above graphic is a look at some of that data.
Philip from infochimps posts the results of some heavy Twitter scraping. Data for 2.7 million users, 10 million tweets, and 58 million edges (i.e. connections between users) to satisfy your data hunger are available for download. I know a lot of you social network researchers will especially appreciate the big dataset, and best of all, Twitter gave Philip permission to release. Yes, you could use the Twitter API, but isn't it better when someone does it for you?
Download the data here. The password is the Ramanujan taxicab number followed by the word 'kennedy' - all one word. Google is your friend, if that doesn't make sense.
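If "edges" sounds abstract, here's a minimal sketch of what you might do with them. I'm assuming each edge is a (follower, followed) pair; the actual file format isn't described here, and the usernames below are made up for illustration.

```python
from collections import Counter

# Hypothetical sample edges as (follower, followed) pairs.
# The real dataset's layout and field names are assumptions here.
edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
    ("dave", "alice"),
    ("dave", "bob"),
]


def follower_counts(edge_list):
    """Count incoming edges (followers) for each user."""
    return Counter(followed for _, followed in edge_list)


counts = follower_counts(edges)

# The single user with the most followers, as a (user, count) pair.
most_followed = counts.most_common(1)[0]
print(most_followed)
```

With 58 million edges you'd want to stream the file line by line instead of holding a list in memory, but the counting idea stays the same.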
It was really only a matter of time, but Amazon now hosts public data sets. Not small data sets though - more like the ones in between 1 gigabyte and 1 terabyte:
Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. An initial list of data sets is already available, and more will be added soon.
Previously, large data sets such as the mapping of the Human Genome and the US Census data required hours or days to locate, download, customize, and analyze. Now, anyone can access these data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. Users can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. For example, users can produce or use prebuilt server images with tools and applications to analyze the data sets. By hosting this important and useful data with cost-efficient services such as Amazon EC2, AWS hopes to provide researchers across a variety of disciplines and industries with tools to enable more innovation, more quickly.
There's the human genome data set, US Census data from the past 3 decades, labor statistics, and some others. Still waiting on Google to follow through with their data hosting plans.
[via TechCrunch | Thanks, David]
Neighborhood Boundaries by Tom Taylor uses Flickr Shapefiles and Yahoo! Geoplanet "to show you where the world thinks its neighbors are." Yahoo! provides access to the Where on Earth (WOE) database, which attempts to describe locations as a hierarchy. For example - a town belongs to a city, a city to a county, a county to a state. The Flickr API stores shape files identified by the WOE ID. Here's the punchline. The shapefiles are built using only the latitude and longitude from geotagged photos on Flickr. There's no GIS involved here.
Why this matters, I can't really say. I think it's mostly to show how much data is stored in geotagged Flickr photos. I'm no GIS expert though. Anyone care to comment on the significance?
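To get a feel for how a boundary can come out of nothing but photo coordinates, here's a rough sketch. It wraps a convex hull around a handful of made-up (longitude, latitude) points. To be clear, the convex hull is a simple stand-in I picked for illustration, not necessarily how the Flickr shapefiles are actually built, and the coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower hull left to right, dropping points that turn clockwise.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull right to left in the same way.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Hypothetical geotagged photo locations as (longitude, latitude).
photos = [
    (-122.5, 37.7),
    (-122.3, 37.7),
    (-122.3, 37.9),
    (-122.5, 37.9),
    (-122.4, 37.8),  # interior point, should not appear on the boundary
]

boundary = convex_hull(photos)
print(boundary)
```

A convex hull will overshoot for any neighborhood that isn't blob-shaped, so treat this as the crudest possible version of the idea.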
Where do you think the US imports the most oil from? Most of us would probably say somewhere in the Middle East, but Jon Udell does some number crunching and shows that this assumption is false. Canada supplies us with the most oil (according to the US Department of Energy).
This realization, however, isn't the post's punchline. It's how easy it was for Jon to figure this stuff out. With some help from Dabble DB (an app that lets you easily use a database without too much technical fuss), Jon was able to parse the data and map it by region with a few swift clicks.
We're really close to the point where non-specialists will be able to find data online, ask questions of it, produce answers that bear on public policy issues, and share those answers online for review and discussion. A few more turns of the crank, and we'll be there. And not a moment too soon.
We're gettin' there.
It was just a little over a week ago that The New York Times announced their Developer Network, i.e. the Campaign Finance API. Yesterday, they announced something more - the Visualization Lab. In collaboration with the Many Eyes group, the Times has rolled out a Many Eyes for data used by Times writers. You can visualize, explore, and comment on data posted at the Visualization Lab in the same way that you can at Many Eyes.
Today, we're taking the next step in reader involvement with the launch of The New York Times Visualization Lab, which allows readers to create compelling interactive charts, graphs, maps and other types of graphical presentations from data made available by Times editors. NYTimes.com readers can comment on the visualizations, share them with others in the form of widgets and images, and create topic hubs where people can collect visualizations and discuss specific subjects.
A Few More Steps
I said the API was a good step forward. The Visualization Lab is more than a step. No doubt The Times heard what I said about their API and decided to roll with it since I am the head authority on everything. Yes, I'm totally kidding, in case that didn't come across as a joke. Come on now.
I'm looking forward to seeing how well Times readers take to this new way of interacting.
The New York Times announced the opening of their Developer Network a couple of days ago. It's their "API clearinghouse and community." It might seem kind of weird that a newspaper company has an API, but as many FlowingData readers know, the Times prides itself on innovation.
The Campaign Finance API is currently available:
With the Campaign Finance API, you can retrieve contribution and expenditure data based on United States Federal Election Commission filings. Campaign finance data is public and is therefore available from a variety of sources, but the developers of the Times API have distilled the data into aggregates that answer most campaign finance questions. Instead of poring over monthly filings or searching a disclosure database, you can use the Times Campaign Finance API to quickly retrieve totals for a particular candidate, see aggregates by ZIP code or state, or get details on a particular donor.
Anyone who has tried to play with FEC data, myself included, knows that this API is cool. You could get the data directly from the FEC, but it's a bit of a painstaking process. Now you don't have to sift through a bunch of reports or an awkward user interface.
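To show the kind of grunt work those aggregates save you, here's a small sketch that rolls itemized contributions up by ZIP code and by candidate. The records and field names are made up for illustration; they aren't the FEC's layout or the Times API's response format.

```python
from collections import defaultdict

# Hypothetical itemized contributions. Field names and values are
# assumptions for illustration, not a real FEC or Times API schema.
filings = [
    {"candidate": "Candidate A", "zip": "10001", "amount": 250.0},
    {"candidate": "Candidate A", "zip": "10001", "amount": 100.0},
    {"candidate": "Candidate B", "zip": "10001", "amount": 500.0},
    {"candidate": "Candidate A", "zip": "94103", "amount": 50.0},
]


def totals_by(records, key):
    """Sum contribution amounts grouped by the given field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


by_zip = totals_by(filings, "zip")
by_candidate = totals_by(filings, "candidate")
print(by_zip)
print(by_candidate)
```

Doing this yourself means downloading, cleaning, and matching up the raw filings first, which is exactly the painstaking part.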
The Movie Review API is next in line. After that, who knows, but it's a good step forward for The Times.
[via serial consign]