Data Sources

  • NYC BigApps Competition – $20k In Prize Money

    October 6, 2009 to Data Sources by Nathan Yau

    It's exciting times for data heads. The launch of Data.gov back in May got things jump started; San Francisco recently announced DataSF; and now New York is getting in on the party with the announcement of their own Data Mine (live at 1pm EST today) and the NYC Big Apps competition.

  • 30 Resources to Find the Data You Need

    October 1, 2009 to Data Sources by Nathan Yau

    Let's say you have this idea for a visualization or application, or you're just curious about some trend. But you have a problem. You can't find the data, and without the data, you can't even start. This is a guide and a list of sources for where you can find that data you're looking for. There's a lot out there.

    Universities

    Being a graduate student, I always look to the library for books and resources. Many libraries are amping up their technology and have some expansive data archives. Many statistics departments also tend to keep a list of data somewhere.

  • Share and Sell Data with Infochimps (100 Invites)

    September 25, 2009 to Data Sources by Nathan Yau

    There's a lot of data on the Web, but it's all very scattered. At the same time, there's a lot of data sitting on people's hard drives that we don't have access to. There are various reasons why people don't share, but mainly, they just don't see the point.

    Infochimps tries to solve both of these problems with an open data marketplace.

  • IT Dashboard and Data from USAspending.gov

    July 22, 2009 to Data Sources by Nathan Yau

    Taking another step towards data transparency, the US government provides the IT dashboard via USAspending.gov:

    The IT Dashboard provides the public with an online window into the details of Federal information technology investments and provides users with the ability to track the progress of investments over time. The IT Dashboard displays data received from agency reports to the Office of Management and Budget (OMB), including general information on over 7,000 Federal IT investments and detailed data for nearly 800 of those investments that agencies classify as "major." The performance data used to track the 800 major IT investments is based on milestone information displayed in agency reports to OMB called "Exhibit 300s." Agency CIOs are responsible for evaluating and updating select data on a monthly basis, which is accomplished through interfaces provided on the website.

    Along with a page to filter and download spending data, there's a variety of views into the IT spending data that all provide a pretty good level of interaction.

  • Taking a Closer Look at Airplane-Bird Collisions

    July 16, 2009 to Data Sources by Nathan Yau

    While we're on the subject of flight, ever since that plane landed in the Hudson River a few months ago, the thought of bird-airplane collisions hasn't strayed too far from the media (or my mind each time I fly). In light of all the hoopla, the Federal Aviation Administration (FAA) finally gave in and opened up their bird strike database to the public.

    Below is an interactive exploring this data breaking things down by bird type, location, phase of flight, and time of day. Click through to this post to view.

  • Explore World Data with Factbook eXplorer from OECD

    The Organization for Economic Co-operation and Development (OECD) makes a lot of world indicators available (e.g. world population and birth rate). Much of it goes unnoticed, because most people just see a bunch of numbers. However, the Factbook eXplorer from the OECD, in collaboration with the National Center for Visual Analytics, is a visualization tool that helps you see and explore the data.

    Those who have seen Hans Rosling's Gapminder presentations - and I imagine most of us have - will recognize the style with a play button and a motion graph in sync with parallel coordinates and a map. Choose an indicator, or several of them, press play, and watch the visualization move through time.

    Also, if you've got your own data, you can load that too, which is certainly a nice touch.

    [via BBC News | Thanks, Lawrie & Liam]

  • Data.gov is Live – Get Your Data While it’s Hot

    May 21, 2009 to Data Sources by Nathan Yau

    Big news. Data.gov is now live. Government data is at your fingertips.

    The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. Although the initial launch of Data.gov provides a limited portion of the rich variety of Federal datasets presently available, we invite you to actively participate in shaping the future of Data.gov by suggesting additional datasets and site enhancements to provide seamless access and use of your Federal data. Visit today with us, but come back often. With your help, Data.gov will continue to grow and change in the weeks, months, and years ahead.

    I was actually expecting an API of some sort, but it's a searchable catalog that makes it easier to find the datasets scattered across all the U.S. agency sites. I still need to explore more to figure out what exactly is there, but this is big news for data fans. What do you think of the new site? Discuss in the comments below.

    [via infosthetics]

  • Google Adds Search to Public Data

    Google announced today that they have made a small subset of public datasets searchable. Search for unemployment rate and you'll see a thumbnail at the top of the results. Click on it, and you get a very Google-y chart like the one above, so instead of searching for unemployment rates for multiple years, you can get it all at once.

  • Tracking Swine Flu Worldwide – Where and How, Plus Data

    April 28, 2009 to Data Sources, Infographics by Nathan Yau

    Just about everywhere you go there's something in the news about swine flu, and so naturally, when I first heard about it, I waited for The New York Times to put up a graphic. That was the first one. Here's the second (above).

  • Millions of Money-in-Politics Data Records Now Available

    April 15, 2009 to Data Sources by Nathan Yau

    The Center for Responsive Politics (CRP), a research group well-known for its tracking of monetary influence on United States politics, announced some great news. Their expansive dataset is now available to the public via OpenSecrets.

    Politicians, prepare yourselves. Lobbyists, look out. Today the nonpartisan Center for Responsive Politics is putting 200 million data records from the watchdog group's archive directly into the hands of citizens, activists, journalists and anyone else interested in following the money in U.S. politics.

    Yeah, 200 million data records. Correction. 200 million cleaned, formatted, and documented data records. Awesome. They've got data on campaign finances, lobbying, personal finances, and 527 organizations, which can be downloaded as CSV files or via the RESTful API. Let the mashups begin.

    [via Ben Fry | Thanks, Gegtik]

  • Taking a Look at Facebook Statistics from All Facebook

    March 24, 2009 to Data Sources by Nathan Yau

    Facebook started as a spinoff of Hot or Not in 2003. Now Facebook is the world's biggest online social network. It's certainly come a long way with millions of users around the world, the opening of the Facebook Platform, and quite possibly a personal data gold mine. All Facebook, the unofficial Facebook resource, provides news, and more importantly, data on growth, demographics, pages, and applications. A lot of it is locked behind a not so pretty widget, but interesting nevertheless. The above graphic is a look at some of that data.

    [Thanks, @mobiletek]

  • All You Can Eat at the Twitter Data Buffet

    December 24, 2008 to Data Sources by Nathan Yau

    Philip from infochimps posts the results of some heavy Twitter scraping. Data for 2.7 million users, 10 million tweets, and 58 million edges (i.e. connections between users) to satisfy your data hunger are available for download. I know a lot of you social network researchers will especially appreciate the big dataset, and best of all, Twitter gave Philip permission to release. Yes, you could use the Twitter API, but isn't it better when someone does it for you?

    Download the data here. The password is the Ramanujan taxicab number followed by the word 'kennedy' - all one word. Google is your friend, if that doesn't make sense.
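
    For anyone who would rather compute than Google: the taxicab number in question is the smallest integer that can be written as a sum of two positive cubes in two different ways. A minimal sketch of a brute-force search follows. The function name and the search limit of 20 are my own choices for illustration, not anything from the infochimps post.

```python
from collections import defaultdict


def smallest_taxicab_number(limit: int = 20) -> int:
    """Smallest n with n = a^3 + b^3 in two different ways (1 <= a <= b <= limit)."""
    sums = defaultdict(list)
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):      # b >= a so swapped pairs are not counted twice
            sums[a ** 3 + b ** 3].append((a, b))
    # Keep only sums reachable by at least two distinct pairs, then take the smallest.
    return min(n for n, pairs in sums.items() if len(pairs) >= 2)


if __name__ == "__main__":
    print(smallest_taxicab_number())
```

    A limit of 20 is enough here because any smaller candidate would have to be built from cubes well below 20 cubed.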

    [Thanks, Tim]

  • Amazon Gets In On the Public Data Arena

    December 5, 2008 to Data Sources by Nathan Yau

    It was really only a matter of time, but Amazon now hosts public data sets. Not small data sets though - more like the ones in between 1 gigabyte and 1 terabyte:

    Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. An initial list of data sets is already available, and more will be added soon.

    Previously, large data sets such as the mapping of the Human Genome and the US Census data required hours or days to locate, download, customize, and analyze. Now, anyone can access these data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. Users can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. For example, users can produce or use prebuilt server images with tools and applications to analyze the data sets. By hosting this important and useful data with cost-efficient services such as Amazon EC2, AWS hopes to provide researchers across a variety of disciplines and industries with tools to enable more innovation, more quickly.

    There's the human genome data set, US Census data from the past 3 decades, labor statistics, and some others. Still waiting on Google to follow through with their data hosting plans.

    [via TechCrunch | Thanks, David]

  • Neighborhood Boundaries with Flickr Shapefiles

    November 28, 2008 to Data Sources, Mapping by Nathan Yau

    Neighborhood Boundaries by Tom Taylor uses Flickr Shapefiles and Yahoo! Geoplanet "to show you where the world thinks its neighbors are." Yahoo! provides access to the Where on Earth (WOE) database, which attempts to describe locations as a hierarchy. For example - a town belongs to a city, a city to a county, a county to a state. The Flickr API stores shape files identified by the WOE ID. Here's the punchline. The shapefiles are built using only the latitude and longitude from geotagged photos on Flickr. There's no GIS involved here.

    Why this matters, I can't really say. I think it's mostly to show how much data is stored in geotagged Flickr photos. I'm no GIS expert though. Anyone care to comment on the significance?
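
    To make the idea of drawing a boundary from nothing but photo coordinates concrete, here is a rough sketch. It uses a plain convex hull (Andrew's monotone chain), which is a simpler stand-in and not necessarily how Flickr builds its shapefiles. The sample coordinates are made up.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive means a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return hull vertices in counterclockwise order, starting from the leftmost point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Hypothetical geotagged photo locations, not real Flickr data.
photos = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0), (1.0, 0.5)]
boundary = convex_hull(photos)
```

    The interior points drop out and only the outline remains, which is the gist of getting a neighborhood shape out of a cloud of photo locations. A real neighborhood is rarely convex, so an actual implementation would need something tighter than a convex hull.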

    [Thanks, @couch]

  • US Oil Doesn’t Come From Where You Think it Does

    November 21, 2008 to Data Sources, Mapping by Nathan Yau

    Where do you think the US imports the most oil from? Most of us would probably say somewhere in the Middle East, but Jon Udell does some number crunching and shows that's a misconception. Canada supplies us with the most oil (according to the US Department of Energy).

    This realization, however, isn't the post's punchline. It's how easy it was for Jon to figure this stuff out. With some help from Dabble DB (an app that lets you easily use a database without too much technical fuss), Jon was able to parse the data and map it by region with a few swift clicks.

    We’re really close to the point where non-specialists will be able to find data online, ask questions of it, produce answers that bear on public policy issues, and share those answers online for review and discussion. A few more turns of the crank, and we’ll be there. And not a moment too soon.

    We're gettin' there.

    [Thanks, Tim]

  • New York Times Visualization Lab – Collaboration with Many Eyes

    October 28, 2008 to Data Sources by Nathan Yau

    It was just a little over a week ago that The New York Times announced their Developer Network i.e. Campaign Finance API. Yesterday, they announced something more - the Visualization Lab. In collaboration with the Many Eyes group, the Times has rolled out a Many Eyes for data used by Times writers. You can visualize, explore, and comment on data posted at the Visualization Lab in the same way that you can at Many Eyes.

    Today, we’re taking the next step in reader involvement with the launch of The New York Times Visualization Lab, which allows readers to create compelling interactive charts, graphs, maps and other types of graphical presentations from data made available by Times editors. NYTimes.com readers can comment on the visualizations, share them with others in the form of widgets and images, and create topic hubs where people can collect visualizations and discuss specific subjects.

    A Few More Steps

    I said the API was a good step forward. The Visualization Lab is more than a step. No doubt The Times heard what I said about their API and decided to roll with it since I am the head authority on everything. Yes, I'm totally kidding, in case that didn't come across as a joke. Come on now.

    I'm looking forward to seeing how well Times readers take to this new way of interacting.

    [Thanks, William]

  • Playboy Playmate Curves and the State of the Economy

    October 24, 2008 to Data Sources, Economics by Nathan Yau

    Terry Pettijohn and Brian Jungeberg of Mercyhurst College took a very close look at the curves, um, measurements of past Playboy Playmates of the Year in relation to the state of the economy.

  • Lexical Analysis of Presidential Debates and the Windbag Index

    October 23, 2008 to Data Sources, Statistics by Nathan Yau

    Martin Krzywinski, whose previous work includes Circos, digs deep into the presidential debate transcripts with tedious manual (or was it automatic?) annotation of words (noun/verb/adjective/adverb), Wordle, and his custom metric called the Windbag index that measures speech complexity.

  • New York Times Rolls Out Campaign Finance API

    October 16, 2008 to Data Sources by Nathan Yau

    The New York Times announced the opening of their Developer Network a couple of days ago. It's their "API clearinghouse and community." It might seem kind of weird that a newspaper company has an API, but as many FlowingData readers know, the Times prides itself on innovation.

    The Campaign Finance API is currently available:

    With the Campaign Finance API, you can retrieve contribution and expenditure data based on United States Federal Election Commission filings. Campaign finance data is public and is therefore available from a variety of sources, but the developers of the Times API have distilled the data into aggregates that answer most campaign finance questions. Instead of poring over monthly filings or searching a disclosure database, you can use the Times Campaign Finance API to quickly retrieve totals for a particular candidate, see aggregates by ZIP code or state, or get details on a particular donor.

    Anyone who has tried to play with FEC data, myself included, knows that this API is cool. You could get the data directly from the FEC, but it's a bit of a painstaking process. Now you don't have to sift through a bunch of reports or an awkward user interface.
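
    For a sense of the sifting the API saves you, here is a minimal sketch of rolling raw contribution records up into per-candidate, per-state totals. It does not call the Times API. The record layout, function name, and sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (candidate, state, amount in USD). Not real FEC data.
Contribution = Tuple[str, str, float]


def total_by_candidate_and_state(
    records: Iterable[Contribution],
) -> Dict[Tuple[str, str], float]:
    """Sum contribution amounts for each (candidate, state) pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for candidate, state, amount in records:
        totals[(candidate, state)] += amount
    return dict(totals)


sample = [
    ("Candidate A", "CA", 250.0),
    ("Candidate A", "CA", 100.0),
    ("Candidate A", "NY", 50.0),
    ("Candidate B", "CA", 75.0),
]
totals = total_by_candidate_and_state(sample)
```

    With real filings you would also have to clean names, dedupe amendments, and so on, which is exactly the part that's nice to have done for you.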

    The Movie Review API is next in line. After that, who knows, but it's a good step forward for The Times.

    [via serial consign]

  • OneGeology Wants to Be Geological Equivalent of Google Maps

    September 11, 2008 to Data Sources, Mapping by Nathan Yau

    There's lots of free geographical data about what's going on at the surface of our planet. It's a different story for what's going on underneath though. OneGeology aims to be the solution to that problem.

    OneGeology is an international initiative of the geological surveys of the world and a flagship project of the 'International Year of Planet Earth'. Its aim is to create dynamic geological map data of the world available via the web. This will create a focus for accessing geological information for everyone.

    I've never been one for geology, but if the data (and interactive maps) were easily accessible, there certainly would be a spike in interest.

    [via msnbc | Thanks, Samantha]

Unless otherwise noted, graphics and words by me are licensed under Creative Commons BY-NC. Contact original authors for everything else.