• # Explore World Data with Factbook eXplorer from OECD

The Organization for Economic Co-operation and Development (OECD) makes a lot of world indicators available (e.g. world population and birth rate). Many of them go unnoticed, because most people just see a bunch of numbers. However, the Factbook eXplorer from the OECD, built in collaboration with the National Center for Visual Analytics, is a visualization tool that helps you see and explore the data.

Those who have seen Hans Rosling's Gapminder presentations - and I imagine most of us have - will recognize the style with a play button and a motion graph in sync with parallel coordinates and a map. Choose an indicator, or several of them, press play, and watch the visualization move through time.

Also, if you've got your own data, you can load that too, which is certainly a nice touch.

• # The Devil is in the Digits?

June 22, 2009  |  Statistics

Undoubtedly you've been seeing a lot of headlines about the stuff going on in Iran. If you haven't, you must be living under a rock.

One of the huge issues right now is whether or not fraud was involved in the election of Mahmoud Ahmadinejad.

Wait a minute. Voting? Results? Numbers?

Oh, we have to look at the data for this one. Bernd Beber and Alexandra Scacco, Ph.D. candidates in political science at Columbia University, discuss in their Op-ed for the Washington Post:

The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran's provincial results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number 5. Two such departures from the average -- a spike of 17 percent or more in one digit and a drop to 4 percent or less in another -- are extremely unlikely. Fewer than four in a hundred non-fraudulent elections would produce such numbers.

Why does this matter? Well, humans are bad at making up sequences of numbers. Made-up number sequences look different from real random sequences (e.g. numbers from McCain/Obama). Beber and Scacco go on to describe the details of why the data look fishy. Those of us who've read Freakonomics will recognize the discussion.

The result?

The probability that a fair election would produce both too few non-adjacent digits and the suspicious deviations in last-digit frequencies described earlier is less than .005. In other words, a bet that the numbers are clean is a one in two-hundred long shot.
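The last-digit part of that argument is easy to play with yourself. Below is a rough sketch, not Beber and Scacco's actual method: it tallies last digits and uses a simple simulation to estimate how often purely random last digits would show one digit at 17 percent or more and another at 4 percent or less. The function names and the number of vote counts (100) are my own placeholders, not figures from the op-ed.

```python
import random
from collections import Counter


def last_digit_shares(counts):
    """Return the share of numbers ending in each digit 0-9."""
    if not counts:
        raise ValueError("need at least one vote count")
    tally = Counter(abs(int(c)) % 10 for c in counts)
    total = len(counts)
    return [tally.get(d, 0) / total for d in range(10)]


def spike_and_drop_probability(n_counts, high=0.17, low=0.04,
                               trials=5000, seed=0):
    """Estimate how often uniformly random last digits produce one digit
    with share >= high and another with share <= low, by chance alone."""
    rng = random.Random(seed)  # seeded so repeated runs agree
    hits = 0
    for _ in range(trials):
        digits = [rng.randrange(10) for _ in range(n_counts)]
        shares = last_digit_shares(digits)
        if max(shares) >= high and min(shares) <= low:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # 100 vote counts is an assumed, hypothetical sample size.
    print(spike_and_drop_probability(n_counts=100))
```

Swap in a real list of vote counts for `last_digit_shares` and the actual number of counts for `n_counts` to see how the estimate moves. The smaller the sample, the more often chance alone produces lopsided digits.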

Now what?

• # The Current State of Social Data

June 16, 2009  |  Social Data Analysis

Check out my guest post on The Guardian's Data Blog on the current state of social data applications. There are what seems like a ton of them but none of them have really taken off (yet).

While the post is more of an overview of what's available, I'd like to start a little discussion here on why these data apps haven't gained more popularity. There always seems to be a lot of buzz around launch time, but then it fizzles.

Are people just not interested in interacting with data or do we need to approach the whole social data puzzle from a different angle?

• # Poll: Will Data Always Be Just For Geeks?

June 10, 2009  |  Polls, Statistics

I threw out a random thought a couple of months back. I tweeted, "Remember when computers used to be just for geeks? Now they're ubiquitous. We can do the same for data."

To be honest, I was just babbling, but I've been giving it some thought, and you know, now I'm not so sure. There are so many applications popping up every day that promise to socialize data. To make it the YouTube of data. None of them have really taken off though.

Is it because the visualization tools aren't advanced enough to make data accessible to the common user or is data simply meant to stay in the hands of experts?

So this begs the question: will data always be just for geeks?

If yes, what do you think makes data so distant to non-experts? If no, what will it take for non-experts to start interacting with data? Or are they already?

• # Rise of the Data Scientist

June 4, 2009  |  Design, Statistics

As we've all read by now, Google's chief economist Hal Varian commented in January that the next sexy job in the next 10 years would be statisticians. Obviously, I whole-heartedly agree. Heck, I'd go a step further and say they're sexy now - mentally and physically.

However, if you went on to read the rest of Varian's interview, you'd know that by statisticians, he actually meant it as a general title for someone who is able to extract information from large datasets and then present something of use to non-data experts.

• # What’s Wrong With this Graphic on the Future of Information?

June 1, 2009  |  Discussion, Mistaken Data

This graphic on the history and future of information has been making the rounds. Several people sent it to me a while back, but it didn't seem quite right, so I didn't post it; however, this post from PZ Myers compelled me to take another look. Myers says:

Some days, I think other people must be aliens. Or I must be. For instance, there's a lot of noise right now about this article analyzing the future of information and media that, if you read the comments, you will discover that people are praising to an astonishing degree. I looked at it and saw this graph [above graphic]. And my bullshit detector went insane. It's supposed to be saying something about where people are and will be getting their information, but there's no information about where this information came from, and it's meaningless!

Yikes. Take out the boxing gloves. Looks like we've got another clash between the technical and the design-ish and mainstream crowds. The comments from both sides are also pretty interesting, with one group praising how visually appealing and informative the graphic is and the other criticizing it for failing in every way.

Clearly the graphic is not based on any real data or metric. It goes off history and probably a lot of Wikipedia entries, and then shapes and sizes go off feeling. So as an analytical graph, it doesn't work. But what about as an opinion in graph form? Does it work then? What do you think? Is this graphic a crime against all that is good in visualization or does it work for what it was trying to do?

• # Data.gov is Live – Get Your Data While it’s Hot

May 21, 2009  |  Data Sources

Big news. Data.gov is now live. Government data is at your fingertips.

The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. Although the initial launch of Data.gov provides a limited portion of the rich variety of Federal datasets presently available, we invite you to actively participate in shaping the future of Data.gov by suggesting additional datasets and site enhancements to provide seamless access and use of your Federal data. Visit today with us, but come back often. With your help, Data.gov will continue to grow and change in the weeks, months, and years ahead.

I was actually expecting an API of some sort, but it's a searchable catalog that makes it easier to find the datasets scattered across all the U.S. agency sites. I still need to explore more to figure out what exactly is there, but this is big news for data fans. What do you think of the new site? Discuss in the comments below.

• # 37 Data-ish Blogs You Should Know About

May 6, 2009  |  Statistics, Visualization

You might not know it, but there are actually a ton of data and visualization blogs out there. I'm a bit of a feed addict subscribing to just about anything with a chart or a mention of statistics on it (and naturally have to do some feed-cleaning every now and then). In a follow up to my short list last year, here are the data-ish blogs, some old and some new, that continue to post interesting stuff.

April 28, 2009  |  Data Sources, Online Applications

Google announced today that they have made a small subset of public datasets searchable. Search for unemployment rate and you'll see a thumbnail at the top of the results. Click on it, and you get a very Google-y chart like the one above, so instead of searching for unemployment rates for multiple years, you can get it all at once.

• # Tracking Swine Flu Worldwide – Where and How, Plus Data

April 28, 2009  |  Data Sources, Infographics

Just about everywhere you go there's something in the news about swine flu, and so naturally, when I first heard about it, I waited for The New York Times to put up a graphic. That was the first one. Here's the second (above).

• # Narrow-minded Data Visualization

April 22, 2009  |  Statistics

I was going to let this one slide, but people kept commenting, essentially trashing FlowingData, and that's just not cool. As you might recall, I put in my picks for the best data visualization projects of 2008 a while back. They were the fine work of statisticians, designers, and computer scientists, all of them beautiful, and all of them built to tell an interesting story with the dataset at hand. None of them were traditional graphs or charts.

• # Millions of Money-in-Politics Data Records Now Available

April 15, 2009  |  Data Sources

The Center for Responsive Politics (CRP), a research group well-known for its tracking of monetary influence on United States politics, announced some great news. Their expansive dataset is now available to the public via OpenSecrets.

Politicians, prepare yourselves. Lobbyists, look out. Today the nonpartisan Center for Responsive Politics is putting 200 million data records from the watchdog group's archive directly into the hands of citizens, activists, journalists and anyone else interested in following the money in U.S. politics.

Yeah, 200 million data records. Correction. 200 million cleaned, formatted, and documented data records. Awesome. They've got data on campaign finances, lobbying, personal finances, and 527 organizations, which can be downloaded as CSV files or via the RESTful API. Let the mashups begin.

March 24, 2009  |  Data Sources

Facebook started as a spinoff of Hot or Not in 2003. Now Facebook is the world's biggest online social network. It's certainly come a long way with millions of users around the world, the opening of the Facebook Platform, and quite possibly a personal data gold mine. All Facebook, the unofficial Facebook resource, provides news, and more importantly, data on growth, demographics, pages, and applications. A lot of it is locked behind a not so pretty widget, but interesting nevertheless. The above graphic is a look at some of that data.

• # Data Visualization is Only Part of the Answer to Big Data

March 20, 2009  |  Design, Exploratory Data Analysis

How can we now cope with a large amount of data and still do a thorough job of analysis so that we don't miss the Nobel Prize?

— Bill Cleveland, Getting Past the Pie Chart, SEED Magazine, 2.18.2009

For the past year, I've been slowly drifting away from my statistical roots - more interested in design and aesthetics than in whether or not a particular graphic works or the more numeric tools at my disposal. I've always had more fun experimenting on a bunch of different things rather than really knuckling down on a particular problem. This works for a lot of things - like online musings - but you miss a lot of the important technical points in the process, so I've been (slowly) working my way back to the analytical side of the river.

• # What’s Wrong With this Financial Bubble Chart?

February 26, 2009  |  Mistaken Data

If there's anything good that has come out of America's financial crisis, it's the interesting and high-quality infographics. This isn't one of them. Below is an ill-conceived bubble chart from BillShrink that "shows" average U.S. consumer spending. Notice anything wrong with it?

Bar versus bubble debate aside, there is a ton of room for improvement as well as a huge need for some fact-checking and common sense. For a blog on a site for personal finance, the graphic is, well, not something to be proud of. FlowingData readers know that I like to stay away from heavy-handed critique on what works and what doesn't (I leave that to you guys), but this BillShrink graphic is just so clearly confusing that it's worth pointing out what doesn't work so we can learn from others' mistakes. Can you find the flaws?

• # Google’s Chief Economist Hal Varian on Statistics and Data

February 25, 2009  |  Quotes, Statistics

I keep saying the sexy job in the next ten years will be statisticians. People think I'm joking, but who would've guessed that computer engineers would've been the sexy job of the 1990s?

Hal Varian, The McKinsey Quarterly, January 2009

Varian then goes on to say:

The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that's going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.

I think statisticians are part of it, but it's just a part. You also want to be able to visualize the data, communicate the data, and utilize it effectively. But I do think those skills - of being able to access, understand, and communicate the insights you get from data analysis - are going to be extremely important. Managers need to be able to access and understand the data themselves.

Wait a minute. Is this a pitch for FlowingData? I think so :). Check out the full article for more (or listen to the podcast). It's an interesting read.

• # Obama Launches Recovery.gov – Your \$787 billion at Work

February 17, 2009  |  Statistics

The Barack Obama administration is clearly making an effort to get information out to the public. We saw the plan to distribute billions of dollars to help our economy. Obama has now launched Recovery.gov as a way for you to keep track of and understand where the \$787 billion from the American Recovery and Reinvestment Act is being spent.

• # Fail: Area Circles on Wall Street

February 16, 2009  |  Mistaken Data

I know next to nothing about the economy, stocks, and investments, but I do know a little bit about charts and graphs. The above area circles were prepared by someone at JP Morgan. I don't know, you might have heard of 'em. The circles are based on data from Bloomberg and meant to show the change in market value from 2007 to 2009. The problem here is that the creator sized circles by diameter instead of area, so the difference looks ginormous. I mean, the value change is significant but not that big.
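To see why sizing by diameter exaggerates things, here's a minimal sketch. The function names and the example values (a drop from 100 to 25) are made up for illustration and are not the numbers from the JP Morgan chart. The eye compares areas, so the radius should scale with the square root of the value.

```python
import math


def radius_by_area(value, ref_value, ref_radius):
    """Radius that makes the circle's area proportional to value."""
    return ref_radius * math.sqrt(value / ref_value)


def radius_by_diameter(value, ref_value, ref_radius):
    """The mistaken approach: radius (and diameter) proportional to value."""
    return ref_radius * value / ref_value


def apparent_ratio(radius_small, radius_big):
    """Ratio of the drawn areas, which is what readers actually compare."""
    return (radius_small / radius_big) ** 2


if __name__ == "__main__":
    before, after, ref_radius = 100, 25, 10  # hypothetical values
    good = radius_by_area(after, before, ref_radius)
    bad = radius_by_diameter(after, before, ref_radius)
    print("true ratio:", after / before)
    print("area-scaled circle shows:", apparent_ratio(good, ref_radius))
    print("diameter-scaled circle shows:", apparent_ratio(bad, ref_radius))
```

With diameter scaling, the drawn area shrinks with the square of the value ratio, so a drop to a quarter of the original value is drawn as a circle with one sixteenth of the area.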

February 12, 2009  |  Data Sharing, Discussion, Online Applications

Google recently released Google Latitude, which is an online application that lets you share your location with online friends.

Of course when any application shares where you are at any given time, people start to feel like Big Brother is looming in the background ready to sneak up on us from behind a giant bush. Some call it a real danger, but is it really? I put this question out to all of you:

## Is Google Latitude a danger to anyone who uses it?

My take on things is that people are already doing it anyways, so why not make it easier for those who are interested? Sure, if some stalker got a hold of your location, that could be bad, but that's true for a lot of data... credit card statements, cell phone logs, Twitter... As long as the proper security measures are put in place, I don't see what all the fuss is about.

• # Sensors in Footballs – Was the Pass Good?

December 30, 2008  |  Statistics

Graduate student researchers are pretty much putting sensors in everything these days. There's always more data to collect and more information to gather. Computer engineering students from Carnegie Mellon University experiment with sensors in footballs and gloves to measure grip, trajectory, speed and position.

"You'd never want to replace the human referees because they make these calls based on years of experience, and no technology can replace that," she said. "But in addition to the instant replay, if you had a supplementary system that said this is exactly where the ball landed and where the player stopped with it, you could make these kinds of calls accurately."

So far, she and her squad of undergraduate and graduate students have focused on two things: gloves with touch sensors that can transmit that information wirelessly to a computer, and a football equipped with a global positioning receiver and accelerometer that can track the location, speed and trajectory of the ball.

Eventually, the same kind of sensors used in the gloves could be adapted to shoes, to measure stride and running patterns, or even shoulder pads, to calculate blocking positions and force.

Yes, it's the end of the post-game show as we know it.

