
What? I don't see anything wrong with it.

This graphic on the history and future of information has been making the rounds. Several people sent it to me a while back, but it didn't seem quite right, so I didn't post it. However, this post from PZ Myers compelled me to take another look. Myers says:
Some days, I think other people must be aliens. Or I must be. For instance, there's a lot of noise right now about this article analyzing the future of information and media that, if you read the comments, you will discover that people are praising to an astonishing degree. I looked at it and saw this graph [above graphic]. And my bullshit detector went insane. It's supposed to be saying something about where people are and will be getting their information, but there's no information about where this information came from, and it's meaningless!
Yikes. Take out the boxing gloves. Looks like we've got another clash between the technical crowd and the design-ish, mainstream crowd. The comments from both sides are also pretty interesting: one group says how visually appealing and informative the graphic is, while the other criticizes it for failing in every way.
Clearly the graphic is not based on any real data or metric. It goes off history and probably a lot of Wikipedia entries, and then shapes and sizes go off feeling. So as an analytical graph, it doesn't work. But what about as an opinion in graph form? Does it work then? What do you think? Is this graphic a crime against all that is good in visualization or does it work for what it was trying to do?
[Thanks, Patrick]
If there's anything good that has come out of America's financial crisis, it's the interesting and high-quality infographics. This isn't one of them. Below is an ill-conceived bubble chart from BillShrink that "shows" average U.S. consumer spending. Notice anything wrong with it?

Bar versus bubble debate aside, there is a ton of room for improvement as well as a huge need for some fact-checking and common sense. For a blog on a site for personal finance, the graphic is, well, not something to be proud of. FlowingData readers know that I like to stay away from heavy-handed critique on what works and what doesn't (I leave that to you guys), but this BillShrink graphic is just so clearly confusing that it's worth pointing out what doesn't work so we can learn from others' mistakes. Can you find the flaws?
[Thanks, Jess]

I know next to nothing about the economy, stocks, and investments, but I do know a little bit about charts and graphs. The above area circles were prepared by someone at JP Morgan. I don't know, you might have heard of 'em. The circles are based on data from Bloomberg and meant to show the change in market value from 2007 to 2009. The problem here is that the creator sized circles by diameter instead of area, so the difference looks ginormous. I mean, the value change is significant but not that big.
Here's the revised version from a Big Picture reader, Rene Corda:
Now look at the original version again. Big difference, right?
Circles are 2-dimensional shapes. You can't use them and expect people to compare two circles by diameter, a 1-D metric. Sorry, JP Morgan person. You fail.
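To make the diameter-versus-area point concrete, here's a minimal sketch. The numbers are made up for illustration (they are not the Bloomberg figures), and the function names are mine. It compares the radius you get when you scale by area with the radius you get when you scale by diameter.

```python
import math

def radius_by_area(value, max_value, max_radius):
    """Radius that makes the circle's AREA proportional to value."""
    return max_radius * math.sqrt(value / max_value)

def radius_by_diameter(value, max_value, max_radius):
    """Radius that makes the circle's DIAMETER proportional to value (the mistake)."""
    return max_radius * (value / max_value)

# Hypothetical values: something that fell to a quarter of its earlier size.
before, after = 100.0, 25.0
max_radius = 50.0  # radius used to draw the larger circle, in arbitrary units

r_area = radius_by_area(after, before, max_radius)
r_diam = radius_by_diameter(after, before, max_radius)

# Fraction of the big circle's area that the small circle covers on screen.
area_share_correct = (r_area / max_radius) ** 2
area_share_wrong = (r_diam / max_radius) ** 2

print("scaled by area:    ", r_area, area_share_correct)
print("scaled by diameter:", r_diam, area_share_wrong)
```

With diameter scaling, the smaller circle covers the square of the true ratio, so a drop to one quarter is drawn as a drop to one sixteenth.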
Check out the Big Picture for some more graphs of the same data.
[via Cringely | Thanks, Barry]
This video shows statistics centered around atheism, claiming that atheism is correlated with a healthy society. I don't want to turn this into a religious debate, but I really don't like these types of videos, slide shows, etc. It's not the ideas that bother me; it's that some people think it's a great idea to rattle off a bunch of numbers to "prove" a point. Never mind the biases, invalid studies, poor analysis, cruddy data, and "results" taken out of context.
What do you think? Do you buy this stuff?
Peter Donnelly talks about the misuse of statistics in his TED talk from a couple of years back. The first two-thirds of the talk is an introduction to probability and its role in genetics, which, admittedly, didn't hold much of my interest. The last third, however, gets a lot more interesting.
Donnelly talks about a British woman who was wrongly convicted in large part because of a misuse of statistics. A so-called expert cited how improbable it would be for two children to die of sudden infant death syndrome, but it turns out that "expert" was making incorrect assumptions about the data. This doesn't surprise me since it happens all the time.
People misuse statistics every day (intentionally and unintentionally), and oftentimes it doesn't hurt much (which doesn't make it any better), but in this case improper use directly affected someone's life in a very big way. One of the most common assumptions I see is that every observation is independent, which often is not the case. As a simple example, if it's raining today, does that change the probability that it will rain tomorrow? What if it didn't rain today?
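Here's a small sketch of why the independence assumption matters, using the rain example. Both probabilities are invented for illustration. The only real content is the difference between multiplying marginal probabilities and using a conditional probability.

```python
# Hypothetical numbers, chosen only for illustration.
p_rain = 0.3             # chance of rain on any given day
p_rain_given_rain = 0.6  # chance of rain tomorrow, given that it rained today

# If you assume the two days are independent, you multiply the marginals.
p_two_days_independent = p_rain * p_rain

# If rainy days cluster, use P(A and B) = P(A) * P(B | A) instead.
p_two_days_dependent = p_rain * p_rain_given_rain

# How far off the independence assumption is, as a ratio.
ratio = p_two_days_dependent / p_two_days_independent

print("assuming independence:", p_two_days_independent)
print("allowing dependence:  ", p_two_days_dependent)
print("ratio:", ratio)
```

With these made-up numbers, assuming independence makes two rainy days in a row look half as likely as they really are. The same kind of error gets much bigger when the events are rare.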
In other words, the next time you're thinking of making up or tweaking data, don't; and the next time you need to analyze some data but aren't sure how, ask for some help. Statisticians are nice and oh so awesome.
Here's Donnelly's talk:

On Last.fm, someone took snapshots of some Linkin Park songs, compared them, and concluded that all Linkin Park songs look the same. I guess at a glance, the songs might appear the same because of the dark chunk toward the middle left, but it kind of stops there. Sure, there's some loud to soft and soft to loud alternation, but who likes songs that are loud (or soft) throughout?
The beginning of the post:
Each image above shows the audio level in (roughly) the first 90 seconds of a Linkin Park song. The tempo has been adjusted for a few tracks for better visual alignment.
Wait a minute. The tempo was adjusted for better visual alignment? If you're adjusting the tempo, then really, all songs can be made to look the same. On top of that, we don't know the x-axis or y-axis units. Finally, there's a lot more to a song than dynamics -- such as key, tempo, rhythm, and lyrics.
I saw this map of the average snow levels in Buffalo. I think I just glanced at it and that was about it. When you first look at the map, what do you make of the colors? When I see green for snow levels, I think no snow. Am I crazy? What do you think?
So the image was kind of in my head all this summer while I was in NYC. When I told people that I was going back to Buffalo after my internship, they always gave this look that said, "Ha, have fun during the winter," and then they would actually say it and then go into how they measure the snow level by comparing it against a giant pole.
Smart guy that I am, I thought to myself, "Can't be that bad. The map that I saw on the University of Buffalo page showed the winter wasn't really that bad."
I went back to the map just now, preparing to compare Buffalo weather data to New York City data. I was going to prove once and for all that the Buffalo winters weren't really that bad, and that people were just casting them in a bad light. It was going to be awesome.
Then I looked at the legend.

All of the colors represent 10-inch increments except for the extremes. Green represents average snowfall less than 80 inches. What? Darn. A quick look at the data shows that yes, Buffalo does get quite a bit more snow than NYC.
The lesson of the day: pick your colors carefully, because the wrong colors can suggest something the data doesn't actually say.

After parsing Weather Underground pages to grab temperature data, it's time to look at the data. Can't download all that data and not do anything with it!
First off, in the initial pass of my parsing script, I accidentally cut the month range short, so I didn't get any data for December from 1980 to 2005. It should be noted that these plots don't show this missing data. Um, there are no axes or labels either. Sorry, I got a little lazy, but that's not the point now anyway.
Notice anything weird about the above plot? There's some unusually smooth data in the middle. Here's a zoom in:

If we look at the data between 1994 and 1997, there's oddly a lot of smoothness... hmm... HMMM.
It looks like during that time, there was some interpolation going on. I mean, if that's all you got, that's all you got, but I wish WU would at least make note of it or provide some annotation.
Anyways, just another example of data posing as something else. In my opinion, all data sucks until proven worthwhile.
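You don't have to rely on eyeballing a plot to find stretches like this. Here's a rough sketch that flags runs where the values sit exactly on a straight line, which is what simple linear interpolation produces. The function name, the sample readings, and the thresholds are all my own assumptions. Real temperature data is rounded, so you'd need a looser tolerance, and other kinds of smoothing wouldn't be caught at all.

```python
def find_linear_runs(values, min_length=5, tol=1e-9):
    """Return (start, end) index pairs, inclusive, where values lie on a straight line.

    A point is "on the line" when the second difference around it is ~0,
    i.e. values[i+1] - 2*values[i] + values[i-1] == 0.
    """
    runs = []
    start = None  # index of the first point in the current straight stretch
    for i in range(1, len(values) - 1):
        second_diff = values[i + 1] - 2 * values[i] + values[i - 1]
        if abs(second_diff) <= tol:
            if start is None:
                start = i - 1
        elif start is not None:
            # Points start..i are collinear; the line breaks after point i.
            if i - start + 1 >= min_length:
                runs.append((start, i))
            start = None
    # A straight stretch that runs to the end of the series.
    if start is not None and len(values) - start >= min_length:
        runs.append((start, len(values) - 1))
    return runs

# Made-up daily readings: noisy, then suspiciously straight, then noisy again.
temps = [50, 53, 49, 55, 60, 62, 64, 66, 68, 70, 63, 71]
suspicious = find_linear_runs(temps)
print(suspicious)
```

In this toy series, the readings at indexes 4 through 9 climb by exactly 2 each step, so that stretch gets flagged. Measured data almost never does that over a long period.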
It's easy to see how Statistics got this bad rap because it's so easy to lie with data, charts, and graphs. Sometimes it's on purpose -- someone might try to present "good" results that actually suck. Sometimes it's accidental -- someone might have misread or not read the documentation that came with the data. In the case of Swivel's most recently featured graph, it was the latter. A case of mistaken identity, so to speak.
The data about doping tests in sports came from here. Now the graph on Swivel would have you believe that the data represent the number of doping cases found in each sport; however, according to the USADA report, the data is actually the number of tests the association conducted inside and outside competition during the first quarter of this year. The report contains no data on the USADA's findings.
What can we learn from this? It's great to visualize data, but you have to be careful. Read the documentation. Find out what the data is about, because without context, the visualization or any findings are practically useless. Statistics isn't for lying. In fact, it's the exact opposite. Statistics came about and exists today to reveal the truth.