PBS Off Book's recent episode is on "the art of data visualization." It feels like a TED talk — kind of fluffy and warm — with several names and visualization examples that you'll recognize. No clue who the first guy is though.
We've seen plenty of augmented reality where you put on some digitally-enabled glasses or point your camera phone at something and visuals are overlaid on reality. The augmentation is typically a layer on top.
Eidos is a student project that tries taking this in a different direction. One piece applies an effect similar to long-exposure photography, and the other sends audio to your inner ear to focus on a subject and drown out ambient noise. See the devices in action in the video below.
About 35,000 meteorites have been recorded since 2500 BC, and a little over 1,000 of them were seen while they fell, based on data from the Nomenclature Committee of the Meteoritical Society. Carlo Zapponi, a data visualization designer, visualized the latter in Bolides.
We saw a mapped version of this data a while back, but Bolides takes a time-based approach. A bar chart shows the number and volume of meteorites that have been seen over time, and on the initial load, you get to watch the meteorites fall, one bright orange fireball at a time.
In a collaboration with USGS, NASA, and TIME, Google released a quarter century of satellite imagery that shows how the world has changed over time.
The images were collected as part of an ongoing joint mission between the USGS and NASA called Landsat. Their satellites have been observing earth from space since the 1970s—with all of the images sent back to Earth and archived on USGS tape drives that look something like this example (courtesy of the USGS).
We started working with the USGS in 2009 to make this historic archive of earth imagery available online. Using Google Earth Engine technology, we sifted through 2,068,467 images—a total of 909 terabytes of data—to find the highest-quality pixels (e.g., those without clouds), for every year since 1984 and for every spot on Earth. We then compiled these into enormous planetary images, 1.78 terapixels each, one for each year.
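The "highest-quality pixels" idea can be sketched in a few lines of Python. This is only an illustration, not how Earth Engine works: the per-pixel cloud score, the nested-list image format, and the function name `best_pixel_composite` are all assumptions.

```python
def best_pixel_composite(images):
    """Build one composite from several images of the same area.

    images: list of (pixels, cloud_scores) pairs, where both are 2D lists
    of the same shape. A lower cloud score means a clearer pixel
    (a hypothetical quality measure, assumed for this sketch).
    """
    rows = len(images[0][0])
    cols = len(images[0][0][0])
    composite = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Pick the image whose pixel at (r, c) is the least cloudy.
            pixels, _ = min(images, key=lambda image: image[1][r][c])
            row.append(pixels[r][c])
        composite.append(row)
    return composite

# Two tiny made-up "images" of one row and two columns each.
image_a = ([[10, 20]], [[0.9, 0.1]])
image_b = ([[30, 40]], [[0.2, 0.8]])
composite = best_pixel_composite([image_a, image_b])
```

Here the left pixel comes from `image_b` and the right pixel from `image_a`, because those are the clearer ones at each spot. Repeat that for every spot and every year and you get one composite per year.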
Be sure to check out the Timelapse feature on Time.
After seeing a Reddit post on the convergence of Miss Korea faces, supposedly due to high rates of plastic surgery, graduate student Jia-Bin Huang analyzed the faces of 20 contestants. Below is a short video of each face slowly transitioning to the other.
From the video and pictures it's pretty clear that the photos look similar, but Huang took it a step further with a handful of computer vision techniques to quantify the likeness between faces. And again, the analysis shows similarity between the photos, so the gut reaction is that the contestants are nearly identical.
However, you have to assume that the pictures are accurate representations of the contestants, which doesn't seem to pan out at all. It's amazing what some makeup, hair, and Photoshop can do.
You gotta consider your data source before you make assumptions about what that data represents.
Based on data drawn from media reports and state salary databases, the ranks of the highest-paid active public employees include 27 football coaches, 13 basketball coaches, one hockey coach, and 10 dorks who aren't even in charge of a team.
The quality of television shows follows all kinds of patterns. Some shows stink in the beginning and slowly gain steam, whereas others are great at first and then lose momentum on the way to eventual cancellation. Using data from the Global Episode Opinion Survey, Andrew Clark visualized ratings over time for many popular shows in an interactive.
Watch Arrested Development enough and you start to realize there are a lot of recurring jokes in various episodes and seasons. In an interactive by Beutler Ink and Red Edge, Recurring Developments shows which episodes each joke, such as the awkwardness between George Michael and Maeby, appears in. And like the visualization this is based on, you can also go the other way around and look at the recurring themes in each episode.
The interaction is fairly straightforward. Jokes are on the left and a listing of episodes is on the right. Click a joke and orange lines extend to corresponding episodes. Click an episode and lines extend to corresponding jokes.
Excuse me while I go on an Arrested Development binge on Netflix.
On Wikipedia, there are constant edits by people around the world. You can poke your head in on the live recent edits via the IRC feed from Wikimedia. Stephen LaPorte and Mahmoud Hashemi are scraping the anonymous edits, which include IP addresses (which can be easily mapped to location), and naturally, you can see them pop up on a map.
In a follow-up to their map of racist tweets towards Barack Obama, the folks at Floating Sheep took a more rigorous route to get around the challenges of sentiment analysis. Over 150,000 geotagged tweets targeting race, sexuality, and disability were manually classified and mapped.
All together, the students determined over 150,000 geotagged tweets with a hateful slur to be negative. Hateful tweets were aggregated to the county level and then normalized by the total number of tweets in each county. This then shows a comparison of places with disproportionately high amounts of a particular hate word relative to all tweeting activity. For example, Orange County, California has the highest absolute number of tweets mentioning many of the slurs, but because of its significant overall Twitter activity, such hateful tweets are less prominent and therefore do not appear as prominently on our map. So when viewing the map at a broad scale, it’s best not to be covered with the blue smog of hate, as even the lower end of the scale includes the presence of hateful tweeting activity.
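The normalization step is the key idea here: raw counts favor big counties, rates don't. A minimal sketch in Python, with made-up county names and counts (not Floating Sheep's data) and a per-1,000 scale chosen just for readability:

```python
def hate_rate(hateful_counts, total_counts):
    """Return hateful tweets per 1,000 tweets for each county."""
    rates = {}
    for county, hateful in hateful_counts.items():
        total = total_counts.get(county, 0)
        if total == 0:
            continue  # no recorded tweets, so no meaningful rate
        rates[county] = 1000 * hateful / total
    return rates

# Hypothetical counts for illustration only.
hateful = {"Big County": 500, "Small County": 40}
totals = {"Big County": 1_000_000, "Small County": 10_000}
rates = hate_rate(hateful, totals)
```

Big County has far more hateful tweets in absolute terms, but Small County ends up with the higher rate, which is the Orange County effect described above.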
Hard to believe this stuff is still around. It looks like I might want to stay clear of some parts of Virginia. (The aggregation at the national level seems a bit aggressive. When you zoom in on the map, the polarity between the east and west doesn't seem so strong.)
Update: Be sure to read the FAQ before making snap judgements.
It's hard to believe it's been over a month since Data Points: Visualization That Means Something hit the shelves. Thanks to all of you for the tweets, emails, and pictures of the book in the wild. Every one makes me smile, and I'm glad that people are finding it helpful.
In case you're still deciding, here's a sample chapter from the book. It's Chapter 3 on representing data and should give you a good idea of what to expect. And of course it's way sexier in print.
This is my first time hearing about this, probably because it only happens every 17 years. After 17 years of development in the ground (getting nourishment from tree roots), cicadas are starting to swarm on the East Coast. Hundreds of millions of them mate, make a lot of noise, and then die. Adam Becker and Peter Aldhous for New Scientist mapped data maintained by John Cooley and Chris Simon from the University of Connecticut to show the cycles of the cicada.
There are 17-year broods, which is what's happening now, and there are 13-year broods, with the next one expected next year in Louisiana.
Click the play button on the top right to see the various broods appear over time, and be sure to turn on the audio (in the left panel) for added flavor. [Thanks, Peter]
Terrence Fradet of Fathom Information Design ponders whether metro maps suffer or benefit by leaving out geography. Geographic accuracy is good, but sometimes it can confuse your audience.
Just how important is it that metro maps represent geography? This piece came from an interest in how metro maps over the past century have tiptoed between geographic and topological representations—topological meaning to forgo all spatial integrity and instead represent the connectivity of a specific environment.
When you focus on all the small events and decisions that happen throughout a single day, those 24 hours can seem like an eternity. Graphic designer Luke Twyman turned that around in Here is Today. It's a straightforward interactive that places one day in the context of all days ever.
You start at today, and as you move forward, the days before this one appear, until today is reduced to a one-pixel sliver on the screen and doesn't seem like much at all.
On R is My Friend, as a way to procrastinate on his own dissertation, beckmw took a look at dissertation length via the digital archives at the University of Minnesota.
I've selected the top fifty majors with the highest number of dissertations and created boxplots to show relative distributions. Not many differences are observed among the majors, although some exceptions are apparent. Economics, mathematics, and biostatistics had the lowest median page lengths, whereas anthropology, history, and political science had the highest median page lengths. This distinction makes sense given the nature of the disciplines.
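The median comparison is easy to reproduce. Here's a minimal sketch in Python, with made-up page counts standing in for the University of Minnesota archive and a hypothetical helper named `median_pages_by_major`:

```python
from collections import defaultdict
from statistics import median

def median_pages_by_major(records):
    """Group (major, page_count) records and rank majors by median length."""
    pages = defaultdict(list)
    for major, n_pages in records:
        pages[major].append(n_pages)
    medians = {major: median(values) for major, values in pages.items()}
    # Shortest median first, longest last.
    return sorted(medians.items(), key=lambda item: item[1])

# Hypothetical records for illustration only.
records = [
    ("Economics", 110), ("Economics", 130), ("Economics", 90),
    ("History", 280), ("History", 320), ("History", 300),
]
ranked = median_pages_by_major(records)
```

The boxplots show the same medians plus the spread around them, which is the part that matters when someone asks how long a dissertation "should" be.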
I was on the long end of the statistics distribution, around 180 pages. Probably because I had a lot of pictures.
As I was working on my dissertation, people often asked me how many pages I had written and how many pages I had left to write. I never had a good answer, because there's no page limit or required page count. It's just whenever you (and your adviser) feel like there's enough to get a point across. Sometimes that takes 50 pages. Other times it takes 200.
So for those who get that dreaded page-count question, you can wave your finger at this chart and tell people you're somewhere in the distribution.
I don't know about you, but when I go to YouTube, I check my subscriptions and then look at what videos are currently popular. Because you know, it's important to stay up to date on the most current news about kittens, people getting caught doing weird things, and movie trailers. The YouTube Trends Map is another way to see what's popular, but from a geographic and demographic point of view.
Never mind the horrible traffic in Los Angeles, where it takes several hours to get somewhere when it should only take thirty minutes. The road quality isn't so great either. Using data from the Los Angeles Bureau of Street Services, which scores street segments on a 100-point graded scale, Ben Poston and Ben Welsh for The Los Angeles Times mapped road quality in the city.
Red represents segments with an F grade, which means resurfacing or reconstruction is required, and green represents segments with an A grade, which means no cracking and no maintenance required. Yellow is everything in between. Jump to a specific area via text entry and/or see the data in aggregate, by neighborhood or council district.
The streets don't look great almost any way you look at it.
Jaz Parkinson made color signatures for classic novels. Basically, mentions of colors were tabulated and the results are shown as stacked bars, so it's fairly basic, but if you know the novels, these will mean something to you. For example, here are the signatures for Alice in Wonderland and Of Mice and Men.
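The tabulation itself is simple. Parkinson's exact method isn't described, so treat this as a rough Python sketch under assumptions: a short hand-picked list of color words, whole-word matching, and a sample sentence invented for illustration rather than quoted from any novel.

```python
import re
from collections import Counter

# Assumed color vocabulary; a real signature would likely use a longer list.
COLORS = ("red", "orange", "yellow", "green", "blue", "white", "black")

def color_signature(text, colors=COLORS):
    """Return each color's share of all color mentions in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word in colors)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {color: counts[color] / total for color in colors if counts[color]}

# Invented sample text for illustration only.
sample = "The white rabbit ran past the red roses, the white roses, and a green hedge."
signature = color_signature(sample)
```

Each share becomes the height of one segment in the stacked bar, so a book heavy on white and red mentions gets a bar dominated by those two colors.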
The poster by Daniel E. Coe shows the life-like historical flows of the Willamette River in Oregon.
This lidar-derived digital elevation model of the Willamette River displays a 50-foot elevation range, from low elevations (displayed in white) fading to higher elevations (displayed in dark blue). This visually replaces the relatively flat landscape of the valley floor with vivid historical channels, showing the dynamic movements the river has made in recent millennia. This segment of the Willamette River flows past Albany near the bottom of the image northward to the communities of Monmouth and Independence at the top. Near the center, the Luckiamute River flows into the Willamette from the left, and the Santiam River flows in from the right.
Only $15 in print. [Thanks, Larry]
In parts of the world where there are few smartphones and GPS-enabled devices, transportation architecture has to be designed based on less granular resources, such as surveys, which can result in rough estimates. IBM researchers are looking into how data from simple cell phones can be used instead to see how people move.
The IBM work centered on Abidjan, where 539 large buses are supplemented by 5,000 mini-buses and 11,000 shared taxis. The IBM researchers studied call records from about 500,000 phones with data relevant to the commuting question...
While the data is rough—and of course not everyone on a bus has a phone or is using it—routes can be gleaned by noting the sequence of connections. And IBM and other groups have found that these mobile phone “traces” are accurate enough to serve as a guide to larger population movements for applications such as epidemiology and transportation.
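Gleaning routes from "the sequence of connections" can be pictured with a small Python sketch. It is not IBM's method: the record format `(phone_id, timestamp, tower_id)`, the function name `tower_transitions`, and the sample records are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def tower_transitions(records):
    """Count moves between cell towers from (phone_id, timestamp, tower_id) records."""
    by_phone = defaultdict(list)
    for phone_id, timestamp, tower_id in records:
        by_phone[phone_id].append((timestamp, tower_id))

    transitions = Counter()
    for events in by_phone.values():
        events.sort()  # order each phone's connections by time
        towers = [tower for _, tower in events]
        for here, there in zip(towers, towers[1:]):
            if here != there:  # repeat connections to one tower aren't movement
                transitions[(here, there)] += 1
    return transitions

# Hypothetical call records for illustration only.
records = [
    ("p1", 1, "A"), ("p1", 2, "B"), ("p1", 3, "C"),
    ("p2", 1, "A"), ("p2", 5, "B"),
    ("p3", 2, "C"), ("p3", 4, "C"),
]
flows = tower_transitions(records)
```

Tower pairs with high counts hint at corridors where lots of people travel, which is the kind of rough signal a transit planner could compare against existing bus routes.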