FlowingData

Earth’s skies with Saturn’s rings →

June 27, 2013

Topic
Visualization / space

Illustrator Ron Miller imagined what Earth’s skies would look like if we had Saturn’s rings.

Now, Miller brings his visualizations back to Earth for a series exploring what our skies would look like with Saturn’s majestic rings. Miller strived to make the images scientifically accurate, adding nice touches like orange-pink shadows resulting from sunlight passing through the Earth’s atmosphere. He also shows the rings from a variety of latitudes and landscapes, from the U.S. Capitol building to Mayan ruins in Guatemala.

Miller has a large portfolio of space-related illustrations also worth a look. [via @golan]
Statistics jokes

June 27, 2013

Topic
Statistics / humor

There’s a fun CrossValidated thread on statistics jokes. Here’s the one with the top votes:

A statistician’s wife had twins. He was delighted. He rang the minister who was also delighted. “Bring them to church on Sunday and we’ll baptize them,” said the minister. “No,” replied the statistician. “Baptize one. We’ll keep the other as a control.”

This line by George Burns is my favorite though:

If you live to be one hundred, you’ve got it made. Very few people die past that age.

Any other good ones?
The Boy Who Loved Math →

June 25, 2013

Topic
Miscellaneous / book, math, Paul Erdos

The Boy Who Loved Math: The Improbable Life of Paul Erdős, written by Deborah Heiligman and illustrated by LeUyen Pham, is a kids’ book on the life of the prolific mathematician and a boy’s love of numbers.

Most people think of mathematicians as solitary, working away in isolation. And, it’s true, many of them do. But Paul Erdos never followed the usual path. At the age of four, he could ask you when you were born and then calculate the number of seconds you had been alive in his head. But he didn’t learn to butter his own bread until he turned twenty. Instead, he traveled around the world, from one mathematician to the next, collaborating on an astonishing number of publications. With a simple, lyrical text and richly layered illustrations, this is a beautiful introduction to the world of math and a fascinating look at the unique character traits that made “Uncle Paul” a great man.

Heck yeah. [via Boing Boing]
Atlas of literal place names

June 25, 2013

Topic
Maps / cartography, names, poster

We go places. They have names. What do these names mean though? The Atlas of True Names by cartographers Stephan Hormes and Silke Peust can help you with that, replacing place names with their meanings. California becomes the Land of the Successors, Texas is the Land of Friends, but forget all that. Who’s up for a visit to Illinois, the Land of Those Who Speak Normally?

See more detail for the United States here. There are also versions for the British Isles, Europe, and the world, all available for purchase to adorn your walls. [via Slate]
Contrailz: Detailed flight patterns at major airports

June 24, 2013

Topic
Maps / flights

Alexey Papulovskiy collected flight data from Plane Finder for a month, which essentially gives you a bunch of points in space over time. Then he mapped the data in Contrailz.

Turns out, besides Flight Levels (FL) (which are indicated on my map by dots’ color: red ones stand for lower altitudes and blue — for higher) planes have pretty specific “roads” and “highways” as well as “intersections” and “junctions”. You can see this for yourself by taking a look at the Russian part of the map: it’s less “crowded”, so the picture is as clear as it gets. The sky above Moscow area looks particularly interesting: civil flights are allowed there only since March 2013 and only with an altitude of 27.000 ft or higher.

Aaron Koblin’s Flight Patterns always comes to mind immediately when I see flight data, and Contrailz of course looks similar, but the latter brings in European flight patterns, too, which makes it worth a gander.

By the way, you should also check out Plane Finder if you haven’t seen that yet. It shows planes currently in flight, and there are a lot of them. [Thanks, Alexey]
Income inequality, real and personal

June 24, 2013

Topic
Infographics / income, Periscopic

In a different take on the income inequality issue, the Economic Policy Institute, in collaboration with Periscopic, created Inequality Is.

The Inequality.is website brings clarity to the national dialogue on wage and income inequality, using interactive tools and videos to tell the story of how we arrived at the state of inequality we find today and what can be done to reverse course and ensure workers get their fair share.

Inequality is: real, personal, expensive, created, and fixable. These are the categories the interactive takes you through to explain the subject. The first part reminds you of the video we saw on wealth distribution, which showed what people thought was an ideal distribution of wealth, what they thought it was in real life, and then what it actually was. However, in this interactive, you’re the one answering, which sort of sets the stage for the rest of the interactive. The goal is to make the data more relatable.

Be sure to go through the whole piece. It rounds off nicely with a video explanation with public policy professor Robert Reich and ways to shift the inequality in the other direction.
Beer recommendation system in R

June 21, 2013

Topic
Statistics / beer, R, recommendation

Using data from Beer Advocate, in the form of 1.5 million reviews, yhat shows how to build a recommendation system in R.

The goal for our system will be for a user to provide us with a beer that they know and love, and for us to recommend a new beer which they might like. To accomplish this, we’re going to use collaborative filtering. We’re going to compare 2 beers by ratings submitted by their common reviewers. Then, when one user writes similar reviews for two beers, we’ll then consider those two beers to be more similar to one another.
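
For a rough idea of how that comparison works, here is a minimal sketch in Python (the article itself uses R). The reviewers, beers, and ratings are made up, and using Pearson correlation over common reviewers as the similarity measure is my assumption, not necessarily what yhat does.

```python
from math import sqrt

# Hypothetical toy data: reviews[reviewer][beer] = overall rating.
reviews = {
    "ann": {"Pale Ale": 4.5, "IPA": 4.0, "Stout": 2.0},
    "bob": {"Pale Ale": 4.0, "IPA": 4.5, "Stout": 2.5},
    "cat": {"Pale Ale": 2.0, "IPA": 2.5, "Stout": 4.5},
    "dan": {"IPA": 3.0, "Stout": 4.0},
}

def common_ratings(beer_a, beer_b):
    """Return (rating_a, rating_b) pairs from reviewers who rated both beers."""
    return [
        (ratings[beer_a], ratings[beer_b])
        for ratings in reviews.values()
        if beer_a in ratings and beer_b in ratings
    ]

def similarity(beer_a, beer_b):
    """Pearson correlation of two beers' ratings over their common reviewers."""
    pairs = common_ratings(beer_a, beer_b)
    n = len(pairs)
    if n < 2:
        return 0.0  # not enough overlap to say anything
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def recommend(liked, k=2):
    """Return the k beers whose ratings correlate most with the liked beer."""
    beers = {beer for ratings in reviews.values() for beer in ratings}
    scored = [(similarity(liked, beer), beer) for beer in beers if beer != liked]
    return [beer for _, beer in sorted(scored, reverse=True)[:k]]

print(recommend("Pale Ale", k=1))
```

With this toy data, people who like the pale ale also tend to like the IPA and dislike the stout, so the IPA comes out on top.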

The simple recommender is at the end of the article. Select a beer you like, a type of beer you want to try, and you get a handful of beers you might like.

Obviously, the method isn’t exclusive to beer reviews, and this is just a start to a more advanced system that you can tailor to your own data. The good news is that the code to scrape data and recommend things is there at your disposal. [via @drewconway]
Mapping Twitter demographics

June 20, 2013

Topic
Maps / Eric Fischer, mapbox, Twitter

MapBox, along with Gnip and Eric Fischer, mapped 3 billion tweets and a handful of variables.

This is a look at 3 billion tweets — every geotagged tweet since September 2011, mapped, showing facets of Twitter’s ecosystem and userbase in incredible new detail, revealing demographic, cultural, and social patterns down to city level detail, across the entire world. We were brought in by the data team at Gnip, who have awesome APIs and raw access to the Twitter firehose, and together Tom and data artist Eric Fischer used our open source tools to visualize the data and build interfaces that let you explore the stories of space, language, and access to technology.

You’ll probably recognize some of the maps, as they build on Fischer’s previous projects, such as languages of Twitter and locals versus tourists. The originals were static images though. The interaction provides an exploratory view that lets you poke around the areas you’re interested in, and maybe best of all, it was built with open source software.
A high resolution tour of the vegetation on Earth

June 20, 2013

Topic
Maps / environment, NASA, NOAA

NOAA visualized global vegetation over a year, and the result is beautiful:

We’ve seen forestry maps before, some quite detailed, but this is the first time I’ve seen it at this granularity over a period of time.

Although 75% of the planet is a relatively unchanging ocean of blue, the remaining 25% of Earth’s surface is a dynamic green. Data from the VIIRS sensor aboard the NASA/NOAA Suomi NPP satellite is able to detect these subtle differences in greenness. The resources on this page highlight our ever-changing planet, using highly detailed vegetation index data from the satellite, developed by scientists at NOAA. The darkest green areas are the lushest in vegetation, while the pale colors are sparse in vegetation cover either due to snow, drought, rock, or urban areas. Satellite data from April 2012 to April 2013 was used to generate these animations and images.

The changes are especially obvious as the season moves to summer, going from snow-covered to deep green.
Twitter trend detection algorithm

June 19, 2013

Topic
Statistics / MIT, trends, Twitter

Stuff happens, and people tweet about it. Something major happens, and a lot of people tweet about it. Master’s student Stanislav Nikolov and his adviser Devavrat Shah are working on ways to algorithmically detect the latter.

People acting in social networks are reasonably predictable. If many of your friends talk about something, it’s likely that you will as well. If many of your friends are friends with person X, it is likely that you are friends with them too. Because the underlying system has, in this sense, low complexity, we should expect that the measurements from that system are also of low complexity. As a result, there should only be a few types of patterns that precede a topic becoming trending. One type of pattern could be “gradual rise”; another could be “small jump, then a big jump”; yet another could be “a jump, then a gradual rise”, and so on. But you’ll never get a sawtooth pattern, a pattern with downward jumps, or any other crazy pattern.

And with that, the algorithm compares current patterns to the ones above. If they look like a trending pattern, the algorithm marks something as a trend with some probability. In tests against past trending topics, the algorithm made the right call over 90 percent of the time.
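
To make the idea concrete, here is a heavily simplified sketch in Python. The reference series, the squared-distance measure, and the `gamma` parameter are all hypothetical choices of mine for illustration. The real method works on far more data and has details I am leaving out.

```python
from math import exp

# Hypothetical reference signals: tweet counts per time step for topics that
# did and did not go on to trend.
TRENDING = [[1, 2, 4, 8, 16], [1, 1, 2, 6, 14]]
NOT_TRENDING = [[3, 3, 2, 3, 3], [5, 4, 4, 3, 3]]

def normalize(series):
    """Scale a series by its total so that shape matters, not volume."""
    total = sum(series) or 1
    return [x / total for x in series]

def distance(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vote(observed, references, gamma):
    """Sum of similarity votes; closer reference signals count for more."""
    obs = normalize(observed)
    return sum(exp(-gamma * distance(obs, normalize(ref))) for ref in references)

def trend_probability(observed, gamma=10.0):
    """Share of the total vote that comes from the trending examples."""
    pos = vote(observed, TRENDING, gamma)
    neg = vote(observed, NOT_TRENDING, gamma)
    return pos / (pos + neg)

print(trend_probability([2, 4, 8, 16, 32]))  # steep rise
print(trend_probability([4, 4, 4, 4, 4]))    # flat
```

A steep rise looks like the trending examples and gets a score above 0.5, while a flat series lands below it.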

The best part is that this method can be applied to other time series data. “We can try this on traffic data to predict the duration of a bus ride, on movie ticket sales, on stock prices, or any other time-varying measurements.”
Animation shows flow of attendees during a conference

June 18, 2013

Topic
Maps / conference, wireless

When you go to a conference, there are typically several talks going on at the same time, and you can always tell there’s a popular paper coming up when you see people leave a bunch of rooms at once and head straight into one. There’s also the unfortunate case when someone speaks, and there’s only a handful of people in the room, all in the back staring at their laptops. Open Data City visualized this activity during the German internet conference re:publica.

Open Data City used MAC addresses and access point connections to keep track of where devices went. So a person might be in a room, connected to the nearest access point, disconnect as he leaves, and then reconnect as he enters another room, which provides the flow.
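
Here is a minimal sketch in Python of how you might turn connection logs into room-to-room flows. The log format, device ids, and room names are hypothetical, and I don’t know how Open Data City actually processed their data.

```python
from collections import Counter

# Hypothetical log: (timestamp in seconds, anonymized device id, room of the
# access point the device connected to).
events = [
    (900, "dev-a", "Stage 1"),
    (905, "dev-b", "Stage 1"),
    (960, "dev-a", "Stage 2"),
    (965, "dev-b", "Stage 2"),
    (1020, "dev-a", "Stage 2"),  # reconnect in the same room, no movement
    (1030, "dev-b", "Cafe"),
]

def room_flows(events):
    """Count moves between rooms by following each device through time."""
    last_room = {}
    flows = Counter()
    for _, device, room in sorted(events):
        previous = last_room.get(device)
        if previous is not None and previous != room:
            flows[(previous, room)] += 1
        last_room[device] = room
    return flows

for (source, target), count in room_flows(events).items():
    print(source, "->", target, count)
```

Each counted move is one dot migrating from one room to another in the animation.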

It’s fun to watch the conference play out even if you didn’t attend. Each dot represents an attendee, and as the animation plays the dots migrate from room to room. Click and drag over the dots to select specific people. [Thanks, Michael]
Non-statistician analysts are the new norm

June 17, 2013

Topic
Statistics / Jeff Leek, non-professionals

As data grows cheaper and more easily accessible, the people who analyze it aren’t always statisticians. They’re likely to not even have had any statistical training. Biostatistics professor Jeff Leek says we need to adapt to this broader audience.

What does this mean for statistics as a discipline? Well it is great news in that we have a lot more people to train. It also really drives home the importance of statistical literacy. But it also means we need to adapt our thinking about what it means to teach and perform statistics. We need to focus increasingly on interpretation and critique and away from formulas and memorization (think English composition versus grammar). We also need to realize that the most impactful statistical methods will not be used by statisticians, which means we need more fool proofing, more time automating, and more time creating software. The potential payout is huge for realizing that the tide has turned and most people who analyze data aren’t statisticians.

Yep.

Those who disagree tend to worry about what kind of data-based decisions non-statisticians will make, and that should definitely be a priority as we move forward. Non-statisticians often make incorrect assumptions about the data, forget about uncertainty, and don’t know much about collection methodologies.

However, as a statistician (or someone who knows statistics), you can shoo everyone else away from the data and gripe when they come back, or you can help them get things right.
The differences between a geek and a nerd

June 14, 2013

Topic
Statistics / geek, nerd, Twitter

Curious about how people use “geek” and “nerd” to describe themselves and if there was any difference between the two terms, Burr Settles analyzed words used in tweets that contained the two. Settles used pointwise mutual information (PMI), which essentially provided a measure of the geekness or nerdiness of a term. The plot above shows the results.
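
If you haven’t seen PMI before, it compares how often two words actually appear together against how often you’d expect them to if they were independent: PMI(w, c) = log2( p(w, c) / (p(w) p(c)) ). Here is a minimal sketch in Python. The tweets are made up, and counting at most one occurrence per tweet is my simplification, not necessarily how Settles did it.

```python
from math import log2

# Hypothetical tweets, lowercased and split on whitespace.
tweets = [
    "geek out over new gadget",
    "comic geek collects gadget",
    "geek buys comic",
    "nerd tests hypothesis",
    "math nerd reads about hypothesis",
    "nerd studies math",
]

def pmi(word, label, tweets):
    """Pointwise mutual information of `word` and `label` across tweets."""
    n = len(tweets)
    token_sets = [set(tweet.split()) for tweet in tweets]
    n_word = sum(word in tokens for tokens in token_sets)
    n_label = sum(label in tokens for tokens in token_sets)
    n_both = sum(word in tokens and label in tokens for tokens in token_sets)
    if n_both == 0:
        return float("-inf")  # the two never co-occur
    return log2((n_both / n) / ((n_word / n) * (n_label / n)))

for word in ("gadget", "hypothesis"):
    print(word, pmi(word, "geek", tweets), pmi(word, "nerd", tweets))
```

A word with a high “geek” PMI and a low “nerd” PMI is geeky, and the other way around for nerdy.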

In broad strokes, it seems to me that geeky words are more about stuff (e.g., “#stuff”), while nerdy words are more about ideas (e.g., “hypothesis”). Geeks are fans, and fans collect stuff; nerds are practitioners, and practitioners play with ideas. Of course, geeks can collect ideas and nerds play with stuff, too. Plus, they aren’t two distinct personalities as much as different aspects of personality. Generally, the data seem to affirm my thinking.

Or maybe pop culture (geek) versus education (nerd).
Sniffing out Paul Revere with basic social network analysis

June 13, 2013

Topic
Network Visualization / metadata, Paul Revere, privacy

It’s just metadata. What can you do with that? Kieran Healy, a sociology professor at Duke University, shows what you can do with just some basic social network analysis. Using metadata from Paul Revere’s Ride on the groups that people belonged to, Healy sniffs out Paul Revere as a main target. Bonus points for writing the summary from the point of view of an 18th-century analyst.

What a nice picture! The analytical engine has arranged everyone neatly, picking out clusters of individuals and also showing both peripheral individuals and—more intriguingly—people who seem to bridge various groups in ways that might perhaps be relevant to national security. Look at that person right in the middle there. Zoom in if you wish. He seems to bridge several groups in an unusual (though perhaps not unique) way. His name is Paul Revere.

You can grab the R code and dataset on github, too, if you want to follow along.
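
If you’d rather see the gist in Python, here is a minimal sketch. The names and groups are placeholders, not the historical data. Counting how many people someone shares a group with is about the simplest centrality measure there is, and I’m assuming it here only for illustration.

```python
from itertools import combinations

# Hypothetical membership table: person -> set of groups they belong to.
membership = {
    "Avery": {"G1", "G2", "G3"},
    "Blake": {"G1"},
    "Casey": {"G1"},
    "Devon": {"G2"},
    "Ellis": {"G3"},
    "Finley": {"G3"},
}

def shared_groups(membership):
    """Count shared groups for each pair of people.

    This is the off-diagonal of the person-by-group matrix multiplied by
    its own transpose.
    """
    links = {}
    for a, b in combinations(sorted(membership), 2):
        shared = len(membership[a] & membership[b])
        if shared:
            links[(a, b)] = shared
    return links

def degree(links):
    """Number of distinct people each person shares at least one group with."""
    counts = {}
    for a, b in links:
        counts[a] = counts.get(a, 0) + 1
        counts[b] = counts.get(b, 0) + 1
    return counts

links = shared_groups(membership)
connections = degree(links)
print(max(connections, key=connections.get))
```

The person who bridges the most groups floats to the top, with nothing but membership lists to go on.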
Price of Damien Hirst spot paintings →

June 12, 2013

Topic
Statistical Visualization / Amanda Cox, Damien Hirst, New York Times

Damien Hirst is an artist known for a number of works, one of those being his large production of spot paintings. There are over a thousand of them painted by him and his assistants, varying in size, number of dots, density, and color. Amanda Cox of The New York Times plotted paintings sold from 1999 to present, topping out at $3.4 million. That’s a whole lot of dottage.
Other than advertisers

June 11, 2013

Topic
News / humor, Onion, privacy

The Onion tackles data privacy:

“As a law-abiding resident of this nation, I have the right to do whatever I want without a shadowy organization recording my every move, unless of course it’s part of an electronic campaign designed to figure out, based on all of my emails and phone conversations, what types of clothes, shoes, and houseware products I like. Then it’s fine.” Sources later confirmed that Landler had posted a Facebook rant on the issue, which had generated a pop-up ad from a company that restores lost PC data.
Easy mapping with Map Stack

June 11, 2013

Topic
Maps / OpenStreetMap, Stamen

It seems like the technical side of map-making, the part that requires code or complicated software installations, fades a little more every day. People get to focus more on actual map-making than on server setup. Map Stack by Stamen is the most recent tool to help you do this.

We provide access to different parts of the map stack, like backgrounds, roads, labels, and satellite imagery. These can be modified using straightforward controls to change things like color, opacity, and brightness. So within a few minutes you can have a map of anywhere in the world with dark green parks and blue buildings. You can get very precise with image overlays and layer effects, using layers as cut-out masks for other layers. Or just make a regular-looking map in the colors you want.

The idea is to make it radically simpler for people to design their own maps, without having to know any code, install any software, or even do any typing.

It’s completely web-based, and you edit your maps via a click interface. Pick what you want (or use Stamen’s own stylish themes) and save an image. For the time being, the service is open only from 11am to 5pm PST, so just come back later if it happens to be closed.

See here for a taste of what others have done so far.
State of the OpenStreetMap

June 11, 2013

Topic
Maps / OpenStreetMap

OpenStreetMap, the free wiki world map that offers up high quality geographic data, has grown a lot in the past eight years. The OpenStreetMap Data Report shows all these changes. Says the report: “The database now contains over 21 million miles of road data and 78 million buildings.”