  • Posts and links get shared over and over again, but we usually don’t know how. We get counts, but who shares what, and how far does a link reach? Google+ Ripples gives you a peek into the process. A link or status is posted, and like when a pebble is dropped in a pond, a pattern forms outward.

  • Eric Fischer maps language communities on Twitter using Chrome’s open-source language detector. Each color, chosen to make differences obvious, represents a language. English, which is used just about everywhere, is shown in dark gray so that it doesn’t obscure everything else.

    The emergence of borders without actually drawing them in is interesting. There’s a little bit of blending, but the splits are pretty well-defined, especially in the Netherlands, where tweets seem to be abnormally dense. What’s going on over there?

    There’s also a world version, but Europe is where all the action’s at.

    [Language communities via @enf]

  • Casey Reas and Chandler McWilliams asked visual designers why they write their own software and how it affects their process:

    The answers reflect the individuality of the designers and their process, but some ideas are persistent. The most consistent answer is that custom software is written because it gives more control. This control is often expressed as individual freedom. Another thread is writing custom software to create a precise realization for a precise idea. To put it another way, writing custom code is one way to move away from generic solutions; new tools can create new opportunities.

    Most of the interviewees are media artists, but there are a couple of names you’ll recognize. My favorite, Amanda Cox, uses a Mad Libs metaphor:

    Mad Libs is a game where key words in a short story have been replaced with blanks. Players fill in the blanks with designated parts of speech (“noun”, “adverb”) or types of words (“body part”, “type of liquid”), without seeing the rest of the story. Occasionally, hilarity ensues, but no one really believes that this is an effective method for generating great literature.

    I’m looking at you, non-programming statistician.

    Update: The article isn’t there anymore, so you can read the cached page for now.

  • Apple has a page dedicated to Steve Jobs that displays messages from friends, colleagues, and fans. Neil Kodner downloaded those messages and extracted overall themes:

    I wanted to see how people were speaking about Steve Jobs and especially what terms were being used to describe him. There was no point in performing sentiment analysis on this text as all of the texts were not only obviously positive but were also vetted by Apple for content. Using NLTK, I performed part-of-speech tagging on every word in each tribute message and then wrote some code to total the adjectives and adverbs used in the tribute messages.

    The top descriptors? Not surprisingly: great, many, first, sad, better, best, and visionary. About one in five messages referenced an Apple product.

    The message data and Kodner’s code are available on GitHub.

    [Thanks, Guy]
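
    If you’re curious what that tallying step might look like, here’s a minimal sketch. It is not Kodner’s code. It assumes a tagger that returns Penn Treebank tags (the kind NLTK’s pos_tag produces), and the toy_tagger, TOY_TAGS, and sample tributes are made up so the example runs on its own.

```python
from collections import Counter

# Penn Treebank tag prefixes: JJ, JJR, JJS are adjectives; RB, RBR, RBS are adverbs.
DESCRIPTOR_PREFIXES = ("JJ", "RB")

# Hypothetical stand-in tagger so the sketch runs without NLTK installed.
TOY_TAGS = {"great": "JJ", "visionary": "JJ", "sad": "JJ", "truly": "RB"}


def toy_tagger(tokens):
    """Tag each token from a tiny lookup table, defaulting to noun (NN)."""
    return [(tok, TOY_TAGS.get(tok.lower(), "NN")) for tok in tokens]


def count_descriptors(messages, tagger=toy_tagger):
    """Total the adjectives and adverbs used across all messages.

    `tagger` is any callable that maps a list of tokens to
    (token, tag) pairs with Penn Treebank tags.
    """
    counts = Counter()
    for message in messages:
        # Crude whitespace tokenizer (an assumption, not the original method).
        tokens = [t.strip(".,!?\"'") for t in message.split()]
        tokens = [t for t in tokens if t]
        for word, tag in tagger(tokens):
            if tag.startswith(DESCRIPTOR_PREFIXES):
                counts[word.lower()] += 1
    return counts


# Made-up sample messages, not the real tribute data.
tributes = ["A great visionary.", "Truly great. So sad."]
print(count_descriptors(tributes).most_common())
```

    With NLTK installed and its tagger model downloaded, you could pass tagger=nltk.pos_tag instead of the toy lookup.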

  • It’s been about three months since Visualize This came out, and in case you haven’t gotten your hands on a copy yet, now might be a good time to get it. Amazon just lowered the price.

    I didn’t know the price changed so much, although I’m not surprised. I’m guessing it’s based on a number of factors such as third-party prices, competitors’ prices, and sales. The Kindle version (not shown) changed a lot in the beginning, costing more than the paperback, but I don’t think it’s changed since it came down to the current price. You can see the changes, as reported by price tracking site Tracktor (just try to ignore the weird vertical scale).

    The used price is not completely accurate since there weren’t any used copies available before the book was released in July.

    Also, I’m not entirely sure about the listings for used books on Amazon, as all of them are from resellers with thousands of ratings. Four copies are listed for above retail, and one of those is more than four times as much. That expensive copy must be a special edition that I haven’t heard about.

  • Using satellite data that goes back to 2002, NASA maps tens of millions of fires worldwide in this global tour.

    The tour begins by showing extensive grassland fires spreading across interior Australia and the eucalyptus forests in the northwestern and eastern part of the continent. The tour then shifts to Asia where large numbers of agricultural fires are visible first in China in June 2004, then across a huge swath of Europe and western Russia in August. It then moves across India and Southeast Asia, through the early part of 2005. The tour continues across Africa, South America, and concludes in North America.

    Bright shades of yellow indicate hotter fires, and darker shades of green represent higher levels of vegetation. Analyses show that 70 percent of fires occur in Africa, whereas only two percent occur in North America. Fascinating to watch vegetation grow and burn each year.

  • Jim Vallandingham maps racial divide in major cities using Mike Bostock’s implementation of force-directed maps:

    Data is from the 2010 Census, at the tract level. The links are hidden, but each tract is connected to each of its neighbors. The lengths of these connections encode the disparity between racial make-up between neighboring tracts. So, if a ‘mostly white’ tract is connected to another ‘mostly white’ tract, then the connection is short. If a city had uniform proportions of races in each tract, the map would not move much. However, longer connections occur where there is a sharp change in the proportions of white and black populations between neighboring tracts. These longer connections create rifts in the map and force areas apart, in some ways mimicking the real-world effects of these racial lines.

    Compare Jim’s maps with the catalyst — choropleth maps by Salon. Which do you think works better?

    [Visualizing the Racial Divide via @dwtkns]
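
    The link-length idea is easy to sketch. The version below is only a guess at the general shape, not Jim’s actual formula: it assumes each tract is summarized by a single proportion, and the base and scale parameters, tract names, and numbers are all made up for illustration.

```python
def link_length(share_a, share_b, base=1.0, scale=10.0):
    """Target length for the link between two neighboring tracts.

    share_a and share_b are the proportion (0 to 1) of one group in each
    tract. Similar tracts get a short link; a sharp change gets a long one.
    """
    return base + scale * abs(share_a - share_b)


def link_lengths(shares, neighbors):
    """Compute a target length for every pair of neighboring tracts."""
    return {(a, b): link_length(shares[a], shares[b]) for a, b in neighbors}


# Hypothetical tracts: A and B are similar, C is very different from B.
shares = {"A": 0.75, "B": 0.75, "C": 0.25}
neighbors = [("A", "B"), ("B", "C")]
print(link_lengths(shares, neighbors))
```

    A force-directed layout would then try to honor these target lengths, which is what pushes dissimilar neighbors apart.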

  • Get it? It’s a Venn diagram made of actual pies. That’s why it’s called a Venn piagram. [via]

  • For Facebook’s F8 developer conference, creative agency Obscura Digital delivered the Connections installation. People could log in and see how they related to others through circular visuals projected on the ground:

    Once “logged in” to Connections, a radial visualization, constructed from the user’s social graph data, surrounds them creating a unique “fingerprint”. Colored lines extend from the circles connecting people who share one or more of the observed metrics (mutual friends, interests, workplaces, schools, locations, birth sign, or non-English languages). When two or more people, who have mutual connections, stand within close proximity, a slideshow of mutual friends and interests appear between them.

    See it in action below. Take it a bit further, and I bet this could be a fun game. Or a novelty in a nerdy bar.

  • Smashing Magazine offers advice on the dos and don’ts of infographic design, but they forgot to include the former. It’s as if I wrote a fake post and someone mistook it for a serious guide.

  • We’re statisticians. We don’t program.

    — Anonymous statistician

    I was talking to a small group of statisticians a few months ago, and one of them said that to me when I described how I go about mucking around with data. It still annoys me just thinking about it. It wasn’t that he didn’t know how to program, which is perfectly understandable. It was that he said it as if programming and statistics were so separate that the two couldn’t possibly go together.

    Wrong.

    Let’s set things straight before this silly idea spreads further. Programming and statistics belong together, and you don’t have to be a coding genius for it to work.

  • Jacob Harris, a New York Times senior software architect, rants about how people like to use word clouds to tell stories:

    Of course, the biggest problem with word clouds is that they are often applied to situations where textual analysis is not appropriate. One could argue that word clouds make sense when the point is to specifically analyze word usage (though I’d still suggest alternatives), but it’s ludicrous to make sense of a complex topic like the Iraq War by looking only at the words used to describe the events. Don’t confuse signifiers with what they signify.

    Harris says he dies a little inside every time he sees a word cloud presented as insight. Hopefully his computer doesn’t catch a virus that permanently changes his wallpaper, screensaver, and every text document he’s ever written into word clouds. He would die a little inside so many times that it might start to show on the outside.

    Dramatics aside, I have to admit it is amusing when I get emails from people who think they have found the holy trinity of analysis, ease-of-use, and aesthetics that is Wordle. It was never intended as a serious analysis tool. Word clouds were originally made popular as a way to navigate tags for bookmarks, but other than that they’re more of a toy and should be treated that way.

  • Matthew Ericson, deputy graphics director at The New York Times, talks maps and when you should try something else:

    Maps also a terrific way to let readers look up information about specific places. On election night, they answer questions like like “Which seats did the Republicans gain?” or “Who won all the seats in Oregon?” or “Who won my Congressional district?” You don’t have to remember the number of the House district you live in — you can just look at the map, zero in on the area that you’re interested in, and see if it’s shaded red or blue.

    And obviously, when the story is completely based on the geography — “How far has the oil spill in the Gulf spread?” — there’s nothing more effective than a map showing just that.

    But sometimes the reflexive impulse to map the data can make you forget that showing the data in another form might answer other — and sometimes more important — questions.

    The full post is worth a read, chock-full of examples.

  • OpenBible quantifies the ups and downs of the Bible. Red is negative and black is positive.

    Things start off well with creation, turn negative with Job and the patriarchs, improve again with Moses, dip with the period of the judges, recover with David, and have a mixed record (especially negative when Samaria is around) during the monarchy. The exilic period isn’t as negative as you might expect, nor the return period as positive. In the New Testament, things start off fine with Jesus, then quickly turn negative as opposition to his message grows. The story of the early church, especially in the epistles, is largely positive.

    The Viralheat Sentiment Analysis API is used to assign a probability that each verse is positive or negative, and several translations are used to find a moving average.

    Those who know the Bible well want to chime in on the accuracy?

    [OpenBible]
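
    The smoothing step can be sketched in a few lines. This is a rough guess at the approach rather than OpenBible’s code: it assumes each translation yields one positive-probability score per verse, that translations are combined with a plain average, and that the moving average is a trailing window. The window size and sample scores are made up.

```python
def average_across_translations(scores_by_translation):
    """Average the per-verse scores over several translations.

    scores_by_translation is a list of equal-length lists, one list of
    per-verse positive probabilities for each translation.
    """
    return [sum(verse) / len(verse) for verse in zip(*scores_by_translation)]


def moving_average(values, window):
    """Trailing moving average; early values use however many points exist."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Hypothetical scores for four verses in two translations.
translation_a = [1.0, 0.0, 1.0, 1.0]
translation_b = [1.0, 0.0, 1.0, 1.0]
per_verse = average_across_translations([translation_a, translation_b])
print(moving_average(per_verse, window=2))
```

    A wider window gives a smoother line at the cost of hiding short swings in sentiment.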

  • Hundreds of thousands of emails are sent every second, and yet you wouldn’t really know it because there aren’t public-facing streams like Twitter’s. Outside your own inbox, how much email is there exactly? Yahoo, in collaboration with information visualization firm Periscopic, shows you how much email they process in real time with this interactive feature.

  • The data goes back to 1960 and up to the most current estimates for 2009. Each line represents a country.

  • Remember the Facebook connections map from a while back? It showed digital friendships around the world by connecting locations with arcs. Visual arts graduate student Ian Wojtowicz mashed that with NASA’s well-known map showing Earth at night, and the above is what you get.

  • Cade Massey and Bob Tedeschi for The New York Times on the book, now turned movie, “Moneyball” and how it’s made data-backed thinking sound less crazy:

    At its heart, of course, “Moneyball” isn’t about baseball. It’s not even about statistics. Rather, it’s about challenging conventional wisdom with data. By embedding this lesson in the story of Billy Beane and the Oakland A’s, the book has lured millions of readers into the realm of the geek. Along the way, it converted many into empirical evangelists.

    Good. Sure makes my life a lot easier.

    Is the movie worth the 2 hours and 10 bucks in the theatre? It seems right up my alley, but for some reason the previews left me uninterested.

    [New York Times via @alexlundry]