• We’re statisticians. We don’t program.

    — Anonymous statistician

    I was talking to a small group of statisticians a few months ago, and one of them said that to me when I told them how I go about mucking around with data. It still annoys me just thinking about it. It wasn’t that he didn’t know how to program, which is perfectly understandable, but that he said it as if programming and statistics were so separate that the two couldn’t possibly go together.

    Wrong.

    Let’s set things straight before this silly idea spreads further. Programming and statistics belong together, and you don’t have to be a coding genius for it to work.

  • Jacob Harris, a New York Times senior software architect, rants about how people like to use word clouds to tell stories:

    Of course, the biggest problem with word clouds is that they are often applied to situations where textual analysis is not appropriate. One could argue that word clouds make sense when the point is to specifically analyze word usage (though I’d still suggest alternatives), but it’s ludicrous to make sense of a complex topic like the Iraq War by looking only at the words used to describe the events. Don’t confuse signifiers with what they signify.

    Harris says he dies a little inside every time he sees a word cloud presented as insight. Hopefully his computer doesn’t catch a virus that permanently turns his wallpaper, screensaver, and every text document he’s ever written into word clouds, or he would die a little inside so many times that it might start to show on the outside.

    Dramatics aside, I have to admit it is amusing when I get emails from people who think they have found the holy trinity of analysis, ease-of-use, and aesthetics that is Wordle. It was never intended as a serious analysis tool. Word clouds were originally made popular as a way to navigate tags for bookmarks, but other than that they’re more of a toy and should be treated that way.

  • Matthew Ericson, deputy graphics director at The New York Times, talks maps and when you should try something else:

    Maps are also a terrific way to let readers look up information about specific places. On election night, they answer questions like “Which seats did the Republicans gain?” or “Who won all the seats in Oregon?” or “Who won my Congressional district?” You don’t have to remember the number of the House district you live in — you can just look at the map, zero in on the area that you’re interested in, and see if it’s shaded red or blue.

    And obviously, when the story is completely based on the geography — “How far has the oil spill in the Gulf spread?” — there’s nothing more effective than a map showing just that.

    But sometimes the reflexive impulse to map the data can make you forget that showing the data in another form might answer other — and sometimes more important — questions.

    The full post is worth a read, chock-full of examples.

  • OpenBible quantifies the ups and downs of the Bible. Red is negative and black is positive.

    Things start off well with creation, turn negative with Job and the patriarchs, improve again with Moses, dip with the period of the judges, recover with David, and have a mixed record (especially negative when Samaria is around) during the monarchy. The exilic period isn’t as negative as you might expect, nor the return period as positive. In the New Testament, things start off fine with Jesus, then quickly turn negative as opposition to his message grows. The story of the early church, especially in the epistles, is largely positive.

    The Viralheat Sentiment Analysis API is used to assign a probability that each verse is positive or negative, and several translations are used to find a moving average.

    Those who know the Bible well want to chime in on the accuracy?

    [OpenBible]
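    The smoothing step described above can be sketched roughly as follows. This is a minimal illustration, not OpenBible's actual code: the source does not say exactly how the translations are combined, so the sketch assumes the per-verse probabilities are averaged across translations first and then smoothed with a simple moving average. The function names, the window size, and the probabilities are all made up, and no call to the Viralheat API is shown.

```python
from statistics import mean


def average_across_translations(scores_by_translation):
    """Average each verse's positive probability across translations.

    scores_by_translation is a list of equal-length lists, one list per
    translation, holding one probability per verse (assumed layout).
    """
    return [mean(verse_scores) for verse_scores in zip(*scores_by_translation)]


def moving_average(values, window):
    """Simple moving average over consecutive windows of the given size."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


# Hypothetical positive-sentiment probabilities for five verses
# in two translations (illustrative numbers only).
translation_a = [0.9, 0.8, 0.2, 0.1, 0.6]
translation_b = [0.7, 0.6, 0.4, 0.3, 0.8]

per_verse = average_across_translations([translation_a, translation_b])
smoothed = moving_average(per_verse, window=3)
print(smoothed)
```

    A wider window gives a smoother line at the cost of hiding short dips, which is probably why a chart covering the whole Bible reads as broad ups and downs rather than verse-level noise.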

  • Hundreds of thousands of emails are sent every second, and yet you wouldn’t really know it, because there are no public-facing streams like Twitter’s. Outside your own inbox, how much email is there exactly? Yahoo, in collaboration with information visualization firm Periscopic, shows you how much email they process in real time with this interactive feature.

  • The data goes back to 1960 and up to the most current estimates for 2009. Each line represents a country.

  • Remember the Facebook connections map from a while back? It showed digital friendships around the world by connecting locations with arcs. Visual arts graduate student Ian Wojtowicz mashed that with NASA’s well-known map showing Earth at night, and the above is what you get.

  • Cade Massey and Bob Tedeschi for The New York Times on the book, now turned movie, “Moneyball” and how it’s made data-backed thinking sound less crazy:

    At its heart, of course, “Moneyball” isn’t about baseball. It’s not even about statistics. Rather, it’s about challenging conventional wisdom with data. By embedding this lesson in the story of Billy Beane and the Oakland A’s, the book has lured millions of readers into the realm of the geek. Along the way, it converted many into empirical evangelists.

    Good. Sure makes my life a lot easier.

    Is the movie worth the 2 hours and 10 bucks in the theatre? The movie seems right up my alley, but for some reason the previews left me uninterested.

    [New York Times via @alexlundry]

  • A big thank you to the FlowingData sponsors. Without them, I would not be able to do what I do, and this site wouldn’t exist. Check ’em out. They help you do stuff with data.

    IDV Solutions Visual Fusion — Business intelligence software for building focused apps that unite data from virtually any data source in a visual, interactive context for better insight and understanding.

    Tableau Software — Tableau Software helps people see and understand data. Ranked by Gartner in 2011 as the world’s fastest growing business intelligence company, Tableau helps anyone quickly and easily analyze, visualize and share information.

    Column Five Media — Whether you are a startup that is just beginning to get the word out about your product, or a Fortune 500 company looking to be more social, they can help you create exciting visual content – and then ensure that people actually see it.

    InstantAtlas — Enables information analysts and researchers to create highly-interactive online reporting solutions that combine statistics and map data to improve data visualization, enhance communication, and engage people in more informed decision making.

    Want to sponsor FlowingData? Send interest to [email protected] for more details.

  • With the end of NASA’s human spaceflight program, Tommy McCall and Mike Orcutt for Technology Review explore space launches since Sputnik 1 went into orbit in 1957. While humans won’t be going up in space for NASA anymore, that doesn’t mean there won’t be anything launching into space.

    Of the 7,000 spacecraft that have been launched into orbit or beyond, more than half were defense satellites used for such purposes as communication, navigation, and imaging. (The Soviet Union sent up a huge number, partly because its satellites tended to be much shorter-lived than those from the United States.) In the 1970s, private companies began increasingly adding to the mix, launching satellites for telecommunications and broadcasting.

    The stacked bar turned rocket blast aesthetic is a nice touch. Time runs on the vertical and launches are split by country, where USSR/Russia and the United States of course lead the way. The bigger the blast, the more launches for a given country. Color represents purpose of launch. I like it.

    [Technology Review via @pitchinteractiv]

  • Nobel Prizes have been awarded every year since 1901. Where are all the winners from? Jon Bruner from Forbes puts it in a graphic. It’s a simple yet effective approach where each dot represents an award won, and countries are sorted by number of prizes. The United States has clearly dominated the field since 1950, although many winners were foreign-born:

    The United States is also unique in the scale on which it attracts human capital: of the 314 laureates who won their Nobel prize while working in the U.S., 102 (or 32%) were foreign born, including 15 Germans, 12 Canadians, 10 British, six Russians and six Chinese (twice as many as have received the award while working in China). Compare that to Germany, where just 11 out of 65 Nobel laureates (or 17%) were born outside of Germany (or, while it still existed, Prussia). Or to Japan, which counts no foreigners at all among its nine Nobel laureates.

    Before World War II, it was a different story. Germany led the way.

    [Forbes | Thanks, Jon]

  • Twitter engineer Miguel Rios pays tribute to the man, the legend. Zoomed out you see the portrait of Steve Jobs. Zoom in, and you see public tweets tagged with #thankyousteve sent out over a four and a half hour period on the evening of October 5. Tweets are ordered by number of retweets, left to right and top to bottom.
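    The ordering described above, most-retweeted first and filling the grid left to right and top to bottom, can be sketched like this. It is only an illustration under assumptions: the source does not say whether the order is descending, so descending is assumed here, and the tweet records, field names, and column count are hypothetical rather than anything from Rios's implementation.

```python
def grid_positions(tweets, columns):
    """Sort tweets by retweet count (highest first, an assumption) and
    assign each one a (row, col) cell in row-major order."""
    ordered = sorted(tweets, key=lambda t: t["retweets"], reverse=True)
    # divmod(i, columns) gives (row, col) for the i-th cell when cells
    # are filled left to right, then top to bottom.
    return [(t["id"], divmod(i, columns)) for i, t in enumerate(ordered)]


# Hypothetical tweets (ids and retweet counts are made up).
sample_tweets = [
    {"id": "a", "retweets": 5},
    {"id": "b", "retweets": 40},
    {"id": "c", "retweets": 12},
    {"id": "d", "retweets": 0},
    {"id": "e", "retweets": 7},
]

layout = grid_positions(sample_tweets, columns=2)
for tweet_id, (row, col) in layout:
    print(tweet_id, row, col)
```

    In the real piece each cell would also need a color or brightness taken from the portrait, which this sketch leaves out.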

  • In a follow-up to their mood maps, Scott Golder and Michael Macy of Cornell University look at mood cycles during the hours of the day:

    They found that, on average, people wake up in a good mood, which falls away over the course of the day. Positive feelings peak early in the morning and again nearer midnight, while negative feelings peak between 9pm and 3am. Unsurprisingly, people get happier as the week goes on. They’re most positive on Saturdays and Sundays and they tend to lie in for an extra two hours, as shown by the delayed peak in their positive feelings. The United Arab Emirates provide an interesting exception. There, people work from Sunday to Thursday, and their tweets are most positive on Friday and Saturday.

    It’s strange that good mood peaks around midnight. Maybe the people who are in a bad mood slowly go to sleep, leaving only those in a good mood to tweet. Then again, negative mood also seems to peak around midnight. Peculiar. I don’t have access to the full article, so if anyone does, I’d be interested to hear Golder and Macy’s interpretations.

    [Discover Magazine via @albertocairo]

  • The creative process changes by person and project, but there are obstacles and steps along the way that you tend to pass with each. Graphic designer Melike Turgut maps her own process. Start from the inside (research, reading, and learning), and make your way out (questions, ideas, and refinement).

    [Melike Turgut via @brainpicker]

  • Twitter is a bustling place of tweets, retweets, and replies, and the growth and spread of news can be very organic. After all, there are actual human beings using the service. Kunal Anand, Director of Technology at the BBC, played on this idea of Twitter as an ecosystem and created Tweetures.

  • After a certain point in math education, like some time during high school, the relevance of the concepts to the everyday and the real world seems to fade. However, in many ways, math lets you describe real life better than you can with just words. Designer Bret Victor hopes to make the abstract and conceptual real and concrete with Kill Math.