• The Washington Post visualized how often specific words were used in State of the Union addresses over the years.

    Since 1900, there have been 116 State of the Union addresses, given by 20 presidents, with some presidents giving two addresses a year. Studying their choice of words, over time, provides glimpses of change in American politics—“communism” fades, “terrorism” increases—and evidence that some things never change (“America” comes up steadily, of course. As does “I.”).

    For some reason the interactive won’t load for me now (it did yesterday), but there’s also a PDF version that you can download. The PDF only goes back to Bush in 1989 though, so try the interactive version first. It was an interesting one. Update: Works again.

    Can you believe it? We made it through an entire SOTU without a single word cloud. Come to think of it, I can’t even remember the last time I saw one. I almost feel cheated.

  • It’s been an interesting few days. I thought a few people would find the famous quotes graphic amusing, but I didn’t expect so many to share my odd sense of humor. Thanks.

    If you haven’t pre-ordered a poster yet, today’s the last day to get it at a discounted price.

    Put your order in here.

    I’m going to proof the poster a few more times tonight and then send it to the printers. They should take about a week to get the finished posters to me. From there, I’ll be (really) busy signing and rolling.

    I still expect mid-February shipments to you. International shipping takes a little longer of course, depending on where you are.

  • R, the statistical computing language of choice and what I use the most, can seem odd to those new to the language or to programming. I think this is what holds a lot of people back and keeps them stuck in limited software. The swirl package for R helps beginners get over that first hurdle by teaching you within R itself.

    swirl is a software package for the R statistical programming language. Its purpose is to teach users statistics and R simultaneously and interactively. It attempts to do this in the most authentic learning environment possible by guiding users through interactive lessons directly within the R console.

    Assuming you already have R installed, install the package with install.packages("swirl") (which also pulls in the packages it depends on), load it with library(swirl), and call swirl() to get a guided tour of the basics.

  • Benjamin Grosser visualized how computers “watch” movies through vision algorithms and artificial intelligence in Computers Watching Movies.

    Computers Watching Movies was computationally produced using software written by the artist. This software uses computer vision algorithms and artificial intelligence routines to give the system some degree of agency, allowing it to decide what it watches and what it does not. Six well-known clips from popular films are used in the work, enabling many viewers to draw upon their own visual memory of a scene when they watch it.

    Above is the bag scene from American Beauty. Contrast this with the more frantic Inception scene, and you get a good idea of how it works. See computer-watching scenes for several more movies here.

  • Members Only / Tutorials

    With the plethora of mobile apps to track your location and activities, such as OpenPaths and Moves, or the fitness-specific Endomondo, MapMyRun, and RunKeeper, many of us have a personal data source of where we are and how we got there. However, most of the maps available on these services show only a bunch of markers or one path at a time. It can be fun and useful to see more of the data at once.

  • Looking for a job in data science, visualization, or statistics? There are openings on the board.

    Senior Game Analytics Specialist for Activision Publishing, Inc. in Santa Monica, CA

    Data Scientist for Thumbtack in San Francisco, CA

    Instructional Technologist for Quantitative Applications for Reed College in Portland, OR

  • Last year, WNYC made an interactive map that shows transit times in New York, based on where you clicked. Geography graduate student Andrew Hardin expanded on the idea for San Francisco, Seattle, Boulder, and Denver, with additional options and more granular simulations.

  • Researchers at Princeton released a study claiming that Facebook was on the way out, based primarily on Google search data. Naturally, Facebook didn’t appreciate it much and followed up with their own “study” that debunks the Princeton analysis, with a healthy dose of sarcasm. They also showed that Princeton is on its way to zero enrollment.

    This trend suggests that Princeton will have only half its current enrollment by 2018, and by 2021 it will have no students at all, agreeing with the previous graph of scholarly scholarliness. Based on our robust scientific analysis, future generations will only be able to imagine this now-rubble institution that once walked this earth.

    While we are concerned for Princeton University, we are even more concerned about the fate of the planet — Google Trends for “air” have also been declining steadily, and our projections show that by the year 2060 there will be no air left.

    Crud. Dibs on the oxygen tanks.
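
    The joke works because the method is so simple: fit a straight line to a declining series and read off where it hits zero. Here is a rough sketch of that idea in Python. The numbers are made up for illustration and are not the values from either analysis, and the function names are my own.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def zero_crossing(slope, intercept):
    """Value of x where the fitted line reaches zero."""
    return -intercept / slope


# Made-up index values for illustration only; not data from either analysis.
years = [2009, 2010, 2011, 2012, 2013]
index = [100, 90, 80, 70, 60]

slope, intercept = fit_line(years, index)
doom_year = zero_crossing(slope, intercept)
print(f"Line hits zero around {doom_year:.0f}")
```

    Any downward trend will eventually cross zero if you extend the line far enough, which is exactly the point of the parody.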

  • Dennis Hlynsky, an artist and a professor at the Rhode Island School of Design, recorded videos of flying birds and, in post-processing, kept each bird’s previous flight positions visible for less than a second. The results are beautiful. It’s like the video version of long-exposure photography.

    This is just one video in the series. Also see this, this, and this. [via Colossal]

  • Remember when Amy Webb created a bunch of fake male profiles to scrape data from two dating sites and analyze it to find a husband? Mathematician Chris McKinlay took a similar route to find a girlfriend (and now fiancee). However, unlike Webb, who used a relatively small sample, McKinlay scraped data for thousands of profiles in his area and analyzed the data more thoroughly, in search of the perfect mate.

    For McKinlay’s plan to work, he’d have to find a pattern in the survey data—a way to roughly group the women according to their similarities. The breakthrough came when he coded up a modified Bell Labs algorithm called K-Modes. First used in 1998 to analyze diseased soybean crops, it takes categorical data and clumps it like the colored wax swimming in a Lava Lamp. With some fine-tuning he could adjust the viscosity of the results, thinning it into a slick or coagulating it into a single, solid glob.

    He played with the dial and found a natural resting point where the 20,000 women clumped into seven statistically distinct clusters based on their questions and answers. “I was ecstatic,” he says. “That was the high point of June.”

    He selected the two clusters most to his liking, looked at what interested the women, and then adjusted his profile accordingly. He didn’t lie. He just emphasized the traits that he possessed and that women tended to like. Then he waited for women to notice him.

    It’s kind of like he built a targeted advertising system for himself and then cast a really wide net. Even though McKinlay is engaged now, I still wonder if it actually worked or if something similar might have happened if he left it to chance. I like to believe in the latter. He did after all go on dates with 87 other people before finding a match.
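
    For a feel of what K-Modes does with categorical answers, here is a minimal sketch of the textbook form of the algorithm in Python. This is not McKinlay’s modified version, and the survey answers are made up. It works like k-means, except that distance is the number of mismatched answers and each cluster center is the most common answer per question.

```python
import random
from collections import Counter


def hamming(a, b):
    """Count the positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most common value in each column of a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, max_iter=100, seed=0):
    """Basic k-modes: assign to nearest mode, recompute modes, repeat."""
    rng = random.Random(seed)
    # Start from k distinct records picked at random.
    modes = rng.sample(sorted(set(records)), k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: hamming(r, modes[j])) for r in records
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster empties out
                modes[j] = column_modes(members)
    return labels, modes


# Made-up answers to three hypothetical survey questions.
answers = [
    ("yes", "often", "cats"),
    ("yes", "often", "cats"),
    ("yes", "sometimes", "cats"),
    ("no", "never", "dogs"),
    ("no", "never", "dogs"),
    ("no", "rarely", "dogs"),
]

labels, modes = k_modes(answers, k=2)
print(labels)
print(modes)
```

    The number of clusters, k, is something you choose up front, so finding a grouping that makes sense usually means trying a few values and looking at the results.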

  • Kiln and the Guardian explored the 100-year history of passenger air travel. The interactive kicks off with a map that uses live flight data from FlightStats to show every flight in the air right now. Nice.

    Be sure to click through all the tabs. They’re worth the watch and listen, with a combination of narration, interactive charts, and old photos.

    And of course, if you like this, you’ll also enjoy Aaron Koblin’s classic Flight Patterns.

  • Since so many of you kind people asked, the movie-quotes-as-charts graphic is now coming to a poster near you. Take advantage of the early-bird pricing and pre-order the print now.

    The poster is 24 inches wide by 36 inches tall, printed on 80 lb. cover stock with a matte finish. I’ll sign and hand-number each of them.

    I’ll take orders for a week, and then it’s off to the printers. Printing usually takes a week or two, depending on how many there are, and then I’ll roll and mail everything myself. So if all goes as planned, the posters go out in February.

    Thanks all for your interest. And one more time: Get your pre-order in here.

  • You can now wear a MagicBand when you enter Disneyland to get a more personalized experience, and in return, the park gets to know what its customers are up to. John Foreman, the chief data scientist at MailChimp, describes the new data toy after a trip to the happiest place on Earth.

    What does Disney get out of the deal? In short, it tracks everything you do, everything you buy, everything you eat, everything you ride, everywhere you go in the park. If the goal is to keep you in the park longer so you’ll spend more money, it can build AI models on itineraries, show schedules, line length, weather, etc., to figure out what influences stay length and cash expenditure. Perhaps there are a few levers they can pull to get money out of you.

    I knew Disney imagineers kept track of park activity, such as line length and congestion areas, but this takes it to the next level. Is it weird that I’m curious how this would work at home?

  • Two Google research groups, Big Picture and Music Intelligence, got together and made a music timeline baby.

    The Music Timeline shows genres of music waxing and waning, based on how many Google Play Music users have an artist or album in their music library, and other data (such as album release dates). Each stripe on the graph represents a genre; the thickness of the stripe tells you roughly the popularity of music released in a given year in that genre. (For example, the “jazz” stripe is thick in the 1950s since many users’ libraries contain jazz albums released in the ’50s.) Click on the stripes to zoom into more specialized genres.

    As you’d expect, the initial view is a stacked area chart that represents the popularity of genres over time, which feels fairly familiar. But then you interact with the stacks and it gets more interesting, and it’s almost surprisingly fast. The best part is the pointers to specific albums as you mouse over.

  • As part of its “100 Years...” series, the American Film Institute selected the 100 most memorable quotes from American cinema. A few years ago, for kicks and giggles, I put the first eight quotes into chart form. I planned to chartify all 100, but I got distracted.

    Lately though, finishing what I started became my distraction. So here it is: the 100 most memorable quotes in chart form, and I can finally put it to rest. See the big version for more detail.

    Also available in print.

  • The Donald Duck family tree is huge. Who knew? Above is only a sample. See the full version here.

  • Using data from linguistics research by Kostiantyn Tyshchenko, Teresa Elms clustered European languages in this network graph. If you look closely, you might wonder why English is considered a Germanic language. Elms explains:

    So why is English still considered a Germanic language? Two reasons. First, the most frequently used 80% of English words come from Germanic sources, not Latinate sources. Those famous Anglo-Saxon monosyllables live on! Second, the syntax of English, although much simplified from its Old English origins, remains recognizably Germanic. The Norman conquest added French vocabulary to the language, and through pidginization it arguably stripped out some Germanic grammar, but it did not ADD French grammar.

  • Most people, at least those who visit sites like FlowingData, know about map projections. You have to do math to get the globe, a thing that exists in this three-dimensional world, into a two-dimensional space. The often-noted scene from The West Wing explains a bit, some demos help you compare, and there are map games that highlight distortions.

    But it can still be fuzzy, because most of us don’t deal with the true shape and size of countries regularly. These figures from Elements of map projection with applications to map and chart construction, published in 1921, take a different route and use a face, something familiar, to show distortions. Foreheads get bigger, ears get smaller, noses change sizes, and projections are easier to understand. [via io9]
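
    To put numbers on that kind of stretching, here is a small Python sketch of one well-known projection, the spherical Mercator. It is only an example of the math involved, not necessarily one of the projections shown in the figures, and the function names are my own.

```python
import math


def mercator(lat_deg, lon_deg, radius=1.0):
    """Project latitude/longitude in degrees to spherical Mercator x, y."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * lon
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def mercator_scale(lat_deg):
    """Linear stretch factor at a latitude; areas inflate by its square."""
    return 1.0 / math.cos(math.radians(lat_deg))


# Print how much lengths and areas are exaggerated at a few latitudes.
for lat in (0, 30, 60, 75):
    k = mercator_scale(lat)
    print(f"latitude {lat:>2}: lengths x{k:.2f}, areas x{k * k:.2f}")
```

    At 60 degrees latitude the stretch factor is 2, so areas there appear four times larger than they would at the equator. That is the forehead getting bigger.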

  • The person in this cartoon nailed it. I’m going to do the same starting this weekend, and I recommend that you do too, if you want to live longer.

    (Couldn’t find where this is from. Anyone know?)

  • Hyperakt and Ekene Ijeoma visualized migrations over time and space in The Refugee Project. The interactive is based on United Nations data, which is naturally limited in scope, because it’s difficult to count undocumented migrations, but there is plenty to learn here about major political and social events in history.

    The map starts in 1975, and with each tick of a year, the circles adjust to show outgoing numbers. Mouse over a circle, and you can see estimates for where people went, which is represented with extending lines.

    Document icons appear over the locations of major events and provide more context about what happened in the country. This is key. I just wish there were more of them. It’d provide an even better history lesson.