  • A decade of college degrees

    North by Northwestern looked closer at degrees awarded by its university over the past decade. Simply enter a degree to see the trend. As the makers note, the number of degrees is a lagging indicator of a major's popularity, since people pick their major and graduate three years later.

    Be sure to keep scrolling past the interactive for some explainers. Also, you can download the time series data for your own perusal via the link in the footnote.

  • Evolution and history of London

    Posted to Mapping

    Using data from the National Heritage List for England, the London Evolution Animation shows the historical development of London. It mainly depicts roads and protected buildings, starting with the first road network built in 410 and continuing into the present. The notes at the bottom give it the quality of a timeline rather than a hey-look-at-London-change video.

    [via kottke]

  • Gotham City map

    Posted to Mapping

    In 1998, artist Eliot R. Brown created a map of Gotham City for the Batman No Man's Land series. Brown describes the process of making the map, which was meant to look something like Manhattan, with a lot more villains and a way for the federal government to blast the bridges and tunnels to the outside world.


    [via Smithsonian]

  • Distribution of letters in the English language

    Some letters in the English language appear more often at the beginning of words. Some appear more often at the end, and others show up in the middle. Using the Brown corpus from the Natural Language Toolkit, David Taylor looked closer at letter position and usage.

    I've had many "oh, yeah" moments looking over the graphs. For example, words almost never begin with "x", but it's quite common as the second letter. There's a little hump near the beginning of the "u" distribution that's caused by its proximity to "q", which is most common at the beginning of a word. When you remove "q" from the dataset, the hump disappears. "F" occurs toward the extremes, especially in prepositions ("for", "from", "of", "off"), but rarely just before the middle.
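    To get a feel for how charts like these could be built, here's a minimal Python sketch. It's an assumption about the approach, not Taylor's actual code: each letter's position in a word is scaled to run from 0 (first letter) to 1 (last letter), then counted in equal-width bins. The word list is a toy stand-in; with NLTK installed and the Brown corpus downloaded, `nltk.corpus.brown.words()` could supply the real words.

```python
from collections import defaultdict


def letter_position_histogram(words, bins=5):
    """Count, for each letter, how often it lands in each relative-position bin.

    Relative position is index / (len(word) - 1), so 0.0 is the first letter
    and 1.0 is the last. One-letter words and tokens with non-letters are
    skipped (an assumption; punctuation tokens are common in corpora).
    """
    hist = defaultdict(lambda: [0] * bins)
    for word in words:
        word = word.lower()
        if len(word) < 2 or not word.isalpha():
            continue
        last = len(word) - 1
        for i, ch in enumerate(word):
            rel = i / last
            # Map [0, 1] onto bins 0..bins-1; rel == 1.0 goes in the final bin.
            b = min(int(rel * bins), bins - 1)
            hist[ch][b] += 1
    return dict(hist)


words = ["for", "from", "of", "off", "box", "exit", "quick", "queen"]
hist = letter_position_histogram(words)
print(hist["f"])  # [2, 0, 1, 0, 2]
print(hist["x"])  # [0, 1, 0, 0, 1]
```

    Even with eight words, "f" piles up at the ends and "x" shows up second in "exit". Dividing each letter's bin counts by its total would turn the counts into the kind of per-letter distribution shown in the graphs.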

    Next step: letter proximity.

  • Tron-style dashboard shows Wikipedia and GitHub streams

    As a fun learning exercise, Rob Scanlon made a dashboard that shows GitHub and Wikipedia changes in the style of the graphics in Tron: Legacy.

    Hello User. This is a reproduction of the graphics in the boardroom scene in Tron: Legacy. If you have not seen that movie, check out this background material on the making of that scene before proceeding.

    To make this a bit more fun, the boardroom is configured to visualize live updates from Github and Wikipedia, with more streams to come. Click on a stream in the window to the right to continue.

    Type "cd github" and "run github.exe" for maximum pleasure.

  • Where Bars Outnumber Grocery Stores

    Back in 2008, the Floatingsheep group collected data about the number of bars across the United States and compared those counts against the number of grocery stores. Their map showed what they called the "beer belly of America": a much higher than average number of bars in the Wisconsin area.

    I came back to the map recently, and three questions came to mind:

    1. The original map only showed a binary comparison. That is, areas were either colored as more bars or more grocery stores. What if we mapped the magnitude of the difference?
    2. The data from 2008 comes from the now-defunct Google Maps Directory and only represented references to bars and grocery stores (which maybe made the previous bullet point not worth doing then). Would the newer Google Places API provide more detail?
    3. What about other countries?

    I started with the first two questions and went from there.
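    For the first question, one way to go beyond the binary comparison is a log ratio of bars to grocery stores for each area. This is a sketch under my own assumptions (the pseudocount and the base-2 log are choices, not something from the original map), and the counts below are made up.

```python
import math


def log_ratio(bars: int, groceries: int, pseudocount: float = 1.0) -> float:
    """log2(bars / grocery stores), with a pseudocount so zero counts don't break it.

    0 means parity, +1 means roughly twice as many bars, and -1 means roughly
    twice as many grocery stores.
    """
    return math.log2((bars + pseudocount) / (groceries + pseudocount))


def binary_label(bars: int, groceries: int) -> str:
    """The original map's two-way comparison, plus a tie case."""
    if bars == groceries:
        return "tie"
    return "more bars" if bars > groceries else "more grocery stores"


# Made-up counts for three areas: (bars, grocery stores).
areas = {"A": (15, 3), "B": (4, 3), "C": (1, 7)}
for name, (bars, groceries) in areas.items():
    print(name, binary_label(bars, groceries), round(log_ratio(bars, groceries), 2))
# A more bars 2.0
# B more bars 0.32
# C more grocery stores -2.0
```

    Areas A and B get the same color on a binary map, but the log ratio separates a lopsided area from a near tie, and it's symmetric around zero, which works well with a diverging color scale.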

  • Careers after the college degree

    Posted to Network Visualization

    Ben Schmidt, an assistant professor of history at Northeastern University, was curious about careers after college degrees, so he made a quick Sankey diagram of data from the American Community Survey. College degrees are on the left, and professions are on the right. The thicker the line connecting a degree and a profession, the more people take that route.

    For example, if you click on "General Education" you see a lot of people become elementary and middle school teachers. The diagram works the other way around too, so that you can select a profession to see what people in that area tend to major in.
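    Under the hood, a diagram like this boils down to counting degree-to-profession pairs. Here's a minimal Python sketch of that aggregation and the two-way lookup. The records are hypothetical, one per person; real survey microdata such as the ACS typically carries person weights, which this sketch ignores by counting each record once.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (degree, occupation)


def sankey_links(records: Iterable[Pair]) -> Counter:
    """Aggregate (degree, occupation) records into weighted links.

    Each record counts once; a link's weight maps to its line thickness.
    """
    return Counter(records)


def top_routes(links: Counter, node: str, side: str = "degree", n: int = 3) -> List[Tuple[Pair, int]]:
    """Heaviest links touching a node, from either side of the diagram."""
    idx = 0 if side == "degree" else 1
    matches = [(pair, w) for pair, w in links.items() if pair[idx] == node]
    return sorted(matches, key=lambda item: item[1], reverse=True)[:n]


# Hypothetical records, one per person.
records = (
    [("General Education", "Elementary and middle school teachers")] * 3
    + [("General Education", "Education administrators")]
    + [("History", "Elementary and middle school teachers"), ("History", "Lawyers")]
)
links = sankey_links(records)
print(top_routes(links, "General Education"))
print(top_routes(links, "Elementary and middle school teachers", side="occupation"))
```

    Selecting a degree filters on the left side of each pair, and selecting a profession filters on the right, which is all "works the other way around too" requires.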

    As Schmidt says in his description, it was just a quick sketch, so the interaction is rough around the edges, but the data here is kind of interesting to look at, especially with all the graduating kids right now.

  • Death penalty, the executed and the victims

    Posted to Infographics

    The Washington Post provides a look at the death penalty in the United States, from 1977 to the present. On the left is an icon for each person executed, and on the right are the murderers' victims. Be sure to read the annotation for full context.


  • What pregnant women want

    Posted to Statistics

    In another take on the game of what Google suggests while searching, Seth Stephens-Davidowitz for The New York Times looked at queries related to pregnant women. Some searches were similar across countries, whereas others varied culturally.

    Start with questions about what pregnant women can do safely. The top questions in the United States: Can pregnant women "eat shrimp," "drink wine," "drink coffee" or "take Tylenol"?

    But other countries don't look much like the United States or one another. Whether pregnant women can "drink wine" is not among the top 10 questions in Canada, Australia or Britain. Australia's concerns are mostly related to eating dairy products while pregnant, particularly cream cheese. In Nigeria, where 30 percent of the population uses the Internet, the top question is whether pregnant women can drink cold water.

    Stephens-Davidowitz's analysis is mostly anecdotal but a fun read.

    I want to see something like this direct from Google, with more rigor. Now that would be interesting.

  • Strava Metro aims to help cities improve biking routes

    Posted to Data Sharing

    Last month, Strava, which allows users to track their bike rides and runs, launched an interactive map that shows where people move worldwide. That seems to be a lead-in to their larger project Strava Metro. Here's the pitch:

    Strava Metro is a data service providing "ground truth" on where people ride and run. Millions of GPS-tracked activities are uploaded to Strava every week from around the globe. In denser metro areas, nearly one-half of these are commutes. These activities create billions of data points that, when aggregated, enable deep analysis and understanding of real-world cycling and pedestrian route preferences.

    Strava had a handful of clients before the official launch, such as the Oregon Department of Transportation. From Bike Portland:

    Last fall, the agency paid $20,000 for one-year license of a dataset that includes the activities of about 17,700 riders and 400,000 individual bicycle trips totaling 5 million BMT (bicycle miles traveled) logged on Strava in 2013. The Strava bike "traces" are mapped to OpenStreetMap.
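    For a rough sense of scale, here's the back-of-the-envelope arithmetic on the quoted figures. The inputs are the approximate numbers from the quote, so the outputs are approximate too, and they say nothing about how Strava actually prices its licenses.

```python
# Figures quoted above (approximate).
license_usd = 20_000
riders = 17_700
trips = 400_000
miles = 5_000_000  # bicycle miles traveled

per_trip = license_usd / trips    # $0.05 per trip
per_mile = license_usd / miles    # $0.004 per mile
per_rider = license_usd / riders  # roughly $1.13 per rider
avg_trip_miles = miles / trips    # 12.5 miles per trip

print(f"${per_trip:.2f}/trip, ${per_mile:.3f}/mile, ${per_rider:.2f}/rider, {avg_trip_miles} mi/trip")
# $0.05/trip, $0.004/mile, $1.13/rider, 12.5 mi/trip
```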

    This is what I was getting at with those running maps, so it's great to see that Strava was already on it.

    It'll be interesting to see where this goes, not just business-wise, but with data sharing, privacy, and how users react to their (anonymized) data being sold.

  • Your income versus what it feels like

    Posted to Statistical Visualization

    Incomes and the cost of living vary across the country. Some areas might have high median income, but the cost of living is also high. Similarly, areas might have low median income, but the cost of living is relatively low. So what happens when you take the income from the former and then move to the latter? The Bureau of Economic Analysis released estimates that help make that comparison.

    Quoctrung Bui for NPR made that data more accessible with a slope graph. On the left is median income, and on the right is what it feels like. Enter your metro area to focus on your point of interest.
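    The basic adjustment can be sketched in a few lines, assuming the estimates take the form of a price-level index where the national average is 100 (which is how BEA's regional price parities work). The function name, incomes, and index values below are hypothetical, not NPR's figures.

```python
def feels_like_income(income: float, price_parity: float) -> float:
    """Express income in national-average dollars.

    price_parity: price-level index for the area, national average = 100
    (an assumption about the form of the estimates).
    """
    return income * 100.0 / price_parity


# Hypothetical: the same $60,000 in an expensive area and in a cheap one.
expensive = feels_like_income(60_000, 120.0)  # 50000.0
cheap = feels_like_income(60_000, 80.0)       # 75000.0
print(expensive, cheap, cheap / expensive)    # 50000.0 75000.0 1.5
```

    In this made-up example, carrying the same nominal income from the pricier area to the cheaper one makes it feel 50 percent larger, which is the former-to-latter move described above.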

  • Machine learning a cappella on overfitting

    Posted to Statistics

    From the machine learning course on Udacity, an a cappella group sings a Thriller parody on overfitting. At first you're like, "Is this real? Am I dreaming?" Then you're like, "Oh my god, he has a gold glove on." And then you're like, "Yes! This is real! Oh internets, I adore you so."

  • Military infographic fascination

    Posted to Infographics

    Paul Ford describes his fascination with military infographics. Here's what he has to say about the graphic above:

    Take some time with that graphic. After a while you realize that this image could be used anywhere in any paper or presentation and make perfect sense. This is a graphic that defines a way of describing anything that has ever existed and everything that has ever happened, in any situation. The United States Military is operating at a conceptual level beyond every other school of thought except perhaps academic philosophy, because it has a much larger budget.

    Never mind the aesthetics and readability. It's the content and the scale at which these graphics are presented that make them fascinating. Okay, and maybe the aesthetics and readability add to the entertainment value, too.

  • Beaker allows data exploration in various languages

    Posted to Software

    Currently in beta, Beaker lets you work and experiment with data in different languages, all in one environment.

    Beaker is a code notebook that allows you to analyze, visualize, and document data using multiple programming languages including Python, R, Groovy, Julia, and Node. Beaker's plugin-based polyglot architecture enables you to seamlessly switch between languages and add support for new languages.

    Sounds like a good place to tuck away your snippets or development in the early stages of larger projects.

  • The United States of Metrics isn’t such a bad thing

    Posted to Self-surveillance

    Bruce Feiler for The New York Times describes his concern and distaste for data collection and analysis.

    In the last few years, there has been a revolution so profound that it's sometimes hard to miss its significance. We are awash in numbers. Data is everywhere. Old-fashioned things like words are in retreat; numbers are on the rise. Unquantifiable arenas like history, literature, religion and the arts are receding from public life, replaced by technology, statistics, science and math. Even the most elemental form of communication, the story, is being pushed aside by the list.

    The results are in: The nerds have won. Time to replace those arrows in the talons of the American eagle with pencils and slide rules. We've become the United States of Metrics.

    That's how the full article reads. Grouchy.

    Feiler jumps into a handful of examples that could've easily been used as positives, had they been in an article about the boom of data. For instance, he scoffs at a project from New York University and Hudson Yards that aims for a "smart community" that tracks pedestrian traffic, air quality, and energy consumption. Is it better to not know these things? Should we rely entirely on word of mouth for every problem in a city that can easily be fixed? That's a tough sell.

    He does suggest that we need balance between data-informed and data-only decisions, and yes to this absolutely, but he also suggests that we've already reached a maximum for the amount of data we want in our lives.

    The underlying premise is that if we observe, journal, and experiment with our lives through data, we take away from the joy of living. Sports are less fun to watch and food doesn't taste as good. That's another tough sell.

    Here's how I see it: I strongly believe in going with your gut instincts. Mine have led me in the right direction more often than not. But sometimes I move in the wrong direction, or I don't know enough about a subject and all I have is uncertainty. If there's data to help, all the better.

  • Drought map shows extreme shortages

    Posted to Mapping

    From the U.S. National Drought Monitor.

    The entire state of California is in some level of drought, much of it extreme to exceptional. Snowpack is 50 percent of normal in many locations in the West, and Svoboda noted that a lot of snow has completely melted before it normally would.

    Drought has had a serious impact on fruit and vegetable agriculture in California, and news reports sounded the alarm for grains and livestock in the Plains and South Central West. At least 54 percent of the nation’s wheat crop is affected by some level of drought, as is 30 percent of corn, and 48 percent of cattle.

    Hey, Californians, if you could dial your sprinklers down a couple notches so that we can bathe this summer, that'd be great. Thanks.

  • A majority of your email in Gmail, even if you don’t use it

    Posted to Statistics

    For reasons of autonomy, control, and privacy, Benjamin Mako Hill runs his own email server. After a closer look, though, he realized that much of the email he sends ends up in Gmail anyway.

    Despite the fact that I spend hundreds of dollars a year and hours of work to host my own email server, Google has about half of my personal email! Last year, Google delivered 57% of the emails in my inbox that I replied to. They have delivered more than a third of all the email I've replied to every year since 2006 and more than half since 2010. On the upside, there is some indication that the proportion is going down. So far this year, only 51% of the emails I've replied to arrived from Google.

    Factor in other services such as Yahoo and Hotmail, and I imagine that majority percentage goes up quite a bit. If you want to count the Gmail share of your own inbox, Hill posted the scripts for your perusal.

    This tutorial on downloading email metadata might be helpful too, if you're looking for a more general script.
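    If you just want a rough number, here's a minimal Python sketch that uses only the standard library. It is not Hill's script. It assumes a message counts as delivered by Google when any of its Received headers mentions a Google mail host, which is a crude heuristic, and unlike Hill's analysis it doesn't restrict itself to emails you replied to. The two sample messages are made up.

```python
import email
from email.message import Message
from typing import Iterable

# Assumption: these substrings in a Received header indicate Google's mail servers.
GOOGLE_MARKERS = ("google.com", "googlemail.com")


def delivered_by_google(msg: Message) -> bool:
    """Crude heuristic: any Received header mentions a Google mail host."""
    for header in msg.get_all("Received", []):
        if any(marker in str(header).lower() for marker in GOOGLE_MARKERS):
            return True
    return False


def google_share(messages: Iterable[Message]) -> float:
    """Fraction of messages that the heuristic attributes to Google."""
    msgs = list(messages)
    if not msgs:
        return 0.0
    return sum(delivered_by_google(m) for m in msgs) / len(msgs)


samples = [
    "Received: from mail-wi0-f171.google.com by mx.example.org\n"
    "From: friend@gmail.com\nSubject: hi\n\nHello!\n",
    "Received: from smtp.example.net by mx.example.org\n"
    "From: colleague@example.net\nSubject: hi\n\nHello!\n",
]
messages = [email.message_from_string(raw) for raw in samples]
print(google_share(messages))  # 0.5
```

    To run it on a real mailbox exported in mbox format, you could pass `mailbox.mbox("inbox.mbox")` (the path is hypothetical) to `google_share`, since the messages it yields are `email.message.Message` subclasses.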

  • Newborn false positives

    Posted to Mistaken Data

    Shutterfly sent promotional emails that congratulate new parents and encourage them to send thank you cards. The problem: a lot of people on that list weren't new parents.

    Several tipsters forwarded us the email that Shutterfly sent out in the wee small hours of this morning. One characterized the email as "data science gone wrong." Another says that she had actually been pregnant and would have been due this month, but miscarried six months ago. Is it possible that Shutterfly analyzed her search data and just happened to conclude, based on that, that she would be welcoming a child around this time? Or is it, as she hoped via email, "just a horrible coincidence?"

    Only Shutterfly knows what actually happened (they insist it was a random mistake), but it sounds like a naive use of data somewhere in the pipeline. Maybe someone remembered the Target story, got excited, and forgot about the repercussions of false positives. Or maybe someone made an incorrect assumption about data points with certain purchases and didn't test thoroughly enough.

    In any case, this slide suddenly takes on new meaning.
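    The arithmetic behind the false-positive worry is worth spelling out. When the thing you're trying to detect is rare, even a fairly accurate model flags mostly the wrong people. Here's a quick Bayes' rule calculation in Python with made-up numbers, not anything from Shutterfly.

```python
def precision(base_rate: float, sensitivity: float, false_positive_rate: float) -> float:
    """Share of flagged people who are true positives, via Bayes' rule."""
    true_pos = sensitivity * base_rate
    false_pos = false_positive_rate * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Made-up scenario: 2% of a mailing list are new parents, the model catches
# 90% of them, and it wrongly flags 5% of everyone else.
p = precision(base_rate=0.02, sensitivity=0.90, false_positive_rate=0.05)
print(round(p, 2))  # 0.27
```

    Under those assumptions, roughly three out of four "congratulations" emails would go to people who aren't new parents.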

  • Alcohol consumption per drinker

    We've seen rankings for alcohol consumption per capita around the world. These tend to highlight where people drink and abstain, but what about consumption among only those who drink? The Economist looked at this sub-population. Towards the top, you see countries where much of the population abstains but those who do drink appear to drink at higher volumes.

    Of course, it's better to take this with a grain of salt until you see the standard errors on these estimates.
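    The per-drinker figure is presumably per capita consumption divided by the share of people who drink; that's an assumption about The Economist's method. The Python sketch below also shows why the grain of salt matters: when the drinking share is small, a small error in that share moves the per-drinker estimate a lot. All numbers are made up.

```python
def per_drinker(per_capita_litres: float, drinker_share: float) -> float:
    """Litres of pure alcohol per drinker, assuming abstainers drink nothing."""
    if not 0 < drinker_share <= 1:
        raise ValueError("drinker_share must be in (0, 1]")
    return per_capita_litres / drinker_share


# Hypothetical countries: A has lower per capita consumption but fewer drinkers.
country_a = per_drinker(6.0, 0.40)   # about 15 litres per drinker
country_b = per_drinker(10.0, 0.90)  # about 11.1 litres per drinker

# If country A's drinking share is really 0.45 or 0.35, the estimate swings a lot.
low = per_drinker(6.0, 0.45)   # about 13.3
high = per_drinker(6.0, 0.35)  # about 17.1
```

    Country A ranks below B per capita but above it per drinker, and a plus-or-minus 0.05 wobble in A's drinking share moves its estimate by nearly four litres.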

  • Share your traces with a stranger

    Posted to Self-surveillance

    The MIT Media Lab Playful Systems group is working on an experiment in data sharing, on a personal level. It's called 20 Day Stranger. You install an app on your phone that tracks your location and what you're doing, and that information is anonymously shared with a stranger. You also see what that stranger is doing.

    I can't decide if this is creepy or touching, or somewhere in between. I put myself on the waiting list to find out, but I imagine the experience has a little bit to do with the app and much more to do with the stranger on the other side.