• I’m late to this party. TileMill, by mapping platform MapBox, is open source software that lets you quickly and easily create and edit maps. It’s available for OS X, Windows, and Ubuntu. Just download and install the program, and then load a shapefile for your point of interest.

    For those unfamiliar, a shapefile is a file format that describes geospatial data, such as polygons (e.g. countries), lines (e.g. roads), and points (e.g. landmarks), and shapefiles are pretty easy to find these days. For example, you can download detailed shapefiles for roads, bodies of water, and blocks in the United States from the Census Bureau in just a few clicks.

    The fun part is that you can easily customize the maps using a map stylesheet, which is similar to CSS. There are examples with the software, so you can get a feel for how everything fits together. You can also export your results as an image file or as SVG to edit in your favorite vector-editing software. Or if you want to publish your map online, it’s straightforward to upload it to MapBox with an account.

  • During the Olympics, Studio NAND, Moritz Stefaner, and Drew Hemment tracked Twitter sentiment with Emoto. This interactive installation and data sculpture is the last leg of the project.

    The emoto data sculpture represents message volumes, aggregated per hour and sentiment level in horizontal bands which move up and down according to the current number of Tweets at each time. This resulted in simplified 3-dimensional surfaces which allows visitors to identify patterns in message frequency distribution more easily. And while not being specifically designed in this direction, the surfaces also nicely support haptic exploration.

    The sculpture itself is black and unchanging, and it’s used as a projection surface to display a heat map and overlay text. The projection is controlled by the user, which makes for an interesting blend of physical and digital.
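
    For a rough idea of the aggregation described in the quote, here is a minimal Python sketch that counts messages per hour and sentiment level. The sample records, the integer sentiment scale, and the function name `hourly_volume` are all made up for illustration; the post doesn't say how Emoto actually stores or scores tweets.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records: (timestamp, sentiment level).
# The real Emoto data and its sentiment scale are not described in the post.
tweets = [
    (datetime(2012, 8, 5, 20, 15), 2),
    (datetime(2012, 8, 5, 20, 40), 2),
    (datetime(2012, 8, 5, 20, 55), -1),
    (datetime(2012, 8, 5, 21, 5), 2),
]


def hourly_volume(records):
    """Count messages for each (hour, sentiment level) pair."""
    counts = Counter()
    for timestamp, level in records:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(hour, level)] += 1
    return counts


volume = hourly_volume(tweets)
```

    Each sentiment level would then correspond to one horizontal band, with the count for each hour setting the height of the band at that point in time.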

  • A couple of years ago, xkcd ran a survey that asked people to name colors. Stephen Von Worley plotted that data by gender in an interactive.

    That’s a dot for each of the 2,000 most commonly-used color names as harvested from the 5,000,000-plus-sample results of XKCD’s color survey, sized by relative usage and positioned side-to-side by average hue and vertically by gender preference. Women tend to use color names nearer the top, men towards the bottom, and the dashed line represents the 50-50 split (equal usage by both sexes).

    While his original version was static, the interactive version lets you sort by hue, saturation, brightness, popularity, and name length. Most importantly, you can see the color names now when you mouse over. I like the vertical spectrum of purple, where women use names like bright lilac, orchid, and heather, and men tend to label similar shades as purplish, lightish purple, and oh yes, very light purple. [Thanks, Stephen]

  • Thomas H. Davenport and D.J. Patil give the rundown on what a data scientist is, what to look for and how to hire them. It’s an article in Harvard Business Review, so it’s geared towards managers, and I felt like I was reading a horoscope at times, but there are some interesting tidbits in there.

    Data scientists don’t do well on a short leash. They should have the freedom to experiment and explore possibilities. That said, they need close relationships with the rest of the business. The most important ties for them to forge are with executives in charge of products and services rather than with people overseeing business functions. As the story of Jonathan Goldman illustrates, their greatest opportunity to add value is not in creating reports or presentations for senior executives but in innovating with customer-facing products and processes.

    I still call myself a statistician. The main difference between data scientist and statistician seems to be programming skills, but if you’re doing statistics without code, I’m not sure what you’re doing (other than theory).

    Update: This recent panel from DataGotham also discusses the data scientist hiring process. [Thanks, Drew]

  • This month the Netherlands held national elections, and now that the results are in, interaction designer Jan Willem Tulp had a look at voting similarity between cities. I’m not sure what metric was used to judge similarity, but it looks like it was based on voting distributions for candidates.

    Each circle represents a city, and you can choose between a geographic layout or a radial one. When you select a circle, the others change size and color, where more red and larger means more similar. In the radial layout, circles that are farther away are less similar. Be sure to look at the city of Urk in the radial layout. According to Tulp, it’s the most religious city, and it votes completely differently from the rest. [Thanks, Jan]

  • I’m not sure what I’d do with Ablaze.js, a JavaScript library by Patrick Gunderson, but the results are sexy. Play around with the app here. [via @jeffclark]

  • The Forest of Advocacy is a series of animations that explores the political contribution patterns among eight organizations, such as Bain Capital, Goldman Sachs, and Harvard Business School.

    These visualizations provide a dynamic look at the partisan tilt of giving within organizations. For each organization, individuals are characterized as points sketching out a line over time. The X axis is time, and the Y axis represents the net partisan tilt of contributions over the preceding 6 months. Over the decades, one sees lines sketched out, reflecting the partisanship of individuals over time. For each organization, we also provide the net contributions of the entire organization, and the names of biggest Democratic, Republican, and “bipartisan” contributors (the individual with the highest product of Democratic and Republican contributions).

    At the core, each animation is a time series chart, but the aesthetic and the narrated animation provide a more organic feel. In particular, the movements of people, represented by squares shifting straight across or up and down, make it easy to see consistent and not so consistent contributions. [Thanks, Mauro]
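
    The two measures in the quote can be sketched in a few lines of Python. The "bipartisan" score (product of Democratic and Republican contributions) comes straight from the description. The tilt formula, (D - R) / (D + R) over a trailing window of about six months, is my own assumption, since the project's exact definition isn't given here. The records and function names are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical records: (donor, date, party, amount in dollars).
contributions = [
    ("A", date(2008, 1, 10), "D", 500.0),
    ("A", date(2008, 3, 2), "R", 500.0),
    ("B", date(2008, 2, 1), "D", 2000.0),
    ("C", date(2008, 2, 15), "R", 100.0),
    ("C", date(2007, 1, 1), "D", 100.0),
]

WINDOW = timedelta(days=182)  # assumption: "6 months" as roughly 182 days


def party_totals(records, donor, as_of=None):
    """Sum a donor's Democratic and Republican giving.

    With as_of set, only contributions in the window ending on that
    date are counted; otherwise all contributions are counted.
    """
    dem = rep = 0.0
    for name, when, party, amount in records:
        if name != donor:
            continue
        if as_of is not None and not (as_of - WINDOW < when <= as_of):
            continue
        if party == "D":
            dem += amount
        elif party == "R":
            rep += amount
    return dem, rep


def tilt(records, donor, as_of):
    """Assumed tilt: +1 is all Democratic, -1 is all Republican."""
    dem, rep = party_totals(records, donor, as_of)
    total = dem + rep
    return None if total == 0 else (dem - rep) / total


def most_bipartisan(records):
    """Donor with the highest product of Democratic and Republican totals."""
    donors = {record[0] for record in records}

    def product(donor):
        dem, rep = party_totals(records, donor)
        return dem * rep

    return max(donors, key=product)
```

    Under this assumed formula, a donor who gives equally to both parties in the window sits at zero, which would correspond to a square moving straight across the middle of the chart.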

  • Aaron Reuben and Gabriel Isaacman used data from sampling air in tunnels, where there are a lot of cars, to create unique soundscapes that represent the chemicals in the area.

    We created sounds from air samples (atmospheric particulate matter collected on filters) by first using gas chromatography to separate the thousands of compounds in the air (try it with markers at home) and then using mass spectrometry, which gives us a unique “spectrum” for chemicals based on their structure, to identify the compounds and assign them tones. Some compounds end up sounding clear and distinct, while others blur together into unresolvable chords. The result is a qualitative, sensory experience of hard, digital data. You can actually hear the difference between the toxic air of a truck tunnel (clogged with diesel hydrocarbons and carcinogenic particulate matter) and the fragrant air of the High Sierras.

    The audio above represents the air in the Caldecott Tunnel in Oakland, California. Note the heavy hydrocarbons towards the end. Contrast that with the audio for a remote forest in the Sierras below.

  • Emily Chow, Ted Mellnik, and Karen Yourish for The Washington Post mapped where the candidates and their wives have visited since June in an interactive with filters and multiple views.

    On load, you see the visits of the eight, with a comparison between Democrats and Republicans. The map on top shows where, and the time series on the bottom shows when. Click on the map, and it zooms to show visits at city-level, and a click on a time slice updates a list of individual visits. Furthermore, you can select the individuals or categories for just the last 30 days, fundraisers, or your state.

    The interaction lets you narrow down quickly and easily to what you care about. The only other thing I would’ve liked to see is a tighter coupling between the time series and the map.