  • RJ Andrews has a visualization design book coming out in January called Info We Trust. He hand-drew about 300 graphics for the book. One of the reasons:

    I decided very early that Info We Trust would not use any existing images, mine or others. Found examples work fine for certain books. They are also convenient, the work is already done! But found images bring baggage too. You might choose an existing work to highlight one aspect of its design. But the reader will see other facets too.

    I’m intrigued. On the upside, you get a consistent visual flow. A focused point of view. On the downside: the possibility of a view that is too focused. I have a feeling there will be more upside than downside. [Amazon pre-order]

  • Michael Correll on the use of “visualization literacy” in research:

    If people (and, by some definitions, many or even most people) are chart illiterates, then we may feel tempted to write those groups off. We may prioritize the design of visualizations to help the creators of, say, machine learning models, from whom we can presume a sufficient level of visual and statistical literacy, rather than the populations who may be impacted by these models (sometimes unjustly). If what we mean by “visualization literacy” is narrow enough, or rare enough, then we’re already setting ourselves mental upper bounds for the number of people we’ll impact with our work.

    This was an interesting perspective. I’m used to listening to or reading from people on the presentation side of visualization, in which case it’s your job to raise literacy. You should imagine what others are thinking and explain any points of possible confusion with annotation and intuitive visual encodings.

    Don’t ever use “people won’t understand it” as a crutch.

  • Jonathan Schwabish gave his fourth-grade son’s class a lesson on data visualization. He wrote about his experience:

    I’d love to see a way to make data visualization education a broader part of the curriculum, both on its own and linked with their math and other classes. Imagine adding different shapes to maps in their Social Studies classes to encode data or using waterfall charts in their math classes to visually demonstrate a simple mathematical equation or developing simple network diagrams in science class. The combination of the scientific approach to data visualization and the creativity it sparks could serve as a great way to help students learn.

    Maybe I should introduce Schwabish’s Match It Game to the Yau household. My five-year-old has been asking why I keep “doing data.”

  • Members Only

    Throughout the month I collect new tools for data and visualization and additional resources on designing data graphics. Here’s the new stuff for November.

  • Brian Brettschneider made a joke map randomly designating the favorite pies of certain areas. While intended as a joke and a parody of past “favorite” maps, some people took it too seriously — like Senator Ted Cruz. Brettschneider describes the lessons he learned:

    Maps hold a special standing among the public. We tend to place very high value in maps as holders of accurate information. If it’s in a map, it must be true, right? If I had tweeted a joke list of favorite pies by region, it would be very quickly ignored. Since it was in map form, it had an air of authenticity. This matters. As cartographers, we need to be cognizant of the power that maps have.

    I had a similar experience years ago with Data Underload. These days I use the project as a catch-all for my own analyses, but I started it as a comic-ish chart series that I used to communicate anecdotes, opinions, and random stuff based on no actual data.

    The form confused people at times, especially those unfamiliar with the series, because it looked so much like charts based on real data. It was like reading a sarcastic comment online from someone you don’t know and trying to guess if it’s serious or not.

    So now I keep the real-looking charts for real data unless it’s super obvious. Sometimes your choices on form can lead to unintended interpretations — like a joke taken as serious commentary.

  • Based on the “half-your-age-plus-seven” rule, the range of people you can date expands with age. Combine that with population counts and demographics, and you can find when your non-creepy dating pool peaks.
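
    For the arithmetic-minded, here is a minimal sketch of the rule in Python. It assumes the common reading where the youngest acceptable age is half your age plus seven and the oldest is whoever could apply the same rule to you. The function names and the idea of passing counts as a plain dictionary are hypothetical, and the sketch ignores demographics beyond age.

```python
def dating_range(age):
    """Return (youngest, oldest) under the half-your-age-plus-seven rule.

    Assumption: the upper bound is the rule applied in reverse, i.e. the
    oldest person for whom you are still at least half their age plus seven.
    """
    youngest = age / 2 + 7
    oldest = 2 * (age - 7)
    return youngest, oldest


def pool_size(age, population_by_age):
    """Sum hypothetical population counts for ages inside the range."""
    youngest, oldest = dating_range(age)
    return sum(
        count
        for other_age, count in population_by_age.items()
        if youngest <= other_age <= oldest
    )


if __name__ == "__main__":
    for age in (20, 30, 40, 50):
        youngest, oldest = dating_range(age)
        print(age, youngest, oldest, oldest - youngest)
```

    Under this reading the width of the range is 1.5 times your age minus 21, so it grows steadily with age. Whether the pool itself grows depends on how many people are actually in that range, which is where the population counts come in.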

  • The maps that we imagine as we think about locations around the world often don’t match up with reality. Betsy Mason for National Geographic explains the discrepancy. On the misalignment of Europe:

    Europe is also often placed much farther south on mental maps than it really is, appearing directly across the Atlantic from the contiguous United States. But it actually lines up better with Canada: Paris is further north than Montreal, Barcelona is at a similar latitude as Chicago, and Venice lines up with Portland, Oregon.

    …and the world as we knew it was never the same again.

  • Researchers at the University of Chicago’s Energy Policy Institute estimated, per country, the number of years of life lost and the number of people affected due to particulate matter in the air. The Washington Post used a mosaic plot, aka a Marimekko chart, to show the differences.

    The width of each column represents total population for a country. The sections in each column represent the number of people who will lose a certain number of years. Color represents average years of life lost.

    These charts are often a bit confusing at first glance, but the scrolling format used here provides some guidance.
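
    If the geometry is the confusing part, a rough sketch of the layout might help. This is not The Washington Post’s code. The helper `marimekko_rects` is hypothetical and the counts are made up for illustration. Each column’s width is its share of the grand total, and each section’s height is its share of the column, so the area of every rectangle is proportional to its count.

```python
def marimekko_rects(columns):
    """Compute rectangles for a Marimekko chart in a unit square.

    columns: list of (label, [segment counts]).
    Returns a list of (label, segment_index, x, y, width, height).
    """
    grand_total = sum(sum(segments) for _, segments in columns)
    rects = []
    x = 0.0
    for label, segments in columns:
        column_total = sum(segments)
        width = column_total / grand_total  # column width encodes its total
        y = 0.0
        for index, count in enumerate(segments):
            height = count / column_total  # section height is share of column
            rects.append((label, index, x, y, width, height))
            y += height
        x += width
    return rects


if __name__ == "__main__":
    # Made-up counts, purely illustrative.
    example = [("Country A", [50, 50]), ("Country B", [75, 225])]
    for rect in marimekko_rects(example):
        print(rect)
```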

  • Members Only

    In the spirit of the holidays, here are the tools I am most thankful for. Without them, work would be much more tedious and painful.

  • Michelle Chandra uses street data as a base for solvable mazes:

    I draw each maze map by hand using the real street data of cities. In keeping with the fun nature of my art, I choose iconic city landmarks for the start and end of each maze – landmarks like the Golden Gate Bridge, Coney Island, or the Santa Monica Pier. All my maze maps are tested with friends and family to make sure they are, well, challenging to solve!

    Grab a screen print here. Each one comes with an extra sheet to solve for yourself.

  • In news graphics, blue typically represents Democrat and red represents Republican. However, the split isn’t so clear-cut in the parties’ own usage. Chris Alcantara for The Washington Post broke it down in 900 campaign logos used during the recent midterms. Each strip represents a logo.

  • Kyle McDonald describes some of the history and current research on using algorithms to generate music. On how David Cope incorporated Markov chains to aid in his work:

    In 1981 David Cope began working with algorithmic composition to solve his writers block. He combined Markov chains and other techniques (musical grammars and combinatorics) into a semi-automatic system he calls Experiments in Musical Intelligence, or Emmy. David cites Iannis Xenakis and Lejaren Hiller (Illiac Suite 1955, Experimental Music 1959) as early inspirations, and he describes Emmy in papers, patents, and even source code on GitHub. Emmy is most famous for learning from and imitating other composers.

    I expected samples to sound robotic and unnatural, and some are, but some are quite pleasant to listen to.
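
    For a feel of what a Markov chain does in this context, here is a toy sketch. It is not Cope’s system, which combines several techniques. It only shows a first-order chain, where the next note depends on nothing but the current one. The note sequence and function names are made up for illustration.

```python
import random
from collections import defaultdict


def build_transitions(notes):
    """Map each note to the list of notes that followed it in the input."""
    transitions = defaultdict(list)
    for current, following in zip(notes, notes[1:]):
        transitions[current].append(following)
    return dict(transitions)


def generate(transitions, start, length, seed=None):
    """Walk the chain from `start`, picking each next note at random.

    Repeated followers appear multiple times in the lists, so more
    frequent transitions are proportionally more likely to be chosen.
    """
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        options = transitions.get(melody[-1])
        if not options:  # dead end: this note was never followed by anything
            break
        melody.append(rng.choice(options))
    return melody


if __name__ == "__main__":
    # Made-up training sequence, purely illustrative.
    training = ["C", "D", "E", "C", "D", "G", "E", "C"]
    table = build_transitions(training)
    print(generate(table, "C", 12, seed=1))
```

    The output only ever contains note pairs that appeared in the input, which is roughly why such systems can imitate a source while still producing new sequences.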

  • When you go skiing or snowboarding, you get a map of the mountain that shows the terrain and where you can go. James Niehues is the man behind many of these hand-painted ski maps around the world, and he has a Kickstarter to catalog his life’s work.

    This is kind of amazing. I went skiing a lot as a kid, and I have distinct memories of these maps. I would stand at the top of the mountain, rip off one of my gloves with my teeth, and then pull out a folded map from a zipped pocket. I never knew they were by the same man, but in retrospect, it makes sense.

  • Newsy, Reveal and ProPublica look into rape cases in the U.S. and law enforcement’s use of exceptional clearance.

    The designation allows police to clear cases when they have enough evidence to make an arrest and know who and where the suspect is, but can’t make an arrest for reasons outside their control. Experts say it’s supposed to be used sparingly.

    Culled data from various police departments shows the designation is used more often than one would expect.

  • The Camp Fire death toll rose to 63, with 631 people missing, as of yesterday. The Los Angeles Times provides some graphics showing scale and the buildings that burned.

    Ugh. I live a few hundred miles away and the smoke is bad enough that my son’s school is closed today. It has not been a good year for California in terms of wildfires.

  • Members Only

    Important question: Is animation in visualization even worthwhile? Well, it depends. Surprise, surprise. In this issue, I look at animation in data visualization, its uses, and how I like to think about it when I implement moving data.

  • I’m behind on my podcast listening (well, behind in everything tbh), but Reply All covered the flaws of CompStat, a data system originally employed by the NYPD to track crime and hold officers accountable:

    But some of these chiefs started to figure out, wait a minute, the person who’s in charge of actually keeping track of the crime in my neighborhood is me. And so if they couldn’t make crime go down, they just would stop reporting crime. And they found all these different ways to do it. You could refuse to take crime reports from victims, you could write down different things than what had actually happened. You could literally just throw paperwork away. And so that guy would survive that CompStat meeting, he’d get his promotion, and then when the next guy showed up, the number that he had to beat was the number that a cheater had set. And so he had to cheat a little bit more.

    I sat in on a CompStat meeting years ago in Los Angeles. I went into it excited to see the data system that helped decrease crime, but I left skeptical after hearing the discussions over such small absolute numbers, which in turn made for a lot of fluctuations percentage-wise. Maybe things are different now a decade later, but I’m not surprised that some intentionally and unintentionally gamed the system.

    See also: FiveThirtyEight’s CompStat story from 2015.
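
    On the point about small absolute numbers and percentage fluctuations, a quick bit of arithmetic shows the effect. The counts below are made up and are not from any CompStat report. The same change of two incidents looks very different depending on the base.

```python
def percent_change(before, after):
    """Percentage change from `before` to `after` (before must be nonzero)."""
    return (after - before) / before * 100


if __name__ == "__main__":
    # Made-up counts: both cases differ by two incidents.
    print(round(percent_change(3, 5), 1))      # small base
    print(round(percent_change(300, 302), 1))  # large base
```

    With a base of 3, two more incidents is roughly a 67 percent increase. With a base of 300, it is under 1 percent.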

  • Atma Mani, a geospatial engineer for ESRI, imagined shopping for a house with data, maps, and analysis. Basically, a personalized recommendation system:

    The type of recommendation engine built in this study is called ‘content based filtering’ as it uses just the intrinsic and spatial features engineered for prediction. For this type of recommendation to work, we need a really large training set. In reality nobody can generate such a large set manually. In practice however, another type of recommendation called ‘community based filtering’ is used. This type of recommendation engine uses the features engineered for the properties, combined with favorite / blacklist data to find similarity between a large number of buyers. It then pools the training set from similar buyers to create a really large training set and learns on that.

    I love going all nerd on these sorts of things. The most interesting part for me though is that it always seems to come down to a gut feeling. You have to see the house and get a feel for the area, which is much harder to get through data. So then, how do you couple the information you get from the data with more fuzzy emotions?

  • From Streetscapes by Zeit:

    Street names are stories of life. They tell us something about how the people in a given place work and live, what they believe in and their dreams. There are more than a million streets and squares in Germany. ZEIT ONLINE has compiled a database of the roughly 450,000 different names used. Some street names are used hundreds of times and others only once. But none of the names were chosen at random.

    It’s for street names in Germany, so the meaning might be lost for many of you, but much of the data comes from OpenStreetMap, which should mean something like this is doable for other cities and countries.

    See also the San Francisco history of street names mapped by Noah Veltman a few years ago. [via @maartenzam]

  • Reading visualization research papers can often feel like a slog. As a necessity, there’s usually a lot of jargon, references to William Cleveland and Robert McGill, and sometimes perception studies that lack a bit of rigor. So for practitioners or people generally interested in data communication, worthwhile research falls into a “read later” folder never to be seen again.

    Multiple Views, started by visualization researchers Jessica Hullman, Danielle Szafir, Robert Kosara, and Enrico Bertini, aims to explain the findings and the studies to a more general audience. (The UW Interactive Data Lab’s feed comes to mind.) Maybe the “read later” becomes read.

    I’m looking forward to learning more. These projects have a tendency to start with a lot of energy and then fizzle out, so I’m hoping we can nudge this a bit to urge them on. Follow along here.