• It was a rough year, which brought about a lot of good work. Here are my favorite data visualization projects of the year.

  • Based on data from CAL FIRE, Erin Ross, for Axios, plotted California wildfires that spanned at least 300 acres since 2000. Each triangle represents a fire, where the height represents acres burned (width is the same for all triangles) and color represents duration. The fires appear to be burning hotter and longer.

    I wonder if it’s worth doubling up on the triangle encoding by using width to represent duration, similar to the Washington Post graphic made during the elections.

  • Fire spread over Los Angeles, but the famous artworks in the Getty Center stayed put. John Schwartz and Guilbert Gates reporting for The New York Times:

    The Getty’s architect, Richard Meier, built fire resistance into the billion-dollar complex, said Ron Hartwig, vice president of communications for the J. Paul Getty Trust. These hills are fire prone, but because of features like the 1.2 million square feet of thick travertine stone covering the outside walls, the crushed rock on the roofs and even the plants chosen for the brush-cleared grounds, “The safest place for the artwork to be is right here in the Getty Center,” he said.

    It’s a short visual piece, but I found the forethought in building design and the straightforward graphics fascinating.

  • Hundreds of thousands of families were displaced in the 1950s under “urban renewal” programs. The families were disproportionately minorities. Renewing Inequality, from a research group at the University of Richmond’s Digital Scholarship Lab, revisits the topic and how it is reflected in the present.

    Renewing Inequality presents a newly comprehensive vantage point on mid-twentieth-century America: the expanding role of the federal government in the public and private redevelopment of cities and the perpetuation of racial and spatial inequalities. It offers the most comprehensive and unified set of national and local data on the federal Urban Renewal program, a World War II-era urban policy that fundamentally reshaped large and small cities well into the 1970s.

    Many of the people displaced never received the promised compensation or fair market value for their property, which is kind of messed up.

    Learn.

  • Oh. It’s that time of year already. Time to hate on the rainbow color scale, which is still prevalent and still less useful than the alternatives. Matt Hall provides (scientific!) reasons for looking to scales that don’t include the full spectrum and some solutions.

    We know what kind of colourmaps are good for interpretation: those that increase linearly and monotonically in brightness, with no jumps or stripes of luminance. I’ve linked to lots of places where you can read about these — see the end of the post. You already know one perceptual colourmap: the humble Greyscale. But there are lots of others, so let’s start with one of them.
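
    The "monotonically in brightness" criterion can be checked roughly in a few lines. This is a minimal sketch, not Hall's method: the `luma` function uses Rec. 709 weights as a crude stand-in for perceived brightness, and the `greyscale` and `rainbow` helpers are hypothetical colormaps made up for illustration.

```python
import colorsys

def luma(rgb):
    """Rough brightness proxy (Rec. 709 weights); an assumption, not a perceptual model."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def greyscale(n):
    """Hypothetical colormap: n evenly spaced greys from black to white."""
    return [(i / (n - 1),) * 3 for i in range(n)]

def rainbow(n):
    """Hypothetical colormap: hue sweeps from blue to red at full saturation and value."""
    return [colorsys.hsv_to_rgb((2 / 3) * (1 - i / (n - 1)), 1.0, 1.0)
            for i in range(n)]

def is_monotonic(values):
    """True if the sequence never decreases from one step to the next."""
    return all(b >= a for a, b in zip(values, values[1:]))

grey_luma = [luma(c) for c in greyscale(9)]
rainbow_luma = [luma(c) for c in rainbow(9)]

print("greyscale monotonic:", is_monotonic(grey_luma))
print("rainbow monotonic:  ", is_monotonic(rainbow_luma))
```

    Under this proxy the greyscale ramp climbs steadily, while the rainbow rises and falls in brightness as it passes through cyan, green, yellow, and red, which is the kind of striping the quote warns about.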

  • Statistics professor Di Cook was one of the first people I ever talked to about visualization. She has a short Q&A over at StatsChat.

    I spent a few years doing that [a research assistant] and then realised I’d really like to make art, because some of the research-assistant work I was doing was computer graphics for data online. It fed into my art instincts from teenage years, so I spent some time as an artist before finding a graduate programme in statistics in the US that focused on data visualisation.

  • If you want to analyze bodies of text, it’s good to know how to use regular expressions. That way you can programmatically extract complex text patterns instead of marking and encoding items manually. Thomas Nield for O’Reilly provides an introduction:

    Many data science, analyst, and technology professionals have encountered regular expressions at some point. This esoteric, miniature language is used for matching complex text patterns, and looks mysterious and intimidating at first. However, regular expressions (also called “regex”) are a powerful tool that only require a small time investment to learn. They are almost ubiquitously supported wherever there is data.

    Nield says it isn’t a steep learning curve, which I agree with, but I would suggest not trying to learn every part of the syntax. Learn it piecewise, and it’ll seem like less of a jumble of brackets, periods, and question marks.

    See also the RegExr. It’s an interactive tool that lets you paste a body of text and then enter regular expressions to see what matches your given pattern in real-time.
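
    As a taste of the piecewise approach, here is a minimal sketch using Python's built-in `re` module. The sample text, the two patterns, and the helper names `extract_acres` and `extract_years` are all made up for illustration; none of it comes from Nield's tutorial.

```python
import re

# Hypothetical sample text, invented for this example.
text = (
    "Fire A burned 1,200 acres in 2003. "
    "Fire B burned 45,000 acres in 2017. "
    "Fire C burned 310 acres in 2009."
)

# Built up one piece at a time:
#   [\d,]+   one or more digits or commas, e.g. "45,000"
#   \s+      one or more whitespace characters
#   (...)    a capture group, so only the number comes back
acres_pattern = re.compile(r"([\d,]+)\s+acres")

#   \b           a word boundary
#   (?:19|20)    "19" or "20", grouped without capturing
#   \d{2}        exactly two more digits
year_pattern = re.compile(r"\b(?:19|20)\d{2}\b")

def extract_acres(s):
    """Return every 'N acres' figure in s as an integer."""
    return [int(m.replace(",", "")) for m in acres_pattern.findall(s)]

def extract_years(s):
    """Return every four-digit year starting with 19 or 20 as an integer."""
    return [int(y) for y in year_pattern.findall(s)]

print(extract_acres(text))
print(extract_years(text))
```

    Each pattern only uses three or four pieces of syntax, which is about the right amount to pick up in one sitting.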

  • The New York Times is back at it in explaining the creative process. A couple of years ago they explained the making of a Justin Bieber song. This time they talked to Ed Sheeran and his collaborators about the making of their hit song Shape of You. The musicians talk and the visualization serves as a backdrop.

  • Hearing about machine learning and algorithms a lot recently and not sure what that means? CGP Grey explains:

  • For the past few months, Cards Against Humanity polled the American public to ask important questions such as whether or not it is okay to pee in the shower.

    To conduct our polls in a scientifically rigorous manner, we’ve partnered with Survey Sampling International — a professional research firm — to contact a nationally representative sample of the American public. For the first three polls, we interrupted people’s dinners on both their cell phones and landlines, and a total of about 3,000 adults didn’t hang up immediately. We examined the data for statistically significant correlations, and boy did we find some stuff.

    The poll is in the context of political leanings, which leads to some interesting cross-sections.

    Maybe the best part though is that CAH will continue to poll for a full year, and you can download the data, which I am sure makes for a fun class project. They are also asking social scientists for question suggestions that would otherwise go unasked by more traditionally funded public polling.

  • Here’s a different look at tax cuts and increases from Reuben Fischer-Baum for The Washington Post. As Fischer-Baum points out, keep in mind that these are just estimates and the calculations vary:

    Analyses that use data from real taxpayers as their starting point – like the calculator put together by the New York Times – produce lower estimates. Other calculators like the one put together by the Wall Street Journal produce similar results to ours. For example, a household in D.C. filing jointly with two kids under 17, earning a total of $150,000 and itemizing $20,000 gets a tax cut of $3,796 in our analysis. Roughly equivalent inputs to the New York Times calculator produces an estimated range of a $1,020 to $3,280 cut, while the Wall Street Journal calculator – which is based on the less generous House bill – produces a cut of $3,230.

  • I’m pretty sure this is all that most people want to know. The Upshot provides a tax calculator that considers the Republican tax bill and the variation of taxes between households that earn similar incomes. Punch in some information like income range and marital status, and you get a range of tax cuts or increases for households similar to yours.

  • The David Rumsey Map Collection, known for its many browsable historical maps, now has a “data visualization” subject tag. This means you can now quickly access over 1,000 charts that date back centuries. I’m not sure how long the browser has had the filter available, but I’m glad it’s there. [via @srendgen]

  • I heard you like maps. Jim Vallandingham put together a collection of maps that show multiple variables, for inspiration and perusal.

  • Disney is set to buy 21st Century Fox for $52.4 billion. I honestly don’t have the mental capacity or imagination to comprehend such a large sum, much less figure out how such a deal works. At least Youyou Zhou, reporting for Quartz, provides breakdowns of market share for the two companies, which makes things a bit more understandable. If the deal goes through, Disney is going to be an (even bigger) behemoth.

  • Introducing yourself to R as an Excel user can be tricky, especially when you don’t have much programming experience. It requires that you switch from one mental model of the data that exists in an interactive spreadsheet to one that exists in vectors and lists. Steph de Silva provides a translation of these data structures for Excel users.

  • Research group Euphrates experimented with lines and a ballet dancer’s movements in Ballet Rotoscope:

    By the way, rotoscoping is an old technique used by animators to capture movement. Pictures or video are taken and lines are traced for use in different contexts. [via @Rainmaker1973]

  • Doug Mills, reporting for The New York Times:

    Echoing his days as a real estate developer with the flair of a groundbreaking, Mr. Trump used an oversize pair of scissors to cut a ribbon his staff had set up in front of two piles of paper, representing government regulations in 1960 (20,000 pages, he said), and today — a pile that was about six feet tall (said to be 185,000 pages).

    Interpret as you like.

  • Statistician Kristian Lum described her experiences with harassment as a graduate student at stat conferences. She held back on talking about it for many of the same reasons others have, but then there was a shift and she began warning colleagues.

    I started doing this because I heard that S (for the second time to my knowledge) had taken advantage of a junior person who had had too much to drink. This time, his act had been witnessed first-hand by several professors at the conference. Since then, I have heard one professor who witnessed the incident openly lament that he’ll have to find a way to delicately advise his female students on “how not to get raped by S” so as not to lose promising students.

    What the hell? Unacceptable.