FlowingData

My chat with Data Stories

September 14, 2018

Topic
Site News / Data Stories

I talked with Moritz and Enrico on Data Stories, my favorite visualization podcast. They’ve been providing a healthy balance of practice and research since 2012.

I don’t dare listen to myself, but based on the show notes, we talked about FlowingData over the years and some of the changes in visualization, and I answered listener questions. You can listen here.
Realistic storm surge depicted in Weather Channel forecast

September 13, 2018

Topic
Infographics / flood, hurricane, weather

The Weather Channel is using a realistic 3-D depiction surrounding a reporter to show what a storm surge might bring. Here, just watch it:
Waffle House index as a storm indicator

September 13, 2018

Topic
Statistics / correlation, Waffle House, weather

Waffle House activated their storm center in preparation for Hurricane Florence. Their restaurants are open 24/7, so they need to keep track of which ones need to close or limit their menus. This might also have to do with an informal Waffle House Index that FEMA described last year:

If a Waffle House can serve a full menu, they’ve likely got power (or are running on a generator). A limited menu means an area may not have running water or electricity, but there’s gas for the stove to make bacon, eggs, and coffee: exactly what hungry, weary people need.

It’s more than just a Waffle House though.

Businesses in communities are often some of the biggest drivers of recovery. If stores can open, people can go back to work. If people can go back to work, they can return to at least one piece of a normal life—and that little piece of normalcy can make a big difference.

Hold up. I think I got it. If we just keep all the businesses open, we can avoid all disaster. That’s how causation-correlation works, right? Nailed it.

(Stay safe, Carolinians.)
Turning water pollution into audiolized awareness

September 13, 2018

Topic
Data Art / audiolization, pollution, water

Brian House collected polluted water with acid mine drainage in the Tshimologong Precinct, Johannesburg and translated pollution levels to sound:

Acid Love comprises vessels of AMD gathered from a mine on the outskirts of the city. These are connected in an electrical circuit that measures the conductivity from the metals of the water and converts it into sound. The sound is further modulated by data gathered from remediation efforts at the mine. The installation itself also performs a remediation process—over time the metals will precipitate to the bottom of the vessels, and both the sound and the color of the water will change as it is purified.

[via @blprnt]
Members Only

Google Dataset Search Impressions, the Challenges of Looking for Data, and Other Places to Find Data

September 13, 2018

Topic
The Process / Google, search

Google released Dataset Search to the world last week. Here are my first impressions.
Hurricane Florence trackers

September 12, 2018

Topic
Maps / hurricane, tracker

Hurricane Florence is forecast to touch down Thursday night or Friday, and as has become the norm, there are several ways to see where the hurricane is and where it might go. Here are a handful of views. Each focuses on a different aspect of the potential storm.
Algorithms to fix underrepresentation on Wikipedia

September 12, 2018

Topic
Statistics / gender equality, machine learning, Wikipedia

Wikipedia is human-edited, so naturally there are biases towards certain groups of people. Primer, an artificial intelligence startup, is working on a system that looks for people who should have an article. It’s called Quicksilver.

We trained Quicksilver’s models on 30,000 English Wikipedia articles about scientists, their Wikidata entries, and over 3 million sentences from news documents describing them and their work. Then we fed in the names and affiliations of 200,000 authors of scientific papers.

In the morning we found 40,000 people missing from Wikipedia who have a similar distribution of news coverage as those who do have articles. Quicksilver doubled the number of scientists potentially eligible for a Wikipedia article overnight.

Then, after it finds people, it generates sample articles to get things started.
Interactive recreation of an 1821 color guidebook

September 11, 2018

Topic
Design / color, Nicholas Rougeux, recreation, vintage

I’m always down for faux vintage, online recreations of actual vintage visualization-related things. Using scans from the real thing, Nicholas Rougeux recreated Werner’s Nomenclature of Colours, supplementing with interaction and photo references.
Night lights mapped as terrain

September 10, 2018

Topic
Maps / light, terrain

You’ve probably seen the maps of Earth at night. They give you a good idea of activity around the world, through the eyes of light. As an experiment and a shift in view, Jacob Wasilkowski mapped the light as terrain.
Visualization in the 1980s, just before the rise of computers

September 7, 2018

Topic
Design / tools, vintage

Graham Douglas, a data journalist at The Economist, looks back on the days when getting data and visualizing it was tedious from start to finish:

But even these seemingly simple charts had their challenges and took a lot of time to make. Data were found in books by a research department skilled in the art of extracting obscure economic figures and statistics, which were copied to scraps of paper. We would use rulers, dividers, protractors and geometry (Thales’s theorem) to divide axis lines into equal parts to draw the scale ticks. We would plot the data manually in pencil on a special drawing board and sketch out the wording and title for approval before we inked the whole thing in. Text was added last using stencilling, or later, Letraset dry-transfer lettering. Making a spelling mistake was distressing. Areas were filled with sticky-back plastic pre-printed film cut out with a scalpel.

Maybe grabbing data out of PDF files isn’t so bad.

No. Still horrible.

This reminds me of my dad’s work though. He’s a retired civil engineer. When I was young, he brought home these giant blueprints. He’d roll them out after dinner, and armed with a protractor, a scaled ruler, and a calculator I could never figure out, he’d mark up building plans. Towards the end of his career, he kept everything on a flash drive.
Live polling results for transparency and a way to learn about the process

September 6, 2018

Topic
Statistics / polling, real-time, Upshot

In a collaboration with Siena College, The Upshot is showing live polling results. The ticker moves in real time with every phone call.

For the first time, we’ll publish our poll results and display them in real time, from start to finish, respondent by respondent. No media organization has ever tried something like this, and we hope to set a new standard of transparency. You’ll see the poll results at the same time we do. You’ll see our exact assumptions about who will turn out, where we’re calling and whether someone is picking up. You’ll see what the results might have been had we made different choices.

Gulp.
Members Only

Make It Mean Something or It Didn’t Happen

September 6, 2018

Topic
The Process / audience, purpose

Visualization as template-filling content is lazy visualization that no one draws benefit from. Give people a reason to care.
Google Dataset Search now in public beta

September 6, 2018

Topic
Data Sources / Google, search

Datasets are scattered across the web, tucked into cobwebbed corners where nobody can find them. Google Dataset Search aims to make the process easier:

Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher’s site, a digital library, or an author’s personal web page. To create Dataset search, we developed guidelines for dataset providers to describe their data in a way that Google (and other search engines) can better understand the content of their pages. These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset.
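
As a rough illustration of the kind of description the quote refers to, here is a minimal sketch that builds dataset metadata and wraps it for embedding in a web page. It assumes the guidelines are based on the schema.org Dataset vocabulary, which the quote does not name. The dataset, organization, and values are all hypothetical.

```python
import json

# Hypothetical dataset description, assuming the schema.org "Dataset" type.
# Each field maps to an item in the quote: who created the data, when it was
# published, how it was collected, and the terms for using it.
dataset_metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example city temperature readings",
    "description": "Hypothetical daily temperature readings, for illustration only.",
    "creator": {"@type": "Organization", "name": "Example Data Lab"},
    "datePublished": "2018-09-06",
    "measurementTechnique": "Hypothetical rooftop sensor readings",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Serialize as JSON-LD and wrap it in a script tag for a dataset's web page.
json_ld = json.dumps(dataset_metadata, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
```

This is the "little bit of work" asked of data providers: without a description like this on the page, a crawler has little to index.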

I’m always a little wary of dataset search engines. They never seem to live up to their promises, because they always require that those with the data do a little bit of work, such as publish metadata that makes indexing easier. But this is Google. I’ll have to give it a go the next time a curiosity pops in.
Experience a soccer game through crowd noise

September 5, 2018

Topic
Statistical Visualization / soccer, sound, sports

Sports visualization and analysis tend to focus on gameplay: where the players are, where the ball goes, etc. In Reimagine the Game, the focus is on crowd noise through the course of a game. Pick a game and see the waves of noise oscillate through the arena during significant events.

It’s an advertisement feature on The Economist, which is kind of interesting, but it’s still fun to watch the games play out.
Hotter days where you were born

September 4, 2018

Topic
Infographics / climate, New York Times, temperature, uncertainty

It’s getting hotter around the world. The New York Times zooms in on your hometown to show the average number of “very hot days” (at least 90 degrees) since you were born and then the projected count over the next decades. Then you zoom out to see how that relates to the rest of the world.

I’ve always found it interesting that visualization and analysis are typically “overview first, then details on demand”, whereas storytelling more often goes the opposite direction. Focus on an individual data point first and then zoom out after.
Counting baseball cliches

August 31, 2018

Topic
Statistics / baseball, cliche, Washington Post

Post-game sports interviews tend to sound similar. And when you do say something out of pattern, the talk shows and the social media examine every word to find hidden meaning. It’s no wonder athletes talk in cliches. The Washington Post, using natural language processing, counted the phrases and idioms that baseball players use.

We grouped phrases that were variations of each other together (within a one- or two-word difference) into a list of roughly 20,000 possible cliches. Then came the subjective part. From that list, we chose the ones that were the most interesting, then grouped those with similar meanings. And voila — the phrases we considered to be the cream of the cliche crop.
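
The grouping step in the quote can be sketched roughly as follows. This is not the Post's actual code. It assumes "a one- or two-word difference" means a word-level edit distance of at most two, and it uses a simple greedy grouping; the sample phrases are made up.

```python
from typing import List


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance counted in words instead of characters."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,         # drop a word from a
                curr[j - 1] + 1,     # add a word from b
                prev[j - 1] + cost,  # swap one word for another
            ))
        prev = curr
    return prev[-1]


def group_phrases(phrases: List[str], max_diff: int = 2) -> List[List[str]]:
    """Greedy grouping: a phrase joins the first group whose first member
    is within max_diff words of it; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for phrase in phrases:
        words = phrase.lower().split()
        for group in groups:
            if word_edit_distance(words, group[0].lower().split()) <= max_diff:
                group.append(phrase)
                break
        else:
            groups.append([phrase])
    return groups


# Made-up sample phrases, not taken from the Post's data.
phrases = [
    "take it one game at a time",
    "take it one day at a time",
    "taking it one game at a time",
    "give a hundred and ten percent",
]
groups = group_phrases(phrases)
```

Here the first three phrases land in one group and the last starts its own. Choosing which groups are interesting is the subjective part that code does not cover.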

I can’t decide if the word cloud to open the article is a fun hook or a distraction. I’m leaning towards the former, but I think it would’ve been less the latter without the interaction.
Weaponised design

August 30, 2018

Topic
Design / ethics, human

When the web was relatively new, things were more of a free-for-all. Everything was an experiment, and it always felt like there were fewer consequences online, because not that many people really used the internet. Now a large portion of people’s lives are online. There is more at stake.

Tactical Tech focuses in on the (careless) design of systems that allows bad actors to thrive:

Design can also be weaponised through team apathy or inertia, where user feedback is ignored or invalidated by an arrogant, culturally homogenous or inexperienced team designing a platform. This is a notable criticism of Twitter’s product team, whose perceived lack of design-led response is seen as a core factor for enabling targeted, serious harassment of women by #Gamergate, from at least 2014 to present day.

Finally, design can be directly weaponised by the design team itself. Examples of this include Facebook’s designers conducting secret and non-consensual experiments on voter behaviour in 2012–2016, and emotional states of users in 2012, and Target, who in 2014 through surveillance ad tech and careful communications design, informed a father of his daughter’s unannounced pregnancy. In these examples, designers collaborate with other teams within an organisation, facilitating problematic outcomes whose impact scale exponentially in correlation with the quality of the design input.
Members Only

Better than Default

August 30, 2018

Topic
The Process / custom, default, tools

Defaults are generalizations to fit many datasets, which means you usually get bare-bones charts. For analysis, all well and good. However, data graphics for presentation require more care after the initial output.
Considering the “valuable-ness” of the things we make

August 30, 2018

Topic
Design / Nicky Case, value

Nicky Case ponders the “valuable-ness” of the things he makes as the product of the number of people reached and the average value for each person reached. Finding the balance is tricky.
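
To make the product concrete, here is a small sketch with made-up numbers. The function name and the idea of putting value per person on a numeric scale are assumptions for illustration only.

```python
def valuableness(people_reached: int, avg_value_per_person: float) -> float:
    """Total value as reach multiplied by average value per person reached."""
    return people_reached * avg_value_per_person


# Hypothetical projects: one reaches many people a little,
# the other reaches few people a lot.
broad = valuableness(people_reached=20_000, avg_value_per_person=0.5)
deep = valuableness(people_reached=100, avg_value_per_person=100.0)
```

Both projects come out to the same total here, which is why the balance is tricky: very different kinds of work can score the same.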
Algorithmic art shows what the machine sees

August 29, 2018

Topic
Data Art / algorithms, neural network

Tom White is an artist who uses neural networks to draw abstract pictures of objects. What looks blobby and fuzzy to us looks more concrete to the machine.

James Vincent for The Verge:

That “voice” is actually a series of algorithms that White has dubbed his “Perception Engines.” They take the data that machine vision algorithms are trained on — databases of thousands of pictures of objects — and distill it into abstract shapes. These shapes are then fed back into the same algorithms to see if they’re recognized. If not, the image is tweaked and sent back, again and again, until it is. It’s a trial and error process that essentially ends up reverse-engineering the algorithm’s understanding of the world.
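
The tweak-and-resubmit loop described in the quote can be sketched as a simple hill climb. This is a toy under stated assumptions, not White's Perception Engines: the "classifier" is a hypothetical stand-in that scores closeness to a fixed template, where a real system would query trained neural networks.

```python
import random
from typing import Callable, List, Tuple

Image = List[float]  # toy "image": a short list of pixel values in [0, 1]


def toy_classifier(image: Image) -> float:
    """Hypothetical stand-in for a vision model. Returns a confidence in
    [0, 1] that rises as the image gets closer to a fixed template."""
    template = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
    error = sum(abs(p - t) for p, t in zip(image, template)) / len(template)
    return 1.0 - error


def refine_until_recognized(
    classifier: Callable[[Image], float],
    size: int = 8,
    threshold: float = 0.9,
    max_steps: int = 5000,
    seed: int = 0,
) -> Tuple[Image, float]:
    """Tweak a random image one pixel at a time, keeping only the tweaks
    that raise the classifier's confidence, until it passes the threshold."""
    rng = random.Random(seed)
    image = [rng.random() for _ in range(size)]
    score = classifier(image)
    for _ in range(max_steps):
        if score >= threshold:
            break  # the classifier now "recognizes" the image
        candidate = list(image)
        i = rng.randrange(size)
        nudged = candidate[i] + rng.uniform(-0.2, 0.2)
        candidate[i] = min(1.0, max(0.0, nudged))  # keep pixel in [0, 1]
        candidate_score = classifier(candidate)
        if candidate_score > score:
            image, score = candidate, candidate_score
    return image, score


final_image, final_score = refine_until_recognized(toy_classifier)
```

The loop never looks inside the classifier. It only asks for a score and keeps what scores higher, which is the sense in which the process reverse-engineers what the algorithm responds to.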