• When you drink bubble tea, ideally you’d like to finish with the same proportions of boba and tea that you started with. Krist Wongsuphasawat took care of the math and provides a simulator for this ever-important challenge:

    This article simulates an optimized sip based on amount of boba and tea in the straw before sipping (method adopted from this post). The simulation assumes that all bobas sit in the bottom of the cup and stack on top of each other nicely. If you put a straw straight down when there are n layers of bobas, you will get n bobas in the straw. The rest of the straw up to the drink’s height is tea. The drinker sips until all n bobas are in his/her mouth then stop. After each sip these n bobas and tea inside the straw are gradually reduced from the cup.

    The final recommendations: use a slim cup, minimize ice, and drink strongly. Mess around with variables here.
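
    For a rough feel of the sipping rule quoted above, here is a minimal sketch in Python. It is not Wongsuphasawat’s code. The function name, the parameters, and the geometry (a cylindrical cup, a straw with a fixed cross-section, bobas of uniform height and volume) are all assumptions made for illustration.

```python
import math


def simulate_sips(bobas, tea_ml, bobas_per_layer, cup_area_cm2,
                  straw_area_cm2=1.0, boba_height_cm=1.0, boba_volume_ml=0.5):
    """Return a list of (bobas_left, tea_left_ml), one entry per sip.

    Assumed model (hypothetical, loosely following the quoted description):
    - bobas stack in full layers at the bottom of the cup
    - a straw pushed straight down picks up one boba per layer
    - the straw above the bobas, up to the drink's height, is filled with tea
    - one sip empties the straw: all of its bobas and all of its tea
    """
    history = []
    while bobas > 0 and tea_ml > 0:
        layers = math.ceil(bobas / bobas_per_layer)
        # Drink height from total volume (1 ml == 1 cm^3) over cup cross-section.
        drink_height_cm = (tea_ml + bobas * boba_volume_ml) / cup_area_cm2
        # Tea column in the straw sits above the stacked bobas.
        tea_column_cm = max(drink_height_cm - layers * boba_height_cm, 0.0)
        tea_sipped = min(straw_area_cm2 * tea_column_cm, tea_ml)
        bobas_sipped = min(layers, bobas)
        bobas -= bobas_sipped
        tea_ml -= tea_sipped
        history.append((bobas, tea_ml))
    return history


if __name__ == "__main__":
    for sip, (b, t) in enumerate(simulate_sips(40, 400.0, 8, 40.0), start=1):
        print(f"sip {sip}: {b} bobas left, {t:.1f} ml tea left")
```

    Changing `cup_area_cm2` or `straw_area_cm2` shows how much tea is left once the bobas run out under these assumptions. The linked simulator is the place to go for the real thing.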

  • A few months back, Microsoft released a comprehensive dataset that included the estimated footprints of all of the buildings in the United States. The New York Times mapped all of it.

    The footnote says a lot about their attention to detail:

    In some cases, the building shapes generated by Microsoft’s automated process do not match the existing building footprints exactly. We manually corrected as many of these mistakes as we found, or, where available, replaced the shapes using more precise local data sets. Data was unavailable for much of Alaska.

  • Based on data from the Census Bureau, National Geographic mapped predominant race in 11 million administrative regions in the United States. Many of the regions are the size of a single block.

    Looking at the national overview, the country looks predominantly white (represented in blue), but as you zoom in for more details, you start to see the mix.

  • Giorgia Lupi and Stefanie Posavec continue on their path of Dear Data with a book that you draw in: Observe, Collect, Draw!

    The first section describes some of the basics of journaling with data and how you can use various visual encodings. However, the main part of the book is a journal that guides you through collection and the visual encodings that Lupi and Posavec used with their postcards. First, there’s an instruction page and then the adjacent page provides blank scales for you to sketch yourself.

    Fun. It seems like a good way to jog your imagination, in case you feel like you’re stuck in a bar chart geometry funk. [Amazon link]

  • Members Only

    If the charts themselves are fairly straightforward without any dubious design choices, are you still “lying with charts” when only the data itself was manipulated?

  • Members Only

    Compare distributions side-by-side with a pyramid chart. Observe the change over the years by animating it.

  • As you click through the news, you can probably almost always figure out what source is loading without the URL or title. Just judge based on the layout. Noah Veltman made this overview to show how news orgs prioritize editorial content, ads, and sponsored content.

  • There was a survey a while back that asked people to assign a 0 to 100 percent value to probabilistic words like “usually” and “likely”. YouGov did something similar for words describing good and bad sentiments.

  • This 3-D view inside Hurricane Maria, from NASA’s Scientific Visualization Studio and NASA’s Goddard Space Flight Center, lets you see the data and the lead-up to the storm in a neat 360-degree view. Be sure to watch it on your phone or with a VR thingy for full effect. Disregard the questionable color scale.

  • FiveThirtyEight and The Trace investigate the uncertainty and accuracy of gun injury data released by the Centers for Disease Control and Prevention:

    An analysis performed by FiveThirtyEight and The Trace, a nonprofit news organization covering gun violence in America, found that the CDC’s report of a steady increase in nonfatal gun injuries is out of step with a downward trend we found using data from multiple independent public health and criminal justice databases. That casts doubt on the CDC’s figures and the narrative suggested by the way those numbers have changed over time.

    It might be time to update the statistical models used to estimate injuries.

  • Getting into data science typically requires that you have access to a decent computer or server. You also usually need to install software. Chromebook Data Science, a set of online courses from the Johns Hopkins Data Science Lab, lets people learn with just a Chromebook and an internet connection:

    Today I’m excited to announce the first part of our new system, a new set of massive online open courses called Chromebook Data Science. These MOOCs are for anyone from high schoolers on up to get into data science. If you can read and follow instructions you can learn data science from these courses!

    The reason they are called Chromebook Data Science is because philosophically our goal was that anyone with a Chromebook could do the courses. All you need is a web browser and an internet connection. The courses all take advantage of RStudio Cloud so that all course work can be completed entirely in a web browser. No need to install software or have the latest MacBook Computer.

    It’s pay-what-you-want with a $0 minimum and it’s designed specifically for people with no data science experience. Sounds pretty sweet.

    Find out more here.

  • Members Only

    The truth is that all charts are misleading. In some sense. The key is minimizing how much.

  • Opportunity Atlas, a collaboration between Opportunity Insights and the Census Bureau, is the product of ongoing research on the demographics of people, based on the neighborhood they grew up in.

    The Opportunity Atlas provides data on children’s outcomes in adulthood for every Census tract in the United States through an interactive map providing detailed research on the roots of these outcomes, such as poverty and incarceration rate, back to the neighborhoods in which children grew up. This tool will enable policy makers, practitioners, and the public the unprecedented ability to look within their city to understand better where opportunity exists and how each neighborhood shapes a child’s future economic and educational success.

    The map application was developed by Darkhorse Analytics. Zoom in to an area of interest, subset on demographics such as income level, race, and gender, and see how the people who grew up in those areas fared later in life. You can also download the tract-level data to look for yourself.

  • The most common causes of death changed over the years. They vary across sex and age group. This animation shows the details of these changes.

  • Craig Taylor from Ito World used a coral metaphor to visualize road networks in major cities around the world:

    For the past six months I have been fascinated by the concept of making city networks look like living corals. The varying patterns of urban forms are inherently dictated by their road network; a complex, seemingly organic connection of links moving people across their city. Like branches of coral they have a pattern and a function, I chose to expose this pattern and manipulate it to become something far more conceptual. However, whilst being incredibly beautiful they are derived from various geo-spatial analysis of drive-times catchments making them somewhat informative as well.

    Pretty.

  • Founded by Sue Gardner, the former head of the Wikimedia Foundation, and by Julia Angwin and Jeff Larson, journalists formerly at ProPublica, The Markup will aim to use data to help non-experts better understand everyday technologies that often go unchecked.

    When Angwin and Larson worked together at ProPublica, their data-driven investigations included exposing discriminatory advertising practices at Facebook, bias in software that is used in criminal sentencing and algorithms that result in unfair car insurance pricing. They also uncovered evidence of domestic surveillance practices in the Snowden archives and revealed technology vulnerabilities at the President’s Mar-A-Lago country club.

    “I’m excited to build a team with deep expertise that can really scale up and advance the work Jeff and I began at ProPublica,” Angwin said. “We see The Markup as a new kind of news organization, staffed with journalists who know how to investigate the uses of new technologies and make their effects understandable to non-experts.”

    “People know that these new technologies are important and want to better understand their societal effects. We will help them do that,” said Larson. “The Markup will hold the powerful to account, raise the cost of bad behavior, and spur reforms.”

    The venture is primarily backed by a $20 million donation from Craigslist founder Craig Newmark and $2 million from the Knight Foundation. Amazing.

    Looking forward to this.

  • Jeffrey Heer, a computer science professor at the University of Washington, provides an overview of building charts for analysis and exploration. It’s an iterative process that cycles through acquisition, cleaning, integration, visualization, modeling, presentation, and dissemination. [via @albertocairo]

  • Tim Meko and Aaron Steckelberg for The Washington Post compared this summer’s rains with the average. The combination of mapping as terrain and color-encoding provides an interesting foam-looking aesthetic.

  • Members Only

    If you’re trying to learn how to work with data, make time to fiddle with the toys in your growing toolbox. Otherwise, you just have a bunch of bookmarks and no new skills.

  • Morph, by Datavized in collaboration with the Google News Initiative, is a tool to generate abstract images from data:

    Morph exists to engage users in the creative expression of data without having to code. Generative art based algorithms turn data into a visual representation and the user can affect how their data interacts with the final visual via the algorithm. The algorithms themselves are not fixed; the user can randomly mutate, evolve and generate new algorithms creating new visuals, encouraging the sense of creative exploration and discovery.

    Just upload your data, select some options to map variables to visual encodings (or have the app pick random ones for you), and see what you get. Be sure to try the “evolve” option at the end, which shows a bunch of variations of your generated image.

    The results are pretty abstract, so I’m not sure if it has practical uses in traditional data settings, but it’s fun to play with. And maybe it could be useful for quickly flipping through visual encodings.