• As I peel myself out of bed in the morning after again not going to sleep at a civilized hour, bleary-eyed, I wonder what hours others sleep. Certainly, I must be in the majority. According to the National Health Interview Survey from 2022, I am not. Two-thirds of adults get at least 7 hours of sleep.

  • Members Only

    In making an ever-important comparison between McDonald’s locations and golf courses in the United States, I wanted to use Dorling cartograms to show counts and which was more common in a given location. But my data wasn’t shaped quite right, so I broke it down and used parts of previous projects and tutorials.
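    The post doesn’t include code, so here is a minimal Python sketch of the two ideas behind a Dorling cartogram, under simplifying assumptions of my own. Each location becomes a circle whose area (not radius) is proportional to its count, and overlapping circles are pushed apart until none overlap. The names `circle_radius` and `dorling`, and the move-each-circle-half-the-overlap rule, are hypothetical. Dorling’s original algorithm also pulls neighboring circles toward each other, which this sketch skips.

```python
import math
from dataclasses import dataclass


@dataclass
class Circle:
    name: str
    x: float
    y: float
    r: float


def circle_radius(count, scale=1.0):
    # Encode the count as circle *area*: pi * r^2 = count, so r = sqrt(count / pi).
    return scale * math.sqrt(count / math.pi)


def dorling(points, counts, scale=1.0, max_iterations=1000):
    """Size one circle per location, then push overlapping circles apart.

    points maps a location name to its (x, y) centroid; counts maps the same
    name to the value being shown (hypothetical inputs for illustration).
    """
    circles = [
        Circle(name, x, y, circle_radius(counts[name], scale))
        for name, (x, y) in points.items()
    ]
    for _ in range(max_iterations):
        moved = False
        for i in range(len(circles)):
            for j in range(i + 1, len(circles)):
                a, b = circles[i], circles[j]
                dx, dy = b.x - a.x, b.y - a.y
                dist = math.hypot(dx, dy)
                overlap = a.r + b.r - dist
                if overlap > 1e-9:
                    if dist == 0:
                        # Coincident centers: pick an arbitrary direction to separate them.
                        dx, dy, dist = 1.0, 0.0, 1.0
                    ux, uy = dx / dist, dy / dist
                    # Move each circle half the overlap, directly away from the other.
                    a.x -= ux * overlap / 2
                    a.y -= uy * overlap / 2
                    b.x += ux * overlap / 2
                    b.y += uy * overlap / 2
                    moved = True
        if not moved:
            break  # no pair overlaps anymore
    return circles
```

    A count four times larger gets a radius only twice as large. Sizing circles by radius instead of area would exaggerate the differences between locations.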

  • In 1942, Franklin Delano Roosevelt mandated that those of Japanese descent be sent to prison camps. Through the lens of recently released Census records, the San Francisco Chronicle examined the impact of forcing thousands of residents out of their homes.

    Over nearly a year, the Chronicle collected and analyzed this data, seeking to understand just how Executive Order 9066 reshaped Japantown. For the first time, we can count the number of Japanese American residents in the neighborhood in 1940 and 1950 — an unequivocal measure of the order’s disastrous effect on the community.

  • For Bloomberg, Daniela Sirtori, Madeline Campbell, and Marie Patino do some product counting:

    Data collected by the California Department of Public Health showed 108 potentially harmful substances listed as fragrance ingredients in everyday products ranging from face wash to conditioner, a Bloomberg analysis of database entries as of Feb. 6 found. Some of the compounds are identified as potential carcinogens by authorities such as the World Health Organization.

    Scaled jars of cream are used to show ingredient categories. I like it.

  • I missed this announcement at the end of last year:

    Sherwood Media, LLC has added U.K.-based Chartr Limited, a data-driven media company and newsletter publisher, to its portfolio through an acquisition by Robinhood Markets, Inc. Chartr’s visual storytelling turns complex data into easy-to-understand narratives, and will now give the tens of millions of readers of Sherwood Media the ability to better understand the finer details of important trends and the news of the day.

    Offering clear and thoughtful insights into complex data is a natural extension of Sherwood’s mission to empower its readers to have the information they need to control their financial future. The acquisition of Chartr gives audiences new ways to understand and see news and market-moving trends. Readers can expect to find Chartr stories across Sherwood content, including Snacks, a daily markets and business newsletter that has one of the largest audiences in the country.

    Sherwood Media is a subsidiary of Robinhood. They recently launched Sherwood News, and I saw a chart that looked familiar in format but with a different logo. It’ll be interesting to watch where this publication and relatively straightforward chart machine goes.

  • The National Longitudinal Surveys from the Bureau of Labor Statistics are unique in that they follow the same individuals for decades. For The Pudding, Alvin Chang visualized survey responses to show how adversity as a teenager carries into adulthood.

    Each person icon represents a respondent, and together the icons stack into bar charts that track respondents through the years. The icons move across the screen with each time segment and demographic shift.

    There’s also a video version, and while I enjoy Alvin’s dulcet voice, I prefer the scrolling version.

    A decade and a half ago, I wrote the first edition of Visualize This as a how-to guide for my past self. It was for someone who was familiar with visualization but was stuck on the part where it’s time to make and design charts with their own data.

    What tools should you use? How do you use them? How do you get from rough sketch to finished graphic? How do you get the visualization idea in your imagination onto a screen where others can see?

    It turns out that you can read and learn a lot about visualization — the chart types, the best visual encodings, design considerations, and purpose — without actually knowing how to follow through with the advice. There’s a technical side to visualizing data that couples with the thinking side. I wrote Visualize This for the person who wants to make the coupling and follow through.

    The challenge of writing a book with concrete, how-to examples that rely on software is that some of the software fades. The technology and applications shift.

    Flash dies. People consume data through different screen sizes. New tools make it easier to visualize data. Tastes change. The field develops.

    Visualize This, Second Edition is an update for the tools, chart types, and overall process that changed over the years. The examples are better balanced and more focused.

    The new book is still a practical, easy-to-read guide intended for my past self who wanted to make all the charts for all the data. But this time around, I had a decade and a half more experience analyzing data, making charts, and thinking about process.

    Visualize This, Second Edition is out in June, but you can pre-order a copy now. I hope it helps you have fun with data.

  • NatureQuant processes and analyzes satellite imagery to quantify people’s access to nature. They call it a NatureScore. For the Washington Post, Harry Stevens mapped and charted the scores across the United States. At first glance, the map looks a lot like population density, but the better comparison is in how cities with similar population densities look next to each other.

  • Members Only

    People need a sense of how distributions work before they can make sense of a histogram. Here’s how I (try to) make these misunderstood charts easier to read.
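    To make the counting behind the bars concrete, here is a small Python sketch (not from the tutorial) that sorts values into equal-width bins and prints one text bar per bin. The function names and the sample values are made up for illustration. Empty bins are kept on purpose so that gaps in a distribution stay visible.

```python
import math


def histogram(values, start, bin_width, n_bins):
    """Return (left_edge, count) pairs for n_bins equal-width bins.

    Bin k covers [start + k*bin_width, start + (k+1)*bin_width).
    Values outside that range are ignored; empty bins are kept.
    """
    counts = [0] * n_bins
    for v in values:
        k = math.floor((v - start) / bin_width)  # which bin v falls into
        if 0 <= k < n_bins:
            counts[k] += 1
    return [(start + k * bin_width, c) for k, c in enumerate(counts)]


def text_histogram(bins):
    # Bar length equals the count, the same thing a histogram's bar height encodes.
    return "\n".join(f"{left:>4} | {'#' * count}" for left, count in bins)


# Made-up values, purely for illustration.
sample = [5.5, 6, 6.5, 7, 7, 7.5, 8, 8, 9]
print(text_histogram(histogram(sample, start=5, bin_width=1, n_bins=5)))
```

    With the made-up sample, the five bins hold 1, 2, 3, 2, and 1 values, so the printed bars rise and fall in a symmetric hump. Changing `bin_width` regroups the same values and can change the shape a reader sees.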

  • As you might expect, the path of totality brought increased activity as people tried to get in the right spots. For the New York Times, Charlie Smart mapped the movements based on activity data from Mapbox and traffic data from TomTom.

  • In our earlier years, we tend to date and marry others who are around our age. However, this is not true for everyone. Variation kicks in when you look at later years or consider multiple marriages, divorce, separation, and opposite-sex versus same-sex relationships. This chart breaks it all down.

  • From xkcd, a Rube Goldberg machine that keeps on going. Edit a cell by adding xkcd-esque objects and watch the balls fall and bounce to neighboring cells.

  • Maybe you heard there’s a total eclipse happening today. AirDNA mapped Airbnb occupancy rates over the week. There might be a pattern.

    The anticipation of the solar eclipse has transformed an otherwise ordinary Monday into a lucrative opportunity for STR hosts located within the path of totality. As of March 25th, occupancy rates for April 7th have soared to an impressive 88% across all listings. This represents a massive surge in demand for accommodations on the night before the big celestial event.

  • April 5, 2024

    The easystats R package is on my to-try list.

    easystats is a collection of R packages, which aims to provide a unifying and consistent framework to tame, discipline, and harness the scary R statistics and their pesky models.

    Apparently it’s been around since 2022, but it’s new to me.

  • Members Only

    Show all the data at once so that you can see the full trend efficiently, but also show it a bit at a time so that you can see how the data builds.

  • Joanie Lemercier used a grid of spinning paddles that turn with the wind. Collectively, they show the flows through the air in real-time.

    It reminds me of a digital map that used a similar geometry to show wind patterns across the United States.

  • Alexander Miller wrote a “fable of emergence” that combines Conway’s Game of Life with Pandora’s Box.

    Conway’s game grew on Pandora the more she played. Although the rules of the game were relatively straightforward, it was surprisingly difficult to predict the next generation from the previous. Something was hidden within this deceptively simple format. The rules formed a subterranean structure of which she could only see the surface.

    It’s animated with a sprinkle of interaction to make sure you’re paying attention.
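    The rules Pandora finds so straightforward fit in a few lines. As an illustration that isn’t from Miller’s piece, here is a Python sketch of one generation of the standard Game of Life rules: a live cell survives with two or three live neighbors, and a dead cell becomes live with exactly three. Only live cells are stored, as a set of (row, col) coordinates, so the grid is unbounded. The function name `step` is hypothetical.

```python
from collections import Counter


def step(live):
    """Advance one generation. live is a set of (row, col) live-cell coordinates."""
    # Count, for every cell next to at least one live cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth with exactly 3 neighbors; survival with 2 or 3 (3 is covered by the first test).
    return {
        cell
        for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


blinker = {(1, 0), (1, 1), (1, 2)}
print(sorted(step(blinker)))
```

    Three cells in a row flip between horizontal and vertical every generation, and the five-cell glider reappears one row down and one column over every four generations. Both are small examples of structure that the two rules produce without stating it anywhere.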

  • It continues to get easier to take someone’s face and put that person in compromising situations. For The Markup, Mariel Padilla reports on states trying to catch up with the fast-developing technology.

    Carrie Goldberg, a lawyer who has been representing victims of nonconsensual porn—commonly referred to as revenge porn—for more than a decade, said she only started hearing from victims of computer-generated images more recently.

    “My firm has been seeing victims of deepfakes for probably about five years now, and it’s mostly been celebrities,” Goldberg said. “Now, it’s becoming children doing it to children to be mean. It’s probably really underreported because victims might not know that there’s legal recourse, and it’s not entirely clear in all cases whether there is.”

    The internet is going to get very weird and very confusing, especially for those who can’t fathom how a photo, a video, or audio could be fake when it seems so real. Scammers’ imaginations must be running wild these days.

  • OpenAI previewed Voice Engine, a model that generates speech mimicking a person’s voice from just a 15-second audio sample:

    We first developed Voice Engine in late 2022, and have used it to power the preset voices available in the text-to-speech API as well as ChatGPT Voice and Read Aloud. At the same time, we are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.

    They describe worthwhile use cases, such as language translation and providing a voice to those who are non-verbal, but oh boy, the authenticity of online things is going to get tricky very soon.

  • For Knowing Machines, an ongoing research project that examines the innards of machine learning systems, Christo Buschek and Jer Thorp turn attention to LAION-5B. The large image dataset is used to train various systems, so it’s worth figuring out where the dataset comes from and what it represents.

    As artists, academics, practitioners, or as journalists, dataset investigation is one of the few tools we have available to gain insight and understanding into the most complex systems ever conceived by humans.

    This is why advocating for dataset transparency is so important if AI systems are ever going to be accountable for their impacts in the world.

    If articles covering similar themes have confused you or were too handwavy, this one might clear that up. It describes the system and steps more concretely, so you finish with a better idea of how systems can end up with weird output.