Say you want to identify clusters in a scatterplot of points. K-Means is a commonly used method that might get you there. Yi Zhe Ang explains how the method works with a visual and interactive essay.
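For a rough sense of the mechanics, here is a minimal sketch of the standard K-Means loop (often called Lloyd's algorithm) in plain Python. It is not taken from the essay: the function name `kmeans`, its parameters, and the random choice of starting centroids are assumptions made for illustration, and it only handles 2D points.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2D points with the standard K-Means loop.

    points: list of (x, y) tuples. Returns (centroids, labels).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Assumption: start from k distinct points picked at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = []

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])

        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, labels
```

Calling `kmeans(points, 2)` on a list of `(x, y)` tuples returns the final centroids and a cluster label for each point. Results can depend on the starting centroids, so real implementations usually try several starts.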
-
Anahad O’Connor, Aaron Steckelberg and Garland Potts, for The Washington Post, made charts that compare the benefits of coffee and tea. But let’s be honest here. All we really want to see in a battle between coffee and tea is an anthropomorphic bean and leaf wrestle.
-
The Olli library aims to make it easier for developers to improve the accessibility of existing charts:
Olli is an open-source library for converting data visualizations into accessible text structures for screen reader users. Starting with an existing visualization specification created with a supported toolkit, Olli produces a keyboard-navigable tree view with descriptions at varying levels of detail. Users can explore these structures both to get an initial overview, and to dive into the data in more detail.
-
Simon Willison asked a straightforward question about the tools people use:
If someone gives you a CSV file with 100,000 rows in it, what tools do you use to start exploring and understanding that data?
Then he expanded the question asking what people use for files with 1 million rows, 10 million rows, and 1 billion rows.
Browse the thousands of replies, and you quickly see that (1) there are many options to explore a dataset and (2) many people feel that what they’re using is the best option. There are click-and-play programs, web-based products, programming languages, and command-line options. Some use a combination of whatever works for them at a given time for a certain dataset.
This is why when people ask me what the “best” tool is, I usually have to follow up with what they know already and what they want to do with the tool. It’s also why best-of lists for data exploration are usually not worth your time, unless you account for the assumptions about usage.
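To make the programming-language route concrete, here is one possible first pass at an unfamiliar CSV, sketched with only Python's standard library. It is one option among many rather than a recommendation. The function name `summarize_csv` and the `max_rows` parameter are hypothetical, and the sketch assumes a UTF-8 file with a header row.

```python
import csv


def summarize_csv(path, max_rows=None):
    """Stream a CSV once and collect simple per-column summaries.

    Returns (row_count, columns), where columns maps each header to
    counts of missing and numeric values plus the numeric min and max.
    Hypothetical helper for illustration only.
    """
    columns = {}
    row_count = 0

    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if max_rows is not None and row_count >= max_rows:
                break
            row_count += 1

            for name, value in row.items():
                if name is None:
                    continue  # extra fields beyond the header row
                col = columns.setdefault(
                    name, {"missing": 0, "numeric": 0, "min": None, "max": None}
                )
                if value is None or value.strip() == "":
                    col["missing"] += 1
                    continue
                try:
                    number = float(value)
                except ValueError:
                    continue  # non-numeric text, counted as neither
                col["numeric"] += 1
                col["min"] = number if col["min"] is None else min(col["min"], number)
                col["max"] = number if col["max"] is None else max(col["max"], number)

    return row_count, columns
```

Because it reads one row at a time, memory use stays flat as the file grows, though a single pass gets slow long before a billion rows. That is roughly where the replies shift towards databases and other tools.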
-
It seems a lot of data scientists have either left or been laid off from their jobs during the past few months. Jacqueline Nolis and Emily Robinson, data scientists who hosted a podcast and wrote a book on building a career in the field, happened to be in the lot. So naturally, they brought back the podcast for a bonus episode on their experiences with sudden unemployment and the job search.
I’ve never had a “real” job (as some tend to tell me), so workplace experiences are always interesting to me, like peering into an aquarium. The layoff process seems not fun.
-
Kelton Sears used a vertical scroll upwards to think about trees and time.
-
Bringing in data from various federal agencies:
Climate Mapping for Resilience and Adaptation (CMRA) integrates information from across the federal government to help people consider their local exposure to climate-related hazards. People working in community organizations or for local, Tribal, state, or Federal governments can use the site to help them develop equitable climate resilience plans to protect people, property, and infrastructure.
-
You know those signs in workplaces that keep track of days since injury? Making use of NASA APIs, Neal Agarwal used that concept to keep track of natural disasters. As of this writing, it’s been 9,691,764 days since the last Apocalyptic Volcanic Eruption (VEI 8). Pretty good.
-
Members Only
You can use straightforward functions in R to draw certain shapes, such as circles, squares, and rectangles. However, sometimes you need to draw a more complicated shape or one that’s based on data.
-
NOAA provides a map of potential flooding due to Hurricane Ian, which is headed towards Florida. Red indicates greater than 9 feet of flooding above ground.
-
When someone fires a gun into the air, the bullet travels thousands of feet in elevation. Gravity pulls the bullet back down, and by the time it reaches ground level, it is moving fast enough to penetrate a human skull. Acceleration and trajectory vary by type of gun and shot angle. 1Point21 Interactive shows the variation and dangers with a visual explainer.
-
To teach, learn, and measure the process of analysis more concretely, Lucy D’Agostino McGowan, Roger D. Peng, and Stephanie C. Hicks explain their work in the Journal of Computational and Graphical Statistics:
The design principles for data analysis are qualities or characteristics that are relevant to the analysis and can be observed or measured. Driven by statistical thinking and design thinking, a data analyst can use these principles to guide the choice of which data analytic elements to use, such as code, code comments, data visualization, non-data visualization, narrative text, summary statistics, tables, and statistical models or computational algorithms (Breiman 2001), to build a data analysis. Briefly, the elements of an analysis are the individual basic components of the analysis that, when assembled together by the analyst, make up the entire analysis.
-
Randall Munroe provides another fine observation through xkcd.
I often wonder what our data and charts will look like a century or two from now. Will the conventions and aesthetics look silly and amateur or classic and vintage? Will what seems like a lot of detailed data now seem spotty and useless, or will we look back in disbelief that companies were allowed to track our activities? Will AI have taken over human cognition and made these questions obsolete, because we’re in a suspended dream state, our bodies used as energy to power supercomputers, unsure of what is real and what is simulated? Important questions.
-
Wildfire obviously damages the areas it comes in direct contact with, but wildfire smoke can stretch much farther. Based on research by Childs et al., Mira Rojanasakul, for The New York Times, shows how pollution from smoke spread between 2006 and 2020.
My kids’ rooms still have air filters from a few years ago, when a fire many miles away made the sky orange and our indoor environment smoky.
-
I heard you like spiral charts when the data is seasonal. I think that’s what Kevin Schaul and Hamza Shaban, for The Washington Post, had in mind when they charted housing demand through the lens of percentage of houses sold within two weeks.
-
Rafael Moral sang a very nerdy data analyst song, to the tune of “One Week” by Barenaked Ladies:
The “Data Horror Stories Song”, inspired by a tweet by @rogierK and commissioned by @LisaDeBruine
Any of these ever happened to you? #rstats #Statistics #DataScience pic.twitter.com/7A8PYGbolq
— Rafael Moral (@rafamoral) September 18, 2022
-
In a collaborative effort with UX agency Kore, Moritz Stefaner describes work with the World Health Organization to develop a data design language for their evolving data collections:
Deliberately designed as a toolbox, rather than a “rule book”, the Data Design Language includes not only principles and guidelines, but also a corresponding design vocabulary and repertoire — for instance, downloadable styles for color and typography, exemplary chart designs, as well as clear specifications for achieving high levels of responsiveness, interaction, internationalization and accessibility.
A custom chart library constitutes the reference implementation for the language and its principles. A corresponding chart creation tool will make it very easy for editors to effortlessly create and publish excellent charts.
-
A reliable dense fog is a defining characteristic of San Francisco, delighting some more than others, but the pattern of fog could be on its way out as the climate changes. Scott Reinhard, for The New York Times, visualized the flow of fog and what sucks it into the bay. That intro image is something.