-
Members Only
-
While it is often easy, and tempting, to write a scraper as a dirty one-off script, spatula makes an attempt to provide an easy framework that most scrapers fit within without additional overhead.
This reflects the reality that many scraper projects start small but grow quickly, so reaching for a heavyweight tool from the start often does not seem practical.
The initial overhead imposed by the framework should be as light as possible, providing benefits even for authors that do not wish to use every feature available to them.
Although, without my dirty one-off scripts, what will I put in my tmp data folder?
-
How to Make Alluvial Diagrams
Here’s how to do it in R from start to finish, plus editing in illustration software. Make design choices and trade-offs for more readable charts.
-
The New York Times investigated the collapse of the Metro, which killed 26 people. They explain their findings with a 3-D model of the train and the tracks:
The Times took thousands of photographs of the crash site and shared the evidence with several leading engineers who reached the same conclusion: The steel studs that were vital to the strength of the overpass — linchpins of the entire structure — appear to have failed because of bad welds, critical mistakes that likely caused the crash.
Impressive use of visuals and scrollytelling to take you through the seemingly small mistakes that added up to a terrible outcome.
-
xkcd points out the importance of considering the baseline when making comparisons:
-
There’s a mouse plague in Australia right now. The words alone don’t express the scale and seriousness of this problem, but this Washington Post piece sure does. The combination of video, photos, and graphics clearly demonstrates the scale. It starts with a pair of mice and escalates quickly from there — and might give you the willies along the way.
-
In what’s become a recurring theme almost every year, the western United States is experiencing drought, much of it exceptional or extreme. Nadja Popovich for The New York Times has the small multiple maps to show June conditions each year since 2000.
-
As a way to reflect on 2020, Catherine Madden wove a tapestry to visualize six time series. So nice. [via Visualising Data]
-
This is a fun soccer graphic by Karim Douïeb. It shows 882,536 passes from 890 matches across various leagues and seasons. It looks cool as a static point cloud, but be sure to check out the animated, interactive version which lets you isolate the view to specific parts of the field.
It reminds me of the Windows 3.1 fireworks screensaver. Those were the days.
You can find the data via StatsBomb, in case you want to play around.
-
ProPublica anonymously obtained billionaires’ tax returns. Combining the data with Forbes’ billionaire wealth estimates, ProPublica calculated a “true tax rate” for America’s 25 richest people:
The results are stark. According to Forbes, those 25 people saw their worth rise a collective $401 billion from 2014 to 2018. They paid a total of $13.6 billion in federal income taxes in those five years, the IRS data shows. That’s a staggering sum, but it amounts to a true tax rate of only 3.4%.
It’s a completely different picture for middle-class Americans, for example, wage earners in their early 40s who have amassed a typical amount of wealth for people their age. From 2014 to 2018, such households saw their net worth expand by about $65,000 after taxes on average, mostly due to the rise in value of their homes. But because the vast bulk of their earnings were salaries, their tax bills were almost as much, nearly $62,000, over that five-year period.
As you might guess, a lot of the disparity has to do with wealth held in unrealized capital gains. The other part is how the ultrawealthy still pay for everything when most of their money is in investments and how that factors into deductions.
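The 3.4% figure in the quote falls straight out of the two totals it cites. Here is a minimal sketch of that arithmetic, assuming the "true tax rate" is simply taxes paid divided by wealth growth over the same period. The function name is hypothetical and not from ProPublica.

```python
def true_tax_rate(taxes_paid: float, wealth_growth: float) -> float:
    """Return taxes paid as a share of wealth growth over the same period."""
    if wealth_growth <= 0:
        raise ValueError("wealth_growth must be positive")
    return taxes_paid / wealth_growth


# Totals quoted above, in billions of dollars, for 2014 to 2018.
rate = true_tax_rate(taxes_paid=13.6, wealth_growth=401)
print(f"{rate:.1%}")  # 3.4%
```

Both inputs only need to share the same unit, since the result is a ratio.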
-
How long will chicken reign supreme? Who wins between lemon and lime? Is nonfat ice cream really ice cream? Does grapefruit ever make a comeback? Find out in these charts.
-
The Center for Responsive Politics and the National Institute on Money in Politics are merging their datasets to make them more accessible:
The nation’s two leading money-in-politics data organizations have joined forces to help Americans hold their leaders accountable at the federal and state levels, they said today.
The combined organization, OpenSecrets, merges the Center for Responsive Politics (CRP) and the National Institute on Money in Politics (NIMP), each leading entities for three decades. The merger will provide a new one-stop shop for integrated federal, state and local data on campaign finance, lobbying and more, that is both unprecedented and easy to use.
Good. More important than ever.
-
Any outdoor activity sounds amazing at this point. Andrea Roberson and Casey Miller for the Los Angeles Times put together this charming guide to rockhounding in California. Each rock type has the tools needed, laws, and where to find it. The guide even has some 3-D models in there for good measure.
-
We see probabilities mentioned in the news, in weather forecasts, during sporting events, political arguments, business reports, elections, medical advice, and scientific findings. But probability is a tricky concept that not all (most?) people understand. Grace Huckins for The Open Notebook outlines useful ways to communicate the numbers more clearly — to increase the chances readers do understand.
On using concrete numbers over percentages:
Concrete numbers can also make statistics feel more personally relevant. A 0.5 percent risk of developing a particular kind of cancer may seem minuscule. But if a reader went to a high school with 1,000 students, they may find it more impactful to hear that five of their classmates, on average, will develop the disease. In a March 2021 story, American Public Media used concrete numbers rather than percentages to communicate race disparities in COVID deaths. They reported that 1 of every 390 Indigenous Americans had died of COVID.
Other tips include using visuals, relatable comparisons, and acknowledging uncertainty instead of speaking in absolutes.
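The percentage-to-concrete-number conversion in the quote is easy to try yourself. This is a rough sketch, not anything from the article, and both helper names are made up for illustration.

```python
def expected_count(risk_percent: float, group_size: int) -> float:
    """Expected number of people affected in a group of the given size."""
    return risk_percent * group_size / 100


def one_in_n(risk_percent: float) -> int:
    """Express a percentage risk as '1 in N', rounded to the nearest whole N."""
    if risk_percent <= 0:
        raise ValueError("risk_percent must be positive")
    return round(100 / risk_percent)


# The high school example from the quote: a 0.5 percent risk among 1,000 students.
print(expected_count(0.5, 1000))  # 5.0
print(one_in_n(0.5))              # 200

# Going the other way: "1 of every 390" as a percentage.
print(f"{100 / 390:.2f}%")        # 0.26%
```

Seeing "1 of every 390" next to "0.26%" makes the article's point: the two say the same thing, but the first is easier to picture.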
-
Asian Americans are often viewed as a “model minority,” but when you look just a little bit closer, the tag doesn’t fit. Connie Hanzhang Jin for NPR breaks it down in a set of six charts.
-
Reddit user desfirsit made heatmaps to show where on the chess board pieces are usually captured. The top two rows are for black pieces, and the bottom row is for white pieces.
I’m no chess player, but this seems to look right? The frequency of captures appears to agree with movement patterns. Although I’m surprised that the queen, despite having the most freedom of movement, is often captured in the same place. But like I said, I’m no chess player.
You can grab the data from Lichess, which provides data for millions of chess games.
-
Your computer connects to your router, which connects to your modem. Your printer connects to your computer. The devices all send data and talk to each other. Nicole He and Eran Hilleli imagined these conversations in augmented reality:
The application would first detect all of the different devices connected to your network; this would include the more obvious ones like computers or phones, as well as other things, like TVs, speakers, game consoles, vacuums or washing machines. It would then locate their manufacturing data and use it to recast your devices as charming characters, spawning on nearby surfaces in augmented reality. Each character’s design would hint at the device it represents while remaining playful and open to interpretation (e.g. a character that resembles a TV portraying your TV).
The playful, cartoon-like devices contrast with the creepier angle of a connected home.
-
Connor Rothschild charted all the colors Bob Ross used in The Joy of Painting:
Most commonly, paintings have 12 colors. Of the 403 pieces in The Joy of Painting, 100 used 12 colors.
The peak is concentrated around 12, meaning most of Ross’ paintings used somewhere in the range of 7-13 colors; very rarely did they venture outside of that range.
Grab the data here, which was collected by Jared Wilber.