R has found its way into a good number of news groups who do data journalism. Andrew Flowers for FiveThirtyEight talks about how they use the statistical computing language throughout their workflow.
R is used in every step of the data journalism process: for cleaning and processing data, for exploratory graphing and statistical analysis, for models deployed in real time, and to create publishable data visualizations. We write R code to underpin several of our popular interactives, as well, like the Facebook Primary and our historical Elo ratings of NBA and NFL teams. Heck, we’ve even styled a custom ggplot2 theme. We even use R code on long-term investigative projects.
Thanks to the Digital Humanities and Data Journalism Symposium for sponsoring the feed this week.
In 2012 Dan Cohen, founding executive director of the Digital Public Library of America, wrote: “I’ve increasingly felt that digital journalism and digital humanities are kindred spirits.” Inspired by Cohen’s article, the University of Miami is launching the first Digital Humanities and Data Journalism (DH+DJ) Symposium between September 29 and October 1.
The DH+DJ symposium will bring together two communities that share similar interests and use largely the same tools and programming languages to interrogate data. It will be a relatively small gathering consisting of a series of talks and classes on topics that are important to both areas and to their intersection, such as big data management and analytics, effective communication, data visualization and infographics, mapping, etc.
The talks and classes will be delivered by prominent names from both disciplines, who will blend the theoretical with the practical: Dan Cohen himself, as well as ProPublica’s Scott Klein, FiveThirtyEight’s Allison McCann, Mashable’s Haile Owusu, Georgia Tech’s Lauren Klein, Northeastern University’s Ben Schmidt, Carnegie Mellon’s Scott Weingart, Northwestern University Knight Lab’s Joe Germuska, and many others. Please refer to the website for a complete schedule.
In addition to the talks, attendees will have ample opportunity to network and engage in discussions. The main goal of this experimental gathering is to prompt these conversations. We expect them to be a major highlight of the event.
Most people have one or two drinks on average, but some consume much more.
Clive Thompson for Smithsonian Magazine gives a quick history lesson on infographics.
[D]ata visualization was rare because data was rare. That began to change rapidly in the early 19th century, because countries began to collect—and publish—reams of information about their weather, economic activity and population. “For the first time, you could deal with important social issues with hard facts, if you could find a way to analyze it,” says Michael Friendly, a professor of psychology at York University who studies the history of data visualization. “The age of data really began.”
Thompson says “infographic” where he mostly means “data visualization,” but it’s still a good overview.
And while we’re on the topic of old visualization stuff, you should also check out Scott Klein’s newsletter, Above Chart. The history provides fine context for where visualization is at now.
Last year the New York Times interviewed Justin Bieber, Diplo, and Skrillex about how they put together their song Where Are Ü Now. NYT coupled the video with data visualization elements that helped you understand what the artists talked about. Pretty great.
Now here’s what happens when you switch out the original song and insert the Seinfeld theme song.
what if Bieber diplo and skrilex created the seinfeld theme https://t.co/BUTOrNDwGQ
— Seinfeld Current Day (@Seinfeld2000) July 4, 2016
Also pretty great.
Drought continues to trudge along. My grass is just about dead, save a few hardy patches clinging on to the last few drops in the soil. Sad state of affairs it is. Drought is not static though. The boundaries move and the levels change, which is what John Nelson mapped in an overlay of five years of drought in the United States, based on data from the Drought Monitor.
We’ve seen this as small multiples and animated maps, but I like how this static boundary version gives a sense of shift without actually moving.
Musicmap is an attempt to show the history of music over time and how it came to be what it is today.
Musicmap attempts to provide the ultimate genealogy of popular music genres, including their relations and history. It is the result of more than seven years of research with over 200 listed sources and cross examination of many other visual genealogies. Its aim is to focus on the delicate balance between comprehensibility, accuracy and accessibility.
Be sure to zoom in for the details of how genres and sub-genres are connected. Click on any group for samples, playlists, and written history.
Let the data speak for itself, they say. That doesn’t work a lot of the time, and when it doesn’t, you need to explain.
Glenn McDonald attempts to graph the musical space in its entirety on a two-dimensional scale. He calls it Every Noise at Once.
This is an ongoing attempt at an algorithmically-generated, readability-adjusted scatter-plot of the musical genre-space, based on data tracked and analyzed for 1491 genres by Spotify. The calibration is fuzzy, but in general down is more organic, up is more mechanical and electric; left is denser and more atmospheric, right is spikier and bouncier.
Click on the genres for music samples, if you are like me and are not sure what rap metalcore or ghettotech sounds like. [Thanks, Namir]
FiveThirtyEight published their election forecast tracker this week, and it’s a beaut. It starts with the standard state map and, most importantly, the probability of each candidate winning the presidency. After that, you can dig into plenty of detail on a state-by-state basis.
They currently give Hillary Clinton a 79 percent chance of winning and Donald Trump a 21 percent chance. I think many interpret this as Clinton being practically a lock, but it’s actually far from it. That 21 percent is freakin’ high.
As much as I want to forget, let’s remember that the Cleveland Cavaliers only had an 11 percent chance of winning the title, and I think we know what happened there.
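To get a feel for how often a 21 percent (or 11 percent) underdog actually comes through, here is a minimal Python sketch. It is not FiveThirtyEight’s model. It simply treats each run as an independent contest with a fixed win probability, and the function name, trial count, and seed are made up for illustration.

```python
import random

def underdog_win_rate(p_win, trials=100_000, seed=538):
    """Simulate `trials` independent contests; return the share the underdog wins."""
    rng = random.Random(seed)  # fixed seed (arbitrary choice) so runs are repeatable
    wins = sum(rng.random() < p_win for _ in range(trials))
    return wins / trials

rate_21 = underdog_win_rate(0.21)  # the forecast's 21 percent underdog
rate_11 = underdog_win_rate(0.11)  # the 11 percent title chance mentioned above

print(f"21% underdog wins about {rate_21:.1%} of simulated runs")
print(f"11% underdog wins about {rate_11:.1%} of simulated runs")

# Plain arithmetic: an event with probability p is expected about once every 1/p tries.
print(f"A 21% event is expected roughly once every {1 / 0.21:.1f} tries")
```

Once in every five or so tries is not rare, which is the point: a 21 percent chance is nowhere near zero.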
Meredith Reba, Femke Reitsma and Karen C. Seto compiled a dataset of urban settlements between 3700 BC and AD 2000. It’s the first of its kind, and while not comprehensive, it does provide a view into growth around the world. Max Galka put together the straightforward map above to display the data.
D3 4.0 is modular. Instead of one library, D3 is now many small libraries that are designed to work together. You can pick and choose which parts to use as you see fit. Each library is maintained in its own repository, allowing decentralized ownership and independent release cycles. The default bundle combines about thirty of these microlibraries.
Small files are nice, but modularity is also about making D3 more fun. Microlibraries are easier to understand, develop and test. They make it easier for new people to get involved and contribute. They reduce the distinction between a “core module” and a “plugin”, and increase the pace of development in D3 features.
For various occupations, the difference between the person who makes the most and the one who makes the least can be significant.
Hollywood has been talking gender equality in the movies more than usual lately, so Hanah Andersen and Matt Daniels for Polygraph looked into the matter from a data perspective.
We didn’t set out trying to prove anything, but rather compile real data. We framed it as a census rather than a study. So we Googled our way to 8,000 screenplays and matched each character’s lines to an actor. From there, we compiled the number of words spoken by male and female characters across roughly 2,000 films, arguably the largest undertaking of script analysis, ever.
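The counting step they describe can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Polygraph’s actual pipeline: the sample dialogue is invented, gender is assumed to be already attached to each line, and words are split on whitespace.

```python
from collections import Counter

# Hypothetical sample: (character, gender, line of dialogue). Not real screenplay data.
dialogue = [
    ("ALICE", "female", "We need to leave now."),
    ("BOB", "male", "Why? The ship is not ready."),
    ("ALICE", "female", "It never will be."),
]

def words_by_gender(lines):
    """Sum whitespace-separated word counts of each line, grouped by gender."""
    totals = Counter()
    for _character, gender, text in lines:
        totals[gender] += len(text.split())
    return totals

totals = words_by_gender(dialogue)
share_female = totals["female"] / sum(totals.values())
print(dict(totals), f"female share: {share_female:.0%}")
```

The hard part in practice is presumably the matching of lines to characters and actors, which this sketch skips entirely.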
We have another data point on the way, so it might suddenly go silent around these parts soon. There was a surprising amount of downtime with the first data point, with naps and feeding and such, so I was able to keep going. But I expect my hands to be more full this time, because, well, two data points.
Video game developer Michael Davies provides a Blender script to procedurally generate pretty 3-D spaceships. Enter your parameters, such as number of hull segments, scaling, and rotation, and you’ve got a new vehicle for the stars. [via @albertocairo]
Filmmaker Oscar Sharp and technologist Ross Goodwin fed a machine learning algorithm with a bunch of Sci-Fi movie scripts to see what new script it would spit out. A script for Sunspring is the result, and this is the film, starring Thomas Middleditch. Riveting.
Dango is an Android app that predicts relevant emojis as you type. Xavier Snelgrove, the CTO for the group, explains how they use neural networks to make that happen.
Recently, neural networks have become the tool of choice for a variety of tough computer-science problems: Facebook uses them to identify faces in photos, Google uses them to identify everything in photos. Apple uses them to figure out what you’re saying to Siri, and IBM uses them for operationalizing business unit synergies.
It’s all very impressive. But what about the real problems? Can neural networks help you find the ? emoji when you really need it?
Why, yes. Yes they can. ?
The Tampa Bay Times takes you through a 3-D model of Pulse Nightclub in Orlando, driven by the narratives of those who were there that night. Heartbreaking.