In news graphics, blue typically represents Democrats and red represents Republicans. However, the parties’ own color choices aren’t so clear-cut. Chris Alcantara for The Washington Post broke it down across 900 campaign logos used during the recent midterms. Each strip represents a logo.
-
Kyle McDonald describes some of the history and current research on using algorithms to generate music. On how David Cope incorporated Markov chains to aid in his work:
In 1981 David Cope began working with algorithmic composition to solve his writers block. He combined Markov chains and other techniques (musical grammars and combinatorics) into a semi-automatic system he calls Experiments in Musical Intelligence, or Emmy. David cites Iannis Xenakis and Lejaren Hiller (Illiac Suite 1955, Experimental Music 1959) as early inspirations, and he describes Emmy in papers, patents, and even source code on GitHub. Emmy is most famous for learning from and imitating other composers.
I expected samples to sound robotic and unnatural, and some are, but some are quite pleasant to listen to.
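For a feel of the Markov chain part, here is a minimal sketch in Python. It is not Cope's system, which layers musical grammars and combinatorics on top. It only shows the basic idea: learn which note tends to follow which, then walk those transitions at random. The training melody and the function names are made up for illustration.

```python
import random
from collections import defaultdict


def build_transitions(notes):
    """Map each note to the list of notes that followed it in the training melody."""
    transitions = defaultdict(list)
    for current, following in zip(notes, notes[1:]):
        transitions[current].append(following)
    return transitions


def generate(transitions, start, length, seed=None):
    """Walk the chain from `start`, picking each next note at random."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        options = transitions.get(melody[-1])
        if not options:
            break  # dead end: this note was never followed by anything
        melody.append(rng.choice(options))
    return melody


# Hypothetical training melody, not taken from any real composer.
training = ["C", "D", "E", "C", "E", "G", "E", "D", "C"]
transitions = build_transitions(training)
melody = generate(transitions, start="C", length=8, seed=1)
print(" ".join(melody))
```

Because repeated followers appear multiple times in each list, more common transitions get picked more often, which is where the "imitation" comes from.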
-
When you go skiing or snowboarding, you get a map of the mountain that shows the terrain and where you can go. James Niehues is the man behind many of these hand-painted ski maps around the world, and he has a Kickstarter to catalog his life’s work.
This is kind of amazing. I went skiing a lot as a kid, and I have distinct memories of these maps. I would stand at the top of the mountain, rip off one of my gloves with my teeth, and then pull out a folded map from a zipped pocket. I never knew they were by the same man, but in retrospect, it makes sense.
-
Newsy, Reveal and ProPublica look into rape cases in the U.S. and law enforcement’s use of exceptional clearance.
The designation allows police to clear cases when they have enough evidence to make an arrest and know who and where the suspect is, but can’t make an arrest for reasons outside their control. Experts say it’s supposed to be used sparingly.
Data culled from various police departments shows the designation is used more often than one would expect.
-
As of yesterday, the Camp Fire death toll rose to 63, with 631 people missing. The Los Angeles Times provides some graphics showing scale and the buildings that burned.
Ugh. I live a few hundred miles away and the smoke is bad enough that my son’s school is closed today. It has not been a good year for California in terms of wildfires.
-
I’m behind on my podcast listening (well, behind in everything tbh), but Reply All covered the flaws of CompStat, a data system originally employed by the NYPD to track crime and hold officers accountable:
But some of these chiefs started to figure out, wait a minute, the person who’s in charge of actually keeping track of the crime in my neighborhood is me. And so if they couldn’t make crime go down, they just would stop reporting crime. And they found all these different ways to do it. You could refuse to take crime reports from victims, you could write down different things than what had actually happened. You could literally just throw paperwork away. And so that guy would survive that CompStat meeting, he’d get his promotion, and then when the next guy showed up, the number that he had to beat was the number that a cheater had set. And so he had to cheat a little bit more.
I sat in on a CompStat meeting years ago in Los Angeles. I went into it excited to see the data system that helped decrease crime, but I left skeptical after hearing the discussions over such small absolute numbers, which in turn made for a lot of fluctuations percentage-wise. Maybe things are different now a decade later, but I’m not surprised that some intentionally and unintentionally gamed the system.
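To make the small-numbers problem concrete, here is a quick sketch with made-up counts (they are not from any CompStat report). The same change of two incidents looks dramatic against a small baseline and negligible against a large one.

```python
def percent_change(before, after):
    """Percentage change from `before` to `after`."""
    return (after - before) * 100 / before


# Hypothetical incident counts, for illustration only.
small_precinct = percent_change(before=4, after=6)      # two more incidents
large_precinct = percent_change(before=400, after=402)  # also two more incidents

print(f"Small baseline: {small_precinct:+.1f}%")
print(f"Large baseline: {large_precinct:+.1f}%")
```

The first comes out to +50.0% and the second to +0.5%, even though the absolute change is identical.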
See also: FiveThirtyEight’s CompStat story from 2015.
-
Atma Mani, a geospatial engineer for ESRI, imagined shopping for a house with data, maps, and analysis. Basically, a personalized recommendation system:
The type of recommendation engine built in this study is called ‘content based filtering’ as it uses just the intrinsic and spatial features engineered for prediction. For this type of recommendation to work, we need a really large training set. In reality nobody can generate such a large set manually. In practice however, another type of recommendation called ‘community based filtering’ is used. This type of recommendation engine uses the features engineered for the properties, combined with favorite / blacklist data to find similarity between a large number of buyers. It then pools the training set from similar buyers to create a really large training set and learns on that.
I love going all nerd on these sorts of things. The most interesting part for me though is that it always seems to come down to a gut feeling. You have to see the house and get a feel for the area, which is much harder to get through data. So then, how do you couple the information you get from the data with more fuzzy emotions?
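For the curious, here is a rough sketch of the content-based idea. This is not Mani's method. I'm swapping in plain cosine similarity between a buyer's favorites and candidate houses, and the feature names, values, and house labels are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def profile(favorites):
    """Average the feature vectors of the houses a buyer favorited."""
    return [sum(column) / len(favorites) for column in zip(*favorites)]


def rank(candidates, favorites):
    """Order candidate houses from most to least similar to the buyer's profile."""
    target = profile(favorites)
    return sorted(candidates, key=lambda name: cosine(candidates[name], target), reverse=True)


# Hypothetical features scaled to 0..1: [size, yard, walkability]
favorites = [[0.8, 0.9, 0.2], [0.7, 0.8, 0.3]]
candidates = {
    "house_a": [0.8, 0.8, 0.2],
    "house_b": [0.2, 0.1, 0.9],
    "house_c": [0.5, 0.5, 0.5],
}
ranked = rank(candidates, favorites)
print(ranked)
```

With only two favorites, the profile is shaky, which is the training set problem the quote describes. Pooling favorites from similar buyers is one way around it.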
-
Street names are stories of life. They tell us something about how the people in a given place work and live, what they believe in and their dreams. There are more than a million streets and squares in Germany. ZEIT ONLINE has compiled a database of the roughly 450,000 different names used. Some street names are used hundreds of times and others only once. But none of the names were chosen at random.
It’s for street names in Germany, so the meaning might be lost for many of you, but much of the data comes from OpenStreetMap, which should mean something like this is doable for other cities and countries.
See also the San Francisco history of street names mapped by Noah Veltman a few years ago. [via @maartenzam]
-
Reading visualization research papers can often feel like a slog. As a necessity, there’s usually a lot of jargon, references to William Cleveland and Robert McGill, and sometimes perception studies that lack a bit of rigor. So for practitioners or people generally interested in data communication, worthwhile research falls into a “read later” folder never to be seen again.
Multiple Views, started by visualization researchers Jessica Hullman, Danielle Szafir, Robert Kosara, and Enrico Bertini, aims to explain the findings and the studies to a more general audience. (The UW Interactive Data Lab’s feed comes to mind.) Maybe the “read later” becomes read.
I’m looking forward to learning more. These projects have a tendency to start with a lot of energy and then fizzle out, so I’m hoping we can nudge this a bit to urge them on. Follow along here.
-
How I Made That: Animated Difference Charts in R
A combination of a bivariate area chart, animation, and a population pyramid, with a sprinkling of detail and annotation.
-
Charles-Joseph Minard, best known for a graphic he made (during retirement, one year before his death) showing Napoleon’s March, made many statistical graphics over his career. The Minard System from Sandra Rendgen is a collection of these works. The first section is background on Minard, his famed graphic, and his process, but really, you get it for the collection of vintage graphic goodness. [Amazon link]
-
The Earth Puzzle by generative design studio Nervous System has no defined borders. You put it together how you want.
Start anywhere and see where your journey takes you. This puzzle is based on an icosahedral map projection and has the topology of a sphere. This means it has no edges, no North and South, and no fixed shape. Try to get the landmasses together or see how the oceans are connected. Make your own maps of the earth!
Get it here. There’s also one for the moon.
-
Ben Schmidt uses deep scatterplots to visualize millions of data points. Points are shown or hidden algorithmically as you zoom in and out, much as you would with an interactive map. Schmidt describes the process and made the code available on GitHub.
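Here is one way to get that show-and-hide behavior, as a sketch under my own assumptions rather than a description of Schmidt's code. Each point gets a random priority up front, and any view draws only the highest-priority points that fall inside it, up to a fixed budget. The function names and the budget are hypothetical.

```python
import random


def assign_priorities(n_points, seed=0):
    """Give every point a unique random priority (0 is drawn first)."""
    rng = random.Random(seed)
    priorities = list(range(n_points))
    rng.shuffle(priorities)
    return priorities


def visible_points(points, priorities, viewport, budget):
    """Return indices of the top-priority points inside the viewport, at most `budget`."""
    x_min, y_min, x_max, y_max = viewport
    inside = [
        i for i, (x, y) in enumerate(points)
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]
    inside.sort(key=lambda i: priorities[i])
    return inside[:budget]


# Synthetic data standing in for a large dataset.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(10_000)]
priorities = assign_priorities(len(points))

zoomed_out = visible_points(points, priorities, (0, 0, 1, 1), budget=200)
zoomed_in = visible_points(points, priorities, (0, 0, 0.25, 0.25), budget=200)
print(len(zoomed_out), len(zoomed_in))
```

Since priorities are fixed, a point that is visible when zoomed out stays visible when you zoom in on it, and the freed-up budget goes to points that were hidden before.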
-
The Guardian goes with scaled, angled arrows to show the Republican and Democratic swings in these midterms for the House compared against those of 2016.
It reminds me of the classic wind-like map by The New York Times from 2012, but the angles seem to give the differences a bit more room to breathe.
Update: Also, see a similar map by NYT from 2016, except the arrows point the other direction.
-
Artificial intelligence, given its name, sounds like a computer learns everything on its own. However, a set of algorithms can only become useful if there’s something to learn from: data. Dave Lee for BBC reports on a company in Kenya that supplies training data for self-driving cars:
Brenda loads up an image, and then uses the mouse to trace around just about everything. People, cars, road signs, lane markings – even the sky, specifying whether it’s cloudy or bright. Ingesting millions of these images into an artificial intelligence system means a self-driving car, to use one example, can begin to “recognise” those objects in the real world. The more data, the supposedly smarter the machine.
On the one hand it sounds like tedious work on the cheap, but on the other it provides people with more opportunities that were previously unavailable.
-
Data grows more intertwined with the everyday and more involved in important decisions. However, data can be biased at every stage, from collection to analysis to conclusions, which is a problem when it is often intended to provide an objective point of view. In their recently released manuscript for Data Feminism, Catherine D’Ignazio and Lauren Klein discuss the importance of varied points of view:
The double-edged sword of data shows just how important it is to understand how structures of power and privilege operate in the world. The questions we might ask about these structures can relate to issues of gender in the workplace, as in the case of Christine Darden and her wrongly delayed promotion. Or they can relate to issues of broader social inequality, as in the case of predictive policing described just above. So one thing you will notice throughout this book is that not all of our examples are about women–and deliberately so. This is because data feminism is about more than women. It is about more than gender. Put simply: Data Feminism is a book about power in data science. Because feminism, ultimately, is about power too. It is about who has power and who doesn’t, about the consequences of those power differentials, and how those power differentials can be challenged and changed.
In the interest of making the published work as complete as possible, D’Ignazio and Klein made the manuscript public and are ready for feedback.
-
xkcd referenced the ever-so-loved forecasting needle. I’m so not gonna look at it this year. Maybe.
-
A meme that cried “jobs not mobs” began modestly, but a couple of weeks later it found its way into a slogan used by the President of the United States. Keith Collins and Kevin Roose for The New York Times traced the spread of the meme through social media using a beeswarm chart. Blue represents activity on Twitter, yellow represents Facebook, and orange represents Reddit. Circles are sized by retweets, likes, and upvotes. The notes for key activities move the story forward.