• November 14, 2018

    Atma Mani, a geospatial engineer for ESRI, imagined shopping for a house with data, maps, and analysis. Basically, a personalized recommendation system:

    The type of recommendation engine built in this study is called ‘content based filtering’ as it uses just the intrinsic and spatial features engineered for prediction. For this type of recommendation to work, we need a really large training set. In reality nobody can generate such a large set manually. In practice however, another type of recommendation called ‘community based filtering’ is used. This type of recommendation engine uses the features engineered for the properties, combined with favorite / blacklist data to find similarity between a large number of buyers. It then pools the training set from similar buyers to create a really large training set and learns on that.
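
    To make the two approaches in the quote concrete, here is a minimal sketch in Python. It is not Mani's implementation: the feature columns, the cosine-similarity scoring, the `min_similarity` threshold, and all the numbers are assumptions chosen for illustration. `content_score` ranks houses using only one buyer's own favorites and blacklist, and `pooled_favorites` imitates the pooling step by borrowing favorites from buyers with a similar taste profile.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def content_score(house, favorites, blacklist):
    """Content-based score: higher when a house resembles the favorites
    more than it resembles the blacklisted houses."""
    return cosine(house, centroid(favorites)) - cosine(house, centroid(blacklist))

def pooled_favorites(buyer, others, min_similarity=0.9):
    """Pool favorites from other buyers whose average favorite is close
    to this buyer's average favorite (hypothetical similarity rule)."""
    profile = centroid(buyer)
    pooled = list(buyer)
    for other in others:
        if cosine(profile, centroid(other)) >= min_similarity:
            pooled.extend(other)
    return pooled

# Hypothetical features: [bedrooms, price in $100k, km to nearest school]
favorites = [[3, 4.0, 1.0], [4, 4.5, 0.8]]
blacklist = [[1, 2.0, 6.0]]
candidates = {"A": [3, 4.2, 1.2], "B": [1, 1.8, 5.5]}

# Rank candidate houses from best to worst match for this buyer.
ranked = sorted(
    candidates,
    key=lambda name: content_score(candidates[name], favorites, blacklist),
    reverse=True,
)
```

    With only two favorites the content-based score has very little to learn from, which is the small-training-set problem the quote describes. Pooling grows the set without asking the buyer to label more houses.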

    I love going all nerd on these sorts of things. The most interesting part for me, though, is that it always seems to come down to a gut feeling. You have to see the house and get a feel for the area, which is much harder to get through data. So then, how do you couple the information you get from the data with fuzzier emotions?

  • November 13, 2018

    Topic

    Maps

    From Streetscapes by Zeit:

    Street names are stories of life. They tell us something about how the people in a given place work and live, what they believe in and their dreams. There are more than a million streets and squares in Germany. ZEIT ONLINE has compiled a database of the roughly 450,000 different names used. Some street names are used hundreds of times and others only once. But none of the names were chosen at random.

    It’s for street names in Germany, so the meaning might be lost for many of you, but much of the data comes from OpenStreetMap, which should mean something like this is doable for other cities and countries.

    See also the San Francisco history of street names mapped by Noah Veltman a few years ago. [via @maartenzam]

  • November 12, 2018

    Reading visualization research papers can often feel like a slog. As a necessity, there’s usually a lot of jargon, references to William Cleveland and Robert McGill, and sometimes perception studies that lack a bit of rigor. So for practitioners or people generally interested in data communication, worthwhile research falls into a “read later” folder never to be seen again.

    Multiple Views, started by visualization researchers Jessica Hullman, Danielle Szafir, Robert Kosara, and Enrico Bertini, aims to explain the findings and the studies to a more general audience. (The UW Interactive Data Lab’s feed comes to mind.) Maybe the “read later” becomes read.

    I’m looking forward to learning more. These projects have a tendency to start with a lot of energy and then fizzle out, so I’m hoping we can nudge this a bit to urge them on. Follow along here.

  • Members Only

    How I Made That: Animated Difference Charts in R

    A combination of a bivariate area chart, animation, and a population pyramid, with a sprinkling of detail and annotation.

  • November 9, 2018

    Charles-Joseph Minard, best known for a graphic he made (during retirement, one year before his death) showing Napoleon’s March, made many statistical graphics over his career. The Minard System from Sandra Rendgen is a collection of these works. The first section is background on Minard, his famed graphic, and his process, but really, you get it for the collection of vintage graphic goodness. [Amazon link]

  • November 8, 2018

    The Earth Puzzle by generative design studio Nervous System has no defined borders. You put it together how you want.

    Start anywhere and see where your journey takes you. This puzzle is based on an icosahedral map projection and has the topology of a sphere. This means it has no edges, no North and South, and no fixed shape. Try to get the landmasses together or see how the oceans are connected. Make your own maps of the earth!

    Get it here. There’s also one for the moon.

  • Members Only
    November 8, 2018

    Topic

    The Process

    Election night has become quite the event for newsrooms and graphics departments over the years, and the visualization production cycle has started to feel more familiar each time.

  • November 8, 2018

    Ben Schmidt uses deep scatterplots to visualize millions of data points. An algorithm decides which points to show and which to hide as you zoom in and out, much like you would with an interactive map. Schmidt describes the process and made the code available on GitHub.
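
    The show-and-hide idea can be sketched in a few lines of Python. This is a simplified stand-in, not Schmidt's code: the fixed random priority per point, the `budget` parameter, and the function names are all assumptions. Each point gets a rank once, and any view draws only the lowest-ranked points inside the viewport, so zooming in reveals more points without dropping the ones already on screen.

```python
import random

def assign_priorities(points, seed=0):
    """Attach a fixed, unique random rank to each (x, y) point, once, up front."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)
    return [(x, y, rank) for (x, y), rank in zip(points, order)]

def visible_points(ranked, viewport, budget):
    """Return at most `budget` points inside the viewport, lowest rank first."""
    xmin, ymin, xmax, ymax = viewport
    inside = [p for p in ranked if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
    inside.sort(key=lambda p: p[2])
    return inside[:budget]

# Made-up data: 10,000 uniformly random points in the unit square.
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(10_000)]
ranked = assign_priorities(points)

# Full view: far more points than the budget, so only a sample is drawn.
zoomed_out = visible_points(ranked, (0.0, 0.0, 1.0, 1.0), budget=500)
# Zoomed view: fewer points fall inside, so more of them fit the budget.
zoomed_in = visible_points(ranked, (0.4, 0.4, 0.5, 0.5), budget=500)
```

    A linear scan over every point would not scale to millions of points. A real implementation would need a spatial index or tiled data so that each view touches only nearby points.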

  • November 7, 2018

    The Guardian goes with scaled, angled arrows to show the Republican and Democratic swings in these midterms for the House compared against those of 2016.

    It reminds me of the classic wind-like map by The New York Times from 2012, but the angles seem to give the differences a bit more room to breathe.

    Update: Also, see a similar map by NYT from 2016, except the arrows point the other direction.

  • November 7, 2018

    Topic

    Statistics

    Artificial intelligence, given its name, sounds like a computer learns everything on its own. However, a set of algorithms can only become useful if there’s something to learn from: data. Dave Lee for BBC reports on a company in Kenya that supplies training data for self-driving cars:

    Brenda loads up an image, and then uses the mouse to trace around just about everything. People, cars, road signs, lane markings – even the sky, specifying whether it’s cloudy or bright. Ingesting millions of these images into an artificial intelligence system means a self-driving car, to use one example, can begin to “recognise” those objects in the real world. The more data, the supposedly smarter the machine.

    On the one hand, it sounds like tedious work on the cheap, but on the other, it provides people with opportunities that were previously unavailable.

  • November 6, 2018

    Data grows more intertwined with the everyday and more involved in important decisions. However, data is biased in many ways, from collection to analysis to conclusions, which is a problem when it is often intended to provide an objective point of view. In their recently released manuscript for Data Feminism, Catherine D’Ignazio and Lauren Klein discuss the importance of varied points of view:

    The double-edged sword of data shows just how important it is to understand how structures of power and privilege operate in the world. The questions we might ask about these structures can relate to issues of gender in the workplace, as in the case of Christine Darden and her wrongly delayed promotion. Or they can relate to issues of broader social inequality, as in the case of predictive policing described just above. So one thing you will notice throughout this book is that not all of our examples are about women–and deliberately so. This is because data feminism is about more than women. It’s about more than gender. Put simply: Data Feminism is a book about power in data science. Because feminism, ultimately, is about power too. It is about who has power and who doesn’t, about the consequences of those power differentials, and how those power differentials can be challenged and changed.

    In the interest of making the published work as complete as possible, D’Ignazio and Klein made the manuscript public and are ready for feedback.

  • November 6, 2018

    Topic

    News

    xkcd referenced the ever-so-loved forecasting needle. I’m so not gonna look at it this year. Maybe.

  • November 5, 2018

    A meme that cried “jobs not mobs” began modestly, but a couple of weeks later it found its way into a slogan used by the President of the United States. Keith Collins and Kevin Roose for The New York Times traced the spread of the meme through social media using a beeswarm chart. Blue represents activity on Twitter, yellow represents Facebook, and orange represents Reddit. Circles are sized by retweets, likes, and upvotes. The notes for key activities move the story forward.

  • November 5, 2018

    The Economist built an election model that treats demographic variables like blocks that output a probability of voting Republican or Democrat:

    Our model adds up the impact of each variable, like a set of building blocks. As a result, a group of weak predictors that point in the same direction can cancel out a single strong one. In theory, the model could identify a black voter as a Republican leaner, or a white evangelical as a probable Democrat—though it would require quite an unusual profile.
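
    The building-block idea can be illustrated with a small additive model in Python. This is a sketch under assumptions, not The Economist's model: it supposes each variable contributes a fixed amount in log-odds space (as in a logistic regression), and the variable names and weights are invented. It shows how several weak predictors pointing one way can outweigh a single strong predictor pointing the other way.

```python
from math import exp

def probability_republican(contributions, baseline=0.0):
    """Sum per-variable contributions in log-odds space, then convert
    the total to a probability with the logistic function."""
    log_odds = baseline + sum(contributions.values())
    return 1.0 / (1.0 + exp(-log_odds))

# Hypothetical contributions: positive leans Republican, negative leans Democrat.
voter = {
    "strong_predictor": -1.2,
    "weak_predictor_1": 0.5,
    "weak_predictor_2": 0.4,
    "weak_predictor_3": 0.4,
}

# The strong predictor on its own gives a clear Democratic lean.
p_strong_only = probability_republican({"strong_predictor": -1.2})
# With the three weak predictors added, the total tips slightly Republican.
p_all = probability_republican(voter)
```

    Because the blocks simply add, an unusual profile is just a combination of contributions whose sum lands on the unexpected side of zero.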

    Remember when most people paid little attention to midterm elections and result forecasting was not really a thing? Yeah, me neither.

    Be sure to check out the small interactive on the same page that lets you “build a voter” and get the model’s probability output. I’m a fan of the demographic-field-dropdowns-in-a-sentence format.

  • November 5, 2018

    As the midterm elections loom, the ads focusing on key issues are running in full force. Using data from Nielsen, Bloomberg mapped the issues talked about across the country.

    Bloomberg News analyzed more than 3 million election ads for 2018 congressional and gubernatorial races to get a sense of the most commonly discussed issue in 210 local television markets, as defined by the Nielsen Company. Across the U.S., 16 different topics are mentioned more than anything else during midterm TV ads.

    The map above shows the most common per Nielsen market, but read the full article for the national breakdowns of the major issues.

    Health care has been huge in my area. For the past few weeks, every YouTube video I watch is preceded by an ad, and my mailbox keeps getting filled with ads for and against a certain proposition, often on the same day.

  • November 2, 2018

    As one might expect, many women, people of color, and L.G.B.T. candidates are running in this year’s midterms. It’ll be one of the most diverse elections in U.S. history. The New York Times provides a scrolly breakdown with 410 cutout faces floating around on your screen.

  • November 2, 2018

    Topic

    Maps

    Randall Munroe, Kelsey Harris, and Max Goodman for xkcd mapped all the challengers for the upcoming midterm elections. Names are colored by political party. They are sized by the level of office a candidate is running for and the chances of success. (I’m not totally sure how that scale works though.) Interact with the map to focus on regions, and click on names, which directs you to the candidate’s election site.

    Wow.

  • Members Only
    Tutorials

    How to Make Frequency Trails in R

    Also known as ridgeline plots, the method overlaps time series for a 3-D-ish view of the data. While perhaps not the most visually efficient, the allure is undeniable.

  • November 2, 2018

    I really like what The New York Times has been doing with augmented reality lately. What usually feels gimmicky is used as a tool to provide scale and detail and to invite closer observation. In their most recent, the Times got in the Halloween spirit and showed the “monsters that live on you.” You can view it in the browser, but it doesn’t quite compare to seeing a human-sized cockroach sitting in your living room.

  • Members Only
    November 1, 2018

    Topic

    The Process

    Over the next few months, I’ll be looking more closely at the available visualization apps to see what works and what doesn’t. In this issue, I start with Flourish.