Vector paths of meaning between words and phrases

Jun 15, 2018

Benjamin Schmidt, an assistant professor of history at Northeastern University, explored the space between words and drew the paths to get from one word to another. The above, for example, is the path between Seinfeld and Breaking Bad. Using Google News as the corpus, the steps:

  1. Take any two words. I used “duck” and “soup” for my testing.
  2. Find a word that is, in cosine distance, between the two words: that is, that is closer to both of them than either is to each other. Select for one as close to the midpoint as possible.* With “duck” and “soup,” that word turns out to be “chicken”: it’s a bird, but it’s also something that frequently shows up in the same context as soup.
  3. Repeat the process to find words between “duck” and “chicken.” That, in this corpus, turns out to be “quail.” The vector here seems to be similar to the one above–quail is food relatively more often than duck, but less overwhelmingly than chicken.
  4. Continue subdividing each path until no more intermediaries exist. For example, “turkey” works as a point between “quail” and “chicken”; but nothing intermediates between turkey and quail, or between turkey and chicken.

Schmidt’s results actually make a lot of sense.

See also: the Google arts experiment that motivated this one.

Favorites

Causes of Death

There are many ways to die. Cancer. Infection. Mental. External. This is how different groups of people died over the past 10 years, visualized by age.

The Best Data Visualization Projects of 2014

It’s always tough to pick my favorite visualization projects. Nevertheless, I gave it a go.

Shifting Incomes for American Jobs

For various occupations, the difference between the person who makes the most and the one who makes the least can be significant.

The Changing American Diet

See what we ate on an average day, for the past several decades.