Search how phrases have been used via Google Ngram Viewer

Language changes. Culture changes. And we can see some of these changes in what authors have written about in books over the years. Google’s Books Ngram Viewer lets you search through this data and shows a graph similar to the output of Google Trends. Here are the trends for nursery school, kindergarten, and child care:

This shows trends in three ngrams from 1950 to 2000: “nursery school” (a 2-gram or bigram), “kindergarten” (a 1-gram or unigram), and “child care” (another bigram). What the y-axis shows is this: of all the bigrams contained in our sample of books written in English and published in the United States, what percentage of them are “nursery school” or “child care”? Of all the unigrams, what percentage of them are “kindergarten”? Here, you can see that use of the phrase “child care” started to rise in the late 1960s, overtaking “nursery school” around 1970 and then “kindergarten” around 1973. It peaked shortly after 1990 and has been falling steadily since.
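Put another way, each point on the y-axis is a relative frequency: how often the phrase appears in a given year, divided by the total number of n-grams of the same length that year. So “kindergarten” is divided by all unigrams, and “child care” by all bigrams. Here is a minimal sketch of that calculation. The function name and the counts are made up for illustration and are not real Ngram data.

```python
def relative_frequency(phrase_counts, total_counts):
    """Return {year: share in percent} for one n-gram.

    phrase_counts: {year: times this n-gram appears that year}
    total_counts:  {year: total n-grams of the SAME length that year}
                   (unigram totals for "kindergarten", bigram totals
                   for "child care", and so on)
    """
    return {
        year: 100.0 * count / total_counts[year]
        for year, count in phrase_counts.items()
        if total_counts.get(year)  # skip years with no corpus data
    }


# Hypothetical toy counts, invented purely to show the arithmetic.
bigram_totals = {1960: 2_000_000, 1970: 4_000_000}
child_care = {1960: 50, 1970: 400}

print(relative_frequency(child_care, bigram_totals))
# → {1960: 0.0025, 1970: 0.01}
```

Normalizing by the yearly total is what keeps the lines comparable across decades. Far more books were published in 1990 than in 1950, so raw counts alone would rise for almost any phrase.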

Find anything interesting?

Here’s a search for video, radio, and internet. I think there’s something to this Internet fad:

Here’s a search for can, cannot, and maybe:

Perhaps the more notable part of this launch is that all of the data behind the Ngram Viewer is available for download, so you can run your own experiments.
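As a starting point for such experiments, here is a minimal sketch that pulls yearly counts for a single phrase out of the raw files. It assumes each line is tab-separated as `ngram`, `year`, `match_count`, `volume_count`, which is the layout used by the 2012 release. Other versions differ, so check the README for the one you download. The function name and sample lines are hypothetical.

```python
from collections import defaultdict


def yearly_counts(lines, target, start=1950, end=2000):
    """Sum match counts per year for one n-gram.

    ASSUMPTION: each line looks like
        ngram<TAB>year<TAB>match_count<TAB>volume_count
    Verify this against the README of the dataset version you use.
    """
    counts = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        ngram, year, match_count = fields[0], int(fields[1]), int(fields[2])
        if ngram == target and start <= year <= end:
            # Add rather than overwrite, in case the same
            # n-gram/year pair appears on more than one line.
            counts[year] += match_count
    return dict(counts)


# Hypothetical sample lines in the assumed format.
sample = [
    "nursery school\t1969\t120\t80\n",
    "nursery school\t1970\t150\t95\n",
    "child care\t1970\t160\t90\n",
    "nursery school\t1949\t70\t40\n",  # outside the 1950-2000 window
]

print(yearly_counts(sample, "nursery school"))
# → {1969: 120, 1970: 150}
```

The downloads are large gzipped files. Because the function accepts any iterable of lines, you could stream one directly with `gzip.open(path, "rt")` rather than loading it into memory. Dividing the result by the yearly totals file gives the same kind of percentage the viewer plots.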

[Books Ngram Viewer | Thanks, @mattorantimatt and Michael]
