Google’s Ngram Viewer

I’ve only just come across Books Ngram Viewer, a Google Labs tool that lets you derive graphs from Google’s Books database at the text level. You can enter up to three terms and graph the frequency with which each term occurs in a given corpus over time. The corpora are drawn from five million of the 15 million books Google has digitized thus far: five in English, and one each for Chinese (simplified), French, Spanish, Russian, and German.
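To make "frequency over time" concrete, here is a minimal sketch of the idea in Python. It assumes frequency means a term's share of all the words published in a given year, and it handles single words only. The function name `yearly_frequency` and the toy sentences are invented for illustration; this is not Google's code or data.

```python
from collections import Counter


def yearly_frequency(term, books):
    """Return {year: share of that year's words that equal `term`}.

    `books` is an iterable of (year, text) pairs. Splitting on whitespace
    is a simplifying assumption, not the Viewer's real tokenizer.
    """
    hits = Counter()    # occurrences of the term, per year
    totals = Counter()  # all words seen, per year
    needle = term.lower()
    for year, text in books:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(needle)
    # Skip years with no words to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Hypothetical toy corpus: three tiny "books" across two years.
toy_books = [
    (1900, "the right to privacy is a right"),
    (1900, "a quiet life"),
    (2000, "privacy and data privacy law"),
]

freq = yearly_frequency("privacy", toy_books)
for year, share in freq.items():
    print(year, f"{share:.0%}")
```

Dividing by the yearly total is what lets you compare years fairly: far more words were printed in 2000 than in 1900, so raw counts alone would rise even for a term whose popularity never changed.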

In English, the basic corpus covers books from 1500 to 2008 and is offered without any filtering except for quality of OCR and metadata, resulting in 361 billion words. Further filtering produces English Fiction, British English (published in the UK), and American English (published in the US). There’s also the English Million, built from 6000 randomly selected books from each year. The About page explains all this and more, and alerts you to the fact that punctuation counts in this exercise. (If you’re interested in the difficult math and linguistics issues encountered in constructing the Viewer, feel free to get yourself a free account on Science and read “Quantitative Analysis of Culture Using Millions of Digitized Books.”)

To illustrate what can be done, I ran a search on the word “privacy” in books from 1900 to 2008, resulting in this graph:


It’s also interesting to run terms against each other to seek correlations (not causes, remember). Thus, in a graph produced by Rob Sanderson, whose tweet alerted me to this tool, we see the terms [feminism] [terrorism] [civil rights] played out on the same scale:


Note that beneath each block of years there’s a link to books from that period that might be relevant to your search terms.
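If you want to put a number on how closely two terms track each other, one common choice is the Pearson correlation coefficient. The sketch below is only an illustration: the two series are invented values, not readings from the Viewer, and the helper name `pearson` is my own. As noted above, a high value shows the curves move together and says nothing about cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical yearly frequencies for two terms (made-up numbers).
series_a = [1.0, 2.0, 3.0, 4.0]
series_b = [2.0, 4.0, 6.0, 8.0]

r = pearson(series_a, series_b)
print(f"r = {r:.2f}")
```

A result near 1 means the two curves rise and fall together, near -1 means they move in opposite directions, and near 0 means there is no linear relationship.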

Comments

  1. Google’s “Books Ngram Viewer” is another free online tool that allows us to visualize information in new ways. I explore how to use the tool in the classroom to help students better understand the research method in my blog post – “How To Quantify Culture? Explore 500 Billion Published Words With Google’s Ngram Viewer” http://bit.ly/gcKJdp

    PS – It includes an Easter Egg – Search for “never gonna give you up” and see what pops up!