Google’s Ngram Viewer

by Simon Fodden

I’ve only just come across Books Ngram Viewer, a Google Labs tool that lets you derive graphs from their Books database at the text level. You can enter up to three terms and graph the frequency with which each term occurs in a given corpus over time. The corpora are drawn from five million of the 15 million books Google has digitized thus far: there are five in English, and one each in Chinese (simplified), French, Spanish, Russian, and German.
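For the curious, here is a minimal Python sketch of what "frequency over time" means: for each publication year, the number of times a term appears divided by the total number of words published that year. This is not Google's code. The function name `yearly_frequency` and the three-book toy corpus are invented for illustration, and the sketch handles single words only, splitting on whitespace.

```python
from collections import Counter


def yearly_frequency(books, term):
    """Return {year: occurrences of term / total words in that year}.

    `books` is an iterable of (publication_year, text) pairs.
    Hypothetical helper: single-word terms, whitespace tokenization.
    """
    hits = Counter()
    totals = Counter()
    needle = term.lower()
    for year, text in books:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(needle)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Toy corpus, made up for this example.
books = [
    (1900, "the right to privacy is new"),
    (1900, "a quiet house in the country"),
    (2000, "privacy law and privacy policy"),
]

for year, share in yearly_frequency(books, "privacy").items():
    print(year, round(share, 4))
```

Dividing by the yearly total is what makes different years comparable: a raw count would rise simply because more books are published in later years.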

In English, the basic corpus has books ranging from 1500 to 2008 and is offered without any filtering except as to quality of OCR and metadata, resulting in 361 billion words. Further filtering produces English Fiction, British English (published in the UK), and American English (published in the US). There’s also the English Million, built of 6,000 books randomly selected from each year. The About page explains all this and more, and alerts you to the fact that punctuation counts in this exercise. (If you’re interested in the difficult math and linguistics issues encountered in constructing the Viewer, feel free to get yourself a free account on Science and read “Quantitative Analysis of Culture Using Millions of Digitized Books.”)
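The idea behind the English Million, a fixed number of books drawn at random from each year, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function `sample_per_year` and the tiny catalogue are hypothetical, and taking every book when a year has fewer than the quota is my guess at one reasonable rule, not something the About page is quoted as saying here.

```python
import random
from collections import defaultdict


def sample_per_year(books, per_year, seed=0):
    """Pick up to `per_year` titles at random from each publication year.

    `books` is an iterable of (publication_year, title) pairs.
    Assumption: years with fewer than `per_year` books keep all of them.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_year = defaultdict(list)
    for year, title in books:
        by_year[year].append(title)
    return {
        year: rng.sample(titles, min(per_year, len(titles)))
        for year, titles in sorted(by_year.items())
    }


# Made-up catalogue: five books from 1900, two from 1901.
catalogue = [(1900, f"book-{i}") for i in range(5)] + [(1901, "book-a"), (1901, "book-b")]

sample = sample_per_year(catalogue, per_year=3)
print({year: len(titles) for year, titles in sample.items()})
```

Capping each year at the same number of books keeps the later, more heavily published years from swamping the earlier ones.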

To illustrate what can be done, I ran a search on the word “privacy” in books from 1900 to 2008, resulting in this graph:


It’s also interesting to run terms against each other to seek correlations (not causes, remember). Thus, in a graph produced by Rob Sanderson, whose tweet alerted me to this tool, we see the terms [feminism] [terrorism] [civil rights] played out on the same scale:


Note that beneath each block of years there’s a link to books from that period that might be relevant to your search terms.

Comments

Peter Pappas

December 17th, 2010 at 6:26 pm

Google’s “Books Ngram Viewer” is another free online tool that allows us to visualize information in new ways. I explore how to use the tool in the classroom to help students better understand the research method in my blog post – “How To Quantify Culture? Explore 500 Billion Published Words With Google’s Ngram Viewer” http://bit.ly/gcKJdp

PS – It includes an Easter Egg – Search for “never gonna give you up” and see what pops up!
