Law as Algae
One of the many brilliant things to come out of Google's indexing is the Web 1T 5-gram corpus, made available to scholars through the Linguistic Data Consortium at the University of Pennsylvania.
Very roughly stated, as I understand it, n-grams have to do with the frequency with which one unit in a language is followed by another: for example, how many times in a given body of text the word “love” is followed by the word “fifteen,” and how predictable, then, this 2-gram is whenever “love” occurs. You can see how Google would . . .
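To make the counting idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names (bigram_counts, conditional_probability) and the tiny tennis-score sample are made up for this example, and nothing here reflects how the Web 1T corpus itself was built or how it is distributed.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count every adjacent word pair (2-gram) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def conditional_probability(tokens, first, second):
    """Estimate how often `first` is followed by `second`,
    out of all the times `first` is followed by any word."""
    pairs = bigram_counts(tokens)
    # Total number of 2-grams that begin with `first`.
    followed = sum(count for (w1, _), count in pairs.items() if w1 == first)
    if followed == 0:
        return 0.0  # `first` never appears with a following word
    return pairs[(first, second)] / followed


# Hypothetical toy text, invented for illustration only.
sample = "love fifteen love thirty love fifteen deuce".split()

count = bigram_counts(sample)[("love", "fifteen")]
probability = conditional_probability(sample, "love", "fifteen")
print(count, probability)
```

In this toy text, “love” is followed by another word three times, and twice that word is “fifteen,” so the estimated predictability of the 2-gram is two in three. A real corpus applies the same arithmetic to vastly more text and to sequences of up to five words.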
