WebCorp

by Simon Fodden

Those of you who like words (discreet pause while 2.36 readers sigh and leave their machines) will be interested in WebCorp, a linguistic tool from the University of Central England that treats the web as a corpus:

However large and up-to-date the electronic text corpora available are, there will always be aspects of the language which are too rare or too new to be evidenced in them. WebCorp is a suite of tools which allows access to the World Wide Web as a corpus – a large collection of texts from which facts about the language can be extracted.

WebCorp takes your search terms and returns instances of them embedded in their surrounding context, and you can control, to some degree, how much of that context is shown. One idea behind this is, of course, to get at meaning by way of context, which is of general interest. But it may be even more interesting that WebCorp does pattern matching during searches in ways that Google and most other search engines do not. For example, you can use a wildcard * pretty much anywhere you choose.
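To illustrate what a context-based search with a wildcard looks like, here is a minimal keyword-in-context (KWIC) sketch in Python. It searches a local string rather than the web, and it is not WebCorp's own code: the function name `concordance`, the character-based context window, and the rule that `*` stands for any run of word characters are all assumptions made for the example.

```python
import re


def concordance(text, pattern, context=30):
    """Return (left, keyword, right) tuples for each match of `pattern`.

    Hypothetical helper, not WebCorp's implementation. Assumption: `*`
    matches zero or more word characters; every other character in
    `pattern` is matched literally. `context` is the number of characters
    kept on each side of a match.
    """
    # Escape regex metacharacters, then turn each escaped \* into \w*.
    body = re.escape(pattern).replace(r"\*", r"\w*")
    # Anchor at word boundaries and ignore case.
    regex = re.compile(r"\b" + body + r"\b", re.IGNORECASE)
    hits = []
    for m in regex.finditer(text):
        left = text[max(0, m.start() - context):m.start()]
        right = text[m.end():m.end() + context]
        hits.append((left, m.group(0), right))
    return hits


if __name__ == "__main__":
    sample = ("WebCorp treats the web as a corpus. A corpus is a large "
              "collection of texts; corpora vary in size.")
    # Print aligned KWIC lines with the matched keyword in brackets.
    for left, keyword, right in concordance(sample, "corp*", context=15):
        print(f"{left:>15} [{keyword}] {right}")
```

Because the sketch anchors patterns at word boundaries, `corp*` finds "corpus" and "corpora" but not the "Corp" inside "WebCorp"; a wildcard in the middle of a phrase, such as `the * as`, matches "the web as".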

WebCorp also has a word frequency viewer that lists all of the words on a particular web page, either alphabetically or by frequency.
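A word frequency list of this kind can be sketched in a few lines of Python using only the standard library. This is an illustration, not WebCorp's code: the helper names `text_from_html` and `word_frequencies`, and the tokenization rule (lower-cased runs of letters and apostrophes), are assumptions. Fetching the page is left out so the example runs offline; it starts from HTML that has already been downloaded.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, ignoring tags.

    Note: this also collects text inside <script> and <style> elements;
    a fuller version would skip them.
    """

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def text_from_html(html):
    """Hypothetical helper: strip tags and join the remaining text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def word_frequencies(text, order="frequency"):
    """Return (word, count) pairs for `text`.

    order="frequency": most frequent first, ties broken alphabetically.
    order="alphabetical": sorted by word.
    Tokenization (lower-cased runs of letters and apostrophes) is an
    assumption; WebCorp's own tokenizer may differ.
    """
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    if order == "alphabetical":
        return sorted(counts.items())
    if order == "frequency":
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    raise ValueError(f"unknown order: {order!r}")


if __name__ == "__main__":
    page = "<p>The web as a corpus.</p><p>The corpus is large.</p>"
    for word, count in word_frequencies(text_from_html(page)):
        print(f"{count:4d}  {word}")
```

Sorting by the key `(-count, word)` gives the frequency view in one pass while keeping ties in a stable, alphabetical order.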

Comments

Alexander Potyomkin

April 12th, 2007 at 12:50 am

Hello

I would like to introduce yet another word frequency analyser, Textanz. Besides individual words, Textanz calculates phrase and wordform frequencies, evaluates a number of readability parameters, builds charts and reports, and provides export functions.

If you work with texts, either as a writer or reader, Textanz is a must-have tool for you. If interested, take a look at
http://www.cro-code.com/textanz.jsp

Kind regards,
Alexander

David Cheifetz

April 15th, 2007 at 8:58 am

On the theme of those obsessed with words (OK, Simon F wrote “like”, but we all know what he meant, don’t we?) –

I was looking for something in my home library and found a Dover edition (published in 1959) paperback of a 1932 text on Symbolic Logic. The original sale price of the Dover edition: $2. It cost me $7.

Is there a future for publishers such as Dover? Will we be able to buy these sorts of reprints in the future, all nicely bound and easy to use? Or will projects such as Gutenberg, or Google’s effort to scan everything in the public domain, replace the Dover world, so that we simply go online somewhere and download? Then we’ll print out as needed or, once we have digital devices as useful as the BOOK, move the text to that device.

So much for the smell and texture of the old book-searching and reading experience – even with the occasional bout of sneezing from decades of dust and the musty smell.
