Open Text Mining Interface

A couple of days ago, I came across a note by Tim O’Reilly concerning the Open Text Mining Interface (OTMI). O’Reilly described it as a “copyright hack.” The initiative was apparently started by Timo Hannay, who has also blogged about it on the website of his employer, Nature magazine. It is an attempt to respond positively to requests from indexers and data-miners for full-text versions of articles without, at the same time, making human-readable versions of those articles freely available to non-subscribers. OTMI, an XML format, consists of “word vectors” plus “snippets,” which amount, more or less, to all of the sentences in the article arranged alphabetically instead of in their original order. Links to samples are available in Hannay’s posting.
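To make the idea concrete, here is a rough sketch in Python of the two ingredients described above: a word vector (read here as a count of each word) and the article's sentences sorted alphabetically. It is only an illustration under stated assumptions. The function names `word_vector` and `snippets`, the crude sentence splitting, and the sample text are all invented for this example, and nothing here reproduces the actual OTMI XML schema.

```python
import re
from collections import Counter


def word_vector(text):
    """Count how often each word occurs (hypothetical helper, not part of OTMI)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


def snippets(text):
    """Split text into sentences and sort them alphabetically, ignoring case.

    The split is naive: it breaks after '.', '!' or '?' followed by whitespace,
    so abbreviations such as "Dr. Smith" would be split incorrectly.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return sorted((s for s in sentences if s), key=str.lower)


# Invented sample text standing in for an article body.
article = (
    "Zebras graze at dawn. Apples fall in autumn. "
    "Mining text needs the full text."
)

vector = word_vector(article)
shuffled = snippets(article)

print(vector.most_common(3))
for sentence in shuffled:
    print(sentence)
```

An indexer can still count and match words against output like this, while a human reader gets the sentences out of their original order, which is the trade-off the format seems to be aiming for.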


  1. It’s funny, John: one of the ideas that came to mind when I was fussing over how to deal with the refusal of Big Law Publishers to let Slaw put up the tables of contents of their books was to “unarrange” them in some way that would free them from whatever copyright there might be. The notion was that the search mechanism on the TOC site would do the work of getting you where you wanted to go within the mess of data that was once a table of contents. I don’t think it would work legally, though, and it might not serve the aim of the venture.