Google News is reporting that IBM is about to give away a concept-based search engine for unstructured data:
This piece explains the concept:
Search concepts, not keywords, IBM tells business
Mon Aug 8, 2005 12:02 AM EDT
By Eric Auchard
NEW YORK (Reuters) – IBM plans to give away key search technologies for corporate data retrieval that use concepts and facts instead of simpler “keyword” searches relied upon by consumer Web companies such as Google Inc., the world’s largest computer company said on Monday.
While simple but powerful keyword searches have revolutionized how Internet users locate and retrieve information, IBM is looking to transform how office workers sift through the piles of data stored inside organizations.
“I don’t see any of the major players moving into this area,” Arthur Ciccolo, head of search technology at IBM Research, said of how major consumer Internet search companies such as Google, Yahoo Inc. and Microsoft have focused on the public Internet instead of private record data retrieval.
IBM plans to openly offer other software developers its Unstructured Information Management Architecture (UIMA), a technology that can analyze text within documents and other media to understand latent meanings, relationships and facts.
Some 15 companies, including Attensity, ClearForest, Cognos, Endeca, Factiva, Kana, Inquira, iPhrase, Inxight, nStein, QL2, SAS, Schemalogic, Semagix, SPSS Inc. and Temis plan to use UIMA as a framework for search and text analysis of unstructured data, IBM said. Factiva is a joint venture of financial information providers Dow Jones & Co. Inc. and Reuters Group Plc.
IBM is also offering its WebSphere OmniFind software for helping users perform searches on unstructured data in a variety of formats or languages, be they located in databases, e-mail files, audio recordings, pictures or video images.
Ciccolo said UIMA will allow many different suppliers of software used in knowledge management, search, business intelligence and text analytics to work with one another.
The corporate data search framework being made available to other software developers is the result of more than four years of development by IBM Research, with contributions from researchers at top U.S. universities, and support from the U.S. Defense Advanced Research Projects Agency (DARPA), IBM said.
Other researchers working on UIMA are military contractors Science Applications International Corp., BBN Technologies and MITRE Corp. and health care provider The Mayo Clinic.
As an example, a combination of software from Attensity, ClearForest, iPhrase, Kana and IBM can be used by consumer goods makers to monitor the Web for initial complaints about a product defect and locate internal corporate data that might help them quickly respond to potential product quality issues.
There has been an explosion in “unstructured” information on the Web, taking the form of documents, images, comment and note fields, e-mail and even rich media like video and audio.
However, the technology has not existed to allow software to search out and make sense of these disparate forms of data.
But the problem of rendering meaning out of unstructured information will take many years to solve. To be sure, the issue is as old as messy filing systems and woolly thinking.
A decade ago many database developers, including Informix, a company subsequently acquired by IBM, said their database management systems were close to solving the unstructured data issue. Yet some 85 percent of corporate data still sits in unstructured form outside of databases, analysts estimate.
UIMA technology is expected to be made available through open-source software site SourceForge by the end of 2005. The UIMA framework can currently be downloaded free of charge from IBM AlphaWorks at http://www.alphaworks.ibm.com/tech/uima/.
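To make the difference between keyword search and concept search concrete, here is a minimal sketch in Python. It is not UIMA and does not use any UIMA API. The concept dictionary, the function names (annotate, build_concept_index, keyword_search) and the sample documents are all hypothetical, invented purely for illustration. The idea it shows is the general one the article describes: an analysis step tags text with concepts, and search then runs over those tags instead of over the literal words.

```python
import re
from collections import defaultdict

# Hypothetical concept dictionary (an assumption for this sketch):
# each concept label maps to surface phrases that signal it.
CONCEPTS = {
    "product_defect": ["defect", "broken", "stopped working", "faulty"],
    "refund_request": ["refund", "money back"],
}

# Hypothetical sample documents standing in for e-mail, forum posts and memos.
DOCS = {
    "email-1": "The blender stopped working after two days.",
    "forum-7": "Mine arrived with a defect in the lid.",
    "memo-3": "Quarterly sales of the blender rose.",
}


def contains_phrase(text, phrase):
    """True if the phrase occurs in the text as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def annotate(text):
    """Toy analysis step: return the set of concepts found in the text."""
    return {
        concept
        for concept, phrases in CONCEPTS.items()
        if any(contains_phrase(text, p) for p in phrases)
    }


def build_concept_index(docs):
    """Map each concept label to the ids of the documents that express it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in annotate(text):
            index[concept].add(doc_id)
    return index


def keyword_search(docs, keyword):
    """Plain keyword search: ids of documents containing the literal word."""
    return {doc_id for doc_id, text in docs.items() if contains_phrase(text, keyword)}


if __name__ == "__main__":
    index = build_concept_index(DOCS)
    # The keyword search only finds the document that uses the word itself.
    print(sorted(keyword_search(DOCS, "defect")))   # ['forum-7']
    # The concept search also finds the complaint phrased differently.
    print(sorted(index["product_defect"]))          # ['email-1', 'forum-7']
```

A real system would replace the phrase dictionary with far richer linguistic analysis, but the split between an annotation step and a search step over the annotations is the point of the example.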