Next Generation Search – the Voice Interface

by Simon Chester

Techtree in Mumbai is reporting today on a new Google patent for a voice interface for search engines.

This essentially means that, if and when the product is built, users will be able to phone in an internet search query or say it aloud instead of typing it in. According to Swapnil Bhartiya of EFY News Network, “You can have a demo of Google Voice Search on Google Labs, Google’s pre-beta-test site, for well over a year. Google Voice Search, still up on Google Labs, lets people call into Google by phone”.
Or in Patent Speak:

A system provides search results from a voice search query. The system receives a voice search query from a user, derives one or more recognition hypotheses, each being associated with a weight, from the voice search query, and constructs a weighted boolean query using the recognition hypotheses. The system then provides the weighted boolean query to a search system and provides the results of the search system to a user.
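For readers who think in code rather than Patent Speak, here is a minimal sketch of the idea in Python. It is not Google's implementation: the abstract does not say how the weights are combined or how results are ranked, so the names `Hypothesis`, `build_weighted_query`, `score`, and `search`, the summing of weights per term, and the toy document list are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One guess from the speech recognizer, with its weight (hypothetical type)."""
    text: str
    weight: float


def build_weighted_query(hypotheses):
    """Merge all hypotheses into one OR-style query: term -> accumulated weight.

    Assumption: a term inherits the weight of every hypothesis it appears in.
    """
    term_weights = {}
    for hyp in hypotheses:
        for term in hyp.text.lower().split():
            term_weights[term] = term_weights.get(term, 0.0) + hyp.weight
    return term_weights


def score(document, term_weights):
    """Sum the weights of the query terms that occur in the document."""
    words = set(document.lower().split())
    return sum(weight for term, weight in term_weights.items() if term in words)


def search(documents, hypotheses):
    """Return matching documents, highest weighted score first."""
    query = build_weighted_query(hypotheses)
    scored = [(score(doc, query), doc) for doc in documents]
    matches = [(s, doc) for s, doc in scored if s > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches]


# Two competing guesses for the same utterance, with made-up weights.
hypotheses = [
    Hypothesis("recognize speech", 0.75),
    Hypothesis("wreck a nice beach", 0.25),
]
documents = [
    "a nice beach holiday",
    "how to recognize speech",
    "cooking pasta",
]
results = search(documents, hypotheses)
```

The point of the sketch is that the recognizer never has to commit to a single transcription: every guess contributes to the query, and documents matching the likelier guess simply rank higher.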

To satisfy the average user, a voice interface to a search engine must recognize spoken queries, and must return highly relevant search results. Several problems exist in designing satisfactory voice interfaces. Current speech recognition technology has high word error rates for large vocabulary sizes. There is very little repetition in queries, providing little information that could be used to guide the speech recognizer. In other speech recognition applications, the recognizer can use context, such as a dialogue history, to set up certain expectations and guide the recognition. Voice search queries lack such context. Voice queries can be very short (on the order of only a few words or single word), so there is very little information in the utterance itself upon which to make a voice recognition determination.

Current voice interfaces to search engines address the above problems by limiting the scope of the voice queries to a very narrow range. At every turn, the user is prompted to select from a small number of choices. For example, at the initial menu, the user might be able to choose from “news,” “stocks,” “weather,” or “sports.” After the user chooses one category, the system offers another small set of choices. By limiting the number of possible utterances at every turn, the difficulty of the speech recognition task is reduced to a level where high accuracy can be achieved. This approach results in an interactive voice system that has a number of severe deficiencies. It is slow to use, since the user must navigate through many levels of voice menus. If the user’s information need does not match a predefined category, then it becomes very difficult or impossible to find the information desired. Moreover, it is often frustrating to use, since the user must adapt his/her interactions to the rigid, mechanical structure of the system.
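The menu-driven approach described above can be pictured as a small tree that the caller walks one utterance at a time. The Python sketch below is illustrative only: the four top-level categories come from the patent text, while the second-level choices and the names `MENU` and `navigate` are invented for the example.

```python
# Top-level categories are from the patent text; the sub-menus are hypothetical.
MENU = {
    "news": {"world": {}, "local": {}},
    "stocks": {"quotes": {}, "indices": {}},
    "weather": {"today": {}, "forecast": {}},
    "sports": {"scores": {}, "schedules": {}},
}


def navigate(menu, utterances):
    """Walk the menu one utterance per turn and return the path reached.

    Raises ValueError when an utterance is not among the choices on offer,
    which is the case where the user's need matches no predefined category.
    """
    node = menu
    path = []
    for spoken in utterances:
        if not node:
            raise ValueError(f"no further choices after {path!r}")
        if spoken not in node:
            raise ValueError(f"{spoken!r} is not one of {sorted(node)}")
        path.append(spoken)
        node = node[spoken]
    return path


path = navigate(MENU, ["weather", "forecast"])
```

At each turn the recognizer only has to tell a handful of words apart, which is why accuracy is high. The cost is that every request takes one turn per menu level, and anything outside the tree, say "recipes", is rejected outright.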

Therefore, there exists a need for a voice interface that is effective for search engines.

And Google intends to build it.

Comments

Simon Fodden

April 15th, 2006 at 9:55 am

Seems like the demo is no longer active:
http://labs1.google.com/gvs.html

What a shame. It would have been great to be able to say, “Johnson? Get me everything on formedon in the remainder!”
