Google OCRing Scanned Documents

by Shaunna Mireau

I wonder how Google is choosing the scanned documents saved to the web that it reports it is now OCRing.

In the past, scanned documents were rarely included in search results as we couldn’t be sure of their content. We had occasional clues from references to the document – so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform OCR on any scanned documents that we find stored in Adobe’s PDF format. This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words — words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world’s information accessible and useful.



