HeinOnline’s Reprise of Google’s Indexing

In a November blog post, Hein addressed deficiencies in Google’s indexing of HeinOnline content:

We provided the metadata for and allowed Google Scholar to crawl more than 1 million documents from HeinOnline’s Law Journal Library. … Of these … they opted to only include about 50% of the content in the Google Scholar index.

[…]

While it is hard to pinpoint exactly what Google Scholar’s methodology is for adding documents to their index, we do know that they have left out some key documents from HeinOnline’s Law Journal Library.

Then in December, Hein described the indexing improvements Google made in response:

Google indicated that they improved their indexing algorithms and subsequently rebuilt the index. With this new index in place, Google Scholar now includes an additional 250,000 documents/sections/articles from HeinOnline’s Law Journal Library

Hein goes on to point out how much value its increasingly sophisticated search and personal account tools offer, and points to its citation counting and other features.

We have seen a few other examples in recent months of Google failing to take account of the variety and nuances embedded in bibliographic metadata. This incomplete indexing adds a level of concern. For instance, of the millions of court decisions released into the wild by Bulk.Resource.Org and subsequently offered via Google Scholar, how many are actually available via Google?
