Making Enterprise Search Work

The biggest question I’ve been getting lately from clients and potential clients is why they need to bother with things like organizing documents or content, and why taxonomy and metadata need to be applied. Why can’t they just drop in a search tool like Google and let it work its magic? Why bother spending time cleaning out irrelevant stuff and getting the useful material into good order?

I tell them that it is essentially the structure, organization, and metadata (such as keywords, taxonomy, author names, and dates created and modified) that help search engines do their job. The better things are organized and described with metadata, the better the quality of the results.
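To make that concrete, here is a minimal sketch of how a search tool might give more weight to a match in the title or keywords than to a match in the body text. This is an illustration only, not how Google or any particular enterprise search product works: the `Document` fields, the `FIELD_WEIGHTS` values, and the sample documents are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    title: str
    body: str
    keywords: List[str] = field(default_factory=list)


# Hypothetical weights: a hit in curated metadata counts for more than a hit in body text.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}


def score(doc: Document, query: str) -> float:
    """Sum weighted matches of each query term across title, keywords, and body."""
    title_words = doc.title.lower().split()
    body_words = doc.body.lower().split()
    keyword_set = {k.lower() for k in doc.keywords}
    total = 0.0
    for term in query.lower().split():
        total += FIELD_WEIGHTS["title"] * title_words.count(term)
        total += FIELD_WEIGHTS["keywords"] * (1 if term in keyword_set else 0)
        total += FIELD_WEIGHTS["body"] * body_words.count(term)
    return total


def search(docs: List[Document], query: str) -> List[Document]:
    """Return matching documents, highest score first."""
    scored = [(score(d, query), d) for d in docs]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in hits]


# Two made-up documents: one well described, one left with a default file name.
tagged = Document(
    title="Lease agreement template",
    body="Standard terms for commercial premises.",
    keywords=["lease", "real estate", "precedent"],
)
untagged = Document(
    title="Document1",
    body="Draft notes. The lease was discussed briefly.",
)

results = search([untagged, tagged], "lease")
print([d.title for d in results])
```

With these assumed weights, the well-described template outranks the untitled draft for the query "lease", even though both contain the word. Without a meaningful title or keywords, the engine has only the body text to go on.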

Then I saw the infographic posted last week by Berrie Pelser about factors that Google uses in its search results ranking algorithm: Google’s 200 Ranking Factors (click through to see it, since it is too large for me to post here; it’s huge!)

You can argue about the accuracy of what he has put together (the actual algorithm is kept secret by Google and does change periodically), but two things struck me:

  1. how many of these factors are indeed metadata based and
  2. how many of the other factors are not applicable to documents and information not on the public Internet (such as what might be on an intranet or inside an organization’s document management system).

A quick scan of some of the factors that Google uses to provide search results that are metadata based:

  • keywords used in various places such as title and domain name (including too many keywords stuffed into the metadata or coding of the site that are not seen by the average website visitor)
  • how recently content was updated
  • age of the page
  • WordPress tags
  • contact information
  • number of pages of a website
  • location of a page within the navigation of the site
  • relative geographic location of the site and the website visitor (“geo targeting”)

A lot of the other factors involve the quality of links coming into or going out of a site, something that we typically do not have inside a system such as a DMS (document management system), although it may be seen on a large intranet. Some of the other factors involve data being kept about the website visitor rather than the site itself.

All of this makes me increasingly curious about how enterprise search engines work “under the hood” and how many of the other factors, such as profiles of the searchers, are being used or will be used in the future to improve and even customize results. It is always interesting to read the search engine product descriptions, get demos from the vendors, and talk to in-house IT and Knowledge Management folk about what they are using, as I have been. But that is just scratching the surface, isn’t it?

I’m curious to know what resources others might recommend for staying on top of developments in this area. Any good blogs or books, for example? Any organizations doing comparative reports, the way the Real Story Group monitors and evaluates the CMS (content management system) space? I would love your suggestions in the comments below.

Related Slaw posts by Connie Crosby:

Why Can’t You Just Make it Work Like Google? (December 3, 2012)
Why Can’t You Just Make It Work Like Google? Part 2 – Good Enough Is Not Good Enough (December 10, 2012)

Comments

  1. This is a popular topic: “Enterprise Search ready to replace Document Management?” I talked about it on the Cersys blog: http://cersys.ca/2013/02/27/enterprise-search-ready-to-replace-document-management/

    I’ve deployed DMS and Enterprise Search at multiple law firms. From what I’ve seen of today’s predictive search technology, trying to use today’s search instead of metadata is a steep slippery slope that’ll drop you into the swamp.

    The email management tools that I’m familiar with – learning or predicting metadata based on context or semantic analysis – are still leaving the approval step for human intervention.

  2. Thank you for your additional thoughts, Sandy. Yes, at times the question is framed in terms of whether documents can remain in file drives and just be searched.