The biggest question I’ve been getting lately from clients and potential clients is why they need to bother with things like organizing documents or content, and why taxonomy and metadata need to be applied. Why can’t they just drop in a search tool like Google to work its magic instead? Why bother spending time cleaning out irrelevant stuff and getting the useful material into good order?
I tell them that it is precisely things like structure, organization, and metadata (keywords, taxonomy terms, author names, dates created and modified) that help search engines do their job. The better content is organized and described with metadata, the better the quality of the results.
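To make that concrete, here is a minimal sketch (the document records, field names, and lookup function are all invented for illustration, not from any particular DMS or search product) of why consistent metadata makes retrieval precise: a keyword lookup against a curated metadata field returns exactly the relevant documents, with no guessing over raw text.

```python
# Hypothetical document records with consistent descriptive metadata.
documents = [
    {
        "title": "Records Retention Policy",
        "author": "J. Smith",
        "created": "2011-04-02",
        "modified": "2012-11-20",
        "keywords": ["records management", "retention", "policy"],
    },
    {
        "title": "Office Party Photos",
        "author": "K. Lee",
        "created": "2012-12-01",
        "modified": "2012-12-01",
        "keywords": ["social"],
    },
]

def find_by_keyword(docs, term):
    """Return titles of documents whose keyword metadata contains the term."""
    return [d["title"] for d in docs if term in d["keywords"]]

print(find_by_keyword(documents, "retention"))  # ['Records Retention Policy']
```

Without the keyword metadata, a search tool would have to infer relevance from the text alone; with it, the match is exact.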
Then I saw the infographic posted last week by Berrie Pelser about factors that Google uses in its search results ranking algorithm: Google’s 200 Ranking Factors (click through to see it since it is too large for me to post here–it’s huge!)
You can argue about the accuracy of what he has put together (the actual algorithm is kept secret by Google and does change periodically), but two things struck me:
- how many of these factors are indeed metadata based, and
- how many of the other factors do not apply to documents and information that are not on the public Internet (such as what might be on an intranet or inside an organization’s document management system).
A quick scan turns up some of the metadata-based factors Google uses in ranking search results:
- keywords used in various places such as title and domain name (including too many keywords stuffed into the metadata or coding of the site that are not seen by the average website visitor)
- how recently content was updated
- age of the page
- WordPress tags
- contact information
- number of pages of a website
- location of a page within the navigation of the site
- relative geographic location of the site and the website visitor (“geo targeting”)
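A few of the factors above can be sketched as a toy ranking function. This is purely illustrative: the factors chosen, the weights, and the field names are all my own invention, not Google’s algorithm or any vendor’s implementation. It simply shows how metadata such as title keywords, tags, and last-modified dates could feed a relevance score.

```python
from datetime import date

def score(doc, query, today):
    """Toy relevance score built from invented metadata-based factors."""
    s = 0.0
    if query.lower() in doc["title"].lower():
        s += 3.0  # keyword appears in the title
    if query.lower() in [t.lower() for t in doc["tags"]]:
        s += 2.0  # keyword appears in the tags
    days_old = (today - doc["modified"]).days
    s += max(0.0, 1.0 - days_old / 365)  # freshness: recently updated ranks higher
    return s

docs = [
    {"title": "Taxonomy Basics", "tags": ["metadata"], "modified": date(2012, 12, 1)},
    {"title": "Holiday Schedule", "tags": [], "modified": date(2011, 1, 15)},
]

ranked = sorted(docs, key=lambda d: score(d, "taxonomy", date(2013, 1, 10)), reverse=True)
print([d["title"] for d in ranked])  # 'Taxonomy Basics' ranks first
```

The point is not the particular weights but the dependency: every one of these signals exists only if someone has supplied the metadata in the first place.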
A lot of the other factors involve the quality of links coming into or going out of a site, something we typically do not have inside a system such as a DMS (document management system), although it may be seen on a large intranet. Some of the other factors involve data kept about the website visitor rather than about the site itself.
All of this makes me increasingly curious about how enterprise search engines work “under the hood” and how many of the other factors (such as profiles of the searchers) are being used, or will be used in the future, to improve and even customize results. It is always interesting to read the search engine product descriptions, get demos from the vendors, and talk to in-house IT and Knowledge Management folk about what they are using, as I have been doing. But that is just scraping the surface, isn’t it?
I’m curious to know what resources others might recommend for staying on top of developments in this area: any good blogs or books, for example? Are there any organizations doing comparative reports, the way the Real Story Group monitors and evaluates the CMS (content management system) space? I would love your suggestions in the comments below.
Related Slaw posts by Connie Crosby:
Why Can’t You Just Make it Work Like Google? (December 3, 2012)
Why Can’t You Just Make It Work Like Google? Part 2 – Good Enough Is Not Good Enough (December 10, 2012)