Solex: Lexum’s Latest Search Engine

In the movie The Man with the Golden Gun, the Solex is a revolutionary device that is meant to solve the 1973 energy crisis. After killing its British inventor, an elite assassin steals the Solex to sell it to foreign powers. James Bond is dispatched to find the assassin and recover the precious device. Because this is a James Bond movie, as a matter of course, there’s also a laser.

Solex also stands for SolrCloud Lexum plugins, the latest iteration of the search engine Lexum deploys in all its products.

Lexum has used a wide variety of search engines throughout its history. It all started at the dawn of the Web, in 1994, with the Wide Area Information Server (WAIS). Then came the NQL search engine from a local Montreal firm. Then, for a year or so, AustLII’s SINO search engine. In 2003, we elected to build a search engine of our own: Eliisa, a faster, more capable search engine library based on Apache Lucene. Finally, in 2009, we integrated Apache Solr elements into Eliisa and turned it into a standalone server application.

A few months ago, we announced the release of our third-generation search engine: Solex.

Over the years, we’ve added a number of features to the stock Apache Lucene/Solr search platforms: faster result list “snippet” generation, phrase query performance improvements, whole-document highlighting, phrase and sentence proximity operators, a smart auto-complete mechanism for document identifiers, a lenient query parser with a custom syntax, HTML-aware indexing, citation indexing and highlighting, noteup counting and more.

So, what new features does Solex bring? The answer is scalability, performance improvements and flexibility.

For the last fifteen years, Lexum has proudly provided CanLII with the fastest legal search engine in Canada. However, content and traffic growth have made our exacting performance standards harder to maintain. Nowadays, CanLII indexes several billion words of content and handles fifteen queries per second on average, with frequent spikes of 50 or more queries per second. Quite simply, our previous search engine had reached the limit of what could be done in a single server process. Solex, our new search engine, is based on Apache SolrCloud, a technology Netflix, Instagram, Reddit, and other Internet giants rely on for their own search platforms. Solex scales horizontally by distributing content and queries to as many servers as necessary. As a result, response time is better and more consistent, with speedups of up to 500% for certain queries, so that users of Lexum’s search products enjoy nearly instantaneous response times in nearly all situations.
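To make the idea of distributing content and queries concrete, here is a minimal, self-contained sketch of the general scatter-gather pattern. It is not Solex or SolrCloud code: the shard count, the hash-based routing, the term-frequency scoring and every function name are assumptions chosen for illustration only.

```python
import heapq
import zlib

NUM_SHARDS = 3  # hypothetical cluster size, for illustration only


def shard_for(doc_id: str) -> int:
    """Route a document to a shard using a stable hash of its ID."""
    return zlib.crc32(doc_id.encode("utf-8")) % NUM_SHARDS


def build_shards(docs: dict) -> list:
    """Distribute documents so that each one lives on exactly one shard."""
    shards = [dict() for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        shards[shard_for(doc_id)][doc_id] = text
    return shards


def search_shard(shard: dict, term: str, k: int) -> list:
    """Return the shard's local top-k hits as (score, doc_id) pairs.

    The score is a plain term frequency, a toy stand-in for a real
    relevance function.
    """
    hits = []
    for doc_id, text in shard.items():
        tf = text.lower().split().count(term.lower())
        if tf:
            hits.append((tf, doc_id))
    return heapq.nlargest(k, hits)


def distributed_search(shards: list, term: str, k: int = 3) -> list:
    """Scatter the query to every shard, then merge the partial results."""
    partial = []
    for shard in shards:  # a real cluster would query shards in parallel
        partial.extend(search_shard(shard, term, k))
    return [doc_id for _, doc_id in heapq.nlargest(k, partial)]


docs = {
    "a": "contract contract law",
    "b": "contract",
    "c": "tort law",
    "d": "contract contract contract",
}
shards = build_shards(docs)
top_two = distributed_search(shards, "contract", k=2)
```

Because each shard only scans its own slice of the collection, adding servers reduces the work done per query on any one machine, and the merge step only has to look at a handful of candidates per shard.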

One of the interesting things to know about search engines is that relevance ranking is extremely sensitive to the nature of the indexed content. No search engine can serve relevant content out of the box without extensive tuning. Furthermore, recipes that work for retail are completely different from those that work for intranets, the Web, or job searches. Similarly, legal search engines are specialized and unique. Lexum has put a lot of effort into hand-tuning its search engine for that special purpose, taking into account metrics such as citation count, document length, document age, etc.
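As a rough illustration of what hand-tuning with such metrics can look like, the sketch below combines a text-match score with citation count, document length and document age. The formula, the weights and the function name are all hypothetical; this is not Lexum’s actual ranking function, only one plausible shape for it.

```python
import math


def legal_relevance(text_score, citation_count, word_count, age_years,
                    w_cite=0.5, w_len=0.1, half_life_years=20.0):
    """Combine a text-match score with document-level signals.

    All weights are illustrative assumptions, not tuned values.
    """
    # Heavily cited decisions get a boost; the log keeps it from running away.
    citation_boost = 1.0 + w_cite * math.log1p(citation_count)
    # Very long documents match many terms by chance, so dampen them slightly.
    length_penalty = 1.0 / (1.0 + w_len * math.log1p(word_count / 1000.0))
    # Older documents decay gently: the factor halves every half_life_years.
    recency = 0.5 ** (age_years / half_life_years)
    return text_score * citation_boost * length_penalty * recency


landmark = legal_relevance(1.0, citation_count=100, word_count=5000, age_years=5)
obscure = legal_relevance(1.0, citation_count=0, word_count=5000, age_years=5)
```

Tuning then consists of adjusting the weights until the ordering matches what legal researchers expect to see at the top of the list.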

Although Solex modestly improves the ranking precision of top results over our previous-generation engine, the best is yet to come. A distributed paradigm will provide us with the flexibility required to experiment with new, more processor-hungry, machine-learning-based relevance algorithms such as citation network analysis and learning to rank. The latter is an algorithm that teaches itself to rank results better by learning from the users’ clickstream, i.e. the various interactions users have with the hundreds of thousands of search results that are displayed on CanLII each day. Learning to rank works by learning to downrank results that are ignored by users and uprank those that see strong engagement.
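The intuition of downranking ignored results and upranking engaging ones can be sketched with a much simpler stand-in than a real learning-to-rank model: a smoothed click-through rate used to rescale each result’s base score. The log format, the smoothing priors and the function names below are assumptions for illustration, not a description of what Lexum will deploy.

```python
from collections import Counter


def click_through_rates(log, prior_clicks=1.0, prior_views=10.0):
    """Estimate a smoothed click-through rate per document.

    `log` is an iterable of (doc_id, clicked) pairs. The priors keep
    rarely shown documents from getting extreme rates.
    """
    views, clicks = Counter(), Counter()
    for doc_id, clicked in log:
        views[doc_id] += 1
        if clicked:
            clicks[doc_id] += 1
    return {d: (clicks[d] + prior_clicks) / (views[d] + prior_views)
            for d in views}


def rerank(results, ctr, default_ctr=0.1):
    """Reorder (doc_id, base_score) pairs by base score times click rate.

    Documents with no logged impressions fall back to the prior rate.
    """
    return sorted(results,
                  key=lambda r: r[1] * ctr.get(r[0], default_ctr),
                  reverse=True)


# Document "a" was shown 20 times and never clicked; "b" was clicked 15 of 20.
log = [("a", False)] * 20 + [("b", True)] * 15 + [("b", False)] * 5
ctr = click_through_rates(log)
reranked = rerank([("a", 2.0), ("b", 1.0)], ctr)
```

A real learning-to-rank system would instead train a model on many features per query and document, and would correct for position bias, since results shown first are clicked more often regardless of quality.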

We’ll be experimenting with these techniques in the coming months.

In the meantime, Solex has been powering CanLII since the end of February and is currently being integrated into Lexum’s complete product suite. Our Solex might not solve the next energy crisis, but we hope it will provide a solution to your next legal research problem.

Marc-Andre Morissette
Chief Technology Officer, Lexum
