Thursday Thinkpiece: Whelan on Finding Legal Information on the Internet

Each Thursday we present a significant excerpt from a recently published book or journal article. In every case the proper permissions have been obtained. If you are a publisher who would like to participate in this feature, please let us know via the site’s contact form.

David Whelan
Toronto: Canada Law Book, 2012
[© 2012 Thomson Reuters Canada Limited. Reproduced by permission of Carswell, a division of Thomson Reuters Canada Limited.]

Excerpt: pp. 64-67

General Search

The primacy of Google and Bing has not stopped other search engines from entering the fray. Each one has a particular focus that separates it from the major engines, even when it covers the same type of content.

Blekko appeared in late 2010 with a focus on a concept known as curated search. It provided a way for searchers to create subsets of sites to search, similar to using Rollyo or Google Custom Search. Then, when you wanted to run a search, you could focus it just on this set of sites. The sites were attached to a slashtag, which defines the category. Slashtags can be shared, so that you can benefit from other people’s categorizations.

Here is an example. Once you have created an account and logged into Blekko, you can start to create your own slashtags. You give your slashtag a name — it only has to be unique to you, not unique on the Blekko site — and then start to add the sites you want to search. Here is a listing of my case-law slashtag showing the sites that I want included:

[Image: listing of the sites included in the case-law slashtag]

When you are ready to search, you type your search query followed by your slashtag:

“choice of laws” /case-law

and Blekko will run that search query but only retrieve results that come from the sites that you listed when you created your slashtag. It provides a way to eliminate sites that are unlikely to have useful results.
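
The filtering idea behind a slashtag can be sketched in a few lines of Python. This is only an illustration of the concept, not a description of how Blekko works internally: the `SLASHTAGS` table, the placeholder domains in it, and the functions `split_query` and `filter_results` are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical slashtag table: a name mapped to the sites it covers.
# The domains are placeholders, not the sites in the author's slashtag.
SLASHTAGS = {
    "case-law": {"courts.example.org", "decisions.example.com"},
}


def split_query(raw):
    """Split 'terms /slashtag' into (terms, slashtag or None)."""
    terms, sep, tag = raw.rpartition(" /")
    if not sep:
        # No slashtag present: the whole string is the query.
        return raw.strip(), None
    return terms.strip(), tag.strip()


def filter_results(urls, slashtag):
    """Keep only the URLs whose host belongs to the slashtag's site list."""
    allowed = SLASHTAGS.get(slashtag, set())
    return [u for u in urls if urlparse(u).hostname in allowed]
```

Given a query such as `"choice of laws" /case-law`, `split_query` separates the search terms from the slashtag, and `filter_results` then discards any hit whose host is not on the list for that slashtag.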

Blekko has made its campaign against spam and content farms a significant feature of its search engine. Even a search without any slashtag will use a much smaller index than Google’s, because Blekko purposefully blocks sites that it designates as having content of low value.

DuckDuckGo follows a similar pattern, except that its focus is on search privacy. It promises not to track you or retain information about your searches. It has a number of the same functions that are available on Google (search limiters based on site or region, for example), but you need to know its syntax in order to use them. The goodies page has a list of all of the shortcuts and many of the quick reference searches you might want to run.

DuckDuckGo also uses something called the bang search. It is similar to creating a search engine plug-in for your Web browser search bar. You type your query into the DuckDuckGo search box, but you add the site you want to search with an exclamation point in front of it. The site needs to be defined in the DuckDuckGo bang list. If you can find an appropriate bang search, it will focus your results much as a Blekko slashtag does. The most useful bang searches for a legal researcher are probably the U.S. government searches. For example, if you want to look up SEC filings for Thomson Reuters, it is easy to create the search:

!sec “thomson reuters”

You will land on the SEC search page with matching results. The results do not seem to be different from, and are certainly no better than, the results you would receive from Google or Bing with a limiter.
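
The mechanics of a bang search can be pictured as a simple lookup and redirect. The following Python sketch shows that idea under stated assumptions: the `BANGS` table, its URL template, and the function `resolve_bang` are hypothetical, and the placeholder address does not reflect the real entries in the DuckDuckGo bang list.

```python
from urllib.parse import quote_plus

# Hypothetical bang table. The URL template is a placeholder and does not
# reflect the real target behind any DuckDuckGo bang.
BANGS = {
    "sec": "https://sec.example.org/search?q={query}",
}


def resolve_bang(raw):
    """Return the target URL for '!bang terms', or None if no known bang."""
    if not raw.startswith("!"):
        return None
    # Everything up to the first space is the bang; the rest is the query.
    bang, _, terms = raw[1:].partition(" ")
    template = BANGS.get(bang.lower())
    if template is None:
        return None
    # URL-encode the terms so quotes and spaces survive in the query string.
    return template.format(query=quote_plus(terms.strip()))
```

Calling `resolve_bang('!sec "thomson reuters"')` hands the quoted phrase to the site registered under `sec`, which is why the results are no different from searching that site directly.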

Cluuz is an unusual and quite unassuming search site. When you visit the page, it does not look like much, but a quick search will offer multiple ways of looking at the results. First, it returns top results categories that collapse your results into its best guess as to the most relevant hits.

A search on Google will return results that, for the most part, include the page title, which you can click to see the matching Web page, and a short description of the page. Cluuz’s results include not only the linked page title and description but also a set of keywords extracted from the matching page. These keywords are also hyperlinked, and each one executes a new search based on the keyword. This can suggest new search topics and enable a quick search on them.
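
To make the idea of extracted keywords concrete, here is a minimal Python sketch that picks the most frequent words from a page’s text. It is not Cluuz’s method, which the excerpt does not describe; simple word counting is swapped in purely for illustration, and the stop-word list and the function `extract_keywords` are hypothetical.

```python
import re
from collections import Counter

# A tiny, arbitrary stop-word list chosen for this illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def extract_keywords(text, limit=3):
    """Return the most frequent non-stop-words in the page text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]
```

Each keyword returned this way could then be run as a new query, which is the behaviour the hyperlinked keywords provide.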

Cluuz also offers a graphical cluster showing relationships between results. This can help you to see what is in your search results and whether there is a cluster that might help you focus your search more effectively.

Finally, there is the faceted list at the top right of your search results, called Top Linked Entities. Click on any of the facets (the plus sign) and it will run a search retrieving all of the results that are lumped in the category. Since these are created “on the fly”, they do not always look like typical categories and you may need to click on one to see how it will impact the results set.

Big data is also a focus for search engine development, and Wolfram Alpha has gathered a lot of attention for being able to handle data-related searches. If you type in a data-oriented keyword search, for example “how many lawyers in the U.S.”, Wolfram Alpha will attempt to locate data sources that will provide an answer. For example, there is an industry classification for judges and lawyers, Bureau of Labor Statistics data, and information from human resources Web sites. The number it returns is, in fact, substantially lower than the one reported by the American Bar Association. A similar search for the number of lawyers in Canada returns no information about lawyers at all, just general demographics for Canada’s population.

Wolfram Alpha has a page dedicated to example searches and it is clear that, while some of the money and finance-related searches might be useful to the legal researcher, most of these are science-focused. There will be some client issues that will require this sort of deep data dive but for the most part, Wolfram Alpha will be a search engine on the fringe of a legal researcher’s world: < examples/?src=input>.


  1. Don’t forget about David’s excellent companion blog.