Should Search Engines Index Court Decisions?

In the age of electronic access, judicial decisions (and sometimes other court records that have always been public in principle) no longer benefit from practical obscurity. Courts have had to wrestle with the consequences, including by tailoring the way decisions are written to reduce the amount of personal information they contain.

The Canadian Judicial Council has published material on this, as have the federal and state courts in the US.

Recently a US lawyer proposed that databases of court decisions should block search engines from indexing the decisions. Such a block is very easy to implement: a robots.txt file at the root of the site containing the decisions (or a "noindex" robots meta tag on each page) tells search engine crawlers to stay away.
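As a rough illustration of how little such a block takes, the robots.txt below asks every crawler to stay out of a /decisions/ path while leaving the rest of the site crawlable. The paths and the example.org domain are hypothetical, not any real court site's layout. Python's standard urllib.robotparser module can then report which URLs a compliant crawler would skip; this is a minimal sketch, not a description of how any particular database does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a court-decisions site: every crawler ("*")
# is asked to stay out of /decisions/, while other paths stay crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /decisions/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string rather than fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    # example.org and both paths are placeholders, not a real court site.
    for path in ("/decisions/2014/abc123.html", "/legislation/statute.html"):
        allowed = rules.can_fetch("Googlebot", "https://example.org" + path)
        print(f"{path}: {'crawlable' if allowed else 'blocked'}")
```

Note that robots.txt is advisory: a well-behaved crawler checks these rules before fetching, but nothing in the protocol stops another program from ignoring them.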

Is that a good idea? It would not bar access to those who know where to look (such as CanLII for Canadian decisions), but it would keep the casual searcher from stumbling upon potentially intimate, disputed or outdated information about people who may not have been voluntarily engaged in litigation.

Should Canadian court databases do this? What is CanLII’s policy?

Comments

  1. Hi John,

    CanLII employs the robots.txt protocol to shield some – not all – of our databases from search engine crawling. Excluded from the protection are legislative collections and Supreme Court of Canada decisions. You can see how each database is treated here: http://www.canlii.org/robots.txt

    To the extent that Canadian courts are putting their decisions directly online, I believe that most, if not all, rely on the same protocol and/or use other means to limit deep indexing of content accessible through their sites.

    Any measure short of password-protected access can still fail, because once a page is copied and reposted on a different site, search engines will index it there. So what do we do? Others and I have written about this on Slaw and elsewhere. I’ll spare readers all the links, but offer this one because it provides a good round-up of these discussions and pathways to even more discussion: http://www.slaw.ca/2014/05/26/google-gonzalez-and-globe24h/

    While I’m tempted to go on and on referencing more of my own past statements on this topic, I’ll limit myself to just one more: “When it comes to material originating from the courts, we have to start thinking of ‘the internet’ as beginning the moment a judge shares a final draft of her ruling with her clerk.”

    Colin Lachance, CanLII CEO

  2. I think it’s a bad idea.

    Courts that post their own opinions often appear in ordinary Google results (for example, pasting site:www.ca6.uscourts.gov/opinions criminal into Google shows US 6th Circuit criminal cases). The same is true of Canadian courts. Ontario, like CanLII, uses a robots.txt file to block search engine indexing, but Manitoba doesn’t (search Google for: hyra “criminal matters” manitoba).

    Privacy through obscurity prevents people who do not regularly do legal research, and so do not know where to start, from finding relevant documents. The 2013 ABA Legal Technology Survey (vol. V, p. 38) found that Google was the preferred free tool for legal research for 36% of respondents. If that’s what lawyers are choosing, it seems hard to justify blocking the public, who are just as likely to use Google for their legal research, from finding opinions that way.

    If the opinions contain sensitive information, the court or the publisher can fix them (CanLII does this, I believe). If they’re documents that people, especially those who aren’t legal professionals, might need, it is better for them to be easy to find than for people to have to know where they are stored.

  3. There’s still too much personal information publicly available in court decisions. I’m still finding full names and full birthdates, even of children, in family decisions. While it’s good that search engines can’t index those, I’m sure identity thieves have figured out which sites to trawl.

  4. My company, Reputation.ca, deals with the impact of this on a daily basis. We help people protect their privacy online and remove damaging information. We have helped many people remove documents (originating on CanLII) from Caselaw.Globe24h.com.

    Robots.txt is not a realistic or effective way of keeping the information off the internet. Much of CanLII’s database was duplicated by the Romanian scraper site Globe24h. A database of court decisions can be mirrored and then exploited for advertising traffic fairly easily; it can be done with a single Linux command and about ten minutes of waiting.

    The courts, the OSC, the BCSC, the professional self-regulatory colleges and other organizations that publicize private information need to rethink their whole process. They need to understand that publishing private information is effectively an additional punishment for the person, and they should assume it will end up in Google, ranking for the person’s name. If they fully recognized this, they could decide whether it is fair to publish all of it and what information should be included in their decisions.

    The situation right now is not fair and not right and something needs to change.

  5. David Collier-Brown

    An interesting balancing question: should courts set publication rules for anyone employing their decisions? If so, what should they allow?

    I might suggest that personal names that are part of the case name remain, but that publishers be required to exclude others. This could range from replacing the name with a black blob, simulating on-paper practice, to substituting a non-personal identifier like “party 2” if the publisher wants the material to be easily readable.

    At that point, Google-style indexing becomes much less of a problem.
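The substitution proposed in the last comment could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any court's or publisher's actual practice: it assumes the publisher already has a list of the personal names to remove (finding them reliably is the hard part and is not attempted here), and the function name pseudonymize and its other_people and style parameters are hypothetical. Parties named in the case name are simply left out of that list, so they stay in the text.

```python
import re


def pseudonymize(text: str, other_people: list[list[str]], style: str = "label") -> str:
    """Replace names of people outside the case name with placeholders.

    other_people is a hypothetical input: one list of name variants per person
    (e.g. ["Mary Brown", "Mary"]), so every variant gets the same placeholder.
    style="label" substitutes "Party 1", "Party 2", ...;
    style="blob" substitutes a run of block characters, like a blacked-out paper copy.
    """
    labels: dict[str, str] = {}
    for number, variants in enumerate(other_people, start=1):
        for variant in variants:
            labels[variant] = f"Party {number}"
    if not labels:
        return text

    # One alternation, longest variants first, so "Mary Brown" is matched
    # as a whole before the shorter "Mary" can claim part of it.
    alternation = "|".join(re.escape(name) for name in sorted(labels, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternation + r")\b")

    def replace(match: re.Match) -> str:
        name = match.group(0)
        return labels[name] if style == "label" else "\u2588" * len(name)

    return pattern.sub(replace, text)


if __name__ == "__main__":
    decision = "In Smith v. Jones, the witness Mary Brown said Mary had met Tom Green."
    people = [["Mary Brown", "Mary"], ["Tom Green"]]
    print(pseudonymize(decision, people))
    print(pseudonymize(decision, people, style="blob"))
```

With the "label" style, the sample sentence becomes "In Smith v. Jones, the witness Party 1 said Party 1 had met Party 2." A real redaction step would also have to locate the names, catch other identifiers such as birthdates, and be checked by a person; the sketch covers only the substitution itself.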