Should Access to Ontario Court Schedules Be Restricted?

Last week, I blogged about the fact that Ontario’s court schedules are finally online for viewing by the public. It is a small step; hopefully many more are to come.

Blogger and lawyer Addison Cameron-Huff brought to my attention the fact that the schedule website is protected by a CAPTCHA. CAPTCHAs are meant to block automated access, including by search engines like Google and other web crawlers.

I am not sure yet what to think about this. Cameron-Huff suggests it is a terrible development because the information should be made freely accessible to anyone who wants to use it. The information is in the public domain. Courts should not restrict access to it.

A similar debate happened in the U.S. over its public court document system called pacer.gov (I blogged about pacer.gov in an earlier post). One activist downloaded millions of documents from the site to make them available to the general public before he was stopped from doing so. One university project called RECAP aims to build a free alternative to the PACER system (PACER charges a per-page fee for access).

What do you think? At present, only court schedules are available online but hopefully more will follow – filed statements of claim, statements of defence, motion records and factums. Should access to court documents be restricted, or should they be freely accessible by web crawlers like the Google Scholar engine?

Comments

  1. They don’t necessarily use CAPTCHA to restrict the public’s access to the schedules. They could be doing it to regulate loads on their bandwidth and server capacity.

    I agree that CAPTCHA makes it hard to build third-party apps on top of the schedule data. But lack of CAPTCHA doesn’t necessarily make it easy. What we need are real APIs on top of the data like CanLII’s or TTC’s.

    Again, I don’t think the courts (or MAG) will deliberately prevent that. It’s more likely that they don’t understand why it’s good, and there is no one who can explain it to them. There is probably no budget for that type of thing either.

  2. Jean-Marc Leclerc

    Pulat, I am skeptical about the use of CAPTCHAs by courts to regulate bandwidth and server capacity. These court schedules don’t involve transmission of a lot of data at all. If they have bandwidth problems with the low volume of data that I expect is involved, then they really have some serious technological deficiencies.

  3. I agree with your concerns about the use of Captchas.

    In addition, I understand that there are programs out there that are as accurate as humans in solving Captchas, and I find the Captcha this site is using to be particularly frustrating. It contains 8 digits with no spaces, which is difficult for anyone to read, never mind someone with dyslexia or a vision problem.

    I tried playing the audio captcha and on my first try it contained a digit that did not sound like any number I am familiar with.

  4. Might be to prevent cyber “shame scams”. It looks like ontariocourtdates.ca is set up so as to not have an archive of past schedules. They probably don’t want someone else providing that sort of service. One reason to have the CAPTCHA is to prevent a third-party company with questionable intentions from scraping all the info and putting it on its own site. The company could then extort people by forcing them to pay to have their name removed. CanLII recently switched to CAPTCHA as well, citing privacy concerns: http://business.financialpost.com/2014/03/29/how-cyber-shame-scams-are-playing-on-our-privacy-fears-and-scaling-up/

  5. Jean-Marc Leclerc

    Thanks Thomas, I was not aware of that problem arising out of CanLII decisions. The part that doesn’t make sense to me about the Financial Post article is how the Romanian company obtained access to the litigant’s full name given that initials were apparently used in the proceedings. Maybe one solution to balance the interests might involve the greater use of initials in cases involving sensitive issues.

  6. Hi Jean-Marc,

    We introduced the CAPTCHA in early March. It is presented to a CanLII user when a sufficiently high number of sequential engagements with the site occur from a single source. Under the current parameters, and after making a few tweaks following the introduction, it is highly unlikely that a casual CanLII user will ever see the CAPTCHA as their usage levels will rarely trip the wire. Some high-volume users such as very large government departments or even national firms that route their traffic through one or a small number of servers may see the notice occasionally.

    Regarding the Financial Post article, the Romanian site operator appears to have indiscriminately scraped a small sampling of cases from across several years and from across several databases. The cumulative effect is a sizeable collection of cases, arranged in a series of folders/sub-folders/sub-sub-folders/etc. but not in any useful organization. CanLII was not the only target, but presumably was sufficiently attractive to the operator as he “planted” his content farm. From the home page, this site claims to host the following databases, the content of which was presumably acquired in similar fashion to the content from CanLII:

    Globe24h | Articles | Articulos | Artigos | Asia Newswire | 亚洲新闻 | Gesetzblatt – Österreich | Nachrichten aus Österreich | Boletin Oficial – España | Artigos do Brasil | Notícias do Brasil | SEC Filings – Securities and Exchange Commission | Canada Newswire | Actualités de Canada | Canadian Caselaw | Jurisprudence de Canada | Clinical Trials | 中国新闻 | Nyheder fra Danmark | Nachrichten aus Deutschland | Diario Oficial – México | Noticias de España | European Court of Human Rights | Cour Européenne des Droits de l’Homme | Federal Register | Actualités de France | Journal Officiel de la République Française | Jurisprudence de France | 香港新闻 | Notizie Italia | 日本からのニュース | 한국 소식 | Noticias de América Latina | Hírek Magyarország | Minas Gerais | NASA | NATO | OTAN | Nieuws uit Nederland | Newswire | Nyheter fra Norge | Wiadomości z Polski | Notícias de Portugal | T.C. Resmi Gazete – Türkiye | Comunicate de Presă | Новости из России | Legislação de São Paulo | Nachrichten aus der Schweiz | Science | Actualités de la Suisse | Uutisia Suomi | Nyheter från Sverige | Notizie dalla Svizzera | Switzerland Newswire | UK Newswire | UK Gazettes | Dezvăluiri

    This indiscriminate approach to content acquisition results in many cases containing full names, i.e., in the format in which they were supplied for publication. It’s through the indexing by search engines of this content on that site that full names, within texts often describing less-than-desirable circumstances, become discoverable.

    What to do about the privacy issues?

    I fully agree that it would be better (and for historical judgments, would have been better) for courts to use less personal detail to describe the parties, the witnesses and others named. Anonymization of certain parties in certain cases is highly desirable (family cases, any mention of children, etc.), but there are many other cases where the public interest demands disclosure of names.

    I’ve touched on these issues a couple times here on Slaw.

    http://www.slaw.ca/2013/05/22/the-price-of-open-and-free/
    http://www.slaw.ca/2012/07/23/immovable-object-meet-irresistible-force/

    Many others have weighed in on this topic before and since.

    Just like the issue you raise in your post, it all comes down to what we mean and what we want when we talk about open courts in the age of the internet, when anything that can be seen can be copied and shared.

    This is a conversation that needs to continue and an issue that needs to be settled.
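
For readers curious how a threshold-triggered CAPTCHA of the kind described in comment 6 might work, here is a minimal sketch in Python. It is not CanLII's implementation: the class name, the limit of 30 requests, and the 60-second window are all hypothetical, since the comment does not state the actual parameters. The idea is simply to count recent requests per source and challenge only the sources that exceed the limit.

```python
from collections import defaultdict, deque


class CaptchaGate:
    """Hypothetical sliding-window counter: challenge a source only after
    it makes more than max_requests requests within window_seconds."""

    def __init__(self, max_requests=30, window_seconds=60.0):
        # Both defaults are illustrative assumptions, not real parameters.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # source -> timestamps of recent requests

    def needs_captcha(self, source, now):
        """Record a request from `source` at time `now` (in seconds) and
        return True if that source should be shown a CAPTCHA."""
        hits = self._hits[source]
        # Drop requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) > self.max_requests


gate = CaptchaGate(max_requests=3, window_seconds=10.0)
casual = [gate.needs_captcha("casual-user", t) for t in (0.0, 30.0, 60.0)]
scraper = [gate.needs_captcha("scraper", t) for t in (0.0, 1.0, 2.0, 3.0)]
```

In this sketch the casual user is never challenged, while the scraper is challenged on its fourth request inside the window. It also shows why a large office routing all of its traffic through one server could occasionally trip the wire: the counter sees only the source, not the individual people behind it.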