AI Takes Over Google Translate

You may have missed, during the holidays, the news that Google has replaced the technology underlying its Google Translate tool, going from a “phrase-based” system to neural networks (i.e., AI).

The improved technology was announced in September, but it was only recently rolled out to the public Google Translate service (and only for the most common language pairs).

Translation of legal information is an important issue in Canada. First, the language barriers faced by different groups (Indigenous people, Francophone minorities outside Québec, Anglophones in Québec and immigrants) are a significant component of the access-to-justice problem.

But there are also more “technical” problems related to translation (or the lack thereof) that affect the publishing and use of legal information:

  • In New Brunswick, for example, if a court finds that a judgment “determines a question of law of interest or importance to the general public”, that judgment must be translated. Of course, this comes at a cost, and that cost may have led – I’m told – to the non-publication of a number of significant cases. See this excellent Slaw post by Ted Tjaden for more information.
  • In Quebec, the judiciary is known to worry that significant decisions issued by Quebec judges in non-civil-law matters (e.g., criminal law) are all but ignored outside the province, even when they have high precedential value, advance the law and could help litigants elsewhere.

All that to say that we, as players in the justice system, should welcome any improvement in the quality of machine translation. The improvements to Google Translate seem quite promising: according to Google, the new technology brings us very close to human-quality translation:

(Chart pasted from the Google article.)

While the gains seem, from the chart alone, incremental, moving the bar this close to human quality translation can make the difference between a translation that is insultingly bad and one that is almost usable as is.

I could not confirm this, but it does not seem that the new neural-network-based translation engine (Google calls it “Neural Machine Translation”, or NMT, and so will I from now on) has been deployed to the document translation tool embedded in Google Drive. I base this assumption on the incomprehensible text it generated from a few documents I tested it on.

I therefore think that we currently have access to both technologies (the “old” phrase-based translation via the translator in Google Drive, and Neural Machine Translation in Google Translate itself), so I proceeded to compare their respective performances using a recent and reasonably short Supreme Court case. Of course, if you happen to know that I am wrong and that the translator tool in Google Drive is in fact just a slightly different implementation of Neural Machine Translation, forget about the rest of this article and, by all means, go on with your day.

I translated from English into French since I have always found that automated translation tools struggle more with producing readable French prose (the above chart seems to confirm this impression) than the reverse. The side-by-side results are here.

I spent time analyzing the differences in the two translations in the first two paragraphs (only two because that was a surprisingly painstaking exercise that reminded me of my Latin classes). The results (see my annotations here) are still interesting despite the small sample size:

  • There were 52 differences between the two translations. I “scored” 38 as wins for Neural Machine Translation, 11 for phrase-based translation, and 3 as “ties”. That seems like a clear win for the new technology.
  • 19 of these differences were vocabulary changes. I scored 12 of those as wins for NMT and 7 as wins for phrase-based translation.
  • I identified 24 syntax changes and scored 22 of those as wins for the new technology.
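As a sanity check, the tallies above can be turned into win rates with a few lines of code (the counts are my own annotations; the post doesn’t break down the two remaining syntax changes, so they simply count as non-NMT outcomes here):

```python
# Tallies from my annotation of the first two paragraphs of the judgment.
tallies = {
    "overall":    {"total": 52, "nmt_wins": 38},  # 11 phrase-based wins, 3 ties
    "vocabulary": {"total": 19, "nmt_wins": 12},  # 7 phrase-based wins
    "syntax":     {"total": 24, "nmt_wins": 22},  # remaining 2 not broken down
}

for category, t in tallies.items():
    rate = 100 * t["nmt_wins"] / t["total"]
    print(f"{category}: NMT wins {t['nmt_wins']}/{t['total']} ({rate:.0f}%)")
```

The syntax column is where NMT pulls far ahead, which matches my impression below.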

By and large, NMT completely dominates phrase-based translation for syntax and the old technology gets most of its wins for vocabulary choices (even if it still doesn’t beat the new technology for vocabulary). I, of course, have never trained an AI-based translation engine, but I can assume that syntax is more difficult to get right than vocabulary, especially in a field like law where there are set translations for most of the concepts lawyers juggle with.

More subjectively, the “new” technology produces a much more readable translated text and seems to have finally figured out how to use articles and prepositions. This makes a world of difference: it’s like going from reading a text that feels like it was written by a Roomba to reading a text drafted by a (slightly distracted, mind you) human. I would also say that the difference between the two technologies was even more spectacular in tests I ran with less formal texts (old blog posts of mine, for example), which suggests that phrase-based translation was already pretty good at translating more formal text.

All of this is already promising, but there’s more: According to this article, Google was able to obtain these great results while using flawed training data. We can therefore expect that the technology will soon obtain even more spectacular results by just cleaning up and beefing up the training set:

Google trains these neural networks by feeding them massive collections of existing translations. Some of this training data is flawed, including lower quality translations from previous versions of the Google Translate app. But it also includes translations from human experts, and this buoys the quality of the training data as a whole. That ability to overcome imperfection is part of deep learning’s apparent magic: given enough data, even if some is flawed, it can train to a level well beyond those flaws.

I obviously don’t know what was and wasn’t in the training data. Assuming it didn’t contain a lot of bilingual Canadian content, there’s hope that we could, in the very near future, feed NMT the plethora of bilingual Canadian content available online (from Supreme Court cases to most, if not all, of the content on the Justice Canada website). I dare predict that we will relatively soon have access to a legal translation engine whose work product is good enough to eliminate some of the problems I referred to at the beginning of this post.


  1. This is a potentially game-changing development for the many organizations that produce monolingual (i.e., English) PLEI materials.
    It also probably underscores the importance of ensuring that whatever material is written is written in plain language, avoiding terms of art whenever possible. Less complex and shorter sentences would probably allow NMT to spit out serviceable translations.

  2. Addison Cameron-Huff

    Both Google and Microsoft use EU translations as part of their training. The EU has published an enormous amount of text that’s professionally translated into several languages. Legislation was probably part of the training data, though likely not a big part of it at this point; if you try to translate EU legal concepts, you’ll find the translations are quite good.

    Google isn’t public about their training data, but all services like this require translated pairs. Ideally sentence pairs. So wherever you find huge amounts of professionally translated language pairs I’m sure you’ll find Google’s training data (e.g. translation companies).

    As for the issue of French case law in Canada, I think the current state of the art in machine translation is useful enough for at least finding the cases and understanding whether they’re relevant. Machine translation of French case law would be quite cheap and make the cases much more widely available. Although perhaps Google will soon switch to search results that include automatically translated text and beat the courts to it.

  3. Good background info, Addison.
    I can imagine adducing French-language case authority in an English-speaking courtroom, and relying on a machine interpretation, would not please the judge. At least… not a human judge.