
Namespaces

Namespaces are used in XML to avoid problems that would otherwise arise from latent ambiguities, such as the same name being used for two different things.

A long time ago, in a place far away, there were two ships called “Peerless”. The plaintiff in Raffles v. Wichelhaus (1864), 2 H. & C. 906 (LLMC), 159 E.R. 375 (Hein), 33 L.J.N.S. 160, claimed he had a contract with the defendants for the delivery of some cotton to them in Liverpool “ex Peerless from Bombay” (a.k.a. Mumbai). The defendants claimed they had refused to accept delivery from the Peerless that had sailed from Mumbai in December, because they had intended the cotton to come on the Peerless that had sailed from Mumbai in October. The court famously decided: “There must be judgment for the defendants.” What were the court’s unstated reasons? Opinions differ. The case has been cited frequently, and commentators have struggled to reach consensus ad idem. (See, for example, A.W.B. Simpson, “The Beauty of Obscurity: Raffles v. Wichelhaus and Busch (1864)”, chapter 6 of Leading Cases in the Common Law (Oxford: Oxford University Press, 1995); or “Contracts for Cotton to Arrive: the Case of the Two Ships Peerless” (1989) 11 Cardozo Law Review 287 (Hein).)

Whatever the occasional benefits of latent ambiguities in legal writing (giving contracts professors something to talk about, for example), there’s less tolerance for that sort of thing in computing. Hence the W3C Recommendation, “Namespaces in XML 1.0 (Third Edition)”.

To see namespaces in use, have a look at the XML source for https://www.slaw.ca/feed/. The file begins:

There are six namespace declarations here, each beginning with “xmlns:”. Each declaration uses a Uniform Resource Identifier (URI) as its namespace name. Interestingly, however, it doesn’t really matter that http://purl.org/rss/1.0/modules/slash/, as of the date of writing, is not a working link. Any sufficiently distinctive URI could, once upon a time, have been chosen; the string just shouldn’t be changed now. Section 3 of the Recommendation says: “The namespace name, to serve its intended purpose, SHOULD have the characteristics of uniqueness and persistence. It is not a goal that it be directly usable for retrieval of a schema (if any exists).” The goal is simply to make it possible for an application (in this case, a feed-reader) to look at the file and find that distinctive character string. It’s just a bonus if the string also links to a useful document.
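
For readers who want to see the declarations the way a program sees them, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree.iterparse, which reports each declaration as a “start-ns” event. The feed header is a hypothetical example modelled on a typical WordPress RSS 2.0 feed (the six prefixes are an assumption, not a copy of the Slaw file), and namespace_declarations is an illustrative helper name.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical feed header, modelled on a typical WordPress RSS 2.0 feed.
# Illustration only: this is not a copy of https://www.slaw.ca/feed/.
FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
  <channel><title>Example feed</title></channel>
</rss>"""


def namespace_declarations(xml_text):
    """Return (prefix, namespace name) pairs for every xmlns:prefix declaration."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    # With events=("start-ns",), iterparse yields ("start-ns", (prefix, uri))
    # once for each namespace declaration it meets.
    return [decl for _event, decl in ET.iterparse(source, events=("start-ns",))]


for prefix, uri in namespace_declarations(FEED):
    print(f"{prefix:8} {uri}")
```

The prefixes come back exactly as the author wrote them; the second item in each pair, the namespace name, is what identifies the namespace.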

Each namespace name is associated with a prefix. In this example, the “http://purl.org/rss/1.0/modules/slash/” string is associated with the “slash” prefix. Later in the file you are likely to find both a <slash:comments> tag (“comments” in the slash namespace) and a plain <comments> tag (the RSS 2.0 element, which is in no namespace and holds the URL of a page of comments). The objective is for an application reading the file to be able to tell, without ambiguity, which kind of “comments” it is looking at. In this case, your feed-reader should know that the <slash:comments> tag holds a count of comments. (See, for example, http://wordpress.org/extend/plugins/slash-comments/ and https://developer.mozilla.org/en/RSS/Module/Slash.)
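
A minimal Python sketch of that distinction, using the standard library’s ElementTree, which writes a namespaced name as {namespace name}local-name. The <item> below is invented for illustration (its URL and count are made up), and SLASH is just a local constant.

```python
import xml.etree.ElementTree as ET

SLASH = "http://purl.org/rss/1.0/modules/slash/"

# Hypothetical <item>; the values are invented for illustration.
FEED = """<rss version="2.0" xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
  <channel>
    <item>
      <title>Namespaces</title>
      <comments>https://example.org/namespaces/#comments</comments>
      <slash:comments>3</slash:comments>
    </item>
  </channel>
</rss>"""

item = ET.fromstring(FEED).find("channel/item")

# Unprefixed <comments>: the RSS 2.0 element, in no namespace (a URL).
comments_url = item.findtext("comments")
# <slash:comments>: same local name, but in the slash namespace (a count).
comment_count = int(item.findtext(f"{{{SLASH}}}comments"))

print(comments_url)   # https://example.org/namespaces/#comments
print(comment_count)  # 3
```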

It is important to stress that it is the namespace name, not the prefix, that the RSS reader looks for. If the author of the page were to use

xmlns:walrus="http://purl.org/rss/1.0/modules/slash/"

and

<walrus:comments>

instead, it shouldn’t (in principle, anyway) trouble RSS readers.
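
The same point can be checked with Python’s ElementTree. This is a minimal sketch, assuming a hypothetical helper comment_count and two invented <item> fragments, one using the “slash” prefix and one using “walrus”. Because the parser replaces each prefix with the namespace name, both fragments yield the same tag and the same count.

```python
import xml.etree.ElementTree as ET

SLASH = "http://purl.org/rss/1.0/modules/slash/"


def comment_count(item_xml):
    """Return the slash comments count, whatever prefix the author chose."""
    item = ET.fromstring(item_xml)
    # The lookup uses the namespace name (the URI), never the prefix.
    return int(item.findtext(f"{{{SLASH}}}comments"))


usual = f'<item xmlns:slash="{SLASH}"><slash:comments>3</slash:comments></item>'
walrus = f'<item xmlns:walrus="{SLASH}"><walrus:comments>3</walrus:comments></item>'

print(comment_count(usual), comment_count(walrus))  # 3 3
# Both parse to the same expanded tag name.
print(ET.fromstring(walrus)[0].tag)  # {http://purl.org/rss/1.0/modules/slash/}comments
# The reader may use its own prefix in queries, mapped to the same URI.
print(ET.fromstring(walrus).findtext("slash:comments", namespaces={"slash": SLASH}))  # 3
```

The last line shows the other side of the bargain: the reader is free to use whatever prefix it likes in its own queries, so long as that prefix maps to the same URI.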

XML is ubiquitous these days, of course. The following is what can usually be found in the “document.xml” file buried in a .docx zip archive (a.k.a. a Microsoft Word file):

And so, in simple Word-processed files, the “w” prefix will be everywhere. The link, http://schemas.openxmlformats.org/, however, will take you nowhere. To find the .docx documentation, it would be better to start with Chris Rae, “Where is the documentation for Office’s docx/xlsx/pptx formats?” (Part 1: Office 2007, September 25, 2010 and Part 2: Office 2010, October 6, 2010).

The documentation will help you to understand, if you’re curious, the tags most often seen: “<w:document>”, “<w:body>”, “<w:p>” for a paragraph, “<w:r>” for a run, and “<w:t>” for the text within a run. It all gets pretty complicated pretty quickly. The use of XML namespaces, however, has made it easier to combine a number of smaller projects (such as word processing, math, and drawing) into a greater project without troublesome latent ambiguities.
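
As a rough illustration of those tags, here is a minimal Python sketch that opens a .docx archive with the standard zipfile module, parses word/document.xml, and joins the <w:t> text of each <w:p> paragraph. It assumes the WordprocessingML namespace name http://schemas.openxmlformats.org/wordprocessingml/2006/main and the usual location of the main part (strictly, the package’s relationship files say where it lives). The helper paragraph_texts and the stripped-down test archive are hypothetical, and the sketch ignores tables, fields, and the rest of what makes real documents complicated.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in .docx files.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraph_texts(docx_bytes):
    """Return the text of each <w:p> paragraph in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    texts = []
    # Each <w:p> holds runs (<w:r>), which hold the text itself (<w:t>).
    for p in root.iter(f"{{{W}}}p"):
        texts.append("".join(t.text or "" for t in p.iterfind(".//w:t", NS)))
    return texts


# A hypothetical, stripped-down archive: a real .docx also contains
# [Content_Types].xml, relationship parts, styles, and so on.
DOC = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:t xml:space="preserve">Ex Peerless </w:t></w:r><w:r><w:t>from Bombay</w:t></w:r></w:p>
    <w:p><w:r><w:t>Judgment for the defendants.</w:t></w:r></w:p>
  </w:body>
</w:document>"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", DOC)

print(paragraph_texts(buf.getvalue()))
# ['Ex Peerless from Bombay', 'Judgment for the defendants.']
```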

I don’t know whether the people doing large-scale contract or securities document assemblies have similar writing conventions, but I’d be interested in comments from anyone who does happen to know.

I sometimes think we have learned a lot about language through advances in computing. But is the “Extensible Markup Language (XML)” (w3.org) a plain language?
