XSLT

XSLT stands for XSL Transformations, a member of the Extensible Stylesheet Language (XSL) family: http://www.w3.org/Style/XSL/. As the World Wide Web Consortium (W3C) describes it, “An XSLT stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary, such as (X)HTML or XSL-FO.” In other words, you can use it to transform an XML document into HTML. A little less directly, it’s also used to transform XML into PDF, EPUB, etc. This is important stuff, but not really very new anymore. The XSLT 1.0 recommendation is dated November 16, 1999. The XSLT 2.0 recommendation is dated January 23, 2007. An XSLT 3.0 working draft was published July 20, 2012.

An illustration

Did you know that your browser has an XSLT processor inside? Well, why would you? Frankly, it’s not something that very many people need to know. Your browser expects HTML, and usually gets it. I only mention this to illustrate a point. Check out this link: http://johndavis.ca/slaw_201307.xml. As the title indicates, it’s “Nothing Much.” The point is that, instead of giving your browser an HTML page directly, I have given it an XML page and an XSLT page. Your browser turned it all into HTML.

You can use the view source option on your browser to look at the XML. In what follows, the Document Type Definition (DTD) has been omitted, and whitespace has been added as decoration.

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="slaw_201307.xsl"?>
...
<paper>
<title>Nothing Much</title>
<updated>Last updated: July 13, 2013</updated>
<paragraph>
   This is a little bit of nothing done for demonstration
   purposes only. It's meant to provide an example
      <footnote>
         This is one.
      </footnote>
   of what can be done. See paragraph <pararef pidref="pig" />.
</paragraph>
<paragraph>
   Here's another paragraph
      <footnote fid="fox">
         This is another footnote. I want it to be a very long
         one so that it wraps around onto a second line of
         text. I had to add just this little bit extra to make
         sure that it would.
      </footnote>
   which extends the example. And the examples
      <footnote>
         Why not another?
      </footnote>
   aren't over yet.
</paragraph>
<paragraph pid="pig">
   So this is a something more
      <footnote>
         This is another footnote, with a little something extra.
         See <em>supra</em>, note <footref fidref="fox" />.
      </footnote>
   with which to test your patience.
</paragraph>
</paper>

The first thing to note is that XML makes choices possible. I can use whatever elements or tags I like. I used “paragraph” and “footnote” to make the markup more or less self-documenting. If I were planning to do a lot of keyboarding, I could use “p” and “f” instead (or “a” and “b”, for that matter). The XSLT document worries about the need for HTML compliance in the output document.

The second thing to note is that the XSLT document did a bunch of other work. It numbered all the paragraphs and footnotes, and put the footnotes at the ends of the paragraphs. If I changed the XSLT document, the footnotes could all be put at the end of the last paragraph instead. If I added more paragraphs and footnotes to the XML document, and you reloaded, the numbers would all be updated too, including the cross-references. I wouldn’t want to give the impression that the programming involved was altogether trivial, but the XSLT document isn’t a lengthy one. (The document will show nicely in some browsers. In others, you may need to use a view source option.)
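To give a feel for how that numbering can work, here is a minimal sketch using xsl:number, run through the XSLT 1.0 processor that ships with Java. The stylesheet is hypothetical and is not the slaw_201307.xsl behind the demo page. The class name NumberingSketch, the div, p, sup and small output elements, and the decision to ignore footref are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NumberingSketch {
    // Hypothetical XSLT 1.0 stylesheet, NOT the one used by the column's demo page.
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/paper'>",
        "    <div><xsl:apply-templates select='paragraph'/></div>",
        "  </xsl:template>",
        "  <!-- Number each paragraph, then list its footnotes after the text. -->",
        "  <xsl:template match='paragraph'>",
        "    <p><xsl:number/>. <xsl:apply-templates/>",
        "      <xsl:for-each select='footnote'>",
        "        <small>[<xsl:number level='any'/>] <xsl:value-of select='normalize-space(.)'/></small>",
        "      </xsl:for-each>",
        "    </p>",
        "  </xsl:template>",
        "  <!-- In running text a footnote becomes just its document-wide number. -->",
        "  <xsl:template match='footnote'>",
        "    <sup><xsl:number level='any'/></sup>",
        "  </xsl:template>",
        "  <!-- A cross-reference becomes the number of the paragraph it points at. -->",
        "  <xsl:template match='pararef'>",
        "    <xsl:for-each select='//paragraph[@pid = current()/@pidref]'>",
        "      <xsl:number/>",
        "    </xsl:for-each>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Applies the sketch stylesheet to an XML string and returns the result. */
    static String render(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<paper>"
            + "<paragraph>First<footnote>One.</footnote>, see paragraph <pararef pidref='pig'/>.</paragraph>"
            + "<paragraph pid='pig'>Second<footnote>Two.</footnote>.</paragraph>"
            + "</paper>";
        System.out.println(render(xml));
    }
}
```

Nothing in the input carries a number. Insert a new paragraph or footnote ahead of the others and the next run renumbers everything, including the cross-reference.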

XML in word-processing

Of course, this is the kind of thing that goes on behind the scenes in word-processors too. For example, here’s a fun article from Microsoft: Michael Case, “Using XSLT and Open XML to Create a Word 2007 Document” (December 2009) (msdn.microsoft.com). For those of you who love the Microsoft universe, or can’t escape from it, there’s much, much more at “Open XML for Office developers”: http://msdn.microsoft.com/en-ca/office/bb265236.aspx.

A DOCX file actually consists of a number of XML files zipped up together into a single package. (See “A Column About ZIP” (April 13, 2011) (slaw.ca).) If this is the XML you want, then all is well in your world. But it doesn’t work for everyone.
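If you’d like to see the package structure for yourself, the following sketch lists the parts inside a ZIP package using only the Java standard library. The class name DocxPeek and the stand-in package built by samplePackage are assumptions made so the example runs without a real file. The two part names it uses, [Content_Types].xml and word/document.xml, do appear in real DOCX files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class DocxPeek {
    /** Returns the names of the entries inside a ZIP package, in stored order. */
    static List<String> listParts(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
                names.add(e.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /** Builds a tiny stand-in package (placeholder content, not valid Open XML). */
    static byte[] samplePackage() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buf)) {
            for (String name : new String[] {"[Content_Types].xml", "word/document.xml"}) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to a real .docx file, or run with no arguments for the stand-in.
        byte[] bytes = args.length > 0
            ? Files.readAllBytes(Paths.get(args[0]))
            : samplePackage();
        listParts(bytes).forEach(System.out::println);
    }
}
```

Run against a real document, the list is longer, but the body text lives in word/document.xml.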

You can, of course, write your own XML instead, as I’ve done in the example I gave above. I could (in principle, anyway) treat MS-Word as just one output format among many. The downside is that there are a lot of things an author using this approach needs to keep track of.

DocBook and TEI

An in-between approach is to use something like DocBook or TEI. The basic concept is this: A community maintains a whole suite of XSLT stylesheets (and related processors) which can be used to transform XML into the usual outputs (HTML, PDF, EPUB, etc.). All you have to do is learn to use the right elements or tags in your XML.

DocBook tends to be used for technical documentation. The DocBook community itself gives an indication of its users: “WhoUsesDocBook” (docbook.org). The leader of the community has been O’Reilly, the publisher of such notable titles as Sal Mangano, XSLT Cookbook, 2nd ed (Sebastopol, CA: O’Reilly, 2006), LCCN 2006275164.

The leading guides to DocBook are Bob Stayton, DocBook XSL: The Complete Guide, 4th ed. (Santa Cruz CA: Sagehill Enterprises, 2007) (sagehill.net); free version (sagehill.net); and Norman Walsh, DocBook 5: The Definitive Guide, ed. by Richard L. Hamilton (Sebastopol CA: O’Reilly, 2010) (docbook.org); free version (docbook.org). Also relevant are the “DocBook Technical Committee Document Repository” (oasis-open.org); and “The DocBook Project” (sourceforge.net). There’s also a nice Wikipedia article: “DocBook” (wikipedia.org).

There’s a nice Wikipedia article for TEI too: “Text Encoding Initiative” (wikipedia.org). TEI tends to be used by academics who need to mark up text in rather more elaborate ways than the average person would. The TEI consortium describes itself as “a non-profit membership organization composed of academic institutions, research projects, and individual scholars from around the world.” Its website is “TEI: Text Encoding Initiative” (tei-c.org).

At the TEI site, under “Tools” (or at the direct link, http://www.tei-c.org/oxgarage/) is OxGarage, which will transform TEI documents into a variety of outputs. It will transform DocBook, Word, WordPerfect, RTF and other formats too. Indeed, one of the possible outputs is TEI P5 XML, in case you want to see what it looks like. As the name suggests, OxGarage is a University of Oxford project. For background and documentation, see http://www.oucs.ox.ac.uk/oxgarage/.

The truth about XSLT and browsers

For the illustration I gave above, I depended on your browser having native support for some simple XSLT. I didn’t bother to test all the different browsers. I just carelessly relied on statements, like the one at wikipedia.org, that most browsers will do XSLT 1.0. The http://caniuse.com/ site is generally helpful when trying to figure out what different browsers can do, but as of July 2013 it provides no information on XSLT. Some guidance is provided at http://greenbytes.de/tech/tc/xslt/, with data as of about January 2013.

As W3Schools notes, in “XSLT – On the Client” (w3schools.com), it isn’t really wise to make the assumptions I did. One alternative would be to use JavaScript to respond to the various browsers’ XSLT capabilities in a browser-specific way. But that seemed like too much work for one little column.

Another alternative is to put the JavaScript code for a complete, client-side XSLT 2.0 processor on your server. The idea is that it will be fetched and installed, when needed, by the browser. Wow. If that sounds like fun to you, see “Saxon-CE” (22 February 2013) (saxonica.com) and “XSLT 2.0” (developer.mozilla.org).

Standalone XSLT

Actually, it isn’t usually going to make any practical sense to try to run an XSLT processor of any kind from a browser. This is because XSLT, by itself, won’t generate outputs like PDF or EPUB. The XSLT processor is only a part of that process, and the remaining steps are a substantial topic in their own right. To read about one common approach, see “XSL Formatting Objects” (wikipedia.org) and “Formatting Objects Processor” (wikipedia.org). Alternative approaches can be found in the documentation for DocBook and TEI, noted above.
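To make the division of labour concrete, here is a minimal sketch of the XSLT half of the XSL-FO route. The hypothetical stylesheet turns a paper into formatting objects, and that is all it does. A separate formatter, such as the Formatting Objects Processor mentioned above, would still have to turn the result into PDF. The class name FoSketch, the page dimensions, and the choice to flatten each paragraph (footnotes included) into a single block are assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class FoSketch {
    // Hypothetical stylesheet: XML in, XSL-FO out. No PDF is produced here.
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:fo='http://www.w3.org/1999/XSL/Format'>",
        "  <xsl:output method='xml' omit-xml-declaration='yes'/>",
        "  <xsl:template match='/paper'>",
        "    <fo:root>",
        "      <!-- Page geometry: one simple page layout named 'page'. -->",
        "      <fo:layout-master-set>",
        "        <fo:simple-page-master master-name='page'",
        "            page-width='8.5in' page-height='11in' margin='1in'>",
        "          <fo:region-body/>",
        "        </fo:simple-page-master>",
        "      </fo:layout-master-set>",
        "      <!-- Content: the title, then one block per paragraph. -->",
        "      <fo:page-sequence master-reference='page'>",
        "        <fo:flow flow-name='xsl-region-body'>",
        "          <fo:block><xsl:value-of select='title'/></fo:block>",
        "          <xsl:for-each select='paragraph'>",
        "            <fo:block><xsl:value-of select='normalize-space(.)'/></fo:block>",
        "          </xsl:for-each>",
        "        </fo:flow>",
        "      </fo:page-sequence>",
        "    </fo:root>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Transforms a paper into an XSL-FO document, returned as a string. */
    static String toFo(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<paper><title>Nothing Much</title>"
            + "<paragraph>Hello.</paragraph></paper>";
        // The output is still XML; feed it to an FO formatter to get pages.
        System.out.println(toFo(xml));
    }
}
```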

If you’re regularly creating documents in different formats from a common XML base, it will usually be simplest just to process your documents locally and put successive versions on your server as required. Wikipedia provides a useful list of XSLT processor implementations in the article, “XSLT” (wikipedia.org).

If you’d like to experiment without installing anything, the W3C has provided this site: http://www.w3.org/2005/08/online_xslt/. You should also have a look at OxGarage, noted above.

Server-side XSLT

The alternative, if your XML base document is very frequently updated, is to run the processing software on the server side, and have it invoked whenever a browser requests the page. This too is a topic in its own right, and a big one. My only, quite limited, personal experience has been with Apache Cocoon: http://cocoon.apache.org/. Note that Cocoon requires Java. According to “Apache Cocoon” (wikipedia.org), there are similar projects based on PHP and Python.
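The basic idea can be sketched without Cocoon at all. The following uses the small HTTP server bundled with the JDK (com.sun.net.httpserver) and is in no way a picture of how Cocoon works. The stylesheet is compiled once into a Templates object, and each request transforms whatever the XML looks like at that moment. The class name OnDemandSketch, the /paper path, port 8080, and the one-line stylesheet are all assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OnDemandSketch {
    // Hypothetical stylesheet, kept tiny so the server logic stays visible.
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:output method='html'/>",
        "  <xsl:template match='/paper'>",
        "    <html><body><h1><xsl:value-of select='title'/></h1></body></html>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Compiled once. A Templates object may be shared across threads;
    // each request gets its own Transformer from it.
    static final Templates COMPILED = compile(XSL);

    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    /** Transforms the given XML into an HTML page. */
    static String renderPage(String xml) {
        try {
            StringWriter out = new StringWriter();
            COMPILED.newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    /** Starts a server that re-runs the transformation on every request to /paper. */
    static HttpServer start(int port, Supplier<String> currentXml) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/paper", exchange -> {
            byte[] body = renderPage(currentXml.get()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // A real setup would read the XML from a file or database on each call.
        start(8080, () -> "<paper><title>Nothing Much</title></paper>");
        System.out.println("Serving http://localhost:8080/paper");
    }
}
```

The browser only ever sees HTML, so none of the client-side worries discussed earlier apply.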

I don’t happen to have a SharePoint installation I can play with, but a Google search turns up things like “XSLT Parameter Bindings” (msdn.microsoft.com) and “How to: Change the Properties Returned in the Core Search Results” (ditto), in case anyone is interested.

Conclusion

XSLT is another of those things which, like so much in today’s world, is much more commonly used than understood. I hope this column has shed just a little bit of light on the topic.
