Column

Starting to Think About HTML5, or the Joy of Firebug

I drift along sometimes, dreaming that maybe I can stop learning new technical things for a while and actually use a few technologies to finish some long-term projects. Then something comes along to wake me up. One of the more recent of these wake-up calls came with the announcement of Apple’s iPad. People were complaining that it didn’t support Adobe Flash content.

I don’t generally pay a lot of attention to either Apple or Adobe, because I tend not to associate either one with open standards or open software. That’s just a bias though, and life is never that simple. What caught my attention was the nature of Apple’s defence to the complaints. Here’s a quotation from a February 20, 2010 posting on AppleInsider, “Inside Apple’s iPad: Adobe Flash”:

… Apple’s opposition to Adobe’s Flash isn’t an attack on a popular plugin to limit choice, but really an effort to restore the use of open standards on the web, which creates a real marketplace for consumer choice. If Adobe were really interested in supporting open standards rather than being a gatekeeper wielding proprietary control over multimedia playback on the web, it could have opened up Flash just as it once did with PDF.

The article stressed the importance of HTML5 in Apple’s decision, linking to an earlier posting from September 19, 2009, “Why Apple is betting on HTML 5: a web history”. Here is AppleInsider’s account of the origins of HTML5:

The glacial and mostly irrelevant progress being made on web standards within the W3C resulted in Apple, Mozilla, and Opera joining together in 2004 to start up the independent WHATWG (Web Hypertext Application Technology Working Group), focused on advancing HTML, CSS, DOM, and JavaScript to serve as a viable platform for rich web apps.

WHATWG was pointedly designed to bypass Adobe Flash, Microsoft Silverlight, and Sun JavaFX and return the web to its open roots. In order to achieve that goal, it needed to modernize the HTML specification, which hadn’t really progressed since 1999. Additional work was also required to adapt the DOM and advance JavaScript with APIs to provide rich web app features more akin to desktop applications, such as drag and drop, advanced drawing, and offline editing.

In 2007, WHATWG recommended its specification of HTML 5 for adoption by the WC3 as the new starting point for the future of the web in place of XHTML 2.0. This time, the W3C accepted the proposal and a new HTML working group was subsequently formed. In January 2008, the first public working draft of HTML 5 was published.

Steve Jobs, in an apparent attempt to distinguish the colouring of pots and kettles, also blogged some “Thoughts on Flash” in April 2010:

Apple has many proprietary products too. Though the operating system for the iPhone, iPod and iPad is proprietary, we strongly believe that all standards pertaining to the web should be open. Rather than use Flash, Apple has adopted HTML5, CSS and JavaScript — all open standards. Apple’s mobile devices all ship with high performance, low power implementations of these open standards. HTML5, the new web standard that has been adopted by Apple, Google and many others, lets web developers create advanced graphics, typography, animations and transitions without relying on third party browser plug-ins (like Flash). HTML5 is completely open and controlled by a standards committee, of which Apple is a member.

So is it as simple as that? Some say not. Erik Sherman said the dispute was more about Flash cookies and consumer data than anything: “Apple-Adobe Flash Fight: Think Ads, Not Video” (bnet.com, February 19, 2010). Erick Schonfeld noted the significance of codecs and digital rights management in “Microsoft Agrees With Apple And Google: ‘The Future Of The Web Is HTML5’” (techcrunch.com, April 30, 2010), something also stressed by Gavin Clarke in “Apple’s HTML5 ‘standards’ hype debunked: ‘It’s open. But it only works with Safari’” (theregister.co.uk, June 4, 2010), and by Cade Metz in “Google: Flash stays on YouTube, and here’s why HTML5 doesn’t cut it” (theregister.co.uk, June 30, 2010). Adobe doesn’t have all of its eggs in one basket either, as Tom Krazit reported, quoting Kevin Lynch, Adobe’s chief technology officer: “We’re going to try and make the best tools in the world for HTML5”: “Adobe’s Apple tiff won’t prevent HTML5 support” (cnet.com, May 5, 2010).

I’m really in no position to assess anybody’s business prospects. The bottom line for me is that everybody seems to agree that HTML5 (which will be an open W3C standard when finalized) will be important. The latest published version of the working draft is available at http://www.w3.org/TR/html5/, but I consider that a pretty tough read. I suspect most other Slaw readers would as well. A better introduction is provided by “Dive Into HTML 5”, by Mark Pilgrim of Google.

In my own work, I’m not at the moment very interested in using the <video> element. I’m more concerned that my HTML also be valid XML. (The reason–radically abbreviated–for this concern is that, in theory at least, I can save myself some time and effort by making it so.) Here are a couple of key paragraphs from section 1 of “Frequently Asked Questions (FAQ) about the future of XHTML” (W3C, July 2, 2009):

When W3C announced the HTML and XHTML 2 Working Groups in March 2007, we indicated that we would continue to monitor the market for XHTML 2. W3C recognizes the importance of a clear signal to the community about the future of HTML.

While we recognize the value of the XHTML 2 Working Group’s contributions over the years, after discussion with the participants, W3C management has decided to allow the Working Group’s charter to expire at the end of 2009 and not to renew it.

This is from section 3:

Regarding the XML serialization of HTML, the HTML 5 specification includes a section on XML serialization, as well as a section on text/html serialization. W3C plans to continue work on both serializations in the HTML Working Group. Thus, we expect the next generation XML serialization of HTML to be defined in the HTML 5 specification. Currently, the HTML 5 specification refers to this serialization as “XHTML 5” [HTML 5, section 1.6]

“HTML 5, section 1.6”, just noted, describes pretty concisely the things one needs to keep in mind if one wants one’s HTML to be valid XML.

If anyone is still reading and wonders how the heck any of this HTML/XML stuff relates to actual web content, some familiar examples are illustrative. I know of several sources for the Slaw article and comment feeds:

Load up one of these and, if you’re using Firefox, click “Tools–Page info”. (Your preferred browser may or may not have an obvious equivalent.) You’ll see that the type is “application/xhtml+xml” (RFC 3236). Then load up http://slaw.ca/. This time the type is “text/html” (RFC 2854). The “type”, when push comes to shove, describes how the page was served up more than it does the page itself, as noted in more detail below.
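For readers who would rather script this check than click through menus, here is a minimal sketch in Python. It only parses a Content-Type header value that you already have in hand; the function names are my own inventions for illustration, and the rule for what counts as an XML type (text/xml, application/xml, or anything ending in “+xml”) is a simplifying assumption.

```python
from email.message import Message


def media_type(content_type_header: str) -> str:
    """Return the bare media type (no parameters) from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() lowercases the value and drops parameters like charset.
    return msg.get_content_type()


def is_xml_media_type(content_type_header: str) -> bool:
    """Rough test (an assumption of this sketch) for XML-flavoured media types."""
    mt = media_type(content_type_header)
    return mt in ("text/xml", "application/xml") or mt.endswith("+xml")


if __name__ == "__main__":
    for header in ("application/xhtml+xml; charset=UTF-8", "text/html; charset=UTF-8"):
        print(header, "->", media_type(header), is_xml_media_type(header))
```

The header value itself can be read from a live response, for example with urllib.request.urlopen(url).headers.get("Content-Type") from the Python standard library. What comes back depends on how the server is configured on the day you ask.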

To examine the page itself using Firefox, you click “View–Page Source”. (If your preferred browser seems to give you something different, you can read the source using the validator noted below instead.) Load up http://www.slaw.ca/feed/ and have a look. The source begins:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" … >

Drop the URI into http://validator.w3.org/, and it will probably validate nicely as XML, with just one warning:

No DOCTYPE found! Checking XML syntax only.

The same goes for the other feed pages, except for the one from Yahoo, which will probably begin:

<?xml version="1.0"?>
<rss version="2.0" … >

Validating this page, you’ll probably get a second warning:

No Character encoding declared at document level

In fact, Simon Fodden occasionally gets a message from me whenever one XML processor or another has found an invalidity in the feed. The standard XML processors are very unforgiving, as they should be if the standards are to mean anything. I’ve never asked Simon how many other people report these little problems, but I suspect there aren’t many of us. It makes me wonder what most people use to get the Slaw feeds.
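To give a sense of how unforgiving an XML processor is, here is a small sketch using the parser in Python’s standard library. The two sample strings are invented for illustration and are not taken from any actual Slaw feed. Note also that this checks only well-formedness (whether the text parses as XML at all), which is a narrower test than validity against a schema or DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str):
    """Return (True, "") if xml_text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# Hypothetical samples: the second has a bare ampersand, a common feed mistake.
GOOD = '<rss version="2.0"><channel><title>Smith &amp; Jones</title></channel></rss>'
BAD = '<rss version="2.0"><channel><title>Smith & Jones</title></channel></rss>'

if __name__ == "__main__":
    for label, sample in (("good", GOOD), ("bad", BAD)):
        ok, message = check_well_formed(sample)
        print(label, ok, message)
```

A single unescaped character is enough to make the parser reject the whole document, which is the kind of problem that prompts my occasional messages.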

In any event, if you look at the source of http://slaw.ca/, it will probably begin:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">

When I checked it, there was no XML declaration, but that’s never absolutely necessary, even in a document served up as “application/xhtml+xml”. (This one was served up “text/html”, as noted.) There was just another “note” about that when I validated:

No Character encoding declared at document level

The page did have a DOCTYPE declaration (as above) when I checked it. This indicated that the root element would be “<html>”, and that the Document Type Definition (DTD) was “XHTML 1.0 Transitional”. It also provided a default XML namespace (xmlns) for the element and attribute names.
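The effect of the default namespace is easier to see when an XML parser reports the element names. The sketch below assumes a tiny, made-up XHTML document that reuses the DOCTYPE and root element quoted above; the body content and the helper function name are hypothetical. Python’s ElementTree reports each element name with its namespace URI in braces in front of the local name.

```python
import xml.etree.ElementTree as ET

# A made-up document; only the DOCTYPE and the <html> start tag come from the page.
XHTML_SNIPPET = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
  <head><title>Example</title></head>
  <body><p>Hello</p></body>
</html>'''


def split_qualified_name(tag: str):
    """Split ElementTree's '{namespace}local' form into (namespace, local name)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag


root = ET.fromstring(XHTML_SNIPPET)

if __name__ == "__main__":
    print(split_qualified_name(root.tag))
    for child in root:
        print(split_qualified_name(child.tag))
    print(root.get("lang"))
```

The child elements inherit the default namespace from the root, while the unprefixed attributes (dir and lang) are not placed in any namespace.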

Is this “really” an XHTML document? No. A valid XHTML document could be processed as XML. If you check this one, the validator will probably find a lot of invalidities. (There were 136 errors and 106 warnings when I checked it.) Of course, none of these will likely trouble your browser in the least. The HTML processors in most browsers are very, very tolerant. Unlike an RSS feed, the typical HTML page is only intended for browsers. Thus, re-purposing portions of it is more “scraping” than “syndication”.
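The difference in tolerance can be sketched by feeding the same sloppy markup to an HTML tokenizer and to an XML parser. The markup string below is invented for illustration, and Python’s html.parser is only a stand-in: it is not the parsing algorithm that any browser uses, though it is forgiving in a similar spirit.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Record the name of every start tag the HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(markup: str):
    """Tokenize markup leniently and return the start tags in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def parses_as_xml(markup: str) -> bool:
    """Return True only if markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical sloppy markup: unclosed <p>, <br> and <b>, plus a bare ampersand.
SLOPPY = "<p>First<br><p>Second & last<b>bold</p>"

if __name__ == "__main__":
    print(html_start_tags(SLOPPY))
    print(parses_as_xml(SLOPPY))
```

The HTML tokenizer works through the whole string without complaint, while the XML parser refuses it outright.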

If you’re interested in the processing of page components that goes on within your browser, you’ll need to become familiar with the W3C Document Object Model (DOM) and JavaScript (standardized by ECMA as ECMAScript). For viewing pages from this perspective, the tool I would recommend is Firebug. It’s a Firefox add-on, but there is a Lite version said to be compatible with IE6+, Opera, Safari and Chrome. You could get most of the same information using just “View–Page Source”, but it’s so much easier with Firebug.

Looking at http://www.slaw.ca/feed/ with Firebug, you’ll learn something about how your browser presents XML. Firefox, for example, transforms the XML source into HTML, wrapping it up in:

<html id="feedHandler" xmlns="http://www.w3.org/1999/xhtml"
   xmlns:xul="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul">

The reference is to XUL (XML User Interface Language), described as:

Mozilla’s XML-based language that lets you build feature-rich cross platform applications that can run connected or disconnected from the Internet.

Apart from that, it’s mostly just interesting to notice how much more nesting one ordinarily sees in XML than in HTML. On the other hand, Firebug reveals that there’s a lot more going on in http://slaw.ca/ than in http://www.slaw.ca/feed/: lots of styling and scripts, and most of the content in a three-column table.

I realize, of course, that this article won’t be everyone’s cup of tea, but perhaps a few Slawyers will be tempted to explore further. Many of the ongoing achievements we have seen in computer ordering are simply the result of rigorous attention to detail. Long ago, Aristotle [Nicomachean Ethics, trans. by H. Rackham, rev. ed. (Cambridge MA: Harvard University Press, 1934), VI, iv, 6, p. 335] described the relationship between τέχνη (techne), λόγος (logos) and ποίησις (poiesis):

Art … is a rational quality, concerned with making, that reasons truly.
ἡ … τέχνη … ἕξις τις μετὰ λόγου ἀληθοῦς ποιητική ἐστιν
he … techne … hexis tis meta logou alethous poietike estin

I have often wondered whether, if lawyers devoted more time to learning, in detail, about the tools they use, the τέχνη (techne) of legal ordering might somehow similarly be improved.

Comments

  1. John:

    I agree that lawyers should pay more attention to the technological infrastructure they use EVERYDAY! I cannot imagine using a browser and NOT understanding how – at even the most basic level – it works and displays data.

    I wish that lawyers could learn basic html just so they understand how this data is served up to them. Imagine if they understood just how much benefit that XML could bring to their long-term publishing prospects. One of my favorite books is Siegel’s “Pull: The Power of the Semantic Web to Transform Your Business.” Much of the future belongs to semantic apps. XML is going to be instrumental.

  2. Daniel–thank you for mentioning this book. It may be up to those of us who “get it” to explain it to others.

    I attended the Toronto Semantic Web meetup group’s meeting with David Siegel (via Skype) to launch his book. Here he is giving a similar talk for the New York Semantic Web meetup group: