Subscribe to Dr. Macro's XML Rants

NOTE TO TOOL OWNERS: In this blog I will occasionally make statements about products that you will take exception to. My intent is to always be factual and accurate. If I have made a statement that you consider to be incorrect or inaccurate, please bring it to my attention and, once I have verified my error, I will post the appropriate correction.

And before you get too exercised, please read the post, dated 9 Feb 2006, titled "All Tools Suck".

Wednesday, November 29, 2006

RELAX Wins: Not So Fast

Bob DuCharme blogged about Elliotte Rusty Harold's declaration that "RELAX Wins" from which you can then get to all the commentary, of which there is legion.

For myself, I have never paid any attention to RELAX for the simple reason that I had no particular reason to. It always made sense to me that XSD schemas were the most appropriate replacement for DTDs and I didn't really see a need for anything else. In short, for my clients, using Schemas seems like the only reasonable recommendation, since it is the official W3C schema mechanism, it has a number of important advantages over DTDs, it's supported by most, if not all, XML tools, and it seems to be pretty future proof. And I've never gotten that excited about what one could or couldn't do with a particular document constraint language: they are all weak and can never replace the need for validation applications, so really, who cares?

So the arguments along the lines of "RELAX can {slice your bread | butter your toast | walk your dog}" never really carried much weight for me because it didn't really matter for the stuff I do. Also, there's a sense in which document-level constraints really aren't that important, except for syntax-driven authoring and for providing attribute defaults. Otherwise, it's really just documentation for what authors and processors should do.

So it never seemed really important to learn anything about RELAX (none of my clients have had a RELAX schema or requested that we develop one).

But this assertion that "RELAX wins" suggested that I actually look at RELAX--if the tool support is there and it really is easier to use than XSD schemas, maybe it makes sense?

So I looked and I find I am underwhelmed.

Why?

First, like I said, additional constraint features don't really interest me at all, so the fact that RELAX lets you say a few things that Schema can't doesn't carry any weight. Also, the fact that the design is elegant isn't really that compelling either--elegant design by itself is of minimal value unless all other factors are equal.

But what I find missing is:

- Any sort of classing mechanism. My focus for the last 15 years has been on architecture-type mechanisms (e.g., HyTime architectures, DITA class hierarchies) and I feel that that approach to schema design and extension is the most effective way to manage systems of related schemas. XSD Schemas do this to some degree (although the mechanism is both somewhat broken {constraints on derived content models are too restrictive} and strangely designed {not closed over classes alone}), but it does offer some immediate advantage when you have schemas where there are clear specializations of base types within the document type or you want to enable controlled specializations.

Unless I missed it, I didn't see any sort of type or class hierarchy mechanism in RELAX at all. I realize that enabling useful type hierarchies is a seriously complicating feature (it is a lot of what makes XSD schemas complicated) but it's also very useful.

Of course, the counter argument is that XSD schemas don't really work for doing architecture-like things so you have no choice but to do what DITA did and create your own extra-schema mechanism. Maybe so. But in that case RELAX falls down because:

- No attribute defaults.

RELAX explicitly does not modify the info set produced for a document (in the core feature set--DTD compatibility does provide defaulted attributes). That's fine, and I think I understand the reason for that, but defaulted attributes are really really handy and enable architecture-style processing without having to have lots of attributes on instance elements. Since I can get default attributes with schemas and just a little bit of custom parser configuration, at least in Java, I find this a definite strike against RELAX (although I suppose I could use the DTD compatibility stuff but I don't know how widely it's supported).
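
To make the "default attributes with schemas" point concrete, here is a minimal sketch, assuming the JAXP implementation bundled with the JDK, of a parser configured with an XSD so that a defaulted attribute shows up in the parsed DOM even though the instance omits it. The schema, the element and attribute names, and the class name SchemaDefaults are all hypothetical, and this is not necessarily the "custom parser configuration" referred to above. The JAXP documentation says a schema-aware parser may add missing default values, so treat the behavior as implementation-dependent.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SchemaDefaults {
    // Hypothetical schema: a "topic" element whose "class" attribute
    // carries an architecture-style default value.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='topic'>"
      + "<xs:complexType>"
      + "<xs:sequence><xs:element name='title' type='xs:string'/></xs:sequence>"
      + "<xs:attribute name='class' type='xs:string' default='topic/topic'/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Parses the instance with the schema attached to the parser and
    // returns the value of the root element's "class" attribute.
    static String classOf(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for XSD-based processing
        dbf.setSchema(schema);       // validate (and augment) while parsing

        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("class");
    }

    public static void main(String[] args) throws Exception {
        // The instance does not specify class=; any value printed here
        // was supplied from the schema default.
        System.out.println(classOf("<topic><title>Hello</title></topic>"));
    }
}
```

The point of the sketch is that downstream code can dispatch on the class value without authors ever typing it, which is what makes architecture-style processing practical.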

- Not supported by Arbortext Editor

As far as I can tell, Arbortext Editor does not support RELAX schemas for document editing. As Arbortext Editor is the main editor used by my clients (and by me) that's a serious problem and essentially removes RELAX as an option.

Yes, I could use RELAX as the primary form and generate an XSD Schema, but why? That just adds complexity that isn't justified on any other grounds.

- No self-described relationship between a given RELAX schema and a namespace

One of the important features of XSD schemas, in my opinion, is the ability to unambiguously relate a schema document to a namespace. This provides, in a standard way, something missing from the namespace specification itself, namely a formal way to define the member names in a namespace, as well as some of the semantics of those names. With schemas you have at least some hope of automatically and reliably associating namespaces with schemas such that given a document that has elements in one or more namespaces, you can have a system that automatically associates those elements with their governing schemas (e.g., the XIRUSS system).

I see no such mechanism in RELAX. While RELAX lets you associate a namespace with a given element type (which it must to be namespace aware) a given RELAX document can directly define types in any namespaces. This is convenient but doesn't make RELAX particularly useful or reliable as a way to define namespace constraints (as opposed to document constraints).

That is, in essence, XSD schemas are intended to define the constraints on namespaces while RELAX schemas (and DTDs) are intended to define the constraints on documents. It's a subtle but important difference and one that I think is very important. The schema approach explicitly or implicitly recognizes that fundamentally, documents are arbitrary and what's really important is what the individual elements mean, not how they are organized for storage into documents. This was always the problem with DTDs: they governed instances, not types (that is, the term "document type definition" was always a lie). RELAX seems to make the same mistake. I see this as SGML brain damage and I have no use for it.
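
The namespace-to-schema association described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class NamespaceRegistry, its method names, and the example namespace URIs are hypothetical, and it is not how XIRUSS or any particular system works. It reads the targetNamespace attribute from each XSD document, then reports which registered schemas govern the elements found in an instance.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceRegistry {
    // Maps a namespace URI ("" for no namespace) to a schema label.
    private final Map<String, String> schemaByNamespace = new LinkedHashMap<>();

    private static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Registers a schema under the namespace it declares for itself.
    void register(String schemaLabel, String xsd) throws Exception {
        Element root = parse(xsd).getDocumentElement();
        // getAttribute returns "" when targetNamespace is absent.
        schemaByNamespace.put(root.getAttribute("targetNamespace"), schemaLabel);
    }

    // Returns the labels of registered schemas that govern at least one
    // element of the instance, in order of first appearance.
    Set<String> governingSchemas(String xml) throws Exception {
        Set<String> result = new LinkedHashSet<>();
        NodeList all = parse(xml).getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            String ns = all.item(i).getNamespaceURI();
            String label = schemaByNamespace.get(ns == null ? "" : ns);
            if (label != null) {
                result.add(label);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xs = "xmlns:xs='http://www.w3.org/2001/XMLSchema'";
        NamespaceRegistry registry = new NamespaceRegistry();
        registry.register("book.xsd", "<xs:schema " + xs + " targetNamespace='urn:example:book'/>");
        registry.register("math.xsd", "<xs:schema " + xs + " targetNamespace='urn:example:math'/>");
        System.out.println(registry.governingSchemas(
            "<b:book xmlns:b='urn:example:book'><m:eq xmlns:m='urn:example:math'/></b:book>"));
    }
}
```

The lookup works only because each schema document names its own namespace. A schema language without that self-description would need the mapping supplied from somewhere outside the schema.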

- No defined mechanism for defining reference constraints

In DTDs you have ID/IDREF, in XSD schemas you have key/keyref. This is a very important feature, I think, and it's something I make heavy use of in schemas (when I can--there's still a limitation in XSD with declaring and validating references that cross document boundaries, but I'm not sure that's a solvable problem without a more formal definition of what compound documents are {and I'm not talking about XInclude as currently formulated, which punts in an unacceptable way, as far as I'm concerned}).
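
As an illustration of the key/keyref mechanism, here is a minimal sketch, assuming the standard javax.xml.validation API, that validates two instances against a small schema in which every ref/@target must match some item/@id. The schema, the element names, and the class name KeyRefCheck are hypothetical and exist only for this example.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class KeyRefCheck {
    // Hypothetical schema: item/@id values form a key within doc,
    // and each ref/@target must refer to one of those key values.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='doc'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='item' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:attribute name='id' type='xs:string' use='required'/></xs:complexType>"
      + "</xs:element>"
      + "<xs:element name='ref' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:attribute name='target' type='xs:string' use='required'/></xs:complexType>"
      + "</xs:element>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:key name='itemKey'>"
      + "<xs:selector xpath='item'/><xs:field xpath='@id'/>"
      + "</xs:key>"
      + "<xs:keyref name='itemRef' refer='itemKey'>"
      + "<xs:selector xpath='ref'/><xs:field xpath='@target'/>"
      + "</xs:keyref>"
      + "</xs:element>"
      + "</xs:schema>";

    // Returns true if the instance satisfies the schema, including the
    // key/keyref constraints; false if validation reports an error.
    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        try {
            // With no ErrorHandler set, the Validator throws on errors.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<doc><item id='a'/><ref target='a'/></doc>"));
        System.out.println(isValid("<doc><item id='a'/><ref target='b'/></doc>"));
    }
}
```

Note that both the key and the keyref are scoped to a single doc element in a single document, which is exactly the cross-document limitation mentioned above.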

So for all of these reasons, I don't see RELAX being particularly useful, at least as I use XSD schemas: I find no particularly compelling features and at least two essential missing features.

So I must respectfully disagree with Elliotte: RELAX has not won. I don't dispute the utility of RELAX or the elegance of its design but I do dispute any assertion that it is interchangeable with XSD schemas. It is not, for the reasons given above.

If the choice was "DTDs or RELAX?" then I would say without reservation that RELAX would be the right choice, but when the question is "XSD vs. RELAX" I say without reservation that XSD is the right choice. Which is not to say that XSD schemas are perfect by any means, they absolutely are not, but they are better than anything else on offer.

Sunday, November 26, 2006

XML: Ten Year Anniversary

It's hard to believe that in a couple of weeks we will celebrate the 10 year anniversary of the first public unveiling of the XML specification at the SGML 1996 conference. It doesn't feel like it's been that long (at that time we were marveling that it had been 10 years since the publication of the SGML Standard).

I will, as it happens, be at XML 2006 for a couple of days--I wonder who else from the original committee will be there? Jon Bosak, of course, as he's giving the closing keynote, and Michael Sperberg-McQueen, who's also on the program. Anyone else? I've been out of circulation for a few years so I don't even know who's still active in the community except for those members who have blogs (Tim, Eve) or who have prominent jobs (Jean Paoli) or simply are prominent (James Clark). Paul Grosso of course is still very active but doesn't go to many conferences these days. Paula has her new Paula's Texas Orange business (very tasty stuff, by the way). Does Peter Sharpe still work on XMetal (I couldn't find anything later than 1998 on Peter with a quick Google search)?

Being a member of the XML Working Group was a singular experience and one I'll treasure. My personal contribution to the final form of XML was fairly slight I would say but we all contributed. I fought for some things that didn't get in, probably for the best. In hindsight I would have fought for some more things to be left out (entities, notations).

As a standards-making activity it was unique: a small group of people with a clear common goal, consistently strong technical knowledge, diverse backgrounds and constituencies, and a task that was relatively easy: take an existing standard and simply cut away everything that wasn't absolutely essential for use on the Web. Few people appreciate that in XML 1.0 there was no invention. We didn't add any features to SGML, we only removed them. [Although the SGML standard had to rush to keep up with some of the syntax changes we made so that a fully-conforming SGML parser could parse XML documents correctly without any special-case code--as far as I know, that update to SGML was only implemented by James Clark in the SP parser, but it was important at the time that we do it (at that time I was also a member of the ISO Technical Committee responsible for SGML, HyTime, and DSSSL). In 1996 we had no idea if XML would sink or swim and we had to assume that SGML would still be the primary standard. Of course it didn't take long to see that XML would in fact sweep away the past with nary a glance back. {But Innodata Isogen has clients who are still using SGML systems and SGML documents, so go figure.}]

XML is, if anything, a singular marketing success--we took something that already existed (SGML) and without changing anything essential about it, rebranded it and made it suddenly not only acceptable to the Web world, but essential.

It was also unique in the way the activity was conducted. Jon Bosak as chair realized that the only way to ensure the cleanest, purest result was to allow members to make decisions that their constituents would not necessarily have supported. Thus all deliberation of the Working Group was private and confidential. (This was also how the U.S. founding fathers operated the Constitutional Convention that resulted in the current U.S. constitution.) This allowed us to focus almost entirely on the technical issues at hand.

Together, these aspects of the Working Group allowed us to produce XML in record time (about 18 months, if memory serves). Certainly no specification of any import within the XML family has been developed as quickly since then. Even namespaces, which should have been a no-brainer, took at least two years (first published in 1999, started spring 1997). It was namespaces that drove me out of the W3C standards arena at the time--I was upset by Tim B-L's insistence that we use attributes to declare namespaces (I wanted to use processing instructions). At the time I felt that Tim's overriding of the consensus of the Working Group was unacceptably heavy handed, that he should be a benevolent dictator and what's the point of having technical experts if you're not going to respect their decisions? That is still a potential problem with the W3C as a standards-making body--it is ultimately controlled by a single person (it doesn't really matter who that person is, just that it's a single person). It didn't help that I was also burned out from working on both HyTime and XML at the same time, while working on a very demanding project for work (Boeing's eMOD system). So maybe I was looking for an excuse to get out of the standards business for a while. But in 1997 it still wasn't clear the degree to which the Web would become the primary platform for information systems.

In hindsight Tim's decision was probably the best one--my objections had more to do with invading the user's name space (the set of names that a user can choose for element types and attributes) than for any particular love of PIs. Charles Goldfarb had thoroughly schooled me in the inviolability of the user's name space and many of the features of HyTime are there specifically to avoid any need to take away names from users.

Of course, in the much more pragmatic world of the Web, we make a small (and essentially insignificant) sacrifice of a few names in order to provide an essential feature (clear name disambiguation) with the simplest possible syntax.

But that's all history now. XSL-FO brought me back into the W3C fold and I'm happy to be there, although my patience for standards work has, I think, reached its lifetime limit.

One of the things that happened when XML was announced was that everybody wanted to be involved. The way the W3C works (or at least worked then--I haven't checked the participation rules lately) is that any W3C member can place anybody they want on any working group. Overnight, the XML Working Group went from being a small group of people with a common goal and universal respect for each other to a large group of largely competing interests, many of whom had no particular technical interest but only wanted to protect or promote their business interests, for good or ill. This had the effect of slowing progress way down, as the sheer number of people made communication difficult and made it easy for anybody to impede progress just by monopolizing the weekly conference call.

This seems to be the way that many, if not all, of the important standards get done these days, with standards activities being more like battlefields and less like technical working groups. And of course this has its highest expression in the battles of competing standards produced by different bodies (e.g., OASIS vs. ECMA vs. W3C vs. ISO).

I still marvel that XQuery ever made it to Proposed Recommendation at all.

I definitely miss the old days, when a standard could be developed by a handful of very sharp technical folks more or less without non-technical interference. I don't see those days returning any time soon--standards are too important to business to let the technoids run the show and the days of big industry investing in big standards are long gone. Today most standards activity is funded either by marketing budgets or by the personal commitments of single practitioners who have made a living out of being an expert in their chosen standards. You do see some standards, like DITA, being driven largely by the user community, but that's because it's an end-user application standard in which the users have an immediate and obvious stake, rather than an infrastructure standard that only a few people really understand or care enough about to invest time and money in standardizing it (XSL-FO, for example).

So here's hoping I'll get the chance to catch up with some of my former XML Working Group colleagues, see how their lives have changed in the last ten years, see where they are driving the future.

See you in Boston.