RELAX Wins: Not So Fast
For myself, I have never paid any attention to RELAX for the simple reason that I had no particular reason to. It always made sense to me that XSD schemas were the most appropriate replacement for DTDs and I didn't really see a need for anything else. In short, for my clients, using XSD schemas seems like the only reasonable recommendation: it is the official W3C schema mechanism, it has a number of important advantages over DTDs, it's supported by most, if not all, XML tools, and it seems to be pretty future-proof. And I've never gotten that excited about what one could or couldn't do with a particular document constraint language: they are all weak and can never replace the need for validation applications, so really, who cares?
So the arguments along the lines of "RELAX can {slice your bread | butter your toast | walk your dog}" never really carried much weight for me because it didn't really matter for the stuff I do. Also, there's a sense in which document-level constraints really aren't that important, except for syntax-driven authoring and for providing attribute defaults. Otherwise, it's really just documentation for what authors and processors should do.
So it never seemed really important to learn anything about RELAX (none of my clients have had RELAX schemas or asked us to develop one).
But this assertion that "RELAX wins" suggested that I actually look at RELAX--if the tool support is there and it really is easier to use than XSD schemas, maybe it makes sense?
So I looked and I find I am underwhelmed.
Why?
First, like I said, additional constraint features don't really interest me at all, so the fact that RELAX lets you say a few things that Schema can't doesn't carry any weight. Also, the fact that the design is elegant isn't really that compelling either--elegant design by itself is of minimal value unless all other factors are equal.
But what I find missing is:
- Any sort of classing mechanism. My focus for the last 15 years has been on architecture-type mechanisms (e.g., HyTime architectures, DITA class hierarchies) and I feel that that approach to schema design and extension is the most effective way to manage systems of related schemas. XSD schemas do this to some degree. The mechanism is both somewhat broken (the constraints on derived content models are too restrictive) and strangely designed (not closed over classes alone), but it does offer some immediate advantage when you have schemas where there are clear specializations of base types within the document type or you want to enable controlled specializations.
Unless I missed it, I didn't see any sort of type or class hierarchy mechanism in RELAX at all. I realize that enabling useful type hierarchies is a seriously complicating feature (it is a lot of what makes XSD schemas complicated) but it's also very useful.
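To make the kind of type hierarchy meant here concrete, the following is a minimal sketch using XSD derivation by extension and the JDK's built-in validator. The schema, the type names (`topicType`, `conceptType`), and the class name are invented for illustration and do not come from any real document type.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class; all names here are assumptions made for illustration.
class TypeDerivationDemo {
    // A base type and one specialization derived from it by extension.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:complexType name='topicType'>"
      + "<xs:sequence><xs:element name='title' type='xs:string'/></xs:sequence>"
      + "</xs:complexType>"
      + "<xs:complexType name='conceptType'>"
      + "<xs:complexContent><xs:extension base='topicType'>"
      + "<xs:sequence><xs:element name='definition' type='xs:string'/></xs:sequence>"
      + "</xs:extension></xs:complexContent>"
      + "</xs:complexType>"
      + "<xs:element name='topic' type='topicType'/>"
      + "</xs:schema>";

    // Compile the schema; a broken schema is a programming error, not a validation result.
    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true if the instance validates, false if the validator reports an error.
    public static boolean isValid(String xml) {
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xsi = "xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'";
        // The element is declared with the base type; xsi:type selects the derived type.
        String specialized = "<topic " + xsi + " xsi:type='conceptType'>"
            + "<title>T</title><definition>D</definition></topic>";
        // Same content without xsi:type is checked against the base type only.
        String unspecialized = "<topic><title>T</title><definition>D</definition></topic>";
        System.out.println("specialized valid: " + isValid(specialized));
        System.out.println("unspecialized valid: " + isValid(unspecialized));
    }
}
```

The first instance should be accepted because `conceptType` is a declared extension of `topicType`; the second should be rejected because the base type has no `definition` child. The restriction noted above shows up here too: extension can only append to the base content model.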
Of course, the counter argument is that XSD schemas don't really work for doing architecture-like things so you have no choice but to do what DITA did and create your own extra-schema mechanism. Maybe so. But in that case RELAX falls down because:
- No attribute defaults.
RELAX explicitly does not modify the info set produced for a document (in the core feature set--DTD compatibility does provide defaulted attributes). That's fine, and I think I understand the reason for that, but defaulted attributes are really really handy and enable architecture-style processing without having to have lots of attributes on instance elements. Since I can get default attributes with schemas and just a little bit of custom parser configuration, at least in Java, I find this a definite strike against RELAX (although I suppose I could use the DTD compatibility stuff but I don't know how widely it's supported).
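As a rough sketch of the "little bit of custom parser configuration" in Java, the example below attaches a compiled XSD to a JAXP `DocumentBuilderFactory` so that the resulting DOM carries the schema's defaulted attributes. The `concept` element, the DITA-style `class` value, and the class name are assumptions chosen for illustration; the behavior relies on the JDK's bundled parser adding schema defaults during validation.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the schema below is invented for illustration.
class DefaultedAttributes {
    // "class" is never written by authors; the schema supplies it as a default.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='concept'>"
      + "<xs:complexType>"
      + "<xs:sequence><xs:element name='title' type='xs:string'/></xs:sequence>"
      + "<xs:attribute name='class' type='xs:string'"
      + " default='- topic/topic concept/concept '/>"
      + "</xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    // Parses the instance with the schema attached and returns the root's class value.
    public static String classOf(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for schema-based validation
            dbf.setSchema(schema);       // validator runs during the parse and may add defaults
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("class");
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // The instance has no class attribute; the value comes from the schema.
        System.out.println("[" + classOf("<concept><title>T</title></concept>") + "]");
    }
}
```

A processor can then dispatch on the `class` value without authors ever typing it, which is the architecture-style processing described above.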
- Not supported by Arbortext Editor
As far as I can tell, Arbortext Editor does not support RELAX schemas for document editing. As Arbortext Editor is the main editor used by my clients (and by me), that's a serious problem and essentially removes RELAX as an option.
Yes, I could use RELAX as the primary form and generate an XSD Schema, but why? That just adds complexity that isn't justified on any other grounds.
- No self-described relationship between a given RELAX schema and a namespace
One of the important features of XSD schemas, in my opinion, is the ability to unambiguously relate a schema document to a namespace. This provides, in a standard way, something missing from the namespace specification itself, namely a formal way to define the member names in a namespace, as well as some of the semantics of those names. With schemas you have at least some hope of automatically and reliably associating namespaces with schemas such that given a document that has elements in one or more namespaces, you can have a system that automatically associates those elements with their governing schemas (e.g., the XIRUSS system).
I see no such mechanism in RELAX. While RELAX lets you associate a namespace with a given element type (which it must, to be namespace aware), a given RELAX document can directly define types in any number of namespaces. This is convenient but doesn't make RELAX particularly useful or reliable as a way to define namespace constraints (as opposed to document constraints).
That is, in essence, XSD schemas are intended to define the constraints on namespaces while RELAX schemas (and DTDs) are intended to define the constraints on documents. It's a subtle difference, but one that I think is very important. The schema approach explicitly or implicitly recognizes that fundamentally, documents are arbitrary and what's really important is what the individual elements mean, not how they are organized for storage into documents. This was always the problem with DTDs: they governed instances, not types (that is, the term "document type definition" was always a lie). RELAX seems to make the same mistake. I see this as SGML brain damage and I have no use for it.
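The namespace-to-schema association can be sketched as a small registry keyed on each XSD's own `targetNamespace`. This is only an illustration of the idea, not a description of how XIRUSS or any other real system works; the class name, the schema identifiers, and the `urn:example:` namespaces are all made up.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical registry; names and namespaces are assumptions for illustration.
class SchemaRegistry {
    private final Map<String, String> schemaByNamespace = new HashMap<>();

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Registers a schema under the namespace the schema document declares for itself.
    // A schema with no targetNamespace is filed under the empty string.
    public void register(String schemaId, String xsdText) {
        String ns = parse(xsdText).getDocumentElement().getAttribute("targetNamespace");
        schemaByNamespace.put(ns, schemaId);
    }

    // Looks up the governing schema for an instance from its root element's namespace.
    // Returns null when no registered schema claims that namespace.
    public String governingSchema(String instanceXml) {
        String ns = parse(instanceXml).getDocumentElement().getNamespaceURI();
        return schemaByNamespace.get(ns == null ? "" : ns);
    }

    public static void main(String[] args) {
        SchemaRegistry registry = new SchemaRegistry();
        registry.register("book.xsd",
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
          + " targetNamespace='urn:example:book'/>");
        registry.register("map.xsd",
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
          + " targetNamespace='urn:example:map'/>");
        System.out.println(registry.governingSchema("<m:map xmlns:m='urn:example:map'/>"));
    }
}
```

The lookup works because each schema document states which single namespace it governs. A schema language whose documents can define names in several namespaces at once gives a registry like this nothing unambiguous to key on.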
- No defined mechanism for defining reference constraints
In DTDs you have ID/IDREF; in XSD schemas you have key/keyref. This is a very important feature, I think, and it's something I make heavy use of in schemas when I can. There's still a limitation in XSD with declaring and validating references that cross document boundaries, but I'm not sure that's a solvable problem without a more formal definition of what compound documents are (and I'm not talking about XInclude as currently formulated, which punts in an unacceptable way, as far as I'm concerned).
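For readers who have not used key/keyref, here is a minimal sketch of a within-document reference constraint checked by the JDK's XSD validator. The `doc`, `topic`, and `link` elements and the constraint names are invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo class; the vocabulary below is an assumption for illustration.
class KeyRefDemo {
    // Every link/@target must match some topic/@id within the same doc element.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='doc'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='topic' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:attribute name='id' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element>"
      + "<xs:element name='link' minOccurs='0' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:attribute name='target' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType>"
      + "<xs:key name='topicKey'>"
      + "<xs:selector xpath='topic'/><xs:field xpath='@id'/></xs:key>"
      + "<xs:keyref name='linkRef' refer='topicKey'>"
      + "<xs:selector xpath='link'/><xs:field xpath='@target'/></xs:keyref>"
      + "</xs:element>"
      + "</xs:schema>";

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    // Returns true if the instance validates, false if the validator reports an error.
    public static boolean isValid(String xml) {
        try {
            compile().newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<doc><topic id='a'/><link target='a'/></doc>"));
        System.out.println(isValid("<doc><topic id='a'/><link target='b'/></doc>"));
    }
}
```

The second document has a link to a nonexistent topic and should fail validation. Unlike ID/IDREF, the key is scoped to the element that declares it and can be built from any selected fields, but both the key and the reference still have to live in the same document.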
So for all of these reasons, I don't see RELAX being particularly useful, at least as I use XSD schemas: I find no particularly compelling features and at least two essential missing features.
So I must respectfully disagree with Elliotte: RELAX has not won. I don't dispute the utility of RELAX or the elegance of its design but I do dispute any assertion that it is interchangeable with XSD schemas. It is not, for the reasons given above.
If the choice were "DTDs or RELAX?" then I would say without reservation that RELAX would be the right choice, but when the question is "XSD vs. RELAX" I say without reservation that XSD is the right choice. Which is not to say that XSD schemas are perfect by any means. They absolutely are not, but they are better than anything else on offer.