RELAX Wins: Not So Fast
Bob DuCharme blogged about Elliotte Rusty Harold's declaration that "RELAX Wins", and from his post you can reach all the commentary, which is legion.
For myself, I have never paid any attention to RELAX, for the simple reason that I had no particular need to. It always made sense to me that XSD schemas were the most appropriate replacement for DTDs, and I didn't see a need for anything else. In short, for my clients, XSD schemas seem like the only reasonable recommendation: XSD is the official W3C schema mechanism, it has a number of important advantages over DTDs, it is supported by most, if not all, XML tools, and it seems pretty future-proof. And I've never gotten that excited about what one could or couldn't do with a particular document constraint language: they are all weak and can never replace the need for validation applications, so really, who cares?
So arguments along the lines of "RELAX can {slice your bread | butter your toast | walk your dog}" never carried much weight with me, because those capabilities didn't matter for the work I do. Also, there's a sense in which document-level constraints really aren't that important, except for syntax-driven authoring and for providing attribute defaults. Otherwise, they are really just documentation of what authors and processors should do.
So it never seemed important to learn anything about RELAX (none of my clients has had RELAX schemas or asked us to develop one).
But the assertion that "RELAX wins" suggested that I should actually look at RELAX: if the tool support is there and it really is easier to use than XSD schemas, maybe it makes sense?
So I looked, and I am underwhelmed.
Why?
First, as I said, additional constraint features don't interest me at all, so the fact that RELAX lets you say a few things that XSD can't doesn't carry any weight. Nor is the elegance of the design that compelling: elegant design by itself is of minimal value unless all other factors are equal.
But what I find missing is:
- Any sort of classing mechanism. My focus for the last 15 years has been on architecture-type mechanisms (e.g., HyTime architectures, DITA class hierarchies), and I feel that this approach to schema design and extension is the most effective way to manage systems of related schemas. XSD schemas support this to some degree. The mechanism is somewhat broken (the constraints on derived content models are too restrictive) and strangely designed (it is not closed over classes alone), but it does offer some immediate advantage when your schemas have clear specializations of base types within the document type, or when you want to enable controlled specializations.
Unless I missed it, I didn't see any sort of type or class hierarchy mechanism in RELAX at all. I realize that enabling useful type hierarchies is a seriously complicating feature (it is a lot of what makes XSD schemas complicated) but it's also very useful.
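To make the architecture-style approach concrete: in DITA every element carries a @class attribute listing its ancestry, such as "- topic/section mymod/mysection ", and processors match on the base-type token rather than on the element name. The following is a minimal Python sketch of that matching using only the standard library's ElementTree; the "mymod" specialization and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def class_tokens(elem):
    """Return the module/type tokens from a DITA-style @class value.

    A value such as "- topic/section mymod/mysection " yields
    ["topic/section", "mymod/mysection"]; the leading "-" or "+" marker is dropped.
    """
    return [p for p in elem.get("class", "").split() if "/" in p]

def is_specialization_of(elem, base):
    """True if elem is `base` itself or is specialized (at any remove) from it."""
    return base in class_tokens(elem)

# Hypothetical document: <mysection> is a specialization of topic/section.
DOC = """<topic class="- topic/topic ">
  <section class="- topic/section "/>
  <mysection class="- topic/section mymod/mysection "/>
  <p class="- topic/p "/>
</topic>"""

root = ET.fromstring(DOC)
sections = [e.tag for e in root.iter() if is_specialization_of(e, "topic/section")]
print(sections)  # ['section', 'mysection']
```

A generic processor that only knows about topic/section handles mysection without modification, which is the payoff of a class hierarchy. In practice DITA @class values are normally supplied as attribute defaults by the DTD or schema rather than typed into instances.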
Of course, the counter argument is that XSD schemas don't really work for doing architecture-like things so you have no choice but to do what DITA did and create your own extra-schema mechanism. Maybe so. But in that case RELAX falls down because:
- No attribute defaults.
RELAX explicitly does not modify the infoset produced for a document (in the core feature set; the DTD Compatibility feature does provide defaulted attributes). That's fine, and I think I understand the reason for it, but defaulted attributes are extremely handy and enable architecture-style processing without requiring lots of attributes on instance elements. Since I can get default attributes with XSD schemas and a little custom parser configuration, at least in Java, I find this a definite strike against RELAX (although I suppose I could use DTD Compatibility, I don't know how widely it's supported).
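What attribute defaulting buys you can be illustrated with a hand-rolled stand-in: a table of defaults keyed by element name, applied to the parsed tree before processing, which roughly mirrors the augmentation a DTD- or XSD-aware parser performs. This Python sketch does not read a DTD or schema at all; the DEFAULTS table and the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults keyed by element name, standing in for the
# default values a DTD or XSD would declare.
DEFAULTS = {
    "section": {"class": "- topic/section "},
    "mysection": {"class": "- topic/section mymod/mysection "},
}

def apply_defaults(root, defaults):
    """Add declared default attributes wherever the instance omits them.

    Attributes present in the instance are never overwritten, which is
    how declared defaults behave.
    """
    for elem in root.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            if name not in elem.attrib:
                elem.set(name, value)
    return root

root = ET.fromstring('<doc><section/><mysection/><section class="explicit"/></doc>')
apply_defaults(root, DEFAULTS)
print([e.get("class") for e in root])
# ['- topic/section ', '- topic/section mymod/mysection ', 'explicit']
```

Downstream code can then rely on every element carrying its architectural attributes without authors having to type them. Like DTD defaulting, this only defaults by element name.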
- Not supported by Arbortext Editor
As far as I can tell, Arbortext Editor does not support RELAX schemas for document editing. Since Arbortext Editor is the main editor used by my clients (and by me), that's a serious problem, and it essentially removes RELAX as an option.
Yes, I could use RELAX as the primary form and generate an XSD Schema, but why? That just adds complexity that isn't justified on any other grounds.
- No self-described relationship between a given RELAX schema and a namespace
One of the important features of XSD schemas, in my opinion, is the ability to unambiguously relate a schema document to a namespace. This provides, in a standard way, something missing from the namespace specification itself, namely a formal way to define the member names in a namespace, as well as some of the semantics of those names. With schemas you have at least some hope of automatically and reliably associating namespaces with schemas such that given a document that has elements in one or more namespaces, you can have a system that automatically associates those elements with their governing schemas (e.g., the XIRUSS system).
I see no such mechanism in RELAX. While RELAX lets you associate a namespace with a given element type (as it must, to be namespace-aware), a single RELAX document can directly define element types in any number of namespaces. This is convenient, but it doesn't make RELAX particularly useful or reliable as a way to define namespace constraints (as opposed to document constraints).
That is, in essence, XSD schemas are intended to define the constraints on namespaces, while RELAX schemas (and DTDs) are intended to define the constraints on documents. It's a subtle difference but, I think, a very important one. The XSD approach recognizes, explicitly or implicitly, that documents are fundamentally arbitrary: what really matters is what the individual elements mean, not how they are organized for storage into documents. This was always the problem with DTDs: they governed instances, not types (that is, the term "document type definition" was always a lie). RELAX seems to make the same mistake. I see this as SGML brain damage and I have no use for it.
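The namespace-to-schema association relied on here comes from the targetNamespace attribute on an XSD's root xs:schema element, of which each schema document has at most one. The following is a minimal Python sketch, assuming the schema documents are available as strings, that builds a registry from targetNamespace and then maps each namespace used in an instance to its governing schema. The schema names and namespace URIs are hypothetical, and this is not how XIRUSS itself is implemented.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

def build_registry(schemas):
    """Map targetNamespace -> schema name, given {name: schema_text}."""
    registry = {}
    for name, text in schemas.items():
        root = ET.fromstring(text)
        if root.tag != f"{{{XS}}}schema":
            raise ValueError(f"{name} is not an XSD schema document")
        # A schema without targetNamespace governs no-namespace names; key it as "".
        registry[root.get("targetNamespace", "")] = name
    return registry

def governing_schemas(instance_text, registry):
    """Return {namespace: schema name or None} for every element namespace used."""
    result = {}
    for elem in ET.fromstring(instance_text).iter():
        ns = elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""
        result[ns] = registry.get(ns)
    return result

SCHEMAS = {
    "book.xsd": f'<xs:schema xmlns:xs="{XS}" targetNamespace="urn:example:book"/>',
    "math.xsd": f'<xs:schema xmlns:xs="{XS}" targetNamespace="urn:example:math"/>',
}
INSTANCE = """<b:book xmlns:b="urn:example:book" xmlns:m="urn:example:math">
  <b:para><m:eq/></b:para><x:note xmlns:x="urn:example:other"/>
</b:book>"""

registry = build_registry(SCHEMAS)
print(governing_schemas(INSTANCE, registry))
# {'urn:example:book': 'book.xsd', 'urn:example:math': 'math.xsd', 'urn:example:other': None}
```

Because each XSD document names a single target namespace, the lookup is unambiguous. The argument here is that RELAX NG offers no comparable declaration binding a grammar to one namespace, so a registry like this would have to be maintained by hand.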
- No defined mechanism for defining reference constraints
In DTDs you have ID/IDREF; in XSD schemas you have key/keyref. This is a very important feature, I think, and one I make heavy use of in schemas when I can. There is still a limitation in XSD on declaring and validating references that cross document boundaries, but I'm not sure that's a solvable problem without a more formal definition of what compound documents are (and I'm not talking about XInclude as currently formulated, which, as far as I'm concerned, punts in an unacceptable way).
So for all of these reasons, I don't see RELAX being particularly useful, at least for the way I use XSD schemas: I find no particularly compelling features in it, and I find at least two essential features missing.
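The kind of reference constraint that XSD's xs:key/xs:keyref expresses (and that ID/IDREF approximates) comes down to two checks: key values are unique within a scope, and every reference resolves to some key. Here is a minimal Python sketch of both checks over a single document, with hypothetical names (term/@id as the key, termref/@ref as the reference). It is not an XSD processor, and, like XSD, it does not look across document boundaries.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def check_references(root, key_tag, key_attr, ref_tag, ref_attr):
    """Return (duplicate_keys, dangling_refs) for one document.

    Keys come from key_attr on every key_tag element; references come
    from ref_attr on every ref_tag element.
    """
    keys = Counter(e.get(key_attr) for e in root.iter(key_tag)
                   if e.get(key_attr) is not None)
    duplicates = sorted(k for k, n in keys.items() if n > 1)
    dangling = sorted({e.get(ref_attr) for e in root.iter(ref_tag)
                       if e.get(ref_attr) is not None and e.get(ref_attr) not in keys})
    return duplicates, dangling

DOC = """<glossary>
  <term id="rng"/><term id="xsd"/><term id="xsd"/>
  <p><termref ref="rng"/><termref ref="dtd"/></p>
</glossary>"""

print(check_references(ET.fromstring(DOC), "term", "id", "termref", "ref"))
# (['xsd'], ['dtd'])
```

The value of having this in the schema language is that every conforming validator performs the check, instead of each project writing code like the above.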
So I must respectfully disagree with Elliotte: RELAX has not won. I don't dispute the utility of RELAX or the elegance of its design but I do dispute any assertion that it is interchangeable with XSD schemas. It is not, for the reasons given above.
If the choice were "DTDs or RELAX?", I would say without reservation that RELAX is the right choice. But when the question is "XSD vs. RELAX?", I say without reservation that XSD is the right choice. That is not to say that XSD schemas are perfect by any means (they absolutely are not), but they are better than anything else on offer.
9 Comments:
The default-attribute part of DTD compatibility is not widely supported, it's true. (ID/IDREF is, though.) The general feeling, I think, is that attribute defaulting is neither necessary (XSLT wins here) nor sufficient (they only handle defaulting by element name, not any of the other reasonable kinds of defaults).
Type extension is available in RNG, but it's done by interleaving (an extension of SGML &) rather than concatenating, so that it is not order-dependent.
And as for not binding to a namespace, the trouble with doing that is that it inappropriately reifies the namespace. There is no problem with importing definitions that happen to apply to elements or attributes with a common namespace if that's the way you want to organize things. But if you don't, you have no hoops to jump through. Nor is there any problem with supplying multiple schemas for a given namespace, or for that matter a given document, to be used in different stages of the processing pipeline. Hardwiring schemas prevents you from doing that.
I don't understand how XML Schema has anything to say about the meanings of elements. Like RNG and DTDs, it's a grammatical/structural constraint engine.
Your point about identity constraints is well-taken; in this case, the best was the enemy of the good. The RELAX NG TC didn't want to add identity constraints, even simple ones, when they were still an active research effort.
What is the difference then between the schemas you are making, and the DocBook schema? The DocBook TC seems to be very happy with RELAX NG.
Two things I'd like to suggest you read: James' The Design of RELAX NG and my XML 2004 paper, Documents vs. Data, Schemas vs. Schemas.
You make some very good points.
The one thing I cannot stomach with XSD, though, is the fact that order of child elements is considered significant. That is so ridiculous when you've gone to the trouble of creating descriptive tags.
Both XSD and RelaxNG are also far too wordy. I agree that schemas are more about showing the information model than a particular document. For a simpler approach, see http://tobe.homelinux.net/xis
tobe says:
"The one thing I cannot stomach with XSD, though, is the fact that order of child elements is considered significant. That is so ridiculous when you've gone to the trouble of creating descriptive tags."
I'm not sure I understand this statement. To begin with, no XML constraint mechanism *requires* that you make order significant: with XSD (and RELAX) you can explicitly say that order is not important, using either an "OR" group or what SGML called an "AND" group.
Sometimes order isn't important and sometimes it is. For pure data applications where you're just capturing name/value pairs, order usually doesn't matter, but in documentation applications order often is very important.
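The difference between an ordered sequence and an SGML-style AND group (RELAX NG's interleave, or XSD's more restricted xs:all) can be illustrated by checking a list of child element names both ways. This is a minimal Python sketch, assuming each member of the group must occur exactly once; the element names are hypothetical, and real interleave patterns are considerably more general than this.

```python
from collections import Counter

def matches_sequence(children, model):
    """Ordered model: children must appear exactly in the given order."""
    return list(children) == list(model)

def matches_and_group(children, model):
    """SGML '&' group: each member exactly once, in any order."""
    return Counter(children) == Counter(model)

MODEL = ["name", "address", "phone"]  # hypothetical record content
data_like = ["phone", "name", "address"]
print(matches_sequence(data_like, MODEL))   # False
print(matches_and_group(data_like, MODEL))  # True
```

For name/value data the AND-group check is usually the appropriate one; for documentation content such as a title followed by paragraphs, the sequence check usually is.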
This is probably a stupid question (completely unrelated to Thai wordbreaking ;-), but I've never minded asking them before, so...
Given the limitations of Relax NG, and that it is now the normative DocBook schema, would you recommend that current users or new adopters of DocBook stick with the 4.x.x releases or use the version 5 XSD?
Thanks.
In response to Bill Burns' question:
I'm not sure there's a general answer to the question. Just yesterday I went through the exercise of using the V5.0RC1 XSD schemas as a base for a new DocBook-based document type. I found some problems with the schemas (it looks like the generation process didn't correctly reflect the RNG constraints in all cases--for example, the XSD allows titles to be omitted where the docs and RNG clearly require them).
However, version 5 has some important improvements, including the definition of a DocBook namespace (finally), so it would seem to be useful to start with version 5.
The question of which schema form you use is really a question of what your tools and local practice are, more than of which form is better.
One thing I noticed that I may not have fully appreciated before is the ability in Relax to define exclusion rules (roughly equivalent to SGML's exclusion mechanism). This seems worth at least documenting clearly, especially for a very general schema like DocBook, where it's not practical to define the specialized element types or local-context versions of element types that would enforce these constraints. But it's definitely something you can't say directly in XSD. You can get the same effect with local declarations of the same element type name that have more constrained content models, but it's hard to see that it's worth the trouble when the constraints are trivially easy to validate with a few select statements in XSLT or XQuery.
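An SGML-style exclusion says that an element may not appear anywhere inside another element, at any depth. As noted, this is easy to check after parsing; in XSLT or XQuery the select would be something like //footnote[.//footnote]. The following is a minimal Python equivalent using ElementTree, with a hypothetical rule that a footnote may not contain another footnote (namespaces omitted for brevity).

```python
import xml.etree.ElementTree as ET

def exclusion_violations(root, container, excluded):
    """Return the `container` elements that have an `excluded` descendant at any depth."""
    bad = []
    for elem in root.iter(container):
        # elem.iter() includes elem itself, so skip it when container == excluded.
        if any(d is not elem for d in elem.iter(excluded)):
            bad.append(elem)
    return bad

DOC = """<chapter>
  <para>Text<footnote><para>ok</para></footnote></para>
  <para>More<footnote id="f2"><para>bad<footnote/></para></footnote></para>
</chapter>"""

bad = exclusion_violations(ET.fromstring(DOC), "footnote", "footnote")
print([e.get("id") for e in bad])  # ['f2']
```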
For myself, I stand by my policy of preferring XSD schemas as a matter of practice, for all the reasons I gave, not least of which is its better way of associating rules with namespaces.
That is, unless my clients require that I provide Relax schemas, I will recommend that they use XSD schemas and provide those by default.
I'm not sure I've usefully answered your question, but for me the answer is: use Version 5 and whichever schema form you want, but be advised that the currently provided XSD schemas have some bugs (they also lack the parameterization that the RNG schemas have, which was annoying). I've reported these issues to Norm.
Your response was quite helpful, thanks.
You can try converting your XSD schema to RELAX NG here:
http://debeissat.nicolas.free.fr/XSDtoRNG.php