Dr. Macro's XML Rants

NOTE TO TOOL OWNERS: In this blog I will occasionally make statements about products that you will take exception to. My intent is to always be factual and accurate. If I have made a statement that you consider to be incorrect or inaccurate, please bring it to my attention and, once I have verified my error, I will post the appropriate correction.

And before you get too exercised, please read the post, dated 9 Feb 2006, titled "All Tools Suck".

Monday, October 01, 2007

Automatic Handling of DITA Docs In XML Editors

I'm in demo prep heck at the moment, trying to get some real DITA functionality built on top of Really Strategies' RSuite CMS product. One of the key challenges here is integrating XML editors to handle this use case:

Initial state: You are presented with some valid, conforming DITA documents in some locally configured and/or specialized document type, organized by one or more maps. You (and your repository and supporting tools) have never seen this particular set of documents or their DTDs before.

Step 1. Import the map and all its dependencies (including its DTD) into the repository.

Step 2. Within the repository, find a topic to edit and push the "Edit with {name of integrated editor}" button in the repository UI.

Step 3. Editor opens with document, with all DITA support features applied.

It is that step 3 that is currently causing me a bit of pain. And it shouldn't.

It's causing me pain because every graphical XML editor has been built on the presumption that document types are relatively static and that some XML specialist will develop lots of doctype-specific setup and then deploy that setup once, followed by a long time with no changes to that setup.

Thus, if you're presented with new documents in a heretofore unseen DTD, they're not going to work in the editor until you go through the setup and configuration process for the new document types. [And remember that DITA 1.1 requires at least six distinct shell types: map, concept, reference, task, glossentry, and dita, plus any additional specialized map or topic types you might have. That's a lot of DTD-specific configurations to set up. Even if most of that effort is just copy and paste, it's still tedious and prone to the usual errors of catalog misconfiguration, filename misspelling, and so on.]

However, DITA totally chucks the assumption of static, well-known doctypes out the window. DITA says "hey, every shell is different, specialize away, apply agile approaches to developing and refining your local DITA-based DTDs, combine topics from everywhere willy-nilly, go nuts, have fun".

To support this, DITA does something very important: it enables reliable auto-recognition of DITA documents, regardless of the details of the local configuration or the use of specialization.

DITA must have this mechanism because the specialization feature allows generic DITA processing to be reliably applied to any conforming DITA document. Because it can be, it should be.

For the DITA Open Toolkit this means applying default processing (transforms, filtering, etc.).

For editors it means applying default editing style sheets, enabling DITA-specific user interface components (e.g., "Insert topicref"), etc., if no more specific configuration already exists for the document or its shell doctype.

And there's no reason for any DITA-aware editor not to, except that, without exception that I can find, they've all implemented their document-to-functionality mapping in a way that doesn't enable this sort of dynamic association. The closest I've found so far is Syntext's Serna editor: while it doesn't recognize specialized topics as DITA topics and automatically apply its (very nice) built-in DITA support, it does make it a two-click process to apply that support manually. So kudos to Syntext. But it should be a zero-click process.

For this automatic process to work, processors have to be able to examine any document they're presented with and reliably determine whether or not the document is DITA-based. Note that the Open Toolkit presumes that what it's given is DITA-based because that's the only thing it is designed to process. But things like editors and CMS systems are, for the most part, completely generic and designed to handle any XML at all. So they cannot presume (or at least they should not presume).

The recognition of DITA documents cannot be based on the use of any particular DTD's system or public ID, because they'll all be different. You can't look for a particular well-known element type because the element types could be completely different from anything previously seen (let's imagine a specialization where all the element type names are in Chinese--there's nothing that prevents it and if I was a native reader of Chinese and wanted to create tech docs I'd probably do just that).

That means you've got to go by something invariant that is reliably in every document. In XML that really means the use of a particular well-known namespace. However, DITA element types cannot be in namespaces because the current DITA class mechanism syntax cannot support namespace-qualified names. Knowing that about DITA you might think "well what to do then?"

However, just because elements can't be in a namespace, it doesn't mean attributes can't be. And that's the trick DITA uses in DITA 1.1 to enable autorecognition of DITA documents, regardless of any other aspects of the DTD (its public or system IDs, the element type names used, etc.).

This trick is the DITAArchVersion attribute. This attribute is in the namespace "http://dita.oasis-open.org/architecture/2005/". Any document that includes this namespace is almost certainly a DITA document, especially if the namespace qualifies an attribute named "DITAArchVersion" and the element on which that attribute occurs has a class= attribute conforming to the DITA class attribute syntax.
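As a rough illustration of that element-level check, here is a minimal Python sketch using the standard library's ElementTree. The function name, the sample document, and the regular expression (a loose approximation of the class attribute syntax, not the spec's formal definition) are my own assumptions. It also assumes the attributes are actually visible in the parsed instance: in DTD-based documents they are normally defaulted from the DTD, so a real implementation would need a parser that reads the DTD and applies attribute defaults, which ElementTree does not do.

```python
import re
import xml.etree.ElementTree as ET

DITA_ARCH_NS = "http://dita.oasis-open.org/architecture/2005/"
# ElementTree exposes namespaced attributes as "{namespace-uri}localname".
ARCH_ATTR = "{%s}DITAArchVersion" % DITA_ARCH_NS

# Loose approximation (an assumption, not the normative grammar):
# "-" or "+", then one or more space-separated module/type tokens.
CLASS_RE = re.compile(r"^[-+]\s+\S+/\S+(\s+\S+/\S+)*\s*$")


def looks_like_dita_element(elem):
    """True if elem has the namespaced DITAArchVersion attribute
    and a class attribute that looks like a DITA class value."""
    has_arch = ARCH_ATTR in elem.attrib
    return has_arch and bool(CLASS_RE.match(elem.get("class", "")))


# Hypothetical sample with the normally defaulted attributes written out.
SAMPLE = """<concept xmlns:ditaarch="http://dita.oasis-open.org/architecture/2005/"
         ditaarch:DITAArchVersion="1.1"
         class="- topic/topic concept/concept " id="c1">
  <title class="- topic/title ">Example</title>
</concept>"""

root = ET.fromstring(SAMPLE)
print(looks_like_dita_element(root))
```

Because the lookup uses the namespace URI rather than the "ditaarch" prefix, a document that binds the same namespace to some other prefix is recognized just the same.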

This means that regardless of the actual DTD or schema a DITA document uses, it can be recognized as being a DITA document. That means that you can then reliably and usefully apply default DITA processing to the document without having specifically configured its particular DTD or schema as being a DITA schema.

That is, the behavior I expect from any editor that claims to be DITA-aware is that if I open any conforming DITA document, regardless of what declaration set it happens to use, I should get all the default DITA-specific stuff automatically.
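To make the expected fallback order concrete, here is a small Python sketch of how an editor might choose a configuration. Everything in it is hypothetical: the `choose_config` function, the registry keyed by public ID, and the configuration names are invented for illustration and do not describe any real editor's API. It also assumes the DITAArchVersion attribute is visible on the parsed root element (that is, the parser has applied any DTD defaults).

```python
import xml.etree.ElementTree as ET

DITA_ARCH_NS = "http://dita.oasis-open.org/architecture/2005/"
ARCH_ATTR = "{%s}DITAArchVersion" % DITA_ARCH_NS


def choose_config(public_id, root, registry):
    """Return the name of the editing configuration to apply.

    public_id: the document's public ID, or None (passed in by the
               caller, since ElementTree does not expose the doctype).
    root:      the parsed root element.
    registry:  hypothetical mapping of public IDs to doctype-specific
               configuration names.
    """
    # 1. A doctype-specific setup wins if somebody has created one.
    if public_id in registry:
        return registry[public_id]
    # 2. Otherwise any recognizable DITA document gets default DITA support.
    if ARCH_ATTR in root.attrib:
        return "default-dita"
    # 3. Everything else is treated as plain XML.
    return "generic-xml"
```

The point of the ordering is that nothing has to be configured for step 2 to kick in: a never-before-seen specialization falls through to the default DITA setup with zero clicks.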

While the most robust implementation of this behavior would make all the checks described above, it is probably sufficient to assume that if a document's root element has a DITAArchVersion attribute or if the root element is named "dita" and any of its children have a DITAArchVersion attribute, then the document is a DITA document.
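That simpler heuristic might look something like the following in Python. This is only a sketch: the function name is made up, and it again assumes the DITAArchVersion attribute is present in the parsed tree (written out in the instance, or supplied by a parser that applies DTD attribute defaults).

```python
import xml.etree.ElementTree as ET

DITA_ARCH_NS = "http://dita.oasis-open.org/architecture/2005/"
ARCH_ATTR = "{%s}DITAArchVersion" % DITA_ARCH_NS


def is_dita_document(root):
    """Heuristic: the root has DITAArchVersion, or the root is <dita>
    and at least one direct child has DITAArchVersion."""
    if ARCH_ATTR in root.attrib:
        return True
    if root.tag == "dita":
        return any(ARCH_ATTR in child.attrib for child in root)
    return False
```

Note that only the root and its direct children are inspected, so a DITA topic buried inside some non-DITA wrapper document does not trigger a match.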

The DITA spec only really recognizes three possible configurations of elements in a conforming DITA document: root of base type "map", root of base type "topic", or root of type "dita" [the dita element is not specializable in DITA 1.1] where its direct child elements are of base type "topic"--anything else is not a conforming DITA document (although it may contain individually conforming topics or maps) and you have no obligation to apply DITA-specific features to it (although you could if you wanted to).
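The three configurations can be told apart from the class attribute alone, whatever the element type names happen to be. A hedged Python sketch follows; the `classify` and `base_type` names are mine, the class values are assumed to be present in the parsed tree, and treating the first module/type token as the base type is my reading of the class syntax rather than a quote from the spec.

```python
import xml.etree.ElementTree as ET


def base_type(elem):
    """Return the first module/type token of the class attribute,
    e.g. "topic/topic" for class="- topic/topic concept/concept "."""
    tokens = elem.get("class", "").split()
    # tokens[0] is the leading "-" or "+"; tokens[1] is the base type.
    return tokens[1] if len(tokens) >= 2 else None


def classify(root):
    """Return "map", "topic", "dita", or None for anything else."""
    base = base_type(root)
    if base == "map/map":
        return "map"
    if base == "topic/topic":
        return "topic"
    children = list(root)
    # <dita> is not specializable in DITA 1.1, so it is matched by name.
    if root.tag == "dita" and children and all(
        base_type(child) == "topic/topic" for child in children
    ):
        return "dita"
    return None
```

So a root such as `<bookmap class="- map/map bookmap/bookmap ">` classifies as a map without the code knowing anything about bookmap.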

That's by way of saying it's probably good enough to just look for the DITA namespace anywhere in the document and go by that, but it could lead to false positives in cases where the document is not strictly a conforming DITA document.
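For comparison, the "namespace anywhere" check is about as cheap as it gets. The Python sketch below (function name assumed) uses ElementTree's `iterparse` with `start-ns` events to look at namespace declarations as they stream past. As before, it only sees declarations that appear literally in the instance, not ones defaulted from a DTD.

```python
import io
import xml.etree.ElementTree as ET

DITA_ARCH_NS = "http://dita.oasis-open.org/architecture/2005/"


def declares_dita_namespace(xml_bytes):
    """True if any element in the document declares the DITA
    architecture namespace, under any prefix."""
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    for _event, (_prefix, uri) in events:
        if uri == DITA_ARCH_NS:
            return True
    return False
```

This returns true for a non-DITA wrapper that merely contains a DITA topic somewhere inside it, which is exactly the false-positive case described above.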

And it would be really cool if editors provided defined extension points by which this type of recognition could be added to doctypes as plug-ins to the editor.

4 Comments:

Anonymous said...

You have to push on editor vendors. For example, XMLMind XML Editor recently added the ability to switch configurations based on an attribute value of the root element.

I needed it to differentiate between several DocBook5-based configurations.

Having full XPath support, and having it in all tools, would be even better indeed.

1:35 PM  
Anonymous said...

I suspect I know one reason why vendors are leery of DITAArchVersion: it involves namespaces. Many editors are pure-DTD things and just don't know how to deal with prefixed elements or attributes.

I predict a slew of editors which support DITAArchVersion only if it uses the "ditaarch" prefix, without checking the URL that the prefix maps to. That's better, but it just leaves room for a more subtle series of bugs.

I'm not trying to be an apologist; I want this kind of auto-detection too. But until vendors do some rather low-level refactoring of their editors, I doubt there will be much progress.

DITA (the spec) could help push things along by declaring the XSD versions of the schemas to be the normative ones.

5:23 PM  
Eliot Kimber said...

The problem with making the schemas normative is that they don't actually work.

The problem is that DITA depends on the redefine feature of XSD, which has two problems. First, as specified, it doesn't really do what we need; second, a number of XSD processors will never implement it. Doh!

So for the moment we are pretty well stuck with DTDs.

The only XML editor I can think of that is likely not to be usefully namespace aware is FrameMaker (at least Frame7, don't know about Frame8). But certainly XMetal and Arbortext and Serna and Oxygen are all perfectly namespace aware.

8:35 PM  
Don R. Day said...

You said, "And it would be really cool if editors provided defined extension points by which this type of recognition could be added to doctypes as plug-ins to the editor."

Yes, and moreover, if editors would automatically detect any already installed DITA Open Toolkit specialization plugins.

These plugins (for example, the demo-oriented music plugin) already come with DTDs, processing overrides, and oftentimes sample documents--everything needed to support the full end-to-end editing and integrated production support. Such support would really close the productivity loop between designing, testing, and then rolling out a new specialization with no extra effort to pick up at least fall-through support for the new vocabulary. During the lifetime of an editor, users will often pick up new distributions of DITA OT, or augment their installations with new specializations.

My challenge is, "Why not pick up that user-oriented content profile by default?" Then you don't have to reinstall the DTDs all over again within the editor, which I believe is part of the pain you are experiencing when supporting arbitrary new DITA specializations.

7:26 AM  
