Subscribe to Dr. Macro's XML Rants

NOTE TO TOOL OWNERS: In this blog I will occasionally make statements about products that you will take exception to. My intent is to always be factual and accurate. If I have made a statement that you consider to be incorrect or inaccurate, please bring it to my attention and, once I have verified my error, I will post the appropriate correction.

And before you get too exercised, please read the post, dated 9 Feb 2006, titled "All Tools Suck".

Friday, October 12, 2007

I'm Bein' Macified

Through a series of more or less accidents I came to have physical possession of Really Strategies' one and only MacBook, purchased in order to support testing and delivery of software to a Mac-based client (which, considering that most of our clients are publishers, should be most of them, but apparently hasn't been to date).

After some soul searching I have decided to make this Mac my primary development machine, giving up my oh-so-familiar Dell Windows-XP-based laptop.

We'll see how it goes. I must say that it's been quite an adjustment for me, somebody with nearly 20 years of Windows brain damage, to move to a Mac.

Of course it helps that most of the development tools I use are completely cross-platform: Eclipse, Java, OxygenXML, Syntext Serna. It also helps that OS X is a *nix-based system under the covers, so I can get a command line that is familiar, although the configuration details are not (I've been using Debian-based distributions for most of the time I've used Linux). And other key tools have solid Mac versions (e.g., all the Adobe products).

I will even be able to get an RSuite server running on this machine, using an unsupported OS X build of MarkLogic.

I'm even starting to get used to the bizarre control key mechanism, although it's still a struggle--it feels like trying to learn a new musical instrument that is just enough different from one you know to really hose you up.

I'm even writing this post using Safari, rather than Firefox, which I would normally use, but it's acting up this morning.

So wish me luck as I start on this new adventure in computing....


Monday, October 01, 2007

Automatic Handling of DITA Docs In XML Editors

I'm in demo prep heck at the moment, trying to get some real DITA functionality built on top of Really Strategies' RSuite CMS product. One of the key challenges here is integrating XML editors to handle this use case:

Initial state: You are presented with some valid, conforming DITA documents in some locally configured and/or specialized document type, organized by one or more maps. You (and your repository and supporting tools) have never seen this particular set of documents or their DTDs before.

Step 1. Import map and all dependencies (including its DTD) into the repository

Step 2. Within the repository, find a topic to edit and push the "Edit with {name of integrated editor}" button in the repository UI.

Step 3. Editor opens with document, with all DITA support features applied.

It is that step 3 that is currently causing me a bit of pain. And it shouldn't.

The reason it's causing me pain is that every graphical XML editor has been built on the presumption that document types are relatively static and that some XML specialist will develop lots of doctype-specific setup and then deploy that setup once, followed by a long time with no changes to that setup.

Thus, if you're presented with new documents in a heretofore unseen DTD, they're not going to work in the editor until you go through the setup and configuration process for the new document types. [And remember that DITA 1.1 requires at least six distinct shell types: map, concept, reference, task, glossentry, and dita, plus any additional specialized map or topic types you might have. That's a lot of DTD-specific configurations to set up. Even if most of that effort is just copy and paste, it's still tedious and prone to the usual errors of catalog misconfiguration, filename misspelling, and so on.]

However, DITA totally chucks the assumption of static, well-known doctypes out the window. DITA says "hey, every shell is different, specialize away, apply agile approaches to developing and refining your local DITA-based DTDs, combine topics from everywhere willy-nilly, go nuts, have fun".

To support this DITA does something very important: it enables reliable auto-recognition of DITA documents, regardless of the details of the local configuration or the use of specialization.

DITA must have this mechanism because the specialization feature allows generic DITA processing to be reliably applied to any conforming DITA document. Because it can be, it should be.

For the DITA Open Toolkit this means applying default processing (transforms, filtering, etc.).

For editors it means applying default editing style sheets, enabling DITA-specific user interface components (e.g., "Insert topicref"), etc., if no more specific configuration already exists for the document or its shell doctype.
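That fallback order can be sketched in a few lines of Python. Everything here is hypothetical illustration: the function name `choose_editor_config`, the configuration labels, and the idea of keying doctype-specific setups by public ID are assumptions of mine, not any real editor's API.

```python
# Hypothetical sketch: none of these names come from a real editor's API.
DEFAULT_DITA_CONFIG = "built-in DITA support"
GENERIC_XML_CONFIG = "generic XML editing"


def choose_editor_config(public_id, is_dita, specific_configs):
    """Pick a configuration for a document.

    A doctype-specific setup wins if one exists; otherwise any document
    recognized as DITA gets the default DITA support; anything else falls
    back to plain XML editing.
    """
    if public_id in specific_configs:
        return specific_configs[public_id]
    if is_dita:
        return DEFAULT_DITA_CONFIG
    return GENERIC_XML_CONFIG
```

The point of the ordering is that recognition only supplies a default: a site that has bothered to configure a particular shell still gets its own setup.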

And there's no reason for any DITA-aware editor not to, except that, without exception that I can find, they've all implemented their document-to-functionality mapping in a way that doesn't enable this sort of dynamic association. The closest I've found so far is Syntext's Serna editor: while it doesn't recognize specialized topics as DITA topics and automatically apply its (very nice) built-in DITA support, it does make applying that support manually a two-click process. So kudos to Syntext. But it should be a zero-click process.

For this automatic process to work, processors have to be able to examine any document they're presented with and reliably determine whether or not it is DITA-based. Note that the Open Toolkit presumes that what it's given is DITA-based because that's the only thing it is designed to process. But things like editors and CMSs are, for the most part, completely generic and designed to handle any XML at all. So they cannot presume (or at least they should not presume).

The recognition of DITA documents cannot be based on the use of any particular DTD's system or public ID, because they'll all be different. You can't look for a particular well-known element type because the element types could be completely different from anything previously seen (let's imagine a specialization where all the element type names are in Chinese--there's nothing that prevents it and if I were a native reader of Chinese and wanted to create tech docs I'd probably do just that).

That means you've got to go by something invariant that is reliably in every document. In XML that really means the use of a particular well-known namespace. However, DITA element types cannot be in namespaces because the current DITA class mechanism syntax cannot support namespace-qualified names. Knowing that about DITA you might think "well what to do then?"

However, just because elements can't be in a namespace, it doesn't mean attributes can't be. And that's the trick DITA uses in DITA 1.1 to enable autorecognition of DITA documents, regardless of any other aspects of the DTD (its public or system IDs, the element type names used, etc.).

This trick is the DITAArchVersion attribute. This attribute is in the namespace "http://dita.oasis-open.org/architecture/2005/". Any document that includes this namespace is almost certainly a DITA document, especially if the namespace qualifies an attribute named "DITAArchVersion" and the element on which that attribute occurs has a class= attribute conforming to the DITA class attribute syntax.
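Here is a minimal sketch of that check in Python, using only the standard library. The function name and the regular expression are my own approximations, not anything defined by the DITA spec. It also assumes the attributes are actually visible in the parsed instance: in DTD-based documents they are usually defaulted by the DTD, and ElementTree does not read external DTDs, so a real tool would need a parser that applies attribute defaults.

```python
import re
import xml.etree.ElementTree as ET

DITA_ARCH_NS = "http://dita.oasis-open.org/architecture/2005/"
# ElementTree spells namespaced attribute names as "{namespace}localname".
ARCH_ATTR = "{%s}DITAArchVersion" % DITA_ARCH_NS

# Rough approximation of the class syntax, e.g. "- topic/topic concept/concept ":
# a "-" or "+" prefix followed by one or more module/type tokens.
CLASS_RE = re.compile(r"^[-+]\s+(\S+/\S+\s+)*\S+/\S+\s*$")


def looks_like_dita_element(elem):
    """True if elem carries DITAArchVersion in the DITA namespace AND a
    class attribute shaped like a DITA class value."""
    if ARCH_ATTR not in elem.attrib:
        return False
    return bool(CLASS_RE.match(elem.get("class", "")))
```

Note that the element type name never enters into it, which is exactly what lets a specialized (or Chinese-named) element pass the test.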

This means that regardless of the actual DTD or schema a DITA document uses, it can be recognized as being a DITA document. That means that you can then reliably and usefully apply default DITA processing to the document without having specifically configured its particular DTD or schema as being a DITA schema.

That is, the behavior I expect from any editor that claims to be DITA-aware is that if I open any conforming DITA document, regardless of what declaration set it happens to use, I should get all the default DITA-specific stuff automatically.

While the most robust implementation of this behavior would make all the checks described above, it is probably sufficient to assume that if a document's root element has a DITAArchVersion attribute or if the root element is named "dita" and any of its children have a DITAArchVersion attribute, then the document is a DITA document.
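The "probably sufficient" version is short enough to sketch directly. Again this is Python with a function name of my own choosing, and it assumes the DITAArchVersion attribute is present in the parsed tree (either written explicitly or filled in by a parser that applies DTD defaults, which ElementTree does not do for external DTDs).

```python
import xml.etree.ElementTree as ET

ARCH_ATTR = "{http://dita.oasis-open.org/architecture/2005/}DITAArchVersion"


def is_dita_document(root):
    """Quick test: the root has DITAArchVersion, or the root is named
    "dita" and at least one of its children has DITAArchVersion."""
    if ARCH_ATTR in root.attrib:
        return True
    if root.tag == "dita":
        return any(ARCH_ATTR in child.attrib for child in root)
    return False
```

A caller would parse the document, pass the root element in, and apply the default DITA setup when the result is true.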

The DITA spec only really recognizes three possible configurations of elements in a conforming DITA document: root of base type "map", root of base type "topic", or root of type "dita" [the dita element is not specializable in DITA 1.1] where its direct child elements are of base type "topic"--anything else is not a conforming DITA document (although it may contain individually conforming topics or maps) and you have no obligation to apply DITA-specific features to it (although you could if you wanted to).
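Those three configurations can be told apart from the class attribute alone. The sketch below is Python with hypothetical function names; it assumes the class attributes are visible in the parsed tree and that the first module/type token after the "-" or "+" prefix names the base type, as in "- topic/topic concept/concept ". Treating an empty dita element as non-conforming is my assumption.

```python
import xml.etree.ElementTree as ET


def base_type(elem):
    """Return "map" or "topic" from the first class token, else None."""
    tokens = elem.get("class", "").split()
    if len(tokens) >= 2 and tokens[0] in ("-", "+"):
        if tokens[1] == "map/map":
            return "map"
        if tokens[1] == "topic/topic":
            return "topic"
    return None


def classify(root):
    """Return "map", "topic", or "dita" for the three conforming
    configurations, or None for anything else."""
    kind = base_type(root)
    if kind is not None:
        return kind
    if root.tag == "dita":
        children = list(root)
        # Assumption: a dita root needs at least one child, all topics.
        if children and all(base_type(c) == "topic" for c in children):
            return "dita"
    return None
```

Anything that comes back as None is a document you have no obligation to treat as DITA, even if pieces of it would pass on their own.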

That's by way of saying it's probably good enough to just look for the DITA namespace anywhere in the document and go by that, but it could lead to false positives in cases where the document is not strictly a conforming DITA document.
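To make the false-positive risk concrete, here is a sketch of the "namespace anywhere" scan in Python. The function name is hypothetical. One caveat: ElementTree does not expose namespace declarations that go unused, so this only notices element or attribute names that are actually in the DITA namespace.

```python
import xml.etree.ElementTree as ET

DITA_ARCH_NS = "http://dita.oasis-open.org/architecture/2005/"


def mentions_dita_namespace(root):
    """True if any element or attribute name anywhere in the tree is in
    the DITA architecture namespace."""
    marker = "{" + DITA_ARCH_NS + "}"
    for elem in root.iter():
        names = [elem.tag] + list(elem.attrib)
        if any(isinstance(n, str) and n.startswith(marker) for n in names):
            return True
    return False
```

A non-DITA wrapper document that merely embeds one conforming topic somewhere inside it would pass this scan, which is the kind of false positive in question.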

And it would be really cool if editors provided defined extension points by which this type of recognition could be added to doctypes as plug-ins to the editor.
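As a rough idea of what such an extension point might look like, here is a sketch in Python. It is entirely hypothetical: the `recognizer` decorator, the `RECOGNIZERS` list, and `identify` are names I made up for illustration, and no existing editor exposes this interface as far as I know.

```python
import xml.etree.ElementTree as ET

ARCH_ATTR = "{http://dita.oasis-open.org/architecture/2005/}DITAArchVersion"

# Hypothetical plug-in registry: (doctype family name, test function) pairs.
RECOGNIZERS = []


def recognizer(family):
    """Decorator that registers a function as a recognizer for a family."""
    def register(func):
        RECOGNIZERS.append((family, func))
        return func
    return register


@recognizer("dita")
def recognize_dita(root):
    # The quick check: DITAArchVersion on the root, or on a child of <dita>.
    if ARCH_ATTR in root.attrib:
        return True
    return root.tag == "dita" and any(ARCH_ATTR in c.attrib for c in root)


def identify(root):
    """Return the first registered family that claims the document."""
    for family, func in RECOGNIZERS:
        if func(root):
            return family
    return None
```

With something like this, a plug-in for any other self-identifying vocabulary would just register one more function, and the editor core would never need to know about particular DTDs.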
