Subscribe to Dr. Macro's XML Rants

NOTE TO TOOL OWNERS: In this blog I will occasionally make statements about products that you will take exception to. My intent is to always be factual and accurate. If I have made a statement that you consider to be incorrect or innaccurate, please bring it to my attention and, once I have verified my error, I will post the appropriate correction.

And before you get too exercised, please read the post, date 9 Feb 2006, titled "All Tools Suck".

Friday, April 18, 2008

Choosing an XML Schema: DocBook or DITA?

Richard Hamilton has presented a thoughtful analysis of when to choose DocBook or DITA, published on the Content Wrangler blog here: choosing_an_xml_schema_docbook_or_dita/

I started to post the following as a comment to that post but it got long enough that I thought it better to post my full response here.

I generally agree with Richard's analysis as far as it goes, but I think it misses several important points that I assert tip the scales significantly in favor of DITA over DocBook.

If you are looking for a documentation schema that you can just pick up and use and you don't need the modularity features of DITA (that is, you don't need the functionality of DITA maps) then DocBook probably makes the most sense for the reasons Richard cites, namely that there are more element types of likely utility out of the box and the processing infrastructure is more mature and better documented.

However, if you know you need to add markup for your specific requirements or are developing a new XML application where things like markup tailored for local users or requirements is important or modularity is important, then DITA has a very clear advantage because it is so much easier to develop and extend custom document types from a DITA base than from a DocBook base.

The reason is very simple: DITA's specialization mechanism, coupled with the declaration set design patterns defined by the DITA architecture, make it as easy as it could possibly be to develop new markup structures. In particular, having defined specializations you may need to do nothing more in order to have documents that use those new types work with existing DITA processors, editors, CMS systems, etc.

DocBook cannot have this characteristic until such time as it either adopts the DITA specialization mechanism (which it could easily do--I worked hard to have the specialization aspects of DITA defined as distinct from the DITA element types specifically so that it could be adopted by other XML applications with a minimum of fuss) or adds the equivalent functionality using some other syntax [one limitation in the current DITA specialization mechanism is no good way to support namespaced elements--that will be fixed in DITA 2.0 but nobody has yet started to work in earnest on what that might be--this could be an opportunity for DocBook to take the lead since DocBook definitely has a namespace requirement.]

With any DocBook application, if you define new element types, there is no defined way to map those back to existing types and DocBook processors are not designed to handle new types by processing them in terms of some base type. That means that if you define new element types in a DocBook context you must update all processors that need to act with those documents even if all they need to do is nothing with those elements.

On the subject of narrative documents, there is essentially no practical difference between DITA and DocBook in their ability to support the creation of single-instance documents of arbitrary depth. This is obvious for DocBook (because that's what it was designed for), not so obvious for DITA (because it was designed for the opposite).

But with DITA all you need to do is configure your local doctypes ("shells" in DITA parlance) to allow topics to nest. For example, the simplest case is to simply allow generic topic to test. With that you can represent any possible narrative document structurally.

The only meaningful difference in this scenario between DITA and DocBook is that DITA requires the body of a section to be wrapped in a container (the topic body), while DocBook does not provide such a container (or at least it didn't last time I looked).

This is really a trivial difference.

For several clients who are doing publishing rather than technical documentation I have developed essentially trivial specializations that provide generic topics distinguished only by their topic type names but using otherwise generic DITA elements for content. I usually define a specialized topic called "subsection" that can nest to any depth. With that model you can represent documents as well as or better than you can with DocBook and you get all the other DITA goodness as well.

Finally, there is a free DITA-to-DocBook transform that is part of the free DITA Open Toolkit that allows you to use all the DocBook processing infrastructure with DITA-based content. This is used, for example, to use non-DITA-aware composition systems like XPP with DITA-based content.

Because DITA offers a number of very important features that DocBook does not, in particular specialization, modularity, and external links (relationship tables), and because DITA can be configured to work as well for non-modular documents as DocBook can, and because DITA lowers the cost of developing new element types as low as it could possibly be, I've come to the conclusion that DITA is the best answer for any XML-based document-centric application I've seen.

Just the fact you can get OxygenXML for almost nothing, define a completely new DITA specialization, deploy it to your local Toolkit as a plugin (a very easy operation once you know what to do, something I need to write a tutorial for), you can then edit documents using that specialization in a full-featured graphical, tags off editor with no additional work of any sort is pretty powerful. DocBook simply cannot enable that because it doesn't have DITA's specialization feature.

If DocBook adopted DITA's specialization mechanisms then this discussion wouldn't even be meaningful because DocBook would get all the value that specialization accrues to DITA and would still have the value of being a conceptually simpler model for documents.

Which raises the question: why doesn't DocBook simply adopt DITA's specialization mechanism? It would cost DocBook almost nothing to add and add tremendous value. It would not require DocBook changing anything about its current markup design, except to possibly back-form some base types that are currently not explicit in DocBook but would be useful as a specialization base. But that would only make DocBook cleaner.