Dr. Macro's XML Rants: Choosing an XML Schema: DocBook or DITA?

Friday, April 18, 2008

Choosing an XML Schema: DocBook or DITA?

Richard Hamilton has presented a thoughtful analysis of when to choose DocBook or DITA, published on the Content Wrangler blog here: http://www.thecontentwrangler.com/article_comments/choosing_an_xml_schema_docbook_or_dita/

I started to post the following as a comment on that article, but it got long enough that I thought it better to post my full response here.

I generally agree with Richard's analysis as far as it goes, but I think it misses several important points that, I would argue, tip the scales significantly in favor of DITA over DocBook.

If you are looking for a documentation schema that you can just pick up and use, and you don't need the modularity features of DITA (that is, you don't need the functionality of DITA maps), then DocBook probably makes the most sense, for the reasons Richard cites: there are more element types of likely utility out of the box, and the processing infrastructure is more mature and better documented.
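To make "the functionality of DITA maps" a little more concrete: a map is a separate document whose topicref elements point at topic files, and the nesting of those topicrefs defines the publication's hierarchy. Here is a minimal Python sketch, assuming a hypothetical map with made-up file names, that walks the topicrefs depth-first and reports each href with its depth. It matches on the literal tag name "topicref" and ignores keys, conrefs, relationship tables, and everything else a real map processor such as the DITA Open Toolkit handles.

```python
import xml.etree.ElementTree as ET

# A hypothetical DITA map; the title and file names are invented.
MAP_XML = """<map>
  <title>Widget Guide</title>
  <topicref href="intro.dita">
    <topicref href="install.dita"/>
    <topicref href="configure.dita"/>
  </topicref>
  <topicref href="reference.dita"/>
</map>"""


def reading_order(parent, depth=0):
    """Yield (href, depth) for each topicref under parent, depth-first."""
    for child in parent:
        if child.tag == "topicref":  # literal tag match; real processors use @class
            yield child.get("href"), depth
            yield from reading_order(child, depth + 1)


if __name__ == "__main__":
    root = ET.fromstring(MAP_XML)
    for href, depth in reading_order(root):
        print("  " * depth + href)
```

The same topic files could be assembled differently by another map, which is the modularity the post is talking about.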

However, if you know you need to add markup for your specific requirements, or you are developing a new XML application where markup tailored to local users or requirements is important, or where modularity is important, then DITA has a very clear advantage, because it is so much easier to develop and extend custom document types from a DITA base than from a DocBook base.

The reason is very simple: DITA's specialization mechanism, coupled with the declaration-set design patterns defined by the DITA architecture, makes it as easy as it could possibly be to develop new markup structures. In particular, once you have defined your specializations, you may need to do nothing more for documents that use those new types to work with existing DITA processors, editors, CMS systems, and so on.
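Here is a rough illustration of why that works. Every DITA element carries a class attribute listing its ancestry, from the base type to the most specialized type. A processor that dispatches on those class tokens, most specialized first, automatically handles a new type via its nearest known ancestor. The sketch below is plain Python, not Toolkit code; the "checklist" module and its element types are hypothetical, and the class values are written inline because ElementTree does not apply the DTD defaults that normally supply them.

```python
import xml.etree.ElementTree as ET

# Hypothetical specialization: checklist/checklist is a kind of topic/ul and
# checklist/checkitem is a kind of topic/li.
DOC = """<checklist class="- topic/ul checklist/checklist ">
  <checkitem class="- topic/li checklist/checkitem ">Unpack the box.</checkitem>
  <checkitem class="- topic/li checklist/checkitem ">Plug it in.</checkitem>
</checklist>"""


def render_ul(elem, render):
    return "<ul>" + "".join(render(child) for child in elem) + "</ul>"


def render_li(elem, render):
    return "<li>" + (elem.text or "").strip() + "</li>"


# The processor only knows the base types; it has never heard of checklist.
HANDLERS = {"topic/ul": render_ul, "topic/li": render_li}


def class_tokens(elem):
    """Return the module/type tokens from @class, most specialized first."""
    tokens = elem.get("class", "").split()
    # Skip the leading "-" (structural) or "+" (domain) marker.
    return [t for t in tokens if "/" in t][::-1]


def render(elem):
    for token in class_tokens(elem):
        handler = HANDLERS.get(token)
        if handler is not None:
            return handler(elem, render)
    return ""  # no known ancestor at all


if __name__ == "__main__":
    print(render(ET.fromstring(DOC)))
```

Registering a handler under "checklist/checkitem" would override the fallback for just that type; nothing else in the processor has to change.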

DocBook cannot have this characteristic until it either adopts the DITA specialization mechanism (which it could easily do: I worked hard to have the specialization aspects of DITA defined separately from the DITA element types specifically so that they could be adopted by other XML applications with a minimum of fuss) or adds equivalent functionality using some other syntax. [One limitation of the current DITA specialization mechanism is that it has no good way to support namespaced elements. That will be fixed in DITA 2.0, but nobody has yet started to work in earnest on what that might look like. This could be an opportunity for DocBook to take the lead, since DocBook definitely has a namespace requirement.]

With any DocBook application, if you define new element types, there is no defined way to map them back to existing types, and DocBook processors are not designed to handle new types by processing them in terms of some base type. That means that if you define new element types in a DocBook context, you must update every processor that needs to handle those documents, even if those processors need to do nothing at all with the new elements.
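For contrast, here is a toy Python processor (not the DocBook XSL stylesheets) that dispatches on element names alone, as a processor must when nothing in the markup says what a new element is derived from. The "checklist" element is a hypothetical local extension; the processor cannot handle it until someone adds a handler for it by hand.

```python
import xml.etree.ElementTree as ET


def render(elem):
    handler = HANDLERS.get(elem.tag)
    if handler is None:
        # Nothing says what this element is derived from, so the processor
        # can only fail (or silently drop the element and its content).
        raise KeyError(f"no handler for <{elem.tag}>")
    return handler(elem)


# Handlers keyed on DocBook element names.
HANDLERS = {
    "itemizedlist": lambda e: "<ul>" + "".join(render(c) for c in e) + "</ul>",
    "listitem": lambda e: "<li>" + "".join(render(c) for c in e) + "</li>",
    "para": lambda e: (e.text or "").strip(),
}

if __name__ == "__main__":
    doc = ET.fromstring(
        "<checklist><listitem><para>Check the oil.</para></listitem></checklist>"
    )
    try:
        print(render(doc))
    except KeyError as err:
        print("failed:", err)
    # The manual fix, repeated in every processor that sees these documents.
    HANDLERS["checklist"] = HANDLERS["itemizedlist"]
    print(render(doc))
```

That last mapping line is the per-processor update the paragraph above describes, and every tool in the chain needs its own equivalent.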

On the subject of narrative documents, there is essentially no practical difference between DITA and DocBook in their ability to support the creation of single-instance documents of arbitrary depth. This is obvious for DocBook (because that's what it was designed for) but not so obvious for DITA (because it was designed for the opposite).

But with DITA, all you need to do is configure your local doctypes ("shells" in DITA parlance) to allow topics to nest. The simplest case is to allow the generic topic type to nest. With that you can represent any possible narrative document structurally.
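As a sketch of what "any possible narrative document" looks like once topics may nest: below is a single document built from generic topics nested three deep (the content is invented, and a local shell that allows topic inside topic is assumed), plus a small Python function that turns nesting depth into heading levels, which is essentially all a narrative renderer needs from the structure.

```python
import xml.etree.ElementTree as ET

# One document made of generic topics nested three deep. Content is invented,
# and a local shell that allows topic inside topic is assumed.
DOC = """<topic id="book">
  <title>Book</title>
  <body><p>Preface text.</p></body>
  <topic id="ch1">
    <title>Chapter 1</title>
    <body><p>Chapter text.</p></body>
    <topic id="ch1-s1">
      <title>Section 1.1</title>
      <body><p>Section text.</p></body>
    </topic>
  </topic>
</topic>"""


def outline(topic, level=1):
    """Return (level, title) pairs; nesting depth becomes the heading level."""
    entries = [(level, topic.findtext("title"))]
    for child in topic.findall("topic"):
        entries.extend(outline(child, level + 1))
    return entries


if __name__ == "__main__":
    for level, title in outline(ET.fromstring(DOC)):
        print("#" * level, title)
```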

The only meaningful difference in this scenario between DITA and DocBook is that DITA requires the body of a section to be wrapped in a container (the topic body), while DocBook does not provide such a container (or at least it didn't last time I looked).
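To show how small that difference is, here is a minimal Python sketch that maps a DocBook section to a DITA topic. It assumes unnamespaced, DocBook 4-style markup and handles only title, para, and nested section; everything else is skipped. It is not the converter from any particular toolkit. The only real structural work is creating the body wrapper the first time block content appears.

```python
import xml.etree.ElementTree as ET


def section_to_topic(section, counter=None):
    """Convert an unnamespaced DocBook section into a DITA topic.

    Only title, para, and nested section are handled; anything else is skipped.
    """
    if counter is None:
        counter = [0]
    counter[0] += 1
    # DITA topics require an id; reuse the section's id or generate one.
    topic = ET.Element("topic", id=section.get("id") or f"t{counter[0]}")
    body = None
    for child in section:
        if child.tag == "title":
            ET.SubElement(topic, "title").text = child.text
        elif child.tag == "para":
            if body is None:
                # The one real difference: DITA wraps block content in <body>.
                body = ET.SubElement(topic, "body")
            ET.SubElement(body, "p").text = child.text
        elif child.tag == "section":
            # DocBook puts subsections after block content, so the nested
            # topic lands after <body>, as DITA expects.
            topic.append(section_to_topic(child, counter))
    return topic


if __name__ == "__main__":
    src = ET.fromstring(
        '<section id="s1"><title>Intro</title><para>One.</para>'
        "<section><title>Sub</title><para>Two.</para></section></section>"
    )
    print(ET.tostring(section_to_topic(src), encoding="unicode"))
```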

This is really a trivial difference.

For several clients who are doing publishing rather than technical documentation I have developed essentially trivial specializations that provide generic topics distinguished only by their topic type names but using otherwise generic DITA elements for content. I usually define a specialized topic called "subsection" that can nest to any depth. With that model you can represent documents as well as or better than you can with DocBook and you get all the other DITA goodness as well.
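Because a specialized topic declares its ancestry in its class attribute, generic code that asks "is this a topic?" keeps working for it. Here is a minimal Python sketch with a hypothetical "subsection" specialization along the lines described above; the is_a test and the heading-level function are illustrations, not Toolkit APIs, and the class values are written inline because ElementTree does not apply DTD defaults.

```python
import xml.etree.ElementTree as ET

# Hypothetical "subsection" specialization of the generic topic, nested twice.
DOC = """<topic class="- topic/topic " id="chap">
  <title class="- topic/title ">Chapter</title>
  <subsection class="- topic/topic subsection/subsection " id="s1">
    <title class="- topic/title ">First subsection</title>
    <subsection class="- topic/topic subsection/subsection " id="s1a">
      <title class="- topic/title ">Nested subsection</title>
    </subsection>
  </subsection>
</topic>"""


def is_a(elem, base):
    """True if elem is base or a specialization of it, according to @class."""
    return base in elem.get("class", "").split()


def heading_levels(elem, level=1):
    """Map the id of every topic-derived element to its nesting depth."""
    levels = {}
    if is_a(elem, "topic/topic"):
        levels[elem.get("id")] = level
        level += 1
    for child in elem:
        levels.update(heading_levels(child, level))
    return levels


if __name__ == "__main__":
    print(heading_levels(ET.fromstring(DOC)))
```

The function never mentions "subsection" by name, so adding further topic specializations would not require touching it.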

Finally, there is a free DITA-to-DocBook transform that is part of the free DITA Open Toolkit that allows you to use all the DocBook processing infrastructure with DITA-based content. This is used, for example, to use non-DITA-aware composition systems like XPP with DITA-based content.

Because DITA offers a number of very important features that DocBook does not, in particular specialization, modularity, and external links (relationship tables); because DITA can be configured to work as well for non-modular documents as DocBook can; and because DITA lowers the cost of developing new element types about as far as it could possibly go, I've come to the conclusion that DITA is the best answer for any XML-based, document-centric application I've seen.

The fact that you can get OxygenXML for almost nothing, define a completely new DITA specialization, deploy it to your local Toolkit as a plugin (a very easy operation once you know what to do, and something I need to write a tutorial for), and then edit documents using that specialization in a full-featured, graphical, tags-off editor with no additional work of any sort is pretty powerful. DocBook simply cannot enable that, because it doesn't have DITA's specialization feature.

If DocBook adopted DITA's specialization mechanism, this discussion wouldn't even be meaningful, because DocBook would get all the value that specialization brings to DITA and would still have the value of being a conceptually simpler model for documents.

Which raises the question: why doesn't DocBook simply adopt DITA's specialization mechanism? It would cost DocBook almost nothing to add and would add tremendous value. It would not require changing anything about DocBook's current markup design, except possibly to back-form some base types that are currently not explicit in DocBook but would be useful as specialization bases. And that would only make DocBook cleaner.

Labels: dita docbook contentwrangler

9 Comments:

Anonymous said...: Hi Eliot,

Great article! Your DITA/DocBook comparison is very relevant and clearly exposes some of the key differences between the two standards.

One thing I did not find in your article (and in other DITA articles in general): what about the real cost of implementing DITA?

OK, DITA is open source and XML editors are cheap. However, DITA is not an out-of-the-box solution: it requires quite a bit of work to get up and running, plus technical skills for in-house maintenance...

When does it make sense for a documentation team to move to single-source and DITA? Any thoughts on this would be much appreciated!

Fabrice; 12:17 PM
Anonymous said...: Eliot,

Thanks for reading and commenting on my article. You make some excellent points, especially with respect to specialization.

Have you seen Norm Walsh's article on implementing DITA features in DocBook? Here's a link:

http://norman.walsh.name/2005/10/21/dita

He implemented specialization in DocBook using annotations in the RelaxNG grammar (he also implemented DITA style cross-references and conrefs, but that's another discussion).

He then modified the stylesheets to process specializations the way DITA does. That is, fall back to the processing for the parent element if the specialized element doesn't have its own processing.

I'm curious as to whether you think Norm's approach provides the level of specialization you suggest DocBook adopt? The mechanisms, both in the schema and in the stylesheets, are different, but I think his approach gets you to the same place.

Regards,
Dick Hamilton
http://rlhamilton.net; 4:44 PM
seltzerwater464 said...: Hi Eliot,

http://www.miskatonic.org/library/facet-web-howto.html

This is what I mentioned last Friday as a way to categorize/taxonomize information. I hope that you will find it at least thought-provoking, as perhaps it could be used in the navigational aspects of information systems, since it has more flexibility.; 7:01 AM
seltzerwater464 said...: Hi Eliot,

Check this out - a DITA editing suite!:

http://dita.xml.org/news/inmedius%C2%AE-introduces-dita-storm%E2%84%A2-desktop

Thanks!

Scott; 1:03 PM
Anonymous said...: Hello, Eliot Kimber; I'm a very minor blast from your past. I apologize in advance if mentioning "Passage Systems" rings any alarm bells? :-]

This is Daniel Green, part of the 'cordwood' pitched over the side when Passage had one of its employee bloodletting sessions (the second..?), in May 1996. So, long story short: I'm now the technical communicator in the bicycle division of FOX Racing Shox in Watsonville, CA.

We use MadCap Software's Flare online authoring tool here, and just upgraded to v5, which offers a certain degree of integration with DITA. So far DITA appears to be more transparent than anything else, but if you have *any* free minutes, pointing me to some good documentation about DITA would be greatly appreciated!
Working with SGML back then already makes working with XML very easy. I understand the basics of DITA: it evolved at IBM to basically replace IBMIDDOC, which I also worked with some during two contract stints at Cottle Rd and Santa Teresa Lab in San Jose. I guess I'm simply looking to be best educated as to the ways DITA takes it to another (and better) level, is all.
Anyway, sorry for the blather; you might still be wondering who the bloody hell I am, anyway. Thanks hugely for your time! Love the beard; I have one also. Cheers,

sincerely,

Daniel Green

Technical Communications
Fox Racing Shox - Bicycle Division
130 Hangar Way
Watsonville, CA 95076
(800) 369-7469 x6543

http://www.foxracingshox.com
http://service.foxracingshox.com; 11:33 AM
Eliot Kimber said...: dita.xml.org will get you to a lot of resources.

You might check out Julio Vazquez's book: http://www.lulu.com/content/paperback-book/practical-dita/5418702

Cheers,

E.; 3:55 PM
Lance said...: Eliot, when is your book coming out? I could use it *right now*.; 8:47 AM
Anonymous said...: Hi Eliot,

Any chance you have a more recent comparison of DocBook vs DITA?

Debbie; 7:58 AM
Eliot Kimber said...: I have not written an updated comparison, largely because the question never really comes up any more.

If I were to update this post for today (5 years later), I would say that DITA has progressed quite a bit, both in terms of available vocabulary and in terms of software availability. Not surprisingly, the DocBook world is pretty static, which reflects both its maturity and the fact that it has essentially reached the limits of its utility as currently architected. As I said in the post, if DocBook provided something equivalent to the DITA specialization feature, that might change things, but at this point there's really no reason to have two vocabularies for this type of content, and all the energy I see is going into DITA. DITA 1.3 will add a bunch more useful vocabulary and some important new architectural features, including scoped keys and cross-deliverable linking.

I have seen significant use of the DITA for Publishers vocabulary and Toolkit plugins in the publishing industry, and I see at least exploration of DITA in other verticals as well, including standards, internal business documents, and even STEM journal publishing (at least on the authoring side, if not on the delivery and interchange side).

I can't think of any reason to start with DocBook today if you were starting from scratch with an XML solution.

DITA implementation still has its challenges, as does the implementation of any new technology, but the total cost of implementation and ownership is still much lower than it is for any other XML application.

Note that DocBook is still cost-effective for *startup*, because of course you can pick up the whole technology chain and use it (just as you can with DITA), but the medium- and long-term costs are higher, and the long-term risk is much higher, because the technology is aging and support is essentially at maintenance levels. (I'm sure Norm and the other key members of the DocBook community will continue to support it as they can over the coming years, but it is unlikely to progress much beyond its current state.)

If you have requirements for modularity, granular reuse, and so on, then DITA is clearly a better fit than DocBook, for the simple reason that DITA was designed specifically to satisfy those requirements and DocBook was not.

Of late, the one place where I have found DocBook to have superior vocabulary is bibliography markup (which I discovered in trying to tag up a bibliography for my own book). I tried to adapt the DocBook markup design to DITA as a new vocabulary module, but I found the declarations so intertwingled that it was impossible to easily cut out just the element type declarations for the bibliography elements and adapt them, and the volume of distinct element types was so large that I gave up for lack of time and urgency.; 1:36 PM
