Dr. Macro's XML Rants

NOTE TO TOOL OWNERS: In this blog I will occasionally make statements about products that you will take exception to. My intent is to always be factual and accurate. If I have made a statement that you consider to be incorrect or inaccurate, please bring it to my attention and, once I have verified my error, I will post the appropriate correction.

And before you get too exercised, please read the post, dated 9 Feb 2006, titled "All Tools Suck".

Saturday, May 16, 2009

Why DITA Requires Topic IDs (And Why Their Values Don't Matter)

The DITA standard requires all topics to have an id= attribute.

Why?

The reason is simple: so you can point to elements within topics. And for no other reason.

Surprised?

Most people seem to assume that topics are required to have IDs so you can point to the topics. And they further seem to assume that topic IDs need to be unique within some fairly wide scope (e.g., within their local topic repository).

But that's not the case at all.

For the case of topics that are the root elements of their containing XML documents and that contain no elements that themselves have IDs, the topic ID isn't needed at all. In this case the topic can be unambiguously addressed by the location of the containing XML document (e.g., "mytopic.xml").

In the case of topics that are not root elements and that are not themselves pointed to and that do not contain any elements with IDs, again the ID is not needed (because nothing points at the topic or its elements).

So why does the DITA standard require topics to have IDs?

It is because topics establish the addressing scope for their direct-descendant non-topic elements.

By the DITA spec, to point to an element that is not a topic you use a two-part pointer: {topicid}/{elementid}.

Without a topic ID it would be impossible to point to a non-topic element.
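To make the two-part pointer concrete, here is a minimal Python sketch of how a processor might resolve {topicid}/{elementid} within a single document. It is an illustration under stated assumptions, not how any particular DITA tool does it: the function names and the sample document are made up, and topics are recognized by a hypothetical list of tag names, where a real processor would check the DITA class attribute instead.

```python
import xml.etree.ElementTree as ET

# Assumption: topics are recognised by tag name. A real DITA processor
# would look at the class attribute rather than a hard-coded list.
TOPIC_TAGS = {"topic", "concept", "task", "reference"}


def find_in_topic(topic, element_id):
    """Depth-first search for element_id that stays inside one topic."""
    for child in topic:
        if child.tag in TOPIC_TAGS:
            continue  # a nested topic is its own addressing scope
        if child.get("id") == element_id:
            return child
        found = find_in_topic(child, element_id)
        if found is not None:
            return found
    return None


def resolve(root, fragment):
    """Resolve 'topicid' or 'topicid/elementid' within one parsed document."""
    topic_id, _, element_id = fragment.partition("/")
    topic = next(
        (el for el in root.iter()
         if el.tag in TOPIC_TAGS and el.get("id") == topic_id),
        None,
    )
    if topic is None or not element_id:
        return topic
    return find_in_topic(topic, element_id)


# Hypothetical sample: both topics contain an element with id="intro".
DOC = """
<topic id="topicid">
  <title>Parent</title>
  <body><p id="intro">Parent intro</p></body>
  <topic id="st1">
    <title>Child</title>
    <body><p id="intro">Child intro</p></body>
  </topic>
</topic>
"""

root = ET.fromstring(DOC)
print(resolve(root, "topicid/intro").text)  # Parent intro
print(resolve(root, "st1/intro").text)      # Child intro
```

Note that the element ID "intro" appears twice in the sample. The topic ID is what tells the two apart, which is exactly the job the topic ID exists to do.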

Requiring all topics to have some ID ensures that any non-topic elements with IDs are immediately addressable without the need to also add an ID to their containing topic.

In normal DITA practice, non-topic elements are given IDs only when they are intended either to be used by conref or to be the target of a cross reference. Both tend to be carefully considered decisions driven by editorial and business rules, not by arbitrary author decision. This means you tend to know, before you create a given element, that it is a candidate for conref or xref use, and so you know to give it an ID at the time you create it.

Requiring that topics always have IDs means that authors don't have to worry about adding IDs to topics just because they also happened to put an ID on an element. [In normal XML practice, elements are addressed directly by ID within their containing document, which means it is sufficient to simply put an ID on the element with no other dependencies. That is not the case in DITA, which defines its own unique syntax for non-topic element addressing.]

Because topic IDs are XML IDs (as opposed to non-topic-element IDs, which are just name tokens and have no special XML-defined rules), any validating XML editor will both require topics to have IDs and ensure that topic IDs are unique within the scope of their containing document.
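The difference between the two kinds of ID can be sketched in a few lines of Python. This is only an approximation of the check a validating editor makes: ElementTree does not validate against a DTD, so the sketch finds topics by a hypothetical list of tag names, and the function name and sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: topics are recognised by tag name rather than by DTD or
# class attribute, because ElementTree does not validate.
TOPIC_TAGS = {"topic", "concept", "task", "reference"}


def duplicate_topic_ids(root):
    """Return topic IDs that occur more than once in the document."""
    counts = Counter(
        el.get("id") for el in root.iter() if el.tag in TOPIC_TAGS
    )
    return sorted(
        tid for tid, n in counts.items() if tid is not None and n > 1
    )


# Hypothetical multi-topic document. The repeated p id="note" is not
# reported, because non-topic IDs are plain name tokens. The repeated
# topic id="topicid" is reported, because topic IDs are XML IDs.
DOC = """
<dita>
  <topic id="topicid"><title>One</title><body><p id="note">a</p></body></topic>
  <topic id="topicid"><title>Two</title><body><p id="note">b</p></body></topic>
  <topic id="t3"><title>Three</title><body><p id="note">c</p></body></topic>
</dita>
"""

print(duplicate_topic_ids(ET.fromstring(DOC)))  # ['topicid']
```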

If topic IDs were not required, DITA-aware editors would need special rules to require a topic ID whenever a non-topic element got an ID. Generic XML editors, in turn, would not enforce this important DITA rule (topics with elements with IDs must themselves have IDs).

So the DITA spec requires that all topics have IDs.

But the fact that topics must have IDs does not imply that topic IDs need to be either descriptive or unique within any scope wider than the XML documents that contain them.

In the case where every topic is the root of its own document, the topic ID can be the same for every topic. To make this point I make a standard practice of using the value "topicid" for the IDs of all my root topics. There is absolutely no need to generate unique topic IDs for document-root topics as a matter of standard practice.
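A small Python sketch shows why the shared value is harmless: the document location does the distinguishing, not the topic ID. The file names, the in-memory DOCS dictionary, and the resolve_topic function are all hypothetical, and the sketch handles only root topics.

```python
import xml.etree.ElementTree as ET

# Hypothetical file names and content, held in memory instead of on disk.
# Every root topic deliberately uses the same ID.
DOCS = {
    "install.dita": '<task id="topicid"><title>Install</title></task>',
    "overview.dita": '<concept id="topicid"><title>Overview</title></concept>',
}


def resolve_topic(href):
    """Resolve 'file#topicid', or a bare 'file', to that file's root topic."""
    filename, _, topic_id = href.partition("#")
    root = ET.fromstring(DOCS[filename])
    if not topic_id or root.get("id") == topic_id:
        return root
    return None  # this sketch does not look for nested topics


for href in ("install.dita#topicid", "overview.dita#topicid", "overview.dita"):
    print(href, "->", resolve_topic(href).findtext("title"))
```

The first two references carry the same fragment and still land on different topics, and the third needs no topic ID at all.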

The only other case is ditabase documents.

If you are using ditabase documents, stop.

Sorry.

There are some legitimate uses of ditabase documents, for example, as a first-pass target for data conversions and as a way to hold otherwise unrelated topics that need to be managed as a single unit of storage, such as topics that exist only to hold reusable elements.

[NOTE: Using ditabase simply to allow the mixing of different topic types in a single document during authoring is the wrong thing to do. You should have already created local shell DTDs and within those shells you can allow whatever topic type mixing is appropriate for your local environment. There is no need to use ditabase in that case and many reasons not to. See my many other posts about why you should always create local shell DTDs as the first step in setting up a production use of DITA.]

In a ditabase document, the topic IDs must be unique within the scope of the document, simply because XML rules demand it. But the IDs need not be unique beyond that scope and they need not be meaningful.

One of the implications of this is that if you always edit topics as individual documents and never have nested topics you never have to think about topic IDs. Your topic document template should already have an ID value and it can be something like "topicid" and there is no reason whatsoever for that ID to ever be changed.

In the case where you do edit topics with nested topics (for example, you're authoring more or less narrative documents, or you've designed some topic types that need nested topics to allow a bit of hierarchy where the nested topics would never be meaningful in isolation), you either have to configure your editor to assign IDs to the nested topics for you (if your document template doesn't already have the subtopics with IDs assigned) or you have to think about it. But even in that case, the IDs can be pretty generic, e.g. "st1", "st2", etc. The IDs in that case still don't need to be unique beyond the scope of the containing document.
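If your editor can't assign those IDs for you, the job is easy to script. The following Python sketch is one possible approach, not a feature of any particular editor: the function name, the "st" prefix default, and the tag-name list used to spot topics are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: topics are recognised by tag name rather than by the
# DITA class attribute.
TOPIC_TAGS = {"topic", "concept", "task", "reference"}


def assign_subtopic_ids(root, prefix="st"):
    """Give every topic that lacks an ID a generic one: st1, st2, ..."""
    # Only topic IDs must be unique within the document, so only
    # those are collected here.
    used = {
        el.get("id")
        for el in root.iter()
        if el.tag in TOPIC_TAGS and el.get("id")
    }
    n = 0
    for el in root.iter():
        if el.tag in TOPIC_TAGS and not el.get("id"):
            n += 1
            while f"{prefix}{n}" in used:
                n += 1  # skip values already taken in this document
            el.set("id", f"{prefix}{n}")
            used.add(f"{prefix}{n}")
    return root


# Hypothetical document: two nested topics have no ID yet.
DOC = """
<topic id="topicid"><title>Guide</title>
  <topic><title>A</title></topic>
  <topic id="st2"><title>B</title></topic>
  <topic><title>C</title></topic>
</topic>
"""

root = assign_subtopic_ids(ET.fromstring(DOC))
print([t.get("id") for t in root.iter("topic")])
# ['topicid', 'st1', 'st2', 'st3']
```

Existing IDs are left alone, and the generated ones only have to avoid collisions inside the one document.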
