Dr. Macro's XML Rants

NOTE TO TOOL OWNERS: In this blog I will occasionally make statements about products that you will take exception to. My intent is always to be factual and accurate. If I have made a statement that you consider incorrect or inaccurate, please bring it to my attention and, once I have verified my error, I will post the appropriate correction.

And before you get too exercised, please read the post dated 9 Feb 2006, titled "All Tools Suck".

Friday, July 20, 2007

InDesign CS3 and XML Authoring: Could be Good

In my new job at Really Strategies I have started digging pretty deeply into how to get XML into and out of Adobe InDesign CS3. This has turned out to be pretty interesting.

In InDesign CS2 the XML support was somewhat weak. You could import an XML document into InDesign and then associate styling with it, but the mechanism was very simplistic: you had no direct way to do context-based associations (say, styling a title differently depending on what it's inside of) and no easy way to script it, either on import or inside the editor.

In CS3 that has largely changed. CS3 adds several new XML support features that appear to make InDesign quite a powerful XML rendering tool, one that could be integrated loosely or tightly with any other XML authoring tool to create an interesting environment. (You could, in theory, use InDesign to author the XML as well, but it wasn't really designed for that, and I don't think it's a good use of resources to try to make it an XML editor, not when the process I outline here is so easy to implement.)

Here's the general mechanism I'm working toward:

1. Using InDesign, you create a template document that will accept your XML. This requires setting up all the usual styling stuff (page masters, frames, named styles) as well as creating instances of the markup structures that will populate different text frames (InDesign's XML import works by matching imported elements to existing elements and replacing the existing ones with matching structures on import, more or less).

2. You create an XSLT that takes your XML source and "augments" it with Adobe-specific attributes that specify the per-element-instance mapping to InDesign paragraph and character styles, as well as generated elements for any generated text that needs to be rendered as a separate paragraph (analogous to the gentext pseudo-elements Arbortext Editor uses to manage generated text display).

This XSLT can be pretty simple--it's just an identity transform with a little bit of per-element-type logic to define the mapping (and it could be further parameterized through some sort of more direct mapping specification, although I'm not sure it's worth the effort). This script could also reorder things as needed, generate TOCs, and so on, but the minimum required is pretty small. There are a few more things you need to handle, but they can be generalized easily enough.
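To show the shape of that augmentation, here is a minimal sketch in Python's standard xml.etree.ElementTree rather than XSLT, purely for illustration. The element names (title, p, b, i) and the style names (Heading1, Body, Bold, Italic) are hypothetical placeholders. The aid namespace URI and the aid:pstyle/aid:cstyle attribute names are, as far as I know, what InDesign's tagged XML uses, but check them against the XML documentation for your InDesign version. Generated text and reordering are left out.

```python
import copy
import xml.etree.ElementTree as ET

# Namespace InDesign uses for its tagged-XML style attributes
# (aid:pstyle, aid:cstyle); verify against your version's documentation.
AID_NS = "http://ns.adobe.com/AdobeInDesign/4.0/"
ET.register_namespace("aid", AID_NS)

# Hypothetical per-element-type mappings. The style names must match
# paragraph and character styles defined in the InDesign template.
PARA_STYLES = {"title": "Heading1", "p": "Body"}
CHAR_STYLES = {"b": "Bold", "i": "Italic"}


def augment(source: ET.Element) -> ET.Element:
    """Return an augmented copy of source; the input tree is not modified."""
    # The deep copy plays the role of the XSLT identity template.
    result = copy.deepcopy(source)
    for elem in result.iter():
        if elem.tag in PARA_STYLES:
            elem.set(f"{{{AID_NS}}}pstyle", PARA_STYLES[elem.tag])
        elif elem.tag in CHAR_STYLES:
            elem.set(f"{{{AID_NS}}}cstyle", CHAR_STYLES[elem.tag])
    return result


if __name__ == "__main__":
    doc = ET.fromstring("<doc><title>Intro</title><p>Some <b>bold</b> text.</p></doc>")
    print(ET.tostring(augment(doc), encoding="unicode"))
```

The two dictionaries stand in for the "little bit of per-element-type logic". A more direct mapping specification would just mean loading them from a file instead of hard-coding them.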

The main gotcha here is that InDesign is sensitive to newlines in the XML data, because newlines trigger the application of paragraph styles. What I've found so far is that you have to manage the text content very carefully so that you only emit newlines at true paragraph boundaries. This also means that you only apply paragraph styles to the lowest-level elements that will become paragraphs in InDesign--you can't just blindly apply styles at higher levels in the XML hierarchy (InDesign is not XSL-FO).
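One way to enforce that rule is sketched here in Python with ElementTree. It assumes a hypothetical set of paragraph-level element types and works in three steps. First, it collapses all whitespace (newlines included) inside paragraphs to single spaces. Second, it drops the indentation whitespace between block-level elements. Third, it puts exactly one newline between block-level siblings. This is my own illustration of the principle, not something InDesign provides. Loose non-whitespace text sitting directly in container elements is left as-is.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical: the lowest-level element types that become InDesign paragraphs.
PARA_TYPES = {"title", "p"}


def _collapse(text):
    """Collapse every run of whitespace, newlines included, to one space."""
    return re.sub(r"\s+", " ", text) if text else text


def control_newlines(elem, in_para=False):
    """Rewrite whitespace in place so newlines occur only between paragraphs."""
    inside = in_para or elem.tag in PARA_TYPES
    children = list(elem)
    if inside:
        elem.text = _collapse(elem.text)
    elif elem.text is not None and not elem.text.strip():
        elem.text = None  # indentation between block elements is dropped
    for i, child in enumerate(children):
        control_newlines(child, inside)
        if inside:
            # Inline markup inside a paragraph: keep spaces, never newlines.
            child.tail = _collapse(child.tail)
        elif child.tail is None or not child.tail.strip():
            # Block level: one newline between siblings, none after the last.
            child.tail = "\n" if i < len(children) - 1 else None
    if elem.tag in PARA_TYPES and not in_para:
        # Trim the spaces left at the start and end of the paragraph.
        if elem.text:
            elem.text = elem.text.lstrip()
        if children and children[-1].tail:
            children[-1].tail = children[-1].tail.rstrip()
        elif not children and elem.text:
            elem.text = elem.text.rstrip()
    return elem
```

Because the newline goes on the tail of any block-level element that has a following sibling, nested containers (sections within sections) still end up with exactly one newline between consecutive paragraphs in document order.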

3. You run this transform outside of InDesign. InDesign lets you apply a transform as part of the import process, but we don't want to do that for reasons that will become clear in a moment (unless I've missed a feature of InDesign, which is quite possible--I'm still coming up to speed on its intricacies).

I use OxygenXML for most XML editing, and it provides a very convenient mechanism for applying a transform to a document and saving the result wherever you want. But any good XML editor should let you set up some sort of "run the transform" button or menu item. The key is that the result (the XML with the InDesign augmentations) is always written to the same location.
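Whatever tool runs the transform, the output has to land at the same path every time, because that is the file the InDesign link will point at. Below is a small Python sketch of that last step. It assumes a hypothetical build/indesign/augmented.xml location. It writes to a temporary file in the same directory and then swaps it into place with os.replace, so the linked file is never seen half-written. The atomic swap is my own suggestion, not a requirement I know InDesign imposes.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical fixed location that the InDesign link points at.
LINKED_FILE = Path("build") / "indesign" / "augmented.xml"


def publish(root, target=LINKED_FILE):
    """Serialize root to target, replacing any previous copy in one step."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory, so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            ET.ElementTree(root).write(f, encoding="utf-8", xml_declaration=True)
        os.replace(tmp, target)  # swap the finished file into place
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return target
```

Hooked up behind the editor's "run the transform" action, each run simply overwrites the linked file. That leaves InDesign's link pointing at the newest version.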

4. Import the augmented XML (not the XML you're authoring in your XML editor) into InDesign using InDesign's XML import, without applying an XSLT, being sure to check the "link to XML" check box and to select "merge" rather than "append". This is the key.

5. Go back to your XML editor, make changes to the XML and push the "transform" button again.

6. Switch to InDesign and bring up the Links palette. There you'll find your XML document listed. Select it and click the "update link" button. Magically, your XML changes are re-imported into InDesign and the styles applied.

Hey presto! Immediate, easy, convenient pagination of XML using InDesign. Something that was not immediate, easy, or convenient with CS2.

I haven't looked into it, but it should be possible to script the triggering of the link update as well, although that might require a little C code; I'm not sure. Either way, it's clear that with this approach you can use InDesign as a "page preview" mechanism from any XML editor with very little work.

Beyond the simple element-to-style mapping you can do on import, CS3 also provides scripting support for working with XML in the form of XPath-based functions that let you apply any script to elements in context. I haven't used this yet, but a brief look at the docs suggests that it's just the thing to take your XML to the next level.
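InDesign's own XPath-based scripting is the real mechanism here, and I'm not reproducing its API. To illustrate what context-based rules buy you, here is an analogous sketch using the limited XPath subset that Python's ElementTree supports. Each rule pairs a path with an action, and later, more specific rules override earlier, general ones. The element names, style names, and the set_pstyle helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

AID_NS = "http://ns.adobe.com/AdobeInDesign/4.0/"


def set_pstyle(name):
    """Build a hypothetical action that sets an aid:pstyle attribute."""
    def action(elem):
        elem.set(f"{{{AID_NS}}}pstyle", name)
    return action


# Hypothetical rules as (ElementTree path, action) pairs. Order matters:
# later rules overwrite earlier ones, so specific contexts come last.
RULES = [
    (".//title", set_pstyle("Heading1")),
    (".//section/section/title", set_pstyle("Heading2")),
    (".//sidebar/title", set_pstyle("SidebarHead")),
]


def apply_rules(root, rules=RULES):
    """Run each rule's action on every element its path matches."""
    for path, action in rules:
        for elem in root.iterfind(path):
            action(elem)
    return root
```

This is exactly the kind of context-based association (a title styled by what it sits inside) that CS2 gave you no direct way to express.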

It's still not going to give you what products like Typefi give you, which is complete, complex layout heuristics, but it should be sufficient for relatively simple layouts such as typical technical documentation. It occurred to me, for example, that it wouldn't be very hard to use this mechanism to build a process for creating nice books from DITA source with InDesign. Hmmm...

Note that you can download a one-month eval of InDesign from Adobe's Web site.