Dr. Macro's XML Rants

NOTE TO TOOL OWNERS: In this blog I will occasionally make statements about products that you will take exception to. My intent is to always be factual and accurate. If I have made a statement that you consider to be incorrect or inaccurate, please bring it to my attention and, once I have verified my error, I will post the appropriate correction.

And before you get too exercised, please read the post dated 9 Feb 2006, titled "All Tools Suck".

Friday, July 20, 2007

InDesign CS3 and XML Authoring: Could be Good

In my new job at Really Strategies I have started digging pretty deeply into how to get XML into and out of Adobe InDesign CS3. This has turned out to be pretty interesting.

In InDesign CS2 the XML support was somewhat weak. While you could import an XML document into InDesign and then associate styling with it, it was very simplistic in that you had no direct way to do context-based associations and no easy way to script it, either on import or inside the editor.

In CS3 that has largely changed. CS3 adds several new XML support features that appear to make InDesign quite a powerful XML rendering tool, one that could be integrated loosely or tightly with any other XML authoring tool to create an interesting environment. (You could, in theory, use InDesign to author the XML as well, but it wasn't really designed for that and I don't think it's a good use of resources to try to make it an XML editor, not when the process I outline here is so easy to implement.)

Here's the general mechanism I'm working toward:

1. Using InDesign, you create a template document that will accept your XML. This requires setting up all the usual styling stuff (page masters, frames, named styles) as well as creating instances of the markup structures that will populate different text frames (InDesign's XML import works by matching imported elements to existing elements and replacing the existing ones with matching structures on import, more or less).

2. You create an XSLT that takes your XML source and "augments" it with Adobe-specific attributes that specify the per-element-instance mapping to InDesign paragraph and character styles, as well as generated elements for any generated text that needs to be rendered as a separate paragraph (analogous to the gentext pseudo-elements Arbortext Editor uses to manage generated text display).

This XSLT can be pretty simple--it's just an identity transform with a little bit of per-element-type logic to define the mapping (and it could be further parameterized through some sort of more direct mapping specification, although I'm not sure it's worth the effort). This script could also re-order things as needed, generate TOCs, etc. But the minimum required is pretty small. There're a few more things you need to handle, but they can be generalized easily enough.

The main gotcha here is that InDesign is sensitive to newlines in the XML data, because newlines trigger the application of paragraph styles. What I've found so far is that you have to manage the text content very carefully so that you only emit newlines at true paragraph boundaries. This also means that you only apply paragraph styles to the lowest-level elements that will become paragraphs in InDesign--you can't just blindly apply styles at higher levels in the XML hierarchy (InDesign is not XSL-FO).

3. You run this transform outside of InDesign. InDesign lets you apply a transform as part of the import process, but we don't want to do that for reasons that will become clear in a moment (unless I've missed a feature of InDesign, which is quite possible--I'm still coming up to speed on its intricacies).

I use OxygenXML for most XML editing and it provides a very convenient mechanism for applying a transform to a document and saving the result wherever you want. But any good XML editor should provide a way to do this so that you have some sort of "run the transform" button or menu item. The key is that the result (the XML with the InDesign augmentations) is always put to some consistent place.

4. Import the augmented XML (not the XML you're authoring in your XML editor) into InDesign using InDesign's XML import (without applying the XSLT) but being sure to check the "link to XML" check box and select "merge" not "append"--this is the key.

5. Go back to your XML editor, make changes to the XML and push the "transform" button again.

6. Switch to InDesign and bring up the Links palette. In that you'll find your XML document listed. Select it and click the "update link" button. Magically, your XML changes are re-imported into InDesign and the styles applied.

Hey presto! Immediate, easy, convenient pagination of XML using InDesign. Something that was not immediate, easy, or convenient with CS2.
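To make step 2 a little more concrete, here is a minimal sketch of the augmentation pass, written in Python rather than XSLT purely for illustration. It assumes the `aid:pstyle` and `aid:cstyle` attributes in the Adobe InDesign 4.0 namespace as the style-mapping attributes. The element names and style names in the two mapping tables are hypothetical, and putting the newline immediately after each paragraph-level element is one possible convention, not the only one.

```python
import copy
import re
import xml.etree.ElementTree as ET

# Namespace assumed for InDesign's style-mapping attributes (aid:pstyle, aid:cstyle).
AID_NS = "http://ns.adobe.com/AdobeInDesign/4.0/"
ET.register_namespace("aid", AID_NS)

# Hypothetical element-to-style mappings; use the style names from your own template.
PARA_STYLES = {"title": "Title", "p": "Body", "li": "List Item"}
CHAR_STYLES = {"b": "Bold", "i": "Italic"}


def _clean(text, inside_para):
    """Collapse whitespace runs; outside paragraphs, drop whitespace-only text."""
    if not text:
        return None
    squashed = re.sub(r"\s+", " ", text)
    return squashed if inside_para else (squashed.strip() or None)


def _walk(elem, in_para):
    # Only the outermost mapped element is treated as a paragraph.
    is_para = elem.tag in PARA_STYLES and not in_para
    if is_para:
        elem.set(f"{{{AID_NS}}}pstyle", PARA_STYLES[elem.tag])
    elif elem.tag in CHAR_STYLES:
        elem.set(f"{{{AID_NS}}}cstyle", CHAR_STYLES[elem.tag])
    inside = in_para or is_para
    elem.text = _clean(elem.text, inside)
    for child in elem:
        _walk(child, inside)
        child.tail = _clean(child.tail, inside)
        if not inside and child.tag in PARA_STYLES:
            # The only newlines emitted: one after each paragraph-level element.
            child.tail = (child.tail or "") + "\n"


def augment(root):
    """Return a styled, whitespace-normalized copy of the source tree."""
    out = copy.deepcopy(root)
    _walk(out, in_para=False)
    return out


if __name__ == "__main__":
    source = ET.fromstring(
        "<doc>\n  <title>Hello\n  world</title>\n"
        "  <p>Some <b>bold</b>\n text</p>\n</doc>"
    )
    print(ET.tostring(augment(source), encoding="unicode"))
```

Writing the result of `augment()` to the same file every time gives the "consistent place" that the linked import in step 4 relies on.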

I haven't looked into it but it should be possible to script the triggering of the link update as well, although that might require a little C code, I'm not sure. But it's clear that by this mechanism you can use InDesign as a "page preview" mechanism from any XML editor with very little work.

Beyond the simple element-to-style mapping you can do on import, CS3 also provides scripting support for working with XML in the form of XPath-based functions that allow you to easily apply any script to elements in context. I haven't used this yet but a brief look at the docs suggests that it's just the thing to take your XML to the next level.

It's still not going to give you what products like Typefi give you, which is complete complex layout heuristics, but it should be sufficient for relatively simple layouts such as typify technical documentation. It occurred to me, for example, that it wouldn't be very hard to create a process that would allow you to use InDesign to create nice books from DITA source using this mechanism. Hmmm

Note that you can download a one-month eval of InDesign from Adobe's Web site.

18 Comments:

Anonymous Anonymous said...

Eliot,

Have you found the mechanism that calls the various page masters that are available? Are these related to content - for example, a specific matched element in the XSL creates a new page using a different page master?

And that got me thinking to some features like "first page", "last page" in sequences of pages within a given flow. Is this kind of capability available?

Michael

1:54 PM  
Blogger Eliot Kimber said...

There's no direct mechanism that I've seen for associating content to page masters automatically.

Either you already have the pages set up and you just flow content into it or you'll need to use scripting to create new pages and populate them based on element types in context.

One could imagine using something like FO's conditional page sequence masters to declaratively define page sequences (in terms of InDesign master page names) and then apply them via scripts. That would be instead of just hard coding the logic in your script. You could establish a naming convention for master pages to indicate things like first, last, etc. (you can already get even/odd from the page properties).

You could also use script labels or extended labels to further annotate page masters and master spreads, if you needed more distinction than you can get from page master names or if you wanted to allow local names but have a separate classification mechanism for use with a more generic script.
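As a rough illustration of the declarative idea, the sketch below models FO-style conditions (page position and odd/even) as a rule table and picks a master page name for each page in a sequence. It is plain Python, not InDesign scripting; the `PageRule` and `pick_master` names and the master page names are hypothetical, and a real script would still have to create the pages and apply the chosen masters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRule:
    position: str  # "first", "last", "rest", or "any"
    parity: str    # "odd", "even", or "any"
    master: str    # name of the master page to apply (hypothetical names below)


def pick_master(rules, index, count, first_folio=1):
    """Return the master name for the page at 0-based `index` of `count` pages.

    Rules are tried in order and the first match wins, so a one-page
    sequence is treated as "first" rather than "last".
    """
    if index == 0:
        position = "first"
    elif index == count - 1:
        position = "last"
    else:
        position = "rest"
    parity = "odd" if (first_folio + index) % 2 else "even"
    for rule in rules:
        if rule.position in (position, "any") and rule.parity in (parity, "any"):
            return rule.master
    raise LookupError(f"no rule matches page {index + 1} of {count}")


# Example rule table using a naming convention for master pages.
CHAPTER_RULES = [
    PageRule("first", "any", "Chapter-First"),
    PageRule("last", "any", "Chapter-Last"),
    PageRule("rest", "even", "Body-Left"),
    PageRule("rest", "odd", "Body-Right"),
]

if __name__ == "__main__":
    for i in range(4):
        print(i + 1, pick_master(CHAPTER_RULES, i, 4))
```

Keeping the rules in a table rather than in the script's control flow is what lets the same generic script serve templates with different master page names.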

9:27 PM  
Blogger Silver Arrow said...

Hey
nice article :)
Do you know if there is a way to convert XSL-FO back into INX format, so that we could eventually rework the layout/content when the FO is coming from another source?
Thx a lot

8:36 AM  
Blogger Unknown said...

Eliot,

I'm very interested in this issue; I have actually tried to achieve this but found it difficult. It would be nice if someone could publish/develop a DITA2InDesign application. Have you had the time to take this issue further?

///bme

2:12 PM  
Blogger Eliot Kimber said...

I have not yet had a chance to do anything with DITA to InDesign, unfortunately. It's still high on my list of things to do if time permits but at the moment I can't predict when or if time will permit.

7:27 PM  
Anonymous Anonymous said...

Eliot,
I've come across your blog and have watched your introduction to DITA videos, which have helped me learn a lot. I work for a company that specializes in printing multi-national instruction manuals. Our clients and we use FrameMaker and InDesign for the bulk of this work. We are looking at getting into a DITA workflow with our clients. I've been tasked with getting a conventional InDesign file into DITA topics so it could then be handed off to the DITA CMS system we are contemplating. My first stab at this was to individually tag each item on the page for XML, but it's extremely laborious as it's a 48-page document in 3 languages.
Since this article on InDesign CS3 and XML was published last July, have you had any more time to look at what an InDesign to DITA conversion would take to accomplish?

12:44 PM  
Blogger Eliot Kimber said...

With regard to InDesign-to-DITA conversion: I think that's pretty much the same as converting any other non-XML format into DITA. What I do have some more practical experience with is automating the conversion of unstructured InDesign into structured XML. The key is having consistent paragraph and character style names and application. Given that, you can use the automatic "map styles to tags" feature to generate an XML version of the InDesign data that you can then attack with traditional XML tools (e.g., XSLT).

Trying to do the tagging directly in InDesign would be too hard, I think--InDesign simply wasn't designed for that sort of activity.

As for DITA-to-InDesign, I have gotten approval to start an open-source project for developing a DITA-to-InDesign plug-in for the Open Toolkit. I'll have a formal announcement as soon as I get the project set up, which should be in the next week or so.

1:45 PM  
Anonymous Anonymous said...

I'm just starting down this path, but do you suppose InDesign could be the ticket for importing an XML file off the web (in my case an XML file of used cars from our site's database) into a template that would be bound for the newspaper? Typically we would have to copy and paste content for each car, price, description etc. into each little box on the template.

Any ideas?

2:29 PM  
Blogger Unknown said...

Hi, I'm trying to locate an XSLT that will add carriage returns to exported FileMaker Pro data, so that I can apply Paragraph Styles in InDesign. I have no experience with XSLT, so having to write one myself is a bit daunting. Can you point me in the right direction?
Thanks, Nigel

4:30 PM  
Blogger Eliot Kimber said...

Would have to see what the Frame output looks like, but the basic approach is demonstrated in the (not at all complete) XSLTs that are in the DITA2InDesign source repository (dita2indesign.sourceforge.net).

Essentially it's an identity transform that processes all the PCDATA content to adjust the whitespace and newlines appropriately.

11:43 PM  
Anonymous Anonymous said...

Eliot,

Basically I just want to know whether it is possible for InDesign to distinguish between one record and the next in a data merge and apply a particular master to a flagged record. Is that at all possible?

Jon

7:06 PM  
Anonymous Anonymous said...

Hi all
Last year I was asking if some component was available to create Indesign documents. Unfortunately, none was available.
This is not the case anymore
Check this out

http://www.inxbuilder.com
http://philippegraca.wordpress.com/2008/05/23/quickly-create-adobe%c2%ae-indesign%c2%ae-documents-via-simple-xml-files-or-programmatically-in-net/

Regards
Philippe

11:31 AM  
Anonymous Anonymous said...

With regard to your step 1, where you say to create a document template: what do you do when you're in a publishing environment where there is little consistency in page dimensions and layout arrangement? I'm at an illustrated press; our books can often be quite complex from a design/layout standpoint.

4:12 AM  
Anonymous Anonymous said...

Thanks for a very helpful article. Would it be possible to post a sample of what the transformed XML should look like? I'm having problems getting repeating data to work properly - I either get all the formatting removed, and the contents of the whole element (i.e. contents of sub-elements), or nothing at all.

Many thanks,
Dave.

1:20 AM  
Blogger Eliot Kimber said...

Since I made this post I've come to the conclusion that trying to import XML directly into InDesign is counterproductive--it's easier to generate INCX (InCopy XML) or INX (InDesign interchange) directly and gives you more control.

One thing I've started doing (using a Java library I've been developing) is generating InDesign documents by taking template documents and adding to them components from other data sources, both XML and things grabbed from other InDesign docs.

If you have widely varying page layouts, you can try to define conventions for how things are named or labeled (using object styles and/or script labels) so that code that generates the content to go in the document has a consistent set of targets (frames, page masters, etc.).

Then you can create new templates that reflect new geometries that will still work as generation or import targets as long as the names of things are the same.

If there is more variation than that, then automation may be difficult.

Another approach is to do what we are now doing with our RSuite product, and dynamically transform XML into INCX as it is served to InDesign. This lets you treat XML documents as InCopy articles from InDesign's point of view, so you can place them in frames like you would any other article. The design task for the page layout is the same as it would otherwise be but getting the data in is a simple matter of placing the link (given an appropriate XML-to-INCX transform on the back end).

Not sure how much this answer helps....

12:17 AM  
Blogger Eliot Kimber said...

To David Brotherstone:

That problem (repeating elements) is one of the reasons I eventually abandoned the XML import route in preference to generating INCX directly--it's not that much harder than doing the preprocessing necessary to make XML import work at all and it lets you do whatever you need to do.

The next step up is generating stories and frames and pages, which is possible but a bit more than is reasonable in XSLT, which is why I wrote a Java library to manipulate INX documents. Unfortunately that library is not something I can currently give away but it doesn't do anything you couldn't do with XSLT given a bit of work and a close study of the INX docs.

I will say that InDesign is remarkably robust in terms of what you can leave out of INX files and still get a working doc, so that makes it a bit easier.

12:21 AM  
Anonymous james said...

My company recently began adding XML to our InDesign workflow. Here are two things we would like to do. Does anyone have any suggestions?

1. After an XML file is imported into InDesign, what are the steps necessary to import additional XML files into that same InDesign document, so that each file's content remains distinct, imports into its own text box, and can export back into its own XML file? (So far, when we try to import multiple XML files into InDesign, each new XML file nests/merges into the first file imported. Is there a way to keep them separate?)

2. Once the XML content is imported into InDesign what steps are needed to export only specific sections of content from that InDesign file? For example, tagged content that was pasted into an unlinked sidebar or the unlinked endnotes section of a book. How can you isolate specific sections of the tagged content from InDesign instead of exporting all of the tagged content in one data dump?

12:00 PM  
Blogger Unknown said...

You can learn how to do many XML techniques by reading my book, A Designer's Guide to Adobe InDesign and XML from Adobe Press or my XML video training tutorials at Lynda.com.

InDesign has pretty good XML features but it is true that XSL-FO is not among them. You can still get many things done by using XSLT and understanding how InDesign can clone data and structures.

Adobe could definitely improve InDesign's XML features, but they won't unless they think people want them. Send emails to Adobe or to your Adobe reps or directly to the InDesign team. The more people that do, the better chance we have to get improved features.

One note: The InDesign format is XML-based. The IDML format is directly XML. If you were really good at writing XSLT you could build whatever you want by generating the IDML format and then opening the file in InDesign. This is beyond my needs or capabilities. I can do almost everything I need to do with the existing features. Although I'd really love to have XSLT 2.0 support.

9:46 AM  
