Composition Call To Arms
For the last year or so I've been working on a new Innodata Isogen initiative called the Tools Agnostic Layout System (TALS), whose primary goal is to provide a generic page layout style mechanism that can then be used to generate renderers for a variety of back-end composition systems. We are not implementing composition functionality directly, rather we're adding another layer of abstraction above what you get with tools like XSL-FO or commercial XML-aware composition systems.
Our original target was reduction of the engineering cost of creating XSL-FO generation systems using XSLT, while also formalizing the practice and reducing the cost of capturing detailed formatting requirements. However, driven by our pilot customer, our focus changed from XSL-FO to automation, as much as possible, of the composition of high-end documents such as textbooks.
In order to do this I've been implementing a "renderer generator" that takes our style sheets (what we call "master format specifications") and generates from them XSLT programs that then generate files for a specific composition engine.
This engine, which I will not name, provides the powerful composition features required to compose documents like textbooks: vertical justification, sophisticated hyphenation control (which in turn requires sophisticated line-layout algorithms), and intelligent placement of floated objects.
This engine is also a very old piece of technology that reflects a time before object-oriented methods or syntaxes with more than 8-letter keywords. It is poorly documented. It is clearly a decades-old accretion of features with little overarching consistency of design or convention. It has many annoying bugs and implementation shortcomings that speak to weak overall engineering. Its control-language syntax is obtuse and impenetrable.
Nevertheless, it is one of only two or three products that can do what we need to do. It has a lot of market penetration and a large install base. So we are using it.
But as a side effect of this experience, I've come to realize that the world needs something that does not today exist (at least I haven't seen any hints of it): a modern, open-source, well-engineered, full-featured composition engine.
The business problem I'm seeing is that high-quality largely-automated composition is a functionality that many enterprises need, obviously mostly large publishers, but the cost of using the existing tools is prohibitive, both in terms of the raw license costs and in terms of the labor cost needed to use those tools.
High-quality composition seems to be one of the last major areas where there is no good open-source solution that can be integrated into the rest of the XML support infrastructure.
It's no surprise why this is: it's a very hard problem, there are existing tools, the primary user enterprises are, by their nature, not quick to seek new processes when old ones work well enough, and composition has never been a big enough fraction of the cost of producing books or magazines to make finding a less-expensive solution that compelling. But I think that is changing, in large part because of the disruptive effect that the Internet itself is having on the book publishing world.
I think that the time is right for the development of an object-oriented, general-purpose, open-source composition engine designed from the ground up to satisfy the typesetting requirements of the most demanding documents.
For the purposes of this call to arms I will focus on textbooks as the target application, in particular high school and college textbooks. That is because textbooks appear to present the most challenging requirements within the set of requirements that can be met (or mostly met) using automated composition from XML. Some very heavily designed documents, such as magazines or marketing material, are too idiosyncratic to be practically composed automatically--it requires too much hand work. But textbooks can be automated close to 100% in most cases (depending of course on design choices).
What I have is a pretty deep knowledge of these requirements and the issues inherent in automating them. For example, many textbooks use sidebars and similar marginal material. These present a serious automation challenge because they must be positioned relative to each other using various algorithms and rules of thumb that invariably involve some aesthetic choices ("space them equally on the page vertically unless they are too tight and then do blah blah blah").
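To make the flavor of such a rule concrete, here is a minimal sketch (all names and the specific fallback behavior are my own illustrations, not taken from any real engine) of the "space them equally on the page vertically unless they are too tight" heuristic: compute equal gaps around the sidebars, and if even that violates a minimum gap, signal that some other rule must take over.

```java
// Sketch of one common sidebar-placement rule: distribute N sidebars
// evenly down a page column, signalling a fallback when they don't fit.
// Class and method names are hypothetical.
public class SidebarSpacer {
    /**
     * Returns the top Y coordinate for each sidebar, or null if the equal
     * spacing would fall below minGap (i.e., "too tight" -- some other
     * rule, such as pushing a sidebar to the next page, must apply).
     */
    static double[] place(double pageHeight, double[] sidebarHeights, double minGap) {
        int n = sidebarHeights.length;
        double total = 0;
        for (double h : sidebarHeights) total += h;
        double free = pageHeight - total;
        double gap = free / (n + 1);   // equal gaps above, between, and below
        if (gap < minGap) return null; // too tight: defer to the fallback rule
        double[] tops = new double[n];
        double y = gap;
        for (int i = 0; i < n; i++) {
            tops[i] = y;
            y += sidebarHeights[i] + gap;
        }
        return tops;
    }
}
```

The interesting engineering problem is not this arithmetic but the fallback chain: each rule that fails must hand off to the next, and those hand-offs interact across pages.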
I also know that these problems are solvable to one degree or another because software exists in various forms that can do it.
I also know that both software engineering techniques and tools, as well as computers themselves, have improved dramatically in the 30 years since many of these tools or their underpinnings (usually TeX) were first developed. This means that many things that we would now identify as premature optimizations, but that at the time were simply the only way to make it work on affordable (or even any available) hardware, are no longer necessary, at least not in the first iteration. I know for example that the guys at RenderX built an FO-capable composition system from the ground up in a reasonable amount of time. They're smart guys but not super geniuses (at least not as far as I know), so if they can do it we should be able to too.
I think that, given a reasonable amount of design work, it should be possible to design a composition system architecture that will satisfy the composition requirements of textbooks. I think that, given the architecture, implementation should be fairly straightforward, although it will not always be easy because there will assuredly be lots of wrinkles that can't be anticipated in the design. But we know what the business problem is, we know what the result needs to be, and the basic techniques of automatic typesetting are well established and well documented, thanks to Dr. Knuth. In addition, over the years a number of libraries have become available that handle a lot of the details, such as working with fonts and font metrics, doing line layout, rendering vector graphics, and so on. That should all reduce the total engineering cost considerably over what it would have been even just five years ago.
So it should be doable.
Note that this type of activity is not one where you can start off simply and iterate your way towards greater sophistication (as you can, for example, with XML-aware content management). This is because the software architecture and implementation techniques that will solve the hardest problems will differ quite markedly from solutions that will satisfy less demanding requirements. This is clear from looking at existing XSL-FO solutions. XSL-FO's abstract architecture is explicitly defined so that it avoids the hardest composition problems, those that require feedback during the pagination process into the initial layout process. Thus XSL-FO systems can be significantly simpler in this area than more complete composition systems.
So the only way to really proceed is to start with the hardest problems and develop a solution for those, figuring all the rest of the details will work themselves out in the wash.
In addition, we can assume that a lot of the data processing that currently complicates XML-aware systems, such as generating tables of contents, reordering content, generating text and other decorations, rendering indexes, and so on, will all be handled in a separate pre-process phase, such that the input to the composition engine should reflect the linear structure of the data as it will be laid out, lacking only those things that cannot be known in advance of doing layout and pagination. That also significantly simplifies the problem.
The system should expose all of its functionality through an API as well as a standard character-based input format (i.e., an XML syntax, which would most logically be based on XSL-FO and extended where appropriate to reflect features XSL-FO does not provide).
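For illustration, such an input might look something like the following. The `fo:*` elements are standard XSL-FO; the `ext:*` names and their namespace are purely hypothetical, showing the kind of extension such an engine might add for features like vertical justification and float-group placement.

```xml
<!-- Hypothetical sketch only: ext:* is an invented extension vocabulary. -->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
         xmlns:ext="urn:example:composition-extensions">
  <fo:page-sequence master-reference="body-pages"
                    ext:vertical-justify="true">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Chapter text as already linearized by the pre-process...</fo:block>
      <ext:float-group ext:page-parity="facing"
                       ext:placement-rule="equal-vertical-spacing">
        <fo:block>A sidebar whose placement depends on its anchor's page.</fo:block>
      </ext:float-group>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```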
It should be implemented in either Java or .NET. My preference is Java but if the architecture is correct it shouldn't really matter since the hard part is the algorithms, not the code writing.
The initial implementation effort should focus on completeness of functionality, not performance optimization. Once you get the data structures and algorithms right, then you can work on optimizing it (or commercial concerns can add value worth paying for by optimizing it, as we've seen in the XSL-FO world).
I think it's doable and I would love to have the opportunity to participate but it's certainly not something I can do by myself (I am no James Clark or Mike Kay). If I can think of a catchy name I might start a Sourceforge project for it.
And in the hopes that a puzzle might motivate someone to sign up for such a project, here's what I think is the essential challenge:
We have in the input data elements A and B. The semantics of A and B map to formatting rules that say that if A occurs on a left-hand page, B must be presented on the right-hand page following it, but if A occurs on a right-hand page, B may be presented on either the preceding left-hand page or the following right-hand page, depending on where B occurred in the original source data relative to A. As it happens, B precedes A in the input source. The first stage of the rendering process renders B in its initial location and then renders A. This happens to land A on a left-hand page. On the next phase, B's rendition is moved so that it is placed on the right-hand page following A. This frees up enough space so that A moves back one page to the preceding right-hand page. This now allows B to be moved back to its original location, which pushes A to a left-hand page....
How do you resolve this potentially infinite loop? This type of problem is endemic in documents where there are lots of out-of-line elements with complex rules about how they can be placed both relative to each other and relative to the ordinality of the pages they fall on.
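One standard way out, sketched below under my own assumptions (the state encoding and penalty function are placeholders for whatever a real engine would use), is to treat each full layout pass as a state-transition function: record every layout state visited, stop when either a fixed point is reached or a state repeats (which proves the rules have no stable solution), and in the cycle case fall back to scoring the visited states and keeping the least-bad one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;
import java.util.function.UnaryOperator;

// Sketch of breaking the A/B oscillation: iterate the placement rules as a
// state-transition function, detect cycles by remembering visited states,
// and resolve a cycle by picking the visited state with the lowest penalty.
public class LayoutIterator {
    static String resolve(String initialState,
                          UnaryOperator<String> applyRules,   // one full layout pass
                          ToIntFunction<String> penalty,      // aesthetic badness score
                          int maxPasses) {
        Set<String> seen = new HashSet<>();
        List<String> visited = new ArrayList<>();
        String state = initialState;
        // seen.add returns false when the state repeats: a cycle.
        for (int i = 0; i < maxPasses && seen.add(state); i++) {
            visited.add(state);
            String next = applyRules.apply(state);
            if (next.equals(state)) return state; // fixed point: all rules satisfied
            state = next;
        }
        // Cycle (or pass budget exhausted): no stable solution exists, so
        // relax the rules and keep the least-bad layout we actually saw.
        String best = visited.get(0);
        for (String s : visited)
            if (penalty.applyAsInt(s) < penalty.applyAsInt(best)) best = s;
        return best;
    }
}
```

The real work hides inside `penalty`: it is where the prioritized aesthetic rules ("B on the wrong page costs X, unequal sidebar spacing costs Y") get encoded so that cycle-breaking produces a defensible page rather than an arbitrary one.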
And of course there are other similar challenges, such as tables that span pages both horizontally and vertically, or the layout of footnotes that span columns or pages, or vertical justification of text on the page or within a column or within a table.
These all involve feedback and the application of numerous rules with the attendant need to prioritize and resolve the rules dynamically and do it in a reasonable amount of time.
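At least one of these problems, vertical justification, has a well-documented core mechanism thanks to TeX: surplus depth is distributed across the stretchable "glue" between blocks in proportion to each gap's stretchability. A minimal sketch (field names are illustrative, not from any particular engine):

```java
// Sketch of TeX-style glue setting for vertical justification: surplus
// page depth is shared among inter-block gaps in proportion to each
// gap's declared stretchability.
public class VerticalGlue {
    static double[] setGlue(double targetDepth, double[] blockHeights,
                            double[] naturalGaps, double[] stretch) {
        double content = 0, totalStretch = 0;
        for (double h : blockHeights) content += h;
        for (double g : naturalGaps) content += g;
        for (double s : stretch) totalStretch += s;
        double surplus = targetDepth - content;
        double[] gaps = naturalGaps.clone();
        if (surplus > 0 && totalStretch > 0) {
            double ratio = surplus / totalStretch;  // TeX's "glue set ratio"
            for (int i = 0; i < gaps.length; i++)
                gaps[i] += ratio * stretch[i];      // stretch proportionally
        }
        return gaps;
    }
}
```

The feedback problem is everything around this: whether to stretch at all, shrink instead, or move a block to another page is itself a prioritized decision that interacts with the float-placement rules above.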
There must be a general architectural and algorithmic approach that addresses these requirements. I'm sure there is an established body of knowledge about how to address them, since they must be instances of general problems in computer science beyond page composition. It's just a matter of putting it all together in the context of a well-engineered implementation.
How hard could it be?