Dr. Macro's XML Rants

NOTE TO TOOL OWNERS: In this blog I will occasionally make statements about products that you will take exception to. My intent is to always be factual and accurate. If I have made a statement that you consider to be incorrect or inaccurate, please bring it to my attention and, once I have verified my error, I will post the appropriate correction.

And before you get too exercised, please read the post, dated 9 Feb 2006, titled "All Tools Suck".

Saturday, July 22, 2006

XCMTDMW: XIRUSS Demonstration XML-Aware CMS

Before I continue the discussion of XML content management system features, since we're talking about tools I thought I would go ahead and mention my XIRUSS-T system as it will be useful once we get to the discussion of boundary issues.

XIRUSS-T is a project on Sourceforge; its home page is here: http://xiruss-t.sourceforge.net/

The XIRUSS-T tool is a toy content management system that I started writing a couple of years ago. It is intended primarily to provide a test bed for exploring and demonstrating my ideas about how to architect and implement a content management system using the principles I'm expounding on here, as well as showing how support for various general standards (XInclude, DITA, etc.) can provide serious value without too much effort.

XIRUSS stands for XInclude-based Re-Use Support System. The "T" stands for "toy" in that, as I've implemented it, it is explicitly not intended for or capable of any production work and is not licensed or certified for it. I do not want to even appear to be trying to compete with any commercial systems or custom systems built by systems integrators. My goal with XIRUSS-T is simply to demonstrate my ideas and hopefully influence the builders of production systems to incorporate them as appropriate. It could also be useful as a base for prototype or proof-of-concept systems that would then be replaced by beefier systems. If I were to build (or help build) a production-capable version of XIRUSS (XIRUSS-P?) I would not give it away (unless I become otherwise independently wealthy, in which case I would build it and give it away just to be disruptive, bwah ha ha--but since I don't play the lottery and have no rich uncles this seems unlikely).

Anyway, I started writing XIRUSS-T as an exercise in forward engineering using Eclipse. As I mentioned before, I had been part of a team at DataChannel that developed a fairly sophisticated content management framework, code named Bonnell [after the highest point inside the city of Austin, our little joke at the time as all the DataChannel code names were mountains like K2, McKinley, etc.]. The Bonnell system was based on a general versioning framework developed by John Heintz and myself called SnapCM, for snap-shot-based content management. (A version of our paper on SnapCM is here. This version is not formatted very well--I'll have to track down the original version.)

Anyway, the basic data model of SnapCM is pretty simple and, except for syncing (which is a generalization of merging operations), the actual business logic is pretty simple too. So I was in an airport with about a 2-hour layover and thought "hey, I wonder how quickly I can instantiate the SnapCM data model in Java using forward engineering?"

So I started by creating a test case that did something like

Repository rep = new Repository();

and went from there. I actually got pretty far in those two hours and it didn't take too much more time to define all the classes, the core methods, and the basic operations needed to have a minimally-functional SnapCM-based system. Of course there was a lot more to do but it went remarkably quickly.

I also spent a lot of time on test cases as this was also, as much as possible, an exercise in pure test-driven development.

The result was a more or less working system that does exactly what I think a system should do, reflecting the layering, components, and separation of concerns that I think are important, just without any optimization or scalability features. It's amazing how simple things can be when you don't worry about performance (beyond simple competent engineering like caching query results or not doing stupid things in loops) or scale.

The code that is there now provides a more or less read-only system in that, while the repository is exposed via a simple HTTP server, I have only implemented the GET part of the protocol--you can't use HTTP to put stuff back in (loading the repository requires a direct Java connection to the repository at the moment). I started working on the PUT part of it but got bogged down in that code. About the time I started working on that code my personal and professional situation changed dramatically (see my personal family blog) and I put it all aside.

To use the code on Sourceforge you run the main test case which will create and populate a repository with various sample objects. You can then use any Web browser to access the repository and navigate around it. You can also use any XML processor that does URL resolution (which should be all of them) to process documents directly from the repository (no need to create a local working copy). But you can't put anything back in.
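To make the read-only behavior concrete, here is a minimal sketch of the same idea. It is not the XIRUSS-T code: it assumes the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in for the repository's HTTP server, and the class name `ReadOnlyRepositoryServer`, the path `/docs/chapter.xml`, and the sample document are all made up for illustration. Content is loaded through a direct Java call, GET serves it, any other method is refused, and a stock XML parser reads the document straight from its URL.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical stand-in for a read-only repository front end (not XIRUSS-T code).
final class ReadOnlyRepositoryServer {
    private final Map<String, String> resources = new ConcurrentHashMap<>();
    private HttpServer server;

    // Loading goes through a direct Java call; there is no HTTP write path.
    void load(String path, String xml) {
        resources.put(path, xml);
    }

    // Binds to the loopback interface on a free port and returns that port.
    int start() throws Exception {
        server = HttpServer.create(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), 0), 0);
        server.createContext("/", exchange -> {
            try {
                if (!"GET".equals(exchange.getRequestMethod())) {
                    exchange.getResponseHeaders().set("Allow", "GET");
                    exchange.sendResponseHeaders(405, -1); // PUT and friends are refused
                    return;
                }
                String xml = resources.get(exchange.getRequestURI().getPath());
                if (xml == null) {
                    exchange.sendResponseHeaders(404, -1);
                    return;
                }
                byte[] body = xml.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/xml");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            } finally {
                exchange.close();
            }
        });
        server.start();
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    // An XML parser resolves the repository URL itself; no working copy is made.
    static String rootElementOf(int port, String path) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse("http://localhost:" + port + path);
        return doc.getDocumentElement().getNodeName();
    }

    // Issues a body-less request with the given method and returns the status code.
    static int statusOf(String method, int port, String path) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create("http://localhost:" + port + path).toURL().openConnection();
        conn.setRequestMethod(method);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        ReadOnlyRepositoryServer repo = new ReadOnlyRepositoryServer();
        repo.load("/docs/chapter.xml", "<chapter><title>Intro</title></chapter>");
        int port = repo.start();
        try {
            System.out.println("root element: " + rootElementOf(port, "/docs/chapter.xml"));
            System.out.println("PUT status:   " + statusOf("PUT", port, "/docs/chapter.xml"));
        } finally {
            repo.stop();
        }
    }
}
```

The point of the sketch is the asymmetry: reading needs nothing but a URL, while writing still needs code running next to the repository.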

In addition, I have not yet implemented any persistence mechanism for the repository. I started on it but it was more involved than I had time for--essentially, on this project, if I couldn't do it in a few lines of code after downloading a library and reading a page or two of docs, it was too hard and I put it off until later (did I mention that I'm also very lazy?).

But I think it's time to reactivate the XIRUSS-T project and try to get PUT and persistence implemented, which will then make it generally useful as an experimentation and proof-of-concept platform.

In addition to implementing SnapCM, XIRUSS-T also focuses on boundary stuff, which is where the complexity lies. In order to support this boundary stuff, XIRUSS-T provides a generic layered code framework for building importers and exporters. The layering is intended to allow you to implement more-and-more specialized importers by building on more general code layers that support common or standard syntaxes and semantics.

In particular, I wanted to demonstrate the utility of XInclude as a standard and methodology for defining compound documents and doing re-use, and thereby to expose the inherent issues surrounding version-aware linking and its management.

The SnapCM model includes a generic dependency relationship model. The XIRUSS-T importer framework includes generic facilities for instantiating relationships within the repository. On top of this I implemented a generic XInclude processor that recognizes XIncludes and instantiates XInclude-specific dependency instances (i.e., instantiates Dependency-type objects of class XIncludeDependency). Most of the XML document types I design use specializations of xi:include (to meet authoring requirements) so I implemented a specialization of the generic XInclude importer that recognizes my private convention for xi:include specializations and instantiates those dependencies as specializations of XIncludeDependency objects.
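As a rough illustration of that recognition step, the sketch below scans a document for `xi:include` elements and records each one as a dependency object instead of resolving it. The class names `Dependency`, `XIncludeDependency`, and `XIncludeImporter`, their fields, and the sample URIs are assumptions made for this example; they are not the actual XIRUSS-T classes or signatures. My private convention for `xi:include` specializations is not described here, so the sketch covers only the generic case.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// All class names here are hypothetical; they are not the XIRUSS-T classes.
final class XIncludeImporter {
    static final String XINCLUDE_NS = "http://www.w3.org/2001/XInclude";

    // Records each xi:include as a dependency; it does not resolve the include.
    static List<XIncludeDependency> findDependencies(String sourceUri, String xml)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required to match on the XInclude namespace
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        List<XIncludeDependency> found = new ArrayList<>();
        NodeList includes = doc.getElementsByTagNameNS(XINCLUDE_NS, "include");
        for (int i = 0; i < includes.getLength(); i++) {
            Element include = (Element) includes.item(i);
            String href = include.getAttribute("href");
            // An absent href means the include points into the same document.
            String target = href.isEmpty()
                    ? sourceUri
                    : URI.create(sourceUri).resolve(href).toString();
            // XInclude defaults parse to "xml" when the attribute is absent.
            String parse = include.hasAttribute("parse")
                    ? include.getAttribute("parse") : "xml";
            found.add(new XIncludeDependency(sourceUri, target, parse));
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book xmlns:xi='" + XINCLUDE_NS + "'>"
                + "<xi:include href='chapters/one.xml'/>"
                + "<xi:include href='notes.txt' parse='text'/>"
                + "</book>";
        for (Dependency d : findDependencies("http://example.org/repo/book.xml", xml)) {
            System.out.println(d);
        }
    }
}

// Generic relationship between two repository resources.
class Dependency {
    final String sourceUri;
    final String targetUri;

    Dependency(String sourceUri, String targetUri) {
        this.sourceUri = sourceUri;
        this.targetUri = targetUri;
    }

    @Override
    public String toString() {
        return getClass().getSimpleName() + ": " + sourceUri + " -> " + targetUri;
    }
}

// XInclude-specific dependency; a further subclass could model a specialization.
class XIncludeDependency extends Dependency {
    final String parse;

    XIncludeDependency(String sourceUri, String targetUri, String parse) {
        super(sourceUri, targetUri);
        this.parse = parse;
    }
}
```

Because the dependency is its own object rather than just an attribute value in the document, the repository can later ask questions such as "what includes this resource?" without reparsing anything.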

I also implemented some generic link import objects and then specialized those to implement a DITA importer layer that recognizes DITA documents and imports them, capturing all (at least all I had time or patience to implement) relationships inherent in DITA documents. This layer could then be easily used as the basis for an importer for locally-specialized DITA documents.
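The layering can be sketched roughly as follows. This is an illustration under assumptions, not the XIRUSS-T importer framework: `LinkImporter`, `DitaImporter`, `LocalDitaImporter`, `Relationship`, the relationship type strings, and the `spec-ref` attribute are all invented for the example. The only DITA facts relied on are that `conref` and `href` attributes carry references. The generic layer owns parsing and tree walking, the DITA layer adds DITA rules, and the local layer adds one more rule on top.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical names throughout; none of this is the XIRUSS-T API.
final class LayeredImportDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<topic id='t1'><body>"
                + "<p conref='common.dita#common/warning'/>"
                + "<xref href='other.dita'/>"
                + "<p spec-ref='SPEC-42'/>"
                + "</body></topic>";
        for (Relationship r : new LocalDitaImporter().importDocument("topic.dita", xml)) {
            System.out.println(r.type + ": " + r.source + " -> " + r.target);
        }
    }
}

// One captured relationship between a source document and a target.
final class Relationship {
    final String type;
    final String source;
    final String target;

    Relationship(String type, String source, String target) {
        this.type = type;
        this.source = source;
        this.target = target;
    }
}

// Generic layer: parses and walks the tree; subclasses decide what counts as a link.
abstract class LinkImporter {
    protected abstract void collect(Element e, String sourceUri, List<Relationship> out);

    final List<Relationship> importDocument(String sourceUri, String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<Relationship> out = new ArrayList<>();
        walk(doc.getDocumentElement(), sourceUri, out);
        return out;
    }

    // Depth-first walk, so relationships come out in document order.
    private void walk(Element e, String sourceUri, List<Relationship> out) {
        collect(e, sourceUri, out);
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                walk((Element) n, sourceUri, out);
            }
        }
    }
}

// DITA layer: recognizes content references and href-based links.
class DitaImporter extends LinkImporter {
    @Override
    protected void collect(Element e, String sourceUri, List<Relationship> out) {
        if (e.hasAttribute("conref")) {
            out.add(new Relationship("dita-conref", sourceUri, e.getAttribute("conref")));
        }
        if (e.hasAttribute("href")) {
            out.add(new Relationship("dita-" + e.getTagName(), sourceUri, e.getAttribute("href")));
        }
    }
}

// Local layer: keeps every DITA rule and adds one made-up local attribute.
class LocalDitaImporter extends DitaImporter {
    @Override
    protected void collect(Element e, String sourceUri, List<Relationship> out) {
        super.collect(e, sourceUri, out);
        if (e.hasAttribute("spec-ref")) {
            out.add(new Relationship("local-spec-ref", sourceUri, e.getAttribute("spec-ref")));
        }
    }
}
```

Each layer overrides a single hook, which is why the locally-specialized importer costs only a few lines once the DITA layer exists.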

You get the idea.

So I'm not just talking here--I've tried to put my ideas into easily-accessible, well-engineered working code. The code I've got at the moment is not quite as complete as it needs to be and now that my house is built and my daughter is a little older and things are more or less settled down I think I can return my attention to it. And of course, as an open-source project anyone who feels compelled to help will be greeted with open arms and tears of gratitude.

You can download the code today and it should run. If it doesn't or something about it confuses you or isn't clear, let me know.

1 Comments:

John Cowan said...

Beware, beware!

If a few people decide that XIRUSS-T is cool, and they convince a few more people, it may end up being "big and professional" before you know it. (See the history of Linux.) And then how you'll kick yourself for not writing XIRUSS-P while you could.

:-)

10:56 AM  
