<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-22194031</id><updated>2011-12-14T20:45:30.014-06:00</updated><category term='dita docbook specialization'/><category term='xiruss xiruss-t'/><category term='dita publishing fasb asc standards'/><category term='4hb ferriss &quot;four hour body&quot; weightloss dieting'/><category term='chevyvolt volt ev'/><category term='iPad apple ebooks'/><category term='typefi indesign &quot;xml composition&quot;'/><category term='&quot;adobe mars&quot; pdf'/><category term='dita shell specialization configuration &quot;standard practice&quot;'/><category term='roomba robot &quot;geek toys&quot;'/><category term='xml editors'/><category term='XCMTDMW &quot;xml content management&quot; subversion'/><category term='composition &quot;xml composition&quot; &quot;batch composition&quot;'/><category term='xinclude use-by-reference specialization'/><category term='xiruss marklogic integration &quot;xml search and retrieval&quot;'/><category term='namespaces'/><category term='XCMTDMW &quot;xml content management&quot;'/><category term='xquery'/><category term='XCMTDMW &quot;xml content management&quot; indirection xinclude'/><category term='XCMTDMW &quot;xml content management&quot; &quot;cms characteristics&quot;'/><category term='spam'/><category term='dita specialization props configuration integration'/><category term='XCMTDMW &quot;xml content management&quot; indirection xinclude xlink linking hytime  snapcm xpointer'/><category term='&quot;office open xml&quot; microsoft &quot;microsoft office&quot;'/><category term='&quot;cals table&quot; oasis'/><category term='XCMTDMW &quot;xml content 
management&quot; import'/><category term='XCMTDMW &quot;xml content management&quot; &quot;woodward governor&quot; CVS gadfly python hytime groveminder'/><category term='XCMTDMW &quot;xml content management&quot; import namespaces'/><category term='xiruss http jython'/><category term='topicmap opencyc xtm'/><category term='XCMTDMW &quot;xml content management&quot; import snapcm'/><category term='xiruss'/><category term='XCMTDMW &quot;xml content management&quot; reuse use-by-reference Xinclude DITA conref use-by-copy'/><category term='xiruss eclipse &quot;eclipse plug-in&quot; marklogic'/><category term='docbook schemas namespaces'/><category term='xiruss xiruss-t xinclude dita'/><category term='dita docbook contentwrangler'/><category term='namespaces &quot;cals tables&quot; schemas &quot;oasis exchange table model&quot;'/><category term='relaxng schemas dtds'/><category term='XCMTDMW &quot;xml content management&quot; &quot;referent tracking documents&quot; snapcm'/><category term='dita specialization ditaopentoolkit'/><category term='&quot;dita 2006&quot; &quot;eve maler&quot;'/><category term='edubuntu ubuntu linux children education'/><category term='&quot;dita 2006&quot; docbook dita'/><category term='dita'/><category term='roomba robot'/><category term='dita cms content management ditamap link linkmanagement specialization generalization'/><category term='mac apple osx'/><category term='pdf &quot;pdf data extraction&quot; pdfbox'/><title type='text'>Dr. Macro's XML Rants</title><subtitle type='html'>W. Eliot Kimber's personal blog about XML as a technology, tools that support it, what I think is and isn't good practice, and technical issues in general. 
Other keywords that might be relevant: XSLT, XSL-FO, schema, publishing, composition, formatting, Python, Java.&lt;br&gt;&lt;br&gt; All tools suck&lt;br&gt;Some tools suck less than others</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>88</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-22194031.post-338099210509098095</id><published>2011-02-20T06:50:00.006-06:00</published><updated>2011-02-20T08:48:47.440-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='4hb ferriss &quot;four hour body&quot; weightloss dieting'/><title type='text'>Physical Improvement for Geeks: The Four Hour Body</title><content type='html'>I've just read through all of Tim Ferriss' &lt;span style="font-style:italic;"&gt;The Four Hour Body&lt;/span&gt; (&lt;a href="http://fourhourbody.com/"&gt;http://fourhourbody.com/&lt;/a&gt;) (4HB). Short version of review: found it really interesting and helpful and generally to be full of sound advice and guidance provided with a dose of humor. 
I am starting on the book's Slow Carb Diet (SCD) in an attempt to lose 20lbs of mostly visceral fat (read "lose my beer gut" and try to live to see my daughter graduate from college).&lt;br /&gt;&lt;br /&gt;The book is written from a geek's perspective for geeks. It essentially takes an engineering approach to body tuning based on self experimentation, measurement, and application of sound scientific principles. In a post on the 4HB blog Tim captures the basic approach and purpose of the book:&lt;br /&gt;&lt;br /&gt;"To reiterate: The entire goal of 4HB is to make you a self-sufficient self-experimenter within safe boundaries. Track yourself, follow the rules, and track the changes if you break or bend the rules. Simple as that. That’s what I did to arrive at my conclusions, and that’s what you will do — with a huge head start with the 4HB — to arrive at yours."&lt;br /&gt;&lt;br /&gt;I've done Atkins in the past with some success so I know that for me a general low-carb approach will work. The Slow Carb Diet essentially takes Atkins and reduces it to the essential aspects that create change. The biggest difference between Atkins and the SCD is the SCD eliminates all dairy because of its contribution to insulin spiking despite a low glycemic index. So no cheese or sugar-free ice cream (which we got really good at making back in our Atkins days). The SCD also includes a weekly "cheat day" where you eat whatever crap you want, as much as you can choke down. After 6 days I've lost 3.5 lbs, which is about what I would expect at the start of a strict low-carb diet. I haven't had the same degree of mind alteration that I got from the Atkins induction process, which is nice, because that was always a pretty rough week for everybody.&lt;br /&gt;&lt;br /&gt;What I found interesting about the 4HB was that Tim is simply presenting his findings and saying "this worked, this didn't, here's why we think this did or didn't work." 
He's not selling a system or pushing supplements or trying to sell videos. His constant point is "don't take my word for it, test it yourself. I might be spouting bullsh*t so test, test, test."&lt;br /&gt;&lt;br /&gt;As an engineer that definitely resonated with me. He also spends a lot of time explaining why professional research is often useless, flawed, biased, or otherwise simply not helpful, if not downright counterproductive. As somebody who's always testing assumptions and asking for proof I liked that too.&lt;br /&gt;&lt;br /&gt;He even has an appendix where he presents some data gathered from people who used the SCD, which, as presented, suggested some interesting findings and made the diet look remarkably effective. He then goes through the numbers and shows why the numbers are deceptive and can't be trusted in a number of ways. If his intent was to sell the diet he would have just presented the numbers. Nice.&lt;br /&gt;&lt;br /&gt;His focus is as much on the mental process as on the physical process: measure, evaluate, question; in short, think about what you're doing and why. Control variables as much as possible in your experiments.&lt;br /&gt;&lt;br /&gt;I highly recommend the book for anyone who's thinking about trying to lose weight or improve their physical performance in whatever way they need to--Ferriss pretty much covers all bases, from simple weight and fat loss to gaining muscle, improving strength, etc. &lt;br /&gt;&lt;br /&gt;He has two chapters focused on sexual improvements, one on female orgasm and one on raising testosterone levels, sperm count, and general libido in males. These could have come off as pretty salacious and "look at what a sex machine I've become" but I didn't read them that way. Rather his point was that improving the sexual aspects of one's life is important to becoming a more complete person--it's an important part of being human so why not enjoy it to its fullest? 
I personally went through a male fertility issue when my wife and I tried to start a family and if I'd had the chapter on improving male fertility at that time (and if my fertility had actually been relevant) it would have been a godsend.  One easy takeaway from that chapter: if you want kids don't carry an active cell phone in your pocket.&lt;br /&gt;&lt;br /&gt;An interesting chapter on sleep: how to get better sleep, how to need less sleep, etc. Some interesting and intriguing stuff there as well. Some simple actions that might make significant positive changes in sleep patterns, as well as a technique for getting by on very little sleep if you can maintain a freaky-hard nap schedule.&lt;br /&gt;&lt;br /&gt;Overall I found the book thoughtful, clearly written, engaging and entertaining and generally helpful. I found very few things that made me go "yeah right" or "oh please" or any of the reactions I often have to self-help books. He stresses being careful and responsible and having a clear understanding of what your goal is. In short, sound engineering practice applied to your physical self.&lt;br /&gt;&lt;br /&gt;Dr. 
Macro says check it out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-338099210509098095?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/338099210509098095/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=338099210509098095' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/338099210509098095'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/338099210509098095'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2011/02/physical-improvement-for-geeks-four.html' title='Physical Improvement for Geeks: The Four Hour Body'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-7015459377196130231</id><published>2011-02-19T04:24:00.002-06:00</published><updated>2011-02-19T05:29:24.345-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='chevyvolt volt ev'/><title type='text'>Chevy Volt Adventure: Feb Diagnostic Report</title><content type='html'>Just got the February vehicle diagnostic report email from the Volt. I'm not sure why I find it so cool that my car can send me email, but I do.&lt;br /&gt;&lt;br /&gt;The salient numbers are:&lt;br /&gt;&lt;br /&gt;35 kW-hr/100 miles&lt;br /&gt;&lt;br /&gt;1 Gallon of gasoline used. 
[This is actually an overstatement as we have only used 0.2 gallons since returning from our Houston trip at the end of December.]&lt;br /&gt;&lt;br /&gt;Our electricity usage for January (the latest numbers I have) was (numbers in parens are for Jan 2010):&lt;br /&gt;&lt;br /&gt;Total kW-hr: 954 (749)&lt;br /&gt;Grid kW-hr:  723 (455)&lt;br /&gt;Solar kW-hr: 231 (294)&lt;br /&gt;Dollars billed: $58.37 ($35.12)&lt;br /&gt;&lt;br /&gt;$/kWh used: $0.06 ($59.00/954)&lt;br /&gt;&lt;br /&gt;kWh/mile: 0.35 (35kWh/100miles)&lt;br /&gt;&lt;br /&gt;$/mile: $0.02&lt;br /&gt;&lt;br /&gt;Our bill for Dec was $32.00, so we spent an extra $26.00 on electricity in January, some of which can be attributed to the unusually cold winter we've been having. We also produced about 60kWh less this January than last.&lt;br /&gt;&lt;br /&gt;But if we assume that most of the difference was the Volt, that means it cost us about $20.00 to drive the vehicle for the month. We used essentially no gasoline so the electricity cost was our total operating cost.&lt;br /&gt;&lt;br /&gt;Looking at the numbers it also means that the draw from the car is less than or roughly equal to the solar we produced over the same period. Not that much of that solar went to actually charging the Volt since we tend to charge later in the day or overnight after having done stuff during the day, but if Austin Energy actually gave us market rates for our produced electricity rather than the steep discount they do give us, we could truthfully say we have a solar powered car, even in January. 
For contrast, our maximum solar production last year was 481 kWh in August, with numbers around 400 kWh most months.&lt;br /&gt;&lt;br /&gt;Compare this cost with a gasoline vehicle getting 30 mpg around town at $3.00/gallon (current price here in Austin):&lt;br /&gt;&lt;br /&gt;30 miles/gallon = 0.033 gallons/mile * $3.00/gallon = &lt;br /&gt;&lt;br /&gt;$/mile: 0.10&lt;br /&gt;&lt;br /&gt;However, our other car, a 2005 Toyota Solara, only gets about 22 mpg around town, which comes out to &lt;br /&gt;&lt;br /&gt;$/mile: 0.14&lt;br /&gt;&lt;br /&gt;Of course these numbers only reflect direct operating cost, not the cost of our PV system or the extra cost of the Volt itself relative to a comparable gas-powered vehicle, but that's not the point, is it? Because it's not just lowered operating cost but being a zero-emissions vehicle most days and using (or potentially using) more sustainable sources of energy.&lt;br /&gt;&lt;br /&gt;But another interesting implication here is what would happen (or will happen) when the majority of vehicles are electric? If our use is typical, it means about a 25% increase in electricity consumption just for transportation. What does that mean for the electricity infrastructure? Would we be able in the U.S. to add 25% more capacity in say 10 years without resorting to coal? How much of that increase can be met through conservation? It seems like it could be a serious challenge for the already-straining grid infrastructure, something we know we need to address simply to make wind practical (because of the current nature of the U.S. grid).&lt;br /&gt;&lt;br /&gt;If Chevy and the other EV manufacturers can bring the cost down, which they inevitably will, people are going to flock to these cars because they're fun to drive, cheaper to operate, and better for the air. 
Given the expected rate of advance in battery technology and the normal economies of scale, it seems reasonable to expect the cost of electric vehicles to be comparable to gasoline vehicles in about 5 years. If gas prices rise even $1.00/gallon in that time, which seems like a pretty safe bet (but then I would have expected gas to be at $5.00/gallon by now after its spike back in 2008), then the attractiveness of electric vehicles will be even greater. &lt;br /&gt;&lt;br /&gt;Which is all to say that I fully expect EVs like the Volt to catch on in a big way in about 5 years, which I think could spell, if not disaster, then at least serious strain in the U.S. electricity infrastructure. I know the City of Austin is thinking about it because that's their motivation for paying for our charging station: monitor the draw from the car so they can plan appropriately. But are we doing that at a national level? I have no idea, but history does not instill confidence, let us say.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-7015459377196130231?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/7015459377196130231/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=7015459377196130231' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7015459377196130231'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7015459377196130231'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2011/02/chevy-volt-adventure-feb-diagnostic.html' title='Chevy Volt Adventure: Feb Diagnostic Report'/><author><name>Eliot 
Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-2726858468750093550</id><published>2011-01-19T08:54:00.003-06:00</published><updated>2011-01-19T09:09:01.471-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='chevyvolt volt ev'/><title type='text'>Chevy Volt Adventure: Fun to Drive</title><content type='html'>We've been driving the Volt around town now for a few weeks and the biggest surprise to me is how much fun it is to drive. The instant acceleration, freaky smoothness, and weight-enhanced handling make it a lot of fun to drive. You can zip around, corner hard, and do it all without fuss or noise. And we haven't even tried sport mode yet.&lt;br /&gt;&lt;br /&gt;As for the car itself, it seems to be holding up well--I haven't noticed anything particularly tinny or annoying, with the possible exception of the charge port cover, which seems a little weak but then it's just a little cover, but the latch is a little less aggressive than I'd like--a couple of times I've thought I pushed it closed but it hadn't caught.&lt;br /&gt;&lt;br /&gt;We are clearly not driving in the most efficient manner because our full-charge electric range is currently estimated at about 30 miles, which our Volt Assistant at GM assures us reflects our profligate driving style and not an issue with reduced battery capacity.&lt;br /&gt;&lt;br /&gt;As a family car it's working fine. With our around-town driving we've only had to use a fraction of a gallon of gas when we've forgotten to plug in after a trip. 
So our lifetime gas usage total is about 8.6 gallons, of which 8.5 were used on the round trip to Houston.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-2726858468750093550?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/2726858468750093550/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=2726858468750093550' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2726858468750093550'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2726858468750093550'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2011/01/volt-adventure-fun-to-drive.html' title='Chevy Volt Adventure: Fun to Drive'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-1625154379226079432</id><published>2011-01-04T08:28:00.003-06:00</published><updated>2011-01-04T09:38:10.639-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='chevyvolt volt ev'/><title type='text'>Chevy Volt Adventure: Houston Trip 1</title><content type='html'>On Christmas Eve we loaded up the Volt and headed to Grandma's house in Houston.&lt;div style="float: right;"&gt;&lt;a href="http://www.flickr.com/photos/woods-kimber/5323985178/" title="IMG_0948 by drmacro, on Flickr"&gt;&lt;img 
src="http://farm6.static.flickr.com/5003/5323985178_84106b9b6e_m.jpg" width="240" height="179" alt="IMG_0948" /&gt;&lt;/a&gt;&lt;/div&gt; The picture shows the cargo area loaded for the trip. The cargo space is a little cramped but was able to accommodate what we needed for this trip, including all the gifts. It would be hard pressed to hold three full-sized rollaboards.&lt;br /&gt;&lt;br /&gt;In the car we had me, my wife, our daughter, and our dog, Humphrey (a basset hound). Everyone was comfortable but this is definitely a 4-passenger vehicle because of the bucket seats in back. The seats were reasonably comfortable for a 3-hour trip, comparable to what I'm used to from our other car, a 2005 Toyota Solara convertible. &lt;br /&gt; &lt;br /&gt;The total round trip from our house to Grandma's house is about 450 miles. The trip meter reports we used 8.1 gallons for a trip MPG of about 51, which is pretty good.&lt;br /&gt;&lt;br /&gt;In our Solara, which averages about 22 MPG overall and gets probably 30 or so on the highway, we usually fill up at the halfway point out and back, using a full 15-gallon tank over the course of the trip. On this trip we didn't stop to fill up until the return, when the tank showed 3/4 empty. I put in about 6 gallons but I think the tank didn't fill (it was the first time I'd put gas in so I had no idea how much to expect to need—the tank must be 10 gallons if 3/4 reflected an 8-gallon deficit).&lt;br /&gt;&lt;br /&gt;On the way out the battery lasted from Austin to just outside Bastrop, about 30 miles. It's clear that, as expected, highway speeds are less efficient than around-town speeds. I'd be interested to know what the efficiency curve is: is it more or less linear or, more likely, does it curve sharply up above, say, 50 MPH? My intuition says 40 MPH is the sweet spot. I tried to keep it between 60 and 70 for most of the trip (the posted limit for most of the trip is 70). 
I drove a little faster on the way home having realized that it didn't make much difference in efficiency.&lt;br /&gt;&lt;br /&gt;Highway driving was fine. The car is heavy for its size, with the batteries distributed along the main axis, which makes it handle more like a big car than the compact it is. Highway 71 is pretty rough in places but the car was reasonably quiet at 70. When we left I-10 in Houston there was enough accumulated charge to use the battery for the couple of miles to my mother-in-law's house.&lt;br /&gt;&lt;br /&gt;It definitely has power to spare and plenty of oomph. There's no hesitation when you stamp the accelerator and I had no problem going from 45 to 65 almost instantly to get from behind a slow car on I-10. We have yet to try the "sport" driving mode but now I'm almost afraid to.&lt;br /&gt;&lt;br /&gt;The car is really smooth to drive--like driving an electric golf cart in the way it just smoothly takes off and doesn't make any noise.&lt;br /&gt;&lt;br /&gt;If we had a problem it was the underbuilt electrical circuit at Grandma's that served the garage—at one point when we had the car plugged in and charging the circuit breaker flipped (a 15-amp circuit)—turned out the circuit also served most of the kitchen, where we were busy preparing Christmas dinner.&lt;br /&gt;&lt;br /&gt;If there is any practical issue with the vehicle it's the climate control—it takes a lot of energy to heat it. Houston was having a cold snap so we got to test the heating system. The multi-position seat heaters are nice but keeping the controls on the "econ" setting meant that backseat passengers sometimes got a little chilled. 
You do realize how much waste heat gas engines produce when you don't have it available to turn your car into a sauna.&lt;br /&gt;&lt;br /&gt;It was also weird to get back from a drive and realize that the hood is still cold.&lt;br /&gt;&lt;br /&gt;We spent the last week traveling in the Northwest and rented the cheapest car Enterprise offers, which turned out to be a Nissan Versa, a tinny little econobox. The contrast was dramatic and made me appreciate the Volt. The two vehicles are comparable in size and capacity (but not cost, of course), but the Versa had a hard time making it up to highway speed and sounded like the engine might come out or explode under stress or blow off the road in a stiff breeze.&lt;br /&gt;&lt;br /&gt;Now that we're back to our normal workaday life we'll see how it does in our normal around-town driving, but my expectation is that we'll use very little, if any, gas as we seldom need to go more than 10 miles from home (our longest usual trip is up north to Fry's, which is about a 20-mile round trip). We'll probably take it out to Llano and Lockhart for BBQ if we get a warm weekend in the next month or so.&lt;br /&gt;&lt;br /&gt;On the way back from Houston we ended up near a Prius and ran into them at the gas station. They were interested in how the Volt was working and we got to compare MPG and generally be smug together. 
I ended up following them the rest of the way into Austin, figuring they probably reflected an appropriately efficient speed.&lt;br /&gt;&lt;br /&gt;And I'm still getting a kick out of plugging it in whenever I bring it back home.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-1625154379226079432?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/1625154379226079432/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=1625154379226079432' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/1625154379226079432'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/1625154379226079432'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2011/01/chevy-volt-adventure-houston-trip-1.html' title='Chevy Volt Adventure: Houston Trip 1'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://farm6.static.flickr.com/5003/5323985178_84106b9b6e_t.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-3355242465137359430</id><published>2010-12-21T08:39:00.003-06:00</published><updated>2011-01-04T09:37:57.966-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='chevyvolt volt ev'/><title type='text'>Chevy Volt Adventure</title><content type='html'>My family is 
now the second (in Texas or Austin, not 100% sure) to take delivery of a 2011 Chevy Volt. We got it last night and it's sitting in the carport happily charged.&lt;br /&gt;&lt;br /&gt;The car is very cool, very high tech. It sends you status emails. It chides you for jackrabbit starts (although I gather other electric and hybrid vehicles do as well).&lt;br /&gt;&lt;br /&gt;It is freaky quiet in electric mode, a bit rumbly in extended mode.&lt;br /&gt;&lt;br /&gt;The interior is pretty nice, reasonably well laid out, nicely detailed. The back seat is reasonably comfortable (I have the torso of a 6-foot person and my head cleared the back window).&lt;br /&gt;&lt;br /&gt;Accelerates snappily in normal driving mode (haven't had a chance to try the "sport" mode yet). Handles pretty nicely (the batteries are stored along the center length of the vehicle, giving it pretty good balance).&lt;br /&gt;&lt;br /&gt;We'll be driving it to Houston, about 500 miles round trip, in a couple of days. I'll report our experience.&lt;br /&gt;&lt;br /&gt;Early adopters get some perks. We get 5 years of free OnStar service. We get a free 240v charging station from the City of Austin at the cost of letting them monitor the energy usage of the charger. We get a special parking space at the new branch library near us. The Whole Foods flagship store has charging stations--might actually motivate me to shop there (we normally avoid that Whole Foods because it's really hard to park and you know, it's Whole Foods).&lt;br /&gt;&lt;br /&gt;One thing that will take some getting used to is not having to put a key into it in order to operate it. I kept reflexively reaching toward the steering column to remove the key that wasn't there. &lt;br /&gt;&lt;br /&gt;Here's a question for you Electrical Engineers out there: what is the equivalent to miles per gallon for an electric vehicle? Is it miles per megajoule? 
miles per amp-hour?&lt;br /&gt;&lt;br /&gt;I'm trying to remember what the unit of potential electrical energy is and coming up blank (not sure I ever really knew).&lt;br /&gt;&lt;br /&gt;Oh, and since we have a PV system on the house and can control when charging takes place, I am going to claim that this Volt is a solar powered vehicle.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-3355242465137359430?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/3355242465137359430/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=3355242465137359430' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/3355242465137359430'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/3355242465137359430'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2010/12/chevy-volt-adventure.html' title='Chevy Volt Adventure'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-8553716222769202367</id><published>2010-09-01T09:24:00.002-05:00</published><updated>2010-09-01T09:46:19.657-05:00</updated><title type='text'>Norm Reconsiders DITA Specialization</title><content type='html'>Norm Walsh has published a very interesting post to his blog, &lt;a href="http://norman.walsh.name/2010/08/30/specialization"&gt;Reconsidering 
specialization, part the first&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This is very significant and I eagerly await Norm's thoughts.&lt;br /&gt;&lt;br /&gt;As Norm relates in his post, he and I had what I thought was a very productive discussion about specialization and what it could mean in a DocBook context. I think Norm characterized my position accurately, namely that the essential difference between DocBook and DITA is specialization and that makes DITA better.&lt;br /&gt;&lt;br /&gt;Here by "better" I mean "better value for the type of applications to which DITA and DocBook are applied". It's a better value because:&lt;br /&gt;&lt;br /&gt;1. Specialization enables blind interchange, which I think is very important, if not of utmost importance, even if that interchange is only with your future self.&lt;br /&gt;&lt;br /&gt;2. Specialization lowers the cost of implementing new markup vocabularies (that is, custom markup for a specific use community) by roughly an order of magnitude.&lt;br /&gt;&lt;br /&gt;There's more to it than that, of course, but those are the key bits.&lt;br /&gt;&lt;br /&gt;All the other aspects of DITA that people see as distinguishing: modularity, maps, conref, etc., could all be replicated in DocBook.&lt;br /&gt;&lt;br /&gt;If we assume that DITA's more sophisticated features like maps and keyref and so forth are no more complicated than they need to be to meet requirements, then the best that DocBook could do is implement the exact equivalent of those features, which is fine. So to that degree, DocBook and DITA are (or could be) functionally equivalent in terms of specific markup features. (But note that any statement to the effect that "DITA's features are too complicated" reflects a lack of understanding of the requirements that DITA satisfies--I can assure you that there is no aspect of DITA that is not used and depended on by at least one significant user community. 
That is, any attempt, for example, to add a map-like facility to DocBook that does not reflect all the functional aspects of DITA maps will simply fail to satisfy the requirements of a significant set of potential users.)&lt;br /&gt;&lt;br /&gt;But note that currently DocBook and DITA are *not* functionally equivalent: DocBook lacks a number of important features needed to support modularity and reuse. But I don't consider that important. What really matters is specialization.&lt;br /&gt;&lt;br /&gt;Note also that I'm not necessarily suggesting that DocBook adopt the DITA specialization mechanism exactly as it's formulated in DITA. I'm suggesting that DocBook needs the &lt;span style="font-style:italic;"&gt;functional equivalent&lt;/span&gt; of DITA's specialization facility. &lt;br /&gt;&lt;br /&gt;Note also that DocBook as currently formulated at a content model level probably cannot be made to satisfy the constraints specialization requires in terms of consistency of structural patterns along a specialization hierarchy and probably lacks a number of content model options that you'd want to have in order to support reasonable specializations from a given base.&lt;br /&gt;&lt;br /&gt;But those are design problems that could be fixed in a DocBook V6 or something if it was important or useful to do so.&lt;br /&gt;&lt;br /&gt;Finally, note that in DITA 2.0 there is the expectation that the specialization facility will be reengineered from scratch. That would be the ideal opportunity to work jointly to develop a specialization mechanism that satisfied requirements beyond those specifically brought by DITA. 
In particular, any new mechanism needs to play well with namespaces, which the current DITA mechanism does not (but note that it was designed before namespaces were standardized).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-8553716222769202367?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/8553716222769202367/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=8553716222769202367' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/8553716222769202367'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/8553716222769202367'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2010/09/norm-reconsiders-dita-specialization.html' title='Norm Reconsiders DITA Specialization'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-4593669434077100327</id><published>2010-08-09T10:04:00.003-05:00</published><updated>2010-08-09T11:07:31.148-05:00</updated><title type='text'>Worse is Better, or Is It?</title><content type='html'>At the just-concluded Balisage conference, Michael Sperberg-McQueen brought up the (apparently) famous "worse is better" essay by Richard P. 
Gabriel (Wikipedia entry &lt;a href="http://en.wikipedia.org/wiki/Worse_is_better"&gt;here&lt;/a&gt;, original paper &lt;a href="http://dreamsongs.com/WIB.html"&gt;here&lt;/a&gt;). I had never heard of this (or at least had no memory of ever hearing of it) even though it is directly relevant to my experiences as a standard developer and engineer, where I've done things in both the "MIT" way (correctness is most important) and, more or less, the "New Jersey" way (simplicity is most important). I was actually very surprised that nobody had ever pointed me to it before.&lt;br /&gt;&lt;br /&gt;Gabriel's original argument is essentially that software that chooses simplicity over correctness and completeness has better survivability for a number of reasons, and cites as a prime example Unix and C, which spread precisely because they were simple (and thus easy to port) in spite of being neither complete functionally nor consistent in terms of their interfaces (user or programming). Gabriel then goes on, over the years, to argue against his own original assertion that worse is better and essentially falls into a state of oscillation between "yes it is" and "no it isn't" (see his history of his thought &lt;a href="http://dreamsongs.com/WorseIsBetter.html"&gt;here&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;The concept of "worse is better" certainly resonated with me because I have, for most of my career, fought against it at every turn, insisting on correctness and completeness as the primary concerns. This is in some part because of my work in standards, where correctness is of course important, and in part because I'm inherently an idealist by inclination, and in part because I grew up in IBM in the 80's when a company like IBM could still afford the time and cost of correctness over simplicity (or thought it could).&lt;br /&gt;&lt;br /&gt;XML largely broke me of that. I was very humbled by XML and the general "80% is good enough" approach of the W3C and the Web in general. 
It took me a long time to get over my anger at the fact that they were right because I didn't want to live in that world, a world where &amp;lt;a href/&gt; was the height of hyperlinking sophistication. &lt;br /&gt;&lt;br /&gt;I got over it.&lt;br /&gt;&lt;br /&gt;Around 1999 I started working as part of a pure Extreme Programming team implementing a content management system based on a simple but powerful abstract model (the SnapCM model I've posted about here in the past) and implemented using iterative, requirements-driven processes. We were very successful, in that we implemented exactly what we wanted to, in a timely fashion and with all the performance characteristics we needed, and without sacrificing any essential aspects of the design for the sake of simplicity of implementation or any other form of expediency.&lt;br /&gt;&lt;br /&gt;That experience convinced me that agile methods, as typified by Extreme Programming, are very effective, if not the most effective engineering approach. But it also taught me the value of good abstract models, that they ensure consistency of purpose and implementation and allow you to have both simplicity of implementation and consistency of interface, that one need not be sacrificed for the other if you can do a bit of advanced planning (but not too much--that's another lesson of agile methods).&lt;br /&gt;&lt;br /&gt;Thinking then about "worse is better" and Gabriel's inability to decide conclusively if it is actually better got me to thinking and the conclusion I came to is that the reason Gabriel can't decide is because both sides of his dichotomy are in fact wrong.&lt;br /&gt;&lt;br /&gt;Extreme Programming says "start with &lt;span style="font-style:italic;"&gt;the simplest thing that could possibly work&lt;/span&gt;" (italics mine). This is not the same as saying "simplicity trumps correctness", it just says "start simple". You then iterate until your tests pass. 
The tests reflect documented and verified user requirements.&lt;br /&gt;&lt;br /&gt;The "worse is better" approach as defined by Gabriel is similar in that it also involves iteration but it largely ignores requirements. That is, in the New Jersey approach, "finished" is defined by the implementors with no obvious reference to any objective test of whether they are in fact finished.&lt;br /&gt;&lt;br /&gt;At the same time, the MIT approach falls into the trap that agile methods are designed explicitly to avoid, namely overplanning and implementation of features that may never be used.&lt;br /&gt;&lt;br /&gt;That is, it is easy, as an engineer or analyst who has thought deeply about a particular problem domain, to think of all the things that &lt;span style="font-style:italic;"&gt;could&lt;/span&gt; be needed or useful and then design a system that will provide them, and then proceed to implement it. In this model, "done" is defined by "all aspects of the previously-specified design are implemented", again with no direct reference to actual validated requirements (except to the degree the designer asserts her authority that her analysis is correct). [The HyTime standard is an example of this approach to system design. I am proud of HyTime as an exercise in design that is mathematically complete and correct with respect to its problem domain. I am not proud of it as an example of survivable design. The fact that the existence of XML and the rise of the Web largely made HyTime irrelevant does not bother me particularly because I see now that it could never have survived. It was a dinosaur: well-adapted to its original environment, large and powerful and completely ill adapted to a rapidly changing environment. I learned and moved on. I am gratified only to the degree that no new hyperlinking standard, with the possible exception of DITA 1.2+, has come anywhere close to providing the needed level of standardization of hyperlinking that HyTime provided. 
It's a hard problem, one where the minimum level of simplicity needed to satisfy base requirements is still dauntingly challenging.]&lt;br /&gt;&lt;br /&gt;Thus both the MIT and New Jersey approaches ultimately fail because they are not directly requirements driven in the way that agile methods are and must be.&lt;br /&gt;&lt;br /&gt;Or put another way, the MIT approach reflects the failure of overplanning and the New Jersey approach reflects the failure of underplanning.&lt;br /&gt;&lt;br /&gt;Agile methods, as typified by Extreme Programming, attempt to solve the problem by doing just the right amount of planning, and no more, and that planning is primarily a function of requirements gathering and validation in the support of iteration.&lt;br /&gt;&lt;br /&gt;To that degree, agile engineering is much closer to the worse is better approach, in that it necessarily prefers simplicity over completeness and it tends, by its start-small-and-iterate approach, to produce smaller solutions faster than a planning-heavy approach will. &lt;br /&gt;&lt;br /&gt;Because of the way projects tend to go, where budgets get exhausted or users get bogged down in just getting the usual stuff done or technology or the business changes in the meantime, it often happens that more sophisticated or future-looking requirements never get implemented because the project simply never gets that far. This has the effect of making agile projects look, after the fact, very much like worse-is-better projects simply because informed observers can see obvious features that haven't been implemented. 
Without knowing the project history you can't tell if the feature holes are there because the implementors refused to implement them on the grounds of preserving simplicity or because they simply fell off the bottom of the last iteration plan.&lt;br /&gt;&lt;br /&gt;Whether an agile project ends with a greater degree of consistency in interface is entirely a function of engineering quality but it is at least the case that agile projects need not sacrifice consistency as long as the &lt;span style="font-style:italic;"&gt;appropriate&lt;/span&gt; amount of planning was done, and in particular, a solid, universally-understood data or system model was defined as part of the initial implementation activity.&lt;br /&gt;&lt;br /&gt;At the time Unix was implemented the practice of software and data modeling was still nascent at best and implementation was hard enough. Today we have deep established practice of software models, we have well-established design patterns, we have useful tools for capturing and publishing designs, so there is no excuse for not having one for any non-trivial project.&lt;br /&gt;&lt;br /&gt;To that degree, I would hope that the "worse is better" engineering practice typified by Unix and C is a thing of the past. We now have enough counterexamples of good design with simplest-possible implementation and very consistent interfaces (Python, Groovy, Java, XSLT, and XQuery all come to mind, although I'm sure there are many many more).&lt;br /&gt;&lt;br /&gt;But Michael's purpose in presenting worse-is-better was primarily as it relates to standards and I think the point is still well taken--standards have value only to the degree they are adopted, that is to the degree they survive in the Darwinian sense. Worse is better definitely tells us that simplicity is a powerful survival characteristic--we saw that with XML relative to SGML and with XSLT relative to DSSSL. 
Of course, it is not the only survival characteristic and is not sufficient, by itself, to ensure survival. But it's a very important one.&lt;br /&gt;&lt;br /&gt;As somebody involved in the DITA standard development, I certainly take it to heart.&lt;br /&gt;&lt;br /&gt;My thanks to Michael for helping me to think again about the value of simplicity.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-4593669434077100327?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/4593669434077100327/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=4593669434077100327' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4593669434077100327'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4593669434077100327'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2010/08/worse-is-better-or-is-it.html' title='Worse is Better, or Is It?'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-2747530061534125110</id><published>2010-07-22T16:11:00.002-05:00</published><updated>2010-07-22T16:47:22.155-05:00</updated><title type='text'>At Least I Can Walk To Work, Part 2</title><content type='html'>OK, I may have overreacted before. I was really pretty depressed there for a couple of days. 
I even started seriously considering buying a gun to have in the house "just in case". I think I've calmed down a bit. For one thing, I couldn't stay in that state without going literally mad.&lt;br /&gt;&lt;br /&gt;As a counter to the depressing doomsaying of James Kunstler I found &lt;a href="http://peakoildebunked.blogspot.com/"&gt;Peak Oil Debunked&lt;/a&gt;, which seems to be a reasonably thoughtful counter to the most extreme of the Peak Oil predictions. It cheered me up a bit, although I found a number of the arguments therein to be not entirely convincing or accurate-but-missing-the-point, especially as regards agriculture. Peak Oil Debunked (POD) isn't actually debunking the notion of peak, just trying to counter the most extreme doomsday prognosticating, which is good. POD isn't saying there is no peak, just that the results of it can't be as extreme as Kunstler and other Peak Oil doomsayers are predicting.&lt;br /&gt;&lt;br /&gt;But it's hard to see how a serious contraction of resources, especially food (and by extension, capital for investment), isn't inevitable in the relatively short term. That can't play out well. Our current recession has to be a taste of what's to come--there just isn't going to be the level of energy input needed to pull us out the way WWII did for the Great Depression, so even if the contraction is slow it will still be a contraction and that will be hard on everyone who is even a little overextended or otherwise dependent on continued growth, which may be all of us, even those of us who have eliminated our debt and have a little cash set aside. &lt;br /&gt;&lt;br /&gt;We're starting to see practical electric vehicles coming on line, but does that help if it keeps us from replacing the suburbs with more dense towns and villages? If we aren't building trains at the same time we're building wind farms, I fear we're missing the point. 
Why can't I take a 300kph train from Austin to Houston or Austin to Dallas?&lt;br /&gt;&lt;br /&gt;On the other hand, it's going to be a long time before we can electrify air travel, barring some unexpected miracle in electricity storage density and planes can't run on coal, so I think we're not far away from seeing air travel severely curtailed. The answer to my train question is "because I can fly Austin to Dallas on SWA for $100.00"--what motivation does any private enterprise have in building that train and what motivation does the State of Texas have, given it's millions in the hole just now? None. But I think that economic picture has to change soon (within the next decade).&lt;br /&gt;&lt;br /&gt;But in many ways Kunstler's arguments are about financial chaos as a side effect of inevitable contraction and that seems much scarier and more likely than simply running out of fuel for cars. We're already in a situation where credit is hard to get. It won't matter how many electric cars GM or Toyota builds if nobody can get a loan to buy them.&lt;br /&gt;&lt;br /&gt;So I don't know. It seems likely that market forces will tend to reduce consumption as prices increase, mitigating at least the immediate effects of fall-off in supplies. As the Peak Oil Debunked blog points out, there is a lot of room for conservation in the U.S. For myself, I could almost entirely eliminate the use of my car for things like getting groceries, the pharmacy, going out to eat, as long as I was willing to eat the time cost. We live within walking distance of all the essential services we need. But Austin, like most U.S. cities, doesn't provide the level of public transport needed to make more far-flung trips convenient, much less pleasant. I could shave a couple of kWhs a day from my electricity use if I really had to. 
But my house is already very energy efficient so it would be about using less A/C and putting all the wall warts on switches and that sort of thing, stuff for which we currently have no economic incentive in the face of the inconvenience and discomfort. I'd rather just invest in more solar panels as long as I have the cash to do so.&lt;br /&gt;&lt;br /&gt;Let's say oil supplies tighten significantly over the next two years and bus ridership goes up--Austin's transit agency is already in serious budget trouble and would have a hard time reacting to a surge in ridership (as they did in 2008). While we finally have a (largely pointless) light rail system, it would take a remarkable effort to put in a more comprehensive trolley or Portland-style light rail system in less than 5 years given public demand for it. &lt;br /&gt;&lt;br /&gt;[I say our light rail system is pointless because it essentially serves one suburb of Austin, making it convenient for people who work downtown and live in Cedar Park to get to work. The train doesn't go anywhere else interesting and doesn't usefully serve anyone south of downtown. Ridership has been a fraction of projections and of capacity. Small surprise. Now if there was a train that went from downtown to the airport and that came south to at least Ben White on Congress that would help a lot. I know of no plans along those lines.]&lt;br /&gt;&lt;br /&gt;So is there anything actionable out of this new-found appreciation for peak oil and the inevitable contraction in our economy and life styles? Thanks to a recent inheritance I have some cash available. Should I buy a Volt? Expand my PV system? Buy gold? Put by a year's worth of rice and canned goods? Buy Treasury bonds? Fill my garage with machine tools? Buy a shotgun?&lt;br /&gt;&lt;br /&gt;For now I think I'm going to take the following actions:&lt;br /&gt;&lt;br /&gt;1. Put my name down on the list for the Chevy Volt. Austin is one of four launch cities. &lt;br /&gt;2. 
Travel with my family as much as we can before air travel becomes a thing of the past for all but the richest humans.&lt;br /&gt;3. Put in the rain water cistern we had to cut from our original house construction project (the rain capture plumbing is in place).&lt;br /&gt;4. Think seriously about expanding the PV system, although I'm hesitant to do so too quickly as new PV technology is developing rapidly.&lt;br /&gt;5. Avoid taking on any new debt--we are currently free of consumer debt and I'd like to keep it that way.&lt;br /&gt;6. Continue to reduce our expectations, as a family, of what "enough" means, and try to teach my daughter that things are not what life is about.&lt;br /&gt;&lt;br /&gt;I'm going to keep tracking opinion and bloviation in the Peak Oil space--it's entertaining if nothing else.&lt;br /&gt;&lt;br /&gt;More as it develops....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-2747530061534125110?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/2747530061534125110/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=2747530061534125110' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2747530061534125110'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2747530061534125110'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2010/07/at-least-i-can-walk-to-work-part-2.html' title='At Least I Can Walk To Work, Part 2'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-1755279423125730822</id><published>2010-07-19T22:38:00.002-05:00</published><updated>2010-07-19T23:44:15.204-05:00</updated><title type='text'>Good Thing I Can Walk to Work</title><content type='html'>Given some of my colleagues in the XML community, I feel like I might be coming a little late to this party, but I just read James Howard Kunstler's &lt;span style="font-style:italic;"&gt;&lt;a href="http://www.kunstler.com/books.php/#TLE"&gt;The Long Emergency&lt;/a&gt;&lt;/span&gt;, a cogent and calm analysis of peak oil and an exploration of what that is likely to mean for the world as a whole and the U.S. in particular.&lt;br /&gt;&lt;br /&gt;Kunstler's point is basically this: 100 years of cheap oil have allowed us to create a society of artificially inflated wealth, allowing us to overpopulate the Earth far beyond its normal carrying capacity (literally "eating oil" in the form of crops grown with artificial fertilizers, pesticides, and irrigated using water pumped by cheap oil), live in unsustainable suburbs and skyscrapers, and generally live far beyond our means. And this period of cheap oil is about to start ending (and now, 6 years after the book was published, has started to), as we pass the "peak oil" point, the point after which the world supply of oil steadily decreases.&lt;br /&gt;&lt;br /&gt;His prediction is that the decrease in supply will inevitably lead to a number of very serious problems, of which the most dire will simply be a lack of food--without cheap oil to irrigate, fertilize, and transport food people will starve. Lots of people. He makes the point that the world population at the start of the industrial revolution was about 1 billion people, which we can take to be the maximum solar carrying capacity of the earth. We're roughly 6 times that now. Not good. 
This can only lead to resource conflicts of the most serious kind.&lt;br /&gt;&lt;br /&gt;Thus we are entering a time of contraction on all fronts: food supplies, energy for transportation, fuel for heating and industry, feedstocks for chemicals and plastics, etc. The reduction in real wealth will make it harder to maintain our current infrastructure and even build those things that might replace some of the lost oil (how do you raise the money to build wind farms or solar panel fabs when the growth-based economy has collapsed and the cost of everything is rising?). &lt;br /&gt;&lt;br /&gt;In short, the 100-year period of constant growth in wealth is over, never to return. The disruptions will be severe and hard to predict.&lt;br /&gt;&lt;br /&gt;All of this will be exacerbated by global warming (itself caused by coal and oil), which will further disrupt agriculture not to mention flooding coastal cities and, in the worst case, shutting down the gulf stream and freezing Europe.&lt;br /&gt;&lt;br /&gt;Kunstler's arguments seem to be well grounded in fact and a clear historical context. Many of his facts were consistent with what I've read in other contexts. I didn't check his primary sources but he at least documented his key facts. He's not hysterical and does not appear to have any particular ideological axe to grind, other than clearly viewing all politicians as spineless and useless.&lt;br /&gt;&lt;br /&gt;One part of the book that I found particularly striking was his remarkably accurate prediction of the financial meltdown that occurred in 2008. &lt;br /&gt;&lt;br /&gt;He also makes the point that no alternative energy source is going to do much to help, certainly not in the short term (meaning the next few decades) for the simple reason that, even if we could generate all the electricity we needed with solar, wind, and nuclear, we can't replace our oil-based transportation system with an electricity-based one. Not to mention it's unlikely the U.S. 
would be willing or able to build enough nuclear plants fast enough. Hydrogen is patently nonsense and ethanol is a scam. If we start rebuilding the train network now, we might avoid the worst, but we're not seeing any political movement in that direction (although Warren Buffett's investment in Burlington Northern Santa Fe seems much more shrewd than it may have seemed at the time). &lt;br /&gt;&lt;br /&gt;If he's anywhere close to right (and I think he is, or I wouldn't be mentioning it here), then we're in for some pretty serious disruptions in all aspects of society.&lt;br /&gt;&lt;br /&gt;For information technology, one question becomes: where does the electricity come from to power the servers that we now depend on for our Google and Amazon and Bing? Not to mention the manufacturing facilities to build the computers themselves. Chip fabs cannot be scaled down to cottage industries. &lt;br /&gt;&lt;br /&gt;I find it interesting that the trend in large-scale server farms has been to site them in the Pacific Northwest where hydro power is plentiful. That could be to our advantage. Will the economy remain sufficiently intact to even have a need for the sort of abstract computing infrastructure we build and maintain? Will soaring transportation costs make computing that much more important? &lt;br /&gt;&lt;br /&gt;I just returned from a week-long trip to Oxford, UK, which I found to be precisely the sort of small city Kunstler says we'll all have to live in before too long: dense, walkable, with good public transport, not too vertical. I walked and took the bus to get from my hotel to my client about 2 miles away. I ate in neighborhood restaurants, all quite good, no more than a few minutes' walk or a short bus ride away. Several of my colleagues did not own cars and several biked to work every day. 
The cars on the streets were significantly smaller, on average, than I'm used to here in the States.&lt;br /&gt;&lt;br /&gt;When I got home to Austin I was struck by the almost cartoonish hugeness of the vehicles in the airport parking garage--they seemed to be almost exclusively the hugest SUV models made. Even when we drive our 15-year-old Explorer, which was the largest SUV Ford made at the time we bought it, we often lose it behind bigger monsters parked around it. Our Toyota, which would have been a large car in Oxford, was really hidden.&lt;br /&gt;&lt;br /&gt;How many more overseas trips can I expect to take in my job? I didn't do anything that couldn't have been done well enough remotely, although being there physically made things more efficient and had significant social benefit.&lt;br /&gt;&lt;br /&gt;Another question that immediately came to my mind is what can I do to really prepare for the sort of almost unthinkable change that Kunstler says is coming? I've already done a good bit by moving closer to the city center, building an energy-efficient house that supplies some of its own electricity (and could supply more given an investment in more PV panels or a small wind generator). We raise chickens and grow a few vegetables and could grow a good bit more. There's land in the neighborhood on which a community garden could be based (could probably feed a good bit of the neighborhood with the open (and largely unused) areas on elementary school grounds just a couple of blocks away).&lt;br /&gt;&lt;br /&gt;But what about natural gas? Kunstler points out something I hadn't known, which is that when natural gas wells tap out, they do so very quickly and without warning. And he claims most of our gas fields are already starting to play out. There's not much you can replace natural gas with. Could I build a digester and produce enough methane from chicken poop? Where would I get the grain to feed the chickens to make the poop? 
Right now we get organic grain from a mill here in Texas, but before that mill opened, we got it from a mill somewhere quite far away (Kansas? Pennsylvania? I don't remember now). Seems unlikely I could produce enough to keep my on-demand water heater running, much less run the stove or the furnace.&lt;br /&gt;&lt;br /&gt;Here in Austin we're reasonably well situated--we have relatively warm winters, good solar exposure, water from lakes (not the Ogallala Aquifer), decent agricultural land, and less sprawl than most other cities in Texas. But would that be enough? I don't know. Austin was the edge of the frontier when it was first settled in the 1830s. It could be again.&lt;br /&gt;&lt;br /&gt;It would be hard for us to survive full summer heat without air conditioning, even with the passive solar aspects of our house, although our PV system was designed in part to cover the daytime load of the A/C system, so maybe we would be OK there. With an electric car we could get around as long as there was sun to charge it. And we can walk or bike to most of what we need. (Should I get on the list for a Volt? Austin will be a launch city. I'm seriously thinking about it since I do now have the cash to buy one.)&lt;br /&gt;&lt;br /&gt;In any case, the book is a real eye-opener.&lt;br /&gt;&lt;br /&gt;If there is any silver lining to all this, it seems pretty clear that a lot of current issues like over-dependence on corn-based food, globalization's homogenizing and dislocating effects, over-sedentary, over-fed first-worlders, will go away pretty quick. But I can't see those changes being necessarily pleasant or welcomed by the majority. I'm glad I'm handy with tools and subscribe to Make. I'm thinking that maybe having a milling machine and a metal lathe in the garage might be good moves. &lt;br /&gt;&lt;br /&gt;My father grew up on a pear and apple ranch in the Hood River valley of Oregon. 
We visited there recently and I got to talk to the woman who now runs the ranch (the granddaughter of the man who founded the ranch back in the early 20's, just as oil-based agriculture was coming into its own). We talked about globalization of agriculture and the local food movement and such.&lt;br /&gt;&lt;br /&gt;She pointed out that their ranch, which is small by Hood River valley standards, produces as many pears as the state of Oregon consumes in a year. Most of the pears they grow are sold overseas. If they didn't have access to those markets, where would they sell? It's hard to see that ranch (or the Hood River Valley or the Yakima Valley) being viable in their current forms for much longer.&lt;br /&gt;&lt;br /&gt;So enjoy those pears and $3.00 a pound Yakima cherries while you can. I know I am.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-1755279423125730822?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/1755279423125730822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=1755279423125730822' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/1755279423125730822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/1755279423125730822'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2010/07/good-thing-i-can-walk-to-work.html' title='Good Thing I Can Walk to Work'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-7171791250815533228</id><published>2010-04-04T09:11:00.013-05:00</published><updated>2010-04-04T10:49:45.024-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='iPad apple ebooks'/><title type='text'>My Precious: iPad Day 1</title><content type='html'>I bought an iPad at 9:30 am CDT 3 April 2010. I had to. &lt;br /&gt;&lt;br /&gt;My nominal justification was to see if it would work for my dad. It absolutely will. &lt;br /&gt;&lt;br /&gt;But really I just couldn't not have one. &lt;br /&gt;&lt;br /&gt;Here are my initial impressions:&lt;br /&gt;&lt;br /&gt;-typing is remarkably efficient. I am writing this on the iPad sitting in a comfy chair, pad in my lap (put it on a pillow after a while so cord would reach--battery finally running down). I am a fast touch typist and I can sort of touch type but really it's just fast hunt and peck. But I don't feel like it's slowing me down. I do miss arrow keys--finding navigating around in a multiline edit field a bit tedious. &lt;br /&gt;&lt;br /&gt;- the response is very fast.  Web pages load fast, apps load fast. Not like an iPhone at all. &lt;br /&gt;&lt;br /&gt;- web browsing has full computer feel. Have not yet gone to site that didn't seem to work in safari (flash sites excepted of course)&lt;br /&gt;&lt;br /&gt;- so far everything has just worked, which is a lot of the point of all apple products&lt;br /&gt;&lt;br /&gt;- the only potential issue so far has been the volume indicator wouldn't go away watching some YouTube videos but it hasn't recurred&lt;br /&gt;&lt;br /&gt;- everyone who sees it wants one. Badly. &lt;br /&gt;&lt;br /&gt;- netflix app worked well. 
Punched up Willy Wonka and it played very nicely&lt;br /&gt;&lt;br /&gt;- &lt;a href="http://periodictable.com/"&gt;The Elements&lt;/a&gt; interactive book is a pretty amazing demonstration of what the device can mean for instruction and reference. &lt;br /&gt;&lt;br /&gt;- I downloaded all the free newspaper apps I could find and they all provided a very satisfying reading experience. One of the things I was looking for was that &lt;a href="http://en.wikipedia.org/wiki/File:2001interview.jpg"&gt;Dave-Bowman-reading-the-paper-on-his-tablet-over-breakfast&lt;/a&gt; experience and I think we have it. Will definitely consider a NYT subscription--we get the Sunday Times and usually buy the Tuesday edition. So $4.00 a month for full access would be reasonable. &lt;br /&gt; &lt;br /&gt;- iBooks seemed to work pretty well although I'd really like to be able to add my own epub books to it. Not sure if there's a way to do that in iTunes. &lt;br /&gt;&lt;br /&gt;- upgraded to plants vs zombies HD and have been having a hard time dragging the device away from my daughter (age 6). She also likes the drawing apps. &lt;br /&gt;&lt;br /&gt;- mail working pretty well, but not that different from iPhone experience except for more screen space and easier typing&lt;br /&gt;&lt;br /&gt;- battery life seems as advertised: ran all day on a charge including video, lots of PvZ playing, and weak wifi signals&lt;br /&gt;&lt;br /&gt;It definitely meets the pick it up and carry it everywhere requirement, which raises several practical issues:&lt;br /&gt;&lt;br /&gt;- will I ever be able to put it down? There's a serious danger of always having it to hand which means always reading something or playing a game or whatever.&lt;br /&gt;&lt;br /&gt;- where will I set it down?  We have concrete floors so you want to set it in a relatively safe place, of which we have few&lt;br /&gt;&lt;br /&gt;- how do you keep it from being stolen?  
&lt;br /&gt;&lt;br /&gt;So far I can say without reservations that it has exceeded my fairly high expectations after a day of use. &lt;br /&gt;&lt;br /&gt;Cory Doctorow has made an &lt;a href="http://www.boingboing.net/2010/04/02/why-i-wont-buy-an-ipad-and-think-you-shouldnt-either/#more"&gt;eloquent and principled argument against the iPad&lt;/a&gt; as being a closed system that is counter to the basic concept of freedom and access the Internet represents. I agree with Cory in principle. I have spent my entire career championing standards specifically because they protect against proprietary control and lock in. Yet I have a MacBook and an iPhone and now an iPad and would not part with them. Why? Because they fricken work. They are solid and beautiful and reliable. Even though Cory is right it doesn't matter because there are very few of us who can trade reliability for openness. &lt;br /&gt;&lt;br /&gt;If Google can build a software and hardware platform comparable to the iPad then I'm there. But so far not even Microsoft, much less the open source world, has succeeded in building a device (since WebTV) that I would put in my father's hands. Even my mother, who is quite computer savvy, has just traded in her Dell for a Mac.&lt;br /&gt;&lt;br /&gt;At the same time, the content standards my clients depend on are all well supported: epub for ebooks, HTML for web delivery, PDF for page fidelity. Lack of Flash is an annoyance but not a deal killer since nobody should be depending on Flash exclusively anyway. &lt;br /&gt;&lt;br /&gt;There is the question of whether the App Store as Apple manages it is draconian or a necessary evil in order to have a system safe for unsupervised use by children. I'm not sure I can form a useful opinion without a lot more thought. 
I'm not one for censoring children's access to information in general once they are old enough to understand what they might be finding, but 6 is not yet that age.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-7171791250815533228?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/7171791250815533228/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=7171791250815533228' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7171791250815533228'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7171791250815533228'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2010/04/my-precious-ipad-day-1.html' title='My Precious: iPad Day 1'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-9129192695826680453</id><published>2010-01-09T07:21:00.002-06:00</published><updated>2010-01-09T07:36:20.231-06:00</updated><title type='text'>Need a WebTV Replacement</title><content type='html'>Some years ago now I set my father up with WebTV. 
It met his needs perfectly: it gave him email and Web access from his TV (he spends most of his time in front of his TV), it was reliable, it didn't require him to learn how to use a computer generally, and it didn't require any support from me (my father is in Tacoma, Washington and I'm in Austin, Texas, so I can't just pop over to provide hands-on support). &lt;br /&gt;&lt;br /&gt;My father is not tech-savvy--he used a manual typewriter to produce a club newsletter for years until the club finally forced him to upgrade to an electric typewriter. He refuses to carry a mobile phone or use ATMs. You get the idea. However, he depends on email and eBay so he has to have some sort of Internet access.&lt;br /&gt;&lt;br /&gt;Unfortunately, while Microsoft has not completely abandoned WebTV, they have not enhanced it in years and clearly have no intention of doing so--you can't even download the emulator they used to provide.&lt;br /&gt;&lt;br /&gt;The problem for my father is that WebTV is simply no longer up to the task of supporting modern Web sites and it's becoming harder and harder for him to use eBay and other Web sites, like Amazon or Flickr. And forget about Facebook.&lt;br /&gt;&lt;br /&gt;My quandary is what to replace WebTV with. So far I haven't been able to identify any obvious good solutions. The Wii's Web browser is close but it's still pretty clunky--even with a keyboard I don't think it would be reliable or simple enough for my dad--it requires a lot of Wiimote fiddling to scroll and pan around Web sites that don't fit nicely on a screen. 
&lt;br /&gt;&lt;br /&gt;AppleTV would seem likely except that it doesn't come out of the box with a Web browser and I'm not going to support a hack that adds one.&lt;br /&gt;&lt;br /&gt;A Mac mini might serve, but that gets us into the having a full computer problem, and I'm not sure my dad's TV takes HDMI input (I need to find out about that).&lt;br /&gt;&lt;br /&gt;It seems like the new tablets that are all the buzz of the gadget world might serve, especially the rumored Apple tablet, but I'm not sure my dad would be willing to drop a grand on it, and I'm not keen to have him be an early adopter.&lt;br /&gt;&lt;br /&gt;But I feel like I'm missing some obvious technology choice. Anyone out there have any thoughts about how to provide a TV-connected Web browser that is easy to use, works reliably, and will work with modern Web sites?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-9129192695826680453?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/9129192695826680453/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=9129192695826680453' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/9129192695826680453'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/9129192695826680453'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2010/01/need-webtv-replacement.html' title='Need a WebTV Replacement'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-2264693740626038805</id><published>2010-01-09T07:19:00.002-06:00</published><updated>2010-01-09T07:21:33.434-06:00</updated><title type='text'>PDF2 Transform Now Enabled for Plugin-Based Extension</title><content type='html'>In the latest 1.5 Toolkit distributions, the PDF2 transform has been enabled for plugin-based extension. This means that you can use normal plugin techniques to provide extensions to the PDF processing that support specializations or global overrides, rather than customizations for specific publication sets or book designs.&lt;br /&gt;&lt;br /&gt;As originally implemented, the PDF2 processor could only be extended through its unique Customization facility, whereby you either add things to its built-in Customization directory or create copies of that directory and then specify where the Customization directory is as a parameter to the transform. This is appropriate for customizations that are not global, that is, they are specific to particular publications, sets of publications, products, or whatever.&lt;br /&gt;&lt;br /&gt;It is not appropriate, however, for providing general extensions, such as support for new domains where the domain-specific processing would normally be the same in all outputs or where the base processing is the same but can be customized using the normal PDF2 customization facilities.&lt;br /&gt;&lt;br /&gt;In the latest DITA 1.5 Toolkit, you can now have both plugin-provided extensions as well as Customization-based extensions. 
This makes it easy to provide generic PDF2 support for specializations or provide global overrides for existing topic and map types.&lt;br /&gt;&lt;br /&gt;A PDF2-extending plugin can provide only overrides, only a Customization directory, or both.&lt;br /&gt;&lt;br /&gt;For example, for &lt;a href="http://dita4publishers.sourceforge.net"&gt;DITA for Publishers&lt;/a&gt;, I've started implementing support for the Publication Map (pubmap) map domain, which is similar to bookmap but tailored for publishers. To support the PDF2 transform, I've created a plugin that provides both general extensions and a base Customization directory that can be used as a basis for local customizations.&lt;br /&gt;&lt;br /&gt;The directory structure of the plugin is:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;net.dita4publishers.pubmap.fo/&lt;br /&gt;  Customization/&lt;br /&gt;  xsl/&lt;br /&gt;  plugin.xml&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Where the Customization/ directory follows the rules and conventions for the PDF2 Customization directories and xsl/ holds the plugin-provided XSLTs that extend the base PDF2 processing.&lt;br /&gt;&lt;br /&gt;The plugin.xml file looks like this:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;&amp;lt;plugin id="net.sourceforge.dita4publishers.pubmap.fo"&gt;&lt;br /&gt;  &amp;lt;require plugin="net.sourceforge.dita4publishers.formatting-d.fo"/&gt; &lt;br /&gt;  &amp;lt;require plugin="net.sourceforge.dita4publishers.pubContent-d.fo"/&gt; &lt;br /&gt;  &amp;lt;require plugin="net.sourceforge.dita4publishers.xml-d.fo"/&gt; &lt;br /&gt;  &amp;lt;feature extension="dita.xsl.xslfo"&lt;br /&gt;      value="xsl/pubmap2xslfo.xsl" type="file"/&gt;      &lt;br /&gt;&amp;lt;/plugin&gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;The &amp;lt;require&amp;gt; elements indicate dependencies on other PDF2-extending plugins for the different domains that DITA for Publishers provides.&lt;br /&gt;&lt;br /&gt;The &amp;lt;feature&amp;gt; line is what integrates the XSLTs into the main PDF2 XSLT transforms 
and it works just as for the HTML plugins, namely, the integrator.xml Ant task adds an xsl:include of the plugin-provided XSLT module into the main PDF2 transform shell XSLT. &lt;br /&gt;&lt;br /&gt;One thing that plugin-provided PDF2 transforms can do is define additional customization points: named attribute sets, named variables, and new XSLT modes, which can then be customized using the normal PDF2 customization mechanisms.&lt;br /&gt;&lt;br /&gt;In the case of the pubmap extensions, I've extended the XSLT so that publication maps produce the same output as bookmaps (that is, a pubmap-d/chapter topicref goes through the same base processing as a bookmap/chapter topicref) and added support for DITA for Publishers-specific topic types, in particular, sidebar, which gets a box around it by default (XSL-FO 1.1 can't render multi-page floats, which would be the ideal way to render sidebars).&lt;br /&gt;&lt;br /&gt;This enhancement to the PDF2 processor, along with the many other improvements made by the Suite Solutions team, makes it much easier to extend and customize the processor and, in particular, support new domains and topic types. 
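&lt;br /&gt;&lt;br /&gt;As a rough sketch of the customization-point idea, a plugin-provided module such as xsl/pubmap2xslfo.xsl might define a named attribute set and a template that uses it. The attribute set name, the class token, and the border and padding values below are hypothetical, chosen only for illustration; they are not taken from the actual DITA for Publishers code:&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;&amp;lt;xsl:stylesheet version="2.0"&lt;br /&gt;    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"&lt;br /&gt;    xmlns:fo="http://www.w3.org/1999/XSL/Format"&amp;gt;&lt;br /&gt;&lt;br /&gt;  &amp;lt;!-- Customization point: a named attribute set (hypothetical name) --&amp;gt;&lt;br /&gt;  &amp;lt;xsl:attribute-set name="d4p-sidebar"&amp;gt;&lt;br /&gt;    &amp;lt;xsl:attribute name="border"&amp;gt;1pt solid black&amp;lt;/xsl:attribute&amp;gt;&lt;br /&gt;    &amp;lt;xsl:attribute name="padding"&amp;gt;6pt&amp;lt;/xsl:attribute&amp;gt;&lt;br /&gt;  &amp;lt;/xsl:attribute-set&amp;gt;&lt;br /&gt;&lt;br /&gt;  &amp;lt;!-- Put a box around sidebars; the class token is an assumption --&amp;gt;&lt;br /&gt;  &amp;lt;xsl:template match="*[contains(@class, ' sidebar/sidebar ')]"&amp;gt;&lt;br /&gt;    &amp;lt;fo:block xsl:use-attribute-sets="d4p-sidebar"&amp;gt;&lt;br /&gt;      &amp;lt;xsl:apply-templates/&amp;gt;&lt;br /&gt;    &amp;lt;/fo:block&amp;gt;&lt;br /&gt;  &amp;lt;/xsl:template&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;/xsl:stylesheet&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;
With a module along these lines, a local Customization could restyle sidebars by overriding just the attribute set, without touching the template.&lt;br /&gt;&lt;br /&gt;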
The Customization process is as it was, but now you only need to use XSLT in your customization when you need truly customization-specific processing (for example, generating a publication-specific title page or copyright page).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-2264693740626038805?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/2264693740626038805/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=2264693740626038805' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2264693740626038805'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2264693740626038805'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2010/01/pdf2-transform-now-enabled-for-plugin.html' title='PDF2 Transform Now Enabled for Plugin-Based Extension'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-8927283734161374633</id><published>2009-05-16T06:56:00.002-05:00</published><updated>2009-05-16T07:01:45.361-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dita'/><title type='text'>Why DITA Requires Topic IDs (And Why Their Values Don't Matter)</title><content type='html'>The DITA standard requires all topics to have an id= attribute.&lt;br /&gt;&lt;br /&gt;Why?&lt;br /&gt;&lt;br /&gt;The 
reason is simple: so you can point to elements within topics. &lt;span style="font-style:italic;"&gt;And for&lt;br /&gt;no other reason.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Surprised?&lt;br /&gt;&lt;br /&gt;Most people seem to assume that topics are required to have IDs so you can&lt;br /&gt;point to the topics. And they further seem to assume that topic IDs need to&lt;br /&gt;be unique within some fairly wide scope (e.g., within their local topic&lt;br /&gt;repository).&lt;br /&gt;&lt;br /&gt;But that's not the case at all.&lt;br /&gt;&lt;br /&gt;For the case of topics that are the root elements of their containing XML&lt;br /&gt;documents and that contain no elements that themselves have IDs, the topic&lt;br /&gt;ID isn't needed at all. In this case the topic can be unambiguously&lt;br /&gt;addressed by the location of the containing XML document (e.g.,&lt;br /&gt;"mytopic.xml").&lt;br /&gt;&lt;br /&gt;In the case of topics that are not root elements and that are not themselves&lt;br /&gt;pointed to and that do not contain any elements with IDs, again the ID is&lt;br /&gt;not needed (because nothing points at the topic or its elements).&lt;br /&gt;&lt;br /&gt;So why does the DITA standard require topics to have IDs?&lt;br /&gt;&lt;br /&gt;It is because topics establish the addressing scope for their&lt;br /&gt;direct-descendant non-topic elements.&lt;br /&gt;&lt;br /&gt;By the DITA spec, to point to an element that is not a topic you use a&lt;br /&gt;two-part pointer: &lt;code&gt;{&lt;span style="font-style:italic;"&gt;topicid&lt;/span&gt;}/{&lt;span style="font-style:italic;"&gt;elementid&lt;/span&gt;}&lt;/code&gt;.&lt;br /&gt;&lt;br /&gt;Without a topic ID it would be impossible to point to a non-topic element.&lt;br /&gt;&lt;br /&gt;By requiring all topics to have &lt;span style="font-style:italic;"&gt;some&lt;/span&gt; ID, it ensures that any non-topic&lt;br /&gt;elements with IDs are immediately addressable without the need to also add&lt;br /&gt;an ID to 
their containing topic.&lt;br /&gt;&lt;br /&gt;In general by normal DITA practice, non-topic elements are given IDs only&lt;br /&gt;when they are intended to be either used by conref or be the target of a&lt;br /&gt;cross reference. Both of these tend to be carefully considered decisions&lt;br /&gt;driven by editorial and business rules, not arbitrary author decision. Which&lt;br /&gt;means you would tend to know, in advance of creation of a given element,&lt;br /&gt;that it is a candidate for conref use or xref use, which means you know to&lt;br /&gt;give it an ID at the time you create it.&lt;br /&gt;&lt;br /&gt;By requiring that topics always have IDs it means that authors don't have to&lt;br /&gt;worry about adding IDs to topics just because they also happened to put an&lt;br /&gt;ID on an element. [In normal XML practice, elements are addressed directly&lt;br /&gt;by ID within their containing document, which means it is sufficient to&lt;br /&gt;simply put an ID on the element with no other dependencies. 
That is not the&lt;br /&gt;case in DITA, which defines its own unique syntax for non-topic element&lt;br /&gt;addressing.]&lt;br /&gt;&lt;br /&gt;Because topic IDs are XML IDs (as opposed to non-topic-element IDs, which&lt;br /&gt;are just name tokens and have no special XML-defined rules), any XML editor&lt;br /&gt;will both require topics to have IDs and ensure that topic IDs are unique&lt;br /&gt;within the scope of their containing document.&lt;br /&gt;&lt;br /&gt;If topic IDs were not required, DITA-aware editors would have to have&lt;br /&gt;special rules to know to require topic IDs whenever non-topic elements got&lt;br /&gt;IDs and it would mean that generic XML editors would not ensure that this&lt;br /&gt;important DITA rule was met (topics with elements with IDs must themselves&lt;br /&gt;have IDs).&lt;br /&gt;&lt;br /&gt;So the DITA spec requires that all topics have IDs.&lt;br /&gt;&lt;br /&gt;But the fact that topics must have IDs does not imply that topic IDs need to&lt;br /&gt;be either descriptive or unique within any scope wider than the XML&lt;br /&gt;documents that contain them.&lt;br /&gt;&lt;br /&gt;In the case where every topic is the root of its own document, the topic ID&lt;br /&gt;can be the same &lt;span style="font-style:italic;"&gt;for every topic&lt;/span&gt;. To make this point I have a standard&lt;br /&gt;practice of using the value "topicid" for the IDs of all my root topics.&lt;br /&gt;There is absolutely no need to generate unique topic IDs for document-root&lt;br /&gt;topics as a matter of standard practice.&lt;br /&gt;&lt;br /&gt;The only other case is ditabase documents.&lt;br /&gt;&lt;br /&gt;If you are using ditabase documents, stop.&lt;br /&gt;&lt;br /&gt;Sorry. 
&lt;br /&gt;&lt;br /&gt;There are some legitimate uses of ditabase documents, for example, as a&lt;br /&gt;first-pass target for data conversions and as a way to hold otherwise&lt;br /&gt;unrelated topics that need to be managed as a single unit of storage, such&lt;br /&gt;as topics that exist only to hold reusable elements.&lt;br /&gt;&lt;br /&gt;[NOTE: Using ditabase simply to allow the mixing of different topic types in&lt;br /&gt;a single document during &lt;span style="font-style:italic;"&gt;authoring&lt;/span&gt; is &lt;span style="font-style:italic;"&gt;the wrong thing to do&lt;/span&gt;. You should&lt;br /&gt;have already created local shell DTDs and within those shells you can allow&lt;br /&gt;whatever topic type mixing is appropriate for your local environment. There&lt;br /&gt;is no need to use ditabase in that case and many reasons not to. See my many&lt;br /&gt;other posts about why you should always create local shell DTDs as the first&lt;br /&gt;step in setting up a production use of DITA.]&lt;br /&gt;&lt;br /&gt;In that case, the topic IDs must be unique within the scope of the ditabase&lt;br /&gt;element, simply because XML rules demand it. But the IDs need not be unique&lt;br /&gt;beyond that scope and they need not be meaningful.&lt;br /&gt;&lt;br /&gt;One of the implications of this is that if you always edit topics as&lt;br /&gt;individual documents and never have nested topics &lt;span style="font-style:italic;"&gt;you never have to think&lt;br /&gt;about topic IDs&lt;/span&gt;. 
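&lt;br /&gt;&lt;br /&gt;To make the addressing rules concrete, here is a minimal sketch of a standalone topic document and some references into it. The file name "mytopic.xml" follows the example used earlier; the element ID "reusable-note" and the referencing elements are hypothetical, included only for illustration:&lt;br /&gt;
&lt;pre&gt;&lt;br /&gt;&amp;lt;!-- mytopic.xml: a root topic using the generic ID "topicid" --&amp;gt;&lt;br /&gt;&amp;lt;topic id="topicid"&amp;gt;&lt;br /&gt;  &amp;lt;title&amp;gt;My Topic&amp;lt;/title&amp;gt;&lt;br /&gt;  &amp;lt;body&amp;gt;&lt;br /&gt;    &amp;lt;!-- "reusable-note" is a hypothetical element ID --&amp;gt;&lt;br /&gt;    &amp;lt;p id="reusable-note"&amp;gt;Text intended for reuse.&amp;lt;/p&amp;gt;&lt;br /&gt;  &amp;lt;/body&amp;gt;&lt;br /&gt;&amp;lt;/topic&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;!-- From another topic: the root topic is addressed by document location alone --&amp;gt;&lt;br /&gt;&amp;lt;xref href="mytopic.xml"/&amp;gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;!-- The paragraph needs the two-part {topicid}/{elementid} pointer --&amp;gt;&lt;br /&gt;&amp;lt;xref href="mytopic.xml#topicid/reusable-note"/&amp;gt;&lt;br /&gt;&amp;lt;p conref="mytopic.xml#topicid/reusable-note"/&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;
Every standalone topic could carry the same "topicid" value and these references would still resolve, because the document location already identifies the topic.&lt;br /&gt;&lt;br /&gt;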
Your topic document template should already have an ID&lt;br /&gt;value and it can be something like "topicid" and there is no reason&lt;br /&gt;whatsoever for that ID to ever be changed.&lt;br /&gt;&lt;br /&gt;In the case where you do edit topics with nested topics (for example, you're&lt;br /&gt;authoring more or less narrative documents or you've designed some topic&lt;br /&gt;types that need nested topics to allow a bit of hierarchy where the nested&lt;br /&gt;topics would never be meaningful in isolation) then you either have to&lt;br /&gt;configure your editor to assign IDs to the nested topics for you (if your&lt;br /&gt;document template doesn't already have the subtopics with IDs assigned) or&lt;br /&gt;you have to think about it. But even in that case, the IDs can be pretty&lt;br /&gt;generic, e.g. "st1", "st2", etc. The IDs in that case still don't need to be&lt;br /&gt;unique beyond the scope of the containing document.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-8927283734161374633?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/8927283734161374633/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=8927283734161374633' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/8927283734161374633'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/8927283734161374633'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2009/05/why-dita-requires-topic-ids-and-why.html' title='Why DITA Requires Topic IDs (And Why Their Values Don&apos;t Matter)'/><author><name>Eliot 
Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-7747973983486717038</id><published>2008-05-06T10:21:00.002-05:00</published><updated>2008-05-06T11:21:18.030-05:00</updated><title type='text'>Help Me Learn: How to Design a Solar Charging CIrcuit</title><content type='html'>I have a general interest in sustainable power systems (my home has a 5-star rating and a 3 kW PV system) but I am not an electrical engineer and have no useful understanding of electrical circuit design beyond very basic stuff (I know what a resistor is but I couldn't reliably tell you how resistance relates to current and voltage).&lt;br /&gt;&lt;br /&gt;I have two projects in mind that I'd like to pursue, both of which require a bit more knowledge than I have and I have no idea where to go to get the knowledge--all of my resources are focused on small-scale electronics (digital circuits, basic oscillators, etc.).&lt;br /&gt;&lt;br /&gt;My first project is to create a water feature that is variously powered by wind, solar, humans, etc. where all the different power sources contribute to charging a storage system which then drives an electrically-powered pump of some sort. I'm thinking of something like a 6-volt marine battery, something that can hold a good charge and produce enough current to drive a beefy motor.&lt;br /&gt;&lt;br /&gt;What I don't know is how to design a charging circuit that will feed the battery from multiple input sources.&lt;br /&gt;&lt;br /&gt;The other project I'm thinking about is modifying an RV to be electrically driven so that it could be, as much as possible, solar powered (e.g., for traveling about the American West during summer). 
That is, building an electric RV that would run off batteries for cruising and be recharged by a combination of solar, auxiliary generator (presumably a diesel engine that could run on waste vegetable oil or the most ecologically sound fuel available at the moment), or grid connection when parked.&lt;br /&gt;&lt;br /&gt;It would need to enable a 200- to 300-mile range on a single charge to account for the (almost) worst case where you have no solar input and must recharge overnight from a campground. The worst case is no solar input and no grid access, so you'd have to run the generator in order to get to the nearest power source (or wait out the clouds without your beer getting too warm).&lt;br /&gt;&lt;br /&gt;Some obvious questions are:&lt;br /&gt;&lt;br /&gt;- Assuming an Airstream RV (chosen to minimize drag, even though they're frightfully expensive), how much energy would be required to provide a 200-mile range at 55 MPH?&lt;br /&gt;&lt;br /&gt;- Assuming a more affordable typical RV, what would the cost from drag be?&lt;br /&gt;&lt;br /&gt;- Given current solar panel technology, what output could be expected from the maximum area one could reasonably attach to an Airstream? Would it make sense to include some sort of fold-out panel system for use when parked (e.g., you're stopped for the afternoon at some tourist spot)?&lt;br /&gt;&lt;br /&gt;- Assuming worst case of no solar input and no access to the grid, what size of generator would be needed to enable direct operation of the vehicle at say 40 MPH?&lt;br /&gt;&lt;br /&gt;All of this would go to answering the first question, which is "is this even practical with today's generally-available and affordable technology?" If the answer to that is "no", then what advances would be required to make it affordable?&lt;br /&gt;&lt;br /&gt;We could start with the presumption of a 50,000 USD budget, which is about what it costs to buy a full-sized conventional RV. 
So if I bought a used one and refit it, could that even be done for that budget?&lt;br /&gt;&lt;br /&gt;Another consideration is the value of not buying fuel. With gasoline pushing 4.00 USD a gallon and diesel already over that as of May 2008, a 3000-mile trip at say 8 MPG starts to add up pretty fast. That's roughly 1500 USD in fuel costs for that trip. At 4 dollars a gallon, and one such trip a year, I can recoup 15,000 USD in investment in 10 years of driving vacations. If fuel were at European rates that payback would of course be much higher (and it seems reasonable to expect that U.S. fuel will climb to approach European rates over the next 10 years simply because of both market pressures and increasing social acceptance of the true cost of our lifestyles in the face of global warming).&lt;br /&gt;&lt;br /&gt;So I'm wondering if anyone can provide pointers to resources, online or otherwise, where I could start developing the necessary knowledge to start answering these questions?&lt;br /&gt;&lt;br /&gt;I don't think any of this is particularly challenging from either a design or implementation aspect, I just have no idea how to go about learning about it efficiently....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-7747973983486717038?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/7747973983486717038/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=7747973983486717038' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7747973983486717038'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7747973983486717038'/><link rel='alternate' type='text/html' 
href='http://drmacros-xml-rants.blogspot.com/2008/05/help-me-learn-how-to-design-solar.html' title='Help Me Learn: How to Design a Solar Charging Circuit'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-2067678692192513868</id><published>2008-04-18T09:45:00.003-05:00</published><updated>2008-04-18T10:02:35.925-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dita docbook contentwrangler'/><title type='text'>Choosing an XML Schema: DocBook or DITA?</title><content type='html'>Richard Hamilton has presented a thoughtful analysis of when to choose DocBook or DITA, published on the Content Wrangler blog here: &lt;a href="http://www.thecontentwrangler.com/article_comments/choosing_an_xml_schema_docbook_or_dita/"&gt;http://www.thecontentwrangler.com/article_comments/ choosing_an_xml_schema_docbook_or_dita/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I started to post the following as a comment to that post but it got long enough that I thought it better to post my full response here.&lt;br /&gt;&lt;br /&gt;I generally agree with Richard's analysis as far as it goes, but I think it misses several important points that I assert tip the scales significantly in favor of DITA over DocBook.&lt;br /&gt;&lt;br /&gt;If you are looking for a documentation schema that you can just pick up and use and you don't need the modularity features of DITA (that is, you don't need the functionality of DITA maps) then DocBook probably makes the most sense for the reasons Richard cites, namely that there are more element types of likely utility out of the box and the processing infrastructure is more mature and better documented.&lt;br /&gt;&lt;br 
/&gt;However, if you know you need to add markup for your specific requirements or are developing a new XML application where markup tailored to local users or requirements, or modularity, is important, then DITA has a very clear advantage because it is so much easier to develop and extend custom document types from a DITA base than from a DocBook base.&lt;br /&gt;&lt;br /&gt;The reason is very simple: DITA's specialization mechanism, coupled with the declaration set design patterns defined by the DITA architecture, makes it as easy as it could possibly be to develop new markup structures. In particular, having defined specializations you may need to do nothing more in order to have documents that use those new types work with existing DITA processors, editors, CMS systems, etc.&lt;br /&gt;&lt;br /&gt;DocBook &lt;span style="font-weight:bold;"&gt;cannot&lt;/span&gt; have this characteristic until such time as it either adopts the DITA specialization mechanism (which it could easily do--I worked hard to have the specialization aspects of DITA defined as distinct from the DITA element types specifically so that it could be adopted by other XML applications with a minimum of fuss) or adds the equivalent functionality using some other syntax [one limitation in the current DITA specialization mechanism is that there is no good way to support namespaced elements--that will be fixed in DITA 2.0 but nobody has yet started to work in earnest on what that might be--this could be an opportunity for DocBook to take the lead since DocBook definitely has a namespace requirement.]&lt;br /&gt;&lt;br /&gt;With any DocBook application, if you define new element types, there is no &lt;span style="font-weight:bold;"&gt;defined&lt;/span&gt; way to map those back to existing types and DocBook processors are not designed to handle new types by processing them in terms of some base type. 
That means that if you define new element types in a DocBook context you must update &lt;span style="font-weight:bold;"&gt;all&lt;/span&gt; processors that need to interact with those documents &lt;span style="font-style:italic;"&gt;even if all they need to do is &lt;span style="font-weight:bold;"&gt;nothing&lt;/span&gt; with those elements&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;On the subject of narrative documents, there is essentially no practical difference between DITA and DocBook in their ability to support the creation of single-instance documents of arbitrary depth. This is obvious for DocBook (because that's what it was designed for), not so obvious for DITA (because it was designed for the opposite).&lt;br /&gt;&lt;br /&gt;But with DITA all you need to do is configure your local doctypes ("shells" in DITA parlance) to allow topics to nest. For example, the simplest case is to allow the generic topic to nest. With that you can represent any possible narrative document structurally.&lt;br /&gt;&lt;br /&gt;The only meaningful difference in this scenario between DITA and DocBook is that DITA requires the body of a section to be wrapped in a container (the topic body), while DocBook does not provide such a container (or at least it didn't last time I looked).&lt;br /&gt;&lt;br /&gt;This is really a trivial difference.&lt;br /&gt;&lt;br /&gt;For several clients who are doing publishing rather than technical documentation I have developed essentially trivial specializations that provide generic topics distinguished only by their topic type names but using otherwise generic DITA elements for content. I usually define a specialized topic called "subsection" that can nest to any depth. 
With that model you can represent documents as well as or better than you can with DocBook and you get all the other DITA goodness as well.&lt;br /&gt;&lt;br /&gt;Finally, there is a free DITA-to-DocBook transform that is part of the free DITA Open Toolkit that allows you to use all the DocBook processing infrastructure with DITA-based content. This is used, for example, to use non-DITA-aware composition systems like XPP with DITA-based content.&lt;br /&gt;&lt;br /&gt;Because DITA offers a number of very important features that DocBook does not, in particular specialization, modularity, and external links (relationship tables), and because DITA can be configured to work as well for non-modular documents as DocBook can, and because DITA lowers the cost of developing new element types to as low as it could possibly be, I've come to the conclusion that DITA is the best answer for any XML-based document-centric application I've seen.&lt;br /&gt;&lt;br /&gt;Just the fact that you can get OxygenXML for almost nothing, define a completely new DITA specialization, deploy it to your local Toolkit as a plugin (a very easy operation once you know what to do, something I need to write a tutorial for), and then edit documents using that specialization in a full-featured graphical, tags-off editor &lt;span style="font-style:italic;"&gt;with no additional work of any sort&lt;/span&gt; is pretty powerful. DocBook simply cannot enable that because it doesn't have DITA's specialization feature.&lt;br /&gt;&lt;br /&gt;If DocBook adopted DITA's specialization mechanisms then this discussion wouldn't even be meaningful because DocBook would get all the value that specialization accrues to DITA and would still have the value of being a conceptually simpler model for documents.&lt;br /&gt;&lt;br /&gt;Which raises the question: why doesn't DocBook simply adopt DITA's specialization mechanism? It would cost DocBook almost nothing to add and would add tremendous value. 
It would not require DocBook changing anything about its current markup design, except to possibly back-form some base types that are currently not explicit in DocBook but would be useful as a specialization base. But that would only make DocBook cleaner.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-2067678692192513868?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/2067678692192513868/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=2067678692192513868' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2067678692192513868'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2067678692192513868'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2008/04/choosing-xml-schema-docbook-or-dita.html' title='Choosing an XML Schema: DocBook or DITA?'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-2124291750267721932</id><published>2008-02-17T09:58:00.003-06:00</published><updated>2008-02-17T11:06:42.609-06:00</updated><title type='text'>XML is 10</title><content type='html'>The XML Recommendation is celebrating its 10-year anniversary, that is, the anniversary of the official publication of the Recommendation on 10 February 1998. 
However I think of XML as really starting in 1996, when the activity was revealed publicly for the first time at the SGML '96 conference. I wrote about XML and its development at that anniversary here: &lt;a href="http://drmacros-xml-rants.blogspot.com/2006/11/xml-ten-year-aniversary.html"&gt;Dr. Macro&amp;#39;s XML Rants: XML: Ten Year Aniversary&lt;/a&gt; (And I discovered this post, which I had totally forgotten about, when I googled "sgml 1996 conference" in order to verify that my memory of the dates was correct. How sad is that? [or conversely, how cool is that?--you choose.]).&lt;br /&gt;&lt;br /&gt;I will reiterate what I said two years ago: while Tim Bray and Michael Sperberg-McQueen, as the editors of the XML 1.0 Recommendation, are most publicly associated with XML, it was Jon Bosak who made XML happen. It was Jon who put the "SGML on the Web" working group together, personally invited all the initial members, set the working rules that allowed us to work quickly and productively, and managed the political and procedural process of getting XML through the W3C. Jon knew what he wanted and knew the ingredients that were needed and knew how to put them together in a way that would most likely produce the desired result. In that sense he was like a chef producing a dish dependent on the complex interactions of different ingredients, a dish that is not a simple assembly task but one that involves carefully managed reactions and cooking times applied to a variety of ingredients where quality was a key determining factor. &lt;br /&gt;&lt;br /&gt;Without Jon's drive, judgment, and leadership, the XML development process could have easily bogged down or been derailed in any number of ways. 
It would have taken only one spoiler or resistance from inside the W3C or simple poor management of the process to kill or delay the whole thing.&lt;br /&gt;&lt;br /&gt;It's also important to remember that what we developed as XML represents absolutely no technical innovation. There is nothing in the XML 1 Recommendation that isn't in SGML, with the possible exception of well-formedness being sufficient (since SGML required the use of DTDs with document instances). The genius of XML, and the challenge in developing the spec, was figuring out what to leave out of XML. Each of us on the Working Group had our pet features, without which we felt XML would be at best crippled, at worst useless. I think we did a remarkably good job of not including features that were not essential.&lt;br /&gt;&lt;br /&gt;In retrospect, I wish we had gone farther and left out DTDs and entities entirely, but of course that would not have been politically acceptable at the time and there would have been nothing to replace DTDs with (in fact, I still find it amazing that the XSD spec was ever finished given the challenge inherent in developing that specification given the wide range of requirements and constituencies driving it).&lt;br /&gt;&lt;br /&gt;I think it's also fair to say that XML has succeeded far beyond any of our initial expectations. All we really wanted was a way to publish SGML data using Web technology. It never occurred to us that it would be embraced as a general-purpose data structuring and program-to-program communication format (for good or ill). I've always found it a little annoying that the vast majority of data using XML has nothing or little to do with documents in the sense of information intended primarily for human consumption. 
Whatever.&lt;br /&gt;&lt;br /&gt;I suppose prognostication is expected at this point.&lt;br /&gt;&lt;br /&gt;Where do I see XML going in the next 10 years?&lt;br /&gt;&lt;br /&gt;I think it's fair to say that XML is entrenched and unlikely to be replaced any time soon. It's hard to imagine that any group would have the motivation and resources to build a general-purpose XML alternative given that XML works more than well enough for most of the applications to which it is put. From an engineering standpoint, it would be a case of overoptimization.&lt;br /&gt;&lt;br /&gt;In the domain of structured documentation I think that the DITA standard in particular will accelerate the adoption of XML for document representation. The values have been well understood for decades and they aren't going to change. Because DITA, leveraging XML's deep and ubiquitous infrastructure, lowers the cost of entry of using XML for sophisticated document representation it can only serve to bring more enterprises and users to XML, users for whom in the past an SGML or even XML solution would have been prohibitively expensive. I find that very exciting. I don't remember well enough to know if that particular effect of XML was envisioned or even hoped for, but I think we all, even at that time, understood to some degree the power that Web technology had in general to make things easier and cheaper. But certainly lowering the cost of &lt;span style="font-style:italic;"&gt;building&lt;/span&gt; XML parsers was a primary design driver, our mythical "graduate student with a weekend" to build a parser. That vision has definitely been realized.&lt;br /&gt;&lt;br /&gt;In the domain of program-to-program communication it would not surprise me if something specifically designed for that task supplants XML, something like JSON. 
This is a domain where, because there is no particular great body of data, but only processing code, APIs, and support libraries, the engineering equation would make optimization more attractive: there's no question that XML is not the best solution for character-based serialization of arbitrary objects and data structures. I certainly wouldn't object to proposals to replace XML with JSON for those applications. The key is to understand that XML is still the best available solution for persistent data. I think a lot of people who use XML day to day forget (or never were told) that XML, via SGML, was originally designed to facilitate search and long-term, application-independent archiving of data. It is almost coincidence that makes that same application-independence useful for communication of transient data. Convenient but not optimal.&lt;br /&gt;&lt;br /&gt;I fully expect to be able to do more or less what I'm doing now ten or twenty years from now. Whether I will be is another question, but so far, just when I thought I was completely bored of it, something new in the XML world has come along to re-energize my interest. And we're still struggling to build truly useful XML-aware hyperdocument management systems. Hopefully that won't be the case in 2018.&lt;br /&gt;&lt;br /&gt;And let's not forget Dr. Charles Goldfarb, whose own single-minded passion, drive, and leadership produced SGML, without which XML (and HTML, for that matter) would never have happened. SGML turned 20 in 2006. It's largely now forgotten except by a few early adopters who have been using their SGML-based systems productively for ten or fifteen years now and had no compelling business reason to move to XML. 
But I remember.&lt;br /&gt;&lt;br /&gt;Kids today....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-2124291750267721932?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/2124291750267721932/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=2124291750267721932' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2124291750267721932'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2124291750267721932'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2008/02/xml-is-10.html' title='XML is 10'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-8643482706753827643</id><published>2008-01-23T11:10:00.000-06:00</published><updated>2008-01-24T08:52:10.900-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dita publishing fasb asc standards'/><title type='text'>FASB ASC U.S. GAAP DITA Application Is Live</title><content type='html'>[How many initialisms can I get in one post title?]&lt;br /&gt;&lt;br /&gt;For the last year or so I've been working as part of a larger team at the Financial Accounting Standards Board (FASB), helping with the implementation of a DITA-based system to support authoring and delivery of the newly-codified U.S. 
Generally Accepted Accounting Principles (GAAP), the Accounting Standards Codification (ASC).&lt;br /&gt;&lt;br /&gt;I contributed the design of the DITA topic and map specializations used for the codified content and also implemented automated data conversion from an earlier XML format used in the initial codification editorial process.&lt;br /&gt;&lt;br /&gt;The live Web site is here: &lt;a href="http://asc.fasb.org/home"&gt;http://asc.fasb.org/home&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I've posted some details about the project on the Really Strategies blog: &lt;a href="http://blog.reallysi.com/2008/01/live-dita-appli.html"&gt;http://blog.reallysi.com/2008/01/live-dita-appli.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Update: I should have mentioned (but wasn't 100% sure of the details) that the CMS was built on the empolis e:CLS product and the Web delivery platform uses the empolis e:IAS search platform. The empolis Web site is &lt;a href="http://www.empolis.com"&gt;www.empolis.com&lt;/a&gt;. I was not personally involved with that aspect of the system and was not involved with FASB's technology selection process (I was brought onto the project after they had selected their core technology). 
The system integration and development work was done by &lt;a href="http://www.ovitas.com"&gt;Ovitas&lt;/a&gt;, empolis' chief North American integrator.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-8643482706753827643?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/8643482706753827643/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=8643482706753827643' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/8643482706753827643'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/8643482706753827643'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2008/01/fasb-asc-us-gaap-dita-application-is.html' title='FASB ASC U.S. GAAP DITA Application Is Live'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-525732474158705211</id><published>2008-01-01T12:48:00.000-06:00</published><updated>2008-01-01T13:04:45.021-06:00</updated><title type='text'>I For One Welcome Our Cleaning Robot Overlords</title><content type='html'>As long as they keep the house clean.&lt;br /&gt;&lt;br /&gt;My main gift this holiday season was a new &lt;a href="http://store.irobot.com/product/index.jsp?productId=2804959"&gt;Roomba 560&lt;/a&gt; floor cleaning robot. 
This brings the cleaning robot population of Chez Kimber/Woods to 3, including our original Roomba 300 series and the Scooba.&lt;br /&gt;&lt;br /&gt;I wanted the new Roomba because I found that, with a two-story house, it was just inconvenient enough to move the one Roomba between floors that I was less likely to go to the trouble to run it at all. We also found that the 300 series couldn't really deal with the area rug in our living room (we have concrete floors with one big rug in the living room) and that the noise of running it on the concrete floors was just a little too annoying. So in short, the robot was underused and the house tended to be not as clean as we would like (but couldn't actually be bothered to clean ourselves, not being what you would call obsessive house cleaners).&lt;br /&gt;&lt;br /&gt;The 500 series promised to address all those problems with improved tolerance of things that would stop the 300 series (such as cords, furniture it tended to get trapped under, and the edge of the rug), reduced noise levels, and more effective capturing of pet hair (the 300 tended to just push around big clots of pet hair rather than sucking it up).&lt;br /&gt;&lt;br /&gt;So far I have been very pleased with the 500 series--if anything it exceeded my expectations. It is significantly quieter, handles the rug just fine, doesn't get trapped where the 300 did (we have one big sideboard with these decorative bits at the base that the 300 would tend to get wedged under, the 500 never does) and seems to have a longer battery life.&lt;br /&gt;&lt;br /&gt;So now the old 300 lives upstairs where it can focus on keeping our master bedroom clean and the 500 takes care of the downstairs.&lt;br /&gt;&lt;br /&gt;As I said in my report on the original Roomba, these are amazingly well-engineered products that can serve as models and inspirations for all of us that build things for other people to use. 
Compared to the 300, the 500 is not significantly different but there are a number of minor but important refinements that add up to a much improved user experience, from the simplified controls (got rid of one button that wasn't of much use) to the better brushes to the easier-to-empty dirt chamber. And all at a reasonable price.&lt;br /&gt;&lt;br /&gt;And it makes cleaning the house fun.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-525732474158705211?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/525732474158705211/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=525732474158705211' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/525732474158705211'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/525732474158705211'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2008/01/i-for-one-welcome-our-cleaning-robot.html' title='I For One Welcome Our Cleaning Robot Overlords'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-8277780744907857877</id><published>2008-01-01T12:28:00.001-06:00</published><updated>2008-01-01T12:35:34.084-06:00</updated><title type='text'>Loopwing Wind Generator</title><content type='html'>&lt;div style="float: right; margin-left: 10px; margin-bottom: 10px;"&gt; 
&lt;a href="http://www.flickr.com/photos/woods-kimber/2154866026/" title="photo sharing"&gt;&lt;img src="http://farm3.static.flickr.com/2371/2154866026_26b5645667_m.jpg" alt="" style="border: solid 2px #000000;" /&gt;&lt;/a&gt; &lt;br /&gt; &lt;span style="font-size: 0.9em; margin-top: 0px;"&gt;  &lt;a href="http://www.flickr.com/photos/woods-kimber/2154866026/"&gt;IMG_4216&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;One of my best Christmas gifts this year was this model of a loopwing wind generator from Tamiya (&lt;a href="http://www.tamiya.com/english/products/75021loopwing"&gt;http://www.tamiya.com/english/products/75021loopwing&lt;/a&gt;)&lt;br /&gt;&lt;br /&gt;I was unaware of this particular wind generation technology but it seems quite intriguing in that it claims to be better able to extract energy from light winds and takes less vertical space (and presumably is less dangerous to birds) than straight-wing wind turbines. This means you could have one on your house or in your back yard and maybe not put the entire neighborhood in danger or violate local noise ordinances.&lt;br /&gt;&lt;br /&gt;The kit itself went together quite quickly, the hardest part being cutting out the wings themselves, which actually required a little skill and care rather than just screwing the parts together (there's no gluing or anything).&lt;br /&gt;&lt;br /&gt;The turbine drives a generator that then charges a little model car that plugs onto the top of the generator body. The energy is collected in a super capacitor that can then run the car for about 3 minutes on a full charge.&lt;br /&gt;&lt;br /&gt;The connector to the car appears to be a standard connector so it ought to be easy to build other things that can be charged. I was thinking a little LED display that indicates the level of output or something or maybe something decorative. 
It would certainly be easy to adapt it to charging solarengine BEAM robots.&lt;br /&gt;&lt;br /&gt;The generator doesn't swivel to face the wind but it would be easy enough to mount it on a turntable with a wind vane if you really cared. I've got it mounted on a pipe that rises to about 5 feet and stands where the north side of our house forms a little wind tunnel that catches the northwest wind that tends to blow this time of year.&lt;br /&gt;&lt;br /&gt;I find the prospect of having a home-sized loopwing generator interesting. We already have a 3 kW PV system on our house--it couldn't be that hard to add in the output from a small turbine, such as described here: &lt;a href="http://www.treehugger.com/files/2006/11/loopwing_wind_t.php"&gt;http://www.treehugger.com/files/2006/11/loopwing_wind_t.php&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Where we are in Central Texas we have a pretty reliable 5-10 MPH breeze most of the time and quite often stronger winds, especially in the spring and fall.&lt;br /&gt;&lt;br /&gt;I think this year will start to see some interesting developments in alternative energy generation. Austin will be home to a new thin-film solar cell factory and is already home to a company trying to make high-capacity capacitors usable in electric vehicles. Taken together those technologies could make electric and solar-electric vehicles much more attractive in cost and range, not to mention the possibilities for home energy.&lt;br /&gt;&lt;br /&gt;For example, imagine having a bank of capacitors that could provide the same power output as the little gas motors in all the three-wheel taxis in all of Asia and that can fit in the space currently used by the fuel tanks those vehicles carry (or otherwise fitted into available space). &lt;br /&gt;&lt;br /&gt;Now imagine putting low-cost, flexible solar panels on the top of each of those three-wheelers (they all have some sort of canopy on them) as well as on taxi stand shelters scattered around a typical Asian city. 
If most of those three-wheelers spend most of their time waiting for a fare, it seems reasonable to think that they could be mostly or entirely charged by their solar panels, taking from the main grid or a taxi-stand battery or capacitor bank only during peak times (e.g., morning and evening rush hour).  Or maybe they could use one of those small fuel cells the Japanese are selling for home power use for peak-time charging where the grid is not reliable (or where natural gas is inexpensive).&lt;br /&gt;&lt;br /&gt;The effect of such a change would be dramatic: a significant source of air pollution would be eliminated, the need for fossil fuel would be significantly reduced in a part of the world where oil demand is rising much too sharply, and the operating cost of the taxis themselves would be reduced (assuming both that electricity costs per kilometer would be lower than fuel costs and that much of the operating energy would be from the vehicles' own solar panels).&lt;br /&gt;&lt;br /&gt;With current battery technology, batteries could never be used to realize this vision: they're too expensive and too toxic and have too little energy capacity. But capacitors, if the current claims of orders of magnitude improved capacity prove out, could, because they have both a much higher energy density and lower toxicity (at least I assume they do) and they can charge very quickly, meaning that a taxi could do a 15 or 20 minute fare and then recharge in minutes at a recharging station or charge over say an hour using its own solar cells. That means a three-wheel taxi doesn't need to carry as much on-board energy capacity as it would for a battery solution.&lt;br /&gt;&lt;br /&gt;Assuming the technology were there, what would it cost to, for example, provide a retrofit kit to every tuk-tuk operator in, for example, the Philippines? 
It would be several hundred million dollars at least (e.g., say $500.00 per vehicle) and as difficult to administer fairly and efficiently as any other aid project, but I would think that there would be lots of incentive from many parties to make something like that happen.  And once the local population got used to the technology and had access to spares and second-hand parts and so forth, the technology would be applied in many other creative ways. And at some point you'd hope it would be good enough to, for example, allow Philippine jeepneys to be retrofitted for electric power.&lt;br /&gt;&lt;br /&gt;And cities like Manila and Colombo and New Delhi would be much, much quieter, with all those two-cycle motors replaced with electric drives.&lt;br /&gt;&lt;br /&gt;Of course, the possibilities for other transforming uses of low-cost, physically flexible (that is, bendable) solar panels in developing and third-world countries are quite exciting. It will be interesting to see how the technology develops in terms of its economics and manufacturing environmental costs. &lt;br /&gt;&lt;br /&gt;While there's no obvious direct connection between XML and alternative energy, we, as an industry and as a society of large-scale computer system users, are starting to realize that the collective cost of computing equipment does represent a significant fraction of our total societal energy draw. So the more a technology like XML enables people to do with such systems, the greater the power such use will draw. &lt;br /&gt;&lt;br /&gt;As I write this, I'm sitting in a room with three computers running, drawing a couple hundred watts, as well as using Google and Yahoo, backed by massive data centers drawing terawatts of largely coal-produced electricity (except for those data centers built in Central Washington to take advantage of the cheap hydropower provided by salmon-habitat-destroying dams on the Columbia river and its tributaries). 
I'd feel a little better about that if I could at least make my urban house electricity self sufficient without spending too much more than I already have on alternative energy systems that make little economic sense under current U.S., Texas, and Austin energy policy (in particular, that, unlike in Europe, utilities can buy back excess power at a steep discount from market rates, making the payback on my solar PV system 15 years or more *after* having half the initial cost rebated by the city and federal tax credits). Obviously we did it because we felt it was the right thing to do and we could afford it, not because we had any financial incentive to do so.&lt;br /&gt;&lt;br /&gt;Anyway, that's a long way from a cool toy that I got for Christmas....&lt;br clear="all" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-8277780744907857877?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/8277780744907857877/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=8277780744907857877' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/8277780744907857877'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/8277780744907857877'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2008/01/loopwing-wind-generator.html' title='Loopwing Wind Generator'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://farm3.static.flickr.com/2371/2154866026_26b5645667_t.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-2986909106885278919</id><published>2007-10-12T07:00:00.000-05:00</published><updated>2007-10-12T07:19:20.156-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='mac apple osx'/><title type='text'>I'm Bein' Macified</title><content type='html'>Through a series of more or less accidents I came to have physical possession of Really Strategies' one and only MacBook, purchased in order to support testing and delivery of software to a Mac-based client (which, considering that most of our clients are publishers should be most of them, but apparently hasn't been to date).&lt;br /&gt;&lt;br /&gt;After some soul searching I have decided to make this Mac my primary development machine, giving up my oh-so-familiar Dell Windows-XP-based laptop.&lt;br /&gt;&lt;br /&gt;We'll see how it goes.  I must say that it's been quite an adjustment for me, somebody with nearly 20 years of Windows brain damage, to move to a Mac.&lt;br /&gt;&lt;br /&gt;Of course it helps that most of the development tools I use are completely cross platform: Eclipse, Java, OxygenXML, Syntext Serna. It also helps that OS X is an *nx-based system under the covers, so I can get a command line that is familiar, although the configuration details are not (I've been using Debian-based distributions for most of the time I've used Linux). 
And other key tools have solid Mac versions (e.g., all the Adobe products).&lt;br /&gt;&lt;br /&gt;I will even be able to get an RSuite server running on this machine, using an unsupported OS X build of MarkLogic.&lt;br /&gt;&lt;br /&gt;I'm even starting to get used to the bizarre control key mechanism, although it's still a struggle--it feels like trying to learn a new musical instrument that is just enough different from one you know to really hose you up.&lt;br /&gt;&lt;br /&gt;I'm even writing this post using Safari, rather than Firefox, which I would normally use, but it's acting up this morning.&lt;br /&gt;&lt;br /&gt;So wish me luck as I start on this new adventure in computing....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-2986909106885278919?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/2986909106885278919/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=2986909106885278919' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2986909106885278919'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2986909106885278919'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/10/im-bein-macified.html' title='I&apos;m Bein&apos; Macified'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-2375623237381068024</id><published>2007-10-01T07:00:00.000-05:00</published><updated>2007-10-01T07:52:17.814-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xml editors'/><category scheme='http://www.blogger.com/atom/ns#' term='dita specialization ditaopentoolkit'/><title type='text'>Automatic Handling of DITA Docs In XML Editors</title><content type='html'>I'm in demo prep heck at the moment, trying to get some real DITA functionality built on top of Really Strategies' RSuite CMS product. One of the key challenges here is integrating XML editors to handle this use case:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Initial state:&lt;/span&gt; You are presented with some valid, conforming DITA documents in some locally configured and/or specialized document type, organized by one or more maps. &lt;span style="font-style:italic;"&gt;You (and your repository and supporting tools) have never seen this particular set of documents or their DTDs before.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Step 1.&lt;/span&gt; Import map and all dependencies (including its DTD) into the repository&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Step 2.&lt;/span&gt; Within the repository, find a topic to edit and push the "Edit with {name of integrated editor}" button in the repository UI.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Step 3.&lt;/span&gt; Editor opens with document, &lt;span style="font-style:italic;"&gt;with all DITA support features applied.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;It is that step 3 that is currently causing me a bit of pain. 
And it shouldn't.&lt;br /&gt;&lt;br /&gt;The reason it's causing me pain is because every graphical XML editor has been built on the presumption that document types are relatively static and that some XML specialist will develop lots of doctype-specific setup and then deploy that setup once, followed by a long time with no changes to that setup. &lt;br /&gt;&lt;br /&gt;Thus, if you're presented with new documents in a heretofore unseen DTD, they're not going to work in the editor until you go through the setup and configuration process for the new document types [And remember that DITA 1.1 requires at least six distinct shell types: map, concept, reference, task, glossentry, and dita, plus any additional specialized map or topic types you might have--that's a lot of DTD-specific configurations to set up, even if most of that effort is just copy and paste, it's still tedious and prone to the usual errors of catalog misconfiguration, filename misspelling, and so on.]&lt;br /&gt;&lt;br /&gt;However, DITA totally chunks the assumption of static, well-known doctypes out the window. DITA says "hey, every shell is different, specialize away, apply agile approaches to developing and refining your local DITA-based DTDs, combine topics from everywhere willy-nilly, go nuts, have fun".&lt;br /&gt;&lt;br /&gt;To support this DITA does something very important: it enables reliable auto-recognition of DITA documents, regardless of the details of the local configuration or the use of specialization. &lt;br /&gt;&lt;br /&gt;DITA must have this mechanism because the specialization feature allows generic DITA processing to be reliably applied to &lt;span style="font-style:italic;"&gt;any&lt;/span&gt; conforming DITA document. 
Because it &lt;span style="font-style:italic;"&gt;can&lt;/span&gt; be, it &lt;span style="font-style:italic;"&gt;should&lt;/span&gt; be.&lt;br /&gt;&lt;br /&gt;For the DITA Open Toolkit this means applying default processing (transforms, filtering, etc.).&lt;br /&gt;&lt;br /&gt;For editors it means applying default editing style sheets, enabling DITA-specific user interface components (e.g., "Insert topicref"), etc., if no more specific configuration already exists for the document or its shell doctype.&lt;br /&gt;&lt;br /&gt;And there's no reason for any DITA-aware editor not to, except that, without exception that I can find, they've all implemented their document-to-functionality mapping in a way that doesn't enable this sort of dynamic association. The closest I've found so far is Syntext's Serna editor, which while it doesn't recognize specialized topics as DITA topics and apply its (very nice) built-in DITA support, it does make it a two-click process to manually apply their built-in DITA support. So kudos to Syntext. But it should be a zero-click process.&lt;br /&gt;&lt;br /&gt;For this automatic process to work processors have to be able to examine any document they're presented with and reliably determine whether or not the document is or is not DITA-based. Note that the Open Toolkit presumes that what it's given is DITA-based because that's the only thing it is designed to process. But things like editors and CMS systems are, for the most part, completely generic and designed to handle any XML at all. So they cannot presume (or at least they should not presume).&lt;br /&gt;&lt;br /&gt;The recognition of DITA documents cannot be based on the use of any particular DTD's system or public ID, because they'll all be different. 
You can't look for a particular well-known element type because the element types could be completely different from anything previously seen (let's imagine a specialization where all the element type names are in Chinese--there's nothing that prevents it and if I was a native reader of Chinese and wanted to create tech docs I'd probably do just that).&lt;br /&gt;&lt;br /&gt;That means you've got to go by something invariant that is reliably in every document. In XML that really means the use of a particular well-known namespace. However, DITA element types &lt;span style="font-style:italic;"&gt;cannot be in namespaces&lt;/span&gt; because the current DITA class mechanism syntax cannot support namespace-qualified names. Knowing that about DITA you might think "well what to do then?"&lt;br /&gt;&lt;br /&gt;However, just because &lt;span style="font-style:italic;"&gt;elements&lt;/span&gt; can't be in a namespace, it doesn't mean &lt;span style="font-style:italic;"&gt;attributes&lt;/span&gt; can't be. And that's the trick DITA uses in DITA 1.1 to enable autorecognition of DITA documents, regardless of any other aspects of the DTD (its public or system IDs, the element type names used, etc.).&lt;br /&gt;&lt;br /&gt;This trick is the DITAArchVersion attribute. This attribute is in the namespace "http://dita.oasis-open.org/architecture/2005/". Any document that includes this namespace is almost certainly a DITA document, especially if the namespace qualifies an attribute named "DITAArchVersion" and the element on which that attribute occurs has a class= attribute conforming to the DITA class attribute syntax.&lt;br /&gt;&lt;br /&gt;This means that regardless of the actual DTD or schema a DITA document uses, it can be recognized as being a DITA document. 
That means that you can then reliably and usefully apply default DITA processing to the document without having specifically configured its particular DTD or schema as being a DITA schema.&lt;br /&gt;&lt;br /&gt;That is, the behavior I expect from any editor that claims to be DITA-aware is that if I open any conforming DITA document, regardless of what declaration set it happens to use, I should get all the default DITA-specific stuff automatically.&lt;br /&gt;&lt;br /&gt;While the most robust implementation of this behavior would make all the checks described above, it is probably sufficient to assume that if a document's root element has a DITAArchVersion attribute or if the root element is named "dita" and any of its children have a DITAArchVersion attribute, then the document is a DITA document. &lt;br /&gt;&lt;br /&gt;The DITA spec only really recognizes three possible configurations of elements in a conforming DITA document: root of base type "map", root of base type "topic", or root of type "dita" [the dita element is not specializable in DITA 1.1] where its direct child elements are of base type "topic"--anything else is not a conforming DITA document (although it may contain individually conforming topics or maps) and you have no obligation to apply DITA-specific features to it (although you could if you wanted to).&lt;br /&gt;&lt;br /&gt;That's by way of saying it's probably good enough to just look for the DITA namespace anywhere in the document and go by that, but it could lead to false positives in cases where the document is not strictly a conforming DITA document.&lt;br /&gt;&lt;br /&gt;And it would be really cool if editors provided defined extension points by which this type of recognition could be added to doctypes as plug-ins to the editor.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-2375623237381068024?l=drmacros-xml-rants.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/2375623237381068024/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=2375623237381068024' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2375623237381068024'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2375623237381068024'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/10/automatic-handling-of-dita-docs-in-xml.html' title='Automatic Handling of DITA Docs In XML Editors'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-5445094994572838216</id><published>2007-09-23T15:25:00.001-05:00</published><updated>2007-09-23T16:13:47.979-05:00</updated><title type='text'>It Just Works: Not So Much</title><content type='html'>When I started my new job I bought an inexpensive desktop to have as a home development machine (my last desktop had long since died and I'd not had justification or need to replace it). The machine, a Gateway Dual Core Pentium, came preloaded with Vista Business, which would not have been my first choice but there didn't seem to be a lot of options. After putting in another Gig of RAM (having discovered that Vista at idle takes about 950 Meg--yow) Vista ran OK but it just seemed to be stupidly slow, especially Windows Explorer. I really don't know anything about Vista nor do I care to, but at least Windows XP seemed to work reasonably well. 
Vista, running on twice the hardware, seems to perform half as fast. Why? Microsoft is certainly capable of producing good software. &lt;br /&gt;&lt;br /&gt;I got more and more tired of having to wait for Explorer to respond and of the inexplicably slow unzip times and stuff and came close to putting on Ubuntu a couple of times, even though I don't really have time to spend on it. But finally I ran into a problem where I couldn't unzip the Web Tools distributions of Eclipse because of some filenames in the package that are too long for Windows to handle. WTF?&lt;br /&gt;&lt;br /&gt;So I found some very clear instructions for installing Ubuntu alongside an existing Vista installation, resized my Windows partition and installed Ubuntu. That went quite smoothly, which I've come to expect from Ubuntu. So far so good.&lt;br /&gt;&lt;br /&gt;But then things went downhill. First, I couldn't get the sound to work and what I could find in some frantic Googling suggested that others had had the same problem and had not had much luck fixing it. How can that be? This machine should be a pretty standard setup--it's clearly an all-Intel motherboard with built-in sound and video.&lt;br /&gt;&lt;br /&gt;I could live without sound (although I wouldn't like it--my desktop is also my office music system) but then the video stopped working and I can't get it back. The machine is attached to a Westinghouse 22in display. On install the desktop was using the standard 1280 x 1024 resolution. Once I got things otherwise working I turned my attention to the display. I reran the X-config application, chose the appropriate resolutions and so forth, and restarted the X server. &lt;br /&gt;&lt;br /&gt;Boom "Out of Range". So far nothing I've been able to do to my X configuration file has made a difference. I was able to track down the manual for the display to get the actual horizontal and vertical sync frequencies but specifying those didn't work. 
I was not able to find any particularly useful guidance online beyond "rerun the configuration utility" or "specify the right sync values in the config files". The log-in screen is clearly running at the appropriate resolution for the display but why does it stop working when I log in? It's quite maddening.&lt;br /&gt;&lt;br /&gt;But the real problem is that this just should not happen. It should not be possible for the windowing system to not work. As long as this is the case, as long as the &lt;span style="font-weight:bold;"&gt;first and only&lt;/span&gt; way to approach this failure is to open a terminal window and use sudo to edit a configuration file, it's a non-starter for anyone except the geekiest of geeks. If there's anything that should &lt;span style="font-weight:bold;"&gt;always&lt;/span&gt; work it's the display, even if it's to fall back to a lesser resolution. But I'm not even getting that.&lt;br /&gt;&lt;br /&gt;I would really like some distribution of Linux to be a viable alternative to Windows for non-geeks but Ubuntu has definitely failed this test. I'm a geek. I've installed Linux countless times and gotten it working on a variety of machines. I should be able to make it work. Granted I was doing this while trying to do other stuff and didn't have endless hours to solve this problem, but that's the point: I shouldn't have to have endless hours or only expect Ubuntu to work on ancient machines (which is where I've been most successful). Ubuntu advertises itself as the Linux that just works. So it should just work. Certainly it should work on a commercial machine from a major vendor using generic Intel hardware.&lt;br /&gt;&lt;br /&gt;I'm not sure if the problem is a technical one or a cultural one or an economic one. I recognize that hardware manufacturers don't have a huge motivation to write Linux drivers and that a volunteer-based project is dependent on people actually doing what's needed and maybe they will and maybe they won't. 
But there also seems to be a cultural component of "well, if you can't edit a configuration file you really don't belong in our club". And of course, since there's no single enterprise with economic incentive to make Ubuntu or any Linux distribution work as smoothly as Windows or OS X there's no reason to expect it will. &lt;br /&gt;&lt;br /&gt;But that doesn't make it any less frustrating. It doesn't help that I've recently had to use a Mac for work, which makes it clear that there's no reason a *nx-based system couldn't be as smooth as the commercial systems are, but clearly it helps to have a multi-billion dollar company driving the activity.&lt;br /&gt;&lt;br /&gt;Anyway, it's very frustrating since I had to boot back to Vista, I still can't install the Eclipse Web tools (and don't have any more time to spend working out that problem than I do on my Ubuntu problem), and now half my hard drive is tied up with a useless operating system.&lt;br /&gt;&lt;br /&gt;I know I can get back to a minimally working system by re-installing Ubuntu, but I have no confidence that I'll have any better luck getting the display to work.&lt;br /&gt;&lt;br /&gt;Hmph.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-5445094994572838216?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/5445094994572838216/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=5445094994572838216' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/5445094994572838216'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/5445094994572838216'/><link rel='alternate' type='text/html' 
href='http://drmacros-xml-rants.blogspot.com/2007/09/it-just-works-not-so-much.html' title='It Just Works: Not So Much'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-8891771128861270887</id><published>2007-07-20T06:05:00.000-05:00</published><updated>2007-07-20T06:41:59.172-05:00</updated><title type='text'>InDesign CS3 and XML Authoring: Could be Good</title><content type='html'>In my new job at &lt;a href="http://www.reallysi.com"&gt;Really Strategies&lt;/a&gt; I have started digging pretty deeply into how to get XML into and out of Adobe InDesign CS3. This has turned out to be pretty interesting.&lt;br /&gt;&lt;br /&gt;In InDesign CS2 the XML support was somewhat weak. While you could import an XML document into InDesign and then associate styling with it, it was very simplistic in that you had no direct way to do context-based associations and no easy way to script it, either on import or inside the editor. &lt;br /&gt;&lt;br /&gt;In CS3 that has largely changed. CS3 adds several new XML support features that appear to serve to make InDesign a quite powerful XML rendering tool that could be integrated loosely or tightly with any other XML authoring tool to create an interesting environment. (You could, in theory use InDesign to author the XML as well but it wasn't really designed for that and I don't think it's a good use of resources to try to make it an XML editor, not when the process I outline here is so easy to implement.)&lt;br /&gt;&lt;br /&gt;Here's the general mechanism I'm working toward:&lt;br /&gt;&lt;br /&gt;1. Using InDesign, you create a template document that will accept your XML. 
This requires setting up all the usual styling stuff (page masters, frames, named styles) as well as creating instances of the markup structures that will populate different text frames (InDesign's XML import works by matching imported elements to existing elements and replacing the existing ones with matching structures on import, more or less).&lt;br /&gt;&lt;br /&gt;2. You create an XSLT that takes your XML source and "augments" it with Adobe-specific attributes that specify the per-element-instance mapping to InDesign paragraph and character styles, as well as generated elements for any generated text that needs to be rendered as a separate paragraph (analogous to the gentext pseudo-elements Arbortext Editor uses to manage generated text display). &lt;br /&gt;&lt;br /&gt;This XSLT can be pretty simple--it's just an identity transform with a little bit of per-element-type logic to define the mapping (and it could be further parameterized through some sort of more direct mapping specification, although I'm not sure it's worth the effort). This script could also re-order things as needed, generate TOCs, etc. But the minimum required is pretty small. There're a few more things you need to handle, but they can be generalized easily enough.&lt;br /&gt;&lt;br /&gt;The main gotcha here is that InDesign is sensitive to newlines in the XML data, because newlines trigger the application of paragraph styles. What I've found so far is that you have to manage the text content very carefully so that you only emit newlines at true paragraph boundaries. This also means that you only apply paragraph styles to the lowest-level elements that will become paragraphs in InDesign--you can't just blindly apply styles at higher levels in the XML hierarchy (InDesign is not XSL-FO).&lt;br /&gt;&lt;br /&gt;3. You run this transform outside of InDesign. 
InDesign lets you apply a transform as part of the import process, but we don't want to do that for reasons that will become clear in a moment (unless I've missed a feature of InDesign, which is quite possible--I'm still coming up to speed on its intricacies).&lt;br /&gt;&lt;br /&gt;I use OxygenXML for most XML editing and it provides a very convenient mechanism for applying a transform to a document and saving the result wherever you want. But any good XML editor should provide a way to do this so that you have some sort of "run the transform" button or menu item. The key is that the result (the XML with the InDesign augmentations) is always put to some consistent place.&lt;br /&gt;&lt;br /&gt;4. Import the augmented XML (not the XML you're authoring in your XML editor) into InDesign using InDesign's XML import (without applying the XSLT) but being sure to check the "link to XML" check box and select "merge" not "append"--this is the key.&lt;br /&gt;&lt;br /&gt;5. Go back to your XML editor, make changes to the XML and push the "transform" button again.&lt;br /&gt;&lt;br /&gt;6. Switch to InDesign and bring up the Links palette. In that you'll find your XML document listed. Select it and click the "update link" button. Magically, your XML changes are re-imported into InDesign and the styles applied.&lt;br /&gt;&lt;br /&gt;Hey presto! Immediate, easy, convenient pagination of XML using InDesign. Something that was not immediate, easy, or convenient with CS2.&lt;br /&gt;&lt;br /&gt;I haven't looked into it but it should be possible to script the triggering of the link update as well, although that might require a little C code, I'm not sure. 
But it's clear that by this mechanism you can use InDesign as a "page preview" mechanism from any XML editor with very little work.&lt;br /&gt;&lt;br /&gt;Beyond the simple element-to-style mapping you can do on import, CS3 also provides scripting support for working with XML in the form of XPath-based functions that allow you to easily apply any script to elements in context. I haven't used this yet but a brief look at the docs suggests that it's just the thing to take your XML to the next level. &lt;br /&gt;&lt;br /&gt;It's still not going to give you what products like Typefi give you, which is complete complex layout heuristics, but it should be sufficient for relatively simple layouts such as typify technical documentation. It occurred to me, for example, that it wouldn't be very hard to create a process that would allow you to use InDesign to create nice books from DITA source using this mechanism. Hmmm&lt;br /&gt;&lt;br /&gt;Note that you can download a one-month eval of InDesign from Adobe's Web site.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-8891771128861270887?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/8891771128861270887/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=8891771128861270887' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/8891771128861270887'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/8891771128861270887'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/07/indesign-cs3-and-xml-authoring-could-be.html' title='InDesign CS3 and XML Authoring: Could be 
Good'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-6023601363628638909</id><published>2007-06-29T14:14:00.000-05:00</published><updated>2007-06-29T15:22:27.894-05:00</updated><title type='text'>And Now for Something Completely Different....</title><content type='html'>Well, not that different actually.&lt;br /&gt;&lt;br /&gt;As of Monday 2 July I will be moving from Innodata Isogen (www.innodata-isogen.com) to Really Strategies (www.reallysi.com). This move comes after 11 years at ISOGEN in its various incarnations, from a 20-person consultancy to a part of DataChannel (riding that Internet wave) to part of Innodata. It was a great period in my career--I got to do lots of interesting things and work with really great people, but it was time for a change, time to see what else was out there. &lt;br /&gt;&lt;br /&gt;One big change with this job move is that I will, for the first time since becoming a Standards Maven, work for a company with a product. I have until now been a fiercely product-independent integrator, working for companies that had no preferential agreements with any vendors, partners with all. &lt;br /&gt;&lt;br /&gt;Really Strategies has in the last year developed and marketed its RSuite Content Management System, targeted primarily at publishers.&lt;br /&gt;&lt;br /&gt;How do I justify this change as still being consistent with my values and principles as a standards-championing consultant? Am I simply throwing away the reputation I've built as a person who hates all products, for the reasons outlined in the first posts to this blog? I hope not. I'm sure people will tell me if I have. 
&lt;br /&gt;&lt;br /&gt;My justification is driven by the following observations: &lt;br /&gt;&lt;br /&gt;1. My personal situation has changed over the last two years in a way that has forced me to realign my priorities, in particular, having started a family and built a house, I have to give more weight to compensation than I ever had to in the past. In short, I've gone from living well below my means to living just beyond them. And I've got to think about things like gymnastics class and preschool tuition and what it costs to travel with a child and so on. So yes, my convictions have been moderated just a little by demon cash. &lt;br /&gt;&lt;br /&gt;2. For financial and other reasons, I'm not ready to hang up a shingle. It's just too much risk right now. Maybe in a few years.&lt;br /&gt;&lt;br /&gt;3. There are only a few companies that do independent XML-related systems integration in North America and even fewer who could meet my outrageous compensation requirements or otherwise integrate me into their business. &lt;br /&gt;&lt;br /&gt;4. Really Strategies is still first and foremost an integration consultancy that happens to have a product, rather than a product company that has an unavoidable services group. So far they've found that having the product is resulting in much more services work, mostly unrelated to the product. That is, the product is serving as much as a marketing tool as it is as a revenue source (although I understand we're selling a few licenses too, which isn't bad). That means that it's not all about the product, but still about solving client problems.&lt;br /&gt;&lt;br /&gt;5. As ISOGEN was and is, Really Strategies is still about standards and that is reflected in their products as well as their services work (and if it's not, you can be sure I'll have something to say about it, and maybe even something to do about it).&lt;br /&gt;&lt;br /&gt;6. 
Part of my job responsibility will be contributing to the architectural definition of the RSuite product. Because it's a very new product, even if it turns out to be heinous (which I don't think it is but I haven't yet had a chance to look under the hood), there's lots of opportunity to correct it. In addition, it's built on top of MarkLogic, for which I have a great deal of respect, both in terms of its engineering quality and in terms of its respect for and adherence to the standards it implements. Just the fact that the RSuite engineers made that choice is a good sign (it's no guarantee of anything, but there are only a couple of correct choices for a CMS base and a whole lot of wrong ones, and MarkLogic is definitely one of the correct choices, in my opinion).&lt;br /&gt;&lt;br /&gt;Thus, while I will be working for a product company, it's in a situation where standards are still paramount and where I'll personally have an opportunity to express my principles and thoughts and ideas about how a product like RSuite should serve its users and not the other way around. And I suspect that most of my work as an integrator will not be related to RSuite at all, simply because of the nature of the types of projects that Really Strategies tends to sell.&lt;br /&gt;&lt;br /&gt;Time, of course, will tell.&lt;br /&gt;&lt;br /&gt;In any case, I'm very much looking forward to this opportunity to have early and direct influence on a product and not just via bug reports and feature requests and whining at the engineers I happen to know inside the company. 
I'll continue to do that of course, but now I'll be able to whine at engineers I actually work with.&lt;br /&gt;&lt;br /&gt;I will also continue to be involved in standardization activities as much as I can, continuing the work I've been doing with the DITA and XSL-FO standards and probably contributing to other standards that are more relevant to Publishers rather than technical documentors, which has been my primary focus to date.&lt;br /&gt;&lt;br /&gt;And it shouldn't surprise anyone if you start to see stuff about DITA-specific features in RSuite. Just saying....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-6023601363628638909?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/6023601363628638909/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=6023601363628638909' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/6023601363628638909'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/6023601363628638909'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/06/and-now-for-something-completely.html' title='And Now for Something Completely Different....'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-7048873601668583667</id><published>2007-06-29T14:13:00.001-05:00</published><updated>2007-06-29T14:14:36.657-05:00</updated><title type='text'>Comments: My Bad</title><content type='html'>I just realized that since I turned on comment moderation to reduce comment spam, I actually have to go in and moderate comments. Doh!&lt;br /&gt;&lt;br /&gt;My apologies to those who waited a month or more to see their comments approved. I'll try not to let it happen again.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-7048873601668583667?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/7048873601668583667/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=7048873601668583667' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7048873601668583667'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7048873601668583667'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/06/comments-my-bad.html' title='Comments: My Bad'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-4384543215350094723</id><published>2007-06-01T09:43:00.001-05:00</published><updated>2007-06-01T09:43:26.055-05:00</updated><title type='text'>A Message To Warm a Taghead's Heart</title><content type='html'>&lt;div style="float: right; margin-left: 10px; margin-bottom: 10px;"&gt; &lt;a href="http://www.flickr.com/photos/woods-kimber/524854780/" title="photo sharing"&gt;&lt;img src="http://farm1.static.flickr.com/204/524854780_83fcc74e1d_m.jpg" alt="" style="border: solid 2px #000000;" /&gt;&lt;/a&gt; &lt;br /&gt; &lt;span style="font-size: 0.9em; margin-top: 0px;"&gt;  &lt;a href="http://www.flickr.com/photos/woods-kimber/524854780/"&gt;no-xml-response&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;Trying to connect to my Yahoo! Mail account, I got this popup box. I was quite pleased to see that it was upset about &lt;i&gt;not&lt;/i&gt; getting any XML in  the response.&lt;br clear="all" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-4384543215350094723?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/4384543215350094723/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=4384543215350094723' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4384543215350094723'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4384543215350094723'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/06/message-to-warm-taghead-heart.html' title='A Message To Warm a 
Taghead&amp;#39;s Heart'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://farm1.static.flickr.com/204/524854780_83fcc74e1d_t.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-6562963629028344615</id><published>2007-04-24T07:35:00.000-05:00</published><updated>2007-04-24T08:16:00.673-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dita shell specialization configuration &quot;standard practice&quot;'/><title type='text'>DITA Standard Practice: Always Make Local Shells</title><content type='html'>As I started doing serious work with DITA and in particular implementing specializations, it became clear to me that the first thing anyone using DITA should do is make local copies of all the DITA-provided shell DTDs or schemas. This should just be automatic. [Note: I'm going to use the term "DTD" to mean "DTD or schema" from now on.]&lt;br /&gt;&lt;br /&gt;Why?&lt;br /&gt;&lt;br /&gt;DITA, like DocBook, is a generic standard designed specifically to allow controlled &lt;i&gt;local&lt;/i&gt; configuration and extension. Anyone who uses DITA in any non-trivial way will need to do at least some configuration, if not specialization, before too long. It is the rare user of DITA who genuinely needs to use all of the topic types and domains reflected in the base DITA distribution. Even if you're not using any specializations you probably only need some of the domains DITA provides out of the box.&lt;br /&gt;&lt;br /&gt;By "configuration" I mean adjusting the set of topic types and domains that are or are not used for a given document set. 
For example, if you're not documenting interactive software you probably have no use for the User Interface domain and would just as soon not have those element types available to authors. Turning off that domain for your authors is "configuration".&lt;br /&gt;&lt;br /&gt;By "specialization" I mean new domains or topic types derived from the base DITA-defined types (see my &lt;a href="http://www.xiruss.org/tutorials/dita-specialization"&gt;specialization tutorial&lt;/a&gt; for a deeper discussion). Even if you don't develop your own specializations, it is likely that you will use specializations developed by others. This will be increasingly likely as the DITA community begins to develop more and more special-purpose specializations--this is one of the really cool things about DITA--it enables and rewards the creation of "plug-ins" that are relatively easy to create, distribute, and integrate with the base DITA document types and supporting infrastructure.&lt;br /&gt;&lt;br /&gt;In order to either do configuration or use specializations you must create local shell DTDs that reflect the local configuration or integrate the specializations.&lt;br /&gt;&lt;br /&gt;Since you're going to do it sooner or later, you might as well start your DITA life there and be prepared. 
Eat the (minor) pain up front of configuring your local environment to use your local shells and then you're set to go.&lt;br /&gt;&lt;br /&gt;If you set up your local shells first, then as you add new DITA-aware tools to your system, you can simply configure them to use your shells from the get-go, rather than building a system of tools and a set of documents that then have to all be reconfigured later when you finally do implement local shells (or worse, you discover that your system has become such a lava flow that you &lt;i&gt;can't&lt;/i&gt; reconfigure them, meaning that you can't do any configuration or use new specializations because the cost of reconfiguration would be too high or too risky).&lt;br /&gt;&lt;br /&gt;NOTE: when you create local shells you &lt;b&gt;&lt;i&gt;must&lt;/i&gt;&lt;/b&gt; give them unique global identifiers (URIs or PUBLIC IDs). You &lt;b&gt;&lt;i&gt;must not&lt;/i&gt;&lt;/b&gt; refer to them by the DITA-defined URIs or PUBLIC IDs. Local shells are just that, local. You create them, you own them, you name them. You should consider the DITA-defined shells and attendant module and entity files to be invariant, meaning that you should never ever modify them directly, but only use them by reference, configured using the DITA-defined configuration mechanisms (parameter entities for DTDs, named groups for schemas).&lt;br /&gt;&lt;br /&gt;All DITA-capable tools should (dare I say "must"?) be capable of using local shells, otherwise they aren't DITA-capable, QED. Probably the biggest potential problem tool is FrameMaker, but then FrameMaker is something of a special case because it's not a true XML tool and its design makes reconfiguration much more expensive than it is with any other XML editor you're likely to use. 
I'm sure it can be done but I wouldn't want to have to do it (of course, as systems integrator I might be asked to and of course I would do it but that doesn't mean I'd have to &lt;i&gt;like&lt;/i&gt; it).&lt;br /&gt;&lt;br /&gt;For example, I've just gone through the exercise of setting up Arbortext Editor 5.3 to support editing of heavily specialized topic types. Once you know what to do it's not too hard and is reasonably well documented in the online help. The basic process is:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Put each shell DTD in its own directory, named the same as the DTD or schema file. This organization is at least suggested, if not required, by Arbortext, but it's pretty good general practice anyway (even though the base DITA distribution doesn't organize things this way, there's no reason it couldn't and I've suggested that maybe it should, just on general principles of neatness).&lt;/li&gt;&lt;li&gt;Create an old-style (non-XML) OASIS entity catalog for mapping the URIs of your local shell DTDs to their local location. (Arbortext Editor 5.3 doesn't support XML-syntax catalogs.)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;For each topic or map type shell, copy the Arbortext-specific configuration and style files from the Arbortext-supplied DITA doctypes that are the closest match to your local shells. Rename as necessary per the Arbortext naming conventions.&lt;/li&gt;&lt;li&gt;Edit the configuration files to reflect the details of your shells. This is stuff like setting the name used in the New file dialog, pointing to templates and samples, and so on. For specializations you'll need to account for new element types in the editor configuration, style sheets, and whatnot, if they require special handling.&lt;/li&gt;&lt;li&gt;Update the Arbortext Editor catalog path to include your catalog so it can resolve the references to the DTDs.&lt;/li&gt;&lt;/ol&gt;That's it. 
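&lt;br /&gt;&lt;br /&gt;To make step 2 concrete, here is a minimal sketch of an old-style catalog. The public ID, URN, and paths are made-up placeholders for illustration, not identifiers from any real setup; check your editor's documentation for the entry types it actually honors:&lt;br /&gt;&lt;pre&gt;
-- Hypothetical local shell: you own these identifiers, so you name them --
PUBLIC "-//EXAMPLE//DTD Local DITA Concept//EN" "concept/concept.dtd"
SYSTEM "urn:example:dita:dtd:concept.dtd" "concept/concept.dtd"
&lt;/pre&gt;Each entry maps a global identifier on the left to a file location on the right, which by default is resolved relative to the catalog file itself.&lt;br /&gt;&lt;br /&gt;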
I would expect other XML editors to require a similar process (I haven't tried setting up XMetal for these specializations yet so I don't know what its configuration details would be).&lt;br /&gt;&lt;br /&gt;Note too that as long as you are putting your shells in your own directory structures, and not in the dtd/ directory of the DITA Open Toolkit (which you should &lt;b&gt;&lt;i&gt;never&lt;/i&gt;&lt;/b&gt; do), it doesn't matter what you call your shell DTDs. That is, there's no particular reason not to call your local configuration of the concept shell DTD "concept.dtd".&lt;br /&gt;&lt;br /&gt;So if you are a new user of DITA (by which I mean somebody setting up the DITA environment for a defined set of users, not an individual author [unless you are a writing team of one, in which case you are performing both roles]) I strongly urge you to create your own local shell DTDs &lt;i&gt;right now&lt;/i&gt; if you haven't done so already.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-6562963629028344615?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/6562963629028344615/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=6562963629028344615' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/6562963629028344615'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/6562963629028344615'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/04/dita-standard-practice-always-make.html' title='DITA Standard Practice: Always Make Local Shells'/><author><name>Eliot 
Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-4387821096294242097</id><published>2007-04-24T06:27:00.000-05:00</published><updated>2007-04-24T06:49:09.616-05:00</updated><title type='text'>Comment Spam Continued</title><content type='html'>So after turning on comment moderation, I still got two or three spam comments from Blogger.com members (which are not moderated) and blocked two from non-Blogger members. Which means that comment moderation is not very useful. At least the spam is just inane and not particularly offensive.&lt;br /&gt;&lt;br /&gt;My initial take was that it must be humans doing the spamming, but googling on "captcha bypass" quickly leads to information indicating that picture-based captcha can be cracked with 80 to 100% accuracy.&lt;br /&gt;&lt;br /&gt;So I guess there's not much I can do about the spam.&lt;br /&gt;&lt;br /&gt;Hmph.&lt;br /&gt;&lt;br /&gt;It does lead to the idle thought that maybe it will be the spammers who first develop true AI in their quest to win the humans vs bots arms race....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-4387821096294242097?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/4387821096294242097/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=4387821096294242097' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4387821096294242097'/><link 
rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4387821096294242097'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/04/comment-spam-continued.html' title='Comment Spam Continued'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-7632334296081165534</id><published>2007-04-17T16:58:00.000-05:00</published><updated>2007-04-17T17:02:34.131-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='spam'/><title type='text'>Moderating Comments</title><content type='html'>Either spambots have cracked the comment captcha mechanism or humans are being paid to leave comments. 
In any case, I've turned on comment moderation to try to turn off the comment spam.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-7632334296081165534?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/7632334296081165534/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=7632334296081165534' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7632334296081165534'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7632334296081165534'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/04/moderating-comments.html' title='Moderating Comments'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-3077119069303941663</id><published>2007-04-11T07:58:00.000-05:00</published><updated>2007-04-11T08:37:01.669-05:00</updated><title type='text'>XML Documents vs XML Data Packages</title><content type='html'>Both James Clark's recent posts on XML and JSON and some recent attempts I've made to describe what I do professionally with respect to XML led me to realize that there doesn't seem to be an easy way to distinguish between XML documents that are intended primarily to produce human-consumed results (e.g., published books, Web pages, online help, whatever) and XML documents that are purely for program-to-program 
communication (e.g., the use case things like JSON are trying to address more effectively than XML necessarily does).&lt;br /&gt;&lt;br /&gt;Also, there's a part of me that wouldn't mind XML being returned to its more or less strictly document-centered use rather than being the all-purpose data serialization and communication language it's become. Of course that's not really a productive line of thought. &lt;br /&gt;&lt;br /&gt;But it did make me start to think that the stuff that James and others are starting to think about reflects the historical accident that the world, and in particular the Web-based world, needed a more transparent data communication mechanism than things like CORBA and DCOM provided just when XML appeared. People with the requirement saw XML as a way to do what they needed without spending too much time thinking about whether or not it was optimal--it was there and it would work well enough and here we are.&lt;br /&gt;&lt;br /&gt;But it leads me to think that I agree with what I think James is saying: that it's probably not a bad thing to start designing serialization languages that are optimized for the specific tasks of program-to-program communication. &lt;br /&gt;&lt;br /&gt;The existence of such languages would not in any way threaten the status of XML as a language for (human readable) document representation.&lt;br /&gt;&lt;br /&gt;One thing that XML has done is embedded a number of key concepts and practices into the general programming world, such as making a clearer distinction between syntax and abstraction, which sets the base for realizing that once you have the abstraction, the original syntax doesn't matter, which means you can have multiple useful syntaxes for the same abstraction. 
It has made the general notion of serialization to and from abstract data structures via a transparent, human-readable syntax a fundamental aspect of data processing and communication infrastructures.&lt;br /&gt;&lt;br /&gt;I think this means that we are now at a place where the community at large can see how you could refactor the syntax part of the system without the immediate need to refactor the abstractions (which is where most of the code is bound, that is, code that operates on DOM nodes rather than code that operates on SAX events or directly on XML byte sequences).&lt;br /&gt;&lt;br /&gt;But it seems reasonable to me to at least start planning this refactor simply in the name of system optimization. It will probably take 20 years (it took 20 years to go from SGML as published in 1986 to today, when we can clearly understand why XML isn't the best solution for some applications) but it seems doable.&lt;br /&gt;&lt;br /&gt;While the infrastructure for XML is widely deployed and ubiquitous, we also have the advantage that that infrastructure is by and large modular (in the sense that it's provided by more or less pluggable libraries in Java, .NET, C++, and so on) and in languages that are themselves ubiquitous. &lt;br /&gt;&lt;br /&gt;For example, if Java or .NET released core libraries with base support for something like JSON it would not be hard for application programmers to start refactoring their systems to move from using XML for data packages to using JSON. 
Of course the systems using heavyweight things like SOAP would have a harder row to hoe.&lt;br /&gt;&lt;br /&gt;If we take the Flickr API as an example of an XML-based API where something like JSON might be a better fit (or at least simplify or optimize the serialization/deserialization process), it would take a few person months on the Flickr end to provide a JSON version of the API (which would have to live alongside the XML version) and a few person days or weeks for each of the language-specific client-side bindings for the Flickr API to use the JSON version instead of the XML version. At some point, say in a couple of years, the XML version of the API could be retired. That seems like a reasonable refactor cost if the value of using something like JSON is nontrivial (I don't have an opinion on the value of something like JSON in this case--I just don't care enough and I've never had too much patience for "but this is more elegant than that" arguments if that's your *only* argument).&lt;br /&gt;&lt;br /&gt;The Flickr API may be a poor example only in that the data structures communicated are fairly simple, mostly just sets of name/value pairs (metadata on photos) or lists of pictures or sets or users or tags. 
In that use case, XML works as well as anything else.&lt;br /&gt;&lt;br /&gt;But in a more complex use case, where the data structures serialized are more complicated, in the way James talks about, with non-trivial data types and complex composite object structures and whatnot, I can definitely see a purpose built language having real value, primarily in the ease with which programmers doing the serialization/deserialization can both design and understand the mapping from the objects to the serialized form.&lt;br /&gt;&lt;br /&gt;I spent some time working with the STEP standard (ISO 10303), a standard for generic representation of complex data structures, originally designed for interchange of CAD drawings and 3-D models and eventually generalized into a language for general product data interchange. It provides a sophisticated data modeling language. I was involved in the subgroup that was trying to define the XML interchange representation of STEP models. This turned out to be a really hard problem precisely because of the mismatch between XML data structures and data types (String at the time) and the sophisticated STEP models. 
It confirmed what I already knew, which was that mapping abstract data structures to efficient and complete XML representations is hard, and naive approaches based on simple samples will not work.&lt;br /&gt;&lt;br /&gt;That means that a comparable interchange syntax that &lt;i&gt;is&lt;/i&gt; a better match for complex data structures will have value simply by making the conceptual task easier, so that designing and understanding serialization forms is easier than it is using XML.&lt;br /&gt;&lt;br /&gt;And then I can have my XML all to myself for creating "real" documents....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-3077119069303941663?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/3077119069303941663/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=3077119069303941663' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/3077119069303941663'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/3077119069303941663'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/04/xml-documents-vs-xml-data-packages.html' title='XML Documents vs XML Data Packages'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-2497253147241822227</id><published>2007-04-10T16:58:00.000-05:00</published><updated>2007-04-10T17:20:08.070-05:00</updated><title type='text'>Help Me Obi-Wan: Java and Encodings and XML</title><content type='html'>While I consider myself a pretty good Java programmer I don't actually do that much processing of XML with Java and so I've never fully internalized the details of SAX and JAXP and all that. Pretty much I just crib code that will get me a DOM and hope it works or get someone else to implement the fiddly bits.&lt;br /&gt;&lt;br /&gt;But today I ran into a wall and all my fiddly bit colleagues are elsewhere so I thought I would ask my readers for help.&lt;br /&gt;&lt;br /&gt;Here's what I'm trying to do: &lt;br /&gt;&lt;br /&gt;I have XML documents with Arabic content. I read these documents into an internal data structure, do stuff, and write the result out as different XML. Should be easy.&lt;br /&gt;&lt;br /&gt;However, I'm finding several odd things that I don't quite understand:&lt;br /&gt;&lt;br /&gt;1. Text.getData() is *not* returning a sequence of Unicode characters, it is returning a sequence of characters that correspond one-to-one to the bytes of the UTF-8 encoding of the original Unicode characters. &lt;br /&gt;&lt;br /&gt;That threw me because I thought XML data *was* Unicode and therefore Text.getData() should return Unicode characters, not a sequence of single-byte chars. Or have I totally misunderstood how Java manages Strings (I don't think so)?&lt;br /&gt;&lt;br /&gt;This is solved by getting the bytes from the string returned by Text.getData() and reinterpreting them using an InputStreamReader with the encoding set to "utf-8". (Is there a better way? Have I again missed something obvious?)&lt;br /&gt;&lt;br /&gt;2. 
When I save the same document as UTF-16 the DOM construction process fails with "Content not allowed in prolog", which doesn't compute because it's not conceivable that any non-trivial XML parser wouldn't handle UTF-16 correctly.&lt;br /&gt;&lt;br /&gt;3. When re-interpreting the UTF-8 bytes into characters, it mostly works, except that at least one character, \uFE8D (Arabic Letter Alef Isolated Form), whose UTF-8 byte sequence is EF BA 8D, is reported as EF BA EF, which is not a Unicode character and is converted to \uFFFD and "?" by the input stream reader.&lt;br /&gt;&lt;br /&gt;WTF?&lt;br /&gt;&lt;br /&gt;I suspect that I am in fact using a crappy parser but there is so much indirection and layers and IDEs and stuff that it's very difficult, at least for me, to determine which parser I'm using, much less how to control the parser I want to use. I'm developing my code using Eclipse 3.2. I've tried setting my project to both Java 1.4 and 5.0 with no change in behavior.&lt;br /&gt;&lt;br /&gt;For this project I have the Xerces 2.9.0 library (as reported by org.apache.xerces.impl.Version) in my classpath.&lt;br /&gt;&lt;br /&gt;Does anyone have any idea what might be going on here?&lt;br /&gt;&lt;br /&gt;Any help or pointers on what I might be doing wrong or how to fix it?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-2497253147241822227?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/2497253147241822227/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=2497253147241822227' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2497253147241822227'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/22194031/posts/default/2497253147241822227'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/04/help-me-obi-wan-java-and-encodings-and.html' title='Help Me Obi-Wan: Java and Encodings and XML'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-4213172201224078331</id><published>2007-04-06T10:40:00.000-05:00</published><updated>2007-04-06T10:44:13.154-05:00</updated><title type='text'>James Clark in the House</title><content type='html'>&lt;a href="http://norman.walsh.name/2007/04/06/JamesClark"&gt;Norm Walsh reports that James Clark has entered the blogosphere&lt;/a&gt;: &lt;a href="http://blog.jclark.com/"&gt;http://blog.jclark.com/&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;Let me add my welcome to Norm's. 
You can bet that I'm subscribed....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-4213172201224078331?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/4213172201224078331/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=4213172201224078331' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4213172201224078331'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4213172201224078331'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/04/james-clark-in-house.html' title='James Clark in the House'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-9163611265785690818</id><published>2007-04-06T10:07:00.000-05:00</published><updated>2007-04-06T10:26:43.105-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dita specialization ditaopentoolkit'/><title type='text'>DITA Specialization Tutorial Now on Xiruss.org</title><content type='html'>I have started writing a more complete DITA specialization tutorial, which will eventually cover all aspects of DITA specialization and likely lead to additional tutorials on other aspects of using DITA (using the DITA Open Toolkit, writing a Toolkit plug-in, etc.).&lt;br /&gt;&lt;br /&gt;The tutorial itself is published on my xiruss.org site 
here: &lt;a href="http://www.xiruss.org/tutorials/dita-specialization/"&gt;http://www.xiruss.org/tutorials/dita-specialization/&lt;/a&gt;, including a package with all the source materials as well as the generated HTML version.&lt;br /&gt;&lt;br /&gt;The source materials are managed for development in the XIRUSS Subversion repository here: &lt;a href="http://xiruss-t.svn.sourceforge.net/viewvc/xiruss-t/specialization_tutorial/"&gt;http://xiruss-t.svn.sourceforge.net/viewvc/xiruss-t/specialization_tutorial/&lt;/a&gt; should you for some reason want to track the development of the files or get the very latest stuff (can't imagine why but who knows?) or just get a particular file without downloading the whole package.&lt;br /&gt;&lt;br /&gt;The tutorial includes an improved version of the DITA attribute domain specialization tutorial I posted here a while back.&lt;br /&gt;&lt;br /&gt;It is of course written as a set of DITA topics, which is interesting in and of itself because a tutorial is a type of document for which the DITA concept/task/reference and highly fragmented presentation paradigms are not necessarily a good match. For example, I discovered that the only way to get prev/next links from one topic to the next within a logical narrative sequence of topics is to set their parent container in the organizing map to "sequence". However, this has the effect of numbering each topic in the sequence, which makes sense for the topics that represent a logical sequence of steps within the tutorial, but not for the purely conceptual overview of what DITA specialization is. 
(This is what the DITA Open Toolkit does today--whether this behavior is required by the DITA spec is a more subtle question.)&lt;br /&gt;&lt;br /&gt;So it raises some issues, like do we need a tutorial-specific set of specializations and corresponding rendering customizations to get the effects I want as a tutorial author, or does the DITA spec need to be refined to reflect these sorts of more subtle rhetorical distinctions? Are my topics that describe a sequence of steps to be performed really task or concept topics (I've coded them as concepts because even in DITA 1.1, the task topic type is too restrictive in the way it represents sequences of steps)?&lt;br /&gt;&lt;br /&gt;This makes the activity more fun than it would otherwise be--I always like it when the things I do result in both concrete products (a useful tutorial) and help to advance the state of our understanding and, hopefully, the supporting infrastructure, in this case, by serving as an experiment in applying DITA to a type of information for which it was not directly designed (not that I'm the first to create tutorials in DITA, or even the first to think about it--see discussion around this on the DITA Users Yahoo group--but as an informal, spare-time activity, this tutorial provides more opportunity for both introspection about the process and methods and, because it's public, more opportunity for community involvement).&lt;br /&gt;&lt;br /&gt;I've also learned a lot about using DITA and hacking the Toolkit and stuff, which makes it fun.&lt;br /&gt;&lt;br /&gt;Now if I could just stop waking up at 5:30 a.m. to work on the thing (It's not that I want to wake up at 5:30, it's just that once I am awake and my brain starts spinning I can't go back to sleep, so I am compelled to start working. 
Good for productivity, bad for physical and mental health.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-9163611265785690818?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/9163611265785690818/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=9163611265785690818' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/9163611265785690818'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/9163611265785690818'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/04/dita-specialization-tutorial-now-on.html' title='DITA Specialization Tutorial Now on Xiruss.org'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-7758984161297904023</id><published>2007-03-22T10:53:00.000-05:00</published><updated>2007-03-22T11:30:36.143-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dita cms content management ditamap link linkmanagement specialization generalization'/><title type='text'>CMS Requirements for DITA</title><content type='html'>At last night's Central Texas DITA User's Group we had a nice presentation from France Baril of &lt;a href="http://www.ixiasoft.com/"&gt;Ixiasoft&lt;/a&gt; on some of the challenges that authoring DITA documents can pose, in particular the need to be able to 
find topics and know what the dependencies among topics are as you revise the topics through the life cycle of the documentation set.&lt;br /&gt;&lt;br /&gt;This sparked a discussion on some basic requirements for CMS systems that provide DITA-specific features. In addition, one of my colleagues is doing a DITA CMS project for one of our clients and he and I got to talking about what the CMS they're implementing did and didn't do with the DITA data, which revealed that the CMS vendor was perhaps not displaying as much insight into and imagination about how DITA should be managed as it could be.&lt;br /&gt;&lt;br /&gt;So I thought I would try to outline what I think the key non-obvious DITA content management features are that any CMS claiming to provide DITA support should provide. I will not state what should be obvious requirements related to the creation and management of links, the ability to search on content and metadata, and so on.&lt;br /&gt;&lt;br /&gt;See my earlier posts tagged XCMTDMW for a discussion of general XML content management requirements. Those requirements are the base from which these DITA requirements start. Therefore I won't state obvious things like XML-aware query, basic link management, and so on.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Map-Related Requirements&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Maps are a key feature of DITA and the management of maps is essential to productive use of DITA. Key map-related features are:&lt;br /&gt;&lt;br /&gt;M.1 - Import of an entire map as a single action. Given a map, the system should be able to import the map and all maps and topics it directly or indirectly links to as a single action. The system should provide options for how imported maps and topics are organized into whatever the CMS' organization mechanism is (folders, cabinets, whatever). 
The system should provide options for how to handle the import of link targets that are not of scope "local", including the creation of proxy topics for locally unavailable targets. Following import, all links in the imported content must resolve correctly to their targets &lt;span style="font-style:italic;"&gt;as imported&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;M.2 - Export of an entire map as a single action. Given a map in the CMS, the system should be able to export to some location outside the CMS (the filesystem, a Zip file, a WebDAV repository, etc.) the map and all of its direct or indirect dependencies. Following export, all the links in the exported content should resolve correctly to their targets &lt;span style="font-style:italic;"&gt;as exported&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;M.3 - Map-based views. All CMS operations that involve access to topics should require or allow the selection of a map that establishes the "map context" for the operation such that the operation only reflects those subordinate maps and topics that are within the direct or indirect scope of the map. For example, you should be able to select a map and then do queries that only return topics in the map's scope or, when creating direct links, only provide as candidate targets those topics that are in the scope of the map.&lt;br /&gt;&lt;br /&gt;M.4 - Map view of everything. The CMS should be closed over maps such that all DITA-related content is presented in a map (for example, the system synthesizes a map that includes all topics in the repository or all topics within the scope of a particular CMS-specific organizing structure). By the same token, the results of queries should be viewable as DITA maps (that is, given a query result, it is either literally returned as a map to which the normal CMS map functionality is applied or the CMS provides a "save as map" option).&lt;br /&gt;&lt;br /&gt;M.5 - Support for compound maps. 
CMS must support the use of topicref format="ditamap" to construct "compound maps". Any CMS functionality that modifies existing maps must preserve any pre-existing map-to-map relationships. Any CMS functionality that creates new maps should provide features for creating subordinate maps.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Specialization-Related Features&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;S.1 - All processing applied to specializations automatically. Any CMS functionality that is specific to a DITA-defined base type should be automatically applied to specialized elements. For example, if the CMS provides a feature for importing maps, it should automatically provide that feature for importing any specialization of map. If the CMS provides some form of configuration for mapping from specific element types to generic CMS functionality (for example, indicating which elements are links), such a mapping defined for the DITA base types should automatically be applied to all specialized documents without additional user or configuration effort.&lt;br /&gt;&lt;br /&gt;That is, one of the main points of specialization is that DITA-specific processing &lt;span style="font-weight:bold;"&gt;&lt;span style="font-style:italic;"&gt;just works&lt;/span&gt;&lt;/span&gt; for any specialized element. This is how the DITA Open Toolkit works and all other DITA-aware tools should too.&lt;br /&gt;&lt;br /&gt;Or said another way, DITA awareness means "specialization awareness" in addition to whatever else it might mean.&lt;br /&gt;&lt;br /&gt;S.2 - Capture and maintain the dependency relationships among shell document types and the base and specialized modules they use. 
For example, if I import a shell document type that includes a specialization module I created, the CMS should capture the dependency between the shell and the specialization module as well as the dependencies from the specialization module to the base DITA-provided modules.&lt;br /&gt;&lt;br /&gt;S.3 - Specialization project management. System should provide features for managing the components of specialization modules as "projects" such that there is clear binding between the specialization module name and the specific implementation components that make it up. This project manager should reflect and, as appropriate, enforce (or at least encourage and reward) the implementation design patterns defined by the DITA architecture. This management should include tracking dependencies among specialization schema components (that is, from local specializations to the DITA-provided modules they depend on).&lt;br /&gt;&lt;br /&gt;S.4 - Generalized views. System should provide ability to see, on demand, a generalized view of a given map or topic. It should provide a way to select the level of generalization desired. 
This view should be read-only by default but should allow for saving the generalized view as a new object.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-7758984161297904023?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/7758984161297904023/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=7758984161297904023' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7758984161297904023'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7758984161297904023'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/03/cms-requirements-for-dita.html' title='CMS Requirements for DITA'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-4591237820608691778</id><published>2007-03-15T11:59:00.000-05:00</published><updated>2007-04-06T10:30:15.325-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dita specialization props configuration integration'/><title type='text'>Tutorial: Specializing DITA Conditional Attributes</title><content type='html'>[5 April 2007: This tutorial has been incorporated in my more complete and formal DITA Specialization Tutorial hosted here: &lt;a 
href="http://www.xiruss.org/tutorials/dita-specialization/"&gt;http://www.xiruss.org/tutorials/dita-specialization/&lt;/a&gt;.]&lt;br /&gt;&lt;br /&gt;A new feature in DITA 1.1 is the ability to specialize from the base= and props= attributes. For conditional processing, this lets you add your own attributes rather than using otherprops=, which can be clearer to authors and implementors. [NOTE: at the time of writing the DITA Open Toolkit does not implement support for specializations of props=, but it should be added soon.]&lt;br /&gt;&lt;br /&gt;This form of specialization is fairly easy to implement. This tutorial shows you how to do it using DTDs (the mechanism using Schemas is essentially the same and if you've stepped up to using the DITA 1.1 schemas I'm going to presume you can figure this out on your own).&lt;br /&gt;&lt;br /&gt;The specialization requires two things:&lt;br /&gt;&lt;br /&gt;1. Modification of any shell DTDs that need to reflect the specialized attribute (e.g., topic.dtd, reference.dtd, or your own specialized topic types' shell DTDs). You integrate the specialization attribute domain through the shell DTDs.&lt;br /&gt;&lt;br /&gt;2. For each specialization of props=, a .ent declaration set that defines the attribute and a corresponding domain declaration. This is the "attribute domain specialization module".&lt;br /&gt;&lt;br /&gt;Note that as a rule, any production use of DITA will likely require local versions of the DITA-provided shell DTDs, if only to do configuration of the domains you need, so unless you are using DITA very informally, you should already have local copies of all the DITA-provided shell DTDs. Just saying.&lt;br /&gt;&lt;br /&gt;For this tutorial we want to create a specialization of "props=" called "phase-of-moon" that takes as its value one or more moon phase names (e.g., "full", "new", "waning", "waxing", etc.). We will call our domain "moonPhaseProp". 
(Domains must have unique names within the scope of the shell DTDs or schemas that use them.)&lt;br /&gt;&lt;br /&gt;For organizing the files, I like to create a separate directory to put my local shell DTDs and specializations in. For this tutorial assume we're putting everything in the directory dtd/myspecs within the normal DITA Open Toolkit distribution structure.  (It can go anywhere as long as you configure the entity resolution catalogs appropriately, but for initial development and testing I find it convenient to use relative paths to the various declaration components as that eliminates a variable from the configuration (resolution via catalogs) that can lead to confusing errors. Once you've established that the declarations are correct you should replace all relative paths with absolute URLs or (if you must) PUBLIC IDs that are resolved via catalogs. For my development work I use the OxygenXML editor, which makes it easy to set up catalog configurations for testing resolution via catalogs (and generally testing the correctness of all the parts). Similar tools like XML Spy are probably comparable (but I don't use them).)&lt;br /&gt;&lt;br /&gt;Step 1 is to create the attribute domain declaration:&lt;br /&gt;&lt;br /&gt;1.a. Create a file named moonPhasePropsDomain.ent&lt;br /&gt;&lt;br /&gt;1.b. 
In that file, create these two declarations:&lt;pre&gt;&lt;br /&gt;&amp;lt;!ENTITY % moon-phase-props-d-attribute &lt;br /&gt;   "phase-of-moon &lt;br /&gt;     CDATA &lt;br /&gt;     #IMPLIED&lt;br /&gt;   " &lt;br /&gt;&gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;!ENTITY moon-phase-props-d-att&lt;br /&gt;  "a(props  phase-of-moon)"&lt;br /&gt;&gt;&lt;/pre&gt;The first declaration declares the "phase-of-moon=" attribute and puts it in a parameter entity so we can add it to the DITA-defined &lt;code&gt;%selection-atts&lt;/code&gt; parameter entity via the &lt;code&gt;%props-attribute-extensions&lt;/code&gt; configuration parameter entity.&lt;br /&gt;&lt;br /&gt;The second declaration is the domain declaration string for the attribute domain. It will be added to the value of the "domains=" attribute declared for each topic-type element type.&lt;br /&gt;&lt;br /&gt;You should of course add an appropriate descriptive header to the file as well as a little documentation for the attribute itself.&lt;br /&gt;&lt;br /&gt;This is all that is required for the attribute domain module.&lt;br /&gt;&lt;br /&gt;Step 2 is to integrate the domain into your local copy of each shell DTD. The pattern is the same for each shell. For this tutorial I'm using a copy of the topic.dtd shell.&lt;br /&gt;&lt;br /&gt;2.a. Find the comment that reads "DOMAIN ATTRIBUTE DECLARATIONS". Following that comment, add this declaration:&lt;pre&gt;&amp;lt;!ENTITY % moon-phase-props-d-dec     &lt;br /&gt;  SYSTEM "moonPhasePropsDomain.ent"                                                &lt;br /&gt;&gt;&lt;br /&gt;%moon-phase-props-d-dec;&lt;br /&gt;&lt;/pre&gt;This pulls in the attribute domain module.&lt;br /&gt;&lt;br /&gt;2.b. Find the comment that reads "DOMAIN ATTRIBUTE EXTENSIONS". Following that comment you should see a declaration for the &lt;code&gt;%props-attribute-extensions&lt;/code&gt; parameter entity. 
It will probably be declared as an empty string.&lt;br /&gt;&lt;br /&gt;Modify the entity replacement text to include a reference to the &lt;code&gt;%moon-phase-props-d-attribute&lt;/code&gt; parameter entity:&lt;pre&gt;&amp;lt;!ENTITY % props-attribute-extensions  &lt;br /&gt;  "%moon-phase-props-d-attribute;"&lt;br /&gt;&gt;&lt;/pre&gt;This adds the "phase-of-moon=" attribute to the &lt;code&gt;%selection-atts&lt;/code&gt; parameter entity which is then included in the &lt;code&gt;%univ-atts&lt;/code&gt; parameter entity, making this new attribute available on most elements (some elements, such as title, are not selection candidates).&lt;br /&gt;&lt;br /&gt;2.c. Find the comment that reads "DOMAINS ATTRIBUTE OVERRIDE". Following that you should see the declaration of the text entity &lt;code&gt;included-domains&lt;/code&gt; and it should include references to a number of "&lt;i&gt;x&lt;/i&gt;-d-att" text entities.&lt;br /&gt;&lt;br /&gt;To this entity add a reference to the &lt;code&gt;moon-phase-props-d-att&lt;/code&gt; text entity:&lt;pre&gt;&amp;lt;!ENTITY included-domains &lt;br /&gt;  "&amp;hi-d-att;&lt;br /&gt;   &amp;ut-d-att;&lt;br /&gt;   &lt;b&gt;&amp;moon-phase-props-d-att;&lt;/b&gt;&lt;br /&gt;   "                &lt;br /&gt;&gt;&lt;/pre&gt;This formally declares your props= attribute specialization so that DITA 1.1 processors will know that "phase-of-moon=" is in fact a conditional attribute and that they should filter on it as appropriate.&lt;br /&gt;&lt;br /&gt;That's all there is to it. Now just repeat Step 2 for each shell DTD you use and you're done.&lt;br /&gt;&lt;br /&gt;Step 3 is to test your declarations to make sure they work. 
This is simply a matter of creating an XML document that uses your local shell DTD as its DTD and verifying that the "phase-of-moon=" attribute is now available on all elements that allow the selection attributes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-4591237820608691778?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/4591237820608691778/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=4591237820608691778' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4591237820608691778'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4591237820608691778'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/03/tutorial-specializing-dita-conditional.html' title='Tutorial: Specializing DITA Conditional Attributes'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-1650780749100070674</id><published>2007-03-07T10:21:00.000-06:00</published><updated>2007-03-07T10:24:43.070-06:00</updated><title type='text'>Tagging the Old Posts</title><content type='html'>Now that Blogger lets you tag your posts with descriptive tags I'm going through and tagging all my old posts to help with retrieval (which is pretty bad right now--not sure how to address that without creating some sort of hand-crafted index 
over the posts).&lt;br /&gt;&lt;br /&gt;If you're subscribed to this blog as a feed this may cause you to get re-fed all the old posts. &lt;br /&gt;&lt;br /&gt;If this does happen, I apologize for any stuffing of feed reader inboxes this causes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-1650780749100070674?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/1650780749100070674/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=1650780749100070674' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/1650780749100070674'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/1650780749100070674'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/03/tagging-old-posts.html' title='Tagging the Old Posts'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-1735902822671185778</id><published>2007-03-07T10:06:00.000-06:00</published><updated>2007-03-07T10:18:46.318-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dita docbook specialization'/><title type='text'>Nothing To Say?</title><content type='html'>I haven't posted in a long while, what with the holidays and work and vacation and being sick and ...&lt;br /&gt;&lt;br /&gt;Since the first of the year I've been very busy with 
client work, most of it DITA-related (creating sophisticated specializations, doing data analysis of documents that don't have an obvious mapping to DITA, etc.). Very interesting stuff--I've learned a lot about DITA and the Open Toolkit and XSD schemas but nothing that translates directly into pithy blog posts (although I do plan to write a tutorial on creating DITA specializations, which turns out to be remarkably easy once you get the pattern down).&lt;br /&gt;&lt;br /&gt;In the meantime, I haven't really been doing much with any interesting technology nor have I seen much interesting coming down the pike (although Mike Kay's recent posting about assertions in XML Schema is pretty interesting--that could be very powerful if the Working Group can get it right). [And let me say that all the WS-* and identity standards stuff just bores me so totally to tears that I can't stand it--I'm sure it's important stuff but I just don't see how at the end of the day it's really going to matter much to our day to day and if it does I'm certainly not going to be anything other than a naive user of it....]&lt;br /&gt;&lt;br /&gt;So I thought I should post something just to remind people that I'm still out here.&lt;br /&gt;&lt;br /&gt;Some of the topics that are on my list to talk about, but that will require a good bit of time to discuss clearly and cogently, include:&lt;br /&gt;&lt;br /&gt;- Why Norm just doesn't get what's wrong with DocBook and right with DITA, namely specialization&lt;br /&gt;&lt;br /&gt;- So much more the DITA Open Toolkit could do with relationship tables&lt;br /&gt;&lt;br /&gt;- Using DITA maps to model time-specific versions and similar configurations&lt;br /&gt;&lt;br /&gt;- Reforming DITA's linking semantics and addressing infrastructure (a road map for DITA 2.0)&lt;br /&gt;&lt;br /&gt;So here's hoping I have a little more time to write about these things in the future...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/22194031-1735902822671185778?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/1735902822671185778/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=1735902822671185778' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/1735902822671185778'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/1735902822671185778'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/03/nothing-to-say.html' title='Nothing To Say?'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-2278690622708296813</id><published>2007-01-15T11:04:00.000-06:00</published><updated>2007-01-15T12:02:45.921-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='edubuntu ubuntu linux children education'/><title type='text'>Edubuntu: Remarkably easy to set up and use</title><content type='html'>In the spirit of other bloggers in the XML space who have recently talked about their personal experiences with technology and their children and/or Linux, I thought I would mention my experience over the weekend setting up a computer for my daughter.&lt;br /&gt;&lt;br /&gt;My daughter has just turned three and is starting to learn her letters and numbers and how to spell a few words (e.g., her name). 
I decided it was time to get her her own computer but being cheap I didn't want to go so far as to actually buy one, especially not when I have a veritable scrapyard of old PCs and parts at home.&lt;br /&gt;&lt;br /&gt;As it happens, we moved the Austin office of Innodata to new space last week and as a side effect I got to take home an ancient dual-proc PIII machine. So I decided yesterday, a cold rainy day, to try to build an &lt;a href="http://www.edubuntu.org/"&gt;Edubuntu&lt;/a&gt; machine. Edubuntu is a configuration of &lt;a href=""&gt;Ubuntu&lt;/a&gt; Linux specially designed for kids and classroom use. It comes with a number of educational applications and games, including Tuxpaint, which is perfect for Dada as she learns to use the mouse and keyboard. There are some nice little learn-to-use-the-keyboard-and-mouse games as well.&lt;br /&gt;&lt;br /&gt;I also had an LCD display that I wasn't using (in our new house there's really no need for a dedicated desktop and we don't really need or want docking stations for our laptops so the display was only being used as a console for the network firewall machine, which I needed maybe twice a year). &lt;br /&gt;&lt;br /&gt;The machine (which had been named "Doublebot" back when it was a development support box) wouldn't come on so I pulled the power supply out of my old game machine desktop [an AMD box I built some years ago--it had gotten flaky but by that time I was in the process of becoming a parent and long hours of gaming in a room by myself were not really relevant to me any more] and slapped it into Doublebot, along with a wireless PCI card and the not-quite-as-ancient video card from the old game machine. During this time I was also downloading the bootable CD image for Edubuntu. It did take me a while to figure out how to cable up the various drives but I did eventually get all the jumpers set right and the cables hooked up correctly. 
Finally the machine got to the point where it was correctly recognizing the drives and trying to boot from them (the hard drive in the machine didn't have a usable operating system on it). &lt;br /&gt;&lt;br /&gt;By the time I got the hardware going the CD image had downloaded and I burned it to a disk. Popped the disk in the drive and it booted right up. The network connection worked, the screen resolution was correct, all the devices were recognized. It just worked. Then I just selected the "install" option and it put itself on the disk drive--I didn't have to do anything beyond select my language and keyboard layout. I let it set up the disk partition for me (I've spent so many hours over the last 10 years or so configuring disk partitions, hours that I'll never get back). I ran the software update, which updated everything to the latest versions, added a few more packages that I wanted, and verified that all the kid stuff worked.&lt;br /&gt;&lt;br /&gt;I put the covers back on and set it up in the living room on Dada's little table. Booted it up and showed her how to log in (since she can spell her name she can log in herself, although she is still getting used to seeing dots instead of letters when she puts in her password). She easily spent three hours yesterday playing with Tuxpaint. She got the basic mouse skills remarkably quickly, given that she'd never really used a mouse before, although she still needs help with selecting stuff (and she can't read the message boxes that come up when she accidentally clicks on things like "save" or "exit"). She can also use Tuxpaint to type words, which she likes to do.&lt;br /&gt;&lt;br /&gt;I can't tell you how many times I've installed Linux or Windows over the years and this was by far and away the easiest it's ever been--I don't think it could have been any easier unless it had just magically appeared on the hard drive without any physical intervention from me.  
Of course I was using a very old computer with fairly old components (the newest part was probably the wireless PCI card and that was at least two years old), so it's no surprise that there were no driver problems or anything, but just the fit and finish was so much better than I've ever seen from a Linux distribution before. I also liked the window environment (I assume it's KDE but I really don't know what it is), partly because it's very close to Windows, which means it looks and behaves like I expect it to.&lt;br /&gt;&lt;br /&gt;The only other thing I did was install secure shell so I could connect to the machine remotely (using Cygwin and Cygwin X11 under Windows) and that was as easy as could be using the Synaptic package manager (of course, I did know what I was doing at that point, having configured a few Linux boxes in my day).&lt;br /&gt;&lt;br /&gt;I would like to see more games and applications for pre-literate children, but I know that that's a lot to ask of the open source community. 
But I would be willing to pay a fair price for applications that run under Linux (just as I would for Windows-based apps).&lt;br /&gt;&lt;br /&gt;Coupled with the latest versions of Open Office, which seems to finally be able to really handle MS Office stuff completely enough, it might be time to take another look at going to Linux (something I did some years ago but finally got beaten down, in particular by the lack of a version of Arbortext Editor that would run on Linux, back when Arbortext Editor was central to a lot of my work as an integrator, as well as a change in the pricing for VMWare, which enabled running Windows in a virtual machine).&lt;br /&gt;&lt;br /&gt;Hmmm...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-2278690622708296813?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/2278690622708296813/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=2278690622708296813' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2278690622708296813'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/2278690622708296813'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/01/edubuntu-remarkably-easy-to-set-up-and.html' title='Edubuntu: Remarkably easy to set up and use'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-6985903412894171019</id><published>2007-01-05T16:41:00.000-06:00</published><updated>2007-03-07T10:20:42.979-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xinclude use-by-reference specialization'/><title type='text'>Specializing xi:include</title><content type='html'>I've posted before about how useful it is to specialize the XInclude include element--it makes authoring easier, it lets you define constraints on what can be referenced, etc. &lt;br /&gt;&lt;br /&gt;But until now I'd not really appreciated another serious benefit: It avoids ambiguous content models.&lt;br /&gt;&lt;br /&gt;I ran into this in the process of modifying the DocBook 5.0RC1 XSD schemas to add xincludes. The obvious approach of just adding xi:include wherever something that could be included is allowed &lt;b&gt;&lt;i&gt;did not work&lt;/i&gt;&lt;/b&gt; because it created all sorts of ambiguity problems. 
Doh!&lt;br /&gt;&lt;br /&gt;Consider this content model from DocBook schemas (somewhat modified by me for my local use):&lt;pre&gt;&amp;lt;xs:sequence&gt;&lt;br /&gt;&amp;lt;xs:choice minOccurs="0" maxOccurs="unbounded"&gt;&lt;br /&gt;  &amp;lt;xs:element ref="docbook:glossary"/&gt;&lt;br /&gt;  &amp;lt;xs:element ref="docbook:bibliography"/&gt;&lt;br /&gt;  &amp;lt;xs:element ref="docbook:index"/&gt;&lt;br /&gt;  &amp;lt;xs:element ref="docbook:toc"/&gt;&lt;br /&gt;&amp;lt;/xs:choice&gt;&lt;br /&gt;&amp;lt;xs:choice&gt;&lt;br /&gt;  &amp;lt;xs:sequence&gt;&lt;br /&gt;    &amp;lt;xs:group ref="dbparms:all_blocks" maxOccurs="unbounded"/&gt;&lt;br /&gt;    &amp;lt;xs:element minOccurs="0" maxOccurs="unbounded" ref="docbook:section"/&gt;&lt;br /&gt;  &amp;lt;/xs:sequence&gt;&lt;br /&gt;  &amp;lt;xs:sequence&gt;&lt;br /&gt;    &amp;lt;xs:element maxOccurs="unbounded" ref="docbook:section"/&gt;&lt;br /&gt;  &amp;lt;/xs:sequence&gt;&lt;br /&gt;&amp;lt;/xs:choice&gt;&lt;br /&gt;&amp;lt;xs:choice minOccurs="0" maxOccurs="unbounded"&gt;&lt;br /&gt;  &amp;lt;xs:element ref="docbook:glossary"/&gt;&lt;br /&gt;  &amp;lt;xs:element ref="docbook:bibliography"/&gt;&lt;br /&gt;  &amp;lt;xs:element ref="docbook:index"/&gt;&lt;br /&gt;  &amp;lt;xs:element ref="docbook:toc"/&gt;&lt;br /&gt;&amp;lt;/xs:choice&gt;&lt;br /&gt;&amp;lt;/xs:sequence&gt;&lt;br /&gt;&lt;/pre&gt;The intuitive thing would be to allow xi:include in each place where section or section-like things are allowed.&lt;br /&gt;&lt;br /&gt;But this creates a horribly ambiguous content model. 
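&lt;br /&gt;&lt;br /&gt;To make the problem concrete, here is a rough sketch of the naive change applied to the middle choice above. It assumes, purely for illustration, that xi:include has also been added as a member of dbparms:all_blocks and that the xi prefix is bound to the XInclude namespace; neither detail comes from the distributed schemas:&lt;pre&gt;&amp;lt;xs:choice&gt;&lt;br /&gt;  &amp;lt;xs:sequence&gt;&lt;br /&gt;    &amp;lt;!-- assumption: all_blocks now also admits xi:include --&gt;&lt;br /&gt;    &amp;lt;xs:group ref="dbparms:all_blocks" maxOccurs="unbounded"/&gt;&lt;br /&gt;    &amp;lt;xs:choice minOccurs="0" maxOccurs="unbounded"&gt;&lt;br /&gt;      &amp;lt;xs:element ref="docbook:section"/&gt;&lt;br /&gt;      &amp;lt;xs:element ref="xi:include"/&gt;&lt;br /&gt;    &amp;lt;/xs:choice&gt;&lt;br /&gt;  &amp;lt;/xs:sequence&gt;&lt;br /&gt;  &amp;lt;xs:sequence&gt;&lt;br /&gt;    &amp;lt;xs:choice maxOccurs="unbounded"&gt;&lt;br /&gt;      &amp;lt;xs:element ref="docbook:section"/&gt;&lt;br /&gt;      &amp;lt;xs:element ref="xi:include"/&gt;&lt;br /&gt;    &amp;lt;/xs:choice&gt;&lt;br /&gt;  &amp;lt;/xs:sequence&gt;&lt;br /&gt;&amp;lt;/xs:choice&gt;&lt;br /&gt;&lt;/pre&gt;Under those assumptions a leading xi:include could match either all_blocks in the first branch or the xi:include particle in the second, and an xi:include that follows a block could be either another block or a stand-in for a section. A validator can't pick a particle without looking ahead, which is what XSD's Unique Particle Attribution rule forbids.&lt;br /&gt;&lt;br /&gt;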
Now I happen to think that the ambiguity rules are completely bogus; nevertheless, having chosen to live in XSD land I'm stuck with them (at least for now).&lt;br /&gt;&lt;br /&gt;But it should be immediately obvious that if we specialize xi:include to reflect the specific element types of the things we want to include, for example docbook:section_include, then the ambiguity problem goes away because you'll be adding tokens with the same distinction as the existing tokens, so you can never create an ambiguity that wasn't already there.&lt;br /&gt;&lt;br /&gt;I also observe that since xi:include's complex type is named, you can do the specialization formally using substitution groups at the XSD level. Hmmm.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-6985903412894171019?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/6985903412894171019/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=6985903412894171019' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/6985903412894171019'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/6985903412894171019'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/01/specializing-xiinclude.html' title='Specializing xi:include'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-1316749407161763328</id><published>2007-01-05T14:18:00.000-06:00</published><updated>2007-03-07T10:26:05.564-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='docbook schemas namespaces'/><title type='text'>DocBook, Schemas, and Customization</title><content type='html'>Back in March I was experimenting with trying to put the CALS table element types into their own namespace and then using those types from the context of a different namespace, but with elements from the using namespace allowed in the content of table cells. That led me to ask how best to do it on the XML Schema developer list: &lt;a href="http://lists.w3.org/Archives/Public/xmlschema-dev/2005Mar/0076.html"&gt;http://lists.w3.org/Archives/Public/xmlschema-dev/2005Mar/0076.html&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;If I understand the result of that discussion, the solution was to define an element type that is allowed w/in the cell element that is the head of a substitution group. You make that element abstract so that it can't itself be used in instances. To customize the elements allowed within the table cell you create your own subtype of the abstract element and give it whatever content model you want and put your type in the substitution group of the base type. 
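&lt;br /&gt;&lt;br /&gt;A minimal sketch of that pattern, using hypothetical names throughout (tbl:entry, tbl:cellContent, my:cellBody, my:para, and my:list are placeholders, not names taken from the CALS or DocBook schemas), with the two fragments living in separate schema documents:&lt;pre&gt;&amp;lt;!-- table schema (tbl: namespace): abstract head of the substitution group --&gt;&lt;br /&gt;&amp;lt;xs:element name="cellContent" abstract="true"/&gt;&lt;br /&gt;&amp;lt;xs:element name="entry"&gt;&lt;br /&gt;  &amp;lt;xs:complexType&gt;&lt;br /&gt;    &amp;lt;xs:sequence&gt;&lt;br /&gt;      &amp;lt;xs:element ref="tbl:cellContent"/&gt;&lt;br /&gt;    &amp;lt;/xs:sequence&gt;&lt;br /&gt;  &amp;lt;/xs:complexType&gt;&lt;br /&gt;&amp;lt;/xs:element&gt;&lt;br /&gt;&lt;br /&gt;&amp;lt;!-- using schema (my: namespace): concrete member with its own content model --&gt;&lt;br /&gt;&amp;lt;xs:element name="cellBody" substitutionGroup="tbl:cellContent"&gt;&lt;br /&gt;  &amp;lt;xs:complexType&gt;&lt;br /&gt;    &amp;lt;xs:choice minOccurs="0" maxOccurs="unbounded"&gt;&lt;br /&gt;      &amp;lt;xs:element ref="my:para"/&gt;&lt;br /&gt;      &amp;lt;xs:element ref="my:list"/&gt;&lt;br /&gt;    &amp;lt;/xs:choice&gt;&lt;br /&gt;  &amp;lt;/xs:complexType&gt;&lt;br /&gt;&amp;lt;/xs:element&gt;&lt;br /&gt;&lt;/pre&gt;In an instance the cell then reads tbl:entry, then my:cellBody, then the actual content.&lt;br /&gt;&lt;br /&gt;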
Whew.&lt;br /&gt;&lt;br /&gt;This works, but has the (potential) problem that it requires an extra, otherwise unnecessary, level of containment &lt;i&gt;if&lt;/i&gt; you want the name of the cell element to be invariant (because, for example, you have processors that can't look at its type hierarchy, which would be the normal case today [because as far as I know the only tool that supports type-aware XSLT processing is the for-money version of Saxon]).&lt;br /&gt;&lt;br /&gt;If you don't care about the cell type name then you can just make it the head of the substitution group. I suppose you could have the local name be invariant, which is a  bit of a hack but would probably mostly work or at least could be made to work easily in XSLTs, such as the ubiquitous DocBook CALS table processing code and all of its many derivatives.&lt;br /&gt;&lt;br /&gt;Fine, so that works for tables, but what about a more general case, like DocBook itself?&lt;br /&gt;&lt;br /&gt;I've started the exercise/experiment of using the new DocBook 5.0RC1 XSD schemas as a base for creating a customized doctype. As part of this customization I want to remove unneeded elements, add new elements, and generally modify content models here and there. &lt;br /&gt;&lt;br /&gt;My first approach was completely brute force: I just copied the DocBook declarations, changed the namespace to my own, and modified things as I needed to. 
This is essentially the same thing as you would do pre-version 5 where there is no namespace (and therefore no clear way to distinguish core DocBook constructs from your customizations at the name level).&lt;br /&gt;&lt;br /&gt;This was easy enough to do (once I factored out the appropriate groups, which were not in the generated XSD schemas) but it's not very satisfying:&lt;br /&gt;&lt;br /&gt;- The elements that come straight from DocBook are not in the DocBook namespace, so processors that actually look at the namespace and expect it to be docbook (that is, processors that don't just look at the local names), will fail to recognize my DocBook elements as DocBook.&lt;br /&gt;&lt;br /&gt;- There's still no distinction between the base DocBook elements and new element types I've added.&lt;br /&gt;&lt;br /&gt;- Reacting to new versions of DocBook will be difficult and tedious because I'll have to manually copy changes from the base DocBook schema to my schema.&lt;br /&gt;&lt;br /&gt;- It sort of misses the point of having a parameterized set of element types that are designed to be refined and extended.&lt;br /&gt;&lt;br /&gt;[NOTE: telling me to use the RelaxNG versions of the schemas is not an option. See my earlier post on RelaxNG and schemas.]&lt;br /&gt;&lt;br /&gt;What I'd like to do is from my top-level schema in my namespace configure the groups used in the various content models to reflect both my removal of unneeded elements from the core DocBook declarations and my addition of new elements in my namespace (keeping them clearly distinct from the base DocBook elements). I'd also like to, as appropriate, use DocBook types as the base for restriction (unfortunately, extension in XSD schemas is essentially useless since you can only add things to the end of content models, you can't do the equivalent of Relax's "interleave").&lt;br /&gt;&lt;br /&gt;So my next experiment is to pull the groups out into a separate namespace. 
This results in a separate XSD document that is then intended to be copied and modified by the using top-level schema in order to modify the content models as needed. I've done this far enough to let me both add my own element types from my namespace and customize the content models.&lt;br /&gt;&lt;br /&gt;This results in a system of two XSD files for DocBook as distributed (not counting the little ancillary XSDs like xml.xsd and xinclude.xsd):&lt;br /&gt;&lt;br /&gt;- docbook_parms.xsd -- Contains all the attribute sets and groups. Imports docbook.xsd.&lt;br /&gt;&lt;br /&gt;- docbook.xsd -- The base DocBook declarations, imports docbook_parms.xsd&lt;br /&gt;&lt;br /&gt;To create a custom DocBook-based DTD I do the following:&lt;br /&gt;&lt;br /&gt;1. Copy docbook_parms.xsd to myschema_docbook_parms.xsd and add to it an import of my schema (myschema.xsd)&lt;br /&gt;&lt;br /&gt;2. Modify (or copy) docbook.xsd and change the existing import of docbook_parms.xsd to instead point to myschema_docbook_parms.xsd.&lt;br /&gt;&lt;br /&gt;3. Create myschema.xsd that imports both docbook.xsd and myschema_docbook_parms.xsd. This schema declares any new element types I need (in its own namespace).&lt;br /&gt;&lt;br /&gt;4. Modify the groups in myschema_docbook_parms.xsd as needed to reflect my desired changes.&lt;br /&gt;&lt;br /&gt;This feels better but it's still not completely satisfactory. In particular, it requires that you still modify the base DocBook schema in order to change the URL on the import of the parameter file. But that's it--otherwise the base XSD is unmodified and my local element types are in their own namespace. It would be really nice if you could do something like a substitution group but with groups instead of element types--I think that would be much closer to being a replacement for parameter entities than XSD substitution groups are. &lt;br /&gt;&lt;br /&gt;Unfortunately, the DocBook XSDs as currently supplied don't make this very easy. 
For a complete solution you'd want to have a group for every element that has a unique content model. These groups would then make it easy to locally tweak the content models as needed without having to do anything to the original declarations. Also, there are elements that are clearly subtypes of general types (e.g., chapter and appendix are both instances of an [undefined] "ChapterDivision") and it would be useful to have these types actually declared.&lt;br /&gt;&lt;br /&gt;So this works and it feels much closer to what I think the real intent of DocBook's customization mechanism always was (even though the reality was that you were just making syntactic changes to a copy of the original DTD declarations). &lt;br /&gt;&lt;br /&gt;But I'm wondering if I've missed an easier way to do it? I don't think so because substitution groups won't work in this case (XSD's rules for what can substitute for what are too restrictive, at least with XSD 1.0, and raise the invariant name problem described above). But I can't claim to be an XSD wizard so it's quite possible I've missed something.&lt;br /&gt;&lt;br /&gt;In any case, this approach does address what has historically been one of my big complaints about DocBook: until now there was no way, looking at a given document instance, to know what parts of it were base DocBook and which were local modifications, without doing some sort of tedious inspection against the base DocBook declaration set--there was nothing in either the document instance or its local declaration set that told you what was and wasn't DocBook. This was because DocBook had no defined mechanism for classifying things as being or not being from DocBook (e.g., something like DITA's class= attribute or HyTime's architectural form mechanism). 
Namespaces do give you this, as long as you respect the namespace and don't add your own element types to the DocBook namespace (which of course you could do and again the only way to detect it would be a comparison of your declarations with the base DocBook declarations). But if you respect the namespace then distinctions are clear.&lt;br /&gt;&lt;br /&gt;So for now I'm satisfied with this approach. We'll see how I feel after I've done a bit more work with the stuff I'm working on....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-1316749407161763328?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/1316749407161763328/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=1316749407161763328' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/1316749407161763328'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/1316749407161763328'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2007/01/docbook-schemas-and-customization.html' title='DocBook, Schemas, and Customization'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-4547793867053116533</id><published>2006-12-09T09:55:00.000-06:00</published><updated>2007-03-07T10:26:43.898-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='typefi indesign &quot;xml composition&quot;'/><title type='text'>Typefi Publishing System: May not suck</title><content type='html'>One of the encouraging things I saw at XML 2006 is the &lt;a href="http://www.typefi.com/solutions/book.htm"&gt;Typefi Publishing System (TPS)&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;TPS is a plug-in to Adobe InDesign that attempts to automate, as fully as possible, the process of bringing XML content into InDesign. It does this by providing extensions that allow designers to indicate, within InDesign, what components of their page templates are dynamic and will be filled with imported data. You can also define sophisticated rules for dynamic placement and copy fitting that the TPS engine then uses, along with heuristics for layout aesthetics, to create the best page layout it can. TPS then provides an XML format that you can convert your documents to (using XSLT or whatever). The TPS-specific XML data is then imported. However, TPS claims to be able to preserve the mapping back to the original markup so that you can make content changes in InDesign and push them back into the original input (we'll see--that's the aspect of this system about which I'm most dubious given some inherent problems in doing that kind of round tripping).&lt;br /&gt;&lt;br /&gt;At least that is the promise.&lt;br /&gt;&lt;br /&gt;I only saw a demo and talked to the Typefi guys at length so I can't say whether what they showed really works but if it does it's pretty amazing stuff. 
There are more features of TPS, including something to do with Word, but I didn't pay attention to those as I'm completely focused on XML-based publishing workflows.&lt;br /&gt;&lt;br /&gt;I've spent the better part of the last two years trying to implement the automation of publishing of highly-styled documents using high-end typographic systems (in my case, 3B2) so I am painfully familiar with the inherent challenges in automating things like the placement of figures and sidebars relative to their anchors and automatic copyfitting. It's a hard problem and if Typefi has done at least as much as I have (which it looks like they have) then they have done something of real value.&lt;br /&gt;&lt;br /&gt;What they demonstrated, which included the system used by Lonely Planet Books, was pretty impressive and I had no reason to believe that it was not genuine.&lt;br /&gt;&lt;br /&gt;You can be sure that I will be looking into Typefi more deeply for application in Innodata Isogen's composition practice, as well as an option for our professional services clients who want in-house automation of publishing workflows where XSL-FO is not up to the task.&lt;br /&gt;&lt;br /&gt;Dr. 
Macro says check it out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-4547793867053116533?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/4547793867053116533/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=4547793867053116533' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4547793867053116533'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/4547793867053116533'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/12/typefi-publishing-system-may-not-suck.html' title='Typefi Publishing System: May not suck'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-7157646626479108972</id><published>2006-12-09T08:51:00.000-06:00</published><updated>2007-03-07T10:27:01.126-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='&quot;adobe mars&quot; pdf'/><title type='text'>Adobe MARS: Looks Interesting</title><content type='html'>I just returned from the XML 2006 conference in Boston and saw a few interesting things, including a presentation on Adobe MARS and the Typefi product.&lt;br /&gt;&lt;br /&gt;In this post, I want to talk about &lt;a href="http://labs.adobe.com/wiki/index.php/Mars"&gt;Adobe MARS&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;MARS is an XML-based format that is 
intended as a functional replacement for PDF. It's not really accurate to call it an XML version of PDF because it's not a simple transliteration of PDF into tags (which could be done easily enough) but a ground-up exercise in designing an XML-based scheme for doing what PDF does.&lt;br /&gt;&lt;br /&gt;After seeing Adobe's presentation and talking to the guys from Adobe it's clear that what they've done is a sincere and well-thought-out attempt to Do The Right Thing rather than a cynical recasting of proprietary stuff into markup so it's "open."&lt;br /&gt;&lt;br /&gt;MARS tries to use standards as much as it can and it seems to do so to a remarkable level of completeness. It uses SVG for representing each page, supports the usual standards for media objects (bitmaps, videos, etc.), uses Zip for packaging, and so on.&lt;br /&gt;&lt;br /&gt;Philip Levy, the chief engineer for MARS, did apologize that they had to add a few extensions to SVG to handle some high-end typography stuff that isn't in SVG but said that if you were, for example, using MARS for office documents that you could get by with pure SVG.&lt;br /&gt;&lt;br /&gt;Within Acrobat, the user experience of MARS is identical to that for PDF: all the behavior and functionality is the same. There is a MARS plug-in for Acrobat 8 (Reader or Professional).&lt;br /&gt;&lt;br /&gt;From a creation and manipulation standpoint, the advantage of MARS over PDF is obvious: you can use all the usual XML infrastructure to create and manipulate the data. That would certainly make things like PDF data extraction easier. 
The use of SVG would make embedding foreign name-spaced data into the PDF much easier (for example, to preserve structural indicators from the original source, something you can do with PDF today but that very few tools do in fact do).&lt;br /&gt;&lt;br /&gt;The Adobe guys made it clear that MARS is not intended as a replacement for the current PDF format--there's just too much installed infrastructure and dependency on PDF for that to happen quickly and MARS isn't 100% complete over PDF's features (mostly around support for high-end printing workflows, I would guess). But it seems reasonable to think that, like MS Office, Adobe will slowly raise the profile of the XML version of PDF until it can make it the default format rather than the alternative. But I would expect that to take at least five years, given the speed with which the publishing industry, in particular, changes (which is more or less glacial).&lt;br /&gt;&lt;br /&gt;Dr. Macro says check it out.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-7157646626479108972?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/7157646626479108972/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=7157646626479108972' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7157646626479108972'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/7157646626479108972'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/12/adobe-mars-looks-interesting.html' title='Adobe MARS: Looks Interesting'/><author><name>Eliot 
Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-116483319847273585</id><published>2006-11-29T14:30:00.000-06:00</published><updated>2007-03-07T10:27:20.278-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='relaxng schemas dtds'/><title type='text'>RELAX Wins: Not So Fast</title><content type='html'>&lt;a href="http://www.snee.com/bobdc.blog/2006/11/schema_language_victory_and_ow.html"&gt;Bob DuCharme blogged&lt;/a&gt; about &lt;a  href="http://cafe.elharo.com/xml/relax-wins/"&gt;Elliotte Rusty Harold's declaration that "RELAX Wins"&lt;/a&gt; from which you can then get to all the commentary, of which there is legion. &lt;br /&gt;&lt;br /&gt;For myself, I have never paid any attention to RELAX for the simple reason that I had no particular reason to. It always made sense to me that XSD schemas were the most appropriate replacement for DTDs and I didn't really see a need for anything else. In short, for my clients, using Schemas seems like the only reasonable recommendation, since it is the official W3C schema mechanism, it has a number of important advantages over DTDs, it's ubiquitously supported by most, if not all, XML tools, and seems to be pretty future proof. And I've never gotten that excited about what one could or couldn't do with a particular document constraint language: they are all weak and can never replace the need for validation applications, so really, who cares?&lt;br /&gt;&lt;br /&gt;So the arguments along the lines of "RELAX can {slice your bread | butter your toast | walk your dog}" never really carried much weight for me because it didn't really matter for the stuff I do. 
Also, there's a sense in which document-level constraints really aren't that important, except for syntax-driven authoring and for providing attribute defaults. Otherwise, it's really just documentation for what authors and processors should do.&lt;br /&gt;&lt;br /&gt;So it never seemed really important to learn anything about RELAX (none of my clients have had RELAX schemas or requested that we develop one).&lt;br /&gt;&lt;br /&gt;But this assertion that "RELAX wins" suggested that I should actually look at RELAX--if the tool support is there and it really is easier to use than XSD schemas, maybe it makes sense?&lt;br /&gt;&lt;br /&gt;So I looked and I find I am underwhelmed.&lt;br /&gt;&lt;br /&gt;Why?&lt;br /&gt;&lt;br /&gt;First, like I said, additional constraint features don't really interest me at all, so the fact that RELAX lets you say a few things that Schema can't doesn't carry any weight. Also, the fact that the design is elegant isn't really that compelling either--elegant design by itself is of minimal value unless all other factors are equal.&lt;br /&gt;&lt;br /&gt;But what I find missing is:&lt;br /&gt;&lt;br /&gt;- Any sort of classing mechanism. My focus for the last 15 years has been on architecture-type mechanisms (i.e., HyTime architectures, DITA class hierarchies) and I feel that that approach to schema design and extension is the most effective way to manage systems of related schemas. XSD Schemas do this to some degree: the mechanism is both somewhat broken {constraints on derived content models are too restrictive} and strangely designed {not closed over classes alone}, but it does offer some immediate advantage when you have schemas where there are clear specializations of base types within the document type or you want to enable controlled specializations.&lt;br /&gt;&lt;br /&gt;Unless I missed it, I didn't see any sort of type or class hierarchy mechanism in RELAX at all. 
I realize that enabling useful type hierarchies is a seriously complicating feature (it is a lot of what makes XSD schemas complicated) but it's also very useful.&lt;br /&gt;&lt;br /&gt;Of course, the counter argument is that XSD schemas don't really work for doing architecture-like things so you have no choice but to do what DITA did and create your own extra-schema mechanism. Maybe so. But in that case RELAX falls down because:&lt;br /&gt;&lt;br /&gt;- No attribute defaults.&lt;br /&gt;&lt;br /&gt;RELAX explicitly does not modify the info set produced for a document (in the core feature set--DTD compatibility does provide defaulted attributes). That's fine, and I think I understand the reason for that, but defaulted attributes are really really handy and enable architecture-style processing without having to have lots of attributes on instance elements. Since I can get default attributes with schemas and just a little bit of custom parser configuration, at least in Java, I find this a definite strike against RELAX (although I suppose I could use the DTD compatibility stuff but I don't know how widely it's supported).&lt;br /&gt;&lt;br /&gt;- Not supported by Arbortext Editor&lt;br /&gt;&lt;br /&gt;As far as I can tell, Arbortext Editor does not support RELAX schemas for document editing. As Arbortext Editor is the main editor used by my clients (and by me) that's a serious problem and essentially removed RELAX as an option.&lt;br /&gt;&lt;br /&gt;Yes, I could use RELAX as the primary form and generate an XSD Schema, but why? That just adds complexity that isn't justified on any other grounds.&lt;br /&gt;&lt;br /&gt;- No self-described relationship between a given RELAX schema and a namespace&lt;br /&gt;&lt;br /&gt;One of the important features of XSD schemas, in my opinion, is the ability to unambiguously relate a schema document to a namespace. 
This provides, in a standard way, something missing from the namespace specification itself, namely a formal way to define the member names in a namespace, as well as some of the semantics of those names. With schemas you have at least some hope of automatically and reliably associating namespaces with schemas such that given a document that has elements in one or more namespaces, you can have a system that automatically associates those elements with their governing schemas (e.g., the XIRUSS system).&lt;br /&gt;&lt;br /&gt;I see no such mechanism in RELAX. While RELAX lets you associate a namespace with a given element type (which it must to be namespace aware) a given RELAX document can directly define types in any namespaces. This is convenient but doesn't make RELAX particularly useful or reliable as a way to define &lt;i&gt;namespace&lt;/i&gt; constraints (as opposed to document constraints).&lt;br /&gt;&lt;br /&gt;That is, in essence, XSD schemas are intended to define the constraints on &lt;i&gt;namespaces&lt;/i&gt; while RELAX schemas (and DTDs) are intended to define the constraints on &lt;i&gt;documents&lt;/i&gt;. It's a subtle but important difference and one that I think is very important. The schema approach explicitly or implicitly recognizes that fundamentally, documents are arbitrary and what's really important is what the individual elements mean, not how they are organized for storage into documents. This was always the problem with DTDs: they governed instances, not types (that is, the term "document &lt;i&gt;type&lt;/i&gt; definition" was always a lie). RELAX seems to make the same mistake. I see this as SGML brain damage and I have no use for it.&lt;br /&gt;&lt;br /&gt;- No defined mechanism for defining reference constraints&lt;br /&gt;&lt;br /&gt;In DTDs you have ID/IDREF, in XSD schemas you have key/keyref. 
This is a very important feature, I think, and it's something I make heavy use of in schemas (when I can--there's still a limitation in XSD with declaring and validating references that cross document boundaries, but I'm not sure that's a solvable problem without a more formal definition of what compound documents are {and I'm not talking about XInclude as currently formulated, which punts in an unacceptable way, as far as I'm concerned}).&lt;br /&gt;&lt;br /&gt;So for all of these reasons, I don't see RELAX being particularly useful, at least for the way I use XSD schemas: I find no particularly compelling features and at least two essential features missing.&lt;br /&gt;&lt;br /&gt;So I must respectfully disagree with Elliotte: RELAX has not won. I don't dispute the utility of RELAX or the elegance of its design but I do dispute any assertion that it is interchangeable with XSD schemas. It is not, for the reasons given above.&lt;br /&gt;&lt;br /&gt;If the choice were "DTDs or RELAX?" then I would say without reservation that RELAX would be the right choice, but when the question is "XSD vs. RELAX" I say without reservation that XSD is the right choice. 
Which is not to say that XSD schemas are perfect by any means, they absolutely are not, but they are better than anything else on offer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-116483319847273585?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/116483319847273585/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=116483319847273585' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/116483319847273585'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/116483319847273585'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/11/relax-wins-not-so-fast.html' title='RELAX Wins: Not So Fast'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-116460437782808906</id><published>2006-11-26T22:59:00.000-06:00</published><updated>2006-11-27T00:04:12.206-06:00</updated><title type='text'>XML: Ten Year Aniversary</title><content type='html'>It's hard to believe that in a couple of weeks we will celebrate the 10-year anniversary of the first public unveiling of the XML specification at the SGML 1996 conference. It doesn't feel like it's been that long (at that time we were marveling that it had been 10 years since the publication of the SGML Standard). 
&lt;br /&gt;&lt;br /&gt;I will, as it happens, be at XML 2006 for a couple of days--I wonder who else from the original committee will be there? Jon Bosak, of course, as he's giving the closing keynote, and Michael Sperberg-McQueen, who's also on the program. Anyone else? I've been out of circulation for a few years so I don't even know who's still active in the community except for those members who have blogs (Tim, Eve) or who have prominent jobs (Jean Paoli) or simply are prominent (James Clark). Paul Grosso of course is still very active but doesn't go to many conferences these days. Paula has her new Paula's Texas Orange business (very tasty stuff, by the way). Does Peter Sharpe still work on XMetal (I couldn't find anything later than 1998 on Peter with a quick Google search)?&lt;br /&gt;&lt;br /&gt;Being a member of the XML Working Group was a singular experience and one I'll treasure. My personal contribution to the final form of XML was fairly slight I would say but we all contributed. I fought for some things that didn't get in, probably for the best. In hindsight I would have fought for some more things to be left out (entities, notations).&lt;br /&gt;&lt;br /&gt;As a standards-making activity it was unique: a small group of people with a clear common goal, consistently strong technical knowledge, diverse backgrounds and constituencies, and a task that was relatively easy: take an existing standard and simply cut away everything that wasn't absolutely essential for use on the Web. Few people appreciate that in XML 1.0 there was &lt;b&gt;&lt;i&gt;no invention&lt;/i&gt;&lt;/b&gt;. We didn't add any features to SGML, we only removed them. 
[Although the SGML standard had to rush to keep up with some of the syntax changes we made so that a fully-conforming SGML parser could parse XML documents correctly without any special-case code--as far as I know, that update to SGML was only implemented by James Clark in the SP parser, but it was important at the time that we do it (at that time I was also a member of the ISO Technical Committee responsible for SGML, HyTime, and DSSSL). In 1996 we had no idea if XML would sink or swim and we had to assume that SGML would still be the primary standard. Of course it didn't take long to see that XML would in fact sweep away the past with nary a glance back. {But Innodata Isogen has clients who are still using SGML systems and SGML documents, so go figure.}]&lt;br /&gt;&lt;br /&gt;XML is, if anything, a singular marketing success--we took something that already existed (SGML) and without changing anything essential about it, rebranded it and made it suddenly not only acceptable to the Web world, but &lt;i&gt;essential&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;It was also unique in the way the activity was conducted. Jon Bosak as chair realized that the only way to ensure the cleanest, purest result was to allow members to make decisions that their constituents would not necessarily have supported. Thus all deliberation of the Working Group was private and confidential. (This was also how the U.S. founding fathers operated the Constitutional Convention that resulted in the current U.S. constitution.) This allowed us to focus almost entirely on the technical issues at hand.&lt;br /&gt;&lt;br /&gt;Together, these aspects of the Working Group allowed us to produce XML in record time (about 18 months, if memory serves). Certainly no specification of any import within the XML family has been developed as quickly since then. Even namespaces, which should have been a no-brainer, took at least two years (first published in 1999, started spring 1997). 
It was namespaces that drove me out of the W3C standards arena at the time--I was upset by Tim B-L's insistence that we use attributes to declare namespaces (I wanted to use processing instructions). At the time I felt that Tim's overriding of the consensus of the Working Group was unacceptably heavy-handed, that he should be a benevolent dictator and what's the point of having technical experts if you're not going to respect their decisions? That is still a potential problem with the W3C as a standards-making body--it is ultimately controlled by a single person (it doesn't really matter who that person is, just that it's a single person). It didn't help that I was also burned out from working on both HyTime and XML at the same time, while working on a very demanding project for work (Boeing's eMOD system). So maybe I was looking for an excuse to get out of the standards business for a while. But in 1997 the degree to which the Web would become the primary platform for information systems still wasn't clear. &lt;br /&gt;&lt;br /&gt;In hindsight Tim's decision was probably the best one--my objections had more to do with invading the user's name space (the set of names that a user can choose for element types and attributes) than with any particular love of PIs. Charles Goldfarb had thoroughly schooled me in the inviolability of the user's name space and many of the features of HyTime are there specifically to avoid any need to take away names from users.&lt;br /&gt;&lt;br /&gt;Of course, in the much more pragmatic world of the Web, we make a small (and essentially insignificant) sacrifice of a few names in order to provide an essential feature (clear name disambiguation) with the simplest possible syntax.&lt;br /&gt;&lt;br /&gt;But that's all history now. 
XSL-FO brought me back into the W3C fold and I'm happy to be there, although my patience for standards work has, I think, reached its lifetime limit.&lt;br /&gt;&lt;br /&gt;One of the things that happened when XML was announced was that everybody wanted to be involved. The way the W3C works (or at least worked then--I haven't checked the participation rules lately) is that any W3C member can place anybody they want on any working group. Overnight, the XML Working Group went from being a small group of people with a common goal and universal respect for each other to a large group of largely competing interests, many of whom had no particular technical interest but only wanted to protect or promote their business interests, for good or ill. This had the effect of slowing progress way down, as the sheer number of people made communication difficult and made it easy for anybody to impede progress just by monopolizing the weekly conference call.&lt;br /&gt;&lt;br /&gt;This seems to be the way that many, if not all, of the important standards get done these days, with standards activities being more like battlefields and less like technical working groups. And of course this has its highest expression in the battles of competing standards produced by different bodies (e.g., OASIS vs. ECMA vs. W3C vs. ISO). &lt;br /&gt;&lt;br /&gt;I still marvel that XQuery ever made it to Proposed Recommendation at all.&lt;br /&gt;&lt;br /&gt;I definitely miss the old days, when a standard could be developed by a handful of very sharp technical folks more or less without non-technical interference. I don't see those days returning any time soon--standards are too important to business to let the technoids run the show and the days of big industry investing in big standards are long gone. Today most standards activity is funded either by marketing budgets or by the personal commitments of single practitioners who have made a living out of being an expert in their chosen standards. 
You do see some standards, like DITA, being driven largely by the user community, but that's because it's an end-user application standard in which the users have an immediate and obvious stake, rather than an infrastructure standard that only a few people really understand or care enough about to invest time and money in standardizing (XSL-FO, for example).&lt;br /&gt;&lt;br /&gt;So here's hoping I'll get the chance to catch up with some of my former XML Working Group colleagues, see how their lives have changed in the last ten years, see where they are driving the future. &lt;br /&gt;&lt;br /&gt;See you in Boston.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-116460437782808906?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/116460437782808906/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=116460437782808906' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/116460437782808906'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/116460437782808906'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/11/xml-ten-year-aniversary.html' title='XML: Ten Year Aniversary'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-116058133312313390</id><published>2006-10-11T10:01:00.000-05:00</published><updated>2007-03-07T10:28:12.851-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='topicmap opencyc xtm'/><title type='text'>Topic Maps, Knowledge, and OpenCyc</title><content type='html'>The recent comment about XTM (&lt;a href="http://www.topicmaps.org/xtm/"&gt;XML Topic Maps&lt;/a&gt;) reminded me that I ought to express a thought I've been having about topic maps for a long time.&lt;br /&gt;&lt;br /&gt;First, let me say that I've been involved with topic maps from the first moment the name was coined, way back at the CApH meeting in 1992. Our original goal was to define a &lt;i&gt;simple&lt;/i&gt; application of HyTime that would put that very abstract and wide-ranging standard into a concrete application context that people could readily understand. The target use case was the generic representation of back-of-the-book indexes and thesauri.&lt;br /&gt;&lt;br /&gt;If you think about a back-of-the-book (botb) index it is nothing more than a set of terms and phrases linked back to the specific content relevant to those terms. It's not a huge leap to go from the idea of a print botb index to a more general collection of terms and links into any data that can be linked to (which with HyTime was any data at all). &lt;br /&gt;&lt;br /&gt;At its simplest a botb index is just a flat list of terms with links, with no explicit relationship between the terms. Of course, some groupings represent categorizations rather than just shortcuts: pastry/pies represents a kind-of classification (pie is a kind of pastry), while a first-level entry of "pie" with "apple" and "peach" is just a shortcut for the two entries "pie, apple" and "pie, peach". 
That is, in this second case, "pie" does not classify "apple" and "peach" but just forms two phrases that happen to both start with "pie".&lt;br /&gt;&lt;br /&gt;Another characteristic of indexes is that the same concept will be represented in different ways, e.g., "pie, apple" and "apple pie". &lt;br /&gt;&lt;br /&gt;There is also some amount of cross-concept linking via "see" and "see also" links: "pie, see also tarts".&lt;br /&gt;&lt;br /&gt;Finally, there may be indications of controlled vocabularies: "crumble: see cobbler" where certain terms are deprecated by refusing to index them directly.&lt;br /&gt;&lt;br /&gt;Trying to abstract this a little bit leads to this basic data model for indexes:&lt;ul&gt;&lt;li&gt;"concept" or "topic": the abstract thing being indexed. A topic may have any number of names. Topics are objects that have well-defined identity within some bounded scope.&lt;/li&gt;&lt;li&gt;"name" or "alias": an arbitrary human-readable label for a topic. For example "apple pie", "pie, apple", "tarte aux pommes", etc.&lt;/li&gt;&lt;li&gt;"association": a relationship between two topics indicating that they are related in some way, e.g., is-a, part-of, parent-of, related-to, etc. The set of possible association types is unbounded. In the context of an index, typical relationships would be "is-a" (e.g., pie-&gt;apple, pastry-&gt;pie), "similar-to" (see-also), etc.&lt;/li&gt;&lt;li&gt;"instance": any data object that is linked to from a topic to indicate that the link target is in some way an instance of the concept. 
For example, a topic for the concept apple pie might link to an apple pie recipe, a picture of an apple pie, and the Wikipedia entry for apple pies.&lt;/li&gt;&lt;/ul&gt;Given this data model it should be pretty easy to see how you could represent the data in a botb index and then generate a traditional index from it: for each topic you would create an index entry for each of its names, sorted appropriately; is-a relationships would imply second-level entries; and so on. You start to run into some practical problems, such as deciding which of a topic's names to use in the index, but that can be handled by having application-specific metadata for the names (e.g., national language, use context, etc.). For example, the topic for "apple pie" might have the names "apple pie" and "apple", with the name "apple" flagged as "use for subordinate index entries", allowing you to then construct the entries:&lt;pre&gt;pie&lt;br /&gt;   apple&lt;br /&gt;apple&lt;br /&gt;   pie&lt;/pre&gt;But not "apple pie" (which would be redundant with the general entry for "apple").&lt;br /&gt;&lt;br /&gt;Thesauri lead to a similar data model.&lt;br /&gt;&lt;br /&gt;If you look at the data model, in particular the associations, you start to see that you can easily construct arbitrarily sophisticated systems of relationships among topics. A set of is-a relationships is a taxonomy or ontology (depending on how you define those terms). A set of part-of relationships is an assembly tree or a bill of materials. &lt;br /&gt;&lt;br /&gt;At this point, what we have would translate to a syntactically simple (in the sense that there is a relatively small set of element types, required properties, and core semantics) way of representing things like indexes, thesauri, navigation hierarchies, taxonomies, ontologies, and so on. 
A very useful thing, especially for interchange and interoperation.&lt;br /&gt;&lt;br /&gt;Given this simple but powerful model you start to see that it could be usefully applied to the general problem of "metadata management", that is, the definition of metadata schemas (taxonomies, ontologies, what have you) and the association of relevant metadata to specific objects (e.g., documents in a repository, Web pages, data captured through data mining, etc.). In particular, it provides a clear, standard, generic way to unilaterally apply metadata to objects. In addition, by using queries to link from topics to their instances, you can bind things based on their inherent metadata (e.g., if I have a database of recipes that are already tagged by type of dish and ingredients, I can use a query to link from my apple pie topic to any recipe instance of dish type "pie" and main ingredient "apple").&lt;br /&gt;&lt;br /&gt;This lets you layer any number of descriptive metadata sets over existing data sets. A very useful thing to be able to do.&lt;br /&gt;&lt;br /&gt;From there you start to think that you could represent &lt;i&gt;knowledge&lt;/i&gt; as a set of topics and associations. &lt;br /&gt;&lt;br /&gt;This is where I think things have gone wrong in the topic map world. I realize that this is not necessarily a popular or welcome opinion: the topic map community has tried to bill itself as one of the primary players in the knowledge management domain (along with RDF and related approaches, such as OWL and whatnot). Michael Sperberg-McQueen gave an excellent closing keynote [I could stop there--it's always true] at one of the Extreme Markup conferences where he provided a hilarious comparison of RDF and Topic Maps and made it pretty clear that both were just different views of the same space and that neither was complete nor could it be. 
[See also &lt;a href="http://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems"&gt;Goedel's Incompleteness Theorems&lt;/a&gt;.] So consider this a minority dissenting voice.&lt;br /&gt;&lt;br /&gt;And let me be clear: I think that topic maps are useful and attractive as far as they go: for the general business problem of managing metadata and associating it with data objects, they are well suited and well thought out.&lt;br /&gt;&lt;br /&gt;Why do I think that topic maps (and anything similar, such as RDF) are not suitable for knowledge representation?&lt;br /&gt;&lt;br /&gt;For the simple reason that knowledge representation is much more sophisticated and subtle than just topics with associations. That is, having topics with associations and a processor that can examine those is necessary but not sufficient for enabling true knowledge representation and true knowledge-based processing (that is, automatic processes that can do useful things with that knowledge, such as reliably categorize and index medical journal articles or make sense out of a vast pool of intercepted emails or analyze financial information to find market trends).&lt;br /&gt;&lt;br /&gt;That is, "knowledge management", to be truly useful, has to start doing things that heretofore only humans could do.&lt;br /&gt;&lt;br /&gt;I came to this understanding when I started trying to use the &lt;a href="http://www.opencyc.org/"&gt;OpenCyc&lt;/a&gt; system to do reasoning on topic maps.&lt;br /&gt;&lt;br /&gt;The Cyc system is the brainchild of Doug Lenat, who had the idea that the only way to create a true artificial intelligence was to build up a massive database of "common sense", that is, facts about everything in the world. 
The hypothesis was that given a rich enough body of such facts and an appropriate reasoning engine, the system would be able to do useful, reliable, and unique reasoning about anything, not just the narrow domains to which expert systems had been applied at the time (this was in the mid-to-late 1980s). Doug figured it would take about 10 years to build up the initial database of facts and set about doing it by hiring people from pretty much any and all domains to start putting in facts and assertions. After about 10 years they had their first success and went from there.&lt;br /&gt;&lt;br /&gt;[Historical note: Steve Newcomb invited Doug to give the keynote at the first HyTime conference--it was electrifying because Doug had assumed there would be conflict between the hypertext people and the expert systems people, but he was pleasantly surprised to discover that in fact we saw a powerful potential synergy in connecting authored links to the power of something like Cyc to do automatic linking to existing, undifferentiated data. We all had a wonderful dinner with Doug that night--I've been a Cyc watcher and fan ever since. It was at an Extreme Markup convention (I think) that Cycorp announced OpenCyc--we were all very pleased at that announcement.]&lt;br /&gt;&lt;br /&gt;Anyway, for a brief moment I had a little extra time and some business need to play around with topic maps (my work assignments have never involved topic maps). I wanted to see what I would get if I applied Cyc's common-sense reasoning ability to an arbitrary topic map and what it would take to marry the two. I was going on the hypothesis that there would be a reasonably direct mapping from the data in a topic map into however Cyc holds its data. Certainly there would be a way to represent topics as objects and there should be a way to represent associations. 
By then associating specific topics with existing concepts in the Cyc database it should be possible to either have Cyc reason about the topic map based on what it already knows or extend its knowledge with the facts in the topic map and then apply its reasoning engine to those.&lt;br /&gt;&lt;br /&gt;Cyc has both an XML representation format for its data as well as a Python API, both of which made getting the topic maps into Cyc easy enough. However, at the time I tried this I was limited by the limitations in the OpenCyc database, which reflected only a fraction of the total Cyc database available in the commercial product. Doh! However, I notice that OpenCyc 1.0 claims to include the entire Cyc database. That would make a big difference.&lt;br /&gt;&lt;br /&gt;But more importantly, I quickly realized that the way Cyc represents the world is much much more sophisticated than a simple set of topics and associations. Here is a quote from the Cycorp Web site that explains the basic Cyc model:&lt;blockquote&gt;The Cyc KB is divided into many (currently thousands of) "microtheories", each of which is essentially a bundle of assertions that share a common set of assumptions; some microtheories are focused on a particular domain of knowledge, a particular level of detail, a particular interval in time, etc. The microtheory mechanism allows Cyc to independently maintain assertions which are prima facie contradictory, and enhances the performance of the Cyc system by focusing the inferencing process.&lt;/blockquote&gt;This notion of "microtheories" and the ability to organize them in various ways reflects a degree of sophistication that goes far beyond what you get with topic maps alone. 
Add to that the sophistication of the reasoning heuristics and the way that the rules have been crafted and myriad other details of how the concepts and assertions are represented and you quickly start to realize that there is a lot more to "knowledge" representation than topics and associations. You also quickly realize that the mechanisms defined by the topic map specifications alone are nowhere near enough to represent knowledge in a way that enables non-trivial automatic reasoning.&lt;br /&gt;&lt;br /&gt;At a minimum, there's a whole other layer of semantics and descriptive metadata that has to be added to the information in a topic map to make it approach the completeness of Cyc's knowledgebase. For topic maps to be useful for true knowledge representation these semantics and metadata would have to be defined and standardized, which is of course possible, but much much harder to do than standardizing the base topic map syntax itself (which itself took over 10 years, which is pretty remarkable considering how simple it appears to be at first glance).&lt;br /&gt;&lt;br /&gt;Thus my conclusion that topic maps, by themselves, do not in any really meaningful way "capture knowledge". They can at best provide identifying objects for concepts, express simple facts about those concepts in relation to each other, and bind those facts to instances of the concepts. But that's it. This is &lt;i&gt;information&lt;/i&gt;. Very useful information and a sophisticated way to capture it, but it is not knowledge.&lt;br /&gt;&lt;br /&gt;You could of course argue that what's in Cyc is not really knowledge either, but you cannot deny that whatever is in Cyc, it's much closer to being knowledge than a topic map can be.&lt;br /&gt;&lt;br /&gt;But I still think it would be a useful experiment to see what you get if you try to apply Cyc to arbitrary topic maps. 
If OpenCyc's knowledgebase is really complete then this could be quite fruitful.&lt;br /&gt;&lt;br /&gt;One key challenge is binding the topics in the input topic map to the correct concept or microtheory in the Cyc knowledgebase. This gets you to the fundamental problem of subject identification, which is something topic maps try to address through the notion of subject identifiers. An interesting question for Cyc to try to answer would be "given two topics, are these topics about the same subject?".&lt;br /&gt;&lt;br /&gt;One of the subtleties of Cyc that really got me to realize how involved the subject is was the question of vampires. If your domain is "the real world" then of course you know that vampires don't exist (that is, undead humans who drink the blood of the living) (except that some people must believe they do...hmmm). But if you are in the domain of literature then of course vampires do exist because there are endless books that feature vampires as characters. So clearly any system that hopes to be able to model everything has to be able to hold at once the fact that vampires don't exist and that they do and keep the contexts in which those statements are and are not true clearly distinct in a way that still allows them to be used together ("Bela Lugosi, a real human, played an (imaginary) vampire in motion pictures." or "the vampires in Bram Stoker's &lt;i&gt;Dracula&lt;/i&gt; follow very different rules from the vampires in Anne Rice's vampire books."). 
Clearly it's not sufficient to have a single subject "vampire"; you need multiple related subjects in different knowledge contexts.&lt;br /&gt;&lt;br /&gt;In any case, I found it a humbling revelation and turned my attention to the more concrete challenges of automated composition and technical document authoring and management, content to leave knowledge representation to the experts.&lt;br /&gt;&lt;br /&gt;But that's just me....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-116058133312313390?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/116058133312313390/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=116058133312313390' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/116058133312313390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/116058133312313390'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/10/topic-maps-knowledge-and-opencyc.html' title='Topic Maps, Knowledge, and OpenCyc'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115998535589830438</id><published>2006-10-04T11:28:00.000-05:00</published><updated>2007-03-07T10:28:45.996-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='pdf &quot;pdf data extraction&quot; pdfbox'/><title type='text'>PDF 
Processing Fun</title><content type='html'>The company I work for does data conversion as a large part of its business. Some of this involves extracting content from PDF documents to turn it into XML in whatever schema the customer specifies.&lt;br /&gt;&lt;br /&gt;This is, in the general case, a difficult problem, and in some pathological cases an impossible one. This is because PDF, like PostScript on which it is based, is a graphic command language optimized for drawing glyphs (graphical representations of characters) in a sequence of two-dimensional spaces called pages.&lt;br /&gt;&lt;br /&gt;For example, there's no requirement that any two characters have any particular relationship to each other in the PDF data stream--their visual relationship is entirely a function of their (possibly coincidental) placement on the page next to each other. Two glyphs that are adjacent on the rendered page need not be adjacent in the PDF data stream. &lt;br /&gt;&lt;br /&gt;For the most part PDF documents are not quite this pathological but they still aren't necessarily easy to process. For example, in nicely typeset documents a sequence of characters can be expressed in the PDF as a sequence of characters and positioning values, representing the kerning between characters, something like this:&lt;pre&gt;[(a)4(b)-0.35(c)-0.64(def)] TJ&lt;/pre&gt;&lt;br /&gt;Rendered, this will be "abc def", where the characters are closer together or farther apart, with a lot of space between the "c" and "d" (but no literal space character).&lt;br /&gt;&lt;br /&gt;An obvious challenge is determining whether or not the space between the "c" and "d" should be captured as a literal space--that turns out to be a big problem.&lt;br /&gt;&lt;br /&gt;Another basic problem is paragraph detection. In the PDF, each line of text is specified as one or more sequences like the one shown above. There's nothing in the PDF that explicitly tells you that the lines form a larger logical construct. 
But of course for creating marked up documents you really need to know what the paragraphs are.&lt;br /&gt;&lt;br /&gt;This turns out to be a really hard problem. And then there's stuff like table recognition, dehyphenating words at the ends of lines, handling sentences and paragraphs that break across pages, isolating things like headers and footers, and so on, that makes the problem challenging.&lt;br /&gt;&lt;br /&gt;We've been experimenting with some commercial tools, which I can't really mention, that do a lot of this--there are obviously some clever folks at these companies. But they tend to charge a pretty hefty price for their software--I think it's commensurate with the value they provide but it still places it out of the reach of a lot of casual users. &lt;br /&gt;&lt;br /&gt;For my own work, I needed a quick way to just get at the raw PDF data within a document, partly so I could see why the tools we are using were not always giving the answer we wanted, in order to see if it was a bug or just pathological PDF data. &lt;br /&gt;&lt;br /&gt;So I poked around for an open-source, Java-based PDF library and found &lt;a href="http://www.pdfbox.org/"&gt;PDFBox&lt;/a&gt;, which appears to be a very complete library for reading and writing PDF, including things like providing the X/Y location of constructs (a challenge because it requires essentially rendering the data just like Acrobat Reader would--PDF allows quite complex chains of calculations that all contribute to the final location and orientation of any given graphical element). I haven't had a chance to really push on it but I did read over the docs and the API and some of the sample apps they provide and it looks pretty promising. 
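&lt;br /&gt;&lt;br /&gt;To make the space-detection problem from the kerned-text example concrete, here is a minimal Python sketch. It assumes the array has already been parsed into a list of strings and numbers, and it relies on the PDF convention that adjustment numbers are in thousandths of a text-space unit, with negative values widening the gap. The function name text_from_array and the threshold of 200 are illustrative assumptions, not anything PDFBox or the commercial tools provide; real tools also have to take font metrics and glyph positions into account.&lt;pre&gt;
```python
# Hypothetical threshold, in thousandths of a text-space unit, above which
# a gap between glyphs is treated as a word space.
SPACE_THRESHOLD = 200


def text_from_array(items, space_threshold=SPACE_THRESHOLD):
    """Join the strings of a parsed show-text array, inserting a space
    wherever an adjustment widens the gap by at least the threshold."""
    parts = []
    for item in items:
        if isinstance(item, str):
            parts.append(item)
        elif parts and -item >= space_threshold:
            # Negative adjustments move the next glyph further to the right.
            parts.append(" ")
    return "".join(parts)


print(text_from_array(["a", 4, "b", -0.35, "c", -250, "def"]))  # abc def
```
&lt;/pre&gt;Choosing the threshold is the hard part: too low and tightly kerned words get split, too high and real word gaps are lost, which is one reason this is not a solved problem.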
&lt;br /&gt;&lt;br /&gt;PDFBox doesn't do everything the commercial tools do--it doesn't do the sort of synthesis of higher-level constructs that the commercial tools do (paragraph recognition, dehyphenation, etc.), which is where the core value of the commercial tools is (these are hard problems) but it looks like it provides enough raw functionality to let you develop these features to one degree or another. The problems are challenging but they're also interesting puzzles.&lt;br /&gt;&lt;br /&gt;Anyway, Dr. Macro says check out PDFBox.&lt;br /&gt;&lt;br /&gt;I'm particularly pleased to see PDFBox because several years ago, after a very painful experience with a particularly poorly-implemented Java PDF library (which shall remain nameless), I started implementing my own but only got as far as reading and writing pages, not page components. The project (PDF4J on Sourceforge) sat idle until recently, when I deactivated it in the face of the existence of PDFBox--I don't mind having PDF4J obsoleted; far from it, I'm happy to see that someone did what I didn't have time or energy to do.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115998535589830438?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115998535589830438/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115998535589830438' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115998535589830438'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115998535589830438'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/10/pdf-processing-fun.html' title='PDF 
Processing Fun'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115844062546485820</id><published>2006-09-16T15:43:00.000-05:00</published><updated>2007-03-07T10:29:06.279-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xquery'/><title type='text'>XQuery: Not So Bad After All</title><content type='html'>I've finally had a need to use XQuery for something--up to this point everything I've done with XML since XQuery was solidified was with XSLT and DOM programming. I've been observing XQuery since the effort was started back in the dim mists of time and, like XML Schemas, had little hope that it would ever see the light of day, for the simple reason that there seemed to be too many cooks and too many different requirements for the committee to ever reach a useful consensus on things like syntax and semantics. In particular, there seemed to be a fundamental chasm between the database people who wanted an SQL for XML, and the document people, who wanted XSLT with result sets. But that was from a distant vantage point with little direct visibility into the activity other than getting all the function-related committee email because I'm a member of the XSL working group (not that I actually read any of that email unless the subject line was particularly intriguing and I had the time to devote to reading it).&lt;br /&gt;&lt;br /&gt;But of course my pessimism was unfounded and XQuery has emerged as a solid and useful specification with a number of implementations. 
&lt;br /&gt;&lt;br /&gt;I've now been constructing simple XQueries for a couple of weeks and I must say it's pretty cool to do with a simple query what would take a good bit more work to do with an XSLT script (given a running XQuery-supporting repository, of course). &lt;br /&gt;&lt;br /&gt;I also found Mike Kay's &lt;a href="http://www.stylusstudio.com/xquery_primer.html"&gt;Learn XQuery in 10 Minutes&lt;/a&gt; to be a very helpful startup guide, providing just the how-to information I needed to get the basic syntax and techniques. The rest of XQuery (at least the part I've used) is pretty obvious and intuitive to anyone familiar with XSLT.&lt;br /&gt;&lt;br /&gt;Of course, I haven't had the opportunity or time to determine to what degree various tools provide complete and correct implementations of XQuery, but I'm sure I will. By the same token, the standard has a solid set of test cases that make it pretty hard to not know if you are doing it both correctly and completely. &lt;br /&gt;&lt;br /&gt;My main concern would be collation, which was very broken in XSLT (in the sense that the mechanism for specifying custom collators for doing sorting was not well standardized and was only usefully implemented by Saxon for the purposes of doing XSLT processing of localized documents [i.e., back-of-the-book index collation]). I know that XSLT 2 (and therefore XQuery, which shares the same collation semantics) has attempted to be more general but when I first looked at what was in Saxon 8 (a couple years ago now) it wasn't quite what I needed (you had to declare a separate collation URI for each locale, while I wanted a single collation URI that named a collator that then did the right thing at the right time based on an outside configuration mechanism). [With Saxon 6 you had to implement per-locale classes that Saxon used based on an invariant mapping of locale names to collator class names. At least the XSLT 2 mechanism is more general.]  
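&lt;br /&gt;&lt;br /&gt;A rough sketch of that single-collator idea, in Python rather than XSLT or Java, may help. Everything here is an assumption for illustration: the toy German and Swedish sort rules stand in for real locale data, and the names COLLATORS and collate are invented; this is not Saxon's API or the XSLT 2 collation mechanism.&lt;pre&gt;
```python
# Toy sort-key rules standing in for real locale data (assumed for illustration).
GERMAN_FOLDING = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def german_key(word):
    """German dictionary order: umlauted vowels sort with their base letters."""
    return word.lower().translate(GERMAN_FOLDING)


def swedish_key(word):
    """Swedish order: å, ä and ö are separate letters that follow z."""
    return [SWEDISH_ALPHABET.find(ch) for ch in word.lower()]


# The "outside configuration": one table consulted by a single entry point.
COLLATORS = {"de": german_key, "sv": swedish_key}


def collate(words, locale):
    """Sort words with whichever collator the configuration names for the locale."""
    return sorted(words, key=COLLATORS[locale])


words = ["zebra", "äpple", "apple"]
print(collate(words, "de"))  # ['äpple', 'apple', 'zebra']
print(collate(words, "sv"))  # ['apple', 'zebra', 'äpple']
```
&lt;/pre&gt;The caller only ever names one collation entry point; which rules apply is decided by the configuration table, which is the separation described above.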
&lt;br /&gt;&lt;br /&gt;But I haven't had time or business need to push on the XSLT 2/XQuery collation mechanism for a while so I really don't know. But I suppose I really should, because as far as I can tell I'm about the only person who really worries about this particular issue (I developed a generic index configuration and collation support library for use with Saxon, which is available here: &lt;a href="http://www.innodata-isogen.com/knowledge_center/tools_downloads/i18nsupport"&gt;Internationalization Support Library&lt;/a&gt; [note: log-in may be required. If this is a problem, send me an email and I'll forward you a copy.]. Note that this code is equally applicable to XSL-FO 1.0 and the new indexing support in XSL-FO 1.1 as the FO indexing is only about constructing sequences of page numbers and not about sorting the index entries themselves. In addition, this code is equally useful for things like generated glossaries.)&lt;br /&gt;&lt;br /&gt;One interesting question that I've already run into is, when engineering a complete Web site to serve XML data and queries against it, how much should be done in XQuery alone and how much should be done with more traditional Web site technologies such as JSP or Ruby? You can of course use XQueries to generate HTML pages that reflect the query results and can therefore use XQuery exclusively to build a Web site (given some sort of CGI-like facility, such as the extensions that MarkLogic provides or just everyday CGI scripts) but should you? &lt;br /&gt;&lt;br /&gt;My initial instinct is that you should not, that good engineering practice argues for clear separation of concerns and that XQuery should focus on queries and something else should focus on the user interface. But I'd be curious if anyone has a strong counterargument. 
One of the things that's attractive about the do-it-all-in-XQuery approach is that you can build stuff really quick because there's little overhead, so it is good for proofs of concepts and demos. But I can't see it being a sustainable approach for production Web sites (although I'm sure more than one person will answer that they've been doing it for years now).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115844062546485820?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115844062546485820/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115844062546485820' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115844062546485820'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115844062546485820'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/09/xquery-not-so-bad-after-all.html' title='XQuery: Not So Bad After All'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115814001124286834</id><published>2006-09-13T03:37:00.000-05:00</published><updated>2007-03-07T10:29:46.545-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XCMTDMW &quot;xml content management&quot; indirection xinclude'/><title type='text'>XCMTDMW: Why Indirection is So Important for 
Authoring</title><content type='html'>[This is a continuation of the discussion of linking and addressing that left off here: &lt;a href="http://drmacros-xml-rants.blogspot.com/2006/07/xcmtdmw-element-to-element-linking.html"&gt; XCMTDMW: Element to Element Linking: Overview&lt;/a&gt;]&lt;br /&gt;&lt;br /&gt;In my last installment we saw that when we have a direct link to an element, and a new version of that element is created in a way that changes its location so that the existing pointers to it no longer resolve to the correct element, we have to create new versions of the documents that contain the pointers.&lt;br /&gt;&lt;br /&gt;That is, we started with doc_01.xml, which has an XInclude link to a warning:&lt;pre&gt;&amp;lt;?xml version="1.0"?&gt;&lt;br /&gt;&amp;lt;doc&gt;&lt;br /&gt;...&lt;br /&gt;&amp;lt;xi:include href="../common/warnings/dont_run_scissors.xml"/&gt;&lt;br /&gt;...&lt;br /&gt;&amp;lt;/doc&gt;&lt;/pre&gt;Which points to dont_run_scissors.xml:&lt;pre&gt;&amp;lt;?xml version="1.0"?&gt;&lt;br /&gt;&amp;lt;warning&gt;&lt;br /&gt;&amp;lt;p&gt;Don't run with scissors.&amp;lt;/p&gt;&lt;br /&gt;&amp;lt;/warning&gt;&lt;/pre&gt;Thus at time T[1] we have two resources (in the SnapCM sense), one for doc_01.xml and one for dont_run_scissors.xml, and one version of each resource. 
&lt;br /&gt;&lt;br /&gt;The XInclude link relates itself to the warning element that is the root element of dont_run_scissors.xml.&lt;br /&gt;&lt;br /&gt;At time T[2] we create a new resource warnings.xml and create the first version of that resource by copying the warning from dont_run_scissors.xml into it, along with other warnings we have:&lt;pre&gt;&amp;lt;?xml version="1.0"?&gt;&lt;br /&gt;&amp;lt;warning_set&gt;&lt;br /&gt;&amp;lt;warning&gt;&lt;br /&gt;&amp;lt;p&gt;Don't run with scissors.&amp;lt;/p&gt;&lt;br /&gt;&amp;lt;/warning&gt;&lt;br /&gt;&amp;lt;warning&gt;&lt;br /&gt;&amp;lt;p&gt;Don't stand on the top rung of a step ladder&amp;lt;/p&gt;&lt;br /&gt;&amp;lt;/warning&gt;&lt;br /&gt;&amp;lt;/warning_set&gt;&lt;/pre&gt;At this point, time T[2] (each T time reflects a snapshot in time of the repository state, reflecting the commitment of an invariant version of one or more resources into the repository), if we resolve the XInclude link in doc_01.xml, what will we get? We'll get the warning element in dont_run_scissors.xml, for the simple reason that that's where the XInclude is pointing to in the first (and so far only) version of resource doc_01.xml. &lt;br /&gt;&lt;br /&gt;But we know that there's a new version of the warning (how we know is a question that we'll come back to later--for now assume that we talked to Jane, the Warning Mistress, and she happened to mention that she had decided to reorganize all the warnings into a single document). &lt;br /&gt;&lt;br /&gt;As the author of doc_01.xml that leaves us with a choice: do we leave things as they are, knowing that we'll forever get that original version of the warning or do we react to the change in location of the warning? Of course we must react because we want to make sure we get the latest version of the warning content. &lt;br /&gt;&lt;br /&gt;This means that we must create a new version of doc_01.xml that differs only in the &lt;i&gt;form of address&lt;/i&gt; used to include the warning. 
Note that the warning text has not changed nor has the meaning of the warning or how we are using it in doc_01.xml[v1] (meaning version 1 of resource doc_01.xml). &lt;br /&gt;&lt;br /&gt;This is a problem: nothing about the &lt;i&gt;information content&lt;/i&gt; of the elements involved has changed: the content is the same, the semantics are the same, and doc_01.xml had no other need to change. Yet a simple relocation of the warning element forces us to create a new version of doc_01.xml. And not just doc_01.xml, but &lt;b&gt;&lt;i&gt;every document that uses that warning&lt;/i&gt;&lt;/b&gt;, which could be a very large number of documents indeed, if it's a common warning. &lt;br /&gt;&lt;br /&gt;This is clearly not good. It is especially not good if our addresses are to specific versions of resources and not to resources (which are then resolved to specific versions using some policy such as "latest visible version"). This is because the simple act of creating a new version would require that all pointers to previous versions at least be evaluated, and quite likely the documents containing those pointers would also need new versions, which would require analysis of the pointers to &lt;i&gt;those&lt;/i&gt; versions and so on. In the worst case, creation of a single new version of a resource requires creation of new versions of all the other documents in the repository. Not good. &lt;br /&gt;&lt;br /&gt;How can we address this problem? Here are some options:&lt;br /&gt;&lt;br /&gt;A. Disallow the reorganization (effectively requiring that every warning be a separate document)&lt;br /&gt;&lt;br /&gt;B. Assign each warning some sort of universal identifier by which it can be addressed regardless of its storage location&lt;br /&gt;&lt;br /&gt;C. Somehow automate the creation of the new versions with rewritten links when the target element changes in a way that requires a new pointer. 
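&lt;br /&gt;&lt;br /&gt;Before weighing these options, the cascade that version-specific pointers cause can be made concrete with a small Python sketch. It assumes a hypothetical "where-used" index mapping each document to the documents that point at it directly; doc_02.xml and manual.xml are invented names for illustration, and a real repository would hold this index in its own storage.&lt;pre&gt;
```python
from collections import deque

# Hypothetical "where-used" index: target document -> documents pointing at it.
POINTED_TO_BY = {
    "dont_run_scissors.xml": ["doc_01.xml", "doc_02.xml"],
    "doc_01.xml": ["manual.xml"],
    "doc_02.xml": [],
    "manual.xml": [],
}


def documents_to_reversion(changed, pointed_to_by):
    """Return every document that needs a new version when `changed` gets one,
    assuming all pointers address specific versions rather than resources."""
    needs_new_version = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        target = queue.popleft()
        for doc in pointed_to_by.get(target, []):
            if doc not in seen:
                seen.add(doc)
                needs_new_version.append(doc)
                queue.append(doc)  # its own referrers must be rewritten too
    return needs_new_version


print(documents_to_reversion("dont_run_scissors.xml", POINTED_TO_BY))
# ['doc_01.xml', 'doc_02.xml', 'manual.xml']
```
&lt;/pre&gt;One relocated warning touches three documents here; in a densely linked repository the same walk can reach nearly everything.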
&lt;br /&gt;&lt;br /&gt;None of these options is particularly attractive. Option A either turns your repository into a lava flow that can't be changed once constructed or requires you to, as a matter of practice, decompose everything at the lowest level at which you &lt;i&gt;might&lt;/i&gt; want to reuse elements, creating a potentially huge collection of small objects, most of which will never in fact be used.&lt;br /&gt;&lt;br /&gt;Option B works but requires that you use non-standard addressing mechanisms (because there is no &lt;i&gt;standard-defined&lt;/i&gt; space of universal identifiers for elements by which they can be addressed using a standard-defined means, at least for W3C standards). That is, if you use some sort of repository-specific object ID or UUID or whatever, it is, unavoidably, proprietary and you (or your repository provider) will be on the hook for implementing all the addressing infrastructure needed to work with those IDs. This ties your &lt;i&gt;data&lt;/i&gt; to a proprietary system. People do it all the time but that doesn't make it a good thing, especially when it can be easily avoided. &lt;br /&gt;&lt;br /&gt;Option C doesn't really solve the problem, it just hides the problem from users and slows the system down. Don't do that.&lt;br /&gt;&lt;br /&gt;The problem of course is that the problem cannot be solved using direct element-to-element addressing. The problem can only be solved by introducing at least one level of indirection.&lt;br /&gt;&lt;br /&gt;That is, rather than pointing to the warning directly, we point to something that then, by some mechanism, gets us to the right version of the warning at the right time.&lt;br /&gt;&lt;br /&gt;In programming terms this is basic pointer stuff. But in XML linking terms it's a problem because there is no W3C standard that defines any form of indirect address resolution. Think about that. 
&lt;br /&gt;&lt;br /&gt;It does in fact make some sense, because the Web standards are almost entirely focused on information &lt;i&gt;delivery&lt;/i&gt;, not authoring. For delivery there is little value in indirection because the information to be delivered is invariant--that is, when you publish one page you can publish all the other pages as well and therefore everything can just point directly to what it needs to--versioning issues don't really apply in the delivery space in the same way they do in the authoring space.&lt;br /&gt;&lt;br /&gt;But for authoring we must have indirection. Of course the HyTime standard's addressing stuff is nothing but indirection. It's so indirect you can barely make out what you can actually do. But since HyTime is not a realistic option we need something more Web friendly. To satisfy that requirement I defined the XIndirect specification and submitted it as a Note to the W3C: &lt;a href="http://www.w3.org/TR/2003/NOTE-XIndirect-20030612/"&gt;XML Indirection Facility&lt;/a&gt; (I presented a paper on it at Extreme Markup 2003: &lt;a href="http://www.idealliance.org/papers/extreme/Proceedings/html/2003/Kimber01/EML2003Kimber01.html"&gt;XIndirect&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;XIndirect is the simplest thing that could possibly work. It defines two element types, one of which is just for convenience:&lt;br /&gt;&lt;br /&gt;- indirector, which takes an href= attribute that points to the desired ultimate resource target (I should probably update the note to reflect the XInclude-style href=/xpointer= attribute pairs but the function would be the same). An indirector element can have a unique ID. The indirector element has the default semantic of "redirect to my target resource" when it is itself the target of a pointer (of any sort).&lt;br /&gt;&lt;br /&gt;- indirectorset, which contains zero or more indirector elements. 
It's just for convenience, for example, if it's useful to group a bunch of indirector instances together under a common root element or to bind documentation or application-specific metadata to a bunch of indirectors or whatever.&lt;br /&gt;&lt;br /&gt;As far as the XIndirect spec is concerned, indirector elements can occur anywhere--it doesn't matter where they are.&lt;br /&gt;&lt;br /&gt;One important aspect of XIndirect is that it can be used unilaterally with any other linking or addressing scheme--all it requires is that the software that does the address resolution be XIndirect aware so that it knows to resolve the indirections. At the implementation level this is usually expressed as a recursive function that resolves an input pointer and, if that pointer resolves to one or more indirectors, applies itself to those; otherwise it returns whatever it got that wasn't indirectors. A complete implementation also requires cycle detection and hop counting so you can avoid infinite loops or can bail if the resolution is taking too long, but those are frills.&lt;br /&gt;&lt;br /&gt;Given the XIndirect spec and the indirector element we can now start building a more-or-less standards-defined, pure-XML indirect address management system.&lt;br /&gt;&lt;br /&gt;The key to this system is taking advantage of the SnapCM resource concept to create proxy &lt;i&gt;storage object resources&lt;/i&gt; for the element targets of our links. Remember that in our abstract versioning system, only storage objects are versioned. Therefore it follows that to version something you must either make it a storage object or create a storage object proxy for it. The first option is option A above: decompose at whatever level you need so you can version something. The second option is our new option D: use indirection.&lt;br /&gt;&lt;br /&gt;OK, let's turn back the clock to time T[0], before we had created doc_01.xml and its XInclude link to the warning. 
At time T[0] we have the resource dont_run_scissors.xml and its first (and only) version, which is just the single warning element as shown at the start of this post. &lt;br /&gt;&lt;br /&gt;We are tasked with creating new resource doc_01.xml and creating its first version. As part of that, we need to XInclude the warning. Here's what we do:&lt;br /&gt;&lt;br /&gt;1. We create a new document instance (outside the repository) and create an XInclude link to the warning (inside the repository). We create this link by using some sort of editor customization that lets us pick the link target from a list of available targets in the repository and constructs the link syntax for us. We select the don't-run-with-scissors warning (note we're selecting the warning, not the XML document that contains the warning--our intent as authors is to link to a warning--we don't care where that warning element is stored).&lt;br /&gt;&lt;br /&gt;2. The system does the following:&lt;br /&gt;&lt;br /&gt;a. It looks in its "where-used" index and sees that the selected warning has not yet been used as the target of any links.&lt;br /&gt;&lt;br /&gt;b. It creates a new resource rtd_01.xml and creates the first version of that resource with the following content:&lt;pre&gt;&amp;lt;?xml version="1.0"?&gt;&lt;br /&gt;&amp;lt;indirector xmlns="http://www.isogen.com/papers/xindirection.xml"&lt;br /&gt;href="../common/warnings/dont_run_scissors.xml[v1]"&lt;br /&gt;/&gt;&lt;/pre&gt;&lt;br /&gt;It commits the first version of rtd_01.xml to the repository.&lt;br /&gt;&lt;br /&gt;c. In our doc_01.xml file it creates this XInclude link:&lt;pre&gt;&amp;lt;xi:include href="/rtds/rtd_01.xml"/&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;The new resource rtd_01.xml now acts as a proxy for the warning &lt;i&gt;element&lt;/i&gt;. Each version of the resource contains a hard pointer to the warning element's location at the time the version is created. 
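&lt;br /&gt;&lt;br /&gt;The recursive resolution function described earlier, applied to the proxy resource just created, might look roughly like the following Python sketch. The in-memory REPO dictionary, the lookup and resolve names, and the hop limit of 8 are assumptions made for illustration; fragment identifiers and directory paths are left out, and nothing here is prescribed by the XIndirect note itself.&lt;pre&gt;
```python
import re

# Hypothetical repository: resource name -> list of versions, oldest first.
# Each version is ("indirector", href) or ("element", content).
REPO = {
    "dont_run_scissors.xml": [("element", "warning from dont_run_scissors.xml[v1]")],
    "rtd_01.xml": [("indirector", "dont_run_scissors.xml[v1]")],
}

# Matches a version-specific address such as "rtd_01.xml[v2]".
VERSIONED = re.compile(r"^(.+)\[v(\d+)\]$")


def lookup(href, repo):
    """Return the version an address names; a bare name means 'latest visible'."""
    match = VERSIONED.match(href)
    if match:
        return repo[match.group(1)][int(match.group(2)) - 1]
    return repo[href][-1]


def resolve(href, repo, seen=(), max_hops=8):
    """Follow indirectors until something that is not an indirector is reached."""
    if href in seen:
        raise ValueError("indirection cycle at " + href)
    if len(seen) >= max_hops:
        raise ValueError("too many indirection hops")
    kind, value = lookup(href, repo)
    if kind == "indirector":
        return resolve(value, repo, seen + (href,), max_hops)
    return value


print(resolve("rtd_01.xml", REPO))      # warning from dont_run_scissors.xml[v1]

# Committing a second version of the proxy re-targets every link that uses it.
REPO["warnings.xml"] = [("element", "warning from warnings.xml[v1]")]
REPO["rtd_01.xml"].append(("indirector", "warnings.xml[v1]"))
print(resolve("rtd_01.xml", REPO))      # warning from warnings.xml[v1]
print(resolve("rtd_01.xml[v1]", REPO))  # warning from dont_run_scissors.xml[v1]
```
&lt;/pre&gt;Only the proxy gained a version; a document that points at the resource rtd_01.xml needs no change, while a pointer pinned to rtd_01.xml[v1] still reaches the original location.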
&lt;br /&gt;&lt;br /&gt;We commit our new doc_01.xml file as the first version of resource doc_01.xml, creating a new snapshot at time T[1].&lt;br /&gt;&lt;br /&gt;When we go to process doc_01.xml[v1] we resolve the XInclude link as follows:&lt;br /&gt;&lt;br /&gt;1. We resolve the URL "/rtds/rtd_01.xml" to the &lt;i&gt;resource&lt;/i&gt; rtd_01.xml. Applying the default resolution policy of "latest visible" we get version rtd_01.xml[v1]. Because there is no fragment identifier we resolve the URL to the root element of the version, which is the indirector element. &lt;br /&gt;&lt;br /&gt;2. We see it is an indirector and therefore resolve &lt;i&gt;its&lt;/i&gt; href= to its target, which is to specific version v1 of resource dont_run_scissors.xml. Again there is no fragment identifier so we resolve to the root element of the version, which is the warning element. &lt;br /&gt;&lt;br /&gt;3. The warning element is returned as the ultimate target of the XInclude link and the normal XInclude semantics are applied to it.&lt;br /&gt;&lt;br /&gt;Whew. &lt;br /&gt;&lt;br /&gt;Now Jane the Warning Mistress decides to reorganize all the warnings into one file. She does this in an authoring tool that is integrated with our repository such that at the start of the editing session it gets a list of all the elements in the document that are pointed to by any links, whether they be direct links or indirectors. This list allows the editor to do what it can to keep the links consistent as the data is changed in the editor. 
For example, it might (as a matter of policy) disallow the deletion of any element that is a link target or it might make sure that if a target element is copied that the copy is not confused with the original or if an element is copied from a different resource that it remembers that the copied element was a link target in its previous context.&lt;br /&gt;&lt;br /&gt;In this case Jane copies the warning from dont_run_scissors to her new all_warnings.xml document. The editor sees that the warning was the target of an indirector link and remembers that so it can do the right thing at commit time.&lt;br /&gt;&lt;br /&gt;When Jane commits her new all_warnings.xml it creates a new resource all_warnings.xml and the first version of it. As part of the commit process, because the authoring tool knows that the copy of the warning in all_warnings.xml was a copy of the original warning, it also creates a new version of rtd_01.xml that reflects the new version of the warning:&lt;pre&gt;&amp;lt;?xml version="1.0"?&gt;&lt;br /&gt; &amp;lt;indirector xmlns="http://www.isogen.com/papers/xindirection.xml"&lt;br /&gt; href="../common/warnings/all_warnings.xml[v1]#xpointer(/*/warning[1])"&lt;br /&gt; /&gt;&lt;/pre&gt;This creates a snapshot at time T[3] that now includes the new resource all_warnings.xml and its initial version.&lt;br /&gt;&lt;br /&gt;Now, when we go to process document doc_01.xml[v1], when we resolve the XInclude, we will see the indirector, resolve it to the latest version of resource rtd_01.xml, rtd_01[v2], which in turn points to the warning element in all_warnings[v1].&lt;br /&gt;&lt;br /&gt;Note that we &lt;i&gt;did not&lt;/i&gt; need to do anything to doc_01.xml for this to work--only the indirection was versioned--all uses of that indirection are unchanged, because they point to the indirection resource and not a specific version.&lt;br /&gt;&lt;br /&gt;By the same token, if we process document doc_01[v1] in the context of snapshot T[1] we will resolve the 
XInclude to the warning in its original location.&lt;br /&gt;&lt;br /&gt;This solves the link management problem inherent in doing hard pointing, at least as far as link representation goes. That is, given that you know to create the indirectors, once created, they just work, given XIndirect-aware address resolution where it's needed. Of course, there are still a few challenges here.&lt;br /&gt;&lt;br /&gt;First, this does require pretty sophisticated authoring tool functionality for it to be practical--while the indirector elements could be created by hand no sane person would expect other sane people to do it as a standard practice. However, this level of sophistication is required regardless of how you manage your links and addresses: it's simply a fact that a system that supports linking completely has to be sophisticated and there's no getting around it. You can take some shortcuts if your authors are reasonably savvy and you can impose some reasonable constraints, but a fully-realized link-aware authoring environment is non-trivial. It also depends heavily on the details of how your repository manages links and addresses and indirections and versions and resources and they all do it differently.&lt;br /&gt;&lt;br /&gt;Finally, there are some inherent rhetorical challenges that can come up once you start creating version proxies for elements.&lt;br /&gt;&lt;br /&gt;One is that the same target element might be linked to for different purposes (use-by-reference, navigation, semantic association for a specific purpose, whatever). Each of these uses might really want to have its own separate indirector resource that reflects the semantic of the thing used rather than just the initial thing that happened to reflect that semantic (I think this can be usefully cast as a naming problem of the sort that Norm Walsh is currently discussing on his blog). 
That is, if your authors are sophisticated in their use of links for different semantic purposes they may well need sophisticated indirection support as well. At a minimum, the system has to be prepared to manage multiple indirectors for the same target element.&lt;br /&gt;&lt;br /&gt;Another inherent problem is that of notification of linkors. That is, when a new version of an element to which you link is created, you, the owner of the link, need to be informed that the new version exists so that you can decide how to react. If your reaction is simply to use the latest version you do nothing (assuming your link is using the default "latest visible" resolution policy). If your reaction is to continue to use an older version, then you either have to create a new version of your link to change the address to point to either a specific version of the indirector or directly to the specific version of the target element (both are functionally equivalent) or you have to modify the resolution policy associated with the repository-level dependency object that reflects the version-to-resource link. The latter creates a new version of the dependency link but doesn't require a new version of the document that contains the link itself.&lt;br /&gt;&lt;br /&gt;In practice these two problems tend to be limited by both keeping the link semantics pretty simple (transclusion and navigation and that's it) and by imposing invariant policies for reacting to new versions that allow the link reaction to be automatic (i.e., always resolve to the latest visible version).&lt;br /&gt;&lt;br /&gt;Finally, why did I call my indirector document "rtd_01"? 
"RTD" stands for "referent tracking document", that is, a document (in the XML sense) that tracks the versions of a "referent", that is, the target of a reference of any sort.&lt;br /&gt;&lt;br /&gt;To sum up:&lt;br /&gt;&lt;br /&gt;- For authoring purposes, the basic problem of change ripples cannot be solved except through the use of indirection.&lt;br /&gt;&lt;br /&gt;- The indirection provided by repository-level dependency links (version-to-resource links) is necessary but not sufficient when you need to address elements that are not document root elements.&lt;br /&gt;&lt;br /&gt;- The XIndirect W3C Technical Note provides the simplest possible indirect addressing syntax for use in a W3C/XML environment.&lt;br /&gt;&lt;br /&gt;- By creating RTD documents (element proxies) for elements, we can track the version history of individual elements regardless of where they are stored and regardless of whether or not they are document root elements.&lt;br /&gt;&lt;br /&gt;- Tracking the version history of an element requires maintaining knowledge that a given element has a proxy during the editing process so that you can accurately create new proxy versions following a change in location of the element.&lt;br /&gt;&lt;br /&gt;- With this mechanism, it doesn't matter how you address an element from an indirector--unique IDs have no particular functional advantage over simple XPaths, for example. However, IDs might have an advantage if you have to guess at the correspondence between an older version of an element and new versions that are being put into the repository, for example, as the result of an upload of a new version that was edited outside the scope of the repository. 
But even there IDs can only be a clue.&lt;br /&gt;&lt;br /&gt;- The actual data processing involved in resolving indirection is not a big deal (and I include a sample XSLT implementation in the XIndirect note) but there are some things to be careful of, in particular cycles and over-long sequences of indirectors.&lt;br /&gt;&lt;br /&gt;- The indirection doesn't need to literally be represented as XML documents (even tiny one-element documents). You could of course do the same thing using relational tables or whatever, and you probably should for scalability and/or performance (although I think that a tool like MarkLogic would probably scale and perform pretty well for resolution of XIndirect indirectors and answering where-used questions. Hmm. Definitely worth experimenting with...).&lt;br /&gt;&lt;br /&gt;I've discussed all of this in the context of an abstract versioning model (SnapCM) and an abstract repository that manages resources, versions, and dependency links.&lt;br /&gt;&lt;br /&gt;You can use this abstract model to help you evaluate commercial or one-off XML CMS systems to see how close they come to being able to manage links completely. You will find that some, such as X-Hive's Docato product, come pretty close. Others do not.&lt;br /&gt;&lt;br /&gt;This essentially wraps up my discussion of XML Content Management The Dr. 
Macro Way--we've taken it all the way through to the management of element versions with sustainable link management using a completely generic and standards-based approach.&lt;br /&gt;&lt;br /&gt;About the only thing I haven't covered is the SnapCM notion of "sync", which I think is important but it is not necessary &lt;i&gt;per-se&lt;/i&gt; for doing link management as described here (although it is essential for configuration management of the linked documents).&lt;br /&gt;&lt;br /&gt;Maybe that's what I'll talk about next, who knows?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115814001124286834?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115814001124286834/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115814001124286834' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115814001124286834'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115814001124286834'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/09/xcmtdmw-why-indirection-is-so.html' title='XCMTDMW: Why Indirection is So Important for Authoring'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115720374635660312</id><published>2006-09-02T08:06:00.000-05:00</published><updated>2007-03-07T10:30:33.784-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xiruss eclipse &quot;eclipse plug-in&quot; marklogic'/><title type='text'>XIRUSS-T Update: Eclipse Plug-in</title><content type='html'>Last Thursday I bought a copy of &lt;i&gt;&lt;a href="http://www.amazon.com/Eclipse-Building-Commercial-Quality-Plug-ins-2nd/dp/032142672X/sr=8-1/qid=1157202130/ref=pd_bbs_1/102-7714224-6492113?ie=UTF8&amp;s=books"&gt;Eclipse: Building Commercial-Quality Plug-ins (2nd Edition)&lt;/a&gt;&lt;/i&gt; by Eric Clayberg and Dan Rubel with the goal of creating an Eclipse plug-in XIRUSS client. The book is very well written and authoritative and the Eclipse plug-in framework is a remarkable piece of work, both in its overall design for extension and integration and in its execution. It makes creating plug-ins remarkably easy (at least from a getting started standpoint) and the SWT/JFace libraries for user interface components feel more solid and logical than AWT (not that I have any particular basis on which to judge as I've done very little UI development over the years).&lt;br /&gt;&lt;br /&gt;Anyway, I had to do some business travel over the weekend (tip: if you're going from Austin, Texas to Norwalk, CT, don't try to drive from Newark Airport--take the train) so I packed the book (which fortunately isn't enormous, just big) and made some progress.&lt;br /&gt;&lt;br /&gt;As of this morning I have a very simple repository tree viewer that will reliably navigate the branch-snapshot-version structure of a running repository. The next step will be to make it sufficiently sophisticated to be an actually useful viewer, with the ability to refresh the view, filter, and sort. 
Once I get that going then I can start adding actions to the tree, such as creating new mutable snapshots and committing them, using drag and drop to organize versions inside organizers, and so on. The next step after that (or possibly before that) is to implement a property page view that can show the properties of repository objects. Once that's in place, then I can start working on integrating editing from the repository, which shouldn't be too hard but at that point I'll be doing deeper integration with the Eclipse framework. If I understand the general Eclipse framework, I should eventually be able to hook the XIRUSS client into the Team infrastructure at which point any Eclipse-managed resource could be managed in XIRUSS more or less transparently, although I can't imagine that would be trivial.&lt;br /&gt;&lt;br /&gt;I don't expect any of this to be hard but there are a lot of moving parts and a lot of details to attend to and things like listeners and event handlers that are somewhat outside my normal pipeline tree-walking data processing programming experience.&lt;br /&gt;&lt;br /&gt;At work I've been put onto some high-visibility sales support activities which have the downside that I've less time and energy to spend on XIRUSS but the upside that I'm getting to push pretty hard on current XML content management and indexing tools. I've already reported on MarkLogic and my opinion has, if anything, only improved as I've worked more closely with the software and the folks at MarkLogic. I'm also learning and using XQuery for the first time, which is kind of fun (I simply had no need to use it up until now as XSLT and XPath did what I needed). &lt;br /&gt;&lt;br /&gt;I realize that y'all realize that most of this XIRUSS status reporting is for my own benefit and that nobody's waiting breathlessly for me to get this code to a more usable state but I would be interested to know if anybody is either trying the code or otherwise tracking my progress. 
My main motivation is to get the code to a state such that when I start writing in detail about the versioned linking scenarios there will be running code that demonstrates the management functionality.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115720374635660312?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115720374635660312/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115720374635660312' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115720374635660312'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115720374635660312'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/09/xiruss-t-update-eclipse-plug-in.html' title='XIRUSS-T Update: Eclipse Plug-in'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115659815403007280</id><published>2006-08-26T07:46:00.000-05:00</published><updated>2007-03-07T10:30:58.078-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xiruss http jython'/><title type='text'>XIRUSS-T Update: Now WIth Direct HTTP Access to Contente</title><content type='html'>I have uploaded a new release of XIRUSS-T to Sourceforge, &lt;a 
href="https://sourceforge.net/project/showfiles.php?group_id=110203&amp;package_id=119097&amp;release_id=442458"&gt;xiruss_t_build_20060826&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The code now includes two HTTP servers, the API server and the "viewer" server. I've provided a new top-level server runner that starts both servers. I've added some convenience functions to the Jython xiruss_client.py script to make manipulating the repository contents a little easier.&lt;br /&gt;&lt;br /&gt;This release adds support for direct HTTP-based access to version content such that you can now import XML documents and then access them via URL from any HTTP-aware tool (e.g., a Web browser, an XSLT processor, an editor like oXygen XML, etc.) such that references to other documents will be resolved against the repository correctly. &lt;br /&gt;&lt;br /&gt;For example, using the Jython client as a helper, I imported a directory containing an XML document that uses another schema that imports yet another schema (which is in turn made up of many small parts). The whole lot gets imported (via the built-in directory importer). Having done that, I then navigated via the new HTTP server (started on port 9091 by default) to the version that is an imported XML document. Clicking on the link from the snapshot view to the version gets you the content of the version in the browser window. There you can see that the pointers to, for example, schemas, have been rewritten as references to resource IDs with resolution policy names as URL parameters.&lt;br /&gt;&lt;br /&gt;I then copy the URL of the version to the clipboard, open oXygenXML editor and do "Open URL". I paste the copied URL into the box and do open. The version is opened in the editor. 
I then push the "validate" button and the document is validated against its schema accessed directly from the repository.&lt;br /&gt;&lt;br /&gt;Not earth shattering functionality but it's a big milestone for XIRUSS.&lt;br /&gt;&lt;br /&gt;The HTTP viewer server is very crude and unsophisticated--I'm not a Web guy and have not put any real effort into making it look pretty--it's just a way to demonstrate accessibility of versions. &lt;br /&gt;&lt;br /&gt;This is the minimal functionality needed to make the XML versions stored in a XIRUSS repository directly usable without any sort of explicit export action.&lt;br /&gt;&lt;br /&gt;Note that being able to actually edit a version through a tool like oXygenXML would require either implementing the necessary WebDAV protocols or providing a plug-in that works via the XIRUSS client API. The last time I looked into implementing WebDAV it appeared to be harder than I expected so I didn't do it. At the time I couldn't find a nice layered WebDAV implementation that would have been quick to adapt to my stuff. That might be different now, I don't know.&lt;br /&gt;&lt;br /&gt;Finally, I think that my approach to the rewritten URLs needs to be thought through carefully. The current approach works but it binds the resolution policy into the version content and I think that is wrong. You should be able to change the resolution policy for a dependency without modifying the version (the current code reflects my initial implementation from a couple of years ago). I think the right thing to do is to point to dependency objects but then that imposes some requirements for dependency existence that the repository model currently doesn't impose. So I have to think it through. 
But definitely what I'm doing now is not 100% correct.&lt;br /&gt;&lt;br /&gt;Toward that end I think my next task will be to implement an Eclipse plug-in that provides more sophisticated access to the repository and enables direct editing of new versions through Eclipse-integrated editors. I don't think this will be too hard, certainly no harder than doing my crude HTTP stuff was, and it will provide a much nicer interface overall.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115659815403007280?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115659815403007280/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115659815403007280' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115659815403007280'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115659815403007280'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/08/xiruss-t-update-now-with-direct-http.html' title='XIRUSS-T Update: Now WIth Direct HTTP Access to Contente'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115642552019730147</id><published>2006-08-24T07:38:00.000-05:00</published><updated>2007-03-07T10:31:33.300-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='&quot;office open xml&quot; microsoft 
&quot;microsoft office&quot;'/><title type='text'>Office Open XML: Good or Evil?</title><content type='html'>In response to a prospect's alleged comment that "Word will save tables as CALS tables" (or something to that effect--I got the comment second or third hand) I downloaded the Office 2007 Beta and started looking into the whole &lt;a href="http://www.ecma-international.org/memento/TC45.htm"&gt;Office Open XML&lt;/a&gt; thing. &lt;br /&gt;&lt;br /&gt;First, as far as I can tell from a little hands-on testing and Google searches, there's no built-in support for CALS or OASIS Exchange tables in Office 2007, at least in the beta. The table markup in Office Open XML is definitely the same Word ML stuff from Office 2003.&lt;br /&gt;&lt;br /&gt;I also read some of the commentary on both sides of this issue and found it amusing. It's amusing because it's just so typical of everyone involved.&lt;br /&gt;&lt;br /&gt;My feelings about Microsoft as an enterprise are no secret, but I'll outline them here:&lt;br /&gt;&lt;br /&gt;- As a true blue legacy IBMer I was raised with a built-in hatred of Microsoft (I lived through the whole Windows vs. OS/2 times, having started at IBM about the time the IBM XT was released). I try not to let this youthful indoctrination color my objective analyses too much.&lt;br /&gt;&lt;br /&gt;- I feel strongly that enterprises should compete on value not proprietary lock-in and therefore have many objections to Microsoft's core business practices. This is particularly frustrating to me because of my next opinion.&lt;br /&gt;&lt;br /&gt;- Microsoft has lots of smart people who can and do create excellent software. That is, Microsoft is more than capable of competing on value alone, at least now that it has established market dominance. Of course there is the issue of free vs licensed software that does throw a wrinkle into this equation--if OpenOffice is free how does Microsoft get legitimate revenue in a value-only competition? 
They would have to offer enough extra value to make it worth paying for. In fact they probably do but it would be a leap of faith for them to go that route (although Office Open XML may in fact represent an unavoidable move in that direction anyway, see below).&lt;br /&gt;&lt;br /&gt;- Microsoft also has lots of people, smart or not, who make totally boneheaded design and implementation decisions that then get baked into products forever. I'm thinking specifically of the fact that Word has not been able to manage the auto-numbering of nested numbered lists since Version 2 (and maybe not before then). Some of this is just people not thinking it through, as always happens in software development, but I think a lot of it is a corporate culture of "get it out quick, we'll fix it in the next release", that is, not valuing engineering quality quite as much as I think they should (which really means caring more about maximizing revenue than about providing the best possible solutions to customers--which if you're a stockholder is a good thing but if you're a user is a bad thing [that being one of the essential problems with Capitalism as an economic system]).&lt;br /&gt;&lt;br /&gt;To a large degree this makes Microsoft no different from most software companies. The difference of course is Microsoft's monopoly position in both operating systems and office software--it paints a big target on their backs. But Microsoft isn't doing anything that IBM didn't do for 20 years before the PC came out.&lt;br /&gt;&lt;br /&gt;I used to rant about how evil MS Office (and in particular MS Word) was as a proprietary format--it locked your data into a format you didn't own and over which you had no control. That was definitely bad and anyone who accepted that agreement was a dupe and a fool. This led of course to discussions of why (at the time) SGML was A Better Way. And what tool were the slides for those presentations done in almost without exception? Of course it was PowerPoint. 
[I did try on occasion to hack my own SGML-based presentation systems but I never had the time or tools to make it really work and I had to be able to interoperate with my less-enlightened colleagues.]&lt;br /&gt;&lt;br /&gt;I must also confess that after years of resisting I got an XBox and actually subscribe to Official XBox Magazine (and Lego Star Wars II is going to &lt;i&gt;ROCK&lt;/i&gt;). So clearly when they want to do it right Microsoft can: they're big, they've got lots of talent at their disposal. In short, they can choose to do things however they want to.&lt;br /&gt;&lt;br /&gt;And I'll just add that for all the rantings I've spewed about Bill Gates and his evil business practices, the Bill and Melinda Gates Foundation demonstrates that he's actually got a heart and is actively trying to do serious good for the world, so full props to Mr. Bill for putting his billions to use. &lt;br /&gt;&lt;br /&gt;Oh, and I hate MS Word with the fiery passion of a thousand burning suns. I'd sooner chew off my own arm than spend any time actually authoring words in Word. I've spent so many years authoring XML that having to deal with $*%&amp;# like doing a backspace at the end of a paragraph destroys its formatting with no good way to get it back or the complete inability to do autonumbering and any other number of just stupid things that people tolerate day after day for reasons that I can't understand and the egregious waste of productivity that I've observed in my own XML-steeped colleagues who are &lt;i&gt;literally sitting next to me&lt;/i&gt; just makes me want to &lt;i&gt;&lt;b&gt;SCREAM&lt;/b&gt;&lt;/i&gt;. But that's just me.&lt;br /&gt;&lt;br /&gt;So what about Office 2007 and Office Open XML?&lt;br /&gt;&lt;br /&gt;I'm not going to bother to form a technical opinion about the relative merits of, for example, ODF and OOX because it just doesn't matter. I mean really. 
At the end of the day the people who create Word documents (poor bastards) or spreadsheets or presentations are the ones who care and they only care about whether they can get the work done reasonably quickly and does it look right? They don't care about formats or XML data islands or how metadata is stored relative to the core content. They also don't, by and large, care about interoperation because everybody uses Word don't they?&lt;br /&gt;&lt;br /&gt;Microsoft has consistently demonstrated that their policy is to use standards only when it suits their interests. They were dragged into XML kicking and screaming (despite being founding members of the XML Working Group) because they knew it would be a chink in their proprietary armor that would allow wedges to be driven in. But then XML took hold and they had no choice so they embraced it, which is to their credit. That they embraced it by just XMLifying RTF is no surprise but at least they did it. And they documented it, something they never did completely with RTF (I'm sure there are those of you who remember when alternating versions of Word would fail to parse RTF that was valid per the RTF spec &lt;i&gt;in different ways&lt;/i&gt;).&lt;br /&gt;&lt;br /&gt;And Office 2003 even let you edit XML documents in arbitrary schemas (as long as they were in a namespace and defined using XSD schemas, a decision which is too strict but since it's my preferred policy for XML usage generally I can't really fault them). Of course this feature is largely useless for lots of reasons but hey they did it, so good for them. It demonstrated that they weren't just giving lip service to XML--they took the trouble to design and build a working arbitrary XML editor. [Now if they would just make it useful I would be happy.]&lt;br /&gt;&lt;br /&gt;But it's hard not to see Office Open XML as a cynical attempt to satisfy the European Union and fight OpenOffice in the standards arena. 
All the arguments about "backward compatibility" and "we have to support all the features" are really not germane: if they really cared about there being a single universal standard for office documents they would have started with ODF and gone from there, since it already existed and is certainly close enough to what they need to be a starting point. They could have chosen to eat the cost of using MathML instead of their own math presentation markup. They could have chosen to use SVG instead of their own vector graphic language. It would have cost more, both in development time and application migration, but they could have easily said "As a company we are fully committed to open standards and are willing to do what it takes to make it work." But they didn't, for whatever reason. This saddens me a little, because there was an opportunity here that would have had some real benefit, probably, but it doesn't surprise me at all (in fact, if they had done it that would have surprised me).&lt;br /&gt;&lt;br /&gt;I don't think any of this will materially change the day-to-day situations of people who use office software (whether MS Office or OpenOffice).&lt;br /&gt;&lt;br /&gt;I do think it's a good thing that Office 2007 now stores its data exclusively in XML by default and I think the use of Zip files to organize the different parts which are stored as individual documents is the right thing to do and I applaud Microsoft for that decision. &lt;br /&gt;&lt;br /&gt;And even though the ECMA standardization of Office Open XML is driven by cynical business motives, it's still a standard which means that it is truly open (in the sense that there is no license cost or exposure for using the format or implementing support for it) which will be to our benefit. 
I suspect that it will have the same effect that using XML did: it will force Microsoft to compete more on value than on lock-in, to engineer things a bit more carefully, and to be more consistent in their implementations from release to release.&lt;br /&gt;&lt;br /&gt;For integrators it definitely makes it easier for us to connect things to Office (i.e., creating an X-to-OOX transform or adapter) with some assurance that the code we write today will still work five years from now.&lt;br /&gt;&lt;br /&gt;So while I think it's pretty clear that Office Open XML was driven almost entirely by self-serving business needs I can't see how it's a bad thing in general and it looks like it's actually a good thing if you recognize the reality that most office documents are in fact created in MS Office.&lt;br /&gt;&lt;br /&gt;Now as for the new user interface--that's going to take some getting used to, but since I don't use Word it doesn't really matter to me, does it?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115642552019730147?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115642552019730147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115642552019730147' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115642552019730147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115642552019730147'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/08/office-open-xml-good-or-evil.html' title='Office Open XML: Good or Evil?'/><author><name>Eliot 
Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115630787118732636</id><published>2006-08-22T22:53:00.000-05:00</published><updated>2007-03-07T10:32:06.991-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xiruss marklogic integration &quot;xml search and retrieval&quot;'/><title type='text'>MarkLogic: Integrated with Xiruss in No Time At All</title><content type='html'>For a project at work I've started evaluating the &lt;a href="http://www.marklogic.com"&gt;MarkLogic&lt;/a&gt; XML search engine. So far I'm pretty impressed (although I certainly haven't stressed the software beyond just getting it running and writing a little code against it). The installation and setup is pretty straightforward. The provided user interfaces are solid and usable. The documentation is clear and informative. The Java API is logical and small. The reports from colleagues who have used it more heavily are that it is very fast. [Disclaimer: Innodata Isogen is a MarkLogic partner (as far as I know) as we are partners with almost every product vendor in the XML space. This is the first time I've personally done anything with MarkLogic. My other experience with XML-aware search and retrieval tools was over six years ago when we beat our heads against the version of Verity that was at that time integrated with Documentum and that had some serious problems, including the inability to index and retrieve elements with "." in their tag names. So I can't claim to have any basis for comparison of MarkLogic to other similar tools--I'm simply reporting on my impressions of MarkLogic. 
This is also my first real use of XQuery for anything.]&lt;br /&gt;&lt;br /&gt;One of the things I like about MarkLogic is that it is focused on a specific task: indexing and retrieving XML using XQuery. It doesn't also try to be a content management server or anything, and MarkLogic seems to be clear about that, which is good.&lt;br /&gt;&lt;br /&gt;After reading the Java API documentation I realized that it would be trivial to integrate a MarkLogic server with XIRUSS-T, which I did this evening in about an hour (of which 30 minutes was spent working out how my own code worked, 20 minutes was figuring out the sequence of MarkLogic API calls to make (which I did using Jython connected to both a running XIRUSS-T server and a running MarkLogic server), and then about 10 minutes coding up my integration code).&lt;br /&gt;&lt;br /&gt;This is pretty impressive to me because it indicates that the MarkLogic system is solid and easy to integrate (at least for the simple thing I did, but I don't see any great potential complexities other than careful error handling in what I've seen so far). It certainly passed the first gates of being easy to get running, easy to figure out how to do something useful with it, and easy to write custom code against its API. A lot of tools don't pass the first gate and even fewer pass the second or third.&lt;br /&gt;&lt;br /&gt;My integration with XIRUSS is very simple and not anywhere near as complete as you'd really want in a fully-realized system, but it's sufficient to allow me to throw an XQuery at any document stored in the XIRUSS repository.&lt;br /&gt;&lt;br /&gt;To do the integration I wrote a simple XIRUSS StorageManager that is just a wrapper over any other storage manager. 
This wrapper creates a MarkLogic-specific StorageObjectData instance that is itself a wrapper over any other StorageObjectData implementation (which in turn manages the actual storage and access to the data content of a StorageObject version).&lt;br /&gt;&lt;br /&gt;The MarkLogicStorageManager is constructed with the URI of a MarkLogic XDBC server (which provides remote access to a MarkLogic server via a simple Java API) and holds the top-level server access object. It is constructed with a real storage manager and just delegates to it for all the methods except setStorageObjectData(), in which it constructs a new MarkLogicStorageObjectData instance (that in turn wraps the StorageObjectData instance created by the underlying real storage manager). This approach lets you use the MarkLogic storage manager with any particular way of storing the data. The alternative would have been to directly subclass InMemoryStorageManager or FileStorageManager, but since MarkLogic doesn't care where the data is stored in the repository, the wrapper approach seems more appropriate.&lt;br /&gt;&lt;br /&gt;The MarkLogicStorageObjectData class adds to the "close()" method of StorageObjectData (called when you're done writing to a mutable version's content) the logic to get a new MarkLogic session, create a "content" object (which indexes the storage object's content), and insert that content object into the MarkLogic repository, named in a way that maps directly to the Version object as stored in the XIRUSS repository:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;public void close() throws IOException {&lt;br /&gt; this.data.close();&lt;br /&gt; // Now write the data to the MarkLogic repository.&lt;br /&gt; StorageObject so = this.getStorageObject();&lt;br /&gt; String mlUrl = HttpApiUrlConstants.VERSIONS + "/" + so.getId();&lt;br /&gt; Session mlSession = this.contentSource.newSession();&lt;br /&gt; ContentCreateOptions options = null;&lt;br /&gt; if (so instanceof XmlStorageObject) {&lt;br 
/&gt;  options = ContentCreateOptions.newXmlInstance();&lt;br /&gt; } else if (so instanceof TextStorageObject) {&lt;br /&gt;  options = ContentCreateOptions.newTextInstance();&lt;br /&gt; } else {&lt;br /&gt;  options = ContentCreateOptions.newBinaryInstance();&lt;br /&gt; }&lt;br /&gt; logger.debug("Creating MarkLogic content object for version " + so.getId() + ": " + so.getName());&lt;br /&gt; Content content = ContentFactory.newContent(mlUrl, this.getInputStream(), options);&lt;br /&gt; logger.debug("Content object created");&lt;br /&gt; try {&lt;br /&gt;  logger.debug("Inserting content into MarkLogic server...");&lt;br /&gt;  mlSession.insertContent(content);&lt;br /&gt;  logger.debug("Content inserted");&lt;br /&gt; } catch (RequestException e) {&lt;br /&gt;  logger.error(e);&lt;br /&gt;  throw new IOException("Exception putting content into MarkLogic server: " + e.getMessage());&lt;br /&gt; } finally {&lt;br /&gt;  // Release the session whether or not the insert succeeded.&lt;br /&gt;  mlSession.close();&lt;br /&gt; }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;I then created a subclass of JettyXirussHttpApiRunner that does nothing more than create a new MarkLogicStorageManager and set it as the default storage manager for the repository:&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;XirussRepository rep = new XirussRepositoryDefaultImpl();&lt;br /&gt;URI mlURI = new URI("xcc://admin:admin@localhost:8010/Documents");&lt;br /&gt;StorageManager sm = new MarkLogicStorageManager(rep, rep.getDefaultStorageManager(), mlURI);&lt;br /&gt;rep.addStorageManager(sm);&lt;br /&gt;rep.setDefaultStorageManager(sm.getId());&lt;br /&gt;rep.setPort(port);&lt;br /&gt;MarkLogicXirussHttpApiRunner runner = new MarkLogicXirussHttpApiRunner(rep);&lt;br /&gt;runner.start();&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;That's all there was to it. 
I then used my little Python XIRUSS client to import an XML document and hey presto, from the MarkLogic sample query UI (a Web page that just lets you submit arbitrary XQueries and see the results) I could query against the document I just imported.&lt;br /&gt;&lt;br /&gt;I think this exercise validates the XIRUSS design to some degree, given how easy it was to bind it to MarkLogic by using the defined extension points.&lt;br /&gt;&lt;br /&gt;To make this integration more complete I'd want to do things like reflect XIRUSS-maintained Version properties in the MarkLogic repository as appropriate (MarkLogic has the concept of arbitrary properties associated with indexed documents), have some association between branch and snapshot visibility in XIRUSS and the equivalent security settings in MarkLogic (i.e., when you query the MarkLogic database you can only see results for Versions that are visible in your current branch and snapshot context) and that sort of thing, as well as integration of XIRUSS's schema registry with the MarkLogic schema awareness (needed to do schema-type-aware XQueries and validation through MarkLogic's built-in processing support). There's also some schema-specific configuration of MarkLogic that you may need to do (such as fragmentation points in documents so it can handle large document instances).&lt;br /&gt;&lt;br /&gt;But I've certainly proven to myself that the minimal useful integration is not at all hard.&lt;br /&gt;&lt;br /&gt;Also, MarkLogic offers both time-limited evaluation versions and a size-limited "community" version that is an ideal companion to XIRUSS-T (as a toy system). 
&lt;br /&gt;&lt;br /&gt;The MarkLogic storage manager code is in the XIRUSS-T Subversion repository on SourceForge (I created it as a separate Eclipse project from the main xiruss-t code base, so it's in trunk/marklogic_storage_manager).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115630787118732636?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115630787118732636/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115630787118732636' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115630787118732636'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115630787118732636'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/08/marklogic-integrated-with-xiruss-in-no.html' title='MarkLogic: Integrated with Xiruss in No Time At All'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115613517452190101</id><published>2006-08-20T23:26:00.000-05:00</published><updated>2007-03-07T10:32:27.861-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xiruss'/><title type='text'>XIRUSS-T Update: New Release On SourceForge</title><content type='html'>I have finally gotten the client/server code sufficiently complete and functional to make it worth formally releasing: &lt;a 
href="http://sourceforge.net/project/showfiles.php?group_id=110203"&gt;Xiruss-t-build_20060820 &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This code provides two jars, xiruss-t-client.jar and xiruss-t-server.jar, as well as all the source code (including unit tests). I've also included a very simple but very handy Python script (python/xiruss_client.py) for use with Jython that makes it easy to import a file into the running repository. See the release notes on SourceForge for details.&lt;br /&gt;&lt;br /&gt;I still need to create a little GUI for importing files and directories and for navigating the repository, and to restore the currently broken Web-based end-user interface.&lt;br /&gt;&lt;br /&gt;But what's released should be sufficient for anyone who is interested in looking under the hood to easily play around with a working system. And by "working" I mean "all the unit tests pass but beyond that I make no guarantees and a number of client-side methods are not yet implemented".&lt;br /&gt;&lt;br /&gt;So I'm going to put this code down for a while (or at least not work on it quite so obsessively--my wife has been starting to give me rather dirty looks the last few days) and return to the main discussion of XML content management. &lt;br /&gt;&lt;br /&gt;And here's a little side question: does anyone know of a quick way to translate a bunch of POJO code into the equivalent Python? I have a Web site for xiruss.org but my cut-rate hosting service only supports Python and Perl [And I'd eat a gun before I'd ever write another line of Perl]. I'd like to set up a demonstration server that people can use to put their own stuff into but I'd need a Python implementation of the server. My quick research suggests there's no such animal. 
I know I could do most of it with either reflection or just search and replace but I thought maybe somebody out there would have some ideas.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115613517452190101?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115613517452190101/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115613517452190101' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115613517452190101'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115613517452190101'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/08/xiruss-t-update-new-release-on.html' title='XIRUSS-T Update: New Release On SourceForge'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115591017525387426</id><published>2006-08-18T09:05:00.000-05:00</published><updated>2007-03-07T10:32:43.390-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xiruss'/><title type='text'>XIRUSS-T Update: Can store and retrieve XML compound documents</title><content type='html'>I have finally completed the minimal implementation of support for importing and getting back XML compound documents. 
This means that from the client you can use the provided XML importer to import an XML compound document (any document using XInclude, an XSD schema, or an XSLT style sheet) and, on the client, request the imported versions and get a DOM from them. &lt;br /&gt;&lt;br /&gt;This is the core functionality needed to make XIRUSS-T a useful &lt;i&gt;XML-aware&lt;/i&gt; content management system.&lt;br /&gt;&lt;br /&gt;I still have more testing to do and more client-side methods to implement but the system is now minimally usable for realistic XML management use cases. &lt;br /&gt;&lt;br /&gt;My next tasks, in addition to further testing and method implementation, are to get things packaged up nicely, hack a little client GUI, and start documenting the API and code design in more detail.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115591017525387426?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115591017525387426/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115591017525387426' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115591017525387426'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115591017525387426'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/08/xiruss-t-update-can-store-and-retrieve.html' title='XIRUSS-T Update: Can store and retrieve XML compound documents'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115518655200899694</id><published>2006-08-10T00:01:00.000-05:00</published><updated>2007-03-07T10:33:00.635-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xiruss'/><title type='text'>XIRUSS-T Update: Can Write To Storage Object Via HTTP API</title><content type='html'>I've been working feverishly on getting the XIRUSS HTTP API implemented. It's been more work than I anticipated, mostly because doing the API has revealed a number of weaknesses in my original code (not surprisingly since it was hacked at top speed). Extracting interfaces took more time than I thought (Eclipse didn't do everything it should have--not sure if it is a limitation or user error). I also reorganized the code packages to make the distinction between client and server code components clearer and to make a cleaner distinction between core implementations and repository-specific code. Finally, I had to seriously rework my storage manager implementation. But now I have all that in place and I just got the test case that demonstrates that I can create a StorageObject version and put data into it and get it back out via the HTTP API. This is a major milestone. Now all I have to do is refactor the existing Importer code to use the new API and code patterns  and implement any remaining client-side proxy methods that the importers require and it should all just work. 
Once I get that done I can get back to the discussion of versioned hyperdocument lifecycle management.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115518655200899694?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115518655200899694/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115518655200899694' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115518655200899694'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115518655200899694'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/08/xiruss-t-update-can-write-to-storage.html' title='XIRUSS-T Update: Can Write To Storage Object Via HTTP API'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115478569211087881</id><published>2006-08-05T08:29:00.000-05:00</published><updated>2007-03-07T10:33:16.112-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xiruss xiruss-t'/><title type='text'>XIRUSS-T Update: Client Almost Done</title><content type='html'>I have achieved a milestone in my HTTP client implementation: my client test case demonstrates that you can create new versions, commit them to a branch, and get them back again via the newly-created snapshot. 
This is the core functionality needed to get things into the repository and get them back. I also demonstrate support for all the methods on RepositoryObject (the base class for all objects managed by the repository). This code is committed to the SourceForge Subversion repository.&lt;br /&gt;&lt;br /&gt;Still to do:&lt;br /&gt;&lt;br /&gt;- Implement all the remaining methods on Version&lt;br /&gt;&lt;br /&gt;- Implement writing to storage object versions via the API client helper&lt;br /&gt;&lt;br /&gt;- Figure out the best model for getting a new repository and session on the client side (this is an API design question not a functionality question).&lt;br /&gt;&lt;br /&gt;- Refactor the organization of the various interfaces and classes to clearly separate the stuff that is only relevant to servers from stuff needed by clients so that the client-side library can be as small as possible. This will also involve refactoring the abstraction layers from the core repository up through the Xiruss-specific HTTP server. There needs to be a clear abstract layer that adds in user and session awareness. I feel strongly that the core repository data model should be completely generic so that it can be exposed as essentially a single-user process. Issues of multi-user support, including authentication and so forth, are implementation issues that need to be able to vary among implementations. In addition, things like supporting multiple users or ensuring transaction safety and so forth are performance and scalability issues that I am explicitly not addressing in XIRUSS-T. These are things that could be addressed either on top of the base code, using aspects or in how the server-specific objects are implemented, or at the core SnapCM object implementation level. 
But none of that is needed in order to provide a semantically correct distributed server, and ignoring those issues here keeps the code very, very simple, which is my goal.&lt;br /&gt;&lt;br /&gt;I've been thinking about it and I realized that with XIRUSS-T I don't want to impress anyone with the dazzling complexity of my code but with the breathtaking simplicity of the underlying data model and the core implementation objects. My whole point is that these complex challenges of management of versioned XML compound documents can be met through the use of fundamentally simple tools used in clever ways.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115478569211087881?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115478569211087881/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115478569211087881' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115478569211087881'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115478569211087881'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/08/xiruss-t-update-client-almost-done.html' title='XIRUSS-T Update: Client Almost Done'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115457950601472634</id><published>2006-08-02T23:16:00.000-05:00</published><updated>2007-03-07T10:33:38.659-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xiruss xiruss-t'/><title type='text'>XIRUSS-T Update: Client Starting To Take Shape</title><content type='html'>In the unlikely event that there's somebody out there waiting breathlessly for me to continue my exploration of versioned hyperdocument management, I wanted to report on why I haven't posted in the last couple of days. &lt;br /&gt;&lt;br /&gt;I've been working full out on implementing a usable HTTP-based REST server API and corresponding client layer for XIRUSS-T. This required me to extract interfaces for all the core SnapCM classes (something I should have done from the start but that's test-driven development for you--until I started on the client there was no need for interfaces because there was only one implementation of each class). This also required that I refine and fix the implementation of some core SnapCM semantics. Needless to say this was an involved refactor. Thank goodness for reasonably complete unit tests, that's all I've got to say.&lt;br /&gt;&lt;br /&gt;At the moment, the XIRUSS-T code in the Subversion repository on SourceForge has a client API layer and corresponding unit test that can connect to a running XIRUSS-T server over HTTP, create a user, get a session for that user, get the user's session again, and confirm that it is the same session. The client provides proxy objects that reflect the XIRUSS abstract API (thus the need for the interfaces).&lt;br /&gt;&lt;br /&gt;This may sound simple but it was a lot of work to get to this point. Now it's pretty much just a matter of typing to get all the client-side classes and methods implemented. 
&lt;br /&gt;&lt;br /&gt;Once I have the client in place then it will be easy to create scripted or graphic clients to do stuff like navigate the repository, manage imports and exports, and so on. It will also make it easy to implement Layer 3 components as distributed clients, which is the most general thing to do even if they are running on the same machine as the server.&lt;br /&gt;&lt;br /&gt;I really like the REST approach (using normal HTTP protocols and returning the result as XML chunks). I could have used something like RMI but that felt harder, even though it's probably actually less work to implement. But there's something very comforting about being able to point a browser at the server and see the XML response right there in the browser. Once you know the URL construction rules for the API you can navigate around manually. In the case of XIRUSS you can eventually navigate to the data content of a storage object version and see it in the browser. &lt;br /&gt;&lt;br /&gt;It also means that code in any language can connect to the server--no need to somehow provide different language bindings (or be limited to only Java clients). &lt;br /&gt;&lt;br /&gt;So I haven't had any time to write my next post in the XCMTDMW series. 
But I figure most people who are interested probably need some time to catch up to me anyway....&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115457950601472634?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115457950601472634/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115457950601472634' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115457950601472634'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115457950601472634'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/08/xiruss-t-update-client-starting-to.html' title='XIRUSS-T Update: Client Starting To Take Shape'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115435031994318992</id><published>2006-07-31T07:06:00.000-05:00</published><updated>2007-03-07T10:35:37.559-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XCMTDMW &quot;xml content management&quot; indirection xinclude xlink linking hytime  snapcm xpointer'/><title type='text'>XCMTDMW: Element to Element Linking: Overview</title><content type='html'>What do we mean by "linking" in the context of XML document processing?&lt;br /&gt;&lt;br /&gt;The most general definition is "a &lt;i&gt;semantic&lt;/i&gt; object that establishes a set of 
one or more relationships among uniquely-addressable XML components". This definition is reflected by the XLink and HyTime standards, which provide syntax and semantics for establishing arbitrarily-complex relationships between arbitrarily-addressable things. (&lt;span style="text-decoration: line-through"&gt;XLink is limited to the domain of linking among XML components&lt;/span&gt; XLink is limited to the domain of linking among components for which a URL-compatible fragment addressing syntax and semantics have been defined; HyTime provides generic facilities for making anything generically addressable and therefore enables linking anything to anything via a single standard representation mechanism (groves)). [See the comments to this post for more good discussion around my original misstatement of XLink's limitations. The distinction between XLink and HyTime in this area is subtle but important: HyTime provides a generic mechanism by which you can define the "in-memory" addressing representation of anything (the downside is somebody has to define it and somebody has to implement the instantiation of the representation). By contrast, XLink is dependent on different defined data formats (XML, HTML, CGM, MS Word, whatever) defining, as IETF or W3C specifications, what their addressable components are and what the syntax for doing that addressing is. If there is no such specification XLink can't link it. This is one reason it often seems like a good idea to use XML to encode everything: it makes it universally addressable. It's true but it's not the only way it could be done. The Web world could easily define the functional equivalent of HyTime groves and XLink/XPointer could then be defined in terms of addresses of things in terms of that generic grove-like thing. But I don't see that happening any time soon. 
In addition, XLink addressing is done via URIs exclusively, which should not be a limitation in practice but is another difference--HyTime is so flexible that almost any reasonable way of writing down addresses using SGML or XML markup can be made recognizable to a HyTime processor--a degree of flexibility that made HyTime difficult for many people to understand, but I digress.]&lt;br /&gt;&lt;br /&gt;For example, using XLink you could semantically link words in one section of PCDATA to words in another section just as easily as you can link one element to another.&lt;br /&gt;&lt;br /&gt;While that level of generality is sometimes useful and it's important for standards like XLink and HyTime and XPointer to both enable it and standardize it clearly and completely, for the purposes of both discussing the issues inherent in linking and doing workaday technical documentation, we can narrow our focus to a key subset of the general case: linking one element to another element or set of elements.&lt;br /&gt;&lt;br /&gt;First, let me explain my repeated stress on the word "semantic". A link is a semantic relationship whose meaning is independent of how the relationship is established. Think of it like marriage: it doesn't matter whether you're married in a church or by a justice of the peace, in Austin or Amsterdam, in English or in Chinese, the resulting relationship is the same: A is married to B.&lt;br /&gt;&lt;br /&gt;By the same token it doesn't matter how a link is expressed syntactically in your data: XLink, XInclude, HyTime, HTML, your own 20-year-old link markup, the relationships will be the same: Element A is linked to Element B for some reason.&lt;br /&gt;&lt;br /&gt;By contrast, &lt;i&gt;addressing&lt;/i&gt;, on which semantic linking depends, is entirely syntactic. Addressing is the plumbing or mechanics that let you physically connect things together: the pointers. 
The addressing syntax you use has many practical implications, including the availability of implementations, the cost of implementation and processing, the opportunities for interoperation, and so on, but the specific syntax you use doesn't affect the meaning of the relationships established by the links that do the addressing.&lt;br /&gt;&lt;br /&gt;That is, it doesn't matter whether you arrive at your wedding via train or car or pontoon boat, the end result is the same. The cost and speed and availability are different but as long as you get there on time, it doesn't matter which one you use.&lt;br /&gt;&lt;br /&gt;This is very important. Clear thinking about linking requires that you be able to make a complete and clear distinction between the syntax-independent and syntax-specific parts of linking. &lt;br /&gt;&lt;br /&gt;And clear thinking is the only way we will be able to find our way safely to a generalized approach to XML document and link management that can both satisfy all our requirements and not require crazy optimizations.&lt;br /&gt;&lt;br /&gt;My promise to you is that if you stick with me in this exploration of the intricacies and pitfalls of linking, when we come out the other end you will have at your disposal a general architectural approach to link management that can be implemented simply or sophisticatedly as you require and that will do everything you need it to do at a cost exactly proportional to your specific needs in terms of scale, performance, and completeness. That is, if you need to do simple element-to-element linking there is a simple solution that is completely compatible with and upgradable to the most sophisticated system that lets you link anything to anything. We already saw this with the Woodward Governor system. You do not need a hugely-expensive, overoptimized XML-aware CMS as the cost of entry to doing sophisticated linking. You may need one eventually if your scale and performance requirements are high. 
But it is likely that in fact you need something less daunting and expensive.&lt;br /&gt;&lt;br /&gt;OK, back to linking. &lt;br /&gt;&lt;br /&gt;Here are some facts that help us narrow our problem statement while preserving our ability to do more sophisticated things in the future:&lt;br /&gt;&lt;br /&gt;- You can always change the addressing syntax without changing the semantics of the links. For example, you can change from using only ID references to using full XPointers without changing the meaning of any links as long as the addressed result is the same.&lt;br /&gt;&lt;br /&gt;- Any link expressed as an element that points directly to another element (an "inline link") can be replaced with an "out-of-line" link that points to the two original elements without changing the semantics of the relationship expressed. The reverse is also true.&lt;br /&gt;&lt;br /&gt;- Doing one-to-many linking or many-to-many linking is no harder than doing one-to-one linking in a generalized system. It mostly becomes a user interface issue (which is why HTML doesn't directly allow it).&lt;br /&gt;&lt;br /&gt;- The nature of the things linked doesn't change the general nature of the issues inherent in doing semantic linking and addressing.&lt;br /&gt;&lt;br /&gt;- Most of the complexity of link processing and management is in the addressing. &lt;br /&gt;&lt;br /&gt;- Most of the complexity of addressing comes from managing addresses within a body of information under revision. If your data is static and unchanging, addressing is easy, just a simple matter of programming. It's when your data changes over time that things get interesting. &lt;br /&gt;&lt;br /&gt;- The core requirements for linking and addressing in authoring support repositories and delivery support requirements are fundamentally different. 
In particular, authoring repositories must provide sophisticated mechanisms for doing indirect addressing while delivery repositories need not do any indirection and would rather not do any (in order to keep things as simple and quick as possible).&lt;br /&gt;&lt;br /&gt;Taken together, these facts mean the following for us:&lt;br /&gt;&lt;br /&gt;- We can focus on the simplest case, element-to-element links, and know that the same issues and principles will apply to less common cases, such as element-to-text links.&lt;br /&gt;&lt;br /&gt;- Our choice of addressing method will be the primary determinant of the cost of our system in terms of both cost to implement and cost to use.&lt;br /&gt;&lt;br /&gt;I will also observe that in most technical documentation linking is limited to element-to-element links and usually to strictly binary links of one element to one element, for the simple reason that doing anything more sophisticated is challenging for writers from a rhetorical standpoint and is complicated by the inherent lifecycle management challenges posed by linking in general. That is, doing more than simple links is just too hard in most cases.&lt;br /&gt;&lt;br /&gt;[NOTE: In examples that follow I will omit namespace declarations just to keep the examples simple but my policy is that all elements should be in a namespace other than the no-namespace namespace. Just so we're clear.]&lt;br /&gt;&lt;br /&gt;OK, let's pull the covers off a link and see what makes it tick. Let's start with one we've seen, an XInclude link:&lt;pre&gt;&amp;lt;?xml version="1.0"?&gt;&lt;br /&gt;&amp;lt;doc&gt;&lt;br /&gt;  ...&lt;br /&gt;  &amp;lt;xi:include href="../common/warnings/dont_run_scissors.xml"/&gt;&lt;br /&gt;  ...&lt;br /&gt;&amp;lt;/doc&gt;&lt;/pre&gt;Here we have a simple XInclude "include" link. 
This link is establishing a relationship between itself, the &amp;lt;xi:include&gt; element, and the element that is the document element of the XML document named by the href= attribute. The semantics of the relationship are defined by the XInclude specification and are "transclude" or "use-by-reference". &lt;br /&gt;&lt;br /&gt;Note that this is &lt;i&gt;not&lt;/i&gt; a link between the xi:include element and the &lt;i&gt;document entity&lt;/i&gt; "dont_run_scissors.xml". It is also &lt;i&gt;not&lt;/i&gt; a link between the document that contains the xi:include element and the document entity. It is a link from one element, the xi:include element, to another element, the document element of the document entity named. This is very important and if you aren't seeing the distinction we need to stop now and make sure you do see it because this is crucial to our understanding going forward. To make it clearer, let's look at dont_run_scissors.xml:&lt;pre&gt;&amp;lt;?xml version="1.0"?&gt;&lt;br /&gt;&amp;lt;warning&gt;&lt;br /&gt;&amp;lt;p&gt;Don't run with scissors.&amp;lt;/p&gt;&lt;br /&gt;&amp;lt;/warning&gt;&lt;/pre&gt;The relationship established by the xi:include element is between itself and the &amp;lt;warning&gt; element that happens to be the document element of dont_run_scissors.xml. &lt;br /&gt;&lt;br /&gt;Why is this? It's because XInclude defines a useful shortcut which is that, by definition (not just by convention), a reference to a document entity with no explicit XPointer is a reference to that document entity's document element.&lt;br /&gt;&lt;br /&gt;Let's make this clear by changing our data a bit. 
Let's aggregate all our standard warnings into a single document for convenience:&lt;pre&gt;&amp;lt;?xml version="1.0"?&gt;&lt;br /&gt;&amp;lt;warning_set&gt;&lt;br /&gt;&amp;lt;warning&gt;&lt;br /&gt;&amp;lt;p&gt;Don't run with scissors.&amp;lt;/p&gt;&lt;br /&gt;&amp;lt;/warning&gt;&lt;br /&gt;&amp;lt;warning&gt;&lt;br /&gt;&amp;lt;p&gt;Don't stand on the top rung of a step ladder.&amp;lt;/p&gt;&lt;br /&gt;&amp;lt;/warning&gt;&lt;br /&gt;&amp;lt;/warning_set&gt;&lt;br /&gt;&lt;/pre&gt;Now let's create a new version of our linking document to reflect this new organization of warnings [NOTE: I'm pretty sure my xpointer syntax is not complete. I'm keeping it simple for example purposes. See the spec for the exactly correct syntax]:&lt;pre&gt;&amp;lt;?xml version="1.0"?&gt;&lt;br /&gt;&amp;lt;doc&gt;&lt;br /&gt;  ...&lt;br /&gt;  &amp;lt;xi:include href="../common/warnings/warnings.xml"&lt;br /&gt;       xpointer="xpointer(/*/warning[1])"&lt;br /&gt;/&gt;&lt;br /&gt;  ...&lt;br /&gt;&amp;lt;/doc&gt;&lt;/pre&gt;What have we changed? Because the warning we want is no longer a document element (it's no longer the root element of its containing document), we can't use just an href=--we have to add an xpointer= in order to address the element we want. So we've added an xpointer= attribute with an XPointer that addresses the first warning in the new warning_set document.&lt;br /&gt;&lt;br /&gt;The relationship is still the same: the xi:include element is pointing to the don't run with scissors warning. The addressing has changed (because the data changed) but the semantics are the same and the processing result will be the same.&lt;br /&gt;&lt;br /&gt;And note that it doesn't matter &lt;i&gt;how&lt;/i&gt; we address the target warning. Here I made the smallest possible change to the warning data (added a wrapper warning_set element) but I didn't change the target warning at all. 
In particular I didn't do what a lot of people would either assume is required or do instinctively: add an ID to the warning.&lt;br /&gt;&lt;br /&gt;This is to make the point that &lt;i&gt;how you do addressing doesn't frickin' matter&lt;/i&gt; as regards the semantics of the links. The only questions are "how hard is it to create the pointer in the first place and how hard will it be to resolve?" As it happens, with XPointer, most of it is pretty easy and you can do it in XSLT 1 (and it's really easy with XSLT 2). I've done it and I make that XSLT code freely available (I believe an older version is somewhere on the XSL FAQ site--I have a newer version that supports XSLT 2 but I need to post it somewhere). In any case, it's not that hard and it gets easier every day.&lt;br /&gt;&lt;br /&gt;Have I made my point about addressing vs semantics? I hope so because it's crucial to making everything work. In particular, if you can't change the form of address without changing the meaning of your links, link management would be very hard indeed.&lt;br /&gt;&lt;br /&gt;Having said that, it's also the case that the form of address you choose will affect many practical aspects of the system. In particular, if you choose a form of address that is not standards based (that is, is not XPointer or some form of schema-defined key/keyref) then you are at a minimum increasing the cost of implementation because you'll be on the hook for all the code components that have to work with those addresses (both to create them in new documents and to resolve them during processing). 
If the addressing mechanism is specific to a product (for example, references to object IDs in some proprietary repository) then you've tied yourself to that repository &lt;i&gt;at the data level&lt;/i&gt; which I think is a very dangerous thing to do and should only be done when there is no alternative (and there's always an alternative).&lt;br /&gt;&lt;br /&gt;Note too that if your address is to object IDs in a repository you are doing exactly what we did above when we used just the href= to point to dont_run_scissors.xml: you're addressing a &lt;i&gt;storage object&lt;/i&gt; in order to address its root element. That is, any system that decomposes documents at the element level is making individual documents out of each of those elements. That's not necessarily bad (and we'll see later where having the ability to do that as needed is a good thing) but let's not pretend that you are addressing &lt;i&gt;elements&lt;/i&gt; directly. You are not. A lot of the incorrect behavior of these systems (such as synthesizing invalid documents on export) comes from not realizing or admitting that their objects are documents and not elements in some element tree reflecting a single document (which is what they usually claim or the appearance they expose through UIs and APIs). Just saying.&lt;br /&gt;&lt;br /&gt;OK, let's look at what we've done and what we've got so far:&lt;br /&gt;&lt;br /&gt;- We started with a very simple link, an XInclude from inside one document to an element in another document. Our intent was to relate the xi:include element to a single warning element and we did that by pointing to the document entity that contained the warning element and for which the warning element was the root element. 
&lt;br /&gt;&lt;br /&gt;In terms of our storage-management framework, this created a system of two documents with a dependency between the first document and the warning document of type "component of".&lt;br /&gt;&lt;br /&gt;- We decided to put all our warnings into one document (for example, because they all go through a single approval workflow and must all be approved by the same deadline or because they're created and managed by one author). This required us to create a new document, warnings.xml. Into this document we copied the original warning from dont_run_scissors.xml as well as other warnings. We committed this new document into our system.&lt;br /&gt;&lt;br /&gt;- By some means as yet unrevealed, we, the authors of the original document, came to know that the authoritative version of our warning is now in warnings.xml and that we need to create a new version of our document that reflects this new location. So we checked out our doc (let's call it doc_01.xml), added the necessary xpointer= attribute to the xi:include element, and committed this new version into the repository.&lt;br /&gt;&lt;br /&gt;There's some interesting stuff going on here that I need to point out:&lt;br /&gt;&lt;br /&gt;- The original version of doc_01.xml continues to irrevocably point to the original warning in dont_run_scissors.xml. The creation of warnings.xml did not change anything about this. If you were to process version 1 of doc_01.xml right now you would get the same result you got before we created warnings.xml--that is, the warning we would use would be the one in dont_run_scissors.xml, not the one in warnings.xml. &lt;br /&gt;&lt;br /&gt;- There are two versions of the don't run with scissors warning that we, as humans doing this work, know are versions in time. 
However, the information we have seen so far does not explicitly relate the two versions in any way and only weakly implies it through the two versions of doc_01.xml, which differ only in the form of address used for the xi:include (but note that could be because we decided to use an entirely different warning--there's nothing about the link that says we were linking to a new version of the same warning &lt;i&gt;resource&lt;/i&gt; (in SnapCM terms)). And note that making each element its own document wouldn't help us here because the whole point was we wanted all the warnings in one document. If we want that reasonable level of storage organization flexibility then we have to step up to being able to both address elements that are not document elements and provide some way of tracking the version history of elements regardless of their storage locations. Fortunately it's not too hard to do.&lt;br /&gt;&lt;br /&gt;- The change to the warning, in this case a change to its physical location, required us to react by creating a new version of our document doc_01.xml even though the content of the warning itself &lt;i&gt;did not change&lt;/i&gt; and therefore we had no other reason to need to change doc_01.xml. This is very important. This is the essential problem in the management of versioned hyperdocuments. 
Think about the implications here for a large body of documents all of which use this standard warning.&lt;br /&gt;&lt;br /&gt;From this simple use case, which is pretty much the simplest use case, you should start to see a few things with some clarity:&lt;br /&gt;&lt;br /&gt;- Moving from addressing elements indirectly via reference to the XML documents of which they are the root to addressing elements anywhere inside their containing documents complicates things a good bit (mostly for address creation, which really means for authoring user interfaces).&lt;br /&gt;&lt;br /&gt;- There is a need to track the version history of &lt;i&gt;elements&lt;/i&gt;, not just storage objects. It would really be nice to know where our warning, as a unit of managed information in a non-trivial workflow, has been over its lifetime. &lt;br /&gt;&lt;br /&gt;I picked warnings on purpose because they are the most obvious example of information for which there could be severe legal and safety implications and for which you therefore need to know what you said when and where you said it and what documents used which version at what time in the past. That is, when ScissorCo gets sued you need to be able to prove that your authors used the right warning in the right documents and therefore the plaintiff should have known not to run with them. I also chose warnings because they are an obvious target of re-use and they tend to go through an authoring and revision workflow separate from any documents that use them. Keep that in mind as we go forward. Warnings are just an obvious instance of a more common general case in use-by-reference, which is using information among publications or data sets with different workflows that have no necessary or natural synchronizations. 
For example, where core content is developed on a per-engine basis but is used in publications whose workflow schedule is driven by specific product development and release cycles.&lt;br /&gt;&lt;br /&gt;- During authoring (that is, during the revision life cycle of the information) there is a strong requirement for various forms of &lt;i&gt;indirect addressing&lt;/i&gt; in order to avoid the very problem we ran into here: change to a link target requires changing the link source even though the semantics of the link were otherwise not affected.&lt;br /&gt;&lt;br /&gt;The SnapCM model provides one form of indirect addressing, the dependency link, but that alone is not sufficient if we want to enable direct addressing of elements regardless of how they are stored (because SnapCM dependencies are only between storage objects). If your requirements can be met by only doing linking and addressing of document root elements then it is sufficient (although the implication is sometimes that you end up with a lot of very small documents). But it's not that hard to step up to doing indirect addressing of elements anywhere.&lt;br /&gt;&lt;br /&gt;Finally, I'll leave you with one question: what W3C or OASIS or IETF standard provides a mechanism for doing indirect addressing of XML elements that are not document elements? 
[I left out ISO because we already know the answer: HyTime (ISO/IEC 10744:1996).]&lt;br /&gt;&lt;br /&gt;Next time: Why indirection is so important for authoring&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115435031994318992?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115435031994318992/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115435031994318992' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115435031994318992'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115435031994318992'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/07/xcmtdmw-element-to-element-linking.html' title='XCMTDMW: Element to Element Linking: Overview'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115423900280333702</id><published>2006-07-30T00:42:00.000-05:00</published><updated>2007-03-07T10:36:41.890-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='xiruss xiruss-t'/><title type='text'>XIRUSS-T Update</title><content type='html'>I have updated the XIRUSS-T source code so that the latest code in the Subversion repository on SourceForge works correctly (the URL processing had been broken). 
The code that's there includes the beginnings of support for HTTP PUT and POST methods by which you can modify and add to the repository remotely. Using the code that's there, you can, for example, use an interactive Jython session and the XirussHttpClientHelper class to add things to the repository. Not that that's any sort of real client user interface but it does demonstrate that XIRUSS-T is moving from being write-once read-many to fully read/write.&lt;br /&gt;&lt;br /&gt;I've also started implementing a simple REST API that will make it easier to implement clients. I've got the code framework in place (I've implemented returning a list of all the branches in the repository) and implementing the rest of the operations shouldn't take too long--it's mostly typing and working out what the best URL syntax should be.&lt;br /&gt;&lt;br /&gt;I also plan to implement some sort of minimal graphic client UI that will let you import things, set version properties, create dependencies, commit snapshots, create branches, and so on. 
Unfortunately, I'm not much of a UI programmer so I don't know how much I'll really be able to do quickly.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115423900280333702?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115423900280333702/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115423900280333702' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115423900280333702'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115423900280333702'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/07/xiruss-t-update.html' title='XIRUSS-T Update'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115409028829472632</id><published>2006-07-28T07:10:00.000-05:00</published><updated>2007-03-07T10:38:01.097-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XCMTDMW &quot;xml content management&quot; import namespaces'/><title type='text'>XCMTDMW: Import is Everything, Part 4</title><content type='html'>OK, back to our import use cases.&lt;br /&gt;&lt;br /&gt;In Part 2 we left off after having imported the source XML for a publication (doc_01.xml) and its schema (book.xsd) and then having imported a second version of doc_01.xml without importing an unnecessary second version of 
the schema (because we were able to tell, through the intelligence about XSD schemas in our importer, that we already had the right schema instance in the repository).&lt;br /&gt;&lt;br /&gt;We saw that the dependency relationships let us dynamically control which version or versions a link will resolve to at a particular point in time by changing the "resolution policy" of the dependency. This allowed us to import a new version of the schema without automatically making old versions invalid. It also gave us the choice of making some versions invalid or not as best reflected our local business policies. &lt;br /&gt;&lt;br /&gt;Note that the storage object repository (Layer 1) doesn't care whether or not the XML documents are valid--it's just storing a sequence of bytes. It's the business processes and processors doing useful work that care or don't. This is why we can put the repository into a state where we know some of the documents in it are not schema valid. &lt;br /&gt;&lt;br /&gt;Also, it should be clear that whether you allow import of invalid (or even non-well-formed) documents is entirely a matter of policy enforced by the importer. For example, you could say "if it's not schema valid it's not getting in" or you could simply capture the validity state in metadata, as we've done here. You could have a process that will import everything, no matter what, but if, for example, an imported XML document is not well-formed, it will import it as a simple text file with a MIME type of "text" not "application/xml". It's up to you. If your CMS doesn't give you this choice out of the box you've given up a lot of your right to choose your policies.&lt;br /&gt;&lt;br /&gt;In our repository we have the "is schema valid" property which will either be "true" or "false" (it could also be "don't know", for example, if you imported a document that referenced a schema you don't have or that is in a namespace for which you have no registered schema). 
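&lt;br /&gt;&lt;br /&gt;To make that policy choice concrete, here is a minimal Python sketch of the "import everything, record what you learned as metadata" approach. It is an illustration under stated assumptions, not how XIRUSS-T or any particular CMS does it: the function name import_properties, the validators registry keyed by namespace, and the "mime type" property name are all hypothetical, the validator callables stand in for real schema validation (which the Python standard library does not provide), and I've used "text/plain" as the text MIME type.&lt;pre&gt;
```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Hypothetical registry type: namespace URI mapped to a validator callable.
# A real importer would wrap an actual schema validator here.
Validator = Callable[[ET.Element], bool]


def namespace_of(element: ET.Element) -> str:
    """Return the namespace URI of an element, or "" if it has none."""
    tag = element.tag
    if tag.startswith("{"):
        return tag[1:tag.index("}")]
    return ""


def import_properties(data: bytes, validators: Dict[str, Validator]) -> Dict[str, str]:
    """Compute metadata for an incoming storage object without rejecting it."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Not well-formed: still imported, but only as a plain text file.
        return {"mime type": "text/plain"}

    validator = validators.get(namespace_of(root))
    if validator is None:
        # No schema registered for this namespace, so validity is unknown.
        validity = "don't know"
    else:
        validity = "true" if validator(root) else "false"
    return {"mime type": "application/xml", "is schema valid": validity}
```
&lt;/pre&gt;A stricter "if it's not schema valid it's not getting in" policy would be a different function over the same inputs, one that refuses the import whenever the answer is not "true". The repository itself stays the same either way, which is the point: the policy lives in the importer.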
&lt;br /&gt;&lt;br /&gt;Now imagine that we've built a Layer 3 Rendition Server that manages the general task of rendering publication source documents into some output, such as PDF or HTML. It's pretty likely that there's no point in rendering documents that are known to not be schema valid. With our "is valid" property the rendition server can quickly look to see if all the components of a publication are valid before it does any processing, which would be a big time saver.&lt;br /&gt;&lt;br /&gt;Likewise, we can easily implement a Layer 3 management support feature that notifies authors or managers of invalid documents so they know they need to modify them to make them valid again. This is especially important if, as in this case, we might unilaterally cause documents to become invalid through no fault on the part of authors.&lt;br /&gt;&lt;br /&gt;Anyway, back to the use cases.&lt;br /&gt;&lt;br /&gt;I've stipulated that doc_01.xml represents a publication, that is, a unit of publication or delivery. This notion of "publication" is my private jargon but the need for it should be clear in the context of technical documentation and publishing. Most publishing business processes are driven by the creation of publications or "titles" or "doc numbers".&lt;br /&gt;&lt;br /&gt;But there's nothing particular about the XML data that identifies it, generically, as the root of a publication and in fact there's no requirement that a publication be represented by a single root document (although that's a simple and obvious thing to do). &lt;br /&gt;&lt;br /&gt;So we probably need some business-process-specific metadata to distinguish publication roots from non-publication roots. So let's define the metadata property "is publication root" with values "true" or "false". 
For doc_01.xml we set the value to "true" since it is a publication root.&lt;br /&gt;&lt;br /&gt;Now we can do some interesting stuff--we can query the repository to find only those documents that are publication roots, which would be pretty useful. For example, it would allow us to narrow a full-text search to just a specific publication or just produce a list of all the publications. If we also have process-specific metadata for publications, such as its stage in the overall publication development and delivery workflow, we can see where different publications are. If we capture the date it was last published we can know if it's up to date relative to some other related publications. You get the idea.&lt;br /&gt;&lt;br /&gt;So import a new document already!&lt;br /&gt;&lt;br /&gt;Fine. Let's import doc_02.xml, which is the root of another publication. It conforms to the latest version of book.xsd. We have authored this document with knowledge of doc_01.xml and have created a navigation link from doc_02.xml to doc_01.xml, like so:&lt;pre&gt;&lt;br /&gt;&amp;lt;p&gt;See&lt;br /&gt;&amp;lt;bibcite href="http://mycms/repository/resources/RES0001"&gt;Document One&amp;lt;/bibcite&gt;&lt;br /&gt;for more information&amp;lt;/p&gt;&lt;/pre&gt;&lt;br /&gt;It also has a link to a Web resource:&lt;pre&gt;&lt;br /&gt;&amp;lt;p&gt;See&lt;br /&gt;&amp;lt;bibcite href="http://www.docbook.org"&gt;docbook.org&amp;lt;/bibcite&gt;&lt;br /&gt;for more information&amp;lt;/p&gt;&lt;/pre&gt;&lt;br /&gt;Note that the link to doc_01.xml uses a URL that points into our repository. But this is an absolute URL, which it needs to be as long as the document is outside the repository. 
It is a pointer to the resource for doc_01.xml which will be resolved by the repository into the latest version of that resource, which at the moment will be version VER0003, the second version of doc_01.xml.&lt;br /&gt;&lt;br /&gt;This is the easiest case for import because it's unambiguous what the link points to in terms of data that is already in the repository. The address will still have to be rewritten on import but there's no question what it needs to be rewritten to and no question that the target is a version that is already in the repository.&lt;br /&gt;&lt;br /&gt;By contrast, if we had instead authored the link as a reference to a local copy of doc_01.xml, the link might look something like this:&lt;pre&gt;&lt;br /&gt;&amp;lt;p&gt;See&lt;br /&gt;&amp;lt;bibcite href="../doc_01/doc_01.xml"&gt;Document One&amp;lt;/bibcite&gt;&lt;br /&gt;for more information&amp;lt;/p&gt;&lt;/pre&gt;&lt;br /&gt;In that case, the importer has to figure out, by whatever method, whether the file at "../doc_01/doc_01.xml" is in fact a version of a resource already in the repository and whether or not this local version of doc_01.xml is itself a new version that needs to be imported. Again, figuring this out cannot be automatic in the general case but depends on implementation-specific mechanisms.&lt;br /&gt;&lt;br /&gt;This raises the question of how the link was authored in the first place. It's unlikely the author looked inside the repository to figure out that doc_01.xml is really RES0001 and then typed the correct URL. So the authoring tool must be integrated with the repository such that the author can request a list of potential reference targets, pick one, and have the most appropriate address put into the href= attribute value. 
So there must be some sort of integration API that the repository exposes that the authoring tool can use.&lt;br /&gt;&lt;br /&gt;Note too that the details of the link authoring are completely schema-specific as are the policy rules for what can be linked to. In this case, let us assume that the "bibcite" (bibliographic citation) element can only link to entire publications. That's the easiest case because it only requires us to point to entire storage objects, which we can do with just our storage object repository and which is the easiest UI to create (a flat list of publications). Note also that by adding the "is publication root" metadata property we've enabled the creation of just this selection aid since now we can query the repository to get a list of publication root documents. For a complete implementation we'd probably want to capture things like document title and document number (if it's known) as storage object metadata just for convenience (we could always look inside the documents at the time we build the UI but that would be time consuming, easier to capture it on import or get it from somewhere else once).&lt;br /&gt;&lt;br /&gt;If rather than pointing into the repository we pointed to a local working copy, then the UI could be as simple as just putting up a file chooser and letting the author figure out which file is a publication root (which they would probably know if you've imposed a consistent file organization scheme, such as putting each publication in its own directory under a common root directory). Or you could export sufficient metadata to enable the same UI as before or you could ask the repository, as above, but then, because the repository remembered where stuff was checked out, it knows what local URL to use.&lt;br /&gt;&lt;br /&gt;Which approach is best depends a lot on how the data will be used or handled outside the repository. If your authors are always connected to the repository, no reason not to point to it directly. 
If your authors need to able to work offline then you'll have to go the local working copy route. And of course you can support both modes of operation from a single repository because it's entirely a function of the importer and exporter logic.&lt;br /&gt;&lt;br /&gt;In any case, on import the resulting URL for pointing to doc_01.xml is "/repository/resources/RES0001", that is, a relative URL (because everything's in the same storage space at this point). We also determine that we don't need to create a new version of doc_01.xml so we don't.&lt;br /&gt;&lt;br /&gt;For the second bibcite, the one to the DocBook site, what happens? The importer could blindly look to see if it has anything for the DocBook site, see that it doesn't, and start importing all the HTML from the site as new resources and versions (XIRUSS-T includes a default HTML importer that will do this). But that's probably not what you want to do. &lt;br /&gt;&lt;br /&gt;So the importer has to have some rules about what things are and are not ever going to be in the repository and external Web sites are probably never going to be in the repository. So on import the importer does not rewrite the href= to the DocBook site.&lt;br /&gt;&lt;br /&gt;However, it could do something very interesting. It could create a new resource and version in the repository that acts as a &lt;i&gt;proxy&lt;/i&gt; for the DocBook Web site. This would be useful because then we can create a dependency relationship between version doc_02.xml and the DocBook Web site without having to literally copy the Web site into our repository. This lets us manage knowledge of an important dependency using the same facilities we use for all our dependencies and gives us a way to capture and track important properties of the Web site using our local metadata facilities. 
The version is a version that is just a collection of properties with no data (although we could capture data if we wanted to, for example, to capture a cache of the state of the target Web site at the time we did the import).&lt;br /&gt;&lt;br /&gt;So how has the repository changed following the import of doc_02.xml?&lt;br /&gt;&lt;br /&gt;- Created new resource RES0005 and new version VER0005 for doc_02.xml. &lt;br /&gt;&lt;br /&gt;- Set the "is publication root" and "is schema valid" properties to "true" for VER0005.&lt;br /&gt;&lt;br /&gt;- Created new resource RES0006 and new version VER0006 for www.docbook.org.&lt;br /&gt;&lt;br /&gt;- Created three new dependencies from VER0005 to the following resources:&lt;br /&gt;&lt;br /&gt;  - "governed by" dependency to RES0002 (the book.xsd schema)&lt;br /&gt;  &lt;br /&gt;  - "document citation" to RES0001, reflecting the first bibcite link&lt;br /&gt;&lt;br /&gt;  - "document citation" to RES0006, reflecting the second bibcite link&lt;br /&gt;&lt;br /&gt;Now let's do something useful with this data we've worked so hard to create: print it.&lt;br /&gt;&lt;br /&gt;Let's say we have an XSLT script that converts book.xsd documents into XSL-FO for rendering into PDF. To apply this script to a publication we need only point our XSLT engine at the publication root version and style sheet and away it goes, e.g.:&lt;pre&gt;c:&gt; transform http://mycms/repository/versions/VER0005 book-to-fo.xsl &gt; temp.fo&lt;/pre&gt;&lt;br /&gt;Because our repository acts as an HTTP server we don't have to do an export first.&lt;br /&gt;&lt;br /&gt;But there's an important question yet to answer: what does our XSLT script do with the various links?&lt;br /&gt;&lt;br /&gt;In doc_02.xml we have two links to separate &lt;i&gt;publications&lt;/i&gt; and those links need to be published in a form that will be useful in the published result. 
What does that mean?&lt;br /&gt;&lt;br /&gt;In this case, we're publishing to PDF so we can presume that we want the links to be navigable links in the resulting PDF. Easy enough. But what will the links be to?&lt;br /&gt;&lt;br /&gt;Hmmm.&lt;br /&gt;&lt;br /&gt;In the case of the link to the DocBook Web site that's pretty easy: just copy the URL out as it was originally authored (or as constructed through the use of our proxy version which will have had to remember the original URL or the normalized URL for the target Web site). No problem. Unless we use the proxy object, in which case either the XSLT has to know how to translate a reference to the proxy into a working URL, e.g., get the proxy object, get the appropriate metadata values, and go from there, or the repository has to provide a "getUrlForWebSite()" method that takes a Web site proxy object as input and returns the best URL to use for getting to the Web site itself. This type of function could be characterized as a "top of Layer 1" or "bottom of Layer 2" bit of functionality, in that it's generic but it's reflecting our locally-specialized version types. In this case it's generic enough that it should probably be built into Layer 1. But since it deals with issues of link resolution and data processing it's arguably Layer 2 functionality.&lt;br /&gt;&lt;br /&gt;In any case, the Web site link is relatively easy.&lt;br /&gt;&lt;br /&gt;But the link to publication doc_01.xml is a bit trickier: we almost certainly don't want the PDF to link to the original source XML, either as it resides in the repository or in some checked-out location. We want it to link to doc_01.xml &lt;i&gt;as published&lt;/i&gt;. But what is that? 
&lt;br /&gt;&lt;br /&gt;This is the tricky bit: if we haven't already published doc_01.xml then we either have to first publish it and then point to that result or we have to be able to predict in advance where it will be when published or we have to be prepared to post-process the published result (the PDF in this case) to rewrite the pointer to doc_01.xml as published at such time as we know where it is. And even then, if we move the PDFs around we may still need to rewrite the pointers.&lt;br /&gt;&lt;br /&gt;This suggests that we need to be able to do pointer rewriting. For anything. But we already have a generic facility for that in our import/export framework. Happy day! All we have to do is implement code that knows how to do it for PDF and Bob's your uncle. [Have I had too much coffee this morning?]&lt;br /&gt;&lt;br /&gt;This also suggests that the best place to publish &lt;i&gt;to&lt;/i&gt; is the repository itself, because we can both easily serve the results from there and we can easily export them as needed, doing any pointer rewriting that might be necessary. We can also establish dependency relationships between the published results and their source data and capture any other useful metadata about the published artifacts. Because the core repository is generic there's no problem using it to store PDFs or anything else. &lt;br /&gt;&lt;br /&gt;So we're starting to build up a set of components and repository features that together form a "Rendition Manager" that handles the generic aspects of publishing. 
This rendition manager needs to do the following:&lt;br /&gt;&lt;br /&gt;- Get the input parameters for applying a given rendition process to a given version or set of versions.&lt;br /&gt;&lt;br /&gt;- Provide the appropriate utility functions to rendition processors needed to get access to object metadata, resolve pointers, and so forth.&lt;br /&gt;&lt;br /&gt;- Manage the import of newly-created rendition results back into the repository reflecting its knowledge of the inputs to the process. That is, while we can certainly have a generic PDF importer, we need a PDF importer that also knows that PDF doc_01.pdf was generated from version VER0003 of doc_01.xml and sets a dependency relationship reflecting that.&lt;br /&gt;&lt;br /&gt;Some of this rendition manager can be built into the Layer 1 code, as discussed above (i.e., the API or protocol functions needed) but the management of the specific processors will be a Layer 3 component. That is, conceptually, the Rendition Manager is a client of Layers 1 and 2, in just the way an integrated authoring tool would be.&lt;br /&gt;&lt;br /&gt;But you must have some form of Rendition Manager in order to do manageable publishing from the repository unless you do everything via bulk export and ad-hoc processes.&lt;br /&gt;&lt;br /&gt;This is an important question to ask of any full-featured CMS provider: do you provide features and components that either comprise a rendition manager or make creating one easy?&lt;br /&gt;&lt;br /&gt;Ok, so we run our rendition process and create a new PDF, doc_02.pdf, and bring it into the repository. The link to doc_01.pdf uses the URL "/repository/resources/RES0007". The link to the DocBook Web site uses the URL "/repository/resources/RES0008". In the repository we create the following new objects:&lt;br /&gt;&lt;br /&gt;- Resource RES0006 and version VER0007 for doc_02.pdf. 
Its MIME-type property indicates that it is of type "application/pdf".&lt;br /&gt;&lt;br /&gt;- Resource RES0007 (and no version) for doc_01.pdf. Surprised? This reflects the fact that we know that at some point in the future there will need to be a doc_01.pdf but we haven't created it yet. The resource object lets us link to it even though we haven't created any versions.&lt;br /&gt;&lt;br /&gt;- Resource RES0008 and version VER0008 for the Web site www.docbook.org. The metadata would include the absolute URL of the Web site and anything else we can usefully glean from it.&lt;br /&gt;&lt;br /&gt;- A dependency of type "rendered from" from VER0007 to resource RES0005 with policy "Version VER0005" indicating the exact version the PDF was created from.&lt;br /&gt;&lt;br /&gt;- A dependency of type "navigates to" from VER0007 to resource RES0008, indicating the link to the docbook.org Web site.&lt;br /&gt;&lt;br /&gt;- A dependency of type "navigates to" from VER0007 to resource RES0007, indicating the link to doc_01.pdf.&lt;br /&gt;&lt;br /&gt;Why did we create the dependencies from the PDF document? We'll need these should we ever need to export a set of inter-linked PDFs to some delivery location, i.e., the external corporate Web site, an online review server, our local file system, whatever. We also need to know whether or not all the link dependencies are satisfied. We also may need to know if the workflow states of the source publications are those required in order to complete a publishing operation, which we can get by navigating from a given PDF to its publication source to see if it is, for example, in the "approved for publication" state, or if it exists at all.&lt;br /&gt;&lt;br /&gt;For example, let's say that doc_01.xml version VER0003 is in fact in the "approved for publication" state, as is the latest version of doc_02.xml. 
If we try to do the "publish to corporate Web site" action (a Layer 3 process), we'll first chase down all the "navigates to" dependencies so we can get the PDFs of targets that are PDFs. We navigate to resource RES0007 and discover that it has no versions. With no versions we can't go on--we have no way of knowing, with the repository data we have, what publication might correspond to this PDF resource. Hmmm.&lt;br /&gt;&lt;br /&gt;One way to address this would be to create a "rendition of" dependency from versions to the renditions generated from them. But those dependencies would be redundant with the equivalent links from the renditions to their source versions. Thinking about it, it makes more sense to create a resource-to-resource "rendition of" relationship.&lt;br /&gt;&lt;br /&gt;This can be done with metadata on the resource object where the value is just a list of resources that are renditions of this resource. There's no need for indirection because we don't need to select a version, we just need to know that resource RES0007 (doc_01.pdf) is a rendition of resource RES0001. We need to know this because when we finally get around to rendering doc_01.xml we need to know what resource the PDF we create is a version of. The PDF-to-source dependency links will establish the version-to-version relationships.&lt;br /&gt;&lt;br /&gt;Ok, so we do that such that resource RES0007 has the metadata property "rendition of" with the value "RES0001" (doc_01.xml). &lt;br /&gt;&lt;br /&gt;Now when we go to do our publication, we resolve the navigates-to dependency from doc_02.pdf to resource RES0008, the DocBook.org Web site. We discover that this is a resource that is really outside the repository (by looking at the resource or version metadata, which I haven't shown). We see that it's a link to a Web site so we try to resolve the URL to make sure the Web site is at least still there. 
We can't really know if the Web site is still relevant without putting a human in the loop but we can at least catch the case where the Web site or specific Web resource is completely gone or unreachable.&lt;br /&gt;&lt;br /&gt;Next, we resolve the navigates-to dependency from doc_02.pdf (VER0007) to resource RES0007 and see that it is a rendition of resource RES0001. We get the latest version, VER0003, and check its "approved for publication" status. It's "true" so we can continue. If it had been "false" we'd have to stop right there and report back that not all the dependencies are ready to be published externally.&lt;br /&gt;&lt;br /&gt;But we still have the problem that there's no PDF for doc_01.xml. What do we do?&lt;br /&gt;&lt;br /&gt;We could halt the process and report that somebody needs to render doc_01.xml or we could just do the rendering job ourselves as we know that all the prerequisites have been met. Let's do that. This creates new PDF doc_01.pdf, which we import into the repository just like we did doc_02.pdf, with all the same dependencies and properties and whatnot.&lt;br /&gt;&lt;br /&gt;Now our requirement that all the local dependencies are satisfied is met. Everything's in the correct workflow state, so we now export the PDFs out in a form that can be placed on the corporate Web site. To do this we have to rewrite the URLs of the navigation links from pointers to PDFs inside the repository to pointers to the PDFs at what will be their locations on the corporate Web site.&lt;br /&gt;&lt;br /&gt;This means that the exporter component has to know what the business rules are for putting things on the corporate Web site, either directly because the rules are coded into the software, or indirectly because, for example, the PDF version objects have metadata values that say what the location should be or must be. &lt;br /&gt;&lt;br /&gt;Let's keep it simple and say that the PDFs are located relative to each other and in the same directory. 
This means we can rewrite the within-repository URL from "/repository/resources/RES0007" to "./doc_01.pdf".&lt;br /&gt;&lt;br /&gt;Whew.&lt;br /&gt;&lt;br /&gt;We've finally produced some usable output from our system. Time to go home and celebrate a job well done.&lt;br /&gt;&lt;br /&gt;Let's review what we've done and seen:&lt;br /&gt;&lt;br /&gt;- We've taken a system of two inter-linked publications through a cycle of authoring and revision and publication. &lt;br /&gt;&lt;br /&gt;- We've created document-to-document hyperlinks using services provided by the Layer 1 storage manager coupled with Layer 3 customizations integrated into our authoring tool (had I made it clear that authoring tools are Layer 3 components? That should be obvious by now as, except for simple text editors, they're all about the semantics of your documents.).&lt;br /&gt;&lt;br /&gt;- We enabled sophisticated workflow management reflecting local business rules and processes just by adding a few more metadata properties to our version, resource, and dependency objects. 
&lt;br /&gt;&lt;br /&gt;- We created a Rendition Manager that can manage the creation of renditions from our documents such that the rendered results are themselves managed in the repository, which is a requirement in order to support processes such as publication to a corporate Web site or any operation that requires address rewriting on export.&lt;br /&gt;&lt;br /&gt;- We created a Layer 3 component that manages the "publish to corporate Web site" action by using storage object metadata and dependencies to establish that all the necessary prerequisites are in place (workflow state, existence of renditions) or, if necessary, use the Rendition Manager to produce needed components (doc_01.pdf).&lt;br /&gt;&lt;br /&gt;- We introduced resource-to-resource links using simple metadata values on resources to establish relationships between resources to support the case where a resource may be created in advance of having any versions.&lt;br /&gt;&lt;br /&gt;- We made it clear that our repository can not only manage any kind of storage object but that it's essential that it do so in many cases. Thus we put our PDF renditions back into the repository from which they can be accessed directly for viewing or exported for delivery from other places.&lt;br /&gt;&lt;br /&gt;- We saw the utility in creating "proxy" versions for things we don't own or control so that we can manage our dependencies and metadata on those resources within the repository, keeping all our processing closed over resources, versions, and dependency objects. Very important. You can do all sorts of really useful and clever things with these proxies, including mirroring resources managed in other physical repositories as though they were in yours. [Pinky to corner of mouth, low evil chuckle. Mischievous faraway glint in eyes. 
Absently pat head of Mini Me.]&lt;br /&gt;&lt;br /&gt;This is pretty sophisticated stuff and is more than a lot of commercial systems do today (while at the same time they do stuff you don't want or need). And we've done it all with relatively simple software components that are connected together in clever ways. Because all the Layer 3 stuff we've invented for this use case can be built in isolation both from the Layer 1 repository and from each other, they can be individually as simple or sophisticated as needed or as you can afford. For example, the Rendition Manager could really just be a bunch of XSLT scripts or it could be a deeply-engineered body of Java code served through a full-scale Web server and designed to handle thousands of rendition requests an hour. But the minimum functionality of each of these components is pretty modest and no single component represents an unreasonable implementation difficulty--it's all very workaday programming: get an object, get a property value, chase it down, check a rule, get the target object, check its properties, apply a business rule, run a process, move some data, create a new resource, set some properties, blah blah blah is it lunch yet?&lt;br /&gt;&lt;br /&gt;That is, the requirements might be broad but the implementation need not be deep and it certainly doesn't need to be monolithic or exclusive.&lt;br /&gt;&lt;br /&gt;I know that at this point you've been given a lot to think about and if you've read this far in anything like one go your head is probably spinning. Mine is and I've written this stuff. 
&lt;br /&gt;&lt;br /&gt;Hopefully I've succeeded in binding these general concepts and architectures to realistic use cases and processes that make it easier to see how they apply and where their power as enabling abstractions and implementation and design techniques really accrue.&lt;br /&gt;&lt;br /&gt;Note too that we've only done document-to-document links--we haven't said anything about element-to-element links and what that might imply. That's actually because most of the inherent complexity is at the storage-object-to-storage-object level. Going to element-to-element linking really only complicates user interfaces and presents some potential scale and performance problems (because of the sheer potential volume of data to be captured and managed because there're typically orders of magnitude more elements than storage objects). But the fundamental issues of resolution and dependency tracking are the same so we'll see that we really don't have to do much more to our system to enable creation, use, and management of element-to-element links. We've done almost all the hard work already. And it wasn't really that hard.&lt;br /&gt;&lt;br /&gt;[As an aside: I'm pretty happy with how this narrative is coming out even in this first draft. I fully intend to edit it together into a more accessible, coherent form as soon as I get it all out of my head, which shouldn't take too much longer. 
I hope.]&lt;br /&gt;&lt;br /&gt;Next time: element-to-element linking (probably)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115409028829472632?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115409028829472632/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115409028829472632' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115409028829472632'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115409028829472632'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/07/xcmtdmw-import-is-everything-part-4.html' title='XCMTDMW: Import is Everything, Part 4'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115408858823965890</id><published>2006-07-28T06:14:00.000-05:00</published><updated>2007-03-07T10:39:34.107-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XCMTDMW &quot;xml content management&quot; import'/><title type='text'>XCMTDMW: Import is Everything, Part 3</title><content type='html'>I hope I'm starting to get the point across that import, the act of crossing the boundary between outside and inside the repository, is where everything really happens. 
Because if I'm not making that point something is really wrong.&lt;br /&gt;&lt;br /&gt;Before we continue exploring the import and access use cases started in Part 2, let's talk about schema-specificity for a moment, because I want to be careful I'm not painting too rosy a picture with all my talk about generic XML processing.&lt;br /&gt;&lt;br /&gt;One issue with managing XML documents is the sensitivity of the management system to the details of the schemas. In the worst case the low-level repository schema directly reflects the schema such that any change to the document schema requires a change to the repository schema which, in the worst case, requires an export and re-import of all the data in the repository, which is a dangerous and disruptive thing to have to do.&lt;br /&gt;&lt;br /&gt;That's clearly crazy and any system that has that implication is so inappropriately overoptimized that it makes me crazy to even think about it.&lt;br /&gt;&lt;br /&gt;We've also seen that a completely generic system for importing XML, while useful, isn't nearly complete enough to support the needs of local business processes and business rules.&lt;br /&gt;&lt;br /&gt;In yesterday's entry, Import is Everything, Part 2, we had just gotten to the point where we were creating, on import, some storage object metadata properties that were specific to our local policies, such as the "is schema valid" property, in the sense that we needed those properties in order to implement our policies and the business processes or user actions they implied. But those properties are still generic with respect to the document's schemas. 
An XML document is either schema valid or it isn't, regardless of the schema.&lt;br /&gt;&lt;br /&gt;Because we were operating on just the XSD-defined links (schemaLocation=) some of our import processing was schema-specific but specific to a completely standard schema (XSD), not to our local schema.&lt;br /&gt;&lt;br /&gt;But we're about to explore some use cases where we do need local schema awareness and we'll start to see where that awareness resides in the code. The short answer is, it resides in the import processing, top-of-Layer 2, and Layer 3 components. None of these should require a complete export and import of the documents involved should the schema change, although they might require reprocessing some or all of the documents in the repository (but directly from the repository).&lt;br /&gt;&lt;br /&gt;It should be pretty clear by now that any extraction of metadata or recognition of dependency relationships that is schema-specific will of course happen in schema-specific import code. That's why the XIRUSS-T importer framework is designed the way it is, because you always have to write at least a little bit of code that is unique to your schemas and your business processes so why not make writing that code as easy as possible?&lt;br /&gt;&lt;br /&gt;By "top-of-Layer 2" I mean code that does semantic processing of the elements inside the documents, such as link management, that sits on top of the generic facilities in Layer 2 but that may be schema specific, for whatever reason (usually optimization necessary to achieve appropriate scale or performance). For example, any full-text or element metadata index is a Layer 2 component. 
You can implement a completely generic, schema-independent indexing mechanism but for non-trivial document volumes and/or sophisticated schemas you will very likely want to tailor the index both to not index things you're unlikely to ever search for and to index things in a way that is more abstract than the raw XML syntax (I'll talk about these in more detail when I get around to full-text indexing of XML as a primary topic). To implement these specializations you'll need to tailor the indexer and possibly the UI for using the index in ways that are schema specific. No getting around it.&lt;br /&gt;&lt;br /&gt;Likewise Layer 3 is where you implement functionality that reflects specific business processes and policies, which means processes that act on the XML in the repository as well as on the storage-object and element metadata in order to do useful stuff. Much of this functionality will be schema specific to one degree or another (but not all of it, of course). &lt;br /&gt;&lt;br /&gt;So unless you can get by with a very generic system that only implements support for standards, you will always have to create and maintain system components that are schema specific. However, there are some important characteristics of a system architected as I've outlined here:&lt;br /&gt;&lt;br /&gt;- The core storage object repository, Layer 1, is &lt;i&gt;never&lt;/i&gt; schema specific. This means that the importers and Layers 2 and 3 can change without ever affecting the storage objects managed in Layer 1. In particular, you will &lt;i&gt;never&lt;/i&gt; require an export and re-import if Layers 2 or 3 change.&lt;br /&gt;&lt;br /&gt;- The code most sensitive to the schema details is closest to the edges of the repository and, in most cases, builds on more generic facilities. This has two advantages: the amount of code that is actually schema specific is minimized and the disruptive potential of changing that code is minimized. 
&lt;br /&gt;&lt;br /&gt;- You get to choose, as a matter of policy and implementation, the degree of schema specificity that is appropriate for a given feature. You can choose whether your full-text index is generic or tailored, the degree to which you reflect the semantics of your link types in the dependency objects created from them, and so on. So you can start small and work up as both your understanding of your business processes improves and as your schemas become more stable (assuming you're starting from scratch with brand-new schemas).&lt;br /&gt;&lt;br /&gt;Regardless of how it's architected or implemented, most of the ongoing maintenance and operating cost of an XML-aware CMS comes from reaction to changes in the schemas of the documents managed. The only question is: does the CMS design and implementation minimize that cost or does it maximize it?&lt;br /&gt;&lt;br /&gt;Also, when you start planning for the creation and deployment of an XML-aware CMS you need to define your overall requirements such that you can clearly distinguish those requirements that are schema-specific or schema-sensitive from those that are not. For example, a requirement to impose a basic workflow onto documents is probably not schema specific but a requirement to manage a particular kind of link that is not defined in terms of any standard is schema specific.&lt;br /&gt;&lt;br /&gt;By separating the requirements in this way you can both better estimate the immediate and long-term costs of supporting those requirements and help the implementors keep the code that is schema independent more clearly separated from the code that is schema sensitive. 
This will go a long way toward making your system much less expensive to maintain in the long run and much more flexible in the face of new requirements, whether they are new business processes or new schema features.&lt;br /&gt;&lt;br /&gt;Next: More linking and stuff&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115408858823965890?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115408858823965890/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115408858823965890' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115408858823965890'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115408858823965890'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/07/xcmtdmw-import-is-everything-part-3.html' title='XCMTDMW: Import is Everything, Part 3'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115400260128734192</id><published>2006-07-27T06:04:00.000-05:00</published><updated>2007-03-07T10:42:40.933-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XCMTDMW &quot;xml content management&quot; import namespaces'/><title type='text'>XCMTDMW: Import is Everything, Part 2</title><content type='html'>At the end of part one we had successfully imported our system of two 
documents, our publication source document, doc_01.xml, and the XSD schema document that governs it, book.xsd. We created the dependency relationship between doc_01.xml version 1 and book.xsd and we captured as much object metadata as we could given what little we knew about the data at hand. This created a repository with the following state:&lt;pre&gt;/repository/resources/RES0001  - name: "doc_01.xml"; initial version: VER0001&lt;br /&gt;/repository/resources/RES0002  - name: "book.xsd"; initial version: VER0002&lt;br /&gt;           /versions/VER0001   - name: "doc_01.xml"; Resource: RES0001&lt;br /&gt;                                 dependency: DEP0001&lt;br /&gt;                                 namespaces: http://www.example.com/namespaces/book&lt;br /&gt;                                 root element type: "book"&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.1&lt;br /&gt;                                 encoding: UTF-8&lt;br /&gt;                    /VER0002   - name: "book.xsd"; Resource: RES0002&lt;br /&gt;                                 root element type: "http://www.w3.org/2001/XMLSchema:schema"&lt;br /&gt;                                 namespaces: http://www.w3.org/2001/XMLSchema&lt;br /&gt;                                 target namespaces: &lt;br /&gt;http://www.example.com/namespaces/book&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.0&lt;br /&gt;                                 encoding: UTF-16&lt;br /&gt;           /dependencies/DEP0001 - Target: RES0002; policy: "latest"&lt;br /&gt;                                   Dependency type: "governed by"&lt;/pre&gt;&lt;br /&gt;We saw that it was the importer that needed to have all the XML awareness.&lt;br /&gt;&lt;br /&gt;Now we need to see what happens when we do something with our data. 
There are two interesting use cases at this point:&lt;br /&gt;&lt;br /&gt;1. Creation of a new version of doc_01.xml&lt;br /&gt;&lt;br /&gt;2. Creation of a new document governed by the same schema&lt;br /&gt;&lt;br /&gt;For use case 1 let's say that by some mechanism, and it doesn't matter what, we end up with a new document outside the repository called doc_01.xml that is different in its data content from the doc_01.xml we imported as VER0001 into the repository. E.g., we checked VER0001 out of the repository, edited it, and now want to check it back in. Or we left the original doc_01.xml where it was on our file system, edited that copy, and now want to check it in. Or our editor accessed the bytes in VER0001 directly from the repository, let us edit them, and now wants to create a new version in the repository. It doesn't matter how we come to have the changed version of doc_01.xml, the import implications are more or less the same.&lt;br /&gt;&lt;br /&gt;The first steps of the import process are the same:&lt;br /&gt;&lt;br /&gt;1. Process the XML document semantically in order to discover any relationships it expresses via links in order to determine the members of the bounded object set we need to import. &lt;br /&gt;&lt;br /&gt;2. Process the compound document children of the root storage object, i.e., book.xsd. We determine that book.xsd has no import or include relationships to any other XSD documents.&lt;br /&gt;&lt;br /&gt;Assuming we haven't changed the schema reference or the "book" element's namespace, we get the same result BOS we did before: doc_01.xml and book.xsd.&lt;br /&gt;&lt;br /&gt;3. For each member of the BOS, determine whether or not the repository already has a resource for which the BOS member should be a new version.&lt;br /&gt;&lt;br /&gt;Now it gets interesting. First, we have to determine if our new doc_01.xml is really a version of resource RES0001. 
Remember: there's no general solution to this problem--you have to either remember this information outside the repository, provide some heuristic for figuring it out when you need to, or simply ask the user. &lt;br /&gt;&lt;br /&gt;When I said above that it didn't matter how we came to have a new doc_01.xml, that wasn't quite true, because the way that we came to have it will likely determine how we know what version and resource it relates to in the repository.&lt;br /&gt;&lt;br /&gt;If you use a check-out operation then you can capture the information about what version and resource you checked out, either as separate local metadata or embedded in the XML document (for example, as a processing instruction or attribute value). Putting the metadata in the document itself is safer because then you can't (easily) lose it, but it limits you to managing XML data only (and it's not really safe because you can't prevent an author from modifying it if they really want to). Putting the metadata outside the document is more general but then requires a bit more work, either on the part of authors (they have to know where things are or should be on the file system) or in terms of some local data management facility to maintain the information. But this is the approach that CVS and Subversion use. It's simple and it works fine as long as users know what the limits are on their ability to do things like move files around.&lt;br /&gt;&lt;br /&gt;If you are accessing the bytes of a storage object directly via an editor then the editor can just remember where it got them from. This works as long as the editor doesn't crash or, if it does crash, it has cached the metadata away somewhere.&lt;br /&gt;&lt;br /&gt;But it can still happen that you just get a file from somewhere and whoever gave it to you tells you "this should be a new version of resource RES0001". For example, somebody might have made some changes offline and mailed you the file. 
In this case, you, the human, have to figure out what to do. &lt;br /&gt;&lt;br /&gt;Note too that in the general case you can't depend on things like filenames. While we usually do as a matter of practice, there's no magic to it. If you look at the repository listing above you'll notice that the resources and versions both have name properties. At least in the SnapCM model, these names are arbitrary and need not be unique in any scope beyond the object itself (and an object can have multiple names--they're just metadata values as far as SnapCM is concerned). The invariant, unique identifiers of the objects are the object IDs (RES0001, VER0002, etc.). For versions, the ultimate identifier is the resource they are a version of. &lt;br /&gt;&lt;br /&gt;For example, say you like to reflect the version of a file in the filename itself, a common practice when people are not using an actual versioning system. You find you've got directories full of files like "presentation_v1.ppt" and "presentation_v2.ppt" and "presentation_final_wek.ppt". The filenames may only be coincidentally similar but you happen to know that they are all in fact different versions of the same resource, the presentation you were asked to write. In a repository like ours here you could import all these different versions and create them as versions of the same resource and they could keep their original names as their Name metadata value.&lt;br /&gt;&lt;br /&gt;This is all to make the point that two storage objects are different versions of the same resource &lt;i&gt;because we say they are&lt;/i&gt; and the general nature of the SnapCM model lets us say it however we want for whatever reason--there's no dependency on any particular storage organization or naming conventions or anything else. This means that you're free to apply the model to any particular way of organizing and naming things you happen to prefer. 
It also means that you can take any system of versioned information and recreate it exactly (in terms of the version-to-version and version-to-resource relationships) in a SnapCM repository.&lt;br /&gt;&lt;br /&gt;OK, back to our task. In this case we know that our local doc_01.xml is in fact a new version of resource RES0001.&lt;br /&gt;&lt;br /&gt;Now we come to the schema, book.xsd. If we never exported it, meaning that we accessed it directly from the repository, then we will see that the pointer to it points back into the repository, that is, doc_01.xml as initially exported looks like this:&lt;pre&gt;&amp;lt;?xml version="1.1"?&gt;&lt;br /&gt;&amp;lt;book xmlns="http://www.example.com/namespaces/book"&lt;br /&gt;     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" &lt;br /&gt;     xsi:schemaLocation="http://www.example.com/namespaces/book &lt;b&gt;/repository/resource/RES0002&lt;/b&gt;"&lt;br /&gt;&gt;&lt;br /&gt;  ...&lt;br /&gt;&amp;lt;/book&gt;&lt;/pre&gt;The importer can therefore know with certainty that we never created a new version (because versions inside the repository are invariant and cannot be changed) and therefore excludes it from the BOS to be imported. 
It's part of the BOS rooted at doc_01.xml, but since it's already in the repository we don't need to import it.&lt;br /&gt;&lt;br /&gt;But if we had exported both doc_01.xml and book.xsd, such that doc_01.xml as exported looked like this:&lt;pre&gt;&amp;lt;?xml version="1.1"?&gt;&lt;br /&gt;&amp;lt;book xmlns="http://www.example.com/namespaces/book"&lt;br /&gt;     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" &lt;br /&gt;     xsi:schemaLocation="http://www.example.com/namespaces/book &lt;b&gt;../schemas/book/book.xsd&lt;/b&gt;"&lt;br /&gt;&gt;&lt;br /&gt;  ...&lt;br /&gt;&amp;lt;/book&gt;&lt;/pre&gt;then we've got a potential issue because we may or may not have modified the schema (possibly inadvertently if we, for example, opened it in an editor to see what its rules were and as a side effect saved it, changing even just some whitespace).&lt;br /&gt;&lt;br /&gt;The importer must now determine whether it really needs to import a new version of book.xsd and, if it does, whether it should create it as a new resource or as a new version of an existing resource. How can it make this determination?&lt;br /&gt;&lt;br /&gt;First, it can look to see if there is already a schema in the repository that governs the namespace "http://www.example.com/namespaces/book". It can make this determination by doing a query like "find all latest versions with root element 'http://www.w3.org/2001/XMLSchema:schema' and with targetNamespace value 'http://www.example.com/namespaces/book'". If this returns any versions then you know that you have at least one resource related to the target namespace that is an XSD schema (and not, for example, a RELAX NG schema). &lt;br /&gt;&lt;br /&gt;If you get back more than one resource then you have a problem: either something screwed up on a prior import and created two resources for what should have been one resource or you have two truly different XSD documents that both target the same namespace. 
Now you have to figure out which one is the correct one to use before you can even decide whether or not to create a new version of it. How do you decide?&lt;br /&gt;&lt;br /&gt;I find this to be a tough question. The challenge here is partly a function of the details of XSD in that you can choose to organize an XSD schema into multiple XML documents, all of which may name the same target namespace. But only one of them is the real root of the compound document, that is, only one of them can actually be used as the starting point for validating documents.&lt;br /&gt;&lt;br /&gt;You might also have different variants of the same base schema for different purposes. For example, I have a case where one variant, for publishing, defines global key constraints and another variant, for authoring, does not, because for authoring the content will be organized into many separate XML documents and XSD provides no way to constrain or validate cross-document references.&lt;br /&gt;&lt;br /&gt;One way to handle this would be to use version metadata to indicate explicitly which of your XSD documents are schema roots and which are not. Another way would be to put that inside the schema as an attribute on the schema element or a subelement in your own namespace or whatever. And of course you could do both, with your XSD importer using the embedded metadata to set the storage object metadata.&lt;br /&gt;&lt;br /&gt;But you should start to see that this is the first place where we are forced to integrate the repository with our local and non-standard business rules and that the knowledge and implementation of those business rules is in...wait for it...the importer.&lt;br /&gt;&lt;br /&gt;It should also be clear at this point that no out-of-the-box XML-aware importer is going to do the thing you need except by accident or if you modified your policies in order to fit what the tool does. 
If the tool you choose happens to match what you already do or what you're happy to do, then great, you chose well, buy the engineers who built it a beer and go on your way. But if it doesn't....&lt;br /&gt;&lt;br /&gt;Another approach would be to limit yourself to having exactly one XSD document per governed namespace. This is the easiest solution and a lot of times you can do it but it's not realistic as a general practice for the reasons given above.&lt;br /&gt;&lt;br /&gt;OK, so schemas (and not just XSD schemas, any form of schema) complicate things. &lt;br /&gt;&lt;br /&gt;So where were we? Oh yeah, is our book.xsd a new resource, a new version of an existing resource, or already in the repository?&lt;br /&gt;&lt;br /&gt;In our current example there is only one version that governs the namespace so we only need to determine if we need to import our local copy. Here we have to look to see if it's been modified locally. If the local copy has not been modified, which we can know if we captured the time it was checked out at (this is what CVS does) and compare that to the last-modified time stamp on the file, then we know we don't need to import it. If it has been modified, or at least the timestamp has changed or if we didn't capture that (somebody just sent us a bunch of files and said load these up), then our only choice is to do some form of diff against the version in the repository.&lt;br /&gt;&lt;br /&gt;We could just do a simple byte compare, which is easy to implement but for XML we might want to be more sophisticated and use an XML-aware diffing engine so we don't commit new versions that differ only in things like whitespace within markup. Again, this is a function of the importer and you, the importer implementor, get to choose how sophisticated you make it. For something simple like XIRUSS-T you can expect at most a simple byte-level diff. 
For a commercial system that claims XML awareness you should expect some sort of XML differencing that you can configure. Or you might just have to figure it out yourself by asking somebody or looking at the files or guessing.&lt;br /&gt;&lt;br /&gt;OK, in our case we do a simple byte compare and determine that the file we have locally and the one in the repository are identical, so no need to create a new version.&lt;br /&gt;&lt;br /&gt;3.1 In temporary storage (or in the process of streaming the input bytes into the newly-created version objects) rewrite all pointers to reflect the locations of the target resources or versions as they will be within the repository.&lt;br /&gt;&lt;br /&gt;This is just like the last time.&lt;br /&gt;&lt;br /&gt;3.2 For each BOS member, identify the relevant metadata items and create each one as a metadata item on the appropriate newly-created repository object.&lt;br /&gt;&lt;br /&gt;Ditto&lt;br /&gt;&lt;br /&gt;4. Having constructed our empty storage-object-to-version map, we execute the import process. In this case, we will construct the following new objects in the repository:&lt;br /&gt;&lt;br /&gt;- Version object VER0003, the next version of VER0001 (and by implication, a version of resource RES0001)&lt;br /&gt;&lt;br /&gt;- Dependency object DEP0002 from version VER0003 to resource RES0002, reflecting the governed-by relationship between doc_01.xml and book.xsd. 
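&lt;br /&gt;&lt;br /&gt;To make the steps above a little more concrete, here is a minimal Python sketch of this import cycle against a toy in-memory repository. Everything in it is an assumption made for illustration: the Repository and Version classes, the import_member function, and the placeholder byte strings are hypothetical and are not the SnapCM or XIRUSS-T API. It uses the simple byte compare described above to decide whether a BOS member needs a new version at all.&lt;pre&gt;
```python
from dataclasses import dataclass, field


@dataclass
class Version:
    """One invariant storage object plus its metadata (hypothetical model)."""
    ver_id: str
    resource_id: str
    data: bytes
    prev_versions: list = field(default_factory=list)
    next_versions: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)


class Repository:
    """Toy in-memory stand-in for a SnapCM-style repository."""

    def __init__(self):
        self.versions = {}      # version id to Version
        self.latest = {}        # resource id to id of its latest version
        self.dependencies = {}  # dependency id to its properties

    def _next_id(self, prefix, table):
        return "%s%04d" % (prefix, len(table) + 1)

    def add_version(self, resource_id, data):
        """Create a new version and link it to the previous latest version."""
        ver_id = self._next_id("VER", self.versions)
        version = Version(ver_id, resource_id, data)
        prev_id = self.latest.get(resource_id)
        if prev_id is not None:
            version.prev_versions.append(prev_id)
            self.versions[prev_id].next_versions.append(ver_id)
        self.versions[ver_id] = version
        self.latest[resource_id] = ver_id
        return ver_id

    def add_dependency(self, ver_id, target_resource, dep_type, policy="latest"):
        """Create a dependency object owned by the version that uses it."""
        dep_id = self._next_id("DEP", self.dependencies)
        self.dependencies[dep_id] = {
            "target": target_resource, "type": dep_type, "policy": policy}
        self.versions[ver_id].dependencies.append(dep_id)
        return dep_id


def import_member(repo, resource_id, local_bytes):
    """Commit local_bytes as a new version unless it matches the latest one.

    Returns the new version id, or None when nothing needed importing.
    """
    latest_id = repo.latest.get(resource_id)
    if latest_id is not None and repo.versions[latest_id].data == local_bytes:
        return None  # simple byte compare says nothing changed
    return repo.add_version(resource_id, local_bytes)


# Initial state: doc_01.xml (VER0001) governed by book.xsd (VER0002).
repo = Repository()
repo.add_version("RES0001", b"doc_01.xml, first draft")
repo.add_version("RES0002", b"book.xsd, original")
repo.add_dependency("VER0001", "RES0002", "governed by")

# Second import: the schema is unchanged, the document was edited.
schema_result = import_member(repo, "RES0002", b"book.xsd, original")
doc_result = import_member(repo, "RES0001", b"doc_01.xml, edited")
if doc_result is not None:
    repo.add_dependency(doc_result, "RES0002", "governed by")

print(schema_result)  # None
print(doc_result)     # VER0003
```
&lt;/pre&gt;A real importer would of course also rewrite pointers and set metadata as in steps 3.1 and 3.2, and could swap the byte compare for an XML-aware diff; the sketch only shows the version and dependency bookkeeping.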
&lt;br /&gt;&lt;br /&gt;The new state of the repository is:&lt;pre&gt;/repository/resources/RES0001  - name: "doc_01.xml"; initial version: VER0001&lt;br /&gt;/repository/resources/RES0002  - name: "book.xsd"; initial version: VER0002&lt;br /&gt;           /versions/VER0001   - name: "doc_01.xml"; Resource: RES0001&lt;br /&gt;                                 prev_versions: {none}&lt;br /&gt;                                 &lt;b&gt;next_versions: VER0003&lt;/b&gt;&lt;br /&gt;                                 dependency: DEP0001&lt;br /&gt;                                 namespaces: http://www.example.com/namespaces/book&lt;br /&gt;                                 root element type: "book"&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.1&lt;br /&gt;                                 encoding: UTF-8&lt;br /&gt;                    /VER0002   - name: "book.xsd"; Resource: RES0002&lt;br /&gt;                                 prev_versions: {none}&lt;br /&gt;                                 next_versions: {none}&lt;br /&gt;                                 root element type:                                                 &lt;br /&gt;"http://www.w3.org/2001/XMLSchema:schema"&lt;br /&gt;                                 namespaces: http://www.w3.org/2001/XMLSchema&lt;br /&gt;                                 target namespace: &lt;br /&gt;http://www.example.com/namespaces/book&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.0&lt;br /&gt;                                 encoding: UTF-16&lt;br /&gt;                     &lt;b&gt;VER0003   - name: "doc_01.xml"; Resource: RES0001&lt;/b&gt;&lt;br /&gt;                                 prev_versions: VER0001&lt;br /&gt;                                 next_versions: {none}&lt;br /&gt;                                 dependency: DEP0002&lt;br /&gt;                             
    namespaces: http://www.example.com/namespaces/book&lt;br /&gt;                                 root element type: "book"&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.1&lt;br /&gt;                                 encoding: UTF-8&lt;br /&gt;           /dependencies/DEP0001 - Target: RES0002; policy: "latest"&lt;br /&gt;                                   Dependency type: "governed by"&lt;br /&gt;           &lt;b&gt;/dependencies/DEP0002 - Target: RES0002; policy: "latest"&lt;/b&gt;&lt;br /&gt;                                   Dependency type: "governed by"&lt;/pre&gt;&lt;br /&gt;Notice a few new things in this listing:&lt;br /&gt;&lt;br /&gt;- I've added the prev/next version pointers to the versions. In SnapCM, each version can have more than one previous or next version where different versions are organized into different "branches", which I haven't talked about yet (our current repository is a repository with exactly one branch, if you want to be precise about it).&lt;br /&gt;&lt;br /&gt;- There are two dependency objects which appear to be identical by the metadata shown. However, each dependency is owned by the version that uses it (it's really an exclusive property of the version) and its metadata is not invariant. In particular, you are likely to want to change the resolution policy for a given version as the state of the repository changes, as we'll see in a moment. Of course, a real implementation could transparently normalize the dependency objects so it only maintained instances that actually varied in their properties, creating new instances as necessary. But that's optimization we don't need to worry about here. 
[You may be starting to see the method in my madness: if I can think of a way it could be optimized I don't worry about reflecting that optimization in the abstract model, because I'm confident that if that optimization is needed it can be added to the implementation.]&lt;br /&gt;&lt;br /&gt;- Except for maybe doing a diff on import, we've said nothing about the data content of the versions. That's because, for most purposes the data content is really secondary and arbitrary. There's nothing about the functioning of the repository itself (as opposed to the importer, which is all about the data) that has any direct knowledge of or dependency on the data inside the storage objects. You can think of the repository as a Swiss bank: it doesn't know and it doesn't want to know. Knowing is somebody else's job. By the same token, there are lots of types of versions that are only collections of simple metadata values and are not storage objects at all.&lt;br /&gt;&lt;br /&gt;OK, so now we've successfully committed a new version of doc_01.xml into the repository, we correctly did not create an unnecessary new version of the schema. We did a good day's work, let's go home. &lt;br /&gt;&lt;br /&gt;OK, not so fast.&lt;br /&gt;&lt;br /&gt;We discovered that our schema is not complete with respect to our requirements and we have to add a couple of new element types or some attributes or whatever. The point is we have to modify it. We also discover that one of our existing content models is wrong wrong wrong and that we have to change it in a way that will make existing documents invalid. Doh!&lt;br /&gt;&lt;br /&gt;So we check out version VER0002 to create a local copy of book.xsd. We edit it to change the content model, and go to commit it back to the repository.&lt;br /&gt;&lt;br /&gt;But wait--if we do that, what will happen?&lt;br /&gt;&lt;br /&gt;By default, all the dependency links from documents to their governing schemas use the "latest" policy. 
If we commit a new version we will effectively break those documents even though they are, today, valid against the current latest version of book.xsd in the repository. What do we do?&lt;br /&gt;&lt;br /&gt;This is a matter of policy. You could choose to invalidate all the documents and require that they all be edited to make them valid. Sometimes that's the right thing to do based on whatever your local requirements are.&lt;br /&gt;&lt;br /&gt;Or you could do this:&lt;br /&gt;&lt;br /&gt;1. Find all the dependencies that point to schema book.xsd: "find all dependency objects of type 'governed by' that point to resource RES0002"&lt;br /&gt;&lt;br /&gt;2. For each dependency, change its resolution policy from "latest" to "Version VER0002".&lt;br /&gt;&lt;br /&gt;This changes the dependencies from being dynamic, resolution-time pointers to hardened version-specific pointers. Notice too that we didn't do anything to the versions involved.&lt;br /&gt;&lt;br /&gt;Now, let's refine this operation a little bit by saying that, as a matter of our policy, we want to harden the links to schemas for all versions that are not the latest version of their resource. That is, we don't want to break any old versions but we do want to break the latest so that we know we have to fix it.&lt;br /&gt;&lt;br /&gt;That means that for dependency DEP0001 we will change the policy to "Version VER0002" but for DEP0002 we will not. 
In addition, we will add a metadata value to the latest versions to indicate that we know they are not (or probably not) valid against their schema [I know I said that version metadata is invariant but actually some is and some isn't depending on the semantics of the metadata (or you can imagine that we created a new version to reflect the new metadata, updated the repository to reflect it, and went on); since I have to type the repository state by hand, let's just say we can change version metadata].&lt;br /&gt;&lt;br /&gt;The new state of the repository is:&lt;pre&gt;/repository/resources/RES0001  - name: "doc_01.xml"; initial version: VER0001&lt;br /&gt;/repository/resources/RES0002  - name: "book.xsd"; initial version: VER0002&lt;br /&gt;           /versions/VER0001   - name: "doc_01.xml"; Resource: RES0001&lt;br /&gt;                                 prev_versions: {none}&lt;br /&gt;                                 next_versions: VER0003&lt;br /&gt;                                 dependency: DEP0001&lt;br /&gt;                                 namespaces: http://www.example.com/namespaces/book&lt;br /&gt;                                 root element type: "book"&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.1&lt;br /&gt;                                 encoding: UTF-8&lt;br /&gt;                                 &lt;b&gt;is schema valid: true&lt;/b&gt;&lt;br /&gt;                    /VER0002   - name: "book.xsd"; Resource: RES0002&lt;br /&gt;                                 prev_versions: {none}&lt;br /&gt;                                 next_versions: {none}&lt;br /&gt;                                 root element type:                                                 &lt;br /&gt;"http://www.w3.org/2001/XMLSchema:schema"&lt;br /&gt;                                 namespaces: http://www.w3.org/2001/XMLSchema&lt;br /&gt;                                 target namespace: &lt;br 
/&gt;http://www.example.com/namespaces/book&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.0&lt;br /&gt;                                 encoding: UTF-16&lt;br /&gt;                     VER0003   - name: "doc_01.xml"; Resource: RES0001&lt;br /&gt;                                 prev_versions: VER0001&lt;br /&gt;                                 next_versions: {none}&lt;br /&gt;                                 dependency: DEP0002&lt;br /&gt;                                 namespaces: http://www.example.com/namespaces/book&lt;br /&gt;                                 root element type: "book"&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.1&lt;br /&gt;                                 encoding: UTF-8&lt;br /&gt;                                 &lt;b&gt;is schema valid: false&lt;/b&gt;&lt;br /&gt;           /dependencies/DEP0001 - Target: RES0002; &lt;b&gt;policy: "Version VER0002"&lt;/b&gt;&lt;br /&gt;                                   Dependency type: "governed by"&lt;br /&gt;           /dependencies/DEP0002 - Target: RES0002; policy: "latest"&lt;br /&gt;                                   Dependency type: "governed by"&lt;/pre&gt;&lt;br /&gt;Let's think about what we've done:&lt;br /&gt;&lt;br /&gt;- We've used the indirection of the dependency links to change or preserve the processing result of the XML documents even though we didn't change the documents themselves. For the old version of doc_01.xml we preserved our ability to process it as a valid document by explicitly binding it to the latest version of book.xsd against which it was validated. 
For the new version of doc_01.xml we made the conscious choice to allow it to become invalid when we commit the new version of book.xsd.&lt;br /&gt;&lt;br /&gt;- We added a new metadata value, "is schema valid" that allows us to capture information about the documents that reflects some aspect of their processing. In this case we're setting it because we know we're about to make it true, but you could imagine that we have a process that gets every latest XML document that is not a schema, validates it against its schema, and records the result in the "is schema valid" property. This could then drive a Layer 3 workflow application that every morning sends a report listing all the XML documents that are not valid. Or we could do a validation on import and indicate the result there. Whatever. The point is we've added more metadata that is specific to our business processes and policies.&lt;br /&gt;&lt;br /&gt;Now that we've made the repository safe for a new schema version, we import our updated book.xsd document using the same process as before. 
The new state of the repository is:&lt;pre&gt;/repository/resources/RES0001  - name: "doc_01.xml"; initial version: VER0001&lt;br /&gt;/repository/resources/RES0002  - name: "book.xsd"; initial version: VER0002&lt;br /&gt;           /versions/VER0001   - name: "doc_01.xml"; Resource: RES0001&lt;br /&gt;                                 prev_versions: {none}&lt;br /&gt;                                 next_versions: VER0003&lt;br /&gt;                                 dependency: DEP0001&lt;br /&gt;                                 namespaces: http://www.example.com/namespaces/book&lt;br /&gt;                                 root element type: "book"&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.1&lt;br /&gt;                                 encoding: UTF-8&lt;br /&gt;                                 is schema valid: true&lt;br /&gt;                    /VER0002   - name: "book.xsd"; Resource: RES0002&lt;br /&gt;                                 prev_versions: {none}&lt;br /&gt;                                 &lt;b&gt;next_versions: VER0004&lt;/b&gt;&lt;br /&gt;                                 root element type:                                                 &lt;br /&gt;"http://www.w3.org/2001/XMLSchema:schema"&lt;br /&gt;                                 namespaces: http://www.w3.org/2001/XMLSchema&lt;br /&gt;                                 target namespace: &lt;br /&gt;http://www.example.com/namespaces/book&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.0&lt;br /&gt;                                 encoding: UTF-16&lt;br /&gt;                     VER0003   - name: "doc_01.xml"; Resource: RES0001&lt;br /&gt;                                 prev_versions: VER0001&lt;br /&gt;                                 next_versions: {none}&lt;br /&gt;                                 dependency: DEP0002&lt;br /&gt;     
                            namespaces: http://www.example.com/namespaces/book&lt;br /&gt;                                 root element type: "book"&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.1&lt;br /&gt;                                 encoding: UTF-8&lt;br /&gt;                                 is schema valid: false&lt;br /&gt;                    &lt;b&gt;/VER0004   - name: "book.xsd"; Resource: RES0002&lt;/b&gt;&lt;br /&gt;                                 prev_versions: VER0002&lt;br /&gt;                                 next_versions: {none}&lt;br /&gt;                                 root element type:                                                 &lt;br /&gt;"http://www.w3.org/2001/XMLSchema:schema"&lt;br /&gt;                                 namespaces: http://www.w3.org/2001/XMLSchema&lt;br /&gt;                                 target namespace: &lt;br /&gt;http://www.example.com/namespaces/book&lt;br /&gt;                                 mime type: application/xml&lt;br /&gt;                                 xml version: 1.0&lt;br /&gt;                                 encoding: UTF-16&lt;br /&gt;           /dependencies/DEP0001 - Target: RES0002; policy: "Version VER0002"&lt;br /&gt;                                   Dependency type: "governed by"&lt;br /&gt;           /dependencies/DEP0002 - Target: RES0002; policy: "latest"&lt;br /&gt;                                   Dependency type: "governed by"&lt;/pre&gt;&lt;br /&gt;Now we're starting to get some interesting stuff in the repository.&lt;br /&gt;&lt;br /&gt;We have cross-document links (the links from the doc_01.xml documents to their schemas), we have version-aware link resolution via the dependencies, we have both generic and business-process-specific metadata, and we have some sequences of versions in time.&lt;br /&gt;&lt;br /&gt;We can also see that the repository itself stays remarkably simple--what you 
see here is not that far from what a fully-populated set of properties and objects would look like (as you can see if you run the XIRUSS-T application). You can also see that the repository state could easily be represented using a direct XML representation for export, archiving, or interchange (the storage object data streams could be held in the same XML or as separate storage objects on the file system). &lt;br /&gt;&lt;br /&gt;But we've done some pretty sophisticated stuff what with intelligent handling of schema versions, managing our links using indirect, version-aware, policy-based pointers. How did we do it? We did it in the importer (and to a lesser degree, in the exporter), where all the complexity lies because that's where the specific knowledge of the data formats and their semantics and our local business objects, processes, and policies lie.&lt;br /&gt;&lt;br /&gt;Let's talk about exporters for a minute. &lt;br /&gt;&lt;br /&gt;I haven't said much about exporters because most of the complexity is in the importers because that's where you have to do all the initial syntactic and semantic processing to get the stuff into the repository. Getting it out is usually much easier.&lt;br /&gt;&lt;br /&gt;In the best case there is no export at all: you access all storage objects directly from the repository without first copying them out to your local file system.&lt;br /&gt;&lt;br /&gt;But in reality you will always need to do some exporting, if only for long-term, repository-independent archiving of your data (you do do that, right?).&lt;br /&gt;&lt;br /&gt;For export, the main concern is rewriting of pointers on export so that the pointers point to the appropriate version of the correct resource in the correct location. As we saw above, this varies from doing nothing (if you are accessing the target object from the repository using the current resolution policy) to setting it to a relative URL that reflects where the target was copied to locally. 
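&lt;br /&gt;&lt;br /&gt;Here is a minimal Python sketch of that pointer-rewriting decision, assuming the exporter already holds a dictionary from repository addresses to the paths the targets were exported to. The function name rewrite_pointer, the export_map dictionary, and the export directory layout are all hypothetical, made up for illustration; they are not part of SnapCM or XIRUSS-T.&lt;pre&gt;
```python
import posixpath


def rewrite_pointer(pointer, export_map, source_export_path):
    """Return the pointer value to write into an exported document.

    pointer            -- address found in the stored document
    export_map         -- dict from repository address to exported file path
    source_export_path -- exported path of the document holding the pointer
    """
    target_path = export_map.get(pointer)
    if target_path is None:
        # The target was not exported, so keep pointing into the repository.
        return pointer
    # The target was copied out: point at it relative to the exported source.
    start = posixpath.dirname(source_export_path)
    return posixpath.relpath(target_path, start)


# Hypothetical export layout for the two resources in our example.
export_map = {
    "/repository/resources/RES0001": "export/docs/doc_01.xml",
    "/repository/resources/RES0002": "export/schemas/book/book.xsd",
}

exported = rewrite_pointer(
    "/repository/resources/RES0002", export_map, "export/docs/doc_01.xml")
untouched = rewrite_pointer(
    "/repository/resources/RES0009", export_map, "export/docs/doc_01.xml")

print(exported)   # ../schemas/book/book.xsd
print(untouched)  # /repository/resources/RES0009
```
&lt;/pre&gt;The same map, read in the other direction, is what an importer would need in order to turn the local relative URLs back into repository addresses.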
&lt;br /&gt;&lt;br /&gt;In addition, depending on how you manage the local file-to-version metadata on export, the exporter needs to set that metadata. Essentially, the exporter needs to have in its head a mapping from versions in the repository to their eventual locations, as exported, so it can then rewrite any pointers that need rewriting. This map is either explicit because the exporter creates it as it does the exporting or it's implicit in some file organization convention, the most obvious of which is that the export structure matches the directory (or folder or cabinet or whatever) structure in the repository.&lt;br /&gt;&lt;br /&gt;Of course there's more that an exporter could do, such as creating Zip or tar packages of the exported files, loading the results into another repository, or whatever.&lt;br /&gt;&lt;br /&gt;So exporters also have to be smart and they will also have some knowledge of the data formats to be exported (so they can, at a minimum, rewrite pointers) and local business rules and policies, but they are still much simpler than the corresponding importers and much of their work is probably already supported by facilities needed by the importer (such as XML attribute rewriting).&lt;br /&gt;&lt;br /&gt;But we've now seen one complete cycle of the create-modify-create-new-version process, and once you can do one cycle you can do a million.&lt;br /&gt;&lt;br /&gt;We still need to look a bit more closely at the implications of resolution of links via dependency objects. We also need to look at more linking cases, both for import and for processing. 
Finally, we need to look at the requirements and implications for rendition processing (that is, processing compound documents to produce a deliverable publication, such as PDF or HTML pages).&lt;br /&gt;&lt;br /&gt;Next time: Linking and addressing with versioned hyperdocuments&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115400260128734192?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115400260128734192/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115400260128734192' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115400260128734192'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115400260128734192'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/07/xcmtdmw-import-is-everything-part-2.html' title='XCMTDMW: Import is Everything, Part 2'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115394809999434050</id><published>2006-07-26T15:54:00.000-05:00</published><updated>2007-03-07T10:44:36.664-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XCMTDMW &quot;xml content management&quot; &quot;referent tracking documents&quot; snapcm'/><title type='text'>XCMTDMW: Has it Really Been 11 Years?</title><content type='html'>A slight aside: I was poking around 
on the internal Innodata Isogen sales and marketing support portal and stumbled on an archive of all the old papers that various ISOGENers have written over the years, including some of mine. One of them was a paper I wrote and presented in 1995 titled "SGML Document Management". You can find an HTML version of it here: &lt;a href="http://www.oasis-open.org/cover/kimber-sgmldocm.html"&gt;http://www.oasis-open.org/cover/kimber-sgmldocm.html&lt;/a&gt; (thanks Robin).&lt;br /&gt;&lt;br /&gt;Even though my understanding and ideas have refined and evolved over the years, it's remarkably consistent with what I've been saying in this thread. I would now replace the focus on entities with a focus on link-based re-use but the overall architecture is very much the same. &lt;br /&gt;&lt;br /&gt;Another interesting historical footnote is the paper I wrote with Dr. Steve Newcomb and Peter Newcomb on "Referent Tracking Documents": &lt;a href="http://www.coolheads.com/SRNPUBS/ref-track-docs-paper.pdf"&gt;http://www.coolheads.com/SRNPUBS/ref-track-docs-paper.pdf&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;This paper describes a technique for using simple storage object versioning and straightforward link markup to represent links in a manageable way. The key to this was that it provided, in the simplest possible way, a standards-based approach to capturing and managing complex element-to-element linking information. Of course we never expected that you would literally implement a system using huge collections of little documents &lt;i&gt;but you could if you wanted to and it would work&lt;/i&gt;. It would just be really slow (or maybe not so slow--parsing XML is pretty fast and the files are small). At a minimum it provided a standards-based interchange representation for an arbitrarily complex link index. 
I've never actually tried to implement a system that used this approach literally, although the Bonnell and XIRUSS systems both reflect the ideas, just not expressed as literal XML structures (but they could be using exactly these techniques).&lt;br /&gt;&lt;br /&gt;In any case, those ideas are now reflected more abstractly in the SnapCM model but the basic concept is the same, in particular the approach of using a reference to a resource (in the SnapCM sense) plus a resolution policy to address a specific version or versions. That paper was given in 1999.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/22194031-115394809999434050?l=drmacros-xml-rants.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://drmacros-xml-rants.blogspot.com/feeds/115394809999434050/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=22194031&amp;postID=115394809999434050' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115394809999434050'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/22194031/posts/default/115394809999434050'/><link rel='alternate' type='text/html' href='http://drmacros-xml-rants.blogspot.com/2006/07/xcmtdmw-has-it-really-been-11-years.html' title='XCMTDMW: Has it Really Been 11 Years?'/><author><name>Eliot Kimber</name><uri>http://www.blogger.com/profile/02285948329177704214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://photos1.blogger.com/blogger/2395/810/200/eliot-blog-photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-22194031.post-115391438849985459</id><published>2006-07-26T06:03:00.000-05:00</published><updated>2007-03-07T10:45:23.380-06:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='XCMTDMW &quot;xml content management&quot; import snapcm'/><title type='text'>XCMTDMW: Import is Everything</title><content type='html'>We've talked a lot about what an XML-aware CMS should look like and what it needs to do. Now it's time to put something into it. &lt;br /&gt;&lt;br /&gt;So first a little map of the area we're about to explore. Where we are is a border region, the boundary between where your XML documents are now, the "outside world", and where we want them to be, the "repository". Separating these two is a high ridge of mountains that can only be crossed with the aid of experienced guides and, depending on the cargo you're carrying, more or less sophisticated transport. [Or, on a bad day, some sort of demilitarized zone fraught with hidden dangers and mine fields on all sides.]&lt;br /&gt;&lt;br /&gt;If you're just bringing in files containing simple or opaque data with little useful internal structure or references to other files, a simple mule train will do the job. But if you're bringing in interconnected systems of files containing sophisticated data structures you're going to need the full logistical muscle of a FedEx or UPS, who can offer a range of services as part of their larger transportation operation.&lt;br /&gt;&lt;br /&gt;The point I guess I'm trying to make is that as soon as you go from files that are individual isolated islands of data to files that connect to each other in important ways, you're going from simple to dangerously complex. 
&lt;br /&gt;&lt;br /&gt;Most, if not all, data formats used for technical documentation use or can use interconnected files to create sophisticated systems of files. The most obvious case is documents that use graphics by reference or point to style sheets or that have navigation links to another document. Even PDFs, which we tend to think of as atomic units of document delivery, can have navigable links to other PDFs (or to anything else you can point a URI at).&lt;br /&gt;&lt;br /&gt;So any repository import mechanism needs to be able to work with systems of files as systems of files, however those systems might be expressed in the data. Even if you aren't doing any semantic management but only storage object management, it is still useful, for example, to be able to import all of the files involved in a single publication as a single atomic action. &lt;br /&gt;&lt;br /&gt;I want to stress here that while XML as a data format standardizes and enables a number of ways to create systems of files, it is not in any way unique in creating systems of files to represent publications. &lt;br /&gt;&lt;br /&gt;This suggests that a generalized content management system must have generic features for both representing the connections between files and using and capturing those connections on import. We've already established that the storage management layer (Layer 1 in my three-layer model) should provide a generic storage-object-to-storage-object dependency facility. It follows that our import facilities should provide some sort of generic dependency handling facility.&lt;br /&gt;&lt;br /&gt;At this point I want to define a few terms that I will use in the rest of this discussion:&lt;br /&gt;&lt;br /&gt;- publication. A single unit of publishing, as distinct from the myriad data objects that make up the publication. 
This would normally translate to "doc number" or "title" but in any case it is the smallest unit of data that is published as an atomic unit for delivery and consumption. It is usually the largest or next to largest unit of management in a publication workflow in that you're normally managing the creation of publications for the purpose of publishing them atomically at specific times. That is, while some information is published piecemeal as topics that are dynamically organized, the typical case is that you're publishing books on paper or as single PDFs. That book or that PDF is the "publication". Thus a "publication" is a business object that can be clearly distinguished from all other publications, e.g., by its ISBN or doc catalog part number or whatever. While it is not required, it is often the case that publications are represented physically by the top-level or "master" file of their source data (in DITA terms, by a map or bookmap).&lt;br /&gt;&lt;br /&gt;- compound document. A system of storage objects with a single root storage object linked together using some form of semantic link (e.g., XIncludes, topicrefs, conrefs, or whatever) in order to establish the direct content of a publication or similar unit of information organization or delivery. What exactly constitutes the members of a compound document is a matter of specific policy and document type semantics. For example, if you have both XIncludes and navigation links among several XML documents you would normally consider only the XIncludes for the purpose of defining the members of the compound document.&lt;br /&gt;&lt;br /&gt;- resource. An object that represents and provides access to all the versions of a single logical object. For example, if you import a file for the first time, that creates both a resource and a version, which points to the resource. The resource represents the file as a set of versions. 
If you then import a second version of the same file, it would point to the first version, from which you could then navigate to the resource. Resources are objects with unique identifiers within the repository. From a resource you can get to any of its versions. Therefore the resource acts as a representation of the file independent of any of its versions. Resources are vitally important because they are the targets of dependency relationships held in the storage management layer.&lt;br /&gt;&lt;br /&gt;- version. An invariant collection of metadata and, optionally, data, related to exactly one resource and to zero or more previous or next versions of the same resource. When you import files into the repository you are creating new versions. Once created, versions do not change (you could, for example, implement your repository using write-once storage). The only possible exception to their invariance is version destruction--there are some use cases where it is necessary to be able to physically and irrevocably destroy versions (for example, document destruction rules for nuclear power plants in the U.S. or removal of draft bills from a legislative bill drafting system).&lt;br /&gt;&lt;br /&gt;- repository. A bounded system that manages a set of resources, their versions, and the dependencies between versions and resources.&lt;br /&gt;&lt;br /&gt;- storage object. A version that contains data as a set of bytes. A storage object provides methods for accessing its bytes.&lt;br /&gt;&lt;br /&gt;- dependency. A typed relationship between a specific version and a resource reflecting a dependency of some sort between the version and the resource. The pointer to the resource includes a "resolution policy" which defines how to choose a specific version or versions of the resource. The default policy is "latest". Therefore, by default, a version-to-resource dependency is a link between a version and the latest visible version of the target resource. 
Dependency policies can also specify specific versions or more complex rules, such as rules that examine metadata values, storage object content, the phases of the moon, the user's heart rate, or whatever.&lt;br /&gt;&lt;br /&gt;All of these terms except "publication" are from the SnapCM model &lt;a href="http://www.innodata-isogen.com/knowledge_center/white_papers/snap_cm.pdf"&gt;http://www.innodata-isogen.com/knowledge_center/white_papers/snap_cm.pdf&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;- bounded object set (BOS). The set of storage objects that are mutually dependent on each other, directly or indirectly. A compound document reflecting only XInclude links would form one BOS. If you also reflected any cross-storage-object navigation links you would get a different (larger) BOS. BOSes are useful for defining units of import and export as atomic actions. A BOS is "bounded" in that it is finite. When constructing a BOS that includes navigation links you may need to define rules that let you stop including things at some point, otherwise you might attempt to include the entire Web in your BOS, which is, for all practical purposes, unbounded. It is a set in that a given storage object occurs exactly once in the BOS, regardless of how many times it might be linked from various BOS members. The creation of a BOS requires that you be able to determine the identity of storage objects in some way, distinct from the mechanism by which they were addressed. That is, given two different URIs that you can resolve, you need to be able to determine that the resulting resources (in HTTP terms) are in fact the same resource. All file systems should provide this ability but not all storage systems can do this.&lt;br /&gt;&lt;br /&gt;This is almost all there is to the SnapCM model. There's a bit more that I'll introduce as we need it. I should also point out that you should be able to map the SnapCM abstract model more or less directly to any existing versioning system. 
For example, with Subversion, there is a very direct correspondence between the SnapCM version, resource, and repository objects and Subversion constructs. &lt;br /&gt;&lt;br /&gt;Therefore SnapCM can be valuable simply as a way to think about the basic operations and characteristics of systems of versions separated from distracting details of implementation. That thinking can then be applied to specific implementation approaches or existing systems. For example, you might have some crufty old content management system built up over the years with lots of specialized features, no clear code organization or component boundaries, and so on. By mapping what that system does to the SnapCM model you might be able to get a clearer picture of what your system does in order, for example, to separate, if only in your mind, those features that are really core content management features from those features that are business-object and business-logic specific (import, export, metadata reporting, UIs, etc.).&lt;br /&gt;&lt;br /&gt;For the rest of this discussion I will only talk about XML compound documents, because that's our primary focus and they are clear. But I want to stress that the basic challenges of import apply to any form of non-trivial documentation representation, proprietary or standard, and the basic solutions are the same. A system built to handle XML compound documents should be able to be &lt;i&gt;quickly&lt;/i&gt; adapted to managing FrameMaker documents just by adding a bit of Frame-specific import functionality. Note my stress on the quickly.&lt;br /&gt;&lt;br /&gt;Let's start small, just a single XML document instance governed by an XSD schema. Let us call it "doc_01.xml". We want to import it into the repository. This is the simplest possible case for our purposes as we can assume that you will not be authoring documentation for which you do not have a governing schema. There are other XML use cases in which schemas are not needed or are not relevant. 
This is not one of them.&lt;br /&gt;&lt;br /&gt;So right away we have a system of at least two documents: the XML document instance and the XSD document that governs it. To import this system of documents I have to do the following:&lt;br /&gt;&lt;br /&gt;1. Process the XML document semantically in order to discover any relationships it expresses via links in order to determine the members of the bounded object set we need to import. We have to import at least the minimum required BOS so that the state of the repository after import, with respect to the semantics of the links involved in the imported data, is internally consistent. That is, if DocA has a dependency on DocB that, if not resolved, prevents correct processing of DocA, then if you only import DocA and not DocB, the internal state of the repository will be inconsistent. Therefore you must import DocA and DocB as an &lt;i&gt;atomic action&lt;/i&gt; in order to ensure repository consistency. &lt;br /&gt;&lt;br /&gt;In this case we discover that doc_01.xml uses an xsi:schemaLocation= attribute to point to "book.xsd". This establishes a dependency from doc_01.xml to book.xsd of the type "governed by" (the inverse relationship, "governs", while interesting, is not a dependency because a schema is not dependent on the documents it governs). &lt;br /&gt;&lt;br /&gt;We don't find any other relevant links in doc_01.xml.&lt;br /&gt;&lt;br /&gt;At this point, we have established that doc_01.xml is the root storage object of our compound document and the first member of the BOS to be imported. We know that book.xsd is rooted (for this compound document) at doc_01.xml and will be the second member of our BOS.&lt;br /&gt;&lt;br /&gt;2. Process the compound document children of the root storage object, i.e., book.xsd. 
We determine that book.xsd has no import or include relationships to any other XSD documents (if it did we would of course add them to our BOS).&lt;br /&gt;&lt;br /&gt;At this point we have established a BOS of two members reflecting a compound document of two storage objects.&lt;br /&gt;&lt;br /&gt;3. For each member of the BOS, determine whether or not the repository already has a resource for which the BOS member should be a new version.&lt;br /&gt;&lt;br /&gt;Hold the phone! How can I possibly know, in the general case, whether a given file is already represented in the repository?&lt;br /&gt;&lt;br /&gt;The answer is: you can't. There is no general way to get this knowledge. There are a thousand ways you could do it. &lt;br /&gt;&lt;br /&gt;One approach would be to use a CVS- or Subversion-style convention of creating local metadata (the "working copy") that correlates files on the file system to resources and versions in the repository. This is a perfectly good approach.&lt;br /&gt;&lt;br /&gt;Another approach would be to use some sort of data matching heuristic to see if there are any versions in the repository that are a close match to what you're trying to import. There are systems that do something like this (I know some element-decomposition systems will normalize out elements with identical attributes and PCDATA content).&lt;br /&gt;&lt;br /&gt;You can use filenames and organization to assert or imply correspondence (if a file with name X is in directory Y on the file system and in the repository then they're probably versions of the same resource). Of course this presumes that the repository's organizational facilities include something like directories. Not all do.&lt;br /&gt;&lt;br /&gt;Another approach is to require the user to figure it out and tell the importer.&lt;br /&gt;&lt;br /&gt;This last approach is the only really generalizable solution but it's not automatic. 
In the XIRUSS-T system I've generalized this in the import framework through the generic "storage-object-to-version map", which defines an explicit mapping between storage objects to be imported and the versions of which they are to be the next version, if any. How this map gets created is still use-case-specific. It could be via an automatic process using CVS-like local metadata, it could be heuristic, it could be via a user interface that the importing human has to fill out. But regardless, you have to have some way to say explicitly at import time what existing versions the things you are importing are related to.&lt;br /&gt;&lt;br /&gt;OK, for this first import scenario we establish that in fact the repository is empty so there's no question that we will be creating new resources and versions for both doc_01.xml and book.xsd. &lt;br /&gt;&lt;br /&gt;4. Having constructed our empty storage-object-to-version map, we execute the import process, the result of which is that we create two new resources, one for doc_01.xml and one for book.xsd, and for each resource, the corresponding version, being storage objects holding the sequence of bytes from doc_01.xml and book.xsd respectively. We also create a dependency instance from the version doc_01.xml (let us call this doc_01.xml version 1) to the resource for book.xsd. &lt;br /&gt;&lt;br /&gt;The creation of these objects in the repository is an atomic transaction such that, as far as the repository is concerned, the resources, versions, and dependencies all came into existence at the same moment in time. This is very important--if the import activity is not atomic then it cannot be easily rolled back and the repository will likely be in an incomplete, inconsistent state for some period of time. This is an important difference between CVS and Subversion, for example. CVS does not have any reliable form of atomic commit of multiple files while Subversion does. 
Any repository that cannot do atomic commits as a single transaction that can be rolled back is seriously limited and should be given a very close look. I don't know if it's been corrected in the meantime, but in 1999, when we were using Documentum to store documents for a bill drafting system, we discovered that Documentum could not do atomic commits as single transactions. This was very distressing to us.&lt;br /&gt;&lt;br /&gt;Let's look at the data we have in our repository. For example, doc_01.xml might look like this:&lt;pre&gt;&amp;lt;?xml version="1.1"?&gt;&lt;br /&gt;&amp;lt;book xmlns="http://www.example.com/namespaces/book"&lt;br /&gt;     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" &lt;br /&gt;     xsi:schemaLocation="../dtds/book/book.xsd"&lt;br /&gt;&gt;&lt;br /&gt;  ...&lt;br /&gt;&amp;lt;/book&gt;&lt;/pre&gt;&lt;br /&gt;Anyone notice the problem?&lt;br /&gt;&lt;br /&gt;The problem is the value of the xsi:schemaLocation= attribute: it's a relative URI that reflects the location of the schema on the filesystem from which the documents were imported. But we're not in that domain any more. We've crossed the pass through the mountains and we're into a different country with different language and customs. That URI may or may not be resolvable in terms of the location of the data &lt;i&gt;within the repository&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;If you're using a system like Subversion where the documents are never processed directly from the repository but are always exported first to create working copies and those working copies will reflect the original relative locations then that's OK, because the repository is really just a holding area.&lt;br /&gt;&lt;br /&gt;But what you really want is the ability to process the documents in the repository directly from the repository (e.g., as though the repository were itself a file system of some sort). 
You want this because it's expensive and inefficient to have to do an export every time you want to process a document because, for most documents in the domain of technical documentation, there will be a lot of files involved, some of them potentially quite large (e.g., graphics). It would be much easier if you could just access the data directly, e.g., via an HTTP GET without the need to first make a copy of everything.&lt;br /&gt;&lt;br /&gt;But in order to do that all the pointers have to be rewritten to reflect the new locations of everything in the repository &lt;i&gt;as stored&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;This is non-trivial but it's not that hard either. You just need to know what the repository-specific method of referring to objects within the repository is and what the mapping is from the objects as imported (that is, in their original locations) to the objects as stored. The repository-specific pointers could take many different forms: object IDs, HTTP URLs, repository-specific URIs, or whatever. In today's world it generally makes most sense for the repository to use URLs so that you can use standard and ubiquitous HTTP services to access your repository contents. &lt;br /&gt;&lt;br /&gt;For example, the XIRUSS-T system defines a simple HTTP convention whereby you can refer to a version either by naming its resource by resource object ID and, optionally, naming a resolution policy (the default is "latest visible version") or by version object ID. The XIRUSS-T system also defines some basic organizational structures that can also be used to construct unambiguous and persistent URLs and you can define arbitrary organizational containers (analogous to directories) by which you can also address objects. 
So in XIRUSS there are two base addressing methods (resource ID + resolution policy and version ID) that will always work and can be constructed knowing only the resource ID or version ID, and other "convenience" forms that will also work.&lt;br /&gt;&lt;br /&gt;So for our example, let us assume that book.xsd results in resource object RES0002 and version object VER0002. We can rewrite the xsi:schemaLocation= value in doc_01.xml like so:&lt;pre&gt;&amp;lt;?xml version="1.1"?&gt;&lt;br /&gt;&amp;lt;book xmlns="http://www.example.com/namespaces/book"&lt;br /&gt;     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" &lt;br /&gt;     xsi:schemaLocation="/repository/resource/RES0002"&lt;br /&gt;&gt;&lt;br /&gt;  ...&lt;br /&gt;&amp;lt;/book&gt;&lt;/pre&gt;&lt;br /&gt;This is still a relative URL (relative to the server that holds the repository) but it is now addressing book.xsd as a resource/policy pair that can be reliably resolved to the appropriate version at any moment in time.&lt;br /&gt;&lt;br /&gt;This need to rewrite pointers is universal if you want to be able to process storage objects as stored and you don't want to limit yourself to the static and limiting organizational facilities of a typical file system (which you don't, trust me). &lt;br /&gt;&lt;br /&gt;Therefore, you need an import framework or mechanism that can do two things:&lt;br /&gt;&lt;br /&gt;- For a given storage object to be imported, determine what its address will be within the repository after import. 
This could either be by asking the repository (e.g., resource = repository.createResource(); resource.getId()) or by applying some established convention or using metadata within the data to be imported (for example, you might have already assigned globally unique identifiers to your documents, captured as attributes on the root element, and you use those identifiers as your within-repository object IDs).&lt;br /&gt;&lt;br /&gt;- For each storage object, whatever its format, rewrite the pointers to reflect the new locations. It should go without saying that this process shouldn't break anything else. However, this is sometimes easier said than done. For example, the built-in XIRUSS-T XML importer imposes some limitations on what XML constructs it can and can't preserve during import, mostly for practical reasons.&lt;br /&gt;&lt;br /&gt;This suggests that repositories should, as a matter of practice, provide some sort of import framework that makes it as easy as it can be (which isn't always that easy) to implement these operations. Any repository that provides only built-in importers or that does not make creating new importers particularly easy should get a very close look because it's likely that any built-in importer won't do exactly what you want done or everything you need done (even if what it does do it does just how you want). If, for example, the import API is poorly documented or incomplete or it doesn't provide any way to get, set, or predict a resource's ID in advance of committing it to the repository, you've got a problem. &lt;br /&gt;&lt;br /&gt;This is an area that a lot of enterprises don't check when evaluating potential XML-aware content management systems but it is a crucial area to evaluate because it is where you will be investing most of your integration and customization effort. 
The last thing you want to have to do is call Innodata Isogen to help you figure out how to get your stuff into and out of the tool you've already bought. Not that we're not happy to help but we'd rather not see you be in that position at all. We'd rather you hired us to quickly implement the exact functionality you need, cleanly and efficiently, rather than bang our heads against some product that resists all our efforts to bend it to our will. We like to have fun in our jobs too.&lt;br /&gt;&lt;br /&gt;So our initial import process wasn't quite complete. We need to insert step 3.1 to include the pointer rewrite:&lt;br /&gt;&lt;br /&gt;3.1 In temporary storage (or in the process of streaming the input bytes into the newly-created version objects) rewrite all pointers to reflect the locations of the target resources or versions as they will be within the repository.&lt;br /&gt;&lt;br /&gt;In XIRUSS-T's import framework I have generic XML handling code that supports this rewrite activity and essentially acts as a filter between the input (from the file system) and the output (the new version objects) to do the rewriting. This generic XML handling code can then be used by schema-specific code that understands specific linking conventions. For example, there is an XInclude importer that recognizes xi:include elements and knows that it is the href= attribute that holds the pointer to be rewritten, an XSD schema importer that knows about schemaLocation, import, and include, and an XSLT importer that understands XSLT's import and include elements. You get the idea. &lt;br /&gt;&lt;br /&gt;Notice here the separation of concerns, separating the generic operation of essentially changing attribute values in XML documents from the concern of schema-speci
