XQuery: Not So Bad After All
I've recently finally had a need to use XQuery for something; up to this point, everything I've done with XML since XQuery was solidified was with XSLT and DOM programming. I've been observing XQuery since the effort started back in the dim mists of time and, as with XML Schemas, had little hope that it would ever see the light of day, for the simple reason that there seemed to be too many cooks and too many different requirements for the committee to ever reach a useful consensus on things like syntax and semantics. In particular, there seemed to be a fundamental chasm between the database people, who wanted an SQL for XML, and the document people, who wanted XSLT with result sets. But that was from a distant vantage point, with little direct visibility into the activity other than getting all the function-related committee email because I'm a member of the XSL working group (not that I actually read any of that email unless the subject line was particularly intriguing and I had the time to devote to reading it).
But of course my pessimism was unfounded and XQuery has emerged as a solid and useful specification with a number of implementations.
I've now been constructing simple XQueries for a couple of weeks and I must say it's pretty cool to do with a simple query what would take a good bit more work to do with an XSLT script (given a running XQuery-supporting repository, of course).
I also found Mike Kay's Learn XQuery in 10 Minutes to be a very helpful startup guide, providing just the how-to information I needed to get the basic syntax and techniques. The rest of XQuery (at least the part I've used) is pretty obvious and intuitive to anyone familiar with XSLT.
Of course, I haven't had the opportunity or time to determine to what degree various tools provide complete and correct implementations of XQuery, but I'm sure I will. On the plus side, the standard has a solid set of test cases, which makes it pretty hard for an implementor not to know whether the implementation is both correct and complete.
My main concern would be collation, which was very broken in XSLT (in the sense that the mechanism for specifying custom collators for sorting was not well standardized and was only usefully implemented by Saxon for the purposes of doing XSLT processing of localized documents [e.g., back-of-the-book index collation]). I know that XSLT 2 (and therefore XQuery, which shares the same collation semantics) has attempted to be more general, but when I first looked at what was in Saxon 8 (a couple of years ago now) it wasn't quite what I needed: you had to declare a separate collation URI for each locale, while I wanted a single collation URI that named a collator that then did the right thing at the right time based on an outside configuration mechanism. [With Saxon 6 you had to implement per-locale classes that Saxon used based on an invariant mapping of locale names to collator class names. At least the XSLT 2 mechanism is more general.]
I haven't had the time or a business need to push on the XSLT 2/XQuery collation mechanism for a while, so I really don't know where things stand. I suppose I really should find out, because as far as I can tell I'm about the only person who really worries about this particular issue. (I developed a generic index configuration and collation support library for use with Saxon, which is available here: Internationalization Support Library [note: log-in may be required. If this is a problem, send me an email and I'll forward you a copy.]. Note that this code is equally applicable to XSL-FO 1.0 and the new indexing support in XSL-FO 1.1, as the FO indexing is only about constructing sequences of page numbers and not about sorting the index entries themselves. In addition, this code is equally useful for things like generated glossaries.)
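To make the "single collation URI plus outside configuration" idea a bit more concrete, here is a minimal sketch in Python. It is not Saxon's API or the XSLT 2 mechanism; the URI, the function names, and the toy Swedish and German rules are all assumptions made up for illustration, and a real collator would use full locale rules rather than these simplified ones.

```python
from typing import Callable, Dict, List

# Hypothetical URI: the one name a stylesheet or query would refer to.
INDEX_COLLATION_URI = "http://example.org/collation/index"

# Toy Swedish alphabet: å, ä, ö are separate letters that follow z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

# Toy German dictionary rule: umlauts sort like their base vowels.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(term: str) -> List[int]:
    """Sort key that ranks each letter by its position in the Swedish alphabet."""
    size = len(SWEDISH_ALPHABET)
    return [SWEDISH_ALPHABET.index(ch) if ch in SWEDISH_ALPHABET else size
            for ch in term.lower()]


def german_key(term: str) -> str:
    """Sort key that folds umlauts onto their base letters."""
    return term.lower().translate(GERMAN_FOLD)


# Hypothetical registry: locale name -> key function.
COLLATORS: Dict[str, Callable] = {"sv": swedish_key, "de": german_key}


def sort_terms(terms: List[str], config: Dict[str, str]) -> List[str]:
    """Sort index terms using whichever locale the outside config selects.

    The caller only ever names INDEX_COLLATION_URI; the locale comes from
    `config`, which stands in for a configuration file or similar.
    """
    return sorted(terms, key=COLLATORS[config["locale"]])


terms = ["zebra", "äpple", "apa"]
print(sort_terms(terms, {"locale": "sv"}))  # Swedish: ä sorts after z
print(sort_terms(terms, {"locale": "de"}))  # German: ä sorts with a
```

The point of the design is that the same list and the same collation name give different orderings purely because the configuration changed, which is what per-locale collation URIs make awkward.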
One interesting question that I've already run into is this: when engineering a complete Web site to serve XML data and queries against it, how much should be done in XQuery alone and how much should be done with more traditional Web site technologies such as JSP or Ruby? You can of course use XQueries to generate HTML pages that reflect the query results, and can therefore use XQuery exclusively to build a Web site (given some sort of CGI-like facility, such as the extensions that MarkLogic provides or just everyday CGI scripts), but should you?
My initial instinct is that you should not: good engineering practice argues for a clear separation of concerns, so XQuery should focus on queries and something else should focus on the user interface. But I'd be curious if anyone has a strong counterargument. One of the things that's attractive about the do-it-all-in-XQuery approach is that you can build stuff really quickly because there's little overhead, so it is good for proofs of concept and demos. But I can't see it being a sustainable approach for production Web sites (although I'm sure more than one person will answer that they've been doing it for years now).
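As a rough illustration of the separation I have in mind, here is a small Python sketch in which one function only queries and another only renders. It uses the standard library's ElementTree and its limited XPath subset as a stand-in for a real XQuery repository; the sample document and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import List

# Made-up sample data standing in for content held in an XML repository.
SAMPLE = """<library>
  <book year="2005"><title>Q &amp; A</title></book>
  <book year="2001"><title>Old Notes</title></book>
</library>"""


def query_titles(doc: ET.Element, year: str) -> List[str]:
    """Query layer: returns plain data and knows nothing about HTML."""
    return [book.findtext("title")
            for book in doc.findall(f".//book[@year='{year}']")]


def render_html(titles: List[str]) -> str:
    """Presentation layer: turns plain data into markup, escaping as it goes."""
    items = "".join(f"<li>{escape(title)}</li>" for title in titles)
    return f"<ul>{items}</ul>"


doc = ET.fromstring(SAMPLE)
print(render_html(query_titles(doc, "2005")))
```

In the do-it-all-in-XQuery approach both functions would collapse into a single query that returns the `<ul>` directly, which is quicker to write but ties the page layout to the query.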
Labels: xquery
1 Comments:
Be sure that you are not the *only* one who cares about collating and index processing in XSLT. We are at least two ;-). See http://www.idealliance.org/proceedings/xml04/papers/77/xslindex.pdf
FYI: the DocBook stylesheets in the latest release have out-of-the-box support for both your indexing method and mine.