Michael Kay

Oft times the question arises when to use SAX filters and when to use XSLT. Here Mike Kay offers help in deciding.

Here's a starter for ten (sorry, that's a catchphrase from a UK TV programme).

A SAX filter can sometimes be used instead of an XSLT transformation,
and it can sometimes be used for pre-processing the input to an XSLT
transformation, or for post-processing the output.

The main cases where a SAX filter can be useful are:

(a) where the XML file is too large to be processed by XSLT
(b) where you need to perform operations, usually text processing, that can't be done easily in XSLT
(c) to preserve information that the XSLT/XPath data model does not retain

To solve problems of document size, you can:

(a) do all the processing in a SAX application (if the processing is simple and purely serial)
(b) use a preprocessing SAX filter to create a smaller input document for the transformation to work with (e.g. by projection or restriction)
(c) use a preprocessing SAX filter to split the large document into many small documents, each of which is then transformed independently by XSLT. If necessary, you can then use a postprocessing SAX filter to put the transformed pieces back together again.

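As a rough illustration of option (b), here is a minimal sketch of a projection filter feeding an XSLT transform through JAXP. The class names, the `transform` helper and the `notes` element are made up for the example; only `XMLFilterImpl`, `SAXSource` and the standard JAXP factories are real API. It assumes a JAXP implementation that accepts a `SAXSource` (the one bundled with the JDK does).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch: a projection filter placed between the parser and an XSLT transform. */
class ProjectionPipeline {

    /** Drops every element with the given local name, together with its content. */
    static class DropElementFilter extends XMLFilterImpl {
        private final String dropName;   // hypothetical: name of the element to discard
        private int depth = 0;           // greater than 0 while inside a dropped subtree

        DropElementFilter(XMLReader parent, String dropName) {
            super(parent);
            this.dropName = dropName;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || dropName.equals(local)) {
                depth++;                 // entering, or nesting deeper inside, a dropped subtree
                return;
            }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 0) {
                depth--;                 // leaving a dropped element: emit nothing
                return;
            }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Parses xml, removes dropName elements on the fly, and transforms what is left. */
    static String transform(String xml, String xslt, String dropName) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);     // so that local names are reported to the filter
        XMLReader filter = new DropElementFilter(spf.newSAXParser().getXMLReader(), dropName);

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        // The transformer calls filter.parse(), so the filter drives the underlying parser.
        t.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xslt = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'><xsl:value-of select='/order'/></xsl:template>"
                + "</xsl:stylesheet>";
        String xml = "<order><item>pen</item><notes>internal</notes></order>";
        System.out.println(transform(xml, xslt, "notes"));
    }
}
```

The stylesheet never sees the dropped elements, so the tree it builds is correspondingly smaller. The sketch only suppresses elements and character data; comments and processing instructions inside a dropped subtree would need the same treatment.
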
A SAX filter can be used to transform the input data into a form that
is more amenable to XSLT processing. Examples include:

(a) preparsing a structured text field (e.g. CSV) into a set of separate elements
(b) changing the representation of a date field to the ISO 8601 form yyyy-mm-dd
(c) computing a derived attribute, e.g. adding @value as the product of @price and @qty, making it easier for the XSLT stylesheet to do sorting and totalling
(d) simple grouping of elements, for example adding a <list> element around any consecutive sequence of one or more <list-item> elements

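Example (c) might look something like the sketch below: a filter that adds a `value` attribute wherever an element carries both `price` and `qty`, so the stylesheet can simply sum `@value`. The class and method names are invented for illustration, and the sketch assumes the attributes are in no namespace, hold well-formed decimal numbers, and that no `value` attribute is already present.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.math.BigDecimal;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch: a preprocessing filter that computes a derived attribute for the stylesheet. */
class DerivedAttributePipeline {

    /** Adds value = price * qty to any element that has both attributes. */
    static class AddValueFilter extends XMLFilterImpl {
        AddValueFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            String price = atts.getValue("", "price");   // no-namespace attributes assumed
            String qty = atts.getValue("", "qty");
            if (price != null && qty != null) {
                // BigDecimal avoids binary floating-point rounding. A malformed number
                // throws NumberFormatException, which this sketch does not handle.
                BigDecimal value =
                        new BigDecimal(price.trim()).multiply(new BigDecimal(qty.trim()));
                AttributesImpl extended = new AttributesImpl(atts); // copy, then extend
                extended.addAttribute("", "value", "value", "CDATA", value.toPlainString());
                atts = extended;
            }
            super.startElement(uri, local, qName, atts);
        }
    }

    /** Runs the stylesheet over the filtered event stream and returns the result text. */
    static String transform(String xml, String xslt) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);     // required for lookup by (uri, local name)
        XMLReader filter = new AddValueFilter(spf.newSAXParser().getXMLReader());

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xslt = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'>"
                + "<xsl:value-of select='sum(//item/@value)'/>"
                + "</xsl:template></xsl:stylesheet>";
        String xml = "<order><item price='2.50' qty='4'/><item price='1.25' qty='2'/></order>";
        System.out.println(transform(xml, xslt));
    }
}
```

With the product precomputed, the stylesheet can total or sort on a plain attribute instead of having to compute a sum of products, which is awkward in XSLT 1.0.
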
A SAX filter can be used to capture features of the source document
that are not representable in the XSLT data model. For example, entity
references and CDATA sections, as well as DTD declarations, can all be
captured in a SAX filter and translated into elements that are visible
to the XSLT stylesheet.

A postprocessing SAX filter (or simply a SAX ContentHandler) is
useful in two principal situations:

(a) to undo the changes made by a preprocessing filter
(b) to achieve serialization effects that cannot be achieved using the standard serialization methods (as an alternative to disable-output-escaping)

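For case (a), one possible arrangement is to send the transformation result to a SAX filter via `SAXResult`, and let the filter pass its events on to an identity `TransformerHandler` that acts as the serializer. The sketch below strips a helper attribute from the result; the class names and the attribute name passed in are hypothetical, and it assumes the `TransformerFactory` in use supports `SAXTransformerFactory` (the JDK's built-in one does).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch: an XSLT transform feeding a postprocessing SAX filter, then a serializer. */
class PostprocessPipeline {

    /** Removes one helper attribute from every element of the transformation result. */
    static class StripAttributeFilter extends XMLFilterImpl {
        private final String helperName;   // hypothetical: attribute to remove again

        StripAttributeFilter(String helperName) {
            this.helperName = helperName;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            AttributesImpl kept = new AttributesImpl();
            for (int i = 0; i < atts.getLength(); i++) {
                if (!helperName.equals(atts.getQName(i))) {
                    kept.addAttribute(atts.getURI(i), atts.getLocalName(i),
                            atts.getQName(i), atts.getType(i), atts.getValue(i));
                }
            }
            super.startElement(uri, local, qName, kept);
        }
    }

    static String transform(String xml, String xslt, String helperName) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("SAXTransformerFactory is not supported");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // An identity TransformerHandler used purely as a serializer: SAX events in, text out.
        // Output properties are set before setResult so that they are sure to take effect.
        TransformerHandler serializer = stf.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        // The filter has no parent reader here; it is used only as a ContentHandler.
        StripAttributeFilter filter = new StripAttributeFilter(helperName);
        filter.setContentHandler(serializer);

        Transformer t = tf.newTransformer(new StreamSource(new StringReader(xslt)));
        t.transform(new StreamSource(new StringReader(xml)), new SAXResult(filter));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String identity = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='@*|node()'>"
                + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
                + "</xsl:template></xsl:stylesheet>";
        String xml = "<order><item price='3' value='12'>pen</item></order>";
        System.out.println(transform(xml, identity, "value"));
    }
}
```

Because `XMLFilterImpl` does not implement `LexicalHandler`, comments in the result are not passed through this particular chain. For case (b) the same wiring applies, with your own `ContentHandler` writing the output in place of the identity serializer.
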
Sometimes a user-written serializer can be produced by subclassing the
standard serializer supplied with your chosen product. This will of
course be product-dependent and your code may not work with future
releases of the product.

It's also possible to write a SAX filter to preprocess the
stylesheet. This is less common, but it can be used to tackle problems
such as dynamic sort keys, or XPath expressions that are contained
within source documents.
The new STX specification provides the prospect of being able to write
SAX filters without needing to do low-level Java coding. If this takes
off, I think that the idea of doing a complex transformation as a
pipeline of SAX filters, some generated using XSLT and some using STX,
may become increasingly attractive. Although XSLT 2.0 deals with nearly
all the limitations of XSLT 1.0 in areas such as text processing,
grouping, and aggregation, it doesn't address the problem of handling
large input documents.

STX at sourceforge

Streaming Transformations for XML (STX) is a one-pass transformation language for XML documents that builds on the Simple API for XML (SAX). STX is intended as a high-speed, low memory consumption alternative to XSLT. Since it does not require the construction of an in-memory tree, it is suitable for use in resource constrained scenarios. The aim of this project is to develop and maintain the STX language specification.

Attention good readers. I'm looking for two examples:

1. a SAX filter feeding an XSLT transform
2. an XSLT transform feeding into a SAX filter

Could you help?