Various
Contributors:
David Carlisle, Jeni Tennison, Mike Kay, Oliver Becker,
Wendell Piez, Ken Holman. This all started off with a list question asking
How could I get the nearest ancestor with a certain attribute, say "attr"? The very simplistic answer is:
(ancestor-or-self::*/@attr)[last()]
or
ancestor-or-self::*[@attr][1]/@attr
Typically you want to do this because you have "inherited attributes"
eg if you want to know the current language, you want to know the
value of the nearest xml:lang attribute which is (ancestor-or-self::*/@xml:lang)[last()] similarly if you are reading an FO file and want to know current
effective value of any FO property. In order to collate the whole host of answers and comments on this question, I wrote a test harness and stylesheet. Test Data: I assumed that the sought attribute was a property of
an ancestor, rather than the 'home' element itself.
XML file
<?xml version='1.0'?>
<el>
<el attr="x1">
<el attr="y1">
<el>
<el attr="x2">
<el attr="This one"><!-- This one is sought -->
<el>
<el attr="y">Y2</el>
<el>
<home>Base template match</home>
</el>
</el>
</el>
</el>
</el>
</el>
</el>
</el>
XSL Stylesheet:
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:apply-templates select="//home"/>
</xsl:template>
<xsl:template match="//home">
1 <xsl:value-of select="ancestor-or-self::*[@attr][1]/@attr"/>
2 <xsl:for-each select="ancestor::*[@attr]">
<xsl:value-of select="@attr"/>
</xsl:for-each>
3. <xsl:value-of select="ancestor::*[@attr][1]/@attr"/>
4. <xsl:variable name="tmp" select="ancestor::*[@attr]/@attr"/>
<xsl:value-of select="$tmp"/>
5. <xsl:value-of select="(ancestor::*/@attr)[last()]"/>
6. <xsl:value-of select="ancestor::*[@attr][1]/@attr"/>
7. <xsl:value-of select="ancestor::*/@attr[last()]"/>
</xsl:template>
<xsl:template match="*"/>
</xsl:stylesheet>
Each one I tried, I numbered. The following numbers relate to that
numbering. 1 <xsl:value-of select="ancestor-or-self::*[@attr][1]/@attr"/>
Provides the right one.
2. ancestor::*[@attr]
gets you a whole host of stuff..., the last node of which is the sought item!
3. <xsl:value-of select="ancestor::*[@attr][1]/text()"/>
gets the right node
4. ancestor::*/@attr gets the first in document order!
5. (ancestor::*/@attr)[last()] Actually provides the node we were looking for!
6. ancestor::*[@attr][1]/@attr works.
Xpath 4.2 is the reference here.
7. <xsl:value-of select="ancestor::*/@attr[last()]"/>
Gives me the last in the reverse document order!
This 7th expression will return a set consisting of the
last @attr attribute of each ancestor element. An element can only have one
@attr attribute so saying you want its last one is pointless.
An axis returns a list of nodes, which is always in either forwards or
reverse document order. This affects the meaning of position() and last()
when applied to a predicate used in a Step based on that axis. The final
result of a path expression, however, is always a set of nodes, with no
intrinsic order. Many operations on a node-set access the nodes in document
order. This includes a filter expression, so:
(ancestor::*/@attr)[last()] will select the last in document order (i.e. the
innermost available attribute). Which implies that example 5 is returning
the correct attribute.
So what *is* the order of nodes returned by ancestor::*/@attr (without the
grouping operator)?
XPath expressions select node sets. The distinguishing feature of a
set (as opposed to a list) is that it is unordered.
(Most operations on that set take the nodes in document order, but that
is a propoerty of the operation being applied, not of the set itself)
Regarding the importance of an axis:
The "order" of an axis only "rules" for the step that it is in, ie
it affects the meaning of position() and last() in predicates that occur
in that step. It doesn't affect the constructed node set.
The axis direction affects predicates in a step (in both patterns
and expressions) and it doesn't affect anything else. It _never_
affects the resulting node set.
(A/B/C[last()])[position()=2]
or
<xsl:variable name="x" select="A/B/C[last()]"/>
<xsl:value-of select="$x[2]"/>
or
<xsl:for-each select="A/B/C[last()]">
<xsl:if test="position()=2">
all select the 2nd node in document order irrespective of the axis used
in C.
The only thing the axis order affects is the direction of that step
ie the direction of predicates appearing between C and the next / (if
there is one)
Although actually since you've already used [last()] in this step the
node list in that step has at most one node so it makes no difference
which way you count, but for the record
select="A/B/C[last()][1]">
selects the first (and only) node counding from the front or the back,
depending on the axis direction of C. When you use a predicate on a node set within a step (e.g. //*/@*[1]
or foo[1]) then the node set is sorted according to the axis. If it's
preceding::, preceding-sibling::, ancestor:: or ancestor-or-self::
then it's a reverse axis, otherwise it's a forward axis. If it's a
reverse axis, then it's sorted into *reverse document order* before
the predicate is assessed; if it's a forward axis, then it's sorted
into *document order* before the predicate is assessed. When processing a node-set note that: If you use xsl:for-each or xsl:apply-templates then the nodes are
always processed in document order.
Stating the obvious, a node-set is a set
(and sets are by definition unordered).
select="ancestor::*"
produces the _set_ of ancestors, it isn't ordered at all.
<xsl:variable name="x" select="ancestor::*"/>
puts that set in a variable.
<xsl:variable name= "x" select="$x[1]"/>
the operation of applying the predicate [1] (ie [position()=1]takes the
first node in document order. But that is a feature of the operation not
of the node set $x. Note this is not the same as applying a predicate within a step, which
is the only place the axis order has any effect <xsl:variable name="y" select="ancestor::*[1]"/> selects the parent, unlike the $x[1] which selects the document element
(ie the outer most ancestor) A node set is always unordered until you do anything with it like
iterate over it, get a value from it or whatever. Of course a node set
is pretty useless unless you do something with it, so saying a node
set is unordered doesn't particularly help :) It mainly helps in defining | as set union, which is a standard
operation on sets. If you wanted to define | on ordered lists you'd have
to describe removing of duplicates and say which duplicate was removed,
etc.
When you use a predicate on a node set that's already been generated,
such as a node-set variable, a union or just by putting brackets
around it (e.g. $foo[1], (//*/@*)[1] or (foo|bar)[1]) then the node
set is always sorted into *document order* no matter what axes you
use.
Since a predicate can be applied to a location path expression, as in the
union of two other path expressions, that predicate is always evaluated in
document order.
xsl:value-of and xsl:for-each deal with location path expressions (not
location step expressions), therefore their evaluation is done in document
order.
Mainly of course we are just arguing about words, for the fun of it,
but if one was constructing ordered sets I would expect ancestor::*
to construct the set of ancestors with reverse document ordering,
but it doesn't. It just constructs the set of ancestors.
Now given a set there is (in XSLT 1.x) only one possible ordering you
can apply, namely document order, so whether that property is
intrinsic or applied makes no practical difference, except for the way
you describe things. I'd say that <xsl:variable name="x" select="
ancestor::*"/> constructs the set, and that <xsl:for-each select="$x">
steps through that set in document order. You'd presumably say that
the variable constructed the node set in document order (as you want the
sets to always be ordered), and then for-each does the obvious thing.
Now clearly in an implementation a set is probably implemented as some
kind of list and so is ordered, so probably the implementation follows
your model, but that's an implementation detail, I've been a pure
mathematician too long to worry about implementation details:-) The XSL spec is always very careful to distinguish between "node set"
and the "current node list" the latter, being a list and not a set,
has an order, and that's what position() refers to. In some places the
current node list emerges by magic out of a given node set, but it is
still useful to keep the things clearly separated in your (or at least,
my) mind.
An axis identifies an (ordered) list of nodes. The predicate associated with
the axis is applied to the ordered list. The result of an XPath expression,
however, is always a node-set, not a list. The node-set is unordered, but
the nodes have an ordering, called document order. Many operations on
node-sets process the nodes in document order. Is that clear now?
Drifting slightly off topic, regarding node-sets of attributes,
Attributes of an element aren't ordered in a defined way. The
order is implementation dependent. You can't tell, what @*[1] gives.
Document order or reverse document order is *only* important in
location steps, or to be more specific, in the predicates of a step.
If you have an expression that evaluates to a node-set and apply
a predicate, then this predicate "filters the node-set with respect to
the child axis." (see XPath 3.3)
To bring an even little bit more confusion:
1. Some nodes will have attributes that were not explicitly expressed
in the text of the xml document, but were defined/defaulted in a DTD.
What will be the "document order" for @* in this case? DTD implied attributes are no different from explicit ones.
The order is defined but implementation specific. I.e. for a given element node . each time you do @*[2] in a run of the stylesheet
you get the same attribute node, as th eattributes (defaulted or not)
are placed in some order when the input tree is constructed.
But you can't guarantee you get the same result if you run the
stylesheet again. (Most likely you will if you use th esame input and
same XSL engine, but that is not guaranteed) Expanding to a nod-set generated from multiple documents:
Nodes from different documents are also in an arbitrary implementation-defined
order. <xsl:for-each select="document('doc1.xml')//* | document('doc2.xml')//*">
...
</xsl:for-each>
The attribute:: axis and the namespace:: axis
are a bit weird because there is no real concept of 'document
order' for attributes or namespace nodes. So, although @*[1] gives you
the first attribute in document order, this could actually be any
attribute on the element. It rarely makes sense to use a positional
predicate with an attribute or namespace axis because attributes and
namespaces have no inherent position in an element.
Kens personal view is worth noting: I find it is more succinct to say: "Nodes from the attribute and namespace
axes are always in an arbitrary implementation-defined order, while nodes
from all other axes are in proximity order within location step expressions
and in document order in location path expressions." Note that attribute and namespace axes are *not* unordered (you can index
into them), just that the order cannot be relied upon. There is an
important nuance. When you say that the order can't be relied on, do you mean: (a) between two runs of a stylesheet? | (b) between two XSLT processors? |
Technically, any of the above. If an implementation doesn't keep it
consistent, that shouldn't have any effect on a properly written stylesheet
(i.e. one that does not ever rely on the order).
It is totally up to the implementation, so the stylesheet
shouldn't make any assumptions whatsoever.
The input to XSL is a tree in which there is a complete order applied to
the set of nodes. This ordered input tree does not change during a run.
The "implementation defined" variability is in the way the ordered tree
is built from a linear XML syntax. As Mike commented Saxon will give
different results with different parsers, as the XSL engine will have
effectively been handed different ordered trees. (In fact its probably
the XSL code building the tree from sax events rather than really being
passed an XPath tree model, but that is an implementation detail)
Of course there need never have been any linear input at all. MSXML
for example gets its input from a DOM object, that may have been
manipulated in memory with attributes added/changed/deleted etc.
Again the end result has to be ordered somehow before XSL starts work
but it isn't specified how that ordering is done. Once MSXSL gets that
tree though, it is fixed for the duration of the stylesheet. |