1. | tokenize over multiple elements | ||||
how about: current-group()[position() > 1]/tokenize(. , '\s*;\s*' ) | |||||
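A minimal sketch of the same idea outside a grouping context, assuming a hypothetical input in which each item element holds a semicolon-separated list. In XPath 2.0 the last step of a path may return atomic values, so tokenize() is applied once per selected node and the results are concatenated into one flat sequence:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Hypothetical input: <list><item>a; b</item><item>c ;d</item></list> -->
  <xsl:template match="/list">
    <!-- tokenize() runs once per item; the tokens form a single sequence -->
    <xsl:value-of select="item/tokenize(., '\s*;\s*')" separator="|"/>
  </xsl:template>
</xsl:stylesheet>
```

For the input shown in the comment this should write a|b|c|d.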
2. | Spurious spaces in function output | ||||
Demonstrated using this example. <xsl:stylesheet version="2.0" xmlns:d="data:,dpc" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" exclude-result-prefixes="d"> <xsl:function name="d:chars"> [<xsl:value-of select="'x'"/>] </xsl:function> <xsl:template match="/"> <xsl:value-of select="d:chars()"/> [<xsl:value-of select="'x'"/>] </xsl:template> </xsl:stylesheet> Running the above stylesheet on itself produces: bash-2.05b$ saxon8 bugchar.xsl bugchar.xsl Provides the following output <?xml version="1.0" encoding="UTF-8"?> [ x ] [x] Note the d:chars function produces [ x ] with spaces around the x. MK replies This isn't in fact a bug, it's just a surprising consequence of the current language specification. If a function produces as its result a sequence of text nodes, and this sequence is then displayed using xsl:value-of, the xsl:value-of instruction atomizes the sequence of text nodes into a sequence of strings, and the strings are then space-separated. the explanation of this effect is: The function returns a sequence of three text nodes, whose contents are '#[', 'x', and ']#' where # represents a newline. The template uses xsl:value-of with a select expression that selects this sequence of text nodes. By default, xsl:value-of atomizes the value of the select expression, and then inserts spaces between adjacent items. Atomization produces a sequence of three strings, and after adding spaces the result is a text node containing '#[_x_]#' where # represents newline and _ represents space. Using xsl:copy-of or xsl:sequence in place of xsl:value-of would solve the problem (because when three text nodes are added to a document node, they are combined without any separator); alternatively use xsl:value-of separator="". You can force the text nodes to be concatenated by writing the function as: <xsl:function name="d:chars" as="xs:string"> [<xsl:value-of select="'x'"/>] </xsl:function> Alternatively, use separator="" on the xsl:value-of instruction. | |||||
3. | xsl:function, user defined functions | ||||
I suspect that's because <xsl:function> was only introduced in XSLT 2.0, which isn't even a Last Call Working Draft yet and has very few implementations. <xsl:function> works in roughly the same way as <func:function> as defined in EXSLT (http://www.exslt.org/func/elements/function). You can find lots of examples of <func:function> on the EXSLT site -- most of the functions defined there have a <func:function> implementation. An example is the following fairly useless function that adds two things together: <xsl:function name="my:add"> <xsl:param name="val1" /> <xsl:param name="val2" /> <xsl:result select="$val1 + $val2" /> </xsl:function> All functions you define with <xsl:function> have to be in some namespace, which means that their names are always qualified. In this example, you have to have the 'my' prefix associated with a namespace at the top of your stylesheet. You can call the function with, for example: <xsl:value-of select="my:add(1, 3)" /> to get the value 4. If you want, you can constrain the types of the parameters to the function and declare the type of the result using 'as' attributes. This will enable/force the implementation to raise type errors if the function is passed the wrong type of arguments or used somewhere that expects something other than a number. For example, to create a my:add() function that will only work with integers: <xsl:function name="my:add"> <xsl:param name="val1" as="xs:integer" /> <xsl:param name="val2" as="xs:integer" /> <xsl:result select="$val1 + $val2" as="xs:integer" /> </xsl:function> Note again that the 'xs' prefix has to be associated with the 'http://www.w3.org/2001/XMLSchema' namespace at the top of your stylesheet. 
If you're after concrete examples of user-defined functions in use, I used quite a few in some stylesheets I wrote over the weekend, which are available at: http://www.lmnl.org/projects/LMNLCreator/LMNLCreator.xsl http://www.lmnl.org/projects/LMNLSchema/LMNLNester.xsl The stylesheets are not run-of-the-mill, but they do use XSLT 2.0 features, including <xsl:function>, quite heavily. | |||||
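The answer above was written against an early working draft. In the XSLT 2.0 Recommendation there is no xsl:result element: the body of the function supplies the result (typically via xsl:sequence) and the result type is declared with an 'as' attribute on xsl:function itself. A sketch of my:add() in that form, where the namespace URI bound to 'my' is an arbitrary placeholder:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">
  <xsl:output method="text"/>

  <!-- The result type is declared on xsl:function, not on the result -->
  <xsl:function name="my:add" as="xs:integer">
    <xsl:param name="val1" as="xs:integer"/>
    <xsl:param name="val2" as="xs:integer"/>
    <xsl:sequence select="$val1 + $val2"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- Writes 4 -->
    <xsl:value-of select="my:add(1, 3)"/>
  </xsl:template>
</xsl:stylesheet>
```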
4. | Sorting | ||||
XSLT 2.0 also introduces a sort() function that takes a named sort key as a parameter, which can be determined at run-time: <xsl:for-each select="sort($x, if (condition1) then 'sortkey1' else 'sortkey2')"> | |||||
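As far as I can tell, the sort() function with named sort keys belongs to a working draft and did not survive into the final XSLT 2.0 Recommendation. A sketch of one way to get a run-time choice of sort key without it is a conditional expression inside xsl:sort. The parameter name sort-by and the input vocabulary are assumptions made for illustration:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Hypothetical stylesheet parameter choosing the sort key at run time -->
  <xsl:param name="sort-by" select="'name'"/>
  <!-- Hypothetical input: <items><item name="..." code="..."/>...</items> -->
  <xsl:template match="/items">
    <xsl:for-each select="item">
      <!-- Both branches return strings, so the sort keys stay comparable -->
      <xsl:sort select="if ($sort-by = 'code') then string(@code) else string(@name)"/>
      <xsl:value-of select="@name, @code"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```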
5. | Including CSS files | ||||
You can use the | |||||
6. | calculate depth of an xml-tree | ||||
<xsl:for-each select="//*"> <xsl:sort select="count(ancestor-or-self::*)" data-type="number"/> <xsl:if test="position()=last()"> <xsl:value-of select="count(ancestor-or-self::*)"/> </xsl:if> </xsl:for-each> Or in 2.0: max(for $n in //* return count($n/ancestor-or-self::*)) You can possibly speed it up a little by excluding non-leaf elements, or by using a recursive template that supplies the depth of a node as a parameter, avoiding the need to count ancestors of every node. | |||||
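The idea of avoiding a count of the ancestors of every node can be sketched as follows. Note that this uses a recursive stylesheet function rather than the recursive template mentioned above; my:depth and its namespace are hypothetical names. Each element is visited once and reports the depth of its own subtree:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs my">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: depth of the subtree rooted at element $e -->
  <xsl:function name="my:depth" as="xs:integer">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence select="
      if ($e/*) then 1 + max(for $c in $e/* return my:depth($c))
      else 1"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- * is the document element when the context is the document node -->
    <xsl:value-of select="my:depth(*)"/>
  </xsl:template>
</xsl:stylesheet>
```

For an input such as <a><b><c/></b><d/></a> this gives 3, the same value as the max() expression above.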
7. | Index-of, using nodes | ||||
You can write it yourself as: <xsl:function name="index-of-node" as="xs:integer*"> <xsl:param name="node-set" as="node()*"/> <xsl:param name="node" as="node()"/> <xsl:sequence select=" for $i in 1 to count($node-set) return if ($node-set[$i] is $node) then $i else ()"/> </xsl:function> | |||||
8. | Multiple string replacements | ||||
If you need 8 replacements then nested invocations of replace should do it: replace(replace(replace(....) ... '\{', '{day}{of}') This is still a little bit messy, and of course you don't need to explicitly nest the replace functions; you can get the system to do it for you. This defines an x:replace function that takes an input string and then a list of replacement pairs; it just recursively calls replace() until the list is done. <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:x="data:,x" version="2.0"> <xsl:output method="text"/> <xsl:function name="x:replace"> <xsl:param name="string"/> <xsl:param name="list"/> <xsl:value-of select=" if(empty($list)) then $string else x:replace(replace($string,$list[1],$list[2]),$list[position()>2])"/> </xsl:function> <xsl:template match="/"> <xsl:value-of select=" x:replace('one two three four', ('o', '@', 'tw','TW', 'e', '3')) "/> </xsl:template> </xsl:stylesheet> $ saxon7 rep.xsl rep.xsl @n3 TW@ thr33 f@ur | |||||
9. | Insert a character every nth character | ||||
E.g. insert a space every tenth character, or as per this example, insert a comma every 4th character. gpSz is the group size, s is the source string to be split. Mike gave an untested solution, which I show here after testing it. The upper bound of the range uses $gpSz rather than a literal 4, and subtracts 1 from the length so that a string whose length is an exact multiple of the group size does not end with a trailing separator. <xsl:variable name="s" select="'A long string with commas inserted every 4th character'"/> <xsl:variable name="gpSz" select="4"/> <xsl:value-of select=" string-join( for $i in 0 to ((string-length($s) - 1) idiv $gpSz) return substring($s, $i*$gpSz + 1, $gpSz), ',')"/> | |||||
10. | Counting characters | ||||
Well the XPath 2.0 solution is sum(for $i in preceding-sibling::text() return string-length($i)) For XSLT 1.0 it's much more difficult, it's the classic problem of summing a calculated value over a node-set. There are several workable solutions:
| |||||
11. | Match an element and last two words of preceding element content | ||||
In XSLT 2.0 (actually XPath 2.0) there is a standard function tokenize() which does the job for you. You can then select the last two words using <xsl:value-of select="tokenize(preceding-sibling::node()[1][self::text()], '\s+')[position() > last() - 2]"/> (note that the regex must be delimited with single quotes, because the select attribute itself is delimited with double quotes). | |||||
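A self-contained sketch of the "last two tokens" predicate, using a literal string in place of the preceding text node so that it can be run against any input document:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- last() is the number of tokens, so this keeps the final two -->
    <xsl:value-of
        select="tokenize('the quick brown fox', '\s+')[position() > last() - 2]"/>
  </xsl:template>
</xsl:stylesheet>
```

This writes brown fox; xsl:value-of joins the two selected strings with its default single-space separator.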
12. | Find Word frequency [tokenise] | ||||
Mike Kay offers: Firstly, taking the string value of the element gets rid of all the element markup, which doesn't seem to play any role in this problem. Then you can tokenize using the tokenize() function, being as clever as you care about how to recognize word boundaries and inter-word space. Then you can convert everything to lower case using the lower-case() function. Then you can group using for-each-group. Sorted by descending frequency: <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output indent="yes"/> <xsl:template match="/"> <frequencies> <xsl:for-each-group group-by="." select=" for $w in tokenize(string(.), '[\s.?!,]+')[.] return lower-case($w)"> <xsl:sort select="count(current-group())" order="descending"/> <word><xsl:value-of select="current-grouping-key(), ' - ', count(current-group())"/></word> </xsl:for-each-group> </frequencies> </xsl:template> </xsl:stylesheet> (The predicate [.] eliminates the zero-length string.) Here's the start of the output for othello.xml: <?xml version="1.0" encoding="UTF-8"?> <frequencies> <word>i - 816</word> <word>and - 794</word> <word>the - 762</word> <word>to - 591</word> <word>of - 476</word> David Carlisle offers another XSLT 2.0 solution: <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"> <xsl:output method="text"/> <xsl:template match="/"> <xsl:for-each-group select="tokenize(lower-case(.),'(\s|[,.!:;])+')[string(.)]" group-by="."> <xsl:sort select="- count(current-group())"/> <xsl:value-of select="concat(.,' - ',string(count(current-group())),' ')"/> </xsl:for-each-group> </xsl:template> </xsl:stylesheet> | |||||
13. | Configuration file as unparsed-text() | ||||
This sort of conversion can be done very nicely in XSLT 2.0: I showed a very similar example in my tutorial at XML Europe.
| |||||
14. | index-of function | ||||
You have misunderstood the spec. If the input is a sequence of two strings, ("Hello", "John"), then index-of will return 2. To convert the string "Hello John" into a sequence of two strings, use the tokenize() function. | |||||
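A short sketch of the two steps described above, tokenizing the string first and then asking for the position of one of the resulting items:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- tokenize() yields ('Hello', 'John'); 'John' is item 2 -->
    <xsl:value-of select="index-of(tokenize('Hello John', '\s+'), 'John')"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied directly to the single string 'Hello John', index-of would return the empty sequence, because no item in that one-item sequence is equal to 'John'.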
15. | sequence() | ||||
Appendix C.3 of the Functions and Operators spec shows how to write this as a user-written function: XSLT implementation <xsl:function name="eg:index-of-node" as="xs:integer*"> <xsl:param name="sequence" as="node()*"/> <xsl:param name="srch" as="node()"/> <xsl:for-each select="$sequence"> <xsl:if test=". is $srch"> <xsl:sequence select="position()"/> </xsl:if> </xsl:for-each> </xsl:function> | |||||
16. | Escaping quotes | ||||
XPath 2.0 allows you to escape the delimiting quotes by doubling them, for example "He said: ""I don't""" You can achieve this escaping using the XPath 2.0 replace() function. | |||||
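A sketch combining both points: the variable is built with a single-quoted literal in which the apostrophe is escaped by doubling, and replace() then doubles every double quote in the value. Inside the select attribute the double quote has to be written as &quot; because the attribute is itself delimited by double quotes:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- String value: He said: "I don't" -->
    <xsl:variable name="s" select="'He said: &quot;I don''t&quot;'"/>
    <!-- Replace each " with "" -->
    <xsl:value-of select="replace($s, '&quot;', '&quot;&quot;')"/>
  </xsl:template>
</xsl:stylesheet>
```

The output is He said: ""I don't"", which can then be wrapped in double quotes to form a valid XPath 2.0 string literal.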
17. | Fast node comparison | ||||
> If there are many ranges and you need it to go at better than linear > speed, you could code a binary-chop. I think Dimitre has done this in > the past, I don't know if it's available in packaged form. Here are two XSLT 2.0 solutions: a DVC (Divide and Conquer) and BS (Binary Search): <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:f="http://fxsl.sf.net/" xmlns:t="http://fxsl.sf.net/test" exclude-result-prefixes="f xs t" > <xsl:output method="text"/> <xsl:variable name="vRanges" as="element()+"> <range from="988" to="989"/> <range from="1008" to="1009"/> <range from="1014" to="1014"/> <range from="1025" to="1036"/> <range from="1038" to="1103"/> <range from="1105" to="1116"/> <range from="1118" to="1119"/> <range from="4150" to="4150"/> <range from="8194" to="8197"/> </xsl:variable> <xsl:template match="/"> <xsl:value-of select="t:inRangeDVC($vRanges, 8195)"/>, <xsl:text/> <xsl:value-of select="t:inRangeBS($vRanges, 8195, 1, count($vRanges))"/> </xsl:template> <xsl:function name="t:inRangeDVC" as="xs:boolean"> <xsl:param name="pRanges" as="element()*"/> <xsl:param name="pVal"/> <xsl:sequence select= "if(empty($pRanges)) then false() else for $cnt in count($pRanges) return if($cnt = 1) then $pVal ge xs:integer($pRanges[1]/@from) and $pVal le xs:integer($pRanges[1]/@to) else for $vHalf in $cnt idiv 2 return if(t:inRangeDVC($pRanges[position() le $vHalf], $pVal)) then true() else t:inRangeDVC($pRanges[position() gt $vHalf], $pVal) " /> </xsl:function> <xsl:function name="t:inRangeBS" as="xs:boolean"> <xsl:param name="pRanges" as="element()*"/> <xsl:param name="pVal"/> <xsl:param name="pLow" as="xs:integer"/> <xsl:param name="pUp" as="xs:integer"/> <xsl:sequence select= "if($pLow gt $pUp) then false() else for $mid in ($pLow + $pUp) idiv 2, $v in $pRanges[$mid] return if($pVal ge xs:integer($v/@from) and $pVal le xs:integer($v/@to)) then true() else if($pVal lt xs:integer($v/@from)) then 
t:inRangeBS($pRanges, $pVal, $pLow, $mid - 1) else t:inRangeBS($pRanges, $pVal, $mid+1, $pUp) "/> </xsl:function> </xsl:stylesheet> | |||||
18. | URI Escaping | ||||
Non-Ascii characters in a URI should be escaped using the %HH convention, rather than using XML escaping. XSLT 2.0 provides a function escape-uri() to achieve this. In 1.0, it happens automatically when you use the HTML serialization method if the URI appears in an attribute such as <a href="..."> that is known to require a URI as its value. | |||||
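The name escape-uri() comes from the working drafts. In the final XPath 2.0 function library the job is split between encode-for-uri(), iri-to-uri() and escape-html-uri(). A sketch using encode-for-uri() on a single query-string value; the URL is a placeholder:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- &#233; is e-acute; it is escaped as its UTF-8 octets C3 A9 -->
    <xsl:value-of select="concat('http://example.com/search?q=',
                                 encode-for-uri('caf&#233; au lait'))"/>
  </xsl:template>
</xsl:stylesheet>
```

This writes http://example.com/search?q=caf%C3%A9%20au%20lait. encode-for-uri() is meant for individual path segments or parameter values, since it also escapes characters such as / and ?; iri-to-uri() is the one to use on a complete URI.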
19. | Data types in functions | ||||
Firstly, if you're expecting a numerical parameter it's best to say so: <xsl:param name="i" as="xs:double"/> and if you want to return a numerical result it's best to say so: <xsl:function name="my:increase" as="xs:double"> This might be enough to fix the problem (because it will force certain type conversions), and if it doesn't, it will give you error messages that point you closer to the answer. Secondly, xsl:value-of creates a text node. You don't want a text node here, you want a number. So use xsl:sequence: <xsl:function name="my:increase" as="xs:double"> <xsl:param name="i" as="xs:double"/> <xsl:sequence select="round(1.2 * $i)" /> </xsl:function> In most contexts, if you expect a number and provide an untyped text node, the number will be extracted from the text node. But it's better to return the number in the first place. | |||||
20. | Document Crossreferences. Keys? | ||||
Keys are usually recommended for performance, but when you're handling cross-references they can also make your code simpler and more understandable. <xsl:key name="man-by-name" match="Man" use="@name"/> <xsl:key name="woman-by-name" match="Woman" use="@name"/> <xsl:key name="man-by-wifes-name" match="Man" use="@wife"/> <xsl:template match="Woman"> <xsl:apply-templates select="key('man-by-wifes-name', @name)"/> In 2.0 I often write stylesheet functions to encapsulate a relationship: <xsl:function name="get-husband" as="element(Man)"> <xsl:param name="wife" as="element(Woman)"/> <xsl:sequence select="key('man-by-wifes-name', $wife/@name)"/> </xsl:function> You can then use this in path expressions rather like a virtual axis: <xsl:template match="Woman"> <xsl:value-of select="get-husband(.)/get-children(.)/@date-of-birth"/> (In a real stylesheet the function names must carry a namespace prefix, as with any xsl:function.) | |||||
21. | document-available() | ||||
> Does doc-available() do anything more than check for doc-available() checks that the file exists and that it contains well-formed XML (and valid XML if you are validating). It actually builds the tree in memory. When you subsequently call document() or doc(), this work won't be repeated. If you just want to check file existence, calling out to a java method is going to be rather cheaper. | |||||
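A sketch of the usual guard pattern, assuming a hypothetical parameter uri that names the file to load. A relative URI passed to doc() or doc-available() is resolved against the base URI of the stylesheet:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical parameter: the document to load if it can be read -->
  <xsl:param name="uri" select="'config.xml'"/>
  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="doc-available($uri)">
        <!-- The tree was already built by the test; doc() reuses it -->
        <xsl:copy-of select="doc($uri)/*"/>
      </xsl:when>
      <xsl:otherwise>
        <missing uri="{$uri}"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```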
22. | Find longest row (max function) | ||||
If you are generating your XSL-FO using XSLT 1.0 then the usual way is to select all <tr>'s and sort them by the count of their <td>'s, and then pick the first: see faq If you are using XSLT 2.0 then you can use the max() function, eg: <xsl:variable name="maxCells" select="max(//tr/count(td))"/> | |||||
23. | Hex to decimal conversion | ||||
The hex-to-decimal conversion: here's a function I wrote to do this: <xsl:function name="f:hex-to-char" as="xs:integer"> <xsl:param name="in"/> <!-- e.g. 030C --> <xsl:sequence select=" if (string-length($in) eq 1) then f:hex-digit-to-integer($in) else 16*f:hex-to-char(substring($in, 1, string-length($in)-1)) + f:hex-digit-to-integer(substring($in, string-length($in)))"/> </xsl:function> <xsl:function name="f:hex-digit-to-integer" as="xs:integer"> <xsl:param name="char"/> <xsl:sequence select="string-length(substring-before('0123456789ABCDEF', $char))"/> </xsl:function> | |||||
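A usage sketch that wraps the two functions above in a complete stylesheet. The namespace URI bound to f is a placeholder, and the call to upper-case() is an addition: the digit lookup only knows upper-case letters, so a lower-case digit would silently count as 0.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/f"
    exclude-result-prefixes="xs f">
  <xsl:output method="text"/>

  <xsl:function name="f:hex-to-char" as="xs:integer">
    <xsl:param name="in"/>
    <xsl:sequence select="
      if (string-length($in) eq 1)
      then f:hex-digit-to-integer($in)
      else 16 * f:hex-to-char(substring($in, 1, string-length($in) - 1))
           + f:hex-digit-to-integer(substring($in, string-length($in)))"/>
  </xsl:function>

  <xsl:function name="f:hex-digit-to-integer" as="xs:integer">
    <xsl:param name="char"/>
    <!-- The digit's value is the number of characters that precede it -->
    <xsl:sequence select="string-length(substring-before('0123456789ABCDEF', $char))"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- 030C hex = 3 * 256 + 12 = 780 -->
    <xsl:value-of select="f:hex-to-char(upper-case('030c'))"/>
  </xsl:template>
</xsl:stylesheet>
```

The integer can be turned into the character itself with codepoints-to-string().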
24. | intersect function | ||||
When you want to know if a node is in two different sets. For example, suppose you have a key that returns some nodes, key('x','a'), and some more nodes, key('x','b'); now, which nodes are returned by both a and b? You can do this in XSLT 1 as key('x','a')[count(key('x','b'))=count(.|key('x','b'))] but it's rather more readable to say key('x','a') intersect key('x','b') > (if automatic node-to-value were applied for intersect). That would get very confusing, especially for text nodes (which many people use interchangeably with strings). You want it to be clear in the syntax whether you are doing identity-equality (so two nodes are only equal if they are the same node) or value-equality, where two items are equal if they have the same string value. | |||||
25. | How to use generate-id() inside an xsl:function without a node available? | ||||
Have an auxiliary function, which creates a new node every time it is evaluated, for example using: <xsl:function name="pref:GetNode" as="element()"> <xsl:variable name="myNode" as="element()"> <someNode/> </xsl:variable> <xsl:copy-of select="$myNode"/> </xsl:function> Then in your code use: generate-id(pref:GetNode()) It may even be possible to only use one node (and then immediately delete it as part of the closing of the scope of the function). Looking at the spec, I am not sure, however, whether re-using the generated ID for a node (which is no longer alive) is allowed or not. If it is not allowed, then we have the following *cheap* implementation: <xsl:function name="pref:myId" as="xs:string"> <xsl:variable name="myNode" as="element()"> <someNode/> </xsl:variable> <xsl:variable name="vdynNode" as="element()"> <xsl:copy-of select="$myNode"/> </xsl:variable> <xsl:sequence select="generate-id($vdynNode)"/> </xsl:function> DC offers: surely you can lose the first variable and write that as <xsl:function name="pref:myId" as="xs:string"> <xsl:variable name="myNode">x</xsl:variable> <xsl:sequence select="generate-id($myNode)"/> </xsl:function> MK comes back with: Yes, this is fine. But please get out of the habit of using xsl:value-of when you mean xsl:sequence, and please get into the habit of declaring the types of your functions! This should be <xsl:function name="pref:getId" as="xs:string"> <xsl:sequence select="generate-id(pref:getNode())" /> </xsl:function> xsl:value-of is creating a text node, and because of the very identity issues we're discussing, it's very hard to optimize this away: if the return type is declared as xs:string the processor has some chance of recognizing that the text node is going to be atomized as soon as it's created, but really it's better not to create it in the first place. 
The cheapest solution is probably a text or comment node rather than an element, something like: <xsl:function name="pref:getNode"><xsl:comment/></xsl:function> <xsl:function name="pref:getId"> <xsl:value-of select="generate-id(pref:getNode())" /> </xsl:function> Remember that an LRE like <node/> might be creating a lot of namespaces... | |||||
26. | Understanding position() | ||||
It's a classic, and every now and then, I do it wrongly still (check the archives, I have once or twice asked about the same question). The function position() (you don't need to put "fn:" in front of it) returns the position of the context node. The context changes within the XPath expression to each node it is evaluating. Thus, the expression ('a', 'b')[position()] a) will test the first item in the sequence 'a' for the predicate [position()] which evaluates to the predicate [1], which is short for [position() = 1], which will return true because the current position is 1. b) will test the second item in the sequence 'b' for the predicate [position()] which evaluates to the predicate [2], which is short for [position() = 2], which will return true because the current position is 2 (we are at the second item, 'b', remember). As it comes, the following: some-node[position()] will always evaluate to true, and (some-sequence)[position()] will always return the whole sequence, because each separate item will always have the position in the sequence that equals the result of the predicate [position()]. What you want is the following: <xsl:variable name="pos" select="position()" /> <xsl:sequence select="('a', 'b')[$pos]" /> | |||||
27. | Count all caution elements | ||||
In XPath 2.0, count(preceding::caution intersect ancestor::chapter//caution) In 1.0, you can simulate the intersect operator using the equivalence A intersect B ==> A[count(.|B) = count(B)] But you might be better off using <xsl:number count="caution" level="any" from="chapter"/> | |||||
28. | Document available? | ||||
Use doc-available() rather than fn:doc-available() (earlier drafts of XSLT 2 used different namespaces, but all drafts define the default function namespace to be the correct namespace for that draft, so if your implementation implements an old draft of XSLT 2 it will still work). IE (and other browsers such as Mozilla and Opera) do not support XSLT 2 (and are unlikely to support it for some years, one would assume). To check if a file exists in XSLT 1 you can use test="document('foo.xml')", which will be false if the processor returns an empty node set for missing files (processors are also allowed to raise an error), or you can escape to an extension language (in IE but not in Mozilla). MSXML, for example, allows you to define functions in JavaScript, so assuming that you are in a situation where browser security allows access to the filesystem at all, you can test in JavaScript (or any other MS scripting language). | |||||
29. | for-each context | ||||
By using tokenize inside a for-each, you've set the context to a string that has no relationship to your input document. To fix it, use a variable that contains the root of the document (I call those "anchor variables"), thus: <xsl:template match="src"> <xsl:variable name="root" select="/"/> <xsl:for-each select="tokenize( 'a c', '[ ]')"> <xsl:apply-templates select="$root/doc/*[current() eq string(@id)]" /> </xsl:for-each> </xsl:template> That way, your apply-templates instruction finds the context you need. | |||||
30. | Function with variable number of arguments | ||||
Andrew offers: Why not just have the function take one argument that is a sequence? DC follows up with: Also, without using any extension at all, you can make a function that takes a sequence (like string-join) rather than an arbitrary number of arguments (like concat). The only practical difference to the end user is that you have to double the brackets: define local:function to take xs:integer* as its argument, and you can do local:function((1,2)) local:function((1,2,3,4,5,6)) Dimitre expands this with: Another approach, limited but pragmatic, is the one taken by FXSL: use many overloads of the function, one for each allowed number of arguments. Thus, if I would expect no more than, say, 10 arguments to be specified, I would define the following overloads: x:fnName(arg1), x:fnName(arg1,arg2), x:fnName(arg1,arg2,arg3), x:fnName(arg1,arg2,arg3,arg4), x:fnName(arg1,arg2,arg3,arg4,arg5), x:fnName(arg1,arg2,arg3,arg4,arg5,arg6), x:fnName(arg1,arg2,arg3,arg4,arg5,arg6,arg7), x:fnName(arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8), x:fnName(arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9), x:fnName(arg1,arg2,arg3,arg4,arg5,arg6,arg7,arg8,arg9,arg10) Certainly, the common code implementing these overloads can typically be put into a single auxiliary function, so that redundancy is avoided. Also, the above overloads can be generated programmatically with another transformation :) | |||||
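A sketch of the sequence-argument approach. The function name local:total and the namespace URI are hypothetical; the point is the xs:integer* parameter type and the doubled brackets at the call site:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:local="http://example.com/local"
    exclude-result-prefixes="xs local">
  <xsl:output method="text"/>

  <!-- One parameter that accepts any number of integers, including none -->
  <xsl:function name="local:total" as="xs:integer">
    <xsl:param name="values" as="xs:integer*"/>
    <xsl:sequence select="sum($values)"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- The inner brackets build the sequence; the outer ones are the call -->
    <xsl:value-of select="local:total((1,2)), local:total((1,2,3,4,5,6))"/>
  </xsl:template>
</xsl:stylesheet>
```

This writes 3 21. Without the inner brackets the call would be read as a two-argument or six-argument function call and fail, since no such overload is declared.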
31. | Check for duplicate ID values across files | ||||
<xsl:for-each-group select="collection(...)//@id" group-by="."> <xsl:if test="count(current-group()) ne 1"> <xsl:message>Id value <xsl:value-of select="current-grouping-key()"/> is duplicated in files <xsl:value-of select="current-group()/document-uri(/)" separator=" and "/></xsl:message> </xsl:if> </xsl:for-each-group> | |||||
32. | Intersection 2.0 | ||||
if $a and $b are two node sets then $a[count(.|$b)=count($b)] is the intersection of $a and $b, that is $a intersect $b in xpath2. |
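A sketch that evaluates both forms side by side on a small temporary tree, so the equivalence can be checked by eye. The element name n and its id attributes are made up for the example:

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Four sibling elements to select from -->
  <xsl:variable name="doc">
    <n id="1"/><n id="2"/><n id="3"/><n id="4"/>
  </xsl:variable>

  <xsl:template match="/">
    <xsl:variable name="a" select="$doc/n[position() le 3]"/>
    <xsl:variable name="b" select="$doc/n[position() ge 2]"/>
    <!-- XPath 2.0 operator -->
    <xsl:value-of select="($a intersect $b)/@id"/>
    <xsl:text>&#10;</xsl:text>
    <!-- XPath 1.0 idiom: a node is in $b if adding it to $b changes nothing -->
    <xsl:value-of select="$a[count(.|$b) = count($b)]/@id"/>
  </xsl:template>
</xsl:stylesheet>
```

Both lines of output read 2 3.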