xslt sort unique

1. How to sort the unique elements

From

    <A>
      <D>
        <C/>
        <A>
          <B/>
        </A>
      </D>
      <B/>
    </A>

I want an output of ABCD.

Applying the Muenchian technique, I get the following stylesheet:

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:key name="first-id" match="*"
               use="generate-id((preceding::* | ancestor::*)[name() = name(current())])"/>

      <xsl:template match="/">
        <xsl:apply-templates select="key('first-id', '')">
          <xsl:sort select="name()"/>
        </xsl:apply-templates>
      </xsl:template>

      <xsl:template match="*">
        <xsl:value-of select="name()"/>
      </xsl:template>

    </xsl:stylesheet>

The key indexes each element under the generated id of the first earlier element (preceding or ancestor) with the same name. An element with no such predecessor is indexed under the empty string, so key('first-id', '') returns exactly the first occurrence of each element name.

2. Sort on more than one element

Put as many <xsl:sort> elements as you need:

    <xsl:sort select="col1"/>
    <xsl:sort select="col2"/>
    ...

where the first <xsl:sort> element specifies the primary sort key, the second specifies the secondary sort key, and so on. When apply-templates or for-each builds its node list, it sorts according to the sort keys; if two or more nodes compare equal on every sort key, they are returned in document order.

Steve Muench adds:

List each sort key in its own <xsl:sort> element. The first one that appears in document order is the "primary" sort, the second one that appears is the "secondary" sort, etc.

    <xsl:for-each select="customer-list/customer">
      <!-- Sort (alphabetically) on customer @name attr -->
      <xsl:sort select="@name"/>
      <!-- Sort (numerically, descending) on sum of their orders -->
      <xsl:sort select="sum(orders/order/total)"
                data-type="number" order="descending"/>
      <!-- etc. -->
    </xsl:for-each>

Jeni T adds:

The advantage of this syntax over a comma-separated list is that you can attach different properties to the two sorts, such as the order in which the list is sorted by these cols, or whether the cols are treated as text or numbers:

    <xsl:sort select="col1" order="ascending" data-type="text" />
    <xsl:sort select="col2" order="descending" data-type="number" />

You can add as many xsl:sorts as you want within an xsl:for-each or an xsl:apply-templates.

3. Sorting

Q: Expansion.

> I'm a bit confused by the interaction of xsl:sort and the various
> axes. I suppose basically my question is: does xsl:sort affect the
> ordering of nodes for the purpose of reference within the stylesheet,
> or just for the purpose of the output?

xsl:sort affects the order in which the nodes are processed. It does not affect the position of the nodes on any axis, such as the following-sibling axis.

4. How to find out if a node is the first of its kind after a sort

    <xsl:template match="test">
      <xsl:for-each select="a">
        <xsl:sort select="."/>
        [<xsl:number value="position()"/>: <xsl:value-of select="."/>]
        <xsl:if test="position()=1">This is First</xsl:if>
      </xsl:for-each>
    </xsl:template>

produces

    [1: a] This is First
    [2: e]
    [3: f]
    [4: g]
    [5: x]
    [6: z]

from

    <test>
      <a>e</a>
      <a>x</a>
      <a>f</a>
      <a>a</a>
      <a>g</a>
      <a>z</a>
    </test>

5. Sort Order vs Document Order

Even when a node list is in sorted order (so the values returned by position() reflect sorted order), axis specifiers like preceding-sibling still refer to _document_ order.
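A minimal sketch of the difference, assuming a hypothetical input of `<list><item>c</item><item>a</item><item>b</item></list>` (the element names are illustrative and not taken from the original question):

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumed input:
       <list><item>c</item><item>a</item><item>b</item></list> -->
  <xsl:template match="list">
    <xsl:for-each select="item">
      <xsl:sort select="."/>
      <!-- position() counts within the sorted list -->
      <xsl:value-of select="position()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="."/>
      <!-- the axis still walks the source tree in document order -->
      <xsl:text> (preceding siblings in the source: </xsl:text>
      <xsl:value-of select="count(preceding-sibling::item)"/>
      <xsl:text>)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With that input, "a" is processed first (position 1) yet reports one preceding sibling, and "c" is processed last yet reports none, because the sort changes only the processing order.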

6. First occurrence of many

Q expansion: I'm trying to select all <term> elements in a document (several of which may have the same content) for which the element is the first containing its content.

You can try this:

    <xsl:for-each select="term[not(preceding::term=.)]">
      <xsl:value-of select="."/>
    </xsl:for-each>

Mike Brown adds:

Hopefully this will work. The questioner wanted a comparison of the "content" of term elements, which is more difficult to test than string values because descendant nodes would be considered "content". If each <term> contains only text, it will be fine.

7. How to select only the unique elements from an xml document

    <xsl:for-each select="//CUSTOMER[not(.=preceding::CUSTOMER)]">
      <xsl:value-of select="."/>
    </xsl:for-each>

8. How to sort by attribute

For the source file

    <sender>
      <a clli="200" b="20"/>
      <a clli="100" b="10"/>
    </sender>

the output needed is

    <wrapper>
      <a> @clli</a>
      <b> @b </b>
    </wrapper>

Stylesheet:

    <wrapper xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
             xsl:version="1.0">
      <xsl:for-each select="sender/a">
        <xsl:sort select="@clli"/>
        <a><xsl:value-of select="@clli"/></a>
        <b><xsl:value-of select="@b"/></b>
      </xsl:for-each>
    </wrapper>

This produces:

    <wrapper>
      <a>100</a>
      <b>10</b>
      <a>200</a>
      <b>20</b>
    </wrapper>

9. sort count number and group

Example xml:

    <root>
      <foo>
        <bar>bard</bar>
        <bar>bark</bar>
      </foo>
      <foo>
        <bar>bark</bar>
        <bar>barb</bar>
      </foo>
    </root>

Sample xsl that selects distinct <bar>:

    <xsl:template match="//bar[not(. = following::bar)]">
      <xsl:value-of select="."/>
    </xsl:template>

produces:

    bard
    bark
    barb

What I want is to number these, sort them, and count the number of times they appear in the xml source.

Desired output:

    1. barb -1
    2. bard -1
    3. bark -2

I can't seem to get there from here. Do I need to use for-each?

Solution:

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:template match="/">
        <DL>
          <xsl:apply-templates select="//bar[not(. = preceding::bar)]">
            <!-- the selected nodes are the bar elements themselves,
                 so the sort key is their own string value -->
            <xsl:sort select="."/>
          </xsl:apply-templates>
        </DL>
      </xsl:template>

      <xsl:template match="bar">
        <DT>
          <!-- position() is the position in the sorted list -->
          <xsl:number value="position()" format="1."/>
          <xsl:value-of select="."/>-
          <xsl:value-of select="count(//bar[.=current()])"/>
        </DT>
      </xsl:template>

    </xsl:stylesheet>

10. Sorting text-number strings

> Is there a way to achieve a sort with results?:
> title 1, title 2, title 3,..., title 9, title 10, title 11

XSL isn't especially good (actually it's normally hopeless) at inferring structure from character data, so it would have had a much easier time if the characters and numbers had been separated in the input,

    <title name="this string" number="42"/>

or some such.

I'll give an example that sorts strings of the form "abc 123" (i.e. characters, space, numbers), first on the first word, then numerically on the digits.

    <xsl:for-each select="whatever">
      <xsl:sort data-type="text" select="substring-before(.,' ')"/>
      <xsl:sort data-type="number" select="substring-after(.,' ')"/>
      ...
    </xsl:for-each>

Mike Kay adds:

Saxon allows you to supply a user-defined collator implemented as a Java class.
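When the separating space cannot be relied on (for instance "title2" next to "title 10"), the digits can be isolated with a double translate(). The following is only a sketch under assumptions: the `titles` and `title` element names are hypothetical, and every digit in the string is concatenated into one number, so it suits values with a single numeric part.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumed input:
       <titles>
         <title>title 10</title>
         <title>title2</title>
         <title>title 1</title>
       </titles> -->
  <xsl:template match="titles">
    <xsl:for-each select="title">
      <!-- primary key: the text with digits and spaces removed -->
      <xsl:sort data-type="text"
                select="translate(., '0123456789 ', '')"/>
      <!-- secondary key: the digits only. The inner translate()
           yields the non-digit characters; the outer one deletes
           exactly those characters from the original string. -->
      <xsl:sort data-type="number"
                select="translate(., translate(., '0123456789', ''), '')"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input, this lists "title 1", "title2", "title 10" in that order.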

11. Checking sort order

The simplest way to check that a list of strings is in sorted order is to sort it and see if the output equals the input. It's probably possible to improve the following:

    <xsl:template name="is-sorted">
      <!-- test whether the document-order of the supplied $nodes is
           the same as the sorted order of their string-values -->
      <xsl:param name="nodes"/>
      <xsl:variable name="unsorted-nodes">
        <xsl:for-each select="$nodes">
          <xsl:value-of select="."/>
        </xsl:for-each>
      </xsl:variable>
      <xsl:variable name="sorted-nodes">
        <xsl:for-each select="$nodes">
          <xsl:sort/>
          <xsl:value-of select="."/>
        </xsl:for-each>
      </xsl:variable>
      <xsl:if test="string($sorted-nodes) != string($unsorted-nodes)">
        <xsl:message terminate="yes">Data is not correctly sorted</xsl:message>
      </xsl:if>
    </xsl:template>

E.g. to check that all qna's are sorted by topic order:

    <xsl:template match="section">
      <xsl:call-template name="is-sorted">
        <xsl:with-param name="nodes" select="qna/topic"/>
      </xsl:call-template>
      ...
    </xsl:template>

This passes all topics in this section to the named template, which will bomb out with the message if the two 'orders' are not equal.

> I'm curious why you cast them to string prior to the comparison? Is it
> not possible to compare a result tree fragment held in the two variables?

The cast to a string was there mainly for clarity, and also for robustness: the current version of MSXML doesn't follow the rules correctly when casting from a result tree fragment, though I think this example would be OK.

12. The ultimate unique sort

    <?xml version='1.0'?>
    <Tasks>
      <Task><Desc>Task1</Desc><Owner>Steve</Owner></Task>
      <Task><Desc>Task2</Desc><Owner>Mike</Owner></Task>
      <Task><Desc>Task3</Desc><Owner>Dave</Owner></Task>
      <Task><Desc>Task4</Desc><Owner>Steve</Owner></Task>
      <Task><Desc>Task5</Desc><Owner>Mike</Owner></Task>
      <Task><Desc>Task9</Desc><Owner>Mike</Owner></Task>
      <Task><Desc>Task9</Desc><Owner>Fred</Owner></Task>
      <Task><Desc>Task9</Desc><Owner>Joe</Owner></Task>
    </Tasks>

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">
      <xsl:output indent="yes"/>

      <!-- Create a key for the unique element required. With the
           given example, try Desc or Owner since both are duplicated. -->
      <xsl:key name="xxx" match="/Tasks/Task/Desc" use="."/>

      <xsl:template match="/">
        <Outer-Wrapper>
          <!-- Selects unique items -->
          <xsl:for-each select="/Tasks/Task/Desc[generate-id(.)=generate-id(key('xxx',.)[1])]">
            <xsl:sort select="."/>  <!-- Only if wanted 'sorted' -->
            <Unique-Item-List Element-Name="{.}">  <!-- Optional inner wrapper -->
              <xsl:for-each select="key('xxx',.)/..">  <!-- Unique items -->
                <xsl:comment>Present any associated data. Context is Element-name </xsl:comment>
              </xsl:for-each>
            </Unique-Item-List>  <!-- Close inner wrapper -->
          </xsl:for-each>
        </Outer-Wrapper>
      </xsl:template>
    </xsl:stylesheet>

Ken Holman adds another example.

    T:\ftemp>type tests.xml
    <?xml version="1.0"?>
    <names>
      <name><given>Julie</given><surname>Holman</surname></name>
      <name><given>Margaret</given><surname>Mahoney</surname></name>
      <name><given>Ted</given><surname>Holman</surname></name>
      <name><given>John</given><surname>Mahoney</surname></name>
      <name><given>Kathryn</given><surname>Holman</surname></name>
      <name><given>Ken</given><surname>Holman</surname></name>
    </names>

    T:\ftemp>type tests.xsl
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">
      <xsl:output method="text"/>

      <!--prepare to examine all names valued by surname-->
      <xsl:key name="surnames" match="name" use="surname"/>

      <xsl:template match="/">  <!--root rule-->
        <!--select only those name elements whose unique generated id
            is equal to the generated id of the first of the key
            members with the same surname-->
        <xsl:for-each select="//name[generate-id(.)=
                                generate-id(key('surnames',surname)[1])]">
          <xsl:value-of select="surname"/>  <!--show the grouping-->
          <xsl:text>&#10;</xsl:text>
          <!--select only those for the grouping-->
          <xsl:for-each select="//name[surname=current()/surname]">
            <xsl:sort select="given"/>  <!--sorted within grouping-->
            <xsl:text>   </xsl:text>
            <xsl:value-of select="given"/>  <!--member distinctions-->
            <xsl:text>&#10;</xsl:text>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>

    T:\ftemp>saxon tests.xml tests.xsl
    Holman
       Julie
       Kathryn
       Ken
       Ted
    Mahoney
       John
       Margaret

And Sebastian Rahtz adds:

You can speed this up again, I believe, by using the key again:

    <xsl:for-each select="//name[generate-id(.)=
                            generate-id(key('surnames',surname)[1])]">
      <xsl:variable name="surname" select="surname"/>
      <xsl:for-each select="key('surnames',$surname)">
        ...

That is to say, for every unique surname, consult the key again to get the list of people with that surname. That way, you do not have to navigate the tree again at all, since the list of relevant nodes is already known.

13. Removing duplicates Muenchian solution

I'm going to assume you *were* actually referring to removing duplicate elements and, to make the answer more general and more accurate, I'm also going to assume that you have a number of different elements within your content. Finally, I'm going to assume that it is the content of the element that makes it a duplicate (rather than the value of an attribute, say), so something like:

    <doc>
      <employee>Bill</employee>
      <employee>Andy</employee>
      <director>Amy</director>
      <employee>Bill</employee>
      <director>Louise</director>
      <director>Louise</director>
      <employee>Bill</employee>
      <employee>Andy</employee>
      <employee>John</employee>
      <director>Amy</director>
      <director>Louise</director>
    </doc>

To produce something like:

    <doc>
      <employee>Bill</employee>
      <employee>Andy</employee>
      <director>Amy</director>
      <director>Louise</director>
      <employee>John</employee>
    </doc>

Rather than using the preceding-sibling axis, I'm going to use the Muenchian technique to identify the first unique elements, because it's a lot easier to use in this case, as well as being more efficient generally.

First, define a key so that you can index on the unique features of the particular elements that you want. In this case, there are two unique features: the name of the element, and the content of the element. To make a key that includes both, I'm concatenating these two bits of information together (with a separator to hopefully account for odd occurrences that could generate the same key despite having different element/content combinations):

    <xsl:key name="elements" match="*" use="concat(name(), '::', .)" />

So all the <employee>Bill</employee> elements are indexed under 'employee::Bill'.

The unique elements are those that appear first in the list of elements that are indexed by the same key. Identifying those involves testing whether the node you're currently looking at is the same node as the first node in the list that is indexed by the key for the node. So if the <employee>Bill</employee> node that we're looking at is the first one in the list that we get when we retrieve the 'employee::Bill' nodes from the 'elements' key, then we know it hasn't been processed before.

    <xsl:template match="doc">
      <xsl:for-each select="*[generate-id(.) =
                              generate-id(key('elements', concat(name(), '::', .))[1])]">
        <xsl:copy-of select="." />
      </xsl:for-each>
    </xsl:template>

14. sorting and counting

> 1. sort them by 'priority'
> 2. leave, say, only 3 nodes in the result

Here's a solution. First, specify the number of nodes you want in a parameter, so that you can change it whenever you like:

    <xsl:param name="nodes" select="3" />

Next, you want to treat the nodes individually despite them being nested inside each other, and you want to sort them within your output in order of priority. You can use either xsl:for-each or xsl:apply-templates to select the nodes within the document, whatever their level (using //node), and xsl:sort within whichever you use to sort in order of priority. For example:

    <xsl:for-each select="//node">
      <xsl:sort select="@priority" order="ascending" />
      ...
    </xsl:for-each>

Within that, you only want to output anything if the position of the node within that sorted list is less than or equal to the number of nodes you want in the result. In other words (note that the < must be escaped as &lt; inside the attribute):

    <xsl:for-each select="//node">
      <xsl:sort select="@priority" order="ascending" />
      <xsl:if test="position() &lt;= number($nodes)">
        <xsl:value-of select="name" />
      </xsl:if>
    </xsl:for-each>

15. arbitrary sorting

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:m="urn:non-null-namespace">

      <m:month name="Jan" value="1" />
      <m:month name="Feb" value="2" />
      <m:month name="Mar" value="3" />
      <m:month name="Apr" value="4" />
      <m:month name="May" value="5" />
      <m:month name="Jun" value="6" />
      <m:month name="Jul" value="7" />
      <m:month name="Aug" value="8" />
      <m:month name="Sep" value="9" />
      <m:month name="Oct" value="10" />
      <m:month name="Nov" value="11" />
      <m:month name="Dec" value="12" />

      <xsl:template match="report-list">
        <xsl:apply-templates>
          <xsl:sort select="document('')//m:month[@name=current()/@month]/@value"
                    data-type="number" />
        </xsl:apply-templates>
      </xsl:template>

    </xsl:stylesheet>

The month table is stored in the stylesheet itself, in its own namespace, and document('') reads the stylesheet so that each month name can be looked up and replaced by its number as the sort key.

To sort descending, xsl:sort has an 'order' attribute, with possible values 'ascending' (the default) or 'descending'.

16. Arbitrary sorting

> Supposing I have elements with a month attribute
>
>     <report month="Jan" />
>     <report month="Feb" />
>
> and so on.
> Of course unordered :-)
> Now I want them in chronological order....
> I know how to translate Jan->1, Feb->2 etc via a named template
> [and] xsl:choose, but that doesn't help much in this case.

Naturally, what you want is to map names to numbers using keys, which can be very efficient. Keys were made for just this purpose! So far, I've been able to get this to work if the month table is in the input document. Consider this input document:

    <?xml version="1.0"?>
    <doc>
      <monthtab>
        <entry><name>Jan</name><number>1</number></entry>
        <entry><name>January</name><number>1</number></entry>
        <entry><name>Feb</name><number>2</number></entry>
        <entry><name>February</name><number>2</number></entry>
        <entry><name>Mar</name><number>3</number></entry>
        <entry><name>March</name><number>3</number></entry>
        <entry><name>Apr</name><number>4</number></entry>
        <entry><name>April</name><number>4</number></entry>
        <entry><name>May</name><number>5</number></entry>
        <entry><name>Jun</name><number>6</number></entry>
        <entry><name>June</name><number>6</number></entry>
        <entry><name>Jul</name><number>7</number></entry>
        <entry><name>July</name><number>7</number></entry>
        <entry><name>Aug</name><number>8</number></entry>
        <entry><name>August</name><number>8</number></entry>
        <entry><name>Sep</name><number>9</number></entry>
        <entry><name>Sept</name><number>9</number></entry>
        <entry><name>September</name><number>9</number></entry>
        <entry><name>Oct</name><number>10</number></entry>
        <entry><name>October</name><number>10</number></entry>
        <entry><name>Nov</name><number>11</number></entry>
        <entry><name>November</name><number>11</number></entry>
        <entry><name>Dec</name><number>12</number></entry>
        <entry><name>December</name><number>12</number></entry>
      </monthtab>
      <bday person="Linda"><month>Apr</month><day>22</day></bday>
      <bday person="Marie"><month>September</month><day>9</day></bday>
      <bday person="Lisa"><month>March</month><day>31</day></bday>
      <bday person="Harry"><month>Sep</month><day>16</day></bday>
      <bday person="Ginny"><month>Jan</month><day>22</day></bday>
      <bday person="Pedro"><month>November</month><day>2</day></bday>
      <bday person="Bill"><month>Apr</month><day>4</day></bday>
      <bday person="Frida"><month>July</month><day>5</day></bday>
    </doc>

The first part of the above document is the month table. For demonstration purposes, I have both abbreviated and full month names (look at September) as synonyms, and you could easily add names in other languages. There's a many-to-one structure: look up a name, get back the correct month number. The rest of the document is the set of records that we want to sort in chronological order. The <day> elements will work as a simple numeric sort, but that's secondary to the sort by months.

Following Oliver's request, we want

    <xsl:sort select="key('MonthNum',month)" data-type="number"/>

where we will take the <month> as a string and get its number out of the 'MonthNum' keyspace. I'll supply an example of the above sort working in an apply-templates situation, but it can work similarly in a for-each loop. The current node will be the outer <doc> element at the time we sort, so the keyspace definition will also be based on that context:

    <xsl:key name="MonthNum" match="monthtab/entry/number" use="../name" />

With that background, check out this stylesheet:

    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <xsl:key name="MonthNum" match="monthtab/entry/number" use="../name" />

      <xsl:template match="doc">
        <out>
          <xsl:text>Birthdays in chronological order...&#10;</xsl:text>
          <xsl:apply-templates select="bday">
            <xsl:sort select="key('MonthNum',month)" data-type="number" />
            <xsl:sort select="day" data-type="number" />
          </xsl:apply-templates>
        </out>
      </xsl:template>

      <xsl:template match="bday">
        <xsl:value-of select="@person"/><xsl:text>: </xsl:text>
        <xsl:value-of select="month"/><xsl:text> </xsl:text>
        <xsl:value-of select="day"/><xsl:text>&#10;</xsl:text>
      </xsl:template>

    </xsl:stylesheet>

The <out> element is just something we commonly use as a tracing aid. The above works on both Xalan and Saxon.

Ideally, one would want to put the month table in a completely separate file, so it could be shared among all stylesheets that needed it. Depending on your situation, you might prefer to have the month table right in the stylesheet. Either way, you have to use document(), which certainly complicates the procedure. The point of this message is to show that key() can be used in sort keys.

17. Sorting into a variable

What's wrong with this?

    <xsl:variable name="fns">
      <xsl:for-each select="functions[.!='']">
        <xsl:sort data-type="text" select="pair/word"/>
      </xsl:for-each>
    </xsl:variable>

The xsl:for-each loops over each of the functions, sorted in terms of pair/word, and then... does nothing with them! Since there's nothing there that actually produces any output, the variable $fns is set to an empty result tree fragment. When it's passed to the next for-each, there's nothing to iterate over. What you wanted was:

    <xsl:variable name="fns">
      <xsl:for-each select="functions[.!='']">
        <xsl:sort data-type="text" select="pair/word"/>
        <xsl:copy-of select="." />
      </xsl:for-each>
    </xsl:variable>

i.e. to produce a copy of each of the sorted functions to process later on.

[I guess there's a good reason for storing the sorted list and using an extension function to access it rather than just doing:

    <xsl:for-each select="functions[.!='']">
      <xsl:sort data-type="text" select="pair/word" />*
      <xsl:value-of select="child::*/text()"/>
      <br />
    </xsl:for-each>
]

18. Ultimate arbitrary sort algorithm

Mike Kay answered something like "it's not possible to construct a key expression that requires conditional processing". As we've learned from the intersection expression: never say never! :-)

In the following I try to explain a method to construct such an expression using XPath alone. Decide for yourself whether the result is rather pretty or rather ugly ...

xsl:sort requires an expression used as the sort key. What we want is the following:

    if condition1 then use string1 as sort key
    if condition2 then use string2 as sort key
    etc.

How to achieve that? The following expression gives $string if $condition is true; otherwise it results in an empty string:

    substring($string, 1 div number($condition))

When the condition is false, 1 div 0 evaluates to Infinity, so the substring starts beyond the end of the string and is empty. According to Mike's book this is perfectly valid. (Note: works with Saxon and XT, but not with my versions of Xalan and Oracle XSL - but I've not installed the latest versions ...)

If you don't like "infinity", here's another solution:

    substring($string, 1, number($condition)*string-length($string))

but then you need $string twice ...

The concatenation of all these substring expressions forms the sort key. Requirement: all conditions must be mutually exclusive. That's all! :-)

Here's my example, which demonstrates the handling of leading "Re: "s. If the string starts with "Re: ", an equivalent string without this prefix but with an appended ", Re" forms the key; otherwise the original string is used:

    <xsl:sort select="concat(
        substring(concat(substring-after(.,'Re: '), ', Re'),
                  1 div number(starts-with(.,'Re: '))),
        substring(., 1 div number(not(starts-with(.,'Re: ')))))" />

As you may imagine, these expressions can become very complex the more arbitrary you want the sort to be.

Jeni Tennison adds:

This is obviously the 'Becker Method' :) It is, of course, hideous when you actually use it. You can make it a little less hideous by dropping the number() - 'div' automatically converts its arguments to a number anyway. Do make sure, as well, to use boolean() if the condition is a node set, to convert it into a boolean value (true if such a node exists, false if it doesn't). In other words, the pattern is:

    concat(substring($result1, 1 div $condition1),
           substring($result2, 1 div $condition2),
           ...)

where the conditions are all boolean and the results are all strings.

Matt's original problem was:

> I have some data:
> <item>MacBean</item>
> <item>McBarlow</item>
> <item>Re MacBart</item>
> <item>Re McBeanie</item>
>
> Which needs to be sorted and transformed as follows:
> <item>McBarlow</item>
> <item>Re McBart</item>
> <item>MacBean</item>
> <item>Re McBeanie</item>

This is solved by:

    <xsl:template match="list">
      <xsl:for-each select="item">
        <xsl:sort select="concat(
            substring(concat('Mac', substring-after(., 'Re Mc'), ', Re'),
                      1 div starts-with(., 'Re Mc')),
            substring(concat(substring-after(., 'Re '), ', Re'),
                      1 div (starts-with(., 'Re ') and
                             not(starts-with(., 'Re Mc')))),
            substring(concat('Mac', substring-after(., 'Mc')),
                      1 div (not(starts-with(., 'Re ')) and
                             starts-with(., 'Mc'))),
            substring(., 1 div not(starts-with(.,'Mc') or
                                   starts-with(., 'Re '))))" />
        <xsl:copy-of select="." />
      </xsl:for-each>
    </xsl:template>

This assumes that 'Re ' is the only thing that can precede the name. You can nest the conditions if you want to (actually this makes it even more complex!).

This is some of the ugliest XSLT I have ever seen :) :)

19. Maximum value of a list

If the list is declared in XML, you can sort the list of values in descending order and pick off the first value (data-type="number" is needed so that the values are compared as numbers rather than as text):

    <xsl:variable name="maximum">
      <xsl:for-each select="$list">
        <xsl:sort select="." data-type="number" order="descending" />
        <xsl:if test="position() = 1">
          <xsl:value-of select="." />
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>

If the list were a string separated by commas, say, then you have to use recursion. The current node doesn't matter, so named templates are the best choice, but you can use xsl:apply-templates instead if you want to:

    <xsl:variable name="maximum">
      <xsl:apply-templates select="." mode="maximum">
        <xsl:with-param name="list" select="concat($list, ', ')" />
      </xsl:apply-templates>
    </xsl:variable>

    <xsl:template match="node()|/" mode="maximum">
      <xsl:param name="list" />
      <xsl:variable name="first" select="substring-before($list, ',')" />
      <xsl:variable name="rest" select="substring-after($list, ',')" />
      <xsl:choose>
        <xsl:when test="not(normalize-space($rest))">
          <xsl:value-of select="$first" />
        </xsl:when>
        <xsl:otherwise>
          <xsl:variable name="max">
            <xsl:apply-templates select="." mode="maximum">
              <xsl:with-param name="list" select="$rest" />
            </xsl:apply-templates>
          </xsl:variable>
          <xsl:choose>
            <xsl:when test="$first > $max">
              <xsl:value-of select="$first" />
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="$max" />
            </xsl:otherwise>
          </xsl:choose>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>
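For a node-set, the maximum can also be picked out with a single XPath expression instead of a sort. This is a different technique from the two above, shown only as a sketch: the `values` and `v` element names are hypothetical, and all values are assumed to be numeric.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Assumed input: <values><v>7</v><v>42</v><v>9</v></values> -->
  <xsl:template match="values">
    <xsl:variable name="list" select="v"/>
    <!-- Keep the nodes for which no node in $list is greater;
         [1] picks one of them if the maximum occurs more than once. -->
    <xsl:value-of select="$list[not($list &gt; .)][1]"/>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input this outputs 42. Each node is compared against the whole list, so the cost grows with the square of the list length; for long lists the sorted version may well be the better choice.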

20. Case insensitive sorting

In the comparison that does not involve case-insensitive translation, you select:

    //LEAGUE[not(@NAME = preceding::*/@NAME)]

Within equality tests, the way the result is worked out depends on the type of the values that are involved. When they involve node sets (as in this case), the equality expression returns true if there are nodes within the node set(s) for which the equality expression will be true. In other words, "@NAME = preceding::*/@NAME" returns true if *any* of the preceding elements has a NAME attribute that matches the NAME attribute of the current node.

When you select with the translation:

    //LEAGUE[not(translate(@NAME,$lower,$upper) =
                 translate(preceding::*/@NAME,$lower,$upper))]

things work differently because the translate() function returns a string. Doing:

    translate(preceding::*/@NAME, $lower, $upper)

translates the string value of the node set preceding::*/@NAME from lower case to upper case, and returns this string. The string value of a node set is the string value of the first node in the node set, here the value of the first preceding element's NAME attribute. That means that you're testing the equality of the translated @NAME of the current element with the translated @NAME of the first preceding element, not comparing it with all the other preceding elements' @NAME.

I don't *think* it's possible to do the selection you're after with a single select expression, but you could get around it by doing an xsl:for-each on all the //LEAGUE elements, containing within it an xsl:if that only retrieves those that don't have a preceding element with the same (translated) name:

    <xsl:for-each select="//LEAGUE">
      <xsl:sort select="@NAME" />
      <xsl:variable name="name" select="translate(@NAME, $lower, $upper)" />
      <xsl:if test="not(preceding::*[translate(@NAME, $lower, $upper) = $name])">
        <!-- do stuff -->
      </xsl:if>
    </xsl:for-each>

If possible, for efficiency, you should probably give a more exact indication of the LEAGUE elements you're interested in (e.g. '/SCORES/LEAGUE'), and if you're only interested in the preceding-sibling::LEAGUE elements, you should use this rather than the general preceding::*. Generally, more specific XPath expressions are more efficient.

David Carlisle generalises case insensitive comparisons:

If you are writing in English,

    <xsl:variable name="u" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
    <xsl:variable name="l" select="'abcdefghijklmnopqrstuvwxyz'"/>

    <xsl:when test="self::node()[translate(. ,$u,$l) = translate($reference,$u,$l)]">
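A key offers another route to the same selection, in the style of the Muenchian technique used elsewhere in this FAQ. The sketch below is not from the original thread: the key name `league-by-name` is made up, and unlike the preceding::* test it compares LEAGUE elements only with other LEAGUE elements.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- XSLT 1.0 does not allow variable references in a key's use
       attribute, so the alphabets are written out literally. -->
  <xsl:key name="league-by-name" match="LEAGUE"
           use="translate(@NAME, 'abcdefghijklmnopqrstuvwxyz',
                                 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>

  <xsl:template match="/">
    <!-- first LEAGUE for each upper-cased NAME -->
    <xsl:for-each select="//LEAGUE[generate-id() =
        generate-id(key('league-by-name',
          translate(@NAME, 'abcdefghijklmnopqrstuvwxyz',
                           'ABCDEFGHIJKLMNOPQRSTUVWXYZ'))[1])]">
      <!-- sort on the upper-cased name so case does not matter -->
      <xsl:sort select="translate(@NAME, 'abcdefghijklmnopqrstuvwxyz',
                                         'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
      <xsl:value-of select="@NAME"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Using the translated name as the sort key as well makes the ordering case-insensitive for the unaccented Latin letters, whatever the processor's default collation does with case.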

21. Case Insensitive Sorted Index with Headings

Case Insensitive Sorted Index with Headings (including combined heading for all numbers/symbols)

PROBLEM: Need a sorted index with a single subheading for numbers and symbols and subheadings for each letter represented. It is important that upper and lowercase letters be intermingled, that is:

    <H2>A</H2>
    aa
    Ab
    Ac
    ad

rather than

    <H2>a</H2>
    aa
    ad
    <H2>A</H2>
    Ab
    Ac

My XML looked like this:

    <?xml version="1.0"?>
    <?xml:stylesheet type="text/xsl" href="test.xsl"?>
    <pages>
      <page name="name1" location="file1.xml">
        <index entry="aa"/>
      </page>
      <page name="name2" location="file2.xml">
        <index entry="Ac"/>
        <index entry="Ab"/>
        <index entry="ad"/>
      </page>
      ...
    </pages>

The solution resulted in XSL that looked like this:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    xmlns:HTML="http://www.w3.org/Profiles/XHTML-transitional">

      <xsl:key name="letters" match="//index"
               use="translate(substring(@entry,1,1),
                      'abcdefghijklmnopqrstuvwxyz1234567890@',
                      'ABCDEFGHIJKLMNOPQRSTUVWXYZ###########')" />
      ...
      <xsl:template match="pages">
        <xsl:for-each select="//index[count(. | key('letters',
                                translate(substring(@entry,1,1),
                                  'abcdefghijklmnopqrstuvwxyz1234567890@',
                                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ###########'))[1]) = 1]">
          <xsl:sort select="@entry" />
          <xsl:variable name="initial"
                        select="translate(substring(@entry,1,1),
                                  'abcdefghijklmnopqrstuvwxyz1234567890@',
                                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ###########')" />
          <a name="{$initial}" />
          <xsl:choose>
            <xsl:when test="$initial = '#'">
              <h2>Numbers &amp; symbols</h2>
            </xsl:when>
            <xsl:otherwise>
              <h2><xsl:value-of select="$initial" /></h2>
            </xsl:otherwise>
          </xsl:choose>
          <xsl:for-each select="key('letters', $initial)">
            <xsl:sort select="@entry" />
            <p><a><xsl:attribute name="href"><xsl:value-of select="../@location"/></xsl:attribute><xsl:value-of select="@entry"/></a></p>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:template>

22. Sorting time values

> I have the following xml:
>
> <times>
>   <time value="10:45"/>
>   <time value="1:15"/>
>   <time value="9:43"/>
>   <time value="35:27"/>
>   <time value="20:48"/>
> </times>

Break up the time using substring-before() and substring-after(), and use the two parts as major and minor sort key, both with data-type="number".

Dimitre Novatchev provides the example:

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" omit-xml-declaration="yes"/>

      <xsl:template match="/times">
        <xsl:copy>
          <xsl:apply-templates select="time">
            <xsl:sort data-type="number" select="substring-before(@value,':')"/>
            <xsl:sort data-type="number" select="substring-after(@value,':')"/>
          </xsl:apply-templates>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="/ | @* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
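The two keys can also be folded into one numeric key by converting each value to a single count of the smaller unit. This is only a sketch of that variation, assuming every value has exactly one colon and that the part after it runs from 0 to 59:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/times">
    <xsl:for-each select="time">
      <!-- the arithmetic operators convert both substrings to numbers -->
      <xsl:sort data-type="number"
                select="substring-before(@value, ':') * 60
                        + substring-after(@value, ':')"/>
      <xsl:value-of select="@value"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the input in the question this lists 1:15, 9:43, 10:45, 20:48, 35:27. The two-key version is arguably easier to read; the single key is handy when the same number is also needed for arithmetic.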
23. | Topological Sort | ||||||
I want to implement a topological sort in XSLT. This is necessary for generating program code (IDL, for example) out of an XML file. Here is a stylesheet for processing elements in topologically sorted order. The trick is to carefully select elements from the document into variables so that they are node sets and can be selected later on. The complete problem is stated in an earlier post (2000-11-04) and can be found in the archive. Pseudocode:
  select structs with no dependencies
  process them
  repeat
    if not all structs are processed
      select structs which are not processed and have only dependencies which are processed
      if empty, stop
      else process them
  done
This translates into the following stylesheet: <?xml version="1.0" encoding="ISO-8859-1"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="text"/> <xsl:template match="structs"> <xsl:call-template name="process"> <xsl:with-param name="nodes" select="struct[not(field/type/ref)]"/> <xsl:with-param name="finished" select="/.."/> </xsl:call-template> </xsl:template> <xsl:template name="process"> <xsl:param name="nodes"/> <xsl:param name="finished"/> <xsl:variable name="processed" select="$nodes|$finished"/> <xsl:for-each select="$nodes"> <xsl:value-of select="name"/> </xsl:for-each> <xsl:if test="count(struct)>count($processed)"> <xsl:variable name="nextnodes" select="struct[not($processed/name=name) and not(field/type/ref[not(. = $processed/name)])]"/> <xsl:if test="$nextnodes"> <xsl:call-template name="process"> <xsl:with-param name="nodes" select="$nextnodes"/> <xsl:with-param name="finished" select="$processed"/> </xsl:call-template> </xsl:if> </xsl:if> </xsl:template> </xsl:stylesheet> The structs are processed in increasing distance from leaves in the dependency graph. Can one of the gurus please comment on the "count(field/type/ref)=count(...)" construct, and whether this could be substituted by a possibly more efficient condition? 
In my real-world examples with some 200+ structs it takes quite some time; if there are cheap optimisations I would appreciate them. Applying the stylesheet to the following example document outputs "ACBED", which is the correct dependency order. <?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE structs [ <!ELEMENT structs (struct*)> <!ELEMENT struct (name,field*)> <!ELEMENT field (name,type)> <!ELEMENT name (#PCDATA)> <!ELEMENT type (ref|long)> <!ELEMENT ref (#PCDATA)> <!ELEMENT long EMPTY> ]> <structs> <struct> <name>A</name> <field> <name>A1</name> <type><long/></type> </field> </struct> <struct> <name>B</name> <field> <name>B1</name> <type><ref>A</ref></type> </field> <field> <name>B2</name> <type><ref>C</ref></type> </field> </struct> <struct> <name>C</name> <field> <name>C1</name> <type><long/></type> </field> </struct> <struct> <name>D</name> <field> <name>D1</name> <type><ref>E</ref></type> </field> <field> <name>D2</name> <type><ref>A</ref></type> </field> </struct> <struct> <name>E</name> <field> <name>E1</name> <type><ref>C</ref></type> </field> </struct> </structs> | |||||||
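To make the layer-by-layer processing easier to follow, here is a small Python sketch of the same pseudocode. It is an illustration only, not a replacement for the stylesheet: `layered_order` and the `deps` dictionary are names made up here, and the dependency data is transcribed by hand from the example document (struct name to the names in its ref elements).

```python
def layered_order(deps):
    """Return names so that every struct follows the structs it references.

    deps maps each struct name to the names it references, in document order.
    """
    done = set()             # names already processed (cheap membership test)
    order = []               # output order
    remaining = list(deps)   # still to process, in document order
    while remaining:
        # One layer: unprocessed structs whose references are all processed.
        ready = [n for n in remaining if all(d in done for d in deps[n])]
        if not ready:
            break            # cycle or dangling reference: stop, as the pseudocode does
        order.extend(ready)
        done.update(ready)
        remaining = [n for n in remaining if n not in done]
    return order


# Dependencies transcribed from the example document.
deps = {"A": [], "B": ["A", "C"], "C": [], "D": ["E", "A"], "E": ["C"]}
result = "".join(layered_order(deps))
print(result)
```

Each pass of the loop corresponds to one recursive call of the "process" template. Keeping the processed names in a set is one possible answer to the efficiency question, though how that maps onto a given XSLT processor (for example via keys) is not something this sketch can show.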
24. | Special Alpha sort | ||||||
I need to create an alphabetical list of words. If a word contains a dash, it means it has to be joined with the following word that has an attribute type="end". xml <?xml version='1.0'?> <root> <line lineID="1"> <word wordID="1">ABC-</word> <word wordID="2">ABCD</word> <word wordID="2">ABCDE</word> </line> <line lineID="2"> <word wordID="1" type="end">DEF</word> <word wordID="2">XYZ</word> <word>ABC-</word> <word>spirit-level</word> <word type="end">DEF</word> </line> </root> (which combines your test case) and xsl <?xml version="1.0" ?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> <xsl:output method="text"/> <xsl:template match="/"> <xsl:for-each select="root/line/word[not(@type = 'end')]"> <xsl:sort select="translate(concat(.,self::*[contains(.,'-')] /following::word[@type='end']), '-', '')"/> <xsl:choose> <xsl:when test="substring(., string-length()) = '-'"> <xsl:value-of select="substring(., 1, string-length() - 1)"/> <xsl:value-of select="following::word[@type='end']"/> </xsl:when> <xsl:otherwise> <xsl:value-of select="."/> </xsl:otherwise> </xsl:choose> <xsl:if test="position() != last()"> <xsl:text>
</xsl:text> </xsl:if> </xsl:for-each> </xsl:template> </xsl:stylesheet> Which includes your changes to test for the extra cases. (and your minor correction) this gives ABCD ABCDE ABCDEF ABCDEF spirit-level XYZ | |||||||
25. | Ordering and iteration problem | ||||||
> My thinking is that I need to do something like > > for each row > for each column > output the <circuit-breaker> with that row and column I'd probably do this using the Piez Method/Hack of having an xsl:for-each iterate over the correct number of random nodes and using the position of the node to give the row/column number. You need to define some random nodes - I usually use nodes from the stylesheet: <xsl:variable name="random-nodes" select="document('')//node()" /> And since you'll be iterating over them, you need some way of getting back to the data: <xsl:variable name="data" select="/" /> I've used two keys to get the relevant circuit breakers quickly. One just indexes them by column (this is so you can work out whether you need to add a cell or whether there's a circuit breaker from higher in the column that covers it). The other indexes them by row and column. <xsl:key name="breakers-by-column" match="b:circuit-breaker" use="@column" /> <xsl:key name="breakers" match="b:circuit-breaker" use="concat(@row, ':', @column)" /> I've assumed that you've stored the maximum number of rows in a global variable called $max-row and the maximum number of columns in a global variable called $max-col. 
Here's the template that does the work: <xsl:template match="/"> <!-- store the right number of nodes for the rows in a variable --> <xsl:variable name="rows" select="$random-nodes[position() &lt;= $max-row]" /> <!-- store the right number of nodes for the columns in a variable --> <xsl:variable name="columns" select="$random-nodes[position() &lt;= $max-col]" /> <!-- create the table --> <table> <!-- iterate over the right set of nodes to get the rows --> <xsl:for-each select="$rows"> <!-- store the row number --> <xsl:variable name="row" select="position()" /> <!-- create the row --> <tr> <!-- iterate over the right set of nodes to get the columns --> <xsl:for-each select="$columns"> <!-- store the column number --> <xsl:variable name="col" select="position()" /> <!-- change the current node so that the key works --> <xsl:for-each select="$data"> <!-- identify the relevant circuit breaker --> <xsl:variable name="breaker" select="key('breakers', concat($row, ':', $col))" /> <xsl:choose> <!-- if there is one, apply templates to get the table cell --> <xsl:when test="$breaker"> <xsl:apply-templates select="$breaker" /> </xsl:when> <xsl:otherwise> <!-- find other breakers that start higher in the column --> <xsl:variable name="column-breakers" select="key('breakers-by-column', $col) [@row &lt; $row]" /> <!-- output an empty cell if there isn't one that overlaps --> <xsl:if test="not($column-breakers [@row + @height > $row])"> <td /> </xsl:if> </xsl:otherwise> </xsl:choose> </xsl:for-each> </xsl:for-each> </tr> </xsl:for-each> </table> </xsl:template> <!-- template to give the cell for the circuit breaker --> <xsl:template match="b:circuit-breaker"> <td rowspan="{@height}"> <xsl:value-of select="b:amps" /> </td> </xsl:template> If you don't like the 'one big template' approach, then you could split it down by applying templates to the row and column nodes in 'row' and 'column' modes to distinguish between the two. | |||||||
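The row/column logic of that template can be traced in a few lines of Python. This is only a sketch under assumptions: the breaker data and the `build_rows` name are invented for illustration, and a dictionary lookup stands in for the two xsl:key declarations.

```python
def build_rows(breakers, max_row, max_col):
    """breakers: dicts with 1-based 'row' and 'column', plus 'height' and 'amps'."""
    by_cell = {(b["row"], b["column"]): b for b in breakers}  # like the 'breakers' key
    rows = []
    for row in range(1, max_row + 1):
        cells = []
        for col in range(1, max_col + 1):
            breaker = by_cell.get((row, col))
            if breaker:
                cells.append('<td rowspan="%d">%s</td>'
                             % (breaker["height"], breaker["amps"]))
                continue
            # Is this cell covered by a breaker that starts higher in the column?
            covered = any(
                b["column"] == col and b["row"] < row and b["row"] + b["height"] > row
                for b in breakers
            )
            if not covered:
                cells.append("<td/>")
        rows.append("<tr>%s</tr>" % "".join(cells))
    return rows


# Hypothetical data: a two-row breaker in column 1, a one-row breaker at (2, 2).
breakers = [
    {"row": 1, "column": 1, "height": 2, "amps": 20},
    {"row": 2, "column": 2, "height": 1, "amps": 15},
]
table_rows = build_rows(breakers, max_row=2, max_col=2)
print("\n".join(table_rows))
```

In the second row no cell is emitted for column 1, because the breaker that starts in row 1 spans it; that is the case the `@row + @height > $row` test handles in the stylesheet.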
26. | sorting code available | ||||||
There is now a new version of a sort function at redrice.com which implements a mix-and-match architecture. You can now import and call either the simplesort or mergesort template with exactly the same parameters, including one which specifies which of your project-specific compare templates you want used. Both sort templates return their result in the same way, as an ordered list of node-ids, e.g. "[1:cr423][2:cd342]...", which can be de-referenced very conveniently within for-each loops or even XPath expressions. The demo can be run from the command line: C:\test>saxon sort.xml sortcall.xslt <?xml version="1.0" encoding="UTF-8"?> <product id="a_a_00_01"> 2 </product> <product id="a_a_00_03"> 1 </product> <product id="a_a_00_05"> 4 </product> <product id="a_a_00_9"> w </product> <product id="a_a_00_9"> x </product> <product id="a_a_00_b"> y </product> <product id="a_a_00_b"> z </product> <product id="a_b_00_02"> 3 </product> <product id="a_a_30_50"> 5 </product> <product id="a_a_60_20"> 6 </product> <product id="a_a_30_20"> 7 </product> <product id="a_a_100_30"> 8 </product> To switch sortcall.xslt from using mergesort to simplesort, change line 19 from <xsl:call-template name="mergesort"> to <xsl:call-template name="simplesort"> | |||||||
27. | dynamically set the sort order | ||||||
> I am trying to dynamically set sort order and sort column in my > XSLT. It seems that I can not use an expression for "order". > > <xsl:apply-templates select="Uow"> > <xsl:sort select="$sortColumn" order="$sortOrder"/> > <xsl:with-param name="from" select="$startRow"/> > <xsl:with-param name="to" select="$endRow"/> > </xsl:apply-templates> All attributes of xsl:sort with the exception of "select" can be specified as AVTs (attribute value templates). The "select" attribute can be any XPath expression. However, putting an XPath expression in a variable and passing that variable as the (complete) value of the "select" attribute will not work for xsl:sort, nor anywhere else in XSLT, because XPath expressions are not evaluated dynamically. If you only need to specify the name of a child element, you can use an expression like this: *[name()=$sortColumn] Therefore, one possible way to achieve the result you want is: <xsl:sort order="{$sortOrder}" select="*[name()=$sortColumn]"/> | |||||||
28. | Can xsl:sort use a variable | ||||||
The value of $orderBy doesn't depend on the current node, so you'll get the same sort key value for every node. You probably want select="*[name()=$orderBy]". | |||||||
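The difference between the two select values is easy to see if the sort keys are spelled out. The Python sketch below is an analogy rather than XSLT: the three-row document and the `order_by` variable (standing in for $orderBy) are invented for illustration.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<rows>"
    "<row><name>pear</name><qty>3</qty></row>"
    "<row><name>apple</name><qty>7</qty></row>"
    "<row><name>fig</name><qty>5</qty></row>"
    "</rows>"
)
rows = list(doc)
order_by = "name"  # plays the role of $orderBy

# Like select="$orderBy": the key is the same string for every row,
# so a stable sort leaves the rows in document order.
constant_key = sorted(rows, key=lambda row: order_by)

# Like select="*[name()=$orderBy]": the key is the text of the child
# element whose name matches the parameter.
by_child = sorted(rows, key=lambda row: row.findtext(order_by))

print([r.findtext("name") for r in constant_key])
print([r.findtext("name") for r in by_child])
```

With the constant key nothing moves; with the child lookup the rows come out as apple, fig, pear.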
29. | Sorting. | ||||||
I did a slightly different take on this and assumed that you would want to collect all of joe's preferences together, like this: Ann says: "Pears." Even if this is not what you really want, it's interesting to see how it works out. I have not completely handled putting in commas everywhere except a period for the last item - I leave this to the reader. I also haven't translated the first character of the name to upper case. I also sorted the result by name. The solution is very compact without those refinements: <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method='text'/> <!-- key for using the Muenchian method of getting unique node sets--> <xsl:key name='wrappers' match='wrap' use='name(*)'/> <!-- Elements with unique person names --> <xsl:variable name='unique' select='/root/wrap[generate-id(key("wrappers",name(*))[1])=generate-id(.)]'/> <xsl:template match="/"> <xsl:for-each select='$unique'> <xsl:sort select='name(*)'/> <xsl:variable name='theName' select='name(*)'/> <xsl:value-of select='$theName' /> says: <xsl:for-each select='key("wrappers",$theName)' ><xsl:value-of select='normalize-space(.)' />, </xsl:for-each><xsl:text>&#10;</xsl:text> </xsl:for-each> </xsl:template> </xsl:stylesheet> The slightly odd formatting is an easy way to control the output format while still having short line lengths in the stylesheet (better for emailing), and the character reference is necessary on my (Windows) system to get the line feed to display. Here is the result (I added another person, bob, to the data, just for fun): =========================== This was interesting because the usual examples for getting unique node-sets assume you know what the target elements are named, but not in this case. | |||||||
30. | Sort, upper case then lower case | ||||||
This should work, although to me it looks a bit long-winded. So, to sort this xml by case (upperfirst) and then by value: <root> Output required: Monkey The XSL: <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/> <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/> <xsl:variable name="hashes" select="'##########################'"/> <xsl:variable name="lowercasenodes" select="root/node[starts-with(translate(.,$lowercase,$hashes),'#')]"/> <xsl:variable name="uppercasenodes" select="root/node[starts-with(translate(.,$uppercase,$hashes),'#')]"/> <xsl:template match="/"> <xsl:apply-templates select="root/node" mode="upper"> <xsl:sort select="."/> </xsl:apply-templates> <xsl:apply-templates select="root/node" mode="lower"> <xsl:sort select="."/> </xsl:apply-templates> </xsl:template> <xsl:template match="node" mode="lower"> <xsl:if test=". = $lowercasenodes"> <xsl:value-of select="."/><br/> </xsl:if> </xsl:template> <xsl:template match="node" mode="upper"> <xsl:if test=". = $uppercasenodes"> <xsl:value-of select="."/><br/> </xsl:if> </xsl:template> </xsl:stylesheet> | |||||||
31. | Sorting on near numeric data | ||||||
In a transform, is it possible to correctly sort these poorly formed id's listed below? Currently my standard sort: <xsl:apply-templates> <xsl:sort select="node()/@id"/> </xsl:apply-templates> Returns this: <someNode id="CM09.1"/> <someNode id="CM09.1.5"/> <someNode id="CM09.10"/> <someNode id="CM09.10.10.3"/> <someNode id="CM09.10.15"/> <someNode id="CM09.18.2"/> <someNode id="CM09.2"/> <someNode id="CM09.2.2"/> <someNode id="CM09.22"/> <someNode id="CM09.22.1"/> it's the old classic... 1 then 10 before 2 etc. I really need them sorted like the following: <someNode id="CM09.1"/> <someNode id="CM09.1.5"/> <someNode id="CM09.2"/> <someNode id="CM09.2.2"/> <someNode id="CM09.10"/> <someNode id="CM09.10.10.3"/> <someNode id="CM09.10.15"/> <someNode id="CM09.18.2"/> <someNode id="CM09.22"/> <someNode id="CM09.22.1"/> I'm looking now to see if I can work this out and I was wondering if anybody would be able to help me with the correct sort selection. The only other issue to be aware of is that the decimal points can go on indefinitely and I don't know until runtime what the highest number in any one id will be. > In a transform, is it possible to correctly sort these poorly formed > id's listed below Tricky. I did have a recursive solution for you until I noticed that just because there's an ID CM09.18.2 doesn't mean that there's an ID CM09.18. This irregularity makes the task very difficult. I think that I'd pick one of the following general approaches: 1. Decide that your stylesheet is only going to cope with IDs that have 5 components; or 10 components; or however many seems to be a reasonable maximum. You can always test the XML to make sure that this assumption holds and generate an error if it doesn't. 
But this allows you to do: <xsl:apply-templates select="someNode"> <xsl:sort data-type="number" select="substring-before( substring-after(@id, '.'), '.')" /> <xsl:sort data-type="number" select="substring-before( substring-after( substring-after(@id, '.'), '.'), '.')" /> ... </xsl:apply-templates> 2. Create an extension function that can select the Nth component from an ID. Then create a recursive template that groups and sorts the nodes based on their Nth component. 3. Have a pre-processing phase that changes the IDs such that the number in each component of the ID is formatted with an appropriate number of leading zeros. You will then be able to sort the nodes by ID using alphabetical sorting. 4. Generate the stylesheet dynamically based on the data, creating a stylesheet that contains the appropriate number of sorts for the depth of the IDs that you encounter in the XML. Jeni. Mike offers: If you're prepared to write some recursive XSLT code to transform the keys, you could achieve this by the technique of prefixing each numeric component with a digit indicating its length. Thus 1 becomes 11, 10 becomes 210, 15 becomes 215, 109 becomes 3109. This will give you a key that collates alphabetically. | |||||||
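Mike's length-prefix technique is easiest to verify away from any particular XSLT processor, so here is a Python sketch of the key transformation applied to the IDs from the question. The function name `collation_key` is invented here, and keeping the dots between components is a choice made for this sketch (Mike's description does not say either way). It also assumes plain code-point comparison, which an XSLT processor's text collation is not guaranteed to match.

```python
def collation_key(ident):
    """Prefix each dot-separated component with one digit giving its length."""
    parts = ident.split(".")
    for part in parts:
        if len(part) > 9:
            # A single length digit only covers components of up to 9 characters.
            raise ValueError("component too long for a one-digit prefix: %r" % part)
    return ".".join(str(len(part)) + part for part in parts)


# The IDs in the order the plain text sort returned them.
ids = [
    "CM09.1", "CM09.1.5", "CM09.10", "CM09.10.10.3", "CM09.10.15",
    "CM09.18.2", "CM09.2", "CM09.2.2", "CM09.22", "CM09.22.1",
]
ordered = sorted(ids, key=collation_key)
print(ordered)
```

Because a shorter number always gets a smaller length digit, "2" (key 12) sorts before "10" (key 210) without any padding to a fixed width, which is what lets the approach cope with an unknown number of components. Components with leading zeros would need normalising first.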
32. | Sort by date | ||||||
<xsl:sort data-type="number" order="descending" select="concat(substring-after(substring-after(., ' '), ' '), format-number(document('')/*/x:months/month[@name = substring-before(substring-after(current(), ' '), ' ')]/@number, '00'), format-number(substring-before(., '.'), '00'))" /> with <x:months> <month name="January" number="1" /> <month name="February" number="2" /> <month name="March" number="3" /> <month name="April" number="4" /> <month name="May" number="5" /> <month name="June" number="6" /> <month name="July" number="7" /> <month name="August" number="8" /> <month name="September" number="9" /> <month name="October" number="10" /> <month name="November" number="11" /> <month name="December" number="12" /> </x:months> as a top-level element in your stylesheet. | |||||||
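Read from the inside out, the select expression builds a number of the form YYYYMMDD. The Python sketch below rebuilds the same key; it assumes, from the substring calls, that the dates look like "5. January 2001", and both the sample dates and the `date_key` name are made up for illustration.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def date_key(text):
    """Turn 'D. Month YYYY' into the integer YYYYMMDD."""
    day, month, year = text.split(" ")
    day_number = int(day.rstrip("."))         # substring-before(., '.')
    month_number = MONTHS.index(month) + 1    # lookup in the x:months table
    return int("%s%02d%02d" % (year, month_number, day_number))


# Hypothetical input dates in the assumed format.
dates = ["5. January 2001", "17. March 2000", "1. December 2000", "23. January 2001"]
newest_first = sorted(dates, key=date_key, reverse=True)
print(newest_first)
```

The two-digit padding done by format-number(..., '00') is what keeps 1 December (1201) above 17 March (0317) when the parts are concatenated.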
33. | Sorting problems | ||||||
ZZZZZ would come before AAAAAAAA. The sort was being performed by IE 6.0. After much hair pulling, I finally figured out it was because of the carriage return that preceded the ZZZZZ (the actual XML doc was much bigger, hiding the problem). First question: It seems odd to me that the newline character would be considered significant and not get stripped. Why is this not so? Answer: newlines that are followed by non-white space characters are _never_ considered insignificant by XML or stripped by XSLT. (Mike Kay adds: But they may be ignored when sorting - the spec leaves detailed decisions on how strings are sorted to the implementation.) You want the normalize-space function, select="normalize-space(vendor_name)" | |||||||
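The effect is easy to reproduce with a plain code-point sort. The Python sketch below is an analogy only: `normalize_space` is a rough stand-in for the XPath function written for this example, and, as Mike Kay notes, a real processor's collation may treat the newline differently.

```python
def normalize_space(text):
    """Rough equivalent of XPath normalize-space(): trim and collapse whitespace.

    Note: str.split() treats every Unicode whitespace character as a separator,
    whereas XPath only considers space, tab, CR and LF.
    """
    return " ".join(text.split())


# The second value starts with a line break, as in the problem document.
vendor_names = ["AAAAAAAA", "\nZZZZZ"]

raw_order = sorted(vendor_names)                         # newline (U+000A) sorts before 'A'
clean_order = sorted(vendor_names, key=normalize_space)  # compare the trimmed text

print(raw_order)
print(clean_order)
```

Only the sort key is normalised; the values themselves are left untouched, just as normalize-space() in the select of xsl:sort does not change what is output.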
34. | How do I sort Hiragana and Katakana Japanese characters? | ||||||
First, a note on what exactly these two kinds of characters are. Written Japanese text uses four kinds of characters:
Children's books often put small Hiragana characters below a Kanji character, so the student can subvocalize the word and learn the Kanji that way. The Hiragana and Katakana alphabets are called "syllabaries". With one exception, each member is either a single-vowel syllable or a consonant-vowel syllable. The exception is "-n", as in "san", which would be written [sa][n]. Some of the syllables can be complex, such as the "kyo" in [To][kyo]. Both syllabaries follow the same order, following the a-i-u-e-o (vocalized as short 'u', long 'e', 'oo' as in "shoe", short e, semi-long o as in "beau") form horizontally, and the a-ka-sa-ta-na ha-ma-ya-ra-wa-n order vertically (consonants like the hard g, d, sh, ch (as in chew), b, and p come by modifying the so-called unvoiced consonants). Modified syllables sort immediately after their base syllable. For example, "ga" sits between "ka" and "ki". Hiragana characters occupy Unicode code points Ux3042 - Ux3094. Katakana characters occupy Unicode code points Ux30A0 - Ux30FF. The Katakana alphabet is growing, at a slow rate, and contains syllables that are not in the Hiragana table. From what I know, the Unicode tables follow Japanese dictionary sorting order, as long as you stay within either the Hiragana or Katakana table. If all the items in your list are either one or the other, you should be able to use XSLT's simple Unicode-based xslt:sort element. Otherwise, you would need to write an extension function that would map Hiragana characters to their corresponding Katakana values (since the former is a proper subset of the latter). Here's an example, where I have a list of Japanese characters. The attribute "a" is used to indicate where I would expect each item to appear in a sort based on Unicode-values only. 
The input: <?xml version="1.0"?> <items> <item a='k5'>ヴァ</item> <item a='h3'>す</item> <item a='h4'>も</item> <item a='k2'>キ</item> <item a='h1'>か</item> <item a='k4'>モ</item> <item a='k1'>カ</item> <item a='h2'>き</item> <item a='k3'>ス</item> </items> The XSLT: <?xml version="1.0"?> <xslt:stylesheet xmlns:xslt="http://www.w3.org/1999/XSL/Transform" version="1.0" > <xslt:output indent='yes' method='xml' encoding='utf-8' /> <xslt:template match='items'> <outitems what='Starting sorting'> <xslt:apply-templates select='item'> <xslt:sort select='.'/> </xslt:apply-templates> </outitems> </xslt:template> <xslt:template match='item'> <outitem> <xslt:attribute name='ord'> <xslt:value-of select='@a'/> </xslt:attribute> <xslt:value-of select='.'/> </outitem> </xslt:template> </xslt:stylesheet> The output: <?xml version="1.0" encoding="utf-8"?> <outitems what="Starting sorting"> <outitem ord="h1">か</outitem> <outitem ord="h2">き</outitem> <outitem ord="h3">す</outitem> <outitem ord="h4">も</outitem> <outitem ord="k1">カ</outitem> <outitem ord="k2">キ</outitem> <outitem ord="k3">ス</outitem> <outitem ord="k4">モ</outitem> <outitem ord="k5">ヴァ</outitem> </outitems> If you are doing any work in this area, Ken Lunde's book "CJKV Information Processing" (ISBN 1565922247) is a worthwhile investment. I supplement it with a copy of Unicode 3.0 I found at a local discount store for remaindered computer books. The unicode.org site is useful, too, but I prefer turning pages in hard copy to waiting for PDF files to open. But XML data is supposed to all be Unicode. C# supposedly stores all chars as "Unicode" (whatever that means, probably 16-bit ignoring issues with surrogates), so I'm surprised this sort didn't occur. Sort works fine with ASCII text. And it seems to work when I mix ASCII and Japanese. Looks like I found a boundary condition violation. I wouldn't call it processor-specific. Ednote: Saxon on sourceforge indicates how saxon extends the sorting capability to other languages. | |||||||
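The Hiragana-to-Katakana mapping mentioned above does not need a lookup table, because the two Unicode blocks are laid out in parallel: each Hiragana letter in U+3041 to U+3096 has its Katakana counterpart exactly 0x60 code points higher. The Python sketch below shows the folding applied as a sort key to the items from the example; `to_katakana` is a name chosen here, and this is a code-point sort, not a full Japanese collation.

```python
HIRAGANA_FIRST = 0x3041   # hiragana small a
HIRAGANA_LAST = 0x3096    # hiragana small ke
KANA_OFFSET = 0x60        # U+30A1 (katakana small a) minus U+3041


def to_katakana(text):
    """Map every hiragana letter to the katakana letter at the same table position."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST else ch
        for ch in text
    )


# The items from the example input, in document order.
items = ["ヴァ", "す", "も", "キ", "か", "モ", "カ", "き", "ス"]
mixed_order = sorted(items, key=to_katakana)
print(mixed_order)
```

With the folded key, each Hiragana item lands next to its Katakana twin instead of the whole Hiragana group preceding the whole Katakana group; which of the two comes first within a pair is decided only by their original order, since the sort is stable.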
35. | Sorting problems. | ||||||
The problem was that when sorting on this XML snippet using XSL: newlines that are followed by non-white space characters are _never_ considered insignificant by XML or stripped by XSLT. You want the normalize-space function, <xsl:value-of select="normalize-space(source)"/> | |||||||
36. | Sorting problems with whitespace (wrong order ). | ||||||
The problem was that when sorting on this XML snippet using XSL: newlines that are followed by non-white space characters are _never_ considered insignificant by XML or stripped by XSLT. You want the normalize-space function, <xsl:value-of select="normalize-space(vendor_name)"/> | |||||||
37. | Sorting Upper-Case first, or in *your* way | ||||||
I don't know exactly what the intent of the XSLT 1.0 spec for case-order was, but you need to read the definition in the light of the two (non-normative) notes that follow it. The first says that two implementations may produce different results - in other words, the spec does not attempt to be completely prescriptive about the output order (therefore, by definition, this is not a Microsoft non-conformance). The second note points to Unicode TR-10: http://www.unicode.org/unicode/reports/tr10/index.html Section 6.6 of this report recommends that implementations should allow the user to decide whether lower-case should sort before or after upper-case, and my guess is that the xsl:sort parameter was intended to implement this recommendation. In turn this should be read in the context of the collation algorithm given in the report, which sorts strings in three phases:
The key thing here is that case is only considered if the two strings (as a whole) are the same except in case. So Xaaaa will sort before xaaaa if upper-case comes first; but Xaaaa will always sort before xaaab, regardless of case order. It looks to me from this evidence as if Microsoft is implementing something close to the Unicode TR10 algorithm.
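The "case only breaks ties" behaviour can be imitated with a two-part sort key. The Python sketch below is a deliberately simplified, ASCII-only illustration and not the TR10 algorithm itself (it has no accent level, for one); the function names are invented here.

```python
def upper_first_key(word):
    # Primary: the word with case folded away.
    # Tie-break: the original spelling; ASCII upper case has the lower code points.
    return (word.lower(), word)


def lower_first_key(word):
    # Same primary key; swapping the case reverses the tie-break.
    return (word.lower(), word.swapcase())


words = ["xaaab", "xaaaa", "Xaaaa"]
upper_first = sorted(words, key=upper_first_key)
lower_first = sorted(words, key=lower_first_key)
print(upper_first)
print(lower_first)
```

In both orders xaaab comes last: the case setting only decides between Xaaaa and xaaaa, which are identical once case is ignored.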
My dictionary defines "lexicographical" [sic] as "pertaining to the making of dictionaries", so on that basis "lexicographic order" means "the order that headwords might appear in a dictionary". And in my dictionary, "Johnsonian" comes after "johnny" and before "joie-de-vivre". I think the great man would have been surprised if he had appeared before "a" or after "zymotic". I know that the word lexicographic is also used to describe a class of sorting algorithms, but I don't think the XSLT 1.0 spec is using the word in that sense. This is clear from the phrase "lexicographically in the culturally correct manner for the language..." and from the fact that it recommends Unicode TR10, which is not a lexicographic sort in that sense. David C adds. See for example the definition given here: http://mathworld.wolfram.com/LexicographicOrder.html Note that (despite the etymology) "lexicographic order" doesn't necessarily mean "the order used in a dictionary" as dictionaries are compiled by human compilers and words can appear in whatever order the compiler chooses which may reflect personal and cultural preferences as much as logic. However lexicographic ordering is used in a technical sense as a method of extending the ordering on one set (the alphabet) to a derived set (strings over that alphabet). I don't believe that the first note authorises this behaviour. It does not give a blanket licence to produce any result, it is an observation that because character order is language and system dependent the resulting lexicographic ordering will be too. The exact places that are system dependent are listed in the normative text above. MK continues The second note points to Unicode TR-10: http://www.unicode.org/unicode/reports/tr10/index.html The key thing here is that case is only considered if the two strings (as a whole) are the same except in case. DC retorts. 
You mean that this is a feature of the algorithm in TR-10 (I didn't follow it closely enough to derive this property just now)? Of course XSLT 1.0 doesn't actually define "lexicographic" but my understanding is that it always implies a direct extension on an ordering on characters to an ordering on strings by comparing the first different position. If that isn't what is intended I think XSLT shouldn't use this term and should just directly refer to TR10. Yours truly adds: With more than a little help from Eliot, below is a means of providing your own sort order. It's Saxon-specific, tested with 6.5.2. Sorry. <?xml version="1.0" encoding="utf-8" ?> <doc> <word>Hello</word> <word>hello</word> <word>[hello]</word> <word>Mword</word> <word>Nword</word> <word>Mother</word> <word>Nato</word> <word>:Mother</word> <word>5Mother</word> <word>ünter</word> <word>unter</word> <word>!unter</word> <word>$unter</word> </doc> xslt: <?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:saxon="http://icl.com/saxon" version="1.0"> <xsl:output method="xml" encoding="utf-8"/> <xsl:template match="/"> <html> <head> <title>Collation</title> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8" /> </head> <body> <xsl:apply-templates/> </body> </html> </xsl:template> <xsl:template match="doc"> <h3>Using an external collator class that implements java.util.Comparator, with the rules in an external text file.</h3> <p>The rules file is straightforward, with a couple of notable exceptions. </p> <ol> <li>N sorts before M</li> <li>punctuation sorts before alpha which sorts before Numeric</li> </ol> <p> Before: <br /> <xsl:for-each select="word"> <xsl:value-of select="."/> <br /> </xsl:for-each> </p> <hr /> <p> <b>After</b> <br /> <xsl:for-each select="word"> <xsl:sort select="." 
data-type="text" lang="er"/> <xsl:value-of select="."/> <br /> </xsl:for-each> </p> </xsl:template> <xsl:template match="*"> *****<xsl:value-of select="name(..)"/>/<xsl:value-of select="name()"/>****** </xsl:template> </xsl:stylesheet> Note the 'lang' attribute. This sends saxon off looking for a class com.icl.saxon.sort.Compare_er which provides the necessary items. (note, the default is _en, the english 'sort') For convenience, the actual sort order is kept externally, read in at runtime from a text file, as utf-8 For this test case it reads, ' ' , ':' , ';' , '<' , '=' , '>' , '?' , '@', '!', Two basic blocks. First up to the first < sign. These are ignorable. Next the 'sequence' of characters, e.g. < 'A',a < 'B',b implying that A sorts before B. Note < 'N',n < 'M',m which is the test case. I.e. N should sort before M. This lot is held in file collator.txt Spec is http://java.sun.com/products/jdk/1.2/docs/api/java/text/RuleBasedCollator.html The java is below. package com.icl.saxon.sort; import java.text.Collator; import java.text.RuleBasedCollator; import java.text.ParseException; import java.lang.StringBuffer; import java.io.FileReader; import java.io.BufferedReader; import java.io.Serializable; import com.icl.saxon.sort.TextComparer; import java.io.File; /** * Custom Saxon collator implementation. **/ public class Compare_er extends com.icl.saxon.sort.TextComparer { static Collator collator; //static final String collatorRules = "< a < b < c"; // String containing collation rules as defined by Java // RulesBasedCollator class. This could come from an // external resource of some sort, including from a Java // property or read from an application-specific configuration // file. 
public Compare_er() { super(); String rulesFile="collator.txt"; try { collator = new RuleBasedCollator(getRules(rulesFile)); } catch (Exception e) { e.printStackTrace(); // Saxon will not report an exception thrown at this point } } public int compare(java.lang.Object obj, java.lang.Object obj1) { return collator.compare(obj, obj1); } /** *Read a set of rules into a String *@param filename name of the file containing the rules *@return String, the rules * **/ private static String getRules(String filename) { String res=""; try{ BufferedReader reader = new BufferedReader (new FileReader (filename)); StringBuffer buf=new StringBuffer(); String text; try { while ((text=reader.readLine()) != null) buf.append(text + "\n"); reader.close(); }catch (java.io.IOException e) { System.err.println("Unable to read from rules file "+ filename); System.exit(2); } res=buf.toString(); }catch (java.io.FileNotFoundException e) { System.out.println("Unable to read Rules file, quitting"); System.exit(2); } return res; }// end of getRules() } Note the read from the rules file. (Also that if it's not found, saxon doesn't report the error) Finally, testing with <doc> <word>Hello</word> <word>hello</word> <word>[hello]</word> <word>Mword</word> <word>Nword</word> </doc> gives output Hello I.e. the M and N are re-arranged. Caution. Assuming that the java file is in location com/icl/saxon/sort/Compare_er.java then make sure that '.' is in the classpath, so it finds it. With your own collator.txt file you can then sort text to your heart's content and to your own rules. It's even easier in saxon 7, but that's another story. Last word goes to Mike Kay.
I thought you would never ask. I'm an optimist ;-) The answer is different for Saxon 6.x and Saxon 7.x. In Saxon 6.x, you can write your own collating functions as a plug-in, but if you don't, then two strings are compared as follows:
Case normalization relies on the Java method toLowerCase. Accent stripping is implemented only for characters in the upper half of the Latin-1 set. The above is essentially a simplified implementation of the Unicode Collation Algorithm. In Saxon 7.x, Saxon uses the collation capabilities of JDK 1.4. You can select any collation supported by the JDK. The default is selected according to your locale, or according to the language if lang is specified on xsl:sort. If case-order is upper-first, then the action of the selected Java collation is modified as follows: if the Java collation decides that two strings collate as equal, then Saxon examines the two strings, looking for the first character that differs between the two strings. If one of these is upper case, then that string comes first in the sorted order. | |||||||
38. | Topological sort | ||||||
Regarding the post from two years ago about topological sorting (Archive), here is another approach that I came up with. To me it seems to be more in the spirit of XSLT, ie, writing functionally rather than procedurally. Tell me what you think. Topological sort refers to printing the nodes in a graph such that you print a node before you print any nodes that reference that node. Here's an example of a topologically sorted list: <element id="bar"/> <element id="bar2"/> <element id="foo"> <idref id="bar"/> </element> My algorithm is simply:
The algorithm is O(n^2) for a simple XSLT processor, but it would be O(n) if the XSLT processor was smart enough to cache the values returned from the computeWeight(node) function. Does saxon do this? Maybe it would if I used keys. Here is the code. Note that it's XSLT V2 (although it could be written more verbosely in XSLT V1). <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:bill="http://bill.org" version="2.0"> Here's the code to compute the weight of a node (This code doesn't detect circular dependencies, but it should be easy to add. That's left as an exercise to the reader. :-) <xsl:function name="bill:computeWeight" as="xs:integer*"> <xsl:param name="node"/> <!-- generate a sequence containing the weights of each node I reference --> <xsl:variable name="referencedNodeWeights" as="xs:integer*"> <xsl:sequence select="0"/> <xsl:for-each select="$node/idref/@id"> <xsl:sequence> <xsl:value-of select="bill:computeWeight(//element[@id=current()])"/> </xsl:sequence> </xsl:for-each> </xsl:variable> <!-- make my weight higher than any of the nodes I reference --> <xsl:value-of select="max($referencedNodeWeights)+1"/> </xsl:function> Here's the driver code, that sorts the elements according to their weight. <xsl:template match="/"> <xsl:for-each select="top/element"> <xsl:sort select="bill:computeWeight(.)" data-type="number"/> <xsl:copy-of select="."/> </xsl:for-each> </xsl:template> See archive 1), archive 2 and archive 3 The latter is a stable topological sort -- "keeps the cliques together". | |||||||
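The point about caching the computed weights can be illustrated outside XSLT. The Python sketch below mirrors the weight function on the three-element example from the post, with `functools.lru_cache` standing in for the memoisation the author hopes a smart processor would do; the `REFS` table and the function name are assumptions of this sketch, and, like the original, it does not detect circular dependencies.

```python
from functools import lru_cache

# id -> ids it references, transcribed from the <element>/<idref> example.
# "foo" is listed first on purpose so the sort has something to do.
REFS = {"foo": ("bar",), "bar": (), "bar2": ()}


@lru_cache(maxsize=None)
def compute_weight(node_id):
    """One more than the largest weight among the nodes this node references."""
    return 1 + max((compute_weight(ref) for ref in REFS[node_id]), default=0)


order = sorted(REFS, key=compute_weight)
print(order)
print(compute_weight.cache_info())
```

With the cache in place each node's weight is computed once, however many other nodes reference it, which is where the difference between the O(n^2) and O(n) estimates comes from.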
39. | Sorting on names with and without spaces | ||||||
As is so often the case, it's easy in 2.0: <xsl:sort select="if (contains(., ' ')) then substring-after(., ' ') else string(.)"/> The only workaround I can think of for 1.0 is the "infinite-substring" hack. This relies on the fact that if B is a boolean expression, then substring(X, 1 div B, string-length(X)) returns [if (B) then X else ""] So you get something like select="concat( substring(., 1 div not(contains(., ' ')), string-length(.)), substring(substring-after(., ' '), 1 div contains(., ' '), string-length(.)))"/> Dimitre offers: given this xml <names> <span class="person">Jenofsky</span> <span class="person">Jones</span> <span class="person">Zubbard</span> <span class="person">Bob Madison</span> <span class="person">Oscar Madison</span> <span class="person">Felix Unger</span> </names> use this template: <xsl:template match="names"> <xsl:apply-templates select="span"> <xsl:sort select="substring(translate(., ' ', ''), string-length(substring-before(.,' ')) + 1) "/> </xsl:apply-templates> </xsl:template> This calculates the index of the first space, then removes the space(s) from the string, then uses what follows the space as the sorting key. And from Mike Kay, the 2.0 solution: <xsl:template match="names"> <xsl:apply-templates select="span"> <xsl:sort select="if (contains(., ' ')) then substring-after(., ' ') else string(.)" data-type="text" case-order="upper-first"/> </xsl:apply-templates> </xsl:template> |
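All of the variants above compute the same sort key: the text after the first space if there is one, otherwise the whole string. A Python sketch of that key, applied to the names from Dimitre's example, makes the expected order easy to check; `surname_key` is a name invented for this illustration.

```python
def surname_key(name):
    """Text after the first space, or the whole name when there is no space."""
    first, space, rest = name.partition(" ")
    return rest if space else name


# The <span class="person"> values in document order.
names = ["Jenofsky", "Jones", "Zubbard", "Bob Madison", "Oscar Madison", "Felix Unger"]
ordered = sorted(names, key=surname_key)
print(ordered)
```

The two Madisons have equal keys, so they stay in document order, which is also what xsl:sort does with equal keys.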