Grouping

1. Split a long list into groups
2. Muenchian grouping
3. Muenchian grouping explained.
4. Grouping by attribute
5. Grouping, two levels.
6. Grouping in 3
7. Grouping
8. Grouping by a fragment of a date
9. Select first n of a list of elements
10. Positional Grouping solution
11. Grouping by position
12. Grouping Variant (NITF)
13. Grouping at two levels
14. Grouping by first letter
15. Grouping problem
16. Two level grouping
17. Grouping with keys for positional grouping
18. Grouping by substrings
19. Concatenate two elements
20. Grouping by first letter
21. Grouping by two attributes
22. Grouping by text values, to reduce to single values
23. Unique items using Muenchian grouping
24. Grouping and remove duplicates
25. Grouping a flat structure
26. Muenchian grouping from tables.
27. Tree Walking, forward walk
28. Grouping
29. Intersect

Split a long list into groups

Michael Kay

The logic for splitting a list of 100 items into pages of 20 is exactly the same as arranging it in a table with 20 rows. It's basically

<xsl:for-each select="item[position() mod 20 = 1]">
  <page>
    <xsl:for-each 
        select=".|following-sibling::item[position() &lt; 20]">
            ....
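
A minimal complete sketch of the same idea, assuming a hypothetical input whose root element "list" holds "item" children. The root name, the "pages" wrapper and the "page-size" parameter are illustrative and do not come from the original post.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: items per page; assumed to be greater than 1 -->
  <xsl:param name="page-size" select="20"/>

  <!-- Assumes a root element named "list" with "item" children -->
  <xsl:template match="/list">
    <pages>
      <!-- Items 1, 21, 41, ... each start a new page -->
      <xsl:for-each select="item[position() mod $page-size = 1]">
        <page>
          <!-- The starting item plus up to (page-size - 1) following siblings -->
          <xsl:for-each
              select=". | following-sibling::item[position() &lt; $page-size]">
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </page>
      </xsl:for-each>
    </pages>
  </xsl:template>

</xsl:stylesheet>
```

The last page simply holds fewer items when the list length is not a multiple of the page size.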

Muenchian grouping

Wendell Piez

The high-level skinny* on Muenchian grouping:

(* Ednote. And I thought Wendell liked English :-)

1. Establish a way to group the nodes you want grouped

2. Select and process only one (typically the first) of the nodes in each group

3. When you process it, also pick up and process the others in the group

We usually use keys to establish the grouping: it's handy and efficient. (This can also be done with raw XPath, though you'll find performance will degrade on anything but small documents).

Since XPath 1.0 has no direct way to test node identity, the de-duplicating step (2) is usually done with either of two non-obvious techniques:

- compare generate-id of a selected node with the generated id of the first node in the group to which your selected node belongs

- count the nodes in the set formed by the union of a selected node and the first in its group: if it equals 1, they're the same node.
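
The three steps can be sketched as a small stylesheet. The input shape is an assumption made for illustration: a root element "items" containing "item" elements that carry a "category" attribute; the key name "by-category" is likewise hypothetical.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Step 1: index every item by its (hypothetical) category attribute -->
  <xsl:key name="by-category" match="item" use="@category"/>

  <xsl:template match="/items">
    <groups>
      <!-- Step 2: keep only the item that is first in its group -->
      <xsl:for-each select="item[generate-id() =
                                 generate-id(key('by-category', @category)[1])]">
        <group category="{@category}">
          <!-- Step 3: pick up every item that shares the same key value -->
          <xsl:copy-of select="key('by-category', @category)"/>
        </group>
      </xsl:for-each>
    </groups>
  </xsl:template>

</xsl:stylesheet>
```

The count-based variant of step 2 would select item[count(. | key('by-category', @category)[1]) = 1] instead; both should pick the same nodes.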

Muenchian grouping explained.

Jarno Elovirta and David Carlisle

> From Muenchian method of grouping, I always use something like this:
>
> ROW[count(. | key('relacion_x_cobertura', REL_ID)[1]) = 1]

  key('relacion_x_cobertura', REL_ID)

will return the set of nodes whose "relacion_x_cobertura" key value is the same as the string value of the REL_ID element,

  key('relacion_x_cobertura', REL_ID)[1]

of that node-set, select the first,

  . | key('relacion_x_cobertura', REL_ID)[1]

create a union of that node-set with the current node. Remember that a node-set will *not* contain duplicates, so if the current node is the same node as the first one returned by the key, the resulting set will only contain the current node.

  count(. | key('relacion_x_cobertura', REL_ID)[1])

count the number of nodes in the union set,

  count(. | key('relacion_x_cobertura', REL_ID)[1]) = 1

and if the count is equal to one, return boolean true. In effect, check if the current node is the same node as the first one returned by the key.

> - Why the . (dot) is used? why if I omit it, it doesn't work (it
> returns all elements)?

The current node. See <http://www.w3.org/TR/xpath#NT-AbbreviatedStep>.

> - What's the meaning of the | (pipe)?

You haven't read the XPath 1.0 spec, have you? See <http://www.w3.org/TR/xpath#NT-UnionExpr>.

> - What's the meaning of [1]? I have always used things like
> [FIELD_NAME=some_value]. I understand that perfectly, but what about
> placing only that number in the brackets? I tried by using [2] and it
> worked too... or, perhaps I was lucky?

Read the XPath 1.0 spec, you'll feel better in the morning.

David Carlisle adds:

This is explained in Jeni's pages on grouping, but basically:

> Why the . (dot) is used?

. is the current node as always in xpath

> What's the meaning of the | (pipe)?

| is set union. select="a|b" selects all nodes called a and all nodes called b and returns the union of those sets, which means it can often be read as "or": select="a|b" selects all elements called a or b.

> - What's the meaning of [1]?

If a predicate is numeric, it tests the value of position(), so [1] is short for [position() = 1].

select="a[3]" selects the third a child of the current node.

> I tried by using [2] and it worked too... or, perhaps I was lucky?

[2] wouldn't work in general, that would select the second item of each group rather than the first, and in particular if a group only had one item you would get nothing.

You want to test whether the current node is the first item of the group. In the XPath 2 draft that is

. is key('relacion_x_cobertura', REL_ID)[1]

but XPath 1 does not have the "is" operator, or any other direct way to test node identity, so you can use either of two methods:

testing generated ids:

generate-id(.) = generate-id(key('relacion_x_cobertura', REL_ID)[1])

this does a string equality test of the generated ids, which will be equal only if they are the same node

or you can do the test you had

count(. | key('relacion_x_cobertura', REL_ID)[1]) = 1

. | key('relacion_x_cobertura', REL_ID)[1]

is the union of the two nodes . and key('relacion_x_cobertura', REL_ID)[1], so either these nodes are different and the set has two elements, or they are the same node, in which case the set has one element (so count(....) = 1). Note that this test relies on the fact that in this context you know there is some element with key('relacion_x_cobertura', REL_ID). In general you need to check that key('relacion_x_cobertura', REL_ID) is non-empty: if it is the empty set, the union with . would again have only one element in it.

Grouping by attribute

Mike Kay

Can anyone suggest a better way to achieve the following... I need to group the xml elements category and place each category in a separate table.

There is no simple way of doing grouping in XSLT. In some cases it can be done, painfully and generally rather slowly, by testing each item to see if it is the same as the previous item.

There is a proprietary feature in SAXON XSL to do grouping, the saxon:group element. Give it a try.

Grouping, two levels.

Jeni Tennison

I'm having a tough time grouping on an attribute where I need to get
only unique values of the domain attribute for each unique technology.
The problem is that the technology element is unbounded and repeats as a
child of the unbounded product element. Looking at the XML below we see
that tech 1 belongs to both Product 1 and Product 2 and has a domain
attribute of xyz and abc. Whenever I group and output the values to get
each unique domain I always get repeating domain values. So grouping on
Tech 1 I would get 2 values for domain=xyz and 2 values for domain=abc
(since the domain has these values in both Product 1 and Product 2).

I've been trying to set the key as <xsl:key name="domain4tech"
match="technology/@domain" use="technology"/> but this has proved
useless. Any suggestions?

Here's an example of the output I'm looking for.

Technology 1
domain = xyz
domain = abc

Technology 2
domain = xyz
domain = abc
domain = zzz
domain = xxx

Technology 3
domain = xyz
domain = aaa

- -------------------------------
Here's the XML

<document>
 <products>
<product>Product1
 <technology domain="xyz">tech1</technology>
 <technology domain="abc">tech1</technology>
 <technology domain="xyz">tech2</technology>
 <technology domain="xxx">tech2</technology>
 <technology domain="xyz">tech3</technology>
 </product>
<product>Product2
 <technology domain="xyz">tech1</technology>
 <technology domain="abc">tech1</technology>
 <technology domain="zzz">tech2</technology>
 <technology domain="xxx">tech2</technology>
 <technology domain="aaa">tech3</technology>
 </product>
 </products>
</document>

I think that the reason you're running into difficulties is because this is actually a two-level grouping problem. The first level of grouping is grouping all the technology elements by their value (e.g. tech1, tech2, tech3). The second level of grouping is only required to get rid of the duplicates - you need to group the technology elements for each particular technology by their domain attribute.

In XSLT 2.0 terms, the grouping would look like:

  <xsl:for-each-group select="product/technology"
                      group-by=".">
    <xsl:sort select="." />
    <xsl:value-of select="." />
    <xsl:for-each-group select="current-group()"
                        group-by="@domain">
      domain = <xsl:value-of select="@domain" />
    </xsl:for-each-group>
  </xsl:for-each-group>

or, alternatively:

  <xsl:for-each-group select="product/technology"
                      group-by=".">
    <xsl:sort select="." />
    <xsl:value-of select="." />
    <xsl:for-each select="distinct-values(current-group()/@domain)">
      domain = <xsl:value-of select="." />
    </xsl:for-each>
  </xsl:for-each-group>

That's probably not much use to you (unless you're using Saxon 7.0), but it does help see the overall structure of what we're doing.

To do two levels of grouping with keys, you need two keys - one for the first level, one for the second level. The first level key is easy, you're grouping technology elements by their value:

<xsl:key name="tech" match="technology" use="." />

The second level is a little harder, because you need to index the technology elements by *both* their technology and their domain. You can create a key to do this by concatenating the two values by which you want to group together, as follows:

<xsl:key name="tech-by-domain" match="technology"
         use="concat(., '+', @domain)" />

Then you need the old Muenchian trick to get the unique values, and Bob's your uncle:

  <xsl:for-each select="product/technology
                          [generate-id() =
                           generate-id(key('tech', .)[1])]">
    <xsl:sort select="." />
    <xsl:value-of select="." />
    <xsl:for-each
      select="key('tech', .)
                [generate-id() =
                 generate-id(key('tech-by-domain',
                                 concat(., '+', @domain))[1])]">
      domain = <xsl:value-of select="@domain" />
    </xsl:for-each>
  </xsl:for-each>

Grouping in 3

Various



Q expansion.

Given
<field>1</field>
  <field>2</field>
  <field>3</field>
  <field>4</field>
  <field>5</field>...
  <field>n</field>

We desire output:
<tr>
    <td>1</td>
    <td>2</td>
    <td>3</td>
</tr>
<tr>
    <td>4</td>
    <td>5</td>
        ...
    <td>n</td>
</tr>

I.e. in triples.

Steve Tinney offers

<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
  <xsl:call-template name="triples">
    <xsl:with-param name="nodes" select="/*/field"/>
  </xsl:call-template>
</xsl:template>

<xsl:template name="triples">
  <xsl:param name="nodes"/>
  <tr><td><xsl:value-of select="$nodes[1]"/></td>
      <td><xsl:value-of select="$nodes[2]"/></td>
      <td><xsl:value-of 
              select="$nodes[3]"/></td></tr>
  <xsl:if test="count($nodes) > 3">
    <xsl:call-template name="triples">
      <xsl:with-param name="nodes" 
                      select="$nodes[position() > 3]"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>

Steve Muench offers the following, having wrapped the source file in a 'data' wrapper.

<xsl:stylesheet 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:param name="max" select="number(3)"/>

  <xsl:template match="/">
    <html><body><xsl:apply-templates/></body></html>
  </xsl:template>

  <xsl:template match="data">
    <table>
      <tr>
      <xsl:for-each select="field">
        <td><xsl:apply-templates/></td>
        <xsl:if test="position() 
         mod $max = 0 and position()!=last()">
          <xsl:text disable-output-escaping="yes"><![CDATA[</tr><tr>]]></xsl:text>
        </xsl:if>
      </xsl:for-each>
      </tr>
    </table>
  </xsl:template>
</xsl:stylesheet>

Which was critiqued for its use of CDATA ;-), the following being then offered by Nikolai

here's the rewording of the same thing that does the grouping in a single template - two nested loops as in C/Perl.

<xsl:template match="data">
  <table>
    <xsl:for-each select="field[position() mod $max = 1]" >
      <tr>
         <xsl:for-each select="self::field |
          following-sibling::field[position() &lt; $max]">
            <td><xsl:apply-templates/></td>
         </xsl:for-each>
      </tr>
    </xsl:for-each>
  </table>
</xsl:template>

Mike Kay then offers

<xsl:template match="field[position() 
             mod 3 = 1]" priority="2">
  <tr>
    <td><xsl:value-of select="."/></td>
    <td><xsl:value-of 
        select="following-sibling::field[1]"/></td>
    <td><xsl:value-of 
        select="following-sibling::field[2]"/></td>
  </tr>
</xsl:template>

<xsl:template match="field" priority="1"/>

This also generates empty <td> elements to fill up the last <tr> row.

Grouping

Mike Kay


We want a result like this:

    Installed Software

  Program         Version
  ------------------------
  Emacs ......... 19.34 ..
  Emacs ......... 20.3 ...
  Emacs ......... 20.4 ...
  JDK %%%%%%%%%%% 1.1.2 %%
  JDK %%%%%%%%%%% 1.2 %%%%
  XEmacs ........ 20.4 ...
  XEmacs ........ 21.1.9 .

I have tried to visualize the background color with the characters "." and "%". This allows you to see, at a glance, which lines belong to the same program.

I'd tackle it like this (I don't recall the exact shape of your data so adapt as necessary):

1. create a set of distinct programs:
var progs select=//program
var distinct_progs select=$progs[not(@name=preceding::program/@name)]

2. iterate through this in sorted order:
for-each select=$distinct_progs
  sort select=@name
    var color choose when position() mod 2 = 0 red otherwise blue 
    for-each version of this program
      display this program version in the current background color.
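
One way the outline above might look as real XSLT 1.0. This is a sketch under stated assumptions, not the original poster's data: it supposes "program" elements that are never nested inside one another and that carry "name" and "version" attributes, and it emits HTML table rows with a bgcolor attribute.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- Assumed data shape: <program name="..." version="..."/> -->
    <xsl:variable name="progs" select="//program"/>
    <table>
      <!-- Step 1: one program element per distinct name -->
      <xsl:for-each select="$progs[not(@name = preceding::program/@name)]">
        <!-- Step 2: visit the distinct names in sorted order -->
        <xsl:sort select="@name"/>
        <!-- position() here is the position in the sorted sequence -->
        <xsl:variable name="color">
          <xsl:choose>
            <xsl:when test="position() mod 2 = 0">red</xsl:when>
            <xsl:otherwise>blue</xsl:otherwise>
          </xsl:choose>
        </xsl:variable>
        <!-- Every version of this program gets the same background color -->
        <xsl:for-each select="$progs[@name = current()/@name]">
          <tr bgcolor="{$color}">
            <td><xsl:value-of select="@name"/></td>
            <td><xsl:value-of select="@version"/></td>
          </tr>
        </xsl:for-each>
      </xsl:for-each>
    </table>
  </xsl:template>

</xsl:stylesheet>
```

The preceding:: test is simple but slow on large inputs; a key-based (Muenchian) selection of the distinct names would be the usual replacement.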

Grouping by a fragment of a date

Steve Muench


<rs:data>
		<z:row iID="1" dCreated="1900-01-01T01:00:00"/>
		<z:row iID="2" dCreated="1900-01-02T01:00:00"/>
		<z:row iID="3" dCreated="1900-01-02T02:00:00"/>
		<z:row iID="4" dCreated="1900-01-04T01:00:00"/>
	</rs:data>
</xml>

Using XSLT, how is it possible to get:

	1900-01-01	- Item1
	1900-01-02	- Item2 Item3
	1900-01-04	- Item4

you should be able to use the substring-before() function in the <xsl:key> declaration like this:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882"
   xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
   xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema">


   <xsl:output indent="yes"/>


   <xsl:key name="foo"
      match="/xml/rs:data/z:row"
      use="substring-before(@dCreated,'T')"/>


 <xsl:template match="/">
   <RowsByCreatedDate>
     <xsl:for-each
      select="/xml/rs:data/z:row[generate-id(.)=
                                 generate-id(
key('foo',substring-before(@dCreated,'T')))]">
     <Created date="{substring-before(@dCreated,'T')}">
       <xsl:for-each select="key('foo',substring-before(@dCreated,'T'))">
         <xsl:copy-of select="."/>
       </xsl:for-each>
     </Created>
   </xsl:for-each>
   </RowsByCreatedDate>
 </xsl:template>
</xsl:stylesheet>

Select first n of a list of elements

Linus-Lin

How to write an xsl file that can take only 5 items from an xml file that has 30 items in it... I am taking a newsfeed in XML, but the feed itself has 30 stories, and I only want to display the first 5. Is that possible?

A possible solution would be this (there are plenty of solutions, depending on your XML):

If your NEWSTORY elements have been sorted by priority, get all stories up to but not including the 6th.

<xsl:template match="NEWSTORIES">
  <xsl:apply-templates select="NEWSTORY[position() &lt; 6]"/>
</xsl:template>

Positional Grouping solution

Mike Kay

when I have something like

<name>Tom</name>
<remark>1</remark>
<remark>2</remark>
<remark>3</remark>

<name>John</name>
<remark>4</remark>
<remark>5</remark>
<remark>6</remark>

is there a way of creating something like

Tom: 1,2,3
John: 4,5,6   ?

This is an example of a positional grouping problem.

One solution is:

<xsl:for-each select="name">
  <tr><td><xsl:value-of select="."/></td>
  <td>
  <xsl:for-each select="following-sibling::remark[
              count(preceding-sibling::name[1] | current()) = 1]">
    <xsl:value-of select="."/>,
  </xsl:for-each>
  </td>
  </tr>
</xsl:for-each>

The "count(x | y) = 1" idiom works because the result of the "|" operator is a set containing the nodes on either side with no duplicates, so it can be used to test whether two nodes are identical.

DaveP. I had used this idiom, without understanding it before, but when I came to a variant situation I stared at the preceding paragraph for twenty minutes before I came to understand it. After a couple of questions which Jeni kindly answered I offer my take on this rather elegant idea. I simply wanted to wrap a certain subset of the flat structure, breaking the input at certain input elements (it was the output of an Omnimark transform).

Note that in the source XML there are two kinds of elements, wrappers-to-be, and wrapped, i.e. those that will go inside the new wrapper element. The outer for-each picks up, in turn, each of the wrappers-to-be, and outputs a wrapper (Jeni's example simply outputs a td element).

The inner for-each picks up all following-siblings until the predicate is false. This is where I became stuck. It iterates over the following-siblings, but the predicate looks at the preceding-sibling axis. My take on this is that the predicate becomes false when the item being addressed no longer has the context node as its specifically named first preceding-sibling. Jeni's example uses the <name> element. So it stops iterating when it reaches a node which is not number one on the preceding axis, and named 'name'. The example then wraps this up in the idiom language, which makes it elegant, works and all, but didn't make sense to my tiny brain. HTH, DaveP

Grouping by position

Jeni Tennison


> I have a calendar that displays 12 months of a year. Currently, all
> 12 months display across the page as one row. Instead, I'd like to
> arrange them a 3 rows with 4 months in each row. Any ideas how to do
> this: <tr>4 MONTHS HERE</tr>?

This is a grouping-by-position problem: you want to group the months according to their position within the CALENDAR element.

As with any grouping problem, you can break it down into two steps: 1. finding the first node in a group 2. processing the group

The usual way of finding the first node in a group based on position is to use the mod operator on the position of the node. If you want to group into groups of 4, then the position of the first node in each group mod 4 will equal 1. In your case, you can use the XPath:

  MONTH[position() mod 4 = 1]

to select the months that are first in each row. I'd probably select these by applying templates in 'row' mode inside the CALENDAR-matching template:

<xsl:template match="CALENDAR">
  <xsl:apply-templates select="MONTH[position() mod 4 = 1]"
                       mode="row" />
</xsl:template>

Then create a template that matches MONTHs in 'row' mode. Because you've only selected the first in the group, this template will only fire once per row. This template needs to create a row and then apply templates to each of the months in the group. The group of months consists of the MONTH you're currently on and its next 3 siblings, i.e.:

  . | following-sibling::MONTH[position() < 4]

So it should look something like:

<xsl:template match="MONTH" mode="row">
  <tr>
    <xsl:apply-templates
      select=". | following-sibling::MONTH[position() &lt; 4]" />
  </tr>
</xsl:template>

This will then use the MONTH-matching template that you already have.

If you want to, you can separate out the number of months that you want in each row into a variable or parameter that you can change during development, or even let the user change dynamically.
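
A sketch of that parameterised version, assuming a hypothetical global parameter named "per-row" and a stand-in MONTH template that just emits a table cell (your existing MONTH template would take its place). The table wrapper is also an assumption.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: months per row; assumed to be greater than 1 -->
  <xsl:param name="per-row" select="4"/>

  <xsl:template match="CALENDAR">
    <table>
      <!-- Select only the month that starts each row -->
      <xsl:apply-templates select="MONTH[position() mod $per-row = 1]"
                           mode="row"/>
    </table>
  </xsl:template>

  <xsl:template match="MONTH" mode="row">
    <tr>
      <!-- This month plus up to (per-row - 1) following months -->
      <xsl:apply-templates
        select=". | following-sibling::MONTH[position() &lt; $per-row]"/>
    </tr>
  </xsl:template>

  <!-- Stand-in for the MONTH template you already have -->
  <xsl:template match="MONTH">
    <td><xsl:value-of select="."/></td>
  </xsl:template>

</xsl:stylesheet>
```

The parameter is used only in select expressions, not in match patterns, because XSLT 1.0 does not allow variable references in patterns.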

Grouping Variant (NITF)

Jeni Tennison

> the NITF has a <content.body> tag which is equivalent to
> HTML's <body> tag.
> However, its children are far more rigidly defined in that it
> only allows elements as children.
>
> > I need to get the line:
> this is <em>emphasis</em> some more <b>text</b><br/><br/>
> > to end up wrapped in <p> tags (preferably without the <br>s)
> >
> > For clarity, the children of the body are:
>      p
>      ul
> |    text()
> |    em
> |    text()
> |    b
> |    br
> |    br
>      p
>
>> I need to work with those tags that  have the | beside them as a single
>> block so that I can wrap the entire thing in a <p> tag. 
>> Since I don't know
>> the placement or the order or even the frequency of such situations (there
>> is no reason why I couldn't have more blocks that need to be grouped
>> together). The solution needs to be general.

Example input

<body>
<p> this is some text</p>
<ul>
<li>item 1</li>
</ul>
this is <em>emphasis</em> some more <b>text</b><br/><br/>
<p>This is a new paragraph</p>
</body>

Here's a single-pass solution that steps through the nodes one by one to work out what to do. The first thing to do is get rid of all that insignificant whitespace - otherwise you'll get lots of paragraphs containing nothing but whitespace:

<xsl:strip-space elements="*" />

The next is to set the thing going with a body-matching template. This creates a copy of the body and then starts the ball rolling by applying templates to its first child:

<xsl:template match="body">
   <body>
      <xsl:apply-templates select="node()[1]" />   
   </body>
</xsl:template>

Now, if applying templates like this comes across something that you want to keep as it is, you just want to copy it before moving on to the next node. Here I've just listed the elements that you said were valid directly under a body element:

<xsl:template match="p|table|ul|ol">
   <xsl:copy-of select="." />
   <xsl:apply-templates select="following-sibling::node()[1]" />
</xsl:template>

Now, when you come across something else, you want to create a p element and place the next bunch of misfits inside it. Creating the p element is easy. Inside it, I apply templates to the current node in 'copy' mode. 'copy' mode is my mode for copying the misfits and moving on to the next. Then I apply templates to the next sibling that's one of the acceptable elements:

<xsl:template match="*|text()">
   <p>
      <xsl:apply-templates select="." mode="copy" />
   </p>
   <xsl:apply-templates select="following-sibling::*[self::p or 
   self::table or self::ul or self::ol][1]" />
</xsl:template>

For 'copy' mode, I want to make a copy of the matched node, and then move on to the next, but only if the next is a misfit node, not if it's acceptable. I could have done it with a big XPath, but it's a bit clearer to use an xsl:if:

<xsl:template match="*|text()" mode="copy">
   <xsl:copy-of select="." />
   <xsl:if test="not(following-sibling::node()[1]
	  [self::p or self::table or self::ul or self::ol])">
      <xsl:apply-templates 
	  select="following-sibling::node()[1]" mode="copy" />
   </xsl:if>
</xsl:template>

[Aside: I think that's the first time I've used a stepping-through solution to one of these problems. Usually I use a Muenchian method, keying on the preceding acceptable element. This way is actually a lot smoother and easier to understand.]

As Mike suggested, you might want to do a two-pass solution in which you essentially label the nodes that are 'misfits' as opposed to those that are valid in the context. That makes it a little easier to know which node to move on to next.

Grouping at two levels

Jeni Tennison

To do grouping at two levels, you need the key for the second level to take into account the first level of grouping as well. You need to have two keys - the first as you have it, indexing the PROROW elements by name, the second indexing the PROROW elements by name *and* project_name. You can create a key that indexes an element by two values by combining the values with the concat() function:

<xsl:key name="rows-by-name" match="PROROW" use="name"/>
<xsl:key name="rows-by-name-and-project_name" match="PROROW"
use="concat(name, '+', project_name)"/>

Then amend your first-level grouping template (which you have in the 'other' mode) so that it applies templates to all those PROROW elements of the same name (as retrieved with the first key), with a unique value according to the second key:

<xsl:template match="PROROW" mode="other">
  <b><xsl:value-of select="name" /></b>
  <xsl:apply-templates mode="again"
    select="key('rows-by-name', name)
              [generate-id(.) =
               generate-id(key('rows-by-name-and-project_name',
                               concat(name, '+', project_name)))]" />
</xsl:template>

And there you have it.

Grouping by first letter

J.Pietschmann

Here is a pure XSLT 1.0 solution. It's somewhat convoluted; a solution in XSLT 2.0 or using xx:node-set() could get rid of the recursion and the Piez method for simulating iteration, and would therefore probably be easier to read.

I used a simplified XML

  <?xml version="1.0"?>
  <counties>
    <county>Anderson</county>
    <county>Bailey</county>
    ...
  </counties>

in order to shorten the XPath expressions, which are already unwieldy enough.

The core is the recursive template which accumulates both county elements with all different initials in "$startlist" and the maximum number of counties starting with a particular initial in "$maxcount".

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:key name="county-initial" match="county" use="substring(.,1,1)"/>
  
  <xsl:template match="counties">
    <table>
      <xsl:call-template name="accumulate">
        <xsl:with-param name="countylist" select="county"/>
        <xsl:with-param name="startlist" select="/.."/>
        <xsl:with-param name="maxcount" select="0"/>
      </xsl:call-template>
    </table>
  </xsl:template>

  <xsl:template name="accumulate">
    <xsl:param name="countylist"/>
    <xsl:param name="startlist"/>
    <xsl:param name="maxcount"/>
    <xsl:choose>
      <xsl:when test="$countylist">
        <xsl:variable name="initial"
select="substring($countylist[1],1,1)"/>
        <xsl:variable name="currentcount"
          select="count(key('county-initial',$initial))"/>
        <xsl:choose>
          <xsl:when test="$currentcount &gt; $maxcount">
            <xsl:call-template name="accumulate">
              <xsl:with-param name="countylist"
select="$countylist[not(substring(.,1,1)=$initial)]"/>
              <xsl:with-param name="startlist"
select="$startlist|$countylist[1]"/>
              <xsl:with-param name="maxcount" select="$currentcount"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="accumulate">
              <xsl:with-param name="countylist"
select="$countylist[not(substring(.,1,1)=$initial)]"/>
              <xsl:with-param name="startlist"
select="$startlist|$countylist[1]"/>
              <xsl:with-param name="maxcount" select="$maxcount"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <!-- use Wendell Piez's method for iterating over 1..$maxcount -->
        <xsl:for-each select="(//node())[position() &lt;= $maxcount]">
          <xsl:variable name="index" select="position()"/>
          <tr>
            <xsl:for-each select="$startlist">
              <xsl:for-each
select="key('county-initial',substring(.,1,1))[$index]">
                <td><xsl:value-of select="."/></td>
              </xsl:for-each>
            </xsl:for-each>
          </tr>
        </xsl:for-each>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

As the lists auto documentation demon seems to be down, ask again if something is unclear or too tricky.

Grouping problem

Dimitre Novatchev.

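> I have the following XML:
> <a>
> <b>4</b>
> <b>9</b>
> <b>6</b>
> <b>1</b>
> <b>8</b>
> <b>6</b>
> <b>4</b>
> <b>7</b>
> </a>
>
> I am trying to generate the following output:
> <a>
> <b rank="1">1</b>
> <b rank="2">4</b>
> <b rank="2">4</b>
> <b rank="3">6</b>
> <b rank="3">6</b>
> <b rank="4">7</b>
> <b rank="5">8</b>
> <b rank="6">9</b>
> </a>
>
> Using xsl:for-each with a sort on b and then using position() I can
> get a ranking from 1 to 8 but I have no idea how to achieve the
> above.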

This is a grouping problem. You have to take just the elements with distinct values -- their positions in the node-list will determine the rank. Then for each such distinct element you have to produce all other elements with the same value -- and to assign to them the same rank.

Below is the stylesheet that does this:

<xsl:stylesheet version="1.0" 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 
 <xsl:key name="kRanking" match="b" use="."/>
 
  <xsl:template match="/">
    <a>
     <xsl:for-each select="/a/b[generate-id() 
                              = 
                               generate-id(key('kRanking',.)[1])
                               ]">
       <xsl:sort select="." data-type="number"/>
       
       <xsl:variable name="vPos" select="position()"/>
       
       <xsl:for-each select="key('kRanking',.)">
         <b rank="{$vPos}">
           <xsl:value-of select="."/>
         </b>
       
       </xsl:for-each>
     </xsl:for-each>
    </a>
  </xsl:template>
</xsl:stylesheet>

With your source XML it produces exactly the desired result:

<a>
   <b rank="1">1</b>
   <b rank="2">4</b>
   <b rank="2">4</b>
   <b rank="3">6</b>
   <b rank="3">6</b>
   <b rank="4">7</b>
   <b rank="5">8</b>
   <b rank="6">9</b>
</a>

Two level grouping

Jeni Tennison

> Here is my XSL with first level grouping extracting distinct regions
> and i am having problems grouping users within Region .

The secret of 2nd level grouping with the Muenchian method is to create keys that combine the two things that you want to group by. Your first level key is:

<xsl:key name="distinct-region" match="*" use="@reg"/>

(Though I think it would be better as:

<xsl:key name="distinct-region" match="los" use="@reg"/>

since that would limit it to only holding los elements.) So your second level key should be something along the lines of:

<xsl:key name="distinct-region-and-user" match="los"
         use="concat(@reg, '+', @user)" />

Then, given that you've found a region ($reg), you can get all the unique users in that region with:

  key('distinct-region', $reg)
    [generate-id() =
     generate-id(key('distinct-region-and-user',
                     concat($reg, '+', @user)))]
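
Put together, the two keys might be used like this. It is a sketch only: it assumes the los elements are children of the document element, and the output element names (regions, region, user) are invented for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:key name="distinct-region" match="los" use="@reg"/>
  <xsl:key name="distinct-region-and-user" match="los"
           use="concat(@reg, '+', @user)"/>

  <!-- Assumes los elements are children of the document element -->
  <xsl:template match="/*">
    <regions>
      <!-- First level: one los per distinct region -->
      <xsl:for-each select="los[generate-id() =
                                generate-id(key('distinct-region', @reg)[1])]">
        <xsl:variable name="reg" select="@reg"/>
        <region name="{$reg}">
          <!-- Second level: one los per distinct user within this region -->
          <xsl:for-each select="key('distinct-region', $reg)
                                  [generate-id() =
                                   generate-id(key('distinct-region-and-user',
                                                   concat($reg, '+', @user))[1])]">
            <user name="{@user}"/>
          </xsl:for-each>
        </region>
      </xsl:for-each>
    </regions>
  </xsl:template>

</xsl:stylesheet>
```

The '+' separator is safe only if it cannot occur inside a region or user value; otherwise pick a character that cannot appear in either.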

Grouping with keys for positional grouping

Dimitre Novatchev.

> I'm still shaky on grouping with keys so I've probably missed
> something obvious, but I can't get a grouping to work when it has to
> group on a recursive structure. The input looks essentially like the
> following:
>
> <list>
> <a type="1" flag="false"/>
> 
> <c type="3" flag="false">
> <d type="4" flag="true"/>
> <e type="4" flag="true"/>
> <f type="5" flag="true"/>
> <g type="5" flag="true"/>
> </c>
> </list>
> <list>
> <a type="1" flag="false"/>
> 
> <c type="3" flag="false">
> <d type="7" flag="false">
> <e type="4" flag="true"/>
> <e type="4" flag="true"/>
> </d>
> </c>
> </list>
>
> Where, if there are adjacent nodes with the same type, then the flag
> will be true; otherwise the flag will always be false. It could be
> possible for adjacent nodes to have the same type with the flag set to
> false. The same structure could go many more levels deep than shown
> here. The desired output is
>
> <list>
> <a type="1" flag="false"/>
> <b type="2" flag="false"/>
> <c type="3" flag="false">
> <group>
> <d type="4" flag="true"/>
> <e type="4" flag="true"/>
> </group>
> <group>
> <f type="5" flag="true"/>
> <g type="5" flag="true"/>
> </group>
> </c>
> </list>
> <list>
> <a type="1" flag="false"/>
> <b type="2" flag="false"/>
> <c type="3" flag="false">
> <d type="7" flag="false">
> <group>
> <e type="4" flag="true"/>
> <e type="4" flag="true"/>
> </group>
> </d>
> </c>
> </list>
>
> Where any adjacent nodes of the same type and with "flag" = true are
> enclosed in a group.

This is a positional grouping problem. Here's one possible solution:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:key name="kGrp1" match="a|b|c|d|e|f|g"
   use="number(
               @flag = 'true'
              and
                not(@type = preceding-sibling::*[1]/@type)
              and
                following-sibling::*[1]/@flag = 'true'
              and
                @type = following-sibling::*[1]/@type
                )"/>
  
  <xsl:strip-space elements="*"/>
  <xsl:template match="/ | @* | node()">
    <xsl:copy>
      <xsl:apply-templates  select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  
  <xsl:template match="a|b|c|d|e|f|g">
    <xsl:choose>
      <xsl:when test="count(. | key('kGrp1', '1')) 
                     = 
                      count(key('kGrp1', '1'))">
        <group>
          <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
          </xsl:copy>
          
          <xsl:variable name="vOutOfGroupSibling" 
              select="following-sibling::*[not(@type = current()/@type
                                              and
                                               @flag = 'true'
                                               )
                                           ][1]"/>
          <xsl:variable name="vGroupLength">
            <xsl:choose>
              <xsl:when test="$vOutOfGroupSibling">
                <xsl:value-of 
                select="count($vOutOfGroupSibling/preceding-sibling::*)
                      - count(preceding-sibling::*)"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="count(../*) 
                                    - count(preceding-sibling::*)"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:variable>
          
          <xsl:apply-templates mode="inGroup" 
              select="following-sibling::*
                                  [position() &lt; $vGroupLength]"/>
        </group>
      </xsl:when>
      <xsl:when test="not(@type = preceding-sibling::*[1]/@type
                        and @flag = 'true'
                        and preceding-sibling::*[1]/@flag = 'true'  
                          )">
        <xsl:copy>
           <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
  
  <xsl:template match="a|b|c|d|e|f|g" mode="inGroup">
    <xsl:copy>
       <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

This transformation works correctly on your original xml file. It also works with nested groupings, e.g. when applied on the following source xml:

<lists>
  <list>
    <a type="1" flag="false"/>
    <b type="2" flag="false"/>
    <c type="3" flag="true">
      <d type="4" flag="true"/>
      <e type="4" flag="true"/>
      <f type="5" flag="true"/>
      <g type="5" flag="true"/>
    </c>
    <c type="3" flag="true"/>
  </list>
  <list>
    <a type="1" flag="false"/>
    <b type="2" flag="false"/>
    <c type="3" flag="false">
      <d type="7" flag="false">
        <e type="4" flag="true"/>
        <e type="4" flag="true"/>
      </d>
    </c>
  </list>
</lists>

the result correctly contains nested groups:

<lists>
   <list>
      <a type="1" flag="false"/>
      <b type="2" flag="false"/>
      <group>
         <c type="3" flag="true">
            <group>
               <d type="4" flag="true"/>
               <e type="4" flag="true"/>
            </group>
            <group>
               <f type="5" flag="true"/>
               <g type="5" flag="true"/>
            </group>
         </c>
         <c type="3" flag="true"/>
      </group>
   </list>
   <list>
      <a type="1" flag="false"/>
      <b type="2" flag="false"/>
      <c type="3" flag="false">
         <d type="7" flag="false">
            <group>
               <e type="4" flag="true"/>
               <e type="4" flag="true"/>
            </group>
         </d>
      </c>
   </list>
</lists>
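
For comparison, here is a rough sketch of the same grouping with XSLT 2.0's xsl:for-each-group. It is a different technique from the key-based solution above, not part of the original answer, and it assumes element-only content (text mixed in between the elements would be dropped). Adjacent siblings are grouped on the combination of @type and @flag, and a group wrapper is only written when the flag is true and the run holds more than one element.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- elements with element children: copy, then group the children -->
  <xsl:template match="*[*]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each-group select="*"
                          group-adjacent="concat(@type, '|', @flag)">
        <xsl:choose>
          <!-- a run of two or more flagged siblings of the same type -->
          <xsl:when test="@flag = 'true'
                          and count(current-group()) &gt; 1">
            <group>
              <xsl:apply-templates select="current-group()"/>
            </group>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- leaf elements are copied unchanged -->
  <xsl:template match="*">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

Because the members of each group are processed with xsl:apply-templates, the same rule is applied again one level down, which is what should give the nested groups for the sample above.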

18.

Grouping by substrings

Jeni Tennison

> I have a grouping problem I cannot work out. I have done a couple of
> stylesheets with keys but this one has me befuddled.
>
> I am having trouble declaring and using a key like this; (I am new
> to keys so go easy here)
>
> <xsl:key name="xprogs" match="doc/prog[contains(.,'msx']"
> use="substring(.,4,1)"/>
>
> My algorithm is
> a). Group by 3rd character of prog node ie 'x' in msx123 followed
> by 'y' in msy123
>
> b). For each unique 4th character of prog node list prog nodes 4
> (or ideally custom value) per line separated by a single space
> (produce new line when a different 4th character encountered).

OK, it sounds as though you want to group by the third *and* fourth characters. You want to group *all* the prog elements rather than just those that contain 'msx', so your match attribute should match all prog elements. And if you want the third and fourth characters, then you need substring(., 3, 2):

<xsl:key name="progs" match="prog" use="substring(., 3, 2)" />

Then you have to select one prog element for each distinct letter-number combination, namely the first prog (in document order) that has that combination:

<xsl:template match="doc">
  <xsl:for-each select="prog[generate-id() =
                             generate-id(key('progs',
                                             substring(., 3, 2))[1])]">
    <xsl:variable name="progs"
                  select="key('progs', substring(., 3, 2))" />
    ...
  </xsl:for-each>
</xsl:template>

Once you've got that set of $progs together, you can group them by their position. Say you set a global parameter to the number you want in each group:

<xsl:param name="nprogs" select="4" />

then you can loop through the $progs and use position() mod $nprogs to work out whether you need to add a newline or a space before the value of the particular prog:

<xsl:template match="doc">
  <xsl:for-each select="prog[generate-id() =
                             generate-id(key('progs',
                                             substring(., 3, 2))[1])]">
    <xsl:variable name="progs"
                  select="key('progs', substring(., 3, 2))" />
    <xsl:for-each select="$progs">
      <xsl:choose>
        <xsl:when test="position() mod $nprogs = 1">
          <xsl:text>&#xA;</xsl:text>
        </xsl:when>
        <xsl:otherwise>
          <xsl:text> </xsl:text>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:value-of select="." />
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>

19.

Concatenate two elements

Jeni Tennison

> I have been bashing my brain for days over this and I need help.
> Here is the (style of) input I have:
>
> <record n="1" type="normal">
> <foo> <x>... <y>...</y> ...</x> </foo>
> <bar> <things> ... </things> </bar>
> </record>
> <record n="2" type="normal">
> <foo> <x>... <y>...</y> ...</x> </foo>
> </record>
> <record n="3" type="continuation">
> <bar> <things> ... </things> </bar>
> </record>
> <record n="4" type="normal">
> <foo> <x>... <y>...</y> ...</x> </foo>
> <bar> <things> ... </things> </bar>
> </record>
>
> The problem is <record>s 2 and 3: they need to be concatenated.

This isn't *too* hard (wait 'til you get on to the really tricky grouping problems ;), "just" requires you to think in a declarative way rather than a procedural way. You want to treat records with a type of 'continuation' in a different way from those with a type of 'normal', so you need separate templates for the two types:

<xsl:template match="record[@type = 'normal']">
  ...
</xsl:template>

<xsl:template match="record[@type = 'continuation']">
  ...
</xsl:template>

When you come across a record with a type of 'normal', you want to create a record element and copy the content of that record into it. You also want to include the content of the next sibling record, if it's of type 'continuation':

<xsl:template match="record[@type = 'normal']">
  <record>
    <xsl:copy-of select="*" />
    <xsl:copy-of select="following-sibling::record[1]
                           [@type = 'continuation']/*"/>
  </record>
</xsl:template>

On the other hand, if the record is of type 'continuation' then you don't want to generate a record, and you don't have to worry about the content of the record because it's already been taken care of by the previous record. So you do nothing:

<xsl:template match="record[@type = 'continuation']" />

Personally, in this situation, I'd only apply templates to the records whose type is 'normal' in the first place, so I'd have something like:

<xsl:template match="records">
  <xsl:apply-templates select="record[@type = 'normal']" />
</xsl:template>

<!-- this template will only be applied to records with a type of
     'normal' -->
<xsl:template match="record">
  <record>
    <xsl:copy-of select="*" />
    <xsl:copy-of select="following-sibling::record[1]
                           [@type = 'continuation']/*"/>
  </record>
</xsl:template>

Note that this method only works if there's only one continuation for each normal record. If there might be more, then I'd use a key-based solution where you index each continuation record by its closest normal record and use that to identify which extra fields need to be added to the new record. If you need help with that, let us know.
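
A minimal sketch of that key-based variant follows. It is not from the original thread; the key name continuations and the records wrapper in the output are assumptions for illustration. Each continuation record is indexed by the generated id of its nearest preceding normal record, so a normal record can pick up any number of continuations.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- index every continuation by the id of the closest preceding
       normal record ([1] on a reverse axis is the nearest node) -->
  <xsl:key name="continuations"
           match="record[@type = 'continuation']"
           use="generate-id(preceding-sibling::record
                              [@type = 'normal'][1])"/>

  <xsl:template match="records">
    <records>
      <xsl:apply-templates select="record[@type = 'normal']"/>
    </records>
  </xsl:template>

  <!-- only applied to records with a type of 'normal' -->
  <xsl:template match="record">
    <record>
      <xsl:copy-of select="*"/>
      <!-- content of all continuations that belong to this record -->
      <xsl:copy-of select="key('continuations', generate-id())/*"/>
    </record>
  </xsl:template>
</xsl:stylesheet>
```

The key does the work that following-sibling::record[1] did in the simpler version, so the record template stays the same size however many continuations follow.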

20.

Grouping by first letter

Joerg Pietschmann

  <?xml version="1.0"?>
  <counties>
    <county>Anderson</county>
    <county>Bailey</county>
    ...
  </counties>

This simplified input is used in order to shorten the XPath expressions; they are unwieldy enough already.

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:key name="county-initial" match="county" use="substring(.,1,1)"/>
  
  <xsl:template match="counties">
    <table>
      <xsl:call-template name="accumulate">
        <xsl:with-param name="countylist" select="county"/>
        <xsl:with-param name="startlist" select="/.."/>
        <xsl:with-param name="maxcount" select="0"/>
      </xsl:call-template>
    </table>
  </xsl:template>

  <xsl:template name="accumulate">
    <xsl:param name="countylist"/>
    <xsl:param name="startlist"/>
    <xsl:param name="maxcount"/>
    <xsl:choose>
      <xsl:when test="$countylist">
        <xsl:variable name="initial"
          select="substring($countylist[1],1,1)"/>
        <xsl:variable name="currentcount"
          select="count(key('county-initial',$initial))"/>
        <xsl:choose>
          <xsl:when test="$currentcount &gt; $maxcount">
            <xsl:call-template name="accumulate">
              <xsl:with-param name="countylist"
                select="$countylist[not(substring(.,1,1)=$initial)]"/>
              <xsl:with-param name="startlist"
                select="$startlist|$countylist[1]"/>
              <xsl:with-param name="maxcount" select="$currentcount"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="accumulate">
              <xsl:with-param name="countylist"
                select="$countylist[not(substring(.,1,1)=$initial)]"/>
              <xsl:with-param name="startlist"
                select="$startlist|$countylist[1]"/>
              <xsl:with-param name="maxcount" select="$maxcount"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <!-- use Wendell Piez's method for iterating over 1..$maxcount -->
        <xsl:for-each select="(//node())[position() &lt;= $maxcount]">
          <xsl:variable name="index" select="position()"/>
          <tr>
            <xsl:for-each select="$startlist">
              <xsl:for-each
                select="key('county-initial',substring(.,1,1))[$index]">
                <td><xsl:value-of select="."/></td>
              </xsl:for-each>
            </xsl:for-each>
          </tr>
        </xsl:for-each>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

As the list's auto-documentation demon seems to be down, ask again if something is unclear or too tricky.

21.

Grouping by two attributes

Mike Brown

> I have an Xml list, for example this list of skills:
>
> <skills>
> <skill mark="excellent" name="excellentskill">
> <skill mark="excellent" name="excellent skill">
> <skill mark="good" name="goodskill">
> <skill mark="good" name="goodskill">
> <skill mark="basic" name="basicskill">
> <skill mark="basic" name="basicskill">
> <skill mark="excellent" name="excellentskill">
> <skill mark="good" name="goodskill">
> <skill mark="basic" name="basicskill">
> </skills>
>
> and I want to list in groups based on 'mark' attribute:
> [...]
> What I want to obtain is an xhtml list like this
>
> excellent skills: excellentskill
> excellentskill
> excellentskill
> good skills: goodskill
> goodskill
> goodskill
> basic skills: basicskill
> basicskill
> basicskill
>
> using <div>s or <table>s

Grouping problems are a FAQ. The most efficient solution is explained at jenitennison.com, and in your case would look something like this:

  <xsl:key name="skills-by-mark" match="skill" use="@mark"/>
  <xsl:template match="skills">
    <table>
      <!-- process a set consisting of the first skill element for each mark -->
      <xsl:for-each
          select="skill[count(.|key('skills-by-mark',@mark)[1])=1]">
        <tr>
          <td><b><xsl:value-of select="concat(@mark,' skills:')"/></b></td>
          <td>
            <!-- process all skill elements having the current skill's mark -->
            <xsl:for-each select="key('skills-by-mark',@mark)">
              <xsl:value-of select="@name"/>                
              <xsl:if test="position()!=last()"><br/></xsl:if>
            </xsl:for-each>
          </td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>

There is an easier to understand, but less efficient, way that doesn't use
keys, but does the same thing, first identifying the set of skill elements
that are the first ones with that mark, and then for each of those,
finding the rest with that mark.

<xsl:for-each select="skill[not(@mark=preceding-sibling::skill/@mark)]">
  ...
  <xsl:for-each select=".|following-sibling::skill[@mark=current()/@mark]">
    ...
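
Written out in full, the key-less version could look like the sketch below. The table markup is simply borrowed from the key-based stylesheet above, so treat the layout as an assumption rather than as part of the original answer.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="skills">
    <table>
      <!-- the first skill element for each mark: no earlier sibling
           carries the same mark -->
      <xsl:for-each
          select="skill[not(@mark = preceding-sibling::skill/@mark)]">
        <tr>
          <td><b><xsl:value-of select="concat(@mark, ' skills:')"/></b></td>
          <td>
            <!-- that skill plus every later sibling with the same mark;
                 current() is the skill selected by the outer loop -->
            <xsl:for-each
                select=". | following-sibling::skill
                              [@mark = current()/@mark]">
              <xsl:value-of select="@name"/>
              <xsl:if test="position() != last()"><br/></xsl:if>
            </xsl:for-each>
          </td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

Every skill element scans its preceding siblings here, which is why this approach slows down on long lists where the key-based one does not.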

22.

Grouping by text values, to reduce to single values

Eliot Kimber

I have an XML file which is going to be used as a "dictionary" for an internationalised web application. The structure of the file is like so:

<dictionary>
	<text>foo</text>
	<text>bar</text>
	<text>foo</text>
	<text>baz</text>
	<text>foobar</text>
	(etc...)
</dictionary>

The file contains quite a few "duplicates" (in terms of the text() content of the node), and I've been trying to figure out a way to strip out all the duplicates, leaving me with an XML file with only unique <text> elements.

I wrote an XSL to identify all the duplicates, and print them out [basically using: test="current() = following-sibling::text or current() = preceding-sibling::text"] But now I want to actually remove the duplicates and create a new XML file in the output tree.

Eliot answers.

The way to do this is with what I call the "union trick". It took me a long time to finally figure out what was going on and I realized that my barrier had been not fully understanding that the "|" operator is a set union, not a logical OR. [I was trying to understand the code Jeni Tennison had written to do back-of-the-book index processing for Docbook.]

What you do is get the current node and the first node of the current node's entry in the key table and then construct a set from them using the union operator ("|"). If the result is a set of size one, then the two nodes must be the same node, because if they were different nodes you'd get a set of size 2. The key is that sets, by definition, always contain exactly one copy of each node in the set.

So, given this group spec:

<xsl:key name="text-by-content" match="text" use="normalize-space(.)"/>

You would do something like this:

   <xsl:variable name="text-items"
         select="//text[count(.|key('text-by-content',
                                    normalize-space(.))[1]) = 1]"/>

Follow this from the inside out:

1. key('text-by-content',
        normalize-space(.))[1]

This looks up the key table entry for each text element selected by the "//text" path and then selects the first item in that list, that is, the first instance of a given text value.

2. ".|key(...)[1]"

This creates a set from the current node and the first node of the key table entry that contains the current node.

3. count(.|key(...)[1])

This gets the size of the set.

4. count(...) = 1

This returns true if the size of the set is 1, meaning that the current <text> node is the first node in its containing key table entry. This node will be selected and added to the result node list.

You can test the result by doing this:

<xsl:for-each select="$text-items">
   <xsl:message>[<xsl:value-of select="position()"/>] = '<xsl:value-of
select="."/>'</xsl:message>
</xsl:for-each>

When doing this type of grouping work, I find it really useful to create a "debug" template that just constructs all the different groups and then reports them--makes it easier to work out the details of the key specs and lookups. If you're doing sorting, it also makes it easy to test your collation rules.
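
To actually write out the reduced dictionary that the question asks for, a minimal stylesheet along these lines should do. It reuses the text-by-content key and the union trick described above; the output settings are an assumption.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:key name="text-by-content" match="text" use="normalize-space(.)"/>

  <xsl:template match="dictionary">
    <dictionary>
      <!-- keep a text element only if it is the first one in its key
           table entry: the union of the two is then a single node -->
      <xsl:copy-of
          select="text[count(. | key('text-by-content',
                                     normalize-space(.))[1]) = 1]"/>
    </dictionary>
  </xsl:template>
</xsl:stylesheet>
```

Because the key uses normalize-space(.), entries that differ only in leading, trailing or repeated whitespace count as duplicates, and the first occurrence is the one that survives.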

23.

Unique items using Muenchian grouping

Mukul Gandhi

> Given an XML file that contains a list of Items each
> having an attribute
> Colour with its value,
> is it possible to obtain a list of all unique colours of Items.
>
> For example if XML looks something like this:
>
> <Item name="item1">
> <Field name="Colour">
> <Value>Red</Value>
> </Field>
> </Item>
> <Item name="item2">
> <Field name="Colour">
> <Value>Blue</Value>
> </Field>
> </Item>
> <Item name="item3">
> <Field name="Colour">
> <Value>Red</Value>
> </Field>
> </Item>
>
> Then I would like to obtain a list containing exactly two entries: Red,
> Blue

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" version="1.0"
encoding="UTF-8" indent="yes"/>

<xsl:key name="x" match="Value" use="." />

<xsl:template match="/root">
   <xsl:for-each select="Item/Field">
     <xsl:if test="generate-id(Value) =
                   generate-id(key('x', Value)[1])">
         <xsl:value-of select="Value" />
     </xsl:if>
   </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

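The XSL above uses the Muenchian method for grouping. It assumes that the Item elements are wrapped in a root element, and it writes the unique values one after another with no separator between them.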
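
If you want a readable list, one possible variation is sketched below. The key name colours is hypothetical; the key is restricted to Value elements inside a Colour field, so that values of other fields cannot get in the way, and the root wrapper element is assumed as in the stylesheet above.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- "colours" is a hypothetical key name: colour values only -->
  <xsl:key name="colours" match="Field[@name = 'Colour']/Value" use="."/>

  <xsl:template match="/root">
    <!-- select the first Value for each distinct colour -->
    <xsl:for-each select="Item/Field[@name = 'Colour']/Value
                            [generate-id() =
                             generate-id(key('colours', .)[1])]">
      <!-- comma before every entry except the first -->
      <xsl:if test="position() &gt; 1">
        <xsl:text>, </xsl:text>
      </xsl:if>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Moving the generate-id() test into the select expression means position() counts only the unique values, which is what makes the separator logic work.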

24.

Grouping and remove duplicates

Mukul Gandhi

> I have a structure on which I want to make a unicity sort. I don't
> know how to begin.
> here it is what I have :
>
> <1>
> <a/>
> <b/>
> </1>
> <2>
> <a/>
> </2>
> <3>
> <b/>
> <c/>
> </3>
> <1>
> <a/>
> <c/>
> </1>
>
> etc ...
>
> And I want to sort the "number" by "letters" :
> Here is what I want :
>
> <1>
> <a/>
> <b/>
> <c/>
> </1>
> <2>
> <a/>
> </2>
> <3>
> <b/>
> <c/>
> </3>

Please try the following XSL -

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xalan">

<xsl:output method="xml" version="1.0"
encoding="UTF-8" indent="yes"/>
	
<xsl:key name="by-num" match="/root/*" use="name()"/>
<xsl:key name="by-alphabet" match="/temp/*" use="name()"/>

<xsl:template match="/root">
  <xsl:for-each select="*">
    <xsl:if test="generate-id(.) =
                  generate-id(key('by-num', name())[1])">
      <xsl:element name="{name()}">
        <xsl:variable name="rtf1">
          <temp>
            <xsl:for-each select="key('by-num', name())">
              <xsl:copy-of select="child::*"/>
            </xsl:for-each>
          </temp>
        </xsl:variable>
        <xsl:variable name="rtf2">
          <temp>
            <xsl:for-each select="xalan:nodeset($rtf1)/temp/*">
              <xsl:if test="generate-id(.) =
                            generate-id(key('by-alphabet', name())[1])">
                <xsl:element name="{name()}"/>
              </xsl:if>
            </xsl:for-each>
          </temp>
        </xsl:variable>

        <xsl:for-each select="xalan:nodeset($rtf2)/temp/*">
          <xsl:sort select="name()"/>
          <xsl:element name="{name()}"/>
        </xsl:for-each>
      </xsl:element>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
	
</xsl:stylesheet>

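Note that <1> and <2> are not valid XML element names (a name cannot start with a digit), so the test file uses <one>, <two> and <three> instead.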

I tested the XSL with the following XML -

<?xml version="1.0" encoding="UTF-8"?>
<root>
   <one>
    <a/>
    <b/>
   </one>
   <two>
    <a/>
   </two>
   <three>
    <b/>
    <c/>
   </three>
   <one>
    <a/>
    <c/>
   </one>
</root>

and got the output -
<?xml version="1.0" encoding="UTF-8"?>
<one>
  <a/>
  <b/>
  <c/>
</one>
<two>
  <a/>
</two>
<three>
  <b/>
  <c/>
</three>
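
If the xalan:nodeset() extension is not available, the same result can probably be reached in one pass with a compound key, in the manner of the two-level Muenchian grouping shown earlier in this FAQ. This is a sketch rather than part of the original answer; the key name by-num-and-letter is hypothetical, and the element names are those of the test file above.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:key name="by-num" match="/root/*" use="name()"/>
  <!-- second level: parent name and child name combined -->
  <xsl:key name="by-num-and-letter" match="/root/*/*"
           use="concat(name(..), '+', name())"/>

  <xsl:template match="/root">
    <!-- the first element of each name: one, two, three -->
    <xsl:for-each select="*[generate-id() =
                            generate-id(key('by-num', name())[1])]">
      <xsl:copy>
        <!-- children of all same-named elements, keeping only the first
             child of each name -->
        <xsl:for-each select="key('by-num', name())/*
                                [generate-id() =
                                 generate-id(key('by-num-and-letter',
                                                 concat(name(..), '+',
                                                        name()))[1])]">
          <xsl:sort select="name()"/>
          <xsl:copy/>
        </xsl:for-each>
      </xsl:copy>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Like the stylesheet above, this writes the grouped elements without a wrapper element, and it copies only the element names, not any attributes or content the lettered elements might carry.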

25.

Grouping a flat structure

Michael Kay

> I am using XSLT2 as implemented by Saxon 7.9.1 to group a flat
> structure. What I start out with is something like
>
> <foo>
> <bar baz="1" />
> <bar baz="2" />
> <bar baz="2" />
> <bar />
> <bar baz="1" />
> <bar baz="1" />
> <bar />
> </foo>
>
> Now I need to put these bars in a list. The first step is easy enough
>
> This is all well and good, but now I need to get to
>
> <foo>
> <list>
> <list-item>
> <bar baz="1" />
> </list-item>
> <list-item>
> <bar baz="2" />
> <bar baz="2" />
> </list-item>
> </list>
> <bar />
> <list>
> <list-item>
> <bar baz="1" />
> <bar baz="1" />
> </list-item>
> </list>
> <bar />
> </foo>

I think that when you need to do two levels of grouping like this, it is usually easier to do it top-down: that is, do the outer level first. Doing it bottom-up as you are attempting also works, but it requires two passes over the data.

The top-down solution (untested) looks something like this:

<xsl:for-each-group select="bar" group-adjacent="exists(@baz)">
  <xsl:choose>
  <xsl:when test="exists(@baz)">
    <list>
      <xsl:for-each-group select="current-group()" group-adjacent="@baz">
         <list-item>
            <xsl:copy-of select="current-group()" />
         </list-item>
      </xsl:for-each-group>
    </list>
  </xsl:when>
  <xsl:otherwise>
      <xsl:copy-of select="current-group()"/>
  </xsl:otherwise>
  </xsl:choose>
</xsl:for-each-group>

26.

Muenchian grouping from tables.

Mukul Gandhi

> I have
> <table>
> <tr>
> <td>D686 (code)</td>
> <td>Work (title)</td>
> <td>1 (points)</td>
> </tr>
> <tr>
> <td>E004 (code)</td>
> <td>English (title)</td>
> <td>2 (points)</td>
> </tr>
> </table> etc etc
>
> I've done this, however, all the values are in 1 table - what I need
> to do is now split the data - all codes starting with 'E' should be in
> a separate table to those starting with 'D'.

Assuming you have written the XSL for 1st part of your requirement, the following XSL does grouping based on Muenchian method -

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes"/>
	
<xsl:key name="by-tr" match="tr"
use="substring(td[1],1,1)"/>
	
<xsl:template match="/table">
  <html>
    <head>
      <title/>
    </head>
    <body>
      <xsl:for-each select="tr">
        <xsl:if test="generate-id(.) =
                      generate-id(key('by-tr', substring(td[1],1,1))[1])">
          <table>
            <xsl:for-each select="key('by-tr', substring(td[1],1,1))">
              <tr>
                <td>
                  <xsl:value-of select="td[1]"/>
                </td>
                <td>
                  <xsl:value-of select="td[2]"/>
                </td>
                <td>
                  <xsl:value-of select="td[3]"/>
                </td>
              </tr>
            </xsl:for-each>
          </table>
        </xsl:if>
      </xsl:for-each>
    </body>
  </html>
</xsl:template>

</xsl:stylesheet>

27.

Tree Walking, forward walk

Jarno Elovirta

I'm crediting Jarno with this because he provided the examples,
though we had difficulty in determining who first brought this up.

The first template selects a starting point (title in the first example). The 'walk' template then walks down the following-sibling axis, one node at a time, repeating the processing until the terminating condition is met

	    ( not(following-sibling::*[1]/self::title) in the example)

at which time the first template is triggered to start the next group, and the whole process repeats.

Source XML:

<document>
  <title>First title</title>
  <para>First para</para>
  <para>Second para</para>
  <title>Second title</title>
  <para>Third para</para>
  <para>Fourth para</para>
</document>

XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="document">
    <xsl:copy>
      <xsl:apply-templates select="title" mode="walker"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="title" mode="walker">
    <section>
      <xsl:apply-templates select="."/>
      <xsl:apply-templates select="following-sibling::*[1]" mode="walker"/>
    </section>
  </xsl:template>
  <xsl:template match="*" mode="walker">
    <xsl:apply-templates select="."/>
    <xsl:if test="not(following-sibling::*[1]/self::title)">
      <xsl:apply-templates select="following-sibling::*[1]" mode="walker"/>
    </xsl:if>
  </xsl:template>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Result XML:

<document>
  <section>
    <title>First title</title>
    <para>First para</para>
    <para>Second para</para>
  </section>
  <section>
    <title>Second title</title>
    <para>Third para</para>
    <para>Fourth para</para>
  </section>
</document>

The same approach works on data-centric XML, here grouping SubConcept elements with their non-SubConcept children.

Source XML:

<Top>
  <PrimeConcept id="0001" type="none">A</PrimeConcept>
  <SubConcepts>
    <SubConcept id="0002" name="A1">
      <Value ref="0003">hasProperty1 AB</Value>
      <Value ref="0004">hasProperty2 XY</Value>
      <SubConcept id="0004" name="XY">
        <ChildConcept ref="0005">XY1</ChildConcept>
        <SubConcept id="0005" name="XY1">
          <ChildConcept ref="0007">XY11</ChildConcept>
          <ChildConcept ref="0008">XY12</ChildConcept>
        </SubConcept>
        <ChildConcept ref="0006">XY2</ChildConcept>
      </SubConcept>
    </SubConcept>
  </SubConcepts>
</Top>

XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="SubConcepts">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each  select="descendant::SubConcept">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:apply-templates select="*[1]" mode="walker"/>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="Value | ChildConcept" mode="walker">
    <xsl:apply-templates select="."/>
    <xsl:apply-templates select="following-sibling::*[1]" mode="walker"/>
  </xsl:template>
  <xsl:template match="*" mode="walker">
    <xsl:apply-templates select="following-sibling::*[1]" mode="walker"/>
  </xsl:template>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Result XML:

<Top>
  <PrimeConcept id="0001" type="none">A</PrimeConcept>
  <SubConcepts>
    <SubConcept id="0002" name="A1">
      <Value ref="0003">hasProperty1 AB</Value>
      <Value ref="0004">hasProperty2 XY</Value>
    </SubConcept>
    <SubConcept id="0004" name="XY">
      <ChildConcept ref="0005">XY1</ChildConcept>
      <ChildConcept ref="0006">XY2</ChildConcept>
    </SubConcept>
    <SubConcept id="0005" name="XY1">
      <ChildConcept ref="0007">XY11</ChildConcept>
      <ChildConcept ref="0008">XY12</ChildConcept>
    </SubConcept>
  </SubConcepts>
</Top>

A third example: you have change-start and change-end elements in a flat structure and you want to group the content between each pair inside an ins element.

Source XML:

<document>
  <para>First para</para>
  <change-start id="c1"/>
  <para>Second para</para>
  <para>Third para</para>
  <change-start id="c2"/>
  <para>Fourth para</para>
  <change-end id="c2"/>
  <para>Fifth para</para>
  <change-end id="c1"/>
  <para>Sixth para</para>
</document>

XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="document">
    <xsl:copy>
      <xsl:apply-templates select="*[1]" mode="walker"/>
    </xsl:copy>
  </xsl:template>
  <!-- copy non-change elements through and continue walking -->
  <xsl:template match="*" mode="walker">
    <xsl:param name="id"/>
    <xsl:apply-templates select="."/>
    <xsl:apply-templates select="following-sibling::*[1]" mode="walker">
      <xsl:with-param name="id" select="$id"/>
    </xsl:apply-templates>
  </xsl:template>
  <!-- on change-start create ins element, process changed content 
  into that and continue from after corresponding change-end -->
  <xsl:template match="change-start" mode="walker">
    <xsl:param name="id"/>
    <ins id="{@id}">
      <xsl:apply-templates select="following-sibling::*[1]" mode="walker">
        <xsl:with-param name="id" select="@id"/>
      </xsl:apply-templates>
    </ins>
    <xsl:apply-templates select="following-sibling::change-end[@id = 
    current()/@id]/following-sibling::*[1]" mode="walker">
        <xsl:with-param name="id" select="$id"/>
    </xsl:apply-templates>
  </xsl:template>
  <!-- stop walking if this is the matching end, otherwise continue -->
  <xsl:template match="change-end" mode="walker">
    <xsl:param name="id"/>
    <xsl:if test="not(@id = $id)">
      <xsl:apply-templates select="following-sibling::*[1]" mode="walker">
        <xsl:with-param name="id" select="$id"/>
      </xsl:apply-templates>
    </xsl:if>
  </xsl:template>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Result XML:

<document>
  <para>First para</para>
  <ins id="c1">
    <para>Second para</para>
    <para>Third para</para>
    <ins id="c2">
      <para>Fourth para</para>
    </ins>
    <para>Fifth para</para>
  </ins>
  <para>Sixth para</para>
</document>

I still don't know the name of this grouping method, so I can't tell you what I'm describing here.
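
For comparison only: in XSLT 2.0 the first example above (titles followed by their paragraphs) can be written without any walking, using xsl:for-each-group with group-starting-with. This is a swapped-in technique, not the tree walk described here, and it needs an XSLT 2.0 processor.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="document">
    <xsl:copy>
      <!-- every title starts a new group; the group runs up to the
           element just before the next title -->
      <xsl:for-each-group select="*" group-starting-with="title">
        <section>
          <xsl:copy-of select="current-group()"/>
        </section>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

One difference to be aware of: any elements that come before the first title form a group of their own and would also be wrapped in a section, whereas the walker version never visits them.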

28.

Grouping

David Carlisle and Dimitre Novatchev.

> Group the following XML into 2 periods. The periods
> are arbitrary, but for this example they happen to be:
> Period 1: 1 - 12
> Period 2: 14 - 30
> I.e. non-overlapping ranges.
>
> Expected Result:
> <result>
> <period begins="1" ends="12">
> 
> 
> 
> </period>
> <period begins="14" ends="30">
> 
> 
> </period>
> </result>

Using Dimitre's test file (which has sorted input), here's a simplish pure XSLT 1.0 solution, with no node-set or other extensions.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">


<xsl:output indent="yes"/>

<xsl:template match="A">
  <result>
    <xsl:apply-templates select="B[1]"/>
  </result>
</xsl:template>

<xsl:template match="B">
  <xsl:param name="b" select="@period_begin"/>
  <xsl:param name="e" select="@period_end"/>
  <xsl:param name="g" select="/.."/>
  <xsl:variable name="e2"
      select="@period_end[. &gt; $e]|$e[. &gt;= current()/@period_end]"/>
  <xsl:choose>
    <xsl:when test="../B[@period_begin &lt;= $e2 and @period_end &gt; $e2]">
      <xsl:apply-templates select="following-sibling::B[1]">
        <xsl:with-param name="b" select="$b"/>
        <xsl:with-param name="e" select="$e2"/>
        <xsl:with-param name="g" select="$g|."/>
      </xsl:apply-templates>
    </xsl:when>
    <xsl:otherwise>
      <period begins="{$b}" ends="{$e2}">
        <xsl:copy-of select="$g|."/>
      </period>
      <xsl:apply-templates select="following-sibling::B[1]"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>


</xsl:stylesheet>

With input

<A>
        <B period_begin="1" period_end="5"/>
        <B period_begin="2" period_end="7"/>
        <B period_begin="3" period_end="10"/>
        <B period_begin="4" period_end="12"/>
        <B period_begin="14" period_end="16"/>
        <B period_begin="16" period_end="20"/>
        <B period_begin="16" period_end="30"/>
        <B period_begin="32" period_end="33"/>
        <B period_begin="33" period_end="38"/>
</A>

Produces this output

<?xml version="1.0" encoding="utf-8"?>
<result>
   <period begins="1" ends="12">
      <B period_begin="1" period_end="5"/>
      <B period_begin="2" period_end="7"/>
      <B period_begin="3" period_end="10"/>
      <B period_begin="4" period_end="12"/>
   </period>
   <period begins="14" ends="30">
      <B period_begin="14" period_end="16"/>
      <B period_begin="16" period_end="20"/>
      <B period_begin="16" period_end="30"/>
   </period>
   <period begins="32" ends="38">
      <B period_begin="32" period_end="33"/>
      <B period_begin="33" period_end="38"/>
   </period>
</result>

At a user request, David goes on to explain...

> David, WOW! ... i'm still trying to figure out how it works ...
> So, can you explain? This is like magic..

Have you considered the possibility that it really is magic? Asking how it works might spoil the fun. It's just like sawing ladies in half...

> Things I don't get:

Oh, if you insist...

> 1)  Variable e2 select.
> select="@period_end[. &gt; $e]|$e[. &gt;= current()/@period_end]"
>

That expression is just me being tricksy. It just sets e2 to the maximum of e and the current period_end attribute, i.e. as you walk along the list one at a time it keeps a note of the current period end.

I could have written

<xsl:variable name="e2">
  <xsl:choose>
    <xsl:when test="@period_end &gt; $e">
      <xsl:value-of select="@period_end"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$e"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:variable>

but that would just make people yearn for XSLT2's

<xsl:variable name="e2" select="max((@period_end, $e))"/>

Whereas the version I used makes XSLT1 look exotic and enticing:-)

> How does the pipe work here and is this only evaluating the current B 
> element, or evaluating all @period_end(s)?

The thing about understanding XPath expressions is to remember that things mean what they mean, even if they get used in unexpected places. If I'd written select="a|b" you would probably not have asked what | means: it means select all a's and all b's and take the union of those two selections. Similarly, if I'd written select="@foo" you probably wouldn't have asked which element this is the foo attribute of: it's (just) the current element. So, putting it all together, @period_end[. > $e] selects those period_end attributes of the current element for which the value of the attribute is greater than $e. Since there is only one period_end attribute, this either selects that attribute (if it is greater than $e) or selects the empty set otherwise.

The other side of the | is $e[. >= current()/@period_end]. Its [] predicate is the negation of the previous one, so this node set is $e if the other node set is empty, and empty if the other node set is @period_end.

So the | is the union of two node sets, one of which has one attribute node while the other is empty, and $e2 ends up being a node set of exactly one attribute node, whose value is the maximum of the two attributes compared.
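
To see the union idiom in isolation, here is a minimal sketch that can be run against the input file shown earlier. It is not part of David's solution: using the fourth B's period_end as a fixed $e, and writing the comparison out as text, are assumptions made purely for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="text"/>

  <!-- Illustration only: compare every period_end against one fixed
       attribute, the period_end of the fourth B (12 in the sample input). -->
  <xsl:variable name="e" select="/A/B[4]/@period_end"/>

  <xsl:template match="/A">
    <xsl:for-each select="B">
      <!-- Exactly one side of the | is non-empty, so $max holds a single
           attribute node: whichever of the two has the larger value. -->
      <xsl:variable name="max"
         select="@period_end[. &gt; $e] | $e[. &gt;= current()/@period_end]"/>
      <xsl:value-of
         select="concat('max(', @period_end, ', ', $e, ') = ', $max, '&#10;')"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input the first line should read max(5, 12) = 12 and the fifth max(16, 12) = 16, so both branches of the union get exercised.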


> 2)  Right off the bat (first iteration), I don't understand how you 
> determine the period attribute "ends" value.

You walk along the nodes one at a time, carrying the current best guess of the end of the period in the parameter $e, as discussed above. You can tell when to stop because there is no B node that "overlaps" the current guessed end, i.e. no B that starts at or before $e2 and ends after it. The xsl:when tests for the opposite case, that such an overlapping B does exist:

<xsl:when test="../B[@period_begin &lt;=$e2 and @period_end &gt; $e2]">

While that test succeeds the range is not finished, so you just process the next node and try again. You never output any elements until you get to the end of a range; you just pass the beginning of the range, the current best guess of the end, and all the B nodes collected so far in the three parameters b, e and g.

Once the test fails (the xsl:otherwise branch) you can make your wrapper element with the determined range and then just copy-of $g and the current element to form the content, then again process the next B, this time not setting any parameters so that you initialise a new range.

> 3)  Variable g select, what does this get you, the ancestor record?
> select="/.."

The select value on an xsl:param is only used if the parameter is not explicitly supplied. /.. is the parent of the root node, which doesn't exist, so this is the empty node set. If instead I had written <xsl:param name="g"/> the default value would have been an empty string, and then, when you try to add the current node to the collection on moving to the next node with <xsl:with-param name="g" select="$g|."/>, the | would generate an error, since it can't be used with a string. If $g is the empty set then $g|. is the union of the empty set and the current node, which is just the current node.
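
A small sketch of how a node set grows from /.. by repeated union, again meant for the sample input above. The variable names g1 and g2 and the text output are illustrative assumptions, not something taken from the original stylesheet.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="text"/>

  <xsl:template match="/A">
    <!-- /.. is the parent of the root node, which never exists:
         an empty node set, not an empty string -->
    <xsl:variable name="g" select="/.."/>
    <xsl:value-of select="concat('start: ', count($g), '&#10;')"/>

    <!-- union with one node gives a node set holding just that node -->
    <xsl:variable name="g1" select="$g | B[1]"/>
    <xsl:value-of select="concat('after B[1]: ', count($g1), '&#10;')"/>

    <!-- each further union adds one more collected node -->
    <xsl:variable name="g2" select="$g1 | B[2]"/>
    <xsl:value-of select="concat('after B[2]: ', count($g2), '&#10;')"/>
  </xsl:template>

</xsl:stylesheet>
```

The three counts should come out as 0, 1 and 2, which is what the $g parameter does step by step as the walk moves along a range.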


> 4) The copy of within the element period within the otherwise then the 
> apply templates rule, it obviously creates the new <period/> element, 
> but I don't see how your recursive template call inserts the 
> "members", I don't get how you are preserving the member element of 
> period

No. The copy-of copies $g (all the B elements picked up earlier in this range) and . (the current B element). The period element isn't copied from anywhere; it's generated as a literal result element on the line above: <period begins="{$b}" ends="{$e2}">

Having just looked over this, I think my XSLT is easier to understand than my English description of it, so maybe I should just give up and see if the list auto-documentation-daemon is triggered and documentation arrives from elsewhere....

> How long did it take you to come up with this solution?

As anyone reading this list may have noticed, my typing isn't that accurate, so it probably took me longer to type it in (and a lot longer to type that last reply) than it did to actually come up with the code. Starting from scratch it would have taken longer, as I wouldn't have had a clear view of the criterion for what constituted a range from the original description. But once someone (Michael, I think) commented that the test for whether an element terminated a range could be a simple test of whether any other element overlapped this one, the rest of it followed more or less naturally.

> Have you seen similar problems like this in the past

Once it was clear that there was an atomic test that you could do that flagged when the group needed to change, it's pretty much a standard grouping question of the type that we have seen on this list every day for the last 7 years or so :-) The thing that makes this one a bit more interesting (and stops the usual grouping solutions working out of the box) is that you need to add an attribute to the grouping element that you don't know until the end of the group. As you have to add attributes before child elements, this means that you have to save up the child elements to add later, hence the $g parameter. Apart from that it's a standard "tree walking" grouping method, another example of which I posted in another thread earlier in the week (in that case grouping on processing instruction nodes).

And from Dimitre:

An XSLT 2.0 solution, but using f:foldl(). This can be rewritten 1:1 in XSLT 1.0 + FXSL for XSLT 1.0.

The reason I'm posting this is because it resembles very much the "functional tokenizer" (see for example: xslt archive) and your problem can be called something like: "interval tokenization".

I still cannot fully assimilate the meaning and implications of this striking similarity but it seems to prove that there's law, order and elegance in the world of functional programming.

Here's the transformation:

<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:foldl-func="foldl-func"
xmlns:f="http://fxsl.sf.net/"
exclude-result-prefixes="f foldl-func"
>

   <xsl:import href="../f/func-foldl.xsl"/>
   
   <xsl:output omit-xml-declaration="yes" indent="yes"/>

<!--
    This transformation must be applied to:  
        ../data/periods.xml                  
-->
   <xsl:variable name="vFoldlFun" as="element()">
           <foldl-func:foldl-func/>
   </xsl:variable>
   
   <xsl:variable name="vA0" as="element()+">
     <period start="0" end="0"/>
   </xsl:variable>

    <xsl:template match="/">
      <xsl:sequence select="f:foldl($vFoldlFun, $vA0, /*/* )[position() > 1]"/>
    </xsl:template>
    
    <xsl:template match="foldl-func:*" as="element()+"
     mode="f:FXSL">
       <xsl:param name="arg1"/>
       <xsl:param name="arg2"/>
       
       <xsl:variable name="vLastPeriod" select="$arg1[last()]"/>
         
       <xsl:choose>
         <xsl:when test=
            "number($arg2/@period_begin) > number($vLastPeriod/@end)">
           <xsl:sequence select="$arg1"/>
           <period start="{$arg2/@period_begin}" end="{$arg2/@period_end}"/>
         </xsl:when>
         <xsl:otherwise>
           <xsl:sequence select="$arg1[not(. is $vLastPeriod)]"/>
           <xsl:choose>
             <xsl:when test="number($arg2/@period_end) > number($vLastPeriod/@end)">
               <period start="{$vLastPeriod/@start}" end="{$arg2/@period_end}"/>
             </xsl:when>
             <xsl:otherwise>
               <xsl:sequence select="$vLastPeriod"/>
             </xsl:otherwise>
           </xsl:choose>
         </xsl:otherwise>
       </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

When applied on the same source xml document:

<A>
        <B period_begin="1" period_end="5"/>
        <B period_begin="2" period_end="7"/>
        <B period_begin="3" period_end="10"/>
        <B period_begin="4" period_end="12"/>
        <B period_begin="14" period_end="16"/>
        <B period_begin="16" period_end="20"/>
        <B period_begin="16" period_end="30"/>
        <B period_begin="32" period_end="33"/>
        <B period_begin="33" period_end="38"/>
</A>

it produces the wanted result:

<period start="1" end="12"/>
<period start="14" end="30"/>
<period start="32" end="38"/>

29.

Intersect

Florent Georges

> What I am trying to express in an Xpath is "Look to see
> where there are any <w:t> elements containing text between
> these 2 <w:br>s".

What you are looking for is computing the intersection between two sets. In XPath 2.0, you have the 'intersect' operator. In XSLT 1.0, you can use the following technique, which relies on the fact that taking the union of a set with a node it already contains doesn't change the count of that set. Strangely, I didn't find a reference to this in the FAQ, Dave. Maybe I didn't look in the right place?

   [29] ~/xslt/tests$ cat intersect.xml
   <root xmlns:w="WordML">
     <w:t id="a"/>
     <w:br/>
     <w:t id="b"/>
     <w:t id="c"/>
     <w:br/>
     <w:t id="d"/>
     <w:br/>
     <w:t id="e"/>
   </root>


   [30] ~/xslt/tests$ cat intersect.xsl
   <xsl:transform
       xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
       xmlns:w="WordML"
       version="1.0">

     <xsl:output omit-xml-declaration="yes" indent="yes"/>

     <xsl:variable name="after-1st"  select="
         /*/w:br[1]/following-sibling::w:t"/>
     <xsl:variable name="before-2nd" select="
         /*/w:br[2]/preceding-sibling::w:t"/>
     <xsl:variable name="intersect"  select="
         $after-1st[count(.|$before-2nd) = count($before-2nd)]"/>

     <xsl:template match="/">
       <result>
         <xsl:copy-of select="$intersect"/>
       </result>
     </xsl:template>

   </xsl:transform>

   [31] ~/xslt/tests$ xalan -XSL intersect.xsl -IN intersect.xml
   <result xmlns:w="WordML">
   <w:t id="b"/>
   <w:t id="c"/>
   </result>

   [32] ~/xslt/tests$ xsltproc intersect.xsl intersect.xml
   <result xmlns:w="WordML">
     <w:t id="b"/>
     <w:t id="c"/>
   </result>

   [33] ~/xslt/tests$ saxon intersect.xml intersect.xsl
   Warning: at xsl:transform on line 4 of ~/xslt/tests/intersect.xsl:
     Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
   <result xmlns:w="WordML">
      <w:t id="b"/>
      <w:t id="c"/>
   </result>
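
For comparison, here is a sketch of the same selection written with the XPath 2.0 'intersect' operator mentioned above. It assumes an XSLT 2.0 processor (Saxon, for instance) and the same intersect.xml input; inlining the two node sets rather than holding them in variables is only a choice made for brevity.

```xml
<xsl:transform
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="WordML"
    version="2.0">

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <result>
      <!-- w:t elements that follow the first w:br AND precede the second -->
      <xsl:copy-of select="/*/w:br[1]/following-sibling::w:t
                           intersect
                           /*/w:br[2]/preceding-sibling::w:t"/>
    </result>
  </xsl:template>

</xsl:transform>
```

This should select the same two elements, b and c, as the count() technique, without the version warning Saxon printed for the 1.0 stylesheet.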

Copyright © 1999-2012 Dave Pawson.