Jeni Tennison
For example, with input:
<xmlfile>
<book>
<title>The quick brown</title>
</book>
<book>
<title>A little knowledge is a dangerous thing</title>
</book>
<book>
<title>Is this the real thing</title>
</book>
</xmlfile>
How can I get output like:
<result>
<before>The quick brown</before>
<after>quick brown</after>
<before>A little knowledge is a dangerous thing</before>
<after>little knowledge is a dangerous thing</after>
<before>Is this the real thing</before>
<after>this the real thing</after>
</result>
Adapting Eric's solution:
The xsl:stylesheet element declares the necessaries, and the additional
namespace 'sw' that is used for the internal data (the list of stop words).
To prevent this namespace being declared on your output, use
'exclude-result-prefixes':
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:sw="mailto:vdv@dyomedea.com"
exclude-result-prefixes="sw">
...
</xsl:stylesheet>
Then the declaration of the stop words that you want to filter out. I've
put these in a variable so that they can be accessed easily:
<sw:stop>
<word>the</word>
<word>a</word>
<word>is</word>
</sw:stop>
<xsl:variable name="stop-words"
select="document('')/xsl:stylesheet/sw:stop/word" />
Declaration of two variables so that we can translate between upper and
lower case fairly easily:
<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
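A minimal sketch of how these two variables get used, assuming an XSLT 1.0 processor (the sample string is my own, not from the question). translate() replaces each character found in its second argument with the character at the same position in its third, so this only lower-cases the ASCII letters A to Z; accented characters pass through unchanged.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <!-- Map each upper-case letter to its lower-case counterpart -->
    <xsl:value-of select="translate('The Quick Brown', $uppercase, $lowercase)"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should output the quick brown.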
Now the template. I've only used one for brevity, but of course you can
split it down into several through calling and applying templates. Within
this template, I iterate through each of the titles. For each title, I
find all the stop words such that the current title starts with that stop
word (plus a space, and all ignoring case). If there is such a match, then
the title is substring()ed to give the resulting title by taking off the
characters that make up the word it begins with.
<xsl:template match="/">
  <result>
    <xsl:for-each select="xmlfile/book/title">
      <before><xsl:value-of select="." /></before>
      <xsl:variable name="begins-with"
                    select="$stop-words[starts-with(
                                translate(current(), $uppercase, $lowercase),
                                concat(translate(., $uppercase, $lowercase), ' '))]" />
      <after>
        <xsl:choose>
          <xsl:when test="$begins-with">
            <xsl:value-of
                select="substring(., string-length($begins-with) + 2)" />
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="." />
          </xsl:otherwise>
        </xsl:choose>
      </after>
    </xsl:for-each>
  </result>
</xsl:template>
This strips leading stop words in SAXON and MSXML (July). It works in
Xalan-C++ v.0.40.0 except for the exclude-result-prefixes thing, which is
ignored.
However...
>How do you XSL-create a sort criterion?
...you can't (at the moment) use a template to create a string to use as a
sort criterion. Sort criteria have to be XPath select expressions. This
problem will go away when (a) you can convert RTFs (result tree fragments)
to node sets and/or (b) you can use something like saxon:function to
declare extension functions within XSLT.
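As a rough sketch of what option (a) could look like, assuming a processor that supports the EXSLT exsl:node-set() extension function (that support is an assumption, and the element names entry and key are made up for illustration): the first pass builds a result tree fragment that pairs each title with its stripped form, and the second pass converts it to a node set and sorts on the stripped form.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exsl="http://exslt.org/common"
                xmlns:sw="mailto:vdv@dyomedea.com"
                exclude-result-prefixes="exsl sw">

  <sw:stop>
    <word>the</word>
    <word>a</word>
    <word>is</word>
  </sw:stop>

  <xsl:variable name="stop-words"
                select="document('')/xsl:stylesheet/sw:stop/word"/>
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <!-- Pass 1: a result tree fragment pairing each title with its sort key
         (entry and key are hypothetical names) -->
    <xsl:variable name="keyed">
      <xsl:for-each select="xmlfile/book/title">
        <xsl:variable name="begins-with"
                      select="$stop-words[starts-with(
                                  translate(current(), $uppercase, $lowercase),
                                  concat(translate(., $uppercase, $lowercase), ' '))]"/>
        <entry>
          <key>
            <xsl:choose>
              <xsl:when test="$begins-with">
                <xsl:value-of
                    select="substring(., string-length($begins-with) + 2)"/>
              </xsl:when>
              <xsl:otherwise>
                <xsl:value-of select="."/>
              </xsl:otherwise>
            </xsl:choose>
          </key>
          <title><xsl:value-of select="."/></title>
        </entry>
      </xsl:for-each>
    </xsl:variable>

    <!-- Pass 2: convert the fragment to a node set and sort on the key -->
    <result>
      <xsl:for-each select="exsl:node-set($keyed)/entry">
        <xsl:sort select="key"/>
        <xsl:copy-of select="title"/>
      </xsl:for-each>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

The sort key here is an ordinary child element, so the xsl:sort select stays trivial and the stop-word logic lives in readable template code.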
In the meantime, you have to use something really horrible like:
<xsl:template match="/">
  <result>
    <xsl:for-each select="xmlfile/book/title">
      <xsl:sort select="concat(
          substring(substring-after(., ' '),
                    0 div boolean($stop-words[starts-with(
                        translate(current(), $uppercase, $lowercase),
                        concat(translate(., $uppercase, $lowercase), ' '))])),
          substring(.,
                    0 div not($stop-words[starts-with(
                        translate(current(), $uppercase, $lowercase),
                        concat(translate(., $uppercase, $lowercase), ' '))])))" />
      <title><xsl:value-of select="." /></title>
    </xsl:for-each>
  </result>
</xsl:template>
(Honestly, it doesn't look that much clearer even when it *is* indented ;)
This works in SAXON, MSXML (July) and Xalan (with the exception of the
exclude-result-prefixes thing).
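In case the arithmetic in that sort expression is not obvious, here is a minimal sketch of the idiom in isolation, assuming an XSLT 1.0 processor (the strings 'kept' and 'dropped' and the tests 1 = 1 and 1 = 2 are placeholders of mine). When the condition is true it converts to 1, 0 div 1 is 0, and substring(s, 0) gives the whole string. When it is false, 0 div 0 is NaN, and substring(s, NaN) gives the empty string. Concatenating one substring() guarded by the condition with another guarded by its negation therefore acts as an if/else inside a single XPath expression.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Condition true: 0 div 1 = 0, so substring() returns the whole string -->
    <xsl:value-of select="substring('kept', 0 div boolean(1 = 1))"/>
    <xsl:text>|</xsl:text>
    <!-- Condition false: 0 div 0 = NaN, so substring() returns the empty string -->
    <xsl:value-of select="substring('dropped', 0 div boolean(1 = 2))"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against any input document, this should output kept followed by a | and nothing else.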