Jeni Tennison
> I have an XML journal source, and an XML list of acronyms. I wish to
>automatically replace any occurrence of an acronym within the XML source,
>with the appropriate <acronym title="blah">acronym</acronym> tag. It's easy
>to replace one acronym, using simple XSL recursive find/replace, but when I
>try to do more than one, I hit multiple difficulties.
OK. First off, you need to design the parameters for the template. The
things that matter are the list of acronyms that you want to be replaced and
the text in which you want to replace them:
<xsl:template name="replace-acronyms">
  <xsl:param name="acronyms"
             select="document('../xml/acronyms.xml')/acronyms/acronym" />
  <xsl:param name="text" />
  ...
</xsl:template>
The first tests to make are the stopping conditions: if $text is empty, then
the template shouldn't generate anything; if $acronyms is empty, then the
template should just return $text:
<xsl:choose>
  <xsl:when test="not($acronyms)">
    <xsl:value-of select="$text" />
  </xsl:when>
  <xsl:when test="not(string($text))" />
  <xsl:otherwise>
    ...
  </xsl:otherwise>
</xsl:choose>
Now that we've confirmed that we actually have some text to process and some
acronyms to replace within it, we'll set about our first task: to replace
the first occurrence in $text of the first acronym in $acronyms. We'll store
that first acronym, the one we want to find, in a variable called $acronym:
<xsl:variable name="acronym" select="$acronyms[1]/@acronym" />
Note that I'm assuming the <acronym> elements are of the form:
<acronym acronym="XML">Extensible Markup Language</acronym>
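For illustration, a complete acronyms.xml in that form might look like the sketch below. The <acronyms> root element follows from the path used in the document() call; the particular entries are made up for the example. Note that the template tries acronyms in document order, so if one acronym is a substring of another (XSL inside XSLT), listing the longer one first should stop the shorter one from matching inside it:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical acronym list; only the element and attribute names
     come from the form assumed above. -->
<acronyms>
  <acronym acronym="XML">Extensible Markup Language</acronym>
  <!-- XSLT is listed before XSL so that it is tried first. -->
  <acronym acronym="XSLT">Extensible Stylesheet Language Transformations</acronym>
  <acronym acronym="XSL">Extensible Stylesheet Language</acronym>
</acronyms>
```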
What we do then depends on whether $acronym appears in $text or not:
<xsl:choose>
  <xsl:when test="contains($text, $acronym)">
    ...
  </xsl:when>
  <xsl:otherwise>
    ...
  </xsl:otherwise>
</xsl:choose>
If $acronym *doesn't* appear in $text, then we want to call the template
again on the unadjusted text, with $acronyms this time set to the *rest* of
the acronyms (all but the first):
<xsl:otherwise>
  <xsl:call-template name="replace-acronyms">
    <xsl:with-param name="text" select="$text" />
    <xsl:with-param name="acronyms"
                    select="$acronyms[position() > 1]" />
  </xsl:call-template>
</xsl:otherwise>
If $acronym *does* appear in $text, then we need to break the text into two
parts: the part before $acronym and the part after $acronym:
<xsl:variable name="before"
              select="substring-before($text, $acronym)" />
<xsl:variable name="after"
              select="substring-after($text, $acronym)" />
Now, we know that $acronym doesn't appear in $before (because $before is, by
definition, the text before the first occurrence of $acronym), but $before
might contain other acronyms. So we need to call the template on $before
with the 'rest' of the acronyms:
<xsl:call-template name="replace-acronyms">
  <xsl:with-param name="text" select="$before" />
  <xsl:with-param name="acronyms"
                  select="$acronyms[position() > 1]" />
</xsl:call-template>
Then we need to generate the <acronym> element. The title attribute needs to
hold the value of the first <acronym> element in $acronyms, and the value of
the <acronym> element is the acronym $acronym itself:
<acronym title="{$acronyms[1]}">
  <xsl:value-of select="$acronym" />
</acronym>
Then we need to do something with $after. Now, $after could contain $acronym
again, so the recursive call needs to pass *all* the $acronyms through to
the next call:
<xsl:call-template name="replace-acronyms">
  <xsl:with-param name="text" select="$after" />
  <xsl:with-param name="acronyms" select="$acronyms" />
</xsl:call-template>
And there we have it. The complete template looks like:
<xsl:template name="replace-acronyms">
  <xsl:param name="acronyms"
             select="document('../xml/acronyms.xml')/acronyms/acronym" />
  <xsl:param name="text" />
  <xsl:choose>
    <xsl:when test="not($acronyms)">
      <xsl:value-of select="$text" />
    </xsl:when>
    <xsl:when test="not(string($text))" />
    <xsl:otherwise>
      <xsl:variable name="acronym" select="$acronyms[1]/@acronym" />
      <xsl:choose>
        <xsl:when test="contains($text, $acronym)">
          <xsl:variable name="before"
                        select="substring-before($text, $acronym)" />
          <xsl:variable name="after"
                        select="substring-after($text, $acronym)" />
          <xsl:call-template name="replace-acronyms">
            <xsl:with-param name="text" select="$before" />
            <xsl:with-param name="acronyms"
                            select="$acronyms[position() > 1]" />
          </xsl:call-template>
          <acronym title="{$acronyms[1]}">
            <xsl:value-of select="$acronym" />
          </acronym>
          <xsl:call-template name="replace-acronyms">
            <xsl:with-param name="text" select="$after" />
            <xsl:with-param name="acronyms" select="$acronyms" />
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="replace-acronyms">
            <xsl:with-param name="text" select="$text" />
            <xsl:with-param name="acronyms"
                            select="$acronyms[position() > 1]" />
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
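To show how the template might be driven, here is a minimal sketch of a surrounding XSLT 1.0 stylesheet. It assumes the template has been saved in its own stylesheet module called replace-acronyms.xsl (a hypothetical file name) and that you want every text node in the journal source processed; neither assumption comes from the question, so adjust the match patterns to suit your source:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical module: an xsl:stylesheet wrapping the
       replace-acronyms template shown above. -->
  <xsl:include href="replace-acronyms.xsl" />

  <!-- Copy elements and attributes through unchanged. -->
  <xsl:template match="*|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Pass each text node through the acronym replacement. The
       $acronyms parameter is left to its default value. -->
  <xsl:template match="text()">
    <xsl:call-template name="replace-acronyms">
      <xsl:with-param name="text" select="." />
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

With an acronym list that holds entries for XML and XSLT, a text node "Transform XML with XSLT." should come out as "Transform ", an <acronym> element containing XML, " with ", an <acronym> element containing XSLT, and ".". Comments and processing instructions aren't matched by either template here, so the built-in templates drop them.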
So, the key points here are:
1. work through the acronyms using recursion rather than iteration
2. recurse on the before and after portions of the text
3. treat generated XML as XML rather than as a string (so don't use
   disable-output-escaping to create it)
To complete this email, I'll just mention that in XSLT 2.0, you can use
<xsl:analyze-string> to do this. Something along the lines of:
<xsl:variable name="acronyms" as="element(acronym)+"
              select="document('../xml/acronyms.xml')/acronyms/acronym" />
<xsl:variable name="acronym-regex" as="xs:string"
              select="string-join($acronyms/@acronym, '|')" />
<xsl:analyze-string select="$text" regex="{$acronym-regex}">
  <xsl:matching-substring>
    <xsl:variable name="acronym" as="xs:string" select="." />
    <acronym title="{$acronyms[@acronym = $acronym]}">
      <xsl:value-of select="$acronym" />
    </acronym>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    <xsl:value-of select="." />
  </xsl:non-matching-substring>
</xsl:analyze-string>
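If you want to try the XSLT 2.0 approach as a whole stylesheet, something like the following sketch might do. It makes a few assumptions that aren't in the snippet above: that every text node should be processed, that acronyms may contain characters that are special in regular expressions (so they are escaped before being joined), and that longer acronyms should be tried before shorter ones, since the first matching alternative wins. The $sorted-acronyms variable is a name introduced just for this example:

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs">

  <xsl:variable name="acronyms" as="element(acronym)+"
                select="document('../xml/acronyms.xml')/acronyms/acronym" />

  <!-- Assumption: longest acronyms first, so "XSLT" is tried before "XSL". -->
  <xsl:variable name="sorted-acronyms" as="element(acronym)+">
    <xsl:perform-sort select="$acronyms">
      <xsl:sort select="string-length(@acronym)" order="descending" />
    </xsl:perform-sort>
  </xsl:variable>

  <!-- Escape regex metacharacters in each acronym, then join with '|'.
       The 'for' expression keeps the sorted order; a path such as
       $sorted-acronyms/@acronym would return document order again. -->
  <xsl:variable name="acronym-regex" as="xs:string"
                select="string-join(
                          for $a in $sorted-acronyms
                          return replace($a/@acronym,
                                         '[.\\?*+{}()\[\]|^$\-]', '\\$0'),
                          '|')" />

  <!-- Copy elements and attributes through unchanged. -->
  <xsl:template match="*|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Wrap each acronym found in a text node in an <acronym> element. -->
  <xsl:template match="text()">
    <xsl:analyze-string select="." regex="{$acronym-regex}">
      <xsl:matching-substring>
        <xsl:variable name="acronym" as="xs:string" select="." />
        <acronym title="{$acronyms[@acronym = $acronym]}">
          <xsl:value-of select="$acronym" />
        </acronym>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="." />
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Like the recursive template, this matches an acronym anywhere it occurs, including inside a longer word; if that matters for your text, you'd need to extend the regular expression to check what surrounds each match.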