Andrew Welch
You can process directories of XML using the collection() function,
and keep memory usage constant by using the Saxon extension
saxon:discard-document():
<xsl:for-each select="for $x in
collection('file:///c:/xmlDir?select=*.xml;recurse=yes;on-error=ignore')
return saxon:discard-document($x)">
  <!-- process each document here; the context item is its document node -->
</xsl:for-each>
You have to be careful that Saxon doesn't optimize out the call to
saxon:discard-document() - this basic outer xsl:for-each works well
and has become boilerplate code for whenever I start a new report.
This technique allows you to do things that would otherwise not be
feasible with XSLT, and would take longer in another language: for
example, finding, grouping and sorting all links in your collection of
XML files. Coding the XSLT takes minutes and running it takes time
proportional to your dataset size, but the restriction of system
memory has gone.
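As a rough illustration of the link report mentioned above, here is a minimal sketch of a complete stylesheet. It assumes an XSLT 2.0 Saxon edition in which saxon:discard-document() is available, that links are carried in href attributes, and that the stylesheet is started at a named template (called main here). The dir parameter, the main template and the links/link output elements are hypothetical names chosen for the example, not part of the original post.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <xsl:output indent="yes"/>

  <!-- Hypothetical input directory; change to suit -->
  <xsl:param name="dir" select="'file:///c:/xmlDir'"/>

  <!-- Assumed entry point: run with an initial template named "main" -->
  <xsl:template name="main">
    <links>
      <!-- Take only the href strings from each document, so that no
           node reference is held once the document has been discarded -->
      <xsl:for-each-group
          select="for $x in
                  collection(concat($dir, '?select=*.xml;recurse=yes;on-error=ignore'))
                  return saxon:discard-document($x)//@href/string(.)"
          group-by=".">
        <xsl:sort select="current-grouping-key()"/>
        <!-- One element per distinct link, with its number of occurrences -->
        <link href="{current-grouping-key()}" count="{count(current-group())}"/>
      </xsl:for-each-group>
    </links>
  </xsl:template>

</xsl:stylesheet>
```

The grouping works on atomic strings rather than attribute nodes. That is a design choice for this sketch: holding on to nodes would keep their documents reachable and could undo the memory saving.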
David Carlisle adds
> in what way do you use the collection() function?
collection() is good for collections where you don't know in advance which
documents will be there. For example, Saxon lets you write
collection('foo?select=*.xml') to pick up all XML files in a directory.
I would expect it is also likely to be what's used to map to XML databases
and the like.
<xsl:variable name="files" select=
"collection('file:///sgml/?select=*.xml;on-error=warning')"/>
<doc>
<xsl:copy-of select="$files/*"/>
</doc>
<xsl:for-each select="for $f in
collection('file:///sgml/?on-error=warning;select=c*.xml') return $f">
<file name="{document-uri(.)}">
<xsl:copy-of select="."/>
</file>
</xsl:for-each>
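The fragments above need a surrounding stylesheet to run. A minimal sketch follows, assuming an XSLT 2.0 processor such as Saxon that understands the select and on-error query parameters, started at a named template. The template name main and the files wrapper element are hypothetical; the file:///sgml/ directory is the one used above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output indent="yes"/>

  <!-- Assumed entry point: run with an initial template named "main" -->
  <xsl:template name="main">
    <files>
      <xsl:for-each
          select="collection('file:///sgml/?select=c*.xml;on-error=warning')">
        <!-- Inside the loop the context item is each document node -->
        <file name="{document-uri(.)}" root="{name(*)}"/>
      </xsl:for-each>
    </files>
  </xsl:template>

</xsl:stylesheet>
```

This version lists each matched file with the name of its root element rather than copying the whole document, which keeps the output small when you only want to check what the collection picked up.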