Wendell Piez
>I'm developing a stylesheet that converts XML to html to display
>research articles. The articles contains three citation types,
>bibliographical, table call, and figure call. Upon encountering a table
>call or figure call, I would like to display the table or figure
>referred to immediately following the paragraph that contains the call.
>I want the table or figure to appear in the order they were referred to
>in the paragraph and I want each table or figure to only appear once in
>the outputted document. Tables and figures are numbered in order of
>their reference, though at any point you can refer to
>a table or figure that has been previously called.
>
>Citations look like this:
><xref ref-type="bibr" rid="B1">1</xref> <xref ref-type="table"
>rid="T1">Table 1</xref> <xref ref-type="fig" rid="F1">Figure 1</xref>
>
>Sample Input:
>
>[A paragraph that includes a citation for Table 1.] [A paragraph that
>includes citations for Table 2, Table 1, Figure 1, and Table 3.]
>
>Sample Output:
>
>[A paragraph that includes a citation for Table 1.]
>
>Table 1
>
>[A paragraph that includes citations for Table 2, Table 1, Figure 1,
>and Table 3.]
>
>Table 2
>Figure 1
>Table 3
>
> My initial thought is to create a set of keys:
>Key: Last Table Processed
>Key: Last Figure Processed
>Key: Last Table Encountered
>Key: Last Figure Encountered
>
>Since the tables and figures are numbered in order, a comparison of the
>two keys should be in order. This comparison should be made at the end
>of processing a paragraph. However, I'm not quite sure how I'd make
>such a comparison or even if I can use keys in that manner. I'm
>thinking I might need to generate some sort of array to keep track of
>the multiple citations encountered so that in the sample provided the
>output is (Table 2, Figure 1, Table 3) and not (Table 2, Table 3,
>Figure 1) or (Figure 1, Table 2, Table 3). If I were to build an array,
>since at this point I don't need to process <xref> citations of "bibr"
>type, those should be ignored. Any suggestions would be greatly appreciated.
This problem is a bit difficult, not because any of the methods you will use is inherently hard, but because it's complex and requires a couple of XSLT 1.0 tricks. Disentangling it shows a way forward.
You actually have several problems here:
* Assigning each table or figure its correct number (based on order of citation, not order of appearance in the source)
* Citing the tables or figures where xrefs appear in line, each with its correct number
* Placing the tables and figures each after the paragraph where it's first cited, and not elsewhere
You will use keys for this, though not perhaps in exactly the way you're imagining. Likewise, it'd be nice if we could construct an array, and in XSLT 2.0 we could (or at least the functional equivalent thereof), but in XSLT 1.0 we can't. So we have to fake a couple of things. As you'll see, this faking may potentially get us into a bit of trouble with performance.
The usual XSLT 1.0 approach when this happens is to split a problem into two or more passes, which generally gives us opportunities to optimize for efficiency. In order to simplify this explanation I'm going to assume you have only tables. Figures will work just the same:
Assigning each table or figure its number ... we could do this either by counting the references (filtering out repeated references) or by counting the tables, sorted by their first reference. While the latter would be nice, the former is easier in XSLT 1.0. We start by giving ourselves a means to filter out the repeats:
<xsl:key name="tablerefs-by-rid" match="xref[@ref-type='table']" use="@rid"/>
Given $rid, we can then get all the references to any table by calling
key('tablerefs-by-rid', $rid)
and the first one only by calling
key('tablerefs-by-rid', $rid)[1]
In addition, we can get all the first references by saying, e.g.
//xref[@ref-type='table'][count(.|key('tablerefs-by-rid', @rid)[1])=1]
This XPath traverses the entire document from the root, collecting all xrefs that are the first reference to their table. If they aren't to a table, the first predicate filters them out. If they are not a first reference, the count of their union with the first reference will be 2 not 1, and the second predicate will filter them out. This uses an XPath 1.0 idiom (the count() trick) to test node identity. The generate-id() function is also sometimes used for this, so this would also work:
//xref[generate-id()=generate-id(key('tablerefs-by-rid', @rid)[1])]
Notice that here we don't need the first predicate: xrefs not to tables are also thrown out by the predicate given (the key returns nothing for them, so the two generate-id() values can never match). So you may prefer this.
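To compare the two identity tests side by side, here is a small self-contained stylesheet. It is only a sketch: it assumes the xref markup shown in the question, and the first-refs, by-count, by-generate-id and ref element names are invented for the illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:key name="tablerefs-by-rid" match="xref[@ref-type='table']" use="@rid"/>

  <xsl:template match="/">
    <!-- first-refs and its children are hypothetical wrapper elements -->
    <first-refs>
      <by-count>
        <!-- count() idiom: the ref-type predicate is required here, because
             for a non-table xref the key returns nothing and the union
             then contains just the xref itself (count 1) -->
        <xsl:for-each select="//xref[@ref-type='table']
                              [count(.|key('tablerefs-by-rid', @rid)[1])=1]">
          <ref rid="{@rid}"/>
        </xsl:for-each>
      </by-count>
      <by-generate-id>
        <!-- generate-id() idiom: non-table xrefs drop out by themselves,
             since generate-id() of an empty node-set is the empty string -->
        <xsl:for-each select="//xref[generate-id()=
                              generate-id(key('tablerefs-by-rid', @rid)[1])]">
          <ref rid="{@rid}"/>
        </xsl:for-each>
      </by-generate-id>
    </first-refs>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample input in the question (Table 1 in the first paragraph; Table 2, Table 1, Figure 1 and Table 3 in the second), both lists should contain T1, T2 and T3, in that order.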
It would be very convenient to have all these particular nodes collected together so we don't have to collect them over and over (an expensive traversal). So:
<xsl:variable name="first-table-refs"
  select="//xref[generate-id()=generate-id(key('tablerefs-by-rid', @rid)[1])]"/>
(This is awfully close to an array, isn't it?)
Consequently, for any xref[@ref-type='table'] that is the first reference to its table, we can also get the proper number with the expression
count($first-table-refs
[count(.|current()/preceding::xref) = count(current()/preceding::xref)]) + 1
which looks, and is, awfully obnoxious and expensive (using the costly preceding:: axis twice), but which can be optimized slightly as a template call that also copes with repeated references:
<xsl:template match="xref" mode="assign-table-number">
  <xsl:for-each select="key('tablerefs-by-rid', @rid)[1]">
    <!-- switching context to the first reference
         to this reference's table -->
    <xsl:variable name="preceding-refs" select="preceding::xref"/>
    <xsl:value-of select="count($first-table-refs
      [count(.|$preceding-refs) = count($preceding-refs)]) + 1"/>
    <!-- counting the first table references before this one,
         and adding 1 -->
  </xsl:for-each>
</xsl:template>
I wish this were easier, but in XSLT 1.0 it just isn't. In 2.0, it is (and maybe Mike or Jeni or someone will show us how). But it does solve problem 1, and you can see how any given xref[@ref-type='table'] can call
<xsl:apply-templates select="." mode="assign-table-number"/>
and get its number, thereby solving problem 2.
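As a sketch of how the inline citation might use that call, the stylesheet below gives table xrefs a default-mode template that writes "Table N". The a element and its href="#T1"-style link are assumptions for the example, not something the question asks for; the key, variable and numbering template are the ones described above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:key name="tablerefs-by-rid" match="xref[@ref-type='table']" use="@rid"/>

  <!-- every xref that is the first reference to its table -->
  <xsl:variable name="first-table-refs"
    select="//xref[generate-id()=generate-id(key('tablerefs-by-rid', @rid)[1])]"/>

  <!-- inline citation; the link markup is an assumption -->
  <xsl:template match="xref[@ref-type='table']">
    <a href="#{@rid}">
      <xsl:text>Table </xsl:text>
      <xsl:apply-templates select="." mode="assign-table-number"/>
    </a>
  </xsl:template>

  <xsl:template match="xref" mode="assign-table-number">
    <!-- move to the first reference to this table, so that a repeated
         citation gets the same number as the first one -->
    <xsl:for-each select="key('tablerefs-by-rid', @rid)[1]">
      <xsl:variable name="preceding-refs" select="preceding::xref"/>
      <!-- first references that come before this one, plus 1 -->
      <xsl:value-of select="count($first-table-refs
        [count(.|$preceding-refs) = count($preceding-refs)]) + 1"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that the generated number replaces whatever text the xref carried in the source; if the source numbering is already trustworthy, the built-in template that simply copies the text may be enough.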
Problem 3 is a matter of selecting, after you create a paragraph, those references in it that are first references to their targets (tables, figures, whatnot), which again you can do (in the case of tables) using this same idiom:
<xsl:template match="para">
  <p>
    <xsl:apply-templates/>
  </p>
  <xsl:apply-templates mode="get-target"
    select=".//xref[generate-id()=generate-id(key('tablerefs-by-rid', @rid)[1])]"/>
  <!-- do the same with any other keys for xrefs you have, e.g. to figures,
       perhaps unifying the select -->
</xsl:template>
To actually get the target you're going to need another key:
<xsl:key name="target-by-rid" match="table|figure" use="@id"/>
and then
<xsl:template match="xref" mode="get-target">
  <xsl:apply-templates select="key('target-by-rid', @rid)" mode="show"/>
</xsl:template>
which will go apply templates to the table, figure or whatever. Note I've put this call also in a special mode, "show", enabling you to say in the default mode
<xsl:template match="table|figure"/>
so the tables only come out where you actually want them.
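Putting the placement pieces together, one way to get tables and figures interleaved in citation order is to index both kinds of call with a single key. The following is a minimal sketch under several assumptions: the key name floatrefs-by-rid is made up, table and figure ids are assumed never to collide, the element names para, table and figure are taken from the templates above, and the "show" template just wraps a copy of the source element in a div as a placeholder for real formatting.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- one key for table and figure calls alike (hypothetical name);
       bibliographic xrefs are simply not indexed -->
  <xsl:key name="floatrefs-by-rid"
    match="xref[@ref-type='table' or @ref-type='fig']" use="@rid"/>
  <xsl:key name="target-by-rid" match="table|figure" use="@id"/>

  <xsl:template match="para">
    <p>
      <xsl:apply-templates/>
    </p>
    <!-- first references inside this paragraph, visited in document
         order, which is the order of citation -->
    <xsl:apply-templates mode="get-target"
      select=".//xref[generate-id()=generate-id(key('floatrefs-by-rid', @rid)[1])]"/>
  </xsl:template>

  <xsl:template match="xref" mode="get-target">
    <xsl:apply-templates select="key('target-by-rid', @rid)" mode="show"/>
  </xsl:template>

  <!-- default mode: tables and figures are suppressed where they stand -->
  <xsl:template match="table|figure"/>

  <!-- placeholder rendering; replace with real table/figure formatting -->
  <xsl:template match="table|figure" mode="show">
    <div class="float" id="{@id}">
      <xsl:copy-of select="."/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input in the question this should place Table 1 after the first paragraph and Table 2, Figure 1, Table 3 after the second. One consequence of the design: a table or figure that is never cited will not appear in the output at all.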
Whew! Not bad for a bit of work, eh?
This should work fine for input at the scale of most human-readable documents. For higher performance (that numbering is a beast), you'll want to split out an analytic/sorting pass before processing, or get out the big rotary saw (XSLT 2.0).
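For what the XSLT 2.0 route might look like, here is an untested sketch, assuming an XSLT 2.0 processor. It keeps the same key and variable names, tests node identity with the "is" operator, and reads the number off as the position of the xref's rid in the sequence of first references, using the standard index-of() function. Treat it as one possible approach rather than the canonical 2.0 solution.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:key name="tablerefs-by-rid" match="xref[@ref-type='table']" use="@rid"/>

  <!-- first references to each table, in document order;
       "is" compares node identity directly -->
  <xsl:variable name="first-table-refs"
    select="//xref[@ref-type='table'][. is key('tablerefs-by-rid', @rid)[1]]"/>

  <xsl:template match="xref[@ref-type='table']">
    <xsl:text>Table </xsl:text>
    <!-- each rid occurs once among the first references, so index-of()
         returns a single position: the table's number -->
    <xsl:value-of select="index-of($first-table-refs/@rid, @rid)"/>
  </xsl:template>
</xsl:stylesheet>
```

This avoids the preceding:: axis entirely, since the sequence keeps its document order and a position lookup does the counting.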
Note: I just typed this up and haven't tested it, but I have used such code and it works. Beware particularly of missing parentheses in my XPaths, etc.