Dimitre Novatchev  
> I'm looking at graph processing problems as a testbed for this, and 
> came across a problem that I haven't been able to solve elegantly. The 
> problem is to find "linker" vertexes that a pair of verteces from a 
> pre-defined set. For example, if the graph verteces represent cities 
> and edges represent flights between them, then given a list of cities, 
> find all intermediate cities that you would stop in via a "connecting 
> flight". 
> 
> For example, given the following simple graph: 
> 
> V1 -> V2 -> V3 -> V4 
>         \<- V5 ->/ 
> 
> (V5 points to both V2 and V4), and its XML serialization: 
> 
>  <graph> 
>    <vertex id="V1"/> 
>    <vertex id="V2" type="anchor"/> 
>    <vertex id="V3"/> 
>    <vertex id="V4" type="anchor"/> 
>    <vertex id="V5"/> 
>    <edge source="V1" target="V2"/> 
>    <edge source="V2" target="V3"/> 
>    <edge source="V3" target="V4"/> 
>    <edge source="V5" target="V2"/> 
>    <edge source="V5" target="V4"/> 
> </graph> 
> 
> I would like to transform this into a second graph where all vertexes 
> that "link" two anchor distinct vertexes are flagged as link nodes. In 
> this case, there are two anchor vertexes V2 and V4, and I can link 
> them through V3 (V2 -> V3 -> V4) and V5 (V2 <- V5 -> V4). Note that 
> linked verteces must be distinct, so traversing the V2 <- V1 -> V2 
> path should not yield V1 as a link node. So I'd like to see something 
> like this: 
> 
> <graph> 
>    <vertex id="V1"/> 
>    <vertex id="V2" type="anchor"/> 
>    <vertex id="V3" linker="true"/> 
>    <vertex id="V4" type="anchor"/> 
>    <vertex id="V5" linker="true"/> 
>    <edge source="V1" target="V2"/> 
>    <edge source="V2" target="V3"/> 
>    <edge source="V3" target="V4"/> 
>    <edge source="V5" target="V2"/> 
>    <edge source="V5" target="V4"/> 
> </graph> 
> 
> It would be ideal to come up with a generalized solution that would 
> let you use 1, 2, .. N intermediate linking nodes.  
 
Here's the general solution and it is quite simple and
straightforward. I use a recursive template based on the following
rule:   Paths(/.., X, Z) = Union( {X -> Xi} . Paths(X, Xi, Z) )
Where the function Paths(Eset, X, Z) returns all paths from X to Z,
that do not include any vertices belonging to the exclusion-set Eset. 
Here {X -> Xi} are all arcs from X. 
Informally,
    {X -> Xi} . Paths(X, Xi, Z) 
is the Cartesian product of the arc {X -> Xi} with the set of paths Paths(X,
Xi, Z), giving a new set of paths. 
As you can see the code of the transformation is 71 lines long, but it
should be noted that for the purpose of readability I write some XPath
expressions on several lines and also almost half of these 71 lines are just
closing tags. 
Here's the code. This transformation: <xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:exsl="http://exslt.org/common"
 exclude-result-prefixes="exsl"
 >
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:key name="kNeighbors" match="vertex"
  use="../edge[@target = current()/@id]/@source"/>
  <xsl:key name="kNeighbors" match="vertex"
  use="../edge[@source = current()/@id]/@target"/>
  <xsl:template match="/">
    <xsl:call-template name="getPaths">
      <xsl:with-param name="pNode1"
       select="/*/vertex[@type='anchor'][1]"/>
      <xsl:with-param name="pNode2"
       select="/*/vertex[@type='anchor'][2]"/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template name="getPaths">
    <xsl:param name="pNode1" select="/.."/>
    <xsl:param name="pNode2" select="/.."/>
    <xsl:param name="pExcluded" select="/.."/>
    <xsl:for-each select=
           "key('kNeighbors', $pNode1/@id)
                       [not(@id = $pExcluded/@id)]">
      <xsl:choose>
        <xsl:when test="@id = $pNode2/@id">
          <path>
            <xsl:copy-of
             select="/*/edge[$pNode1/@id = @*
                           and
                             $pNode2/@id = @*
                            ]"/>
          </path>
        </xsl:when>
        <xsl:otherwise>
          <xsl:variable name="vrtfTail">
            <xsl:call-template name="getPaths">
              <xsl:with-param name="pNode1"
                              select="."/>
              <xsl:with-param name="pNode2"
                              select="$pNode2"/>
              <xsl:with-param name="pExcluded"
                        select="$pExcluded | $pNode1"/>
            </xsl:call-template>
          </xsl:variable>
          <xsl:variable name="vTail"
           select="exsl:node-set($vrtfTail)/*"/>
           <xsl:if test="$vTail">
             <path>
               <xsl:copy-of
                  select="/*/edge[$pNode1/@id = @*
                                and
                                  current()/@id = @*
                                  ]"/>
               <xsl:copy-of select="$vTail/*"/>
             </path>
           </xsl:if>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
when applied on this source.xml: <graph>
  <vertex id="V1"/>
  <vertex id="V2" type="anchor"/>
  <vertex id="V3"/>
  <vertex id="V4" type="anchor"/>
  <vertex id="V5"/>
  <edge source="V1" target="V2"/>
  <edge source="V1" target="V3"/>
  <edge source="V2" target="V3"/>
  <edge source="V3" target="V4"/>
  <edge source="V5" target="V2"/>
  <edge source="V5" target="V4"/>
</graph>
 produces the wanted result: <path>
   <edge source="V1" target="V2"/>
   <edge source="V1" target="V3"/>
   <edge source="V3" target="V4"/>
</path>
<path>
   <edge source="V2" target="V3"/>
   <edge source="V3" target="V4"/>
</path>
<path>
   <edge source="V5" target="V2"/>
   <edge source="V5" target="V4"/>
</path>
 
These are all three different paths (with all nodes in the path only
once) from V2 to V4 in the graph described by the xml above. The first solution: is 3 edges long. 
The other two are two edges long each (Note that I added to your original
graph structure a new arc from V1 to V3 in order to make it more "general"). 
I hope this helped in this specific problem and also to answer
positively your questions about the expressiveness of XPath/XSLT and
the appropriateness of XSLT as a tool for solving this type of
problems. 
Sjoerd Visscher adds > It would be ideal to come up with a generalized solution that would 
>  let you use 1, 2, .. N intermediate linking nodes. I've been able to 
> get this working with nested loops, but it isn't particularly 
>  declarative or speedy, and is certainly more verbose than I'd like, 
>  so I'm wondering if anyone here has insights into how to do this 
>  elegantly and in XSLT/XPath style. For example, is it possible to 
>  write a single XPath expression that will select <vertex> 
>  elements that obey the above criteria? If not, does anyone have any 
>  suggestions for how to code this effectively and efficiently with 
>  XSLT? 
 The following XSL transformation does what you want: <?xml version="1.0"?>
<xsl:transform version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="a" match="/graph/vertex[@type='anchor']" use="@id" />
<xsl:key name="e1" match="/graph/edge/@target" use="../@source" />
<xsl:key name="e2" match="/graph/edge/@source" use="../@target" />
<!-- identity template -->
<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()" />
  </xsl:copy>
</xsl:template>
<xsl:template match="vertex">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()" />
    <!-- more than one edge to an anchor vertex? -->
    <xsl:if test="count(key('a',key('e1',@id)|key('e2',@id)))>1">
      <xsl:attribute name="linker">true</xsl:attribute>
    </xsl:if>
  </xsl:copy>
</xsl:template>
</xsl:transform>
I've used the key function here as both a look-up and as a filter shortcut. The neat thing about the key function is that it returns a set of distinct nodes. F.e. if you'd have two edges from V1 to V2, the expression key('e1', @id) returns 2 nodes, but when put through the key function again to find the anchor vertices, there's only one result: the V2 vertex. Extending this to a generalized solution is still hard though...  |