Accessing directories with XSLT
1. Accessing a directory full of XML files.
This is a pure DOS / XML / XSLT way of creating an XML file containing a directory listing. It is based on my earlier solution, which did not tolerate embedded spaces in file names.

Warning: if a file name contains an ampersand, the generated XML will fail to parse! If you need to handle that, pass the listing through a Java filter that escapes the ampersands (EscapeAmps.java, below).

For *nix users: according to the XML spec, the line-parsing technique should be OS independent. You would change the batch file (and sed those pesky ampersands while you're at it), but the XML file needs no change, and all you need to do to the XSL file is swap the '\' for a '/'. :) A Perl script is also added at the end of this answer. In other words, the technique of processing a line-separated file should be portable without change, and the specific utility should be fairly easy to port.

The solution takes a line-separated text file and processes it into an XML file. Doing this requires two uses of XML entities: first, a system entity to read the text file into the content of an XML element; and second, a character entity (&#10;) for the ASCII 10 linefeed character, used to split that content into lines.

If you are unfamiliar with system entities, run xmlDir.bat, then compare xmlDir.xml viewed in a text editor with the same file viewed in an XML processor such as IE5. Ta-da...

I was never very fond of XML entities, so this was a useful exercise for me; I hope it helps others too.

1. The batch file (xmlDir.bat)

    @echo off
    cd > xmlDir.lst
    dir *.xml /b >> xmlDir.lst
    saxon xmlDir.xml xmlDir.xsl > xmlFiles.xml

Note that the last line needs changing to call your own Saxon processor (not the Java version).

2. The XML file (xmlDir.xml)

    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE xmlDir [
      <!ENTITY xmlDirList SYSTEM "xmlDir.lst">
    ]>
    <xmlDir>&xmlDirList;</xmlDir>

Note that this won't work until you have created the entity's file (xmlDir.lst) by running the batch file, and saved it in a location where the XML file can access it.

3. The XSL file (xmlDir.xsl)

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

      <!-- root template -->
      <xsl:template match="/xmlDir">
        <!-- create our root output element -->
        <xmlDir>
          <!-- the path is on the first line, file names on the others -->
          <xsl:call-template name="file">
            <!-- all the CR-LF pairs have been normalised to ASCII 10, as specified in
                 http://www.w3.org/TR/1998/REC-xml-19980210#sec-line-ends -->
            <xsl:with-param name="path" select="substring-before(string(), '&#10;')" />
            <xsl:with-param name="flist" select="substring-after(string(), '&#10;')" />
          </xsl:call-template>
        </xmlDir>
      </xsl:template>

      <!-- process the individual files in the linefeed-separated file list -->
      <xsl:template name="file">
        <xsl:param name="path" />
        <xsl:param name="flist" />
        <xsl:if test="$flist != ''">
          <!-- output the path and the first file name as one element -->
          <xmlFile><xsl:value-of select="concat($path, '\', substring-before($flist, '&#10;'))" /></xmlFile>
          <!-- now recurse with the same path and the rest of the file names -->
          <xsl:call-template name="file">
            <xsl:with-param name="path" select="$path" />
            <xsl:with-param name="flist" select="substring-after($flist, '&#10;')" />
          </xsl:call-template>
        </xsl:if>
      </xsl:template>

    </xsl:stylesheet>

It was pointed out that this solution fails on any ampersands in the file names. The following Java program addresses this: it copies standard input to standard output, replacing each & with &amp;. You could, for example, pipe the listing commands in xmlDir.bat through it, as in dir *.xml /b | java EscapeAmps >> xmlDir.lst.

EscapeAmps.java:

    import java.io.*;

    // Filter: copies stdin to stdout byte by byte, escaping every '&' as "&amp;"
    // so that the listing can be parsed as XML content.
    public class EscapeAmps {
        static final int AMP = '&';

        public static void main(String[] args) {
            InputStream stdIn = new BufferedInputStream(System.in);
            int b;
            try {
                while ((b = stdIn.read()) != -1) {   // read until end of input
                    if (b == AMP) {
                        System.out.print("&amp;");
                    } else {
                        System.out.write(b);          // copy every other byte unchanged
                    }
                }
            } catch (IOException e) {
                System.err.println("EscapeAmps: " + e.getMessage());
            }
            System.out.flush();                       // write(int) does not always flush
        }
    }

For Perl users, from Marc Beckers: a self-documenting Perl script that does this for HTML files. All you need to do is edit the DOS dir call so that it reads in files with the XML suffix rather than HTM*; in the copy below that change has already been made (it uses *.xml*).
You must have Perl installed, of course. The output file is called mother.xml; it contains the mother root element, with each path in a file element. The path names are relative to the working directory. You can get absolute paths by deleting the line that removes the current working directory from the path name (the $a = substr (...) line).

    # Everything after a hash is a comment
    # This Perl script scans for all XML files (the original
    # version looked for HTM or HTML files)
    # in or under the current directory
    # and creates an XML file that records
    # where each file is located. The file name is placed
    # in a "file" element.
    # The outer element of the XML file is "mother".
    # This version 2000-02-09, Chris Bradley
    # NOTE: This is Windows NT-specific.
    # It uses the DOS "dir" command to create a
    # temporary file "temp.lst" that contains the list of XML files.
    # (A .lst name is used so that the temporary file is not
    # itself matched by the *.xml* pattern.)
    # The temp.lst file is also used to
    # contain the name of the current working directory.
    # A UNIX version would need another way to list the files.

    # Get the current working directory
    # (will be removed later from all input lines of
    # the directory listing)
    system ("cd > temp.lst");
    open (inputfile, "temp.lst") or die "Cannot read temp.lst: $!";
    $a = <inputfile>;
    close (inputfile);
    # The length includes the trailing newline, so the substr below
    # also skips the "\" that follows the directory name.
    $curdirnamelen = length ($a);

    # Here's the DOS "dir" call that traverses
    # the tree and stores into "temp.lst"
    # or wherever you want it.
    # (A mother.xml left over from a previous run will also be listed.)
    system ("dir /b/s *.xml* > temp.lst");

    # Now open the file just created,
    # and use it as input to create the new "mother" XML file
    open (inputfile, "temp.lst") or die "Cannot read temp.lst: $!";

    # Open the "mother" output file for WRITE operations,
    # and call it "mother.xml" in the current directory:
    open (outputfile, ">mother.xml") or die "Cannot write mother.xml: $!";

    # Start the mother document with the opening "mother" tag:
    print outputfile "<mother>";

    # Now scan the file that contains all the filenames,
    # using a "while" loop:
    while ($a = <inputfile>) {
        # The variable "$a" contains
        # the current input line from temp.lst.
        chomp($a);  # removes the line feed at the end
                    # of the input line (technical detail)
        # remove the current working directory from the path name
        $a = substr ($a, $curdirnamelen, length ($a)-1);
        # Put opening and closing "file" tags round the current line
        print outputfile "\n <file>$a</file>";
    }

    # Now output the closing "mother" tag.
    print outputfile "\n</mother>";

    # Exit cleanly by closing any open files:
    close (inputfile);
    close (outputfile);
    # Congratulations, you're through!
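The Perl script above depends on the DOS dir command and, like the batch file, writes ampersands into the XML unescaped. As a minimal portable sketch (my assumption, not part of the original answer), the Java class below does the same job with java.nio.file: it walks the tree under a directory, keeps files whose names end in .xml (case-insensitively, which is narrower than the *.xml* pattern of the dir call), and writes each path, relative to that directory, into a file element under a mother root element. The class name MotherList and the method buildMother are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical portable counterpart of the Perl script: walks the tree under a
// directory, keeps regular files whose names end in ".xml", and wraps each
// relative path in a <file> element inside a <mother> root element.
class MotherList {

    // Escape the characters that must not appear literally in XML text content.
    static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String buildMother(Path root) throws IOException {
        List<String> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".xml"))
                .map(p -> root.relativize(p).toString()) // relative, like the Perl substr step
                .sorted()                                // walk order is unspecified; sort for stable output
                .collect(Collectors.toList());
        }
        StringBuilder sb = new StringBuilder("<mother>");
        for (String p : paths) {
            sb.append("\n <file>").append(escapeText(p)).append("</file>");
        }
        return sb.append("\n</mother>").toString();
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".").toAbsolutePath().normalize();
        String xml = buildMother(root);
        Files.write(root.resolve("mother.xml"), xml.getBytes(StandardCharsets.UTF_8));
    }
}
```

Run it as java MotherList from the directory to be listed, or pass that directory as the first argument; it writes mother.xml there. As with the Perl script, a mother.xml from a previous run will be picked up on the next run. Escaping inside the program, rather than in a separate filter, means ampersands and angle brackets in file names no longer break the output.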
2. List directory contents as XML
I wrote a JavaScript command-line utility to list the directory contents as XML. It creates output in the following format:

    <?xml version="1.0"?>
    <folder name="temp" dirroot="c:\\temp">
      <folder name="temp">
        <file name="temp.tmp" />
      </folder>
      <file name="temp.tmp" />
    </folder>
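The JavaScript utility itself isn't shown, so here is a hypothetical Java sketch that emits the same nested format for a directory tree: one folder element per directory and one empty file element per file, with a dirroot attribute on the outermost element only. I am assuming dirroot holds the absolute path of the starting directory; the sketch writes that path as Java reports it (single backslashes on Windows, unlike the doubled ones in the sample). Listing subfolders before files, sorting by name, the two-space indentation and not following symbolic links are my choices, not necessarily the utility's. The class name FolderXml is illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch (not the JavaScript utility): prints a directory tree
// as nested <folder>/<file> elements in the format shown above.
class FolderXml {

    // Escape a string for use inside a double-quoted XML attribute value.
    static String attr(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    static String toXml(Path root) throws IOException {
        Path abs = root.toAbsolutePath().normalize();
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n");
        sb.append("<folder name=\"").append(attr(nameOf(abs)))
          .append("\" dirroot=\"").append(attr(abs.toString())).append("\">\n");
        appendChildren(abs, sb, "  ");
        return sb.append("</folder>\n").toString();
    }

    // Writes subfolders (recursively) first, then files, each group sorted by name.
    private static void appendChildren(Path dir, StringBuilder sb, String indent) throws IOException {
        List<Path> folders = new ArrayList<>();
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                // NOFOLLOW_LINKS: a symlinked directory is listed as a file, avoiding loops
                if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
                    folders.add(p);
                } else {
                    files.add(p);
                }
            }
        }
        Collections.sort(folders);
        Collections.sort(files);
        for (Path f : folders) {
            sb.append(indent).append("<folder name=\"").append(attr(nameOf(f))).append("\">\n");
            appendChildren(f, sb, indent + "  ");
            sb.append(indent).append("</folder>\n");
        }
        for (Path f : files) {
            sb.append(indent).append("<file name=\"").append(attr(nameOf(f))).append("\" />\n");
        }
    }

    // A filesystem root such as "/" or "C:\" has no file name, so fall back to the full path.
    private static String nameOf(Path p) {
        Path name = p.getFileName();
        return name == null ? p.toString() : name.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(toXml(Paths.get(args.length > 0 ? args[0] : ".")));
    }
}
```

For example, java FolderXml c:\temp prints the XML for that tree to standard output, which you can redirect to a file as with the batch approach in the first answer.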