xslt, spreadsheets and xslt
1. | Producing Excel spreadsheet from XML data |
Peter Flynn suggested output to CSV files and then importing into Excel. John Putman suggested using the http://www.hallogram.com/formulaone/ product or the new Excel 2002 product (as was one idea in my original note). Paul Clapham also suggested the CSV approach. As another approach he suggested producing HTML and then using the import function in Excel 2000. Thomas Passin also suggested producing HTML and then importing into Excel (97+). His second idea was to write script inside Excel to call the XML parser. My thoughts: I don't think that any non-Excel centric approach will work very well because of my "pretty" requirement. i.e. the Excel spreadsheet has been professionally designed and I need to retain all the fonts, colours and layout. Importing the CSV is going to give the data, but not all the fancy presentation. Similarly importing the HTML is going to give the data and perhaps most of the presentation, but not all of it perfectly. I have come to the realization that life will be easier if I take an Excel centric approach (which has nothing to do with XSL -- sorry to list readers, you are getting this note to close off this thread) either through Excel script or by calling Excel's COM interface (say from VB). Excel was not originally designed to separate data from presentation, so why fight it? My solution will probably be as follows: (1) no XSL transformation occurs on Unix (2) FTP the XML document to NT (3) write a VB program that traverses the XML and writes interesting bits of data into the right spots into the Excel (COM that calls MSXML3 and Excel's automation interfaces). Well, obviously support for <xsl:output method="ms-excel"/> is a long way off. But a common technique for producing Excel-compatible data without Excel-interfacing DLLs and the like is to produce a comma-separated values file. You would use method="text" and output one line for each row of the spreadsheet you are creating; the columns would be separated by commas, and non-numeric columns would be surrounded by quotes ("). If you put this in a file whose extension is ".csv", most Windows systems with Excel loaded will fire up Excel to import this when the file is double-clicked. Of course this is data only and doesn't allow you to specify fonts, formulas, and so on. Another option, if your users are up to Excel 2000, is to check out what Excel does when you say "Save As HTML". This option produces an HTML file that Excel can convert back to a spreadsheet. If you can duplicate the HTML format that Excel uses, then you would be able to produce an HTML file that Excel could read. (From what I see, reverse-engineering that format would be non-trivial -- and perhaps even illegal in the United States, from what I've heard -- but you might find documentation for it somewhere in Microsoft's web site.) | |
2. | XML data to MS-Excel |
You can use an XSLT stylesheet to transform the XML into one of the following data formats, all of which can be imported into an Excel spreadsheet: Tab delimited data - This works well if you know that none of the cell values will contain tab characters or end of line characters. However, the tab delimited data won't contain any information about fonts, colors, borders, formulas, etc. (In other words, the spreadsheets will be quite dull.) Comma Separated Values (CSV) - This works well even if some cell values contain commas and/or end of line characters, since you can enclose each cell value in quotes. However, if a cell value contains a quote character, then you must escape the quote char with an additional quote char. This is doable in XSLT, but it's not easy. Also, the CSV file won't contain any information about fonts, colors, borders, formulas, etc. HTML table - Excel 97+ can import (and export) an HTML table. Excel will honor the HTML attributes that control text size, color, etc. However, I believe that Excel will ignore any attributes that are defined in a CSS. The HTML can even contain some Microsoft-proprietary attributes that control number formatting. I don't think the HTML table can contain spreadsheet formulas. To learn more about importing HTML tables into Excel, you can export a variety of Excel spreadsheets to HTML and examine the resulting HTML. Office 2000's HTML+XML format - Excel 2000 can import (and export) HTML+XML documents. These HTML+XML documents contain XML islands, which encode spreadsheet information that can't be expressed in HTML. I've heard that Excel 2000 can convert between binary spreadsheets and HTML+XML spreadsheets without loss of information. To learn more about this, you might want to read the "Microsoft Office HTML and XML Reference", which is available at Microsoft. SYLK (Symbolic Link) - If some of your users have spreadsheet software that can't import/open an HTML table but that can open a SYLK file, then you can transform the XML into SYLK. SYLK is a text-based interchange format for spreadsheets; it supports formulas, borders, fonts, point sizes, etc. SYLK is supported by Excel and other spreadsheet packages. SYLK is the RTF of spreadsheets. The problem with SYLK is that it is not well documented. You can find some links and references to some terse SYLK documents in section 14 of the comp.apps.spreadsheets FAQ. You can also learn SYLK by creating a variety of simple spreadsheets, exporting each of them to a SYLK file and analyzing the resulting SYLK. Jeff Todd adds In addition to the other suggestions posted, you can also accomplish the conversion with a tool. iMaker for Excel from Infoteria will convert your XML to Excel or export Excel to XML. Single user license is $150. 30 day trial version is available from infoteria Reuel Alvarez adds Try this: It's the Formula One iReporting Engine, which can generate REAL Excel spreadsheets with different datasources, including XML and JDBC. It can also edit Excel spreadsheets via Java. | |
3. | Microsoft Excel to XML |
the problem of converting an Excel file into a DocBook file. After saving as XML, the rest of the process boiled down to a straightforward transform, with one sticky bit. As many of you know (because it's been on the list a number of times), Excel doesn't add Cell elements for empty table cells. Instead, it adds an Index attribute to the next Cell that has content. So, the resulting data looks something like this: <Row ss:AutoFitHeight="0"> <Cell><Data ss:Type="String">text</Data><NamedCell ss:Name="area_range"/></Cell> <Cell><Data ss:Type="String">Formating information in following row "group"</Data><NamedCell ss:Name="area_range"/></Cell> <Cell ss:Index="11" ss:StyleID="s21">And so on</Cell> </Row> I searched Google and the FAQ for ideas. Mike Kay and Joris Gillis had solutions, and thanks to both for that. However, their solutions didn't quite suit me. Also I always feel that I don't truly understand a problem until I develop my own solution to it. So, here's my take on how to convert Excel's odd XML output to something easier for XSL to process: <xsl:template match="ss:Row"> <row> <xsl:for-each select="xs:integer($count + 1) to xs:integer($max-cells)"> <entry/> </xsl:for-each> </xsl:otherwise> </xsl:choose> </xsl:otherwise> </xsl:choose> </xsl:for-each> </xsl:template> Note that a cell in a DocBook table is called an entry. Also, you'll need xmlns:xs="http://www.w3.org/2001/XMLSchema"; in the stylesheet element. MIke Kay adds I don't think your solution handles the ss:mergeAcross attribute, which can appear on a cell that spans more than one column. Instead of creating Cell elements to represent missing cells, I've tended to compute the column number of the cells that do exist and add it as an explicit attribute of the cell. Both techniques achieve the objective of making it easy to address a cell given its column number. This can be done using this kind of logic: <xsl:template match="Row"> <Row nr="{f:getRowNumber(.)}"> <xsl:for-each-group select="Cell" group-starting-with="Cell[@ss:Index]"> <xsl:variable name="start" select="(@ss:Index, 1)[1]" as="xs:integer"/> <xsl:for-each select="current-group()"> <xsl:variable name="mergeTotal" select="sum(current-group()[current() >> .]/@ss:MergeAcross)"/> <xsl:if test="Data"> <Cell col="{$start + $mergeTotal + position() - 1}"> <xsl:value-of select="Data"/> </Cell> </xsl:if> </xsl:for-each> </xsl:for-each-group> </Row> </xsl:template> The key to this is that it treats the problem as a positional grouping problem in which the group is a sequence of cells starting with one that has an ss:Index attribute. I think you could adapt the technique to creating empty cells by putting the logic to construct the empty cells at the start of the for-each-group body. |