XSLT performance
1. XSLT benchmark
DataPower announces XSLTMark, an XSLT benchmark and a small compliance-testing suite. Version 1.1.0 is available now and is the first release to the general public. There are about 40 different test cases in this release (see the documentation for descriptions and several third-party credits). A variety of Java and C/C++ processors are supported, and drivers for other XSLT engines are easy to add. Source and makefiles are being released, with an emphasis on Linux x86, although Win32 x86 and Solaris SPARC are also supported, and other platforms should be fairly straightforward. For more information, please see: DataPower
We are also making available some initial benchmark results for several popular and well-regarded XSLT processors. We welcome comments, benchmark result submissions and new test drivers for other XSLT processors (see the list for drivers that already exist).
XSLTMark -- First XSLT Benchmark
DataPower's XSLTMark is the first comprehensive benchmark for measuring the performance of XSL processors. It can be used to test the XSLT performance of XSL processors for XML-to-XML and XML-to-HTML transformations. It also provides basic compliance testing to ensure that benchmark results are not distorted by incorrectly functioning processors.
The benchmark is a Java application that uses a "Driver" class to communicate with the XSL processor under test. Both Java and native (C/C++) processors are supported, with driver modules available for many popular XSLT engines on a variety of platforms. XSLTMark is currently being used for performance and compliance testing at DataPower, but it also has a core suite of tests that yields benchmark figures for external comparison.
Features:
System Requirements:
Availability: Version 1.1.0 available for evaluation download
Kevin Jones adds (July 2003)
We (www.sarvega.com) also maintain XSLT and XML parser benchmarks. The XSLT results are for late 2002, although I believe an update is being prepared at the moment and should be ready soon.
Ednote: The tests are undated (as of Feb 2005)
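The driver arrangement described for XSLTMark can be pictured roughly as below. This is only a sketch and not XSLTMark's actual code: the `XsltDriver` interface, the `JaxpDriver` class, `MiniBenchmark.run` and the tiny test case are all hypothetical names invented for illustration, and the JDK's built-in JAXP processor stands in for the processor under test. Each run is checked against the expected output (the compliance part) before its time is counted.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical driver contract: one implementation per XSLT processor. */
interface XsltDriver {
    String transform(String stylesheet, String input);
}

/** Driver for whatever processor JAXP finds (the JDK's built-in one by default). */
class JaxpDriver implements XsltDriver {
    @Override
    public String transform(String stylesheet, String input) {
        try {
            // Compile the stylesheet, then run it once over the input.
            Templates compiled = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            compiled.newTransformer().transform(
                new StreamSource(new StringReader(input)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}

class MiniBenchmark {
    // Hypothetical test case: stylesheet, input and the output a correct processor gives.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='item'><xsl:value-of select='.'/>,</xsl:template>"
        + "</xsl:stylesheet>";
    static final String INPUT = "<list><item>a</item><item>b</item></list>";
    static final String EXPECTED = "a,b,";

    /** Runs the case repeatedly; returns elapsed nanoseconds, or -1 if any output is wrong. */
    static long run(XsltDriver driver, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (!EXPECTED.equals(driver.transform(STYLESHEET, INPUT))) {
                return -1; // compliance failure: the timing would be meaningless
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long ns = run(new JaxpDriver(), 100);
        System.out.println(ns < 0 ? "compliance failure" : "100 runs: " + ns / 1_000_000 + " ms");
    }
}
```

Adding another processor would then mean writing one more `XsltDriver` implementation, which is presumably why the announcement calls new drivers easy to add.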
2. Benchmark for XPath - Update
I updated the XPath benchmark (science.uva.nl) according to some of the comments I have received from the Saxon people. In particular, the major changes are as follows:
1) Query Q21, which was problematic because of the format of the output, has been slightly modified.
2) Whitespace that is not expected in the query result has been removed from the results of the queries. The format of the result of each query is now the same as you obtain by running Saxon with the serialization parameter '!indent=no' (and without wrapping).
3) A version of the benchmark including a DTD has been added. In order to declare in the DTD only the elements of the benchmark, the benchmark documents and the query results have been encapsulated into a CDATA section. Moreover, the DTD for the XMark target document has been included.
4) The technical report describing the benchmark has been updated with sections about the evaluation methodology. In particular, several performance indexes for evaluating an XML engine have been proposed.
My next task is to evaluate Galax and Saxon according to this evaluation strategy. Any comment is always welcome.
Massimo
University of Amsterdam
3. Eight tips for how to use XSLT efficiently:
Eight tips for how to write efficient XSLT:
Sebastian updated this, July 2000
I have spent some time making a more rigorous set of XSLT performance statistics, which are visible at http://users.ox.ac.uk/~rahtz/xsltest/Report.html
I should say that I don't entirely believe some of what I did. I'll repeat it tonight. The issue of file caching is a worry. Anyway, all the files I used are on the web, if anyone with some patience wants to try it, or test other processors. The file Test.pl is a (horrid) Perl script that does the work.
Some trends are fairly obvious in this sort of 'number-crunching' XSLT:
On *my* computer setup, Oracle (with Sun JDK) and Sablotron are memory hogs which effectively kill the machine while running on a decent-sized file. I know Oracle can do much better in other setups. None of this surprised me. My biggest disappointment was Sablotron, which had impressed me earlier; the 'apply-templates select="//*"' really upset it!
I know these tests are not of much interest to the web servlet brigade, but for those of us who want to run radical transforms on big documents, it may help. The next step must be to see what happens when different XML parsers are used with each processor.
I just ran a test to see how different XSLT processors coped with reading a 3 MB XML file and going over the tree twice. (All processors are current as of July 2000.)
Table 1. Comparison
The interesting results here are:
- Oracle has some real problem in working well for me :-}
- Sablotron is slower than Saxon, despite being compiled C++
- bizarrely, Xalan produced the results even though it (rightly) said my XML file was not valid against the DTD. Is this to be expected?
Scott Boag answers for Xalan
They're just validation warnings, I think... the XML file is still well-formed, and can be processed. We just happen not to stop specifically when a validation warning occurs. I think you can override this behavior. In retrospect, it might be more correct to stop processing by default if a validation warning occurs.
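The stop-or-continue choice Scott describes can be illustrated at the parser level with standard JAXP. This is a sketch under assumptions, not Xalan's own configuration mechanism (which the answer above does not spell out): the `ValidationPolicy` class, the inline DTD and the sample document are hypothetical. A validating parser reports validity problems to an `ErrorHandler`, and that handler decides whether to carry on or abort.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidationPolicy {
    // Well-formed, but not valid: <b/> is not declared and doc must contain <a>.
    static final String INVALID_XML =
        "<!DOCTYPE doc [<!ELEMENT doc (a)><!ELEMENT a (#PCDATA)>]><doc><b/></doc>";

    /**
     * Parses with DTD validation switched on.
     * Lenient mode collects validity messages and keeps going;
     * strict mode aborts on the first validity error.
     */
    static List<String> parse(String xml, boolean stopOnInvalid) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    problems.add(e.getMessage());
                }
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    if (stopOnInvalid) {
                        throw e;              // validity error ends the parse
                    }
                    problems.add(e.getMessage());
                }
                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e;                  // not well-formed: always stop
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return problems;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse stopped: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lenient: " + parse(INVALID_XML, false));
        try {
            parse(INVALID_XML, true);
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

In lenient mode the document is still turned into a tree that a stylesheet could process, which matches the behaviour Sebastian saw; strict mode corresponds to the stop-by-default behaviour Scott suggests might be more correct.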
4. Improving Performance
The best practices are the same as with any other technology:
Most of the advice for XSLT is also fairly obvious:
At the coding level:
Some tips:
Obviously the details are processor-dependent. In particular, different processors do different optimisations.
5. Predicate or if clause?
I just stumbled onto a subtle (at least to me) difference between these two nominally equivalent forms:
<xsl:template match="foo[util:is_applicable()]">
and
<xsl:template match="foo"> <xsl:if test="util:is_applicable()">
In the first case all *inapplicable* foo elements fall through to the default template. If there is no other explicit template for "foo", the content of foo will likely flow to the output, so the inapplicable foo elements are not suppressed. Doh!
That makes putting the check in the match= value the least attractive option, because it needs at least one separate, lower-priority template to catch all the elements that fail the applicability check. Doing the check at select time, by contrast, ensures that only applicable elements are processed at all.
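The difference is easy to reproduce with the XSLT 1.0 processor that ships with the JDK. In this sketch the extension function `util:is_applicable()` is replaced by a plain attribute test, `@applicable=1`, which is an assumption made so the example runs without extension functions; the `PredicateVsIf` class and the sample document are likewise hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PredicateVsIf {
    static final String INPUT =
        "<doc><foo applicable='1'>A</foo><foo applicable='0'>B</foo></doc>";

    // Check in the match pattern: a foo that fails it goes to the built-in template.
    static final String IN_MATCH = sheet(
        "<xsl:template match='foo[@applicable=1]'>[<xsl:value-of select='.'/>]</xsl:template>");

    // Check inside the template: every foo is matched, xsl:if decides what is written.
    static final String IN_IF = sheet(
        "<xsl:template match='foo'><xsl:if test='@applicable=1'>"
        + "[<xsl:value-of select='.'/>]</xsl:if></xsl:template>");

    // Check at select time: inapplicable foo elements are never processed at all.
    static final String IN_SELECT = sheet(
        "<xsl:template match='doc'>"
        + "<xsl:apply-templates select='foo[@applicable=1]'/></xsl:template>"
        + "<xsl:template match='foo'>[<xsl:value-of select='.'/>]</xsl:template>");

    /** Wraps the given templates in a stylesheet that produces plain text. */
    static String sheet(String templates) {
        return "<xsl:stylesheet version='1.0' "
            + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>" + templates + "</xsl:stylesheet>";
    }

    static String transform(String stylesheet, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(IN_MATCH, INPUT));  // [A]B  (B leaks through)
        System.out.println(transform(IN_IF, INPUT));     // [A]
        System.out.println(transform(IN_SELECT, INPUT)); // [A]
    }
}
```

The stray `B` in the first result is the text of the inapplicable foo, copied to the output by the built-in template rules. Adding an empty `<xsl:template match='foo'/>` to the first stylesheet would suppress it, which is the extra lower-priority template mentioned above.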
7. Performance hints
Generic tips:
- Always deliver any text-based content from a server with gzip encoding. Apache has a module for doing this, and instructions on how to define which MIME types/extensions to deliver this way. You will be amazed with the results.
- If you must do client-side transforms, refactor your browser-specific JavaScript to manage the transformations instead of loading stylesheets with an XML PI calling a stylesheet. There are more mechanisms in JavaScript for pre-loading or caching stylesheets (e.g. http://www.perfectxml.com/articles/xml/XSLTInMSXML.asp?pg=2). This of course complicates matters.
- If you are interested in server-side XSLT performance, both in pre-publishing and in dynamic server-side XSLT processing, you can investigate compiled stylesheets. Most of these technologies just convert your stylesheet into a Java object (translets, I think they were called; XSLTC is built into Xalan).
- Pre-publish as much as possible on the server to its final format. If things need to change, determine whether the change is really dynamic, or whether you could, say, schedule publishing every 15 minutes.
- There are hardware appliances built with XSLT processing in mind these days, for all but the most serious situations I would imagine.
- Like it or not, XSLT may not be the right tool for every publishing job. Investigate refactoring using other techniques like SAX, or perhaps your XML could be restructured to generate a more appropriate structure from source, designing out XSLT transform steps.
- Make use of the nice timing mechanisms in Saxon to measure which parts of your stylesheet are slowing things down. Also, depending on your XML size, you can choose which tree model Saxon uses, which can have significant effects on speed.
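For the compiled-stylesheet tip, standard JAXP already offers a processor-neutral handle: a `Templates` object is a compiled stylesheet that can be reused for many transformations. The sketch below caches one per stylesheet name. The `StylesheetCache` class, its method names and the demo stylesheet are hypothetical; with the JDK's default factory the compiled form should be an XSLTC translet, but other processors on the classpath will compile differently.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetCache {
    // Hypothetical demo stylesheet: writes the text of the root <n> element.
    static final String DEMO_XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='n'/></xsl:template>"
        + "</xsl:stylesheet>";

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> compiled = new ConcurrentHashMap<>();

    /** Returns the compiled stylesheet for this name, compiling it on first use only. */
    Templates get(String name, String xslt) {
        return compiled.computeIfAbsent(name, key -> compile(xslt));
    }

    // The factory is not guaranteed to be thread-safe, so compilation is serialised.
    private synchronized Templates compile(String xslt) {
        try {
            return factory.newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    /** Transforms xml with the cached stylesheet; each call gets its own Transformer. */
    String apply(String name, String xslt, String xml) {
        try {
            StringWriter out = new StringWriter();
            get(name, xslt).newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        StylesheetCache cache = new StylesheetCache();
        for (int i = 1; i <= 3; i++) {
            long start = System.nanoTime();
            String result = cache.apply("demo", DEMO_XSLT, "<n>hi</n>");
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("run " + i + ": " + result + " in " + micros + " us");
        }
    }
}
```

On a typical run the first call is noticeably slower than the later ones because it includes compilation, but the actual figures depend on the processor and the JVM, so none are quoted here.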
- A few performance-related tips at the XSL FAQ: this site
- A recent performance article for .NET, very informative if this is your environment: fawcette.com
- If you live in a Java app server environment, I found this article somewhat useful: sys-con.com
There is lots you can do to simplify your XSLT, though with no examples it is hard to suggest anything.
PS: add RAM and more processors, get faster hard drives, and use any filesystem other than the one provided by Microsoft.