A brief introduction to XSLT.
1. YMCA?
I can't think of a better introduction than this. DaveP
Joe Kesselman kindly forwarded this, crossposted from the newsgroup rec.music.filk with the author's permission;
2. Is XSLT hard?
Is that the reputation it has? I thought it had a rather different reputation, being easily the most successful of the W3C-specified languages post XML, and one of the more widely distributed programming languages ever.
> If I were in an IT department or company that hired various web page
Odd. I find most people pick up the basic template-driven style of XSLT fairly quickly. It is true that people with more programming experience find some bits different or strange, but that's because they try to program in Fortran with XSLT syntax, which doesn't really work. People who haven't got preconceived ideas about programming normally pick up the programming aspects of XSLT naturally as well (that being the point of the functional programming style: that it's more natural).
Michael Kay adds
These statements aren't contradictory. If XSLT weren't so successful, it wouldn't have any reputation at all; it would just be ignored like 99% of the other programming languages that have been invented. It's got a reputation for being challenging because people see the learning curve that's ahead of them and they know they can't just ignore the challenge, they have to face up to it. I went up this learning curve myself about 4-5 years ago. I didn't find it easy. I never do find new concepts easy. I struggled when I first learnt SQL, when I learnt goto-less programming, when I learnt object-oriented programming - I even remember struggling the first time I had to understand subroutines. But each time, I've got to the top of the hill and never looked back. It's worth the climb.
Wendell adds
The bottom line is, a great deal depends, as when learning anything new, on your attitude going in. If you dread the prospect, it'll be hard. If you expect to have fun learning how to get things done with a powerful new instrument, it'll be easy.
It's in the context of this awareness that I get bugged by all the claims of how hard XSLT is: that only becomes a self-fulfilling prophecy, for those unwary enough to believe it.
3. Basic XSLT process
In a previous life I wrote this: xml.com. This article explains the XML/XSLT process occurring either at the server side or the client side. If you want to read more, just go to my personal site and read some of the articles I wrote about the XML technologies. Didier PH Martin didier-martin.com
4. Terminology
Yes. It bears some resemblance to the language under discussion (XSLT). If we talked about "finding" nodes we'd have to say which XSLT instructions found things and which did other things, whereas the XSLT instructions that select have the syntax select="...". The analogy with English is closer with "select" as well. For an XPath like "/" you don't really need to "search" for or "find" the root node of your document: you know where it is at the start, and it is handed to you as the initial node of the transform. However, before you do anything with it, you need to select it with select="/" or specify code with a template that matches it with match="/".
That's the problem with making up your own terminology on the fly: you have to decide such things and then explain them to others before they know what you are talking about.
XML documents have elements and attributes. XSLT/XPath data is a tree of nodes that more or less correspond to the original document. It is sometimes helpful to elide the distinction and sometimes helpful to stress it. Similarly, sometimes it's helpful to stress "element node" and sometimes it's more convenient to talk of elements. Sometimes (but rarely in an XSLT context) it's convenient to talk of element tags, which is something else again.
> Sorry, that's an American thing--getting thrown a curve ball...
Yes, I could guess (although I have not seen the expression before). But that was really my point. You are making assertions about the use of unfamiliar and confusing terminology which I just don't think hold up. When I looked at your message, just about the only terms in it that I was sure I understood were the XPath expressions. If you are learning a new language then the terms are unfamiliar, but that doesn't mean the terms are wrong and should be changed.
> What I'm saying is that it's may be easier to describe an XPath from a
But it is a fact that the language was explicitly designed to be reminiscent of Unix and Windows file paths. We could choose not to tell people that, but we can't change history.
Much of the terminology comes from family trees (child::, following-sibling::, ancestor:: etc.); however, the syntax /a/b/* is lifted straight from shell file path syntax.
> By incidental, I mean that if one looks at the XPath as "not a path"
Yes, this is true (if you mean what I think you mean). A single XPath expression (as it selects a set of nodes, not just one) can represent many actual paths (branches) in the document tree.
> Eh? Five types of what? See what I mean, now you've confused me again!
Not types of anything, just types. XPath has the types number, node-set, boolean and string; XSLT adds result tree fragment, so that's 5 in all (plus one special object type for encapsulating any non-standard extensions that a system might have).
> Because we are "selecting and matching" I find it hard to understand
If you start off with a document <x/> and transform it with <xsl:template match="x"> <y/> </xsl:template> then the first document <x/> is the input document and the document <y/> is the result document. Exactly how the system does I/O is out of scope for the spec itself, but every system gives you some way of getting hold of the result; otherwise what's the point?
> Can you see my point about "rewriting" the way for the "next wave"?
No. It seems you found an example where someone used a term not closely aligned with the terms in the actual specification, which meant that without being given more context it's hard to be sure exactly what was meant. However, you are arguing that technical terms be used less, not more.
> Is XSL a "declarative" language?
Roughly speaking, it means that you declare what you want the result to be, rather than specifying exactly what the machine does to produce it. "Imperative" languages consist of instructions executed sequentially (mostly) by a machine.
XPath is a search function: it just searches (and selects) nodes in a tree or trees. XSLT is a programming language: you can express any algorithm in XSLT that you can express in any other language.
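To make the select/match distinction discussed in this section concrete, here is a minimal sketch. It assumes the input document is the single element <x/> used in the answer above; the layout of the stylesheet is purely illustrative and not taken from the original thread.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- match="/" declares which node this template applies to:
       the root node, handed over as the initial node of the transform. -->
  <xsl:template match="/">
    <!-- select="x" chooses which nodes are processed next:
         here, the child element x of the root node. -->
    <xsl:apply-templates select="x"/>
  </xsl:template>

  <!-- For each selected x element, a y element is added to the result tree. -->
  <xsl:template match="x">
    <y/>
  </xsl:template>

</xsl:stylesheet>
```

Nothing here "searches" for the root node. One template matches it, and the select attribute then names the nodes to be processed from there.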
5. At what processing phase does this happen?
This arose from the many questions that people ask on the list, to which the answer is that this happens at time X, not time Y. Whilst the details may not be 100% correct, they hopefully provide some indication of what takes place when, during the processing of an XML instance by an XSLT engine using an XSLT stylesheet.
Phase 0, XML parse of stylesheet and input document:
Build the internal model of the stylesheet (including URI resolution if needed) by parsing the stylesheet, including any included or imported stylesheets.
Build the internal model of the source document (if present).
Action document properties such as xml:space, xml:base etc. See Note 1.
Phase 1, XSLT transformation:
Apply the stylesheet to the input document (if present).
URI and entity resolution for any doc() and document() calls from the stylesheet.
Parse additional entities such as the result of document() calls.
Output may be produced at the implementor's discretion. Output production implies:
Apply applicable document properties such as xml:space.
Apply appropriate character encodings.
Apply any character maps (XSLT 2.0). (See Note 2.)
Apply disable-output-escaping.
The actual output target could be a DOM Document, a SAX stream, serialisation to a disk file, or a feed to another transformation stage.
d-o-e impacts the serializer that writes the output to a file, not the processor that generates the output (the "result tree"). A reason it is optional is that serializers are optional, and if an implementor builds an engine with no expectation that the result tree will be written out to a file, how is it to be handled?
Note 1. (DC.) This may include some or all of the following in an appropriately interleaved order:
Note 2. The above three are only applicable when serializing using the XML, XHTML or HTML output method.
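As a rough illustration of the split between building the result tree and serializing it, the sketch below puts the same text into the result tree twice. The element names out, escaped and raw are made up for the example. How the output is written depends on whether the processor serializes at all and whether it honours disable-output-escaping, so no particular output is promised.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialization hints: these only matter if the processor actually
       serializes the result tree (for example, to a disk file). -->
  <xsl:output method="xml" encoding="iso-8859-1" indent="yes"/>

  <xsl:template match="/">
    <out>
      <!-- The text node holds the five characters <hr/>. A serializer
           using the xml method escapes the angle brackets on output. -->
      <escaped><xsl:value-of select="'&lt;hr/&gt;'"/></escaped>

      <!-- Same text node, plus a request that the serializer write it
           without escaping. A processor that hands the tree on as DOM
           or SAX has no serializer, and may ignore the request. -->
      <raw><xsl:value-of select="'&lt;hr/&gt;'"
                         disable-output-escaping="yes"/></raw>
    </out>
  </xsl:template>

</xsl:stylesheet>
```

Both text nodes are identical in the result tree. The difference only shows up, if at all, at the serialization step.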
6. Basic terminology
I've inserted this Q and A here simply because I like it! DaveP.
> I am a beginner on XSLT. I read some documents and am not
Stop thinking about tags for a moment. There are different ways of looking at XML. Yes, if you take the XML spec at its word, an element "is" the element start tag to the element end tag, and the element's content is everything in between. However, you will have an easier time dealing with XPath and XSLT if you try not to relate these terms directly to the text of the raw document and its syntax.
Instead, think of the text of the XML document, tags and all, as being instructions on how to build a hierarchical data structure -- a tree of "nodes". Nodes are something that exist in an abstract (imaginary, implied) universe. Just think of them as "things"; little containers of information. In the world of XPath/XSLT, the nodes are given relatively simple relationships to each other to form a hierarchical tree. Consider this tree:
The document element is a "child" of the root node. The text node and 2nd element node are children of the document element node. The root node is an "ancestor" of all the nodes. The element node is the "parent" of the attribute nodes, but in a twist of XPath and DOM weirdness, the attributes are not children of the element they apply to. You can see that an element is just one of several types of node.
Now consider that the node tree can be serialized into a linear syntax consisting of certain sequences of Unicode characters: <stuff>hello<crap foo1="bar1" foo2="bar2"/></stuff>
These 'character' things are still a bit abstract, since computers need to ultimately deal with them as bit patterns (0's and 1's). So the characters are mapped ("encoded") into sequences of bits & bytes according to a character-to-bit-pattern map that goes by a cryptic name like iso-8859-1 or utf-8 or one of a bazillion others. Once encoded as bits (or bytes or whatever the most convenient level of abstraction is for you), your data is ready for storage and/or transmission in copper and silicon.
And that... is XML. It is up to you to figure out how to best arrange your data. Typically you use elements as named containers for other elements and/or runs of character data that become text nodes in the XPath/XSLT tree model. Attributes are name-value pairs that are attached to elements. When to use attributes and when to use elements is a matter for XML Zen 101.
You're off to a good start. You will also need to understand namespaces, what an XML parser does, and the XSLT processing model. Buy Michael Kay's XSLT Reference tome and pore over the introductory chapters.
> 3) Element node
Not necessarily wrong, but you should be thinking about trees, not tags, or you will be burned hard by XSLT. Check out what I just posted at http://skew.org/xml/stylesheets/treeview/html/ ... compare the sample_input.xml (view it in a text editor, not Internet Explorer, so you see the line breaks) and the sample_output.html. Then download an XSLT processor like Instant Saxon or msxml.exe, and start applying tree-view.xsl to your own XML documents.
> 4) descendant::*
The descendant and child axes do not include attribute nodes.
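The relationships described in this answer can be checked with a small stylesheet. This is only a sketch: it assumes the input is exactly the one-line document <stuff>hello<crap foo1="bar1" foo2="bar2"/></stuff> shown above, with no extra whitespace, and the labels in the output are made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The text node "hello" and the element crap: 2 children. -->
    <xsl:text>children of stuff: </xsl:text>
    <xsl:value-of select="count(stuff/node())"/>

    <!-- Attributes are not children, so crap has 0 children. -->
    <xsl:text>&#10;children of crap: </xsl:text>
    <xsl:value-of select="count(stuff/crap/node())"/>

    <!-- The attribute axis finds them: 2 attribute nodes. -->
    <xsl:text>&#10;attributes of crap: </xsl:text>
    <xsl:value-of select="count(stuff/crap/@*)"/>

    <!-- descendant::* from the root node sees the elements stuff
         and crap only: 2. No attribute or text nodes are counted. -->
    <xsl:text>&#10;descendant elements: </xsl:text>
    <xsl:value-of select="count(descendant::*)"/>

    <!-- Yet the parent of an attribute is its element: crap. -->
    <xsl:text>&#10;parent of foo1: </xsl:text>
    <xsl:value-of select="name(stuff/crap/@foo1/..)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

If the source document is indented, the whitespace between tags becomes extra text nodes and the child counts go up accordingly, which is a good reason to think in trees rather than tags.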
7. When to use XSLT
A big question, but a good one. Trouble is, it needs a big answer, and I don't have time to give one. It depends a little bit on the task you are performing, and the alternative approaches you are considering.
Once you are over the learning curve, I think that an XSLT solution to most XML transformation tasks (the tasks for which the language was designed) will be shorter than solutions in other languages: that generally means they will be faster to develop and easier to maintain. A particular benefit is that if you use the rule-based programming model for XSLT, the stylesheet will be very robust in the face of changes to the schema of the source documents it is designed to process.
Don't believe arguments that XSLT is slow. For most straightforward transformations, the execution time is dominated by the time to parse the input and serialize the output, and that will be the same whatever language you use. For more complex transformations, very large input documents, etc, it's probably true that a good Java programmer could produce a faster solution: but could they produce it faster?
One shouldn't discount arguments based on skills and experience. Most people solving a programming problem will produce a better solution using poor tools that they know well rather than good tools that are new to them. If you're in a hurry, use the tools you know well. If you're taking a long-term view, don't attach much weight to the experience of your first few days.
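One common way to get the robustness mentioned above is an identity rule plus a few specific rules. The sketch below is an illustration rather than anything from the original answer; the element names para and p are assumed for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: any node with no more specific rule is copied,
       and its attributes and children are processed in turn. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Specific rule: para becomes p, wherever it occurs in the tree.
       It has a higher default priority than the identity rule. -->
  <xsl:template match="para">
    <p>
      <xsl:apply-templates select="@*|node()"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

If the schema later gains new elements, or para starts to appear at a different depth, the stylesheet needs no change: new elements fall through to the identity rule and para is still matched by name.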
This section is for the newbie. It covers some background, the why and what of XSLT, what it can do, tools and where you can find them, some simple examples, and access to reading material in books and on the web. I hope it's enough to both satisfy an initial curiosity and get you started. DaveP
Origins, what and why.
The first W3C working draft was produced in August 1998, and in November 1999 XSLT became a formal recommendation of that organisation. Its companion recommendation, XPath, is similarly dated.
Mike Kay defines XSLT as a language for transforming the structure of an XML document.
Why? Well, XML by itself isn't up to much. Only by transforming it can its power be released. XSLT provides that facility. It makes it possible to separate content and structure from presentation. It makes it possible to match the output form of one process to the input form of another process or application.
What can it do?
The most common usage I'm aware of is producing human-readable forms of XML source. This capability extends from producing plain ASCII text through to professionally formatted print documents in PDF and PostScript, and includes HTML, XHTML and WML on the way.
What can't it do? Well, the recommendation itself tells us it's not a general-purpose transformation language. This tends to mean that it's not good at some things. It has to work hard to manipulate basic character content. It can't access system or language resources without assistance, and it can't deal with binary files. That aside, it is pretty competent in a remarkably large area, and with extensions is extremely powerful. The price of using extensions is a tie to a particular processor or language. It is supremely good at what it's meant for: XML transformation.
What is it?
XSLT and XPath are realised via a combination of an XSLT processor and stylesheets. A stylesheet is itself a well-formed XML document which specifies how the output is 'styled' with respect to the input document, and works in conjunction with an XSLT processor to produce output which is usually another XML document, although it could be a plain text document.
By tools I mean the things necessary to work with XSLT. Editors are a nicety, but do help, due to the necessity of creating a well-formed XML document using the unusual syntax of XSLT. An XSLT processor is the second requirement.
Since an XSLT stylesheet is an XML document, writing one is no harder than creating an XML document. Any text editor will suffice. A number of people have come up with editors which assist in the creation of XSLT stylesheets. The ones I'm aware of I've listed by their announcements on the XSLT list in a separate document.
There are a number of implementations of the XSLT and XPath recommendations available. These applications take in the XML source and the stylesheet, written in XSLT, and produce an XML output. Most are free of charge, but have varying licence conditions attached. Again these are the ones I know of, and are available in a separate document.
For those of you using Microsoft's operating systems, please be aware that as of Feb 2001 there are complications using Internet Explorer.
Rather than try to describe them, I suggest a visit to an unofficial Microsoft site which addresses them. Jonathan Marsh, from Microsoft, offers the following advice re compliance: “Please read the Microsoft documentation on what is supported, and in which release. Until complete XSLT support is available in all our products it is important to check the deployment details before making any assumptions about support.” He also suggests you visit the Microsoft site for further information.
This section is written to give you a very brief introduction to transformations using XSLT, with some very simple examples, to outline two approaches, to mention some design patterns that have emerged, and then to leave you with further reading, in the form of online articles, books and other websites where you might learn more.
This is the Hello World of XSLT. A simple example that runs in all the processors I've met. To make it even simpler, I've not even made use of a source document. All the content is held in the stylesheet.
<HTML xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <HEAD>
    <TITLE>Welcome</TITLE>
  </HEAD>
  <BODY>
    <P>Welcome to the World of XSLT</P>
  </BODY>
</HTML>
With one exception this looks like any other HTML or XHTML document. The attribute xsl:version="1.0" and the namespace declaration xmlns:xsl="http://www.w3.org/1999/XSL/Transform", tacked onto the outer element of the document, let the processor know that this is a stylesheet document. The impact of this is that the processor looks for elements in that namespace, i.e. those which look like <xsl:....>, and does something special with them. Since this document has none, that basic processor rule is never used. The second rule for processors is that anything which is not in that special namespace is passed straight through to the output, which in this case is everything!
If you run this through your favourite XSLT processor you will have created a simple stylesheet that produces HTML! For example, using James Clark's XT processor from the command line, assuming the file above is called helloWorld.xsl and I want the output to be hello.html, then the command
xt helloWorld.xsl helloWorld.xsl hello.html
would produce a small HTML file, just as you would expect.
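The simplified form used for helloWorld.xsl above is shorthand. As I understand the XSLT 1.0 recommendation, a stylesheet written as a single literal result element is treated as if that element were the body of a template matching the root node. A sketch of the equivalent long form:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The whole HTML document becomes the body of one template
       that matches the root node of whatever source is supplied. -->
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>Welcome</TITLE>
      </HEAD>
      <BODY>
        <P>Welcome to the World of XSLT</P>
      </BODY>
    </HTML>
  </xsl:template>

</xsl:stylesheet>
```

The long form is what you need as soon as you want more than one template, which is where the next example is heading.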
To make real use of the content of an input file, rather than passing a name which isn't used (the first parameter to xt), take a source document such as doc.xml below:
<?xml version="1.0" ?>
<doc>
  <head>A document title</head>
  <para>My very first transformed paragraph</para>
</doc>
Using a modified version of the stylesheet above, I can produce output which has both literal content from the stylesheet and input from the source document. The stylesheet then changes to
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/doc">
    <html>
      <head><title>Test Document</title></head>
      <body>
        <xsl:apply-templates/>
        <i>Some additional content from the stylesheet</i>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="head">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
which is the basis of a useful stylesheet. Process this using xt as
xt doc.xml helloWorld.xsl output.html
to obtain a useful html document.
These two forms of stylesheet approach, which are not mutually exclusive, are discussed at some length elsewhere.
If you are still interested, intrigued even, then perhaps some additional reading might be of interest.
This page has a list of websites which cover XSLT.
This page has links to examples of using XSLT
This page lists the books that explicitly cover XSLT.
Once you are familiar with XSLT, then perhaps a reference card is all you need? Mulberrytech (http://www.mulberrytech.com/quickref/) offers a PDF version which I find very useful.
Mulberrytech has hosted a discussion list with the sole subject of XSLT and XSL for the last four years at least, for which thanks Tommie. Newbies and gurus alike are more than welcome. The questions raised vary from absolute beginner to quite advanced. I've found it to be extremely friendly (just so long as you are polite), and helpful, for which my thanks to Mulberrytech. If you get stuck, or are just interested in what's happening in the world of XSLT, then this is the place to be. Joining instructions and all the other information about the list is available on their site.
Other sites which provide stylesheet examples etc.
XSLT Cookbook from Paul Prescod
The XSLT Cookbook is a new project based on a very successful experiment that ActiveState and O'Reilly did called the Python Cookbook. The idea of an online Cookbook is to get people to contribute "recipes" that other people could then take and use in their programs. In the case of the XSLT Cookbook, we are of course talking about XSLT snippets to be used in stylesheets and transformations. See activestate.com
A Cookbook is not a FAQ because it only deals with snippets of code and discussions around them. It doesn't talk about implementation issues or deep language semantics or anything other than snippets of code. Unlike a FAQ, a Cookbook is completely community run. The "editor" just cleans up around the edges. People from the community submit recipes without editor supervision and the community can add commentary, ratings and alternatives. Using this buzzword-compliant distributed, peer-to-peer, web-services strategy, the Python world has collected almost 200 recipes and these recipes contribute to Python discussion lists and Python culture. I hope the same will occur for XSLT. It really depends on whether the community decides to use it or not.
Note that a Cookbook is also very different than a collection of code in a library such as EXSLT or the XSLT Standard Library. The nice thing about a library is that you directly plug in using import/include. People who maintain these libraries often get submissions of code that cannot really be turned into a straight-forward, reusable set of templates because they are more *ideas* or *patterns* than concrete reusable code. If you can package up some XSLT code as a library, great: you should do that. A Cookbook is for the stuff that cannot be so nicely packaged. XPath expressions are a perfect example.
I've discussed this with Steve Ball of the XSLT Standard Library and he sees the projects as complementary. I certainly hope that some of the recipes will build on the code libraries out there: "This recipe shows how to use EXSLT to do X". Cookbook recipes can also be discussed in comments and rated by end-users.
Right now the XSLT Cookbook is very small because the XSLT community has not yet been invited to start building it. Consider this an invitation! We need recipes in all categories and we may even add categories as we get recipes (e.g. we'd love to fill in SVG and FO categories).
We are working hard to integrate the XSLT Cookbook with the next version of our Komodo XML/XSLT/Python/Perl/PHP development environment. It will soon be possible to submit and download cookbook recipes right from within Komodo. When you combine the XSLT Cookbook with the free educational license for Komodo you have a really excellent environment for teaching or learning XSLT. In the longer term we will also add this feature to our Visual XSLT environment, probably after Visual Studio.NET ships.
When entering any new domain of interest, one of the biggest pitfalls is vocabulary. XSLT is no different. To ease this I've added a page addressing this topic. It attempts to cover most of the esotericisms of XSLT.
I've been adding to this site since early in 1999, so it's a case of "grew like Topsy". I won't apologise for its content, but I can understand how it can be a little frustrating when you can't find what you want. I'll try to explain my logic in its organisation.
At the faq root page a number of files are to be found. These include:
A copy of the Mulberrytech list guidelines
Special character handling, which has proved a problem with many XML and XSLT users
The major subdivisions of the faq (see below)
External questions, which I interpret to be things people are asking about but which are not strictly a part of XSLT. This includes such things as Java, javascript, emacs etc.
Extension issues, i.e. extensions to XSLT
Printing. This is a gentle introduction to XSL-FO
The major subdivisions of the faq are:
Where to start, which contains this introduction and a few other files (badly in need of tidying up :-), such as the terminology and a file addressing one of the most common pitfalls of both XML and XSLT, handling special characters.
XSLT Questions and Answers, which constitutes the main part of this site, nearly 4 megabytes of questions and answers gleaned from the Mulberrytech lists. This links through to a further index which lists out the various topics.
FO Questions, which are questions and answers on the use of XSL-FO, the print oriented side of XSL.
Just out of interest, yes, it's all generated from XML, using XSLT. I use the docbook DTD and Norm Walsh's excellent docbook stylesheets, adapted for use on web sites.
> I've been doing
>some background research on XSLT as a programming language.
Though it is Turing Complete, I tell my students to not regard XSLT as a programming language, but as a templating language. The paradigm (I feel) is "transformation by example" not "transformation by imperative program code".
The stylesheet writer's objective is to supply the processor enough examples (in templates) for the processor to assemble the final resulting node tree out of the nodes from the tree of the XML stylesheet and the nodes of the tree from the XML source file.
>Now I have
>grasped the concepts of declarative programming, but there's one important
>question left: what is the advantages of having XSLT expressed in XML
>syntax?
Given my description above, one is representing nodes from a node tree, not statements of a programming language.
What better hierarchical syntax is there for a node tree than XML?
>Even if we like XML, it's a quite verbose syntax for programming.
I *hate* XSLT as an imperative programming language for this very reason.
I *love* XSLT as a templating language because the objective for the stylesheet writer is to express the nodes that are to be copied to the result tree and the instructions that act on the source tree. And XML does that very very well.
This brought the response:
I agree that anyone coming from an *imperative* programming background needs to relearn a few things. But saying XSLT is 'a templating language' surely risks confusion with interpretive technologies like JSP, ASP etc? That association also needs to be avoided, or you will find people trying select="x/$var/y" and such.
XSLT is a *functional* programming language.
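The select="x/$var/y" mistake mentioned above comes from expecting a variable to be pasted into the path as text. In XPath 1.0 a variable holds a value, not a fragment of syntax, so that expression is not even valid. A minimal sketch of one usual alternative follows. The parameter name var is taken from the quoted attempt, and its default value 'b' is an assumption made for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical parameter: the name of the element to step
       through, held as a string value. -->
  <xsl:param name="var" select="'b'"/>

  <xsl:template match="/">
    <!-- Not valid XPath 1.0:  select="x/$var/y"
         Instead, step through any child element of x and keep only
         those whose name equals the string held in $var. -->
    <xsl:for-each select="x/*[name() = $var]/y">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With an input such as <x><b><y>one</y></b><c><y>two</y></c></x> only the y inside b is selected. Comparing with name() is the simple case; documents that use namespaces would more safely compare local-name() and namespace-uri().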
Note that XSLT does not give the stylesheet writer control over pedestrian concerns such as the syntactic representation of the result tree ... the result is being processed "downstream" by another markup processor, so the syntax is unimportant and irrelevant.
If your needs are such that you have concerns for result markup, then choose another paradigm than node tree building and copying and use an imperative language to get what you want.
REMEMBER: XSLT is not designed for manipulating angle brackets ... it is designed for manipulating node trees (that were created from and are created for XML documents with angle brackets).
>I mean keep the concepts of declarative programming, just change
>the syntax. Would such work be fruitful and welcome?
Not by me ... I can't think of any better way to represent node trees than with XML.
> what is the advantages of having XSLT expressed in XML syntax?
The main advantage is that it makes the original use case syntactically very simple. Most of (simple) XSLT stylesheets are template bodies which are fragments of the desired output.
In other languages you have to quote the XML fragments as strings which gets very horrible very quickly.
The other advantage is that the language, being XML, can be processed by XSLT programs. This is in fact very common: one can derive an XSLT stylesheet as the result of an XSLT transform.
it means that XSLT re-uses all the lexical apparatus of XML, such as entities, encodings, etc
it's useful when large parts of a stylesheet are basically boilerplate content to be added to the result document
it allows transformations to take stylesheets as their input and/or output. This is not as exotic as it seems, it's something that many "big" XSLT applications do.
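As a sketch of the last point, here is a stylesheet whose output is itself a stylesheet. The input vocabulary (a mappings element containing map elements with from and to attributes) and the urn:example namespace are hypothetical, invented for this example. xsl:namespace-alias is the standard XSLT 1.0 mechanism for writing XSLT elements as literal output.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:out="urn:example:generated-xslt">

  <!-- Elements written with the out: prefix are placed in the XSLT
       namespace in the result, so they are output, not executed. -->
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>

  <xsl:template match="/">
    <out:stylesheet version="1.0">
      <!-- One generated template per map element in the input. -->
      <xsl:for-each select="mappings/map">
        <out:template match="{@from}">
          <!-- The replacement element, named by the to attribute. -->
          <xsl:element name="{@to}">
            <out:apply-templates/>
          </xsl:element>
        </out:template>
      </xsl:for-each>
    </out:stylesheet>
  </xsl:template>

</xsl:stylesheet>
```

Given <mappings><map from="para" to="p"/></mappings>, the result should be a stylesheet containing one template that matches para and wraps its content in p. The prefix used in the generated file may differ between processors; it is the namespace that matters.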
> Even if we like XML, it's a quite verbose syntax for
> programming. Have there been any attempts on making an
> abbreviated form of XSLT
Yes, there have. For example see pault.com
But verbosity is not necessarily a bad thing in programming. The proportion of development time spent actually typing code is tiny.
Functional Programming
> for someone not used to it and coming in with
> the understanding that XSLT was being promoted as a *functional*
> language, seeing it being used in what appears to be a non-functional
> (i.e., procedural) way would *seem* not natural. that's all
>
The term "functional" applies to the language rather than to any particular program written in the language. In fact, unlike say some Lisps or Standard ML, XSLT is a rather pure declarative language with essentially no imperative instructions.
You can fill your lisp with imperative setq statements if you wish, but in XSLT it's just not possible to write procedural code. In particular xsl:for-each and xsl:template both fall very naturally in the functional paradigm, they just happen to use XML syntax rather than f(x) syntax.
>
> and yet, in the very next paragraph, he writes, "Instead of
> looping, XSLT uses iteration and recursion." excuse me but,
> where i grew up, iteration is just another word for looping.
> and looping sure seems to have a procedural programming history.
In any loop construct something needs to change, otherwise you'll loop forever. In a C for loop or Fortran DO loop etc. what gets changed is the value of some variable, so the whole construct requires the imperative/procedural notion of a variable whose value may be changed.
Loops in functional languages are different: you just iterate some function over all the elements in some structure, and you end when you've done them all. Just as the previous kind of loop is syntactic shorthand for an assignment, an if test and a goto, this kind of loop is shorthand for recursing over the structure of the object. It is perfectly natural in a functional language.
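The two views of a loop described above can be put side by side. The sketch below assumes a hypothetical input of the form <items><item>...</item>...</items>, and the template name list is made up for the example. It prints the items twice: once with xsl:for-each, and once with the explicit recursion that for-each can be read as shorthand for.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Iteration: the same body is applied to every item.
         No counter variable is created or updated. -->
    <xsl:for-each select="items/item">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- The same traversal, written as explicit recursion. -->
    <xsl:call-template name="list">
      <xsl:with-param name="nodes" select="items/item"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="list">
    <xsl:param name="nodes"/>
    <!-- An empty node-set tests false, which ends the recursion. -->
    <xsl:if test="$nodes">
      <!-- Handle the first node... -->
      <xsl:value-of select="$nodes[1]"/>
      <xsl:text>&#10;</xsl:text>
      <!-- ...then recurse on the rest. Each call gets a new, smaller
           parameter value; nothing is ever reassigned. -->
      <xsl:call-template name="list">
        <xsl:with-param name="nodes" select="$nodes[position() &gt; 1]"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

What "changes" on each pass is the argument to the next call, not the value of an existing variable.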