Processing Embedded HTML

1. Parsing HTML as XML

David Carlisle


> I have a text file containing malformed HTML text I want to import as 
> CDATA into a template, to be called from my stylesheet.

If you are using XSLT 2.0 (e.g. Saxon 8), you could import the htmlparse.xsl stylesheet and do:

..

<xsl:import href="http://www.dcarlisle.demon.co.uk/htmlparse.xsl"/>

...

<xsl:template match="...">
...
<xsl:copy-of
select="d:htmlparse(unparsed-text('file-with-html-in-it.html','ISO-8859-1'))"/>

...
</xsl:template>


</xsl:stylesheet>

For XSLT 1.0 you really need to convert your file to well-formed XML first; tidy or TagSoup could do this (search the web for their URLs).
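
For the XSLT 1.0 route, a minimal sketch might look like the following. It assumes the HTML has already been converted to well-formed XML by an external tool (for instance with tidy's -asxml option) and saved under the hypothetical name file-with-html-in-it.xml next to the stylesheet. The match pattern and the wrapper element are placeholders, not part of the original answer.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Placeholder match: use whichever node should receive the HTML. -->
  <xsl:template match="/">
    <wrapper>
      <!-- document() parses the converted file as XML. A relative URI
           given as a string is resolved against the stylesheet's own
           location. The file name here is an assumption. -->
      <xsl:copy-of select="document('file-with-html-in-it.xml')/*"/>
    </wrapper>
  </xsl:template>

</xsl:stylesheet>
```

The converted markup arrives as a node tree rather than as CDATA text, so it can be matched by templates as well as copied wholesale. If the conversion tool emits XHTML, the elements may be in the XHTML namespace, which any match patterns would need to take into account.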