XSLT to generate XSL-FO from HTML with varying namespace

I have an XSLT file that converts HTML to XSL-FO, which is then rendered with the FOP engine.
It has templates for HTML elements, as shown below:
<pre>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
version="2.0">
<xsl:template match="html">
<!-- handle html element -->
</xsl:template>
<xsl:template match="head/title">
<!-- handle head/title elements -->
</xsl:template>
</xsl:stylesheet>
</pre>
I need to convert all kinds of HTML files provided as input to the processor.
HTML files without a namespace are processed without any issues.
However, some HTML files declare a namespace (<html xmlns="http://www.w3.org/1999/xhtml">),
in which case the FOP processor throws exceptions.
What is the best way to handle such cases?
Can I create a template that calls the correct template based on local-name()?

My preference in this kind of situation is to normalize the input before doing anything else, in a separate pass. This could be done with a template rule something like this:
<xsl:template match="*">
<xsl:element name="{lower-case(local-name())}">
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
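As a rough sketch of how such a normalizing pass could be wired into a single XSLT 2.0 stylesheet: the mode name normalize and the variable name normalized are invented for illustration, and the html template is only a placeholder for the existing HTML-to-FO rules. The first pass rebuilds every element in no namespace with a lower-case name; the ordinary templates then run on that temporary tree.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="2.0">

  <!-- Pass 1 (hypothetical mode name): strip namespaces and lower-case element names -->
  <xsl:template match="*" mode="normalize">
    <xsl:element name="{lower-case(local-name())}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="normalize"/>
    </xsl:element>
  </xsl:template>

  <!-- Entry point: normalize into a temporary tree, then process that tree -->
  <xsl:template match="/">
    <xsl:variable name="normalized">
      <xsl:apply-templates mode="normalize"/>
    </xsl:variable>
    <xsl:apply-templates select="$normalized/*"/>
  </xsl:template>

  <!-- Pass 2: the existing templates can keep matching plain, un-namespaced names -->
  <xsl:template match="html">
    <!-- handle html element -->
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

Because the second pass only ever sees un-namespaced lower-case names, the existing match patterns such as html and head/title should not need to change.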

You can do something like this:
<xsl:template match="*[local-name()='html' or local-name()='HTML']">
<!-- handle html element -->
</xsl:template>
That will match html or HTML elements in any namespace.
Unfortunately, HTML comes in many variations and is often not well-formed XML (e.g. tags that are not closed). For a truly general solution you need an HTML parser.
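If the processor really runs XSLT 2.0 (the stylesheet in the question declares version="2.0", but that alone does not guarantee it), the local-name() test above can also be written with a namespace wildcard. This is an alternative spelling, not what the answer itself uses, and the fo:root content here is only a placeholder:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="2.0">

  <!-- *:html matches an element with local name "html" in any namespace, or in none -->
  <xsl:template match="*:html | *:HTML">
    <fo:root>
      <!-- placeholder: a real stylesheet would emit the layout master set and page sequences -->
      <xsl:apply-templates select="*:body | *:BODY"/>
    </fo:root>
  </xsl:template>

  <!-- Every step of a path needs the wildcard: head/title becomes *:head/*:title -->
  <xsl:template match="*:head/*:title">
    <!-- handle head/title elements -->
  </xsl:template>

</xsl:stylesheet>
```

The wildcard is still case-sensitive, which is why both spellings are listed; mixed-case names such as Html would need the normalizing approach instead.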

Related

htmlToFo Transformer (an XSL file which should have logic to copy even style attributes of HTML to FO)

I am using javax.xml.transform.Transformer to transform XHTML to XSL-FO with an XSL stylesheet, but the stylesheet I have does not copy style attributes to XSL-FO.
Can you help with a stylesheet that carries XHTML style attributes over to XSL-FO?
You can expand on this simple example, using a choose structure to drop or rename properties so that only recognized XSL-FO attributes remain. Some HTML style properties, such as those starting with "-moz", likely have no meaning in XSL-FO; others need adjustments. You would also need to handle direct attributes (like @colspan or @rowspan) that are not in the @style attribute.
Given this simple input:
<p style="font-size:12pt; font-weight:bold; color: red">This is a sample</p>
Using this XSL:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" version="2.0">
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="p">
<fo:block>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates/>
</fo:block>
</xsl:template>
<xsl:template match="@style">
<xsl:variable name="styleList" select="tokenize(.,';')"/>
<xsl:for-each select="$styleList">
<xsl:attribute name="{normalize-space(substring-before(.,':'))}">
<xsl:value-of select="normalize-space(substring-after(.,':'))"/>
</xsl:attribute>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
You get this output:
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format" font-size="12pt" font-weight="bold" color="red">This is a sample</fo:block>
You can see the tokenize() function split the @style attribute into parts and those parts are used to create appropriate XSL FO attributes.
The site using similar methods is linked below. It has a very complex XSL for handling inline @style to XSL FO attributes.
Cloudformatter CSStoPDF
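The choose structure mentioned in this answer might look roughly like the following XSLT 2.0 sketch. The variable names name and value are invented for illustration, and the only filtering rule shown (drop any property whose name starts with "-") is an assumption about what counts as meaningless in XSL-FO; real stylesheets would need more branches.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="2.0">

  <xsl:template match="p">
    <fo:block>
      <xsl:apply-templates select="@style"/>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="@style">
    <!-- Keep only tokens that contain a colon, so a trailing ";" yields no empty name -->
    <xsl:for-each select="tokenize(., ';')[contains(., ':')]">
      <xsl:variable name="name" select="normalize-space(substring-before(., ':'))"/>
      <xsl:variable name="value" select="normalize-space(substring-after(., ':'))"/>
      <xsl:choose>
        <!-- Assumed rule: vendor-prefixed properties such as -moz-... are dropped -->
        <xsl:when test="starts-with($name, '-')"/>
        <!-- Further xsl:when branches could rename or adjust individual properties -->
        <xsl:otherwise>
          <xsl:attribute name="{$name}" select="$value"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Note that this version applies templates only to @style rather than to all attributes, so other attributes are left for separate, explicit templates.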

How to transform XML with embedded HTML with embedded XML using XSLT?

I have an XSLT stylesheet which I'm using to transform XML to HTML for display. Some of the XML has HTML embedded in it, and I have an XSLT rule for this:
<xsl:template match='html|div|span|h1|h2|table'>
<xsl:copy-of select="."/>
</xsl:template>
My problem is that the output sometimes has XML inside an HTML table, and the rule above just gives me the raw untransformed XML inside the HTML, so my browser just ignores the unrecognised XML tags and displays only the text content of the XML. How can I make it transform the inner XML, but pass on all other HTML as-is?
I found a solution which involves copying HTML nodes but then applying templates to the content, using <xsl:copy> instead of <xsl:copy-of>.
My XML looked like this:
<foo>
<h2>Heading</h2>
<table>
<tr>
<td>
<bar>xyz</bar>
</td>
</tr>
</table>
</foo>
The XSLT rule I had above copied anything inside the HTML tags I expected to see inside the XML, but that copied the "<bar>xyz</bar>" construct, which the browser displayed as just "xyz".
The solution was to use <xsl:copy> instead of <xsl:copy-of>, to apply it to all unrecognised nodes, and to apply templates inside the node with the following rule:
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
This way, HTML nodes that are not specifically covered are copied, but their contents are processed by applying any applicable templates.
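Put together as a complete stylesheet, the idea could be sketched like this. How bar should actually be rendered is not stated above, so the em wrapper is purely a hypothetical stand-in for the real rule:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="html"/>

  <!-- Default rule: copy each node, but keep applying templates to its content -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical rule for the embedded XML; it overrides the copy rule for bar only -->
  <xsl:template match="bar">
    <em>
      <xsl:apply-templates/>
    </em>
  </xsl:template>

</xsl:stylesheet>
```

Because a template that names an element has a higher default priority than the node() pattern, the bar rule is chosen wherever a bar element appears, however deeply it is nested inside copied HTML.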

Concatenate two xml/html Files with XSLT

I'm trying to concatenate two files, one XML and the other HTML. I'm probably making a stupid mistake; I'm not very familiar with XSLT.
I apply an XSL file to an XML file that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<REPORT>
<YASCA />
<AOSCAT />
</REPORT>
And this is what the XSL file looks like:
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml"/>
<xsl:template match="//YASCA">
<xsl:copy>
<xsl:copy-of select="document('abc.xml')"/>
</xsl:copy>
</xsl:template>
<xsl:template match="//AOSCAT">
<xsl:copy>
<xsl:copy-of select="document('xyz.html')"/>
</xsl:copy>
</xsl:template>
</xsl:transform>
And this is the error message I get from Altova XMLSpy after applying the XSLT and trying to save the created document:
XML Production Error: Character 'A' following the text '<' does not fulfill production 'Misc'.
It occurs at the point in the file where the first tag (the container for the contents of the XML file) ends and the second one (the container for the contents of the HTML file) starts.
</YASCA><AOSCAT>
I also tried different approaches to combining the files (some of which I found on Stack Overflow), but none of them worked, and this one seems the most promising since it's simple and should do exactly what I want.
I hope I have explained my problem sufficiently and that someone can help me.
Best regards
Marty
Well, in general HTML is not XML, and you won't be able to use document('file.html') successfully. In your case that operation seems to work, but you failed to ensure your root element is copied, so you end up with two top-level elements in the result document, which is then not well-formed XML because there must be a single root element. So add
<xsl:template match="/*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
in your XSLT and the result will be a well-formed XML document with a single root element.
That rather cryptic error message basically tells you that the markup following the first result element does not match the Misc production in the XML specification which only allows comments and/or processing instructions following the single allowed root element.
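For reference, the question's stylesheet with that extra template added would look roughly like this. It assumes, as in the question, that abc.xml and xyz.html sit next to the stylesheet and that xyz.html happens to be well-formed XML:

```xml
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="xml"/>

  <!-- Copy the root element (REPORT) so the result has a single top-level element -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Fill each placeholder element with the content of the external file -->
  <xsl:template match="YASCA">
    <xsl:copy>
      <xsl:copy-of select="document('abc.xml')"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="AOSCAT">
    <xsl:copy>
      <xsl:copy-of select="document('xyz.html')"/>
    </xsl:copy>
  </xsl:template>

</xsl:transform>
```

The match patterns are written as YASCA and AOSCAT rather than //YASCA and //AOSCAT; in a match pattern the leading // is redundant, so the shorter form matches the same elements.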

How do I inline the contents of an external HTML document using XSLT?

I have an XSLT that I input to a 3rd party application. This application displays the result of that XSLT as a web page in their application.
I have a dynamic HTML document that I want to display in that application. How can I "read" the HTML document via an XSLT document such that whenever the html document is updated, the XSLT will read the new file?
If I'm not being clear, to convey the idea, my xslt would read something like this:
<xsl:stylesheet>
<xsl:output method="html"/>
<xsl:template match="Something">
<!-- Stuff is done here -->
</xsl:template>
<xsl:ReadExternalDocument filePath="my/path/document.html" />
</xsl:stylesheet>
I've come across the document() function, but it seems to destroy my tags. That is, I would like to include the child tags of the parent element in the output.
As Tomalak suggested, the document function is the way to go. I read in the external HTML document using document() with xsl:copy-of. xsl:copy-of does a deep copy, including tags, so it brings in the whole external HTML document. The code looks like this:
<xsl:stylesheet ... >
<xsl:output method="html"/>
<xsl:template match="/">
<xsl:copy-of select="document('ExternalDocument.html')" />
</xsl:template>
</xsl:stylesheet>
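If only part of the external file is wanted, the select expression can step into the loaded document. The sketch below assumes that ExternalDocument.html is well-formed XML with un-namespaced html and body elements, which the answer above does not state; the wrapping div is likewise just an illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <div>
      <!-- Deep-copy only the children of body, tags included, into the output -->
      <xsl:copy-of select="document('ExternalDocument.html')/html/body/node()"/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

Since document() loads the file when the transformation runs, each new run should pick up the current contents of the HTML file, subject to any caching the host application does.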

Transforming HTML with XSL and modifying form attributes

I would like to parse an HTML document, replace the action attribute of all the forms, and add some hidden fields with XSL. Can someone show some examples of XSL that can do this?
What you need first is well-formed HTML (at least transitional), although XHTML is recommended. Some XSLT processors accept malformed HTML, but that is not the rule.
To try the example below you can download this small Microsoft command line app.
Quick and dirty XSLT example for what you need (example-xslt.xsl):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="form[@action='foo']">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:attribute name="action">non-foo</xsl:attribute>
<input type="hidden" name="my-hidden-prop" value="hide-foo-here"/>
<xsl:apply-templates select="*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
And the corresponding XML example (example.xml).
<?xml version ="1.0"?>
<?xml-stylesheet type="text/xsl" href="example-xslt.xsl"?>
<html>
<head></head>
<body>
<form action="foo">
</form>
<form action="other">
</form>
</body>
</html>
Thinking of gurin's answer: one possible XSLT-based pathway for HTML is to use tidy to convert it to XHTML, apply XSLT to the XHTML, but use xsl:output[@method="html"] to get HTML back out. The @doctype-system and @doctype-public attributes let you provide a doctype declaration in the output file as well.
I don't have any sample files for shahbhat, but the general approach is straightforward from an XSLT point of view: start with an identity transform and add in templates for the action attributes to override them in the way you want. To add hidden fields, I suspect the easiest way would be to create a template explicitly for the form element as an identity transform, but with additional elements inside it that are output as well. I think Fernando Miguélez has just posted an example.
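A minimal sketch of that approach, assuming the input elements are in no namespace (tidy's XHTML output would normally be namespaced, so the patterns would then need a prefix). The replacement value non-foo and the hidden field are borrowed from the other answer's example, and the HTML 4.01 Strict doctype is just one possible choice:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="html"
              doctype-public="-//W3C//DTD HTML 4.01//EN"
              doctype-system="http://www.w3.org/TR/html4/strict.dtd"/>

  <!-- Identity transform: copy everything unless a more specific template applies -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override the action attribute of every form that has one -->
  <xsl:template match="form/@action">
    <xsl:attribute name="action">non-foo</xsl:attribute>
  </xsl:template>

  <!-- Copy each form, adding a hidden field ahead of its original content -->
  <xsl:template match="form">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <input type="hidden" name="my-hidden-prop" value="hide-foo-here"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Attributes are processed before the hidden input is emitted because an attribute cannot be added to an element once child content has been written.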
You can start from this tutorial
But be aware that XSLT generally requires well-formed XML as input, and HTML isn't always well-formed.