Transforming HTML with XSL and modifying form attributes - html

I would like to parse an HTML document, replace the action attribute of all the forms, and add some hidden fields using XSL. Can someone show some examples of XSL that can do this?

What you need first is well-formed HTML (at least transitional), although XHTML is the best choice. Some XSLT processors accept malformed HTML, but that is the exception rather than the rule.
To try the example below you can download this small Microsoft command line app.
Quick and dirty XSLT example for what you need (example-xslt.xsl):
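<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Default rule: copy each element with its attributes and process its children -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <!-- Forms whose action is 'foo': replace the action and add a hidden field -->
  <xsl:template match="form[@action='foo']">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- An attribute added later with the same name replaces the copied one -->
      <xsl:attribute name="action">non-foo</xsl:attribute>
      <input type="hidden" name="my-hidden-prop" value="hide-foo-here"/>
      <xsl:apply-templates select="*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>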
And the corresponding XML example (example.xml).
<?xml version ="1.0"?>
<?xml-stylesheet type="text/xsl" href="example-xslt.xsl"?>
<html>
<head></head>
<body>
<form action="foo">
</form>
<form action="other">
</form>
</body>
</html>

Thinking of gurin's answer: one possible XSLT-based pathway for HTML is to use tidy to convert it to XHTML, apply XSLT to the XHTML, but use xsl:output[@method="html"] to get HTML back out. The @doctype-system and @doctype-public attributes let you provide a doctype declaration in the output file as well.
I don't have any sample files for shahbhat, but the general approach is straightforward from an XSLT point of view: start with an identity transform and add in templates for the action attributes to override them in the way you want. To add hidden fields, I suspect the easiest way would be to create a template explicitly for the form element as an identity transform, but with additional elements inside it that are output as well. I think Fernando Miguélez has just posted an example.
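The approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the input elements are in no namespace (XHTML produced by tidy is normally in the XHTML namespace, in which case the match patterns would need a namespace prefix), and the new action value and the hidden field name are made-up placeholders.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize the result as HTML and emit a doctype declaration -->
  <xsl:output method="html"
      doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
      doctype-system="http://www.w3.org/TR/html4/loose.dtd"/>

  <!-- Identity transform: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Override the action attribute of every form
       ('/new-handler' is a hypothetical value) -->
  <xsl:template match="form/@action">
    <xsl:attribute name="action">/new-handler</xsl:attribute>
  </xsl:template>

  <!-- Identity-style template for form that also outputs a hidden field
       (the field name and value are hypothetical) -->
  <xsl:template match="form">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <input type="hidden" name="my-hidden-prop" value="example"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The attributes are processed before the hidden input is written because attributes must be added to an element before any of its children.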

You can start from this tutorial
But be aware that XSLT generally requires well-formed XML as input, and HTML isn't always well-formed.

Related

How to transform XML with embedded HTML with embedded XML using XSLT?

I have an XSLT stylesheet which I'm using to transform XML to HTML for display. Some of the XML has HTML embedded in it, and I have an XSLT rule for this:
<xsl:template match='html|div|span|h1|h2|table'>
<xsl:copy-of select="."/>
</xsl:template>
My problem is that the output sometimes has XML inside an HTML table, and the rule above just gives me the raw untransformed XML inside the HTML, so my browser just ignores the unrecognised XML tags and displays only the text content of the XML. How can I make it transform the inner XML, but pass on all other HTML as-is?
I found a solution which involves copying HTML nodes but then applying templates to the content, using <xsl:copy> instead of <xsl:copy-of>.
My XML looked like this:
<foo>
<h2>Heading</h2>
<table>
<tr>
<td>
<bar>xyz</bar>
</td>
</tr>
</table>
</foo>
The XSLT rule I had above copied anything inside the HTML tags I expected to see inside the XML, but that copied the "<bar>xyz</bar>" construct, which the browser displayed as just "xyz".
The solution was to use <xsl:copy> instead of <xsl:copy-of>, to apply it to all unrecognised nodes, and to apply templates inside the node with the following rule:
<xsl:template match="node()|@*">
  <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
</xsl:template>
This way, HTML nodes that are not specifically covered are copied, but their contents are processed by applying any applicable templates.
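A minimal complete stylesheet built around that rule might look like the sketch below. The templates for foo and bar are hypothetical (the post does not show how those elements are rendered); they only illustrate that a more specific template wins over the generic copy rule wherever it applies, including inside the table.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Generic rule: copy any node not covered elsewhere, then keep
       applying templates to its attributes and children -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical rule: render the foo wrapper as a div -->
  <xsl:template match="foo">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Hypothetical rule: render bar as emphasised text -->
  <xsl:template match="bar">
    <em>
      <xsl:value-of select="."/>
    </em>
  </xsl:template>
</xsl:stylesheet>
```

The named-element patterns have a higher default priority than node()|@*, so no explicit priority attribute is needed.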

Concatenate two xml/html Files with XSLT

I'm trying to concatenate two files. One of them is XML and the other is HTML. I'm probably making a stupid mistake; I'm not very familiar with handling XSLT.
I apply an XSL file to a XML file that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<REPORT>
<YASCA />
<AOSCAT />
</REPORT>
And this is what the XSL file looks like:
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml"/>
<xsl:template match="//YASCA">
<xsl:copy>
<xsl:copy-of select="document('abc.xml')"/>
</xsl:copy>
</xsl:template>
<xsl:template match="//AOSCAT">
<xsl:copy>
<xsl:copy-of select="document('xyz.html')"/>
</xsl:copy>
</xsl:template>
</xsl:transform>
And this is the error message I get from Altova XMLSpy after applying the XSLT and trying to save the created document:
XML Production Error: Character 'A' following the text '<' does not fulfill production 'Misc'.
It occurs at the point in the file where the first tag (the container for the contents of the XML file) ends and the second one (the container for the contents of the HTML file) starts.
</YASCA><AOSCAT>
I also tried different approaches to combining the files (some of which I found on Stack Overflow), but none of them worked, and this one seems the most favorable since it's simple and should do exactly what I want.
I hope I have explained my problem sufficiently and that someone can help me.
Best regards
Marty
In general, HTML is not XML, so you won't be able to use document('file.html') successfully unless the file happens to be well-formed XML. In your case that call seems to work, but you did not ensure that your root element is copied. The built-in template for REPORT only processes its children, so you end up with two top-level elements in the result, which is not well-formed XML because a document needs a single root element. So add
<xsl:template match="/*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
in your XSLT and the result will be a well-formed XML document with a single root element.
That rather cryptic error message basically tells you that the markup following the first result element does not match the Misc production in the XML specification which only allows comments and/or processing instructions following the single allowed root element.
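Put together, the corrected stylesheet could look like the sketch below. It assumes that abc.xml and xyz.html sit next to the stylesheet and that both are well-formed XML; otherwise document() will fail to load them.

```xml
<?xml version="1.0"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>

  <!-- Copy the document element (REPORT) so the result has a single root -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Wrap the contents of the external XML file in YASCA -->
  <xsl:template match="//YASCA">
    <xsl:copy>
      <xsl:copy-of select="document('abc.xml')"/>
    </xsl:copy>
  </xsl:template>

  <!-- Wrap the contents of the external (well-formed) HTML file in AOSCAT -->
  <xsl:template match="//AOSCAT">
    <xsl:copy>
      <xsl:copy-of select="document('xyz.html')"/>
    </xsl:copy>
  </xsl:template>
</xsl:transform>
```

With the root template in place, the result should have the shape REPORT containing YASCA and AOSCAT, each holding the content of its external file.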

XSLT transform removes HTML elements from mixed-content

Is it possible for XSLT preserve anchors and other embedded HTML tags within XML?
Background: I am trying to convert an HTML document into XML with an XSL stylesheet using XSLT. The original HTML document had content interspersed with anchor tags (e.g. Some hyperlinks here and there). I've copied that content into my XML, but the XSLT output lacks anchor tags.
Example XML:
<?xml version="1.0" ?>
<observations>
<observation>Hyperlinks disappear.</observation>
</observations>
Example XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/html">
<xsl:output method="html" indent="yes" encoding="UTF-8"/>
<xsl:template match="/observations">
<html>
<body>
<xsl:value-of select="observation"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Output:
<html xmlns="http://www.w3.org/1999/html">
<body>Hyperlinks disappear.</body>
</html>
I've read a few similar articles on stackoverflow and checked out the Identity transform page on wikipedia; I started to get some interesting results using xsl:copy-of, but I don't understand enough about XSLT to get all of the words and tags embedded within each XML element to appear in the resulting HTML. Any help would be appreciated.
Write a separate template to match a elements, copy their attributes and content.
What is wrong with your approach? In your code,
<xsl:value-of select="observation"/>
simply sends to the output the string value of the observation element. Its string value is the concatenation of all text nodes it contains. But you need not only the text nodes in it, but also the a elements themselves.
The default behaviour of an XSLT processor is to "skip" element nodes, because of a built-in template. So, if you do not mention a in a template match, it is simply ignored and only its text content is output.
Stylesheet
Note: This stylesheet still relies on the default behaviour of the XSLT processor to some extent. The order of events will resemble the following:
The template with match="/observations" is matched. It adds html and body to the output. Then a template rule must be found for the content of observations. A built-in template matches observation, does nothing with it, and looks for a template to process its content. For the a element, the corresponding template is matched, which copies the element and its attributes. Finally, a built-in template copies the text nodes inside observation and a.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/observations">
<html>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="a">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
XML Output
<html>
<body>Hyperlinks disappear.
</body>
</html>
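As an alternative sketch, xsl:copy-of can copy the children of observation verbatim, which keeps any embedded elements without a dedicated template for each one. The sample input in the comment is an assumption: it supposes that observation holds mixed content such as text plus an a element, and the href is a placeholder, not taken from the original post.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input:
     <observations>
       <observation><a href="http://example.com/">Hyperlinks</a> disappear.</observation>
     </observations>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes" encoding="UTF-8"/>

  <xsl:template match="/observations">
    <html>
      <body>
        <!-- Deep-copy text and element children alike, so embedded
             tags such as a survive in the output -->
        <xsl:copy-of select="observation/node()"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The trade-off is that copied content is never passed through templates, so this only suits embedded markup that should appear in the output unchanged.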

XSLT to generate xsl fo from HTML with varying namespace

I have a XSLT file which is used for html to xsl-fo conversion using fop engine.
It has templates for HTML elements as shown below
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format"
version="2.0">
<xsl:template match="html">
<!-- handle html element -->
</xsl:template>
<xsl:template match="head/title">
<!-- handle head/title elements -->
</xsl:template>
</xsl:stylesheet>
I need to convert all kinds of HTML files provided as input to the processor.
HTML files without namespace are processed without any issues.
However, some HTML files have a namespace (<html xmlns="http://www.w3.org/1999/xhtml">),
in which case the FOP processor throws exceptions.
What is the best way to handle these cases?
Can I create a template which, based on local-name(), calls the correct template?
My preference in this kind of situation is to normalize the input before doing anything else, in a separate pass. This could be done with a template rule something like this:
<xsl:template match="*">
  <xsl:element name="{lower-case(local-name())}">
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>
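One way to run that normalization as a separate pass inside a single XSLT 2.0 stylesheet is sketched below. The mode name "normalize" and the variable name are arbitrary choices for illustration, and the body of the html template is left as a placeholder, as in the question.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/">
    <!-- Pass 1: build a temporary tree with lower-case,
         no-namespace element names -->
    <xsl:variable name="normalized">
      <xsl:apply-templates mode="normalize"/>
    </xsl:variable>
    <!-- Pass 2: run the ordinary templates on the normalized tree -->
    <xsl:apply-templates select="$normalized/*"/>
  </xsl:template>

  <!-- Rebuild each element from its local name only, dropping its namespace -->
  <xsl:template match="*" mode="normalize">
    <xsl:element name="{lower-case(local-name())}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="normalize"/>
    </xsl:element>
  </xsl:template>

  <!-- The existing templates can stay as they are -->
  <xsl:template match="html">
    <!-- handle html element -->
  </xsl:template>
</xsl:stylesheet>
```

Text nodes are carried over by the built-in template rule, which applies in every mode; comments and processing instructions are dropped by this sketch.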
You can do something like this:
<xsl:template match="*[local-name()='html' or local-name()='HTML']">
<!-- handle html element -->
</xsl:template>
that will match html or HTML elements in any namespace.
Unfortunately, HTML comes in many variations and often is not well-formed XML (e.g. tags that are not closed). If you want a truly general solution, you need an HTML parser.
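A different technique from the local-name() test, shown here only as a sketch, is to bind a prefix to the XHTML namespace and list both forms in each match pattern. The prefix xh is an arbitrary choice, and this variant assumes lower-case element names.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:xh="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xh">

  <!-- Matches html in no namespace or in the XHTML namespace -->
  <xsl:template match="html | xh:html">
    <!-- handle html element -->
  </xsl:template>

  <!-- Same idea for nested paths: each step needs the prefix -->
  <xsl:template match="head/title | xh:head/xh:title">
    <!-- handle head/title elements -->
  </xsl:template>
</xsl:stylesheet>
```

This is stricter than the local-name() test because it matches only the two expected namespaces rather than any namespace.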

How do I inline the contents of an external HTML document using XSLT?

I have an XSLT that I input to a 3rd party application. This application displays the result of that XSLT as a web page in their application.
I have a dynamic HTML document that I want to display in that application. How can I "read" the HTML document via an XSLT document such that whenever the html document is updated, the XSLT will read the new file?
If I'm not being clear, to convey the idea, my xslt would read something like this:
<xsl:stylesheet>
<xsl:output method="html"/>
<xsl:template match="Something">
<!-- Stuff is done here -->
</xsl:template>
<xsl:ReadExternalDocument filePath="my/path/document.html" />
</xsl:stylesheet>
I've come across the document() function, but it seems to destroy my tags. That is, I would like to include the child tags of the parent element in the output.
As Tomalak suggested, the document() function is the way to go. I read in the external HTML document using document() inside an xsl:copy-of instruction. xsl:copy-of does a deep copy, including tags, so it retrieves the whole external HTML document. The code looks like this:
<xsl:stylesheet ... >
<xsl:output method="html"/>
<xsl:template match="/">
<xsl:copy-of select="document('ExternalDocument.html')" />
</xsl:template>
</xsl:stylesheet>
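If only part of the external page is wanted, a path can be appended to the document() call. The sketch below assumes that ExternalDocument.html is well-formed XML, that its elements are in no namespace, and that it has the usual html/body structure; the surrounding div is a hypothetical wrapper.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <div>
      <!-- Copy only the children of body from the external document.
           A relative URI given as a string is resolved against the
           location of the stylesheet. -->
      <xsl:copy-of select="document('ExternalDocument.html')/html/body/node()"/>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

The external file is read each time the transformation runs, so changes to it show up on the next run, subject to any caching done by the host application.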