XSLT transform removes HTML elements from mixed-content

Is it possible for XSLT to preserve anchors and other embedded HTML tags within XML?
Background: I am trying to convert an HTML document into XML with an XSL stylesheet using XSLT. The original HTML document had content interspersed with anchor tags (e.g. Some hyperlinks here and there). I've copied that content into my XML, but the XSLT output lacks anchor tags.
Example XML:
<?xml version="1.0" ?>
<observations>
<observation>Hyperlinks disappear.</observation>
</observations>
Example XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/html">
<xsl:output method="html" indent="yes" encoding="UTF-8"/>
<xsl:template match="/observations">
<html>
<body>
<xsl:value-of select="observation"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Output:
<html xmlns="http://www.w3.org/1999/html">
<body>Hyperlinks disappear.</body>
</html>
I've read a few similar articles on Stack Overflow and checked out the Identity transform page on Wikipedia; I started to get some interesting results using xsl:copy-of, but I don't understand enough about XSLT to get all of the words and tags embedded within each XML element to appear in the resulting HTML. Any help would be appreciated.

Write a separate template to match a elements, copy their attributes and content.
What is wrong with your approach? In your code,
<xsl:value-of select="observation"/>
simply sends to the output the string value of the observation element. Its string value is the concatenation of all text nodes it contains. But you need not only the text nodes in it, but also the a elements themselves.
The default behaviour of an XSLT processor is to "skip" element nodes, because of a built-in template. So, if you do not mention a in a template match, it is simply ignored and only its text content is output.
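The XSLT 1.0 specification defines these built-in rules. Written out as ordinary templates, they amount roughly to the sketch below. It is shown only to make the default behaviour visible. There is no need to add it to a real stylesheet: the real built-ins also exist per mode and lose to any template you write yourself.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Root node and elements: emit nothing for the node itself, just process its children.
       This is why an unmatched a element "disappears" while its text survives. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- Text and attribute nodes: output their string value -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>
  <!-- Comments and processing instructions: output nothing -->
  <xsl:template match="processing-instruction()|comment()"/>
</xsl:stylesheet>
```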
Stylesheet
Note: This stylesheet still relies on the default behaviour of the XSLT processor to some extent. The order of events will resemble the following:
1. The template with match="/observations" is matched. It adds html and body to the output. Then a template rule must be found for the content of observations.
2. A built-in template matches observation. It outputs nothing for the element itself and looks for templates to process its content.
3. For the a element, the corresponding template is matched, which copies the element and its attributes.
4. Finally, a built-in template copies the text nodes inside observation and a.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/observations">
<html>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="a">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
XML Output
<html>
<body>Hyperlinks disappear.
</body>
</html>
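The observations may contain other inline HTML besides a, such as em or strong. In that case, a sketch based on the identity transform mentioned in the question avoids writing one template per tag. It assumes the same observations/observation input and has not been tested against your real document:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes" encoding="UTF-8"/>
  <!-- Identity template: copies every node and attribute unless a more specific template applies -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/observations">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
  <!-- Drop the observation wrapper itself but keep its mixed content (text plus a, em, ...) -->
  <xsl:template match="observation">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```

When nothing inside observation needs further processing, <xsl:copy-of select="observation/node()"/> inside body is a shorter alternative. It copies the child nodes (text and elements) rather than the bare string value that xsl:value-of produces.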

Related

XML to XSL transformation

I am new to XML. I am trying to convert the following XML file
<?xml version="1.0"?>
<parent original_id="OI123" id="I123">
<custompanel panel="cp"></custompanel>
</parent>
into the following HTML
<html>
<body><div xmlAttribute="{"original-id":"OI123","id":"I123"}">
<p xmlAttribute={"panel":"cp"}/>
</div>
</body>
</html>
XML tag <parent> should be converted to <div> and <custompanel> should be converted to <p> tag.
I have read the XSLT documentation on W3Schools, but I am still not sure how to approach the problem. Can anyone help me with it?
The custom attributes need to be stored in xmlAttribute as a JSON object.
After quickly researching the correct syntax, I came up with this:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output
method="xml"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
indent="yes"
encoding="utf-8" />
<xsl:template match="parent">
<html>
<body>
<div xmlAttribute="{{'original-id':'{@original_id}','id':'{@id}'}}">
<xsl:apply-templates />
</div>
</body>
</html>
</xsl:template>
<xsl:template match="custompanel">
<p xmlAttribute="{{'panel':'{@panel}'}}" />
</xsl:template>
</xsl:stylesheet>
The tricky part is escaping the {} for the JSON, which we build ourselves. You need two curly braces {{ to get a literal one. You also need to use single quotes ' inside the attributes, because double quotes would be escaped to &quot;. You can access attributes with the @foo selector, but here you need single braces {} around it (an attribute value template) so that the processor evaluates it instead of copying it literally.
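If the consumer needs strict JSON (double-quoted keys and strings), double quotes can still be used. Write them as &quot; in the stylesheet. The serializer escapes them again as &quot; in the output file, but anything that parses the attribute (a DOM getAttribute call, for instance) gets plain double quotes back. The sketch below shows only the two element templates written that way. Attribute values that themselves contain " or \ are not JSON-escaped here.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <!-- {{ and }} are literal braces, {@...} are attribute value templates,
       and &quot; becomes a literal double quote in the attribute value -->
  <xsl:template match="parent">
    <div xmlAttribute="{{&quot;original-id&quot;:&quot;{@original_id}&quot;,&quot;id&quot;:&quot;{@id}&quot;}}">
      <xsl:apply-templates/>
    </div>
  </xsl:template>
  <xsl:template match="custompanel">
    <p xmlAttribute="{{&quot;panel&quot;:&quot;{@panel}&quot;}}"/>
  </xsl:template>
</xsl:stylesheet>
```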
I guess that your actual file has more than one <parent>. In that case you need to have a root element around it, and you need to adjust the XSLT. Add another <xsl:template match="/"> and move the HTML frame there.
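Here is a sketch of that adjustment. It assumes the <parent> elements sit directly under a single wrapper element whose name doesn't matter here (hence */parent). The doctype settings from the stylesheet above are left out for brevity.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <!-- The HTML frame is written once, from the document root -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- One div per parent, whatever the wrapper element is called -->
        <xsl:apply-templates select="*/parent"/>
      </body>
    </html>
  </xsl:template>
  <!-- Each parent now produces only its own div -->
  <xsl:template match="parent">
    <div xmlAttribute="{{'original-id':'{@original_id}','id':'{@id}'}}">
      <xsl:apply-templates/>
    </div>
  </xsl:template>
  <xsl:template match="custompanel">
    <p xmlAttribute="{{'panel':'{@panel}'}}"/>
  </xsl:template>
</xsl:stylesheet>
```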

How to get a value from a JavaScript in XSLT?

I have the following XML and XSLT to transform to HTML.
XML
<?xml version="1.0" encoding="UTF-8"?>
<root>
<te>t1</te>
</root>
XSLT
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" indent="yes" />
<xsl:template match="root">
<html>
<div>
<xsl:variable name="name1" select="te" />
<xsl:value-of select="CtrlList['$name1']" />
</div>
<script language="javascript">var List={
"t1":"test"
}</script>
</html>
</xsl:template>
</xsl:stylesheet>
So my objective is to get the value of "te" from the XML, map it with the JavaScript object "List", and return the value test while transforming with the XSLT. So I should get the value test as output.
Can anyone figure out what is wrong in the XSLT?
When you look at your XSLT, it may seem like there is JavaScript there, but all XSLT sees is that it is outputting an element named "script", with an attribute "language", which contains some text. It is also worth noting that xsl:value-of is used to get the value from the input document, but your script element is actually part of the result tree, and so not accessible to xsl:value-of.
However, it is possible to extend XSLT so it can use JavaScript functions, but this is very much processor-dependent, and you should think of it the same way as embedding JavaScript in HTML. Have a look at this question, as an example:
How to include javaScript file in xslt
So, in your case, your XSLT would be something like this (note that this particular example will only work in Microsoft's MSXML processor):
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:user="http://mycompany.com/mynamespace"
exclude-result-prefixes="msxsl user">
<xsl:output method="xml" indent="yes" />
<msxsl:script language="JScript" implements-prefix="user">
var List={
"t1":"test"
}
function lookup(key) {
return List[key];
}
</msxsl:script>
<xsl:template match="root">
<html>
<div>
<xsl:variable name="name1" select="te"/>
<xsl:value-of select="user:lookup(string($name1))"/>
</div>
</html>
</xsl:template>
</xsl:stylesheet>
Of course, it might be worth asking why you want to use JavaScript in your XSLT. It may be possible to achieve the same result using pure XSLT, which would certainly make your XSLT more portable.
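Here is a hedged sketch of the pure-XSLT route. The JavaScript List object becomes a small lookup table stored in the stylesheet itself, in a made-up namespace (urn:example:lookup is hypothetical), and document('') reads it back. XSLT 1.0 permits such top-level data elements as long as they are in a non-XSLT namespace. Some environments that load stylesheets from strings rather than files cannot resolve document(''), so treat this as a starting point only.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:lookup="urn:example:lookup"
    exclude-result-prefixes="lookup">
  <xsl:output method="html" indent="yes"/>
  <!-- Hypothetical lookup table replacing the JavaScript List object;
       the processor ignores top-level elements in a foreign namespace -->
  <lookup:entry key="t1">test</lookup:entry>
  <xsl:template match="root">
    <html>
      <div>
        <xsl:variable name="name1" select="string(te)"/>
        <!-- document('') is the stylesheet itself, so the table can be queried with XPath -->
        <xsl:value-of select="document('')/*/lookup:entry[@key = $name1]"/>
      </div>
    </html>
  </xsl:template>
</xsl:stylesheet>
```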

xslt xml to html : how to match an element that its name starts with xxx?

This is a problem I came across. To prevent h1 elements from being duplicated, every h1 tag in the XML has a random suffix after h1. The source XML and the wanted HTML are shown below:
source xml:
<h1_JW1XRT>Hello1</h1_JW1XRT>
<h1_JXZRIE>Hello2</h1_JXZRIE>
convert into html
<h1 id="h1_JW1XRT">Hello1</h1>
<h1 id="h1_JXZRIE">Hello2</h1>
How can I write this template?
This transformation:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="*[starts-with(name(), 'h1')]">
<h1 id="{name()}"><xsl:apply-templates/></h1>
</xsl:template>
</xsl:stylesheet>
when applied to the following XML document (the provided XML fragment, wrapped in a single top element so that it becomes a well-formed XML document):
<t>
<h1_JW1XRT>Hello1</h1_JW1XRT>
<h1_JXZRIE>Hello2</h1_JXZRIE>
</t>
produces the wanted, correct result:
<h1 id="h1_JW1XRT">Hello1</h1>
<h1 id="h1_JXZRIE">Hello2</h1>
Explanation: Proper use of the standard XPath function starts-with()
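Here is a variation that assumes, as in the sample, that the random suffix always follows an underscore. Matching on 'h1_' keeps names such as h10 from being caught, since 'h10' also starts with 'h1'. Using local-name() ignores a namespace prefix if the real input has one.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!-- Only names of the form h1_XXXX are turned into h1 elements;
       local-name() gives the name without any prefix -->
  <xsl:template match="*[starts-with(local-name(), 'h1_')]">
    <h1 id="{local-name()}"><xsl:apply-templates/></h1>
  </xsl:template>
</xsl:stylesheet>
```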

Clone contents of node into different namespace

I've updated the title and the text of my original question after gaining more knowledge on what's really going on. I misinterpreted the symptom as whitespace not being preserved while what's really going on was that the HTML elements weren't being interpreted as HTML.
I'm writing a transformation from a WADL document to HTML. I need to clone the contents of WADL <doc> elements while changing their namespace to HTML and preserving whitespace, because the <doc> elements may contain HTML elements such as <pre> that are whitespace-sensitive.
Sample WADL <doc> element:
<doc xml:lang="en" title="Some Representation">
Sample representation:
<pre><![CDATA[
<MyRoot>
<MyChild awesome="yes"/>
</MyRoot>
]]></pre>
</doc>
Here's how I'm currently transforming this:
<xsl:apply-templates select="wadl:doc"/>
...
<xsl:template match="wadl:doc">
<xsl:if test="@title">
<p><strong><xsl:value-of select="@title"/></strong></p>
</xsl:if>
<p><xsl:copy-of select="*|node()"/></p>
</xsl:template>
What I'm seeing is that the copied <pre> element isn't interpreted as a <pre> element, and therefore the representation sample looks out of whack. How can I instruct the XSL output to override the namespace while copying the contents of <doc> elements? Or is this simply a problem with the way I select the contents of the <doc> elements?
Update
After being tipped off that this could be an output namespace issue, I created the following minimal setup to experiment on:
The XML:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="doc_test.xsl"?>
<application xmlns="http://wadl.dev.java.net/2009/02">
<doc>
<p>This is an HTML paragraph</p>
<pre>
And this
is a
preformatted
block of
text.
</pre>
</doc>
</application>
The XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wadl="http://wadl.dev.java.net/2009/02">
<xsl:output
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
<xsl:template match="wadl:application">
<html>
<head>
<title>Doc Test</title>
</head>
<body>
<xsl:apply-templates select="wadl:doc"/>
</body>
</html>
</xsl:template>
<xsl:template match="wadl:doc">
<xsl:copy-of select="node()"/>
</xsl:template>
</xsl:stylesheet>
When I inspect the DOM node of the <p> or <pre> elements on Firefox, the namespace of the elements point to the WADL namespace, and they don't get properly rendered as HTML (they look like plain text). When I do the same on Chrome, the namespace is XHTML and the elements render as proper XHTML elements.
So, I guess, since in the <doc> elements of my original WADL document I'm not using namespace prefixes explicitly, I need to find a nice way to force the contents of <doc> to use the XHTML namespace, or simply add an XHTML namespace prefix to the contents of <doc> (more work, but seems to be the proper way).
I cannot repro the problem using the provided XSLT code.
I have modified it slightly, adding the wadl: namespace:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wadl="some:wadl-namespace"
exclude-result-prefixes="wadl"
>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:apply-templates select="wadl:doc"/>
</xsl:template>
<xsl:template match="wadl:doc">
<xsl:if test="@title">
<p><strong><xsl:value-of select="@title"/></strong></p>
</xsl:if>
<p><xsl:apply-templates select="node()"/></p>
</xsl:template>
<xsl:template match="wadl:pre">
<pre>
<xsl:apply-templates select="node()"/>
</pre>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on this XML document:
<doc xml:lang="en" title="Some Representation"
xmlns="some:wadl-namespace"
>
Sample representation:
<pre><![CDATA[
<MyRoot>
<MyChild awesome="yes"/>
</MyRoot>
]]></pre>
</doc>
the result is produced with the desired whitespace:
<p><strong>Some Representation</strong></p>
<p>
Sample representation:
<pre>
<MyRoot>
<MyChild awesome="yes"/>
</MyRoot>
</pre>
</p>
I suspect that your browser may not correctly interpret <pre> when it is in a custom namespace (IE doesn't mind that).
Do note the namespace stripping in my transformation. If namespace stripping alone does not produce the desired results in your browser, you'll need to create <pre> (and any other html tags copied from the source XML document) in the (X)Html namespace, by using the
<xsl:element name="{name()}" namespace="http://www.w3.org/1999/xhtml">
instruction.
Following Dimitre's answer, when I run this stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wadl="http://wadl.dev.java.net/2009/02"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="wadl">
<xsl:output method="xml" omit-xml-declaration="yes"
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
<xsl:template match="wadl:application">
<html>
<head>
<title>Doc Test</title>
</head>
<body>
<xsl:apply-templates select="wadl:doc/node()"/>
</body>
</html>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{name()}" namespace="http://www.w3.org/1999/xhtml">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:copy/>
</xsl:template>
</xsl:stylesheet>
With your updated input document:
<application xmlns="http://wadl.dev.java.net/2009/02">
<doc>
<p>This is an HTML paragraph</p>
<pre>
And this
is a
preformatted
block of
text.
</pre>
</doc>
</application>
I get this output on Firefox 3.5.9 (from "Inspect element" option):
<html><head><title>Doc Test</title></head><body>
<p>This is an HTML paragraph</p>
<pre>
And this
is a
preformatted
block of
text.
</pre>
</body></html>
Note: In Firefox, if your transformation doesn't output proper HTML or XHTML, the result is wrapped inside a transformiix element. But this seems to be unrelated to whitespace preservation (at least in this version).
Edit: The pre/@id belonged to another input that I tested with the '@*' template. Sorry.
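Here is one refinement to the stylesheet above, offered as a sketch. name() returns the qualified name. If the WADL source ever uses a prefix (wadl:pre rather than pre in a default namespace), that prefix may be carried over into the output. Using local-name() avoids this. Restricting the match to descendants of wadl:doc also keeps other WADL elements from being renamed.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:wadl="http://wadl.dev.java.net/2009/02"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="wadl">
  <xsl:output method="xml" omit-xml-declaration="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
  <xsl:template match="wadl:application">
    <html>
      <head><title>Doc Test</title></head>
      <body>
        <xsl:apply-templates select="wadl:doc/node()"/>
      </body>
    </html>
  </xsl:template>
  <!-- Only elements inside wadl:doc are re-created in the XHTML namespace;
       local-name() drops any source prefix such as "wadl:" -->
  <xsl:template match="wadl:doc//*">
    <xsl:element name="{local-name()}" namespace="http://www.w3.org/1999/xhtml">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
  <!-- Attributes are copied unchanged; text nodes use the built-in rule -->
  <xsl:template match="@*">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
```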

XSLT parse escaped HTML stored in an attribute and convert that attribute's content into element's content

I'm stuck on what I think should be a simple thing to do. I've been looking around but didn't find a solution. I hope you can help me.
What I have is an XML element with an attribute that contains escaped HTML elements:
<Booking>
<BookingComments Type="RAM" comment="RAM name fred&lt;br/&gt;Tel 09876554&lt;br/&gt;Email fred@bla.com" />
</Booking>
What I need is for the HTML elements and text in the @comment attribute to be parsed and become the content of an element, as follows:
<p>
RAM name fred<br/>Tel 09876554<br/>Email fred@bla.com
</p>
Here is my XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" exclude-result-prefixes="xs fn" version="1.0">
<xsl:output method="html" doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
doctype-system="http://www.w3.org/TR/html4/loose.dtd" encoding="UTF-8" indent="yes" />
<xsl:template name="some-template">
<p>Some text</p>
<p>
<xsl:copy-of
select="/Booking/BookingComments[lower-case(@Type)='ram'][1]/@comment"/>
</p>
</xsl:template>
</xsl:stylesheet>
I've read that copy-of is a good way to restore escaped HTML elements back to proper elements. In this specific case, because it's initially an attribute the copy-of translates it into attribute as well. So I get:
<p comment="RAM name fred&lt;br/&gt;Tel 09876554&lt;br/&gt;Email fred@bla.com"></p>
Which isn't what I want.
If I use apply-templates instead of copy-of, as in:
<p>
<xsl:apply-templates select="/Booking/BookingComments[lower-case(@Type)='ram'][1]/@comment"/>
</p>
I get p's content simply as text, not restored HTML elements.
<p>RAM name fred&lt;br/&gt;Tel 09876554&lt;br/&gt;Email fred@bla.com</p>
I'm sure I'm missing something obvious. I would really appreciate any help and tips!
I would recommend using a dedicated template:
<!-- check if lower-casing @Type is really necessary -->
<xsl:template match="BookingComments[lower-case(@Type)='ram']/@comment">
<p>
<xsl:value-of select="." disable-output-escaping="yes" />
</p>
</xsl:template>
This way you could simply apply templates to the attribute. Note that disabling output escaping has the potential to generate ill-formed output.
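Note that lower-case() is an XPath 2.0 function, while the question's stylesheet declares version="1.0". The sketch below is for an XSLT 1.0 processor. It uses translate() in place of lower-case() and shows the apply-templates call that feeds the template. It assumes the processor honours disable-output-escaping, which XSLT 1.0 makes optional. The attribute may be ignored, for example when the transformation runs in a browser.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <p>Some text</p>
    <!-- translate() lower-cases just the letters R, A and M, which is enough to compare against 'ram' -->
    <xsl:apply-templates
        select="/Booking/BookingComments[translate(@Type, 'RAM', 'ram') = 'ram'][1]/@comment"/>
  </xsl:template>
  <!-- Writes the attribute text without escaping, so the escaped br tags reach the output as real tags.
       The attribute must hold well-formed markup, or the output will be broken. -->
  <xsl:template match="BookingComments/@comment">
    <p>
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```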
You could bind an extension function parse() which parses a string into a nodeset. The exact mechanism will depend on your XSLT engine.
In Xalan, we can take the following static method:
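public class MyExtension {
    // Wraps the fragment in a dummy root so it parses as XML, then returns the root's children as a node-set
    public static org.w3c.dom.NodeList Parse(String xml) throws Exception {
        return javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new org.xml.sax.InputSource(new java.io.StringReader("<root>" + xml + "</root>")))
            .getDocumentElement().getChildNodes();
    }
}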
and use it like so:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:java="http://xml.apache.org/xalan/java"
exclude-result-prefixes="java"
version="1.0">
<xsl:template match="BookingComments">
<xsl:copy-of select="java:package.name.MyExtension.Parse(string(@comment))" />
</xsl:template>
</xsl:stylesheet>