Clone contents of node into different namespace - html

I've updated the title and the text of my original question after gaining more knowledge on what's really going on. I misinterpreted the symptom as whitespace not being preserved while what's really going on was that the HTML elements weren't being interpreted as HTML.
I'm writing a transformation from a WADL document to HTML. I need to clone the contents of WADL <doc> elements, which may contain HTML elements like <pre>, while changing their namespace to HTML.
Sample WADL <doc> element:
<doc xml:lang="en" title="Some Representation">
Sample representation:
<pre><![CDATA[
<MyRoot>
<MyChild awesome="yes"/>
</MyRoot>
]]></pre>
</doc>
Here's how I'm currently transforming this:
<xsl:apply-templates select="wadl:doc"/>
...
<xsl:template match="wadl:doc">
<xsl:if test="@title">
<p><strong><xsl:value-of select="@title"/></strong></p>
</xsl:if>
<p><xsl:copy-of select="*|node()"/></p>
</xsl:template>
What I'm seeing is that the copied <pre> element isn't interpreted as a <pre> element, so the representation sample looks out of whack. How can I instruct the XSL output to override the namespace while copying the contents of <doc> elements? Or is this simply a problem with the way I select the contents of the <doc> elements?
Update
After being tipped off that this could be an output namespace issue, I created the following minimal setup to experiment on:
The XML:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="doc_test.xsl"?>
<application xmlns="http://wadl.dev.java.net/2009/02">
<doc>
<p>This is an HTML paragraph</p>
<pre>
And this
is a
preformatted
block of
text.
</pre>
</doc>
</application>
The XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wadl="http://wadl.dev.java.net/2009/02">
<xsl:output
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
<xsl:template match="wadl:application">
<html>
<head>
<title>Doc Test</title>
</head>
<body>
<xsl:apply-templates select="wadl:doc"/>
</body>
</html>
</xsl:template>
<xsl:template match="wadl:doc">
<xsl:copy-of select="node()"/>
</xsl:template>
</xsl:stylesheet>
When I inspect the DOM node of the <p> or <pre> elements in Firefox, the namespace of the elements points to the WADL namespace, and they don't get rendered as HTML (they look like plain text). When I do the same in Chrome, the namespace is XHTML and the elements render as proper XHTML elements.
So, I guess, since in the <doc> elements of my original WADL document I'm not using namespace prefixes explicitly, I need to find a nice way to force the contents of <doc> to use the XHTML namespace, or simply add an XHTML namespace prefix to the contents of <doc> (more work, but seems to be the proper way).

I cannot repro the problem using the provided XSLT code.
I have modified it slightly, adding the wadl: namespace:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wadl="some:wadl-namespace"
exclude-result-prefixes="wadl"
>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:apply-templates select="wadl:doc"/>
</xsl:template>
<xsl:template match="wadl:doc">
<xsl:if test="@title">
<p><strong><xsl:value-of select="@title"/></strong></p>
</xsl:if>
<p><xsl:apply-templates select="node()"/></p>
</xsl:template>
<xsl:template match="wadl:pre">
<pre>
<xsl:apply-templates select="node()"/>
</pre>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on this XML document:
<doc xml:lang="en" title="Some Representation"
xmlns="some:wadl-namespace"
>
Sample representation:
<pre><![CDATA[
<MyRoot>
<MyChild awesome="yes"/>
</MyRoot>
]]></pre>
</doc>
the result is produced with the desired whitespace:
<p><strong>Some Representation</strong></p>
<p>
Sample representation:
<pre>
<MyRoot>
<MyChild awesome="yes"/>
</MyRoot>
</pre>
</p>
I suspect that your browser may not interpret correctly <pre> when it is in a custom namespace (IE doesn't mind that).
Do note the namespace stripping in my transformation. If namespace stripping alone does not produce the desired results in your browser, you'll need to create <pre> (and any other html tags copied from the source XML document) in the (X)Html namespace, by using the
<xsl:element name="{name()}" namespace="http://www.w3.org/1999/xhtml">
instruction.
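A minimal sketch of how that instruction might sit inside a template, assuming every element reached by the template should end up in the XHTML namespace. It uses local-name() rather than name(), which is a substitution on my part so that any prefix on a source element is dropped; adjust if your input relies on prefixes.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Recreate each element under the same local name, but in the XHTML namespace. -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="http://www.w3.org/1999/xhtml">
      <!-- Attributes are copied unchanged. -->
      <xsl:copy-of select="@*"/>
      <!-- Children are processed recursively; text nodes fall through to the built-in rule. -->
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Because text nodes are passed through untouched, the line breaks inside <pre> are kept as long as no xsl:strip-space declaration removes them.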

Following Dimitre's answer, when I run this stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wadl="http://wadl.dev.java.net/2009/02"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="wadl">
<xsl:output method="xml" omit-xml-declaration="yes"
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
<xsl:template match="wadl:application">
<html>
<head>
<title>Doc Test</title>
</head>
<body>
<xsl:apply-templates select="wadl:doc/node()"/>
</body>
</html>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{name()}" namespace="http://www.w3.org/1999/xhtml">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:copy/>
</xsl:template>
</xsl:stylesheet>
With your updated input document:
<application xmlns="http://wadl.dev.java.net/2009/02">
<doc>
<p>This is an HTML paragraph</p>
<pre>
And this
is a
preformatted
block of
text.
</pre>
</doc>
</application>
I get this output on Firefox 3.5.9 (from "Inspect element" option):
<html><head><title>Doc Test</title></head><body>
<p>This is an HTML paragraph</p>
<pre>
And this
is a
preformatted
block of
text.
</pre>
</body></html>
Note: In Firefox, if your transformation doesn't output proper HTML or XHTML, the result is wrapped in a transformiix element. But this seems to be unrelated to whitespace preservation (at least in this version).
Edit: The pre/@id belongs to other input that I tested for the '@*' template. Sorry.

Related

Retain formatting tags from XML with XSLT

I have to display an XML document using XSLT but I would like apply the <p> and <i> HTML tags from the original XML document. For example p tag from XML would create a new paragraph. What would be the best way to accomplish this?
XML and XSLT code posted below.
Edit: clarified the original question
<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="tour.xsl"?>
<cars>
<car>
<description>
<p><i>There is some text here</i> There is some more text here</p>
<p>This should be another paragraph</p>
<p>This is yet another paragraph</p>
</description>
</car>
</cars>
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" version="4.0"/>
<xsl:template match="/">
<html>
<head>
<title>Title</title>
</head>
<body>
<xsl:value-of select="car"/><p/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Your actual question can be answered with a stylesheet like this, which uses apply-templates:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" version="4.0" indent="yes"/>
<xsl:template match="/">
<html>
<head>
<title>Title</title>
</head>
<body>
<xsl:apply-templates select="/cars/car/description/*"/>
<p/>
</body>
</html>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
However, if you have multiple cars you probably want to have some headlines. Then a stylesheet like this may point towards that goal:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" version="4.0" indent="yes"/>
<xsl:template match="/">
<html>
<head>
<title>Title</title>
</head>
<body>
<xsl:apply-templates select="/cars/car/*"/>
<p/>
</body>
</html>
</xsl:template>
<xsl:template match="description">
<h1>Subtitle</h1>
<xsl:apply-templates select="*"/>
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
You can extend your template to copy over the p elements like this:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" version="4.0"/>
<xsl:template match="/">
<html>
<head>
<title>Title</title>
</head>
<body>
<xsl:for-each select="/cars/car">
<xsl:for-each select="description/p">
<xsl:copy-of select="."/>
</xsl:for-each>
</xsl:for-each>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Given your XML input, the above XSLT will produce this output:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Title</title>
</head>
<body>
<p><i>There is some text here</i> There is some more text here
</p>
<p>This should be another paragraph</p>
<p>This is yet another paragraph</p>
</body>
</html>
Update:
I am trying to run in komodo's built in browser and ie.
You'd have to research komodo's XSLT capabilities. Also, be aware of challenges running XSLT in the browser; better to run on the server or in batch mode and only use the browser to display the results.
That said, the following XML file will open in IE 11 and be styled per the above XSLT sheet (named tour.xsl and located in the same directory as the XML file):
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type='text/xsl' href='tour.xsl'?>
<cars>
<car>
<description>
<p><i>There is some text here</i> There is some more text here</p>
<p>This should be another paragraph</p>
<p>This is yet another paragraph</p>
</description>
</car>
</cars>
Here's a screenshot of the output in IE 11:
Update 2
I tried the template solution but it still returns unformatted text in
one line.
If IE 11 is given an XML file without a xml-stylesheet PI,
<?xml-stylesheet type='text/xsl' href='tour.xsl'?>
then it will display the XML in an outline form.
If IE 11 is given an XML file with a xml-stylesheet PI and it can find the XSLT file specified, it will apply the XSLT and display the results in the browser. This result can be seen in the above screenshot (and is your desired result).
If IE 11 is given an XML file with a xml-stylesheet PI and it cannot find the XSLT file specified, it will display unformatted text in one line as you describe.
Therefore, focus the search for your problem on the location and name of the XSLT file. Here is what happens when I intentionally force such behavior by renaming the XSLT file,
<?xml-stylesheet type='text/xsl' href='tour_CANNOT_FIND.xsl'?>
so that it cannot be found:
Note: Press F12 to reveal the console.
Then, if you click on the "Allow blocked content" button:
You see the diagnostic message that tour_CANNOT_FIND.xsl cannot be found. If you resolve the problem of IE not finding your XSLT file, you should then see the formatted results of your XSLT file in the browser.

Replace xml tags to html

Hello all, I want to replace some XML node tags with HTML tags.
Example: <emphasis role="bold">Diff.</emphasis>
I want to convert it to <b>Diff.</b>
Example: <emphasis role="italic">Diff.</emphasis>
I want to convert it to <i>Diff.</i>
Any ideas?
As this answer suggests, XSLT is the de-facto standard to process XML from one format to another.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="//emphasis[@role='bold']">
<b><xsl:apply-templates select="node()" /></b>
</xsl:template>
<xsl:template match="//emphasis[@role='italic']">
<i><xsl:apply-templates select="node()" /></i>
</xsl:template>
</xsl:stylesheet>
XSLT uses XPath expressions to select and process content. For instance, //emphasis[@role='bold'] matches any emphasis element (no matter how deep) that has a role attribute with the value 'bold'. Inside the template you specify how to process it: wrapping the body in <b>...</b> makes XSLT emit the output within those tags, and select="node()" inserts the processed content of the matched node there.
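As a side note, a match pattern is tested against nodes at any depth anyway, so the leading // is not required. A sketch of the same rules written without it, which should behave identically under the assumptions of the example above:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: copy everything that no more specific template handles. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The predicate gives these rules a higher default priority than the identity rule. -->
  <xsl:template match="emphasis[@role='bold']">
    <b><xsl:apply-templates select="node()"/></b>
  </xsl:template>

  <xsl:template match="emphasis[@role='italic']">
    <i><xsl:apply-templates select="node()"/></i>
  </xsl:template>

</xsl:stylesheet>
```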
Example: say the above code is stored in process.xslt, you can process this using xsltproc (or another XSLT processor):
xsltproc process.xslt testinput.xml
If testinput is:
<?xml version="1.0"?>
<test>
<emphasis role="italic"><foo>Diff<emphasis role="italic">bar</emphasis></foo>.</emphasis>
<emphasis role="bold">Diff.</emphasis>
</test>
the resulting output is:
$ xsltproc process.xslt testinput.xml
<?xml version="1.0" encoding="ISO-8859-15"?>
<test>
<i><foo>Diff<i>bar</i></foo>.</i>
<b>Diff.</b>
</test>
To output it as HTML, you can add a template for the root node by including
<xsl:template match="/">
<html>
<head>
<title>Some title</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
in the <xsl:stylesheet>. In that case, the output is:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-15">
<title>Some title</title>
</head>
<body><test>
<i><foo>Diff<i>bar</i></foo>.</i>
<b>Diff.</b>
</test></body>
</html>

XSLT transform removes HTML elements from mixed-content

Is it possible for XSLT preserve anchors and other embedded HTML tags within XML?
Background: I am trying to convert an HTML document into XML with an XSL stylesheet using XSLT. The original HTML document had content interspersed with anchor tags (e.g. Some hyperlinks here and there). I've copied that content into my XML, but the XSLT output lacks anchor tags.
Example XML:
<?xml version="1.0" ?>
<observations>
<observation>Hyperlinks disappear.</observation>
</observations>
Example XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/html">
<xsl:output method="html" indent="yes" encoding="UTF-8"/>
<xsl:template match="/observations">
<html>
<body>
<xsl:value-of select="observation"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Output:
<html xmlns="http://www.w3.org/1999/html">
<body>Hyperlinks disappear.</body>
</html>
I've read a few similar articles on stackoverflow and checked out the Identity transform page on wikipedia; I started to get some interesting results using xsl:copy-of, but I don't understand enough about XSLT to get all of the words and tags embedded within each XML element to appear in the resulting HTML. Any help would be appreciated.
Write a separate template to match a elements, copy their attributes and content.
What is wrong with your approach? In your code,
<xsl:value-of select="observation"/>
simply sends to the output the string value of the observation element. Its string value is the concatenation of all text nodes it contains. But you need not only the text nodes in it, but also the a elements themselves.
The default behaviour of an XSLT processor is to "skip" element nodes, because of a built-in template. So, if you do not mention a in a template match, it is simply ignored and only its text content is output.
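For reference, the built-in rules of XSLT 1.0 behave as if the following templates were part of every stylesheet. They are written out here only as an illustration; you never need to declare them yourself.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Elements and the root: emit nothing for the node itself, just process its children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy their string value to the output. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: produce nothing. -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

This is why an element without a matching template seems to vanish while its text still shows up.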
Stylesheet
Note: This stylesheet still relies on the default behaviour of the XSLT processor to some extent. The order of events will resemble the following:
The template with match="/observations" is matched. It adds html and body to the output. Then a template rule must be found for the content of observations. A built-in template matches observation, does nothing with it, and looks for a template to process its content. For the a element, the corresponding template is matched, which copies the element and its attributes. Finally, a built-in template copies the text nodes inside observation and a.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/observations">
<html>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="a">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
XML Output
<html>
<body>Hyperlinks disappear.
</body>
</html>

Convert 'embedded' XML doc into CDATA output in XSLT (1.0)

Given an input XML document like this:
<?xml version="1.0" encoding="utf-8"?>
<root>
<title> This contains an 'embedded' HTML document </title>
<document>
<html>
<head><title>HTML DOC</title></head>
<body>
Hello World
</body>
</html>
</document>
</root>
How I can extract that 'inner' HTML document; render it as CDATA and include in my output document ?
So the output document will be an HTML document; which contains a text-box showing the elements as text (so it will be displaying the 'source-view' of the inner document).
I have tried this:
<xsl:template match="document">
<xsl:value-of select="*"/>
</xsl:template>
But this only renders the Text Nodes.
I have tried this:
<xsl:template match="document">
<![CDATA[
<xsl:value-of select="*"/>
]]>
</xsl:template>
But this escapes the actual XSLT and I get:
<xsl:value-of select="*"/>
I have tried this:
<xsl:output method="xml" indent="yes" cdata-section-elements="document"/>
[...]
<xsl:template match="document">
<document>
<xsl:value-of select="*"/>
</document>
</xsl:template>
This does insert a CDATA section, but the output still contains just text (stripped elements):
<?xml version="1.0" encoding="UTF-8"?>
<html>
<head>
<title>My doc</title>
</head>
<body>
<h1>Title: This contains an 'embedded' HTML document </h1>
<document><![CDATA[
HTML DOC
Hello World
]]></document>
</body>
</html>
There are two confusions you need to clear up here.
First, you probably want xsl:copy-of rather than xsl:value-of. The latter returns the string value of an element, the former returns a copy of the element.
Second, the cdata-section-elements attribute on xsl:output affects the serialization of text nodes, but not of elements and attributes. One way to get what you want would be to serialize the HTML yourself, along the lines of the following (not tested):
<xsl:template match="document//*">
<xsl:value-of select="concat('&lt;', name())"/>
<!--* attributes are left as an exercise for the reader ... *-->
<xsl:text>&gt;</xsl:text>
<xsl:apply-templates/>
<xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
</xsl:template>
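The attribute handling left open in the template above could be sketched roughly as follows. This is an untested illustration that assumes attribute values contain no double quotes or other characters needing re-escaping, and that the input has the root/document shape from the question.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Text directly inside the result document element is written as a CDATA section. -->
  <xsl:output method="xml" indent="yes" cdata-section-elements="document"/>

  <xsl:template match="/">
    <xsl:apply-templates select="root/document"/>
  </xsl:template>

  <xsl:template match="document">
    <document><xsl:apply-templates/></document>
  </xsl:template>

  <!-- Write each embedded element out as text: start tag, attributes, content, end tag. -->
  <xsl:template match="document//*">
    <xsl:value-of select="concat('&lt;', name())"/>
    <xsl:for-each select="@*">
      <!-- Emits: space, attribute name, equals sign, quoted value -->
      <xsl:value-of select="concat(' ', name(), '=&quot;', ., '&quot;')"/>
    </xsl:for-each>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates/>
    <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
  </xsl:template>
</xsl:stylesheet>
```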
But the quicker way would be something like the following solution (squeamish readers, stop reading now), pointed out to me by my friend Tommie Usdin. Drop the cdata-section-elements attribute from xsl:output and replace your template for the document element with:
<xsl:template match="document">
<document>
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:copy-of select="./html"/>
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
</document>
</xsl:template>

XSLT parse escaped HTML stored in an attribute and convert that attribute's content into element's content

I'm stuck on what I think should be simple thing to do. I've been looking around, but didn't find the solution. Hope you will help me.
What I have is an XML element with an attribute that contains escaped HTML elements:
<Booking>
<BookingComments Type="RAM" comment="RAM name fred&lt;br/&gt;Tel 09876554&lt;br/&gt;Email fred@bla.com" />
</Booking>
What I need is for the HTML elements and content from the @comment attribute to be parsed and become the content of an element, as follows:
<p>
RAM name fred<br/>Tel 09876554<br/>Email fred@bla.com
</p>
Here is my XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" exclude-result-prefixes="xs fn" version="1.0">
<xsl:output method="html" doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
doctype-system="http://www.w3.org/TR/html4/loose.dtd" encoding="UTF-8" indent="yes" />
<xsl:template name="some-template">
<p>Some text</p>
<p>
<xsl:copy-of
select="/Booking/BookingComments[lower-case(@Type)='ram'][1]/@comment"/>
</p>
</xsl:template>
</xsl:stylesheet>
I've read that copy-of is a good way to restore escaped HTML elements back to proper elements. In this specific case, because it's initially an attribute the copy-of translates it into attribute as well. So I get:
<p comment="RAM name fred&lt;br/&gt;Tel 09876554&lt;br/&gt;Email fred@bla.com"></p>
Which isn't what I want.
If I use apply-templates instead of copy-of, as in:
<p>
<xsl:apply-templates select="/Booking/BookingComments[lower-case(@Type)='ram'][1]/@comment"/>
</p>
I get p's content simply as text, not restored HTML elements.
<p>RAM name fred&lt;br/&gt;Tel 09876554&lt;br/&gt;Email fred@bla.com</p>
I'm sure I'm missing something obvious. I would really appreciate any help and tips!
I would recommend using a dedicated template:
<!-- check if lower-casing @Type is really necessary -->
<xsl:template match="BookingComments[lower-case(@Type)='ram']/@comment">
<p>
<xsl:value-of select="." disable-output-escaping="yes" />
</p>
</xsl:template>
This way you could simply apply templates to the attribute. Note that disabling output escaping has the potential to generate ill-formed output.
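Note that lower-case() is an XPath 2.0 function, while the stylesheet in the question declares version="1.0". A sketch of how the same idea might look for an XSLT 1.0 processor, using translate() for the case-insensitive comparison; the variable names upper and lower are my own, and disable-output-escaping is an optional feature that not every processor or output pipeline honours.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- Helper alphabets for an XPath 1.0 substitute for lower-case() -->
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>

  <xsl:template match="/">
    <!-- Select the comment attribute of the first BookingComments whose Type is "ram" in any case. -->
    <xsl:apply-templates
        select="Booking/BookingComments[translate(@Type, $upper, $lower)='ram'][1]/@comment"/>
  </xsl:template>

  <!-- Write the attribute value without escaping, so the markup stored in it becomes real tags. -->
  <xsl:template match="BookingComments/@comment">
    <p>
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

The case-insensitive test lives in the select expression rather than in the match pattern because XSLT 1.0 does not allow variable references inside match patterns.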
You could bind an extension function parse() which parses a string into a nodeset. The exact mechanism will depend on your XSLT engine.
In Xalan, we can take the following static method:
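import org.w3c.dom.traversal.NodeIterator;
public class MyExtension
{
public static NodeIterator Parse( String xml )
{
throw new UnsupportedOperationException("parsing left to the implementer");
}
}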
and use it like so:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:java="http://xml.apache.org/xalan/java"
exclude-result-prefixes="java"
version="1.0">
<xsl:template match="BookingComments">
<xsl:copy-of select="java:package.name.MyExtension.Parse(string(@comment))" />
</xsl:template>
</xsl:stylesheet>