Convert 'embedded' XML doc into CDATA output in XSLT (1.0) - html

Given an input XML document like this:
<?xml version="1.0" encoding="utf-8"?>
<root>
<title> This contains an 'embedded' HTML document </title>
<document>
<html>
<head><title>HTML DOC</title></head>
<body>
Hello World
</body>
</html>
</document>
</root>
How can I extract that 'inner' HTML document, render it as CDATA, and include it in my output document?
The output document will be an HTML document that contains a text box showing the elements as text (so it will display the 'source view' of the inner document).
I have tried this:
<xsl:template match="document">
<xsl:value-of select="*"/>
</xsl:template>
But this only renders the Text Nodes.
I have tried this:
<xsl:template match="document">
<![CDATA[
<xsl:value-of select="*"/>
]]>
</xsl:template>
But this escapes the actual XSLT and I get:
<xsl:value-of select="*"/>
I have tried this:
<xsl:output method="xml" indent="yes" cdata-section-elements="document"/>
[...]
<xsl:template match="document">
<document>
<xsl:value-of select="*"/>
</document>
</xsl:template>
This does insert a CDATA section, but the output still contains just text (stripped elements):
<?xml version="1.0" encoding="UTF-8"?>
<html>
<head>
<title>My doc</title>
</head>
<body>
<h1>Title: This contains an 'embedded' HTML document </h1>
<document><![CDATA[
HTML DOC
Hello World
]]></document>
</body>
</html>

There are two confusions you need to clear up here.
First, you probably want xsl:copy-of rather than xsl:value-of. The latter returns the string value of an element, the former returns a copy of the element.
Second, the cdata-section-elements attribute on xsl:output affects the serialization of text nodes, but not of elements and attributes. One way to get what you want would be to serialize the HTML yourself, along the lines of the following (not tested):
<xsl:template match="document//*">
<xsl:value-of select="concat('&lt;', name())"/>
<!--* attributes are left as an exercise for the reader ... *-->
<xsl:text>&gt;</xsl:text>
<xsl:apply-templates/>
<xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
</xsl:template>
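For the attributes left as an exercise in the template above, one possible sketch is shown here. It assumes XSLT 1.0, keeps cdata-section-elements="document" on xsl:output, and uses a hypothetical /root template only to give the output an HTML envelope. It does not escape special characters (such as & or a double quote) inside attribute values, so treat it as a starting point rather than a complete serializer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Text nodes written inside <document> are serialized as CDATA -->
  <xsl:output method="xml" indent="yes" cdata-section-elements="document"/>

  <!-- Hypothetical envelope, modelled on the output shown in the question -->
  <xsl:template match="/root">
    <html>
      <body>
        <h1>Title: <xsl:value-of select="title"/></h1>
        <xsl:apply-templates select="document"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="document">
    <document>
      <xsl:apply-templates/>
    </document>
  </xsl:template>

  <!-- Write every element below <document> as text, tag by tag -->
  <xsl:template match="document//*">
    <xsl:value-of select="concat('&lt;', name())"/>
    <!-- Assumption: attribute values contain nothing that needs escaping -->
    <xsl:for-each select="@*">
      <xsl:value-of select="concat(' ', name(), '=&quot;', ., '&quot;')"/>
    </xsl:for-each>
    <xsl:text>&gt;</xsl:text>
    <xsl:apply-templates/>
    <xsl:value-of select="concat('&lt;/', name(), '&gt;')"/>
  </xsl:template>

</xsl:stylesheet>
```

Because every tag is emitted as a text node, the serializer places the whole inner document inside the CDATA section of the document element.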
But the quicker way would be something like the following solution (squeamish readers, stop reading now), pointed out to me by my friend Tommie Usdin. Drop the cdata-section-elements attribute from xsl:output and replace your template for the document element with:
<xsl:template match="document">
<document>
<xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
<xsl:copy-of select="./html"/>
<xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
</document>
</xsl:template>

Related

Using XSLT to span text between two empty nodes

I have an XML file with a series of pairings like the following:
<metamark function="let-stand" spanTo="#meta-93"/>some text between the two empty nodes<anchor xml:id="meta-93"/>
In other words, the text is always preceded by a metamark tag with @function='let-stand' and a @spanTo with a unique value. The text is always followed by an anchor tag whose @xml:id value matches the @spanTo value on the metamark.
When transforming such text via XSLT into HTML, I would like to wrap it in a span tag as follows:
<span class="dotted">some text between the two empty nodes</span>
How can I achieve this? Note that the two empty nodes and the text between them will always be siblings. The value I've put on the span's @class is arbitrary. I'm just using "dotted" for demonstration purposes here.
The basic idea is that, for each metamark, you:
create a span tag,
get the following siblings of the current metamark
that themselves have, as a following sibling, the anchor tag with the proper id (the end point, exclusive),
and apply templates to them.
Of course, you have to block "normal" template application within the parent tag of your metamark tags.
Try the following transformation:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="html" doctype-public="XSLT-compat"
encoding="UTF-8" indent="yes" />
<xsl:template match="metamark">
<xsl:element name="span">
<xsl:attribute name="class" select="'dotted'"/>
<xsl:variable name="termId" select="substring(@spanTo, 2)"/>
<xsl:variable name="srcRange" select="following-sibling::node()
[following-sibling::anchor[@xml:id=$termId]]"/>
<xsl:apply-templates select="$srcRange"/>
</xsl:element>
<xsl:text>
</xsl:text>
</xsl:template>
<!-- In "main" process only "metamark" tags -->
<xsl:template match="main">
<xsl:apply-templates select="metamark"/>
</xsl:template>
<!-- HTML envelope -->
<xsl:template match="/">
<html>
<body>
<xsl:text>
</xsl:text>
<xsl:apply-templates />
</body>
</html>
</xsl:template>
<!-- Identity transform -->
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
</xsl:transform>
I tried it for the following XML sample:
<?xml version="1.0" encoding="utf-8"?>
<main>
<metamark function="let-stand" spanTo="#meta-93"/>Aaaaaa bbbbbbb<anchor xml:id="meta-93"/>
<metamark function="let-stand" spanTo="#meta-94"/>Eeeeee <b>bbb</b> ccc<anchor xml:id="meta-94"/>
<metamark function="let-stand" spanTo="#meta-95"/>Ffffff bbbbbbb<anchor xml:id="meta-95"/>
</main>
and got result:
<!DOCTYPE html PUBLIC "XSLT-compat">
<html>
<body>
<span class="dotted">Aaaaaa bbbbbbb</span>
<span class="dotted">Eeeeee <b>bbb</b> ccc</span>
<span class="dotted">Ffffff bbbbbbb</span>
</body>
</html>
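The stylesheet above declares version 2.0 because xsl:attribute with a select attribute is an XSLT 2.0 feature. If only an XSLT 1.0 processor is available, the same idea can probably be expressed with a literal span element, as in the following sketch. It assumes, as in the sample, that the metamark and anchor elements are children of main; the doctype setting is left out for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- HTML envelope -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- In "main", process only the metamark start points -->
  <xsl:template match="main">
    <xsl:apply-templates select="metamark"/>
  </xsl:template>

  <xsl:template match="metamark">
    <!-- Drop the leading "#" of spanTo to get the id of the closing anchor -->
    <xsl:variable name="termId" select="substring(@spanTo, 2)"/>
    <span class="dotted">
      <!-- Siblings after this metamark that still have the matching anchor after them -->
      <xsl:apply-templates
          select="following-sibling::node()[following-sibling::anchor[@xml:id = $termId]]"/>
    </span>
  </xsl:template>

  <!-- Identity transform for everything inside the range -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

The predicate is the same as in the 2.0 version; only the way the span and its class attribute are created differs.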

How do I use multiple <xsl:output> statements in a single xslt file

I would like to use a single XSL to produce multiple output formats (xml and html for now).
I would like to choose the output format by means of a stylesheet parameter.
So the code I have is as follows:
<xd:doc scope="stylesheet">
<xd:desc>
<xd:p><xd:b>Created on:</xd:b> July 1, 2015</xd:p>
<xd:p><xd:b>Author:</xd:b> me</xd:p>
<xd:p>A stylesheet to test the application of XYZABC</xd:p>
<xd:p>takes a single parameter - xslt_output_format</xd:p>
<xd:p>valid inputs - xml html</xd:p>
</xd:desc>
</xd:doc>
<xsl:output name="xml_out" encoding="UTF-8" indent="yes" method="xml" />
<xsl:output name="html_out" encoding="ISO-8859-1" indent="yes" method="html"/>
<xsl:template match="/">
<xsl:choose>
<xsl:when test="$xslt_output_format = 'xml'">
<data>
<p>This is some test xml output</p>
</data>
</xsl:when>
<xsl:when test="$xslt_output_format = 'html'">
<html>
<head>
<title>HTML Test Output</title>
</head>
<body>
<p>This is some test html output</p>
</body>
</html>
</xsl:when>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
If I pass 'xml' as the parameter I get
This is some test xml output
and if I pass 'html' I get
HTML Test Output
This is some test html output
That doesn't seem to respect my request for ISO-8859-1 encoding on the html (which I was just using to test that this was working).
Michael Kay's XSLT 2.0 and XPath 2.0 tome is a little vague and definitely short of examples on using multiple xsl:output statements (sorry Mike).
So I am just asking am I using it correctly?
Can I achieve what I am aiming for?
TIA
Feargal
I think you need to use xsl:output together with xsl:result-document http://www.w3.org/TR/xslt20/#creating-result-trees, so try along the lines of
<xsl:template match="/">
<xsl:choose>
<xsl:when test="$xslt_output_format = 'xml'">
<xsl:result-document format="xml_out" href="output.xml">
<data>
<p>This is some test xml output</p>
</data>
</xsl:result-document>
</xsl:when>
<xsl:when test="$xslt_output_format = 'html'">
<xsl:result-document format="html_out" href="output.html">
<html>
<head>
<title>HTML Test Output</title>
</head>
<body>
<p>This is some test html output</p>
</body>
</html>
</xsl:result-document>
</xsl:when>
</xsl:choose>
</xsl:template>
I would probably use templates and modes to distinguish the two different ways of processing but the advice on using xsl:output and xsl:result-document remains the same.
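A minimal sketch of the templates-and-modes variant mentioned above might look like the following. It assumes an XSLT 2.0 processor; the mode names xml and html and the default parameter value are illustrative choices, not something taken from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed default; the caller normally supplies this parameter -->
  <xsl:param name="xslt_output_format" select="'xml'"/>

  <xsl:output name="xml_out" method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:output name="html_out" method="html" encoding="ISO-8859-1" indent="yes"/>

  <!-- Pick the named output format and the matching mode at run time -->
  <xsl:template match="/">
    <xsl:choose>
      <xsl:when test="$xslt_output_format = 'html'">
        <xsl:result-document format="html_out" href="output.html">
          <xsl:apply-templates select="." mode="html"/>
        </xsl:result-document>
      </xsl:when>
      <xsl:otherwise>
        <xsl:result-document format="xml_out" href="output.xml">
          <xsl:apply-templates select="." mode="xml"/>
        </xsl:result-document>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- XML flavour of the result -->
  <xsl:template match="/" mode="xml">
    <data>
      <p>This is some test xml output</p>
    </data>
  </xsl:template>

  <!-- HTML flavour of the result -->
  <xsl:template match="/" mode="html">
    <html>
      <head>
        <title>HTML Test Output</title>
      </head>
      <body>
        <p>This is some test html output</p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Keeping each format in its own mode lets the two sets of templates grow independently, while the run-time choice stays in one place.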
xsl:output by itself doesn't allow you to make any run-time selection of output method.
In 2.0 you can use xsl:output in conjunction with xsl:result-document: the xsl:result-document can select a named xsl:output declaration, or it can override some of its attributes selectively.
Another option (also available in 1.0) is to override xsl:output from the calling API: if you're using JAXP look at Transformer.setOutputProperty().
On the Saxon command line you can set output properties using the syntax !indent=yes (the "!" needs to be "\!" with some shells).

proper use of xpath for "select=" attribute in xsl:apply-templates tag

I am new to XSLT and I am having quite a hard time getting used to its xsl:apply-templates element. I have a simple XML file and I want to apply an XSL style to its elements. I want to select every entry element from my XML file and show its title child on the screen. Below is the section of my XSL file where my confusion lies.
<xsl:template match='/'>
<html>
<head>
<title>my xsl file</title>
</head>
<body>
<h2>my book collection</h2>
<xsl:apply-templates select='entry'/>
</body>
</html>
</xsl:template>
In the above snippet, if I use the select attribute in the xsl:apply-templates tag, no content is shown on the screen. But if I remove it, everything is fine. My question is: why is that? Am I not supposed to select and match the entry tag, like the following?
<xsl:template match='entry'>
<p>
<xsl:apply-templates select='title'/>
</p>
</xsl:template>
Here I have to "select" the "title" tag from every entry and then make a template match for the "title" tag, like the following. The previous snippet selects the title tag, and the following snippet matches it and creates an h2 tag with its content. Then why can't we do the same thing for the entry tag, which is the parent of the title tag?
<xsl:template match='title'>
<h2 style='color:red;'><xsl:value-of select="."/></h2>
</xsl:template>
FULL code:
XML file:
<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl' href='haha.xslt'?>
<book>
<entry>
<title>amar boi</title>
<page>100</page>
</entry>
<entry>
<title>adhunik biggan</title>
<page>200</page>
</entry>
<entry>
<title>machine design</title>
<page>1000</page>
</entry>
</book>
XSL file:
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0' >
<xsl:template match='/'>
<html>
<head>
<title>my xsl file</title>
</head>
<body>
<h2>my book collection</h2>
<xsl:apply-templates select='entry'/>
</body>
</html>
</xsl:template>
<xsl:template match='entry'>
<p>
<xsl:apply-templates select='title'/>
</p>
</xsl:template>
<xsl:template match='title'>
<h2 style='color:red;'><xsl:value-of select="."/></h2>
</xsl:template>
</xsl:stylesheet>
In the above snippet, if I use the select attribute in the xsl:apply-templates
tag, no content is shown on the screen. But if I remove it,
everything is fine. My question is: why is that?
The reason for this is that you are in the context of the / root node (that's what your template matches), and your <xsl:apply-templates/> is selecting "entry" - which is an abbreviation of "child::entry". However, entry is not a child of /, so your expression selects nothing.
If you remove the selection, then templates are applied to nodes that are children of the current node (book in your example). The built-in template rule then applies templates to the children of book and that is how your template matching entry is eventually applied.
You could avoid this problem simply by changing your first template's start-tag to:
<xsl:template match='/book'>
The root node / is not the same as the document element /* (in your case /book).
In your template matching the root node (<xsl:template match="/">), you are using <xsl:apply-templates select="entry"/>, which is equivalent to /entry and happens to select nothing.
If you want to apply templates to the entry elements, then you could change the first template to match the document element (as @michael.hor257k recommends), or you could adjust the XPath for the apply-templates in the root node template to be <xsl:apply-templates select="book/entry"/>, or even select="*/entry".
Complete example:
<?xml version='1.0' encoding='UTF-8'?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0' >
<xsl:template match='/'>
<html>
<head>
<title>my xsl file</title>
</head>
<body>
<h2>my book collection</h2>
<xsl:apply-templates select='book/entry'/>
</body>
</html>
</xsl:template>
<xsl:template match='entry'>
<p>
<xsl:apply-templates select='title'/>
</p>
</xsl:template>
<xsl:template match="title">
<h2 style="color:red;">
<xsl:value-of select="."/>
</h2>
</xsl:template>
</xsl:stylesheet>

Replace xml tags to html

Hello all, I want to replace some XML node tags with HTML tags.
Example: <emphasis role="bold">Diff.</emphasis>
I want to convert it to <b>Diff.</b>
Example: <emphasis role="italic">Diff.</emphasis>
I want to convert it to <i>Diff.</i>
Any ideas?
As this answer suggests, XSLT is the de-facto standard to process XML from one format to another.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="//emphasis[@role='bold']">
<b><xsl:apply-templates select="node()" /></b>
</xsl:template>
<xsl:template match="//emphasis[@role='italic']">
<i><xsl:apply-templates select="node()" /></i>
</xsl:template>
</xsl:stylesheet>
XSLT uses XPath expressions to select and process content. For instance, //emphasis[@role='bold'] matches any emphasis element (no matter how deep) that has a role attribute with the value 'bold'; inside the template, you specify how to process it. Because the template wraps its content in <b>...</b>, the output is wrapped in those tags as well. The select="node()" on xsl:apply-templates processes the children of the matched element and inserts the result there.
Example: say the above code is stored in process.xslt, you can process this using xsltproc (or another XSLT processor):
xsltproc process.xslt testinput.xml
If testinput.xml is:
<?xml version="1.0"?>
<test>
<emphasis role="italic"><foo>Diff<emphasis role="italic">bar</emphasis></foo>.</emphasis>
<emphasis role="bold">Diff.</emphasis>
</test>
the resulting output is:
$ xsltproc process.xslt testinput.xml
<?xml version="1.0" encoding="ISO-8859-15"?>
<test>
<i><foo>Diff<i>bar</i></foo>.</i>
<b>Diff.</b>
</test>
To output it as HTML, you can add a template for the root node by including
<xsl:template match="/">
<html>
<head>
<title>Some title</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
in the <xsl:stylesheet>. In that case, the output is:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-15">
<title>Some title</title>
</head>
<body><test>
<i><foo>Diff<i>bar</i></foo>.</i>
<b>Diff.</b>
</test></body>
</html>

Clone contents of node into different namespace

I've updated the title and the text of my original question after gaining more knowledge on what's really going on. I misinterpreted the symptom as whitespace not being preserved while what's really going on was that the HTML elements weren't being interpreted as HTML.
I'm writing a transformation from a WADL document to HTML. I need to clone the contents of WADL <doc> elements, which may contain HTML elements like <pre> that care about whitespace, while changing the namespace to HTML.
Sample WADL <doc> element:
<doc xml:lang="en" title="Some Representation">
Sample representation:
<pre><![CDATA[
<MyRoot>
<MyChild awesome="yes"/>
</MyRoot>
]]></pre>
</doc>
Here's how I'm currently transforming this:
<xsl:apply-templates select="wadl:doc"/>
...
<xsl:template match="wadl:doc">
<xsl:if test="@title">
<p><strong><xsl:value-of select="@title"/></strong></p>
</xsl:if>
<p><xsl:copy-of select="*|node()"/></p>
</xsl:template>
What I'm seeing is that the copied <pre> element isn't interpreted as a <pre> element, and therefore the representation sample looks out of whack. How can I instruct XSL output to override the namespace while copying the contents of <doc> elements? Or is this simply a problem with the way I select the contents of the <doc> elements?
Update
After being tipped off that this could be an output namespace issue, I created the following minimal setup to experiment on:
The XML:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="doc_test.xsl"?>
<application xmlns="http://wadl.dev.java.net/2009/02">
<doc>
<p>This is an HTML paragraph</p>
<pre>
And this
is a
preformatted
block of
text.
</pre>
</doc>
</application>
The XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wadl="http://wadl.dev.java.net/2009/02">
<xsl:output
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
<xsl:template match="wadl:application">
<html>
<head>
<title>Doc Test</title>
</head>
<body>
<xsl:apply-templates select="wadl:doc"/>
</body>
</html>
</xsl:template>
<xsl:template match="wadl:doc">
<xsl:copy-of select="node()"/>
</xsl:template>
</xsl:stylesheet>
When I inspect the DOM node of the <p> or <pre> elements in Firefox, the namespace of the elements points to the WADL namespace, and they don't get properly rendered as HTML (they look like plain text). When I do the same in Chrome, the namespace is XHTML and the elements render as proper XHTML elements.
So, I guess, since in the <doc> elements of my original WADL document I'm not using namespace prefixes explicitly, I need to find a nice way to force the contents of <doc> to use the XHTML namespace, or simply add an XHTML namespace prefix to the contents of <doc> (more work, but seems to be the proper way).
I cannot repro the problem using the provided XSLT code.
I have modified it slightly, adding the wadl: namespace:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wadl="some:wadl-namespace"
exclude-result-prefixes="wadl"
>
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:apply-templates select="wadl:doc"/>
</xsl:template>
<xsl:template match="wadl:doc">
<xsl:if test="@title">
<p><strong><xsl:value-of select="@title"/></strong></p>
</xsl:if>
<p><xsl:apply-templates select="node()"/></p>
</xsl:template>
<xsl:template match="wadl:pre">
<pre>
<xsl:apply-templates select="node()"/>
</pre>
</xsl:template>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on this XML document:
<doc xml:lang="en" title="Some Representation"
xmlns="some:wadl-namespace"
>
Sample representation:
<pre><![CDATA[
<MyRoot>
<MyChild awesome="yes"/>
</MyRoot>
]]></pre>
</doc>
the result is produced with the desired whitespace:
<p><strong>Some Representation</strong></p>
<p>
Sample representation:
<pre>
<MyRoot>
<MyChild awesome="yes"/>
</MyRoot>
</pre>
</p>
I suspect that your browser may not interpret correctly <pre> when it is in a custom namespace (IE doesn't mind that).
Do note the namespace stripping in my transformation. If namespace stripping alone does not produce the desired results in your browser, you'll need to create <pre> (and any other html tags copied from the source XML document) in the (X)Html namespace, by using the
<xsl:element name="{name()}" namespace="http://www.w3.org/1999/xhtml">
instruction.
Following Dimitre's answer, when I run this stylesheet:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:wadl="http://wadl.dev.java.net/2009/02"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="wadl">
<xsl:output method="xml" omit-xml-declaration="yes"
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
<xsl:template match="wadl:application">
<html>
<head>
<title>Doc Test</title>
</head>
<body>
<xsl:apply-templates select="wadl:doc/node()"/>
</body>
</html>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{name()}" namespace="http://www.w3.org/1999/xhtml">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:copy/>
</xsl:template>
</xsl:stylesheet>
With your updated input document:
<application xmlns="http://wadl.dev.java.net/2009/02">
<doc>
<p>This is an HTML paragraph</p>
<pre>
And this
is a
preformatted
block of
text.
</pre>
</doc>
</application>
I get this output on Firefox 3.5.9 (from "Inspect element" option):
<html><head><title>Doc Test</title></head><body>
<p>This is an HTML paragraph</p>
<pre>
And this
is a
preformatted
block of
text.
</pre>
</body></html>
Note: In Firefox, if your transformation doesn't output proper HTML or XHTML, the result is wrapped inside a transformiix:result element. But this seems to have no relationship with preserved white space (at least in this version).
Edit: The pre/@id belongs to another input that I tested for the '@*' template. Sorry.