XML encoded HTML to XSLT

How do I take XML-encoded HTML and write an XSLT stylesheet for it? I have linked the XML/HTML page to the XSLT file, and it shows the text from the document but not the link or the picture. The image referenced in the XML/HTML is in a folder called images inside the folder that holds the XML and XSLT files.
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="XLST.xslt"?>
<html>
<head>
<title>CATS! CATS! CATS!</title>
</head>
<body>
<h1>CATS</h1>
<p>
Visit Google...
</p>
Cats like milk!
<p>
<image> <img src="Cats.jpg" alt="Cats, so cute!"/></image>
</p>
</body>
</html>
And the XSLT file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="html">
<xsl:apply-templates select="body" />
</xsl:template>
<xsl:template match="body">
<img alt="">
<xsl:attribute name="src">
<xsl:value-of select="//Cats"/>
</xsl:attribute>
</img>
</xsl:template>
</xsl:stylesheet>

Transforming an existing XHTML document into an almost identical XHTML document with just a few additions isn't something you would normally do, but it's relatively easy. Start with a stylesheet that contains only the identity template:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This will output the xhtml exactly as-is, then you can add in templates to supplement your source xhtml. For example, adding in this:
<xsl:template match="body">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<img alt="" src="(insert url here)"/>
<xsl:apply-templates select="node()"/>
</xsl:copy>
</xsl:template>
Will add an image at the start of the body element. It copies the element itself, then its attributes, inserts an image, then copies the child nodes.
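Applied to the document in the question, the same idea might look like the sketch below. It rests on assumptions not confirmed by the question: that the non-HTML image wrapper should simply be dropped, and that the picture's path needs the images/ folder prefix. Adjust both to match the real files.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Identity template: copy every attribute and node unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Do not copy the xml-stylesheet instruction into the result. -->
  <xsl:template match="processing-instruction('xml-stylesheet')"/>

  <!-- Assumption: <image> is only a wrapper, so drop it and keep its children. -->
  <xsl:template match="image">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Assumption: pictures live in an "images" folder next to the XML file. -->
  <xsl:template match="img/@src">
    <xsl:attribute name="src">
      <xsl:value-of select="concat('images/', .)"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```

The more specific templates win over the identity template because their match patterns have a higher default priority, so everything not mentioned explicitly is still copied as-is.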

Related

How to change the siblings of xml into parent and child nodes according to their attributes using xsl

The input is:
<p style="abc">hbfg</p>
<r style="cds">bhf</r>
<r style="cds"> bhfsh</r>
<p style="abc">pofj</p>
<r style="abc"> bchs</r>
The expected output is:
<p style="abc">hbfg
<r style="cds">bhf</r>
<r style="cds"> bhfsh</r></p>
<p style="abc">pofj
<r style="abc"> bchs</r></p>
How can I convert it using XSLT?
Here are two versions of an XSLT stylesheet that will process the XML file you posted: one for XSLT 2.0, which introduced the convenient xsl:for-each-group element with its group-starting-with="pattern" attribute for this use case, and, for maximum portability, one for XSLT 1.0 that uses XPath to do the grouping. Both versions use doc/text as the logical root of the tree and xsl:apply-templates to make the most of the built-in template rules. Mind the whitespace handling.
More examples of flat-file transformation can be found on SO and in the XSLT 1.0 FAQ, now at archive.org.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="doc/text">
<chapter>
<title>
<xsl:apply-templates select="p[@style='TRH2']"/>
</title>
<research>
<title>
<xsl:apply-templates select="p[@style='TRRef']"/>
</title>
<reftext>
<xsl:apply-templates select="p[@style='TRRefText']"/>
</reftext>
</research>
<sections>
<xsl:for-each-group
select="p[not(@style) or @style='TRH7']"
group-starting-with="p[@style='TRH7']"
>
<title>
<xsl:apply-templates select="self::p[1]"/>
</title>
<paragraphs>
<xsl:for-each select="current-group()[self::p][position()>1]">
<para-text>
<xsl:apply-templates/>
</para-text>
</xsl:for-each>
</paragraphs>
</xsl:for-each-group>
</sections>
</chapter>
</xsl:template>
<xsl:template match="p[@style='TRRefText']">
<xsl:value-of select="."/><br/>
</xsl:template>
<xsl:template match="foot-note">
<footnoteref>
<id><xsl:value-of select="@id-rel"/></id>
<xsl:apply-templates/>
</footnoteref>
</xsl:template>
</xsl:transform>
The XSLT 1.0 version (in the third xsl:template) uses an XPath
expression to group the non-title p elements between the current and the
next subsection title element (p[@style='TRH7']), and a mode="para"
clause to avoid processing the title as both title and paragraph.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="doc/text">
<chapter>
<title>
<xsl:apply-templates select="p[@style='TRH2']"/>
</title>
<research>
<title>
<xsl:apply-templates select="p[@style='TRRef']"/>
</title>
<reftext>
<xsl:apply-templates select="p[@style='TRRefText']"/>
</reftext>
</research>
<sections>
<xsl:apply-templates select="p[@style='TRH7']"/>
</sections>
</chapter>
</xsl:template>
<xsl:template match="p[@style='TRRefText']">
<xsl:value-of select="."/><br/>
</xsl:template>
<xsl:template match="p[@style='TRH7']">
<title><xsl:apply-templates/></title>
<paragraphs>
<xsl:apply-templates mode="para"
select="following-sibling::p[not(@style='TRH7')]
[generate-id(preceding-sibling::p[@style='TRH7'][1])
= generate-id(current())]"
/>
</paragraphs>
</xsl:template>
<xsl:template match="p" mode="para">
<para-text><xsl:apply-templates/></para-text>
</xsl:template>
<xsl:template match="foot-note">
<footnoteref>
<id><xsl:value-of select="@id-rel"/></id>
<xsl:apply-templates/>
</footnoteref>
</xsl:template>
</xsl:transform>
UPDATE: Additional explanation as requested in comment.
Your own code is very close to what I posted, so I'll expand on how to group elements using XSLT 1.0. Each sub-section in the document is triggered by the style of its title (p[@style='TRH7']), which activates the third template:
<xsl:template match="p[@style='TRH7']">
<title><xsl:apply-templates/></title>
<paragraphs>
<xsl:apply-templates mode="para"
select="following-sibling::p[not(@style='TRH7')]
[generate-id(preceding-sibling::p[@style='TRH7'][1])
= generate-id(current())]"
/>
</paragraphs>
</xsl:template>
This template emits a sub-section title (using a built-in template rule), then collects the following non-title paragraphs (following-sibling::p[not(@style='TRH7')]) that have the current title as their most recent logical parent. Recall that preceding-sibling is a reverse axis, so p[…][1] refers to the nearest sibling in reverse document order. Since following-sibling::p[…] selects all following non-title paragraphs, the second predicate [generate-id(…)] limits the selection to the logical children of the current title.
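For the short p/r sample shown in the question, the same generate-id() comparison can be sketched as follows. The wrapper element name root is an assumption (the sample has no parent element), and the sketch also assumes every r should be nested under the nearest preceding p regardless of its style attribute.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- Assumption: the flat p/r siblings sit under a wrapper named "root". -->
  <xsl:template match="root">
    <xsl:copy>
      <!-- Only the p elements drive the output; each one pulls in its r siblings. -->
      <xsl:apply-templates select="p"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <xsl:copy>
      <!-- Keep the attributes and the original content of p. -->
      <xsl:copy-of select="@* | node()"/>
      <!-- Append every following r whose nearest preceding p is this p. -->
      <xsl:copy-of select="following-sibling::r
          [generate-id(preceding-sibling::p[1]) = generate-id(current())]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because preceding-sibling is a reverse axis, preceding-sibling::p[1] is the closest p before each r, which is what decides the group an r belongs to.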

Using XSLT to span text between two empty nodes

I have an XML file with a series of pairings like the following:
<metamark function="let-stand" spanTo="#meta-93"/>some text between the two empty nodes<anchor xml:id="meta-93"/>
In other words, the text is always preceded by a metamark tag with @function='let-stand' and a spanTo attribute with a unique value. The text is always followed by an anchor tag whose @xml:id value matches the @spanTo value on the metamark.
When transforming such text via XSLT into HTML, I would like to wrap it in a span tag as follows:
<span class="dotted">some text between the two empty nodes</span>
How can I achieve this? Note that the two empty nodes and the text between them will always be siblings. The value I've put on the span @class is arbitrary. I'm just using "dotted" for demonstration purposes here.
The basic idea is that for each metamark:
create span tag,
get following siblings of the current metamark,
which as a following sibling have anchor tag with proper id (end point, exclusive),
and apply templates to them.
Of course, you have to block "normal" template application within the parent tag of your metamark tags.
Try the following transformation:
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output method="html" doctype-public="XSLT-compat"
encoding="UTF-8" indent="yes" />
<xsl:template match="metamark">
<xsl:element name="span">
<xsl:attribute name="class" select="'dotted'"/>
<xsl:variable name="termId" select="substring(@spanTo, 2)"/>
<xsl:variable name="srcRange" select="following-sibling::node()
[following-sibling::anchor[@xml:id=$termId]]"/>
<xsl:apply-templates select="$srcRange"/>
</xsl:element>
<xsl:text>
</xsl:text>
</xsl:template>
<!-- In "main" process only "metamark" tags -->
<xsl:template match="main">
<xsl:apply-templates select="metamark"/>
</xsl:template>
<!-- HTML envelope -->
<xsl:template match="/">
<html>
<body>
<xsl:text>
</xsl:text>
<xsl:apply-templates />
</body>
</html>
</xsl:template>
<!-- Identity transform -->
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>
</xsl:transform>
I tried it for the following XML sample:
<?xml version="1.0" encoding="utf-8"?>
<main>
<metamark function="let-stand" spanTo="#meta-93"/>Aaaaaa bbbbbbb<anchor xml:id="meta-93"/>
<metamark function="let-stand" spanTo="#meta-94"/>Eeeeee <b>bbb</b> ccc<anchor xml:id="meta-94"/>
<metamark function="let-stand" spanTo="#meta-95"/>Ffffff bbbbbbb<anchor xml:id="meta-95"/>
</main>
and got result:
<!DOCTYPE html PUBLIC "XSLT-compat">
<html>
<body>
<span class="dotted">Aaaaaa bbbbbbb</span>
<span class="dotted">Eeeeee <b>bbb</b> ccc</span>
<span class="dotted">Ffffff bbbbbbb</span>
</body>
</html>
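If only an XSLT 1.0 processor is available, or the metamark pairs sit among other content that should be kept, a variant along the following lines may work. It is a sketch under the assumption that the ranges are never nested and never overlap. Instead of restricting the parent to metamark children, it suppresses the enclosed nodes in the normal flow so they are not output twice.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Identity template: everything outside a range is copied unchanged. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Opening marker: emit the span and copy the siblings up to the anchor. -->
  <xsl:template match="metamark[@function = 'let-stand']">
    <xsl:variable name="termId" select="substring(@spanTo, 2)"/>
    <span class="dotted">
      <xsl:copy-of select="following-sibling::node()
          [following-sibling::anchor[@xml:id = $termId]]"/>
    </span>
  </xsl:template>

  <!-- A node lies inside a range when the nearest preceding metamark points
       at the nearest following anchor; it was already copied into the span.
       Assumption: ranges are not nested and do not overlap. -->
  <xsl:template match="node()[preceding-sibling::metamark[@function = 'let-stand'][1]/@spanTo
      = concat('#', following-sibling::anchor[1]/@xml:id)]"/>

  <!-- The closing anchors produce no output of their own. -->
  <xsl:template match="anchor"/>
</xsl:stylesheet>
```

The literal span element with a class attribute replaces the xsl:attribute select form, which is only available from XSLT 2.0 onwards.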

Adding HTML tags with XSLT on the fly

I have an XML document containing this:
<d1/>
<p1>...</p1>
<p2>...</p2>
<d2/>
<p3>...</p3>
<d3/>
Here the pn are elements that may contain subelements and other content, and each dn marks where an HTML DIV wrapping the p elements should begin. There is no corresponding closing tag; the end is only indicated implicitly by the next dn tag. The desired HTML output is this:
<div>
<p1>...</p1>
<p2>...</p2>
</div>
<div>
<p3>...</p3>
</div>
I have written an XSLT to introduce the <div> and </div> tags on the fly, using the following:
<xsl:text disable-output-escaping="yes">&lt;div&gt;</xsl:text>
and
<xsl:text disable-output-escaping="yes">&lt;/div&gt;</xsl:text>
and this works in Safari but fails in Firefox, which makes me suspect that it's not the right way to do it.
Do you have a better idea that will work on every browser?
Thanks a lot in advance.
Firefox does not support disable-output-escaping because it does not serialize the result tree. This is a grouping problem; one way to solve it is to use a key:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes"/>
<xsl:key name="group" match="body/*[not(starts-with(local-name(), 'd'))]" use="generate-id(preceding-sibling::*[starts-with(local-name(), 'd')][1])"/>
<xsl:template match="body">
<xsl:copy>
<xsl:apply-templates select="*[starts-with(local-name(), 'd')]"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[starts-with(local-name(), 'd')]">
<div>
<xsl:copy-of select="key('group', generate-id())"/>
</div>
</xsl:template>
</xsl:transform>
That would, however, create an empty div at the end of your sample, so you might want to change the last template to
<xsl:template match="*[starts-with(local-name(), 'd')]">
<xsl:if test="key('group', generate-id())">
<div>
<xsl:copy-of select="key('group', generate-id())"/>
</div>
</xsl:if>
</xsl:template>
You could use a technique known as "sibling recursion".
Given a well-formed input such as:
XML
<root>
<d1/>
<p1>a</p1>
<p2>b</p2>
<d2/>
<p3>c</p3>
<d3/>
</root>
the following stylesheet:
XSLT 1.0
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:template match="/root">
<body>
<xsl:apply-templates select="*[starts-with(name(), 'd')][position()!=last()]"/>
</body>
</xsl:template>
<xsl:template match="*[starts-with(name(), 'd')]">
<div>
<xsl:apply-templates select="following-sibling::*[1][not(starts-with(name(), 'd'))]"/>
</div>
</xsl:template>
<xsl:template match="/root/*[not(starts-with(name(), 'd'))]">
<xsl:copy-of select="."/>
<xsl:apply-templates select="following-sibling::*[1][not(starts-with(name(), 'd'))]"/>
</xsl:template>
</xsl:stylesheet>
will return:
<body>
<div>
<p1>a</p1>
<p2>b</p2>
</div>
<div>
<p3>c</p3>
</div>
</body>

how to interpret HTML in XSL

I have the following xml
<results>
<first-name>Carl</first-name>
<data><b> This is carl's data </b></data>
</results>
How do I include the bold tags present in the <data> element in the output, so that they are rendered as HTML?
When I use <xsl:value-of select="results/data"/>, the output is
<b> This is carl's data </b>
I want "This is carl's data" to appear in bold in the output.
Well, <xsl:copy-of select="results/data/node()"/> is a start, but if the requirement is part of a larger problem then you are better off writing a template for data elements that uses apply-templates to push the child nodes through some template(s) which copy HTML elements through to the output.
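A minimal sketch of that template-based approach, assuming the b element is real markup inside data (not escaped text) and that first-name is properly closed, might look like this. The surrounding p element is only an illustrative choice.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/results">
    <!-- Illustrative wrapper; use whatever container the page needs. -->
    <p>
      <xsl:value-of select="first-name"/>
      <xsl:text>: </xsl:text>
      <xsl:apply-templates select="data"/>
    </p>
  </xsl:template>

  <!-- Drop the <data> wrapper itself but process its children. -->
  <xsl:template match="data">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Copy any embedded element (b, i, a, ...) with its attributes, then
       recurse so that nested markup is kept as well. -->
  <xsl:template match="data//*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Text nodes are handled by the built-in template rule, which outputs their value, so only the element template has to be written.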
I am sure someone will let me know if I am being naive:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:output method="html" indent="yes"/>
<xsl:template match="/results">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="first-name">
<xsl:value-of select="." />
<xsl:text>: </xsl:text>
</xsl:template>
<xsl:template match="data">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="b">
<b>
<xsl:value-of select="." />
</b>
</xsl:template>
</xsl:stylesheet>

Preserve certain html tags during XSLT

I have looked up solutions on Stack Overflow, but none of them seem to work for me. Here is my question. Let's say I have the following text:
Source:
<greatgrandparent>
<grandparent>
<parent>
<sibling>
Hey, im the sibling .
</sibling>
<description>
$300$ <br/> $250 <br/> $200! <br/> <p> Yes, that is right! <br/> You can own a ps3 for only $200 </p>
</description>
</parent>
<parent>
... (SAME FORMAT)
</parent>
... (Several more parents)
</grandparent>
</greatgrandparent>
Output:
<newprice>
$300$ <br/> $250 <br/> $200! <br/> Yes, that is right! <br/> You can own a ps3 for only $200
</newprice>
I can't seem to find a way to do that.
Current XSL:
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="greatgrandparents">
<xsl:apply-templates />
</xsl:template>
<xsl:template match = "grandparent">
<xsl:for-each select = "parent" >
<newprice>
<xsl:apply-templates/>
</newprice>
</xsl:for-each>
</xsl:template>
<xsl:template match="description">
<xsl:element name="newprice">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
Use templates to define behavior on specific elements
<!-- after standard identity template -->
<xsl:template match="description">
<xsl:element name="newprice">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
The first template says to swap description with newprice. The second one drops the p tags while keeping their content.
If you're unfamiliar with the identity template, take a look here for a few examples.
EDIT: Given the new example, we can see that you want to extract only the description element and its contents. Notice that template processing starts with the match="/" template. We can use this to control where our stylesheet starts and thus skip much of the riffraff we want to filter out.
Change the <xsl:template match="/"> to something more like:
<xsl:template match="/">
<xsl:apply-templates select="//description"/>
<!-- use a more specific XPath if you can -->
</xsl:template>
So altogether our solution looks like this:
<xsl:stylesheet
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
exclude-result-prefixes="xs">
<xsl:template match="/">
<xsl:apply-templates select="//description" />
</xsl:template>
<!-- this is the identity template -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="description">
<xsl:element name="newprice">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
Shouldn't the contents of <description> be inside a CDATA section? Then you could probably use disable-output-escaping on xsl:value-of.
You should look into xsl:copy-of.
You would probably wind up with something like:
<xsl:template match="description">
<xsl:copy-of select="."/>
</xsl:template>
Probably the shortest solution is this one:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="description">
<newprice>
<xsl:copy-of select="node()"/>
</newprice>
</xsl:template>
<xsl:template match="text()[not(ancestor::description)]"/>
</xsl:stylesheet>
When this transformation is applied on the provided XML document, the wanted result is produced:
<newprice>
$300$ <br /> $250 <br /> $200! <br /> <p> Yes, that is right! <br /> You can own a ps3 for only $200 </p>
</newprice>
Do note:
The use of <xsl:copy-of select="node()"/> to copy the whole subtree rooted in description, without the root itself.
How we override the XSLT built-in template with a specific, empty template, which prevents any text nodes that are not descendants of a <description> element from being output.
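The output requested in the question also drops the p tags, which xsl:copy-of keeps. One way to combine both requirements in XSLT 1.0 is sketched below. The mode name keep is an arbitrary choice for this example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <!-- Rename description to newprice and process its content in "keep" mode. -->
  <xsl:template match="description">
    <newprice>
      <xsl:apply-templates mode="keep"/>
    </newprice>
  </xsl:template>

  <!-- Inside description: identity copy, so br and text survive. -->
  <xsl:template match="@* | node()" mode="keep">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="keep"/>
    </xsl:copy>
  </xsl:template>

  <!-- Inside description: drop the p tags but keep what they contain. -->
  <xsl:template match="p" mode="keep">
    <xsl:apply-templates mode="keep"/>
  </xsl:template>

  <!-- Outside description: suppress all text, such as the sibling element. -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Using a separate mode keeps the two behaviours apart: text is discarded in the default mode and copied only once processing has entered a description element.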