XSL character escape problem - html

I have hit a wall and cannot move forward. My database contains escaped HTML like this: "&lt;p&gt;My name is Freddy and I was".
I want to either render it as HTML or strip the HTML tags in my XSL template. Either solution works for me; I will pick whichever is quicker.
I have read several posts online but cannot find a solution. I have also tried disable-output-escaping with no success. The problem seems to be that somewhere during XSL execution the engine changes &lt;p&gt; into &amp;lt;p&amp;gt;.
It is converting the & into &amp;. If it helps, here is my XSL code. I have tried several combinations, with and without the output element at the top.
Any help will be appreciated. Thanks in advance.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" omit-xml-declaration="yes"/>
<xsl:template match="DocumentElement">
<div>
<xsl:attribute name="id">mySlides</xsl:attribute>
<xsl:apply-templates>
<xsl:with-param name="templatenumber" select="0"/>
</xsl:apply-templates>
</div>
<div>
<xsl:attribute name="id">myController</xsl:attribute>
<xsl:apply-templates>
<xsl:with-param name="templatenumber" select="1"/>
</xsl:apply-templates>
</div>
</xsl:template>
<xsl:template match="DocumentElement/QueryResults">
<xsl:param name="templatenumber">tobereplace</xsl:param>
<xsl:if test="$templatenumber=0">
<div>
<xsl:attribute name="id">myController</xsl:attribute>
<div>
<xsl:attribute name="class">article</xsl:attribute>
<h2>
<a>
<xsl:attribute name="class">title</xsl:attribute>
<xsl:attribute name="title"><xsl:value-of select="Title"/></xsl:attribute>
<xsl:attribute name="href">/stories/stories-details/articletype/articleview/articleid/<xsl:value-of select="ArticleId"/>/<xsl:value-of select="SEOTitle"/>.aspx</xsl:attribute>
<xsl:value-of select="Title"/>
</a>
</h2>
<div>
<xsl:attribute name="style">text-indent: 25px;</xsl:attribute>
<xsl:attribute name="class">articlesummary</xsl:attribute>
<xsl:call-template name="removeHtmlTags">
<xsl:with-param name="html" select="Summary" />
</xsl:call-template>
</div>
</div>
</div>
</xsl:if>
<xsl:if test="$templatenumber=1">
<div>
<xsl:attribute name="id">myController</xsl:attribute>
<span>
<xsl:attribute name="class">jFlowControl</xsl:attribute>
aa
</span>
</div>
</xsl:if>
</xsl:template>
<xsl:template name="removeHtmlTags">
<xsl:param name="html"/>
<xsl:choose>
<xsl:when test="contains($html, '&lt;')">
<xsl:value-of select="substring-before($html, '&lt;')"/>
<!-- Recurse through HTML -->
<xsl:call-template name="removeHtmlTags">
<xsl:with-param name="html" select="substring-after($html, '&gt;')"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$html"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>

Based on the assumption that you have this HTML string,
<p>My name is Freddy &amp; I was
then if you escape it and store it in a database it would become this:
&lt;p&gt;My name is Freddy &amp;amp; I was
Consequently, if you retrieve it as XML (without unescaping it beforehand), the result would be this:
&amp;lt;p&amp;gt;My name is Freddy &amp;amp;amp; I was
and <xsl:value-of select="." disable-output-escaping="yes" /> would produce:
&lt;p&gt;My name is Freddy &amp;amp; I was
You are getting exactly the same thing you have in your database, but of course you see the HTML tags in the output. So what you need is a mechanism that does the following string replacements:
"&amp;lt;" with "&lt;" (effectively changing &lt; to < in unescaped output)
"&amp;gt;" with "&gt;" (effectively changing &gt; to > in unescaped output)
"&amp;quot;" with "&quot;" (effectively changing &quot; to " in unescaped output)
"&amp;amp;" with "&amp;" (effectively changing &amp; to & in unescaped output)
From your XSL I have inferred the following test input XML:
<DocumentElement>
<QueryResults>
<Title>Article 1</Title>
<ArticleId>1</ArticleId>
<SEOTitle>Article_1</SEOTitle>
<Summary>&amp;lt;p&amp;gt;Article 1 summary &amp;amp;amp; description.&amp;lt;/p&amp;gt;</Summary>
</QueryResults>
<QueryResults>
<Title>Article 2</Title>
<ArticleId>2</ArticleId>
<SEOTitle>Article_2</SEOTitle>
<Summary>&amp;lt;p&amp;gt;Article 2 summary &amp;amp;amp; description.&amp;lt;/p&amp;gt;</Summary>
</QueryResults>
</DocumentElement>
I have changed the stylesheet you supplied and implemented such a replacement mechanism. If you apply the following XSLT 1.0 template to it:
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:namespace"
exclude-result-prefixes="my"
>
<xsl:output method="html" omit-xml-declaration="yes"/>
<my:unescape>
<my:char literal="&lt;" escaped="&amp;lt;" />
<my:char literal="&gt;" escaped="&amp;gt;" />
<my:char literal="&quot;" escaped="&amp;quot;" />
<my:char literal="&amp;" escaped="&amp;amp;" />
</my:unescape>
<xsl:template match="DocumentElement">
<div id="mySlides">
<xsl:apply-templates mode="slides" />
</div>
<div id="myController">
<xsl:apply-templates mode="controller" />
</div>
</xsl:template>
<xsl:template match="DocumentElement/QueryResults" mode="slides">
<div class="article">
<h2>
<a class="title" title="{Title}" href="{concat('/stories/stories-details/articletype/articleview/articleid/', ArticleId, '/', SEOTitle, '.aspx')}">
<xsl:value-of select="Title"/>
</a>
</h2>
<div class="articlesummary" style="text-indent: 25px;">
<xsl:apply-templates select="document('')/*/my:unescape/my:char[1]">
<xsl:with-param name="html" select="Summary" />
</xsl:apply-templates>
</div>
</div>
</xsl:template>
<xsl:template match="DocumentElement/QueryResults" mode="controller">
<span class="jFlowControl">
<xsl:text>aa </xsl:text>
<xsl:value-of select="Title" />
</span>
</xsl:template>
<xsl:template match="my:char">
<xsl:param name="html" />
<xsl:variable name="intermediate">
<xsl:choose>
<xsl:when test="following-sibling::my:char">
<xsl:apply-templates select="following-sibling::my:char[1]">
<xsl:with-param name="html" select="$html" />
</xsl:apply-templates>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$html" disable-output-escaping="yes" />
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:call-template name="unescape">
<xsl:with-param name="html" select="$intermediate" />
</xsl:call-template>
</xsl:template>
<xsl:template name="unescape">
<xsl:param name="html" />
<xsl:choose>
<xsl:when test="contains($html, @escaped)">
<xsl:value-of select="substring-before($html, @escaped)" disable-output-escaping="yes"/>
<xsl:value-of select="@literal" disable-output-escaping="yes" />
<xsl:call-template name="unescape">
<xsl:with-param name="html" select="substring-after($html, @escaped)"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$html" disable-output-escaping="yes"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Then this output HTML is produced:
<div id="mySlides">
<div class="article">
<h2>
<a class="title" title="Article 1" href="/stories/stories-details/articletype/articleview/articleid/1/Article_1.aspx">Article 1</a>
</h2>
<div class="articlesummary" style="text-indent: 25px;">
<p>Article 1 summary &amp; description.</p>
</div>
</div>
<div class="article">
<h2>
<a class="title" title="Article 2" href="/stories/stories-details/articletype/articleview/articleid/2/Article_2.aspx">Article 2</a>
</h2>
<div class="articlesummary" style="text-indent: 25px;">
<p>Article 2 summary &amp; description.</p>
</div>
</div>
</div>
<div id="myController">
<span class="jFlowControl">aa Article 1</span>
<span class="jFlowControl">aa Article 2</span>
</div>
Note:
the use of a temporary namespace and embedded elements (<my:unescape>) to create a list of characters to replace
the use of recursion to emulate an iterative replacement of all affected characters in the input
the use of the implicit context within the unescape template to carry the information about which character is currently being replaced
Furthermore note:
the use of template modes to get different output for the same input (this replaces your templatenumber parameter)
most of the time there is no need for <xsl:attribute> elements. They can safely be replaced by inline notation (attributename="{attributevalue}")
the use of the concat() function to create the URL
Generally speaking, it is a bad idea to store escaped HTML in a database (more generally speaking: It is a bad idea to store HTML in a database.). You set yourself up to get all kinds of problems, this being one of them. If you can't change this setup, I hope that the solution helps you.
I cannot guarantee that it does the right thing in all situations, and it may open up security holes (think XSS), but dealing with this was not part of the question. In any case, consider yourself warned.
I need a break now. ;-)
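The question also asked about stripping the tags instead of rendering them. A minimal sketch of that alternative, assuming (as in the answer above) that the Summary text node holds the literal characters &lt;p&gt;, written &amp;lt;p&amp;gt; in the XML source. The template name stripEscapedTags and the match="/" wrapper are made up for illustration:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical entry point: emit each Summary with its escaped tags removed -->
  <xsl:template match="/">
    <xsl:for-each select="DocumentElement/QueryResults">
      <div class="articlesummary">
        <xsl:call-template name="stripEscapedTags">
          <xsl:with-param name="html" select="Summary"/>
        </xsl:call-template>
      </div>
    </xsl:for-each>
  </xsl:template>

  <xsl:template name="stripEscapedTags">
    <xsl:param name="html"/>
    <xsl:choose>
      <!-- Only strip when an escaped opening delimiter is followed by an escaped closing one -->
      <xsl:when test="contains($html, '&amp;lt;') and contains(substring-after($html, '&amp;lt;'), '&amp;gt;')">
        <!-- Keep the text in front of the escaped tag -->
        <xsl:value-of select="substring-before($html, '&amp;lt;')"/>
        <!-- Continue with whatever follows the end of that tag -->
        <xsl:call-template name="stripEscapedTags">
          <xsl:with-param name="html" select="substring-after(substring-after($html, '&amp;lt;'), '&amp;gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$html"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Escaped entity references inside the text, such as the doubly escaped ampersand, are left untouched by this sketch and would still need the replacement step shown above.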

You shouldn't store escaped HTML in your database. If your database contained the actual "<" character, then disable-output-escaping would do what you want.
If you can't change the data, then you'll have to unescape it before you perform the transform.

Add this line to your stylesheet
<xsl:output method="html" indent="yes" version="4.0"/>

"It is a bad idea to store HTML in a database"
What? How are you supposed to store it, then? In an XML document, so that you have to use XSLT anyway? As web developers, we have always used SQL databases to store user-defined HTML. There is nothing wrong with that approach as long as the HTML is sanitized properly for your purposes.

Related

Recursion depth and different parsing

I'm fairly new to XSLT. What I'm trying to do is transform a book in XML into HTML. A basic example would be this:
<book>
<title>
Some important title
</title>
<section>
<title>animal</title>
<kw>RealAnimal</kw>
<kw>something|something more about it</kw>
<para>Some really important facts</para>
<section>
<title>something</title>
<kw>something else</kw>
<para>Enter Text</para>
</section>
<section>
<title>Even more</title>
<kw>and more</kw>
<para>hell of a lot more</para>
</section>
</section>
</book>
A section can have an unknown number of subsections, so obviously I need to handle this with recursion. So far I have designed two templates, one for the book and one for a section, based on my needs.
<xsl:template match="book">
<html>
<body>
<h1><xsl:value-of select="title" /></h1>
<xsl:apply-templates select="section" />
</body>
</html>
</xsl:template>
<xsl:template match="section[title]">
<li><xsl:value-of select="title" /></li>
<!-- do something more here -->
</xsl:template>
What I can't figure out is whether I can get my current recursion depth, because I want to decide which kind of heading to use based on the depth.
Also, the book is supposed to consist of two parts: its normal content at the beginning (a heading with the paragraphs below it) and an index at the end. This leads me to believe that I need to process it in two different ways within one document, but how would I do that? Any hints or code would be greatly appreciated.
I figured out how to number section and subsection headings like a list in Word:
<xsl:number level="multiple" />
This gives me x.y for a subsection, based on its parent section's position and its own position. What I now want is the number of groups it produces, since it groups the values by depth, but I can't figure out how.
What I'd expect is that it transforms to:
<h1>Some important title</h1>
...
<h2> animal </h2>
...
<h3> something </h3>
...
<h3> Even more </h3>
and if I were to add another section inside the "something" section, it would be h4, and so on.
I solved it like this:
<xsl:param name="depth"/>
<xsl:choose>
<xsl:when test="6 > $depth">
<xsl:element name="h{$depth}">
<xsl:number level="multiple" />.
<xsl:value-of select="title" />
</xsl:element>
</xsl:when>
<xsl:otherwise>
<h6><xsl:number level="multiple" />. <xsl:value-of select="title" /></h6>
</xsl:otherwise>
</xsl:choose>
Well, what I'm trying to do is set h2 for a section that is a child of
book, and h3 for a subsection of a section.
Here's one way you could do this - with unlimited nesting of subsections:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/book">
<html>
<body>
<h1><xsl:value-of select="title" /></h1>
<xsl:apply-templates select="section" />
</body>
</html>
</xsl:template>
<xsl:template match="section">
<h2><xsl:value-of select="title" /></h2>
<xsl:apply-templates select="subsection">
<xsl:with-param name="depth" select="3"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="subsection">
<xsl:param name="depth"/>
<xsl:element name="h{$depth}">
<xsl:value-of select="title" />
</xsl:element>
<xsl:apply-templates select="subsection">
<xsl:with-param name="depth" select="$depth + 1"/>
</xsl:apply-templates>
</xsl:template>
</xsl:stylesheet>
Note that this is recursive and unlimited; AFAIK, HTML will run out of levels after h6.
Edit:
A subsection isn't named subsection; it's just a section that is a child of
another section.
Well, then this could be simpler. Or at least shorter.
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/book">
<html>
<body>
<h1><xsl:value-of select="title" /></h1>
<xsl:apply-templates select="section">
<xsl:with-param name="depth" select="2"/>
</xsl:apply-templates>
</body>
</html>
</xsl:template>
<xsl:template match="section">
<xsl:param name="depth"/>
<xsl:element name="h{$depth}">
<xsl:value-of select="title" />
</xsl:element>
<xsl:apply-templates select="section">
<xsl:with-param name="depth" select="$depth + 1"/>
</xsl:apply-templates>
</xsl:template>
</xsl:stylesheet>
Edit 2:
I'm supposed to set h2-h5 for the first 4, and h6 after that.
If you mean you want to limit the heading to a maximum of h6 regardless of the section's depth, then change this:
<xsl:with-param name="depth" select="$depth + 1"/>
to:
<xsl:with-param name="depth" select="$depth + ($depth &lt; 6)"/>
You might try tinkering with "count(ancestor::*)" if you really want to know how deep you are. However, I'd suggest taking a look at automatic numbering first, just in case it does the trick. It even handles nested items pretty handily.
"XSLT's xsl:number instruction makes it easy to insert a number into your result document. Its value attribute lets you name the number to insert, but if you really want to add a specific number to your result, it's much simpler to add that number as literal text. When you omit the value attribute from an xsl:number instruction, the XSLT processor calculates the number based on the context node's position in the source tree or among the nodes being counted through by an xsl:for-each instruction, which makes it great for automatic numbering."
XML.com reference page
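As a rough sketch of the count(ancestor::...) idea, assuming the nested <section> structure from the question: the heading level is derived from the number of section ancestors instead of a passed parameter, and capped at h6. The variable names raw and level are illustrative only.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/book">
    <html>
      <body>
        <h1><xsl:value-of select="normalize-space(title)"/></h1>
        <xsl:apply-templates select="section"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="section">
    <!-- A top-level section has no section ancestors, so it starts at level 2 -->
    <xsl:variable name="raw" select="count(ancestor::section) + 2"/>
    <!-- Cap the level at 6, since HTML defines no heading beyond h6 -->
    <xsl:variable name="level">
      <xsl:choose>
        <xsl:when test="$raw &gt; 6">6</xsl:when>
        <xsl:otherwise><xsl:value-of select="$raw"/></xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:element name="h{$level}">
      <xsl:value-of select="normalize-space(title)"/>
    </xsl:element>
    <!-- Recurse into nested sections; their depth is recomputed from the tree -->
    <xsl:apply-templates select="section"/>
  </xsl:template>
</xsl:stylesheet>
```

Because the depth is read from the source tree, no parameter has to be threaded through apply-templates; the trade-off is that the ancestor axis is re-evaluated for every section.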

XSLT transformation for multiple font types

I am trying to write an XSLT transformation that turns the font style tags in my XML into HTML tags, but I am doing something wrong.
My XML file looks like this:
<root>
<p>
<span>
<i/>
italic
</span>
<span>
<i/>
<b/>
bold-italic
</span>
<span>
normal
</span>
</p>
</root>
What I want is HTML with the same tags but my XSLT transformation does not work:
HTML:
<p>
<i>italic</i>
<i><b>bold-italic</b></i>
normal
</p>
I was trying an xsl:if condition but it does not work, and I do not know what I am doing wrong:
XSLT:
<xsl:template match="p">
<p>
<xsl:for-each select="span">
<xsl:if test="i">
<i>
<xsl:value-of select="."/>
</i>
</xsl:if>
<xsl:if test="b">
<b>
<xsl:value-of select="."/>
</b>
</xsl:if>
</xsl:for-each>
</p>
</xsl:template>
Do you know how to repair my code?
Can you have more than just b and i elements? It may be possible to do this with a generic solution, that creates a nested element for each child element of a span element.
This solution uses a recursive template that matches span, with a parameter containing the index of the child element to output next. When this index exceeds the number of child elements, the text is output.
Try this XSLT too:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="span">
<xsl:param name="num" select="1"/>
<xsl:variable name="childElement" select="*[$num]"/>
<xsl:choose>
<xsl:when test="$childElement">
<xsl:element name="{local-name($childElement)}">
<xsl:apply-templates select=".">
<xsl:with-param name="num" select="$num + 1"/>
</xsl:apply-templates>
</xsl:element>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
This does assume that each span element contains only the elements you want to nest, in addition to the text.
You can test the contents of span element using an XPath expression with a predicate which tests for its contents, and match different templates for each situation. Since you need b and i for bold-italic, you should use that expression in one of your predicates.
The stylesheet below does the transformation using only templates (without the need of a for-each). I'm assuming the contents of your <span> elements is text (not mixed content):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:strip-space elements="*"/>
<xsl:template match="p">
<p><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="span[i]">
<i><xsl:value-of select="."/></i>
</xsl:template>
<xsl:template match="span[b]">
<b><xsl:value-of select="."/></b>
</xsl:template>
<xsl:template match="span[i and b]">
<i><b><xsl:value-of select="."/></b></i>
</xsl:template>
</xsl:stylesheet>
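When a span contains both i and b, all three patterns above match it with the same default priority. XSLT 1.0 treats that as a recoverable error, and processors that recover pick the template that comes last in the stylesheet. A sketch that makes the choice explicit with the priority attribute (the value 2 is arbitrary; anything above the default of 0.5 works), and that additionally trims the surrounding whitespace with normalize-space():

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>

  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Explicit priority: wins over span[i] and span[b] regardless of template order -->
  <xsl:template match="span[i and b]" priority="2">
    <i><b><xsl:value-of select="normalize-space(.)"/></b></i>
  </xsl:template>

  <xsl:template match="span[i]">
    <i><xsl:value-of select="normalize-space(.)"/></i>
  </xsl:template>

  <xsl:template match="span[b]">
    <b><xsl:value-of select="normalize-space(.)"/></b>
  </xsl:template>

  <!-- Spans without i or b fall through to the built-in rules, which copy their text -->
</xsl:stylesheet>
```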

Facing issues in parsing xml with xslt

I am trying to parse an XML file to generate an HTML report. The problematic section of the XML is shown below:
<failure message="Management changes link count is not 3$HiHello" type="junit.framework.AssertionFailedError">junit.framework.AssertionFailedError: Management changes link count is not 3$HiHelloJI
at CustomProjects.CommonTemplates.verifyManagementChanges(Unknown Source)
at CustomProjects.EmersonTest.testEmerson_VerifyManagementChanges(Unknown Source)
</failure>
The xslt written for parsing this is :
<xsl:choose>
<xsl:when test="failure">
<td>Failure</td>
<td><xsl:apply-templates select="failure"/></td>
<td>screenshot</td>
<td><xsl:apply-templates select="failurelink"/></td>
</xsl:when>
</xsl:choose>
<xsl:template match="failure">
<xsl:call-template name="display-failures"/>
</xsl:template>
<xsl:template match="failurelink">
<xsl:call-template name="display-failures-link"/>
</xsl:template>
<xsl:template name="display-failures">
<xsl:param name="FailText" select="@message"/>
<xsl:choose>
<xsl:when test="not(@message)">N/A</xsl:when>
<xsl:otherwise>
<xsl:value-of select="substring-before($FailText,'$')"/>
</xsl:otherwise>
</xsl:choose>
<!-- display the stacktrace -->
<code>
<br/><br/>
<xsl:call-template name="br-replace">
<xsl:with-param name="word" select="."/>
</xsl:call-template>
</code>
<!-- the latter is better but might be problematic for non-21" monitors... -->
<!--pre><xsl:value-of select="."/></pre-->
</xsl:template>
<xsl:template name="display-failures-link">
<xsl:param name="linktext" select="@message"/>
<xsl:choose>
<xsl:when test="not(@message)">N/A</xsl:when>
<xsl:otherwise>
<xsl:value-of select="substring-after($linktext,'$')"/>
</xsl:otherwise>
</xsl:choose>
<!-- display the stacktrace -->
<code>
<br/><br/>
<xsl:call-template name="br-replace">
<xsl:with-param name="word" select="."/>
</xsl:call-template>
</code>
<!-- the latter is better but might be problematic for non-21" monitors... -->
<!--pre><xsl:value-of select="."/></pre-->
</xsl:template>
Here I am getting the desired result (the string before the $ sign) from the display-failures template, but calling display-failures-link gives me nothing (it should return the string after the $ sign). I don't know whether the problem is with the substring function or with something else. Kindly let me know what I am doing wrong here.
Any help is highly appreciated.
The problem here is that you are trying to apply-templates on the XPath failurelink, but you don't have an element called <failurelink>, so this apply-templates isn't finding anything.
<xsl:apply-templates select="failurelink"/>
One way to apply two different templates on the same kind of element is to use modes:
<xsl:template match="failure">
<xsl:call-template name="display-failures"/>
</xsl:template>
<xsl:template match="failure" mode="link">
<xsl:call-template name="display-failures-link"/>
</xsl:template>
Then the area where you apply the templates would change to this:
<td>Failure</td>
<td><xsl:apply-templates select="failure"/></td>
<td>screenshot</td>
<td><xsl:apply-templates select="failure" mode="link"/></td>
But in your case, there's an even better approach. Just eliminate the second template, and do this:
Replace the whole <xsl:choose> with:
<xsl:apply-templates select="failure" />
Replace the first template you listed with:
<xsl:template match="failure">
<td>Failure</td>
<td><xsl:call-template name="display-failures"/></td>
<td>screenshot</td>
<td><xsl:call-template name="display-failures-link"/></td>
</xsl:template>
And delete the second template you listed.
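Put together, the simplified approach might look like the following sketch. It inlines the two substring calls instead of calling the named templates, and it assumes the <failure> elements sit inside <testcase> elements, which the question does not show.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical wrapper: one table row per test case (element name assumed) -->
  <xsl:template match="testcase">
    <tr><xsl:apply-templates select="failure"/></tr>
  </xsl:template>

  <!-- A single template emits both cells from the same message attribute -->
  <xsl:template match="failure">
    <td>Failure</td>
    <!-- Text in front of the first $ sign -->
    <td><xsl:value-of select="substring-before(@message, '$')"/></td>
    <td>screenshot</td>
    <!-- Text behind the first $ sign -->
    <td><xsl:value-of select="substring-after(@message, '$')"/></td>
  </xsl:template>

  <!-- Suppress stray text that the built-in rules would otherwise copy -->
  <xsl:template match="text()"/>
</xsl:stylesheet>
```

Note that substring-before() returns an empty string when the message contains no $ sign, so messages without a separator would need an extra xsl:choose branch.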

How to split text and preserve HTML tags (XSLT 2.0)

I have an xml that has a description node:
<config>
<desc>A <b>first</b> sentence here. The second sentence with some link The link. The <u>third</u> one.</desc>
</config>
I am trying to split the sentences using the dot as separator while keeping any HTML tags in the HTML output.
What I have so far is a template that splits the description, but the HTML tags are lost in the output because of the normalize-space and substring-before functions.
My current template is given below:
<xsl:template name="output-tokens">
<xsl:param name="sourceText" />
<!-- Force a . at the end -->
<xsl:variable name="newlist" select="concat(normalize-space($sourceText), ' ')" />
<!-- Check if we have really a point at the end -->
<xsl:choose>
<xsl:when test ="contains($newlist, '.')">
<!-- Find the first . in the string -->
<xsl:variable name="first" select="substring-before($newlist, '.')" />
<!-- Get the remaining text -->
<xsl:variable name="remaining" select="substring-after($newlist, '.')" />
<!-- Check if our string is not in fact a . or an empty string -->
<xsl:if test="normalize-space($first)!='.' and normalize-space($first)!=''">
<p><xsl:value-of select="normalize-space($first)" />.</p>
</xsl:if>
<!-- Recursively apply the template for the remaining text -->
<xsl:if test="$remaining">
<xsl:call-template name="output-tokens">
<xsl:with-param name="sourceText" select="$remaining" />
</xsl:call-template>
</xsl:if>
</xsl:when>
<!--If no . was found -->
<xsl:otherwise>
<p>
<!-- If the string does not contains a . then display the text but avoid
displaying empty strings
-->
<xsl:if test="normalize-space($sourceText)!=''">
<xsl:value-of select="normalize-space($sourceText)" />.
</xsl:if>
</p>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
and I am using it in the following manner:
<xsl:template match="config">
<xsl:call-template name="output-tokens">
<xsl:with-param name="sourceText" select="desc" />
</xsl:call-template>
</xsl:template>
The expected output is:
<p>A <b>first</b> sentence here.</p>
<p>The second sentence with some link The link.</p>
<p>The <u>third</u> one.</p>
A good question, and not an easy one to solve. Especially, of course, if you're using XSLT 1.0 (you really need to tell us if that's the case).
I've seen two approaches to the problem. Both involve breaking it into smaller problems.
The first approach is to convert the markup into text (for example replace <b>first</b> by [b]first[/b]), then use text manipulation operations (xsl:analyze-string) to split it into sentences, and then reconstitute the markup within the sentences.
The second approach (which I personally prefer) is to convert the text delimiters into markup (convert "." to <stop/>) and then use positional grouping techniques (typically <xsl:for-each-group group-ending-with="stop"/>) to convert the sentences into paragraphs.
Here is one way to implement the second approach suggested by Michael Kay using XSLT 2.
This stylesheet demonstrates a two-pass transformation where the first pass introduces <stop/> markers after each sentence and the second pass encloses all groups ending with a <stop/> in a paragraph.
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<!-- two-pass processing -->
<xsl:template match="/">
<xsl:variable name="intermediate">
<xsl:apply-templates mode="phase-1"/>
</xsl:variable>
<xsl:apply-templates select="$intermediate" mode="phase-2"/>
</xsl:template>
<!-- identity transform -->
<xsl:template match="@*|node()" mode="#all" priority="-1">
<xsl:copy>
<xsl:apply-templates select="@*|node()" mode="#current"/>
</xsl:copy>
</xsl:template>
<!-- phase 1 -->
<!-- insert <stop/> "milestone markup" after each sentence -->
<xsl:template match="text()" mode="phase-1">
<xsl:analyze-string select="." regex="\.\s+">
<xsl:matching-substring>
<xsl:value-of select="regex-group(0)"/>
<stop/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
<!-- phase 2 -->
<!-- turn each <stop/>-terminated group into a paragraph -->
<xsl:template match="*[stop]" mode="phase-2">
<xsl:copy>
<xsl:for-each-group select="node()" group-ending-with="stop">
<p>
<xsl:apply-templates select="current-group()" mode="#current"/>
</p>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
<!-- remove the <stop/> markers -->
<xsl:template match="stop" mode="phase-2"/>
</xsl:stylesheet>
This is my humble solution, based on the second suggestion in @Michael Kay's answer.
Unlike @Jukka's answer (which is very elegant indeed), I'm not using xsl:analyze-string, as the XPath 1.0 functions contains and substring-after are enough to accomplish the split. I've also started the match pattern from config.
Here's the transform:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<!-- two pass processing -->
<xsl:template match="config">
<xsl:variable name="pass1">
<xsl:apply-templates select="node()"/>
</xsl:variable>
<xsl:apply-templates mode="pass2" select="$pass1/*"/>
</xsl:template>
<!-- 1. Copy everything as is (identity) -->
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<!-- 1. Replace "text. text" with "text<dot/> text" -->
<xsl:template match="text()[contains(.,'. ')]">
<xsl:value-of select="substring-before(.,'. ')"/>
<dot/>
<xsl:value-of select="substring-after(.,'. ')"/>
</xsl:template>
<!-- 2. Group by examining in population order ending with dot -->
<xsl:template match="desc" mode="pass2">
<xsl:for-each-group select="node()"
group-ending-with="dot">
<p><xsl:apply-templates select="current-group()" mode="pass2"/></p>
</xsl:for-each-group>
</xsl:template>
<!-- 2. Identity -->
<xsl:template match="node()|@*" mode="pass2">
<xsl:copy>
<xsl:apply-templates select="node()|@*" mode="pass2"/>
</xsl:copy>
</xsl:template>
<!-- 2. Replace dot with mark -->
<xsl:template match="dot" mode="pass2">
<xsl:text>.</xsl:text>
</xsl:template>
</xsl:stylesheet>
Applied on the input shown in your question, produces:
<p>A <b>first</b> sentence here.</p>
<p>The second sentence with some link The link.</p>
<p>The <u>third</u> one.</p>
this might do the trick:
http://symphony-cms.com/download/xslt-utilities/view/20816/
/J

Preserve certain html tags during XSLT

I have looked up solutions on Stack Overflow, but none of them seem to work for me. Here is my question. Let's say I have the following text:
Source:
<greatgrandparent>
<grandparent>
<parent>
<sibling>
Hey, im the sibling .
</sibling>
<description>
$300$ <br/> $250 <br/> $200! <br/> <p> Yes, that is right! <br/> You can own a ps3 for only $200 </p>
</description>
</parent>
<parent>
... (SAME FORMAT)
</parent>
... (Several more parents)
</grandparent>
</greatgrandparent>
Output:
<newprice>
$300$ <br/> $250 <br/> $200! <br/> Yes, that is right! <br/> You can own a ps3 for only $200
</newprice>
I can't seem to find a way to do that.
Current XSL:
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="greatgrandparents">
<xsl:apply-templates />
</xsl:template>
<xsl:template match = "grandparent">
<xsl:for-each select = "parent" >
<newprice>
<xsl:apply-templates/>
</newprice>
</xsl:for-each>
</xsl:template>
<xsl:template match="description">
<xsl:element name="newprice">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
Use templates to define behavior on specific elements
<!-- after standard identity template -->
<xsl:template match="description">
<xsl:element name="newprice">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
The first template says to swap description with newprice. The second one says to ignore the p element.
If you're unfamiliar with the identity template, take a look here for a few examples.
EDIT: Given the new example, we can see that you want to only extract the description element and its contents. Notice that the template action starts with the match="/" template. We can use this control where our stylesheet starts and thus skip much of the riffraff we want to filter out.
change the <xsl:template match="/"> to something more like:
<xsl:template match="/">
<xsl:apply-templates select="//description"/>
<!-- use a more specific XPath if you can -->
</xsl:template>
So altogether our solution looks like this:
<xsl:stylesheet
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
exclude-result-prefixes="xs">
<xsl:template match="/">
<xsl:apply-templates select="//description" />
</xsl:template>
<!-- this is the identity template -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="description">
<xsl:element name="newprice">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
Shouldn't the contents of <description> be inside a CDATA section? And then probably disable output escaping on xsl:value-of.
You should look into xsl:copy-of.
You would probably wind up with something like:
<xsl:template match="description">
<xsl:copy-of select="."/>
</xsl:template>
Probably the shortest solution is this one:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="description">
<newprice>
<xsl:copy-of select="node()"/>
</newprice>
</xsl:template>
<xsl:template match="text()[not(ancestor::description)]"/>
</xsl:stylesheet>
When this transformation is applied on the provided XML document, the wanted result is produced:
<newprice>
$300$ <br /> $250 <br /> $200! <br /> <p> Yes, that is right! <br /> You can own a ps3 for only $200 </p>
</newprice>
Do note:
The use of <xsl:copy-of select="node()"/> to copy the whole subtree rooted in description, without the root itself.
How we override the XSLT built-in template (with a specific, empty template), preventing any text nodes that are not descendants of a <description> element from being output.