XSLT Apply potentially multiple templates to a single node depending on conditions - html

Here is my need:
I have an HTML text composed of multiple nested SPAN elements.
Some of the spans include inline CSS. I need to transform SOME OF the style declarations in that inline CSS into "classical" HTML elements, like <b>.
Some spans have one or many of the attributes to be replaced by HTML elements.
Here is an example input :
<HTML>
<SPAN>
Some sentance including another
<SPAN STYLE="font-weight: bold;" >
included bold block
</SPAN>
with a tail in it and line breaks<BR/>
</SPAN>
<SPAN STYLE= "font-family: 'Helvetica';font-weight: bold;" >
Another span with 1 attribute not to be taken into account and 1 to be
</SPAN>
<SPAN STYLE= "font-family: 'Helvetica';font-weight: bold;text-decoration:underline;" >
Another span with two attributes to be taken into account
</SPAN>
</HTML>
What I would like as a result is :
<HTML>
<p>Some sentance including another
<p><b>included bold block </b></p>
with a tail in it and line breaks<BR/></p>
<p><b>Another span with 1 attribute not to be taken into account and 1 to be</b></p>
<p><b><u>Another span with two attributes to be taken into account</u></b></p>
</HTML>
I thought the best way to do it would be to use the Identity Transformation and to have templates to match the attributes with conditions like :
SPAN[contains(@STYLE, 'font-weight: bold;')]
and :
SPAN[contains(@STYLE, 'text-decoration:underline;')]
Here is what I tried...
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="SPAN[contains(@STYLE, 'text-decoration:underline;')]">
<xsl:element name="u">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="SPAN[contains(@STYLE, 'font-weight: bold;')]">
<xsl:element name="b">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="SPAN">
<!-- replacing SPAN into <p>elements -->
<xsl:element name="p">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="SPAN/@STYLE"/>
<!-- suppression of the old Style attribute-->
</xsl:stylesheet>
The problem is that, when it runs, only one of the templates is applied to a given SPAN: either the <b> one or the <u> one.
I think I misunderstand how to use xsl:copy in the templates so that one template is evaluated for a SPAN and the other templates are then re-evaluated for the same SPAN, but I did not manage to make it work with or without it...
I thank you in advance for your thoughts about it.
Best regards.

Here is how you can do it with modes in XSLT 1.0:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="SPAN[contains(@STYLE, 'text-decoration:underline;')]" mode="u">
<u>
<xsl:apply-templates/>
</u>
</xsl:template>
<xsl:template match="SPAN" mode="u">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="SPAN[contains(@STYLE, 'font-weight: bold;')]" mode="b">
<b>
<xsl:apply-templates select="." mode="u" />
</b>
</xsl:template>
<xsl:template match="SPAN" mode="b">
<xsl:apply-templates select="." mode="u" />
</xsl:template>
<xsl:template match="SPAN">
<p>
<xsl:apply-templates select="." mode="b" />
</p>
</xsl:template>
</xsl:stylesheet>
Note how a template matching SPAN[contains(...)] has a higher priority than one just matching SPAN.
So, effectively, you have a chain of templates. For a SPAN element, a p element is output, then the b template is called to check for bold. The b template then calls the u template to check for underline, and that template processes the children. You can add more templates in a similar fashion (so b calls u, then u calls i, and i processes the children).
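As a rough sketch of that extension, the two mode="u" templates could hand over to a new i mode instead of processing the children themselves. The marker 'font-style: italic;' is an assumption for illustration only; the question does not mention italics, so adjust the string to whatever the real STYLE attributes contain.

```xml
<!-- Sketch only: 'font-style: italic;' is an assumed marker, not taken from the question -->
<xsl:template match="SPAN[contains(@STYLE, 'text-decoration:underline;')]" mode="u">
  <u>
    <!-- hand the same SPAN over to the next link of the chain -->
    <xsl:apply-templates select="." mode="i"/>
  </u>
</xsl:template>
<xsl:template match="SPAN" mode="u">
  <xsl:apply-templates select="." mode="i"/>
</xsl:template>
<!-- last link of the chain: the i templates process the children -->
<xsl:template match="SPAN[contains(@STYLE, 'font-style: italic;')]" mode="i">
  <i>
    <xsl:apply-templates/>
  </i>
</xsl:template>
<xsl:template match="SPAN" mode="i">
  <xsl:apply-templates/>
</xsl:template>
```

These four templates would replace the two mode="u" templates of the stylesheet above; the b templates stay unchanged.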
EDIT: If you wanted to avoid writing two templates for each, you could combine each pair of templates into one, like so
<xsl:template match="SPAN" mode="b">
<xsl:choose>
<xsl:when test="contains(@STYLE, 'font-weight: bold;')">
<b>
<xsl:apply-templates select="." mode="u" />
</b>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="." mode="u" />
</xsl:otherwise>
</xsl:choose>
</xsl:template>
And if you could use XSLT 2.0, you could use priorities, and next-match instead, which reduces the number of templates
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" omit-xml-declaration="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="SPAN">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="SPAN[contains(@STYLE, 'text-decoration:underline;')]" priority="8">
<u>
<xsl:next-match />
</u>
</xsl:template>
<xsl:template match="SPAN[contains(@STYLE, 'font-weight: bold;')]" priority="9">
<b>
<xsl:next-match />
</b>
</xsl:template>
<xsl:template match="SPAN" priority="10">
<p>
<xsl:next-match />
</p>
</xsl:template>
</xsl:stylesheet>

Related

Remove more than n consecutive br tags from running text

I have some rich text input and want to allow only a maximum of two consecutive <br> tags.
I found an answer that partially solves my problem: XSLT: remove duplicate br-tags from running text
However I need to allow up to two consecutive <br> tags. A more general question would be: how to allow a maximum of n consecutive <br> tags?
Example input:
<p>
Lorem ipsum...<br>
<br>
<br>
..dolor sit
</p>
Needed output:
<p>
Lorem ipsum...<br>
<br>
..dolor sit
</p>
Example input 2:
<p>
Lorem ipsum...<br>
lorem ... <br>
<br>
<br>
..dolor sit
</p>
Needed output:
<p>
Lorem ipsum...<br>
lorem ... <br>
<br>
..dolor sit
</p>
First, make sure you have XHTML, so <br/> instead of <br>! And then:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="UTF-8" method="xml" version="1.0" indent="yes"/>
<!-- Catch-all templates -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="processing-instruction()">
<xsl:copy/>
</xsl:template>
<!-- specific part -->
<xsl:template match="br">
<xsl:if test="not(preceding-sibling::node()[local-name() or normalize-space()][1][local-name()='br']) or not(preceding-sibling::node()[local-name() or normalize-space()][2][local-name()='br'])">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
To allow more consecutive <br> elements, add one more condition of the same form per extra element. For a third one, add: or not(preceding-sibling::node()[local-name() or normalize-space()][3][local-name()='br']). For a fourth one, add: or not(preceding-sibling::node()[local-name() or normalize-space()][4][local-name()='br']), and so on.
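For illustration, a br template extended in this way to allow at most three consecutive br elements might look like the sketch below. It only adds the third condition to the test used in the stylesheet above and is meant as a drop-in replacement for that br template.

```xml
<!-- Sketch: keep a br unless the three nearest significant preceding siblings are all br -->
<xsl:template match="br">
  <xsl:if test="not(preceding-sibling::node()[local-name() or normalize-space()][1][local-name()='br'])
             or not(preceding-sibling::node()[local-name() or normalize-space()][2][local-name()='br'])
             or not(preceding-sibling::node()[local-name() or normalize-space()][3][local-name()='br'])">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:if>
</xsl:template>
```

The predicate [local-name() or normalize-space()] skips whitespace-only text nodes, so line breaks between the br elements do not interrupt a run.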
Try this one:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="br">
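<!-- drop a br only when BOTH nearest non-whitespace preceding siblings are br -->
<xsl:if test="not(preceding-sibling::node()
[not(self::text() and normalize-space(.) = '')][1]
[self::br]
and preceding-sibling::node()
[not(self::text() and normalize-space(.) = '')][2]
[self::br])">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />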
</xsl:copy>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Use the following code in your stylesheet:
<!-- set desired threshold here -->
<xsl:variable name="brThreshold" select="2"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="br[count(preceding-sibling::node()[not(self::text() and normalize-space(.) = '')][position() &lt;= $brThreshold][self::br]) = $brThreshold]">
<!-- delete this br -->
</xsl:template>
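One caveat: XSLT 1.0 does not allow variable references in match patterns, so a strict 1.0 processor may reject the br template above. A possible workaround, sketched here under the assumption that the same global brThreshold variable is kept, is to match every br and move the count into an xsl:if test:

```xml
<!-- set desired threshold here -->
<xsl:variable name="brThreshold" select="2"/>
<xsl:template match="br">
  <!-- copy the br only if fewer than $brThreshold of the nearest
       $brThreshold non-whitespace preceding siblings are br elements -->
  <xsl:if test="count(preceding-sibling::node()[not(self::text() and normalize-space(.) = '')][position() &lt;= $brThreshold][self::br]) &lt; $brThreshold">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:if>
</xsl:template>
```

This replaces the empty br template and is used together with the identity template shown above.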

XSLT transformation for multiple font types

I am trying to write an XSLT transformation for an XML file. I want to transform font style tags into HTML tags, but I am doing something wrong.
My XML file is like this one :
<root>
<p>
<span>
<i/>
italic
</span>
<span>
<i/>
<b/>
bold-italic
</span>
<span>
normal
</span>
</p>
</root>
What I want is HTML with the same tags but my XSLT transformation does not work:
HTML:
<p>
<i>italic</i>
<i><b>bold-italic</b></i>
normal
</p>
I tried using xsl:if conditions, but it does not work and I do not know what I am doing wrong:
XSLT:
<xsl:template match="p">
<p>
<xsl:for-each select="span">
<xsl:if test="i">
<i>
<xsl:value-of select="."/>
</i>
</xsl:if>
<xsl:if test="b">
<b>
<xsl:value-of select="."/>
</b>
</xsl:if>
</xsl:for-each>
</p>
</xsl:template>
Do you know how to repair my code ?
Can you have more than just b and i elements? It may be possible to do this with a generic solution, that creates a nested element for each child element of a span element.
This solution uses a recursive template that matches span and takes a parameter containing the index of the child element to output next. When this index exceeds the number of child elements, the text is output.
Try this XSLT too:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="span">
<xsl:param name="num" select="1"/>
<xsl:variable name="childElement" select="*[$num]"/>
<xsl:choose>
<xsl:when test="$childElement">
<xsl:element name="{local-name($childElement)}">
<xsl:apply-templates select=".">
<xsl:with-param name="num" select="$num + 1"/>
</xsl:apply-templates>
</xsl:element>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
This does assume that each span element contains only the elements you want to nest, in addition to the text.
You can test the contents of span element using an XPath expression with a predicate which tests for its contents, and match different templates for each situation. Since you need b and i for bold-italic, you should use that expression in one of your predicates.
The stylesheet below does the transformation using only templates (without the need of a for-each). I'm assuming the contents of your <span> elements is text (not mixed content):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:strip-space elements="*"/>
<xsl:template match="p">
<p><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="span[i]">
<i><xsl:value-of select="."/></i>
</xsl:template>
<xsl:template match="span[b]">
<b><xsl:value-of select="."/></b>
</xsl:template>
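<!-- explicit priority: a span with both i and b also matches the two templates above -->
<xsl:template match="span[i and b]" priority="1">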
<i><b><xsl:value-of select="."/></b></i>
</xsl:template>
</xsl:stylesheet>

What is a better XSL styling method for creating an HTML tag?

I have a fairly simple xsl stylesheet for transforming an xml doc that defines our html into an html format (please don't ask why, it's just the way we have to do it...)
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="HtmlElement">
<xsl:element name="{ElementType}">
<xsl:apply-templates select="Attributes"/>
<xsl:value-of select="Text"/>
<xsl:apply-templates select="HtmlElement"/>
</xsl:element>
</xsl:template>
<xsl:template match="Attributes">
<xsl:apply-templates select="Attribute"/>
</xsl:template>
<xsl:template match="Attribute">
<xsl:attribute name="{Name}">
<xsl:value-of select="Value"/>
</xsl:attribute>
</xsl:template>
The issue came up when I ran across this little bit of HTML requiring transformation:
<p>
Send instant ecards this season <br/> and all year with our ecards!
</p>
The <br/> in the middle breaks the logic of the transformation and gives me only the first half of the paragraph block: Send instant ecards this season <br></br>. The XML being transformed looks like:
<HtmlElement>
<ElementType>p</ElementType>
<Text>Send instant ecards this season </Text>
<HtmlElement>
<ElementType>br</ElementType>
</HtmlElement>
<Text> and all year with our ecards!</Text>
</HtmlElement>
Suggestions?
You can simply add a new rule for Text elements and then apply templates to both HtmlElement and Text children:
<xsl:template match="HtmlElement">
<xsl:element name="{ElementType}">
<xsl:apply-templates select="Attributes"/>
<xsl:apply-templates select="HtmlElement|Text"/>
</xsl:element>
</xsl:template>
<xsl:template match="Text">
<xsl:value-of select="." />
</xsl:template>
You could make the stylesheet a bit more generic, so that it handles additional elements, by adjusting the template for HtmlElement. It should apply templates first to the Attributes element and then to all child elements except Attributes and ElementType, using a predicate filter in the select attribute of xsl:apply-templates.
The built-in templates will match the Text element and will copy the text() to the output.
Also, the template for the root node that you currently have declared (i.e. match="/") can be removed. It simply re-defines what is already handled by the built-in template rules and does not do anything to change behavior, just clutters your stylesheet.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes"/>
<!-- <xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>-->
<xsl:template match="HtmlElement">
<xsl:element name="{ElementType}">
<xsl:apply-templates select="Attributes"/>
<!--apply templates to all elements except for ElementType and Attributes-->
<xsl:apply-templates select="*[not(self::Attributes|self::ElementType)]"/>
</xsl:element>
</xsl:template>
<xsl:template match="Attributes">
<xsl:apply-templates select="Attribute"/>
</xsl:template>
<xsl:template match="Attribute">
<xsl:attribute name="{Name}">
<xsl:value-of select="Value"/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>

how to interpret HTML in XSL

I have the following xml
<results>
<first-name>Carl</first-name>
<data><b> This is carl's data </b></data>
</results>
How do I include the bold tags that are present in the <data> element in the output, rendered as HTML?
When I say <xsl:value-of select="results/data"/> The output is
<b> This is carl's data </b>
I want to achieve "This is carl's data" as the output in bold.
Well <xsl:copy-of select="results/data/node()"/> is a start but if the requirement is part of a larger problem then you are better off writing a template for data elements which uses apply-templates to push the child nodes through some template(s) for copying HTML elements through to the output.
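A minimal sketch of that template-based approach, assuming the input from the question: the data element itself is dropped and its children are copied through by an identity template restricted to its own mode. The mode name copy-html is a hypothetical name chosen for this example.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <!-- data itself is not copied; its children are pushed through mode "copy-html" -->
  <xsl:template match="data">
    <xsl:apply-templates mode="copy-html"/>
  </xsl:template>
  <!-- identity copy, restricted to the hypothetical mode "copy-html" -->
  <xsl:template match="@*|node()" mode="copy-html">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="copy-html"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Everything outside data is still handled by the built-in templates, so the text of first-name is output as well unless another template overrides that. Specific HTML elements can later be filtered or renamed by adding more templates in the same mode.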
I am sure someone will let me know if I am being naive:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
<xsl:output method="html" indent="yes"/>
<xsl:template match="/results">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="first-name">
<xsl:value-of select="." />
<xsl:text>: </xsl:text>
</xsl:template>
<xsl:template match="data">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="b">
<b>
<xsl:value-of select="." />
</b>
</xsl:template>
</xsl:stylesheet>

Preserve certain html tags during XSLT

I have looked up solutions on Stack Overflow, but none of them seem to work for me. Here is my question. Let's say I have the following text:
Source:
<greatgrandparent>
<grandparent>
<parent>
<sibling>
Hey, im the sibling .
</sibling>
<description>
$300$ <br/> $250 <br/> $200! <br/> <p> Yes, that is right! <br/> You can own a ps3 for only $200 </p>
</description>
</parent>
<parent>
... (SAME FORMAT)
</parent>
... (Several more parents)
</grandparent>
</greatgrandparent>
Output:
<newprice>
$300$ <br/> $250 <br/> $200! <br/> Yes, that is right! <br/> You can own a ps3 for only $200
</newprice>
I can't seem to find a way to do that.
Current XSL:
<xsl:template match="/">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="greatgrandparent">
<xsl:apply-templates />
</xsl:template>
<xsl:template match = "grandparent">
<xsl:for-each select = "parent" >
<newprice>
<xsl:apply-templates/>
</newprice>
</xsl:for-each>
</xsl:template>
<xsl:template match="description">
<xsl:element name="newprice">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
Use templates to define behavior on specific elements
<!-- after standard identity template -->
<xsl:template match="description">
<xsl:element name="newprice">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
The first template says to swap description with newprice. The second one says to ignore the p element.
If you're unfamiliar with the identity template, take a look here for a few examples.
EDIT: Given the new example, we can see that you only want to extract the description element and its contents. Notice that processing starts with the match="/" template. We can use this to control where our stylesheet starts and thus skip much of the riffraff we want to filter out.
Change the <xsl:template match="/"> to something more like:
<xsl:template match="/">
<xsl:apply-templates select="//description"/>
<!-- use a more specific XPath if you can -->
</xsl:template>
So altogether our solution looks like this:
<xsl:stylesheet
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
exclude-result-prefixes="xs">
<xsl:template match="/">
<xsl:apply-templates select="//description" />
</xsl:template>
<!-- this is the identity template -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="description">
<xsl:element name="newprice">
<xsl:apply-templates/>
</xsl:element>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/>
</xsl:template>
</xsl:stylesheet>
Shouldn't the contents of <description> be inside a CDATA section? And then you would probably disable output escaping on xsl:value-of.
You should look into xsl:copy-of.
You would probably wind up with something like:
<xsl:template match="description">
<xsl:copy-of select="."/>
</xsl:template>
Probably the shortest solution is this one:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="description">
<newprice>
<xsl:copy-of select="node()"/>
</newprice>
</xsl:template>
<xsl:template match="text()[not(ancestor::description)]"/>
</xsl:stylesheet>
When this transformation is applied on the provided XML document, the wanted result is produced:
<newprice>
$300$ <br /> $250 <br /> $200! <br /> <p> Yes, that is right! <br /> You can own a ps3 for only $200 </p>
</newprice>
Do note:
The use of <xsl:copy-of select="node()"/> to copy the whole subtree rooted in description, without the root itself.
How we override the XSLT built-in template with a specific, empty template, preventing any text nodes that are not descendants of a <description> element from being output.