for-each-group text paragraph - xslt 2.0 - html

I am looking for a way to group text by the h1 title. I tried for-each-group with group-starting-with="h1". The problem is that the h1 is not on the same level as the other elements: it is wrapped in a div (div/h1).
Input html:
<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>test</title>
</head>
<body>
<div>
<h1><b>TRAIN</b></h1>
</div>
<p>text</p>
<p>In this field there is text</p>
<div>
<h1><b>nr1</b><b>CAR</b></h1>
</div>
<h2><b>1.</b><b>nr2</b><b>area</b></h2>
<p>infos about cars</p>
<p><b>more and</b>more infos about cars</p>
</body>
</html>
What I have so far is:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
xpath-default-namespace="http://www.w3.org/1999/xhtml">
<xsl:output omit-xml-declaration="yes" method="xhtml" version="1.0" encoding="UTF-8"
indent="yes"/>
<xsl:template match="head"/>
<xsl:template match="body">
<xsl:for-each-group select = "*" group-starting-with = "h1">
<output>
<xsl:apply-templates select="current-group()"/>
</output>
</xsl:for-each-group>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{name()}">
<xsl:apply-templates select="node()"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
But the output is not what I want. I would like two output blocks, as in this example:
<html>
<output>
<div><h1><b>TRAIN</b></h1></div>
<p>text</p>
<p>In this field there is text</p>
</output>
<output>
<div><h1><b>nr1</b><b>CAR</b></h1></div>
<h2>
<b>1.</b>
<b>nr2</b>
<b>area</b>
</h2>
<p>infos about cars</p>
<p><b>more and</b>more infos about cars</p>
</output>
Thanks for any help!

You could use the descendant-or-self axis to start a group on any element that has an h1 as a descendant (or is an h1 element itself):
<xsl:for-each-group select="*" group-starting-with="*[descendant-or-self::h1]">
Also note that in your XSLT you have used xpath-default-namespace, but your input XML does not use that namespace, so as it stands your body template in your XSLT won't match the input. Either you need to add the default namespace to your input, or remove the xpath-default-namespace from your XSLT.
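Putting the two points together, a minimal sketch might look like the following. It assumes the input elements are in no namespace (so xpath-default-namespace is simply left out), and it copies the group members with xsl:copy-of instead of the element-rebuilding template from the question; both are illustrative choices, not the only way to do it.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Drop whitespace-only text nodes so they do not leak into the result -->
  <xsl:strip-space elements="*"/>

  <!-- Suppress the head entirely, as in the question -->
  <xsl:template match="head"/>

  <!-- Assumption: body is in no namespace -->
  <xsl:template match="body">
    <xsl:copy>
      <!-- A new group starts at every child that is an h1 or contains one -->
      <xsl:for-each-group select="*" group-starting-with="*[descendant-or-self::h1]">
        <output>
          <xsl:copy-of select="current-group()"/>
        </output>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input, the children of body fall into two groups, each beginning at a div that wraps an h1. If the elements turn out to be in the XHTML namespace after all, keeping xpath-default-namespace on the stylesheet would be the corresponding adjustment.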

How about:
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:template match="/html">
<xsl:copy>
<xsl:for-each-group select="body/*" group-starting-with="div[h1]">
<output>
<xsl:copy-of select="current-group()"/>
</output>
</xsl:for-each-group>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>

Related

structure, under certain conditions

I'm trying my hand at HTML, but I don't know how to do it. This is my attempt, but it doesn't work:
<xsl:template match="*[contains(local-name(), '.')]">
<xsl:element name="{translate(local-name(), '.', '_')}" namespace="{namespace-uri()}">
<xsl:apply-templates select="@* | node()"/>
</xsl:element>
</xsl:template>
Does anyone have an idea how to deal with it?
It seems to be a recursive grouping problem, though the single example doesn't really spell out when to wrap and/or nest items as lists. Nevertheless, with XSLT 2 or 3 it could be tackled with a recursive function using for-each-group:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:mf="http://example.com/mf"
exclude-result-prefixes="#all"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output method="html" indent="yes" html-version="5"/>
<xsl:function name="mf:wrap" as="element()*">
<xsl:param name="elements" as="element()*"/>
<xsl:param name="level" as="xs:integer"/>
<xsl:for-each-group select="$elements" group-adjacent="boolean(self::*[matches(local-name(), '^li[' || $level || '-9]+$')])">
<xsl:choose>
<xsl:when test="current-grouping-key()">
<ul class="li{$level}">
<xsl:sequence select="mf:wrap(current-group(), $level + 1)"/>
</ul>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="current-group()"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:function>
<xsl:template match="*[matches(local-name(), '^li[0-9]+$')]">
<li>
<xsl:apply-templates/>
</li>
</xsl:template>
<xsl:template match="*[*[matches(local-name(), '^li[0-9]+$')]]">
<div>
<xsl:apply-templates select="mf:wrap(*, 1)"/>
</div>
</xsl:template>
<xsl:template match="uz">
<h5>
<xsl:apply-templates/>
</h5>
</xsl:template>
<xsl:template match="/">
<html>
<head>
<title>.NET XSLT Fiddle Example</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
https://xsltfiddle.liberty-development.net/bnnZWK
Note that the list structure the above creates is a bit different: all liX items of the same level X are wrapped into a single ul class="liX" wrapper, while your wanted sample seems to wrap several items in some places and only single items in others.

XSLT handling text with tag in between

I have the following XML: <a>Text with <b>stuff</b> here</a>
With my code:
<xsl:template match="*[local-name() = 'a'][namespace-uri()=namespace-uri(.)]">
<xsl:value-of select="normalize-space(text()) "/><xsl:text> </xsl:text>
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="*[local-name() = 'b'][namespace-uri()=namespace-uri(.)]">
<xsl:value-of select="normalize-space(text())"/><xsl:text> </xsl:text>
<xsl:apply-templates/>
</xsl:template>
I only get the result:
Text with stuff
What I want is:
Text with stuff here.
So how do I handle the remaining text after the <b/> element?
Why is this so complicated? If this is really your XML input:
<a>Text with <b>stuff</b> here</a>
then the following stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:value-of select="a" />
</xsl:template>
</xsl:stylesheet>
will return the requested* result:
Text with stuff here
--
(*) except for the period at the end, which is not present in the input.
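If the template-based structure from the question should be kept, a variant along these lines may also work. It relies on the fact that the string value of an element is the concatenation of all its descendant text nodes, whereas in XSLT 1.0 normalize-space(text()) only looks at the first text node child. The sketch assumes the a element is in no namespace.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- "." is the whole a element, so its string value includes
       the text inside b and the text after it -->
  <xsl:template match="a">
    <xsl:value-of select="normalize-space(.)"/>
  </xsl:template>
</xsl:stylesheet>
```

Here normalize-space() additionally trims leading and trailing whitespace and collapses internal runs of whitespace, which the plain value-of above does not do.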

I have a problem with XPath: I need the href link and other <a> attributes

I need to get some parts of an a tag. My XML looks like this:
<div class="item">
<h2>Vyzivovy poradca</h2>
...
...
<div class="watch"><span>sometihink</span></div>
</div>
I need the href attribute and the "data-id" attribute. My template looks like this:
<xsl:variable name="url" select="xhtml:h2/xhtml:a/href"/>
<xsl:variable name="job_id" select="xhtml:div[@class = 'watch']/xhtml:a/data-id"/>
<job>
<xsl:attribute name="id"><xsl:value-of select="$job_id"/></xsl:attribute>
<url name="url"><xsl:value-of select="$url"/></url>
</job>
and template for tag a is:
<xsl:template match="xhtml:a">
<xsl:copy>
<!-- can not copy href, cause it is not absolute url ! -->
<xsl:copy-of select="@align|@title|@rel|@itemprop|@itemtype|@itemscope"/>
<xsl:attribute name="target">_blank</xsl:attribute>
<xsl:apply-templates select="*|text()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()"><xsl:value-of select="normalize-space(.)"/></xsl:template>
<xsl:template match="text()[ancestor::xhtml:pre]"><xsl:value-of select="etl:regex-replace(., '(\s|\n)+', '$1', 'g')"/></xsl:template>
but it doesn't work, some ideas?
This input XML:
<div class="item">
<h2>
<a href="url.html">Vyzivovy poradca</a>
</h2>
...
...
<div class="watch">
<a href="sth"
data-id="292931"
data-active="somethink"
data-inactive="blablalba"
data-class="monitored"
class="watchItem"
title="watching"><span>sometihink</span></a>
</div>
</div>
Given to this XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="//a[descendant::text() = 'sometihink']">
<root>
<href>
<xsl:value-of select="@href"/>
</href>
<data-id>
<xsl:value-of select="@data-id"/>
</data-id>
</root>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
Produces this output XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<href>sth</href>
<data-id>292931</data-id>
</root>
Notes:
I'm assuming that the "sometihink" content is the most distinctive characteristic of the a you seek. If it's something else (such as the parent div[@class="watch"]), let me know and we can adjust.
Update per OP's comment below:
This updated XSLT:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<root>
<item-href>
<xsl:value-of select="//div[@class='item']/h2/a/@href"/>
</item-href>
<watch-data-id>
<xsl:value-of select="//div[@class='watch']/a/@data-id"/>
</watch-data-id>
</root>
</xsl:template>
</xsl:stylesheet>
Given the above input XML will yield this output XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<item-href>url.html</item-href>
<watch-data-id>292931</watch-data-id>
</root>
containing the requested attribute values.
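To get closer to the job/url structure from the question, the same two paths could be used relative to the item div, with an attribute value template for the id. This is only a sketch: it assumes the input is in no namespace, as in the stylesheets above, and that the h2 contains an a with an href. For namespaced XHTML input, the steps would need the xhtml: prefix from the question.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- One job element per item div -->
  <xsl:template match="div[@class='item']">
    <!-- Curly braces evaluate the XPath inside a literal attribute -->
    <job id="{div[@class='watch']/a/@data-id}">
      <url name="url">
        <!-- Attributes are addressed with @, not as child elements -->
        <xsl:value-of select="h2/a/@href"/>
      </url>
    </job>
  </xsl:template>
</xsl:stylesheet>
```

The original variables selected href and data-id as child elements of a, which do not exist; adding the @ is what makes the paths select attributes.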

xslt retrieve information from <a>

<div>
<a href="M_TestNamespace_StoredNumber_Swap``1_2_890a5ef1.htm">
Swap
<span class="languageSpecificText">
<span class="cs">&lt;</span>
<span class="vb">(Of </span><span class="cpp">&lt;</span>
<span class="fs">&lt;'</span><span class="nu">(</span>
</span>
T
<span class="languageSpecificText">
<span class="cs">></span>
<span class="vb">)</span>
<span class="cpp">></span>
<span class="fs">></span>
<span class="nu">)</span>
</span>
</a>
<div>
I would like to use XSLT to translate the above into a result like this:
<div>
Swap(T)
<div>
FYI, the "(" and ")" are from <span class="nu"/>.
You might want to create a parameter to hold the 'nu' value.
<xsl:param name="lang" select="'nu'" />
Then, you would be able to extract the language specific text like so
<xsl:template match="span[@class='languageSpecificText']">
<xsl:value-of select="span[@class=$lang]" />
</xsl:template>
Here is the full XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"/>
<xsl:param name="lang" select="'nu'" />
<xsl:template match="a">
<xsl:apply-templates />
</xsl:template>
<xsl:template match="span[@class='languageSpecificText']">
<xsl:value-of select="span[@class=$lang]" />
</xsl:template>
<xsl:template match="a/text()">
<xsl:value-of select="normalize-space()" />
</xsl:template>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When applied to your sample XML, the following is output
<div>Swap(T)</div>
Change the parameter to 'vb' and you get the following
<div>Swap(Of T)</div>
Try like so:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
<xsl:template match="div">
<xsl:element name="div">
<xsl:value-of select="normalize-space(a/text()[1])"/>
<xsl:value-of select="(.//span/span[@class='nu'])[1]/text()"/>
<xsl:value-of select="normalize-space(a/text()[2])"/>
<xsl:value-of select="(.//span/span[@class='nu'])[2]/text()"/>
</xsl:element>
</xsl:template>
</xsl:stylesheet>
I've assumed that the <div> is correctly closed :).
One of the shortest ways to generate the wanted result is:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="a">
<div><xsl:apply-templates/></div>
</xsl:template>
<xsl:template match="a/text() | span[@class='nu']/text()">
<xsl:value-of select="normalize-space()"/>
</xsl:template>
<xsl:template match="text()"/>
</xsl:stylesheet>
Explanation:
All text nodes are ignored by: <xsl:template match="text()"/> -- this effectively "deletes" them from the output.
Only the text-node children of a and span[@class='nu'] are treated differently: they are used to generate text nodes in the output by the template matching a/text() | span[@class='nu']/text().
The unwanted white-space in the text-node children of a is removed using the standard XPath function normalize-space().

Vertical output XSL and XML

I am working on a simple dictionary in XML, and now I'm trying to output some words vertically, but they all come out on one line without spaces.
This is part of the XML file:
<thesaurus>
<dictionary>
<language>English</language>
<word type="1">word 1</word>
<word type="2">word 2</word>
<word type="3">word 3</word>
<word type="4">word 4</word>
<word type="5">word 5</word>
<word type="6">word 6</word>
</dictionary>
</thesaurus>
This is my first "almost" solution
<xsl:template match="/">
<html>
<body>
<xsl:apply-templates select="//word">
<xsl:sort order="ascending"/>
</xsl:apply-templates>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
That solution only prints out all the words like this:
AgentsColorFoundationsGrainPartialPogotypePretendSilentStrollTender
My second try is something like this
<xsl:for-each select="thesaurus">
<h1> <xsl:value-of select="//word"/></h1>
</xsl:for-each>
That way I could style the words and they would print vertically, but only the first of the words is printed. =/
Would be great with a hint :)
Thanks
Use this template:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<html>
<body>
<xsl:apply-templates select="*/*/word">
<xsl:sort order="ascending"/>
</xsl:apply-templates>
</body>
</html>
</xsl:template>
<xsl:template match="word">
<xsl:value-of select="."/>
<br/>
</xsl:template>
</xsl:stylesheet>
Output:
<html>
<body>word 1<br />word 2<br />word 3<br />word 4<br />word 5<br />word 6<br /></body>
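The second attempt from the question printed only one word because, in XSLT 1.0, xsl:value-of on a node-set outputs the string value of just the first node. If the h1 styling is what is wanted, a sketch along these lines iterates over the words instead. The html output method and the explicit path thesaurus/dictionary/word are assumptions based on the sample file.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- Visit every word, sorted alphabetically by its text -->
        <xsl:for-each select="thesaurus/dictionary/word">
          <xsl:sort select="." order="ascending"/>
          <!-- One block-level element per word, so each starts on a new line -->
          <h1><xsl:value-of select="."/></h1>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Any block-level element (p, li, div) would serve the same purpose as h1 here; the choice only affects styling.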
</html>