XSLT: How to select only the value of an element that has children

I have this code and can't edit it:
<elenco-treni>
<treno id='1'> Moderno
<percorso>Terni - Spoleto</percorso>
<tipo genere='locale'> aaa
<fermata>Narni</fermata>
<fermata informale='s'>Giano</fermata>
</tipo>
</treno>
<treno id='5' codice='G140'> Jazz
<percorso>Orte - Terontola</percorso>
<tipo genere='regionale'>
<fermata>Terni</fermata>
<fermata>Spoleto</fermata>
<fermata>Foligno</fermata>
<fermata>Assisi</fermata>
<fermata>Perugia</fermata>
</tipo>
</treno>
</elenco-treni>
and I have some problems:
When I select "elenco-treni", nothing works:
<xsl:for-each select="elenco-treni">
<xsl:value-of select="treno"/>
gives me a blank result.
I also can't get the value of tipo, which is "aaa":
<xsl:for-each select="treno">
<xsl:value-of select="tipo"/>
gives me the text of tipo together with the text of all its children.

This is badly designed XML, in that it is using mixed content (elements that have both text nodes and other elements as children) in a way that mixed content wasn't designed to be used. Constructs like xsl:value-of work well if mixed content is used properly, but they don't work well on this kind of input.
When you're dealing with badly designed XML, the best thing is to start by transforming it to something cleaner. You could do this here with a transformation that wraps the text nodes in an element:
<xsl:template match="treno/text()[normalize-space(.)]">
<veicolo><xsl:value-of select="normalize-space(.)"/></veicolo>
</xsl:template>
This template matches, and therefore wraps, only the text nodes that contain non-whitespace characters.
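As a way to see what the "own" text of such a mixed-content element is, here is a minimal Python sketch using only the standard library. It assumes a shortened copy of the question's XML; `own_text` is a hypothetical helper name, not part of any library. In ElementTree, `elem.text` holds only the text that comes before the first child element, which corresponds roughly to `normalize-space(text()[1])` in XPath.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (first treno only).
XML = """<elenco-treni>
<treno id='1'> Moderno
<percorso>Terni - Spoleto</percorso>
<tipo genere='locale'> aaa
<fermata>Narni</fermata>
<fermata informale='s'>Giano</fermata>
</tipo>
</treno>
</elenco-treni>"""


def own_text(elem):
    """Return the text that precedes the first child element, trimmed."""
    return (elem.text or "").strip()


root = ET.fromstring(XML)
for treno in root.findall("treno"):
    # Prints the treno's own text and the own text of its tipo child.
    print(own_text(treno), "/", own_text(treno.find("tipo")))
```

Text that appears after a child element is not covered by this sketch; ElementTree stores it in the `tail` of that child.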

Related

Need an xpath that finds all the elements of a particular type before the first occurence of a certain element

I need an XPath that fetches all the elements of a particular type, say input, that occur before the first occurrence of another element. The problem is that there is no proper hierarchy between the targeted elements and the 'another element', and there can be any number of 'another element' present in the HTML.
I tried using the following axis, and it works if there is only one 'another element', but if there are many it doesn't work.
<a>
<b>
<input>zyx</input>
<div>abc</div>
<span>def</span>
<input>ghi</input>
</b>
<c>
<div class="SameAttribute">Test</div>
<input>jkl</input>
<div>mno</div>
</c>
<d>
<div class="SameAttribute">Test</div>
<input>pqr</input>
<div>stu</div>
</d>
</a>
As per the HTML structure above, I want only the input elements that are within the <b> tag. The XPath needs to ignore the input elements that are within the <c> and <d> tags.
I tried this:
.//*[self::input][following::div[@class = 'SameAttribute']]
but it picks the elements from both the <b> and <c> tags.
When I try this, nothing gets selected:
.//*[self::input][following::(div[@class = 'SameAttribute'])[1]]
I cannot write XPaths containing any of the tags <b>, <c>, <d> due to other constraints.
i want only the input elements that are within the <b> tag. the xpath
needs to ignore the input elements that are within <c> and <d> tags
Use:
//b//input
I need an xpath that fetches all the elements of a particular element
type, say input, that occurs before the first occurrence of another
element. the problem is, there is no proper hierarchy between the
targeted elements and the 'another element'. and there can be any
number of 'another element' present in the html.
This is not equivalent to the first requirement quoted above.
You don't specify what is meant by "another element", but combining the two quoted requirements with the provided source XML document, one can logically conclude that "another element" here means any following sibling of the element /a/b[1].
These will be selected by:
(//b)[1]//input
or for the provided xml document just:
/a/b[1]//input
If the document had more than one /a/b element and you wanted the input descendants of only those /a/b elements that precede any /a/{X} element, where {X} is a name other than b, use:
/a/b[not(preceding-sibling::*[not(self::b)])]//input
Finally, in the most general case, if you want to select the input descendants of only those b elements that come before any other (non-b) element (excluding the top element -- if the top element is a b, then any input descendant of the top element satisfies the requirement), here is one XPath expression that selects them:
/*//b[not(ancestor::*[not(self::b) and parent::*])
and not(preceding::*[not(self::b)])]
//input
Here we use the fact that if an element x is before (in document order) an element y, then x is either an ancestor of y (belongs to its ancestor::* axis) or is a preceding element (belongs to its preceding::* axis).
XSLT-based verification:
This transformation evaluates all 5 XPath expressions and outputs the selected nodes:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select="//b//input"/>
==================================
<xsl:copy-of select="(//b)[1]//input"/>
==================================
<xsl:copy-of select="/a/b[1]//input"/>
==================================
<xsl:copy-of select="/a/b[not(preceding-sibling::*[not(self::b)])]//input"/>
==================================
<xsl:copy-of select=
"/*//b[not(ancestor::*[not(self::b) and parent::*])
and not(preceding::*[not(self::b)])]
//input"/>
</xsl:template>
</xsl:stylesheet>
When applied on the originally-provided XML document:
<a>
<b>
<input>zyx</input>
<div>abc</div>
<span>def</span>
<input>ghi</input>
</b>
<c>
<div class="SameAttribute">Test</div>
<input>jkl</input>
<div>mno</div>
</c>
<d>
<div class="SameAttribute">Test</div>
<input>pqr</input>
<div>stu</div>
</d>
</a>
the wanted, correct result is selected when evaluating each expression:
<input>zyx</input>
<input>ghi</input>
==================================
<input>zyx</input>
<input>ghi</input>
==================================
<input>zyx</input>
<input>ghi</input>
==================================
<input>zyx</input>
<input>ghi</input>
==================================
<input>zyx</input>
<input>ghi</input>
You can try this XPath.
To pick an input by index (change the number to select a different one):
(.//*[self::input][following::div[@class = 'SameAttribute']])[1]
Or, more simply, to get the inputs inside the <b> tag:
//b//input
One Xpath that would seem to meet your criteria is:
//input[not(preceding-sibling::*[contains(@class,'SameAttribute')])]
This will find all input elements that do not have a preceding sibling that has a class attribute that contains the class SameAttribute.
The way you've described the problem, the simplest solution is //b/*. Alternatively, if you want all the elements with the same parent as the first input element, you might want (//input)[1]/following-sibling::*.
You certainly don't want the following axis here: read up on the difference between following and following-sibling.
Your expression //*[self::input] is a very convoluted way of saying //input.
I used a combination of the preceding and ancestor axes to arrive at a solution. The following XPath worked for me:
(.//div[@class='SameAttribute'])[1]/preceding::*[self::input][ancestor::a]
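The idea behind that last expression (take the first marker div, then everything of type input that comes before it in document order) can also be sketched procedurally. The following is only an illustration in Python with the standard library; `inputs_before_marker` and `marker_class` are made-up names, and the markup is a compacted copy of the question's sample. `Element.iter()` walks the tree in document order, so the loop can simply stop at the first marker.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the sample markup from the question.
HTML = """<a>
<b><input>zyx</input><div>abc</div><span>def</span><input>ghi</input></b>
<c><div class="SameAttribute">Test</div><input>jkl</input><div>mno</div></c>
<d><div class="SameAttribute">Test</div><input>pqr</input><div>stu</div></d>
</a>"""


def inputs_before_marker(root, marker_class="SameAttribute"):
    """Collect input elements that start before the first marker div."""
    found = []
    for elem in root.iter():  # iter() yields elements in document order
        if elem.tag == "div" and elem.get("class") == marker_class:
            break  # first marker reached: ignore everything after it
        if elem.tag == "input":
            found.append(elem)
    return found


root = ET.fromstring(HTML)
selected = [e.text for e in inputs_before_marker(root)]
print(selected)  # ['zyx', 'ghi']
```

One difference from the XPath preceding axis: an input that is itself an ancestor of the marker div would be collected here, whereas preceding:: excludes ancestors. For markup like the sample this does not matter.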

XSL - Whitespace issue when setting dynamically placeholder

I have an XSLT template in which I am loading translation content from XML files.
I want to set the placeholder of an input field dynamically, but I keep getting extra whitespace (the placeholder text is shifted to the right).
Here is my code.
<xsl:attribute name="placeholder">
<xsl:value-of select="/paygate/language/computop.creditcard.number.message"/>
</xsl:attribute>
I tried removing the whitespace between the lines, also setting
<xsl:strip-space elements="*"/>
in the beginning of the file. Nothing worked :(
An XSLT processor ought to strip whitespace-only text nodes that are direct children of an <xsl:attribute> by default. If the transform you present is producing placeholder attributes with unwanted leading or trailing whitespace in their values, then I conclude that the whitespace comes from the result of the <xsl:value-of> element, which is not subject to whitespace stripping.
In that case, you could consider applying the standard normalize-space() XPath function to the attribute value:
<xsl:attribute name="placeholder">
<xsl:value-of select="normalize-space(string(/paygate/language/computop.creditcard.number.message))"/>
</xsl:attribute>
normalize-space() will delete both leading and trailing whitespace from its (string) argument, but will also replace each internal run of whitespace characters with a single space character.
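To make that behaviour concrete, here is a small Python imitation of what normalize-space() does to a string. It is a sketch, not the XSLT processor's implementation: the function name `normalize_space` and the sample string are invented for illustration, and whitespace is taken to mean space, tab, carriage return and line feed, as in XML.

```python
import re

# XML whitespace characters: space, tab, carriage return, line feed.
_XML_WS = " \t\r\n"


def normalize_space(value: str) -> str:
    """Collapse each whitespace run to one space, then trim both ends."""
    collapsed = re.sub(f"[{_XML_WS}]+", " ", value)
    return collapsed.strip(" ")


# A translation value with stray indentation, as it might appear in the file.
raw = "\n   Card  number\t here \n"
print(repr(normalize_space(raw)))  # 'Card number here'
```

Note the side effect mentioned above: the double space inside the text is reduced to a single space as well.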

How to store superscript in XML attribute and read using XSL?

I have a requirement where I need to create an XML document dynamically. Some of the attributes of the nodes of this XML contain superscripted characters such as the registered sign (®). My question is how I should store such superscript characters in XML and then read them using XSL to render as HTML. A sample XML is shown below:
<?xml version="1.0" encoding="utf-8"?>
<node name="Some text <sup>®</sup>"/>
I know this cannot be stored under a sup tag inside an attribute, as it breaks the XML. I also tried escaping the tags as &lt;sup&gt; and &lt;/sup&gt;, but then they are rendered as the literal text <sup> in the HTML instead of actually making the content superscript.
Please let me know the solution for this problem. I have control over the generation of the XML. I can write it the correct way if I know what the right way to store superscripts is.
Since you're using XSL to transform the input into HTML, I would suggest using a different method to encode the fact that some things need to be superscripts. Make up your own simple markup, for example
<node name="Some text [[®]]"/>
The markup can be anything that you can uniquely identify later and doesn't occur naturally in your data. Then in your XSL process the attribute values that can contain this markup with a custom template that converts the special markup to <sup> and </sup>. This allows you to keep the document structure (i.e. not move these string values to text nodes) and still achieve your goal.
Please let me know the solution for this problem. I have control over
generation of XML. I can write it the correct way, If I know what is
the right way to store superscripts.
Because attributes can only contain values (no nodes), the solution is to store markup (nodes) inside elements:
<node>
<name>Some text <sup>®</sup></name>
</node>
If it's only single characters like ® that need to be made superscript, then you can leave the XML without crutches like <sup>, i.e. like
<node name="Some text ®"/>
and look for the to-be-superscripted characters during processing. A template like this might help:
<xsl:template match="node/@name">
<xsl:param name="nameString" select="string()"/>
<!-- We're stepping through the string character by character -->
<xsl:variable name="firstChar" select="substring($nameString,1,1)"/>
<xsl:choose>
<!-- '®' can be extended to be a longer string of single characters
that are meant to be turned into superscript -->
<xsl:when test="contains('®',$firstChar)">
<sup><xsl:value-of select="$firstChar"/></sup>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$firstChar"/>
</xsl:otherwise>
</xsl:choose>
<!-- If we haven't yet stepped through the whole string,
chop off the first character and recurse. -->
<xsl:if test="$firstChar!=''">
<xsl:apply-templates select=".">
<xsl:with-param name="nameString" select="substring($nameString,2)"/>
</xsl:apply-templates>
</xsl:if>
</xsl:template>
This approach is however not very efficient, especially if you have lots of name attributes and/or very long name attributes. If your application is performance critical, then better do some testing whether the impact on processing times is justifiable.
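For comparison, the same per-character logic is a single pass when written outside XSLT. The following Python sketch is only an illustration of the technique (wrap every character from a given set in a sup element); `superscript_marked` and `sup_chars` are hypothetical names, and the result is an HTML fragment as a string.

```python
from html import escape


def superscript_marked(name: str, sup_chars: str = "®") -> str:
    """Return an HTML fragment with every character from sup_chars wrapped in <sup>."""
    parts = []
    for ch in name:
        if ch in sup_chars:
            parts.append(f"<sup>{escape(ch)}</sup>")
        else:
            # Escape everything else so the fragment stays valid HTML.
            parts.append(escape(ch))
    return "".join(parts)


print(superscript_marked("Some text ®"))  # Some text <sup>®</sup>
```

As in the template, `sup_chars` can be extended to a longer string of single characters that should be turned into superscript.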

Is there a way to detect numeric string in xslt?

I am doing an HTML-to-XML XSLT transformation, which is pretty straightforward. But I have one small problem left unsolved.
For example, in my source html, a node looks like:
<p class="Arrow"><span class="char-style-override-12">4</span><span class="char-style-override-13"> </span>Sore, rash, growth, discharge, or swelling.</p>
As you can see, the first child node, <span>, has a value of 4, but it is actually rendered as an arrow bullet in the browser (maybe because of an encoding issue; it is treated as a numeric value in my XML editor).
So I wrote a template to match the tag, and then pass its text content to another template:
<xsl:template match="text()">
<xsl:variable name="noNum">
<xsl:value-of select="normalize-space(translate(., '4', ''))"/>
</xsl:variable>
<xsl:copy-of select="$noNum"/>
</xsl:template>
As you can see, this is definitely not a good solution: it removes every 4 that appears in the string, not only the first character. So I wonder if there is a way to remove only the first character IF it is a number, maybe using a regular expression? Or am I going about this the wrong way, and is there a better way to solve this problem (e.g. changing the encoding)?
Any idea is welcomed! Thanks in advance!
Just use this :
<xsl:variable name="test">4y4145</xsl:variable>
<xsl:if test= "not(string(number(substring($test,1,1)))='NaN')">
<xsl:message terminate="no">
<xsl:value-of select="substring($test,2)"/>
</xsl:message>
</xsl:if>
This is an XSLT 1.0 solution. I think regex is overkill for this.
Output :
[xslt] y4145
Use this single XPath expression:
concat(translate(substring(.,1,1), '0123456789', ''),
substring(.,2)
)
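The expression works because translate() with an empty third argument deletes every listed character, and here it is applied only to the first character; the rest of the string is appended unchanged. A rough Python equivalent, given only to illustrate the logic (`drop_leading_digit` is a made-up name):

```python
def drop_leading_digit(text: str) -> str:
    """Remove the first character only if it is one of the digits 0-9."""
    first, rest = text[:1], text[1:]
    if first and first in "0123456789":
        first = ""  # mirrors translate(first, '0123456789', '')
    return first + rest


print(drop_leading_digit("4y4145"))  # y4145
print(drop_leading_digit("Sore, rash"))  # Sore, rash
```

Digits later in the string are left untouched, which is the difference from applying translate() to the whole string.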

selenium xpath scrape of mixed content html span

I'm trying to scrape a span element that has mixed content
<span id="span-id">
<!--starts with some whitespace-->
<b>bold title</b>
<br/>
text here that I want to grab....
</span>
And here's a code snippet of a grab that identifies the span. It picks it up without a problem but the text field of the webelement is blank.
IWebDriver driver = new FirefoxDriver();
driver.Navigate().GoToUrl("http://page-to-examine.com");
var query = driver.FindElement(By.XPath("//span[@id='span-id']"));
I've tried adding /text() to the expression which also returns nothing. If I add /b I do get the text content of the bolded text - which happens to be a title that I'm not interested in.
I'm sure with a bit of xpath magic this should be easy but I'm not finding it so far!! Or is there a better way? Any comments gratefully received.
I've tried adding /text() to the expression which also returns nothing
This selects all the text-node-children of the context node -- and there are three of them.
What you refer to as "nothing" is most probably the first of these, which is a whitespace-only text node (thus you see "nothing" in it).
What you need is:
//span[@id='span-id']/text()[3]
Of course, there are other variations possible:
//span[@id='span-id']/text()[last()]
Or:
//span[@id='span-id']/br/following-sibling::text()[1]
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="node()|@*">
"<xsl:copy-of select="//span[@id='span-id']/text()[3]"/>"
</xsl:template>
</xsl:stylesheet>
This transformation simply outputs whatever the XPath expression selects. When applied on the provided XML document (comment removed):
<span id="span-id">
<b>bold title</b>
<br/>
text here that I want to grab....
</span>
the wanted result is produced:
"
text here that I want to grab....
"
I believe the following XPath query should work for your case; following-sibling is useful for what you're trying to do.
//span[@id='span-id']/br/following-sibling::text()
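The "text node after the br" idea can be checked outside the browser as well. This is a minimal sketch with Python's standard ElementTree, assuming the span has already been obtained as well-formed XML (the `SNIPPET` string below is copied from the question). ElementTree exposes the text that follows an element as that element's `tail`, which plays the role of br/following-sibling::text()[1] here.

```python
import xml.etree.ElementTree as ET

# The span from the question, as a well-formed XML snippet.
SNIPPET = """<span id="span-id">
<!--starts with some whitespace-->
<b>bold title</b>
<br/>
text here that I want to grab....
</span>"""

span = ET.fromstring(SNIPPET)
br = span.find("br")
# tail is the text between the end of <br/> and the next tag.
wanted = (br.tail or "").strip()
print(wanted)  # text here that I want to grab....
```

The leading and trailing line breaks around the text are part of the text node, so they are trimmed explicitly.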