XPath for text after <br/> - html

Looking to get the XPath of $2.00 with this block:
<td class="undefined" colspan="6">
<table class="history-bill-payments" cellspacing="0" cellpadding="0" border="0" align="center" width="99%">
<thead>
<tbody>
<tr>
<td valign="top">04/19/2016</td>
<td valign="top" style="text-align:right; height:">
$3.00
<br/>
$2.00
</td>
I have tried these but to no avail
$I->CanSeeElement("//table[contains(tbody/tr[2]/td/table/tbody/tr/td[2]/following-sibling::br)]");
$I->CanSeeElement("//table[contains(tbody/tr[2]/td/table/tbody/tr/td[2]/preceding-sibling::br/text(),'$2.00')]");
$I->CanSeeElement("//table[contains(tbody/tr[2]/td/table/tbody/tr/td[2]/following-sibling::br/text(),'$2.00')]");
Using firepath in Firefox I get this XPath
html/body/div[4]/div[2]/div/div/div/div/table/tbody/tr[2]/td/table/tbody/tr/td[2]
I was able to get the xpath of $3.00
$I->CanSeeElement("//table[contains(tbody/tr[2]/td/table/tbody/tr/td[2]/text(),'$3.00')]");

In XPath 1.0, when contains() is given a node-set where a string is expected, only the first node in the set (in document order) is converted to a string and evaluated. That's why your initial XPath successfully finds the text node that contains '$3.00', but not the one that contains '$2.00'.
An XPath expression close to the one you used for $3.00 would be as follows:
//table[tbody/tr[2]/td/table/tbody/tr/td[2]/text()[contains(.,'$2.00')]]
The XPath above works by applying contains() to each individual text node instead of passing multiple text nodes at once.
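To make the first-node rule concrete, here is a small Python model of it. It is only a sketch: it is not an XPath engine, and the helper names contains_xpath1 and any_node_contains are made up for illustration. It collects the two text nodes of the td with the standard library's ElementTree and compares the two ways of applying contains().

```python
import xml.etree.ElementTree as ET

# The td from the question, reduced to its text and the <br/>.
td = ET.fromstring('<td valign="top">\n$3.00\n<br/>\n$2.00\n</td>')

# Text nodes that are direct children of td, in document order:
# the text before the first child, then the text after each child.
text_nodes = [td.text or ""] + [child.tail or "" for child in td]

def contains_xpath1(nodes, needle):
    """Model of contains(node-set, needle) in XPath 1.0:
    only the first node's string-value is examined."""
    first = nodes[0] if nodes else ""
    return needle in first

def any_node_contains(nodes, needle):
    """Model of text()[contains(., needle)]:
    each text node is tested on its own."""
    return any(needle in node for node in nodes)

print(contains_xpath1(text_nodes, "$3.00"))    # True
print(contains_xpath1(text_nodes, "$2.00"))    # False
print(any_node_contains(text_nodes, "$2.00"))  # True
```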

td with certain contents
From your trials, it seems you're fine with keying off of $2.00 literally, so you could use this XPath 2.0 expression to get the td that ends with $2.00:
//td[ends-with(normalize-space(), '$2.00')]
Note that browsers don't generally support XPath 2.0, so use this XPath 1.0 expression if running within a browser and you're ok with $2.00 appearing anywhere within the td:
//td[contains(.,'$2.00')]
Text following a br
If you don't want to literally specify the $2.00, you'll have to state some other invariant constraint. For example, this XPath will return the string that follows the br contained within a td that starts with $3.00:
normalize-space(//td[starts-with(normalize-space(),'$3.00')]/br/following::text())
See also
XPath contains() works differently in XPath v1.0 vs v2.0+
How to use XPath contains() here?
How to use XPath contains() for specific text?

If needed, add the table id or any other specific locator to narrow this down.
xpath=//table//tr/td[2]/text()[2]

Related

What is the role of parentheses in XPath 1.0?

In Chrome DevTools > Elements, when I search for //tr/td/span I find an element (because such an element exists on my page).
When I search for (//tr)/td/span or (//tr/td)/span I also find this element.
But neither //tr(/td)/span nor //tr/(td)/span nor //tr/(td/)span find anything.
What is the meaning of these parentheses in XPath?
Parentheses in XPath are used as they are in other programming languages:
Function argument grouping: e.g: //tr/td[contains(.,"e")]
Evaluation precedence indication: e.g: normal arithmetic expression grouping as well as leading path grouping (trace LocationPath through to PrimaryExpr in the XPath grammar) as in (//td)[1] to find the first td in the document as opposed to //td[1] which finds the td elements that are the first child of their respective parent elements.
They're also used in
node tests: e.g: node(), element(), ...
processing instruction tests: e.g: processing-instruction('PageBreak').
Your examples that do not find anything (e.g: //tr(/td)/span, //tr/(td)/span [1], etc) have parentheses embedded within the path that do not fall into one of the above categories. Such uses of parentheses are actually syntactically invalid and should have been reported as such rather than silently failing.
[1] Note that this expression would actually be syntactically valid under XPath 2.0/3.0. Thanks, @Andersson, for noticing.
I don't think the parentheses mean anything in your case, but they can be used to return the required node or node-set depending on the index that follows them.
For instance, HTML is like below:
<table>
<tr>
<td>
<span>first</span>
</td>
<td>
<span>second</span>
</td>
</tr>
<tr>
<td>
<span>third</span>
</td>
<td>
<span>fourth</span>
</td>
</tr>
</table>
(//tr)[1]/td returns the cells of the first row only (first, second)
(//tr)[2]/td returns the cells of the second row (third, fourth)
(//tr/td)[1] returns the first cell of the first row (first). Note that //tr/td[1] returns the first cell of each row (first, third)
...
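The difference between //tr/td[1] and (//tr/td)[1] can be sketched in Python. This assumes the standard library's ElementTree, which supports only a limited XPath subset and does not accept parenthesized paths, so the parenthesized forms are emulated by indexing the Python list of results. The helper span_texts is a made-up name.

```python
import xml.etree.ElementTree as ET

table = ET.fromstring(
    "<table>"
    "<tr><td><span>first</span></td><td><span>second</span></td></tr>"
    "<tr><td><span>third</span></td><td><span>fourth</span></td></tr>"
    "</table>"
)

def span_texts(cells):
    """Return the span text of each td in the given list."""
    return [td.find("span").text for td in cells]

# //tr/td[1]: the predicate is applied per parent, so every row
# contributes its own first cell.
per_row_first = span_texts(table.findall(".//tr/td[1]"))

# (//tr/td)[1]: collect all cells first, then index the whole result.
overall_first = span_texts(table.findall(".//tr/td")[:1])

# (//tr)[2]/td: pick the second row overall, then take its cells.
second_row = span_texts(table.findall(".//tr")[1].findall("td"))

print(per_row_first)  # ['first', 'third']
print(overall_first)  # ['first']
print(second_row)     # ['third', 'fourth']
```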

Combining functions in xpath selector

I have a selection I want to make in XPath and can't seem to get it right. So I have: //td[starts-with(@id, '16276688381') and not(ends-with(@id, '_name'))]
This is the simple html
<td id="16276688381_name">I don't want this</td>
<td id="16276688381_B3" >What I want</td>
<td id="16276688381_B4" >More of these...I want them</td>
Once I add the "and" clause, my selection disappears. Any idea what is going wrong here?
As Martin points out, XPath 1.0 does not support ends-with, but you can simulate it with some string length calculations:
//td[starts-with(@id, '16276688381') and
not(substring(@id, string-length(@id) - 4) = '_name')]
ends-with is a function introduced in XPath 2.0 in 2007; browsers unfortunately still only support XPath 1.0 from 1999.
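The string-length arithmetic is easy to get off by one, so here is a small Python model of it. This is a sketch of the XPath 1.0 substring() rule for integer start positions only, not a full implementation, and the function names xpath_substring and ends_with are hypothetical.

```python
def xpath_substring(s: str, start: int) -> str:
    """Model of XPath 1.0 substring(s, start) for an integer start.
    Positions are 1-based; a start below 1 returns the whole string."""
    return s[max(start, 1) - 1:]

def ends_with(s: str, suffix: str) -> bool:
    """Model of substring(s, string-length(s) - (n - 1)) = suffix,
    where n is the length of the suffix (n = 5 for '_name')."""
    return xpath_substring(s, len(s) - (len(suffix) - 1)) == suffix

ids = ["16276688381_name", "16276688381_B3", "16276688381_B4"]
wanted = [
    i for i in ids
    if i.startswith("16276688381") and not ends_with(i, "_name")
]
print(wanted)  # ['16276688381_B3', '16276688381_B4']
```

For a 5-character suffix the start position is string-length - 4, which matches the expression above.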

Special Characters in HTML Element

What I'm trying to do is output a percent sign (%) directly into a <td> tag. Below is my code:
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td class="item_container" %%=v(@Item_Container_Style)=%%>
...
When I test the XSL I get the following error:
SAXParseException: Expected an attribute name (Set_A_Custom.xsl, line 205, column 38)
So basically it's seeing "%%=v(@Item_Container_Style)=%%" as invalid HTML, but I need this code to be there.
If you are wondering why I am doing this, it is because I am writing the XSL to output HTML that contains AMPscript (an ExactTarget proprietary scripting language). You don't need to know anything about AMPscript to help me out, though; I just need to output the percent sign (%) in the HTML and everything will work.
Any ideas? For the record I'm using XSL 1.0. Thanks all!
An XSLT stylesheet must itself be well-formed XML, so you can't include this kind of construct directly in the stylesheet. If the XSLT processor you're using supports disable-output-escaping then you would be able to do something like
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<xsl:text disable-output-escaping="yes"><![CDATA[<td class="item_container" %%=v(@Item_Container_Style)=%%>]]></xsl:text>
...
<xsl:text disable-output-escaping="yes"><![CDATA[</td>]]></xsl:text>
</tr>
</table>
If it does not allow disable-output-escaping then your only option is to use the text output method, and write all the tags you want to output as text with the angle brackets escaped (or in CDATA).
What I'm trying to do is output a percent sign (%) directly into a <td> tag.
Not possible with the "html" or "xml" output modes. XSLT has been designed to create syntactically sane HTML; you cannot make it do anything else.
Of course you could switch to the "text" output mode and do whatever you like, but generating HTML this way is a lot harder.
Alternatively you can use disable-output-escaping, if your XSLT processor supports it, but this will quickly degenerate your XSLT stylesheet into a mess if you need to do it in many places.
That being said, here's a proposal. In XSLT you use the "html" output mode and this:
<td
class="item_container"
amp-1="%%=v({@Item_Container_Style})%%"
amp-2="%%=v({@Some_Other_Element})%%"
>
some text %%=v(<xsl:value-of select="Other_Stuff" />)%% more text
</td>
That is syntactically valid XSLT which covers both cases (multiple placeholders in attributes, multiple placeholders in the text) and creates syntactically valid HTML:
<td
class="item_container"
amp-1="%%=v(item container style content)%%"
amp-2="%%=v(some other element content)%%"
>
Here some text %%=v(other stuff)%%
</td>
and then you use a post-processing step to convert that HTML into AMPscript:
Regex-replace \bamp-\d+="(%%[\s\S]*?%%)" with $1, which would result in
<td
class="item_container"
%%=v(item container style content)%%
%%=v(some other element content)%%
>
Here some text %%=v(other stuff)%%
</td>
Handling HTML with regular expressions is generally strongly discouraged, but this might just be a narrow enough use case.
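The post-processing step could look like the following Python sketch. It applies the regex from above to the sample XSLT output using the standard re module. The variable names are made up, and which tool runs this step in a real pipeline is an assumption.

```python
import re

# Sample HTML as produced by the XSLT step above.
xslt_output = '''<td
class="item_container"
amp-1="%%=v(item container style content)%%"
amp-2="%%=v(some other element content)%%"
>
Here some text %%=v(other stuff)%%
</td>'''

# Match amp-N="%%...%%" and capture only the %%...%% part.
AMP_ATTR = re.compile(r'\bamp-\d+="(%%[\s\S]*?%%)"')

# Replace each placeholder attribute with its bare AMPscript content.
ampscript_html = AMP_ATTR.sub(r"\1", xslt_output)
print(ampscript_html)
```

Placeholders in text content are left untouched because the pattern only matches the amp-N="..." attribute form.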
AMPscript appears to have a standards-based syntax as an alternative to its proprietary syntax:
Delimiter Comparison
The table below demonstrates the similarities between standard AMPscript delimiters and server-side delimiters.
Standard AMPscript Delimiter | Tag-based AMPscript Delimiter
%%[ | <script runat=server language=ampscript>
etc
Does this help you?

One Xpath expression doesn't work in selenium, but works in Firefox

I have a question about XPath.
There is a td like this in Chrome:
<td class="dataCol col02">
"Hello world&nbsp;" (note: there is an &nbsp; here)
[View Hierarchy]
</td>
But when I inspect the same element in Firefox, it doesn't show the &nbsp; or the double quotes:
<td class="dataCol col02">
Hello world
[View Hierarchy]
</td>
In FireFinder, this XPath locates the element:
//td[text()='Hello world']
But with the Selenium API 2.24, the same XPath does not find it:
By.xpath("//td[text()='Hello world']")
Do you have any idea why?
Thanks!
Try normalize-space(), which trims leading and trailing whitespace characters and collapses internal runs of whitespace into a single space:
//td[normalize-space(text())='Hello world']
Edit following the different comments:
here's an XPath expression that's probably better suited in the general case:
//td[starts-with(normalize-space(.), 'Hello world')]
This matches <td> nodes if the concatenated string content of the whole <td>, less leading and trailing whitespace, starts with "Hello world".
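One detail worth knowing here: XPath's normalize-space() only treats space, tab, carriage return and line feed as whitespace, so a non-breaking space (&nbsp;, U+00A0) survives it. The Python sketch below models that rule. It is a model for illustration, not an XPath engine, and the sample string is an assumption about what the td's text looks like.

```python
import re

XPATH_WHITESPACE = " \t\r\n"  # the only characters XPath treats as whitespace

def normalize_space(s: str) -> str:
    """Model of XPath normalize-space(): strip leading and trailing
    whitespace, collapse internal runs to a single space."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(XPATH_WHITESPACE))

# Assumed text content of the td: padding plus a trailing &nbsp;.
raw = "\n  Hello world\u00a0\n  "
normalized = normalize_space(raw)

print(repr(normalized))                       # 'Hello world\xa0'
print(normalized == "Hello world")            # False
print(normalized.startswith("Hello world"))   # True
```

Under this model an exact comparison fails when the &nbsp; is present, while the starts-with() form still matches.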
I would try to use contains() function.
Your xpath will look like: //td[contains(text(),'Hello world')]

Get last </td></tr> with regular expression?

I need to get all tags between the last </td> and the closing </tr> in each row. The regular expression I use, <\/TD\s*>(.*?)<\/TR\s*>, retrieves everything from the first </TD> to the closing </TR>, marked in bold in the sample below.
<TABLE>
<TR><TD>TD11**</TD><TD>TD12</TD><TD>TD13</TD><SPAN><FONT>test1</FONT></SPAN></TR>**
<TR><TD>TD21**</TD><TD>TD22</TD><TD>TD23</TD><SPAN><FONT>test2</FONT></SPAN></TR>**
</TABLE>
But what I really need is:
<TABLE>
<TR><TD>TD11</TD><TD>TD12</TD><TD>TD13**</TD><SPAN><FONT>test1</FONT></SPAN></TR>**
<TR><TD>TD21</TD><TD>TD22</TD><TD>TD23**</TD><SPAN><FONT>test2</FONT></SPAN></TR>**
</TABLE>
It's not recommended to use regular expressions to parse HTML. HTML is not a regular language, so regular expressions are notoriously unreliable for parsing it.
Here's a good blog post explaining the logic and offering alternatives:
http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
</TD>((?:(?!</T[DR]>).)*)</TR>
The regex starts to match at the first </TD>, but fails as soon as it reaches the second </TD> because of the (?!</T[DR]>)., which matches any character that's not the first character of a </TD> or </TR> tag. That's optional because of the enclosing (?:...)*, so it tries to match the next part of the regex, which is </TR>. That fails too, so the match attempt is abandoned.
It tries again starting at the second </TD> and fails again. Finally, it starts matching at the third </TD> and successfully matches from there to the first </TR>.
You may want to specify "single-line" or "dot-matches-all" mode, in case there are newlines that didn't show in your example. You didn't specify a regex flavor, so I can't say exactly how to do that.
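As an illustration in one particular flavor, here is how both patterns behave with Python's re module on the sample table. Choosing Python is an assumption, since the question does not name a flavor. re.DOTALL is the "dot-matches-all" mode mentioned above.

```python
import re

html = """<TABLE>
<TR><TD>TD11</TD><TD>TD12</TD><TD>TD13</TD><SPAN><FONT>test1</FONT></SPAN></TR>
<TR><TD>TD21</TD><TD>TD22</TD><TD>TD23</TD><SPAN><FONT>test2</FONT></SPAN></TR>
</TABLE>"""

# Original pattern: lazy, but it still starts at the FIRST </TD> in each row.
lazy = re.compile(r"</TD\s*>(.*?)</TR\s*>", re.DOTALL)

# Pattern from the answer: the lookahead stops the match from running
# across another </TD> or </TR>, so only the last </TD> can succeed.
tempered = re.compile(r"</TD>((?:(?!</T[DR]>).)*)</TR>", re.DOTALL)

lazy_matches = lazy.findall(html)
tempered_matches = tempered.findall(html)

print(lazy_matches[0])
# <TD>TD12</TD><TD>TD13</TD><SPAN><FONT>test1</FONT></SPAN>
print(tempered_matches)
# ['<SPAN><FONT>test1</FONT></SPAN>', '<SPAN><FONT>test2</FONT></SPAN>']
```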