xpath ignore node - html

I'm looking for a way to select a node in XPath, given that a node on its path may or may not exist. Just like '?' works in a regexp ;)
For instance, I'd like to figure out an XPath query that gets to <td> regardless of whether the <tbody> node exists, with something like /table/(tbody)?/tr/td. I'd like it to work in both cases:
<table>
<tr>
<td />
</tr>
</table>
and
<table>
<tbody>
<tr>
<td />
</tr>
</tbody>
</table>

This may fail to cover more complex cases, but in this example using /table/tbody/tr/td | /table/tr/td should do the trick.

You can do:
//table/descendant::tr/td
or
//table//tr/td
depending on your taste. The double slash means "look that up somewhere on this level or deeper" (more formally, // abbreviates /descendant-or-self::node()/). The spec is, surprisingly, a very good read on this!
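To check that a descendant step makes <tbody> optional, here is a small Python sketch using the standard library's xml.etree.ElementTree, whose limited XPath subset spells the descendant step as .//. It also emulates the union from the first answer by concatenating two findall() results, because ElementTree does not support the | operator. The function names are made up; the two sample tables are the ones from the question.

```python
import xml.etree.ElementTree as ET

# The two layouts from the question: without and with <tbody>.
WITHOUT_TBODY = "<table><tr><td /></tr></table>"
WITH_TBODY = "<table><tbody><tr><td /></tr></tbody></table>"


def cells_descendant(markup):
    """Like //table//tr/td: any tr at any depth below the table, then its td children."""
    table = ET.fromstring(markup)
    return table.findall(".//tr/td")


def cells_union(markup):
    """Like /table/tbody/tr/td | /table/tr/td (ElementTree has no '|' operator)."""
    table = ET.fromstring(markup)
    return table.findall("tbody/tr/td") + table.findall("tr/td")


for markup in (WITHOUT_TBODY, WITH_TBODY):
    print(len(cells_descendant(markup)), len(cells_union(markup)))  # 1 1
```

Both approaches find the single <td> in either layout; the concatenation is safe here only because the two paths can never select the same node.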

Related

How does a browser engine work?

Here is my HTML test code, written following the Google HTML/CSS Style Guide.
<table>
<thead>
<tr>
<th>Date
<th>Country
<tbody>
<tr>
<td>24/07/2018
<td>Myanmar
<tr>
<td>31/06/2018
<td>France
</table>
The following is how the browser interprets it:
<table>
<thead>
<tr>
<th>Date
</th>
<th>Country
</th>
</tr>
</thead>
<tbody>
<tr>
<td>24/07/2018
</td>
<td>Myanmar
</td>
</tr>
<tr>
<td>31/06/2018
</td>
<td>France
</td>
</tr>
</tbody>
</table>
Here are my questions:
How does the browser detect a missing closing tag, and how does it interpret it?
Is it preferable to write elements without closing tags? If so, when should I do that?
If it is not preferable, why not?
Will it have an impact on styling and on adding interactivity with semantic HTML?
Answers:
This is beyond the scope of SO, but it works much like the way any compiler detects that something opened was never closed. Try writing a program that recognizes a valid bracket sequence; it would probably be similar.
Omitting closing tags may break your page, beyond making it horribly unmaintainable. Always use closing tags.
See 2.
See 2.
Browsers can sometimes guess what you meant (or rather, they happen to parse it in a way that produces what you meant), but they might also be wrong. Consider this:
<div>Hello<span>world
is this:
<div>Hello</div><span>world</span>
or
<div>Hello<span>world</span></div>
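Both readings are plausible, and the browser cannot know which one you meant; it just applies the HTML parsing rules (here the span ends up nested inside the div, because every element still open at the end of the input is closed there), whether or not that matches your intent. If you really want to get into how browsers are supposed to parse HTML, see this great link by @modu and @kalido: http://w3c.github.io/html/syntax.html#tokenization . You may be able to work out how a browser should parse the above line.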
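The "valid bracket series" idea from the first point can be sketched with Python's standard html.parser: push each start tag, pop it when the matching end tag arrives, and whatever is left on the stack was never closed. This is only the naive bracket check, not what a browser does; a browser follows the HTML tree-construction rules instead, which is why a checker like this reports the end tags omitted in the Google-style table above as unclosed while a browser silently closes them. The void-element list is the standard HTML one; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser

# Void elements never have an end tag, so they are never pushed on the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Naive bracket-style check: start tags push, matching end tags pop."""

    def __init__(self):
        super().__init__()
        self.open_tags = []       # stack of currently open element names
        self.stray_end_tags = []  # end tags that did not match the top of the stack

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <td /> leaves nothing open.
        pass

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.stray_end_tags.append(tag)


def check_balance(markup):
    """Return (tags left open, end tags that did not close the innermost tag)."""
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.open_tags, checker.stray_end_tags


print(check_balance("<div>Hello<span>world"))               # (['div', 'span'], [])
print(check_balance("<div>Hello<span>world</span></div>"))  # ([], [])
```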

Regular expression gives different results

Hello, I want to match a table tag, followed by any characters except another table, followed by an element with id ContentPlaceHolder1, and finally followed by the closing </table> tag.
I wrote this regexp:
~\<table[^>]*>.*?ContentPlaceHolder1.+?<\/table>~is
In my text editor (EmEditor) it works fine, but in my PHP script it matches from the first table tag of the page through all the code that follows.
Can anyone tell me what's wrong?
Thanks a lot
I am just assuming what you wish to achieve, and as Matt has commented on your question, a code snippet with an explanation of what exactly you are trying to achieve would help us help you.
So, in that context, I will try to guess the issue:
I'm guessing that your code has an element with id ContentPlaceHolder1 near the end and maybe nowhere else. What is leading me to assume that is that you are stating:
in PHP script this match the first table tag of page and al the followed code.
and also
want match a table tag, followed by any characters but not another table
That is not what happens, though. Your regex actually does the following:
Match the first <table> tag with any attributes there might be inside it ([^>]*)
Match any character as few times as possible (.*?)
Match ContentPlaceHolder1
Match at least one character, but as few as possible to make a match (.+?)
Match a closing <\/table> tag
I believe what you are misinterpreting is step #2. That step does not skip leading <table> tags; it only makes the match stop at the first occurrence of the keyword ContentPlaceHolder1 instead of the last one.
Consider the following example (please ignore that the HTML is broken; it's just an example):
<table border="3" cellpadding="10" cellspacing="10">
<td>
<table border="3" cellpadding="3" cellspacing="3">
<td>2nd table</td>
<some_element id="ContentPlaceHolder1"></some_element>
</table>
<td>2nd table</td>
<tr>
<td>2nd table</td>
<td>2nd table</td>
</tr>
</table>
<some_element id="ContentPlaceHolder1"></some_element>
</td>
<td> the cell next to this one has a smaller table inside of it, a table inside a table.</td>
</table>
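Here, .*? does not instruct the regex engine to avoid matching a second <table> tag; what it does is match up to the first occurrence of the keyword ContentPlaceHolder1 instead of greedily matching up to the last one. And because the engine reports the leftmost possible match, the match still begins at the very first <table> on the page; laziness only affects where the match ends.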
What you are trying to achieve can be done with a negative lookahead. It instructs the regex engine to look further ahead and make sure the first part does not match if the second part follows it. You can see this in practice in this demo, where I'm using a negative lookahead to instruct the regex engine to only match a <table> tag if it is not followed by another <table> tag (<table[^>]*>(?!.*<table[^>]*>)).
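The claims about the lazy .*? and the lookahead demo can be checked with Python's re module, which handles lazy quantifiers and lookahead the same way PCRE does for these patterns (re.S and re.I play the role of the s and i flags in ~...~is). The three-table page below is a made-up, trimmed example with the keyword in the middle table.

```python
import re

# Hypothetical three-table page; only the middle table holds the keyword.
PAGE = (
    '<table id="t1"><tr><td>a</td></tr></table>\n'
    '<table id="t2"><tr><td><a id="ContentPlaceHolder1">link</a></td></tr></table>\n'
    '<table id="t3"><tr><td>c</td></tr></table>\n'
)

# The question's pattern. The engine returns the leftmost match, so it starts
# at the first <table> and the lazy .*? simply stretches until the keyword.
question_re = re.compile(r"<table[^>]*>.*?ContentPlaceHolder1.+?</table>", re.S | re.I)

# The lookahead from the demo: a <table ...> tag with no other <table ...> after it.
last_table_re = re.compile(r"<table[^>]*>(?!.*<table[^>]*>)", re.S | re.I)

m = question_re.search(PAGE)
print(m.start(), m.group(0).startswith('<table id="t1">'))  # 0 True
print(last_table_re.search(PAGE).group(0))                  # <table id="t3">
```

On its own, the lookahead selects the last table tag on the page rather than the one holding the keyword, so it still has to be combined with the keyword test.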
Please review my answer, and if it does not solve your issue, please add more information and a sample of your code so that we can provide further assistance.
Regards
Thanks for the answer.
This is hypothetical page code:
<table border="3" cellpadding="10" cellspacing="10">
<tr>
<td>aaaaa</td>
</tr>
</table>
<table border="3" cellpadding="10" cellspacing="10">
<tr>
<td>aaaaa</td>
</tr>
</table>
<table border="3" cellpadding="10" cellspacing="10">
<tr>
<td>
<!-- from here -->
<table border="3" cellpadding="3" cellspacing="3">
<tr>
<td>aaaa</td>
<td><a id="ContentPlaceHolder1">link</a></td>
</tr>
</table>
<!-- to here -->
</td>
</tr>
</table>
<table border="3" cellpadding="10" cellspacing="10">
<tr>
<td>aaaaa</td>
</tr>
</table>
<table border="3" cellpadding="10" cellspacing="10">
<tr>
<td>aaaaa</td>
</tr>
</table>
I want to match from the first comment to the second.
In other words, I want to match the complete table that contains the element with id ContentPlaceHolder1.
My regexp in PHP matches starting from the first table tag on the page.
Thanks a lot
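For this page, one sketch (a swapped-in technique, not the demo from the answer) combines lazy matching with a negative lookahead applied at every character, often called a tempered token: between the opening tag and the keyword, the pattern refuses to step over any <table or </table tag, so only the innermost table holding ContentPlaceHolder1 can match. It is written with Python's re for testing; assuming PCRE behaves the same for these constructs, the same pattern should work with preg_match and the ~...~is delimiters. The function name is made up.

```python
import re

# Tempered token: advance one character at a time, but never across an
# opening or closing table tag before the keyword is reached.
INNER_TABLE_RE = re.compile(
    r"<table\b[^>]*>"            # an opening <table ...> tag
    r"(?:(?!</?table\b).)*?"     # any text that does not start another table tag
    r"ContentPlaceHolder1"       # the id we are looking for
    r".*?</table>",              # lazily up to the first closing tag after it
    re.S | re.I,
)


def table_with_placeholder(html):
    """Return the innermost table holding ContentPlaceHolder1, or None."""
    match = INNER_TABLE_RE.search(html)
    return match.group(0) if match else None


# Condensed version of the page from the question.
PAGE = """
<table border="3"><tr><td>aaaaa</td></tr></table>
<table border="3" cellpadding="10" cellspacing="10">
<tr><td>
<!-- from here -->
<table border="3" cellpadding="3" cellspacing="3">
<tr><td>aaaa</td><td><a id="ContentPlaceHolder1">link</a></td></tr>
</table>
<!-- to here -->
</td></tr>
</table>
<table border="3"><tr><td>aaaaa</td></tr></table>
"""

print(table_with_placeholder(PAGE))
```

The match ends at the first </table> after the keyword, so a table nested after the keyword would cut it short; for less regular markup, an HTML parser such as PHP's DOMDocument with DOMXPath is the more robust choice.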

XPath using siblings or following in two different cells

Put bluntly, I want to locate TestCoupon10% inside a td, then go to a sibling td, then locate //a[contains(@id,"cmdOpen")]. I did try sibling and following, but I probably didn't do it right, because
//span[./text()="TestCoupon10%"]/following-sibling:a[contains(@id,"cmdOpen")]
results in an invalid XPath. The HTML structure looks as follows:
<tr>
<td>
<span id="oCouponGrid_ctl03_lblCode">TestCoupon10%</span>
</td>
<td>...</td>
<td>...</td>
<td valign="middle" align=""right">
<a id="oCouponGrid_ctl03_cmdOpen">
</td>
</tr>
I need to find cmdOpen for the test coupon. Does anyone have an idea how to do that?
Axes are delimited with double colons, not single ones (those are used for namespace prefixes). You wanted to say this:
//span[./text()="TestCoupon10%"]/following-sibling::a[contains(#id,"cmdOpen")]
But - the <a> is not a following sibling of the <span> in question. You need to do some navigating:
//span[./text()="TestCoupon10%"]/parent::td/following-sibling::td/a[contains(#id,"cmdOpen")]
Or, simply avoid descending into the tree you you don't have to "climb up" again in the first place.
//td[span = "TestCoupon10%"]/following-sibling::td/a[contains(#id,"cmdOpen")]
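Python's xml.etree.ElementTree implements only a small XPath subset with no following-sibling axis, so this sketch emulates the last expression by hand: find the td whose span child has the coupon text, then look through the td elements after it in the same row for an a child whose id contains cmdOpen. The function name is made up, and the sample row is the question's markup adjusted (well-formed align attribute, self-closed <a>) so that it parses as XML.

```python
import xml.etree.ElementTree as ET


def open_links_for_coupon(root, coupon_code):
    """Hand-rolled equivalent of
    //td[span = coupon_code]/following-sibling::td/a[contains(@id, "cmdOpen")]
    """
    matches = []
    for row in root.iter("tr"):  # includes root itself if it is a <tr>
        cells = row.findall("td")  # direct td children, in document order
        for i, cell in enumerate(cells):
            # td[span = X]: some span child has the string value X
            if any("".join(span.itertext()) == coupon_code for span in cell.findall("span")):
                for later in cells[i + 1:]:          # following-sibling::td
                    for link in later.findall("a"):  # child a
                        if "cmdOpen" in link.get("id", "") and link not in matches:
                            matches.append(link)
    return matches


ROW = (
    '<tr><td><span id="oCouponGrid_ctl03_lblCode">TestCoupon10%</span></td>'
    "<td>...</td><td>...</td>"
    '<td valign="middle" align="right"><a id="oCouponGrid_ctl03_cmdOpen"/></td></tr>'
)

links = open_links_for_coupon(ET.fromstring(ROW), "TestCoupon10%")
print([link.get("id") for link in links])  # ['oCouponGrid_ctl03_cmdOpen']
```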

XPath help to find a unique value

I want to find the first tr tag with the text PONumber:, but I am not able to do that. Any help? I can find it using //table/tbody/tr/td[contains(text(),'PONumber')], but that returns 2 elements. I want only the first one.
<tr>
<td class="clsLabel" align="right"> PONumber: </td>
<td class="clsInput"> PN659 </td>
</tr>
<tr>
<td class="clsLabel" align="right"> PreviousPONumber: </td>
<td class="clsInput"/>
</tr>
You can use the following XPath to find exactly the element you want:
//tr/td[normalize-space(.)='PONumber:']
You can use something like
(//tr/td[contains(text(),'PONumber')])[1]
So: put the XPath in parentheses, and the [1] after them returns only the first entry of the whole result. Otherwise, you could also use something like:
//tr/td[contains(text(),'PONumber') and not(contains(text(),'Previous'))]
so "Previous" will be excluded from the search results
You can limit the XPath result to return only the first matched by using [1] :
(//table/tbody/tr/td[contains(.,'PONumber')])[1]
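To see why the contains() version returns two cells while the other answers return one, here is a Python sketch over the question's two rows using xml.etree.ElementTree, emulating the predicates by hand since its XPath subset has neither contains() nor normalize-space(). The helper names are made up; normalize_space mirrors XPath's normalize-space() for ordinary spaces, tabs and newlines, and the cell's full text stands in for both text() and ., which coincide here because each td holds a single text node.

```python
import xml.etree.ElementTree as ET

ROWS = """<table><tbody>
<tr><td class="clsLabel" align="right"> PONumber: </td><td class="clsInput"> PN659 </td></tr>
<tr><td class="clsLabel" align="right"> PreviousPONumber: </td><td class="clsInput"/></tr>
</tbody></table>"""


def normalize_space(text):
    """Trim both ends and collapse inner whitespace runs to a single space."""
    return " ".join(text.split())


def cells(root, predicate):
    """Like //tr/td[predicate]: td children of every tr, in document order."""
    return [td for tr in root.iter("tr") for td in tr.findall("td")
            if predicate("".join(td.itertext()))]


root = ET.fromstring(ROWS)
by_contains = cells(root, lambda t: "PONumber" in t)                 # both label cells
by_exact = cells(root, lambda t: normalize_space(t) == "PONumber:")  # only the first
first = by_contains[0]  # (//tr/td[contains(...)])[1]: first in document order
```

The parentheses matter: without them, //tr/td[contains(text(),'PONumber')][1] applies [1] per tr, so on these rows it would still return both label cells.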

html table syntax validation

This should be an easy one.
I have a table like so:
<table>
<tr>
<td></td><td></td><td></td><td></td>
</tr>
<tr>
<td></td>
</tr>
</table>
My Firefox 3 validator says this is acceptable code. It just seems wrong to me; are there any possible issues with leaving the table rows uneven like this? It works in IE7 too.
You should use the rowspan or colspan attributes:
<table>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td colspan="3"></td>
</tr>
</table>
Table rows are not required to have the same number of cells. The number of columns in the table is determined from the row with most cells.
Your second table row will just have three cells that are blank (which is not the same as empty cells).
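The rule in this answer (the column count comes from the widest row) can be sketched in Python with xml.etree.ElementTree: sum each row's cells, counting colspan (default 1), take the maximum, and the difference per row is the number of cells the table treats as blank. This simplified sketch ignores rowspan, which the full HTML table model also has to account for; the function name is made up.

```python
import xml.etree.ElementTree as ET


def column_report(table_markup):
    """Return (column count, blank cells per row), ignoring rowspan."""
    table = ET.fromstring(table_markup)
    widths = [
        sum(int(cell.get("colspan", "1")) for cell in row if cell.tag in ("td", "th"))
        for row in table.iter("tr")
    ]
    columns = max(widths, default=0)
    return columns, [columns - width for width in widths]


QUESTION_TABLE = """<table>
<tr><td></td><td></td><td></td><td></td></tr>
<tr><td></td></tr>
</table>"""

print(column_report(QUESTION_TABLE))  # (4, [0, 3])
```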
If you want to use uneven numbers of rows/columns, you should use the rowspan and/or colspan attributes to indicate this.
eg:
<table>
<tr><td></td><td></td><td></td></tr>
<tr><td colspan="3"></td></tr>
</table>
As guffa corrected me below, colspan isn't technically needed, but it never hurts to be explicit about your intent.
Well, there are no syntax errors there, and I really can't see why you should be sceptical about a table like that, as long as you use the colspan attribute of the td-element:
<table>
<tr>
<td></td><td></td><td></td><td></td>
</tr>
<tr>
<td colspan="3"></td>
</tr>
</table>
Hope that helped.
That code is fine from a structural point of view. It's valid XHTML. Compare this:
<orders>
<order id='2009/1'>
<item id='1'/><item id='2'/><item id='3'/>
</order>
<order id='2009/2'>
<item id='33'/>
</order>
</orders>
It might look strange though, hence the suggestion to use colspan. That way you can get the single TD to fill up the row, instead of being the width of the TD above it.