I have a complex HTML structure with a lot of tables and divs, and the structure might change. How can I write an XPath that skips the elements in between?
For example:
<table>
<tr>
<td>
<span>First Name</span>
</td>
<td>
<div>
<table>
<tbody>
<tr>
<td>
<div>
<table>
<tbody>
<tr>
<td>
<img src="1401-2ATd8" alt="" align="middle">
</td>
<td><span><input tabindex="2" id="MainLimitLimit" type="text"></span></td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</tbody>
</table>
</div>
</td>
</tr>
</table>
I have to get the input element relative to the "First Name" span, e.g.:
By.xpath("//span[contains(text(), 'First Name')]/../../td[2]/div/table/tbody/tr/td/table/tbody/tr/td[2]/input")
But can I skip the HTML in between and access the input element directly? Something like:
By.xpath("//span[contains(text(), 'First Name')]/../../td[2]//input[contains(@id,'MainLimitLimit')]")
You can try this XPath:
//td[contains(span,'First Name')]/following-sibling::td[1]//input[contains(@id, 'MainLimitLimit')]
Explanation:
Select the <td><span>First Name</span></td> element:
//td[contains(span,'First Name')]
Then get the <td> element immediately after that <td>:
/following-sibling::td[1]
Then get the <input> element at any depth inside the <td> selected in the second step:
//input[contains(@id, 'MainLimitLimit')]
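The three steps can also be mirrored in plain Python to see what each one does. This is only a sketch: it uses the standard library's xml.etree.ElementTree, which does not support following-sibling or contains(), so the steps are written as loops. MARKUP is a trimmed, well-formed copy of the question's HTML, and the function name find_input_after_label is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's markup (void tags are
# self-closed so that the XML parser accepts them).
MARKUP = """
<table>
  <tr>
    <td><span>First Name</span></td>
    <td>
      <div><table><tbody><tr><td>
        <div><table><tbody><tr>
          <td><img src="1401-2ATd8" alt="" align="middle"/></td>
          <td><span><input id="MainLimitLimit" type="text"/></span></td>
        </tr></tbody></table></div>
      </td></tr></tbody></table></div>
    </td>
  </tr>
</table>
"""


def find_input_after_label(root, label, id_fragment):
    """Return the first matching <input> in the cell after the labelled cell."""
    for row in root.iter("tr"):
        cells = row.findall("td")
        for index, cell in enumerate(cells[:-1]):
            span = cell.find("span")
            # Step 1: td[contains(span, label)]
            if span is None or label not in "".join(span.itertext()):
                continue
            # Step 2: following-sibling::td[1]
            next_cell = cells[index + 1]
            # Step 3: //input[contains(@id, id_fragment)], at any depth
            for candidate in next_cell.iter("input"):
                if id_fragment in candidate.get("id", ""):
                    return candidate
    return None


root = ET.fromstring(MARKUP)
found = find_input_after_label(root, "First Name", "MainLimitLimit")
print(found.get("id"))
```

Because step 3 searches at any depth, the nested div/table/tbody wrappers can change without breaking the lookup, which is the point of using // in the XPath.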
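You can use //, which matches descendants at any depth. Go up from the span to its row first, because the second td is not inside the span:
By.xpath("//span[contains(text(), 'First Name')]/../..//td[2]//input[contains(@id,'MainLimitLimit')]")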
You can use the "First Name" span as a predicate. Try the expression below:
//td[preceding-sibling::td[span[contains(text(), 'First Name')]]]//input[contains(@id,'MainLimitLimit')]
I'm using Blade to fill some tables with content but in some cases a table might end up empty when there is nothing to fill.
Here is part of the PHP / Blade template:
<table class="table">
@isset ($content->client)
<tr>
<td>
Client:
</td>
<td class="text-right">
{{ $content->client }}
</td>
</tr>
@endisset
@isset ($content->published)
<tr>
<td>
Published:
</td>
<td class="text-right">
{{ $content->published }}
</td>
</tr>
@endisset
</table>
In case $content->client and $content->published are not set the result is something like:
<table class="table">
</table>
Is there a simple CSS way to remove the table entirely in these cases?
I'm familiar with the :empty selector, but apparently that doesn't work if there is whitespace inside the tag :(
I would suggest not printing the table at all when neither of the variables is set.
<?php
if( isset($content->client) || isset($content->published))
{
// echo table
}
?>
Did you try :blank? It also selects whitespace while :empty does not.
I'm using Scrapy (Python) to try to grep data from a site.
How can I grep this structure with XPath?
<div class="foo">
<h3>Need this text_1</h3>
<table class="thesamename">
<tbody>
<tr>
<td class="tmp_year">
45767
</td>
<td class="tmp_outcome">
<b>Win_1</b><br>
<span class="tmp_category">TEST_1</span>
</td>
</tr>
<tr>
<td class="tmp_year">
1232004
</td>
<td class="tmp_outcome">
<b>Win_2</b><br>
<span class="tmp_category">TEST_2</span>
</td>
</tr>
<tr>
<td class="tmp_year">
122004
</td>
<td class="tmp_outcome">
<b>Win_3</b><br>
<span class="tmp_category">TEST_3</span>
</td>
</tr>
</tbody>
</table>
<h3>Need this text_2</h3>
<table class="thesamename">
<tbody>
<tr>
<td class="tmp_year">
234
</td>
<td class="tmp_outcome">
<b>Win_E</b><br>
<span class="tmp_category">TEST_E</span>
</td>
</tr>
<tr>
<td class="tmp_year">
3476
</td>
<td class="tmp_outcome">
<b>Win_C</b><br>
<span class="tmp_category">TEST_C</span>
</td>
</tr>
</tbody>
</table>
<h3>Need this text_3</h3>
<table class="thesamename">
<tbody>
<tr>
<td class="tmp_year">
85567
</td>
<td class="tmp_outcome">
<b>Win_T</b><br>
<span class="tmp_category">TEST_T</span>
</td>
</tr>
<tr>
<td class="tmp_year">
435656
</td>
<td class="tmp_outcome">
<b>Win_A</b><br>
<span class="tmp_category">TEST_A</span>
</td>
</tr>
<tr>
<td class="tmp_year">
980
</td>
<td class="tmp_outcome">
<b>Win_Z</b><br>
<span class="tmp_category">TEST_Z</span>
</td>
</tr>
</tbody>
</table>
</div>
I would like to have output with this structure:
"Section": {
Need this text_1 :
[45767 : Win_1 : TEST_1]
[1232004 : Win_2 : TEST_2]
[122004: Win_3 : TEST_3]
,
Need this text_2:
[234 : Win_E : TEST_E]
[3476 : Win_C : TEST_C]
,
Need this text_3:
[85567 : Win_T : TEST_T]
[435656 : Win_A : TEST_A]
[980: Win_Z : TEST_Z]
}
How can I write the proper XPath selectors to get this structure?
I can select all the "h3" elements, all the "a" elements, and all the tags with a given class separately, but how can I match them up?
Grep, you say? LOL. Strictly speaking that's the wrong word, so to keep the jargon clean: what you're doing is parsing/extracting. New to Scrapy, or to the web dev side of things? No matter. There's no way I can teach you to XPath/regex like a pro in one answer here; the only way is to keep at it, but I'll throw in my input.
First of all, XPath is amazingly useful for websites that are not necessarily built to standard, which doesn't make them bad per se. But the HTML snippet you gave is structured all right, so I'd recommend CSS selectors. These get the values:
year = response.css('td.tmp_year a::text').extract()
outcome = response.css('td.tmp_outcome b::text').extract()
category = response.css('span.tmp_category::text').extract()
Pro tip: whenever you need to, you can save a web page as an HTML file and open it in scrapy shell by passing the file path. I saved your HTML snippet to a file on my desktop and then ran:
scrapy shell file:///home/scriptso/Desktop/letsGREPlol.html
Anyway, as far as XPath goes, since you asked: piece of cake. Compare the XPath with the CSS and see if you can spot the correspondence.
response.css('td.tmp_outcome b::text').extract()
This matches a td tag whose class name is tmp_outcome, then the bold tag inside it, and ::text says we want that tag's text.
response.xpath('//td[@class="tmp_outcome"]/b/text()').extract()
The XPath says the same thing: start with a td tag anywhere in the page whose class is tmp_outcome, then the bold tag, and /text() selects the text. Likewise, /@href selects, you guessed it, the href attribute.
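The selectors above return three flat lists, which does not by itself tie each row to its h3 heading. A minimal sketch of the grouping follows, using only the standard library's xml.etree.ElementTree on a trimmed, well-formed copy of the snippet (closing tags added, br self-closed, year text taken straight from the td). The name group_rows_by_heading is made up for illustration. In a Scrapy callback the same idea would probably be a loop over the h3 selectors with a relative following-sibling::table[1] step, but that variant is not tested here.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's markup.
HTML_SNIPPET = """
<div class="foo">
  <h3>Need this text_1</h3>
  <table class="thesamename"><tbody>
    <tr>
      <td class="tmp_year"> 45767 </td>
      <td class="tmp_outcome"><b>Win_1</b><br/>
        <span class="tmp_category">TEST_1</span></td>
    </tr>
    <tr>
      <td class="tmp_year"> 1232004 </td>
      <td class="tmp_outcome"><b>Win_2</b><br/>
        <span class="tmp_category">TEST_2</span></td>
    </tr>
  </tbody></table>
  <h3>Need this text_2</h3>
  <table class="thesamename"><tbody>
    <tr>
      <td class="tmp_year"> 234 </td>
      <td class="tmp_outcome"><b>Win_E</b><br/>
        <span class="tmp_category">TEST_E</span></td>
    </tr>
  </tbody></table>
</div>
"""


def group_rows_by_heading(container):
    """Map each h3 text to the (year, outcome, category) rows of the table after it."""
    sections = {}
    current = None
    # Walk the direct children in document order, so every table is
    # attached to the most recent h3 seen before it.
    for child in container:
        if child.tag == "h3":
            current = child.text.strip()
            sections[current] = []
        elif child.tag == "table" and current is not None:
            for row in child.iter("tr"):
                year = row.find("td[@class='tmp_year']").text.strip()
                outcome = row.find("td[@class='tmp_outcome']/b").text.strip()
                category = row.find(
                    "td[@class='tmp_outcome']/span[@class='tmp_category']"
                ).text.strip()
                sections[current].append((year, outcome, category))
    return sections


sections = group_rows_by_heading(ET.fromstring(HTML_SNIPPET))
for heading, rows in sections.items():
    print(heading, rows)
```

Keeping the heading as the dictionary key and the rows as a list of tuples gives the nested shape asked for in the question.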
I am trying to select td.col_4 within the following HTML structure using Capybara, but to no avail so far:
<div id="potentialResults">
<div class="result">
<table class="report">
<tbody>
<tr>
<td class="col_1">
<img original-title="Sources : Telephone Directory">
<u>Name 1</u>
</td>
<td class="col_2"></td>
<td class="col_3"></td>
<td class="col_4">Address 1</td>
</tr>
</tbody>
</table>
</div>
<div class="result">
<table class="report">
<tbody>
<tr>
<td class="col_1">
<img original-title="Sources : Telephone Directory">
<u>Name 2</u>
</td>
<td class="col_2"></td>
<td class="col_3"></td>
<td class="col_4">Address 2</td>
</tr>
</tbody>
</table>
</div>
</div>
So at the moment, to get the text "Address 1" for "Name 1", I do this:
page.find("#potentialResults > .result > .report > tbody > tr > td.col_1 > a", text: "Name 1", match: :first).find('td.col_4').text
But where I seem to be struggling is getting the same address using the img attribute as my identifier:
page.find("#potentialResults > .result > .report > tbody > tr > td.col_1 > img[original-title='Sources : Telephone Directory'] + td.col_4", match: :first).text
But td.col_4 isn't exactly adjacent in this example, is it?
How else would I be able to get the text when stipulating that it has to be the first match?
The CSS you're passing to your finders is far longer than it needs to be. It can be shortened without running into ambiguous results.
If you simply want the text from td.col_4, just do:
find('#potentialResults .col_4').text
If you want the address for 'Name 1', you need to first find that element, traverse up to the parent by a few levels, then find your way back down. This is because the elements you are looking for are siblings, and Capybara doesn't really provide an easy way to find these:
find('#potentialResults .col_1 a', :text => 'Name 1').find(:xpath, '../../..').find('.col_4').text
Here is the html code:
<table>
<tr class="WhiteRow">
<td align="center">
<input id="SelectedDelivery1" type="checkbox" onclick="HandleClick(this.name,this.checked,"")" value="Y" name="SelectedDelivery1">
</td>
<td valign="top">
<span></span>
<span class="bold">Instrument Search</span>
<br>
abc (TRANSFER)
</td>
<td align="center">5 minutes</td>
<td class="noborder" align="right">
<td class="noborder" align="right">
<td class="noborder" align="right">
<td class="noborder" align="right">
</tr>
<tr>
<td align="center">
<input id="SelectedDelivery2" type="checkbox" onclick="HandleClick(this.name,this.checked,"")" value="Y" name="SelectedDelivery1">
</td>
<td valign="top">
<span></span>
<span class="bold">Instrument Search</span>
<br>
abc (CAVEAT)
</td>
...
</tr>
</table>
I would like to target the <tr> containing <span class="bold">Instrument Search</span> and abc (TRANSFER). That tr may not be the first element in the table.
So far I tried
//td/span[text()="Instrument Search"]/ancestor::tr
which only satisfies one of the conditions, and there are a few tr elements that match the selector.
Could you please advise me how to target both conditions?
Use the following XPath expression:
//tr[contains(., 'abc (TRANSFER)') and contains(td/span[@class = 'bold'], 'Instrument Search')]
If possible, you should always use expressions that are unidirectional, because a "backwards" axis like ancestor:: could be a costly move. That's the advantage over the solution you have found already.
If the span[@class = 'bold'] cannot contain anything other than "Instrument Search", you should modify the expression above to:
//tr[contains(., 'abc (TRANSFER)') and td/span[@class = 'bold'] = 'Instrument Search']
The location of "abc (TRANSFER)" is still not very precise. If it is required in a certain place (e.g. always inside a td element), you'd have to restrict the expression further.
EDIT Responding to your comment:
abc (TRANSFER) is inside td tag, it's just a text field
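Then use a predicate on td, because contains(td, ...) converts the node-set to the string value of its first node and so would only test the first td in each row:
//tr[td[contains(., 'abc (TRANSFER)')] and td/span[@class = 'bold'] = 'Instrument Search']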
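To make the two conditions concrete, here is a minimal Python sketch of the same row test using only the standard library's xml.etree.ElementTree. It checks every td in a row for the detail text and every span with class "bold" for an exact label match. TABLE is a trimmed, well-formed copy of the question's markup (onclick attributes and empty cells dropped), and matching_rows is a name made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's table.
TABLE = """
<table>
  <tr class="WhiteRow">
    <td align="center"><input id="SelectedDelivery1" type="checkbox"/></td>
    <td valign="top"><span></span><span class="bold">Instrument Search</span><br/>
      abc (TRANSFER)</td>
    <td align="center">5 minutes</td>
  </tr>
  <tr>
    <td align="center"><input id="SelectedDelivery2" type="checkbox"/></td>
    <td valign="top"><span></span><span class="bold">Instrument Search</span><br/>
      abc (CAVEAT)</td>
  </tr>
</table>
"""


def matching_rows(root, label, detail):
    """Return the rows that satisfy both conditions at once."""
    rows = []
    for row in root.iter("tr"):
        cells = row.findall("td")
        # Condition 1: some td/span[@class='bold'] whose text equals the label.
        has_label = any(
            "".join(span.itertext()) == label
            for cell in cells
            for span in cell.findall("span[@class='bold']")
        )
        # Condition 2: some td whose full text (including text that follows
        # child elements such as <br/>) contains the detail string.
        has_detail = any(detail in "".join(cell.itertext()) for cell in cells)
        if has_label and has_detail:
            rows.append(row)
    return rows


rows = matching_rows(ET.fromstring(TABLE), "Instrument Search", "abc (TRANSFER)")
print([row.get("class") for row in rows])
```

Both checks start at the row and look downwards, so no ancestor step is needed.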
I found an answer myself after going through the syntax.
Please let me know if there are any better ways:
//td/span[text()="Instrument Search"]/ancestor::td/text()[contains(., "TRANSFER")]/ancestor::tr