I have this HTML:
<tbody>
<tr id="1">
<td>foo faa</td>
<td>faa fii</td>
<td>foo faa</td>
<td>faa fuu</td>
</tr>
<tr id="2">
<td>foo fuu</td>
<td>fyy fuu</td>
<td>foo foo</td>
<td>fuu fii</td>
</tr>
<tr id="3">
<td>fuu faa</td>
<td>fii fuu</td>
<td>fuu fuu</td>
<td>fyy fee</td>
</tr>
<tr id="4">
<td>foo foo</td>
<td>fee faa</td>
<td>fee fyy</td>
<td>foo fuu</td>
</tr>
</tbody>
The td elements in my example each contain two words, but in my real case a td may contain more words, and a tr may have more than 4 td children.
I want to select tr elements based on the inner text of their children, and I want to be able to search for multiple values.
For example:
If I search for "fuu" and "foo" and "fii", the XPath must return the tr elements with id 1 and 2.
If I search for "fuu" and "fii", the XPath must return the tr elements with id 1, 2 and 3.
If I search only for "fee", the XPath must return the tr elements with id 3 and 4.
I tried this:
//tr[*[contains(text(), 'fuu')] and *[contains(text(), 'foo')] and *[contains(text(), 'fii')]]
It works as expected (http://xpather.com/Tdg5OGr2). But maybe a more generic or proper solution exists. Any ideas?
If I want to search for, say, ten words, the XPath will become really big.
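One way to keep a ten-word search manageable is to generate the expression instead of writing it by hand. The following is only a sketch: the helper name build_row_xpath is hypothetical, and it assumes the search words contain no single quotes. It produces the same shape of expression as the one tried above.

```python
def build_row_xpath(words):
    """Build an XPath that selects every <tr> having, for each word,
    at least one child element whose first text node contains that word."""
    if not words:
        # No search terms: select every row.
        return "//tr"
    for word in words:
        # Assumption: quoting is not handled, so reject words with quotes.
        if "'" in word:
            raise ValueError("search words must not contain single quotes")
    conditions = ["*[contains(text(), '{}')]".format(word) for word in words]
    return "//tr[" + " and ".join(conditions) + "]"


print(build_row_xpath(["fuu", "foo", "fii"]))
```

The resulting string can be passed to whatever XPath engine is in use; only the list of words changes between searches.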
I apply a relative XPath (./) to an HtmlElement and it doesn't return any results. When I try using double dots (../), it returns all results matching from root HTML instead of descendant results of that specific HtmlElement. I am not sure what is wrong here.
The version of lxml is 4.5.2
Example:
<html>
<h3>
<p>
<table>
<tr>
<td>Sample</td>
<td>Sample</td>
</tr>
</table>
</p>
<h3>
<p>
<table>
<tr>
<td>Sample 2</td>
<td>Sample 2</td>
</tr>
</table>
</p>
</html>
Code
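import requests
from lxml import html

r = requests.get('http://website.com')
tree = html.fromstring(r.content)
tables = tree.xpath('//p/table')
for table in tables:
    # xpath() returns a list of elements, so read the text of each cell
    cells = table.xpath('.//td')
    texts = [cell.text_content() for cell in cells]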
The first iteration in the loop should return "Sample" texts and the second iteration should return "Sample 2" texts.
The problem was with the HTML itself. When I inspect the document in a browser, it shows <p> as the parent of the <table> elements; however, the HTML actually returned by the request revealed that <p></p> is a sibling element preceding <table>.
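The situation described above can be reproduced with a small sketch. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, and the SERVED markup is a made-up, well-formed stand-in for what the server returned. The relative path .//td is the same one used in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each <p> is an empty sibling preceding its <table>.
SERVED = """
<html><body>
<h3>First</h3><p></p>
<table><tr><td>Sample</td><td>Sample</td></tr></table>
<h3>Second</h3><p></p>
<table><tr><td>Sample 2</td><td>Sample 2</td></tr></table>
</body></html>
"""


def tables_inside_p(markup):
    """Return the tables that are children of a <p> (none in SERVED)."""
    root = ET.fromstring(markup.strip())
    return root.findall(".//p/table")


def cell_texts_per_table(markup):
    """Return one list of cell texts per table, using a relative path."""
    root = ET.fromstring(markup.strip())
    per_table = []
    for table in root.iter("table"):
        # ".//td" only searches below the current table element.
        per_table.append([td.text for td in table.findall(".//td")])
    return per_table


print(tables_inside_p(SERVED))
print(cell_texts_per_table(SERVED))
```

Selecting the tables directly, rather than through p/table, is what lets the relative search find the cells of each table separately.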
I want to add two rows inside a td, as in the picture below, but I can't find a solution for this.
You can do this by using a nested table:
http://www.corelangs.com/html/tables/table-inside-table.html
You asked how to add a tr inside a td.
The only way to have a tr inside a td is to create another table inside the td; that inner table can then contain the tr.
Example:
<table>
<tr>
<td>
<table>
<tr>
<td>
...
</td>
</tr>
</table>
</td>
</tr>
</table>
But the image you added suggests that you want to merge two rows; for that you need to use rowspan.
Example of rowspan: https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_td_rowspan
Try using the rowspan attribute on the cell.
https://www.w3schools.com/tags/att_td_rowspan.asp
Maybe you can use the rowspan attribute on td tags in your HTML: https://www.w3schools.com/tags/att_td_rowspan.asp
Example:
<table>
<tr>
<th>B/B LC NO Value USD</th>
<th>ABP Value USD</th>
<th>Exp N& Date</th>
</tr>
<tr>
<td rowspan="2">$ xxxxx<br/>$ -</td>
<td>$100</td>
<td>$50</td>
</tr>
<tr>
<td>$1</td>
<td>$80</td>
</tr>
</table>
Give rowspan="2" to the fields that you don't want to split in two, and then skip the cells that would normally be under them.
<table>
<tr>
<td rowspan="2">Margaret Nguyen</td>
<td>427311</td>
<td><time datetime="2010-06-03">June 3, 2010</time></td>
<td>0.00</td>
</tr>
<tr>
<td>533175</td>
<td><time datetime="2011-01-13">January 13, 2011</time></td>
<td>37.00</td>
</tr>
</table>
This is my test data:
<tbody>
<tr>
<td>foo 1</td>
<td>first interest</td>
<td>bar 1</td>
</tr>
<tr>
<td>foo 2</td>
<td>
<p>second interest</p>
</td>
<td>bar 2</td>
</tr>
<tr>
<td>
</td>
<td>
</td>
<td>
</td>
</tr>
</tbody>
I'd like to select the text of the second cell (td[2]) of every table row, but the problem is that the text can be inside a subelement (a paragraph p).
When I execute the XPath //tbody/tr[1]/td[2]/p/text() | //tbody/tr[1]/td[2]/text() the result is OK, but if I execute the same for the second row, //tbody/tr[2]/td[2]/p/text() | //tbody/tr[2]/td[2]/text(), I get three text nodes, of which the first and last contain only whitespace. How can I modify the XPath so that I always get only the text I'm interested in? Note: there can also be empty cells, which I don't want to get.
Thanks.
Try this XPath to get text from required (not empty second) table cells:
//tbody/tr/td[2]//text()[normalize-space()]
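To illustrate what the normalize-space() filter does, here is a rough standard-library equivalent written with xml.etree.ElementTree, which does not support text() or normalize-space() itself. The function name second_cell_texts is hypothetical, and ROWS is the test data from the question in compact form. The loop keeps every text fragment below td[2] that is not whitespace-only, which is the same filtering idea.

```python
import xml.etree.ElementTree as ET

ROWS = """<tbody>
<tr><td>foo 1</td><td>first interest</td><td>bar 1</td></tr>
<tr><td>foo 2</td><td>
<p>second interest</p>
</td><td>bar 2</td></tr>
<tr><td>
</td><td>
</td><td>
</td></tr>
</tbody>"""


def second_cell_texts(markup):
    """Collect the non-blank text fragments found in each row's second cell."""
    tbody = ET.fromstring(markup)
    found = []
    for td in tbody.findall("./tr/td[2]"):
        # itertext() walks the cell's own text and that of its descendants.
        for fragment in td.itertext():
            if fragment.strip():  # skip whitespace-only fragments
                found.append(fragment)
    return found


print(second_cell_texts(ROWS))
```

The whitespace-only cell in the third row contributes nothing, and the text nested in the p element is found without naming p explicitly.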
<table border="1">
<tbody>
<tr>
<th>ID</th>
<th>Product</th>
<th>Color</th>
<th>Model</th>
</tr>
<tr>
<td>22</td>
<td>Car</td>
<td>blue</td>
<td>
<ul>
</ul>
</td>
</tr>
</tbody>
</table>
Above is a snippet of a highly nested HTML document. To get to the table level I have used the following XPath:
//th[contains(text(), "ref_code")]/following-sibling::td[contains(text(), "197")]/ancestor::table[2]
How can I then edit the same XPath to select specific table headers and the corresponding table data columns, like so:
ID |Product |Color
22 |Car |Blue
Any help will be appreciated
From your comments to the answers given here:
I assume that you get the above table from an existing XPath, which is:
//th[contains(text(), "ref_code")]/following-sibling::td[contains(text(), "197")]/ancestor::table[2]
Now you want to extend this XPath so that you get the td values for a given column, e.g. Color. The XPath below should give you the td values of every column up to and including Color:
//td[position()<=(count(//tr/th[.='Color']/preceding-sibling::*)+1) ]
Assuming your first XPath works correctly, append the above XPath to it like this:
//th[contains(text(), "ref_code")]/following-sibling::td[contains(text(), "197")]/ancestor::table[2]//td[position()<=(count(//tr/th[.='Color']/preceding-sibling::*)+1) ]
Output:
<td>22</td>
<td>Car</td>
<td>blue</td>
If you want just the Color, use this XPath:
//td[(count(//tr/th[.='Color']/preceding-sibling::*)+1) ]
If you want just the Product, use this XPath:
//td[(count(//tr/th[.='Product']/preceding-sibling::*)+1) ]
If you want just the ID, use this XPath:
//td[(count(//tr/th[.='ID']/preceding-sibling::*)+1) ]
Note that the XPath changes only at th[.='XXX'], where XXX is the selected column header.
But if you want the output in the form of a table, you need to use XSLT, because you are trying to get a transformed view of your HTML, not just selected elements.
We search for the table data //table//td by the position of the column's header, //table//th[text()='Color'].
The expression [count(element/preceding-sibling::*) +1] is how to find an element's index.
So the result is:
//table//td[count(//table//th[text()='Color']/preceding-sibling::*) +1]
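The same "header position, then cell position" idea can be sketched in Python. This is an illustration under assumptions: it uses the standard library's xml.etree.ElementTree, which lacks count() and preceding-sibling, so the index is computed in Python and then used as a positional predicate. The function name column_values is hypothetical, and TABLE is the snippet from the question in compact form.

```python
import xml.etree.ElementTree as ET

TABLE = """<table border="1"><tbody>
<tr><th>ID</th><th>Product</th><th>Color</th><th>Model</th></tr>
<tr><td>22</td><td>Car</td><td>blue</td><td><ul></ul></td></tr>
</tbody></table>"""


def column_values(markup, header):
    """Return the text of every td located under the given column header."""
    table = ET.fromstring(markup)
    headers = [th.text for th in table.findall(".//tr/th")]
    # XPath positions are 1-based, hence the +1 (same role as count(...)+1).
    position = headers.index(header) + 1
    cells = table.findall(".//tr/td[{}]".format(position))
    return [td.text for td in cells]


for name in ("ID", "Product", "Color"):
    print(name, column_values(TABLE, name))
```

Calling the function once per wanted header yields the ID, Product and Color columns separately; arranging them as a table is left to the caller.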
Why do Header1 and Header2 not appear on every page when printing in landscape?
https://fiddle.jshell.net/6mvucked/
It seems the header height is limited: if it exceeds a certain value, the header is not shown on every page. Why?
Your javascript will append 100 rows to only those containers whose id is "test".
So if you want the same to happen with the tbody of header, then simply write
<tbody id="test">
in the one in the header.
But in that case, it will only print 100 rows for the header tbody and not the other second tbody, as Javascript will append 100 rows to the 1st tag with id="test".
So if you need to append 100 or x number of rows to both or many tbody, then give them separate ids and hence write separate functions for them in javascript.
Like this:
<table>
<thead>
<tr>
<td>ok , no problem, but show only in first page and not repeat</td>
<td>
<table>
<tbody id="test-one">
<tr><td>header not be shown if this code(table) here</td></tr>
</tbody>
</table>
</td>
</tr>
</thead>
<tbody id="test-two">
</tbody>
<tfoot>
<tr>
<td>no problem</td>
</tr>
</tfoot>
</table>
And the JavaScript like this:
for (var i = 1; i <= 100; i++) {
  // Braces are required so that both appends run on every iteration
  $('#test-one').append('<tr><td colspan="2">row ' + i + '</td></tr>');
  $('#test-two').append('<tr><td colspan="2">row ' + i + '</td></tr>');
}