I have an HTML tree which looks like this:
<div id="RF4FOEQ3OPBEX" data-hook="review" class="a-section review aok-relative"><div
<div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content reviewText review-text-content a-expander-partial-collapse-content">
<span>
Text line1.
<br>
Text line2.
</span>
I am trying to extract all the text from the span with the following XPath expression:
//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span/text()
However, this approach only returns the first line of text, up to the line break. How should I approach this in order to extract the full text content of the span tag? I would appreciate any help, and thank you in advance.
Use // and the getall() method to get all the text inside a specific element.
getall() returns a list, so just join it:
txt = "".join(response.xpath('//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span//text()').getall())
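The difference between taking one text node and joining all of them can be checked offline. The following is a minimal sketch that uses the standard-library xml.etree.ElementTree as a stand-in for the Scrapy selector (an assumption, so that it runs without Scrapy); the markup is a simplified, well-formed copy of the snippet from the question, with the br tag self-closed.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the question's markup (assumption:
# <br/> is self-closed so the stdlib XML parser accepts it).
SNIPPET = """<div data-hook="review">
<div data-hook="review-collapsed">
<span>
Text line1.
<br/>
Text line2.
</span>
</div>
</div>"""

root = ET.fromstring(SNIPPET)
span = root.find('.//div[@data-hook="review-collapsed"]/span')

# span.text holds only the text before the first child element,
# which is comparable to keeping just the first text node.
first_only = span.text.strip()

# itertext() walks every text node below the span, which is roughly
# what //text() combined with getall() returns in Scrapy.
parts = [t.strip() for t in span.itertext() if t.strip()]
full_text = " ".join(parts)

print(first_only)
print(full_text)
```

Joining with a space instead of an empty string is a design choice here: it keeps the two lines from running together where the br tag was.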
Related
I have this HTML
<p>
<strong>aquiline</strong>
<i> adj. </i>
of or like the eagle.
</p>
This whole node is wrapped by a div with class="field-item even".
I would like to receive: aquiline adj. of or like the eagle. For now I have this incorrect XPath: response.xpath('//div[@class="field-item even"]//descendant-or-self::p/text()').getall()
Your XPath is almost correct. Replace p with * to select the text nodes of all elements, not only those of paragraph tags. Also, by using the normalize-space function you get all the text as one string instead of a list. See the code snippet below.
response.xpath('normalize-space(//div[@class="field-item even"]//descendant-or-self::*)').get()
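To see what the normalization does to the text, here is a rough standard-library sketch. It assumes xml.etree.ElementTree in place of the Scrapy selector and uses " ".join(text.split()) as an approximation of XPath's normalize-space (trim both ends, collapse runs of whitespace to one space).

```python
import xml.etree.ElementTree as ET

# Markup from the question, wrapped in the div described there.
SNIPPET = """<div class="field-item even">
<p>
<strong>aquiline</strong>
<i> adj. </i>
of or like the eagle.
</p>
</div>"""

root = ET.fromstring(SNIPPET)
p = root.find("p")

# String value of the element: all descendant text nodes concatenated.
raw = "".join(p.itertext())

# Approximation of normalize-space(): strip and collapse whitespace.
normalized = " ".join(raw.split())

print(normalized)
```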
I am using Selenium in Python with Firefox and I want to get the text "TextB" from the following portion of HTML. I have tried element.get_attribute('textContent'), but it returns "TextA" too. Is there any way of getting ONLY that text?
<p class="class_name" id="id_name">
<i class="class_name2"></i>
<b>TextA</b>
TextB
</p>
Try to get text content of the last child only
element = driver.find_element_by_id('id_name')
driver.execute_script('return arguments[0].lastChild.textContent', element)
As per the HTML you have provided, a more Pythonic way to extract the text TextB would be to use the splitlines() method as follows:
myText = driver.find_element_by_xpath("//p[@class='class_name' and @id='id_name']").get_attribute("innerHTML").splitlines()[3]
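The following gives you TextA, which is the text of the b tag:
driver.find_element_by_xpath("//p[@class='class_name']/b").text
The following gives you the text of the p tag. Note that .text also includes the text of child elements, so the result contains TextA as well as TextB:
driver.find_element_by_xpath("//p[@class='class_name']").text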
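The idea of reading only the trailing text node (the lastChild approach) can be tried without a browser. This is a minimal sketch that assumes the standard-library xml.etree.ElementTree instead of Selenium; there, the text that follows an element is stored in that element's tail attribute.

```python
import xml.etree.ElementTree as ET

# Markup from the question.
SNIPPET = """<p class="class_name" id="id_name">
<i class="class_name2"></i>
<b>TextA</b>
TextB
</p>"""

p = ET.fromstring(SNIPPET)

# The last child element is <b>; the text node after it is its tail.
last_child = list(p)[-1]
text_b = (last_child.tail or "").strip()

# For comparison: the full text of <p>, which also contains TextA.
all_text = " ".join("".join(p.itertext()).split())

print(text_b)
print(all_text)
```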
My HTML:
<div class="my class">
<p style="padding:3px;">
<b> Série :</b>
my text I need to got with my xpath expression
</p>
For now I have: '//div[@class="my class"]/p[b[contains(., \'Série :\')]]'. But the last part with the contains isn't working. I need to express that I want the text from a p tag that contains a b tag which contains the text "Série", because there are other structures like this one whose only difference is the text in that b tag.
To get only the text after the <b> tag, you can use an index on the text() node:
//div[@class='my class']/p[contains(b, 'Série :')]/text()[2]
An alternative to this approach is using the following:: axis:
//div[@class='my class']/p[contains(b, 'Série :')]/b/following::text()
Output in both cases is:
my text I need to got with my xpath expression
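The same selection logic (find the p whose b label contains a given string, then take the text after the label) can be sketched in plain Python. This assumes xml.etree.ElementTree rather than a full XPath engine; the helper text_after_label and the second paragraph with the "Auteur :" label are hypothetical, added only to show that the label filter picks the right paragraph.

```python
import xml.etree.ElementTree as ET

# The second <p> is the markup from the question; the first one is a
# hypothetical sibling with a different label.
SNIPPET = """<div class="my class">
<p style="padding:3px;">
<b> Auteur :</b>
some other text
</p>
<p style="padding:3px;">
<b> Série :</b>
my text I need to got with my xpath expression
</p>
</div>"""

root = ET.fromstring(SNIPPET)


def text_after_label(container, label):
    """Return the text following the first <b> whose text contains label."""
    for p in container.iter("p"):
        b = p.find("b")
        if b is not None and label in "".join(b.itertext()):
            # b.tail is the text node that directly follows </b>.
            return (b.tail or "").strip()
    return None


print(text_after_label(root, "Série :"))
```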
I need to get the text, but only the part that comes before a certain marker ('---------------').
Example of HTML code:
...
<p> This is correct text. Everything after it is wrong</p>
<p>---------------------</p>
<p><strong>This is wrong text</strong></p>
<p> This is wrong another text</p>
...
I'm trying to solve this with the following XPath expression:
/p/text()[normalize-space()][not(ancestor::p[contains(.,'---')])]
But unfortunately this doesn't work as expected.
I would appreciate a correct solution.
This XPath selects the text of the first p that has a following sibling p containing ---:
//p[following-sibling::p[contains(.,'---')]][1]/text()
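A procedural version of "keep everything before the separator" may make the intent easier to verify. The sketch below assumes xml.etree.ElementTree and a wrapping div (the question elides the surrounding markup, so the wrapper is an assumption). Unlike the XPath above, it collects every paragraph before the separator, not just the first one.

```python
import xml.etree.ElementTree as ET

# Paragraphs from the question inside an assumed wrapper element.
SNIPPET = """<div>
<p> This is correct text. Everything after it is wrong</p>
<p>---------------------</p>
<p><strong>This is wrong text</strong></p>
<p> This is wrong another text</p>
</div>"""

root = ET.fromstring(SNIPPET)

kept = []
for p in root.findall("p"):
    text = "".join(p.itertext()).strip()
    if "---" in text:
        # Stop at the separator paragraph; everything after it is ignored.
        break
    kept.append(text)

print(kept)
```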
I'm parsing an HTML page with XPath and want to grab the whole text of a specific paragraph, including the text of links.
For example, I have the following paragraph:
<p class="main-content">
This is sample paragraph with link inside.
</p>
I need to get the following text as the result: "This is sample paragraph with link inside". However, applying "//p[@class='main-content']/text()" gives me only "This is sample paragraph with inside".
Could you please assist? Thanks.
To get the whole text content of a node, use the string function:
string(//p[@class="main-content"])
Note that this gets a string value. If you want text nodes (as returned by text()), you can do this. You need to search at all depths:
//p[@class="main-content"]//text()
This returns three text nodes: "This is sample paragraph with", "link" and "inside.".
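The three variants (direct text nodes, text nodes at all depths, and the string value) can be compared side by side. This is a minimal sketch assuming xml.etree.ElementTree in place of an XPath engine; the a tag and its href="#" are assumptions, since the link markup is not shown in the question.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the paragraph, with a placeholder link.
SNIPPET = (
    '<p class="main-content">This is sample paragraph with '
    '<a href="#">link</a> inside.</p>'
)

p = ET.fromstring(SNIPPET)

# Direct child text nodes only, comparable to /text().
direct = [p.text] + [child.tail for child in p]

# Text nodes at every depth, comparable to //text().
all_nodes = list(p.itertext())

# Concatenated string value, comparable to string().
whole = "".join(all_nodes)

print(direct)
print(all_nodes)
print(whole)
```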