Extracting full text from HTML span element with XPath expression


I have an HTML tree which looks like this:
<div id="RF4FOEQ3OPBEX" data-hook="review" class="a-section review aok-relative"><div
<div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content reviewText review-text-content a-expander-partial-collapse-content">
<span>
Text line1.
<br>
Text line2.
</span>
I am trying to extract all the text from the span with the following XPath expression:
//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span/text()
However, this approach only returns the first line of text, up to the break. How can I extract the full text content of the span tag? Any help is much appreciated, and thank you in advance.

Use // before text() together with the getall method to get every text node inside the element.
getall returns a list, so just join it:
txt = "".join(response.xpath('//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span//text()').getall())
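The difference between taking only the first text node and joining all of them can be illustrated without Scrapy. The following is only a minimal sketch using the standard library's ElementTree as a stand-in: it assumes a simplified, well-formed copy of the markup (the `SAMPLE` string is hypothetical, with `<br>` written as `<br/>`), and `itertext()` plays the role that `//text()` plus `getall()` plays in the answer above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed version of the markup from the question
# (<br> is written as <br/> so the XML parser accepts it).
SAMPLE = """
<div data-hook="review">
  <div data-hook="review-collapsed">
    <span>Text line1.<br/>Text line2.</span>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)
span = root.find(".//div[@data-hook='review-collapsed']/span")

# .text holds only the text that comes before the first child element (<br/>).
first_only = span.text

# itertext() yields every text node inside the element, in document order.
parts = [part.strip() for part in span.itertext() if part.strip()]
full_text = " ".join(parts)

print(first_only)
print(full_text)
```

Joining with a space instead of an empty string keeps the two lines from running together; which separator is appropriate depends on how the text will be used.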

Related

Get text of a tag and the text of child tags

I have this HTML
<p>
<strong>aquiline</strong>
<i> adj. </i>
of or like the eagle.
</p>
This whole node is wrapped in a div with class="field-item even".
I would like to receive "aquiline adj. of or like the eagle." Right now I have this incorrect XPath: response.xpath('//div[@class="field-item even"]//descendant-or-self::p/text()').getall()

Your XPath is almost correct. Replace p with * so that the text of every element is included, not only the text nodes that are direct children of paragraph tags. Also, with the normalize-space function you get all the text as one string instead of a list. See the code snippet below.
response.xpath('normalize-space(//div[@class="field-item even"]//descendant-or-self::*)').get()
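What normalize-space does to the combined text can be mimicked in plain Python. This is a rough sketch under stated assumptions, not the Scrapy call itself: the wrapping div in `SAMPLE` is reconstructed from the question's description, and `" ".join(raw.split())` is used as an approximation of normalize-space (trim the ends, collapse inner whitespace runs to single spaces).

```python
import xml.etree.ElementTree as ET

# Markup from the question, wrapped in the div described there.
SAMPLE = """
<div class="field-item even">
  <p>
    <strong>aquiline</strong>
    <i> adj. </i>
    of or like the eagle.
  </p>
</div>
"""

root = ET.fromstring(SAMPLE)

# String value of the div: all descendant text nodes concatenated.
raw = "".join(root.itertext())

# Approximation of normalize-space(): strip and collapse whitespace runs.
normalized = " ".join(raw.split())

print(normalized)
```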

Get text of an element when it has more elements with text

I am using Selenium in Python with Firefox and I want to get the text "TextB" from the following piece of HTML. I have tried element.get_attribute('textContent'), but it returns "TextA" too. Is there any way of getting ONLY that text?
<p class="class_name" id="id_name">
<i class="class_name2"></i>
<b>TextA</b>
TextB
</p>

Try to get the text content of the last child only:
element = driver.find_element_by_id('id_name')
driver.execute_script('return arguments[0].lastChild.textContent', element)

Given the HTML you have provided, a more Pythonic way to extract the text TextB is to use the splitlines() method as follows:
myText = driver.find_element_by_xpath("//p[@class='class_name' and @id='id_name']").get_attribute("innerHTML").splitlines()[3]

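The line below gives you TextA, which is the text of the b tag:
driver.find_element_by_xpath("//p[@class='class_name']/b").text
The line below gives you the text of the p tag. Note that .text also includes the text of child elements, so the result contains TextA as well as TextB:
driver.find_element_by_xpath("//p[@class='class_name']").text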
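Why textContent (and .text) pick up TextA becomes clearer when the element's own text nodes are separated from those of its children. The sketch below is not Selenium code; it assumes the markup is available as a string and parses it with the standard library's ElementTree, where an element's own text is its `.text` plus the `.tail` of each child.

```python
import xml.etree.ElementTree as ET

# Markup from the question.
SAMPLE = """
<p class="class_name" id="id_name">
<i class="class_name2"></i>
<b>TextA</b>
TextB
</p>
"""

p = ET.fromstring(SAMPLE)

# Every text node under <p>, including the one inside <b>.
all_words = "".join(p.itertext()).split()

# Only the text nodes that belong to <p> itself: its leading text
# plus whatever follows each child element (the child's tail).
own_pieces = [p.text or ""] + [child.tail or "" for child in p]
own_text = " ".join("".join(own_pieces).split())

print(all_words)
print(own_text)
```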

XPath: getting text inside a tag that contains another tag which contains text

My HTML:
<div class="my class">
<p style="padding:3px;">
<b> Série :</b>
my text I need to got with my xpath expression
</p>
For now I have: '//div[@class="my class"]/p[b[contains(., \'Série :\')]]' But the last part with contains isn't working. I specifically need the text from a p tag that contains a b tag whose text is "Série :", because there are other structures like this one where the only difference is the text in that b tag.

To get only the text after the <b> tag, you can use an index on the text() node:
//div[@class='my class']/p[contains(b, 'Série :')]/text()[2]
An alternative to this approach is to use the following:: axis:
//div[@class='my class']/p[contains(b, 'Série :')]/b/following::text()
For the HTML shown, the output in both cases is:
my text I need to got with my xpath expression
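The same selection logic (find the p whose b child contains a given label, then take the text right after that b) can be sketched with the standard library. This is an illustration under assumptions: the second paragraph in `SAMPLE` and the helper `text_after_label` are hypothetical, added only to show that the label filter picks the right paragraph, and ElementTree's `.tail` stands in for the text node that follows the b element.

```python
import xml.etree.ElementTree as ET

# First <p> is from the question; the second one is a made-up sibling
# with a different label.
SAMPLE = """
<div class="my class">
  <p style="padding:3px;">
    <b> Série :</b>
    my text I need to got with my xpath expression
  </p>
  <p style="padding:3px;">
    <b> Auteur :</b>
    some other text
  </p>
</div>
"""


def text_after_label(root, label):
    """Return the text following the <b> whose content contains label."""
    for p in root.findall("./p"):
        b = p.find("b")
        if b is not None and label in "".join(b.itertext()):
            # .tail is the text node directly after the closing </b>.
            return (b.tail or "").strip()
    return None


root = ET.fromstring(SAMPLE)
result = text_after_label(root, "Série :")
print(result)
```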

Get text followed by certain text

I need to get the text, but only the part that comes before a certain separator ('---------------').
Here is an example of the HTML code:
...
<p> This is correct text. Everything after it is wrong</p>
<p>---------------------</p>
<p><strong>This is wrong text</strong></p>
<p> This is wrong another text</p>
...
I'm trying to solve this with the following XPath expression:
/p/text()[normalize-space()][not(ancestor::p[contains(.,'---')])]
But unfortunately this doesn't work as expected.
I would appreciate a correct solution.

This XPath selects the text of the first p that has a following sibling p containing ---:
//p[following-sibling::p[contains(.,'---')]][1]/text()
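The "stop at the separator" idea can also be written as an explicit loop, which may be easier to adapt. The following is a sketch only: the wrapping div in `SAMPLE` and the helper name `text_before_separator` are assumptions, and unlike the XPath above (which takes the first matching p) it collects every paragraph that appears before the separator.

```python
import xml.etree.ElementTree as ET

# Paragraphs from the question, wrapped in a hypothetical parent div.
SAMPLE = """
<div>
  <p> This is correct text. Everything after it is wrong</p>
  <p>---------------------</p>
  <p><strong>This is wrong text</strong></p>
  <p> This is wrong another text</p>
</div>
"""


def text_before_separator(parent, marker="---"):
    """Collect paragraph texts until a paragraph containing marker."""
    kept = []
    for p in parent.findall("p"):
        content = "".join(p.itertext()).strip()
        if marker in content:
            break  # everything from the separator onward is ignored
        if content:
            kept.append(content)
    return kept


root = ET.fromstring(SAMPLE)
result = text_before_separator(root)
print(result)
```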

Using XPath to get text of paragraph with links inside

I'm parsing an HTML page with XPath and want to grab the whole text of a specific paragraph, including the text of links.
For example, I have the following paragraph:
<p class="main-content">
This is sample paragraph with <a href="...">link</a> inside.
</p>
I need to get the following text as the result: "This is sample paragraph with link inside", however applying "//p[@class='main-content']/text()" gives me only "This is sample paragraph with inside".
Could you please assist? Thanks.

To get the whole text content of a node, use the string function:
string(//p[@class="main-content"])
Note that this returns a string value. If you want text nodes (as returned by text()), you need to search at all depths:
//p[@class="main-content"]//text()
This returns three text nodes: "This is sample paragraph with", "link" and "inside."
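The distinction between a single string value and a list of text nodes can be reproduced with the standard library. This is a minimal sketch, assuming the link is an `<a>` child of the paragraph; the `href="#"` value is a placeholder, not taken from the question. Here `.text` corresponds roughly to the first direct text node, `itertext()` to `//text()`, and the joined result to `string(...)`.

```python
import xml.etree.ElementTree as ET

# Paragraph with a link inside; the href value is a placeholder.
SAMPLE = (
    '<p class="main-content">'
    'This is sample paragraph with <a href="#">link</a> inside.'
    "</p>"
)

p = ET.fromstring(SAMPLE)

# Text before the first child element only.
leading = p.text

# All text nodes at every depth, in document order (like //text()).
nodes = list(p.itertext())

# One concatenated string (like string(...)).
whole = "".join(nodes)

print(leading)
print(nodes)
print(whole)
```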
