XPath: getting text inside a tag that contains a tag which contains text - html

My HTML:
<div class="my class">
<p style="padding:3px;">
<b> Série :</b>
my text I need to got with my xpath expression
</p>
For now I have: '//div[@class="my class"]/p[b[contains(., \'Série :\')]]', but the last part with contains isn't working. I need the text from a p tag that contains a b tag which contains the text "Série", because there are other structures like this one whose only difference is the text in that b tag.

To get only the text after the <b> tag, you can use an index on the text() node:
//div[@class='my class']/p[contains(b, 'Série :')]/text()[2]
An alternative to this approach is to use the following:: axis:
//div[@class='my class']/p[contains(b, 'Série :')]/b/following::text()
Output in both cases is:
my text I need to got with my xpath expression
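The same idea can be sketched with only Python's standard library, assuming the snippet is well-formed enough to parse as XML (a closing </div> and a wrapping <body> are added here for that purpose). ElementTree's limited XPath has no text() or contains(), so the substring check is done in Python and the text after <b> is read from its .tail attribute. The helper name text_after_label is hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the question's snippet (wrapper and closing tag added).
html = """<body>
<div class="my class">
<p style="padding:3px;">
<b> Série :</b>
my text I need to got with my xpath expression
</p>
</div>
</body>"""

root = ET.fromstring(html)


def text_after_label(root, label):
    """Return the text following the first matching <b> that contains label."""
    for b in root.findall(".//div[@class='my class']/p/b"):
        if label in (b.text or ""):
            # ElementTree keeps the text that follows an element in .tail
            return (b.tail or "").strip()
    return None


result = text_after_label(root, "Série :")
print(result)
```

With a full XPath 1.0 engine (lxml, for instance) the expressions above can be used directly instead of the manual .tail lookup.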

Related

Get text of a tag and the text of child tags

I have this HTML
<p>
<strong>aquiline</strong>
<i> adj. </i>
of or like the eagle.
</p>
This whole node is wrapped in a div with class="field-item even".
I would like to receive "Aquiline adj. of or like the eagle.". Right now I have this incorrect XPath: response.xpath('//div[@class="field-item even"]//descendant-or-self::p/text()').getall()
Your XPath is almost correct. Replace p with * to select all text nodes, not only the text nodes of paragraph tags. Also, with the normalize-space function you can get all the text as one string instead of a list. See the code snippet below.
response.xpath('normalize-space(//div[@class="field-item even"]//descendant-or-self::*)').get()
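For comparison, here is a minimal standard-library sketch of what normalize-space does to the paragraph's string value, assuming the markup parses as XML (the wrapping div is written out by hand). It joins every descendant text node and collapses runs of whitespace; it is not Scrapy's API, only an illustration of the result to expect.

```python
import xml.etree.ElementTree as ET

html = """<div class="field-item even">
<p>
<strong>aquiline</strong>
<i> adj. </i>
of or like the eagle.
</p>
</div>"""

div = ET.fromstring(html)
p = div.find("p")

# itertext() yields every text node under <p>, in document order.
raw = "".join(p.itertext())

# split() + join trims the ends and collapses whitespace runs,
# similar in effect to XPath's normalize-space().
full_text = " ".join(raw.split())
print(full_text)
```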

Extracting full text from HTML span element with XPath expression

I have a HTML tree which looks like this:
<div id="RF4FOEQ3OPBEX" data-hook="review" class="a-section review aok-relative"><div
<div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content reviewText review-text-content a-expander-partial-collapse-content">
<span>
Text line1.
<br>
Text line2.
</span>
I am trying to extract all the text from the span with the following XPath expression:
//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span/text()
However this approach only returns the first text line until the break? The question is: how would I approach this problem in the correct way in order to extract the full text content of the HTML span tag? I would appreciate any help very much and thank you in advance for the support.
Use //text() and the getall() method to get all the text inside a specific element.
getall() returns a list, so just join it:
txt = "".join(response.xpath('//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span//text()').getall())

Get text of an element when it has more elements with text

I am using Selenium in Python with Firefox and I want to get the text "TextB" from the following portion of HTML. I have tried element.get_attribute('textContent'), but it returns "TextA" too. Is there any way of getting ONLY that text?
<p class="class_name" id="id_name">
<i class="class_name2"></i>
<b>TextA</b>
TextB
</p>
Try to get the text content of the last child only:
element = driver.find_element_by_id('id_name')
driver.execute_script('return arguments[0].lastChild.textContent', element)
Given the HTML you have provided, a more Pythonic way to extract the text TextB is to use the splitlines() method as follows:
myText = driver.find_element_by_xpath("//p[@class='class_name' and @id='id_name']").get_attribute("innerHTML").splitlines()[3]
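The line below gives you TextA, which is the text of the b tag:
driver.find_element_by_xpath("//p[@class='class_name']/b").text
The line below gives you the text of the p tag. Note that .text also includes the visible text of child elements, so expect TextA as well as TextB here:
driver.find_element_by_xpath("//p[@class='class_name']").text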

How to retrieve an internal text

How to retrieve text from this template using XPath?
<div class="c">
<span> a </span>
text
</div>
I know that //div[@class='c']//text() returns all the text in the div, including the span's, but I only want "text".
There is one slash too many. A single slash makes sure that only text directly below the div is returned:
//div[@class='c']/text()
The above returns text nodes. In many places in XPath or XQuery, they get automatically converted to strings (atomized), but you can also explicitly force a conversion to strings:
//div[@class='c']/text()/string()
or, if you need to strip whitespace and drop empty text nodes so that exactly "text" is returned:
XPath 2.0:
//div[@class='c']/text()/normalize-space()[string-length() gt 0]
XPath 1.0 (for this specific document):
normalize-space(//div[@class='c']/text()[2])
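The difference between /text() (direct children only) and //text() (all depths) can be illustrated with a small standard-library sketch, assuming the fragment parses as XML. ElementTree has no text nodes as such, so the hypothetical helper direct_text_nodes rebuilds them from .text and each child's .tail.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring("""<div class="c">
<span> a </span>
text
</div>""")


def direct_text_nodes(elem):
    """Text that sits directly under elem, roughly what elem/text() selects."""
    nodes = [elem.text or ""]
    # Text following each child element is stored in that child's .tail.
    nodes += [child.tail or "" for child in elem]
    return nodes


# Strip whitespace and drop empty nodes, as the normalize-space variants do.
direct = [t.strip() for t in direct_text_nodes(div) if t.strip()]
everything = [t.strip() for t in div.itertext() if t.strip()]

print(direct)      # only the div's own text
print(everything)  # the span's text as well
```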

Using XPath to get text of paragraph with links inside

I'm parsing an HTML page with XPath and want to grab the whole text of a specific paragraph, including the text of links.
For example, I have the following paragraph:
<p class="main-content">
This is sample paragraph with <a href="...">link</a> inside.
</p>
I need to get the following text as the result: "This is sample paragraph with link inside". However, applying "//p[@class='main-content']/text()" gives me only "This is sample paragraph with inside".
Could you please assist? Thanks.
To get the whole text content of a node, use the string function:
string(//p[@class="main-content"])
Note that this gets a string value. If you want text nodes (as returned by text()), you can do this. You need to search at all depths:
//p[@class="main-content"]//text()
This returns three text nodes: "This is sample paragraph with", "link" and "inside.".