Extracting a number from an XML node

The best I can do (so far) with XPath is to extract the following node:
<li class="List-guests">
<span class="icon guests"/>
3
</li>
I actually need just to extract the number 3. Is there a way to do this in XPath? I really don't want to start using some complicated regex if I can avoid it.

You should be able to use the text() node test.

Both the normalized text following the class="icon guests" span,
normalize-space(//span[@class="icon guests"]/following-sibling::text())
and the normalized text of the class='List-guests' li,
normalize-space(//li[@class='List-guests'])
for your shown XML will be 3, as requested.
Note: This is the string 3. You can wrap either of the above XPath expressions in number() if you actually need the number 3.
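
As a rough illustration of the two expressions above, here is a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree module, which implements only a subset of XPath and has no normalize-space() or number(), so the whitespace trimming and the numeric conversion are done in Python instead. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The node from the question, parsed as a standalone XML fragment.
li = ET.fromstring(
    '<li class="List-guests">\n'
    '<span class="icon guests"/>\n'
    '3\n'
    '</li>'
)

# Counterpart of following-sibling::text(): ElementTree stores the text
# that follows an element in that element's .tail attribute.
span = li.find("span[@class='icon guests']")
after_span = " ".join(span.tail.split())

# Counterpart of normalize-space(//li[...]): the string value of the li
# (all descendant text concatenated) with whitespace collapsed.
li_value = " ".join("".join(li.itertext()).split())

# Counterpart of number(...): convert the string "3" to the number 3.
guests = int(after_span)
```

Both after_span and li_value end up as the string "3"; only the last step produces a number.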

Related

How to match text and skip HTML tags using a regular expression?

I have a bunch of records in a QuickBase table that contain a rich text field. In other words, they each contain some paragraphs of text intermingled with HTML tags like <p>, <strong>, etc.
I need to migrate the records to a new table where the corresponding field is a plain text field. For this, I would like to strip out all HTML tags and leave only the text in the field values.
For example, from the below input, I would expect to extract just a small example link to a webpage:
<p>just a small <a href="#">
example</a> link</p><p>to a webpage</p>
As I am trying to get this done quickly and without coding or using an external tool, I am constrained to using Quickbase Pipelines' Text channel tool. The way it works is that I define a regex pattern and it outputs only the bits that match the pattern.
So far I've been able to come up with this regular expression (Python-flavored as QB's backend is written in Python) that correctly does the exact opposite of what I need. I.e. it matches only the HTML tags:
/(<[^>]*>)/
In a sense, I need the negative image of this expression but have not been able to build it myself.
Your help in "negating" the above expression is most appreciated.

Assuming there are no other < or > characters in the text (or that any such characters are entity-encoded), here is an idea using a lookbehind:
(?:(?<=>)|^)[^<]+
See this demo at regex101
(?:(?<=>)|^) is an alternation between either ^ start of the string or looking behind for any >. From there [^<]+ matches one or more characters that are not < (negated character class).
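
Since the question mentions a Python-flavored engine, the pattern can be tried with Python's re module. This is only a sketch: the sample string is the one from the question, and the final join and whitespace cleanup are an extra step added here for readability, not something the regex does by itself.

```python
import re

# Pattern from the answer: a run of non-"<" characters that starts either
# at the beginning of the string or right after a ">".
TEXT_RUN = re.compile(r"(?:(?<=>)|^)[^<]+")

html = '<p>just a small <a href="#">\nexample</a> link</p><p>to a webpage</p>'

# The pattern has no capturing groups, so findall returns the full matches.
parts = TEXT_RUN.findall(html)

# Extra step (an assumption about the desired output): join the pieces with
# a space and collapse runs of whitespace, including the embedded newline.
text = " ".join(" ".join(parts).split())
print(text)  # just a small example link to a webpage
```

Without the joining space, the pieces " link" and "to a webpage" would run together, because no text separates the two paragraphs in the input.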

How do I get rid of the tags in XPath

I have a bunch of HTML files with tons of data in them and I want to extract the important parts.
The files are all very similar; I have to search for a <tr> that contains a certain keyword. The third column of this table row always contains the name of the "block" I'm searching for (it's a few table rows).
//body/table/tbody/tr[td = "Deployed to"]/td[3]/div//span[text()]
With this XPath query I get the names (maybe one, maybe more).
The problem is, how do I get rid of the tags around the data?
Right now my output is something like this:
<span class="log_entry_text">Name1</span><span class="log_entry_text">Name2</span><span class="log_entry_text">Name3</span>
I want to have something like this: Name1 Name2 Name3
So I can use it for extracting these blocks more easily.
With string() I can only extract the first element (the result would be: Name1).
Thanks for helping me!

Just wrap your XPath in the data() function (available in XPath 2.0 and later), like data(//body/table/tbody/tr[td = "Deployed to"]/td[3]/div//span[text()]), to retrieve the text.

Your XPath expression asks to retrieve span elements and that's what it has returned. If you're seeing tags with angle brackets in the output, that's because of the way the XPath result is being processed and rendered by the receiving application.
If you're in XPath 2.0+ or XQuery 1.0+ you can combine the several span elements into a single string using
string-join(//path/span, ' ')
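
If only an XPath 1.0 processor is available, the joining can be done in the host language instead. The sketch below assumes Python's standard-library xml.etree.ElementTree and a made-up table cell standing in for the real document; it selects the span elements first and joins their string values afterwards.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the third table cell from the question.
cell = ET.fromstring(
    '<td><div>'
    '<span class="log_entry_text">Name1</span>'
    '<span class="log_entry_text">Name2</span>'
    '<span class="log_entry_text">Name3</span>'
    '</div></td>'
)

# Select the span elements, then join their string values with a space,
# which mirrors string-join(//path/span, ' ') from XPath 2.0.
spans = cell.findall(".//span[@class='log_entry_text']")
names = " ".join("".join(span.itertext()) for span in spans)
```

Here names holds "Name1 Name2 Name3", with no markup, because the elements are converted to their text before being joined.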

Find XPath of Text which is Under Div Tag and besides span tag

Sorry if this has been discussed before. I tried searching for it but didn't find an exact match. My question is: I have the below HTML code,
<div class="column1">
<div>
<div class="name">
Dynamic Name
<span class="id" title="ID">Dynamic ID</span>
</div>
</div>
</div>
I am looking for XPath to get text Dynamic Name.
Here is what I tried which didn't work
1. //div/div[@class='name'] which is finding text Dynamic Name Dynamic ID
2. //div/span[@class='id']/preceding-sibling::text()
Since Dynamic Name & Dynamic ID, both are the dynamic value, I can't use split & use name as we don't know where to split it.
Thanks in advance for your time & help.

This XPath,
normalize-space(//div[@class="name"]/text())
will return "Dynamic Name" for your HTML, as requested.
Thanks, but it seems to have a syntax error: I tried it in FirePath and it gives an error. Not sure which one.
Wild guess: See if
//div[@class="name"]/text()
avoids your syntax error (which would be a limitation of your tool as normalize-space() is a proper XPath 1.0 function) and selects your targeted text (although with extraneous whitespace).
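
To see why selecting the div's own text node excludes the span, here is a small sketch using Python's standard-library xml.etree.ElementTree (an assumption; the question itself uses a browser tool). ElementTree has no text() step, but an element's .text attribute holds the text before its first child, which corresponds to the first text node that normalize-space() would use.

```python
import xml.etree.ElementTree as ET

# The HTML from the question, which happens to be well-formed XML.
root = ET.fromstring(
    '<div class="column1">\n'
    '<div>\n'
    '<div class="name">\n'
    'Dynamic Name\n'
    '<span class="id" title="ID">Dynamic ID</span>\n'
    '</div>\n'
    '</div>\n'
    '</div>'
)

name_div = root.find(".//div[@class='name']")

# .text is only the text before the first child element, so the span's
# "Dynamic ID" is not included. Collapsing whitespace plays the role of
# normalize-space().
name = " ".join((name_div.text or "").split())

# For comparison: the full string value of the div, span text included.
full = " ".join("".join(name_div.itertext()).split())
```

The value of name is "Dynamic Name", while full is "Dynamic Name Dynamic ID", which matches what the first attempt in the question returned.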

Since you stated that the XPath //div/div[@class='name'] is working but returns too much info, we can grab that element, grab the innerHTML, and then split on the first <, take the first String, and trim() it just in case to get the answer.
driver.findElement(By.xpath("//div/div[@class='name']")).getAttribute("innerHTML").split("<")[0].trim();

Why won't my XPath select link/button based on its label text?

<a href="javascript:void(0)" title="home">
<span class="menu_icon">Maybe more text here</span>
Home
</a>
So for above code when I write //a as XPath, it gets highlighted, but when I write //a[contains(text(), 'Home')], it is not getting highlighted. I think this is simple and should have worked.
Where's my mistake?

Other answers have missed the actual problem here:
Yes, you could match on @title instead, but that's not why OP's
XPath is failing where it may have worked previously.
Yes, XML and XPath are case sensitive, so Home is not the same as
home, but there is a Home text node as a child of a, so OP is
right to use Home if he doesn't trust @title to be present.
Real Problem
OP's XPath,
//a[contains(text(), 'Home')]
says to select all a elements whose first text node contains the substring Home. Yet, the first text node contains nothing but whitespace.
Explanation: text() selects all child text nodes of the context node, a. When contains() is given multiple nodes as its first argument, it takes the string value of the first node, but Home appears in the second text node, not the first.
Instead, OP should use this XPath,
//a[text()[contains(., 'Home')]]
which says to select all a elements with any text child whose string value contains the substring Home.
If there weren't surrounding whitespace, this XPath could be used to test for equality rather than substring containment:
//a[text()[.='Home']]
Or, with surrounding whitespace, this XPath could be used to trim it away:
//a[text()[normalize-space()= 'Home']]
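
The difference between the three predicates can be mimicked in plain Python. The sketch below assumes the standard-library xml.etree.ElementTree, which cannot evaluate these XPath expressions itself, so the text nodes are collected by a hypothetical helper, child_text_nodes, and the tests are written as ordinary Python expressions.

```python
import xml.etree.ElementTree as ET

# The link from the question.
a = ET.fromstring(
    '<a href="javascript:void(0)" title="home">\n'
    '<span class="menu_icon">Maybe more text here</span>\n'
    'Home\n'
    '</a>'
)

def child_text_nodes(element):
    """Return the element's direct text nodes in document order, like text()."""
    nodes = [element.text] + [child.tail for child in element]
    return [node for node in nodes if node]

nodes = child_text_nodes(a)

# contains(text(), 'Home') in XPath 1.0: only the first text node is tested,
# and here that node is just a newline.
first_only = "Home" in nodes[0]

# text()[contains(., 'Home')]: each text node is tested in turn.
any_node = any("Home" in node for node in nodes)

# text()[normalize-space() = 'Home']: equality after trimming whitespace.
exact = any(" ".join(node.split()) == "Home" for node in nodes)
```

With this input first_only is False, while any_node and exact are both True, which is the behavior described above.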
See also:
Testing text() nodes vs string values in XPath
Why is XPath unclean constructed? Why is text() not needed in predicate?
XPath: difference between dot and text()

Yes, you are making two mistakes: you're writing Home with an uppercase H when you want to match home with a lowercase h, and you're trying to check the text content when you want to check the "title" attribute. Correct those two, and you get:
//a[contains(@title, 'home')]
However, if you want to match the exact string home, instead of any a that has home anywhere in the title attribute, use @zsbappa's code.

You can try this XPath. It just selects the element by attribute:
//a[@title='home']

is it possible to read the text of a li using Xpath with different attributes?

I am aware that I can directly use:
driver.FindElement(By.XPath("//ul[3]/li/ul/li[7]")).Text
to get the text, but I am trying to get the text by using XPath and a combination of different attributes like text(), contains(), etc.
//ul[3]/li/ul/li//[text()='My Data']
Please suggest me different ways that I can handle this ... except the one I mentioned.
<li class="ng-binding ng-scope selectedTreeElement" ng-click="orgSelCtrl.selectUserSessionOrg(child);" ng-class="{selectedTreeElement: child.organizationId == orgSelCtrl.SelectedOrg.organizationId}" ng-repeat="child in node.childOrgs" style="background-color: transparent;"> My Data </li>

It looks like you have an extra "/" in your XPath and are missing a leading dot:
//ul[3]/li/ul/li//[text()='My Data']
Try this:
.//ul[3]/li/ul/li[text()='My Data']
However, XPath is used here only to find elements, not to read their attributes. If you need to read an attribute or the text inside an element, you need to use Selenium after the search.

.Text of a WebElement would just return you the text of an element.
If you want to make expectations about the text, check the text() inside the XPath expression, e.g.:
//ul[3]/li/ul/li[text()='My Data']
or, using contains():
//ul[3]/li/ul/li[contains(text(), 'My Data')]
There are other functions you can make use of, see Functions - XPath.
You can also combine it with other conditions. For instance:
//ul[3]/li/ul/li[contains(@class, 'selectedTreeElement') and contains(text(), 'My Data')]
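
The logic of combining a class test with a text test can be sketched outside Selenium as well. The example below assumes Python's standard-library xml.etree.ElementTree and a made-up, trimmed-down list; because ElementTree does not support contains(), both conditions are written as Python expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified list: only the class and the text matter here.
ul = ET.fromstring(
    '<ul>'
    '<li class="ng-binding ng-scope">Other Data</li>'
    '<li class="ng-binding ng-scope selectedTreeElement"> My Data </li>'
    '</ul>'
)

matches = [
    li for li in ul.findall("li")
    # Counterpart of contains(@class, 'selectedTreeElement')
    if "selectedTreeElement" in li.get("class", "")
    # Counterpart of contains(text(), 'My Data'), which tests the first text node
    and "My Data" in (li.text or "")
]
```

Only the second li satisfies both conditions, so matches contains a single element.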
