Select attribute content XPath - html

I have an XPath
//*[#class]
I would like to make an XPath to select the content inside this attribute.
<li class="tab-off" id="navList0">
So in this case I would like to extract the text "tab-off", is this possible with XPath?

Your original //*[#class] XPath query returns all elements which have a class attribute. What you want is //*[#class]/#class to retrieve the attribute itself.
In case you just want the value and not the attribute name try string(//*[#class]/#class) instead.

If you are specifically grabbing the data from an tag, you can do this:
//li[#class]
and loop through the result set to find a class with attribute "tab-off". Or
//li[#class='tab-off']
If you're in a position to hard code.
I assume you have already put your file through an XML parser like a DOMParser. This will make it much easier to extract any other values you may need on a specific tag.

Related

How to select HTML element by XPATH if attribute name contains space?

for example
<li big class="attribute"></li>
in selenium selecting would be like this
driver.find_element(By.XPATH, '//*[#big class="attribute"]');
so how can i select the element by XPATH , using that results an invalid expression.
selecting just by class like this //*[#class="attribute"] doesnt work
If you want to select element by both attributes correct code would be
driver.find_element(By.XPATH, '//li[#big and #class="attribute"]')
note that big seem to be a separate boolean attribute (it might not have an explicit value) but not an "... attribute name contains space"

XPath based on id attribute value that starts with something?

For example if I have multiple anchor elements on a site and the easiest way to get them is via their ID, but the IDs look like this:
lots of html...
hop1
...lots of html...
hop2
...lots of html...
hop3
...lots of html
Is it possible to select the href attributes of all anchor elements whose id has the "foo_" part of the id? In other words, can I add a wildcard in an attribute's value in XPath?
This XPath expression, which works with all versions of XPath,
//a[starts-with(#id,"foo_")]/#href
will select all a/#href attributes whose a has an id attribute value that starts with "foo_".
Yes you can use matches function in terms of XSL:
Starting with foo_ //a/#id[matches(.,'^foo_\d+')]
Containing foo_ //a/#id[matches(.,'foo_\d+')]
Please specify for which language you are asking for

Extracting content of HTML tag with specific attribute

Using regular expressions, I need to extract a multiline content of a tag, which has specific id value. How can I do this?
This is what I currently have:
<div(.|\n)*?id="${value}"(.|\n)*?>(.|\n)*?<\/div>
The problem with this is this sample:
<div id="1">test</div><div id="2">test</div>
If I want to replace id="2" using this regexp (with ${value} = 2), the whole string would get matched. This is because from the tag opening to closing I match everything until id is found, which is wrong.
How can I do this?
A fairly simple way is to use
Raw: <div(?=\s)[^>]*?\sid="2"[^>]*?>([\S\s]*?)</div>
Delimited: /<div(?=\s)[^>]*?\sid="2"[^>]*?>([\S\s]*?)<\/div>/
Use the variable in place of 2.
The content will be in group 1.
Change (.|\n) to [^>] so it won't match the > that ends the tag. Then it can't match across different divs.
<div\b[^>]*\bid="${value}"[^>]*>.*?<\/div>
Also, instead of using (.|\n)* to match across multiple lines, use the s modifier to the regexp. This makes . match any character, including newlines.
However, using regular expressions to parse HTML is not very robust. You should use a DOM parser.

is it possible to read the text of a li using Xpath with different attributes?

I am aware that I can directly use:
driver.FindElement(By.XPath("//ul[3]/li/ul/li[7]")).Text
to get the text .. but I am trying get the text by using Xpath and combination of different attributes like text(), contains() etc.
//ul[3]/li/ul/li//[text()='My Data']
Please suggest me different ways that I can handle this ... except the one I mentioned.
<li class="ng-binding ng-scope selectedTreeElement" ng-click="orgSelCtrl.selectUserSessionOrg(child);" ng-class="{selectedTreeElement: child.organizationId == orgSelCtrl.SelectedOrg.organizationId}" ng-repeat="child in node.childOrgs" style="background-color: transparent;"> My Data </li>
looks like you have extra "/" in your xpath and you miss dot:
//ul[3]/li/ul/li//[text()='My Data']
try this:
.//ul[3]/li/ul/li[text()='My Data']
BUT you are use xpath only for find elements, but not for reading its attributes. If you need to read attribute or text inside of it, you need to use selenium after search.
.Text of a WebElement would just return you the text of an element.
If you want to make expectations about the text, check the text() inside the XPath expression, e.g.:
//ul[3]/li/ul/li[text()='My Data']
or, using contains():
//ul[3]/li/ul/li[contains(text(), 'My Data')]
There are other functions you can make use of, see Functions - XPath.
You can also combine it with other conditions. For instance:
//ul[3]/li/ul/li[contains(#class, 'selectedTreeElement') and contains(text(), 'My Data')]

Regex for HTML attribute replacement/addition

I'm looking for a single line regex which does the following:
Given a HTML tag with the "name" attribute, I want to replace it with my own attribute. If that tag lacks the name attribute, I want to implant my own attribute. The result should look like this:
<IMG name="img1" ...> => <IMG name="myImg1" ...>
<IMG ...> => <IMG name="myImg1" ...>
Can this be done with a single line regex?
The trick is to match every complete "attribute=value" pair, but capture only the ones whose attribute name isn't "name". Then plug in your own "name" attribute along with all the captured ones.
s/<IMG
((?:\s+(?!name\b)\w+="[^"]+")*)
(?:\s+name="[^"]+")?
((?:\s+(?!name\b)\w+="[^"]+")*)
>
/<IMG name="myName"$1$2>
/xg;
This isn't a perfect solution, the spacing and position within the tag may not be exactly what you want, but it does accomplish the goals. This is with a perl regex, but there's nothing particular perl-specific about it.
s/(<IMG)((\s+[^>]*)name="[^"]*")?(.*)/$1$3 name="myID"$4/g
If, like in your example, the name attribute is always the first one inside the IMG tag, then it's very easy. Search for
<(?!/)(/w+)\s+(name="[^"]+")?
and replace with
<\1 name="myImg1"
but I doubt that this is what you really want.
If the name attribute can occur in other positions, it gets more difficult.