Is there any way to get element attribute names by XPath?


I can get the attribute values of an element with XPath, but how do I get all the attribute names?
Example:
# there is an element
'<img src="http://fakesrc" alt="pic name"></img>'
i = <Element img at 0x102622cb0>
In [10]: i.xpath("//img/@src")
Out[10]: ['http://fakesrc']
In [11]: i.xpath("//img/@*")
Out[11]: ['http://fakesrc', 'pic name']
How can I get the names src/alt of i?

Depending on whether you want to include namespace prefixes or not, you can choose between the following two options in XPath 2.0 (local-name() omits the prefix, name() keeps it):
//@*/local-name()
//@*/name()
Choose a different initial context node that fits your needs and see the specifications for more info.
With XPath 1.0, the above is not possible. The following does work, but it returns the name of only the first matching attribute, even if there are several.
local-name(//@*)
name(//@*)
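With an XPath 1.0 engine, a common workaround is to select the element with a path expression and read the attribute names in the host language. The following is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree rather than the library used in the question; the wrapping <div> and the variable names are made up for illustration. lxml elements expose a similar attrib mapping.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document wrapping the <img> from the question.
doc = ET.fromstring(
    '<div><img src="http://fakesrc" alt="pic name"></img></div>'
)

# Select the element with a path expression, then read the names in Python.
img = doc.find(".//img")
attribute_names = list(img.attrib.keys())
attribute_values = list(img.attrib.values())

print(attribute_names)
print(attribute_values)
```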

Related

Selecting element based on attribute order in XPath?

I am working on a project using the Html-Agility-Pack and I need to build a list of each link that has an href attribute as its first attribute. What XPath expression would be used for this?
Example (I would want to only select the first):
<a href="http://someurl.com"/>
<a id="someid" href="http://someurl.com"/>

No, don't do that.
You really don't want to select elements based upon the ordering of their attributes, because attribute order is arbitrary in HTML and XML. Find another criterion to limit your selections:
attribute presence or attribute value
child element presence or string value
preceding element value, possibly a label
etc.
You want to choose a criterion that's invariant across all instances of the HTML/XML documents you may encounter. Attribute order is not such a criterion.
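As an illustration of selecting on attribute presence instead of attribute order, here is a small sketch in Python with the standard-library xml.etree.ElementTree (not Html-Agility-Pack, which the question uses). The <body> wrapper and the choice of "has href but no id" as the criterion are assumptions made for the example; in full XPath 1.0 the same test would be //a[@href and not(@id)].

```python
import xml.etree.ElementTree as ET

# The two links from the question, wrapped in a hypothetical <body>.
doc = ET.fromstring(
    '<body>'
    '<a href="http://someurl.com"/>'
    '<a id="someid" href="http://someurl.com"/>'
    '</body>'
)

# Attribute presence: every link that has an href at all.
with_href = doc.findall(".//a[@href]")

# Presence combined with absence: href but no id, regardless of order.
href_without_id = [a for a in with_href if "id" not in a.attrib]

print(len(with_href), len(href_without_id))
```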

Robot framework, how to check class

Is there a keyword in Robot Framework to ensure that an element has a certain class? Something like
Element should have class    element    className
Alternatively, I could check whether the element has a certain attribute with a certain value. The former would be more suitable, though, as an element may have multiple classes.

You could create a new keyword via XPath selectors:
Element should have class
    [Arguments]    ${element}    ${className}
    Wait until page contains element    ${element}[contains(@class, '${className}')]
Or via CSS selectors:
Element should have class
    [Arguments]    ${element}    ${className}
    Wait until page contains element    ${element}.${className}
Wait until page contains element could be replaced by any keyword of your liking to check if the element exists and is visible, such as Element should be visible.

Here's an alternative solution (though the accepted answer's CSS one is quite good), working for any kind of selector strategy:
Element should have class
    [Arguments]    ${locator}    ${target value}
    ${class}=    Get Element Attribute    ${locator}@class
    Should Contain    ${class}    ${target value}
It can be modified for any other attribute: just substitute the @class in Get Element Attribute with it (or even make it an optional argument).

Some of the solutions on this page may suffer from substring matches. Checking that the class attribute (e.g. test-run) contains a class (e.g. test) may pass even though it should fail.
There are a few ways to deal with this, but in the end, I did the following:
Element Should Have Class
    [Arguments]    ${locator}    ${class}
    ${escaped}=    Regexp Escape    ${class}
    ${classes}=    Get Element Attribute    ${locator}    class
    Should Match Regexp    ${classes}    (^|\\s)${escaped}(\\s|$)
Element Should Not Have Class
    [Arguments]    ${locator}    ${class}
    ${escaped}=    Regexp Escape    ${class}
    ${classes}=    Get Element Attribute    ${locator}    class
    Should Not Match Regexp    ${classes}    (^|\\s)${escaped}(\\s|$)
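The substring pitfall can be reproduced outside Robot Framework. The sketch below, in plain Python with a hypothetical helper has_class, compares three checks against the class string "test-run active": a substring test, a \b word-boundary regex, and a whitespace token comparison. Note that \b counts a hyphen as a word boundary, so of the three only the token comparison rejects "test".

```python
import re


def has_class(class_attr: str, name: str) -> bool:
    """Return True if name is one of the whitespace-separated class tokens."""
    return name in class_attr.split()


classes = "test-run active"

# Substring check: matches because "test" occurs inside "test-run".
substring_hit = "test" in classes

# Word-boundary regex: "-" is a non-word character, so \b matches before it.
boundary_hit = re.search(r"\b" + re.escape("test") + r"\b", classes) is not None

# Token check: compares against whole class names only.
token_hit = has_class(classes, "test")

print(substring_hit, boundary_hit, token_hit)  # prints: True True False
```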

Here is an example of both ways:
${temp}=    get element attribute    xpath=/elementpath    class
should contain    ${temp}    ${ClassName}
OR
Wait until page contains element    xpath=/elementpath[contains(@class, '${ClassName}')]

Using proper names

My HTML tag specifies lang="en", but there are a lot of proper names in the document. These are such things as surnames, which the validator flags as spelling mistakes. I'd like to put them in a <span> with lang="none", for example. Is there a correct way of doing this (i.e. one which validates as correct HTML)?

The correct way to do it is to set the attribute to an empty string:
<span lang="">...</span>
To determine the language of a node, user agents must look at the nearest ancestor element (including the element itself if the node is an element) that has a lang attribute in the XML namespace set or is an HTML element and has a lang in no namespace attribute set. That attribute specifies the language of the node (regardless of its value).
If the resulting value is the empty string, then it must be interpreted as meaning that the language of the node is explicitly unknown.
HTML5 Spec

How to parse HTML/XML tags according to NOT conditions in [r]

Dearest StackOverflow homies,
I'm playing with HTML that was output by EverNote and need to parse the following:
Note Title
Note anchor (hyperlink identities of the notes themselves)
Note Creation Date
Note Content, and
Intra-notebook hyperlinks (the links within the content of a note to another note's anchor)
Following examples by Duncan Temple Lang, author of the [r] XML package, and an SO answer by @jdharrison, I have been able to parse the Note Title, Note anchor, and Note Creation Dates with relative ease. For those who may be interested, the commands to do so are:
require("XML")
rawHTML <- paste(readLines("EverNotebook.html"), collapse="\n") # Yes... this is noob code
doc <- htmlTreeParse(rawHTML, useInternalNodes=TRUE)
# Get Note Titles
html.titles <- xpathApply(doc, "//h1", xmlValue)
# Get Note Title Anchors
html.tAnchors <- xpathApply(doc, "//a[@name]", xmlGetAttr, "name")
# Get Note Creation Date
html.Dates <- xpathApply(doc, "//table[@bgcolor]/tr/td/i", xmlValue)
Here's a fiddle of an example HTML EverNote export.
I'm stuck on parsing 1. Note Contents and 2. Intra-notebook hyperlinks.
Taking a closer look at the code, it is apparent that the solution for the first part is to return every upper-most* div that does NOT include a table with attribute bgcolor="#D4DDE5". How is this accomplished?
Duncan says that it is possible to use XPath to parse XML according to NOT conditions:
"It allows us to express things such as "find me all nodes named a" or "find me all nodes named a that have no attribute named b" or "nodes a that have an attribute b equal to 'bob'" or "find me all nodes a which have c as an ancestor node""
However, he does not go on to describe how the XML package can parse exclusions... so I'm stuck there.
Addressing the second part, consider the format of anchors to other notes in the same notebook:
<a href="#13178">
The goal with these is to procure their number and yet this is difficult because they are solely distinguished from www links by the # prefix. Information on how to parse for these particular anchors via partial matching of their value (in this case #) is sparse - maybe even requiring grep(). How can one use the XML package to parse for these special hrefs? I describe both problems here since it's possible a solution to the first part may aid the second... but perhaps I'm wrong. Any advice?
UPDATE 1
By upper-most div I mean outer-most div. The contents of every note in an EverNote HTML export are within the DOM's outer-most divs. Thus the interest is to return every outer-most div that does NOT include a table with attribute bgcolor="#D4DDE5".

"....to return every upper-most div that does NOT include a table with attribute bgcolor="#D4DDE5." How is this accomplished?"
One possible way, ignoring 'upper-most' since I don't know exactly how you would define it:
//div[not(table[@bgcolor='#D4DDE5'])]
The above XPath reads: select all <div> elements that do not have a child <table> element whose bgcolor attribute equals #D4DDE5.
I'm not sure what you mean by "parse" in the second part of the question. If you simply want to get all of the links with that special href, you can partially match the href attribute using starts-with() or contains():
//a[starts-with(@href, '#')]
//a[contains(@href, '#')]
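To make the partial match concrete, here is a rough sketch in Python rather than R, using the standard-library xml.etree.ElementTree. Its XPath subset has no starts-with(), so the same test is written as a Python filter. The sample markup, link text, and variable names are invented for the example; only the href="#13178" form comes from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one intra-notebook link, one web link, one anchor.
doc = ET.fromstring(
    '<div>'
    '<a href="#13178">another note</a>'
    '<a href="http://example.com/">web link</a>'
    '<a name="13178"/>'
    '</div>'
)

# Same test as //a[starts-with(@href, '#')], applied in Python.
internal_links = [
    a for a in doc.iter("a") if a.get("href", "").startswith("#")
]

# Drop the leading '#' to recover the number of the target note's anchor.
note_ids = [a.get("href")[1:] for a in internal_links]

print(note_ids)  # prints: ['13178']
```

Using the starts-with form rather than contains() avoids also matching web links that merely carry a fragment, such as http://example.com/page#top.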
UPDATE:
Taking the "outer-most" div into consideration:
//div[not(table[@bgcolor='#D4DDE5']) and not(ancestor::div)]
Side note: XPath's not() is ordinary boolean negation (the OP confirmed in the comments that this works), so you can apply one of De Morgan's laws:
"not (A or B)" is the same as "(not A) and (not B)".
The updated XPath can therefore be slightly simplified to:
//div[not(table[@bgcolor='#D4DDE5'] or ancestor::div)]
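The "outer-most div without the header table" logic can also be sketched procedurally. The example below is in Python with the standard-library xml.etree.ElementTree (not the R XML package from the question), whose XPath subset lacks not() and the ancestor axis, so both conditions are written as Python code. The sample document, the id attributes, and the helper names outermost_divs and has_header_table are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical export: a header div holding the table, and a content div.
doc = ET.fromstring(
    '<body>'
    '<div id="header"><table bgcolor="#D4DDE5"/></div>'
    '<div id="content"><div id="inner">note text</div></div>'
    '</body>'
)


def outermost_divs(node):
    """Yield div elements that have no div ancestor below the start node."""
    for child in node:
        if child.tag == "div":
            yield child  # do not descend: nested divs are not outer-most
        else:
            yield from outermost_divs(child)


def has_header_table(div):
    """Return True if div has a child table with bgcolor="#D4DDE5"."""
    return any(t.get("bgcolor") == "#D4DDE5" for t in div.findall("table"))


content_divs = [d for d in outermost_divs(doc) if not has_header_table(d)]

print([d.get("id") for d in content_divs])  # prints: ['content']
```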

Select attribute content XPath

I have an XPath
//*[@class]
I would like to make an XPath to select the content inside this attribute.
<li class="tab-off" id="navList0">
So in this case I would like to extract the text "tab-off". Is this possible with XPath?

Your original //*[@class] XPath query returns all elements which have a class attribute. What you want is //*[@class]/@class to retrieve the attribute itself.
If you just want the value as a string rather than the attribute node, try string(//*[@class]/@class) instead.
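For comparison, here is a minimal sketch of the same extraction in Python with the standard-library xml.etree.ElementTree, which can select the elements with .//*[@class] but leaves reading the attribute value to Python. The surrounding <ul> and the extra <li> items are invented for the example. Keep in mind that in XPath 1.0, string() applied to a node-set returns the value of the first node only, whereas the loop below collects every value.

```python
import xml.etree.ElementTree as ET

# The <li> from the question plus two hypothetical siblings.
doc = ET.fromstring(
    '<ul>'
    '<li class="tab-off" id="navList0">first</li>'
    '<li class="tab-on" id="navList1">second</li>'
    '<li id="navList2">third</li>'
    '</ul>'
)

# .//*[@class] selects the elements; get("class") reads each attribute value.
class_values = [el.get("class") for el in doc.findall(".//*[@class]")]

print(class_values)  # prints: ['tab-off', 'tab-on']
```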

If you are specifically grabbing the data from an <li> tag, you can do this:
//li[@class]
and loop through the result set to find the element whose class attribute is "tab-off". Or, if you're in a position to hard-code the value:
//li[@class='tab-off']
I assume you have already put your file through an XML parser like a DOMParser. This will make it much easier to extract any other values you may need on a specific tag.
