I am working on a project using the Html-Agility-Pack and I need to build a list of each link that has an href attribute as its first attribute. What XPath expression would be used for this?
Example (I would want to only select the first):
<a href="http://someurl.com"/>
<a id="someid" href="http://someurl.com"/>
No, don't do that.
You really don't want to select elements based on the ordering of their attributes, because attribute order is insignificant in HTML and XML and a parser is free to ignore it. Find another criterion to limit your selections:
attribute presence or attribute value
child element presence or string value
preceding element value, possibly a label
etc
You want to choose a criterion that's invariant across all instances of the HTML/XML documents you may encounter. Attribute order is not such a criterion.
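As a rough illustration of selecting by attribute presence or value rather than by position, here is a minimal sketch using Python's standard xml.etree.ElementTree (not Html-Agility-Pack). The third anchor without an href is a hypothetical addition to the question's sample.

```python
import xml.etree.ElementTree as ET

# The two anchors from the question, plus a hypothetical one without href.
doc = ET.fromstring(
    '<body>'
    '<a href="http://someurl.com"/>'
    '<a id="someid" href="http://someurl.com"/>'
    '<a id="nolink"/>'
    '</body>'
)

# [@href] tests only for the presence of the attribute, not its position.
with_href = doc.findall(".//a[@href]")

# [@id='someid'] tests the attribute's value.
by_id = doc.findall(".//a[@id='someid']")

print(len(with_href))
print(by_id[0].get("href"))
```

Both anchors that carry an href match the first expression, regardless of where href appears in the tag.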
for example
<li big class="attribute"></li>
In Selenium, selecting it would look like this:
driver.find_element(By.XPATH, '//*[@big class="attribute"]')
How can I select the element by XPath? Using that results in an invalid expression.
Selecting just by class, like //*[@class="attribute"], doesn't work.
If you want to select the element by both attributes, the correct code would be
driver.find_element(By.XPATH, '//li[@big and @class="attribute"]')
Note that big seems to be a separate boolean attribute (it might not have an explicit value), not part of an attribute name that contains a space.
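The same two-condition test can be tried without a browser. This is only a sketch with Python's standard xml.etree.ElementTree, under two assumptions: XML requires attribute values, so big="" stands in for the valueless HTML attribute, and ElementTree's limited XPath subset does not support `and`, so chained predicates are used instead (both must hold, which gives the same result here). The second li and the element texts are hypothetical.

```python
import xml.etree.ElementTree as ET

# big="" stands in for the valueless HTML attribute, since XML requires a value.
doc = ET.fromstring(
    '<ul>'
    '<li big="" class="attribute">first</li>'
    '<li class="attribute">second</li>'
    '</ul>'
)

# Chained predicates: the element must have a big attribute
# AND a class attribute equal to "attribute".
matches = doc.findall(".//li[@big][@class='attribute']")
texts = [li.text for li in matches]
print(texts)
```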
For example if I have multiple anchor elements on a site and the easiest way to get them is via their ID, but the IDs look like this:
lots of html...
hop1
...lots of html...
hop2
...lots of html...
hop3
...lots of html
Is it possible to select the href attributes of all anchor elements whose id has the "foo_" part of the id? In other words, can I add a wildcard in an attribute's value in XPath?
This XPath expression, which works with all versions of XPath,
//a[starts-with(@id,"foo_")]/@href
will select the href attributes of all a elements whose id attribute value starts with "foo_".
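For readers who want to try the prefix filter without a full XPath engine, here is a minimal sketch in Python. The standard xml.etree.ElementTree module has no starts-with(), so the prefix test is done with str.startswith instead of XPath. The id and href values are hypothetical, since the question's original markup is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the ids and hrefs are made up for illustration.
doc = ET.fromstring(
    '<div>'
    '<a id="foo_1" href="/one">hop1</a>'
    '<a id="bar_2" href="/two">hop2</a>'
    '<a id="foo_3" href="/three">hop3</a>'
    '</div>'
)

# Plain-Python equivalent of //a[starts-with(@id,"foo_")]/@href
hrefs = [
    a.get("href")
    for a in doc.iter("a")
    if a.get("id", "").startswith("foo_")
]
print(hrefs)
```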
Yes, you can use the matches() function, which requires XPath 2.0 or later (for example in XSLT 2.0):
Starting with foo_: //a/@id[matches(.,'^foo_\d+')]
Containing foo_: //a/@id[matches(.,'foo_\d+')]
Please specify which language you are asking about.
I got for example
News
More
What would be the most efficient way of extracting the href based on the value between <a></a> (the atomic data) with XPath
Question interpretation:
"Most efficient" is taken to mean in a programmer's time sense, not in a performance sense.
"The value between" is taken to mean the string between the a tags.
This XPath selects all a elements,
//a
This XPath selects all a elements whose string value is "News":
//a[.='News']
This XPath selects all href attributes of all a elements whose string value is "News" [1]:
//a[.='News']/@href
[1] Credit: @localghost posted the correct answer in the comments.
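A quick way to check this selection is a small sketch with Python's standard xml.etree.ElementTree (Python 3.7 or later for the [.='text'] predicate). ElementTree cannot select attribute nodes directly, so the href is read from each matched element. The href values and the wrapping nav element are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the link texts come from the question.
doc = ET.fromstring(
    '<nav>'
    '<a href="/news">News</a>'
    '<a href="/more">More</a>'
    '</nav>'
)

# Select a elements whose string value is "News", then read their href.
news_links = doc.findall(".//a[.='News']")
hrefs = [a.get("href") for a in news_links]
print(hrefs)
```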
I have an XPath
//*[@class]
I would like to make an XPath to select the content inside this attribute.
<li class="tab-off" id="navList0">
So in this case I would like to extract the text "tab-off", is this possible with XPath?
Your original //*[@class] XPath query returns all elements which have a class attribute. What you want is //*[@class]/@class to retrieve the attribute itself.
In case you just want the value and not the attribute node, try string(//*[@class]/@class) instead. Note that in XPath 1.0, string() returns the value of the first matching attribute only.
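As a small sketch of reading the class values from matched elements, the following uses Python's standard xml.etree.ElementTree, which selects elements rather than attribute nodes, so the value is read with get(). Only the first li comes from the question; the other two and the ul wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<ul>'
    '<li class="tab-off" id="navList0"/>'
    '<li id="navList1"/>'
    '<li class="tab-on" id="navList2"/>'
    '</ul>'
)

# Every element that has a class attribute, in document order.
classes = [el.get("class") for el in doc.findall(".//*[@class]")]

# Taking only the first value mirrors what string() does in XPath 1.0.
first = classes[0] if classes else ""
print(classes, first)
```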
If you are specifically grabbing the data from an li tag, you can do this:
//li[@class]
and loop through the result set to find the element whose class attribute is "tab-off". Or use
//li[@class='tab-off']
if you're in a position to hard-code the value.
I assume you have already put your file through an XML parser like a DOMParser. This will make it much easier to extract any other values you may need on a specific tag.
What's the point of the name attribute on an HTML form? As far as I can tell, you can't read the form name on submission or do anything else with it. Does it serve a purpose?
In short, and probably oversimplifying a bit: It is used instead of id for browsers that don't understand document.getElementById.
These days it serves no real purpose. It is a legacy from the early days of the browser wars before the use of name to describe how to send control values when a form is submitted and id to identify an element within the page was settled.
From the specification:
The name attribute represents the form's name within the forms collection.
Once you assign a name to a form, you can refer to it via document.name_of_element throughout your code. It doesn't work well when you've got multiple elements with the same name, but it does allow shortcuts like:
<form name="myform" ...>
document.myform.submit();
instead of
document.getElementsByName('myform')[0].submit();
Here's what MDN has to say about it:
name
The name of the form. In HTML 4, its use is deprecated (id should be used instead). It must be unique among the forms in a document and not just an empty string in HTML 5.
(from <form>, Attributes, name)
I find it slightly confusing that it specifies the name must be a unique, non-empty string in HTML 5 when it was deprecated in HTML 4. (I'd guess that requirement only applies if the name attribute is specified at all?) But I think it's safe to say that any purpose it once served has been superseded by the id attribute.
You can use the name attribute as an "extra information" attribute - similarly as with a hidden input - but this keeps the extra information tied into the form, which makes it just a little simpler to read/access.
The name attribute is not completely redundant vis-à-vis id. As mentioned above, it is useful with <form> elements, but less well known is that it can also be used with any HTMLCollection, such as the children property of any DOM element.
An HTMLCollection, in addition to being an array-like object, has named properties corresponding to any named members (or to the first occurrence in the case of a non-unique name). This is useful for retrieving specific named nodes.
For example, in the following example HTML:
<div id='person1'>
<span name='firstname'>John</span>
<span name='lastname'>Doe</span>
<span name='middlename'></span>
</div>
<div id='person2'>
<span name='firstname'>Jane</span>
<span name='lastname'>Doe</span>
<span name='middlename'></span>
</div>
by naming each child, one can quickly and efficiently retrieve a named element, such as lastname, as such:
document.getElementById('person1').children.namedItem('lastname')
...and if there is no risk of 'length' being the name of a member element, (being that length is a reserved property of HTMLCollection), a more terse notation may be used instead:
document.getElementById('person1').children.lastname
DOM Living Standard 2019 March 29
An HTMLCollection object is a collection of elements...
The namedItem(key) method, when invoked, must run these steps:
If key is the empty string, return null.
Return the first element in the collection for which at least one of the following is true:
it has an ID which is key;
it is in the HTML namespace and has a name attribute whose value is key;
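The lookup steps quoted above can be sketched outside a browser. This is a rough Python model, not the DOM API: named_item is a hypothetical helper, the collection is a plain list of xml.etree.ElementTree elements, and the HTML-namespace check is skipped as a simplifying assumption.

```python
import xml.etree.ElementTree as ET

def named_item(collection, key):
    """Hypothetical model of HTMLCollection.namedItem(key)."""
    # Step 1: an empty key never matches.
    if key == "":
        return None
    # Step 2: return the first element whose id or name equals key.
    for element in collection:
        if element.get("id") == key or element.get("name") == key:
            return element
    # No element matched.
    return None

person1 = ET.fromstring(
    "<div id='person1'>"
    "<span name='firstname'>John</span>"
    "<span name='lastname'>Doe</span>"
    "<span name='middlename'></span>"
    "</div>"
)
children = list(person1)  # stand-in for the element's children collection
found = named_item(children, "lastname")
print(found.text)
```

In a browser, the equivalent call is the children.namedItem('lastname') expression shown earlier.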