XPath based on id attribute value that starts with something? - html

For example if I have multiple anchor elements on a site and the easiest way to get them is via their ID, but the IDs look like this:
lots of html...
hop1
...lots of html...
hop2
...lots of html...
hop3
...lots of html
Is it possible to select the href attributes of all anchor elements whose id starts with "foo_"? In other words, can I use a wildcard in an attribute's value in XPath?

This XPath expression, which works with all versions of XPath,
//a[starts-with(@id,"foo_")]/@href
will select all a/@href attributes whose a has an id attribute value that starts with "foo_".
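As a rough illustration of what that expression selects, here is a minimal Python sketch. It assumes well-formed markup and uses only the standard library, whose ElementTree path support does not include starts-with(), so the predicate is emulated in plain Python. The sample document, its href values, and the helper name hrefs_with_id_prefix are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the ids and href values are invented for illustration.
DOC = """
<html><body>
  <a id="foo_1" href="/one">hop1</a>
  <a id="bar_1" href="/skip">other</a>
  <a id="foo_2" href="/two">hop2</a>
  <a href="/no-id">no id</a>
</body></html>
"""

def hrefs_with_id_prefix(markup, prefix):
    """Emulate //a[starts-with(@id, prefix)]/@href with the stdlib."""
    root = ET.fromstring(markup)
    return [
        a.get("href")
        for a in root.iter("a")                # every <a>, in document order
        if a.get("id", "").startswith(prefix)  # starts-with(@id, prefix)
        and a.get("href") is not None          # /@href yields only existing attributes
    ]

print(hrefs_with_id_prefix(DOC, "foo_"))
```

With an engine that implements full XPath 1.0, the expression can be passed in directly and no emulation is needed.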

Yes, in XPath 2.0 or later (for example from XSLT 2.0) you can use the matches() function:
Starting with foo_: //a/@id[matches(.,'^foo_\d+')]
Containing foo_: //a/@id[matches(.,'foo_\d+')]
Please specify which language or XPath processor you are using.
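The difference between the anchored and unanchored patterns can be sketched in Python. This is only an approximation: it uses Python's re module rather than an XPath 2.0 engine (the two regex dialects agree for these simple patterns), and the sample ids and the helper name ids_matching are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical markup with ids chosen to show the difference between the patterns.
DOC = '<div><a id="foo_1"/><a id="xfoo_2"/><a id="foo_x"/><a id="bar"/></div>'

def ids_matching(markup, pattern):
    """Emulate //a/@id[matches(., pattern)] using re.search (unanchored, like matches())."""
    root = ET.fromstring(markup)
    regex = re.compile(pattern)
    return [
        a.get("id")
        for a in root.iter("a")
        if a.get("id") is not None and regex.search(a.get("id"))
    ]

print(ids_matching(DOC, r"^foo_\d+"))  # id must start with foo_ followed by a digit
print(ids_matching(DOC, r"foo_\d+"))   # foo_ plus a digit may appear anywhere in the id
```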

Related

How to select HTML element by XPATH if attribute name contains space?

for example
<li big class="attribute"></li>
In Selenium, selecting it would look like this:
driver.find_element(By.XPATH, '//*[@big class="attribute"]');
So how can I select the element by XPath? Using that results in an invalid expression.
Selecting just by class, like this: //*[@class="attribute"], doesn't work.
If you want to select the element by both attributes, the correct code would be
driver.find_element(By.XPATH, '//li[@big and @class="attribute"]')
Note that big seems to be a separate boolean attribute (it might not have an explicit value), not an attribute name that contains a space.
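To see why big and class are two separate attributes, the snippet from the question can be fed to Python's standard-library HTML parser. This is a small sketch; the class name AttrCollector and the helper matches_li are made up for the example, and the helper only mirrors the predicate of the XPath above rather than evaluating XPath.

```python
from html.parser import HTMLParser

class AttrCollector(HTMLParser):
    """Record every start tag together with its parsed attribute list."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))

def matches_li(tag, attrs):
    """Mirror the predicate of //li[@big and @class="attribute"]."""
    d = dict(attrs)
    return tag == "li" and "big" in d and d.get("class") == "attribute"

parser = AttrCollector()
parser.feed('<li big class="attribute"></li>')
parser.close()

# A valueless (boolean) attribute is reported with the value None.
print(parser.seen)
print([matches_li(tag, attrs) for tag, attrs in parser.seen])
```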

Xpath with wildcards between id and divs?

I am trying to enter data in a table using Robot Framework. The table has an ID, but it changes every time I load the page (it is some kind of UUID), so I can't use it as an "anchor" for my XPath. However, there is a heading for this table with a fixed ID that seems a reasonable place to start. In between the heading and the table there are a couple of divs. So something like this (a mix of pseudocode and what I get when I copy the selector and XPath in Chrome) gets to the first cell in the first line of the table:
//*[@id="heading"] (a bunch of divs) /*[@id="random string of letters"]/div[3]/div/div/div[2]
I would like to write an XPath that looks something like this:
//*[@id="heading"] [wildcard for the random ID and divs] /div[3]/div/div/div[2]
How do I write this?
Thank you.
If only one element inside the "heading" has an id attribute, you could use
//*[@id="heading"]//*[@id]/div[3]/div/div/div[2]
If more than one element has an id attribute, you need something more specific, e.g. if the id contains a certain tag:
//*[@id="heading"]//*[contains(@id, "tag")]/div[3]/div/div/div[2]
or (if using XPath 2.0), if this is the only id within the heading that contains a UUID:
//*[@id="heading"]//*[matches(@id,"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")]/div[3]/div/div/div[2]
Otherwise you will have to find something unique (within the context of "heading") from which to start the div[3]/div/div/div[2] search (if you are lucky, div[3]/div/div/div[2] is unique enough).
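The first variant (any descendant of the heading that carries an id) can be tried out with Python's standard-library ElementTree, which supports this subset of XPath. This is a sketch under assumptions: the markup, the generated id value, and the shortened trailing path /div[2] are invented for the example, and like the expressions above it assumes the element with the changing id sits inside the element with id "heading".

```python
import xml.etree.ElementTree as ET

# Hypothetical page structure; the inner id stands in for the id that changes on each load.
DOC = """
<root>
  <section id="heading">
    <div>
      <div>
        <div id="3f2a-generated">
          <div>first</div>
          <div>second</div>
        </div>
      </div>
    </div>
  </section>
</root>
"""

root = ET.fromstring(DOC)

# "//" skips the unknown wrapper divs, "*[@id]" accepts whatever id was generated.
cells = root.findall(".//*[@id='heading']//*[@id]/div[2]")
print([cell.text for cell in cells])
```

If the heading contained several elements with an id, this path would return one match per such element, which is where the contains() or matches() variants become necessary.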

Selecting element based on attribute order in XPath?

I am working on a project using the Html-Agility-Pack and I need to build a list of each link that has an href attribute as its first attribute. What XPath expression would be used for this?
Example (I would want to only select the first):
<a href="http://someurl.com"/>
<a id="someid" href="http://someurl.com"/>
No, don't do that.
You really don't want to select elements based upon the ordering of their attributes, because attribute order is arbitrary in HTML and XML. Find another criterion to limit your selections:
attribute presence or attribute value
child element presence or string value
preceding element value, possibly a label
etc
You want to choose a criterion that's invariant across all instances of the HTML/XML documents you may encounter. Attribute order is not such a criterion.

Select attribute content XPath

I have an XPath
//*[@class]
I would like to make an XPath to select the content inside this attribute.
<li class="tab-off" id="navList0">
So in this case I would like to extract the text "tab-off", is this possible with XPath?
Your original //*[@class] XPath query returns all elements which have a class attribute. What you want is //*[@class]/@class to retrieve the attribute itself.
In case you just want the value and not the attribute name, try string(//*[@class]/@class) instead.
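A minimal Python sketch of the same idea, using only the standard library. ElementTree paths cannot end in an attribute step, so the elements are selected with .//*[@class] and the value is read with get(). The second and third list items are made up to show that elements without a class are skipped.

```python
import xml.etree.ElementTree as ET

# The first <li> is from the question; the other two are invented for illustration.
DOC = (
    '<ul>'
    '<li class="tab-off" id="navList0">A</li>'
    '<li class="tab-on" id="navList1">B</li>'
    '<li id="navList2">C</li>'
    '</ul>'
)

root = ET.fromstring(DOC)

# Counterpart of //*[@class]/@class: one value per element that has the attribute.
all_classes = [el.get("class") for el in root.iterfind(".//*[@class]")]

# Counterpart of string(//*[@class]/@class): the value of the first match only.
first_class = all_classes[0] if all_classes else ""

print(all_classes)
print(first_class)
```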
If you are specifically grabbing the data from an <li> tag, you can do this:
//li[@class]
and loop through the result set to find an element whose class attribute is "tab-off". Or
//li[@class='tab-off']
if you're in a position to hard-code the value.
I assume you have already put your file through an XML parser like a DOMParser. This will make it much easier to extract any other values you may need on a specific tag.

What's the point of HTML forms `name` attribute?

What's the point of the name attribute on an HTML form? As far as I can tell, you can't read the form name on submission or do anything else with it. Does it serve a purpose?
In short, and probably oversimplifying a bit: It is used instead of id for browsers that don't understand document.getElementById.
These days it serves no real purpose. It is a legacy from the early days of the browser wars before the use of name to describe how to send control values when a form is submitted and id to identify an element within the page was settled.
From the specification:
The name attribute represents the form's name within the forms collection.
Once you assign a name to an element, you can refer to that element via document.name_of_element throughout your code. It doesn't work to tell when you've got multiple fields of the same name, but it does allow shortcuts like:
<form name="myform" ...>
document.myform.submit();
instead of
document.getElementsByName('myform')[0].submit();
Here's what MDN has to say about it:
name
The name of the form. In HTML 4, its use is deprecated (id should be used instead). It must be unique among the forms in a document and not just an empty string in HTML 5.
(from <form>, Attributes, name)
I find it slightly confusing that it specifies that the name must be a unique, non-empty string in HTML 5 when it was deprecated in HTML 4. (I'd guess that requirement only applies if the name attribute is specified at all?) But I think it's safe to say that any purpose it once served has been superseded by the id attribute.
You can use the name attribute as an "extra information" attribute - similarly as with a hidden input - but this keeps the extra information tied into the form, which makes it just a little simpler to read/access.
The name attribute is not completely redundant vis-à-vis id. As mentioned above, it is useful with <form> elements, but less well known is that it can also be used with any HTMLCollection, such as the children property of any DOM element.
An HTMLCollection, in addition to being an array-like object, has named properties corresponding to any named members (or to the first occurrence in the case of a non-unique name). This is useful for retrieving specific named nodes.
For example, in the following example HTML:
<div id='person1'>
<span name='firstname'>John</span>
<span name='lastname'>Doe</span>
<span name='middlename'></span>
</div>
<div id='person2'>
<span name='firstname'>Jane</span>
<span name='lastname'>Doe</span>
<span name='middlename'></span>
</div>
by naming each child, one can quickly and efficiently retrieve a named element, such as lastname, as such:
document.getElementById('person1').children.namedItem('lastname')
...and if there is no risk of 'length' being the name of a member element, (being that length is a reserved property of HTMLCollection), a more terse notation may be used instead:
document.getElementById('person1').children.lastname
DOM Living Standard 2019 March 29
An HTMLCollection object is a collection of elements...
The namedItem(key) method, when invoked, must run these steps:
If key is the empty string, return null.
Return the first element in the collection for which at least one of the following is true:
it has an ID which is key;
it is in the HTML namespace and has a name attribute whose value is key;