Xpath node-set nesting order selection - html

Is there an XPath 1.0 expression that I could use, starting at the div[@id='rootTag'] context, to select the different nested span descendants based on how deeply they are nested?
For example could you use something like span[2] to select the second most deeply nested span tag rather than second span child of the same parent element?
<div id='rootTag'>
<span>Test</span>
<div>
<span>Test</span>
<span>Test</span>
</div>
</div>
<span>Test</span>
</div>
<div>
<div>
<div>
<div>
<span>Test</span>
</div>
<span>Test</span>
</div>
</div>
</div>
</div>

It's a bit (a lot...) of a hack, but it can be done this way:
Assume your html is like this:
levels = """<div id='rootTag'>
<span>Level2</span>
<div>
<span>Level3</span>
<div>
<span>Level4</span>
</div>
</div>
<div>
<span>Level3</span>
</div>
<div>
<div>
<div>
<div>
<span>Level6</span>
</div>
<span>Level5</span>
</div>
</div>
</div>
</div>"""
We then do this:
#First collect the data:
from lxml import etree #you have to make sure your html is well-formed, or it won't work
root = etree.fromstring(levels)
tree = etree.ElementTree(root)
#collect the paths of all <span> elements
paths = [tree.getpath(e) for e in root.iter('span')]
#determine the nesting level of each <span> element
nests = [e.count('/') for e in paths] #or, alternatively:
#nests = [tree.getpath(e).count('/') for e in root.iter('span')]
From here, we use the nesting level in the nests list to look up the corresponding entry in the paths list (the two lists share the same order). For example, to get the <span> element with the deepest nesting level:
deepest = nests.index(max(nests))
print(paths[deepest],root.xpath(paths[deepest])[0].text)
Output:
/div/div[3]/div/div/div/span Level6
Or to extract the <span> element with a level 4 nesting:
print(paths[nests.index(4)],root.xpath(paths[nests.index(4)])[0].text)
Output:
/div/div[1]/div/span Level4
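As a complement to the path-counting hack, here is a minimal sketch that groups the spans by depth using only Python's standard xml.etree.ElementTree. The helper name spans_by_depth and the LEVELS constant are made up for this illustration; the markup is the same as the levels string above, and depth is counted with the root element as level 1 so the numbers line up with the Level labels.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Same sample markup as the `levels` string above, repeated so this runs on its own.
LEVELS = """<div id='rootTag'>
<span>Level2</span>
<div>
<span>Level3</span>
<div>
<span>Level4</span>
</div>
</div>
<div>
<span>Level3</span>
</div>
<div>
<div>
<div>
<div>
<span>Level6</span>
</div>
<span>Level5</span>
</div>
</div>
</div>
</div>"""


def spans_by_depth(root, tag="span"):
    """Map nesting depth (root element = 1) to the matching elements at that depth."""
    found = defaultdict(list)
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        if node.tag == tag:
            found[depth].append(node)
        # Push children in reverse so they are popped in document order.
        for child in reversed(list(node)):
            stack.append((child, depth + 1))
    return dict(found)


by_depth = spans_by_depth(ET.fromstring(LEVELS))
depths = sorted(by_depth, reverse=True)   # deepest level first
deepest = by_depth[depths[0]]
second_deepest = by_depth[depths[1]]
print(depths[0], [s.text for s in deepest])
print(depths[1], [s.text for s in second_deepest])
```

With the groups in hand, "second most deeply nested" is simply the second entry after sorting the depths in descending order. Several spans can share a depth, which is why each level maps to a list.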

Related

CSS selector for the element without any classname or attribute

Is it possible to write a CSS selector matching the element which does not contain any attributes or class names?
For example, I have html like the following (but with tons of divs and dynamic class names) and I want to match the second div (it does not contain class)
<div class="xeuugli x2lwn1j x1cy8">
<div>
<div class="xeuugli x2lwn1j x1cy8">
<div class="xeuugli x2lwn1j n94">
<div class="x8t9es0 x10d9sdx xo1l8bm xrohj xeuugli">$0,00</div>
</div>
</div>
<div class="xeuugli x2lwn1j x1cy8zghib x19lwn94">
<span class="x8t9es0 xw23nyj xeuugli">Helloworld.</span>
</div>
</div>
</div>
P.S. Getting the div like div:nth-child(2) is not a solution.
P.P.S. Could you please advise in general why the dynamic class names are used in the development?
Well, if you can't use classes, maybe try giving it an ID if possible, like
<div class="xeuugli x2lwn1j x1cy8">
<div id="myId">
<div class="xeuugli x2lwn1j x1cy8">
<div class="xeuugli x2lwn1j n94">
<div class="x8t9es0 x10d9sdx xo1l8bm xrohj xeuugli">$0,00</div>
</div>
</div>
<div class="xeuugli x2lwn1j x1cy8zghib x19lwn94">
<span class="x8t9es0 xw23nyj xeuugli">Helloworld.</span>
</div>
</div>
</div>
and then you can select the ID via the CSS #id selector like so:
#myId {
/*stuff here*/
}
If you can't have IDs either, we could get really creative by wrapping it in a grouping element that you promise never to use anywhere else on the page, like <section> or <article>, and then you could use
const elem = document.getElementsByTagName("article")[0];
elem.style.border = '2px solid red';
getElementsByTagName returns a live HTMLCollection (array-like, but not a true array) of all elements with that tag name, which in our case contains only the one you need. You can then apply the CSS you need via JavaScript.

Traversing the DOM with querySelector

I'm using the statement document.querySelector("[data-testid='people-menu'] div:nth-child(4)") in the console to give me the below HTML snippet:
<div>
<span class="jss1">
<div class="jss2">
<p class="jss3">Owner</p>
</div>
</span>
<div class="jss4">
<div class="5" title="User Title">
<p class="jss6">UT</p>
</div>
<div class="jss7">
<p class="jss82">User Title</p>
<span class="jss9">Project Manager</span>
</div>
</div>
</div>
I'd like to extend the statement in the console to extract the title "User Title" but can't figure out what combination of nth-child or nextSibling (or something else) to use. The closest I've gotten is:
document.querySelector("[data-testid='people-menu'] div:nth-child(4) span:nth-child(1)")
which gives me the span with class jss1.
I expected document.querySelector("[data-testid='people-menu'] div:nth-child(4) span:nth-child(1).nextSibling") to give me the div with class jss4, but it returns null.
I can't use class selectors because those are generated dynamically at build.
Why not just add [title] onto your querySelector?
document.querySelector("[data-testid='people-menu'] div:nth-child(4) [title]")
You can then read whatever you need from that element, for example its title attribute. This assumes title is a unique attribute within this section of the HTML.

How to query nested level DOM using javascript and recursive

I have an issue like this and still can't find a solution for it.
I have a DOM query like this
#someid .someclass p
Below is the DOM
<div id='someid'>
<div class='someclass'>
<p>
some text
</p>
</div>
</div>
How can I query the p element using only these three APIs?
document.getElementById
document.getElementsByClassName
document.getElementsByTagName
I want to use recursion in this case until I reach the p element.
Thanks everyone for your guidance.
I've edited my answer; maybe you are looking for something like this:
var targetContent1 = myFunction('someid', 'someclass', 'p')
var targetContent2 = myFunction('anotherId', 'anotherClass', 'span')
console.log('targetContent1', targetContent1)
console.log('targetContent2', targetContent2)
function myFunction(idName, className, tagName) {
return document.getElementById(idName).getElementsByClassName(className)[0].getElementsByTagName(tagName)[0].textContent;
}
<div id='someid'>
<div class='someclass'>
<p>some text</p>
</div>
</div>
<div id='anotherId'>
<div class='anotherClass'>
<span>Another Tag Text!</span>
</div>
</div>
Use these:
document.getElementById("someid");
document.getElementsByClassName("someclass");
document.getElementsByTagName("p")

How to find the text in all div elements without the first div coming from class "B"

I'm trying to capture the text inside all the div elements without the text of the first div that is the child of class "B"
I've been banging my head all day, but I can't seem to get it to work properly.
<div class="A">
Text 1
</div>
<div class="B">
<div>
Welcome 1
</div>
<div>
Welcome 2
</div>
</div>
This is the expression I'm using:
//body//text()[not (//div[@class='B']/div[1])]
but it is not returning any results.
After making the XML well-formed by giving it a single root element,
<div>
<div class="A">
Text 1
</div>
<div class="B">
<div>
Welcome 1
</div>
<div>
Welcome 2
</div>
</div>
</div>
Here are all the div elements that have no div descendants and are not the first div with a parent with a @class attribute value of 'B':
//div[not(descendant::div) and not(../@class='B' and position() = 1)]
The above XPath selects these two div elements:
<div class="A">
Text 1
</div>
<div>
Welcome 2
</div>
So you can get the associated text() nodes using this XPath:
//div[not(descendant::div) and not(../@class='B' and position() = 1)]/text()
...which will select:
Text 1
Welcome 2
without selecting Welcome 1, as requested.
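For readers who want to check the selection logic without an XPath 1.0 engine, here is a rough sketch of the same two conditions in plain Python using the standard xml.etree.ElementTree. That module does not support not(), descendant:: or position(), so the predicate is written by hand. The names DOC, parent_of and is_wanted are invented for this illustration.

```python
import xml.etree.ElementTree as ET

# The well-formed document from the answer above.
DOC = """<div>
<div class="A">
Text 1
</div>
<div class="B">
<div>
Welcome 1
</div>
<div>
Welcome 2
</div>
</div>
</div>"""

root = ET.fromstring(DOC)

# ElementTree elements have no parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def is_wanted(div):
    """True for a div with no div descendants that is not the first
    div child of a parent whose class attribute is exactly "B"."""
    if div.find(".//div") is not None:       # not(descendant::div) fails
        return False
    parent = parent_of.get(div)
    if parent is not None and parent.get("class") == "B":
        if parent.find("div") is div:        # position() = 1 among div siblings
            return False
    return True


texts = [(d.text or "").strip() for d in root.iter("div") if is_wanted(d)]
print(texts)
```

On the sample document this keeps Text 1 and Welcome 2 and skips Welcome 1, matching the XPath result described above.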

Xpath: select div with an anchor descendant whose depth is unknown

Sample html:
<div>
<div class="foo bar baz"> <!-- target 1 -->
<div>
<span>
hello world
</span>
</div>
</div>
<div class="foo bar">foo</div>
<div class="bar"> <!-- target 2 -->
<div>
<div>
<span>
hello world
</span>
</div>
</div>
</div>
</div>
I want to select divs that: 1) have the class name bar and 2) have an <a> descendant whose href contains hello.
My problem is that the <a> tag could be nested at different levels. How do I handle this correctly?
You can use the relative descendant-or-self axis (.//) to look for the <a> element at any depth below the div:
//div[contains(concat(' ', @class, ' '), ' bar ')][.//a[contains(@href,'hello')]]
Related discussion: How can I find an element by CSS class with XPath?
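The two predicates can also be read as two small tests: "bar is one of the whitespace-separated class tokens" and "some <a> at any depth has hello in its href". Below is a hedged sketch of that reading in standard-library Python. The SAMPLE markup, its id attributes and href values, and the helper names are all hypothetical, added only so the result is easy to check. They are not from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: ids and hrefs are invented for this illustration.
SAMPLE = """<div>
<div class="foo bar baz" id="t1"><div><span><a href="/hello/1">x</a></span></div></div>
<div class="foo bar" id="skip1">foo</div>
<div class="bar" id="t2"><div><div><a href="/say-hello">y</a></div></div></div>
<div class="foobar" id="skip2"><a href="/hello">z</a></div>
</div>"""


def has_class(elem, name):
    """Token match, the same idea as contains(concat(' ', @class, ' '), ' bar ')."""
    return name in elem.get("class", "").split()


def has_hello_link(elem):
    """Any <a> below elem, at any depth, whose href contains 'hello' (the .//a part)."""
    return any("hello" in a.get("href", "") for a in elem.iter("a"))


root = ET.fromstring(SAMPLE)
matches = [
    d.get("id")
    for d in root.iter("div")
    if has_class(d, "bar") and has_hello_link(d)
]
print(matches)
```

The div with class foobar is deliberately included: a plain contains(@class, 'bar') would match it, while the token test (like the concat trick in the XPath above) does not.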