How to query nested DOM levels using JavaScript and recursion

I have an issue like this and still can't find a solution for it.
I have a DOM query like this
#someid .someclass p
Below is the DOM
<div id='someid'>
<div class='someclass'>
<p>
some text
</p>
</div>
</div>
How can I query the p element using only these 3 APIs:
document.getElementById
document.getElementsByClassName
document.getElementsByTagName
I want to use recursion to work down the levels until I get the p element.
Thanks everyone for your guidance.

I've edited my answer; maybe you are looking for something like this:
var targetContent1 = myFunction('someid', 'someclass', 'p')
var targetContent2 = myFunction('anotherId', 'anotherClass', 'span')
console.log('targetContent1', targetContent1)
console.log('targetContent2', targetContent2)
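function myFunction(idName, className, tagName) {
// Narrow one level at a time: the element with the id, then its first descendant
// with the class, then that element's first descendant with the tag.
// Note: this throws a TypeError if any level is missing.
return document.getElementById(idName).getElementsByClassName(className)[0].getElementsByTagName(tagName)[0].textContent;
}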
<div id='someid'>
<div class='someclass'>
<p>some text</p>
</div>
</div>
<div id='anotherId'>
<div class='anotherClass'>
<span>Another Tag Text!</span>
</div>
</div>
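The answer above chains the three calls directly, while the question asks for recursion. Below is a minimal sketch of a recursive version. It assumes the selector is a space-separated list of simple tokens (#id, .class or a bare tag name) joined only by the descendant combinator, and that an #id token is resolved on a node that provides getElementById (in practice document, so the id should come first). queryChain and resolve are hypothetical helper names, not part of any library.

```javascript
// Resolve a selector such as "#someid .someclass p" one level at a time,
// using only getElementById, getElementsByClassName and getElementsByTagName.
// Assumption: tokens are "#id", ".class" or a bare tag name, separated by
// spaces that mean "descendant of" (no other combinators are handled).
function queryChain(root, selector) {
  const tokens = selector.trim().split(/\s+/);
  return resolve(root, tokens, 0);
}

// Recursive step: match tokens[index] below `node`, then recurse into each
// candidate with the next token. Returns the first element that satisfies
// the whole chain (depth-first), or null if none does.
function resolve(node, tokens, index) {
  // Base case: every token has matched, so `node` is the element we want.
  if (index === tokens.length) return node;

  const token = tokens[index];
  let candidates;
  if (token.startsWith('#')) {
    // getElementById lives on document, not on ordinary elements, so an
    // id token is only supported where the node provides it.
    if (typeof node.getElementById !== 'function') {
      throw new Error(`cannot resolve "${token}" here: getElementById is only on document`);
    }
    const match = node.getElementById(token.slice(1));
    candidates = match ? [match] : [];
  } else if (token.startsWith('.')) {
    candidates = Array.from(node.getElementsByClassName(token.slice(1)));
  } else {
    candidates = Array.from(node.getElementsByTagName(token));
  }

  // Try each candidate in turn; backtrack if a branch finds nothing.
  for (const candidate of candidates) {
    const found = resolve(candidate, tokens, index + 1);
    if (found) return found;
  }
  return null;
}
```

In a browser, queryChain(document, '#someid .someclass p') should return the p element from the question, or null if any level finds nothing. Because resolve tries every candidate before giving up, it still finds the p when the first .someclass element has none, which the chained [0] lookups cannot do.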

Use these:
document.getElementById("someid");
document.getElementsByClassName("someclass");
document.getElementsByTagName("p");

Related

CSS selector for the element without any class name or attribute

Is it possible to write a CSS selector matching the element which does not contain any attributes or class names?
For example, I have HTML like the following (but with many more divs and dynamic class names), and I want to match the second div, which has no class:
<div class="xeuugli x2lwn1j x1cy8">
<div>
<div class="xeuugli x2lwn1j x1cy8">
<div class="xeuugli x2lwn1j n94">
<div class="x8t9es0 x10d9sdx xo1l8bm xrohj xeuugli">$0,00</div>
</div>
</div>
<div class="xeuugli x2lwn1j x1cy8zghib x19lwn94">
<span class="x8t9es0 xw23nyj xeuugli">Helloworld.</span>
</div>
</div>
</div>
P.S. Selecting the div with div:nth-child(2) is not an option.
P.P.S. Could you also explain, in general, why dynamic class names are used in development?
Well, if you can't use classes, maybe try giving it an ID if possible, like
<div class="xeuugli x2lwn1j x1cy8">
<div id="myId">
<div class="xeuugli x2lwn1j x1cy8">
<div class="xeuugli x2lwn1j n94">
<div class="x8t9es0 x10d9sdx xo1l8bm xrohj xeuugli">$0,00</div>
</div>
</div>
<div class="xeuugli x2lwn1j x1cy8zghib x19lwn94">
<span class="x8t9es0 xw23nyj xeuugli">Helloworld.</span>
</div>
</div>
</div>
and then you can select it with the CSS #id selector, like so:
#myId {
/*stuff here*/
}
If you can't use IDs either, you could get creative: pick a grouping element that you promise never to use anywhere else, such as <section> or <article>, and then use
const elem = document.getElementsByTagName("article")[0];
elem.style.border = '2px solid red';
getElementsByTagName returns a live HTMLCollection (not an array) of all elements with that tag name; [0] takes the first one, which in our case is the only one you need. You can then give it the CSS you need from JavaScript, as the style line does.
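As a hedged supplement to the answer above: the question asked for a pure CSS selector, and div:not([class]) is a standard CSS selector that matches divs without a class attribute. CSS has no single selector for "no attributes at all", so this sketch checks the element's attributes collection in script instead. divsWithoutAttributes and divsWithoutClass are hypothetical helper names, and root is assumed to be document or an element.

```javascript
// Return every <div> under `root` that has no attributes at all.
// Assumption: `root` is `document` or an element; CSS alone cannot say
// "no attributes whatsoever", so the check happens in script.
function divsWithoutAttributes(root) {
  return Array.from(root.getElementsByTagName('div'))
    .filter((div) => div.attributes.length === 0);
}

// Narrower case, plain CSS: divs that have no class attribute
// (an empty class="" still counts as having one).
function divsWithoutClass(root) {
  return Array.from(root.querySelectorAll('div:not([class])'));
}
```

For the markup in the question, both helpers should return only the second div.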

Traversing the DOM with querySelector

I'm using the statement document.querySelector("[data-testid='people-menu'] div:nth-child(4)") in the console to give me the below HTML snippet:
<div>
<span class="jss1">
<div class="jss2">
<p class="jss3">Owner</p>
</div>
</span>
<div class="jss4">
<div class="5" title="User Title">
<p class="jss6">UT</p>
</div>
<div class="jss7">
<p class="jss82">User Title</p>
<span class="jss9">Project Manager</span>
</div>
</div>
</div>
I'd like to extend the statement in the console to extract the title "User Title" but can't figure out what combination of nth-child or nextSibling (or something else) to use. The closest I've gotten is:
document.querySelector("[data-testid='people-menu'] div:nth-child(4) span:nth-child(1)")
which gives me the span with class jss1.
I expected document.querySelector("[data-testid='people-menu'] div:nth-child(4) span:nth-child(1).nextSibling") to give me the div with class jss4, but it returns null.
I can't use class selectors because those are generated dynamically at build.
Why not just add [title] onto your querySelector?
document.querySelector("[data-testid='people-menu'] div:nth-child(4) [title]")
You can then get whatever you need from that element. This assumes title is a unique attribute within this section of the HTML.
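Two hedged sketches of reading the title. The first follows the answer's [title] idea. The second addresses the nextSibling attempt in the question: inside a selector string, .nextSibling is parsed as a class selector (an element with class "nextSibling"), so nothing matches and querySelector returns null. The DOM property has to be read on the element instead, and nextElementSibling skips the whitespace text node that nextSibling may return. Both assume the markup shown in the question; readUserTitle, readUserTitleViaSibling and ENTRY are hypothetical names.

```javascript
// Selector from the question for the menu entry (assumed stable).
const ENTRY = "[data-testid='people-menu'] div:nth-child(4)";

// Approach from the answer: find the descendant that carries a title
// attribute and read it. Returns null if either lookup fails.
function readUserTitle(root) {
  const entry = root.querySelector(ENTRY);
  const titled = entry && entry.querySelector('[title]');
  return titled ? titled.getAttribute('title') : null;
}

// Sibling-based alternative: start at the first <span> child of the entry,
// then step to the next *element* sibling (the div.jss4 wrapper).
// nextSibling could return the whitespace text node between the tags instead.
function readUserTitleViaSibling(root) {
  const span = root.querySelector(`${ENTRY} > span`);
  const wrapper = span && span.nextElementSibling;
  const titled = wrapper && wrapper.querySelector('[title]');
  return titled ? titled.getAttribute('title') : null;
}
```

In the console, readUserTitle(document) should give "User Title" for the snippet above.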

XPath node-set nesting order selection

Is there an XPath 1.0 expression I could use, starting from the div[@id='rootTag'] context, to select the nested span descendants based on how deeply they are nested?
For example, could I use something like span[2] to select the second most deeply nested span, rather than the second span child of the same parent element?
<div id='rootTag'>
<span>Test</span>
<div>
<span>Test</span>
<span>Test</span>
</div>
</div>
<span>Test</span>
</div>
<div>
<div>
<div>
<div>
<span>Test</span>
</div>
<span>Test</span>
</div>
</div>
</div>
</div>
It's a bit (a lot...) of a hack, but it can be done this way:
Assume your html is like this:
levels = """<div id='rootTag'>
<span>Level2</span>
<div>
<span>Level3</span>
<div>
<span>Level4</span>
</div>
</div>
<div>
<span>Level3</span>
</div>
<div>
<div>
<div>
<div>
<span>Level6</span>
</div>
<span>Level5</span>
</div>
</div>
</div>
</div>"""
We then do this:
#First collect the data:
from lxml import etree #you have to make sure your html is well-formed, or it won't work
root = etree.fromstring(levels)
tree = etree.ElementTree(root)
#collect the paths of all <span> elements
paths = [tree.getpath(e) for e in root.iter('span')]
#determine the nesting level of each <span> element
nests = [e.count('/') for e in paths] #or, alternatively:
#nests = [tree.getpath(e).count('/') for e in root.iter('span')]
From here, we use the nesting level in the nests list to extract the comparable element in the paths list. For example, to get the <span> element with the deepest nesting level:
deepest = nests.index(max(nests))
print(paths[deepest],root.xpath(paths[deepest])[0].text)
Output:
/div/div[3]/div/div/div/span Level6
Or to extract the <span> element with a level 4 nesting:
print(paths[nests.index(4)],root.xpath(paths[nests.index(4)])[0].text)
Output:
/div/div[1]/div/span Level4
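For reference, XPath 1.0 can test depth directly with count(ancestor::*), which counts every element ancestor up to the document root; in the lxml example above, where rootTag is the root element, //span[count(ancestor::*) = 3] would select the Level4 span. Below is the same measure-then-filter idea as the lxml answer, sketched with the DOM's parentElement in JavaScript. Depth is counted from the rootTag element, so it is one less than the slash count used above (Level4 there is depth 3 here). depthBelow, spansAtDepth and deepestSpan are hypothetical names, and root is assumed to be the rootTag element.

```javascript
// Number of parentElement steps from `el` up to `root`
// (a direct child of root has depth 1). Returns -1 if `el` is not inside `root`.
function depthBelow(root, el) {
  let depth = 0;
  let node = el;
  while (node && node !== root) {
    node = node.parentElement;
    depth += 1;
  }
  return node === root ? depth : -1;
}

// All <span> descendants of `root` that sit exactly `depth` levels below it.
function spansAtDepth(root, depth) {
  return Array.from(root.getElementsByTagName('span'))
    .filter((span) => depthBelow(root, span) === depth);
}

// The most deeply nested <span> under `root` (the first one wins on ties), or null.
function deepestSpan(root) {
  let best = null;
  let bestDepth = -1;
  for (const span of Array.from(root.getElementsByTagName('span'))) {
    const d = depthBelow(root, span);
    if (d > bestDepth) {
      best = span;
      bestDepth = d;
    }
  }
  return best;
}
```

In a browser, deepestSpan(document.getElementById('rootTag')) should return the Level6 span for the markup in the answer.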

Is it possible to create a sort of HTML object (even using a framework)

I was wondering whether it is possible to create a sort of reusable HTML object instead of copy-pasting markup. I thought of doing it with JavaScript, but wondered if there was an easier way (writing HTML in JS is a bit tedious).
Basically, let's say I have a div like this:
<div class ="col">
<div class="Title">
Title
</div>
<div class="Text">
Text
</div>
</div>
What is the best approach: some sort of object where you can call objectName.create(title, text), or a JavaScript function like Function(title, text) that creates the element?
You could take the outer element and clone it, change its content and append it back to where you want it. Be advised that this may duplicate ids if your elements should have one.
function createHtml(title, text) {
const el = document.querySelector('.col').cloneNode(true);
el.querySelector('.Title').innerText = title;
el.querySelector('.Text').innerText = text;
document.body.appendChild(el);
}
createHtml("Foo", "Bar");
<div class="col">
<div class="Title">
Title
</div>
<div class="Text">
Text
</div>
</div>
Another option would be to create the element from scratch
function createElement(title, text) {
const el = document.createElement('div');
el.className = 'col';
const titleDiv = document.createElement('div');
titleDiv.className = 'Title';
titleDiv.appendChild(document.createTextNode(title));
const textDiv = document.createElement('div');
textDiv.className = 'Text';
textDiv.appendChild(document.createTextNode(text));
el.appendChild(titleDiv);
el.appendChild(textDiv);
document.body.appendChild(el);
}
createElement("Foo", "Bar");
Note that there are many frameworks (Angular, React, Vue, ...) that make this kind of thing easier.
Writing HTML in JS is not so bad now that template literals exist; you could do something like this:
function addCol(title, text){
document.querySelector(".list").innerHTML += `
<div class="col">
<div class="Title">
${title}
</div>
<div class="Text">
${text}
</div>
</div>
`;
}
addCol("hello", "world");
addCol("foo", "bar");
<div class="list"></div>
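A variant of the template-literal answer, sketched under two assumptions: title and text may contain characters such as < or &, and the list may already hold elements with event listeners. Escaping the values keeps them from being parsed as markup, and insertAdjacentHTML('beforeend', html) appends the new markup without re-parsing the existing children, which innerHTML += does (discarding their listeners). escapeHtml, colHtml and addCol(list, title, text) are hypothetical names.

```javascript
// Escape the characters that are significant in HTML text and attribute values,
// so a title like "<b>" is shown literally instead of being parsed as markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build the markup for one .col block from the question.
function colHtml(title, text) {
  return `<div class="col"><div class="Title">${escapeHtml(title)}</div>` +
    `<div class="Text">${escapeHtml(text)}</div></div>`;
}

// Append one block to `list` without re-parsing what is already inside it.
function addCol(list, title, text) {
  list.insertAdjacentHTML('beforeend', colHtml(title, text));
}
```

Usage in a browser: addCol(document.querySelector('.list'), 'hello', 'world');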

How to get simple text from HTML page with goquery?

I am new to Go. I am using goquery to extract data from an HTML page.
But the problem is the data I am looking for is not bounded by any HTML tag. It is simple text after a <br> tag. How can I extract it?
Edit: Here is the HTML code.
<div class="container">
<div class="row">
<div class="col-lg-8">
<p align="justify"><b>Name</b>Priyaka</p>
<p align="justify"><b>Surname</b>Patil</p>
<p align="justify"><b>Adress</b><br>India,Kolhapur</p>
<p align="justify"><b>Hobbies </b><br>Playing</p>
<p align="justify"><b>Eduction</b><br>12th</p>
<p align="justify"><b>School</b><br>New Highschool</p>
</div>
</div>
</div>
From this I want "Priyanka" and "12th".
The following is what you want:
doc.Find(".container").Find("[align=\"justify\"]").Each(func(_ int, s *goquery.Selection) {
// The label is the text of the <b> child; the value is whatever text follows it.
prefix := s.Find("b").Text()
result := strings.TrimSpace(strings.TrimPrefix(s.Text(), prefix))
fmt.Println(result)
})
Import "strings" and "fmt" at the top of your code. If you need a complete code example, check here.
Try querying for <br> and getting its siblings:
http://godoc.org/github.com/PuerkitoBio/goquery#Selection.Siblings