How to get span class text using jsoup - html

I am using the jsoup HTML parser and trying to navigate to a span by its class and get its text, but it returns nothing and the result size is always zero. I have pasted a small part of the HTML source below. Please help me extract the text.
<div class="list_carousel">
<div class="rightfloat arrow-position">
<a class="prev disabled" id="ucHome_prev" href="#"><span>prev</span></a>
<a class="next" id="ucHome_next" href="#"><span>next</span></a>
</div>
<div id="uc-container" class="carousel_wrapper">
<ul id="ucHome">
<li modelID="587">
<h3 class="margin-bottom10"> Ford Figo Aspire</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
<li modelID="899">
<h3 class="margin-bottom10"> Chevrolet Trailblazer</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 32 - 40 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
I have tried the code below:
Elements var_1=doc.getElementsByClass("list_carousel");//four classes with name of list_carousel
Elements var_2=var_1.eq(1);//selecting first div class
Elements var_3 = var_2.select("> div > span[class=cw-sprite rupee-medium]");
System.out.println(var_3 .eq(0).text());//printing first result of span text
Please ask if anything in my question is unclear. Thanks in advance.

There are several things to note about your code:
A) You can't get the text of the span, because it has no text in the first place:
<div>Estimated Price:
<span class="cw-sprite rupee-medium"></span>
5.50 - 7.50 lakhs
</div>
See? The text is in the div, not the span!
B) Your selector "> div > span[class=cw-sprite rupee-medium]" is not robust. Classes in HTML can occur in any order, so both
<span class="cw-sprite rupee-medium"></span>
<span class="rupee-medium cw-sprite"></span>
are equivalent. Your attribute selector only matches the first. This is why CSS has a dedicated class syntax, which you should use instead:
"> div > span.cw-sprite.rupee-medium"
Furthermore, you can leave out the first > if you like.
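To illustrate point B outside of jsoup, here is a minimal Python sketch of the difference between an exact attribute match and a class match. The helper names are hypothetical, and this is not how any selector engine is implemented; it only mirrors the general CSS rule that [class="..."] compares the whole attribute string while .a.b checks for each class name independently.

```python
def matches_exact_attribute(class_attr, wanted):
    """Mimics [class="..."]: the whole attribute string must be equal."""
    return class_attr == wanted


def matches_class_selector(class_attr, *names):
    """Mimics .a.b: every name must appear among the whitespace-separated classes."""
    return set(names) <= set(class_attr.split())


# Compare both strategies on the two orderings from the answer.
for attr in ("cw-sprite rupee-medium", "rupee-medium cw-sprite"):
    print(
        attr,
        "|",
        matches_exact_attribute(attr, "cw-sprite rupee-medium"),
        matches_class_selector(attr, "cw-sprite", "rupee-medium"),
    )
```

Only the class-based check accepts both orderings, which is the reason to prefer span.cw-sprite.rupee-medium.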
Proposed solution
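// first() returns a single Element (or null if nothing matched), not Elements
Element lcEl = doc.getElementsByClass("list_carousel").first();
Elements spans = lcEl.select("span.cw-sprite.rupee-medium");
for (Element span : spans) {
    // the price text belongs to the enclosing div, not to the empty span
    Element priceDiv = span.parent();
    System.out.println(priceDiv.text());
}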
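Point A can also be checked without jsoup. The sketch below uses Python's standard xml.etree.ElementTree on the single price line from the question (that line is well-formed, so an XML parser accepts it). The function name is made up for illustration. It suggests that the span itself holds no text and that the price is part of the surrounding div's text.

```python
import xml.etree.ElementTree as ET

# One price line copied from the question's HTML.
PRICE_LINE = (
    '<div>Estimated Price: '
    '<span class="cw-sprite rupee-medium"></span>'
    ' 5.50 - 7.50 lakhs</div>'
)


def span_and_div_text(markup):
    """Return (text inside the span, whitespace-normalized text of the whole div)."""
    div = ET.fromstring(markup)
    span = div.find("span")
    span_text = span.text or ""          # nothing sits between <span> and </span>
    div_text = "".join(div.itertext())   # every text node under the div, in order
    return span_text, " ".join(div_text.split())


print(span_and_div_text(PRICE_LINE))
# -> ('', 'Estimated Price: 5.50 - 7.50 lakhs')
```

This is why the proposed solution reads the text from span.parent() rather than from the span.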

Try
System.out.println(doc.select("#ucHome div:nth-child(3)").text());

Related

Traversing the DOM with querySelector

I'm using the statement document.querySelector("[data-testid='people-menu'] div:nth-child(4)") in the console to give me the below HTML snippet:
<div>
<span class="jss1">
<div class="jss2">
<p class="jss3">Owner</p>
</div>
</span>
<div class="jss4">
<div class="5" title="User Title">
<p class="jss6">UT</p>
</div>
<div class="jss7">
<p class="jss82">User Title</p>
<span class="jss9">Project Manager</span>
</div>
</div>
</div>
I'd like to extend the statement in the console to extract the title "User Title" but can't figure out what combination of nth-child or nextSibling (or something else) to use. The closest I've gotten is:
document.querySelector("[data-testid='people-menu'] div:nth-child(4) span:nth-child(1)")
which gives me the span with class jss1.
I expected document.querySelector("[data-testid='people-menu'] div:nth-child(4) span:nth-child(1).nextSibling") to give me the div with class jss4, but it returns null.
I can't use class selectors because those are generated dynamically at build.
Why not just add [title] onto your querySelector?
document.querySelector("[data-testid='people-menu'] div:nth-child(4) [title]")
You can then read the title attribute, or anything else you need, from the matched element. This assumes that title is an attribute that appears only once in this section of the HTML.

How to give a class to an <li> element that is inserted into my code automatically by the Moodle cache?

I am creating a new Moodle block and I want to edit the CSS for it. Currently there are images that I want to display inline, but by default Moodle wraps them in ul and li elements that take the full width (display: block). I can't give these li elements a class, and I can't select them by tag because Moodle creates li elements for the titles and everything else in the block. If I style li, the rule applies to every element, such as the headings, and not only to the images. Can I give these li elements a class?
<ul class="unlist">
<li class="r0"><div class="column c1"><h3><div class="text_to_html">User grade</div></h3></div></li>
<li class="r1"><div class="column c1">first grade middle school </div></li>
<li class="r0"><div class="column c1"><h3><div class="text_to_html">user type</div></h3></div></li>
<li class="r1"><div class="column c1"> Student</div></li>
<li class="r0"><div class="column c1"><h3><div class="text_to_html">General Certificates</div></h3></div></li>
<li class="r1"><div class="column c1"><a download="download" class="inline" href="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/1900 $10000 Gold Certificate both sides.jpg"><img class="cvpic" src="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/1900%20%2410000%20Gold%20Certificate%20both%20sides.jpg"></a></div></li>
<li class="r0"><div class="column c1"><a download="download" class="inline" href="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/Aamir Javed Certificate.jpg"><img class="cvpic" src="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/Aamir%20Javed%20Certificate.jpg"></a></div></li>
<li class="r1"><div class="column c1"><a download="download" class="inline" href="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/US $20 1905 Gold Certificate.jpg"><img class="cvpic" src="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/US%20%2420%201905%20Gold%20Certificate.jpg"></a></div></li>
If you can use jQuery, you can do something like:
$(document).ready(function() {
$("ul.unlist li div img").css("display", "inline");
});
What it does:
Waits for the DOM to be ready.
Selects every <img> that sits inside a <div>, inside an <li>, inside the <ul> with the unlist class, and sets its display to inline.
So, basically, it selects the <img> elements in the last 3 rows of the snippet you provided.
That way, only the <li> elements that contain an <img> are affected, and not the other <li> elements that contain an <h3>, for example.
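If you can't use jQuery, here is the equivalent in pure JavaScript:
(function() {
    // querySelector would return only the first match; querySelectorAll returns all of them
    document.querySelectorAll("ul.unlist li div img").forEach(function(img) {
        img.style.display = "inline";
    });
})();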
Edit
On re-reading your question, you want to add a class, so here it is:
jQuery
$("ul.unlist li div img").addClass("your-class");
JavaScript
document.querySelectorAll("ul.unlist li div img").forEach(function(img) {
    img.className += " your-class";
});
Don't forget the leading space in the JavaScript version; otherwise the <img> class attribute becomes cvpicyour-class instead of cvpic your-class.

XPath to select link containing text?

I tried to use this XPath:
//*[contains(normalize-space(text()),'Jira')]
Also tried:
//*[contains(text(),'Jira')]
In the HTML example below, there is whitespace before and after the text "Jira", and I am not able to click on the link:
<a href="#/crm/usergroup-edit?id=572a3c84e4b07f6189958700"
ng-repeat="gp in groups | filter : userGroupSearch | orderBy:'-name':1"
class="ng-scope">
<div class="inventoryPanel" ng-style="myStyle" style="width: 15.8%;">
<h4 class="ng-binding">
<div class="groupIcon G">
<div class="text ng-binding">P</div>
</div>Jira
</h4>
</div>
</a>
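In XPath 1.0, contains(text(),'Jira') only looks at the first text node child of each element, and in this markup that first text node is either whitespace or "P", so neither of your expressions matches anything. The following XPath selects all a elements whose string value (the concatenated text of all descendants) contains the substring Jira: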
//a[contains(.,'Jira')]
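The difference between contains(.,'Jira') and contains(text(),'Jira') can be sketched with Python's standard library. ElementTree does not implement contains(), so this is only an approximation of the two XPath notions, with hypothetical helper names: el.text stands in for the first text node (accurate here because every element has text before its first child), and the joined itertext() stands in for the string value. The markup is a trimmed copy of the snippet from the question.

```python
import xml.etree.ElementTree as ET

MARKUP = """<a href="#/crm/usergroup-edit">
  <div class="inventoryPanel">
    <h4>
      <div class="groupIcon G">
        <div class="text">P</div>
      </div>Jira
    </h4>
  </div>
</a>"""


def first_text_node(el):
    """Approximates what contains(text(), ...) sees: text before the first child."""
    return el.text or ""


def string_value(el):
    """Approximates what contains(., ...) sees: all descendant text, concatenated."""
    return "".join(el.itertext())


link = ET.fromstring(MARKUP)

# No element has "Jira" in its first text node, so the text() variants find nothing.
print([el.tag for el in link.iter() if "Jira" in first_text_node(el)])

# The string value of the <a> element does contain "Jira".
print("Jira" in string_value(link))
```

In this markup "Jira" follows the inner div, so it is never the first text node of any element, but it is part of the link's overall string value.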

How to access div element text based on adjacent text

I have the following HTML code and am trying to access "QA1234", which is the value of the Serial Number. Can you let me know how I can access this text?
<div class="dataField">
<div class="dataName">
<span id="langSerialNumber">Serial Number</span>
</div>
<div class="dataValue">QA1234</div>
</div>
<div class="dataField">
<div class="dataName">
<span id="langHardwareRevision">Hardware Revision</span>
</div>
<div class="dataValue">05</div>
</div>
<div class="dataField">
<div class="dataName">
<span id="langManufactureDate">Manufacture Date</span>
</div>
<div class="dataValue">03/03/2011</div>
</div>
I assume you are trying to get the "QA1234" text in terms of being the "Serial Number". If that is correct, you basically need to:
Locate the "dataField" div that includes the serial number span.
Get the "dataValue" within that div.
One way is to get all the "dataField" divs and find the one that includes the span:
parent = browser.divs(class: 'dataField').find { |div| div.span(id: 'langSerialNumber').exists? }
p parent.div(class: 'dataValue').text
#=> "QA1234"
parent = browser.divs(class: 'dataField').find { |div| div.span(id: 'langManufactureDate').exists? }
p parent.div(class: 'dataValue').text
#=> "03/03/2011"
Another option is to find the serial number span and then traverse up to the parent "dataField" div:
parent = browser.span(id: 'langSerialNumber').parent.parent
p parent.div(class: 'dataValue').text
#=> "QA1234"
parent = browser.span(id: 'langManufactureDate').parent.parent
p parent.div(class: 'dataValue').text
#=> "03/03/2011"
I find the first approach to be more robust to changes since it is more flexible to how the serial number is nested within the "dataField" div. However, for pages with a lot of fields, it may be less performant.
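For readers who are not using Watir, the first approach (find the "dataField" that contains the span, then read its "dataValue") can be sketched with Python's standard xml.etree.ElementTree. The wrapping <root> element and the function name value_for are assumptions added so that the fragment parses as a single document; this is not the Watir API.

```python
import xml.etree.ElementTree as ET

# Two of the fields from the question, wrapped in a hypothetical <root> element.
PAGE = """<root>
  <div class="dataField">
    <div class="dataName"><span id="langSerialNumber">Serial Number</span></div>
    <div class="dataValue">QA1234</div>
  </div>
  <div class="dataField">
    <div class="dataName"><span id="langHardwareRevision">Hardware Revision</span></div>
    <div class="dataValue">05</div>
  </div>
</root>"""


def value_for(root, span_id):
    """Return the dataValue text of the dataField that contains the given span id."""
    for field in root.iter("div"):
        if field.get("class") != "dataField":
            continue
        # The span may be nested at any depth inside the field.
        if field.find(f".//span[@id='{span_id}']") is not None:
            value = field.find("div[@class='dataValue']")
            return value.text if value is not None else None
    return None


root = ET.fromstring(PAGE)
print(value_for(root, "langSerialNumber"))      # -> QA1234
print(value_for(root, "langHardwareRevision"))  # -> 05
```

Like the first Watir approach, this does not depend on how deeply the span is nested, at the cost of scanning every field.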

Hiding nested anchors in html/css

I have the following code:
<a id="outer-anchor" href="/test">
text in anchor
<a id="inner-anchor" href="/test2" style="display:none"></a>
</a>
I tried this in different browsers and the inner-anchor drops out of the outer-anchor in every browser. So it gets rendered as this:
<a id="outer-anchor" href="/test">
text in anchor
</a>
<a id="inner-anchor" href="/test2" style="display:none"></a>
Does someone know why and how to fix this?
Thanks in advance
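You can't nest anchor tags inside anchor tags. The HTML parser closes the outer anchor as soon as it encounters the inner one, which is why the inner anchor ends up outside.
The following takes the inner anchor, which the parser has already placed right after outer-anchor, hides it, and re-appends it to the same parent: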
HTML
<div id="parent-placeholder">
<a id="outer-anchor" href="/test">
text in anchor
<a id="inner-anchor" href="/test2" style="display:none"></a>
</a>
</div>
JavaScript
var outer = document.getElementById('outer-anchor');
var inner = outer.nextSibling;
inner.style.display = 'none';
inner.parentNode.removeChild(inner);
outer.parentNode.appendChild(inner);
Output
<div id="parent-placeholder">
<a id="outer-anchor" href="/test">text in anchor</a>
<a id="inner-anchor" href="/test2" style="display: none;"></a>
</div>
The parent-placeholder div exists purely to show how the script relates to the parent DOM element that contains the anchors.