get text with jsoup - html

I have this HTML:
<ul id="items"><li>
<p><strong><span class="style4"><strong>Lifts open today include Agassiz to the top, Sunset, Hart Prairie, Little and Big Spruce from <br />
9 a.m. - 4 p.m.</strong></span></strong></p>
</li>
</ul>
<h3> </h3>
<h3>Trails Open<br />
</h3>
<ul id="items">
<li class="style4">
<p><strong><span class="style4">100% of trails open with 30 groomed runs. </span></strong></p>
</li>
</ul>
I want to get the text "Lifts open today...".
This is my code. Nothing is shown, and there is no error in the logcat:
Document doc = Jsoup.connect(url).get();
Elements div = doc.select("div.right");
for (Element liftope : div){
Elements p =liftope.select("#items > li > p");
liftoper = p.text();
}
What is wrong?

If you only want the text "Lifts open today include: Agassiz to the top, Sunset, Hart Prairie, Aspen and Little Spruce Conveyor!", this works (I tried it):
Element div = doc.getElementById("contentinterior");
Elements uls = div.getElementsByTag("ul");
Element ul = uls.get(2);
String result = ul.text();
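One plausible reason the original code prints nothing: the posted HTML contains no <div class="right">, so if the live page has none either, doc.select("div.right") returns an empty list, the loop body never runs and no error is logged. Also note that both lists in the snippet share id="items", so "#items > li > p" would match both paragraphs, and jsoup's Elements.text() joins the text of every match. The following is a minimal sketch, written in Python with the standard html.parser rather than jsoup, of what "#items > li > p" picks up when only the first match is kept; FirstItemsParagraph is a hypothetical helper that only understands this one direct-child pattern.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class FirstItemsParagraph(HTMLParser):
    """Collects the text of the first <p> that is a direct child of an <li>,
    which is itself a direct child of <ul id="items"> ("#items > li > p")."""

    def __init__(self):
        super().__init__()
        self.stack = []        # (tag, attrs) for every currently open element
        self.capturing = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        if (not self.done and tag == "p" and len(self.stack) >= 2
                and self.stack[-1][0] == "li"
                and self.stack[-2][0] == "ul"
                and self.stack[-2][1].get("id") == "items"):
            self.capturing = True
        self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise pop back to the matching open tag.
        if not any(t == tag for t, _ in self.stack):
            return
        while self.stack.pop()[0] != tag:
            pass
        if tag == "p" and self.capturing:
            self.capturing = False
            self.done = True   # keep only the first match, like .first()

    def handle_data(self, data):
        if self.capturing:
            self.parts.append(data)

    def result(self):
        # Collapse runs of whitespace, similar to how jsoup's text() normalises it.
        return " ".join("".join(self.parts).split())


html = """<ul id="items"><li>
<p><strong><span class="style4"><strong>Lifts open today include Agassiz to the top, Sunset, Hart Prairie, Little and Big Spruce from <br />
9 a.m. - 4 p.m.</strong></span></strong></p>
</li>
</ul>
<h3> </h3>
<h3>Trails Open<br />
</h3>
<ul id="items">
<li class="style4">
<p><strong><span class="style4">100% of trails open with 30 groomed runs. </span></strong></p>
</li>
</ul>"""

parser = FirstItemsParagraph()
parser.feed(html)
parser.close()
print(parser.result())
```

The <br /> inside the paragraph only splits the text into two chunks; collapsing whitespace afterwards joins them into one line.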

Related

CSS Selectors Explanation

I'm really confused as to how I can style this list WITHOUT changing the following HTML or wrapping it in class names. How do I go about this?
For example, if I want to put Ocean Fish in bold but NOT have Pacific or Atlantic affected, how do I target that div without adding a class name?
Another example: I want to have "12 salmon" and "3 cod" in green text, and "46 halibut" and "13 pollock" in blue text. I know there's a trick using very specific selectors, but I don't know how.
<ul>
<li>
<div>Ocean Fish</div>
<ul>
<li>
<div>Pacific</div>
<ul>
<li>12 salmon</li>
<li>3 cod</li>
</ul>
</li>
</ul>
<ul>
<li>
<div>Atlantic</div>
<ul>
<li>46 halibut</li>
<li>13 pollock</li>
</ul>
</li>
</ul>
</li>
</ul>
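This will handle all your cases if you want to do it without classes. Note that :nth-child counts every sibling element, so inside the outer <li> the <div> is child 1, the Pacific <ul> is child 2 and the Atlantic <ul> is child 3: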
body > ul > li > div{font-weight:bold;}
body > ul > li > ul:nth-child(2) > li > ul{color:green;}
body > ul > li > ul:nth-child(3) > li > ul{color:blue;}
A similar method to the previous answer, but more direct:
let divSel = document.querySelectorAll('div'); // alert(divSel.length);
divSel[0].style.fontWeight = 'bolder';
let liSel = document.querySelectorAll('li');
liSel[2].style.color = 'green';
liSel[3].style.color = 'green';
liSel[5].style.color = 'blue';
liSel[6].style.color = 'blue';
It is still a pain to modify if the order of the entries changes.
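The hard-coded indices in the answer above (2, 3, 5 and 6) come from document order: querySelectorAll('li') returns every <li>, including the ones that only wrap a heading <div>. A quick way to check the numbering without a browser is this sketch in Python with the standard html.parser; LiLabels is a hypothetical helper that labels each <li> with the first non-blank text after its start tag.

```python
from html.parser import HTMLParser


class LiLabels(HTMLParser):
    """Records, for every <li> in document order, the first non-blank text
    that follows its start tag. Document order is also the order in which
    document.querySelectorAll('li') returns the elements."""

    def __init__(self):
        super().__init__()
        self.labels = []
        self.pending = None   # index of the <li> still waiting for its first text

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.pending = len(self.labels)
            self.labels.append(None)

    def handle_data(self, data):
        if self.pending is not None and data.strip():
            self.labels[self.pending] = data.strip()
            self.pending = None


html = """<ul><li><div>Ocean Fish</div>
<ul><li><div>Pacific</div><ul><li>12 salmon</li><li>3 cod</li></ul></li></ul>
<ul><li><div>Atlantic</div><ul><li>46 halibut</li><li>13 pollock</li></ul></li></ul>
</li></ul>"""

parser = LiLabels()
parser.feed(html)
parser.close()
for index, label in enumerate(parser.labels):
    print(index, label)
```

Inserting a single new entry anywhere before "46 halibut" shifts every later index, which is exactly why this approach is fragile.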

how to get the value from <h> element in <li> element

I have HTML code and I need to get the value from an <h3> element that is placed inside an <li> element, so I tried the following code:
<li class="product-price">
<h3> 7 406,10 dollar </h3>
<!-- close price -->
</li>
<script>
function myFunction() {
var x = document.getElementsByClassName("product-price");
alert(x.item(0).innerHTML);
}
</script>
but I am getting this instead (screenshot): https://i.stack.imgur.com/tCt1X.png
Since the h3 has no class or id of its own, you can reach it through its parent using querySelector.
Try this snippet:
window.addEventListener('load', function() {
var txt = document.querySelector('.product-price :nth-child(1)').innerHTML;
console.log(txt);
});
<li class="product-price">
<h3> 7 406,10 dollar </h3>
<!-- close price -->
</li>
<li >
<h3 class="product-price"> 7 406,10 dollar </h3>
</li>
That should do it, although you might need to change the class names.
This should also work (if you can't change the HTML):
console.log(x[0].innerHTML);
https://www.w3schools.com/jsref/dom_obj_htmlcollection.asp
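The core of this question is the gap between the parent's innerHTML (the whole markup, comment included) and the heading's plain text. In the browser the short form is document.querySelector('.product-price h3').textContent.trim(). As a runnable sketch outside the browser, here is the same idea in Python with the standard html.parser; PriceHeadingText is a hypothetical helper that collects the text of <h3> elements nested in any element whose class list contains product-price.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are never pushed on the stack.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class PriceHeadingText(HTMLParser):
    """Collects the text of <h3> elements that are descendants of an element
    whose class list contains "product-price" (like '.product-price h3')."""

    def __init__(self):
        super().__init__()
        self.stack = []        # (tag, has_product_price_class) per open element
        self.capturing = False
        self.texts = []        # one string per matching <h3>

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and any(flag for _, flag in self.stack):
            self.capturing = True
            self.texts.append("")
        self.stack.append((tag, "product-price" in classes))

    def handle_endtag(self, tag):
        if tag == "h3":
            self.capturing = False
        # Ignore stray end tags; otherwise pop back to the matching open tag.
        if any(t == tag for t, _ in self.stack):
            while self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        # Comments go to handle_comment (a no-op by default), so they are skipped.
        if self.capturing:
            self.texts[-1] += data


html = """<li class="product-price">
<h3> 7 406,10 dollar </h3>
<!-- close price -->
</li>"""

parser = PriceHeadingText()
parser.feed(html)
parser.close()
print([text.strip() for text in parser.texts])
```

Stripping the collected text removes the spaces around the price, which innerHTML would keep.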

how to write css selector for scrapy?

I have the following web page:
<div id="childcategorylist" class="link-list-container links__listed" data-reactid="7">
<div data-reactid="8">
<strong data-reactid="9">Categories</strong>
</div>
<div data-reactid="10">
<ul id="categoryLink" aria-label="shop by category" data-reactid="11">
<li data-reactid="12">
Contact Lenses
</li>
<li data-reactid="14">
Beauty
</li>
<li data-reactid="16">
Personal Care
</li>
I want a CSS selector for the href attributes of the links under the li tags, i.e. for Contact Lenses, Beauty and Personal Care. How do I write it?
I am writing it in the following way:
#childcategorylist li
which gives me the following output:
['<li class="titleitem" data-reactid="16"><strong data-reactid="17">Categories</strong></li>']
Please help!
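I am not an expert in Scrapy, but HTML elements usually have a .text attribute.
If not, you might want to use a regular expression to extract the text between > and <, like:
import re
# txt is one of the HTML strings returned by the selector, e.g. the output above
txt = '<li class="titleitem" data-reactid="16"><strong data-reactid="17">Categories</strong></li>'
# [^<>]+ also allows spaces, so "Contact Lenses" or "Personal Care" would match too
x = re.search(r">([^<>]+)</", txt)
print(x.group(1))  # Categories
Maybe that gives you the results you need.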
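In Scrapy itself, a selector for the link targets would be along the lines of response.css('#childcategorylist li a::attr(href)').getall(), where ::attr(href) is Scrapy's CSS extension for attribute values. The HTML in the question shows no <a> elements even though it asks about href, so the sketch below assumes each <li> wraps a link; the href values are made up. It uses Python's standard html.parser instead of Scrapy, and CategoryLinks is a hypothetical helper.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are never pushed on the stack.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class CategoryLinks(HTMLParser):
    """Collects href values of <a> elements that sit inside an <li> somewhere
    below the element with id="childcategorylist"
    (roughly the CSS selector '#childcategorylist li a')."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, id attribute) for every open element
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            ids = [elem_id for _, elem_id in self.stack]
            if "childcategorylist" in ids:
                below = self.stack[ids.index("childcategorylist") + 1:]
                if any(t == "li" for t, _ in below):
                    self.hrefs.append(attrs["href"])
        if tag not in VOID:
            self.stack.append((tag, attrs.get("id")))

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise pop back to the matching open tag.
        if any(t == tag for t, _ in self.stack):
            while self.stack.pop()[0] != tag:
                pass


# Hypothetical markup: the links and their href values are assumptions.
html = """<div id="childcategorylist" class="link-list-container links__listed">
<div><strong>Categories</strong></div>
<div><ul id="categoryLink" aria-label="shop by category">
<li><a href="/hypothetical/contact-lenses">Contact Lenses</a></li>
<li><a href="/hypothetical/beauty">Beauty</a></li>
<li><a href="/hypothetical/personal-care">Personal Care</a></li>
</ul></div>
</div>
<ul><li><a href="/hypothetical/elsewhere">Outside the category list</a></li></ul>"""

parser = CategoryLinks()
parser.feed(html)
parser.close()
print(parser.hrefs)
```

Selecting the attribute directly avoids having to pull it back out of the serialized <li> markup with a regular expression.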

How to get span class text using jsoup

I am using the jsoup HTML parser and trying to get the text of a span with a given class, but it returns nothing and the result size is always zero. I have pasted a small part of the HTML source. Please help me extract the text.
<div class="list_carousel">
<div class="rightfloat arrow-position">
<a class="prev disabled" id="ucHome_prev" href="#"><span>prev</span></a>
<a class="next" id="ucHome_next" href="#"><span>next</span></a>
</div>
<div id="uc-container" class="carousel_wrapper">
<ul id="ucHome">
<li modelID="587">
<h3 class="margin-bottom10"> Ford Figo Aspire</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
<li modelID="899">
<h3 class="margin-bottom10"> Chevrolet Trailblazer</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 32 - 40 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
I have tried below code:
Elements var_1=doc.getElementsByClass("list_carousel");//four classes with name of list_carousel
Elements var_2=var_1.eq(1);//selecting first div class
Elements var_3 = var_2.select("> div > span[class=cw-sprite rupee-medium]");
System.out.println(var_3 .eq(0).text());//printing first result of span text
Please ask me if anything is unclear. Thanks in advance.
There are several things to note about your code:
A) you can't get the text of the span, since it has no text in the first place:
<div>Estimated Price:
<span class="cw-sprite rupee-medium"></span>
5.50 - 7.50 lakhs
</div>
See? The text is in the div, not the span!
B) Your selector "> div > span[class=cw-sprite rupee-medium]" is not really robust. Classes in HTML can occur in any order, so both
<span class="cw-sprite rupee-medium"></span>
<span class="rupee-medium cw-sprite"></span>
are equivalent. Your selector only picks up the first. This is why CSS has a class syntax, which you should use instead:
"> div > span.cw-sprite.rupee-medium"
Further, you can leave out the first > if you like.
Proposed solution
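Element lcEl = doc.getElementsByClass("list_carousel").first(); // first() returns a single Element
Elements spans = lcEl.select("span.cw-sprite.rupee-medium");
for (Element span : spans) {
    Element priceDiv = span.parent(); // the price text lives in the span's parent div
    System.out.println(priceDiv.text()); // jsoup's Element has text(), not getText()
}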
Try
System.out.println(doc.select("#ucHome div:nth-child(3)").text());
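Point A (the price text belongs to the div, not the span) and point B (classes match as a set, in any order) can both be checked outside jsoup with this sketch in Python's standard html.parser. PriceDivs is a hypothetical helper that keeps only each div's own text, roughly like jsoup's ownText(); the second sample entry deliberately reverses the class order.

```python
from html.parser import HTMLParser

# Void elements have no end tag, so they are never pushed on the stack.
VOID = {"br", "hr", "img", "input", "meta", "link"}
WANTED = {"cw-sprite", "rupee-medium"}   # like the selector span.cw-sprite.rupee-medium


class PriceDivs(HTMLParser):
    """Collects the own text of every <div> that directly contains a span
    carrying both WANTED classes, in whatever order they are written."""

    def __init__(self):
        super().__init__()
        self.stack = []    # [tag, own_text_parts, holds_price_span] per open element
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if tag == "span" and WANTED <= classes and self.stack and self.stack[-1][0] == "div":
            self.stack[-1][2] = True          # mark the parent <div>
        self.stack.append([tag, [], False])

    def handle_data(self, data):
        if self.stack:
            self.stack[-1][1].append(data)    # text belongs to the innermost open element

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self.stack):
            return                            # ignore stray end tags
        entry = self.stack.pop()
        while entry[0] != tag:
            entry = self.stack.pop()
        if entry[2]:
            self.prices.append(" ".join("".join(entry[1]).split()))


html = """<ul id="ucHome">
<li modelID="587">
<h3 class="margin-bottom10"> Ford Figo Aspire</h3>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
</li>
<li modelID="899">
<h3 class="margin-bottom10"> Chevrolet Trailblazer</h3>
<div>Estimated Price: <span class="rupee-medium cw-sprite"></span> 32 - 40 lakhs</div>
</li>
</ul>"""

parser = PriceDivs()
parser.feed(html)
parser.close()
print(parser.prices)
```

The empty span contributes no text at all; everything printed comes from the two text chunks on either side of it inside the div.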

Google Translate, translate="no"

I have a Help.htm file for my app which translates reasonably well with Google Translate. I want to mark the menu items as "do not translate", but none of the HTML markup I found and tried works. For the following I used the Google Translate website, and it translated text I did not expect it to, as this example shows:
Email us at <span class="notranslate">sales at mydomain dot com</span>
Écrivez-nous à <span class="notranslate">ventes à mydomain dot com</span>
I found a couple of similar no-translate tags, with the same results. What am I missing here?
Here is a real-life example from my help file. I pasted this into Google Translate, chose French and clicked Translate:
Then from the Options Menu choose one of:
<ul>
<li><span class="notranslate">Help</span></li>
<li><span class="notranslate">Browse WWW</span></li>
<li><span class="notranslate">Load HTML Text</span></li>
<li><span class="notranslate">Get Connection State</span></li>
</ul>
Here is the French translation :(
Ensuite, dans le menu Options, choisissez l'une des:
<ul>
     <li> <span class = "notranslate"> Aide </ span> </ li>
     <li> <span class = "notranslate"> Parcourir WWW </ span> </ li>
     <li> <span class = "de notranslate"> Load HTML texte </ span> </ li>
     <li> <span class = "de notranslate"> Obtenez Connection État </ span> </ li>
Here is mine with <span translate="no">, followed by examples from three professional HTML websites; none of these work for me:
Then from the Options Menu choose one of:
<ul>
<li><span translate="no">Help</span> </li>
<li><span translate="no">Browse WWW</span></li>
<li><span translate="no">Load HTML Text</span></li>
<li><span translate="no">Get Connection State</span></li>
</ul>
<Puis dans le menu Options, choisissez l'une des:
<ul>
     <li> <span translate = "no"> Aide </ span> </ li>
     <li> <span traduire = "no"> Parcourir WWW </ span> </ li>
     <li> <span translate = "no"> Load HTML texte </ span> </ li>
     <li> <span translate = "no"> Obtenez Connection État </ span> </ li>
</ ul>
From the official Google Webmaster Central Blog ...
Email us at <span class="notranslate">sales at mydomain dot com</span>
Écrivez-nous à <span class = "notranslate"> ventes à mydomain dot com </span>
From w3schools.com ...
Don't translate this!
This can be translated to any language.
translate = "no"> Ne pas traduire cette!
Cela peut être traduit en aucune langue.
From w3.org ...
Using HTML's translate attribute
Utilisation de HTML translate attribut
At first I thought the above worked, but "translate" in English simply stays "translate" in French :(
<h1>Using HTML's <span class="kw" translate="no">They Cheated</span> attribute</h1>
<h1> Utilisation de HTML <le span class = "kw" translate = "no"> qu'ils ont triché </ span> attribut </ h1>
I eventually determined the actual problem: the markup is only recognised as a signal not to translate the text when it is part of a page on an HTML website that you put through Google Translate. The translator interface at https://translate.google.com does not treat pasted text as HTML code.