How to write a CSS selector for Scrapy? - html

I have the following web page:
<div id="childcategorylist" class="link-list-container links__listed" data-reactid="7">
<div data-reactid="8">
<strong data-reactid="9">Categories</strong>
</div>
<div data-reactid="10">
<ul id="categoryLink" aria-label="shop by category" data-reactid="11">
<li data-reactid="12">
Contact Lenses
</li>
<li data-reactid="14">
Beauty
</li>
<li data-reactid="16">
Personal Care
</li>
I want a CSS selector for the href attributes of the links under the li tags, i.e. for Contact Lenses, Beauty and Personal Care. How do I write it?
I am writing it in the following way:
#childcategorylist li
This gives me the following output:
['<li class="titleitem" data-reactid="16"><strong data-reactid="17">Categories</strong></li>']
Please help!

I am not an expert in Scrapy, but HTML elements usually expose their text through something like a .text attribute.
If not, you might want to use a regular expression to extract the text between > and </, like:
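import re
txt = someArraycontainingStrings[0]
# Capture the text between ">" and "</"; [^<>]+ also allows spaces, as in "Contact Lenses"
match = re.search(r">\s*([^<>]+?)\s*</", txt)
x = match.group(1) if match else None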
Maybe that gives you the results you need.

Related

jQuery Relative path issue

I am currently trying to get the function below working. I would like to use a relative path to add an active class to the different ul li a tags in my DOM. The problem is that, despite splitting the URL, the code doesn't appear to compare the right part of the URL with the href. Here is an example of my HTML structure:
<div class="parent-bar"> <ul> <li>
<a class="parent-link" href="/dummy-parent-2/">
<div class="wrap">
<span class="text">Parent page 2
</span>
</div> </a> <div id="child-menu-1" class="child-menu">
<ul>
<li>
<a href="/dummy-parent-2/dummy-child-1/" class="child-links">
<div class="wrap">
<span class="text">child page 1
</span>
</div>
</a>
</li>
</ul> </div> </li> </ul> </div>
And here is the jQuery that is supposed to track the URL:
jQuery(function($) {
var path = window.location.href;
path = path.split("/");
$(".parent-link").each(function() {
if (this.href === path[3]) {
$(this).addClass('active');
}
});
if($('.parent-link').hasClass('active')){
$('.child-links').addClass('active');
}
});
The console is not showing any errors, but I can see that the active class is not added. I want to do this so that the children of the parent items can be active as well. I will apply the same principle to the children to track the parent's respective URL, if that makes sense.
Any help or insight would be much appreciated.
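A possible explanation, offered as a sketch rather than a confirmed fix: in the browser, this.href on an <a> element returns the fully resolved absolute URL, while path[3] holds only the first path segment (for example "dummy-parent-2"), so the strict comparison can never succeed. The example below compares first path segments instead. The helper names firstSegment and isSameSection and the example.com base URL are hypothetical, and marking only the child links inside the same li is a design choice that differs from the original code.

```javascript
// Return the first path segment of a URL, e.g. "dummy-parent-2".
// `base` is only used to resolve relative hrefs; example.com is a placeholder.
function firstSegment(url, base = "http://example.com/") {
  const { pathname } = new URL(url, base);
  return pathname.split("/").filter(Boolean)[0] || "";
}

// True when the link and the current page share the same first path segment.
function isSameSection(linkHref, pageHref) {
  const segment = firstSegment(linkHref);
  return segment !== "" && segment === firstSegment(pageHref);
}

// Browser-only wiring; skipped when jQuery is not loaded.
if (typeof jQuery !== "undefined") {
  jQuery(function ($) {
    $(".parent-link").each(function () {
      if (isSameSection(this.href, window.location.href)) {
        $(this).addClass("active");
        // Mark only the child links that belong to this parent item.
        $(this).closest("li").find(".child-links").addClass("active");
      }
    });
  });
}
```

With this, a parent link to /dummy-parent-2/ is treated as active on both /dummy-parent-2/ and /dummy-parent-2/dummy-child-1/.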

How to give a class to an <li> element that is inserted into my code automatically by the Moodle cache?

I am creating a new Moodle block and want to edit the CSS for it. The block contains images that I want to display inline, but Moodle wraps them in ul and li elements by default, and these take the full width ("display: block"). I can't add a class to these li elements, and I can't simply select li, because Moodle creates li tags for the titles and everything else in the block. If I style li, the rule applies to every element, such as the headings, not only to the images. Can I give these li elements a class?
<ul class="unlist">
<li class="r0"><div class="column c1"><h3><div class="text_to_html">User grade</div></h3></div></li>
<li class="r1"><div class="column c1">first grade middle school </div></li>
<li class="r0"><div class="column c1"><h3><div class="text_to_html">user type</div></h3></div></li>
<li class="r1"><div class="column c1"> Student</div></li>
<li class="r0"><div class="column c1"><h3><div class="text_to_html">General Certificates</div></h3></div></li>
<li class="r1"><div class="column c1"><a download="download" class="inline" href="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/1900 $10000 Gold Certificate both sides.jpg"><img class="cvpic" src="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/1900%20%2410000%20Gold%20Certificate%20both%20sides.jpg"></a></div></li>
<li class="r0"><div class="column c1"><a download="download" class="inline" href="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/Aamir Javed Certificate.jpg"><img class="cvpic" src="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/Aamir%20Javed%20Certificate.jpg"></a></div></li>
<li class="r1"><div class="column c1"><a download="download" class="inline" href="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/US $20 1905 Gold Certificate.jpg"><img class="cvpic" src="http://81.10.36.53/pluginfile.php/711/profilefield_file/files_2/0/US%20%2420%201905%20Gold%20Certificate.jpg"></a></div></li>
If you can use jQuery, you can do something like:
$(document).ready(function() {
$("ul.unlist li div img").css("display", "inline");
});
What it does:
Waits until the DOM is ready
Selects all <img> tags that are inside a <div>, inside a <li>, inside your <ul> with the unlist class
So, basically, it selects the <img> in the last 3 rows of the lines you provided.
That way, you only affect the <img> tags and not the other <li> elements, such as those containing the <h3> tag.
If you can't use jQuery, here is the same code, but in pure JavaScript:
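(function() {
// querySelectorAll returns every match; querySelector would return only the first <img>
document.querySelectorAll("ul.unlist li div img").forEach(function(img) {
img.style.display = "inline";
});
})();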
Edit
After re-reading your question: you want to add a class, so here it is:
jQuery
$("ul.unlist li div img").addClass("your-class");
JavaScript
document.querySelectorAll("ul.unlist li div img").forEach(function(img) { img.className += " your-class"; });
Don't forget the leading space in the JavaScript version; otherwise the <img> class attribute becomes cvpicyour-class instead of cvpic your-class.
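To make the point about the separating space concrete, here is a small sketch. The helper appendClass is hypothetical and only models the string handling; in a browser, the standard classList.add method takes care of the separator (and of duplicates) for you.

```javascript
// Append a class name to an existing class string, inserting the separating
// space and skipping names that are already present.
function appendClass(existing, name) {
  const classes = existing.split(/\s+/).filter(Boolean);
  if (!classes.includes(name)) classes.push(name);
  return classes.join(" ");
}

// Plain concatenation without the space glues the two names together.
const broken = "cvpic" + "your-class";
const fixed = appendClass("cvpic", "your-class");

// Browser-only usage; skipped when no DOM is available.
if (typeof document !== "undefined") {
  document.querySelectorAll("ul.unlist li div img").forEach(function (img) {
    img.classList.add("your-class");
  });
}
```

Here broken holds the merged name cvpicyour-class, while fixed holds the two separate classes.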

Unable to click a link residing inside an <li> element

I am not able to click on the link nested inside a list item.
Here is the HTML code:
<div class="sideBarContent" ng-include="'routes/sidebar/sidebar.tpl.html'">
<div id="innerSidebarContent" ng-controller="SidebarController">
<div>
<ul class="menuItems bounceInDown">
<li id="menuHome" class="" ui-sref="home" ng-click="closeMobileMenu()" href="/home/">
<li id="menuConfigurator" ui-sref="configurator" ng-click="closeMobileMenu()" href="/configurator/">
<span class="menuIcon regularImage blueHighlight activated icon-selectAndTailor"></span>
<span class="menuIcon icon-selectAndTailor_active activeImage">
<p class="mainMenuLabel multiLine">Select & Tailor Methods</p>
</li>
I tried all these ways to locate the text and click on it:
describe('Test objects in /configurator/ route', function() {
it('Click on select and tailor banner icon', function(){
//element(by.css('ul.menuItems > li[href=/configurator/]')).click();
//element(by.className('menuIcon icon-selectAndTailor_active activeImage')).click();
//element(by.css("li[@id='menuConfigurator' and @href='/configurator/']")).click();
//element(by.id('menuConfigurator')).click();
//element(by.xpath("//div[@class='sideBarContent']/p")).click();
//element(by.css("#menuConfigurator > p")).click();
//element(by.partialLinkText('Select & Tailor Methods')).click();
element(by.linkText("Select & Tailor Methods")).click();
console.log('in the configspec ...');
})});
Can someone help me resolve this?
Just had the same issue.
It turned out that wrapping the list in a <div> block was the problem.
Once the list was moved outside any <div> block, the <a> tags worked.
An li element cannot have an href attribute.
Use
<li id="menuHome" class="" ui-sref="home" ng-click="closeMobileMenu()"></li>
Or
<li id="menuHome" class="" ui-sref="home" ng-click="closeMobileMenu()"><a href="/home/"></a></li>
Instead of
<li id="menuHome" class="" ui-sref="home" ng-click="closeMobileMenu()" href="/home/"></li>
According to the HTML, this is not a link, so by.linkText, which only matches <a> elements, will not find it.
Select it using other selectors:
element(by.className("multiLine")).click();
element(by.css(".mainMenuLabel.multiLine")).click();
element(by.css("[class='mainMenuLabel multiLine']")).click();
element(by.xpath(".//p[@class='mainMenuLabel multiLine']")).click();

How to get span class text using jsoup

I am using the jsoup HTML parser and trying to navigate to a span by its class and get its text, but it returns nothing and the result size is always zero. I have pasted a small part of the HTML source. Please help me extract the text.
<div class="list_carousel">
<div class="rightfloat arrow-position">
<a class="prev disabled" id="ucHome_prev" href="#"><span>prev</span></a>
<a class="next" id="ucHome_next" href="#"><span>next</span></a>
</div>
<div id="uc-container" class="carousel_wrapper">
<ul id="ucHome">
<li modelID="587">
<h3 class="margin-bottom10"> Ford Figo Aspire</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
<li modelID="899">
<h3 class="margin-bottom10"> Chevrolet Trailblazer</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 32 - 40 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
I have tried the code below:
Elements var_1=doc.getElementsByClass("list_carousel");//four classes with name of list_carousel
Elements var_2=var_1.eq(1);//selecting first div class
Elements var_3 = var_2.select("> div > span[class=cw-sprite rupee-medium]");
System.out.println(var_3 .eq(0).text());//printing first result of span text
Please ask if anything in my question is unclear. Thanks in advance.
There are several things to note about your code:
A) You can't get the text of the span, since it has no text in the first place:
<div>Estimated Price:
<span class="cw-sprite rupee-medium"></span>
5.50 - 7.50 lakhs
</div>
See? The text is in the div, not the span!
B) Your selector "> div > span[class=cw-sprite rupee-medium]" is not really robust. Classes in HTML can occur in any order, so both
<span class="cw-sprite rupee-medium"></span>
<span class="rupee-medium cw-sprite"></span>
are the same. Your selector only picks up the first. This is why CSS has a class syntax, which you should use instead:
"> div > span.cw-sprite.rupee-medium"
Furthermore, you can leave out the first > if you like.
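To illustrate point B, here is a small JavaScript model of the two matching rules. It is not jsoup code, and the function names matchesAttributeSelector and matchesClassSelector are hypothetical; it only sketches why the exact-attribute form depends on class order while the class syntax does not.

```javascript
// Models span[class=cw-sprite rupee-medium]: the whole attribute value
// must equal the expected string, so class order matters.
function matchesAttributeSelector(classAttr, expected) {
  return classAttr === expected;
}

// Models span.cw-sprite.rupee-medium: every required class must be
// present, in any order, and extra classes are allowed.
function matchesClassSelector(classAttr, required) {
  const present = new Set(classAttr.split(/\s+/).filter(Boolean));
  return required.every(function (name) {
    return present.has(name);
  });
}

const reordered = "rupee-medium cw-sprite";
const exactMatch = matchesAttributeSelector(reordered, "cw-sprite rupee-medium");
const classMatch = matchesClassSelector(reordered, ["cw-sprite", "rupee-medium"]);
```

For the reordered attribute, exactMatch is false while classMatch is true.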
Proposed solution
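// first() returns a single Element, not an Elements collection
Element lcEl = doc.getElementsByClass("list_carousel").first();
Elements spans = lcEl.select("span.cw-sprite.rupee-medium");
for (Element span : spans) {
// The price text belongs to the span's parent div
Element priceDiv = span.parent();
System.out.println(priceDiv.text());
}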
Try
System.out.println(doc.select("#ucHome div:nth-child(3)").text());

Finding html elements in jquery

I am supposed to find a class and apply a logic for that.
My code structure is as follows.
<div class="class">
<form>
<ul>
<li>xxx</li><li>xxx</li>
</ul>
<ul>
<li>xxx</li><li>xxx</li>
</ul>
<ul class="ul_class">
<li>
<input ....><a ...><span ..></span>
<a href="#" title="View History" class="hstry">
<span class="hide"> </span></a>
</li>
<li>xxx</li>
</ul>
</form>
How do I find the element with the class hstry inside the ul with the class ul_class?
Just use a normal CSS selector to find nested classes like the following:
$( 'ul.ul_class .hstry' )
Note the whitespace between the two classes. Without it, the selector would match an element that has both classes, instead of an element with class hstry that is a descendant of a <ul> element with class ul_class.
If you want the content, try
var hstry = $('body').find('.hstry').html();
Then you can operate with this variable any way you want.
Using jquery:
$("ul.ul_class").find(".hstry");
$('ul.ul_class .hstry').html(); //for html content
$('ul.ul_class .hstry').text(); //for text data