Python Selenium unable to find text of iterative span classes - html

Unable to get the "span text" printed.
Expected Output
Hello-World
Foo-Bar
Given HTML Snippet:
<div class="information-container">
<ul>
<li class="info-item">
<span class="info-text">
Hello-World
</span>
</li>
</ul>
<ul>
<li class="info-item">
<span class="info-text">
Foo-Bar
</span>
</li>
</ul>
</div>
My parse Code (Method 1):
page = self.browser.find_element_by_class_name("information-container")
for elem in page.find_elements_by_xpath('.//span[@class = "info-text"]'):
    print("E>", elem.text)
    attrs = self.browser.execute_script('var items = {}; for (index = 0; index < arguments[0].attributes.length; ++index) { items[arguments[0].attributes[index].name] = arguments[0].attributes[index].value }; return items;', elem)
    pprint(attrs)
Output (Method 1):
E>
{'class': 'info-text'}
E>
{'class': 'info-text'}
My parse Code (Method 2):
page = self.browser.find_element_by_class_name("information-container")
li_objs = page.find_elements_by_class_name('info-text')
for o in li_objs:
    print("text:", o.text)
Output (Method 2):
text:
text:

The text from all span tags on a page can be displayed using the following:
from selenium import webdriver
driver = webdriver.Firefox(executable_path = 'path_to_driver')
driver.get('path_to_site')
elements = driver.find_elements_by_tag_name('span')
for element in elements:
    text = element.text
    print(text)
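One possible reason both methods in the question print empty strings: Selenium's `text` property only returns text that is rendered as visible, so spans that are hidden or not yet displayed come back empty. Below is a minimal sketch of a fallback that reads the `textContent` attribute instead. `FakeSpan` and `visible_or_raw_text` are hypothetical names used so the snippet runs without a browser. With a real driver you would pass in the elements returned by `find_elements_by_class_name('info-text')`.

```python
def visible_or_raw_text(element):
    """Return element.text, falling back to the raw textContent attribute."""
    text = element.text.strip()
    if text:
        return text
    # textContent is not filtered by visibility, unlike .text
    raw = element.get_attribute("textContent") or ""
    return raw.strip()


class FakeSpan:
    """Hypothetical stand-in for a WebElement whose rendered text is empty."""

    def __init__(self, text_content):
        self.text = ""
        self._text_content = text_content

    def get_attribute(self, name):
        return self._text_content if name == "textContent" else None


spans = [FakeSpan("\nHello-World\n"), FakeSpan("\nFoo-Bar\n")]
texts = [visible_or_raw_text(s) for s in spans]
for t in texts:
    print(t)
```

If the fallback returns the expected strings while `.text` stays empty, the spans are most likely hidden by CSS or not rendered yet at the time of the call.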

Related

How to attach a checkbox to each of the items of a list with ngModel in Angular?

I want to have a checkbox attached to each text of my list.
Like this:
Text1 [Checkbox1]
Text2 [Checkbox2]
Text3 [Checkbox3]
Text4 [Checkbox4]
The list is dynamic therefore checkboxes should also appear dynamically next to each item of the list.
I should be able to set the default value of each checkbox in the beginning and also gather their values when user clicks on them.
I have tried this:
<div *ngIf = "blogs.length > 0">
<ul>
<li *ngFor = "let blog of blogs"
(click) = "onSelect(blog)"
[class.selected] = "blog === clickedOnThisBlog">
<a *ngIf = "blog.show === true" routerLink = "/editor/{{blog.id}}">
{{blog.title}}
creationDate: {{blog.creationDate}}
modificationDate: {{blog.modificationDate}}
</a>
<a *ngIf = "blog.show === true">
<input type = "checkbox"
[ngModel] = "checkboxChecked"
#checkbox_l = "ngModel"
value = "blog"
(click) = "onCheckboxClicked( checkbox_l, value )" >
</a>
</li>
</ul>
</div>
The first half of this code shows the list of text.
In the second half I have attempted to attach a checkbox to each text item.
I don't know how to link the list of checkboxes back to the .ts file so that I can control them in one place.
This is template-driven code, so ngModel has to be used.
What's the way out?
I defined an array in the corresponding .ts file as follows:
checkboxes: CheckboxStructure[] = []
CheckboxStructure is defined as follows:
export interface CheckboxStructure
{
id: number
value: boolean
}
I wrote let i = index in *ngFor and attached my checkboxes array's particular field to [ngModel] like this: [ngModel] = "checkboxes[i].value"
<div *ngIf = "blogs.length > 0">
<ul>
<li *ngFor = "let blog of blogs; let i = index;" >
<a *ngIf = "blog.show === true" routerLink = "/editor/{{blog.id}}">
{{blog.title}}
creationDate: {{blog.creationDate}}
modificationDate: {{blog.modificationDate}}
</a>
<a *ngIf = "blog.show === true">
<input type = "checkbox"
[ngModel] = "checkboxes[i].value">
</a>
</li>
</ul>
</div>
Credit: *ngFor how to bind each item in array to ngModel using index

Parsing "Further reading" with selenium, python

I need to parse the text of the "Further reading" section on Wikipedia.
My code opens Google, submits a search request (for example 'Bill Gates'), and finds the URL of the Wikipedia page. Now I need to parse the text under Further reading, but I do not know how.
Here is the code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
URL = "https://www.google.com/"
adress = input() #input request, example: Bill Gates
def main():
    driver = webdriver.Chrome()
    driver.get(URL)
    element = driver.find_element_by_name("q")
    element.send_keys(adress, Keys.ARROW_DOWN)
    element.send_keys(Keys.ENTER)
    elems = driver.find_elements_by_css_selector(".r [href]")
    link = [elem.get_attribute('href') for elem in elems]
    url = link[0] #wikipedia's page's link
if __name__ == "__main__":
    main()
And here's HTML code
<h2>
<span class="mw-headline" id="Further_reading">Further reading</span>
</h2>
<ul>
<li>...</li>
<li>...</li>
<li>...</li>
<li>...</li>
...
</ul>
<h3>
<span class="mw-headline" id="Primary_sources">Primary sources</span>
</h3>
<ul>
<li>...</li>
<li>...</li>
<li>...</li>
...
</ul>
url - https://en.wikipedia.org/wiki/Bill_Gates
This page has Further Reading text between 2 h2 tags. To collect the text, just find ul elements between h2s. This is the code that worked for me:
# Open the page:
driver.get('https://en.wikipedia.org/wiki/Bill_Gates')
# Search for element, get text:
further_read = driver.find_element_by_xpath("//ul[preceding-sibling::h2[./span[@id='Further_reading']] and following-sibling::h2[./span[@id='External_links']]]").text
print(further_read)
I hope this helps, good luck.
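The "everything between two h2 headings" idea can also be checked offline. Here is a rough sketch using only the standard library's `xml.etree.ElementTree`. The `SAMPLE` markup is a hypothetical, simplified stand-in for the real page, and `items_in_section` is an illustrative helper, not part of Selenium.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the markup shown in the question.
SAMPLE = """
<div>
  <h2><span class="mw-headline" id="Further_reading">Further reading</span></h2>
  <ul><li>Book A</li><li>Book B</li></ul>
  <h3><span class="mw-headline" id="Primary_sources">Primary sources</span></h3>
  <ul><li>Source C</li></ul>
  <h2><span class="mw-headline" id="External_links">External links</span></h2>
  <ul><li>Link D</li></ul>
</div>
"""


def items_in_section(container, start_id):
    """Collect <li> text between the h2 whose span has id=start_id and the next h2."""
    items = []
    inside = False
    for child in container:
        if child.tag == "h2":
            span = child.find("span")
            # The wanted heading switches collection on; any other h2 switches it off.
            inside = span is not None and span.get("id") == start_id
            continue
        if inside and child.tag == "ul":
            for li in child.iter("li"):
                items.append("".join(li.itertext()).strip())
    return items


root = ET.fromstring(SAMPLE.strip())
items = items_in_section(root, "Further_reading")
print(items)
```

Note that this collects every list up to the next h2, including lists under an h3 subsection such as "Primary sources".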

Targeting elements of an a-tag with Nokogiri when classes dont work

I am trying to build a scraper and I would need some help with the following:
I would like to grab a bunch of data from an a-tag and some divs/spans nested in the same div.
My code looks like this:
page = Nokogiri::HTML(open(website))
page.search('.company').each { |e| companies << e.text.strip }
page.search('.jobtitle').each { |e| jobtitles << e.text.strip }
page.search('.location').each { |e| locations << e.text.strip }
page.xpath('//a[@class="turnstileLink"]').map{ |e| links << e['href'] }
For the first three (company, title and location) I get either 16 or 15 results, but for the last search my array only contains 10 elements. Oddly, these 10 don't line up with the first 10 entries of the other arrays either; they start matching somewhere around the 3rd or 4th element of the other arrays.
The html of a typical card that I would like to target is here:
<div class="row result clickcard" id="pj_81c3e09223cbc6b3" data-jk="81c3e09223cbc6b3" data-advn="4563763653116462" data-tu="">
<a target="_blank" id="sja1" data-tn-element="jobTitle" class="jobtitle turnstileLink" href="/pagead/clk?mo=r&ad=-6NYlbfkN0DhDTzlYIMy8YIuVE6IrMC_kH05KGZgoAT6LTrcTn8STrwXoiuruouegXiAvJy4qud6xIecRibm3b0Q5eOBkpCiV3R04sAyQbvP7gt6NKZVpCRp32eFzXudmk-TIABX3xEZGo90a47Vz9OofqZaLDh37545RNQ3sFjM6VzWNEWwKf_YoXxeGKcAICj9AADyBuYAY7p9UIUxoox7J5U9gO8Zo2dvRW-i5FJtaUr49Vjsl04W0Jp-CN2azbfp6rrfT6RYFbJ_YAc2iI-L37eeygDtI4KXQwv_elrV8ZLEKo9rkcfEzbE129kX7JKeEq5wJ1dj7GJ4ONH1lIPJQd1gJLoqNYJVQlLTKJiBP72Z0RBmgfZQ-69U8AoEyMT6pytz6iqykLCnO-SxClmvFPJsNV96oBGzpMWtWQeVgGQ49jZfBBRq9Ubw7N73iEjCv6oQ70hcW1P4d8DYK0pCI7vu2KfUh0P9vx8AKC6wY2QoAZeeP4OiBIJ8ikKSIUYJTbe3UwKcLYP7r_3_rx1gY_JO1ReG21ctCxfqGH9DnqTSjz3SYCMZ2ZekooXa&vjs=3&p=1&sk=&fvj=1" title="Private Care Jobs With Elder - Immediate Start - £550 to £750 pw" rel="noopener nofollow" onmousedown="sjomd('sja1'); clk('sja1');" onclick="setRefineByCookie([]); sjoc('sja1',0); convCtr('SJ')">Private Care Jobs With Elder - Immediate Start - £550 to £75...</a>
<br>
<div class="sjcl">
<span class="company">
Elder</span>
<span class="location">London</span>
</div>
<div class="">
<table cellpadding="0" cellspacing="0" border="0"><tbody><tr><td class="snip">
<span class="summary">
Pass a full DBS check or have a valid check already. Access to the internet and a smartphone. At Elder, we’re looking for caring individuals to join our...</span>
</td></tr></tbody></table>
</div>
<div class="sjCapt">
<div class="result-link-bar-container">
<div class="result-link-bar"><span class=" sponsoredGray ">Sponsored</span> - <span id="tt_set_10" class="tt_set"><a id="sj_81c3e09223cbc6b3" href="#" class="sl resultLink save-job-link " onclick="changeJobState('81c3e09223cbc6b3', 'save', 'linkbar', true, ''); return false;" title="Save this job to my.indeed">save job</a></span><div id="editsaved2_81c3e09223cbc6b3" class="edit_note_content" style="display:none;"></div><script>if (!window['sj_result_81c3e09223cbc6b3']) {window['sj_result_81c3e09223cbc6b3'] = {};}window['sj_result_81c3e09223cbc6b3']['showSource'] = false; window['sj_result_81c3e09223cbc6b3']['source'] = "Indeed"; window['sj_result_81c3e09223cbc6b3']['loggedIn'] = false; window['sj_result_81c3e09223cbc6b3']['showMyJobsLinks'] = false;window['sj_result_81c3e09223cbc6b3']['undoAction'] = "unsave";window['sj_result_81c3e09223cbc6b3']['jobKey'] = "81c3e09223cbc6b3"; window['sj_result_81c3e09223cbc6b3']['myIndeedAvailable'] = true; window['sj_result_81c3e09223cbc6b3']['showMoreActionsLink'] = window['sj_result_81c3e09223cbc6b3']['showMoreActionsLink'] || false; window['sj_result_81c3e09223cbc6b3']['resultNumber'] = 10; window['sj_result_81c3e09223cbc6b3']['jobStateChangedToSaved'] = false; window['sj_result_81c3e09223cbc6b3']['searchState'] = "l=London&start=20"; window['sj_result_81c3e09223cbc6b3']['basicPermaLink'] = "https://www.indeed.co.uk"; window['sj_result_81c3e09223cbc6b3']['saveJobFailed'] = false; window['sj_result_81c3e09223cbc6b3']['removeJobFailed'] = false; window['sj_result_81c3e09223cbc6b3']['requestPending'] = false; window['sj_result_81c3e09223cbc6b3']['notesEnabled'] = false; window['sj_result_81c3e09223cbc6b3']['currentPage'] = "serp"; window['sj_result_81c3e09223cbc6b3']['sponsored'] = true;window['sj_result_81c3e09223cbc6b3']['showSponsor'] = true;window['sj_result_81c3e09223cbc6b3']['reportJobButtonEnabled'] = false; window['sj_result_81c3e09223cbc6b3']['showMyJobsHired'] = false; 
window['sj_result_81c3e09223cbc6b3']['showSaveForSponsored'] = true; window['sj_result_81c3e09223cbc6b3']['showJobAge'] = true;</script></div></div>
<div class="tab-container">
<div class="sign-in-container result-tab"></div>
<div class="tellafriend-container result-tab email_job_content"></div>
</div>
</div>
</div>
All cards have the class ".clickcard" and all the relevant links have the class ".turnstileLink", but I can't seem to get consistent results with page.search or page.xpath: besides the arrays having different lengths, I can't match up the data across them correctly.
So my question is: If I want to scrape the company name, location, job title, the url to that page and possibly another value, how would I best go about this?
I would appreciate any feedback!
Edit:
The contains() expression needs to be more complex:
contains(
concat(' ',normalize-space(@class),' '),
' turnstileLink '
)
to prevent classes like turnstileLinkerCar from matching. It's such a hassle that I would use doc.css() with a css selector like a.turnstileLink, which takes care of matching exactly the specified class name in a string that may have multiple class names.
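To see why the padded `contains()` expression is needed, here is a small language-neutral sketch in Python that compares a plain substring test with a whole-token test on class attribute strings. The helper names and sample strings are made up for illustration.

```python
def contains_substring(class_attr, name):
    """Mimics XPath contains(@class, name): a plain substring test."""
    return name in class_attr


def has_class(class_attr, name):
    """True only if name is one of the whitespace-separated class tokens."""
    return name in class_attr.split()


samples = ["jobtitle turnstileLink", "turnstileLinkerCar", "c1 c2"]
substring_matches = [c for c in samples if contains_substring(c, "turnstileLink")]
token_matches = [c for c in samples if has_class(c, "turnstileLink")]
print(substring_matches)
print(token_matches)
```

The substring test also picks up "turnstileLinkerCar", while the token test behaves like the CSS selector a.turnstileLink.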
Try:
doc.xpath('//a[contains(@class, "turnstileLink")]').each{ |e| links << e['href'] }
Or:
doc.css('a.turnstileLink').each{ |e| links << e['href'] }
Here's the problem:
require 'nokogiri'
my_html = %q{
<html>
<body>
A link
B link
C link
D link
</body>
</html>
}
doc = Nokogiri::HTML(my_html)
links = doc.xpath('//a[@class="c1"]').map{ |e| e["href"] }
p links
--output:--
["aaa"]
The class of the bbb link is "c1 c2" which is not equal to "c1".
Response to comment:
require 'nokogiri'
my_html = %q{
<html>
<body>
<div class="x">
A link
B link
C link
<div>
D link
</div>
</div>
<div class="y">
Y link
</div>
</body>
</html>
}
doc = Nokogiri::HTML(my_html)
links = doc.css('a.c1').map{ |e| e["href"] }
p links
--output:--
["aaa", "bbb", "ccc", "ddd", "yyy"]
But:
links = doc.css('div.x a.c1').map{ |e| e["href"] }
p links
--output:--
["aaa", "bbb", "ccc", "ddd"]
The same thing with xpaths:
links = doc.xpath('//div[contains(@class, "x")]//a[contains(@class, "c1")]').map{ |e| e["href"] }
p links
--output:--
["aaa", "bbb", "ccc", "ddd"]

How to Provide Breadcrumbs for Page with dynamic Parameters?

I found a Grails framework for generating Breadcrumbs here. It does generate breadcrumbs based on a static definition in a breadcrumbs.xml file where it defines the hierarchies of the crumbs:
<map>
<nav id="homeCrumb" matchController="samplePages" matchAction="homeBreadCrumbPage">
<!-- levels navigation -->
<nav id="itemsLevel1Crumb" matchController="samplePages" matchAction="level1BreadCrumbPage">
<nav id="itemsLevel2Crumb" matchController="samplePages" matchAction="level2BreadCrumbPage">
<nav id="itemsLevel3Crumb" matchController="samplePages" matchAction="level3BreadCrumbPage">
<nav id="showItemCrumb" matchController="samplePages" matchAction="itemDetailsBreadCrumbPage"/>
</nav>
</nav>
</nav>
<nav id="simple1Crumb" matchController="samplePages" matchAction="simpleBreadCrumb"/>
<nav id="simple2Crumb" matchController="samplePages" matchAction="simpleBreadCrumbWithAttr"/>
<!-- levels navigation -->
</nav>
</map>
This file is evaluated and printed by a taglib:
class BreadCrumbTagLib {
static def log = LogFactory.getLog("grails.app.breadCrumbTag")
def breadCrumb = { attrs , body ->
def manager = BreadCrumbManager.getInstance()
def uri = request.getRequestURI()
def context = request.getContextPath()
def controller = params.controller
def action = params.action
def attrTitle = attrs.title
def attrLink = attrs.link
// if controller and action are missing from params try to get them from request url
if (!controller && !action && uri && context && uri.indexOf(context) != -1) {
def uriParams = uri.substring(uri.indexOf(context) + (context.length() + 1), uri.length())
def uriArray = uriParams.split('/')
if (uriArray.size() >= 2 ) {
controller = uriArray[0]
action = uriArray[1]
}
}
def crumbs = manager.getBreadCrumbs(controller, action)
if (crumbs) {
out << '<div class="breadcrumb"><ul>'
def size = crumbs.size()
crumbs.eachWithIndex { crumb, index ->
out << '<li>'
// override title and link of breadcrumb on current page (i.e. last bread crumb in hierarchy)
// if name, link attributes are supplied
if (index == size - 1) {
if (attrTitle)
crumb.title = attrTitle
if (attrLink)
crumb.link = attrLink
}
// set title to undefined if not found, associated
// renderer if present can overwrite it
if (!crumb.title)
crumb.title = "undefined"
if (crumb.title && crumb.title.size() > 40)
crumb.title = crumb.title.substring(0, 40) + "..."
if (crumb.viewController && crumb.viewAction) {
def content = g.include(controller:crumb.viewController, action:crumb.viewAction, breadcrumb:crumb, params:params)
out << content
} else if (crumb.viewTemplate) {
def content = g.include(view:crumb.viewTemplate, breadcrumb:crumb, params: params)
out << content
} else if (crumb.linkToController && crumb.linkToAction && (size - 1 > index)){
out << "${crumb.title}"
// if crumb has a link and it's not the last bread crumb then show link else
// just show the text
} else if (crumb.link && (size - 1 > index)){
out << "${crumb.title}"
} else {
out << "${crumb.title}"
}
out << "</li>"
// do not print for last bread crumb
if (size - 1 > index)
out << "<li>»</li>"
}
out << "</ul></div>"
}
}
}
Problem: I have a structure where some links need parameters that are not fixed.
Example: I am in the third level of navigation, let's say
A1 / A2 / A3
In my case A2 should open a page like user/show/1234, where 1234 is the id of the user to show. The problem is that I cannot hard-code 1234 in the breadcrumbs.xml file, because this id changes depending on which user you want to show.
How can I handle this when an intermediate breadcrumbs link needs dynamic parameters?
After thinking about it some more, I realized it may be better not to use the HttpSession. If you use a session-scoped service instead it will be easier to unit test the breadcrumb code.
First, create a session-scoped service to maintain the user's navigation history.
class NavigationHistoryService {
    static transactional = false
    static scope = "session"
    def history = [:]
    // Store the latest crumb per controller, keyed by the controller name
    public Map push(String controller, String action, Map params) {
        def crumb = [
            action: action,
            params: params]
        history[controller] = crumb
        return history
    }
}
In your controllers inject the service and use it to keep track of where the user has been. Then add the history as part of what's returned by the action's model:
class CompanyController {
    def navigationHistoryService
    def show() {
        navigationHistoryService.push('company', 'show', params)
        ...
        [crumbs: navigationHistoryService.history]
    }
}
Finally, use the history in your GSP to render the crumbs.
<ol class="breadcrumb">
<li><g:link controller="company" action="${crumbs.company.action}" params="${crumbs.company.params}">SOMETHING</g:link></li>
</ol>
It looks like your breadcrumbs are in the format CONTROLLER/ACTION/ID. If that's so, the information you need is already available in your GSP via the webRequest property. Here's an example using Twitter Bootstrap breadcrumbs:
<ol class="breadcrumb">
<li>${webRequest.controllerName}</li>
<li>${webRequest.actionName}</li>
<li class="active">${webRequest.id}</li>
</ol>
You'd still have to set up the hrefs to something meaningful. A more robust approach would be something like this...
<g:set var="crumbs" value="${[webRequest.controllerName, webRequest.actionName, webRequest.id].findAll { it != null }.collect { [label: it, active: false] }}" />
<% crumbs.last().active = true %>
<ol class="breadcrumb">
<g:each in="${crumbs}">
<li class="${it.active ? 'active' : ''}">${it.label}</li>
</g:each>
</ol>
Embedding Groovy code into GSP via the <% %> tags is not recommended, but something like this could be done in a TagLib. This approach can handle breadcrumbs of 1-3 parts in length. It adjusts according to the current URI.
A simple approach using a Blade view:
<ul class="breadcrumb" style="padding-right: 20px">
    <li> <i class="fa fa-home"></i> <a class="active" href="{{url('/')}}">Home</a>
        {{--<i class="fa fa-angle-right"></i>--}}
    </li> <?php $link = url('/') ; ?>
    @for($i = 1; $i <= count(Request::segments()); $i++)
        <li>
            @if($i < count(Request::segments()) && $i > 0)
                <?php $link .= "/" . Request::segment($i); ?>
                <a class="active" href="<?= $link ?>">{{Request::segment($i)}}</a>
                {{--{!!'<i class="fa fa-angle-right"></i>'!!}--}}
            @else {{Request::segment($i)}}
            @endif
        </li>
    @endfor
</ul>
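The segment-based approach is independent of the template engine. As a rough, language-neutral sketch (in Python, with a hypothetical `breadcrumbs` helper), every URL segment becomes a crumb whose link is the cumulative path so far, and the last segment is left unlinked because it is the current page. Dynamic ids such as 1234 end up in the links without being hard-coded anywhere.

```python
def breadcrumbs(path, base_url=""):
    """Return (label, href) pairs; the last crumb has href None (current page)."""
    segments = [s for s in path.split("/") if s]
    crumbs = [("Home", base_url + "/")]
    link = base_url
    for i, segment in enumerate(segments):
        link += "/" + segment  # cumulative path up to this segment
        is_last = i == len(segments) - 1
        crumbs.append((segment, None if is_last else link))
    return crumbs


crumbs = breadcrumbs("/user/show/1234")
for label, href in crumbs:
    print(label, href)
```

A real implementation would still need readable labels (for example the user name instead of 1234), which this sketch does not attempt.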

Umbraco 6 razor menu

Hi, I want to add a class of "active" to my li if it is the current page or if I am on a page below it. I have the following code:
@inherits Umbraco.Web.Mvc.UmbracoTemplatePage
@{
    Layout = null;
}
<nav class="topNav">
    <ul>
        @foreach (var item in Model.Content.AncestorOrSelf(1).Children.Where(x => x.IsVisible() && x.IsDocumentType("Subfrontpage") || x.IsDocumentType("Procesguide")))
        {
            <li>
                @item.Name
            </li>
        }
    </ul>
</nav>
I think I can do something with this, but it gives an error
var isSelected = Model.Path.Contains(item.Id.ToString()) ? "active" : "";
<li class="@Html.Raw(isSelected)">
    @item.Name
</li>
This is the error I get. Line 10.
Compilation Error
Description: An error occurred during the compilation of a resource required to service this request. Please review the following specific error details and modify your source code appropriately.
Compiler Error Message: CS1061: 'Umbraco.Web.Models.RenderModel' does not contain a definition for 'Path' and no extension method 'Path' accepting a first argument of type 'Umbraco.Web.Models.RenderModel' could be found (are you missing a using directive or an assembly reference?)
Source Error:
Line 8: @foreach (var item in Model.Content.AncestorOrSelf(1).Children.Where(x => x.IsVisible() && x.IsDocumentType("Subfrontpage") || x.IsDocumentType("Procesguide")))
Line 9: {
Line 10: var isSelected = Model.Path.Contains(item.Id.ToString()) ? "active" : "";
Line 11:
Line 12: <li class="@Html.Raw(isSelected)">
I have now tried this, but no luck:
<ul>
    <li>
        @home.Name
    </li>
    @foreach (var item in Model.Content.AncestorOrSelf(1).Children.Where(x => x.IsVisible() && x.IsDocumentType("Subfrontpage") || x.IsDocumentType("Procesguide")))
    {
        var isSelected = item.IsDescendant(Model,"active", "");
        <li class="@Html.Raw(isSelected)">
            @item.Name
        </li>
    }
</ul>
ok here is the solution. This is for typed, not dynamic cshtml.
@inherits Umbraco.Web.Mvc.UmbracoTemplatePage
@{
    Layout = null;
    var home = Model.Content.AncestorOrSelf(1);
}
<ul>
    @*Render Home item*@
    @{ var homeActive = ""; }
    @if( home.Id == Model.Content.Id){
        homeActive = "active";
    }
    <li class="@homeActive">
        <a href="@home.Url">
            @home.Name
        </a>
    </li>
    @*Render Home children*@
    @foreach (var item in home.Children.Where(x => x.IsVisible()))
    {
        var active = "";
        if(home.Id != Model.Content.Id){ @* if NOT home *@
            if (item.Id == Model.Content.AncestorOrSelf(2).Id){
                @* if foreach id and currentpage ancestor id is equal *@
                active = "active";
            }
        }
        <li class="@active">
            <a href="@item.Url">
                @item.Name
            </a>
        </li>
    }
</ul>
You can try something like this:
var isSelected = item.IsDescendant(Model,"active", "");
Here is a list of the functions you can use:
@Model.AncestorOrSelf(string nodeTypeAlias)
@Model.AncestorOrSelf(int level)
@Model.AncestorOrSelf(Func<DynamicNode, bool> func)
and these helper functions:
@Model.IsDescendant(DynamicNode[,valueIfTrue][,valueIfFalse])
@Model.IsDescendantOrSelf(DynamicNode[,valueIfTrue][,valueIfFalse])
Reference: Is the current page a descendant of a specific node id?
The following is the default script for navigation produced by umbraco:
@inherits umbraco.MacroEngines.DynamicNodeContext
@*
Macro to display child pages below the root page of a standard website.
Also highlights the current active page/section in the navigation with
the css class "current".
*@
@{
    @*Get the root of the website *@
    var root = Model.AncestorOrSelf(1);
}
<ul>
    @foreach (var page in root.Children.Where("Visible"))
    {
        <li class="@page.IsAncestorOrSelf(Model, "current", "")">
            @page.Name
        </li>
    }
</ul>
I just created a menu myself and needed the Home page to be active also. Working with just the method IsAncestorOrSelf doesn't work for the homepage.
The piece of code I've created is this:
@{
    var root = Model.AncestorOrSelf(1);
    var homeClass = root.Id == Model.Id ? "active" : "";
}
<ul id="nav" class="nav navbar-nav pull-right">
    <li class="@homeClass"><a href="/" >Home</a></li>
    @foreach (var page in root.Children.Where("Visible"))
    {
        <li class="@page.IsAncestorOrSelf(Model, "active", "")">
            @page.Name
        </li>
    }
</ul>
This iterates through all children of the root node (1). To be able to discover if you are on the Home page, I'm checking the Id's of the page and set the active class accordingly.
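The decision logic behind these menus can be sketched outside Razor. The Python snippet below assumes, purely for illustration, that the current page's path is available as a comma-separated string of node ids from the root down to the current node (for example "-1,1050,1062"). `css_class` and the ids are hypothetical. It compares whole ids rather than substrings, and it treats Home separately because the root is an ancestor of every page.

```python
def css_class(item_id, current_path, home_id, active="active"):
    """Return the CSS class for a menu item, given the current page's id path."""
    ids = current_path.split(",")  # "-1,1050,1062" -> ["-1", "1050", "1062"]
    current_id = ids[-1]
    if item_id == home_id:
        # Home is an ancestor of everything: mark it only on the home page itself.
        return active if current_id == str(home_id) else ""
    # Whole-id comparison avoids "106" matching inside "1062".
    return active if str(item_id) in ids else ""


on_subpage = "-1,1050,1062,1070"
print(css_class(1062, on_subpage, home_id=1050))
print(css_class(1050, "-1,1050", home_id=1050))
```

A substring check such as `Path.Contains(item.Id.ToString())` would also match partial ids, which is why the sketch splits the path first.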