HtmlUnit - Unable to get anchors from div - html

The divs of the HTML page I am targeting look like this:
<div class="white-row1">
<div class="results">
<div class="profile">
<a href="hrefThatIWant.com" class>
<img src="http://imgsource.jpg" border="0" width="150" height="150" alt>
</a>
</div>
</div>
</div>
<div class="white-row2">
// same content as the div above
</div>
I want to collect the href from each div into a list.
This is my current code:
List<HtmlAnchor> profileDivLinks = (List)htmlPage.getByXPath("//div[@class='profile']//@href");
for(HtmlAnchor link:profileDivLinks)
{
System.out.println(link.getHrefAttribute());
}
This is the error I am receiving (it is thrown on the first line of the for statement):
Exception in thread "main" java.lang.ClassCastException: com.gargoylesoftware.htmlunit.html.DomAttr cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlAnchor
What do you think the issue is?

The issue is that your XPath selects an attribute (a DomAttr), and you are then casting that attribute to an anchor. The solution with the minimal change to your code is to modify the XPath so that it returns the anchors themselves:
htmlPage.getByXPath("//div[@class='profile']//a");
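The difference between the two expressions can be reproduced without HtmlUnit. The following is only a sketch that uses the JDK's built-in javax.xml.xpath API on a well-formed stand-in for the page; the class name ProfileLinks, the PAGE string, and the second href value are made up for illustration, and HtmlUnit's own node classes (DomAttr, HtmlAnchor) are not involved.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ProfileLinks {
    // Hypothetical, well-formed stand-in for the page in the question.
    static final String PAGE =
        "<html><body>"
        + "<div class='white-row1'><div class='results'><div class='profile'>"
        + "<a href='hrefThatIWant.com'><img src='first.jpg'/></a>"
        + "</div></div></div>"
        + "<div class='white-row2'><div class='results'><div class='profile'>"
        + "<a href='secondHref.com'><img src='second.jpg'/></a>"
        + "</div></div></div>"
        + "</body></html>";

    // Evaluates an XPath expression against PAGE and returns the matching nodes.
    static NodeList select(String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(PAGE)), XPathConstants.NODESET);
    }

    // Selects the anchor elements, then reads href from each one.
    static List<String> hrefs() throws Exception {
        NodeList anchors = select("//div[@class='profile']//a");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i); // safe: the expression ends in an element step
            result.add(anchor.getAttribute("href"));
        }
        return result;
    }

    // Returns the DOM node type of the first match for an expression.
    static short firstNodeType(String expression) throws Exception {
        return select(expression).item(0).getNodeType();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hrefs());
        // An expression ending in @href yields attribute nodes, not elements.
        boolean isAttribute =
            firstNodeType("//div[@class='profile']//@href") == Node.ATTRIBUTE_NODE;
        System.out.println("//@href selects attribute nodes: " + isAttribute);
    }
}
```

The same reasoning applies in HtmlUnit: an expression that ends in an attribute step returns attribute nodes, so the cast to an element type fails, while an expression that ends in //a returns elements whose href can then be read.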

Try:
//div[@class='profile']//data(@href)

Related

Traversing the DOM with querySelector

I'm using the statement document.querySelector("[data-testid='people-menu'] div:nth-child(4)") in the console to give me the below HTML snippet:
<div>
<span class="jss1">
<div class="jss2">
<p class="jss3">Owner</p>
</div>
</span>
<div class="jss4">
<div class="5" title="User Title">
<p class="jss6">UT</p>
</div>
<div class="jss7">
<p class="jss82">User Title</p>
<span class="jss9">Project Manager</span>
</div>
</div>
</div>
I'd like to extend the statement in the console to extract the title "User Title" but can't figure out what combination of nth-child or nextSibling (or something else) to use. The closest I've gotten is:
document.querySelector("[data-testid='people-menu'] div:nth-child(4) span:nth-child(1)")
which gives me the span with class jss1.
I expected document.querySelector("[data-testid='people-menu'] div:nth-child(4) span:nth-child(1).nextSibling") to give me the div with class jss4, but it returns null.
I can't use class selectors because those are generated dynamically at build.
Why not just add [title] onto your querySelector?
document.querySelector("[data-testid='people-menu'] div:nth-child(4) [title]")
You can then read whatever you need, such as the title, from the matched element. This assumes that only one element in this section of the HTML has a title attribute.

React error: Style prop value must be an object react/style-prop-object

I am trying to wrap text around an image. If I use the following code, I get the error in the title:
<div className="container">
<img src={myImageSource} alt="swimmer" height="300" width="300" style="float: left" />
<p> This is where the other text goes about the swimmer</p>
</div>
Now I understand that this style="float: left" is html 5. However if I use the following code:
<div className="container">
<img src={myImageSource} alt="swimmer" height="300" width="300" align="left" />
<p> This is where the other text goes about the swimmer</p>
</div>
It works! Why can't I use style in React?
You can still use style in React. Try:
style={{float: 'left'}}
The issue is you are passing style as a String instead of an Object. React expects you to pass style in an object notation:
style={{ float:`left` }} // Object literal notation
or another way would be:
const divStyle = {
margin: '40px',
border: '5px solid pink'
};
<div style={divStyle}> // Passing style as an object
See the documentation for more info
If a property name contains special characters (e.g. 'align-self'), you need to put single quotes around the property name as well, e.g.
style={{'align-self':'flex-end'}}
You may see the warning "Unsupported style property align-self. Did you mean alignSelf?", but the style is still copied correctly to the generated HTML as align-self: flex-end;.
From the doc:
The style attribute accepts a JavaScript object with camelCased properties(style={{float: 'left'}}) rather than a CSS string (style="float: left"). This is consistent with the DOM style JavaScript property, is more efficient, and prevents XSS security holes.
So you should write it as:
<img src={myImageSource} alt="swimmer" height="300" width="300" style={{float: 'left'}} />
As others have stated, when you copy from HTML you get a string, style="property: value".
You need to write style={{property: "value"}} instead.
Adding single quotes around the style property name and the value worked for me.
You can write inline style in react as style={{float: 'left'}}
And if you want to use more than one property you can use it as object as below to avoid lint errors.
const imgStyle = {
margin: '10px',
border: '1px solid #ccc',
float: 'left',
height: '300px',
width: '300px'
};
<div className="container">
<img src={myImageSource} alt="swimmer" style={imgStyle} />
<p> This is where the other text goes about the swimmer</p>
</div>
<div className="container">
<img src={myImageSource} alt="swimmer" height="300" width="300" style={{float: 'left'}} />
<p> This is where the other text goes about the swimmer</p>
</div>

Using Nokogiri's CSS method to get all elements within an alt tag

I am trying to use Nokogiri's CSS method to get some names from my HTML.
This is an example of the HTML:
<section class="container partner-customer padding-bottom--60">
<div>
<div>
<a id="technologies"></a>
<h4 class="center-align">The Team</h4>
</div>
</div>
<div class="consultant list-across wrap">
<div class="engineering">
<img class="" src="https://v0001.jpg" alt="Person 1"/>
<p>Person 1<br>Founder, Chairman & CTO</p>
</div>
<div class="engineering">
<img class="" src="https://v0002.png" alt="Person 2"/></a>
<p>Person 2<br>Founder, VP of Engineering</p>
</div>
<div class="product">
<img class="" src="https://v0003.jpg" alt="Person 3"/></a>
<p>Person 3<br>Product</p>
</div>
<div class="Human Resources & Admin">
<img class="" src="https://v0004.jpg" alt="Person 4"/></a>
<p>Person 4<br>People & Places</p>
</div>
<div class="alliances">
<img class="" src="https://v0005.jpg" alt="Person 5"/></a>
<p>Person 5<br>VP of Alliances</p>
</div>
What I have so far in my people.rake file is the following:
staff_site = Nokogiri::HTML(open("https://www.website.com/company/team-all"))
all_hands = staff_site.css("div.consultant").map(&:text).map(&:squish)
I am having a little trouble getting all elements within the alt="" tag (the name of the person), as it is nested under a few divs.
Currently, using div.consultant, it gets all the names + the roles, i.e. Person 1Founder, Chairman; CTO, instead of just the person's name in alt=.
How could I simply get the element within alt?
Your desired output isn't clear and the HTML is broken.
Start with this:
require 'nokogiri'
doc = Nokogiri::HTML('<html><body><div class="consultant"><img alt="foo"/><img alt="bar" /></div></body></html>')
doc.search('div.consultant img').map{ |img| img['alt'] } # => ["foo", "bar"]
Using text on the output of css isn't a good idea. css returns a NodeSet. text against a NodeSet results in all text being concatenated, which often results in mangled text content forcing you to figure out how to pull it apart again, which, in the end, is horrible code:
doc = Nokogiri::HTML('<html><body><p>foo</p><p>bar</p></body></html>')
doc.search('p').text # => "foobar"
This behavior is documented in NodeSet#text:
Get the inner text of all contained Node objects
Instead, use text (AKA inner_text or content) against the individual nodes, resulting in the exact text for that node, that you can then join as you want:
Returns the content for this Node
doc.search('p').map(&:text) # => ["foo", "bar"]
See "How to avoid joining all text from Nodes when scraping" also.

Selenium locate <img> nested in <div> class

I have the following html code:
<span id="spanId" class="myThumbnails">
<div class="Thumbnail" style="margin-bottom:12px;text-align:center;">
<img id="thumbl00_cph_Img1" style="border-width:0px;" src="http://someImg.jpg"></img>
<input id="thumbl00_cph_Img1" type="hidden" value="http://someImg.jpg"></input>
</div>
<div class="Thumbnail" style="margin-bottom:12px;text-align:center;"></div>
<div class="Thumbnail" style="margin-bottom:12px;text-align:center;"></div>
<div class="Thumbnail" style="margin-bottom:12px;text-align:center;"></div>
</span>
I've extracted the span using XPath and then called findElements by className, but now I need the src attribute of the inner <img>. Since the id is generated, I can't use it. Is there a way to extract the img?
WebElement has a getAttribute method, which does exactly what you want. So your code could be something similar to:
driver.findElement(By.xpath("//div[@class=\"Thumbnail\"]/img")).getAttribute("src")
You can use a jQuery selector:
$('span#spanId img').attr('src')
This works when there is only one img tag. If there is more than one, loop over the matches to get the src attribute of each.
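For the case with several images, the loop can be sketched with the JDK's javax.xml.xpath API, which, unlike Selenium's element locators, can return attribute nodes directly. This is an illustration only, not Selenium code: the class name ThumbnailSources, the MARKUP string, and the second image URL are assumptions. With Selenium you would instead locate the img elements and call getAttribute("src") on each.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ThumbnailSources {
    // Hypothetical, well-formed stand-in for the markup in the question.
    static final String MARKUP =
        "<span id='spanId' class='myThumbnails'>"
        + "<div class='Thumbnail'><img id='generated1' src='http://someImg.jpg'/></div>"
        + "<div class='Thumbnail'><img id='generated2' src='http://otherImg.jpg'/></div>"
        + "<div class='Thumbnail'></div>"
        + "</span>";

    // Collects the src of every img inside a Thumbnail div, without relying on the generated ids.
    static List<String> imageSources() throws Exception {
        NodeList attributes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
            "//span[@id='spanId']/div[@class='Thumbnail']/img/@src",
            new InputSource(new StringReader(MARKUP)),
            XPathConstants.NODESET);
        List<String> sources = new ArrayList<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            sources.add(attributes.item(i).getNodeValue()); // value of each src attribute
        }
        return sources;
    }

    public static void main(String[] args) throws Exception {
        for (String source : imageSources()) {
            System.out.println(source);
        }
    }
}
```

Empty Thumbnail divs simply contribute no matches, so no null checks are needed inside the loop.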

How can I target a class within a class with CSS?

I've been up for hours trying and failing to make this work.
I have some code like this:
<h1 class="title ">MasterClass Lessons</h1>
<div class="view view-uc-products view-id-uc_products view-display-id-page_4 3col-grid view-dom-id-1">
<div class="view-content">
<table class="views-view-grid">
<tbody>
<tr class="row-1 row-first">
<td class="col-1"><div class="panel-display panel-1col clear-block" >
<div class="panel-panel panel-col">
<div>
<div class="views-field-field-image-cache-fid">
<div class="field-content">
<a href="/content/gold-pass-all-lessons">
</div>
</div>
<div class="views-field-title">
<span class="field-content">GOLD PASS - ALL LESSONS!</span>
</div>
I want to target the class "views-field-title" with a style sheet, but I only want to apply a style when it's a subclass of "3col-grid" which is specified in a div a few levels above. Drupal lets me specify my own class name (I used 3col-grid) for the specific purpose of this CSS targeting, but when I do the following…
<style>
.3col-grid .views-field-title {
font-weight:bold;
}
</style>
It doesn't work.
I also tried
.3col-grid>.views-field-title
and
.3col-grid * .views-field-title
and
.3col-grid*.views-field-title
I'm sure there must be a way to make it work, and I'm sure it's quite simple.
Anyone who can tell me what that is will make me a happy man.
Thanks,
Joe
Your HTML is invalid.
In a CSS selector, a class name cannot begin with a digit unless it is escaped.
Once you fix that, you can use the descendant selector:
.OuterClass .InnerClass
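If renaming the class is not an option, a leading digit can be written in a selector as a hexadecimal escape followed by a space, so 3col-grid becomes \33 col-grid. The helper below is a minimal sketch of that rule; CssClassSelector and forClass are hypothetical names, and the method handles only a leading ASCII digit, not other characters that would need escaping.

```java
class CssClassSelector {
    // Builds a class selector, escaping a leading ASCII digit as a hex code point plus a space.
    static String forClass(String className) {
        if (className.isEmpty()) {
            throw new IllegalArgumentException("class name must not be empty");
        }
        char first = className.charAt(0);
        if (first >= '0' && first <= '9') {
            // '3' is U+0033, so "3col-grid" becomes ".\33 col-grid"
            return ".\\" + Integer.toHexString(first) + " " + className.substring(1);
        }
        return "." + className;
    }

    public static void main(String[] args) {
        // Descendant selector for the markup in the question.
        System.out.println(forClass("3col-grid") + " " + forClass("views-field-title"));
    }
}
```

Renaming the class so that it starts with a letter, as suggested above, is still the simpler fix, because the escaped form is easy to mistype.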