XPath selection by value - html

I want to get a value of "square" (for example, 201). I tried to do so, as described here, but it doesn't work:
./li[attributeTitle='Floor']
Html code:
<div class = "A">
<ul class = "B">
<li>
<span class = "attributeTitle"> Floor </span>
<span class = "attributeValue"> 3 </span>
</li>
<!-- an arbitrary number of other "li" items -->
<li>
<span class = "attributeTitle"> Square </span>
<span class = "attributeValue"> 201 </span>
</li>
<li>
<span class = "attributeTitle"> Nrooms </span>
<span class = "attributeValue"> 4 </span>
</li>
</ul>
</div>
Thanks for any help.

You can use the contains() function in XPath to check whether a node's text contains a given string:
"//span[@class='attributeTitle'][contains(text(),'Square')]"
This gets you this node:
<span class = "attributeTitle"> Square </span>
To get the value node that comes right after it, you can use following-sibling::span:
"//span[@class='attributeTitle'][contains(text(),'Square')]/following-sibling::span[1]"
The [1] indicates that we want only the first following sibling, in case there is more than one. You can also use [@class='attributeValue'] instead, to keep only siblings that have this particular class, or leave the predicate out entirely if you trust there will only ever be one sibling.
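For reference, here is a minimal Python sketch of the same title-to-value lookup using only the standard library. Note that xml.etree.ElementTree supports only a subset of XPath (no contains() and no following-sibling axis), so the title comparison is done in Python rather than in the expression itself. The helper name value_for and the cleaned-up markup are assumptions made for illustration; HTML that is not well-formed XML would need a real HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Cleaned-up, well-formed copy of the markup from the question (an assumption).
HTML = """
<div class="A">
  <ul class="B">
    <li>
      <span class="attributeTitle"> Floor </span>
      <span class="attributeValue"> 3 </span>
    </li>
    <li>
      <span class="attributeTitle"> Square </span>
      <span class="attributeValue"> 201 </span>
    </li>
    <li>
      <span class="attributeTitle"> Nrooms </span>
      <span class="attributeValue"> 4 </span>
    </li>
  </ul>
</div>
"""


def value_for(root, title):
    """Return the attributeValue text of the <li> whose title matches, or None."""
    for li in root.iter("li"):
        title_span = li.find("span[@class='attributeTitle']")
        value_span = li.find("span[@class='attributeValue']")
        if title_span is None or value_span is None:
            continue
        # The spans contain padding spaces, so compare the stripped text.
        if (title_span.text or "").strip() == title:
            return (value_span.text or "").strip()
    return None


root = ET.fromstring(HTML)
print(value_for(root, "Square"))  # prints 201
```

With a full XPath 1.0 engine, the single expression from the answer does the same job without the loop.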

Related

How to take content from two same nodes separately?

I have HTML file with list of product names and prices
<ul>
<li>
<label>
<span class="name">Name 1</span>
<span class="price">3.99</span>
</label>
</li>
<li>
<label>
<span class="name">Name 2</span>
<span class="price">5.49</span>
</label>
</li>
...
</ul>
and need to take names and prices from each <label> separately.
I'm using Nokogiri to parse HTML file and tried
file.xpath('//ul/li/label').each do |item|
puts item.content
end
but, as you may have guessed, it returns both name and price.
Name and price span elements are children of the label element, so you can fetch them using xpath within the scope of each label
file.xpath('//ul/li/label').each do |item|
name = item.at_xpath("span[@class='name']").text()
price = item.at_xpath("span[@class='price']").text()
puts "#{name} - #{price}"
end
or using css selector
file.xpath('//ul/li/label').each do |item|
name = item.at_css('.name').text()
price = item.at_css('.price').text()
puts "#{name} - #{price}"
end
Typically I'd use something like this:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<ul>
<li>
<label>
<span class="name">Name 1</span>
<span class="price">3.99</span>
</label>
</li>
<li>
<label>
<span class="name">Name 2</span>
<span class="price">5.49</span>
</label>
</li>
</ul>
EOT
data = doc.css('label').map { |label| [label.at('.name').text, label.at('.price').text] }.to_h
# => {"Name 1"=>"3.99", "Name 2"=>"5.49"}
As long as the .name text is unique, which it looks like it should be from the example HTML, the resulting hash will be valid and easy to use.
If you need the entries in order, Ruby returns the key/value pairs in insertion order when you iterate over the hash. I don't recommend relying on that, because other languages make no such guarantee, but your mileage may vary. Otherwise, looking up the value for a given key is extremely fast no matter how many entries there are, because it's a hash. And a hash can be passed around for a lot of useful munging.

How to get the text inside a span tag which is inside another tag using beautifulsoup?

How do I get the value of all the tags that have class="no-wrap text-right circulating-supply"? What I used was:
text=[ ]
text=(soup.find_all(class_="no-wrap text-right circulating-supply"))
Output of text[0]:
'\n\n17,210,662\nBTC\n'
I just want to extract the numeric value.
Example of one instance:
<td class="no-wrap text-right circulating-supply" data-sort="17210662.0">
<span data-supply="17210662.0">
<span data-supply-container="">
17,210,662
</span>
<span class="hidden-xs">
BTC
</span>
</span>
</td>
Thanks.
If all the elements have a similar HTML structure, try the following to get the required output:
texts = [node.text.strip().split('\n')[0] for node in soup.find_all(class_="no-wrap text-right circulating-supply")]
This might look like overkill, but you could use a regex to extract the numbers:
from bs4 import BeautifulSoup
html = """<td class="no-wrap text-right circulating-supply" data-sort="17210662.0">
<span data-supply="17210662.0">
<span data-supply-container="">
17,210,662
</span>
<span class="hidden-xs">
BTC
</span>
</span>
</td>"""
import re
soup = BeautifulSoup(html,'html.parser')
# Remove the thousands separators, then collect every run of digits in each cell
coin_value = [re.findall(r'(\d+)', node.text.replace(',','')) for node in soup.find_all(class_="no-wrap text-right circulating-supply")]
print(coin_value)
prints
[['17210662']]
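If you want an actual number rather than a string, a small helper can work directly on the cell text shown in the question. This is only a sketch using the standard library; the function name parse_supply is made up for illustration, and it assumes the first number in the cell is the one you want.

```python
import re


def parse_supply(cell_text):
    """Return the first comma-grouped number in cell_text as an int, or None."""
    match = re.search(r"\d[\d,]*", cell_text)
    if match is None:
        return None
    # Drop the thousands separators before converting.
    return int(match.group(0).replace(",", ""))


# The text of the first cell, as shown in the question.
print(parse_supply("\n\n17,210,662\nBTC\n"))  # prints 17210662
```

You would call it on node.text for each node returned by find_all.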

Swap Follow/Following button based on whether or not the user Follows the individual

I am trying to swap the Follow/Following button depending on whether or not the current user is following the other individual. In my code I have an ngIf set up, and the thing I am having difficulty with is checking for the value in the array. If just one user's name is in the array, the code works for that user. However, if the array has multiple entries, the condition evaluates to false.
HTML:
<div *ngFor="let pic of pics">
<span *ngIf="pic.user!=current">
<span *ngIf="pic.user!=cFollows">
<button ion-button>Follow</button>
</span>
<span *ngIf="pic.user==cFollows">
<button ion-button>Following</button>
</span>
My TS file (all of the data in pics is in JSON):
pics = []
cFollows = ["user1","user2"]
So basically, if the string value of pic.user is equal to any string in the array, show the Following button. If it is not, show the Follow button.
I figured out that I need to change the code to the following:
<span *ngIf="pic.user!=current">
<span *ngIf="cFollows.indexOf(pic.user)==-1">
<button ion-button>Follow</button>
</span>
<span *ngIf="cFollows.indexOf(pic.user)!=-1">
<button ion-button>Following</button>
</span>
</span>

How to get span class text using jsoup

I am using the jsoup HTML parser and trying to navigate to a span by its class and get the text from it, but it returns nothing and the result size is always zero. I have pasted a small part of the HTML source. Please help me extract the text.
<div class="list_carousel">
<div class="rightfloat arrow-position">
<a class="prev disabled" id="ucHome_prev" href="#"><span>prev</span></a>
<a class="next" id="ucHome_next" href="#"><span>next</span></a>
</div>
<div id="uc-container" class="carousel_wrapper">
<ul id="ucHome">
<li modelID="587">
<h3 class="margin-bottom10"> Ford Figo Aspire</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
<li modelID="899">
<h3 class="margin-bottom10"> Chevrolet Trailblazer</h3>
<div class="border-dotted margin-bottom10"></div>
<div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 32 - 40 lakhs</div>
<div class="border-dotted margin-top10"></div>
</li>
I have tried below code:
Elements var_1=doc.getElementsByClass("list_carousel");//four classes with name of list_carousel
Elements var_2=var_1.eq(1);//selecting first div class
Elements var_3 = var_2.select("> div > span[class=cw-sprite rupee-medium]");
System.out.println(var_3 .eq(0).text());//printing first result of span text
Please ask if anything in my question is unclear. Thanks in advance.
There are several things to note about your code:
A) you can't get the text of the span, since it has no text in the first place:
<div>Estimated Price:
<span class="cw-sprite rupee-medium"></span>
5.50 - 7.50 lakhs
</div>
See? The text is in the div, not the span!
B) Your selector "> div > span[class=cw-sprite rupee-medium]" is not really robust. Classes in HTML can occur in any order, so both
<span class="cw-sprite rupee-medium"></span>
<span class="rupee-medium cw-sprite"></span>
are the same. Your selector only picks up the first. This is why there is a class syntax in css, which you should use instead:
"> div > span.cw-sprite.rupee-medium"
Further, you can leave out the first > if you like.
Proposed solution
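// first() returns a single Element, not an Elements collection
Element lcEl = doc.getElementsByClass("list_carousel").first();
Elements spans = lcEl.select("span.cw-sprite.rupee-medium");
for (Element span : spans) {
    // the price text belongs to the span's parent div
    Element priceDiv = span.parent();
    System.out.println(priceDiv.text());
}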
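To make point A concrete outside of jsoup, here is a small standard-library Python sketch over a trimmed, well-formed copy of the markup (the trimming is an assumption for illustration). In ElementTree the price is stored as the tail of the empty span, that is, text that follows the span inside its parent div, which is another way of seeing that the span itself has no text.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the markup from the question (an assumption).
SNIPPET = """
<ul id="ucHome">
  <li modelID="587">
    <h3 class="margin-bottom10"> Ford Figo Aspire</h3>
    <div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 5.50 - 7.50 lakhs</div>
  </li>
  <li modelID="899">
    <h3 class="margin-bottom10"> Chevrolet Trailblazer</h3>
    <div>Estimated Price: <span class="cw-sprite rupee-medium"></span> 32 - 40 lakhs</div>
  </li>
</ul>
"""

root = ET.fromstring(SNIPPET)
span_texts = []
prices = []
for div in root.iter("div"):
    span = div.find("span")
    if span is None:
        continue
    # The span is empty, so its own text is None ...
    span_texts.append(span.text)
    # ... while the price is the text that follows it inside the div.
    prices.append((span.tail or "").strip())

print(span_texts)  # prints [None, None]
print(prices)      # prints ['5.50 - 7.50 lakhs', '32 - 40 lakhs']
```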
Try
System.out.println(doc.select("#ucHome div:nth-child(3)").text());

Set parent element of a group of nodes (wrap whole group)

Can jsoup set a parent element for a group of nodes? I mean wrap them, but not every matched element individually: create only one parent element, so that several elements end up inside it.
Example: before
<b>some text<i> blabla </i> other text </b>
After
<span id='something'><b>some text<i> blabla </i> other text </b></span>
<b>some te
<span id="cke_bm_69S" style="display: none;"> </span>
xt</b>
aaa
<i>bb
<span id="cke_bm_69S" style="display: none;"> </span>
b</i>
The span tags are bookmarks - start selection and end selection - added from CKEDITOR. Then on the server side I have to process it. This is the goal - add final span and remove the temp-spans (bookmarks):
<b>some te</b>
<span id="something"><b>
xt</b>
aaa
<i>bb
</i></span><i>
b</i>
As you can see, it has to solve the tag-crossing problem.
public static void main(String... args) throws IOException {
Document document = Jsoup.parse("<div>"
+ "<b>some text<i> blabla </i> other text </b>" + "</div>");
Element b = document.select("b").first();
Element span = document.createElement("span");
span.attr("id", "something");
b.replaceWith(span);
span.appendChild(b);
System.out.println(document);
}
Output
<html>
<head></head>
<body>
<div>
<span id="something"><b>some text<i> blabla </i> other text </b></span>
</div>
</body>
</html>
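The same replace-then-append idea can be sketched with Python's standard library, purely as an illustration of the technique rather than of jsoup itself. ElementTree elements have no parent pointer, so this sketch assumes the parent (here the div) is already known. Like the answer above, it covers only the simple single-element case, not the tag-crossing one.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<div><b>some text<i> blabla </i> other text </b></div>")

b = root.find("b")
position = list(root).index(b)

# Create the wrapper and put it where <b> used to be.
span = ET.Element("span", {"id": "something"})
root.remove(b)
root.insert(position, span)

# Move <b> inside the wrapper; any text that followed <b> now follows the span.
span.tail = b.tail
b.tail = None
span.append(b)

wrapped = ET.tostring(root, encoding="unicode")
print(wrapped)
```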