I would like to get only the name of the product and its seller. I do not want the description or the feedback.
<div class="m-l-50 col-md-7 ">
<span class="font-size-15 " style="vertical-align:top"><strong>How to fix hdd</strong></span><br>
<span>Seller: bestbuy</span><br>
<span>Description: This Method will show you how to </span><br>
Feedback:<strong> <span style="color: green;"> 74 </span> : <span style="color: red;">1 </span><br>
My code:
def scrape_this_page(page_source):
    page_source=BeautifulSoup(page_source,"html.parser")
    products = page_source.findAll(class_='m-l-50 col-md-7')
    for product in products:
        names.append(product.span[0])
    for product in products:
        sellers.append(product.span[1])
In Selenium, just chain the classes in a CSS selector, for example: driver.find_element_by_css_selector("div.some_class_name.another_class_name") (in Selenium 4 this is driver.find_element(By.CSS_SELECTOR, "div.some_class_name.another_class_name")).
In BeautifulSoup, use page_source.select("div.some_class_name.another_class_name").
If an element has no class name, you have to iterate over the elements (for loop) and check whether the text starts with "Seller", or access it by index (elements[0]), which may be unstable.
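A minimal sketch of that BeautifulSoup approach applied to the question's markup. It assumes every product block has the same layout: the name is the first <strong> in the block, and the seller is the direct child <span> whose text starts with "Seller:". The closing </strong> and </div> are added to the sample so it parses cleanly, and the function returns a list of pairs instead of appending to the question's undefined names/sellers lists.

```python
from bs4 import BeautifulSoup

# Sample markup from the question (closing tags added so it parses cleanly).
HTML = """
<div class="m-l-50 col-md-7 ">
<span class="font-size-15 " style="vertical-align:top"><strong>How to fix hdd</strong></span><br>
<span>Seller: bestbuy</span><br>
<span>Description: This Method will show you how to </span><br>
Feedback:<strong> <span style="color: green;"> 74 </span> : <span style="color: red;">1 </span></strong><br>
</div>
"""


def scrape_this_page(page_source):
    """Return (name, seller) pairs, ignoring description and feedback."""
    soup = BeautifulSoup(page_source, "html.parser")
    results = []
    # Each class in the selector is matched separately, so the trailing
    # space inside the class attribute does not matter.
    for product in soup.select("div.m-l-50.col-md-7"):
        name = product.find("strong").get_text(strip=True)
        seller = None
        # The seller span has no class, so check the text prefix instead.
        for span in product.find_all("span", recursive=False):
            text = span.get_text(strip=True)
            if text.startswith("Seller:"):
                seller = text[len("Seller:"):].strip()
                break
        results.append((name, seller))
    return results


print(scrape_this_page(HTML))
```

Checking the text prefix rather than using a fixed index keeps the lookup working if the order of the spans changes.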
I've found the lowest element, <span class="pill css-1a10nyx e1pqc3131">, for multiple items on a website, but now I want to find the related/linked ancestor, for example the highest <div class="css-1v73czv eh8fd9011" xpath="1">. I've got the soup but can't figure out a way to get from the 'lowest' element to the 'highest' one. Any ideas?
<div class="css-1v73czv eh8fd9011" xpath="1">
<div class="css-19qortz eh8fd9010">
<header class="css-1idy7oy eh8fd909">
<div class="css-1rkuvma eh8fd908">
<footer class="css-f9q2sp eh8fd907">
<span class="pill css-1a10nyx e1pqc3131">
End result would be:
INPUT - Search all elements of a page with class <span class="pill css-1a10nyx e1pqc3131"> (lowest)
OUTPUT - Get all related titles or headers of those elements.
I've tried it with if-statements, but that doesn't work consistently. Something like "if class == (searchable class), then get (desired higher class)" should work.
I can add more details if needed; please let me know. Thanks in advance!
EDIT: Added a picture for clarification, where the title (highest class) = "Wooferland Festival 2022" and the number (lowest class) = 253.
As mentioned, the question needs more information to give a concrete answer.
Assuming you want to scrape the information in the picture based on your example HTML, select your pill and use .find_previous() to locate the other elements:
for e in soup.select('span.pill'):
    print(e.find_previous('header').text)
    print(e.find_previous('div').text)
    print(e.text)
Assuming there is a container tag in the HTML structure, like <a> or another tag, you would select it based on the condition that it contains a <span> with class pill:
for e in soup.select('a:has(span.pill)'):
    print(e.header.text)
    # the title is the <div> right after the <header>
    print(e.header.find_next_sibling('div').text)
    print(e.footer.span.text)
Note: Instead of CSS classes, which can be highly dynamic, try to use more static attributes or the HTML structure.
Example
Both options are shown; for the first one, the <a> does not matter.
from bs4 import BeautifulSoup
html='''
<a>
<div class="css-1v73czv eh8fd9011" xpath="1">
<div class="css-19qortz eh8fd9010">
<header class="css-1idy7oy eh8fd909">some date information</header>
<div class="css-1rkuvma eh8fd908">some title</div>
<footer class="css-f9q2sp eh8fd907">
<span class="pill css-1a10nyx e1pqc3131">some number</span>
</footer>
</div>
</div>
</a>
'''
soup = BeautifulSoup(html, 'html.parser')
for e in soup.select('span.pill'):
    print(e.find_previous('header').text)
    print(e.find_previous('div').text)
    print(e.text)
print('---------')
for e in soup.select('a:has(span.pill)'):
    print(e.header.text)
    print(e.header.find_next_sibling('div').text)
    print(e.footer.span.text)
Output
some date information
some title
some number
---------
some date information
some title
some number
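If you prefer to walk upward explicitly from the pill instead of searching backward with find_previous(), Tag.parents yields the ancestors from the innermost outward. A sketch, assuming each "card" is the nearest <div> ancestor that contains a <header> (an assumption based only on the example markup):

```python
from bs4 import BeautifulSoup

# Copy of the example markup above.
HTML = """
<a>
<div class="css-1v73czv eh8fd9011" xpath="1">
<div class="css-19qortz eh8fd9010">
<header class="css-1idy7oy eh8fd909">some date information</header>
<div class="css-1rkuvma eh8fd908">some title</div>
<footer class="css-f9q2sp eh8fd907">
<span class="pill css-1a10nyx e1pqc3131">some number</span>
</footer>
</div>
</div>
</a>
"""

soup = BeautifulSoup(HTML, "html.parser")
rows = []
for pill in soup.select("span.pill"):
    # Walk upward (innermost ancestor first) to the nearest <div> that
    # contains a <header>; this avoids the generated css-* class names.
    card = next(
        (p for p in pill.parents if p.name == "div" and p.find("header") is not None),
        None,
    )
    if card is None:
        continue  # no recognisable card around this pill
    header = card.find("header")
    title = header.find_next_sibling("div")
    rows.append(
        (header.get_text(strip=True), title.get_text(strip=True), pill.get_text(strip=True))
    )

print(rows)
```

This relies on structure (a <header> followed by a title <div>) rather than on class names, in line with the note above.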
How do I get the value of all the tags that have class="no-wrap text-right circulating-supply"? What I used was:
text=[ ]
text=(soup.find_all(class_="no-wrap text-right circulating-supply"))
Output of text[0]:
'\n\n17,210,662\nBTC\n'
I just want to extract the numeric value.
Example of one instance:
<td class="no-wrap text-right circulating-supply" data-sort="17210662.0">
<span data-supply="17210662.0">
<span data-supply-container="">
17,210,662
</span>
<span class="hidden-xs">
BTC
</span>
</span>
</td>
Thanks.
If all elements have the same HTML structure, try the following to get the required output:
texts = [node.text.strip().split('\n')[0] for node in soup.find_all(class_="no-wrap text-right circulating-supply")]
This might look like overkill, but you could use a regex to extract the numbers:
from bs4 import BeautifulSoup
import re
html = """<td class="no-wrap text-right circulating-supply" data-sort="17210662.0">
<span data-supply="17210662.0">
<span data-supply-container="">
17,210,662
</span>
<span class="hidden-xs">
BTC
</span>
</span>
</td>"""
soup = BeautifulSoup(html,'html.parser')
# remove the thousands separators, then collect every run of digits
coin_value = [re.findall(r'\d+', node.text.replace(',','')) for node in soup.find_all(class_="no-wrap text-right circulating-supply")]
print(coin_value)
prints
[['17210662']]
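Since the example <td> already exposes the figure in machine-readable form, another sketch avoids regex entirely: read the span marked data-supply-container, or the data-sort attribute on the cell. This assumes every row carries those attributes as in the sample.

```python
from bs4 import BeautifulSoup

HTML = """<td class="no-wrap text-right circulating-supply" data-sort="17210662.0">
<span data-supply="17210662.0">
<span data-supply-container="">
17,210,662
</span>
<span class="hidden-xs">
BTC
</span>
</span>
</td>"""

# html.parser keeps a bare <td> even without a surrounding <table>.
soup = BeautifulSoup(HTML, "html.parser")

# Option 1: the visible number lives in the span marked data-supply-container.
from_text = [
    int(span.get_text(strip=True).replace(",", ""))
    for span in soup.select("td.circulating-supply span[data-supply-container]")
]

# Option 2: the cell also carries the raw value in its data-sort attribute.
from_attr = [int(float(td["data-sort"])) for td in soup.select("td.circulating-supply")]

print(from_text, from_attr)
```

Both give integers rather than strings, so the values can be summed or sorted directly.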
I am trying to swap the Follow/Following button depending on whether or not the current user is following the other person. In my code I have an ngIf set up, and what I am having difficulty with is checking for the value in the array. If just one user's name is in the array, the code works for that user. However, if the array has multiple entries, the condition evaluates to false.
HTML:
<div *ngFor="let pic of pics">
<span *ngIf="pic.user!=current">
<span *ngIf="pic.user!=cFollows">
<button ion-button>Follow</button>
</span>
<span *ngIf="pic.user==cFollows">
<button ion-button>Following</button>
</span>
My TS file (all of the data in pics is JSON):
pics = []
cFollows = ["user1","user2"]
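So basically, if the string value of pic.user is equal to any string in the array, show the Following button. If it is not, show the Follow button.
So I figured out I need to change the code to match the following (comparing a string to an array with == converts the array to a string, so ["user1"] becomes "user1" but ["user1","user2"] becomes "user1,user2", which is why it only worked with one entry):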
<span *ngIf="pic.user!=current">
<span *ngIf="cFollows.indexOf(pic.user)==-1">
<button ion-button>Follow</button>
</span>
<span *ngIf="cFollows.indexOf(pic.user)!=-1">
<button ion-button>Following</button>
</span>
</span>
I tried to use this XPath:
//*[contains(normalize-space(text()),'Jira')]
Also tried:
//*[contains(text(),'Jira')]
In the HTML example below, there is whitespace before and after the text "Jira". I am not able to click on the link:
<a href="#/crm/usergroup-edit?id=572a3c84e4b07f6189958700"
ng-repeat="gp in groups | filter : userGroupSearch | orderBy:'-name':1"
class="ng-scope">
<div class="inventoryPanel" ng-style="myStyle" style="width: 15.8%;">
<h4 class="ng-binding">
<div class="groupIcon G">
<div class="text ng-binding">P</div>
</div>Jira
</h4>
</div>
</a>
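The following XPath will select all a elements whose string value contains a Jira substring. It works because . is the element's string value (all descendant text concatenated), whereas in XPath 1.0 contains(text(),'Jira') only tests the first text node child; for the <h4> here, that first text node is the whitespace before the inner <div>, and "Jira" sits in a later text node: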
//a[contains(.,'Jira')]
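The difference can be reproduced with Python's standard library, whose ElementTree models the same split: h4.text is the first text node, and "Jira" ends up as the tail of the inner <div>. ElementTree's own XPath subset has no contains(), so this sketch emulates the two XPath behaviours by hand rather than running the XPath itself:

```python
import xml.etree.ElementTree as ET

# The <h4> from the question, as well-formed XML.
H4 = """<h4 class="ng-binding">
  <div class="groupIcon G">
    <div class="text ng-binding">P</div>
  </div>Jira
</h4>"""

h4 = ET.fromstring(H4)

# Like XPath 1.0 contains(text(), 'Jira'): only the FIRST text node child is
# checked, and here that is just the whitespace before the inner <div>.
first_text = h4.text or ""
print(repr(first_text), "Jira" in first_text)

# Like contains(., 'Jira'): the string value joins all descendant text.
string_value = "".join(h4.itertext())
print("Jira" in string_value)
```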
Hello, I'm trying to extract the price. Can anyone please help me? There is no output for the price.
HTML
<div id="product-price-box" class="prod_pricebox price_details" property="gr:hasPriceSpecification">
<div class="prod_pricebox_price">
<div class="prod_pricebox_price_final">
<span id="product_price" class="hidden">389.00</span>
<span id="special_price_box">RM 389.00</span>
</div>
<div id="special_price_area" class=" prod_pricebox_price_special">
<span id="product_special_price_label">Before</span>
<span class="price_erase">
<span id="product_price_prefix" class="price-prefix-detail"></span>
<span id="price_box">RM 449.00,</span>
</span>
<div class="prod_saving">
<span id="product_saving_label">You save</span>
<span id="product_saving_percentage" class="price_highlight"> 13%</span>
</div>
</div>
</div>
Jsoup
String url = "http://www.lazada.com.my/asus-zenfone-c-zc451cg-16gb-white-2801812.html";
Document doc = Jsoup.connect(url).get();
//Document doc = Jsoup.connect("http://www.lazada.com.my/").followRedirects(true).get();
String title = doc.title();
System.out.println("title is: " + title);
String price = doc.select("span[id=prod_pricebox_price]").text();
System.out.println("Price is: " + price);
Use String price = doc.select("[class=product-price]").text();.
The elements you are selecting appear to be generated by JavaScript that runs on the page, and Jsoup does not execute JavaScript, so it is not useful for them.
According to your selector ("span[id=prod_pricebox_price]"), you are trying to select a span element with an id of prod_pricebox_price, which doesn't exist anywhere in this document (prod_pricebox_price is the class of a div).
You might want to try something like "span#price_box", which should give you the element <span id="price_box">RM 449.00,</span>
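Jsoup's select() takes CSS selectors, so the selector logic can be checked with any CSS selector engine. The sketch below uses Python and BeautifulSoup's select() (not Jsoup) against the static markup from the question. Note that in this markup #price_box holds the crossed-out "Before" price (RM 449.00), while the current price is in #product_price (389.00). If the live page builds this box with JavaScript, neither library will see it in the downloaded HTML.

```python
from decimal import Decimal

from bs4 import BeautifulSoup

# Trimmed copy of the price box from the question.
HTML = """
<div id="product-price-box" class="prod_pricebox price_details">
<div class="prod_pricebox_price">
<div class="prod_pricebox_price_final">
<span id="product_price" class="hidden">389.00</span>
<span id="special_price_box">RM 389.00</span>
</div>
<div id="special_price_area" class=" prod_pricebox_price_special">
<span id="product_special_price_label">Before</span>
<span class="price_erase">
<span id="product_price_prefix" class="price-prefix-detail"></span>
<span id="price_box">RM 449.00,</span>
</span>
</div>
</div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The question's selector: no <span> has this id, so nothing matches.
print(soup.select("span[id=prod_pricebox_price]"))  # []

# prod_pricebox_price is a class on a <div>, not an id on a <span>.
print(len(soup.select("div.prod_pricebox_price")))  # 1

# Current price (plain number) vs. the crossed-out "Before" price.
current = Decimal(soup.select_one("span#product_price").get_text(strip=True))
before_text = soup.select_one("span#price_box").get_text(strip=True)
before = Decimal(before_text.replace("RM", "").strip(" ,"))
print(current, before)
```

Decimal is used instead of float so prices keep their exact two-digit form.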