splinter nested <html> documents - html

I am working on some website automation. Currently, I am unable to access a nested HTML document with Splinter. Here's a sample website that demonstrates what I am dealing with: https://www.w3schools.com/html/tryit.asp?filename=tryhtml_elem_select
I am trying to get into the select element and choose the "saab" option, but I am stuck on how to enter the second HTML document. I've read the documentation and found nothing. I'm hoping there is a way to do this with Python.
Any thoughts?
Before Solution:
from splinter import Browser

exe = {"executable_path": "chromedriver.exe"}
browser = Browser("chrome", **exe, headless=False)
url = "https://www.w3schools.com/html/tryit.asp?filename=tryhtml_elem_select"
browser.visit(url)

# This is where I'm stuck: I cannot find a way to access the second (nested) html doc
innerframe = browser.find_by_name("iframeResult").first
innerframe.find_by_name("cars")[0]
Solution:
from splinter import Browser

exe = {"executable_path": "chromedriver.exe"}
browser = Browser("chrome", **exe, headless=False)
url = "https://www.w3schools.com/html/tryit.asp?filename=tryhtml_elem_select"
browser.visit(url)

with browser.get_iframe("iframeResult") as iframe:
    cars = iframe.find_by_name("cars")
    cars.select("saab")

I figured out that these are called iframes. Once I learned the terminology, it wasn't too hard to figure out how to interact with them. Searching for "nested html documents" was not returning the results I needed to find the solution.
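To see why the outer document's finders come up empty, here is a minimal stdlib-only sketch; the tiny page below is made up for illustration, not the real w3schools markup:

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the outer page: the <iframe> element belongs to
# the outer DOM, but the document it loads (containing the <select>)
# does not.
outer = ET.fromstring(
    '<html><body>'
    '<iframe id="iframeResult" name="iframeResult" src="result.html">'
    '</iframe>'
    '</body></html>'
)

# The <select name="cars"> lives in the nested document, so searching
# the outer tree finds nothing -- hence the need to switch context
# with get_iframe() first.
print(outer.findall('.//select'))  # []
```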
I hope this helps someone out in the future!

Related

Scraping prices with BeautifulSoup4 in Python3

I am new to scraping with Python and BeautifulSoup4. Also, I have no knowledge of HTML. To practice, I am trying to use it on the Carrefour website to extract the price and price per kilogram of a product that I search for by EAN code.
My code:
import requests
from bs4 import BeautifulSoup

barcodes = ['5449000000996']
for barcode in barcodes:
    url = 'https://www.carrefour.es/?q=' + barcode
    html = requests.get(url).content
    bs = BeautifulSoup(html, 'lxml')
    searchingprice = bs.find_all('strong', {'class': 'ebx-result-price__value'})
    print(searchingprice)
    searchingpricerperkg = bs.find_all('span', {'class': 'ebx-result__quantity ebx-result-quantity'})
    print(searchingpricerperkg)
But I do not get any results at all.
Here is a screenshot of the HTML code:
What am I doing wrong? I tried it with another website and it seemed to work.
The problem here is that you're scraping a page with JavaScript-generated content. The page you're grabbing with requests doesn't actually contain the thing you're looking for; it contains a bunch of JavaScript. When your browser loads the page, it runs that JavaScript, which generates the content, so the rendered version you see in your browser is not the same document returned by the server. The page is really a set of instructions telling your browser how to build the page you see.
If you're just practicing, you might want to simply try a different source to scrape from, but to scrape this page you'll need to look into solutions that can handle JavaScript-generated content:
Web-scraping JavaScript page with Python
Alternatively, note that the JavaScript generates content by requesting data from other sources. I don't speak Spanish, so I'm not much help in figuring this part out, but you might be able to.
As an exercise, go ahead and have BS4 prettify and print out the page that it receives. You'll see that within that page there are requests to other locations for the info you're asking for. You might be able to change your request to go not to the page where you view the info, but to the location that page gets its data from.
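That exercise can be sketched without hitting the live site; the HTML below is a made-up stand-in for what requests.get(url).content actually returns (the price markup is injected later by JavaScript, so it is absent from the raw response):

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the raw server response: the content div is
# empty until client-side JavaScript fills it in.
static_html = """
<html><body>
  <div id="search-results"><!-- rendered client-side --></div>
  <script src="app.js"></script>
</body></html>
"""

soup = BeautifulSoup(static_html, 'html.parser')
print(soup.prettify())  # inspect what the server really sent

# The selectors from the question find nothing in the raw HTML:
prices = soup.find_all('strong', {'class': 'ebx-result-price__value'})
print(prices)  # []
```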

Selenium login to page with python 3.6 can't find element by name

Today I tried to write a bot for the ytpals.com webpage.
I am using the Python Selenium library.
The first thing I am trying to do is log in to the page with my YouTube channel ID.
But I have been unsuccessful at finding the element 'channelid', whatever I do.
On top of that, the page sometimes doesn't load fully...
It has worked for me on other pages to find an input form, but this page... I can't understand it.
Maybe someone has a better understanding than me and knows how to log in to this page?
My simple code:
import time
from selenium import webdriver
browser = webdriver.Firefox()
browser.get('https://www.ytpals.com/')
search = browser.find_element_by_name('channelid')
search.send_keys("testchannel")
time.sleep(5) # sleep for 5 seconds so you can see the results
browser.quit()
So I found a solution to my problem.
I downloaded Selenium IDE, and I can use it as a debugger. Such a great tool!
If anyone else needs it, here's the link:
https://www.seleniumhq.org/docs/02_selenium_ide.jsp

How to fill out a web form and return the data with knowing the web form id/name in python

I am currently trying to automatically submit information into the web forms on this website: https://coinomi.com/recovery-phrase-tool.html. Unfortunately, I do not know the names of the forms and can't seem to find them in its source code. I have tried to fill out the forms using the requests Python module, just by passing the parameters through the URL before scraping it, but without the form names I can't do this.
If possible I wanted to do this with the offline version of the website at https://github.com/Coinomi/bip39/blob/master/bip39-standalone.html so that it is more secure, but I barely know how to use regular web forms with the tools I have, let alone locally from my computer.
I am not sure exactly what you are looking for. However, here is some code that uses Selenium to fill in parts of the form you mentioned.
import selenium
from selenium import webdriver
from selenium.webdriver.support.select import Select
browser = webdriver.Chrome('C:\\Users...\\chromedriver.exe')
browser.get('https://coinomi.com/recovery-phrase-tool.html')
# Example to fill a text box
recoveryPhrase = browser.find_element_by_id('phrase')
recoveryPhrase.send_keys('your answer')
# Example to select a element
numberOfWords = Select(browser.find_element_by_id('strength'))
numberOfWords.select_by_visible_text('24')
# Example to click a button
generateRandomMnemonic = browser.find_element_by_xpath('/html/body/div[1]/div[1]/div/form/div[4]/div/div/span/button')
generateRandomMnemonic.click()

Error when getting website table data using python selenium - Multiple tables and Unable to locate element

I am trying to get info from the Brazilian stock market (BMF BOVESPA). The website has several tables, but my code is not able to get them.
The code below aims to get all data from the table "Ações em Circulação no Mercado", one of the last tables on the webpage.
I have tried the approaches below, but neither worked for me:
content = browser.find_element_by_css_selector('//div[@id="div1"]')
and
table = browser.find_element_by_xpath('//*[@id="div1"]/div/div/div[1]/table/tbody')
Thanks in advance for taking my question.
from selenium import webdriver
from time import sleep
url = "http://bvmf.bmfbovespa.com.br/cias-Listadas/Empresas-Listadas/ResumoEmpresaPrincipal.aspx?codigoCvm=19348&idioma=pt-br"
browser = webdriver.Chrome()
browser.get(url)
sleep(5)  # wait for the website to load
content = browser.find_element_by_css_selector('//div[@id="div1"]')
HTML can be found at attached picture
As alternative, the code below reaches the same website
url = "http://bvmf.bmfbovespa.com.br/cias-Listadas/Empresas-Listadas/BuscaEmpresaListada.aspx?idioma=pt-br"
Ticker='ITUB4'
browser = webdriver.Chrome()
browser.get(url)
sleep(2)
browser.find_element_by_xpath('//*[@id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_txtNomeEmpresa_txtNomeEmpresa_text"]').send_keys(Ticker)
browser.find_element_by_xpath('//*[@id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_btnBuscar"]').click()
content = browser.find_element_by_id('div1')
Selenium with Python documentation (unofficial)
Hi there!
Selenium provides the following methods to locate elements in a page:
find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
Why doesn't your code work? Because you're not using the correct method to locate the element:
you're passing an XPath expression to a CSS selector method
content = browser.find_element_by_css_selector('//div[@id="div1"]')  # this part is wrong
instead you can do this if you want to select div1
content = browser.find_element_by_id('div1')
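The syntax distinction can be checked without a browser; Python's standard-library ElementTree understands the same [@id=...] XPath predicate (the page below is a toy stand-in, not the real BOVESPA markup):

```python
import xml.etree.ElementTree as ET

# Toy document standing in for the real page.
page = ET.fromstring(
    '<html><body>'
    '<div id="div1"><table><tr><td>data</td></tr></table></div>'
    '</body></html>'
)

# XPath addresses attributes with @ -- this is what
# find_element_by_xpath expects:
divs = page.findall('.//div[@id="div1"]')
print(len(divs))  # 1

# The CSS equivalent would be the selector "div#div1"; passing the
# XPath string above to find_element_by_css_selector cannot match.
```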
here's the correct code
from time import sleep
from selenium import webdriver

url = "http://bvmf.bmfbovespa.com.br/cias-Listadas/Empresas-Listadas/BuscaEmpresaListada.aspx?idioma=pt-br"
Ticker = 'ITUB4'
browser = webdriver.Chrome()
browser.get(url)
sleep(2)
browser.find_element_by_xpath('//*[@id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_txtNomeEmpresa_txtNomeEmpresa_text"]').send_keys(Ticker)
browser.find_element_by_xpath('//*[@id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_btnBuscar"]').click()
I tested it and it worked :)
Mark it as the best answer if I helped you :)

Using R-selenium to scrape data from an aspx webpage

I am pretty new to R and Selenium, so hopefully I can express my question clearly.
I want to scrape some data off a website (.aspx), and I need to type a chemical code to pull up the information on the next page (using RSelenium to input and click elements). So far I have been able to build a short script that gets me through the first step, i.e. it pulls up the correct page. But I have had a lot of trouble finding a good way to scrape the data (the chemical information in the table) off this website, mainly because the website does not assign a new URL for each chemical I search; it returns the same .aspx address. I plan to overcome this and then build a loop so I can scrape more information automatically. Does anyone have any thoughts on how I should get the data after the element click? I need the chemical information table on the second page.
Thanks heaps in advance!
Here is the code I have written so far; the next step I need is to scrape the table from the next page!
library("RSelenium")
checkForServer()
startServer()
mybrowser <- remoteDriver()
mybrowser$open()
mybrowser$navigate("http://limitvalue.ifa.dguv.de/")
mybrowser$findElement(using = 'css selector', "#Tbox_cas")
wxbox <- mybrowser$findElement(using = 'css selector', "#Tbox_cas")
wxbox$sendKeysToElement(list("64-19-7"))
wxbutton <- mybrowser$findElement(using = 'css selector', "#Butsearch")
wxbutton$clickElement()
First of all, your tool choice is wrong for this job.
Secondly, in your case the request flow is:
POST to the "permanent" URL
302 redirect to a new URL, which is http://limitvalue.ifa.dguv.de/WebForm_ueliste2.aspx in your case
GET the new URL
Thirdly, what's the ultimate output you are after?
Whether automating this is worth it really depends on how much data you need. Otherwise, do it as a manual task.