Hi I'm practicing extracting information from a website.
(I'm using python, selenium, and beautifulsoup, which doesn't matter too much. The question is about finding an element in HTML.)
So (1) I want info in the table in graph. I located the table using Firefox Inspector: <table id='......'>
(2) but in my code I can't find it:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
url = 'http://corp.sec.state.ma.us/corpweb/UCCSearch/UCCSearch.aspx'
driver = webdriver.Firefox()
driver.get(url)
# navigate to the page I want using selenium
driver.find_element_by_id("MainContent_rdoSearchO").click()
driver.find_element_by_id("MainContent_txtName").send_keys("mcdonald")
Select(driver.find_element_by_id("MainContent_cboOState")).select_by_visible_text("Massachusetts")
Select(driver.find_element_by_id("MainContent_UCCSearchMethodO")).select_by_visible_text("Begins With")
driver.find_element_by_id("MainContent_btnSearch").click()
# now on next page, click link (selenium)
link_text = '95352026'
driver.find_element_by_link_text(link_text).click()
### real question starts here:
# now on the page I want
# in firefox inspector find: <table id="MainContent_tblFilingHistory">
table_id = 'MainContent_tblFilingHistory'
# try find it
table = driver.find_elements_by_id(table_id)
len(table) # length = 0, can't find it
html.find(table_id) # -1, HTML really doesn't have this string
The element you have trouble to locate is in another window. You need to tell the driver to switch the context to that window:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
driver = webdriver.Firefox()
driver.get('http://corp.sec.state.ma.us/corpweb/UCCSearch/UCCSearch.aspx')
driver.find_element_by_id("MainContent_rdoSearchO").click()
driver.find_element_by_id("MainContent_txtName").send_keys("mcdonald")
Select(driver.find_element_by_id("MainContent_cboOState")).select_by_visible_text("Massachusetts")
Select(driver.find_element_by_id("MainContent_UCCSearchMethodO")).select_by_visible_text("Begins With")
driver.find_element_by_id("MainContent_btnSearch").click()
driver.find_element_by_link_text('95352026').click()
#switch to the next window
driver.switch_to_window(driver.window_handles[1])
table = driver.find_elements_by_id('MainContent_tblFilingHistory')
Related
I am having issues clicking a button with Selenium. I have never worked with Selenium before, so I have tried searching the web for a solution but have had no luck. I tried some other things such as WebDriverWait but nothing has worked.
# My Code
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
import time
PATH = "F:\SeleniumProjects\chromedriver.exe"
options = webdriver.ChromeOptions()
options.add_argument("start-maximized");
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(options=options, executable_path=PATH)
driver.get("https://www.discord.com")
time.sleep(3)
buddy = driver.find_element_by_xpath("/html/body/div/div/div/div[1]/div[1]/header[1]/nav/div/a")
ActionChains(driver).move_to_element(buddy).click().perform()
This exception is confusing me because I know I can interact with it but I am unsure why it says it isn't. I am sure there is some simple fix but I am stumped.
selenium.common.exceptions.ElementNotInteractableException: Message: element not interactable: https://discord.com/login has no size and location
(Session info: chrome=90.0.4430.93)
Here is the button I am trying to press
<a class="button-195cDm buttonWhite-18r1SC buttonSmall-2bnF7I gtm-click-class-login-button button-1x6X9g mobileAppButton-2dMGaq" href="//discord.com/login">Login</a>
Wrap a wait around it and the following should work:
buddy = WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.XPATH, "//div[#class='appButton-2wSXh-']//a[contains(text(),'Login')]")))
buddy.click()
There are 2 elements of this on the page:
//a[contains(text(),'Login')]
That is why we need to go up one more level to the div element in the XPath
Import the following at the top of your script if they aren't there already:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
The ActionChains is not needed, that is generally used for dropdown menus, things like that. Also your XPath was the full path in the HTML, that is generally not advisable and can be brittle.
This is an open site with tax information. I access the site, click on 'Abzüge', then the next page is in English. I try to change to German, however, I cannot locate the button. My current code is
from selenium import webdriver
driver = webdriver.Chrome('/path-to-my-chromedriver/chromedriver')
driver.get('https://www.estv.admin.ch/estv/de/home/allgemein/steuerstatistiken/fachinformationen/steuerbelastungen/steuerfuesse.html')
elem1 = driver.find_element_by_link_text('Abzüge')
elem1.click()
de = driver.find_element_by_xpath('/html/body/div/div[4]/header/div/div[2]/div/button[1]')
de.click()
driver.close()
The Error is NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div/div[4]/header/div/div[2]/div/button[1]"} (Session info: chrome=85.0.4183.102)
I have also tried to find the button by class and by shorter xPaths, but I can never locate the button.
I also need to select some buttons in the main field. So I gave that a try as well, but it does not work as well. My code to select Whole of Switzerland, without changing to German, is
element = driver.find_element_by_xpath('//*[#id="LOCATION"]/div[2]/div[1]/div[1]/div/div/div[2]')
element.click()
I get the same error, NoSuchElementException.
How would I change the language on the webpage, and how would I select a button in the body?
You can use below code with XPath locator discussed in comments.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()
driver.get('https://www.estv.admin.ch/estv/de/home/allgemein/steuerstatistiken/fachinformationen/steuerbelastungen/steuerfuesse.html')
WebDriverWait(driver,20).until(EC.element_to_be_clickable((By.XPATH, "//nav[#class='nav-lang']//li[contains(., 'IT')]")))
print(driver.find_element(By.XPATH, "//nav[#class='nav-lang']//li[contains(., 'IT')]").text)
driver.find_element(By.XPATH, "//nav[#class='nav-lang']//li[contains(., 'IT')]").click()
I am trying to make a bot that can play Cookie Clicker. I have successfully opened the website using the webbrowser module. When I use the developer tool to see the html I can see the information I want to obtain, such as how much money I have, how expensive items are ect. But when I try to get that information using the requests and beautifulsoup it instead gets the html of a new window. How can I make it so that I get the html of the already opened tab?
import webbrowser
webbrowser.open('https://orteil.dashnet.org/cookieclicker/')
from bs4 import BeautifulSoup
import requests
def scrape():
html = requests.get('https://orteil.dashnet.org/cookieclicker/')
print(html)
scrape()
You can try to do this:
body_element = html.find_element_by_xpath("//body")
body_content = body_element.get_attribute("innerHTML")
print(body_content)
I am trying to get info from brazilian stock market (BMF BOVESPA). The website has several tables, but my code is not being able to get them.
The code below aims to get all data from table "Ações em Circulação no Mercado" -> one of the last tables from webpage.
I have tried the ones below, but none worked for me:
content = browser.find_element_by_css_selector('//div[#id="div1"]')
and
table = browser.find_element_by_xpath(('//*[#id="div1"]/div/div/div1/table/tbody'))
Thanks in advance for taking my question.
from selenium import webdriver
from time import sleep
url = "http://bvmf.bmfbovespa.com.br/cias-Listadas/Empresas-
Listadas/ResumoEmpresaPrincipal.aspx?codigoCvm=19348&idioma=pt-br"
browser = webdriver.Chrome()
browser.get(url)
sleep(5) #wait website to reload
content = browser.find_element_by_css_selector('//div[#id="div1"]')
HTML can be found at attached picture
As alternative, the code below reaches the same website
url = "http://bvmf.bmfbovespa.com.br/cias-Listadas/Empresas-Listadas/BuscaEmpresaListada.aspx?idioma=pt-br"
Ticker='ITUB4'
browser = webdriver.Chrome()
browser.get(url)
sleep(2)
browser.find_element_by_xpath(('//*[#id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_txtNomeEmpresa_txtNomeEmpresa_text"]')).send_keys(Ticker)
browser.find_element_by_xpath(('//*[#id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_btnBuscar"]')).click();
content = browser.find_element_by_id('div1')
Selenium with Python documentation UnOfficial
Hii there
Selenium provides the following methods to locate elements in a page:
find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
Why your code doesnt work ? because you're not using correct correct code to locate element
you're using xpath inside css selector
content = browser.find_element_by_css_selector('//div[#id="div1"]') #this part is wrong
instead you can do this if you want to select div1
content = browser.find_element_by_id('div1')
here's the correct code
url = "http://bvmf.bmfbovespa.com.br/cias-Listadas/Empresas-
Listadas/BuscaEmpresaListada.aspx?idioma=pt-br"
Ticker='ITUB4'
browser = webdriver.Chrome()
browser.get(url)
sleep(2)
browser.find_element_by_xpath(('//*[#id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_txtNomeEmpresa_txtNomeEmpresa_text"]')).send_keys(Ticker)
browser.find_element_by_xpath(('//*[#id="ctl00_contentPlaceHolderConteudo_BuscaNomeEmpresa1_btnBuscar"]')).click()
I tested it and it worked :)
Mark it as best answer if i helped you :)
I was trying to get the best search result position of few product in the below link
https://www.purplle.com/search?q=hair%20fall%20shamboo
I used below tools to get the html details from the page
++
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.purplle.com/search?q=hair%20fall%20shamboo")
from bs4 import BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()
++
now I am confused how to get the product names and position(to get the best rank in the search )from this html.
I used the below method to get the details of the products but the output has a lot of unwanted things too.
details = soup.find('div', attrs={'class': 'pr'})
any idea how to solve this?
I don't know what you meant by position. However, the below script can fetch you the title of different products and its position (allegedly) from that page:
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get("https://www.purplle.com/search?q=hair%20fall%20shamboo")
soup = BeautifulSoup(driver.page_source, 'html.parser')
for item in soup.find_all(class_="prd-lstng pr"):
name = item.find_all(class_="pro-name el2")[0].text
position = item.find_all(class_="mrl5 tx-std30")[0].text
print(name,position)
driver.quit()