How can I store an XPath as a variable in Python Selenium?

I want to be able to select text on a website, store it, and then turn that stored information into a variable. More specifically, as seen in the picture below, my goal is for the selected name (in this case, John J. John) to be copied, turned into a variable, and then printed, all without emulating any key bindings.
(Screenshot of the HTML in the browser's Inspect Element view)
The code I have tried to use to get the information is this:
selectedName = browser.find_element_by_xpath('//*[@id="sign_in_box"]/div/div[2]/div/div[1]/div').get_attribute("div")
print (selectedName)
The return I am getting is this:
None
I know that the problem almost definitely lies somewhere in the path, but I can't figure it out.

Assuming your XPath is correct, you should use .text instead of .get_attribute("div"): get_attribute("div") looks up an attribute (or property) named div, which the element doesn't have, so it returns None instead of the text you want.
How to get text with Selenium WebDriver in Python
Try
selectedName = browser.find_element_by_xpath('//*[@id="sign_in_box"]/div/div[2]/div/div[1]/div').text
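A minimal sketch of the whole task with the current Selenium 4 API (find_element_by_xpath was removed in Selenium 4.3): the XPath is stored in a variable and a small helper returns the element's text. It passes the locator strategy as the string "xpath", which is the value of By.XPATH, so the snippet does not need the By import; the constant and helper names are hypothetical.

```python
# Hypothetical names; the XPath is the one from the question.
NAME_XPATH = '//*[@id="sign_in_box"]/div/div[2]/div/div[1]/div'


def read_selected_name(driver, xpath=NAME_XPATH):
    """Return the visible text of the element at `xpath`, without surrounding whitespace."""
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH
    element = driver.find_element("xpath", xpath)
    # .text is the rendered text; get_attribute("div") would look for an
    # attribute named "div" and return None.
    return element.text.strip()
```

With an existing browser session you would call selected_name = read_selected_name(browser) and then print(selected_name).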

Related

Trying to extract data from Google results pages for a specific domain

So I'm trying to extract the URL, title and description from the SERPs for just one domain. This means I have to use the URL in some sort of "contains" function to get the corresponding title and description, right?
Since Google has the URL and the Title within the same element, I could get these easily via xpath.
My issue right now is the description, which is outside the initial element where the URL is. So far I have tried XPath as well as regex and couldn't find a way to make it work.
Here is some code that didn't work:
Xpath:
//div[@class="jtfYYd"]/a[starts-with(@href,'https://www.example.com')]//*[@class="NJo7tc Z26q7c uUuwM"]/div
Regex:
A: ["']href="https://www.example.com["']<div class="NJo7tc Z26q7c uUuwM"["']>(.*?)
B: (?=["']https://www.example.com["'])(?=["']NJo7tc Z26q7c uUuwM["'])(.*?)
I can only use XPath 1.0, since the tool (Screaming Frog) doesn't support XPath 2.0. I hope someone has a solution.
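Because the description is not inside the <a>, the a[...]//*[...] step in the XPath above can never reach it. One XPath 1.0 way around this is to put the URL test in a predicate on the link's container and then step sideways with the following-sibling:: axis. The sketch below checks such an expression with lxml (also an XPath 1.0 engine) against made-up, simplified markup. It assumes the description div is a later sibling of the jtfYYd div, which you would have to confirm in the real page, since Google's markup and class names change often; only the DESCRIPTION_XPATH string is what would go into the tool's extractor.

```python
from lxml import html

# Made-up, simplified SERP markup. ASSUMPTION: the description div is a later
# sibling of the div.jtfYYd that holds the link.
PAGE = """<div id="search">
  <div class="result">
    <div class="jtfYYd"><a href="https://www.example.com/page">Example title</a></div>
    <div class="NJo7tc Z26q7c uUuwM"><div>Example description</div></div>
  </div>
  <div class="result">
    <div class="jtfYYd"><a href="https://other.example.org/">Other title</a></div>
    <div class="NJo7tc Z26q7c uUuwM"><div>Other description</div></div>
  </div>
</div>"""

# XPath 1.0 only: filter the link container with a predicate instead of
# descending into the <a>, then move to the first following sibling that
# carries the description class.
DESCRIPTION_XPATH = (
    "//div[@class='jtfYYd'][a[starts-with(@href, 'https://www.example.com')]]"
    "/following-sibling::div[@class='NJo7tc Z26q7c uUuwM'][1]/div"
)
TITLE_XPATH = "//div[@class='jtfYYd']/a[starts-with(@href, 'https://www.example.com')]"

tree = html.fromstring(PAGE)
titles = [node.text_content() for node in tree.xpath(TITLE_XPATH)]
descriptions = [node.text_content() for node in tree.xpath(DESCRIPTION_XPATH)]
print(titles, descriptions)
```

If the description turns out to sit further away, the same idea applies with an ancestor:: step up to the common container before going back down.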

Changing number of a web page (in the URL), change the display but not the Html source code

I am facing a behavior that I really don't understand.
If you go to the webpage https://www.edel-optics.fr/Lunettes-de-soleil.html#ful_iPageNumber=1 and inspect the code, you will see that it has the same HTML content as https://www.edel-optics.fr/Lunettes-de-soleil.html#ful_iPageNumber=7
To test it, search for "ERIKA - 710/T5" in both page sources: you will find it in both, although it should only appear on ful_iPageNumber=1.
Why is it behaving like this ?
Secondary question: how do I get the real content of https://www.edel-optics.fr/Lunettes-de-soleil.html#ful_iPageNumber=7?
Thank you for your help
John
Problem
You have explained that when you perform a search, you get the same results as with your pagination (page 1)
Issue
You are not getting the value you're searching for placed into the URL:
https://www.edel-optics.fr/Recherche.html?time=1519871844737#query=
the #query value is empty.
You would be needing something like:
https://www.edel-optics.fr/Recherche.html?time=1519871844737#query=ERIKA%20-%20710/T5
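The value in that URL is just the search term percent-encoded. A small sketch with the standard library builds it; the helper name search_url is hypothetical, and the time value is copied unchanged from the URL above.

```python
from urllib.parse import quote

# Base URL copied from the answer above, including its time parameter.
SEARCH_BASE = "https://www.edel-optics.fr/Recherche.html?time=1519871844737"


def search_url(term, base=SEARCH_BASE):
    """Append the percent-encoded search term as the #query= fragment."""
    # quote() turns spaces into %20 and leaves "-" and "/" (its default safe character) alone
    return base + "#query=" + quote(term)


print(search_url("ERIKA - 710/T5"))
```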
Without seeing your code it's hard to say where the issue lies. It could be that the search box is not inside the form, that the submit button belongs to a different form than the search box, or that a backend script is not picking up the GET values because of case differences in the parameter name.
OK, I found a solution to this strange problem: replace the # in the URL with a ? and you get the actual HTML content (matching what is displayed).
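Why this happens: everything after # is the URL fragment, which the browser keeps to itself and never sends to the server. Both URLs therefore request the same document, and the page's own JavaScript presumably reads the fragment and redraws the list, which is why the display changes while the HTML source does not. The sketch below uses only urllib.parse (both helper names are hypothetical) to show what the server actually receives and to perform the #-to-? swap. The swap only helps because this particular server reads ful_iPageNumber from the query string; other sites may not.

```python
from urllib.parse import urlsplit, urlunsplit


def request_target(url):
    """Path and query that go into the HTTP request line; the fragment is never sent."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def fragment_to_query(url):
    """Move a '#key=value' fragment into the query string ('?key=value')."""
    parts = urlsplit(url)
    if not parts.fragment:
        return url
    # Keep any existing query parameters and append the fragment's ones
    query = "&".join(q for q in (parts.query, parts.fragment) if q)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


page7 = "https://www.edel-optics.fr/Lunettes-de-soleil.html#ful_iPageNumber=7"
print(request_target(page7))     # what the server sees for the original URL
print(fragment_to_query(page7))  # URL whose page number reaches the server
```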

Modify HTML content during Web Scraping

I'm trying to do some web scraping.
The objective is to collect all repairers for a given postal code. The problem is that when I run my code, my list is empty because the URL doesn't change with the postal code. This is why I want to change the HTML value during the scrape.
I'm not sure how to do this. I tried using Selenium and XPath, but I wasn't able to find anything.
Here's the HTML Code: (in red is what I need to change.)
EDIT: The goal is to collect the paginated results, with the name and type of each repairer, for a given postal code; this is why I want to change the HTML content during the scrape.
This is the best I can do for the moment; I hope you will see the error.
This input is in a form, which is good because Selenium has special functionalities to handle forms.
from selenium import webdriver
from selenium.webdriver.common.by import By
url = "https://www.maif.fr/services-en-ligne/consultationreparateurs/geolocaliserReparateur.action?view"
query = "whatever you want to put into the search box"
driver = webdriver.Chrome()
driver.get(url)
# Locate the address input inside the search form
webform_input = driver.find_element(By.XPATH, "//input[@id='adresseInternaute']")
webform_input.send_keys(query)
# Submit the form that contains the input (no need to locate the search button)
webform_input.submit()
The key here is submit(): called on an element inside a form, it submits that enclosing form, so you don't have to write two extra lines just to locate and click the search button.

Check for visual presence of character on page?

I'm in an odd situation where I need to check, via a test, that a currency symbol is being properly displayed on our web page.
We've been running into issues where sometimes the unicode alphanumeric value is showing up on the page instead of the actual currency symbol itself.
Is there a way to check for something like this? Like with some type of visual checking library, or through javascript?
The answer to this issue was to copy and paste the exact Unicode character I wanted to test against into my text editor.
So, using the Protractor framework, I locate my element with a CSS selector, and if I know the price should be 17.99, my test function returns:
return expect(myPriceElement.getText()).to.eventually.equal('£17.99');
If on my webpage, £17.99 shows up, then my test will pass
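As a complementary check, here is a hedged sketch in Python (the language of this page's other examples; the test above is Protractor/JavaScript) that inspects the displayed text directly. It writes the pound sign as the escape \u00a3 in source, so the test does not depend on the editor preserving a pasted character, and it also fails if escape debris such as a literal \u00a3 or an HTML entity like &#163; is visible. The helper name and the list of escape patterns are assumptions, not part of any framework.

```python
import re

# Assumed forms of an unrendered symbol: a "\u00a3"-style escape, a numeric
# HTML entity ("&#163;", "&#xA3;") or a named one ("&pound;").
UNRENDERED = re.compile(r"\\u[0-9a-fA-F]{4}|&#x?[0-9a-fA-F]+;|&[a-zA-Z]+;")

POUND = "\u00a3"  # the same character as "£", written as an escape in the source


def price_rendered_correctly(displayed_text, amount="17.99", symbol=POUND):
    """True if the real symbol precedes the amount and no escape debris is visible."""
    return (symbol + amount) in displayed_text and not UNRENDERED.search(displayed_text)
```

In a Selenium-based test you would pass the element's .text to this helper and assert on the result.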

Hide the text input field portion in a form's GET query url when the value is an empty string

e.g.: http://127.0.0.1:8000/database/?reference_doi=&submit=Submit
I know it appears to be standard HTML behavior, but is there a tag or attribute to switch it off, so that an empty text input does not appear in the query URL?
Or alternatively, since I'm using Django, I tried doing the following in my view.
request_get_copy = request.GET.copy()
# Iterate over a snapshot: popping from the QueryDict while iterating it raises RuntimeError
for key, value in list(request_get_copy.items()):
    if not value or key == 'submit':
        request_get_copy.pop(key)
request.GET = request_get_copy
request.META['QUERY_STRING'] = request_get_copy.urlencode()
I displayed request.GET and request.META['QUERY_STRING'] on the page through my template, along with the results of several other request methods, and they all showed the "corrected" values, such as http://127.0.0.1:8000/database/. But since the GET request is built by the browser before it reaches Django, the URL in the address bar still contains the empty-value parts. Is there anything I can do?
The easiest thing you could do is to issue a redirect to your fixed URL:
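from django.shortcuts import redirect
query = request_get_copy.urlencode()
fixed_url = request.path + ('?' + query if query else '')  # urlencode() has no path or '?'
return redirect(fixed_url)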
Only do this when the query string actually changed (otherwise the redirected request would be redirected again, endlessly), and do it before any DB access or other heavy work.
This means an additional GET, but gives you the result you want, and I guess that's more valuable to you :)
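The cleanup and the "only if it changed" check can be kept out of the view. Below is a framework-independent sketch using only urllib.parse; the helper name cleaned_query is hypothetical. It returns the cleaned query string, or None when nothing needs removing, which is the signal not to redirect.

```python
from urllib.parse import parse_qsl, urlencode


def cleaned_query(query_string, drop_keys=("submit",)):
    """Return the query string without blank values or `drop_keys`,
    or None if nothing would change (so the caller should not redirect)."""
    pairs = parse_qsl(query_string, keep_blank_values=True)
    kept = [(key, value) for key, value in pairs if value and key not in drop_keys]
    if kept == pairs:
        return None
    return urlencode(kept)


print(cleaned_query("reference_doi=&submit=Submit"))  # empty string: drop the whole query
```

In the view you would pass request.META['QUERY_STRING'] and, when the result is not None, return redirect(request.path + ('?' + query if query else '')).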
If JavaScript is an option, you could also make these changes before the submit actually happens; it's a tad more convoluted but avoids the extra request.
Edit: Just to be clear, there's no way to "turn it off", you could say this is how HTTP and browsers work :)