How to get the text from a webpage element using Selenium - html

I am trying to get some text from a web element, but I am unable to.
The HTML code is as follows:
<span class="versionInfo">
<span class="menu-highight">SoftFEPVis (GUI): </span>
"1.6.4"
</span>
SoftFEPVis (GUI): and 1.6.4 are the texts I would like to extract.
I am able to locate the element and print out its class (menu-highlight), but I am unable to extract SoftFEPVis (GUI): and 1.6.4.
I tried:
Version_Number = Browser.find_element(By.XPATH, '//*[@id="versionDropDown"]/div/span[3]/span').getText()
and got an error:
'WebElement' object has no attribute 'getText'
Please help.

Instead of using .getText() you could use:
.get_attribute('innerText')
or
.get_attribute('innerHTML') (note: this returns the markup, including child tags, not just the text)
or
.text
If it helps, here is a more in-depth discussion of the topic:
Given a (python) selenium WebElement can I get the innerText?
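One caveat about .text: Selenium returns only the rendered, visible text, so it can come back empty for hidden elements. A minimal sketch of a fallback to the DOM textContent property, assuming a Selenium 4 WebElement (the helper name element_text is hypothetical):

```python
def element_text(element):
    """Return an element's visible text, falling back to its DOM textContent.

    `element` is expected to be a Selenium WebElement: `.text` gives the
    rendered text (empty for hidden elements), while
    get_attribute("textContent") reads the DOM property regardless of
    visibility.
    """
    text = element.text
    if text:
        return text.strip()
    content = element.get_attribute("textContent")
    # get_attribute returns None if neither a property nor an attribute exists
    return (content or "").strip()
```

Usage: element_text(Browser.find_element(By.CSS_SELECTOR, "span.versionInfo")).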

getText() is a method of the Selenium Java client, whereas, judging from your code and the error message, you are using the Selenium Python client.
Solution
To print the text SoftFEPVis (GUI): and 1.6.4, use the text attribute with either of the following locator strategies:
Using CSS_SELECTOR and the text attribute:
print(Browser.find_element(By.CSS_SELECTOR, "span.versionInfo").text)
Using XPATH and the text attribute:
print(Browser.find_element(By.XPATH, "//span[@class='versionInfo']").text)
Note: you have to add the following import:
from selenium.webdriver.common.by import By
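Because the outer span's text contains both strings, you may want to split them apart. A minimal sketch, assuming .text returns something like 'SoftFEPVis (GUI): 1.6.4' (the exact whitespace depends on how the browser renders the element); split_version_info is a hypothetical helper:

```python
def split_version_info(text):
    """Split text like 'SoftFEPVis (GUI): 1.6.4' into (label, version).

    Assumes the label ends at the last ':' in the string and the version
    follows it; surrounding spaces and newlines are stripped. The colon is
    kept on the label, matching the text the question wants.
    """
    label, sep, version = text.rpartition(":")
    if not sep:
        raise ValueError(f"no ':' separator in {text!r}")
    return label.strip() + ":", version.strip()
```

Usage: label, version = split_version_info(Browser.find_element(By.CSS_SELECTOR, "span.versionInfo").text)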

Related

Getting text in <div> using Selenium

I have a question related to Selenium in Python:
I want to obtain the text content "D. New Jersey" on a webpage. The text can differ from page to page, but it always comes right after "COURT:".
The HTML code is:
<div class="span4">
<strong>COURT:</strong>
D. New Jersey
</div>
This is the code I am using now, and it doesn't work:
self.driver.get(address)
element=driver.findElement("//a[contains(@class,'span4') and contains(div/div/text(),'COURT:')]").gettext()
I have also tried the following solutions with no luck, and no Selenium exception is being thrown either:
text = self.driver.find_element_by_xpath("//div[strong[text()='COURT:']]").text
and
text = self.driver.find_element_by_xpath("//a[contains(@class,'span4') and contains(div/div/text(),'COURT:')]").text
Is there anyone who knows how to get the text from this code using Selenium?
Thanks
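In Python, you can get the text like this:
from selenium.webdriver.common.by import By
text = self.driver.find_element(By.XPATH, "//div[strong[text()='COURT:']]").text
This XPath queries the div element and uses its inner strong element to make sure we have selected the correct div. Then we read the WebElement's .text property to get the div's text. (Selenium 4.3 removed the find_element_by_* helpers, so find_element(By.XPATH, ...) is the form to use with current versions.)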
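Note that the div's .text also includes the text of its <strong> child, so you would typically get "COURT: D. New Jersey". A small sketch for keeping only the value, assuming the label text is exactly "COURT:" (text_after_label is a hypothetical helper):

```python
def text_after_label(text, label):
    """Return the part of `text` that follows `label`, stripped.

    A div's .text contains its child <strong>COURT:</strong> as well,
    e.g. 'COURT: D. New Jersey', so the label is cut off here.
    """
    _head, sep, tail = text.partition(label)
    if not sep:
        raise ValueError(f"{label!r} not found in {text!r}")
    return tail.strip()
```

Usage: court = text_after_label(self.driver.find_element(By.XPATH, "//div[strong[text()='COURT:']]").text, "COURT:")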

Scraping HTML elements between ::before and ::after with scrapy and xpath

I am trying to scrape some links from a webpage in Python with Scrapy and XPath, but the elements I want to scrape are between ::before and ::after, so XPath can't see them: they do not exist in the HTML but are created dynamically with JavaScript. Is there a way to scrape those elements?
::before
<div class="well-white">...</div>
<div class="well-white">...</div>
<div class="well-white">...</div>
::after
This is the actual page http://ec.europa.eu/research/participants/portal/desktop/en/opportunities/amif/calls/amif-2018-ag-inte.html#c,topics=callIdentifier/t/AMIF-2018-AG-INTE/1/1/1/default-group&callStatus/t/Forthcoming/1/1/0/default-group&callStatus/t/Open/1/1/0/default-group&callStatus/t/Closed/1/1/0/default-group&+identifier/desc
I can't replicate your exact document state.
However, if you load the page, you can see some template markup loaded in the same format as your example data.
Also, if you check the XHR requests in the network inspector, you can see AJAX requests for JSON data being made.
So you can download all the data you are looking for, in handy JSON format, here:
http://ec.europa.eu/research/participants/portal/data/call/amif/amif_topics.json
scrapy shell "http://ec.europa.eu/research/participants/portal/data/call/amif/amif_topics.json"
> import json
> data = json.loads(response.text)
> data['topicData']['Topics'][0]
{'topicId': 1259874, 'ccm2Id': 31081390, 'subCallId': 910867, ...
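The same data can be fetched without Scrapy. A minimal sketch using only the standard library, assuming the JSON keeps the topicData/Topics structure shown above; load_topics and fetch_topics are hypothetical helpers, and the URL is the one from this answer, which may no longer be served:

```python
import json
from urllib.request import urlopen

# URL taken from the answer above; the old EC participant portal may no
# longer serve it, so treat it as an example endpoint.
TOPICS_URL = "http://ec.europa.eu/research/participants/portal/data/call/amif/amif_topics.json"


def load_topics(raw):
    """Parse the feed and return the list stored under topicData -> Topics."""
    data = json.loads(raw)
    return data["topicData"]["Topics"]


def fetch_topics(url=TOPICS_URL, timeout=30):
    """Download the JSON feed directly instead of scraping the rendered page."""
    with urlopen(url, timeout=timeout) as response:
        return load_topics(response.read().decode("utf-8"))
```

Usage: for topic in fetch_topics(): print(topic["topicId"])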
It's very easy!
You can combine an "Absolute XPath" with a "Relative XPath" (https://www.guru99.com/xpath-selenium.html). With this trick you can get past ::before (and maybe ::after). For example, in your case (I assumed that //div[@id='"+FindField+"']//following::td[@class='KKKK'] comes before your "div"):
from selenium.webdriver.common.by import By
FindField = 'the "id" associated with the "div"'
driver.find_element(By.XPATH, "//div[@id='" + FindField + "']//following::td[@class='KKKK']/div")
Note: use only a single "/" between td[@class='KKKK'] and div.
You can also use only an "Absolute XPath" for the whole address (note: it must start with "//").

href attribute empty using xpath (python3)

Using Chrome and XPath in Python 3, I am trying to extract the value of an "href" attribute on this web page. The "href" attribute contains the link to the trailer ("bande-annonce" in French) of the movie I am interested in.
First, using XPath, it appears that the "a" tag is actually a "span" tag. Using this code:
import urllib.request
from lxml import etree
response_main=urllib.request.urlopen("http://www.allocine.fr/film/fichefilm_gen_cfilm=231874.html")
htmlparser = etree.HTMLParser()
tree_main = etree.parse(response_main, htmlparser)
tree_main.xpath('//*[@id="content-start"]/article/section[3]/div[2]/div/div/div/div[1]/*')
I get this result:
[<Element span at 0x111f70c08>]
So the "div" tag contains no "a" tag, just a "span" tag. I've read that the HTML shown in browsers doesn't always reflect the "real" HTML sent by the server, so I tried this command to extract the href:
response_main=urllib.request.urlopen("http://www.allocine.fr/film/fichefilm_gen_cfilm=231874.html")
htmlparser = etree.HTMLParser()
tree_main = etree.parse(response_main, htmlparser)
tree_main.xpath('//*[@id="content-start"]/article/section[3]/div[2]/div/div/div/div[1]/span/@href')
Unfortunately, this returns nothing... And when I check the attributes within the "span" tag with this command:
tree_main.xpath('//*[@id="content-start"]/article/section[3]/div[2]/div/div/div/div[1]/span/@*')
I get the value of the "class" attribute, but nothing for "href":
['ACrL3ZACrpZGVvL3BsYXllcl9nZW5fY21lZGlhPTE5NTYwMDcyJmNmaWxtPTIzMTg3NC5odG1s meta-title-link']
I'd like some help to understand what's happening here. Why is the "a" tag a "span" tag? And, most importantly for me, how can I extract the value of the "href" attribute?
Thanks a lot for your help!
The link you need is generated dynamically with JavaScript. With urllib.request you only get the initial HTML source, while you need the HTML after all the JavaScript has been executed.
You can use Selenium with chromedriver to get dynamically generated content:
from selenium import webdriver as web
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait
# Selenium 4 takes the chromedriver path through a Service object
driver = web.Chrome(service=Service("/path/to/chromedriver"))
driver.get("http://www.allocine.fr/film/fichefilm_gen_cfilm=231874.html")
# wait (up to 10 s) until JavaScript has inserted the <a> element that carries the href
link = wait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//div[@class='meta-title']/a[@class='xXx meta-title-link']")))
print(link.get_attribute('href'))
driver.quit()
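To confirm that the href really is missing from the HTML the server sends (rather than from your XPath), you can list the attributes of the matching elements in the static page. A minimal standard-library sketch; ClassAttrCollector and elements_with_class are hypothetical helpers, and meta-title-link is the class from the question:

```python
from html.parser import HTMLParser


class ClassAttrCollector(HTMLParser):
    """Record (tag, attributes) for every element whose class list contains `cls`."""

    def __init__(self, cls):
        super().__init__()
        self.cls = cls
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A class attribute can be absent or valueless (None), so default to "".
        classes = (attr_map.get("class") or "").split()
        if self.cls in classes:
            self.matches.append((tag, attr_map))


def elements_with_class(html, cls):
    """Return (tag, attrs) pairs for elements in `html` that carry class `cls`."""
    parser = ClassAttrCollector(cls)
    parser.feed(html)
    parser.close()
    return parser.matches
```

For example, pass it the decoded output of urllib.request.urlopen(url).read() and "meta-title-link": if no match has an "href" key, the link is added later by JavaScript, and a browser-driven tool such as Selenium is needed.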

Find attribute value using selenium <span class='overlay' title id='ab12'></span>

I am trying to get the value of the title attribute for the following HTML code:
<span class='overlay' title id='ab12'></span>
This code is written for a tooltip. When I view the source code of this HTML page, I see the following:
<span class='overlay' title="Test Tooltip"></span>
So basically, id='ab12' in the HTML code stands for Test Tooltip.
Could you tell me how I can get this text value (Test Tooltip) using Selenium WebDriver?
Your question is a bit confusing, and I'm not sure what you mean about id='ab12', but in the HTML you provided, class='overlay' is the same in both versions, so you can rely on it.
(Assuming you're using Java) try using By.className() to locate the <span> element, then use getAttribute("title") to get the tooltip text, as below:
WebElement el = driver.findElement(By.className("overlay"));
String tooltip = el.getAttribute("title");
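For the Python client, a minimal equivalent sketch, assuming Selenium 4's find_element(by, value) signature. The string "class name" is the value of By.CLASS_NAME, used here only so the helper needs no import; tooltip_text is a hypothetical name:

```python
def tooltip_text(driver, class_name="overlay"):
    """Return the title attribute of the first element with the given class.

    Mirrors the Java answer: locate by class name, then read "title".
    "class name" is the string value of selenium's By.CLASS_NAME, so a real
    Selenium 4 WebDriver accepts it directly.
    """
    element = driver.find_element("class name", class_name)
    # get_attribute returns None when the attribute is absent
    return element.get_attribute("title") or ""
```

With By imported, the same lookup reads driver.find_element(By.CLASS_NAME, "overlay").get_attribute("title").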

xpath to extract link or hrefs

I am trying to extract the links to similar apps from this Google Play Store page, using XPath:
https://play.google.com/store/apps/details?id=com.mojang.minecraftpe
I want to extract the similar-app links. HTML sample:
<div class="details">
<a title="Temple Run" href="/store/apps/details?id=com.imangi.templerun" class="title">Temple Run
<span class="paragraph-end"/>
</a>
<div>....</div>
<div>....</div>
</div>
I have used the XPath below in the Chrome console to locate a single link, but it doesn't return the href attribute of the tag, although it works for other attributes (for example "title").
This XPath doesn't work (extracting "href"):
//*[@id="body-content"]/div/div/div[2]/div[1]//*/a[2]/@href
This XPath works (extracting "title"):
//*[@id="body-content"]/div/div/div[2]/div[1]//*/a[2]/@title
The HTML of the individual tiles on the right of the linked page has the following form*:
<div class="details">
<a title="Temple Run" href="/store/apps/details?id=com.imangi.templerun" class="title">Temple Run
<span class="paragraph-end"/>
</a>
<div>....</div>
<div>....</div>
</div>
It turns out that class="title" uniquely identifies your target <a> elements on that page, so the XPath can be as simple as:
//a[@class="title"]/@href
Anyway, the problem you noticed seems to be specific to the Chrome XPath evaluator**. Since you mentioned Python, a simple Python session shows that the XPath works just fine:
>>> from urllib.request import urlopen
>>> from lxml import html
>>> req = urlopen('https://play.google.com/store/apps/details?id=com.mojang.minecraftpe')
>>> raw = req.read()
>>> root = html.fromstring(raw)
>>> [h for h in root.xpath("//a[@class='title']/@href")]
['/store/apps/details?id=com.imangi.templerun', '/store/apps/details?id=com.lego.superheroes.dccomicsteamup', '/store/apps/details?id=com.turner.freefurall', '/store/apps/details?id=com.mtvn.Nickelodeon.GameOn', '/store/apps/details?id=com.disney.disneycrossyroad_goo', '/store/apps/details?id=com.rovio.angrybirdsstarwars.ads.iap', '/store/apps/details?id=com.rovio.angrybirdstransformers', '/store/apps/details?id=com.disney.dinostampede_goo', '/store/apps/details?id=com.turner.atskisafari', '/store/apps/details?id=com.moose.shopville', '/store/apps/details?id=com.DisneyDigitalBooks.SevenDMineTrain', '/store/apps/details?id=com.turner.copatoon', '/store/apps/details?id=com.turner.wbb2016', '/store/apps/details?id=com.tov.google.ben10Xenodrome', '/store/apps/details?id=com.turner.ggl.gumballrainbowruckus', '/store/apps/details?id=com.lego.starwars.theyodachronicles', '/store/apps/details?id=com.mojang.scrolls']
*) Stripped-down version. You can take this as an example of a minimal HTML sample.
**) I can reproduce this problem: @href values are printed as empty strings in my Chrome console. The same problem has happened to others as well: Chrome element inspector Xpath with @href won't show link text
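If lxml is not installed, the same links can be collected with the standard library's html.parser instead of XPath. A minimal sketch; it matches "title" as a whitespace-separated class token, which is slightly looser than the exact @class="title" comparison, and TitleLinkParser and title_hrefs are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TitleLinkParser(HTMLParser):
    """Collect the href of every <a> whose class list contains "title"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "title" in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def title_hrefs(html, base_url=None):
    """Return hrefs of a.title links, optionally resolved against base_url."""
    parser = TitleLinkParser()
    parser.feed(html)
    parser.close()
    if base_url is None:
        return parser.hrefs
    return [urljoin(base_url, href) for href in parser.hrefs]
```

Passing the page URL as base_url turns the relative /store/apps/... paths into absolute links.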