I am scraping elements from a web page. I can see the element's value (a number) displayed in a grayed-out box on the page, but when I inspect the element I can't find any text between its tags. I assumed the URL might be a web-service endpoint and tried a GET from Postman, but it returned plain HTML rather than a JSON response.
In general we can read a value between tags by finding the element in Selenium and reading its innerText attribute, but that also failed because there is no text between the tags.
I cannot post the URL or responses due to security-compliance rules in my organization. Please advise any other way I can work around this.
Got my answer: the value lives in the input's value property rather than in its text content. I ran the JavaScript "document.querySelector('input[name=assets_tot_mfr]').value;" through Python's execute_script and it returned the number. Thanks!
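The approach above can be wrapped up in Python like this. A minimal sketch: the selector name assets_tot_mfr comes from the question, the URL in the usage note is a placeholder, and note that the script must use return for execute_script to hand the value back to Python:

```python
# Inputs keep their value in the `value` property, not in text content,
# so innerText is empty; read the property through the JS bridge instead.
JS_READ_VALUE = "return document.querySelector('input[name={name}]').value;"

def read_input_value(driver, name):
    """Return the current value of <input name=...> via execute_script."""
    return driver.execute_script(JS_READ_VALUE.format(name=name))

# Usage (requires a running ChromeDriver):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("https://example.com/form")  # hypothetical URL
# print(read_input_value(driver, "assets_tot_mfr"))
```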
Working with an in-house developed application, when I go to the web site manually (launch Chrome, type in the URL, etc.) and inspect a particular element, I see that element with the ID attribute as follows:
id="input-145"
When I use Chromedriver and Selenium to run my test, the same element shows up with this ID attribute:
id="input-147"
Has anyone ever seen this before? I'm not sure how to proceed.
I was expecting the ID attribute to remain the same regardless of how the site is accessed.
Some frameworks generate dynamic ids for inputs to protect against parsers. I solved this problem by searching for the element with a full XPath instead.
Example:
# Relative XPath with the id: does not work, because the id is dynamic
# (note the XPath attribute syntax is @id, not #id)
input_el = driver.find_element(By.XPATH, "//*[@id='input-147']")
# Full (absolute) XPath: works despite the dynamic id
input_el = driver.find_element(By.XPATH, "/html/body/header/div/form/div/input")
Locating by XPath documentation
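A full XPath works, but it breaks whenever the page layout shifts. If only the numeric suffix of the id changes between runs (input-145 vs input-147), matching the stable prefix is usually more robust. A sketch, assuming the "input-" prefix really is stable across sessions:

```python
# Match on the stable part of the dynamic id rather than the whole id.
# XPath variant:
PREFIX_XPATH = "//input[starts-with(@id, 'input-')]"
# Equivalent CSS attribute-prefix selector:
PREFIX_CSS = "input[id^='input-']"

# Usage with a live driver:
# from selenium.webdriver.common.by import By
# el = driver.find_element(By.XPATH, PREFIX_XPATH)
# el = driver.find_element(By.CSS_SELECTOR, PREFIX_CSS)
```

If several inputs share the prefix, anchor the selector on a stable ancestor (for example the form) the same way.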
The XPath selector in the Scrapy shell, response.xpath('//div[@class="chr-lot-header__bid-details"]//span[@class="chr-lot-header__value-field"]'), returns an empty list, while the same XPath selects the right HTML tag in the "Elements" tab of my Chrome browser.
Here's the website the XPath selector is intended for:
https://www.christies.com/en/lot/lot-5973059
The output I want the XPath selector to produce is "GBP 11,282,500".
I just checked the website you mentioned: the required information is loaded dynamically, which means it cannot be scraped directly. Scrapy only scrapes statically served data, not data rendered by JavaScript after the page loads. To scrape dynamically loaded data, either mimic a real browser (use Selenium or Playwright and integrate that library into your Scrapy code), or look in the network tab for the API call from which the data is fetched.
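The browser-automation route can be sketched as follows. This is illustrative, not a tested scraper for christies.com; the class names come from the question, and the wait is there because the value is rendered by JavaScript after page load:

```python
# Wait for the JS-rendered bid value to appear, then read its text.
VALUE_CSS = ".chr-lot-header__bid-details .chr-lot-header__value-field"

def get_bid_value(driver, timeout=10):
    """Block until the value field exists in the DOM, then return its text."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    wait = WebDriverWait(driver, timeout)
    el = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, VALUE_CSS)))
    return el.text

# Usage:
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("https://www.christies.com/en/lot/lot-5973059")
# print(get_bid_value(driver))
```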
I'm new to web scraping so I'm fooling around with scrapy and trying to crawl a certain website.
I'm working with the scrapy shell on windows and just trying to establish the proper XPath to a particular element I want to access. The element is a schedule, this is the HTML:
I'm trying to access the rv-schedule-module and all its sub-nodes. I can access every node up to the rv-schedule-module, but beyond that all XPath calls return null. For instance:
The progression of calls returns data until I try to access a div underneath the rv-schedule-module; that call returns null.
What am I doing wrong?
Just as I suspected, that content is dynamically created because it's handled by JavaScript!
When you inspect the element it will be there, but if you check the page source it won't. Scrapy by itself doesn't execute JavaScript; you'll need something like scrapy-splash or Selenium.
There is a really good post by the almighty Alex explaining how to use it: https://stackoverflow.com/a/30378765/2781701
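The "inspect vs page source" check can be automated: fetch the raw HTML the way Scrapy would and look for the markup you expect. A standard-library sketch; rv-schedule-module is the marker from this question, and the URL in the usage note is a placeholder:

```python
def is_in_raw_html(html, marker):
    """True if the marker appears in the server-sent HTML, i.e. the content
    is static; False suggests it is injected later by JavaScript."""
    return marker in html

# Usage:
# from urllib.request import urlopen
# html = urlopen("https://example.com/schedule").read().decode("utf-8")
# if not is_in_raw_html(html, "rv-schedule-module"):
#     print("Rendered by JavaScript; plain Scrapy won't see it.")
```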
I use Scrapy's XPath support to gather some information.
Now I need to locate certain elements on web pages, but the browser silently repairs invalid tags, so the structure shown in DevTools is the corrected HTML.
In Scrapy, the request returns the original HTML, and the XPaths that work in DevTools no longer work in code.
Is there a good way to locate elements as DevTools sees them, or to turn off the browser's error tolerance?
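One option, assuming you can add dependencies (pip install beautifulsoup4 html5lib): parse the raw HTML with html5lib, which implements the same error-recovery algorithm browsers use, so the resulting tree is much closer to what DevTools shows than what Scrapy's default lxml-based parser builds. A sketch:

```python
from html.parser import HTMLParser  # stdlib fallback exists, but html5lib repairs like a browser
from bs4 import BeautifulSoup

# Invalid markup: unclosed tags, and no <tbody> around the row.
raw = "<table><tr><td>cell"

# html5lib inserts <html>, <body>, and <tbody> just like a browser does,
# so selectors derived from DevTools have a chance of matching again.
soup = BeautifulSoup(raw, "html5lib")
print(soup.find("td").text)          # "cell"
print(soup.find("tbody") is not None)  # True: repaired like the browser DOM
```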
I've used WindowsHost to host a WebBrowser control, which let me access the WebBrowser's Document/DOM directly, to read HTML content via mouse clicks on document elements and also to invoke submits on HTML forms. Back when I was searching (even in .NET 3.5), I never found a way to do this. I've found this post http://rhizohm.net/irhetoric/blog/72/default.aspx and it looks like, through some magic casting, you can expose the DOM. But my question is: has anyone done this, and once you have the DOM, is it possible to do invokes to submit content to HTML forms and also to get HTML elements via mouse-click events?
Has anyone tried, and was able to do both?
Thanks
I'm using WPF.
add a reference to:
Microsoft.mshtml
then:
var doc = (mshtml.HTMLDocument)_wbOne.Document;
and this gives you the raw HTML string:
doc.documentElement.innerHTML
In return, if you know how to get information out of the HTML document, I'd appreciate it.
For example, getting all the elements (the metas and whatever else might be gettable) so I can read the information from them? I don't want to dink around with the HTML itself, just get the info out of it. :-)
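Once you have the raw HTML string from innerHTML, pulling out the metas is ordinary HTML parsing. The thread above is C#, but the approach carries over to any parser API; here is an illustrative standard-library sketch in Python:

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect the attributes of every <meta> tag in the document."""
    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))

html = '<html><head><meta name="author" content="bob"></head><body></body></html>'
p = MetaCollector()
p.feed(html)
print(p.metas)  # [{'name': 'author', 'content': 'bob'}]
```

The same pattern (walk the tree, filter by tag name, read attributes) works against the mshtml DOM via doc.getElementsByTagName("meta").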