Parsing html with changing div id


I'm trying to parse the following HTML in order to get to the link that I marked below by using jsoup:
In order to do so, I did the following:
Document doc = Jsoup.connect(url).get();
Elements links = doc.select(".list-item-wrapper").select(".list-item"); // this is where I'm stuck
I would have continued with:
doc.select(".list-item-wrapper").select(".list-item").select("#SEARCH_RESULT_RECORDID_dedupmrg914683993").select()....
But the problem is that the _dedupmrg914683993 suffix changes on every page.
I also tried:
doc.select(".list-item-wrapper").select(".list-item").select(".list-item-primary-content result-item-primary-content layout-row").select()....
But I got 0 results.
How can I parse it so that I eventually get to the link inside <img class="main-img fan-img-1"...>?
Thank you

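You can match on the start of any attribute value. If your id always starts with the string SEARCH_RESULT_RECORDID, you can select it with the [attr^=prefix] syntax:
doc.select(".list-item-wrapper").select(".list-item").select("[id^=SEARCH_RESULT_RECORDID]").select()....
jsoup selectors follow the CSS/jQuery scheme, so the attribute-prefix selector works here as well. Your second attempt returned nothing because a space inside a selector means "descendant"; to match several classes on the same element, chain them without spaces: .list-item-primary-content.result-item-primary-content.layout-row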
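The prefix-matching idea is not specific to jsoup. As a rough illustration, here is a minimal sketch in Python that uses only the standard library's html.parser instead of a CSS selector engine. The sample markup, the example.com URLs, and the PrefixIdImageFinder class are assumptions made up for this sketch (the original HTML is not shown in the question), and the depth counter assumes every non-void tag is properly closed.

```python
from html.parser import HTMLParser


class PrefixIdImageFinder(HTMLParser):
    """Collect the src of <img> tags nested inside elements whose id starts with a prefix.

    Hypothetical helper for illustration; assumes well-formed, properly closed markup.
    """

    VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}

    def __init__(self, id_prefix):
        super().__init__()
        self.id_prefix = id_prefix
        self.depth = 0  # nesting depth inside a matching element (0 means outside)
        self.image_sources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img":
            # Only keep images that sit somewhere below a matching element.
            if self.depth > 0 and "src" in attributes:
                self.image_sources.append(attributes["src"])
            return
        if tag in self.VOID_TAGS:
            return
        element_id = attributes.get("id") or ""
        if self.depth > 0:
            self.depth += 1
        elif element_id.startswith(self.id_prefix):
            # Same test as the [id^=prefix] selector, done by hand.
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1


# Assumed sample markup; the numeric suffix differs per record, as in the question.
SAMPLE_HTML = """
<div class="list-item-wrapper">
  <div class="list-item">
    <div id="SEARCH_RESULT_RECORDID_dedupmrg914683993">
      <div class="media"><img class="main-img fan-img-1" src="https://example.com/cover-a.jpg"></div>
    </div>
    <div id="SEARCH_RESULT_RECORDID_dedupmrg000000001">
      <img class="main-img fan-img-1" src="https://example.com/cover-b.jpg">
    </div>
    <div id="unrelated"><img src="https://example.com/ignored.jpg"></div>
  </div>
</div>
"""

finder = PrefixIdImageFinder("SEARCH_RESULT_RECORDID")
finder.feed(SAMPLE_HTML)
print(finder.image_sources)
```

Matching on the stable prefix means the changing suffix never has to be known in advance, which is the same reason the [id^=...] selector works in jsoup.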

Related

How to add a specific metadata using jquery into a div with an ID

I am trying to figure out how to write jQuery code to insert metadata into an element, specifically this: data-section-name="home"
I want to insert it into the div with the ID home-section, so that the output looks like this:
<div id="home-section" data-section-name="home">
(some code here...)
</div>
I am using the Divi builder in WordPress.

If you need the data attribute to appear in the HTML source use attr():
$('#home-section').attr('data-section-name', 'home');
If the data attribute is going to be read by a jQuery library then you can use the data() method instead, which is more performant:
$('#home-section').data('section-name', 'home');
The caveat in both cases is to ensure that you execute this line of code before whatever library depends on that data attribute being present.

Selenium not giving whats inside a class?

PATH = r"D:\CDriver\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get('https://www.se.com/ww/en/about-us/careers/job-details/inside-sales-associate/006ZMV')
TITLE = driver.find_element_by_class_name('sdl-application-job-details__job-title')
print(TITLE)
driver.quit()
I have all the needed imports; I just left them out here.
When I run this, the output should be: Inside Sales Associate
But instead it prints a <selenium.webdriver.remote.webelement.WebElement ...> object, along with the session and element IDs.
What do I need to do to make it print the text? I have tried by_tag_name('h1.sdl-application-job-details__job-title'), but that gives exactly the same result.

Selenium has a built-in title property. It belongs to the driver object, not to a web element, and it returns the title of the page:
driver.get('https://www.se.com/ww/en/about-us/careers/job-details/inside-sales-associate/006ZMV')
print(driver.title)
Or, if you want to retrieve the text inside any web element, you could do something like this:
from selenium.webdriver.common.by import By
job_title = driver.find_element(By.CSS_SELECTOR, "h1[class$='sdl-application-job-details__job-title']").text
print(job_title)

The find_element methods return web elements, not strings. To print the text, read the element's text attribute: print(TITLE.text)

unable to find multiple existing element with selenium

Here is the HTML of the links I am trying to click. I don't know why Selenium can't locate them. My guess is that it is because there are several such links, and the only difference between them is the string in the parentheses of onclick. Below are two examples of the HTML. I'd like to click all of them (in this case, both).
Button or Link:
<td class="text-right">
Ansehen</td>
Button or Link:
<td class="text-right">
Ansehen</td>
Here are my attempts to click the button:
driver.find_element_by_link_text("doRetrievePnr('DUA75J')").click()
driver.find_element_by_xpath("//a[#onclick='doRetrievePnr('DUA75J')']").click()
Here are the errors I get:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"link text","selector":"doRetrievePnr('DUA75J')"}
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: Unable to locate an element with the xpath expression //a[#onclick='doRetrievePnr('DUA75J')'] because of the following error: SyntaxError: Failed to execute 'evaluate' on 'Document': The string '//a[#onclick='doRetrievePnr('DUA75J')']' is not a valid XPath expression.
Am I missing something?
EDIT
Here's a picture of the buttons, the HTML code, and my Python code (lines 34 and 35)

As mentioned in the comments, you can get all the items matching a specific identifier with find_elements_by_... (note the plural: find elements).
That returns a list you can iterate through to interact with each element.
It looks like all your links have the same ID.
Something like this might work:
import time
links = driver.find_elements_by_id('viewPnr')
for link in links:
    link.click()
    time.sleep(10)  # replace this with what you want to do
Obviously if you're clicking links and the page changes this might not work after the first iteration. However, this is the concept on how you can get all objects and iterate through them.
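The InvalidSelectorException in the question has a separate cause: in XPath an attribute is addressed with @, and a string literal cannot contain the same quote character that delimits it. A minimal sketch of the quoting, checked here with Python's standard xml.etree.ElementTree rather than Selenium, might look like the following. The <a> markup, the second code XYZ123, and the onclick_xpath helper are assumptions for illustration, since the original anchor tags did not survive in the question.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of the rows; the real anchor markup is not shown in the question.
SAMPLE_ROWS = """
<table>
  <tr><td class="text-right"><a id="viewPnr" href="#" onclick="doRetrievePnr('DUA75J')">Ansehen</a></td></tr>
  <tr><td class="text-right"><a id="viewPnr" href="#" onclick="doRetrievePnr('XYZ123')">Ansehen</a></td></tr>
</table>
""".strip()


def onclick_xpath(record_code):
    """Build an XPath matching the link for one record code (hypothetical helper).

    The attribute is addressed with @, and the value is wrapped in double quotes
    so that the single quotes inside it do not end the literal early.
    """
    return f".//a[@onclick=\"doRetrievePnr('{record_code}')\"]"


root = ET.fromstring(SAMPLE_ROWS)

# One specific link, selected by its full onclick value.
single_match = root.findall(onclick_xpath("DUA75J"))

# All links that share the id, mirroring the find_elements idea above.
all_links = root.findall(".//a[@id='viewPnr']")
onclick_values = [link.get("onclick") for link in all_links]

print(len(single_match), onclick_values)
```

With Selenium, the same expression without the leading dot, //a[@onclick="doRetrievePnr('DUA75J')"], should be accepted by the XPath locator, although that is untested against the original page.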

How can I get the innerText of Dynamic Html tags using Puppeteer.js (node.js) in TripAdvisor?

How can I get all 10 comments on this page, using the innerText property, with a loop or a Puppeteer function? https://www.tripadvisor.com/Restaurant_Review-g294308-d3937445-Reviews-Maki-Quito_Pichincha_Province.html
The only solution I have come up with is to get the outerHTML of the whole comments container and then use substring operations to extract each comment, but that is not optimal and seems like the harder approach. Maybe there is an easier solution in Puppeteer that I can't find?
I am doing this for educational purposes. The comments are in elements with class="partial_entry", and I want to get the innerText of these dynamic HTML tags (all 10 of them), like the ones you see here:
If I were to open the div <div class="review-container" data-reviewid="606551292" data-collapsed="true" data-deferred="false"><!--trkN:3-->, I would find another one with id="review_582693262". To get to the point: once I reach a <div> with class="partial_entry", that is where the comment is located. I have tried a few things, but I get null because the element is not found, since the parent <div> of each comment has a unique id like id="review_xxxxxxxxx".
This is difficult because the review id is autogenerated, like id="review_xxxxxxxxx", and I can't iterate with a loop by copying the CSS path, since there is no static parent.

Why not just select those elements which have partial_entry class? This works:
let comments = await page.evaluate(() =>
[...document.querySelectorAll(".partial_entry")].map(item => item.textContent)
);

Retrieve an image from a website using Ruby and Nokogiri

I am trying to get an image from this website using Ruby.
https://steamcommunity.com/market/listings/730/M4A1-S%20%7C%20Cyrex%20(Minimal%20Wear)
So far, I have successful code to get the name of the item listed on the website:
html = Nokogiri::HTML.parse(open('https://steamcommunity.com/market/listings/730/'+url2))
title = html.css('title').text
titles = title.sub(/^Steam Community Market :: Listings for / , '')
Which results in "M4A1-S | Cyrex (Minimal Wear)"
(The "url2" comes from an input box on the html page that I made)
The image on the Steam Website has a class of "market_listing_largeimage".
Is there a way to also use Nokogiri to get the image src, so that I can then insert it into my HTML?

The image does not have that class; the div that the image is wrapped in does. That said, you can select the img inside that div and read its src attribute:
html.at_css('.market_listing_largeimage img')['src']
