How to query this HTML element?

I was checking the source code of YouTube and found I couldn't query these elements from the DOM. What are they?
I was expecting the element to be queryable using JavaScript.
Update: I found the answer. It is a web component and can be queried through its shadow root. What problem do web components solve?
source - https://kevinsimper.medium.com/document-queryselector-with-web-components-76a8be5bc59
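The shadow-root approach from the update can be sketched as a small helper that pierces one level of shadow DOM. Note this is an illustrative sketch: whether a given custom element actually attaches an *open* shadow root depends on the site, and the selectors in the comment are assumptions, not guaranteed to match YouTube's markup.

```javascript
// Minimal sketch of querying inside a web component's shadow root.
// `root` is usually `document`; returns null if the host element is
// missing or does not expose an open shadow root.
function queryShadow(root, hostSelector, innerSelector) {
  const host = root.querySelector(hostSelector);
  if (!host || !host.shadowRoot) return null;
  return host.shadowRoot.querySelector(innerSelector);
}

// In a browser you might call (selectors illustrative):
// queryShadow(document, 'ytd-thumbnail', 'img');
```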

These are custom tags that you won't find in most code bases.
However, fret not: you can still query these elements with the following selector:
let thumbnailArray = document.querySelectorAll("#dismissible > ytd-thumbnail");
If you want to query an element, but can't figure out a selector that returns something, you can always right click on the element and mouse over the copy option, then "copy selector".
This returns the list of matching thumbnail elements as a NodeList.

These are custom elements that YouTube defines itself, similar in spirit to the components you can create in a framework like React. That's why they don't appear among the standard HTML tags, even though they are still part of the DOM.

Related

VBA - Click "Download" from html w/ masked button by anchor

I'm new to this website and have just begun my journey in HTML. My hope is that I can give back to the community as much as I have received lurking in the answers!
I am currently working on automating some navigation on IE using VBA. All has gone to plan with the exception of the following:
There's a "button" I am trying to click. Here is the HTML:
<a class="alignLeft nowrap" href="/assistant/newRunReport?parameterId=de9498-1643e6f7969-5tv0">Download</a>
In the past, I have simply used the href to navigate directly to the page. However this particular request returns an error page in the browser so that doesn't seem to be an option.
Any help would be appreciated!
Which link are you trying to access? Is it a folder included in your project, or one on an external server?
From what I understand, you're trying to access the "assistant" folder, is that right?
The "/" before "assistant" affects where the path resolves: a leading slash makes the path relative to the site root rather than to the current folder.
If the folder is part of your project, try it without the leading "/":
"assistant/newRunReport?parameterId=de9498-1643e6f7969-5tv0"
Have you also tried a CSS selector of a[href='/assistant/newRunReport?parameterId=de9498-1643e6f7969-5tv0']?
Which would be
.document.querySelector("a[href='/assistant/newRunReport?parameterId=de9498-1643e6f7969-5tv0']").Click
If that fails, try:
.document.querySelector(".alignLeft.nowrap").Click
or
.document.querySelector("a.alignLeft.nowrap").Click
.querySelector is a method of document that applies a CSS selector: you select an element by its tag name, classes, id, or other attributes.
a.alignLeft.nowrap, for example, means an <a> tag whose class attribute contains both alignLeft and nowrap. The "." introduces a class name. A class attribute can hold several space-separated names, but a selector can't contain that space, so you replace it with another dot to get a.alignLeft.nowrap.
.querySelector returns a single matching element, i.e. the first. .querySelectorAll returns a NodeList of all matching elements, which you can traverse by index up to its .Length (as the property is spelled in VBA; it is .length in JavaScript).
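The class-to-selector rule described above is mechanical enough to sketch as a tiny helper (a hypothetical convenience function, not part of any library):

```javascript
// Build a compound CSS selector from a tag name and a space-separated
// class attribute value, e.g. class="alignLeft nowrap" on an <a> tag
// becomes "a.alignLeft.nowrap".
function classAttrToSelector(tagName, classAttr) {
  const classes = classAttr.trim().split(/\s+/); // split on any run of whitespace
  return tagName + '.' + classes.join('.');
}

console.log(classAttrToSelector('a', 'alignLeft nowrap')); // a.alignLeft.nowrap
```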

How to test if class has been added to nav tag in Rspec

When a user scrolls down on a webpage I have, a class gets added to the nav tag and it changes position on the page. When the user scrolls back to the top, the class is removed and it moves back to its original position. I know this feature works, and I can see the class when I inspect the element in a browser. I'm trying to write an RSpec test for this feature and have been trying to use Capybara without success. The scrolling part of the test works, but searching the HTML and CSS for the added class isn't working.
The nav id is "views" and the class being added is "on_nav". This is the test so far:
scenario 'Scroll down' do
  visit '/'
  page.execute_script "window.scrollBy(0,1000)"
  expect(page.html).to include('class="on_nav"')
end
The error message says it cannot find 'class="on_nav"' on the page, even though I can see it when I inspect the element in the browser. Below are a few of the commands I've tried, taken from answers I found online; they all give me the same error, or a syntax error I can't figure out how to fix:
expect(page.html).to have_selector("#on_nav")
expect(page).to have_css("nav#views.on_nav")
expect(page.has_css?(".on_nav")).to eq(true)
I am completely new to writing web tests, but I do know the answers I have found online (for example this question about checking the CSS and this article about testing elements with Capybara) haven't worked for me. Could it be giving me problems because I'm trying to test the nav tag, whereas all the examples I've found online use either div or input? Is it even possible?
Doing expect(page.html).to include('class="on_nav"') should never be done, and you should ignore everything from any article or tutorial that suggests it. By grabbing the page source as a string, you completely disable Capybara's ability to wait and retry for the given condition to be met.
As for your other attempts:
By default the selector type is :css, so expect(page.html).to have_selector("#on_nav") would look for an element with an id of on_nav, which isn't what you want; and by calling page.html you've again disabled the waiting/retrying behavior, since the string gets parsed back into a static document.
expect(page.has_css?(".on_nav")).to eq(true) is closer to what you want, but it won't produce useful error messages, since the expectation is just for true or false.
expect(page).to have_css("nav#views.on_nav") is the correct way to verify an element's existence on the page. It looks for a visible <nav> element with an id of views and a class of on_nav, and waits/retries up to Capybara.default_max_wait_time seconds for that element to exist. If that isn't working for you, then either the element isn't visible on the page, the selector doesn't actually match the element, or the JS that adds/removes the class isn't running when you call scrollBy. If you're using Selenium, pause the driver after calling scrollBy and inspect the element in the browser to make sure the class is added/removed as expected; if it is, add the actual HTML of the nav to your question.

JSoup Select Tag Recursive Search

I recently started working with JSoup to parse HTML documents. I went through the JSoup tutorial and found that the select method might be what I'm looking for.
What I'm trying to accomplish is to find all elements in an HTML document that have a certain class. To test this, I tried it on the Amazon web page (the idea: find all deals with certain offers).
So I inspected the web page to see which classes and ids are used, and then tried to integrate this into a small code snippet. In this example I found the following element:
<span id="dealTitle" class="a-size-base a-color-link dealTitleTwoLine restVisible singleCellTitle autoHeight">PROCAVE Matratzen-Brücke aus Schaumstoff 25 x 200 cm für ...</span>
This element is embedded in other elements and exists multiple times (for each deal of course). So here is my code to read the deal elements:
Document doc = Jsoup.connect("https://www.amazon.de/gp/angebote/ref=gbph_ftr_s-8_cd61_page_1?gb_f_LD=dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL%252CUPCOMING,dealTypes:LIGHTNING_DEAL,page:1,sortOrder:BY_SCORE,dealsPerPage:8&pf_rd_p=425ddcb8-bed4-4e85-ac0f-c1a79d14cd61&pf_rd_s=slot-8&pf_rd_t=701&pf_rd_i=gb_main&pf_rd_m=A3JWKAKR8XB7XF&pf_rd_r=BTHRY008J9N3N5CCMNEN&gb_f_second=dealStates:AVAILABLE%252CWAITLIST%252CWAITLISTFULL,dealTypes:COUPON_DEAL,page:8,sortOrder:BY_SCORE,dealsPerPage:8").timeout(0).get();
Elements deals = doc.select("span.a-size-base.a-color-link.dealTitleTwoLine.restVisible.singleCellTitle.autoHeight");
for (Element deal : deals) {
    if (deal.text().contains("ItemMatch")) {
        System.out.println("Found deal: " + deal.text());
    }
}
Unfortunately I can't get the element I'm looking for; deals always has a size of 0. I tried modifying my select with only part of the classes, I added the id attribute, and so on. Nevertheless, I don't get the elements (in this case they are nested inside some others). If I try an element that is higher up in the DOM hierarchy (e.g. the div with class "a-section a-spacing-none slotContainer"), it is found.
Do I actually need to specify the whole DOM hierarchy (by using ">" in my select expressions)? I expected to be able to define a selector and have JSoup traverse and search the whole DOM tree.
No, you do not have to specify the full DOM hierarchy. Your test should work if the elements are really part of the DOM. I suspect they are not part of the DOM as it is loaded by JSoup. The reason may be that the inner DOM nodes are filled in by JavaScript through AJAX. JSoup does not run JavaScript, so dynamically loaded parts of the DOM are not accessible to it. To achieve what you want, you can either look at the AJAX calls directly and analyze them, or move to another solution like Selenium WebDriver, which runs a real browser with a working JavaScript engine.

Using node and node-phantom to scrape AngularJS Application

I have a node script set up to scrape pages from an AngularJS application and then generate code needed for testing purposes. It works great except for one thing: ng-if. Since ng-if removes elements from the DOM, the script never sees those blocks of code, and I can't remove the ng-ifs. So I'm wondering if there is some way to intercept the HTML between when node-phantom requests the page and when everything is actually loaded into Phantom's DOM. What I'm hoping to do is simply set all the ng-ifs to true so that all content is available. Does anyone have any ideas?
EDIT: I'm using phantomjs-node, not node-phantom.
My final solution was to scrape the page for all of the comment tags, filter through them to find the ones that contained ng-ifs, and parse the variable names out of those tags. Then I tapped into Angular's $scope and set all of those variables to true, forcing everything that was hidden on the page to become visible.
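The comment-parsing step of that solution can be sketched roughly as follows. The exact comment format is an assumption based on what AngularJS 1.x typically leaves behind for ng-if (e.g. `<!-- ngIf: showBanner -->`); it may differ between versions, so treat the regex as a starting point:

```javascript
// Pull the expression names out of the placeholder comments that
// AngularJS (1.x) leaves in the DOM for ng-if blocks.
function extractNgIfExpressions(html) {
  const re = /<!--\s*ngIf:\s*([^>]*?)\s*-->/g;
  const names = [];
  let m;
  while ((m = re.exec(html)) !== null) {
    names.push(m[1]); // captured expression, e.g. "showBanner"
  }
  return names;
}

const sample = '<div><!-- ngIf: showBanner --><!-- ngIf: user.isAdmin --></div>';
console.log(extractNgIfExpressions(sample)); // [ 'showBanner', 'user.isAdmin' ]
```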

Extract (random) image with no useful src= from web page

First I'd like to know how this can be achieved in general, and then maybe someone knows how to accomplish this using Capybara.
Example: <img src="http://example.com/getrandomimage">
Thing is, the src points to a script that returns a random image, not to the image itself.
The page is loaded, the script runs, and an image is displayed. I can easily get the src value, but if I follow the link to download the image, the script runs again and returns a totally different picture. I need the one that's already on the page.
I think the process would be very similar using JS or Capybara. I'd break it down into two steps:
Write a selector that will find the <img> tag. In JS that might look like:
myImg = document.getElementsByTagName("img")[0]
Call .src on the returned node:
result = myImg.src
I believe Capybara is limited to XPath and CSS selectors. Therefore, depending on the page you are trying to scrape, you'll have to identify some sort of pattern in the HTML tags or the CSS attributes to find the <img> tag.
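One caveat with reading .src: downloading that URL re-runs the random script, which is exactly the problem the question describes. A browser-side way to capture the pixels that are *already rendered* is to copy the <img> into a canvas and read it back. This is a sketch under the assumption that the image is same-origin (a cross-origin image "taints" the canvas and toDataURL throws):

```javascript
// Copy an already-rendered <img> into a canvas and return its pixels
// as a PNG data URL, without re-requesting the image's src.
// Browser-only; `doc` is passed in so the helper is easy to test.
function capturedImageDataUrl(img, doc) {
  const canvas = doc.createElement('canvas');
  canvas.width = img.naturalWidth;   // use intrinsic size, not CSS size
  canvas.height = img.naturalHeight;
  canvas.getContext('2d').drawImage(img, 0, 0);
  return canvas.toDataURL('image/png');
}

// In a browser:
// capturedImageDataUrl(document.getElementsByTagName('img')[0], document)
```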