Is it possible to find the current source from srcset using Xpath? - html

An example:
<img class="lazyautosizes lazyloaded" src="//cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_150x150.png?v=1583128930"
data-srcset="//cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_180x.png?v=1583128930 180w, //cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_240x.png?v=1583128930 240w, //cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_360x.png?v=1583128930 360w"
srcset="//cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_180x.png?v=1583128930 180w, //cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_240x.png?v=1583128930 240w, //cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_360x.png?v=1583128930 360w">
I want to find the link in the srcset for the candidate that actually got rendered in the browser. Is there a way to write an XPath that points at it, say the 240w one? The tag has a src, but that is not the one rendered in the browser.
This is how I use that XPath in Puppeteer. I do not want to write special-case logic for particular kinds of XPath:
const getXpathElement = await page.$x(xpath)
const promises = getXpathElement.map((element) => page.evaluate(el => {
return el.textContent
}, element));
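One possible direction, offered as a sketch rather than a definitive answer: XPath can only match what is in the markup (attributes and text), while the candidate the browser actually picked is exposed through the currentSrc DOM property of the img element. Reading el.currentSrc instead of el.textContent inside page.evaluate should therefore give the rendered URL. If a specific candidate such as the 240w one is wanted regardless of what the browser chose, the srcset string can be parsed by hand. The helper names parseSrcset and pickByDescriptor below are hypothetical, and the parser assumes URLs without embedded commas or whitespace.

```javascript
// Hypothetical helper: split a srcset string into { url, descriptor } candidates.
// Assumes candidates are separated by commas and URLs contain no commas or spaces.
function parseSrcset(srcset) {
  return srcset
    .split(',')
    .map((candidate) => candidate.trim())
    .filter((candidate) => candidate.length > 0)
    .map((candidate) => {
      // A candidate without a descriptor defaults to "1x".
      const [url, descriptor = '1x'] = candidate.split(/\s+/);
      return { url, descriptor };
    });
}

// Hypothetical helper: return the URL for a descriptor such as "240w", or null.
function pickByDescriptor(srcset, descriptor) {
  const match = parseSrcset(srcset).find((c) => c.descriptor === descriptor);
  return match ? match.url : null;
}

const srcset =
  '//cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_180x.png?v=1583128930 180w, ' +
  '//cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_240x.png?v=1583128930 240w, ' +
  '//cdn.shopify.com/s/files/1/0332/0178/2916/products/trm044_360x.png?v=1583128930 360w';

console.log(pickByDescriptor(srcset, '240w'));
```

In Puppeteer, the srcset string itself could be read with el.getAttribute('srcset') inside page.evaluate, next to el.currentSrc.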

Related

Playwright; asserting nested `img src` values

Note, this is related to my previous question here: https://stackoverflow.com/a/73043433/4190664
I am looking to further assert some things within the DOM when I click the 'Print' button.
From troubleshooting I am seeing the following:
the pdfjs page has a #printContainer that is an empty div
when you click the Print button, it begins creating divs with the class .printedPage to represent each page of the document
within each .printedPage div is an img element with src="blob:https://mozilla.github.io/**"
Example when the print dialog is open:
<div id="printContainer">
<div class="printedPage"><img src="blob:https://mozilla.github.io/5afcff4c-aa36-4118-b4b8-011cdce6a9bc"></div>
<div class="printedPage"><img src="blob:https://mozilla.github.io/30cd3036-2d81-4b82-af9a-0f2e9c834b69"></div>
<div class="printedPage"><img src="blob:https://mozilla.github.io/047e8762-3fae-44d1-a5a0-56ea576de93e"></div>
</div>
I already am testing the following:
let requestCount = 0;
page.on('request', request => {
if(request.url().includes('blob:https://mozilla.github.io/pdf.js/web/viewer.html')) {
expect(page.locator(`.printedPage img >> nth=${requestCount}`)).toHaveAttribute('src', /blob:https:\/\/mozilla.github.io/);
requestCount++;
}
});
await printBtn.click();
await expect.poll(() => requestCount).toBe(3);
What would be the best way to assert that the src of each .printedPage > img contains the blob information as well?
Playwright (and JavaScript in general) is not a strong language of mine, so I am definitely struggling on this one 😬
Any syntactical help is appreciated
You can do something like this. Add it before requestCount is incremented; note that the callback has to be declared async for the await to be valid there.
await expect(
page.locator(`.printedPage img >> nth=${requestCount}`)
).toHaveAttribute('src', /blob:https:\/\/mozilla.github.io/)

Puppeteer js attempting to get value of data-src in img tag

Currently I have the following HTML:
I need to get the data-src link that is there. My code in Puppeteer is:
await page.waitForSelector('#ldpPhotoGallery');
const getImgSrc = await page.$$eval('#ldpPhotoGallery', imgs => imgs.map(img => {img.getAttribute('data-src')}));
console.log(getImgSrc);
Here I wait for the element with that id, and after it has loaded the page evaluation should run. I'm not sure if I'm doing this correctly. From what I understand, I'm evaluating the id ldpPhotoGallery and it returns the matching elements. From there I call getAttribute for data-src, and it should return the value, no? The console.log output is [null]. I know the data is there. What am I doing wrong?
It seems you just have a typo in the arrow function format: .map(img => {img.getAttribute('data-src')}) fills the whole array with undefined, because an arrow function body in curly brackets without a return implicitly returns undefined. Then undefined is serialized as null and you get [null]. Just remove the curly brackets or add an explicit return.
BTW, you do not need page.$$eval() for an id selector, since it returns an array with just one element. page.$eval() may suffice:
await page.waitForSelector('#ldpPhotoGallery');
const getImgSrc = await page.$eval('#ldpPhotoGallery', img => img.getAttribute('data-src'));
console.log(getImgSrc);
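To make the difference concrete, here is a small standalone sketch that runs in plain Node without Puppeteer. The sample array is made up for illustration, and JSON.stringify is used only as a stand-in for the serialization Puppeteer performs on returned values, which is an approximation.

```javascript
const items = ['a', 'b'];

// Braces make a block body; without `return`, each call yields undefined.
const withBraces = items.map((item) => { item.toUpperCase(); });

// A concise body (no braces) returns the value of the expression.
const concise = items.map((item) => item.toUpperCase());

// A block body with an explicit return behaves like the concise form.
const explicit = items.map((item) => { return item.toUpperCase(); });

console.log(withBraces);                 // [ undefined, undefined ]
console.log(JSON.stringify(withBraces)); // [null,null]
console.log(concise);                    // [ 'A', 'B' ]
console.log(explicit);                   // [ 'A', 'B' ]
```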

getting a handle on an element in the context of another

I have a typical page containing sections of label/field pairs, but with the same label name in different sections. The sections are named, so I can identify them by name. The HTML is structured so that the section name is a sibling of another element containing the label/fields:
<div class="section">Business Address</div>
<div>
<div class="field">
<div class="label">Country</div>
<input type="text">
....
If I could identify the label element using a selector only I can do something like: -
const siblingHandle = await page.evaluateHandle(() => {
const sectionLabelHandle = Array.from(document.querySelectorAll('.blah')).find(el=>el.textContent.includes('section label name'))
return sectionLabelHandle.nextElementSibling
})
const label = await siblingHandle.$('label selector')
But what I need is a handle on the label element so that I can get its sibling field so I can type a value in it.
I can't use siblingHandle.$eval() as it doesn't return a handle.
I've also considered using page.waitForFunction, passing in the handle so that can be used instead of 'document'
const labelHandle = page.waitForFunction(
handle => Array.from(handle.querySelectorAll('sel')).find(el=>el.textContent.includes('text')),
{},
siblingHandle
)
but I get a cycling JSON error if I do that.
So, a couple of questions,
1) Is there any way to get siblings in Puppeteer without having to use nextElementSibling in an evaluate function?
2) How can I search for an element containing specified text, but in the context of a parent handle rather than document?
XPath selectors, as opposed to CSS selectors, can answer both of your questions.
Search for an element via specified text:
const xpathWithText = '//div[text()="Country"]';
Using it to get the next sibling:
const xPathTextToSibling = '//div[text()="Country"]/following-sibling::input';
In practice:
const myInput = await page.waitForXPath(xPathTextToSibling);
await myInput.type('Text to type');
You should not need to search for an element with specific text in the context of a parent handle because the second selector I used above will give you a handle of the element you want to type in directly.
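Since the question mentions the same label appearing in several sections, the XPath may need to be anchored on the section name as well. The following is a sketch only: fieldXPath is a hypothetical helper, and it assumes the class names section and label from the question's HTML, exact class attribute values, and text without double quotes.

```javascript
// Hypothetical helper: build an XPath for the input that follows a label,
// restricted to the container that immediately follows a named section heading.
// Assumes section and label text contain no double quotes.
function fieldXPath(sectionName, labelText) {
  // The section heading, matched by class and trimmed text.
  const section = `//div[@class="section"][normalize-space(text())="${sectionName}"]`;
  // The sibling element that holds the label/field pairs.
  const container = 'following-sibling::div[1]';
  // The label inside that container, matched by class and trimmed text.
  const label = `div[@class="label"][normalize-space(text())="${labelText}"]`;
  return `${section}/${container}//${label}/following-sibling::input`;
}

console.log(fieldXPath('Business Address', 'Country'));
```

The resulting string could then be passed to page.waitForXPath in the same way as xPathTextToSibling above.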

Regex capture string between delimiters and excluding them

I saw an answer on this forum that is close to my "request", but not close enough
(Regexp to capture string between delimiters).
My question is: I have an HTML page and I would like to get only the src of all "img" tags on this page and put them in one array, without using cheerio (I'm using Node.js).
The problem is that I would prefer to exclude the delimiters.
How could I resolve this problem?
Yes, this is possible with regex, but it would be much easier (and probably faster, but don't quote me on that) to use a native DOM method. Let's start with the regex approach. We can use a capture group to easily parse the src of an img tag:
var html = `test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >`;
var srcs = [];
html.replace(/<img[^<>]*src=['"](.*?)['"][^<>]*>/gm, (m, $1) => { srcs.push($1) })
console.log(srcs);
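As a variation on the same regex, String.prototype.matchAll reads the capture group directly instead of relying on replace for its side effect. This is only a sketch: extractImgSrcs is a hypothetical name, and the pattern has the same limitations as the one above (for example, it is not a full HTML parser).

```javascript
// Hypothetical helper: collect the src value of every <img> tag in an HTML string.
function extractImgSrcs(html) {
  // Same pattern as the replace-based version; the g flag is required by matchAll.
  const pattern = /<img[^<>]*src=['"](.*?)['"][^<>]*>/g;
  // match[1] is the first capture group, i.e. the src without its quotes.
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}

const sample = `test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >`;

console.log(extractImgSrcs(sample)); // [ 'first', 'second', 'third' ]
```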
However, the better way would be to use getElementsByTagName:
(note the following will get some kind of parent domain url since the srcs are relative/fake but you get the idea)
var srcs = [].slice.call(document.getElementsByTagName('img')).map(img => img.src);
console.log(srcs);
test<div>hello</div>
<img src="first">
<img class="test" src="second" data-lang="en">
test
<img src="third" >

Href attribute empty when selecting anchor with xpath

I have a number of links in a page that look like so :
<a class="plant_detail_link" href="plants/O7-01111"><h3>O7-01111</h3></a>
I can select all these link in my page with the following xpath :
//a[@class='plant_detail_link']
I can extract attributes like the class of each link in the usual manner :
//a[@class='plant_detail_link']/@class
But when I attempt to use the same technique to extract the href attribute values I get an empty list :
//a[@class='plant_detail_link']/@href
Does anyone have any ideas why this may be the case?
(Screenshot: Chrome developer console XPath execution)
EDIT:
See full page html here - http://pastebin.com/MAjTt86V
I believe it's a Chrome bug. You can append [index].value to get the result. In other words, the $x query for href did work, but the console doesn't display the result for some reason.
For example, I ran these $x queries in the console on this page for the 'Questions' button and got the following output:
$x("//a[@id='nav-questions']/@href")
> []
$x("//a[@id='nav-questions']/@href")[0].value
> "/questions"
You can use something like this to get a usable array of values:
var links = $x("//a[@target='_blank']/@href");
var linkArr = [];
for (var i = 0; i < links.length; i++) { linkArr.push(links[i].value); }
or to put it in a function:
function getHref(selector, value, $x) {
var links = $x("//a[@" + selector + "='" + value + "']/@href");
var linkArr = [];
for (var i = 0; i < links.length; i++) { linkArr.push(links[i].value); }
return linkArr;
}
getHref("target", "_blank", $x);
EDIT
Not sure if this will help you, but in Chrome adding a comma like this produces output without the [index].value:
$x,("//a[@id='nav-questions']/@href")
> "//a[@id='nav-questions']/@href"
Note, though, that the value printed is just the selector string itself: with the comma, the line is evaluated by the comma operator and $x is never called, so this is unlikely to help in your case.