HTTPS iframe in Puppeteer

I'm trying to render an iframe in Puppeteer. That part is set up and working. The problem I'm having is that to render an iframe from this URL, I need to be on an HTTPS URL; the error I get is:
Refused to frame because an ancestor violates the following Content Security Policy directive: "frame-ancestors 'self' https:".
Is there a way to get this kind of thing working in puppeteer?
Here's the code I have so far:
const puppeteer = require("puppeteer");
const embed = `
<iframe src="<some https url>" style="width: 330px; height: 186px; border: 0px;"></iframe>
`;
const timedPromise = time => new Promise(res => {
    setTimeout(() => { res() }, time);
});
(async function () {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    // setContent() leaves the page on about:blank, which is neither 'self' nor https:
    await page.setContent(embed);
    await timedPromise(3000);
    await page.screenshot({ path: `screenshot${Number(Date.now())}.png` });
    await browser.close();
})();
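A possible workaround, sketched under assumptions: the frame-ancestors 'self' https: directive only lets the iframe load when the embedding page is on an HTTPS origin (or the iframe's own origin), while setContent() keeps the page on about:blank. With Puppeteer's request interception you can answer a navigation to an HTTPS URL with your own HTML, so the ancestor gets an https: origin. loadOnHttpsOrigin is a hypothetical helper name and https://example.com/ is only a placeholder origin; this has not been tested against the specific iframe URL in the question.

```javascript
// Sketch: serve `html` at an HTTPS URL via request interception so that the
// top-level document (the iframe's ancestor) has an https: origin instead of
// about:blank. The default origin is a placeholder, not a required value.
async function loadOnHttpsOrigin(page, html, origin = "https://example.com/") {
  const target = new URL(origin).href; // normalise, e.g. add the trailing "/"
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    if (request.isNavigationRequest() && request.url() === target) {
      // Answer the top-level navigation locally instead of hitting the network.
      request.respond({ status: 200, contentType: "text/html", body: html });
    } else {
      // Everything else, including the iframe's own URL, goes through unchanged.
      request.continue();
    }
  });
  await page.goto(target, { waitUntil: "networkidle2" });
}
```

In the script above, await loadOnHttpsOrigin(page, embed) would take the place of await page.setContent(embed). Interception stays enabled for the life of the page, so any other request handlers must also call continue(), abort() or respond().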

Related

How to render a webpage using puppeteer

How can I get the fully rendered HTML+CSS of a client-side rendered webpage? The page content returned by Puppeteer renders very poorly, with missing CSS.
Simplified code:
const express = require('express')
const puppeteer = require('puppeteer');
const app = express()
const port = 3000
async function getHtml(url) {
    const browser = await puppeteer.launch({
        headless: true,
        args: ['--no-sandbox']
    });
    const page = await browser.newPage();
    await page.goto(url,
        { waitUntil: ['networkidle0', 'networkidle2', 'load', 'domcontentloaded'] });
    const k = await page.content()
    await browser.close();
    return k
};
app.get('/', (request, response) => {
    getHtml(request.query.url)
        .then(function (res) {
            response.send(res);
        })
        .catch(function (err) {
            console.error(err)
            response.send(err);
        })
});
app.listen(port)
Running this with any website, for example https://www.tesla.com/, gives a badly rendered page, whereas the page.screenshot() method gives the desired result.
Any ideas on why this occurs? And more importantly, is there a way to get around this behaviour?
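One likely cause, offered as a hypothesis rather than a confirmed diagnosis: page.content() returns the HTML with its original relative URLs (for example a stylesheet linked as /styles.css). When the Express app sends that HTML, the browser resolves those URLs against http://localhost:3000 instead of the original site, so the CSS is never found. The next question on this page shows that a <base> element changes how relative URLs resolve, so one sketch is to add a <base> pointing at the page's real URL before sending the HTML. withBaseHref is a hypothetical helper, and the regex-based insertion is a simplification that assumes an ordinary <head> tag.

```javascript
// Sketch: make relative URLs in scraped HTML resolve against the page they
// came from by inserting <base href="..."> at the start of <head>.
// Assumes a regular <head> tag; pages that already declare <base> are left alone.
function withBaseHref(html, pageUrl) {
  if (/<base[\s>]/i.test(html)) return html;
  const base = `<base href="${pageUrl.replace(/"/g, "&quot;")}">`;
  const head = /<head(\s[^>]*)?>/i; // matches <head> and <head ...>, not <header>
  // Insert right after <head ...>, or prepend if the document has no <head> tag.
  return head.test(html) ? html.replace(head, (tag) => tag + base) : base + html;
}
```

Inside getHtml, const k = withBaseHref(await page.content(), page.url()) would replace const k = await page.content(); page.url() reports the final URL after any redirects. Scripts in the returned HTML still run in the client's browser, so a client-side rendered site may still re-render or break in other ways.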

What is the current folder for img references of static page

When a page is rendered using the page.setContent method of some static Html content, what is the current folder for attributes such as the src of img tags?
For example, for:
await page.setContent('<img src="./pic.jpg" />');
which folder does ./ refer to?
It appears to be undefined. Here are my test results:
const puppeteer = require('puppeteer');
(async () => {
    const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
    const page = await browser.newPage();
    page.on('request', request => console.log('send request: ' + request.url()));
    page.on('console', message => console.log('console: ' + message.text()));
    await page.setContent('<img src="./test.jpg" /><script>console.log("href="+window.location.href);</script>');
    await browser.close();
})();
output:
console: href=about:blank
The page URL is about:blank and no requests are sent.
const puppeteer = require('puppeteer');
(async () => {
    const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
    const page = await browser.newPage();
    page.on('request', request => console.log('send request: ' + request.url()));
    page.on('console', message => console.log('console: ' + message.text()));
    await page.setContent('<base href="https://www.google.com"><img src="./test.jpg" /><script>console.log("href="+window.location.href);</script>');
    await browser.close();
})();
output:
console: href=about:blank
send request: https://www.google.com/test.jpg
console: Failed to load resource: the server responded with a status of 404 ()
The browser requests test.jpg after a base element is added, even though the URL is still about:blank.
const puppeteer = require('puppeteer');
(async () => {
    const browser = await puppeteer.launch({args: ['--no-sandbox', '--disable-setuid-sandbox']});
    const page = await browser.newPage();
    page.on('request', request => console.log('send request: ' + request.url()));
    page.on('console', message => console.log('console: ' + message.text()));
    // set base href to local URL
    await page.setContent('<base href="file:///abc/index.html"><img src="./test.jpg" /><script>console.log("href="+window.location.href);</script>');
    await browser.close();
})();
output:
console: href=about:blank
console: Not allowed to load local resource: file:///abc/test.jpg
send request: file:///abc/test.jpg
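Because a page filled with setContent() stays on about:blank, Chrome refuses file:// images, as the last output shows. If the goal is simply to display a local image, one workaround (a different technique from <base>, and only a sketch) is to inline the file as a data: URL. imgTagForFile, toDataUrl and the small MIME table are hypothetical names; the relative path is resolved by Node against process.cwd(), not by the browser.

```javascript
const fs = require("fs");
const path = require("path");

// Small, incomplete extension-to-MIME map (an assumption; extend as needed).
const MIME_TYPES = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".svg": "image/svg+xml",
};

// Encode raw bytes as a data: URL that needs no network or file access.
function toDataUrl(buffer, mimeType) {
  return `data:${mimeType};base64,${buffer.toString("base64")}`;
}

// Read a local image (path relative to the Node process's cwd) and wrap it in <img>.
function imgTagForFile(filePath) {
  const mime = MIME_TYPES[path.extname(filePath).toLowerCase()] || "application/octet-stream";
  return `<img src="${toDataUrl(fs.readFileSync(filePath), mime)}" />`;
}
```

Usage would be along the lines of await page.setContent(imgTagForFile("./pic.jpg")). Large images make the HTML string correspondingly large, since base64 adds roughly a third to the size.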
Relative paths are resolved against the URL of the page you are visiting.
For example, if the URL is
mydomain.com/directory1/page.html
the image is loaded from mydomain.com/directory1/pic.jpg
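This resolution rule is the standard URL resolution that both the browser and Node's built-in URL class implement, so it can be checked outside Puppeteer. A minimal sketch with a hypothetical resolveAsset helper, using the base URLs from this thread: with about:blank as the only base, a relative ./test.jpg cannot be resolved at all, which matches the absence of any request in the first test.

```javascript
// Resolve a relative asset path the way the browser does for <img src>,
// using the WHATWG URL parser built into Node.js.
// Returns null when the base cannot anchor a relative URL (e.g. about:blank).
function resolveAsset(relative, baseUrl) {
  try {
    return new URL(relative, baseUrl).href;
  } catch {
    return null;
  }
}
```

For example, resolveAsset("./pic.jpg", "https://mydomain.com/directory1/page.html") returns "https://mydomain.com/directory1/pic.jpg", and the https://www.google.com and file:///abc/index.html bases give the same URLs that the request logs above show.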

puppeteer: Different response from the puppeteer browser and the user browser

When I open the page in my own browser, I get the result page I want and everything is correct. The page link is below.
https://parcelsapp.com/en/tracking/016-35294405
[Screenshot: the page working in my own browser]
I want to use Puppeteer to load the result page, but the page renders differently.
I used the headless: false option to debug and found that the browser Puppeteer opens cannot load the URL correctly. I guess it is because of the different environment. How can I solve the problem? Thank you.
[Screenshot: the page not working in the Puppeteer browser]
My code is below:
const puppeteer = require('puppeteer');
(async () => {
    const browser = await puppeteer.launch({
        headless: false,
        slowMo: 250, // slow down by 250ms
        executablePath: '/usr/bin/google-chrome-stable',
    });
    const page = await browser.newPage();
    await page.on("request", (request) => {
        request.abort();
    });
    await page.goto('https://parcelsapp.com/en/tracking/016-35294405');
    await page.waitForNavigation()
    await page.screenshot({ path: 'result.png' });
    await browser.close();
})();
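Two things in this code are likely to stop the page from loading, though other differences (such as bot detection) may also play a part: the request handler aborts every request, including the navigation to the tracking page itself, and request.abort() is only allowed after page.setRequestInterception(true) has been called. In addition, page.waitForNavigation() after page.goto() waits for a second navigation that may never happen. A sketch that blocks only some resource types, with hypothetical helper names and an assumed list of types to block:

```javascript
// Resource types to skip. This list is an assumption: blocking "script",
// "xhr" or "fetch" would break a client-rendered page like this one.
const BLOCKED_TYPES = new Set(["image", "media", "font"]);

// Decide whether a single request should be aborted.
function shouldBlock(request, blocked = BLOCKED_TYPES) {
  // Never abort navigations: aborting the document makes page.goto() fail.
  if (request.isNavigationRequest()) return false;
  return blocked.has(request.resourceType());
}

// abort()/continue() are only permitted once interception is enabled.
async function blockResources(page, blocked = BLOCKED_TYPES) {
  await page.setRequestInterception(true);
  page.on("request", (request) => {
    if (shouldBlock(request, blocked)) request.abort();
    else request.continue();
  });
}
```

After const page = await browser.newPage(), you would call await blockResources(page), then await page.goto(url, { waitUntil: "networkidle2" }) and take the screenshot without the extra waitForNavigation(). If the screenshot needs images, drop the blocking entirely.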

Avoid puppeteer detection

Is there any way to keep a website from detecting that I am using Puppeteer? I just can't navigate around https://www.footlocker.ca/ with Puppeteer. I have tried the stealth plugin and random user agents, to no avail.
Any advice on what else I can try?
This website uses navigator.webdriver to check whether you are a real user or a bot, so you can use the code below to delete the navigator.webdriver value.
const puppeteer = require("puppeteer");
(async () => {
    const browser = await puppeteer.launch({
        headless: false,
    });
    const page = await browser.newPage();
    // Runs before any page script; webdriver is defined on Navigator.prototype
    await page.evaluateOnNewDocument(() => {
        delete navigator.__proto__.webdriver;
    });
    await page.goto("https://www.footlocker.ca", {
        waitUntil: "domcontentloaded",
    });
})();
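Another commonly used option, sketched here with hypothetical helpers, is Chrome's --disable-blink-features=AutomationControlled switch, which stops Chrome from reporting navigator.webdriver as true in the first place rather than deleting the property afterwards. Sites like the one in the question may check many other signals, so neither this nor the deletion above is guaranteed to be enough, and Chrome's behaviour here can change between versions.

```javascript
// Chrome switch that disables the AutomationControlled Blink feature,
// which is what makes navigator.webdriver report true under automation.
const ANTI_WEBDRIVER_ARGS = ["--disable-blink-features=AutomationControlled"];

// Merge the switch into puppeteer.launch() options without duplicating args.
function withAntiWebdriverArgs(options = {}) {
  const args = [...(options.args || [])];
  for (const arg of ANTI_WEBDRIVER_ARGS) {
    if (!args.includes(arg)) args.push(arg);
  }
  return { ...options, args };
}

// Read what the page sees; typically false once the switch is applied.
async function webdriverFlag(page) {
  return page.evaluate(() => navigator.webdriver);
}
```

Usage would be const browser = await puppeteer.launch(withAntiWebdriverArgs({ headless: false })), followed by await webdriverFlag(page) after navigation to confirm what the site sees.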

Chrome puppeteer Close page on error event

I want to close pages whenever Puppeteer hits an error. Sometimes the page I'm trying to load crashes, and .close() is never called:
(async () => {
    const page = await browser.newPage();
    await page.setViewport({width: resWidth, height: resHeight});
    await page.goto(d["entities"]["urls"][0]["expanded_url"], {timeout :90000});
    await page.screenshot({path: './resimdata/'+d['id']+'.png' ,fullPage: true});
    await page.close();
})();
There is an issue/PR on the Puppeteer repo about this that may help in similar situations.
Related Issue link: https://github.com/GoogleChrome/puppeteer/issues/952
Meanwhile, you can try this little hack; if the PR lands in version 0.12+, the following code won't be needed.
const puppeteer = require('puppeteer');
(async() => {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    async function handleClose(msg){
        console.log(msg);
        try {
            // browser.close() also closes the page; await it so Chrome is not left running
            await browser.close();
        } finally {
            process.exit(1);
        }
    }
    process.on("uncaughtException", () => {
        handleClose(`I crashed`);
    });
    process.on("unhandledRejection", () => {
        handleClose(`I was rejected`);
    });
    await page.goto("chrome://crash");
})();
This outputs something like the following:
▶ node app/app.js
I was rejected
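A more targeted alternative to process-level handlers, shown as a sketch with a hypothetical withPage helper: close the page in a finally block, so it is closed whether the work succeeds or throws (navigation timeout, failed screenshot, crashed target). Puppeteer's page also emits an "error" event when the page crashes; closing the page there is intended to make a pending call reject so the finally block runs, but that has not been verified for every kind of crash.

```javascript
// Open a page, run `work(page)`, and always close the page afterwards.
async function withPage(browser, work) {
  const page = await browser.newPage();
  // On a crash, close the page so pending calls (e.g. goto) stop waiting.
  page.once("error", () => {
    page.close().catch(() => {});
  });
  try {
    return await work(page);
  } finally {
    // close() may reject if the page is already closed or crashed; ignore that.
    await page.close().catch(() => {});
  }
}
```

The question's steps would then go inside the callback: await withPage(browser, async (page) => { ... }) with the setViewport, goto and screenshot calls in place of the dots, and no explicit page.close().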