Can puppeteer run on Render.com? - puppeteer

I'm trying to deploy Puppeteer on Render.com.
Other requests work, but Puppeteer does not seem to.
Did I do something wrong?
const puppeteer = require("puppeteer");

async function startBrowser() {
  let browser;
  try {
    console.log("Opening the browser......");
    browser = await puppeteer.launch({
      headless: false,
      ignoreDefaultArgs: ["--disable-extensions"],
      args: [
        "--no-sandbox",
        "--use-gl=egl",
        "--disable-setuid-sandbox",
      ],
      ignoreHTTPSErrors: true,
    });
  } catch (err) {
    console.log("Could not create a browser instance => : ", err);
  }
  return browser;
}

I realized that I had forgotten to switch back to headless: true. In headless mode I also have to set a user agent for the page. Example:
await page.setUserAgent("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36");
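Putting the two changes together, a minimal sketch might look like the following. It assumes the `puppeteer` module is installed and passed in by the caller; `buildLaunchOptions` and `openPage` are hypothetical helper names, and the user-agent string is simply the one quoted above.

```javascript
// Hypothetical helper: launch options for a host that typically has no display.
function buildLaunchOptions() {
  return {
    headless: true, // a headful launch needs a display, which a hosted container usually lacks
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  };
}

// User-agent string quoted in the answer above.
const USER_AGENT =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36";

// Hypothetical helper: `puppeteer` is the module returned by require("puppeteer").
async function openPage(puppeteer, url) {
  const browser = await puppeteer.launch(buildLaunchOptions());
  try {
    const page = await browser.newPage();
    await page.setUserAgent(USER_AGENT); // set before navigating
    await page.goto(url);
    return await page.title();
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

Calling `openPage(require("puppeteer"), "https://example.com")` would return the page title if the launch succeeds on the host.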

Related

Need to get full PDF in one page only Puppeteer?

I have an application that generates a PDF report, but I am having trouble creating the PDF.
I want the PDF to look good, but I don't know the height of the content in advance, so the generated PDF breaks pages in the wrong places.
Is there any way to render the whole PDF as a single page so that it won't break anywhere?
My code is below.
const reportDetailHTML = await this.getHTML("report-detail.html", {
  ...
});
const sectionHTML = await this.getHTML(
  "section-divider.html",
  {
    ...
  }
);
const browser = await puppeteer.launch({
  args: ["--font-render-hinting=none", "--force-color-profile=srgb"],
  headless: true,
});
const page: any = await browser.newPage();
await page.setUserAgent(
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
);
await page.setContent(reportDetailHTML + sectionHTML, {
  waitUntil: "networkidle0",
});
let height = await page.evaluate(
  () => document.documentElement.offsetHeight
);
console.log(height);
await page.waitForNetworkIdle();
await page.evaluateHandle("document.fonts.ready");
await page.emulateMediaType("screen");
const pdf = await page.pdf({
  printBackground: true,
  margin: "none",
  format: "A4",
  height,
});
writeFile("./report.pdf", pdf, {}, (err) => {
  if (err) {
    return console.error("error");
  }
  console.log("success!");
});
await browser.close();
So, is there any way to generate the whole PDF as a single page so that it won't break anywhere?
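One approach that may help is to drop `format` and pass an explicit `width` and `height` instead, because Puppeteer's `format` option takes priority over `width` and `height`, and `margin` expects an object rather than a string. The sketch below assumes `page` is a Puppeteer page whose content is already loaded. `singlePageOptions` and `renderSinglePage` are hypothetical names, and the default width of 794px is an assumption (roughly A4 width at 96 CSS pixels per inch).

```javascript
// Hypothetical helper: build page.pdf() options for one page as tall as the content.
function singlePageOptions(contentHeightPx, widthPx = 794) {
  return {
    printBackground: true,
    width: `${widthPx}px`,
    // Round up and add 1px to reduce the chance that a fractional
    // height spills onto a second page.
    height: `${Math.ceil(contentHeightPx) + 1}px`,
    margin: { top: 0, right: 0, bottom: 0, left: 0 },
    // No `format` here: it would override width and height.
  };
}

// Hypothetical helper: measure the rendered content, then print it as one page.
async function renderSinglePage(page, widthPx = 794) {
  // Lay the content out at the same width that will be used for the PDF.
  await page.setViewport({ width: widthPx, height: 800 });
  await page.emulateMediaType("screen"); // measure with the CSS that will be printed
  const height = await page.evaluate(
    () => document.documentElement.scrollHeight
  );
  return page.pdf(singlePageOptions(height, widthPx));
}
```

Content with explicit CSS page-break rules may still be split, so this is a starting point rather than a guaranteed fix.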

Invalid target language setting for automated Google Docs translation using Puppeteer

Here is my code:
const browser = await puppeteer.launch({
  headless: false,
  timeout: 0,
  defaultViewport: null,
  args: [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--start-maximized",
    "--ignore-certificate-errors",
  ],
  ignoreDefaultArgs: ["--enable-automation"],
});
const page = await browser.newPage();
await page.setUserAgent(
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
);
// set download path
const client = await page.target().createCDPSession();
await client.send("Page.setDownloadBehavior", {
  behavior: "allow",
  downloadPath: "D:\\Download",
});
// open uri
await page.goto(
  "https://translate.google.com.hk/?hl=zh-CN&sourceid=cnhp&sl=en&tl=zh-CN&op=docs",
  {
    waitUntil: "networkidle2",
  }
);
// upload pdf document
const [fileChooser] = await Promise.all([
  page.waitForFileChooser(),
  page.click("label"),
]);
await fileChooser.accept(["D:\\test.pdf"]);
// click translate button
const button = await page.waitForSelector(
  "div[jsname='itaskb'] > div > button"
);
await button.evaluate((b) => b.click());
// click download button
const button2 = await page.waitForSelector(
  "div[jsname='itaskb'] > button",
  {
    visible: true,
    timeout: 0,
  }
);
await button2.evaluate((b) => b.click());
The whole process mirrors what I do manually, but the downloaded document is not in zh-CN; it is identical to the uploaded document, which is in English.
What is going wrong, and how can I get the translation I want?

Puppeteer headless mode unable to play video in full screen

I am trying to play a video in full screen with the code below, but it only works when headless is set to false, so it does not work in headless mode.
The code tries to play a YouTube video in full-screen mode.
It does click the full-screen button, but the video still does not play in full screen.
const browser = await puppeteer.launch({
  executablePath: '/usr/bin/chromium',
  headless: true,
  args: ['--start-maximized', '--proxy-server=127.0.0.1:1080'],
  userDataDir: './userData',
  ignoreDefaultArgs: ["--enable-automation"]
})
const page = await browser.newPage()
let currentScreen = await page.evaluate(() => {
  return {
    width: window.screen.availWidth,
    height: window.screen.availHeight,
    deviceScaleFactor: 1
  };
});
await page.setViewport(currentScreen);
await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36')
await page.goto('https://www.youtube.com/watch?v=HS2nNhqnKcQ');
await page.hover('.ytp-fullscreen-button')
await page.waitForTimeout(500)
await page.click('.ytp-fullscreen-button')
await page.screenshot({path: 'youtube.png'})
By the way, full-screen playback does not work on pornhub.com either.
Here is the answer: simply change headless: true to headless: 'chrome'.
https://github.com/puppeteer/puppeteer/issues/8819#issuecomment-1223667718
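Applied to the launch call from the question, the change might look like the sketch below. `launchOptions` is a hypothetical helper, and the accepted non-boolean `headless` values have changed between Puppeteer releases, so the value to pass should be checked against the documentation for the installed version.

```javascript
// Hypothetical helper: same options as the question, with the headless value swapped.
function launchOptions(headlessMode = "chrome") {
  return {
    executablePath: "/usr/bin/chromium",
    headless: headlessMode, // "chrome" is the value suggested in the linked issue comment
    args: ["--start-maximized", "--proxy-server=127.0.0.1:1080"],
    userDataDir: "./userData",
    ignoreDefaultArgs: ["--enable-automation"],
  };
}

// Usage, assuming `puppeteer` has been required elsewhere:
// const browser = await puppeteer.launch(launchOptions());
```

Keeping the mode as a parameter makes it easy to try a different value if the installed release rejects "chrome".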

WhatsApp Web Not working in electron app (Update to Google Chrome)

I can't run WhatsApp Web in my Electron browser, even after setting the user agent to the latest Chrome version.
If anyone has a solution, please share it.
It is necessary to remove:
Response header => "X-Frame-Options"
Request header => "Sec-Fetch-Dest"
I think WhatsApp doesn't allow iframes, so you have to remove those headers for it to work properly.
In the main process:
const mainWindow = new BrowserWindow({
  webPreferences: {
    nodeIntegration: true,
    contextIsolation: false,
  },
});
// Strip X-Frame-Options from responses so the page can load inside an iframe.
mainWindow.webContents.session.webRequest.onHeadersReceived(
  { urls: ['https://web.whatsapp.com/'] },
  (details: any, callback) => {
    if (details && details.responseHeaders['X-Frame-Options']) {
      delete details.responseHeaders['X-Frame-Options'];
    } else if (details.responseHeaders['x-frame-options']) {
      delete details.responseHeaders['x-frame-options'];
    }
    callback({ cancel: false, responseHeaders: details.responseHeaders });
  });
// Override the user agent and drop Sec-Fetch-Dest on outgoing requests.
// `userAgent` is assumed to be a string defined elsewhere.
mainWindow.webContents.session.webRequest.onBeforeSendHeaders(
  { urls: ['https://web.whatsapp.com/'] },
  (details, callback) => {
    details.requestHeaders['User-Agent'] = userAgent;
    details.requestHeaders['Access-Control-Allow-Origin'] = '*';
    if (details.requestHeaders['Sec-Fetch-Dest']) {
      delete details.requestHeaders['Sec-Fetch-Dest'];
    }
    callback({ cancel: false, requestHeaders: details.requestHeaders });
  });
In the renderer process:
<iframe src="https://web.whatsapp.com/" />
You can use a different user agent, like this:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) old-airport-include/1.0.0 Chrome Electron/7.1.7 Safari/537.36
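HTTP header names are case-insensitive, so checking only two spellings can miss other variants. A small sketch of a case-insensitive removal helper follows; `removeHeaders` is a hypothetical name, and it only assumes that the headers arrive as a plain object keyed by name, as in the callbacks above.

```javascript
// Hypothetical helper: return a copy of `headers` without the named entries,
// comparing header names case-insensitively.
function removeHeaders(headers, namesToRemove) {
  const blocked = new Set(namesToRemove.map((name) => name.toLowerCase()));
  return Object.fromEntries(
    Object.entries(headers).filter(
      ([name]) => !blocked.has(name.toLowerCase())
    )
  );
}

// Usage inside onHeadersReceived:
// callback({
//   cancel: false,
//   responseHeaders: removeHeaders(details.responseHeaders, ["X-Frame-Options"]),
// });
```

Returning a copy instead of deleting keys in place leaves the original `details` object untouched.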

Node.js scraping with the request module

I want to get the HTML of a web page, but the response looks like this:
<meta http-equiv=refresh content="0;url=http://www.skku.edu/errSkkuPage.jsp">
When I use https://www.naver.com/ instead of https://www.skku.edu/skku/index.do, it works fine.
I want to know the reason.
Here's my code.
var request = require('request');
const url = "https://www.skku.edu/skku/index.do";
request(url, function(error, response, body){
  if (error) throw error;
  console.log(body);
});
The website checks the User-Agent request header and blocks requests that come from scripts.
Pass the User-Agent that a web browser (e.g. Google Chrome) sends and it should work.
var request = require('request');
var options = {
  'method': 'GET',
  'url': 'https://www.skku.edu/skku/index.do',
  'headers': {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'
  }
};
request(options, function (error, response) {
  if (error) throw new Error(error);
  console.log(response.body);
});
I wouldn't recommend the request module, as it is no longer maintained. See https://github.com/request/request/issues/3142
Consider alternatives such as got or axios, which make the code more readable and, most importantly, natively support promises and async/await. With got, the code above looks like this:
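// Note: require() works with got v11 and earlier; got v12+ is ESM-only and needs import.
const got = require('got');
const url = "https://www.skku.edu/skku/index.do";
(async () => {
  // Send the same browser User-Agent as in the request example above.
  const response = await got(url, {
    headers: {
      'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'
    }
  });
  console.log(response.body);
})();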