I'm new to puppeteer and node, trying to use a proxy with puppeteer in order to collect requests & responses, hopefully also websocket communication, but so far couldn't get anything to work..
I'm trying the following code:
const puppeteer = require('puppeteer');
const httpProxy = require('http-proxy');
const url = require('url');
let runProxy = async ()=> {
// raise a proxy and start collecting req.url/response.statusCode
};
let run = async () => {
await runProxy();
const browser = await puppeteer.launch({
headless: false,
args: ['--start-fullscreen',
'--proxy-server=localhost:8096']
});
page = await browser.newPage();
await page.setViewport({ width: 1920, height: 1080 });
await page.goto('http://www.google.com',
{waitUntil: 'networkidle2', timeout: 120000});
};
run();
I've tried some variation from https://github.com/nodejitsu/node-http-proxy but nothing seems to work for me, some guidance is at need, thanks
try this, use https-proxy-agent or http-proxy-agent to proxy request for per page:
import {Job, Launcher, OnStart, PuppeteerUtil, PuppeteerWorkerFactory} from "../..";
import {Page} from "puppeteer";
class TestTask {
#OnStart({
urls: [
"https://www.google.com",
"https://www.baidu.com",
"https://www.bilibili.com",
],
workerFactory: PuppeteerWorkerFactory
})
async onStart(page: Page, job: Job) {
await PuppeteerUtil.defaultViewPort(page);
await PuppeteerUtil.useProxy(page, "http://127.0.0.1:2007");
await page.goto(job.url);
console.log(await page.evaluate(() => document.title));
}
}
#Launcher({
workplace: __dirname + "/workplace",
tasks: [
TestTask
],
workerFactorys: [
new PuppeteerWorkerFactory({
headless: false,
devtools: true
})
]
})
class App {}
Related
How can I get the fully rendered html+css of a client side rendered webpage? The page contents on puppeteer returns a very poorly rendered outcome with missing css
Simplified code:
const express = require('express')
const puppeteer = require('puppeteer');
const app = express()
const port = 3000
async function getHtml(url) {
const browser = await puppeteer.launch({
headless: true,
args: ['--no-sandbox']
});
const page = await browser.newPage();
await page.goto(url,
{ waitUntil: ['networkidle0', 'networkidle2', 'load', 'domcontentloaded'] });
const k = await page.content()
await browser.close();
return k
};
app.get('/', (request, response) => {
getHtml(request.query.url)
.then(function (res) {
response.send(res);
})
.catch(function (err) {
console.error(err)
response.send(err);
})
});
app.listen(port)
Running this with any website; for example https://www.tesla.com/ gives something like
Although using the page.screenshot() method gives the desired results.
Any ideas on why this occurs? And more importantly, is there a way to get around this behaviour?
I'm trying to submit a login form, but all I get is a timeout after 30 seconds.
My code is rather simple and I can't find anything wrong:
const puppeteer = require('puppeteer');
const creds = {
user: "1234",
password: "1234"
};
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({width: 1280, height: 800});
await page.goto('https://shop2.idena.de/NewShop/');
await page.type('input[name="FORM_LOGIN"]', creds.user);
await page.type('input[name="FORM_PASSWD"]', creds.password);
await Promise.all([
page.click('button[name="FORM_TYPE"]'),
page.waitForNavigation()
]);
await page.screenshot({path: 'example.png', fullPage: true});
await browser.close();
})();
Any ideas what's going wrong here?
Change the order of the promises a bit, it could be possible, the navigation happens super fast and the waitForNavigation is just waiting for nothing. Or maybe your website loads very slow after clicking the login button.
await Promise.all([
page.waitForNavigation({timeout: 60000}),
page.click('button[name="FORM_TYPE"]'),
]);
If I use your example with headful option, I get this dialog that prevents the page from loading:
So this addition can help (not sure if some dialog emerges with correct credentials):
const puppeteer = require('puppeteer');
const creds = {
user: "1234",
password: "1234"
};
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({width: 1280, height: 800});
await page.goto('https://shop2.idena.de/NewShop/');
await page.type('input[name="FORM_LOGIN"]', creds.user);
await page.type('input[name="FORM_PASSWD"]', creds.password);
page.on('dialog', async dialog => {
console.log(dialog.message());
await dialog.accept();
});
await Promise.all([
page.click('button[name="FORM_TYPE"]'),
page.waitForNavigation()
]);
await page.screenshot({path: 'example.png', fullPage: true});
await browser.close();
})();
EDIT: I'm updating my answer since more infomration has been provided in the original question.
The problem is that there's a dialog you need to confirm/dismiss:
Perhaps you didn't see it because the script was too fast. I recommend debugging puppeteer scripts with headless set to false and slowMo to some number greater than 0:
const browser = await puppeteer.launch({ headless: false, slowMo: 200 });
Then you need to get rid of the dialog:
page.on('dialog', async (dialog) => {
await dialog.accept();
});
The whole script that now passes:
const puppeteer = require('puppeteer');
const creds = {
user: "1234",
password: "1234"
};
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({width: 1280, height: 800});
await page.goto('https://shop2.idena.de/NewShop/');
await page.type('input[name="FORM_LOGIN"]', creds.user);
await page.type('input[name="FORM_PASSWD"]', creds.password);
page.on('dialog', async (dialog) => {
await dialog.accept();
});
await Promise.all([
page.click('button[name="FORM_TYPE"]'),
page.waitForNavigation()
]);
await page.screenshot({path: 'example.png', fullPage: true});
await browser.close();
})();
Tried below code but getting an error.
Error: net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH at
https://www.xxxxxxsolutions.com/
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({ignoreHTTPSErrors: true, acceptInsecureCerts: true, args: ['--proxy-bypass-list=*', '--disable-gpu', '--disable-dev-shm-usage', '--disable-setuid-sandbox', '--no-first-run', '--no-sandbox', '--no-zygote', '--single-process', '--ignore-certificate-errors', '--ignore-certificate-errors-spki-list', '--enable-features=NetworkService']});
const page = await browser.newPage();
try {
await page.goto('https://www.xxxxxxxsolutions.com/', {waitUntil: 'networkidle2', timeout: 59000});
const cookies = await page._client.send('Network.getAllCookies');
JSON.stringify(cookies, null, 4);
} catch (e) {
console.log(e);
}
await browser.close();
})();
#mujuonly, this is version related issue. Please try the same code above 1.16.0 or latest version 2.0. It's working fine.
I bought a proxy server version of socsk5.
In all manuals the same example
const browser = await puppeteer.launch({
headless: true,
ignoreHTTPSErrors: true,
defaultViewport: {...winSize},
args: [
'--proxy-server=socks5://proxyhost:8000',
'--host-resolver-rules="MAP * ~NOTFOUND , EXCLUDE proxyhost"',
],
})
It does not specify a login password for this proxy and it clearly does not work
If you specify this
'--proxy-server=socks5://user:password#proxyhost:8000',
it gives an error
net::ERR_NO_SUPPORTED_PROXIES
I tried with https://github.com/sjitech/proxy-login-automator build a bridge, but it did not work either.
Prompt please
You can use page.authenticate() to provide the credentials for your proxy.
For example:
'use strict';
const puppeteer = require('puppeteer');
(async () => {
const username = 'johndoe';
const password = 'qwerty1';
const browser = await puppeteer.launch({
args: [
'--proxy-server=socks5://proxyhost:8000',
],
});
const page = await browser.newPage();
await page.authenticate({
username,
password,
});
await page.goto('https://www.example.com/');
await browser.close();
})();
I trying to collect data from failing requests and js error.
I'm using the following site: https://nitzani1.wixsite.com/marketing-automation/3rd-page
The site has a request to https://api.fixer.io/1latest, which returns a status code of 404,
also the page contains thw following js error:
"Uncaught (in promise) Fetch did not succeed"
I've tried to code bellow to catch the 404 and js error but couldn't.
Not sure what I'm doing wrong, any idea as to how to solve it?
const puppeteer = require('puppeteer');
function wait (ms) {
return new Promise(resolve => setTimeout(() => resolve(), ms));
}
var run = async () => {
const browser = await puppeteer.launch({
headless: false,
args: ['--start-fullscreen']
});
page = await browser.newPage();
page.on('error', err=> {
console.log('err: '+err);
});
page.on('pageerror', pageerr=> {
console.log('pageerr: '+pageerr);
});
page.on('requestfailed', err => console.log('requestfailed: '+err));
collectResponse = [];
await page.on('requestfailed', rf => {
console.log('rf: '+rf);
});
await page.on('response', response => {
const url = response.url();
response.buffer().then(
b => {
// console.log(url+' : '+response.status())
},
e => {
console.log('response err');
}
);
});
await wait(500);
await page.setViewport({ width: 1920, height: 1080 });
await page.goto('https://nitzani1.wixsite.com/marketing-automation/3rd-page', {
});
};
run();
The complete worked answer is:
const puppeteer = require('puppeteer');
const run = async () => {
const browser = await puppeteer.launch({
headless: true
});
const page = await browser.newPage();
// Catch all failed requests like 4xx..5xx status codes
page.on('requestfailed', request => {
console.log(`url: ${request.url()}, errText: ${request.failure().errorText}, method: ${request.method()}`)
});
// Catch console log errors
page.on("pageerror", err => {
console.log(`Page error: ${err.toString()}`);
});
// Catch all console messages
page.on('console', msg => {
console.log('Logger:', msg.type());
console.log('Logger:', msg.text());
console.log('Logger:', msg.location());
});
await page.setViewport({ width: 1920, height: 1080 });
await page.goto('https://nitzani1.wixsite.com/marketing-automation/3rd-page', { waitUntil: 'domcontentloaded' });
await page.waitFor(10000); // To be sure all exceptions logged and handled
await browser.close();
};
run();
Save in .js file and easily run it.
Current puppeteer 8.0.0^ have a very small amount of information in message.text(). So we need to get a description of the error from JSHandle.
Please check this comment with fully descriptive console errors from JSHandle object
Check the link here https://stackoverflow.com/a/66801550/9026103