I'm trying to set cookies on each request via Puppeteer request interception. I've noticed that while setting headers['sample-header'] = 1 creates a header 'sample-header' equal to 1, setting headers['cookie'] = x does not set the request's cookie. For instance, the following code does not set any request cookies:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false, executablePath: dirnae });
  const page = await browser.newPage();
  const client = await page.target().createCDPSession();
  await page.setRequestInterception(true);
  page.on('request', request => {
    const headers = request.headers();
    headers['cookie'] = 1;
    request.continue({ headers });
  });
  page.on('request', request => {
    console.log(request.headers());
  });
  page.on('response', response => {
    //console.log(response.headers()['set-cookie'])
  });
  await page.goto('https://google.com');
})();
EDIT: I figured out that I can see the request's Cookie header by handling the Network.requestWillBeSentExtraInfo event.
However, I can't seem to edit requests in that event.
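The Cookie header visible in Network.requestWillBeSentExtraInfo is a plain "name=value; name2=value2" string. If the aim is to reuse or adjust those values, one option is to parse the header into objects and hand them to page.setCookie, which accepts cookies with name, value and url fields. This is a minimal sketch; parseCookieHeader is a hypothetical helper, not a Puppeteer API:

```javascript
/**
 * Hypothetical helper (not part of Puppeteer): split a raw Cookie request
 * header such as "a=1; b=2" into { name, value, url } objects.
 * Segments without a name (no "=", or "=" first) are skipped in this sketch.
 */
function parseCookieHeader(header, url) {
  const cookies = [];
  for (const part of header.split(';')) {
    const pair = part.trim();
    const eq = pair.indexOf('=');
    if (eq <= 0) continue; // skip empty segments and pairs without a name
    cookies.push({
      name: pair.slice(0, eq).trim(),
      // Slice from the first "=" only, so values like "x=y" stay intact.
      value: pair.slice(eq + 1).trim(),
      url,
    });
  }
  return cookies;
}

// Usage inside an async function that has a Puppeteer page:
// await page.setCookie(...parseCookieHeader('a=1; b=2', 'https://google.com'));
```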
You cannot change the cookie per network request. Instead, use page.setCookie and provide cookies for different URLs or domains. Below is the code for reference:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const cookies = [
    {
      name: 'sample-cookie1',
      value: '1',
      domain: 'stackoverflow.com',
    },
    {
      name: 'sample-cookie2',
      value: '2',
      domain: 'pptr.dev',
    },
  ];
  await page.setCookie(...cookies);
  await page.goto('https://pptr.dev');
  console.log(await page.cookies()); // includes the sample-cookie2 cookie
  await page.goto('https://stackoverflow.com');
  console.log(await page.cookies()); // includes the sample-cookie1 cookie
  await browser.close();
})();
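Why does each page see only its own cookie? The browser picks cookies for a request by matching the cookie's domain against the request's host. Below is a simplified, hypothetical sketch of that check. It ignores path, Secure, expiry and host-only rules, so it is an illustration, not Chromium's actual implementation:

```javascript
/**
 * Simplified RFC 6265-style domain matching: a cookie for "pptr.dev" is sent
 * to pptr.dev and its subdomains, but not to unrelated hosts.
 */
function domainMatches(host, cookieDomain) {
  const domain = cookieDomain.replace(/^\./, '').toLowerCase();
  const h = host.toLowerCase();
  return h === domain || h.endsWith('.' + domain);
}

// Build the Cookie header value a browser would (roughly) send for `url`.
function cookieHeaderFor(cookies, url) {
  const { hostname } = new URL(url);
  return cookies
    .filter((c) => domainMatches(hostname, c.domain))
    .map((c) => `${c.name}=${c.value}`)
    .join('; ');
}

const cookies = [
  { name: 'sample-cookie1', value: '1', domain: 'stackoverflow.com' },
  { name: 'sample-cookie2', value: '2', domain: 'pptr.dev' },
];
console.log(cookieHeaderFor(cookies, 'https://pptr.dev')); // sample-cookie2=2
console.log(cookieHeaderFor(cookies, 'https://stackoverflow.com')); // sample-cookie1=1
```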
I'm trying to run LHCI on a website that is only accessible after Google authentication.
I use Puppeteer to authenticate, and that part works well, but after the Puppeteer steps LHCI opens a new browser and I can see that I'm logged out (I'm redirected to the Google login page).
The steps I observe are as follows:
- Puppeteer opens a browser
- Puppeteer opens a new page
- Puppeteer logs in with a Google account
- I am logged in on my website
- LHCI opens a new browser
- I am redirected to the Google login page in the new browser
- LHCI measures the performance of the Google login page instead of my website
lighthouserc.js:
module.exports = {
  ci: {
    collect: {
      headful: true,
      disableStorageReset: true,
      puppeteerScript: './puppeteerScript.js',
      puppeteerLaunchOptions: {
        slowMo: 20,
        headless: false,
        disableStorageReset: true,
      },
      settings: {
        disableStorageReset: true,
        preset: 'desktop',
        'throttling-method': 'provided',
        onlyCategories: ['performance', 'accessibility', 'seo'],
      },
      numberOfRuns: 1,
      url: UrlsTab,
    },
    assert: {
      assertions: {
        'categories:performance': ['error', { minScore: 0.9 }],
        'categories:accessibility': ['error', { minScore: 0.9 }],
        'categories:seo': ['error', { minScore: 0.9 }],
      },
    },
    upload: {
      target: 'temporary-public-storage',
    },
  },
};
puppeteerScript.js:
const puppeteer = require('puppeteer');
async function doGoogleLogin(loginUrl, page, email, password) {
  const navigationPromise = page.waitForNavigation();
  await page.goto(loginUrl);
  await navigationPromise;
  await page.waitForSelector('input[type="email"]');
  await page.click('input[type="email"]');
  await navigationPromise;
  await page.type('input[type="email"]', email);
  await page.waitForSelector('#identifierNext');
  await page.click('#identifierNext');
  await page.waitFor(500);
  await page.waitForSelector('input[type="password"]');
  await page.waitFor(500);
  await page.type('input[type="password"]', password);
  await page.waitForSelector('#passwordNext');
  await page.click('#passwordNext');
  await navigationPromise;
  await page.waitFor(1000);
}
/**
 * @param {puppeteer.Browser} browser
 * @param {{url: string, options: LHCI.CollectCommand.Options}} context
 */
async function setup(browser, context) {
  browser = await puppeteer.launch({ headless: false, disableStorageReset: true });
  const page = await browser.newPage();
  await page.setCacheEnabled(true);
  await doGoogleLogin(context.url, page, googleEmail, googlePassword);
}
module.exports = setup;
I tried disableStorageReset: true, but it is not enough to preserve the session. Does anyone have an idea of what else I could try?
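One likely cause, going by the LHCI documentation for puppeteerScript: setup() ignores the browser that LHCI passes in and calls puppeteer.launch() itself. The login therefore happens in a separate browser whose cookies the Lighthouse run never sees. Below is a sketch that reuses the provided browser instead. The GOOGLE_EMAIL/GOOGLE_PASSWORD environment variables are assumptions standing in for the undefined googleEmail/googlePassword. page.waitFor is replaced with a plain timeout because it is deprecated in newer Puppeteer versions. The Google selectors are the asker's own.

```javascript
// puppeteerScript.js (sketch): log in using the browser LHCI provides.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function doGoogleLogin(loginUrl, page, email, password) {
  await page.goto(loginUrl);
  await page.waitForSelector('input[type="email"]');
  await page.type('input[type="email"]', email);
  await page.click('#identifierNext');
  await page.waitForSelector('input[type="password"]', { visible: true });
  await delay(500);
  await page.type('input[type="password"]', password);
  // Start waiting for the post-login navigation before clicking.
  await Promise.all([page.waitForNavigation(), page.click('#passwordNext')]);
}

/**
 * @param {import('puppeteer').Browser} browser  the browser LHCI launched
 * @param {{url: string}} context
 */
async function setup(browser, context) {
  // Reuse LHCI's browser instead of calling puppeteer.launch() again.
  const page = await browser.newPage();
  await doGoogleLogin(
    context.url,
    page,
    process.env.GOOGLE_EMAIL, // assumption: credentials come from env vars
    process.env.GOOGLE_PASSWORD
  );
  await page.close();
}

module.exports = setup;
```

Keep disableStorageReset: true under collect.settings so Lighthouse does not clear the session cookies before it audits the page.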
When a page is rendered from static HTML with the page.setContent method, what is the base folder for relative URLs in attributes such as the src of img tags?
For example, for:
await page.setContent('<img src="./pic.jpg" />');
where does ./ point to?
It may be undefined. Here are my test results:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });
  const page = await browser.newPage();
  page.on('request', request => console.log('send request: ' + request.url()));
  page.on('console', message => console.log('console: ' + message.text()));
  await page.setContent('<img src="./test.jpg" /><script>console.log("href="+window.location.href);</script>');
  await browser.close();
})();
output:
console: href=about:blank
The page URL is about:blank and no requests are sent.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });
  const page = await browser.newPage();
  page.on('request', request => console.log('send request: ' + request.url()));
  page.on('console', message => console.log('console: ' + message.text()));
  await page.setContent('<base href="https://www.google.com"><img src="./test.jpg" /><script>console.log("href="+window.location.href);</script>');
  await browser.close();
})();
output:
console: href=about:blank
send request: https://www.google.com/test.jpg
console: Failed to load resource: the server responded with a status of 404 ()
The browser requests test.jpg after a base element is added, even though the URL is still about:blank.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });
  const page = await browser.newPage();
  page.on('request', request => console.log('send request: ' + request.url()));
  page.on('console', message => console.log('console: ' + message.text()));
  // set base href to local URL
  await page.setContent('<base href="file:///abc/index.html"><img src="./test.jpg" /><script>console.log("href="+window.location.href);</script>');
  await browser.close();
})();
output:
console: href=about:blank
console: Not allowed to load local resource: file:///abc/test.jpg
send request: file:///abc/test.jpg
The folder is relative to the page you are visiting.
For example, if the URL is mydomain.com/directory1/page.html, the image is loaded from mydomain.com/directory1/pic.jpg.
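The results above follow standard relative-URL resolution. Node's built-in URL class implements the same WHATWG URL Standard that browsers use, so it can reproduce them outside Puppeteer. resolveSrc is a hypothetical helper for illustration:

```javascript
// Resolve a relative src the way the page would, given the document's base URL.
function resolveSrc(src, baseUrl) {
  try {
    return new URL(src, baseUrl).href;
  } catch {
    // about:blank cannot serve as a base for relative URLs, so parsing fails.
    return null;
  }
}

console.log(resolveSrc('./test.jpg', 'about:blank')); // null
console.log(resolveSrc('./test.jpg', 'https://www.google.com')); // https://www.google.com/test.jpg
console.log(resolveSrc('./test.jpg', 'file:///abc/index.html')); // file:///abc/test.jpg
console.log(resolveSrc('./pic.jpg', 'https://mydomain.com/directory1/page.html'));
// https://mydomain.com/directory1/pic.jpg
```

This matches the three experiments: no request with about:blank, and a request resolved against the base element's href once one is present.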
I use the GoLogin service. GoLogin is an anti-detect browser service that lets me spoof my browser identity and manage browser fingerprints,
so I can scrape websites without being detected.
In this case, I want to load my extension into that browser while connecting to it with the puppeteer.connect() method.
Here's the code:
const puppeteer = require('puppeteer-core');
const GoLogin = require('gologin');
(async () => {
  const GL = new GoLogin({
    token: 'yU0token',
    profile_id: 'yU0Pr0f1leiD',
  });
  const { status, wsUrl } = await GL.start();
  const browser = await puppeteer.connect({
    browserWSEndpoint: wsUrl.toString(),
    ignoreHTTPSErrors: true,
  });
  const page = await browser.newPage();
  await page.goto('https://myip.link/mini');
  console.log(await page.content());
  await browser.close();
  await GL.stop();
})();
I don't know how to do this. How can I load my extension when using puppeteer.connect()?
Assuming you want to load a Chrome extension into your Puppeteer browser:
1. Find the extension's working directory on disk (see "Where does Chrome store extensions?").
2. Find your extension ID by going to chrome://extensions/.
Sample code:
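const os = require('os');
const path = require('path');
const puppeteer = require('puppeteer-core');
// "~" is not expanded by Node or Chrome, so build the path from the home directory
const MY_EXTENSION_PATH = path.join(
  os.homedir(),
  'Library/Application Support/Google/Chrome/Default/Extensions/cdockenadnadldjbbgcallicgledbeoc/0.3.38_0'
);
async function loadExtension() {
  // puppeteer-core does not bundle a browser, so executablePath (or channel) must also be set
  return puppeteer.launch({
    headless: false, // extensions were not supported in Chrome's old headless mode
    args: [
      `--disable-extensions-except=${MY_EXTENSION_PATH}`,
      `--load-extension=${MY_EXTENSION_PATH}`,
    ],
  });
}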
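Chrome reads these flags at startup. They only take effect on a browser launched with them, and puppeteer.connect() attaches to a browser that is already running, so it cannot add them afterwards. With GoLogin, the flags would have to reach the browser that GoLogin itself starts; whether its API allows that is not covered here. The hypothetical helper below builds the flags from one or more unpacked extension directories, expanding a leading ~ and joining several paths with commas (both flags accept comma-separated lists):

```javascript
const os = require('os');
const path = require('path');

// Hypothetical helper: expand a leading "~" (neither Node nor Chrome does)
// and build the two extension flags for one or more extension directories.
function extensionArgs(...extensionPaths) {
  const resolved = extensionPaths.map((p) =>
    p === '~' || p.startsWith('~/') ? path.join(os.homedir(), p.slice(1)) : p
  );
  const list = resolved.join(',');
  return [`--disable-extensions-except=${list}`, `--load-extension=${list}`];
}

console.log(extensionArgs('~/ext/a', '/opt/ext/b'));
```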
Is there an existing sample of how to use Puppeteer with NordVPN?
I tried this:
const page = await browser.newPage();
await useProxy(page, 'socks5://login:password@fr806.nordvpn.com:1080');
I also tried:
'--proxy-server=socks5://login:password@fr806.nordvpn.com:1080'
This script works. Replace the username and password with your own; note that these are not your regular NordVPN login credentials but the service credentials found in your account settings. Change the server to whichever one you need to use.
#!/usr/bin/env node
// Screengrab generator
// outputs a JSON object with a base64 encoded image of the screengrab
const puppeteer = require('puppeteer');
const conf = {};
conf.url = 'https://www.telegraph.co.uk';
// VPN: NordVPN service credentials (not your account login) and server
conf.vpnUser = conf.vpnUser || 'USERNAME';
conf.vpnPass = conf.vpnPass || 'PASSWORD';
conf.vpnServer = conf.vpnServer || 'https://uk1785.nordvpn.com:89';
(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--disable-dev-shm-usage',
      '--proxy-server=' + conf.vpnServer,
    ],
  });
  try {
    const page = await browser.newPage();
    // Credentials are supplied here rather than in --proxy-server
    await page.authenticate({
      username: conf.vpnUser,
      password: conf.vpnPass,
    });
    await page.goto(conf.url, { waitUntil: 'networkidle2' });
    // Emit the screengrab as JSON, as described in the header comment
    const image = await page.screenshot({ encoding: 'base64' });
    console.log(JSON.stringify({ url: conf.url, image }));
  } catch (error) {
    console.error(error);
  } finally {
    await browser.close();
  }
})();
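The working script keeps the credentials out of --proxy-server and supplies them through page.authenticate. Both failed attempts instead embed login:password in the proxy URL. The hypothetical helper below splits such a URL into the two parts the working script needs. Note that page.authenticate answers HTTP proxy authentication challenges; as far as I know, Chromium does not support username/password authentication for SOCKS5 proxies, which may explain why the socks5:// attempts failed.

```javascript
// Hypothetical helper: split "scheme://user:pass@host:port" into a
// credential-free --proxy-server flag plus credentials for page.authenticate().
function splitProxyUrl(proxyUrl) {
  const u = new URL(proxyUrl);
  return {
    // u.protocol includes the trailing ":"; u.host keeps any non-default port.
    proxyServerArg: `--proxy-server=${u.protocol}//${u.host}`,
    credentials: {
      // URL keeps userinfo percent-encoded, so decode it for authenticate().
      username: decodeURIComponent(u.username),
      password: decodeURIComponent(u.password),
    },
  };
}

// Usage with Puppeteer (sketch, inside an async function):
// const { proxyServerArg, credentials } =
//   splitProxyUrl('https://USERNAME:PASSWORD@uk1785.nordvpn.com:89');
// const browser = await puppeteer.launch({ args: [proxyServerArg] });
// const page = await browser.newPage();
// await page.authenticate(credentials);
```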
I'm new to Puppeteer and Node. I'm trying to use a proxy with Puppeteer to collect requests and responses (and ideally WebSocket traffic too), but so far I haven't been able to get anything to work.
I'm trying the following code:
const puppeteer = require('puppeteer');
const httpProxy = require('http-proxy');
const url = require('url');
let runProxy = async () => {
  // raise a proxy and start collecting req.url/response.statusCode
};
let run = async () => {
  await runProxy();
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--start-fullscreen', '--proxy-server=localhost:8096'],
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });
  await page.goto('http://www.google.com', { waitUntil: 'networkidle2', timeout: 120000 });
};
run();
I've tried some variations based on https://github.com/nodejitsu/node-http-proxy, but nothing seems to work for me. Any guidance would be appreciated.
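The runProxy stub can be filled using only Node's built-in http and net modules. The minimal sketch below assumes plain-HTTP requests are what you want to log. It forwards absolute-URL requests (the form a browser sends to an HTTP proxy) and records method, URL and status code. HTTPS traffic, including wss:// WebSockets, arrives as CONNECT tunnels that can only be relayed, so for those only the host:port is logged and the contents stay encrypted. createLoggingProxy and log are hypothetical names, and IPv6 targets are not handled.

```javascript
const http = require('http');
const net = require('net');

// Each proxied request is recorded here as { method, url, statusCode }.
const log = [];

// Hop-by-hop headers that should not be forwarded to the origin server.
const HOP_BY_HOP = ['connection', 'keep-alive', 'proxy-connection', 'proxy-authorization'];

function createLoggingProxy() {
  const server = http.createServer((clientReq, clientRes) => {
    let target;
    try {
      // A browser using an HTTP proxy puts the absolute URL in the request line.
      target = new URL(clientReq.url);
    } catch {
      clientRes.writeHead(400);
      clientRes.end('Expected an absolute URL');
      return;
    }
    const headers = { ...clientReq.headers };
    for (const h of HOP_BY_HOP) delete headers[h];

    const upstream = http.request(
      {
        hostname: target.hostname,
        port: target.port || 80,
        path: target.pathname + target.search,
        method: clientReq.method,
        headers,
      },
      (upstreamRes) => {
        log.push({ method: clientReq.method, url: clientReq.url, statusCode: upstreamRes.statusCode });
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      }
    );
    upstream.on('error', (err) => {
      clientRes.writeHead(502);
      clientRes.end(String(err));
    });
    clientReq.pipe(upstream);
  });

  // HTTPS requests arrive as "CONNECT host:port" and are tunnelled as raw bytes.
  server.on('connect', (req, clientSocket, head) => {
    const sep = req.url.lastIndexOf(':');
    const host = req.url.slice(0, sep);
    const port = Number(req.url.slice(sep + 1)) || 443;
    const upstreamSocket = net.connect(port, host, () => {
      log.push({ method: 'CONNECT', url: req.url, statusCode: null });
      clientSocket.write('HTTP/1.1 200 Connection Established\r\n\r\n');
      upstreamSocket.write(head);
      upstreamSocket.pipe(clientSocket);
      clientSocket.pipe(upstreamSocket);
    });
    upstreamSocket.on('error', () => clientSocket.destroy());
    clientSocket.on('error', () => upstreamSocket.destroy());
  });

  return server;
}
```

Calling createLoggingProxy().listen(8096) inside runProxy would match the --proxy-server=localhost:8096 flag already passed to Chromium.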
Try this, which uses https-proxy-agent or http-proxy-agent to proxy requests on a per-page basis:
import {Job, Launcher, OnStart, PuppeteerUtil, PuppeteerWorkerFactory} from "../..";
import {Page} from "puppeteer";
class TestTask {
  @OnStart({
    urls: [
      "https://www.google.com",
      "https://www.baidu.com",
      "https://www.bilibili.com",
    ],
    workerFactory: PuppeteerWorkerFactory
  })
  async onStart(page: Page, job: Job) {
    await PuppeteerUtil.defaultViewPort(page);
    await PuppeteerUtil.useProxy(page, "http://127.0.0.1:2007");
    await page.goto(job.url);
    console.log(await page.evaluate(() => document.title));
  }
}
@Launcher({
  workplace: __dirname + "/workplace",
  tasks: [
    TestTask
  ],
  workerFactorys: [
    new PuppeteerWorkerFactory({
      headless: false,
      devtools: true
    })
  ]
})
class App {}