In my own browser, the result page loads correctly. Here is the page link:
https://parcelsapp.com/en/tracking/016-35294405
(Screenshot: the result page as it appears in my own browser)
I want to use Puppeteer to load the same result page, but it renders differently.
I set headless: false to debug and found that the browser Puppeteer opens cannot load the URL correctly. I guess this is because of the different environment. How can I solve this problem? Thank you.
(Screenshot: the same page as it appears in the browser Puppeteer opens)
My code is below:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    slowMo: 250, // slow down by 250ms
    executablePath: '/usr/bin/google-chrome-stable',
  });
  const page = await browser.newPage();
  await page.on("request", (request) => {
    request.abort();
  });
  await page.goto('https://parcelsapp.com/en/tracking/016-35294405');
  await page.waitForNavigation()
  await page.screenshot({ path: 'result.png' });
  await browser.close();
})();
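The sketch below does not claim to fix the rendering difference; it only removes two problems in the code above. First, request.abort() only works after page.setRequestInterception(true); without it, each call fails with a "Request Interception is not enabled!" error, and with interception enabled, aborting every request would stop the page from loading at all. Second, page.goto() already waits for the navigation, so a following page.waitForNavigation() waits for a second navigation that may never happen and times out (30 seconds by default). The helper names blockImages and captureTrackingPage are hypothetical, and blocking only images is an assumption about what the request handler was meant to do.

```javascript
// Sketch only: `page` is assumed to be a Puppeteer Page (from browser.newPage()).
// blockImages and captureTrackingPage are hypothetical helper names.

// Enable interception, then abort image requests and let everything else
// through. Once interception is on, every request must be continued or
// aborted explicitly, otherwise it stalls.
async function blockImages(page) {
  await page.setRequestInterception(true);
  page.on('request', (request) => {
    if (request.resourceType() === 'image') {
      request.abort();
    } else {
      request.continue();
    }
  });
}

// goto() already waits for the navigation to finish. 'networkidle2' also
// waits until no more than 2 network connections remain for 500 ms, which
// gives client-side rendering time to finish, so no waitForNavigation().
async function captureTrackingPage(page, url, screenshotPath) {
  await page.goto(url, { waitUntil: 'networkidle2' });
  await page.screenshot({ path: screenshotPath });
}
```

Usage, roughly: launch with puppeteer.launch({ headless: false }), open a page with browser.newPage(), optionally call blockImages(page), then call captureTrackingPage(page, 'https://parcelsapp.com/en/tracking/016-35294405', 'result.png') and finally browser.close().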
How to press control + P on a web page that is automated by puppeteer?
This code loads the web page, but pressing the Control key with await page.keyboard.down('Control') has no effect.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://google.com');
  await page.waitForSelector('input');
  await page.focus('input');
  // this works
  await page.keyboard.down('Shift');
  await page.keyboard.press('KeyP');
  await page.keyboard.up('Shift');
  // this has no effect.
  await page.keyboard.down('Control');
  await page.keyboard.press('KeyP');
  await page.keyboard.up('Control');
})();
What I would like to do is navigate to a PDF file, have the browser open it, then press Control+P and automate the print dialog far enough that the code selects the printer to print to and presses the Enter key.
Running Puppeteer with Chrome's --kiosk-printing flag makes Chrome confirm the print preview automatically, so the window.print() dialog is answered without user interaction.
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    args: ['--kiosk-printing'],
  });
  const page = await browser.newPage();
  await page.goto('file:///C:/Users/srich/Downloads/packing-list.pdf');
  await page.evaluate(() => { window.print(); });
  // page.waitForTimeout() is no longer available in newer Puppeteer releases;
  // a plain timer gives the print job time to be sent before closing.
  await new Promise((resolve) => setTimeout(resolve, 2000));
  await browser.close();
})();
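A sketch of the same steps as reusable functions, assuming page comes from a browser launched with args: ['--kiosk-printing'] as in the answer above. The names pdfFileUrl, sleep and printWithKiosk are hypothetical. It uses Node's url.pathToFileURL instead of a hand-written file:/// string (on Windows, C:\Users\srich\Downloads\packing-list.pdf becomes file:///C:/Users/srich/Downloads/packing-list.pdf, and spaces are percent-encoded), and a plain timer instead of page.waitForTimeout().

```javascript
// Sketch: `page` is assumed to be a Puppeteer Page from a browser launched
// with args: ['--kiosk-printing']. All function names here are hypothetical.
const { pathToFileURL } = require('url');

// Turn a local path into a properly encoded file:// URL.
function pdfFileUrl(localPath) {
  return pathToFileURL(localPath).href;
}

// Promise-based pause, used instead of page.waitForTimeout().
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Open the PDF, trigger printing, then wait so the job can be handed off
// before the caller closes the browser.
async function printWithKiosk(page, localPath, waitMs = 2000) {
  await page.goto(pdfFileUrl(localPath));
  await page.evaluate(() => window.print());
  await sleep(waitMs);
}
```

Because the flag only confirms the print preview, the job goes to whichever printer the preview selects; choosing a specific printer is not something this sketch does.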
I use the GoLogin service, an anti-detect browser service that lets me fake my browser identity and manage my browser fingerprint,
so I can scrape the web freely without being detected.
In this case, I want to load my extension into that browser using the puppeteer.connect() method.
Here's the code:
const puppeteer = require('puppeteer-core');
const GoLogin = require('gologin');
(async () => {
  const GL = new GoLogin({
    token: 'yU0token',
    profile_id: 'yU0Pr0f1leiD',
  });
  const { status, wsUrl } = await GL.start();
  const browser = await puppeteer.connect({
    browserWSEndpoint: wsUrl.toString(),
    ignoreHTTPSErrors: true,
  });
  const page = await browser.newPage();
  await page.goto('https://myip.link/mini');
  console.log(await page.content());
  await browser.close();
  await GL.stop();
})();
I don't know how. Please help me load my extension using puppeteer.connect().
Assuming you want to load a Chrome extension into your Puppeteer browser:
Find the extension's working directory (see "Where does Chrome store extensions?").
Find your extension ID by going to chrome://extensions/.
Sample code:
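const os = require('os');
const path = require('path');
const puppeteer = require('puppeteer-core');
// "~" is not expanded when Puppeteer starts Chrome, so build an absolute path.
const MY_EXTENSION_PATH = path.join(
  os.homedir(),
  'Library/Application Support/Google/Chrome/Default/Extensions/cdockenadnadldjbbgcallicgledbeoc/0.3.38_0'
);
async function loadExtension() {
  return puppeteer.launch({
    // puppeteer-core does not bundle a browser: point this at your Chrome or
    // Chromium binary (usual macOS Chrome location shown; adjust as needed).
    executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
    headless: false,
    args: [
      `--disable-extensions-except=${MY_EXTENSION_PATH}`,
      `--load-extension=${MY_EXTENSION_PATH}`,
    ],
  });
}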
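A small helper sketch for building those flags, assuming one or more unpacked extension directories. Puppeteer spawns the browser without a shell, so a leading "~" in a path is passed to Chrome literally; the helper expands it with os.homedir(). Both flags take a comma-separated list, so several extensions can be loaded at once. expandHome and extensionArgs are hypothetical names.

```javascript
// Sketch: expandHome and extensionArgs are hypothetical helpers, not
// Puppeteer APIs. They only build the command-line flags for launch().
const os = require('os');
const path = require('path');

// Replace a leading "~" with the current user's home directory.
function expandHome(p) {
  if (p === '~') return os.homedir();
  if (p.startsWith('~/')) return path.join(os.homedir(), p.slice(2));
  return p;
}

// Build both extension flags from a list of unpacked extension directories.
function extensionArgs(extensionPaths) {
  const list = extensionPaths.map(expandHome).join(',');
  return [`--disable-extensions-except=${list}`, `--load-extension=${list}`];
}
```

These flags are read when the browser starts, so they belong in puppeteer.launch({ args: extensionArgs([...]) }). puppeteer.connect() attaches to a browser that is already running; with GoLogin, the flags would have to reach the browser GoLogin starts, and how GoLogin accepts extra flags is not covered here.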
Is there any way to keep a website from detecting that I am using Puppeteer? I just can't navigate around https://www.footlocker.ca/ using Puppeteer. I have tried the stealth plugin and random user agents, to no avail.
Any advice on what else I can try?
This website uses navigator.webdriver to check whether you are a real user or a bot, so you can use the code below to delete the navigator.webdriver property before the site's own scripts run (see the docs).
const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();
  // Runs in every new document before any of the page's scripts.
  await page.evaluateOnNewDocument(() => {
    delete navigator.__proto__.webdriver;
  });
  await page.goto("https://www.footlocker.ca", {
    waitUntil: "domcontentloaded",
  });
})();
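To check that the patch actually took effect, a sketch that registers it and then reads the flag back as the site would see it. It assumes page is a Puppeteer Page; hideWebdriver and webdriverFlag are hypothetical names, and Object.getPrototypeOf(navigator) is used as an equivalent of the answer's navigator.__proto__.

```javascript
// Sketch: `page` is assumed to be a Puppeteer Page. hideWebdriver and
// webdriverFlag are hypothetical helper names.

// Must be called before page.goto(): evaluateOnNewDocument only affects
// documents created after registration.
async function hideWebdriver(page) {
  await page.evaluateOnNewDocument(() => {
    delete Object.getPrototypeOf(navigator).webdriver;
  });
}

// Read navigator.webdriver from inside the loaded page.
async function webdriverFlag(page) {
  return page.evaluate(() => navigator.webdriver);
}
```

Usage: call await hideWebdriver(page), then page.goto(...), then log await webdriverFlag(page). If the property was removed, the value read back is undefined rather than true.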
I am using Puppeteer to go to this URL: https://www.booking.com/hotel/us/l-39-horizon-resort-amp-spa.en-gb.html
The following text in the hotel description sometimes shows up and sometimes doesn't:
Hotel Chain: The Leading Hotels of the World
Does anyone know why this happens?
const puppeteer = require('puppeteer');
let bookingUrl = 'https://www.booking.com/hotel/us/l-39-horizon-resort-amp-spa.en-gb.html';
// Promise-based pause used below.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
(async () => {
  const browser = await puppeteer.launch({ headless: false, slowMo: 250 });
  const page = await browser.newPage();
  await page.goto(bookingUrl, { waitUntil: 'networkidle2' });
  await delay(4000);
})();
Have you tried waiting for it?
const selector = '#hotel_main_content > div.hp_hotel_description_hightlights_wrapper > div.hotel_description_wrapper_exp.hp-description > div.hp_desc_main_content > p.summary.hotel_meta_style';
// page.waitFor() was removed from newer Puppeteer versions; waitForSelector replaces it.
await page.waitForSelector(selector, { visible: true });
A note about my code: I copied the selector from Chrome DevTools, but it could change over time. Give it a try, and if it works, refine it.
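If the text is genuinely optional, a sketch that waits for it with a bounded timeout and returns null instead of failing. It assumes page is a Puppeteer Page and that a timed-out wait rejects with an error named 'TimeoutError', as current Puppeteer releases do; readOptionalText is a hypothetical name.

```javascript
// Sketch: `page` is assumed to be a Puppeteer Page; readOptionalText is a
// hypothetical helper. Resolves to the element's trimmed text, or null if
// the element does not become visible within timeoutMs.
async function readOptionalText(page, selector, timeoutMs = 10000) {
  try {
    await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });
  } catch (err) {
    // Treat a timeout as "not present"; rethrow anything else.
    if (err && err.name === 'TimeoutError') return null;
    throw err;
  }
  return page.$eval(selector, (el) => el.textContent.trim());
}
```

Returning null makes the "sometimes shows up" case explicit in your results, so you can tell a missing description line apart from a selector that never matches.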
I want to close pages whenever Puppeteer hits an error. Sometimes the page I try to load crashes, and .close() never gets called.
(async () => {
  const page = await browser.newPage();
  await page.setViewport({ width: resWidth, height: resHeight });
  await page.goto(d["entities"]["urls"][0]["expanded_url"], { timeout: 90000 });
  await page.screenshot({ path: './resimdata/' + d['id'] + '.png', fullPage: true });
  await page.close();
})();
There is an issue/PR on the Puppeteer repo about this, which will be helpful in similar situations.
Related issue: https://github.com/GoogleChrome/puppeteer/issues/952
Meanwhile, you can try this little hack. If the PR makes it into version 0.12+, we won't need the following code.
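const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  // Await the closes before exiting; otherwise process.exit() runs first.
  // Close failures (e.g. on an already crashed page) are ignored.
  async function handleClose(msg) {
    console.log(msg);
    await page.close().catch(() => {});
    await browser.close().catch(() => {});
    process.exit(1);
  }
  process.on('uncaughtException', () => {
    handleClose('I crashed');
  });
  process.on('unhandledRejection', () => {
    handleClose('I was rejected');
  });
  await page.goto('chrome://crash');
})();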
This will output something like the following:
▶ node app/app.js
I was rejected
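If you only need each page closed (not the whole process exited), a more local sketch is try/finally around the page's work. It assumes browser is a Puppeteer Browser; withPage is a hypothetical helper, not a Puppeteer API.

```javascript
// Sketch: `browser` is assumed to be a Puppeteer Browser; withPage is a
// hypothetical helper. The page is closed whether `work` succeeds or throws,
// and the original error (if any) still propagates to the caller.
async function withPage(browser, work) {
  const page = await browser.newPage();
  try {
    return await work(page);
  } finally {
    // close() can itself fail on a crashed page; don't let that mask
    // the error thrown by `work`.
    await page.close().catch(() => {});
  }
}
```

Applied to the question's snippet, the setViewport, goto and screenshot calls would move into await withPage(browser, async (page) => { ... }), wrapped in a try/catch if you want to log the failure and continue with the next URL.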