How to check if the puppeteer is finish downloading the file - puppeteer

I'm using this code to download files. I just hide the urls for security reasons. In the code below how can I check if the puppeteer has finished downloading the file
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.setDefaultNavigationTimeout(0);
await page.goto('https://');
await page.waitForSelector('#aaa');
await page.goto('https://');
await page.waitForSelector('#bbb');
await page.goto('https://');

Please see the Puppeteer documentation as this is a very basic first step when working with Puppeteer.
page.waitForNavigation({ waitUntil: 'networkidle0' })
This will:
"consider navigation to be finished when there are no more than 0 network connections for at least 500 ms."
https://pptr.dev/

Related

Credentials fail logging into Pandora via Puppeteer using Chrome or Chromium

The credentials are verified to be correct.
When I run my script, I have headless:false and am watching everything seemingly behave correctly, but I get an incorrect username/password error. The same holds true if I manually type in the creds in this SAME browser that pops open when my script starts.
If I switch to my installed browser, however, creds work and I can login no problem.
I am just learning Puppeteer and do not have much front end experience. My guess is either Pandora has some extra layers of protection for logging in via scripts, or something I don't know about headless browsers in general.
Here is my script:
const puppeteer = require('puppeteer');
var fs = require('fs');
var ini = require('ini');
var config = ini.parse(fs.readFileSync('./creds.ini', 'utf-8'));
pandora_user = config.pandora.name;
pandora_pass = config.pandora.pass;
(async () => {
const browser = await puppeteer.launch(
{
headless: false,
executablePath: "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
}
);
const page = await browser.newPage();
const timeout = 500000; // my internet is down, running off bluetooth phone connection, dont judge me here
page.setDefaultTimeout(timeout);
try {
await page.goto("https://www.pandora.com/account/sign-in");
await page.waitForSelector(`input[name=email]`, {timeout, visible: true});
await page.waitForSelector(`input[name=password]`, {timeout, visible: true});
await page.type(`input[name=email]`, pandora_user);
await page.type(`input[name=password]`, pandora_pass);
await Promise.all([
page.click(`button[name="login"]`),
page.waitForNavigation()
])
} catch (err) {
console.log(err);
} finally {
await browser.close();
}
Tried adding ignoreDefaultArgs: ['--enable-automation'], to the launch args

how to load extension using puppeteer.connect() method

i use gologin service. gologin is a browser antidetect service where I can fake my browser identity / can manage browser fingerprint.
so I can freely do web-scraping without being detected.
in this case I want to be able to load my extension into that browser using the puppeteer.connect() method.
here's the code:
const puppeteer = require('puppeteer-core');
const GoLogin = require('gologin');
(async () => {
const GL = new GoLogin({
token: 'yU0token',
profile_id: 'yU0Pr0f1leiD',
});
const { status, wsUrl } = await GL.start();
const browser = await puppeteer.connect({
browserWSEndpoint: wsUrl.toString(),
ignoreHTTPSErrors: true,
});
const page = await browser.newPage();
await page.goto('https://myip.link/mini');
console.log(await page.content());
await browser.close();
await GL.stop();
})();
I don't know how. please help me, so i can load my extension using this puppeteer.connect()
Assume your wish is loading chrome-extension into your puppeteer browser.
Find chrome-extension Working Directory Where does Chrome store extensions?
Find your extension ID by go to chrome://extensions/
Sample code:
const puppeteer = require('puppeteer-core');
const MY_EXTENSION_PATH = '~/Library/Application Support/Google/Chrome/Default/Extensions/cdockenadnadldjbbgcallicgledbeoc/0.3.38_0'
async function loadExtension() {
return puppeteer.launch({
headless: 0,
args: [
`--disable-extensions-except=${MY_EXTENSION_PATH}`,
`--load-extension=${MY_EXTENSION_PATH}`,
],
});
}

puppeteer: Different response from the puppeteer browser and the user browser

I use my own browser to get the result page I want. Everything is correct. Page link is below.
https://parcelsapp.com/en/tracking/016-35294405
img for working
I want to use puppeteer to help me to load the result page. The page shows differently.
I use options headless=false to debug. I found the browser pop up from puppeteer can not load the url correctly. I guess it is because the different environments. How can I solve the problem? Thank you.
img for not working
My code is below:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
slowMo: 250, // slow down by 250ms
executablePath: '/usr/bin/google-chrome-stable',
});
const page = await browser.newPage();
await page.on("request", (request) => {
request.abort();
});
await page.goto('https://parcelsapp.com/en/tracking/016-35294405');
await page.waitForNavigation()
await page.screenshot({ path: 'result.png' });
await browser.close();
})();

Avoid puppeteer detection

Is there any way to avoid being detected by a website that I am using puppeteer? I just can't navigate around the https://www.footlocker.ca/ website using puppeteer. I have tried using stealth plugin and random user-agents to no avail.
Any advice on what else I can try?
This website use navigator.webdriver to check if you are real user or bot. so you can use the code below to delete navigator.webdriver value. docs.
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({
headless: false,
});
const page = await browser.newPage();
await page.evaluateOnNewDocument(() => {
delete navigator.__proto__.webdriver;
});
await page.goto("https://www.footlocker.ca", {
waitUntil: "domcontentloaded",
});
})();

How do I implement NTLM authentication using Puppeteer?

I would like to use Puppeteer to automate testing with a site that requires NTLM authentication. It appears the page.authenticate() API only accepts a username and password but no domain. Does anyone have any suggestions or tips?
Until now, the only way that worked for me was using cntlm:
Install cntlm. http://cntlm.sourceforge.net/
Configure cntlm, following the Configuration hints in the page
Run cntlm
Then use it in Puppeteer like this
const puppeteer = require('puppeteer');
async function run() {
const browser = await puppeteer.launch({
args: [
"--proxy-server=localhost:3133" // the cntlm proxy defined in cntlm.conf
]
});
const page = await browser.newPage();
await page.goto('http://example.com');
await page.screenshot({ path: 'screenshots/example.png' });
browser.close();
}
run();