How do I implement NTLM authentication using Puppeteer?

I would like to use Puppeteer to automate testing with a site that requires NTLM authentication. It appears the page.authenticate() API only accepts a username and password but no domain. Does anyone have any suggestions or tips?

So far, the only approach that has worked for me is to put cntlm in front of the browser:
1. Install cntlm: http://cntlm.sourceforge.net/
2. Configure cntlm, following the configuration hints on that page.
3. Run cntlm.
4. Point Puppeteer at the cntlm proxy, like this:
const puppeteer = require('puppeteer');
async function run() {
const browser = await puppeteer.launch({
args: [
"--proxy-server=localhost:3133" // the cntlm proxy defined in cntlm.conf
]
});
const page = await browser.newPage();
await page.goto('http://example.com');
await page.screenshot({ path: 'screenshots/example.png' });
await browser.close();
}
run();
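The launch arguments above can be wrapped in a small helper so the proxy address lives in one place. This is only a sketch: the name cntlmLaunchOptions is hypothetical, and the port has to match the Listen setting in your own cntlm.conf (3133 is simply the value used in the answer above).

```javascript
// Hypothetical helper (not part of Puppeteer): builds launch options that
// send all browser traffic through a local cntlm proxy.
function cntlmLaunchOptions(port, host = 'localhost') {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid proxy port: ${port}`);
  }
  return { args: [`--proxy-server=${host}:${port}`] };
}

// Usage (needs the puppeteer package and a running cntlm instance):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(cntlmLaunchOptions(3133));
```

With this setup the NTLM handshake, including the domain, is handled by cntlm, so page.authenticate() is not needed for the target site.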

Related

Credentials fail logging into Pandora via Puppeteer using Chrome or Chromium

The credentials are verified to be correct.
When I run my script, I have headless:false and am watching everything seemingly behave correctly, but I get an incorrect username/password error. The same holds true if I manually type in the creds in this SAME browser that pops open when my script starts.
If I switch to my installed browser, however, creds work and I can login no problem.
I am just learning Puppeteer and do not have much front end experience. My guess is either Pandora has some extra layers of protection for logging in via scripts, or something I don't know about headless browsers in general.
Here is my script:
const puppeteer = require('puppeteer');
var fs = require('fs');
var ini = require('ini');
var config = ini.parse(fs.readFileSync('./creds.ini', 'utf-8'));
const pandora_user = config.pandora.name;
const pandora_pass = config.pandora.pass;
(async () => {
const browser = await puppeteer.launch(
{
headless: false,
executablePath: "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
}
);
const page = await browser.newPage();
const timeout = 500000; // my internet is down, running off bluetooth phone connection, dont judge me here
page.setDefaultTimeout(timeout);
try {
await page.goto("https://www.pandora.com/account/sign-in");
await page.waitForSelector(`input[name=email]`, {timeout, visible: true});
await page.waitForSelector(`input[name=password]`, {timeout, visible: true});
await page.type(`input[name=email]`, pandora_user);
await page.type(`input[name=password]`, pandora_pass);
await Promise.all([
page.click(`button[name="login"]`),
page.waitForNavigation()
])
} catch (err) {
console.log(err);
} finally {
await browser.close();
}
})();
I also tried adding ignoreDefaultArgs: ['--enable-automation'] to the launch options.

how to load extension using puppeteer.connect() method

I use the GoLogin service. GoLogin is an anti-detect browser service that lets me fake my browser identity and manage the browser fingerprint,
so I can scrape websites freely without being detected.
In this case I want to load my extension into that browser when using the puppeteer.connect() method.
Here's the code:
const puppeteer = require('puppeteer-core');
const GoLogin = require('gologin');
(async () => {
const GL = new GoLogin({
token: 'yU0token',
profile_id: 'yU0Pr0f1leiD',
});
const { status, wsUrl } = await GL.start();
const browser = await puppeteer.connect({
browserWSEndpoint: wsUrl.toString(),
ignoreHTTPSErrors: true,
});
const page = await browser.newPage();
await page.goto('https://myip.link/mini');
console.log(await page.content());
await browser.close();
await GL.stop();
})();
I don't know how to do this. Please help me load my extension when using puppeteer.connect().
I assume you want to load a Chrome extension into your Puppeteer browser.
Find the directory where the extension is installed (see "Where does Chrome store extensions?").
Find your extension ID by going to chrome://extensions/.
Sample code:
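const os = require('os');
const path = require('path');
const puppeteer = require('puppeteer-core');
// Node.js does not expand '~', so build the absolute path explicitly
const MY_EXTENSION_PATH = path.join(os.homedir(), 'Library/Application Support/Google/Chrome/Default/Extensions/cdockenadnadldjbbgcallicgledbeoc/0.3.38_0');
async function loadExtension() {
return puppeteer.launch({
headless: false, // the classic headless mode does not load extensions
// puppeteer-core does not bundle a browser, so executablePath must also be set here
args: [
`--disable-extensions-except=${MY_EXTENSION_PATH}`,
`--load-extension=${MY_EXTENSION_PATH}`,
],
});
}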
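If you need more than one extension, both flags accept a comma-separated list of folders. The sketch below builds the two arguments and expands a leading ~ itself, since only a shell does that. The helper names expandHome and extensionArgs are hypothetical, not part of Puppeteer.

```javascript
const os = require('os');
const path = require('path');

// Replace a leading "~" with the current user's home directory.
// Node.js and Chrome do not do this themselves; only a shell does.
function expandHome(p) {
  if (p === '~') return os.homedir();
  if (p.startsWith('~/')) return path.join(os.homedir(), p.slice(2));
  return p;
}

// Build the two Chrome flags for one or more unpacked extension folders.
function extensionArgs(extensionPaths) {
  const joined = extensionPaths.map(expandHome).join(',');
  return [
    `--disable-extensions-except=${joined}`,
    `--load-extension=${joined}`,
  ];
}
```

Note that these flags only take effect when they are passed to the process that starts the browser, as puppeteer.launch() does. puppeteer.connect() attaches to a browser that is already running, so in that case whatever starts the browser has to receive them.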

Avoid puppeteer detection

Is there any way to avoid being detected by a website that I am using puppeteer? I just can't navigate around the https://www.footlocker.ca/ website using puppeteer. I have tried using stealth plugin and random user-agents to no avail.
Any advice on what else I can try?
This website checks navigator.webdriver to decide whether you are a real user or a bot, so you can use the code below to delete the navigator.webdriver property before any page script runs (see the docs for page.evaluateOnNewDocument()).
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({
headless: false,
});
const page = await browser.newPage();
await page.evaluateOnNewDocument(() => {
delete navigator.__proto__.webdriver;
});
await page.goto("https://www.footlocker.ca", {
waitUntil: "domcontentloaded",
});
})();
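To see what the delete above actually does, here is a minimal sketch that runs in plain Node.js. FakeNavigator is a hypothetical stand-in for the browser's navigator, which exposes webdriver as a getter on its prototype; that is why the property has to be deleted from the prototype rather than from the object itself.

```javascript
// Removes the webdriver accessor from the prototype of a navigator-like object.
function hideWebdriver(nav) {
  delete Object.getPrototypeOf(nav).webdriver;
}

// Stand-in for the browser's navigator, used only to demonstrate the effect.
class FakeNavigator {
  get webdriver() {
    return true;
  }
}

const fakeNavigator = new FakeNavigator();
const before = fakeNavigator.webdriver; // true
hideWebdriver(fakeNavigator);
const after = fakeNavigator.webdriver; // undefined
console.log({ before, after, stillPresent: 'webdriver' in fakeNavigator });
```

Inside Puppeteer the same statement has to be written inline in the function passed to page.evaluateOnNewDocument(), because that function is serialized and executed in the page, where hideWebdriver is not defined. This removes only one signal, and it is an assumption that the site relies on it alone.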

In Puppeteer how to capture Chrome browser log in the console

I'm trying to collect Chrome browser logs: browser-issued warnings such as deprecation and interventions. For example, for site https://uriyaa.wixsite.com/corvid-cli2:
A cookie associated with a cross-site resource at http://wix.com/ was set without the `SameSite` attribute.
A future release of Chrome will only deliver cookies with cross-site requests if they are set with `SameSite=None` and `Secure`.
You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.
I thought the following code would do the trick but it only catches logs generated by the page code.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({dumpio: true});
const page = await browser.newPage();
page.on('console', msg => {
const args = msg.args(); // public accessor; msg._args is a private field
for (let i = 0; i < args.length; ++i)
console.log(`${i}: ${args[i]}`);
});
await page.goto('https://uriyaa.wixsite.com/corvid-cli2', {waitUntil: 'networkidle2', timeout: 20000});
await page.screenshot({path: 'screenshot.png'});
await browser.close();
})();
The following turned out to be less relevant than I thought, because ReportingObserver does not catch Chrome's message about cookies without SameSite:
Reading on the subject led me to https://developers.google.com/web/updates/2018/07/reportingobserver but I'm not sure how to use it; running the example in the browser console didn't work.
I'm not sure in which context the observer code should be used, whether the browser needs a flag to activate the Reporting API, or whether this is even the right way to go about it.
Any help is welcome.
Presumably, the 'console' event only catches console.log() and similar calls made by the page. But it seems you can catch warnings from the browser itself via a CDPSession with the Log domain. Unfortunately, it works for me only with a headful browser:
'use strict';
const puppeteer = require('puppeteer');
(async function main() {
try {
const browser = await puppeteer.launch({ headless: false });
const [page] = await browser.pages();
const cdp = await page.target().createCDPSession();
await cdp.send('Log.enable');
cdp.on('Log.entryAdded', async ({ entry }) => {
console.log(entry);
});
await page.goto('https://uriyaa.wixsite.com/corvid-cli2');
} catch (err) {
console.error(err);
}
})();
And one of the entries:
{
source: 'other',
level: 'warning',
text: 'A cookie associated with a cross-site resource at http://www.wix.com/ was set without the `SameSite` attribute. It has been blocked, as Chrome now only delivers cookies with cross-site requests if they are set with `SameSite=None` and `Secure`. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.',
timestamp: 1589058118372.802,
url: 'https://uriyaa.wixsite.com/corvid-cli2'
}
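Since each entry is a plain object with source, level, text, timestamp and url fields, it is easy to filter and format them. The following is a sketch only: formatLogEntry, isAtLeast and logBrowserEntries are hypothetical helper names, and the level ordering assumes the four values the CDP Log domain defines (verbose, info, warning, error).

```javascript
// Severity order of the CDP Log domain's `level` field.
const LEVELS = ['verbose', 'info', 'warning', 'error'];

// Turn one Log.entryAdded entry into a single printable line.
function formatLogEntry(entry) {
  const when = new Date(entry.timestamp).toISOString();
  const where = entry.url ? ` (${entry.url})` : '';
  return `${when} [${entry.level}] ${entry.source}: ${entry.text}${where}`;
}

// True when the entry is at least as severe as minLevel.
function isAtLeast(entry, minLevel) {
  return LEVELS.indexOf(entry.level) >= LEVELS.indexOf(minLevel);
}

// Wire the helpers to a page; `page` is a Puppeteer Page as in the answer above.
async function logBrowserEntries(page, minLevel = 'warning') {
  const cdp = await page.target().createCDPSession();
  await cdp.send('Log.enable');
  cdp.on('Log.entryAdded', ({ entry }) => {
    if (isAtLeast(entry, minLevel)) console.log(formatLogEntry(entry));
  });
  return cdp;
}
```

Call logBrowserEntries(page) before page.goto() so that entries emitted during the initial load are not missed.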
When you launch the Puppeteer browser, set dumpio to true in the options object:
await puppeteer.launch({ dumpio: true });
This will "pipe browser process stdout and stderr into process.stdout and process.stderr", which means the browser's logs are redirected to whatever main process, server, etc. you are running.
You can see this and other launch options here (note that the link points to the docs for PuppeteerSharp, the .NET port of Puppeteer, where the option is called DumpIO): https://www.puppeteersharp.com/api/PuppeteerSharp.LaunchOptions.html

Make Puppeteer use local profile's cookies

I want to use my local user's profile with Puppeteer. However, it doesn't seem to work.
I launch it with these args.
const browser = await puppeteer.launch({
executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
userDataDir: '/Users/me/Library/Application Support/Google/Chrome',
});
When headless, it doesn't use the user's local profile's cookies at all, even though I'd expect it to. When it isn't headless, it can't even open the tab; Puppeteer crashes with
(node:23303) UnhandledPromiseRejectionWarning: Error: Failed to launch chrome!
TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
Is there a way to use my local user's profile? I'm using Puppeteer ^1.7.0 and Chrome 70.0.3521.2.
Rather than setting a userDataDir path in the puppeteer.launch() options, you can use the chrome-cookies-secure npm package to reuse the cookies from one of your existing Chrome profiles. This solution does not require Chrome Canary to be installed.
With your macOS Keychain authorisation, the package reads the cookies for a given URL from disk and makes them accessible in Node.js. You can then load them into Puppeteer using the page.setCookie(...) method.
Here's an example:
const chrome = require('chrome-cookies-secure');
const puppeteer = require('puppeteer');
const url = 'https://www.yourUrl.com';
const getCookies = (callback) => {
chrome.getCookies(url, 'puppeteer', function(err, cookies) {
if (err) {
console.log(err, 'error');
return
}
console.log(cookies, 'cookies');
callback(cookies);
}, 'yourProfile') // e.g. 'Profile 2'
}
// find profiles at ~/Library/Application Support/Google/Chrome
getCookies(async (cookies) => {
const browser = await puppeteer.launch({
headless: false
});
const page = await browser.newPage();
await page.setCookie(...cookies);
await page.goto(url);
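await new Promise((resolve) => setTimeout(resolve, 1000)); // page.waitFor() is deprecated and absent from recent Puppeteer releases
await browser.close();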
});
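The callback style above can also be wrapped in a Promise so it composes with async/await. This is a sketch under the assumption that getCookies keeps the (url, format, callback, profile) argument order used in the example; getCookiesAsync and openWithProfileCookies are hypothetical names, and both modules are passed in as parameters rather than required at the top.

```javascript
// Promise wrapper around the callback API used above. `chrome` is the
// chrome-cookies-secure module (or any object with the same getCookies shape).
function getCookiesAsync(chrome, url, profile) {
  return new Promise((resolve, reject) => {
    chrome.getCookies(url, 'puppeteer', (err, cookies) => {
      if (err) reject(err);
      else resolve(cookies);
    }, profile);
  });
}

// Open `url` with the cookies of an existing Chrome profile already applied.
// `puppeteer` and `chrome` are injected so the function has no hidden dependencies.
async function openWithProfileCookies(puppeteer, chrome, url, profile) {
  const cookies = await getCookiesAsync(chrome, url, profile);
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.setCookie(...cookies);
  await page.goto(url);
  return { browser, page };
}
```

A call would look like openWithProfileCookies(require('puppeteer'), require('chrome-cookies-secure'), 'https://www.yourUrl.com', 'Profile 2'); the caller is responsible for closing the returned browser.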
I solved this on macOS by installing Chrome Canary and copying my Default folder from ~/Library/Application Support/Google/Chrome/Default to ~/Library/Application Support/Google/Chrome Canary/Default.
My working code looks like this:
async function run() {
const browser = await puppeteer.launch({
headless: false,
// spaces need no backslash escaping inside a JavaScript string
executablePath: '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary',
userDataDir: '/Users/radium/Library/Application Support/Google/Chrome Canary/',
});
}
Previously I passed the path all the way down to the Default folder; truncating it so that it ends at the 'Chrome Canary' folder fixed everything. I have not tried this with regular Chrome.
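The fix above boils down to passing the folder that contains the profile, not the profile folder itself. A small sketch of that rule follows; profileLaunchOptions is a hypothetical helper, it assumes profile folders are named Default or Profile N as they typically are, and it passes the profile name through Chrome's --profile-directory switch.

```javascript
const path = require('path');

// Chrome profile folders are typically named "Default" or "Profile N".
const PROFILE_DIR = /^(Default|Profile \d+)$/;

// Split a path that may point at a profile folder into the userDataDir
// Puppeteer expects plus an argument selecting the profile inside it.
function profileLaunchOptions(chromePath) {
  const name = path.basename(chromePath);
  if (!PROFILE_DIR.test(name)) {
    // Already the user data directory; nothing to split off.
    return { userDataDir: chromePath, args: [] };
  }
  return {
    userDataDir: path.dirname(chromePath),
    args: [`--profile-directory=${name}`],
  };
}
```

The result can be spread into the launch options, for example puppeteer.launch({ headless: false, executablePath, ...profileLaunchOptions(somePath) }), where executablePath and somePath are your own values.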