How to load an extension using the puppeteer.connect() method - puppeteer

I use the GoLogin service. GoLogin is an anti-detect browser service that lets me fake my browser identity and manage the browser fingerprint,
so I can freely do web scraping without being detected.
In this case I want to load my extension into that browser using the puppeteer.connect() method.
Here's the code:
const puppeteer = require('puppeteer-core');
const GoLogin = require('gologin');

(async () => {
  const GL = new GoLogin({
    token: 'yU0token',
    profile_id: 'yU0Pr0f1leiD',
  });
  const { status, wsUrl } = await GL.start();
  const browser = await puppeteer.connect({
    browserWSEndpoint: wsUrl.toString(),
    ignoreHTTPSErrors: true,
  });
  const page = await browser.newPage();
  await page.goto('https://myip.link/mini');
  console.log(await page.content());
  await browser.close();
  await GL.stop();
})();
I don't know how to do this. Please help me load my extension when using puppeteer.connect().

I assume you want to load a Chrome extension into the browser that Puppeteer controls.
Find the directory where Chrome stores the extension (see "Where does Chrome store extensions?").
Find your extension ID by going to chrome://extensions/.
Sample code:
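const os = require('os');
const path = require('path');
const puppeteer = require('puppeteer-core');

// "~" is only expanded by a shell, so build the absolute path explicitly.
const MY_EXTENSION_PATH = path.join(
  os.homedir(),
  'Library/Application Support/Google/Chrome/Default/Extensions/cdockenadnadldjbbgcallicgledbeoc/0.3.38_0'
);

async function loadExtension() {
  // puppeteer-core ships without a browser: also pass executablePath (or channel).
  return puppeteer.launch({
    headless: false,
    args: [
      `--disable-extensions-except=${MY_EXTENSION_PATH}`,
      `--load-extension=${MY_EXTENSION_PATH}`,
    ],
  });
}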
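One caveat for the original question: puppeteer.connect() attaches to a browser that is already running and takes no args option, so the two flags above have to be supplied by whatever starts the browser. The sketch below is a minimal illustration under that assumption. extensionArgs() and findExtensionTargets() are hypothetical helper names: the first builds the flags for one or more unpacked extensions, the second lists extension targets on a connected browser so you can check whether the extension actually loaded. Whether and how GoLogin lets you pass extra Chromium flags is not covered here; check its documentation.

```javascript
// Build the Chromium flags that load one or more unpacked extensions.
// Both flags accept a comma-separated list of directories.
function extensionArgs(extensionPaths) {
  const list = extensionPaths.join(',');
  return [
    `--disable-extensions-except=${list}`,
    `--load-extension=${list}`,
  ];
}

// After puppeteer.connect(), list the targets that belong to an extension
// (background pages or service workers served from chrome-extension://).
function findExtensionTargets(browser) {
  return browser
    .targets()
    .filter((target) => target.url().startsWith('chrome-extension://'))
    .map((target) => ({ type: target.type(), url: target.url() }));
}
```

An empty result from findExtensionTargets(browser) suggests the flags never reached the browser, although an extension whose service worker is idle may also be missing from the list.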

Related

Credentials fail logging into Pandora via Puppeteer using Chrome or Chromium

The credentials are verified to be correct.
When I run my script, I have headless:false and am watching everything seemingly behave correctly, but I get an incorrect username/password error. The same holds true if I manually type in the creds in this SAME browser that pops open when my script starts.
If I switch to my installed browser, however, creds work and I can login no problem.
I am just learning Puppeteer and do not have much front end experience. My guess is either Pandora has some extra layers of protection for logging in via scripts, or something I don't know about headless browsers in general.
Here is my script:
const puppeteer = require('puppeteer');
const fs = require('fs');
const ini = require('ini');

const config = ini.parse(fs.readFileSync('./creds.ini', 'utf-8'));
const pandora_user = config.pandora.name;
const pandora_pass = config.pandora.pass;

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    executablePath: "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
  });
  const page = await browser.newPage();
  const timeout = 500000; // my internet is down, running off bluetooth phone connection, dont judge me here
  page.setDefaultTimeout(timeout);
  try {
    await page.goto("https://www.pandora.com/account/sign-in");
    await page.waitForSelector(`input[name=email]`, {timeout, visible: true});
    await page.waitForSelector(`input[name=password]`, {timeout, visible: true});
    await page.type(`input[name=email]`, pandora_user);
    await page.type(`input[name=password]`, pandora_pass);
    await Promise.all([
      page.click(`button[name="login"]`),
      page.waitForNavigation()
    ]);
  } catch (err) {
    console.log(err);
  } finally {
    await browser.close();
  }
})();
I have already tried adding ignoreDefaultArgs: ['--enable-automation'] to the launch options.

Node js speed up puppeteer html to pdf

I have a node js application that creates dynamic content which I want users to download.
static async downloadPDF(res, html, filename) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    headless: true
  });
  const page = await browser.newPage();
  await page.setContent(html, {
    waitUntil: 'domcontentloaded'
  });
  const pdfBuffer = await page.pdf({
    format: 'A4'
  });
  res.set("Content-Disposition", "attachment;filename=" + filename + ".pdf");
  res.setHeader("Content-Type", "application/pdf");
  res.send(pdfBuffer);
  await browser.close();
}
Is there a way to speed up the whole process? It takes about 10 seconds to create a PDF file of about 100 KB.
I read somewhere that I can launch the headless browser once and then only create a new page for each request, instead of launching a browser every time the file is requested.
I cannot work out the correct way to do this.
You could move page creation into a utility module and hoist the page so that it is reused.
// getPage.js
const puppeteer = require('puppeteer');
let page;
const getPage = async () => {
  if (page) return page;
  const browser = await puppeteer.launch({
    headless: true,
  });
  page = await browser.newPage();
  return page;
};
module.exports = getPage;
Then, in the file that defines downloadPDF:
const getPage = require('./getPage');
static async downloadPDF(res, html, filename) {
  const page = await getPage();
  // ...set the content and generate the PDF as before
}
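One thing to watch with a single shared page: two downloads arriving at the same time would use the same page and could overwrite each other's content. Here is a minimal sketch of a variation, assuming that matters in your application: cache the browser launch (as a promise, so concurrent first calls do not launch twice) and open a short-lived page per request. createPdfRenderer and its launchBrowser parameter are hypothetical names; only newPage(), setContent(), pdf() and close() are Puppeteer calls.

```javascript
// launchBrowser: a function returning a Promise of a Puppeteer Browser,
// e.g. () => require('puppeteer').launch({ headless: true })
function createPdfRenderer(launchBrowser) {
  let browserPromise = null; // cached launch, shared by all requests

  return async function renderPdf(html) {
    if (!browserPromise) {
      // Forget a failed launch so that the next request can retry.
      browserPromise = launchBrowser().catch((err) => {
        browserPromise = null;
        throw err;
      });
    }
    const browser = await browserPromise;

    const page = await browser.newPage(); // one page per request
    try {
      await page.setContent(html, { waitUntil: 'domcontentloaded' });
      return await page.pdf({ format: 'A4' });
    } finally {
      await page.close(); // close the page, keep the browser running
    }
  };
}
```

In downloadPDF, the launch and close calls would then be replaced by a single await renderPdf(html), with renderPdf created once at module level. Opening a page is far cheaper than launching a browser, but how much of the 10 seconds this saves depends on where the time is actually spent.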
Yes, there is no reason to launch the browser every time. You can have Puppeteer navigate to a new URL and get the content; without a launch on every request it will be much faster.
How do you implement this? Split your function into three steps:
Create a browser instance, headless or not. If you run the app in an X environment, you can open a window to see what Puppeteer is doing.
Write a function that performs the main task in a loop.
When one iteration is done, call await page.goto(url) (where "page" is the instance returned by browser.newPage()) and run your function again.
Here is one possible solution in a functional style:
Create the instances:
const browser = await puppeteer.launch( {'headless' : false });
const page = await browser.newPage();
await page.setViewport({'width' : 1280, 'height' : 1024 });
I put this inside an immediately invoked async function, like (async ()=>{})();
Get the data
In my case, the set of URLs was stored in MongoDB. After fetching it, I ran a loop:
for (const entrie of entries) {
  const url = entrie[1];
  const id = entrie[0];
  await get_aplicants_data(page, url, id, collection);
}
In get_aplicants_data() I implemented the logic for the loaded page:
await page.goto(url); // go to the URL
// ... code to process the page data
You can also load the URL inside the loop and then run your logic there.
I hope this helps.

Avoid puppeteer detection

Is there any way to avoid being detected by a website that I am using puppeteer? I just can't navigate around the https://www.footlocker.ca/ website using puppeteer. I have tried using stealth plugin and random user-agents to no avail.
Any advice on what else I can try?
This website uses navigator.webdriver to check whether you are a real user or a bot, so you can use the code below to delete the navigator.webdriver value.
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();
  await page.evaluateOnNewDocument(() => {
    delete navigator.__proto__.webdriver;
  });
  await page.goto("https://www.footlocker.ca", {
    waitUntil: "domcontentloaded",
  });
})();
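To check whether the patch took effect, you can read the value back from the page. The sketch below wraps the same delete in a named function and adds a hypothetical helper, gotoWithoutWebdriver(), that installs it, navigates, and returns what the page sees. It only addresses the navigator.webdriver signal; a site may rely on other checks as well, so treat it as one experiment rather than a guaranteed fix.

```javascript
// Runs in the page before any site script. `nav` defaults to the page's
// navigator; the parameter exists only so the function can be tried on a stub.
function hideWebdriver(nav = navigator) {
  const proto = Object.getPrototypeOf(nav);
  // Same operation as above: remove the property from the prototype.
  delete proto.webdriver;
  return nav.webdriver;
}

// Hypothetical helper: install the patch, load the URL, and report the
// value that page scripts will read.
async function gotoWithoutWebdriver(page, url) {
  await page.evaluateOnNewDocument(hideWebdriver);
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  return page.evaluate(() => navigator.webdriver); // undefined if the delete worked
}
```

Called as await gotoWithoutWebdriver(page, "https://www.footlocker.ca"), it lets you log the result before going on to debug anything else.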

How to invoke Chrome Node Screenshot from the console?

I know you can capture a single HTML node via the command prompt, but is it possible to do this programmatically from the console, similar to Puppeteer? I'd like to loop over all elements on a page and capture them for occasional one-off projects where I don't want to set up a full auth process in Puppeteer.
I'm referring to Chrome's node screenshot functionality, but executed from the console, for example inside a forEach loop.
See the Puppeteer reference.
Something to the effect of this:
$x("//*[contains(@class, 'special-class-name')]").forEach((el)=> el.screenshot())
I just made a script that takes a screenshot of every submit button on the Google main page. Take a look and use it as inspiration.
const puppeteer = require('puppeteer')

;(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    devtools: true,
    args: ['--window-size=1920,1170', '--window-position=0,0']
  })
  const page = (await browser.pages())[0]
  await page.goto('https://www.google.com')
  const submit = await page.$$('input[type="submit"]')
  let num = 0
  // for...of instead of forEach, so that each screenshot is awaited in turn
  for (const elemHandle of submit) {
    num++
    await elemHandle.screenshot({
      path: `${Date.now()}_${num}.png`
    })
  }
})()
You can use ElementHandle.screenshot() to take a screenshot of a specific element on the page. The ElementHandle can be obtained from Page.$(selector) or Page.$$(selector) if you want to return multiple results.
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto("https://stackoverflow.com/questions/50715164");
const userInfo = await page.$(".user-info");
await userInfo.screenshot({ path: "userInfo.png" });
(Image: the resulting userInfo.png, a screenshot of the .user-info element.)
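To cover the "loop all elements" part of the question, the same call can be applied to every match of a selector. This is a minimal sketch that runs from Node with Puppeteer, not from the DevTools console. screenshotAll() and the element_N.png naming are hypothetical; $$(), boundingBox() and screenshot() are the Puppeteer calls. Elements without a visible box are skipped, since a screenshot of them would typically fail.

```javascript
const path = require('path');

// page: a Puppeteer Page; selector: a CSS selector; outDir: an existing directory.
// Resolves to the list of files that were written.
async function screenshotAll(page, selector, outDir) {
  const handles = await page.$$(selector);
  const written = [];
  let index = 0;
  for (const handle of handles) {
    index++;
    // boundingBox() resolves to null when the element is not visible.
    const box = await handle.boundingBox();
    if (!box || box.width === 0 || box.height === 0) continue;
    const file = path.join(outDir, `element_${index}.png`);
    await handle.screenshot({ path: file });
    written.push(file);
  }
  return written;
}
```

For example, await screenshotAll(page, ".special-class-name", "shots") would capture each matching element in turn, assuming the shots directory already exists.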

How do I implement NTLM authentication using Puppeteer?

I would like to use Puppeteer to automate testing with a site that requires NTLM authentication. It appears the page.authenticate() API only accepts a username and password but no domain. Does anyone have any suggestions or tips?
So far, the only approach that has worked for me is using cntlm:
Install cntlm: http://cntlm.sourceforge.net/
Configure cntlm, following the configuration hints on that page.
Run cntlm.
Then use it in Puppeteer like this:
const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({
    args: [
      "--proxy-server=localhost:3133" // the cntlm proxy defined in cntlm.conf
    ]
  });
  const page = await browser.newPage();
  await page.goto('http://example.com');
  await page.screenshot({ path: 'screenshots/example.png' });
  await browser.close(); // wait for the browser to shut down before returning
}

run();