How to load an extension using the puppeteer.connect() method

I use the GoLogin service. GoLogin is an anti-detect browser service that lets me spoof my browser identity and manage the browser fingerprint, so I can scrape websites without being detected.
In this case, I want to load my own extension into that browser, which I control through the puppeteer.connect() method.
Here's the code:
const puppeteer = require('puppeteer-core');
const GoLogin = require('gologin');
(async () => {
const GL = new GoLogin({
token: 'yU0token',
profile_id: 'yU0Pr0f1leiD',
});
const { status, wsUrl } = await GL.start();
const browser = await puppeteer.connect({
browserWSEndpoint: wsUrl.toString(),
ignoreHTTPSErrors: true,
});
const page = await browser.newPage();
await page.goto('https://myip.link/mini');
console.log(await page.content());
await browser.close();
await GL.stop();
})();
I don't know how to do this. Please help me load my extension when the browser is attached through puppeteer.connect().

Assuming you want to load a Chrome extension into the browser that Puppeteer controls:
Find the directory where Chrome stores the extension (see "Where does Chrome store extensions?").
Find your extension ID by going to chrome://extensions/.
Sample code:
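const os = require('os');
const path = require('path');
const puppeteer = require('puppeteer-core');
// Chrome does not expand "~", so build an absolute path to the extension folder.
const MY_EXTENSION_PATH = path.join(os.homedir(),
  'Library/Application Support/Google/Chrome/Default/Extensions/cdockenadnadldjbbgcallicgledbeoc/0.3.38_0');
async function loadExtension() {
  // Note: puppeteer-core also needs executablePath (or channel) pointing at your Chrome.
  return puppeteer.launch({
    headless: false, // run with a visible window
    args: [
      `--disable-extensions-except=${MY_EXTENSION_PATH}`,
      `--load-extension=${MY_EXTENSION_PATH}`,
    ],
  });
}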
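Extensions are loaded through command-line flags when Chrome starts, so with puppeteer.connect() the same flags have to reach whatever process launches the browser. A minimal sketch of building those flags follows; buildExtensionArgs is a hypothetical helper name, not part of Puppeteer, and the directory names are placeholders.

```javascript
const path = require('path');

// Build the Chrome flags that load one or more unpacked extensions.
// Both flags accept a comma-separated list of directories.
// (buildExtensionArgs is an illustrative helper, not a Puppeteer API.)
function buildExtensionArgs(extensionDirs) {
  if (!Array.isArray(extensionDirs) || extensionDirs.length === 0) {
    throw new Error('At least one extension directory is required');
  }
  // Resolve to absolute paths so the flags do not depend on the working directory.
  const joined = extensionDirs.map((dir) => path.resolve(dir)).join(',');
  return [
    `--disable-extensions-except=${joined}`,
    `--load-extension=${joined}`,
  ];
}

// Example: print the flags for two placeholder directories.
console.log(buildExtensionArgs(['./my-extension', './another-extension']));
```

The returned array can be passed as args to puppeteer.launch(). For a browser started by GoLogin, the flags would have to go through that library's own launch options (its README describes an extra_params array for additional browser flags, but treat that as an assumption to verify against the version you use).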

Related

Credentials fail logging into Pandora via Puppeteer using Chrome or Chromium

The credentials are verified to be correct.
When I run my script, I have headless:false and am watching everything seemingly behave correctly, but I get an incorrect username/password error. The same holds true if I manually type in the creds in this SAME browser that pops open when my script starts.
If I switch to my installed browser, however, creds work and I can login no problem.
I am just learning Puppeteer and do not have much front end experience. My guess is either Pandora has some extra layers of protection for logging in via scripts, or something I don't know about headless browsers in general.
Here is my script:
const puppeteer = require('puppeteer');
const fs = require('fs');
const ini = require('ini');
const config = ini.parse(fs.readFileSync('./creds.ini', 'utf-8'));
const pandora_user = config.pandora.name;
const pandora_pass = config.pandora.pass;
(async () => {
const browser = await puppeteer.launch(
{
headless: false,
executablePath: "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
}
);
const page = await browser.newPage();
const timeout = 500000; // my internet is down, running off bluetooth phone connection, dont judge me here
page.setDefaultTimeout(timeout);
try {
await page.goto("https://www.pandora.com/account/sign-in");
await page.waitForSelector(`input[name=email]`, {timeout, visible: true});
await page.waitForSelector(`input[name=password]`, {timeout, visible: true});
await page.type(`input[name=email]`, pandora_user);
await page.type(`input[name=password]`, pandora_pass);
await Promise.all([
page.click(`button[name="login"]`),
page.waitForNavigation()
])
} catch (err) {
console.log(err);
} finally {
await browser.close();
}
})();
I have also tried adding ignoreDefaultArgs: ['--enable-automation'] to the launch options.

Node js speed up puppeteer html to pdf

I have a Node.js application that creates dynamic content that I want users to download as a PDF.
static async downloadPDF(res, html, filename) {
const puppeteer = require('puppeteer');
const browser = await puppeteer.launch({
headless: true
});
const page = await browser.newPage()
await page.setContent(html, {
waitUntil: 'domcontentloaded'
})
const pdfBuffer = await page.pdf({
format: 'A4'
});
res.set("Content-Disposition", "attachment;filename=" + filename + ".pdf");
res.setHeader("Content-Type", "application/pdf");
res.send(pdfBuffer);
await browser.close()
}
Is there a way to speed up the whole process? It takes about 10 seconds to create a PDF file of about 100 KB.
I read somewhere that I can launch the headless browser once and then only create a new page per request, instead of launching a browser every time the file is requested.
I cannot work out the correct way to do this.

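You could move page creation into a utility module and cache the page so that it is reused.
// getPage.js
const puppeteer = require('puppeteer');
let page;
const getPage = async () => {
  if (page) return page; // reuse the page created on the first call
  const browser = await puppeteer.launch({
    headless: true,
  });
  page = await browser.newPage();
  return page;
};
module.exports = getPage;
// In the module that defines downloadPDF (a static method of your class):
const getPage = require('./getPage');
static async downloadPDF(res, html, filename) {
  const page = await getPage()
  // ...set the content and generate the PDF as before, without closing the browser
}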
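One caveat with a single shared page is that two concurrent requests would overwrite each other's content. A possible variation, sketched below under that assumption, shares one browser but opens a short-lived page per request. createPageRunner and withPage are hypothetical names, and the launch function is injected so the helper can be tried without a real browser.

```javascript
// Sketch: share one browser across requests, but give each request its own page.
// `launch` is injected; in an application it would be something like
// () => puppeteer.launch({ headless: true }).
function createPageRunner(launch) {
  // Cache the promise (not the browser) so concurrent callers trigger one launch.
  let browserPromise = null;

  return async function withPage(task) {
    if (!browserPromise) {
      browserPromise = Promise.resolve()
        .then(launch)
        .catch((err) => {
          browserPromise = null; // allow a retry after a failed launch
          throw err;
        });
    }
    const browser = await browserPromise;
    const page = await browser.newPage();
    try {
      return await task(page);
    } finally {
      await page.close(); // always release the page, even if the task throws
    }
  };
}

// Possible use inside downloadPDF (names taken from the question):
// const pdfBuffer = await withPage(async (page) => {
//   await page.setContent(html, { waitUntil: 'domcontentloaded' });
//   return page.pdf({ format: 'A4' });
// });
```

Only the first request pays the browser start-up cost; later requests only pay for opening and closing a page.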

Yes, there is no reason to launch the browser every time. You can keep one Puppeteer instance running and have it load each new URL and read the content. Without a launch on every request, it will be much faster.
How do you implement this? Split your function into three steps:
Create a browser instance, headless or not. If you run the app in an X environment, you can launch a visible window to see what Puppeteer is doing.
Write a function that does the main task in a loop.
After each iteration is done, call await page.goto(url) (where "page" is the object returned by browser.newPage()) and run your function again.
This is one possible solution, written in a functional style:
Create the instances:
const browser = await puppeteer.launch( {'headless' : false });
const page = await browser.newPage();
await page.setViewport({'width' : 1280, 'height' : 1024 });
I put this inside an immediately invoked async function, like (async ()=>{})();
Get the data:
In my case, the set of URLs was stored in MongoDB. After fetching it, I ran a loop:
for( const entrie of entries)
{
const url = entrie[1];
const id = entrie[0];
await get_aplicants_data(page,url,id,collection);
}
In get_aplicants_data() I implemented the logic that depends on the loaded page:
await page.goto(url); // navigate to the URL
// ... code to process the page data
You can also load the URL inside the loop and then run your logic there.
I hope this helps.
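The loop described in this answer can be sketched roughly as follows. processEntries and the extract callback are hypothetical names, and the [id, url] entry shape is taken from the snippet above. The page argument only needs a goto() method, so the function can be tried with a stub before wiring in a real Puppeteer page.

```javascript
// Sketch: visit a list of [id, url] entries one after another on a single page.
// `extract` is a hypothetical callback that reads data from the loaded page.
// A failure on one URL is recorded and the loop continues with the next entry.
async function processEntries(page, entries, extract) {
  const results = [];
  for (const [id, url] of entries) {
    try {
      await page.goto(url); // reuse the same page for every URL
      const data = await extract(page, id);
      results.push({ id, ok: true, data });
    } catch (err) {
      results.push({ id, ok: false, error: err.message });
    }
  }
  return results;
}
```

Processing the entries sequentially keeps a single page busy and avoids opening one tab per URL.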

Avoid puppeteer detection

Is there any way to avoid being detected by a website that I am using puppeteer? I just can't navigate around the https://www.footlocker.ca/ website using puppeteer. I have tried using stealth plugin and random user-agents to no avail.
Any advice on what else I can try?

This website uses navigator.webdriver to check whether you are a real user or a bot, so you can use the code below to delete the navigator.webdriver value before any page script runs.
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({
headless: false,
});
const page = await browser.newPage();
await page.evaluateOnNewDocument(() => {
delete navigator.__proto__.webdriver;
});
await page.goto("https://www.footlocker.ca", {
waitUntil: "domcontentloaded",
});
})();
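A complementary approach is to ask Chrome not to mark itself as automated in the first place, using the --disable-blink-features=AutomationControlled switch. The sketch below only builds the launch options; buildLaunchOptions is a hypothetical helper name. Sites usually combine several signals, so this may not be enough on its own.

```javascript
// Sketch: launch options that ask Chrome not to flag itself as automated.
// (buildLaunchOptions is an illustrative helper, not a Puppeteer API.)
function buildLaunchOptions(extraArgs = []) {
  return {
    headless: false,
    args: [
      // Disables the Blink feature that sets navigator.webdriver to true.
      '--disable-blink-features=AutomationControlled',
      ...extraArgs,
    ],
  };
}

// Possible use: const browser = await puppeteer.launch(buildLaunchOptions());
console.log(buildLaunchOptions(['--window-size=1280,800']));
```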

How to invoke Chrome Node Screenshot from the console?

I know you can capture a single HTML node via the command prompt, but is it possible to do this programmatically from the console, similar to Puppeteer? I'd like to loop over all elements on a page and capture them, for occasional one-off projects where I don't want to set up a full auth process in Puppeteer.
I'm referring to the node screenshot functionality, but executed from the console, for example inside a forEach.
Something to the effect of this:
$x("//*[contains(@class, 'special-class-name')]").forEach((el)=> el.screenshot())

I just wrote a script that takes a screenshot of every submit button on the Google home page. Take a look and use it as inspiration.
const puppeteer = require('puppeteer')
;(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    devtools: true,
    args: ['--window-size=1920,1170', '--window-position=0,0']
  })
  const page = (await browser.pages())[0]
  await page.goto('https://www.google.com')
  const submit = await page.$$('input[type="submit"]')
  let num = 0
  // for...of awaits each screenshot in turn; forEach would not wait for async callbacks
  for (const elemHandle of submit) {
    num++
    await elemHandle.screenshot({
      path: `${Date.now()}_${num}.png`
    })
  }
  await browser.close()
})()

You can use ElementHandle.screenshot() to take a screenshot of a specific element on the page. The ElementHandle can be obtained from Page.$(selector) or Page.$$(selector) if you want to return multiple results.
const puppeteer = require("puppeteer");
(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto("https://stackoverflow.com/questions/50715164");
  const userInfo = await page.$(".user-info");
  await userInfo.screenshot({ path: "userInfo.png" });
  await browser.close();
})();

How do I implement NTLM authentication using Puppeteer?

I would like to use Puppeteer to automate testing with a site that requires NTLM authentication. It appears the page.authenticate() API only accepts a username and password but no domain. Does anyone have any suggestions or tips?

So far, the only approach that has worked for me is to use cntlm:
Install cntlm: http://cntlm.sourceforge.net/
Configure cntlm, following the configuration hints on that page.
Run cntlm.
Then use it in Puppeteer like this:
const puppeteer = require('puppeteer');
async function run() {
const browser = await puppeteer.launch({
args: [
"--proxy-server=localhost:3133" // the cntlm proxy defined in cntlm.conf
]
});
const page = await browser.newPage();
await page.goto('http://example.com');
await page.screenshot({ path: 'screenshots/example.png' });
await browser.close();
}
run().catch(console.error);
