Some text missing from loaded page - puppeteer

I am using puppeteer to go to URL:https://www.booking.com/hotel/us/l-39-horizon-resort-amp-spa.en-gb.html
The following text in the hotel description sometimes shows up and sometimes not
Hotel Chain: The Leading Hotels of the World
Does anyone knows why this happens?
const puppeteer = require('puppeteer');
let bookingUrl = 'https://www.booking.com/hotel/us/l-39-horizon-resort-amp-spa.en-gb.html';
(async () => {
const browser = await puppeteer.launch({ headless: false ,slowMo: 250});
const page = await browser.newPage();
await page.goto(bookingUrl, { waitUntil : 'networkidle2' });
await delay(4000);
});

Have you tried waiting for it?
const selector = '#hotel_main_content > div.hp_hotel_description_hightlights_wrapper > div.hotel_description_wrapper_exp.hp-description > div.hp_desc_main_content > p.summary.hotel_meta_style';
await page.waitFor(selector, { visible:true });
A consideration about my code: I copied the elector with the Chrome devtools but it could change with time, make a try and if it works, refine it.

Related

how to load extension using puppeteer.connect() method

i use gologin service. gologin is a browser antidetect service where I can fake my browser identity / can manage browser fingerprint.
so I can freely do web-scraping without being detected.
in this case I want to be able to load my extension into that browser using the puppeteer.connect() method.
here's the code:
const puppeteer = require('puppeteer-core');
const GoLogin = require('gologin');
(async () => {
const GL = new GoLogin({
token: 'yU0token',
profile_id: 'yU0Pr0f1leiD',
});
const { status, wsUrl } = await GL.start();
const browser = await puppeteer.connect({
browserWSEndpoint: wsUrl.toString(),
ignoreHTTPSErrors: true,
});
const page = await browser.newPage();
await page.goto('https://myip.link/mini');
console.log(await page.content());
await browser.close();
await GL.stop();
})();
I don't know how. please help me, so i can load my extension using this puppeteer.connect()
Assume your wish is loading chrome-extension into your puppeteer browser.
Find chrome-extension Working Directory Where does Chrome store extensions?
Find your extension ID by go to chrome://extensions/
Sample code:
const puppeteer = require('puppeteer-core');
const MY_EXTENSION_PATH = '~/Library/Application Support/Google/Chrome/Default/Extensions/cdockenadnadldjbbgcallicgledbeoc/0.3.38_0'
async function loadExtension() {
return puppeteer.launch({
headless: 0,
args: [
`--disable-extensions-except=${MY_EXTENSION_PATH}`,
`--load-extension=${MY_EXTENSION_PATH}`,
],
});
}

puppeteer: Different response from the puppeteer browser and the user browser

I use my own browser to get the result page I want. Everything is correct. Page link is below.
https://parcelsapp.com/en/tracking/016-35294405
img for working
I want to use puppeteer to help me to load the result page. The page shows differently.
I use options headless=false to debug. I found the browser pop up from puppeteer can not load the url correctly. I guess it is because the different environments. How can I solve the problem? Thank you.
img for not working
My code is below:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
headless: false,
slowMo: 250, // slow down by 250ms
executablePath: '/usr/bin/google-chrome-stable',
});
const page = await browser.newPage();
await page.on("request", (request) => {
request.abort();
});
await page.goto('https://parcelsapp.com/en/tracking/016-35294405');
await page.waitForNavigation()
await page.screenshot({ path: 'result.png' });
await browser.close();
})();

Avoid puppeteer detection

Is there any way to avoid being detected by a website that I am using puppeteer? I just can't navigate around the https://www.footlocker.ca/ website using puppeteer. I have tried using stealth plugin and random user-agents to no avail.
Any advice on what else I can try?
This website use navigator.webdriver to check if you are real user or bot. so you can use the code below to delete navigator.webdriver value. docs.
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch({
headless: false,
});
const page = await browser.newPage();
await page.evaluateOnNewDocument(() => {
delete navigator.__proto__.webdriver;
});
await page.goto("https://www.footlocker.ca", {
waitUntil: "domcontentloaded",
});
})();

Using puppeteer get a selector from an input label

So, I have a front end in React.js or Ember.js - not sure. I'm just trying to automate some testing for it. When looking at the HTML in the Chrome Dev Tools, I see
<label class="MuiFormLabel-root-16 MuiInputLabel-root-5 MuiInputLabel-formControl-10 MuiInputLabel-animated-13 MuiInputLabel-outlined-15" data-shrink="false">Username</label>
This is set in an iframe (which isn't too important for this issue). I'm trying to get the ElementHandler using the puppeteer function
frame.$(selector)
How do I get the selector given
Username
I've tried a few things, but with little success.
If I understand correctly, you need to find an element by its text content. If so, these are at least two ways:
const puppeteer = require('puppeteer');
(async function main() {
try {
const browser = await puppeteer.launch(); // { headless: false, defaultViewport: null, args: ['--lang=en'] }
const [page] = await browser.pages();
await page.goto('https://example.org/');
const element1 = await page.evaluateHandle(() => {
return [...document.querySelectorAll('h1')].find(h1 => h1.innerText === 'Example Domain');
});
console.log(await (await element1.getProperty('outerHTML')).jsonValue());
const [element2] = await page.$x('//h1[text()="Example Domain"]');
console.log(await (await element2.getProperty('outerHTML')).jsonValue());
await browser.close();
} catch (err) {
console.error(err);
}
})();
My solution was to place HTML data attributes in the front end code so that I can easily select them.

How to invoke Chrome Node Screenshot from the console?

I know you can capture a single html node vial the command prompt, but is it possible to do this programmatically from the console similar to Puppeteer? I'd like to loop all elements on a page and capture them for occasional one-off projects where I don't want to set up a full auth process in puppeteer.
I'm referring to this functionality:
But executed from the console like during a foreach or something like that.
See the puppeteer reference here.
Something to the effect of this:
$x("//*[contains(#class, 'special-class-name')]").forEach((el)=> el.screenshot())
I just made a script that take a screenshot every submit button in Google main page. Just take a look and take some inspiration from it.
const puppeteer = require('puppeteer')
;(async () => {
const browser = await puppeteer.launch({
headless:false,
defaultViewport:null,
devtools: true,
args: ['--window-size=1920,1170','--window-position=0,0']
})
const page = (await browser.pages())[0]
const open = await page.goto ( 'https://www.google.com' )
const submit = await page.$$('input[type="submit"]')
const length = submit.length
let num = 0
const shot = submit.forEach( async elemHandle => {
num++
await elemHandle.screenshot({
path : `${Date.now()}_${num}.png`
})
})
})()
You can use ElementHandle.screenshot() to take a screenshot of a specific element on the page. The ElementHandle can be obtained from Page.$(selector) or Page.$$(selector) if you want to return multiple results.
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto("https://stackoverflow.com/questions/50715164");
const userInfo = await page.$(".user-info");
await userInfo.screenshot({ path: "userInfo.png" });
The output image after executing the code: