Cannot find selector in browserless.io

In Puppeteer, I am trying to get a p element on the page, which is created dynamically. Locally, I launch a Puppeteer instance with:
const browser = await puppeteer.launch({
  headless: false,
  timeout: 90000,
  args: [`--no-sandbox`, `--disable-setuid-sandbox`],
});
I can reach the element in the page by:
await page.goto(URL);
await page.waitForSelector(`#selectorName`, {
  timeout: 90000,
});
const res = await page.evaluate(() => {
  const p = document.querySelector(`#selectorName`);
  return p.innerHTML;
});
I want to use browserless.io for my prod environment, so I am connecting to the instance as below, instead of launching Chromium locally:
puppeteer.connect({ browserWSEndpoint: 'wss://chrome.browserless.io?token=API_TOKEN' })
However, in this setup waitForSelector times out. The URL is the same, and I checked (via page.url()) that the page still opens the desired URL; the p element is still there when I open the page locally. Do I need to get the selector another way when I render my page in browserless.io?
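For reference, a minimal sketch of the full connect-based flow, stitched together from the snippets above (URL and API_TOKEN are placeholders from the question; the networkidle2 wait and the long navigation timeout are assumptions meant to rule out late-loading scripts, not a confirmed fix):

const puppeteer = require('puppeteer');

(async () => {
  // Connect to the remote browserless.io instance instead of launching locally.
  const browser = await puppeteer.connect({
    browserWSEndpoint: 'wss://chrome.browserless.io?token=API_TOKEN', // placeholder
  });
  const page = await browser.newPage();
  // Wait until network activity settles, in case the element is created
  // by a script that loads more slowly on the remote instance (assumption).
  await page.goto(URL, { waitUntil: 'networkidle2', timeout: 90000 });
  await page.waitForSelector('#selectorName', { timeout: 90000 });
  const res = await page.$eval('#selectorName', (p) => p.innerHTML);
  console.log(res);
  await browser.close();
})();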

Related

How to simulate a browser crash with Puppeteer?

I have a lot of tests that are being run with Jest and Puppeteer. I'm creating a new browser instance with puppeteer.launch() before each test, in order to avoid situations where re-using the browser results in flaky behavior. When I run all my tests together, occasionally the browser crashes after launching, and I can detect this by listening for the browser.on('disconnected', ...) event.
I'm now trying to handle that case and re-create the browser if needed. The problem is: how can I force the Chromium browser to crash and emit the same disconnected event? I've tried loading the page with special chrome:// URLs (see below).
Here's the simple example I'm working on:
import puppeteer from 'puppeteer';

async function getPage(url) {
  const browser = await puppeteer.launch({});
  const page = await browser.newPage();
  // on() registers the listener synchronously; there is nothing to await here.
  browser.on('disconnected', () => {
    console.log('############ browser crashed #############');
  });
  await page.goto(url, { waitUntil: 'load' });
  return page;
}
describe('Raw Puppeteer Test', () => {
  let page;

  beforeEach(async () => {
    page = await getPage('chrome://kill/');
  });

  afterEach(async () => {
    await page.close();
    await page.browser().close();
  });

  it('should relaunch the browser and re-run if the browser crashes', async () => {
    const el = await page.waitForSelector('h1');
    expect(el).not.toBeNull();
  });
});
While chrome://kill/ and chrome://crash/ do appear to crash the Chromium browser, neither emits the disconnected event that a real crash would. I also tried chrome://inducebrowsercrashforrealz, but that just results in a net::ERR_INVALID_URL error.
Is there a way to simulate a failed browser initialization in Puppeteer, that mimics a real-life crash?
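Not from the original post, but one way to trigger a genuine disconnected event is to kill the underlying Chromium process from the outside via browser.process(). A minimal sketch (SIGKILL is an assumption; the point is that the process dies abruptly, as in a real crash):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  browser.on('disconnected', () => {
    console.log('############ browser crashed #############');
  });
  // browser.process() returns the spawned Chromium child process;
  // killing it from outside Puppeteer mimics an abrupt, real-life crash.
  browser.process().kill('SIGKILL');
})();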

puppeteer: Different response from the puppeteer browser and the user browser

I use my own browser to get the result page I want, and everything is correct. The page link is below.
https://parcelsapp.com/en/tracking/016-35294405
[screenshot: the page loads correctly in my own browser]
I want to use Puppeteer to help me load the result page, but the page shows differently.
I used headless: false to debug and found that the browser Puppeteer pops up cannot load the URL correctly. I guess it is because of the different environments. How can I solve the problem? Thank you.
[screenshot: the page fails to load in the Puppeteer-launched browser]
My code is below:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    slowMo: 250, // slow down by 250ms
    executablePath: '/usr/bin/google-chrome-stable',
  });
  const page = await browser.newPage();
  // NOTE: request.abort() throws unless page.setRequestInterception(true)
  // is called first, and aborting every request stops the page from
  // loading anything at all.
  page.on("request", (request) => {
    request.abort();
  });
  await page.goto('https://parcelsapp.com/en/tracking/016-35294405');
  await page.waitForNavigation();
  await page.screenshot({ path: 'result.png' });
  await browser.close();
})();
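A minimal sketch of what I would try first (assumptions: the abort-all request handler is the culprit, and the tracking page is a single-page app whose results arrive after the initial navigation, so a networkidle2 wait replaces the separate waitForNavigation call):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    executablePath: '/usr/bin/google-chrome-stable',
  });
  const page = await browser.newPage();
  // No request handler here: let every resource load normally.
  await page.goto('https://parcelsapp.com/en/tracking/016-35294405', {
    waitUntil: 'networkidle2', // wait until network activity settles
    timeout: 60000,
  });
  await page.screenshot({ path: 'result.png' });
  await browser.close();
})();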

In Puppeteer how to capture Chrome browser log in the console

I'm trying to collect Chrome browser logs: browser-issued warnings such as deprecations and interventions. For example, for the site https://uriyaa.wixsite.com/corvid-cli2:
A cookie associated with a cross-site resource at http://wix.com/ was set without the `SameSite` attribute.
A future release of Chrome will only deliver cookies with cross-site requests if they are set with `SameSite=None` and `Secure`.
You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.
I thought the following code would do the trick, but it only catches logs generated by the page code.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ dumpio: true });
  const page = await browser.newPage();
  page.on('console', (msg) => {
    // msg.args() is the public accessor; msg._args is a private field.
    for (let i = 0; i < msg.args().length; ++i)
      console.log(`${i}: ${msg.args()[i]}`);
  });
  await page.goto('https://uriyaa.wixsite.com/corvid-cli2', { waitUntil: 'networkidle2', timeout: 20000 });
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();
The below turned out not to be relevant as I'd thought: ReportingObserver does not catch the Chrome info on cookies without SameSite.
Reading on the subject led me to https://developers.google.com/web/updates/2018/07/reportingobserver, but I'm not sure how to use it; running the example in the browser console didn't work.
I'm not sure in which context the observer code should be used, whether the browser needs a flag to activate the Reporting API, or if this is the way to go about it at all.
Help is welcome.
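For reference, a minimal sketch of how ReportingObserver is typically wired up in the page context, based on the linked article (an illustration only; as noted above, it does not surface the SameSite cookie warning):

// Runs in the page itself, e.g. pasted into DevTools or injected
// with page.evaluateOnNewDocument(). Reports deprecations/interventions.
const observer = new ReportingObserver((reports) => {
  for (const report of reports) {
    console.log(report.type, report.url, report.body);
  }
}, { types: ['deprecation', 'intervention'], buffered: true });
observer.observe();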
Presumably, the 'console' event only catches console.log() and similar calls from the pages. But it seems you can catch warnings from the browser via a CDPSession with the Log domain. Unfortunately, it works for me only with a headful browser:
'use strict';

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch({ headless: false });
    const [page] = await browser.pages();
    const cdp = await page.target().createCDPSession();
    await cdp.send('Log.enable');
    cdp.on('Log.entryAdded', async ({ entry }) => {
      console.log(entry);
    });
    await page.goto('https://uriyaa.wixsite.com/corvid-cli2');
  } catch (err) {
    console.error(err);
  }
})();
And one of the entries:
{
  source: 'other',
  level: 'warning',
  text: 'A cookie associated with a cross-site resource at http://www.wix.com/ was set without the `SameSite` attribute. It has been blocked, as Chrome now only delivers cookies with cross-site requests if they are set with `SameSite=None` and `Secure`. You can review cookies in developer tools under Application>Storage>Cookies and see more details at https://www.chromestatus.com/feature/5088147346030592 and https://www.chromestatus.com/feature/5633521622188032.',
  timestamp: 1589058118372.802,
  url: 'https://uriyaa.wixsite.com/corvid-cli2'
}
When you launch the Puppeteer browser, you should set dumpio to true inside the options object:
await puppeteer.launch({ dumpio: true });
This will basically "pipe browser process stdout and stderr into process.stdout and process.stderr", which means it will redirect browser logs to whatever main process, server, etc. you are running.
You can see this and the other options available when launching Puppeteer here: https://www.puppeteersharp.com/api/PuppeteerSharp.LaunchOptions.html

How to invoke Chrome Node Screenshot from the console?

I know you can capture a single HTML node via the command prompt, but is it possible to do this programmatically from the console, similar to Puppeteer? I'd like to loop over all elements on a page and capture them for occasional one-off projects where I don't want to set up a full auth process in Puppeteer.
I'm referring to this functionality:
But executed from the console, e.g. during a forEach or something like that.
See the puppeteer reference here.
Something to the effect of this:
$x("//*[contains(#class, 'special-class-name')]").forEach((el)=> el.screenshot())
I just made a script that takes a screenshot of every submit button on Google's main page. Just take a look and take some inspiration from it.
const puppeteer = require('puppeteer')

;(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    devtools: true,
    args: ['--window-size=1920,1170', '--window-position=0,0']
  })
  const page = (await browser.pages())[0]
  await page.goto('https://www.google.com')
  const submit = await page.$$('input[type="submit"]')
  // forEach with an async callback fires all the screenshots at once
  // without awaiting them; a for...of loop takes them one at a time.
  let num = 0
  for (const elemHandle of submit) {
    num++
    await elemHandle.screenshot({
      path: `${Date.now()}_${num}.png`
    })
  }
})()
You can use ElementHandle.screenshot() to take a screenshot of a specific element on the page. The ElementHandle can be obtained from Page.$(selector) or Page.$$(selector) if you want to return multiple results.
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto("https://stackoverflow.com/questions/50715164");
const userInfo = await page.$(".user-info");
await userInfo.screenshot({ path: "userInfo.png" });
The output image after executing the code:

Autodesk forge viewer in headless chrome, CONTEXT_LOST_WEBGL

I'm trying to use headless Chrome (v66 on Win10, using C#) to take a series of screenshots of a 3D model in the Forge Autodesk viewer.
The problem I'm facing: once the model is loaded, I set the camera to the first position, take a screenshot, and then try to set the camera to the next position for the next screenshot. Once I try that (setting the camera position later than on the initial load), the WebGL context is lost.
I have too little knowledge of WebGL / SwiftShader / etc., but what I find frustrating is that when I position the camera directly after the load, it does work. (I.e., the workaround is to spawn a separate headless session per camera view, but since loading the geometry takes 20 seconds or more, that's not preferred.)
So, this:
viewerApp.myCurrentViewer.addEventListener(Autodesk.Viewing.GEOMETRY_LOADED_EVENT,
  function () {
    _viewer = viewerApp.myCurrentViewer;
    SetPerspective();
    SetCamera(cams[0].position, cams[0].target); // no probs here
    document.getElementById('MyViewerDiv').classList.add("geometry-loaded");
  });
works (the camera is positioned), but when I execute a JavaScript function later (using driver.ExecuteScript($"SetCamera({JsonConvert.SerializeObject(target.Value.Position)},{JsonConvert.SerializeObject(target.Value.Target)});"); or on a timeout in the page itself), it outputs WebGL: CONTEXT_LOST_WEBGL: loseContext: context lost.
When I use a smaller model, everything works. Thus I think the reason is too much memory/processing consumption, but why does it work at all then?
Looking at the resource monitor, I'm not convinced that the consumption is actually problematic; my laptop should be capable (i7-7700HQ, GTX 1050, 16 GB RAM). I tried fiddling with some GPU and GL flags of Chrome, to no avail. I suspect the GPU isn't used (though I found some posts saying it actually can be used in headless mode). Also, the Forge viewer outputs the GPU memory used, but that might be just a log message:
Starting ChromeDriver 2.38.552522 (437e6fbedfa8762dec75e2c5b3ddb86763dc9dcb) on port 62676
Only local connections are allowed.
[0517/203535.902:ERROR:gpu_process_transport_factory.cc(1007)] Lost UI shared context.
DevTools listening on ws://127.0.0.1:12556/devtools/browser/5b66c120-dc64-4211-a207-ac97152ace9a
---some ssl future warnings---
[0517/203540.524:INFO:CONSOLE(2)] "THREE.WebGLRenderer", source: https://developer.api.autodesk.com/modelderivative/v2/viewers/three.min.js (2)
[0517/203543.074:INFO:CONSOLE(0)] "[.Offscreen-For-WebGL-00000237DECBB270]RENDER WARNING: there is no texture bound to the unit 0", source: http://localhost:8881/Content/Screenshot.html
[0517/203543.074:INFO:CONSOLE(0)] "[.Offscreen-For-WebGL-00000237DECBB270]RENDER WARNING: there is no texture bound to the unit 0", source: http://localhost:8881/Content/Screenshot.html
[0517/203552.280:INFO:CONSOLE(2)] "Total geometry size: 8.434013366699219 MB", source: https://developer.api.autodesk.com/modelderivative/v2/viewers/three.min.js (2)
[0517/203552.281:INFO:CONSOLE(2)] "Number of meshes: 2909", source: https://developer.api.autodesk.com/modelderivative/v2/viewers/three.min.js (2)
[0517/203552.281:INFO:CONSOLE(2)] "Num Meshes on GPU: 2908", source: https://developer.api.autodesk.com/modelderivative/v2/viewers/three.min.js (2)
[0517/203552.281:INFO:CONSOLE(2)] "Net GPU geom memory used: 7494392", source: https://developer.api.autodesk.com/modelderivative/v2/viewers/three.min.js (2)
[0517/203558.143:INFO:CONSOLE(0)] "WebGL: CONTEXT_LOST_WEBGL: loseContext: context lost", source: http://localhost:8881/Content/Screenshot.html
To be complete: running the same program without the --headless flag works fine, so I guess the code itself is OK.
Is there any way to increase the allowed resources, or anything else I can try?
(code for SetCamera)
function SetCamera(newPos, newTarget) {
  nav = nav || viewerApp.myCurrentViewer.navigation;
  nav.setPosition(newPos);
  nav.setTarget(newTarget);
  nav.orientCameraUp();
}
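As a side note, not from the original post: a page can opt into WebGL context restoration by cancelling the default handling of the webglcontextlost event. A minimal sketch, assuming the viewer's canvas lives inside the MyViewerDiv element used above (whether the Forge viewer can actually recover its state afterwards is not something I can confirm):

var canvas = document.querySelector('#MyViewerDiv canvas');
canvas.addEventListener('webglcontextlost', function (e) {
  // Without preventDefault() the context is gone for good; with it,
  // the browser may fire webglcontextrestored later.
  e.preventDefault();
}, false);
canvas.addEventListener('webglcontextrestored', function () {
  console.log('WebGL context restored');
}, false);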
EDIT: Test case (currently on a test website, so this will be deleted at some point)
EDIT 2: Result of running the code below
NodeJS:
try {
  const URN = '';
  const Token = '';
  (async () => {
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    console.log('browsing');
    await page.goto('https://rogerintelligentcloud.azurewebsites.net/test?urn=' + URN + '&token=' + Token);
    // replace URN and token above to point to your model
    console.log("waiting");
    await page.mainFrame().waitForSelector(
      '.geometry-loaded', {
        timeout: 60000
      });
    await takescreen(page, 'nodetest1');
    await takescreen(page, 'nodetest2');
    await takescreen(page, 'nodetest3');
    await takescreen(page, 'nodetest4');
    await takescreen(page, 'nodetest5');
    await takescreen(page, 'nodetest6');
    await page.evaluate("Test();");
    await takescreen(page, 'nodetest11');
    await takescreen(page, 'nodetest12');
    await takescreen(page, 'nodetest13');
    await takescreen(page, 'nodetest14');
    await takescreen(page, 'nodetest15');
    await takescreen(page, 'nodetest16');
    await browser.close();
  })().catch(e => console.log(e)); // the outer try/catch cannot see rejections from the async IIFE
} catch (e) {
  console.log(e);
}

async function takescreen(page, name) {
  await page.screenshot({
    path: 'c:\\temp\\' + name + '.png'
  });
}
I didn't see any WebGL-related error messages show up with your snippet and the rme_advanced_sample_project.rvt model; the only one I saw was page.delay is not defined. Here is my test code, modified from your code snippet and Philippe's forge-viewer-headless demo. If I missed something, please kindly point it out. Thanks~
import puppeteer from 'puppeteer';
import 'babel-polyfill';
import path from 'path';
import os from 'os';

try {
  const URN = 'YOUR_URN';
  const Token = 'YOUR_TOKEN';
  (async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(`https://rogerintelligentcloud.azurewebsites.net/test?urn=${ URN }&token=${ Token }`);
    // replace URN and token above to point to your model
    await page.mainFrame().waitForSelector(
      '.geometry-loaded', {
        timeout: 60000
      });
    await delay(3000);
    await page.screenshot({
      path: path.join(os.tmpdir(), 'nodetest.png')
    });
    await page.evaluate("Test();");
    await page.evaluate("Test();");
    await page.evaluate("Test();");
    await page.evaluate("Test();");
    await delay(3000);
    const targetTxt = await page.evaluate(() => document.querySelector('body > :last-child').innerText);
    console.log(targetTxt);
    const targetLen = await page.evaluate(() => document.querySelectorAll('body > div:not(.box)').length);
    console.log(targetLen);
    await page.screenshot({
      path: path.join(os.tmpdir(), 'nodetest2.png')
    });
    await browser.close();
  })();

  function delay(timeout) {
    return new Promise((resolve) => {
      setTimeout(resolve, timeout);
    });
  }
} catch (e) {
  console.log(e);
}
Snapshots from my test result: nodetest.png and nodetest2.png (images omitted).
Edit 2:
Tested with your code, and it works fine on my machine. But one change was needed to run it properly in my environment; I modified your takescreen function definition:
function takescreen(page, name) {
  return page.screenshot({
    path: 'c:\\temp\\' + name + '.png'
  });
}