I just want to use XPath to get the innerText of an element using Puppeteer. This is the code:
import * as puppeteer from 'puppeteer-core';

(async () => {
  // Make the browser visible by default, extend the timeout, and set a default viewport size
  const browser = await puppeteer.launch({
    executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
    userDataDir: 'C:\\ctvbanhang\\browserData',
    defaultViewport: { width: 1920, height: 1080 },
    headless: false, // true = hide screen, false = show screen
    timeout: 60000, // 60 seconds
  });

  // The browser automatically opens a page, so use that
  const page = (await browser.pages())[0];
  await page.goto('https://example.com/');

  const XPath = '//h1/text()'; // e.g. 'div.product-briefing > div > div > div > span'
  // await page.waitForSelector(selector);
  await page.waitForXPath(XPath);

  let result = await page.evaluate(element => {
    console.log(element); // logs in the browser
    console.log(typeof element); // logs in the browser
    console.log(JSON.stringify(element)); // logs in the browser
    return element;
  }, (await page.$x(XPath))[0]);

  console.log(result); // logs in the terminal

  await page.waitFor(100000);
  await browser.close();
})()
  .then(() => {
    console.log('Browser scans complete!');
  });
Why is the result logged in the browser not the same as the result logged in the terminal?
According to the docs, the various eval functions can transfer only serializable data (roughly, the data JSON can handle, with some additions). Your code returns a DOM element (a Text node), which is not serializable (it has methods and circular references). Retrieve the data in the browser context and return only serializable data. For example:
return element.wholeText;
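A minimal sketch of the whole call returning only the text, assuming the //h1/text() XPath from the question:

const [textNode] = await page.$x(XPath); // XPath = '//h1/text()'
const result = await page.evaluate(node => node.wholeText, textNode);
console.log(result); // a plain string, so it survives the transfer back to Node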
Using Puppeteer version "9.0.0".
Unfortunately, debugging in Chrome DevTools does not work at all with this Puppeteer version, so I resorted to ndb.
Using ndb I can set breakpoints anywhere apart from inside the page.evaluate() and page.$$eval() callbacks.
Running the script with ndb:
"scripts": {
"test": "echo \"Error: no test specified\" && exit 1",
"start": "ndb node startscrape.js"
},
startscrape.js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: null,
    slowMo: 250,
    devtools: true,
  });
  const page = await browser.newPage();
  await page.goto('https://www.google.com');
  await page.type('input', 'Here');
  await page.keyboard.press('Enter');
  await page.waitForNavigation();

  let x = () => {
    debugger;
    console.log('can I debug here?'); // YES - breakpoints work upon executing x();
  };
  x();

  let xa = await page.evaluate(() => {
    console.log('Alive'); // Logging works in the browser console, but I cannot break here
    let elements = document.getElementsByClassName('someitem');
    return elements;
  });

  // Cannot debug inside here either
  // let xa = await page.$$eval('body', (body) => {
  //   console.log('Alive');
  //   let elements = document.getElementsByClassName('serp-item');
  //   return elements;
  // });

  // breakpoints work again here
  await page.goto('https://www.google.com');
  // await browser.waitForTarget(() => false);
})();
It seems the function arguments of page.evaluate() and similar methods are not executed as-is: their serialized (stringified) source code is transferred from the Node.js context into the browser context, a new function is recreated from that code, and that copy is executed there. That is why breakpoints set on the original function have no effect on the recreated one.
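As a workaround sketch (this follows Puppeteer's general debugging advice rather than anything in the snippet above): a debugger; statement written inside the callback is part of the serialized source, so it survives the round trip, and launching headful with devtools: true lets the page's own DevTools pause there:

const browser = await puppeteer.launch({ headless: false, devtools: true });
const page = await browser.newPage();
await page.goto('https://www.google.com');
await page.evaluate(() => {
  debugger; // pauses in the page's DevTools, not in ndb/Node
  return document.title;
});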
I have a Node.js application that creates dynamic content which I want users to download as a PDF.
static async downloadPDF(res, html, filename) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({
    headless: true
  });
  const page = await browser.newPage();
  await page.setContent(html, {
    waitUntil: 'domcontentloaded'
  });
  const pdfBuffer = await page.pdf({
    format: 'A4'
  });
  res.set('Content-Disposition', 'attachment;filename=' + filename + '.pdf');
  res.setHeader('Content-Type', 'application/pdf');
  res.send(pdfBuffer);
  await browser.close();
}
Is there a way to speed up the whole process? It takes about 10 seconds to create a PDF file of about 100 kB.
I read somewhere that I can launch the headless browser once and then only create a new page instead of launching a browser every time the file is requested, but I cannot figure out the correct way of doing it.
You could move page creation into a small utility module and hoist (cache) the instance so it is reused:
// getPage.js
const puppeteer = require('puppeteer');

let page;

const getPage = async () => {
  if (page) return page; // reuse the already-created page
  const browser = await puppeteer.launch({
    headless: true,
  });
  page = await browser.newPage();
  return page;
};

module.exports = getPage;
Then use it in the download handler:
const getPage = require('./getPage');

static async downloadPDF(res, html, filename) {
  const page = await getPage();
  // ...
}
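For completeness, a sketch of what the handler body might then look like, assuming requests are handled one at a time (a single shared page cannot render two PDFs concurrently):

static async downloadPDF(res, html, filename) {
  const page = await getPage(); // reuses the hoisted browser/page
  await page.setContent(html, { waitUntil: 'domcontentloaded' });
  const pdfBuffer = await page.pdf({ format: 'A4' });
  res.set('Content-Disposition', 'attachment;filename=' + filename + '.pdf');
  res.setHeader('Content-Type', 'application/pdf');
  res.send(pdfBuffer);
  // Note: no browser.close() here - the browser stays alive for the next request.
}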
Yes, there is no reason to launch a browser every time. You can point Puppeteer at a new URL and fetch the content; without launching every time, it will be much faster.
How to implement this? Split your function into three steps (a consolidated sketch follows at the end of this answer):
Create a browser instance. It doesn't matter whether it is headless or not. If you run the app in an X environment, you can launch a visible window to see what your Puppeteer does.
Create a function that does the main task in a loop.
After a block is done, call await page.goto(url) (where page is the instance from browser.newPage()) and run your function again.
This is one possible solution in a functional style.
Create the instances:
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
page.setViewport({ width: 1280, height: 1024 });
I put this in an immediately invoked async function, like (async () => { ... })();
Get the data.
In my case, a set of URLs was stored in MongoDB; after fetching them, I ran a loop:
for (const entrie of entries) {
  const url = entrie[1];
  const id = entrie[0];
  await get_aplicants_data(page, url, id, collection);
}
In get_aplicants_data() I implemented the logic based on the loaded page:
await page.goto(url); // navigate to the url
// ... code to process the page data
You can also load the URL inside the loop and then apply your logic there.
Hope I have given you some help.
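Here is a minimal, self-contained sketch of that pattern, with the MongoDB fetch replaced by a hard-coded list and process_page standing in (hypothetically) for get_aplicants_data:

const puppeteer = require('puppeteer');

(async () => {
  // Launch the browser once and reuse a single page for every URL
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.setViewport({ width: 1280, height: 1024 });

  const urls = ['https://example.com/a', 'https://example.com/b']; // stand-in for the MongoDB entries

  for (const url of urls) {
    await page.goto(url); // only the navigation is repeated, not the launch
    await process_page(page); // hypothetical per-page logic
  }

  await browser.close();
})();

async function process_page(page) {
  // e.g. extract something from the loaded page
  const title = await page.title();
  console.log(title);
}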
So, I have a front end in React.js or Ember.js (I'm not sure which); I'm just trying to automate some testing for it. When looking at the HTML in the Chrome DevTools, I see:
<label class="MuiFormLabel-root-16 MuiInputLabel-root-5 MuiInputLabel-formControl-10 MuiInputLabel-animated-13 MuiInputLabel-outlined-15" data-shrink="false">Username</label>
This is inside an iframe (which isn't too important for this issue). I'm trying to get the ElementHandle using the Puppeteer function
frame.$(selector)
How do I build the selector given only the text Username? I've tried a few things, but with little success.
If I understand correctly, you need to find an element by its text content. If so, here are at least two ways:
const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch(); // { headless: false, defaultViewport: null, args: ['--lang=en'] }
    const [page] = await browser.pages();
    await page.goto('https://example.org/');

    const element1 = await page.evaluateHandle(() => {
      return [...document.querySelectorAll('h1')].find(h1 => h1.innerText === 'Example Domain');
    });
    console.log(await (await element1.getProperty('outerHTML')).jsonValue());

    const [element2] = await page.$x('//h1[text()="Example Domain"]');
    console.log(await (await element2.getProperty('outerHTML')).jsonValue());

    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
My solution was to place HTML data attributes in the front end code so that I can easily select them.
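For instance (a sketch: the data-testid attribute name and the frame lookup are assumptions, not from the original markup):

// In the front-end markup:
// <label data-testid="username-label" data-shrink="false">Username</label>

// In the test:
const frame = page.frames().find(f => f.url().includes('my-iframe')); // hypothetical iframe lookup
const label = await frame.$('[data-testid="username-label"]');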
Can someone explain why this code isn't working? I have a console.log before I run page.evaluate() which logs what I expect, but the console.log inside page.evaluate() never runs.
const puppeteer = require('puppeteer');

(async () => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.example.com');

    page.on('response', async response => {
      const url = response.url();
      if (url.includes('something')) {
        console.log('this code runs');
        await page.evaluate(() => {
          console.log("this code doesn't run");
        });
      }
    });
  } catch (err) {
    console.log(err);
  }
})();
console.log doesn't work inside page.evaluate(): https://github.com/GoogleChrome/puppeteer/issues/1944
Try this code to display the console.log output from evaluate:
page.on('console', msg => {
  for (let i = 0; i < msg.args().length; ++i)
    console.log(`${i}: ${msg.args()[i]}`);
});
page.evaluate(() => console.log('hello', 5, { foo: 'bar' }));
https://pptr.dev/#?product=Puppeteer&version=v1.20.0&show=api-event-console
The code inside page.evaluate() runs in the browser context, so the console.log does work, but it logs to the Chrome console rather than the Puppeteer (Node) one.
To display logs from the Chrome context in the Puppeteer console, you can set dumpio to true in the launch options:
const browser = await puppeteer.launch({
  dumpio: true
});
console.log works, but in the browser context. I'm guessing you are trying to see the log in the CLI; if you want to see it, set headless to false and look at the browser's console instead.
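A sketch of that last suggestion; devtools: true is an extra convenience so the page console is already open:

const browser = await puppeteer.launch({
  headless: false, // show the browser window
  devtools: true,  // open DevTools so the page's console.log output is visible
});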
I'm trying to use headless Chrome (v66 on Windows 10, driven from C#) to take a series of screenshots of a 3D model in the Autodesk Forge viewer.
The problem I'm facing is that once the model is loaded, I set the camera to the first position, take a screenshot, and then try to set the camera to the next position for the next screenshot. As soon as I do that (setting the camera position later than on the initial load), the WebGL context is lost.
I have too little knowledge of WebGL / SwiftShader / etc., but what I find frustrating is that positioning the camera directly after the load does work. (I.e., the workaround would be to spawn a separate headless session per camera view, but since loading the geometry takes 20 seconds or more, that's not preferred.)
So, this:
viewerApp.myCurrentViewer.addEventListener(Autodesk.Viewing.GEOMETRY_LOADED_EVENT,
  function () {
    _viewer = viewerApp.myCurrentViewer;
    SetPerspective();
    SetCamera(cams[0].position, cams[0].target); // no problems here
    document.getElementById('MyViewerDiv').classList.add('geometry-loaded');
  });
works (the camera is positioned), but when I execute a JavaScript function later (using driver.ExecuteScript($"SetCamera({JsonConvert.SerializeObject(target.Value.Position)},{JsonConvert.SerializeObject(target.Value.Target)});"); or on a timeout in the page itself), it outputs WebGL: CONTEXT_LOST_WEBGL: loseContext: context lost.
When I use a smaller model, everything works, so I think the reason is excessive memory/processing consumption; but then why does it work at all?
Looking at the resource monitor, I'm not convinced that the consumption is actually problematic; my laptop should be capable (i7-7700HQ, GTX 1050, 16 GB RAM). I tried fiddling around with some GPU and GL flags of Chrome, to no avail. I suspect the GPU isn't being used (though I found some posts saying it actually can be used in headless mode). Also, the Forge viewer outputs the GPU memory used, but that might just be a log message:
Starting ChromeDriver 2.38.552522 (437e6fbedfa8762dec75e2c5b3ddb86763dc9dcb) on port 62676
Only local connections are allowed.
[0517/203535.902:ERROR:gpu_process_transport_factory.cc(1007)] Lost UI shared context.
DevTools listening on ws://127.0.0.1:12556/devtools/browser/5b66c120-dc64-4211-a207-ac97152ace9a
---some ssl future warnings---
[0517/203540.524:INFO:CONSOLE(2)] "THREE.WebGLRenderer", source: https://developer.api.autodesk.com/modelderivative/v2/viewers/three.min.js (2)
[0517/203543.074:INFO:CONSOLE(0)] "[.Offscreen-For-WebGL-00000237DECBB270]RENDER WARNING: there is no texture bound to the unit 0", source: http://localhost:8881/Content/Screenshot.html
[0517/203543.074:INFO:CONSOLE(0)] "[.Offscreen-For-WebGL-00000237DECBB270]RENDER WARNING: there is no texture bound to the unit 0", source: http://localhost:8881/Content/Screenshot.html
[0517/203552.280:INFO:CONSOLE(2)] "Total geometry size: 8.434013366699219 MB", source: https://developer.api.autodesk.com/modelderivative/v2/viewers/three.min.js (2)
[0517/203552.281:INFO:CONSOLE(2)] "Number of meshes: 2909", source: https://developer.api.autodesk.com/modelderivative/v2/viewers/three.min.js (2)
[0517/203552.281:INFO:CONSOLE(2)] "Num Meshes on GPU: 2908", source: https://developer.api.autodesk.com/modelderivative/v2/viewers/three.min.js (2)
[0517/203552.281:INFO:CONSOLE(2)] "Net GPU geom memory used: 7494392", source: https://developer.api.autodesk.com/modelderivative/v2/viewers/three.min.js (2)
[0517/203558.143:INFO:CONSOLE(0)] "WebGL: CONTEXT_LOST_WEBGL: loseContext: context lost", source: http://localhost:8881/Content/Screenshot.html
To be complete: running the same program without the --headless flag works fine, so I guess the code itself is OK.
Is there any way to increase the allowed resources, or anything else I can try?
(code for SetCamera)
function SetCamera(newPos, newTarget) {
  nav = nav || viewerApp.myCurrentViewer.navigation;
  nav.setPosition(newPos);
  nav.setTarget(newTarget);
  nav.orientCameraUp();
}
EDIT: Test case (currently on a test website, so this will be deleted at some point).
EDIT 2: Result of running the code below.
Node.js:
try {
  const URN = '';
  const Token = '';
  (async () => {
    const puppeteer = require('puppeteer');
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    console.log('browsing');
    await page.goto('https://rogerintelligentcloud.azurewebsites.net/test?urn=' + URN + '&token=' + Token);
    // replace the Autodesk URN and token to point to your model
    console.log('waiting');
    await page.mainFrame().waitForSelector(
      '.geometry-loaded', {
        timeout: 60000
      });
    await takescreen(page, 'nodetest1');
    await takescreen(page, 'nodetest2');
    await takescreen(page, 'nodetest3');
    await takescreen(page, 'nodetest4');
    await takescreen(page, 'nodetest5');
    await takescreen(page, 'nodetest6');
    await page.evaluate("Test();");
    await takescreen(page, 'nodetest11');
    await takescreen(page, 'nodetest12');
    await takescreen(page, 'nodetest13');
    await takescreen(page, 'nodetest14');
    await takescreen(page, 'nodetest15');
    await takescreen(page, 'nodetest16');
    await browser.close();
  })();
} catch (e) {
  console.log(e);
}

async function takescreen(page, name) {
  await page.screenshot({
    path: 'c:\\temp\\' + name + '.png'
  });
}
I didn't see any WebGL-related error messages with your snippet and the rme_advanced_sample_project.rvt model; the only one I can see is that page.delay is not defined. Here is my test code, modified from your code snippet and Philippe's forge-viewer-headless demo. If I missed something, please kindly point it out. Thanks~
import puppeteer from 'puppeteer';
import 'babel-polyfill';
import path from 'path';
import os from 'os';

try {
  const URN = 'YOUR_URN';
  const Token = 'YOUR_TOKEN';

  (async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(`https://rogerintelligentcloud.azurewebsites.net/test?urn=${ URN }&token=${ Token }`);
    // replace the Autodesk URN and token to point to your model
    await page.mainFrame().waitForSelector(
      '.geometry-loaded', {
        timeout: 60000
      });

    await delay(3000);

    await page.screenshot({
      path: path.join(os.tmpdir(), 'nodetest.png')
    });

    await page.evaluate("Test();");
    await page.evaluate("Test();");
    await page.evaluate("Test();");
    await page.evaluate("Test();");

    await delay(3000);

    const targetTxt = await page.evaluate(() => document.querySelector('body > :last-child').innerText);
    console.log(targetTxt);

    const targetLen = await page.evaluate(() => document.querySelectorAll('body > div:not(.box)').length);
    console.log(targetLen);

    await page.screenshot({
      path: path.join(os.tmpdir(), 'nodetest2.png')
    });

    await browser.close();
  })();

  function delay(timeout) {
    return new Promise((resolve) => {
      setTimeout(resolve, timeout);
    });
  }
} catch (e) {
  console.log(e);
}
Snapshots from my test result: nodetest.png and nodetest2.png.
Edit 2:
I tested with your code and it works fine on my machine, but one change was needed to run it properly in my environment: I modified your takescreen function definition:
function takescreen(page, name) {
  return page.screenshot({
    path: 'c:\\temp\\' + name + '.png'
  });
}