Unable to locate an element with puppeteer - puppeteer

I'm trying to do a basic search on FB marketplace with puppeteer(and it was working for me before) but fails recently.
The whole thing fails when it gets to "location" link on marketplace page. to change the location i need to click on it, but puppeteer Errors out saying:
Error: Node is either not visible or not an HTMLElement
If i try to get the boundingBox of the element it returns null
const browser = await puppeteer.launch();
const page = await browser.newPage();
const resp = await page.goto('https://www.facebook.com/marketplace', { waitUntil: 'networkidle2' })
const withinLink = await page.waitForXPath('//span[contains(.,"Within")]', { timeout: 4000 })
console.log(await withinLink.boundingBox()) //returns null
await withinLink.click() //errors out
If i take a screenshot of the page right before i locate an element it is clearly there and i am able to locate in in chrome console using the same xPath manually.
It just doesn't seem to work in puppeteer
Something clearly changed on FB. Maybe they started to use some AI technology to detect scraping?

I don't think facebook changed in headless browser detection lately, but it seems you haven't taken into account that const withinLink = await page.waitForXPath('//span[contains(.,"Within")]', { timeout: 4000 }) returns an array, even if there is only one matching elment to contains(.,"Within").
That should work if you add [0] index to the elementHandles:
const withinLink = await page.waitForXPath('//span[contains(.,"Within")]')
console.log(await withinLink[0].boundingBox())
await withinLink[0].click()
Note: Timeout is not mandatory in waitForXPath, but I'd suggest to rather use domcontentloaded instead of networkidle2 in page.goto if you don't need all analytics/tracking events to achive the desired results, it just slows down your script execution.
Note 2: Honestly, I don't have such element on my fb platform, maybe it is market dependent. But it works with any other XPath selectors with specific content.

Related

Puppeteer: use the first new page when opening incognito context

const context = await browser.createIncognitoBrowserContext();
const [page] = await context.pages();
Why isn't this code working? The array returned by pages() is always empty.
The only way I know to actually open a page is to call context.newPage(), which means having an about:blank empty page. I would like to avoid that.

How do you paste text using Puppeteer?

I am trying to write a test (using jest-puppeteer) for an input in my React application that handles autocomplete or copy/pasted strings in a unique way.
I was hoping by using Puppeteer, I could paste text into the input and then validate that the page is updated correctly. Unfortunately, I can't find any working example of how to do this.
I've tried using page.keyboard to simulate CMD+C & CMD+V but it does not appear that these sorts of commands work in Puppeteer.
I've also tried using a library such as clipboardy to write and read to the OS clipboard. While clipboardy does work for write (copy), it seems read (paste) does not affect the page run by Puppeteer.
I have successfully copied the text using a variety of methods but have no way to paste into the input. I've validated this assumption by adding event listeners for "copy" and "paste" to the document. The "copy" events fire, but no method has resulted in the "paste" event firing.
Here are a few approaches I have tried:
await clipboardy.write('1234'); // writes "1234" to clipboard
await page.focus("input");
await clipboardy.read(); // Supposedly pastes from clipboard
// assert input has updated
await clipboardy.write('1234');
await page.focus("input");
await page.keyboard.down('Meta');
await page.keyboard.press('KeyV');
await page.keyboard.up('Meta');
// assert input has updated
await page.evaluate(() => {
const input = document.createElement('input');
document.body.appendChild(input);
input.value = '1234';
input.focus();
input.select();
document.execCommand('copy');
document.body.removeChild(input);
});
wait page.focus("input");
await page.keyboard.down('Meta');
await page.keyboard.press('KeyV');
await page.keyboard.up('Meta');
I think the only missing piece here is pasting the text; but how do you paste text using Puppeteer?
This works for me with clipboardy, but not when I launch it in headless :
await clipboardy.write('foo')
const input= await puppeteerPage.$(inputSelector)
await input.focus()
await puppeteerPage.keyboard.down('Control')
await puppeteerPage.keyboard.press('V')
await puppeteerPage.keyboard.up('Control')
If you make it works in headless tell me.
I tried it the clipBoard API too but I couldn t make it compile:
const browser = await getBrowser()
const context = browser.defaultBrowserContext();
// set clipBoard API permissions
context.clearPermissionOverrides()
context.overridePermissions(config.APPLICATION_URL, ['clipboard-write'])
puppeteerPage = await browser.newPage()
await puppeteerPage.evaluate((textToCopy) =>{
navigator.clipboard.writeText(textToCopy)
}, 'bar')
const input= await puppeteerPage.$(inputSelector)
await input.focus()
await puppeteerPage.evaluate(() =>{
navigator.clipboard.readText()
})
I came up with a funny workaround how to paste a long text into React component in a way that the change would be registered by the component and it would not take insanely long time to type as it normally does with type command:
For text copying I use approach from Puppeteer docs (assume I want to select text from first 2 paragraphs on a page for example). I assume you already know how to set the permissions for clipboard reading and writing (for example one of the answers above shows how to do it).
const fromJSHandle = await page.evaluateHandle(() =>Array.from(document.querySelectorAll('p'))[0])
const toJSHandle = await page.evaluateHandle(() =>Array.from(document.querySelectorAll('p'))[1])
// from puppeteer docs
await page.evaluate((from, to) => {
const selection = from.getRootNode().getSelection();
const range = document.createRange();
range.setStartBefore(from);
range.setEndAfter(to);
selection.removeAllRanges();
selection.addRange(range);
}, fromJSHandle, toJSHandle);
await page.bringToFront();
await page.evaluate(() => {
document.execCommand('copy') // Copy the selected content to the clipboard
return navigator.clipboard.readText() // Obtain the content of the clipboard as a string
})
This approach does not work for pasting (on Mac at least): document.execCommand('paste')
So for pasting I use this:
await page.$eval('#myInput', (el, value) =>{ el.value = value }, myLongText)
await page.type(`#myInput`,' ') // this assumes your app trims the input value in the end so the whitespace doesn't bother you
Without the last typing step (the white space) React does not register change/input event. So after submitting the form (of which the input is part of for example) the input value would still be "".
This is where typing the whitespace comes in - it triggers the change event and we can submit the form.
It seems that one needs to develop quite a bit of ingenuity with Puppeteer to figure out how to work around all the limitations and maintaining some level of developer comfort at the same time.

How to get all console messages with puppeteer? including errors, CSP violations, failed resources, etc

I am fetching a page with puppeteer that has some errors in the browser console but the puppeteer's console event is not being triggered by all of the console messages.
The puppeteer chromium browser shows multiple console messages
However, puppeteer only console logs one thing in node
Here is the script I am currently using:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
page.on('console', msg => console.log('PAGE LOG:', msg.text));
await page.goto('https://pagewithsomeconsoleerrors.com');
await browser.close();
})();
Edit: As stated in my comment below, I did try the page.waitFor(5000) command that Everettss recommended but that didn't work.
Edit2: removed spread operator from msg.text as it was there by accident.
Edit3: I opened an issue on github regarding this with similar but different example screenshots: https://github.com/GoogleChrome/puppeteer/issues/1512
The GitHub issue about capturing console erorrs includes a great comment about listening to console and network events. For example, you can register for console output and network responses and failures like this:
page
.on('console', message =>
console.log(`${message.type().substr(0, 3).toUpperCase()} ${message.text()}`))
.on('pageerror', ({ message }) => console.log(message))
.on('response', response =>
console.log(`${response.status()} ${response.url()}`))
.on('requestfailed', request =>
console.log(`${request.failure().errorText} ${request.url()}`))
And get the following output, for example:
200 'http://do.carlosesilva.com/puppeteer/'
LOG This is a standard console.log message
Error: This is an error we are forcibly throwing
at http://do.carlosesilva.com/puppeteer/:22:11
net::ERR_CONNECTION_REFUSED https://do.carlosesilva.com/http-only/sample.png
404 'http://do.carlosesilva.com/puppeteer/this-image-does-not-exist.png'
ERR Failed to load resource: the server responded with a status of 404 (Not Found)
See also types of console messages received with the console event and response, request and failure objects received with other events.
If you want to pimp your output with some colours, you can add chalk, kleur, colorette or others:
const { blue, cyan, green, magenta, red, yellow } = require('colorette')
page
.on('console', message => {
const type = message.type().substr(0, 3).toUpperCase()
const colors = {
LOG: text => text,
ERR: red,
WAR: yellow,
INF: cyan
}
const color = colors[type] || blue
console.log(color(`${type} ${message.text()}`))
})
.on('pageerror', ({ message }) => console.log(red(message)))
.on('response', response =>
console.log(green(`${response.status()} ${response.url()}`)))
.on('requestfailed', request =>
console.log(magenta(`${request.failure().errorText} ${request.url()}`)))
The examples above use Puppeteer API v2.0.0.
Easiest way to capture all console messages is passing the dumpio argument to puppeteer.launch().
From Puppeteer API docs:
dumpio: <boolean> Whether to pipe the browser process stdout and stderr
into process.stdout and process.stderr. Defaults to false.
Example code:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({
dumpio: true
});
...
})();
You need to set multiple listeners if you want to catch everything. The console event is emitted when javascript within the page calls a console API message (like console.log).
For a full list of the console API take a look at the docs for console on MDN:
https://developer.mozilla.org/en-US/docs/Web/API/Console
The reason you need multiple listeners is because some of what is being logged in the image you posted is not happening within the page.
So for example, to catch the first error in the image, net:: ERR_CONNECTION_REFUSED you would set the listener like so:
page.on('requestfailed', err => console.log(err));
Puppeteer's documentation contains a full list of events. You should take a look at the documentation for the version you are using and look at the different events the Page class will emit as well as what those events will return. The example I've written above will return an instance of Puppeteer's request class.
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#class-page
I extended Ferdinand's great comment code with descriptive errors text from JSHandle consoleMessage object in next comment. So you can catch all the errors from the browser and read all the description like in browser console.
Check it out there https://stackoverflow.com/a/66801550/9026103

puppeteer form submit: evaluate is not a function

the form exists in the page and i am sure.
const form = await page.$('#my-form');
await form.evaluate(form => form.submit());
I get this error:
TypeError: form.evaluate is not a function
EDIT 2019: As mentioned by Kyle, the latest puppeteer have .evaluate method on elementHandle. It's been two years after all.
const tweetHandle = await page.$('.tweet .retweets');
expect(await tweetHandle.evaluate(node => node.innerText)).toBe('10');
You can try it this way,
await page.evaluate(() => {
const element = document.querySelector("#my-form")
element.submit()
});
ElementHandle does not have a .evaluate function property. Check the docs.
For new comers, if you encounter this problem. You're probably using puppeteer 1.19 or lower and need to update npm update puppeteer. Use the API of your version (see links at the top of the page by version).

Puppeteer Element Handle loses context when navigating

What I'm trying to do:
I'm trying to get a screenshot of every element example in my storybooks project. The way I'm trying to do this is by clicking on the element and then taking the screenshot, clicking on the next one, screenshot etc.
Here is the attached code:
test('no visual regression for button', async () => {
const selector = 'a[href*="?selectedKind=Buttons&selectedStory="]';
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://localhost:8080');
let examples = await page.$$(selector);
await examples.map( async(example) => {
await example.click();
const screen = await page.screenshot();
expect(screen).toMatchImageSnapshot();
});
await browser.close();
});
But when I run this code I get the following error:
Protocol error (Runtime.callFunctionOn): Target closed.
at Session._onClosed (../../node_modules/puppeteer/lib/Connection.js:209:23)
at Connection._onClose (../../node_modules/puppeteer/lib/Connection.js:116:15)
at Connection.dispose (../../node_modules/puppeteer/lib/Connection.js:121:10)
at Browser.close (../../node_modules/puppeteer/lib/Browser.js:60:22)
at Object.<anonymous>.test (__tests__/visual.spec.js:21:17)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:169:7)
I believe it is because the element loses its context or something similar and I don't know what methods to use to get around this. Could you provide a deeper explanation or a possible solution? I don't find the API docs helpful at all.
ElementHandle.dispose() is called once page navigation occurs as garbage collection as stated here in the docs. So when you call element.click() it navigates and the rest of the elements no longer point to anything.