I'm learning NestJs and puppeteer.
I tried web crawl and it worked well.
But because of launching and closing headless browser, it takes a lot of response time.
I think it's better launching browser just one time than every launching and closing.
But i don't know how i use constructor in NestJS. It looks different from Vanila javascript.
async crawlData() {
const browser = await launch();
const page = await browser.newPage();
await page.goto("https://ko.reactjs.org/");
await page.screenshot({ path: "./Docs/ko-reactjs-homepage.png" });
await browser.close();
}
Please understand that i'm not native english speaker.
It sounds like you need a custom provider for the browser so it can be opened once and don't need to open it again. Something along the lines of
{
provide: 'PUPPETEER_INSTNACE',
useFactory: async () => await launch()
}
And now in a constructor in a service that belongs to same module you can do #Inject('PUPPETEER_INSTANCE') private readonly puppeteer)
Related
I am currently working on a GitLab CI test environment and I have a test harness which we use to test our SDK. I have gone about setting up a custom event that is fired on the page which designates the end of the test run. In my puppeteer implementation I am wanting to listen for this custom event "TEST_COMPLETE".
I have not been successful in getting this to work so I figured I would at least make sure the custom-event.js example on the puppeteer repo worked and there too I am not seeing what I believe I should be getting. I cloned the main repo below and performed an npm install. When I execute the js test below, setting headless:false and don't close the browser, I do not see any log in console that shows any custom event being fired.
It is my understanding that I should see some console event message with 'fired' and then 'app-ready' event and info, but this is not the case. Even if I interact with the page I don't see anything outside of some 'features_loaded' and 'features_unveil' logs.
https://github.com/puppeteer/puppeteer/blob/main/examples/custom-event.js
Anyone able to get the expected behavior on this code today? Not sure if this worked previously and has broke since or I am just doing something wrong. Any info would be of great help, Thanks!
Not sure if this is what you need, but I can get the message 'TEST_COMPLETE fired.' in Node.js console with this simplified code (puppeteer 8.0.0):
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch();
try {
const [page] = await browser.pages();
await page.goto('https://example.org/');
await page.exposeFunction('onCustomEvent', async (type) => {
console.log(`${type} fired.`);
await browser.close();
});
await page.evaluate(() => {
document.addEventListener('TEST_COMPLETE', (e) => {
window.onCustomEvent('TEST_COMPLETE');
});
document.dispatchEvent(new Event('TEST_COMPLETE'));
});
} catch (err) { console.error(err); }
I'm trying to do a basic search on FB marketplace with puppeteer(and it was working for me before) but fails recently.
The whole thing fails when it gets to "location" link on marketplace page. to change the location i need to click on it, but puppeteer Errors out saying:
Error: Node is either not visible or not an HTMLElement
If i try to get the boundingBox of the element it returns null
const browser = await puppeteer.launch();
const page = await browser.newPage();
const resp = await page.goto('https://www.facebook.com/marketplace', { waitUntil: 'networkidle2' })
const withinLink = await page.waitForXPath('//span[contains(.,"Within")]', { timeout: 4000 })
console.log(await withinLink.boundingBox()) //returns null
await withinLink.click() //errors out
If i take a screenshot of the page right before i locate an element it is clearly there and i am able to locate in in chrome console using the same xPath manually.
It just doesn't seem to work in puppeteer
Something clearly changed on FB. Maybe they started to use some AI technology to detect scraping?
I don't think facebook changed in headless browser detection lately, but it seems you haven't taken into account that const withinLink = await page.waitForXPath('//span[contains(.,"Within")]', { timeout: 4000 }) returns an array, even if there is only one matching elment to contains(.,"Within").
That should work if you add [0] index to the elementHandles:
const withinLink = await page.waitForXPath('//span[contains(.,"Within")]')
console.log(await withinLink[0].boundingBox())
await withinLink[0].click()
Note: Timeout is not mandatory in waitForXPath, but I'd suggest to rather use domcontentloaded instead of networkidle2 in page.goto if you don't need all analytics/tracking events to achive the desired results, it just slows down your script execution.
Note 2: Honestly, I don't have such element on my fb platform, maybe it is market dependent. But it works with any other XPath selectors with specific content.
The main issue:
We have a lovely little express app, which has been crushing it for months with no issues. We manage our DB connections by opening a connection on demand, but then caching it "per request" using the cls-hooked library. Upon the request ending, we release the connection so our connection pool doesn't run out. Classic. Over the course of months and many connections, we've never "leaked" connections. Until now! Enter... slack! We are using the slack event handler as follows:
app.use('/webhooks/slack', slackEventHandler.expressMiddleware());
and we sort of think of it like any other request, however slack requests seem to play weirdly with our cls-hooked usage. For example, we use node-ts and nodemon to run our app locally (e.g. you change code, the app restarts automatically). Every time the app restarts locally on our dev machines, and you try and play with slack events, suddenly when our middleware that releases the connection tries to do so, it thinks there is nothing in session. When you then use a normal endpoint... it works fine and essentially seems to reset slack to working okay again. We are now scared to go to prod with our slack integration, because we're worried our slack "requests" are going to starve our connection pool.
Background
Relevant subset of our package.json:
{
"#slack/events-api": "^2.3.2",
"#slack/web-api": "^5.8.0",
"express": "~4.16.1",
"cls-hooked": "^4.2.2",
"mysql2": "^2.0.0",
}
The middleware that makes the cls-hooked session
import { session } from '../db';
const context = (req, res, next) => {
session.run(() => {
session.bindEmitter(req);
session.bindEmitter(res);
next();
});
};
export default context;
The middleware that releases our connections
export const dbReleaseMiddleware = async (req, res, next) => {
res.on('finish', async () => {
const conn = session.get('conn');
if (conn) {
incrementConnsReleased();
await conn.release();
}
});
next();
};
the code that creates the connection on demand and stores it in "session"
const poolConn = await pool.getConnection();
if (session.active) {
session.set('conn', poolConn);
}
return poolConn;
the code that sets up the session in the first place
export const session = clsHooked.createNamespace('our_company_name');
If you got this far, congrats. Any help appreciated!
Side note: you couldn't pay me to write a more confusing title...
Figured it out! It seems we have identified the following behavior in the node version of slack's API (seems to only happen on mac computers... sometimes)
The issue is that this is in the context of an express app, so Slack is managing the interface between its own event handler system + the http side of things with express (e.g. returning 200, or 500, or whatever). So what seems to happen is...
// you have some slack event handler
slackEventHandler.on('message', async (rawEvent: any) => {
const i = 0;
i = i + 1;
// at this point, the http request has not returned 200, it is "pending" from express's POV
await myService.someMethod();
// ^^ while this was doing its async thing, the express request returned 200.
// so things like res.on('finished') all fired and all your middleware happened
// but your event handler code is still going
});
So we ended up creating a manual call to release connections in our slack event handlers. Weird!
What I'm trying to do:
I'm trying to get a screenshot of every element example in my storybooks project. The way I'm trying to do this is by clicking on the element and then taking the screenshot, clicking on the next one, screenshot etc.
Here is the attached code:
test('no visual regression for button', async () => {
const selector = 'a[href*="?selectedKind=Buttons&selectedStory="]';
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://localhost:8080');
let examples = await page.$$(selector);
await examples.map( async(example) => {
await example.click();
const screen = await page.screenshot();
expect(screen).toMatchImageSnapshot();
});
await browser.close();
});
But when I run this code I get the following error:
Protocol error (Runtime.callFunctionOn): Target closed.
at Session._onClosed (../../node_modules/puppeteer/lib/Connection.js:209:23)
at Connection._onClose (../../node_modules/puppeteer/lib/Connection.js:116:15)
at Connection.dispose (../../node_modules/puppeteer/lib/Connection.js:121:10)
at Browser.close (../../node_modules/puppeteer/lib/Browser.js:60:22)
at Object.<anonymous>.test (__tests__/visual.spec.js:21:17)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:169:7)
I believe it is because the element loses its context or something similar and I don't know what methods to use to get around this. Could you provide a deeper explanation or a possible solution? I don't find the API docs helpful at all.
ElementHandle.dispose() is called once page navigation occurs as garbage collection as stated here in the docs. So when you call element.click() it navigates and the rest of the elements no longer point to anything.
I am trying to use these two methods (of WP 8) in windows phone 8.1, but it gives error and doesn't compile, most probably becasue they are removed. I tried searching the new APIs but couldn't get any. What are other alternatives for these.
Dispatcher.BeginInvoke( () => {}); msdn link
System.Threading.Thread.Sleep(); msdn link
They still exists for Windows Phone 8.1 SIlverlight Apps, but not for Windows Phone Store Apps. The replacements for Windows Store Apps is:
Sleep (see Thread.Sleep replacement in .NET for Windows Store):
await System.Threading.Tasks.Task.Delay(TimeSpan.FromSeconds(30));
Dispatcher (see How the Deployment.Current.Dispatcher.BeginInvoke work in windows store app?):
CoreDispatcher dispatcher = CoreWindow.GetForCurrentThread().Dispatcher;
await dispatcher.RunAsync(CoreDispatcherPriority.Normal, () => { });
Dispatcher.BeginInvoke( () => {}); is replaced by
await this.Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, async () => {});
and System.Threading.Thread.Sleep(); is replaced by
await Task.Delay(TimeSpan.FromSeconds(doubleValue));
Be aware that not only has the API changed (adopting the API from WindowsStore apps), but the way that the Dispatcher was obtained in windowsPhone 8.0 has changed as well.
#Johan Faulk's suggestion, although will work, may return null under a multitude of conditions.
Old code to grab the dispatcher:
var dispatcher = Deployment.Current.Dispatcher;
or
Deployment.Current.Dispatcher.BeginInvoke(()=>{
// any code to modify UI or UI bound elements goes here
});
New in Windows 8.1 Deployment is not an available object or namespace.
In order to make sure the Main UI Thread dispatcher is obtained, use the following:
var dispatcher = CoreApplication.MainView.CoreWindow.Dispatcher;
or
CoreApplication.MainWindow.CoreWindow.Dispatcher.RunAsync(
CoreDispatcherPriority.Normal,
()=>{
// UI code goes here
});
Additionally, although the method SAYS it will be executed Async the keyword await can not be used in the method invoked by RunAsync. (in the above example the method is anonymous).
In order to execute an awaitable method inside anonymous method above, decorate the anonymous method inside RunAsync() with the async keyword.
CoreApplication.MainWindow.CoreWindow.Dispatcher.RunAsync(
CoreDispatcherPriority.Normal,
**async**()=>{
// UI code goes here
var response = **await** LongRunningMethodAsync();
});
For Dispatcher, try this. MSDN
private async Task MyMethod()
{
await Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () => { });
}
For Thread.Sleep() try await Task.Delay(1000). MSDN