I am using pyppeteer to launch headless Chrome and perform some actions. But first I want all the elements of the web page to load completely. The official documentation of pyppeteer suggests a waitUntil parameter, which accepts more than one value.
My question is: do I have to pass all of these values, or is any one of them sufficient? Would the following snippet help in my case?
await page.goto(url, {'waitUntil' : ['load', 'domcontentloaded', 'networkidle0', 'networkidle2']})
No, you don't have to pass all the possible options to 'waitUntil'. You can pick any one of them, or combine several at the same time if you like, but if you are:
not dealing with a single-page app,
not interested in all network connections (like third-party tracking requests, for example),
then you are good to go with 'domcontentloaded' to wait for all the elements to be rendered on the page.
await page.goto(url, {'waitUntil' : 'domcontentloaded'})
The options in detail:
load: when the load event is fired.
domcontentloaded: when the DOMContentLoaded event is fired.
networkidle0: when there are no more than 0 network connections for at least 500 ms.
networkidle2: when there are no more than 2 network connections for at least 500 ms.
[source]
Note: of course, this is true for the Node.js Puppeteer library as well; the two work the same way in terms of waitUntil.
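For reference, a minimal, self-contained pyppeteer sketch of the above; the URL here is just a placeholder:
import asyncio
from pyppeteer import launch

async def main():
    browser = await launch(headless=True)
    page = await browser.newPage()
    # Wait only for the DOMContentLoaded event before continuing
    await page.goto('http://localhost:8080', {'waitUntil': 'domcontentloaded'})
    # Or combine several conditions, e.g. the load event plus no network activity:
    # await page.goto('http://localhost:8080', {'waitUntil': ['load', 'networkidle0']})
    html = await page.content()
    print(len(html), 'characters of HTML')
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())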
I am trying to build a real-time monitoring system for high-frequency data. To increase performance, I used the extendData property of dcc.Graph() together with a websocket, so that the browser does not need to send requests to get data.
I found that it is still not increasing performance as expected. The reason, I found, is that by inspecting the network tab in the browser I can see that every few milliseconds the browser is still sending a request, and the initiator is dash_renderer.
This picture is from a vanilla example, just to show that even for a simple textbox the HTTP requests go on and on. For my real-time websocket dashboard, the frequency of requests gets very high.
My questions are:
What does dash_renderer do?
Why is it sending HTTP requests?
And how can I stop that?
If you run Dash in Debug mode, it has a feature called Hot Reloading which regularly (every 3 seconds by default) checks for changes to your codebase and updates your running app if it finds any. That check for updated code is what you're seeing in the network inspection.
To turn it off, either don't run in debug mode or explicitly set dev_tools_hot_reload to False like so:
app.run_server(debug=True, dev_tools_hot_reload=False)
Although this answer is late: after some experience, my realization is that Dash is not designed to work with websockets. It uses callbacks, which send requests to the server, where the callback function (which is Python) sends back a result.
These callbacks are designed to send HTTP requests to the server.
For high-speed data, a websocket should be used with the extendTraces method of plotly.js on the client side, as sketched below.
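For illustration, a minimal client-side sketch of that approach; the WebSocket URL, the message format, and the 'live-graph' element id are assumptions, not something prescribed by Dash or plotly.js:
// Assumes the page already contains a rendered plot in <div id="live-graph">
// and a server pushing JSON messages like {"x": 1.0, "y": 2.0} over a websocket.
var socket = new WebSocket('ws://localhost:5000/stream');

socket.onmessage = function(event) {
  var point = JSON.parse(event.data);
  // Append the new point to trace 0 without re-sending the whole figure,
  // keeping at most the last 1000 points.
  Plotly.extendTraces('live-graph', { x: [[point.x]], y: [[point.y]] }, [0], 1000);
};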
Is there a way to start the performance profiling programmatically in Chrome?
I want to run a performance test of my web app several times to get a better estimate of the FPS but manually starting the performance profiling in Chrome is tricky because I'd have to manually align the frame models. (I am using this technique to extract the frames)
CMD + Shift + E reloads the page and immediately starts the profiling, which alleviates the alignment problem but it only runs for 3 seconds as explained here. So this doesn't work.
Ideally, I'd like to click a button that starts my test script and also starts the profiling. Is there a way to achieve that?
In case you're still interested, or in case someone else finds it helpful, there's an easy way to achieve this using Puppeteer's tracing class.
Puppeteer uses Chrome DevTools Protocol's Tracing Domain under the hood, and writes a JSON file to your system that can be loaded in the dev tools performance panel.
To get a profile trace of your page's loading time you can implement the following:
const puppeteer = require('puppeteer');
(async () => {
// launch puppeteer browser in headful mode
const browser = await puppeteer.launch({
headless: false,
devtools: true
});
// start a page instance in the browser
const page = await browser.newPage();
// start the profiling, with a path to the out file and screenshots collected
await page.tracing.start({
path: `tests/logs/trace-${new Date().getTime()}.json`,
screenshots: true
});
// go to the page
await page.goto('http://localhost:8080');
// wait for as long as you want
await page.waitFor(4000);
// or you can wait for an element to appear with:
// await page.waitForSelector('some-css-selector');
// stop the tracing
await page.tracing.stop();
// close the browser
await browser.close();
})();
Of course, you'll have to install Puppeteer first (npm i puppeteer). If you don't want to use Puppeteer, you can interact with the Chrome DevTools Protocol API directly (see the link above). I didn't investigate that option very much, since Puppeteer delivers a high-level, easy-to-use API on top of CDP. You can also interact directly with CDP via Puppeteer's CDPSession API; a short sketch of that follows.
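For example, a rough sketch of starting a CPU profile through a CDPSession; the URL and the output file name are placeholders:
const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Open a raw DevTools Protocol session for this page
  const client = await page.target().createCDPSession();
  await client.send('Profiler.enable');
  await client.send('Profiler.start');

  await page.goto('http://localhost:8080');

  // Stop profiling and save the result; the .cpuprofile file can be
  // loaded in the DevTools JavaScript Profiler panel
  const { profile } = await client.send('Profiler.stop');
  fs.writeFileSync('profile.cpuprofile', JSON.stringify(profile));

  await browser.close();
})();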
Hope this helps. Good luck!
You can use the Chrome DevTools Protocol with any of the driver libraries listed here https://github.com/ChromeDevTools/awesome-chrome-devtools#protocol-driver-libraries to create a profile programmatically.
Use this method - https://chromedevtools.github.io/devtools-protocol/tot/Profiler#method-start - to start a profile; a small sketch with one such library follows.
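For instance, a minimal sketch using the chrome-remote-interface driver from that list, assuming Chrome was started with --remote-debugging-port=9222; the target URL is a placeholder:
const CDP = require('chrome-remote-interface');
const fs = require('fs');

(async () => {
  // Connects to the DevTools endpoint of an already-running Chrome
  const client = await CDP();
  const { Page, Profiler } = client;

  await Page.enable();
  await Profiler.enable();
  await Profiler.start();

  await Page.navigate({ url: 'http://localhost:8080' });
  await Page.loadEventFired();

  // Stop the profiler and persist the CPU profile to disk
  const { profile } = await Profiler.stop();
  fs.writeFileSync('page.cpuprofile', JSON.stringify(profile));

  await client.close();
})();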
We are using puppeteer to run automated tests on hundreds of websites and URLs. Some of those websites are very slow and run into a timeout. That is often the case because there is an ad that does not finish loading. So increasing the timeout is not an option.
Is there a way to get the currently rendered HTML (DOM) at the moment the timeout is happening? page.content() is only returning a promise that is still pending.
You might be able to use something like evaluate, which injects a custom JavaScript function to execute. However, if the thread is truly "locked" then it'll likely run into the same issue.
const body = await page.evaluate(() => document.documentElement.outerHTML);
You might also need to be a little more flexible in how you orchestrate the script, by catching goto timeouts and then trying the above, for example:
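A rough sketch of that, with a placeholder URL and an arbitrary timeout value:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  try {
    await page.goto('https://example.com', { waitUntil: 'networkidle0', timeout: 30000 });
  } catch (err) {
    // Navigation timed out; the page may still be partially rendered
    console.warn('goto timed out, capturing whatever DOM is there:', err.message);
  }

  // Try to read the currently rendered DOM regardless of the timeout
  const html = await page.evaluate(() => document.documentElement.outerHTML);
  console.log(html.length, 'characters of HTML captured');

  await browser.close();
})();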
How can I configure Polymer's platinum-sw-cache or platinum-sw-fetch to cache all URL paths except for /_api, which is the URL for Hoodie's API? I've configured a platinum-sw-fetch element to handle the /_api path, then platinum-sw-cache to handle the rest of the paths, as follows:
<platinum-sw-register auto-register
clients-claim
skip-waiting
on-service-worker-installed="displayInstalledToast">
<platinum-sw-import-script href="custom-fetch-handler.js"></platinum-sw-import-script>
<platinum-sw-fetch handler="HoodieAPIFetchHandler"
path="/_api(.*)"></platinum-sw-fetch>
<platinum-sw-cache default-cache-strategy="networkFirst"
precache-file="precache.json">
</platinum-sw-cache>
</platinum-sw-register>
custom-fetch-handler.js contains the following. Its intent is simply to return the results of the request the way the browser would if the service worker was not handling the request.
var HoodieAPIFetchHandler = function(request, values, options){
return fetch(request);
}
What doesn't seem to be working correctly is that after user 1 has signed in, then signed out, then user 2 signs in, then in Chrome Dev Tools' Network tab I can see that Hoodie regularly continues to make requests to BOTH users' API endpoints like the following:
http://localhost:3000/_api/?hoodieId=uw9rl3p
http://localhost:3000/_api/?hoodieId=noaothq
Instead, it should be making requests to only ONE of these API endpoints. In the Network tab, each of these URLs appears twice in a row, and in the "Size" column the first request says "(from ServiceWorker)," and the second request states the response size in bytes, in case that's relevant.
The other problem which seems related is that when I sign in as user 2 and submit a form, the app writes to user 1's database on the server side. This makes me think the problem is due to the app not being able to bypass the cache for the /_api route.
Should I not have used both platinum-sw-cache and platinum-sw-fetch within one platinum-sw-register element, since the docs state they are alternatives to each other?
In general, what you're doing should work, and it's a legitimate approach to take.
If there's an HTTP request made that matches a path defined in <platinum-sw-fetch>, then that custom handler will be used, and the default handler (in this case, the networkFirst implementation) won't run. The HTTP request can only be responded to once, so there's no chance of multiple handlers taking effect.
I ran some local samples and confirmed that my <platinum-sw-fetch> handler was properly intercepting requests. When debugging this locally, it's useful to either add in a console.log() within your custom handler and check for those logs via the chrome://serviceworker-internals Inspect interface, or to use the same interface to set some breakpoints within your handler.
What you're seeing in the Network tab of the controlled page is expected—the service worker's network interactions are logged there, whether they come from your custom HoodieAPIFetchHandler or the default networkFirst handler. The network interactions from the perspective of the controlled page are also logged—they don't always correspond one-to-one with the service worker's activity, so logging both does come in handy at times.
So I would recommend looking deeper into the reason why your application is making multiple requests. It's always tricky thinking about caching personalized resources, and there are several ways you can get into trouble if you end up caching resources that are personalized for a different user. Take a look at the line of code that's firing off the second /_api/ request and see whether it's coming from a cached resource that needs to be cleared when your users log out. <platinum-sw> uses the sw-toolbox library under the hood, and you can make use of its uncache() method directly within your custom handler scripts to perform cache maintenance; a rough sketch follows.
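Purely as an illustration, a variation of custom-fetch-handler.js that evicts a cached API entry at sign-out; the /_api/_session check and the uncached URL are hypothetical and would need to match how your app actually signs users out:
// custom-fetch-handler.js (sketch)
var HoodieAPIFetchHandler = function(request, values, options) {
  // Hypothetical: treat a DELETE against the session endpoint as a sign-out
  // and drop any previously cached, personalized API response.
  // toolbox is the sw-toolbox global that platinum-sw loads under the hood.
  if (request.method === 'DELETE' && request.url.indexOf('/_api/_session') !== -1) {
    toolbox.uncache('/_api/');
  }
  return fetch(request);
};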
I'm looking to do some performance tuning based on what I am seeing in New Relic's RUM output, but I need to understand the following first.
I have a page that loads up a KendoUI grid. The grid is configured to load its data asynchronously, so the page loads and the user sees the grid layout. A few milliseconds later, the grid displays a "loading" graphic while it waits for the async request for the data, which comes back as JSON, after which the "loading" graphic is replaced with the actual data.
I need to understand whether this async loading of the data for the grid (or any other $.ajax() request, for that matter) in any way affects New Relic's RUM output.
Specifically, the RUM is reporting a certain time for DOM Processing, and a certain time for Page Rendering. In which one of those two numbers will the async request be reported (if at all)?
In general, if anything happens after the Load() event, then the New Relic RUM (Real User Monitoring) will not capture this activity.
For example, if you take a look at your network (or waterfall) view in a browser you can see exactly when this Load event fires and when your resources are loaded in the context of this event.
Most likely, your async assets will (and should) be gathered after this Load() event and will not be included in RUM metrics. This blog article has a nice breakdown of how to tune this type of metric (and how New Relic has tuned its own app in the past).
"The RUM timer stops when the browser has rendered and the user is able to interact with the page. ... It’s up to you to decide what that means, and adjust your ... code accordingly."
http://blog.newrelic.com/2012/05/10/how-we-tune-our-own-app-using-rum-data/
We are also investigating adding support for Ajax instrumentation so you can get additional visibility into this activity.
New Relic has recently updated their monitoring to add support for measuring AJAX requests. You can find the release blog here:
New Relic AJAX Support Release Blog
and the docs here:
Enabling New Relic Browser AJAX Monitoring