How to get content on timeout in puppeteer (headless chrome)? - puppeteer

We are using puppeteer to run automated tests on hundreds of websites and URLs. Some of those websites are very slow and run into a timeout. That is often the case because there is an ad that does not finish loading. So increasing the timeout is not an option.
Is there a way to get the currently rendered HTML (DOM) at the moment the timeout is happening? page.content() is only returning a promise that is still pending.

You might be able to use something like evaluate, which injects a custom JavaScript function to execute. However, if the thread is truly "locked" then it'll likely run into the same issue.
const body = await page.evaluate(() => document.documentElement.outerHTML);
You might also need to be a little more flexible on how you orchestrate the script by catching goto timeouts and then trying the above.

Related

pyppeteer wait until all elements of page is loaded

I am using pyppeteer to trigger headless chrome and perform some actions. But first I want all the elements of the web page to load completely. The official documentation of pyppeteer suggests a waitUntil parameter which comes with more than 1 parameters.
My doubt is do i have to pass all the parameters or any one in particular is sufficient? Please suggest if following snippet helps in my case?
await page.goto(url, {'waitUntil' : ['load', 'domcontentloaded', 'networkidle0', 'networkidle2']})
No, you don't have to pass all possible options to 'waitUntil'. You can pick either of them, or more options at the same time if you like, but if you are:
not deailing with a single-page app,
not interested in all network connections (like 3rd party trackings for example)
then you are good to go with: 'domcontentloaded' to wait for all the elements to be rendered on the page.
await page.goto(url, {'waitUntil' : 'domcontentloaded'})
The options in details:
load: when load event is fired.
domcontentloaded: when the DOMContentLoaded event is fired.
networkidle0: when there are no more than 0 network connections
for at least 500 ms.
networkidle2: when there are no more than 2 network connections
for at least 500 ms.
[source]
Note: of course it is true for the NodeJs puppeteer library as well, they work the same way in terms of waitUntil.

Programmatically start the performance profiling in Chrome

Is there a way to start the performance profiling programmatically in Chrome?
I want to run a performance test of my web app several times to get a better estimate of the FPS but manually starting the performance profiling in Chrome is tricky because I'd have to manually align the frame models. (I am using this technique to extract the frames)
CMD + Shift + E reloads the page and immediately starts the profiling, which alleviates the alignment problem but it only runs for 3 seconds as explained here. So this doesn't work.
Ideally, I'd like to click on a button to start my test script and also starts the profiling. Is there a way to achieve that?
in case you're still interested, or someone else may find it helpful, there's an easy way to achieve this using Puppeteer's tracing class.
Puppeteer uses Chrome DevTools Protocol's Tracing Domain under the hood, and writes a JSON file to your system that can be loaded in the dev tools performance panel.
To get a profile trace of your page's loading time you can implement the following:
const puppeteer = require('puppeteer');
(async () => {
// launch puppeteer browser in headful mode
browser = await puppeteer.launch({
headless: false,
devtools: true
});
// start a page instance in the browser
page = await browser.newPage();
// start the profiling, with a path to the out file and screenshots collected
await page.tracing.start({
path: `tests/logs/trace-${new Date().getTime()}.json`,
screenshots: true
});
// go to the page
await page.goto('http://localhost:8080');
// wait for as long as you want
await page.waitFor(4000);
// or you can wait for an element to appear with:
// await page.waitForSelector('some-css-selector');
// stop the tracing
await page.tracing.stop();
// close the browser
await browser.close();
})();
Of course, you'll have to install Puppeteer first (npm i puppeteer). If you don't want to use Puppeteer you can interact with Chrome DevTools Protocol's API directly (see link above). I didn't investigate that option very much since Puppeteer delivers a high level and easy to use API over CDP's API. You can also interact directly with CDP via Puppeteer's CDPSession API.
Hope this helps. Good luck!
You can use the chrome devtools protocol and use any driver library from here https://github.com/ChromeDevTools/awesome-chrome-devtools#protocol-driver-libraries to programmatically create a profile.
Use this method - https://chromedevtools.github.io/devtools-protocol/tot/Profiler#method-start to start a profile.

IMobileServiceSyncTable PullAsync doesn't return

I currently use Azure Mobile Services with Offline Sync and I it has been working fine. However I now have come to a problem I can't seem to debug. On the PullAsync it never returns, never goes to the Web API, it never errors, it just seems to be stuck somewhere and I don't know where.
IMobileServiceSyncTable<ResponseType> responseTypeTable = MobileService.GetSyncTable<ResponseType>();
await responseTypeTable.PullAsync(responseTypeTable.Where(c => c.CompanyId == companyId));
I use identical code elsewhere with a different type and it works well.
The only thing that happens is the Windows Phone emulator UI locks up, I can press buttons on the keyboard but the input or buttons are all frozen.
I get this on the Debug Output
The thread 0xb80 has exited with code 259 (0x103).
After a 5 seconds and that's about it. Breakpoints everywhere, nothing happening.
The method was in a Command (I'm using MVVMLight). When I call the function on the class initialization and just hold the value it works fine. There is obviously some bug that occurs when calling PullAsync on an event, in an async RelayCommand but getting the call out of there solves the issue.
I'll leave it at that unless anyone comes back with why it is actually happening. This is just a workaround at the moment.

How to extend AFNetworking 2.0 to perform request combining

I have a UI where the same image URL could be requested by several UIImageViews at varying times. Obviously if a request from one of them has finished then returning the cached version works as expected. However, especially with slower networks, I'd like to be able to piggy-back requests for an image URL onto any currently running/waiting HTTP request for the same URL.
On an HTTP server this called request combining and I'd love to do the same in the client - to combine the different requests for the same URL into a single request and then callback separately to each of the callers). The requests for that URL dont happen to start at the same time.
What's the best way to accomplish this?
I think re-writing UIImageView+AFNetworking might be the easiest way:
check the af_sharedImageRequestOperationQueue to see if it has an operation with the same request
if I do already have an operation in the queue or running then add myself to some list of callbacks/blocks to be called on success/failure
if I don't have the operation, then create it as normal
in the setCompletionBlockWithSuccess to call each of the blocks in turn.
Any simpler alternatives?
I encountered a similar problem and decided that your way was the most straightforward. One added bit of complexity is that these downloads require special credentials and so must go through their own operation queue. Here's the code from my UIImageView category to check whether a particular URL is inflight:
NSUInteger foundOperation = [[ConnectionManager sharedConnectionManager].operationQueue.operations indexOfObjectPassingTest:^BOOL(AFHTTPRequestOperation *obj, NSUInteger idx, BOOL *stop) {
BOOL URLAlreadyInFlight = [obj.request.URL.absoluteString isEqualToString:URL.absoluteString];
if (URLAlreadyInFlight) {
NSBlockOperation *updateUIOperation = [NSBlockOperation blockOperationWithBlock:^{
[[NSOperationQueue mainQueue] addOperationWithBlock:^{
self.image = [[ImageCache sharedImageCache] cachedImageForURL:URL];
}];
}];
//Makes updating the UI dependent on the completion of the matching operation.
[updateUIOperation addDependency:obj];
}
return URLAlreadyInFlight;
}];
Were you able to come up with a better solution?
EDIT: Well, it looks like my method of updating the UI just can't work, as the operation's completion blocks are run asynchronously, so the operation finishes before the blocks are run. However, I was able to modify the image cache to be able to add callbacks for when certain URLs are cached, which seems to work correctly. So this method will properly detect when certain URLs are in flight and be able to take action with that knowledge.

How does facebook, gmail send the real time notification?

I have read some posts about this topic and the answers are comet, reverse ajax, http streaming, server push, etc.
How does incoming mail notification on Gmail works?
How is GMail Chat able to make AJAX requests without client interaction?
I would like to know if there are any code references that I can follow to write a very simple example. Many posts or websites just talk about the technology. It is hard to find a complete sample code. Also, it seems many methods can be used to implement the comet, e.g. Hidden IFrame, XMLHttpRequest. In my opinion, using XMLHttpRequest is a better choice. What do you think of the pros and cons of different methods? Which one does Gmail use?
I know it needs to do it both in server side and client side.
Is there any PHP and Javascript sample code?
The way Facebook does this is pretty interesting.
A common method of doing such notifications is to poll a script on the server (using AJAX) on a given interval (perhaps every few seconds), to check if something has happened. However, this can be pretty network intensive, and you often make pointless requests, because nothing has happened.
The way Facebook does it is using the comet approach, rather than polling on an interval, as soon as one poll completes, it issues another one. However, each request to the script on the server has an extremely long timeout, and the server only responds to the request once something has happened. You can see this happening if you bring up Firebug's Console tab while on Facebook, with requests to a script possibly taking minutes. It is quite ingenious really, since this method cuts down immediately on both the number of requests, and how often you have to send them. You effectively now have an event framework that allows the server to 'fire' events.
Behind this, in terms of the actual content returned from those polls, it's a JSON response, with what appears to be a list of events, and info about them. It's minified though, so is a bit hard to read.
In terms of the actual technology, AJAX is the way to go here, because you can control request timeouts, and many other things. I'd recommend (Stack overflow cliche here) using jQuery to do the AJAX, it'll take a lot of the cross-compability problems away. In terms of PHP, you could simply poll an event log database table in your PHP script, and only return to the client when something happens? There are, I expect, many ways of implementing this.
Implementing:
Server Side:
There appear to be a few implementations of comet libraries in PHP, but to be honest, it really is very simple, something perhaps like the following pseudocode:
while(!has_event_happened()) {
sleep(5);
}
echo json_encode(get_events());
The has_event_happened function would just check if anything had happened in an events table or something, and then the get_events function would return a list of the new rows in the table? Depends on the context of the problem really.
Don't forget to change your PHP max execution time, otherwise it will timeout early!
Client Side:
Take a look at the jQuery plugin for doing Comet interaction:
Project homepage: http://plugins.jquery.com/project/Comet
Google Code: https://code.google.com/archive/p/jquerycomet/ - Appears to have some sort of example usage in the subversion repository.
That said, the plugin seems to add a fair bit of complexity, it really is very simple on the client, perhaps (with jQuery) something like:
function doPoll() {
$.get("events.php", {}, function(result) {
$.each(result.events, function(event) { //iterate over the events
//do something with your event
});
doPoll();
//this effectively causes the poll to run again as
//soon as the response comes back
}, 'json');
}
$(document).ready(function() {
$.ajaxSetup({
timeout: 1000*60//set a global AJAX timeout of a minute
});
doPoll(); // do the first poll
});
The whole thing depends a lot on how your existing architecture is put together.
Update
As I continue to recieve upvotes on this, I think it is reasonable to remember that this answer is 4 years old. Web has grown in a really fast pace, so please be mindful about this answer.
I had the same issue recently and researched about the subject.
The solution given is called long polling, and to correctly use it you must be sure that your AJAX request has a "large" timeout and to always make this request after the current ends (timeout, error or success).
Long Polling - Client
Here, to keep code short, I will use jQuery:
function pollTask() {
$.ajax({
url: '/api/Polling',
async: true, // by default, it's async, but...
dataType: 'json', // or the dataType you are working with
timeout: 10000, // IMPORTANT! this is a 10 seconds timeout
cache: false
}).done(function (eventList) {
// Handle your data here
var data;
for (var eventName in eventList) {
data = eventList[eventName];
dispatcher.handle(eventName, data); // handle the `eventName` with `data`
}
}).always(pollTask);
}
It is important to remember that (from jQuery docs):
In jQuery 1.4.x and below, the XMLHttpRequest object will be in an
invalid state if the request times out; accessing any object members
may throw an exception. In Firefox 3.0+ only, script and JSONP
requests cannot be cancelled by a timeout; the script will run even if
it arrives after the timeout period.
Long Polling - Server
It is not in any specific language, but it would be something like this:
function handleRequest () {
while (!anythingHappened() || hasTimedOut()) { sleep(2); }
return events();
}
Here, hasTimedOut will make sure your code does not wait forever, and anythingHappened, will check if any event happend. The sleep is for releasing your thread to do other stuff while nothing happens. The events will return a dictionary of events (or any other data structure you may prefer) in JSON format (or any other you prefer).
It surely solves the problem, but, if you are concerned about scalability and perfomance as I was when researching, you might consider another solution I found.
Solution
Use sockets!
On client side, to avoid any compatibility issues, use socket.io. It tries to use socket directly, and have fallbacks to other solutions when sockets are not available.
On server side, create a server using NodeJS (example here). The client will subscribe to this channel (observer) created with the server. Whenever a notification has to be sent, it is published in this channel and the subscriptor (client) gets notified.
If you don't like this solution, try APE (Ajax Push Engine).
Hope I helped.
According to a slideshow about Facebook's Messaging system, Facebook uses the comet technology to "push" message to web browsers. Facebook's comet server is built on the open sourced Erlang web server mochiweb.
In the picture below, the phrase "channel clusters" means "comet servers".
Many other big web sites build their own comet server, because there are differences between every company's need. But build your own comet server on a open source comet server is a good approach.
You can try icomet, a C1000K C++ comet server built with libevent. icomet also provides a JavaScript library, it is easy to use as simple as:
var comet = new iComet({
sign_url: 'http://' + app_host + '/sign?obj=' + obj,
sub_url: 'http://' + icomet_host + '/sub',
callback: function(msg){
// on server push
alert(msg.content);
}
});
icomet supports a wide range of Browsers and OSes, including Safari(iOS, Mac), IEs(Windows), Firefox, Chrome, etc.
Facebook uses MQTT instead of HTTP. Push is better than polling.
Through HTTP we need to poll the server continuously but via MQTT server pushes the message to clients.
Comparision between MQTT and HTTP: http://www.youtube.com/watch?v=-KNPXPmx88E
Note: my answers best fits for mobile devices.
One important issue with long polling is error handling.
There are two types of errors:
The request might timeout in which case the client should reestablish the connection immediately. This is a normal event in long polling when no messages have arrived.
A network error or an execution error. This is an actual error which the client should gracefully accept and wait for the server to come back on-line.
The main issue is that if your error handler reestablishes the connection immediately also for a type 2 error, the clients would DOS the server.
Both answers with code sample miss this.
function longPoll() {
var shouldDelay = false;
$.ajax({
url: 'poll.php',
async: true, // by default, it's async, but...
dataType: 'json', // or the dataType you are working with
timeout: 10000, // IMPORTANT! this is a 10 seconds timeout
cache: false
}).done(function (data, textStatus, jqXHR) {
// do something with data...
}).fail(function (jqXHR, textStatus, errorThrown ) {
shouldDelay = textStatus !== "timeout";
}).always(function() {
// in case of network error. throttle otherwise we DOS ourselves. If it was a timeout, its normal operation. go again.
var delay = shouldDelay ? 10000: 0;
window.setTimeout(longPoll, delay);
});
}
longPoll(); //fire first handler