Communicate "out" from Chromium via DevTools protocol - google-chrome

I have a page running in a headless Chromium instance, and I'm manipulating it via the DevTools protocol, using the Puppeteer NPM package in Node.
I'm injecting a script into the page. At some point, I want the script to call me back and send me some information (via some event exposed by the DevTools protocol or some other means).
What is the best way to do this? It'd be great if it can be done using Puppeteer, but I'm not against getting my hands dirty and listening for protocol messages by hand.
I know I can sort-of do this by manipulating the DOM and listening to DOM changes, but that doesn't sound like a good idea.

Okay, I've discovered a built-in way to do this in Puppeteer. Puppeteer defines a method called exposeFunction.
page.exposeFunction(name, puppeteerFunction)
This method defines a function with the given name on the window object of the page. The function is async on the page's side. When it's called, the puppeteerFunction you define is executed as a callback, with the same arguments. The arguments aren't JSON-serialized, but passed as JSHandles so they expose the objects themselves. Personally, I chose to JSON-serialize the values before sending them.
I've looked at the code, and it actually works by sending console messages (just like in Pasi's answer) that the Puppeteer console hooks then ignore. However, if you listen to the console directly (e.g. by piping stdout), you'll still see them alongside the regular messages.
Since the console information is actually sent over the DevTools WebSocket, it's pretty efficient. I was initially averse to using the console because in most processes it transfers data via stdout, which has its own issues.
Example
Node
async function example() {
  const puppeteer = require("puppeteer");
  let browser = await puppeteer.launch({
    //arguments
  });
  let page = await browser.newPage();
  await page.exposeFunction("callPuppeteer", function(data) {
    console.log("Node receives some data!", data);
  });
  await page.goto("http://www.example.com/target");
}
Page
Inside the page's javascript:
window.callPuppeteer(JSON.stringify({
  thisCameFromThePage: "hello!"
}));
Update: DevTools protocol support
There is DevTools protocol support for something like puppeteer.exposeFunction.
https://chromedevtools.github.io/devtools-protocol/tot/Runtime#method-addBinding
If executionContextId is empty, adds binding with the given name on
the global objects of all inspected contexts, including those created
later, bindings survive reloads. If executionContextId is specified,
adds binding only on global object of given execution context. Binding
function takes exactly one argument, this argument should be string,
in case of any other input, function throws an exception. Each binding
function call produces Runtime.bindingCalled notification.
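For illustration, here's a minimal sketch of driving Runtime.addBinding yourself through a Puppeteer CDP session. This is an assumption-laden sketch, not the library's documented recipe: it assumes a current Puppeteer (where page.target().createCDPSession() exists; newer versions also offer page.createCDPSession()), and sendToNode is just an example binding name.

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Depending on the Puppeteer version, page.createCDPSession() may be used instead.
  const client = await page.target().createCDPSession();

  await client.send("Runtime.enable");
  // Expose window.sendToNode (an example name) in every execution context of this target.
  await client.send("Runtime.addBinding", { name: "sendToNode" });
  client.on("Runtime.bindingCalled", ({ name, payload }) => {
    if (name === "sendToNode") {
      console.log("Node received:", JSON.parse(payload));
    }
  });

  // The binding takes exactly one string argument, so JSON-serialize on the page side.
  await page.evaluateOnNewDocument(() => {
    window.sendToNode(JSON.stringify({ ready: true }));
  });

  await page.goto("http://www.example.com/target");
  await browser.close();
})();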

If the script sends all its data back in one call, the simplest approach would be to use page.evaluate and return a Promise from it:
const dataBack = page.evaluate(`new Promise((resolve, reject) => {
  setTimeout(() => resolve('some data'), 1000)
})`)
dataBack.then(value => { console.log('got data back', value) })
This could be generalized to sending data back twice, etc. For sending back an arbitrary stream of events, perhaps console.log would be slightly less of a hack than DOM events? At least it's super-easy to do with Puppeteer:
page.on('console', message => {
  // In recent Puppeteer versions, text() and args() are methods rather than properties.
  if (message.text().startsWith('dataFromMyScript')) {
    message.args()[1].jsonValue().then(value => console.log('got data back', value))
  }
})
page.evaluate(`setInterval(() => console.log('dataFromMyScript', {ts: Date.now()}), 1000)`)
(The example uses a magic prefix to distinguish these log messages from all others.)

Related

Target closed outside of normal flow

I'm trying to automate a process with puppeteer. When I added a new feature that involved using a new tab opened in a different window, I started getting a Target closed error (stack below). I'm familiar with this error in other situations, but here I don't have a clue as to why it's happening. The version of puppeteer I'm using is 19.0.0.
This is the error stack:
Target closed
at node_modules/puppeteer-core/src/common/Page.ts:1599:26
at onceHandler (node_modules/puppeteer-core/src/common/EventEmitter.ts:130:7)
at node_modules/puppeteer-core/lib/cjs/third_party/mitt/index.js:3:232
at Array.map (<anonymous>)
at Object.emit (node_modules/puppeteer-core/lib/cjs/third_party/mitt/index.js:3:216)
at CDPSessionImpl.emit (node_modules/puppeteer-core/src/common/EventEmitter.ts:118:18)
at CDPSessionImpl._onClosed (node_modules/puppeteer-core/src/common/Connection.ts:457:10)
at Connection.onMessage (node_modules/puppeteer-core/src/common/Connection.ts:164:17)
at WebSocket.<anonymous> (node_modules/puppeteer-core/src/common/NodeWebSocketTransport.ts:50:24)
at WebSocket.onMessage (node_modules/puppeteer-core/node_modules/ws/lib/event-target.js:199:18)
When I skip the procedure that involves the second window, the error doesn't appear.
This is my cleanup method, which runs when the process has finished:
public async destroy() {
  let browserIsConnected: boolean = !!this._browser?.isConnected();
  if (this._browser && browserIsConnected) {
    for (let pg of await this._browser.pages()) {
      this.logger.debug(`Closing page ${pg.url()}`);
      await pg.close();
    }
    this.logger.debug(`Closing browser instance...`);
    await this._browser?.close();
    this.logger.log(`Closed browser connection`);
  } else {
    this.logger.log(`Browser already destroyed`);
  }
  delete this._browser;
}
I tried omitting the page.close() calls, which didn't change anything, and try/catching every library call in the method, but none of them throw. When running the code, the error is logged in parallel with this._browser?.close(), framed by the log lines above and below it. However, the stack doesn't point to that call and I don't know how to catch it. Other than this, the process runs smoothly and the browser closes successfully, but the error makes my integration tests fail. Sorry about not sharing a reproducible case, but I couldn't reproduce it without disclosing my business logic.
My question is: why is this happening? is there any way to avoid it?
I eventually figured this out: the source of the problem wasn't the cleanup code above but the event handling logic during the process. While waiting for the popup to show up, I was doing the following:
page.once('popup', async (newPage: Page) => {
  // capture the information inside the page, with many awaits
});
What I didn't know was that mitt, Puppeteer's underlying event handling library, doesn't support asynchronous event handlers, so my handler was never awaited properly. I solved this by resolving a promise from the handler and awaiting it further down in the code:
let pagePromiseResolve: Function;
let pagePromise: Promise<Page> = new Promise(resolve => {
  pagePromiseResolve = resolve;
});
page.once('popup', newPage => pagePromiseResolve(newPage));
let newPage = await pagePromise;
// capture the information inside the page, with many awaits
I'm leaving this here in case it helps somebody, either with this specific use case or with awaiting events from a library like mitt; a generic helper for that pattern is sketched below.
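For the general case, a tiny helper (illustrative, not part of Puppeteer's API) can wrap that resolve-from-the-handler pattern:

// Turn a one-shot event from a mitt-style emitter into an awaitable promise.
function onceEvent(emitter, eventName) {
  return new Promise(resolve => emitter.once(eventName, resolve));
}

// Usage, equivalent to the snippet above:
// const newPage = await onceEvent(page, 'popup');
// ...capture the information inside newPage, with as many awaits as needed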

use of $timeout with 0 milliseconds

HttpMethod.CallHttpPOSTMethod('POST', null, path).success(function (response) {
  console.log(response);
  $scope.htmlString = $sce.trustAsHtml(response.data[0]);
  $timeout(function () {
    var temp = document.getElementById('form');
    if (temp != null) {
      temp.submit();
    }
  }, 0);
});
I get an HTML string in the response of my API call and then add that HTML to my view page.
If I write the code outside the $timeout service it doesn't work, but it works when written inside $timeout.
What is the difference between the two ways?
How is $timeout useful here?
Changes made to scope variables from code running outside Angular's digest cycle are not picked up by two-way binding. If the asynchronous code is wrapped in $timeout, $scope.$apply, etc., a digest is triggered and the binding updates. For the current code example, I would try replacing your code with:
HttpMethod.CallHttpPOSTMethod('POST', null, path).success(function (response) {
  console.log(response);
  $scope.htmlString = $sce.trustAsHtml(response.data[0]);
  var temp = document.getElementById('form');
  if (temp != null) {
    temp.submit();
  }
  $scope.$apply();
});
I'll try to explain this in very simple language; I hope it helps you understand the issue.
Generally, when an HTTP request fires, it is sent to the server and the data comes back from the server; that is the scenario we have in mind. In practice, network latency can delay the response.
An AngularJS application has its own lifecycle.
The root scope is created during application bootstrap by the $injector. During template linking, directives create new child scopes.
During template linking, watchers are registered on each scope to detect changes to it.
In your case, a watcher is registered when the template is linked and the directive is bound. Because of network latency (or some other reason) the $http response arrives late, and by then the digest that would have picked up the change has already run, so the view does not show the updated response.
An $http request is an asynchronous operation. When you wrap the follow-up code in $timeout, it is deferred by the number of milliseconds you specify; after that delay a new digest runs, the watchers on your scope variables execute, and the value is updated, provided the response has arrived by then.
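To make the timing concrete, here is a minimal sketch with a hypothetical controller and module name, following the pattern from the question. The point is that $timeout(fn, 0) defers the DOM lookup until after the digest has rendered htmlString (e.g. via ng-bind-html in the template):

// Hypothetical controller: the $timeout callback runs after the current digest
// has flushed htmlString into the DOM, so getElementById('form') can find it.
angular.module('app').controller('FormCtrl', function ($scope, $sce, $timeout) {
  $scope.htmlString = $sce.trustAsHtml('<form id="form" action="/submit"></form>');

  $timeout(function () {
    var form = document.getElementById('form');
    if (form != null) {
      form.submit();
    }
  }, 0);
});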

Using Chrome DevTools Protocol Input.dispatchKeyEvent or Input.dispatchMouseEvent to send an event

I'm writing a DSL that will interact with a page via Google Chrome's Remote Debugging API.
The INPUT domain (link here:
https://chromedevtools.github.io/devtools-protocol/1-2/Input/) lists two functions that can be used for sending events: Input.dispatchKeyEvent and Input.dispatchMouseEvent.
I can't seem to figure out how to specify the target element: there is no obvious link between these two functions and a DOM.NodeId, nor an intermediate API that accepts a DOM.NodeId and returns an X,Y coordinate.
I know it's possible to use Selenium, but I'm interested in doing this directly over WebSockets.
Any help is appreciated.
Brief Intro
I'm currently working on a NodeJS interaction library to work with Chrome Headless via the Remote Debugging Protocol. The idea is to integrate it into my colleague's testing framework to eventually replace the usage of PhantomJS, which is no longer being supported.
Evaluating JavaScript
I'm just experimenting with things currently, but I have a way of evaluating JavaScript on the page, for example, to click on an element via a selector reference. It should in theory work for anything, assuming my implementation isn't flawed.
let evaluateOnPage = function (fn) {
  // Stringify every argument after the function itself so it can be embedded
  // in the evaluated expression.
  let args = [...arguments].slice(1).map(a => {
    return JSON.stringify(a);
  });
  let evaluationStr = `
    (function() {
      let fn = ${String(fn)};
      return fn.apply(null, [${args}]);
    })()`;
  return Runtime.evaluate({expression: evaluationStr});
};
The code above will accept a function and any number of arguments. It will turn the arguments into strings, so they are serializable. It then evaluates an IIFE on the page, which calls the function passed in with the arguments.
Example Usage
let selector = '.mySelector';
let result = evaluateOnPage(selector => {
  return document.querySelector(selector).click();
}, selector);
The result of Runtime.evaluate is a promise; once it is fulfilled, you can check the result object's type and subtype to determine success or failure. For example, subtype may be node or error.
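For example, a hedged sketch of inspecting that result object (field names follow the Runtime domain docs; exact handling depends on your wrapper):

evaluateOnPage(selector => document.querySelector(selector).click(), '.mySelector')
  .then(({ result, exceptionDetails }) => {
    // exceptionDetails is set when the evaluated expression threw;
    // result.subtype can be 'node', 'error', etc. for object results.
    if (exceptionDetails || (result && result.subtype === 'error')) {
      console.error('Evaluation failed:', exceptionDetails || result.description);
    } else {
      console.log('Evaluated OK, type:', result.type, 'subtype:', result.subtype);
    }
  });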
I hope this may be of some use to you.
This protocol is probably not the best fit if you want to click on specific elements rather than on particular spots on the screen.
It's important to keep in mind that this area of the DevTools protocol is intended to emulate raw input. You can figure out the position of elements using the protocol, or by running some JavaScript in the page, but it might be better to inject JavaScript into the page and use something like target.dispatchEvent() with a MouseEvent instead.
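That said, if you do want to go from a selector to raw coordinates over the protocol, here is a hedged sketch of that flow using the chrome-remote-interface package (assumed installed, with Chrome listening on the default debugging port); field names follow the DevTools protocol docs, and high-DPI pages may need devicePixelRatio handling:

const CDP = require('chrome-remote-interface');

async function clickSelector(selector) {
  const client = await CDP();
  const { DOM, Input } = client;
  try {
    await DOM.enable();
    const { root } = await DOM.getDocument();
    const { nodeId } = await DOM.querySelector({ nodeId: root.nodeId, selector });
    // model.content is a quad: [x1, y1, x2, y2, x3, y3, x4, y4]
    const { model } = await DOM.getBoxModel({ nodeId });
    const [x1, y1, , , x3, y3] = model.content;
    const x = (x1 + x3) / 2;
    const y = (y1 + y3) / 2;
    // A click is a mousePressed followed by a mouseReleased at the same point.
    for (const type of ['mousePressed', 'mouseReleased']) {
      await Input.dispatchMouseEvent({ type, x, y, button: 'left', clickCount: 1 });
    }
  } finally {
    await client.close();
  }
}

// clickSelector('.mySelector');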

Service Worker not caching API content on first load

I've created a service worker enabled application that is intended to cache the response from an AJAX call so it's viewable offline. The issue I'm running into is that the service worker caches the page, but not the AJAX response the first time it's loaded.
If you visit http://ivesjames.github.io/pwa and switch to airplane mode after the SW toast, it shows no API content. If you go back online, load the page, and then do it again, the API content loads offline on the second load.
This is what I'm using to cache the API response (taken from the Polymer docs):
(function(global) {
  global.untappdFetchHandler = function(request) {
    // Attempt to fetch(request). This will always make a network request, and will include
    // the full request URL, including the search parameters.
    return global.fetch(request).then(function(response) {
      if (response.ok) {
        // If we got back a successful response, great!
        return global.caches.open(global.toolbox.options.cacheName).then(function(cache) {
          // First, store the response in the cache, stripping away the search parameters
          // to normalize the URL key.
          return cache.put(stripSearchParameters(request.url), response.clone()).then(function() {
            // Once that entry is written to the cache, return the response to the controlled page.
            return response;
          });
        });
      }
      // If we got back an error response, raise a new Error, which will trigger the catch().
      throw new Error('A response with an error status code was returned.');
    }).catch(function(error) {
      // This code is executed when there's either a network error or a response with an
      // error status code was returned.
      return global.caches.open(global.toolbox.options.cacheName).then(function(cache) {
        // Normalize the request URL by stripping the search parameters, and then return a
        // previously cached response as a fallback.
        return cache.match(stripSearchParameters(request.url));
      });
    });
  };
})(self);
And then I define the handler in the sw-import:
<platinum-sw-import-script href="scripts/untappd-fetch-handler.js">

<platinum-sw-fetch handler="untappdFetchHandler"
                   path="/v4/user/checkins/jimouk?client_id=(apikey)&client_secret=(clientsecret)"
                   origin="https://api.untappd.com">
</platinum-sw-fetch>

<paper-toast id="caching-complete"
             duration="6000"
             text="Caching complete! This app will work offline.">
</paper-toast>

<platinum-sw-register auto-register
                      clients-claim
                      skip-waiting
                      base-uri="bower_components/platinum-sw/bootstrap"
                      on-service-worker-installed="displayInstalledToast">
  <platinum-sw-cache default-cache-strategy="fastest"
                     cache-config-file="cache-config.json">
  </platinum-sw-cache>
</platinum-sw-register>
Am I going wrong somewhere? I'm not quite sure why it works on load #2 but not load #1.
Any help would be appreciated.
While the skip-waiting + clients-claim attributes should cause your service worker to take control as soon as possible, it's still an asynchronous process that might not kick in until after your AJAX request is made. If you want to guarantee that the service worker will be in control of the page, then you'd need to either delay your AJAX request until the service worker has taken control (following, e.g., this technique), or alternatively, you can use the reload-on-install attribute.
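If you take the delay-the-request route, a minimal sketch of waiting for the service worker to control the page (not the exact technique linked above; assumes clients-claim is in effect) could look like this:

// Resolve once the page is controlled by a service worker (immediately if it
// already is); clients-claim makes controllerchange fire on first install.
function whenControlled() {
  if (navigator.serviceWorker.controller) {
    return Promise.resolve();
  }
  return new Promise(function (resolve) {
    navigator.serviceWorker.addEventListener('controllerchange', function onChange() {
      navigator.serviceWorker.removeEventListener('controllerchange', onChange);
      resolve();
    });
  });
}

whenControlled().then(function () {
  // Safe to fire the Untappd AJAX request here: the untappdFetchHandler
  // fetch handler will see it and populate the cache on the first load.
});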
Equally important, though, make sure that your <platinum-sw-import-script> and <platinum-sw-fetch> elements are children of your <platinum-sw-register> element, or else they won't have the intended effect. This is called out in the documentation, but unfortunately it's just a silent failure at runtime.

Any workaround for Chrome M40 redirect bug for service workers?

We have images that redirect from our media server to a CDN that I'm trying to exclude from my service worker logic to work around the bug in Chrome 40. In Canary the same worker is able to work just fine. I thought there was an event.default() to fall back to the standard behavior but I don't see that in Chrome's implementation, and reading the spec it seems like the current recommendation is to just use fetch(event.request).
So my question is: do I have to wait until 99% of our users move to Chrome 41+ in order to use service workers in this scenario, or is there some way to opt out for certain requests?
The core of my logic is below:
worker.addEventListener('install', function(event) {
  event.waitUntil(getDefaultCache().then(function(cache) {
    return cache.addAll(precacheUrls);
  }));
});

worker.addEventListener('fetch', function(event) {
  event.respondWith(getDefaultCache().then(function(cache) {
    return cache.match(event.request).then(function(response) {
      if (!response) {
        return fetch(event.request.clone()).then(function(response) {
          if (cacheablePatterns.some(function(pattern) {
            return pattern.test(event.request.url);
          })) {
            cache.put(event.request, response.clone());
          }
          return response;
        });
      }
      return response;
    });
  }));
});
Once you're inside an event.respondWith() you do need to issue a response, or you'll incur a network error. You're correct that event.default() isn't currently implemented.
A general solution is to not enter the event.respondWith() if you can determine synchronously that you don't want to handle the event. A basic example is something like:
function fetchHandler(event) {
  if (event.request.url.indexOf('abc') >= 0) {
    event.respondWith(abcResponseLogic);
  } else if (event.request.url.indexOf('def') >= 0) {
    event.respondWith(defResponseLogic);
  }
}

self.addEventListener('fetch', fetchHandler);
If event.respondWith() isn't called, then this fetch handler is a no-op, and any additional registered fetch handlers get a shot at the request. Multiple fetch handlers are called in the order in which they're added via addEventListener, one at a time, until the first one calls event.respondWith().
If no fetch handlers call event.respondWith(), then the user agent makes the request exactly as it normally would if there were no service worker involvement.
The one tricky thing to take into account is that the determination as to whether to call event.respondWith() needs to be done synchronously inside each fetch handler. Anything that relies on asynchronous promise resolution can't be used to determine whether or not to call event.respondWith(). If you attempt to do something asynchronous and then call event.respondWith(), you'll end up with a race condition, and likely will see errors in the service worker console about how you can't respond to an event that was already handled.
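Applied back to the original question, here is a hedged sketch that synchronously skips respondWith() for the redirecting media-server requests; the URL pattern is purely illustrative, and the cache logic is a simplified stand-in for the handler in the question:

// Requests matching these patterns are left to the browser: returning without
// calling event.respondWith() means no service worker involvement for them.
var passthroughPatterns = [/^https:\/\/media\.example\.com\//]; // illustrative

self.addEventListener('fetch', function (event) {
  if (passthroughPatterns.some(function (p) { return p.test(event.request.url); })) {
    return; // handled natively, sidestepping the Chrome 40 redirect bug
  }
  event.respondWith(
    caches.match(event.request).then(function (response) {
      return response || fetch(event.request);
    })
  );
});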