How to handle multiple redirection in puppeteer? - puppeteer

I am trying to open a page after a form post inside evaluate. There are two redirections after the form post (the count can vary) before I reach the final page.
I tried to handle it by putting the line below twice (once per redirection) after the evaluate call in which the form post happens.
await page.waitForNavigation({'waitUntil':'domcontentloaded'});
await page.waitForNavigation({'waitUntil':'domcontentloaded'});
The above works correctly, but I need to handle situations where any number of redirections can happen.
I can't rely on a specific DOM selector because the intermediate pages can differ each time.
Puppeteer version: 1.4.0
Platform / OS version: Linux
URLs (if applicable): NA
Node.js version: 8.10.0
Below is the part of the code I am using:
const formPost = await page.evaluate(a => {
    var form = formBuilder("payment_post", "post", acsUrl);
    for (var i in a) {
        form.add(i, i, 'hidden', a[i]);
    }
    form.generate("pareqFormContainer");
    form.submit();
    return document.querySelector('#pareqFormContainer').innerHTML;
}, jsonData)
.then(function () {
    logger.info("form submitted with pareq and MD for txnId : " + jsonData.txnId)
});
await page.waitForNavigation({'waitUntil' : 'domcontentloaded', 'timeout' : waitTimeOut});
await page.waitForNavigation({'waitUntil' : 'domcontentloaded', 'timeout' : waitTimeOut});
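One way to cope with an unknown number of redirects is to keep waiting for navigations in a loop and stop once no further navigation starts within a chosen timeout, i.e. once the page has settled. This is only a sketch under the assumption that a short per-redirect timeout is acceptable for your flow; redirectTimeout is a made-up tuning value:

// sketch: resolve once no new navigation starts within redirectTimeout ms
async function waitForRedirects(page, redirectTimeout) {
    while (true) {
        try {
            await page.waitForNavigation({ waitUntil: 'domcontentloaded', timeout: redirectTimeout });
        } catch (e) {
            // timeout: no navigation happened within redirectTimeout, assume the redirects are done
            break;
        }
    }
}
// usage, right after the form post:
// await waitForRedirects(page, waitTimeOut);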

Related

How to use a Javascript file to refresh/reload a div from an HTML file?

I am using Node.js and have a JS file which opens a connection to an API, works with the received API data and then saves the changed data into a JSON file. Next I have an HTML file which takes the data from the JSON file and puts it into a table. Finally I open the HTML file in my browser to look at the visualized table and its data.
What I would like to happen is that the table (or, more specifically, a DIV with an ID inside the table) in the HTML file refreshes itself whenever the JSON data gets updated by the JS file, like a "live table/website" that I can watch change over time without having to press F5.
Instead of just opening the HTML file locally, I have tried serving it from the JS file by creating a small HTTP server like this:
const http = require('http');
const path = require('path');
const fs = require('fs');
const browser = http.createServer(function (request, response) {
    var filePath = '.' + request.url;
    if (filePath == './') {
        filePath = './Table.html';
    }
    var extname = String(path.extname(filePath)).toLowerCase();
    var mimeTypes = {
        '.html': 'text/html',
        '.css': 'text/css',
        '.png': 'image/png',
        '.js': 'text/javascript',
        '.json': 'application/json'
    };
    var contentType = mimeTypes[extname] || 'application/octet-stream';
    fs.readFile(filePath, function (error, content) {
        response.writeHead(200, { 'Content-Type': contentType });
        response.end(content, 'utf-8');
    });
}).listen(3000);
This creates a working server and I am able to see the page in the browser, but sadly it doesn't update itself like I wish. I thought about some kind of function which gets called right after the JSON file is saved and tells the div to reload itself.
I also read about things like window.onload, location.reload() or getElementById(), but I am not able to figure out the right way.
What can I do?
Thank you.
Websockets!
Though they might sound scary, it's very easy to get started with websockets in NodeJS, especially if you use Socket.io.
You will need two dependencies in your node application:
"socket.io": "^4.1.3",
"socketio-wildcard": "^2.0.0"
your HTML File:
<script type="module" src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.0.0/socket.io.js"></script>
Your CLIENT SIDE JavaScript file:
var socket = io();
socket.on("update", function (data) { // "update" can be any sort of string, treat it like an event name
    console.log(data);
    // the rest of the code to update the html
});
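As a hedged sketch of what "the rest of the code" might look like, assuming the table body has id table-data and the pushed data is an array of row objects (both the id and the data shape are made up for illustration):

socket.on("update", function (rows) {
    // rows is assumed to be an array like [{ name: "...", value: ... }, ...]
    var tbody = document.getElementById("table-data");
    tbody.innerHTML = rows
        .map(function (row) {
            return "<tr><td>" + row.name + "</td><td>" + row.value + "</td></tr>";
        })
        .join("");
});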
your NODE JS file:
import { Server } from "socket.io";
// other code...
let io = new Server(server);
let activeConnections = {};
io.sockets.on("connection", function (socket) {
    // 'connection' is a "magic" key
    // track the active connections
    activeConnections[socket.id] = socket;
    socket.on("disconnect", function () {
        /* Not required, but you can add special handling here to prevent errors */
        delete activeConnections[socket.id];
    });
    socket.on("update", (data) => {
        // "update" is any sort of key
        console.log(data);
    });
});
// Example with Express
app.get('/some/api/call', function (req, res) {
    var data = // your API Processing here
    Object.keys(activeConnections).forEach((id) => {
        activeConnections[id].emit('update', data);
    });
    res.send(data);
});
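To tie this back to the question, the trigger for the live refresh is an emit right after the JSON file is written. As a sketch, assuming the file is called data.json and changedData is the object your API-processing code produced (both names are placeholders):

import fs from "fs";
// ... after your API-processing code has produced changedData ...
fs.writeFile("data.json", JSON.stringify(changedData), (err) => {
    if (err) return console.error(err);
    // notify every connected browser; the client-side "update" handler then redraws the table
    io.emit("update", changedData);
});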
Finally, shameful self promotion, here's one of my "dead" side projects using websockets, because I'm sure I forgot some small detail, and this might help. https://github.com/Nhawdge/robert-quest

puppeteer cluster _ how to prevent close page?

I am glad to have found puppeteer-cluster. This library made crawling and automation tasks easy. Thanks to Thomas Dondorf.
According to the author of puppeteer-cluster, when a task finishes, the page is closed immediately. This is good by the way, but what about cases where you need the page to stay open?
My use case:
I will try to explain briefly:
There is some activity on the page where a socket in the background sends data to the front end. This data changes the DOM, and I need to capture that.
This is my code:
async function runCrawler(){
    const links = [
        "foo.com/barSome324",
        "foo.com/barSome22",
        "foo.com/barSome1",
        "foo.com/barSome765",
    ]
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        workerCreationDelay: 5000,
        puppeteerOptions: { args: ['--no-sandbox', '--disable-setuid-sandbox'], headless: false },
        maxConcurrency: numCPUs,
    });
    await cluster.task(async ({ page, data: url }) => {
        await crawler(page, url)
    });
    for (const link of links) {
        await cluster.queue(link);
    }
    await cluster.idle();
    await cluster.close();
}
And this is the crawler logic for each page:
module.exports.crawler = async (page, link) => {
    await page.goto(link, { waitUntil: 'networkidle2' })
    await page.waitForTimeout(10000)
    await page.waitForSelector('#dbp')
    try {
        // method to be executed;
        setInterval(async () => {
            const tables = await page.evaluate(async () => {
                /// data I need to catch every 30 seconds
            });
        }, 30000)
    } catch (error) {
        console.log(error)
    }
}
I searched and found out that in JS we can capture DOM changes with a MutationObserver, and tried that solution, but it did not work either. The page is closed with this error:
UnhandledPromiseRejectionWarning: Error: Protocol error
(Runtime.callFunctionOn): Session closed. Most likely the page has
been closed.
So I have two options here:
1. MutationObserver
2. A setInterval that evaluates the page every 30 seconds.
But they did not suit my needs. So, any idea how to overcome this problem?
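The page is closed because the cluster treats the task as finished as soon as the task function returns, while the setInterval callbacks keep firing afterwards against a closed page. A hedged sketch of one way around this is to keep the task function alive by awaiting a promise that only resolves once the sampling is done (the sample count of 10 and the 30-second interval are example values, not part of the original code):

module.exports.crawler = async (page, link) => {
    await page.goto(link, { waitUntil: 'networkidle2' })
    await page.waitForSelector('#dbp')
    // the task (and therefore the page) stays open until this promise settles
    await new Promise((resolve, reject) => {
        let samples = 0
        const timer = setInterval(async () => {
            try {
                const tables = await page.evaluate(() => {
                    // return the data you need to capture every 30 seconds
                })
                console.log(tables)
                samples += 1
                if (samples >= 10) {      // stop after 10 samples, i.e. roughly 5 minutes
                    clearInterval(timer)
                    resolve()
                }
            } catch (error) {
                clearInterval(timer)
                reject(error)
            }
        }, 30000)
    })
}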

Can't access arrayBuffer on RangeRequest

Trying to solve the problem referenced in this article: https://philna.sh/blog/2018/10/23/service-workers-beware-safaris-range-request/
and here:
PWA - cached video will not play in Mobile Safari (11.4)
The root problem is that we aren't able to show videos on Safari. The article says it has the fix for the issue, but it seems to cause another problem on Chrome. A difference in our solution is that we aren't using caching. Currently we just want to pass the request through in our service worker. The implementation looks like this:
self.addEventListener('fetch', function (event) {
    if (event.request.cache === 'only-if-cached' && event.request.mode !== 'same-origin') {
        return;
    }
    if (event.request.headers.get('range')) {
        event.respondWith(returnRangeRequest(event.request));
    } else {
        event.respondWith(fetch(event.request));
    }
});
function returnRangeRequest(request) {
    return fetch(request)
        .then(res => {
            return res.arrayBuffer();
        })
        .then(function (arrayBuffer) {
            var bytes = /^bytes\=(\d+)\-(\d+)?$/g.exec(
                request.headers.get('range')
            );
            if (bytes) {
                var start = Number(bytes[1]);
                var end = Number(bytes[2]) || arrayBuffer.byteLength - 1;
                return new Response(arrayBuffer.slice(start, end + 1), {
                    status: 206,
                    statusText: 'Partial Content',
                    headers: [
                        ['Content-Range', `bytes ${start}-${end}/${arrayBuffer.byteLength}`]
                    ]
                });
            } else {
                return new Response(null, {
                    status: 416,
                    statusText: 'Range Not Satisfiable',
                    headers: [['Content-Range', `*/${arrayBuffer.byteLength}`]]
                });
            }
        });
}
We do get an array buffer back from the range-request fetch, but it has a byteLength of zero and appears to be empty. The range header actually contains "bytes=0-", and subsequent requests have a start value but no end value.
Maybe there is some feature detection we can do to determine that it's Chrome and just call fetch regularly? I'd rather have a solution that works everywhere, though. Also, res shows type: "opaque", so maybe that has something to do with it. Not quite sure what to look at next. If we can't solve the problem for Chrome, I might need a different solution for Safari.
It turned out to be the opaque response. I didn't realize that the fetch was being made in 'no-cors' mode by default. Adding 'cors' mode and overwriting the range header seems to have allowed the rewrite to work on Chrome. Sadly, it still doesn't work on Safari, but I was able to access the arrayBuffer after setting the CORS values properly.
Here is the change I had to make:
var myHeaders = {};
return fetch(request, { headers: myHeaders, mode: 'cors', credentials: 'omit' })
    .then(res => {
        return res.arrayBuffer();
    })
It's important that the server responds with the appropriate CORS headers, e.g.
access-control-allow-methods: GET
access-control-allow-origin: *
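For reference, here is a minimal sketch of how a Node/Express static file server could send those headers; the Express setup and the 'public' directory are assumptions, and the header values come from above:

const express = require('express');
const app = express();

// attach the CORS headers the service worker needs in order to read the response body
app.use(function (req, res, next) {
    res.setHeader('Access-Control-Allow-Origin', '*');
    res.setHeader('Access-Control-Allow-Methods', 'GET');
    next();
});

// express.static understands Range headers, so partial content still works
app.use(express.static('public'));
app.listen(8080);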

get post title after Infinite scroll finished

I managed to show all the posts on a site that has a load_more button to go to the next page, but something is missing.
I get this error:
e Error: Node is either not visible or not an HTMLElement
at ElementHandle._clickablePoint (/Users/minghann/Documents/productnation_scraper/node_modules/puppeteer/lib/ExecutionContext.js:331:13)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
which doesn't happen if I don't load all the posts. It's hard to debug because I don't know which post is missing what. Full code below:
const browser = await puppeteer.launch({
    devtools: true
});
const page = await browser.newPage();
await page.goto("https://example.net");
await page.waitForSelector(".load_more_btn");
const load_more_exist = !!(await page.$(".load_more_btn"));
while (load_more_exist > 0) {
    await page.click(".load_more_btn");
}
const posts = await page.$$(".post");
let result = [];
for (const post of posts) {
    result = [
        ...result,
        {
            title: await post.$eval(".post_title a", e => e.innerText)
        }
    ];
}
console.log(result);
browser.close();
There are multiple ways, and the best way is to combine the following two different approaches.
Look for Ajax
Wait for the request instead. Whenever you click on Load More, it makes a simple ajax request to ?ajax-request=jnews. We can use .waitForRequest or .waitForResponse for this use case. Here is a working example:
await Promise.all([
    page.waitForResponse(response => response.url().includes('?ajax-request=jnews') && response.status() === 200),
    page.click(".load_more_btn")
])
Clean the DOM and wait for new elements
Refer to these answers here and here.
Basically, you can remove the DOM elements that you have already collected, so the next time you collect more data there won't be any duplicates.
So, once you remove all current elements like document.querySelectorAll('.jeg_post'), you can simply do another page.waitFor('.jeg_post') later if you need to.
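As a hedged sketch combining both ideas (the selectors .load_more_btn, .post and .post_title a come from the question, ?ajax-request=jnews from above; the overall flow is an assumption, not tested against the real site):

let result = [];
while (true) {
    // collect the titles currently on the page, then remove those nodes so they are not counted twice
    const titles = await page.evaluate(() => {
        const posts = Array.from(document.querySelectorAll(".post"));
        const titles = posts.map(post => post.querySelector(".post_title a").innerText);
        posts.forEach(post => post.remove());
        return titles;
    });
    result = result.concat(titles.map(title => ({ title })));
    const loadMore = await page.$(".load_more_btn");
    if (!loadMore) break; // button gone: no more pages
    // click and wait for the ajax response that brings the next batch;
    // if the button never goes away, waitForResponse will eventually time out and throw
    await Promise.all([
        page.waitForResponse(res => res.url().includes("?ajax-request=jnews") && res.status() === 200),
        loadMore.click()
    ]);
}
console.log(result);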

Sending nested object via post request

I'm running this little Node/Express server, which is supposed to check whether a voucher is valid and then send an answer back to the client.
This is my code:
app.post('/voucher', function (request, response) {
    response.setHeader('Access-Control-Allow-Origin', '*');
    response.setHeader('Access-Control-Request-Method', '*');
    response.setHeader('Access-Control-Allow-Methods', 'OPTIONS, GET');
    response.setHeader('Access-Control-Allow-Headers', 'authorization, content-type');
    if (request.method === 'OPTIONS') {
        response.writeHead(200);
        response.end();
        return;
    }
    console.log(request)
    let results;
    let body = [];
    request.on('data', function (chunk) {
        body.push(chunk);
    }).on('end', function () {
        results = Buffer.concat(body).toString();
        // results = JSON.parse(results);
        console.log('#### CHECKING VOUCHER ####', results)
        let success = {success: true, voucher: {name: results, xxx: 10}}
        success = qs.escape(JSON.stringify(success))
        response.end(success)
    })
});
It is obviously just an example and the actual check is not implemented yet. So far so good.
Now on the client side, where I work with React, I cannot seem to decode the string I just sent there.
There I'm doing this:
var voucherchecker = $.post('http://localhost:8080/voucher', code, function (res) {
    console.log(res)
    let x = JSON.parse(res)
    console.log(x)
    console.log(qs.unescape(x))
It gives me the error
Uncaught SyntaxError: Unexpected token % in JSON at position 0
When I do it the other way around,
let x = qs.unescape(res)
console.log(x)
console.log(JSON.parse(x))
then it tells me
Uncaught TypeError: _querystring2.default.unescape is not a function
Maybe you can help me? I don't know what the issue is here. Thank you.
Also, another question on this behalf, since I'm only a beginner: is there a smarter way to do such things than what I'm doing now? I have React, which renders on the client, and a mini Express server which interacts with it a few times during the payment process.
They both run on different ports.
What would be the standard way or best practice to do such things?
I'm a bit perplexed as to why your backend code has so much going on in the request handler.
Since you asked whether there is a different way to write this, I will share how I would write it.
Server
It seems that you want your requests to enable CORS, and it also seems that you originally wanted to parse a JSON request body.
This is how I would recommend you rewrite your endpoint:
POST /voucher takes a request with JSON body
{
    code: "xxxxx"
}
and responds with
{
    success: true,
    voucher: {
        name: results,
        xxx: 10
    }
}
I would recommend you use Express's middleware feature, since you will probably want CORS and JSON parsing in most of your requests. So in your project:
npm install body-parser
npm install cors
Then in your app initialization:
var express = require('express')
var bodyParser = require('body-parser')
var cors = require('cors')
var app = express()
// parse application/x-www-form-urlencoded
app.use(bodyParser.urlencoded({ extended: false }))
// parse application/json (you can choose to just parse raw text as well)
app.use(bodyParser.json())
// this will set Access-Control-Allow-Origin: * and similar response headers
app.use(cors())
You can read more about body-parser and cors in their respective repos. If you don't want to use them, I would still recommend you write your own middleware in order to reduce future redundancy in your code; a minimal sketch follows.
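Such a hand-rolled middleware, assuming you only need the headers from the original snippet (with POST added since the endpoint is a POST), might look like this:

app.use(function (req, res, next) {
    // same headers the original handler set manually, applied to every request
    res.setHeader('Access-Control-Allow-Origin', '*');
    res.setHeader('Access-Control-Allow-Methods', 'OPTIONS, GET, POST');
    res.setHeader('Access-Control-Allow-Headers', 'authorization, content-type');
    if (req.method === 'OPTIONS') {
        return res.sendStatus(200);    // answer preflight requests immediately
    }
    next();
});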
So far this will substitute this part of your code
response.setHeader('Access-Control-Allow-Origin', '*');
response.setHeader('Access-Control-Request-Method', '*');
response.setHeader('Access-Control-Allow-Methods', 'OPTIONS, GET');
response.setHeader('Access-Control-Allow-Headers', 'authorization, content-type');
if (request.method === 'OPTIONS') {
    response.writeHead(200);
    response.end();
    return;
}
console.log(request)
let results;
let body = [];
request.on('data', function (chunk) {
    body.push(chunk);
}).on('end', function () {
    results = Buffer.concat(body).toString();
    // results = JSON.parse(results);
Now your route definition can just be:
app.post('/voucher', function (request, response) {
    var result = request.body.code // added by body-parser
    console.log('#### CHECKING VOUCHER ####', result)
    // express 4+ is smart enough to send this as json
    response.status(200).send({
        success: true,
        voucher: {
            name: result,
            xxx: 10
        }
    })
})
Client
Your client side can then be (assuming $ is jQuery's post function):
var body = {
    code: code
}
$.post('http://localhost:8080/voucher', body).then(function (res) {
    // with jQuery, res is already the parsed JSON response body, no further decoding needed
    console.log(res)
    console.log(res.voucher)
    return res
})