I have an application that needs to generate a PDF report, but I am having trouble creating the PDF.
I want the PDF to look good, but I don't know the height of its content in advance, so when I create the PDF the pages break in the wrong places.
Is there any way to render the whole PDF as a single page so that it won't break anywhere?
I have added my code below:
const reportDetailHTML = await this.getHTML("report-detail.html", {
  ...
});
const sectionHTML = await this.getHTML(
  "section-divider.html",
  {
    ...
  }
);
const browser = await puppeteer.launch({
  args: ["--font-render-hinting=none", "--force-color-profile=srgb"],
  headless: true,
});
const page: any = await browser.newPage();
await page.setUserAgent(
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
);
await page.setContent(reportDetailHTML + sectionHTML, {
  waitUntil: "networkidle0",
});
let height = await page.evaluate(
  () => document.documentElement.offsetHeight
);
console.log(height);
await page.waitForNetworkIdle();
await page.evaluateHandle("document.fonts.ready");
await page.emulateMediaType("screen");
const pdf = await page.pdf({
  printBackground: true,
  margin: "none",
  format: "A4",
  height,
});
writeFile("./report.pdf", pdf, {}, (err) => {
  if (err) {
    return console.error("error");
  }
  console.log("success!");
});
await browser.close();
So, is there any way to create the full PDF on one page only, so that it won't break anywhere else?
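One thing that may be worth trying, offered as a minimal sketch rather than a confirmed fix: in Puppeteer's `page.pdf()`, `format` takes priority over `width` and `height`, so the measured `height` in the code above is ignored while `format: "A4"` is set. `margin` is also expected to be an object with `top`, `right`, `bottom` and `left` rather than a string. The helper below measures the content height and passes explicit dimensions instead. The name `renderSinglePagePdf` and the default width of 794px (roughly A4 width at 96 DPI) are assumptions, not part of the original code.

```javascript
// Sketch: render the current page as a single tall PDF page.
// `page` is a Puppeteer Page whose content has already been loaded.
// The 794px default width is an assumption (about A4 width at 96 DPI).
async function renderSinglePagePdf(page, widthPx = 794) {
  // Apply screen styles and the target width before measuring,
  // so the measured height matches what the PDF will lay out.
  await page.emulateMediaType("screen");
  await page.setViewport({ width: widthPx, height: 600 });

  // Measure the full height of the rendered document in CSS pixels.
  const heightPx = await page.evaluate(() =>
    Math.ceil(document.documentElement.offsetHeight)
  );

  // No `format` here: it would override width and height.
  return page.pdf({
    printBackground: true,
    width: `${widthPx}px`,
    height: `${heightPx}px`,
    margin: { top: 0, right: 0, bottom: 0, left: 0 },
  });
}
```

It would replace the `page.pdf({...})` call above, for example `const pdf = await renderSinglePagePdf(page);`. Very tall pages may run into size limits in Chrome or in PDF viewers, so it is worth testing with the longest report.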
How can I get the fully rendered HTML and CSS of a client-side rendered web page? The output of Puppeteer's page.content() renders very poorly, with missing CSS.
Simplified code:
const express = require('express')
const puppeteer = require('puppeteer');
const app = express()
const port = 3000
async function getHtml(url) {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--no-sandbox']
  });
  const page = await browser.newPage();
  await page.goto(url,
    { waitUntil: ['networkidle0', 'networkidle2', 'load', 'domcontentloaded'] });
  const k = await page.content()
  await browser.close();
  return k
};
app.get('/', (request, response) => {
  getHtml(request.query.url)
    .then(function (res) {
      response.send(res);
    })
    .catch(function (err) {
      console.error(err)
      response.send(err);
    })
});
app.listen(port)
Running this against any website, for example https://www.tesla.com/, gives a page with most of its styling missing.
Using the page.screenshot() method, however, gives the desired result.
Any ideas on why this occurs? And more importantly, is there a way to get around this behaviour?
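One possible explanation, offered as a guess rather than a diagnosis: `page.content()` returns the markup with its original relative URLs, so when Express serves that HTML from `localhost:3000`, the browser requests stylesheets and scripts from the wrong origin. `page.screenshot()` is taken inside the original page, where those URLs resolved correctly. A minimal sketch of a workaround, assuming that is the cause, is to inject a `<base>` element pointing at the original URL before sending the HTML. The helper name `injectBaseHref` is hypothetical.

```javascript
// Sketch: make relative URLs in serialized HTML resolve against the
// page's original address by adding <base href="..."> to the <head>.
function injectBaseHref(html, pageUrl) {
  // Escape characters that could break out of the attribute value.
  const href = String(pageUrl)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;");
  const baseTag = `<base href="${href}">`;

  // Match the opening <head> tag (with or without attributes),
  // but not <header>. page.content() serializes the DOM, so a
  // <head> is normally present; if not, the HTML is returned as-is.
  const headOpen = /<head(\s[^>]*)?>/i;
  if (!headOpen.test(html)) {
    return html;
  }
  // A function replacer avoids "$" in the URL being treated specially.
  return html.replace(headOpen, (match) => match + baseTag);
}
```

In the route handler this would look like `response.send(injectBaseHref(res, request.query.url))`. Resources that are blocked by CORS, or scripts that check the origin they run on, may still fail.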
I am trying to get some historical stock data from here:
https://www1.nseindia.com/products/content/equities/equities/eq_security.htm
I am using puppeteer and this is what I have tried:
import puppeteer from 'puppeteer';
(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('https://www1.nseindia.com/products/content/equities/equities/eq_security.htm');
  await page.click('#symbol');
  await page.keyboard.type('SONACOMS');
  let getData = '#get';
  await page.waitForSelector(getData);
  await page.click(getData);
  await page.waitForSelector('#historicalData');
  await page.screenshot({path: 'nse.png'});
  await browser.close();
})();
The input gets filled correctly, but the click does not seem to be working. The code hangs forever.
To debug, I tried the following from the developer console:
document.querySelector('#symbol').value = 'SONACOMS';
document.querySelector('#get').click()
This works correctly. So I am not sure what I am missing in the puppeteer code.
The site is pretty wonky and I'm not sure what's causing the hang, but it should be scrapable by bypassing the DOM and hitting the search URL directly:
const puppeteer = require("puppeteer"); // ^19.0.0
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const ua =
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
  await page.setUserAgent(ua);
  await page.goto(
    "https://www1.nseindia.com/products/content/equities/equities/eq_security.htm",
    {waitUntil: "domcontentloaded"}
  );
  const symbol = "SONACOMS";
  const searchUrl = `https://www1.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol=${symbol}&segmentLink=3&symbolCount=1&series=ALL&dateRange=day&fromDate=&toDate=&dataType=PRICEVOLUMEDELIVERABLE`;
  await page.evaluate(`
    fetch("${searchUrl}")
      .then(res => res.text())
      .then(html => document.body.innerHTML = html)
  `);
  const data = await page.$eval("table", el =>
    [...el.querySelectorAll("tr")].map(e =>
      [...e.querySelectorAll("th, td")].map(e =>
        e.textContent.trim()
      )
    )
  );
  console.table(data);
  const table = await page.$("table");
  await table.screenshot({path: "nse.png"});
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
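The answer above interpolates the symbol straight into the query string, which is fine for `SONACOMS` but would break for a symbol containing `&` or spaces. A small sketch of building the same URL with `URLSearchParams` follows. The helper name `buildSearchUrl` is hypothetical, the parameters are copied from the URL above, and whether the endpoint accepts percent-encoded symbols has not been checked.

```javascript
// Sketch: build the search URL from the answer with proper encoding.
// Parameter names and values are copied from the hard-coded URL above.
function buildSearchUrl(symbol) {
  const params = new URLSearchParams({
    symbol,
    segmentLink: "3",
    symbolCount: "1",
    series: "ALL",
    dateRange: "day",
    fromDate: "",
    toDate: "",
    dataType: "PRICEVOLUMEDELIVERABLE",
  });
  // URLSearchParams keeps insertion order and percent-encodes values.
  return `https://www1.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?${params}`;
}
```

With this, `const searchUrl = buildSearchUrl(symbol);` would replace the template literal, and the rest of the script stays the same.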
This is my code:
const browser = await puppeteer.launch({
  headless: false,
  timeout: 0,
  defaultViewport: null,
  args: [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--start-maximized",
    "--ignore-certificate-errors",
  ],
  ignoreDefaultArgs: ["--enable-automation"],
});
const page = await browser.newPage();
await page.setUserAgent(
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36"
);
// set download path
const client = await page.target().createCDPSession();
await client.send("Page.setDownloadBehavior", {
  behavior: "allow",
  downloadPath: "D:\\Download",
});
// open uri
await page.goto(
  "https://translate.google.com.hk/?hl=zh-CN&sourceid=cnhp&sl=en&tl=zh-CN&op=docs",
  {
    waitUntil: "networkidle2",
  }
);
// upload pdf document
const [fileChooser] = await Promise.all([
  page.waitForFileChooser(),
  page.click("label"),
]);
await fileChooser.accept(["D:\\test.pdf"]);
// click translate button
const button = await page.waitForSelector(
  "div[jsname='itaskb'] > div > button"
);
await button.evaluate((b) => b.click());
// click download button
const button2 = await page.waitForSelector(
  "div[jsname='itaskb'] > button",
  {
    visible: true,
    timeout: 0,
  }
);
await button2.evaluate((b) => b.click());
The whole process is the same as when I do it manually, but the downloaded document is not translated into zh-CN; it is identical to the uploaded English document.
What is going wrong, and how can I get the translation I want?
I am trying to use Puppeteer to log in to the Nike site, but I get an error, likely due to anti-bot protection. I've tried some things to avoid being detected but have not had any luck. Here is my code:
//const puppeteer = require('puppeteer');
const puppeteer = require("puppeteer-extra");
const pluginStealth = require("puppeteer-extra-plugin-stealth");
puppeteer.use(pluginStealth());
//Create Sleep function to use in Async/Await function
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
const randomDelay = (min, max) =>
  Math.floor(Math.random() * (max - min + 1) + min);
(async () => {
  await sleep(1000);
  var browser;
  browser = await puppeteer.launch({
    executablePath: 'C:/Program Files/Google/Chrome/Application/chrome.exe',
    headless: false,
    args: ['--no-sandbox', '--disable-setuid-sandbox', '--disable-web-security'],
  });
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36',
  );
  await page.setExtraHTTPHeaders({
    'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8',
  });
  await page.goto('https://www.nike.com/us/en_us/e/nike-plus-membership', {
    waitUntil: 'networkidle0',
  });
  const emailSelector = '.nike-unite-text-input.emailAddress input';
  await page.waitFor(emailSelector);
  await page.waitFor(randomDelay(300, 600));
  const inputs = [emailSelector, '.nike-unite-text-input.password input'];
  await page.type(inputs[0], 'xyz#gmail.com', {
    delay: randomDelay(200, 300),
  });
  await page.waitFor(randomDelay(300, 600));
  await page.type(inputs[1], 'XYZDEFEWD!"', {
    delay: randomDelay(200, 300),
  });
  const submitBtn = '.nike-unite-submit-button.loginSubmit input';
  await page.waitFor(randomDelay(200, 500));
  await page.click(submitBtn);
})();
Is there any way to identify what the website is using to detect that I am using puppeteer?
There may be no foolproof way to avoid bot detection, but here are some things you can try:
Try proxying your IP through multiple countries.
Try adding random intervals between your network calls.
Use random user agents instead of a fixed one, and also vary the viewport size.
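The last two suggestions can be sketched roughly as follows. This is only an illustration: the helper names (`randomInt`, `pickRandom`, `randomViewport`, `randomizeFingerprint`) and the viewport ranges are assumptions, the user agent strings are simply the ones that appear elsewhere in this thread, and none of this is known to defeat any particular site's detection.

```javascript
// User agents copied from other snippets in this thread. A real pool
// would likely need to be larger and kept current (assumption).
const USER_AGENTS = [
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
];

// Random integer in [min, max], same formula as randomDelay in the question.
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1) + min);
}

// Pick one element of a non-empty array at random.
function pickRandom(items) {
  return items[randomInt(0, items.length - 1)];
}

// Viewport with arbitrary but plausible desktop dimensions (assumed ranges).
function randomViewport() {
  return { width: randomInt(1280, 1920), height: randomInt(720, 1080) };
}

// Apply a random user agent and viewport to a Puppeteer page.
async function randomizeFingerprint(page) {
  const userAgent = pickRandom(USER_AGENTS);
  const viewport = randomViewport();
  await page.setUserAgent(userAgent);
  await page.setViewport(viewport);
  return { userAgent, viewport };
}
```

It would be called once per session, right after `browser.newPage()`, with `randomInt` reused for the delays between actions.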
I have a website login form I'm trying to log in to. I was able to type the username and password into the input fields. I then wanted to wait before submitting the form, but when I call page.waitFor(), it seems to wipe out the input fields. Can someone explain why, or show a workaround?
async function Scraper(){
  try{
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36');
    await page.goto('https://onlyfans.com/');
    await page.waitFor('input[name=email]');
    console.log("starting to do this");
    await page.$eval('input[name=email]', el => el.value = 'xxx#gmail.com');
    await page.$eval('input[name=password]', el => el.value = 'xxx');
    let selector = 'button[type="submit"]';
    await page.screenshot({
      path: 'yoursite.png',
      fullPage: true
    });
    await page.waitFor(5000);
    await page.evaluate((selector) => document.querySelector(selector).click(), selector);
    await page.screenshot({
      path: 'yoursite4.png',
      fullPage: true
    });
    console.log("done");
  } catch (err) {
    console.error(err);
  }
}
Here are the differences between the two screenshots:
It looks like there is a delay until the login button is enabled. The following worked for me:
await page.goto('https://onlyfans.com/', {waitUntil: "networkidle0"});
await page.waitForSelector('input[name=email]');
await page.waitForSelector('input[name=password]');
await page.waitForSelector('button[type="submit"]');
await page.type('input[name=email]', 'xxx#gmail.com', {delay: 200});
await page.type('input[name=password]', 'xxx', {delay: 200});
await page.click('button[type="submit"]');
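If the delay really is about the button becoming enabled, an explicit wait may be more robust than relying on typing delays. A minimal sketch follows, assuming the button carries the standard `disabled` property while inactive (not verified against the site). `clickWhenEnabled` is a hypothetical helper name, and the 30000 ms default timeout is an arbitrary choice.

```javascript
// Sketch: wait until an element exists and is not disabled, then click it.
// `page` is a Puppeteer Page; `timeout` is in milliseconds.
async function clickWhenEnabled(page, selector, timeout = 30000) {
  // The function below runs inside the browser page and is polled
  // by Puppeteer until it returns a truthy value or the timeout hits.
  await page.waitForFunction(
    (sel) => {
      const el = document.querySelector(sel);
      return Boolean(el) && !el.disabled;
    },
    { timeout },
    selector
  );
  await page.click(selector);
}
```

It would replace the final `page.click` call, for example `await clickWhenEnabled(page, 'button[type="submit"]');`.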