How to convert HTML to an image in Node.js

I need to convert an HTML template into an image, on a Node server.
The server will receive the HTML as a string. I tried PhantomJS (using a library called Webshot), but it doesn't work well with flexbox and modern CSS. I also tried headless Chrome, but it doesn't seem to have an API for rendering an HTML string, only for loading a URL.
What is currently the best way to convert a piece of HTML into an image?
Is there a way to use headless Chrome in a template mode instead of URL mode? I mean, instead of doing something like
chrome.goTo('http://test.com')
I need something like:
chrome.evaluate('<div>hello world</div>');
Another option, suggested in the comments to this post, is to save the template in a file on the server, serve it locally, and then do something like:
chrome.goTo('http://localhost/saved_template');
But this option sounds a bit awkward. Is there any other, more straightforward solution?

You can use a library called Puppeteer.
Sample code snippet:
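const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setViewport({
    width: 960,
    height: 760,
    deviceScaleFactor: 1,
  });
  // imgHTML is the HTML string received by the server
  await page.setContent(imgHTML);
  // The path must be a quoted string
  await page.screenshot({ path: 'example.png' });
  await browser.close();
})();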
This saves a screenshot of the rendered HTML as example.png in the directory the script is run from.

You can easily do it on the frontend using html2canvas. On the backend, you can write the HTML to a file and access it through a file URI (e.g. file:///home/user/path/to/your/file.html); this should work fine with headless Chrome and with Nightmare (screenshot feature). Another option is to set up a simple HTTP server and access the URL.
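The file-URI approach can be sketched with Node's standard library alone. This is a minimal sketch rather than code from the original answer: the helper name writeTemplateToTempFile and the file name template.html are hypothetical, and the resulting URL would still have to be passed to whichever headless browser is in use.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");
const { pathToFileURL } = require("url");

// Hypothetical helper: writes an HTML string into a fresh temp directory
// and returns both the file path and a file:// URL pointing at it.
function writeTemplateToTempFile(html) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "template-"));
  const filePath = path.join(dir, "template.html");
  fs.writeFileSync(filePath, html, "utf8");
  return { filePath, fileUrl: pathToFileURL(filePath).href };
}

const { filePath, fileUrl } = writeTemplateToTempFile("<div>hello world</div>");
console.log(fileUrl); // a file:// URL that the headless browser can open
```

The temporary file is not removed here; a real server would presumably delete it once the screenshot has been taken.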

Related

Puppeteer doesn't render pages with image URLs that lack a protocol scheme

I'm trying to use Puppeteer to render HTML email messages that contain images whose URLs do not always include a protocol scheme. For example, in <img src="example.com/someimage.jpg" />, the src really should have been https://example.com/someimage.jpg or http://....
I'm well aware that the URL should contain a protocol scheme, but I don't have control over the HTML received in the message body of the emails. Many mail clients, such as Gmail, render such emails just fine. I would like to mimic this behavior in Puppeteer.
Is there some way in Puppeteer to trap the error and then:
try https:// prepended to the href, and failing that
try http:// prepended to the href, and failing that
then display a broken image?
This is what I do to render the html:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setJavaScriptEnabled(false);
await page.setContent(htmlEmailBody);
const content = await page.$("body");
const imageBuffer = await page.screenshot({type: "jpeg", omitBackground: true, fullPage: true});
This works fine when all the urls have a scheme. What's the proper way to get this to work when some of the URLs don't always contain the scheme?
This question is related to "puppeteer doesn't open a url without protocol", but unfortunately that one doesn't answer my question.
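One possible workaround, which is not from the original thread, is to rewrite the HTML string before it reaches page.setContent, so that image sources that look like "host/path" get a scheme. The sketch below covers only that rewriting step, not the https-then-http fallback. The helper name addSchemeToImageSources and its host-detection heuristic (a dotted name followed by a slash) are assumptions, and a regular expression is only a rough stand-in for a real HTML parser.

```javascript
// Hypothetical helper: prefixes a scheme onto <img> sources that look like
// "example.com/path". Sources that already have a scheme are left untouched.
function addSchemeToImageSources(html, scheme = "https") {
  const hasScheme = /^[a-z][a-z0-9+.-]*:/i; // e.g. https:, data:, cid:
  const hostThenPath = /^[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}\//i;

  return html.replace(
    /(<img\b[^>]*?\ssrc\s*=\s*)(["'])(.*?)\2/gi,
    (match, prefix, quote, src) => {
      const value = src.trim();
      if (hasScheme.test(value)) return match; // already absolute
      if (value.startsWith("//")) {
        // Protocol-relative URL: only the scheme itself is missing.
        return `${prefix}${quote}${scheme}:${value}${quote}`;
      }
      if (hostThenPath.test(value)) {
        return `${prefix}${quote}${scheme}://${value}${quote}`;
      }
      return match; // plain relative path: leave as-is
    }
  );
}

console.log(addSchemeToImageSources('<img src="example.com/someimage.jpg" />'));
```

The rewritten string would then be passed to page.setContent in place of htmlEmailBody. Passing "http" as the second argument gives the other variant mentioned in the question.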

Emoji converted to greyed-out in PDF output from headless Chrome

(Note: even though this mentions Pyppeteer, the Python port of Puppeteer, the calls are the same and the issue applies to either Puppeteer or Pyppeteer.)
Hi,
I'm converting the page http://getemoji.com/ into PDF using the following code :
import asyncio
from pyppeteer import launch
from pyppeteer.launcher import connect

async def main():
    browser = await launch()
    context = await browser.createIncognitoBrowserContext()
    page = await context.newPage()
    page.on('dialog', lambda dialog: dialog.dismiss())
    # await page.emulateMedia('print')
    await page.goto('http://getemoji.com/')
    await page.screenshot({'path': 'example.png'})
    await context.close()
    await browser.disconnect()

asyncio.get_event_loop().run_until_complete(main())
And it generates properly the following image:
But if I try to convert the page into PDF, like this:
    await page.pdf({
        'path': 'example.pdf',
        'format': 'A4'
    })
All the emoticons are greyed in the resulting PDF, like this:
The issue is not a font issue with the emoji, since they work perfectly on the screenshot. It's something related to how the PDF is generated, but I can't find out why.
I'm hoping you'll find it :)
I came across the same issue and did some searching. The issue seems to be twofold.
As you guessed, it has to do with the CSS rules for print media. The site uses a Bootstrap CSS file which has the following rules:
@media print {
  * {
    color: #000!important;
    text-shadow: none!important;
    background: 0 0!important;
    box-shadow: none!important
  }
  ...
}
This applies a blanket rule that forces the foreground color of all content to black when printing.
Fortunately, Pyppeteer provides an API to define the media type used when rendering the content: page.emulateMedia. Passing it the 'screen' argument makes the page ignore the print media rules. If you use this, you will see that some of the content now has colors, and the links look better too.
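On the Puppeteer side, the same idea could look roughly like the sketch below. The helper name pdfWithScreenStyles is hypothetical. Newer Puppeteer releases expose this call as page.emulateMediaType, while older ones (and Pyppeteer) use page.emulateMedia, so the sketch checks which one is present.

```javascript
// Hypothetical helper: switches the page to "screen" media so that
// @media print rules are ignored, then generates the PDF.
// `page` is assumed to be a Puppeteer Page instance.
async function pdfWithScreenStyles(page, pdfOptions) {
  if (typeof page.emulateMediaType === "function") {
    await page.emulateMediaType("screen"); // newer Puppeteer
  } else {
    await page.emulateMedia("screen"); // older Puppeteer
  }
  return page.pdf(pdfOptions);
}
```

It would be called after page.goto, for example as pdfWithScreenStyles(page, { path: 'example.pdf', format: 'A4' }).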
The second issue seems to be the way Chrome prints a pdf. Pyppeteer actually uses the 'Print to PDF' functionality as provided by Chrome. So, the issue is not really with Pyppeteer.
To confirm this, I created a simple webpage which had some emojis and did not have any media printing css. When I manually open the page in Chrome and save it as a PDF, the emojis appear in black.

Using Angular to get html of a website URL

I am new to Angular.
What I am trying to do is get the HTML of a page and reproduce it in an iframe (it is an exercise).
I am using the following piece of code:
var prova = this._http.get(myUrl, {responseType: "text"}).subscribe((x) => {
  console.log(x);
});
I tried it on a website (I can add the names of the pages if needed), and it returns the HTML for only some of the pages.
In the other cases the string x is empty.
Could it depend on the connection?
Or is there some way to wait for the end of the GET request?
Or is my approach simply wrong, and should I make a different type of request?
You're most likely going to need a library like Puppeteer if you want to render a page properly. Puppeteer is a Node library and uses headless Chrome, so I am not sure how well you could really integrate it with Angular.
https://github.com/GoogleChrome/puppeteer
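As a rough illustration of what that could look like on a Node server (not inside the Angular app itself), the sketch below fetches the HTML of a page after its scripts have run. The helper name getRenderedHtml is hypothetical, and browser is assumed to be an instance returned by puppeteer.launch().

```javascript
// Hypothetical helper: returns the serialized HTML of a page after rendering.
// `browser` is assumed to be a Puppeteer Browser instance.
async function getRenderedHtml(browser, url) {
  const page = await browser.newPage();
  try {
    // Wait until the network has gone quiet so client-side rendering can finish.
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.content();
  } finally {
    await page.close();
  }
}
```

The Angular code could then request the result from such a server endpoint rather than fetching the target site directly.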

How can I include a hyperlink as a parameter in Express?

I'm trying to send a hyperlink (such as: "http://google.com") as a parameter to my Express server script. My current script looks like this:
var app = require("express")();

app.get("/new/:link(*)", function (req, res) {
  var link = req.params.link;
  res.end(JSON.stringify({
    site: link
  }));
});

app.listen(process.env.PORT || 3000, function () {
  console.log("Listening...");
});
This is just a test to see if I can get it working so I can build something bigger on top. The idea is that I can send a link and receive the link back as JSON. However, when I try to go to the site with the link as a parameter, my browser wants to save a file called "google.com" and doesn't receive any JSON from the server.
I know it's possible to do this without changing anything about my browser, but I don't know how. Does anyone have any ideas?
OK, so I have accidentally fixed my problem.
Apparently I had to write "res.send(...)" instead of "res.end(...)". It now works perfectly, although I don't really understand why.
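A likely explanation, which the thread itself does not confirm: res.end writes the body without setting a Content-Type header, so the browser has to guess what to do with a URL ending in ".com", whereas res.send sets a Content-Type for a string body. Express's res.json states the intent more directly. The sketch below uses a hypothetical handler name and leaves the route wiring as a comment.

```javascript
// Express-style handler (hypothetical name). res.json serializes the object
// and sets Content-Type: application/json on the response.
function newLinkHandler(req, res) {
  res.json({ site: req.params.link });
}

// Wiring, assuming the `app` object from the question:
// app.get("/new/:link(*)", newLinkHandler);
```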

Chrome extension, replace HTML in response code before browser displays it

I wonder if there is some way to do something like this:
When I'm on a specific site, I want some of its JavaScript files to be loaded directly from my computer (e.g. file:///c:/test.js), not from the server.
For that, I was wondering whether it is possible to make an extension that changes the HTML in the response the browser gets, right before displaying it. The whole process should look like this:
request is made
browser gets response from server
#response is changed# - this is the part when extension comes in
browser parses the changed response and displays the page with that new response.
It doesn't even have to be a Chrome extension. It should just do the job described above. It could block the original file and serve another one (DNS/proxy?), or filter all HTTP traffic on my computer and replace specific code in matched responses.
You can use the WebRequest API to achieve that. For example, you can add an onBeforeRequest listener and redirect some requests:
chrome.webRequest.onBeforeRequest.addListener(function (details) {
  var responseData = "<div>Some text</div>";
  return { redirectUrl: "data:text/html," + encodeURIComponent(responseData) };
}, { urls: ["https://www.google.com/"] }, ["blocking"]);
This will display a <div> element with the text "Some text" instead of the Google homepage. Note that you can only redirect to URLs that the web server itself is allowed to redirect to. This means that redirecting to file:/// URLs is not possible, and you can only redirect to files inside your extension if these are web accessible. data: and http: URLs work fine, however.
On Windows you can use the Proxomitron (proxomitron.info), a local proxy that can intercept any page or file being loaded into your browser and change it however you want using regular expressions (no DOM parsing) before it is rendered by the browser.