Puppeteer doesn't render pages with image URLs that lack a protocol scheme

I'm trying to use puppeteer to render HTML email messages that contain images whose URLs do not always include a protocol scheme. For example: <img src="example.com/someimage.jpg" />, where the src really should have been https://example.com/someimage.jpg or http://....
I'm well aware that the url should contain a protocol scheme but I don't have control over the html received in the message body of the emails. Many mail clients such as gmail will render such emails just fine. I would like to mimic this behavior in puppeteer.
Is there some way in Puppeteer to trap the error and then:
try https:// prepended to the href, and failing that
try http:// prepended to the href, and failing that
then display a broken image?
This is what I do to render the html:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setJavaScriptEnabled(false);
await page.setContent(htmlEmailBody);
const content = await page.$("body");
const imageBuffer = await page.screenshot({type: "jpeg", omitBackground: true, fullPage: true});
This works fine when all the urls have a scheme. What's the proper way to get this to work when some of the URLs don't always contain the scheme?
This question is related to "puppeteer doesn't open a url without protocol", but unfortunately that question doesn't answer mine.
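One possible approach, sketched here under assumptions the thread does not confirm, is to normalize the image URLs before handing the HTML to page.setContent. The helper names candidateUrls and addDefaultScheme are hypothetical, the allow-list of schemes is an assumption, and the sketch assumes every scheme-less src starts with a host name. The regular expression only handles quoted src attributes on <img> tags; a real HTML parser would be more robust.

```javascript
// Schemes treated as "already absolute" (an assumed allow-list, extend as needed).
const KNOWN_SCHEMES = /^(https?|data|cid|blob|file|about):/i;

// Hypothetical helper: returns the URLs to try, in order, for one image src.
function candidateUrls(src) {
  const trimmed = src.trim();
  // Leave empty values and URLs that already carry a scheme untouched.
  if (trimmed === '' || KNOWN_SCHEMES.test(trimmed)) return [trimmed];
  // Protocol-relative URLs ("//host/path") only lack the scheme itself.
  const rest = trimmed.replace(/^\/\//, '');
  return [`https://${rest}`, `http://${rest}`];
}

// Hypothetical helper: rewrites every quoted <img src="..."> to its first candidate.
function addDefaultScheme(html) {
  return html.replace(
    /(<img\b[^>]*?\ssrc\s*=\s*)(["'])(.*?)\2/gi,
    (match, prefix, quote, src) => `${prefix}${quote}${candidateUrls(src)[0]}${quote}`
  );
}

console.log(addDefaultScheme('<img src="example.com/someimage.jpg" />'));
```

With this, the render call would become await page.setContent(addDefaultScheme(htmlEmailBody)). The sketch only applies the first candidate (https://); falling back to http:// when the https:// request fails would need extra handling that is not shown here.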

Related

Display image that returns HTTP 503 in Firefox

I have a status badge image that returns the HTTP code 503 when the respective service is offline (but the webserver is still there serving calls). Now opening the image URL directly will display the image properly, regardless of the underlying 503 error code. But using it inside an <img> tag shows the broken image icon. How can I prevent that while still allowing the image itself to return a 503? (External services depend on that)
[Screenshots not reproduced: the badge on the page, the status message in the developer console, and the badge itself.]
Note: This happens on Firefox. Not Chrome
Edit: Here are a few requested pieces of information:
Firefox 78.0.2 (64-Bit)
It's served from the same domain. But the domain is essentially just proxying several underlying web services, and this badge originates from a different service, all on the same domain.
It's an SVG image, if that makes any difference.
Since XMLHttpRequest can retrieve the output of any request, no matter the response code, it is possible to request for the image with XMLHttpRequest, and then convert the blob response type to a base64 format image, which can be loaded in the browser.
The CORS proxy I used in the sample code may not be necessary in the majority of cases, but could be useful in the case where the image you are trying to display has weird response headers that prevent access to the image from another domain.
Here is the sample code. It should work no matter the response code, CORS, etc.
var xhr = new XMLHttpRequest();
xhr.onload = function () {
var reader = new FileReader();
reader.onloadend = function () {
// here, reader.result contains the base64-formatted string you can use to set the src attribute with
document.getElementsByTagName('img')[0].src = reader.result; // sets the first <img> tag to display the image, change to the element you want to use
};
reader.readAsDataURL(xhr.response);
};
xhr.open('GET', "https://cors-anywhere.herokuapp.com/i.stack.imgur.com/8wB1j.png"); // don't include the HTTP/HTTPS protocol in the url
xhr.responseType = 'blob';
xhr.setRequestHeader('X-Requested-With', 'xhr');
xhr.send();
<img src="about:blank">
Everything works, as when you go into Inspect Element, you see that the src attribute of the <img> tag points to a base64 URL that can load in any browser.
You might want to compress or resize your images before uploading them to the server, as they might be large enough to keep the server busy; most of the time, a 503 error occurs because the server is too busy.
Moreover, the image is an SVG, so it might render its dimensions before loading completes. Hence I'd suggest:
Try replacing the SVG with a PNG or JPG
Also try a site like https://tinypng.com/ to compress the image
This might work for you.

Session ID not preserved between page navigation using Puppeteer's .goto method?

When attempting to navigate to a sub-page using Puppeteer's goto method, I have noted that cookie information is not being correctly preserved between navigation.
const puppeteer = require('puppeteer');
puppeteer.launch().then(async browser => {
const page = await browser.newPage();
await page.goto('http://www.example.com/Summary.aspx?sid=100-013-030');
await page.screenshot({path: 'example1.png'});
await page.goto('http://www.example.com/DetailInfo.aspx?did=af902cb3');
await page.screenshot({path: 'example2.png'});
await browser.close();
});
In the code above, upon making the second goto call, the example2.png file generated is a screenshot of the Summary landing page; indicating a silent failure. Conversely, when navigating manually within the Chrome browser itself, copying and pasting the DetailInfo link into a new tab opens the intended page with no issue.
Upon further investigation, I did note that the website is keeping a cookie with a session ID in the browser cache, but what is the difference between the manual approach, and using Puppeteer that is creating this discrepancy?

How to convert HTML to image in Node.js

I need to convert an HTML template into an image, on a Node server.
The server will receive the HTML as a string. I tried PhantomJS (using a library called Webshot), but it doesn't work well with flex box and modern CSS. I tried to use Chrome headless-browser but it doesn't seem to have an API for parsing html, only URL.
What is the currently best way to convert a piece of HTML into image?
Is there a way to use headless Chrome in a template mode instead of URL mode? I mean, instead of doing something like
chrome.goTo('http://test.com')
I need something like:
chrome.evaluate('<div>hello world</div>');
Another option, suggested here in the comments to this post, is to
save the template in a file on the server and then serve it locally and do something like:
chrome.goTo('http://localhost/saved_template');
But this option sounds a bit awkward. Is there any other, more straightforward solution?
You can use a library called Puppeteer.
Sample code snippet :
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setViewport({
width: 960,
height: 760,
deviceScaleFactor: 1,
});
await page.setContent(imgHTML);
await page.screenshot({path: 'example.png'});
await browser.close();
This will save a screenshot of the HTML as example.png in the current working directory.
You can easily do it on the frontend using html2canvas. On the backend you can write the HTML to a file and access it using a file URI (e.g. file:///home/user/path/to/your/file.html); it should work fine with headless Chrome and Nightmare (screenshot feature). Another option is to set up a simple HTTP server and access the URL.
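The file URI option can be sketched in Node.js as follows. This is a minimal illustration, not a definitive implementation: the helper name htmlToFileUrl, the temporary directory prefix, and the file name template.html are all assumptions made for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: writes the HTML string to a temporary file
// and returns a file:// URL that a headless browser can open.
function htmlToFileUrl(html) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'template-'));
  const file = path.join(dir, 'template.html');
  fs.writeFileSync(file, html, 'utf8');
  // pathToFileURL handles platform-specific paths and escaping.
  return pathToFileURL(file).href;
}

const fileUrl = htmlToFileUrl('<div>hello world</div>');
console.log(fileUrl);
```

The returned URL can then be passed to whatever navigation call the headless browser exposes. The temporary file is not cleaned up in this sketch.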

What does "blob" mean in the `href` property in "<link>"? [duplicate]

My page generates a URL like this: "blob:http%3A//localhost%3A8383/568233a1-8b13-48b3-84d5-cca045ae384f" How can I convert it to a normal address?
I'm using it as an <img>'s src attribute.
A URL that was created from a JavaScript Blob can not be converted to a "normal" URL.
A blob: URL does not refer to data that exists on the server; it refers to data that your browser currently has in memory, for the current page. It will not be available on other pages, it will not be available in other browsers, and it will not be available from other computers.
Therefore it does not make sense, in general, to convert a Blob URL to a "normal" URL. If you wanted an ordinary URL, you would have to send the data from the browser to a server and have the server make it available like an ordinary file.
It is possible to convert a blob: URL into a data: URL, at least in Chrome. You can use an AJAX request to "fetch" the data from the blob: URL (even though it's really just pulling it out of your browser's memory, not making an HTTP request).
Here's an example:
var blob = new Blob(["Hello, world!"], { type: 'text/plain' });
var blobUrl = URL.createObjectURL(blob);
var xhr = new XMLHttpRequest;
xhr.responseType = 'blob';
xhr.onload = function() {
var recoveredBlob = xhr.response;
var reader = new FileReader;
reader.onload = function() {
var blobAsDataUrl = reader.result;
window.location = blobAsDataUrl;
};
reader.readAsDataURL(recoveredBlob);
};
xhr.open('GET', blobUrl);
xhr.send();
data: URLs are probably not what you mean by "normal" and can be problematically large. However they do work like normal URLs in that they can be shared; they're not specific to the current browser or session.
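To illustrate why a data: URL is shareable but can get large, here is a small sketch that builds one by hand. The helper name textToDataUrl is hypothetical, and the sketch assumes the text only contains Latin-1 characters, which is what btoa accepts.

```javascript
// Hypothetical helper: builds a base64 data: URL from Latin-1 text.
function textToDataUrl(text, mimeType = 'text/plain') {
  // btoa is a global in browsers and in recent Node.js versions.
  return `data:${mimeType};base64,${btoa(text)}`;
}

const dataUrl = textToDataUrl('Hello, world!');
console.log(dataUrl); // data:text/plain;base64,SGVsbG8sIHdvcmxkIQ==
```

The whole payload travels inside the URL itself, and base64 encoding makes it roughly a third larger than the original bytes, which is where the size concern comes from.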
Another way to create a data URL from a blob URL is to use a canvas.
var canvas = document.createElement("canvas");
canvas.width = img.naturalWidth;   // size the canvas to the image so nothing is cropped
canvas.height = img.naturalHeight;
var context = canvas.getContext("2d");
context.drawImage(img, 0, 0); // assumes img has finished loading and img.src is your blob URL
var dataurl = canvas.toDataURL("image/jpeg", 0.9); // your preferred MIME type and quality (0 to 1)
From what I saw on MDN, canvas.toDataURL is well supported by browsers (except IE < 9, as always).
For those who came here looking for a way to download a blob url video / audio, this answer worked for me. In short, you would need to find an *.m3u8 file on the desired web page through Chrome -> Network tab and paste it into a VLC player.
Another guide shows you how to save a stream with the VLC Player.
UPDATE:
An alternative way of downloading the videos from a blob url is by using the mass downloader and joining the files together.
Download Videos Part
Open network tab in chrome dev tools
Reload the webpage
Filter .m3u8 files
Look through all filtered files and find the playlist of the '.ts' files. It should look something like this:
You need to extract those links somehow. Either download and edit the file manually OR use any other method you like. As you can see, those links are very similar, the only thing that differs is the serial number of the video: 's-0-v1-a1.ts', 's-1-v1-a1.ts' etc.
https://some-website.net/del/8cf.m3u8/s-0-v1-a1.ts
https://some-website.net/del/8cf.m3u8/s-1-v1-a1.ts
https://some-website.net/del/8cf.m3u8/s-2-v1-a1.ts
and so on up to the last link in the .m3u8 playlist file. These .ts files are actually your video. You need to download all of them.
For bulk downloading I prefer using the Simple Mass Downloader extension for Chrome (https://chrome.google.com/webstore/detail/simple-mass-downloader/abdkkegmcbiomijcbdaodaflgehfffed)
If you opt in for the Simple Mass Downloader, you need to:
a. Select a Pattern URL
b. Enter your link in the address field with only one modification: that part of the link that is changing for each next video needs to be replaced with the pattern in square brackets [0:400] where 0 is the first file name and 400 is the last one. So your link should look something like this https://some-website.net/del/8cf.m3u8/s-[0:400]-v1-a1.ts.
Afterwards hit the Import button to add these links into the Download List of Mass Downloader.
c. The next action may ask you for the destination folder for EACH video you download. So it is highly recommended to specify the default download folder in Chrome Settings and disable the Select Destination option in Chrome Settings as well. This will save you a lot of time! Additionally, you may want to specify the folder where these files will go:
c1. Click on Select All checkbox to select all files from the Download List.
c2. Click on the Download button in the bottom right corner of the SMD extension window. It will take you to next tab to start downloading
c3. Hit Start selected. This will download all vids automatically into the download folder.
That is it! Simply wait till all files are downloaded and you can watch them via the VLC Player or any other player that supports the .ts format. However, if you want to have one video instead of those you have downloaded, you need to join all these mini-videos together
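The pattern URL from step b can also be expanded in a few lines of code, which is handy if you prefer another download tool. This is a sketch under assumptions: the helper name expandPattern is hypothetical, and it only supports a single [first:last] range as described above.

```javascript
// Hypothetical helper: expands one "[first:last]" range in a pattern URL
// into the full list of segment URLs.
function expandPattern(patternUrl) {
  const match = patternUrl.match(/\[(\d+):(\d+)\]/);
  if (!match) return [patternUrl]; // no range present, nothing to expand
  const first = Number(match[1]);
  const last = Number(match[2]);
  const urls = [];
  for (let i = first; i <= last; i++) {
    urls.push(patternUrl.replace(match[0], String(i)));
  }
  return urls;
}

const urls = expandPattern('https://some-website.net/del/8cf.m3u8/s-[0:2]-v1-a1.ts');
console.log(urls.join('\n'));
```

For the example pattern above, the range [0:2] yields the three links listed earlier; a range of [0:400] would yield 401 links.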
Joining Videos Part
Since I am working on a Mac, I am not aware of how you would do this on Windows. If you are a Windows user and want to merge the videos, feel free to search for a Windows solution. The next steps apply to Mac only.
Open Terminal in the folder you want the new video to be saved in
Type: cat and hit space
Open the folder where you downloaded your .ts video. Select all .ts videos that you want to join (use your mouse or cmd+A)
Drag and drop them into the terminal
Hit space
Hit >
Hit Space
Type the name of the new video, e.g. my_new_video.ts. Please note that the format has to be the same as in the original videos, otherwise it will take long time to convert and even may fail!
Hit Enter. Wait for the terminal to finish the joining process and enjoy watching your video!
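The same concatenation can be sketched in Node.js, which also works on Windows. The helper name joinSegments is hypothetical, and the sketch assumes the files follow the s-<number>-... naming shown above. It sorts by the numeric index so that s-10 comes after s-2, which a plain alphabetical sort would not guarantee.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: concatenates all "s-<n>-*.ts" files in dir, in numeric
// order, into outFile. Returns the file names in the order they were joined.
function joinSegments(dir, outFile) {
  const pattern = /^s-(\d+)-.*\.ts$/;
  const names = fs.readdirSync(dir)
    .filter((name) => pattern.test(name))
    .sort((a, b) => Number(a.match(pattern)[1]) - Number(b.match(pattern)[1]));
  const out = fs.openSync(outFile, 'w');
  try {
    for (const name of names) {
      // Append the raw bytes of each segment, exactly like `cat` does.
      fs.writeSync(out, fs.readFileSync(path.join(dir, name)));
    }
  } finally {
    fs.closeSync(out);
  }
  return names;
}
```

A call such as joinSegments('/path/to/downloads', 'my_new_video.ts') would write the joined video next to the script; both paths here are placeholders.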
Found this answer here and wanted to reference it, as it appears much cleaner than the accepted answer:
function blobToDataURL(blob, callback) {
var fileReader = new FileReader();
fileReader.onload = function(e) {callback(e.target.result);}
fileReader.readAsDataURL(blob);
}
I'm very late to the party.
If you want to download the content you can simply use fetch now
fetch(blobURL)
.then(res => res.blob())
.then(blob => { /* do what you want with the blob here */ })
Here is the solution:
let blob = new Blob(chunks, { 'type' : 'video/mp4;' });
let videoURL = window.URL.createObjectURL(blob);
const blobF = await fetch(videoURL).then(res => res.blob())
As the previous answers have said, there is no way to decode it back to a URL; even when you inspect it in the Chrome DevTools panel, the URL may still be shown as a blob.
However, it is possible to get the data. Another way to obtain it is to put the blob URL into an anchor and download it directly.
<a href="blob:http://example.com/xxxx-xxxx-xxxx-xxxx" download>download</a>
Insert this into the page containing the blob URL and click the link to download the content.
Another way is to intercept the ajax call via a proxy server, then you could view the true image url.

File download rename not working

I'm trying to rename a file with the download attribute, but it's not working.
It only works if the file is on the same origin, so if you can download an external file with CORS + ajax, then you can save the blob with a custom name
$('a').click(function(evt){
evt.preventDefault();
var name = this.download;
// we need a blob so we can create an objectURL and use it on a link element
// jQuery doesn't support responseType = 'blob' (yet)
// so I use fetch, the next version of ajax, only available in Blink & Firefox
// it also works fine using XMLHttpRequest v2 and setting the responseType
fetch("https://crossorigin.me/" + this.href)
// res is the beginning of a request it only gets the response headers
// here you can use .blob() .text() .json or res.arrayBuffer() depending
// on what you need, if it contains Content-Type: application/json
// then you might want to choose res.json()
// all this returns a promise
.then(res => res.blob())
.then(blob => {
$("<a>").attr({
download: name,
href: URL.createObjectURL(blob)
})[0].click();
});
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
From the docs you linked:
If the HTTP header Content-Disposition: is present and gives a different filename than this attribute, the HTTP header has priority over this attribute.
My guess is that the server you're linking to sets this header.
Also if you're linking to an external resource it likely won't work:
This attribute is only honored for links to resources with the same-origin.
It appears that this attribute doesn't work anymore for external files, due to possible security concerns.
You can find a discussion about this issue for Chrome here
Use of the 'download' attribute will always trigger a download, but from M-35 onwards will only honor the suggested filename if the final resource URL is same-origin as the document. Even when it doesn't, as long as a MIME type is specified correctly, it will receive a filename like 'download.<ext>' where <ext> is the extension known to the host OS as mapping to the specified MIME type. If the resource is served with a Content-Disposition, then the Content-Disposition will take precedence.
And for Firefox here and here
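Whether a link counts as same-origin can be checked with the standard URL API. The sketch below is only an illustration of the comparison, with a hypothetical helper name isSameOrigin and made-up example URLs; it does not reproduce any browser's full download logic.

```javascript
// Hypothetical helper: true when href resolves to the same origin
// (scheme, host and port) as the document that contains the link.
function isSameOrigin(href, documentUrl) {
  const target = new URL(href, documentUrl); // resolves relative links too
  return target.origin === new URL(documentUrl).origin;
}

console.log(isSameOrigin('/files/report.pdf', 'https://example.com/page'));
console.log(isSameOrigin('https://cdn.example.com/report.pdf', 'https://example.com/page'));
```

The first call compares a relative link and reports true; the second points at a different host and reports false, which is the case where the suggested filename is ignored.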