How to reduce PDF size with PDFlib when using heavy images? - pdflib

I'm creating PDFs with the help of the PDFlib engine. My requirement is quite heavy in terms of the data that is going to be stored in each PDF: one PDF will store around 300 images, and I will create around 100 PDFs at the same time. My images are repeated, so I know in advance which image will be placed where. If I go with PDFlib's image_load option, the PDF size is around 100 MB. Is there any way to reduce the size?

The answer is Templates.
PDFlib supports a PDF feature with the technical name Form XObjects. However, since this term conflicts with interactive forms, we refer to this feature as templates.
A PDFlib template can be thought of as an off-page buffer into which text, vector, and image operations are redirected (instead of acting on a regular page).
After the template is finished it can be used much like a raster image, and placed an arbitrary number of times on arbitrary pages. Like images, templates can be subjected to geometrical transformations such as scaling or skewing.
When a template is used on multiple pages (or multiply on the same page), the actual PDF operators for constructing the template are only included once in the PDF file, thereby saving PDF output file size. Templates suggest themselves for elements which appear repeatedly on several pages, such as a constant background, or a company logo.
So templates will also reduce your file size.
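A minimal sketch of that approach with the PDFlib Python binding (PDFlib 9-style calls; the file names, page size, and positions are placeholders, not values from the question):

    from PDFlib.PDFlib import PDFlib  # PDFlib's official Python binding

    p = PDFlib()
    p.set_option("errorpolicy=return")

    if p.begin_document("repeated-images.pdf", "") == -1:
        raise RuntimeError("begin_document failed: " + p.get_errmsg())

    # Load the repeated image once; the handle can be reused everywhere.
    image = p.load_image("auto", "photo.jpg", "")
    if image == -1:
        raise RuntimeError("load_image failed: " + p.get_errmsg())

    # Draw the repeated content into an off-page template (Form XObject).
    template = p.begin_template_ext(200, 200, "")
    p.fit_image(image, 0, 0, "boxsize={200 200} fitmethod=meet")
    p.end_template_ext(0, 0)

    # Place the same template on every page; its construction operators are
    # written to the PDF only once, no matter how often it is placed.
    for _ in range(300):
        p.begin_page_ext(595, 842, "")       # A4-sized page in points
        p.fit_image(template, 100, 500, "")  # templates are placed like images
        p.end_page_ext("")

    p.close_image(image)
    p.end_document("")

PDFlib also embeds the data for a single load_image handle only once, however many times it is placed with fit_image, so make sure each distinct picture is loaded exactly once; the template additionally lets you bundle a whole group of repeated elements (image plus captions, logos, background) into one reusable object.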

Related

Page load time for multiple images

I am trying to find the best way to load many (say 100) images onto a web page. Some images are large background images for a parallax effect; others are images on the faces of cubes (all sides of the cubes are made from divs).
What I want to know is which approach is best. So far I have a few ideas:
Method A
1. Lazy load all the images and background images when they appear.
2. Use a sprite sheet for the smaller images.
3. Compress all images for best compression, and use gzip for transfer.
Method B (experimental)
1. Create a separate style sheet with a variable for each image converted to base64 (e.g. :root{--boxImage1:url("data:image/jpeg;base64,/9j/4AAQSkZJRgABA.....)
2. Create a service worker for the CSS, as it would become rather large.
3. Use compression to minify the CSS file.
4. Use gzip to transfer it from the server.
Method C
Store the images as binary in a database.
(I worry this would actually make them slower, as it would mean searching the database rather than the file system.)
Are any of these methods worth doing, or is there a simpler way that I should use?
I'm currently working on a PWA with a lot of user-uploaded images, and I can confidently say that using your method A did wonders for our app's load time.
Compressing images on upload and converting them to base64 strings drastically reduces their size, and combined with lazy loading of 5 images at a time, the app is much faster.
Since you have the service-worker tag in your question, I would also recommend precaching at least the initial batch of images for an even faster load of your homepage.
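For the server-side half of that ("compressing images on upload and converting them to base64"), here is a rough sketch using Pillow; the width limit, JPEG quality, and file names are my own assumptions, not values from the answer:

    import base64
    from io import BytesIO
    from PIL import Image  # Pillow

    def compress_to_data_uri(path, max_width=1024, quality=70):
        """Downscale and recompress an uploaded image, return a base64 data URI."""
        img = Image.open(path).convert("RGB")
        if img.width > max_width:
            ratio = max_width / img.width
            img = img.resize((max_width, int(img.height * ratio)))
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality, optimize=True)
        encoded = base64.b64encode(buf.getvalue()).decode("ascii")
        return f"data:image/jpeg;base64,{encoded}"

    # Example: emit a CSS custom property like the one in method B.
    print(f':root{{--boxImage1:url("{compress_to_data_uri("cube-face.jpg")}")}}')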

Universal favicon file for all sizes on all platforms?

I know that there are many devices that use the favicon from a website in different ways...
On these favicon generator websites you can easily put an image in and the website will do the rest (generating several scaled images for certain devices).
If I want to cover ALL devices (Apple, Android, Windows Metro tiles, and more) I would have to store 26 images (that are ALL the same picture, just in different sizes),
and I would have to add 19 lines of HTML code to point the various devices to the right images.
Is there a way to use just 1 file for all sizes?
I know that an ICO file can contain multiple dimensions of an image.
I also know that an SVG file doesn't depend on pixel resolution at all because it's vector-based. (So an SVG can support EVERY imaginable size.)
I could imagine putting all sizes of an image into just one ICO or SVG file, from which every device can pick its optimal size.
Is that possible?
It's not exactly possible to serve different-sized PNGs in a single file.
An SVG file would be the best hope here, but browsers today don't like them.
An alternative option is to use a tool that manages this process automatically.
I had the same frustration as you, so I came up with a tool called MakeFavicon.
It helps create multiple favicons with predefined sizes and filenames in a desired folder, and also creates config files such as browserconfig.xml, manifest.json, and a partial view of the HEAD to be included with the relevant info.
I'm using it as part of my Visual Studio build process, and it works seamlessly to update all these files on every build.
Here's a link to its example usage.
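If you only need the multi-size ICO part rather than the full set of files MakeFavicon generates, a small Pillow sketch can bundle several resolutions into one favicon.ico (the size list and file names are examples, not an official recommendation):

    from PIL import Image  # Pillow

    # Start from one large, square master image (placeholder file name).
    master = Image.open("logo-512.png")

    # Pillow's ICO writer can embed multiple resolutions in a single file;
    # each platform then picks the size it prefers (ICO tops out at 256x256).
    master.save(
        "favicon.ico",
        format="ICO",
        sizes=[(16, 16), (32, 32), (48, 48), (64, 64), (128, 128), (256, 256)],
    )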

Displaying PDFs in the web browser in other formats

It is difficult to have full control over how a PDF document is rendered in the web browser (adjusting the zoom, page size, etc.) when it is embedded in an HTML document. So I am considering converting PDF documents in advance into formats such as SVG or PNG, and embedding them into an HTML document instead of embedding a PDF file. A multi-page PDF document will correspond to multiple SVG or PNG files, which will be stored in a directory. I can handle the change of page according to user input with JavaScript, and that is not a problem.
Given that the PDF documents are scanned documents at around 300 dpi, black and white, and the converted files should have comparable quality, what format would be best suited for this situation, mostly in terms of rendering speed in the browser? I understand that caching will change the speed, so I want to limit my consideration to when the pages are rendered for the first time. I have SVG or PNG in mind. Which one is better, or is there a better format that can easily be converted to from PDF?
When a bitmap document such as a PNG is zoomed to a different size, I understand that it will be jaggy. On the other hand, I feel that, if I have an SVG file that embeds such scanned parts, anti-aliasing will work, removing the jagginess. Is my understanding correct?
what format would be best suited for this situation mostly in terms of rendering speed on the browser?
Once it is in the browser, the bitmap (PNG) will be faster. However, if the PDF is mostly text and vectors, the SVG will generally be a lot faster to first viewing, because downloading is usually slower than rendering.
If the PDF just consists of high-resolution scans, then the two approaches will be roughly equivalent in terms of speed.
if I have a svg file that embeds such scanned parts, anti-aliasing will work, removing the jagginess. Is my understanding correct?
No, that is not correct. A bitmap image does not magically have infinite resolution when put inside an SVG. If you scale up the SVG, the bitmap inside will still get jaggy. Same as if it wasn't in an SVG.
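For the conversion step itself, here is a hedged sketch using the pdf2image package (a wrapper around Poppler's pdftoppm); the file names and the 300 dpi value mirror the question and are not prescribed by this answer:

    from pdf2image import convert_from_path  # requires Poppler to be installed

    # Render each page of the scanned PDF to a bitmap at the scan resolution.
    pages = convert_from_path("scanned-document.pdf", dpi=300, grayscale=True)

    for number, page in enumerate(pages, start=1):
        # One PNG per page, to be embedded in the HTML instead of the PDF.
        page.save(f"page-{number:03d}.png", "PNG")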

What alternative can I use to an SVG word cloud?

Recently, I designed a word cloud in Illustrator for a customer. It uses around 5,000 people's names in white on a colored background, laid out along a logo path, and includes a few vector logos. Each name is ridiculously small, and we want to be able to search the cloud and find our name.
We've put it online as an SVG with success - but a 20 MB file can cause problems!
Everything would be fine until we reach 10,000 visitors at the same time, which would crash all our servers and time everyone out.
So what is our alternative to make this fast, easy for visitors to use, and latency-free? We're thinking about Canvas, but we're not sure it's simple to make a word cloud with a really custom shape (think about following a logo path).
It sounds like you have 20 MB because the names are being stored/represented as paths. If you represent them as text, you will substantially reduce the size of the file, AND make it appropriately searchable.
Assuming 13 characters per name (including the space in between), UTF-8 encoding, and 10,000 names, the names themselves should only take about 127 KB. You may wish to experiment with transmitting the background of the SVG and the names (JSON?), and using a script to construct the cloud in the browser.
Edit: Even if you create a completely static SVG, representing the text as text will result in a substantial saving of space over the use of paths.
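As a sketch of that direction, the cloud can be emitted with <text> elements instead of outlined paths, which keeps the names searchable and small; the positions, font, and colors here are placeholder values, not a layout algorithm:

    from xml.sax.saxutils import escape

    def names_to_svg(placed_names, width=1000, height=600):
        """placed_names: iterable of (name, x, y, font_size) tuples produced by
        whatever layout step follows the logo path."""
        parts = [
            f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'viewBox="0 0 {width} {height}" style="background:#1d3557">'
        ]
        for name, x, y, size in placed_names:
            parts.append(
                f'<text x="{x}" y="{y}" font-size="{size}" '
                f'font-family="Helvetica, sans-serif" fill="#ffffff">'
                f'{escape(name)}</text>'
            )
        parts.append("</svg>")
        return "\n".join(parts)

    # Example usage with a couple of placeholder entries:
    print(names_to_svg([("Jane Doe", 120, 80, 6), ("John Smith", 160, 86, 6)]))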

How much faster is it to use inline/base64 images for a web site than just linking to the hard file?

How much faster is it to use a base64 inline image to display images, as opposed to simply linking to the hard file on the server?
url(data:image/png;base64,.......)
I haven't been able to find any type of performance metrics on this.
I have a few concerns:
You no longer gain the benefit of caching
Isn't base64 A LOT larger in size than a PNG/JPEG file?
Let's define "faster" as: the time it takes for a user to see a fully rendered HTML web page.
'Faster' is a hard thing to answer because there are many possible interpretations and situations:
Base64 encoding will expand the image by a third, which will increase bandwidth utilization. On the other hand, including it in the file will remove another GET round trip to the server. So, a pipe with great throughput but poor latency (such as a satellite internet connection) will likely load a page with inlined images faster than if you were using distinct image files. Even on my (rural, slow) DSL line, sites that require many round trips take a lot longer to load than those that are just relatively large but require only a few GETs.
If you do the base64 encoding from the source files with each request, you'll be using up more CPU, thrashing your data caches, etc., which might hurt your server's response time. (Of course you can always use memcached or such to resolve that problem.)
Doing this will of course prevent most forms of caching, which could hurt a lot if the image is viewed often - say, a logo that is displayed on every page, which could normally be cached by the browser (or a proxy cache like squid or whatever) and requested once a month. It will also prevent the many many optimizations web servers have for serving static files using kernel APIs like sendfile(2).
Basically, doing this will help in certain situations, and hurt in others. You need to identify which situations are important to you before you can really figure out if this is a worthwhile trick for you.
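A tiny sketch of the "encode once, then reuse" mitigation mentioned above, with an in-process cache standing in for memcached (the helper name, file path, and MIME handling are made up for illustration):

    import base64
    from functools import lru_cache

    @lru_cache(maxsize=128)
    def data_uri(path, mime="image/png"):
        """Build a data: URI once per file and reuse it on later requests,
        instead of re-encoding the image on every page render."""
        with open(path, "rb") as f:
            payload = base64.b64encode(f.read()).decode("ascii")
        return f"data:{mime};base64,{payload}"

    # The first call reads and encodes the file; later calls for the same path
    # are served from the cache at essentially no CPU cost.
    logo = data_uri("static/logo.png")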
I have done a comparison between two HTML pages containing 1800 one-pixel images.
The first page declares the images inline:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsQAAA7EAZUrDhsAAAANSURBVBhXYzh8+PB/AAffA0nNPuCLAAAAAElFTkSuQmCC">
In the second one, images reference an external file:
<img src="img/one-gray-px.png">
I found that when the same image is loaded multiple times, if it is declared inline, the browser performs a request for each instance (I suppose it base64-decodes it once per image), whereas in the other scenario, the image is requested once per document.
The document with inline images loads in about 250 ms and the document with linked images loads in about 30 ms.
(Tested with Chromium 34)
A priori, the scenario of an HTML document with multiple instances of the same inline image doesn't make much sense. However, I found that the jQuery Lazy Load plugin defines an inline placeholder image by default, which the src attribute of all the "lazy" images is set to. So, if the document contains lots of lazy images, a situation like the one described above can happen.
You no longer gain the benefit of caching
Whether that matters would vary according to how much you depend on caching.
The other (perhaps more important) thing is that if there are many images, the browser won't fetch them all simultaneously (i.e. in parallel), but only a few at a time -- so the protocol ends up being chatty. If there is some network end-to-end delay, then (number of images ÷ images fetched at a time) × end-to-end delay adds up to a noticeable time before the last image is loaded.
Isn't base64 A LOT larger in size than a PNG/JPEG file?
The file format / image compression algorithm is the same, I take it, i.e. it's PNG.
Using Base64, each 8-bit character carries 6 bits of data: therefore the binary data is being expanded by a ratio of 8 to 6, i.e. only about 35%.
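A quick way to check that ratio (plain arithmetic, nothing specific to PNG or JPEG):

    import base64

    raw = bytes(range(256)) * 30          # 7,680 arbitrary binary bytes
    encoded = base64.b64encode(raw)

    # Every 3 input bytes become 4 output characters, a fixed 4/3 expansion
    # (about 33%); MIME line breaks and the "data:" prefix add a little more.
    print(len(raw), len(encoded), len(encoded) / len(raw))  # 7680 10240 1.333...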
How much faster is it
Define 'faster'. Do you mean HTTP performance (see below) or rendering performance?
You no longer gain the benefit of caching
Actually, if you're doing this in a CSS file it will still be cached. Of course, any changes to the CSS will invalidate the cache.
In some situations this could be used as a huge performance boost over many HTTP connections. I say some situations because you can likely take advantage of techniques like image sprites for most stuff, but it's always good to have another tool in your arsenal!