Best solution for turning a website into a pdf - html

At the company I work for, we have developed a CBT system. We have to create books out of the content in that system, so I developed a program that downloads all of the content and creates an offline version of the different training modules.
I also created a program that builds PDF documents from the offline version of the CBT. It uses Websites Screenshot to take a screenshot of each page and then uses iTextSharp to create a PDF document from those images.
It seems to be a memory hog and is painfully slow. There are 40 CBT modules that it needs to turn into books. Even though I take every step to clear the memory after each book is created, it crashes after about 2 books because there is no memory left.
Is there a better way to do this instead of having to take a screen shot of the pages that will yield the same look of the web page inside the pdf document?

I have searched and demoed and found that ABCPdf from WebSuperGoo is the best product for .NET. It is the most accurate and doesn't require a printer driver. It uses IE as the rendering engine, so it looks almost exactly like what you get in IE.

PrinceXML is commercial software that generates PDFs from websites.

I've used PDFSharp in the past and have had good success in generating PDFs.
It's open source as well, so in the event of troubles like you've mentioned, you're able to hunt and peck to increase performance.

If you control the source it is probably not too difficult to generate pdf directly instead of through a screenshot.
Did you try unloading the dll?
There are also different ways of getting screenshots:
http://mashable.com/2007/08/24/web-screenshots/

Related

Permanently available file to download for internet speed testing

I am implementing a very simple download speed tester.
To do that, I simply download a large image file from the web and see if it's received within a reasonable time.
I'm using the following file because I saw that in a source code somewhere:
https://cdn.pixabay.com/photo/2017/08/30/01/05/milky-way-2695569_1280.jpg
However, I am afraid that the image might go away some time.
Which image could I use to make sure it will always be there?
Is there perhaps an image that a foundation or so has created especially for such a purpose and who promises that it will be there for a long time?
I was even thinking about downloading a really popular js file because I was thinking that it will be there for a long time like https://code.jquery.com/jquery-3.6.0.min.js, but I am not sure about this either.
How could I handle this task in the most reliable way?
Thank you!
I would recommend Wikimedia commons.
Reasons:
Wikimedia has strict guidelines that favor high-quality uploads, which tend to be large files.
It's free and persistent (files persist for years).
You can go with videos for even bigger sizes.
For images it's https://commons.m.wikimedia.org/wiki/Category:Images
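Whichever file you pick, the measurement itself can be sketched roughly as below. This is only a minimal sketch in Python, not a definitive implementation: `measure_throughput` and `first_working_download` are hypothetical names, and trying several URLs in order is my own assumption about how to cope with one file disappearing.

```python
import time
import urllib.request


def measure_throughput(stream, chunk_size=64 * 1024):
    """Read `stream` to the end and return (bytes_read, seconds_elapsed)."""
    total = 0
    start = time.perf_counter()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total, time.perf_counter() - start


def first_working_download(urls, timeout=10):
    """Try each URL in order; return (url, bytes, seconds) for the first that works."""
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                size, seconds = measure_throughput(response)
                return url, size, seconds
        except OSError:
            continue  # URLError and socket timeouts are OSError subclasses
    raise RuntimeError("no test file could be downloaded")
```

You would pass a list such as the Pixabay image and the jQuery file from the question, plus a Wikimedia Commons file, so the test keeps working if one of them goes away.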

Reducing the loading time in an HTML document filled with thousands of PNGs by generating thumbnail JPGs

I'd like to make an HTML document with all of my own photos saved on my computer, and use it as an offline catalog, as most commercially available image viewing software doesn't satisfy my needs.
There are about 1,000 photos and the vast majority of them are PNGs, about 10-15MB each.
I suspect that each time I open this HTML document, loading would take an enormous amount of time.
To prevent that, I was told I should create thumbnail versions of those photos (e.g., 200x200px JPGs), and use them instead, while enclosing them in links that redirect me to the original locally saved PNG files.
Just asking to verify if there's any better alternative to this, or should I really go through this thumbnails solution? If there isn't, how would I mass generate those thumbnails, so I don't do it individually?
I'm on a High Sierra Mac and a Windows 7 PC.
Extra question: Should I, by any chance, use a document coded in a language other than HTML, for this purpose? For example, a language faster or more efficient or more cross-platform, that I didn't think of myself? I'm open to suggestions.
Thank you.
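For the catalog page itself, here is a rough sketch in Python of the "thumbnail wrapped in a link to the original" idea described above. It assumes the thumbnails already exist in a separate folder and that each one is named after its PNG with a `.jpg` extension; the function name `build_catalog` and that naming scheme are assumptions for illustration only.

```python
import html
from pathlib import Path


def build_catalog(photo_dir, thumb_dir, out_file):
    """Write an HTML page where each thumbnail links to its original PNG.

    Returns the number of photos that were added to the page.
    """
    photo_dir, thumb_dir = Path(photo_dir), Path(thumb_dir)
    links = []
    for photo in sorted(photo_dir.glob("*.png")):
        thumb = thumb_dir / (photo.stem + ".jpg")  # assumed naming scheme
        if not thumb.exists():
            continue  # skip photos that have no thumbnail yet
        links.append(
            '<a href="{}"><img src="{}" alt="{}"></a>'.format(
                html.escape(photo.resolve().as_uri()),
                html.escape(thumb.resolve().as_uri()),
                html.escape(photo.name),
            )
        )
    page = "<!DOCTYPE html>\n<html><body>\n" + "\n".join(links) + "\n</body></html>\n"
    Path(out_file).write_text(page, encoding="utf-8")
    return len(links)
```

Because the links are absolute `file://` URIs, the page only works on the machine it was generated on; it would need to be regenerated on the other computer.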

How can I save a webpage as an image in my rails app?

In my rails app I have a need to save some webpages and display them to the user as images. For example, how would I save www.google.com as an image?
There is a command-line utility called CutyCapt that uses the WebKit rendering engine to render HTML pages into various image formats. Maybe this is for you?
http://cutycapt.sourceforge.net/
This is prohibitively difficult to do in pure Ruby, so you'd want to use an external service for it. Browsershots does it, for example, and it looks like they have an API, although I haven't used it myself. Maybe someone else can chime in with alternative but similar services.
You'll also want to read up on delayed_job or something similar, to make sure you're accessing those page images as a background task and that it doesn't interfere with your actual application.
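To illustrate the background-task idea only: the sketch below is Python, not Ruby, and it uses an in-process thread pool rather than delayed_job, which stores its jobs in the database. `enqueue_capture` and the `capture` callable are hypothetical names; the point is just that the request handler gets control back immediately while the slow capture runs elsewhere.

```python
from concurrent.futures import ThreadPoolExecutor

# One shared pool; a small worker count keeps concurrent captures bounded.
_executor = ThreadPoolExecutor(max_workers=2)


def enqueue_capture(capture, url):
    """Schedule capture(url) on a worker thread and return its Future at once."""
    return _executor.submit(capture, url)
```

The caller can poll `future.done()` later, or call `future.result()` once the image is actually needed.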
You can't do it easily (and probably can't do it at all).
Each page is just text: HTML data. The view you want to capture is a rendered page. The browser renders the page using tons of techniques such as HTML parsing, JavaScript parsing, CSS parsing, font rendering, etc. To take a screenshot of the Google page, you would need to do all the rendering somewhere in memory and then take a screenshot of the rendered page.
That task is almost impossible (nothing is fully impossible).
If you are really eager to spend tons of time on that task, you should follow these steps:
1) Find some open-source rendering engine. Firefox would do.
2) Find some way to communicate between Ruby on Rails and that engine.
3) Wire it all together and see the results.
However, I see steps 1 and 2 as nearly impossible.
Firefox addon:
https://addons.mozilla.org/en-US/firefox/addon/1146/

Thumbnails from HTML pages created and used automatically in web application

I am working on a Ruby on Rails app that visualizes product trees. The tree is built of nodes and everything is rendered in HTML/CSS3. Some of the products make several hundred SQL queries as the tree builds up (up to 800 queries on the biggest tree).
I'd like to have small thumbnails of each tree to present it on an index page. So rendering each tree once again and modifying CSS to make a tiny representation is an option.
But I think it's probably easier to generate thumbnails, crop, cache, and show these on the index page.
Any ideas on how to do this? Any links/articles/blog posts that could help me?
Check out websnapr; it looks like they provide 100,000 free snaps a month.
I should check this site more often. :D Anyway, I've done some more research and it looks like you'll need to set up some server-side scripts that will open a browser to the page, take a screenshot, and dump the file/store in database/etc.
This question has been open for quite a while. I have a proposal which actually fulfills most of the requirements.
Webkit2png can create screenshots and crop parts of the image. You can specify dimensions and crop areas, and it also provides a thumbnail of the pages.
However, it will not support login in your application out-of-the-box.
Webkit2png is really easy to use in a shell script, so you can just feed it a number of URLs and it will return all the image files.
More info in this blog post: Batch Screenshots with webkit2png
Webkit2png has an open request to add authentication (so you can use it on logged in pages).
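Feeding it a list of URLs could look roughly like this. It is a minimal Python sketch rather than the shell script from the blog post, and it assumes `webkit2png` is on the PATH and takes the URL as its last argument. The function names are hypothetical, and any sizing or cropping options are left to the caller via `extra_args` instead of being guessed here.

```python
import subprocess


def webkit2png_commands(urls, extra_args=()):
    """Return one argument list per URL; extra_args are passed through untouched."""
    return [["webkit2png", *extra_args, url] for url in urls]


def capture_all(urls, extra_args=()):
    """Run webkit2png once per URL and return the URLs whose run failed."""
    failed = []
    for command in webkit2png_commands(urls, extra_args):
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(command[-1])  # the URL is the last argument
    return failed
```

Running the tool once per URL keeps a single bad page from stopping the whole batch.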

What is the way to programmatically render a website to an image from a server environment?

I would like to render websites to an image (JPEG, PNG, PDF, etc.) from a server environment. I have seen a few implementations that use Xvfb but would like to see if there are any decent implementations that would work standalone without X of any sort.
Google Fast Flip seems to do a pretty decent job. I have seen this on a smaller scale where mousing over links pops up a "preview" of the page the link connects to.
I've successfully used wkhtmltopdf to convert web pages to PDF, which I then convert to images. It's built on top of WebKit.
Back in 2006, I rolled my own version of Webshots using a combination of X, VNCServer, Firefox, PHP, and a few shell scripts. It was somewhat of a hack, but worked extremely well.
I don't see how you're going to do this without using some type of GUI environment. The webpage has to be rendered somehow for a screenshot to be captured. Alternatively, use one of the several commercial solutions that offer an API.
Sites like browsershots will do it by loading the webpage into a browser in a VM, then taking a screenshot of the VM environment.
If you have a small number of sites which need snapshots stored, the solutions you linked to should be fine. Otherwise, if you need shots of any/all arbitrary websites, you may want to consider using an existing third-party database of snapshots.
CutyCapt by Björn Höhrmann is excellent: cross-platform, built on WebKit, and able to output to different file formats, e.g. PNG / JPEG / PDF.
Usage: CutyCapt --url=http://www.example.org/ --out=localfile.png
Simples :)
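If you want to drive that from a script on the server, a thin wrapper might look like this. It is only a sketch in Python: it assumes the `CutyCapt` binary is on the PATH and uses just the `--url` and `--out` options shown above; `cutycapt_command` and `capture` are hypothetical helper names.

```python
import subprocess


def cutycapt_command(url, out_path):
    """Build the same invocation as the usage line above."""
    return ["CutyCapt", "--url=" + url, "--out=" + str(out_path)]


def capture(url, out_path, timeout=60):
    """Render one page to a file; raises CalledProcessError if CutyCapt fails."""
    subprocess.run(cutycapt_command(url, out_path), check=True, timeout=timeout)
    return out_path
```

The timeout is there so that a page that never finishes loading cannot hang the caller indefinitely.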