I am printing an HTML page and need to show page numbers next to the titles in the table of contents. I have investigated a lot but couldn't find a proper solution.
While investigating, I found target-counter, but it is not working for me. It seems to be supported explicitly by PrinceXML, but I am using Puppeteer to convert HTML content to PDF: since PrinceXML doesn't support .NET Core, I have opted for Puppeteer Sharp.
So is there any way I could get page numbers in the table of contents?
This is more a theoretical question than a coding question. I am trying to create an EPUB with some interactive components. To keep the code separate, I put each interactive component in its own HTML file. A typical interactive component would be a questionnaire.
What is the best way to link these into my EPUB? The two options I am considering are:
Iframe - the interactive component is displayed as part of the book, and users complete the activity seamlessly.
External link - an icon is displayed in the book and serves as a link to a new page containing the interactive component.
I would like the EPUB to conform to the EPUB 3.0 standard.
Any suggestions or alternative solutions I can research?
Either approach is compatible with the standard, since external links are allowed and scripting within an iframe is also allowed (known as container-constrained scripting).
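If you go the iframe route, one packaging detail worth keeping in mind is that EPUB 3 expects every content document that contains scripts (or HTML forms) to be flagged with the scripted property on its manifest item in the package document. A minimal sketch that builds such a manifest with Python's xml.etree.ElementTree; the ids and file names (chapter1.xhtml, questionnaire.xhtml) are hypothetical, and only the document that actually carries the scripts is flagged.

```python
import xml.etree.ElementTree as ET

# Namespace of the EPUB package document (OPF)
OPF_NS = "http://www.idpf.org/2007/opf"
ET.register_namespace("", OPF_NS)


def build_manifest(items):
    """Build an OPF <manifest> from (id, href, media_type, scripted) tuples.

    Items whose content document contains scripts or forms get
    properties="scripted", as EPUB 3 expects for scripted content documents.
    """
    manifest = ET.Element(f"{{{OPF_NS}}}manifest")
    for item_id, href, media_type, scripted in items:
        attrs = {"id": item_id, "href": href, "media-type": media_type}
        if scripted:
            attrs["properties"] = "scripted"
        ET.SubElement(manifest, f"{{{OPF_NS}}}item", attrs)
    return manifest


if __name__ == "__main__":
    # Hypothetical files: chapter1.xhtml embeds questionnaire.xhtml in an <iframe>
    manifest = build_manifest([
        ("chapter1", "chapter1.xhtml", "application/xhtml+xml", False),
        ("quiz1", "questionnaire.xhtml", "application/xhtml+xml", True),
    ])
    print(ET.tostring(manifest, encoding="unicode"))
```

The embedding chapter only contains the iframe element, so in this sketch it stays unflagged; if it also ran scripts of its own, it would need the property too.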
The broader question is: what are you trying to achieve? If the user is sent to an external page in a browser, then the browser will have to post the results back to some server, since the browser can't write the results back to disk or to the EPUB.
It also depends on whether the EPUB Reading System is itself browser-based or an app. If it is an app, then in theory the app might know how to log info locally (though that would require jumping through some hoops that could have security implications).
So what is the goal here?
In my web application, a user can create a post containing images, embedded videos, and text in different styles. I want to generate a preview of the post to show on the front page of the web application. The preview must not take up too much space and should be as clear as possible.
I know that I need to parse the post's HTML first, and I may extract the image elements first. For the text, my idea is simply to extract all the plain text and show part of it.
Could someone provide other advice, methods or resources for this problem?
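The approach described in the question (pull out the images, then extract the plain text and show part of it) can be sketched with Python's standard html.parser. This is only an illustration under assumptions: the set of block-level tags treated as word separators, the 200-character limit, and the choice of keeping a single image are all arbitrary, and embedded videos are simply ignored.

```python
from html.parser import HTMLParser

# Tags whose boundaries should become a space in the plain-text preview
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6", "tr", "td"}
# Tags whose contents should never appear in the preview
SKIP_TAGS = {"script", "style"}


class PreviewExtractor(HTMLParser):
    """Collects image sources and visible text from a post's HTML."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)
        elif tag in BLOCK_TAGS:
            self.text_parts.append(" ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.text_parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def make_preview(post_html, max_chars=200, max_images=1):
    """Return the first few image URLs and a truncated plain-text excerpt."""
    parser = PreviewExtractor()
    parser.feed(post_html)
    parser.close()
    # Collapse runs of whitespace left over from the markup
    text = " ".join("".join(parser.text_parts).split())
    if len(text) > max_chars:
        # Cut at the last word boundary that fits, then mark the truncation
        text = text[:max_chars].rsplit(" ", 1)[0] + "..."
    return {"images": parser.images[:max_images], "text": text}
```

Inline tags such as b or span are joined without a space, so styled words stay intact, while paragraph-like tags become word breaks.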
As I understand it, you want to render an HTML page into an image? You should use a layout engine, such as WebKit or Gecko, on the server side.
Another option is to use a third-party online tool for these previews. Rendering a page is a fairly expensive process, both in time and in the memory needed to store the images. I have found these services:
http://www.thumboo.com/
http://www.thumbalizr.com/
http://www.zubrag.com/scripts/website-thumbnail-generator.php
I'm trying to find a control for our project that will allow us to render HTML content in Silverlight without having to use windowless mode or be an out-of-browser app. All of the controls I've found so far require windowless mode. For technical reasons, windowless mode and out-of-browser apps are not possible for us.
The intention is to use the control to show formatted text in our Help system, so if there's a control out there that does a partial implementation, it might still be useful to us. We are mostly looking for a way to define the help content in some kind of rich text format (most likely HTML) so that it can have formatting, bullets, perhaps tables, images, etc.
Can anyone suggest a control that can do this? We're currently using Silverlight 3, but Silverlight 4 is in the pipeline.
I've used the Vectorlight controls for this sort of thing, both for displaying and for editing HTML-based content. The one I've used is Rich Text Editor, the original control, which works in SL3 and SL4. A newer one called Html RichTextArea has since been introduced. Note that both are actually HTML-based. I don't know how well the newer one works (I suspect it is based on the SL4 rich text support), but the original control works fine.
There is a standard two-pass algorithm described in RFC 1942: http://www.ietf.org/rfc/rfc1942.txt, but I haven't seen any good real-world implementations. Does anyone know of any? I haven't been able to find anything useful in the Mozilla or WebKit code bases, but I am not entirely sure where to look.
I suspect this is really a deeper problem, since the contents of table cells are themselves HTML that has to be rendered, but to keep it simple: I want to render a plain-text HTML table as an image. Even an HTML table rendering algorithm that ignores the "as an image" part would help.
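A minimal sketch of the two-pass idea, as I read the RFC's autolayout description: pass one computes each column's minimum width (its widest unbreakable word) and maximum width (its content laid out without line breaks); pass two compares the table totals with the available width. If the maximum fits, every column gets its maximum; if even the minimum doesn't fit, every column gets its minimum and the table scrolls; otherwise each column gets its minimum plus a share of the slack proportional to its own max-min difference. The sketch assumes plain-text cells, a monospace font where every character is one unit wide, and no colspan/rowspan, and it ignores borders, padding and cell spacing, all of which a real renderer has to add.

```python
def cell_widths(text):
    """Pass 1 for one cell, assuming monospace text at 1 unit per character:
    min = widest unbreakable word, max = whole text on a single line."""
    words = text.split()
    if not words:
        return 0, 0
    return max(len(w) for w in words), len(" ".join(words))


def column_widths(rows, available):
    """rows: list of rows, each a list of cell strings (no colspan/rowspan)."""
    ncols = max(len(r) for r in rows)
    col_min = [0] * ncols
    col_max = [0] * ncols
    # Pass 1: a column's min/max is the largest min/max of its cells
    for row in rows:
        for i, text in enumerate(row):
            lo, hi = cell_widths(text)
            col_min[i] = max(col_min[i], lo)
            col_max[i] = max(col_max[i], hi)
    table_min, table_max = sum(col_min), sum(col_max)

    # Pass 2: assign widths against the available width
    if table_max <= available:
        return col_max
    if table_min >= available:
        return col_min  # too narrow: use minimums and let the caller scroll
    W = table_max - table_min  # total flexible width (> 0 here)
    D = available - table_min  # slack actually available
    return [mn + (mx - mn) * D / W for mn, mx in zip(col_min, col_max)]
```

For example, with rows [["a bb", "ccc dddd"]] the column minimums are [2, 4] and the maximums [4, 8]; with 9 units available the sketch assigns widths of 3.0 and 6.0, which add up exactly to the available width.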
If a commercial tool is an option, look at:
HtmlCapture ActiveX Control V2.0 (originally named HtmlSnap)
Some features they claim:
By calling SnapHtmlString(), you can take a snapshot of an HTML string.
Get snapshot images rendered by either Microsoft IE or Mozilla Firefox.
Just by calling SnapUrl() and SaveImage(), you can take a snapshot of a webpage in various image formats, such as BMP, JPG, JPEG, GIF, PNG, TIF, TGA and PCX.
Convert HTML to vector image formats like EMF and WMF.
Self contained ActiveX control with no third party dependencies.
Support custom GDI output of the resulting image.
Support saving resulting image both to file and in memory.
Support saving both full-size web page and thumbnail one.
Take a snapshot of a whole webpage into one image without scrollbars.
Make grayscale or B&W images with efficient algorithms to keep the quality.
Support JPEG compression level, compression method selection of TIFF and GIF.
Support setting color depth in images while keeping the quality of the image as much as possible.
Selectively save ActiveX controls, images, Java applets, scripts and videos on a web page as you want.
Send custom cookies, http headers, credentials in snapshot requests.
Take snapshots of webpages via a Proxy server.
More than 30 samples written in VC, C#, Delphi, VB, C++ Builder, Java, JScript, Perl, VBScript, ASP, ASP.NET and PHP are provided.
HTML table rendering is non-trivial because of the various ways the sizes of the cells may be specified, tables nested within tables, etc.
If all you want is the image, a simple solution would be the .NET browser control (which is basically the COM component for IE) and a screen-capture function.
If you want some source code to manipulate, the Mozilla source should still be available.
I'm not sure if this will meet your constraints or not, but you can try using IE or an IE control with MSHTML and the IHTMLElementRender interface to render the table to a device context.
If you have XHTML, not plain HTML, you should be able to retrieve the content of those cells along with information about the table's structure: colspan, rowspan, etc. Using this information, you can render the table using your own border, padding and margin values.
Things get complex when you also want to honor user-defined dimensions. But for retrieving the table data and drawing it, you could use an XML parser. PHP's parser is here: http://ca3.php.net/xml
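To illustrate the structure-retrieval step, here is a sketch using Python's xml.etree.ElementTree in place of PHP's parser. It turns a well-formed XHTML table into a grid, placing each cell according to its colspan and rowspan. It assumes valid XHTML with no nested tables, and it copies a spanning cell's text into every grid slot the cell covers; that is a simplification chosen here, since a real renderer would rather keep track of the span itself.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace, e.g. '{http://www.w3.org/1999/xhtml}td' -> 'td'."""
    return tag.rsplit("}", 1)[-1]


def table_grid(xhtml_table):
    """Parse an XHTML <table> string into a 2D list of cell texts,
    honoring colspan and rowspan (assumes no nested tables)."""
    root = ET.fromstring(xhtml_table)
    rows = [el for el in root.iter() if local(el.tag) == "tr"]
    grid = {}  # (row, col) -> cell text
    for r, tr in enumerate(rows):
        c = 0
        for cell in tr:
            if local(cell.tag) not in ("td", "th"):
                continue
            while (r, c) in grid:  # skip slots already filled by a rowspan above
                c += 1
            text = "".join(cell.itertext()).strip()
            rs = int(cell.get("rowspan", 1))
            cs = int(cell.get("colspan", 1))
            for dr in range(rs):
                for dc in range(cs):
                    grid[(r + dr, c + dc)] = text
            c += cs
    nrows = max(r for r, _ in grid) + 1 if grid else 0
    ncols = max(c for _, c in grid) + 1 if grid else 0
    return [[grid.get((r, c)) for c in range(ncols)] for r in range(nrows)]
```

Slots that no cell covers come back as None. From this grid you can then draw the table with your own border, padding and margin values.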
One tool that comes close is: http://www.terrainformatica.com/htmlayout/main.whtm
This library offers a way to capture rendered HTML to an image; however, it is not open source (but it is free!). Hope it is useful to some!
Unfortunately, my app is cross-platform C/C++ with no MFC or platform dependencies (nightmare!). I'm hoping to find a general-purpose algorithm for table rendering. I think the two-pass option from the RFC comes pretty close, so I'm probably going to dig in and work from that. I'll be sure to blog about it and post my eventual solution here if I can!
Take a look at Prince XML - it's a commercial tool that renders CSS-styled XML (including XHTML) documents to PDF. It conforms to major W3C standards such as XHTML and CSS 2.1. You can try the free demo version from their homepage!
Since you want an image: it shouldn't be a big problem to convert the generated PDFs to images programmatically.