Using iText to convert HTML to PDF - html

Does anyone know if it is possible to convert an HTML page (URL) to a PDF using iText?
If the answer is 'no' then that is OK as well, since I will stop wasting my time trying to work it out and just spend some money on one of a number of components which I know can :)

I think this is exactly what you were looking for
http://today.java.net/pub/a/today/2007/06/26/generating-pdfs-with-flying-saucer-and-itext.html
http://code.google.com/p/flying-saucer
Flying Saucer's primary purpose is to render spec-compliant XHTML and CSS 2.1 to the screen as a Swing component. Though it was originally intended for embedding markup into desktop applications (things like the iTunes Music Store), Flying Saucer has been extended to work with iText as well. This makes it very easy to render XHTML to PDFs, as well as to images and to the screen. Flying Saucer requires Java 1.4 or higher.
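Because Flying Saucer renders XHTML rather than arbitrary HTML, the page you feed it has to be well-formed XML. The sketch below uses only the JDK's built-in XML parser to check markup before handing it to the renderer. The class and method names are hypothetical (not part of Flying Saucer), and it assumes plain markup: DTD-only entities such as &nbsp; are rejected by an XML parser unless a DTD declares them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlCheck {
    // Hypothetical helper: true if the markup parses as well-formed XML,
    // which is a precondition for rendering it as XHTML.
    static boolean isWellFormedXhtml(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors but prints nothing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            // Parse failure (or parser setup failure): not usable as XHTML.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXhtml("<html><body><p>ok</p></body></html>"));
        // <br> is never closed, so this is valid-ish HTML but not XML.
        System.out.println(isWellFormedXhtml("<html><body><p>unclosed<br></body></html>"));
    }
}
```

If the check fails, the page needs to be cleaned into XHTML first (closing void tags like `<br/>`, quoting attributes) before Flying Saucer can render it.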

I have ended up using ABCPdf from webSupergoo.
It works really well and for about $350 it has saved me hours and hours based on your comments above.

The easiest way of doing this is using pdfHTML.
It's an iText7 add-on that converts HTML5 (+CSS3) into PDF syntax.
The code is pretty straightforward:
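import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.kernel.pdf.PdfWriter;
import java.io.File;

// new PdfWriter(File) throws FileNotFoundException, so call this from a method that declares or handles it
HtmlConverter.convertToPdf(
    "<b>This text should be written in bold.</b>", // html to be converted
    new PdfWriter(
        new File("C:/users/mark/documents/output.pdf") // destination file
    )
);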
To learn more, go to http://itextpdf.com/itext7/pdfHTML

The answer to your question is actually two-fold. First of all you need to specify what you intend to do with the rendered HTML: save it to a new PDF file, or use it within another rendering context (i.e. add it to some other document you are generating).
The former is relatively easily accomplished using the Flying Saucer framework, which can be found here: https://github.com/flyingsaucerproject/flyingsaucer
The latter is actually a much more comprehensive problem that needs to be categorized further.
Using iText you won't be able to (trivially, at least) combine iText elements (e.g. Paragraph, Phrase, Chunk and so on) with the generated HTML. You can hack your way around this by using the addTemplate method of PdfContentByte and rendering the HTML into that template.
If you on the other hand want to stamp the generated HTML with something like watermarks, dates or the like, you can do this using iText.
So, bottom line: you can't trivially integrate the rendered HTML into other PDF-generating contexts, but you can render HTML directly to a blank PDF document.

Use the iText library (iText 7 with the html2pdf add-on):
Here is the sample code. It works perfectly fine:
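import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.html2pdf.resolver.font.DefaultFontProvider;
import java.io.*;
import java.nio.charset.StandardCharsets;

// filePath, document (whose toString() yields the page's HTML) and url come from the surrounding code
String htmlFilePath = filePath + ".html";
String pdfFilePath = filePath + ".pdf";
// create an html file on the given file path, encoded as UTF-8; try-with-resources closes the writer
try (Writer unicodeFileWriter = new OutputStreamWriter(new FileOutputStream(htmlFilePath), StandardCharsets.UTF_8)) {
    unicodeFileWriter.write(document.toString());
}
ConverterProperties properties = new ConverterProperties();
properties.setCharset("UTF-8");
// Korean, Taiwanese, Chinese and Japanese pages: also register system fonts so CJK glyphs render
if (url.contains(".kr") || url.contains(".tw") || url.contains(".cn") || url.contains(".jp")) {
    // flags: standard PDF fonts, fonts shipped with iText, system fonts
    properties.setFontProvider(new DefaultFontProvider(false, false, true));
}
// convert the html file to a pdf file
HtmlConverter.convertToPdf(new File(htmlFilePath), new File(pdfFilePath), properties);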
Maven dependencies
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itext7-core</artifactId>
<version>7.1.6</version>
<type>pom</type>
</dependency>
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>html2pdf</artifactId>
<version>2.1.3</version>
</dependency>
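One caveat about the font check above: `url.contains(".cn")` also matches URLs that merely contain those characters somewhere, for example https://edition.cnn.com/ (".cn" inside ".cnn") or a path like /news.kr. A minimal sketch of a stricter check, assuming you only care about the top-level domain of the host; `CjkHostCheck` and `needsCjkFonts` are hypothetical names, and the result would drive the same `setFontProvider` branch.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

public class CjkHostCheck {
    // Top-level domains the answer above treats as needing CJK fonts.
    private static final List<String> CJK_TLDS = List.of(".kr", ".tw", ".cn", ".jp");

    // Hypothetical helper: true only when the URL's host ends with one of those TLDs.
    // URI.create throws IllegalArgumentException for malformed URLs.
    static boolean needsCjkFonts(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return false; // e.g. relative or opaque URIs have no host
        }
        String lower = host.toLowerCase(Locale.ROOT);
        for (String tld : CJK_TLDS) {
            if (lower.endsWith(tld)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(needsCjkFonts("https://example.co.jp/page"));
        System.out.println(needsCjkFonts("https://edition.cnn.com/"));
    }
}
```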

Use iText's HTMLWorker

When I needed HTML to PDF conversion earlier this year, I tried the trial of the Winnovative HTML to PDF converter (I think ExpertPDF is the same product). It worked great, so we bought a license from that company. I didn't look into it in much depth after that.

Related

Is it possible to embed a HTML into a PDF?

I'd like to embed an HTML site into a PDF document. Are there any libraries or PDF creators that make that possible?
Update:
I am not looking for ways to convert HTML to PDF. I actually want to use the HTML as it is inside the PDF, so I am looking for something like an iframe for PDF.
There are a few out there; it depends on whether you need to build using PHP or another language. I have used MPDF before: http://www.mpdf1.com/mpdf/
Yes, it's possible using xmlworker5.4.1.jar. The XMLWorker object allows you to embed HTML in your document. In the code below, xmlString holds your HTML content. HTMLWorker is deprecated, so use XMLWorker only.
XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
// wrap the HTML of page i (htmlPages comes from the surrounding loop) in a minimal XHTML document
StringBuilder xmlString = new StringBuilder();
xmlString.append("<html><body>");
xmlString.append(htmlPages[i]);
xmlString.append("</body></html>");
// pdfWriter and pdfDocument are the PdfWriter and the opened Document being written to
worker.parseXHtml(pdfWriter, pdfDocument, new StringReader(xmlString.toString()));
In order to use the fonts named in font face tags, you need to register the font files yourself, because iText doesn't scan the system for fonts:
FontFactory.registerDirectory("path of font files");

Embed DWG file in HTML

I want to ask how to embed a DWG file in an HTML page.
I have tried using a tag with Volo Viewer, but this solution only works in IE, not in Firefox or Chrome.
Dwgview-x can do that, but it needs to be installed as a plug-in on each client computer before anyone can view the DWG file you embed online.
There may be third party ActiveX controls that you could use, but I think ultimately you will find that it's not practical for drawing files of even average complexity. I recommend to create DWF (if you need vector format) or PNG files on demand (using e.g. the free DWG TrueView from http://usa.autodesk.com/design-review/ ) and embed those instead.
I use DWG Browser. It's a standalone program for reporting on and categorizing drawings with previews. It can also export to HTML.
They have a free demo download available.
http://www.graytechnical.com/software/dwg-browser/
You'll find what I think is the latest information on Autodesk's labs site here: http://labs.blogs.com/its_alive_in_the_lab/2014/01/share-your-autodesk-360-designs-on-company-web-sites.html
It looks like a DWG can be embedded (there is an example on that page), but clearly DWF is the way to go.
You can embed DWG file's content in an HTML page by rendering the file's pages as HTML pages or images. If you find it an attractive solution then you can do it using GroupDocs.Viewer API that allows you to render the document pages as HTML pages, images, or a PDF document as a whole. You can then include the rendered HTML/image pages or whole PDF document in your HTML page.
Using C#
ViewerConfig config = new ViewerConfig();
config.StoragePath = "D:\\storage\\";
// Create HTML handler (or ViewerImageHandler for rendering document as image)
ViewerHtmlHandler htmlHandler = new ViewerHtmlHandler(config);
// guid is the unique document name (here, the file name in storage)
string guid = "sample.dwg";
// Get document pages in html form
List<PageHtml> pages = htmlHandler.GetPages(guid);
// Or Get document pages in image form using image handler
//List<PageImage> pages = imageHandler.GetPages(guid);
foreach (PageHtml page in pages)
{
// Get HTML content of each page using page.HtmlContent
}
Using Java
// Setup GroupDocs.Viewer config
ViewerConfig config = new ViewerConfig();
// Set storage path
config.setStoragePath("D:\\storage\\");
// Create HTML handler (or ViewerImageHandler for rendering document as image)
ViewerHtmlHandler htmlHandler = new ViewerHtmlHandler(config);
String guid = "Sample.dwg";
// Get document pages in HTML form
List<PageHtml> pages = htmlHandler.getPages(guid);
for (PageHtml page : pages) {
// Get HTML content of each page using page.getHtmlContent
}
Disclosure: I work as a Developer Evangelist at GroupDocs.

Is there an example on using Razor to generate a static HTML page?

I want to generate a static HTML page with Razor, basically by including partial sub-pages.
I have tried T4 as well and do look for an alternative: see here and here
This answer says it is possible, but gives no concrete example.
I have installed Razor Generator because I thought this was the way to go, but I don't see how to generate static HTML with it.
Best would be a complete extension that behaves like the T4 concept but lets me use Razor syntax and HTML formatting (the formatting issue is basically the reason why I am not using T4).
If you are trying to take a Razor view and compile it and generate the HTML then you can use something like this.
public static string RenderViewToString(string viewPath, object model, ControllerContext context)
{
var viewEngineResult = ViewEngines.Engines.FindView(context, viewPath, null);
var view = viewEngineResult.View;
context.Controller.ViewData.Model = model;
string result = String.Empty;
using (var sw = new StringWriter())
{
var ctx = new ViewContext(context, view,
context.Controller.ViewData,
context.Controller.TempData,
sw);
view.Render(ctx, sw);
result = sw.ToString();
}
return result;
}
Or, outside of a ControllerContext, use RazorEngine: http://razorengine.codeplex.com/
The current version of Razor Generator has the "Generator" option which when used with the "MvcHelper" generator produces a static method for the helpers too.
For example, add this line at the top of your CSHTML file (with the Custom Tool Visual Studio property set to RazorGenerator of course):
@* Generator: MvcHelper, GeneratePrettyNames : true *@
The pretty names option is not strictly necessary but is something I feel should be default, to avoid those crazy long class names with underscores :-)
As you may know already, the main benefit of this method is you can share your helpers in separate assemblies. That is why I use Razor Generator in the first place.
Even within the same assembly, you could now keep your code outside the App_Code folder. However, that is not the best practice (at least for security), and the Visual Studio designer gets confused: it thinks the method is still not static, but it is, and it works fine.
I'm prototyping my helpers in the App_Code folder of the same site/assembly for speed, then copying them to shared components once they're tested. The reason I needed this solution was to create generic Bootstrap helpers without hand-coding every piece of HTML in an HtmlHelper, i.e. used together with this solution from @chrismilleruk.
I guess later I may have to convert the CSHTML helpers to a hand-coded HtmlHelper for speed. But to start with, I see a great increase in development speed from being able to copy and paste blocks of HTML code I want to automate, then perfect and debug them quickly in the same format/editor.

Link in swt browser to file inside jar file [duplicate]

I want to implement a help system for my tiny SWT desktop application.
I've thought about an SWT browser widget containing a single HTML page and a set of anchors to navigate (there are only very few things to explain).
Everything works fine, but how do I load the HTML file from a jar?
I know about getClass().getClassLoader().getResourceAsStream("foo");, but what is the best practice for reading from the input stream? The answer to "Load a resource contained in a jar" advises against using a FileInputStream.
Thanks in advance
Well, I found a rather simple solution that obviously just works:
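InputStream in = getClass().getClassLoader().getResourceAsStream("html/index.html");
StringBuilder buffer = new StringBuilder();
// read with an explicit charset; try-with-resources closes the scanner and the stream
try (Scanner scanner = new Scanner(in, "UTF-8")) {
    while (scanner.hasNextLine()) {
        // nextLine() strips the line terminator, so add it back
        buffer.append(scanner.nextLine()).append('\n');
    }
}
browser.setText(buffer.toString());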
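On Java 9 or later the JDK can read the whole resource without Scanner or commons-io, and it keeps line breaks exactly as stored. A minimal sketch, assuming the help page is UTF-8; `ResourceText`, `readAll` and `loadResource` are hypothetical names, not part of SWT.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ResourceText {
    // Reads the whole stream as UTF-8 (line breaks included) and closes it.
    static String readAll(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Loads a classpath resource, e.g. "html/index.html" packaged inside the jar.
    static String loadResource(String path) throws IOException {
        InputStream in = ResourceText.class.getClassLoader().getResourceAsStream(path);
        if (in == null) {
            throw new IOException("Resource not found: " + path);
        }
        return readAll(in);
    }

    public static void main(String[] args) throws IOException {
        InputStream demo = new ByteArrayInputStream(
                "<p>line 1</p>\n<p>line 2</p>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(demo));
    }
}
```

In the SWT case this would be used as `browser.setText(ResourceText.loadResource("html/index.html"));`.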
I tend to use commons-io for such tasks; it gives me simple abstraction methods like IOUtils.toString(InputStream in) and leaves the choice of the best implementation to the able people at Apache ;)
commons-io: http://commons.apache.org/io/
apidocs: http://commons.apache.org/io/api-release/index.html

ASP.NET MVC: render view to generate PDF: use iTextSharp or better solution?

I display receipts in both an HTML and a printer-friendly version. The HTML version uses jQuery tabs, etc., while the printer-friendly one has zero scripts and external dependencies, no master layout, no additional buttons, inline CSS, and can be saved as HTML without problems.
Since I use Spark View Engine, I thought maybe it's a good idea to generate the PDF using the iTextSharp engine. But after a few paragraphs I decided it's too cumbersome, because a) I would have to rewrite the entire receipt (the source Spark view is about 5 pages long) and b) I had problems with iTextSharp from the beginning: for example, numbered lists stayed bulleted, with no indentation, and indentationLeft="20" didn't work. Maybe that's due to the lack of documentation, but see (a).
So, my requirements for PDF are very simple: I want to keep the same HTML but insert page breaks between individual receipts (yes I have several ones in a single document).
Is there a simple way to generate PDF from view/HTML without rewriting the view using a strange half-documented engine?
UPDATE: tried the community HTMLDoc version; it didn't use my inline CSS styles and displayed Unicode currency symbols incorrectly. wkhtmltopdf did pick up the CSS but failed on the currency symbol; I suppose that's an encoding problem, solved by setting the charset to utf-8. wkhtmltopdf seems nice, but I have yet to figure out how to set page breaks...
If you can have the HTML in memory, then you can convert it to PDF. I once did something similar using xhtmlrenderer, a Java framework that bundles iText and can convert an HTML stream into PDF. As it is written in Java, I used ikvmc.exe to convert the jar file into a .NET assembly and used it directly from managed code.
public class Pdf : IPdf
{
public FileStreamResult Make(string s)
{
using (var ms = new MemoryStream())
{
using (var document = new Document())
{
PdfWriter.GetInstance(document, ms);
document.Open();
using (var str = new StringReader(s))
{
var htmlWorker = new HTMLWorker(document);
htmlWorker.Parse(str);
}
document.Close();
}
HttpContext.Current.Response.ContentType = "application/pdf";
HttpContext.Current.Response.AddHeader("content-disposition", "attachment;filename=MyPdfName.pdf");
HttpContext.Current.Response.Buffer = true;
HttpContext.Current.Response.Clear();
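// ToArray() copies only the bytes written (and still works after PdfWriter closes the stream); GetBuffer() also returns unused capacity
var pdfBytes = ms.ToArray();
HttpContext.Current.Response.OutputStream.Write(pdfBytes, 0, pdfBytes.Length);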
HttpContext.Current.Response.OutputStream.Flush();
HttpContext.Current.Response.End();
return new FileStreamResult(HttpContext.Current.Response.OutputStream, "application/pdf");
}
}
}
Finally, I used wkhtmltopdf, which works fine once I set the encoding; I found out how to set up page breaks, and it processes my CSS very nicely. One issue is that the Windows version can't correctly process stdin/stdout (I don't remember whether it's input or output that doesn't work). That may be fixed in recent versions, but I'm OK with temp files.
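The question's remaining requirement was a page break between individual receipts. wkhtmltopdf honors the CSS `page-break-before` property, and its `--encoding` option sets the input text encoding, which addresses the currency-symbol problem from the update. The sketch below is in Java to match the other examples here (the same string building carries over to C#); it assumes wkhtmltopdf is on the PATH, and `ReceiptPages`, `joinReceipts` and `wkhtmltopdfCommand` are hypothetical names.

```java
import java.util.List;

public class ReceiptPages {
    // Joins receipt fragments into one document, forcing a page break
    // before every receipt except the first.
    static String joinReceipts(List<String> receipts) {
        StringBuilder html = new StringBuilder("<html><head><meta charset=\"utf-8\"></head><body>\n");
        for (int i = 0; i < receipts.size(); i++) {
            String style = (i == 0) ? "" : " style=\"page-break-before: always\"";
            html.append("<div class=\"receipt\"").append(style).append(">")
                .append(receipts.get(i))
                .append("</div>\n");
        }
        return html.append("</body></html>\n").toString();
    }

    // Command line for converting a temp HTML file to PDF with an explicit input encoding.
    static List<String> wkhtmltopdfCommand(String inputHtml, String outputPdf) {
        return List.of("wkhtmltopdf", "--encoding", "utf-8", inputHtml, outputPdf);
    }

    public static void main(String[] args) {
        System.out.println(joinReceipts(List.of("<p>Receipt 1</p>", "<p>Receipt 2</p>")));
        System.out.println(String.join(" ", wkhtmltopdfCommand("receipts.html", "receipts.pdf")));
    }
}
```

To actually run it, write the joined HTML to a temp file (for example with `Files.writeString`) and start `new ProcessBuilder(command).inheritIO().start().waitFor()`, which fits the temp-file approach described in the answer above.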