Render images in iTextSharp HTML5 to PDF - html

Currently I have an MVC Core v2.0.0 application that uses the .NET 4.7 framework. Inside of this I'm using iTextSharp to try and convert HTML to PDF. If I add an image to the html I get the following exception "The page 1 was requested but the document has only 0 pages".
I have tried using both a full URL and a base64 encoded content.
<img src=\"http://localhost:4808/images/sig1.png\">
<img src=\"data:application/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAIAAAD8GO2jAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAAEnQAABJ0Ad5mH3gAAADsSURBVEhLtc2NbcMgAERhz9JtImXK1gtkq46RU58K9CX+KWDpswLhdLd83dZi+fyerx24ZEMD4cQgtcOhEfnUjj+hEfyoHTU0opzUjvLar72oHW2gh+5qhzL/4/v0Dd9/qB3KnOX7L7VDmVN8b6gdyuDjcZf6Wk/vqB08qXHLwUCoHWrZcTwQaoeKthwPkFM7SsuOvQFF1Q5lXm0OKAe1QxlZ6mm3ulA7lGnVgfPUDmWKnoFQO5RB50CoHcpE/0CoHcoMDYTa0QZGB0LtKK8TBkLt4GnOQKgd+X/aQKgdMwdC7TF5IC4fiDpwW590JX1NuZQyGwAAAABJRU5ErkJggg==\" alt=\"test.png\">
Here is the helper method that does the transform
public static Stream GeneratePDF(string html, string css = null)
{
MemoryStream ms = new MemoryStream();
//HtmlRenderer.PdfSharp implementation
//PdfSharp.Pdf.PdfDocument pdf =
// TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf(html, PdfSharp.PageSize.Letter, 40);
//pdf.Save(ms, false);
try
{
//iTextSharp implementation
//Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
using (Document doc = new Document(PageSize.LETTER))
{
//Create a writer that's bound to our PDF abstraction and our stream
using (PdfWriter writer = PdfWriter.GetInstance(doc, ms))
{
writer.CloseStream = false;
if (string.IsNullOrEmpty(css))
{
//XMLWorker also reads from a TextReader and not directly from a string
using (StringReader srHtml = new StringReader(html))
{
doc.Open();
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
doc.Close();
}
}
else
{
using (MemoryStream msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(css)))
using (MemoryStream msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
doc.Open();
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
doc.Close();
}
}
}
}
}
catch (Exception)
{
ms.Dispose();
//Rethrow with "throw;" so the original stack trace is preserved
throw;
}
//Rewind the stream so the caller reads the PDF from the beginning
ms.Position = 0;
return ms;
}
Here I'm calling my helper method to generate a static PDF document.
public IActionResult TestGenerateStaticPdf()
{
//Our sample HTML and CSS
string example_html = "<!doctype html><head></head><body><h1>Test Report</h1><p>Printed: 2017-09-29</p><table><tbody><tr><th>User Details</th><th>Date</th><th>Image</th></tr><tr><td>John Doe</td><td>2017-09-29</td><td><img src=\"http://localhost:4808/images/sig1.png\"></td></tr></tbody></table></body>";
string example_css = "h1 {color:red;} img {max-height:180px;width:100%;page-break-inside:avoid;} table {border-collapse:collapse;width:100%;} table, th, td {border:1px solid black;padding:5px;page-break-inside:avoid;}";
System.IO.Stream stream = ControllerHelper.GeneratePDF(example_html, example_css);
return File(stream, "application/pdf", "Static Pdf.pdf");
}

It turns out that iTextSharp does not honor the doctype and instead forces XHTML. In XHTML, the image tag needs to be closed. If you self-close the tag (for example <img src="http://localhost:4808/images/sig1.png" />), it seems to generate without the exception. However, I was not able to get the base64 encoded content to render. There is no error in this case, but the image does not show up.
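The closing-tag fix can also be applied to the HTML string before it is handed to the parser. What follows is only a sketch, written in Java for illustration: the class and method names are hypothetical, and the regex assumes attribute values never contain a ">" character. The same pattern should carry over to .NET's Regex.Replace, but that is an assumption and has not been checked against iTextSharp.

```java
import java.util.regex.Pattern;

public class ImgTagCloser {
    // Matches <img ...> and <img ... /> alike; group 1 holds the attributes.
    private static final Pattern IMG =
            Pattern.compile("<img\\b([^>]*?)\\s*/?>", Pattern.CASE_INSENSITIVE);

    /** Rewrites every img tag as a self-closed XHTML tag. */
    public static String closeImgTags(String html) {
        return IMG.matcher(html).replaceAll("<img$1 />");
    }

    public static void main(String[] args) {
        String cell = "<td><img src=\"http://localhost:4808/images/sig1.png\"></td>";
        System.out.println(closeImgTags(cell));
    }
}
```

Tags that are already self-closed are left equivalent, so the helper can be run on any input without checking first.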

Related

How can I specify the download location of pdf?

How can I specify the download location of the converted pdf in the server?
When the application runs on the server, I want to save the PDF to a folder on the server, but I don't know how to work with the memory stream to do that.
using Microsoft.AspNetCore.Mvc;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Syncfusion.HtmlConverter;
using Syncfusion.Pdf;
using System.IO;
using Microsoft.AspNetCore.Hosting;
namespace HospitalQR.Web.Controllers
{
public class FormController : Controller
{
private readonly IHostingEnvironment _hostingEnvironment;
public FormController(IHostingEnvironment hostingEnvironment)
{
_hostingEnvironment = hostingEnvironment;
}
public IActionResult Index()
{
return View("PreForm");
}
public IActionResult PdfConverter()
{
HtmlToPdfConverter converter = new HtmlToPdfConverter();
WebKitConverterSettings settings = new WebKitConverterSettings();
settings.WebKitPath = Path.Combine(_hostingEnvironment.ContentRootPath, "QtBinariesWindows");
converter.ConverterSettings = settings;
PdfDocument document = converter.Convert("https://localhost:44334/form");
MemoryStream ms = new MemoryStream();
document.Save(ms);
document.Close(true);
ms.Position = 0;
FileStreamResult fileStreamResult = new FileStreamResult(ms, "application/pdf");
fileStreamResult.FileDownloadName = "PreForm.pdf";
return fileStreamResult;
}
}
}
By default, the browser saves the downloaded file to its own download location. A download path cannot be set from an ASP.NET Core web application; if you want the PDF saved to a specific path on the client, change the browser's download location setting to that path.
However, if you want to save the PDF document in your “wwwroot” location on the server, please refer to the code snippet below:
//Initialize HTML to PDF converter with WebKit rendering engine
HtmlToPdfConverter htmlConverter = new HtmlToPdfConverter(HtmlRenderingEngine.WebKit);
WebKitConverterSettings settings = new WebKitConverterSettings();
//Set the QtBinaries folder path
settings.WebKitPath= Path.Combine(_hostingEnvironment.ContentRootPath, "QtBinariesWindows");
//Assign WebKit settings to HTML converter
htmlConverter.ConverterSettings = settings;
//Convert URL to PDF
PdfDocument document = htmlConverter.Convert("https://www.google.com");
string path = Path.Combine(_hostingEnvironment.ContentRootPath, "wwwroot", "output.pdf");
//Dispose the FileStream so the file is flushed and released
using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write))
{
document.Save(file);
}
document.Close(true);
Please try the above solution on your end and let us know if it suits your requirement.
Note: I work for Syncfusion.

HTML to PDF using iTextSharp Library In ASP.NET

I have a scenario where I want to convert an HTML template to PDF using iTextSharp. The HTML template is located at:
Server.MapPath("~/Template/CertificateMailTemplate.html")
This is the code I have tried:
public string SendCertificate()
{
try
{
byte[] outputstream = null;
using (var stream = new MemoryStream())
{
using (var document = new Document())
{
using (var writer = PdfWriter.GetInstance(document, stream))
{
document.Open();
using (var html = new StringReader(Server.MapPath("~/Template/CertificateMailTemplate.html")))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, html);
}
}
}
outputstream = stream.ToArray();
}
//Mail sending code
return "success";
}
catch (Exception ex)
{
return ex.Message;
}
}
Here I am getting the following error: "The document has no pages."
Any help will be highly appreciated.
The ParseXHtml method takes a TextReader as the third parameter and Microsoft provides two main implementations of that class, the StringReader and the StreamReader.
The StringReader (which you are using) is for when you want to take an existing .Net string and read it as a sequence of characters.
The StreamReader (which you should switch to) is for when you want to take a byte source (such as a file) and read that as a sequence of characters.
You are passing a string such as "c:\blah\blah\blah.html" and iTextSharp is interpreting that as your HTML and not as the path to your HTML.
If you want .Net to take a wild guess at the encoding of the file you can swap your code one-for-one using the constructor StreamReader(String). If you know the specific encoding of the file you can use one of the overloads such as StreamReader(String, Encoding).
So, this line:
//Incorrect
using (var html = new StringReader(Server.MapPath("~/Template/CertificateMailTemplate.html")))
Should instead be:
//Correct
using (var html = new StreamReader(Server.MapPath("~/Template/CertificateMailTemplate.html")))
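The same distinction exists in other runtimes. As an illustration only, here is a Java analogue, with Java's StringReader and Files.newBufferedReader standing in for .NET's StringReader and StreamReader. The class name, helper names and temporary file are made up for the demo. Reading a path through a string reader yields the characters of the path itself, while a file-backed reader yields the file's content.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReaderDemo {
    /** Drains a Reader into a String and closes it. */
    static String readAll(Reader reader) {
        try (Reader r = reader) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Treats the argument as the text itself (what the question's code did). */
    public static String viaStringReader(String path) {
        return readAll(new StringReader(path));
    }

    /** Treats the argument as a file path and decodes the file as UTF-8. */
    public static String viaFileReader(String path) {
        try {
            return readAll(Files.newBufferedReader(Path.of(path), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path template = Files.createTempFile("template", ".html");
        Files.writeString(template, "<p>Hello</p>");
        System.out.println(viaStringReader(template.toString())); // the path text
        System.out.println(viaFileReader(template.toString()));   // the file content
        Files.delete(template);
    }
}
```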

Convert raw HTML codes to PDF File

I want to convert the raw html code to pdf file.
This is my Controller code
@RequestMapping("getpdf")
public void doGet(HttpServletRequest request,
HttpServletResponse response,String ref){
OutputStream out = null;
Document document = new Document(PageSize.A4, 50, 50, 50, 50);
java.util.List items = null;
ArticalBean abean=serviceLayer.getArtical(Integer.parseInt(ref));
items = new ArrayList();
items.add(abean.getArticle());
try {
response.setContentType("application/pdf");
PdfWriter.getInstance(document, response.getOutputStream());
document.open();
Paragraph paragraph = new Paragraph("Microweb Systems");
document.add(paragraph);
ListItem listItem;
com.lowagie.text.List list = new com.lowagie.text.List(true, 15);
Iterator i = items.iterator();
while(i.hasNext()) {
listItem = new ListItem((String)i.next(),
FontFactory.getFont(FontFactory.TIMES_ROMAN, 12));
list.add(listItem);
}
document.add(list);
} catch (Exception e) {
// Do not swallow failures silently
e.printStackTrace();
} finally {
document.close();
}
}
It converts the HTML code to PDF, but the PDF also contains the tags,
like
<h1>Hello World</h1>
Is there any way to remove these tags and show only the data?
I am providing the data from the database via a DTO.
If I understand your question, you want to remove the tags.
This can be done with String.replaceAll(String regex, String replacement).
For example, myString.replaceAll("<[^>]*>", ""); would remove any tags.
This, however, does not make the pdf look like the page does in a browser.
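A minimal sketch of the regex approach, assuming the markup is simple enough that a pattern matching "<", any run of characters other than ">", then ">" covers every tag. The class name is hypothetical, and entities such as &amp; are left untouched.

```java
public class TagStripper {
    /** Removes anything that looks like an HTML tag and trims the result. */
    public static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<h1>Hello World</h1>"));
    }
}
```

In the controller from the question, the stripped string could be passed to the ListItem constructor in place of the raw value taken from the iterator.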

How to avoid java.net.UnknownHostException while parsing HTML content to generate Pdf file using iText

I want to convert some HTML content into a PDF file. The problem I'm facing is that the HTML content has some <img> tags with absolute image urls. Hence the
HTMLWorker.parse()
method throws following exception in case there is no network connectivity.
ExceptionConverter: java.net.UnknownHostException: xyz.com
Is there a way to avoid this exception in such case and generate a pdf without any image?
I'm using iText-5.0.5 library.
You should implement your own ImageProvider and, when there is a problem retrieving the image, just return null, like this:
public static class MyImageProvider implements ImageProvider {
public Image getImage(String src, Map<String, String> h, ChainedProperties cprops, DocListener doc) {
try {
return Image.getInstance(IMAGE_URL); //create IMAGE_URL from src parameter
} catch (Exception e) {
//getInstance can also throw BadElementException, so catch broadly and skip the image
return null;
}
}
}
Then you should use the HTMLWorker with this provider
HashMap<String,Object> map = new HashMap<String, Object>();
map.put(HTMLWorker.IMG_PROVIDER, new MyImageProvider());
HTMLWorker.parseToList(new FileReader(HTML), null, map);
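The "return null on failure" idea does not depend on iText. The following stand-alone sketch uses only the JDK and hypothetical names (SafeImageFetcher, fetchOrNull). It returns the raw image bytes, or null when the source cannot be resolved or read, which is the point where an ImageProvider would give up on the image.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;

public class SafeImageFetcher {
    /** Returns the bytes behind src, or null if the image cannot be fetched. */
    public static byte[] fetchOrNull(String src) {
        try {
            URL url = URI.create(src).toURL();
            try (InputStream in = url.openStream()) {
                return in.readAllBytes();
            }
        } catch (IOException | IllegalArgumentException e) {
            // UnknownHostException and MalformedURLException are both IOExceptions;
            // IllegalArgumentException covers malformed or relative URIs.
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] data = fetchOrNull("nosuchscheme://example/img.png");
        System.out.println(data == null ? "image skipped" : data.length + " bytes");
    }
}
```

Catching the failure at this level keeps the rest of the conversion running, so the PDF is produced without the image instead of aborting.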

FlyingSaucer: convert an HTML document to PDF ignoring external CSS?

I'm using the following to convert HTML to PDF:
InputStream convert(InputStream fileInputStream) {
PipedInputStream inputStream = new PipedInputStream()
PipedOutputStream outputStream = new PipedOutputStream(inputStream)
new Thread({
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(fileInputStream)
ITextRenderer renderer = new ITextRenderer()
renderer.setDocument(document, "")
renderer.layout()
renderer.createPDF(outputStream)
}).start()
return inputStream
}
From the documentation, apparently I should be able to set a "User Agent" resolver somewhere, but I'm not sure where, exactly. Anyone know how to ignore external CSS in a document?
Not the same question but my answer for that one will work here too: Resolving protected resources with Flying Saucer (ITextRenderer)
Override this method:
public CSSResource getCSSResource(String uri) {
return new CSSResource(resolveAndOpenStream(uri));
}
with
public CSSResource getCSSResource(String uri) {
return new CSSResource(new ByteArrayInputStream([] as byte[]));
}