FlyingSaucer: convert an HTML document to PDF ignoring external CSS?

I'm using the following to convert HTML to PDF:
InputStream convert(InputStream fileInputStream) {
PipedInputStream inputStream = new PipedInputStream()
PipedOutputStream outputStream = new PipedOutputStream(inputStream)
new Thread({
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance()
factory.setNamespaceAware(false)
DocumentBuilder builder = factory.newDocumentBuilder()
Document document = builder.parse(fileInputStream)
ITextRenderer renderer = new ITextRenderer()
renderer.setDocument(document, "")
renderer.layout()
renderer.createPDF(outputStream)
} finally {
// close the pipe so the reading side sees end-of-stream even if rendering fails
outputStream.close()
}
}).start()
return inputStream
}
From the documentation, it seems I should be able to set a "User Agent" resolver somewhere, but I'm not sure where, exactly. Does anyone know how to ignore external CSS in a document?

Not the same question, but my answer to that one works here too: Resolving protected resources with Flying Saucer (ITextRenderer)
In your user agent (a subclass of ITextUserAgent that you set on the renderer's shared context), override this method:
public CSSResource getCSSResource(String uri) {
return new CSSResource(resolveAndOpenStream(uri));
}
with one that returns an empty stylesheet for every URI:
public CSSResource getCSSResource(String uri) {
return new CSSResource(new ByteArrayInputStream([] as byte[]));
}
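If installing a custom user agent is inconvenient, a different technique (not the one from the answer above) is to remove the link rel="stylesheet" elements from the parsed DOM before it is handed to renderer.setDocument(...). The sketch below uses only the JDK's DOM API; the class and method names are hypothetical, and it assumes external CSS arrives only through link elements (an @import inside a style block would still go through the user agent).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class StripExternalCss {
    // Parses the document the same way the question does (namespace-unaware DOM).
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(in);
    }

    // Removes every <link> whose rel contains the token "stylesheet"; returns how many were removed.
    static int removeStylesheetLinks(Document document) {
        NodeList links = document.getElementsByTagName("link");
        int removed = 0;
        // The NodeList is live, so walk it backwards while removing nodes.
        for (int i = links.getLength() - 1; i >= 0; i--) {
            Element link = (Element) links.item(i);
            for (String token : link.getAttribute("rel").trim().split("\\s+")) {
                if (token.equalsIgnoreCase("stylesheet")) {
                    link.getParentNode().removeChild(link);
                    removed++;
                    break;
                }
            }
        }
        return removed;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><link rel=\"stylesheet\" href=\"http://example.com/site.css\"/>"
                + "<style>h1 { color: red; }</style></head><body><h1>Hi</h1></body></html>";
        Document doc = parse(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
        System.out.println("removed " + removeStylesheetLinks(doc)); // removed 1
        System.out.println(doc.getElementsByTagName("link").getLength()); // 0
    }
}
```

The cleaned Document can then be passed to renderer.setDocument(document, "") exactly as in the question; inline style blocks are left untouched.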

Related

Render images in iTextSharp HTML5 to PDF

Currently I have an ASP.NET Core MVC 2.0.0 application that targets .NET Framework 4.7. Inside it I'm using iTextSharp to convert HTML to PDF. If I add an image to the HTML, I get the following exception: "The page 1 was requested but the document has only 0 pages".
I have tried using both a full URL and base64-encoded content:
<img src=\"http://localhost:4808/images/sig1.png\">
<img src=\"data:application/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAIAAAD8GO2jAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAAEnQAABJ0Ad5mH3gAAADsSURBVEhLtc2NbcMgAERhz9JtImXK1gtkq46RU58K9CX+KWDpswLhdLd83dZi+fyerx24ZEMD4cQgtcOhEfnUjj+hEfyoHTU0opzUjvLar72oHW2gh+5qhzL/4/v0Dd9/qB3KnOX7L7VDmVN8b6gdyuDjcZf6Wk/vqB08qXHLwUCoHWrZcTwQaoeKthwPkFM7SsuOvQFF1Q5lXm0OKAe1QxlZ6mm3ulA7lGnVgfPUDmWKnoFQO5RB50CoHcpE/0CoHcoMDYTa0QZGB0LtKK8TBkLt4GnOQKgd+X/aQKgdMwdC7TF5IC4fiDpwW590JX1NuZQyGwAAAABJRU5ErkJggg==\" alt=\"test.png\">
Here is the helper method that does the transform:
public static Stream GeneratePDF(string html, string css = null)
{
MemoryStream ms = new MemoryStream();
//HtmlRenderer.PdfSharp implementation
//PdfSharp.Pdf.PdfDocument pdf =
// TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf(html, PdfSharp.PageSize.Letter, 40);
//pdf.Save(ms, false);
try
{
//iTextSharp implementation
//Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
using (Document doc = new Document(PageSize.LETTER))
{
//Create a writer that's bound to our PDF abstraction and our stream
using (PdfWriter writer = PdfWriter.GetInstance(doc, ms))
{
writer.CloseStream = false;
if (string.IsNullOrEmpty(css))
{
//XMLWorker also reads from a TextReader and not directly from a string
using (StringReader srHtml = new StringReader(html))
{
doc.Open();
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
doc.Close();
}
}
else
{
using (MemoryStream msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(css)))
using (MemoryStream msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
doc.Open();
iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
doc.Close();
}
}
}
}
}
catch (Exception)
{
ms.Dispose();
//Rethrow without resetting the stack trace ("throw ex;" would reset it)
throw;
}
//Rewind the stream so the caller reads the PDF from the beginning
ms.Position = 0;
return ms;
}
Here I'm calling my helper method to generate a static PDF document:
public IActionResult TestGenerateStaticPdf()
{
//Our sample HTML and CSS
string example_html = "<!doctype html><head></head><body><h1>Test Report</h1><p>Printed: 2017-09-29</p><table><tbody><tr><th>User Details</th><th>Date</th><th>Image</th></tr><tr><td>John Doe</td><td>2017-09-29</td><td><img src=\"http://localhost:4808/images/sig1.png\"></td></tr></tbody></table></body>";
string example_css = "h1 {color:red;} img {max-height:180px;width:100%;page-break-inside:avoid;} table {border-collapse:collapse;width:100%;} table, th, td {border:1px solid black;padding:5px;page-break-inside:avoid;}";
System.IO.Stream stream = ControllerHelper.GeneratePDF(example_html, example_css);
return File(stream, "application/pdf", "Static Pdf.pdf");
}
It turns out that iTextSharp does not honor the doctype and instead forces XHTML. Under XHTML rules the image tag must be closed: if you write it as a self-closing tag (<img src="..." />), the PDF is generated without the exception. However, I was not able to get the base64-encoded content to render. There is no error in this case, but the image does not show up.
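The closing-tag rule can be seen with any strict XML parser. This sketch uses the JDK's built-in DOM parser as a stand-in (it is not iTextSharp, and XMLWorker's own parser may be more lenient): the unclosed img tag from the question is rejected, the self-closed one is accepted. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class XhtmlCheck {
    // Returns true if the markup is well-formed XML, which an XHTML parser requires.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints "[Fatal Error]" to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e); // parser configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<td><img src=\"sig1.png\"></td>"));   // false
        System.out.println(isWellFormed("<td><img src=\"sig1.png\" /></td>")); // true
    }
}
```

Separately, the data URI in the question declares application/png; the registered media type for PNG is image/png, which may be worth trying when the embedded image does not appear.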

HTML to PDF using iTextSharp Library In ASP.NET

I have a scenario where I want to convert an HTML template to PDF using iTextSharp. The HTML template is located at:
Server.MapPath("~/Template/CertificateMailTemplate.html")
This is the code I have tried:
public string SendCertificate()
{
try
{
byte[] outputstream = null;
using (var stream = new MemoryStream())
{
using (var document = new Document())
{
using (var writer = PdfWriter.GetInstance(document, stream))
{
document.Open();
using (var html = new StringReader(Server.MapPath("~/Template/CertificateMailTemplate.html")))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, html);
}
}
}
outputstream = stream.ToArray();
}
//Mail sending code
return "success";
}
catch (Exception ex)
{
return ex.Message;
}
}
Here I get the following error: The document has no pages.
Any help will be highly appreciated.
The ParseXHtml method takes a TextReader as its third parameter, and Microsoft provides two main implementations of that class: StringReader and StreamReader.
StringReader (which you are using) is for when you want to take an existing .NET string and read it as a sequence of characters.
StreamReader (which you should switch to) is for when you want to take a byte source (such as a file) and read it as a sequence of characters.
You are passing a string such as "c:\blah\blah\blah.html", and iTextSharp interprets that string as your HTML rather than as the path to your HTML.
If you want .NET to guess the encoding of the file (it honors a byte order mark and otherwise assumes UTF-8), you can swap your code one-for-one using the StreamReader(String) constructor. If you know the file's encoding, use one of the overloads such as StreamReader(String, Encoding).
So, this line:
//Incorrect
using (var html = new StringReader(Server.MapPath("~/Template/CertificateMailTemplate.html")))
Should instead be:
//Correct
using (var html = new StreamReader(Server.MapPath("~/Template/CertificateMailTemplate.html")))
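The same mistake is easy to reproduce in Java, whose java.io.StringReader behaves like the .NET one. In this sketch (Java counterparts, not the .NET classes; names are hypothetical), wrapping the path in a StringReader yields the path text itself, while opening the file yields its contents.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReaderDemo {
    // Drains any Reader into a String and closes it.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader r = reader) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Analogous to new StringReader(path): the path string itself becomes the "document".
    static String readAsString(Path path) {
        return readAll(new StringReader(path.toString()));
    }

    // Analogous to new StreamReader(path, encoding): the file's contents are read.
    static String readFileContents(Path path) {
        try {
            return readAll(Files.newBufferedReader(path, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("CertificateMailTemplate", ".html");
        Files.write(file, "<p>Certificate</p>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAsString(file));     // prints the temp file's path, not HTML
        System.out.println(readFileContents(file)); // <p>Certificate</p>
        Files.delete(file);
    }
}
```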

How to avoid java.net.UnknownHostException while parsing HTML content to generate Pdf file using iText

I want to convert some HTML content into a PDF file. The problem is that the HTML content has some <img> tags with absolute image URLs, so the
HTMLWorker.parse()
method throws the following exception when there is no network connectivity:
ExceptionConverter: java.net.UnknownHostException: xyz.com
Is there a way to avoid this exception in such cases and generate a PDF without the images?
I'm using iText-5.0.5 library.
Implement your own ImageProvider and, when there is a problem retrieving the image, just return null, like this:
public static class MyImageProvider implements ImageProvider {
public Image getImage(String src, Map<String, String> h, ChainedProperties cprops, DocListener doc) {
try {
// build the image from src (resolve it against your base URL first if it is relative)
return Image.getInstance(src);
} catch (IOException | DocumentException e) {
// unreachable host, missing file or unsupported image: skip the image
return null;
}
}
}
Then use HTMLWorker with this provider:
HashMap<String,Object> map = new HashMap<String, Object>();
map.put(HTMLWorker.IMG_PROVIDER, new MyImageProvider());
HTMLWorker.parseToList(new FileReader(HTML), null, map);
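The core of this answer is "fetch, and on any I/O failure return null so the caller skips the image". Here is that idea sketched with plain JDK classes instead of iText's ImageProvider (the class and method names are hypothetical); UnknownHostException is an IOException, so an offline lookup lands in the catch block.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

class SafeImageLoader {
    // Returns the raw bytes at src, or null if they cannot be fetched
    // (unknown host, missing file, malformed or relative URI), so the caller can skip the image.
    static byte[] tryLoad(String src) {
        try (InputStream in = URI.create(src).toURL().openStream()) {
            return in.readAllBytes();
        } catch (IOException | IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path missing = Path.of(System.getProperty("java.io.tmpdir"), "no-such-image-1234.png");
        byte[] img = tryLoad(missing.toUri().toString());
        System.out.println(img == null ? "image skipped" : img.length + " bytes");

        Path present = Files.createTempFile("logo", ".png");
        Files.write(present, new byte[] {1, 2, 3});
        System.out.println(tryLoad(present.toUri().toString()).length + " bytes"); // 3 bytes
        Files.delete(present);
    }
}
```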

Return Pdf document from Asp.net MVC controller

What is the best way to generate an HTML page from data in a view? I have an HTML template with all the tables, etc. I don't want to use any templating such as JqueryTemplate.
Try this approach using HiQPdf, a commercial HTML to PDF converter:
public class HomeController : Controller
{
public ActionResult Index()
{
ViewBag.Message = "Welcome to ASP.NET MVC!";
Session["MySessionVariable"] = "My Session Variable Value assigned in Index";
return View();
}
public ActionResult About()
{
return View();
}
public string RenderViewAsString(string viewName, object model)
{
// create a string writer to receive the HTML code
StringWriter stringWriter = new StringWriter();
// get the view to render
ViewEngineResult viewResult = ViewEngines.Engines.FindView(ControllerContext, viewName, null);
// create a context to render a view based on a model
ViewContext viewContext = new ViewContext(
ControllerContext,
viewResult.View,
new ViewDataDictionary(model),
new TempDataDictionary(),
stringWriter
);
// render the view to a HTML code
viewResult.View.Render(viewContext, stringWriter);
// return the HTML code
return stringWriter.ToString();
}
[HttpPost]
public ActionResult ConvertThisPageToPdf()
{
// get the HTML code of this view
string htmlToConvert = RenderViewAsString("Index", null);
// the base URL to resolve relative images and css
String thisPageUrl = this.ControllerContext.HttpContext.Request.Url.AbsoluteUri;
String baseUrl = thisPageUrl.Substring(0, thisPageUrl.Length - "Home/ConvertThisPageToPdf".Length);
// instantiate the HiQPdf HTML to PDF converter
HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
// hide the button in the created PDF
htmlToPdfConverter.HiddenHtmlElements = new string[] { "#convertThisPageButtonDiv" };
// render the HTML code as PDF in memory
byte[] pdfBuffer = htmlToPdfConverter.ConvertHtmlToMemory(htmlToConvert, baseUrl);
// send the PDF file to browser
FileResult fileResult = new FileContentResult(pdfBuffer, "application/pdf");
fileResult.FileDownloadName = "ThisMvcViewToPdf.pdf";
return fileResult;
}
[HttpPost]
public ActionResult ConvertAboutPageToPdf()
{
// get the About view HTML code
string htmlToConvert = RenderViewAsString("About", null);
// the base URL to resolve relative images and css
String thisPageUrl = this.ControllerContext.HttpContext.Request.Url.AbsoluteUri;
String baseUrl = thisPageUrl.Substring(0, thisPageUrl.Length - "Home/ConvertAboutPageToPdf".Length);
// instantiate the HiQPdf HTML to PDF converter
HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
// render the HTML code as PDF in memory
byte[] pdfBuffer = htmlToPdfConverter.ConvertHtmlToMemory(htmlToConvert, baseUrl);
// send the PDF file to browser
FileResult fileResult = new FileContentResult(pdfBuffer, "application/pdf");
fileResult.FileDownloadName = "AboutMvcViewToPdf.pdf";
return fileResult;
}
}
Source of this sample code: How to convert HTML to PDF using HiQPDF
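Both actions compute the base URL (used to resolve relative images and CSS) by chopping the "Home/Action" suffix off the request URL. A language-neutral way to see what that does, sketched in Java with hypothetical names: the first method mirrors the C# substring approach, the second resolves "../" against the request URL, which walks up the same two segments but also tolerates a query string. Both assume the URL has no trailing slash.

```java
import java.net.URI;

class BaseUrl {
    // Same idea as the C# code: drop the known "Controller/Action" suffix.
    static String byStrippingSuffix(String pageUrl, String actionPath) {
        if (!pageUrl.endsWith(actionPath)) {
            throw new IllegalArgumentException(pageUrl + " does not end with " + actionPath);
        }
        return pageUrl.substring(0, pageUrl.length() - actionPath.length());
    }

    // Alternative: resolve "../" against ".../Controller/Action"; this removes the action,
    // then the controller segment, and drops any query string on the request URL.
    static String byResolving(String pageUrl) {
        return URI.create(pageUrl).resolve("../").toString();
    }

    public static void main(String[] args) {
        String url = "http://localhost:4808/Home/ConvertThisPageToPdf";
        System.out.println(byStrippingSuffix(url, "Home/ConvertThisPageToPdf")); // http://localhost:4808/
        System.out.println(byResolving(url + "?id=7"));                           // http://localhost:4808/
    }
}
```

With the substring version, a request URL carrying a query string would no longer end with the action path, which is why the sketch throws in that case instead of returning a wrong prefix.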
Just create the PDF server-side and return a File instead of an HTML view.
I don't know which PDF provider you use, but here is a solution for iTextSharp:
How to return PDF to browser in MVC?

How to serve html file from another directory as ActionResult

I have a specialised case where I wish to serve a plain HTML file from a controller action.
I want to serve it from a folder other than the Views folder. The file is located in
Solution\Html\index.htm
and I want to serve it from a standard controller action. Could I use return File? If so,
how do I do this?
Check this out:
public ActionResult Index()
{
return new FilePathResult("~/Html/index.htm", "text/html");
}
If you want to render this index.htm file in the browser, you could create a controller action like this:
public void GetHtml()
{
var encoding = new System.Text.UTF8Encoding();
var htm = System.IO.File.ReadAllText(Server.MapPath("/Solution/Html/") + "index.htm", encoding);
byte[] data = encoding.GetBytes(htm);
Response.OutputStream.Write(data, 0, data.Length);
Response.OutputStream.Flush();
}
or simply:
public ActionResult GetHtml()
{
return File(Server.MapPath("/Solution/Html/") + "index.htm", "text/html");
}
So let's say this action is in the Home controller: when a user hits http://yoursite.com/Home/GetHtml, index.htm will be rendered.
EDIT: two other methods
If you want to see the raw HTML of index.htm in the browser:
public ActionResult GetHtml()
{
Response.AddHeader("Content-Disposition", new System.Net.Mime.ContentDisposition { Inline = true, FileName = "index.htm"}.ToString());
return File(Server.MapPath("/Solution/Html/") + "index.htm", "text/plain");
}
If you just want to download the file:
public FilePathResult GetHtml()
{
return File(Server.MapPath("/Solution/Html/") + "index.htm", "text/html", "index.htm");
}
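The three variants above differ only in the Content-Type and Content-Disposition headers the response carries. This Java sketch (independent of ASP.NET; the names and the UTF-8 charset are assumptions) spells out the header values a server would send for each behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HtmlResponseHeaders {
    enum Mode { RENDER, SHOW_SOURCE, DOWNLOAD }

    // Builds the Content-Type / Content-Disposition pair for each way of serving the file.
    // The UTF-8 charset is an assumption about how the file is encoded.
    static Map<String, String> headersFor(Mode mode, String fileName) {
        Map<String, String> headers = new LinkedHashMap<>();
        switch (mode) {
            case RENDER:
                // The browser renders the markup as a page.
                headers.put("Content-Type", "text/html; charset=utf-8");
                break;
            case SHOW_SOURCE:
                // text/plain makes the browser display the markup instead of rendering it.
                headers.put("Content-Type", "text/plain; charset=utf-8");
                headers.put("Content-Disposition", "inline; filename=\"" + fileName + "\"");
                break;
            case DOWNLOAD:
                // "attachment" asks the browser to save the file rather than show it.
                headers.put("Content-Type", "text/html; charset=utf-8");
                headers.put("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
                break;
        }
        return headers;
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + headersFor(mode, "index.htm"));
        }
    }
}
```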
I extended wahid's answer to create an HtmlResult.
Create an HtmlResult class that extends FilePathResult:
public class HtmlResult : FilePathResult
{
public HtmlResult(string path)
: base(path, "text/html")
{
}
}
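Create an extension method on Controller (extension methods must be declared in a static class; the class name is arbitrary):
public static class ControllerExtensions
{
public static HtmlResult Html(this Controller controller, string path)
{
return new HtmlResult(path);
}
}
Use it the same way you would return a view: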
public HtmlResult Index()
{
return this.Html("~/Index.html");
}
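The pattern here is a result subclass that pins the content type, plus a helper so actions read like return View(). A rough Java analogue, with hypothetical class names that are not ASP.NET types; since Java has no extension methods, a static factory plays the role of this.Html(...).

```java
class Results {
    // Stand-in for the C# extension method: a static factory returning the specialised result.
    static HtmlResult html(String path) {
        return new HtmlResult(path);
    }

    public static void main(String[] args) {
        HtmlResult result = html("~/Index.html");
        System.out.println(result.contentType + " " + result.path); // text/html ~/Index.html
    }
}

// Generic "send this file with this content type" result.
class FileResult {
    final String path;
    final String contentType;

    FileResult(String path, String contentType) {
        this.path = path;
        this.contentType = contentType;
    }
}

// Specialisation that fixes the content type, like HtmlResult : FilePathResult above.
class HtmlResult extends FileResult {
    HtmlResult(String path) {
        super(path, "text/html");
    }
}
```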
Hope it helps
I want to put in my two cents. I have found this the most terse, and it is already built in:
public ActionResult Index()
{
var html = ""; //get it from a file, from a blob or wherever
return this.Content(html, "text/html; charset=utf-8");
}
You can read the HTML file into a string and return it from the action. It is rendered as an HTML page, as shown below:
public string GetHtmlFile(string file)
{
file = Server.MapPath("~/" + file);
//using disposes the reader even if ReadToEnd throws
using (StreamReader streamReader = new StreamReader(file))
{
return streamReader.ReadToEnd();
}
}
Call it as: Home/GetHtmlFile?file=Solution\Html\index.htm
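Because GetHtmlFile passes the query-string value straight to Server.MapPath, a caller could ask for any file under the site root (web.config, for example). A common guard is to resolve the requested path inside one fixed folder and allow only HTML files. The sketch below shows that check in Java; the folder and the .htm/.html allow-list are assumptions, and the method name is hypothetical.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

class SafeHtmlPath {
    // Resolves a user-supplied relative path against baseDir. Returns null if the result
    // escapes baseDir (e.g. "../secret.html"), is not an .htm/.html file, or is invalid.
    static Path resolveInside(Path baseDir, String requested) {
        try {
            Path base = baseDir.toAbsolutePath().normalize();
            Path candidate = base.resolve(requested).normalize();
            Path fileName = candidate.getFileName();
            String name = fileName == null ? "" : fileName.toString().toLowerCase(Locale.ROOT);
            boolean isHtml = name.endsWith(".htm") || name.endsWith(".html");
            return candidate.startsWith(base) && isHtml ? candidate : null;
        } catch (InvalidPathException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Path base = Paths.get("site");
        System.out.println(resolveInside(base, "Html/index.htm") != null); // true
        System.out.println(resolveInside(base, "../secret.html"));         // null
        System.out.println(resolveInside(base, "web.config"));             // null
    }
}
```

Path.startsWith compares whole path elements, so a sibling folder such as "site-old" is not mistaken for being inside "site".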
If the location or storage mechanism of the HTML files is complicated, you can use a virtual path provider:
Virtual path provider MVC sample
An alternative approach, if you are using .NET Core, is to use a file provider.
The files can be in a folder or embedded at compile time.
In this example we will use embedded files.
Add a folder to your project, say assets; in it, create a file myfile.html and add some basic HTML to it, for example:
<html>
<head>
<title>Test</title>
</head>
<body>
Hello World
</body>
</html>
Right-click the new file (assuming you are using Visual Studio), select Properties, and under Build Action select Embedded Resource. This adds the file to the .csproj file.
Right-click your project and edit your .csproj file.
Check that your property group contains the following (the manifest requires the Microsoft.Extensions.FileProviders.Embedded NuGet package):
<GenerateEmbeddedFilesManifest>true</GenerateEmbeddedFilesManifest>
If not, add it. The .csproj should also contain the newly created HTML file:
<ItemGroup>
<EmbeddedResource Include="assets\myfile.html" />
</ItemGroup>
Reading the file in your controller and passing it to the client requires a file provider, which is registered in Startup.cs.
Edit Startup.cs and make sure it has access to the hosting environment:
private readonly IHostingEnvironment HostingEnvironment;
public Startup(IHostingEnvironment hostingEnvironment)
{
HostingEnvironment = hostingEnvironment;
}
Then, in ConfigureServices, create a file provider and register it as a service that can be injected at runtime:
var physicalProvider = HostingEnvironment.ContentRootFileProvider;
var manifestEmbeddedProvider =
new ManifestEmbeddedFileProvider(Assembly.GetEntryAssembly());
var compositeProvider =
new CompositeFileProvider(physicalProvider, manifestEmbeddedProvider);
services.AddSingleton<IFileProvider>(compositeProvider);
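The composite provider asks each inner provider in turn and uses the first one that has the file, so with this ordering a physical file would take precedence over an embedded one with the same path. This Java sketch models only that lookup rule; FileSource, composite and fromMap are hypothetical names, and in-memory maps stand in for the physical and embedded providers.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

class CompositeLookup {
    // Minimal stand-in for a file provider: returns the file's contents if it has the file.
    interface FileSource {
        Optional<String> read(String path);
    }

    // Asks each source in order and returns the first hit.
    static FileSource composite(List<FileSource> sources) {
        return path -> {
            for (FileSource source : sources) {
                Optional<String> hit = source.read(path);
                if (hit.isPresent()) {
                    return hit;
                }
            }
            return Optional.empty();
        };
    }

    // Hypothetical in-memory source standing in for physical or embedded files.
    static FileSource fromMap(Map<String, String> files) {
        return path -> Optional.ofNullable(files.get(path));
    }

    public static void main(String[] args) {
        FileSource physical = fromMap(Map.of("wwwroot/site.css", "body {}"));
        FileSource embedded = fromMap(Map.of("assets/myfile.html", "<html>Hello World</html>"));
        FileSource all = composite(List.of(physical, embedded));
        System.out.println(all.read("assets/myfile.html").orElse("not found")); // <html>Hello World</html>
        System.out.println(all.read("missing.html").orElse("not found"));       // not found
    }
}
```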
To serve the file, use dependency injection to get the file provider in your controller, then serve the file from an action. Start by adding the provider to your constructor:
private readonly IFileProvider _fileProvider;
public MyController(IFileProvider fileProvider)
{
this._fileProvider = fileProvider;
}
Then use the file provider in your action:
[HttpGet("/myfile")]
[Produces("text/html")]
public Stream GetMyFile()
{
// Use GetFileInfo to get details on the file passing in the path added to the csproj
// Using the fileInfo returned create a stream and return it.
IFileInfo fileinfo = _fileProvider.GetFileInfo("assets/myfile.html");
return fileinfo.CreateReadStream();
}
For more info see ASP .Net Core file provider sample and the Microsoft documentation here.