Convert raw HTML codes to PDF File

I want to convert raw HTML code to a PDF file.
This is my controller code:
@RequestMapping("getpdf")
public void doGet(HttpServletRequest request,
        HttpServletResponse response, String ref) {
    OutputStream out = null;
    Document document = new Document(PageSize.A4, 50, 50, 50, 50);
    java.util.List items = null;
    ArticalBean abean = serviceLayer.getArtical(Integer.parseInt(ref));
    items = new ArrayList();
    items.add(abean.getArticle());
    try {
        response.setContentType("application/pdf");
        PdfWriter.getInstance(document, response.getOutputStream());
        document.open();
        Paragraph paragraph = new Paragraph("Microweb Systems");
        document.add(paragraph);
        ListItem listItem;
        com.lowagie.text.List list = new com.lowagie.text.List(true, 15);
        Iterator i = items.iterator();
        while (i.hasNext()) {
            listItem = new ListItem((String) i.next(),
                    FontFactory.getFont(FontFactory.TIMES_ROMAN, 12));
            list.add(listItem);
        }
        document.add(list);
    } catch (Exception e) {
    } finally {
        // The document is closed exactly once, here.
        document.close();
    }
}
It converts the HTML to PDF, but the PDF also contains the tags, like
<h1>Hello World</h1>
Is there any way to remove these tags and show only the data?
I am providing the data from the database via a DTO.

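If I understand your question, you want to remove the tags.
This can be done with String.replaceAll(String regex, String replacement).
For example, myString.replaceAll("<[^>]*>", "") would remove any tags: the pattern matches a "<", any run of characters other than ">", and the closing ">".
Note, however, that this does not make the PDF look like the page does in a browser, because the markup is discarded rather than rendered.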
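
As a minimal sketch of the tag-stripping idea, the snippet below wraps the regex in a small helper. The class and method names (TagStripper, stripTags) are hypothetical and not part of the original code, and the sketch assumes that no attribute value contains a ">" character.

```java
import java.util.regex.Pattern;

/** Minimal sketch: removes anything that looks like an HTML tag from a string. */
final class TagStripper {

    // Matches "<", then any run of characters other than ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns the input with every tag-like substring removed. */
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<h1>Hello World</h1>")); // prints "Hello World"
    }
}
```

In the controller from the question, such a helper would presumably be applied to the value returned by abean.getArticle() before it is wrapped in a ListItem. Character entities such as &amp; are left untouched by this sketch.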

Related

Render images in iTextSharp HTML5 to PDF

Currently I have an MVC Core v2.0.0 application that uses the .NET 4.7 framework. Inside of this I'm using iTextSharp to try and convert HTML to PDF. If I add an image to the html I get the following exception "The page 1 was requested but the document has only 0 pages".
I have tried using both a full URL and a base64 encoded content.
<img src=\"http://localhost:4808/images/sig1.png\">
<img src=\"data:application/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAIAAAD8GO2jAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAAEnQAABJ0Ad5mH3gAAADsSURBVEhLtc2NbcMgAERhz9JtImXK1gtkq46RU58K9CX+KWDpswLhdLd83dZi+fyerx24ZEMD4cQgtcOhEfnUjj+hEfyoHTU0opzUjvLar72oHW2gh+5qhzL/4/v0Dd9/qB3KnOX7L7VDmVN8b6gdyuDjcZf6Wk/vqB08qXHLwUCoHWrZcTwQaoeKthwPkFM7SsuOvQFF1Q5lXm0OKAe1QxlZ6mm3ulA7lGnVgfPUDmWKnoFQO5RB50CoHcpE/0CoHcoMDYTa0QZGB0LtKK8TBkLt4GnOQKgd+X/aQKgdMwdC7TF5IC4fiDpwW590JX1NuZQyGwAAAABJRU5ErkJggg==\" alt=\"test.png\">
Here is the helper method that does the transform
public static Stream GeneratePDF(string html, string css = null)
{
    MemoryStream ms = new MemoryStream();
    //HtmlRenderer.PdfSharp implementation
    //PdfSharp.Pdf.PdfDocument pdf =
    //    TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf(html, PdfSharp.PageSize.Letter, 40);
    //pdf.Save(ms, false);
    try
    {
        //iTextSharp implementation
        //Create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
        using (Document doc = new Document(PageSize.LETTER))
        {
            //Create a writer that's bound to our PDF abstraction and our stream
            using (PdfWriter writer = PdfWriter.GetInstance(doc, ms))
            {
                writer.CloseStream = false;
                if (string.IsNullOrEmpty(css))
                {
                    //XMLWorker also reads from a TextReader and not directly from a string
                    using (StringReader srHtml = new StringReader(html))
                    {
                        doc.Open();
                        iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, srHtml);
                        doc.Close();
                    }
                }
                else
                {
                    using (MemoryStream msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(css)))
                    using (MemoryStream msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
                    {
                        doc.Open();
                        iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                        doc.Close();
                    }
                }
            }
        }
    }
    catch (Exception)
    {
        ms.Dispose();
        //Rethrow without resetting the stack trace
        throw;
    }
    //I think this is needed to use the stream to generate a file
    ms.Position = 0;
    return ms;
}
Here I'm calling my helper method to generate a static PDF document.
public async Task<IActionResult> TestGenerateStaticPdf()
{
    //Our sample HTML and CSS
    string example_html = "<!doctype html><head></head><body><h1>Test Report</h1><p>Printed: 2017-09-29</p><table><tbody><tr><th>User Details</th><th>Date</th><th>Image</th></tr><tr><td>John Doe</td><td>2017-09-29</td><td><img src=\"http://localhost:4808/images/sig1.png\"></td></tr></tbody></table></body>";
    string example_css = "h1 {color:red;} img {max-height:180px;width:100%;page-break-inside:avoid;} table {border-collapse:collapse;width:100%;} table, th, td {border:1px solid black;padding:5px;page-break-inside:avoid;}";
    System.IO.Stream stream = ControllerHelper.GeneratePDF(example_html, example_css);
    return File(stream, "application/pdf", "Static Pdf.pdf");
}

It turns out that iTextSharp does not honor the doctype and instead forces XHTML. In XHTML, the image tag needs to be closed. If you use <img src="http://localhost:4808/images/sig1.png" /> it seems to generate without the exception. However, I was not able to get the base64-encoded content to render. There is no error in this case, but the image does not show up.
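
If the HTML comes from a source that emits unclosed img tags, one option is to rewrite them before parsing. The following is only a sketch, written in Java for consistency with the other examples on this page. The names ImgTagCloser and closeImgTags are hypothetical, and the regex assumes that no attribute value contains a ">" character. It is not a general HTML-to-XHTML converter.

```java
import java.util.regex.Pattern;

/** Sketch: rewrites <img ...> and <img .../> into the self-closing form <img ... />. */
final class ImgTagCloser {

    // Group 1 captures the attributes; an optional trailing "/" is consumed
    // so that tags which are already closed do not end up with a second slash.
    private static final Pattern IMG = Pattern.compile("(?i)<img\\b([^>]*?)\\s*/?>");

    /** Returns the input with every img tag written as a self-closing tag. */
    static String closeImgTags(String html) {
        return IMG.matcher(html).replaceAll("<img$1 />");
    }

    public static void main(String[] args) {
        String cell = "<td><img src=\"http://localhost:4808/images/sig1.png\"></td>";
        System.out.println(closeImgTags(cell));
    }
}
```

Running the helper on markup that is already well-formed leaves it unchanged apart from normalising the spacing before "/>".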

ItextPDF Adding Headers and Footers Complex Format

A couple of days ago I asked this question:
Itext PDF How To Add HTML Pre-formatted to PDF, but @bruno-lowagie told me to follow the instructions in this existing thread: How To Add HTML Headers And Footers to a Page. I followed the instructions carefully, but found that the approach only works for simple HTML headers and footers like:
<h1>Header Only Line</h1>
or
<h2>Footer Only Line</h2>
But my use case requires more complex content in the header and footer, such as images. So I tried a header with an img element pointing to an image on the same server, like this:
http://localhost:8080/DocGen/resources/images/main_header.jpg
I added some start and end "marks" to see if they got processed so my header was like this:
<p>----Header Start---</p>
<p><img alt="" src="http://localhost:8080/DocGen/resources/images/main_header.jpg" style="height:126px; width:683px" /></p>
<p>--Header End--</p>
But I'm getting an output pdf like this:
Edited: As you can see, it shows neither the image nor my end mark.
What should I do to successfully add headers and footers with images embedded?
Thanks a lot.
P.S: Sorry for any inconvenience as I am new here, and I hope my question is clear.
EDIT: Code, it's like in the other thread:
public class HtmlHeaderFooter {
private String DEST = null;//"results/events/html_header_footer.pdf";
private String HEADER = null;
private String FOOTER = null;
private float leftMargin;
private float rightMargin;
private float topMargin;
private float bottomMargin;
private Rectangle pageSize = null;
public class HeaderFooter extends PdfPageEventHelper {
protected ElementList header;
protected ElementList footer;
public HeaderFooter() throws IOException {
header = XMLWorkerHelper.parseToElementList(HEADER, null);
footer = XMLWorkerHelper.parseToElementList(FOOTER, null);
}
@Override
public void onEndPage(PdfWriter writer, Document document) {
try {
ColumnText ct = new ColumnText(writer.getDirectContent());
ct.setSimpleColumn(new Rectangle(36, 832, 559, 810));
for (Element e : header) {
System.out.println("Element on header: " + e.toString());
ct.addElement(e);
}
ct.go();
ct.setSimpleColumn(new Rectangle(36, 10, 559, 32));
for (Element e : footer) {
System.out.println("Element on footer: " + e.toString());
ct.addElement(e);
}
ct.go();
} catch (DocumentException de) {
throw new ExceptionConverter(de);
}
}
}
public void createPdfAlt(String outputFile, String inputFile){
Document document = new Document(pageSize, leftMargin, rightMargin, topMargin, bottomMargin);
FileOutputStream outputStream;
try {
outputStream = new FileOutputStream(DEST);
//System.out.println("Doc: " + document.);
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
writer.setPageEvent(new HeaderFooter());
document.open();
PdfContentByte cb = writer.getDirectContent();
// Load existing PDF
PdfReader reader = new PdfReader(new FileInputStream(inputFile));
PdfImportedPage page = writer.getImportedPage(reader, 1);
// document.setPageSize(reader.getPageSize(1));
// Copy first page of existing PDF into output PDF
document.newPage();
cb.addTemplate(page, 0, 0);
document.close();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (DocumentException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
In my Managed Bean I set Header, Footer, outputfile and so on:
HtmlHeaderFooter htmlHeaderFooter = new HtmlHeaderFooter();
htmlHeaderFooter.setFOOTER(footerContent);
htmlHeaderFooter.setHEADER(headerContent);
//htmlHeaderFooter.setPageSize(xml2pdf.getPageSize());
htmlHeaderFooter.setPageSize(com.itextpdf.text.PageSize.A4);
htmlHeaderFooter.setLeftMargin(template2Export.getLeftMargin());
htmlHeaderFooter.setRightMargin(template2Export.getRightMargin());
htmlHeaderFooter.setTopMargin(template2Export.getSuperiorMargin());
htmlHeaderFooter.setBottomMargin(template2Export.getInferiorMargin());
htmlHeaderFooter.setDEST("salidaConHeaderAndFooter.pdf");
htmlHeaderFooter.createPdfAlt("PDFCompleto1.pdf", "test3.pdf");
EDIT 2: Header should look like this
If you are talking about the html code "as is" it's like this:
<p>----Header Start---</p>
<p><img alt="" src="http://localhost:8080/DocGen/resources/images/main_header.jpg" style="height:126px; width:683px" /></p>
<p>--Header End--</p>

You draw the header here:
ct.setSimpleColumn(new Rectangle(36, 832, 559, 810));
So you allow for about 22pt height (832 - 810) to draw all the header material.
On the other hand your header is expected to display this
<p>----Header Start---</p>
<p><img alt="" src="http://localhost:8080/DocGen/resources/images/main_header.jpg" style="height:126px; width:683px" /></p>
<p>--Header End--</p>
This header requires two paragraphs plus 126px (94.5pt). Thus, it does not fit. Consequently, only the first paragraph (which is the only header content that fits) is drawn.
You might want to start by allowing a lot of space, e.g.
ct.setSimpleColumn(new Rectangle(36, 832, 559, 0));
and then reduce it step by step according to your requirements.
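
The arithmetic behind this can be sketched as follows. The helper names are hypothetical, the 0.75 pt/px ratio is the one implied by 126px = 94.5pt above, and the 18pt leading per text paragraph is purely an assumption for illustration. The real value depends on the font and the leading in use.

```java
/** Back-of-the-envelope check of whether header content fits the ColumnText rectangle. */
final class HeaderFit {

    // CSS pixels to PDF points, using the ratio implied by 126px = 94.5pt.
    static double pxToPt(double px) {
        return px * 0.75;
    }

    // Height of a rectangle given its two y coordinates, in either order.
    static double columnHeight(double y1, double y2) {
        return Math.abs(y1 - y2);
    }

    // Lower bound on the height needed: the image plus an assumed leading per text paragraph.
    static double requiredHeight(double imageHeightPx, int textParagraphs, double assumedLeadingPt) {
        return pxToPt(imageHeightPx) + textParagraphs * assumedLeadingPt;
    }

    public static void main(String[] args) {
        double available = columnHeight(832, 810);      // rectangle from the question
        double required = requiredHeight(126, 2, 18.0); // 18pt leading is an assumption
        System.out.println("available = " + available + "pt, required >= " + required + "pt");
        System.out.println(required <= available ? "fits" : "does not fit");
    }
}
```

With the rectangle from the question, the available height is 22pt, while the image alone needs 94.5pt, so the content cannot fit whatever leading is assumed.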

HTML to PDF using iTextSharp Library In ASP.NET

I have a scenario where I want to convert an HTML template to PDF using iTextSharp. The HTML template is located at:
Server.MapPath("~/Template/CertificateMailTemplate.html")
This is the code I have tried:
public string SendCertificate()
{
try
{
byte[] outputstream = null;
using (var stream = new MemoryStream())
{
using (var document = new Document())
{
using (var writer = PdfWriter.GetInstance(document, stream))
{
document.Open();
using (var html = new StringReader(Server.MapPath("~/Template/CertificateMailTemplate.html")))
{
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, html);
}
}
}
outputstream = stream.ToArray();
}
//Mail sending code
return "success";
}
catch (Exception ex)
{
return ex.Message;
}
finally
{
if (reader != null)
{
reader.Dispose();
}
}
}
Here I am getting the following error: "The document has no pages."
Any help will be highly appreciated.

The ParseXHtml method takes a TextReader as the third parameter, and Microsoft provides two main implementations of that class: StringReader and StreamReader.
The StringReader (which you are using) is for when you want to take an existing .NET string and read it as a sequence of characters.
The StreamReader (which you should switch to) is for when you want to take a byte source (such as a file) and read that as a sequence of characters.
You are passing a string such as "c:\blah\blah\blah.html", and iTextSharp is interpreting that as your HTML rather than as the path to your HTML.
If you want .NET to take a guess at the encoding of the file, you can swap your code one-for-one using the constructor StreamReader(String). If you know the specific encoding of the file, you can use one of the overloads such as StreamReader(String, Encoding).
So, this line:
//Incorrect
using (var html = new StringReader(Server.MapPath("~/Template/CertificateMailTemplate.html")))
Should instead be:
//Correct
using (var html = new StreamReader(Server.MapPath("~/Template/CertificateMailTemplate.html")))

iframe doesn't display dynamically generated html file

I'm dynamically creating an HTML file using HtmlTextWriter. I can view it in browsers, but when I try to view this HTML in an iframe, nothing is displayed. What is wrong with dynamically generated HTML? The iframe also fails to display some HTML files in Firefox. Here is how I create my HTML:
static string GetDivElements()
{
// Initialize StringWriter instance.
StringWriter stringWriter = new StringWriter();
// Put HtmlTextWriter in using block because it needs to call Dispose.
using (HtmlTextWriter writer = new HtmlTextWriter(stringWriter))
{
// Loop over some strings.
foreach (var word in _words)
{
// Some strings for the attributes.
string classValue = "ClassName";
string urlValue = "http://www.dotnetperls.com/";
string imageValue = "image.jpg";
// The important part:
writer.AddAttribute(HtmlTextWriterAttribute.Class, classValue);
writer.RenderBeginTag(HtmlTextWriterTag.Div); // Begin #1
...my html
writer.RenderBeginTag(HtmlTextWriterTag.Img); // Begin #3
writer.RenderEndTag(); // End #3
writer.Write(word);
writer.RenderEndTag(); // End #2
writer.RenderEndTag(); // End #1
}
}
// Return the result.
string pre = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">";
pre += "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>";
pre += "<meta http-equiv=\"content-type\" content=\"text/html; charset=UTF-8\"><title></title></head><body>";
return stringWriter.ToString();// +"</body></html>";
}
private void Form1_Load(object sender, EventArgs e)
{
using (StreamWriter outfile =
new StreamWriter(@"d:\myFile.htm"))
{
outfile.Write(GetDivElements());
}
}

FlyingSaucer: convert an HTML document to PDF ignoring external CSS?

I'm using the following to convert HTML to PDF:
InputStream convert(InputStream fileInputStream) {
    PipedInputStream inputStream = new PipedInputStream()
    PipedOutputStream outputStream = new PipedOutputStream(inputStream)
    new Thread({
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(fileInputStream)
        ITextRenderer renderer = new ITextRenderer()
        renderer.setDocument(document, "")
        renderer.layout()
        renderer.createPDF(outputStream)
    }).start()
    return inputStream
}
From the documentation, apparently I should be able to set a "User Agent" resolver somewhere, but I'm not sure where, exactly. Anyone know how to ignore external CSS in a document?

Not the same question but my answer for that one will work here too: Resolving protected resources with Flying Saucer (ITextRenderer)
Override this method:
public CSSResource getCSSResource(String uri) {
    return new CSSResource(resolveAndOpenStream(uri));
}
with
public CSSResource getCSSResource(String uri) {
    return new CSSResource(new ByteArrayInputStream([] as byte[]));
}
