I have a CKEditor (http://ckeditor.com/) on my site. I would like users to be able to push a button to generate a PDF. Currently, I have them use the print function that comes with CKEditor, which brings up the print window, and from most browsers they can generate a PDF there. But I want to make it simpler. I know that generating PDFs from HTML is difficult, but are there any simple solutions for this (generating a PDF from the HTML that CKEditor gives)?
I've heard of a few solutions like fpdf, dompdf and html2pdf.
You can use iText and XMLWorker to create PDF from HTML code.
public void createPDF() throws DocumentException, IOException
{
String fileName="path you want to create the document";
Document document=new Document();
PdfWriter pdfWriter=PdfWriter.getInstance(document, new FileOutputStream(fileName));
document.open();
String finall="<h1>This is a Demo</h1>";
InputStream is = new ByteArrayInputStream(finall.getBytes());
XMLWorkerHelper.getInstance().parseXHtml(pdfWriter,document, is);
document.close();
}
XMLWorker parses the input as XML, so all your tags must be closed correctly. You need the iText and XMLWorker JAR files. Hope this helps.
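Since unclosed tags are the usual stumbling block here, it may help to check the markup before handing it to XMLWorker. Below is a minimal sketch that uses only the JDK's XML parser; the class and method names (XhtmlCheck, isWellFormed) are made up for illustration and are not part of iText. It is probably stricter than XMLWorker itself (for example, it rejects HTML entities such as &nbsp;), so treat it as a rough pre-check rather than a definitive test.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of iText or XMLWorker.
class XhtmlCheck {

    // Returns true if the markup parses as well-formed XML,
    // i.e. every tag is closed and properly nested.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(
                    markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the markup: not well-formed.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not markup errors.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<h1>This is a Demo</h1>"));     // true
        System.out.println(isWellFormed("<p>line one<br>line two</p>")); // false
    }
}
```

If the check fails for editor output, writing void tags in their self-closing form (<br/> instead of <br>) is usually the first thing to try.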
I am trying to parse the sidebar TOC (table of contents) of a documentation site.
Jsoup
I have tried Jsoup. I cannot get the TOC elements because the HTML content in this tag is not part of the initial HTML; it is set by JavaScript after the page is loaded.
You can see my previous question here: JSoup cannot parse child elements after depth 2
The suggested solution is to examine manually, in the browser's Dev Tools, which connections are made and find the latest version of the website. Parsing the sidebar TOC of a documentation site is just one component of my Java program, so I cannot do this manually.
JavaFX WebView (not Android WebView)
I have tried JavaFX WebView because I need a browser that executes the JavaScript code and fills in the TOC tag components.
WebView browser = new WebView();
WebEngine webEngine = browser.getEngine();
webEngine.load("https://learn.microsoft.com/en-us/ef/ef6/");
But I don't know how to retrieve the HTML of the loaded page and pass it to a Jsoup Document.
Any advice appreciated.
WebView browser = new WebView();
WebEngine webEngine = browser.getEngine();
String url = "https://learn.microsoft.com/en-us/ef/ef6/";
webEngine.load(url);
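// Get the W3C document from the WebEngine. load() is asynchronous, so this may
// still be null here; read it once the page has finished loading.
org.w3c.dom.Document w3cDocument = webEngine.getDocument();
// Use the jsoup helper to convert it to a string
String html = new org.jsoup.helper.W3CDom().asString(w3cDocument);
// Create a jsoup document by parsing the HTML; the URL serves as the base URI
org.jsoup.nodes.Document doc = Jsoup.parse(html, url);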
I can't promise this is the best way as I've not used Jsoup before and I'm not an expert on the XML API.
The org.jsoup.Jsoup class has a method for parsing HTML in String form: Jsoup.parse(String). This means we need to get the HTML from the WebView as a String. The WebEngine class has a document property that holds an org.w3c.dom.Document. This Document is the HTML content of the web page currently showing. We just need to convert this Document into a String, which we can do with a Transformer.
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.jsoup.Jsoup;
public class Utils {
private static Transformer transformer;
// not thread safe
public static org.jsoup.nodes.Document convert(org.w3c.dom.Document doc)
throws TransformerException {
if (transformer == null) {
transformer = TransformerFactory.newDefaultInstance().newTransformer();
}
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(doc), new StreamResult(writer));
return Jsoup.parse(writer.toString());
}
}
You would call this every time the document property changes. I did some "tests" by browsing Google and printing the org.jsoup.nodes.Document to the console and everything seems to be working.
There is a caveat, though: as far as I understand it, the document property does not change when there are changes within the same web page (the Document itself may be updated, however). I'm not a web person, so pardon me if I don't make sense here, but I believe that this includes things like a frame changing its content. There may be a way around this by interfacing with the JavaScript using WebEngine.executeScript(String), but I don't know how.
I would like to embed an HTML site into a PDF document. Are there any libraries or PDF creators that make that possible?
Update:
I am not looking for ways to convert HTML to PDF. I actually want to use the HTML as it is inside the PDF. So I am looking for something like an iframe for PDF.
There are a few out there; it depends on whether you need to build using PHP or another language. I have used mPDF before: http://www.mpdf1.com/mpdf/
Yes, it's possible using xmlworker5.4.1.jar. XMLWorker allows you to embed HTML in your document. The xmlString object below holds your HTML content. HTMLWorker is deprecated, so use XMLWorker only.
XMLWorkerHelper worker = XMLWorkerHelper.getInstance();
// Wrap the HTML fragment so XMLWorker receives a complete document
StringBuilder xmlString = new StringBuilder();
xmlString.append("<html><body>");
xmlString.append(htmlPages[i]); // htmlPages[i] holds the HTML for one page
xmlString.append("</body></html>");
worker.parseXHtml(pdfWriter, pdfDocument, new StringReader(xmlString.toString()));
In order to use the fonts named in a font face tag, you need to register them using
FontFactory.registerDirectory("path of font files");
because iText doesn't scan the system for fonts. You need to register them yourself this way.
I want to create an XML file using JSP. The XML file should contain multiple records, such as employee details. The values for the XML tags come from an HTML page. Every time the button is clicked, a new set of tags should be created under the employee tag, one set for each employee.
Create a JavaScript function for the button in your JSP page. To create the dynamic XML document, include in that function the JSP code that writes a file with the .xml extension, like this:
PrintWriter writer = new PrintWriter("the-file-name.xml", "UTF-8");
writer.println("<tag1>"+value1+"</tag1>");
writer.println("<tag2>"+value2+"</tag2>");
writer.close();
where the values are obtained from the HTML page.
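One thing to watch with the println approach is that values containing & or < will produce broken XML. As an alternative, here is a rough sketch that builds the document with the JDK's DOM API, which escapes text for you. The class name (EmployeeXml), the root element name (employees) and the child tags (name, department) are all assumptions for illustration; substitute whatever your form actually submits.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: collects one <employee> element per button click.
class EmployeeXml {
    private final Document doc;
    private final Element root;

    EmployeeXml() {
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        root = doc.createElement("employees"); // assumed root tag name
        doc.appendChild(root);
    }

    // Appends a new set of tags for one employee under the root.
    void addEmployee(String name, String department) {
        Element employee = doc.createElement("employee");
        employee.appendChild(textElement("name", name));
        employee.appendChild(textElement("department", department));
        root.appendChild(employee);
    }

    private Element textElement(String tag, String value) {
        Element element = doc.createElement(tag);
        element.setTextContent(value); // special characters are escaped on output
        return element;
    }

    // Serializes the whole document to a String (without the XML declaration).
    String toXml() {
        try {
            Transformer transformer =
                    TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

You would call addEmployee with the submitted form values on each click and write toXml() to the .xml file once you are done.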
I'm using the PrimeFaces dataExporter to generate a PDF from a dataTable. The generated PDF has all the columns with the same width. I'm looking for a way to change the style of the table in the postProcessor/preProcessor functions. Can I use the setHtmlStyleClass method to change something before generating the PDF? I tried to use it, but with no success. I think I didn't understand it correctly.
public void preProcessPDF(Object document) throws IOException, BadElementException, DocumentException {
Document pdf = (Document) document;
pdf.setHtmlStyleClass("reportClass");
...
}
If I can use that method, where can I define reportClass? Is it a CSS class for the page in the browser?
If you look at what's going on in the export method of PDFExporter.java, the data table in the PDF cannot be manipulated.
First a com.itextpdf.text.Document object is created.
Document document = new Document(PageSize.A4.rotate());
Then the preProcessor method is called passing the Document, this is before the table is added to the PDF Document.
if(preProcessor != null) {
preProcessor.invoke(facesContext.getELContext(), new Object[]{document});
}
Then the com.itextpdf.text.pdf.PdfPTable is created. The exportPDFTable method doesn't do any special formatting.
PdfPTable pdfTable = exportPDFTable(table, excludeColumns);
document.add(pdfTable);
Now the postProcessor method is called and the Document is passed again. Here I would think you would be able to access and change the PdfPTable from the Document object, but looking at the iText API it doesn't look like you can.
if(postProcessor != null) {
postProcessor.invoke(facesContext.getELContext(), new Object[]{document});
}
So if you want a styled PDF table, you're going to have to implement your own PDF export. Hopefully looking at how the PrimeFaces PDFExporter is done will help you with that.
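If you do go the custom export route, the column widths are the part you control: in iText 5, PdfPTable.setWidths(float[]) takes relative widths. Here is one possible way to derive them from the cell text, as a plain-Java sketch. The ColumnWidths class and relativeWidths method are made-up names, and sizing by the longest cell is just one heuristic, not anything PrimeFaces does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for a custom exporter; not part of PrimeFaces or iText.
class ColumnWidths {

    // Relative width of each column = length of its longest cell text (at least 1).
    static float[] relativeWidths(List<List<String>> rows, int columnCount) {
        float[] widths = new float[columnCount];
        Arrays.fill(widths, 1f);
        for (List<String> row : rows) {
            for (int c = 0; c < columnCount && c < row.size(); c++) {
                String cell = row.get(c);
                if (cell != null) {
                    widths[c] = Math.max(widths[c], cell.length());
                }
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("ID", "Description"),
                Arrays.asList("1", "A long text"));
        System.out.println(Arrays.toString(relativeWidths(rows, 2))); // [2.0, 11.0]
    }
}
```

In your own exporter you would pass the result to setWidths on the PdfPTable before adding the table to the Document.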
I display receipt in both HTML and printer-friendly version. HTML version does jQuery tabs, etc, while printer-friendly has zero scripts and external dependencies, no master layout, no additional buttons, inline CSS, and can be saved as HTML without problems.
Since I use the Spark View Engine, I thought maybe it would be a good idea to generate the PDF using the iTextSharp engine. But after a few paragraphs I decided it's too cumbersome, because a) I would have to rewrite the entire receipt (the source Spark view is about 5 pages long) and b) I had problems with iTextSharp from the beginning - for example, numbered lists stayed bulleted, with no indentation, and indentationLeft="20" didn't work - maybe because of the lack of documentation, but see (a).
So, my requirements for PDF are very simple: I want to keep the same HTML but insert page breaks between individual receipts (yes I have several ones in a single document).
Is there a simple way to generate PDF from view/HTML without rewriting the view using a strange half-documented engine?
UPDATE: tried the community HTMLDoc version; it didn't use my inline CSS styles and incorrectly displayed Unicode symbols for currencies. wkhtmltopdf did pick up the CSS but failed on the currency symbol; I suppose that's an encoding problem which is solved by setting the charset to utf-8. wkhtmltopdf seems to be nice but I have yet to figure out how to set page breaks...
If you can have the HTML in memory then you can convert it to PDF. I once did something similar using xhtmlrenderer. It is a Java framework that bundles iText and is capable of converting an HTML stream into PDF. As it is written in Java, I used ikvmc.exe to convert the jar file into a .NET assembly and used it directly from managed code.
public class Pdf : IPdf
{
public FileStreamResult Make(string s)
{
using (var ms = new MemoryStream())
{
using (var document = new Document())
{
PdfWriter.GetInstance(document, ms);
document.Open();
using (var str = new StringReader(s))
{
var htmlWorker = new HTMLWorker(document);
htmlWorker.Parse(str);
}
document.Close();
}
HttpContext.Current.Response.ContentType = "application/pdf";
HttpContext.Current.Response.AddHeader("content-disposition", "attachment;filename=MyPdfName.pdf");
HttpContext.Current.Response.Buffer = true;
HttpContext.Current.Response.Clear();
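// ToArray() returns only the bytes written; GetBuffer() may include unused padding
var pdfBytes = ms.ToArray();
HttpContext.Current.Response.OutputStream.Write(pdfBytes, 0, pdfBytes.Length);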
HttpContext.Current.Response.OutputStream.Flush();
HttpContext.Current.Response.End();
return new FileStreamResult(HttpContext.Current.Response.OutputStream, "application/pdf");
}
}
}
Finally used wkhtmltopdf, which works fine when I set the encoding. I found out how to set up page breaks, and it processes my CSS very nicely. One issue is that it can't correctly process stdin/stdout in the Windows version (I don't remember whether it's in or out that doesn't work) - this may be fixed in recent versions, but I'm OK with temp files.
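For anyone wanting to reproduce the temp-file approach, here is a rough sketch of how it could look (in Java rather than .NET, but the idea carries over). It assumes wkhtmltopdf is on the PATH, that each receipt is an HTML fragment rather than a full document, and that a CSS page-break-after rule is an acceptable way to separate receipts. The class and method names (WkhtmlReceipts, joinReceipts, buildCommand, render) are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the wkhtmltopdf command-line tool.
class WkhtmlReceipts {
    // An empty block that forces a page break after itself when printed.
    static final String PAGE_BREAK =
            "<div style=\"page-break-after: always\"></div>";

    // Joins receipt fragments into one UTF-8 page with a break between receipts.
    static String joinReceipts(List<String> receiptsHtml) {
        return "<html><head><meta charset=\"utf-8\"></head><body>"
                + String.join(PAGE_BREAK, receiptsHtml)
                + "</body></html>";
    }

    // Command line: wkhtmltopdf --encoding utf-8 <input.html> <output.pdf>
    static List<String> buildCommand(Path htmlFile, Path pdfFile) {
        return Arrays.asList("wkhtmltopdf", "--encoding", "utf-8",
                htmlFile.toString(), pdfFile.toString());
    }

    // Writes the HTML to a temp file, runs wkhtmltopdf, returns the PDF path.
    static Path render(List<String> receiptsHtml)
            throws IOException, InterruptedException {
        Path html = Files.createTempFile("receipts", ".html");
        Path pdf = Files.createTempFile("receipts", ".pdf");
        Files.write(html, joinReceipts(receiptsHtml).getBytes(StandardCharsets.UTF_8));
        Process process = new ProcessBuilder(buildCommand(html, pdf))
                .inheritIO().start();
        if (process.waitFor() != 0) {
            throw new IOException("wkhtmltopdf exited with an error");
        }
        return pdf;
    }
}
```

Going through temp files on both sides sidesteps the stdin/stdout issue mentioned above; the caller is responsible for deleting the files afterwards.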