We are looking for a server-side solution that can take an HTML page and generate a document in one of several formats: PDF, RTF, DOC, etc.
I've used LiveDocX successfully to mail-merge elements into a Word document template and generate PDFs. I also know that it can generate other formats besides PDF, which is one of our requirements. But I am not sure whether I can supply an HTML page as the input and generate these files from it.
I see there is a whole gamut of HTML-to-PDF converter options; I've seen the list posted at https://stackoverflow.com/questions/3178448/list-of-html-to-pdf-converters. But what I am looking for is the additional capability to generate not just PDFs but multiple formats.
A server-side Microsoft IIS (.NET or COM+) solution is preferred, but we will also look at good PHP options.
Thanks!
What about some funky OpenOffice scripts that transform the given HTML page? That should be flexible enough.
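To make the OpenOffice idea concrete, here is a minimal Python sketch that drives a headless office suite from the command line to turn one HTML page into several formats. It assumes LibreOffice (the OpenOffice fork) is installed with `soffice` on PATH and supports `--headless`, `--convert-to` and `--outdir`; the `TARGET_FORMATS` tuple and the function names are hypothetical. Using bare extensions as the target is also an assumption: depending on the version, HTML input may need an explicit export filter such as `pdf:writer_pdf_Export`.

```python
import subprocess
from pathlib import Path

# Formats to produce from one HTML source (hypothetical choice). Bare
# extensions are an assumption: HTML input may need an explicit export
# filter such as "pdf:writer_pdf_Export" on some versions.
TARGET_FORMATS = ("pdf", "rtf", "doc")


def build_convert_command(html_file: str, fmt: str, out_dir: str) -> list:
    """Return the headless soffice command line that converts html_file to fmt."""
    return [
        "soffice",      # LibreOffice binary, assumed to be on PATH
        "--headless",   # no GUI, suitable for a server process
        "--convert-to", fmt,
        "--outdir", out_dir,
        html_file,
    ]


def convert_all(html_file: str, out_dir: str, run: bool = True) -> list:
    """Build one command per target format and, if run is True, execute them."""
    commands = [build_convert_command(html_file, fmt, out_dir) for fmt in TARGET_FORMATS]
    if run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=True raises CalledProcessError if soffice exits non-zero
            subprocess.run(cmd, check=True)
    return commands
```

Calling `convert_all("report.html", "out")` runs one conversion per format. Running the conversions one after another, rather than in parallel, is deliberate: several `soffice` processes sharing one user profile may fail on a busy server.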
I want to generate a PDF from HTML with accessibility tags embedded in it. In other words, I want to convert HTML to PDF so that the result is JAWS-friendly. The standard options for generating PDFs from HTML do not embed accessibility tags in the exported PDF. For example, there are various plug-ins for jsPDF, and none of them can add the appropriate accessibility tags. Does anyone have experience with this?
Just mentioning that I found an open-source 508 / WCAG 2.0 / PDF-EU compliant PDF generation library that is worth a look.
I haven't evaluated it yet (I will later today), but the project has 1000 stars and 200 forks on GitHub:
https://github.com/danfickle/openhtmltopdf
Here's their glorious and concise write-up on compliance:
https://github.com/danfickle/openhtmltopdf/wiki/PDF-Accessibility-(PDF-UA,-WCAG,-Section-508)-Support
That's extremely sad, but there are very few libraries able to generate accessible PDFs.
To my knowledge, none of the free and/or very popular ones can do it; at least, none in JavaScript or PHP.
The only one I know of is iText. It's Java, and quite expensive if you need to have a commercial license.
By the way, note that when you open a PDF directly in your browser, the built-in viewer often uses these same JavaScript libraries to render the PDF back into HTML, so even if your PDF has accessibility tags, they aren't carried over into the browser's rendering.
The PDF has to be opened in a real PDF reader, such as Adobe Reader, for the user to read it accessibly.
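When comparing generators, it helps to check quickly whether an output file is tagged at all. The Python sketch below is a heuristic, not a validator: it looks for the two catalog entries a Tagged PDF declares, `/MarkInfo << /Marked true >>` and `/StructTreeRoot`. It will miss catalogs stored in compressed object streams or a `/MarkInfo` given as an indirect reference, and it says nothing about tag quality; a dedicated checker such as PAC (PDF Accessibility Checker) is the better final test. The function name is hypothetical.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic check for the markers of a Tagged PDF.

    Tagged PDFs declare /MarkInfo << /Marked true >> and a /StructTreeRoot in
    the document catalog. A raw byte search misses catalogs stored in
    compressed object streams or a /MarkInfo given as an indirect reference,
    so False means "not found", not "definitely untagged".
    """
    marked = re.search(rb"/MarkInfo\s*<<[^>]*/Marked\s+true", pdf_bytes) is not None
    has_structure_tree = b"/StructTreeRoot" in pdf_bytes
    return marked and has_structure_tree


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python check_tags.py file1.pdf file2.pdf ...
    for name in sys.argv[1:]:
        print(name, "tagged" if looks_tagged(Path(name).read_bytes()) else "no tags found")
```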
I want to create some HTML help pages, as separate HTML files.
However, I want to have the same content at the top and bottom of every page.
In the past I've used PHP or ASP, with header and footer files.
I've then had to view the source and save these pages to get what I want.
I just wondered whether there is an easier way to do this.
EDIT:
The pages are for use with software that displays them through a web object, not a normal browser, so there won't be a web server.
If your web server supports it, you could use server-side includes (SSI).
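Even without a web server, pages can be authored with SSI-style directives and expanded once at build time, producing plain static files. The Python sketch below handles only `<!--#include file="..." -->` and `<!--#include virtual="..." -->`, expands a single level, and maps virtual paths onto a local directory, which is a simplification of what a real server does. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Callable

# Matches SSI directives like <!--#include file="header.html" --> or
# <!--#include virtual="/footer.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+(?:file|virtual)="([^"]+)"\s*-->')


def expand_includes(page_html: str, read_include: Callable[[str], str]) -> str:
    """Replace every include directive with the text returned by read_include.

    Only one level is expanded; includes inside included files are left as-is.
    """
    return INCLUDE_RE.sub(lambda m: read_include(m.group(1)), page_html)


def directory_reader(base_dir: str) -> Callable[[str], str]:
    """Resolve include paths against base_dir on disk.

    Stripping a leading "/" so virtual paths land under base_dir is a
    simplification; a real web server maps virtual paths through its URL space.
    """
    root = Path(base_dir)

    def read(name: str) -> str:
        return (root / name.lstrip("/")).read_text(encoding="utf-8")

    return read
```

For example, `expand_includes(Path("src/intro.html").read_text(encoding="utf-8"), directory_reader("src"))` returns the finished page, which can then be saved wherever the help viewer expects it. Passing the reader in as a function keeps the expansion logic independent of where the header and footer live.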
You could use frames, but it's not necessarily advisable (for one, it breaks navigation).
You could use XML files with an XSLT stylesheet to turn them into HTML documents that share similar elements.
You could use PHP or another server-side language to generate the pages, and then use a recursive download tool (such as wget) to turn them into HTML.
EDIT: you're basically asking whether the "standard-ish" subset of HTML supported by your component of choice provides a way of including data from a common file, just so you won't have to include the data in every HTML document.
The answer hovers somewhere between "no way" and "maybe your component has a few tricks to do that".
The sane thing to do here would be to have a tool generate the HTML documents from a common template. That could be XML + XSLT, PHP/ASP/whatever, or a fully fledged CMS (which also lets non-technical users write the document contents).
It's awful, but you could include a JS file that uses a bunch of document.write("...") calls to emit the common elements. It's not SEO-friendly.
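As a sketch of the "generate from a common template" route, the Python script below wraps each page body in one shared skeleton using only the standard library, so the output is plain static HTML that needs no server. The header and footer markup, the function names, and the `pages` mapping are hypothetical placeholders.

```python
import html
from pathlib import Path
from string import Template

# Shared skeleton for every help page. The header/footer markup is a
# hypothetical placeholder; $title and $body are filled in per page.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
<div class="header">Product Help</div>
$body
<div class="footer"><a href="index.html">Contents</a></div>
</body>
</html>
""")


def render_page(title: str, body_html: str) -> str:
    """Fill the shared template; the title is escaped, the body is trusted HTML."""
    return PAGE.substitute(title=html.escape(title), body=body_html)


def build_site(pages: dict, out_dir: str) -> None:
    """Write one static file per entry of pages: filename -> (title, body HTML)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for filename, (title, body_html) in pages.items():
        (out / filename).write_text(render_page(title, body_html), encoding="utf-8")
```

Running `build_site({"install.html": ("Install", "<p>Step 1</p>")}, "help")` regenerates every page after a header or footer change, which replaces the manual "view source and save" step.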
Are there any classes, COM objects, command-line utilities, or anything else I can build an API around that can convert a PDF to an HTML document? Obviously the conversion might be a little rough, since PDFs can contain a lot more than HTML can describe. I found a utility called pdftohtml on SourceForge, but quite honestly it does a horrible job with the conversion. I don't care whether the software is free or commercial, but is there anything out there that I can incorporate into my own software to do this sort of conversion at least decently? I know Google developed their own method of doing this, since you can click "View as HTML" on a PDF attached to an email in Gmail, but I was hoping there was something available to the public.
Remember, PDF to HTML. I'm NOT worried about HTML to PDF.
Well, one solution I can think of is to write a small program that reads the PDF text using a library called iText and then generates HTML files.
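The "then generate HTML files" half of that approach can be sketched independently of the extraction library. The Python function below assumes the text has already been pulled out of the PDF (by iText or any other tool) and simply wraps it in escaped paragraphs; treating blank lines as paragraph breaks is an assumption about the extracted text. It preserves no layout, fonts, or images, which is exactly the weakness the question complains about. The function name is hypothetical.

```python
import html
import re


def text_to_html(extracted_text: str, title: str = "Converted document") -> str:
    """Wrap plain text that was already extracted from a PDF in minimal HTML.

    Blank lines become paragraph breaks and single newlines become <br>.
    Layout, fonts, and images are not preserved.
    """
    blocks = [b.strip() for b in re.split(r"\n\s*\n", extracted_text) if b.strip()]
    paragraphs = [
        "<p>" + "<br>\n".join(html.escape(line) for line in block.splitlines()) + "</p>"
        for block in blocks
    ]
    return (
        "<!DOCTYPE html>\n<html>\n"
        f'<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        "<body>\n" + "\n".join(paragraphs) + "\n</body>\n</html>\n"
    )
```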
Well, for Java-based PDF solutions, I guess we still don't have a clean way. All the solutions are primitive and more or less workarounds. There is no easy solution for:
1. Designing a PDF template.
2. Then, at runtime, using Java to populate this template with data, either from XML or from other data sources.
Such a simple requirement, and NONE has a good open-source and free solution yet!
Eclipse BIRT comes close, but does not handle barcode elements out of the box.
You were looking for pdf2htmlEX (C++), which converts PDF to HTML without losing text or formatting.
To convert further to semantic HTML, you can process the pdf2htmlEX output using my project Transcript (Python). However, that step is no longer lossless, and it works best on documents that don't deviate too much from a conventional visual layout.
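Since the question asks for something to incorporate into your own software, a thin wrapper around the pdf2htmlEX command line is one option. This Python sketch assumes the `pdf2htmlEX` binary is installed and on PATH and uses only its basic form, `pdf2htmlEX <input.pdf> [<output.html>]`, with no extra options; when the output name is omitted, the tool derives one from the input name. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def pdf2htmlex_command(pdf_path: str, html_name: Optional[str] = None) -> List[str]:
    """Build the pdf2htmlEX command line: pdf2htmlEX <input.pdf> [<output.html>]."""
    cmd = ["pdf2htmlEX", pdf_path]
    if html_name is not None:
        cmd.append(html_name)
    return cmd


def convert_pdf(pdf_path: str, html_name: Optional[str] = None) -> None:
    """Run pdf2htmlEX, failing early with a clear error if it is not installed."""
    if shutil.which("pdf2htmlEX") is None:
        raise FileNotFoundError("pdf2htmlEX was not found on PATH")
    # check=True raises CalledProcessError if the conversion fails
    subprocess.run(pdf2htmlex_command(pdf_path, html_name), check=True)
```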
I'm looking to export a page that looks good in print to Word.
Can this be done automatically, or mostly automatically, with the Office APIs?
The alternative is to create a program that reads all our style and font metadata, converts it to Word, and forces a download.
The issue is that our style metadata is already written for CSS (it's a web app, after all), and writing my own CSS parser doesn't sound like a good use of time.
I know this sounds too simple to be true, but I believe you can simply rename an ".html" file to ".doc" to make it open in Word, and let Office's HTML rendering take care of the rest.
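In a web app, the rename trick amounts to serving or saving the existing HTML under a .doc name. The Python sketch below shows both halves: writing the file and building the HTTP headers that force a download. It assumes Word's HTML import handles your markup acceptably; the result is still HTML, not a binary Word document, and including `<meta charset="utf-8">` in the page likely helps Word decode non-ASCII text correctly. The function names and the default filename are hypothetical.

```python
from pathlib import Path
from typing import Dict


def save_html_as_doc(page_html: str, path: str) -> Path:
    """Save HTML markup under a .doc name so Word opens it via its HTML import.

    The result is still HTML, not a binary Word document; Word renders it
    using whatever subset of the CSS it supports.
    """
    target = Path(path).with_suffix(".doc")
    target.write_text(page_html, encoding="utf-8")
    return target


def doc_download_headers(filename: str = "export.doc") -> Dict[str, str]:
    """HTTP headers that make a browser download the HTML as a Word file."""
    return {
        "Content-Type": "application/msword",  # MIME type associated with .doc
        "Content-Disposition": f'attachment; filename="{filename}"',
    }
```

This reuses the CSS you already have instead of writing a CSS-to-Word converter, at the cost of depending on how faithfully Word interprets that CSS.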
If it's for reporting purposes, and you think you might need more of the same in the future, you could look at something like Reporting Services as a way of creating a report that can be downloaded in various formats. I'm not 100% sure whether the newest version can create .doc files, but you can purchase plugins that add this.
I am looking for a (preferably free) DLL that can be used on a website to convert PDF documents to HTML in a .NET IIS environment. It would be nice if it could accept the PDF as a byte array or file stream and output the HTML as a stream suitable for Response.Write. It would be really great if the output HTML retained form inputs.
Has anyone seen such an animal?
Just bear in mind that HTML does not have all the features of PDF, so the result may not look as good in HTML.
You can do it with Aspose.PDF for .NET.
It's not cheap, but their products are great, and the documentation and user forums are amazing. I can't say enough good things about the product. If you do a lot of PDF manipulation, the price isn't bad for what you get.