PDF to HTML conversion for web site - html

I am looking for a (preferably free) DLL that can be used on a web site to convert PDF documents to HTML in a .NET IIS environment. It would be nice if it could accept the PDF as a byte array or file stream, and output the HTML as a stream suitable for Response.Write. It would really be great if the output HTML retained form inputs.
Has anyone seen such an animal?

Just bear in mind HTML does not have all the features of PDF so it may not look as good in HTML format.

You can do it with Aspose.PDF for .NET.
It's not cheap, but their products are great and the documentation and user forums are amazing. Can't say enough good things about the product. If you do a lot of PDF manipulation, the price isn't bad for what you get.

Related

Is it possible to condense a set of HTML documentation, including all links, into a single PDF, EBook etc?

I essentially want to produce an EBook or PDF file from a large set of HTML documentation. Is there any code/app that could help? Is it theoretically, or realistically, possible?
Or do I need to RTFM :)
As for PDFs,
I work on a project where users author large (400+ pages) documents. We output documents in HTML and send them to DocRaptor. DocRaptor uses PrinceXML under the hood. For the project we went with DocRaptor due to costs. We don't output very much and have a tiny budget. The output from PrinceXML is great.

How to embed or convert PDF to support reading on mobile browser with offline support?

Before you downvote, please read the full post. It is a legitimate question, for which I have googled and found some answers, but they all come up short, so I am asking the community for advice.
The requirement asks for the ability to read catalogs that are in PDF format inside a mobile browser. There is also the need to read the files offline, so this kills a few options like Google's PDF viewer.
So, faced with this requirement, I have not found an easy way to embed a PDF file; therefore conversion to HTML5 or images is the route that I am thinking of taking.
In terms of HTML5 conversion I have found Flexpaper, crocodoc, Prizm, serverPDF and others, but almost all require the user to be online to read the files. Is there a client side only way to read and display PDF files? Or an intermediate browser/js friendly format?
If you optimize this project, maybe it will work:
JS and HTML PDF viewer

HTML to PDF/rtf/doc/etc converters?

We are looking for a server-side solution that is capable of taking an HTML page and generating a document in one of several formats - pdf, as well as rtf, doc, etc.
I've used LiveDocX to mail-merge elements using a Word doc template and generate PDFs with success. I also know that it is capable of supporting our requirement of generating other formats in addition to PDF. But I am not sure whether I can supply an HTML page and generate these files.
I see there is a gamut of HTML to PDF converter options. I've seen the list posted at https://stackoverflow.com/questions/3178448/list-of-html-to-pdf-converters. But what I am looking for is the additional capability to generate not just PDFs but multiple formats.
A server-side Microsoft IIS (.NET or COM+) solution is preferred, but will also look at good PHP options.
THANKS
What about some funky OpenOffice scripts that transform the given HTML page? That should be flexible enough.
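One way to script that idea is sketched below. It is only a sketch, and it makes several assumptions: that a LibreOffice/OpenOffice `soffice` binary is on the PATH, that its `--headless --convert-to` command-line mode is acceptable instead of macros, and that the helper names `soffice_command` and `convert_html` (both hypothetical) suit your setup. Some target formats may need an explicit export filter in the form `format:FilterName` when the input is HTML, so treat the format strings as something to verify on your own installation.

```python
import subprocess
from pathlib import Path


def soffice_command(html_path, target_format, out_dir, executable="soffice"):
    """Build the headless conversion command (hypothetical helper)."""
    return [
        executable,
        "--headless",                    # run without a GUI
        "--convert-to", target_format,   # e.g. "pdf"; may need "fmt:FilterName"
        "--outdir", str(out_dir),
        str(html_path),
    ]


def convert_html(html_path, formats, out_dir="out"):
    """Run one conversion per requested format; raises if soffice fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for fmt in formats:
        subprocess.run(soffice_command(html_path, fmt, out_dir), check=True)
```

For example, `convert_html("page.html", ["pdf"])` would be expected to write `out/page.pdf`, assuming the installed office suite can open the page.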

How do I convert PDF to HTML programmatically?

Are there any classes, COM objects, command line utilities, or anything else that I can make an API for that can convert a PDF to an HTML document? Obviously the conversion might be a little rough since PDFs can contain a lot more than HTML can describe. I found a utility called pdftohtml on SourceForge, but quite honestly it does a horrible job with the conversion. I don't care if the software is free or commercial, but is there anything out there at all that I can incorporate into my own software to do this sort of conversion at least decently? I know Google has developed its own method of doing this, since you can click "View as HTML" on a PDF attached to an email in Gmail, but I was hoping there was something available to the public.
Remember, PDF to HTML. I'm NOT worried about HTML to PDF.
One solution I can think of is to write a little program that reads the PDF text using a library called iText and then generates HTML files.
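A rough sketch of that approach follows. Note that it swaps in the Python library pypdf for iText, purely as an assumption to keep the example short; the function names `extract_pages` and `pages_to_html` are hypothetical. It only carries over text, so layout, images and form fields are lost.

```python
import html


def extract_pages(pdf_path):
    """Return the text of each page, using pypdf (assumed stand-in for iText)."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(pdf_path)
    # extract_text() can come back empty for scanned or image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def pages_to_html(pages, title="Converted PDF"):
    """Wrap extracted page text in a very plain HTML document."""
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head><body>",
    ]
    for number, text in enumerate(pages, start=1):
        parts.append(f'<section id="page-{number}">')
        # Treat blank lines as paragraph breaks and escape the text itself.
        for block in text.split("\n\n"):
            block = block.strip()
            if block:
                parts.append(f"<p>{html.escape(block)}</p>")
        parts.append("</section>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

Calling `pages_to_html(extract_pages("input.pdf"))` would then give one `<section>` per page, which is about as far as plain text extraction can take you.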
For Java-based PDF solutions, I don't think we have a clean way yet. All the solutions are primitive workarounds. There is no easy solution for:
1. Designing a template of a PDF
2. Then, at runtime in Java, populating data into this template, either from XML or from other data sources
It is such a simple requirement, and none of them offers a good "open-source and free" solution yet!
Eclipse BIRT comes close, but it does not handle barcode elements out of the box.
You were looking for pdf2htmlEX (C++), which converts PDF to HTML without losing text or format.
To convert further to semantic HTML, you can process the pdf2htmlEX output using my project Transcript (Python). That step is no longer lossless, however, and it works best on documents that do not deviate too much from a conventional visual layout.
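If you want to call pdf2htmlEX from your own software, a thin wrapper around the command line is probably enough. The sketch below assumes the `pdf2htmlEX` binary is installed and on the PATH and uses only its basic `pdf2htmlEX input.pdf [output.html]` form; the helper names are hypothetical, and no further options are shown because they vary between builds.

```python
import shutil
import subprocess


def pdf2htmlex_command(pdf_path, html_name=None, executable="pdf2htmlEX"):
    """Build the basic command line: pdf2htmlEX <input.pdf> [<output.html>]."""
    cmd = [executable, pdf_path]
    if html_name:
        cmd.append(html_name)
    return cmd


def convert_pdf(pdf_path, html_name=None):
    """Run the converter, failing loudly if the binary is missing."""
    if shutil.which("pdf2htmlEX") is None:
        raise RuntimeError("pdf2htmlEX was not found on PATH")
    subprocess.run(pdf2htmlex_command(pdf_path, html_name), check=True)
```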

How do I create "accessible" PDFs from HTML?

Does anyone have any suggestions on how to generate accessible PDFs (including images) from HTML?
The PDFs need to look like the original HTML, including positions of images etc.
Any special HTML structure required to help make the final PDF accessible?
I've seen questions about creating PDFs, but none of them specifically address the important issue of accessibility.
My poison of choice is Perl but references to any program, language or library will help.
I have a more in-depth question at Doctype if anyone has more general information to offer.
http://doctype.com/TiB
Also,
I, and others, would find it useful if users with accessibility problems could comment if they find the "usability experience" of using PDFs better or worse than reading from Plain Old Semantic HTML (POSH).
Thanks
Mike
Look into PrinceXML. Through CSS you can control margins, page breaks and orientation. While it is not open source, you can try it for free, but it places a small watermark in the upper right corner.
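As an illustration of the CSS side, the sketch below writes a small print stylesheet and builds a Prince command line for it. It assumes the `prince` executable is on the PATH; the file names, the `PRINT_CSS` contents and the helper names are made up for the example, and the specific page size and margins are arbitrary.

```python
import subprocess
from pathlib import Path

# Paged-media CSS: page size and orientation, margins, and forced page breaks.
PRINT_CSS = """\
@page {
  size: A4 landscape;
  margin: 2cm;
}
h1 {
  page-break-before: always;
}
"""


def prince_command(html_path, pdf_path, css_path, executable="prince"):
    """Build: prince -s <style.css> <input.html> -o <output.pdf>."""
    return [executable, "-s", str(css_path), str(html_path), "-o", str(pdf_path)]


def render(html_path, pdf_path, css_path="print.css"):
    """Write the stylesheet next to the document and run Prince."""
    Path(css_path).write_text(PRINT_CSS, encoding="utf-8")
    subprocess.run(prince_command(html_path, pdf_path, css_path), check=True)
```

Whether the resulting PDF is accessible still depends on the HTML being well structured in the first place (headings, alt text, table headers), so check the output with a screen reader or a PDF accessibility checker.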
The Adobe ColdFusion server product does a really fine job of this, not surprisingly. But it's not free, and the open source implementations of the language (Smith and BlueDragon) don't support the pdf stuff.
Developer licenses to Adobe ColdFusion are free, and you can download it.
I've done this kind of thing on a small scale by scripting Safari to print to PDF. I don't recommend it for large-scale projects, though.
By far the most capable PDF publishing tool I've ever come across is ReportLab. There is an open source library written in Python and a proprietary system that allows you to construct a document using RML, a custom XML spec. The latter is easier for more complex docs. They tend to be very flexible (and reasonable) with pricing.
Not strictly an answer to your question as it doesn't handle html-to-pdf conversions, but perhaps of use to you.