Are there any classes, COM objects, command-line utilities, or anything else that I can build an API around to convert a PDF to an HTML document? Obviously the conversion might be a little rough, since PDFs can contain a lot more than HTML can describe. I found a utility called pdftohtml on SourceForge, but quite honestly it does a horrible job with the conversion. I don't care if the software is free or commercial, but is there anything out there at all that I can incorporate into my own software to do this sort of conversion at least decently? I know Google has developed their own method of doing this, since you can click "View as HTML" on a PDF attached to an email in Gmail, but I was hoping there was something available to the public.
Remember, PDF to HTML. I'm NOT worried about HTML to PDF.
Well, one solution I can think of is to write a small program that reads the PDF text using a library called iText and then generates HTML files.
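As a rough sketch of the second half of that idea (generating the HTML), the Python below wraps text that has already been extracted, one string per page, in a minimal HTML document. The extraction step itself is not shown and would come from a PDF library such as iText; the function name `pages_to_html` and the `page` CSS class are made up for illustration. Only plain text survives this way, so layout, fonts, and images are lost.

```python
from html import escape


def pages_to_html(pages, title="Converted PDF"):
    """Wrap already-extracted page text in a minimal HTML document.

    `pages` is a list of strings, one per PDF page (hypothetical input;
    the PDF library that produces it is not part of this sketch).
    """
    parts = [
        "<!DOCTYPE html>",
        "<html>",
        f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>',
        "<body>",
    ]
    for number, text in enumerate(pages, start=1):
        parts.append(f'<div class="page" id="page-{number}">')
        # Treat blank lines as paragraph breaks and escape markup characters.
        for block in text.split("\n\n"):
            block = block.strip()
            if block:
                parts.append("<p>" + escape(block).replace("\n", "<br>") + "</p>")
        parts.append("</div>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)
```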
For Java-based PDF solutions, I guess we still don't have a clean way. All the solutions are primitive and are really workarounds. There is no easy solution for:
1. Designing a template of a PDF
2. Then, at runtime in Java, populating data into this template, from either XML or other data sources
Such a simple requirement, and nothing offers a good "open-source and free" solution yet!
Eclipse BIRT comes close, but it does not handle barcode elements out of the box.
You were looking for pdf2htmlEX (C++), which converts PDF to HTML without losing text or format.
To convert further to semantic HTML, you can process pdf2htmlEX output using my project Transcript (Python). It is however not lossless anymore and works best on documents not deviating too much from conventional visual layout.
Related
Is there a way to parse a PDF using AS3 via AIR on mobile?
I don't need the full content of the PDF, only some data. Is that possible?
Edit for clarification:
I have a PDF file that was originally created from an XML file; what I need is to be able to retrieve that XML, or at least to find a string inside that PDF so I can make a call to a web service.
Original:
There's nothing native in AS3 for this kind of thing apart from AlivePDF. It won't let you traverse a document the way you would XML, which seems to be what you're trying to do by taking a small bit of a PDF, but it will let you create PDFs, add pages, change fonts, etc.
You weren't entirely clear on what you're attempting to achieve; if you update your question with a bit more detail, we may be able to help a bit more.
Edit:
From the refined question, AlivePDF is not what you're after as it's really only for PDF generation. I'm assuming you're after a method to traverse the document like you would XML, by looking for a tag and extracting the information. I've not found a way to do this other than iterating through the document and searching manually which probably isn't what you're after.
After some searching I found as3-pdfreader, which doesn't seem to be complete at the moment. However, on the project home page the roadmap says parsing PDF files is complete; I've not been able to try it out yet, though.
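To illustrate what "iterating through the document and searching manually" can look like, here is a minimal sketch in Python rather than AS3 (the same steps would need porting). It looks for a byte string in the raw file and then in every stream that happens to decompress with zlib, which is how /FlateDecode streams are stored. The function name `find_in_pdf` is made up for illustration. It is not a real PDF parser: it ignores other filters, encryption, and object streams, and text drawn on a page is often encoded per font, so a miss does not prove the string is absent.

```python
import re
import zlib

# Matches the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def find_in_pdf(data, needle):
    """Return True if `needle` (bytes) occurs in the raw PDF bytes or in
    any stream that can be inflated with zlib."""
    if needle in data:
        return True
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates trailing or slightly truncated input.
            decoded = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not a Flate stream, or it uses another filter
        if needle in decoded:
            return True
    return False
```

Usage would be along the lines of `find_in_pdf(open("file.pdf", "rb").read(), b"<order")`, where the file name and marker are placeholders.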
We are looking for a server-side solution that is capable of taking an HTML page and generating a document in one of several formats - pdf, as well as rtf, doc, etc.
I've used LiveDocX to mail-merge elements using a Word document template and generate PDFs with success. I also know that it is capable of supporting our requirement of generating other formats in addition to PDF. But I am not sure whether I can supply an HTML page and generate these files.
I see there is a whole gamut of HTML to PDF converter options. I've seen the list posted at https://stackoverflow.com/questions/3178448/list-of-html-to-pdf-converters. But what I am looking for is the additional capability to generate not just PDFs but multiple formats.
A server-side Microsoft IIS (.NET or COM+) solution is preferred, but will also look at good PHP options.
THANKS
What about some funky OpenOffice scripts that transform the given HTML page? That should be flexible enough.
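One way such a script might look, as a sketch only: it assumes LibreOffice (the OpenOffice successor) is installed and that its `soffice` binary is on the PATH, and it uses the headless `--convert-to` option. The helper names are made up for illustration, and for HTML input some target formats may need an explicit export filter appended to the format (for example `pdf:writer_web_pdf_Export`), which is worth checking against your installed version.

```python
import subprocess
from pathlib import Path


def build_convert_command(html_path, target_format, out_dir, soffice="soffice"):
    """Build the argument list for a headless LibreOffice conversion.

    `soffice` is assumed to be on the PATH; pass a full path otherwise.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(html_path),
    ]


def convert(html_path, target_format, out_dir):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(build_convert_command(html_path, target_format, out_dir), check=True)
    # The output keeps the input's base name with the new extension.
    extension = target_format.split(":")[0]
    return Path(out_dir) / (Path(html_path).stem + "." + extension)
```

Calling `convert("page.html", "pdf", "out")` and then the same with `"rtf"` or `"doc"` would cover the multiple-format requirement from one input, subject to the filter caveat above.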
I've got a problem - our flagship product has a text field with rich text formatting. Basically it uses the standard Windows RichEdit control, and the output (saved in the DB) is in RTF format. I'm writing a web frontend for the same DB, and I need to display this text on a webpage (it's the "product description" field).
Luckily images cannot be pasted into it, so that's one major problem avoided, but for the rest... there are a few RTF ==> HTML converters out there, but I'm worried about how good the results will be.
Alternatively this field is new and hasn't yet made it into production (or into testing for that matter), so we could still change it to HTML, but... the flagship product is written in Delphi 5, and I cannot find a good wysiwyg HTML editor for it (aside from embedding a browser control with CKEditor or something similar).
And attaching an RTF ==> HTML converter would be a great deal easier, if only I could be sure that it won't mess things up. So... should I attempt the RTF->HTML road, or should I rather spend my energy looking for an HTML editor for Delphi 5?
I would say either approach is reasonable. Going with the RTF->HTML approach may make it trickier to deal with editing the existing data using a web frontend in the future, however.
I am looking for a (preferably free) DLL that can be used on a web site to convert PDF documents to HTML in a .NET IIS environment. It would be nice if it could accept the PDF as a byte array or file stream, and output the HTML as a stream suitable for Response.Write. It would really be great if the output HTML retained form inputs.
Has anyone seen such an animal?
Just bear in mind HTML does not have all the features of PDF so it may not look as good in HTML format.
You can do it with Aspose.PDF for .NET.
It's not cheap, but their products are great and the documentation and user forums are amazing. Can't say enough good things about the product. If you do a lot of PDF manipulation, the price isn't bad for what you get.
I'm looking for a HTML editor that kinda supports templated editing or live snippets or something like that.
Background: I'm working on a website for a friend. As there are no specifications what the webspace/webserver can or can't do, I decided to make it a pure HTML/CSS page, or rather 10 of them. I wrote a template, copied it 10 times and edited the content. And guess what, the template has to be changed.
Therefore I'm looking for an (HTML) editor that has some kind of live template system where I can edit the content as if it were plain text and then save the project into the 10 pure HTML/CSS files.
I thought about using PHP (the only script language I've some knowledge in), but writing the underlying template script would cost me enough time that I could change all files by hand. I'm not that familiar with AJAX to know if there's a way to load content from another file. If so, this would be an option if there already is a script. With Webdeveloper (firefox extension) I could save the generated source code as HTML/CSS.
Thanks in advance
Edit: any hints how to do this without an editor are welcome
Edit2: In my mind the tool looks like a plain old text editor like SciTe, but capable of editing multiple files simultaneously in the same text area, so it looks like editing one ordinary file, but actually it's a whole bunch of files.
Dreamweaver will do this for you; it has had HTML templating of the type you describe built in since very early versions (from how you phrase the question, I do not think you're thinking along the lines of a PHP templating engine such as Smarty, but of some sort of HTML layout formatting).
Although I regularly look around for Dreamweaver replacements, and I've certainly been impressed by Aptana, I still tend to use Dreamweaver in my development stack simply because whereas I can compensate for some of the more coding-orientated features it misses, I find the WYSIWYG nature of the editor invaluable.
I would have used a template engine.
I wrote a post about a dead simple script using the Dwoo template engine and mod_rewrite, where I take the URI and load the correct data and template based on that. You should be able to get it running in a few minutes.
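If the server cannot run PHP at all, the same template idea can be applied offline and the results uploaded as plain files. The sketch below is not the Dwoo script mentioned above; it swaps in Python's standard `string.Template` instead, and the template text, field names (`title`, `body`), and function name `render_site` are all made up for illustration. Changing the template then means editing one string and re-running the script.

```python
from pathlib import Path
from string import Template

# One shared layout; $title and $body are filled in per page.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title><link rel="stylesheet" href="style.css"></head>
<body>
<h1>$title</h1>
$body
</body>
</html>
""")


def render_site(pages, out_dir):
    """Write one HTML file per entry in `pages`.

    `pages` maps a file name to a dict with "title" and "body" keys.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for filename, fields in pages.items():
        target = out / filename
        # substitute() raises KeyError if a field is missing from `fields`.
        target.write_text(PAGE_TEMPLATE.substitute(fields), encoding="utf-8")
        written.append(target)
    return written
```

The per-page content could just as well live in ten small text files that the script reads, which keeps the editing step close to "plain text".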
Maybe I am way off on this, but why don't you look into an Open Source Content Management System (PHP/MYSQL)? There are MANY light systems that are not like Drupal, Joomla (if you do not want the big bulk of those CMS's).
There are even a few good ones for light web design that are flat file driven.
That would be my suggestion, at least if not for this project, look into it for future projects.
Here is an example of a great micro CMS that would seem to fit the bill for what you are doing:
http://www.mini-print.com/