I've got a problem - our flagship product has a text field in which there is rich text formatting. Basically it uses the standard Windows Richedit control, and the output (saved in the DB) is in RTF format. I'm writing a web frontent for the same DB, and I need to display this text on a webpage (it's the "product description" field).
Luckily images cannot be pasted into it, so that's one major problem avoided, but for the rest... there are a few RTF ==> HTML convertors out there, but I'm afraid how good will the results be.
Alternatively this field is new and hasn't yet made it into production (or into testing for that matter), so we could still change it to HTML, but... the flagship product is written in Delphi 5, and I cannot find a good wysiwyg HTML editor for it (aside from embedding a browser control with CKEditor or something similar).
And attaching a RTF ==> HTML convertor would be a great deal easier, if I only could be sure that it won't mess up. So... should I attempt the RTF->HTML road, or should I rather spend my strength in looking for a HTML editor in Delphi 5?
I would say either approach is reasonable. Going with the RTF->HTML approach may make it trickier to deal with editing the existing data using a web frontend in the future, however.
Related
We are having Multiple PDF which have account tables and balance sheet within it. We have tried many Converters but the result is not satisfactory. Can anybody please suggest any good converter that would replicated the contents of PDF to Exact structure in HTML. IF any paid Converter is there please suggest me .
This is the PDF we want to convert and Show in html "http://www.marico.com/html/investor/pdf/Quarterly_Updates/Consolidated%20Financial%20Results%20-%20Q3FY11.pdf"
Have you looked into this? http://pdftohtml.sourceforge.net/
It's open source as well, so it's free and can be modified if necessary.
There's even a demo showing the before PDF and the after HTML version. Not bad if you ask me.
If you're having issues specifically with tables in PDFs, perhaps the issue are the table themselves and whatever program is being used to generate them. Not all PDFs are created equal.
ALSO: Be aware that all PDFs that I've created and come across over the years have had lots of issues when it comes to copy/pasting blocks/lines of text that have other blocks/lines of text at equal or higher height on any given page. I think Acrobat lacks the ability to define a "sequence order" of what block is selected after what (or most programs don't use it properly), so the system sorta moves from a top-down, left-to-right way of selecting content.....even if that means jumping over large blank areas or grabbing lines from multiple columns at once when you wouldn't expect it. This may be part of your tabular data issue. Your weak link here is the PDF format itself and I think perhaps you may be expecting too much from it. Turning anything into a PDF is pretty much a one-way street, especially when you start putting lots of editable text into it.
Have you tried http://www.jpedal.org/html_index.php - there is also a free online version
I have hundreds of .doc files with text that I need put on web pages.
I realize I could convert every .doc file to .txt, then use a server side include to embed the contents of each page into a webpage. This would save a lot of time because I could simply have one .php?txt=... page which will display a different .txt include depending on the link the user pressed to get there. This works perfectly content-wise.
However, all formatting is lost when it is converted to .txt (titles should be in bold)
When I convert these .doc files to .html using Microsoft Word, the ~20 line documents become bloated >300 line .htm files (probably because each paragraph is put into textboxes)
Dreamweaver's "Clean up Word HTML" helped a bit but the code was still extremely bloated.
How would you suggest going about this?
edit: I may have solved my own question, trying to embed Google docs into my page.
There is a program suite called wv (former mswordview). It has a program wvWare. This software can transform Word documents to HTML.
Furthermore you can use the output from Word and send it through tidy. This corrects markup and usually can handle the mistakes made by Word.
You can try converting the Word documents to a DocBook intermediate format, then you can easily transform the DocBook with existing tools to (X)HTML.
MS Word is bloatware. Its own markup is bloated, and therefore any attempt to automatically convert it to HTML will inherit these problems. You end up with garbage like: <strong><strong></strong></strong> for no good reason.
Dreamweaver can clean it up a lot, but nothing short of strip/remarkup is going to get you clean results.
That's why most people use PDFs for this type of issue.
My immediate reaction would be to convert the docs to PDFs. That will normally preserve formatting quite well, and users typically have their browsers set up to view PDFs one way or another (and the few who don't are undoubtedly accustomed to being unable to view a lot of documents on a lot of sites).
Alright thanks everyone for your suggestions, but I wanted to make this page accessible to everyone without pdf viewers as well.
Google docs allows you to bulk upload your text files (and converts them for you too)
You can then export them into an iframe to embed in any html document.
Are there any classes, COM objects, command line utilities, or anything else that I can make an API for that can convert a PDF to an HTML document? Obviously the conversion might be a little rough since PDFs can contain a lot more than HTML can describe. I found a utility called pdftohtml on Source Forge, but quite honestly it does a horrible job with the conversion. I don't care if the software is free or commercial, but is there anything out there at all that I can incorporate with my own software to do this sort of conversion at least decently? I know Google's developed their own method of doing this, since you can click "View as HTML" on a PDF attached to an email through Gmail, but I was hoping there was something out available to the public.
Remember, PDF to HTML. I'm NOT worried about HTML to PDF.
well one solution i can think of is to write little program that reads pdf text using library called iText and then generate html files.
well for java based PDF solutions...we dont have a clean way i guess-still.. all solutions are primitive and kind of workarounds... No easy solution for
1. Designing a template of a PDF
2. Then at runtime using java, populate data into this template...either using xml or other datasources...
such a simple requirement and NONE has a good "open-source and free" solution yet !
Eclipse BIRT comes close.. but does not handle Barcode elements ..OOB.
You were looking for pdf2htmlEX (C++), which converts PDF to HTML without losing text or format.
To convert further to semantic HTML, you can process pdf2htmlEX output using my project Transcript (Python). It is however not lossless anymore and works best on documents not deviating too much from conventional visual layout.
I'm looking for a HTML editor that kinda supports templated editing or live snippets or something like that.
Background: I'm working on a website for a friend. As there are no specifications what the webspace/webserver can or can't do, I decided to make it a pure HTML/CSS page, or rather 10 of them. I wrote a template, copied it 10 times and edited the content. And guess what, the template has to be changed.
Therefore I'm looking for a (HTML-)editor that has some kind of live template system where I can edit the content in as it where plain text and then save the project into the 10 pure HTML/CSS files.
I thought about using PHP (the only script language I've some knowledge in), but writing the underlying template script would cost me enough time that I could change all files by hand. I'm not that familiar with AJAX to know if there's a way to load content from another file. If so, this would be an option if there already is a script. With Webdeveloper (firefox extension) I could save the generated source code as HTML/CSS.
Thanks in advance
Edit: any hints how to do this without an editor are welcome
Edit2: In my mind the tool looks like a plain old text editor like SciTe, but capable of editing multiple files simultaneously in the same text area, so it looks like editing one ordinary file, but actually it's a whole bunch of files.
Dreamweaver will do this for you, it's had HTML templating of the type your describe built in from very early versions (because from how you phrase the question I do not think you're thinking along the lines of a PHP templating engine such as Smarty, but some sort of HTML layout formating)
Although I regularly look around for Dreamweaver replacements, and I've certainly been impressed by Aptana, I still tend to use Dreamweaver in my development stack simply because whereas I can compensate for some of the more coding-orientated features it misses, I find the WYSIWYG nature of the editor invaluable.
I would have used a template engine.
I wrote a post about a dead simple script using the Dwoo template engine and mod_rewrite, where I am taking the uri and loading the forrect data and template based on that. You should be able to get it running in a few minutes.
Maybe I am way off on this, but why don't you look into an Open Source Content Management System (PHP/MYSQL)? There are MANY light systems that are not like Drupal, Joomla (if you do not want the big bulk of those CMS's).
There are even a few good ones for light web design that are flat file driven.
That would be my suggestion, at least if not for this project, look into it for future projects.
Here is an example of a great micro CMS that would seem to fit the bill for what you are doing:
http://www.mini-print.com/
In my Delphi program I want to display some information generated by the application. Nothing fancy, just 2 columns of text with parts of words color-coded.
I think I basically have two options:
HTML in a TWebbrowser
RTF in a TRichEdit.
HTML is more standard, but seems to load slower, and I had to deal with The Annoying Click Sound.
Is RTF still a good alternative these days?
Note: The documents will be discarded after viewing.
I would vote for HTML.
I think it is more future oriented. The speed would not concern me.
The question of HTML or RTF may be irrelevant. If they are just used for display purposes, then the file format doesn't matter. It's really just an internal representation. (Are any files even being saved to disk?) I think the question to ask is which one solves the problem with the least amount of work.
I would be slightly concerned that the browser control is changing all the time. I doubt the richedit control will change much. I would lean towards the richedit control because I think there is less that could go wrong with it. But it's probably not a big deal either way.
Have you considered doing an ownerdraw TListView?
I'd also use HTML. Besides, you just got an answer for the clicking sound in TWebBrowser.
If you'd rather not use TWebBrowser, take a look at Dave Baldwin's free HTML Display Components.
I would vote for HTML, too.
We started an app a while ago...
We wanted to
display some information generated by the application. Nothing fancy, just...
(do you hear the bells ring???)
Then we wanted to display more information and style it even more....
...someone decided, that RTF isn't enough anymore, but for backwards compatibility we moved on to MS Word over OLE-Server. That was the end of talking about performance anymore.
I think if we would have done that in HTML it would be much faster now.
RTF is much easier to deal with, as the TRichEdit control is part of every single Windows installation, and has much less overhead than TWebBrowser (which is basically embedding an ActiveX version of Internet Explorer into your app).
TRichEdit is also much easier to use to programmatically add text and formatting. Using the SelStart and SelLength, along with the text Attributes, makes adding bolding and italics, setting different fonts, etc. simple. And, as Re0sless said, TRichEdit can easily be printed while TWebBrowser makes it more complicated to do so.
I would vote RTF as I dont like the fact TWebBrowser uses Internet explorer, as we have had trouble with this in the past on tightly locked down computers.
Also TRichEdit has a print method build in, where as you have to do all sorts of messing about to get the TWebBrowser to print.
Nobody seems to have mentioned a reporting component yet. Yes, it is overkill right now, but if you use it anyway (and maybe you already have got some reporting to do in your app, so the component is already included) you can just display the preview and allow to print / export to pdf later, if it makes any sense. Also if you later decide that you want to have a fancier display there is nothing holding you back.
If both HTML and RTF won't satisfy your need, you could also use an open source text/edit component that supports coloring words or create your own edit component based on a Delphi component.
Another alternative to the HTML browser is the "Embedded Web Browser" components which I used a few projects for displaying html documents to the user. You have complete control over the embedded browser, and I don't recall any clicks when a page is loaded.
I vote for HTML also
RTF is good only for its editor, else then you'd better go standard.
RTF offers some useful text editing options like horizontal tabulator which are not available in HTML. Automatic hyperlink detection is also a nice extra. But I think I would prefer HTML, if these features are not required.
I vote for HTML.
Easier to generate programmatically.
Widely supported.
Since you don't need WYSIWYG capabilities I think HTML advantages trump RTF. Moreover, should the need to export generated data for further, WP-like editing arise, remember that major word processor can open and convert HTML files.
Use HTML, but with 'Delphi Wrapper for Chromium Embedded' by Henri Gourvest , Chromium embedded uses the core that powers Google Chrome.
Don't use TWebBrowser, I'm suffering from all programs that use IE's web control - the font is too small on my 22' monitor with a 1920x1080 resolution, I use Windows 7 and my system's DPI is 150% (XP mode), I tried everything to tweak trying to fix that, no luck...