We have multiple PDFs that contain account tables and balance sheets. We have tried many converters, but the results are not satisfactory. Can anybody suggest a good converter that replicates the contents of a PDF in HTML with exactly the same structure? If there is a suitable paid converter, please suggest it too.
This is the PDF we want to convert and show in HTML: "http://www.marico.com/html/investor/pdf/Quarterly_Updates/Consolidated%20Financial%20Results%20-%20Q3FY11.pdf"
Have you looked into this? http://pdftohtml.sourceforge.net/
It's open source as well, so it's free and can be modified if necessary.
There's even a demo showing the original PDF and the resulting HTML. Not bad, if you ask me.
If you're having issues specifically with tables in PDFs, perhaps the problem is the tables themselves and whatever program was used to generate them. Not all PDFs are created equal.
ALSO: Be aware that every PDF I've created or come across over the years has had lots of problems with copying and pasting blocks or lines of text that have other blocks or lines of text at the same height or higher on the page. I think Acrobat lacks the ability to define a "sequence order" for which block is selected after which (or most programs don't use it properly), so selection roughly proceeds top-down and left-to-right, even if that means jumping over large blank areas or grabbing lines from several columns at once when you wouldn't expect it. This may be part of your problem with the tabular data. The weak link here is the PDF format itself, and I think you may be expecting too much from it. Turning anything into a PDF is pretty much a one-way street, especially when you put lots of editable text into it.
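To see why the selection order mixes columns, here is a minimal sketch, assuming each text fragment on a page has already been extracted with hypothetical (x, y) coordinates measured from the top-left corner. It only illustrates the ordering problem; it is not how Acrobat or any particular converter works internally. Sorting purely top-down and then left-to-right interleaves two side-by-side columns, while grouping by column first keeps them apart (the column boundary is assumed to be known).

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """A hypothetical extracted text fragment: its position plus its text."""
    x: float  # distance from the left edge
    y: float  # distance from the top edge (grows downward)
    text: str


def naive_order(fragments: List[Fragment]) -> List[str]:
    """Top-down, then left-to-right: roughly how selection behaves in many viewers."""
    return [f.text for f in sorted(fragments, key=lambda f: (f.y, f.x))]


def column_order(fragments: List[Fragment], column_split: float) -> List[str]:
    """Read the left column completely, then the right one (split x is assumed known)."""
    left = [f for f in fragments if f.x < column_split]
    right = [f for f in fragments if f.x >= column_split]
    return naive_order(left) + naive_order(right)


# Two columns of a hypothetical balance sheet, printed side by side.
page = [
    Fragment(50, 100, "Assets"),
    Fragment(300, 100, "Liabilities"),
    Fragment(50, 120, "Cash 500"),
    Fragment(300, 120, "Loans 200"),
]

print(naive_order(page))        # ['Assets', 'Liabilities', 'Cash 500', 'Loans 200']
print(column_order(page, 200))  # ['Assets', 'Cash 500', 'Liabilities', 'Loans 200']
```

A real converter has to guess the column boundaries from the layout, which is exactly where table conversion tends to go wrong.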
Have you tried http://www.jpedal.org/html_index.php? There is also a free online version.
I need help figuring out how to set up a report in SSRS so that it can be viewed as both HTML and PDF output from a single .rdl file.
Thanks
This is a pretty broad question. If you have specific issues, then you should ask questions about those issues. You will most likely see down-votes or votes to close this question because of this. Don't take that to mean people aren't willing to help; they are, but Stack Overflow is kept clean by closing questions that don't conform to the standards laid out here: https://stackoverflow.com/help/how-to-ask
Anyway....
Generally speaking, in most cases you can export a report to PDF without any real issues.
However, not all renderers support every feature, so you might need to look up the differences between the default HTML renderer and the PDF renderer. The following usually cause issues:
Custom Fonts
Overlapping objects such as having text over an image
Page size and margins
Resolution of any images you might want to use in the report.
There will be others that I can't think of, but I suggest you build the report step by step and test each significant piece of work by exporting to PDF and checking the results.
As you come across problems that you cannot resolve, you can come back here and post questions about those specific issues.
I have a task at work that I have already completed once. The issue is Outlook and HTML templates. I created a large table to be used as a pricing table for our customers, but people don't know how to use HTML, even when I make it foolproof with step-by-step instructions. I have attached the exact layout I need to produce, to get some advice on which solution would be best for me. I proposed Adobe LiveCycle Designer, but I feel I am getting a snobby response just for suggesting it. PLEASE HELP! I have done this 4 times using Acrobat Pro and the client hates it. I also redid it as an HTML template with Outlook-specific code so it doesn't get mangled when it is forwarded to others.
OK, I need a reputation of 10 to post a picture, so I will describe it as best I can. Can I send the image to anyone willing to help?
The form has an image header, then a title and a report date with an input text field next to it. Below that is the table, which has 12 columns and 29 rows. Each cell needs to be expandable, so that if the client types in more characters, the form expands to show all the text in each cell. I would like everything else to adjust accordingly.
I told them to use an Excel document, but they said no. Then I suggested simply making each cell larger so there is no need for them to be expandable. They shot that down as well because of the way it looks. I really don't think I can pull this off without Adobe LiveCycle Designer.
I have a website, http://www.bccfalna.com/, and its content is in Hindi. I want to make all these pages read-only for visitors so that they cannot copy the content.
I have written some books in Hindi on computer technology, and I know there is very little information in Hindi on the Internet about computers and IT, so I want to sell my ebooks in PDF format.
To show how useful my books are, I have placed all of their content on my website as text, so that people can read it and decide whether the book is useful enough to buy.
I have put the whole books on my site so that search engines will send more and more traffic to it, but I am afraid that, because all the content is there as text, anyone can copy it and will not be interested in buying the PDF ebook.
I want people to be able to read the content of my site but not copy it into a word processor.
Is it possible?
I don't want to turn the content into images, because modern search engines like Google and Yahoo don't give much weight to image-based sites.
I don't want to use Flash-based sites either, for the same reason: modern search engines don't pay much attention to them.
I want my content in text format, but read-only. Is that possible? If yes, I would like to know how; if no, I would like an alternative solution.
Is there a genius out there who can solve this problem? Thanks.
Generally speaking, any web content that is readable by a search engine will also be readable and copyable by people visiting your page.
I suppose you could examine the user_agent in the HTTP request to determine whether it originated from a popular search engine or not; if it did, return the plain text of your content; if it did not, return a raster image of your content (text in an image can't be selected for copying and pasting, but it could be OCR'd or otherwise printed by the user). Some websites use a script to disable right-clicking to save an image (but such scripts can easily be circumvented). Some sites place a transparent image over the image containing the content (but this, too, can be circumvented). Note that the user_agent can be falsified if the visitor knows you're treating search engines specially.
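A minimal sketch of the user_agent check, assuming your server framework hands you the User-Agent header as a plain string. The crawler substrings are illustrative assumptions, not a complete list, and rendering the image itself is left out.

```python
# Illustrative, incomplete list of substrings used by well-known crawlers.
CRAWLER_MARKERS = ("googlebot", "bingbot", "slurp")


def is_search_engine(user_agent: str) -> bool:
    """Rough check: does the User-Agent string mention a known crawler?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def choose_representation(user_agent: str) -> str:
    """Return "text" for crawlers and "image" for everyone else."""
    return "text" if is_search_engine(user_agent) else "image"


print(choose_representation("Mozilla/5.0 (compatible; Googlebot/2.1)"))    # text
print(choose_representation("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # image
```

Because any visitor can send a crawler's User-Agent string, this only filters out casual visitors.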
I suggest the best approach, though, is to keep things simple. Only publish the first chapter of your book and a table of contents online, or else only publish the first page of each chapter, or something similar. Search engines do not need the complete text of your book, only representative samples. Nobody will go to the trouble of copy/pasting your text if they can only get to a portion of the complete book.
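Publishing only samples is easy to automate. A minimal sketch, assuming the book is available as a list of hypothetical (chapter title, chapter text) pairs: it emits the table of contents plus the first few words of each chapter. Splitting on whitespace also works for Hindi, since Devanagari words are separated by spaces.

```python
from typing import List, Tuple


def build_preview(chapters: List[Tuple[str, str]], words_per_chapter: int) -> str:
    """Table of contents plus the opening words of each chapter."""
    lines = ["Contents"]
    lines += [f"{i}. {title}" for i, (title, _) in enumerate(chapters, start=1)]
    for title, body in chapters:
        words = body.split()  # whitespace split; fine for space-separated scripts
        excerpt = " ".join(words[:words_per_chapter])
        if len(words) > words_per_chapter:
            excerpt += " ..."  # signal that the rest is in the paid ebook
        lines += ["", title, excerpt]
    return "\n".join(lines)


print(build_preview([("Intro", "one two three four"), ("HTML", "alpha beta")], 2))
```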
You can't make it indexable by search engines and at the same time impossible to copy and paste: Google has to be able to read the words of your text to use them in its index. Maybe you could put snippets of the parts you want indexed in text format and put the majority in images or Flash. It's not uncommon to see chapter previews on websites selling books.
Try Google Books:
I wasn't sure whether it works with Hindi (it does; some examples: http://www.scribd.com/doc/15257971/Google-Hindi-Books).
This solution allows Google to index the whole content and everyone to read it. Even so, copying remains awkward.
http://books.google.com/googlebooks/tour/
"Read-only" means they can't modify your web pages; "readable but not copyable" is impossible by definition, and makes about as much sense as "I want to give someone some water, but I don't want it to be wet". So, to answer your question: no, this is not possible at all. (I regularly have to deal with people who think that this and other laws of physics or mathematics don't apply to them, so sorry if I sound a bit rude.)
On a practical level, if you only give them some of the information, then they will only be able to copy that part of the information. (If they buy the book, they will be able to copy the rest from there.)
As others here have said, what you are asking is not possible.
If you host content for people to view in a browser, and for Google to index, there is absolutely no way to stop anyone from copying it. It is possible to make copying the content difficult (or at least inconvenient), but there's no way to stop someone from copying it if that's what they really want to do.
The only alternative, as others have already said, is to only post the first chapter of the book, and allow your readers to make a judgement based on that chapter. If they like the chapter they'll buy the whole book. This is a pretty common practice.
I understand that posting only part of the content is not what you want, but if you want to make it impossible to copy the whole book then this is your only real option.
The other alternative is to not worry about it. Cory Doctorow (and others I'm sure) publishes all his books under a Creative Commons license. They are free to download from his website but he still manages to make money from selling actual books. If people like your work enough, they'll pay to have it in a nice format.
There is a way to instruct the browser to disable text selection. This does not prevent copying, however; it just makes it difficult. Not all browsers recognize the instruction, especially older ones, and there are ways around it: the user can download the entire page and find the text embedded in the HTML.
Another way is to make the content a graphic rather than text. Then anyone who really wanted to copy your content would have to run it through OCR (optical character recognition) and then proofread and correct the result.
Another way is to make it into a Flash animation, but that too can be bypassed by taking a screen capture and running OCR on it. In short, there is no way to prevent copying of material displayed in a browser, but you can make it difficult and, hopefully, people won't bother.
FYI, people typically want their website to be read-only to make it difficult or impossible for hackers to change its content (i.e. replace it with vandalized content), not to prevent people from accessing the content legitimately uploaded to the website.
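One such instruction is the CSS user-select property (the answer does not name the mechanism it had in mind, so this is an assumption). A minimal sketch that wraps a paragraph in it; note that the escaped text is still sitting in the page source, which is exactly the workaround the answer mentions.

```python
import html

# Asks the browser not to let the user select the text. Browsers may ignore it,
# and the text itself is still present in the HTML source.
NO_SELECT_CSS = "user-select: none; -webkit-user-select: none;"


def protected_paragraph(text: str) -> str:
    """Wrap HTML-escaped text in a paragraph that discourages selection."""
    return f'<p style="{NO_SELECT_CSS}">{html.escape(text)}</p>'


print(protected_paragraph("Chapter 1 < Chapter 2"))
```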
Hope this helps.
Scan the text and post it as an image. People can still read it but cannot copy the text directly. They can copy the image, but that doesn't matter: it is no different from reading the screen, and they would have to retype everything if they wanted to steal the work.
I have a report I need to print from an application I usually maintain. My question, which interests me beyond the scope of this task, is: what are the ways to format an HTML page for printing? What are the pros and cons of each?
Note that the page is meant only to be printed. I'm not asking about an HTML page that also looks OK when printed.
Generally speaking, I know I can either rely heavily on <table>s or on <div>s, but I don't know which way to go.
I would also appreciate some resources to get me started, or to help with known problems, in any method you suggest.
Thanks,
Asaf
As you can certainly see, printing and web presentation are two different creatures. The main issue is the bounds of the printed page, which do not exist in a web page. Even if you think you have laid out a page in a way that will fit a printed page, you still need to deal with the fact that the font you are using may not work or scale correctly on the user's printer.
I know of three ways to deal with this issue:
Use fixed-size fonts (like Courier), limit yourself to an 80-column width, and only use font characters: for example, use asterisks for borders. This is VERY old school: your reports look simple, old and plain. But they will always print the way you intended.
Convert your report to an image. Images can be made to conform to a specific size that fits on a page. However, you can still have issues due to printer margin settings.
Let another application do the work for you. What I mean by this is: put your report into a PDF or a spreadsheet. Both PHP and Perl have easy-to-use modules for creating a PDF, with no licensing needed. Perl has a fantastic spreadsheet module. This route takes a little learning up front, but frees you from having to be an expert on printing (which can be a real pain).
In case you DO want a page that also looks good when viewed in a browser, consider multiple stylesheets for different media.
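Each of the three approaches above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for the PHP and Perl modules the answer mentions: the function names are hypothetical, CSV stands in for a real spreadsheet or PDF library, and the US Letter page with 0.5-inch margins is an assumption (actual printer margins can still differ).

```python
import csv
import io
from typing import List, Sequence, Tuple

WIDTH = 80  # classic fixed-width line length for plain-text reports


def ascii_report(title: str, rows: List[Tuple[str, str]]) -> str:
    """Way 1: label/value rows inside an asterisk border, exactly WIDTH columns wide."""
    inner = WIDTH - 4  # room left after the "* " and " *" borders

    def line(content: str) -> str:
        # Truncate or pad so every line has the same width.
        return "* " + content[:inner].ljust(inner) + " *"

    border = "*" * WIDTH
    out = [border, line(title.center(inner)), border]
    for label, value in rows:
        gap = max(inner - len(label) - len(value), 1)  # right-align the value
        out.append(line(label + " " * gap + value))
    out.append(border)
    return "\n".join(out)


def printable_pixels(page_w_in: float, page_h_in: float,
                     margin_in: float, dpi: int) -> Tuple[int, int]:
    """Way 2: pixel size for an image that fills the page inside uniform margins."""
    width = round((page_w_in - 2 * margin_in) * dpi)
    height = round((page_h_in - 2 * margin_in) * dpi)
    return width, height


def report_to_csv(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    """Way 3: hand the data to a spreadsheet program as CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes fields containing commas, quotes or newlines
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


print(ascii_report("Q3 Sales", [("Total", "1,234.00")]))
print(printable_pixels(8.5, 11, 0.5, 300))  # (2250, 3000) for US Letter at 300 DPI
print(report_to_csv(["Item", "Amount"], [["Widget, large", 3]]), end="")
```

The plain-text report only lines up when printed in a monospaced font such as Courier. When writing the CSV to a file, open it with `newline=""` so the csv module controls the line endings.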
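A minimal sketch of the multiple-stylesheet idea: the browser applies each linked stylesheet only for the medium named in its `media` attribute. The file names are hypothetical; the print stylesheet would hold the page-specific rules such as margins and hidden navigation.

```python
import html


def page_head(title: str, screen_css: str = "screen.css",
              print_css: str = "print.css") -> str:
    """Build a <head> whose stylesheets are chosen by output medium."""
    return "\n".join([
        "<head>",
        f"  <title>{html.escape(title)}</title>",
        # Applied when the page is displayed on screen.
        f'  <link rel="stylesheet" href="{screen_css}" media="screen">',
        # Applied when the page is printed (and in print preview).
        f'  <link rel="stylesheet" href="{print_css}" media="print">',
        "</head>",
    ])


print(page_head("Q3 report"))
```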
In my Delphi program I want to display some information generated by the application. Nothing fancy, just 2 columns of text with parts of words color-coded.
I think I basically have two options:
HTML in a TWebbrowser
RTF in a TRichEdit.
HTML is more standard, but seems to load slower, and I had to deal with The Annoying Click Sound.
Is RTF still a good alternative these days?
Note: The documents will be discarded after viewing.
I would vote for HTML.
I think it is more future oriented. The speed would not concern me.
The question of HTML or RTF may be irrelevant. If they are just used for display purposes, then the file format doesn't matter. It's really just an internal representation. (Are any files even being saved to disk?) I think the question to ask is which one solves the problem with the least amount of work.
I would be slightly concerned that the browser control is changing all the time. I doubt the richedit control will change much. I would lean towards the richedit control because I think there is less that could go wrong with it. But it's probably not a big deal either way.
Have you considered doing an ownerdraw TListView?
I'd also use HTML. Besides, you just got an answer for the clicking sound in TWebBrowser.
If you'd rather not use TWebBrowser, take a look at Dave Baldwin's free HTML Display Components.
I would vote for HTML, too.
We started an app a while ago...
We wanted to
display some information generated by the application. Nothing fancy, just...
(do you hear the bells ring???)
Then we wanted to display more information and style it even more....
...someone decided that RTF wasn't enough anymore, and for backwards compatibility we moved on to MS Word over an OLE server. That was the end of any talk about performance.
I think if we had done that in HTML, it would be much faster now.
RTF is much easier to deal with, as the TRichEdit control is part of every single Windows installation, and has much less overhead than TWebBrowser (which is basically embedding an ActiveX version of Internet Explorer into your app).
TRichEdit is also much easier to use when adding text and formatting programmatically. Using SelStart and SelLength, along with the text attributes, makes adding bold and italics, setting different fonts, etc. simple. And, as Re0sless said, TRichEdit can easily be printed, while TWebBrowser makes printing more complicated.
I would vote RTF, as I don't like the fact that TWebBrowser uses Internet Explorer; we have had trouble with this in the past on tightly locked-down computers.
Also, TRichEdit has a built-in print method, whereas you have to do all sorts of messing about to get TWebBrowser to print.
Nobody seems to have mentioned a reporting component yet. Yes, it is overkill right now, but if you use one anyway (and maybe you already have some reporting to do in your app, so the component is already included), you can just display the preview and allow printing or exporting to PDF later, if that makes sense. Also, if you later decide that you want a fancier display, there is nothing holding you back.
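The Delphi calls themselves are not shown in the thread, but the same result can also be produced by generating RTF markup directly and loading it into the control. A minimal Python sketch with hypothetical function names: colors go into the document's color table, and each colored run is wrapped in a group with \cfN, where index 0 is the default color. Tabs become \tab, the RTF tab mentioned in another answer. Non-ASCII text is handled only for characters in the Basic Multilingual Plane.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]


def rtf_escape(text: str) -> str:
    """Escape text for RTF: syntax characters, tabs and non-ASCII characters."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)  # \, { and } are RTF syntax
        elif ch == "\t":
            out.append("\\tab ")  # RTF tab control word
        elif code > 127:
            # \uN takes a signed 16-bit number; "?" is the fallback for old readers.
            # Characters outside the BMP (code > 0xFFFF) are not handled here.
            out.append(f"\\u{code - 65536 if code > 32767 else code}?")
        else:
            out.append(ch)
    return "".join(out)


def to_rtf(segments: List[Tuple[str, Optional[RGB]]]) -> str:
    """Build an RTF document from (text, color) runs; color None = default color."""
    colors: List[RGB] = []
    body = []
    for text, rgb in segments:
        if rgb is None:
            body.append(rtf_escape(text))
            continue
        if rgb not in colors:
            colors.append(rgb)
        index = colors.index(rgb) + 1  # entry 0 of the color table is "auto"
        body.append(f"{{\\cf{index} {rtf_escape(text)}}}")
    table = "".join(f"\\red{r}\\green{g}\\blue{b};" for r, g, b in colors)
    return f"{{\\rtf1\\ansi{{\\colortbl ;{table}}}{''.join(body)}\\par}}"


print(to_rtf([("Err", (255, 0, 0)), ("or", None), ("\tcount", None)]))
```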
If both HTML and RTF won't satisfy your need, you could also use an open source text/edit component that supports coloring words or create your own edit component based on a Delphi component.
Another alternative to the HTML browser is the "Embedded Web Browser" components, which I have used in a few projects for displaying HTML documents to the user. You have complete control over the embedded browser, and I don't recall any clicks when a page is loaded.
I vote for HTML also
RTF is good only for its editor; otherwise, you'd better go with the standard.
RTF offers some useful text editing options, like horizontal tabs, which are not available in HTML. Automatic hyperlink detection is also a nice extra. But I think I would prefer HTML if these features are not required.
I vote for HTML.
Easier to generate programmatically.
Widely supported.
Since you don't need WYSIWYG capabilities, I think HTML's advantages trump RTF's. Moreover, should the need arise to export the generated data for further word-processor-style editing, remember that major word processors can open and convert HTML files.
Use HTML, but with 'Delphi Wrapper for Chromium Embedded' by Henri Gourvest. Chromium Embedded uses the engine that powers Google Chrome.
Don't use TWebBrowser. I suffer with every program that uses IE's web control: the font is too small on my 22" monitor at 1920x1080 resolution. I use Windows 7 and my system's DPI is 150% (XP mode). I tried every tweak I could to fix that, with no luck...
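To illustrate the "easier to generate programmatically" point for the original use case (two columns of text with parts of words color-coded), here is a minimal Python sketch. The data layout of (text, color) runs is a hypothetical choice; `html.escape` keeps characters such as < and & from breaking the markup.

```python
import html
from typing import List, Optional, Tuple

# A run of text plus an optional CSS color; None means the default color.
Run = Tuple[str, Optional[str]]


def render_runs(runs: List[Run]) -> str:
    """Turn runs into escaped HTML, wrapping colored runs in a styled span."""
    parts = []
    for text, color in runs:
        escaped = html.escape(text)
        parts.append(f'<span style="color:{color}">{escaped}</span>' if color else escaped)
    return "".join(parts)


def two_column_html(rows: List[Tuple[List[Run], List[Run]]]) -> str:
    """One table row per entry, with a left and a right cell."""
    body = "".join(
        f"<tr><td>{render_runs(left)}</td><td>{render_runs(right)}</td></tr>"
        for left, right in rows
    )
    return f"<table>{body}</table>"


print(two_column_html([([("Err", "red"), ("or", None)], [("x < 3", None)])]))
```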