Convert html/css print media display to .doc with appropriate page breaks? - html

I'm looking to export a page that looks good in print media, to word.
Can this be done automatically, or mostly automatically with office apis?
The alternative is to create a program that reads all our style meta data and font meta data and convert to word and force a download.
The issue is our style metadata is already built for css, its a web app after all. And writing my own css parser, doesn't sound like a good use of time.

I know this sounds too simple to be true, but I belive you can simply rename a ".html" file to ".doc" to force it to open in word, and let office's html rendering take care of the rest.

If it's for reporting purposes, and you think you might have use for more of the same in the future, you could look at something like reporting services as a way of creating a report that can be downloaded in various formats. I'm not 100% sure if the newest version allows the creation of .doc files, but you can purchase plugins to permit this.

Related

.doc / .docx problem to display in browser

I am trying to find a solution for a website that I made so that the word documents will not download automatically anymore, but that they display directly in a new page in the browser when I click them.
Can someone help me?? How can I do that in HTML?
Thanks a lot in advance
There are two approaches you can take for this:
Ensure that all visitors to your site have a browser extension which can display Word documents. (This isn't something you can control for a typical website but might be an option for a company Intranet)
Convert the Word documents to a format that the browser can display (i.e. HTML). You could do this manually or with code (which could be client-side or server-side). The Word document formats are notoriously complex so you would need to find a third party library that could do this for you.

Automatic translation for HTML files

I want translate a web page for multiple languages. I can create a .html file for each language, but I would like only one .html file and use an automatic way for translations.
Is it be possible?
It is not possible for a static website. I suggest you stick to having separate files for each locale.
The easiest way for you to achieve that (since you are obviously the beginner) would be to actually use PHP and generate appropriate language web page on the fly.
Alternatively (but I must warn you that this approach may result in very unresponsive web page) you can use JavaScript to translate your web page - in fact jQuery plus Globalize. However, this force you to use common anti-pattern, that is you would need to force users to choose their language as oppose to automatically infer it from web browser's settings.
Either way, you have a lot of things to learn...

How do I create some HTML help pages, with the same content at the top and bottom, without php or ASP etc?

I want to create some html help pages, separate html pages.
However, I want to have the same content on the top and bottom of the pages.
In the past I've used PHP or ASP, with header and footer files.
I've then had to do view source and save these pages to get what I want.
I just wondered if there an easiest way to do this ?
EDIT:
The pages are for use with software using a web object not a normal browser. So there won't be a web server
If your web server supports it, you could do server side includes
You could use frames, but it's not necessarily advisable (for one, it breaks navigation).
You could use XML files with an XSLT stylesheet to turn them into HTML documents that share similar elements.
You could use PHP or another server-side language to generate the pages, and then use a recursive download tool (such as wget) to turn them into HTML.
EDIT: you're basically asking whether the "standard-ish" subset of HTML supported by your component of choice provides a way of including data from a common file, just so you won't have to include the data in every HTML document.
The answer hovers somewhere between "no way" and "maybe your component has a few tricks to do that".
The sane thing to do here would be to have a tool generate the HTML documents from a common template. Could be XML + XSLT, PHP/ASP/whatever, or a fully-fledged CMS (this actually helps let non-technical users write the document contents).
It's awful, but you could include a JS file that uses a bunch of document.write("...") to include common elements. Not SEO friendly.

Putting several hundred .doc pages into webpage

I have hundreds of .doc files with text that I need put on web pages.
I realize I could convert every .doc file to .txt, then use a server side include to embed the contents of each page into a webpage. This would save a lot of time because I could simply have one .php?txt=... page which will display a different .txt include depending on the link the user pressed to get there. This works perfectly content-wise.
However, all formatting is lost when it is converted to .txt (titles should be in bold)
When I convert these .doc files to .html using Microsoft Word, the ~20 line documents become bloated >300 line .htm files (probably because each paragraph is put into textboxes)
Dreamweaver's "Clean up Word HTML" helped a bit but the code was still extremely bloated.
How would you suggest going about this?
edit: I may have solved my own question, trying to embed Google docs into my page.
There is a program suite called wv (former mswordview). It has a program wvWare. This software can transform Word documents to HTML.
Furthermore you can use the output from Word and send it through tidy. This corrects markup and usually can handle the mistakes made by Word.
You can try converting the Word documents to a DocBook intermediate format, then you can easily transform the DocBook with existing tools to (X)HTML.
MS Word is bloatware. Its own markup is bloated, and therefore any attempt to automatically convert it to HTML will inherit these problems. You end up with garbage like: <strong><strong></strong></strong> for no good reason.
Dreamweaver can clean it up a lot, but nothing short of strip/remarkup is going to get you clean results.
That's why most people use PDFs for this type of issue.
My immediate reaction would be to convert the docs to PDFs. That will normally preserve formatting quite well, and users typically have their browsers set up to view PDFs one way or another (and the few who don't are undoubtedly accustomed to being unable to view a lot of documents on a lot of sites).
Alright thanks everyone for your suggestions, but I wanted to make this page accessible to everyone without pdf viewers as well.
Google docs allows you to bulk upload your text files (and converts them for you too)
You can then export them into an iframe to embed in any html document.

HTML or RTF to display columns of text with color-coded words

In my Delphi program I want to display some information generated by the application. Nothing fancy, just 2 columns of text with parts of words color-coded.
I think I basically have two options:
HTML in a TWebbrowser
RTF in a TRichEdit.
HTML is more standard, but seems to load slower, and I had to deal with The Annoying Click Sound.
Is RTF still a good alternative these days?
Note: The documents will be discarded after viewing.
I would vote for HTML.
I think it is more future oriented. The speed would not concern me.
The question of HTML or RTF may be irrelevant. If they are just used for display purposes, then the file format doesn't matter. It's really just an internal representation. (Are any files even being saved to disk?) I think the question to ask is which one solves the problem with the least amount of work.
I would be slightly concerned that the browser control is changing all the time. I doubt the richedit control will change much. I would lean towards the richedit control because I think there is less that could go wrong with it. But it's probably not a big deal either way.
Have you considered doing an ownerdraw TListView?
I'd also use HTML. Besides, you just got an answer for the clicking sound in TWebBrowser.
If you'd rather not use TWebBrowser, take a look at Dave Baldwin's free HTML Display Components.
I would vote for HTML, too.
We started an app a while ago...
We wanted to
display some information generated by the application. Nothing fancy, just...
(do you hear the bells ring???)
Then we wanted to display more information and style it even more....
...someone decided, that RTF isn't enough anymore, but for backwards compatibility we moved on to MS Word over OLE-Server. That was the end of talking about performance anymore.
I think if we would have done that in HTML it would be much faster now.
RTF is much easier to deal with, as the TRichEdit control is part of every single Windows installation, and has much less overhead than TWebBrowser (which is basically embedding an ActiveX version of Internet Explorer into your app).
TRichEdit is also much easier to use to programmatically add text and formatting. Using the SelStart and SelLength, along with the text Attributes, makes adding bolding and italics, setting different fonts, etc. simple. And, as Re0sless said, TRichEdit can easily be printed while TWebBrowser makes it more complicated to do so.
I would vote RTF as I dont like the fact TWebBrowser uses Internet explorer, as we have had trouble with this in the past on tightly locked down computers.
Also TRichEdit has a print method build in, where as you have to do all sorts of messing about to get the TWebBrowser to print.
Nobody seems to have mentioned a reporting component yet. Yes, it is overkill right now, but if you use it anyway (and maybe you already have got some reporting to do in your app, so the component is already included) you can just display the preview and allow to print / export to pdf later, if it makes any sense. Also if you later decide that you want to have a fancier display there is nothing holding you back.
If both HTML and RTF won't satisfy your need, you could also use an open source text/edit component that supports coloring words or create your own edit component based on a Delphi component.
Another alternative to the HTML browser is the "Embedded Web Browser" components which I used a few projects for displaying html documents to the user. You have complete control over the embedded browser, and I don't recall any clicks when a page is loaded.
I vote for HTML also
RTF is good only for its editor, else then you'd better go standard.
RTF offers some useful text editing options like horizontal tabulator which are not available in HTML. Automatic hyperlink detection is also a nice extra. But I think I would prefer HTML, if these features are not required.
I vote for HTML.
Easier to generate programmatically.
Widely supported.
Since you don't need WYSIWYG capabilities I think HTML advantages trump RTF. Moreover, should the need to export generated data for further, WP-like editing arise, remember that major word processor can open and convert HTML files.
Use HTML, but with 'Delphi Wrapper for Chromium Embedded' by Henri Gourvest , Chromium embedded uses the core that powers Google Chrome.
Don't use TWebBrowser, I'm suffering from all programs that use IE's web control - the font is too small on my 22' monitor with a 1920x1080 resolution, I use Windows 7 and my system's DPI is 150% (XP mode), I tried everything to tweak trying to fix that, no luck...