Distributing an HTML Document - html

I am creating an HTML5 user manual. It contains a number of image folders and JS folders. Now I wish to distribute it as a single document. On Windows there is MHT or something to that effect. Is there any way I can do this on Ubuntu that is not browser or OS dependent?

Note that MHTML, short for MIME HTML, is a web page archive format used to bind resources which are typically represented by external links (such as images, Flash animations, Java applets, audio files) together with HTML code into a single file. The content of an MHTML file is encoded as if it were an HTML email message, using the MIME type multipart/related. The first part of the file is normally encoded HTML; subsequent parts are additional resources identified by their original URLs and encoded in base64. This format is sometimes referred to as MHT, after the suffix .mht given to such files by default when created by Microsoft Word, Internet Explorer or Opera. MHTML is a proposed standard, circulated in a revised edition in 1999 as RFC 2557.
So try saving the document as a regular HTML file with no external dependencies. Such a file can be opened on any OS.
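One way to get a dependency-free HTML file is to embed the referenced files as base64 data: URIs. The following is a minimal Python sketch of that idea, not a complete tool: the function names `to_data_uri` and `inline_resources` are made up for illustration, it only handles double-quoted `src="..."` attributes (images and scripts), and it assumes the referenced files sit next to the HTML file. Stylesheets referenced via `href` would need similar handling.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches things like "http:", "https:", "data:" at the start of a reference.
SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def to_data_uri(path: Path) -> str:
    """Encode a local file as a base64 data: URI."""
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def inline_resources(html: str, base_dir: Path) -> str:
    """Replace local src="..." references with data: URIs.

    Remote URLs, existing data: URIs and missing files are left untouched.
    """
    def replace(match: re.Match) -> str:
        ref = match.group(2)
        if SCHEME.match(ref) or ref.startswith("//"):
            return match.group(0)  # remote or already inlined
        target = base_dir / ref
        if not target.is_file():
            return match.group(0)  # nothing to embed
        return f"{match.group(1)}{to_data_uri(target)}{match.group(3)}"

    return re.sub(r'(\bsrc=")([^"]+)(")', replace, html)
```

Reading the manual's main page, passing it through `inline_resources(text, page_path.parent)` and writing the result to a new file would give a single file that no longer needs the image and js folders, at the cost of a larger file size.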

Well, as far as I know you won't be able to really distribute this manual as HTML OS independent.
BUT: you can distribute it as PDF, ZIP-file, host it anywhere, ePub, etc. These are pretty good options for your needs. Safari has a pretty cool feature called webarchive, but this only zips ONE single page to a viewable, always-the-same-looking page. And it will only be viewable with Safari. So you'd have to do this for ALL your pages...

Related

How to convert HTML to PDF with Bookmark

I am trying to save a customized HTML file as a PDF. Normally I would press Ctrl-P in my browser (Chrome) and print to PDF.
But when I open the PDF file, there is no bookmark tab on the left side of the PDF reader (Adobe).
What I want is to save an HTML file as a PDF with the bookmarks appearing on the left side of the PDF reader.
I created the HTML file and added links to some parts of it using an id and a hyperlink:
<a href="#part1">part1</a>
...some code here...
<div id="part1">
This works, but I don't know how to create a PDF bookmark from HTML. Normally MS Word or LibreOffice can convert their documents to PDF with bookmarks.
So how can I make a PDF with bookmarks from HTML?
Okay, so I ran into this problem and really wanted there to be a solution here that worked. When there wasn't, I figured I should add what I found so that hopefully the next developer can benefit from it.
First up: HTML conversion to PDF isn't really up to the HTML itself - it's up to whatever the conversion engine decides to do with your HTML. So for instance, if your approach is: Open it in IE/Chrome/Firefox/whatever > File > Print > Microsoft Print to PDF - well, your conversion engine is 'Microsoft Print to PDF'. Doesn't matter what browser you were using at that point - all it's doing is creating a print stream to send to a printer. So if Microsoft Print to PDF isn't going to make bookmarks for you (which it doesn't) then it doesn't matter which web browser you use to open the HTML.
And this is the critical problem with any Ctrl-P / Print avenues. The web browser is ultimately creating a print stream, which the conversion library simply streams into a PDF. And all the web browsers I looked at do not have native support built in to convert to PDF (why would they? 99% of the use cases are covered with a 'Print to PDF' functionality.) And the print drivers I tried (Microsoft Print to PDF, Adobe PDF Print) didn't manage to suss out bookmarks from the raw print stream. Which makes sense.
So, at this point, what you're looking for is a standalone PDF Conversion engine - something that can actively open the HTML file and convert from there, instead of going through a web browser. Are there PDF Conversion engines that do this and add Header-Tag based bookmarks? Possibly. The ones we had at our disposal (ABCPdf, Neevia) weren't able to do it, but it's certainly possible there's one out there.
So what now?
There are a few different options I explored.
Option #1: Separate Files, Combined With Adobe
Adobe Acrobat (non-viewer version), when it's the conversion engine, will automatically add bookmarks for each file it converts. So you can submit the HTML contents, not as a single HTML file, but as HTML files for each section you want a bookmark over.
The good news is that if a section has a hyperlink that points to another document it's merging, it's smart enough to have that hyperlink point to the spot within the internal PDF it's creating (it's not an external hyperlink like I expected it would be). There are two bits of bad news, though:
Each section has to be the start of a PDF page. If your section is two inches tall, the rest of the page will be blank, and the next section will start on the following page.
The bookmarks aren't clean. When I did it, each file had 3 bookmarks. Which is pretty darned ugly and off-putting.
Option #2: Separate Files, Combined With Another Library
The first 'downside' of Option #1 might not be a problem. But the second is pretty ugly. And other libraries definitely can create the bookmarks without creating 3-per-file. The main obstacle here is: the library has to be smart enough to resolve those 'external' hyperlinks to within the PDF that's created. One thing that often hurts is that those conversion libraries often want to convert each separate file to a PDF internally first and then merge the PDFs together... but that means that it won't handle the cross-file hyperlinks correctly. I wasn't able to find a way to make this work with our existing PDF conversion libraries.
Option #3: Different Origination Method
Instead of having a 'Help.html', which is then converted to PDF somehow, start with a format other than HTML. And the easiest source to get into PDF+Bookmarks is MSWord+Headers. Generally, for each PDF help file you want, you can have a master .DOCX sitting somewhere behind the scenes. We've used this approach before, and while it's not the most elegant, it at least works pretty well.
Option #4: Programmatic with Library
This might not be applicable for the OP's use case... but if you're generating the help, there's nothing to say you can't use the PDF Conversion library programmatically to add whatever bookmarks you want. Pretty much every PDF engine I've seen allows API access to bookmarks, so if this avenue is open to you, it's almost certainly the cleanest solution-wise.
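For the programmatic route, the list of bookmarks has to come from somewhere, and the HTML headings are an obvious source. Below is a rough Python sketch, using only the standard library, that collects the `<h1>`..`<h6>` headings of a document as (level, id, title) tuples. The names `HeadingCollector` and `extract_outline` are invented for this example, and how the tuples are then handed to a particular PDF library's bookmark API is left open, since that differs per library.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, id, title) for every <h1>..<h6> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.outline = []      # finished (level, id, title) entries
        self._current = None   # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "h1".."h6" is enough here.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._current = [int(tag[1]), dict(attrs).get("id"), []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == f"h{self._current[0]}":
            level, anchor, parts = self._current
            title = " ".join("".join(parts).split())  # normalise whitespace
            self.outline.append((level, anchor, title))
            self._current = None


def extract_outline(html: str):
    """Return the document's headings in order of appearance."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.outline
```

The heading level maps naturally onto bookmark nesting (an h2 becomes a child of the preceding h1), and the id can be used to locate the target position if the PDF library exposes named destinations.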
Option #5: PDF Conversion Scouring
Like I mentioned, it's possible there's a PDF conversion engine out there that has a good HTML parsing engine and can handle bookmarks from various HTML tags (like H1, H2, etc.) However, it's probably going to take a bit to find it, because it's so much easier for a potential engine-writer to allow the file to be rendered with a native viewer. Think about it. If you were writing a PDF Conversion Service, which would you rather do:
Develop routines that can accurately render an HTML document fed into it - aka, basically write your own web browser from scratch.
Have IE/Chrome/Whatever render it and simply take their print output to convert to PDF.
... that second option is so ridiculously easier than the first, that it's no surprise most PDF Conversion engines don't have their own internal HTML parser (or for that matter, Word parser, Excel parser, etc.)
The bookmarks in the HTML input document are set like this:
...
<h1 id="marcador1"> Chapter 1 </h1>
...
Don't use Chrome, although it makes converting a web page to a PDF file simple. If you want PDF bookmarks, you can try Microsoft Word (2010). Just save the web pages locally, open them with MS Word 2010, then save as PDF. The bookmarks are there. See also: https://www.w3.org/TR/WCAG20-TECHS/PDF2.html
App comparison for converting PDF (regarding bookmark & internal hyperlink)
I did some tests with different apps (results may not be accurate due to my personal settings or misuse).
| App | pdf bookmark | internal hyperlink | downloaded as .htm | file format looking |
|---|---|---|---|---|
| Chrome (print as PDF) | N | Y | N | looks same as the webpage |
| Calibre | Y | N/Y | Y | looks same as the webpage |
| Print Friendly & PDF 2.8.1 (Chrome Extension) | N | Y | N | syntax color is changed |
| WPS docx | N/Y | N | Y | format is changed a lot |
| Foxit PDF | N | N | Y | looks same as the webpage |
| Adobe PDF | N | N | Y | looks same as the webpage |
| MS Word docx | | | | |
| Adobe PDF (Chrome Extension) | | | | |
annotation:
pdf bookmark = the PDF file contains bookmarks
internal hyperlink =
Y = hyperlinks inside the page jump to the position within the PDF
N = hyperlinks inside the page open an external web link in your browser
downloaded as .htm =
Y = the webpage is downloaded as .htm and then converted to PDF
N = the webpage is converted directly in the Chrome browser
file format looking
(Though I said "looks same as the webpage", it's not "exactly" the same as the webpage; you need to configure the settings when you convert.
Also, some minor parts / components of the webpage may or may not be contained in the PDF.)
Calibre Usage
To use Calibre (as shown in the table, Calibre produces the bookmarks, but it doesn't keep internal hyperlinks):
1. Download the webpage as .htm (along with its folder).
2. Drag the .htm into Calibre; it becomes a .zip file.
3. Use Convert books to convert the .zip to .pdf.
4. You may need to set up the bookmark detection mechanism in Convert books > Table of Contents if Calibre doesn't detect it.
Calibre is highly customizable on the conversion.
(I wish I knew how to solve the issue of "not having internal hyperlinks" directly inside Calibre, without going through HTTrack.)
To use Calibre with HTTrack to add internal hyperlinks:
1. Use HTTrack to download the webpage.
   (A depth level of 1, i.e. just the current webpage, should be enough.)
   (You may need to configure it so that it captures external files like images / syntax-format files.)
2. Drag the index.html into Calibre ... (proceed as in steps 2~4 above)
   (You need to enable the option of creating the index.html.)
WPS docx Usage (not recommended)
1. Download the webpage as .htm (along with its folder).
2. Save it as .docx.
3. Output as .pdf (enable the option to convert title style format to bookmarks).
   (If no title style format is detected, that may be because the titles actually use the hyperlink style format; you need to manually remove all of that hyperlink style formatting.)
note
The test page is this; (the resulting PDFs are not posted here)
Again, I could be wrong: results may not be accurate due to my personal settings or misuse.
Personally, I believe big companies like Adobe should have functionality to include bookmarks in a PDF. It's just that I don't know how to do it...

Convert webarchive to html

I managed to capture the behavior of a complex web site in a webarchive. I would now like to turn that webarchive into a set of nested HTML directories. Yet when I tried it, both with Waf and with commercial software bought on the Apple store, what I got was just the nested directories with the HTML page at the bottom, and no images, no CSS and no working links.
If you are interested the webarchive document is at:
http://www.miafoto.it/it/GiroMilano.webarchive
while the weak product of the extraction is at:
http://www.miafoto.it/it/Giromilano/Pagine/default.aspx
and the empty directories above.
In addition to the different look, the webarchive displays the same behavior as the official web site (when a listbox value is selected and then the button is pushed), while the extracted version produces a page with no contents by loading itself rather than the official page.
As you may see the webarchive is over 1MB while the extraction just little over 1 KB.
What is wrong with it and how may I perform such an apparently trivial business with usable results?
Thanks,
textutil -convert html example.webarchive
Be careful — html with files is created in the same folder as webarchive!
Also, I had to open .html with text editor and replace "file:///image.tiff" links (replace "file:///" with "") so they point to relative path.
Also, not all browsers display .tiff images.
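The manual search-and-replace on the "file:///" links can also be scripted. This is a small Python sketch of that one step, assuming the links appear in quoted `src` or `href` attributes as in the "file:///image.tiff" example above; the function name `make_links_relative` is just for illustration.

```python
import re


def make_links_relative(html: str) -> str:
    """Drop the file:/// prefix from src/href attributes.

    After the rewrite the links resolve relative to the .html file,
    which matches where textutil puts the extracted resources.
    """
    return re.sub(r'((?:src|href)=["\'])file:///', r"\1", html)
```

Applied to the text of the converted .html file, this turns `src="file:///image.tiff"` into `src="image.tiff"` and leaves http(s) links alone. It does not address the .tiff display problem; the images would still need converting for browsers that don't show TIFF.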
Who knew we have Stack Overflow wiki?
I find that this WebArchiveExtractor.app works on my Mac (Mojave OS) –
https://robrohan.github.io/WebArchiveExtractor/
I managed the issue by finding all parameters being submitted in the page and submitting them too in my script, ignoring the webarchive.
To save HTML pages on a Mac, I use Chrome. Download and install it and save your page as HTML. Safari saves web pages in the webarchive format, and for me that's very hard to deal with.

HTML file form, accept Word documents

I got a bit of a weird issue here. (Either that, or I'm just overlooking something stupidly simple.)
I have a file upload form, and I want it to only accept certain types of files amongst which MS Word documents. I added the .doc and .docx MIME-types (application/msword and application/vnd.openxmlformats-officedocument.wordprocessingml.document respectively) to the accept attribute of the file input field, yet when I hit "choose file", the .doc and .docx files are still greyed out as not allowed to be uploaded.
So, what am I missing? Any help or pointers would be greatly appreciated!
(And yes, I know the form-check isn't a good way to filter uploaded files. I've got PHP covering that, this is more of a convenience for the user, so they don't go and upload a disallowed file.)
Support of the accept attribute has been poor but is slowly becoming more common. On my Google Chrome 19, the element <input type="file" accept="application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/msword" /> works so that the selection is limited to .doc and .docx files. Other browsers generally ignore such an accept attribute, though e.g. Firefox supports some simple cases like accept="image/gif".
In addition to this, browsers may map MIME types to filename extensions (which is generally what file systems treat as “file type” indicators) in different ways. Therefore, although the attribute may work in some situations, it might make things worse when the mapping is different.
The MIME type for Word files likely isn't registered with the browser, so the Word file is being reported as application/octet-stream. In general, MIME type filtering in HTML forms does not work reliably, except for common image MIME types.
You could create a JavaScript solution to check the extension of the file.

Display/Render RTF doc in browser display using html textarea or something similar

My web application has a feature wherein preformatted RTF documents are used as templates: the user selects the source of data, which is then merged with the RTF templates to create merged RTF files. The RTF templates have placeholders which get replaced with user-selected content. The final doc can either be saved or opened directly if Word/WordPad is available on the local user's machine.
Now, I have a requirement to display the merged document to the user for confirmation. The user may either print or save the document to the system directly. The display should not be in the Word/WordPad application but within the application itself, using a textarea or something similar to render the document. Can you please let me know whether it's possible to render the RTF document in a textarea or not? Along with the displayed content, there should be options to print and save the document. If I have to convert the RTF to HTML and then display the HTML content in a textarea, please let me know how I can do the conversion and then display the HTML in the page.
That's a very difficult requirement. First of all, let's dismiss the idea about a <textarea>, because it does not support any formatting at all. All the WYSIWYG editors you've seen out there are based on <iframe>s.
Secondly, no browser can directly display a RTF. You can embed it as an <object>, and some might show it (IE probably will), but I can't say which ones won't. Portable devices almost certainly won't. But you should test this though, maybe it works well enough after all.
Failing that, HTML conversion is also out of the question, because RTF has very very many features that cannot be emulated in HTML. There are some converters out there (google), but they will all come with serious limitations. If you want full support, you will have to do your own rendering via Canvas or Flash or something.
To this end I'd suggest checking out Google Docs. They've gone through all of this hassle and have a rather feature-full engine for displaying most possible documents. I think it was also possible to embed them in your own webpages, though I've never checked it out myself.
Use a <PRE> tag to Display/Render RTF doc in browser.

How to force a txt file to be read as an html document by browsers?

I have .txt files which are really HTML documents (they have the header, body, html tags etc.). (I'm working in a Windows environment here.) I would like any browser to readily read them as HTML documents (as it would an HTML document with the normal .html suffix). Right now I have to rename the .txt file to be able to read it in the browser (e.g. myfile.txt -> myfile.txt.htm). Is there any trick we can apply to fool the browser right away?
Related question: Is there any code I could add on top of those .txt files so that only .txt files with that code will be opened as HTML documents and seen as such by browsers? (The code could be anything added with a hexadecimal editor or plain ASCII.) Thanks.
Since you're reading the file directly off of your file system (ie: using a file: URL rather than http: or something else) your browser is using the extension to determine the content-type of the file. How this mapping from extension to content type is made varies from browser to browser (and also from OS to OS to a certain extent).
First off, I should say that I'd be a bit afraid of making this sort of change. There's probably lots of code that has a hard-coded assumption that .txt maps to text/plain, so altering that mapping is likely to expose all sorts of nasty bugs. Caveats aside, here's what you need to do:
In Firefox, ExternalHelperAppService is used to determine the type of file: URIs. Note that one of the steps is to use a hard-coded list of extension to type mappings, which most likely has .txt mapping to text/plain.
In IE the file type mappings come from OS settings. It varies a bit depending on which version of Windows you're dealing with, but usually in the same general part of the settings where you choose which program to run for each extension you can also set a mime-type for each extension. (This is also the place Firefox looks in the "the Operating System is asked for a MIME type" step mentioned on the page I linked to above, BTW.) If you set the MIME type for .txt to text/html you should get the behavior you want.
It is the HTTP headers which tell your browser what kind of data it is transferring, so you have to edit the settings of your web server.
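To illustrate the header-based approach, here is a minimal sketch using Python's built-in `http.server` as a stand-in for "your web server" (a real deployment would change the equivalent MIME mapping in whatever server is actually in use). The class name `TxtAsHtmlHandler` and the `serve` helper are made up for this example; the only real change is that .txt files are sent with `Content-Type: text/html`.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class TxtAsHtmlHandler(SimpleHTTPRequestHandler):
    """Static file handler that labels .txt files as HTML."""

    # Copy the default extension map and override only the .txt entry.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".txt": "text/html"}


def serve(directory: str = ".", port: int = 8000) -> None:
    """Serve `directory` on localhost until interrupted (Ctrl-C)."""
    handler = partial(TxtAsHtmlHandler, directory=directory)
    with ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

Calling `serve("path/to/files")` and then opening `http://127.0.0.1:8000/myfile.txt` should render the file as a web page, since the browser follows the Content-Type header rather than the extension. This only applies to files fetched over HTTP, not to files opened directly from disk.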
Save the text with its htm-codes in WORDPAD as OPEN Document text.
Use in the name of the file the extension .htm.
This worked for me.