I need to edit a .mht file (for example, to add some text to a saved page).
Could you please suggest a way of editing .mht (web archive) files?
What I've tried:
Editors such as Notepad and Word
Internet Explorer add-ons (such as HTML Quick Edit Bar)
An MHTML file is a web page archive format. It is meant to be stored and viewed but not to be edited directly.
However, you can easily extract the MHTML file to a regular HTML document (with linked files), edit it with your favorite HTML editor and then export it back to an MHTML archive (including the linked files).
Since you're using Internet Explorer, note that you can open/save between HTML and MHTML files. This can effectively be used to unpack, edit and repack the MHTML archive. Google Chrome can do this as well.
You may also find software that is able to edit the MHTML file directly (doing the unpacking/repacking in the background). Microsoft Word seems to be able to do this, but depending on your document structure, it may affect the content layout.
A quick look at the Wikipedia entry for MHTML shows that it's an archive format, a little bit like a zip or rar archive. In order to edit a .mht you will need to unpack it, edit the required file, then repack the archive.
You don't say what platform/software you are using but if you do a websearch for ".mht unpacker" you should be able to find something to do the job.
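For illustration, an MHTML file is a MIME multipart message, so a general-purpose MIME parser can split it into its parts. The following is a minimal sketch using Python's standard `email` module; the sample archive, the `unpack_mht` helper and the example.com URLs are made up for this example and are not taken from any particular unpacker tool.

```python
import email

# A tiny hand-made MHTML document (hypothetical content, for illustration only).
SAMPLE_MHT = (
    "Subject: Example page\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/related; type="text/html"; boundary="----=_Part_0"\r\n'
    "\r\n"
    "------=_Part_0\r\n"
    'Content-Type: text/html; charset="utf-8"\r\n'
    "Content-Transfer-Encoding: quoted-printable\r\n"
    "Content-Location: http://example.com/index.html\r\n"
    "\r\n"
    "<html><body><p>Hello =3D world</p></body></html>\r\n"
    "------=_Part_0\r\n"
    "Content-Type: text/css\r\n"
    "Content-Transfer-Encoding: 7bit\r\n"
    "Content-Location: http://example.com/style.css\r\n"
    "\r\n"
    "p { color: red; }\r\n"
    "------=_Part_0--\r\n"
)


def unpack_mht(raw: bytes):
    """Return (content_location, content_type, decoded_bytes) for each part."""
    message = email.message_from_bytes(raw)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the multipart container itself
        parts.append((
            part.get("Content-Location"),
            part.get_content_type(),
            part.get_payload(decode=True),  # undoes quoted-printable/base64
        ))
    return parts


PARTS = unpack_mht(SAMPLE_MHT.encode("ascii"))
for location, content_type, payload in PARTS:
    print(location, content_type, len(payload))
```

Each decoded part could then be written to disk and edited; repacking is the harder half and is not shown here.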
Unpacking a .mht to a local folder, editing the code and re-saving it as .mht won't work. If you save to .mht from a local drive, none of the linked files (pictures and whatever else the page uses beyond what is included in the HTML file itself) will be stored in the container.
I used Word (Office 365) to open the file, modify it and save the changes. It may not be an optimal solution, but it works.
WizBrother.com WizHtmlEditor is a super capable fast and light wysiwyg editor that is ideal for quick assembly of elements because it can accept almost anything you throw at it - an entire screen of formatted html including pictures, rtf, drag-n-drop from a browser, and from clipboard, even media files. It doesn't care if it's editing MHT or HTML or several other formats. It's free - and they have a bulk converter BTW. Do a search and see.
I just open and edit with Microsoft Word. This is actually the official approved way of doing it BTW.
Related
I have HTML code that compiles into a chm, and occasionally I want to include a link to directly download a file... for example a small binary drawing file (extension .qid in my app) used as sample data for a tutorial in the chm. I have been doing this just fine for little drawing files by just providing a link like this...
some text
But my current problem is I have a little sample dxf that is to be used in this tutorial and when I provide a download link like this...
some text
...then I get a link OK, but when I click on it, it puts the dxf contents inline as text, rather than popping up a download Save As dialog for some file at a path like mk:#MSITStore:wherever.dxf
Now I looked at the HTML attribute documentation and found a 'download' attribute which is meant to force the link to download, but it made no difference. I used this syntax...
<a href="relativepath/some.dxf" title="whatever" download>some text</a>
...which generated a chm with a link but ignored the attribute 'download'.
How can I force the href link to lead to a download dialog for a dxf file?
Please note that CHMs are 20 years old. hh.exe is the HTML Help executable on Windows and is associated with *.CHM files. It's just a shell that uses the HTML Help API and really just hosts a browser window, based on the old Internet Explorer, in the HTML Help Viewer window. It is not based on Microsoft's Edge browser!
The HTML download attribute directs newer browsers to download the linked resource rather than opening it.
But the download attribute is not supported by Microsoft Internet Explorer.
I tested linking from a single local HTML file too. Other browsers like Firefox, Chrome and Edge also always open a link to a local *.DXF file as a text file.
This also happens with embedded (compiled into a CHM file) *.dxf files.
So, you'll need to create a link to a ZIP file instead, e.g. some.dxf.zip.
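If the sample files are produced by a build script, the ZIP wrapper can be created automatically. This is a minimal sketch, assuming Python is available in the build; the helper name `zip_single_file` and the dummy file contents are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_single_file(source: Path) -> Path:
    """Create '<name>.zip' next to `source`, containing only that file."""
    target = source.with_name(source.name + ".zip")
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname keeps only the file name, not the full directory path.
        archive.write(source, arcname=source.name)
    return target


# Demonstration with a throwaway file (placeholder content, not a real drawing).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "some.dxf"
    sample.write_text("0\nSECTION\n0\nENDSEC\n0\nEOF\n")
    zipped = zip_single_file(sample)
    with zipfile.ZipFile(zipped) as archive:
        ZIP_NAMES = archive.namelist()

print(zipped.name, ZIP_NAMES)
```

The link in the help topic would then point at the generated .zip file rather than at the .dxf itself.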
UPDATE:
This works when the *.DXF file is not embedded but stored on a server. Test it for your needs by linking in the usual way:
test.dxf download
I am trying to save a customized HTML file as a PDF. Normally I would press Ctrl-P in my browser (Chrome) and print to PDF.
But when I open the PDF file, there is no bookmark tab on the left side of the PDF reader (Adobe).
What I want is to save an HTML file as a PDF so that the bookmarks appear on the left side of the PDF reader.
I created the HTML file and added links to some parts of it using an id and a hyperlink:
part1
...some codes here...
<div id="part1">
and it works, but I don't know how to create a bookmark in a PDF from HTML. Normally MS Word or LibreOffice can convert their documents to PDF with bookmarks.
But how can I make a PDF with bookmarks using HTML?
Okay, so I ran into this problem and really wanted there to be a solution here that worked. When there wasn't, I figured I should add what I found so that hopefully the next developer can benefit from it.
First up: HTML conversion to PDF isn't really up to the HTML itself - it's up to whatever the conversion engine decides to do with your HTML. So for instance, if your approach is: Open it in IE/Chrome/Firefox/whatever > File > Print > Microsoft Print to PDF - well, your conversion engine is 'Microsoft Print to PDF'. It doesn't matter what browser you were using at that point - all it's doing is creating a print stream to send to a printer. So if Microsoft Print to PDF isn't going to make bookmarks for you (which it doesn't), then it doesn't matter which web browser you use to open the HTML.
And this is the critical problem with any Ctrl-P / Print avenues. The web browser is ultimately creating a print stream, which the conversion library simply streams into a PDF. And all the web browsers I looked at do not have native support built in to convert to PDF (why would they? 99% of the use cases are covered with a 'Print to PDF' functionality.) And the print drivers I tried (Microsoft Print to PDF, Adobe PDF Print) didn't manage to suss out bookmarks from the raw print stream. Which makes sense.
So, at this point, what you're looking for is a standalone PDF Conversion engine - something that can actively open the HTML file and convert from there, instead of going through a web browser. Are there PDF Conversion engines that do this and add Header-Tag based bookmarks? Possibly. The ones we had at our disposal (ABCPdf, Neevia) weren't able to do it, but it's certainly possible there's one out there.
So what now?
There are a few different options I explored.
Option #1: Separate Files, Combined With Adobe
Adobe Acrobat (non-viewer version), when it's the conversion engine, will automatically add bookmarks for each file it converts. So you can submit the HTML contents, not as a single HTML file, but as HTML files for each section you want a bookmark over.
The good news is that if a section has a hyperlink that points to another document it's merging, it's smart enough to have that hyperlink point to the spot within the internal PDF it's creating (it's not an external hyperlink like I expected it would be). There are two bits of bad news, though:
Each section has to be the start of a PDF page. If your section is two inches tall, the rest of the page will be blank, and the next section will start on the following page.
The bookmarks aren't clean. When I did it, each file had 3 bookmarks. Which is pretty darned ugly and off-putting.
Option #2: Separate Files, Combined With Another Library
The first 'downside' of Option #1 might not be a problem. But the second is pretty ugly. And other libraries definitely can create the bookmarks without creating 3-per-file. The main obstacle here is: the library has to be smart enough to resolve those 'external' hyperlinks to within the PDF that's created. One thing that often hurts is that those conversion libraries often want to convert each separate file to a PDF internally first and then merge the PDFs together... but that means that it won't handle the cross-file hyperlinks correctly. I wasn't able to find a way to make this work with our existing PDF conversion libraries.
Option #3: Different Origination Method
Instead of having a 'Help.html', which is then converted to PDF somehow, start with a format other than HTML. And the easiest source to get into PDF+Bookmarks is MSWord+Headers. Generally, for each PDF help file you want, you can have a master .DOCX sitting somewhere behind the scenes. We've used this approach before, and while it's not the most elegant, it at least works pretty well.
Option #4: Programmatic with Library
This might not be applicable for the OP's use case... but if you're generating the help, there's nothing to say you can't use the PDF Conversion library programmatically to add whatever bookmarks you want. Pretty much every PDF engine I've seen allows API access to bookmarks, so if this avenue is open to you, it's almost certainly the cleanest solution-wise.
Option #5: PDF Conversion Scouring
Like I mentioned, it's possible there's a PDF conversion engine out there that has a good HTML parsing engine and can handle bookmarks from various HTML tags (like H1, H2, etc.) However, it's probably going to take a bit to find it, because it's so much easier for a potential engine-writer to allow the file to be rendered with a native viewer. Think about it. If you were writing a PDF Conversion Service, which would you rather do:
Develop routines that can accurately render an HTML document fed into it - aka, basically write your own web browser from scratch.
Have IE/Chrome/Whatever render it and simply take their print output to convert to PDF.
... that second option is so ridiculously easier than the first, that it's no surprise most PDF Conversion engines don't have their own internal HTML parser (or for that matter, Word parser, Excel parser, etc.)
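To make the idea of "bookmarks from heading tags" concrete, here is a minimal sketch of the first half of that job: collecting the H1..H6 headings of an HTML document into an outline that a PDF library's bookmark API could then consume. It uses only Python's standard `html.parser`, does not produce a PDF, and the class name `HeadingOutline` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, id, title) for every h1..h6 element."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._current = None  # [level, id, text fragments] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = [int(tag[1]), dict(attrs).get("id"), []]

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <em>) also arrives here.
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._current is not None:
            level, anchor, fragments = self._current
            self.outline.append((level, anchor, "".join(fragments).strip()))
            self._current = None


SAMPLE_HTML = """
<h1 id="part1">Part 1</h1><p>text</p>
<h2 id="part1-1">Details <em>here</em></h2>
<h1>Part 2</h1>
"""

parser = HeadingOutline()
parser.feed(SAMPLE_HTML)
parser.close()
OUTLINE = parser.outline

for level, anchor, title in OUTLINE:
    print("  " * (level - 1) + title, f"(#{anchor})" if anchor else "")
```

Mapping each outline entry to a page position is the part that needs a real rendering engine, which is why this remains the hard option.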
The bookmarks in the HTML input document are set like this:
....
...
...
...
<h1 id="marcador1"> Chapter 1 </h1>
...
Don't use Chrome for this, even though it makes converting a web page to a PDF file simple. If you want PDF bookmarks, you can try Microsoft Word (2010). Save the web page locally, open it with MS Word 2010, then save it as PDF. The bookmarks will be there. See also: https://www.w3.org/TR/WCAG20-TECHS/PDF2.html
App comparison for converting PDF (regarding bookmark & internal hyperlink)
I did some tests with different apps (results may not be accurate due to personal settings or misuse).
| App | pdf bookmark | internal hyperlink | downloaded as .htm | file format looking |
|---|---|---|---|---|
| Chrome (print as PDF) | N | Y | N | looks same as the webpage |
| Calibre | Y | N/Y | Y | looks same as the webpage |
| Print Friendly & PDF 2.8.1 (Chrome Extension) | N | Y | N | syntax color is changed |
| WPS docx | N/Y | N | Y | format is changed a lot |
| Foxit PDF | N | N | Y | looks same as the webpage |
| Adobe PDF | N | N | Y | looks same as the webpage |
| MS Word docx | | | | |
| Adobe PDF (Chrome Extension) | | | | |
annotation:
pdf bookmark = contains bookmark in PDF file
internal hyperlink =
Y = the web hyperlinks inside jumps to the position in the PDF internally
N = the web hyperlinks inside opens an external web link in your browser
downloaded as .htm =
Y = the webpage is downloaded as .htm then converted to PDF
N = the webpage is directly converted in Chrome browser
file format looking
(Though I said "looks same as the webpage", it's not exactly the same as the webpage; you need to configure the settings when you convert.
Also, some minor parts / components of the webpage may or may not be contained in the PDF.)
Calibre Usage
To use Calibre (as shown, Calibre produces the bookmark, but it doesn't keep internal hyperlinks):
1. Download the webpage as .htm (along with a folder).
2. Drag the .htm into Calibre; it becomes a .zip file.
3. Use Convert books to convert the .zip to .pdf.
4. You may need to set up the bookmark detection mechanism in Convert books > Table of Contents if Calibre doesn't detect it.
Calibre is highly customizable on the conversion.
(I wish I knew how to solve the issue of "not having internal hyperlink" directly inside Calibre, without going through HTTrack.)
To use Calibre with HTTrack to add internal hyperlinks:
1. Use HTTrack to download the webpage.
   (A depth level of 1, i.e. just the current webpage, should be enough.)
   (You may need to configure it so that it captures external files like images / syntax-format files.)
2. Drag the index.html into Calibre ... (proceed as in steps 2 to 4 above).
   (You need to enable the option of creating the index.html.)
WPS docx Usage (not recommended)
1. Download the webpage as .htm (along with a folder).
2. Save it as .docx.
3. Output as .pdf (enable the option to convert title style format to bookmark).
   (If no title style format is detected, that may be because the titles actually use the hyperlink style format; you need to manually remove all of that hyperlink style formatting.)
note
testing subject weblink is this ; (testing result PDF are not posted here)
Again, I could be wrong; results may not be accurate due to personal settings or misuse.
Personally, I believe big companies like Adobe should have functionality to include bookmarks in the PDF. It's just that I don't know how to do it...
Using the HTML5 File API I am able to read text and XML files without any problems. I tried to read .docx/.doc files with the same code and it did not work. In my Chrome extension I need to open a .doc/.docx file in editable mode in Google Chrome. I would like to know all the possible ways to achieve this. I found some extensions like Google Docs Viewer, but they open files in preview mode. Please help me with this.
The .DOC file is binary, and DOCX is a zip file containing a whole collection of XML files that make up a Word document, so neither can easily be read by your straight XML reader.
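To illustrate the "DOCX is a zip of XML files" point, here is a minimal sketch in Python (used for brevity; a Chrome extension would need a JavaScript zip library to do the same). It builds a tiny in-memory archive containing only a `word/document.xml` part, which is not a complete, valid .docx, and then reads the text back. The helper names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_minimal_docx_like_zip(text: str) -> bytes:
    """Build an in-memory ZIP with one word/document.xml part (illustration only).

    `text` is inserted as-is, so it must not contain XML special characters.
    """
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f"<w:t>{text}</w:t></w:r></w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def extract_text(docx_bytes: bytes) -> str:
    """Concatenate the text of all <w:t> runs in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return "".join(node.text or "" for node in root.iter(f"{{{W_NS}}}t"))


DOCX_TEXT = extract_text(make_minimal_docx_like_zip("Hello from Word"))
print(DOCX_TEXT)
```

Reading is the easy direction; writing an edited document back means regenerating valid XML for every part that changed.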
I don't think there are any native extensions or bits of code for Chrome to edit DOC or DOCX files, so you'd have to write your own - presumably, that's what the extension you're considering would do. You can use the Google docs viewer as a jumping off point - there's no difference between "preview mode" and "edit mode" other than one writes back to the file and the other doesn't. And you'd need to add the controls to modify the document on screen, which may be the larger hurdle.
If you can give some detail on where exactly you're stuck, that might help the community point you towards a solution, but a general "nothing does this for me" is likely to result in a little less help.
Good luck!
You can use jQuery for this.
You can use TypeWith.me, which is built with jQuery and lets you import/export .docx, .doc, .pdf, etc. files. Check TypeWith.me and private pad.
You can reuse its jQuery code, as it is open source.
In Word 2003 one can save as Web Page and get the document translated into HTML.
You can use View > Source to see the HTML for that file.
In Word 2007 you can save as a web page, but I can't find how to view the source code that was created with it.
What you need to do is right-click on the file and select Open With... and use notepad to view the HTML.
Shield your eyes; it's ugly, ugly code.
EDIT: To alleviate some of the bloat and make things more legible, I suggest http://textism.com/wordcleaner/ - I've had pretty good results with it in the past, but it only works for files up to 20kb.
For SO bonus points, check out Jeff's C# code here: Cleaning Word's Nasty HTML.
You can also change the extension of the .docx to zip, then view the contents. A .docx file is actually a zip file with several .xml files inside... but that probably won't give you what you're looking for.
If you've only got a simple HTML page (I can't imagine it being much more than that if it was written in Word) you can just view the source in your browser.
Is there a way to export a simple HTML page to Word (.doc format, not .docx) without having Microsoft Word installed?
If you have only simple HTML pages as you said, it can be opened with Word.
Otherwise, there are some libraries which can do this, but I don't have experience with them.
My last idea is that if you are using ASP.NET, try to add application/msword to the header and you can save it as a Word document (it won't be a real Word doc, only an HTML renamed to doc to be able to open).
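The header trick is not specific to ASP.NET. As a minimal sketch of the same idea, here is a Python WSGI app that serves plain HTML with an `application/msword` content type. The `Content-Disposition` header and the `report.doc` file name are additions assumed for this example, not part of the answer above, and the result is still HTML inside, not a real Word document.

```python
from wsgiref.util import setup_testing_defaults

HTML_PAGE = "<html><body><h1>Report</h1><p>Simple content.</p></body></html>"


def word_download_app(environ, start_response):
    """Serve HTML labelled as a Word document so the browser offers it as .doc."""
    body = HTML_PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "application/msword"),
        # Assumed extra: suggest a file name for the download.
        ("Content-Disposition", 'attachment; filename="report.doc"'),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


# Exercise the app without starting a server.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


environ = {}
setup_testing_defaults(environ)
RESPONSE_BODY = b"".join(word_download_app(environ, fake_start_response))
print(captured["status"], captured["headers"]["Content-Type"])
```

To try it in a browser, the app could be passed to `wsgiref.simple_server.make_server`.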
There's a tool called JODConverter which hooks into OpenOffice to expose its file format converters. There are versions available as a webapp (which sits in Tomcat and which you post to) and as a command-line tool. I've been firing HTML at it and converting to .doc and PDF successfully. It's part of a fairly big project that hasn't gone live yet, but I think I'm going to be using it.
http://sourceforge.net/projects/jodconverter/
There is an open source project called HTMLtoWord that allows users to insert fragments of well-formed HTML (XHTML) into a Word document as formatted text.
HTMLtoWord documentation
While it is possible to make a ".doc" Microsoft Word file, it would probably be easier and more portable to make a ".rtf" file.
If you are working in Java, you can convert HTML to real docx content with code I released in docx4j 2.8.0. I say "real", because the alternative is to create an HTML altChunk, which relies on Word to do the actual conversion (when the document is first opened).
See the various samples prefixed ConvertInXHTML. The import process expects well formed XML, so you might have to tidy it first.
Well, there are many third party tools for this. I don't know if it gets any simpler than that.
Examples:
http://htmltortf.com/
http://www.brothersoft.com/windows-html-to-word-2008-56150.html
http://www.eprintdriver.com/to_word/HTML_to_Word_Doc.html
I also found a VBScript, but I'm guessing that requires that you have Word installed.
I presume from the "C#" tag you wish to achieve this programmatically.
Try Aspose.Words for .NET.
If it's just HTML, all you need to do is change the extension to .doc and Word will open it as if it's a Word document. However, if there are images to include or JavaScript to run, it can get a little more complicated.
I believe OpenOffice can both open .html files and create .doc files.
You can open HTML files with LibreOffice Writer, then export as PDF from the File menu. Browsers can also export HTML as a PDF file.
Use this link to export to Word, but note that images won't work here:
http://www.jqueryscript.net/other/Export-Html-To-Word-Document-With-Images-Using-jQuery-Word-Export-Plugin.html