How can I save a "complete" HTML file as a single file? - html

Are there any utilities or web browsers that can save a file and referenced resources as a single HTML file?
With most web browsers / wget there's the option to download required CSS and images as separate files. Is there a way to automatically inline the CSS and images?

I wrote a Python script for this. So far it covers my own needs perfectly. I hope it is useful to others.
https://github.com/zTrix/webpage2html
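To give an idea of what "inlining" means here, below is a minimal sketch that replaces local image references with base64 data: URIs. It is not the code of the linked script; the function name inline_images and the regular expression are my own assumptions, and it only handles simple <img src="..."> attributes that point at files on disk.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches the src attribute of an <img> tag: prefix, quote character, value.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE | re.DOTALL)


def inline_images(html: str, base_dir: str) -> str:
    """Replace local <img src="..."> references with base64 data: URIs."""
    root = Path(base_dir)

    def to_data_uri(match: re.Match) -> str:
        prefix, quote, src = match.groups()
        # Leave already-inlined, remote, and root-relative references untouched.
        if src.startswith(("data:", "http:", "https:", "/")):
            return match.group(0)
        path = root / src
        if not path.is_file():
            return match.group(0)
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{payload}{quote}"

    return IMG_SRC.sub(to_data_uri, html)
```

The same idea extends to stylesheets (read the file and paste it into a <style> element), but a real tool also has to deal with url(...) references inside CSS, fonts, and pages built by scripts.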

MHTML is the format for this.
http://en.wikipedia.org/wiki/MHTML

This web extension might help you.
https://github.com/gildas-lormeau/SingleFile
"It helps you to save a complete web page into a single HTML file."
It is available for almost all popular browsers.

Safari (on both Windows and Mac) can create .webarchive files.
Link:
http://en.wikipedia.org/wiki/Webarchive

If you have access to wget, then you likely have access to a tar utility too. While it won't give you a browser-readable single file, if you wget a page and then tar up all of the downloaded artifacts, you effectively have a 1-file version of everything needed for that page.
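If tar is not at hand, the same bundling can be sketched with Python's standard tarfile module. The function name bundle and the directory layout are assumptions for illustration; the folder is whatever wget wrote its files into.

```python
import tarfile
from pathlib import Path


def bundle(download_dir: str, archive_path: str) -> list[str]:
    """Pack a downloaded page directory into one .tar.gz and list its members."""
    root = Path(download_dir)
    # Keep archive_path outside download_dir so the archive does not include itself.
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(str(root), arcname=root.name)
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

As with the tar command, the result is a single file to pass around, not something a browser can open directly.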

Related

Why is my HTML code not translating right to the browser?

When I type my code in VSCode, it ends up looking like below in Google Chrome. Why would it be doing that?
I tried fixing my code and I expected it to come out looking cleaner in the browser. It actually just brought over all of the code.
When loading local files, browsers use the file extension to determine how to process the file.
Since your file doesn't have one, it treats it as plain text.
Rename it so it ends in .html.
From the URL, it is clear that you've not saved your file with .html extension. That is why it is showing up as text instead of a web page.
Save the file with name Mywebsite.html and try again. Hope this helps!
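As a rough analogy only (browsers use their own lookup logic, not Python's), the standard mimetypes module shows the same kind of extension-based guess. The helper name guess is made up for this example.

```python
import mimetypes


def guess(filename: str):
    """Return the MIME type guessed from the file extension, or None."""
    return mimetypes.guess_type(filename)[0]


for name in ("Mywebsite", "Mywebsite.html"):
    print(name, "->", guess(name))
```

Without an extension there is nothing to go on, so no type is guessed; with .html the guess is text/html.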

wget downloads the same html for every version of the website

I'm attempting to download the html using wget for this website:
https://cxcfps.cfa.harvard.edu/cda/footprint/cdaview.html#Footprints|filterText%3D%24filterTypes%3D|query_string=&posfilename=&poslocalname=&inst=ACIS-S&inst=ACIS-I&inst=HRC-S&inst=HRC-I&RA=210.905648&Dec=39.609177&Radius=0.0006&Obsids=&preview=1&output_size=256&cutout_size=12.8|ra=&dec=&sr=&level=&image=&inst=ACIS-S%2CACIS-I%2CHRC-S%2CHRC-I&ds=
Which is a version of the main website:
https://cxcfps.cfa.harvard.edu/cda/footprint/cdaview.html
The only difference from the main website is that the first link takes you to a version that has already searched through a database and displayed the results, which you can see in a table. But when I use wget to download the HTML for the longer link, it gives me exactly the same text as for the main/short link. I'm confused, but maybe I just don't understand enough about HTML. I thought the two should be slightly different, with the longer link's HTML containing the database results, etc.
I also used the --mirror option to download all the necessary files, but they all look the same, too. I've tried cURL as well, with the same result. Can someone please explain why this is happening and if it's fixable?
The problem is that the page builds its results with JavaScript, and that code does not run when you download the page. Everything after the # in your long link is a URL fragment: the browser keeps it to itself and never sends it to the server, so wget and cURL request exactly the same resource as the short link and receive the same HTML. The --mirror option will download the files the page references, but it will not execute the scripts, so the results table never appears in anything you could grep. To get the rendered table you need something that runs the page's JavaScript, such as a headless browser, or you need to find the request the page itself makes to the database and call that directly.
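The identical downloads can be checked against the URL itself. The part after # is a fragment, which HTTP clients do not send to the server. A small sketch with the standard library, using an abbreviated version of the long link (the fragment below is shortened for readability, and the function name is my own):

```python
from urllib.parse import urldefrag


def url_sent_to_server(url: str) -> str:
    """Return the URL without its fragment, i.e. what is actually requested."""
    return urldefrag(url).url


long_link = (
    "https://cxcfps.cfa.harvard.edu/cda/footprint/cdaview.html"
    "#Footprints|RA=210.905648&Dec=39.609177&Radius=0.0006"
)
print(url_sent_to_server(long_link))
```

Both links reduce to the same request, which is why the responses match.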

hhc file of CHM to html

I have created a chm file using HTML Help Workshop and it works fine in Windows. However, the chm viewer used in Linux is not so good and I prefer not to use it in Linux. I am also restricted from using any other viewers in Linux. So I thought of decompiling the chm and creating HTML files which can be used in Linux. However, after decompiling, the generated hhc file, though it contains HTML tags, does not display properly in the browser.
<LI> <OBJECT type="text/sitemap">
<param name="Name" value="Main">
<param name="Local" value="Main.html">
</OBJECT>
The above code displays only a bullet and nothing else. Is there a way to use the data in the param tags, so that, for example, it displays Main as a link to Main.html?
NOTE: I don't want to use browser in Windows, so avoiding chm file as such is not a solution. Also I will have to make changes in multiple places if I use chm in Windows and separate HTML file for Linux. So the option I am thinking of is to use the HTML files generated by decompiling the chm in Linux.
You know, .hhc files contain the Table of Contents (TOC) for an HTMLHelp file (CHM), i.e. the entries displayed in the left pane of the CHM viewer window. It's compiled into the CHM file.
An .hhc file is referred to as a sitemap file. Sitemap is a file format developed and proposed by Microsoft to the World Wide Web Consortium. Sitemap files control many navigation features for CHM files, such as the table of contents and index panes.
Please note HTMLHelp and all this is about 20 years old! The .hhc sitemap file was never standardized by the W3C (e.g. as part of HTML5) and is today an old proprietary Microsoft file format.
I'd recommend using a so-called web-based help under Linux. If you really have permanent updates of your help topic content you'll need to review your workflow.
Some thoughts (as I understand your needs):
Low budget and the man's way by using HTMLHelp Workshop (use of tools recommended, e.g. FAR HTML)
Think about single-sourcing - one source of topics and different target formats (e.g. CHM, web-based (uncompressed help) on a server)
Think about your CHM file as a compiled web, create the HTML topics in structured folders like a web page (best use case for the wizards of FAR HTML)
Create a CHM file from source
Create an uncompressed web-based help from source by uploading the web to Intranet or Internet (HTML files, images, ...), completed by a handmade Table of Contents derived from the .hhc file
But, don't decompile. Have single-sourcing in mind - I'd recommend using a time saver tool like mentioned above. To see what I mean navigate to following links:
Example 1: Uncompressed help - a bit dated
Example 2: FAR Web help created using FAR HTML.
For further information go to FAR HTML Tour and scroll down to uncompressed help.
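For the Table of Contents derived from the .hhc file, here is a minimal sketch of how the param values could be turned into ordinary links. It assumes the .hhc uses the text/sitemap objects with Name and Local params shown in the question; the class and function names are my own, and real files may need more care (encodings, entries without a Local target, index files).

```python
from html import escape
from html.parser import HTMLParser


class HhcToc(HTMLParser):
    """Collect text/sitemap objects and emit plain HTML list items."""

    def __init__(self):
        super().__init__()
        self.out = []      # generated HTML lines
        self.entry = None  # params of the sitemap object currently being read

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "ul":
            self.out.append("<ul>")
        elif tag == "object" and attr.get("type") == "text/sitemap":
            self.entry = {}
        elif tag == "param" and self.entry is not None:
            self.entry[attr.get("name") or ""] = attr.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "ul":
            self.out.append("</ul>")
        elif tag == "object" and self.entry is not None:
            name = escape(self.entry.get("Name", ""))
            local = self.entry.get("Local")
            if local:
                self.out.append(f'<li><a href="{escape(local)}">{name}</a></li>')
            else:
                self.out.append(f"<li>{name}</li>")  # heading without a target
            self.entry = None


def hhc_to_html(hhc_text: str) -> str:
    parser = HhcToc()
    parser.feed(hhc_text)
    parser.close()
    return "\n".join(parser.out)
```

Fed the snippet from the question, this produces a list item in which Main links to Main.html. The output can be pasted into a handmade contents page.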

How is Chrome SingleFile format achieved?

Chrome has an extension called SingleFile. It basically saves a web page as a single *.htm file that is a clone of the original website. I have seen something like this done with the Mozilla MAFF format. The MAFF format saves the file as *.maff, and if you want to see the contents (html, css, images etc.) you can change the extension to *.zip. Then you can unzip it. With SingleFile (Chrome) you can't unzip the file by changing the extension. Does anybody know how this is achieved? Is this a known thing that *.htm can offer? Thanks
"The MAFF format saves the file as *.maff, and if you want to see the contents (html, css, images etc.) you can change the extension to *.zip. Then you can unzip it."
I'm assuming that you're really asking just how the image files are stored in .htm since html and css can easily be stored as text in htm.
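It embeds the image files in the .htm file as text, using a binary-to-text encoding in the same family as uuencode/uudecode (in current HTML this is normally Base64 inside data: URIs). More on that here: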
https://en.wikipedia.org/wiki/Uuencoding
This is why changing the extension to .zip won't turn the file into a zip package that you can unzip.
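The difference can be seen with Python's standard zipfile module: renaming a file does not change its bytes, and only real ZIP data passes the check. This is a sketch with made-up sample content, not an inspection of actual SingleFile or MAFF output.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """True if the bytes form a ZIP archive, whatever the file is called."""
    return zipfile.is_zipfile(io.BytesIO(data))


def make_zip(name: str, content: bytes) -> bytes:
    """Build a small in-memory ZIP archive holding a single member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, content)
    return buf.getvalue()


page = b"<!DOCTYPE html><html><body><p>single file</p></body></html>"
print(looks_like_zip(page))                          # plain HTML text
print(looks_like_zip(make_zip("index.html", page)))  # a real ZIP container
```

A single-file .htm is plain HTML all the way through, so there is no archive to unpack.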
Illustrative side-by-side screenshots of external vs. embedded image.
I found this article may help: http://www.techgainer.com/enable-single-file-mhtml-support-chrome/
In the Chrome address bar, type chrome://flags, then hit Enter. Now use Ctrl+F (Command+F on Mac) to bring up the search bar and search for mhtml. Once you find the option, click the Enable link.

Is there a way to export a page with CSS/images/etc using relative paths?

I work on a very large enterprise web application - and I created a prototype HTML page that is very simple - it is just a list of CSS and JS includes with very little markup. However, it contains a total of 57 CSS includes and 271 javascript includes (crazy right??)
In production these CSS/JS files will be minified and combined in various ways, but for dev purposes I am not going to bother.
The HTML is being served by a simple apache HTTP server and I am hitting it with a URL like this: http://localhost/demo.html and I share this link to others but you must be behind the firewall to access it.
I would like to package up this one HTML file with all referenced JS and CSS files into a ZIP file and share this with others so that all one would need to do is unzip and directly open the HTML file.
I have 2 problems:
The CSS files reference images using URLs like this url(/path/to/image.png) which are not relative, so if you unzip and view the HTML these links will be broken
There are literally thousands of other JS/CSS files/images that are also in these same folders that the demo doesn't use, so just zipping up the entire folder will result in a very bloated zip file
Anyway -
I create these types of demos on a regular basis, is there some easy way to create a ZIP that will:
Have updated CSS files that use relative URLs instead
Only include the JS/CSS that this html references, plus only those images which the specific CSS files reference as well
If I could do this without a bunch of manual work, if it could be automatic somehow, that would be so awesome!
As an example, one CSS file might have the following path and file name.
/ui/demoapp/css/theme.css
In this CSS file you'll find many image references like this one:
url(/ui/common/img/background.png)
I believe for this to work the relative image path should look like this:
url(../../common/img/background.png)
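The relative path above can be computed, not only worked out by hand. Here is a minimal sketch using posixpath.relpath; the function name relativize and the regular expression are assumptions for illustration, and it only rewrites root-relative url(/...) references.

```python
import posixpath
import re

# url(/path), url("/path") or url('/path'); skips protocol-relative //host URLs.
CSS_URL = re.compile(r'url\(\s*(["\']?)(/(?!/)[^)"\']*)\1\s*\)')


def relativize(css_text: str, css_path: str) -> str:
    """Rewrite root-relative url(...) references relative to the CSS file."""
    css_dir = posixpath.dirname(css_path)

    def fix(match: re.Match) -> str:
        quote, target = match.groups()
        return f"url({quote}{posixpath.relpath(target, css_dir)}{quote})"

    return CSS_URL.sub(fix, css_text)


css = "body { background: url(/ui/common/img/background.png); }"
print(relativize(css, "/ui/demoapp/css/theme.css"))
```

For the theme.css example this yields url(../../common/img/background.png), matching the path guessed above.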
I am going to answer my own question because I have solved the problem for my own purposes. There are 2 options that I have found useful:
Modern browsers have a "Save Page As..." option under the File menu, or in Chrome in the main menu. This, however, does not always work properly when the page is generated by JavaScript
I created my own custom application that can parse out all of the CSS/Javascript resources and transform the CSS references to relative URLs; however, this is not really a good answer for others.
If anyone else is aware of a commonly available utility or something like that which is better than using the browser built in "Save page as..." option - feel free to post another answer.