HTM Document unable to open in Word - html

I started working for a marketing company that had its intranet file (.htm) set up so non-technical people could open it in Word and edit the hyperlinks or add a new file.
Of course, this led to some CSS issues. So I opened it in Dreamweaver, fixed the CSS, and saved it, and now the entire document can no longer be viewed in Word.
Can I reformat it so it is accessible to the rest of the staff in Word again?

One option (sorry if you've already tried it): create a new Word document, then copy the content from a browser window showing intranetfile.HTM and paste it into the Word document.
Save the Word document as new-intranetfile.HTM. Rename intranetfile.HTM to old-intranetfile.HTM, then rename new-intranetfile.HTM to intranetfile.HTM.
Now you should be able to open the HTM file in Word again.
My guess is that when you saved intranetfile.HTM from Dreamweaver, it added Dreamweaver code that confuses Word. Alternatively, the hidden files that Word saves alongside an HTM document were no longer linked correctly to intranetfile.HTM, and that was the issue.
Dorje

Related

HTML Hyperlinks and HTML Structure

I'm a mechanical engineer with no experience in HTML, doing an odd task for my boss.
I have managed to save the Excel sheet (which contains hyperlinks) in HTML format. However, I'd like to edit these hyperlinks within the HTML code.
I right-clicked on the test.htm document and viewed the HTML in Notepad. I expected to be able to find the hyperlinks at this stage.
My question is:
What is the structure of HTML files? The hyperlink must be stored somewhere, so how do I view it, preferably in Notepad?
This has to do with the way Excel saves to HTML. It generates a filename.htm, which acts as a viewer, and places the data for each sheet inside a "filename_files" folder. Inside that folder you'll find sheetxxx.htm files, which actually contain the href you are looking for. When you open filename.htm, scripts inside it load the corresponding sheets into the page.
This is why you won't find what you want inside the main file. Keep in mind that an HTML page can be dynamic: unlike a static file opened in Notepad, it can run scripts that change what you see on the page and the code behind it (as shown in the browser's DOM Explorer).
Hyperlinks in HTML are written with the <a> tag and its href attribute. For instance,
<a href="https://www.google.com">This Link Goes To Google</a>
creates a link that says "This Link Goes To Google".
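For anyone who would rather list the links than hunt for them in Notepad, here is a minimal Python sketch that collects every href from the sheet files Excel writes. It assumes the sheets are named sheet*.htm inside the "filename_files" folder described above. The function names are made up for illustration, and the files are read as UTF-8 with undecodable bytes replaced, which may not match the encoding Excel actually used.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all href values found in <a> tags, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


def links_per_sheet(files_folder):
    """Map each sheet*.htm file in the folder to the links it contains.

    The sheet*.htm naming is an assumption based on the answer above.
    """
    result = {}
    for path in sorted(Path(files_folder).glob("sheet*.htm")):
        text = path.read_text(encoding="utf-8", errors="replace")
        result[path.name] = extract_links(text)
    return result
```

Calling links_per_sheet("test_files") on the folder that sits next to test.htm would then return a dictionary of sheet file names and the links found in each, which tells you which file to open and edit.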

Finding missing text or code in Dreamweaver

Is it possible to find which pages in Dreamweaver are missing a certain snippet of code or text?
I have a site with hundreds of pages, and I would like to search for the pages that are missing a certain snippet of code.
I'm open to using different software if Dreamweaver cannot do it.
Dreamweaver can find the pages missing the snippet. First open one page in Dreamweaver, then press Ctrl+F (Find), change "Current Document" to "Folder", and browse to your folder path. Paste the snippet you want to search for and click Replace All. Then open the folder and check the modified dates: the pages missing the snippet were not modified, while the others show a new modified date, so you can see which pages are missing the snippet.
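Since the question is open to other tools, the same check can be done without modifying any files. This is a minimal Python sketch, not a Dreamweaver feature: it walks a folder and returns the pages whose text does not contain the snippet. The function name and the default file patterns are assumptions, and the match is an exact substring match, so differences in whitespace or line breaks inside the snippet will count as "missing".

```python
from pathlib import Path


def files_missing_snippet(root, snippet, patterns=("*.html", "*.htm")):
    """Return sorted paths under root whose text does not contain snippet."""
    missing = []
    for pattern in patterns:
        # rglob searches the folder and all of its subfolders.
        for path in Path(root).rglob(pattern):
            text = path.read_text(encoding="utf-8", errors="replace")
            if snippet not in text:
                missing.append(path)
    return sorted(missing)
```

For example, files_missing_snippet("C:/mysite", "<!-- footer -->") would list every .html or .htm page under that (hypothetical) folder that lacks the footer comment.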

Open html file in a new tab instead of downloading it

For my Trac plugin, I have made an export script which converts contents to a different format. The result is HTML code.
When I click the link, some browsers open the HTML in a new tab, while others offer to download it as a .print file, depending on their specific settings, I think. Opening this .print file shows the same HTML page as opening it directly, but locally instead of from the server.
How can I force it to always open in a new tab?
I think it might be a mimetype issue. If it is, which mimetype can I use to tell the browser to open the HTML code directly? I am currently using text/html as mimetype.
EDIT: some more info
To give some more insight, adapting from a comment of mine below:
I do not create the link myself. The link is provided by Trac, the bug tracking software the plugin is for, and what I do is implement the method that creates the HTML code and let it return the HTML code along with the mimetype. Trac then returns the HTML code either as a file, or as a new tab, when clicking on that content conversion link. What I am searching for is a possibility to specify in the HTML code or mimetype that it gets opened in a new tab directly.
Maybe there is some kind of mimetype specifying the (HTML) text as an HTML web document instead of an HTML file (if that distinction even exists).
Or an HTML/XML header or doctype specifying whether it gets downloaded or opened by a browser. I think the browser needs to get that information from somewhere.
Or maybe there is an option to set in Trac.
I hope these ideas of mine about what could exist can help those of you who are versed in one or more of these to find a solution. I could not find a solution through my research yet.
If you have a link that opens "directly" (not in a new tab) and you want it to open in a new tab, one way of doing it is to give the link a target="_blank" attribute.
This will create a blank page, then load the link there automagically, and thus you will have a new tab with the desired page.
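On the mimetype question itself: as far as I know, whether a browser renders a response or offers it as a download is generally decided by the HTTP response headers (Content-Type together with Content-Disposition), not by a doctype or anything else inside the HTML. The sketch below is a generic WSGI example and not Trac's plugin API. The function names and the export.html file name are made up, and whether Trac lets a plugin influence these headers is not something this thread establishes.

```python
def html_response_headers(body, inline=True):
    """Build headers asking the browser to render (inline) or download (attachment)."""
    # "export.html" is a hypothetical file name used only for illustration.
    disposition = "inline" if inline else 'attachment; filename="export.html"'
    return [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Disposition", disposition),
        ("Content-Length", str(len(body))),
    ]


def app(environ, start_response):
    """Minimal WSGI app that serves one HTML page for in-browser display."""
    body = "<!DOCTYPE html><title>Export</title><p>Converted content</p>".encode("utf-8")
    start_response("200 OK", html_response_headers(body, inline=True))
    return [body]
```

To try it locally, the app can be served with the standard library, for example wsgiref.simple_server.make_server("localhost", 8000, app).serve_forever(). Switching inline to False should make most browsers offer a download instead, which is one way to check whether headers like these explain the .print download behaviour.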

How to convert HTML to PDF with Bookmark

I am trying to save a customized html file as a pdf.. normally I would press ctrl-P at my browser (chrome) and print as pdf..
But when I open the pdf file, there is no bookmark tab on the left side of the pdf reader (adobe)..
What I want is to save an html file as a pdf and the bookmark should appear in the left side of the pdf reader:
I created the HTML file and added links to some parts of it using an id and a hyperlink:
<a href="#part1">part1</a>
...some codes here...
<div id="part1">
and it works, but I don't know how to create a bookmark in a PDF from HTML. Normally MS Word or LibreOffice can convert their documents to PDF with bookmarks.
But how can I make a PDF with bookmarks using HTML?
Okay, so I ran into this problem and really wanted there to be a solution here that worked. When there wasn't, I figured I should add what I found so that hopefully the next developer can benefit from it.
First up: HTML conversion to PDF isn't really up to the HTML itself - it's up to whatever the conversion engine decides to do with your HTML. So for instance, if your approach is: open it in IE/Chrome/Firefox/whatever > File > Print > Microsoft Print to PDF - well, your conversion engine is 'Microsoft Print to PDF'. It doesn't matter what browser you were using at that point - all it's doing is creating a print stream to send to a printer. So if Microsoft Print to PDF isn't going to make bookmarks for you (which it doesn't), then it doesn't matter which web browser you use to open the HTML.
And this is the critical problem with any Ctrl-P / Print avenue. The web browser is ultimately creating a print stream, which the conversion library simply streams into a PDF. None of the web browsers I looked at have native support built in to convert to PDF (why would they? 99% of the use cases are covered by 'Print to PDF' functionality). And the print drivers I tried (Microsoft Print to PDF, Adobe PDF Print) didn't manage to suss out bookmarks from the raw print stream. Which makes sense.
So, at this point, what you're looking for is a standalone PDF conversion engine - something that can actively open the HTML file and convert from there, instead of going through a web browser. Are there PDF conversion engines that do this and add header-tag-based bookmarks? Possibly. The ones we had at our disposal (ABCPdf, Neevia) weren't able to do it, but it's certainly possible there's one out there.
So what now?
There are a few different options I explored.
Option #1: Separate Files, Combined With Adobe
Adobe Acrobat (non-viewer version), when it's the conversion engine, will automatically add bookmarks for each file it converts. So you can submit the HTML contents not as a single HTML file, but as one HTML file for each section you want a bookmark for.
The good news is that if a section has a hyperlink that points to another document it's merging, it's smart enough to have that hyperlink point to the spot within the internal PDF it's creating (it's not an external hyperlink like I expected it would be). There are two bits of bad news, though:
1. Each section has to be the start of a PDF page. If your section is two inches tall, the rest of the page will be blank, and the next section will start on the following page.
2. The bookmarks aren't clean. When I did it, each file had 3 bookmarks. Which is pretty darned ugly and off-putting.
Option #2: Separate Files, Combined With Another Library
The first 'downside' of Option #1 might not be a problem. But the second is pretty ugly. And other libraries definitely can create the bookmarks without creating 3-per-file. The main obstacle here is: the library has to be smart enough to resolve those 'external' hyperlinks to within the PDF that's created. One thing that often hurts is that those conversion libraries often want to convert each separate file to a PDF internally first and then merge the PDFs together... but that means that it won't handle the cross-file hyperlinks correctly. I wasn't able to find a way to make this work with our existing PDF conversion libraries.
Option #3: Different Origination Method
Instead of having a 'Help.html', which is then converted to PDF somehow, start with a format other than HTML. And the easiest source to get into PDF+Bookmarks is MSWord+Headers. Generally, for each PDF help file you want, you can have a master .DOCX sitting somewhere behind the scenes. We've used this approach before, and while it's not the most elegant, it at least works pretty well.
Option #4: Programmatic with Library
This might not be applicable for the OP's use case... but if you're generating the help, there's nothing to say you can't use the PDF conversion library programmatically to add whatever bookmarks you want. Pretty much every PDF engine I've seen allows API access to bookmarks, so if this avenue is open to you, it's almost certainly the cleanest solution.
Option #5: PDF Conversion Scouring
Like I mentioned, it's possible there's a PDF conversion engine out there that has a good HTML parsing engine and can handle bookmarks from various HTML tags (like H1, H2, etc.) However, it's probably going to take a bit to find it, because it's so much easier for a potential engine-writer to allow the file to be rendered with a native viewer. Think about it. If you were writing a PDF Conversion Service, which would you rather do:
1. Develop routines that can accurately render an HTML document fed into it - aka, basically write your own web browser from scratch.
2. Have IE/Chrome/Whatever render it and simply take their print output to convert to PDF.
... that second option is so ridiculously easier than the first, that it's no surprise most PDF Conversion engines don't have their own internal HTML parser (or for that matter, Word parser, Excel parser, etc.)
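If you do go the programmatic route (Option #4), the HTML side of the job is fairly small: you need the list of headings that should become bookmarks. Here is a minimal Python sketch that pulls a (level, id, title) outline out of the h1-h6 tags using only the standard library. The class and function names are made up for illustration, and feeding the resulting outline into your PDF library's bookmark API is left out, since that part depends entirely on the library you use.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects (level, id, title) for every h1-h6 heading in the document."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._current = None  # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = [int(tag[1]), dict(attrs).get("id"), []]

    def handle_data(self, data):
        # Only text that appears inside a heading is kept.
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._current is not None:
            level, anchor, parts = self._current
            # Collapse runs of whitespace so titles come out clean.
            title = " ".join("".join(parts).split())
            self.outline.append((level, anchor, title))
            self._current = None


def heading_outline(html_text):
    """Return the document's headings as a list of (level, id, title) tuples."""
    parser = HeadingOutline()
    parser.feed(html_text)
    parser.close()
    return parser.outline
```

The level gives you the nesting depth for the bookmark tree, and the id (when present) is the same anchor that internal hyperlinks in the HTML point to.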
The bookmarks in the HTML input document are set like this:
...
<h1 id="marcador1"> Chapter 1 </h1>
...
Don't use Chrome, even though it makes converting a web page to a PDF file simple. If you want PDF bookmarks, you can try Microsoft Word (2010). Just save the web page locally, open it with MS Word 2010, then save it as PDF. The bookmarks will be there. See also: https://www.w3.org/TR/WCAG20-TECHS/PDF2.html
App comparison for converting PDF (regarding bookmark & internal hyperlink)
I ran some tests with different apps (results may not be accurate due to personal settings or misuse).
| app | pdf bookmark | internal hyperlink | downloaded as .htm | file format looking |
|---|---|---|---|---|
| Chrome (print as PDF) | N | Y | N | looks same as the webpage |
| Calibre | Y | N/Y | Y | looks same as the webpage |
| Print Friendly & PDF 2.8.1 (Chrome Extension) | N | Y | N | syntax color is changed |
| WPS docx | N/Y | N | Y | format is changed a lot |
| Foxit PDF | N | N | Y | looks same as the webpage |
| Adobe PDF | N | N | Y | looks same as the webpage |
| MS Word docx | | | | |
| Adobe PDF (Chrome Extension) | | | | |
Annotation:
pdf bookmark = the PDF file contains bookmarks
internal hyperlink:
Y = hyperlinks within the web page jump to the corresponding position inside the PDF
N = hyperlinks within the web page open an external web link in your browser
downloaded as .htm:
Y = the web page is downloaded as .htm, then converted to PDF
N = the web page is converted directly in the Chrome browser
file format looking:
(Though I said "looks same as the webpage", it's not exactly the same as the web page; you need to configure the settings when you convert.
Also, some minor parts or components of the web page may or may not be included in the PDF.)
Calibre Usage
To use Calibre (as shown, Calibre produces the bookmarks, but it doesn't produce internal hyperlinks):
1. Download the web page as .htm (along with a folder).
2. Drag the .htm into Calibre; it becomes a .zip file.
3. Use Convert books to convert the .zip to .pdf.
4. You may need to set up the bookmark detection mechanism in Convert books > Table of Contents if Calibre doesn't detect it.
Calibre is highly customizable on the conversion.
(I wish I knew how to solve the "no internal hyperlinks" issue directly inside Calibre, without going through HTTrack.)
To use Calibre with HTTrack to add internal hyperlinks:
1. Use HTTrack to download the web page.
(A depth level of 1, i.e. just the current web page, should be enough.)
(You may need to configure it so that it captures external files like images and syntax-format files.)
2. Drag the index.html into Calibre ... (proceed as in steps 2-4 above).
(You need to enable the option that creates the index.html.)
WPS docx Usage (not recommended)
1. Download the web page as .htm (along with a folder).
2. Save it as .docx.
3. Output it as .pdf (enable the option that converts title style formats to bookmarks).
(If no title style format is detected, the titles may actually be in a hyperlink style format; you need to manually remove all of those hyperlink style formats.)
note
testing subject weblink is this ; (testing result PDF are not posted here)
Again, I could be wrong; results may not be accurate due to personal settings or misuse.
Personally, I believe big companies like Adobe should have functionality to include bookmarks in a PDF. It's just that I don't know how to do it...

Create a Sharepoint page from HTML file with images

I'm trying to create an editable page in SharePoint. I already have the page in HTML (it's quite large) and it has many images in it. Previously I have just created a new page in SharePoint and pasted the HTML source in, then uploaded/inserted the images manually, one at a time.
Unfortunately, I am not able to do this in a reasonable amount of time since there are many images this HTML file is using.
So, I want an editable Sharepoint page that keeps the images intact from a directory that looks like this:
thepage.html
1.png
2.png
...
...
...
343.png
etc
Any ideas?
EDIT: For more clarity - this is a specifications document in HTML form, so it has a lot of text and header integrated with images. I'd like it to be converted to an actual Sharepoint Page that is editable from Sharepoint's interface.
It seems best here to use a low-tech solution: some HTML editing, plus whichever way of uploading multiple files works best for you.
Assuming
C:\mypage
-> \page.html
-> \images\1.png
-> \images\2.png
...
-> \images\100.png
Via the UI
1. Go to a Document or Image library and use "Upload Multiple files/images" (this only appears in Internet Explorer). Let's say you uploaded the images to //sharepoint/myimages
2. Create a new content page (say, an Article page, or a WebPart Page with a Content Editor WebPart). Let's say your page now resides at //sharepoint/pages/mypage.aspx
3. Change your HTML to point from <img src="images/1.png" /> to <img src="../myimages/1.png" />
4. Edit the HTML of your newly created page (Ribbon > Edit HTML Source) and paste in your HTML code
Via SharePoint Designer
1. Drag and drop all the images into your desired location
2. Repeat the HTML steps above
To replace text in bulk, SharePoint Designer, your favorite HTML editor, or even Notepad can do that well using the Ctrl+H menu / Edit > Find & Replace options.
NOTE: the //sharepoint address up there is the http url for your site, SO won't let me use a full fake address as a sample.
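If you would rather script the path change than use Find & Replace, here is a minimal Python sketch of the same substitution. It only touches the src attribute of img tags whose path starts with the old prefix. The function name is made up, the default prefixes are simply the ones from the example above, and a regular expression is a shortcut here, not a full HTML parser, so unusual markup may slip past it.

```python
import re

# Matches the opening part of an <img> tag up to its src value, the value, and the closing quote.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*["\'])([^"\']+)(["\'])', re.IGNORECASE)


def repoint_images(html_text, old_prefix="images/", new_prefix="../myimages/"):
    """Rewrite <img src="old_prefix..."> to use new_prefix; leave other images alone."""

    def swap(match):
        opening, src, closing = match.groups()
        if src.startswith(old_prefix):
            src = new_prefix + src[len(old_prefix):]
        return opening + src + closing

    return IMG_SRC.sub(swap, html_text)
```

You would read page.html into a string, pass it through repoint_images, and paste the result into Edit HTML Source. If your images sit next to the HTML file with no images/ folder, as in the question, the prefixes would need adjusting to match.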
From IE or from Word, save the page as a complete webpage so it creates an HTML file plus a folder with the images.
In Network Places, create a web folder (WebDAV) pointing to SharePoint. This way, you can access it from the file system in Explorer.
Open your new network place, navigate to the library where you want your HTML file to be, and drag-n-drop the file and folder into there.
The file then will be visible in browser, with the pictures, but the folder will be hidden.
If I have understood your question correctly, you can use the answer from this post to load a list of images with JavaScript and PHP ->
Load list of image from folder.
Upload the files to the SharePoint server and use that folder.
Or you can write C# code to read the SharePoint folder dynamically and display the images.