How to define a TOC in HTML for kindlegen to recognize

I am converting a book written in DocBook into a single-page HTML file. The HTML contains a TOC:
<div class="toc">
<dl>
<dt><span class="preface">Preface</span></dt>
<dt><span class="chapter"><a href="#installation-und-versionsauswahl">1. Version Selection and
Installation</a></span></dt>
[...]
I'd like to use kindlegen to convert the HTML into a file I can use on a Kindle. The conversion itself works without a problem, but the TOC is not recognized as a TOC: the Kindle user can't reach it directly with the TOC button.
What do I have to change so that kindlegen recognizes the TOC in my HTML file?

I'd recommend reading the official Kindle publishing guidelines from Amazon.
AFAIK kindlegen can't do that from the HTML alone; you need a proper NCX file, or an OPF file with the TOC setting properly configured.
See also this short tutorial.
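As a rough illustration of what such an NCX file contains, here is a minimal Python sketch that builds one from a list of TOC entries. The function name build_ncx, the file name book.html, the uid value, and the "preface" anchor are all hypothetical; only the chapter anchor comes from the question. The OPF file would still have to reference the generated NCX, which is not shown here.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def build_ncx(book_title, html_file, entries, uid="example-book-id"):
    """Return an NCX document as a string.

    entries is a list of (label, anchor) pairs; each one becomes a
    navPoint pointing at html_file#anchor. The uid is a placeholder.
    """
    ncx = ET.Element("ncx", {"xmlns": NCX_NS, "version": "2005-1"})

    head = ET.SubElement(ncx, "head")
    ET.SubElement(head, "meta", {"name": "dtb:uid", "content": uid})

    doc_title = ET.SubElement(ncx, "docTitle")
    ET.SubElement(doc_title, "text").text = book_title

    nav_map = ET.SubElement(ncx, "navMap")
    for order, (label, anchor) in enumerate(entries, start=1):
        point = ET.SubElement(
            nav_map,
            "navPoint",
            {"id": f"navpoint-{order}", "playOrder": str(order)},
        )
        nav_label = ET.SubElement(point, "navLabel")
        ET.SubElement(nav_label, "text").text = label
        ET.SubElement(point, "content", {"src": f"{html_file}#{anchor}"})

    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(
        ncx, encoding="unicode"
    )


if __name__ == "__main__":
    print(
        build_ncx(
            "My Book",
            "book.html",
            [
                ("Preface", "preface"),  # hypothetical anchor
                (
                    "1. Version Selection and Installation",
                    "installation-und-versionsauswahl",
                ),
            ],
        )
    )
```

Each anchor has to match an id that actually exists in the single-page HTML, otherwise the TOC entry would point nowhere.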

In case it is useful, I knocked up a quick PHP script to generate very basic NCX and OPF files that support the TOC without having to break up the document. I wrote the script for an MS Word document saved as HTML (so it is hard-coded to use those style names). Just noting it here in case it helps anyone who comes across this post in the future. http://alankent.me/2016/03/05/creating-a-kindle-book-using-microsoft-word-quick-note/

Related

Markdown TOC with Special Characters?

I am trying to create a TOC for my Markdown blog.
The methods I am finding here (Markdown to create pages and table of contents?) do not work for me, because I am naming all of my headers # _</>_ The Setup. I do this because I am using CSS to style the _</>_ part, giving each header a nice colored icon next to it. If I simply use # The Setup it works great.
This causes issues whenever I try to use [The Setup](#The-Setup).
I tried a few things like [The Setup](#_</>_-The-Setup) and other variations, but I cannot get it to work.
If someone can point me in the right direction I would greatly appreciate it. Also, if anyone has a better way of adding custom icons next to headers, I think that would be the better way to go about it.
As always, thanks in advance.
The general solution is to examine the rendered HTML output to see what the tool is converting the special characters to, in the HTML's element ID. Every tool could handle the conversion differently (it could convert special characters to -, _, or just remove special characters). Some examples:
<h1 id="_____the-setup">The Setup</h1>
<h1 id="-the-setup">The Setup</h1>
<h1 id="the-setup">The Setup</h1>
Once you have identified the exact id that the tool is using, then you use that value as the heading link in the markdown's table of contents. For example:
[The Setup](#_____the-setup)
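If the tool can export its rendered HTML, the lookup can be automated. The following is a minimal Python sketch, not a feature of any particular Markdown tool: it reads heading ids out of an HTML string and prints matching table-of-contents links. The names HeadingIdCollector and toc_lines are made up for this example.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingIdCollector(HTMLParser):
    """Collect (id, text) for every heading element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_heading = False
        self._current_id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self._current_id = dict(attrs).get("id")
            self._text = []

    def handle_data(self, data):
        if self._in_heading:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading:
            text = "".join(self._text).strip()
            self.headings.append((self._current_id, text))
            self._in_heading = False


def toc_lines(rendered_html):
    """Return Markdown links that use the ids the tool actually generated."""
    collector = HeadingIdCollector()
    collector.feed(rendered_html)
    collector.close()
    # Headings without an id cannot be linked to, so they are skipped.
    return [f"[{text}](#{hid})" for hid, text in collector.headings if hid]


if __name__ == "__main__":
    for line in toc_lines('<h1 id="the-setup">The Setup</h1>'):
        print(line)
```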
Now, the tricky part is that not all Markdown tools will export the rendered HTML, including VS Code. The workaround for VS Code is:
Open the markdown preview mode (which renders to html internally).
Open the VS Code Developer Tools (Help > Toggle Developer Tools).
Use DevTools to inspect the element (in this case, the heading element for "The Setup").
I see that VS Code named the id the-setup, so in the markdown's table of contents I write [The Setup](#the-setup). Now the table of contents hyperlink works in VS Code. Caveat: it might not work in other Markdown tools if they render a different HTML element ID!
Another shortcut now available in VS Code (1.70, July 2022) is that markdown can autocomplete the header ID: you just type # and it lists the valid IDs.

Rmarkdown (html): local search for table of contents (toc)

I was wondering if Rmarkdown for html outputs has a function that can do a local search for the TOC only. I don't want to use the Ctrl+F function as I have got repeated words used as the section names and it would be much slower than just searching over the TOC.
The TOC is built using Tocify.js, which does not have any such feature, so don't expect one in RMarkdown unless you cook one up yourself using jQuery/JavaScript.

Using pandoc to generate PDF from Markdown with inline style

I'm looking to create a mostly markdown document, but would like to take advantage of inserting HTML when I might need a bit more control over formatting on a case-by-case basis. I have iaWriter on macOS and am able to do so, and from my understanding of markdown this is an included behaviour.
When using pandoc on my linux machine, however, some tags (most notably <i> tags at the moment) are not interpreted.
My markdown file is:
This _does_ work.
This does <i>not</i> work.
However, inserting a <p>tag</p> will create a line-break and new paragraph.
When I execute pandoc -o test.pdf test.md I get the result: test.pdf
I've tried a few extensions in the output (+raw_html, +inline_code_attributes) thinking maybe I was missing something but have so far not found an explanation.
Apologies if this is a duplicate, but I was unable to find it, and have so far been unable to source an answer.
Thank you.
See the pandoc MANUAL: Creating a PDF.
By default, pandoc uses LaTeX to create the PDF. Raw HTML is therefore ignored; it only has an effect if your output format is HTML as well. However, you can use wkhtmltopdf instead of pdflatex to go from markdown to PDF via HTML instead of via LaTeX.
From the raw HTML extension docs:
The raw HTML is passed through unchanged in HTML, S5, Slidy, Slideous, DZSlides, EPUB, Markdown, Emacs Org mode, and Textile output, and suppressed in other formats.
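The HTML route could look like the following Python sketch, which only assembles the command line. It assumes pandoc 2.0 or later (where the option is spelled --pdf-engine) and that wkhtmltopdf is installed; the helper name pandoc_pdf_via_html is made up for this example.

```python
import subprocess


def pandoc_pdf_via_html(source, target):
    """Build a pandoc command that renders the PDF through HTML.

    Assumes pandoc >= 2.0 and an installed wkhtmltopdf binary.
    """
    return [
        "pandoc",
        source,
        "-t", "html",                    # go through HTML so raw HTML survives
        "--pdf-engine=wkhtmltopdf",      # instead of the default LaTeX engine
        "-o", target,
    ]


def run(source, target):
    """Run the conversion; raises CalledProcessError if pandoc fails."""
    subprocess.run(pandoc_pdf_via_html(source, target), check=True)
```

Calling run("test.md", "test.pdf") is then equivalent to typing the same pandoc command in a shell. With this route the <i> tags from the question should reach the PDF, since the intermediate format is HTML.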

convert docx with (ordered) list to html

I'm trying to convert a large docx document with multi-level ordered lists to HTML. (See an example of the document here: http://docdro.id/X1oyfBv You should download it.)
I tried the following things, including:
online converters such as html-cleaner and index.html (which only recognize one layer of the list)
save as html, which creates a horrendous file but still doesn't recognize the ol structure.
saved the file as zip and then opened the xml file, but I don't see an easy way to get the ol structure out of the w:... tags
saving it to google docs and running Omar Alzabir's script
http://omaralzabir.com/wp-content/uploads/2014/05/GoogleDocsEmail.jpg
By the way, if I create a new Word file with a multi-level ordered list and convert it, the list is recognized as ol's. The existing file, however, is not recognized as ol's even if I 'un-list' and list it again. So possibly there is something wrong with how the original document was created (?)
Any suggestions much appreciated:) Or indications as to why this problem occurs
Are you asking how to save a Word-doc in HTML format, with multi-level ordered-lists?
Word-HTML has bugs in its multi-level ordered lists. For the list-items, the indentation tends to be incorrect and inconsistent. There's an example here.
Word-HTML has similar bugs in its multi-level unordered lists. An example is here.
I recently wrote a Python program that fixes these bugs, in Word's HTML. The program is part of WordWebNav (WWN), which is free and open-source.
WWN is an app that converts a Microsoft-Word document to a usable web-page. It adds some missing features in the Word-HTML web-page (e.g., a navigation pane), and it fixes bugs in the Word-HTML.
You can use pandoc: https://github.com/jgm/pandoc
It is an open-source, universal command-line tool for converting documents between markup formats.
You can use it like this:
pandoc -o output.html input.docx

How to view xsd:documentation that is in HTML markup?

I am generating WSDL/XSD for SOAP services from a UML model using IBM Rational Software Architect (RSA). RSA allows you to document the classes and attributes in the model using rich-formatting.
For example, I have the following documentation on a Trailer class:
A wheeled Vehicle that is designed for towing by another
Vehicle. Known subtypes include:
Caravan
BoxTrailer
BoatTrailer
When the UML model is transformed to WSDL/XSD (using the out-of-the-box UML to WSDL transform), the formatting is preserved as HTML markup inside the xsd:documentation element:
<xsd:complexType name="Trailer">
<xsd:annotation>
<xsd:documentation><p>
A&nbsp;wheeled <strong>Vehicle</strong> that is designed for&nbsp;towing by another <strong>Vehicle.</strong> Known
subtypes include:&nbsp;
</p>
<ul>
<li>
<strong>Caravan</strong>
</li>
<li>
<strong>BoxTrailer</strong>
</li>
<li>
<strong>BoatTrailer</strong>
</li>
</ul></xsd:documentation>
</xsd:annotation>
</xsd:complexType>
Unfortunately, this is really hard to read and I've been searching (with no luck) for a program that can view WSDL/XSD with documentation in HTML markup.
XmlSpy 2008 can't do it, RSA can't do it (which is a bit surprising, as it generated the XSD in the first place), neither can any web browser I've tried.
I did write a JET template that extracted the documentation from the model and outputted to HTML, and I could probably write some XSLT to do something similar from the XSD, but I was hoping there's a program out there (ideally free) that could view the documentation as HTML.
Essentially, I'd like to be able to tell the consumers of our web service that they can view the WSDL in X program if they want to read the documentation - does anybody know the best solution to this?
Edit:
Thanks for the suggestions, but I think I have a solution! I didn't realise that RSA can export a WSDL to HTML (right-click on WSDL, export, HTML). The generated HTML has a graphical view of each schema element, the documentation for each element, as well as the original source, and everything is hyperlinked together.
Most importantly, the documentation is richly formatted again! One small caveat is that the &nbsp;'s appear literally in the HTML output. This seems to be because the ampersand is escaped in the HTML:
&amp;nbsp;
Instead it should be
&nbsp;
I will update my model-to-model transform to ensure that the &nbsp;'s are replaced with real spaces (I don't believe I'll need non-breaking spaces in the documentation), so the generated WSDL/XSD won't ever have them.
I highly doubt that the standard XML/XSD editors can interpret the HTML tags and generate appropriate documentation. Oxygen XML Editor does a decent job of understanding and converting the XML entities (like &lt; etc.), but HTML tags and entities are left as is. Below is the screenshot in design view.
The type of <xs:documentation> is <xs:any>, so you should actually be able to include your documentation without escaping the markup, provided that it is a well-formed XHTML fragment rather than HTML. I guess some XML Schema tools would be able to interpret the embedded XHTML and show it as formatted text.
Do note that if the markup is not escaped, it absolutely must be a well-formed XML fragment, or the documentation element will cause your schema to be malformed. This also applies to HTML entities! If the documentation contains an (unescaped) entity reference other than the 5 predefined XML entities, then your schema must either contain an external DTD reference or have an embedded DTD that defines the replacement text of that entity. In your case the documentation contains an &nbsp; entity reference. Probably the easiest fix is to replace such entities with the corresponding Unicode character/text or with character references (use &#160; for &nbsp;).
If you have a chance, try to include the documentation without escaping the markup and make sure that it will be well formed. Otherwise you probably need to process the documentation twice: 1) parse the schema and extract documentation 2) parse the documentation text again (possibly as HTML, not XML).
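The two-pass approach can be sketched in Python roughly as follows. This is an illustration under assumptions, not a finished tool: the sample schema is a shortened, escaped version of the Trailer example from the question, and the function names are made up. Pass 1 lets the XML parser undo the escaping; pass 2 hands the recovered fragment to whatever parses HTML, here simply a browser opening the generated page.

```python
import html
import xml.etree.ElementTree as ET

XSD_NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

# Shortened sample: the markup inside xsd:documentation is escaped.
SAMPLE_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Trailer">
    <xsd:annotation>
      <xsd:documentation>&lt;p&gt;A&amp;nbsp;wheeled &lt;strong&gt;Vehicle&lt;/strong&gt;&lt;/p&gt;</xsd:documentation>
    </xsd:annotation>
  </xsd:complexType>
</xsd:schema>"""


def extract_documentation(xsd_text):
    """Pass 1: map each named schema component to its documentation text.

    The XML parser unescapes the content, so the values are HTML fragments.
    """
    root = ET.fromstring(xsd_text)
    docs = {}
    for node in root.iter():
        name = node.get("name")
        doc = node.find("xsd:annotation/xsd:documentation", XSD_NS)
        if name and doc is not None:
            docs[name] = "".join(doc.itertext()).strip()
    return docs


def to_html_page(docs):
    """Pass 2: wrap the fragments in a page so a browser parses them as HTML."""
    sections = [
        f"<h2>{html.escape(name)}</h2>\n{fragment}"
        for name, fragment in docs.items()
    ]
    return "<html><body>\n" + "\n".join(sections) + "\n</body></html>"


if __name__ == "__main__":
    print(to_html_page(extract_documentation(SAMPLE_XSD)))
```

Because the fragments are treated as HTML in the second pass, entities such as &nbsp; need no DTD there.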
I've tried this with the latest build of QTAssistant and it shows like this in the Schema Help Panel only; I've put a feature request for the grid view, as well as the documentation generator to work the same. Is this what you're expecting?
The help panel shows the annotation of the schema object that is selected in the Graph/Diagram view. To display the help panel press F1.
This issue is fixed in RSA 8.0.4 - which now supports exporting to WSDL/XSD with plain text (as well as an option to sort the schema by type, then name alphabetically!).
To view the documentation in a WSDL/XSD generated from a UML model in prior versions of RSA, the easiest solution is to export the WSDL/XSD as HTML using RSA. You can do this by right-clicking on the WSDL/XSD, selecting export, then selecting HTML.
The generated HTML has a graphical view of each schema element, the documentation for each element, as well as the original source, and everything is hyperlinked together.
Most importantly, the documentation (that's virtually unreadable in the WSDL/XSD) is richly formatted again! One small caveat is that the &nbsp;'s that RSA's documentation editor inserts also appear literally in the HTML output. This seems to be because the ampersand is not only escaped in the WSDL/XSD (which is good), but also in the HTML (bad!):
&amp;nbsp;
Instead it should be
&nbsp;
A simple workaround to this is to replace all &nbsp;'s in the WSDL/XSD with real spaces before generating the HTML.
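That replacement could be scripted along these lines. This is only a sketch: it assumes the entity appears in the raw WSDL/XSD file with its ampersand escaped (as &amp;nbsp;), as described above, and the function name is made up.

```python
from pathlib import Path


def replace_nbsp_entities(xsd_text):
    """Replace escaped non-breaking-space entities with ordinary spaces."""
    return xsd_text.replace("&amp;nbsp;", " ")


def rewrite_file(path):
    """Apply the replacement to a WSDL/XSD file in place (UTF-8 assumed)."""
    file_path = Path(path)
    original = file_path.read_text(encoding="utf-8")
    file_path.write_text(replace_nbsp_entities(original), encoding="utf-8")
```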