Asciidoc: add a link to another HTML file only in the TOC

I have a base Asciidoc file in which I want to reference other HTML files. My current solution pulls them all onto one page via include and adds a reference to each of them in the TOC.
This results in a very long scrolling page.
That is generally fine, but I would prefer the following:
For each additional HTML file, I want to add only a reference to the TOC. When clicking this reference, I want just that specific HTML file to be rendered.
Is there a way to do this with the built-in TOC, or do I have to use a custom solution?

What you are asking for (I think) is one HTML page per Asciidoc markup file, with a common TOC/navigation between them. Normally, this capability is called "chunking", at least the way that DocBook does it.
Unfortunately, Asciidoctor doesn't have a chunking capability. If you have multiple Asciidoc files, they can easily be converted to HTML, but assembling a common TOC is extra work. Also, cross-reference links between Asciidoc pages are not handled.
You might be interested to learn about Antora (https://antora.org/). It produces a static HTML site from one or more Git repos containing Asciidoc markup files: each Asciidoc page becomes an HTML page, with a common TOC plus cross-reference links. It is quite likely a good solution for your current situation.
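Short of adopting Antora, the usual workaround is to convert each Asciidoc file on its own (running `asciidoctor name.adoc` writes `name.html` next to it) and link the results from a separate TOC page. The Python sketch below generates such a page. It is not an Asciidoctor or Antora feature; the function name `build_toc_page` and the page titles are hypothetical, and it assumes the pages have already been converted.

```python
from html import escape


def build_toc_page(pages: list[tuple[str, str]], title: str = "Contents") -> str:
    """Return a standalone HTML page whose TOC links to separately converted pages.

    `pages` is a list of (link text, relative HTML path) pairs. Clicking a link
    loads only that page, instead of one long page assembled with include::.
    """
    items = "\n".join(
        f'    <li><a href="{escape(href, quote=True)}">{escape(text)}</a></li>'
        for text, href in pages
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f'<nav id="toc">\n  <h2>{escape(title)}</h2>\n  <ul>\n{items}\n  </ul>\n</nav>\n'
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical page list: each .adoc file was converted on its own beforehand.
    print(build_toc_page([("Installation", "install.html"), ("Usage", "usage.html")]))
```

Redirecting the script's output to `index.html` gives a landing page; each linked page still renders with its own layout.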

Related

I would like to put more than one website in a single html file

I want to put a couple of HTML websites in a single file. Is this possible?
A website is not just an html "page".
An html file represents the document structure of one page.
Theoretically, saying that you want to represent multiple websites on one html file is like saying that you want to write different documents (your tax files, a book, a ticket for a movie, etc) based on one single template.
While you can theoretically change the structure of such a document dynamically, there is absolutely no point in doing so.
HTML describes the structure of Web pages using markup.
So why would you use a single HTML file to represent different web pages?
Sorry, but you can't. It's not possible. Why would you even do it?
The only thing that comes to mind is the <embed> tag, for example. But it's probably not what you really want.
You must be more specific; the question is vague. In general, you can write code that dynamically changes the website's appearance in response to user input or actions. For example, JavaScript code can show or hide something (or the complete website) while the mouse is over an element, or when an element is selected or deselected. All of this can live in a single HTML document (HTML5, CSS3, JavaScript/jQuery).
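As a rough illustration of the show/hide approach, the Python sketch below generates one HTML file holding several "sites" as `<section>` elements, plus a few lines of inline JavaScript that show only the section named in the URL fragment (`#home`, `#shop`, ...). The function name and section ids are hypothetical. Note that everything still shares one `<head>` and one document, which is why the other answers say this is not really several websites.

```python
from html import escape

# Inline script: show only the <section> whose id matches location.hash,
# falling back to the first section when there is no fragment.
SCRIPT = """
function showCurrent() {
  const id = location.hash.slice(1) || document.querySelector("section").id;
  for (const s of document.querySelectorAll("section")) {
    s.hidden = s.id !== id;
  }
}
window.addEventListener("hashchange", showCurrent);
showCurrent();
"""


def single_file_site(sections: dict[str, str]) -> str:
    """Bundle {section id: inner HTML} into one page with hash-based switching."""
    nav = " | ".join(
        f'<a href="#{escape(sid, quote=True)}">{escape(sid)}</a>' for sid in sections
    )
    # Inner HTML is inserted as-is: it is markup, not text to escape.
    bodies = "\n".join(
        f'<section id="{escape(sid, quote=True)}">{inner}</section>'
        for sid, inner in sections.items()
    )
    return (
        "<!DOCTYPE html>\n<html>\n"
        '<head><meta charset="utf-8"><title>One file</title></head>\n'
        # The script comes last so the sections exist when it runs.
        f"<body>\n<nav>{nav}</nav>\n{bodies}\n<script>{SCRIPT}</script>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(single_file_site({"home": "<h1>Home</h1>", "shop": "<h1>Shop</h1>"}))
```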

Including HTML in Markdown

Assuming I am in control of the parsing environment and I'm certain it is only to be converted to HTML (and not any of the many other formats possible); is it ok to embed some HTML within one's Markdown, in order to side-step around a bug?
Could there be any basic side effects that I (as a newbie) couldn't predict but should be aware of?
Non-conventional Markdown example:
_"<strong>This</strong> is an example sentence."_ -**OP**
Which outputs valid HTML:
<em>"<strong>This</strong> is an example sentence."</em> -<strong>OP</strong>
Resulting in successful content:
"This is an example sentence." -OP
Background (don't have to read):
I noticed that if I include HTML in my Markdown, it appears to get skipped during the conversion, resulting in it being seamlessly incorporated in the output HTML.
This appears to be a good thing, at least in my case (using Hugo to build a website with a template theme), where the Markdown wasn't producing the correct result: it left a pair of unwanted asterisks in the HTML, so text that should have been *italic* showed the asterisks instead.
For those wondering - yes, I confirmed my Markdown was correct using other parsers that handled it fine.
Note: the examples here are simplifications of my specific case
Not only is it okay to do, but it is encouraged. As the rules state:
For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags.
And later:
If you want, you can even use HTML tags instead of Markdown formatting; e.g. if you’d prefer to use HTML <a> or <img> tags instead of Markdown’s link or image syntax, go right ahead.
Of course, there are a few things to take into consideration. For example, block-level tags must be at the document root level (they cannot be nested inside blockquotes, lists, etc.), and content inside them is not parsed as Markdown. However, inline tags can be placed anywhere and do not restrict Markdown parsing.
For people using Markdown in highly modular or user-flexible environments (probably slightly more advanced readers):
One should note that although Markdown is most commonly converted to HTML, it can also be used with other formats[1].
For this reason, if you (as a publisher of content) do not control which parser processes your Markdown, or what it is converted to, it may be 'safer' not to embed HTML in it.
[1] as stated in the Markdown Wikipedia page.

same render style as github markdown page for a page on my jekyll site

I am using jekyll to generate my site and markdown files for creating posts.
I wrote Markdown to generate http://techtaste.in/blog/markdown/markdown-quick-reference.html, and I used the same content in https://github.com/Raghavendrak555/chari.github.io/blob/master/testMarkdown.md. These two pages render differently: the former has no syntax highlighting and the latter does.
What can I do to get the same display for a Markdown file on my site as on the GitHub site?
Do I have to link any specific CSS file to achieve this, or are any config settings needed in _config.yml?
Thanks in advance.
Do I have to link any specific CSS file to achieve this
Yes, exactly.
First, you need to specify the Markdown renderer and its syntax highlighter. I use this:
markdown: kramdown
kramdown:
  syntax_highlighter: rouge
(I'm not sure if this is strictly necessary, but it's what I do and it works for me.)
The syntax highlighter wraps your code in a bunch of span elements with particular classes depending on what color it should be. (On GitHub, inspect the highlighted code to see what I'm talking about.) Then you need to include CSS files that specify colors for each of those classes.
To find these CSS files, do a Google search for "rouge syntax highlighting css files" or "pygments css" (Rouge and Pygments use the same class names). Here is a good list, but according to this you might have to change one class name.
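To make the mechanism concrete: Rouge, like Pygments, emits markup such as `<span class="k">def</span>` inside a `.highlight` container, and a stylesheet colors each short class name. The Python sketch below writes a tiny stylesheet for a handful of those classes. The colors are illustrative choices, not a copy of GitHub's theme, and a real theme file covers many more token classes.

```python
# A small subset of the short token class names shared by Rouge and Pygments.
# The colors are illustrative placeholders.
TOKEN_COLORS = {
    "k": "#d73a49",   # Keyword
    "nf": "#6f42c1",  # Name.Function
    "s1": "#032f62",  # String.Single
    "s2": "#032f62",  # String.Double
    "c1": "#6a737d",  # Comment.Single
    "mi": "#005cc5",  # Number.Integer
}


def highlight_css(colors: dict[str, str], scope: str = ".highlight") -> str:
    """Return one CSS rule per token class, scoped to the highlighter's container."""
    rules = [f"{scope} .{cls} {{ color: {color}; }}" for cls, color in colors.items()]
    return "\n".join(rules) + "\n"


if __name__ == "__main__":
    # Save the output as a .css file and link it from your Jekyll layout's <head>.
    print(highlight_css(TOKEN_COLORS), end="")
```

In practice you rarely hand-write this: Rouge can print a complete theme itself, e.g. `rougify style github > syntax.css`.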
See also: Add syntax highlighting to gh-pages

Transform PDF to HTML, keep layout

What methods are there to transform a PDF to HTML? It could be anything: an online service, software, or a library. (Open source preferred; in the case of a library, PHP or Python would be preferred.) It has to keep the original layout (including page numbers, footnotes and such), keep the images (combining them into one single background image per page is acceptable) and keep the links. It should preferably output valid XHTML and clean up PDF features such as ligatures, but if some post-processing is required, I can live with that. Something with clean, relatively semantic HTML output would be great.
The closest one I found was zamzar.org, but it choked on links. (Also, the HTML output is an ugly heap of absolutely positioned divs and needs post-processing because of encoding problems.)
I know two options. Both look visually very similar, but the output is for sure not semantic.
Python: PyMuPDF
Install PyMuPDF: pip install pymupdf
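import fitz  # PyMuPDF

def to_html(filepath: str) -> None:
    doc = fitz.open(filepath)
    for i, page in enumerate(doc):
        # "html" mode keeps the visual layout (positioned text, inline images).
        # Older PyMuPDF versions called this method getText().
        text = page.get_text("html")
        with open(f"pymupdf-page-{i}.html", "w", encoding="utf-8") as fp:
            fp.write(text)
    doc.close()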
pdftohtml
Within the debian sources (this one)
pdftohtml -c
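If you want to drive pdftohtml from Python (the question prefers PHP or Python), a thin wrapper is enough. This is a sketch that assumes poppler's pdftohtml is on your PATH; the function names are hypothetical, and it passes only the `-c` flag from above, which produces the layout-preserving "complex" output.

```python
import shutil
import subprocess


def pdftohtml_command(pdf_path: str, out_base: str) -> list[str]:
    """Build the argument list for pdftohtml in complex (-c) mode."""
    # Usage is: pdftohtml [options] <PDF-file> [<html-file> ...]
    return ["pdftohtml", "-c", pdf_path, out_base]


def convert(pdf_path: str, out_base: str) -> None:
    """Run pdftohtml, failing early with a clear message if it is not installed."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml not found (on Debian it ships in poppler-utils)")
    subprocess.run(pdftohtml_command(pdf_path, out_base), check=True)
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with file names that contain spaces.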
I have worked with the iText library, and I found it good for parsing the PDF structure (I used it to search for text).
It's a library that parses a PDF and builds an object model from it, so you will need to write the HTML generator yourself, but that should not be too difficult.
Process the PDFs using pdf2htmlEX, which produces pixel-perfect presentational HTML markup (positioned divs).
To get semantic HTML, you can post-process the documents using transcript.py (I am the author). This produces semantic HTML, including headings, paragraphs, lists and data tables. Bear in mind that the tags are reconstructed (not extracted), because the Python code looks for visual design conventions and decides based on the layout. Structure tags and semantic information are not normally present in a PDF.
A few years ago I used ABBYY PDF Transformer, and it worked well for simple documents.
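To illustrate what "reconstructed from visual conventions" means, here is a toy heuristic (not transcript.py's actual logic): treat the most common font size as body text and promote noticeably larger lines to headings. It assumes lines arrive as (text, font size) pairs, which you could collect, for example, from the spans that PyMuPDF's `page.get_text("dict")` returns; the function name and the 1.6/1.2 thresholds are hypothetical.

```python
from collections import Counter
from html import escape


def tag_lines(lines: list[tuple[str, float]]) -> str:
    """Guess an HTML tag for each (text, font size) line of a PDF page."""
    # Assumption: the most frequent (rounded) font size is the body text size.
    body = Counter(round(size) for _, size in lines).most_common(1)[0][0]
    out = []
    for text, size in lines:
        ratio = size / body
        if ratio >= 1.6:
            tag = "h1"
        elif ratio >= 1.2:
            tag = "h2"
        else:
            tag = "p"
        out.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(out)


if __name__ == "__main__":
    page = [("Title", 20.0), ("Intro", 14.0), ("Body one", 10.0), ("Body two", 10.0)]
    print(tag_lines(page))
```

Real tools also weigh boldness, indentation, spacing and position; font size alone misfires on pull quotes or large captions.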
Have you had a look at http://www.jpedal.org/html_index.php?

What extension should I use for files containing fragments of HTML? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 16 days ago.
The ".html" suffix on a filename implies that the document contains html, head, and body tags.
I have some files that each contains a div element or two, but no html or body tags. The file contents are well-formed HTML fragments in the sense that they could be inserted into a body tag of a compliant HTML document, and it would still be compliant. (They contain no "<% %>" markers, no PHP code, etc.) But a fragment file is not compliant HTML by itself, so I'd like to give it a different naming convention.
Several "file extension" sites include an entry for ".PHT" and describe it as "Partial Hypertext File." That sounds promising, but I can't find any additional explanation on the origin, expected file format, or applications that use it. Also, many of the same sites identify ".phtml" and ".phtm" (which appear to be longer versions of the .pht suffix) as PHP files — as noted, my files are not PHP files.
Should I use ".pht" as a suffix? Is there a more appropriate naming convention?
Edit:
I'd like to distinguish fragment files from the full HTML documents in the same directory.
I would use .inc (meaning include file) or .txt.
I generally use .inc or .tpl extensions.
As the files are really written in HyperText Markup Language, I think it's quite valid to give them the .html extension. Consider just naming the directory something like "divs", "panels", "forms", etc.
If you are choosing an extension for fragments, it makes sense to pick one that code editors recognize as HTML, so that you get all your highlighting goodness.
The Atom editor uses the following for html: .ejs, .htm, .html, .kit, .shtml, .tmpl, .tpl, .xhtml and the Scite editor uses: .asp, .cfm, .hta, .htm, .html, .htd, .jsp, .htt, .shtml, .tpl, and .xhtml. Select one used by both, and see if it is also defined by your editor of choice.
For templates I use .tpl and start the first line with a percent sign and an exclamation mark:
%!
<followed by rest of file here>
because the %! causes Chrome and Firefox to treat a file as text by default.
By definition, .html and .htm files contain HTML, regardless of version. There is no official specification that says they must contain full HTML document definitions, just that they contain HTML.
XHTML is a different ball game as it must contain validated content, therefore a full well-formed XHTML document is required.
I use .html and have organised the files with HTML fragments in a separate folder; my editor respects the fact that they do not have to be full documents and doesn't produce any warnings.
To distinguish them in the same directory, you could use .htm for partial content and .html for full documents, or vice versa.
Alternatively, you could put a marker before the extension, such as file.partial.html or file.p.html.
If your editor produces warnings, it may be an idea to look into your editor's reference material (help files, website, support forums, etc.) for a solution, which may in fact be using another extension like .tpl or .inc.
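If you go the naming-convention route, a small check can keep fragments and full documents from drifting apart. The sketch below treats a file as a full document if it contains an `html`, `head` or `body` start tag, matching the question's definition, and reports when that disagrees with the name. The `.partial.html` convention and the function names are only illustrations.

```python
from html.parser import HTMLParser


class _DocumentTagFinder(HTMLParser):
    """Record whether any document-level start tag appears in the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag in ("html", "head", "body"):
            self.found = True


def is_full_document(markup: str) -> bool:
    finder = _DocumentTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


def naming_mismatch(filename: str, markup: str) -> bool:
    """True if the *.partial.html naming disagrees with the file's content."""
    named_partial = filename.endswith(".partial.html")
    # A partial name on a full document, or a plain name on a fragment, is a mismatch.
    return named_partial == is_full_document(markup)
```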
Edit - Forgot to write about XML:
If your fragments are well-formed XML (XHTML-style markup), you could use the .xml extension, the caveats being:
You probably won't get HTML type-hinting
XML files conventionally start with an <?xml version="1.0"?> declaration (recommended, though optional in XML 1.0), so an include process may insert this line where it's not wanted.
An XML document must have exactly one root element, so you'd need a single <div> or similar element containing all your content; you couldn't have two <div> elements at the root of the document.
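The single-root caveat is easy to see with Python's standard XML parser: two sibling `<div>` elements at the top level are rejected, while wrapping them in one container parses fine, with or without an XML declaration. This only demonstrates the XML rule; it says nothing about how a particular include process treats the file, and `parses_as_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is a well-formed XML document."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


two_roots = "<div>first</div><div>second</div>"
wrapped = "<div><div>first</div><div>second</div></div>"

print(parses_as_xml(two_roots))  # False ("junk after document element")
print(parses_as_xml(wrapped))    # True
```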
I generally use .inc (include file) or .shtml (shared HTML). I've never heard of .pht, though.
I've only seen the ".phtml" extension before, but I've never used it. I always use ".html", like any other ".html" file. As MasterPeter said, the file is written in HTML, so I think it's correct.
Another extension, ".fhtml", appears to be used by some outdated Macromedia product, and at least one other software package uses it to mean "fragment HTML".
Having a precedent is nice, but unfortunately the .fhtml suffix was also used to indicate HTML templates with embedded Factor code. That kind of ambiguity is annoying.
I use ".i.html", which to me covers most of the bases, and is unique. Seems to work.
I've used:
.htm for fragments.
.html if it's a full page.
Sometimes I think of using htmlf for fragments, as in JSP you have .jsp for full pages and .jspf for fragments.
I use "inc". I don't see any reason to worry about a thing like that. Call it whatever you want, as long as you get the benefit of looking at the files in a directory listing and not being confused.
One example where I don't use "inc" would be when there are includes inside includes, say your outer page includes a nav include, which then includes a third file. In that case you have to stay with whatever your server expects the extension to be for server-parsed files.
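To see why the extension matters once includes are nested, here is a toy resolver for SSI-style `<!--#include file="..." -->` directives that works on an in-memory dict of files. It is only an illustration, not how Apache's mod_include or any other server is implemented, and the file names are hypothetical. Like a typically configured server, it expands directives only in files whose extension marks them as server-parsed (`.shtml` here), so a directive inside a `.inc` file is passed through untouched.

```python
import re

# Matches directives like <!--#include file="nav.shtml" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand(name: str, files: dict[str, str], parsed=(".shtml",), depth: int = 0) -> str:
    """Recursively replace include directives, but only in server-parsed files."""
    if depth > 10:
        raise RecursionError(f"include nesting too deep at {name!r}")
    text = files[name]
    if not name.endswith(parsed):
        return text  # not server-parsed: any directives stay as literal comments
    return INCLUDE.sub(lambda m: expand(m.group(1), files, parsed, depth + 1), text)


site = {
    "page.shtml": '<body><!--#include file="nav.shtml" --><p>Hello</p></body>',
    "nav.shtml": '<nav><!--#include file="links.inc" --></nav>',
    "links.inc": '<a href="/">Home</a>',
}
print(expand("page.shtml", site))
# <body><nav><a href="/">Home</a></nav><p>Hello</p></body>
```

Renaming `nav.shtml` to `nav.inc` (and updating the outer directive) would leave its inner include unexpanded, which is the situation the answer warns about.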