pandoc: Convert GitHub-flavoured MarkDown containing mixed html and markdown to html - html

My Markdown follows the style of this top-result cheatsheet, including its HTML directives, and I convert it with this command:
pandoc -f gfm -t html --atx-headers -s -o out.html in.md
However, the generated HTML always ignores titles that have the following HTML code above them, leaving lots of literal ### and #### in my output HTML. My titles look like this:
# H1
<a name=toc-anchor-h2 />
## H2
<a name=toc-anchor-h3 />
### H3
<a name=toc-anchor-h4 />
#### H4
Then H1 works fine, but the # characters at the other levels are all treated by pandoc as plain text. How can I solve this problem?

The headers must be preceded by a blank line. Without it, the Markdown parser treats the heading line as a continuation of the raw HTML above it and does not recognize it as a header. Therefore, edit your document to the following:
# H1

<a name=toc-anchor-h2 />

## H2

<a name=toc-anchor-h3 />

### H3

<a name=toc-anchor-h4 />

#### H4
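The blank-line fix can also be applied mechanically. The following is a minimal sketch, not part of the original answer: the function name `add_blank_lines_before_headings` is hypothetical, and the heading test is a simplified version of the ATX heading rule (up to three spaces of indentation, one to six `#` characters, then whitespace or end of line). It skips fenced code blocks so that `#` comments inside them are left alone.

```python
import re

# Simplified ATX heading test: 0-3 spaces, 1-6 '#', then whitespace or end of line.
HEADING = re.compile(r"^ {0,3}#{1,6}(\s|$)")


def add_blank_lines_before_headings(text: str) -> str:
    """Insert a blank line before every ATX heading that directly follows a non-blank line."""
    out = []
    in_fence = False
    for line in text.split("\n"):
        # Toggle on fence delimiters so '#' lines inside code blocks are not touched.
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence
        if not in_fence and HEADING.match(line) and out and out[-1].strip():
            out.append("")
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "# H1\n<a name=toc-anchor-h2 />\n## H2\n"
    print(add_blank_lines_before_headings(sample))
```

The result can then be passed to pandoc as before.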
Or, if you are concerned that this moves the anchors too far away from the intended target, include them inline:
# H1
## <a name=toc-anchor-h2 />H2
### <a name=toc-anchor-h3 />H3
#### <a name=toc-anchor-h4 />H4
Or, as you are using Pandoc, you could use one of the many Pandoc extensions that assign identifiers directly to each header.
As it turns out, Pandoc's gfm variant of Markdown (which you are using) already includes the auto_identifiers extension. As the name implies, the auto_identifiers extension will cause id attributes to be auto-generated for every header. As a reminder, assigning an id attribute to an HTML element has the same effect as defining an anchor; you can link to either with a hash fragment. Therefore, you could simply remove your anchors and use the auto-generated ids which have already been assigned to the headers themselves.
However, if you would like to define your own custom id attributes for each header, then you may want to enable the header_attributes extension and alter your Markdown as follows:
# H1
## H2 {#toc-anchor-h2}
### H3 {#toc-anchor-h3}
#### H4 {#toc-anchor-h4}
which would generate the following HTML:
<h1 id="h1">H1</h1>
<h2 id="toc-anchor-h2">H2</h2>
<h3 id="toc-anchor-h3">H3</h3>
<h4 id="toc-anchor-h4">H4</h4>
Note that the "H1" header has an auto id assigned (based upon the text content of the element), while the remaining headers have the custom ids assigned to them.
One word of caution regarding the header_attributes extension: The syntax for defining the custom ids is non-standard and not supported by most Markdown implementations. If you want portable Markdown, then you should probably stick to the auto-generated ids as that does not require any non-standard markup in your documents.
Update: Note that according to the docs, the header_attributes extension is not compatible with gfm. Therefore, you wouldn't be able to use that extension. However, you get auto_identifiers by default. If you want custom identifiers, then you would need to use the custom raw HTML anchors. Of course, that gives you the added benefit of a portable Markdown document.
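If a document already uses the {#id} attribute syntax and needs to go through gfm, the headers can be rewritten to the inline raw-HTML anchor form. This is only a sketch under assumptions: the helper name `attributes_to_anchors` is made up for illustration, it handles single-line ATX headers with one trailing {#id} only, and it emits `<a name="..."></a>` rather than the self-closing form used above.

```python
import re

# Matches e.g. "## H2 {#toc-anchor-h2}": hashes, title text, then a trailing {#id}.
ATTR_HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*\{#([A-Za-z][\w.:-]*)\}\s*$")


def attributes_to_anchors(markdown: str) -> str:
    """Rewrite '## Title {#id}' headers as '## <a name="id"></a>Title'."""

    def convert(line: str) -> str:
        m = ATTR_HEADING.match(line)
        if not m:
            return line  # not a header with a custom id; leave unchanged
        hashes, title, anchor = m.groups()
        return f'{hashes} <a name="{anchor}"></a>{title}'

    return "\n".join(convert(line) for line in markdown.split("\n"))


if __name__ == "__main__":
    print(attributes_to_anchors("# H1\n## H2 {#toc-anchor-h2}"))
```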

Related

Using <details> tag in markdown is causing premature main-container closing [duplicate]

I am using MarkEd which implements GitHub flavoured markdown.
I have some working markdown:
## Test heading
a paragraph.
## second heading
another paragraph
Which creates:
<h2 id="test-heading">Test heading</h2>
<p>a paragraph.</p>
<h2 id="second-heading">second heading</h2>
<p>another paragraph</p>
I would like to wrap that markdown section in a div, eg:
<div class="blog-post">
## Test heading
a paragraph.
## second heading
another paragraph
</div>
However this returns the following HTML:
<div class="blog-post">
## Test heading
a paragraph.
## second heading
another paragraph
</div>
Eg, no markdown, literally '## Test heading' appears in the HTML.
How can I properly wrap my markdown in a div?
I have found the following workaround, however it is ugly and not an actual fix:
<div class="blog-post">
<div></div>
## Test heading
a paragraph.
## second heading
another paragraph
</div>
Markdown
For Markdown, this is by design. From the Inline HTML section of the Markdown reference:
Note that Markdown formatting syntax is not processed within block-level HTML tags. E.g., you can’t use Markdown-style emphasis inside an HTML block.
But it is explicitly allowed for span-level tags:
Unlike block-level HTML tags, Markdown syntax is processed within span-level tags.
So, depending on your use-case, you might get away with using a span instead of a div.
CommonMark
If the library you use implements CommonMark, you are lucky. Examples 108 and 109 of the spec show that if you keep an empty line between the HTML block and the Markdown code, the contents will be parsed as Markdown:
<div>

*Emphasized* text.

</div>
should work, while the following shouldn't:
<div>
*Emphasized* text.
</div>
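When the wrapper is generated by a script, the blank lines are easy to add automatically. A minimal sketch, assuming a CommonMark-style renderer; the function name `wrap_in_div` and its parameters are hypothetical and do not belong to any Markdown library:

```python
import html


def wrap_in_div(markdown: str, css_class: str) -> str:
    """Wrap Markdown in a div, separated by blank lines so the content is still parsed."""
    body = markdown.strip("\n")
    # The empty line after the opening tag ends the HTML block; the one before
    # the closing tag lets it start a new HTML block.
    return f'<div class="{html.escape(css_class, quote=True)}">\n\n{body}\n\n</div>\n'


if __name__ == "__main__":
    print(wrap_in_div("## Test heading\na paragraph.", "blog-post"))
```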
And, again according to the same section in the reference, some implementations recognize an additional markdown=1 attribute on the HTML tag to enable parsing of Markdown inside it.
Though it doesn't seem to work in StackOverflow yet:
Testing **Markdown** inside a red-background div.
GitHub Pages supports the markdown="1" attribute to parse markdown inside HTML elements, e.g.
<div class="tip" markdown="1">Have **fun!**</div>
Note: As of 2019/03, this doesn't work on github.com, only GitHub Pages.
Note: Quotes, as in markdown="1", are not required by HTML5 but if you don't use quotes (markdown=1), GitHub does not recognize it as HTML. Also, support is buggy right now. You will likely get incorrect output if your HTML element is larger than a single paragraph. For example, due to bugs I was unable to embed a Markdown list inside a div.
If you find yourself in an environment in which markdown="1" doesn't work but span does, another option is to use <span style="display:block"> so that block-level classes are compatible with it, e.g.
<span style="display:block" class="note">It **works!**</span>
Tip: <span class="note"></span> is shorter than <div class="note" markdown="1"></div>, so if you control the CSS you might prefer to use <span> and add display: block; to your CSS.
Markdown Extra is needed for Markdown formatting to work inside HTML blocks. Please check the documentation here: https://michelf.ca/projects/php-markdown/extra/
Markdown Extra gives you a way to put Markdown-formatted text inside
any block-level tag. You do this by adding a markdown attribute to the
tag with the value 1, which gives markdown="1"
Last resort option:
Some libraries may be case sensitive.
Try <DIV> instead of <div> and see what happens.
Markdownsharp has this characteristic - although on StackOverflow they strip out all DIVs anyway so don't expect it to work here.
By looking at the docs for Extending Marked and modifying the html renderer method, you can do something like this to replace the parts between tags with parsed markdown. I haven't done extensive testing, but it worked with my first few attempts.
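const marked = require('marked');
const renderer = new marked.Renderer();
// Override the html renderer: re-parse every run of text that sits before a tag as Markdown.
renderer.html = (mixedContent) => mixedContent.replace(/[^<>]+?(?=<)/g, (match) => {
  const tokens = marked.lexer(match); // tokenize the text between tags
  return marked.parser(tokens);       // render those tokens to HTML
});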
Edit
This new regex ensures that only Markdown separated from the HTML tags by blank lines is parsed.
const marked = require('marked');
const renderer = new marked.Renderer();
// Only text surrounded by blank lines (\n\n ... \n\n) and followed by a tag is re-parsed.
renderer.html = (mixedContent) => mixedContent.replace(/\n\n[^<>]+?\n\n(?=<)/g, (match) => {
  const tokens = marked.lexer(match);
  return marked.parser(tokens);
});
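To see what the stricter pattern selects, it can be tried in isolation. The sketch below ports only the regular expression to Python's `re` module (the syntax used here is the same in both engines); it does not call marked, and the helper name `markdown_runs` is hypothetical.

```python
import re

# Same pattern as in the renderer above: text that is surrounded by blank lines
# and immediately followed by a tag.
PATTERN = re.compile(r"\n\n[^<>]+?\n\n(?=<)")


def markdown_runs(html_block: str) -> list:
    """Return the runs of text that the renderer would re-parse as Markdown."""
    return PATTERN.findall(html_block)


if __name__ == "__main__":
    print(markdown_runs("<div>\n\n## Test heading\n\n</div>"))  # one run found
    print(markdown_runs("<div>\n## Test heading\n</div>"))      # nothing found
```

Only the first input, with blank lines around the heading, produces a match.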
In my case (on GitHub), the problem was resolved when I added a newline between the HTML tags and the Markdown text.

How can I add header metadata without adding the <h1>?

I'm writing something in markdown and converting it to html with pandoc, but when I add the title variable in the yaml header, it also adds an <h1> to the top of the document, which I don't want. In the pandoc documentation it says to use the title-meta variable, but it still says
[WARNING] This document format requires a nonempty <title> element.
Is there a way to set the title without adding the title block?
command I'm using:
pandoc -s "file.md" -o "file.html"
output of pandoc --version:
pandoc 2.10.1
Compiled with pandoc-types 1.21, texmath 0.12.0.2, skylighting 0.8.5
Default user data directory: C:\Users\noah\AppData\Roaming\pandoc
Copyright (C) 2006-2020 John MacFarlane
Web: https://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
One can set an explicit title with --metadata=title="My title" while simultaneously preventing the output of the <h1> and <header> elements by setting the template variable title to an empty string:
pandoc --metadata=title="Fancy title" --variable=title="" ...
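When pandoc is driven from a script, the same two options can be passed as an argument list, which avoids shell quoting of the empty value. A sketch only: the helper name `pandoc_title_args` is hypothetical, and the flags are exactly the ones from the question and the answer above.

```python
def pandoc_title_args(source: str, target: str, title: str) -> list:
    """Build a pandoc command that sets <title> but leaves the template's title variable empty."""
    return [
        "pandoc", "-s", source, "-o", target,
        f"--metadata=title={title}",  # fills the <title> element
        "--variable=title=",          # empty template variable: no <h1>/<header> title block
    ]


if __name__ == "__main__":
    args = pandoc_title_args("file.md", "file.html", "Fancy title")
    print(args)
    # To actually run it (requires pandoc on PATH):
    # import subprocess; subprocess.run(args, check=True)
```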

How to set an anchor in Markdown File

I want to set an anchor in a .md file. The file is in the Team Foundation Server but the tag <a name="anchor"></a> does not work. Are there any other possibilities to set an anchor in a .md file?
I already tried the following:
Link to an anchor
[Question 22](answers.md#answer22)
Setting an anchor
<a name="answer22"></a> This is an answer for Question 22
The result is that clicking on "Question 22" opens the file answers.md successfully but does not jump to Question 22. Furthermore, the tag <a name="answer22"></a> doesn't seem to be recognized as code in the .md file in TFS. If I open the file answer22.md and look at the preview (in TFS you can switch between "contents", "preview", "history" etc.), the tag is not hidden; it shows up in the preview as if it were plain text.
As stated in Microsoft's basic Markdown guidance, anchors are generated for every heading.
Just place your answer below a heading.
questions.md
[Question 22](./answers.md#answer-22)
answers.md
## Answer 22
The answer is 42
See the Markdown Guide, Extended Syntax: Linking to Heading IDs.
This markdown:
[Heading IDs](#heading-ids)
Should be rendered to the following HTML by most supporting engines:
<a href="#heading-ids">Heading IDs</a>
Where the text of the headings serves as anchor-name in slug form (spaces converted to hyphens).
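The slug rule can be approximated in a few lines. This is a simplified sketch, not the exact algorithm of TFS, GitHub, or pandoc, each of which has its own details (duplicate handling, punctuation rules); the name `slugify` is just illustrative. It lowercases the heading, drops everything except letters, digits, underscores, hyphens and spaces, and turns each space into a hyphen.

```python
import re


def slugify(heading: str) -> str:
    """Approximate the id a renderer derives from a heading's text."""
    text = heading.lower()
    # Drop punctuation and symbols (including emoji); keep word characters, hyphens, spaces.
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


if __name__ == "__main__":
    for title in ("Answer 22", "Heading IDs"):
        print(title, "->", "#" + slugify(title))
```

Since renderers differ, it is worth checking the generated HTML for the actual id before relying on a link.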
In Pandoc Markdown you can set anchors on arbitrary spans inside a paragraph using syntax [span]{#anchor}, e.g.:
[This is an answer for question 22]{#answer-22}
And then reference it as usual: [Question 22](#answer-22).
If you want to reference a whole paragraph then formally it's not possible but you can make a simple hack adding an empty span right in the beginning of the paragraph:
[]{#answer-22}
Paragraph text.

In Markdown, what is the best way to link to a fragment of a page, i.e. #some_id?

I'm trying to figure out how to reference another area of a page with Markdown. I can get it working if I add a
<div id="mylink" />
and for the link do:
[My link](#mylink)
But my guess is that there's some other way to do an in-page link in Markdown that doesn't involve the straight up div tag.
Any ideas?
See this answer.
In summary make a destination with
<a name="sometext"></a>
inserted anywhere in your Markdown markup, for example in a header:
## heading<a name="headin"></a>
and link to it using the markdown linkage:
[This is the link text](#headin)
or
[some text](#sometext)
Don't use <div> -- this will mess up the layout for many renderers.
(I have changed id= to name= above. See this answer for the tedious explanation.)
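Broken in-page links are easy to miss, so a small check can help. The following is a rough sketch under assumptions: the function name `missing_fragments` is hypothetical, it only knows about explicit name= and id= attributes in raw HTML (one per tag), and it does not know about ids that a renderer generates automatically for headings.

```python
import re

# Markdown links that point to a fragment in the same page: [text](#fragment)
LINK = re.compile(r"\]\(#([^)\s]+)\)")
# Raw HTML tags carrying a name= or id= attribute, quoted or unquoted.
TARGET = re.compile(r"""<[^>]*\b(?:name|id)\s*=\s*["']?([^"'\s>]+)""")


def missing_fragments(markdown: str) -> set:
    """Return fragments that are linked to but never defined by an explicit anchor."""
    targets = set(TARGET.findall(markdown))
    links = set(LINK.findall(markdown))
    return links - targets


if __name__ == "__main__":
    doc = '## heading<a name="headin"></a>\n[This is the link text](#headin)\n[some text](#sometext)\n'
    print(missing_fragments(doc))
```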
I guess this depends on what you're using to generate HTML from your Markdown. I noticed that Jekyll (used by github.io pages by default) automatically adds the id="" attribute to headings in the HTML it generates.
For example, if your Markdown is
My header
---------
The resulting html will look like this:
<h2 id="my-header">My header</h2>
So you can link to it simply by [My link](#my-header)
With the PHP version of Markdown, you can also link headers to fragment identifiers within the page using a syntax like either of the following, as documented here
Header 1 {#header1}
========
## Header 2 ## {#header2}
and then
[Link back to header 1](#header1)
[Link back to header 2](#header2)
Unfortunately this syntax is currently only supported for headers, but at least it could be useful for building a table of contents.
The destination anchor for a link in an HTML page may be any element with an id attribute. See Links on the W3C site. Here's a quote from the relevant section:
Destination anchors in HTML documents
may be specified either by the A
element (naming it with the name
attribute), or by any other element
(naming with the id attribute).
Markdown treats HTML as HTML (see Inline HTML), so you can create your fragment identifiers from any element you like. If, for example, you want to link to a paragraph, just wrap the paragraph in a paragraph tag, and include an id:
<p id="mylink">Lorem ipsum dolor sit amet...</p>
Then use your standard Markdown [My link](#mylink) to create a link to the fragment anchor. This will help to keep your HTML clean, as there's no need for extra markup.
For anyone using Visual Studio Team Foundation Server (TFS) 2015: it really does not like embedded <a> or <div> elements, at least in headers. It doesn't like emoji in headers either:
### 🔧 Configuration 🔧
Lorem ipsum problem fixem.
Gets translated to:
<h3 id="-configuration-">🔧 Configuration 🔧</h3>
<p>Lorem ipsum problem fixem.</p>
And so links should either use that id (which breaks this and other preview extensions in Visual Studio), or remove the emoji:
Here's [how to setup](#-configuration-) //🔧 Configuration 🔧
Here's [how to setup](#configuration) //Configuration
Where the latter version works both online in TFS and in the markdown preview of Visual Studio.
In Pandoc Markdown you can set anchors on arbitrary spans inside a paragraph using syntax [span]{#anchor}, e.g.:
Paragraph, containing [arbitrary text]{#mylink}.
And then reference it as usual: [My link](#mylink).
If you want to reference a whole paragraph then the most straightforward way is to add an empty span right in the beginning of the paragraph:
[]{#mylink}
Paragraph text.