Pandoc metadata not appearing in default HTML template

I'm converting org and Markdown files to HTML using pandoc. I want to set metadata such as the title, subtitle, and author in an external YAML file and have them displayed by a template. However, nothing appears beyond the normal body conversion.
I'm using the default HTML template. I've run the conversion passing the YAML file as the first input, so that pandoc concatenates it with the Markdown:
pandoc -t html -o output.html metadata.yaml input.md
I also tried including the yaml_metadata_block extension:
pandoc -t html+yaml_metadata_block -o output.html metadata.yaml input.md
Also, I've tried setting the variables in the command itself:
pandoc -t html -o output.html -V title="my title" input.md
My YAML file looks like this:
---
title: "my title"
subtitle: "my subtitle"
author: "the author"
...
Inspecting the default HTML template with pandoc -D html, it looks like when title etc. are defined, they are placed in a header block:
$if(title)$
<header>
<h1 class="title">$title$</h1>
$if(subtitle)$
<p class="subtitle">$subtitle$</p>
$endif$
$for(author)$
<p class="author">$author$</p>
$endfor$
$if(date)$
<p class="date">$date$</p>
$endif$
</header>
$endif$
But in every case, the HTML file only contains the converted text from input.md. I think this is the $body$ line defined in the default template.
How can I get these fields to appear in my html document?

My goodness, all I was missing was the -s option!
from the man page:
-s, --standalone
Produce output with an appropriate header and footer (e.g. a standalone HTML, LaTeX, TEI, or RTF file, not a fragment). This option is set automatically for pdf, epub, epub3, fb2, docx, and odt output.
Thus the following command works as expected:
pandoc -s -t html -o output.html metadata.yaml input.md


How can I add header metadata without adding the <h1>?

I'm writing something in Markdown and converting it to HTML with pandoc, but when I add the title variable to the YAML header, pandoc also adds an <h1> to the top of the document, which I don't want. The pandoc documentation says to use the title-meta variable, but pandoc still says
[WARNING] This document format requires a nonempty <title> element.
Is there a way to set the title without adding the title block?
Command I'm using:
pandoc -s "file.md" -o "file.html"
Output of pandoc --version:
pandoc 2.10.1
Compiled with pandoc-types 1.21, texmath 0.12.0.2, skylighting 0.8.5
Default user data directory: C:\Users\noah\AppData\Roaming\pandoc
Copyright (C) 2006-2020 John MacFarlane
Web: https://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
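One can set an explicit title with --metadata=title="Fancy title" while simultaneously preventing the output of the <h1> and <header> elements by setting the template variable title to an empty string. The metadata value still fills the <title> element in the <head>, which avoids the warning, while the empty variable makes $if(title)$ false in the template: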
pandoc --metadata=title="Fancy title" --variable=title="" ...

Pandoc: Change font family to sans while converting from Markdown to HTML

I successfully get nicely formatted text that I can paste anywhere using:
cat myFile.md | pandoc -s -f markdown -t html | xclip -selection clipboard -t text/html
xclip is a command-line interface to X selections (the clipboard). Writing to a file with ... -t html -o myFile.html works fine too.
I'm trying to change the font family from the default serif to some sans-serif font family. I found a lot of examples for LaTeX, PDF and DOC, but none that works in this scenario. I tried a lot of fonts (listed by fc-list : family, even after installing the texlive-xetex package). The closest answer I could find was this one.
I'd like to use only command-line parameters, avoiding things like --css source/styles.css.
Using pandoc 1.19.2.4 on Ubuntu 18.04.
Some --variable settings I tried:
-V fontfamily:arev
-V fontfamily:Ubuntu
-V fontfamilyoptions:sfdefault
-V "mainfont:DejaVuSans"
-V mainfont="DejaVu Sans Serif"
-V "sansfont:DejaVuSans"
Edit 1:
Based on mb21's answer: since pandoc 1.12.x (source) it is possible to provide additional metadata to pandoc by adding a YAML metadata block.
With newer pandoc versions, I also added a title key to avoid the warning "[WARNING] This document format requires a nonempty <title> element."
---
title: My File
header-includes: |
  <style>
  body {
    font-family: "Liberation Sans";
  }
  </style>
---
I still don't see the fundamental difference, in this respect, between starting from Markdown rather than LaTeX and targeting HTML rather than PDF.
Update: this is possible since pandoc 2.11. For details, see the MANUAL; for example:
---
mainfont: sans-serif
---
my markdown
If your font name includes spaces, wrap the name in quotes escaped with backslashes:
---
mainfont: \"Sanskrit 2020\"
---
Old answer: The font variables you mention are only for LaTeX/PDF output. To style HTML, you need CSS. For example, you can put this in your Markdown file:
---
header-includes: |
  <style>
  body {
    font-family: sans-serif;
  }
  </style>
---
my markdown
Alternatively you can:
use --css
copy the default styles.html partial to ~/.pandoc/templates/styles.html and modify it. (You can just create the directories if they don't exist.)
use a template like this one...
Also: pandoc 1.19 is ancient, see https://pandoc.org/installing.html
Another solution, based on mb21's, is to put that code in a separate YAML file (e.g. metadata.yaml) and pass it with the --metadata-file option.
I provide the title with --metadata.
cat myFile.md | pandoc -s -f markdown -t html --metadata-file metadata.yaml --metadata title="My File" -o myFile.html
A metadata.yaml content example:
---
header-includes: |
  <style>
  body {
    font-family: "DejaVu Sans";
  }
  </style>
---
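If you drive this from a script rather than a shell one-liner, the same setup can be sketched in Python. This is a minimal sketch, assuming pandoc 2.x (with --metadata-file) is on PATH; the names METADATA_YAML, write_metadata and build_pandoc_cmd are hypothetical. It writes the metadata.yaml shown above, with the lines under header-includes indented as YAML block scalars require, and builds the argument list for the same command, passing myFile.md directly instead of piping it through cat.

```python
import shlex
import shutil
import subprocess
from pathlib import Path

# Same content as the metadata.yaml example; the lines under
# "header-includes: |" must be indented to belong to the block scalar.
METADATA_YAML = """\
---
header-includes: |
  <style>
  body {
    font-family: "DejaVu Sans";
  }
  </style>
---
"""


def write_metadata(path: Path) -> Path:
    """Write the CSS-carrying metadata file (hypothetical helper)."""
    path.write_text(METADATA_YAML, encoding="utf-8")
    return path


def build_pandoc_cmd(src: str, out: str, metadata_file: str, title: str) -> list:
    """Build the argument list for a standalone Markdown -> HTML conversion."""
    return [
        "pandoc",
        "-s",  # standalone: emits the <head>, where header-includes ends up
        "-f", "markdown",
        "-t", "html",
        "--metadata-file", metadata_file,
        "--metadata", f"title={title}",  # nonempty title avoids the <title> warning
        "-o", out,
        src,
    ]


if __name__ == "__main__":
    cmd = build_pandoc_cmd("myFile.md", "myFile.html", "metadata.yaml", "My File")
    print(shlex.join(cmd))  # shell-quoted, for display only
    # Only write the metadata file and run pandoc if pandoc is installed
    # and the input file exists.
    if shutil.which("pandoc") and Path("myFile.md").exists():
        write_metadata(Path("metadata.yaml"))
        subprocess.run(cmd, check=True)
```

Passing the argument list to subprocess.run avoids shell quoting problems with titles that contain spaces.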
AFAIK it is not possible to provide all of the styling through --metadata alone in the same one-liner.
Another very useful one-liner to convert the clipboard to formatted, rendered text is (on a Mac, use pbpaste):
xsel -b | pandoc -s -f markdown -t html | xclip -selection clipboard -t text/html

Converting multiline code snippets in HTML to Markdown with pandoc

I want to translate this snippet of HTML into Markdown using pandoc.
<code class="code_block"># chown root:root /boot/grub/grub.cfg<br/># chmod og-rwx /boot/grub/grub.cfg
</code>
The output I want is something like this:
```
# chown root:root /boot/grub/grub.cfg
# chmod og-rwx /boot/grub/grub.cfg
```
But the output never includes the <br>, or any line break, in the Markdown file:
# chown root:root /boot/grub/grub.cfg# chmod og-rwx /boot/grub/grub.cfg
I already tried different commands and extensions.
$ pandoc -f html -t markdown t.html
$ pandoc -f html -t markdown+hard_line_breaks t.html
$ pandoc -f html -t markdown+raw_html+hard_line_breaks t.html
$ pandoc -f html -t markdown+raw_html+hard_line_breaks-inline_code_attributes t.html
Am I missing something?
This is due to the way pandoc represents inline code internally: the code is stored as a string of verbatim text together with a set of attributes. Newlines, being layout commands, don't fit into this representation and are ignored.
Note also that the above is a rather uncommon way of writing multi-line code. See, e.g., the MDN docs on the <code> element:
To represent multiple lines of code, wrap the <code> element within a <pre> element. The <code> element by itself only represents a single phrase of code or line of code.
The problem is that your code block is not properly formatted as a code block. You need (at least) the following:
<pre><code># chown root:root /boot/grub/grub.cfg
# chmod og-rwx /boot/grub/grub.cfg
</code></pre>
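If you can't fix the HTML at its source, a small preprocessing step can rewrite it into that shape before pandoc reads it. This is a minimal sketch in Python, assuming the snippets always look exactly like the question's <code class="code_block"> elements (that exact attribute, no nesting); fix_code_blocks is a hypothetical helper, and a regular expression is only adequate for such narrowly formatted input, not for HTML in general. It turns each <br> into a newline and wraps the element in <pre>, so pandoc sees a code block instead of inline code.

```python
import re

# Matches the question's multi-line snippets: <code class="code_block">...</code>.
# Non-greedy and DOTALL so each element is matched separately, even across lines.
CODE_BLOCK_RE = re.compile(r'<code class="code_block">(.*?)</code>', re.DOTALL)

# <br>, <br/> or <br />, plus a newline that may already follow it,
# so "a<br/>\nb" does not turn into a blank line.
BR_RE = re.compile(r"<br\s*/?>\n?", re.IGNORECASE)


def fix_code_blocks(html_text: str) -> str:
    """Rewrite <code class="code_block"> snippets as <pre><code> blocks."""

    def to_pre(match: re.Match) -> str:
        body = BR_RE.sub("\n", match.group(1))
        # Exactly one trailing newline before </code>, as in the example above.
        body = body.rstrip("\n") + "\n"
        return "<pre><code>" + body + "</code></pre>"

    return CODE_BLOCK_RE.sub(to_pre, html_text)
```

For example, read t.html, apply fix_code_blocks, write the result to a new file, and run pandoc -f html -t markdown on that file; inline <code> elements without the class are left untouched.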
In addition to the HTML spec, covered in @tarleb's answer, the Markdown rules also differentiate between a code block and a code span based solely on the existence (or not) of the <pre> tag.
Note that the original Markdown rules demonstrate a code block as generating this HTML:
<pre><code>This is a code block.
</code></pre>
A <code> tag wrapped in a <pre> tag. In contrast, the same rules demonstrate a code span generating this HTML:
<p>Use the <code>printf()</code> function.</p>
Note that only the <code> tag is used, and it is only an inline span (wrapped in a <p>), not a block-level element.
When Pandoc is converting from HTML back to Markdown it follows the same convention in reverse. Yes, you have class="code_block" set on your <code> tag, but Pandoc doesn't know what that means, nor should it. And yes, your <code> element is not wrapped in a <p>, but that is just poorly formed HTML (according to the HTML spec, <code> is not a block-level element, but phrasing content; that is, content which gets wrapped in a block-level element such as a <p> or a <pre> element).
And then there is the issue of your <br> tag. How would Pandoc know if that is part of the code or a styling hook? In fact, it doesn't. Which is why we use <pre> tags for multi-line code blocks. With the <pre> tag, whitespace is preserved. Therefore, you only need a newline character without the <br> tag.
For completeness, I realize that the original Markdown rules do not include fenced code blocks, so I will also point to the GitHub Flavored Markdown spec, which also demonstrates fenced code blocks as producing <pre><code> wrapped blocks. Naturally, to go in reverse, you would need to start with <pre><code> wrapped blocks to end up with fenced code blocks.

Using the Author field of R Markdown in footer.html

R Markdown allows you to add a footer to your HTML output. The YAML header lets you give an author name in a dedicated field.
I would like to use this author name in my footer.html file, but cannot figure out how to achieve that.
Here is a minimal example:
fic.rmd:
---
title: "title"
author: "Mister-A"
output:
  html_document:
    includes:
      after_body: footer.html
---
content
And in the same folder the footer.html file:
I am - #author-name-field-that-I-don't-know-how-to-get -
Any help or advice would be much appreciated. Thank you very much.
If you want to be able to use the YAML parameters within sections of the report, you need to alter the base pandoc template. You can find all of them here.
The basic structure of making this work is to put the variable surrounded by dollar signs to use the YAML variable in the output document. So for example $author$ is required in this case.
Solution
We can create a copy of the pandoc template for HTML in our local directory using the following command. This is the same file as here.
# Copies the R Markdown template to the local directory so we can edit it
file.copy(rmarkdown:::rmarkdown_system_file("rmd/h/default.html"), to = "template.html")
In template.html, we need to add the pandoc tags. To add a footer, we want to add code to the bottom of the document. This is line 457 in the current template, but this may change in future versions, so we put it after the include-after loop:
$for(include-after)$
$include-after$
$endfor$
<hr />
<p style="text-align: center;">I am $author$</p>
$if(theme)$
$if(toc_float)$
</div>
</div>
$endif$
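Because that line number will drift between rmarkdown versions, locating the include-after loop and inserting after it can also be scripted instead of edited by hand. This is a minimal sketch in Python, assuming the copied template.html contains the three loop lines exactly as shown in the excerpt above, each on its own line without indentation (check your copy, since the layout may differ between versions); ANCHOR, FOOTER and add_author_footer are hypothetical names.

```python
from pathlib import Path

# The loop to anchor on, exactly as it appears in the template excerpt.
ANCHOR = "$for(include-after)$\n$include-after$\n$endfor$\n"

# Footer from the answer; pandoc fills in $author$ from the YAML header.
FOOTER = '<hr />\n<p style="text-align: center;">I am $author$</p>\n'


def add_author_footer(template: str, footer: str = FOOTER) -> str:
    """Return the template with `footer` inserted right after the include-after loop."""
    if footer in template:
        return template  # already patched; running twice changes nothing
    start = template.find(ANCHOR)
    if start == -1:
        raise ValueError("include-after loop not found; the template layout differs")
    end = start + len(ANCHOR)
    return template[:end] + footer + template[end:]


if __name__ == "__main__":
    path = Path("template.html")
    # read_text uses universal newlines, so Windows line endings still match ANCHOR.
    if path.exists():
        patched = add_author_footer(path.read_text(encoding="utf-8"))
        path.write_text(patched, encoding="utf-8")
```

Raising an error when the anchor is missing is deliberate: silently appending the footer elsewhere could put it outside the </body> of a template whose layout has changed.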
Finally, the R Markdown file looks like:
---
title: "title"
author: "Mister-A"
output:
  html_document:
    template: template.html
---
This is some text
As a possible extension of this, you may want to check out this post on designing a stylish footer.

Pandoc HTML variables: `quotes` and `math`

Pandoc's default HTML template contains these two variables:
quotes,
math.
How are they supposed to be used?
More specifically, I see that quotes sets the values for the <q> tag. Is this tag used in Markdown-to-HTML conversion?
tl;dr: they seem to be mostly obsolete legacies from previous versions of pandoc
quotes
A little archeology of pandoc commits shows that 'quotes' was added when pandoc switched from using <q> tags to directly inserting quote signs. A new option, --html-q-tags, was added to keep the previous behavior: the option wraps quotes in <q> and sets quotes to true, so that a piece of CSS code is added as explained in the HTML template. See this commit to pandoc and this commit to pandoc-templates. See the behavior with the following file (test.md):
"hello world"
This:
pandoc test.md -t html --smart --standalone
produces (skipping the usual head, with no CSS affecting <q>)
<p>“hello world”</p>
While this
pandoc test.md -t html --standalone --html-q-tags --smart
produces (skipping the usual header)
<style type="text/css">q { quotes: "“" "”" "‘" "’"; }</style>
</head>
<body>
<p><q>hello world</q></p>
</body>
You have to use --smart though.
math
It looks like this was introduced to include math rendering scripts inside the standalone file. See this commit from 2010. I think some command-line options that pick non-(currently)-default math rendering systems, like --mathml, set this variable to a value that actually makes sense (like copying the math rendering scripts). Try:
pandoc -t html --mathml
For the quotes variable, see @scoa's answer.
As for the math variable, here is what I found.
When using MathML, that is the option --mathml, the code block:
$if(math)$
$math$
$endif$
in the default HTML conversion template adds a portability script to the HTML output.
At the time of writing, Chrome and Edge do not support MathML, and Firefox seems to support it without this script.
So, for a custom template, removing the $if(math)$ ... code block will not affect MathML rendering.
When using MathJax, that is the option --mathjax, $if(math)$ ... adds to the HTML output the script block:
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
This is always necessary to render the maths formulae.
When using --latexmathml, a giant script that converts the LaTeX-style math into MathML is inserted by the $if(math)$ ... code block. Without this code block in the conversion template, the script is not inserted and the math can't be rendered.