Pandoc: Change font family to sans while converting from Markdown to HTML

I can successfully get nicely formatted text that I can paste anywhere using:
cat myFile.md | pandoc -s -f markdown -t html | xclip -selection clipboard -t text/html
xclip is a command-line interface to X selections (the clipboard). Writing to a file with ... -t html -o myFile.html works fine too.
I'm trying to change the font family from the default serif to some sans-serif font family. I found a lot of examples for LaTeX, PDF and DOC, but none that works in this scenario. I tried a lot of fonts (listed by fc-list : family, even after installing the texlive-xetex package). The closest answer I could find was this one.
I'd like to do this with command-line parameters only, avoiding things like --css source/styles.css.
I'm using pandoc 1.19.2.4 on Ubuntu 18.04.
Some --variable options I tried:
-V fontfamily:arev
-V fontfamily:Ubuntu
-V fontfamilyoptions:sfdefault
-V "mainfont:DejaVuSans"
-V mainfont="DejaVu Sans Serif"
-V "sansfont:DejaVuSans"
Edit 1:
Based on mb21's answer: since Pandoc 1.12.x (source) it is possible to provide more metadata to Pandoc by adding a YAML metadata block.
On newer Pandoc versions, I also added a title key to avoid the "[WARNING] This document format requires a nonempty <title> element.".
---
title: My File
header-includes: |
  <style>
  body {
    font-family: "Liberation Sans";
  }
  </style>
---
I still don't see the fundamental difference in this aspect between coming from Markdown instead of LaTeX, and going to HTML instead of PDF.

Update: This is possible in pandoc 2.11. For details, see the MANUAL, but for example:
---
mainfont: sans-serif
---
my markdown
If your font name includes spaces, put the name in quotes escaped with backslashes:
---
mainfont: \"Sanskrit 2020\"
---
Old answer: The font variables you mention are only for LaTeX/PDF output. To style HTML, you need CSS. You can for example put this in your markdown file:
---
header-includes: |
  <style>
  body {
    font-family: sans-serif;
  }
  </style>
---
my markdown
Alternatively you can:
use --css
copy the default styles.html partial to ~/.pandoc/templates/styles.html and modify it. (You can just create the directories if they don't exist.)
use a template like this one...
Also: pandoc 1.19 is ancient, see https://pandoc.org/installing.html
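
For the --css alternative listed above, a minimal sketch could look like the following. The file names (styles.css, myFile.md, myFile.html) and the font stack are only illustrative, and the pandoc call is skipped when pandoc is not on the PATH.

```shell
# Sketch only: file names and the font stack are assumptions for illustration.
cat > styles.css <<'EOF'
body {
  font-family: "DejaVu Sans", sans-serif;
}
EOF

printf '# Hello\n\nSome *Markdown* text.\n' > myFile.md

# --css makes the standalone page reference styles.css; the rules are not copied into the page.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -s -f markdown -t html --css styles.css --metadata title="My File" -o myFile.html myFile.md
fi
```

Because the stylesheet is only linked, the styling depends on styles.css being reachable from wherever the HTML ends up. That is probably why the header-includes approach is more convenient when the result is piped to the clipboard.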

Another solution, based on mb21's answer, is to put that code in a separate YAML file (e.g. metadata.yaml) and pass it with the --metadata-file option.
I provide the title with --metadata.
cat myFile.md | pandoc -s -f markdown -t html --metadata-file metadata.yaml --metadata title="My File" -o myFile.html
A metadata.yaml content example:
---
header-includes: |
  <style>
  body {
    font-family: "DejaVu Sans";
  }
  </style>
---
AFAIK it is not possible to provide the whole styling just through --metadata in the same one-liner.
Another very useful one-liner, which converts Markdown from the clipboard into formatted, rendered text (on Mac use pbpaste):
xsel -b | pandoc -s -f markdown -t html | xclip -selection clipboard -t text/html

Related

How can I add header metadata without adding the <h1>?

I'm writing something in markdown and converting it to html with pandoc, but when I add the title variable in the yaml header, it also adds an <h1> to the top of the document, which I don't want. In the pandoc documentation it says to use the title-meta variable, but it still says
[WARNING] This document format requires a nonempty <title> element.
Is there a way to set the title without adding the title block?
command I'm using:
pandoc -s "file.md" -o "file.html"
output of pandoc --version:
pandoc 2.10.1
Compiled with pandoc-types 1.21, texmath 0.12.0.2, skylighting 0.8.5
Default user data directory: C:\Users\noah\AppData\Roaming\pandoc
Copyright (C) 2006-2020 John MacFarlane
Web: https://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.
One can set an explicit title with --metadata=title="My title" while simultaneously preventing the output of the <h1> and <header> elements by setting the template variable title to an empty string:
pandoc --metadata=title="Fancy title" --variable=title="" ...
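
To check this on your own setup, something like the following sketch should do. The file names are placeholders, and the two grep counts are just a quick way to inspect the result.

```shell
# Sketch only: file.md / file.html are placeholder names.
printf 'Body text only, no title block wanted.\n' > file.md

if command -v pandoc >/dev/null 2>&1; then
  # The metadata value feeds <title>; the empty template variable hides the <header>/<h1> block.
  pandoc -s --metadata=title="Fancy title" --variable=title="" file.md -o file.html
  grep -c '<title>' file.html           # number of lines containing a <title> element
  grep -c '<h1 class="title">' file.html # number of lines containing the title heading
fi
```

If the approach behaves as described, the first count is nonzero and the second is 0.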

Converting multiline code snippets in HTML to Markdown with pandoc

I want to translate this snippet of HTML into Markdown using pandoc.
<code class="code_block"># chown root:root /boot/grub/grub.cfg<br/># chmod og-rwx /boot/grub/grub.cfg
</code>
The output I want to have, is something like this.
```
# chown root:root /boot/grub/grub.cfg
# chmod og-rwx /boot/grub/grub.cfg
```
But the output I get never includes the <br>, or a corresponding line break, in the Markdown file.
# chown root:root /boot/grub/grub.cfg# chmod og-rwx /boot/grub/grub.cfg
I already tried different commands and extensions.
$ pandoc -f html -t markdown t.html
$ pandoc -f html -t markdown+hard_line_breaks t.html
$ pandoc -f html -t markdown+raw_html+hard_line_breaks t.html
$ pandoc -f html -t markdown+raw_html+hard_line_breaks-inline_code_attributes t.html
Am I missing something?
This is due to the way pandoc represents inline code internally: the code is stored as a string of verbatim text together with a set of attributes. Newlines, being layout commands, don't fit into this representation and are ignored.
Note also that the above is a rather uncommon way of writing multi-line code. See, e.g., the MDN docs on the <code> element:
To represent multiple lines of code, wrap the <code> element within a <pre> element. The <code> element by itself only represents a single phrase of code or line of code.
The problem is that your code block is not properly formatted as a code block. You need (at least) the following:
<pre><code># chown root:root /boot/grub/grub.cfg
# chmod og-rwx /boot/grub/grub.cfg
</code></pre>
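
If you cannot change how the HTML is generated, one possible workaround (not part of pandoc, just a text rewrite done beforehand) is to turn the original markup into the form above before converting. The sketch below uses awk and assumes the class name code_block from the question; t_fixed.html is a made-up output name.

```shell
# t.html reproduces the snippet from the question.
cat > t.html <<'EOF'
<code class="code_block"># chown root:root /boot/grub/grub.cfg<br/># chmod og-rwx /boot/grub/grub.cfg
</code>
EOF

# Rewrite the markup: each <br> becomes a newline, and the <code> element gets a <pre> wrapper.
awk '{
  gsub(/<br *\/?>/, "\n")
  gsub(/<code class="code_block">/, "<pre><code>")
  gsub(/<\/code>/, "</code></pre>")
  print
}' t.html > t_fixed.html

cat t_fixed.html

# Convert the rewritten file if pandoc is available.
if command -v pandoc >/dev/null 2>&1; then
  pandoc -f html -t markdown t_fixed.html
fi
```

This is a blunt, pattern-based rewrite: it would also wrap genuine inline <code> spans in <pre>, so it only makes sense for input where every <code> element really is a block.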
In addition to the HTML spec, covered in @tarleb's answer, the Markdown rules also differentiate between a code block and a code span based solely on the existence (or not) of the <pre> tag.
Note that the original Markdown rules demonstrate a code block as generating this HTML:
<pre><code>This is a code block.
</code></pre>
A <code> tag wrapped in a <pre> tag. In contrast, the same rules demonstrate a code span generating this HTML:
<p>Use the <code>printf()</code> function.</p>
Note that only the <code> tag is used, and that it is only an inline span (wrapped in a <p>), not a block-level element.
When Pandoc is converting from HTML back to Markdown it follows the same convention in reverse. Yes, you have class="code_block" set on your <code> tag, but Pandoc doesn't know what that means, nor should it. And yes, your <code> element is not wrapped in a <p>, but that is just poorly formed HTML (according to the HTML spec, <code> is not a block-level element, but phrasing content; that is, content which gets wrapped in a block-level element such as a <p> or a <pre> element).
And then there is the issue of your <br> tag. How would Pandoc know if that is part of the code or a styling hook? In fact, it doesn't. Which is why we use <pre> tags for multi-line code blocks. With the <pre> tag, whitespace is preserved. Therefore, you only need a newline character without the <br> tag.
For completeness, I realize that the original Markdown rules do not include fenced code blocks, so I will also point to the GitHub Flavored Markdown spec, which also demonstrates fenced code blocks as producing <pre><code> wrapped blocks. Naturally, to go in reverse, you would need to start with <pre><code> wrapped blocks to end up with fenced code blocks.

Pandoc metadata not appearing in default HTML template

I'm converting org and markdown files to HTML using pandoc. I want to set metadata such as the title, subtitle, and author tags in an external YAML file and have them display using a template. However I can't get anything to appear beyond the normal body conversion.
I'm using the default HTML template. I've run the conversion concatenating the YAML config beforehand:
pandoc -t html -o output.html metadata.yaml input.md
I also tried including the yaml_metadata_block extension:
pandoc -t html+yaml_metadata_block -o output.html metadata.yaml input.md
Also, I've tried setting the variables in the command itself:
pandoc -t html -o output.html -V title="my title" input.md
My YAML file looks like this:
---
title: "my title"
subtitle: "my subtitle"
author: "the author"
...
Inspecting the default HTML template with pandoc -D html, it looks like when title etc. are defined, it'll place them in a header block:
$if(title)$
<header>
<h1 class="title">$title$</h1>
$if(subtitle)$
<p class="subtitle">$subtitle$</p>
$endif$
$for(author)$
<p class="author">$author$</p>
$endfor$
$if(date)$
<p class="date">$date$</p>
$endif$
</header>
$endif$
But in every case, the html file only contains the converted text from input.md. I think this is the $body$ line defined in the default template.
How can I get these fields to appear in my html document?
My goodness, all I was missing was the -s option!
from the man page:
-s, --standalone
Produce output with an appropriate header and footer (e.g. a standalone HTML, LaTeX, TEI, or RTF file, not a fragment). This option is set automatically for pdf, epub, epub3, fb2, docx, and odt output.
Thus the following command works as expected
pandoc -s -t html -o output.html metadata.yaml input.md
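
To see the difference side by side, a small sketch like this one can help. It reuses the file names from the question; fragment.html and standalone.html are made-up output names.

```shell
# Sketch only: output file names are assumptions for illustration.
cat > metadata.yaml <<'EOF'
---
title: "my title"
subtitle: "my subtitle"
author: "the author"
...
EOF

printf 'Body text from input.md.\n' > input.md

if command -v pandoc >/dev/null 2>&1; then
  # Without -s only the converted body is written, so the metadata never reaches a template.
  pandoc -t html -o fragment.html metadata.yaml input.md
  # With -s the template is applied and the <header> block can be rendered.
  pandoc -s -t html -o standalone.html metadata.yaml input.md
  # Per-file count of lines that contain the title heading class.
  grep -c 'class="title"' fragment.html standalone.html
fi
```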

Pandoc HTML variables: `quotes` and `math`

Pandoc's default HTML template contains these two variables:
quotes,
math.
How are they supposed to be used?
More specifically I see that quotes sets the values for the tag <q>. Is this tag used in markdown to HTML conversion?
tl;dr: they seem to be mostly obsolete legacies from previous versions of pandoc
quotes
A little archeology of pandoc commits shows that quotes was added when pandoc switched from using <q> tags to inserting quotation marks directly. A new option, --html-q-tags, was added to keep the previous behavior: it wraps quotes in <q> and sets quotes to true, so that a piece of CSS is added, as explained in the HTML template. See this commit to pandoc and this commit to pandoc-templates. You can see the behavior with the following file:
"hello world"
This:
pandoc test.md -t html --smart --standalone
Produces (skipping the usual head, with no css affecting <q>)
<p>“hello world”</p>
While this
pandoc test.md -t html --standalone --html-q-tags --smart
produces (skipping the usual header)
<style type="text/css">q { quotes: "“" "”" "‘" "’"; }</style>
</head>
<body>
<p><q>hello world</q></p>
</body>
You have to use --smart though.
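
Note that the --smart flag was removed in pandoc 2.0; smart punctuation is now the smart extension on the input format. A sketch of the same experiment on a newer pandoc might look like this (the title value is only there to avoid the missing-title warning):

```shell
# test.md holds the one-line sample from above.
printf '"hello world"\n' > test.md

if command -v pandoc >/dev/null 2>&1; then
  # The smart extension on the reader replaces --smart; --html-q-tags asks for <q> elements.
  pandoc test.md -f markdown+smart -t html --standalone --html-q-tags --metadata title="test"
fi
```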
math
It looks like this was introduced to include math rendering scripts inside the standalone file. See this commit from 2010. I think some command-line options that pick a math rendering system other than the current default, like --mathml, set this variable to a value that actually makes sense (such as a copy of the math rendering scripts). Try:
pandoc -t html --mathml
For the quotes variable, see @scoa's answer.
As regards the math variable, I found what follows.
When using MathML, that is the option --mathml, the code block:
$if(math)$
$math$
$endif$
in the default HTML conversion template adds a portability script to the HTML output.
Anyway, Chrome and Edge do not currently support MathML and Firefox seems to support it without this script.
So, for a custom template, removing the $if(math)$ ... code block will not affect MathML rendering.
When using MathJax, that is the option --mathjax, $if(math)$ ... adds to the HTML output the script block:
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
This is always necessary to render the maths formulae.
When using --latexmathml, a giant script that converts the LaTeX-style math into MathML is inserted by the $if(math)$ ... code block. Without this code block in the conversion template, the script is not inserted and the maths can't be rendered.
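
A quick way to see what $math$ expands to for a given option is to print only the script lines of the standalone output. This is a rough sketch; math.md and the formula are made up, and what gets printed depends on the pandoc version.

```shell
# Sketch only: math.md and its content are assumptions for illustration.
printf 'Euler: $e^{i\\pi} + 1 = 0$\n' > math.md

if command -v pandoc >/dev/null 2>&1; then
  for opt in --mathml --mathjax; do
    echo "== $opt =="
    # Show only the lines that open a <script> element in the generated page.
    pandoc -s "$opt" --metadata title="math" math.md | grep -i '<script'
  done
fi
```

An empty section for an option means nothing script-like was injected for it, so removing the $if(math)$ block from a custom template should make no difference in that case.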

HTML-to-RTF document conversion, preserving classes as styles

I need an HTML2RTF tool, that is, software that converts HTML to RTF. But not just "any conversion": I need to preserve the HTML class attributes (e.g. of paragraphs) as MS-Word "styles".
My first option was some terminal command of LibreOffice, like
libreoffice --convert-to
because LibreOffice Writer has the biggest community and, I supposed, the best conversion software. But I was disappointed: it does not preserve class attributes as styles, even when I tested it as a user in the graphical interface.
I need a Linux solution (AbiWord did not solve it either) or, as a last option, a web service that is easy to plug into an intranet's Windows server.
Input sample:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>sample1 doc</title>
<!-- no style need, but can be declarated with anything, don't matter -->
<style type="text/css">
.myStyle1 {color: #F00;} .myStyle2 {color: #880;}
.a {color: #00F;} .b {color: #088;}
</style>
</head>
<body><!-- important to preserve class names -->
<p class="myStyle1">Hello in <i>style#1</i>.
<span class="a">SPAN S1</span>.</p>
<p class="myStyle2">... Hello in style#2...</p>
<p class="myStyle1">Bye <span class="b">S2</span>.</p>
</body>
</html>
In MS-Word this sample is imported and looks OK, with styles where there were classes.
In LibreOffice (and the libreoffice terminal tools) it is not.
So, is there another tool for LibreOffice? Is there a tool for Linux?
PS: as a last possibility, if there is none for Linux, a web service for Windows and MS-Office.
Works for me in LibreOffice 4.3.3.2. I just opened the HTML file you provided and I can see styles named Text.Body.myStyle1 and myStyle2.
Hints for 64-bit Debian Stable and Ubuntu LTS: see this How-To. Basic steps:
sudo apt-get remove libreoffice*
wget http://download.documentfoundation.org/libreoffice/stable/4.3.3/deb/x86_64/LibreOffice_4.3.3_Linux_x86-64_deb.tar.gz
tar -xzvf LibreOffice_4.3.3_Linux_x86-64_deb.tar.gz
cd LibreOffice_4.3.3*_Linux_x86-64_deb/DEBS
sudo dpkg -i *.deb
After v4.3.3, you also need to install:
sudo apt-get install libreoffice-writer
Then run the cited command:
libreoffice --headless --convert-to rtf libreTeste.html