Pandoc HTML variables: `quotes` and `math`

Pandoc's default HTML template contains two variables, `quotes` and `math`. How are they supposed to be used?
More specifically, I see that quotes controls a CSS rule for the <q> tag. Is this tag used in Markdown-to-HTML conversion?

tl;dr: they seem to be mostly obsolete legacies from earlier versions of pandoc.
quotes
A little archeology of pandoc commits shows that quotes was added when pandoc switched from using <q> tags to inserting quote characters directly. A new option, --html-q-tags, was added to keep the previous behavior: it wraps quotations in <q> tags and sets quotes to true, so that a piece of CSS is added, as explained in the HTML template. See this commit to pandoc and this commit to pandoc-templates. You can see the behavior with the following file (test.md):
"hello world"
This:
pandoc test.md -t html --smart --standalone
Produces (skipping the usual head, which contains no CSS affecting <q>):
<p>“hello world”</p>
Whereas this:
pandoc test.md -t html --standalone --html-q-tags --smart
produces (again skipping the usual header):
<style type="text/css">q { quotes: "“" "”" "‘" "’"; }</style>
</head>
<body>
<p><q>hello world</q></p>
</body>
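You have to use --smart, though. (In pandoc 2.0 and later the --smart option no longer exists; smart punctuation is provided by the smart extension, which is enabled by default for Markdown input, so the commands above work without --smart.)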
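To make the difference concrete, here is a toy Python model of the two behaviours described above. It is not pandoc's code: the function name render_quoted and the QUOTES_CSS constant are hypothetical, and only the emitted HTML strings mirror the outputs shown.

```python
# Toy model (not pandoc's implementation) of how --html-q-tags changes
# the HTML emitted for a double-quoted phrase when smart quotes are on.

QUOTES_CSS = '<style type="text/css">q { quotes: "“" "”" "‘" "’"; }</style>'


def render_quoted(text: str, html_q_tags: bool) -> tuple[str, str]:
    """Return (head_extra, body_html) for a quoted phrase.

    Without --html-q-tags the curly quote characters are written directly
    and the template's `quotes` variable stays unset, so no CSS is added.
    With --html-q-tags the phrase is wrapped in <q> and `quotes` is set,
    which makes the template emit a CSS rule telling the browser which
    quote marks to draw around <q> elements.
    """
    if html_q_tags:
        return QUOTES_CSS, f"<p><q>{text}</q></p>"
    return "", f"<p>\u201c{text}\u201d</p>"


if __name__ == "__main__":
    for flag in (False, True):
        head, body = render_quoted("hello world", html_q_tags=flag)
        print(f"--html-q-tags={flag}: head={head!r} body={body}")
```

The point of the model is that the browser, not pandoc, draws the quote marks in the <q> case, which is why the CSS rule has to travel with the document.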
math
It looks like this was introduced to include math rendering scripts in the standalone file. See this commit from 2010. I think that some command-line options selecting math rendering methods other than the (current) default, like --mathml, set this variable to a value that actually makes sense (such as the math rendering script). Try:
pandoc -t html --standalone --mathml

For the quotes variable, see scoa's answer.
As for the math variable, here is what I found.
When using MathML (the --mathml option), the code block
$if(math)$
$math$
$endif$
in the default HTML conversion template adds a portability script to the HTML output.
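However, Chrome and Edge did not support MathML at the time of writing (Chromium-based browsers gained native MathML Core support in version 109), and Firefox seems to support it without this script.
So, in a custom template, removing the $if(math)$ ... block will not affect MathML rendering in browsers that support MathML natively.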
When using MathJax (the --mathjax option), the $if(math)$ ... block adds the following script to the HTML output:
<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS_CHTML-full" type="text/javascript"></script>
This script is always required for the formulae to render.
When using --latexmathml, the $if(math)$ ... block inserts a very large script that converts LaTeX-style math into MathML. Without this block in the template, the script is not inserted and the math cannot be rendered.
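The $if(math)$ ... $endif$ mechanism can be illustrated with a deliberately tiny Python imitation of pandoc's template conditionals. It is a sketch, not pandoc's template engine: it supports only non-nested $if(var)$...$endif$ and $var$ substitution, and the render function and the script URL are hypothetical placeholders.

```python
import re

# A non-nested $if(name)$ ... $endif$ block; DOTALL lets the body span lines.
IF_BLOCK = re.compile(r"\$if\((\w+)\)\$(.*?)\$endif\$", re.DOTALL)
# A plain $name$ variable reference.
VAR = re.compile(r"\$(\w+)\$")


def render(template: str, variables: dict[str, str]) -> str:
    """Render a tiny, simplified subset of pandoc's template syntax.

    Unlike real pandoc, this keeps the line breaks around directives and
    supports neither $else$ nor nested conditionals.
    """
    def expand_if(m: re.Match) -> str:
        # Keep the body only if the variable is set to a non-empty value.
        return m.group(2) if variables.get(m.group(1)) else ""

    text = IF_BLOCK.sub(expand_if, template)
    # Replace the remaining $name$ references; unset variables become "".
    return VAR.sub(lambda m: variables.get(m.group(1), ""), text)


TEMPLATE = "<head>\n$if(math)$\n$math$\n$endif$\n</head>"

# Hypothetical script tag standing in for whatever pandoc puts in `math`.
SCRIPT = '<script src="https://example.org/math.js"></script>'

print(render(TEMPLATE, {}))                # no math option: no script
print(render(TEMPLATE, {"math": SCRIPT}))  # math option: script included
```

In this model, deleting the conditional from a custom template has exactly the effect described above: whatever the math option would have put into `math` never reaches the output.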

Related

Jupyter syntax highlight in <code> environment

I was wondering whether it is possible to highlight code written with HTML <code>...</code> tags in the same way as a native Markdown code block using triple backticks (```).
I prefer the <code>...</code> element because it can be customized with CSS, and I want code to stand out visually.
You can customise the style of a syntax-highlighted triple-backtick code block by wrapping it in a <div>:
<div style="font-weight: bold">

```javascript
var x = 1;
function y() {
    return 0;
}
```

</div>
Note the extra blank lines (without them, the block will not render properly). This works well in JupyterLab.

Add proper syntax name to code blocks when converting from HTML to Markdown with Pandoc

I need to convert some HTML to Markdown with Pandoc. Everything works except that the code blocks in my document are not converted properly. I need them to appear in the resulting Markdown as backtick code blocks with a language identifier.
For example, given this source HTML:
<pre class="python"><code>
def myfunc(param):
    '''Description of myfunc'''
    return do_something(param)
</code></pre>
I want Pandoc to convert it into:
```python
def myfunc(param):
    '''Description of myfunc'''
    return do_something(param)
```
But what I am getting is:
``` {.python}
def myfunc(param):
    '''Description of myfunc'''
    return do_something(param)
```
It's almost there, but the language name is in curly braces with a leading dot, which my Markdown parser does not recognise. How can I get ```python instead of ``` {.python} when converting HTML to Markdown?
I have control over the source HTML, so I can change it as needed. If there's a way to insert "raw Markdown" into the HTML that Pandoc will leave untouched, that would work for me too: I can embed those blocks in the source HTML the way I need, but I have to tell Pandoc not to touch them. However, I can't find such an option in the docs.
This behavior is governed by the fenced_code_attributes extension. It is enabled by default; disabling it will give your desired output:
pandoc --to=markdown-fenced_code_attributes ...
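If you cannot change the pandoc invocation, a fallback is to post-process the generated Markdown instead. The Python sketch below rewrites fence openers such as ``` {.python} to ```python. The helper simplify_fences is hypothetical, not a pandoc feature, and it assumes the attribute block holds exactly one class; fences with ids, key=value pairs, or several classes are left untouched.

```python
import re

# A fence line whose only attribute is a single class, e.g. "``` {.python}".
# Group 1: indentation, group 2: the fence (``` or ~~~, possibly longer),
# group 3: the language name.
FENCE_WITH_CLASS = re.compile(
    r"^([ \t]*)(`{3,}|~{3,})[ \t]*\{\.([\w+#-]+)\}[ \t]*$", re.MULTILINE
)


def simplify_fences(markdown: str) -> str:
    """Rewrite ``` {.lang} fence openers as ```lang; leave other lines alone."""
    return FENCE_WITH_CLASS.sub(r"\1\2\3", markdown)


sample = "``` {.python}\ndef myfunc(param):\n    return param\n```\n"
print(simplify_fences(sample))
```

Disabling fenced_code_attributes remains the cleaner solution; the regex pass only makes sense when the pandoc command line is out of your hands.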

Converting multiline code snippets in HTML to Markdown with pandoc

I want to translate this snippet of HTML into Markdown using pandoc.
<code class="code_block"># chown root:root /boot/grub/grub.cfg<br/># chmod og-rwx /boot/grub/grub.cfg
</code>
The output I want is something like this:
```
# chown root:root /boot/grub/grub.cfg
# chmod og-rwx /boot/grub/grub.cfg
```
But the output never includes the <br>, nor any line break, in the Markdown file:
# chown root:root /boot/grub/grub.cfg# chmod og-rwx /boot/grub/grub.cfg
I have already tried different commands and extensions:
$ pandoc -f html -t markdown t.html
$ pandoc -f html -t markdown+hard_line_breaks t.html
$ pandoc -f html -t markdown+raw_html+hard_line_breaks t.html
$ pandoc -f html -t markdown+raw_html+hard_line_breaks-inline_code_attributes t.html
Am I missing something?
This is due to the way pandoc represents inline code internally: the code is stored as a plain string of verbatim text together with a set of attributes. Line breaks, being layout elements, don't fit into this representation and are dropped.
Note also that the above is a rather uncommon way of writing multi-line code. See, e.g., the MDN docs on the <code> element:
To represent multiple lines of code, wrap the <code> element within a <pre> element. The <code> element by itself only represents a single phrase of code or line of code.
The problem is that your code block is not properly formatted as a code block. You need (at least) the following:
<pre><code># chown root:root /boot/grub/grub.cfg
# chmod og-rwx /boot/grub/grub.cfg
</code></pre>
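If the HTML comes from a generator you cannot change, one option is to rewrite these elements before handing the file to pandoc. This minimal Python sketch assumes, as in the question, that multi-line snippets are marked with <code class="code_block"> and use <br> or <br/> as line separators; the helper name code_block_to_pre is hypothetical, and a regex is only adequate for simple, non-nested markup like this.

```python
import re

# <code class="code_block"> ... </code>, non-greedy so adjacent snippets stay separate.
CODE_BLOCK = re.compile(
    r'<code class="code_block">(.*?)</code>', re.DOTALL | re.IGNORECASE
)
# <br>, <br/> and <br /> in any letter case.
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)


def code_block_to_pre(html: str) -> str:
    """Turn <br>-separated inline code into a real <pre><code> block."""
    def convert(m: re.Match) -> str:
        # Inside <pre>, a plain newline is enough to separate lines.
        body = BR.sub("\n", m.group(1))
        if not body.endswith("\n"):
            body += "\n"
        return f"<pre><code>{body}</code></pre>"

    return CODE_BLOCK.sub(convert, html)


snippet = ('<code class="code_block"># chown root:root /boot/grub/grub.cfg'
           '<br/># chmod og-rwx /boot/grub/grub.cfg\n</code>')
print(code_block_to_pre(snippet))
```

The result has the <pre><code> shape shown above, which pandoc reads as a code block rather than an inline code span.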
In addition to the HTML spec, covered in tarleb's answer, the Markdown rules also distinguish between a code block and a code span based solely on the presence (or absence) of the <pre> tag.
Note that the original Markdown rules demonstrate a code block as generating this HTML:
<pre><code>This is a code block.
</code></pre>
A <code> tag wrapped in a <pre> tag. In contrast, the same rules demonstrate a code span generating this HTML:
<p>Use the <code>printf()</code> function.</p>
Note that only the <code> tag is used, and it is an inline span (wrapped in a <p>), not a block-level element.
When Pandoc is converting from HTML back to Markdown it follows the same convention in reverse. Yes, you have class="code_block" set on your <code> tag, but Pandoc doesn't know what that means, nor should it. And yes, your <code> element is not wrapped in a <p>, but that is just poorly formed HTML (according to the HTML spec, <code> is not a block-level element, but phrasing content; that is, content which gets wrapped in a block-level element such as a <p> or a <pre> element).
And then there is the issue of your <br> tag. How would Pandoc know whether that is part of the code or a styling hook? It can't. That is why we use <pre> tags for multi-line code blocks: inside <pre>, whitespace is preserved, so you only need a newline character, not a <br> tag.
For completeness, I realize that the original Markdown rules do not include fenced code blocks, so I will also point to the GitHub Flavored Markdown spec, which also demonstrates fenced code blocks as producing <pre><code> wrapped blocks. Naturally, to go in reverse, you would need to start with <pre><code> wrapped blocks to end up with fenced code blocks.

How can I eliminate the empty line in code blocks rendered by jekyll?

GitHub Pages' Jekyll uses Pygments by default to render syntax highlighting for code blocks. But I prefer the simpler alternative highlight.js, because then I only need to indent code by 4 spaces to mark code blocks in the Markdown source files.
However, highlight.js mistakenly detects all my R code as PHP, Perl, Makefile or some other language, so I want to mark the code block manually with
```r
(some r code)
```
instead. But when I do this, the first line of the code block always appears blank. The HTML produced by the 4-space indentation looks like
<pre><code>x <- rnorm(100)
y <- 2*x + rnorm(100)
lm(formula=y~x)
</code></pre>
which does not suffer from this problem.
How can I eliminate the blank line in the first line of the code block?
I faced the same issue today when I switched my highlighter to highlight.js.
With help from others, I finally got rid of this blank line and would like to share the solution. Basically, whitespace inside <pre> is not trimmed, so a newline right after the opening <code> tag is rendered as an empty first line (you can see the extra line with Firefox's Firebug extension with "show whitespace" enabled).
The solution is then obvious: put the <pre> and <code> tags on the same line as your actual code, like this:
<pre><code class="css">@font-face {
    font-family: Chunkfive; src: url('Chunkfive.otf');
}</code></pre>
or use the trick suggested by mhulse, which keeps your raw post more readable:
<pre><code
>line of code
Here and ...
Here
</code></pre>
Alternatively, write your own JavaScript to trim leading and trailing whitespace from the content before it is put into the <pre>.
For more details, check this page.
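As a build-time alternative to the client-side JavaScript suggested above, the generated HTML could be post-processed instead. This Python sketch assumes the stray line comes from a newline right after an opening <code> that directly follows <pre> (HTML parsers already ignore one newline immediately after <pre>, but not after <code>); the helper name trim_leading_newline is hypothetical.

```python
import re

# An opening <pre ...> followed by an opening <code ...> and then a newline.
# Only that newline is removed; the tags and their attributes are kept.
PRE_CODE_NEWLINE = re.compile(r"(<pre[^>]*>\s*<code[^>]*>)\r?\n", re.IGNORECASE)


def trim_leading_newline(html: str) -> str:
    """Drop one newline at the very start of each <pre><code> block."""
    return PRE_CODE_NEWLINE.sub(r"\1", html)


page = '<pre><code class="r">\nx <- rnorm(100)\n</code></pre>'
print(trim_leading_newline(page))
```

Because GitHub Pages runs Jekyll on its own servers, a pass like this would only help if you build the site locally and publish the generated files.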

How to show the string inside a tag verbatim?

What tag can I use to prevent any interpretation of its content? I need to show some source code and its result on Blogger. I have the following code on my Blogspot blog, but the content of the <pre> is still processed:
<pre class='prettyprint'>
$latex \displaystyle S(n)=\sum_{k=1}^{n}{\frac{1}{T_{k}}}=\sum_{k=1}^{n}{\frac{6}{k(k+1)(k+2)}}$
</pre>
This is the result:
[Rendered output: the formula above appears typeset as math instead of being displayed as verbatim source.]
If I could replace '$' inside the <pre> with something equivalent, I could avoid this issue.
I tried both <code> and <pre>, but they both interpret the content.
ADDED
I'm trying to use the JavaScript code found in this post.
If I understand correctly, you are using Replacemath, and its documentation says: “Should you need to prevent certain $ signs from triggering LaTeX rendering, replace $ with the equivalent HTML <span>$</span> or &#36;, or put the code inside a <pre> or <code> block if appropriate.” Of these, the first method seems to actually work.
That is, replace every occurrence of “$” inside the pre element with <span>$</span>.
I tested this by publishing a test post on my blog (which had been dormant for 6 years...). I had to break the pre block manually to fit the column.
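Doing this replacement by hand is error-prone for long snippets, so here is a minimal Python sketch that rewrites every $ inside <pre>...</pre> as <span>$</span> before the post is pasted into Blogger. The function name protect_dollars_in_pre is hypothetical; it assumes <pre> elements are not nested and leaves $ outside them untouched, so inline math elsewhere in the post still renders.

```python
import re

# A whole <pre ...> ... </pre> element, matched non-greedily across lines.
PRE_ELEMENT = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.DOTALL | re.IGNORECASE)


def protect_dollars_in_pre(html: str) -> str:
    """Wrap each $ inside <pre> blocks in a <span>.

    Per the Replacemath documentation quoted above, a $ written as
    <span>$</span> does not trigger LaTeX rendering.
    """
    def protect(m: re.Match) -> str:
        return m.group(1) + m.group(2).replace("$", "<span>$</span>") + m.group(3)

    return PRE_ELEMENT.sub(protect, html)


post = "<pre class='prettyprint'>\n$latex x^2$\n</pre>\n<p>Inline: $y$</p>"
print(protect_dollars_in_pre(post))
```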