Rendering raw html with Redcarpet and Markdown - html

I'm using Redcarpet as a Markdown renderer and I would like to be able to display HTML, or any text containing < and >, without it being parsed.
Here is an illustration of what should happen:
The user types
I *want* to write <code>
The source of this comment when sent back by the server should be
I <em>want</em> to write &lt;code&gt;
The problem is that, since the renderer outputs escaped HTML when parsing the Markdown, I get:
I &lt;em&gt;want&lt;/em&gt; to write &lt;code&gt;
Therefore I can't distinguish between the HTML that people send to the server and the HTML that is generated by the Redcarpet renderer. If I call .html_safe on this, my Markdown will be interpreted, but so will the user-inputted HTML, which should not happen.
Any idea on how to fix this?
Note that the idea would be to display (but not parse) user-inputted html even if the user didn't use the backticks ` as expected with markdown.
Here is the relevant bit of code :
# this is our markdown helper used in our views
def markdown(text, options=nil)
  options = [:no_intra_emphasis => true, ...]
  renderer = MarkdownRenderer.new(:filter_html => false, ...)
  markdown = Redcarpet::Markdown.new(renderer, *options)
  markdown.render(text).html_safe
end

If I understand you correctly you just want <code> as normal text and not as an HTML element.
For that you need to escape the < and > with a backslash:
I *want* to write \<code\>
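Since the question asks for this to work even when the user does not escape anything, the backslashes could be added on the server before the text reaches the renderer. Here is a minimal sketch of that idea in plain Ruby. The helper name escape_angle_brackets is made up for illustration, and it assumes a leading > should stay untouched so blockquotes keep working. It has not been tested against Redcarpet itself, and inside backtick code spans the added backslashes would show up literally.

```ruby
# Hypothetical helper: backslash-escape angle brackets so a Markdown
# renderer treats them as literal text instead of inline HTML.
def escape_angle_brackets(text)
  text.each_line.map do |line|
    # Keep leading ">" markers untouched so blockquotes still work.
    prefix = line[/\A {0,3}(?:> ?)+/] || ""
    rest = line[prefix.length..-1]
    prefix + rest.gsub(/[<>]/) { |ch| "\\#{ch}" }
  end.join
end

puts escape_angle_brackets("I *want* to write <code>")
```

The result would then be passed to markdown.render as before.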

Related

Add proper syntax name to code blocks when converting from HTML to Markdown with Pandoc

I need to convert some HTML to Markdown with Pandoc. All is fine except the code blocks in my document are not converted properly. I need them to appear in the resulting Markdown document as backtick-code blocks with syntax definition.
For example, if I have such source HTML:
<pre class="python"><code>
def myfunc(param):
    '''Description of myfunc'''
    return do_something(param)
</code></pre>
I want Pandoc to convert it into:
```python
def myfunc(param):
    '''Description of myfunc'''
    return do_something(param)
```
But what I am getting is:
``` {.python}
def myfunc(param):
    '''Description of myfunc'''
    return do_something(param)
```
It's almost there, but the syntax definition is in curly braces and with a dot, which is not recognised by my Markdown parser. How can I get ```python instead of ``` {.python} when converting HTML to Markdown?
I have control over the source HTML, so I can change it the way needed. If there's an option to insert "raw markdown" into the HTML which will be ignored by Pandoc, that would work for me too, I can embed those blocks into the source HTML the way I need, but I need to tell Pandoc not to touch them. But I can't find such option in the docs.
This behavior is governed by the fenced_code_attributes extension. It is enabled by default; disabling it will give your desired output:
pandoc --to=markdown-fenced_code_attributes ...
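If changing the pandoc invocation is not possible for some reason, a different technique is to post-process the generated Markdown and rewrite the fence openers. The following is only a sketch of that alternative in Ruby, not something pandoc provides; the method name simplify_fence_attributes is made up, and it assumes each fence opener carries exactly one class, as in the example above.

```ruby
FENCE = "`" * 3

# Rewrite fence openers of the form FENCE + " {.python}" to FENCE + "python".
# Closing fences and fences without attributes are left alone.
def simplify_fence_attributes(markdown)
  pattern = /^#{Regexp.escape(FENCE)}[ \t]*\{\.([\w-]+)\}[ \t]*$/
  markdown.gsub(pattern) { "#{FENCE}#{Regexp.last_match(1)}" }
end

sample = "#{FENCE} {.python}\ndef myfunc(param):\n    pass\n#{FENCE}\n"
puts simplify_fence_attributes(sample)
```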

Is there a non-javascript/PHP way to write sample code that won't get evaluated? [duplicate]

I use the <pre> tag in my blog to post code. I know I have to change < to &lt; and > to &gt;. Are there any other characters I need to escape for correct HTML?
What happens if you use the <pre> tag to display HTML markup on your blog:
<pre>Use a <span style="background: yellow;">span tag with style attribute</span> to highlight words</pre>
This will pass HTML validation, but does it produce the expected result? No. The correct way is:
<pre>Use a &lt;span style="background: yellow;"&gt;span tag with style attribute&lt;/span&gt; to highlight words</pre>
Another example: if you use the pre tag to display some other language code, the HTML encoding is still required:
<pre>if (i && j) return;</pre>
This might produce the expected result but does it pass HTML validation? No. The correct way is:
<pre>if (i &amp;&amp; j) return;</pre>
Long story short, HTML-encode the content of a pre tag just the way you do with other tags.
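If the blog is generated by a script, the encoding can be done for you instead of by hand. As a rough sketch in Ruby, using only ERB::Util.html_escape from the standard library (the helper name pre_block is made up for illustration):

```ruby
require "erb"

# Hypothetical helper: wrap arbitrary source text in a <pre> element,
# HTML-encoding &, <, > and quotes so the browser shows it literally.
def pre_block(source)
  "<pre>#{ERB::Util.html_escape(source)}</pre>"
end

puts pre_block("if (i && j) return;")
puts pre_block("Use a <span>span tag</span> to highlight words")
```

The first call should print the same encoded line as the corrected example above.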
TL;DR
PHP: htmlspecialchars($html);
JavaScript(JS): Element.innerText = "<html>...";
Note that <pre> is just for styles, so you have to escape ALL HTML.
Only For You HTML "fossil"s: using <xmp> tag
This is not well known, but the tag really does exist and even Chrome still supports it. Relying on a pair of <xmp> tags is NOT recommended, though: it's just for you HTML fossils, as a very simple way to handle your personal content, e.g. docs. Even the w3.org Wiki says in its example: "No, really. don't use it."
You can put ANY HTML (excluding </xmp> end tag) inside <xmp></xmp>
<xmp>
<html> <br> just any other html tags...
</xmp>
The proper version
The proper version is to store the HTML as a STRING and display it with the help of some escaping function or mechanism.
Just remember one thing: strings in C-like languages are usually written between single or double quotes. If you wrap your string in double quotes, escape any double quotes inside it (probably with \); if you wrap it in single quotes, escape the single quotes (probably with \).
The most frequent - Server-side language escaping (ex. in PHP)
Server-side scripting languages often have some built-in function to escape HTML.
<?php
$html = "<html> <br> or just any other HTML"; //store html
echo htmlspecialchars($html); //display escaped html
?>
Note that PHP 8.1 changed the default value of the flags parameter, so you no longer have to specify ENT_QUOTES yourself:
the default flags changed from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.
The client-side way (example in JavaScript / JS&jQuery)
Similar approach as on server-side is achievable in client-side scripts.
Pure JavaScript
There is no function, but there is the default behavior, if you set element's innerText or node's textContent:
document.querySelector('.myTest').innerText = "<html><head>...";
document.querySelector('.myTest').textContent = "<html><head>...";
HTMLElement.innerText and Node.textContent are not the same thing! You can find out more about the difference in the MDN documentation for each.
jQuery (a JS library)
jQuery has $jqueryEl.text() for this purpose:
$('.mySomething .test').text("<html><head></head><body class=\"test\">...");
Just remember the same thing as for server-side - in C-like languages, escape the quotes you've wrapped your string in.
For posting code within your markup, I suggest using the <code> tag. It works the same way as pre but would be considered semantically correct.
Otherwise, <code> and <pre> only need the angle brackets encoded.
Use this and don't worry about any of them.
<pre>
${fn:escapeXml('
<!-- all your code -->
')};
</pre>
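Note that ${fn:escapeXml(...)} is JSP Expression Language using the JSTL functions library, not jQuery, so this only works in a JSP page with that tag library declared.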

html to jade error when contains <pre>

I have some static HTML documents, and I want to convert them into Jade. I tried html2jade from npm, and everything is OK except that the <pre> elements in the HTML come out empty. Can someone help me?
The html code looks like this:
<pre><code>&lt;p&gt;Hello&lt;/p&gt;&lt;span&gt;Hello Again&lt;/span&gt;</code></pre>
The result is:
pre.
You can write that a couple different ways in Jade. Here are two different methods. The first takes advantage of Jade's automatic escaping while the second uses HTML entities instead.
Automatic escaping:
pre
  code= '<p>Hello</p><span>Hello Again</span>'
HTML entities:
pre
  code &lt;p&gt;Hello&lt;/p&gt;&lt;span&gt;Hello Again&lt;/span&gt;

Automatic <a> around headings in Pandoc

This Markdown code:
# Introduction
Turns into this HTML code when compiled with Pandoc:
<h1 id="introduction"><a href="#introduction">Introduction</a></h1>
The way I use Markdown:
Generate HTML document
Edit it in MS Word to add page numbering
HTML version goes to blog, MS Word version goes to uni submissions
In CSS I can override link colors if they are inside H# tags, but MS Word has problems interpreting hierarchy of CSS overrides... and ends up with wrong colors anyway.
Is there a way to generate HTML without headings being wrapped in anchor tags, like below?
<h1 id="introduction">Introduction</h1>
In case there is no solution, here is a little PHP script I wrote to remove the <a> tags from headings. It must be run on the resulting HTML file:
<?php
// Usage: php cleanheadings.php myhtmlfile.html
// Check that arguments were supplied
if(!isset($argv[1])) die('No input file, exiting');
// Load file
$content = file_get_contents($argv[1]);
// Cut out the <a> tag
$heading = '/(<h[1-6] id="[\w-]+">)(<a href="#[\w-]+">)(.+?)(<\/a>)(<\/h[1-6]>)/mu';
$clean = '$1$3$5';
$cleanhtml = preg_replace($heading,$clean,$content);
// Write changes back to file
file_put_contents($argv[1], $cleanhtml);
?>

Rails 3 Escape BBCode-parsed HTML Only Within Pre+Code Tags

I'm trying to implement a markup system in my Rails application using the bb-ruby gem. Currently I'm working on something similar to how Stack Overflow handles its code markdown and I ran into some difficulty.
Essentially I want the user-entered text:
[code]<h1>Headline</h1>[/code]
To spit out the code in plain-text, perhaps in a pre and code tag block. Passing that string of text to my code parser will wrap the code in a pre and code block but the HTML also gets rendered. I pass the string to my code parser like so:
sanitize(text.bbcode_to_html(formats, false).html_safe)
Of course, if I remove the .html_safe helper from the call my view will spit out:
<pre><code><br /> <h1>Hello World</h1><br /> </code></pre>
Obviously that's not the desired result. So my question is, how can I accomplish plain-text code only within the pre + code tags while maintaining the html_safe helper method?
I know this is an old question, but you can try calling strip_tags on the result of bbcode_to_html.
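Note that strip_tags would remove the tags rather than show them. A different approach, if the goal is to display the markup as text, is to HTML-escape only what sits between [code] and [/code] before handing the string to bbcode_to_html. This is just a sketch in plain Ruby: the method name escape_code_blocks is made up, and it assumes bb-ruby passes already-escaped entities through unchanged, which I have not verified.

```ruby
require "erb"

# Matches one [code]...[/code] pair; /m lets the body span several lines.
CODE_TAG = %r{\[code\](.*?)\[/code\]}m

# Hypothetical pre-processing step: escape HTML inside [code] blocks only,
# leaving the rest of the user's text untouched.
def escape_code_blocks(text)
  text.gsub(CODE_TAG) do
    "[code]#{ERB::Util.html_escape(Regexp.last_match(1))}[/code]"
  end
end

puts escape_code_blocks("[code]<h1>Headline</h1>[/code]")
```

The escaped string would then go through the existing sanitize(text.bbcode_to_html(formats, false).html_safe) call.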