Invalid XML entity references from Jekyll - html

I'm using Jekyll on GitHub Pages to run my blog.
It seems that Jekyll (semi-)randomly XML-escapes a special character incorrectly as &tt;.
As an example, in the current version of the RSS feed, this source XML
</p>
<p>
in a single place becomes
&lt;/p&gt;
&lt;p&tt;
but it should have been
&lt;/p&gt;
&lt;p&gt;
&tt; is an invalid XML entity reference, so some XML parsers choke on that and refuse to go on.
At first I suspected an invisible, invalid character at that place in the source, but as far as I can tell, this isn't the case. What's more, this behaviour doesn't seem to be consistent:
The RSS feed currently has 7 such errors, of which the above is the first. However, the current Atom feed has only 5 such errors, and they are not in the same places. It's not only <p> tags that are affected, but other tags as well (e.g. a <ul> tag should always be escaped as &lt;ul&gt;, but in a single place it is instead escaped as &tt;ul&gt;).
Furthermore, when I run
jekyll serve -w
on my local machine, I still see the same type of error, but not in the same places.
The HTML is XML escaped like this:
{{ post.content | xml_escape }}
Why does this happen, and what can I do about it?

The only thing xml_escape does is call CGI::escapeHTML, which replaces certain characters with their counterparts. If the bug is present in Jekyll, it's only because it's present in your version of Ruby's CGI module.
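To illustrate why a plain character-for-entity escaper should never produce &tt;, here is a minimal Python sketch of the same kind of substitution. It uses Python's html.escape as a stand-in for Ruby's CGI::escapeHTML (an assumption made purely for illustration; they are separate implementations), and the sample string is made up.

```python
import html
import re

# Hypothetical fragment of post content, standing in for {{ post.content }}.
sample = "</p>\n<p>Tom & Jerry</p>"

# html.escape replaces &, < and > (and quotes, when quote=True) with entities.
escaped = html.escape(sample, quote=True)
print(escaped)

# Collect every entity reference that appears in the escaped output.
# A correct escaper only ever emits entities from a small fixed set,
# so anything else (such as "&tt;") points at a bug somewhere in the chain.
entities = set(re.findall(r"&[a-zA-Z0-9#]+;", escaped))
print(sorted(entities))
```

Running a check like this over a generated feed is one way to locate the bad references before an XML parser does.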

Related

Markdown/html not parsing correctly in eleventy from frontmatter generated by Netlify CMS

I've been stuck on this for an embarrassingly long time. I have two inputs that aren't displaying correctly, a markdown widget and the list widget. They both appear as one long string. I thought I needed to add a markdown parser for the former at least so I'm using markdown-it in a manner similar to this:
https://github.com/11ty/eleventy/issues/236
It is adding paragraph breaks where they should be but they show up on the page as p tags. I thought this was because I already had the parsed text nested between p tags but if I delete those nothing shows up at all. When I look at the html file created by eleventy, the tags show up as "&lt ;p&gt ;" (without the spaces) which it seems the browser isn't reading correctly when trying to interpret the html. I'm using nunjucks for templating if that matters. My .eleventy.js file looks like this currently. What am I missing? Also the markdown filter seems to only want to take a string so I'm not sure where to even begin with the list.
By default, Nunjucks HTML-escapes all variables when outputting templates. This is what you want most of the time, unless you're trying to render HTML input.
You might want to try using the safe filter after your markdownify filter.
{{ markdownContent | markdownify | safe }}
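The idea behind auto-escaping plus a safe marker can be sketched in a few lines of Python. This is a conceptual illustration only, not how Nunjucks is implemented; the names Safe and render are hypothetical.

```python
import html


class Safe(str):
    """Marker type: a string that is already HTML and must not be escaped again."""


def render(value: str) -> str:
    """Escape a value on output unless it has been explicitly marked safe."""
    if isinstance(value, Safe):
        return str(value)
    return html.escape(value)


# Hypothetical output of a Markdown filter.
markdown_html = "<p>Hello</p>"

print(render(markdown_html))        # escaped: the tags show up as text on the page
print(render(Safe(markdown_html)))  # left alone: the browser renders it as HTML
```

Without the marker, the already-rendered Markdown is escaped a second time on output, which matches the symptom of p tags appearing as text.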

Unable to render variable in django template that contains HTML

I'm passing down a variable to a Django template that contains HTML. For example <strong>example</strong>. I mark this string with mark_safe() before storing it in my variable.
When I load it into the template and load the page in my browser it shows the html as plain text, <strong>example</strong>.
If I look at it in the Chrome console, the only thing that is different is that the text is surrounded with quotation marks. So it would look like this, "<strong>example</strong>"
I've read through the other Stack Overflow posts, wrapped the variables in {% autoescape off %} tags, and tried the safe filter. These remove the escaping, but the HTML still doesn't render. Below is the actual HTML, unescaped. I'm wondering if it's the space in front of it?
<p>Modern Comics That Are Valuable But Often Overlooked and Should Be Sought Out In Dollar Bins and In Your Own Collection</p>
<p><strong><em>Its Like Having $-Ray Vision</em></strong></p>
Thanks for the help.
The stored string is HTML-escaped, so it first needs to be unescaped back into HTML. Then you can pass it down and it will be rendered correctly.
import html
# HTMLParser.unescape() was removed in Python 3.9; html.unescape() replaces it
description = html.unescape(category.description)
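A small illustration of the unescaping step, using html.unescape from the Python standard library. The stored value is a made-up example of markup that was escaped before it was saved; in Django the result would still need to be marked safe (or rendered with autoescaping off) to be output as HTML.

```python
import html

# Hypothetical stored value: the markup was escaped before it was saved.
stored = "&lt;p&gt;&lt;strong&gt;example&lt;/strong&gt;&lt;/p&gt;"

# Turn the entity references back into the characters they stand for.
description = html.unescape(stored)
print(description)
```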

Stop '---' being transformed to '<hr />'

I'm inserting remark.js slides (as MD files) into a Jekyll site (hosted on GitHub, with the pre-processing done there).
Since remark.js uses three dashes to indicate the next slide, it's important that these three dashes do not get transformed into a horizontal rule ('<hr />').
Is there a way to turn off Jekyll preprocessing within an MD file? Or to change the behavior so that --- is not transformed into <hr />?
I believe you would need to enter a backslash before the three hyphens, according to this document linked to from Jekyll's website.
Markdown allows you to use backslash escapes to generate literal characters which would otherwise have special meaning in Markdown’s formatting syntax.
But depending on the markdown processor you are using with Jekyll, the escape character could be something other than a backslash, or you might need to escape each hyphen.
This might be an old post, but recently I hit the same issue:
I couldn't escape --- in Markdown such that remarkjs could render them as individual slides. In Jekyll 4.2.2, the --- was converted into <hr /> and this was breaking remarkjs.
My solution was to write the content for my slides into an .md file and put it under _includes/presentations. I didn't add any --- at the beginning of this file, so it will not be picked up by Kramdown for processing. Then I added a regular .md file in _posts, and in this file I included the previous one between <pre> tags.
Content of the post file is:
---
layout: presentation
title: TDD Workshop Presentation
permalink: /tdd-workshop-presentation/
---
<pre>{% include presentations/tdd-workshop-1.md %}</pre>
Content of presentations/tdd-workshop-1.md
# TDD
## Test Driven Development Workshop
---
# Agenda
1. Introduction
2. Deep-dive
3. ...
Please mind the new line at the beginning of this file, as that's necessary for the first tag to be rendered properly.
I hope that this helps.

Thymeleaf 3, attoparser, CDATA and HTML templating with script tag

I struggled with Spring Boot 1.3.5, which depends on the old Thymeleaf 2.x series, while trying to pass an HTML template inside a script tag:
<script type="text/template" id="catTmpl">
<![CDATA[
<b><%=name%></b>
]]>
</script>
which resulted in error:
org.xml.sax.SAXParseException: The content of elements must consist of well-formed character data or markup.
and after some manipulation it was passed through properly but rendered with the CDATA wrapping, which broke JS templating (underscore.js in my case):
_.template($("#catTmpl").html())
I came across a blog post and found that Thymeleaf 3 uses a different parser. I checked:
$ gradle dependencies
| | +--- org.thymeleaf:thymeleaf:3.0.6.RELEASE
| | | +--- org.attoparser:attoparser:2.0.4.RELEASE
That parser assumes that script content is CDATA, and the code above works fine without the CDATA wrapper.
What is attoparser?
Is Thymeleaf 3 ready to pass HTML templates via script (or HTML 5 template) tag without CDATA bullshit?
attoparser is the project at http://www.attoparser.org/. From the project home page:
To be easy to use. Few lines of code needed. And no more parser library hell worrying about your JDK's parser API versions.
To be fast. As fast as the fastest standard parsers. And in many scenarios, faster.
To offer a powerful interface. Consider well-formedness optional, line + column location, ability to reconstruct the original document, etc.
To simplify your parsing experience. By removing the need to worry about validation or entity resolution —both unneeded in many cases.
As stated on the mailing list, attoparser was designed to work with Thymeleaf.
It is able to parse HTML (so there is no need to close <p>), but as this blog post says, the parser accepts <div/>, which is invalid HTML 5. Thymeleaf 2 also accepted this but produced valid <div></div>; Thymeleaf 3 doesn't. Be careful!
According to the bug "Modify the way is considered CDATA #9", attoparser interprets the script body as CDATA only for:
All these alternatives should not be considered CDATA by default (unless they are explicitly enclosed in a CDATA section), so it should be a good idea to only consider the contents of a CDATA in the following situations regarding type:
No type
type is one of: javascript, ecmascript, text/javascript, text/ecmascript, application/javascript, application/ecmascript
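The quoted rule can be written down as a small predicate. This is a Python sketch of the rule as quoted, not attoparser's actual code; the function name is hypothetical, and the case-insensitive, whitespace-trimmed comparison is an assumption.

```python
from typing import Optional

# The script types listed in the quoted issue.
CDATA_SCRIPT_TYPES = {
    "javascript", "ecmascript",
    "text/javascript", "text/ecmascript",
    "application/javascript", "application/ecmascript",
}


def script_body_is_cdata(type_attr: Optional[str]) -> bool:
    """Return True when a <script> body counts as CDATA under the quoted rule."""
    if type_attr is None:
        return True  # no type attribute at all
    # Assumption: compare case-insensitively and ignore surrounding whitespace.
    return type_attr.strip().lower() in CDATA_SCRIPT_TYPES


print(script_body_is_cdata(None))
print(script_body_is_cdata("text/javascript"))
print(script_body_is_cdata("text/template"))
```

Under this rule a type such as text/template, as used in the question, is not in the list, so whether its body is treated as CDATA depends on the parser version in use.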

Is freemarker template HTML escaped by default

I just started working with freemarker templates. I want to make sure that they are HTML escaped to avoid XSS vulnerabilities.
I tried using this template and passed anchor tag as a variable
String dummyAnchorTagVariable = "<a href='https://example.com'>Visit mysite</a>"
and used it in freemarker template
<div> ${dummyAnchorTagVariable} </div>
The result was that the whole text, including the tags, appeared on the web page instead of a link. So I assume that FreeMarker HTML-escapes by default.
But when I look for documentation on this, I can't find anything that says FreeMarker HTML-escapes by default:
http://freemarker.incubator.apache.org/docs/ref_directive_escape.html
and there is even a blog post (although old) that describes how to make it escape by default: http://watchitlater.com/blog/2011/10/default-html-escape-using-freemarker/
So I'm kind of confused about the HTML escaping in Freemarker.
FreeMarker before 2.3.24 does not escape by default, unless someone is using a custom TemplateLoader that puts the template inside <#escape x as x?html>...</#escape>. If that's what's happening in your case, then <#noescape>${dummyAnchorTagVariable}</#noescape> will work; otherwise it will give an error because there's no active #escape to disable.
FreeMarker 2.3.24 can auto-escape without TemplateLoader tricks (as of this writing it's not yet out, but hopefully RC1 comes in days and final in February).