Allowing basic html markup in django - html

I'm creating an app that will process user-submitted content. I would like to let users make their text-based content look pretty with basic HTML markup, e.g. <i>, <b>, <br>. However, I do want to prevent them from using things like script tags. Django auto-escapes everything, so it also disables all safe markup. I can turn this off by using:
{{ somevar|safe }} or {% autoescape off %}
However, this will also enable all harmful script tags. Django does provide the linebreaks filter, which transforms line breaks into br or p tags while keeping the HTML safe:
{{ somevar|linebreaks }}
Unfortunately I am not aware of any filters that allow b or i tags to be used.
So I am wondering if there is a smart solution to this problem. And if you suggest a third-party library, would it be best to employ the solution when saving the model or when rendering the content?
UPDATE
In the end I went with the solution from "Python HTML sanitizer / scrubber / filter". That answer provides a way to use the Beautiful Soup library to remove all unwanted HTML tags from user-submitted content. This can be done before saving the model, which makes it safe to use the template filter {{ somevar|safe }} when rendering the page.
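
A minimal sketch of the allowlist approach described in the update. It uses only the standard library's html.parser instead of Beautiful Soup (a deliberate swap so the example has no dependencies). The names ALLOWED_TAGS, AllowlistSanitizer and sanitize_html are hypothetical, and this is not a substitute for a vetted sanitizing library:

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "br"}      # assumption: the tags named in the question
VOID_TAGS = {"br"}                   # allowed tags that take no closing tag
DROP_CONTENT = {"script", "style"}   # tags whose text content is discarded too


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags (without attributes) and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            # Attributes are dropped on purpose (onclick, style, ...).
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and tag not in VOID_TAGS and not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Text is re-escaped so stray entities cannot turn into markup.
            self.parts.append(escape(data))


def sanitize_html(raw):
    """Return raw with every tag outside ALLOWED_TAGS removed."""
    parser = AllowlistSanitizer()
    parser.feed(raw)
    parser.close()
    return "".join(parser.parts)


cleaned = sanitize_html('<b>Hi</b> <script>alert(1)</script><i onclick="x()">there</i><br>')
print(cleaned)
```

Calling a function like sanitize_html() on the field before the model is saved would store only the allowed markup, which is what makes {{ somevar|safe }} reasonable at render time.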

Take a look at django-tinymce. It should give you the flexibility you're looking for. You're going to be safest sanitizing the content before it makes its way into your database. TinyMCE can be configured to allow or not allow whatever tags you'd like.

Related

Markdown/html not parsing correctly in eleventy from frontmatter generated by Netlify CMS

I've been stuck on this for an embarrassingly long time. I have two inputs that aren't displaying correctly, a markdown widget and the list widget. They both appear as one long string. I thought I needed to add a markdown parser for the former at least so I'm using markdown-it in a manner similar to this:
https://github.com/11ty/eleventy/issues/236
It is adding paragraph breaks where they should be but they show up on the page as p tags. I thought this was because I already had the parsed text nested between p tags but if I delete those nothing shows up at all. When I look at the html file created by eleventy, the tags show up as "&lt ;p&gt ;" (without the spaces) which it seems the browser isn't reading correctly when trying to interpret the html. I'm using nunjucks for templating if that matters. My .eleventy.js file looks like this currently. What am I missing? Also the markdown filter seems to only want to take a string so I'm not sure where to even begin with the list.
By default, Nunjucks HTML-escapes all variables when outputting templates. This is what you want most of the time, unless you're trying to render HTML input.
You might want to try using the safe filter after your markdownify filter.
{{ markdownContent | markdownify | safe }}
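
To illustrate what auto-escaping does to already-rendered markup, here is a small Python sketch. Python's html.escape stands in for the template engine's escaping; Nunjucks itself is JavaScript and its output may differ in detail:

```python
from html import escape

rendered = "<p>First paragraph</p>"   # what a Markdown renderer would return
escaped = escape(rendered)            # what an auto-escaping template would emit
print(escaped)
```

The angle brackets become entities, so the browser displays the tags as text instead of interpreting them. Marking the value as safe skips this step.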

Django form is rendered with invalid HTML syntax

I have a Django project where I use the built-in forms, but it sends my client invalid HTML. This shouldn't be that big of a deal since browsers nowadays clean up such errors. But when the form gets sent back to the server, it fails to validate because Firefox sends back the cleaned-up version.
I have a form with a multiple select:
class ProjectForm(forms.Form):
    # [...]
    project_leaders = forms.ModelChoiceField(widget=forms.SelectMultiple, queryset=User.objects.all(), initial=0)
This form is integrated in the respective html file:
{{ project_form.as_p | linebreaks }}
This is the source code from it (via Firefox Page Source):
<p>[...] <select name="project_leaders" required id="id_project_leaders" multiple><br>
<option value="test">test</option></p>
<p></select></p>
Firefox cleans it up, of course, but it should be sent back to and accepted by Django.
Does anybody know how I can get Django to do that?
"This shouldn't be that big of a deal since browsers nowadays clean up such errors."
The browser tries its best to distill some meaning out of erroneous markup, but the result is not always what the author expected. For getting exactly the wanted structure, said author should write correct HTML. This hasn’t changed since the 90s.
In this specific case, my suggestion is to get rid of the | linebreaks filter. It is meant for plain text with, at most, simple formatting tags.
The filter adds a <br> after the opening <select> tag. This leads the browser to automatically close the <select> again, since <br>s are not valid inside <select>s. The <option> elements are then placed outside the <select>, having no effect anymore whatsoever. The closing and re-opening <p> tags are a symptom of the browser not fully knowing what to do with the final stray </select>.
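
A rough sketch of why the filter damages form markup. The function linebreaks_sketch is a hypothetical stand-in that mimics the documented behaviour of linebreaks (single newlines become <br>, blank lines start a new <p>) and is not Django's own code. The sample markup assumes the widget output contains a newline after <select> and a blank line before </select>:

```python
import re


def linebreaks_sketch(value):
    """Approximate a linebreaks-style filter on plain text."""
    value = re.sub(r"\r\n|\r", "\n", value)   # normalise newlines
    paragraphs = re.split(r"\n{2,}", value)   # blank line = new paragraph
    return "\n\n".join(
        "<p>%s</p>" % p.replace("\n", "<br>") for p in paragraphs
    )


# Assumed shape of the rendered widget, shortened for the example.
widget_html = (
    '<select name="project_leaders" multiple>\n'
    '<option value="test">test</option>\n'
    "\n"
    "</select>"
)

broken = linebreaks_sketch(widget_html)
print(broken)
```

Under these assumptions the output has a <br> directly inside <select> and a paragraph break just before </select>, which matches the page source shown in the question. Rendering {{ project_form.as_p }} without the filter avoids both.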

Django Haystack search in Html

I was just wondering (since I didn't find anything quickly on Google) whether it's possible, and how I would achieve it, to search directly in an HTML file and choose whether or not to ignore the tags.
To explain a bit further: we wrote a crawler, and obviously the crawler gives back the HTML of the page. If I want to search the crawled content, do I need two separate fields, one with HTML and one without, or can I have a single field with HTML and search it with or without the tags ignored?
Thanks in advance.
If I understand you correctly, all you need is to build the search index without HTML tags?
We solved that problem this way:
class PostIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(model_attr='text', use_template=True, document=True)
and in the template (search/indexes/blogs/post_test.html) we just used the striptags filter:
{{ object.content|striptags }}
After that you need to run build_schema and rebuild_index. Now it searches correctly without tags.
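
To give a feel for what ends up in the index, here is a standard-library sketch of tag stripping. The names TextExtractor and strip_tags_sketch are hypothetical, and this is not Django's striptags implementation, only an approximation of its effect:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags_sketch(html_text):
    """Return html_text with all tags removed and entities decoded."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks)


crawled = "<h1>Title</h1><p>Some <b>bold</b> text</p>"
indexed = strip_tags_sketch(crawled)
print(indexed)
```

With this approach the model keeps a single HTML field, and only the indexed document is tag-free.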

Showing certain tags django templates

If I want to show only certain tags (say, in a forum post) using Django template variables, how would I do that?
Say the content of my post is:
<div><b>Hell</div>o <i>everyone</i></b>
I don't want to show the div tags, but the b and i tags are fine. I know you can use |safe and autoescape, but that seems to apply to all HTML at once. Is there a better way to do this?
You could use a custom Django filter with a regular expression that does this.
Have a look here: http://djangosnippets.org/snippets/60/ and replace the regular expression with one that removes the HTML tags you don't want.
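
A sketch of the kind of regular expression such a filter could use. This is not the code from the linked snippet; the pattern and the name keep_only_basic_tags are assumptions for illustration. It keeps only bare <b>, </b>, <i> and </i> and removes every other tag, including allowed tags that carry attributes:

```python
import re

# Matches any tag that is not exactly <b>, </b>, <i> or </i>.
DISALLOWED_TAG = re.compile(r"<(?!/?(?:b|i)>)[^>]*>", re.IGNORECASE)


def keep_only_basic_tags(value):
    """Remove every tag except bare b and i tags (text is left untouched)."""
    return DISALLOWED_TAG.sub("", value)


post = "<div><b>Hell</div>o <i>everyone</i></b>"
result = keep_only_basic_tags(post)
print(result)
```

In a Django project a function like this could be registered as a template filter through django.template.Library and then combined with safe. Regular expressions are fragile on malformed HTML (a literal "<" in the text can be swallowed, for instance), so a parser-based sanitizer is usually the more robust choice.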

Dojo wysiwyg editor and Django template

I've added the Dojo wysiwyg editor to my Django admin panel. I have an issue when I type more than one space. When I output the markup created by the editor to a template, I get &nbsp; for each extra space. I assume it's trying to create a non-breaking space, but it renders in the browser as the literal text &nbsp;.
Does anyone know why this is? How would I go about fixing this?
I think it's Django that is changing & to &amp; on the server side. If it was a simple space Django would have replaced it with by itself. I don't know whether there is a feature to turn off escaping for this specific case in Django, but you can try that.
After a little research you want to use the template filter safe to fix this issue. You'll probably also want to add the filter removetags with script as an option to remove potentially malicious javascript. So my template variable ends up looking like this: {{ var|removetags:"script"|safe }}