By default, Jinja2 offers escaping for HTML. However, I am trying to (ab)use the templating engine for LaTeX documents, which means that HTML escaping is not really useful. Instead, I would like to implement some sort of (basic) LaTeX escaping.
So the question is: How can I override the jinja2.escape() function?
I tried something like this:
import jinja2.utils

def latex_escape(test):
    print(test)
    return 'dummystring'

jinja2.utils.escape = latex_escape
but this doesn't work at all.
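One way to get LaTeX escaping without patching Jinja2 internals is the finalize argument of jinja2.Environment: a callable applied to the result of every {{ ... }} expression before it is output (literal template text is not passed through it). Reassigning jinja2.utils.escape likely has no effect because modules that imported escape earlier keep their own reference, and a plain Environment does not autoescape by default anyway. The sketch below assumes a deliberately basic, hypothetical LATEX_SPECIALS table covering the ten characters LaTeX treats specially; it is not a complete escaper.

```python
from jinja2 import Environment

# Hypothetical, deliberately basic table of LaTeX special characters.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(value):
    """Escape LaTeX special characters one character at a time.

    Working per character means the braces inserted by one replacement
    are never escaped a second time.
    """
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(value))


# finalize runs on every {{ ... }} result; literal template text is untouched.
env = Environment(finalize=latex_escape, autoescape=False)

template = env.from_string(r"\textbf{Item:} {{ name }}")
print(template.render(name="50% of R&D_budget"))
```

If you would rather escape explicitly, the same function can be registered as a filter instead (env.filters["tex"] = latex_escape, then {{ name|tex }} in the template); the filter name tex is arbitrary.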
I have a multi-language Next.js website.
I use the i18next package. I define variables in JSX like this:
{t("satisfied:title")}
which means
{t("JSONfileName:JSONvariable")}
but in the JSON file I can't use HTML; it is shown as a plain string:
"title" : "this is my<br \/> <span class=\"test\"> test <\/span> title"
HTML entities are escaped for security reasons by React. In order to make your translations render actual HTML you could do something like this:
<div dangerouslySetInnerHTML={{ __html: t("satisfied:title")}} />
But I would strongly advise you not to do it like this. What you should really beware of is putting HTML into your translated texts. If these translations are at some point managed by external translators, they might not be familiar with HTML syntax. Do you trust these translators not to break your UI accidentally? You should not.
Instead, you should handle cases like this with the Trans component: https://react.i18next.com/latest/trans-component
Is there a generic "form sanitizer" that I can use to ensure all HTML/scripting is stripped from the submitted form? form.clean() doesn't seem to do any of that; the HTML tags are all still in cleaned_data. Or is doing this all manually (and overriding the form's clean() method) my only option?
strip_tags actually removes the tags from the input, which may not be what you want.
To convert a string to a "safe string", with angle brackets, ampersands and quotes converted to the corresponding HTML entities, you can use the escape function:
from django.utils.html import escape
message = escape(form.cleaned_data['message'])
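To see the difference between escaping and stripping without setting up a Django project, here is a stdlib-only sketch: Python's html.escape stands in for escaping, and a small html.parser-based stripper (a hypothetical helper, strip_markup) stands in for tag stripping. Django's own escape and strip_tags are not used here and may differ in details. Note that a stripper keeps the text inside script elements as plain text, so its output still needs escaping wherever it is rendered as HTML.

```python
from html import escape
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text content and ignore every tag (a basic tag stripper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: &amp; arrives as "&"
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(text):
    """Hypothetical helper: drop tags, keep the text between them."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


raw = '<b>Hi</b> <script>alert("x")</script> & bye'
print(escape(raw))        # markup kept, but shown as literal text
print(strip_markup(raw))  # markup removed, text (even script text) kept
```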
Django comes with a template filter called striptags, which you can use in a template:
{{ value|striptags }}
It uses the function strip_tags, which lives in django.utils.html. You can also use it to clean your form data:
from django.utils.html import strip_tags
message = strip_tags(form.cleaned_data['message'])
Alternatively, there is a Python library called bleach:
Bleach is a whitelist-based HTML sanitization and text linkification library. It is designed to take untrusted user input with some HTML.
Because Bleach uses html5lib to parse document fragments the same way browsers do, it is extremely resilient to unknown attacks, much more so than regular-expression-based sanitizers.
Example:
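import bleach

# Example whitelists; adjust them to what the form should accept.
ALLOWED_TAGS = ['a', 'b', 'em', 'i', 'strong', 'p']
ALLOWED_ATTRIBUTES = {'a': ['href', 'title']}
ALLOWED_STYLES = []  # styles= is only accepted by older bleach releases

message = bleach.clean(form.cleaned_data['message'],
                       tags=ALLOWED_TAGS,
                       attributes=ALLOWED_ATTRIBUTES,
                       styles=ALLOWED_STYLES,
                       strip=False, strip_comments=True)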
I am building a static blog that uses Marked to parse Markdown. I want to be able to have code blocks with tabs.
I want to parse code that looks like this:
```JavaScript
var geolocation = require("nativescript-geolocation");
```
```TypeScript
import geolocation = require("nativescript-geolocation");
```
into something like this (from the Angular 2 docs), where the tab names would be JavaScript and TypeScript.
I am programming in JavaScript (Node.js), so I could render this manually if required. What would a custom implementation of a tabbed code block look like?
I am not sure if there is a special name for these, as I can't seem to find any examples or templates.
I think the answer is: Marked does not support custom tags. I spent a few hours trying to find a way to extend it and finally switched to Showdown.
It appears to be really easy to implement one there (here is an example of an expandable-section tag).
The extension showdownjs/prettify-extension implements code highlighting using Google Prettify.
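The question is about Node.js, but the preprocessing idea is language-neutral, so here it is sketched in Python to keep this page's examples in one language: find runs of two or more adjacent fenced blocks, each with a language label, and replace each run with tab markup before the rest goes to the Markdown renderer. The function render_code_tabs, the code-tabs class and the data-tab attributes are hypothetical names chosen for this sketch; the same logic would port to a small Node pre-pass run before calling Marked.

```python
import html
import re

# One fenced block with a language label, e.g. ```JavaScript ... ```
FENCE = re.compile(r"```(\w+)\n(.*?)\n```", re.DOTALL)
# Two or more such blocks separated only by whitespace.
FENCE_RUN = re.compile(r"(?:```\w+\n.*?\n```\s*){2,}", re.DOTALL)


def render_code_tabs(markdown_text):
    """Replace each run of adjacent labelled code fences with tab markup.

    Single code blocks are left alone for the normal Markdown renderer.
    """

    def to_tabs(match):
        blocks = FENCE.findall(match.group(0))
        buttons = "".join(
            f'<button data-tab="{i}">{html.escape(lang)}</button>'
            for i, (lang, _) in enumerate(blocks)
        )
        panels = "".join(
            f'<pre data-tab="{i}"><code class="language-{lang.lower()}">'
            f"{html.escape(code)}</code></pre>"
            for i, (lang, code) in enumerate(blocks)
        )
        return f'<div class="code-tabs">{buttons}{panels}</div>\n'

    return FENCE_RUN.sub(to_tabs, markdown_text)


md = '```JavaScript\nvar a = 1;\n```\n```TypeScript\nlet a: number = 1;\n```\n'
print(render_code_tabs(md))
```

You would still need a little CSS/JS to show one panel at a time and switch panels when a button is clicked.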
Is there a CPAN module or code snippet that I can use to modify local HTML files without using a regex?
What I want to do:
Change a start tag (example: <div> to <div id="newtag">)
Add a tag before another (example: </head> to <script type="text/javascript"> ...</script></head>)
Remove tags
Read the content of a given tag (OK, this one can be done with an XML/HTML parser)
If you have HTML, and not XHTML, then you don't want to be using an XML parser.
HTML::Parser is the standard HTML parser for Perl. Pretty much everything else is built on top of it.
HTML::TokeParser is an alternative interface to HTML::Parser. It returns things on demand instead of passing everything to callbacks.
HTML::TreeBuilder builds a DOM-like tree from the HTML, which you can then modify.
HTML::TreeBuilder::XPath extends HTML::TreeBuilder with XPath support.
HTML::Query extends HTML::TreeBuilder with jQuery-like selectors.
pQuery is another module that brings more complete jQuery compatibility to HTML::TreeBuilder.
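In Perl you would reach for one of the modules above; to keep this page's examples in one language, here is the same token-stream idea (what HTML::TokeParser gives you) sketched with Python's standard html.parser. Every token is written back out unchanged except where one of the question's three edits applies. The Rewriter class and rewrite function are hypothetical names; the id value, injected script and dropped tags are placeholders, and dropping a tag here keeps its content. Output is close to, but not byte-identical with, the input (attribute quoting is normalized).

```python
from html import escape
from html.parser import HTMLParser


class Rewriter(HTMLParser):
    """Re-emit HTML token by token, applying the edits from the question."""

    def __init__(self, inject_before_head_end, drop_tags):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. as written
        self.inject = inject_before_head_end
        self.drop = set(drop_tags)
        self.out = []

    @staticmethod
    def _attrs(attrs):
        # attribute values arrive unescaped, so escape them again on output
        return "".join(
            f" {k}" if v is None else f' {k}="{escape(v)}"' for k, v in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag in self.drop:
            return  # remove the tag, keep its content
        if tag == "div" and "id" not in dict(attrs):
            attrs = attrs + [("id", "newtag")]  # <div> -> <div id="newtag">
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag not in self.drop:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        if tag in self.drop:
            return
        if tag == "head":
            self.out.append(self.inject)  # add markup just before </head>
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite(html_text, inject, drop_tags=()):
    parser = Rewriter(inject, drop_tags)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```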
CPAN:
XML::XPath
XML::Xerces
A simple CPAN search returns:
XML Search
XPath
XPath Tutorial
It sounds like you are not familiar with XPath. Here is a quick tutorial to get you familiar with it. It's not Perl, but it will explain the concepts.
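To make the XPath idea concrete: Python's standard xml.etree.ElementTree understands a limited XPath subset (paths, //, and [@attr='value'] predicates), which is enough to show how an expression picks out elements. It is an XML parser, so, as noted above, it only works on well-formed markup such as XHTML. The sample document, class names and the try_parse helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample (no XML namespace, to keep paths simple).
XHTML = """<html><body>
  <ul class="about">
    <li><span class="fn">Jane Doe</span></li>
    <li><a class="url" href="http://example.com/">site</a></li>
  </ul>
</body></html>"""

root = ET.fromstring(XHTML)

# Roughly //span[@class='fn'] and //a[@class='url'] in full XPath.
name = root.find(".//span[@class='fn']").text
# ElementTree paths cannot select attributes (no /@href), so use .get().
site = root.find(".//a[@class='url']").get("href")
print(name, site)


def try_parse(markup):
    """Return the root element, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError:
        return None
```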
I have a couple of websites that I want to extract data from, and based on previous experience, this isn't as easy as it sounds. Why? Simply because the HTML pages I have to parse aren't properly formatted (missing closing tags, etc.).
Considering that I have no constraints regarding the technology, language or tool that I can use, what are your suggestions for easily parsing and extracting data from HTML pages? I have tried HTML Agility Pack and BeautifulSoup, and even these tools aren't perfect (HTML Agility Pack is buggy, and BeautifulSoup's parsing engine doesn't work with the pages I am passing to it).
You can use pretty much any language you like; just don't try to parse HTML with regular expressions.
So let me rephrase that: you can use any language you like that has an HTML parser, which is pretty much every language invented in the last 15-20 years.
If you're having issues with particular pages I suggest you look into repairing them with HTML Tidy.
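As an illustration of using a real HTML parser on sloppy markup, here is a small sketch with Python's standard html.parser, which reports tags as events and does not require them to be balanced. Collecting link targets and their text is a hypothetical example task, and LinkExtractor is a name chosen for this sketch; treating an unclosed a element as ending at the next a or at end of input is a policy choice of the sketch, not something the parser decides for you.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs; unbalanced tags are tolerated."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities in text
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._flush()  # an unclosed <a> ends where the next one starts
            self._href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def _flush(self):
        if self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
        self._href, self._text = None, []

    def close(self):
        super().close()
        self._flush()  # an <a> still open at end of input


# Several closing tags are missing; the parser does not care.
parser = LinkExtractor()
parser.feed('<ul><li><a href="/one">One &amp; only<li><a href="/two">Two</a><p>x')
parser.close()
print(parser.links)
```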
I think hpricot (linked by Colin Pickard) is ace. Add scrubyt to the mix and you get a great HTML scraping and browsing interface with the text-matching power of Ruby: http://scrubyt.org/
Here is some example code from http://github.com/scrubber/scrubyt_examples/blob/7a219b58a67138da046aa7c1e221988a9e96c30e/twitter.rb
require 'rubygems'
require 'scrubyt'

# Simple example for scraping basic
# information from a public Twitter
# account.

# Scrubyt.logger = Scrubyt::Logger.new

twitter_data = Scrubyt::Extractor.define do
  fetch 'http://www.twitter.com/scobleizer'

  # XPath attribute tests use @, e.g. [@class='fn'] and /@href
  profile_info '//ul[@class="about vcard entry-author"]' do
    full_name "//li//span[@class='fn']"
    location "//li//span[@class='adr']"
    website "//li//a[@class='url']/@href"
    bio "//li//span[@class='bio']"
  end
end

puts twitter_data.to_xml
In Java, the open-source library Jsoup would be a good solution for you.
hpricot may be what you are looking for.
You may try PHP's DOMDocument class. It has a couple of methods for loading HTML content, and I usually use this class. My advice is to prepend a DOCTYPE to the HTML if it doesn't have one, and to inspect the HTML that results after parsing in Firebug. In some cases, when invalid markup is encountered, DOMDocument rearranges the HTML elements a bit. Also, if the source contains a meta tag specifying the charset, be aware that libxml will use it internally when parsing the markup. Here's a little example:
$html = file_get_contents('http://example.com');
$dom = new DOMDocument;
$oldValue = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors($oldValue);
echo $dom->saveHTML();
Any language that works with HTML at the DOM level is good.
For Perl, that is the HTML::TreeBuilder module.