Are there native methods to include dynamic content in HTML? - html

You should never create HTML by concatenating strings, as in "<div>" + user_content + "</div>", because user_content may include HTML tags that inject scripts, leading to XSS attacks.
There are libraries that escape HTML in strings, and the DOM's JavaScript API allows safe content assignment with .textContent, but what about native HTML methods?
For instance, there could be an HTML element, text, that escapes all of its inner content and renders it as text. A length attribute would tell the parser to skip parsing the next length characters.
`<div><text length="${user_content.length}">${user_content}</text></div>`
For background, I'm writing an AWS Lambda function in JavaScript. It reads user data from DynamoDB and builds a simple web page. The Lambda has no dependencies (except aws-sdk, which is included in the Lambda environment) and I'd like to keep it that way as a challenge. The page should not use JavaScript.
Can you write a JavaScript program that generates a safe HTML document, one that does not use JavaScript, from untrusted input? Also, without using HTML entity encoding?
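One possible way to meet both constraints, offered as a sketch rather than a vetted answer: base64-encode the untrusted text and embed it as a text/plain data: URL in an iframe. Base64 output contains only letters, digits, +, / and =, so it cannot close the attribute, and browsers render text/plain content as text. The name renderPage and the page skeleton are assumptions made for illustration; Buffer is Node's built-in, so no dependency is added. Iframe sizing and styling are left out.

```javascript
// Hypothetical helper: embeds untrusted text as a base64 text/plain data: URL,
// so no HTML entity encoding and no client-side JavaScript is involved.
function renderPage(userContent) {
  // Base64 uses only A-Z, a-z, 0-9, +, / and =, none of which can end the attribute.
  const encoded = Buffer.from(String(userContent), "utf8").toString("base64");
  return [
    "<!DOCTYPE html>",
    '<html><head><meta charset="utf-8"><title>User data</title></head>',
    "<body>",
    '<iframe title="User content" src="data:text/plain;charset=utf-8;base64,' + encoded + '"></iframe>',
    "</body></html>",
  ].join("\n");
}
```

The trade-off is that the content lives in a separate browsing context, so it cannot be styled or laid out with the rest of the page.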

Related

Safely process user content in Django template language [duplicate]

Is there a generic "form sanitizer" that I can use to ensure all HTML/scripting is stripped from the submitted form? form.clean() doesn't seem to do any of that: HTML tags are all still in cleaned_data. Or is doing this all manually (by overriding the clean() method for the form) my only option?
strip_tags actually removes the tags from the input, which may not be what you want.
To convert a string to a "safe string" with angle brackets, ampersands and quotes converted to the corresponding HTML entities, you can use the escape filter:
from django.utils.html import escape
message = escape(form.cleaned_data['message'])
Django comes with a template filter called striptags, which you can use in a template:
value|striptags
It uses the function strip_tags which lives in django.utils.html. You can utilize it also to clean your form data:
from django.utils.html import strip_tags
message = strip_tags(form.cleaned_data['message'])
Alternatively, there is a Python library called bleach:
Bleach is a whitelist-based HTML sanitization and text linkification library. It is designed to take untrusted user input with some HTML.
Because Bleach uses html5lib to parse document fragments the same way browsers do, it is extremely resilient to unknown attacks, much more so than regular-expression-based sanitizers.
Example:
import bleach
message = bleach.clean(
    form.cleaned_data['message'],
    tags=ALLOWED_TAGS,
    attributes=ALLOWED_ATTRIBUTES,
    styles=ALLOWED_STYLES,
    strip=False,
    strip_comments=True,
)

Sanitize <script> element contents

Say that I want to provide some data to my client (in the first response, with no latency) via a dynamic <script> element.
<script><%= payload %></script>
Say that payload is the string var data = '</script><script>alert("Muahahaha!")';</script>. An end tag (</script>) will allow users to inject arbitrary scripts into my page. How do I properly sanitize the contents of my script element?
I figure I could change </script> to <\/script> and <!-- to <\!--. Are there any other dangerous strings I need to escape? Is there a better way to provide this "cold start" data?
Edited for non-mutation of data.
If I'm interpreting this correctly, you want to prevent the user from ending the script element prematurely from within the user-submitted string. For HTML, that can be done just as you stated, by adding a backslash to the end tag: <\/script>. That is the only escaping you should have to worry about in that case. You shouldn't need to escape HTML comments, as the browser will interpret them as part of the JavaScript. If some older browsers don't default script elements to text/javascript, adding type='text/javascript' may be necessary (language="javascript" is deprecated).
Based on Mike Samuel's answer here, I may have been wrong about not needing to escape HTML comments. However, I was not able to reproduce the problem in Chrome or Chromium.
Assuming that you're doing this:
Payload is set to
var data = '[this is user controlled data]';
and the rest of the code (assignment, quotes and semi-colon) is generated by your application, then the encoding you want is hex entity encoding.
See the OWASP XSS Prevention Cheat Sheet, Rule #3 for more information. This will convert
</script><script>alert("Muahahaha!")
into
var data = '\x3c\x2fscript\x3e\x3cscript\x3ealert\x28\x22Muahahaha\x21\x22\x29';
Try this and you will see that it has the advantage of storing the user-supplied string exactly, no matter what characters it contains. Additionally, it takes care of single and double quote encoding. As a bonus, it is also suitable for storing in HTML attributes:
<a onclick="alert('[user data]');" />
which normally would have to be HTML encoded again for correct display (because &amp; inside an HTML attribute is interpreted as &). However, hex entity encoding does not include any HTML characters with special meaning, so you get two for the price of one.
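A minimal sketch of this kind of escaping in JavaScript, assuming the value ends up inside a quoted JavaScript string literal. The name jsStringEscape is hypothetical, and escaping every non-alphanumeric UTF-16 code unit is a deliberately conservative choice rather than a requirement:

```javascript
// Hypothetical helper: escapes every character except ASCII letters and digits.
// Code units below 0x100 become \xHH; everything else becomes \uHHHH.
function jsStringEscape(value) {
  // Without the "u" flag the regex matches one UTF-16 code unit at a time,
  // so surrogate pairs are emitted as two \uHHHH escapes.
  return String(value).replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 0x100
      ? "\\x" + code.toString(16).padStart(2, "0")
      : "\\u" + code.toString(16).padStart(4, "0");
  });
}

// Usage: the surrounding quotes and semicolon are generated by the application.
const line = "var data = '" + jsStringEscape('</script><script>alert("Muahahaha!")') + "';";
```

Because the output contains only letters, digits, and backslashes, it cannot close the string, the script element, or a quoted HTML attribute.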
Update from comments
The OP indicated that the server-side code would be generated in the form
var data = <%= JSON.stringify(data) %>;
The above still applies. It is up to the JSON class to properly hex entity encode values as they are inserted into the JSON. This cannot easily be done outside of the class, as you would effectively have to parse the JSON again to determine the current language context. I wouldn't recommend going for the simple option of escaping the forward slash in </script>, because there are other sequences that can end the grammar context, such as CDATA closing tags. Escape properly and your code will be future-proof and secure.
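If the JSON serializer offers no such option, one post-processing approach is sketched below, assuming the result is written directly into a script element. It relies on the fact that in JSON text the characters <, > and & can only occur inside string literals, so replacing them with \uXXXX escapes leaves the parsed value unchanged. The name jsonForScript is hypothetical:

```javascript
// Hypothetical helper: serializes data as JSON that is safe to place inside <script>.
function jsonForScript(data) {
  return JSON.stringify(data)
    .replace(/</g, "\\u003c") // blocks </script> and <!--
    .replace(/>/g, "\\u003e") // blocks --> and ]]>
    .replace(/&/g, "\\u0026") // avoids entity decoding in XHTML contexts
    .replace(/\u2028/g, "\\u2028") // line separators were not valid in
    .replace(/\u2029/g, "\\u2029"); // JavaScript string literals before ES2019
}
```

The template line would then read var data = <%= jsonForScript(data) %>; with raw (unescaped) template output.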

My backbone marionette model contains a field with escaped html in it. How do I render that field's contents as HTML and not text?

Classic problem. Want to see html rendered but I'm seeing text in the browser. Whether I tell handlebars js to decode it or not in template ( three curly braces vs two - {{{myHtmlData}}} vs {{myHtmlData}} ) doesn't get me there. Something about the JSON being returned via the model.fetch() has this html data wrapped up in such a way that it is resistant to the notion of displaying as HTML. It's always considered a string whether encoded or decoded so it always displays as text.
Is this just something backbone isn't meant to do?
The technologies involved here are:
backbone.marionette
handlebars.js
.NET Web API
Your data is being escaped automatically. That's a good thing, but since you're sure the data is safe HTML, use {{{}}} as described in this other question: Insert html in a handlebar template without escaping.

Protect XSS issue only by replacing '<' and '>'

I would like to know if I can protect my website against XSS attacks by replacing ONLY < and > with &lt; and &gt;, or am I missing something?
Example :
<?php echo '<div>' . $escaped . '</div>' ?>
I already know about PHP's htmlspecialchars function and its relatives.
No, for the HTML body you will also need to encode the & character to prevent an attacker from potentially escaping the escape.
Check out the XSS Experimental Minimal Encoding Rules:
HTML Body (up to HTML 4.01):
HTML Entity encode < &
specify charset in metatag to avoid UTF7 XSS
XHTML Body:
HTML Entity encode < & >
limit input to charset http://www.w3.org/TR/2008/REC-xml-20081126/#charsets
Note that if you want to put untrusted data inside an attribute value, you need to properly encode all characters with special meaning. The XSS (Cross Site Scripting) Prevention Cheat Sheet says to encode the following characters:
&, <, >, ", ', /
You must also quote the attribute value for the escaping to be effective.
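A minimal sketch of that attribute rule, written in JavaScript rather than PHP. The names ATTR_ENTITIES, escapeAttr, and inputTag are hypothetical, and the entity forms chosen for ' and / are one common option:

```javascript
// Entities for the six characters listed above.
const ATTR_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
  "/": "&#x2F;",
};

// Hypothetical helper: encodes a value for use inside a QUOTED attribute.
function escapeAttr(value) {
  return String(value).replace(/[&<>"'\/]/g, (ch) => ATTR_ENTITIES[ch]);
}

// The attribute value is always wrapped in double quotes; in an unquoted
// attribute a plain space would be enough to start a new attribute.
function inputTag(untrustedValue) {
  return '<input type="text" value="' + escapeAttr(untrustedValue) + '">';
}
```

With quoting in place, an input such as " onmouseover="alert(1) stays inside the value instead of adding an event handler.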
The answer is no: someone will find a way to exploit it, somehow.
You are underestimating the number of techniques and the creativity of attackers. Read through the OWASP XSS Cheat Sheet https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet to get an idea of the number of ways this could happen. In your case, does it protect against XSS in an onload attribute? Or in an input that becomes part of a CSS definition? In those situations the attacker is already inside a tag, so only JS code needs to be added, with no need for '<' or '>'.
Encode on output to defend against XSS. It is the simplest thing and it will protect you everywhere. Just do it every single time you write anything (no matter whether it comes from the user or not) and pay attention to the context: escape/encode for a URL when you are writing a link, escape/encode for JS when you are writing directly into a JS script, escape/encode for CSS when you are writing part of a CSS definition, escape/encode JSON when you write JSON data, and escape/encode HTML in any other case.
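As one illustration of context-specific encoding, here is a sketch for the URL case in JavaScript. The names encodeQueryValue and searchLink and the /search path are assumptions made for the example; encodeURIComponent is the standard built-in:

```javascript
// Hypothetical helper: encodes an untrusted value for a URL query parameter.
function encodeQueryValue(value) {
  // encodeURIComponent leaves ' unescaped, so percent-encode it as well.
  // Note: it throws URIError if the string contains a lone surrogate.
  return encodeURIComponent(String(value)).replace(/'/g, "%27");
}

// The encoded value contains no HTML-special characters, so it can be placed
// in a quoted href attribute as-is.
function searchLink(untrustedQuery) {
  return '<a href="/search?q=' + encodeQueryValue(untrustedQuery) + '">Search again</a>';
}
```

This covers only a query value; a whole URL supplied by the user needs separate checks, for example on its scheme.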
In addition, even if it is unrelated, I usually point to this site to show how creative people can be with JS: http://www.jsfuck.com/. It is meant for obfuscation only, but I have used it to evade anti-XSS controls, usually ones made by a third party.

Why do I need XSS library while I can use Html-encode?

I'm trying to understand why I need to use an XSS library when I can simply call HtmlEncode when sending data from server to client.
For example, here on Stackoverflow.com, in the editor, all the SO team needs to do is save the user input and display it HTML-encoded.
That way, there is never going to be an HTML tag that gets executed.
I'm probably wrong here, but can you please contradict my statement, or explain?
For example:
I know that an IMG tag, for example, can have onmouseover or onload, which a user can use to run malicious scripts, but the IMG won't even run in the browser as an IMG since it's &lt;img&gt; and not <img>.
So where is the problem?
HTML-encoding is itself one feature an “XSS library” might provide. This can be useful when the platform doesn't have a native HTML encoder (eg scriptlet-based JSP) or the native HTML encoder is inadequate (eg not escaping quotes for use in attributes, or ]]> if you're using XHTML, or #{} if you're worried about cross-origin-stylesheet-inclusion attacks).
There might also be other encoders for other situations, for example injecting into JavaScript strings in a <script> block or URL parameters in an href attribute, which are not provided directly by the platform/templating language.
Another useful feature an XSS library could provide might be HTML sanitisation, for when you want to allow the user to input data in HTML format, but restrict which tags and attributes they use to a safe whitelist.
Another less-useful feature an XSS library could provide might be automated scanning and filtering of input for HTML-special characters. Maybe this is the kind of feature you are objecting to? Certainly trying to handle HTML-injection (an output stage issue) at the input stage is a misguided approach that security tools should not be encouraging.
HTML encoding is only one aspect of making your output safe against XSS.
For example, if you output a string to JavaScript using this code:
<script>
var enteredName = '<%=EnteredNameVariableFromServer %>';
</script>
You will want to hex entity encode the variable for proper insertion into JavaScript, not HTML encode it. Suppose the value of EnteredNameVariableFromServer is O'leary; then the rendered code, when properly encoded, will become:
<script>
var enteredName = 'O\x27leary';
</script>
In this case this prevents the ' character from breaking out of the string and into the JavaScript code context, and also ensures proper treatment of the variable (HTML encoding it would result in the literal value of O&#39;leary being used in JavaScript, affecting processing and display of the value).
Side note:
Also, that's not quite true of Stack Overflow. Certain characters still have special meanings like in the <!-- language: lang-none --> tag. See this post on syntax highlighting if you're interested.