Safe Way to Include User Text Input in HTML


This feels like an easy one, but I'm having trouble finding the right search terms to get me what I need...
I have a requirement for part of my web page to display a previously-entered note from a user. The note is saved in the database, and I am currently incorporating it using Razor like this:
<span>@Model.UserNote</span>
This works fine, but it gets my spidey senses tingling... what if the user decides that he wants his note to be something like "</span><script>...</script><span>". I know how to use parameters to avoid injection attacks in SQL Server, but is there an HTML equivalent or another approach to avoid saving or injecting malicious markup in HTML? Displaying the text in a control like a textbox feels safer, but may not give me the visual appearance that I am looking for. Thanks in advance!

The thing you want to search for is cross-site scripting (XSS).
The general solution is to encode output according to its context. For example, if you are writing such data into plain HTML, you need HTML encoding, which basically means replacing < with &lt; and so on in dynamic data (such as user input), so that everything is rendered only as text. For a JavaScript context (for example, but not only, inside a <script> tag) you would need JavaScript encoding.
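As a rough illustration of context-dependent encoding, here is a minimal sketch using the Python standard library rather than the .NET API. The variable names are made up for the example, and the JavaScript part is a simplification, not a complete JavaScript encoder:

```python
import html
import json

# Hypothetical malicious note, modelled on the example in the question.
user_note = '</span><script>alert(1)</script><span>'

# HTML context: markup characters become entities, so the browser shows text.
html_fragment = "<span>" + html.escape(user_note) + "</span>"

# JavaScript string context: json.dumps produces a quoted, escaped literal.
# "</" is additionally rewritten as "<\/" so that the literal cannot close
# an enclosing <script> block.
js_literal = json.dumps(user_note).replace("</", "<\\/")

print(html_fragment)
print(js_literal)
```

The point is only that the same value needs different encodings depending on where it lands; in practice the framework's own encoders should be used.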
In .NET, HttpUtility includes such methods, e.g. HttpUtility.JavaScriptStringEncode(). There is also the formerly separate AntiXSS library, which can help by providing even stricter (whitelist-based) encoding, as opposed to the blacklist-based HttpUtility. So don't roll your own; it's trickier than it may first appear. Just use a well-known implementation.
Razor also has built-in protection against trivial XSS attack vectors. When you write @myVar, Razor automatically applies HTML encoding, so your code above is secure. Note that it would not be secure in a JavaScript context, where you need to apply JavaScript encoding yourself (i.e. call the relevant method from HttpUtility, for instance).
Note that without proper encoding, it is not more secure to use an input field or a textarea - an injection is an injection, doesn't matter much what characters need to be used if injection is possible.
Slightly related: .NET provides another protection besides the automatic HTML encoding. It uses "request validation", and by default won't allow request parameters (either GET or POST) to contain a less-than character (<) immediately followed by a letter. Such a request is blocked by the framework as potentially unsafe, unless this feature is deliberately turned off.
Your original example is blocked by both of these mechanisms (automatic encoding and request validation).
It's very important to note though, that in terms of xss, this is the very tip of the iceberg. While these protections in .net help somewhat, they are by no means sufficient in general. While your example is secure, in general you need to understand xss and what exactly these protections do to be able to produce secure code.

Related

Is handlebars' default escaping safe for use in HTML attributes

I'm doing some contract work on a project that has a variety of HTML escaping methods. Some properties are escaped in the back-end and rendered as raw strings using triple handlebars {{{escaped-in-backend}}}, others are passed from the back-end raw and escaped using double handlebars {{unsafe}}.
The JS back-end encoding is done using the (legacy?) ESAPI library, and uses a mixture of encodeForHtml() and encodeForHtmlAttribute(). I couldn't find a whole lot of info on this, but this post suggests that the attribute encoding also escapes spaces as &#x20; in order to be considered safe from XSS attacks. This may be what prompted the previous developer to do the escaping in the back-end.
Back-end escaping is nasty and I'd like to get rid of it and rely on handlebars. I haven't done F/E work in years so I just wanted to double-check that handlebars' escaping strategy is safe for attribute values, for example: <input type="text" value="{{unsafe-value}}"/>.
I'm leaning towards 'yes', as handlebars is pretty widespread, and this would be a pretty glaring security hole, but I couldn't find any explicit documentation saying as such.

You have a programmer reverse-engineering problem. You're reversing decisions made by prior programmers!
The issue you're up against is that escaping rules in HTML attributes are different from escaping in between tags, so we separate those escaping functions because they're not 1:1. For example:
<textarea>{{HTMLEscapingIsDoneHere}}</textarea>
<span background="[HTMLAttributeEscapingIsHere]"> ...
HTML escaping covers many more characters than HTML Attribute Encoding. The danger is that you could be escaping quite a bit more than you'd intend, thus breaking some other feature like css or jquery selectors that might be trying to key in on those attribute values and will now fail because you've escaped too much and those functions were expecting plaintext and not HTML Attributes. Handlebars itself might have some functionality that keys on attribute values that will break if you HTML-Escape them. [I don't know, I've never used it..]
If you pay attention to what gets escaped up front and what gets escaped on the back, and the prior developers were consistent in that the only things getting escaped were your HTML Attributes... then I'd bet $50 that they tried escaping everything as full HTML, broke something, and then came up with backend escaping as the easiest solution at the time. If they were inconsistent, well, all bets are off.
In your case, maybe experiment escaping everything as HTML to see if you break anything, but especially if it works you'll want to clearly document your approach that you could create future incompatibilities with other dynamic frameworks. If you can't tell, I've run into this before. It's the business's job to accept or reject that risk.
My first suggestion, is that if you're working with handlebars IN JSPs, then just import the ESAPI taglibs and utilize the encoding functions there. Ugly? Sure. You're using more than one FE framework.
My second suggestion since you're probably married to handlebars is to wrap the esapi encoding functions you want to use here and introduce them into handlebars. For example, in JSP you have taglibs that allow you to either use ESAPI's or write custom wrappers. Since I know ESAPI was never coded with handlebars in mind, I'd write wrapper functions and explicitly escape HTML using handlebars, and HTML attributes with your inserted handlebars functions. If handlebars doesn't have a capability to allow you to insert your own escaping functions, then I'm afraid backend escaping is the next best alternative. The prior developers saw value in separating these escaping functions, but were unable to do so in FE code.
As to the broader question, "Is it safe to escape attributes as HTML instead of HTML Attributes" it's one of those "yes, but..." answers where I've already gone over the pitfalls. Most of those pitfalls go away when using a Single Page Web Application framework but that's because you've normalized all client/server communication to Javascript. That's also off-topic, but an arrow you should have in your contractor's quiver.

As a programmer currently working with handlebars and esapi4js, I suggest doing the escaping with handlebars helpers in a separate js file, as below:
(function() {
    // Register a helper that encodes a value for the given output context.
    Handlebars.registerHelper('encodeFor', function(forWhat, param) {
        switch (forWhat) {
            case 'html':
                return $ESAPI.encoder().encodeForHTML(param);
            case 'htmlAttribute':
                return $ESAPI.encoder().encodeForHTMLAttribute(param);
        }
    });
})();
encodeFor is the name of the helper function.
forWhat and param are its parameters.
After registering a helper, you can use it in your handlebars templates like this:
<input type="text" value="{{encodeFor 'htmlAttribute' unsafe-value}}"/>
<td class="headerColumn">{{{encodeFor 'html' unsafe-value}}}</td>

GWT HTML Widget XSS security

Might be a noobish question (most likely), but according to the official developer documents, GWT's HTML widget is not XSS safe and one must exercise caution when embedding custom HTML/script text.
So I guess my question is, why does this:
HTML testLabel = new HTML("dada<script type='text/javascript'>document.write('<b>Hello World</b>');</script>");
Not show a javascript popup? If somehow, GWT's HTML widget does protect from XSS attacks then in what types of situations does it not (so i can know what to expect)?

The GWT documentation contains a few articles about security (including dealing with XSS using SafeHtml).
Your example doesn't work because scripts defined via innerHTML don't get executed in Chrome/Firefox (I think there was some workaround for IE using the defer attribute).
But you shouldn't rely on this browser restriction. It is better to use SafeHtml and always validate input from users.

I don't know about this widget in particular, but in general it is worth knowing that XSS vectors come in many many flavours. Only a small percentage actually use the script tag.
One very important factor is that they are location-dependent. For example, a string that is xss-safe outside any tags, may not be safe inside a tag's attribute value, or within a delimited string that is inside a javascript block.
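To make the location dependence concrete, here is a small sketch in Python (standard library only; the payload and variable names are invented for the illustration). A string that has been escaped well enough for text between tags still breaks out of a double-quoted attribute value:

```python
import html

# Hypothetical payload containing no <, > or & at all.
payload = '" onmouseover="alert(1)'

# Escaping only <, > and & is enough for text placed between tags...
body_encoded = html.escape(payload, quote=False)
in_body = "<p>" + body_encoded + "</p>"

# ...but the same string closes a double-quoted attribute value and
# injects a new event-handler attribute.
in_attribute_unsafe = '<input value="' + body_encoded + '">'

# Escaping quotes as well keeps the whole payload inside the value.
in_attribute_safe = '<input value="' + html.escape(payload, quote=True) + '">'

print(in_attribute_unsafe)
print(in_attribute_safe)
```

The first printed tag carries a live onmouseover attribute; the second keeps the payload as inert attribute text.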
They can also be browser-dependent, as many exploit 'bugs' in the document parsing model.
To get a sense of the variety of different vectors that can be abused to produce malicious JavaScript injection, please see these two cheat sheets.
I also recommend you read the prevention cheat sheet at OWASP.

Dynamically Obfuscate HTML

I was wondering if there was any way to dynamically obfuscate html on a live server but not offline, so soon as my website was visited the source would be obfuscated rather than in plain text.

Since the client (browser) will have to parse it into a sensible DOM tree, this is pretty much fruitless. These days it's a lot more common to inspect a site using Firebug/Webkit Inspector, which provides a nicely formatted, navigable tree. Most people won't even notice that the HTML is "obfuscated", much less be stopped by it.
Executable code can be obfuscated by minimizing variable names and such without changing the result. HTML is the result though, if you change anything about it, the result will change. So "obfuscation" would mostly be limited to creative use of spacing anyway.

The real question you should ask yourself is "why do I need to obfuscate HTML?". If you're hiding sensitive information, then you should be either encrypting that data, or never presenting it to the client.
Most sensitive information or transactions should take place on the server, and the client only receives a token, or encrypted information, or a unique transaction identifier that can be passed back and forth.

Let me put it this way: There's no way to dynamically obfuscate the HTML on your site such that any reasonably competent person couldn't get it anyway.
You could use JavaScript to attempt to obfuscate it, but you'd have to do it in a way that didn't actually affect the DOM.
You could generate the contents of the page itself with JavaScript, but that is likely to damage accessibility, and once again the DOM will have to be in a condition the browser can use.
You could insert massive amounts of whitespace into the source, but that is easily overcome as well.
All this, and you make it harder and more annoying to manage your site. Minification has its purpose, but obfuscation here is lose-lose.

You could search for and remove all tabs, newlines, extra spaces, and comments.

If you are using PHP, IonCube has a plugin. It can be found here: http://www.ioncube.com/html_encoder.php. It turns your HTML page into minified JavaScript.

Django templatetag for rendering a subset of html

I have some html (in this case created via TinyMCE) that I would like to add to a page. However, for security reason, I don't want to just print everything the user has entered.
Does anyone know of a templatetag (a filter, preferably) that will allow only a safe subset of html to be rendered?
I realize that markdown and others do this. However, they also add additional markup syntax which could be confusing for my users, since they are using a rich text editor that doesn't know about markdown.

There's removetags, but it's a blacklisting approach which fails to remove tags when they don't look exactly like the well-formed tags Django expects, and of course since it doesn't attempt to remove attributes it is totally vulnerable to the 1,000 other ways of script-injection that don't involve the <script> tag. It's a trap, offering the illusion of safety whilst actually providing no real security at all.
HTML-sanitisation approaches based on regex hacking are almost inevitably a total fail. Using a real HTML parser to get an object model for the submitted content, then filtering and re-serialising in a known-good format, is generally the most reliable approach.
If your rich text editor outputs XHTML it's easy, just use minidom or etree to parse the document then walk over it removing all but known-good elements and attributes and finally convert back to safe XML. If, on the other hand, it spits out HTML, or allows the user to input raw HTML, you may need to use something like BeautifulSoup on it. See this question for some discussion.
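The parse, filter and re-serialise approach can be sketched with Python's built-in html.parser. This is only an outline under stated assumptions: the whitelist, the URL prefixes and the names WhitelistSanitizer and sanitize are invented for the example, html.parser is not a full HTML5 parser, and the sketch does not balance unclosed tags. A maintained sanitisation library is the safer choice in production.

```python
from html import escape
from html.parser import HTMLParser

# Illustrative whitelist: tag name -> allowed attribute names.
ALLOWED = {"b": set(), "i": set(), "p": set(), "a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://")


class WhitelistSanitizer(HTMLParser):
    """Re-serialises only known-good tags and attributes; escapes all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # unknown tags are dropped entirely
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # rejects javascript:, data:, relative URLs, etc.
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text (including the body of a dropped <script>) is emitted escaped.
        self.out.append(escape(data, quote=False))


def sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="x()">hi</b><script>alert(1)</script>'
               '<a href="javascript:alert(1)">x</a>'))
```

Because the output is rebuilt from scratch, anything not explicitly allowed never reaches the page, which is the key difference from blacklist filters such as removetags.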
Filtering HTML is a large and complicated topic, which is why many people prefer the text-with-restrictive-markup languages.

Use HTML Purifier, html5lib, or another library that is built to do HTML sanitization.

You can use removetags to specify a list of tags to be removed:
{{ data|removetags:"script" }}

Writing XSS Filter for (X)HTML Based on White List

I need to implement a simple and efficient XSS filter in C++ for CppCMS. I can't use the existing high-quality filters written in PHP, because CppCMS is a high-performance framework that uses C++.
The basic idea is to provide a filter that has a white list of HTML tags and a white list of options for these tags. For example, typical HTML input can consist of <b> and <i> tags and an <a> tag with href. But a straightforward implementation is not good enough, because even allowed simple links may include XSS:
<a href="javascript:alert('XSS')">Click On Me</a>
Many other examples can be found there. So I also thought about the possibility of creating a white list of prefixes for attributes like href/src, so that I always need to check whether the value starts with (https?|ftp)://
Questions:
Are these assumptions good enough for most purposes? Meaning that if I do not give an option for style tags and check src/href using a white list of prefixes, does it solve XSS problems? Are there problems that can't be fixed this way?
Is there a good reference for a formal grammar of HTML/XHTML, in order to write a simple parser that would clean up all incorrect or forbidden tags like <script>?

You can take a look at the AntiSamy project, which tries to accomplish the same thing. It's Java and .NET, though.
http://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project#.NET_version
http://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project_.NET
Edit 1, A bit extra :
You can potentially come up with a very strict white list. It should be well structured, pretty tight and not very flexible. When you combine flexibility with many tags, attributes and different browsers, you generally end up with an XSS vulnerability.
I don't know what your requirements are, but I'd go with strict and simple tag support (only b, li, h1, etc.) and then strict attribute support based on the tag (for example, href is only valid on the a tag). Then you need to whitelist the attribute values as you stated: http|https|ftp, or style="color|background-color", etc.
Consider this one:
<x style="express/**/ion:(alert(/bah!/))">
You also need to think about character whitelisting or UTF-8 normalization, because different encodings can cause awkward issues, such as new lines in attributes and invalid UTF-8 sequences.
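The attribute-value whitelisting described above can be sketched as follows. The question targets C++, so this Python version only illustrates the logic; the function name is_allowed_url and the exact policy (reject any whitespace or control character, then require an allowed scheme prefix) are assumptions made for the example:

```python
import re

# Assumed policy from the question: only http, https and ftp links pass.
ALLOWED_URL = re.compile(r"(?:https?|ftp)://", re.IGNORECASE)


def is_allowed_url(value):
    """Return True only if the attribute value starts with an allowed scheme."""
    # Whitespace and control characters are rejected outright, because
    # browsers may ignore them inside a scheme name (e.g. "java\tscript:").
    if any(ord(ch) <= 0x20 or ord(ch) == 0x7F for ch in value):
        return False
    return ALLOWED_URL.match(value) is not None


for candidate in ("https://example.com/page", "javascript:alert(1)",
                  "java\tscript:alert(1)", "//example.com"):
    print(repr(candidate), is_allowed_url(candidate))
```

Checking for an allowed prefix, rather than searching for forbidden ones, means that unknown or oddly encoded schemes fail closed.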

All details of HTML parsing are specified in HTML 5. However implementation of it is quite a lot of work, and it doesn't matter whether you'll parse HTML exactly with all corner cases. At worst you'll end up with different DOM, but you have to sanitize DOM anyway.

As you mentioned, there are various PHP implementations of this, but I don't know of any in C++, since that's not a language typically applied to web development. Overall, it's going to depend on how complex of an implementation you want to come up with.
A very restrictive whitelist is probably the "simplest" way, but if you want to be really comprehensive I would look into doing a conversion of one of the established versions to C++, as opposed to trying to write your own from scratch. There are so many tricks to worry about, that I think you'd be better off standing on the shoulders of others that have already gone through all that.
I don't know anything about using C++ for web development, but converting PHP to it doesn't seem like it would be a particularly difficult task, PHP doesn't really have any magical capabilities that C++ won't be able to duplicate. I'm sure there will be some small hitches, but overall if you want to go the more-complex route it'd definitely still be faster to do a conversion than a full design from scratch.
HTML Purifier seems like a strong PHP implementation that is still actively maintained. There's a comparison document where the author discusses some differences between his approach and others', which is probably worth reading.
Whatever you come up with, definitely test it with all the examples you link, and make sure it passes all those. Good luck!
