I'm doing some contract work on a project that has a variety of HTML escaping methods. Some properties are escaped in the back-end and rendered as raw strings using triple handlebars {{{escaped-in-backend}}}, others are passed from the back-end raw and escaped using double handlebars {{unsafe}}.
The JS back-end encoding is done using the (legacy?) ESAPI library, and uses a mixture of encodeForHTML() and encodeForHTMLAttribute(). I couldn't find a whole lot of info on this, but this post suggests that the attribute encoding also escapes spaces as &#x20; in order to be considered safe from XSS attacks. This may be what prompted the previous developer to do the escaping in the back-end.
Back-end escaping is nasty and I'd like to get rid of it and rely on handlebars. I haven't done F/E work in years so I just wanted to double-check that handlebars' escaping strategy is safe for attribute values, for example: <input type="text" value="{{unsafe-value}}"/>.
I'm leaning towards 'yes', as handlebars is pretty widespread and this would be a pretty glaring security hole, but I couldn't find any explicit documentation saying so.
You have a programmer reverse-engineering problem. You're reversing decisions made by prior programmers!
The issue you're up against is that the escaping rules inside HTML attributes are different from the escaping rules in between tags, so we separate those escaping functions because they're not 1:1. For example:
<textarea>{{HTMLEscapingIsDoneHere}}</textarea>
<span background="[HTMLAttributeEscapingIsHere]"> ...
HTML escaping covers many more characters than HTML Attribute Encoding. The danger is that you could be escaping quite a bit more than you intend, thus breaking some other feature, like CSS or jQuery selectors that key in on those attribute values and will now fail because they were expecting plaintext and you've escaped too much. Handlebars itself might have some functionality that keys on attribute values and will break if you HTML-escape them. [I don't know, I've never used it.]
If you pay attention to what gets escaped up front and what gets escaped on the back, and the prior developers were consistent in that the only things getting escaped were your HTML Attributes... then I'd bet $50 that they tried escaping everything as full HTML, broke something, and then came up with backend escaping as the easiest solution at the time. If they were inconsistent, well, all bets are off.
In your case, maybe experiment with escaping everything as HTML to see if you break anything. Especially if it works, you'll want to clearly document your approach and the fact that it could create future incompatibilities with other dynamic frameworks. If you can't tell, I've run into this before. It's the business's job to accept or reject that risk.
My first suggestion is that if you're working with handlebars IN JSPs, then just import the ESAPI taglibs and utilize the encoding functions there. Ugly? Sure. You're using more than one FE framework.
My second suggestion, since you're probably married to handlebars, is to wrap the ESAPI encoding functions you want to use and introduce them into handlebars. For example, in JSP you have taglibs that allow you to either use ESAPI's or write custom wrappers. Since I know ESAPI was never coded with handlebars in mind, I'd write wrapper functions and explicitly escape HTML using handlebars, and HTML attributes with your inserted handlebars functions. If handlebars doesn't have a capability to allow you to insert your own escaping functions, then I'm afraid backend escaping is the next best alternative. The prior developers saw value in separating these escaping functions, but were unable to do so in FE code.
As to the broader question, "Is it safe to escape attributes as HTML instead of HTML Attributes" it's one of those "yes, but..." answers where I've already gone over the pitfalls. Most of those pitfalls go away when using a Single Page Web Application framework but that's because you've normalized all client/server communication to Javascript. That's also off-topic, but an arrow you should have in your contractor's quiver.
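To make the "yes, but..." a bit more concrete, here is a rough Python sketch. It uses the standard library's html.escape as a stand-in for a template engine's default HTML escaper (that is an assumption: Handlebars escapes a similar but not identical set of characters), and the AttrCollector class and attribute_names function are hypothetical helpers for the demo. It suggests that plain HTML escaping holds up inside a quoted attribute, and that the extra encoding of spaces only starts to matter when the attribute value is left unquoted.

```python
from html import escape
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records attribute names seen on start tags."""

    def __init__(self):
        super().__init__()
        self.attr_names = []

    def handle_starttag(self, tag, attrs):
        self.attr_names.extend(name for name, _ in attrs)


def attribute_names(markup):
    """Parse the markup and return the attribute names the parser found."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attr_names


# Quoted attribute: the escaped quote cannot close the value early.
quoted = '<input type="text" value="{}">'.format(
    escape('" onmouseover="alert(1)')
)

# Unquoted attribute: the payload needs no quote at all, only a space,
# and a plain HTML escaper leaves spaces alone.
unquoted = '<input type="text" value={}>'.format(
    escape("x onmouseover=alert(1)")
)

print(attribute_names(quoted))
print(attribute_names(unquoted))
```

In the quoted case the parser still sees only the type and value attributes; in the unquoted case an extra onmouseover attribute appears. So if every attribute in the templates is quoted, the default escaping looks sufficient for this particular attack; the stricter attribute encoding is a safety net for templates that are not that disciplined.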
As a programmer currently working with handlebars and esapi4js, I'd suggest doing it my way: escape with handlebars helpers registered in a separate JS file, as below:
(function() {
  // Registers a helper that delegates to the ESAPI encoder for the given context.
  Handlebars.registerHelper('encodeFor', function(forWhat, param) {
    switch (forWhat) {
      case 'html':
        return $ESAPI.encoder().encodeForHTML(param);
      case 'htmlAttribute':
        return $ESAPI.encoder().encodeForHTMLAttribute(param);
    }
  });
})();
encodeFor is the name of the helper function.
forWhat and param are its parameters.
After registering a helper you can use it in your handlebars templates like this:
<input type="text" value="{{encodeFor 'htmlAttribute' unsafe-value}}"/>
<td class="headerColumn">{{{encodeFor 'html' unsafe-value}}}</td>
This feels like an easy one, but I'm having trouble finding the right search terms to get me what I need...
I have a requirement for part of my web page to display a previously-entered note from a user. The note is saved in the database, and I am currently incorporating it using Razor like this:
<span>@Model.UserNote</span>
This works fine, but it gets my spidey senses tingling... what if the user decides that he wants his note to be something like "</span><script>...</script><span>". I know how to use parameters to avoid injection attacks in SQL Server, but is there an HTML equivalent or another approach to avoid saving or injecting malicious markup in HTML? Displaying the text in a control like a textbox feels safer, but may not give me the visual appearance that I am looking for. Thanks in advance!
The thing you want to search for is cross-site scripting (xss).
The general solution is to encode output according to its context. For example if you are writing such data into plain html, you need html encoding, which is basically replacing < with &lt; and so on in dynamic data (~user input), so that everything only gets rendered as text. For a javascript context (for example, but not only, inside a <script> tag) you would need javascript encoding.
In .net, there is HttpUtility that includes such methods, eg. HttpUtility.JavaScriptStringEncode(). Also there is the formerly separate AntiXSS library that can help by providing even stricter (whitelist-based) encoding, as opposed to the blacklist-based HttpUtility. So don't roll your own, it's trickier than it may first appear - just use a well-known implementation.
Also Razor has built-in protection against trivial xss attack vectors. By using @myVar, Razor automatically applies html encoding, so your code above is secure. Note that it would not be secure in a javascript context, where you need to apply javascript encoding yourself (ie. call the relevant method from HttpUtility for instance).
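To illustrate what "encode according to context" means, here is a small language-neutral sketch in Python rather than .net (the function names encode_for_html and encode_for_script are made up for the example, and in a real .net app you would use the framework methods instead). The same note is encoded one way for an html text context and another way for a javascript string context.

```python
import json
from html import escape


def encode_for_html(value):
    """HTML context: markup characters become entities and render as text."""
    return escape(value, quote=True)


def encode_for_script(value):
    """JavaScript string context: emit a quoted string literal.

    json.dumps handles quotes, backslashes and control characters; the
    extra replacements keep '<', '>' and '&' out of the literal so the
    value cannot close an enclosing <script> element.
    """
    literal = json.dumps(value)
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


note = "</span><script>alert(1)</script><span>"

print("<span>" + encode_for_html(note) + "</span>")
print("<script>var note = " + encode_for_script(note) + ";</script>")
```

Note that the two encodings are not interchangeable: html entities are not decoded inside a script block, and javascript escapes mean nothing in plain html.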
Note that without proper encoding, it is not more secure to use an input field or a textarea - an injection is an injection, doesn't matter much what characters need to be used if injection is possible.
Also slightly related, .net provides another protection besides the automatic html encoding. It uses "request validation", and by default won't allow request parameters (either get or post) to contain a less than character (<), immediately followed by a letter. Such a request would be blocked by the framework as potentially unsafe, unless this feature is deliberately turned off.
Your original example is blocked by both of these mechanisms (automatic encoding and request validation).
It's very important to note though, that in terms of xss, this is the very tip of the iceberg. While these protections in .net help somewhat, they are by no means sufficient in general. While your example is secure, in general you need to understand xss and what exactly these protections do to be able to produce secure code.
I am looking for a good character pair to use for enclosing template code within a template for the next version of our inhouse template engine.
The current one uses plain {} but this makes the parser very complex to be able to distinguish between real code blocks and random {} chars in the literal text in the template.
I think a dual char combination like the one used in asp.net or php is a better approach, but the question is which character pair I should use, or is there some perfect single char that is never used and that's easy to write?
Some criteria that need to be fulfilled:
Cannot be changed by HTMLEncode, the sources will be editable through webbased HTML editors and plain textareas and need to stay the same no matter what editor is used.
Regex will be used to clean code parts after editing in an HTML editor that might have encoded the internal part of the code block like & chars.
Should be reasonably easy to write on both English and Swedish keyboard layouts.
Should be a very rare combination, the template will generate HTML and Text and could include CSS and Javascript literal text with JSON, so any combination that might collide with those is bad unless very rare. That means that {{}} is out as it can occur in JSON.
The code within the code block will contain spaces, underscores, dollar and many more combinations, not only fieldnames but if/while constructs as well.
The parser is generated with Antlr
I am looking for suggestions and objections to find one or more combinations that would work in as many situations as possible, possibly multiple alternative pairs for different situations.
Template-Toolkit defaults to [% template directives %], which works reasonably well.
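As a rough sketch of why a two-character pair like this keeps the scanner simple, here is a toy Python renderer. The render function, the DIRECTIVE pattern and the sample data are all hypothetical, and it only does plain variable lookup, not the if/while constructs or the Antlr grammar the question mentions. Literal braces from JSON pass through untouched because they never form the [% ... %] pair.

```python
import re

# A directive is anything between "[%" and "%]", with optional padding.
DIRECTIVE = re.compile(r"\[%\s*(.*?)\s*%\]", re.DOTALL)


def render(template, values):
    """Replace each [% name %] directive with the matching value."""

    def substitute(match):
        name = match.group(1)
        return str(values[name])

    return DIRECTIVE.sub(substitute, template)


template = 'var cfg = {"user": {"name": "[% user_name %]"}}; [% greeting %]'
print(render(template, {"user_name": "Anna", "greeting": "Hej"}))
```

None of [, % or ] is touched by HTML encoding, which matters for the first criterion in the question.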
I was going over my django site looking for xss problems. I figured I had it covered since django does auto escaping. So I put the usual alert('foo'); in sample data and I found a huge hole where I'm using ajax to pull data down as json and using jquery.append to add it, none of that is escaped for html, oops.
So my question is what is the best way to fix this:
Use my own copy of simplejson that auto escapes based on a param.
Just make sure I always use escape() when creating dicts that are going to be json dumped
Always use .text on the client side
Something I haven't thought of
It seems like this is a pretty easy problem to get yourself into.
Do something that is obvious/transparent/automatic, like Joel suggested here: http://www.joelonsoftware.com/articles/Wrong.html
Still, I don't see how "alert('foo');" can be harmful when injected into HTML. What would be harmful is if it was surrounded by a <script> tag.
And for escaping HTML, you have to figure out if you want to do this on input or on output. Depending on what you want to achieve (e.g. allow a subset of HTML tags) and taking performance issues into account, you might want to escape the input and store escaped HTML into database.
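If you go the "escape when creating the dicts" route on output, a minimal sketch could look like the following. It uses the standard library's html.escape so that it runs on its own (in a Django project, django.utils.html.escape would be the natural choice), and escape_for_html is a hypothetical helper name. It walks the structure and escapes every string value just before the JSON dump.

```python
import json
from html import escape


def escape_for_html(data):
    """Return a copy of data with every string value HTML-escaped.

    Dict keys are left alone in this sketch; escape them as well if they
    can contain user input and end up in markup.
    """
    if isinstance(data, str):
        return escape(data)
    if isinstance(data, dict):
        return {key: escape_for_html(value) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        return [escape_for_html(item) for item in data]
    return data


payload = {
    "title": "<script>alert('foo');</script>",
    "tags": ["a&b"],
    "count": 3,
}
print(json.dumps(escape_for_html(payload)))
```

The catch is that the client then has to treat every value as HTML; if some code path uses .text on the same data, the entities will show up literally, so pick one convention and stick to it.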
I have some html (in this case created via TinyMCE) that I would like to add to a page. However, for security reason, I don't want to just print everything the user has entered.
Does anyone know of a templatetag (a filter, preferably) that will allow only a safe subset of html to be rendered?
I realize that markdown and others do this. However, they also add additional markup syntax which could be confusing for my users, since they are using a rich text editor that doesn't know about markdown.
There's removetags, but it's a blacklisting approach which fails to remove tags when they don't look exactly like the well-formed tags Django expects, and of course since it doesn't attempt to remove attributes it is totally vulnerable to the 1,000 other ways of script-injection that don't involve the <script> tag. It's a trap, offering the illusion of safety whilst actually providing no real security at all.
HTML-sanitisation approaches based on regex hacking are almost inevitably a total fail. Using a real HTML parser to get an object model for the submitted content, then filtering and re-serialising in a known-good format, is generally the most reliable approach.
If your rich text editor outputs XHTML it's easy, just use minidom or etree to parse the document then walk over it removing all but known-good elements and attributes and finally convert back to safe XML. If, on the other hand, it spits out HTML, or allows the user to input raw HTML, you may need to use something like BeautifulSoup on it. See this question for some discussion.
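For the XHTML case, the parse, filter and re-serialise idea might be sketched like this with etree. The allow-lists, the SAFE_SCHEMES tuple and the function names are illustrative assumptions, not a vetted policy. It also assumes the input is well-formed XML (fromstring raises ParseError otherwise), and for untrusted input the Python docs point to the defusedxml package as a hardened parser.

```python
import xml.etree.ElementTree as ET

# Illustrative allow-lists; a real policy needs careful review.
ALLOWED_TAGS = {"p", "strong", "em", "a"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_SCHEMES = ("http://", "https://", "mailto:")


def _copy_clean(source, target):
    """Copy allow-listed children of source into target, recursively."""
    target.text = source.text
    last = None
    for child in source:
        if child.tag in ALLOWED_TAGS:
            kept = ET.SubElement(target, child.tag)
            for name in sorted(ALLOWED_ATTRS.get(child.tag, ())):
                value = child.get(name)
                if value is None:
                    continue
                if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                    continue
                kept.set(name, value)
            _copy_clean(child, kept)
            kept.tail = child.tail
            last = kept
        elif child.tail:
            # The element and its content are dropped, but the text that
            # followed it is attached to whatever came before.
            if last is None:
                target.text = (target.text or "") + child.tail
            else:
                last.tail = (last.tail or "") + child.tail


def sanitize(fragment):
    """Return the fragment rebuilt from allow-listed parts, inside a <div>."""
    source = ET.fromstring("<div>" + fragment + "</div>")
    target = ET.Element("div")
    _copy_clean(source, target)
    return ET.tostring(target, encoding="unicode")


dirty = (
    '<p onclick="steal()">Hi <strong>there</strong>'
    '<script>alert(1)</script> and <a href="javascript:alert(1)">bye</a></p>'
)
print(sanitize(dirty))
```

The output is built from scratch rather than by deleting things from the input, so anything the policy does not explicitly name never reaches the page.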
Filtering HTML is a large and complicated topic, which is why many people prefer the text-with-restrictive-markup languages.
Use HTML Purifier, html5lib, or another library that is built to do HTML sanitization.
You can use removetags to specify a list of tags to be removed:
{{ data|removetags:"script" }}
I was wondering, and was as of yet, unable to find any answers online, how to accomplish the following.
Let's say I have a string that contains the following:
my_string = "<strong>Hello</strong>, I am a <i>string</i>."
(in the preview window I see that this is actually formatting in BOLD and ITALIC instead of showing the "strong" and "i" tags)
Now, I would like to make this secure, using the html_escape() (or h()) method/function.
So I'd like to prevent users from inserting any javascript and/or stylesheets, however, I do still want to have the word "Hello" shown in bold, and the word "string" shown in italic.
As far as I can see, the h() method does not take any additional arguments, other than the piece of text itself.
Is there a way to escape only certain html tags, instead of all? Like either White or Black listing tags?
Example of what this might look like, of what I'm trying to say would be:
h(my_string, :except => [:strong, :i]) # => so basically, escape everything, but leave "strong" and "i" tags alone, do not escape these.
Is there any method or way I could accomplish this?
Thanks in advance!
Excluding specific tags is actually pretty hard problem. Especially the script tag can be inserted in very many different ways - detecting them all is very tricky.
If at all possible, don't implement this yourself.
Use the white list plugin or a modified version of it. It's superb!
You can have a look at Sanitize as well (seems better, never tried it though).
Have you considered using RedCloth or BlueCloth instead of actually allowing HTML? These methods provide quite a bit of formatting options and manage parsing for you.
Edit 1: I found this message when browsing around for how to remove HTML using RedCloth, might be of some use. Also, this page shows you how version 2.0.5 allows you to remove HTML. Can't seem to find any newer information, but a forum post found a vulnerability. Hopefully it has been fixed since that was from 2006, but I can't seem to find a RedCloth manual or documentation...
I would second Sanitize for removing HTML tags. It works really well. It removes everything by default and you can specify a whitelist for tags you want to allow.
Preventing XSS attacks is serious business. Follow hrnt's advice, and consider that there are probably an order of magnitude more exploits than that possible due to obscure browser quirks. Although html_escape will lock things down pretty tightly, I think it's a mistake to use anything homegrown for this type of thing. You simply need more eyeballs and peer review for any kind of robustness guarantee.
I'm in the process of evaluating Sanitize vs XssTerminate at the moment. I prefer the xss_terminate approach for its robustness: scrubbing at the model level will be quite reliable in a regular Rails app where all user input goes through ActiveRecord. However, Nokogiri and specifically Loofah seem to be a little more performant, more actively maintained, and definitely more flexible and Ruby-ish.
Update: I've just implemented a fork of ActsAsTextiled called ActsAsSanitiled that uses Sanitize (which has recently been updated to use Nokogiri, by the way) to guarantee safety and well-formedness of the RedCloth output, all without needing any helpers in your templates.