setHTML(): is it still safe if we don't use SafeHtml but validate the string and only accept a limited set of HTML tags? (GWT)

Any widget that has a setHTML method could open a hole in the security system. But suppose we validate the string and only accept a limited set of HTML tags, such as <b> and <i>, and then pass that string to setHTML.
My question is: is it still safe if we do that?
For example, we check the text to make sure it contains only a limited set of HTML tags (<b>, </b>, <i>, </i>, ...). If the text contains any other tag, we don't let the user submit it. Then we use:
html1.setHTML(text); instead of html1.setHTML(SafeHtmlUtils.fromString(text))
I don't know why html1.setHTML(SafeHtmlUtils.fromString(text)) does not produce formatted text; it just shows plain text when I run it in Eclipse. For example,
html1.setHTML(SafeHtmlUtils.fromString("<b>text</b>"))
produces the plain text <b>text</b> instead of the word "text" in bold.

You want to sanitize the HTML, not escape it. The fromString method is meant to escape the string. Suppose a user enters a < b but forgets the space (a <b) and then adds >c. You don't want the c to become bold and the b to vanish entirely. Escaping exists to render exactly the string that was given, on the assumption that it is text.
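The sketch below shows in plain Java why the example renders as plain text. It is a minimal HTML escaper, under the assumption that it roughly mirrors what SafeHtmlUtils.fromString does (escaping &, <, >, " and ') before wrapping the result as SafeHtml. The class name HtmlEscaping is hypothetical, and no GWT classes are used.

```java
// Minimal sketch of plain HTML escaping. Assumption: this roughly mirrors what
// SafeHtmlUtils.fromString(...) does before wrapping the result as SafeHtml.
final class HtmlEscaping {

    private HtmlEscaping() {}

    /** Replaces the five HTML-special characters; '&' must be handled first. */
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&#39;");
    }

    public static void main(String[] args) {
        // The browser displays these entities as the literal characters <b>text</b>,
        // so nothing is rendered in bold.
        System.out.println(escape("<b>text</b>"));   // prints &lt;b&gt;text&lt;/b&gt;
        // A user who typed "a <b>c" (meaning a < b > c) sees exactly what they typed.
        System.out.println(escape("a <b>c"));        // prints a &lt;b&gt;c
    }
}
```

Escaping is the right tool whenever the input is meant to be text. It is the wrong tool when the <b> is supposed to become bold.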
At the opposite end of the spectrum, you can use fromTrustedString. This tells GWT that you absolutely trust the source of the data and will allow it to do anything. This should generally not be done for any data that comes from the user.
Somewhere between these two sits sanitization: taking a string that is meant to be HTML and ensuring it is safe, rather than treating it as text or trusting it implicitly. This is hard to do well. Any tag with a style attribute could be used to attack you (this is why GWT has SafeStyles alongside SafeHtml). Any tag with a URI, URL or href could be used to attack (hence SafeUri). Any attribute the browser treats as an event handler, such as onclick, can be used to run JavaScript. The HtmlSanitizer type is meant to handle this.
There is a built-in implementation of this, available as of at least GWT 2.4: SimpleHtmlSanitizer. This class whitelists certain HTML tags, including your <b> and <i> tags, as well as a few others. Attributes are not supported at all, since there are too many cases where they might be unsafe. As the class name suggests, this is just a simple approach to the problem. A more complex, in-depth sanitizer might preserve more of the original markup, but it also carries a greater risk of letting unsafe HTML content through.
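The sketch below illustrates the whitelist idea in plain Java. It is a simplified, hypothetical re-implementation and not GWT's actual SimpleHtmlSanitizer source. It escapes the whole string, then turns back only the exact escaped forms of a few attribute-free tags. The class name TagWhitelist and the tag list are assumptions.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical whitelist sanitizer in the spirit of GWT's SimpleHtmlSanitizer (not its source):
// escape everything, then re-enable a few exact tags.
final class TagWhitelist {

    // Assumed whitelist: bare tags only; a tag with any attribute stays escaped.
    private static final List<String> ALLOWED = List.of("b", "i", "em", "strong");

    // Matches the *escaped* form of an allowed opening or closing tag, e.g. "&lt;/b&gt;".
    private static final Pattern ESCAPED_ALLOWED_TAG =
            Pattern.compile("&lt;(/?)(" + String.join("|", ALLOWED) + ")&gt;");

    private TagWhitelist() {}

    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                   .replace("\"", "&quot;").replace("'", "&#39;");
    }

    static String sanitize(String input) {
        // 1. Escape everything, so no user markup is live.
        String escaped = escape(input);
        // 2. Restore only the exact escaped whitelisted tags. After step 1 every '&'
        //    begins an entity such as "&amp;", so user text cannot fake "&lt;b&gt;".
        return ESCAPED_ALLOWED_TAG.matcher(escaped).replaceAll("<$1$2>");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("<b>bold</b> <script>alert(1)</script>"));
        // prints <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```

In a real GWT client you would normally use the built-in class instead, for example html1.setHTML(SimpleHtmlSanitizer.sanitizeHtml(text)), since sanitizeHtml returns a SafeHtml. The sketch does not check that tags are balanced. An unmatched </i> is left in place; browsers tolerate it, and it cannot run script.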

Related

What kind of text may be wrapped within HTML code tags?

I would like to know what kind of text belongs in the HTML <code> element and what does not.
For example, I know that this is a good usage of HTML <code> tag:
Use the <code>str()</code> function to convert the object into a string.
But I am not sure if these are good usages of the <code> tag:
1. The list of users can be found at <code>/etc/passwd</code>.
2. We need to wait for <code>200 OK</code> response before the next step.
3. Enter the <code>ls</code> command to obtain a directory listing.
4. Compile the source code in <code>foo.c</code> to <code>foo.o</code>.
Is there a standards document, a W3C guideline, or a similarly authoritative reference that precisely defines what content may belong in the HTML <code> element and what may not?
The definition of the code element (from HTML 5.2) is:
The code element represents a fragment of computer code. This could be an XML element name, a file name, a computer program, or any other string that a computer would recognize.
This is what decides whether it’s allowed (i.e., semantic) to use the element or not. But you should also check if there is a more specific element available.
Reviewing your examples
Use the <code>str()</code> function to convert the object into a string.
This is fine.
The list of users can be found at <code>/etc/passwd</code>.
This is fine.
We need to wait for <code>200 OK</code> response before the next step.
You could consider using the samp element instead, which represents "sample or quoted output from another program or computing system".
Enter the <code>ls</code> command to obtain a directory listing.
You could consider using the kbd element instead, which represents "user input (typically keyboard input, […])".
Compile the source code in <code>foo.c</code> to <code>foo.o</code>.
This is fine.
There are no good or bad usages of the <code> tag.
To be more precise, the HTML spec (and browsers, for that matter) is not opinionated about the syntax of a <code> element's content. It does not check whether the content is valid code in any existing programming language.
Any phrasing content is valid from the HTML spec's point of view.
Any non-phrasing content is invalid.
Like <pre>, the <code> element lets browsers (through their default stylesheets) and users style its content differently, simply because it is a distinct element.
Often, when code snippets or function and method names are mentioned within other content, it is important (or at least desirable) that they are marked up and formatted differently from normal text.
That is the purpose for which the <code> tag was added to HTML.
This does not mean anything stops you from using it for any other purpose you see fit, as long as it suits that purpose, given that it may only contain phrasing content.

ReactJS - How to render carriage returns correctly when returned in Ajax call

In ReactJS, how can I render carriage returns that a user may have entered in a textarea control? The content containing the carriage returns is retrieved by an Ajax call to an API, which needs to convert the \r\n characters to <br> or something else. I then have a div element in which the content should be rendered. I tried the following Ajax responses:
{
"Comment" : "Some stuff followed by line breaks<br/><br/><br/><br/>And more stuff."
}
and
{
"Comment" : "Some stuff followed by line breaks\n\n\nAnd more stuff."
}
But instead of rendering the carriage returns, the browser renders the br tags as plain text in the first case and the \n characters as spaces in the second case.
What's the recommended approach here? I'm guessing I should steer clear of the scary dangerouslySetInnerHTML property? For example, the following actually works, but there must be a safer way of handling carriage returns:
<div className="comment-text" dangerouslySetInnerHTML={{__html: comment.Comment}}></div>
dangerouslySetInnerHTML is what you want. The name is meant to be scary, because using it presents a risk for XSS attacks, but essentially it's just a reminder that you need to sanitize user inputs (which you should do anyway!)
To see an XSS attack in action while using dangerouslySetInnerHTML, try having a user save a comment whose text is:
Just an innocent comment.... <script>alert("XSS!!!")</script>
You might be surprised to see that this comment will actually create the alert popup. An even more malicious user might insert JS to download a virus when anyone views their comment. We obviously can't allow that.
But protecting against XSS is pretty simple. Sanitization needs to be done server-side, and there are plenty of packages available that do exactly this for any conceivable server-side setup.
Here's a good package for Rails, for example: https://github.com/rgrove/sanitize
Just be sure whichever sanitizer you pick uses a "whitelist" sanitization method, not a "blacklist" one.
If you're working with the DOM directly, make sure you use innerHTML to add the text. In the React world, however, the more favourable option is to use https://www.npmjs.com/package/html-to-react
Also, the browser only understands HTML and won't interpret \n as a line break (whitespace in HTML content is collapsed). You should replace it with <br/> before rendering.
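Below is a minimal sketch of the server-side step. It assumes the API is written in Java and that the client renders the result with dangerouslySetInnerHTML. It escapes the comment first so any markup the user typed is inert, and only then converts line breaks to <br/>. The class and method names are hypothetical.

```java
// Hypothetical server-side helper: turns a user comment into HTML that is safe
// to pass to dangerouslySetInnerHTML, keeping line breaks.
final class CommentFormatter {

    private CommentFormatter() {}

    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                   .replace("\"", "&quot;").replace("'", "&#39;");
    }

    static String toHtml(String comment) {
        // 1. Escape first, so <script> and friends become inert text.
        String escaped = escape(comment);
        // 2. Normalize Windows (\r\n) and old Mac (\r) line endings to \n...
        String normalized = escaped.replace("\r\n", "\n").replace("\r", "\n");
        // 3. ...and only then introduce the one piece of markup we trust.
        return normalized.replace("\n", "<br/>");
    }

    public static void main(String[] args) {
        System.out.println(toHtml("Some stuff followed by line breaks\r\n\r\n\r\nAnd more stuff."));
        // prints Some stuff followed by line breaks<br/><br/><br/>And more stuff.
    }
}
```

The order matters. If you convert newlines first and escape afterwards, the <br/> tags are escaped into visible text, which is the first symptom described in the question.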

html markup in messages properties with placeholders - XSS potential

Given the message in a messages properties file:
message = Change relation <strong>{0}</strong> -> <strong>{1}</strong> to <strong>{2}</strong> -> <strong>{3}</strong>?
If the content of any of the placeholders is a user-influenced string, I need to HTML-escape the message to prevent potential XSS. I do that with the c:out tag in my JSP templates. I guess I could also use the htmlEscape attribute of the spring:message tag, but I think there's no difference.
However, by doing so I corrupt the markup in the message (<strong> etc.), which leads to the output:
Change relation <strong>Peter</strong> -> <strong>Car</strong> to <strong>Carl</strong> -> <strong>Bus</strong>?
I've already read the thread here on Stack Overflow, but it does not address XSS.
I am thinking about these options:
1) Simply replace all <strong> tags in the messages properties files with single quotes. Then there's no problem HTML-escaping the entire message, at the cost of slightly less highlighting of the specific parts of the message.
2) Split the message into parts, which allows separate markup in the (JSP) template. This feels like a lot of work just to get the markup right.
Am I missing something here? Which is the better option, or is there another option?
Edit: Without HTML escaping, the message renders the way I want it to:
Change relation Peter -> Car to Carl -> Bus?
So the HTML markup from the messages.properties file is rendered when displayed in the template.
With escaping, the message looks like the output above, showing the <strong> tags instead of rendering them.
Going under the assumption that you are getting the following output:
Change relation <strong>Peter</strong> -> <strong>Car</strong> to <strong>Carl</strong> -> <strong>Bus</strong>
It looks like you are escaping your entire HTML string rather than just the part that needs to be escaped.
You should escape each {#} value on its own and then place it into the HTML. The characters you generally need to escape are <, >, ', ", and &, but use an anti-XSS library and templating system if you can.
Once you've escaped all the potentially dangerous parts, you can use something like <c:out value="${msg}" escapeXml="false"/>. I don't know this language/framework well, but you need some way to output the actual HTML rather than the escaped version. Whichever way you prefer should be fine, as long as you properly escape the untrusted parts.
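Below is a minimal Java sketch of the per-placeholder approach. It assumes the pattern comes from a trusted messages.properties file and that only the arguments are user-influenced. It uses java.text.MessageFormat, which shares the {0}-style placeholder syntax of the properties file. The class SafeMessages and its constant are hypothetical.

```java
import java.text.MessageFormat;

// Hypothetical helper: the pattern is trusted (it comes from messages.properties),
// the arguments are not, so only the arguments are HTML-escaped.
final class SafeMessages {

    static final String RELATION_CHANGE =
            "Change relation <strong>{0}</strong> -> <strong>{1}</strong>"
            + " to <strong>{2}</strong> -> <strong>{3}</strong>?";

    private SafeMessages() {}

    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                   .replace("\"", "&quot;").replace("'", "&#39;");
    }

    static String format(String trustedPattern, Object... untrustedArgs) {
        Object[] escaped = new Object[untrustedArgs.length];
        for (int i = 0; i < untrustedArgs.length; i++) {
            // Escape each {n} value on its own; the <strong> tags in the pattern stay intact.
            escaped[i] = escape(String.valueOf(untrustedArgs[i]));
        }
        return MessageFormat.format(trustedPattern, escaped);
    }

    public static void main(String[] args) {
        System.out.println(format(RELATION_CHANGE, "Peter", "Car", "Carl", "Bus"));
        // prints Change relation <strong>Peter</strong> -> <strong>Car</strong> to <strong>Carl</strong> -> <strong>Bus</strong>?
    }
}
```

The resulting string can be output unescaped (for example with escapeXml="false"), because the only live markup left is what the trusted pattern contains. One MessageFormat quirk to keep in mind: a literal apostrophe in the pattern must be written as two apostrophes ('').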

Why do I need XSS library while I can use Html-encode?

I'm trying to understand why I need an XSS library when I can simply HtmlEncode data when sending it from the server to the client.
For example, here on Stackoverflow.com, all the SO team needs to do for the editor is save the user input and display it HTML-encoded.
That way, no HTML tag will ever be executed.
I'm probably wrong here, but can you please contradict my statement or explain?
For example:
I know that the IMG tag can have onmouseover or onload attributes through which a user can run malicious scripts. But the IMG won't even be treated as an IMG by the browser, since it's &lt;img&gt; and not <img>.
So - where is the problem ?
HTML-encoding is itself one feature an “XSS library” might provide. This can be useful when the platform doesn't have a native HTML encoder (e.g. scriptlet-based JSP). It can also help when the native HTML encoder is inadequate, for example when it doesn't escape quotes for use in attributes, or ]]> if you're using XHTML, or #{} if you're worried about cross-origin stylesheet inclusion attacks.
There might also be other encoders for other situations that the platform/templating language does not provide directly. Examples include injecting into JavaScript strings in a <script> block, or into URL parameters in an href attribute.
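Below is a sketch of the "URL parameter in an href attribute" case, using a hypothetical /search?q=...&page=1 link. The value needs URL encoding for the query string, and the finished URL then needs HTML escaping for the attribute. That is two contexts and two encoders. The class HrefBuilder is hypothetical, and URLEncoder.encode(String, Charset) requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical example: a search link whose query value comes from the user.
final class HrefBuilder {

    private HrefBuilder() {}

    static String escapeHtml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                   .replace("\"", "&quot;").replace("'", "&#39;");
    }

    static String searchLink(String query) {
        // Context 1: the query-string value. URL-encode it (form encoding: space -> '+').
        String url = "/search?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8) + "&page=1";
        // Context 2: the href attribute. HTML-escape the finished URL ('&' -> "&amp;").
        return "<a href=\"" + escapeHtml(url) + "\">results</a>";
    }

    public static void main(String[] args) {
        System.out.println(searchLink("a&b c"));
        // prints <a href="/search?q=a%26b+c&amp;page=1">results</a>
    }
}
```

Encoding a parameter value does not help if the user supplies the whole URL. A value such as javascript:... would still need its scheme checked against a whitelist such as http and https.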
Another useful feature an XSS library could provide might be HTML sanitisation, for when you want to allow the user to input data in HTML format, but restrict which tags and attributes they use to a safe whitelist.
Another less-useful feature an XSS library could provide might be automated scanning and filtering of input for HTML-special characters. Maybe this is the kind of feature you are objecting to? Certainly trying to handle HTML-injection (an output stage issue) at the input stage is a misguided approach that security tools should not be encouraging.
HTML encoding is only one aspect of making your output safe against XSS.
For example, if you output a string to JavaScript using this code:
<script>
var enteredName = '<%=EnteredNameVariableFromServer %>';
</script>
For proper insertion into JavaScript, you will want to JavaScript-encode the variable using hex escapes such as \x27, not HTML-encode it. Suppose the value of EnteredNameVariableFromServer is O'leary; when properly encoded, the rendered code becomes:
<script>
var enteredName = 'O\x27leary';
</script>
In this case, encoding prevents the ' character from breaking out of the string and into the JavaScript code context. It also ensures the variable is treated properly: HTML-encoding it would result in the literal value O&#39;leary being used in JavaScript, affecting processing and display of the value.
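Below is a minimal sketch of such a JavaScript string encoder, in the spirit of the encoders XSS libraries offer. The class JsStringEncoder is hypothetical and not any particular library's API. Letters, digits and spaces pass through unchanged; every other character becomes a \xHH escape, or a \uHHHH escape above U+00FF.

```java
// Hypothetical encoder for values placed inside a quoted JavaScript string literal,
// e.g. var enteredName = '...'; (not a specific library's API).
final class JsStringEncoder {

    private JsStringEncoder() {}

    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == ' ';
            if (safe) {
                out.append(c);
            } else if (c < 0x100) {
                // Quotes, backslashes, '<', '/', control characters, Latin-1 letters...
                out.append(String.format("\\x%02X", (int) c));
            } else {
                // Everything else, including each half of a surrogate pair.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("var enteredName = '" + encode("O'leary") + "';");
        // prints var enteredName = 'O\x27leary';
    }
}
```

Escaping < and / also matters inside a <script> block. The HTML parser ends the block at </script> regardless of JavaScript quoting.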
Side note:
Also, that's not quite true of Stack Overflow. Certain characters still have special meanings, as in the <!-- language: lang-none --> tag. See this post on syntax highlighting if you're interested.

Limiting HTML Input into Text Box

How do I limit the types of HTML that a user can input into a textbox? I'm running a small forum using some custom software that I'm beta testing, but I need to know how to limit the HTML input. Any suggestions?
I'd suggest a slightly different approach:
Don't filter incoming user data (beyond preventing SQL injection). User data should be kept as pure as possible.
Filter all outgoing data from the database; this is where things like tag stripping should happen.
Keeping user data clean gives you more flexibility in how it's displayed. Filtering all outgoing data is a good habit to get into (in line with the "never trust data" meme).
You didn't state what the forum was built with, but if it's PHP, check out:
http://htmlpurifier.org/
Library Features: Whitelist, Removal, Well-formed, Nesting, Attributes, XSS safe, Standards safe
Once the text is submitted, you could use a regex in PHP to strip any tags that don't match your predefined set.
It would look something like the following:
find open tag (<)
if contents != allowed tag, remove tag (from <..>)
Parse the input and strip out all HTML tags that don't exactly match the list you are allowing. You can do this with a complex regex, or with a stateful iteration through the char[] of the input string. The iteration builds up the allowed output and strips unwanted attributes from tags like img.
Use a different markup system (BBCode, Markdown).
Find some code online that already does this, to use as a basis for your implementation. For example, Slashcode must do this, so look for its implementation in Perl and use the regexes (which I assume are there).
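Below is a rough Java sketch of the "strip anything not on the list" approach, using a regex to find tag-like sequences. The allowed set and the class name TagStripper are assumptions. Allowed tags are re-emitted without their attributes, and stray <, > and & in text are escaped rather than trusted. Regex-based stripping is easy to get subtly wrong, so a maintained library such as HTML Purifier remains the safer choice in production.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based tag stripper: keeps a small set of tags (minus their
// attributes), drops every other tag or comment, and escapes stray <, > and &.
final class TagStripper {

    private static final Set<String> ALLOWED = Set.of("b", "i", "em", "strong", "p");

    // Anything that looks like a tag or comment: '<', no nested '<' or '>', then '>'.
    private static final Pattern TAG_LIKE = Pattern.compile("<[^<>]*>");

    // An opening or closing tag: optional '/', a name, optional attributes.
    private static final Pattern TAG_PARTS =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)(\\s[^>]*)?>");

    private TagStripper() {}

    static String strip(String input) {
        StringBuilder out = new StringBuilder();
        Matcher tag = TAG_LIKE.matcher(input);
        int textStart = 0;
        while (tag.find()) {
            out.append(escapeText(input.substring(textStart, tag.start())));
            Matcher parts = TAG_PARTS.matcher(tag.group());
            if (parts.matches()) {
                String name = parts.group(2).toLowerCase(Locale.ROOT);
                if (ALLOWED.contains(name)) {
                    // Re-emit the bare tag; attributes (style, onclick, href...) are discarded.
                    out.append('<').append(parts.group(1)).append(name).append('>');
                }
            }
            // Disallowed tags, comments and malformed "<...>" sequences are simply dropped.
            textStart = tag.end();
        }
        out.append(escapeText(input.substring(textStart)));
        return out.toString();
    }

    // Text between tags is never trusted: "a < b" becomes "a &lt; b".
    private static String escapeText(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Test paragraph.</p><!-- Comment --> Other text"));
        // prints <p>Test paragraph.</p> Other text
    }
}
```

The content of a dropped element, such as the alert(1) inside a removed <script>, survives as harmless visible text. Whether that is acceptable is a design choice for your forum.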
Whatever you use, make sure you know what kinds of HTML content can be dangerous.
For example, a <script> tag is pretty obvious, but a <style> tag is just as bad in IE, because it can invoke JScript commands.
In fact, any style="..." attribute can invoke script in IE.
<object> is another tag to be wary of.
PHP comes with a simple function, strip_tags(), that strips HTML tags. It lets you specify certain tags that should not be stripped. However, it does not remove attributes from those allowed tags, so a style or onmouseover attribute on an allowed tag survives.
Example #1 strip_tags() example
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
The above example will output:
Test paragraph. Other text
<p>Test paragraph.</p> Other text
Personally, for a forum I would use BBCode or Markdown, because of the amount of support and the features available, such as live preview.
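Below is a minimal sketch of why BBCode is easier to make safe: escape all input, then translate a fixed set of bracket tags into HTML. Only [b] and [i] are handled here. The class MiniBBCode is hypothetical and not modeled on any particular BBCode library.

```java
// Hypothetical, deliberately tiny BBCode converter: escape everything first,
// then map a fixed set of bracket tags to HTML.
final class MiniBBCode {

    private MiniBBCode() {}

    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                   .replace("\"", "&quot;").replace("'", "&#39;");
    }

    static String toHtml(String input) {
        // Raw HTML typed by the user becomes inert text; '[' and ']' are untouched.
        String escaped = escape(input);
        // [b] [/b] [i] [/i] -> <b> </b> <i> </i>; unknown tags such as [u] stay as text.
        return escaped.replaceAll("\\[(/?)(b|i)\\]", "<$1$2>");
    }

    public static void main(String[] args) {
        System.out.println(toHtml("[b]bold[/b] and <script>x</script>"));
        // prints <b>bold</b> and &lt;script&gt;x&lt;/script&gt;
    }
}
```

Tags that carry a URL, such as [url] or [img], are where BBCode gets harder, because the URL itself must be validated as well.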