Why do I need to escape strings in HTML?


Well, I know that "correct" escaping will help to prevent SQL injection.
But I have seen people escaping values in HTML:
<input type="text" value =/"some/" /> <!-- some escaped, why? -->
My question is:
Why escape in HTML?

<input type="text" value =/"some/" /> <!-- some escaped, why? -->
That is a syntax error. Don't do that.
Use character references to represent special characters (&, <, etc.).
Why escape in HTML?
(Assuming you use the correct syntax to do so): because some characters have special meaning in HTML. For example, you don't want a " (in the data) ending your attribute value prematurely since that can:
Lose data
Lose data but have it display in the page
Allow third parties to inject their JavaScript into your pages and steal data / redirect people to phishing sites / etc
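To make the first point concrete, here is a minimal Python sketch (`render_input` is a hypothetical helper, not part of any library). It builds a double-quoted value attribute by plain string formatting: without escaping, the first `"` in the data ends the attribute early and the rest is parsed as markup, while the standard library's `html.escape` turns it into `&quot;`.

```python
import html

def render_input(value: str, escape: bool) -> str:
    # Hypothetical helper: builds a text input with a double-quoted value.
    if escape:
        # quote=True (the default) also converts " to &quot; and ' to &#x27;
        value = html.escape(value, quote=True)
    return f'<input type="text" value="{value}" />'

data = 'say "hi" & bye'
print(render_input(data, escape=False))
# <input type="text" value="say "hi" & bye" />   (the value ends after "say ")
print(render_input(data, escape=True))
# <input type="text" value="say &quot;hi&quot; &amp; bye" />
```

`html.escape` also converts `&`, `<` and `>`, so the same call is usable for text placed in the page body.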

Related

How to ignore apostrophes in html tag?

I'm sure someone will mark this as a duplicate question but no other answers worked for me.
I am using Ruby and passing a variable into my HTML page. Let's say my variable "camp_name" is equal to "abc'd"
<%=camp_name%>
This outputs "abc'd" which is what I want.
<input type="text" class="form-control" name="campaign_name" required value='<%=camp_name%>'>
The value in the field is now "abc" because of the single apostrophe. How do I get it to ignore apostrophes? Thanks.

You can escape the variable to HTML entities:
camp_name.gsub("'", "&apos;")
You should do that for other characters as well because, as mentioned in a comment, with your current script a user could simply insert an HTML tag into your page. Probably the most important ones are the following:
camp_name.gsub("<", "&lt;")
camp_name.gsub(">", "&gt;")
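If you do write the replacements yourself, the order matters: `&` has to be replaced first, otherwise the `&` inside entities produced by the other replacements gets escaped a second time. A Python sketch of the same idea (the function name is illustrative), which also covers `&` and the double quote:

```python
def escape_manually(s: str) -> str:
    # & must go first; otherwise the & inside the entities added below
    # would be escaped again (e.g. &lt; would become &amp;lt;).
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    s = s.replace('"', "&quot;")
    s = s.replace("'", "&#39;")
    return s

print(escape_manually("abc'd"))               # abc&#39;d
print(escape_manually("<b>Tom & Jerry</b>"))  # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
```

`&#39;` is used for the apostrophe because `&apos;` is not defined in HTML 4, although it is valid in HTML5 and XML. In practice a library helper such as Python's `html.escape`, or the Rack helper described in the next answer, is less error-prone.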

If you're using Rack (which would definitely be in use if you're using Rails or Sinatra, and it might be there even if you're not), there is a built-in helper for escaping HTML for just this kind of thing. Calling Rack::Utils.escape_html will replace ampersands, angle brackets, and quotes with HTML character references (e.g. a numeric reference such as &#x27; or &#39; instead of ', depending on the Rack version).
In your case, you'd want the following code:
<input type="text" class="form-control" name="campaign_name" required value='<%= Rack::Utils.escape_html(camp_name) %>'>
This would evaluate to something like:
<input type="text" class="form-control" name="campaign_name" required value='abc&#x27;d'>
which is a proper way of writing an apostrophe in HTML.
Just as a side note, displaying user-submitted text on a website without escaping it is a very bad idea, because malicious users can add arbitrary JavaScript that could render your site useless, add advertisements, and more. You should definitely get into the habit of escaping any text that users can submit before displaying it, either by calling gsub manually or by using a helper method like this.

XSS without HTML tags

Is it possible to do an XSS attack if my input does not allow < and > characters?
Example: I enter <script>alert('this');</script> text
But if I delete < and >, the script is no longer a script, just text:
I enter script alert('this'); script text

Yes, it could still be possible.
e.g. Say your site injects user input into the following location
<img src="http://example.com/img.jpg" alt="USER-INPUT" />
If USER-INPUT is " ONLOAD="alert('xss'), this will render
<img src="http://example.com/img.jpg" alt="" ONLOAD="alert('xss')" />
No angle brackets necessary.
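The attribute example can be reproduced in a few lines of Python, assuming the page is built by string formatting: removing `<` and `>` leaves this payload untouched because it contains neither, whereas escaping the quotes with `html.escape` keeps the whole input inside the alt value.

```python
import html

user_input = "\" ONLOAD=\"alert('xss')"

# Stripping angle brackets does nothing here: the payload contains none.
stripped = user_input.replace("<", "").replace(">", "")
print(f'<img src="http://example.com/img.jpg" alt="{stripped}" />')
# <img src="http://example.com/img.jpg" alt="" ONLOAD="alert('xss')" />

# Escaping the quotes keeps the entire payload inside the alt value.
escaped = html.escape(user_input, quote=True)
print(f'<img src="http://example.com/img.jpg" alt="{escaped}" />')
# <img src="http://example.com/img.jpg" alt="&quot; ONLOAD=&quot;alert(&#x27;xss&#x27;)" />
```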
Also, check out OWASP XSS Experimental Minimal Encoding Rules.
For HTML body:
HTML Entity encode < &
specify the charset in a meta tag to avoid UTF-7 XSS
For XHTML body:
HTML Entity encode < & >
limit input to charset http://www.w3.org/TR/2008/REC-xml-20081126/#charsets
So within the body you can get away with only encoding (or removing) a subset of the characters usually recommended to prevent XSS. However, you cannot do this within attributes - the full XSS (Cross Site Scripting) Prevention Cheat Sheet recommends the following, and they do not have a minimal alternative:
Except for alphanumeric characters, escape all characters with the HTML Entity &#xHH; format, including spaces. (HH = Hex Value)
This is mainly to cover the three ways of specifying an attribute value:
Unquoted
Single quoted
Double quoted
Encoding in such a way will prevent XSS in attribute values in all three cases.
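A minimal Python sketch of the strict attribute rule quoted above (`encode_attr_strict` is a hypothetical name). One assumption differs from the rule as worded: instead of limiting encoding to characters below 256, it encodes every character that is not an ASCII letter or digit; characters above 0xFF simply get more hex digits (e.g. `&#x20AC;` for €), which is still a valid character reference.

```python
def encode_attr_strict(value: str) -> str:
    # Keep ASCII letters and digits; turn every other character,
    # including spaces, into a hex character reference &#xHH;.
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"&#x{ord(ch):X};")
    return "".join(out)

print(encode_attr_strict("x onmouseover=alert(1)"))
# x&#x20;onmouseover&#x3D;alert&#x28;1&#x29;
```

Because nothing but letters and digits survives, the result is safe in unquoted, single-quoted and double-quoted attribute values alike.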
Also be wary that UTF-7 attacks do not need angle bracket characters. However, unless the charset is explicitly set to UTF-7, this type of attack isn't possible in modern browsers.
+ADw-script+AD4-alert(document.location)+ADw-/script+AD4-
Also beware of attributes that accept URLs, like href, and ensure any user input is a valid web URL. Validating the URL with a reputable library using an allow-list approach (e.g. if the protocol is not HTTPS, reject it) is highly recommended. Attempting to block sequences like javascript: is not sufficient.
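The allow-list idea can be sketched with Python's `urllib.parse.urlsplit`. This is only an illustration, assuming HTTPS links with a host are the only acceptable input; it is not a substitute for a reputable validation library, and the URL still has to be HTML-encoded when it is written into the href attribute.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"https"}  # assumption: only HTTPS links are accepted

def is_allowed_url(url: str) -> bool:
    try:
        parts = urlsplit(url.strip())
    except ValueError:  # e.g. a malformed IPv6 host such as "https://["
        return False
    # Accept only an allow-listed scheme with a host; javascript:, data:,
    # relative and scheme-less URLs all fail this check.
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.netloc)

print(is_allowed_url("https://example.com/page"))  # True
print(is_allowed_url("javascript:alert(1)"))       # False
```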

If the user-supplied input is printed inside an HTML attribute, you also need to escape quotation marks or you would be vulnerable to inputs like this:
" onload="javascript-code" foobar="
You should also escape the ampersand character as it generally needs to be encoded inside HTML documents and might otherwise destroy your layout.
So you should take care of the following characters: < > & ' "
You should not, however, simply strip them; replace them with the correct HTML codes, i.e. &lt; &gt; &amp; &quot; &#39;

Is it safe to display user input as input values without sanitization?

Say we have a form where the user types in various info. We validate the info, and find that something is wrong. A field is missing, invalid email, et cetera.
When displaying the form to the user again I of course don't want him to have to type in everything again so I want to populate the input fields. Is it safe to do this without sanitization? If not, what is the minimum sanitization that should be done first?
To clarify: the data would of course be sanitized before, for example, being added to a database or displayed elsewhere on the site.

No, it isn't. The user might be directed to the form from a third-party site, or simply enter data (innocently) that would break the HTML.
Convert any character with special meaning to its HTML entity.
i.e. & to &amp;, < to &lt;, > to &gt; and " to &quot; (assuming you delimit your attribute values using " and not ').
In Perl use HTML::Entities, in TT use the html filter, in PHP use htmlspecialchars. Otherwise look for something similar in the language you are using.

It is not safe, because, if someone can force the user to submit specific data to your form, you will output it and it will be "executed" by the browser. For instance, if the user is forced to submit '/><meta http-equiv="refresh" content="0;http://verybadsite.org" />, as a result an unwanted redirection will occur.

You cannot insert user-provided data into an HTML document without encoding it first. Your goal is to ensure that the structure of the document cannot be changed and that the data is always treated as data-values and never as HTML markup or Javascript code. Attacks against this mechanism are commonly known as "cross-site scripting", or simply "XSS".
If inserting into an HTML attribute value, then you must ensure that the string cannot cause the attribute value to end prematurely. You must also, of course, ensure that the tag itself cannot be ended. You can achieve this by HTML-encoding any characters that are not guaranteed to be safe.
If you write HTML so that the value of the tag's attribute appears inside a pair of double-quote or single-quote characters, then you only need to ensure that you HTML-encode the quote character you chose to use. If you are not quoting your attributes as described above, then you need to worry about many more characters, including whitespace, symbols, punctuation and other ASCII control characters. Although, to be honest, it's arguably safest to encode these non-alphanumeric characters anyway.
Remember that an HTML attribute value may appear in 3 different syntactical contexts:
Double-quoted attribute value
<input type="text" value="**insert-here**" />
You only need to encode the double quote character to a suitable HTML-safe value such as &quot;
Single-quoted attribute value
<input type='text' value='**insert-here**' />
You only need to encode the single quote character to a suitable HTML-safe value such as &#39;
Unquoted attribute value
<input type='text' value=**insert-here** />
You shouldn't ever have an html tag attribute value without quotes, but sometimes this is out of your control. In this case, we really need to worry about whitespace, punctuation and other control characters, as these will break us out of the attribute value.
Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and | (and more). [para lifted from OWASP]
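A sketch of the minimal encoding for the two quoted contexts, in Python (`encode_quoted_attr` is a hypothetical helper). Besides the delimiting quote it also encodes `&`, a choice made here so that data such as `&amp;` is not read as a character reference; the unquoted context needs the stricter rule instead.

```python
def encode_quoted_attr(value: str, quote: str = '"') -> str:
    # Encode & and whichever quote character delimits the attribute value.
    if quote not in ('"', "'"):
        raise ValueError("quote must be ' or \"")
    value = value.replace("&", "&amp;")
    if quote == '"':
        return value.replace('"', "&quot;")
    return value.replace("'", "&#39;")

print('<input type="text" value="' + encode_quoted_attr('5" screen') + '" />')
# <input type="text" value="5&quot; screen" />
print("<input type='text' value='" + encode_quoted_attr("it's", "'") + "' />")
# <input type='text' value='it&#39;s' />
```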
Please remember that the above rules only apply to control injection when inserting into an HTML attribute value. Within other areas of the page, other rules apply.
Please see the XSS prevention cheat sheet at OWASP for more information

Yes, it's safe, provided of course that you encode the value properly.
A value that is placed inside an attribute in an HTML document needs to be HTML-encoded. The server-side platform that you are using should have methods for this. In ASP.NET, for example, there is a Server.HtmlEncode method, and the TextBox control will automatically HTML-encode the value that you put in its Text property.

Post newline/carriage return as hidden field value

I need to post multi-line data via a hidden field. The data will be viewed in a textarea after post. How can I post a newline/carriage return in the html form?
I've tried \r\n but that just posts the actual "\r\n" data
<input type="hidden" name="multiline_data" value="line one\r\nline two" />
Is there a way to do this?

Instead of using
<input type="hidden">
Try using
<textarea style="visibility:hidden;position:absolute;">

While new lines (Carriage Return & Line Feed) are technically allowed in <input>'s hidden state, they should be escaped for compatibility with older browsers. You can do this by replacing all Carriage Returns (\u000D or \r) and all Line Feeds (\u000A or \n) with proprietary strings that are recognized by your application to be a Carriage Return or New Line (and also escaped, if present in the original string).
Simple character entities don't work here, because non-conforming browsers may recognize &#13; and &#10; as newlines and strip them from the value.
Example
For example, in PHP, if you were to echo the passed value to a textarea, you would include the newlines (and unescaped string).
<textarea>Some text with a \ included
and a new line with \r\n as submitted value</textarea>
However, in PHP, if you were to echo the value to the value attribute of an <input> tag, you would escape the new lines with your proprietary strings (e.g. \r and \n), and escape any instances of your proprietary strings in the submitted value.
<input type="hidden" value="Some text with a \\ included\r\nand a new line\\r\\n as submitted value">
Then, before using the value elsewhere (inserting into a database, emailing, etc), be sure to unescape the submitted value, if necessary.
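A Python sketch of the proprietary-string approach, assuming the same backslash scheme as the example above (a backslash becomes `\\`, CR becomes `\r`, LF becomes `\n`). The escape character is handled first so that existing backslashes survive the round trip; the result must still be HTML-encoded before it is written into the value attribute.

```python
import re

def escape_newlines(s: str) -> str:
    # Double existing backslashes first, then replace real CR/LF characters
    # with the two-character codes \r and \n.
    return s.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n")

_CODES = {"\\\\": "\\", "\\r": "\r", "\\n": "\n"}

def unescape_newlines(s: str) -> str:
    # A single left-to-right scan, so an escaped backslash followed by "r"
    # is not mistaken for an escaped carriage return.
    return re.sub(r"\\[\\rn]", lambda m: _CODES[m.group(0)], s)

sample = "Some text with a \\ included\r\nand a new line\\r\\n as submitted value"
print(escape_newlines(sample))
# Some text with a \\ included\r\nand a new line\\r\\n as submitted value
```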
Reassurance
As further reassurance, I asked the WHATWG, and Ian Hickson, the current editor of the HTML spec, replied:
bfrohs Question about <input type=hidden> -- Are Line Feeds and Carriage Returns allowed in the value? They are specifically disallowed in Text state and Search state, but no mention is made for Hidden state. And, if not, is there an acceptable HTML solution for storing form data from a textarea?
Hixie yes, they are allowed // iirc // for legacy reasons you may wish to escape them though as some browsers normalise them away // i forget if we fixed that or not // in the spec
Source

It depends on the character set really, but &#10; should be a line feed and &#13; should be a carriage return. You should be able to use those in the value attribute.
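In Python, for example, the value could be prepared like this: `html.escape` handles the usual special characters, and the newlines are then written as the numeric references mentioned above. Keep in mind the caveat from the previous answer that some non-conforming browsers may strip them; `html.unescape` is used in the check only to confirm the round trip, not to model any particular browser.

```python
import html

def hidden_value(s: str) -> str:
    # Escape &, <, >, " and ', then write CR and LF as numeric character
    # references instead of literal newline characters.
    escaped = html.escape(s, quote=True)
    return escaped.replace("\r", "&#13;").replace("\n", "&#10;")

value = hidden_value("line one\r\nline two")
print(f'<input type="hidden" name="multiline_data" value="{value}" />')
# <input type="hidden" name="multiline_data" value="line one&#13;&#10;line two" />
```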

You don't say what this is for or what technology you're using, but you need to be aware that you can't trust the hidden field to remain value="line one&#10;line two", because a hostile user can tamper with it before it gets sent back in the POST. Since you're putting the value in a <textarea> later, you will definitely be subject to, for example, cross-site scripting attacks unless you verify and/or sanitize your "multiline_data" field contents before you write them back out.
When writing a value into a hidden field and reading it back, it's usually better to just keep it on the server, as an attribute of the session, or pageflow, or whatever your environment provides to do this kind of thing.
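A minimal Python sketch of keeping the data on the server and giving the page only an opaque token, assuming a simple in-memory dictionary (`stash`, `retrieve` and `_pending` are hypothetical names; a real application would use its framework's session or flow storage, which also handles expiry and multiple server processes).

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory store keyed by an unguessable token.
_pending: Dict[str, str] = {}

def stash(data: str) -> str:
    # Keep the data server-side; only the token goes into the page.
    token = secrets.token_urlsafe(16)
    _pending[token] = data
    return token

def retrieve(token: str) -> Optional[str]:
    # Returns None for unknown or already-used tokens.
    return _pending.pop(token, None)

token = stash("line one\nline two")
print(f'<input type="hidden" name="multiline_token" value="{token}" />')
```

Since the token only contains URL-safe characters, it needs no escaping in the attribute, and tampering with it can at worst make the lookup fail.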

escaping html inside comment tags

Escaping HTML is fine: it will replace <'s and >'s, etc.
I've run into a problem where I am outputting a filename inside a comment tag, e.g. <!-- ${filename} -->
Of course things can go bad if you don't escape, so it becomes:
<!-- <c:out value="${filename}"/> -->
The problem is that if the file has "--" in the name, all the HTML gets screwed up, since you're not allowed to have <!-- -- -->.
The standard HTML escape doesn't escape these dashes, and I was wondering if anyone is familiar with a simple/standard way to escape them.

Definition of a HTML comment:
A comment declaration starts with <!, followed by zero or more comments, followed by >. A comment starts and ends with "--", and does not contain any occurrence of "--".
Of course the parsing of a comment is up to the browser.
Nothing strikes me as an obvious solution here, so I'd suggest you str_replace those double dashes out.

There is no good way to solve this. You can't just escape them because comments are read in plaintext. You will have to do something like put a space between the hyphens, or use some sort of code for hyphens (like [HYPHEN]).

Since you obviously cannot directly display the '--'s, you can either encode them or use the JSTL fn:escapeXml or fn:replace functions for appropriate replacements.
JSTL documentation

There's no universal way to escape these characters in HTML; it depends on the browser. In Firefox it only works if the - characters come in multiples of four: -- won't work but ---- will. Internet Explorer 8, for example, has no problem and escapes these characters properly, and the same goes for Google's Chrome. Firefox, however, even the latest version (3.0.4), doesn't handle escaping of these characters well.

You shouldn't be trying to HTML-escape: the contents of comments are not escapable, and it's fine to have a bare ‘>’ or ‘&’ inside.
‘--’ is its own, unrelated problem and is not really fixable. If you don't need to recover the exact string, just do a replacement to get rid of them (eg. replace with ‘__’).
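If the exact string does not need to be recovered, the replacement can be done with a regular expression. This Python sketch (hypothetical helper name) turns every run of two or more hyphens into the same number of underscores, so the comment text can never contain "--". Like any replacement it is lossy, and it assumes the template keeps a space on either side of the text, as in `<!-- ${filename} -->`.

```python
import re

def comment_safe(text: str) -> str:
    # Replace each run of 2+ hyphens with an equal number of underscores.
    return re.sub(r"-{2,}", lambda m: "_" * len(m.group(0)), text)

filename = "report--final---v2.txt"
print(f"<!-- {comment_safe(filename)} -->")
# <!-- report__final___v2.txt -->
```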
If you do need to get a string through completely unmolested to a JavaScript that will be reading the contents of the comment, use a string literal:
<!-- 'my-string' -->
which the script can then read using eval(commentnode.data). (Yes, a valid use for eval() at last!)
Then your escaping problem becomes how to put things in JS string literals, which is fairly easily solvable by escaping the ‘'’ and ‘-’ characters:
<!-- 'Bob\x27s\x2D\x2Dstring' -->
(You should probably also escape ‘<’, ‘&’ and ‘"’, in case you ever want to use the same escaping scheme to put a JS string literal inside a <script> block or inline handler.)
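Producing such a literal can be sketched in Python as follows (`js_literal_for_comment` is a hypothetical helper). It writes the apostrophe, hyphen, backslash, the other characters suggested above, and ASCII control characters as `\xHH` escapes. It is not a complete JavaScript string escaper; for example, it does not deal with U+2028/U+2029.

```python
_JS_UNSAFE = set("\\'\"-<>&")

def js_literal_for_comment(s: str) -> str:
    # Wrap s in single quotes, writing risky characters as \xHH so the
    # literal contains no ', no "--" and no markup-significant characters.
    out = []
    for ch in s:
        if ch in _JS_UNSAFE or ord(ch) < 0x20:
            out.append(f"\\x{ord(ch):02X}")
        else:
            out.append(ch)
    return "'" + "".join(out) + "'"

s = "Bob's--string"
print("<!-- " + js_literal_for_comment(s) + " -->")
# <!-- 'Bob\x27s\x2D\x2Dstring' -->
```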

