HTML form input with special chars - html

Whenever I try to submit text containing special chars like & via an HTML form in a <textarea>, some chars are lost (tested in Chrome).
So far, I could not find a form attribute to change this behaviour.
How can I force the form to submit the input without this loss?

The children of a textarea are bog-standard text nodes. The element doesn't perform any automatic CDATA magic (like a script does).
If you have <textarea>&amp;</textarea> then that means "A textarea element with a default value of 'an ampersand'".
If you want "&amp;" to be the submitted data, then you have to represent the & with a character reference, just like (almost) anywhere else in HTML: <textarea>&amp;amp;</textarea>
OTOH, if you are typing &amp; and the amp; part is being lost, then it is probably because you are taking the value of that form control and treating it as HTML when you want to treat it as text. How you treat it as text instead of HTML depends on what you are using to process the data.
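As a rough illustration of the character-reference conversion described in this answer, here is a minimal JavaScript sketch. The helper name escapeHtml is made up for this example and is not part of any library; a real application would normally use the escaping function its server-side platform provides.

```javascript
// Hypothetical helper: turn characters with special meaning in HTML
// into character references so they come out as literal text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // ampersand first, so the references added next are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Markup for a textarea whose default value is the five characters "&amp;".
const markup = "<textarea>" + escapeHtml("&amp;") + "</textarea>";
console.log(markup); // <textarea>&amp;amp;</textarea>
```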

Related

How to stop an html TEXTAREA from decoding html entities

I have a strange problem:
In the database, I have a literal ampersand lt semicolon:
&lt;div
Whenever it is printed into an HTML textarea tag, the page shows the &lt; as <.
How do I stop this decoding?
You can't stop entities being decoded in a textarea since the content of a textarea is not (unlike a script or style element) intrinsic CDATA, even though error recovery may sometimes give the impression that it is.
The definition of the textarea element is:
<!ELEMENT TEXTAREA - - (#PCDATA) -- multi-line text field -->
i.e. it contains PCDATA which is described as:
Document text (indicated by the SGML construct "#PCDATA"). Text may contain character references. Recall that these begin with & and end with a semicolon (e.g., Herg&eacute;'s adventures of Tintin contains the character entity reference for the e acute character).
This means that when you type (the invalid HTML of) "start of tag" (<) the browser corrects it to "less than sign" (&lt;) but when you type "start of entity" (&), which is allowed, no error correction takes place.
You need to write what you mean. If you want to include some HTML as data then you must convert any character with special meaning to its respective character reference.
If the data is:
&lt;div
Then the HTML must be:
<textarea>&amp;lt;div</textarea>
You can use the standard functions for converting this (e.g. PHP's htmlspecialchars or Perl's HTML::Entities module).
NB 1: If you were using XHTML[2] (and really using it, it doesn't count if you serve it as text/html) then you could use an explicit CDATA block:
<textarea><![CDATA[&lt;div]]></textarea>
NB 2: Or if browsers implemented HTML 4 correctly
OK, but the question is: why does it decode them at all? Suppose I enter &lt; and save the textarea. It is saved as &lt; but displayed as <. Saving it again converts it back to < (although it is still &lt; in the database at that point), and saving once more stores it as < in the database. Why does the textarea decode it?
The server sends (to the browser) data encoded as HTML.
The browser sends (to the server) data encoded as application/x-www-form-urlencoded (or multipart/form-data).
Since the browser is not sending the data as HTML, the characters are not represented as HTML entities.
If you take the data received from the client and then put it into an HTML document, then you must encode it as HTML first.
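In PHP, this can be done using htmlentities(). Example below.
<?php
// The stored text contains a literal entity reference, not the symbol itself.
$content = "This string contains the TM symbol: &trade;";
// htmlentities() encodes the & as &amp;, so the textarea shows "&trade;" as typed.
print "<textarea>" . htmlentities($content) . "</textarea>";
?>
Without htmlentities(), the textarea would interpret and display the TM symbol (™) instead of "&trade;".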
http://php.net/manual/en/function.htmlentities.php
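To make the point about the two encodings concrete, here is a small JavaScript sketch of what an application/x-www-form-urlencoded request body looks like. It uses the standard URLSearchParams class. The field name comment and the sample text are assumptions made up for this example.

```javascript
// Text as the user sees it in the textarea (already decoded by the browser).
const typed = "<div> & &lt;";

// Roughly what the browser puts in the request body for a field named
// "comment" (hypothetical name). Note: percent-encoding, no HTML entities.
const body = new URLSearchParams({ comment: typed }).toString();
console.log(body); // comment=%3Cdiv%3E+%26+%26lt%3B

// Decoding the body on the receiving side gives back exactly what was typed,
// which is why it must be HTML-encoded again before going into a page.
const received = new URLSearchParams(body).get("comment");
console.log(received === typed); // true
```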
You have to be sure that this is rendered to the browser:
<textarea name="somename">&amp;lt;div</textarea>
Essentially, this means that the & in &lt; has to be HTML-encoded to &amp;. How to do it will depend on the technologies you're using.
UPDATE: Think about it like this. If you want to display <div> inside a textarea, you'll have to encode < and >, because otherwise <div> would be a normal HTML element to the browser:
<textarea name="somename">&lt;div&gt;</textarea>
Having said this, if you want to display &lt;div&gt; inside a textarea, you'll have to encode the & again, because the browser decodes HTML entities when rendering HTML. It has nothing to do with your database.
You can serve your DB content from a separate page and then place it in the textarea using a JavaScript (jQuery) Ajax call:
var request = $.ajax({
    type: "GET",
    url: "url-with-the-troubled-content.php",
    dataType: "text", // treat the response as plain text
    success: function (data) {
        // Assigning to .value sets text directly, so no entity decoding happens.
        document.getElementById('id-of-text-area').value = data;
    }
});
Explained at
http://www.endtask.net/how-to-prevent-a-textarea-element-from-decoding-html-entities/
I had the same problem and I just made two replacements on the text from the database before letting it into the textarea:
myString = Replace(myString, "&", "&amp;")
myString = Replace(myString, "<", "&lt;")
Replacement no. 1 tricks the textarea into showing the codes.
Replacement no. 2: without it you cannot show the word "</textarea>" inside the textarea (it would end the textarea tag).
(Asp / vbscript code above, translate to a replace method of your language choice)
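The order of the two replacements matters. The following JavaScript sketch (function names are made up for illustration) compares the order used above with the reversed order, which escapes the ampersand of the freshly inserted &lt; a second time.

```javascript
// Same order as the two Replace calls: ampersand first, then less-than.
function ampFirst(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// Reversed order, shown only to illustrate the failure mode.
function ltFirst(s) {
  return s.replace(/</g, "&lt;").replace(/&/g, "&amp;");
}

const stored = "&lt;div </textarea>";
console.log(ampFirst(stored)); // &amp;lt;div &lt;/textarea>
console.log(ltFirst(stored));  // &amp;lt;div &amp;lt;/textarea>
```

With the first result the textarea displays the stored text unchanged. With the second it would display &lt;/textarea> where the stored text had </textarea>.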
I found an alternative for reading and working with the content in the browser: simply read the element's content with jQuery's text(). It returns the characters as they are displayed, which lets me write from a textarea into a div's innerHTML via html()...
With only JS and HTML...
...to answer the actual question, with a bare-minimal example:
<textarea id=myta></textarea>
<script id=mytext type=text/plain>
&trade;
</script>
<script> myta.value = mytext.innerText; </script>
Explanation:
Script tags do not render HTML or entities, so text stored in a script tag remains unaltered. The problem is that the browser will try to execute it as JavaScript. So we use an empty textarea and store the text in a script tag (here, the first one).
To prevent execution, we change the MIME type to text/plain instead of its default, which is text/javascript.
Then, to populate the textarea, we copy the script tag's content into it (here done in the second script tag).
The only caveats I have found are that you have to use JavaScript and that you cannot include script tags directly in the stored text.

populating a textarea with special characters

I'm populating a textarea with previous input of a user. This is pulled from a database and set as the content of the textarea server side.
It seems we are having an issue with a typo and a combination of special characters. If the user originally inputs &#6, then when I try to populate my textarea with it, it just renders a little square, as if it were interpreting the value as a character reference.
Creating an HTML file with the following demonstrates my issue.
<textarea name="mytextarea">some text &#5 some more text </textarea>
This is a typo: the user intended to enter #5 & #6, so one fix is simply to ensure there is a space on either side of any ampersand before I display it in the textarea. It's just a special character issue that is backwards from what I'm used to seeing.
I'm curious whether there is a way to get the textarea to display the characters as the user typed them and preserve that through form submission, to save the overhead of having to parse or HTML-encode the text before putting it into the textarea.
Thanks,
Muchly
Inside a textarea, you need to convert the following characters into their HTML entities:
& => &amp;
> => &gt;
< => &lt;
That way, &#5 would become &amp;#5. Visually, to the user, it would remain &#5.
You are not specifying the server side language you're using. In PHP, the correct function would be htmlspecialchars()
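For readers not using PHP, the same three conversions can be sketched in JavaScript. This is only an illustration: escapeForTextarea is a hypothetical name, and the sample input is the text from the question.

```javascript
// Hypothetical helper: single-pass replacement using a lookup table,
// so there is no ordering problem between & and the other characters.
function escapeForTextarea(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;" };
  return text.replace(/[&<>]/g, (ch) => map[ch]);
}

const userInput = "some text &#5 some more text";
const html =
  '<textarea name="mytextarea">' + escapeForTextarea(userInput) + "</textarea>";
console.log(html);
// <textarea name="mytextarea">some text &amp;#5 some more text</textarea>
```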
Escape the & as &amp;

Is it safe to display user input as input values without sanitization?

Say we have a form where the user types in various info. We validate the info, and find that something is wrong. A field is missing, invalid email, et cetera.
When displaying the form to the user again I of course don't want him to have to type in everything again so I want to populate the input fields. Is it safe to do this without sanitization? If not, what is the minimum sanitization that should be done first?
And to clarify: it would of course be sanitized before being, for example, added to a database or displayed elsewhere on the site.
No, it isn't. The user might be directed to the form from a third-party site, or simply enter data (innocently) that would break the HTML.
Convert any character with special meaning to its HTML entity.
i.e. & to &amp;, < to &lt;, > to &gt; and " to &quot; (assuming you delimit your attribute values using " and not ').
In Perl use HTML::Entities, in TT use the html filter, in PHP use htmlspecialchars. Otherwise look for something similar in the language you are using.
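As a language-neutral illustration of those four conversions, here is a minimal JavaScript sketch for a double-quoted attribute value. The function name escapeAttr, the field name email, and the sample payload are assumptions made up for this example.

```javascript
// Hypothetical helper for values placed inside a double-quoted attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Input that would otherwise close the attribute and the tag.
const submitted = '"><script>alert(1)</script>';
const field =
  '<input type="text" name="email" value="' + escapeAttr(submitted) + '">';
console.log(field);
// <input type="text" name="email" value="&quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;">
```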
It is not safe, because, if someone can force the user to submit specific data to your form, you will output it and it will be "executed" by the browser. For instance, if the user is forced to submit '/><meta http-equiv="refresh" content="0;http://verybadsite.org" />, as a result an unwanted redirection will occur.
You cannot insert user-provided data into an HTML document without encoding it first. Your goal is to ensure that the structure of the document cannot be changed and that the data is always treated as data-values and never as HTML markup or Javascript code. Attacks against this mechanism are commonly known as "cross-site scripting", or simply "XSS".
If inserting into an HTML attribute value, then you must ensure that the string cannot cause the attribute value to end prematurely. You must also, of course, ensure that the tag itself cannot be ended. You can achieve this by HTML-encoding any chars that are not guaranteed to be safe.
If you write HTML so that the value of the tag's attribute appears inside a pair of double-quote or single-quote characters, then you only need to ensure that you HTML-encode the quote character you chose to use. If you are not correctly quoting your attributes as described above, then you need to worry about many more characters, including whitespace, symbols, punctuation and other ASCII control chars. Although, to be honest, it's arguably safest to encode these non-alphanumeric chars anyway.
Remember that an HTML attribute value may appear in 3 different syntactical contexts:
Double-quoted attribute value
<input type="text" value="**insert-here**" />
You only need to encode the double quote character to a suitable HTML-safe value such as &quot;
Single-quoted attribute value
<input type='text' value='**insert-here**' />
You only need to encode the single quote character to a suitable HTML-safe value such as &#39;
Unquoted attribute value
<input type='text' value=**insert-here** />
You shouldn't ever have an html tag attribute value without quotes, but sometimes this is out of your control. In this case, we really need to worry about whitespace, punctuation and other control characters, as these will break us out of the attribute value.
Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and | (and more). [para lifted from OWASP]
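The &#xHH; rule quoted above could be sketched in JavaScript roughly as follows. The function name encodeForAttribute is hypothetical, and the sketch makes no attempt to handle code points that are not valid in HTML at all (such as NUL), so treat it as an illustration of the rule rather than a vetted encoder.

```javascript
// Hypothetical helper: encode every non-alphanumeric character with a
// code below 256 as a hexadecimal character reference (&#xHH;).
function encodeForAttribute(value) {
  let out = "";
  for (const ch of String(value)) {
    const code = ch.codePointAt(0);
    const isAlnum = /^[A-Za-z0-9]$/.test(ch);
    if (!isAlnum && code < 256) {
      out += "&#x" + code.toString(16).toUpperCase().padStart(2, "0") + ";";
    } else {
      out += ch; // alphanumerics and characters at or above 256 pass through
    }
  }
  return out;
}

console.log(encodeForAttribute("a b=c")); // a&#x20;b&#x3D;c
```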
Please remember that the above rules only apply to control injection when inserting into an HTML attribute value. Within other areas of the page, other rules apply.
Please see the XSS prevention cheat sheet at OWASP for more information
Yes, it's safe, provided of course that you encode the value properly.
A value that is placed inside an attribute in an HTML needs to be HTML encoded. The server side platform that you are using should have methods for this. In ASP.NET for example there is a Server.HtmlEncode method, and the TextBox control will automatically HTML encode the value that you put in the Text property.

HTML: <textarea>-Tag: How to correctly escape HTML and JavaScript content displayed in there?

I have an HTML tag <textarea>$FOO</textarea>, and the $FOO variable will be filled with arbitrary HTML and JavaScript content, to be displayed and edited within the textarea. What kind of "escaping" do I need to apply to $FOO?
I first thought of escaping it as HTML, but this didn't work (I was then shown not the original HTML code of $FOO but rather the escaped content). This is of course not what I want: I want the unescaped HTML/JS content of the variable to be displayed...
Is it impossible to display HTML Content within a <textarea> tag and also allow it to be editable as full HTML?
thanks
jens
"I first thought of escaping it as HTML"
Yes, that's right. The contents of a <textarea> are no different from the contents of any other element like a <span> or a <p>: if you want to put some text inside, you must HTML-escape any < or & characters in it to &lt; and &amp; respectively.
Browsers do tend to give you more leeway with faulty markup in <textarea>s, in that the fallback for invalid unescaped < symbols is to render them as text instead of tags, but that doesn't make it any less wrong or dangerous (for XSS).
"but this didn't work"
Please post what you did that didn't work. HTML-escaping is definitely the right thing.
You need to replace the special characters of HTML with character references (either numeric character references or entity references); in a textarea, at least &, < and >.

<input> multi-line capable via CSS

Is there a way to get an <input />-field in HTML to wrap lines if the text is longer than the field using CSS? I don't want to use <textarea /> as I want to avoid users entering hard line-breaks by pressing enter.
No, sorry. <input type=text> is single line by definition. See the W3C document Forms in HTML Documents:
text
Creates a single-line text input control.
Using Dojo's Dijit TextArea form control, based off TextArea, you can have an input field which begins as a single line and expands as the user adds to it.
See its documentation.
You can't do what you want with CSS alone, but you could use JavaScript to prevent the user from entering line breaks in a <textarea> field.
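One way this JavaScript approach might look is sketched below. The helper stripLineBreaks and the element id comment are assumptions made up for this example. The DOM wiring only runs in a browser, and pasted text is cleaned as well as typed text.

```javascript
// Hypothetical helper: remove carriage returns and line feeds.
function stripLineBreaks(text) {
  return text.replace(/[\r\n]+/g, "");
}

// Browser-only wiring, skipped when no document is available.
if (typeof document !== "undefined") {
  const area = document.getElementById("comment"); // hypothetical id
  if (area) {
    // Block the Enter key so no hard line break is typed.
    area.addEventListener("keydown", (e) => {
      if (e.key === "Enter") e.preventDefault();
    });
    // Clean up pasted or dropped text; only assign when something changed
    // so the caret is not moved needlessly.
    area.addEventListener("input", () => {
      const cleaned = stripLineBreaks(area.value);
      if (cleaned !== area.value) area.value = cleaned;
    });
  }
}

console.log(stripLineBreaks("one\r\ntwo\nthree")); // onetwothree
```

The textarea still wraps long text visually, since soft wrapping is done by the browser and does not insert newline characters into the value.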
Look at this,
http://www.echoecho.com/htmlforms08.htm
The wrap options are the most tricky part of text areas.
If you turn wrap off the text is handled as one long sequence of text without linebreaks.
If you set it to virtual the text appears on your page as if it recognized linebreaks - but when the form is submitted the linebreaks are turned off.
If you set it to physical the text is submitted exactly as it appears on the screen - linebreaks included.
Your best bet is to use a textarea (with autogrow capabilities if you like), and then strip out the newlines when the form is submitted. Using PHP it would be something like this:
$text = str_replace(array("\n","\r"),'',$_POST['text_field']);
This would have the desired effect of blocking newline characters. As others have pointed out, it's not really possible to get multi-line input in an input field.