$_GET textarea losing HTML characters - html

This is probably a really simple one but I can't find the answer anywhere!
I have a self submitting form with a textarea field like this
<textarea name="desc" wrap="1" cols="64" rows="5"></textarea>
When I type HTML characters in to the textarea field and hit the submit button, the HTML characters are being stripped and I can't see what is doing it!
Do $_GET variables have their HTML stripped automatically?
For example, If I type '[strong]Just[/strong] a test' in to the textarea, and echo the contents of 'desc' like this
echo(print_r($_GET));
I see $_GET['desc'] contains 'Just a test' rather than '[strong]Just[/strong] a test'.
Is this normal? If so, is there a way to keep the HTML so I can store it in a database?
I am using angle '<>' brackets rather than square '[]' in my code, but this forum converts them if I use them here!

Use CDATA
A CDATA section starts with "<![CDATA[" and ends with "]]>"
Source : http://www.w3schools.com/xml/xml_cdata.asp

Where are you printing the data to? The browser parses the HTML, so unless you view the page source you will only see the non-HTML parts.
However, you should be using print htmlentities($_GET['desc']) to print the contents with the HTML properly encoded, so that it is displayed instead of parsed.
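To make the point concrete, here is a minimal JavaScript sketch of what an escaping function such as PHP's htmlentities() does to the characters HTML treats specially. The name escapeHtml is hypothetical, and this is only an illustration, not a replacement for the built-in PHP functions.

```javascript
// Hypothetical helper: replaces the characters HTML treats as markup
// with character references, so the browser prints them instead of parsing them.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // must run first, or the references added below get re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

const submitted = '<strong>Just</strong> a test';
console.log(escapeHtml(submitted));
// prints: &lt;strong&gt;Just&lt;/strong&gt; a test
```

The value in $_GET still contains the tags; only the copy written into the page is escaped.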

Related

Encode hidden form field value

What method should I use to encode a hidden form field value? Assume that I am storing text like this:
This is a "really" long string with 'quotes' and special characters (~!##$%^&*()_+{}), which might have a random quote (") in the middle of it, making the HTML invalid.
We are using ASP.net to set the value:
<input type="hidden" value="<%= Model.UnencodedTextData %>" name="askingForTrouble" />
I believe if we HTML encoded it, it would solve the problem, but this form will be posted to another application, which we do not have control over. So will the receiving application (Marketo) automatically know how to decode this?
Thank you.
Marketo developer evangelist here. Before posting to Marketo, it is best to use HTML URL encoding for special characters. For example, the JavaScript code sample below would URL encode "&" and "%" characters.
function htmlEscape(str) {
  return String(str)
    // Encode "%" first (as %25) so the "%" in "%26" below is not re-encoded.
    .replace(/%/g, '%25')
    .replace(/&/g, '%26');
}
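For comparison, JavaScript's built-in encodeURIComponent() percent-encodes reserved characters in one pass, so the order of individual replacements is not a concern. The sketch below only illustrates that call on a made-up sample string; whether Marketo expects exactly this encoding is an assumption that should be checked against its documentation.

```javascript
// encodeURIComponent escapes "%" as %25 and "&" as %26, among other characters.
const raw = '50% off & "free" shipping';
const encoded = encodeURIComponent(raw);
console.log(encoded);
// prints: 50%25%20off%20%26%20%22free%22%20shipping

// decodeURIComponent reverses the encoding exactly.
console.log(decodeURIComponent(encoded) === raw); // prints: true
```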

Using ruby variables as html code

I would expect that the following:
<div style="padding-top:90px;"><%= u.one_line %></div>
simply pulls whatever is in u.one_line (which in my case is text from database), and puts it in the html file. The problem I'm having is that sometimes, u.one_line has text with formatted html in it (just line breaks). For example sometimes:
u.one_line is "This is < / b r > awesome"
and I would like the page to process the fact that there's a line break in there... I had to put it with spaces up ^^^ here because the browser would not display it otherwise on stackoverflow. But on my server it's typed correctly, unfortunately instead of the browser processing the line break, it prints out the "< / b r>" part...
I hope you guys understand what I mean :(?
Always remember to use raw or html_safe for HTML output in Rails, because Rails auto-escapes HTML content by default to protect against XSS attacks.
For more, see
When to use raw() and when to use .html_safe

How to stop an html TEXTAREA from decoding html entities

I have a strange problem:
In the database, I have a literal ampersand lt semicolon:
&lt;div
Whenever it's printed into an HTML textarea tag, the source code of the page shows the &gt; as >.
How do I stop this decoding?
You can't stop entities being decoded in a textarea since the content of a textarea is not (unlike a script or style element) intrinsic CDATA, even though error recovery may sometimes give the impression that it is.
The definition of the textarea element is:
<!ELEMENT TEXTAREA - - (#PCDATA) -- multi-line text field -->
i.e. it contains PCDATA which is described as:
Document text (indicated by the SGML construct "#PCDATA"). Text may contain character references. Recall that these begin with & and end with a semicolon (e.g., Herg&eacute;'s adventures of Tintin contains the character entity reference for the e acute character).
This means that when you type (the invalid HTML of) "start of tag" (<) the browser corrects it to "less than sign" (&lt;) but when you type "start of entity" (&), which is allowed, no error correction takes place.
You need to write what you mean. If you want to include some HTML as data then you must convert any character with special meaning to its respective character reference.
If the data is:
&lt;div
Then the HTML must be:
<textarea>&amp;lt;div</textarea>
You can use the standard functions for converting this (e.g. PHP's htmlspecialchars or Perl's HTML::Entities module).
NB 1: If you were using XHTML[2] (and really using it, it doesn't count if you serve it as text/html) then you could use an explicit CDATA block:
<textarea><![CDATA[&lt;div]]></textarea>
NB 2: Or if browsers implemented HTML 4 correctly
OK, but the question is why it decodes them anyway. Assuming I've added &amp;lt; and saved the textarea, it will be saved as &lt; but displayed as <; saving it again will convert it back to < (but it will remain &lt; in the database); saving again will save it as < in the database. Why does the textarea decode it?
The server sends (to the browser) data encoded as HTML.
The browser sends (to the server) data encoded as application/x-www-form-urlencoded (or multipart/form-data).
Since the browser is not sending the data as HTML, the characters are not represented as HTML entities.
If you take the data received from the client and then put it into an HTML document, then you must encode it as HTML first.
In PHP, this can be done using htmlentities(). Example below.
<?php
$content = "This string contains the TM symbol: &trade;";
print "<textarea>". htmlentities($content) ."</textarea>";
?>
Without htmlentities(), the textarea would interpret and display the TM symbol (™) instead of "&trade;".
http://php.net/manual/en/function.htmlentities.php
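The round trip described in this answer can be sketched in JavaScript with the standard URLSearchParams class. The field name desc and the sample text are assumptions made for illustration, and the inline replace calls stand in for whatever HTML-escaping function the server language provides.

```javascript
// What the user typed into the textarea (plain text, not HTML).
const typed = '&lt;div';

// The browser serializes the form as application/x-www-form-urlencoded.
const body = new URLSearchParams({ desc: typed }).toString();
console.log(body); // prints: desc=%26lt%3Bdiv

// The server decodes the body and gets the typed characters back unchanged.
const received = new URLSearchParams(body).get('desc');
console.log(received === typed); // prints: true

// To show the same characters in a textarea again, escape "&" and "<" for HTML.
const html = '<textarea>' +
  received.replace(/&/g, '&amp;').replace(/</g, '&lt;') +
  '</textarea>';
console.log(html); // prints: <textarea>&amp;lt;div</textarea>
```

No HTML entities appear in the submitted body; they only come back when the value is written into a new HTML document.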
You have to be sure that this is rendered to the browser:
<textarea name="somename">&lt;div</textarea>
Essentially, this means that the & in < has to be html encoded to &. How to do it will depend on the technologies you're using.
UPDATE: Think about it like this. If you want to display <div> inside a textarea, you'll have to encode <> because otherwise, <div> would be a normal HTML element to the browser:
<textarea name="somename"><div></textarea>
Having said this, if you want to display <div> inside a textarea, you'll have to encode & again, because the browser decodes HTML entities when rendering HTML. It has nothing to do with your database.
You can serve your DB-content from a separate page and then place it in the textarea using a Javascript (jQuery) Ajax-call:
request = $.ajax({
    type: "GET",
    url: "url-with-the-troubled-content.php",
    success: function (data) {
        // Assigning to .value sets the text directly, so no entity decoding happens.
        document.getElementById('id-of-text-area').value = data;
    }
});
Explained at
http://www.endtask.net/how-to-prevent-a-textarea-element-from-decoding-html-entities/
I had the same problem and I just made two replacements on the text to show from the database before letting it into the text area:
myString = Replace(myString, "&", "&amp;")
myString = Replace(myString, "<", "&lt;")
Replacement no. 1 tricks the textarea into showing the codes.
Replacement no. 2: without it you cannot show the text "</textarea>" inside the textarea (it would end the textarea tag).
(Asp / vbscript code above, translate to a replace method of your language choice)
I found an alternative solution for reading and working with in-browser, simply read the element's text() using jQuery, it returns the characters as display characters and allows me to write from a textarea to a div's innerHTML using the property via html()...
With only JS and HTML...
...to answer the actual question, with a bare-minimal example:
<textarea id=myta></textarea>
<script id=mytext type=text/plain>
&trade;
</script>
<script> myta.value = mytext.innerText; </script>
Explanation:
Script tags do not render HTML or entities. By storing text in a script tag, it will remain unadulterated; the problem is that the browser will try to execute it as JavaScript. So we use an empty textarea and store the text in a script tag (here, the first one).
To prevent execution, we change the MIME type to text/plain instead of its default, which is text/javascript. This will prevent it from running.
Then to populate the textarea, we copy the script tag's content to it (here done in the second script tag).
The only caveats I have found with this are you have to use JavaScript and you cannot include script tags directly in it.

populating a textarea with special characters

I'm populating a textarea with previous input of a user. This is pulled from a database and set as the content of the textarea server side.
It seems we are having an issue with a typo and a combination of special characters. if the user inputs &#6 originally, when I try to populate my textarea with that it just renders a little square like its interpreting the character encoded value.
Creating a HTML file with the following demonstrates my issue.
<textarea name="mytextarea">some text &#5 some more text </textarea>
This is a typo: the user intended to enter #5 & #6. A fix is simply to ensure that when the user types an ampersand there is a space on either side of it before I display it in the textarea. It's just a special-character issue, backwards from what I'm used to seeing.
I'm curious whether there is a way to get the textarea to display the characters as the user typed them and preserve that through form submission, to save the overhead of having to parse or HTML-encode the text before putting it into the textarea.
Thanks,
Muchly
Inside a textarea, you need to convert the following characters into their HTML entities:
& => &amp;
> => &gt;
< => &lt;
That way, &#5 would become &amp;#5. Visually, to the user, it would remain &#5.
You are not specifying the server side language you're using. In PHP, the correct function would be htmlspecialchars()
Escape the & as &amp;

How to stop automatic HTML Encoding when assigning to HTML Input Fields VB.NET

I have to submit a HTML form to a 3rd party website and one of the hidden fields is an XML string. The XML needs escaping before it is sent to the 3rd party.
However, when I add the plain XML to the form field it semi-escapes it for me. So when I then use HTMLEncode myself, part of the XML is double-escaped. How do I prevent the automatic escaping that appears to be coming from .NET?
Or even better how else can send the escaped XML via the hidden field.
XML
<systemCode>APP</systemCode>
Basic assigning to hidden input field
&lt;systemCode>APP&lt;/systemCode>
When I HTML Encode it as well
&amp;lt;systemCode&amp;gt;APP&amp;lt;/systemCode&amp;gt;
I can see what's happening - but I don't know how to prevent it?
Don't use HTMLEncode as well! Use it alone!
Something like:
'Setting value:
hdnField.Value = Server.HtmlEncode("<systemCode>APP</systemCode>")
'Outputs: &lt;systemCode&gt;APP&lt;/systemCode&gt;
'Retrieving encoded value:
Dim escaped As String = Request.Form("hdnField")
'Retrieves: &lt;systemCode&gt;APP&lt;/systemCode&gt;
'Retrieving decoded value:
Dim myValue As String = Server.HtmlDecode(Request.Form("hdnField"))
'Retrieves: "<systemCode>APP</systemCode>"
In the end I used a literal and HTML-encoded the XML string before assigning the hidden form field markup to the literal's text. A little bit like below:
portalReq.Text = "<input type=""hidden"" name=""portalReq"" value='" & HTMLENCODE(RequestXML) & "' />"
Not elegant but it's circumventing the problem.
You don't need to worry about the HTML output. Only worry about what data is submitted in the form. It doesn't matter whether the HTML is fully escaped or partially escaped - the same data gets submitted either way.
Both of these fields:
<input name="xml" value="&lt;systemCode>APP&lt;/systemCode>" />
<input name="xml" value="&lt;systemCode&gt;APP&lt;/systemCode&gt;" />
Get submitted as:
xml=%3CsystemCode%3EAPP%3C%2FsystemCode%3E
This is language agnostic - it is browser behavior. When the browser parses the HTML, it will actually normalize both fields to have the same html. If you view source of the page, you will see that the source HTML differs between the inputs, but if you read the form.innerHTML value, you'll see that the parsed HTML is identical.
Demo:
http://jsfiddle.net/gilly3/Xdj5E/
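The normalization described in this answer can also be sketched without a browser. The decodeEntities function below is a hypothetical, deliberately tiny stand-in for the HTML parser's attribute decoding, and the two sample values assume one partially escaped and one fully escaped attribute.

```javascript
// Hypothetical, deliberately tiny decoder: handles only the four named
// references used here. A real HTML parser handles many more.
function decodeEntities(value) {
  return value
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" becomes "&lt;" and not "<"
}

const partiallyEscaped = '&lt;systemCode>APP&lt;/systemCode>';
const fullyEscaped = '&lt;systemCode&gt;APP&lt;/systemCode&gt;';

const a = decodeEntities(partiallyEscaped);
const b = decodeEntities(fullyEscaped);
console.log(a === b); // prints: true

// Both values are then URL-encoded identically on submission.
console.log('xml=' + encodeURIComponent(a));
// prints: xml=%3CsystemCode%3EAPP%3C%2FsystemCode%3E
```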