How to input a null character into a web form?

I am testing an ASP.NET web form which needs to filter out null characters from input.
To test this functionality, how can I actually type a null character in the html form? I've tried Alt+0 but it does not work.
I know I can do it in a GET request by using "%00" in the URL. However, I want to do it in a form POST.

I was able to do this using TamperData Firefox plugin.
https://addons.mozilla.org/en-US/firefox/addon/tamper-data/
When given the Tamper Popup I typed "%00" in the Post Parameter Value field.
Still, I cannot find a way to type a null character just using the keyboard.

My suggestion would be to write the null character into the HTML element where you want it, e.g.:
document.getElementById("my_null_tag").innerHTML += "\0"

You can use an HTML entity. Not fully sure of how many zeroes are required, but:
&#0;
For an arbitrary Unicode character it's easier to use the hexadecimal notation. E.g., &#x335D; prints ㍝, which is U+335D.
Update: This question is pretty tricky indeed. I've managed to insert a null character inside an HTML document (using a server-side script and verified with a hexadecimal editor). As expected, there is no difference with the HTML entity, which can be either &#0; or &#x0;. But the browser does not send the character in the POST request (tested with Firefox and Firebug): it sends %EF%BF%BD, which is REPLACEMENT CHARACTER (U+FFFD). That is, it sends the question mark in a box that's used to display the null in the document (given that null is not printable).
My guess is that your testers need to script the task.

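<html>
<!-- A literal "%00" in the value attribute would be sent as "%2500", so the null character is set from script instead. -->
<body onload='var f = document.forms[0]; f.elements["param"].value = "\0"; f.submit();'>
<form method="post" action="http://localhost/test.asp" >
<input type="hidden" name="param" value="">
</form>
</body>
</html>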

Use AJAX to submit the form. I.e. paste something like this in the address bar instead:
javascript:(function(){var xhr=new XMLHttpRequest();xhr.onreadystatechange=function(){if(this.readyState==4){document.write(this.responseText)}};xhr.open("POST",'/form',true);xhr.send("\0");})()
Here it is unwrapped, so you can actually see it:
(function() {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function () {
        if (this.readyState == 4) {
            document.write(this.responseText);
        }
    };
    xhr.open("POST", '/form', true);
    xhr.send("\0");
})()
There are some limitations to this, of course, since it just blindly replaces the content of the current page with whatever it got back. But this might be enough to tell if something went wrong or not.

This answer from Jonathan H. didn't really work for me, but it did put me on the right path. What did work was to set the .value property of the HTML element (via the browser dev tools console), and then submit the form.
document.getElementById("element-id").value = "some text with a null byte\0"
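As a rough illustration of what the server should then receive, and of the kind of filter being tested, the sketch below percent-encodes a value containing a null character and strips it again. The helpers encodeField and stripNulls are hypothetical names made up for this example; they are not part of ASP.NET or of any library.

```javascript
// Hypothetical helper: encode one name/value pair for a URL-encoded request body.
// encodeURIComponent writes spaces as %20 rather than +; form decoders accept both.
function encodeField(name, value) {
  return encodeURIComponent(name) + "=" + encodeURIComponent(value);
}

// Hypothetical helper: the kind of filter the form under test is expected to apply.
function stripNulls(text) {
  return text.replace(/\0/g, "");
}

const raw = "some text with a null byte\0";
console.log(encodeField("param", raw));   // the null character travels as %00
console.log(JSON.stringify(stripNulls(raw)));
```

If the request seen in the browser's network tab ends in %00, the null character really was submitted; %EF%BF%BD would mean it was replaced on the way.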

A Latin character O with a slash through it is often acceptable as a symbol for Null/Nul (or blank if you prefer). If you are using database-driven applications you'll want to sanitize this symbol to replace with null or blank, depending on your needs.
How to do it:
ALT + 0216 in your favorite editor should give you Ø - which is, for this display purpose, Null.
Then, as an example and a best practice, you'll sanitize the form submission before it gets passed to the database.
Example Case:
If feeding a database-driven PHP site, your sanitization of this specific character might look something like...
$dbstring = ($input === 'Ø') ? NULL : $input; and the value will be NULL
Or try...
$dbstring = str_replace('Ø', '', $input); and the value will be BLANK
($input stands for the submitted value.)
Alternatively, you may want to do this display with the HTML entity codes, which I mention below.
Alt + 0216 explained
If using a normal 104-key keyboard with English (American) set as your primary language, hold down the ALT key, and while holding, use the NUMBER PAD keys to enter 0216. Then release the ALT key, and your character should appear.
*This is primarily a Windows method. Macintosh, (X)nix and bsd users, you will probably be stuck using the HTML entity codes.
Special note: use of the top-of-keyboard numbers doesn't work.
If you are on a laptop or other device that makes this difficult or impossible, try an alternative: use the HTML entity codes:
&Oslash; = Ø (usually for null)
&oslash; = ø (could be used, but the upper-case version seems more appropriate.)
Other thoughts that might be helpful: Nul vs Null vs Blank - They fundamentally mean the same thing, but different programming languages use or require these differently. (There are others too, like nullptr for a null pointer.)
The point I'm trying to make with NUL/NULL is that the submitted variable 'doesn't exist' or simply 'wasn't there at all'. In most contexts, you can simply call this "Null" and be understood.
Some database systems treat Blank and NULL as the same thing. In others, Blank is actually an empty value, whereas NULL is no value at all (as mentioned above).
Hopefully this helps in building the view you're looking for.

Related

ReactJS - How to render carriage returns correctly when returned in Ajax call

In ReactJS, how is it possible to render carriage returns that may be submitted by the user in a textarea control? The content containing the carriage returns is retrieved by an Ajax call to an API, which needs to convert the \r\n characters to <br> or something else. I then have a div element in which the content should be rendered. I tried the following Ajax responses:
{
"Comment" : "Some stuff followed by line breaks<br/><br/><br/><br/>And more stuff."
}
and
{
"Comment" : "Some stuff followed by line breaks\n\n\nAnd more stuff."
}
But instead of rendering the carriage returns in the browser, it renders the br tags as plain text in the first case and \n character as space in the second case.
What's the recommended approach here? I'm guessing I should steer clear of the scary dangerouslySetInnerHTML property? For example, the following would actually work, but there must be a safer way of handling carriage returns:
<div className="comment-text" dangerouslySetInnerHTML={{__html: comment.Comment}}></div>
dangerouslySetInnerHTML is what you want. The name is meant to be scary, because using it presents a risk for XSS attacks, but essentially it's just a reminder that you need to sanitize user inputs (which you should do anyway!)
To see an XSS attack in action while using dangerouslySetInnerHTML, try having a user save a comment whose text is:
Just an innocent comment.... <img src="x" onerror="alert('XSS!!!')">
You might be surprised to see that this comment will actually create the alert popup. (Browsers do not execute a plain <script> tag inserted through innerHTML, but they do run event-handler attributes such as onerror.) An even more malicious user might insert JS to download a virus when anyone views their comment. We obviously can't allow that.
But protecting against XSS is pretty simple. Sanitization needs to be done server side, but there are plenty of packages available that do this exact task for any conceivable serverside setup.
Here's a good package for Rails, for example: https://github.com/rgrove/sanitize
Just be sure whichever sanitizer you pick uses a "whitelist" sanitization method, not a "blacklist" one.
If you're working with the DOM directly, you would use innerHTML to add the markup. In the React world, however, it is preferable to use https://www.npmjs.com/package/html-to-react
Also, the browser only understands HTML and won't interpret \n as a line break. You should replace it with <br/> before rendering.
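A minimal sketch of that replacement, assuming the comment arrives as plain text: escape the HTML-special characters first, then convert the line breaks, so the only markup left in the result is the <br/> tags added here. The function names escapeHtml and newlinesToBreaks are made up for this example.

```javascript
// Hypothetical helper: escape the characters that are special in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: escape first, then turn every line break into <br/>.
function newlinesToBreaks(text) {
  return escapeHtml(text).replace(/\r\n|\r|\n/g, "<br/>");
}

console.log(newlinesToBreaks("Some stuff followed by line breaks\n\n\nAnd <b>more</b> stuff."));
```

Another option that avoids generating markup at all is to render the raw text and style the container with the CSS rule white-space: pre-line, which makes the browser honour the newlines.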

Why do I need XSS library while I can use Html-encode?

I'm trying to understand why I need to use an XSS library when I can merely do HtmlEncode when sending data from server to client.
For example, here on Stackoverflow.com, in the editor, all the SO team needs to do is save the user input and display it HTML-encoded.
This way, there is never going to be an HTML tag that gets executed.
I'm probably wrong here, but can you please contradict my statement, or explain?
For example:
I know that an IMG tag, for example, can have onmouseover or onload attributes in which a user can place malicious scripts, but the IMG won't even run in the browser as an IMG since it's &lt;img&gt; and not <img>
So, where is the problem?
HTML-encoding is itself one feature an “XSS library” might provide. This can be useful when the platform doesn't have a native HTML encoder (eg scriptlet-based JSP) or the native HTML encoder is inadequate (eg not escaping quotes for use in attributes, or ]]> if you're using XHTML, or #{} if you're worried about cross-origin-stylesheet-inclusion attacks).
There might also be other encoders for other situations, for example injecting into JavaScript strings in a <script> block or URL parameters in an href attribute, which are not provided directly by the platform/templating language.
Another useful feature an XSS library could provide might be HTML sanitisation, for when you want to allow the user to input data in HTML format, but restrict which tags and attributes they use to a safe whitelist.
Another less-useful feature an XSS library could provide might be automated scanning and filtering of input for HTML-special characters. Maybe this is the kind of feature you are objecting to? Certainly trying to handle HTML-injection (an output stage issue) at the input stage is a misguided approach that security tools should not be encouraging.
HTML encoding is only one aspect of making your output safe against XSS.
For example, if you output a string to JavaScript using this code:
<script>
var enteredName = '<%=EnteredNameVariableFromServer %>';
</script>
You will want to hex-escape the variable for proper insertion into JavaScript, not HTML-encode it. Suppose the value of EnteredNameVariableFromServer is O'leary; then the rendered code, when properly encoded, will become:
<script>
var enteredName = 'O\x27leary';
</script>
In this case this prevents the ' character from breaking out of the string and into the JavaScript code context, and also ensures proper treatment of the variable (HTML encoding it would result in the literal value of O&#39;leary being used in JavaScript, affecting processing and display of the value).
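To make the difference concrete, here is a small sketch of such a JavaScript-string encoder. It is only an illustration under the assumption that every character other than an ASCII letter or digit gets escaped; escapeForJsString is a made-up name, not an API of ASP.NET or any other platform.

```javascript
// Hypothetical helper: escape a value for use inside a quoted JavaScript string literal.
// Anything other than an ASCII letter or digit becomes \xHH (or \uHHHH above 0xFF).
function escapeForJsString(text) {
  return text.replace(/[^A-Za-z0-9]/g, function (ch) {
    const code = ch.charCodeAt(0);
    return code < 0x100
      ? "\\x" + code.toString(16).padStart(2, "0")
      : "\\u" + code.toString(16).padStart(4, "0");
  });
}

console.log(escapeForJsString("O'leary"));   // O\x27leary
console.log(escapeForJsString("</script>")); // \x3c\x2fscript\x3e
```

Escaping < and / as well means the value cannot close the surrounding <script> block, which HTML encoding alone would not address in this context.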
Side note:
Also, that's not quite true of Stack Overflow. Certain characters still have special meanings like in the <!-- language: lang-none --> tag. See this post on syntax highlighting if you're interested.

How and when to use Html encode

I've recently learned that I shouldn't store HTML-encoded data in the database, but should rather HTML-encode the data that is shown on the screen for the user.
No big deal, I have to fix my database records and make some code changes.
But my question is, when should I use HTML encoding and when shouldn't I?
For example, within an HTML table, I'm writing directly from the database to the inner HTML of a column. Without encoding this would be dangerous, I get that.
What about when setting the value of a textbox? It seems to work without having to HTML-encode the value. But I'm not sure why. This is what the textbox looks like:
<input type="textbox" value="xxx"/>
But when setting the value to: "/><p style="font-size: 100px;">testing hack</p>
The html source will be:
<input type="textbox" value=""/><p style="font-size: 100px;">testing hack</p>
It will look fine when viewed, though, so the p tag isn't working as intended by the "hack".
Is anyone getting what I'm aiming at :) ?
If I do HTML-encode something I set as a textbox value, the result will display "&lt;" and so on, which is not what I intended.
So in short: should I only HTML-encode content that is set as the inner HTML of HTML controls, and not when setting the value of, for example, textboxes?
The answer came out of thejh's and my discussion in the comment to the question. I was not sure what to mark as answer so I decided to answer my own question. I hope that's ok.
It seems that when setting the value of an attribute (like the textbox's "value"), .NET automatically HTML-encodes the value, so there is no need to do this yourself.
When setting an HTML control's inner HTML, though, it's important that you do HTML-encode the value.
Thanks Thejh, sorry I couldn't upvote anything you wrote.
edit: I can't mark this as the answer for another 2 days.
in the case of
<input type="textbox" value="xxx"/>
'xxx' is an attribute, and you should use a different encoding. In ASP.NET it's HtmlAttributeEncode for example.
For HTML attributes, encode ampersands and double quotes as character references (backslash escapes have no meaning in HTML):
Replace every & by &amp;
Replace every " by &quot;
Oh, by the way: PHP's magic quotes feature, which added backslashes automatically, does not help here, and:
This feature has been DEPRECATED as of PHP 5.3.0. Relying on this feature is highly discouraged.
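For reference, HTML attribute values are escaped with character references such as &quot;, since the HTML parser does not interpret backslashes. The sketch below applies that to the "hack" value from the question; escapeAttribute is a hypothetical helper written for this example and is not the implementation of ASP.NET's HtmlAttributeEncode.

```javascript
// Hypothetical helper: encode a value for use inside a double-quoted HTML attribute.
function escapeAttribute(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const payload = '"/><p style="font-size: 100px;">testing hack</p>';
const html = '<input type="text" value="' + escapeAttribute(payload) + '"/>';
console.log(html);
```

Because the encoded value contains no raw double quote, it cannot end the attribute early, and the browser shows the original text inside the textbox.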

What characters are allowed in the HTML Name attribute inside input tag?

I have a PHP script that will generate <input>s dynamically, so I was wondering if I needed to filter any characters in the name attribute.
I know that the name has to start with a letter, but I don't know any other rules. I figure square brackets must be allowed, since PHP uses these to create arrays from form data. How about parentheses? Spaces?
Note that not all characters are submitted for name attributes of form fields (even when using POST)!
White-space characters are trimmed, and inner white-space characters as well as the character . are replaced by _.
(Tested in Chrome 23, Firefox 13 and Internet Explorer 9, all Win7.)
Any character you can include in an [X]HTML file is fine to put in an <input name>. As Allain's comment says, <input name> is defined as containing CDATA, so the only things you can't put in there are the control codes and invalid codepoints that the underlying standard (SGML or XML) disallows.
Allain quoted W3 from the HTML4 spec:
Note. The "get" method restricts form data set values to ASCII characters. Only the "post" method (with enctype="multipart/form-data") is specified to cover the entire ISO10646 character set.
However this isn't really true in practice.
The theory is that application/x-www-form-urlencoded data doesn't have a mechanism to specify an encoding for the form's names or values, so using non-ASCII characters in either is “not specified” as working and you should use POSTed multipart/form-data instead.
Unfortunately, in the real world, no browser specifies an encoding for fields even when it theoretically could, in the subpart headers of a multipart/form-data POST request body. (I believe Mozilla tried to implement it once, but backed out as it broke servers.)
And no browser implements the astonishingly complex and ugly RFC2231 standard that would be necessary to insert encoded non-ASCII field names into the multipart's subpart headers. In any case, the HTML spec that defines multipart/form-data doesn't directly say that RFC2231 should be used, and, again, it would break servers if you tried.
So the reality of the situation is there is no way to know what encoding is being used for the names and values in a form submission, no matter what type of form it is. What browsers will do with field names and values that contain non-ASCII characters is the same for GET and both types of POST form: it encodes them using the encoding the page containing the form used. Non-ASCII GET form names are no more broken than everything else.
DLH:
So name has a different data type for <input> than it does for other elements?
Actually the only element whose name attribute is not CDATA is <meta>. See the HTML4 spec's attribute list for all the different uses of name; it's an overloaded attribute name, having many different meanings on the different elements. This is generally considered a bad thing.
However, typically these days you would avoid name except on form fields (where it's a control name) and param (where it's a plugin-specific parameter identifier). That's only two meanings to grapple with. The old-school use of name for identifying elements like <form> or <a> on the page should be avoided (use id instead).
The only real restriction on what characters can appear in form control names is when a form is submitted with GET
"The "get" method restricts form data set values to ASCII characters." reference
There's a good thread on it here.
While Allain's comment did answer OP's direct question and bobince provided some brilliant in-depth information, I believe many people come here seeking answer to more specific question: "Can I use a dot character in form's input name attribute?"
As this thread came up as first result when I searched for this knowledge I guessed I may as well share what I found.
Firstly, Matthias claimed that:
character . are replaced by _
This is untrue. I don't know if browsers actually did this kind of operation back in 2013, though I doubt it. Browsers send dot characters as they are (talking about POST data)! You can check it in the developer tools of any decent browser.
Please notice that tiny little comment by abluejelly, which is probably missed by many:
I'd like to note that this is a server-specific thing, not a browser thing. Tested on Win7 FF3/3.5/31, IE5/7/8/9/10/Edge, Chrome39, and Safari Windows 5, and all of them sent "    test this.stuff" (four leading spaces) as the name in POST to the ASP.NET dev server bundled with VS2012.
I checked it with Apache HTTP server (v2.4.25) and indeed an input name like "foo.bar" is changed to "foo_bar". But in a name like "foo[foo.bar]" that dot is not replaced by _!
My conclusion: You can use dots, but I wouldn't, as this may lead to unexpected behaviour depending on the HTTP server used.
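One way to see what the client side sends, without a browser, is URLSearchParams, which serializes names and values with the same application/x-www-form-urlencoded rules used for form submissions. The field names below are the ones mentioned in this thread; the values are placeholders.

```javascript
// Build a URL-encoded body from the field names discussed above.
const body = new URLSearchParams();
body.append("foo.bar", "1");
body.append("foo[foo.bar]", "2");
body.append("    test this.stuff", "3"); // four leading spaces

const encoded = body.toString();
console.log(encoded);
```

In the output the dots are untouched and the spaces appear as +, so any conversion to underscores has to happen on the receiving side.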
Do you mean the id and name attributes of the HTML input tag?
If so, I'd be very tempted to restrict (or convert) allowed "input" name characters into only a-z (A-Z), 0-9 and a limited range of punctuation (".", ",", etc.), if only to limit the potential for XSS exploits, etc.
Additionally, why let the user control any aspect of the input tag? (Might it not ultimately be easier from a validation perspective to keep the input tag names as 'custom_1', 'custom_2', etc. and then map these as required?)

Ways to remove the autocomplete of an input box

I need a text input field which does not use the autocomplete function - If a user has submitted the form before, his previous submissions should -not- appear as he types into the form again, even if he is typing the same thing again. As far as I can tell, there are a few ways to do this:
1. <form autocomplete="off">
However, I believe this is a proprietary attribute, and I am not sure how compatible it is across browsers.
2. Give the input field a random 'name'
One could even use JS to set the name back to an expected value before submission. However, if the user does not have JS enabled, you'd need another hidden input with the name, and the PHP code on the other side gets messy fast.
Do you know of any other ways? Is one of these ways the "accepted" way? Comments?
Thanks,
Mala
Lookie here: Is there a W3C valid way to disable autocomplete in a HTML form?
Stick with the random name. You can do it simply enough server and client and you meet your no-js requirement.
You can store the original and changed name in a $_SESSION variable before outputting the form, and after the user submits, just get the name from there:
$random_name = md5('original_name' . time());
$_SESSION['original_name'] = $random_name;
...output form...
And after submitting you can easily get the value from $_POST using the $_SESSION variable:
$field_value = $_POST[$_SESSION['original_name']];
Just be sure that you have sessions available by calling session_start() before any processing.
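The same round trip can be sketched in JavaScript, purely to show the data flow. A plain object stands in for the session store, and issueFieldName and readField are hypothetical helpers; Math.random is used only because the name needs to be unpredictable to the browser's autocomplete, not to an attacker.

```javascript
// Stand-in for a server-side session store (an assumption for this sketch).
const session = {};

// Hypothetical helper: pick a throwaway field name and remember it in the session.
function issueFieldName(store, originalName) {
  const randomName =
    "f_" + Math.random().toString(36).slice(2) + Date.now().toString(36);
  store[originalName] = randomName;
  return randomName;
}

// Hypothetical helper: read the submitted value back using the stored name.
function readField(store, originalName, postData) {
  return postData[store[originalName]];
}

const fieldName = issueFieldName(session, "original_name"); // used as the input's name
const posted = { [fieldName]: "hello" };                    // what the submitted form would contain
console.log(readField(session, "original_name", posted));   // hello
```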
Autocomplete is something that browsers decided to do on their own, so there’s certainly no spec document to look at.