Input field name starts with a number - html

I have an input field whose name is an MD5 string e.g.:
<input type="hidden" name="7815696ecbf1c96e6894b779456d330e" value="1">
Now I understand that having a number as the first letter in an input field name is generally bad practice, but are there any side-effects to this such as a certain browser won't send it in the POST request?

An ID attribute would have had to begin with a letter as per the HTML 4.01 W3C specification, however since the NAME attribute of input elements is of CDATA type (Source), this restriction does not apply.
One real restriction you get on NAME attributes is when you submit a form with the GET method, because in this case, form data must be restricted to ASCII codes (Source).

The HTML spec doesn't restrict the control name in any way. In fact it even says that the control name is URL-encoded and that spaces and non-alphanumeric characters are handled in a certain way, so obviously the designers anticipated names having an arbitrary format.

As far as I know, you should have no problem in any browser.
But you can always consider to prepend some kind of string, also for convenience:
e.g.,
<input type="hidden" name="h.7815696ecbf1c96e6894b779456d330e" value="1">
Which can help someway.

Related

Can the input tag's name attribute simply be an integer?

Are the names of input tags allowed to simply be integers?
<input type="text" name="34" />
Just asking in case I were a lazy programmer. Or if I have a ton of arbitrary fields and it's not important what they are named.
Yes, the name attribute is declared as taking CNAME value, which means any string of characters, without imposing constraints. HTML5 does not change this, except by disallowing the empty string; its definition explicitly says: “Any non-empty value for name is allowed”.
People sometimes confuse the name attribute with the id attribute, upon which there are various constraints depending on HTML version (e.g., some versions forbid a value that starts with a digit).
Yes, they can. I wouldn't recommend it, but there's nothing wrong with using a number as a name attribute.
No. Using integer is valid, yet it will be converted into a string.
The content attribute is the attribute as you set it from the content (the HTML code) and you can set it or get it via element.setAttribute() or element.getAttribute(). The content attribute is always a string even when the expected value should be an integer. For example, to set an element's maxlength to 42 using the content attribute, you have to call setAttribute("maxlength", "42") on that element.
ref: https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes
It's valid in HTML5, but I'd avoid it. However, your code is still not valid HTML5 because of the self-closing tag. It should be:
<input type="text" name="34">

HTML input - name vs. id [duplicate]

This question already has answers here:
Difference between id and name attributes in HTML
(22 answers)
Closed 3 years ago.
When using the HTML <input> tag, what is the difference between the use of the name and id attributes especially that I found that they are sometimes named the same?
In HTML4.01:
Name Attribute
Valid only on <a>, <form>, <iframe>, <img>, <map>, <input>, <select>, <textarea>
Name does not have to be unique, and can be used to group elements together such as radio buttons & checkboxes
Can not be referenced in URL, although as JavaScript and PHP can see the URL there are workarounds
Is referenced in JavaScript with getElementsByName()
Shares the same namespace as the id attribute
Must begin with a letter
According to specifications is case sensitive, but most modern browsers don't seem to follow this
Used on form elements to submit information. Only input tags with a name attribute are submitted to the server
Id Attribute
Valid on any element except <base>, <html>, <head>, <meta>, <param>, <script>, <style>, <title>
Each Id should be unique in the page as rendered in the browser, which may or may not be all in the same file
Can be used as anchor reference in URL
Is referenced in CSS or URL with # sign
Is referenced in JavaScript with getElementById(), and jQuery by $(#<id>)
Shares same name space as name attribute
Must contain at least one character
Must begin with a letter
Must not contain anything other than letters, numbers, underscores (_), dashes (-), colons (:), or periods (.)
Is case insensitive
In (X)HTML5, everything is the same, except:
Name Attribute
Not valid on <form> any more
XHTML says it must be all lowercase, but most browsers don't follow that
Id Attribute
Valid on any element
XHTML says it must be all lowercase, but most browsers don't follow that
This question was written when HTML4.01 was the norm, and many browsers and features were different from today.
The name attribute is used for posting to e.g. a web server. The id is primarily used for CSS (and JavaScript). Suppose you have this setup:
<input id="message_id" name="message_name" type="text" />
In order to get the value with PHP when posting your form, it will use the name attribute, like this:
$_POST["message_name"];
The id is used for styling, as said before, for when you want to use specific CSS content.
#message_id
{
background-color: #cccccc;
}
Of course, you can use the same denomination for your id and name attribute. These two will not interfere with each other.
Also, name can be used for more items, like when you are using radio buttons. Name is then used to group your radio buttons, so you can only select one of those options.
<input id="button_1" type="radio" name="option" />
<input id="button_2" type="radio" name="option" />
And in this very specific case, I can further say how id is used, because you will probably want a label with your radio button. Label has a for attribute, which uses the id of your input to link this label to your input (when you click the label, the button is checked). An example can be found below
<input id="button_1" type="radio" name="option" /><label for="button_1">Text for button 1</label>
<input id="button_2" type="radio" name="option" /><label for="button_2">Text for button 2</label>
IDs must be unique
...within page DOM element tree so each control is individually accessible by its id on the client side (within browser page) by
JavaScript scripts loaded in the page
CSS styles defined on the page
Having non-unique IDs on your page will still render your page, but it certainly won't be valid. Browsers are quite forgiving when parsing invalid HTML. but don't do that just because it seems that it works.
Names are quite often unique but can be shared
...within page DOM between several controls of the same type (think of radio buttons) so when data gets POSTed to server only a particular value gets sent. So when you have several radio buttons on your page, only the selected one's value gets posted back to server even though there are several related radio button controls with the same name.
Addendum to sending data to server: When data gets sent to server (usually by means of HTTP POST request) all data gets sent as name-value pairs where name is the name of the input HTML control and value is its value as entered/selected by the user. This is always true for non-Ajax requests. In Ajax requests name-value pairs can be independent of HTML input controls on the page, because developers can send whatever they want to the server. Quite often values are also read from input controls, but I'm just trying to say that this is not necessarily the case.
When names can be duplicated
It may sometimes be beneficial that names are shared between controls of any form input type. But when? You didn't state what your server platform may be, but if you used something like ASP.NET MVC you get the benefit of automatic data validation (client and server) and also binding sent data to strong types. That means that those names have to match type property names.
Now suppose you have this scenario:
you have a view with a list of items of the same type
user usually works with one item at a time, so they will only enter data with one item alone and send it to server
So your view's model (since it displays a list) is of type IEnumerable<SomeType>, but your server side only accepts one single item of type SomeType.
How about name sharing then?
Each item is wrapped within its own FORM element and input elements within it have the same names so when data gets to the server (from any element) it gets correctly bound to the string type expected by the controller action.
This particular scenario can be seen on my Creative stories mini-site. You won't understand the language, but you can check out those multiple forms and shared names. Never mind that IDs are also duplicated (which is a rule violation) but that could be solved. It just doesn't matter in this case.
name identifies form fields*; so they can be shared by controls that stand to represent multiple possibles values for such a field (radio buttons, checkboxes). They will be submitted as keys for form values.
id identifies DOM elements; so they can be targeted by CSS or JavaScript.
* name's are also used to identify local anchors, but this is deprecated and 'id' is a preferred way to do so nowadays.
name is the name that is used when the value is passed (in the URL or in the posted data). id is used to uniquely identify the element for CSS styling and JavaScript.
The id can be used as an anchor too. In the old days, <a name was used for that, but you should use the id for anchors too. name is only to post form data.
name is used for form submission in the DOM (Document Object Model).
ID is used for a unique name of HTML controls in the DOM, especially for JavaScript and CSS.
The name defines what the name of the attribute will be as soon as the form is submitted. So if you want to read this attribute later you will find it under the "name" in the POST or GET request.
Whereas the id is used to address a field or element in JavaScript or CSS.
The id is used to uniquely identify an element in JavaScript or CSS.
The name is used in form submission. When you submit a form only the fields with a name will be submitted.
The name attribute on an input is used by its parent HTML <form>s to include that element as a member of the HTTP form in a POST request or the query string in a GET request.
The id should be unique as it should be used by JavaScript to select the element in the DOM for manipulation and used in CSS selectors.
I hope you can find the following brief example helpful:
<!DOCTYPE html>
<html>
<head>
<script>
function checkGender(){
if(document.getElementById('male').checked) {
alert("Selected gender: "+document.getElementById('male').value)
}else if(document.getElementById('female').checked) {
alert("Selected gender: "+document.getElementById('female').value)
}
else{
alert("Please choose your gender")
}
}
</script>
</head>
<body>
<h1>Select your gender:</h1>
<form>
<input type="radio" id="male" name="gender" value="male">Male<br>
<input type="radio" id="female" name="gender" value="female">Female<br>
<button onclick="checkGender()">Check gender</button>
</form>
</body>
</html>
In the code, note that both 'name' attributes are the same to define optionality between 'male' or 'female', but the 'id's are not equals to differentiate them.
Adding some actual references to W3C documentation that authoritatively explain the role of the 'name' attribute on form elements. (For what it's worth, I arrived here while exploring exactly how Stripe.js works to implement safe interaction with the payment gateway Stripe. In particular, what causes a form input element to get submitted back to the server, or prevents it from being submitted?)
The following W3C documentation is relevant:
HTML 4: https://www.w3.org/TR/html401/interact/forms.html#control-name Section 17.2 Controls
HTML 5: https://www.w3.org/TR/html5/forms.html#form-submission-0 and
https://www.w3.org/TR/html5/forms.html#constructing-the-form-data-set Section 4.10.22.4 Constructing the form data set.
As explained therein, an input element will be submitted by the browser if and only if it has a valid 'name' attribute.
As others have noted, the 'id' attribute uniquely identifies DOM elements, but is not involved in normal form submission. (Though 'id' or other attributes can of course be used by JavaScript to obtain form values, which JavaScript could then use for Ajax submissions and so on.)
One oddity regarding previous answers/commenters concern about id's values and name's values being in the same namespace. So far as I can tell from the specifications, this applied to some deprecated uses of the name attribute (not on form elements). For example https://www.w3.org/TR/html5/obsolete.html:
"Authors should not specify the name attribute on a elements. If the attribute is present, its value must not be the empty string and must neither be equal to the value of any of the IDs in the element's home subtree other than the element's own ID, if any, nor be equal to the value of any of the other name attributes on a elements in the element's home subtree. If this attribute is present and the element has an ID, then the attribute's value must be equal to the element's ID. In earlier versions of the language, this attribute was intended as a way to specify possible targets for fragment identifiers in URLs. The id attribute should be used instead."
Clearly, in this special case, there's some overlap between id and name values for 'a' tags. But this seems to be a peculiarity of processing for fragment ids, not due to general sharing of namespace of ids and names.
An interesting case of using the same name: input elements of type checkbox like this:
<input id="fruit-1" type="checkbox" value="apple" name="myfruit[]">
<input id="fruit-2" type="checkbox" value="orange" name="myfruit[]">
At least if the response is processed by PHP, if you check both boxes, your POST data will show:
$myfruit[0] == 'apple' && $myfruit[1] == 'orange'
I don't know if that sort of array construction would happen with other server-side languages, or if the value of the name attribute is only treated as a string of characters, and it's a fluke of PHP syntax that a 0-based array gets built based on the order of the data in the POST response, which is just:
myfruit[] apple
myfruit[] orange
Can't do that kind of trick with ids. A couple of answers in What are valid values for the id attribute in HTML? appear to quote the spec for HTML 4 (though they don't give a citation):
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
followed by any number of letters, digits ([0-9]), hyphens ("-"),
underscores ("_"), colons (":"), and periods (".").
So the characters [ and ] are not valid in either ids or names in HTML4 (they would be okay in HTML5). But as with so many things html, just because it's not valid doesn't mean it won't work or isn't extremely useful.
If you are using JavaScript/CSS, you must use the 'id' of a control to apply any CSS/JavaScript stuff on it.
If you use name, CSS won't work for that control. As an example, if you use a JavaScript calendar attached to a textbox, you must use the id of the text control to assign it the JavaScript calendar.

Do browsers preserve order of inputs with same name on GET/POST?

I have this HTML code with multiple inputs with the same name:
<input type="hidden" value="42" name="authors" />
<input type="hidden" value="13" name="authors" />
<input type="hidden" value="33" name="authors" />
The order of the values is important. Does the HTML spec define that user agents have to preserve this order, and if yes, do the common (market share > 1%) browsers follow this definition?
Bonus points if someone knows if WSGI and especially Django preserve the order server-side :-)
Thanks!
Yes, they should be sent in the order they appear according to the html rfc
See 8.2.1. The form-urlencoded Media Type:
The fields are listed in the order they appear in the document with the
name separated from the value by =
and the pairs separated from each
other by &. Fields with null values
may be omitted. In particular,
unselected radio buttons and
checkboxes should not appear in the
encoded data, but hidden fields with
VALUE attributes present should.
I've found in the spec for html 4.0 too:
For url encoded data:
The control names/values are listed in
the order they appear in the document.
The name is separated from the value
by = and name/value pairs are
separated from each other by &.
For multipart data (thanks #Chuck):
A "multipart/form-data" message
contains a series of parts, each
representing a successful control. The
parts are sent to the processing agent
in the same order the corresponding
controls appear in the document
stream. Part boundaries should not
occur in any of the data; how this is
done lies outside the scope of this
specification.
The HTML5 spec for application/x-www-form-urlencoded and text/plain lays out an algorithm that "For each entry in the form data set [...] Append", resulting in the same order.
As for multipart/form-data: "The order of parts must be the same as the order of fields in the form data set. Multiple entries with the same name must be treated as distinct fields."
This would not be complete without obtaining order of the form data set as derived from the document: the same spec defines an algorithm for constructing the form data set that "Loop: For each element field in controls, in tree order, run the following substeps and only skip or Append an entry.
Therefore for HTML5-compliant user agents, whatever the encoding, the non-skipped parameters are tree-ordered, with duplicates allowed.

Is it safe to display user input as input values without sanitization?

Say we have a form where the user types in various info. We validate the info, and find that something is wrong. A field is missing, invalid email, et cetera.
When displaying the form to the user again I of course don't want him to have to type in everything again so I want to populate the input fields. Is it safe to do this without sanitization? If not, what is the minimum sanitization that should be done first?
And to clearify: It would of course be sanitized before being for example added to a database or displayed elsewhere on the site.
No it isn't. The user might be directed to the form from a third party site, or simply enter data (innocently) that would break the HTML.
Convert any character with special meaning to its HTML entity.
i.e. & to &, < to <, > to > and " to " (assuming you delimit your attribute values using " and not '.
In Perl use HTML::Entities, in TT use the html filter, in PHP use htmlspecialchars. Otherwise look for something similar in the language you are using.
It is not safe, because, if someone can force the user to submit specific data to your form, you will output it and it will be "executed" by the browser. For instance, if the user is forced to submit '/><meta http-equiv="refresh" content="0;http://verybadsite.org" />, as a result an unwanted redirection will occur.
You cannot insert user-provided data into an HTML document without encoding it first. Your goal is to ensure that the structure of the document cannot be changed and that the data is always treated as data-values and never as HTML markup or Javascript code. Attacks against this mechanism are commonly known as "cross-site scripting", or simply "XSS".
If inserting into an HTML attribute value, then you must ensure that the string cannot cause the attribute value to end prematurely. You must also,of course, ensure that the tag itself cannot be ended. You can acheive this by HTML-encoding any chars that are not guaranteed to be safe.
If you write HTML so that the value of the tag's attribute appears inside a pair of double-quote or single-quote characters then you only need to ensure that you html-encode the quote character you chose to use. If you are not correctly quoting your attributes as described above, then you need to worry about many more characters including whitespace, symbols, punctuation and other ascii control chars. Although, to be honest, its arguably safest to encode these non-alphanumeric chars anyway.
Remember that an HTML attribute value may appear in 3 different syntactical contexts:
Double-quoted attribute value
<input type="text" value="**insert-here**" />
You only need to encode the double quote character to a suitable HTML-safe value such as "
Single-quoted attribute value
<input type='text' value='**insert-here**' />
You only need to encode the single quote character to a suitable HTML-safe value such as ‘
Unquoted attribute value
<input type='text' value=**insert-here** />
You shouldn't ever have an html tag attribute value without quotes, but sometimes this is out of your control. In this case, we really need to worry about whitespace, punctuation and other control characters, as these will break us out of the attribute value.
Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and | (and more). [para lifted from OWASP]
Please remember that the above rules only apply to control injection when inserting into an HTML attribute value. Within other areas of the page, other rules apply.
Please see the XSS prevention cheat sheet at OWASP for more information
Yes, it's safe, provided of course that you encode the value properly.
A value that is placed inside an attribute in an HTML needs to be HTML encoded. The server side platform that you are using should have methods for this. In ASP.NET for example there is a Server.HtmlEncode method, and the TextBox control will automatically HTML encode the value that you put in the Text property.

Post newline/carriage return as hidden field value

I need to post multi-line data via a hidden field. The data will be viewed in a textarea after post. How can I post a newline/carriage return in the html form?
I've tried \r\n but that just posts the actual "\r\n" data
<input type="hidden" name="multiline_data" value="line one\r\nline two" />
Is there a way to do this?
Instead of using
<input type="hidden">
Try using
<textarea style="visibility:hidden;position:absolute;">
While new lines (Carriage Return & Line Feed) are technically allowed in <input>'s hidden state, they should be escaped for compatibility with older browsers. You can do this by replacing all Carriage Returns (\u000D or \r) and all Line Feeds (\u000A or \n) with proprietary strings that are recognized by your application to be a Carriage Return or New Line (and also escaped, if present in the original string).
Simply character entities don't work here, due to non-conforming browsers possibly knowing
and 
 are new lines and stripping them from the value.
Example
For example, in PHP, if you were to echo the passed value to a textarea, you would include the newlines (and unescaped string).
<textarea>Some text with a \ included
and a new line with \r\n as submitted value</textarea>
However, in PHP, if you were to echo the value to the value attribute of an <input> tag, you would escape the new lines with your proprietary strings (e.g. \r and \n), and escape any instances of your proprietary strings in the submitted value.
<input type="hidden" value="Some text with a \\ included\r\nand a new line\\r\\n as submitted value">
Then, before using the value elsewhere (inserting into a database, emailing, etc), be sure to unescape the submitted value, if necessary.
Reassurance
As further reassurance, I asked the WHATWG, and Ian Hickson, editor of the HTML spec currently, replied:
bfrohs Question about <input type=hidden> -- Are Line Feeds and Carriage Returns allowed in the value? They are specifically disallowed in Text state and Search state, but no mention is made for Hidden state. And, if not, is there an acceptable HTML solution for storing form data from a textarea?
Hixie yes, they are allowed // iirc // for legacy reasons you may wish to escape them though as some browsers normalise them away // i forget if we fixed that or not // in the spec
Source
Depends on the character set really but
should be linefeed and 
 should be carriage return. You should be able to use those in the value attribute.
You don't say what this is for or what technology you're using, but you need to be aware that you can't trust the hidden field to remain with value="line one
line two", because a hostile user can tamper with it before it gets sent back in the POST. Since you're putting the value in a <textarea> later, you will definitely be subject to, for example, cross site scripting attacks unless you verify and/or sanitize your "multiline_data" field contents before you write it back out.
When writing a value into a hidden field and reading it back, it's usually better to just keep it on the server, as an attribute of the session, or pageflow, or whatever your environment provides to do this kind of thing.