Why does HTML5 form-validation allow emails without a dot? - html

I'm writing a very simple mock-up to demonstrate some HTML5 form-validation. However, I noticed the email validation doesn't check for a dot in the address, nor does it check for characters following said dot.
In other words, "john@doe" is considered valid, when it's clearly not a valid email address; "doe" isn't a domain.
This is how I'm coding my email field:
<input type="email" required />
Is that not enough?
Check this fiddle to see what I mean.
Note: I know how to accomplish this via a RegEx pattern instead. I'm just wondering how someone could get away with using the email type instead.

In theory, an address does not need a "." in it.
Technically, things such as:
user@com
user@localserver
user@[IPv6:2001:db8::1]
are all valid email addresses.
So the standard HTML5 validation errs on the side of allowing valid addresses, including uncommon dotless ones.
For some easy to read explanations (Instead of reading through the standards):
http://en.wikipedia.org/wiki/Email_address#Examples
Update from a comment: ICANN banned so-called "dotless" domains in 2013, but since that doesn't affect every case listed above, allowing "dotless" addresses is still valid.

Because a@b is a valid email address (e.g. localhost is a valid domain). See http://en.wikipedia.org/wiki/Email_address#Examples
Also, keep in mind that you should always validate input on the server. Client-side validation should only be used to give feedback to the user and should not be relied on, since it can easily be bypassed.

Try adding this to the input
pattern="[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,63}$"
Fiddle

RFC 822, section 6, gives the specification of an address in augmented Backus-Naur Form (BNF):
addr-spec = local-part "@" domain
local-part = word *("." word)
domain = sub-domain *("." sub-domain)
Using this specification, a@b is a valid address.
UPDATE
To answer Trejkaz's comment, I have added the following definitions. They show that SPACE is allowed, but only inside a quoted string.
word = atom / quoted-string
atom = 1*<any CHAR except specials, SPACE and CTLs>
quoted-string = <"> *(qtext/quoted-pair) <">
SPACE = <ASCII SP, space>
CTL = <any ASCII control character and DEL>
qtext = <any CHAR excepting <">, "\" & CR, and including linear-white-space>
quoted-pair = "\" CHAR
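As a rough illustration of the grammar above, here is a minimal JavaScript sketch that handles only the atom form of word (no quoted strings, no domain literals). The function name isAtomAddrSpec is made up for this example, and the character class is my own reading of "any CHAR except specials, SPACE and CTLs", so treat it as an approximation rather than a full RFC 822 parser.

```javascript
// Printable ASCII minus the RFC 822 specials ( ) < > @ , ; : \ " . [ ]
// (space and control characters are excluded as well).
const ATOM = "[A-Za-z0-9!#$%&'*+\\-/=?^_`{|}~]+";

// addr-spec = local-part "@" domain, where both sides are dot-separated atoms.
const DOTTED_ATOMS = `${ATOM}(?:\\.${ATOM})*`;
const ADDR_SPEC = new RegExp(`^${DOTTED_ATOMS}@${DOTTED_ATOMS}$`);

// Hypothetical helper: true if the value fits the simplified grammar.
function isAtomAddrSpec(value) {
  return ADDR_SPEC.test(value);
}

console.log(isAtomAddrSpec("a@b"));                  // true
console.log(isAtomAddrSpec("john@doe"));             // true
console.log(isAtomAddrSpec("john doe@example.com")); // false (unquoted space)
console.log(isAtomAddrSpec("a..b@c"));               // false (empty word)
```

Nothing in the simplified grammar asks for a dot in the domain, which is why a@b passes.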

This MDN page shows the regex browsers should use to validate the email:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input/email#Validation
You can slightly change this regex to require at least one dot in the domain name: change the star * at the end of the regex to a plus +. Then use that regex as the pattern attribute:
<form>
<input
type="email"
pattern="^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+$"
title="Valid e-mail address including top-level domain"
required
/>
<button type="submit">Test</button>
</form>
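To see what the star-to-plus change does outside the browser, here is a small JavaScript sketch that builds both variants from the same pieces and runs a few addresses through them. The variable names are mine; the sub-expressions are copied from the regex above.

```javascript
// Building blocks copied from the regex quoted above.
const LOCAL = "[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+";
const LABEL = "[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?";

// Original form: zero or more ".label" groups after the first label.
const specRegex = new RegExp(`^${LOCAL}@${LABEL}(?:\\.${LABEL})*$`);
// Modified form: at least one ".label" group, so a dot is required.
const dotRequiredRegex = new RegExp(`^${LOCAL}@${LABEL}(?:\\.${LABEL})+$`);

for (const address of ["john@doe", "john@doe.com", "john@doe."]) {
  console.log(address, specRegex.test(address), dotRequiredRegex.test(address));
}
// john@doe true false
// john@doe.com true true
// john@doe. false false
```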

You can customize the pattern of the email field:
input:valid {
border-color: green
}
input:invalid {
border-color: red
}
Email:
<input type="email" required value="a@b.c" /><br>
Non-dots Email:
<input type="email" required pattern="[^.]+@[^.]+" value="a@b.c" />

Here is how you can do it with html5 using regex pattern. You can also include a custom message to display.
<form>
<input type="email" value="paul@test" required pattern="[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,63}$" title="Hey, you are missing domain part in the email !!!"/>
<button type="submit">Click Me</button>
</form>

Hostnames without a TLD appear to be valid.
I say "appear" because there is this 2013 ICANN prohibition on dotless domains . . .
At its meeting on 13 August 2013, the ICANN Board New gTLD Program Committee (NGPC) adopted a resolution affirming that "dotless domain names" are prohibited.
. . . but judging from real world experience, it appears to have never been enforced.
Regardless, PHP's FILTER_VALIDATE_EMAIL filter (used with filter_var) doesn't allow dotless domain names.
So here's a simple back-end validation set-up that covers both your email and required fields:
if (empty($required_field) OR empty($another_required_field) OR
!filter_var($email_field, FILTER_VALIDATE_EMAIL)) {
// error handling here
exit;
}
While the "malformed" email may get past the browser, it won't get past the server.
References:
FILTER_VALIDATE_EMAIL definition
List of valid and invalid email addresses

This pattern always works for me.
The text must be in lowercase, but I think it covers more or less most emails: pattern="[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$"

Related

HTML5 Pattern to email field

I am learning how HTML5 patterns work, and I am trying to validate an email field.
The problem is that when I add the '@' to the pattern, the field no longer considers the input valid. Without the '@', the pattern works perfectly.
Requirements of the validation:
-> starts with 5 letters or digits
-> then an @
-> continues with between 2 and 10 letters or digits
-> then a .
-> must end with 'es', 'org' or 'com'
CODE:
<input type="text" name="email" id="email" pattern='^[a-zA-Z0-9]{5}@[a-zA-Z0-9]{2,10}.(es|com|org)$'>
PS: I know the email field type exists, but I'm testing HTML5 patterns.
The WHATWG community shared this JavaScript- and Perl-compatible regular expression that matches the HTML5 type="email".
https://html.spec.whatwg.org/multipage/input.html#valid-e-mail-address
/^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
You can use type="email" in HTML5, which will validate the email. To validate against a specific pattern, though, you need to write a regex in jQuery/JavaScript.
https://www.w3resource.com/javascript/form/email-validation.php

HTML5 Email validation pattern does not validate .edu.au, accepts even just edu.a

I'm supposed to be accepting only usyd.edu.au email addresses. The validation works, but it also works with usyd.edu.a emails. I want it to accept only when complete .au is entered. The pattern I have which works so far is:
<input type="email" placeholder="Email" name="txtName" id="txtEmail" pattern="[a-z0-9._%+-]+@[usyd]+\.[edu]+\.[au]"/>
Close but not right.
[a-z0-9._%+-]+@[usyd]+\.[edu]+\.[au]
[a-z0-9._%+-]+ is simple enough: allow one or more of a-z, 0-9, ., _, %, + or -
@ allows only @
[usyd]+ allows one or more of u, s, y or d, which could be just 'u'
\. allows only .
But the suggestion from Austen is also wrong:
[a-z0-9._%+-]+@(usyd)+\.(edu)+\.(au)
(usyd)+ would allow you to enter 'usydusydusyd'
(edu)+ would allow you to enter 'edueduedueduedu'
There is no reason for the parentheses, and you definitely do not want the '+', since that allows one or more repetitions.
Instead you want this:
^[a-z0-9._%+-]+@usyd\.edu\.au$
This makes sure that nothing odd exists at the beginning of the string (the '^') and that the string ends with '@usyd.edu.au' (the '$').
input:invalid {
background: red;
}
<input type="email" placeholder="Email" name="txtName" id="txtEmail" pattern="[a-z0-9._%+-]+@usyd\.edu\.au"/>
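To compare the three patterns discussed above without a browser, here is a small JavaScript sketch. It assumes the pattern attribute has to match the whole value, which I approximate by wrapping the pattern in ^(?: ... )$; the helper name matchesPattern is made up for this example.

```javascript
// Approximation of how browsers apply the pattern attribute: the whole
// value has to match, as if the pattern were wrapped in ^(?: ... )$.
function matchesPattern(pattern, value) {
  return new RegExp(`^(?:${pattern})$`).test(value);
}

const original = "[a-z0-9._%+-]+@[usyd]+\\.[edu]+\\.[au]"; // character classes
const grouped = "[a-z0-9._%+-]+@(usyd)+\\.(edu)+\\.(au)";  // repeated groups
const literal = "[a-z0-9._%+-]+@usyd\\.edu\\.au";          // plain literals

console.log(matchesPattern(original, "jo@usyd.edu.a"));     // true
console.log(matchesPattern(original, "jo@u.e.u"));          // true
console.log(matchesPattern(grouped, "jo@usydusyd.edu.au")); // true
console.log(matchesPattern(literal, "jo@usyd.edu.au"));     // true
console.log(matchesPattern(literal, "jo@usyd.edu.a"));      // false
console.log(matchesPattern(literal, "jo@usydusyd.edu.au")); // false
```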
You almost got it! If you change the square brackets [] in the last 3 places in your regex to be round brackets () then it will work as expected.
ex: [a-z0-9._%+-]+@(usyd)+\.(edu)+\.(au)
Here's why:
[au] will match a SINGLE character found within the brackets. (Either an a, or u)
(au) will match EVERYTHING found inside the brackets. (Exactly au)

Is both pattern and type="email" used in conjunction an issue?

HTML5 email type & patterns
Are there any issues, conflict or otherwise, between using both the new HTML5 type values (e.g. email, tel, etc.) in conjunction with the pattern attribute.
I'm not referring to HTML5 browser compatibility—only the direct effect the new values for these attributes have when used in conjunction with the pattern attribute.
I'll put a few examples in for clarity using type="email"
A. Type attribute only.
<input type="email">
B. Only pattern attribute
<input type="text" pattern="[email regex]">
C. Email & Pattern attributes used together
<input type="email" pattern="[email regex]">
I feel like you gain more semantic value with the new HTML5 type values; however, a regex gives much more control, as email@localhost is valid when only the email type is used.
If there's a duplicate, my apologies; I looked around but didn't see this particular question.
EDIT
I found a mention of possible negative effects when using both in conjunction with each other. However, I'm unsure how credible the source is.
As both ways of validating email addresses has their pros and cons it is up to you to decide which one to use. You should not try to use them both at the same time as this might induce a clash in browsers that support both features. Using type=”email” has the advantage that it is semantically correct both using the pattern attribute has the advantage that there are several easy-to-use polyfills on the web which ensures support for a greater range of audience.
Source
Just be sure to test thoroughly if both are used in unison. I'll update the question if I find any negative side effects.
Necessity
The pattern attribute shouldn't be necessary on any browser which fully conforms to the HTML5 specification on how the various type states are implemented.
For example, this is how the type=email input element should be implemented:
If the multiple attribute isn't specified...
While the value of the element is neither the empty string nor a single valid e-mail address, the element is suffering from a type mismatch.
If the multiple attribute is specified...
The value attribute, if specified, must have a value that is a valid e-mail address list.
— HTML5 Specification's Email State (type=email)
In really basic terms, this means the element suffers from a type mismatch unless its value is either the empty string or a single valid email address. The browser should therefore validate the email address even when no pattern attribute is present. If the pattern attribute is present, the value has to satisfy both the browser's built-in check and the specified regular expression.
Usefulness
Despite not being necessary, you may still want to use the pattern attribute to only catch certain varieties of input. For instance, admin@mailserver1 is a valid email address (according to Wikipedia), and you may want to explicitly only allow email addresses which have a TLD. The same applies to region-specific phone numbers.
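As a rough model of how the two checks combine, here is a JavaScript sketch that mimics the typeMismatch and patternMismatch flags for a type="email" input. It is only a simulation under my own assumptions: checkEmailInput is a made-up helper, the email regex is the one published in the HTML spec, and real browsers may differ in details (for example in how they treat international domain names).

```javascript
// Email regex as published in the HTML specification.
const EMAIL_REGEX = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper: both constraints are skipped for an empty value;
// otherwise the value must pass the type check AND the optional pattern.
function checkEmailInput(value, pattern) {
  if (value === "") {
    return { typeMismatch: false, patternMismatch: false, valid: true };
  }
  const typeMismatch = !EMAIL_REGEX.test(value);
  const patternMismatch =
    pattern !== undefined && !new RegExp(`^(?:${pattern})$`).test(value);
  return { typeMismatch, patternMismatch, valid: !typeMismatch && !patternMismatch };
}

const requireDot = ".+@.+\\..+"; // example pattern that demands a dot after the @

console.log(checkEmailInput("admin@mailserver1", requireDot));
// { typeMismatch: false, patternMismatch: true, valid: false }
console.log(checkEmailInput("admin@example.com", requireDot));
// { typeMismatch: false, patternMismatch: false, valid: true }
console.log(checkEmailInput("not an email", ".*"));
// { typeMismatch: true, patternMismatch: false, valid: false }
```

In this model the pattern can only narrow what the email type already accepts; it cannot widen it.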
It appears as though, in a lot of browsers, the default browser functionality trumps any custom functionality. I ran into this issue when trying to account for international email addresses (non-alphanumeric languages).
$(document).ready(function() {
var $emails = $('li pre'),
form = $('form').get(0),
$input = $('input');
$emails.each(function() {
$input.val($(this).text());
if (form.checkValidity()) {
$(this).addClass('success').removeClass('fail');
} else {
$(this).removeClass('success').addClass('fail');
}
});
$input.val('');
});
.success {
color: green;
}
.fail {
color: red;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
Valid Email Addresses:
<ul>
<li><pre>Abc@example.com</pre></li>
<li><pre>Abc.123@example.com</pre></li>
<li><pre>user+mailbox/department=shipping@example.com</pre></li>
<li><pre>!#$%&'*+-/=?^_`.{|}~@example.com</pre></li>
<li><pre>"Abc@def"@example.com</pre></li>
<li><pre>"Fred Bloggs"@example.com</pre></li>
<li><pre>"Joe.\\Blow"@example.com</pre></li>
<li><pre>用户@例子.广告</pre></li>
<li><pre>उपयोगकर्ता@उदाहरण.कॉम</pre></li>
<li><pre>юзер@екзампл.ком</pre></li>
<li><pre>θσερ@εχαμπλε.ψομ</pre></li>
<li><pre>Dörte@Sörensen.example.com</pre></li>
</ul>
<form>
<input type="email" name="email" pattern=".+@.+" placeholder="XXXX@XXXX" title="XXXX@XXXX" required />
<button type="submit">
Submit
</button>
</form>
The browser testing results: https://www.browserstack.com/screenshots/0f88466bf4bd5fc4cdec39a7618ed8afc9c82806

Input field name starts with a number

I have an input field whose name is an MD5 string e.g.:
<input type="hidden" name="7815696ecbf1c96e6894b779456d330e" value="1">
Now I understand that having a number as the first letter in an input field name is generally bad practice, but are there any side-effects to this such as a certain browser won't send it in the POST request?
An ID attribute would have had to begin with a letter as per the HTML 4.01 W3C specification, however since the NAME attribute of input elements is of CDATA type (Source), this restriction does not apply.
One real restriction you get on NAME attributes is when you submit a form with the GET method, because in this case, form data must be restricted to ASCII codes (Source).
The HTML spec doesn't restrict the control name in any way. In fact it even says that the control name is URL-encoded and that spaces and non-alphanumeric characters are handled in a certain way, so obviously the designers anticipated names having an arbitrary format.
As far as I know, you should have no problem in any browser.
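To illustrate the URL-encoding point, here is a short JavaScript sketch that uses URLSearchParams as a stand-in for what a browser does when it serializes form fields. The second field name is invented for the example; real form submission has more moving parts, so take this as an approximation.

```javascript
// Encode two fields the way application/x-www-form-urlencoded data is written.
const fields = new URLSearchParams();
fields.append("7815696ecbf1c96e6894b779456d330e", "1"); // name starts with a digit
fields.append("name with spaces & symbols", "x");       // invented awkward name

const encoded = fields.toString();
console.log(encoded);
// 7815696ecbf1c96e6894b779456d330e=1&name+with+spaces+%26+symbols=x

// Decoding gives the original names back, leading digit included.
const decoded = new URLSearchParams(encoded);
console.log(decoded.get("7815696ecbf1c96e6894b779456d330e")); // 1
console.log(decoded.get("name with spaces & symbols"));       // x
```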
That said, you could consider prepending a short string to the name, for convenience:
e.g.,
<input type="hidden" name="h.7815696ecbf1c96e6894b779456d330e" value="1">
This may help in some cases.

Post newline/carriage return as hidden field value

I need to post multi-line data via a hidden field. The data will be viewed in a textarea after post. How can I post a newline/carriage return in the html form?
I've tried \r\n but that just posts the actual "\r\n" data
<input type="hidden" name="multiline_data" value="line one\r\nline two" />
Is there a way to do this?
Instead of using
<input type="hidden">
Try using
<textarea style="visibility:hidden;position:absolute;">
While new lines (Carriage Return & Line Feed) are technically allowed in <input>'s hidden state, they should be escaped for compatibility with older browsers. You can do this by replacing all Carriage Returns (\u000D or \r) and all Line Feeds (\u000A or \n) with proprietary strings that are recognized by your application to be a Carriage Return or New Line (and also escaped, if present in the original string).
Simply using character entities doesn't work here, because non-conforming browsers may recognise &#13; and &#10; as new lines and strip them from the value.
Example
For example, in PHP, if you were to echo the passed value to a textarea, you would include the newlines (and unescaped string).
<textarea>Some text with a \ included
and a new line with \r\n as submitted value</textarea>
However, in PHP, if you were to echo the value to the value attribute of an <input> tag, you would escape the new lines with your proprietary strings (e.g. \r and \n), and escape any instances of your proprietary strings in the submitted value.
<input type="hidden" value="Some text with a \\ included\r\nand a new line\\r\\n as submitted value">
Then, before using the value elsewhere (inserting into a database, emailing, etc), be sure to unescape the submitted value, if necessary.
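The escape and unescape steps described above could look roughly like this in JavaScript. The function names are made up, and the choice of \r, \n and \\ as the proprietary strings simply follows the example; any other unambiguous scheme would work just as well.

```javascript
// Escape backslashes first, so a literal "\n" typed by the user cannot be
// confused with an escaped line feed later on.
function escapeNewlines(text) {
  return text
    .replace(/\\/g, "\\\\")
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n");
}

// Undo the escaping in a single pass, so "\\n" (an escaped backslash
// followed by n) is not mistaken for an escaped line feed.
function unescapeNewlines(text) {
  return text.replace(/\\([\\rn])/g, (match, ch) =>
    ch === "r" ? "\r" : ch === "n" ? "\n" : "\\"
  );
}

const submitted = "Some text with a \\ included\r\nand a literal \\r\\n too";
const stored = escapeNewlines(submitted);

console.log(/[\r\n]/.test(stored));                  // false
console.log(unescapeNewlines(stored) === submitted); // true
```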
Reassurance
As further reassurance, I asked the WHATWG, and Ian Hickson, editor of the HTML spec currently, replied:
bfrohs Question about <input type=hidden> -- Are Line Feeds and Carriage Returns allowed in the value? They are specifically disallowed in Text state and Search state, but no mention is made for Hidden state. And, if not, is there an acceptable HTML solution for storing form data from a textarea?
Hixie yes, they are allowed // iirc // for legacy reasons you may wish to escape them though as some browsers normalise them away // i forget if we fixed that or not // in the spec
Source
It depends on the character set really, but &#10; should be a line feed and &#13; should be a carriage return. You should be able to use those in the value attribute.
You don't say what this is for or what technology you're using, but you need to be aware that you can't trust the hidden field to remain with value="line one&#13;&#10;line two", because a hostile user can tamper with it before it gets sent back in the POST. Since you're putting the value in a <textarea> later, you will definitely be subject to, for example, cross-site scripting attacks unless you verify and/or sanitize your "multiline_data" field contents before you write them back out.
When writing a value into a hidden field and reading it back, it's usually better to just keep it on the server, as an attribute of the session, or pageflow, or whatever your environment provides to do this kind of thing.
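If the value does have to make the round trip through the form, the sanitizing step mentioned above might look something like this JavaScript sketch. Both function names are hypothetical, and in practice you would normally rely on the escaping helper that your template engine or framework already provides.

```javascript
// Replace the characters that have special meaning in HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper that writes a submitted value back into a textarea.
function renderTextarea(name, value) {
  return `<textarea name="${escapeHtml(name)}">${escapeHtml(value)}</textarea>`;
}

const hostile = "line one\r\n</textarea><script>alert(1)</script>";
console.log(renderTextarea("multiline_data", hostile));
```

The line breaks pass through untouched, while the injected markup ends up as harmless text inside the textarea.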