What's a valid HTML5 document?

I've just been reading the HTML5 author spec.
It states that the <html>, <head> and <body> tags are optional.
Does that mean that you can leave them out completely and still have a valid HTML5 document?
If I'm interpreting this correctly, it means this should be completely valid:
<!DOCTYPE html>
<p>Hello!</p>
Is this correct?
You can check out the spec here:
http://dev.w3.org/html5/spec-author-view/syntax.html#syntax
"8.1.2.4 Optional tags" is the bit about it being OK to omit <html>, <head> and <body>.

The title element is indeed required, but as Jukka Korpela notes, it also must be non-empty. Furthermore, the content model of the title element is:
Text that is not inter-element whitespace.
Therefore, a title element containing just a space character is not valid HTML. You can check this in the W3C validator.
So, an example of a minimal and valid HTML5 document is the following:
<!doctype html><title>a</title>

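This was accepted as the minimal HTML5 document by older versions of the validator, although a whitespace-only title does not conform to the spec: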
<!doctype html><title> </title>

W3C HTML validator maintainer here. FYI, with regard to the validator behavior: as of today, the validator enforces the requirement in the HTML spec that the title element must contain at least one non-whitespace character:
http://validator.w3.org/nu/?doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%2C%3C%2521doctype%2520html%3E%3Ctitle%3E%2520%2520%2520%3C%252Ftitle%3E

While the <html>, <head> and <body> start and end tags are optional, the <title> tags are required, except in special circumstances, so no, your sample is not (ordinarily) valid.

Related

What is the necessity of <!DOCTYPE html> tag and what will happen if it is not present in the code? [duplicate]

The <!DOCTYPE html> tag must be present at the beginning of an HTML document. What is the reason for that? I've also noticed that my webpage works fine even without this tag.
Here is sample code with the tag.
<!doctype html>
<html>
<head>
</head>
<body>
Hello, world!
</body>
</html>
Here is the code without the tag.
<html>
<head>
</head>
<body>
Hello, world!
</body>
</html>
Both versions render the same result. What is the actual purpose of the tag, and what happens if it is missing from an HTML document?
The <!DOCTYPE> declaration must be the very first thing in your HTML document, before the <html> tag.
The <!DOCTYPE> declaration is not an HTML tag; it is an instruction to the web browser about what version of HTML the page is written in.
In HTML 4.01, the <!DOCTYPE> declaration refers to a DTD, because HTML 4.01 was based on SGML. The DTD specifies the rules for the markup language, so that the browsers render the content correctly.
HTML5 is not based on SGML, and therefore does not require a reference to a DTD.
source
DTD stands for Document Type Definition.
Also take a look at the previous answers.
First, the doctype declaration is not a tag but an instruction that tells the browser which version of HTML you are using.
In HTML 4.01, the doctype declaration refers to a DTD, which defines the complete rules for the markup language, because HTML 4.01 is based on SGML.
HTML5, the current version of HTML, is not based on SGML, so its doctype declaration does not need to reference a DTD.
It is still best to always add the doctype declaration, so that the browser knows which type of HTML to expect. Without it, browsers fall back to quirks mode, which emulates legacy rendering behavior, so a page may look the same in a simple test yet render differently once it uses more CSS.
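As a rough illustration of the difference between the two samples in the question, the following Python sketch checks whether a document's source begins with the short HTML5 doctype. The function name and the pattern are illustrative assumptions, not part of any library, and the check is deliberately simplified: it only tolerates a byte order mark and whitespace before the declaration.

```python
import re

# Simplified pattern (an assumption for this sketch): "<!DOCTYPE", whitespace,
# "html", optional whitespace, ">". Matching is case-insensitive.
_DOCTYPE = re.compile(r"<!doctype\s+html\s*>", re.IGNORECASE)


def starts_with_html5_doctype(source: str) -> bool:
    """Return True if the source begins with the short HTML5 doctype.

    Only a byte order mark and whitespace are allowed before it here.
    """
    stripped = source.lstrip("\ufeff").lstrip()
    return _DOCTYPE.match(stripped) is not None


with_doctype = "<!doctype html>\n<html><head></head><body>Hello, world!</body></html>"
without_doctype = "<html><head></head><body>Hello, world!</body></html>"

print(starts_with_html5_doctype(with_doctype))
print(starts_with_html5_doctype(without_doctype))
```

An HTML 4.01 doctype that references a DTD does not match this pattern, which is intentional for the sketch: it looks only for the short HTML5 form.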

What are the drawbacks of ignoring <html> and <body>

Is there any drawback to never using
<html> and <body>
on your web pages that are written in HTML and PHP?
I mean, everything works perfectly fine with or without them, so why use them?
They are explicitly optional in the spec (so the document will still be valid).
This has been true since the original spec (which says <!ELEMENT HTML O O (( HEAD | BODY | %oldstyle)*, PLAINTEXT?)>, O O meaning Start Tag Optional, End Tag Optional) through to the current spec (which says "An html element's start tag can be omitted if the first thing inside the html element is not a comment. An html element's end tag can be omitted if the html element is not immediately followed by a comment.").
They are only mandatory in XHTML since XML has no concept of optional tags.
I've never seen any browser or user-agent fail to handle them correctly in an HTML document. (Note that while the tags are optional, the elements are not, so browsers will insert HTML, HEAD and BODY elements even if the tags are missing, and any script which tries to find them in the DOM will still work.)
The only technical drawback is that you can't put attributes on tags which aren't there, and a lang attribute for the HTML element is useful.
Leaving them out can, however, confuse other people who have to maintain your code and don't know that the tags are optional.
Both <head> and <body> tags are optional in HTML5. In fact it is recommended by Google's HTML style guide to not use them:
<!-- Not recommended -->
<!DOCTYPE html>
<html>
<head>
<title>Spending money, spending bytes</title>
</head>
<body>
<p>Sic.</p>
</body>
</html>
<!-- Recommended -->
<!DOCTYPE html>
<title>Saving money, saving bytes</title>
<p>Qed.
Some drawbacks of omitting those tags:
It is drastically different from what developers typically learn, so it may cause some confusion.
A comment cannot be the first thing inside the html element when its <html> start tag is omitted.
Reiterating the optional nature of the tags from the spec:
An html element's start tag may be omitted if the first thing inside the html element is not a comment.
A body element's start tag may be omitted if the element is empty, or if the first thing inside the body element is not a space character or a comment, except if the first thing inside the body element is a meta, link, script, style, or template element.
See:
https://google.github.io/styleguide/htmlcssguide.xml?showone=Optional_Tags#Optional_Tags
https://html.spec.whatwg.org/multipage/syntax.html#syntax-tag-omission
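The two omission rules quoted above can be sketched as small Python predicates. This is only an illustration under stated assumptions: the function names are hypothetical, `inner` is assumed to be the markup that comes first inside the element, and the list of blocking elements is the one quoted in this answer, which may differ from the current spec.

```python
import re

# HTML "space characters": space, tab, line feed, form feed, carriage return.
SPACE_CHARACTERS = " \t\n\f\r"

# Elements that, per the rule quoted above, require an explicit <body> start
# tag when they are the first thing inside the body element.
BODY_BLOCKERS = {"meta", "link", "script", "style", "template"}


def can_omit_html_start_tag(inner: str) -> bool:
    """The <html> start tag may go unless the first thing inside is a comment."""
    return not inner.startswith("<!--")


def can_omit_body_start_tag(inner: str) -> bool:
    """Apply the quoted body rule to the markup that starts the body element."""
    if inner == "":
        return True  # an empty body element
    if inner[0] in SPACE_CHARACTERS or inner.startswith("<!--"):
        return False
    first_tag = re.match(r"<([A-Za-z][A-Za-z0-9]*)", inner)
    if first_tag and first_tag.group(1).lower() in BODY_BLOCKERS:
        return False
    return True


print(can_omit_body_start_tag("<p>Qed."))
print(can_omit_body_start_tag("<script>init()</script>"))
```

In the "Recommended" example above, the body begins with a p element, so its start tag can be left out under this rule.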
If you do not use <html> and <body>, your document will not be valid as XHTML (the tags are optional only in the HTML syntax), and some libraries or plugins that expect the tags in the source may not work either.

Does HTML5 require spaces between attributes that have quoted values?

HTML normally allows attributes to be written without spaces between them when the attributes have quoted values.
Example (Reference/Source):
In HTML-documents no White Spaces between Attributes are needed.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>no attribute space</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<p class="CLASS"title='TITLE'></p>
</body>
</html>
See the third-last line:
<p class="CLASS"title='TITLE'></p>
               ^^
Now, taking the same HTML chunk and changing the doctype to HTML 5 (<!DOCTYPE HTML>) makes the experimental W3C HTML 5 conformance checker report an error at exactly that point:
Validation Output: 1 Error
Error Line 9, Column 22: No space between attributes.
<p class="CLASS"title='TITLE'></p>
                ^
So I thought that HTML 5 is backwards compatible to how browsers deal with HTML in reality and browsers AFAIK deal with this well. So I'm a bit puzzled, at least. I also have trouble deciphering the (somewhat needlessly) complicated HTML 5 specs on this point, because what I did find (W3C again, see http://www.w3.org/TR/html-markup/syntax.html#syntax-attributes) does not say that this may or must be an error.
You are reading a discontinued, non-normative reference. If you look at the definition of the start tag in the specification (which is normative) it says:
Then, the start tag may have a number of attributes, the syntax for which is described below. Attributes must be separated from each other by one or more space characters.
So I thought that HTML 5 is backwards compatible to how browsers deal with HTML in reality and browsers AFAIK deal with this well.
Being compatible with real world markup is a design goal, but lots of things have been obsoleted and leaving out the space between attributes is something that almost never occurs intentionally.
Section 4.3, "Elements" of the document you link in the question says:
Optionally, one or more attributes, each of which must be preceded by
one or more space characters.
Using the official W3C HTML validator, missing spaces between attributes are reported as errors if you use the HTML5 doctype:
<!DOCTYPE html>
The output message is the following:
Line 9, Column 23: No space between attributes.
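To make the rule concrete, here is a minimal Python sketch that scans a single start tag and reports where a quoted attribute value is directly followed by the next attribute. It is not how the validator works internally; the function name is hypothetical, and the scanner assumes it is given one start tag only, with quoted attribute values.

```python
from typing import List

# Characters that may legitimately follow a closing quote inside a start tag:
# the HTML space characters, "/" (self-closing syntax) and ">" (end of tag).
ALLOWED_AFTER_VALUE = " \t\n\f\r/>"


def find_missing_attribute_spaces(start_tag: str) -> List[int]:
    """Return 0-based offsets where a space between attributes is missing."""
    offsets = []
    quote = None  # quote character of the attribute value being scanned, if any
    for i, ch in enumerate(start_tag):
        if quote is None:
            if ch in "\"'":
                quote = ch  # an attribute value starts here
        elif ch == quote:
            quote = None  # the attribute value ends here
            following = start_tag[i + 1 : i + 2]
            if following and following not in ALLOWED_AFTER_VALUE:
                offsets.append(i + 1)
    return offsets


print(find_missing_attribute_spaces("<p class=\"CLASS\"title='TITLE'>"))
print(find_missing_attribute_spaces("<p class=\"CLASS\" title='TITLE'>"))
```

For the tag from the question, the reported offset points at the `t` of `title`, the same spot the validator flags (its column numbers differ because they count from the start of the original source line).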

What should a basic html5 file include?

I was wondering what an HTML file should ALWAYS include besides <html>, <head> and <body>. I've seen many things, so I'm not sure what ALWAYS needs to be included.
The html, head and body tags are actually optional in HTML5. The only required element, aside from the doctype declaration, is title. So the following would be a completely valid HTML5 document:
<!DOCTYPE html>
<title>example</title>
You can validate that using the W3C Validator.
Actually, the start and end tags of the html and body elements are optional only under certain conditions:
The <html> start tag is optional unless the first thing inside the html element is a comment.
The </html> end tag is optional unless the html element is immediately followed by a comment.
See "html tag, optional vs. required".
The body start and end tags may be omitted under similar conditions.

Is this minimalist HTML5 markup valid?

<!DOCTYPE html>
<meta charset="utf-8">
<body>
Hello, world!
SOURCE FOR CODE
If so, besides removing "Hello, world!", is there any tag that can be removed while keeping the document valid, and how do you know it's still valid?
It's not valid. To check, you can run it through the W3C Validator.
The error is: Element head is missing a required instance of child element title.
...
UPDATE
As vcsjones stated, the head element is optional; it is the title element that is required. Credit to mootinator for pointing out that the body is also optional.
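So the simplest document will be the following (note that, to be valid, the title must additionally contain at least one non-whitespace character):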
<!DOCTYPE html>
<title></title>
(Assuming the HTML syntax of HTML5.)
Note that in some situations the title element is optional, too.
From HTML5’s definition of head:
The title element is a required child in most situations, but when a higher-level protocol provides title information, e.g. in the Subject line of an e-mail when HTML is used as an e-mail authoring format, the title element can be omitted.
So the minimal markup for a document that gets a title from a "higher-level protocol" is this:
<!DOCTYPE html>
If the document is an iframe srcdoc document, it’s this (assuming a title is provided by the container document):
<html>
And for a stand-alone document it’s this (the title element needs some actual content, as noted by kapep, so the "…" is just an example):
<!DOCTYPE html>
<title>…</title>
The title element can't be empty or consist only of whitespace. So if the document is in a context where the title element is required, you will have to set a valid title value.
The title content model is defined as "Text that is not inter-element whitespace".
"Empty Text nodes and Text nodes consisting of just sequences of [space characters]" are inter-element whitespace. Space characters are space, tab, line feed, form feed and carriage return.
If the title element is empty, the W3C Validator complains that "Element title must not be empty". At the time of writing, the Validator accepts a title that contains only spaces, even though that is not correct according to the spec.
It is valid if you add another non-space character:
<!DOCTYPE html>
<title>x</title>
You could use whitespace-like characters that are not HTML space characters, such as a no-break space or a zero-width no-break space, if you want to fake an "empty" title.
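The content model described above can be sketched as a small Python check, assuming the five space characters listed earlier (space, tab, line feed, form feed, carriage return). The function name is hypothetical; this is an illustration of the rule, not the validator's implementation.

```python
# HTML "space characters" as listed above.
SPACE_CHARACTERS = " \t\n\f\r"


def is_valid_title_text(text: str) -> bool:
    """True if the text has at least one character that is not an HTML space
    character, i.e. it is not inter-element whitespace."""
    return any(ch not in SPACE_CHARACTERS for ch in text)


for candidate in ["", " ", " \t\n", "x", "\u00a0"]:
    print(repr(candidate), is_valid_title_text(candidate))
```

Under this definition a no-break space (U+00A0) counts as content, which is why it can be used to fake an "empty" title.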
The smallest HTML document for which the Nu Html Checker (the only HTML validator currently endorsed by the WHATWG) produces neither errors nor warnings is the following:
<!DOCTYPE html>
<html lang="">
<title>x</title>