Does HTML5 require spaces between attributes that have quoted values?

HTML normally allows attributes to be written with no spaces between them when the attributes have values and those values are quoted.
Example (Reference/Source):
In HTML-documents no White Spaces between Attributes are needed.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>no attribute space</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<p class="CLASS"title='TITLE'></p>
</body>
</html>
See the third-last line:
<p class="CLASS"title='TITLE'></p>
               ^^
When I take the same HTML and change the doctype to HTML 5 (<!DOCTYPE HTML>), the experimental W3C HTML 5 conformance checker reports an error at exactly that spot:
Validation Output: 1 Error
Error Line 9, Column 22: No space between attributes.
<p class="CLASS"title='TITLE'></p>
^
So I thought that HTML 5 is backwards compatible to how browsers deal with HTML in reality and browsers AFAIK deal with this well. So I'm a bit puzzled, at least. I also have trouble deciphering the (somewhat needlessly) complicated HTML 5 specs on this point, because what I did find (W3C again, see http://www.w3.org/TR/html-markup/syntax.html#syntax-attributes) does not say that this may or must be an error.

You are reading a discontinued, non-normative reference. If you look at the definition of the start tag in the specification (which is normative) it says:
Then, the start tag may have a number of attributes, the syntax for which is described below. Attributes must be separated from each other by one or more space characters.
So I thought that HTML 5 is backwards compatible to how browsers deal with HTML in reality and browsers AFAIK deal with this well.
Being compatible with real world markup is a design goal, but lots of things have been obsoleted and leaving out the space between attributes is something that almost never occurs intentionally.
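As a minimal sketch of the fix, here is the question's sample with the HTML5 doctype and a single space added between the two attributes. The lang value, the meta charset line and the title text are placeholders chosen for illustration, not taken from the question.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>attribute space</title>
</head>
<body>
<!-- A space now separates class="..." from title='...' -->
<p class="CLASS" title='TITLE'></p>
</body>
</html>
```

Mixing double and single quotes is unrelated to the error; only the missing separator between the attributes is.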

Section 4.3, "Elements" of the document you link in the question says:
Optionally, one or more attributes, each of which must be preceded by
one or more space characters.

Using the official W3C HTML Validator, missing spaces between attributes are reported as errors if you use the HTML5 doctype:
<!DOCTYPE html>
The output message is the following:
Line 9, Column 23: No space between attributes.

Related

Breaking multiple values in an attribute into multiple lines?

Let me explain by example:
<html lang="en-US" prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# article: http://ogp.me/ns/article#">
...
</html>
As you can see, the prefix attribute in the html tag has multiple definitions. How do I break them into multiple lines? (Considering that a line break is equivalent to a space when minified back into a single line... it's kinda tough.)
Is this considered normal?
<html lang="en-US" prefix="
og: http://ogp.me/ns#
fb: http://ogp.me/ns/fb#
article: http://ogp.me/ns/article#
">
EDIT: Facebook does it like this: https://developers.facebook.com/docs/payments/product/
<html lang="en-US" prefix=
"og: http://ogp.me/ns#
fb: http://ogp.me/ns/fb#
article: http://ogp.me/ns/article#">
The attribute values are different. Each whitespace character is stored in the DOM. Whether the difference matters depends on the definition of the attribute. Many attributes, such as class, are defined as taking a set of whitespace-separated tokens as value, and for them, the amount and type of whitespace characters between tokens, or before the first token and after the last token, does not matter.
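To illustrate with class, here is a small sketch (the class names, style rule and text are made up for the example). The two paragraphs have different attribute strings in the DOM, but both carry the same set of tokens, so the same selector should match both.

```html
<!DOCTYPE html>
<html lang="en">
<title>Whitespace in token-list attributes</title>
<style>
  /* Matches any element whose class list contains both tokens */
  .note.warning { font-weight: bold; }
</style>
<!-- Attribute value is exactly "note warning" -->
<p class="note warning">Single space between tokens.</p>
<!-- Attribute value keeps the line breaks and indentation,
     but the token set is still {note, warning} -->
<p class="
   note
   warning
">Line breaks and indentation between tokens.</p>
```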
The prefix attribute is not present in HTML specifications or drafts. The relevant specification is RDFa Core 1.1, which defines the prefix attribute as
“a white space separated list of prefix-name IRI pairs” and contains examples like
<html
xmlns="http://www.w3.org/1999/xhtml"
prefix="foaf: http://xmlns.com/foaf/0.1/
dc: http://purl.org/dc/terms/"
>
So for the prefix attribute, formatting as in the question is acceptable. (Whether it is “normal” in a sense other than “conforming” is a matter of opinion.)
I don't think it's all that "normal". In general, like the comments to your question suggest, it's technically possible but you're opening your page up to (unnecessary) potential parsing errors.
Look to the HTML WG's example regarding using newlines in the title attribute as a concrete example of this.
Furthermore, I was unable to find/remember a single case where I'd seen this used on purpose, with the exception of SVG (but that's not technically HTML).
However, if you run this sample through the W3C's validator, it'll pass with no errors or warnings in regards to multi-line attributes:
<!DOCTYPE html>
<html lang="en-US" prefix="
og: http://ogp.me/ns#
fb: http://ogp.me/ns/fb#
article: http://ogp.me/ns/article#
">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Hello</title>
</head>
<body>
<h1>Hello World!</h1>
</body>
</html>
Generally, it's better to be safe than sorry. Since I couldn't find any examples to the contrary in this case, I'd venture to say that other developers would agree (Do by all means correct me if I'm wrong).

Is this minimalist HTML5 markup valid?

<!DOCTYPE html>
<meta charset="utf-8">
<body>
Hello, world!
SOURCE FOR CODE
If so, besides removing "Hello, world!", is there any tag that can be removed while the document stays valid, and how do you know it's still valid?
It's not valid. To check it, you can run it through the W3C Validator.
The error is: Element head is missing a required instance of child element title.
...
UPDATE
As vcsjones stated, the head element is optional. It is the title element that is required. Credit to mootinator for pointing out that the body is also optional.
So the simplest valid document will be:
<!DOCTYPE html>
<title></title>
(Assuming the HTML syntax of HTML5.)
Note that in some situations the title element is optional, too.
From HTML5’s definition of head:
The title element is a required child in most situations, but when a higher-level protocol provides title information, e.g. in the Subject line of an e-mail when HTML is used as an e-mail authoring format, the title element can be omitted.
So the minimal markup for a document that gets a title from a "higher-level protocol" is this:
<!DOCTYPE html>
If the document is the value of an iframe-srcdoc it’s this (assuming a title is provided by the container document):
<html>
And for a stand-alone document it’s this (the title element needs some actual content, as noted by kapep, so the "…" is just an example):
<!DOCTYPE html>
<title>…</title>
The title tag can't be empty or only consist of whitespace. So if the document is in a context where the title tag is required, you will have to set a valid title value.
The title content model is defined as "Text that is not inter-element whitespace".
"Empty Text nodes and Text nodes consisting of just sequences of [space characters]" are inter-element whitespace. Space characters are space, tab, line feed, form feed and carriage return.
If the title tag is empty, the W3C Validator complains that "Element title must not be empty". The Validator accepts a title that contains only spaces, even though that is not correct according to the spec.
It is valid if you add another non-space character:
<!DOCTYPE html>
<title>x</title>
You could use other whitespace-like characters that are not HTML space characters, such as no-break space or zero-width no-break space, if you want to fake an "empty" title.
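A rough sketch of that trick, assuming a no-break space written as the &nbsp; character reference. This follows from the content model quoted above; it has not been checked against any particular validator version.

```html
<!DOCTYPE html>
<!-- &nbsp; is U+00A0 NO-BREAK SPACE, which is not one of the
     HTML space characters (space, tab, LF, FF, CR) -->
<title>&nbsp;</title>
```

The title will look blank in most places it is displayed, which is rarely helpful to users, so treat this as a curiosity rather than a recommendation.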
The smallest HTML document for which the Nu Html Checker (the only HTML validator currently endorsed by the WHATWG) does not produce any errors or warnings is the following:
<!DOCTYPE html>
<html lang="">
<title>x</title>

What's a valid HTML5 document?

I've just been reading the HTML5 author spec.
It states that the <html>, <head> and <body> tags are optional.
Does that mean that you can leave them out completely and still have a valid HTML5 document?
If I'm interpreting this correctly, it means this should be completely valid:
<!DOCTYPE html>
<p>Hello!</p>
Is this correct?
You can check out the spec here:
http://dev.w3.org/html5/spec-author-view/syntax.html#syntax
"8.1.2.4 Optional tags" is the bit about it being OK to omit <html>, <head> and <body>
The title element is indeed required, but as Jukka Korpela notes, it also must be non-empty. Furthermore, the content model of the title element is:
Text that is not inter-element whitespace.
Therefore, having just a space character in the title element is not considered valid HTML. You can check this in the W3C validator.
So, an example of a minimal and valid HTML5 document is the following:
<!doctype html><title>a</title>
This is the minimal HTML5-valid document:
<!doctype html><title> </title>
W3C HTML validator maintainer here. FYI with regard to the validator behavior, as of today, the validator now enforces the requirement in the HTML spec that the title element must contain at least one non-whitespace character -
http://validator.w3.org/nu/?doc=data%3Atext%2Fhtml%3Bcharset%3Dutf-8%2C%3C%2521doctype%2520html%3E%3Ctitle%3E%2520%2520%2520%3C%252Ftitle%3E
While the <html>, <head> and <body> start and end tags are optional, the <title> tags are required, except in special circumstances, so no, your sample is not (ordinarily) valid.
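For comparison, here is the question's sample with only a title added, which is the one piece the answers above identify as missing. The title text "Hello" is a placeholder.

```html
<!DOCTYPE html>
<title>Hello</title>
<p>Hello!</p>
```

The <html>, <head> and <body> tags are still omitted. The parser infers those elements, so the resulting DOM contains them even though the source does not.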

What do I need to put at the top of my HTML?

I have the following at the top of my document:
<html class="js" lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
Can someone tell me if I need the xmlns part? I am not 100% sure, but I think this is
doing some things to my tags. For example, when I look at the tag I see the
following with Firebug:
element.style {
height: 100%;
}
If I just have this at the top of my code, then I don't see the element.style rule:
<html class="js" lang="en">
Just to give some background. I'm developing an MVC application for use with English. It uses HTML5 things in a few places.
For the current HTML spec (which is HTML5), you do not need any fancy attributes; the following is adequate:
<!DOCTYPE html>
<html>
<head>
<title>Html page</title>
</head>
<body>
<p>This is an example Html page.</p>
</body>
</html>
Also, if you are not using the html5 spec, you should.
If you are using HTML5, then the extra attributes probably should not be there, as they are no longer needed. HTML5 uses a much cleaner syntax. :)
Here is the W3 documentation about this
You do not need to give those attributes in the tag.
<html>
</html>
will work fine in both HTML5 and HTML 4.01.
The xmlns attribute may be needed if the document will be processed by XML tools that do not necessarily use the HTML namespace as the default. You can see this by saving the document locally and opening it in Firefox; if the xmlns attribute is missing, Firefox will display the document source, just with XML syntax coloring, because it treats all tags as pure markup with no meaning or default rendering rules.
If the document is served as HTML (Content-Type: text/html), then browsers will imply HTML semantics (HTML namespace).
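As a sketch of the XML case, assuming the file is served as application/xhtml+xml or otherwise opened by an XML parser, the namespace declaration is what marks the elements as HTML. The title and paragraph text are placeholders.

```html
<!DOCTYPE html>
<!-- Parsed as XML: xmlns puts every element in the HTML namespace -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Html page</title>
</head>
<body>
<p>This is an example Html page.</p>
</body>
</html>
```

When the same file is served as text/html, the xmlns attribute is harmless but redundant, since the HTML parser assigns the namespace on its own.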
Regarding the question you asked in the heading, you should put a doctype declaration, such as <!DOCTYPE html>, for all new documents. Otherwise you will trigger Quirks Mode, which means a large and undocumented set of oddities.

Why won't <iframe> elements validate in HTML 4.01?

I was just checking to see if it was valid to put an <iframe> element inside a <noscript> element as a fall back for displaying dynamic content. It validated fine with the HTML 5 doctype, but for HTML 4.01, I get the following error:
Line 9, Column 35: element "IFRAME" undefined
<iframe name="test" src="test.htm"></iframe>
You have used the element named above in your document, but the document type you are using does not define an element of that name. This error is often caused by:
incorrect use of the "Strict" document type with a document that uses frames (e.g. you must use the "Frameset" document type to get the "<frameset>" element),
by using vendor proprietary extensions such as "<spacer>" or "<marquee>" (this is usually fixed by using CSS to achieve the desired effect instead).
by using upper-case tags in XHTML (in XHTML attributes and elements must be all lower-case).
This is what I whittled the HTML down to:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>I AM YOUR DOCUMENT TITLE REPLACE ME</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div>
<iframe name="test" src="test.htm"></iframe>
</div>
</body>
</html>
The <iframe> element is defined in the HTML 4.01 specification at the following URL: http://www.w3.org/TR/html401/present/frames.html#h-16.5.
It passes with a transitional doctype, so I guess my question is "Why is it disallowed in a strict doctype, even though it's defined in the specification?".
"Why is it disallowed in a strict doctype, even though it's defined in the specification?"
Lots of things are defined in the specification but not allowed in Strict. <font> springs to mind. These are the things that the developers of the specification considered in need of documenting because they were in use in browsers at the time, but which should be transitioned away from.
I can think of two reasons why they might have thought that:
"Why do iframes suck?".
<iframe> does (in theory) little that can't be achieved with <object>
iframe isn't included in html strict. For validation, try using the object element instead.
<object data="test.html" type="text/html"></object>
You should also add width and height attributes to the object element. Note that, unlike iframes, objects cannot be the target of page links.
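Putting that together with the question's markup, a sketch might look like the following. The width and height values, the title text and the fallback link are placeholders, and test.htm is the file name from the question.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Object instead of iframe</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div>
<!-- The link is fallback content, shown only if the object cannot be rendered -->
<object data="test.htm" type="text/html" width="400" height="300">
<a href="test.htm">Open test.htm</a>
</object>
</div>
</body>
</html>
```

Rendering of HTML inside <object> has historically varied between browsers, so it is worth testing in the browsers you care about before relying on it.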
Unless for some reason you specifically need html4 strict validation, it's better to use the html5 doctype.