Empty xml element confusion - HTML


Here are two html snippets:
<html>
<head>
<title>foo</title>
<style type="text/css"></style>
</head>
<body>
bar
</body>
</html>

<html>
<head>
<title>foo</title>
<style type="text/css"/>
</head>
<body>
bar
</body>
</html>
Try rendering them in Firefox, Chrome or IE: the two snippets render differently! But I thought both ways of writing an empty element (here, the style element) were equivalent?

According to the HTML 4.01 specification, the end tag is required for the STYLE element.
14.2.3 Header style information: the STYLE element
Start tag: required, End tag: required
So the self-closed version of style is not valid HTML.

In XML it would be valid, but you call your snippets HTML, where it is not.
In HTML5, for example (serialized as HTML, not as XML), the slash is ignored, so the style element has not been closed yet.

When served as text/html, they are parsed with an HTML parser, which treats <style type="text/css"/> just as a typo for <style type="text/css"> (i.e., it ignores the slash before the tag close). This makes the rest of the document part of the style element, so it is not rendered. This is the reason why XHTML 1.0, appendix C, recommends that the “self-closing” syntax (aka “minimized syntax”) be used only for elements with EMPTY declared content.
When served as genuine XHTML, with an XML content type, they are processed identically, by XML rules. However, in the absence of an xmlns attribute, they are treated as generic XML with no style information, so in practice browsers just display the XML code as-is.
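To illustrate the XML case, here is a sketch of the second snippet with the XHTML namespace added. Assuming it is served with an XML content type such as application/xhtml+xml (many browsers also pick this for a local file with an .xhtml extension, but treat that as an assumption to verify), the self-closed style element is parsed as an empty element and bar is rendered:

```html
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>foo</title>
<!-- Under an XML parser, this is a complete, empty style element. -->
<style type="text/css"/>
</head>
<body>
bar
</body>
</html>
```

Served as text/html, the same markup would still swallow the rest of the document into the style element, because the HTML parser ignores the slash.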

Related

Can I put an entire HTML document in a template?

I want to store an entire HTML document to put in an iframe (srcdoc) later.
Am I allowed to put everything in a template including the html, head and body like this?
<template>
<html>
<head>
<title>Document</title>
</head>
<body>
<main>Content</main>
</body>
</html>
</template>
If not, what's the best way to store an entire document? Thanks!

Unfortunately, the template element is not allowed to contain <html> tags. Per 4.1.1 of the HTML specification:
Contexts in which [the <html>] element can be used:
As document's document element.
Wherever a subdocument fragment is allowed in a compound document.
and from 4.12.3, the <template> tag does not provide either of these contexts. For the same reason, you can't use <head>, <body> or <title> tags either. Chrome and Firefox both actively strip out the invalid tags from the <template>, preventing you from using it.
The best way of storing HTML for use in iframes is to put the HTML code in a different file in your web server.
However, an acceptable alternative is to store the HTML inside your <iframe> and then populate the srcdoc attribute with that content.
<iframe id="yourIframe">
<!-- Everything inside here is interpreted as text, meaning you can even put doctypes here. This is legal, per 12.2.6.4.7 and 4.8.5 of the HTML specification. -->
<!doctype html>
<html>
<head>
<title>Document</title>
</head>
<body>
<main>Content</main>
</body>
</html>
</iframe>
...
<script>
...
const iframe = document.getElementById("yourIframe");
iframe.srcdoc = iframe.innerHTML;
</script>

What are the drawbacks of ignoring <html> and <body>

Is there any drawback to never using <html> and <body> on your web pages that are written in HTML and PHP?
I mean, everything works perfectly fine with or without them, so why use them?

They are explicitly optional in the spec (so the document will still be valid).
This has been true since the original spec (which says <!ELEMENT HTML O O (( HEAD | BODY | %oldstyle)*, PLAINTEXT?)>, O O meaning Start Tag Optional, End Tag Optional) through to the current spec (which says "An html element's start tag can be omitted if the first thing inside the html element is not a comment. An html element's end tag can be omitted if the html element is not immediately followed by a comment.").
They are only mandatory in XHTML since XML has no concept of optional tags.
I've never seen any browser or user agent fail to handle them correctly in an HTML document. (Note that while the tags are optional, the elements are not, so browsers will insert HTML, HEAD and BODY elements even if the tags are missing, and any script that tries to find them in the DOM will still work.)
The only technical drawback is that you can't put attributes on tags which aren't there, and a lang attribute for the HTML element is useful.
However, leaving them out can confuse other people who have to maintain your code and don't know that the tags are optional.
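For example, a minimal sketch that omits every optional tag except the html start tag, which is kept only so that it can carry a lang attribute (the value en and the text are placeholders):

```html
<!DOCTYPE html>
<!-- The html start tag is kept only to hold the lang attribute. -->
<html lang="en">
<title>example</title>
<p>Content
```

The head and body elements are still created by the parser, so scripts that look them up in the DOM should keep working.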

Both <head> and <body> tags are optional in HTML5. In fact it is recommended by Google's HTML style guide to not use them:
<!-- Not recommended -->
<!DOCTYPE html>
<html>
<head>
<title>Spending money, spending bytes</title>
</head>
<body>
<p>Sic.</p>
</body>
</html>
<!-- Recommended -->
<!DOCTYPE html>
<title>Saving money, saving bytes</title>
<p>Qed.
Drawbacks of not using those tags include:
it differs drastically from what developers are typically taught, so it may cause some confusion.
a comment cannot be the first thing inside the html element when its start tag is omitted.
Reiterating the optional nature of the tags from the spec:
An html element's start tag may be omitted if the first thing inside the html element is not a comment.
A body element's start tag may be omitted if the element is empty, or if the first thing inside the body element is not a space character or a comment, except if the first thing inside the body element is a meta, link, script, style, or template element.
See:
https://google.github.io/styleguide/htmlcssguide.xml?showone=Optional_Tags#Optional_Tags
https://html.spec.whatwg.org/multipage/syntax.html#syntax-tag-omission

If you do not use <html> and <body>, then your HTML document will not be valid, and some libraries/plugins may not work either.

What should a basic html5 file include?

I was wondering what an html file should ALWAYS include besides <html>, <header> and <body>. I've seen many things, so I'm not sure what to include ALWAYS.

The html, head and body tags are actually optional in HTML5. The only required element, aside from the doctype definition, is title. So the following would be a completely valid HTML5 document:
<!DOCTYPE html>
<title>example</title>
You can validate that using the W3C Validator.

Actually, the start and end tags of the html and body elements are optional only under certain conditions:
The <html> start tag is optional unless the first thing inside the html element is a comment.
The </html> end tag is optional unless the html element is immediately followed by a comment.
see html tag, optional vs. required
The body start and end tags may be omitted under similar conditions.

Is it necessary to write HEAD, BODY and HTML tags?

Is it necessary to write <html>, <head> and <body> tags?
For example, I can make such a page:
<!DOCTYPE html>
<meta http-equiv="Content-type" content="text/html; charset=utf-8">
<title>Page Title</title>
<link rel="stylesheet" type="text/css" href="css/reset.css">
<script src="js/head_script.js"></script><!-- this script will be in head //-->
<div>Some html</div> <!-- here body starts //-->
<script src="js/body_script.js"></script>
And Firebug correctly separates head and body.
The W3C validator says it's valid.
But I rarely see this practice on the web.
Is there a reason to write these tags?

Omitting the html, head, and body tags is certainly allowed by the HTML specifications. The underlying reason is that browsers have always sought to be consistent with existing web pages, and the very early versions of HTML didn't define those elements. When HTML first did, it was done in a way that the tags would be inferred when missing.
I often find it convenient to omit the tags when prototyping and especially when writing test cases as it helps keep the markup focused on the test in question. The inference process should create the elements in exactly the manner that you see in Firebug, and browsers are pretty consistent in doing that.
But...
Internet Explorer has at least one known bug in this area. Even Internet Explorer 9 exhibits this. Suppose the markup is this:
<!DOCTYPE html>
<title>Test case</title>
<form action='#'>
<input name="var1">
</form>
You should (and do in other browsers) get a DOM that looks like this:
HTML
  HEAD
    TITLE
  BODY
    FORM action="#"
      INPUT name="var1"
But in Internet Explorer you get this:
HTML
  HEAD
    TITLE
    FORM action="#"
      BODY
        INPUT name="var1"
  BODY
See it for yourself.
This bug seems limited to the form start tag preceding any text content and any body start tag.

The Google Style Guide for HTML recommends omitting all optional tags.
That includes <html>, <head>, <body>, <p> and <li>.
From 3.1.7 Optional Tags:
For file size optimization and scannability purposes, consider
omitting optional tags. The HTML5 specification defines what tags can
be omitted.
(This approach may require a grace period to be established as a wider
guideline as it’s significantly different from what web developers are
typically taught. For consistency and simplicity reasons it’s best
served omitting all optional tags, not just a selection.)
<!-- Not recommended -->
<!DOCTYPE html>
<html>
<head>
<title>Spending money, spending bytes</title>
</head>
<body>
<p>Sic.</p>
</body>
</html>
<!-- Recommended -->
<!DOCTYPE html>
<title>Saving money, saving bytes</title>
<p>Qed.

Contrary to Liza Daly's note about HTML5, that specification is actually quite specific about which tags can be omitted, and when (the rules are a bit different from HTML 4.01, mostly to clarify where ambiguous elements like comments and whitespace belong).
The relevant reference is 8.1.2.4 Optional tags, and it says:
An html element's start tag may be omitted if the first thing inside the html element is not a comment.
An html element's end tag may be omitted if the html element is not immediately followed by a comment.
A head element's start tag may be omitted if the element is empty, or if the first thing inside the head element is an element.
A head element's end tag may be omitted if the head element is not immediately followed by a space character or a comment.
A body element's start tag may be omitted if the element is empty, or if the first thing inside the body element is not a space character or a comment, except if the first thing inside the body element is a script or style element.
A body element's end tag may be omitted if the body element is not immediately followed by a comment.
So your example is valid HTML5, and would be parsed like this, with the html, head and body tags in their implied positions:
<!DOCTYPE html><HTML><HEAD>
<meta http-equiv="Content-type" content="text/html; charset=utf-8">
<title>Page Title</title>
<link rel="stylesheet" type="text/css" href="css/reset.css">
<script src="js/head_script.js"></script></HEAD><BODY><!-- this script will be in head //-->
<div>Some HTML content</div> <!-- here body starts //-->
<script src="js/body_script.js"></script></BODY></HTML>
Note that the comment "this script will be in head" is actually parsed as part of the body, although the script itself is part of the head. According to the specification, if you want that to be different at all, then the </HEAD> and <BODY> tags may not be omitted. (Although the corresponding <HEAD> and </BODY> tags still can be.)
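A sketch of that last point, assuming the goal is to keep the comment inside the head: writing the </head> and <body> tags explicitly fixes the boundary, while the <html>, <head>, </body> and </html> tags are still omitted (the file names are the ones from the question):

```html
<!DOCTYPE html>
<meta http-equiv="Content-type" content="text/html; charset=utf-8">
<title>Page Title</title>
<script src="js/head_script.js"></script><!-- this comment is now a child of head -->
</head><body>
<div>Some html</div> <!-- this comment is a child of body -->
<script src="js/body_script.js"></script>
```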

It's true that the HTML specifications permit certain tags to be omitted in certain cases, but generally doing so is unwise.
It has two effects: it makes the specification more complex, and that in turn makes it harder for browser authors to write correct implementations (as demonstrated by Internet Explorer getting it wrong).
This makes the likelihood of browser errors in these parts of the specification high. As a website author, you can avoid the issue by including these tags - so while the specification doesn't say you have to, doing so reduces the chance of things going wrong, which is good engineering practice.
What's more, the latest HTML 5.1 WG specification currently says (bear in mind it’s a work in progress and may yet change):
A body element's start tag may be omitted if the element is empty, or
if the first thing inside the body element is not a space character or
a comment, except if the first thing inside the body element is a
meta, link, script, style, or template element.
From 4.3.1 The body element.
This is a little subtle. You can omit body and head, and the browser will then infer where those elements should be inserted. This carries the risk of not being explicit, which could cause confusion.
So this
<html>
<h1>hello</h1>
<script ... >
...
results in the script element being a child of the body element, but this
<html>
<script ... >
<h1>hello</h1>
would result in the script tag being a child of the head element.
You could be explicit by doing this:
<html>
<body>
<script ... >
<h1>hello</h1>
and then whichever you have first, the script or the h1, they will both, predictably appear in the body element. These are things which are easy to overlook while refactoring and debugging code (say for example, you have JavaScript which is looking for the 1st script element in the body - in the second snippet it would stop working).
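One way to check where the parser places each script is to log its parent from inside the script itself. This is only a sketch: it relies on document.currentScript, which current browsers provide but Internet Explorer does not, and the markup is a made-up test page.

```html
<!DOCTYPE html>
<html>
<script>
  // No head or body tag has been seen yet, so this script is placed in the implied head element.
  console.log(document.currentScript.parentNode.nodeName); // HEAD
</script>
<h1>hello</h1>
<script>
  // The h1 above has already opened the implied body element, so this script is placed in body.
  console.log(document.currentScript.parentNode.nodeName); // BODY
</script>
```

Adding an explicit <body> start tag before the first script would make both lines log BODY.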
As a general rule, being explicit about things is always better than leaving things open to interpretation. In this regard, XHTML is better, because it forces you to be completely explicit about your element structure in your code, which makes it simpler, and therefore less prone to misinterpretation.
So yes, you can omit them and be technically valid, but it is generally unwise to do so.

It's valid to omit them in HTML 4:
7.3 The HTML element
Start tag: optional, End tag: optional
7.4.1 The HEAD element
Start tag: optional, End tag: optional
From 7 The global structure of an HTML document.
In HTML5, there are no "required" or "optional" elements exactly, as HTML5 syntax is more loosely defined. For example, title:
The title element is a required child in most situations, but when a higher-level protocol provides title information, e.g. in the Subject line of an e-mail when HTML is used as an e-mail authoring format, the title element can be omitted.
From 4.2.2 The title element.
It's not valid to omit them in true XHTML5, though that is almost never used (versus XHTML-acting-like-HTML5).
However, from a practical standpoint you often want browsers to run in "standards mode," for predictability in rendering HTML and CSS. Providing a DOCTYPE and a more structured HTML tree will guarantee more predictable cross-browser results.

Firebug shows this correctly because your browser automagically fixes the bad markup for you. This behaviour is not specified anywhere and can (will) vary from browser to browser. Those tags are required by the DOCTYPE you're using and should not be omitted.
The HTML element is the root element of every html page. If you look at all other elements' description it says where an element can be used (and almost all elements require either head or body).

Why is my style not being applied to a non-HTML element in IE?

This doesn't work in IE6 or 7:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Title</title>
<style type="text/css">
N {display: block}
</style>
</head>
<body>
<div>
<N>element1</N>
<N>element2</N>
<N>element3</N>
<N>element4</N>
</div>
</body>
</html>
however, it does if I replace the N tags with A tags.
Does IE have a problem with styling non-HTML tags? Or is it something else?

Does IE have a problem with styling non-HTML tags?
Yes. It won't style them.
You could hack it using:
<script type="text/javascript">
document.createElement('n');
</script>
… but that won't work if JS is not available and the document is still invalid.
If no element exists that describes the semantics you want, then use the one that matches most closely (or div/span if nothing better exists) and add classes.
(Or switch to a custom XML language)
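A sketch of the class-based alternative, assuming span is the closest semantic match and using a hypothetical class name n in place of the invented N element:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Title</title>
<style type="text/css">
/* Style the class instead of an element name the browser does not know. */
span.n {display: block}
</style>
</head>
<body>
<div>
<span class="n">element1</span>
<span class="n">element2</span>
<span class="n">element3</span>
<span class="n">element4</span>
</div>
</body>
</html>
```

This keeps the document valid and does not depend on JavaScript being available.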
