Google privacy page HTML invalid

Viewing the HTML source of Google's Privacy Page, the header is:
<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>Google Privacy Center</title>
<link rel="stylesheet" href="//www.google.com/css/privacy.css">
<h1><img src="//www.google.com/intl/en/images/logo_sm.gif" alt="Google"> Privacy Center</h1>
I noticed there is no body tag here or in the footer. Also, no ending </html>.
Is this valid markup?

HTML5 (which is what they're declaring that page as) allows you to omit a lot of stuff. For instance, the body element's start and end tags are both optional, as is the html element's end tag (ref).
The validator says it's valid, but the validator's HTML5 support is also still experimental. YMMV

The validation page from validator.w3.org says it's valid HTML5.
But note that Google doesn't really care if their pages are valid markup or not, as long as they display correctly. Google's main page (google.com) is riddled with invalid markup.

HTML allows certain start and/or end tags to be omitted:
Some HTML element types allow authors to omit end tags (e.g., the P and LI element types). A few element types also allow the start tags to be omitted; for example, HEAD and BODY. The HTML DTD indicates for each element type whether the start tag and end tag are required.
If you examine a document type definition like the one for HTML 4.01, the elements are declared with element declarations of the form <!ELEMENT … >. Within such an element declaration, two characters specify whether the start or end tag of an element can be omitted. See the definition of P for example:
<!ELEMENT P - O (%inline;)* -- paragraph -->
Here the - after the element name P denotes that the start tag is required and the O denotes that the end tag may be omitted. Another example, the HEAD element:
<!ELEMENT HEAD O O (%head.content;) +(%head.misc;) -- document head -->
Here the two O specify that both the start and end tag can be omitted.
Omitting both tags is only possible because such elements are implied by their context. In the case of HEAD, the content model of the parent element HTML is specified as follows:
<!ELEMENT HTML O O (%html.content;) -- document root element -->
Where the parameter entity html.content is defined as follows:
<!ENTITY % html.content "HEAD, BODY">
That means the content model of HTML is implicitly defined as HEAD followed by BODY.
You can take a look at the index of HTML 4.01 elements to see what tags of what elements can be omitted.
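As a rough illustration of reading those two flags, here is a minimal Python sketch. The function name tag_omission and the regular expression are made up for this example, and it only handles simple declarations like the ones quoted above; it is not a DTD parser.

```python
import re

# Captures the element name and the two tag-omission flags of a simple
# SGML element declaration such as "<!ELEMENT P - O (%inline;)* ...>".
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([-O])\s+([-O])\s", re.IGNORECASE)


def tag_omission(decl):
    """Return (name, start_tag_optional, end_tag_optional) for one declaration."""
    match = ELEMENT_DECL.match(decl.strip())
    if match is None:
        raise ValueError("not a simple <!ELEMENT ...> declaration: %r" % decl)
    name, start_flag, end_flag = match.groups()
    # "-" means the tag is required, "O" means it may be omitted.
    return name, start_flag.upper() == "O", end_flag.upper() == "O"


declarations = [
    "<!ELEMENT P - O (%inline;)* -- paragraph -->",
    "<!ELEMENT HEAD O O (%head.content;) +(%head.misc;) -- document head -->",
    "<!ELEMENT HTML O O (%html.content;) -- document root element -->",
]

for decl in declarations:
    name, start_optional, end_optional = tag_omission(decl)
    print(name, "start tag optional:", start_optional, "end tag optional:", end_optional)
```

For P this reports a required start tag and an optional end tag; for HEAD and HTML it reports both tags as optional, matching the explanation above.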

From the tag omission section of the W3C HTML5 spec's syntax chapter:
A body element's start tag may be
omitted if the element is empty, or if
the first thing inside the body
element is not a space character or a
comment, except if the first thing
inside the body element is a script or
style element. A body element's end
tag may be omitted if the body element
is not immediately followed by a
comment.
An html element's end tag may be
omitted if the html element is not
immediately followed by a comment.
Funny thing is that the editor of the document is Ian Hickson of Google, Inc.

If you run it through the W3C validator, you get "This document was successfully checked as HTML5!".

It is valid HTML5.
However, it is not XHTML.

Yes this is HTML5
http://validator.w3.org/check?verbose=1&uri=http%3A%2F%2Fwww.google.com%2Fintl%2Fen%2Fprivacy.html

Related

Using html tags without <html> tag?

I have an example. In my example.html I am using:
<h1>example</h1>
When I open it in a browser, it gives the correct output.
But I am still trying to find out how these tags work without an <html> tag.
My findings so far:
The .html extension makes the browser treat these as tags
The latest browsers handle this case
Is there a better explanation for this?
The specification defines the HTML element as having start and end tags that may be omitted.
If you don't include them explicitly, the parser is required to imply the start and end of the element from context (which it can do unambiguously).
An html element's start tag can be omitted if the first thing inside
the html element is not a comment. An html element's end tag can be
omitted if the html element is not immediately followed by a comment.

HTML5 Tag Omission: Spec Clarification With Regards To Language

The HTML5 specification for tag omission (http://www.w3.org/TR/html51/syntax.html#syntax-tag-omission) starts with the following two statements (emphasis mine):
An html element's start tag may be omitted if the first thing inside
the html element is not a comment.
An html element's end tag may be omitted if the html element is not
immediately followed by a comment.
Those two statements read similarly, but not identically, and I am wondering if someone can offer clarification on what they mean.
The following case seems unambiguous - you can't remove the start or close tags:
<html><!-- start --> ... </html><!-- end -->
But what about when whitespace is introduced into the mix? Can the start tag for html be eliminated in the following case?
<html>
<!-- comment after whitespace -->
...
Can the end tag be eliminated in a similar scenario?
...
</html>
<!-- comment after whitespace -->
Some of the other rules make specific mention of whitespace characters which leads me to believe that they should be taken into account. Most of the rules say "...immediately followed by..." which is different than the first bullet point listed.
The important factor here is that the phrases first thing inside and immediately following are talking about nodes i.e. the DOM, not tags or other markup, so the distinction it is making is about whether the node is a child (first thing inside) or a following sibling (immediately following).
As far as spaces go:
An html element's start tag may be omitted if the first thing inside
the html element is not a comment.
The first thing inside an html element cannot be a space character because at that point in the parser algorithm, space character tokens are discarded and not added to the DOM.
An html element's end tag may be omitted if the html element is not
immediately followed by a comment.
Space characters, regardless of whether they appear just before or just after the </html> tag, end up inside the html element (in fact, also inside the body element), so the comment will be immediately following the html element regardless of whether there are spaces in between in the markup.
In HTML, whitespace between tags doesn't matter: <html> <head> and <html><head> are the same thing to the browser. In content (e.g. between words inside a span or p element) it is rendered by the browser, but when you want to use space between elements (as a design resource) you should use &nbsp;.
So, as I see it, "immediately followed by" doesn't mean "the next character" but "the first thing after the place where the end tag was supposed to be", no matter how many spaces are between them.
Removing the html tags in both cases would therefore invalidate the HTML, no matter how many spaces lie between the place where </html> was supposed to be and the comment.
Edit: I think they were trying to express the same thing in different words to avoid being repetitive, but ended up being confusing.
The rules about tag omission are somewhat misleading in that for the most part they're not actually saying when tags can be omitted, but rather how they should be interpreted when they are omitted. Take, for example, the following document:
<!DOCTYPE html><!-- A comment --><title>A title</title>
This is valid HTML5: you can run it through the W3C validator yourself. But the tag omission rules clearly state that
[a]n html element's start tag may be omitted if the first thing inside the html element is not a comment.
How do we reconcile this? The answer is that these are disambiguation rules. Because an html element's start tag may not be omitted if the first thing inside it is a comment, we are free to assume when parsing that the comment is not the first thing inside the html element. Similarly, the tag omission rules state that
[a] body element's start tag may be omitted if the element is empty, or if the first thing inside the body element is not a space character or a comment [...]
So we are free to assume that the comment is also not the first thing inside the body element. So in fact this document can be unambiguously parsed as equivalent to
<!DOCTYPE html><!-- A comment --><html><head><title>A title</title></head><body></body></html>
The parser algorithm for HTML5 specifies that if we are in the before html insertion mode, which is the state the parser transitions to after seeing <!DOCTYPE html>, and we see
A character token that is one of U+0009 CHARACTER TABULATION, "LF" (U+000A), "FF" (U+000C), "CR" (U+000D), or U+0020 SPACE
then we are to "Ignore the token." If on the other hand we see a comment token, then we are to
Insert a comment as the last child of the Document object.
It's not until we see some other kind of tag that we emit an html element. So we should expect this behavior not to be affected by whitespace, and indeed both Firefox 54 and Chrome 60 interpret the document
<!DOCTYPE html>
<!-- A comment -->
<title>A title</title>
identically to
<!DOCTYPE html><!-- A comment --><title>A title</title>
That is, both of them are treated like
<!DOCTYPE html><!-- A comment --><html><head><title>A title</title></head><body></body></html>
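The "before html" step described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the token tuples, the function name before_html, and the way the Document's children are represented are all invented for the example, and the spec's special handling of stray end tags in this mode is left out.

```python
WHITESPACE = set("\t\n\f\r ")


def before_html(tokens):
    """Model the "before html" insertion mode on a list of (kind, data) tokens.

    Returns (document_children, remaining_tokens), where remaining_tokens
    are the tokens still to be processed once the html element exists.
    """
    document_children = []
    for index, (kind, data) in enumerate(tokens):
        if kind == "character" and data in WHITESPACE:
            continue  # "Ignore the token."
        if kind == "comment":
            # The comment becomes a child of the Document, not of html.
            document_children.append(("comment", data))
            continue
        # Any other token causes the html element to be created.
        document_children.append(("element", "html"))
        if kind == "start_tag" and data == "html":
            return document_children, tokens[index + 1:]  # explicit <html> is consumed
        return document_children, tokens[index:]  # implied <html>; token is reprocessed
    document_children.append(("element", "html"))  # end of input also implies html
    return document_children, []


with_whitespace = [
    ("character", "\n"),
    ("comment", " A comment "),
    ("character", "\n"),
    ("start_tag", "title"),
]
without_whitespace = [
    ("comment", " A comment "),
    ("start_tag", "title"),
]

print(before_html(with_whitespace) == before_html(without_whitespace))  # prints True
```

In both inputs the comment ends up as a sibling that precedes the html element, which is why the two documents above are parsed identically.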

Why does a stray </p> end tag generate an empty paragraph?

Apparently, if you have a </p> end tag with no matching start tag within the body element, most if not all browsers will generate an empty paragraph in its place:
<!DOCTYPE html>
<title></title>
<body>
</p>
</body>
Even if any text exists around the end tag, none of it is made part of this p element — it will always be empty and the text nodes will always exist on their own:
<!DOCTYPE html>
<title></title>
<body>
some text</p>more text
</body>
If the above contents of body are wrapped in <p> and </p> tags... I'll leave you to guess what happens:
<!DOCTYPE html>
<title></title>
<body>
<p>some text</p>more text</p>
</body>
Interestingly, if the </p> tag is not preceded by a <body> or </body> tag, all browsers except IE9 and older will not generate an empty paragraph (IE ≤ 9 on the other hand will always create one, while IE10 and later behave the same as all other browsers):
<!DOCTYPE html>
<title></title>
</p>
<!DOCTYPE html>
<title></title>
</p><body>
<!DOCTYPE html>
<title></title>
</p></body>
I can't find any references stipulating that an end tag with no corresponding start tag should generate an empty element, but that shouldn't come across as surprising considering that it's not even valid HTML in the first place. Indeed, I've only found browsers to do this with the p element (and to some extent the br element as well!), but not any explanation as to why.
It is rather consistent across browsers using both traditional HTML parsers and HTML5 parsers, though, applying both in quirks mode and in standards mode. So, it's probably fair to deduce that this is for backward compatibility with early specifications or legacy behavior.
In fact, I did find this comment on an answer to a somewhat related question, which basically confirms it:
The reason why <p> tags are valid unclosed is that originally <p> was defined as a "new paragraph" marker, rather than p being a container element. Equivalent to <br> being a "new line" marker. You can see it so defined in this document from 1992: http://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html and this one from 1993: http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt. Because there were web pages pre-dating the change and browser parsers have always been as backward compatible as possible with existing web content, it's always stayed possible to use <p> that way.
But it doesn't quite explain why parsers treat an explicit </p> end tag (with the slash) as simply... a tag, and generate an empty element in the DOM. Is this part of some parser error handling convention from way back when the syntax wasn't as strictly defined as it was more recently or something? If so, is it documented anywhere at all?
That it is required is documented in HTML5. See http://w3c.github.io/html/syntax.html#the-in-body-insertion-mode and search down for An end tag whose tag name is "p" and it says:
If the stack of open elements does not have an element in button scope
with the same tag name as that of the token, then this is a parse
error; act as if a start tag with the tag name "p" had been seen, then
reprocess the current token.
Which translated into English means create a p element if the </p> tag can't be matched with an existing <p> tag.
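A minimal Python sketch of that rule, for illustration only. The Element class, the function names, and the shortened SCOPE_BOUNDARIES set are assumptions made for this example; the spec's real "button scope" list is longer, and its "close a p element" step also generates implied end tags, which is skipped here.

```python
# Partial list of elements that stop a "button scope" search (assumption:
# shortened for the example; see the spec for the full list).
SCOPE_BOUNDARIES = {"html", "table", "td", "th", "caption", "button",
                    "applet", "marquee", "object", "template"}


class Element:
    def __init__(self, name):
        self.name = name
        self.children = []


def has_p_in_button_scope(open_elements):
    # Walk the stack of open elements from the innermost element outwards.
    for element in reversed(open_elements):
        if element.name == "p":
            return True
        if element.name in SCOPE_BOUNDARIES:
            return False
    return False


def handle_p_end_tag(open_elements):
    """Process a </p> end tag against the stack of open elements."""
    if not has_p_in_button_scope(open_elements):
        # Parse error: act as if a <p> start tag had been seen ...
        p = Element("p")
        open_elements[-1].children.append(p)
        open_elements.append(p)
    # ... then process the end tag: pop elements until a p has been popped.
    while open_elements.pop().name != "p":
        pass


html = Element("html")
body = Element("body")
html.children.append(body)
stack = [html, body]

handle_p_end_tag(stack)  # a stray </p> directly inside <body>
print([child.name for child in body.children])  # prints ['p']
```

The p element created this way is closed immediately, so it stays empty, which matches the behaviour observed in the question.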
Why it is so, is harder to ascertain. Usually, this is because some browser in the past caused this to happen as a bug, and web pages came to rely on the behaviour, so other browsers had to implement it too.
The HTML4 DTD states that the end tag is optional for the paragraph element, but the start tag is required.
The SGML declaration for HTML4 states that omittag is 'yes', which means that the start tag can be implied.
The end tag follows SGML rules:
an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags
Anonymous block boxes are generated for inline content such as text nodes, so they need not be wrapped by the paragraph element.
There's a thread in the Mozilla bug database which explains this behaviour:
Mozilla parses "half-tags" gullibly, leading to XSS security problems
Here's a relevant comment by Boris Zbarsky:
Actually, as I understand it, proper parsing of SGML/HTML requires that we
behave this way. That is, the '<' of the next tag is a valid way to close out
the markup of a previous tag...
And summarized by Ian Hickson:
The basic principle at work here, it appears, is that the markup is fixed up by delaying any closing tags until after all other open elements have been closed, and no attempt is made to make the DOM follow the HTML DTD.
References
SGML Productions
HTML 2.0 Specification
Arguments against SGML
Tag Soup: How UAs handle
Tag Soup: How Mac IE 5 and Safari handle
Web SGML and HTML 4.0 Explained
Testing SGML SHORTTAG support across browsers
Mozilla Bug 226495
Shorttag and Omittag
Jotting on parsers for SGML-family document languages: SGML, HTML, XML
A brief, opinionated history of XML - bobdc.blog

Should an end tag close all unclosed intervening start tags with omitted end tags?

Am I reading the HTML 4.01 standard wrong, or is Google? In HTML 4.01, if I write:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html> <head> <body>plain <em>+em <strong>+strong </em>-em
The rendering in Google Chrome is:
plain +em +strong -em (with "+em " in italics, "+strong " in bold italics, and "-em" in bold)
This seems to contradict the HTML 4.01 standard, which summarizes the underlying SGML rules as: “an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags”.¹
That is, the </em> end tag should close not only the <em> start tag but also the unclosed intervening <strong> start tag, and the rendering should be:
plain +em +strong -em (with "+em " in italics, "+strong " in bold italics, and "-em" in neither)
A commenter pointed out that it is bad practice to leave tags open, but this is only an academic example. An equally good example would be: <em> +em <strong> +strong </em> -em </strong>. It was my understanding from the HTML 4.01 standard that this code fragment would not work as intended because of the overlapping elements: the </em> end tag should implicitly close the <strong>. The fact that it did work as intended was surprising, and this is what led to my question.
And it turned out I proposed a false dichotomy in the question: neither Google nor I were reading the HTML 4.01 standard wrong. A private correspondent at w3.org pointed me to Web SGML and HTML 4.0 Explained by Martin Bryan, which explains that “[t]he parsing program will automatically close any currently open embedded element which has been declared as having omissible end-tags when it encounters an end-tag for a higher level element. (If an embedded element whose end-tag cannot be omitted is still open, however, the program will report an error in the coding.)”² (Emphasis added.) Bryan’s summarization of the SGML standard is right, and HTML 4.01’s summarization is wrong.
The statement quoted from the HTML 4.01 specification is very obscure, or just plain wrong on all accounts. HTML 4.01 has specific rules for end tag omission, and these rules depend on the element. For example, the end tag of a p element may be omitted, the end tag of an em may never be omitted. The statement in the specification probably tries to say that an end tag implicitly closes any inner elements that have not yet been closed, to the extent that end tag omission is allowed.
No browser has ever implemented HTML 4.01 (or any earlier HTML specification) as defined, with the SGML features that are formally part of it. Anything that the HTML specifications say about SGML should be taken as just theoretical until proven otherwise.
HTML5 doesn’t change the rules of the game in this respect, except that it writes down the error handling rules. In simple issues like these, the rules just make the traditional browser behavior a norm. They are tagsoup-oriented, treating certain tags more or less as formatting commands: <em> means “italicize,” </em> means “stop italicizing,” etc. But HTML5 also takes measures to define error handling more formally so that despite such tag soup usage, it is well-defined what document tree in the DOM will be constructed.
Some tags are allowed to be omitted (such as the end tag for <p> or the start and end tags for <body>), and some are not (such as the end tag for <strong>). It is the former that the section of the spec you quote is referring to. You can tell which is which from the DTD, where a dash marks a required tag and an O marks an omissible one:
<!ELEMENT P - O (%inline;)* -- paragraph -->
          ^ A p element
            ^ requires a start tag
              ^ has optional end tag
                ^ contains zero or more inline things
                            ^ Comment: Is a paragraph
What you have is not an HTML document with an omitted tag, but an invalid pseudo-HTML document that browsers will try to perform error recovery on.
The specification (for HTML 4) does not describe how to perform error recovery, that is left up to browsers.
The specification says that:
Some HTML element types allow authors to omit end tags (e.g., the P and LI element types).
This:
Please consult the SGML standard for information about rules governing elements (e.g., they must be properly nested, an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags (section 7.5.1), etc.).
Applies to elements which can have omitted end tags.
If you look at the P element spec you will see:
Start tag: required, End tag: optional
So, when you use this:
<DIV>
<P>This is the paragraph.
</DIV>
The P element will be automatically closed.
But, if you look at the EM spec, you will see:
Start tag: required, End tag: required
So this rule of automatic closing does not apply, and the HTML is not valid.
Curiously all the browsers presented the same behavior with that kind of invalid HTML.
All modern browsers use an HTML5 parser (even for HTML 4.01 content), so the parsing rules of HTML5 apply. You can find more information at the Parsing HTML Documents section in the HTML5 spec.
HTML Outline
HTML
  HEAD
    #text " " ()
  BODY
    #text "plain " ()
    EM
      #text "+em " (italic)
      STRONG
        #text "+strong " (bold/italic)
    STRONG
      #text "-em" (bold)
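To show how a tree of this shape can come about, here is a heavily simplified Python sketch of two ideas from the HTML5 parsing rules: a stack of open elements and a list of active formatting elements that get reopened before the next piece of text. Everything here (Node, build_tree, serialize, the token tuples, the FORMATTING set) is invented for the example, and it only covers the case with no block-level elements involved; the real algorithm in the spec handles many more situations.

```python
FORMATTING = {"em", "strong", "b", "i"}


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []  # child Nodes or plain strings (text)


def build_tree(tokens):
    """Build a body subtree from ("text" | "start" | "end", data) tokens."""
    body = Node("body")
    open_elements = [body]
    active_formatting = []  # formatting elements not yet explicitly closed

    def reconstruct():
        # Reopen formatting elements that were closed implicitly.
        for i, entry in enumerate(active_formatting):
            if entry not in open_elements:
                clone = Node(entry.name)
                open_elements[-1].children.append(clone)
                open_elements.append(clone)
                active_formatting[i] = clone

    for kind, data in tokens:
        if kind == "text":
            reconstruct()
            open_elements[-1].children.append(data)
        elif kind == "start" and data in FORMATTING:
            reconstruct()
            node = Node(data)
            open_elements[-1].children.append(node)
            open_elements.append(node)
            active_formatting.append(node)
        elif kind == "end" and data in FORMATTING:
            target = next((n for n in reversed(active_formatting) if n.name == data), None)
            if target is None or target not in open_elements:
                continue  # stray end tag: ignored in this sketch
            # Pop everything up to and including the matching element.
            while open_elements.pop() is not target:
                pass
            active_formatting.remove(target)
    return body


def serialize(node):
    inner = "".join(c if isinstance(c, str) else serialize(c) for c in node.children)
    return "<%s>%s</%s>" % (node.name, inner, node.name)


tokens = [
    ("text", "plain "), ("start", "em"), ("text", "+em "),
    ("start", "strong"), ("text", "+strong "), ("end", "em"), ("text", "-em"),
]
print(serialize(build_tree(tokens)))
# prints <body>plain <em>+em <strong>+strong </strong></em><strong>-em</strong></body>
```

The </em> closes both em and the strong nested inside it, but strong stays on the list of active formatting elements, so a fresh strong is opened for the following "-em" text. That is the second STRONG in the outline above.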
If you try running your HTML through http://validator.w3.org/check it will flag up this HTML as being pretty much invalid.
If your HTML is invalid, all bets are off, and different browsers may render your HTML differently.
If you look at the DOM in Chrome by right-clicking and choosing Inspect Element, you'll see that, since your tags do not match up, it applied an algorithm to decide where you messed up. Technically, it does close the strong tag at the correct place. However, it decides that you were probably trying to make both pieces of text bold, so it puts the last "-em" in an entirely new, extra strong element while keeping "+strong" in its own strong element. It looks to me like the Chrome team decided it is statistically likely that you want both things to be bold.

Why do we use the <html> tag when my website runs perfectly without it?

I need to know what the <html> tag at the beginning of a web page is for, since the website runs perfectly without the <html> </html> tags.
I know that the doctype is required, but why is the <html> tag required?
The <html> tag is not required.
From the DTD:
<!ELEMENT HTML O O (%html.content;) -- document root element -->
The two Os indicate that the start and end tags (respectively) are optional.
The element, on the other hand, is required (but the language is designed so that browsers can imply it).
Since a DOM consists of a tree of nodes, you have to have one node (the root element) for everything else to hang from, and that is the html element.
It is also a really useful place to stick a lang attribute that will apply to the entire document.
You don't have to use it; it's optional:
7.3 The HTML element
Start tag: optional, End tag: optional
Source: http://www.w3.org/TR/html401/struct/global.html#h-7.3
It is an optional tag, but some browsers add it to the page when you are browsing.
Generally it works without the tag, but when we have to supply attributes such as the HTML version or an encoding, they are given through the tag.