Technically speaking, would this block of code be valid (would a validator say it's valid)? <body><span>Some text</span></body> as opposed to <body><p><span>Some text</span></p></body>, which I know is valid.
Yes, that HTML would be valid in both cases.
Take a look at the W3C HTML Validator to check whether HTML is valid or not.
It is valid HTML5, which is the only HTML suitable for most practical needs today. However, formally it won't validate under outdated HTML 4 Strict and XHTML 1 Strict doctypes which required some block-level wrapper between body and text content.
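To make the HTML 4.01 Strict restriction concrete, here is a minimal Python sketch (stdlib html.parser) that flags inline elements and bare text placed directly inside body. The allowed set is taken from the Strict DTD's body content model, (%block;|SCRIPT)+ +(INS|DEL). The class and function names are hypothetical, and the sketch assumes every non-empty element has an explicit end tag, so it is a rough check, not a validator.

```python
from html.parser import HTMLParser

# Direct children allowed in BODY by the HTML 4.01 Strict DTD:
# <!ELEMENT BODY O O (%block;|SCRIPT)+ +(INS|DEL)>
STRICT_BODY_CHILDREN = {
    "p", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "pre", "dl",
    "div", "noscript", "blockquote", "form", "hr", "table", "fieldset",
    "address", "script", "ins", "del",
}
# EMPTY elements in HTML 4.01 Strict (they never get an end tag).
EMPTY = {"area", "base", "br", "col", "hr", "img", "input", "link",
         "meta", "param"}


class StrictBodyChecker(HTMLParser):
    """Collects content placed directly in <body> that Strict would reject."""

    def __init__(self):
        super().__init__()
        self.depth = None  # nesting depth below <body>; None = not in body
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.depth = 0
            return
        if self.depth is None:
            return
        if self.depth == 0 and tag not in STRICT_BODY_CHILDREN:
            self.problems.append("<%s> directly inside <body>" % tag)
        if tag not in EMPTY:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.depth = None
        elif self.depth and tag not in EMPTY:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.problems.append("text %r directly inside <body>" % data.strip())


def strict_body_problems(markup):
    checker = StrictBodyChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems
```

The first snippet from the question yields one problem (the span), the second yields none. Under HTML5 rules neither would be reported, since body accepts flow content, which includes span and plain text.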
I have read the HTML5 specification, the microdata specification, and the WHATWG HTML5 (with microdata) specification. These are well written and easy to understand.
But now I read the schema.org Book specification, and came across snippets like the following:
<span itemprop="price" content="6.99">$6.99</span>
<span itemprop="inLanguage" content="en">English-language</span>
<span itemprop="name" content="Tolkien, J. R. R. (John Ronald Reuel)">
J. R. R. Tolkien</span>
Apparently (compare with the JSON version), the values of these microdata properties are the values of the content attributes of the span elements. (Of course, if there is no content attribute, the value is instead the textContent of the span element.)
But I cannot find any support for this practice in the HTML and microdata specifications. In fact, I cannot even find any evidence that there is a content attribute on span elements at all!
The microdata specification doesn't say anything about a span content attribute when it gives the rules for values. [Unless 'the element's textContent' is overridden by the content attribute, but I cannot find any support for this either.]
Not even the full WHATWG HTML5+microdata specification supports the claim that there is a content attribute on span (see The span element and Global attributes).
So, I suppose the schema.org example is non-conforming. But is it also plain wrong? If not, where does this practice come from, and how accepted is it?
Yes, this is wrong. Neither Microdata nor HTML5 define a content attribute for the span element.
Several people wanted to use it, see for example the code in these questions:
Hide Microdata property value in 'content' attribute?
Categories for Product in schema.org?
Is the "content" attribute valid for the <span> tag > if so is it a good practice?
schema.org product availability tags markup
I’m not sure where exactly this confusion is coming from.
(It doesn’t help that Google’s Structured Data Testing Tool incorrectly uses the content attribute instead of the element content; but at least all other Microdata parsers seem to do it correctly.)
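For comparison, here is a minimal Python sketch of the value rules in the WHATWG microdata specification, as I read them: meta takes its value from content, media and link-like elements from src or href, object from data, data and meter from value, time from datetime, and every other element, span included, from its text content. URL resolution and itemscope items are left out, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements whose microdata value comes from an attribute (simplified;
# the spec also resolves src/href/data as absolute URLs).
VALUE_ATTRIBUTE = {
    "meta": "content",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "a": "href", "area": "href", "link": "href",
    "object": "data",
    "data": "value", "meter": "value",
}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataValues(HTMLParser):
    def __init__(self):
        super().__init__()
        self.values = []      # (itemprop, value) pairs
        self.collecting = []  # (itemprop, depth at start, text parts)
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is not None and "itemscope" not in attrs:
            if tag in VALUE_ATTRIBUTE:
                self.values.append((name, attrs.get(VALUE_ATTRIBUTE[tag]) or ""))
            elif tag == "time" and "datetime" in attrs:
                self.values.append((name, attrs["datetime"]))
            elif tag not in VOID:
                # Any other element: its text content. A "content"
                # attribute on span plays no part here.
                self.collecting.append((name, self.depth, []))
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        self.depth -= 1
        while self.collecting and self.collecting[-1][1] == self.depth:
            name, _, parts = self.collecting.pop()
            self.values.append((name, "".join(parts)))

    def handle_data(self, data):
        for _, _, parts in self.collecting:
            parts.append(data)


def property_values(markup):
    parser = MicrodataValues()
    parser.feed(markup)
    parser.close()
    return parser.values
```

Fed the schema.org price snippet, this yields the text "$6.99", not "6.99". In Microdata, a machine-readable value that differs from the visible text is normally carried by a meta element (or data or time where they fit).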
Maybe some people got confused because RDFa (but not Microdata) defines and allows the content attribute for span. See HTML+RDFa’s Extensions to the HTML5 Syntax:
For the avoidance of doubt, the following RDFa attributes are allowed on all elements in the HTML5 content model: #vocab, #typeof, #property, #resource, #prefix, #content, #about, #rel, #rev, #datatype, and #inlist.
(Sorry, I didn't have enough reputation to post a comment.)
We're at the end of 2017 now. Somehow, the MDN Web Docs (https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/itemprop) and the schema.org docs (http://schema.org/telephone) still propose using a content attribute on span with Microdata. Of course, no HTML5 validator will accept this.
Am I reading the HTML 4.01 standard wrong, or is Google? In HTML 4.01, if I write:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html> <head> <body>plain <em>+em <strong>+strong </em>-em
The rendering in Google Chrome is:
plain +em (italic) +strong (bold italic) -em (bold)
This seems to contradict the HTML 4.01 standard, which summarizes the underlying SGML rules as: “an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags”.¹
That is, the </em> end tag should close not only the <em> start tag but also the unclosed intervening <strong> start tag, and the rendering should be:
plain +em (italic) +strong (bold italic) -em (plain)
A commenter pointed out that it is bad practice to leave tags open, but this is only an academic example. An equally good example would be: <em> +em <strong> +strong </em> -em </strong>. It was my understanding from the HTML 4.01 standard that this code fragment would not work as intended because of the overlapping elements: the </em> end tag should implicitly close the <strong>. The fact that it did work as intended was surprising, and this is what led to my question.
And it turned out I proposed a false dichotomy in the question: neither Google nor I were reading the HTML 4.01 standard wrong. A private correspondent at w3.org pointed me to Web SGML and HTML 4.0 Explained by Martin Bryan, which explains that “[t]he parsing program will automatically close any currently open embedded element which has been declared as having omissible end-tags when it encounters an end-tag for a higher level element. (If an embedded element whose end-tag cannot be omitted is still open, however, the program will report an error in the coding.)”² (Emphasis added.) Bryan’s summarization of the SGML standard is right, and HTML 4.01’s summarization is wrong.
The statement quoted from the HTML 4.01 specification is very obscure, or just plain wrong on all accounts. HTML 4.01 has specific rules for end tag omission, and these rules depend on the element. For example, the end tag of a p element may be omitted, the end tag of an em may never be omitted. The statement in the specification probably tries to say that an end tag implicitly closes any inner elements that have not yet been closed, to the extent that end tag omission is allowed.
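Bryan's reading of the rule can be expressed as a small Python sketch: an end tag pops back to its matching start tag, and every element it implicitly closes on the way must be one whose end tag the DTD declares omissible; otherwise the markup is in error. The set below lists the non-EMPTY HTML 4.01 elements declared with an omissible end tag; the function name is hypothetical, and the stack of open elements is a plain list of tag names.

```python
# Non-EMPTY HTML 4.01 elements whose end tag the DTD marks as omissible (O).
OMISSIBLE_END = {"html", "head", "body", "p", "li", "dt", "dd", "option",
                 "colgroup", "thead", "tbody", "tfoot", "tr", "th", "td"}


def apply_end_tag(open_elements, end_tag):
    """Return the stack left after </end_tag>, following SGML's rule.

    The end tag closes, back to its matching start tag, every intervening
    element; each of those must have an omissible end tag.
    """
    if end_tag not in open_elements:
        raise ValueError("</%s> has no open start tag" % end_tag)
    # Position of the innermost (most recently opened) matching element.
    index = len(open_elements) - 1 - open_elements[::-1].index(end_tag)
    for inner in open_elements[index + 1:]:
        if inner not in OMISSIBLE_END:
            raise ValueError("</%s> while <%s> is open and its end tag is required"
                             % (end_tag, inner))
    return open_elements[:index]
```

With ["html", "body", "div", "p"], a </div> legitimately closes the p as well; with ["html", "body", "em", "strong"], an </em> is an error, because strong's end tag may not be omitted.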
No browser has ever implemented HTML 4.01 (or any earlier HTML specification) as defined, with the SGML features that are formally part of it. Anything that the HTML specifications say about SGML should be taken as just theoretical until proven otherwise.
HTML5 doesn’t change the rules of the game in this respect, except that it writes down the error handling rules. In simple issues like these, the rules just make the traditional browser behavior a norm. They are tagsoup-oriented, treating certain tags more or less as formatting commands: <em> means “italicize,” </em> means “stop italicizing,” etc. But HTML5 also takes measures to define error handling more formally so that despite such tag soup usage, it is well-defined what document tree in the DOM will be constructed.
Some tags are allowed to be omitted (such as the end tag for <p> or the start and end tags for <body>), and some are not (such as the end tag for <strong>). It is the former that the section of the spec you quote is referring to. You can identify them in the DTD, where a dash marks a required tag and an O marks an omissible one:
<!ELEMENT P - O (%inline;)* -- paragraph -->
P: a p element
- (first flag): requires a start tag
O (second flag): has an optional end tag
(%inline;)*: contains zero or more inline things
-- paragraph --: comment, "is a paragraph"
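The two minimization flags can be read mechanically. Below is a rough Python sketch using a regular expression, assuming a single declaration written on one line in the shape of the P declaration (real SGML DTDs also have multi-line declarations and parameter entities, which this neither handles nor expands); the function name and the returned keys are hypothetical.

```python
import re

# <!ELEMENT name start-flag end-flag content-model -- comment -->
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+(?P<name>\S+)\s+(?P<start>[-O])\s+(?P<end>[-O])\s+"
    r"(?P<content>.*?)\s*(?:--(?P<comment>.*?)--)?\s*>"
)


def tag_minimization(declaration):
    """Report which tags of an element may be omitted, per its declaration."""
    match = ELEMENT_DECL.fullmatch(declaration.strip())
    if match is None:
        raise ValueError("not a simple one-line element declaration")
    return {
        "element": match.group("name"),
        "start_tag_required": match.group("start") == "-",
        "end_tag_required": match.group("end") == "-",
        "content_model": match.group("content"),
        "comment": (match.group("comment") or "").strip(),
    }
```

For the P declaration this reports a required start tag and an optional end tag; for BODY, declared with O O, both tags come out as omissible.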
What you have is not an HTML document with an omitted tag, but an invalid pseudo-HTML document that browsers will try to perform error recovery on.
The specification (for HTML 4) does not describe how to perform error recovery, that is left up to browsers.
The specification says that:
Some HTML element types allow authors to omit end tags (e.g., the P and LI element types).
This:
Please consult the SGML standard for information about rules governing elements (e.g., they must be properly nested, an end tag closes, back to the matching start tag, all unclosed intervening start tags with omitted end tags (section 7.5.1), etc.).
Applies to elements which can have omitted end tags.
If you look at the P element spec, you will see:
Start tag: required, End tag: optional
So, when you use this:
<DIV>
<P>This is the paragraph.
</DIV>
The P element will be automatically closed.
But, if you look at the EM spec, you will see:
Start tag: required, End tag: required
So this automatic-closing rule does not apply here; the HTML is simply invalid.
Curiously, all the browsers showed the same behavior with that kind of invalid HTML.
All modern browsers use an HTML5 parser (even for HTML 4.01 content), so the parsing rules of HTML5 apply. You can find more information at the Parsing HTML Documents section in the HTML5 spec.
HTML Outline
HTML
  HEAD
    #text " "
  BODY
    #text "plain " (no style)
    EM
      #text "+em " (italic)
      STRONG
        #text "+strong " (bold/italic)
    STRONG
      #text "-em" (bold)
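The outline above follows from two steps of the HTML5 "in body" insertion mode. An end tag for a formatting element runs the adoption agency algorithm, which here (no block element involved) simply pops everything up to and including em, while strong stays in the list of active formatting elements. The next text then triggers "reconstruct the active formatting elements", which opens a fresh strong. Below is a heavily simplified Python sketch of just those two steps, assuming the input contains only text and formatting tags (no block elements, no markers, no "Noah's Ark" limit); the class names and the token format are hypothetical.

```python
class Node:
    """An element; children are Node objects or text strings."""

    def __init__(self, name):
        self.name = name
        self.children = []


class FormattingTreeBuilder:
    def __init__(self):
        self.body = Node("body")
        self.open_elements = [self.body]   # stack of open elements
        self.active_formatting = []        # list of active formatting elements

    def current(self):
        return self.open_elements[-1]

    def reconstruct(self):
        # Rewind to just after the last entry that is still open ...
        i = len(self.active_formatting)
        while i > 0 and self.active_formatting[i - 1] not in self.open_elements:
            i -= 1
        # ... then reopen a copy of every entry from there on.
        for j in range(i, len(self.active_formatting)):
            clone = Node(self.active_formatting[j].name)
            self.current().children.append(clone)
            self.open_elements.append(clone)
            self.active_formatting[j] = clone

    def text(self, data):
        self.reconstruct()
        self.current().children.append(data)

    def start(self, name):
        self.reconstruct()
        node = Node(name)
        self.current().children.append(node)
        self.open_elements.append(node)
        self.active_formatting.append(node)

    def end(self, name):
        # Adoption agency, "no furthest block" case only.
        element = next((n for n in reversed(self.active_formatting)
                        if n.name == name), None)
        if element is None:
            return  # the real algorithm falls back to generic end-tag handling
        if element not in self.open_elements:
            self.active_formatting.remove(element)
            return
        while self.open_elements.pop() is not element:
            pass
        self.active_formatting.remove(element)


def build(tokens):
    builder = FormattingTreeBuilder()
    for kind, value in tokens:
        getattr(builder, kind)(value)  # kind is "text", "start" or "end"
    return builder.body


def as_tuple(node):
    return (node.name, [c if isinstance(c, str) else as_tuple(c)
                        for c in node.children])


# plain <em>+em <strong>+strong </em>-em
TOKENS = [("text", "plain "), ("start", "em"), ("text", "+em "),
          ("start", "strong"), ("text", "+strong "), ("end", "em"),
          ("text", "-em")]
```

Running build(TOKENS) reproduces the BODY part of the outline: the second STRONG is not a guess about intent but the clone made by the reconstruction step.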
If you try running your HTML through http://validator.w3.org/check it will flag up this HTML as being pretty much invalid.
If your HTML is invalid, all bets are off, and different browsers may render your HTML differently.
If you look at the DOM in Chrome (right-click and choose Inspect Element), you can see that because your tags do not match up, it applied an algorithm to decide where you went wrong. Technically, it does close the strong element at the correct place. However, it decides that you were probably trying to make both pieces of text bold, so it puts the final "-em" in an entirely new, extra "strong" element while keeping "+strong" in its own "strong" element. It looks to me like the Chrome team decided it is statistically likely that you want both things to be bold.
I am debugging a layout right now and have come across some strange errors. I am serving the page with the XHTML 1.0 Strict DTD.
The errors look like this:
ID "OFFICENAME" already defined:
div class="office" id="officename"
ID "OFFICENAME" first defined here
span id="officename">
and
NET-enabling start-tag requires SHORTTAG YES
This error shows up on the line-break tag:
<br />
Could anyone help me with this and tell me the correct way to write it?
id must be unique. You can't have two elements with the same ID. You should remove one of the ids or use class instead. You can have multiple classes on any given element, e.g.:
class="office officename"
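To find such clashes before running the validator, here is a small Python sketch over the stdlib html.parser that records the tag where each id was first used and reports any repeat. It compares ids case-sensitively, as HTML5 and XHTML do; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class DuplicateIdFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.first_seen = {}   # id value -> tag where it was first defined
        self.duplicates = []   # (id value, first tag, repeating tag)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                if value in self.first_seen:
                    self.duplicates.append((value, self.first_seen[value], tag))
                else:
                    self.first_seen[value] = tag


def duplicate_ids(markup):
    finder = DuplicateIdFinder()
    finder.feed(markup)
    finder.close()
    return finder.duplicates
```

The span/div pair from the question is reported once; after switching to classes as suggested, nothing is reported, because class values may repeat freely.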
In HTML/SGML the meaning of / is different than in XHTML: <foo/bar/ is <foo>bar</foo>, and <foo/> is <foo> followed by a literal > character (that's an archaic quirk supported only by the W3C validator).
You're probably sending XHTML markup as HTML. Use the text/html MIME type with the HTML5 DOCTYPE instead (you'll get better compatibility, better validation, and /> talismans will be allowed).
<!DOCTYPE html>
You can't have multiple elements with the same id. Change the id on the span or the div to something else.
I have a html document:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" >
In it I have tags such as
<br />
But I'm reading that this tag is an XHTML element. Yet it still works. Why?
Original answer, based on the question as written before a character was moved, which completely changed its meaning:
But Im reading that </br> is an XHTML element.
It isn't. It is the end tag for an element.
<br /> would be a self-closing tag (representing an entire element) in XHTML. In HTML 4 it means the same as <br>&gt; (a line break followed by a literal >), although most browsers don't respect that, and in HTML5 the / is meaningless syntactic sugar to keep people used to XHTML happy.
In XHTML <br/> means the same as <br></br> (the latter is an error in HTML documents).
Yet it still works why?
Browsers perform enormous amounts of error correction to try to deal with the sort of bad markup that was prevalent in the late 90s.
They are not always consistent in how they recover from different errors (for example, I believe that some browsers will ignore that completely while others will treat it as a line break), so you should never depend on this behaviour.
Browsers failed to implement parsers that correctly handled HTML 4 and earlier.
They should have treated <br/> as "A br element followed by a greater than sign", but instead implemented it as "A br element with a / attribute, what's a / attribute? We'll drop it". This led to the feature being marked as something to avoid.
XHTML then exploited the bug for HTML-Compatible XHTML.
HTML 5 then redefined it as syntactic sugar so the XHTML junkies could keep on using the syntax they were used to.
It's the browser that smooths over these differences. Anyway, </br> with the slash before the tag name is incorrect in both HTML and XHTML.
According to http://www.w3schools.com/tags/tag_br.asp:
In HTML the <br> tag has no end tag.
In XHTML the <br> tag must be properly closed, like this: <br />.
A self-closing tag is valid syntax in XML.
In XHTML, all tags must be closed.
HTML
<br> valid
<br/> valid
XHTML
<br> invalid
<br/> valid
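The table can be checked with two Python stdlib parsers: xml.etree.ElementTree, an XML parser standing in for how XHTML served as XML is read, rejects an unclosed <br>, while the html.parser tokenizer reports a br start tag either way. Note that html.parser is only a tokenizer, not a full HTML5 tree builder; the function and class names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def well_formed_xml(markup):
    """True if an XML parser (as used for XHTML served as XML) accepts it."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


class BrCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # <br> and <br/> both arrive here: for "/>" the default
        # handle_startendtag calls handle_starttag, then handle_endtag.
        if tag == "br":
            self.count += 1


def html_br_count(markup):
    counter = BrCounter()
    counter.feed(markup)
    counter.close()
    return counter.count
```

So "<p>a<br/>b</p>" passes on both sides, while "<p>a<br>b</p>" is fine for the HTML tokenizer but not well-formed XML.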
Edited:
</br> is invalid anyway, and you are lucky if the browser fixes it :)
</br> is like <div id="gd"/>: both are invalid.
I was wondering if I can write self closing elements like in XHTML in HTML5, for example, <input type="email"> can be <input type="email" />, and will it still validate? And is this the correct way to code HTML5 web pages?
HTML5 can be written either in XHTML style or in HTML 4 style. It's flexible that way.
As to which is the correct way, that's a preference. I suspect that many web designers into standards are used to XHTML and will probably continue to code that way.
You can go straight to http://html5.validator.nu/ to validate your code, or, if you have the right doctype, the official W3C validator will check it as HTML5 for you.
Self-closing tags may lead to some parsing errors. Look at this:
<!DOCTYPE html>
<html>
<head><title>Title</title></head>
<body>
<div>
<p>
<div/>
</p>
</div>
</body>
</html>
While it is well-formed XML, it is invalid in HTML5.
W3C validation complains about <div/>:
Self-closing syntax (/>) used on a non-void HTML element. Ignoring the slash and treating as a start tag.
If the innermost self-closed div is treated as a start tag, it breaks the whole structure, so be careful.
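A rough Python sketch of the check the validator performs here: html.parser calls handle_startendtag for tags written with a trailing />, so overriding it lets us flag the ones that are not void elements. The void list follows the current WHATWG HTML standard; the class and function names are hypothetical, and this only reports the problem, it does not reproduce how an HTML5 parser then rebuilds the tree.

```python
from html.parser import HTMLParser

# Void elements in the WHATWG HTML standard; only these may end in "/>".
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class SelfClosingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.non_void = []

    def handle_startendtag(self, tag, attrs):
        # Called for <tag ... />. On a non-void element an HTML5 parser
        # ignores the slash and leaves the element open.
        if tag not in VOID_ELEMENTS:
            self.non_void.append(tag)


def self_closed_non_void(markup):
    checker = SelfClosingChecker()
    checker.feed(markup)
    checker.close()
    return checker.non_void
```

On the document above it flags the inner div, while <input type="email" /> from the question passes, since input is a void element.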
Either will work, just try to be consistent.
Same goes for quoting attributes: I've read tutorials that discourage quoting single-word attribute values. I would quote them all, at least for consistency (unless you have a popular web app where every byte is precious).