Why put # wombat urls ß before actual HTML code?

Every page on the plosone website seems to have:
# wombat urls
ß
This appears before any actual HTML code begins. Why would this be? I've seen a tick being placed in the title before to check (or force?) UTF-8 support, but it seems weird to have this whole phrase there, and even weirder to place it outside of the HTML code...

It is likely a mistake
Whether intentional or by accident, the resulting code is invalid. The code snippet you reference is likely an unintentional relic from a server-side script.
According to the W3C, documents must consist of the following parts, in the given order:
Optionally, a single "BOM" (U+FEFF) character.
Any number of comments and space characters.
A DOCTYPE.
Any number of comments and space characters.
The root element, in the form of an html element.
Any number of comments and space characters.
The code snippet is not a "BOM" (byte-order mark) character, an HTML comment, or a run of space characters, so it is invalid.
It is also worth noting that while modern browsers cope with this invalid code snippet, any text before the DOCTYPE is a parse error that switches an HTML5-compliant parser into "Quirks Mode", just as it did in some versions of Internet Explorer. See this answer to the question "Can comments appear before the DOCTYPE declaration?"
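The ordering rule above can be checked mechanically. The sketch below uses a hypothetical helper called doctype_comes_first. It skips an optional BOM, ASCII whitespace and <!-- --> comments, then looks for a case-insensitive "<!DOCTYPE". This roughly mirrors an HTML5 parser's "initial" insertion mode, where anything else met first (such as the "# wombat urls" text) is a parse error that puts the page in quirks mode. It is simplified: it does not validate the comment syntax or the DOCTYPE itself.

```python
import re

# ASCII whitespace as the HTML spec defines it (tab, LF, FF, CR, space),
# or a simple <!-- ... --> comment.
_SPACE_OR_COMMENT = re.compile(r"[\t\n\f\r ]+|<!--.*?-->", re.DOTALL)


def doctype_comes_first(document: str) -> bool:
    """Return True if only an optional BOM, whitespace and comments
    precede the DOCTYPE (simplified sketch, not a full parser)."""
    pos = 1 if document.startswith("\ufeff") else 0
    # Skip any run of whitespace and comments before the DOCTYPE.
    while (m := _SPACE_OR_COMMENT.match(document, pos)) is not None:
        pos = m.end()
    # The DOCTYPE keyword is matched case-insensitively.
    return document[pos:pos + 9].lower() == "<!doctype"
```

The same check also flags the situation in the related question below, where a security proxy injects a script above the DOCTYPE.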

Related

Odd ui-grid bug with <!DOCTYPE html[]>

I'm experiencing what seems to be a bug in Angular's ui-grid. My index.html page has this at the top:
<!DOCTYPE HTML[]>
When I run the app, the column headers of the grids scroll off the page.
Now, if I remove the brackets, like this:
<!DOCTYPE HTML>
The grid displays correctly.
Has anyone worked through this? Is there a fix?
Note: I could remove the brackets and leave it at that, but our deployment tool modifies the index.html file at deployment time and adds the brackets, because that's apparently well-formed HTML.
"Well-formed" is a concept that only makes sense in XML. In rough terms, it means that every element has an explicit start tag and an explicit end tag and that they are in the right places. It has nothing to do with the content of the Doctype declaration.
The latest version of the HTML specification says:
A DOCTYPE must consist of the following components, in this order:
A string that is an ASCII case-insensitive match for the string "<!DOCTYPE".
One or more space characters.
A string that is an ASCII case-insensitive match for the string "html".
Optionally, a DOCTYPE legacy string or an obsolete permitted DOCTYPE string (defined below).
Zero or more space characters.
A ">" (U+003E) character.
… so the change your deployment tool is making is neither "well-formed" nor in any way correct.
Breaking the Doctype in that way triggers Quirks Mode. This makes browsers backwards compatible with the browsers of the late 1990s by emulating many of the bugs they featured.
The CSS is breaking because it depends on those bugs not being present.
You could probably rewrite all the CSS so it is designed to work in Quirks Mode, but fixing the deployment tool so it doesn't break the Doctype would be better.
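The brackets matter because of how an HTML5 parser reads the DOCTYPE name. It collects characters up to the next whitespace or ">", so <!DOCTYPE HTML[]> has the name "html[]". Any name other than "html" puts the document in quirks mode. The hypothetical doctype_name and document_mode functions below are a simplified sketch of that tokenizer and tree-builder behaviour. They ignore the legacy public/system identifier lists and end-of-file handling.

```python
from typing import Optional

# ASCII whitespace as defined by the HTML spec.
ASCII_WHITESPACE = "\t\n\f\r "


def doctype_name(doctype: str) -> Optional[str]:
    """Extract the DOCTYPE name roughly as the HTML5 tokenizer does."""
    if doctype[:9].lower() != "<!doctype":
        raise ValueError("input does not start with <!DOCTYPE")
    i = 9
    # Skip whitespace between the keyword and the name.
    while i < len(doctype) and doctype[i] in ASCII_WHITESPACE:
        i += 1
    start = i
    # The name runs until whitespace or ">"; brackets are ordinary characters.
    while i < len(doctype) and doctype[i] not in ASCII_WHITESPACE and doctype[i] != ">":
        i += 1
    if i == start:
        return None  # missing name: the tokenizer sets the force-quirks flag
    # Only ASCII upper-case letters are lowercased.
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in doctype[start:i])


def document_mode(doctype: str) -> str:
    # Simplified: a missing name or any name other than "html" means quirks.
    return "no-quirks" if doctype_name(doctype) == "html" else "quirks"
```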

HTML Minification: Whitespace between element attributes

I'd like to remove more unnecessary bytes from my output, and it seems it's acceptable (in practice) to strip what can add up to quite a lot of whitespace from HTML markup by omitting/collapsing the gaps between DOM element attributes.
Although I've tested and researched (a little in both cases), I'm wondering how safe it would be.
I tested in Chrome (43.0.2357.65 m), IE (11.0.9600.17801), FF (38.0.1) and Safari (5.1.7 (blah-di-blah)) and they didn't seem to mind, and couldn't find anything specific in The Specs about whitespace between attributes.
w3.org's Validator complains, which is a strong indication that this is not safe and shouldn't be expected to work, but (there's always a "but") it's possible the requirement for a space is only strict when no quotes are present (for obvious reasons).
Also (snippy but poignant): their SSL is "out of date" which doesn't inspire confidence in their opinion.
I noted also that someone's HTML compressor could (when enabled) strip quotes around attribute values where those values had no whitespace within them (e.g. id), which implies that at least most if not all HTML parsing is focussed on the text either side of the equals signs (except with booleans of course), and where quotes are in use, they'd be considered the prioritized delimiter.
So, would:
<!DOCTYPE html><html><body>
Yabba Dabba Doo!
</body></html>
▲ that ever go wrong, and if so, under which conditions?
What other reasons could there be to maintain this whitespace in production output (code "readability" is a non issue in this case)?
Update (since finding an answer):
Although I basically answered my own question insofar as there is a specification governing whether there should be a space between attributes, I still wonder whether omitting them when using quoted values can be considered practically safe, and would appreciate feedback on this point.
Considering how often spaces may be omitted by accident in production HTML, and that the browsers I tested don't seem to mind when they are, I assume it would be very rare, if ever, for a browser to fail to handle documents with these spaces omitted.
Although it's sensible to follow the specs in pretty much all situations, might this be one time cheating a bit could be acceptable?
After all - if we can magically save several hundred bytes without affecting the quality of the output, why not?
There is a specification (after all)
It turns out I should have looked harder. My bad.
According to these specs:
If an attribute using the empty attribute syntax is to be followed by another attribute, then there must be a space character separating the two.
and
If an attribute using the unquoted attribute syntax is to be followed by another attribute or by the optional U+002F SOLIDUS character (/) allowed in step 6 of the start tag syntax above, then there must be a space character separating the two.
and
If an attribute using the single-quoted attribute syntax is to be followed by another attribute, then there must be a space character separating the two.
and
If an attribute using the double-quoted attribute syntax is to be followed by another attribute, then there must be a space character separating the two.
Which, unless I am mistaken (again), means there must always be spaces between attributes.
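Because these rules require a space after every attribute that is followed by another attribute, a minifier, or a check on its output, has to keep at least one space there. In practice the HTML5 tokenizer does recover from the omission: it reports a "missing-whitespace-between-attributes" parse error and then reads the next attribute normally, which matches what the browsers in the question did. The hypothetical missing_attribute_spaces function below is a rough, regex-based lint for a single start tag. It only looks at quoted values and reports those that are immediately followed by another attribute name.

```python
import re

# A quoted attribute value (double or single quotes) plus the character after it.
_QUOTED_VALUE = re.compile(r"""=\s*("[^"]*"|'[^']*')(.?)""")


def missing_attribute_spaces(start_tag: str) -> list[str]:
    """Return quoted values that are immediately followed by another attribute.

    Rough lint for a single start tag; it does not model the full tokenizer.
    """
    problems = []
    for m in _QUOTED_VALUE.finditer(start_tag):
        following = m.group(2)
        # Whitespace, "/" or ">" after a quoted value is fine; anything else
        # means the next attribute name starts with no space before it.
        if following and not following.isspace() and following not in "/>":
            problems.append(m.group(1))
    return problems
```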
You could try online HTML minifiers like http://www.whak.ca/minify/HTML.htm or http://www.scriptcompress.com/minify-HTML.htm (search Google for more) and look at the little things they change for hints about what can be taken out while the HTML still renders.
Using the first link, your code:
<!DOCTYPE html><html><body>
Yabba Dabba Doo!
</body></html>
Turns into:
<!DOCTYPE html><html><body>Yabba Dabba Doo!
saving you 18 bytes already...
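The 18-byte figure is consistent with the snippet having been saved with Windows (CRLF) line endings, though that is an assumption. On that basis the minifier drops the two line breaks (4 bytes) and the optional </body></html> end tags (14 bytes). With Unix (LF) line endings the saving would be 16 bytes. The hypothetical bytes_saved function below checks the arithmetic:

```python
def bytes_saved(newline: str) -> int:
    """Bytes removed by the minifier, for a given line-ending convention."""
    original = (
        f"<!DOCTYPE html><html><body>{newline}"
        f"Yabba Dabba Doo!{newline}"
        "</body></html>"
    )
    # Minified output: line breaks and the optional end tags are gone.
    minified = "<!DOCTYPE html><html><body>Yabba Dabba Doo!"
    return len(original.encode("utf-8")) - len(minified.encode("utf-8"))
```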

Does HTML5 change the standard for HTML commenting?

Recently I found that there is, possibly, a new way of commenting in HTML5.
Instead of the typical <!-- --> multi-line commenting I've read about, I thought I noticed that my IDE showed a regular <!div > as commented out. So I tested it, and to my surprise Chrome had commented out that tag. It only commented out the tag and not the contents of the div, so I had to comment out the closing tag as <!/div> to avoid closing other divs.
I tested another, and it appears that, in general, putting an exclamation mark right after the < that opens any tag makes that tag commented out.
Is this actually new? Is it bad practice? It is actually very convenient, but is it practical yet (if not new)?
Extra details:
Although the risk of syntax errors or misinterpretation is a good reason not to use this syntax, why does Chrome actually render these as full comments?
The code is written as:
<!div displayed> some text here that is still displayed <!/div>
And then it is rendered as:
<!--div displayed--> some text here that is still displayed <!--/div-->
There is no new standard for comments in HTML5. The only valid comment syntax is still <!-- -->. From section 8.1.6 of W3C HTML5:
Comments must start with the four character sequence U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS (<!--).
The <! syntax originates in SGML DTD markup, which is not part of HTML5. In HTML5, it is reserved for comments, CDATA sections, and the DOCTYPE declaration. Therefore whether this alternative is bad practice depends on whether you consider the use of (or worse, the dependence on) obsolete markup to be bad practice.
Validator.nu calls what you have a "Bogus comment", which means that it's treated like a comment even though it's not a valid comment. This is presumably for backward compatibility with pre-HTML5 versions of HTML, which were SGML-based and had markup declarations of the form <!FOO>, so I wouldn't call this new. They're treated like comments because SGML markup declarations were special declarations not meant to be rendered, but since they are meaningless in HTML5 (with the above exceptions), as far as the HTML5 DOM is concerned they are nothing more than comments.
The following steps within section 8.2.4 lead to this conclusion, which Chrome appears to be following to the letter:
8.2.4.1 Data state:
Consume the next input character:
"<" (U+003C)
Switch to the tag open state.
8.2.4.8 Tag open state:
Consume the next input character:
"!" (U+0021)
Switch to the markup declaration open state.
8.2.4.45 Markup declaration open state:
If the next two characters are both "-" (U+002D) characters, consume those two characters, create a comment token whose data is the empty string, and switch to the comment start state.
Otherwise, if the next seven characters are an ASCII case-insensitive match for the word "DOCTYPE", then consume those characters and switch to the DOCTYPE state.
Otherwise, if there is an adjusted current node and it is not an element in the HTML namespace and the next seven characters are a case-sensitive match for the string "[CDATA[" (the five uppercase letters "CDATA" with a U+005B LEFT SQUARE BRACKET character before and after), then consume those characters and switch to the CDATA section state.
Otherwise, this is a parse error. Switch to the bogus comment state. The next character that is consumed, if any, is the first character that will be in the comment.
Notice that it says to switch to the comment start state only if the sequence of characters encountered is <!--, otherwise it's a bogus comment. This reflects what is stated in section 8.1.6 above.
8.2.4.44 Bogus comment state:
Consume every character up to and including the first ">" (U+003E) character or the end of the file (EOF), whichever comes first. Emit a comment token whose data is the concatenation of all the characters starting from and including the character that caused the state machine to switch into the bogus comment state, up to and including the character immediately before the last consumed character (i.e. up to the character just before the U+003E or EOF character), but with any U+0000 NULL characters replaced by U+FFFD REPLACEMENT CHARACTER characters. (If the comment was started by the end of the file (EOF), the token is empty. Similarly, the token is empty if it was generated by the string "<!>".)
In plain English, this turns <!div displayed> into <!--div displayed--> and <!/div> into <!--/div-->, exactly as described in the question.
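The markup declaration open state and the bogus comment state quoted above can be sketched in a few lines. The hypothetical classify_markup_declaration function takes a string starting with "<!" and reports what the tokenizer would produce. It is a simplification: it skips the CDATA branch, which only applies inside foreign content such as SVG or MathML, and it does not implement the full comment or DOCTYPE states.

```python
def classify_markup_declaration(markup: str) -> tuple[str, str]:
    """Return (token_type, data) for input that begins with "<!".

    Simplified sketch of the "markup declaration open" state: the CDATA
    branch (foreign content only) and the detailed comment and DOCTYPE
    states are not modelled.
    """
    if not markup.startswith("<!"):
        raise ValueError("expected input starting with '<!'")
    rest = markup[2:]
    if rest.startswith("--"):
        # A real comment: its data runs up to "-->" (or the end of input).
        body = rest[2:]
        end = body.find("-->")
        return "comment", body if end == -1 else body[:end]
    if rest[:7].lower() == "doctype":
        # Only detected here; the DOCTYPE states are not implemented.
        end = rest.find(">")
        return "doctype", rest[7:] if end == -1 else rest[7:end]
    # Anything else is a parse error: bogus comment state. Consume up to the
    # first ">" (or EOF); the comment data excludes the ">" itself, and NULs
    # become U+FFFD REPLACEMENT CHARACTER.
    end = rest.find(">")
    data = rest if end == -1 else rest[:end]
    return "bogus_comment", data.replace("\x00", "\ufffd")
```

Wrapping the returned data as <!--data--> gives the same markup that Chrome displayed in the question.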
On a final note, you can probably expect other HTML5-compliant parsers to behave the same as Chrome.
I don't think this is a good habit to get into, since <! introduces markup declarations like <!DOCTYPE. You think it's commented out, but the browser will try to interpret it.
Even if it doesn't appear on the page, this is not the correct syntax for commenting HTML code.

Space Before Closing Slash?

I've frequently seen a space preceding the closing slash in XML and HTML tags. The XHTML line break is probably the canonical example:
<br />
instead of:
<br/>
The space seems superfluous. In fact, I think that it is superfluous.
What is the reason for writing this space?
I've read that the space solves some "backwards compatibility issues." Which backwards compatibility issues? Are those issues still relevant, or are we still adding extra spaces for the sake of, say, IE3 compatibility? Does there exist some spec with the definitive answer on this?
If not backwards compatibility, then is it a readability issue? Similar to the Great Open Curly Brace debate?
void it_goes_up_here() {
int no_you_fool_it_goes_down_there()
{
I can certainly respect differing stylistic opinions, so I'll be happy to learn that writing the space is simply a matter of taste.
The answer is that people wish to adhere to Appendix C of the XHTML 1.0 specification, which you only need to do if you are serving XHTML as text/html. Most people do that, because XHTML's real MIME type (application/xhtml+xml) does not work in older versions of Internet Explorer.
No current browser cares about the space. Browsers are very tolerant of these things.
The space used to be required to ensure HTML parsers treated the trailing slash as an unrecognised attribute.
Supporting bobince's answer with a screenshot of Netscape 4.80 showing the documents
data:text/html,<title>space</title>foo<br />bar
(top left, linebreak rendered) and
data:text/html,<title>no space</title>foo<br/>bar
(bottom left, linebreak ignored).
Posting as answer to show the picture
Tangentially related: I actually had a lengthy answer identifying the cause of this misbehaviour of ancient browsers (and the resulting recommendation to include the space) in misunderstood SGML specs, namely the SGML Null End Tag (NET) (where 1<tag/2/3 equals 1<tag>2</tag>3, so 1<tag/>2 would actually mean 1<tag>>2). However, not only was I unable to find good proof and a concrete version of the standard, I wasn't even able to grasp the proper standard-compliant behaviour. So here are a few raw links for reference:
w3c validator notice mentioning problematic closing slash and pointing to
Empty elements in SGML, HTML, XML, and XHTML # www.cs.tut.fi/~jkorpela/
Beware of XHTML: Null End Tags (NET), stating that
However, there are still some smaller user agents that properly support Null End Tags. One of the more well-known user agents that support it is the W3C validator.
(Unable to reproduce there now, but supports Lee Kowalkowski's statement about multiple browsers affected by this.)
XML W3C Working Draft 07-Aug-97 - latest specs draft that includes reference of Null End Tag in DTD snippet: NET "/>"
Are those issues still relevant or are we still adding extra spaces for the sake of, say, IE3 compatibility?
You were close - it is for Netscape 4.
It is interesting to see other rationalisations, but that's all it was meant for.
No, the space is not required but it is necessary for some older browsers to render those tags correctly. The proper way to do it is without the extra space as this is something XHTML inherited from XML.
In XHTML, br tags must be closed, but the space is not necessary. It's a stylistic thing. In HTML 4, br tags cannot be self-closed, so both forms are wrong there. (HTML5 allows, but ignores, a trailing slash on void elements such as br.)
The space just makes the tags more readable. I am a big proponent of formatting for more readable code. Little things like that go a long way. Without the space the closing tag blends in with the opening tag. It takes just an instant longer for me to process it as I am quickly reading the code.
I think that the white space is a way to reinforce the idea that this tag is empty and it closes itself.
Today I don't use the white space anymore, because I have never had a problem without it.
What if there were a very lazy HTML writer out there, or one with a fear of quotation marks?
Consider the following if you were his robot page crawler...
<img src=http://myunquotedurl.com/image.jpg />
versus
<img src=http://myunquotedurl.com/image.jpg/>
This might seem small, but look at what can happen if the space isn't there: the robot won't know whether the slash is part of the URL or part of the closing tag.
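A spec-compliant HTML5 parser does not actually have to guess here. In the tokenizer's "attribute value (unquoted)" state, the value ends only at whitespace or ">", so a "/" is kept as part of the value. This is also why the specification quoted in the minification question requires a space between an unquoted value and a following "/". The hypothetical read_unquoted_value helper below sketches that rule and ignores character references and parse-error cases:

```python
ASCII_WHITESPACE = "\t\n\f\r "


def read_unquoted_value(tag: str, start: int) -> str:
    """Read an unquoted attribute value starting at tag[start].

    Mirrors the HTML5 "attribute value (unquoted)" state in simplified form:
    the value ends at ASCII whitespace or ">". A "/" does not end it.
    Character references and parse-error cases are not handled.
    """
    end = start
    while end < len(tag) and tag[end] not in ASCII_WHITESPACE and tag[end] != ">":
        end += 1
    return tag[start:end]


def unquoted_src(tag: str) -> str:
    # Hypothetical convenience wrapper: assumes the tag has an unquoted src=.
    start = tag.index("src=") + len("src=")
    return read_unquoted_value(tag, start)
```

With the space, the URL is "http://myunquotedurl.com/image.jpg". Without it, the parser reads "http://myunquotedurl.com/image.jpg/" and the tag is not treated as self-closing.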

Does the DOCTYPE declaration have to be the first tag in an HTML document? [duplicate]

This question already has answers here:
Can comments appear before the DOCTYPE declaration?
(5 answers)
Our security manager dynamically inserts a bit of JavaScript at the top of every HTML page when a page is requested by the client. It is inserted above the DOCTYPE statement. I think this might be the cause of the layout problems I am having.
Ideas anyone?
Yes, the DOCTYPE must come first.
The definition is here: http://www.w3.org/TR/REC-html40/struct/global.html. Note that it says a document consists of three parts, and the DTD must be first.
Yes, DOCTYPE must be the first data on the page: http://www.w3schools.com/tags/tag_DOCTYPE.asp
The recommendation for HTML expresses it as an application of SGML, which requires that the DOCTYPE declaration appear before the HTML element (ignoring HTML comments). Even without the DOCTYPE, adding a SCRIPT element outside the HTML element (either before it or after it) is not valid HTML. Of course, HTML validity may not be a requirement for you, so long as it works in most browsers, and then the quirks-mode switching mentioned will get you: without the DOCTYPE, many browsers will switch to quirks mode, possibly changing the layout.
I assume the TAM script fragment is being added by some proxy or other component that is not able to properly analyse the HTML structure of the page and insert the SCRIPT in the correct position in the HEAD or BODY of the document. In this case, adding it to the end of the document, while not valid HTML, will work in most web browsers.
It could be the source of your problem though!
Check out "quirks mode", as that depends on doctype settings.
Further study: http://www.quirksmode.org/
Explanation: you can toggle your browser (mostly IE) between a strict, standards-compliant mode and a loose mode. This greatly affects rendering. TAM's setting could have switched this on or off.
I read the W3 specs, which just say that there are 3 parts to a document. The sequence is assumed, and there is no explicit statement forbidding, for example, a little JS snippet up front.
I understand that it is possible to configure TAM to add the JS at the end of the document, but it beats me why they put it at the top if it can cause such obvious problems!
The W3C (at w3.org), on a page called html5/syntax.html, says "a DOCTYPE is a required preamble", which I interpret to mean it is required and that it must come first.
It also says it must consist of the following components in this order:
A string that is an ASCII case-insensitive match for the string <!DOCTYPE.
One or more space characters.
A string that is an ASCII case-insensitive match for the string html.
Optionally, a DOCTYPE legacy string or an obsolete permitted DOCTYPE string (defined below).
Zero or more space characters.
A > (U+003E) character.
Yes, the doctype must be the first thing in the document (except for comments and whitespace). You should avoid inserting scripts before the doctype: the result is non-conforming, and HTML5 parsers treat it as a parse error and render the page in quirks mode. (Parsers recover from scripts appended after the rest of the document without switching to quirks mode, if that is an alternative.)
From the HTML 5 specification:
8.1 Writing HTML documents
This section only applies to documents, authoring tools, and markup generators. In particular, it does not apply to conformance checkers; conformance checkers must use the requirements given in the next section ("parsing HTML documents").
Documents must consist of the following parts, in the given order:
Optionally, a single "BOM" (U+FEFF) character.
Any number of comments and space characters.
A DOCTYPE.
Any number of comments and space characters.
The root element, in the form of an html element.
Any number of comments and space characters.
The various types of content mentioned above are described in the next few sections.
From HTML 4.01 Specification:
7 The global structure of an HTML document
An HTML 4 document is composed of three parts:
a line containing HTML version information,
a declarative header section (delimited by the HEAD element),
a body, which contains the document's actual content. The body may be implemented by the BODY element or the FRAMESET element.
[...]
White space (spaces, newlines, tabs, and comments) may appear before or after each section.
[...]
A valid HTML document declares what version of HTML is used in the document. The document type declaration names the document type definition (DTD) in use for the document (see [ISO8879]).
It’s not a tag, but yup. Mainly because that’s the only way to get Internet Explorer (pre-version 8, I think) into standards mode.