IE not parsing HTML correctly - html

Here's a snapshot of my HTML code as viewed after clicking View Source in IE9. It opens up in Notepad++.
Here is the code as parsed by the IE9's Developer Tool:
Why is there a disconnect between the two?

As @Shadow Wizard suggested in a comment, the probable cause is some data character before the <!DOCTYPE ...> declaration. For example, the problem is reproducible by taking a valid XHTML document and inserting the no-break space character (NBSP, U+00A0) at the very start of the document. The presence of a BOM at the start, however, does not cause the problem (at least not in IE 9, and hardly in any browser in use). Many invisible or barely visible characters might cause the problem.
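One way to look for such a stray character is to list everything that precedes the DOCTYPE in the raw bytes of the file. The following is a minimal Python sketch, not a definitive diagnostic: the function name `chars_before_doctype` is hypothetical, and it assumes the file is UTF-8 encoded.

```python
import codecs
import re
import unicodedata

def chars_before_doctype(raw: bytes, encoding: str = "utf-8") -> list[str]:
    """Describe every character that precedes '<!DOCTYPE' (hypothetical helper)."""
    # Drop a UTF-8 byte order mark first, since a BOM alone is not the culprit.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    text = raw.decode(encoding, errors="replace")
    match = re.search(r"<!DOCTYPE", text, re.IGNORECASE)
    if match is None:
        raise ValueError("no DOCTYPE declaration found")
    # Report each leading character by code point and Unicode name.
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"
            for ch in text[:match.start()]]

clean = codecs.BOM_UTF8 + b"<!DOCTYPE html><html></html>"
dirty = "\u00a0<!DOCTYPE html><html></html>".encode("utf-8")
print(chars_before_doctype(clean))  # []
print(chars_before_doctype(dirty))  # ['U+00A0 NO-BREAK SPACE']
```

In practice you would pass in the bytes read from the saved page with `open(path, "rb").read()`.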

The Developer Tool does NOT show you the source of your code; it shows the DOM (Document Object Model), which is why there are differences. You can read more about the DOM here: https://developer.mozilla.org/en/DOM/
The content shown in the Developer Tool is your parsed source, modified to be more readable and with incorrect tags fixed up. In short, it does not show the exact source.

Related

Forcing the Chrome Inspector to show the real source code instead of its interpretation of it

The Chrome Inspector is pretty neat when looking at your HTML/JS in an application, but a few times now I have noticed that it does not display the real source, but rather shows you how it has interpreted your page.
This will occasionally make things complicated or confusing, because a bug is actually caused by a behavior that the inspector refuses to show and instead requires you to look at the page through View Source to see.
For example, if you nest two forms (which is illegal in html) then the inspector will instead show that it's closing the first form before opening the second one, which makes it look like everything is okay.
I've also seen it remove attributes that it doesn't understand, swap out quotation marks for different ones, and do some more odd things that complicate the life of a debugging programmer.
Is there any way to turn this feature off and force the inspector to show you what it has really been reading?
The inspector doesn't show the HTML source of the page; instead it shows an HTML representation of the current DOM. When the HTML is initially parsed, the browser validates and fixes it before building the DOM. At that point errors such as nested forms are resolved. If you wish to view the original source of a page, right-click the page and select View Source.

Basic encoding/decoding of characters for the web

I feel like this is something I should definitely know about, but I'm not entirely sure of the details of at what point a character is decoded by a browser (or even if I'm thinking about it in the right way).
While inspecting the DOM of a site to which I've added some content (through a form, for example), I can see my < (in the contents of my comment) appear as a string. Even if the angle brackets are well-balanced (e.g. <something>), it appears as a string rather than an element in the DOM. I appreciate this is critical in defense against injection attacks such as XSS, so (on the server), the content is written as a string literal rather than an element - but how does the browser recognise this and render it differently? And when does it decode it?
If the server does respond with &gt; or &lt;, why do I not see this in dev tools?
My confusion comes from the fact that, when inspecting, there is no difference between my <something> content and a <something> element (if there were such a thing).
So, I'd expect to see (when inspecting the DOM) &lt;content&gt;, but it seems not.
This is merely because your browser's DOM inspector is a bit loose in its representation. You're inspecting the DOM after all, a complex object oriented internal memory structure, yet your browser is showing it to you in an HTML-like presentation. Either because of an oversight or as a conscious decision to make this presentation more readable, not everything that should be an HTML entity in valid HTML is being displayed as HTML entity.
If you inspect the actual source code of the page, you'll see &lt;content&gt;.
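The round trip can be sketched in Python with the standard library's `html` module. This is only an illustration of the escape-then-parse idea, not how any particular browser is implemented, and the `EventLogger` class and `parse_events` function are hypothetical names. The server-side step escapes the comment; the parser then decodes the character references while building text, so the result is a text node rather than an element.

```python
from html import escape, unescape
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    """Record start tags and text as the parser encounters them."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("element", tag))

    def handle_data(self, data):
        self.events.append(("text", data))

def parse_events(markup: str) -> list:
    parser = EventLogger()
    parser.feed(markup)
    parser.close()
    return parser.events

comment = "<something>"
escaped = escape(comment)   # what an escaping server would send
print(escaped)              # &lt;something&gt;

# Escaped input is decoded into plain text; raw input becomes an element.
print(parse_events(f"<p>{escaped}</p>"))
print(parse_events(f"<p>{comment}</p>"))
```

The first parse yields a text event whose content is the literal string `<something>`, while the second yields an element named `something`. A DOM inspector that prints the text node without re-escaping it makes the two look alike.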

Odd ui-grid bug with <!DOCTYPE html[]>

I'm experiencing what seems to be a bug in Angular's ui-grid. My index.html page has this at the top:
<!DOCTYPE HTML[]>
When I run the app, the column headers of the grids scroll off the page:
Now, if I remove the brackets, like this:
<!DOCTYPE HTML>
The grid displays correctly:
Has anyone worked through this? Is there a fix?
Note: I could remove the brackets and leave it at that, but our deployment tool modifies the index.html file at deployment time and adds the brackets, because that's apparently well-formed HTML.
"Well-formed" is a concept that only makes sense in XML. In rough terms, it means that every element has an explicit start tag and an explicit end tag and that they are in the right places. It has nothing to do with the content of the Doctype declaration.
The latest version of the HTML specification says:
A DOCTYPE must consist of the following components, in this order:
A string that is an ASCII case-insensitive match for the string "<!DOCTYPE".
One or more space characters.
A string that is an ASCII case-insensitive match for the string "html".
Optionally, a DOCTYPE legacy string or an obsolete permitted DOCTYPE string (defined below).
Zero or more space characters.
A ">" (U+003E) character.
… so the change your deployment tool is making is neither "well-formed" nor in any way correct.
Breaking the Doctype in that way triggers Quirks Mode. This makes browsers backwards compatible with the browsers of the late 1990s by emulating many of the bugs they featured.
The CSS is breaking because it depends on those bugs not being present.
You could probably rewrite all the CSS so it is designed to work in Quirks Mode, but fixing the deployment tool so it doesn't break the Doctype would be better.
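The component list quoted above translates fairly directly into a regular expression, which could serve as a post-deployment sanity check. The sketch below is a simplification: the name `is_short_doctype` is hypothetical, and the optional legacy and obsolete permitted strings are deliberately not handled, so only the short form is accepted.

```python
import re

# Components in order: "<!DOCTYPE", one or more space characters, "html",
# zero or more space characters, ">". Matching is ASCII case-insensitive.
# The optional legacy / obsolete permitted strings are NOT handled here.
SHORT_DOCTYPE = re.compile(r"<!DOCTYPE[ \t\n\f\r]+html[ \t\n\f\r]*>", re.IGNORECASE)

def is_short_doctype(declaration: str) -> bool:
    """Return True if the string is exactly a short-form HTML DOCTYPE."""
    return SHORT_DOCTYPE.fullmatch(declaration) is not None

for candidate in ("<!DOCTYPE html>", "<!doctype HTML >", "<!DOCTYPE HTML[]>"):
    print(candidate, is_short_doctype(candidate))
```

The first two candidates pass and the bracketed form produced by the deployment tool does not.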

Odd HTML/XML encoding issue

I'm having some real issues with a site we're building on our bespoke content management system. The system renders all views via XSLT, which may be the problem.
The problem we're experiencing appears to be the result of character encoding mismatches, but I'm struggling to work out which part of the process is breaking down.
The issue does not occur in Firefox or Chrome. In IE the page is fine on the initial load and when it is refreshed, but after using the 'back' or 'forward' button any Unicode characters show as a white question mark in a black diamond, which implies that the wrong character set is being used. We've also seen odd results with the page as indexed by Google (it appears to index the DOCTYPE reference and the content of the head element rather than the content, as would normally be the case).
All of the XSLT stylesheets are outputting UTF-16 and the XSLT files themselves are UTF-16 files (previously there was a mismatch). The site is serving the pages as UTF-16 and the HTML output has a meta tag setting the content type to use a charset of UTF-16.
I've checked the results using Fiddler to see what's coming from the server, however, Fiddler isn't logging a request/response when IE uses the back/forward buttons, so presumably it's got them cached somewhere.
Anyone got any ideas?
The site is serving the pages as UTF-16
Whoah! Don't do that.
There are several browser bugs to do with UTF-16 pages. I hadn't heard of this particular one before but it's common for UTF-16 to break form handling, for example. UTF-16 is very rarely used on the web, and as a consequence it turns up a lot of little-known bugs in browsers and other agents (like search engines and other tools written in one of the many scripting languages with poor Unicode support like PHP).
the HTML output has a meta tag setting the content type to use a charset of UTF-16
This has no effect. If the browser fails to detect UTF-16 then, because UTF-16 is not ASCII-compatible, it won't even be able to read the meta tag.
On the web, always use an ASCII-compatible encoding—usually UTF-8. UTF-8 is by far the best-supported encoding, and is almost always smaller in size than UTF-16. UTF-16 offers pretty much no advantage and I would avoid it in every case.
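Both points, the meta tag being unreadable and the size difference, can be seen by encoding a small page both ways. This is a quick Python illustration with a made-up sample string, not a measurement of any real site.

```python
# A made-up sample page fragment; only the byte-level comparison matters.
page = '<meta charset="utf-16"><p>caf\u00e9</p>'

utf8 = page.encode("utf-8")
utf16 = page.encode("utf-16-le")

# UTF-16 spends two bytes on every character here; UTF-8 spends one on ASCII.
print(len(utf8), len(utf16))

# An agent scanning the bytes for an ASCII "<meta" finds it only in UTF-8,
# because UTF-16 interleaves a zero byte with every ASCII character.
print(b"<meta" in utf8)   # True
print(b"<meta" in utf16)  # False
```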
Possibly IE is corrupting the files when they are read from the cache. This could be related to the following (unfortunately unanswered) question:
Firefox & IE: Corrupted data when retrieved from cache
A few things you could check/try:
Make sure the encoding is specified in both the HTTP Content-Type: header and the <?xml encoding=...> declaration at the top of the XML.
Are you specifying the endianness of your UTF-16 or relying on the byte order mark? If the latter, try specifying it. I think Windows is usually fond of UTF-16LE.
Are you able to try another encoding? Namely UTF-8?
Are you able to disable caching from the server end (if it's practical)? pragma: no-cache or whatever its modern day equivalent is? (sorry, been a while since I played with this stuff).
Sorry, no real answer here, but too much to write as a comment.
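To see whether the served bytes rely on a byte order mark, you could save the response body and inspect its first two bytes. A minimal Python sketch follows; the function name `utf16_byte_order` is hypothetical.

```python
import codecs

def utf16_byte_order(raw: bytes):
    """Return 'LE' or 'BE' if the data starts with a UTF-16 BOM, else None."""
    # Caveat: a UTF-32 little-endian BOM also begins with FF FE.
    if raw.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "LE"
    if raw.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "BE"
    return None

with_bom = codecs.BOM_UTF16_LE + "<p>".encode("utf-16-le")
without_bom = "<p>".encode("utf-16-le")   # explicit endianness, no BOM written
print(utf16_byte_order(with_bom))     # LE
print(utf16_byte_order(without_bom))  # None
```

If the result is `None`, the consumer has to learn the byte order from somewhere else, such as a `charset=utf-16le` parameter in the Content-Type header.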

Strange treatment of "plus" character (+) by Internet Explorer 7

This is really weird... When I open the following simple HTML document in Internet Explorer 7.0.5730.11 (on Windows Server 2003 Web Edition SP2)
<html>
<body>
<p>+</p>
</body>
</html>
it shows me a totally blank page. FWIW, this is just a trivial "repro" sample. In real HTML documents, I observed other, even more bizarre effects caused by the presence of the "plus" character following a tag.
NB: The problem appears to be extremely intermittent. Most of the time it does work properly (i.e. displays the "plus" character), and I still can't find any way to reproduce this problem at will.
Some additional details based on recent comments:
There was no server involved. I was opening a file on disk (i.e. used file:// protocol).
The file did not contain anything except five lines shown above. No document type declarations, no character encodings, no nothings.
Looks like a bug in IE. Did anybody encounter the same or similar problem?
NB: I appreciate all the responses received so far, but none of the respondents encountered this problem. Something tells me that 99.(9)% of the StackOverflow audience will not be able to reproduce it. :-)
Does it work if you use the numeric character reference notation?
<html>
<body>
<p>&#43;</p>
</body>
</html>
Does it work if you use a Doctype? IE does get a bit picky if you don't use a doctype (insert no-right-to-be-picky pun here).
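For reference, the decimal numeric character reference for a character is just its code point wrapped in `&#` and `;`. A small Python sketch (the helper name `numeric_reference` is hypothetical) shows the value for the plus sign and confirms that it decodes back:

```python
from html import unescape

def numeric_reference(ch: str) -> str:
    """Build the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"

ref = numeric_reference("+")
print(ref)            # &#43;
print(unescape(ref))  # +
```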
By intermittent do you mean using the same code it appears and doesn't? That sounds really strange.
CLOSED - NOT REPRO... er I mean I only get the +, no matter how many times I refresh. I suggest using the HTML entity reference - but this might be a problem with your system/browser if others can't reproduce either.
For whatever it's worth, I just tested this on IE 7 (7.0.5730.13C0) and it consistently displays the "+" even with several refreshes (at least 10 or 12). You didn't mention an OS, but in my case it's Windows XP SP2 (Help About displays Version 5.1 (Build 2600.xpsp_sp2_qfe.070227-2300: Service Pack 2)). The OS may make a difference in this case.
It's possible that this is due to the server, particularly if it's trying to parse the page as a script. To check:
What HTTP headers do you see when the effect occurs?
When you "View Source" at that point, what do you see?
Does the effect ever occur when you load the page directly as a file?
Does the effect ever occur in other browsers?