Why does Opera parse my web page as XML? - html

I just tried viewing my website http://www.logmytime.de/ in Opera (version 10.50). It gives me an "XML parsing failed" error and refuses to display the web page.
I can choose to "Reparse the document as HTML" and then the page works fine, but that's hardly a solution to my problem.
The weird thing is that the error still occurs after setting an HTML (instead of XHTML) doctype:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
I checked the source output from the browser to make sure I had not made a mistake with the doctype. I even viewed the same web page in Firebug, and it shows a Content-Type of text/html.
So, why does Opera still try to parse my web page as XML?
Thanks,
Adrian
Edit: Just to clarify: I am not asking what the error on my web page is. I understand why this is not valid XHTML. However, I am also using the JavaScript micro templating engine, and its templates are never valid XML, which is why I need the browser to parse my entire web site as HTML, not XHTML. In order to demonstrate this, I just inserted an example template into the web page.
<script type="text/html" id="StopWatchTemplate" >
<h1><#=currentlyRunning?"Aktueller":"Letzter"#> Stoppuhr-Zeiteintrag</h1>
<%-- Stoppuhr - Ende--%>
</script>
When opening the page in Opera, you can see that the template now produces XML parsing errors even though the doctype for the page is still HTML.
Edit 2:: Just to make this even clearer: I am not asking why my web page is not valid XHTML. I am asking why Opera tries to parse it as XHTML despite the HTML doctype.
Edit3:: Please do not post any more answers, I have found the cause of this and documented it below.

Your document is not a valid HTML document. So, the browser should reject it. Unfortunately, due to a historic accident, most browsers do not reject invalid documents, but rather try to fix them (usually with pretty crappy results), so that the author never even notices that his document is broken.
Thankfully, with XHTML, the browser vendors decided to fix that, and actually reject invalid documents. In your case, you are delivering your document as XHTML with the application/xhtml+xml MIME type:
# curl --head http://www.logmytime.de/
HTTP/1.1 200 OK
Cache-Control: private
Content-Length: 12529
Content-Type: application/xhtml+xml; charset=utf-8
^^^^^^^^^^^^^^^^^^^^^
Server: Microsoft-IIS/7.5
X-AspNetMvc-Version: 2.0
X-AspNet-Version: 2.0.50727
Set-Cookie: Referrer=None; path=/
X-Powered-By: ASP.NET
Date: Tue, 04 May 2010 16:08:40 GMT
So, the browser rejects your document (as it should). When you switch over to HTML, then it tries to fix your broken HTML.
Now, you have changed your DOCTYPE to HTML 4.01, but you are still delivering it as XHTML. All you have achieved now is that there are two reasons for the browser to reject your document: it's still invalid because you haven't fixed the actual bug and the DOCTYPE and the MIME type don't match up.
Instead of mucking around with DOCTYPEs and MIME types in order to get the browser to parse your broken document, the correct way to solve this problem would be to simply fix the invalid markup and remove the extraneous class attribute on line 172. [BTW: who wrote that document? The indentation and formatting is awful.]
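The point that the MIME type, not the DOCTYPE, selects the parser can be sketched in a few lines of Python. This is only an illustration, not Opera's actual logic: the function name parser_mode and the short list of XML media types are assumptions made for this example.

```python
from email.message import Message

# Media types treated as XML in this sketch (an illustrative, incomplete list).
XML_MEDIA_TYPES = ("application/xhtml+xml", "application/xml", "text/xml")


def parser_mode(content_type_header: str) -> str:
    """Return "xml" or "html" based only on the Content-Type header value.

    The DOCTYPE inside the document is deliberately never consulted.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() lower-cases the value and drops parameters
    # such as "; charset=utf-8".
    media_type = msg.get_content_type()
    return "xml" if media_type in XML_MEDIA_TYPES else "html"


print(parser_mode("application/xhtml+xml; charset=utf-8"))
print(parser_mode("text/html; charset=utf-8"))
```

With the header from the curl output above, the sketch picks the XML parser regardless of which DOCTYPE the page declares.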

You have the "class" attribute specified two times.
From Well-formedness constraint: Unique Att Spec:
An attribute name MUST NOT appear more than once in the same start-tag or empty-element tag.
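The constraint can be checked outside the browser with any XML parser. A minimal sketch using Python's standard library follows; the helper name is_well_formed and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# One class attribute: accepted.
print(is_well_formed('<div class="a">ok</div>'))
# The same attribute twice in one start-tag: rejected by the XML parser.
print(is_well_formed('<div class="a" class="b">not ok</div>'))
```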

You got the correct answer (HTTP content-type header mandating XML parsing) and it seems it's fixed. I'll just add a minor hint on how you can figure out what's wrong from within Opera itself. Two possible ways:
1) Info panel
This is not visible by default, but if you open the panel bar on the left (press F4 to toggle if you don't see it), then click the small plus sign at the bottom, you can enable "Info" in the menu.
The info panel shows some assorted information about the page currently open, including encoding and MIME type.
2) Opera Dragonfly
Press Ctrl-Shift-I to open developer tools (or go through menus to Tools > Advanced > Opera Dragonfly)
Go to "Network" tab, then re-load site. You will see the request and can review the headers. Comparing this with corresponding information from Firebug would have shown you the difference in Content-type headers. (Here you will also see that Opera sends an "Accept" header that contains "application/xhtml+xml". This means "Hi server, if you happen to have this file in real XHTML format I would understand that just fine.". Perhaps your server-side framework saw that header and wrongly responded with the XHTML content-type even though the content was invalid?)

In case someone else has the same problem: As suggested by DeveloperArt it can be fixed with a simple ContentType="text/html" attribute in the page element.
Edit: The problem was in fact caused by a bug with the mobile.Browser file I am using in my web project. The workaround above works, but it is not really necessary in my case. See this answer for more details.

It seems like the server is serving different MIME types to different user agents. Firefox is getting text/html, but Opera (and curl, according to Jörg W Mittag) is getting application/xhtml+xml. Do you have any content-negotiation code for your site?
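For readers unfamiliar with the term, content negotiation means the server picks the response type from what the client says it accepts. The sketch below is a hypothetical, deliberately naive version in Python. The function name and the sample Accept values are invented for illustration, and nothing in this thread shows that the site in question used code like this.

```python
def negotiate_content_type(accept_header: str) -> str:
    """Naive negotiation: answer with XHTML whenever the client lists it."""
    # Split "type/subtype;q=0.9, type/subtype" into bare media types.
    offered = [
        item.split(";")[0].strip().lower()
        for item in accept_header.split(",")
    ]
    if "application/xhtml+xml" in offered:
        return "application/xhtml+xml; charset=utf-8"
    return "text/html; charset=utf-8"


# Illustrative Accept header values, not captured from real browsers.
print(negotiate_content_type("text/html, application/xhtml+xml, */*;q=0.1"))
print(negotiate_content_type("text/html, */*;q=0.1"))
```

Code of this shape is only safe if the markup really is well-formed XML, because the first response type switches the client to a strict parser.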

Try from another PC to make sure that you're not hitting a cache issue.

The page code is cached in your browser, which is why you are continuing to see the error. You originally saw the error, because your code is likely not valid.

It is because you've kind of told it to...
<html xmlns="http://www.w3.org/1999/xhtml">

application/xhtml+xml
If the server sends the page as application/xhtml+xml, the browser parses it as XML as required by specification. When parsing as XML, the first XML well-formedness mistake will stop the parsing and the client (browser) usually displays an error message.
text/html
The parsers for text/html are more tolerant (due to the history of html development).
Changing the mime type
To change the content type sent by the server, you have to override the value of the Content-Type HTTP header. This can be done through a server-side scripting language, or sometimes in the server configuration (Apache, for example). I do not know how Microsoft-IIS/7.5 lets you specify it on a per-URI basis.
Content-Type: application/xhtml+xml; charset=utf-8
or
Content-Type: text/html; charset=utf-8
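As an example of overriding the header from a server-side script, here is a minimal sketch of a Python WSGI application. It is not the asker's ASP.NET setup; the names app and call_app and the page body are assumptions for illustration only.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI application that always declares its body as text/html."""
    body = "<!DOCTYPE html><title>demo</title><p>Hello</p>".encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app():
    """Invoke the application once without a network, capturing the headers."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a plausible fake request
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured, body


captured, _body = call_app()
print(captured["headers"]["Content-Type"])
```

The same app object could be served for manual testing with wsgiref.simple_server.make_server from the standard library.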

This mostly occurs with ASP.NET, as it sets the content type for Opera to application/xhtml+xml. To overcome this issue, you need to set the content type to text/html. The best way to do that is to add the following to the .browser config file for Opera in the App_Browsers folder.
<capability name="preferredRenderingMime" value="text/html" />
<capability name="preferredRenderingType" value="html32" />
<capability name="SupportsXhtmlRendering" value="false" />

Related

How to prevent "DOCTYPE expected" when setting Content-disposition to attachment?

I have a Python script which is executed for a specific URL and creates an iCal file that can be saved.
It all seems to be working fine except for the fact that, in the developer tools for Edge and Chrome, a warning is being given:
Edge: DOCTYPE expected. Consider adding a valid HTML5 doctype: "<!DOCTYPE html>".
Chrome: Resource interpreted as Document but transferred with MIME type text/calendar
The script is setting the following headers:
Content-Type: text/calendar; charset=utf-8
Content-Disposition: attachment; filename="connect.ics"
It seems as if both browsers are expecting a HTML page to be returned rather than non-HTML content, despite the headers I've used.
Other SO questions on a similar topic (e.g. Chrome says: "Resource interpreted as Document but transferred with MIME type application/vnd.openxmlformats-officedocument.wordprocessingml.document") have suggested adding the download attribute to the A tag. This seems to be an HTML5 attribute (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#attr-download). Adding that attribute stops the message appearing in the developer tools but also stops the progress indicator from appearing!
So it seems I'm stuck between a rock and a hard place: have the progress indicator but get a warning in the developer tools, or fix the warning and have no progress indicator.
Not sure if the progress indicator thing is a bug but both Edge and Chrome behave the same way.
Is it possible to fix the warning and keep the progress indicator or should I just not worry about the warning?

How is the transition of code made in view source page

While viewing a page's source in Firebug, the code is displayed as HTML. But my code is written in XHTML. How is the conversion made?
Thanks.
Firebug does not show the source. It shows a representation of the live DOM.
Additionally, browsers will treat XHTML as HTML if the server tells them it is Content-Type: text/html instead of Content-Type: application/xhtml+xml.
Firebug does not show the source code. Firebug is a debugging tool. The main reasons to use Firebug:
--> The key to successful web development
--> Debugging and testing the web with Firebug
--> Debugging and tuning applications on the fly with Firebug
--> Quick and easy CSS development with Firebug, and so on
When you view the source of a page, the browser returns the code as plain HTML. This is one reason you cannot see your code as it was written in XHTML.

getting namespaced attributes in Chrome

Oh, what frustration. The supposedly XHTML-compliant CKEditor can't actually be served as application/xhtml+xml, so I have to switch to text/html. Suddenly my pages start breaking all over the place.
I serve a well-formed HTML5 document that uses namespaces---in particular, the "example" namespace. Some elements have the "example:fooBar" attribute, but I see now that Chrome when reading a document as text/html converts all attributes to lowercase---grrr!!!
So I change the attribute to "example:foobar" and try element.getAttributeNS("http://example.com/ns", "foobar"). No luck. So I investigate the DOM, and Chrome 17 shows a "localName" of example:foobar. Ack! How hard can namespaces be? Shouldn't Chrome be using a local name of foobar? That is, after all, the local name; example is the namespace prefix!
Is this a Chrome bug? Do all browsers do screwy things like this?
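The difference between the two parsing modes can be reproduced outside the browser. The sketch below uses Python's standard library parsers as stand-ins: html.parser for text/html and ElementTree for XML. It is an analogy under that assumption, not Chrome's implementation, and the class and function names are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

MARKUP = '<p xmlns:example="http://example.com/ns" example:fooBar="1">x</p>'


class AttrCollector(HTMLParser):
    """Collect attribute names exactly as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        self.names.extend(name for name, _value in attrs)


def html_attribute_names(markup):
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


def xml_attribute_names(markup):
    # ElementTree resolves prefixes and reports "{namespace-uri}localname".
    return list(ET.fromstring(markup).attrib)


print(html_attribute_names(MARKUP))  # ['xmlns:example', 'example:foobar']
print(xml_attribute_names(MARKUP))   # ['{http://example.com/ns}fooBar']
```

The HTML-style parser has no notion of namespaces: it lower-cases the name and keeps the prefix and colon as part of one plain attribute name. That matches the localName of example:foobar described above, and it suggests that under text/html a plain getAttribute("example:foobar") is the lookup that matches, rather than getAttributeNS.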

rss and atom content type

I have a problem with my feed's content-type:
When I set the content-type to "application/rss+xml" or "application/atom+xml", Firefox renders it correctly (and displays the default subscribe page), but Chrome renders it as "text/plain". When I change the content-type to "text/xml" or "application/xml", both Firefox and Chrome render it as an XML document (and Firefox does not show the subscribe page).
Have you any idea or suggestion? Which content-type should I use for rss.xml and atom.xml?
The MIME type application/rss+xml is not recognized by the IANA, though some RSS-consuming applications support it anyway.
Because it's not recognized, the only official MIME type to use is text/xml.
Atom feeds do have the official MIME type of application/atom+xml.
You should use the exact type that is specified for that file format. Chrome itself doesn't do anything special with application/rss+xml or application/atom+xml. When things such as registerContentHandler or web intents land, applications will be able to interact with these types.

How browsers use the STRING defined in the <img src="STRING" /> to load picture file

I have a very strange problem:
I use xsl to show an html picture where the source is defined in the xml file like this:
<pic src="..\_images\gallery\smallPictures\2009-03-11 אפריקה ושחור לבן\020.jpg" width="150" height="120" />
[the funny chars are Hebrew- ;) ]
Now comes the strange part:
When testing the file locally, it works in Firefox and Safari but NOT in IE and Opera. (file://c:/file.xml)
Next I send the file to the host through FTP (nothing more).
Then it suddenly works in all browsers when calling the page from the host: (http://www.host/file.xml)
The question is how can the server send the xml file to my browser in a way that my browser can read, while the same browser cannot read the same file stored locally ?!
I always thought that both HTML(xml) and pictures are sent to the client which is responsible to load the page - so how come the same files works for my webhost provider and not for me?
And what makes it totally strange is that IE is not alone - Opera joins it with this strange behavior.
Any ideas?
Thanks a lot
Asaf
When you open the file locally, there is no server to serve up HTTP headers. That's a big difference at least. Try examining the encoding the browser thinks the page is in, both when it's opened manually from disk and when it's served over HTTP.
If headers are set correctly by either your script, or the server, then that is likely why.
This is most likely an encoding problem. Try to specify the encoding explicitly in the generated HTML page by including the following META element in the head of the page (assuming that your XSLT is set to generate UTF-8):
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
...
</head>
...
This tells the browser to use UTF-8 encoding when rendering the page (You can actually see the encoding used in Internet Explorer's Page -> Encoding menu).
The reason why this works when the page is served by your web server is that the web server tells the browser already what encoding the response has in one of the HTTP headers.
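What goes wrong when the browser has to guess the encoding can be shown with a small Python sketch. The choice of latin-1 as the wrong guess is an assumption for illustration; which encoding a given browser actually falls back to for local files is not stated in this thread.

```python
# Part of the folder name from the question, as Python text.
text = "אפריקה"
# The bytes as they would be stored in a UTF-8 encoded file.
raw = text.encode("utf-8")


def decode_as(raw_bytes: bytes, encoding: str) -> str:
    """Decode bytes the way a client would after settling on an encoding."""
    return raw_bytes.decode(encoding)


# Correct encoding: the original text comes back.
print(decode_as(raw, "utf-8") == text)
# Wrong guess: every byte becomes its own character and the text is garbled.
print(decode_as(raw, "latin-1") == text)
```

An HTTP header or META element that names the encoding removes the guess, which is why the served page and the local file can behave differently.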
To get a basic understanding what encoding means I recommend you to read the following article:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets
..\_images\gallery\smallPictures\2009-03-11 אפריקה ושחור לבן\020.jpg
that's a Windows filepath and not anything like a valid URI. You need to:
replace the \ backslashes with /;
presumably, remove the .., if you're expecting the file to be in the root directory;
replace the spaces (and any other URL-unfriendly punctuation) with URL-encoded versions;
for compatibility with browsers that don't properly support IRI (and to avoid page encoding problems) non-ASCII characters like the Hebrew have to be UTF-8-and-URL-encoded.
You should end up with:
<img src="_images/gallery/smallPictures/2009-03-11%20%D7%90%D7%A4%D7%A8%D7%99%D7%A7%D7%94%20%D7%95%D7%A9%D7%97%D7%95%D7%A8%20%D7%9C%D7%91%D7%9F/020.jpg"/>
There's no practical way you can convert filepath to URI in XSLT alone. You will need some scripting language on the server, for example in Python you'd use nturl2path.pathname2url().
It's generally better to keep the file reference in URL form in the XML source.
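The conversion steps listed above can be sketched in Python. Note that this sketch uses pathlib and urllib.parse.quote from the standard library instead of the pathname2url function named in the answer, and that the helper name windows_path_to_url is made up for this example. Dropping the ".." components follows the "presumably" in the list above and may not suit every site layout.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def windows_path_to_url(path: str) -> str:
    """Convert a relative Windows path into a percent-encoded relative URL."""
    # PureWindowsPath splits on backslashes regardless of the host OS.
    # Assumption: ".." components are simply dropped.
    parts = [part for part in PureWindowsPath(path).parts if part != ".."]
    # quote() encodes non-ASCII text as UTF-8 and then percent-encodes it;
    # spaces become %20.
    return "/".join(quote(part) for part in parts)


print(windows_path_to_url(r"..\_images\gallery\smallPictures\020.jpg"))
print(windows_path_to_url("..\\my pictures\\\u05d0.jpg"))
```

Running the conversion once when the XML is generated, rather than in the XSLT, keeps the stylesheet free of path handling.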
@Asaf, I believe @Svend is right. HTTP headers will specify content type, content encoding, and other things. Encoding is likely the reason for the weird behavior. In the absence of header information specifying encoding, different browsers will guess the encoding using different methods.
Try right-clicking on the page in the browser and "Show page info". Content encoding should be different when you serve it from a server, than when it's coming straight from your hard drive, depending on your browser.