XHTML rendering timeline different from HTML in WebKit?

I'm working on a project where we went from XHTML to HTML back to XHTML and there are some definite behavioral changes going back with regards to the page rendering before the CSS loads and scripts that read styles reading them before the CSS loads. Can anyone shed some light on why the following is happening and what can be done about it?
Basically, I have a page with the following structure:
<body>
<!-- Content from Source A -->
<link href="http://a.example.com/style.css" />
<header>...</header>
<!-- Content from Source B -->
<link href="http://b.example.com/style.css" />
<div>...</div>
<!-- Content from Source A -->
<footer>...</footer>
<script src="http://a.example.com/script.js">
/* e.g. */
alert($('header').offset().height);
</script>
</body>
When we were in HTML rendering mode, the page blocks rendering at expected points. When we hit the Source A CSS, rendering pauses (blank screen); when we hit the Source B CSS, rendering pauses (header is visible). When we hit the Source A JavaScript, rendering pauses (full page shown) and the script reads element styles from their rendered state. (In reality, of course, WebKit doesn't stop parsing the DOM or executing JavaScript while the CSS loads, but it does halt execution at the first point where the script needs to read a style.)
When we are in XHTML mode, the page doesn't halt rendering at all and will render the entire page completely unstyled. After that, it appears to process the scripts and stylesheets in the order loaded, or rather it executes the scripts in order but doesn't wait for the stylesheet to load before executing a loaded script. This means that the page will render three times (unformatted, with one stylesheet, and with two stylesheets) and the script may infer completely inaccurate values for element sizes.
Can someone shed light on this? This is happening in all WebKit browsers I've tested, including Chrome 17, Mobile Safari 5, and Android Browser 2.1. Is there any way to ensure HTML render ordering without resorting to the text/html mime type?

WebKit uses libxml2 to handle XML, which sends the parsed XHTML back to WebCore and JavaScriptCore to do the CSS rendering and JavaScript execution.
Stylesheet and script tags link to what's called an external entity in XML terminology. That means they are processed last. The XML spec says:
Except when standalone="yes", they must not process entity declarations or attribute-list declarations encountered after a reference to a parameter entity that is not read, since the entity may have contained overriding declarations; when standalone="yes", processors must process these declarations.
Since standalone="yes" declares that the document does not rely on external markup declarations, this triggers a different processing model.
Link tags are handled differently than xml-stylesheet processing-instructions. The XML stylesheet spec says:
Any links to style sheets that are specified externally to the document (e.g. Link headers in some versions of HTTP [RFC2068]) are considered to create associations that occur before the associations specified by the xml-stylesheet processing instructions. The application is responsible for taking all associations and determining how, if at all, their order affects its processing.
Try commenting out the script tags and converting the link tags to xml-stylesheet processing instructions. Also, try adding standalone="yes" to the XML declaration:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?xml-stylesheet href="foo.css"?>
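If the script has to stay in the page, another possible workaround is to delay any style-dependent code until the stylesheets have finished loading. The following is only a minimal sketch: whenStylesheetsLoaded is a hypothetical helper name, and it assumes the browser sets link.sheet once a stylesheet is loaded and fires load/error events on link elements, which the older WebKit builds mentioned in the question may not do reliably.

```javascript
// Hypothetical helper: run `callback` once every given <link> has loaded.
// Assumption: `link.sheet` is set after loading and `load`/`error` events fire.
function whenStylesheetsLoaded(links, callback) {
  // Links that already expose a sheet are treated as finished.
  const pending = links.filter((link) => !link.sheet);
  let remaining = pending.length;
  if (remaining === 0) {
    callback();
    return;
  }
  const done = () => {
    remaining -= 1;
    if (remaining === 0) callback();
  };
  for (const link of pending) {
    // Count errors too, so a failed stylesheet cannot stall the script forever.
    link.addEventListener('load', done, { once: true });
    link.addEventListener('error', done, { once: true });
  }
}

// Possible browser usage (the selector is an assumption about the markup):
// whenStylesheetsLoaded(
//   Array.from(document.querySelectorAll('link[rel="stylesheet"]')),
//   () => { /* read element sizes here */ }
// );
```

The callback runs synchronously if nothing is pending, so the same code path works whether the stylesheets arrive before or after the script.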
In addition, the use of special characters, entities, and XSLT can further complicate the picture, since the processing model differs between HTML and an XML dialect like XHTML:
The range of allowed chars in XML is defined by the XML spec, and
the range is fully checked by libxml2. Not a concern, unless you parse
this for example with an HTML parser and give the preparsed tree to
libxml2 to serialize back. I hope you're not doing this as XSLT is
an XML language and must be parsed by an XML parser.
References
libxml2 Parser Internals
blink-dev => Intent to Deprecate and Remove: XSLT
blink-dev => Security: libxml2 growBuffer integer overflow on 64-bit machines
blink-dev => Stack-buffer-overflow in xmlSerializeHexCharRef
Webkit Title Index

Related

How does browser loads DOM and CSSOM partially?

First let me give a simple overview of how it loads, then I'll ask a question about it.
Browser fetches HTML => parses HTML => creates nodes => parses nodes and starts converting them to DOM elements => finds a style node, so starts creating the CSSOM => on finishing parsing, if there was a style tag, waits for the CSSOM tree to be constructed => once both are finished, merges the DOM and CSSOM and fires the DOMContentLoaded event.
So in summary, as soon as the CSSOM is ready the browser starts rendering, and the DOM can be added to incrementally.
This is all fine, but how does the flow go when the browser starts rendering the page before the whole HTML is loaded? (For example, in Node.js you can send partial HTML, wait 2s, and then send more.)
What if there was another style tag at the bottom of the page? Not having received all the HTML, and no CSS, the browser would start rendering, but from my understanding rendering only occurs after the CSSOM has been completely built.
What happens to a script tag? If the CSS isn't done processing, the script tag isn't executed, which also stops parsing. JS is run after the CSSOM is complete.
Things may block the DOMContentLoaded Event, but that does not prevent rendering of the incomplete page. That can be important for very long pages streamed from a slow server.
Browsers can and do interleave script execution, re-styling, rendering with the document parsing. This can be trivially shown by executing javascript in the <head> and querying the DOM, you will see that the document will not have all of its nodes (possibly not even a body element) before the DOMContentLoaded event has fired.
You have to think of document construction more as a stream than sequentially executed blocks that run to completion before the next block starts.
Building the CSSOM stops parsing, and thus the execution of subsequent script tags, and it also delays rendering.
Script tags before style tags will execute before CSS is loaded into CSSOM from style tags afterwards.
Style tags that come after script tags will alter CSSOM. And if script accessed styles that are being altered then what it read is outdated. Order matters.
Parsing is stopped not just rendering.
JavaScript blocks parsing because it can modify the document. CSS
can’t modify the document, so it seems like there is no reason for it
to block parsing, right?
However, what if a script asks for style information that hasn’t been
parsed yet? The browser doesn’t know what the script is about to
execute—it may ask for something like the DOM node’s background-color
which depends on the style sheet, or it may expect to access the CSSOM
directly.
Because of this, CSS may block parsing depending on the order of
external style sheets and scripts in the document. If there are
external style sheets placed before scripts in the document, the
construction of DOM and CSSOM objects can interfere with each other.
When the parser gets to a script tag, DOM construction cannot proceed
until the JavaScript finishes executing, and the JavaScript cannot be
executed until the CSS is downloaded, parsed, and the CSSOM is
available.
https://hacks.mozilla.org/2017/09/building-the-dom-faster-speculative-parsing-async-defer-and-preload/
A few important facts:
Event DOMContentLoaded is fired when the document has been fully parsed by the main parser AND the DOM has been completely built.
Any normal script (not async or deferred) effectively blocks DOM construction. The reason is that a script can potentially alter DOM.
Referencing a stylesheet is not parser-blocking nor a DOM-construction-blocker.
If you add a <script> (be it external or inline) after referencing a CSS, the execution (not fetching) of the script is delayed until fetching and parsing of the CSS has been finished even if the script's fetch finishes sooner. The reason is that the scripts may be dependent on the to-be-loaded CSS rules. So the browser has to wait.
Only in this case, a CSS blocks the document parser and DOM construction indirectly.
When the browser is blocked on a script, a second lightweight parser scans the rest of the markup looking for other resources e.g. stylesheets, scripts, images etc., that also need to be retrieved. It's called "Pre-loading".
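The ordering rules listed above can be illustrated with a toy timeline model. This is only a sketch under simplifying assumptions: every resource starts fetching at time 0 (as if the preload scanner had found it), fetchMs and runMs are made-up inputs, and scriptExecutionTimes is a hypothetical name, not a browser API.

```javascript
// Toy model: when does each classic (non-async, non-defer) script start running?
// resources: [{ kind: 'style' | 'script', name, fetchMs, runMs? }] in document order.
function scriptExecutionTimes(resources) {
  let parserTime = 0;  // time at which the parser reaches the next item
  let stylesReady = 0; // time at which all stylesheets seen so far are ready
  const startTimes = {};
  for (const resource of resources) {
    if (resource.kind === 'style') {
      // A stylesheet does not block the parser by itself.
      stylesReady = Math.max(stylesReady, resource.fetchMs);
    } else if (resource.kind === 'script') {
      // A script waits for its own fetch and for every stylesheet before it.
      const start = Math.max(parserTime, resource.fetchMs, stylesReady);
      startTimes[resource.name] = start;
      // The parser is blocked until the script has finished running.
      parserTime = start + (resource.runMs || 0);
    }
  }
  return startTimes;
}

const timeline = scriptExecutionTimes([
  { kind: 'style', name: 'a.css', fetchMs: 300 },
  { kind: 'script', name: 's1', fetchMs: 100, runMs: 10 },
  { kind: 'style', name: 'b.css', fetchMs: 500 },
  { kind: 'script', name: 's2', fetchMs: 50 },
]);
console.log(timeline); // { s1: 300, s2: 500 }
```

In this model s1 has been fetched by 100 ms but cannot start until a.css is ready at 300 ms, which is the indirect parser blocking described above.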

What happens behind the scenes when a file is loaded with a file extension into a browser (like Firefox or Opera)?

What are browsers interpreting when they get a file extension upon load?
I am trying to mix HTML, SVG and DTD (entities), and I am trying to do that in a valid way. But now I am facing a problem that I don't understand. I did:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html [
<!ENTITY duration "3s">
...
...
<div style='width:100%; border:1px solid black;'>high there</div>
<p>toaster</p>
<hr/>
<svg width="600" ...
and my 'page' displays well as intended, and my RubyMine 'reads' the file without any annotation,
as long as the file extension is SVG (i.e. index.svg). If I change it to HTML, bad luck: the page only looks a bit like it should.
see both variants here:
as svg and as html
What happens behind the scenes when the browser "changes its mind" depending on the file extension?
BTW, my RubyMine tells me that there is something wrong with the file when it has the 'html' extension (but not what).
I would prefer both: tell me what happens, and tell me what would be the correct way to do what I want.
This has nothing to do with the browser processing the extension, but it starts with how the server processes the extension.
In fact, the spec says:
File extensions are not used to determine the supplied MIME type of a
resource retrieved via HTTP because they are unreliable and easily
spoofed.
When the page is served as http://keepitsimple-soft.com/question.html, your Apache server includes this HTTP header in the response: Content-Type: text/html, so the browser knows that it's an HTML page and uses an HTML parser to read it. The HTML parser doesn't process those entity definitions in the DOCTYPE, so it can't correctly interpret them in the SVG.
When the page is served as http://keepitsimple-soft.com/question.svg, the server includes this HTTP header in the response: Content-Type: image/svg+xml. In this case, the browser recognises the "+xml" part and parses the file using its XML parser. This does interpret the entity definitions, and can therefore handle the SVG fully.
As far as what you should do, you can either use XHTML and stick with the XML parser, or resolve the entity definitions before sending the page over the wire, in which case, your page should work with the HTML parser. (Though I haven't tested this.)
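The decision described above depends on the Content-Type header, not on the extension. A rough sketch of that choice follows; parserForContentType is a hypothetical function, and real browsers apply further rules (such as MIME sniffing) that are ignored here.

```javascript
// Hypothetical helper: which parser would a browser pick for a given Content-Type?
// Simplified assumption: only the MIME type matters, never the file extension.
function parserForContentType(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalise case.
  const mime = contentType.split(';')[0].trim().toLowerCase();
  if (mime === 'text/html') return 'html';
  if (mime === 'text/xml' || mime === 'application/xml' || mime.endsWith('+xml')) {
    return 'xml';
  }
  return 'other';
}

console.log(parserForContentType('text/html; charset=UTF-8')); // html
console.log(parserForContentType('image/svg+xml'));            // xml
console.log(parserForContentType('application/xhtml+xml'));    // xml
```

Under this model, serving the same file as application/xhtml+xml would keep the XML parser, so the entity definitions in the DOCTYPE would still be processed.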

One W3C validation errors I really want to correct

This would be my first website and I do not want to leave it with these errors. Can someone please help me with these?
Error 1:
if (xmlhttp.readyState==4 && xmlhttp.status==200)
error: character "&" is the first character of a delimiter but occurred as data.
When I replace & with &amp;, my AJAX code stops working.
I have no clue how to correct this one.
Error 2:
…ems"><a href="brushdescription.php?id=<?php echo $popularbrushesrow['bd_brushi…
error: character "<" is the first character of a delimiter but occurred as data
Again the same error but for < this time
UPDATE:
I am using this doctype:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
< and & are characters with predefined entities in XML, and they need escaping when the page is validated as XML or XHTML.
< should be replaced with &lt; (less than)
& should be replaced with &amp; (ampersand)
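As a small illustration of the two replacements, here is a sketch of an escaping helper. escapeXml is a hypothetical name, and it handles only the two characters discussed here, not quotes or >.

```javascript
// Hypothetical helper: escape the two characters XML forbids in literal form.
function escapeXml(text) {
  // Escape & first, so the & inside "&lt;" is not escaped a second time.
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;');
}

console.log(escapeXml('a && b < c')); // a &amp;&amp; b &lt; c
```

The order of the two replace calls matters: escaping < first would produce &lt;, whose ampersand would then be turned into &amp;lt;.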
However, if using these characters in JavaScript you can (instead) enclose the script in a <![CDATA[]]> section, which instructs the parser to not interpret the code as markup and will also not result in a validation error.
Try wrapping your Javascript with <![CDATA[]]> tags like so:
<script>
//<![CDATA[
// Javascript goes here.
//]]>
</script>
Also, you should look into separation of concerns. Try to move your logic out of your view. If your JavaScript is in your HTML page, try to include it from a separate file.
From Wikipedia:
HyperText Markup Language (HTML), Cascading Style Sheets (CSS), and JavaScript (JS) are complementary languages used in the development of webpages and websites. HTML is mainly used for organization of webpage content, CSS is used for definition of content presentation style, and JS defines how the content interacts and behaves with the user. Historically, this was not the case though. Prior to the introduction of CSS, HTML performed both duties of defining semantics and style.
Use HTML, not XHTML (or, if you insist on using XHTML, see the guidelines on how to write XHTML that can be parsed as HTML).
I can't see how you could have generated that error. Some more context would be useful.
For the first error, consider switching from XHTML to HTML5. There's really little reason to use XHTML. Use this:
<!DOCTYPE html>
The W3C validator is for client-side code, but it seems you are trying to validate server-side code, hence the PHP tag. Send the rendered code for validation and the second error will go away. The rendered code is the one visible in the browser under "View source". You can supply the URL if it's already online somewhere.
By XML rules, “The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively.” So “&&” is to be written as &amp;&amp;.
However, this works only when the document is processed as “real XHTML” due to having been sent with an XML content type, e.g. with the HTTP header Content-Type: application/xhtml+xml. Doing so implies that old versions of IE will choke on it and modern browsers will refuse to render the document at all if it contains any well-formedness error. People don’t normally do that; they just use XHTML because someone told them to do so, and their documents are sent with an HTML content type, which means among other things that script element content is processed differently. This explains why a fix that satisfies the validator makes the page break.
In the XHTML 1.0 specification, the (in)famous appendix C says: “Use external scripts if your script uses < or & or ]]> or --.” This is the simple cure if you need to use XHTML. Just put your script in an external file and “call” it with <script src="foo.js"></script>.

How best open xml, parse with xslt and show result in browser

I am currently studying ways to present transformed xml files in browsers. My experience with this is minimal, so a number of questions pop up.
I have a transformation test.xslt which transforms input xml to html, and an input file test.xml containing
<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="test.xslt" ?>
<root>...</root>
which, when opened in IE9, neatly displays the transformed xml contained above in the root element.
Question 1
Is there a processing instruction or similar available to include the source xml into the xml to be opened, somewhat like the following:
<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/xsl" href="test.xslt" ?>
<... instruction to include source file data.xml>
Question 2
The file opened has extension xml. Is there a way to change file contents so it is valid html, allowing the file to be saved with extension html, so that when opened, the default browser will be selected (simply changing extension to html obviously does not have the desired effect so some structural change is necessary) ?
Question 3
My goal is to query a db to get the data to be parsed by the xslt code. What is the best way to do this (no problem if this includes javascript)?
Question 4
Standard db utilities may export query results in attribute-centered fashion (column names and values being represented as attribute names and values). This may involve pre-parsing the xml from db in order to convert it to parent-child fashion (columns as children instead of attributes). What is the best way to do this pre-parsing (note: I already have the xslt for this; I wonder about the data flow and when/how to run two xslt's in sequence) and then apply test.xslt (preferably without saving intermediate xml result files on the server)?
Question 5
When I open the above XML in IE9, this works fine as said. But opening it in Firefox gives an error (an RTF issue; apparently I need to use Firefox's node-set function, but I still have to discover which namespace that has), and Opera/Chrome/Safari do not show any content. What exactly are the prerequisites for the various browsers, and where can I find more information on this?
Q1 If you start by serving an html file which then accesses the xml and xslt via javascript it naturally has access to both the input and the output of the xslt. If you are serving the xml and initiating the transformation using xml-stylesheet pi, then perhaps the best thing to do (depending on what you want to do) is to stuff the original source into the output, then javascript in the generated page can access it if needed, eg
<xsl:template match="whatever">
<html>
<head>
<script id="source" type="x-xml-source">
<xsl:copy-of select="/"/>
</script>
.... whatever you were going to do
Then, if you need to access the source in response to a user action on the page, a script can retrieve the script element with id source and do whatever is needed. (If there is a possibility of the source including the string </script>, you have to code it a bit more defensively.)
Q2 If you want to use the xml-stylesheet PI then you have to serve it as XML. However, you can instead just serve HTML and then access the XML and XSLT from within a script in the HTML page using the browser's JavaScript XSLT API. As noted above, that is more flexible than the xml-stylesheet mechanism.
Q3 pass
Q4 If you are accessing the xslt from javascript then it is easy to chain the output of one to the input of another without writing back to the server as you just have access to the result as a DOM node (or string, depending)
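A minimal sketch of such chaining follows, assuming the browser provides XSLTProcessor with importStylesheet and transformToDocument. chainTransforms and its makeProcessor parameter are hypothetical names introduced here so the processor can be swapped out.

```javascript
// Hypothetical helper: apply several stylesheets in sequence, entirely in memory.
// sourceDoc and each entry of stylesheetDocs are assumed to be parsed XML documents.
function chainTransforms(sourceDoc, stylesheetDocs, makeProcessor = () => new XSLTProcessor()) {
  return stylesheetDocs.reduce((currentDoc, stylesheetDoc) => {
    const processor = makeProcessor();
    processor.importStylesheet(stylesheetDoc);
    // The output of this step becomes the input of the next one.
    return processor.transformToDocument(currentDoc);
  }, sourceDoc);
}

// Possible browser usage (document names are placeholders):
// const result = chainTransforms(dataDoc, [attributesToChildrenXslt, testXslt]);
```

Because each intermediate result is just a document object passed to the next processor, nothing has to be written back to the server between the two transformations.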
Answer to question 5: Firefox/Mozilla, Opera, Safari, Chrome all support the EXSLT node-set extension function in the namespace http://exslt.org/common, for IE and MSXML you can use script (imported) inside the XSLT stylesheet to allow it to support that namespace too, see http://dpcarlisle.blogspot.de/2007/05/exslt-node-set-function.html. That way inside the main stylesheet where you need to use the node-set function you don't need to write different code to cater for the different namespaces.

Why JSPX does not like empty elements?

This <div id="adiv"></div> will in JSPX somehow be translated into <div id="adiv" />. The way I got it to work is to add an empty comment inside, like <div id="adiv"><!-- --></div>. I don't understand why this is happening. Is there a better way to solve this issue?
That's by JSP specification:
JSP.6.2.3 Semantic Model
...
To clearly explain the processing of whitespace, we follow the structure of the
XSLT specification. The first step in processing a JSP document is to identify the
nodes of the document. Then, all textual nodes that have only white space are
dropped from the document; the only exception are nodes in a jsp:text element,
which are kept verbatim. The resulting nodes are interpreted as described in the
following sections. Template data is either passed directly to the response or it is
mediated through (standard or custom) actions.
In theory, it should not harm if you use a XHTML doctype to present the document in the client side instead of a HTML doctype. However, even the XHTML spec requires some elements to be not self-closing. Another workaround would be using <jsp:text /> instead of a comment.
<div id="adiv"><jsp:text /></div>
JSP(X) is however an old view technology. Its successor, Facelets, does a better job in this.
See also:
JSP Document/JSPX: what determines how tabs/space/linebreaks are removed in the output?
Why don't self-closing script tags work?