I was wondering how one would algorithmically compare the rendering of a website in various browsers in order to detect incompatibilities (e.g. float issues), the way browsera does.
You could attempt to parse the HTML and CSS and look for known problems, like a ‘lint’ tool. But there are so very many browser bugs (esp. IE6 layout bugs) that you'd be unlikely to find everything that way.
The other way would be to load actual instances of each of the target browsers and script them to load the given URL. You could then inject JavaScript to walk their DOMs, reading the page-relative positions of each element (using the offset* properties), and flag any elements whose positions/dimensions vary greatly between browsers. You'd also want to catch and record any unhandled JS errors thrown, perhaps through window.onerror.
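A minimal sketch of the kind of script you might inject (the property choices and the report structure are just illustrative assumptions):

```javascript
// Illustrative only: walk the DOM, recording page-relative positions/sizes
// of every element plus any uncaught JS errors, into a report object that a
// test harness could read back out and diff between browsers.
var report = { errors: [], elements: [] };

window.onerror = function (message, source, line) {
  report.errors.push({ message: message, source: source, line: line });
};

var all = document.getElementsByTagName('*');
for (var i = 0; i < all.length; i++) {
  var el = all[i];
  var rect = el.getBoundingClientRect();
  report.elements.push({
    // crude key so the same element can be matched up across browsers
    key: el.tagName + '#' + i,
    left: rect.left + window.pageXOffset,
    top: rect.top + window.pageYOffset,
    width: el.offsetWidth,
    height: el.offsetHeight
  });
}
// e.g. a Selenium harness could now evaluate JSON.stringify(report)
// in each browser and flag elements whose numbers diverge.
```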
Related
Is it possible to switch a browser to a "strict mode" in order to write proper code at least during the development phase?
I always see invalid, dirty HTML code (besides bad JavaScript and CSS), and I feel one reason is the high tolerance level of all browsers. So I would at least like a stricter mode to use in the browser during development, in order to force myself to write proper code.
Is there anything like that in any of the well-known browsers?
I know about the W3C validator, but honestly, who really uses it frequently?
Is there maybe some sort of regular interface between the browser and a validator? Are there any development environments where validation is tested automatically?
Is there anything like that in any of the well-known browsers? Is there maybe some sort of regular interface between the browser and a validator? Are there any development environments where validation is tested automatically?
The answer to all those questions is “No”. No browsers have any built-in integration like what you describe. There are (or were) some browser extensions that would send every single document you load to the W3C validator for checking, but using one of those extensions (or anything else that automatically sends things to the W3C validator in the background) is a great way to get the W3C to block your IP address (or the IP-address range for your entire company network) for abuse of W3C services.
I know about the W3C validator, but honestly, who really uses it frequently?
The W3C validator currently processes around 17 requests every second—around 1.5 million documents every day—so I guess there are quite a lot of people using it frequently.
I always see invalid, dirty HTML code… I would at least like a stricter mode to use in the browser during development, in order to force myself to write proper code.
I'm not sure what specifically you mean by “dirty HTML code” or “proper code”, but I can say that there are a lot of markup cases that are not bad or invalid but which some people mistakenly consider bad.
For example, some people think every <p> start tag should always have a matching </p> end tag, but the fact is that from the time HTML was created, it has never required documents to have matching </p> end tags in all cases (in fact, when HTML was created, the <p> element was basically an empty element, not a container, so the <p> tag was simply a marker).
Another example of a case that some people mistakenly think of as bad is unquoted attribute values; e.g., <link rel=stylesheet …>. But the fact is that unless an attribute value contains spaces, it generally doesn't need to be quoted. So in fact there's actually nothing wrong at all with a case like <link rel=stylesheet …>.
So there's basically no point in trying to find a tool or mechanism to check for cases like that, because those cases are not actually real problems.
All that said, the HTML spec does define some markup cases as being errors, and those cases are what the W3C validator checks.
So if you want to catch real problems and be able to fix them, the answer is pretty simple: Use the W3C validator.
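If you do want validation wired into a build or test step, one approach is to run your own instance of the Nu Html Checker and post documents to it from a script; a rough sketch, assuming a locally-running checker at localhost:8888 and an environment with fetch such as Node 18+ (the URL, port, and setup are assumptions):

```javascript
// Sketch: send a file's markup to a locally-run Nu Html Checker instance
// and print what it reports. The localhost URL/port are assumptions about
// your own setup; don't point an automated loop at the public W3C service.
const fs = require('fs');

async function checkHtml(path) {
  const html = fs.readFileSync(path, 'utf8');
  const response = await fetch('http://localhost:8888/?out=json', {
    method: 'POST',
    headers: { 'Content-Type': 'text/html; charset=utf-8' },
    body: html
  });
  const result = await response.json();
  // The checker returns a `messages` array describing each problem found.
  for (const msg of result.messages) {
    console.log(msg.type + ': ' + msg.message);
  }
}

checkHtml('index.html').catch(console.error);
```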
Disclosure: I'm the maintainer of the W3C validator. 😀
As #sideshowbarker notes, there isn't anything built into the browsers for this at the moment.
However, I do like the idea and wish such a tool existed too (that's how I got to this question).
There is a "partial" solution: if you use Firefox and view the source (not the developer tools, but CTRL+U or right-click "View Page Source"), Firefox will highlight invalid tag nesting and attribute issues in red in the raw HTML source. I find this invaluable as a first pass when looking at a page that doesn't seem to be working.
It is quite nice because it isn't super picky about things like an unquoted id value or a deprecated attribute, but it highlights genuinely glitchy stuff: messed-up spacing inside a tag's attributes (which would cause issues if the attributes were not quoted), a span tag that was not properly closed, a script tag sitting outside the html element, and a missing doctype or content appearing before it.
Unfortunately "seeing" these issues is a manual process... I'd love to see these in the dev console, and in all browsers.
Most plugins/extensions only get access to the DOM after it has been parsed, by which point these errors are gone or papered over... However, if there is a way to get the raw HTML source in one of these extension models, we could code an extension to test for these types of errors, and I'd be more than willing to help write one (DM #scunliffe on Twitter). Alternatively, this may require writing something at a lower level, like a script to run in Fiddler.
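For what it's worth, a content script can usually get at the raw markup by simply re-requesting the current page; a rough sketch, with a couple of deliberately naive checks standing in for a real conformance checker (and with the caveat that the second request may not match the bytes the browser originally parsed):

```javascript
// Sketch for a WebExtension content script: re-fetch the current page to get
// the raw, pre-parse HTML (the DOM an extension sees has already been
// error-corrected by the parser).
fetch(location.href, { credentials: 'include' })
  .then(function (response) { return response.text(); })
  .then(function (rawHtml) {
    // Naive example checks only; a real extension would feed rawHtml to a
    // proper conformance checker rather than regexes.
    if (!/^\s*<!doctype html/i.test(rawHtml)) {
      console.warn('Markup issue: missing doctype or content before it');
    }
    if (/<\/html>[\s\S]*<\w+/i.test(rawHtml)) {
      console.warn('Markup issue: content after the closing </html> tag');
    }
  });
```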
I would like to know how the browser handles CSS rules that arrive after most (if not all) of the HTML. Will it have to reparse the whole page because of the new rules, or does it use some other technique to handle this situation? Thanks.
There are many cases in which a repaint must occur, and the DOM changes many times over a page's lifetime.
But once the page is parsed, there is no reason to parse it again; all changes are made on the in-memory DOM.
This being said, you should put the CSS links in the HEAD because
it lets the browser start downloading them sooner
it complies with the HTML4 spec ("it may only appear in the HEAD section of a document")
it lets the browser start rendering sooner
it spares your colleagues and your future self surprises when maintaining the code
Relayout and repaint, perhaps. (That is, if it has already started rendering it and the styles loaded require different display.)
Reparse, no. Style sheets are purely presentational; they do not affect the parsing.
Assuming that the browser has already started rendering the page when it sees the additional CSS (there are quite a few browser-specific triggers for this behavior) and assuming that the new rules result in CSS property changes for at least one element, the browser will simply mark that element as one that needs redrawing.
This will result in any visible changes to the page being shown the next time the browser repaints part of its window.
It's important to keep in mind that modern browsers do all of this asynchronously and schedule events like applying new CSS, recalculating layout and painting to the screen mostly (but not totally) independently of each other.
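You can observe this directly: injecting a stylesheet well after parsing has finished restyles the existing DOM with no reparsing. A minimal sketch (the rule and the delay are arbitrary):

```javascript
// Two seconds after this runs, the existing DOM is restyled, relaid out and
// repainted to satisfy the new rule; the HTML itself is never reparsed.
setTimeout(function () {
  var style = document.createElement('style');
  style.textContent = 'p { color: red; font-size: 2em; }';
  document.head.appendChild(style);
}, 2000);
```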
I have almost 200,000 HTML pages and I need to figure out the height and width at which each one will render in any browser. I only need approximate numbers. How can I do this programmatically with C#?
C# does not execute inside a browser, and should not be used to try to determine the width and height at which a given HTML page will render in a browser. Moreover, there is no single answer for "any browser": different browsers may support different fonts, may render the same content slightly differently, and may be configured with different display-related settings (most browsers allow the user to arbitrarily scale the default font size up or down, which of course affects the final rendered size).
In general, however, I would suggest you do something like:
Come up with a JavaScript snippet that can compute the current size of the document (a rough sketch follows this list).
Write a C# (or Java, C, bash, etc.) program to append your snippet to each of your 200,000 pages.
Use a browser-based test-harness like Selenium or Webdriver to load up each of your 200,000 pages, extract the result from your JavaScript snippet, and log it out to somewhere convenient.
Optionally, you can repeat step 3 with different browsers to get the width/height for all the different browsers that you care about.
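As a rough starting point for step 1 (there is more than one reasonable definition of "document size", so treat the property choices below as assumptions):

```javascript
// Rough measurement of the rendered document size. Browsers disagree about
// which property reflects the full document, hence the max() over several.
function getDocumentSize() {
  var body = document.body;
  var html = document.documentElement;
  return {
    width: Math.max(body.scrollWidth, body.offsetWidth,
                    html.clientWidth, html.scrollWidth, html.offsetWidth),
    height: Math.max(body.scrollHeight, body.offsetHeight,
                     html.clientHeight, html.scrollHeight, html.offsetHeight)
  };
}

console.log(JSON.stringify(getDocumentSize()));
// A Selenium/WebDriver harness could instead evaluate this directly with
// something like executeScript and skip editing the 200,000 files at all.
```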
Edit: Apparently Webdriver and Selenium are the same thing now. When did that happen?
It's pretty straightforward. Just write an HTML parser and enough of a rendering engine to at least know the height and width of any HTML element (for any screen size and font setting?). Obviously you will need a CSS parser and engine too. Since you want to know for any browser, you will need modes for emulating each one. And if you can't directly get at the DOM of the HTML pages you are trying to measure, you will need a JavaScript engine to get the values as they appear on the page.
Or you could run the HTML in a browser and use JavaScript to get the values. This won't be in .NET, though you could have the JavaScript post the data to an ASP.NET page if you like.
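If you take that route, the in-page script only needs to measure and send the result back; a small sketch, where the /api/page-size endpoint is purely hypothetical:

```javascript
// Measure the rendered page and POST the numbers to the server.
// '/api/page-size' is a hypothetical ASP.NET endpoint, not a real one.
var html = document.documentElement;
var payload = {
  url: location.href,
  width: Math.max(html.scrollWidth, html.offsetWidth, html.clientWidth),
  height: Math.max(html.scrollHeight, html.offsetHeight, html.clientHeight)
};

fetch('/api/page-size', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(payload)
});
```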
Or you could use one of the tools recommended in answer to your earlier question.
I am thinking about starting to use some HTML5 elements in my sites. Given Internet Explorer's patchy support for HTML5, I was considering using HTML5shiv. I have read that I would need to set the CSS for the various unrecognised elements to display as block-level, and that there may be issues with loading HTML5 elements via Ajax.
I would like to know what issues others have encountered when using this script. Thanks.
If you're going to dynamically load HTML5 elements you'll need the innershiv. You'll also need to bear in mind that if the IE user has JavaScript disabled, it won't work at all.
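For background, the core trick these shiv scripts rely on in old IE is simply calling document.createElement() for each unknown element name; a rough sketch (the element list is abbreviated, and you still need the display: block CSS mentioned in the question):

```javascript
// Roughly what html5shiv-style scripts do for old IE: creating each unknown
// element once makes IE's parser treat those tags as real, styleable elements.
// You still need CSS such as: section, article, aside, nav { display: block; }
var html5Elements = ['section', 'article', 'aside', 'nav',
                     'header', 'footer', 'figure', 'figcaption'];
for (var i = 0; i < html5Elements.length; i++) {
  document.createElement(html5Elements[i]);
}
// Markup inserted later via innerHTML/Ajax is parsed without this help,
// which is the problem innershiv exists to work around.
```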
I've found the existing solution to be highly unreliable in real-world scenarios - it's fine for noddy little "hello world" examples, but as soon as the pages start getting more complex you will find that styles stop applying on some requests, etc.
It's not a very nice answer, but the truth is that if you need to support older IE versions then you basically can't rely on being able to style HTML5 elements reliably. If you can get away with using the elements but fall back on superfluous markup (divs etc.) for things like layout, then you might get away with it, but it depends on what you consider the lesser of two evils: loads of noddy markup or no IE support.
When writing filters for the Firefox add-on Adblock Plus, you can write rules to completely remove certain HTML elements from the page, but the filtering criteria are in fact limited to a handful of things, like class and ID names and attribute values.
What I was hoping for is, say, a Firefox add-on that would pass a page's HTML to some arbitrary process you specify, where this process could reconstitute the HTML for the entire page in any arbitrary way and then have the browser display the result. Is there a Firefox add-on that allows this, or is this sort of operation commonly accomplished by some entirely different but well-known means (perhaps not browser-specific)?
Wouldn't this allow you to augment the pages a website sends to your browser with arbitrary new features, maybe even features pulled from an entirely different website?
You are looking for Greasemonkey.
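A user script runs against the page after the browser has parsed it and can rewrite the DOM however you like; a minimal sketch, where the matched site and the selectors being removed are just examples:

```javascript
// ==UserScript==
// @name        Page rewrite example
// @namespace   example
// @match       https://example.com/*
// @grant       none
// ==/UserScript==

// Example only: the matched site and selectors are placeholders. From here
// you can rebuild as much of the document as you like, and script managers
// also offer cross-origin requests (e.g. GM_xmlhttpRequest when granted)
// for pulling in data from other sites.
(function () {
  'use strict';
  var unwanted = document.querySelectorAll('.ad, .sponsored');
  for (var i = 0; i < unwanted.length; i++) {
    unwanted[i].parentNode.removeChild(unwanted[i]);
  }
  document.title = '[rewritten] ' + document.title;
})();
```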