Details about html validator - html

I need to know the reason why we use or rely on html validator for html?? I guess it doesnt meet our requirements of the browser compatibility.. Or else what is the use of this HTML Validator??

This document
attempts to answer the questions many people have regarding why they should bother with Validating their web sites and tries to dispel a few common myths.

Many browsers will 'cheat' the established standards and render erroneous HTML in a way that looks OK visually. An HTML validator can tell you whether or not your page actually conforms to the standard, giving you peace of mind into the future (of possibly stricter browsers).

HTML interpretation is sloppy as crap. There is a significant amount of garbage that HTML processors with interpret when they shouldn't. This means problems that should destroy a HTML document will be processed anyways resulting in malformed content or function features that do not resemble their intended objectives. Some of these behaviors include unquoted attributes that cause problems if attribute values collide with attribute names or contain spaces or so forth. Missing ending tags can cause all sort of problems. Honestly, a browser should reject processing of pages that are so broken, but they won't. Instead they will perform any level of defamation that your horrible HTML code will allow them to render.
By validating your code avoids these destructive syntax problems. Validation is necessary, because if the browser is willing to processing any level of garbage that it can then it is certainly not going to warn of you problems, so there is no way to know there are problems unless the code is validated.

Related

Why am I able to use css outside html tag? [duplicate]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
It's a well known fact that browsers will accept invalid HTML and do their best trying to make sense out of it. If you create a web page containing only the following code:
<html>
<head>
<title>This is bad HTML</title>
<body>
<h1>Bad HTML</h2>
<p>This is a paragraph
</body>
then you will get a webpage parsed in a way that will show an acceptable view. Whether it is what you meant or not, depends on each browser's understanding of your mistakes.
This, to me, is the same as if Javascript could be written like this:
if (some_var == 1) {
say_something("some text');
else {
do_something_else();
// END OF CODE
which, a Javascript compiler written with the same effort to make sense out of invalid code could proably parse as you meant - or make its own sense but run it after all.
I've seen several articles and questions regarding the question "Is it even worth it writting valid HTML?", which present several opinions on the pros and cons of writting valid HTML. However, what this really makes me wonder is:
Why are browsers accepting invalid HTML in the first place?
NOTE: The following questions are not more questions, but a way to give context to the only question I'm asking here:
Why aren't browsers strict?
Why don't they reject with errors invalid code, just like any other programming language? (not that I'm calling HTML a programming language, but you get the point)
Wouldn't that force all developers to write HTML code that will be interpreted exactly the same in any browser?
If browsers refused to parse invalid markup, wouldn't that effectively result in valid markup everywhere and from anyone wanting to publish content in the web?
If this comes from historical reasons and backward compatibility, isn't it time already to change when we already see sites like adsense.google.com refusing compatibility with IE < v10?
EDIT: Those voting to close this question, please reconsider. This is not a broad question neither is a opinion based one. It's a very specific question on a very specific subject, completely related to the programming world and that can definitely be answered with a real answer by those who actually know it. Thanks.
"Why are browsers accepting invalid HTML in the first place?"
For compatibility reasons, and in the case of newer browsers, because HTML5 dictates an algorithm for parsing even invalid documents.
Earlier HTML specifications were ambiguous on many situations,
such as what happens when the wrong tag is seen, or inconsistent nesting of
tags, such as <b><i></b></i>. Even so, many documents "just work" because some earlier browsers ignore unexpected tags or even "correct" incorrect nesting.
But now the HTML5 specification includes a much less ambiguous algorithm for parsing HTML documents. Note that the algorithm includes points where "parse errors" can occur. But these parse errors usually don't stop a modern browser from displaying an HTML document, although the browser is free to display parse errors in its developer tools if it chooses to:
[U]ser agents, while parsing an HTML document, may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification. [Emphasis added.]
But again, no modern browser, to my knowledge, aborts parsing a document this early because of parse errors (barring extraordinary situations, such as running out of memory).
On the adsense.google.com situation: This probably has nothing to do with invalid HTML, but rather, perhaps, because IE9 and earlier's DOM support is not sufficient for adsense.google.com's needs.
I don't know why they allowed it from the start, but here is why they cant switch now: Legacy Support. If a browser forced strict html, huge parts of the internet would just break, and yes some people would update their code, but some pages would just be lost. There is no incentive for browsers to do this because it would seem to the consumer that browser just doesn't work on some pages and would switch to another that still supports less optimal html.
Basically because it was allowed from the beginning, now it has to be allowed now.
To avoid opinion-based answers, this type of question requires an answer based on an authorative reference with credible and/or official sources.
The following excerpts are quotes from W3C Validator Help & FAQ that addresses Why are browsers accepting invalid HTML in the first place? and some other demonstrated concerns related to that.
About Markup
Most pages on the World Wide Web are written in computer languages
(such as HTML) that allow Web authors to structure text, add
multimedia content, and specify what appearance, or style, the result
should have.
As for every language, these have their own grammar, vocabulary and
syntax, and every document written with these computer languages are
supposed to follow these rules. The (X)HTML languages, for all
versions up to XHTML 1.1, are using machine-readable grammars called
DTDs, a mechanism inherited from SGML.
However, Just as texts in a natural language can include spelling or
grammar errors, documents using Markup languages may (for various
reasons) not be following these rules.
[...]
Concepts
One of the important maxims of computer programming is: "Be
conservative in what you produce; be liberal in what you accept."
Browsers follow the second half of this maxim by accepting Web pages
and trying to display them even if they're not legal HTML. Usually
this means that the browser will try to make educated guesses about
what you probably meant. The problem is that different browsers (or
even different versions of the same browser) will make different
guesses about the same illegal construct; worse, if your HTML is
really pathological, the browser could get hopelessly confused and
produce a mangled mess, or even crash.
That's why you want to follow the first half of the maxim by making
sure your pages are legal HTML.
[...]
Validity might not mean quality, and invalidity might not mean poor quality
A valid Web page is not necessarily a good web page, but an invalid
Web page has little chance of being a good web page.
For that reason, the fact that the W3C Markup Validator says that one
page passes validation does not mean that W3C assesses that it is a
good page. It only means that a tool (not necessarily without flaws)
has found the page to comply with a specific set of rules. No more, no
less. This is also why the "valid ..." icons should never be
considered as a "W3C seal of quality".
Unexpected browser behavior might mean that they actually don't accept invalid markup
While contemporary Web browsers do an increasingly good job of parsing
even the worst HTML “tag soup”, some errors are not always caught
gracefully. Very often, different software on different platforms will
not handle errors in a similar fashion, making it extremely difficult
to apply style or layout consistently.
Using standard, interoperable markup and stylesheets, on the other
hand, offers a much greater chance of having one's page handled
consistently across platforms and user-agents.
[...]
Compatibility problems
Checking that a page “displays fine” in several contemporary browsers
may be a reasonable insurance that the page will “work” today, but it
does not guarantee that it will work tomorrow.
In the past, many authors who relied on the quirks of Netscape 1.1
suddenly found their pages appeared totally blank in Netscape 2.0.
Whilst Internet Explorer initially set out to be bug-compatible with
Netscape, it too has moved towards standards compliance in later
releases.
[...]
Relying too much on 3rd party tools
The answer to this one is that markup languages are no more than data
formats. So a website doesn't look like anything at all! It only takes
on a visual appearance when it is presented by your browser.
In practice, different browsers can and do display the same page very
differently. This is deliberate, and doesn't imply any kind of browser
bug. A term sometimes used for this is WYSINWOG - What You See Is Not
What Others Get (unless by coincidence). It is indeed one of the
principal strengths of the web, that (for example) a visually impaired
user can select very large print or text-to-speech without a publisher
having to go to the trouble and expense of preparing a separate
edition.

Is there a better approach to html standards validation than w3c validator?

Granted, this is a very generic question, but I am wondering if w3c validation is considered a best practice for html validation, or if there are better approaches to ensure contemporary standards-compliant markup.
This question arose when I noticed duplicate IDs on an MDN page (a site I would have assumed would be very strict about its coding practices). It appeared to be an artifact of how they generated the sections of the page.
Curious, I validated the page's code on the w3c validator, and there were various "errors" that suggested that MDN was just ignoring that a certain attribute or value was not valid. Generally, these related to seemingly appropriate uses of rel attributes.
I was left wondering if standards for valid, semantic markup matter less, or if there's a new ideal approach to code validation and standardization than relying on w3c validation.
Maintainer of the current W3C HTML Checker (validator) here. I think it's important to understand the intended purpose of the current HTML checker, which is different from the purpose of the legacy W3C Markup Validator.
The purpose of the checker is documented at https://validator.w3.org/nu/about.html#why-validate:
The core reason to run your HTML documents through a conformance checker is simple: To catch unintended mistakes—mistakes you might have otherwise missed—so that you can fix them.
Beyond that, some document-conformance requirements (validity rules) in the HTML spec are there to help you and the users of your documents avoid certain kinds of potential problems.
There are some markup cases defined as errors because they are potential problems for accessibility, usability, interoperability, security, or maintainability—or because they can result in poor performance, or that might cause your scripts to fail in ways that are hard to troubleshoot.
Along with those, some markup cases are defined as errors because they can cause you to run into potential problems in HTML parsing and error-handling behavior—so that, say, you’d end up with some unintuitive, unexpected result in the DOM
Validating your documents alerts you to those potential problems.
So as far as your question about "are better approaches to ensure contemporary standards-compliant markup", the answer is that it's not an either-or thing; there are a variety of approaches and the W3C HTML Checker is just one of them, and its goal isn't to be the single way to determine anything but instead to just help you catch mistakes you might otherwise miss and that might cause unexpected problems for your users.
As far as ways to get alerted to specific device issues or browser-implementation issues, we don’t have good automated checking tools for that, but a couple of things which are huge help there are:
https://caniuse.com/ — detailed information about the level of support for particular web-runtime features in different browsers, and in different versions of those browsers, and in release of the browsers for mobile devices vs desktop
https://wptdashboard.appspot.com/ — current test results across all major browser engines for dozens of web-runtime features/specs; if https://caniuse.com/ doesn’t have information about a particular feature, you can look through this dashboard and browse to the directory that has tests for that feature, and find whether a browser passes the tests for the feature
But as far as good automated tools we do actually have for checking other things, here are two:
https://validator.w3.org/i18n-checker/ — W3C Internationalization Checker
https://observatory.mozilla.org/ — for doing a security assessment of the content of your site
I recently faced a problem with mentioned above W3C HTML Checker. I respect a huge amount of work that was done by author of this validator, but it did not allow me in any way a tag <script type="text/vbscript" src="file.vbs">. It was said to change type value to empty string, a JavaScript MIME type, or module, which makes my page useless.
I know than VBScript language is rarely used now, it was just a test page, but let me share with you less tricky alternative, as good as the first one for HTML error checking.
Maintainer of the current JsonFormatter (validator) is here

Switch browser to a strict mode in order to write proper html code

Is it possible to switch a browser to a "strict mode" in order to write proper code at least during the development phase?
I see always invalid, dirty html code (besides bad javascript and css) and I feel that one reason is also the high tolerance level of all browsers. So at least I would be ready to have a stricter mode while I use the browser for the development for the pages in order to force myself to proper code.
Is there anything like that with any of the known browser?
I know about w3c-validator but honestly who is really using this frequently?
Is there maybe some sort of regular interface between browser and validator? Are there any development environments where the validation is tested automatically?
Is there anything like that with any of the known browser? Is there maybe some sort of regular interface between browser and validator? Are there any development environments where the validation is tested automatically?
The answer to all those questions is “No“. No browsers have any built-in integration like what you describe. There are (or were) some browser extensions that would take every single document you load and send it to the W3C validator for checking, but using one of those extensions (or anything else that automatically sends things to the W3C validator in the background) is a great way to get the W3C to block your IP address (or the IP-address range for your entire company network) for abuse of W3C services.
I know about w3c-validator but honestly who is really using this frequently?
The W3C validator currently processes around 17 requests every second—around 1.5 million documents every day—so I guess there are quite a lot of people using it frequently.
I see always invalid, dirty html code… I would be ready to have a stricter mode while I use the browser for the development for the pages in order to force myself to proper code.
I'm not sure what specifically you mean by “dirty html code” or “proper code“ but I can say that there are a lot of markup cases that are not bad or invalid but which some people mistakenly consider bad.
For example, some people think every <p> start tag should always have a matching </p> end tag but the fact is that from the time when HTML was created, it has never required documents to always have matching </p> end tags in all cases (in fact, when HTML was created, the <p> element was basically an empty element—not a container—and so the <p> tag simply was a marker.
Another example of a case that some people mistakenly think of as bad is the case of unquoted attribute values; e.g., <link rel=stylesheet …>. But that fact is that unless an attribute value contains spaces, it generally doesn't need to be quoted. So in fact there's actually nothing wrong at all with a case like <link rel=stylesheet …>.
So there's basically no point in trying to find a tool or mechanism to check for cases like that, because those cases are not actually real problems.
All that said, the HTML spec does define some markup cases as being errors, and those cases are what the W3C validator checks.
So if you want to catch real problems and be able to fix them, the answer is pretty simple: Use the W3C validator.
Disclosure: I'm the maintainer of the W3C validator. 😀
As #sideshowbarker notes, there isn't anything built in to all browsers at the moment.
However I do like the idea and wish there was such a tool also (that's how I got to this question)
There is a "partial" solution, in that if you use Firefox, and view the source (not the developer tools, but the CTRL+U or right click "View Page Source") Firefox will highlight invalid tag nesting, and attribute issues in red in the raw HTML source. I find this invaluable as a first pass looking at a page that doesn't seem to be working.
It is quite nice because it isn't super picky about the asdf id not being quoted, or if an attribute is deprecated, but it highlights glitchy stuff like the spacing on the td attributes is messed up (this would cause issues if the attributes were not quoted), and it caught that the span tag was not properly closed, and that the script tag is outside of the html tag, and if I had missed the doctype or had content before it, it flags that too.
Unfortunately "seeing" these issues is a manual process... I'd love to see these in the dev console, and in all browsers.
Most plugins/extensions only get access to the DOM after it has been parsed and these errors are gone or negated... however if there is a way to get the raw HTML source in one of these extension models that we can code an extension for to test for these types of errors, I'd be more than willing to help write one (DM #scunliffe on Twitter). Alternatively this may require writing something at a lower level, like a script to run in Fiddler.

html validator versus SU:BADGE

How to fix that
element "SU:BADGE" undefined <su:badge layout="5"></su:badge>
I use HTML 4.01 Transitional on my website ...
The BADGE is from stumbleupon.com website
Thank you . Regards.
Since such markup is not valid in HTML 4.01, the validator is just doing what you asked it to do: reporting any reportable markup errors.
The only practical reason why such error messages might be disturbing is that if there are many of them, you might accidentally miss to notice some other error messages, relating to constructs where your markup unintentionally deviates from HTML 4.01. If this is an issue, consider writing a custom DTD. It requires some understanding of SGML, but so does the use of validators in general, does it not?
On the other hand, you might decide to refrain from using tags suggested by people who cannot do such a simple thing using valid HTML. There are many ways to put information on a web page in a manner that does not affect the visible appearance but can be retrieved by programs that search for specific constructs (e.g., meta tags). They decided on a way that causes problems to authors who wish to use validtors.
You don’t.
Validation is nice (especially as a sanity check while developing) but if you don’t use valid markup then there’s nothing you can do, and if you need invalid markup (because you use a third-party API for instance) then validation simply ceases to be relevant.
Alternatively, you could serve your page as XML instead of HTML 4 and define an appropriate su XML namespace. But since this would mean that some browsers no longer display your page correctly this is more a theoretical possibility.

Is XHTML compliance pointless?

I'm building a site right now, so far I've painfully forced everything to be compliant and it looks pretty much the same across browsers. However, I'm starting to implement some third party/free javascripts which do things like add attributes (eg. order=2). I could work around this but it's a pain, and I'm starting to lose my principals of making sure everything is valid. Really, is there any point to working around something like this? I got the HTMLValidator plugin for firefox, and looking at most major sites (including this one, google, etc.), they aren't valid XHTML or HTML.
The validation is useful to determine when things are failing to meet standards you presumably agree with. If you are purposefully using a tool that specifically adds something not in the validation standards, obviously that does not break your personal standards agreement.
This discussion gets much more difficult if you have a boss or a client who believes everything should return the green light, as you'll have to explain the above to them and convince them it's not simply you being lazy.
That said, be sure it's not simply be a case of you being lazy. While the validators may annoyingly constantly bring up every instance of the third party attribute, that doesn't invalidate (ha) the other validation errors they're mentioning. It's often worth scanning through as a means of double-checking your work.
Standards compliance is about increasing the chance that your page will work in the browsers you don't test against. This includes screen readers, and the next update of the browsers you do test against, and browsers which you do test against but which have been configured in unexpected ways by the user.
Validating doesn't guarantee you anything, since it's possible for your page to validate but still be sufficiently ambiguous that it won't behave the way you want it to on some browser some day.
However, if your page does validate, you at least have the force of the XHTML spec saying how it should behave. If it doesn't validate, all you have is a bunch of informal conventions between browser writers.
It's probably better to write valid HTML 3 than invalid XHTML, if there's something you want to do which is allowed in one but not the other.
If you're planning on taking advantage of XHTML as XML, then it's worth it to make your pages valid and well formed. Otherwise, plain old semantic HTML is probably want you want. Either way, the needs of your audience outweigh the needs of a validator.
I have yet to experience an instance where the addition of a non-standard attribute has caused a rendering issue in any browser.
Don't try to work around those non-standard attributes. Validators are handy as tools to double check your code for unintentional mistakes, but as we all know, even fully valid xhtml will not always render consistently across browsers. There are many times when design decisions require us to use browser specific (and non-standard) hacks to achieve an effect. This is the life of a web developer as evidenced by the number of technology driving sites (google, yahoo, etc.) that do not validate.
Just keep in mind that the XHTML tag renders differently in most browsers than not having it. The DOCTYPE attribute determines what mode the browser renders in and dictates what is and isn't allowed. If you stray from the XHTML compliance just be sure to retest in all browsers.
Personally I stick with the latest standards whenever possible, but you have to weigh time/money against compliance for sure and it comes down to personal preference for most.
As far as browsers are concerned, XHTML compliance is pointless in that:
Browsers don't have XHTML parsers. They have non-version-specific, web-compatible HTML parsers that build a DOM around the http://www.w3.org/1999/xhtml namespace.
Some browsers that have XML parsers can treat XHTML markup served as application/xhtml+xml as XML. This will take the XML and give default HTML style and behavior to elements in the http://www.w3.org/1999/xhtml namespace. But, as far as parsing goes, it has nothing to do with XHTML. XML parsing rules are followed, not some XHTML DTD's rules.
So, when you use XHTML markup, you're giving something alien to browsers and seeing if it comes out as you intend. The thing is, you can do this with any markup. If it renders as intended and produces the correct DOM, you're doing pretty good. You just have to make sure to keep DOCTYPE switching in mind and make sure you're not relying on a browser bug (so things don't fall apart in browsers that don't have the bug).
What XHTML compliance is good for is syntax checking (by validating) to see if the markup is well formed. This helps avoid parsing bugs. Of course, this can be done with HTML also, so there's nothing special about XHTML in this case. Either way, you still have to test in browsers and hope browser vendors make awesome HTML parsers that can accept all kinds of crap.
What's not pointless is trying to conform to what browsers expect. HTML5 helps with this big time. And, speaking of HTML5, you can define custom attributes all you want. Just prefix them with data-, as in <p data-order="This is a valid, custom attribute.">test</p>.
Being HTML Valid is usually a help for both of you and the browser rendering engine. The less quirks the browsers have to deal with, the more they can focus on adding new features. The more strict you are, the less time you'll spend time wondering why this f##cking proprietary tag does not work in the other browsers.
On the other hand, XHTML is, IMHO, more pointless, except if you plan to integrate it within some XML document. As IE still does not recognize it, it's pretty useless to stay stick with.
I think writing "valid code" is important, simply because you're setting an example by following the rules. If every developer had written code for Fx, Safari and Opera, I think IE had to "start following the rules" sooner than with version 8.
I try write compliant code most of the time weighing the time/cost vs the needs of the audience in all cases but one. Where you code needs to be 503 compliant, it is in your best interest and the interest of your audience to write compliant code. I've come across a bunch of screen readers that blow up when the code is even slightly off.
Like the majority of posters said, it's really all about what your audience needs.
It's not pointless by any means, but there is plenty of justification for breaking it. During the initial stages of CSS development it's very useful for diagnosing browser issues if your markup is valid. Beyond that, if you want to do something and you feel the most appropriate method is to break the validation, that's usually ok.
An alternative to using custom attributes is to make use of the 'rel' attribute, for an example see Litebox (and its kin).
Sure, you could always just go ahead and write it in the way you want, making sure that at minimum it works. Of course, we've already suffered this mentality and have witnessed its output, Internet Explorer 6.
I am a big fan of the Mike Davidson approach to standards-oriented development.
Just because you can validate your code doesn’t mean you are better than anybody else. Heck, it doesn’t even necessarily mean you write better code than anybody else. Someone who can write a banking application entirely in Flash is a better coder than you. Someone who can integrate third-party code into a complicated publishing environment is a better coder than you. Think of validation as using picture perfect grammar; it helps you get your ideas across and is a sign of a good education, but it isn’t nearly as important as the ideas and concepts you think of and subsequently communicate. The most charismatic and possibly smartest person I’ve ever worked for was from the South and used the word “ain’t” quite regularly. It didn’t make him any less smart, and, in fact, it made him more memorable. So all I’m saying is there are plenty of things to judge someone on… validation is one of them, but certainly not the most important.
A lot of people misunderstand this post to mean that we shouldn't code to standards. We should, obviously, but it's not something that should even really be thought about. The validation army will always decry those that do not validate, but validation means so much more than valid code.
So, don't lose your principles, but remember that if you follow the standards you're a lot less likely to end up in the deep-end of issues in the future. The content you're trying to provide is far more important than how it is displayed.