Let's face it: writing proper, standards-compliant HTML is quite difficult. Writing semantic HTML is even more so, but I don't think it's possible for a computer to figure that out.
So my question to you is: what would the "ideal" feedback for a user who entered HTML be? Would it be a W3C-validator-style list of errors with corresponding line numbers and columns? Would it be an annotated code display with highlighted lines, explanations of the errors, and possible fixes? A spell-check-style mode where you handle each error separately? Would it be not giving them any error information at all? Also, what types of errors are a good idea to tell users about? (Some broad classes of errors include parsing errors, nesting errors (e.g. putting a div inside a b tag), and well-formedness errors.)
Scottm: Good point; I've never liked the W3C way of listing all the errors either. However, there is still the question of then letting the user edit the offending HTML appropriately.
onebyone: OK, so looking at some screenshots, it looks like HTML Validator has a W3C-style error list, but combined with the ability to jump straight to the relevant source segment and expanded error information, and without having to scroll around to get from one section to another. Looks pretty good, but is it usable by the average Joe?
Edit 1: As a clarification, this is with regards to the interface, not necessarily the underlying implementation. However, the interface needs to be feasible with plain HTML and JavaScript (double usability points if it just needs HTML, but I think you're going to get stuck with the W3C in that case).
The output from the Firefox "HTML Validator" add-on is pretty good. It shows you the source in a big window and a list of errors in a small window (smallness doesn't matter, since you generally only care about the first one; you're aiming for a total of none). Click an error to highlight it, and an expanded explanation is shown in a second small window, while the offending part of the code is highlighted in the big window.
The add-on doesn't include a text editor, though, so it's not a full solution to your problem. It uses both an SGML-based validator and HTML Tidy, and I think for local files you can get it to make the corrections suggested by Tidy.
I always think syntax highlighting is great. In HTML this would be very useful too, as tags can be easily distinguished by developers when they can see them appropriately coloured.
Personally I don't like the W3C way of giving you a big boring list of problems. Visual aids in the code itself are much better.
Is it possible to switch a browser to a "strict mode" in order to write proper code at least during the development phase?
I always see invalid, dirty HTML code (besides bad JavaScript and CSS), and I feel that one reason for this is the high tolerance level of all browsers. So I would at least like to have a stricter mode while I use the browser during development, in order to force myself to write proper code.
Is there anything like that in any of the known browsers?
I know about the W3C validator, but honestly, who really uses it frequently?
Is there maybe some sort of regular interface between browser and validator? Are there any development environments where validation is run automatically?
Is there anything like that in any of the known browsers? Is there maybe some sort of regular interface between browser and validator? Are there any development environments where validation is run automatically?
The answer to all those questions is "No". No browsers have any built-in integration like what you describe. There are (or were) some browser extensions that would take every single document you load and send it to the W3C validator for checking, but using one of those extensions (or anything else that automatically sends things to the W3C validator in the background) is a great way to get the W3C to block your IP address (or the IP-address range for your entire company network) for abuse of W3C services.
I know about the W3C validator, but honestly, who really uses it frequently?
The W3C validator currently processes around 17 requests every second—around 1.5 million documents every day—so I guess there are quite a lot of people using it frequently.
I always see invalid, dirty HTML code… I would like to have a stricter mode while I use the browser during development, in order to force myself to write proper code.
I'm not sure what specifically you mean by "dirty HTML code" or "proper code", but I can say that there are a lot of markup cases that are not bad or invalid but which some people mistakenly consider bad.
For example, some people think every <p> start tag should always have a matching </p> end tag, but the fact is that from the time HTML was created, it has never required documents to have matching </p> end tags in all cases. (In fact, when HTML was created, the <p> element was basically an empty element, not a container, so the <p> tag was simply a marker.)
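For instance, a fragment like the following is valid HTML even though neither paragraph has an explicit end tag; a following <p> (or the end of the parent element) implicitly closes the previous paragraph:

<body>
  <p>First paragraph, with no explicit end tag.
  <p>Second paragraph; the parser closes the first one right here.
</body>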
Another example of a case that some people mistakenly think of as bad is the case of unquoted attribute values; e.g., <link rel=stylesheet …>. But the fact is that unless an attribute value contains spaces (or a handful of other special characters), it generally doesn't need to be quoted. So in fact there's actually nothing wrong at all with a case like <link rel=stylesheet …>.
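To make that concrete, here is a made-up pair of tags (the paths and text are invented) showing when quoting is and isn't needed:

<link rel=stylesheet href=/css/site.css>            <!-- no quotes needed -->
<a href=/about title="About our team">About us</a>  <!-- title needs quotes: its value contains spaces -->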
So there's basically no point in trying to find a tool or mechanism to check for cases like that, because those cases are not actually real problems.
All that said, the HTML spec does define some markup cases as being errors, and those cases are what the W3C validator checks.
So if you want to catch real problems and be able to fix them, the answer is pretty simple: Use the W3C validator.
Disclosure: I'm the maintainer of the W3C validator. 😀
As @sideshowbarker notes, there isn't anything built into browsers at the moment.
However, I do like the idea and wish there were such a tool too (that's how I got to this question).
There is a "partial" solution: if you use Firefox and view the source (not the developer tools, but CTRL+U or right-click "View Page Source"), Firefox will highlight invalid tag nesting and attribute issues in red in the raw HTML source. I find this invaluable as a first pass when looking at a page that doesn't seem to be working.
It is quite nice because it isn't super picky about things like an unquoted id value or a deprecated attribute, but it highlights genuinely glitchy stuff: mangled spacing inside attributes (the kind of thing that causes issues when the attributes aren't quoted), a span tag that is not properly closed, and a script tag sitting outside the html element; it also flags a missing doctype or content before the doctype.
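For example, an invented fragment like the one below has the kinds of problems described above that the view-source pane tends to flag:

<!DOCTYPE html>
<html>
  <body>
    <p>Some content
    <span class=highlight>this span is never closed
  </body>
</html>
<script src="late.js"></script> <!-- flagged: script after the closing html tag -->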
Unfortunately "seeing" these issues is a manual process... I'd love to see these in the dev console, and in all browsers.
Most plugins/extensions only get access to the DOM after it has been parsed, by which point these errors are gone or negated... however, if one of these extension models offers a way to get at the raw HTML source, so that we could code an extension to test for these types of errors, I'd be more than willing to help write one (DM @scunliffe on Twitter). Alternatively, this may require writing something at a lower level, like a script to run in Fiddler.
When it comes to my markup, I'm anal. It always has to be perfectly indented, easily readable to me, and 100% valid with the W3C. Often, when viewing the markup of other websites, I'm appalled by the developer's lack of effort to keep their markup in the browser clean, organized, and valid.
On the flip side, there are a lot of people who will force all their markup onto one continuous line for the size-saving benefits. This annoys me as well, though not to the same extent, because it is done with a purpose. But for the most part, it seems like no developer ever actually looks at their markup in the browser and does anything about it.
Understanding that, to the parser in the browser, indents and spaces (usually) don't matter, how should I be handling my markup? Is it worth the extra time to get my markup perfectly readable to humans as well as the browser? Are all my \t's and \n's being used in vain?
There are some browsers that have bugs which render indented, well-formed HTML completely wrong, such as some versions of Internet Explorer with tables and images.
Other than that, I try to keep sane indentation. I don't spend too much time on it, just enough to make it easy to debug.
Is it worth the extra time to get my markup perfectly readable
My answer is no. The arguments:
Whoever tries to look at the code will probably want to modify it, and for editing the code you need a good code editor with code formatting (e.g. NetBeans). You'll very soon want other features too, such as syntax coloring.
Some users might prefer a different style of formatting than you do.
Anyone interested in readable HTML may use Tidy (or the Tidy extension for Firefox) to format it.
It's a performance issue too: formatting adds extra bytes, and stripping whitespace (and minifying when possible) will speed up the site. That's very important for sites with high traffic.
It's worth the effort, IMHO, since it helps you understand what exactly is going on in your HTML page, and that's definitely worth something.
If we want to write clean, elegant code in general, shouldn't that mean we want to generate nice, clean, elegant HTML as well?
Not sure if this answers your question, but as long as the code is valid by W3C standards, it is structured as intended. As for the viewability of the code (like view source), that's really up to you, but I would not add too much clutter (comments, etc.). Use the correct DOCTYPE for your markup and you should be fine. I don't see any reason to "waste" time making the source code "book"-readable in the browser; view source is only beneficial to you, so you can quickly see what's happening at a glance.
I like to correctly format my markup, and I think it makes it easier to manage when I do.
Then again, I use ASP.NET, and a lot of markup is generated through various controls and classes. In that case, I've decided it is not worth trying to track down each piece of misaligned markup to see if something can be done to get the associated control to produce the correct result.
In short, nicely formatted markup is worth it if it can be accomplished without a huge effort.
Yes, in my opinion it is worth it. It will be easier to maintain, for you and for other colleagues, now and in the future.
About the disadvantage of lower performance: why not develop a well-indented and commented source file and generate a minimized version to run on the server? It can be achieved with a simple series of regex replacements.
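As a rough sketch only (assuming Node.js, a made-up source file name, and a page with no <pre> blocks or inline scripts that whitespace collapsing would break), such a build step might look like:

// minify.js - regex-based minification sketch
const fs = require('fs');

let html = fs.readFileSync('index.src.html', 'utf8');

html = html
  .replace(/<!--(?!\[if)[\s\S]*?-->/g, '') // strip comments, keeping IE conditional comments
  .replace(/>\s+</g, '><')                 // collapse whitespace between tags
  .replace(/[ \t]{2,}/g, ' ');             // collapse remaining runs of spaces and tabs

fs.writeFileSync('index.html', html.trim());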
The W3C HTML validator reports errors on lines that are inside <script> tags. This creates a lot of noise in the validation output. I can wrap my own scripts in CDATA, but I have a lot of script added dynamically by third-party controls.
Is there an HTML validator which can ignore everything in all <script> sections?
Short Bad Answer
If you wish to continue using the W3C validator but get rid of certain errors regarding HTML inside script tags, you can comment out your JavaScript as shown in this guide. This is clearly a hack and is not recommended.
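For reference, the sort of wrapping being referred to usually looks something like this (the CDATA form is the one the question mentions; it only helps when the page is validated as XHTML):

<script type="text/javascript">
//<![CDATA[
  var ok = 1 < 2 && "a" != "b"; // the raw < and & no longer trip the validator
//]]>
</script>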
Long Good Answer
The main point of a validator is to ensure your code keeps to standards. The documentation for the W3C validator points you to this guidance, and the W3C itself has a guide on keeping the HTML inside script elements up to standard.
Personally, I don't see the point of a validator that selectively ignores some standards. You can't know how a random browser is going to implement the W3C standard, and just because the major browsers presumably do nothing wrong when ignoring errors embedded in script tags, that doesn't mean there aren't browsers that follow the standards more strictly. Furthermore, there is no guarantee that major browsers won't change their implementation in the future to be closer to the standards and thus break your code. It is better to fix the errors you are getting than to ignore them.
Solution:
Remove the offending third party scripts while you're validating the HTML.
It might be that Michael Robinson's suggestion or Rupert's Short Bad Answer could be done programmatically, though it might be painful to program.
If you can put a proxy or filter in front of your page that strips or modifies the script tags on the fly, the validator will not see the scripts.
Unfortunately, stripping the scripts is only easy if you've got valid XHTML, in which case of course you mightn't really need the validator...
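As a very rough sketch of that idea (regex-based, with the usual caveats about regexes and HTML, and assuming the page source is already available as a string), the filter could simply empty out every script element before the markup reaches the validator:

// Blank out script bodies so the validator never sees the embedded JavaScript.
function stripScripts(html) {
  return html.replace(/<script\b([^>]*)>[\s\S]*?<\/script>/gi, '<script$1></script>');
}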
Aside from the fact that this might be fun to try, I'm in favor of Rupert's Long Good Answer.
Is there a reason many websites place a small link/button to the W3C CSS/HTML validation of the respective site or is this just a weird practice that caught on?
Validation just shows that you took time to ensure that the webpage adheres to the standard, as specified in your DOCTYPE.
Ideally every page should validate, but in practice valid pages are very much the exception.
For some companies it is to avoid lawsuits, as there are accessibility standards (for example, for blind users), and if you don't meet them you can be sued.
Here is a link about ADA compliance:
http://www.icdri.org/CynthiaW/is_%20yoursite_ada_compliant.htm
You may want look here for more reasons:
http://www.clfsrpm.net/w3c-validator/docs/why.html
Basically, if you want to do something, you might as well learn to do it right and pass the validation.
So people can show off the fact that they've spent time complying with standards.
Complying with standards can be useful, as it helps ensure that your page will work across browsers. But very frequently there's no reward for spending the time getting your page to validate, and validators are a lot pickier than they really need to be to ensure interoperability. So, some people find it nice to stick a little badge on their page indicating that it validates, to show off the work they've done.
It's really not all that useful to stick a validation badge on your page, and a large number of the pages I've seen the badge on don't actually validate (they stuck the badge on before making later changes, or they have an error that lets an & come through unescaped), so you're pretty much right: it's just a weird little practice.
Well, it's just like why corporations want to display their ISO certifications wherever they can, on the name cards, website etc.
Maybe it's a consciousness-raising move by the developer to belatedly right the wrong done by all those rubbish 'Best viewed in Netscape Navigator 4' type of pseudo-disclaimers you used to get all the time.
It should be noted that there is no "standard"; there are only "recommendations", which are not strictly followed by anyone. Besides, the validator marks a lot of useless things as errors (such as a missing "alt" attribute). And it doesn't guarantee anything about how your website is displayed; it only guarantees that the syntax of your HTML is valid (you can easily "break" the recommendations while your code still validates perfectly).
In my opinion it's mostly used in the same way as "Web 2.0" term: to show off. It doesn't really mean anything.
I have a lot of empty span tags and so on in my code, due to CSS image replacement techniques.
They trigger an HTML validation warning.
Should I care?
I agree with @Gumbo, but having lots of warnings that make no difference can hide the ones that you do need to pay attention to. If you can put something in the span that gets rid of the warning but doesn't break your UI, you should at least think about doing it.
I've made validation part of my workflow because it helps me catch mistakes early. And while I don't consider empty elements to be a problem, it negates some of the value of using a validator if I have to mentally parse a list of warnings each time and decide whether a warning is important or not. So I try to keep my pages both error- and warning-free so that a quick glance at the HTML Validator icon in the Firefox status bar only changes when there is a real problem. To that end I keep empty elements "unempty" by inserting an empty comment.
<span><!-- --></span>
(At least that works with the Tidy validator.)
Now, that being said, I don't think this is at all necessary in many situations. It is perfectly reasonable to think that adding eight extra characters to your code just to avoid a validator warning is ridiculous. But it works for me.
You should consider the behavior of the page for things like screen readers. It is common to actually put a few words describing the image in the tag, which are then hidden by the image replacement.
See the CSS Zen Garden, where you can find examples like h1 spans whose text is replaced in CSS by images.
This will improve not only the accessibility of your site, but also its searchability.
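A typical version of that pattern looks something like the sketch below (the id, image path, and dimensions are made up): the heading text stays in the markup for screen readers and search engines, while the CSS hides it visually and shows an image instead.

<h1 id="site-logo"><span>Example Site Name</span></h1>

#site-logo {
  width: 200px;
  height: 80px;
  background: url(logo.png) no-repeat;
}
#site-logo span {
  position: absolute; /* keep the text in the DOM, just move it off-screen */
  left: -9999px;
}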
An "empty" tag has a very specific definition in HTML:
<span/>
versus
<span></span>
The former is not permitted by the HTML 4.0 Strict DTD, so should be flagged by a validator. The only tags that can use the former syntax are those specifically identified as "EMPTY" in the DTD (eg, <br>).
The second form is valid HTML, and does not get flagged by the W3C validator.
So I have to assume that either (1) your validator is broken, or (2) you are using the tag incorrectly.
A warning is not an error. It’s just a reminder that you should improve something.
I suppose if bandwidth were an issue, those empty tags could be revisited to see if you could keep them from appearing altogether.
Eric Meyer also says that empty tags are bad, semantically.
A warning doesn't mean it's wrong, but it says it could, or sometimes should, be better.
In the same way
if("value"==variable)
is better than
if(variable=="value")