Can an HTML element have the same attribute twice? - html

I'm considering writing code which produces an HTML tag that could have duplicate attributes, like this:
<div data-foo="bar" class="some-class" data-foo="baz">
Is this legal HTML? Does one of the data-foo values take precedence over the other? Can I count on semi-modern browsers (IE >= 9) to parse it without choking?
Or am I about to do something really stupid here?

It is not valid to have the same attribute name twice in an element. The authoritative references for this are somewhat complicated, as old HTML versions were nominally based on SGML and the restriction is implied by a normative reference to the SGML standard. In HTML5 PR, section 8.1.2.3 Attributes explicitly says: “There must never be two or more attributes on the same start tag whose names are an ASCII case-insensitive match for each other.”
What happens in practice is that the latter attribute is ignored. Well, future browsers might do otherwise. In the DOM, attributes appear as properties of the element node as well as in the attributes object, so there would be no natural way to store two values.

It's not technically valid, but every browser will ignore duplicate attributes in HTML documents and use the first value (data-foo="bar" in your case).
Using the same attribute name twice in a tag is considered an internal parse error. It would cause your document to fail validation, if that's something you're worried about. However, it's important to understand that HTML 5 defines an expected result even for cases where you have a "parse error". The parser is allowed to stop when it encounters an error, but if it chooses not to stop it must produce a specific result described in the specification. In practice, no browsers choose to stop when encountering errors in HTML documents (XML/XHTML is a different matter), so all modern browsers will handle this case successfully and consistently.
The WHATWG HTML specification describes this case in section 12.2.4.33 "Attribute name state":
When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute's name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be dropped, along with the value that gets associated with it (if any).
See also its description of "parse error" from the opening of section 12.2 "Parsing HTML documents":
Certain points in the parsing algorithm are said to be parse errors. The error handling for parse errors is well-defined (that's the processing rules described throughout this specification), but user agents, while parsing an HTML document, may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification.
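The rule quoted above can be sketched in a few lines. This is only an illustration, not a real tokenizer: it assumes the attributes have already been split into name/value pairs, and dropDuplicateAttributes is a hypothetical helper name. Names are lowercased first because the HTML tokenizer lowercases ASCII letters in attribute names before comparing them.

```javascript
// Sketch of the "first attribute wins" rule from the attribute name state.
// Input: attributes in source order as [name, value] pairs (an assumption;
// a real tokenizer works on characters, not on pre-split pairs).
function dropDuplicateAttributes(pairs) {
  const kept = new Map();
  for (const [rawName, value] of pairs) {
    // The tokenizer lowercases ASCII letters in attribute names.
    const name = rawName.replace(/[A-Z]/g, (c) => c.toLowerCase());
    if (kept.has(name)) {
      continue; // parse error: the later attribute and its value are dropped
    }
    kept.set(name, value);
  }
  return kept;
}

const attrs = dropDuplicateAttributes([
  ["data-foo", "bar"],
  ["class", "some-class"],
  ["data-foo", "baz"],
]);
console.log(attrs.get("data-foo")); // "bar"
```

For the tag in the question, the element ends up with data-foo="bar" and class="some-class"; the second data-foo never reaches the DOM.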

I wanted to add a comment to the excellent accepted answer, but my reputation is not high enough.
I wanted to add that it is important to consider how your code gets compiled.
For example, Angular removes prior duplicate (non-angular) class attributes and only keeps the last one.
Note: Angular also modifies the value of the class attribute with ngClass and any [class.class-name] attributes.
This is also something you can use a linter for.
See htmlhint (attr-no-duplication) or htmllint (attr-no-dup).
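For a quick check without installing a linter, the idea behind those rules can be approximated with a regular expression. This is a rough sketch under stated assumptions: findDuplicateAttributes is a hypothetical helper, it expects a single start tag as a string, and it does not handle every syntax a real HTML parser accepts.

```javascript
// Returns the attribute names that appear more than once in one start tag.
// Approximation only: a real linter parses the whole document.
function findDuplicateAttributes(startTag) {
  // Strip the leading "<tagname" and the trailing ">" or "/>".
  const inner = startTag
    .replace(/^<\s*[^\s\/>]+/, "")
    .replace(/\/?>\s*$/, "");
  // name, optionally followed by = and a quoted or unquoted value
  const attrPattern =
    /([^\s"'<>\/=]+)(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'=<>`]+))?/g;
  const seen = new Set();
  const duplicates = [];
  for (const match of inner.matchAll(attrPattern)) {
    const name = match[1].toLowerCase();
    if (seen.has(name)) {
      duplicates.push(name);
    } else {
      seen.add(name);
    }
  }
  return duplicates;
}

const found = findDuplicateAttributes(
  '<div data-foo="bar" class="some-class" data-foo="baz">'
);
console.log(found); // contains only "data-foo"
```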

Related

Exclamation mark for HTML attribute

I'm looking at an ancient website and I see !style="width:380px; height:30px;" inside an input element. What does the exclamation mark mean?
It does not mean anything. It is invalid markup, and no meaning is assigned to it, any more than an attribute specification like vjkfhjidfhgsi="fbhhjgf" has a meaning.
It has an effect, though, but only in the sense that the specification is parsed and an entry is created from it into the attributes object of the element node, with !style as the attribute name. This has no effect in itself, but such an attribute could be used in scripting and in CSS.
If you meant to ask why the exclamation mark was used, then your guess in your own answer may well be correct. Though invalid, the method is probably safer now than it was in the past. In the old days, syntax error handling was not defined for HTML, and one might argue that HTML parsers could well just skip a character like ! in this context and process the tag as if it were not there. The HTML5 parsing rules, which largely just make common browser practice the rule, specify that the ! is to be taken as the first character of an attribute name; see 8.2.4.34 Before attribute name state and the next clause.
Since there is no way to comment out an attribute in HTML, this is probably a hack to make the attribute invalid so that the styles are not applied. The exclamation mark is probably a convention used by the developer to mark all his/her unneeded attributes.

May the same attribute be specified more than one time in an HTML5 element?

I am looking through the HTML5 specification (W3C Recommendation 28 October 2014) and I cannot find where it says whether the same attribute may be specified for an element more than once. For example, the style attribute sometimes has a very long value. So a question arises: may it be split into several style attributes for better readability?
Could somebody point to the place in the specification where it says whether this is allowed?
EDIT: Also, Section "3.2.5.8 The style attribute" says "All HTML elements may have the style content attribute set". If the same attribute may be specified at most once, then what do the words "attribute set" mean in this context?
The HTML5 spec, Section 8.2.4.35 - 'Attribute name state' says:
When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute's name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be removed from the token.
So, to answer your question, it's invalid HTML.
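Since a second style attribute would simply be dropped by the parser, a long value cannot be split across several style attributes. One way to keep the source readable is to assemble the single value in code instead. A minimal sketch, with buildStyleValue as a hypothetical helper name:

```javascript
// Joins separate CSS declarations into one value for a single style attribute.
function buildStyleValue(declarations) {
  return declarations
    .map((d) => d.trim().replace(/;+$/, "")) // drop trailing semicolons
    .filter((d) => d.length > 0)             // skip empty entries
    .join("; ");
}

const styleValue = buildStyleValue([
  "width: 380px",
  "height: 30px;",
  "",
]);
console.log(styleValue); // "width: 380px; height: 30px"
```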

Why should I use "data-" in my attributes or dashes in my tags?

According to many recent HTML specs, when we are using custom attributes (meaning any attributes not defined in the spec), we should prefix them with data-. However, I see no reason to have to do this (unless you require perfectly valid HTML, obviously). Pretty much all current browsers correctly ignore custom attributes, meaning no conflicts except with identically-named attributes from others' code, and we can ignore even this with custom prefixes or something similar (as suggested on the AngularJS directive page). What, if any, other benefits are there? This question has been asked before, at least twice, but both are pretty old.
I forget where I read it, but some guide said custom HTML tags need dashes, and single-word tags aren't valid. First of all, why? Second, should we do this, and why (besides being valid)? Would there be any problem with underscores or camelCase, etc.? Also, conflicts with existing elements shouldn't be a problem, if, like with data attributes, you prefix or suffix them, etc. See the Angular directive page again.
I'm sure all these questions have been asked before, but I'm combining them into one. Is that a good idea (quick, someone ask on Meta)?
The data-* attributes have two advantages:
It is a convention meaning other programmers will understand quickly that it is a custom attribute.
You get a DOM JavaScript API for free: HTMLElement.dataset. If you use jQuery, it leverages this to populate the keys and values you get from .data().
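The mapping between a data-* attribute name and its dataset key follows a simple rule: drop the data- prefix, then turn each hyphen followed by a lowercase ASCII letter into that letter in uppercase. The sketch below mimics that rule in plain JavaScript so it can run outside a browser; datasetKeyFor is a hypothetical helper, not part of the DOM API.

```javascript
// Mimics how a data-* attribute name maps to a key on element.dataset.
function datasetKeyFor(attributeName) {
  const name = attributeName.toLowerCase();
  if (!name.startsWith("data-")) {
    return null; // not a data-* attribute, so it has no dataset key
  }
  return name
    .slice("data-".length)
    .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

console.log(datasetKeyFor("data-foo"));       // "foo"
console.log(datasetKeyFor("data-user-name")); // "userName"
console.log(datasetKeyFor("class"));          // null
```

In a browser, an element with data-user-name="x" would then expose that value as element.dataset.userName.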
There are two basic reasons for the - in custom element names:
It is a quick way for the HTML parser to know it is a custom element instead of a standard element.
You don't run into the issue of a new standard element being added with the same name, which would cause a conflict if you register a custom JavaScript prototype for the DOM element.
Should you use your own custom element name? Right now it is so new that you shouldn't expect it to be fully supported. Let's say it does work. You have to balance the extra complexity against the benefit. If you can get away with a classname, then use a classname. But if you need a whole new element with a custom JavaScript DOM prototype for the element, then you may have a valid use for it.
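To make the naming rule concrete, here is a simplified check for whether a string could serve as a custom element name. It is an ASCII-only approximation and an assumption-laden sketch: the specification also permits many non-ASCII characters, and looksLikeCustomElementName is a hypothetical helper. It does show why underscores alone or camelCase do not qualify: a hyphen is required and uppercase letters are not allowed.

```javascript
// Hyphenated names that already belong to SVG/MathML and cannot be reused.
const RESERVED_NAMES = new Set([
  "annotation-xml", "color-profile", "font-face", "font-face-src",
  "font-face-uri", "font-face-format", "font-face-name", "missing-glyph",
]);

// Simplified, ASCII-only approximation of the custom element name rule:
// starts with a lowercase letter, contains a hyphen, has no uppercase letters.
function looksLikeCustomElementName(name) {
  const asciiPattern = /^[a-z][a-z0-9._-]*-[a-z0-9._-]*$/;
  return asciiPattern.test(name) && !RESERVED_NAMES.has(name);
}

console.log(looksLikeCustomElementName("my-widget")); // true
console.log(looksLikeCustomElementName("mywidget"));  // false
console.log(looksLikeCustomElementName("myWidget"));  // false
console.log(looksLikeCustomElementName("my_widget")); // false
```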

Why is it a bad thing to have multiple HTML elements with the same id attribute?

Why is it bad practice to have more than one HTML element with the same id attribute on the same page? I am looking for a way to explain this to someone who is not very familiar with HTML.
I know that the HTML spec requires ids to be unique but that doesn't sound like a convincing reason. Why should I care what someone wrote in some document?
The main reason I can think of is that multiple elements with the same id can cause strange and undefined behavior with Javascript functions such as document.getElementById. I also know that it can cause unexpected behavior with fragment identifiers in URLs. Can anyone think of any other reasons that would make sense to HTML newbies?
Based on your question you already know what w3c has to say about this:
The id attribute specifies a unique id for an HTML element (the id
attribute value must be unique within the HTML document).
The id attribute can be used to point to a style in a style sheet.
The id attribute can also be used by a JavaScript (via the HTML DOM)
to make changes to the HTML element with the specific id.
The point with an id is that it must be unique. It is used to identify an element (or anything: if two students had the same student id, schools would come apart at the seams). It's not like a human name, which needn't be unique. If two elements in an array had the same index, or if two different real numbers were equal... the universe would just fall apart. It's part of the definition of identity.
You should probably use class for what you are trying to do, I think (ps: what are you trying to do?).
Hope this helps!
Why should I care what someone wrote in some document?
You should care because if you are writing HTML, it will be rendered in a browser which was written by someone who did care. W3C created the spec and Google, Mozilla, Microsoft etc... are following it so it is in your interest to follow it as well.
Besides the obvious reason (they are supposed to be unique), you should care because having multiple elements with the same id can break your application.
Let's say you have this markup:
<p id="my_id">One</p>
<p id="my_id">Two</p>
CSS is forgiving, this will color both elements red:
#my_id { color:red; }
..but with JavaScript, this will only style the first one:
document.getElementById('my_id').style.color = 'red';
This is just a simple example. When you're doing anything with JavaScript that relies on ids being unique, your whole application can fall apart. There are questions posted here every day where this is actually happening - something crucial is broken because the developer used duplicate id attributes.
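The difference between the two lookups can be imitated outside a browser. The sketch below uses plain objects in place of the two paragraphs; getElementByIdSketch and matchIdSelector are hypothetical stand-ins for document.getElementById and the #my_id selector, not the real DOM functions.

```javascript
// Plain objects stand in for the two <p id="my_id"> elements.
const paragraphs = [
  { id: "my_id", text: "One", style: {} },
  { id: "my_id", text: "Two", style: {} },
];

// Like document.getElementById: returns the first match only, or null.
function getElementByIdSketch(nodes, id) {
  return nodes.find((node) => node.id === id) || null;
}

// Like the CSS selector #my_id: matches every element with that id.
function matchIdSelector(nodes, id) {
  return nodes.filter((node) => node.id === id);
}

getElementByIdSketch(paragraphs, "my_id").style.color = "red";
// Only the first paragraph was changed; the second one has no color set.
console.log(paragraphs.map((p) => p.style.color));
console.log(matchIdSelector(paragraphs, "my_id").length); // 2
```

In a real page, document.querySelectorAll('[id="my_id"]') returns every element carrying that id, which can help when hunting for duplicates.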
Because if you have multiple HTML elements with the same ID, it is no longer an IDentifier, is it?
Why can't two people have the same social security number?
You basically answered your own question. I think that once an element can no longer be uniquely identified by its id, any function that relies on this functionality will break. You can still choose to search for elements in an XPath style, using the id like you would use a class, but it's cumbersome, error-prone and will give you headaches later.
The main reason I can think of is that multiple elements with the same id can cause strange and undefined behavior with Javascript functions such as document.getElementById.
... and XPath expressions, crawlers, scrapers, etc. that rely on ids, but yes, that's exactly it. If they're not convinced, then too bad for them; it will bite them in the end, whether they know it or not (when their website gets visited poorly).
Why should a social security number be unique, or a license plate number? For the same reason any other identifier should be unique. So that it identifies exactly one thing, and you can find that one thing if you have the id.
The main reason I can think of is that multiple elements with the same
id can cause strange and undefined behavior with Javascript functions
such as document.getElementById.
This is exactly the problem. "Undefined behavior" means that one user's browser will behave one way (perhaps get only the first element), another will behave another way (perhaps get only the last element), and another will behave yet another way (perhaps get an array of all elements). The whole idea of programming is to give the computer (that is, the user's browser) exact instructions concerning what you want it to do. When you use ambiguous instructions like non-unique ID attributes, then you get unpredictable results, which is not what a programmer wants.
Why should I care what someone wrote in some document?
W3C specs are not merely "some document"; they are the rules that, if you follow in your coding, you can reasonably expect any browser to obey. Of course, W3C standards are rarely followed exactly by all browsers, but they are the best set of commonly accepted ground rules that exist.
The short answer is that in HTML/JavaScript DOM API you have the getElementById function which returns one element, not a collection. So if you have more than one element with the same id, it would not know which one to pick.
But the question isn't that dumb, actually, because there are reasons to want one id that might refer to more than one element in the HTML. For example, a user might select some text and want to annotate it. You want to show this with a
<span class="Annotation" id="A01">Bla bla bla</span>
If the user selected text that spans multiple paragraphs, then the span needs to be broken up into fragments, but all fragments of that selection should be addressable by the same "id".
Note that in the past you could put
<a name="..."/>
elements in your HTML and you could find them with getElementsByName. So this is similar. But unfortunately the HTML specifications have started to deprecate this, which is a bad idea because it leaves an important use case without a simple solution.
Of course, with XPath you can do anything: use any attribute or even a text node as an id. Apparently the XPointer spec allows you to make reference to elements by any XPath expression and use that in URL fragment references, as in
http://my.host.com/document.html#xpointer(id('A01'))
or its short version
http://my.host.com/document.html#A01
or, other equivalent XPath expressions:
http://my.host.com/document.html#xpointer(/*/descendant-or-self::*[@id = 'A01'])
and so, one could refer to name attributes
http://my.host.com/document.html#xpointer(/*/descendant-or-self::*[@name = 'A01'])
or whatever you name your attributes
http://my.host.com/document.html#xpointer(/*/descendant-or-self::*[@annotation-id = 'A01'])
Hope this helps.
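When one logical annotation has to cover several fragments, a shared attribute value can play the role that a repeated id cannot, while each fragment keeps its own unique id. A minimal sketch, using plain objects in place of DOM elements; the annotationId field, the fragment ids and the fragmentsFor helper are assumptions for illustration only.

```javascript
// Each fragment has a unique id plus a shared annotation identifier.
const fragments = [
  { id: "A01-1", annotationId: "A01", text: "Bla bla" },
  { id: "A01-2", annotationId: "A01", text: "bla" },
  { id: "A02-1", annotationId: "A02", text: "Other" },
];

// Collects every fragment that belongs to one annotation.
function fragmentsFor(nodes, annotationId) {
  return nodes.filter((node) => node.annotationId === annotationId);
}

const a01 = fragmentsFor(fragments, "A01").map((node) => node.id);
console.log(a01); // the two fragment ids "A01-1" and "A01-2"
```

In a browser, the same grouping could be expressed with a data-annotation-id attribute and document.querySelectorAll('[data-annotation-id="A01"]').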

Validation error "Bad value apple-touch-icon-precomposed for attribute rel on element link: Keyword apple-touch-icon-precomposed is not registered."

I'm getting this error in the W3C HTML5 validator:
Line 9, Column 101: Bad value
apple-touch-icon-precomposed for
attribute rel on element link: Keyword
apple-touch-icon-precomposed is not
registered. …-icon-precomposed"
sizes="72x72"
href="images/sl/touch/m/apple-touch-icon.png">
Syntax of link type valid for <link>:
A whitespace-separated list of link types listed as allowed on <link> in the HTML specification or listed as an allowed on <link> on the Microformats wiki
How to fix this error?
Ignore it.
If that's the only error you have, then your document is valid HTML5.
Here's what the official (in development) spec states about the <meta> tag: "Extensions to the predefined set of metadata names may be registered." I can't find the area in the spec that talks about the "rel" attribute values, but the validator treats them similarly (one for links, one for strings), and points us to the extension Wiki. You 'may' register them, but don't have to. In RFC terminology this is a SHOULD not a MUST.
The spec doesn't seem to mandate a fixed list, or use of the Wiki. Doing so would seem odd, as these fields have often evolved with time. It does state that "Conformance checkers must use the information given on the WHATWG Wiki MetaExtensions page to establish if a value is allowed or not: values defined in this specification or marked as "proposed" or "ratified" must be accepted", which is an interesting line, as it is a requirement for HTML validators, not for HTML5 itself, and doesn't itself make the markup invalid.
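What the validator does with a rel value can be pictured roughly as follows: split the value on whitespace and look each keyword up in a list of known link types. This is a sketch under stated assumptions, not the validator's actual code: KNOWN_LINK_TYPES is a small illustrative subset, and unknownRelKeywords is a hypothetical helper.

```javascript
// Illustrative subset only (an assumption); the full list lives in the
// HTML specification and on the extensions wiki.
const KNOWN_LINK_TYPES = new Set([
  "alternate", "canonical", "icon", "preload", "stylesheet",
]);

// Returns the keywords in a rel value that are not in the known set.
function unknownRelKeywords(relValue, known = KNOWN_LINK_TYPES) {
  return relValue
    .split(/[ \t\n\f\r]+/)                   // ASCII whitespace separates keywords
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase())     // keywords compare case-insensitively
    .filter((token) => !known.has(token));
}

const unknown = unknownRelKeywords("apple-touch-icon-precomposed");
console.log(unknown); // contains "apple-touch-icon-precomposed"
console.log(unknownRelKeywords("icon stylesheet").length); // 0
```

An unknown keyword here only means the checker complains; browsers that understand the keyword still act on it.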
In fact, many of these "extensions" are already in the wiki (including your one), they just haven't been accepted. Same with many meta tags, even very common ones. It seems many won't be accepted either.
I think it's very nice of the W3C to create a standardised list of these. It helps developers know what they should be using now and in the future (and can hopefully clean up some things, like reducing the number of ways you can specify a creation date from 5+ to 1).
Unfortunately we are dealing with third parties here (e.g. Apple) – and unless you want to contact every third party who has created one of these informal specifications, and tell them to formalize a spec, and submit it to the W3C's list (which may or may not get accepted), what are you to do? At the end of the day you still need to support it.
Anyway, isn't the very point of having these HTML elements to support extensions, so vendors don't break the spec by adding new elements to do what they need?
If you move the touch icons into your web root and follow the Apple documentation for naming conventions, you won't actually need to insert the link tags in your HTML and will avoid those validation errors.
The iOS devices will look for the icons in the web root automatically, using the predefined naming conventions and the correct resolution. Good luck.
Delete the element from your source.
You probably don't want to do that though. Remember that validation is a tool, not a competition.
You might want to edit the wiki of supported link types and then wait for the validator to catch up.