In page anchors with spaces in the title - html

I have a page with some <h1 name="Header Name"> tags. Notice that the name attribute has a space in it. I want to use an anchor so that I can jump to the heading. I thought that if I added %20 to replace the spaces it would work, but no.
I am currently swamped, so would prefer not to edit the source and redeploy with each header having an ID or title with hyphens instead of spaces.
<!-- non-working example of what I want -->
<a href="mypage.html#Header%20Name" />
<h1 title="Header Name">Foo</h1>
I read through the spec here, but couldn't find an answer. I also could not find an answer via Google or SO.
http://www.w3.org/TR/html4/struct/links.html
Is there a way to do this?

That's not how anchor tags work. You can jump to a name of an anchor tag, or an id of any element. You can never jump to a title attribute, unfortunately. You would need to modify the html, which you don't want to do, or include javascript to accomplish what you are asking.
Edit: Standards have always said id attributes cannot include spaces, including HTML5, which says this:
HTML5 gets rid of the additional restrictions on the id attribute. The
only requirements left — apart from being unique in the document — are
that the value must contain at least one character (can’t be empty),
and that it can’t contain any space characters.

Here's the W3C spec https://www.w3.org/TR/REC-html40-971218/types.html#h-6.2
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
No spaces.

Anchors point to name or id attributes. It won't find your title attribute.
Use <h1 id="Header Name"> and it should work, but ideally you shouldn't have spaces there.

The destination anchor should be defined by an id attribute, and it must not contain a space. (HTML5 removes all other constraints but disallows a space and an empty value. Older HTML versions are much more restrictive.) So:
<h1 id="HeaderName">Foo</h1>
Alternatively, you could use an a element with name attribute, but this is clumsy and outdated (and forbidden in HTML5, though it still works, properly used):
<h1><a name="HeaderName">Foo</a></h1>
You can use the name attribute on a few elements only, not e.g. in h1, and only in a elements does it have the meaning of acting as a destination anchor.
The title attribute can be used on any element (at least as per HTML5), but it is merely an advisory title and has nothing to do with linking.
The construct <a href="mypage.html#Header%20Name" /> denotes an empty a element, a usability nightmare, and does not actually work (it is taken just as a start tag of an element), except in XHTML when served with an XML document type, which you are most probably not doing. So just don’t use the “self-closing” syntax except for elements with empty declared content, if at all.

There /is/ a way to have an anchor with a space in its name without violating the HTML specification. Use <a name="my%20name%20with%20spaces"></a>. The specification makes it clear that you can use percent-encoding when using <a> with the name attribute. However, when using id, you should not percent-encode the attribute, and since spaces are disallowed in the id attribute, the only option is to use <a name=...> instead.
More information: https://blog.markasoftware.com/post/spaces-in-anchor-names/

Related

Valid and Invalid HTML tags

So recently I found a question that was
Which of the following is a valid tag?
<213person>
<_person> (This is given as the right answer)
Both
None
(Note: this is the explanation that was given:- Valid HTML tags are surrounded by the angle brackets and the tag name can only either start from an alphabet or an underscore(_))
As far as my knowledge goes none of the reserved tags start with an underscore and according to what I've read about custom HTML tags it has to start with an alphabet(I tested it and it doesn't work with a custom tag starting with any character that's not an alphabet). So in my opinion and according to what I tested HTML tags can only start with alphabets or! (in case of !-- -- and !DOCTYPE HTML)
What I want to know is if the given explanation is correct or not and if it's correct then can someone provide some proper documentation and working examples for it?
As mentioned by #Rob, the standard defines a valid tag name as string containing alphanumeric ASCII characters, being:
0-9|a-z|A-Z
However, browsers handle things differently.
There's a few main points that I've noticed which don't align with the current standard.
Tag names must start with a letter
If a tag name starts with any character outside a-z|A-Z, the start tag ends up being interpreted as text and the end tag gets converted into a comment.
Special characters can be used
The following HTML is valid in a lot of browsers and will create an element:
<Z[\]^_`a></Z[\]^_`a>
This seems to be browsers only checking if the characters are ASCII. The only exception is the first character (as stated above).
Initially, I thought this was a simplified check, so instead of [A-Z]|[a-z| they checked [A-z], but you can use any character outside this range.
This makes the following HTML also "valid" in the eyes of certain browsers:
<a!></a!>
<aʬ></aʬ>
<a͢͢͢></a͢͢͢>
<a͢͢͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ΘΘΘΘ></a͢͢͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ΘΘΘΘ>
<a></a>
I tested the HTML elements in both Chrome and Firefox, I didn't test any other browsers. I also didn't test every ASCII character, just some very high and low in terms of their character code.
From the HTML standard:
Start tags must have the following format:
The first character of a start tag must be a U+003C LESS-THAN SIGN
character (<). The next few characters of a start tag must be the
element's tag name.
So what is allowed in the element's tag name? This is defined just above:
Tags contain a tag name, giving the element's name. HTML elements all
have names that only use ASCII alphanumerics. In the HTML syntax, tag
names, even those for foreign elements, may be written with any mix of
lower- and uppercase letters that, when converted to all-lowercase,
matches the element's tag name; tag names are case-insensitive.

Random Letter html Tag

I was wondering if you can use a random letter as an html tag. Like, f isn't a tag, but I tried it in some code and it worked just like a span tag. Sorry if this is a bad question, I've just been curious about it for a while, and I couldn't find anything online.
I was wondering if you can use a random letter as an html tag.
Yes and no.
"Yes" - in that it works, but it isn't correct: when you have something like <z> it only works because the web (HTML+CSS+JS) has a degree of forwards compatibility built-in: browsers will render HTML elements that they don't recognize basically the same as a <span> (i.e. an inline element that doesn't do anything other than reify a range of the document's text).
However, to use HTML5 Custom Elements correctly you need to conform to the Custom Elements specification which states:
The name of a custom element must contain a dash (-). So <x-tags>, <my-element>, and <my-awesome-app> are all valid names, while <tabs> and <foo_bar> are not. This requirement is so the HTML parser can distinguish custom elements from regular elements. It also ensures forward compatibility when new tags are added to HTML.
So if you use <my-z> then you'll be fine.
The HTML Living Standard document, as of 2021-12-04, indeed makes an explicit reference to forward-compatibility in its list of requirements for custom element names:
https://html.spec.whatwg.org/#valid-custom-element-name
They start with an ASCII lower alpha, ensuring that the HTML parser will treat them as tags instead of as text.
They do not contain any ASCII upper alphas, ensuring that the user agent can always treat HTML elements ASCII-case-insensitively.
They contain a hyphen, used for namespacing and to ensure forward compatibility (since no elements will be added to HTML, SVG, or MathML with hyphen-containing local names in the future).
They can always be created with createElement() and createElementNS(), which have restrictions that go beyond the parser's.
Apart from these restrictions, a large variety of names is allowed, to give maximum flexibility for use cases like <math-α> or <emotion-😍>.
So, by example:
<a>, <q>, <b>, <i>, <u>, <p>, <s>
No: these single-letter elements are already used by HTML.
<z>
No: element names that don't contain a hyphen - cannot be custom elements and will be interpreted by present-day browsers as invalid/unrecognized markup that they will nevertheless (largely) treat the same as a <span> element.
<a:z>
No: using a colon to use an XML element namespace is not a thing in HTML5 unless you're using XHTML5.
<-z>
No - the element name must start with a lowercase ASCII character from a to z, so - is not allowed.
<a-z>
Yes - this is fine.
<a-> and <a-->
Unsure - these two names are curious:
The HTML spec says the name must match the grammar rule [a-z] (PCENChar)* '-' (PCENChar)*.
The * denotes "zero-or-more" which is odd, because that implies the hyphen doesn't need to be followed by another character.
PCENChar represents a huge range of visible characters permitted in element names, curiously this includes -, so by that rule <a--> should be valid.
But note that -- is a reserved character sequence in the greater SGML-family (including HTML and XML) which may cause weirdness. YMMV!

What characters can come immediately after the < in a tag?

On a webpage I found a tag that begins with a Unicode letter 休
Is there a list somewhere of the letters and symbols may validly follow right after the less than sign?
They're not using a mark-up less than sign "<" on their site but rather they are using the HTML entity less than < to display the reserved character as text rather than HTML.
This can be treated just like ordinary text. So in essence, it's not a tag, its just ordinary text.
For instance the line:
<font style="color:#F00;"><休闲文化></font>
Actually is:
<font style="color:#F00;"><休闲文化></font>
Thus, <休闲文化> isn't a tag itself, but rather just text (which uses HTML reserved characters within it - perhaps marking you confuse it for a tag)
In which context is it used? XML, HTML,...?
In case of HTML there are tags already defined, you can't use a random one. In XML you can define you're own tags. In both cases you might use random tags, while not ending up with error you would notice, the tag might just get skipped.
I believe this Wiki page might help you:
https://en.m.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
All characters you want (but you should quote few).
As you notice, the character <休 come in <a href="http:(...)" target="_blank" title="<休闲文化>【成都大熊猫基. So this is inside a string (an attribute value). The < in this case will not indicate a start of a tag.

RegEx to grab an entire div with a specific ID?

I have some markup which contains the firebug hidden div.
(long story short, the YUI RTE posts content back that includes the hidden firebug div is that is active)
So, in my posted content I have the extra div which I will remove server side in PHP:
<div firebugversion="1.5.4" style="display: none;" id="_firebugConsole"></div>
I cant seem to get a handle on the regex I would need to write to match this string, bearing in mind that it won't always be that exact string (version may change).
All help welcome!
Regex is not the best tool for the job, but you can try:
<div firebugversion=[^>]*></div>
The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. [^…] is a negated character class. [^aeiou] matches one of anything but the lowercase vowels.
The * is the zero-or-more repetition. Thus, [^>]* matches a sequence of anything but >.
If you want to target the id specifically, you can try:
<div [^>]*\bid="_firebugConsole"[^>]*></div>
The \b is the word boundary anchor.
Match this regex -
<div.*id="_firebugConsole".*?/div>
I would suggest this:
\<div firebugversion="(.+)" style="(.+)" id="(.+)"\>
Then you have three groups:
firebugversion
style
id
This one is a little more complicated, and probably not perfect, but it will:
Match any div containing the attribute firebugversion
Match the firebugversion attribute no matter which order attributes appear in the tag
Match the div, even if it contains content or spacing between it and its closing tag (i've seen the firebug tag with a &nbsp ; tag inside it before) Note: it does lazy matching so it will only match the next tag, rather than the last it finds in the document
<(div)\b([^>]*?)(firebugversion)([^>]*?)>(.*?)</div>

Can an HTML element have multiple ids?

I understand that an id must be unique within an HTML/XHTML page.
For a given element, can I assign multiple ids to it?
<div id="nested_element_123 task_123"></div>
I realize I have an easy solution with simply using a class. I'm just curious about using ids in this manner.
No. From the XHTML 1.0 Spec
In XML, fragment identifiers are of
type ID, and there can only be a
single attribute of type ID per
element. Therefore, in XHTML 1.0 the
id attribute is defined to be of type
ID. In order to ensure that XHTML 1.0
documents are well-structured XML
documents, XHTML 1.0 documents MUST
use the id attribute when defining
fragment identifiers on the elements
listed above. See the HTML
Compatibility Guidelines for
information on ensuring such anchors
are backward compatible when serving
XHTML documents as media type
text/html.
Contrary to what everyone else said, the correct answer is YES
The Selectors spec is very clear about this:
If an element has multiple ID attributes, all of them must be treated as IDs for that element for the purposes of the ID selector.Such a situation could be reached using mixtures of xml:id, DOM3 Core, XML DTDs, and namespace-specific knowledge.
Edit
Just to clarify: Yes, an XHTML element can have multiple ids, e.g.
<p id="foo" xml:id="bar">
but assigning multiple ids to the same id attribute using a space-separated list is not possible.
No. While the definition from W3C for HTML 4 doesn't seem to explicitly cover your question, the definition of the name and id attribute says no spaces in the identifier:
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").
My understanding has always been:
IDs are single use and are only applied to one element...
Each is attributed as a unique identifier to (only) one single element.
Classes can be used more than once...
They can therefore be applied to more than one element, and similarly yet different, there can be more than one class (i.e., multiple classes) per element.
No. Every DOM element, if it has an id, has a single, unique id. You could approximate it using something like:
<div id='enclosing_id_123'><span id='enclosed_id_123'></span></div>
and then use navigation to get what you really want.
If you are just looking to apply styles, class names are better.
You can only have one ID per element, but you can indeed have more than one class. But don't have multiple class attributes; put multiple class values into one attribute.
<div id="foo" class="bar baz bax">
is perfectly legal.
No, you cannot have multiple ids for a single tag, but I have seen a tag with a name attribute and an id attribute which are treated the same by some applications.
No, you should use nested DIVs if you want to head down that path. Besides, even if you could, imagine the confusion it would cause when you run document.getElementByID(). What ID is it going to grab if there are multiple ones?
On a slightly related topic, you can add multiple classes to a DIV. See Eric Myers discussion at,
http://meyerweb.com/eric/articles/webrev/199802a.html
I'd like to say technically yes, since really what gets rendered is technically always browser-dependent. Most browsers try to keep to the specifications as best they can and as far as I know there is nothing in the CSS specifications against it. I'm only going to vouch for the actual HTML, CSS, and JavaScript code that gets sent to the browser before any other interpreter steps in.
However, I also say no since every browser I typically test on doesn't actually let you.
If you need to see for yourself, save the following as a .html file and open it up in the major browsers. In all browsers I tested on, the JavaScript function will not match to an element. However, remove either "hunkojunk" from the id tag and all works fine.
Sample Code
<html>
<head>
</head>
<body>
<p id="hunkojunk1 hunkojunk2"></p>
<script type="text/javascript">
document.getElementById('hunkojunk2').innerHTML = "JUNK JUNK JUNK JUNK JUNK JUNK";
</script>
</body>
</html>
Nay.
From 3.2.3.1 The id attribute:
The value must not contain any space characters.
id="a b" <-- find the space character in that VaLuE.
That said, you can style multiple IDs. But if you're following the specification, the answer is no.
From 7.5.2 Element identifiers: the id and class attributes:
The id attribute assigns a unique identifier to an element (which may
be verified by an SGML parser).
and
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
followed by any number of letters, digits ([0-9]), hyphens ("-"),
underscores ("_"), colons (":"), and periods (".").
So "id" must be unique and can't contain a space.
No.
Having said that, there's nothing to stop you doing it. But you'll get inconsistent behaviour with the various browsers. Don't do it. One ID per element.
If you want multiple assignations to an element use classes (separated by a space).
Any ID assigned to a div element is unique.
However, you can assign multiple IDs "under", and not "to" a div element.
In that case, you have to represent those IDs as <span></span> IDs.
Say, you want two links in the same HTML page to point to the same div element in the page.
The two different links
<p>Exponential Equations</p>
<p><Logarithmic Expressions</p>
Point to the same section of the page
<!-- Exponential / Logarithmic Equations Calculator -->
<div class="w3-container w3-card white w3-margin-bottom">
<span id="exponentialEquationsCalculator"></span>
<span id="logarithmicEquationsCalculator"></span>
</div>
The simple answer is no, as others have said before me. An element can't have more than one ID and an ID can't be used more than once in a page. Try it out and you'll see how well it doesn't work.
In response to tvanfosson's answer regarding the use of the same ID in two different elements. As far as I'm aware, an ID can only be used once in a page regardless of whether it's attached to a different tag.
By definition, an element needing an ID should be unique, but if you need two ID's then it's not really unique and needs a class instead.
That's interesting, but as far as I know the answer is a firm no. I don't see why you need a nested ID, since you'll usually cross it with another element that has the same nested ID. If you don't there's no point, if you do there's still very little point.
Classes are specially made for this, and
here is the code from which you can understand it:
<html>
<head>
<style type="text/css">
.personal{
height:100px;
width: 100px;
}
.fam{
border: 2px solid #ccc;
}
.x{
background-color:#ccc;
}
</style>
</head>
<body>
<div class="personal fam x"></div>
</body>
</html>
ID's should be unique, so you should only use a particular ID once on a page. Classes may be used repeatedly.
Check HTML id Attribute (W3Schools) for more details.
I don´t think you can have two Id´s but it should be possible. Using the same id twice is a different case... like two people using the same passport. However one person could have multiple passports... Came looking for this since I have a situation where a single employee can have several functions. Say "sysadm" and "team coordinator" having the id="sysadm teamcoordinator" would let me reference them from other pages so that employees.html#sysadm and employees.html#teamcoordinator would lead to the same place... One day somebody else might take over the team coordinator function while the sysadm remains the sysadm... then I only have to change the ids on the employees.html page ... but like I said - it doesn´t work :(