Random Letter html Tag - html

I was wondering if you can use a random letter as an html tag. Like, f isn't a tag, but I tried it in some code and it worked just like a span tag. Sorry if this is a bad question, I've just been curious about it for a while, and I couldn't find anything online.

I was wondering if you can use a random letter as an html tag.
Yes and no.
"Yes" - in that it works, but it isn't correct: when you have something like <z> it only works because the web (HTML+CSS+JS) has a degree of forwards compatibility built-in: browsers will render HTML elements that they don't recognize basically the same as a <span> (i.e. an inline element that doesn't do anything other than reify a range of the document's text).
However, to use HTML5 Custom Elements correctly you need to conform to the Custom Elements specification which states:
The name of a custom element must contain a dash (-). So <x-tags>, <my-element>, and <my-awesome-app> are all valid names, while <tabs> and <foo_bar> are not. This requirement is so the HTML parser can distinguish custom elements from regular elements. It also ensures forward compatibility when new tags are added to HTML.
So if you use <my-z> then you'll be fine.
The HTML Living Standard document, as of 2021-12-04, indeed makes an explicit reference to forward-compatibility in its list of requirements for custom element names:
https://html.spec.whatwg.org/#valid-custom-element-name
They start with an ASCII lower alpha, ensuring that the HTML parser will treat them as tags instead of as text.
They do not contain any ASCII upper alphas, ensuring that the user agent can always treat HTML elements ASCII-case-insensitively.
They contain a hyphen, used for namespacing and to ensure forward compatibility (since no elements will be added to HTML, SVG, or MathML with hyphen-containing local names in the future).
They can always be created with createElement() and createElementNS(), which have restrictions that go beyond the parser's.
Apart from these restrictions, a large variety of names is allowed, to give maximum flexibility for use cases like <math-α> or <emotion-😍>.
So, by example:
<a>, <q>, <b>, <i>, <u>, <p>, <s>
No: these single-letter elements are already used by HTML.
<z>
No: element names that don't contain a hyphen - cannot be custom elements and will be interpreted by present-day browsers as invalid/unrecognized markup that they will nevertheless (largely) treat the same as a <span> element.
<a:z>
No: using a colon to use an XML element namespace is not a thing in HTML5 unless you're using XHTML5.
<-z>
No - the element name must start with a lowercase ASCII character from a to z, so - is not allowed.
<a-z>
Yes - this is fine.
<a-> and <a-->
Unsure - these two names are curious:
The HTML spec says the name must match the grammar rule [a-z] (PCENChar)* '-' (PCENChar)*.
The * denotes "zero-or-more" which is odd, because that implies the hyphen doesn't need to be followed by another character.
PCENChar represents a huge range of visible characters permitted in element names, curiously this includes -, so by that rule <a--> should be valid.
But note that -- is a reserved character sequence in the greater SGML-family (including HTML and XML) which may cause weirdness. YMMV!

Related

Valid and Invalid HTML tags

So recently I found a question that was
Which of the following is a valid tag?
<213person>
<_person> (This is given as the right answer)
Both
None
(Note: this is the explanation that was given:- Valid HTML tags are surrounded by the angle brackets and the tag name can only either start from an alphabet or an underscore(_))
As far as my knowledge goes none of the reserved tags start with an underscore and according to what I've read about custom HTML tags it has to start with an alphabet(I tested it and it doesn't work with a custom tag starting with any character that's not an alphabet). So in my opinion and according to what I tested HTML tags can only start with alphabets or! (in case of !-- -- and !DOCTYPE HTML)
What I want to know is if the given explanation is correct or not and if it's correct then can someone provide some proper documentation and working examples for it?
As mentioned by #Rob, the standard defines a valid tag name as string containing alphanumeric ASCII characters, being:
0-9|a-z|A-Z
However, browsers handle things differently.
There's a few main points that I've noticed which don't align with the current standard.
Tag names must start with a letter
If a tag name starts with any character outside a-z|A-Z, the start tag ends up being interpreted as text and the end tag gets converted into a comment.
Special characters can be used
The following HTML is valid in a lot of browsers and will create an element:
<Z[\]^_`a></Z[\]^_`a>
This seems to be browsers only checking if the characters are ASCII. The only exception is the first character (as stated above).
Initially, I thought this was a simplified check, so instead of [A-Z]|[a-z| they checked [A-z], but you can use any character outside this range.
This makes the following HTML also "valid" in the eyes of certain browsers:
<a!></a!>
<aʬ></aʬ>
<a͢͢͢></a͢͢͢>
<a͢͢͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ΘΘΘΘ></a͢͢͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ΘΘΘΘ>
<a></a>
I tested the HTML elements in both Chrome and Firefox, I didn't test any other browsers. I also didn't test every ASCII character, just some very high and low in terms of their character code.
From the HTML standard:
Start tags must have the following format:
The first character of a start tag must be a U+003C LESS-THAN SIGN
character (<). The next few characters of a start tag must be the
element's tag name.
So what is allowed in the element's tag name? This is defined just above:
Tags contain a tag name, giving the element's name. HTML elements all
have names that only use ASCII alphanumerics. In the HTML syntax, tag
names, even those for foreign elements, may be written with any mix of
lower- and uppercase letters that, when converted to all-lowercase,
matches the element's tag name; tag names are case-insensitive.

unfamiliar html/react tag row- , col-

I was browsing through the source code of a moderately popular repo, and not sure what are the following tags.
see https://github.com/pusher/react-slack-clone/blob/master/src/index.js#L243
<row->
<col->
....
</col->
</row->
why - after the html tags? and how is it an acceptable tag?
They are custom elements. In regards to the tag's validity, you may have noticed that it is not defined anywhere in the code. As per step 5 of the spec, it is valid, and has a namespace of Element.
For a higher-level overview of custom elements, take a look at the MDN tutorial on using custom elements.
An additional note: These tags could be replaced by regular <div> tags with classes, and the functionality would be no different.
This is most likely an error in the source code which has gone unnoticed (possibly by using search & replace?). React accepts element names which end on a - character and it gets rendered to the DOM via document.createElement() as any other element (for a simple example see here: https://jsfiddle.net/nso3gjpw/ ). Since browsers are very forgiving in case of weird html, it just renders the element as an unknown custom element which behaves roughly like a span element. The row- and col- elements are also styled (https://github.com/pusher/react-slack-clone/blob/master/src/index.css#L73).
In the Blink rendering engine source code the following definition for tag names is given (https://www.w3.org/TR/REC-xml/#NT-CombiningChar):
// DOM Level 2 says (letters added):
//
// a) Name start characters must have one of the categories Ll, Lu, Lo, Lt, Nl.
// b) Name characters other than Name-start characters must have one of the categories Mc, Me, Mn, Lm, or Nd.
// c) Characters in the compatibility area (i.e. with character code greater than #xF900 and less than #xFFFE) are not allowed in XML names.
// d) Characters which have a font or compatibility decomposition (i.e. those with a "compatibility formatting tag" in field 5 of the database -- marked by field 5 beginning with a "<") are not allowed.
// e) The following characters are treated as name-start characters rather than name characters, because the property file classifies them as Alphabetic: [#x02BB-#x02C1], #x0559, #x06E5, #x06E6.
// f) Characters #x20DD-#x20E0 are excluded (in accordance with Unicode, section 5.14).
// g) Character #x00B7 is classified as an extender, because the property list so identifies it.
// h) Character #x0387 is added as a name character, because #x00B7 is its canonical equivalent.
// i) Characters ':' and '_' are allowed as name-start characters.
// j) Characters '-' and '.' are allowed as name characters.
//
// It also contains complete tables. If we decide it's better, we could include those instead of the following code.
Especially important here is rule j) Characters '-' and '.' are allowed as name characters.

<in a nutshell> as text not html tag

I have a text: Our process<in a nutshell>
that has an output as:
Our process<in nutshell="" a=""></in>
I didn't even know in is a tag and cannot find on google what it does.
How do I post it as text? And what is <in>?
Thanks!
In HTML:
Our process <in a nutshell>
There is no <in> tag defined in HTML, but browsers and other parsers still treat <in a nutshell> as tag. It creates an element node in the document tree, representing an unknown element, so it has only a set of general properties. It has no special rendering, and no functionality is associated with it. But you could style it and/or use client-side JavaScript to add functionality to it.
In this case, you didn’t mean to do anything like that, but the tag is still parsed, and in is treated as the element name (tag name) and nutshell and a as attribute names, with attribute values defaulted to the empty string. Since tags are treated as code for starting an element, the tag itself is not rendered. Browsers may imply a closing tag </in> under certain conditions. This explains the “output” presented in the question; it’s really just the fragment of code viewed in a browser’s Developer Tools. The actual rendering in the example case is just the string “Our process”.
To prevent this processing, the “<” character needs to be escaped somehow; < is the best and most common method, so you would write
Our process<in a nutshell>
There is no need to escape the “>”, but you may do so, for symmetry, using >.
Try to replace
< with <
and replace
> with >
Does this give you the expected results?
The browser is interpreting anything in '<>' as a tag.
You need to use the character code to display those symbols as text:
Our process <in a nutshell>

invalid tags in HTML <abc> vs <1234>

I was writing a simple web page. And I wanted to print <abc> and <1234> inside the page. Why <1234> is printed not <abc>? I know <abc> is invalid tag thats why it is not rendered. But what about <1234>?
You have to do it like:
and <1234>
Use HTML entities.
< = <
> = >
Using them tells HTML that you want the < and > to be displayed as it is and not be interpreted as the < and > in <html>
DEMO
P.S.: Here's a list of them.
This is down to the way that browsers parse the HTML into a format that gets displayed as a web page.
As a rule, HTML tags must start with letters. Because of this, the browser attempts to parse as a valid tag (therefore hiding it), but doesn't recognise <1234> and therefore leaves it untouched.
Edit:
As #Arkana pointed out below, there's nothing I can see in the HTML specification that specifically forbids starting a HTML tag with a number. My best guess is that because no (currently valid) HTML tags actually do start with a number, the browser's parser just ignores these tags, based on the same rule that IDs and Names follow according to the HTML4 spec.
In XHTML and in HTML5 (even in HTML serialization), both <abc> and <123> are invalid. In HTML 4.01, <123> is valid, though not recommended, and it simply means those five data characters.
What matters in browsers is how they parse an HTML document. There is an attempted semi-formal description of this in HTML5 CR, but it’s a bit hard reading. The bottom line is that < triggers special parsing: if the next character is a letter, data is parsed as an HTML tag; otherwise, the < as well as data after it are taken as normal data characters.
When a tag like <abc> has been parsed, modern browsers construct an element node in the document tree – even though the tag is invalid and the tag name is not known to the browser at all. If there is no end tag <abc>, the node contains all the rest there is in the document. But for an element node with an unknown name, there is no default styling and no default action. You won’t notice its existence, unless you try to do something with it (like put abc { color: solid red } in a style sheet).
Technically, one could say that the cause of the difference is that “a” is a name start character (a character that may appear as the first one in a tag name), whereas “1” is not.
It is safest to always escape a “<” character in content (except for style and script and xmp elements, which have rules of their own) as <. There is no need to escape a “>”, but if desired, for symmetry, you can escape it as >.
Unrecognised elements are added to the DOM for forward compatibility (they can be enhanced with CSS/JS). Element names may not begin with a number though, so they are not added to the DOM and error recovery treats them as text instead.
Use < and > if you want to include < and > as data instead of markup.

In what scopes do special HTML characters need to be escaped?

In HTML,
Dust & Bones
needs to be escaped as follows:
Dust & Bones
What's the scope of where &amp needs to be applied. Is it just href or is it anywhere within HTML text? What about
<input value="http://... & ">?
or within
<script>... & ... </script>
do these need escaping?
update
The bigger question, which would explain this, is, when does the HTML parser look for &XXX; tokens and replace them? Is it done once on the whole document, or do different rules apply for the text between tags vs. attribute values within a tag vs. wihtin tagA vs. within tagB -- different parsing rules seem to apply within , so I may write && (for AND) and < for (LESS-THAN). So, what rules apply in which scopes?
The rules vary depending on the version of HTML you are dealing with but are always more complex then is worth trying to remember.
The safe approach is "Use character references to represent the 5 HTML special characters everywhere except inside script and style elements", which makes you safe for everything except XHTML.
For XHTML the rule is the same with the additional proviso of "and use explicit CDATA sections in script and style elements".
The bigger question, which would explain this, is, when does the HTML parser look for &XXX; tokens and replace them?
As it parses the HTML (depending on what the current state of the tokeniser is ("inside start tag" and "inside attribute value" are examples of different states)).
Is it done once on the whole document
Unless you trigger additional HTML parsing (e.g. by setting innerHTML on an element).
or do different rules apply for the text between tags vs. attribute values within a tag vs. wihtin tagA vs. within tagB
Different rules apply in different places. The complete, current rules are (as I suggested in a comment) rather complex and would require a lot of work to extract from the HTML 5 parsing rules. This is why I suggest, if you are an HTML author and not a browser author, using the simpler rules of "Use character references unless you are in a script or style element".
-- different parsing rules seem to apply within <script>, so I may write && (for AND) and < for (LESS-THAN). So, what rules apply in which scopes?
In HTML 4 terms, script and style elements are defined as containing CDATA (where the only sequence of characters with special meaning in HTML are </ which terminates the CDATA section). Everywhere else in the document (including, counter-intuitively, attribute values that are defined as containing CDATA) & indicates the start of a character reference (although there might be a few exceptions based on what the character following the & is).
The HTML 5 rules are more complicated, but the basic principle of "It is safe and sane to use character references for &, <, >, " and ' everywhere except inside script and style elements" holds.