In what contexts can you use the less-than sign as text in HTML?

In what contexts can I use the less-than symbol < as text in HTML?
For example, < and <= render as text perfectly fine if they are inside an element:
<p>
<
<=
</p>
However <t will be parsed as HTML by the browser and not produce the text <t.
Is there a rule for which characters can follow the less-than symbol for the browser to assume that it is the start of a tag?

The rule is: almost never.
Only inside quoted attribute values (and in raw text tags like script and style) are you permitted to write < unescaped. I think attribute names permit these too, but not > (though why you would put a < in an attribute name is beyond me).
Browsers will do their best to recover from bad HTML, so sometimes you might get away with it if you forget.
But it's best to always encode your entities.
You should scan the HTML spec, but here's one relevant chapter with some of the constraints listed in various sections.
Use an HTML validator in strict mode to make sure you're getting it right; the HTML you gave in your question is rejected by the linked tool, with a suggestion to switch to &lt;.
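When the text is produced by a program, the escaping can be delegated to a library rather than done by hand. A minimal sketch, assuming the page is generated from Python and using the standard library's html.escape; the sample strings come from the question, and the variable names are only illustrative:

```python
import html

# Text fragments from the question that should appear literally on the page.
samples = ["<", "<=", "<t"]

# html.escape replaces &, < and > (and, by default, both quote characters)
# with character references, so the result is safe as element content.
escaped = [html.escape(s) for s in samples]

for raw, safe in zip(samples, escaped):
    print(f"{raw!r} -> {safe!r}")
```

Escaping everything this way avoids having to remember which characters after < would start a tag.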

Related

<in a nutshell> as text not html tag

I have a text: Our process<in a nutshell>
that has an output as:
Our process<in nutshell="" a=""></in>
I didn't even know in is a tag and cannot find on google what it does.
How do I post it as text? And what is <in>?
Thanks!
In HTML:
Our process &lt;in a nutshell&gt;
There is no <in> tag defined in HTML, but browsers and other parsers still treat <in a nutshell> as a tag. It creates an element node in the document tree, representing an unknown element, so it has only a set of general properties. It has no special rendering, and no functionality is associated with it. But you could style it and/or use client-side JavaScript to add functionality to it.
In this case, you didn’t mean to do anything like that, but the tag is still parsed, and in is treated as the element name (tag name) and nutshell and a as attribute names, with attribute values defaulted to the empty string. Since tags are treated as code for starting an element, the tag itself is not rendered. Browsers may imply a closing tag </in> under certain conditions. This explains the “output” presented in the question; it’s really just the fragment of code viewed in a browser’s Developer Tools. The actual rendering in the example case is just the string “Our process”.
To prevent this processing, the “<” character needs to be escaped somehow; &lt; is the best and most common method, so you would write
Our process&lt;in a nutshell>
There is no need to escape the “>”, but you may do so, for symmetry, using &gt;.
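The difference between the two spellings can be observed with any HTML parser. The following is a rough sketch using Python's standard html.parser module, which is not a browser-grade parser but treats this input in a comparable way; the Recorder class and parse function are illustrative helpers, not part of any library. Note that this parser reports valueless attributes as None, whereas a browser's DOM exposes them as empty strings.

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Collects start tags (with attribute names) and text (illustrative helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.start_tags = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Keep only the attribute names; the values are None for bare attributes.
        self.start_tags.append((tag, [name for name, _value in attrs]))

    def handle_data(self, data):
        self.text_parts.append(data)


def parse(markup):
    """Return (start tags seen, concatenated text) for a markup string."""
    recorder = Recorder()
    recorder.feed(markup)
    recorder.close()
    return recorder.start_tags, "".join(recorder.text_parts)


# Unescaped: "in" becomes a tag name, "a" and "nutshell" become attributes.
unescaped = parse("Our process<in a nutshell>")
# Escaped: no tag is created and the whole phrase survives as text.
escaped = parse("Our process&lt;in a nutshell>")

print(unescaped)
print(escaped)
```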
Try to replace
< with &lt;
and replace
> with &gt;
Does this give you the expected results?
The browser is interpreting anything in '<>' as a tag.
You need to use the character code to display those symbols as text:
Our process &lt;in a nutshell&gt;

invalid tags in HTML <abc> vs <1234>

I was writing a simple web page, and I wanted to print <abc> and <1234> inside the page. Why is <1234> printed but not <abc>? I know <abc> is an invalid tag; that's why it is not rendered. But what about <1234>?
You have to do it like:
&lt;abc&gt; and &lt;1234&gt;
Use HTML entities.
< = &lt;
> = &gt;
Using them tells the browser that you want the < and > to be displayed as they are and not be interpreted like the < and > in <html>.
P.S.: Here's a list of them.
This is down to the way that browsers parse the HTML into a format that gets displayed as a web page.
As a rule, HTML tags must start with letters. Because of this, the browser attempts to parse <abc> as a valid tag (therefore hiding it), but doesn't recognise <1234> as a tag and therefore leaves it untouched.
Edit:
As @Arkana pointed out below, there's nothing I can see in the HTML specification that specifically forbids starting an HTML tag with a number. My best guess is that because no (currently valid) HTML tags actually do start with a number, the browser's parser just ignores these tags, based on the same rule that IDs and Names follow according to the HTML4 spec.
In XHTML and in HTML5 (even in HTML serialization), both <abc> and <123> are invalid. In HTML 4.01, <123> is valid, though not recommended, and it simply means those five data characters.
What matters in browsers is how they parse an HTML document. There is an attempted semi-formal description of this in HTML5 CR, but it’s a bit hard reading. The bottom line is that < triggers special parsing: if the next character is a letter, data is parsed as an HTML tag; otherwise, the < as well as data after it are taken as normal data characters.
When a tag like <abc> has been parsed, modern browsers construct an element node in the document tree – even though the tag is invalid and the tag name is not known to the browser at all. If there is no end tag </abc>, the node contains all the rest there is in the document. But for an element node with an unknown name, there is no default styling and no default action. You won’t notice its existence, unless you try to do something with it (like put abc { color: red } in a style sheet).
Technically, one could say that the cause of the difference is that “a” is a name start character (a character that may appear as the first one in a tag name), whereas “1” is not.
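The rule described above can be sketched as a small function. This is a simplification and not the full tokenizer from the HTML Standard: it looks only at the single character after the "<". Besides ASCII letters, the Standard's tokenizer also treats "/", "!" and "?" specially at that position (end tags, comments and doctypes, and bogus comments), which the sketch lumps together as "other markup". The function name classify_less_than is hypothetical.

```python
import string


def classify_less_than(text, pos):
    """Say what a '<' at text[pos] begins, using a simplified tag-open rule."""
    if text[pos] != "<":
        raise ValueError("text[pos] must be '<'")
    following = text[pos + 1:pos + 2]  # empty string at end of input
    if following and following in string.ascii_letters:
        return "start tag"
    if following in ("/", "!", "?"):
        return "other markup"
    # Digits, spaces, "=", end of input, ...: the "<" is ordinary text.
    return "text"


for sample in ["<abc>", "<1234>", "<t", "<=", "< ", "</p>"]:
    print(repr(sample), "->", classify_less_than(sample, 0))
```

This matches the observations in the questions above: <abc> and <t open a tag, while <1234>, <= and a lone < stay visible as text.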
It is safest to always escape a “<” character in content (except for style and script and xmp elements, which have rules of their own) as &lt;. There is no need to escape a “>”, but if desired, for symmetry, you can escape it as &gt;.
Unrecognised elements are added to the DOM for forward compatibility (they can be enhanced with CSS/JS). Element names may not begin with a number though, so they are not added to the DOM and error recovery treats them as text instead.
Use &lt; and &gt; if you want to include < and > as data instead of markup.

In what scopes do special HTML characters need to be escaped?

In HTML,
Dust & Bones
needs to be escaped as follows:
Dust &amp; Bones
What's the scope of where &amp; needs to be applied? Is it just href or is it anywhere within HTML text? What about
<input value="http://... & ">?
or within
<script>... & ... </script>
do these need escaping?
update
The bigger question, which would explain this, is, when does the HTML parser look for &XXX; tokens and replace them? Is it done once on the whole document, or do different rules apply for the text between tags vs. attribute values within a tag vs. within tagA vs. within tagB -- different parsing rules seem to apply within <script>, so I may write && (for AND) and < (for LESS-THAN). So, what rules apply in which scopes?
The rules vary depending on the version of HTML you are dealing with but are always more complex than is worth trying to remember.
The safe approach is "Use character references to represent the 5 HTML special characters everywhere except inside script and style elements", which makes you safe for everything except XHTML.
For XHTML the rule is the same with the additional proviso of "and use explicit CDATA sections in script and style elements".
The bigger question, which would explain this, is, when does the HTML parser look for &XXX; tokens and replace them?
As it parses the HTML (depending on what the current state of the tokeniser is ("inside start tag" and "inside attribute value" are examples of different states)).
Is it done once on the whole document
Unless you trigger additional HTML parsing (e.g. by setting innerHTML on an element).
or do different rules apply for the text between tags vs. attribute values within a tag vs. within tagA vs. within tagB
Different rules apply in different places. The complete, current rules are (as I suggested in a comment) rather complex and would require a lot of work to extract from the HTML 5 parsing rules. This is why I suggest, if you are an HTML author and not a browser author, using the simpler rules of "Use character references unless you are in a script or style element".
-- different parsing rules seem to apply within <script>, so I may write && (for AND) and < (for LESS-THAN). So, what rules apply in which scopes?
In HTML 4 terms, script and style elements are defined as containing CDATA (where the only sequence of characters with special meaning in HTML is </, which terminates the CDATA section). Everywhere else in the document (including, counter-intuitively, attribute values that are defined as containing CDATA) & indicates the start of a character reference (although there might be a few exceptions based on what the character following the & is).
The HTML 5 rules are more complicated, but the basic principle of "It is safe and sane to use character references for &, <, >, " and ' everywhere except inside script and style elements" holds.
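The scoping can be observed with a parser. The sketch below uses Python's standard html.parser module as a stand-in for a browser, which is an assumption; it is not a full HTML5 parser, but it does decode character references in ordinary text and in attribute values while passing script content through untouched. ScopeRecorder is an illustrative helper, and the markup string is made up from the examples in the question.

```python
from html.parser import HTMLParser


class ScopeRecorder(HTMLParser):
    """Records attribute values and text per element (illustrative helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_elements = []
        self.attributes = {}
        self.text = {}

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)
        for name, value in attrs:
            self.attributes[(tag, name)] = value

    def handle_endtag(self, tag):
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost open element, if any.
        current = self.open_elements[-1] if self.open_elements else None
        self.text[current] = self.text.get(current, "") + data


markup = (
    '<p title="Dust &amp; Bones">a &lt; b</p>'
    "<script>if (a < b && c) {}</script>"
)
recorder = ScopeRecorder()
recorder.feed(markup)
recorder.close()

print(recorder.attributes[("p", "title")])  # character reference decoded
print(recorder.text["p"])                   # character reference decoded
print(recorder.text["script"])              # passed through verbatim
```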

How does the browser distinguish between escaped strings and actual HTML when both are displayed similarly?

If I place <a href="www.stackoverflow.com"> inside the body tag, and if I place the following string inside the body tag "&lt;a href="www.stackoverflow.com"&gt;", how does the browser know that the first is to be rendered as an actual link, and the latter as simple text?
The less than character “<” is defined to be a tag start character. The notation &lt; is something completely different; it simply means the less than character as a data character, not interpreted as markup at all. So the answer is really “By definition.”
By the way, href="www.stackoverflow.com" contains a relative address, resolved relative to the current base address. To refer to StackOverflow main page, you need to write href="http://www.stackoverflow.com".
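The relative-address point can be checked with a URL resolver. A small sketch using urljoin from Python's standard library; the base address is a made-up example standing in for the address of the page that contains the link:

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the link.
base = "http://example.com/questions/page.html"

# Without a scheme, the href is resolved relative to the base address.
relative = urljoin(base, "www.stackoverflow.com")
# With a scheme, the href is already absolute and the base is ignored.
absolute = urljoin(base, "http://www.stackoverflow.com")

print(relative)
print(absolute)
```

The first result points to a path on example.com, not to Stack Overflow, which is why the scheme is needed.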
If we use reserved characters or HTML tags in our HTML pages, they are rendered as markup by the browsers. Sometimes we need to use these characters as themselves, not as markup; then we have to use escape sequences to achieve that.
You can get a good idea of how browsers work from this link.
You can find some escape sequences here.
In our case, <a href="www.stackoverflow.com">, the < and > are reserved characters in HTML: whenever they are used in the page, they are treated as part of an HTML tag. If you want to display < or > in your page, you have to use the corresponding escape sequence, &lt; or &gt;. That is how the browser replaces &lt; with < when the page is displayed.

What else can be used instead of < or > in HTML codes?

When we do any HTML coding we use < and > to specify a tag, which the browser does not show as text but interprets as markup. Can anything else (any coding, for HTML) be used instead of these symbols?
I think you are asking whether it is possible to use characters other than < and > as tag start and tag end characters. For example, can one somehow define that [ and ] are used instead, so that we would write [p] and not <p>.
The answer is no. HTML was formally based on SGML, which has provisions for such definitions; in SGML, < and > are just “reference concrete syntax” characters for abstract “start of tag” and “end of tag” notations. But HTML was never actually implemented as SGML-based, and the HTML specifications even formally fixed the syntax to use < and >. And XML, the simplified version of SGML, upon which XHTML is based, has no provisions for setting such syntax features.
In practical terms: No. Only < and > mark the start and end of a tag in HTML.
In theoretical terms only (because this is not supported by any mainstream browser), in HTML 4 and earlier you could use SHORT TAGS. The syntax for this is to use / instead of > to end the start tag and then / instead of the entire end tag:
For example:
<title/This is the title/
or
<br/ <!-- note that the end tag for br elements must be omitted in HTML 4 and earlier -->
Some other SGML features may allow other options, but they would also not be supported by browsers.
The following is my answer to what appeared to be the original question after someone had edited it to show &lt; instead of <.
In theory, for HTML 4 and earlier, you can use CDATA sections … but they never saw widespread support in browsers so aren't of any practical value in HTML.
There is also the <xmp> element, which is obsolete. The HTML 5 draft marks it as non-conforming and says:
Use pre and code instead, and escape "<" and "&" characters as "&lt;" and "&amp;" respectively
The W3C Wiki has this to say about xmp:
No, really. don't use it.
Character references (&lt; and co) are the correct tools for the job. Any desire to avoid them is better replaced by learning to love a programmatic solution or the find & replace feature of your editor.
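One possible programmatic solution, sketched in Python under the assumption that the page is assembled by a script: wrap the source text in pre and code and let the standard library's html.escape produce the character references. The function name to_code_block and the sample snippet are illustrative only.

```python
import html


def to_code_block(source):
    """Wrap source text in pre/code, escaping markup-significant characters."""
    # quote=False leaves " and ' alone; they are harmless in element content.
    return "<pre><code>" + html.escape(source, quote=False) + "</code></pre>"


snippet = 'if (a < b && c) { print("<p>"); }'
block = to_code_block(snippet)
print(block)
```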