HTML entity for a true minus symbol

I'm a typophile adding mathematical equations to my pages.
I've found questions like this one that explain that you should use &times; (×) instead of 'x' for a true multiplication symbol. But I can't find any questions that indicate whether an HTML entity exists for a true minus symbol, as opposed to a hyphen, en dash, or em dash.
Any help would be much appreciated.

According to this reference, the HTML entity is &minus; (it renders as −, U+2212 MINUS SIGN).
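As a quick sanity check, here is a minimal Python sketch (using only the standard library) that decodes the named entity and compares it with the look-alike characters mentioned in the question. The numeric forms &#8722; and &#x2212; are included because they refer to the same code point.

```python
import html
import unicodedata

# Decode the named entity and the two numeric forms of the same character.
minus = html.unescape("&minus;")
same = minus == html.unescape("&#8722;") == html.unescape("&#x2212;")
print(same)

# Compare with the look-alike characters mentioned in the question:
# hyphen-minus, en dash, em dash, and the true minus sign.
for ch in ["-", "\u2013", "\u2014", minus]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The loop prints the code point and official Unicode name of each character, which makes it easy to see that the minus sign is a distinct character rather than a styled hyphen or dash.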

Related

How to edit this html lexer rule?

I want to edit this HTML lexer rule and I need help with the regular expression.
TAG_NAME refers to any HTML attribute name, for example required, class, or id.
I want to change the rule so that it does not accept this exact name: 'az-'.
I think this requires modifying the regular expression. I looked it up, but I couldn't integrate what I found online with the way these rules are written.
As a first try I removed the '-' from Tag_NameChar, but then attributes like 'data-target' were no longer recognized.
This snippet is for the rule:
and this one shows how the attributes are recognized.
ANTLR does not support lookahead syntax like some regex engines do, so there's no easy way to exclude certain matches from within the regex. It's always possible to rewrite a regular expression to exclude a given string (regular languages are closed under complement and intersection), but it usually ends up quite painful. In your case, you'd end up with something following the logic of "a tag name can either have fewer than 3 characters, more than 3 characters, or it could have three characters where the first isn't an 'a', the second isn't a 'z' or the last isn't a '-'".
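To make that logic concrete, here is a rough sketch in Python rather than ANTLR syntax. The character set for a tag name (letters, digits, underscore, hyphen) is an assumption for illustration, since the actual Tag_NameChar rule isn't shown; the names C, NOT_A, NOT_Z, NOT_DASH and is_allowed_name are made up for this example.

```python
import re

# Assumption: a tag name consists of these characters only.
C = r"[A-Za-z0-9_-]"
NOT_A = r"[A-Zb-z0-9_-]"    # any name character except 'a'
NOT_Z = r"[A-Za-y0-9_-]"    # any name character except 'z'
NOT_DASH = r"[A-Za-z0-9_]"  # any name character except '-'

# "Any name except exactly az-", written without lookahead.
NOT_AZ_DASH = re.compile(
    rf"{C}{{1,2}}"            # fewer than three characters
    rf"|{C}{{4,}}"            # more than three characters
    rf"|{NOT_A}{C}{{2}}"      # three characters, first is not 'a'
    rf"|{C}{NOT_Z}{C}"        # three characters, second is not 'z'
    rf"|{C}{{2}}{NOT_DASH}"   # three characters, third is not '-'
)

def is_allowed_name(name):
    """Return True for every valid name except the exact string 'az-'."""
    return NOT_AZ_DASH.fullmatch(name) is not None

for name in ["az-", "az", "az-x", "bz-", "ay-", "azx", "data-target"]:
    print(name, is_allowed_name(name))
```

The same five alternatives could be transcribed into a lexer rule using character sets, which shows why this route works but gets unwieldy quickly.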
The less painful, but also less portable, solution is to use a semantic predicate that fails if the text of the tag name equals az-. So something like {!getText().equals("az-")}? depending on the target language.
If you're okay with introducing an additional lexer rule, you may also introduce a rule INVALID_TAG_NAME (or whatever you want to call it) that matches exactly az- and that's defined before TAG_NAME. That way any tag that's named exactly az- will produce an INVALID_TAG_NAME token instead of a TAG_NAME token.
Depending on your requirements, you could also leave the grammar unchanged altogether and simply produce an error when you see a tag named az- when you traverse the tree in a listener or visitor.

What's the difference between &#10; and &#xA;?

Apart from readability, what is the difference between the HTML codes &#10; and &#xA;?
ASCII code: 10
HTML entity (decimal character reference): &#10;
Hexadecimal value: 0A (as in &#xA;)
These all refer to the same thing but are represented in different ways. They all translate to Unicode U+000A LINE FEED (LF).
Example:
The number 2 can be represented using 1+1. It can also be represented using |sqrt(4)|. The result is the same, but using different syntaxes we can achieve the same result in different ways.
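A small Python sketch using the standard library's html.unescape can confirm this. The zero-padded form &#x000A; is added here only as an extra illustration; it is not from the question.

```python
import html

# Decimal and hexadecimal character references for the same code point.
decimal_ref = html.unescape("&#10;")
hex_ref = html.unescape("&#xA;")
padded_hex_ref = html.unescape("&#x000A;")  # leading zeros do not matter

all_equal = decimal_ref == hex_ref == padded_hex_ref == "\n"
print(all_equal)
print(ord(decimal_ref), hex(ord(hex_ref)))
```

All three references decode to the single line feed character, so the choice between them is purely a matter of notation.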
References:
https://theasciicode.com.ar/ascii-control-characters/line-feed-ascii-code-10.html
https://www.quackit.com/character_sets/unicode/co_controls_and_basic_latin_unicode_character_codes.cfm
https://www.w3schools.com/html/html_symbols.asp
https://www.w3schools.com/charsets/ref_html_ascii.asp

Am I using &lt; correctly?

I am new to HTML and Xpath and need a little help.
I am trying to use the less than function in HTML but it keeps coming up with an error.
value.singleNodeValue.setAttribute("select", "match[round &lT; '"+matchround+"']");
where round is an attribute name and matchround is the user input I want to compare to.
Can someone please highlight what I am doing wrong as the greater than statement works perfectly.
General and parameter entities in XML are case sensitive.
&lT; (with a capital T as is currently shown in the code in your question) is not the same as &lt;.
It's also not really clear how your example intends to use XPath. It looks like you're just trying to set an attribute named select to a string that bears only a passing resemblance to the XPath matches() function. Note also that matches() is only available in XPath 2.0 and later.
The answer to your asked question is that you must use &lt; (case sensitive) instead of a literal < in XML so that parsers do not mistake the < for the start of an element.
The answer to your real question will depend on your clarification of what your true end goal is.
What kind of comparison do you want to make, and what kind of value does matchround have? Is it a string or a number? XPath 1.0 does not support less-than or greater-than comparisons on string values, only on numbers. And you say that round is an attribute name, in which case in XPath you need @round. So if matchround is a number (e.g. 50), then setAttribute("select", "match[@round < " + matchround + "]") would create an attribute with the XPath expression match[@round < 50] as its value. Any escaping would happen only when the DOM tree is serialized; the attribute value in the DOM would contain the pure XPath expression with a < character, not an entity reference.
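The two points above (escaping only happens on serialization, and entity names are case sensitive) can be illustrated with a short Python sketch using xml.etree.ElementTree instead of the browser DOM. The element names apply-templates and x, and the value 50, are placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET

matchround = 50  # hypothetical user input

# Set the attribute with a literal '<', as you would through the DOM.
node = ET.Element("apply-templates")  # placeholder element name
node.set("select", f"match[@round < {matchround}]")

in_memory = node.get("select")                       # raw XPath expression
serialized = ET.tostring(node, encoding="unicode")   # escaped on output
print(in_memory)
print(serialized)

# In markup, the correctly cased entity parses back to a literal '<' ...
parsed = ET.fromstring('<x select="match[@round &lt; 50]"/>').get("select")

# ... while &lT; is an undefined entity and makes the parser fail.
try:
    ET.fromstring('<x select="match[@round &lT; 50]"/>')
    wrong_case_parses = True
except ET.ParseError:
    wrong_case_parses = False
print(parsed, wrong_case_parses)
```

So when the value is set programmatically, no entity is needed at all; the entity only matters when the expression is written inside XML or HTML source text.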

HTML5 input validate

I want an HTML5 form validation pattern that very strictly validates values like these:
IN16/20032/2012
in16/20032/12
inP12/20003/13
inP32/20003/2013
there must be exactly two slashes (/)
uppercase and lowercase letters are allowed (maximum of 3, minimum of 2)
the total number of digits must not be less than 9 or more than 11
must start with letters
must not start with a number
for example, the validation should reject the following:
ihfg45/......
45in/........etc,
Please help me solve this; I would greatly appreciate it.
I have looked at the following links:
http://www.girliemac.com/blog/2012/11/21/html5-form-validation/ and
http://w3resource.com/gallery/html5-based-fom-validation-without-javascript
I believe this should cover your validation rules
(?:[A-Za-z]{2}[0-9]{2,4}|[A-Za-z]{3}[0-9]{1,3})\/[0-9]{5}\/[0-9]{2,4}
You can read the explanation here: https://regex101.com/r/gN1aD7/1 (Note that you don't need ^ and $ for HTML5 validation. It's implied).
And here is a simple demo: http://jsfiddle.net/sb9agLtt/
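If you want to try the pattern outside the browser, here is a minimal Python sketch. It uses re.fullmatch to mimic the implied anchoring of the HTML5 pattern attribute, and drops the backslashes before the slashes since Python doesn't need them. The two rejected strings are made-up completions of the truncated examples in the question, used only for illustration.

```python
import re

# Same pattern as above; fullmatch plays the role of the implied ^ and $.
PATTERN = re.compile(
    r"(?:[A-Za-z]{2}[0-9]{2,4}|[A-Za-z]{3}[0-9]{1,3})/[0-9]{5}/[0-9]{2,4}"
)

def is_valid(value):
    """Return True if the whole value matches the pattern."""
    return PATTERN.fullmatch(value) is not None

accepted = [
    "IN16/20032/2012",
    "in16/20032/12",
    "inP12/20003/13",
    "inP32/20003/2013",
]
# Hypothetical full strings for the two rejected shapes in the question.
rejected = ["ihfg45/20032/12", "45in/20032/12"]

for value in accepted + rejected:
    print(value, is_valid(value))
```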

Variable order regex syntax

Is there a way to indicate that two or more regex phrases can occur in any order? For instance, XML attributes can be written in any order. Say that I have the following XML:
<a href="home.php" class="link" title="Home">Home</a>
<a href="home.php" title="Home" class="link">Home</a>
How would I write a match that checks the class and title and works for both cases? I'm mainly looking for the syntax that allows me to check in any order, not just matching the class and title as I can do that. Is there any way besides just including both combinations and connecting them with a '|'?
Edit: My preference would be to do it in a single regex as I'm building it programmatically and also unit testing it.
No, I believe the best way to do it with a single RE is exactly as you describe. Unfortunately, it'll get very messy when your XML can have 5 different attributes, giving you a large number of different REs to check.
On the other hand, I wouldn't be doing this with an RE at all since they're not meant to be programming languages. What's wrong with the old fashioned approach of using an XML processing library?
If you're required to use an RE, this answer probably won't help much, but I believe in using the right tools for the job.
Have you considered xpath? (where attribute order doesn't matter)
//a[@class and @title]
Will select both <a> nodes as valid matches. The only caveat being that the input must be xhtml (well formed xml).
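As a rough illustration in Python: the standard library's xml.etree.ElementTree only supports a subset of XPath (no "and" operator), so this sketch chains two predicates instead, which has the same effect here. The wrapping nav element and the third link are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two links with the attributes in different orders, plus one without a title.
doc = ET.fromstring(
    "<nav>"
    '<a href="home.php" class="link" title="Home">Home</a>'
    '<a href="home.php" title="Home" class="link">Home</a>'
    '<a href="about.php" class="link">About</a>'
    "</nav>"
)

# Chained predicates: both attributes must be present with these values,
# regardless of the order in which they were written.
matches = doc.findall(".//a[@class='link'][@title='Home']")
print(len(matches))
for a in matches:
    print(a.attrib)
```

Because the parser stores attributes in a mapping, the order in the source text never enters the comparison.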
You can create a lookahead for each of the attributes and plug them into a regex for the whole tag. For example, the regex for the tag could be
<a\b[^<>]*>
If you're using this on XML you'll probably need something more elaborate. By itself, this base regex will match a tag with zero or more attributes. Then you add a lookahead for each of the attributes you want to match:
(?=[^<>]*\s+class="link")
(?=[^<>]*\s+title="Home")
The [^<>]* lets it scan ahead for the attribute, but won't let it look beyond the closing angle bracket. Matching the leading whitespace here in the lookahead serves two purposes: it's more flexible than matching it in the base regex, and it ensures that we're matching a whole attribute name. Combining them we get:
<a\b(?=[^<>]*\s+class="link")(?=[^<>]*\s+title="Home")[^<>]+>[^<>]+</a>
Of course, I've made some simplifying assumptions for the sake of clarity. I didn't allow for whitespace around the equals signs, for single-quotes or no quotes around the attribute values, or for angle brackets in the attribute values (which I hear is legal, but I've never seen it done). Plugging those leaks (if you need to) will make the regex uglier, but won't require changes to the basic structure.
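Here is a quick way to try the combined lookahead regex in Python, under the same simplifying assumptions. The third sample (no title attribute) is made up to show a non-match.

```python
import re

# One lookahead per required attribute, then the rest of the tag.
TAG = re.compile(
    r'<a\b'
    r'(?=[^<>]*\s+class="link")'   # class must appear somewhere in the tag
    r'(?=[^<>]*\s+title="Home")'   # title must appear somewhere in the tag
    r'[^<>]+>[^<>]+</a>'
)

samples = [
    '<a href="home.php" class="link" title="Home">Home</a>',
    '<a href="home.php" title="Home" class="link">Home</a>',
    '<a href="home.php" class="link">Home</a>',  # hypothetical: no title
]
results = [TAG.search(s) is not None for s in samples]
print(results)
```

Each lookahead starts from the same position right after the tag name, which is why the order of the attributes in the input doesn't matter.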
You could use named groups to pull the attributes out of the tag. Run the regex and then loop over the groups doing whatever tests that you need.
Something like this (untested, using .net regex syntax with the \w for word characters and \s for whitespace):
<a ((?<key>\w+)\s?=\s?['"](?<value>\w+)['"])+ />
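A comparable sketch in Python, which spells named groups as (?P<name>...). This is a variant of the pattern above, not a translation: it matches one attribute at a time with finditer and allows any characters except quotes in the value, because a value like home.php would not match \w+. The names ATTR and attributes are invented for this example.

```python
import re

# One key="value" pair; values may contain anything except quote characters.
ATTR = re.compile(r'''(?P<key>[\w-]+)\s*=\s*["'](?P<value>[^"']*)["']''')

def attributes(tag):
    """Collect all attributes of a tag into a dict, ignoring their order."""
    return {m.group("key"): m.group("value") for m in ATTR.finditer(tag)}

first = attributes('<a href="home.php" class="link" title="Home">')
second = attributes('<a href="home.php" title="Home" class="link">')

print(first)
print(first == second)
print(first.get("class") == "link" and first.get("title") == "Home")
```

Once the attributes are in a dict, the checks on class and title are ordinary lookups and the order in the source no longer matters.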
The easiest way would be to write a regex that picks up the <a .... > part, and then write two more regexes to pull out the class and the title. Although you could probably do it with a single regex, it would be very complicated, and probably a lot more error prone.
With a single regex you would need something like
<a[^>]*((class="([^"]*)")|(title="([^"]*)"))?((title="([^"]*)")|(class="([^"]*)"))?[^>]*>
That is just a first guess; I haven't checked whether it's even valid. It's much easier to divide and conquer the problem.
A first ad hoc solution might be the following.
((class|title)="[^"]*?" *)+
This is far from perfect because it allows every attribute to occur more than once. I could imagine that this might be solvable with assertions. But if you just want to extract the attributes this might already be sufficient.
If you want to match a permutation of a set of elements, you could use a combination of back references and zero-width negative lookahead.
Say you want to match any one of these six lines:
123-abc-456-def-789-ghi-0AB
123-abc-456-ghi-789-def-0AB
123-def-456-abc-789-ghi-0AB
123-def-456-ghi-789-abc-0AB
123-ghi-456-abc-789-def-0AB
123-ghi-456-def-789-abc-0AB
You can do this with the following regex:
/123-(abc|def|ghi)-456-(?!\1)(abc|def|ghi)-789-(?!\1|\2)(abc|def|ghi)-0AB/
The back references (\1, \2) let you refer to your previous matches, and the zero-width negative lookahead ((?!...)) lets you negate a positional match, saying don't match if the contained pattern matches at this position. Combining the two makes sure that your match is a legitimate permutation of the given elements, with each possibility occurring only once.
So, for example, in Ruby:
input = <<LINES
123-abc-456-abc-789-abc-0AB
123-abc-456-abc-789-def-0AB
123-abc-456-abc-789-ghi-0AB
123-abc-456-def-789-abc-0AB
123-abc-456-def-789-def-0AB
123-abc-456-def-789-ghi-0AB
123-abc-456-ghi-789-abc-0AB
123-abc-456-ghi-789-def-0AB
123-abc-456-ghi-789-ghi-0AB
123-def-456-abc-789-abc-0AB
123-def-456-abc-789-def-0AB
123-def-456-abc-789-ghi-0AB
123-def-456-def-789-abc-0AB
123-def-456-def-789-def-0AB
123-def-456-def-789-ghi-0AB
123-def-456-ghi-789-abc-0AB
123-def-456-ghi-789-def-0AB
123-def-456-ghi-789-ghi-0AB
123-ghi-456-abc-789-abc-0AB
123-ghi-456-abc-789-def-0AB
123-ghi-456-abc-789-ghi-0AB
123-ghi-456-def-789-abc-0AB
123-ghi-456-def-789-def-0AB
123-ghi-456-def-789-ghi-0AB
123-ghi-456-ghi-789-abc-0AB
123-ghi-456-ghi-789-def-0AB
123-ghi-456-ghi-789-ghi-0AB
LINES
# outputs only the permutations
# (String is not Enumerable in Ruby 1.9+, so split the input into lines first)
puts input.lines.grep(/123-(abc|def|ghi)-456-(?!\1)(abc|def|ghi)-789-(?!\1|\2)(abc|def|ghi)-0AB/)
For a permutation of five elements, it would be:
/1-(abc|def|ghi|jkl|mno)-
2-(?!\1)(abc|def|ghi|jkl|mno)-
3-(?!\1|\2)(abc|def|ghi|jkl|mno)-
4-(?!\1|\2|\3)(abc|def|ghi|jkl|mno)-
5-(?!\1|\2|\3|\4)(abc|def|ghi|jkl|mno)-6/x
For your example, the regex would be
/<a href="home.php" (class="link"|title="Home") (?!\1)(class="link"|title="Home")>Home<\/a>/
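For anyone who wants to check that last pattern quickly, here is a small Python sketch of the same regex. The third sample, with class repeated, is made up to show that a duplicate is rejected.

```python
import re

# Same idea as the regex above: the second group must differ from the first.
PERMUTATION = re.compile(
    r'<a href="home\.php" (class="link"|title="Home") '
    r'(?!\1)(class="link"|title="Home")>Home</a>'
)

samples = [
    '<a href="home.php" class="link" title="Home">Home</a>',
    '<a href="home.php" title="Home" class="link">Home</a>',
    '<a href="home.php" class="link" class="link">Home</a>',  # hypothetical
]
results = [PERMUTATION.fullmatch(s) is not None for s in samples]
print(results)
```

Note that this variant is strict about exact spacing and quoting, so it suits generated or normalized input better than hand-written markup.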