How to represent a negative number in HTML

Simple question, what is the proper character to use for representing a negative number? Should I use a normal dash, a minus entity or is there a more appropriate entity to use?
To be more clear, in an HTML document if I want to display a negative temperature, should I use:
-5 °C
or
−5 °C
or something else?

There are two separate questions here: 1) what do you use as the character that acts as the sign of a negative number (or, more generally, how you write a negative number with characters), and 2) how do you represent that character in HTML. Only the latter is HTML-specific and can thus be considered on-topic at SO.
However, question 1 is also somewhat programming-related. In most computer languages (such as JavaScript, HTML, and CSS), a negative number is almost always written using the common ASCII hyphen, officially HYPHEN-MINUS, “-”, U+002D. For this reason, people often use the ASCII hyphen in general texts, too, and even in mathematical texts. This typically violates the rules of human languages: in most languages, the MINUS SIGN “−” U+2212 is preferred. It is also typographically much better, especially in quality fonts, where the sign in “−42” is clearly visible while the ASCII hyphen in “-42” is much less noticeable. The MINUS SIGN also has better line-breaking properties: web browsers do not treat it as allowing a line break between it and the following digit, as they may well do with the ASCII hyphen.
Having chosen to use the minus sign, the simplest approach is to use the character “−” itself. For this, you need some method of inputting it. You also need to take care of character encoding issues, normally using UTF-8, but this is something that should be done anyway.
You can also use the named character reference &minus;. It stands for the minus sign, and it can be convenient when you need the character only occasionally and lack a quick way of typing it.
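When the numbers come from a script rather than being typed by hand, a small helper can emit the minus sign for you. Below is a minimal sketch in JavaScript; `formatSigned` and its `asEntity` option are hypothetical names, not part of any library. It relies only on the fact that `String(n)` writes negative numbers with an ASCII hyphen, which the helper swaps for U+2212 or for the `&minus;` reference.

```javascript
// Hypothetical helper: format a number with a real MINUS SIGN (U+2212)
// instead of the HYPHEN-MINUS (U+002D) that String(n) produces.
const MINUS_SIGN = "\u2212";

function formatSigned(n, { asEntity = false } = {}) {
  const text = String(n); // e.g. "-5" for n = -5
  if (!text.startsWith("-")) return text; // zero and positive numbers unchanged
  const sign = asEntity ? "&minus;" : MINUS_SIGN;
  return sign + text.slice(1);
}

// Usage: a temperature, as in the question.
console.log(formatSigned(-5) + " \u00B0C"); // −5 °C
console.log(formatSigned(-5, { asEntity: true }) + " &deg;C"); // &minus;5 &deg;C
```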

Here you can find the representations of different operators in HTML.

The other answers give you the correct information on the html, in that there are character codes that mean "minus", and the downside of using the hyphen is that the browser could word-wrap the number and put the digits on the line after the hyphen.
However, you asked "Should I use a normal dash"? That really depends on the context. In particular, if you want the user to be able to copy your text and paste it into another program, and have that program interpret your negative numbers as negative numbers, you are going to have to use a hyphen.
For example, copy the following two lines and paste them into an Excel spreadsheet:
−45
-45
You will notice that the first number is treated as text, and the second is treated as a number, even though according to the html, the opposite should happen.
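The same thing happens in JavaScript: `Number()` only accepts the ASCII hyphen as a sign, so text copied from a page that uses U+2212 has to be normalized before parsing. A minimal sketch, with `parseDisplayedNumber` as a hypothetical helper name:

```javascript
// Number() and parseFloat() recognise only HYPHEN-MINUS (U+002D) as a sign.
console.log(Number("-45"));      // -45
console.log(Number("\u221245")); // NaN (U+2212 MINUS SIGN is not accepted)

// Hypothetical helper: map a leading MINUS SIGN back to HYPHEN-MINUS
// before parsing text that was copied from a web page.
function parseDisplayedNumber(text) {
  return Number(text.trim().replace(/^\u2212/, "-"));
}

console.log(parseDisplayedNumber("\u221245")); // -45
```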
To use a hyphen with negative numbers and still prevent wrapping, use the CSS declaration white-space: nowrap.
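A sketch of that approach, assuming the numbers are inserted into the page as HTML strings from JavaScript; `noWrapNumber` is a hypothetical helper, and the inline `style` attribute is used only to keep the example self-contained (a CSS class with the same declaration works equally well).

```javascript
// Hypothetical helper: wrap a number in a span whose white-space: nowrap
// keeps the ASCII hyphen on the same line as the digits, while the text
// still pastes into other programs as an ordinary negative number.
function noWrapNumber(n) {
  return `<span style="white-space: nowrap">${String(n)}</span>`;
}

console.log(noWrapNumber(-45)); // <span style="white-space: nowrap">-45</span>
```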

Related

Hyphenating arbitrary text automatically

What kinds of challenges are there facing automatic hyphenation? It seems that you could just draw word by word, breaking when the length of the line exceeds the length of the viewport (or whatever we're wrapping our text in), placing hyphens after as many characters as can fit (provided at least two characters fit and the word is at least four characters), skipping words that already contain a hyphen (there's no requirement that words have to be hyphenated).
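The rule proposed in the question can be sketched roughly as follows, assuming `room` is the number of character cells left on the current line (both names are illustrative). Because it only counts characters, it will produce breaks such as "wr-ong", which is the problem the answer below describes.

```javascript
// Naive, length-based hyphenation as described in the question:
// break a word wherever the line runs out, with no linguistic knowledge.
// `room` is the number of character cells still free on the current line.
// Returns [head, tail], or null if the word should not be split.
function naiveHyphenate(word, room) {
  if (word.length <= room) return null; // fits: no break needed
  if (word.length < 4) return null;     // too short to hyphenate
  if (word.includes("-")) return null;  // already hyphenated: skip
  const fits = room - 1;                // reserve one cell for the hyphen
  if (fits < 2) return null;            // at least two characters must fit
  return [word.slice(0, fits) + "-", word.slice(fits)];
}

console.log(naiveHyphenate("wrong", 3)); // [ 'wr-', 'ong' ]
```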
But I note how Firefox and IE need a dictionary to be able to hyphenate with CSS's hyphens property. This seems to imply that there are further issues regarding where we can place hyphens.
What kinds of issues are these? Do any exist in the English language or do they only exist in other languages?
You have these issues in all languages. You can only place a hyphen where meaningful tokens result from the split, as has already been pointed out. You don't want to, for example, split a word like "wr-ong".
The resulting token may or may not be a syllable, although in most languages (including English) it is. But the main point is that you cannot pin it down easily with a few simple rules. You would need to consider a lot of phonology to get a highly accurate result, and these rules vary from language to language.
With this background, I can see why one would take a dictionary instead, and frankly, being a computational linguist myself, this is also what I would probably opt for.
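For contrast, a dictionary-based approach only breaks at positions listed for each word. The sketch below uses a tiny hypothetical in-memory dictionary with break points marked by "-"; the entries are illustrative, and a real hyphenation dictionary would be far larger and language-specific.

```javascript
// Hypothetical hyphenation dictionary: allowed break points marked with "-".
// The entries are illustrative, not taken from any real dictionary file.
const HYPHENATION_DICT = new Map([
  ["hyphenation", "hy-phen-a-tion"],
  ["dictionary", "dic-tion-ar-y"],
]);

// Break `word` at the rightmost dictionary-approved point that still fits
// in `room` cells (including the hyphen). Returns [head, tail] or null.
function dictHyphenate(word, room) {
  const pattern = HYPHENATION_DICT.get(word.toLowerCase());
  if (!pattern || word.length <= room) return null;
  let best = null;
  let offset = 0; // position in `word` after each segment
  for (const part of pattern.split("-").slice(0, -1)) {
    offset += part.length;
    if (offset + 1 <= room) best = offset; // +1 for the hyphen itself
  }
  return best === null ? null : [word.slice(0, best) + "-", word.slice(best)];
}

console.log(dictHyphenate("hyphenation", 7)); // [ 'hyphen-', 'ation' ]
```

Words missing from the dictionary are simply never hyphenated, which is the trade-off that makes this approach predictable.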
If you DO want to go for an automatic solution, I would recommend doing some research into the phonology of English syllables, or so-called syllabification. You might want to start with this Wikipedia article:
Wikipedia - Syllabification

Do I need to use HTML tags for en as well as em dashes?

I'm editing the HTML for an ebook. I'm using &mdash; (—) where there is an em dash, but do I need to do the same for a regular hyphen, as in "micro-dot" or "over-sensitive"? Or can I just leave the "-" as-is in the text?
You do not need to use the entity &mdash;, since you can enter “—” as such if the e-book is UTF-8 encoded, as it should be. Neither do you need to change &mdash; to the em dash itself if you already have it in the data.
There is no need to escape the common hyphen, i.e. the ASCII hyphen (officially called HYPHEN-MINUS in Unicode), in any way in HTML.
Note that, at least according to Merriam-Webster, the words “microdot” and “oversensitive” are written without a hyphen. If you would like to spell them that way but specify an allowed hyphenation point (for automatic hyphenation by a browser), you would use the SOFT HYPHEN character (U+00AD). It, too, can be written as such in HTML, but since it is normally invisible, you might find it more convenient to use the named character reference &shy; for it, e.g.
micro&shy;dot
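If the e-book markup is generated by a script, the soft hyphen can be inserted as the character U+00AD itself. A minimal JavaScript sketch; `addSoftHyphens` is a hypothetical helper, and the break positions must be supplied by you, since it has no knowledge of where a word may be divided.

```javascript
const SOFT_HYPHEN = "\u00AD"; // invisible unless the browser breaks the line there

// Hypothetical helper: insert soft hyphens at the given character offsets.
function addSoftHyphens(word, positions) {
  let result = "";
  let start = 0;
  for (const pos of [...positions].sort((a, b) => a - b)) {
    result += word.slice(start, pos) + SOFT_HYPHEN;
    start = pos;
  }
  return result + word.slice(start);
}

const microdot = addSoftHyphens("microdot", [5]);
console.log(microdot.length);               // 9: eight letters plus U+00AD
console.log(microdot === "micro\u00ADdot"); // true
```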

Using Fractions On Websites

Is it possible to use any fraction symbol on a website, represented as ¼ rather than 1/4 for example?
From what I've gathered, these are the only ones I can use:
½
⅓ ⅔
¼ ¾
Is this right, and why is that? I ask because I've done a Google web search and can't seem to locate any others, e.g. 2/4.
You could try http://www.mathjax.org/, a JavaScript library for rendering mathematical formulas, if this is what you want.
The image below displays all Unicode-defined fraction symbols. Each of them is treated as a single character. You can use all of them freely, of course, but if you want more, e.g. 123/321, then you should look for a library that can create fractions dynamically.
An option for doing so would be using LaTeX. There is another question (with very good answers) on how to do this.
Image from http://symbolcodes.tlt.psu.edu/bylanguage/mathchart.html#fractions
As I understand it, HTML5 includes MathML, which can represent any fraction you want.
While searching the Unicode table I also found these: ⅑ ⅒ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞.
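Because each of these is a single precomposed character, a lookup table is enough to use them from a script. A sketch, assuming you want to fall back to the linear form with U+2044 FRACTION SLASH when no precomposed character exists; `fractionChar` is a hypothetical name, and whether the fallback renders nicely depends on the font.

```javascript
// Precomposed "vulgar fraction" characters defined by Unicode
// (U+00BC to U+00BE and U+2150 to U+215E), keyed by "numerator/denominator".
const VULGAR_FRACTIONS = new Map([
  ["1/2", "½"], ["1/3", "⅓"], ["2/3", "⅔"], ["1/4", "¼"], ["3/4", "¾"],
  ["1/5", "⅕"], ["2/5", "⅖"], ["3/5", "⅗"], ["4/5", "⅘"],
  ["1/6", "⅙"], ["5/6", "⅚"], ["1/7", "⅐"],
  ["1/8", "⅛"], ["3/8", "⅜"], ["5/8", "⅝"], ["7/8", "⅞"],
  ["1/9", "⅑"], ["1/10", "⅒"],
]);

const FRACTION_SLASH = "\u2044";

// Return the single-character fraction if one exists,
// otherwise numerator + FRACTION SLASH + denominator (e.g. for 2/4).
function fractionChar(numerator, denominator) {
  return VULGAR_FRACTIONS.get(`${numerator}/${denominator}`)
    ?? `${numerator}${FRACTION_SLASH}${denominator}`;
}

console.log(fractionChar(1, 4)); // ¼
console.log(fractionChar(2, 4)); // 2⁄4 (no precomposed character exists)
```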
A web page is built up with text, and that text is encoded in a certain character set. The character set you select decides on which characters can be displayed. This also means that characters or symbols that don't exist in the character set cannot be displayed.
As shown in Michael's answer, Unicode defines symbols for a number of fractions. These can be displayed without resorting to tricks such as small bitmaps of the desired fraction generated on the server or client side, or, as indicated by mohammad mohsenipur, a JavaScript library that transforms TeX or MathML.
There are several possibilities:
Use special character for fractions. Not possible for 2/4 for example, and problematic in font support for all but the three most common (vulgar) fractions you had found.
Use markup like <sup>2</sup>/<sub>4</sub>. This probably messes up your line spacing and does not look particularly good.
Construct a fraction using some CSS for positioning and size control and using fraction slash character instead of the common slash. Rather awkward really, I would say.
Use OpenType <code>"frac"</code> feature. Rather limited support in browsers and especially in fonts.
MathJax, e.g. \(\frac{2}{4}\) or some more elaborated TeX code to produce a different style for fraction.
MathML. Verbose, and browser support to MathML inside HTML could be better.
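Two of the markup options above can be generated from a script. A sketch, assuming integer numerators and denominators (so nothing needs escaping); both function names are hypothetical, and the superscript/subscript variant here uses `&frasl;` (U+2044 FRACTION SLASH) instead of the common slash, combining the markup and fraction-slash ideas above.

```javascript
// Superscript numerator, FRACTION SLASH (&frasl;), subscript denominator.
// Assumes integer inputs, so no HTML escaping is needed.
function fractionSupSub(numerator, denominator) {
  return `<sup>${numerator}</sup>&frasl;<sub>${denominator}</sub>`;
}

// MathML: an <mfrac> element whose first child is the numerator.
function fractionMathML(numerator, denominator) {
  return `<math><mfrac><mn>${numerator}</mn><mn>${denominator}</mn></mfrac></math>`;
}

console.log(fractionSupSub(2, 4)); // <sup>2</sup>&frasl;<sub>4</sub>
console.log(fractionMathML(2, 4)); // <math><mfrac><mn>2</mn><mn>4</mn></mfrac></math>
```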
These are explained more and illustrated in my page “Math in HTML (and CSS)”, section Fractions.
The choice thus depends on many factors. It also depends on the font quite a lot. I suggest that you test the different options using the font family declaration you intend to use. Despite the many alternatives, you might end up with using just the simple linear notation like 2/4.

Right angle bracket in HTML

For obvious reasons, the left angle bracket cannot be used literally in HTML text. The status of the right angle bracket is not quite so clear. It does work when I try it, but of course browsers are notoriously forgiving of constructs that are not strictly valid HTML.
http://www.w3.org/TR/html4/charset.html#h-5.4 seems to say it's valid, though it may not be supported by older browsers, but it also specifically mentions quoted attribute values. The question “Is it necessary to html encode right angle brackets?” also says it's valid, but again specifically talks about quoted attribute values.
What's the answer for plain chunks of text (the contents of a <pre> element happen to be the case I'm looking at), and does it differ in any way?
The character “>” can be used as such as data character in any version of HTML, both in element content and in an attribute value. This follows from the lack of any statement to the contrary in the specifications.
It is often routinely escaped as &gt;, which is valid but not required for any formal or technical reason. It is used partly because people assume it is needed in the same way as the “<” character needs to be escaped, and partly for symmetry: writing, say, &lt;code&gt; may look more symmetric than &lt;code>.
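In code, this means an escaping function for element content has to handle “<” (and, to be safe, “&”), while escaping “>” is optional. A minimal sketch with a hypothetical `escapeText` function; it is meant for element content only, since attribute values also need their delimiting quote character escaped.

```javascript
// Escape text for use as HTML element content (not attribute values).
// "&" is escaped first so the entities produced below are not double-escaped.
function escapeText(text, { escapeGt = false } = {}) {
  let out = text.replace(/&/g, "&amp;").replace(/</g, "&lt;");
  if (escapeGt) out = out.replace(/>/g, "&gt;"); // optional, for symmetry only
  return out;
}

console.log(escapeText("if (a > b && b < c)"));
// if (a > b &amp;&amp; b &lt; c)
console.log(escapeText("<code>", { escapeGt: true }));
// &lt;code&gt;
```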
The character “>” is the GREATER THAN character. It is used in many contexts, like HTML markup, as a delimiter of a kind, in a bracket-like manner, but the real angle brackets, as used in some mathematical notations, are rather different, such as “⟩” U+27E9. If you need to include angle brackets in an HTML document, you have some serious issues to consider, but they relate to fonts (and semantics), not to any potential clash with markup-significant characters.
Right angle brackets are legal within a <pre> tag or as text within an element.
There is no ambiguity when using them in this manner and parsers have no issue with "understanding" them.
Personally, I just escape these whenever I need to use them, to match the escaped left angle brackets...

HTML Escaping - Reg expressions?

I'd like to HTML-escape a phrase automatically, applying some logic; the phrase is currently a statement with words highlighted with quotation marks. Within the same statement, quotation marks could also be used as inch marks to describe a distance.
The phrase could be:
Paul said "It missed us by about a foot". In fact it was only about 9".
Escaped correctly, it should really be:
<pre>Paul said &ldquo;It missed us by about a foot&rdquo;.
In fact it was only about 9&Prime;.</pre>
Which gives
<pre>Paul said “It missed us by about a foot”.
In fact it was only about 9″.</pre>
I can't think of a sample phrase that would need a &quot; escape as well, but one could be there!
I'm looking for some help on how to identify which of the escape values to replace " characters with at runtime. The phrase was just an example and it could be anything but should be correctly formed i.e. an opening and closing quote would be present if we are to correctly escape the text.
Would I use a regular expression to find a quoted phrase in the text, i.e. two " characters before a full stop, and then replace the first with
&ldquo;
and the second with
&rdquo;
If I found a single " I would replace it with
&quot;
unless it came after a number, in which case I would replace it with
&Prime;
How would I deal with multiple quotes within a sentence?
"It just missed" Paul said "by a foot".
This would really stump me.....
<pre>"It just missed" Paul said "by 9" almost".</pre>
When escaped correctly, the above should read as follows (I'm showing the actual characters this time):
“It just missed” Paul said “by 9″ almost”.
Obviously an edge case, but I wondered if it's possible to escape this at runtime without an understanding of the content? If not, help with the more obvious phrases would be appreciated.
I would do this in two passes:
The first pass searches for any "s which are immediately preceded by numbers and does that replacement:
s/([0-9])"/\1″/g
Depending on the text you're dealing with, you may want/need to extend this regex to also recognize numbers that are spelled out as words; I've only checked for digits for the sake of simplicity.
With all of those taken care of, a second pass can then easily convert pairs of "s as you've described:
s/"([^"]*)"/“\1”/g
Note the use of [^"]* rather than .* - we want to find two sets of double-quotes with any number of non-double-quote characters between them. By adding that restriction, there won't be any problems handling strings with multiple quoted sections. (This could also be accomplished using the non-greedy .*?, but a negated character class more clearly states your intent and, in most regex implementations, is more efficient.)
A stray, mismatched " somewhere in the string, or an inch marker which is missed by the first pass, can still cause problems, of course, but there's no way to avoid that possibility without implementing understanding of the content.
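The two passes translate directly to JavaScript, where the `\1` back-references become `$1` in the replacement string. A sketch with a hypothetical function name; like the substitutions above, it only recognizes digits before an inch mark, and the second example shows the kind of failure just described.

```javascript
// Pass 1: a double quote right after a digit becomes DOUBLE PRIME (U+2033).
// Pass 2: remaining pairs of double quotes become curly quotes (U+201C/U+201D).
function smartenTwoPass(text) {
  return text
    .replace(/([0-9])"/g, "$1\u2033")
    .replace(/"([^"]*)"/g, "\u201C$1\u201D");
}

console.log(smartenTwoPass('Paul said "It missed us by about a foot". In fact it was only about 9".'));
// Paul said “It missed us by about a foot”. In fact it was only about 9″.
console.log(smartenTwoPass('She said "the answer is 42".'));
// She said "the answer is 42″.   (pass 1 misreads the closing quote)
```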
What you've described is basically a hidden Markov model:
http://en.wikipedia.org/wiki/Hidden_Markov_model
You have a set of input symbols (your original text and ambiguous punctuation) and a set of output symbols (original text and more fine-grained punctuation), but no good way of observing the connection between the two programmatically. You could write some rules to cover some of the edge cases, but that will basically never work for the multiple-quotes situation. For the same reason you can't really use a regex here, but with an HMM and a bunch of training text you could probably make some pretty good guesses.
Sorry, that's probably not very helpful if you're trying to get something ready for deployment, but the input is more ambiguous than the output, so your only option is to consider the context, and that basically means either a very lengthy set of rules or some kind of machine-learning approach.
Interesting question, though. It would be neat to see what kind of performance you could get. Maybe someone has already written a paper on it?
I wondered if it's possible to escape this at runtime without an understanding of the content?
Considering that you're adding semantic meaning to the punctuation which is currently encoded in the other text... no, not really.
Regular expressions would be the easiest tool for at least part of it. I'd suggest looking for /\d+"/ for the inch number cases. But for quotes delimiters, after you'd looked for any other special cases or phrases, it may be easier to use an algorithm for matching pairs, like with parentheses and brackets: tokenize and count. Then test on real-world input and refine.
But I really have to ask: why?
I am not sure if it is possible at all to do that without understanding the meaning of the sentence. I tend to doubt it.
My first attempt would be the following.
go from left to right through the string
alternate replacing double primes with left and right double quotes, but replace with double primes if there is a number to the left
if the quotation marks are unbalanced at the end of the string go back until you find a number with double primes and change the double primes into left or right double quotes depending on the preceding double quotes.
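The three steps can be sketched in JavaScript as below. This is one possible reading of the last step, assumed for illustration: when the quotes are unbalanced, the rightmost double prime is reinterpreted as a quotation mark and the string is rescanned, so the alternation decides whether it becomes a left or right quote. Function names are hypothetical, and nested quotation marks are not handled.

```javascript
const LEFT_QUOTE = "\u201C";
const RIGHT_QUOTE = "\u201D";
const DOUBLE_PRIME = "\u2033";

// One left-to-right pass: quotes after a digit become double primes (unless
// listed in `forced`); all other quotes alternate between left and right.
function scanQuotes(text, forced) {
  let out = "";
  let open = false;
  const primes = []; // indices of quotes that were turned into double primes
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch !== '"') {
      out += ch;
    } else if (!forced.has(i) && i > 0 && /[0-9]/.test(text[i - 1])) {
      primes.push(i);
      out += DOUBLE_PRIME;
    } else {
      out += open ? RIGHT_QUOTE : LEFT_QUOTE;
      open = !open;
    }
  }
  return { out, open, primes };
}

function smartenAlternating(text) {
  const forced = new Set();
  let result = scanQuotes(text, forced);
  // Unbalanced: treat the rightmost double prime as a quote and rescan.
  while (result.open && result.primes.length > 0) {
    forced.add(result.primes[result.primes.length - 1]);
    result = scanQuotes(text, forced);
  }
  return result.out;
}

console.log(smartenAlternating('"It just missed" Paul said "by 9" almost"'));
// “It just missed” Paul said “by 9″ almost”
console.log(smartenAlternating('She said "the answer is 42".'));
// She said “the answer is 42”.
```

Each retry adds one more index to `forced`, so the loop always terminates.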
I am quite sure that this strategy can easily be defeated. But this is still the easy case; the hard work starts when you have to deal with nested quotation marks.
I know this is off the wall, but have you considered Mechanical Turk? This is the sort of problem humans excel at, and computers, currently, are terrible at. Choosing the correct punctuation requires understanding of the meaning of the sentence, so a regex is bound to fail for edge cases.
You could try something like this. First replace the quotations with this regular expression:
"((?:[^"\d]|\d"?)*)"
And then the inch sign:
(\d+)"
Here’s an example in JavaScript:
'"It just missed" Paul said "by 9" almost"'.replace(/"((?:[^"\d]|\d"?)*)"/g, "“$1”").replace(/(\d+)"/g, "$1″");