What is the proper HTML entity for the "x" in a dimension? - html

Is the proper HTML entity for giving dimensions &times;? I want to be semantically correct, but that raises the question: is listing a dimension as 2" x 3" even semantic? If the x represents "by", would I use the letter x or &times;?
In my code I've been using 2&Prime;&nbsp;&times;&nbsp;3&Prime;, or 2&#8243;&nbsp;&#215;&nbsp;3&#8243;. The non-breaking spaces are to prevent the dimension from being wrapped, as per the suggestions found in The Elements of Typographic Style Applied to the Web.

×
Unicode: U+00D7 MULTIPLICATION SIGN
HTML: &times;, &#215;
CSS: \00d7
See the Wikipedia article about the multiplication sign:
In mathematics, the symbol × (read as times or multiplied by) is primarily used to denote the […]
Geometric dimension of an object, such as noting that a room is 10×12 feet in area.
Depending on the context, the math element (for MathML) could be of use.

The proper question is which character should be used. The use of entity references for characters adds no semantics. There is no formal standard on denoting dimensions, but clearly this is about multiplication rather than the Latin letter x, so “×” (&times;) is the correct character.
In practice, this is more of an orthography and typography question than about “semantic web”. Search engines, browsers, etc., don’t really care; it’s the human readers that matter.
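As a rough illustration of the markup discussed above, here is a small JavaScript sketch that builds a dimension string with the multiplication sign and no-break spaces. The function names `formatDimension` and `formatDimensionHtml` are made up for this example, and using U+2033 DOUBLE PRIME for inches is an assumption taken from the question, not a standard.

```javascript
// Hypothetical helpers, not part of any library.
const NBSP = "\u00A0";          // NO-BREAK SPACE, keeps the dimension on one line
const TIMES = "\u00D7";         // MULTIPLICATION SIGN
const DOUBLE_PRIME = "\u2033";  // DOUBLE PRIME, used here as the inch mark

// Returns the dimension as plain text, e.g. for textContent.
function formatDimension(width, height, unit = DOUBLE_PRIME) {
  return `${width}${unit}${NBSP}${TIMES}${NBSP}${height}${unit}`;
}

// Returns the same dimension written with HTML character references.
function formatDimensionHtml(width, height) {
  return `${width}&Prime;&nbsp;&times;&nbsp;${height}&Prime;`;
}

console.log(formatDimension(2, 3));
console.log(formatDimensionHtml(2, 3));
```

Both forms produce the same characters once a browser parses the markup; the entity version is only easier to read in the source.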

You're doing everything correctly. I believe × here is [semantically] related to the operation of multiplication, i.e. in fact you write the area by specifying two dimensions.

Related

What character should I use to maintain height of an empty (zero width) string?

I have a string that can potentially be empty, and in that case, I want to substitute it with a special character to maintain the ordinary text height while having zero width. In TeX, this would be called \strut. What is the counterpart for that in HTML? I came up with two candidates: ⁠ and . Should I use one of these?
On modern browsers, any zero-width character will do the job, provided that the browser either knows that the character is zero-width or uses a font that contains an empty glyph for it. But some characters may have effects, depending on the context and on software used to process the HTML file.
U+2060 WORD JOINER has the effect of preventing line break.
U+FEFF ZERO WIDTH NO-BREAK SPACE has the same effect. It is formally deprecated for any use except as Byte Order Mark, but in reality it works more often than WORD JOINER (though there are exceptions).
U+200B ZERO WIDTH SPACE has the effect of allowing a line break even when it would otherwise not be permitted; it’s like SPACE, but with zero width.
Usually the worst-case scenario for characters like this is an old version of IE. Checking in IE 6 shows that U+FEFF and U+200B are OK, but U+2060 shows as a small rectangle (i.e., the browser tries to render the character but finds no glyph for it).
So I’d use &#xfeff; or &#x200b; depending on whether I’d like to prevent or allow line break at that point. If it does not matter, &#x200b; is more logical to use.
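A minimal JavaScript sketch of that choice, assuming the substitution happens in script before the string is put on the page. The helper name `withStrut` and its `allowBreak` parameter are hypothetical.

```javascript
const ZERO_WIDTH_SPACE = "\u200B";          // allows a line break at this point
const ZERO_WIDTH_NO_BREAK_SPACE = "\uFEFF"; // prevents a line break at this point

// Hypothetical helper: returns the text unchanged, or a zero-width
// placeholder when the text is empty, so the line keeps its height.
function withStrut(text, allowBreak = true) {
  if (text.length > 0) {
    return text;
  }
  return allowBreak ? ZERO_WIDTH_SPACE : ZERO_WIDTH_NO_BREAK_SPACE;
}

console.log(JSON.stringify(withStrut("")));
console.log(JSON.stringify(withStrut("", false)));
```

Whether the placeholder really renders with zero width still depends on the browser and font, as described above.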
I would suggest  or if zero width is not essential or if it is essential you could try the Unicode character ⁠ which is a zero width non-breaking space.

Prevent Breaking of Negative Numbers

My HTML page contains tables with many negative numbers, like &#8211;0.25. 8211 is the en dash. Because my document is supposed to become epub2 eventually, JavaScript is not allowed, only XHTML+CSS.
Unfortunately, both ebook readers and the print function in Chrome think that it is a reasonable idea to line-break a negative number between the en-dash and the zero, even when there is a space before and/or after, e.g., in a table.
Do I need a "non-breaking" en dash? There are non-breaking spaces, after all. Or is there a way to instruct CSS never to break such negative numbers anywhere throughout the entire document? (I doubt this one, but just had to ask.)
Of course, I can wrap each negative number in a span to prevent breaking, but this is quite painful. Literally, by the time I am all done, my number --0.25 would have to become <span class="nobreak">&#8211;0.25</span>. (Joke: it's almost like a DOS 10x amplification attack, with 4 chars becoming 40 characters, all because I want to have negative numbers.)
advice appreciated.
/iaw
You can prevent negative numbers from breaking by using the proper MINUS SIGN “−” (U+2212). In text rendering, browsers, ebook readers, and other software often treat EN DASH as well as HYPHEN-MINUS (the common ASCII hyphen) as allowing a line break after it, even when immediately followed by a digit. No such behavior has been observed for MINUS SIGN.
In HTML, you can write MINUS SIGN as &minus; if you have difficulties in typing the character or if you wish to make it clear to anyone reading the HTML source that MINUS SIGN is used.
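Since the final epub cannot contain JavaScript, a conversion like this would have to run as a preprocessing step over the source text. The following is only a sketch under that assumption: `useMinusSign` is a hypothetical helper, and its pattern is a heuristic that may misfire on things like command-line flags.

```javascript
const MINUS_SIGN = "\u2212";

// Replaces an EN DASH (U+2013) or HYPHEN-MINUS (U+002D) with MINUS SIGN
// when it directly precedes a digit and does not follow a letter or digit,
// so ranges such as 3–5 and words such as x-1 are left alone.
function useMinusSign(text) {
  return text.replace(/(^|[^\p{L}\p{N}])[\u2013-](?=\d)/gu, `$1${MINUS_SIGN}`);
}

console.log(useMinusSign("\u20130.25 and -3, but 3\u20135 stays a range"));
```

Run over plain text only; applying it to raw markup could touch attribute values that happen to contain a hyphen before a digit.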

How does Zalgo text work?

I've seen weirdly formatted text called Zalgo like below written on various forums. It's kind of annoying to look at, but it really bothers me because it undermines my notion of what a character is supposed to be. My understanding is that a character is supposed to move horizontally across a line and stay within a certain "container". Obviously the Zalgo text is moving vertically and doesn't seem to be restricted to any space.
Is this a bug/flaw/exploit/hack in Unicode? Are these individual characters with weird properties? "What" is happening here?
H̡̫̤̤̣͉̤ͭ̓̓̇͗̎̀ơ̯̗̱̘̮͒̄̀̈ͤ̀͡w͓̲͙͖̥͉̹͋ͬ̊ͦ̂̀̚ ͎͉͖̌ͯͅͅd̳̘̿̃̔̏ͣ͂̉̕ŏ̖̙͋ͤ̊͗̓͟͜e͈͕̯̮̙̣͓͌ͭ̍̐̃͒s͙͔̺͇̗̱̿̊̇͞ ̸̤͓̞̱̫ͩͩ͑̋̀ͮͥͦ̊Z̆̊͊҉҉̠̱̦̩͕ą̟̹͈̺̹̋̅ͯĺ̡̘̹̻̩̩͋͘g̪͚͗ͬ͒o̢̖͇̬͍͇͓̔͋͊̓ ̢͈͙͂ͣ̏̿͐͂ͯ͠t̛͓̖̻̲ͤ̈ͣ͝e͋̄ͬ̽͜҉͚̭͇ͅx͎̬̠͇̌ͤ̓̂̓͐͐́͋͡ț̗̹̝̄̌̀ͧͩ̕͢ ̮̗̩̳̱̾w͎̭̤͍͇̰̄͗ͭ̃͗ͮ̐o̢̯̻̰̼͕̾ͣͬ̽̔̍͟ͅr̢̪͙͍̠̀ͅǩ̵̶̗̮̮ͪ́?̙͉̥̬͙̟̮͕ͤ̌͗ͩ̕͡
The text uses combining characters, also known as combining marks. See section 2.11, Combining Characters, in the Unicode Standard (PDF).
In Unicode, character rendering does not use a simple character cell model where each glyph fits into a box with given height. Combining marks may be rendered above, below, or inside a base character.
So you can easily construct a character sequence, consisting of a base character and “combining above” marks, of any length, to reach any desired visual height, assuming that the rendering software conforms to the Unicode rendering model. Such a sequence has no meaning of course, and even a monkey could produce it (e.g., given a keyboard with suitable driver).
And you can mix “combining above” and “combining below” marks.
The sample text in the question starts with:
LATIN CAPITAL LETTER H - H
COMBINING LATIN SMALL LETTER T - ͭ
COMBINING GREEK KORONIS - ̓
COMBINING COMMA ABOVE - ̓
COMBINING DOT ABOVE - ̇
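To inspect such a sequence yourself, you can split a string into code points. This is a small JavaScript sketch; the function name `codePoints` is made up for the example, and the sample string spells out the five characters listed above as escapes.

```javascript
// Lists each code point of a string as "U+XXXX".
// Combining marks show up as separate entries after their base character.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
}

// "H" followed by COMBINING LATIN SMALL LETTER T, COMBINING GREEK KORONIS,
// COMBINING COMMA ABOVE and COMBINING DOT ABOVE.
const sample = "H\u036D\u0343\u0313\u0307";
console.log(codePoints(sample));
```

The string displays as a single decorated "H", yet it consists of five separate code points.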
Zalgo text works because of combining characters. These are special characters that modify the character that comes before them. For example:
y + ̆ = y̆ which in HTML source is
y + &#x0306; = y&#x0306;
Since you can stack them one atop the other you can produce the following:
y̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆
which actually is:
y̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆
The same goes for putting stuff underneath:
y̰̰̰̰̰̰̰̰̰̰̰̰̰̰̰̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆
that in fact is:
y̰̰̰̰̰̰̰̰̰̰̰̰̰̰̰̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆
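The stacking can be reproduced in a few lines of JavaScript. This is a sketch, not the way any particular Zalgo generator works: the helper `stackMarks` is hypothetical, and the choice of U+0306 COMBINING BREVE and U+0330 COMBINING TILDE BELOW is an assumption based on how the samples above look.

```javascript
const COMBINING_BREVE = "\u0306";       // drawn above the base character
const COMBINING_TILDE_BELOW = "\u0330"; // drawn below the base character

// Hypothetical helper: appends `below` marks under and `above` marks over
// a base character. The result still occupies one column of text.
function stackMarks(base, above, below = 0) {
  return base + COMBINING_TILDE_BELOW.repeat(below) + COMBINING_BREVE.repeat(above);
}

const stacked = stackMarks("y", 3, 2);
console.log(stacked);        // one visible "y" with marks above and below
console.log(stacked.length); // number of UTF-16 code units in the string
```

Raising the counts makes the pile taller, which is all that Zalgo text does.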
In Unicode, the main block of combining diacritics for European languages and the International Phonetic Alphabet is U+0300–U+036F.
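To produce a list of combining diacritical marks you can use the following script in a browser console (since links keep on dying):
for (var i = 768; i <= 879; i++) { // U+0300 through U+036F, inclusive
  var ch = new DOMParser().parseFromString("&#" + i + ";", "text/html").documentElement.textContent;
  console.log(ch + " " + "&#" + i + ";"); // the mark itself, then its numeric reference
}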
Also check em out
Mͣͭͣ̾ Vͣͥͭ͛ͤͮͥͨͥͧ̾

HTML: why isn't the input field SIZE deprecated? we have style="width:xx"!

Simply put, I think that the input 'size' field is now obsolete (like the rest of HTML styling outside of CSS), and most of the sizing attributes have been deprecated, so why not input.size?
Maybe you got confused between size and width/height attributes (btw I got confused at first when you said field). But assuming you didn't, let me explain what size is for.
size attribute
The attribute size for element <input> applies to text inputs, like e-mail, password, etc. It defines the maximum character width for the input. Let's say for example you want the maximum password length to be 4 to screw the users, you give it size=4, so you cannot enter passwords like dinosaur (anything you type after dino will not appear unless you delete the previous letters first)
Edit: as pointed out by Maksym in the comments, the above is defined by maxlength, not size. According to the HTML4 spec, size is the initial width of the control, given in pixels
except when type attribute has the value "text" or "password". In that case, its value refers to the (integer) number of characters.
So size=4 is about 4 (monospaced?) characters wide. (My experiment with Google Chrome makes it size+1 though, i.e. size=4 is 5 characters wide.)
Dimension attributes
Now consider the dimension attributes width and height. For <input> elements, they apply only to image buttons, where they define the dimensions of the button. So why can't you just apply CSS instead?
First, know that CSS is for visual purposes. Image buttons submit coordinates that the user clicked on. This behavior needs to be consistent across browsers, whether they support CSS or not. See this warped image:
There may be a case where the user is asked to click the letter e for the form submit to be processed differently. Probably the server will check whether the x coordinate is 75 <= x <= 90. But if you defined the dimension with CSS, browsers which disabled CSS will see this image instead:
And the previous coordinate range check is no longer valid, seeing that the letter e is further to the right, hardly within 75 and 90 (and you need to click on the left side of the first o to get the same input).
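The server-side check described above could look roughly like this in JavaScript. It is only a sketch: the button name "submit" (which makes the browser send `submit.x` and `submit.y`) and the function name `clickedLetterE` are assumptions for the example.

```javascript
// Assumes the form contains <input type="image" name="submit" ...>,
// so the click position arrives as submit.x and submit.y.
function clickedLetterE(queryString) {
  const params = new URLSearchParams(queryString);
  if (!params.has("submit.x")) {
    return false;
  }
  const x = Number(params.get("submit.x"));
  // The letter "e" is assumed to span x = 75..90 at the intended image width.
  return Number.isInteger(x) && x >= 75 && x <= 90;
}

console.log(clickedLetterE("submit.x=80&submit.y=12"));
console.log(clickedLetterE("submit.x=120&submit.y=12"));
```

The range only makes sense if the image is rendered at the width the server expects, which is the point of setting the dimensions in the markup.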
That is a rhetorical question really... or an attempt at persuasive argument. You are right, though: it could easily be deprecated and CSS take over. As for why, that answer can only come from someone inside W3C who is part of the decision making process. You could also ask why cellpadding, cellspacing, and width properties are not deprecated in tables.
The best answer I can give you to your non-question is that HTML isn't a purist language: while it's getting back to its roots of being just content and not style or behavior, it still has its legacy from the 90's and 00's, which means it still has concerns beyond just content.
There's a valid reason why this attribute isn't deprecated: "Web Usability"
When a user views your site with CSS turned off or unavailable, it's really a bad idea to show a long field when the field only requires a few characters (such as IDs, phone numbers, etc.).
CSS is good, yes, but only if you have also considered people who won't be able to use it.

What explains the term orthogonal in a more non-nerd fashion?

For example:
Cardinality and optionality are
orthogonal properties of a
relationship. You can specify that a
relationship is optional, even if you
have specified upper and/or lower
bounds. This means that there do not
have to be any objects at the
destination, but if there are then the
number of objects must lie within the
bounds specified.
What exactly does "orthogonal" mean? I bet it's just a fancy-sounding nerd-style word for something that could be expressed in a way that is a lot easier for average people to understand ;)
From wikipedia:
In mathematics, two vectors are
orthogonal if they are perpendicular,
i.e., they form a right angle. The
word comes from the Greek ὀρθός
(orthos), meaning "straight", and
γωνία (gonia), meaning "angle".
Anyone?
In the quoted context above you could substitute the word "independent" or "unrelated" for "orthogonal".
Items, concepts, or values are orthogonal when one does not constrain the other, so you can set one without regard for how the other orthogonal items are set.
Loosely speaking, orthogonal means independent.
Specifically, in 2D space, orthogonal lines are lines that meet at 90 degrees to each other.
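For the geometric meaning, two vectors are at a right angle exactly when their dot product is zero. A tiny JavaScript sketch, with made-up function names and exact comparison that is only reliable for integer coordinates:

```javascript
// Dot product of two vectors of equal length.
function dot(a, b) {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Two vectors are orthogonal when their dot product is zero.
// Exact comparison is fine for integers; floats would need a tolerance.
function isOrthogonal(a, b) {
  return dot(a, b) === 0;
}

console.log(isOrthogonal([1, 0], [0, 1]));
console.log(isOrthogonal([1, 1], [1, 0]));
```

Moving along one of two orthogonal directions changes nothing along the other, which is where the "independent" meaning comes from.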