How to replicate the function of the tilde character in LaTeX? - html

In LaTeX the tilde character is used to set a space between two text elements that cannot be separated by a line break. This is useful to keep citations right next to the citing text, for instance. It is also most useful when presenting figures according to the ISU. As an example, the code:
1~400~t/a
Guarantees the following text output, always packed together in the same line:
1 400 t/a
Is there a way to mimic this behaviour in HTML?

A non-breaking space, in HTML, is expressed as &nbsp;
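A minimal sketch of carrying LaTeX-style tildes over to HTML, assuming the input uses ~ only as a non-breaking space. The function name tildesToNbsp is made up for illustration.

```javascript
// Replace every LaTeX-style tilde with the HTML non-breaking space entity.
// Assumption: the text contains no literal tildes that should be kept.
function tildesToNbsp(text) {
  return text.replace(/~/g, "&nbsp;");
}

console.log(tildesToNbsp("1~400~t/a")); // 1&nbsp;400&nbsp;t/a
```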


Term to describe the space left between blocks of code (e.g. in a script)?

What's the correct term for the space between blocks of code? The best I have come up with is 'block delimiter' (as in code block delimiter)
Background
I'm writing some documentation and need to know what the space between blocks of code is called. I can see a very common pattern is to leave a single line gap (in other words, \n\n goes between the last character of the last code block and the first character of the next code block - examples here).
Question
What is the appropriate term for the space between the last character of a code block and the first character of the following code block?
Consider the term "padding line between blocks", inspired by ESLint's rule padding-line-between-statements:
Require or disallow padding lines between statements
(padding-line-between-statements)
This rule requires or disallows blank lines between the given 2 kinds
of statements. Properly blank lines help developers to understand the
code.
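For reference, a sketch of how that rule might be configured, assuming an ESLint version that still ships padding-line-between-statements in core. The variable name eslintConfig and the two chosen statement pairs are illustrative only.

```javascript
// Require a blank ("padding") line before every return statement
// and after every block-like statement (if, for, function body, ...).
const eslintConfig = [
  {
    rules: {
      "padding-line-between-statements": [
        "error",
        { blankLine: "always", prev: "*", next: "return" },
        { blankLine: "always", prev: "block-like", next: "*" },
      ],
    },
  },
];

// In an eslint.config.js file this array would be the exported value.
console.log(JSON.stringify(eslintConfig, null, 2));
```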

How are tabs interpreted in CommonMark?

See the description before Example 6 in the CommonMark spec at: http://spec.commonmark.org/0.27/#example-5
I am trying to understand how the following code leads to a code-block starting with two spaces.
>→→foo
Example 6 shows that this would translate to the following.
<blockquote>
<pre><code>  foo
</code></pre>
</blockquote>
But Section 2.2 clearly states:
However, in contexts where whitespace helps to define block structure, tabs behave as if they were replaced by spaces with a tab stop of 4 characters.
So as per my understanding, the above Markdown behaves like the following (I denote a space with a dot).
>........foo
Since, one optional space is allowed after >, and 4 spaces are used to indent code block, we are left with,
>...foo
That's a code-block starting with three spaces. How does CommonMark claim then that it should lead to a code-block starting with two spaces? What am I missing?
The key is in the very first paragraph of the Tabs section (emphasis added):
Tabs in lines are not expanded to spaces. However, in contexts where whitespace helps to define block structure, tabs behave as if they were replaced by spaces with a tab stop of 4 characters.
Notice that it says "4 characters", not 4 spaces.
If you configure your text editor to use a tab stop of length four and to replace tabs with spaces (any good text editor should offer this setting), the text editor will use columns that are four characters wide. When you press the tab key, it will forward the cursor to the next column, which will only ever be four characters wide. If the column already contains any characters, then only as many spaces are added as are needed to total four characters, which in this case would be fewer than four spaces.
For example, if you type an angle bracket (>) character in your editor and then press tab, you will get the following (when configured to replace tabs with spaces):
>···
Therefore the angle bracket plus the tab moves forward to the end of the column (four characters) for a total of three spaces. As we are now at the beginning of the next column, pressing tab a second time would move us to the next column (4 more spaces) for a total of 7 spaces:
>·······
We can confirm this is the correct interpretation with a more recent change to the spec committed in 3bc01c5dc (which apparently hasn't made it into a release yet). As the commit comment suggests, the clarification helps the math make more sense (emphasis added):
Normally the > that begins a block quote may be followed
optionally by a space, which is not considered part of the
content. In the following case > is followed by a tab,
which is treated as if it were expanded into three spaces.
Since one of these spaces is considered part of the
delimiter, foo is considered to be indented six spaces
inside the block quote context, so we get an indented
code block starting with two spaces.
Notice the added sentence (in bold) which confirms that the first tab only adds "three spaces".
Therefore, as we have now established, we start with an angle bracket plus seven spaces. So first we break off the blockquote delimiter, which consists of the angle bracket and the first space (in the following examples the | is used to indicate where the parser breaks the string and should not be counted as a character):
>·|······
The text contained in the blockquote is now indented six spaces. Four of them are the code block delimiter:
>·|····|··
Which leaves two spaces at the start of the code block.
Of course, as stated back at the beginning (of the section in the spec), the tabs aren't actually replaced with spaces, it just behaves as if they were. And that can be confusing at times. It may help to configure your text editor to always replace tabs with spaces and then you can avoid this confusion.
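The arithmetic above can be checked with a short sketch. It actually replaces the tabs, which a real CommonMark parser does not do, so treat it as a model of the column math only. The function name expandTabs is made up for illustration, and the column count assumes plain ASCII input.

```javascript
// Expand tabs as if they were replaced by spaces with a tab stop of `tabStop` columns.
function expandTabs(line, tabStop = 4) {
  let out = "";
  for (const ch of line) {
    if (ch === "\t") {
      // Pad up to the next multiple of tabStop (between 1 and tabStop spaces).
      const pad = tabStop - (out.length % tabStop);
      out += " ".repeat(pad);
    } else {
      out += ch;
    }
  }
  return out;
}

const expanded = expandTabs(">\t\tfoo");  // ">" followed by 3 + 4 = 7 spaces, then "foo"
const insideQuote = expanded.slice(2);    // drop ">" and the one optional space
const codeLine = insideQuote.slice(4);    // drop the 4-space code block indent
console.log(JSON.stringify(codeLine));    // "  foo"
```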

Adding Whitespace in Middle of Sentence

In HTML5, how do you skip 5 spaces in a <div>? For example, how do you do this:
"Drumroll... [5 spaces] Something!"
<div>Drumroll... [5 spaces] Something!</div> just outputs "Drumroll... Something!"
There do not seem to be any tags such as <indent> that I have found to indent in the middle of a sentence.
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; works, but is there a more efficient way? Such as...
<skip 10px></skip>
Specifically, I am looking for the solution to insert exactly 1,000 spaces easily, for example.
This is not perfectly five spaces, and I'm not sure if there's a way to do it without using five consecutive &nbsp;s, but this will allow you to add a specifiable amount of space inline.
<p>Drumroll...<span style="margin-left:50px;"></span>something</p>
http://jsfiddle.net/5drHj/1/
Another option might be to use the <pre> tag...
<pre>Drumroll...     something</pre>
http://jsfiddle.net/5drHj/2/
If you do decide to use consecutive &nbsp;s, you could use a JavaScript loop (or a PHP loop for server-side construction) to add the 1000 &nbsp;s.
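A minimal sketch of that idea, using String.prototype.repeat instead of an explicit loop. The function name nbspRun is made up for illustration.

```javascript
// Build a run of `count` non-breaking space entities for insertion into HTML.
function nbspRun(count) {
  return "&nbsp;".repeat(count);
}

const html = "<div>Drumroll..." + nbspRun(5) + "Something!</div>";
console.log(html);
console.log(nbspRun(1000).length); // 6000 characters of markup for 1000 spaces
```

For large counts the CSS approaches (margin or padding on a span) keep the markup far smaller.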
Edit: At the risk of losing my tick, I'd like to point out that the answer given by @vals is a third option, and perhaps the most elegant of the three.
No, there is no such element in HTML. Long ago, there was the nonstandard <spacer> tag, but it was abandoned. You are supposed to use CSS for things like this. Wrap some preceding text in a <span> element and set padding-left: 1.25em on it. Tune the value as needed. The width of a space depends on font but is on the average around 0.25em.
The question that you pose in the first half of the question (How to insert spaces easily), is achieved with the property:
white-space: pre;
It means that your text is pre-formatted, and the white spaces should stay as they are. Then just insert those spaces.
If you want to insert 1000 spaces, then we are probably talking about alignment, and there is a huge number of possibilities (padding specified in em being the most obvious), but you should then give more details of your situation.

Why is a trailing punctuation mark rendered at the start with direction:rtl?

This is more a sort of curiosity. While working on a multilingual web application I noticed that certain characters like punctuation marks (!?.;,) at the end of a block element are rendered as if they were placed at the beginning instead when the writing direction is right-to-left (as it is the case for certain Asian languages I do not speak).
In other words, the string
Hello, World!
is rendered as
!Hello, World
when placed in a div block with direction: rtl
This becomes even more evident if the text is split in two parts and given different colors: a contiguous chunk of text at the end is rendered in two separated regions:
http://jsfiddle.net/22Qk9/
What's the point of this behavior? I guess this must be a peculiarity of (all?) right-to-left languages which is automatically handled by the browser, so I don't need to care about it, or should I?
If you want to fix this behavior, add the LRM character (&lrm;) at the end. It's a non-printing character.
Source : http://dotancohen.com/howto/rtl_right_to_left.html
Example : http://jsfiddle.net/yobjj6ed/
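When the text is produced by script rather than written as markup, the same mark can be appended as the character U+200E. A minimal sketch; the function name withTrailingLrm is made up for illustration.

```javascript
// LEFT-TO-RIGHT MARK: an invisible, zero-width character with strong LTR direction.
const LRM = "\u200E";

// Append LRM so trailing punctuation stays attached to the left-to-right text.
function withTrailingLrm(text) {
  return text + LRM;
}

const fixed = withTrailingLrm("Hello, World!");
console.log(fixed.length); // 14: the 13 visible characters plus the invisible mark
```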
The reason is that the exclamation mark “!” has the BiDi class O.N. ('Other Neutrals'), which means effectively that it adapts to the directionality of the surrounding text. In the example case, it is therefore placed to the left of the text before it. This is quite correct for languages written right to left: the terminating punctuation mark appears at the end, i.e. on the left.
Normally, you use the CSS code direction: rtl or, preferably, the HTML attribute dir=rtl for texts in a language that is written right to left, and only for them. For them, this behavior is a solution, not a problem.
If you instead use direction: rtl or dir=rtl just for special effects, like making table columns laid out right to left, then you need to consider the implications. For example, in the table case, you would need to set direction to ltr for each cell of the table (unless you want them to be rendered as primarily right to left text).
If you have, say, an English sentence quoted inside a block of Arabic text, then you need to set the directionality of an element containing the English text to ltr, e.g.
<blockquote dir=ltr>Hello, World!</blockquote>
A similar case (just with Arabic inside English text) is discussed as use case 6 in the W3C document What you need to know about the bidi algorithm and inline markup (which has a few oddities, though, like using cite markup for quoted text, against W3C recommendations).
The accepted answer https://stackoverflow.com/a/20799360/477420 works if you can control the markup/CSS of the value. If you have no control over the HTML, the following approach could work.
If you don't know whether the page will be rendered RTL or LTR, but some text is definitely LTR (i.e. English-only), you can wrap the value with LRE/PDF marks to signify that it is an LTR region. The text will be rendered LTR irrespective of the page's LTR or RTL direction.
This works when you have some code that renders text without being able to change the markup that determines how it shows up on the page, e.g. you are rendering the value of a "song title" or "company name" field in some nested child component (or server side) without the ability to control the surrounding HTML elements.
One drawback of this and similar approaches that add marks to the text (like the LRM proposal in this question) is that copy-pasting such a value from the resulting HTML page will generally preserve the marks, even though they are invisible/zero width. For most cases that is fine, but consider whether it is a problem for you.
Approximate sample code (some companies have "Inc." at the end which will end up with dot at the beginning when rendered as-is on RTL page):
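// companyName = "Alphabet Inc." - the trailing dot should stay at the end, even on an RTL page
function wrapAsLtr(companyName) {
  // stringIsDefinitelyAscii: true only if every character is plain ASCII
  const stringIsDefinitelyAscii = (s) => /^[\x00-\x7F]*$/.test(s);
  if (stringIsDefinitelyAscii(companyName)) {
    // LRE (U+202A) opens a left-to-right embedding, PDF (U+202C) closes it
    companyName = "\u202A" + companyName + "\u202C";
  }
  return companyName;
}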
Details on LRE/PDF symbols can be found in https://unicode.org/reports/tr9/#Explicit_Directional_Embeddings:
LRE U+202A LEFT-TO-RIGHT EMBEDDING
Treat the following text as embedded left-to-right.
PDF U+202C POP DIRECTIONAL FORMATTING
End the scope of the last LRE, RLE, RLO, or LRO.
Some approaches to figure out if string has RTL characters can be found in How to detect whether a character belongs to a Right To Left language?, JavaScript: how to check if character is RTL?, How to detect if a string contains any Right-to-Left character?.
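A rough sketch of one such check, assuming an engine that supports Unicode property escapes in regular expressions. It only looks for Hebrew and Arabic script characters, so it is an approximation and not a full implementation of the bidi character classes. The names RTL_SCRIPTS and containsRtl are made up for illustration.

```javascript
// Matches any character whose script is Hebrew or Arabic.
// Assumption: other right-to-left scripts (Syriac, Thaana, ...) are not needed here.
const RTL_SCRIPTS = /[\p{Script=Hebrew}\p{Script=Arabic}]/u;

function containsRtl(text) {
  return RTL_SCRIPTS.test(text);
}

console.log(containsRtl("Alphabet Inc.")); // false
console.log(containsRtl("שלום"));          // true
```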

How does Zalgo text work?

I've seen weirdly formatted text called Zalgo like below written on various forums. It's kind of annoying to look at, but it really bothers me because it undermines my notion of what a character is supposed to be. My understanding is that a character is supposed to move horizontally across a line and stay within a certain "container". Obviously the Zalgo text is moving vertically and doesn't seem to be restricted to any space.
Is this a bug/flaw/exploit/hack in Unicode? Are these individual characters with weird properties? "What" is happening here?
H̡̫̤̤̣͉̤ͭ̓̓̇͗̎̀ơ̯̗̱̘̮͒̄̀̈ͤ̀͡w͓̲͙͖̥͉̹͋ͬ̊ͦ̂̀̚ ͎͉͖̌ͯͅͅd̳̘̿̃̔̏ͣ͂̉̕ŏ̖̙͋ͤ̊͗̓͟͜e͈͕̯̮̙̣͓͌ͭ̍̐̃͒s͙͔̺͇̗̱̿̊̇͞ ̸̤͓̞̱̫ͩͩ͑̋̀ͮͥͦ̊Z̆̊͊҉҉̠̱̦̩͕ą̟̹͈̺̹̋̅ͯĺ̡̘̹̻̩̩͋͘g̪͚͗ͬ͒o̢̖͇̬͍͇͓̔͋͊̓ ̢͈͙͂ͣ̏̿͐͂ͯ͠t̛͓̖̻̲ͤ̈ͣ͝e͋̄ͬ̽͜҉͚̭͇ͅx͎̬̠͇̌ͤ̓̂̓͐͐́͋͡ț̗̹̝̄̌̀ͧͩ̕͢ ̮̗̩̳̱̾w͎̭̤͍͇̰̄͗ͭ̃͗ͮ̐o̢̯̻̰̼͕̾ͣͬ̽̔̍͟ͅr̢̪͙͍̠̀ͅǩ̵̶̗̮̮ͪ́?̙͉̥̬͙̟̮͕ͤ̌͗ͩ̕͡
The text uses combining characters, also known as combining marks. See section 2.11, Combining Characters, of the Unicode Standard (PDF).
In Unicode, character rendering does not use a simple character cell model where each glyph fits into a box with given height. Combining marks may be rendered above, below, or inside a base character.
So you can easily construct a character sequence, consisting of a base character and “combining above” marks, of any length, to reach any desired visual height, assuming that the rendering software conforms to the Unicode rendering model. Such a sequence has no meaning of course, and even a monkey could produce it (e.g., given a keyboard with suitable driver).
And you can mix “combining above” and “combining below” marks.
The sample text in the question starts with:
LATIN CAPITAL LETTER H - H
COMBINING LATIN SMALL LETTER T - ͭ
COMBINING GREEK KORONIS - ̓
COMBINING COMMA ABOVE - ̓
COMBINING DOT ABOVE - ̇
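To inspect such a sequence yourself, you can list the code points of a string. A minimal sketch, where the sample string is built from the five characters named above and the function name codePoints is made up for illustration:

```javascript
// Return the code points of a string in U+XXXX notation.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// H, then combining small letter t, koronis, comma above, dot above.
const sample = "H\u036D\u0343\u0313\u0307";
console.log(codePoints(sample));
// [ 'U+0048', 'U+036D', 'U+0343', 'U+0313', 'U+0307' ]
```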
Zalgo text works because of combining characters. These are special characters that modify the character that comes before them:
y + ̆ = y̆ which actually is
y + &#x0306; = y&#x0306;
Since you can stack them one atop the other you can produce the following:
y̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆
which actually is y followed by a run of &#x0306; (combining breve) entities.
The same goes for putting stuff underneath:
y̰̰̰̰̰̰̰̰̰̰̰̰̰̰̰̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆̆
that in fact is the same sequence with a run of &#x0330; (combining tilde below) entities added.
In Unicode, the main block of combining diacritics for European languages and the International Phonetic Alphabet is U+0300–U+036F.
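The stacking can be reproduced in a few lines. This is a minimal sketch: the counts 18 and 15 are arbitrary illustrative values, and the function names stackMarks and stripMarks are made up.

```javascript
// Stack `above` breves (U+0306) and `below` tildes (U+0330) on a base character.
function stackMarks(base, above, below) {
  return base + "\u0330".repeat(below) + "\u0306".repeat(above);
}

// Remove every mark from the U+0300–U+036F block, leaving only base characters.
function stripMarks(text) {
  return text.replace(/[\u0300-\u036F]/g, "");
}

const zalgo = stackMarks("y", 18, 15);
console.log(zalgo.length);       // 34: one base character plus 33 combining marks
console.log(stripMarks(zalgo));  // y
```

Stripping this one block is enough to undo the examples here, though Unicode defines combining marks in other blocks as well.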
To produce a list of combining diacritical marks you can use the following script (since links keep on dying)
// Combining Diacritical Marks block: U+0300–U+036F (768–879 inclusive)
for (let i = 0x0300; i <= 0x036F; i++) { console.log(String.fromCodePoint(i) + " &#" + i + ";"); }
Also check em out
Mͣͭͣ̾ Vͣͥͭ͛ͤͮͥͨͥͧ̾