Bidirectional (BiDi) text inside HTML textarea not respecting LRM control character - html

I'm having a hard time making BiDi strings work inside an HTML textarea as I'd expect.
This test string contains both Arabic and English, plus sequences of pseudo-tags (<1/>, <2/>), which are composed of neutral-direction characters (<, >, /, numbers) and should inherit their direction from the strong-direction character before them.
Given that these pseudo-tags are positioned after both RTL and LTR text, I need to force the direction of the text by putting one LRM (U+200E, &lrm;) character before each pseudo-tag.
The result is not what I expected:
Note that the textarea has the direction attribute set as follows: dir='rtl'
Tested with both Chrome and FF, neither of them seems to work as expected. Am I missing something?
The results on JSFiddle are different again: https://jsfiddle.net/o7d2ymdc/1/
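For reference, the LRM insertion described above can be sketched in JavaScript roughly as follows. This is only an illustration: the helper name addLrmBeforePseudoTags and the sample string are made up for this example, and the regular expression assumes pseudo-tags always have the form <digits/>.

```javascript
// Hypothetical helper: prefix every pseudo-tag such as <1/> with an LRM.
const LRM = "\u200E"; // LEFT-TO-RIGHT MARK

function addLrmBeforePseudoTags(text) {
  // Matches "<", one or more digits, "/>" (the assumed pseudo-tag shape).
  return text.replace(/<\d+\/>/g, (tag) => LRM + tag);
}

// Made-up sample mixing Arabic, English and two pseudo-tags.
const sample = "مرحبا <1/> hello <2/>";
const marked = addLrmBeforePseudoTags(sample);

// One LRM is added per pseudo-tag.
console.log(marked.length - sample.length); // 2
```

The result can then be assigned to the textarea's value. Whether the browser renders it as hoped is exactly what this question is about.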

Unfortunately, displaying these inside a textarea is going to be extremely difficult, if at all possible.
Several issues are at play here. One is that brackets and parentheses are mirrored by the Unicode Bidirectional Algorithm: this <span dir="ltr"><</span> is rendered as '<', while this <span dir="rtl"><</span> is rendered as '>'. On top of that, RTL and LTR strings have different notions of where the "end of string" is.
Your best bet could be using contenteditable. You can display editable rich text (that is, actual HTML nodes) and isolate your RTL pieces from the HTML markup properly with spans, just as you would if you displayed it statically. However, if this textbox allows custom user-generated text, you may need to come up with a good algorithm that wraps the bidirectional text automatically as the user types, which can be a pretty big challenge.
If this helps, you're not the only one dealing with this. If you edit HTML blocks in Arabic Wikipedia, for example, you will see the exact same problem (which makes editing HTML and wikitext a fairly big challenge).
This problem is also one of the reasons why people prefer a WYSIWYG editor - that has proper contextual and conceptual separation between the markup/style and the text itself.
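As a rough sketch of the span-wrapping idea for a contenteditable element, the following JavaScript turns plain text into HTML where each pseudo-tag sits in its own <span dir="ltr">. The function names escapeHtml and isolatePseudoTags are hypothetical, and the code assumes pseudo-tags always look like <digits/>. It does not attempt the harder part mentioned above, which is re-wrapping text while the user types.

```javascript
// Hypothetical helpers, not part of any library.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function isolatePseudoTags(text) {
  // Splitting on a capturing group keeps the pseudo-tags in the result,
  // always at the odd indexes of the array.
  return text
    .split(/(<\d+\/>)/)
    .map((part, i) =>
      i % 2 === 1
        ? '<span dir="ltr">' + escapeHtml(part) + "</span>"
        : escapeHtml(part)
    )
    .join("");
}

console.log(isolatePseudoTags("a <1/> b"));
// a <span dir="ltr">&lt;1/&gt;</span> b
```

The returned string could be assigned to the innerHTML of the editable element. Escaping happens before the spans are added, so the pseudo-tags stay visible as text instead of being parsed as markup.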

Related

Punctuation marks appear on the wrong side inside a right-to-left paragraph tag in HTML

I am trying to mix Hebrew and English within a single right-to-left paragraph tag, but punctuation marks are rendered on the opposite side.
For example, I wish to have the following rendered on the page:
But the double slash punctuation marks, which should appear at the far left, are getting switched with the single punctuation mark at the far right, as you can see from running the following code snippet:
<p dir="rtl">אחת שתיים \\one\two\</p>
I have tried solving the problem using different methods (for example, using the unicode-bidi CSS property), but none of my attempts have worked.
Note: Changing the original text inside the paragraph tag to include special unicode characters (such as rlm control characters) or dividing the text within the tag into multiple tags is not an option in my case (I am trying to solve this problem without changing the html structure).
Preferably, I would want to solve this problem using only html or css, but also javascript might be an option, if one can't do it the other ways.

Why does "[x]y" display incorrectly in the RTL direction?

<div style="direction: rtl">
[x]y
</div>
As you can see, the HTML text [x]y is displayed as x]y].
What is the reason for that result?
PS: I get that result in Chrome 56.0.2924.87 (64-bit).
I cannot tell you the reason but I can tell you how to fix it: add unicode-bidi: bidi-override;. See more about it
<div style="direction: rtl; unicode-bidi: bidi-override;">
[x]y
</div>
The description
The unicode-bidi property is used together with the direction property to set or return whether the text should be overridden to support multiple languages in the same document.
is not clear enough to explain the behaviour. However, it works.
EDIT
The MDN article brings some light here, bidi-override actually disables the browser standard smart behaviour and everything works as is / as expected.
It is rendered correctly, i.e. according to specifications. You have asked for right-to-left layout. The rendering first takes the [ character. It is directionally neutral and therefore rendered in a RTL run rightmost and mirrored (so it looks like ]). Next, to the left of it comes x]y in that order, since the Latin letters x and y have inherent left-to-right directionality and the neutral ] gets its directionality from them.
The conclusions to be drawn depend on the rendering you want and your reasons for using right-to-left directionality.
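The reasoning above can be illustrated with a heavily simplified JavaScript sketch of how neutral characters pick up a direction. This is not the real Unicode Bidirectional Algorithm: it ignores digits, bracket pairing, explicit embeddings and mirroring, and the names strongDirection and resolveDirections are invented for this example. It only applies the rule used in the explanation: a neutral takes the direction of its strong neighbours if they agree, and otherwise falls back to the base direction.

```javascript
// Simplified illustration only, not a UAX #9 implementation.
function strongDirection(ch) {
  if (/[A-Za-z]/.test(ch)) return "L";        // Latin letters
  if (/[\u0590-\u06FF]/.test(ch)) return "R"; // rough: Hebrew and Arabic blocks
  return null;                                 // everything else counts as neutral
}

function resolveDirections(text, base) {
  const chars = Array.from(text);
  const strong = chars.map(strongDirection);
  return strong.map((dir, i) => {
    if (dir) return dir;
    // Nearest strong character on each side; the paragraph edge counts as base.
    let before = base;
    for (let j = i - 1; j >= 0; j--) {
      if (strong[j]) { before = strong[j]; break; }
    }
    let after = base;
    for (let j = i + 1; j < chars.length; j++) {
      if (strong[j]) { after = strong[j]; break; }
    }
    return before === after ? before : base;
  });
}

console.log(resolveDirections("[x]y", "R").join("")); // RLLL
console.log(resolveDirections("[x]y", "L").join("")); // LLLL
```

With base direction R, the leading [ resolves to R on its own while x]y forms one left-to-right run, which matches the x]y] rendering described in the question.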
After some research, I found the following info: Right-To-Left text direction
Parentheses and square brackets do not have an inherent direction. The open parenthesis is between LTR and RTL text runs and so cannot "inherit" the direction of the surrounding text. It therefore defaults to the RTL base direction of the paragraph and is placed to the left of the Hebrew word shalom. Note the closing square bracket is embedded in a single run of left-to-right text. It therefore adopts the direction of its surrounding text and is placed to the right of the English word shalom.
One of the solutions is to add &lrm; after the bracket.
Thanks to @freeworlder for the solution on
brackets displays wrongly for right to left display style
You can even use other characters; see this link:
http://www.codetable.net/hex/200e
Try using an open square bracket "[" where you need a closed square bracket, and vice versa. I had a font with a letter mapped to "[" and it would not display. I changed the letter in the database to "]" and it worked.

Two spaces after every full stop in paragraph using CSS?

How do I put two spaces after every full stop in a paragraph using CSS?
Ah, the old "two-spaces-after-a-period" meme rears its ugly head again.
Two spaces after a period is something that pertains to the typewriter world, or the monospaced font world. We moved beyond it long ago, starting with TeX or even before. The point is not to have one or two space characters after a period, but to have a pleasing amount of space there. Algorithms like TeX's go to great lengths to do so. The algorithms in modern web browsers are still primitive by comparison, but are starting to do better. Consider the following:
You'll see that the space after the period is (slightly) greater than the inter-word space, as it should be.
What about the case of justification? You'd hope the browser would put the extra space between sentences, in preference to putting it between words. And that's what happens:
Anyway, so you want more fine-grained control, to realize your own typographical vision on your web pages. The following has four characters between the sentences:
You could also use spaces of different widths from Unicode to get just the amount of space you want (see Wikipedia article).
So is there any way to do this automatically? CSS has a word-spacing property, but no sentence-spacing property (actually, it's not that easy to figure out what a "sentence" is, even in English, and less so in other languages). Of course, putting more spaces in your HTML is not going to do a thing, since HTML treats any run of white space as a single space. So you're going to have to write some code, or find a plug-in, which traverses the text in your page and inserts markup. Or, add a plug-in or something to your CMS to spit out code which is marked up appropriately. Your alternatives for doing so are:
Add &nbsp; or a combination of different-width Unicode spaces.
As another poster suggested, use span tags with margin.
As a variant on the above, use a <span class="sentence"> element, with a CSS rule like .sentence::after { content: "\2002"; }, where 2002 is the "en-space". This results in:
However, the bottom line is that the web is not a typographical environment, notwithstanding the many worthy efforts to nudge it in that direction. Depending on your goals, you might consider creating your documents in a high-end document preparation environment, and publishing them as PDFs, for example.
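For the "write some code" route using Unicode spaces, a minimal JavaScript sketch could look like the following. The function name widenSentenceSpaces is made up, and the sentence detection is a naive assumption: a period, exclamation mark or question mark followed by one space and an uppercase ASCII letter. As noted above, working out what a sentence is remains hard, so abbreviations such as "Mr. Smith" will be treated as sentence ends.

```javascript
const EN_SPACE = "\u2002"; // the en-space mentioned above

function widenSentenceSpaces(text) {
  // Replace the single space after ., ! or ? with an en-space,
  // but only when an uppercase ASCII letter follows.
  return text.replace(/([.!?]) (?=[A-Z])/g, "$1" + EN_SPACE);
}

const result = widenSentenceSpaces("One. Two! three. Four");
// The spaces before "Two" and "Four" are replaced; the one before "three" is not.
console.log(result === "One.\u2002Two! three.\u2002Four"); // true
```

Unlike ordinary spaces, the en-space is not collapsed by HTML white space handling, so it survives being written into the page.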
The two spaces concept after a sentence is not "ugly" - in fact, it's just the opposite. Because of modern font kerning as well as the variety of fonts that Web browsers now support, it's sometimes very difficult to determine if a sentence has ended or if there is simply a word that is abbreviated that requires a period, not to mention a look of constant run-on. With 'fat' letters beginning a sentence, such as an upper-case "W", it can appear as though there is actually no space at all. Adding an additional space after a sentence provides readers with clear breaks. However, I get it that it would be quite difficult to create CSS that could "understand" what a sentence is so that it would automatically insert an additional space after each.
You could put your full stop in a span-tag and give it some CSS attributes, like "margin-right: 5px;", if it's only the appearance you are looking for.
This can only be done if you wrap your full stop in a tag, like <span>. For example:
www<span>.</span>google<span>.</span>com
Then the css is :
span:after {
  content: "\00a0\00a0"; /* two non-breaking spaces; ordinary spaces would collapse */
}

Bidirectional text and numbers

I have a website that displays in two languages, English and Farsi. The title of a list item can mix both languages. As long as there is only text, it renders fine using direction:rtl in CSS.
But the catch is that I can also have a number inside or at the end of the title (which in Farsi is written and read the same way as in English, left to right). This causes a problem: no matter where I put that number, it messes up the word order in the title (the number is an ad ID at the end of the title).
To solve this issue I use &rlm; and &lrm; in front of the ID, but the catch is that I have to switch between the two according to which language is chosen.
My correct HTML looks like this (&rlm; is what fixes the ID number issue in Farsi):
<h3>
The name of my خدمات باشد is long
<span style="color:#999;">&rlm;#89798798</span>
</h3>
JS FIDDLE: http://jsfiddle.net/WzF2D/
I tried setting direction:ltr on the span wrapping around ID but it still won't work. I also tried to use unicode-bidi:embed on h3 but also no go.
How can I solve this by using CSS only, without having to rely on &rlm;?
I will assume that the desired rendering uses overall right-to-left writing, even though the text (at least in the example) is mainly English, with some words in Arabic letters inside the sentence. Moreover, I assume that expressions like “#89798798” are to be treated as separate fragments, so that when it appears after an English word, it is not considered as part of English text but set to the left of it, in RTL layout.
Under these (rather astonishing) premises, the CSS solution is to make such a fragment a bidirectionality isolate:
<span style="color:#999; unicode-bidi: isolate">#89798798</span>
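If the titles are generated by a script, the isolate wrapper could be added automatically. The JavaScript below is only a sketch under assumptions that are not in the question: the helper name isolateAdIds is invented, the input is the plain-text title (not HTML, since a pattern like #999 inside a style attribute would also match), and an ad ID is taken to be a # followed by digits.

```javascript
// Hypothetical helper: wrap every "#digits" ad ID in a bidi-isolating span.
function isolateAdIds(titleText) {
  return titleText.replace(
    /#\d+/g,
    (id) => '<span style="color:#999; unicode-bidi: isolate">' + id + "</span>"
  );
}

console.log(isolateAdIds("The name is long #89798798"));
// The name is long <span style="color:#999; unicode-bidi: isolate">#89798798</span>
```

The same markup is produced for both languages, so there is no need to switch between two control characters depending on the language.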

Partially colored Arabic word in HTML

I don't speak Arabic, but I need specific support for Arabic on our web. I need parts of Arabic words to be in a <span> with a different style than the rest of word. When I type two characters ش and س, they are composed into word شس, but when I use HTML markup
<span>ش</span>س
these letters are not joined correctly in the output.
In the picture, desired output is on second line, actual output is on first line.
EDIT: It works on Firefox, but does not work in Chrome/Safari.
Insert a zero-width joiner (e.g. using the entity reference &zwj;) at the end of the span element content: <span>ش&zwj;</span>س.
More generally, use zero-width joiners at the start and end of each span element as well as (just to be more sure) before and after each span element, in situations where the text should have cursive (joining) behavior and span may break it.
The issue is discussed and illustrated on the Bidirectional text page by Andreas Prilop.
Update: Unfortunately, it seems that even &zwj; does not help on current versions of WebKit browsers. They seem to treat HTML markup as breaking joining behavior, no matter what.
Update 2: As described in @NasserAl-Wohaibi’s comment, the new problem can be solved by using &zwj; twice. However, in current Safari (5.1.7) for Windows, it does not help; in fact, it displays even ش&zwj;س wrong, whereas without the joiner it shows شس correctly.
This is actually a reported bug in WebKit, thus presumably affects all WebKit-based browsers.
As Jukka K. Korpela indicated, this is mostly a bug in most WebKit-based browsers (Chrome, Safari, etc.).
A simple hack, other than the TAMDEED char or getting contextual forms for Arabic letters, would be to put the zero-width joiner (&zwj; or &#x200D;) before/after the letter you want to be treated as a single Arabic ligature (two chars making up another one), e.g.
<p>عرب&zwj;<span style="color: Red;">&zwj;ي</span></p>
demo: jsfiddle
see also the webkit bug report.
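The joiner placement used in the markup above can be generated with a small JavaScript sketch like the one below. The function name colorLetter and its parameters are invented for this example. It adds a zero-width joiner on both sides of a span boundary wherever the word continues, and it does not check whether the neighbouring letters actually join in Arabic script. As the answers note, some WebKit versions may still break the joining regardless.

```javascript
const ZWJ = "\u200D"; // ZERO WIDTH JOINER

// Hypothetical helper: color one letter of a word and keep it joined.
function colorLetter(word, index, color) {
  const letters = Array.from(word);
  const before = letters.slice(0, index).join("");
  const after = letters.slice(index + 1).join("");
  // Joiners go on both sides of a span boundary, but only where text continues.
  const lead = before ? ZWJ : "";
  const trail = after ? ZWJ : "";
  return (
    before + lead +
    '<span style="color: ' + color + ';">' +
    lead + letters[index] + trail +
    "</span>" +
    trail + after
  );
}

// Colors the last letter of the four-letter word from the example above.
const html = colorLetter("\u0639\u0631\u0628\u064A", 3, "red");
console.log(html.split(ZWJ).length - 1); // 2 (one joiner on each side of the opening tag)
```

The output uses literal U+200D characters instead of the &zwj; entity, which is equivalent once the string is inserted as HTML.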