Flash/AS3 highlighting Arabic text - actionscript-3

Whenever I try to highlight Arabic text, the selection behaves strangely and highlights the wrong parts.
Try it out for yourself to see what I mean: http://www.fastswf.com/iJ_P74c
Try highlighting the Arabic text and notice how weird things happen, then highlight the English sentence and you'll notice that it's perfectly normal.
Is there any way to fix this?
Edit: If it's important; the font used is Times New Roman; Device Fonts; Regular.

This is a known limitation of mixing RTL and LTR languages in the same text field when using 'Classic Text', as Adobe calls it, as opposed to 'TLF text'.
Use TLFTextField instead of UITextField, and text highlighting for mixed RTL and LTR languages in the same text block will be handled correctly.
TLF text provides the following enhancements over Classic text:
The ability to create right-to-left text for Arabic and Hebrew scripts.
Support for bi-directional text, where right-to-left text can contain elements of left-to-right text. This is important for embedding English words or Arabic numerals within Arabic/Hebrew text, for example.
Print-quality typography.
Additional character styles, including leading, ligatures, highlight color, underline, strikethrough, case, digit case, and more.
Additional paragraph styles, including multi-column support with gutter width, last line justification options, margins, indents, paragraph spacing, and container padding values.
Control of additional Asian text attributes, including Tate Chu Yoko, Mojikumi, Kinsoku Shori Type, and Leading model.
You can apply attributes such as 3D Rotation, Color Effects, and Blend Modes to TLF text without placing it in a movie clip symbol.
Text can flow across multiple text containers. These containers are called threaded or linked text containers.

Related

Bidirectional (BiDi) text inside HTML textarea not respecting LRM control character

I'm having a hard time with making BiDi strings work inside an HTML textarea as I'd expect.
This test string contains both Arabic and English, plus sequences of pseudo-tags (<1/>, <2/>), which are composed of neutral-direction characters (<, >, /, numbers) and should inherit their direction from the strong-direction character before them.
Because these pseudo-tags are positioned after both RTL and LTR text, I need to force the direction of the text by putting an LRM character (U+200E, &lrm;) before each pseudo-tag.
The result is not what I expected:
Note that the textarea has its direction attribute set as follows: dir='rtl'
Tested with both Chrome and Firefox; neither seems to work as expected. Am I missing something?
Results on Jsfiddle are even different: https://jsfiddle.net/o7d2ymdc/1/
Unfortunately, displaying these inside a textarea is going to be extremely difficult, if at all possible.
Several issues are at play here. One is that brackets and parentheses are mirrored by the Unicode Bidirectional Algorithm: <span dir="ltr"><</span> is rendered as '<', while <span dir="rtl"><</span> is rendered as '>'. On top of that, "end of string" means something different in an RTL string than in an LTR string.
Your best bet could be using contenteditable. You can display editable rich text (which is actually HTML nodes) and isolate your RTL pieces from the HTML markup properly with spans, as if you had displayed it statically. However, if this textbox allows custom user-generated text, you may need to come up with a good algorithm that wraps the bidirectional text automatically as the user types, which can be a pretty big challenge.
If this helps, you're not the only one to deal with this. If you edit HTML blocks in Arabic Wikipedia, for example, you will see the exact same problem (which makes editing HTML and wikitext a fairly big challenge).
This problem is also one of the reasons why people prefer a WYSIWYG editor - that has proper contextual and conceptual separation between the markup/style and the text itself.
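A minimal sketch of the contenteditable idea, assuming the pseudo-tags can be wrapped in markup before the text is shown. The class name `tag` and the sample text are made up for illustration. Each pseudo-tag sits in a `<bdi dir="ltr">`, which isolates it from the surrounding text so its neutral characters are laid out left to right whatever precedes it.

```html
<!-- Editable RTL block; each pseudo-tag is isolated and forced LTR. -->
<!-- The "tag" class and the sample strings are hypothetical. -->
<div contenteditable="true" dir="rtl" lang="ar">
  نص عربي
  <bdi dir="ltr" class="tag">&lt;1/&gt;</bdi>
  English text
  <bdi dir="ltr" class="tag">&lt;2/&gt;</bdi>
</div>
```

This only covers text that is wrapped up front; text typed by the user afterwards would still need the automatic wrapping mentioned above.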

How can we use complex bullet elements in an HTML textarea?

How can I insert the various complex bullet elements used in Microsoft Word into an HTML textarea?
When I insert them into a textarea, the textarea changes their style and replaces the bullet elements with a '?' question mark.
Why does HTML not recognize complex bullet elements? The textarea only recognizes simple bullet elements.
HTML knows nothing of Microsoft Word's proprietary complex bullet elements.
HTML knows nothing of Microsoft Word's formatting.
A textarea holds nothing but the plain text you placed into it.
There are Rich Text Editors like http://ckeditor.com/ that can convert MS Word content into HTML (you used one similar on the textarea where you entered your question).
You may also need to ensure your content is rendering using UTF-8 to correctly display your content (and avoid the empty squares, etc.).
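As a rough illustration of the UTF-8 point, the page below declares its encoding and puts plain Unicode bullet characters in a textarea. It assumes you substitute standard Unicode bullets for the Word ones; bullets that Word draws from symbol fonts may have no Unicode equivalent that pastes cleanly. The field name `notes` and the sample items are invented for the example.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declare UTF-8 so non-ASCII bullet characters are decoded correctly. -->
  <meta charset="utf-8">
  <title>Bullets in a textarea</title>
</head>
<body>
  <!-- accept-charset asks the browser to submit the form data as UTF-8 too. -->
  <form method="post" accept-charset="utf-8">
    <!-- Plain Unicode bullets: U+2022, U+25E6, U+25AA, U+27A2 -->
    <textarea name="notes" rows="4" cols="30">• first item
◦ second item
▪ third item
➢ fourth item</textarea>
  </form>
</body>
</html>
```

The file itself also has to be saved as UTF-8, and the server must not send a conflicting charset header.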

Why are some Unicode characters taller than normal text?

I noticed that some Unicode characters are taller than normal text.
E.g. the diagonal arrows (North East Arrow ↗, South East Arrow ↘, ...) claim more space above the letters than normal text does.
<body style="font-size:48px;">
<div style="border:1px solid #00ff00;float:left;">
1-<br>2-<br>3-<br>4-<br>5-<br>6-<br>7-<br>8-<br>9-
</div>
<div style="border:1px solid #ff0000;float:left;margin-left:5px;">
-1<br>-2 ↗<br>-3<br>-4 ↗<br>-5<br>-6 ↗<br>-7<br>-8 ↗<br>-9
</div>
</body>
http://jsfiddle.net/q5LEt/
You can see the dashes moving down every time an arrow is in the same line.
How can I avoid this behaviour, e.g. by CSS?
You can avoid it by explicitly specifying a line-height:
line-height: 1em;
Re the question in the title: The basic answer is that characters are different. An “A” is taller than an “a”, and some special characters can be even taller than “A”. But there’s more. If you mix fonts, there is additional variation, since fonts have been designed differently. For example, “a” in Verdana is much taller than “a” in Calibri, with the same font size.
Re the code example and the jsfiddle: No font family has been declared, so each browser will use its defaults. When I test it on Firefox, the problem described does occur, because “normal” characters are taken from Times New Roman (the common default font), whereas the arrow characters, which do not appear in Times New Roman, are taken from Segoe UI Symbol. Your mileage may vary, but what usually happens is that the fallback font used, such as Segoe UI Symbol, has a larger default line height than Times New Roman has. This causes the line as a whole to take more vertical space.
So the issue has little to do with the heights of letters; it depends on properties of the font. (Well, indirectly there is a connection: fonts like Segoe UI Symbol have a relatively large default line height set on them, because they contain glyphs that are rather tall.)
You can usually deal with the symptoms by setting line-height, as @KonradRudolph suggests (though the value of 1em, i.e. setting the text solid, would be somewhat extreme, suitable for special cases but not normal texts).
Alternatively, avoid the problem by selecting a list of fonts that contain all the characters you need in the text and declare font-family with such a list as its value. This means that the arrows and the rest of the text are in the same font, which is generally good even for purely typographic reasons. Finding out the list of fonts takes some time; see my Guide to using special characters in HTML for some tips.
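The two remedies can be combined in one rule. This is only a sketch: the font names are assumptions about what might be installed and contain the arrow glyphs, so check them against your own text, and 1.2 is just an example of a less drastic value than 1em.

```html
<style>
  body {
    font-size: 48px;
    /* One stack for letters and arrows; for each character the browser
       uses the first listed font that is installed and has the glyph. */
    font-family: "DejaVu Sans", "Segoe UI Symbol", "Arial Unicode MS", sans-serif;
    /* An explicit, unitless line-height replaces each font's own default. */
    line-height: 1.2;
  }
</style>
```

With an explicit line-height the lines usually stay evenly spaced even when a fallback font is still used for some characters.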

Bidirectional text and numbers

I have a website that displays in two languages, English and Farsi. The title of a list item can mix both languages. So far so good: as long as the title contains only text, it renders correctly using direction:rtl in CSS.
But the catch is that I can also have a number inside or at the end of the title (which in Farsi is written and read the same as in English, left to right). This causes a problem, since no matter where I put that number it messes up the word order in the title (the number is an ad ID at the end of the title).
To solve this issue I use &rlm; or &lrm; in front of the ID, but the catch is that I have to switch between the two according to which language is chosen.
My correct HTML is as follows (&rlm; is what fixes the ID number issue in Farsi):
<h3>
The name of my خدمات باشد is long
<span style="color:#999;">&rlm;#89798798</span>
</h3>
JS FIDDLE: http://jsfiddle.net/WzF2D/
I tried setting direction:ltr on the span wrapping the ID, but it still doesn't work. I also tried unicode-bidi:embed on the h3, but that didn't work either.
How can I solve this using CSS only, without having to rely on &rlm;?
I will assume that the desired rendering uses overall right-to-left writing, even though the text (at least in the example) is mainly English, with some words in Arabic letters inside the sentence. Moreover, I assume that expressions like “#89798798” are to be treated as separate fragments, so that when it appears after an English word, it is not considered as part of English text but set to the left of it, in RTL layout.
Under these (rather astonishing) premises, the CSS solution is to make such a fragment a bidirectionality isolate:
<span style="color:#999; unicode-bidi: isolate">#89798798</span>
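For context, here is how an isolate might look applied to the full heading from the question. Treat it as a sketch: the class name `ad-id` is invented for the example, and the second heading shows the `<bdi>` element, which is isolated by default, as a markup-only alternative.

```html
<style>
  h3 { direction: rtl; }
  /* Isolate the ID so the neighbouring words do not influence where it lands. */
  /* "ad-id" is a hypothetical class name. */
  .ad-id { color: #999; unicode-bidi: isolate; }
</style>

<h3>
  The name of my خدمات باشد is long
  <span class="ad-id">#89798798</span>
</h3>

<!-- Markup-only alternative: <bdi> is isolated by default. -->
<h3>
  The name of my خدمات باشد is long
  <bdi style="color:#999;">#89798798</bdi>
</h3>
```

Because the isolate is treated as a single neutral unit by the surrounding text, the same markup should work for both languages without swapping &rlm; and &lrm;.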

How to get a tab character?

In HTML, there is no character for a tab, but I am confused as to why I can copy and paste one here: " " (You can't see the full width of it, but if you click to edit my question, you will see the character.) If I can copy and paste a tab character, there should be a unicode equivalent that can be coded into html. I know it doesn't exist, but this is a mystery I've never been able to grasp.
So my question is: why is there not a unicode character for a tab even if I can copy and paste it?
Sure there's an entity for tabs:
&#9;
(The tab is ASCII character 9, or Unicode U+0009.)
However, just like literal tabs (ones you type into your text editor), all tab characters are treated as whitespace by HTML parsers and collapsed into a single space, except within a <pre> block, where literal tabs are rendered as 8 spaces in a monospace font.
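A small sketch of that behaviour, using the numeric reference &#9; for the tab. The class names `narrow` and `keep-tabs` are made up for the example; `tab-size` and `white-space` are standard CSS properties.

```html
<style>
  /* tab-size sets the tab stop interval in space widths; the default is 8. */
  pre.narrow { tab-size: 4; }
  /* white-space: pre preserves tabs in elements other than <pre> as well. */
  .keep-tabs { white-space: pre; }
</style>

<pre>name&#9;value</pre>
<pre class="narrow">name&#9;value</pre>
<p class="keep-tabs">name&#9;value</p>
<!-- In a normal paragraph the tab collapses to a single space. -->
<p>name&#9;value</p>
```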
Try &emsp;
As per the docs:
The character entities &ensp; and &emsp; denote an en space and an em
space respectively, where an en space is half the point size and an em
space is equal to the point size of the current font. For fixed pitch
fonts, the user agent can treat the en space as being equivalent to a
space character, and the em space as being equivalent to two space
characters.
Docs link : https://www.w3.org/MarkUp/html3/specialchars.html
Put it between <pre></pre> tags, then use this character: &#9;
It will not work without the <pre></pre> tags.
Posting another alternative to be more complete. When I tried the "pre" based answers, they added extra vertical line breaks as well.
Each tab can be converted to a sequence of non-breaking spaces, which require no wrapping element.
"&nbsp;&nbsp;&nbsp;&nbsp;"
This is not recommended for repeated/extensive use within a page. A div margin/padding approach would appear much cleaner.
I use <span style="display: inline-block; width: 2ch;"> </span> for a two characters wide tab.
Tab is [HT], or character number 9, in Unicode.
As mentioned, sequential whitespace is collapsed into the single space the browser actually displays. Remember what the ML in HTML stands for: it's a markup language, designed to control how text is displayed, not whitespace :p
Still, you can pretend the browser respects tabs, since all a tab does is prepend 4 spaces, and that's easy with CSS, either inline like this:
<div style="padding-left:4.00em;">Indented text</div>
Or as a regular class in a style sheet:
.tabbed {padding-left:4.00em;}
Then the HTML might look like
<p>regular paragraph regular paragraph regular paragraph</p>
<p class="tabbed">Indented text Indented text Indented text</p>
<p>regular paragraph regular paragraph regular paragraph</p>