Partially colored Arabic word in HTML - html

I don't speak Arabic, but I need specific support for Arabic on our website. I need parts of Arabic words to be in a <span> with a different style than the rest of the word. When I type the two characters ش and س, they are composed into the word شس, but when I use HTML markup
<span>ش</span>س
the letters are not joined correctly in the output.
In the picture, desired output is on second line, actual output is on first line.
EDIT: It works on Firefox, but does not work in Chrome/Safari.

Insert a zero-width joiner (e.g. using the entity reference &zwj;) at the end of the span element content: <span>ش&zwj;</span>س.
More generally, use zero-width joiners at the start and end of each span element as well as (just to be more sure) before and after each span element, in situations where the text should have cursive (joining) behavior and span may break it.
The issue is discussed and illustrated on the Bidirectional text page by Andreas Prilop.
Update: Unfortunately, it seems that even &zwj; does not help on current versions of WebKit browsers. They seem to treat HTML markup as breaking joining behavior, no matter what.
Update 2: As described in @NasserAl-Wohaibi’s comment, the new problem can be solved by using &zwj; twice. However, in current Safari (5.1.7) for Windows, it does not help; in fact, it displays even ش&zwj;س wrong, whereas without the joiner it shows شس correctly.

This is actually a reported bug in WebKit, thus presumably affects all WebKit-based browsers.

As Jukka K. Korpela indicated, this is mostly a bug in most WebKit-based browsers (Chrome, Safari, etc.).
A simple hack, other than the TAMDEED char or getting contextual forms for Arabic letters, would be to put the zero-width joiner (&zwj; or &#x200D;) before and after the letter you want to be treated as part of a single Arabic ligature (two chars making up another one), e.g.
<p>عرب&zwj;<span style="color: Red;">&zwj;ي</span></p>
demo: jsfiddle
see also the webkit bug report.
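The joiner placement described in these answers can be sketched in JavaScript. This is only an illustration: the helper name joinedSpan and its parameters are made up here, and it assumes the letters on both sides of each tag boundary would join in unbroken text.

```javascript
// U+200D ZERO WIDTH JOINER, the character behind the &zwj; entity.
const ZWJ = "\u200D";

// Hypothetical helper: wraps `styled` in a span and places a joiner on
// both sides of every tag boundary that has text next to it.
function joinedSpan(before, styled, after, style) {
  const open = `<span style="${style}">`;
  const left = before ? before + ZWJ : "";
  const right = after ? ZWJ + after : "";
  const inner = (before ? ZWJ : "") + styled + (after ? ZWJ : "");
  return left + open + inner + "</span>" + right;
}
```

Calling joinedSpan("عرب", "ي", "", "color: Red;") produces the same markup as the example in the answer, with literal U+200D characters in place of the entity references. As the updates above note, whether the browser then joins the letters still depends on the rendering engine.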

Related

How can I control (across all devices and browsers) whether a character is displayed as the emoji version or text version?

Below (and this live demo here) is the HTML that produced these 2 screenshots. The first is in Chrome on Windows 10, and the second is from Chrome on iOS 12.
Notice that Win 10 correctly flattens and colors red all of the characters in the bottom line. But in the top line, it incorrectly does not stylize the ⚠️, even though elsewhere (also on Win 10) I see it correctly displayed in yellow, such as here.
Also notice that iOS 12 correctly stylizes all the emojis but does not flatten and color red the first 2 characters (🌄︎ 💬︎).
How can I control (across all devices and browsers) whether a character is displayed as the emoji version or text version?
This is NOT a duplicate of other questions because (as you can see from the HTML) I already know about the text variation selector &#xFE0E;, and I've experimented with tons of different local fonts (such as https://emojisymbols.com) and Google Fonts.
How to prevent Unicode characters from rendering as emoji in HTML from JavaScript?
https://apple.stackexchange.com/q/347993/53510
<link href="https://fonts.googleapis.com/css?family=Raleway&display=swap" rel="stylesheet">
<div style='text-align: right; font-family: "Raleway"; margin: auto; display: inline-block; font-size: 22px;'>
emojis 🌄
💬
⌛
⚡
⚠
<div style="color: red;">
using text variation selector
🌄&#xFE0E;
💬&#xFE0E;
⌛&#xFE0E;
⚡&#xFE0E;
⚠&#xFE0E;
</div>
</div>
The short answer is that you can’t. The text variation selector does not work generally for all characters; only those sequences explicitly defined in the standard are valid. Chrome on Windows is in fact violating the standard in your first example because there are no variation sequences for 🌄 and 💬. There is no Unicode mechanism to stop these characters from behaving like emoji; <U+1F304, U+FE0E> must display identically to U+1F304 alone.
All emoji characters that allow variation selectors are listed in the data file emoji-variation-sequences.txt, and I also curate a visual table on my website for easy access.
However, even for those characters that do support variation selectors, there is no guarantee that they will actually work. Older Android phones for example cannot display many emoji as text-style because they simply lack the fonts necessary to do so.
If you want to ensure universal text-style display, you will need to supply your own fonts to override the system defaults.
Sidenote: While your Windows example gets the variation selectors wrong, it actually does handle ⚠ correctly because that character is meant to be text-style by default unlike all the others. If you need emoji-style display, you have to append the emoji variation selector U+FE0F like so: ⚠️. This is not necessary (but possible) for ⌛ and ⚡ because they’re emoji-default.
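As a rough sketch of the rule described in this answer, the following JavaScript appends a variation selector only to characters that have a defined variation sequence. The function name withPresentation is hypothetical, and the set below is a tiny illustrative subset built from the characters discussed here; a real implementation would load the full list from emoji-variation-sequences.txt.

```javascript
const VS15 = "\uFE0E"; // text presentation selector
const VS16 = "\uFE0F"; // emoji presentation selector

// Illustrative subset only: ⌛ (U+231B), ⚡ (U+26A1), ⚠ (U+26A0).
// 🌄 and 💬 are deliberately absent because they have no variation sequences.
const HAS_VARIATION_SEQUENCE = new Set(["\u231B", "\u26A1", "\u26A0"]);

// Hypothetical helper: style is "text" or "emoji".
function withPresentation(ch, style) {
  // Drop any selector that is already attached.
  const base = ch.replace(/[\uFE0E\uFE0F]/g, "");
  if (!HAS_VARIATION_SEQUENCE.has(base)) return base;
  return base + (style === "text" ? VS15 : VS16);
}
```

Even with a valid sequence, the result still depends on the fonts available on the device, as the answer points out.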

Bidirectional (BiDi) text inside HTML textarea not respecting LRM control character

I'm having a hard time with making BiDi strings work inside an HTML textarea as I'd expect.
This test string contains both Arabic and English, plus sequences of pseudo-tags (<1/>, <2/>), which are composed of neutral-direction characters (<, >, /, numbers) and should inherit their direction from the strong-direction character before them.
Given that these pseudo-tags are positioned after both RTL and LTR text, I need to force the direction of the text by putting one LRM (U+200E, &lrm;) character before each pseudo-tag.
The result is not what I expected:
Note that the textarea has the direction property set as follows: dir='rtl'
Tested with both Chrome and FF, none of them seems to work as expected. Am I missing something?
Results on Jsfiddle are even different: https://jsfiddle.net/o7d2ymdc/1/
Unfortunately, displaying these inside a textarea is going to be extremely difficult, if at all possible.
There are several issues that are at play here, among them is the fact that brackets and parentheses are mirrored in the Unicode Bidirectional algorithm: This <span dir="ltr"><</span> is rendered as '<', while this <span dir="rtl"><</span> is rendered as '>'. And all of this is added on top of the fact that we have different definitions of "end of string" in either of the RTL and LTR strings.
Your best bet could be using ContentEditable. You can display editable rich text - that is actually html nodes - and essentially isolate your RTL pieces from the HTML markup properly with spans, as if you would have statically displayed it. However, if this textbox allows for custom user-generated text, you may need to come up with a good algorithm that wraps the bidirectional text automatically as the user types, which can be a pretty big challenge.
If this helps, you're not the only one to deal with this. If you edit HTML blocks in Arabic Wikipedia, for example, you will see the exact same problem (which makes editing HTML and wikitext a fairly big challenge)
This problem is also one of the reasons why people prefer a WYSIWYG editor - that has proper contextual and conceptual separation between the markup/style and the text itself.
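The isolation idea from the ContentEditable suggestion could be prototyped roughly like this. It is a sketch under assumptions: the function names are invented here, and the pseudo-tags are assumed to have exactly the form <1/>, <2/>, and so on, as in the question.

```javascript
// Escape text so it can be placed inside HTML markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wraps every pseudo-tag in <span dir="ltr"> so its
// neutral characters are laid out left to right inside RTL content.
function isolatePseudoTags(text) {
  return text
    .split(/(<\d+\/>)/) // odd indexes hold the captured pseudo-tags
    .map((part, i) =>
      i % 2 === 1
        ? `<span dir="ltr">${escapeHtml(part)}</span>`
        : escapeHtml(part)
    )
    .join("");
}
```

The returned string is meant for the innerHTML of a contenteditable element, not for a textarea, which cannot hold markup. Re-wrapping the text as the user types is the hard part mentioned above and is not covered here.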

Two spaces after every full stop in paragraph using CSS?

How do I put two spaces after every full stop in a paragraph using CSS?
Ah, the old "two-spaces-after-a-period" meme rears its ugly head again.
Two spaces after a period is something that pertains to the typewriter world, or the monospaced font world. We moved beyond it long ago, starting with TeX or even before. The point is not to have one or two space characters after a period, but to have a pleasing amount of space there. Algorithms like TeX go to great length to do so. The algorithms in modern web browsers are still primitive by comparison, but are starting to do better. Consider the following:
You'll see that the space after the period is (slightly) greater than the inter-word space, as it should be.
What about the case of justification? You'd hope the browser would put the extra space between sentences, in preference to putting it between words. And that's what happens:
Anyway, so you want more fine-grained control, to realize your own typographical vision on your web pages. The following has four &nbsp; characters between the sentences:
You could also use spaces of different widths from Unicode to get just the amount of space you want (see Wikipedia article).
So is there any way to do this automatically? CSS has a word-spacing property, but no sentence-spacing property (actually, it's not that easy to figure out what a "sentence" is, even in English, and less so in other languages). Of course, putting more spaces in your HTML is not going to do a thing, since HTML treats any run of white space as a single space. So you're going to have to write some code, or find a plug-in, which traverses the text in your page and inserts markup. Or, add a plug-in or something to your CMS to spit out code which is marked up appropriately. Your alternatives for doing so are:
Add &nbsp; or a combination of different-width Unicode spaces.
As another poster suggested, use span tags with margin.
As a variant on the above, use a <span class="sentence"> element, with a CSS rule like .sentence::after { content: "\2002"; }, where 2002 is the "en-space". This results in:
However, the bottom line is that the web is not a typographical environment, notwithstanding the many worthy efforts to nudge it in that direction. Depending on your goals, you might consider creating your documents in a high-end document preparation environment, and publishing them as PDFs, for example.
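The "write some code which traverses the text" route could start from something like the JavaScript below. It is a deliberately naive sketch: the function name is made up, and the sentence test (end punctuation, white space, then an uppercase ASCII letter) is an assumption that will misfire on abbreviations followed by a capitalized word and on non-English text.

```javascript
const EN_SPACE = "\u2002"; // the same en-space as "\2002" in the CSS rule

// Hypothetical helper: replaces the white space after a sentence-ending
// mark with a single en-space when an uppercase letter follows.
function widenSentenceSpaces(text) {
  return text.replace(/([.!?])[ \t\n]+(?=[A-Z])/g, `$1${EN_SPACE}`);
}
```

For example, widenSentenceSpaces("One. Two e.g. three.") changes only the gap after "One." because the other period is followed by a lowercase letter.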
The two spaces concept after a sentence is not "ugly" - in fact, it's just the opposite. Because of modern font kerning as well as the variety of fonts that Web browsers now support, it's sometimes very difficult to determine if a sentence has ended or if there is simply a word that is abbreviated that requires a period, not to mention a look of constant run-on. With 'fat' letters beginning a sentence, such as an upper-case "W", it can appear as though there is actually no space at all. Adding an additional space after a sentence provides readers with clear breaks. However, I get it that it would be quite difficult to create CSS that could "understand" what a sentence is so that it would automatically insert an additional space after each.
You could put your full stop in a span-tag and give it some CSS attributes, like "margin-right: 5px;", if it's only the appearance you are looking for.
This can only be done if you wrap each full stop in a tag, such as <span>. For example:
www<span>.</span>google<span>.</span>com
Then the CSS is:
span:after {
  /* two non-breaking spaces; two ordinary spaces would collapse into one */
  content: "\00a0\00a0";
}

preserve white space in options text of a select for string created by java

Server side, I build a list of strings which are the option text of an html select multiple.
Every string is the result of the concatenation of four strings. The first, second and third have a length of 5. The fourth string has a variable length, so I pad it to 19 chars with white spaces:
StringUtils.rightPad(data.toUpperCase(), 19, " ");
Nevertheless, in my HTML page, these white spaces are removed.
I have looked for similar problems on this site and others. I have tried &nbsp; and \u0020, I have tried the CSS style white-space:pre-wrap;, and I have tried a lot of other things, but the white spaces are not preserved.
Does anyone know how to solve this problem without JavaScript, using only HTML/styles?
Thank you, regards
The default styling in web pages is to collapse whitespace; you can easily change this with the white-space property in CSS:
p {
white-space: pre;
}
The values pre and pre-wrap preserve whitespace, the difference between them being that pre will only wrap the text on line breaks, whereas pre-wrap will wrap on all whitespace characters (like regular text). Your question states that you have tried this and it did not work. However, I have tested this code and it worked fine for me (using Google Chrome), and the W3C reference says that it works in all major browsers. I therefore suspect a mistake in the implementation: try again and double-check that you are applying the styles to the correct class and that there are no specificity issues.

Korean sentences being split randomly

I'm facing a problem here: I have a text in Korean decimal Unicode and the text is displayed in 4 columns and many rows (as it's the answers of a language test). The problem is that because the width of each answer is 20%, the sentence splits randomly in the middle of the word when it doesn't fit, instead of in the spaces between words. I don't know how to treat this, since this text is loaded and displayed automatically from a database.
The HTML code for each one of the 4 columns is like this:
<table class="courses" border="0" cellpadding="2" cellspacing="2" width="100%" style="font-size:13px;">
<tbody>
<td width="20%">
<p align="center">
<input name="a[X]" value="1" type="radio">
<br>
<?php echo "바쁘면 가지 마세요" ?> <!-- this comes from a DB; it is the decimal Unicode references of the Korean characters --><br>
</p>
</td>
</tbody>
</table>
What could I do to fix this so that, when the text doesn't fit, it breaks at a space rather than at a random point? If you look at the Unicode codes, you can tell there's a space between &#47732; and &#44032;, but it breaks just anywhere, and the same goes for all the text.
(Note that there aren't any encoding problems, the Korean characters are displayed properly. And it doesn't happen with other languages like Swedish or Spanish).
EDIT
Here's a working example.
Note that in the example, the first answer is split in the last two characters, when that word has five characters, so should be split 3 chars before.
Line breaking for CJK (Chinese/Japanese/Korean) text can be quite problematic given the current state of web standards.
There is not too much you can do in a language-agnostic manner; CSS level 3 defines related properties (line-break and word-break), but I'm not so sure what the support level is across modern browsers (obviously not-so-modern browsers are entirely out of the picture).
It doesn't really matter as Korean can be split anywhere anyway. See this screenshot from Chosun.com:
The words are cut anywhere, seemingly randomly. You don't need to worry about hyphenation.
It sounds like you are just encountering the default behavior of white-space. You could take a look at the CSS white-space property and try something like pre.
I'm facing the same issue and I think the best solution is to wrap each word in a span with white-space: nowrap;. This makes sure there won't be any line break inside the word.
See this JSFiddle for a proof of concept: http://jsfiddle.net/we7jx08r/. When you change body's width, you'll notice that the line breaks are always correct.
See http://css-tricks.com/almanac/properties/w/whitespace/ for white-space: nowrap browser support (IE5.5+, FF1+, Safari 1+).
You could try a solution I worked on a while ago: https://stackoverflow.com/a/46714474/2114953
That is, if you like to use JS in order to wrap each word into span HTML elements and then use CSS display: inline-block to force words to go to new lines if need be.
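A minimal JavaScript sketch of the word-wrapping idea from these two answers follows. The function names are hypothetical, and it uses the white-space: nowrap variant; swapping the inline style for display: inline-block gives the other variant.

```javascript
// Escape text so it can be placed inside HTML markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wraps each run of non-space characters in a span
// that forbids line breaks inside it; white space is kept between spans.
function wrapWords(text) {
  return text
    .split(/(\s+)/) // keep the separators so spacing is preserved
    .map((part) =>
      part === "" || /^\s+$/.test(part)
        ? part
        : `<span style="white-space: nowrap;">${escapeHtml(part)}</span>`
    )
    .join("");
}
```

The output would be assigned to the innerHTML of the element that holds the answer text, so that lines can only break at the spaces between the spans.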
I've been looking in to this for a project I'm working on. I think https://www.w3.org/TR/css-text-3/#line-breaking covers this, particularly "Example 5":
As another example, Korean has two styles of line-breaking: between any two Korean syllables (word-break: normal) or, like English, mainly at spaces (word-break: keep-all).
Indeed, the default behavior of Google Chrome (version 61) is that it breaks between syllables (I assume, as I don't speak or read Korean).
Setting word-break: keep-all seems to override this behavior and put line wraps at white space only.
word-break: keep-all; fixed it for me. I apply it only when the user is viewing the site in Korean.
word-break: keep-all; -> "Word breaks should not be used for Chinese/Japanese/Korean (CJK) text. Non-CJK text behavior is the same as value 'normal'" (https://www.w3schools.com/cssref/css3_pr_word-break.asp)
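Applying keep-all only for Korean, as one answer above describes, might be done along these lines in JavaScript. The function name is invented, and reading the language from the lang attribute of the html element is an assumption about how the site marks its language.

```javascript
// Hypothetical helper: picks the word-break value for a language tag
// such as "ko", "ko-KR" or "sv".
function wordBreakForLang(lang) {
  const primary = String(lang || "").toLowerCase().split("-")[0];
  return primary === "ko" ? "keep-all" : "normal";
}

// In a browser this could be applied as:
//   document.body.style.wordBreak =
//     wordBreakForLang(document.documentElement.lang);
```

The two commented lines are left inactive so the snippet also runs outside a browser.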