My HTML is using custom font to display special symbols. There are a lot of symbols, but only one of them is not displayed. It has the code 00AD
<table id="table_symbols">
<tr>
<td class="symbol"></td>
<td>Pair of bishops</td>
</tr>
</table>
I've searched and found that this is a special code for the soft hyphen character in HTML, so looks like the character is not taken from the font, but replaced with soft hyphen.
Is there any way to forbid such replacement? Or it is easier to edit the font?
Read this:
http://unicode-table.com/en/00AD/
https://www.cs.tut.fi/~jkorpela/shy.html
The Unicode character 00AD is defined to be invisible, except at the
end of a line, where it may or may not be visible, depending on the
script.
Anyway what you used was not the html correct code, instead or are used in html.
Use this example to understand the behavior of the element; (Note resize slowly and watch as the text has hyphen on the end but on normal view there is not).
This is the trick I used when making design for Hiring ads and the client wants pure grammar and also justified effect with no inner spaces(languages like German have big words and if no hyphens words will spread apart when justified).
Related
I'm having a hard time with making BiDi strings work inside an HTML textarea as I'd expect.
This test string contains both Arabic and English, plus sequences of pseudo-tags (<1/>, <2/>), which are composed of neutral-direction characters (<, >, /, numbers) and should inherit their direction by the strong-direction character before them.
Given that these pseudo-tags are positioned after both RTL and LTR text, I need to force the direction of the text putting one LRM (U+200E, ) char before each pseudo-tags.
The result it's not what I expected:
Note that the textarea has the direction property set as follow: dir='rtl'
Tested with both Chrome and FF, none of them seems to work as expected. Am I missing something?
Results on Jsfiddle are even different: https://jsfiddle.net/o7d2ymdc/1/
Unfortunately, displaying these inside a textarea is going to be extremely difficult, if at all possible.
There are several issues that are at play here, among them is the fact that brackets and parentheses are mirrored in the Unicode Bidirectional algorithm: This <span dir="ltr"><</span> is rendered as '<', while this <span dir="rtl"><</span> is rendered as '>'. And all of this is added on top of the fact that we have different definitions of "end of string" in either of the RTL and LTR strings.
Your best bet could be using ContentEditable. You can display editable rich text - that is actually html nodes - and essentially isolate your RTL pieces from the HTML markup properly with spans, as if you would have statically displayed it. However, if this textbox allows for custom user-generated text, you may need to come up with a good algorithm that wraps the bidirectional text automatically as the user types, which can be a pretty big challenge.
If this helps, you're not the only one to deal with this. If you edit HTML blocks in Arabic Wikipedia, for example, you will see the exact same problem (which makes editing HTML and wikitext a fairly big challenge)
This problem is also one of the reasons why people prefer a WYSIWYG editor - that has proper contextual and conceptual separation between the markup/style and the text itself.
I'm facing a problem here: I have a text in Korean decimal Unicode and the text is displayed in 4 columns and many rows (as it's the answers of a language test). The problem is that because the width of each answer is 20%, the sentence splits randomly in the middle of the word when it doesn't fit, instead of in the spaces between words. I don't know how to treat this, since this text is loaded and displayed automatically from a database.
The HTML code for each one of the 4 columns is like this:
<table class="courses" border="0" cellpadding="2" cellspacing="2" width="100%" style="font-size:13px;">
<tbody>
<td width="20%">
<p align="center">
<input name="a[X]" value=1" type="radio">
<br>
<?php echo "바쁘면 가지 마세요" ?> // this comes from a DB, its the unicodes of the korean characters<br>
</p>
</td>
</tbody>
</table>
What could I do to fix this and, when it doesn't fit, avoid splitting randomly, but do it when a sentence ends? If you notice in the Unicode codes, you can tell there's a space between ;면 가, but it breaks just anywhere, the same for all the text.
(Note that there aren't any encoding problems, the Korean characters are displayed properly. And it doesn't happen with other languages like Swedish or Spanish).
EDIT
Here's a working example.
Note that in the example, the first answer is split in the last two characters, when that word has five characters, so should be split 3 chars before.
Line breaking for CJK (Chinese/Japanese/Korean) text can be quite problematic given the current state of web standards.
There is not too much you can do in a language-agnostic manner; CSS level 3 defines related attributes (line-break and word-break), but I 'm not so sure what the support level is accross modern browsers (obviously not-so-modern browsers are entirely out of the picture).
It doesn't really matter as Korean can be split anywhere anyway. See this screenshot from Chosun.com:
The words are cut anywhere, seemingly randomly. You don't need to worry about hyphenation.
It sounds like you are just encountering the default behavior of white-space. You could take a look at the CSS white-space property and try something like pre.
I'm facing the same issue and I think the best solution is to wrap each word in a span with white-space: nowrap;. This makes sure there won't be any line break inside the word.
See this JSFiddle for a proof of concept: http://jsfiddle.net/we7jx08r/. When you change body's width, you'll notice that the line breaks are always correct.
See http://css-tricks.com/almanac/properties/w/whitespace/ for white-space: nowrap browser support (IE5.5+, FF1+, Safari 1+).
You could try a solution I worked on a while ago: https://stackoverflow.com/a/46714474/2114953
That is, if you like to use JS in order to wrap each word into span HTML elements and then use CSS display: inline-block to force words to go to new lines if need be.
I've been looking in to this for a project I'm working on. I think https://www.w3.org/TR/css-text-3/#line-breaking covers this, particularly "Example 5":
As another example, Korean has two styles of line-breaking: between any two Korean syllables (word-break: normal) or, like English, mainly at spaces (word-break: keep-all).
Indeed, the default behavior of Google Chrome (version 61) is that it breaks on the syllables (I assume as I don't speak or read Korean)
Setting word-break: all, seems to override this behavior and put line wraps in on white spaces only.
word-break: keep-all; Fixed it for me. I added this if the user is view the site in Korean specifically.
word-break: keep-all; -> "Word breaks should not be used for Chinese/Japanese/Korean (CJK) text. Non-CJK text behavior is the same as value 'normal'" (https://www.w3schools.com/cssref/css3_pr_word-break.asp)
I don't speak Arabic, but I need specific support for Arabic on our web. I need parts of Arabic words to be in a <span> with a different style than the rest of word. When I type two characters ش and س, they are composed into word شس, but when I use HTML markup
<span>ش</span>س
these letters are not concatenated right in the output.
In the picture, desired output is on second line, actual output is on first line.
EDIT: It works on Firefox, but does not work in Chrome/Safari.
Insert a zero-width joiner (e.g. using the entity reference ) at the end of the span element content: <span>ش</span>س.
More generally, the zero-width joiners at the start and end of each span element as well as (just to be more sure) before and after each span element, in situations where the text should have cursive (joining) behavior and span may break it.
The issue is discussed and illustrated on the Bidirectional text page by Andreas Prilop.
Update: Unfortunately, it seems that even does not help on current versions of WebKit browsers. They seem to treat HTML markup as breaking joining behavior, no matter what.
Update 2: As described in #NasserAl-Wohaibi’s comment, the new problem can be solved by using twice. However, in current Safari (5.1.7) for Windows, it does not help; in fact, it displays even شس wrong whereas without the joiner, it shows شس correctly.
This is actually a reported bug in WebKit, thus presumably affects all WebKit-based browsers.
As Jukka K. Korpela indicated, This is mostly a bug in most WebKit-based browsers(chrome, safari, etc).
A simple hack other than the TAMDEED char or getting contextual forms for Arabic letters would be to put the zero-width-joiner ( or ) before/after the letter you want to be treated as single Arabic ligature - two chars making up another one. e.g.
<p>عرب<span style="color: Red;">ي</span></p>
demo: jsfiddle
see also the webkit bug report.
In HTML, there is no character for a tab, but I am confused as to why I can copy and paste one here: " " (You can't see the full width of it, but if you click to edit my question, you will see the character.) If I can copy and paste a tab character, there should be a unicode equivalent that can be coded into html. I know it doesn't exist, but this is a mystery I've never been able to grasp.
So my question is: why is there not a unicode character for a tab even if I can copy and paste it?
Sure there's an entity for tabs:
(The tab is ASCII character 9, or Unicode U+0009.)
However, just like literal tabs (ones you type in to your text editor), all tab characters are treated as whitespace by HTML parsers and collapsed into a single space except those within a <pre> block, where literal tabs will be rendered as 8 spaces in a monospace font.
Try
as per the docs :
The character entities and denote an en space and an em
space respectively, where an en space is half the point size and an em
space is equal to the point size of the current font. For fixed pitch
fonts, the user agent can treat the en space as being equivalent to A
space character, and the em space as being equuivalent to two space
characters.
Docs link : https://www.w3.org/MarkUp/html3/specialchars.html
put it in between <pre></pre> tags then use this characters
it would not work without the <pre></pre> tags
Posting another alternative to be more complete. When I tried the "pre" based answers, they added extra vertical line breaks as well.
Each tab can be converted to a sequence non-breaking spaces which require no wrapping.
" "
This is not recommended for repeated/extensive use within a page. A div margin/padding approach would appear much cleaner.
I use <span style="display: inline-block; width: 2ch;"> </span> for a two characters wide tab.
Tab is [HT], or character number 9, in the unicode library.
As mentioned, for efficiency reasons sequential spaces are consolidated into one space the browser actually displays. Remember what the ML in HTML stand for. It's a Mark-up Language, designed to control how text is displayed.. not whitespace :p
Still, you can pretend the browser respects tabs since all the TAB does is prepend 4 spaces, and that's easy with CSS. either in line like ...
<div style="padding-left:4.00em;">Indenented text </div>
Or as a regular class in a style sheet
.tabbed {padding-left:4.00em;}
Then the HTML might look like
<p>regular paragraph regular paragraph regular paragraph</p>
<p class="tabbed">Indented text Indented text Indented text</p>
<p>regular paragraph regular paragraph regular paragraph</p>
Does HTML support any kind of hinting for (title) line-breaks? The problem is that often page titles or navigational titles have funny line breaking, as they happen to break wherever the current HTML element spacing allows. Manual line-breaks would give much nicer visuals, when line-breaks are needed.
The line-break hints should be something the editor can himself/herself enter to an <input type=text> field.
There's the (soft hyphen) entity and the <wbr> tag (HTML5) for word break marks. Notice that will create a hyphen if the line breaks, while <wbr> won't.
See also http://www.quirksmode.org/oddsandends/wbr.html.
So an editor could use for word breaking if he prefers a dash (-), or if he prefers only a word break.