Korean sentences being split randomly - html

I'm facing a problem here: I have a text in Korean decimal Unicode and the text is displayed in 4 columns and many rows (as it's the answers of a language test). The problem is that because the width of each answer is 20%, the sentence splits randomly in the middle of the word when it doesn't fit, instead of in the spaces between words. I don't know how to treat this, since this text is loaded and displayed automatically from a database.
The HTML code for each one of the 4 columns is like this:
<table class="courses" border="0" cellpadding="2" cellspacing="2" width="100%" style="font-size:13px;">
<tbody>
<td width="20%">
<p align="center">
<input name="a[X]" value=1" type="radio">
<br>
<?php echo "바쁘면 가지 마세요" ?> // this comes from a DB, its the unicodes of the korean characters<br>
</p>
</td>
</tbody>
</table>
What could I do to fix this and, when it doesn't fit, avoid splitting randomly, but do it when a sentence ends? If you notice in the Unicode codes, you can tell there's a space between ;면 &#44032, but it breaks just anywhere, the same for all the text.
(Note that there aren't any encoding problems, the Korean characters are displayed properly. And it doesn't happen with other languages like Swedish or Spanish).
EDIT
Here's a working example.
Note that in the example, the first answer is split in the last two characters, when that word has five characters, so should be split 3 chars before.

Line breaking for CJK (Chinese/Japanese/Korean) text can be quite problematic given the current state of web standards.
There is not too much you can do in a language-agnostic manner; CSS level 3 defines related attributes (line-break and word-break), but I 'm not so sure what the support level is accross modern browsers (obviously not-so-modern browsers are entirely out of the picture).

It doesn't really matter as Korean can be split anywhere anyway. See this screenshot from Chosun.com:
The words are cut anywhere, seemingly randomly. You don't need to worry about hyphenation.

It sounds like you are just encountering the default behavior of white-space. You could take a look at the CSS white-space property and try something like pre.

I'm facing the same issue and I think the best solution is to wrap each word in a span with white-space: nowrap;. This makes sure there won't be any line break inside the word.
See this JSFiddle for a proof of concept: http://jsfiddle.net/we7jx08r/. When you change body's width, you'll notice that the line breaks are always correct.
See http://css-tricks.com/almanac/properties/w/whitespace/ for white-space: nowrap browser support (IE5.5+, FF1+, Safari 1+).

You could try a solution I worked on a while ago: https://stackoverflow.com/a/46714474/2114953
That is, if you like to use JS in order to wrap each word into span HTML elements and then use CSS display: inline-block to force words to go to new lines if need be.

I've been looking in to this for a project I'm working on. I think https://www.w3.org/TR/css-text-3/#line-breaking covers this, particularly "Example 5":
As another example, Korean has two styles of line-breaking: between any two Korean syllables (word-break: normal) or, like English, mainly at spaces (word-break: keep-all).
Indeed, the default behavior of Google Chrome (version 61) is that it breaks on the syllables (I assume as I don't speak or read Korean)
Setting word-break: all, seems to override this behavior and put line wraps in on white spaces only.

word-break: keep-all; Fixed it for me. I added this if the user is view the site in Korean specifically.
word-break: keep-all; -> "Word breaks should not be used for Chinese/Japanese/Korean (CJK) text. Non-CJK text behavior is the same as value 'normal'" (https://www.w3schools.com/cssref/css3_pr_word-break.asp)

Related

Is there a CSS solution for removing spaces for non-space using languages?

Some languages don't use space. Japanese for example.
A typical paragraph might look like this (taken from the Japanese Wikipedia article on Stack Overflow)
本サービスはコンピュータ・プログラミングの広範囲なトピックを扱っていることが特色である。ウェブサイトは質問と回答を行う機能、またそれらに対する評価付け、wikiやdiggに似た文書の編集機能を備えており、ユーザの活発な参加を促している。Stack Overflowのユーザは良質な回答を行うことによって、評価ポイントや「バッヂ」を得ることができ、本サービスは伝統的なQ&Aサイト・フォーラムにゲーミフィケーションを施したものと言える。全てのユーザによる記述内容はクリエイティブ・コモンズライセンス下にある。
Even though there are 3 sentences in the paragraph above the only space in inside Stack Overflow.
So there's the issue. Japanese users generally don't write long sentences and paragraphs with no breaks. To write the paragraph above most people would not write.
<p>本サービスはコンピュータ・プログラミングの広範囲なトピックを扱っていることが特色である。ウェブサイトは質問と回答を行う機能、またそれらに対する評価付け、wikiやdiggに似た文書の編集機能を備えており、ユーザの活発な参加を促している。Stack Overflowのユーザは良質な回答を行うことによって、評価ポイントや「バッヂ」を得ることができ、本サービスは伝統的なQ&Aサイト・フォーラムにゲーミフィケーションを施したものと言える。全てのユーザによる記述内容はクリエイティブ・コモンズライセンス下にある。</p>
They'd write something more along the lines of
<p>
本サービスはコンピュータ・プログラミングの広範囲なトピックを扱っていることが特色である。
ウェブサイトは質問と回答を行う機能、またそれらに対する評価付け、wikiやdiggに似た文書の
編集機能を備えており、ユーザの活発な参加を促している。Stack Overflowのユーザは
良質な回答を行うことによって、評価ポイントや「バッヂ」を得ることができ、本サービスは
伝統的なQ&Aサイト・フォーラムにゲーミフィケーションを施したものと言える。全てのユーザに
よる記述内容はクリエイティブ・コモンズライセンス下にある。
</p>
Which unfortunately becomes this
With all these unwanted gaps
The only solution I can think of requires JavaScript to go through and remove spaces between Japanese characters and any other character at display time or by adding a build step.
Is there a CSS only solution?
Here's a live sample: The first paragraph is one long hard to edit line. The 2nd paragraph has the line breaks in it
<p>
本サービスはコンピュータ・プログラミングの広範囲なトピックを扱っていることが特色である。ウェブサイトは質問と回答を行う機能、またそれらに対する評価付け、wikiやdiggに似た文書の編集機能を備えており、ユーザの活発な参加を促している。Stack Overflowのユーザは良質な回答を行うことによって、評価ポイントや「バッヂ」を得ることができ、本サービスは伝統的なQ&Aサイト・フォーラムにゲーミフィケーションを施したものと言える。全てのユーザによる記述内容はクリエイティブ・コモンズライセンス下にある。
</p>
<p>
本サービスはコンピュータ・プログラミングの広範囲なトピックを扱っていることが特色である。
ウェブサイトは質問と回答を行う機能、またそれらに対する評価付け、wikiやdiggに似た文書の
編集機能を備えており、ユーザの活発な参加を促している。Stack Overflowのユーザは
良質な回答を行うことによって、評価ポイントや「バッヂ」を得ることができ、本サービスは
伝統的なQ&Aサイト・フォーラムにゲーミフィケーションを施したものと言える。全てのユーザに
よる記述内容はクリエイティブ・コモンズライセンス下にある。
</p>
Here are screenshots to show the difference.
1st paragraph with no breaks in HTML
2nd with
Also note that whatever solution it should not collapse the space in Stack Overflow
Use white-space: pre or white-space: pre-wrap
Reference: https://www.w3schools.com/cssref/pr_text_white-space.asp
JSFiddle : https://jsfiddle.net/kqcvp10w/

HTML emsp entity exact behavior

I am having trouble finding an exact reference of the behavior of the HTML emsp entity. I have looked at W3.org, MDN, W3schools, and here, but I have not yet found a description that describes its breaking or wrapping behavior in HTML that does not have any special styling applied.
The code below shows an experiment I resorted to, but I am still a bit confused about when it will and will not wrap. Is there a good reference?
<!DOCTYPE html>
<html>
<style>
body {
font-size: 20px; font-family: Courier, fixed;
}
</style>
<body>
<p>Following is some text with some embedded emsp entities.<br>Here is one mid-word: sam ple. <br>And here is one on each side of a dash: lock - step.<br>Then, how about one after a period?<br>Right after the next period is one and then a normal space.  How about the standard space and then the emsp?<br>That sequence follows this sample sentence.  (Note that since the regular space came first, this can cause this text after it to become indented, whereas the emsp-then-regular space occurrence just before will never do that, I think.) As long as I'm looking at them after the end of a sentence, we should try putting just one emsp after a sentence instead of the regular space caracter.<br>I thought that would stick the two sentences together, but it does not do so here. Indeed, this is consistent with its behavior mid-word. Okay, how about multiple occurrences of it?<br>There are 3 in the brackets here: [   ] and so on. I played with that last one a bit and I cannot get it to break after the 3 emsps. Here, they seem to keep their width (they are not combined into one) and they are not breakable, not even either before the first or after the last one. So, I seem to only be able to get the "[" and never the "]" as the first character on a new line.<br>Okay, more extremely, trying brackets around 5 emsp chars, a word, and 5 more emsp chars: [     word     ] .<br>There we seem able to break before "word", but still never before "]". What's going on there?
</p>
<p>From the examples above, I think the mystery around emsp characters is mostly resolved for me.</p>
<p>Consider the standard behavior for a regular space character. Here, first remember that multiple occurrences of regular whitespace characters are all compressed as if they were a single regular space character. Then, of course, the regular space character takes a certain width, and lines are never broken just before the regular space, only after it. And the space normally allocated for a regular space character does not need to be rendered at the right edge of a box.</p>
<p>Similarly, text can break after an emsp character, but will not do so before. It is wider than a regular space character, but mostly behaves like it. Where it differs is if you have multiple emsp entities right next to each other. In that case, no break will occur before, within, or after the group (unless there is whitespace before or after it, in which case the whitespace is the location of the break). But if a set of multiple emsp characters are placed directly between two non-white characters (as in the bracket example above) then they are not compressed and no breaking occurs. That's all I'm thinking of at the moment ...</p>
</body>
</html>
emsp is a white space having the same width as the letter "M"
I found a brief description at http://opencoder.net/emsp.html
Whether it is a breaking or a non-breaking space, I would say the easiest way to find out is to test it.

Special character from custom font is not displayed in HTML

My HTML is using custom font to display special symbols. There are a lot of symbols, but only one of them is not displayed. It has the code 00AD
<table id="table_symbols">
<tr>
<td class="symbol">­</td>
<td>Pair of bishops</td>
</tr>
</table>
I've searched and found that this is a special code for the soft hyphen character in HTML, so looks like the character is not taken from the font, but replaced with soft hyphen.
Is there any way to forbid such replacement? Or it is easier to edit the font?
Read this:
http://unicode-table.com/en/00AD/
https://www.cs.tut.fi/~jkorpela/shy.html
The Unicode character 00AD is defined to be invisible, except at the
end of a line, where it may or may not be visible, depending on the
script.
Anyway what you used was not the html correct code, instead ­ or ­ are used in html.
Use this example to understand the behavior of the ­ element; (Note resize slowly and watch as the text has hyphen on the end but on normal view there is not).
This is the trick I used when making design for Hiring ads and the client wants pure grammar and also justified effect with no inner spaces(languages like German have big words and if no hyphens words will spread apart when justified).

Two spaces after every full stop in paragraph using CSS?

How do I put two spaces after every full stop in a paragraph using CSS?
Ah, the old "two-spaces-after-a-period" meme rears its ugly head again.
Two spaces after a period is something that pertains to the typewriter world, or the monospaced font world. We moved beyond it long ago, starting with TeX or even before. The point is not to have one or two space characters after a period, but to have a pleasing amount of space there. Algorithms like TeX go to great length to do so. The algorithms in modern web browsers are still primitive by comparison, but are starting to do better. Consider the following:
You'll see that the space after the period is (slightly) greater than the inter-word space, as it should be.
What about the case of justification? You'd hope the browser would put the extra space between sentences, in preference to putting it between words. And that's what happens:
Anyway, so you want more fine-grained control, to realize your own typographical vision on your web pages. The following has four characters between the sentences:
You could also use spaces of different widths from Unicode to get just the amount of space you want (see Wikipedia article).
So is there any way to do this automatically? CSS has a word-spacing property, but no sentence-spacing property (actually, it's not that easy to figure out what a "sentence" is, even in English, and less so in other languages). Of course, putting more spaces in your HTML is not going to do a thing, since HTML treats any run of white space as a single space. So you're going to have to write some code, or find a plug-in, which traverses the text in your page and inserts markup. Or, add a plug-in or something to your CMS to spit out code which is marked up appropriately. Your alternatives for doing so are:
Add or a combination of different-width Unicode spaces.
As another poster suggested, use span tags with margin.
As a variant on the above, use a <span class="sentence"> element, with a CSS rules like .sentence::after { content: "\2002"; }, where 2002 is the "en-space". This results in:
However, the bottom line is that the web is not a typographical environment, notwithstanding the many worthy efforts to nudge it in that direction. Depending on your goals, you might consider creating your documents in a high-end document preparation environment, and publishing them as PDFs, for example.
The two spaces concept after a sentence is not "ugly" - in fact, it's just the opposite. Because of modern font kerning as well as the variety of fonts that Web browsers now support, it's sometimes very difficult to determine if a sentence has ended or if there is simply a word that is abbreviated that requires a period, not to mention a look of constant run-on. With 'fat' letters beginning a sentence, such as an upper-case "W", it can appear as though there is actually no space at all. Adding an additional space after a sentence provides readers with clear breaks. However, I get it that it would be quite difficult to create CSS that could "understand" what a sentence is so that it would automatically insert an additional space after each.
You could put your full stop in a span-tag and give it some CSS attributes, like "margin-right: 5px;", if it's only the appearance you are looking for.
Can only be done if you put your full stop to a tag, like <span>. For example :
www<span>.</span>google<span>.</span>com
Then the css is :
span:after{
content : " "; /*two spaces*/
}

Partially colored Arabic word in HTML

I don't speak Arabic, but I need specific support for Arabic on our web. I need parts of Arabic words to be in a <span> with a different style than the rest of word. When I type two characters ش and س, they are composed into word شس, but when I use HTML markup
<span>ش</span>س
these letters are not concatenated right in the output.
In the picture, desired output is on second line, actual output is on first line.
EDIT: It works on Firefox, but does not work in Chrome/Safari.
Insert a zero-width joiner (e.g. using the entity reference ‍) at the end of the span element content: <span>ش‍</span>س.
More generally, the zero-width joiners at the start and end of each span element as well as (just to be more sure) before and after each span element, in situations where the text should have cursive (joining) behavior and span may break it.
The issue is discussed and illustrated on the Bidirectional text page by Andreas Prilop.
Update: Unfortunately, it seems that even ‍ does not help on current versions of WebKit browsers. They seem to treat HTML markup as breaking joining behavior, no matter what.
Update 2: As described in #NasserAl-Wohaibi’s comment, the new problem can be solved by using ‍ twice. However, in current Safari (5.1.7) for Windows, it does not help; in fact, it displays even ش‍س wrong whereas without the joiner, it shows شس correctly.
This is actually a reported bug in WebKit, thus presumably affects all WebKit-based browsers.
As Jukka K. Korpela indicated, This is mostly a bug in most WebKit-based browsers(chrome, safari, etc).
A simple hack other than the TAMDEED char or getting contextual forms for Arabic letters would be to put the zero-width-joiner (‍ or ‍) before/after the letter you want to be treated as single Arabic ligature - two chars making up another one. e.g.
<p>عرب‍<span style="color: Red;">‍ي</span></p>
demo: jsfiddle
see also the webkit bug report.