For a recent project I've been working on a simple word processor, and because I need fine-grained control I have had to implement a lot of the text shaping* myself. Most of this is fairly straightforward and described in detail in places like here and here.
It's less obvious how to handle pressing down or up on the keyboard when dealing with non-monospaced text split across many lines. In monospaced text the algorithm is simple: move the text caret down one line and keep it the same number of characters from the start of the line as before. But what about variable-width fonts? I've tried an algorithm like this (in pseudocode):
; Return text offset into next line after navigating down
function moveCaretDown():
    Move text caret to start of next line
    targetPixelOffset := previous pixel offset of caret in line above
    textOffsetIntoLine := 0
    pixelOffsetIntoLine := 0
    prevDelta := Infinity
    for each char in text of new line:
        delta := abs(pixelOffsetIntoLine - targetPixelOffset)
        ; We are now further from the desired caret offset than before, so the
        ; previous slot must be the closest one to the caret's previous
        ; horizontal offset in this line.
        if (delta > prevDelta):
            return textOffsetIntoLine - 1
        prevDelta := delta
        pixelOffsetIntoLine += measureWidth(char)
        textOffsetIntoLine++
    ; Else return the offset of the last character in the line
    return length of new line - 1
But I've found its behavior differs from text inputs in major web browsers and/or text editors (I can come up with some specific examples if needed). Is there some standard algorithm for this used by GUI toolkits or text shaping libraries? I was surprised I couldn't find a W3C standard on it, for example, considering this is behavior needed in every web browser.
* Inserting line breaks into a string at the correct places, handling ragged or fully-justified text, etc.
I don't think there's a standard other than to follow the Principle of Least Astonishment. Nowadays, that typically means seeing what the major applications do, since that will likely be familiar behavior to the user.
On the current line, you know the current horizontal offset. Let's call it x. I'm talking about the pixel position, not the number of characters or glyphs since the beginning of that line.
On the destination line, there is a set of horizontal offsets the caret can be placed (e.g., between glyphs). So you want to pick the one of those that's as similar to your current x as possible.
Furthermore, if the user moves the caret vertically several times in a row, you probably want to find the nearest to the original x. The caret may wiggle horizontally a bit as the user moves up and down, but you don't want it to drift. Once the user does something that intentionally changes the horizontal offset (e.g., inserts a character, uses a horizontal arrow, clicks the mouse, etc.) that's the best time to update x.
If you already have code to find the closest caret position to a mouse click, you might be able to re-use it as though the user had clicked the point exactly one line above or below the current x.
I've also seen some editors (including monospace text editors) that treat the end of the line as a special case. So if you move up or down when you're at the end of a line, you move to the end of the preceding or succeeding line. That seems a nice way to handle ragged right text and short lines at the end of a paragraph.
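The approach in this answer can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `measure_width`, `caret_stops`, `nearest_offset` and the `Caret` class are hypothetical names, and summing per-character widths ignores kerning, ligatures and bidirectional text, which a real shaping engine would have to handle.

```python
from bisect import bisect_left
from typing import Callable, List, Optional


def caret_stops(line: str, measure_width: Callable[[str], float]) -> List[float]:
    """Pixel x of every caret slot: before the first char, between chars, after the last."""
    stops = [0.0]
    for ch in line:
        stops.append(stops[-1] + measure_width(ch))
    return stops


def nearest_offset(stops: List[float], x: float) -> int:
    """Index of the caret slot whose pixel x is closest to x (ties go to the left slot)."""
    i = bisect_left(stops, x)
    if i == 0:
        return 0
    if i == len(stops):
        return len(stops) - 1
    return i if stops[i] - x < x - stops[i - 1] else i - 1


class Caret:
    def __init__(self, lines: List[str], measure_width: Callable[[str], float]):
        self.lines = lines
        self.measure_width = measure_width
        self.line = 0
        self.offset = 0
        # Remembered x for a run of consecutive up/down moves; None means "not set".
        self.goal_x: Optional[float] = None

    def move_vertical(self, delta: int) -> None:
        target = self.line + delta
        if not 0 <= target < len(self.lines):
            return
        if self.goal_x is None:
            # First vertical move in a run: remember the original x.
            stops = caret_stops(self.lines[self.line], self.measure_width)
            self.goal_x = stops[self.offset]
        self.line = target
        stops = caret_stops(self.lines[target], self.measure_width)
        self.offset = nearest_offset(stops, self.goal_x)

    def move_horizontal(self, delta: int) -> None:
        end = len(self.lines[self.line])
        self.offset = max(0, min(end, self.offset + delta))
        self.goal_x = None  # an intentional horizontal change resets the goal


# Hypothetical widths: 'W' is wide, 'i' is narrow, everything else is 6 px.
widths = {"W": 12.0, "i": 3.0}
caret = Caret(["WWWW", "iiiiiiiiiiii", "WW"], lambda ch: widths.get(ch, 6.0))
caret.offset = 3       # x = 36 on the first line
caret.move_vertical(1)  # lands on the slot nearest x = 36 in the narrow line
caret.move_vertical(1)  # short line: clamps to its end, goal x is kept
caret.move_vertical(-1)
caret.move_vertical(-1)  # back on the first line at the original offset
```

Because the goal x is only reset by horizontal moves, the caret can wiggle on short lines without drifting away from its original column.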
How does <div dir=auto>bla bla שלום bla</div> work?
By majority of words? First character?
I'm looking for a solution that looks at the majority of characters in a span in order to know its direction (an angularjs directive would also be good, but if this is already built in HTML I'd like to know).
When an element has its dir set to "auto", the direction of the
element is determined based on its first strong directionality
character, or default to the directionality of its parent element.
https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement.dir
Characters with the left-to-right, right-to-left, and right-to-left
Arabic types are called “strong directional”, or just “strong”
characters. Numbers are a special case; their reading direction is
always left to right, but they do not affect the reading direction of
neighboring characters. Even numbers that are displayed with
Arabic-Indic digits have a left-to-right character direction.
In the example you posted, it'll be ltr.
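To make the "first strong character" rule concrete, here is a minimal Python sketch that mimics it with the standard `unicodedata` module. The function name `first_strong_direction` is made up for illustration, and the sketch ignores details a browser handles, such as nested elements and isolates.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Return 'ltr' or 'rtl' based on the first strong character, or None if there is none."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic, ...)
            return "rtl"
    return None  # no strong character: the caller would fall back to the parent's direction


print(first_strong_direction("bla bla שלום bla"))
print(first_strong_direction("123 שלום bla"))
```

Digits, spaces and punctuation are skipped because they are not strong characters, so the second call is decided by the first Hebrew letter rather than by the leading number.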
This is more a sort of curiosity. While working on a multilingual web application I noticed that certain characters like punctuation marks (!?.;,) at the end of a block element are rendered as if they were placed at the beginning instead when the writing direction is right-to-left (as it is the case for certain Asian languages I do not speak).
In other words, the string
Hello, World!
is rendered as
!Hello, World
when placed in a div block with direction: rtl
This becomes even more evident if the text is split in two parts and given different colors: a contiguous chunk of text at the end is rendered in two separated regions:
http://jsfiddle.net/22Qk9/
What's the point of this behavior? I guess this must be a peculiarity of (all?) right-to-left languages which is automatically handled by the browser, so I don't need to care about it, or should I?
If you want to fix this behavior, add the LRM character at the end. It's a non-printing character.
Source : http://dotancohen.com/howto/rtl_right_to_left.html
Example : http://jsfiddle.net/yobjj6ed/
The reason is that the exclamation mark “!” has the BiDi class ON (Other Neutral), which effectively means that it adapts to the directionality of the surrounding text. In the example, the paragraph direction is right to left, so the mark is placed after the preceding text in that direction, i.e. to its left. This is quite correct for languages written right to left: the terminating punctuation mark appears at the end, i.e. on the left.
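If you want to check these classes yourself, Python's standard `unicodedata` module exposes them. The snippet below is only an illustration; the helper `append_lrm` is a hypothetical name for the LRM trick mentioned in the other answer.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, an invisible strong left-to-right character

# Print the bidi class of a Latin letter, a Hebrew letter, "!" and LRM.
for ch in ["H", "ש", "!", LRM]:
    print(repr(ch), unicodedata.bidirectional(ch))


def append_lrm(text: str) -> str:
    """Append LRM so trailing neutral punctuation sits between two strong LTR characters."""
    return text + LRM


fixed = append_lrm("Hello, World!")
```

With the mark appended, the "!" is surrounded by left-to-right characters on both sides, so it resolves to left-to-right even inside a right-to-left block.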
Normally, you use the CSS code direction: rtl or, preferably, the HTML attribute dir=rtl for texts in a language that is written right to left, and only for them. For them, this behavior is a solution, not a problem.
If you instead use direction: rtl or dir=rtl just for special effects, like making table columns laid out right to left, then you need to consider the implications. For example, in the table case, you would need to set direction to ltr for each cell of the table (unless you want them to be rendered as primarily right to left text).
If you have, say, an English sentence quoted inside a block of Arabic text, then you need to set the directionality of an element containing the English text to ltr, e.g.
<blockquote dir=ltr>Hello, World!</blockquote>
A similar case (just with Arabic inside English text) is discussed as use case 6 in the W3C document What you need to know about the bidi algorithm and inline markup (which has a few oddities, though, like using cite markup for quoted text, against W3C recommendations).
The accepted answer https://stackoverflow.com/a/20799360/477420 works if you can control the markup/CSS around the value. If you have no control over the HTML, the following approach could work.
If you don't know whether the page will be rendered RTL or LTR, but some text is definitely LTR (i.e. English-only), you can wrap the value in LRE/PDF marks to signal that it is an LTR region. The text will then be rendered LTR irrespective of the page's direction.
This works when your code renders text without being able to change the markup that will surround it on the page, e.g. when you render the value of a "song title" or "company name" field in some nested child component (or on the server side) with no control over the surrounding HTML elements.
One drawback of this and similar approaches that add marks to the text (like the LRM proposal in this question) is that copying such a value from the resulting HTML page will generally preserve the marks, even though they are invisible and zero-width. In most cases that is fine, but consider whether it is a problem for you.
Approximate sample code (some company names end in "Inc.", and the dot ends up at the beginning when rendered as-is on an RTL page):
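// companyName = "Alphabet Inc." - the dot should stay at the end, even on an RTL page
// stringIsDefinitelyAscii is a placeholder for your own check
if (stringIsDefinitelyAscii(companyName))
{
    // U+202A is LRE, U+202C is PDF
    companyName = "\u202A" + companyName + "\u202C";
}
return companyName;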
Details on LRE/PDF symbols can be found in https://unicode.org/reports/tr9/#Explicit_Directional_Embeddings:
LRE U+202A LEFT-TO-RIGHT EMBEDDING
Treat the following text as embedded left-to-right.
PDF U+202C POP DIRECTIONAL FORMATTING
End the scope of the last LRE, RLE, RLO, or LRO.
Some approaches to figure out if string has RTL characters can be found in How to detect whether a character belongs to a Right To Left language?, JavaScript: how to check if character is RTL?, How to detect if a string contains any Right-to-Left character?.
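As one possible way to combine the two ideas above (detecting RTL characters and wrapping in LRE/PDF), here is a small Python sketch. The function names `contains_rtl` and `wrap_ltr_if_safe` are made up for illustration, and treating "no R or AL character" as "safe to wrap" is an assumption that may be too loose for some data.

```python
import unicodedata

LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def contains_rtl(text: str) -> bool:
    """True if any character has a strong right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def wrap_ltr_if_safe(text: str) -> str:
    """Wrap text in LRE ... PDF unless it contains right-to-left characters."""
    if contains_rtl(text):
        return text
    return LRE + text + PDF


company = wrap_ltr_if_safe("Alphabet Inc.")
```

A value that does contain Hebrew or Arabic letters is returned unchanged, so the page's own direction still applies to it.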
I am url encoding a string of text to pass along to a function. However, it encodes the second space in a double-space as "%A0". This means that when I decode the string, the "%A0" is displayed as a question mark in a black box.
I really just need to be able to remove the extra space, but I'd like to understand what is causing this and how to handle it correctly.
For example:
Something Something else
Encodes to:
Something+%A0Something+else
%A0 indicates a NBSP (U+00A0). + indicates a normal space (U+0020). The NBSP displays as a replacement character (U+FFFD) because the encoding of the character does not match the encoding of the page, so its byte sequence is not valid for the page.
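The mismatch can be reproduced with Python's `urllib.parse`. This is just a demonstration of the effect, assuming the string was encoded as Latin-1 and later decoded as UTF-8; `collapse_spaces` is a hypothetical helper for the "remove the extra space" part of the question.

```python
import re
from urllib.parse import quote_plus, unquote_plus

text = "Something \u00a0Something else"  # a normal space followed by a NBSP

# Encoded as Latin-1, the NBSP becomes the single byte 0xA0.
as_latin1 = quote_plus(text, encoding="latin-1")
# Encoded as UTF-8, the same character becomes the two bytes 0xC2 0xA0.
as_utf8 = quote_plus(text, encoding="utf-8")

# Decoding the Latin-1 form as UTF-8 yields U+FFFD, the replacement character.
decoded_wrong = unquote_plus(as_latin1, encoding="utf-8", errors="replace")
# Decoding with the matching encoding round-trips correctly.
decoded_right = unquote_plus(as_latin1, encoding="latin-1")


def collapse_spaces(s: str) -> str:
    """Replace any run of spaces and NBSPs with a single normal space."""
    return re.sub(r"[ \u00a0]+", " ", s)


print(as_latin1)
print(as_utf8)
print(collapse_spaces(decoded_right))
```

Normalizing the text before encoding, or using the same encoding on both ends, avoids the replacement character.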
A quick Googling shows that %A0 is the non-breaking space character, or &nbsp; in HTML. A + is the form-encoding for a standard space character.
Source
The problem you're having is that the second "space" is not really a space. It's a character for which the font doesn't have a glyph (I think that's the term), hence the black box with the question mark. %A0 is the escape code for that character. Your code is technically handling it correctly; I think the problem is with whatever is generating the string in the first place.
If I refer to the chart on this page, %A0 is not a space. %20 is the space character's encoded value.
I am developing an application where I have to display the names of users in the following structure:
In the above structure, the name field may exceed the right border of the outer <div> tag. I want to cut the name value just before it touches the right border and append the string '...' at the end, just like below:
How can I make it work for UTF-8, Unicode or normal English letters in the name field?
P.S. I'm using PHP for server-side processing.
The easiest way is to set some CSS on that div...
<div style="white-space:nowrap; overflow:hidden; text-overflow:ellipsis;">Er. Christopher Allen (ChristAllenMoreTextMoreText)</div>
You have to set the white-space property as well, otherwise you won't ever get to the ellipsis point. You also need overflow:hidden, because text-overflow only takes effect when the overflowing text is clipped.
Big caveat though! This does not work in all browsers. IE7 and beyond, Safari/Chrome, are all fine. I believe there are issues with it in Firefox though.
Edit: Here is a Firefox workaround. Not amazing though.
Since you can't know the exact width of the container on the client side (in the user's browser), you must use Javascript for this.
You can't just let it wrap?
Perfecting a server-side character limiter so it cuts off just before the line-end will be tough, if not impossible, unless using a monospace font. And you can't account for the user's browser settings, so the solution will always be flawed. If any character limiter is created, it will need to be client-side.
What about setting overflow: hidden on the name element?
I personally do mine on the back-end with PHP using the following custom function:
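function trunc($string, $limit, $break = " ", $pad = "...") {
    // Return with no change if the string is no longer than $limit characters.
    // The mb_* functions (mbstring extension) count characters, not bytes,
    // so a UTF-8 name is never cut in the middle of a multi-byte character.
    if (mb_strlen($string, 'UTF-8') <= $limit) return $string;
    $string = mb_substr($string, 0, $limit, 'UTF-8');
    // Back up to the last $break so that no word is cut in half.
    if (false !== ($breakpoint = mb_strrpos($string, $break, 0, 'UTF-8'))) {
        $string = mb_substr($string, 0, $breakpoint, 'UTF-8');
    }
    return $string . $pad;
}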
Sure, it's not terribly snazzy, and it doesn't exactly auto-adjust, but my project requirements demand that everything works 100% correctly in IE6, and this works without fail.
If you HAVE to do it on the front end, try 3 dots: http://tpgblog.com/threedots/
As Brad suggested: CSS is the way to go. Flexible presentation is better handled on the client side. However, most browsers do not implement this specific CSS3 feature (yet). In the meantime you could try one of these jQuery (javascript) plugins that makes the same functionality available to all browsers.
http://plugins.jquery.com/plugin-tags/ellipsis
http://plugins.jquery.com/project/shorten
http://plugins.jquery.com/project/ThreeDots
http://plugins.jquery.com/project/text-overflow
http://plugins.jquery.com/project/AutoEllipsis
Are you using a database? If so you can write a conditional statement to check if the name is over a certain length and if so append ...
You might want to define your width in em and then count characters on the PHP side to approximate where you need to cut and append your ellipsis.
In effect you need to work out the pixel length of the string and keep chopping characters until it falls below the threshold. I could explain that on the server side if you had .NET at hand, but you don't, so maybe this JavaScript might help on the client: Truncate a string nicely to fit within a given pixel width