How to properly display Hebrew in text widget? - tcl

I'm using Manjaro Linux KDE and the most recent versions of Tcl and Tk, and am attempting to display Hebrew in a text widget. In testing, the Hebrew text was pasted into the Tcl script in the Kate text editor and appears in the correct order, right to left with compound characters.
Without setting a specific font in Tcl/Tk, the text prints from left to right and the components of compound characters are separated, so that the vowel points and cantillation marks appear as separate characters. After switching to the SBL Hebrew font, the words look better, but the vowel points are not positioned properly and the text is still written from left to right. I tried using the \u200f and \u200e marks, but they made no difference; I really don't know what I'm doing there and simply tried prefixing and suffixing them to the Hebrew word. Reversing the string helps, but the vowel points are still not combined with the consonants.
I'm not using Tkinter but this older SO post seems to indicate that it is a Linux issue with Tcl.
If I extract Hebrew from SQLite using Tcl and write it to the command line using puts, it displays correctly. Also, if I copy the reversed text from the Tk text widget and paste it in this SO question, it is displayed in the correct order. To clarify, by reversed here, I don't mean using string reverse but simply that it appears reversed in Tk but when pasted in this SO box, it displays correctly.
Would you please tell me what I'm doing wrong and how to get it to display properly?
I tried to follow this document on internationalization and encoding in Tcl, but I don't see how it affects displaying Hebrew in a text widget. I also came across a web site that has code for a Unicode editor that displays several languages, including Hebrew, but I can't follow that code either. I tried running the code and, if I select the Hebrew language, it writes right to left, but I don't see vowel points or cantillation marks; then again, I don't know much about typing Hebrew.
Thank you.
# Minimal setup assumed by the lines below: a text widget named .tw.
package require Tk
text .tw
pack .tw
.tw tag configure heb -font {"SBL Hebrew" 18 normal}
.tw insert end "בְּרֵאשִׁ֖ית" "heb"
# Also tried "בְּרֵאשִׁ֖ית\u200f" and "\u200fבְּרֵאשִׁ֖ית".
# and "בְּרֵאשִׁ֖ית\u200e" and "\u200eבְּרֵאשִׁ֖ית".
# Tried .tw insert end [string reverse $h] "heb", which puts the
# consonants in order, but the vowel points and cantillation marks are not correct.
This is the correct rendering.
This is from Tk. The first is in normal order and the second using string reverse. It can be observed that the vowel points are not "on" the consonants and the cantillation marks are not correct. I know little about Hebrew but I can tell they don't match and appear to be printed as separate characters instead of combined. I think what looks like a "t" under the Hebrew letter that looks similar to a "W" is two characters on top of each other-- a dot and the symbol sort of similar to a left parenthesis in the correct rendering.
I don't know why but after rebooting and installing the next batch of updates, not that they have anything to do with Tk, the rendering is different when a font is not set. However, once the SBL Hebrew font is set, then the characters are separated as displayed above.
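To see why the marks can drift away from their consonants, it may help to look at the word as a sequence of code points. The sketch below uses Python's unicodedata module (not Tcl) purely for inspection. The word is spelled out with explicit escapes, and the order of the marks after each consonant is an assumption about the pasted text. It prints each code point's bidirectional class and whether it is a combining mark, then shows what a plain reverse does to the order.

```python
import unicodedata

# The word from the question, spelled out as escapes. The exact order of the
# marks after each consonant is an assumption about the pasted text.
word = (
    "\u05D1\u05B0\u05BC"        # bet + sheva + dagesh
    "\u05E8\u05B5"              # resh + tsere
    "\u05D0"                    # alef
    "\u05E9\u05B4\u05C1\u0596"  # shin + hiriq + shin dot + tipeha
    "\u05D9"                    # yod
    "\u05EA"                    # tav
)

def describe(text):
    """Return (code point, bidi class, is_combining_mark) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), unicodedata.combining(ch) != 0)
        for ch in text
    ]

for code_point, bidi_class, is_mark in describe(word):
    print(code_point, bidi_class, "mark" if is_mark else "base")

# Reversing code point by code point puts every mark *before* its consonant,
# so a renderer that attaches marks to the preceding base gets them wrong.
reversed_word = word[::-1]
print(["mark" if unicodedata.combining(ch) else "base" for ch in reversed_word[:6]])
```

The consonants report class R and the points report NSM (non-spacing mark). The string is stored in logical order, and both the right-to-left layout and the stacking of marks are left to the renderer. U+200F and U+200E are only hints to a bidirectional algorithm, so if the widget does no reordering at all they presumably have nothing to act on.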

I can tell you now that the text renders very close to correctly with Tk on macOS (I'm not sure how much is just font differences, and there's a bit of clipping of the descender decorations that I don't like, but I don't think that's Tk itself doing the wrong thing).
That means that it's definitely a rendering bug that you're seeing. I suspect it might relate to the size of chunks of characters fed into the renderer; if the low levels of the renderer are only being given a character at a time, then they've got no chance to get the overall placement correct or to apply any character combining. I'm guessing that the real issue is that TkpDrawCharsInContext() just calls Tk_DrawChars(), if my reading of the comments is right. (By contrast, the macOS renderer does something different here.)
I don't have a workaround.
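To illustrate the chunking point, here is a rough sketch in Python (not Tcl, and not a fix for Tk) of the minimum unit a renderer would need to receive at once: a base character together with the combining marks that follow it. The function name clusters is hypothetical, and the grouping rule is a simplification of the grapheme cluster rules in Unicode's text segmentation annex.

```python
import unicodedata

# Same word as in the question, written as escapes (mark order is assumed).
word = "\u05D1\u05B0\u05BC\u05E8\u05B5\u05D0\u05E9\u05B4\u05C1\u0596\u05D9\u05EA"

def clusters(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch      # mark: attach to the preceding base
        else:
            groups.append(ch)     # base: start a new chunk
    return groups

chunks = clusters(word)
print([len(chunk) for chunk in chunks])   # [3, 2, 1, 4, 1, 1]

# Reversing whole chunks keeps marks after their consonant,
# unlike a plain character-by-character reverse.
visual = "".join(reversed(chunks))
```

Whether Tk on Linux would then position the marks correctly is not something this sketch can show; it only makes concrete why drawing one code point at a time leaves the shaper nothing to combine.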

Related

HTML - how to render unicode symbols appearing (from api) such as &#146; &#150; etc

There is a data quality issue in our app. Basically some characters from a very long time ago were not saved with standard chars.
Dashes appear as &#150;
Apostrophes appear as &#146;
etc
Is this standard Unicode? I have looked through a few tables, but I couldn't find an &#150; or &#146; that matches the punctuation characters I'm expecting.
Also, is there an easy way to render those HTML characters? Right now they appear as square boxes in some editors, and in Notepad++ they appear as SPA (in a black box).
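One way to check what these numbers mean is the short Python sketch below. It assumes the stored values are Windows-1252 byte values that were written out as numeric references, which is a guess based on the symptoms; the helper name c1_to_cp1252 is hypothetical. Code points 150 and 146 fall in Unicode's C1 control range, where U+0096 is the control commonly abbreviated SPA, which would fit what Notepad++ shows.

```python
import html
import unicodedata

def c1_to_cp1252(code_point):
    """Reinterpret a code point in 0x80-0x9F as a Windows-1252 byte.

    A few bytes in that range are undefined in Windows-1252 and would raise
    UnicodeDecodeError; 146 and 150 are both defined.
    """
    return bytes([code_point]).decode("cp1252")

# As Unicode code points these are control characters, not punctuation.
print(unicodedata.category(chr(150)), unicodedata.category(chr(146)))  # Cc Cc

# Read as Windows-1252 bytes they are the expected punctuation.
dash = c1_to_cp1252(150)          # U+2013 EN DASH
apostrophe = c1_to_cp1252(146)    # U+2019 RIGHT SINGLE QUOTATION MARK

# html.unescape follows the HTML5 parsing rules, which apply the same mapping.
browser_dash = html.unescape("&#150;")
browser_apostrophe = html.unescape("&#146;")
```

If that assumption holds, a one-off cleanup could replace each such reference with the mapped character (or with &#8211; and &#8217;) so that editors and browsers agree.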

Matching backticks/acute accents for ES6 template literals should appear immediately in VS Code

When I enter an apostrophe / single quote in VS Code, it automatically adds a second one and puts the cursor between the two. I want the same behaviour when I enter a backtick `. By default, nothing is shown at first, so that the key can be used to create special characters such as è. That might be useful in other applications, but it is definitely not needed in VS Code. Is there a way to fix that?
It's mostly an aesthetics thing, but I find it distracting when writing code.

Custom google font error in HTML

I have a blog where I use custom fonts from Google Fonts in each and every text of the <body> element, but whenever there is an inverted comma or a double inverted comma in my text, it is not shown as it should be - it is replaced by an unknown character.
I even checked the font, and it does include the inverted comma characters.
I don't think this has anything to do with your font.
If you look at the source code you will see the characters already are broken there:
Rather, this is a problem with your encoding. Your site is declared as UTF-8, but the characters seem to be stored in a non-UTF-8 encoding. You need to either use UTF-8 characters or change the encoding of your site. (The first option is preferable.)
If you change the site encoding to Windows-1252 (which is automatically suggested by Chrome based on the content) everything seems fine:
The question is how you created this text. Maybe in Word, and then copied and pasted? Or is your blog backend not UTF-8?
Also note there are two different characters: ’ vs ´.
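The mismatch described here can be reproduced in a few lines of Python. This is a sketch that assumes the text was saved as Windows-1252 and then served as UTF-8; the sample word is made up for illustration.

```python
# A curly apostrophe saved by a Windows-1252 editor becomes the single byte 0x92.
raw = "Don\u2019t".encode("cp1252")
print(raw)                                   # b'Don\x92t'

# Read as UTF-8, 0x92 is not a valid sequence, so a browser shows the
# replacement character (the "unknown character" in the question).
as_utf8 = raw.decode("utf-8", errors="replace")

# Read with the encoding it was written in, the apostrophe comes back.
as_cp1252 = raw.decode("cp1252")

# The preferable fix: re-save the text as real UTF-8 (three bytes for U+2019).
fixed = as_cp1252.encode("utf-8")
print(fixed)                                 # b'Don\xe2\x80\x99t'
```

Changing the page's declared charset to Windows-1252 corresponds to the as_cp1252 line; converting the stored text corresponds to fixed.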
It's a special character. Please check the example below:
if you want to write "Don't", then you have to use "don’t"
if you want to write "highly sought users" in double quotes, then you have to use “highly sought users”
I hope this will help you.
Usually the special characters appear when you copy text from other sources such as MS Word. This can be solved by typing the inverted commas manually when entering or modifying the text in the database.

What are these characters and why are they rendered this way?

I want to understand what is happening technically when these characters are displayed, and why they are rendered the way they are.
I saw it on social media (FB and Twitter) and can't seem to understand what's technically happening.
Edit: If they are characters from a character set I don't have installed, I still don't get why they are not displayed in a line and overlap other space, even outside their own line.
!̸̶͚͖͖̩̻̩̗͍̮̙̈͊͛̈͒̍̐ͣͩ̋ͨ̓̊̌̈̊́̚͝͠ͅ ̷̧̢̛͖̤̟̺̫̗͚̗͖ͪ̏̔̔̒́ͥ̓ͫ̀ͤ̇ͥ͝ ̡̊͛̇ ͫ̉ͦ̊̀̔ͧͮ͆̽ͦͩ͋̌͗̚̚҉̵͖̟͙̮͈̼̹̞͝ͅ
It's the magic of Unicode.
Unicode handles all extant writing systems of the world, and that includes the ones with symbols instead of letters, the ones that are written right-to-left instead of left-to-right, and the ones that are written top-to-bottom. It also contains provisions for rendering glyphs that are technically combinations of base and modifier glyphs (even 16 bits aren't enough for all possible accented, composited, or context-adapted characters in all languages). (Trivia: The Unicode standard is so complex that security issues have actually been found in software implementing it.)
Any software that claims to support Unicode fully has to follow all these rules, and that includes stacking characters on top of each other, overlaying them, and so on. This means that any person with an internet connection can have their native language rendered correctly, but I dare say that on English-language boards the predominant use of all those features is to render cool pseudo-graphics, as in your example.
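The stacking can be reproduced and undone with a few lines of Python. This is a sketch: the particular marks are picked arbitrarily for illustration, and the helper names are hypothetical. A base character is followed by a run of combining marks, each of which the renderer draws on top of the same base, which is why the result spills out of its line.

```python
import unicodedata

def is_mark(ch):
    """True for combining marks (general categories Mn, Mc and Me)."""
    return unicodedata.category(ch).startswith("M")

def strip_marks(text):
    """Drop every combining mark, leaving only the base characters."""
    return "".join(ch for ch in text if not is_mark(ch))

def count_marks(text):
    return sum(1 for ch in text if is_mark(ch))

# One visible "!" carrying ten marks: three different ones repeated three
# times, plus an enclosing mark (U+0489) like the one in the sample above.
stacked = "!" + "\u0308\u034A\u035B" * 3 + "\u0489"

print(len(stacked), count_marks(stacked), repr(strip_marks(stacked)))
```

The string has eleven code points but occupies one character cell; stripping the marks leaves the plain exclamation mark.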

Is it possible to print DOS characters on a website?

I would like to print some kind of ASCII "art" on a web page inside pre tags. These graphics use DOS characters to draw a map, like old maze games did. I didn't find anything in the HTML special character reference. Is there a way to use these characters in HTML?
Thanks in advance.
With the right Unicode characters, the old character encodings shouldn't make much odds. The tricky bit may be converting existing ASCII art into Unicode - at which point you need to know the original encoding.
The relevant code charts will be listed on the Unicode "symbols" charts page. In particular, I suspect you'll find the box drawing and block elements charts useful.
You'll need to make sure that your page uses a font which contains the right characters, of course...
As an example, you can render this:
┌┐
└┘
With:
<pre>┌┐
└┘</pre>
Not quite a proper box, but getting there...
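If the art already exists as DOS bytes, the conversion mentioned above can be sketched in Python as follows. This assumes the original encoding is code page 437; the function name dos_art_to_html is hypothetical.

```python
import html

def dos_art_to_html(raw, encoding="cp437"):
    """Decode DOS-era bytes to Unicode and wrap them in a <pre> element."""
    text = raw.decode(encoding)
    # Escape &, < and > so that stray markup characters in the art stay literal.
    return "<pre>" + html.escape(text) + "</pre>"

# The four corner bytes of the small box above, as stored in code page 437.
box = bytes([0xDA, 0xBF, 0x0A, 0xC0, 0xD9])
print(dos_art_to_html(box))
```

The page should then be saved and served as UTF-8, so the box-drawing characters survive without needing numeric references.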
You can send them in <pre> tags, although in XHTML you'll need to wrap the content in <![CDATA[ ... ]]>, I think. Be careful though: not all encodings render this correctly. For example, a lot of ASCII art designed for DOS code page 437 (US) fails over here in the UK (850). Eastern Europe suffers especially.
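The code page problem can be checked directly. This Python sketch assumes the two code pages involved are 437 (the original US DOS page) and 850 (the Western European DOS page) and decodes the same two bytes with each.

```python
# Two bytes from a piece of box art: a vertical line and a T-junction.
raw = bytes([0xB3, 0xB5])

as_cp437 = raw.decode("cp437")   # both bytes are box-drawing characters
as_cp850 = raw.decode("cp850")   # the second byte is an accented letter here

print(as_cp437, as_cp850)
```

Code page 850 gave up some of the box-drawing positions for accented letters, so art that mixes single and double lines tends to break there. Converting to Unicode once, with the right source code page, avoids the problem.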
I think the best approach here would be to render images.
EDIT: Oh. You could try , but I'm not sure if that would work.