Well, on my site, for some reason any Hebrew text is showing up like
\u05d0\u05d9\u05d9\u05dc \u05d2\u05d5\u05dc\u05df
and so on. Does anyone know why, and how I can get it to show up properly?
Apparently, the Hebrew text is supposed to be written by some software, possibly JavaScript code, where e.g. \u05d0 within a string literal denotes the Unicode character U+05D0 HEBREW LETTER ALEF “א”. It sounds like such a construct has been mistakenly “escaped” e.g. so that the \ character is preceded by another \ character, which nullifies the meaning and turns the \ to a mere character.
This was a rather abstract answer, though somewhat more concrete than the question. For a more detailed answer, please post some code.
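As a rough illustration of the mechanism described above, here is a small Python sketch. It assumes the text travels as a JSON string literal, which the question does not confirm; the variable names are made up for the example. It shows the same literal decoded once correctly and once after a mistaken extra round of escaping.

```python
import json

# The escaped name from the question, as it might sit inside a JSON string
# literal (assumption: the site passes the text around as JSON).
# A raw string keeps the backslashes exactly as typed.
escaped_once = r'"\u05d0\u05d9\u05d9\u05dc \u05d2\u05d5\u05dc\u05df"'

# The same literal after an extra, mistaken round of escaping:
# every \ has become \\.
escaped_twice = escaped_once.replace("\\", "\\\\")

decoded_ok = json.loads(escaped_once)        # escapes become Hebrew letters
decoded_broken = json.loads(escaped_twice)   # backslashes survive as plain text

print(decoded_ok)       # the Hebrew name
print(decoded_broken)   # the literal \u05d0... sequence seen on the site
```

If the double escaping cannot be removed at its source, decoding the broken string a second time recovers the text in this simple case, but finding where the extra escaping happens is the cleaner fix.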
I'm using Manjaro Linux KDE and the most recent versions of Tcl and Tk, and am attempting to display Hebrew in a text widget. In testing, the Hebrew text was pasted into the Tcl script in the Kate text editor and appears in the correct order, right to left with compound characters.
Without using a specific font in Tcl/Tk, the text prints from left to right and separates the components of compound characters, such that the vowel points and cantillation marks appear as separate characters. After using the SBL Hebrew font, the words look better but the vowel points are not located properly and they are still written from left to right. I tried using the \u200f and \u200e marks but it made no difference; I really don't know what I'm doing there and simply tried prefixing and suffixing them to the Hebrew word. Reversing the string helps but the vowel points are not combined with the consonants.
I'm not using Tkinter but this older SO post seems to indicate that it is a Linux issue with Tcl.
If I extract Hebrew from SQLite using Tcl and write it to the command line using puts, it displays correctly. Also, if I copy the reversed text from the Tk text widget and paste it in this SO question, it is displayed in the correct order. To clarify, by reversed here, I don't mean using string reverse but simply that it appears reversed in Tk but when pasted in this SO box, it displays correctly.
Would you please tell me what I'm doing wrong and how to get it to display properly?
I tried to follow this document on internationalization in Tcl and encoding but don't follow how this affects displaying Hebrew in a text widget. I also came across a web site that has code for a Unicode editor that displays several languages including Hebrew but I can't follow that code either. I tried running the code and, if I select the Hebrew language, it writes right to left but I don't see vowel points or cantillation marks; but I don't know much about typing the Hebrew language.
Thank you.
# Minimal setup (assumed; the original post did not show how .tw was created).
package require Tk
text .tw
pack .tw
.tw tag configure heb -font {"SBL Hebrew" 18 normal}
.tw insert end "בְּרֵאשִׁ֖ית" "heb"
# Also tried "בְּרֵאשִׁ֖ית\u200f" and "\u200fבְּרֵאשִׁ֖ית",
# and "בְּרֵאשִׁ֖ית\u200e" and "\u200eבְּרֵאשִׁ֖ית".
# Tried .tw insert end [string reverse $h] "heb" (with $h holding the word),
# which puts the consonants in order, but the vowel points and
# cantillation marks are not correct.
This is the correct rendering.
This is from Tk. The first is in normal order and the second using string reverse. It can be observed that the vowel points are not "on" the consonants and the cantillation marks are not correct. I know little about Hebrew but I can tell they don't match and appear to be printed as separate characters instead of combined. I think what looks like a "t" under the Hebrew letter that looks similar to a "W" is two characters on top of each other-- a dot and the symbol sort of similar to a left parenthesis in the correct rendering.
I don't know why but after rebooting and installing the next batch of updates, not that they have anything to do with Tk, the rendering is different when a font is not set. However, once the SBL Hebrew font is set, then the characters are separated as displayed above.
I can tell you now that the text renders very close to correctly with Tk on macOS (I'm not sure how much is just font differences, and there's a bit of clipping of the descender decorations that I don't like, but I don't think that's Tk itself doing the wrong thing).
That means that it's definitely a rendering bug that you're seeing. I suspect it might relate to the size of chunks of characters fed into the renderer; if the low levels of the renderer are only being given a character at a time, then they've got no chance to get the overall placement correct or to apply any character combining. I'm guessing that the real issue is that TkpDrawCharsInContext() just calls Tk_DrawChars(), if my reading of the comments is right. (By contrast, the macOS renderer does something different here.)
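To make the combining issue concrete, here is a small sketch in Python rather than Tcl. It is purely illustrative and is not a fix for Tk. It assumes that "combining" can be approximated by a non-zero canonical combining class, which holds for the Hebrew points and accents in this word; the helper name `split_clusters` is made up for the example. It shows that the word is six consonants, each followed by its marks, and that a naive character-by-character reversal moves the marks onto different consonants.

```python
import unicodedata

def split_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # mark: attach to the preceding base letter
        else:
            clusters.append(ch)     # base letter: start a new cluster
    return clusters

word = "בְּרֵאשִׁ֖ית"

# Naive reversal treats every code point separately, so each mark ends up
# in front of its own consonant and therefore after a different one.
naive = word[::-1]

# Reversing whole clusters keeps every mark with its own consonant.
by_cluster = "".join(reversed(split_clusters(word)))

for cluster in split_clusters(word):
    print([f"U+{ord(c):04X}" for c in cluster])
```

This matches the symptom in the question: after a plain string reverse the consonants are in the expected visual order, but the points no longer belong to the right letters.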
I don't have a workaround.
I know that Hebrew and Arabic characters are going from right to left but I want to see all of them.
Have a look at the bidirectional character type (Unicode character property Bidi_Class). You're looking for characters of type R (Right-to-Left) or AL (Right-to-Left Arabic). The file DerivedBidiClass.txt in the Unicode database contains a list of all code points with these classes.
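A minimal sketch of that lookup in Python, using the standard `unicodedata` module instead of parsing DerivedBidiClass.txt directly. The function name `rtl_code_points` is made up for the example. One caveat: as far as I know, `unicodedata.bidirectional` returns an empty string for unassigned code points, whereas DerivedBidiClass.txt also gives default classes to unassigned ranges, so the two lists will not match exactly.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}   # Right-to-Left, Right-to-Left Arabic

def rtl_code_points(start, stop):
    """Return the code points in [start, stop) whose Bidi_Class is R or AL."""
    return [cp for cp in range(start, stop)
            if unicodedata.bidirectional(chr(cp)) in RTL_CLASSES]

# Scan the Hebrew and Arabic blocks; use (0, 0x110000) to scan everything.
hebrew_and_arabic = rtl_code_points(0x0590, 0x0700)

for cp in hebrew_and_arabic[:5]:
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.bidirectional(ch)} {unicodedata.name(ch, '?')}")
```

The results depend on the Unicode version bundled with the Python build (`unicodedata.unidata_version`).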
Quoting i18nguy:
Languages don't have a direction. Scripts have a writing direction,
and so languages written in a particular script, will be written with
the direction of that script.
Here are some scripts using RTL: Arabic, Hebrew, N'ko, Syriac, Thaana/Thâna, Tifinar, Urdu.
You can just look up the Unicode range of a given script, for example Tifinar: U+2D30–U+2D7F.
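Walking such a range can be sketched in Python as follows. The helper name `assigned_chars` is made up for the example, and the range is the one quoted above; note that Python's `unicodedata` spells the script name "Tifinagh".

```python
import unicodedata

def assigned_chars(first, last):
    """Yield (code point, character, name) for assigned characters in first..last."""
    for cp in range(first, last + 1):
        name = unicodedata.name(chr(cp), None)   # None for unassigned code points
        if name is not None:
            yield cp, chr(cp), name

tifinagh = list(assigned_chars(0x2D30, 0x2D7F))

for cp, ch, name in tifinagh[:5]:
    print(f"U+{cp:04X} {ch} {name}")
```

The same function works for any other block, for example `assigned_chars(0x0590, 0x05FF)` for Hebrew.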
Not sure what you want to achieve by looking at all those characters but I think that is the only way of actually finding them.
You can refer to the original page here:
http://www.i18nguy.com/temp/rtl.html
I'm putting an old text into HTML. Sometimes it uses Greek terms and phrases. But there's one character I've never seen before. It seems to be a combination of two other characters: small omicron (ο) + small upsilon with perispomeni (ῦ). Here is a PNG illustrating the character, and how it works:
Does anyone know how to put this character into HTML? Can it be found anywhere in Unicode? Has anyone even heard of it?
Thanks.
That's called a ligature. I couldn't find any Unicode character for that one, though there is the Latin version of it:
http://en.wikipedia.org/wiki/Ou_(ligature)
Which mentions the Greek.
I'm receiving data from my database and showing it through echo statements, but for some reason all the basic punctuation (e.g. ' and ") comes back as small diamonds with question marks inside them. Can someone tell me what is wrong?
It sounds like you may need to escape some of those special characters. Here is a list of escape codes that you can use:
Escape Character Codes
If using these codes doesn't work, make sure that the actual document encoding matches the UTF-8 encoding specified. This can be examined in a text editor like Notepad++.
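The encoding mismatch mentioned above can be reproduced in a few lines. This is a sketch in Python rather than PHP, and it assumes something the question does not state: that the database holds "curly" quotes stored as Windows-1252 bytes while the page is declared as UTF-8.

```python
# Text with typographic quotes, as it might have been entered into the database.
text = "It\u2019s \u201cquoted\u201d"

# Assumption: the bytes were stored as Windows-1252.
stored_bytes = text.encode("cp1252")

# Reading those bytes as UTF-8 fails on each quote; the replacement
# character U+FFFD is what browsers draw as a diamond with a question mark.
wrong = stored_bytes.decode("utf-8", errors="replace")

# Reading them with the encoding they were written in gives the original text.
right = stored_bytes.decode("cp1252")

print(wrong)
print(right)
```

The fix in that situation is to make the stored bytes, the database connection and the page declaration agree on a single encoding, rather than to escape individual characters.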
Ok, so I want to have the characters from below in my html page. Seems easy, except I can't find the HTML encoding for them.
Note: I would like to do this without having sized elements, plain ol' text would be fine ^_^.
Cheers.
You can see that they have a unicode number of the selected character - at the bottom of the picture ("U+266A: Eighth Note").
Simply use the last portion in a Unicode character entity: &#x266A; - ♪
If your page is already UTF-8, you can simply paste it in.
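For reference, the numeric character reference can be derived mechanically from the code point. A small Python sketch follows; the function names are made up for the example.

```python
import html

def to_hex_reference(ch):
    """Build a hexadecimal numeric character reference, e.g. for U+266A."""
    return f"&#x{ord(ch):X};"

def to_decimal_reference(ch):
    """Build the equivalent decimal numeric character reference."""
    return f"&#{ord(ch)};"

note = "\u266a"   # U+266A: Eighth Note

hex_ref = to_hex_reference(note)
dec_ref = to_decimal_reference(note)

print(hex_ref, dec_ref)

# html.unescape turns either form back into the character.
round_trip = html.unescape(hex_ref) == note == html.unescape(dec_ref)
```

Both forms are equivalent in HTML; the hexadecimal one is easier to match against the "U+266A" shown in a character map.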
Try encoding it as &#9608; (█) - that should do the trick!
In a UTF-8 encoded page, just copy and paste them as-is.
Otherwise, use the number that the dialog gives you for each character, e.g. &#x266A;
However, when working with rather exotic characters, be very wary of font support. See e.g. this question for background: Unicode support in Web standard fonts
This page gives some information about support for the characters you want to use. They seem to be relatively well supported, but a test on Linux and Mac machines won't hurt.
Here is one comprehensive entity reference. If you want to convert symbols into their entity counterparts, I suggest using this converter.
My suggestion is to use a hexadecimal reference. (It's easy, don't worry :) )
For example, the first character you have highlighted in red has a character code of 175 (strictly speaking beyond ASCII, which stops at 127), which is AF in hex.
So in short you can encode it using %AF, and so on...
Is it clear, mate? Let me know if you need further explanation or help about this :)
Edit: my post is meant for URL encoding.
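A short Python sketch of that percent-encoding. It assumes the character is code 175 in Latin-1 (the macron "¯"), which the post does not confirm. It also shows that the result depends on the byte encoding: %AF is the Latin-1 form, while UTF-8 produces two bytes.

```python
from urllib.parse import quote, unquote

code = 175
hex_code = format(code, "02X")     # 175 in hexadecimal

ch = chr(code)                     # U+00AF, the macron, under the Latin-1 assumption

latin1_form = quote(ch, encoding="latin-1")   # one byte, percent-encoded
utf8_form = quote(ch)                         # default UTF-8: two bytes

print(hex_code, latin1_form, utf8_form)

# Decoding must use the same encoding that was used for encoding.
decoded = unquote(latin1_form, encoding="latin-1")
```

As the edit above notes, this applies to URLs only; inside HTML text a character reference is the right tool.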