Arabic with diacritics renders weird - html iOS with webfonts

I have been debugging this for days with no luck. The HTML page I am developing contains Arabic text. When I use a webfont, Arabic text with diacritics does not render well: words are too close to one another and often overlap, paragraphs have leading space, and text is clipped at the end. Everything renders well when diacritics are absent, and everything renders well without webfonts. The problem occurs only on iOS. I have tried multiple webfonts and many different approaches. Please help.

Related

Tesseract/gImageReader OCR: older texts are missing spaces between words

I'm working with some older texts, where the letters are sometimes a little smudgy.
The letters and words get recognized near-perfectly, and when I look at the hOCR or html file, the text looks perfect.
But when I export to PDF with an invisible text layer the spaces between words frequently go missing, for paragraphs at a time. This is annoying when trying to highlight parts of the text and then copy-paste those excerpts.
Any advice?
Other than these old texts, gImageReader is absolutely amazing and does exactly what I want. I did try the "Middle English" language, but that had the same result.

ETX characters (as L) showing up on websites

In the last few days a couple of clients have contacted me to say that uppercase "L"s are appearing in places on their websites. Upon investigating, I found some stray ETX characters in the pages. They show up on Windows (definitely in Chrome, maybe in other browsers too), and in Firefox on Mac I can see them in the source code. In Chrome on Mac I can't see them anywhere. Here are pictures of the problem:
picture of the issue
source code
My clients' websites have not been updated in months, so I'm guessing that Windows pushed out an update in the last week to the default language/encoding, which is making these show up now.
Removing them is easy, but I want to understand where they are coming from and how I can avoid the problem in the future. The characters appear to be in text that I would have copied out of Photoshop. Is there an easy way to sanitise text and remove these kinds of characters when I copy from Photoshop or similar programs?
As I mentioned earlier, I am on a Mac and use Chrome primarily. Is there any way to get Chrome to actually show these characters so that I can see whether they are present?

You are correct that the issue is with Photoshop. Line breaks (Shift+Enter) are encoded in Photoshop as an ETX character (end of text), not an LF (line feed) or CRLF (carriage return + line feed).
These characters can be seen by pasting your content into a plain text editor such as Sublime Text. The find/replace function should make removing them easy.
I don't believe there is any way to get the ETX characters to display in Chrome for Mac.
However, since the characters are still present (even if they are invisible), you could select all the text on the page (Mac: Cmd+A / Win: Ctrl+A) and paste it all into Sublime Text to find them.
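If you would rather script the find/replace step, here is a minimal Python sketch. It assumes the pasted text is already available as a string; the helper names `find_control_chars` and `sanitise` are made up for this example, and replacing ETX with a newline is just one reasonable choice (use an empty string to drop it instead).

```python
import unicodedata

ETX = "\x03"  # U+0003 END OF TEXT, what Photoshop uses for Shift+Enter breaks
ALLOWED = {"\t", "\n", "\r"}  # control characters that are normal in text


def find_control_chars(text):
    """Return (index, code point) pairs for unexpected control characters."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cc" and ch not in ALLOWED
    ]


def sanitise(text, etx_replacement="\n"):
    """Replace ETX with etx_replacement, then drop other stray control characters."""
    text = text.replace(ETX, etx_replacement)
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cc" or ch in ALLOWED
    )


if __name__ == "__main__":
    pasted = "Line one\x03Line two"
    print(find_control_chars(pasted))  # positions of the hidden characters
    print(repr(sanitise(pasted)))
```

Running `find_control_chars` over your page source before publishing would also flag the characters even when the browser hides them.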

What is the correct way to unify a page so it prints the same on all browsers?

I am creating a contract in HTML and I am having a lot of problems with printing. I can't get the font sizes to be anywhere close to the same between the newest version of Chrome and IE 9.
Chrome's fonts are big and the spacing between characters is unpredictable (sometimes really close together, sometimes far apart).
IE's font looks better, but its spacing between characters is really tight, which makes the page too short.
Is there a way to "synchronize" the printing between different browsers? Or at least get them close? Are there best practices when doing something like this?
Here is a scan of what the print difference looks like. The print preview looks fine but once it's printed the text on Chrome is really close together (or spread apart such as the word Drawings).

Looks like the letter spacing is a known Chrome bug
https://code.google.com/p/chromium/issues/detail?id=72017
I submitted my findings there.

Hebrew diacritics and Latin script combination support on Chrome for windows

I am currently coding a website where the designer has decided to combine Hebrew diacritics with Latin script, as in the example below:
ayֶelֶet
This kind of combination renders properly (i.e. the diacritics are below the Latin letter e in both instances) in all Windows browsers except Chrome. The funny thing is that while it doesn't render properly in Chrome for Windows, it does in Chrome for Android, Chrome for Linux (Debian) and Chrome for macOS. I tried the following two markups, to no avail:
<h1>ayֶelֶet</h1>
and:
<h1>ayֶelֶet</h1>
Does anyone have a solution or a workaround? I would love to just let it go, but since Chrome for Windows has such a large user share, I can't ignore this. Also, since I'm a bit of a standards geek, I'd really rather avoid using a .png instead of raw text.
Thanks a million,
Itamar.

I’m afraid Chrome for Windows has a bug in dealing with diacritic marks. It basically does not seem to treat U+05B6 HEBREW POINT SEGOL (or other Hebrew diacritics) as a nonspacing mark when it follows a Latin letter.
Note that the rendering is not correct in other browsers either: the segol is slightly misplaced (away from the horizontal middle of “e”), and it looks acceptable only because two errors happen to almost cancel each other out. In Unicode, a combining diacritic mark is written after the base character, not before it. IE and Firefox, on the other hand, seem to place the segol below the letter that follows it. This is presumably caused by directionality.
In general, browsers are still rather poor at rendering combining diacritic marks, except in usual contexts, and using Hebrew diacritics on Latin letters is rather unusual. It is valid by Unicode principles, which allow any diacritic to be combined with any letter, but there is no guarantee of what the result will look like.
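To check which letter each segol is actually attached to, you can inspect the code points. The Python sketch below uses the standard `unicodedata` module; the helper `marks_with_bases` is a made-up name, and the two sample strings are assumptions: `as_posted` supposes the question's text has the segol typed before each "e" (as the remark on ordering suggests), and `reordered` is one possible corrected order, not a tested fix for the Chrome bug.

```python
import unicodedata


def marks_with_bases(text):
    """Pair each nonspacing mark (category Mn) with the character preceding it."""
    pairs = []
    base = None
    for ch in text:
        if unicodedata.category(ch) == "Mn":
            # Per Unicode, a combining mark applies to the preceding base.
            pairs.append((base, unicodedata.name(ch)))
        else:
            base = ch
    return pairs


# Assumed order in the question: segol typed before each "e".
as_posted = "ay\u05b6el\u05b6et"
# Hypothetical reordering: segol typed after each "e".
reordered = "aye\u05b6le\u05b6t"

if __name__ == "__main__":
    print(marks_with_bases(as_posted))
    print(marks_with_bases(reordered))
    # Bidi class of the segol: NSM (nonspacing mark)
    print(unicodedata.bidirectional("\u05b6"))
```

With the assumed input, the marks are logically attached to "y" and "l" rather than to "e", which would explain why a renderer that follows the Unicode rule places them differently from what the designer expected.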

Numerals as first characters in a line of html text are not displaying in Chrome

I'm observing a super weird bug on a news site that maybe someone has seen before.
In the html text, if the first characters in a line of text are numerals, they are not displayed by the browser.
The html is coming through via a CMS, which forces the line breaks in the editor, but no tags are inserted. CMS data is XSLT processed into html templates.
When this text is sent to the browser, you can see the new lines are formed (without br tags), and you see that the numerals are still within the content. But these new lines are only honored by the browser if a white-space property is set using one of the "pre" values.
It seems to be related to the white-space property: if I use the inspector to add white-space: pre-line or white-space: pre-wrap, the numerals appear.
Really keen to hear some thoughts on this, or could this be a possible Chrome bug?
Link to an example article here:
tvnz.co.nz/national-news/flights-cancelled-130km-h-winds-hit-wellington-5508294
In the last paragraph of that article you can read/inspect to see the missing numeral values.

I really don't understand why this happens, but it has something to do with the zoom setting. There are all kinds of articles about Chrome bugs with the zoom setting, but none seem to address exactly what you are seeing.
If you inspect the page and change the zoom from 1 to .99999, it works. I got the suggestion from this link, and I'm at a loss to explain exactly what is broken in Chrome, but it does seem like a Chrome bug.
