I am using Tesseract 3.05 for reasons beyond my control, and I am using source files to train the engine to recognize a unique font. As I have a vast number of samples, I am simply using the samples themselves as the training images rather than segmenting them into a font training image, as this should give more variation and train on the specific spacing issues this font has.
My question concerns generating the box files. Because some letters touch at the corners (i.e. there is no clear break between glyphs), Tesseract detects them as one glyph instead of two separate glyphs. For example, it sometimes struggles with NA, where the front serif of the A has bled into the serif of the N. The image pre-processing I have applied has improved things by leaps and bounds, but there are still some cases that I cannot correct well enough on the image.
My question is this: can I simply denote the glyph as being NA in the box file?
If I cannot, what would be the simplest solution? Introducing another glyph box seems like it wouldn't be a good idea, but the only other solution I can see is to manually edit the image to make the separation between glyphs more obvious. That is itself antithetical, however, as this is exactly the kind of problem the font will have in the images I will be trying to OCR in the future.
Thank you in advance. The documentation isn't specific on whether I can correct a box glyph to be two characters instead of just one (or I just haven't found the relevant section where they explain this).
After scouring the documentation, I managed to find a lone paragraph that wasn't appearing in my website scraping:
"If you didn't successfully space out the characters on the training image, some may have been joined into a single box. In this case, you can either remake the images with better spacing and start again, or if the pair is common, put both characters at the start of the line, leaving the bounding box to represent them both. (As of 3.00, there is a limit of 24 bytes for the description of a "character". This will allow you between 6 and 24 unicodes to describe the character, depending on where your codes sit in the unicode set. If anyone hits this limit, please file an issue describing your situation.)"
Thus you can do what I ask: represent a glyph with two or more characters in a box file for Tesseract.
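As a rough illustration of what that edit looks like, here is a minimal Python sketch that relabels one box-file entry with a multi-character description. It assumes the usual Tesseract 3.x box line layout (`symbol left bottom right top page`); the helper name `relabel_box` and the sample coordinates are made up for the example, and the byte check simply mirrors the 24-byte limit quoted above.

```python
MAX_LABEL_BYTES = 24  # limit quoted in the Tesseract 3.x training notes


def relabel_box(line: str, label: str) -> str:
    """Return a box-file line with its symbol replaced by `label`.

    Assumed layout: <symbol> <left> <bottom> <right> <top> <page>
    """
    if not label or any(ch.isspace() for ch in label):
        raise ValueError("label must be non-empty and contain no whitespace")
    if len(label.encode("utf-8")) > MAX_LABEL_BYTES:
        raise ValueError("label exceeds the 24-byte limit")
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"unexpected box line: {line!r}")
    # Keep the bounding box and page number, swap only the symbol.
    return " ".join([label] + parts[1:])


# Hypothetical entry: one box covering a touching "N" and "A", misread as "M".
print(relabel_box("M 112 40 171 86 0", "NA"))
```

The bounding box is left untouched, so the single box now stands for both characters, which is what the quoted paragraph describes.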
I'm trying to use the Oxygen font from Google Fonts in my website, but I'm having strange problems with it.
Firstly, it doesn't seem to want to render at certain sizes, like I can't make it 19px. It will do 18px or 20px, but not 19px.
I also notice that the heights of the letters are borked. Take a look at the attached pic, how the 'S' is out. That's a screengrab of the font at 19pt. However, everything is 18px tall except the 'S' which is the one thing that sticks out to 19px.
And at sizes larger than this, other letters start antialiasing oddly too.
Whether I try ems or pxs or pts, I'm getting these glitches.
If you go to Google Fonts and search for Oxygen at the left and type in some text at larger sizes, it does the same thing, strange S's, etc. But strangely, if you search for specimens of this font in Google Images, it seems to render and antialias much better than this (Oxygen specimens in Google Images). Any way to fix it or is this font broken at source?
I hate to tell you, but the font is broken at the source. It was obviously either made unprofessionally or designed to be exactly that way. I'd recommend either using a different font or living with it, as a regular user won't really mind. A web analyst might, but to a normal user it's a nice font.
Just like I have mentioned in the title.
How do you manage fonts when you do web design?
Different browsers give different results even when the size is set to the same value in pixels.
This makes other elements collide with the things I have designed.
Please advise.
E.g. most browsers are okay, but IE is bad: the fonts appear much bigger than in the others.
For main body text, you don't. Some people want bigger text so they can read it more easily; get in their way at your peril. Use relative font sizes in units such as ‘em’ or ‘%’.
For small amounts of presentational text where you need text size to match pixel-sized on-screen elements, use the ‘px’ unit. Don't use ‘pt’ - that only makes sense for printing, it'll resize more-or-less randomly when viewed on-screen.
You can still never get the text exactly the same size because fonts differ across platforms—and Lucida Grande and Helvetica look very different of course.
Do all font ascenders/descenders have the same space above/below? I'm trying to write a global stylesheet which will take away top and bottom space from h1-6 elements which I found to be no more than 8 pixels or so (which lowers as the h elements lower.) The reason I'm considering this is because I won't have any tall characters which will occupy the ascender/descender so I really have no usage for it (plus I need exact precision in the positioning of my elements.)
My question is if all fonts have the same ascending/descending space, or if it varies by font or OS or browser.
Do all font ascenders/descenders have the same space above/below?
Nope
It varies by font, by OS, by browser, and probably by lunar cycle as well. You can expect fonts to be consistently inconsistent.
Some fonts don't even have the concept of ascenders/descenders. What would you do if an icon font was used? Some fonts align descenders such that they don't even descend below the baseline. Others, such as calligraphic fonts, tend to drop below the baseline whether or not the character has an actual descender.
When I'm building pages off of comps that don't include font-size descriptions, I often have to render a large set of varying font-sizes of a particular family. I have a utility webpage that I use locally so that I can determine which font size must be used, and what font alignment will work.
Example:
This example uses Arial, and even Arial renders differently for some sizes between Chrome, Firefox and IE. When you're using sets of fonts you then also have to worry about all the other options in the set, in case the user doesn't have that font installed.
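The utility page itself isn't reproduced here, so as a rough idea of what such a page might look like, here is a minimal Python sketch that generates one. The function name `specimen_page`, the sample text and the outline styling are placeholders of my own, not the actual tool described above.

```python
from html import escape


def specimen_page(family: str, sizes, sample: str = "Sphinx of black quartz") -> str:
    """Build an HTML page showing `sample` once per pixel size in `sizes`."""
    fam = escape(family, quote=True)
    text = escape(sample)
    rows = "\n".join(
        # The outline makes each line box visible so alignment is easy to compare.
        f'<p style="font-family: {fam}; font-size: {px}px; '
        f'line-height: 1; outline: 1px solid #ccc">{px}px: {text}</p>'
        for px in sizes
    )
    return f"<!DOCTYPE html>\n<html><body>\n{rows}\n</body></html>\n"


# Print a three-size ladder; redirect the output to a file and open it in each browser.
print(specimen_page("Arial, sans-serif", [18, 19, 20]))
```

Opening the same generated file in Chrome, Firefox and IE side by side is one way to spot the sizes at which a family renders differently.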
If you absolutely must have an exact rendering, you should be using an image to render the text. Use the [alt] attribute to reference the text in the image. It's not as manageable as text because it requires re-rendering every time a content change is desired, but it works well enough, especially for things like logos which absolutely must render in a specified manner.
What is the best value for font size and line height where readability is concerned?
I myself prefer huge font size and greater line height like the one used in Dive into Python 3.
As with every other "what's the best" question in the world, the answer to this is "there is no 'best'" :-)
For font-size, arguably the 'best' is whatever the user has chosen themselves, either as the default or the minimum. In other words, leave the font size alone for main body copy, and only increase it for headings. You might consider decreasing it by a very small amount for non-critical content. 16px is generally the browser default.
For line-height, values between 1.3 and 1.5 are typically recommended for good readability, although this varies with font face and line length.
According to the W3C recommendation, always use relative font sizes (em).
Use
h1 {
  font-size: 2em;
  /* em on line-height is relative to the h1's own font-size, so 1.25em keeps the 24:30 ratio */
  line-height: 1.25em;
}
instead of
h1 { font-size: 24px; line-height: 30px; }
So that the user can always override the default font size.
There is no "best" font size and line height.
It all depends on the type of a site.
If it's mainly a WEB SITE with articles as the main content then bigger fonts and line heights may be better.
If it is a WEB APPLICATION then huge font sizes will prevent you from building a compact and functional interface. So you'll have to resort to typical OS font sizes.
It all depends.
Actually, there [is] a best font size and line height as far as “paragraph readability” on a web page is concerned. The objective of the question above is clear: readability. They did not ask for the best music or dressing style or something else that tolerates variety and different tastes; they did not ask for the best font color, or font size for my or your taste, or for old people, but the best font size and line height for specifically [readability], and there are most certainly guidelines for the best practice for that purpose. This is not a matter of taste, but a matter of optics and the mechanical how of reading; and it requires knowledge of typography too. Simplifying this to “whatever you and I like” is a gross confusion between taste and efficiency, color and health.
And I'm going to assume that the question is not asking about font sizes for titles, but for paragraphs on screen (because that's what we read mostly here on the web).
Even if we would politely label the existing guidelines as “recommendations”, they still pretty much provide an answer for the quest for “Best”. And before I share what I know, please realize that if you are not a typography designer by trade or at least very well researched in the field, then you are hardly qualified to answer the question above. And sometimes people do need excellent professionals to show them what they never realized was available, possible or existed; sometimes people are not sure what will feel or seem “best” until you actually show it to them and ask them to try it and see if they like it or if it grows fast on them. Sometimes people get used to wrong, unhealthy, or counter-productive practices, like sitting with a slouched back or reading web pages in small Times New Roman, until the wrong practice actually feels [comfortable] to them; it becomes their comfort zone! Does that mean that the “best” for them is what they habitually choose? Obviously not; not necessarily at all. And sometimes the problem is [not] even in the font size! Meaning, the user who chooses to increase the font size does not do so because of the font size, but because the line length (paragraph or column width) is way too long, or the line-height is way too tight, or there are no paragraphs at all (!) and it is just a wall of text, or the paragraph and background colors are too similar, so they increase font size to compensate for such bad-design issues, when they would've been perfectly happy with the same font size on a properly designed web page.
Now, regarding font size of paragraphs, and specifically for readability on the web, the best or recommended range is from 12px to 16px. This may change slightly if you're choosing non-conventional fonts, but that is not recommended anyway for websites. And specifying the size in px is probably preferable to pt (which is the unit used in Microsoft Word, for example) due to higher consistency in display across browsers when the size is specified in px (pixels); px for screen, and pt for print.
The argument of “whatever the user has chosen themselves” is invalid, because it assumes that the typical user changes the default font size in the browser settings at all; such an assumption is very misleading, and definitely does not reflect the majority of Internet users, who never bother to change the default browser font settings.
I respect the W3C recommendations, but today most browsers are perfectly capable of increasing the size of fonts on the screen anyway with Ctrl+(+), even if the size was specified in px, and Ctrl+0 to get the page back to original sizes. And in most cases, an informed designer is actually going to specify the font size of the paragraphs, line height, and column width so that they all complement each other and enhance readability. So having [only] the font size of a well-designed page changed by an unaware user may actually compromise readability rather than enhance it, especially for long paragraphs. It is understandable that accessibility is important, and sometimes required for some governmental websites, but that can still be accounted for by the designer, so that column width increases as the font size is increased by the user; and in such cases, you would use the em unit instead of px, and the range would be from 0.75 em to 1 em (unless the base font size was hacked by a “body {font-size: 62.5%;}” CSS statement, which makes the base 10px, in which case the range is 1.2 em to 1.6 em).
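To make the em arithmetic concrete, here is a small Python sketch of the conversion. It assumes the common 16px browser default as the base; `px_to_em` is just an illustrative helper, not part of any library.

```python
def px_to_em(px: float, base_px: float = 16.0) -> float:
    """Convert a pixel size to em relative to `base_px`."""
    return round(px / base_px, 4)


DEFAULT_BASE = 16.0         # assumed browser default font size
SCALED_BASE = 16.0 * 0.625  # body { font-size: 62.5% } gives a 10px base

for px in (12, 16):
    print(px, px_to_em(px, DEFAULT_BASE), px_to_em(px, SCALED_BASE))
# prints:
# 12 0.75 1.2
# 16 1.0 1.6
```

So the 12px to 16px range works out to 0.75em to 1em on the default base, and to 1.2em to 1.6em once the base has been scaled to 62.5%. If the user has changed their default size, the base differs and the pixel results shift with it, which is the point of using em in the first place.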
Regarding line height of paragraphs, the absolute minimum recommended is 150%. And in most cases, you should not need to go above 175%.
And again for web readability, sans-serif fonts (like Verdana) are much more highly recommended for paragraphs than serif fonts (like 'Times New Roman'). For printing, it is the opposite.
The two most recommended fonts to use on the web are Verdana and Georgia. For the best results, contrast them: best combination is Verdana for paragraphs and Georgia for titles.
You should also consider your content column width: it should fit 75 to 85 paragraph characters (13 to 18 words) per full line on average. If lines are longer than that, web readability is compromised or slowed down; narrower than that, and your paragraphs look awkward.
Justify? No, pass on it; it slows down reading a bit, thus compromises readability, especially for fast readers (and you want fast readers to like your site!). Stick to align left for English.
The reasons for all these recommendations can be found in any popular web typography book and can be explained by most typography designers, should you really want to understand the wisdom behind them in detail in order to feel convinced that these recommendations (or “best” practice guidelines) have very good reasons. Too big (or huge) a font, and readability is actually [not] enhanced, but compromised for most normal readers; too tiny a font, ditto. Why? Again, you can research that with the keywords I shared.
To summarize, for web readability of paragraphs: Verdana (sans-serif), 12–16px (not pt, and 0.75 em to 1 em when accessibility is super important), with line-height 150%–175%, 75–85 characters per line (control column width).