'font-family: Symbol' and Windows-1252 - html

I have a bunch of HTML documents that contain some simple text in Windows-1252 encoding, but throughout the text there are numerous appearances of span elements with font-family: Symbol.
For example:
<span style='font-family:Symbol'>Ñ</span>
Which appears as the greek delta - Δ in the browser.
Google told me that using the Symbol font might show different results on different systems, as it's not actually a well defined font.
Is this really true? Is it "unsafe" to use the Symbol font?
If so, is there any way to reliably convert (on my own system) such symbols in the Symbol font to their Windows-1252 counterparts?

It has always been unsafe to rely on a certain font being installed on all the computers/smartphones/gadgets that visit your site. There are some font embedding techniques that work reasonably well in some modern browsers, but you'd need to repack the Symbol font and I doubt the copyright owner allows you to do that.
Of course, most characters in the Symbol font are not in the Windows-1252 encoding but that should not be an issue. You can use the following map to obtain the appropriate HTML entities. However, you'll have to write a script or program using a programming language (HTML is just a markup language).

When you use font-family and none of the listed font faces is found on the client (that is, without webfont embeds), the browser falls back to the client's default font, so your users may see a different glyph from the one you intended.
You may want to use UTF-8 encoding and put the delta (Δ) sign in your HTML content, or use webfont embeds to provide an option, "use the font I want from this".

The problem is that the greek letter you see is just the appearance, the actual letter is something completely different.
I can think of two ways to convert it:
1. Write a script (in your language of choice) that converts each letter to its Greek counterpart. (Ñ => Δ)
2. Take a screenshot of the document/page and use an OCR-program to convert it to Greek text.
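The first option could be sketched roughly as follows in Python. The mapping table is a small illustrative subset, the regular expression only matches the simple span form shown in the question, and the names `SYMBOL_TO_UNICODE` and `convert_symbol_spans` are made up for this example. Note that reference tables for the Symbol encoding list Ñ (byte 0xD1) as nabla (∇) rather than a capital delta, so each entry is worth checking against such a table.

```python
import re

# Illustrative, partial table: character as decoded from Windows-1252 -> Unicode.
# Entries follow the Adobe Symbol layout; extend it from a full reference table.
SYMBOL_TO_UNICODE = {
    "D": "\u0394",       # GREEK CAPITAL LETTER DELTA
    "a": "\u03B1",       # GREEK SMALL LETTER ALPHA
    "b": "\u03B2",       # GREEK SMALL LETTER BETA
    "p": "\u03C0",       # GREEK SMALL LETTER PI
    "\u00C2": "\u211C",  # Â (byte 0xC2) -> BLACK-LETTER CAPITAL R
    "\u00D1": "\u2207",  # Ñ (byte 0xD1) -> NABLA, which looks like an inverted delta
}

# Matches only the simple span form shown in the question (assumed markup).
SYMBOL_SPAN = re.compile(
    r"<span\s+style=(['\"])font-family:\s*Symbol;?\1>(.+?)</span>",
    re.IGNORECASE | re.DOTALL,
)


def convert_symbol_spans(html: str) -> str:
    """Replace Symbol-font spans with numeric character references."""

    def replace(match):
        text = match.group(2)
        # Leave the span untouched if any character is missing from the table.
        if not all(ch in SYMBOL_TO_UNICODE for ch in text):
            return match.group(0)
        return "".join(f"&#x{ord(SYMBOL_TO_UNICODE[ch]):X};" for ch in text)

    return SYMBOL_SPAN.sub(replace, html)
```

Because the output uses numeric character references, the converted file can stay in Windows-1252; reading and writing it with `open(path, encoding="cp1252")` should be enough.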

Related

ž is displayed bigger than other letters

I'm currently working on a website where some words contain č, ď, ň, ř, š, ž.
Unfortunately the special letters are displayed bigger than the normal letters.
Here an example:
I use a .properties file where I include the text like this:
Any ideas? I have another website with the same content but different cms and there it works well. CSS and font is the same for both sites.
Any suggestions?
Find a webfont containing all required characters.
Of course, this may be nearly impossible if you're expecting historians to enter Linear B, hieroglyphs and other ancient and/or modern characters. AFAIK there is no single font covering all of Unicode.
As already said, probably the used font doesn't contain that character, so the browser uses another font only for that letter, which can be different in actual size.
To avoid that, you should find a font that contains all the letters you need. I usually go to Google Fonts for that (https://fonts.google.com/): in the input field at the top center, labeled "sentence" by default, type all the special characters needed (like č ď ň ř š ž ü ä ß © etc.). This will display these characters in all the fonts listed. That way you can see whether a font contains them (and how they look) or shows "?" or similar instead.
You might find the CSS Unicode range facility useful.
If you have a font which has gaps, or in which you don’t like a certain character it is giving you, then you can change it for a character from a different font.
See for example https://developer.mozilla.org/en-US/docs/Web/CSS/@font-face/unicode-range
The unicode-range CSS descriptor sets the specific range of characters
to be used from a font defined by @font-face and made available for
use on the current page. If the page doesn't use any character in this
range, the font is not downloaded; if it uses at least one, the whole
font is downloaded.
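As a rough illustration, the value of the unicode-range descriptor for the letters from the question could be generated with a few lines of Python. The function name `unicode_range` is made up for this sketch, and it assumes the letters are precomposed characters (not a base letter plus a combining caron).

```python
def unicode_range(chars: str) -> str:
    """Build a CSS unicode-range value covering the given characters."""
    code_points = sorted({ord(ch) for ch in chars})
    runs = []  # [start, end] runs of consecutive code points
    for cp in code_points:
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp  # extend the current run
        else:
            runs.append([cp, cp])  # start a new run
    parts = []
    for start, end in runs:
        if start == end:
            parts.append(f"U+{start:04X}")
        else:
            parts.append(f"U+{start:04X}-{end:04X}")
    return ", ".join(parts)


print(unicode_range("čďňřšž"))
```

The resulting string would go after `unicode-range:` inside an @font-face rule for the font that should supply just those letters.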

html special character UTF-8

I want to design my own video player. I want to add this special character as a "volume" button: 🔉 http://www.fileformat.info/info/unicode/char/1f509/index.htm. But it always shows a weird rectangle with the number 01F509.
What's wrong?
greetings
When you see a weird rectangle with the number 01F509, it means that the browser has correctly recognized the character but cannot display it because it lacks a glyph for it. Either the system has no font containing a glyph for the character, or the browser is unable to use such a font, due to a browser bug. For generalities, see my Guide to using special characters in HTML.
The Fileformat.info page cited has a link to a list of fonts that support the character. The list is short: Quivira, Segoe UI Emoji, Segoe UI Symbol, Symbola. (LastResort is not a real font.)
Segoe UI fonts are proprietary and available only in relatively new versions of Windows. Besides, these fonts exist in several versions, and this character seems to be a rather recent addition.
This means that you would need to use Quivira or Symbola as a web font (downloadable font). If you choose to do that, use e.g. http://www.fontsquirrel.com/ to generate the font files and the CSS code for using them. Note that both Quivira and Symbola are rather large fonts, so using them just to get one glyph is a bit disproportionate.
At this point, it is rather obvious that some other approach is most probably better, e.g. using an image in sufficient size and reducing it according to font size by setting its height in em units.
To avoid encoding issues, try including the character in your HTML code using the hexadecimal character reference notation: &#x1F509;. Then make sure that the font you are using contains this character; otherwise you'll see a rectangle in place of the character.
You may create your own font with the desired character using tools such as http://icomoon.io
Also note that some browsers have issues displaying characters outside the range 0000 to FFFF (plane 0, the Basic Multilingual Plane (BMP)). I have experienced the issue with Safari on Windows and IE <= 8. So try to avoid characters outside this range if you want to support all browsers.
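Both points above, the hexadecimal notation and the check for characters beyond the BMP, can be tried out with a small Python sketch. The helper names `hex_reference` and `outside_bmp` are invented for this example.

```python
def hex_reference(ch: str) -> str:
    """Return the hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"


def outside_bmp(text: str) -> list:
    """List the characters in text whose code point is above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


speaker = "\U0001F509"  # the volume character from the question
print(hex_reference(speaker))
print(len(outside_bmp("Volume " + speaker)))
```

Running `outside_bmp` over a page's text gives a quick list of the characters that older browsers may have trouble with.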

Why the HTML entity doesn't work on the Windows Chrome?

HTML entities are not working on chrome and IE (on windows).
I have entered the following code in my page and it works fine on mac chrome or firefox or safari, but not on windows.
<span class="font-family:Arial;">〈 〉 〉 〈 </span>
This is primarily a font issue, though there is a nasty silent change in HTML specs involved.
Modern browsers interpret &lang; and &rang; as referring to U+27E8 MATHEMATICAL LEFT ANGLE BRACKET “⟨” and U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET “⟩”, informally known as “bra” and “ket”. This interpretation is being made official in the named character references section of HTML5.
These characters are adequate for use in many mathematical notations, and the ISO 80000-2 standard explicitly specifies that they are used e.g. for certain scalar product notations. But support for them in fonts is rather limited. In old Windows systems, no font contains them. In newer Windows systems, from Windows Vista onwards, Cambria Math should be available. It is possible that you have been testing on an old Windows version, but it is also possible that Chrome is unable to find the right font. To give it a helping hand, use a CSS rule that suggests that font, e.g. with the attribute
style="font-family: Cambria Math"
You might consider adding some other fonts to the list, using fonts that are known to contain the characters. See my Guide to using special characters in HTML.
The nasty change is that in HTML 4.01, in the entities section, &lang; and &rang; are defined as referring to U+2329 LEFT-POINTING ANGLE BRACKET “〈” and U+232A RIGHT-POINTING ANGLE BRACKET “〉”. They are logically less satisfactory (and deprecated by the Unicode Standard), but they have somewhat wider font support.
So in addition to declaring fonts that contain the characters you use, you need to decide which pair of these characters you use or whether you use something else; it's a complicated question. If you use them, it is best to use them as such (in a UTF-8 encoded HTML document) or using numeric character references such as &#x27E8;. The reason is that &lang; and &rang; should not be expected to work consistently; they probably work the HTML5 way in all modern browsers, but there is hardly any reason to take the risk, when you can unambiguously indicate the characters you want.
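The two interpretations of the lang and rang named references can be compared using Python's standard library, which ships both the HTML 4.01 entity table (`html.entities.name2codepoint`) and the HTML5 one (`html.entities.html5`). This is only a way to look up the definitions; it says nothing about what a given browser or font will do.

```python
import html
from html.entities import html5, name2codepoint

# name2codepoint holds the HTML 4.01 entity definitions.
html4_lang = name2codepoint["lang"]
html4_rang = name2codepoint["rang"]

# html5 holds the HTML5 named character references (keys include the ";").
html5_lang = html5["lang;"]
html5_rang = html5["rang;"]

print(f"HTML 4.01: lang = U+{html4_lang:04X}, rang = U+{html4_rang:04X}")
print(f"HTML5:     lang = U+{ord(html5_lang):04X}, rang = U+{ord(html5_rang):04X}")

# html.unescape follows the HTML5 table.
decoded = html.unescape("&lang;x&rang;")
print([f"U+{ord(ch):04X}" for ch in decoded])
```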
That particular character is simply a unicode codepoint which is an arbitrary number. There are a lot of unicode codepoints that do not have an 'official' symbol. Even if they do have a symbol, it is not necessarily the case that your font has a symbol for that codepoint. If you choose a different font, you may end up with a different symbol.
I looked at the CSS for the page and it shows this character displaying in Arial (plus a bunch of other fonts that do not matter). Windows comes with Arial, so it should always pick up that font first. It looks like Arial does not have a symbol for that Unicode codepoint. Any time a font does not have a glyph for a codepoint, the browser puts in some form of a box indicating that there is no glyph.
It depends on the entity, and the fonts on the system your reader is using. The issue is that these characters are not in the MathJax web fonts, so MathJax has to fall back on system fonts to find them. Some browsers are better at that than others. Your configuration controls what fonts MathJax lists for the browser to look in, so you may want to modify that to include fonts where you know your entities can be found (and you may want to think about the fact that you may have people reading your site on Windows, Mac, and Linux, and also mobile devices, so such decisions are not always easy).
Notice that when you install STIX fonts, it works for you. This is because STIXGeneral is in the default list of fonts that MathJax uses for unknown characters. You want to add others to that list (it is stored in the undefinedFamily property of the HTML-CSS and SVG sections of your configuration). Note however, that IE will stop checking fonts once it encounters a font that is installed on the system, even if it doesn't include the needed character and later fonts in the list do, so you have to be careful about the order that you use.

Render MS Symbol font characters in html5

I want to take characters in the Microsoft Symbol font (taken from the w:sym tag in a docx file) and render them in html. When I look at how Word writes out the characters when I save the doc as html, I see this:
<span style='mso-char-type:symbol;mso-symbol-font-family:Symbol'>Â</span>
This appears as a script R in both Word and Word's html output.
When I write the same thing in my own html file, I see the A-hat in the regular font, and Chrome's element inspector warns that the mso- properties are unknown.
In Word's html output there is lots of mso-specific stuff but nothing I can see that lets Chrome know how to interpret mso-char-type and mso-symbol-font. I see the same behavior in IE.
Is there an easy way to tell the browser to use the Symbol font? Or do I have to explicitly translate the Symbol font characters to Unicode (using a static translation table?)
Thanks,
Wayne
The Symbol font is a privately-encoded font, i.e. it places various glyphs in positions that should be occupied by other characters according to character code standards. This means that a web page using it will fail badly whenever the Symbol font is not available, or the page style sheet is overridden, or the browser behaves correctly: e.g., the letter “Â” cannot be rendered using the Symbol font, so the browser will use a fallback font.
The proper way is to use Unicode encoded characters, such as “ℜ”, in a UTF-8 encoded page, with font-family on the applicable element set to contain a list of fonts that contain this character. For general notes on this, see my Guide to using special characters in HTML.
An inappropriate way that has worked on some faulty browsers is to set font to Symbol in a manner generally understood by browsers, e.g. <font face=Symbol>Â</font> or <span style="font-family: Symbol">Â</span>. But as said, if this “works”, consider it a browser bug.
So yes, if you now have data using Symbol font, it should be mapped to Unicode characters.
Note that characters like “ℜ” (Black-letter capital R, not script R) are seldom needed. In particular, the standard (as per ISO 80000-2) notation for the real part of a complex number z is not ℜ(z) but Re z.
Ok, just removing mso-symbol and writing font-family:Symbol seems to have worked. However I suspect this is not really best practice... A table for translating symbols into unicode can be found here: http://www.alanwood.net/demos/symbol.html
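For the docx side, a translation step might look like the following Python sketch. It assumes the w:char attribute of w:sym holds a hexadecimal value and that Word often stores Symbol codes shifted into the private use range F000 to F0FF; both points should be checked against your own files. The table is a tiny illustrative subset and the name `sym_to_unicode` is hypothetical.

```python
from typing import Optional

# Illustrative, partial table: Symbol font code -> Unicode character.
SYMBOL_CODE_TO_UNICODE = {
    0x44: "\u0394",  # D -> GREEK CAPITAL LETTER DELTA
    0xC2: "\u211C",  # Â -> BLACK-LETTER CAPITAL R
}


def sym_to_unicode(w_char: str) -> Optional[str]:
    """Map the hex value of a w:sym w:char attribute to a Unicode character."""
    code = int(w_char, 16)
    # Assumption: values in F000-F0FF are the font code shifted into the
    # private use area, so the offset is removed before the lookup.
    if 0xF000 <= code <= 0xF0FF:
        code -= 0xF000
    # Returns None when the code is not in the (partial) table.
    return SYMBOL_CODE_TO_UNICODE.get(code)
```

A `None` result is a signal to extend the table rather than to emit the raw character, since the raw character would render in the regular font.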

Prevent browsers from using default/fallback fonts

I have a web app in which a user can change the font family of an input text area in a WYSIWYG-like style. Now, let's say the user inputs some Chinese text in the text area, but selects a font that has no support for Chinese characters. In my application, I'd like the user to see those nasty squares (or something like that) that are usually shown when the font doesn't support the character. That way, the user would know that the font doesn't support the language and could choose a different one. The issue I'm having is that the browsers (Firefox 17 and Chrome 23) seem to render the Chinese part of the text with fonts (such as Arial) that do support those Chinese characters, making the user believe that the font he's trying to use works fine.
Is there a way (I'm guessing through CSS) to prevent this? Is there a way of making the browsers not to be so "nice" for only this time?
Thank you in advance.
As the other answer already explained, the solution is to use a fallback font which includes 'all' Unicode codepoints. However, the difficult part was to find or build one that doesn't weigh a few MB.
A few years later there is now a more lightweight solution for a fallback font, the NotDef font by Adobe. It shows a box with a cross for 1,111,998 Unicode code points, is only about 22 KB and uses the SIL Open Font License, Version 1.1.
If you don't want to show anything there is also the Adobe Blank font.
You can intercept the font substitution process by throwing in a catchall font, using some equivalent of font-family: userChoice, yourCatchAll where yourCatchAll is a font that has a generic glyph for all characters.
The problem is in finding such a font. The LastResort font distributed by the Unicode Consortium would be ideal, since it also visually indicates the category of the character in broad terms, but its EULA does not seem to allow modifications. It is debatable whether this applies to the construction of web font formats (like .eot and .woff).
The Unicode BMP Fallback Font appears to have more liberal rules of use, but it displays a character simply as its Unicode number in a box (and supports only Basic Multilingual Plane, though it contains all characters that most people ever heard of).
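If the font-family value is generated on the server, the catch-all approach could be sketched like this in Python. The function name `font_family_with_catchall` is hypothetical, and the family name passed for the catch-all font is whatever your own @font-face rule declares; "Adobe NotDef" below is only an assumed example.

```python
def font_family_with_catchall(user_choice: str, catch_all: str) -> str:
    """Build a font-family declaration ending in the catch-all font."""

    def quote(name: str) -> str:
        # Escape backslashes and double quotes so the name is a valid CSS string.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    # No generic family (serif, sans-serif) is appended on purpose: the
    # catch-all font is meant to be the last font the browser consults.
    return f"font-family: {quote(user_choice)}, {quote(catch_all)};"


print(font_family_with_catchall("Lobster", "Adobe NotDef"))
```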