How do you choose alternate glyphs for a character? - actionscript-3

I'm working with the Gotham font, and it has alternate glyphs for the same Unicode character. For example, at U+0031 it has one glyph under GID 388 and another under GID 257.
Is there any way I can specify the GID used?
Examples, as seen in Photoshop: GID 257 and GID 388 (screenshots not reproduced here).

No, that's not possible programmatically with AS3/Flash; the only way I see is to create another, remapped font based on Gotham.

Related

How to display ASCII 26 (control characters) in HTML

We have a record in a SQL database that contains an ASCII 26 character:
SELECT char(26)
By the look of it, it's an arrow, which we can see in the Eclipse debugger. However, when we try to output it to the HTML front-end, the character is simply skipped. Stranger still, the arrow does appear in the page source.
It seems 26 belongs to the control characters. So is it possible to display the arrow in HTML? And why can some places, like the Eclipse debugging window, show it fine?
It's a control character, unprintable by definition. Some character sets (or fonts, not sure which determines that) do print control characters; Unicode is not one of them. See Browser Test Page for Unicode Character 'SUBSTITUTE' (U+001A).
Decide what you actually want to display, and replace this character with an actually printable Unicode character.
You could, for example, use → (&#x2192;), Unicode Character 'RIGHTWARDS ARROW' (U+2192).
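A minimal TypeScript sketch of that replacement (the function name is just illustrative):

// Replace the unprintable SUB control character (U+001A) with a printable arrow
// before sending the text to the HTML front-end.
function replaceSubstitute(text: string): string {
  return text.replace(/\u001a/g, "\u2192"); // U+2192 RIGHTWARDS ARROW
}

console.log(replaceSubstitute("before\u001aafter")); // "before→after"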

Use specific glyph name with no Unicode value in HTML?

How can I use, for example, the glyph named "rcaron.terminal", which has no Unicode value, in HTML? Or any other such case? Is it even possible? I think it must be, surely, but I have no clue. It's easy for regular letters like the glyph "ß", where I would just type "&#xDF;" or "&#223;" (same result) and get that character, but for glyphs without any Unicode value I don't know what I'm supposed to do. I've also tried "&rcaron.terminal;" but got nothing, whereas something like "&hearts;" works, giving a heart glyph of god knows what font, probably Arial.
Do I need to state some specific encoding aside from ANSI in my HTML document?
i.e. <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8"> or something... I'm really lost.
All I found on the net was this http://text-symbols.com/html/unicode/ but I can't find any more info, so I came here.
Please help! Thanks! :)
HTML can only reference Unicode characters, not individual glyphs, so there is no way to address a glyph that has no Unicode code point of its own.
If you really need to have a glyph which is not representable using regular Unicode, you might want to create a font of your own and define the glyphs you need in the private use area; but obviously, then, your HTML will be impossible to use without that particular font.
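A minimal TypeScript sketch, assuming you have built such a custom font that maps the "rcaron.terminal" glyph to a Private Use Area code point (U+E000 here is a purely hypothetical choice) and that you load the font via @font-face:

const RCARON_TERMINAL = 0xe000; // hypothetical PUA slot in your own font

const asCharacter = String.fromCodePoint(RCARON_TERMINAL);
const asReference = `&#x${RCARON_TERMINAL.toString(16).toUpperCase()};`; // "&#xE000;"

// Either form can be placed in markup that is styled with the custom font:
console.log(`<span class="my-pua-font">${asCharacter}</span>`);
console.log(`<span class="my-pua-font">${asReference}</span>`);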
Background links:
http://arstechnica.com/information-technology/2008/10/embedded-web-fonts/
http://www.font-face.com/
Practical guides:
http://blog.fogcreek.com/trello-uses-an-icon-font-and-so-can-you/
http://blogs.atlassian.com/2013/07/how-to-make-an-icon-font-the-8-step-guide/
First, navigate to this site: https://fontdrop.info/#/?darkmode=true
Upload your font file.
Click on the Ligatures tab.
Every glyph should have a Components field.
Copy the components for the character you want to use.
Paste that string into your HTML.
You don't need any & or #; the font's ligature substitution just detects the string and converts it to the glyph.

Displaying UTF-8 codes from JSON file as Emoticons

I am loading a JSON file that contains some UTF-8 codes that represent emoticons.
The JSON content looks as follows:
"Studying! \uf4d6"
"Winning \uf40e\uf3c1 #4mile"
"Cheer me on \uf603 #werunamsterdam"
These UTF-8 codes are displayed as blocks in the browser. But when I look at this Unicode reference in Firefox, the codes are actually recognized!
(for example, U+F4D6 is a book)
How do I convert the codes from my JSON so that a browser can display them?
The code points from \uE000 to \uF8FF are in a private use area, so there aren't any standard glyphs associated with them.
You can, however, create your own font with suitable icons at these code points. This can be done quite easily using online tools like IcoMoon. Alternatively, use a string replacement routine to swap these characters with suitable markup (e.g., replace \uf4d6 with <img src="/icons/book.png" alt="[Book]" />).
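A minimal TypeScript sketch of that replacement approach; the /icons/ directory and the mapping table are hypothetical (only the book entry comes from the example above):

const icons: Record<number, string> = {
  0xf4d6: "book", // the \uf4d6 in "Studying! \uf4d6"
};

// Swap every Private Use Area character (U+E000–U+F8FF) for an <img> tag,
// leaving characters we have no icon for untouched.
function replacePuaWithImages(text: string): string {
  return text.replace(/[\uE000-\uF8FF]/g, (ch) => {
    const name = icons[ch.charCodeAt(0)];
    return name ? `<img src="/icons/${name}.png" alt="[${name}]" />` : ch;
  });
}

console.log(replacePuaWithImages("Studying! \uf4d6"));
// Studying! <img src="/icons/book.png" alt="[book]" />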
These emoticons are encoded as regular characters as defined in Unicode, i.e. they're no different from the letter "A" or "%". All you need is a font that has glyphs for these "characters". Since not everyone can be expected to have such fonts installed (apparently you don't), if you want maximum compatibility, there are libraries for most languages that replace these characters with equivalent images. Google for one that suits your needs.

Detect Multibyte and Chinese Characters in rtf markup

I'm trying to parse an RTF-formatted message (I need to keep the formatting tags, so I can't use the trick where you just paste into a RichTextBox and get the .PlainText out).
Take the RTF code for the string a基bমূcΟιd pasted straight into Wordpad:
{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset0 Calibri;}{\f1\fswiss\fcharset128 MS PGothic;}{\f2\fnil\fcharset1 Shonar Bangla;}{\f3\fswiss\fcharset161{\*\fname Arial;}Arial Greek;}}
{\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\f0\fs22 a\f1\fs24\'8a\'ee\f0\fs22 b\f2\fs24\u2478?\u2498?\f0\fs22 c\f3\fs24\'cf\'e9\f0\fs22 d\par
}
It's difficult to make out if you've not had much to do with RTF, so here's the bit I'm looking at:
\'8a\'ee\f0\fs22 b\f2\fs24\u2478?\u2498?\f0\fs22 c\f3\fs24\'cf\'e9
Notice that the 基 (U+57FA) is \'8a\'ee, while মূ, which is actually two characters ম (\u2478?) and ূ (\u2498?), is \u2478?\u2498?, which is fine; but Οι, which is two separate characters Ο and ι, is \'cf\'e9.
Is there a way to determine whether I'm looking at something that should be one character, such as 基 = \'bb\'f9, or two characters, such as Ο and ι = \'cf\'e9?
I was thinking that maybe \lang was the key, but that isn't the case at all, because \lang does not change after it's first set. I am already accounting for the different codepages implied by the different charset values in the fonts, but that doesn't seem to tell me whether I should treat two escapes next to each other as a single double-byte character or not.
How can I tell if the character I'm looking at should be double-byte (or multi-byte) or single-byte?
\'xx escapes represent bytes and should be interpreted using the \fcharset encoding (or potentially \cchs, falling back to \ansicpg if neither is present).
You need to know that encoding intimately to be able to decide whether a single \'xx sequence represents a character on its own or is only a part of a multi-byte character; typically you will be consuming each section of text as a unit before converting that byte string into a Unicode string using whatever library or OS interface you have available, to avoid having to write byte-by-byte parsers for every code page supported by RTF.
\uxxxx? escapes represent UTF-16 code units. This is much simpler, but Word[pad] only produces this form of encoding as a last resort, because it's not compatible with earlier RTF versions. (? is the fallback character for when the receiver can't cope with the Unicode.)
So:
The two characters Οι are represented as two byte-escapes because the font associated with that stretch of text is using a Greek single-byte encoding (charset 161 = cp1253).
The one character 基 is represented as two byte-escapes because the font associated with that stretch of text is using a Japanese multibyte encoding (charset 128 = cp932 ≈ Shift-JIS). In Shift-JIS the leading \'8a byte signals a further byte to come, as do various others in the top-bit-set range (but not all of them).
The two characters মূ are represented as Unicode code unit escapes, because there's no other option: there isn't any RTF-compatible code page that contains Bengali characters. (Code page 57003 for ISCII came much later.)
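A minimal TypeScript sketch of the decoding rules above (assuming a runtime whose TextDecoder supports the shift_jis and windows-1253 labels, e.g. a browser or Node.js with full ICU):

const shiftJis = new TextDecoder("shift_jis");  // \fcharset128 -> cp932
const greek = new TextDecoder("windows-1253");  // \fcharset161 -> cp1253

// \'8a\'ee under the Japanese font: two bytes, one character.
console.log(shiftJis.decode(new Uint8Array([0x8a, 0xee]))); // "基"

// \'cf\'e9 under the Greek font: two bytes, two characters.
console.log(greek.decode(new Uint8Array([0xcf, 0xe9]))); // "Οι"

// \u2478?\u2498? escapes are already UTF-16 code units.
console.log(String.fromCharCode(2478, 2498)); // "মূ"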
RTF has tags for specifying the codepage/encoding used to encode characters. The actual hex codes in the \'xx escapes are the byte values from the specified encoding; in this case, \ansicpg1252 declares ANSI codepage 1252 as the default.

What characters are allowed in the HTML Name attribute inside input tag?

I have a PHP script that will generate <input>s dynamically, so I was wondering if I needed to filter any characters in the name attribute.
I know that the name has to start with a letter, but I don't know any other rules. I figure square brackets must be allowed, since PHP uses these to create arrays from form data. How about parentheses? Spaces?
Note that not all characters are submitted as-is in the name attributes of form fields (even when using POST)!
Leading and trailing whitespace is trimmed, and inner whitespace characters, as well as the character ., are replaced by _.
(Tested in Chrome 23, Firefox 13 and Internet Explorer 9, all Win7.)
Any character you can include in an [X]HTML file is fine to put in an <input name>. As Allain's comment says, <input name> is defined as containing CDATA, so the only things you can't put in there are the control codes and invalid codepoints that the underlying standard (SGML or XML) disallows.
Allain quoted W3 from the HTML4 spec:
Note. The "get" method restricts form data set values to ASCII characters. Only the "post" method (with enctype="multipart/form-data") is specified to cover the entire ISO10646 character set.
However this isn't really true in practice.
The theory is that application/x-www-form-urlencoded data doesn't have a mechanism to specify an encoding for the form's names or values, so using non-ASCII characters in either is “not specified” as working and you should use POSTed multipart/form-data instead.
Unfortunately, in the real world, no browser specifies an encoding for fields even when it theoretically could, in the subpart headers of a multipart/form-data POST request body. (I believe Mozilla tried to implement it once, but backed out as it broke servers.)
And no browser implements the astonishingly complex and ugly RFC2231 standard that would be necessary to insert encoded non-ASCII field names into the multipart's subpart headers. In any case, the HTML spec that defines multipart/form-data doesn't directly say that RFC2231 should be used, and, again, it would break servers if you tried.
So the reality of the situation is there is no way to know what encoding is being used for the names and values in a form submission, no matter what type of form it is. What browsers will do with field names and values that contain non-ASCII characters is the same for GET and both types of POST form: it encodes them using the encoding the page containing the form used. Non-ASCII GET form names are no more broken than everything else.
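A minimal TypeScript sketch of what that looks like for the common case of a UTF-8 page (URLSearchParams always serializes as UTF-8, so this illustrates only that one case; a page in, say, windows-1252 would send different bytes for the same name, and the server has no in-band way to know which encoding was used):

const body = new URLSearchParams({ "naïve name": "café" }).toString();
console.log(body); // "na%C3%AFve+name=caf%C3%A9"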
DLH:
So name has a different data type for form controls than it does for other elements?
Actually the only element whose name attribute is not CDATA is <meta>. See the HTML4 spec's attribute list for all the different uses of name; it's an overloaded attribute name, having many different meanings on the different elements. This is generally considered a bad thing.
However, typically these days you would avoid name except on form fields (where it's a control name) and param (where it's a plugin-specific parameter identifier). That's only two meanings to grapple with. The old-school use of name for identifying elements like <form> or <a> on the page should be avoided (use id instead).
The only real restriction on what characters can appear in form control names is when a form is submitted with GET:
"The "get" method restricts form data set values to ASCII characters." reference
There's a good thread on it here.
While Allain's comment did answer the OP's direct question and bobince provided some brilliant in-depth information, I believe many people come here seeking an answer to a more specific question: "Can I use a dot character in a form's input name attribute?"
As this thread came up as the first result when I searched for this, I figured I might as well share what I found.
Firstly, Matthias claimed that:
character . are replaced by _
This is untrue. I don't know if browsers actually did this kind of operation back in 2013, though I doubt it. Browsers send dot characters as they are (I'm talking about POST data)! You can check it in the developer tools of any decent browser.
Please notice the tiny comment by abluejelly, which is probably missed by many:
I'd like to note that this is a server-specific thing, not a browser thing. Tested on Win7 FF3/3.5/31, IE5/7/8/9/10/Edge, Chrome39, and Safari Windows 5, and all of them sent " test this.stuff" (four leading spaces) as the name in POST to the ASP.NET dev server bundled with VS2012.
I checked it with the Apache HTTP Server (v2.4.25), and indeed an input name like "foo.bar" is changed to "foo_bar". But in a name like "foo[foo.bar]" the dot is not replaced by _!
My conclusion: you can use dots, but I wouldn't, as this may lead to unexpected behaviour depending on the server-side stack used.
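A minimal TypeScript sketch showing that the browser-side serialization keeps the dot intact; any foo.bar → foo_bar mangling happens in the server-side parser:

const body = new URLSearchParams({ "foo.bar": "1", "foo[foo.bar]": "2" }).toString();
console.log(body); // "foo.bar=1&foo%5Bfoo.bar%5D=2"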
Do you mean the id and name attributes of the HTML input tag?
If so, I'd be very tempted to restrict (or convert) the allowed "input" name characters to only a-z (A-Z), 0-9 and a limited range of punctuation (".", ",", etc.), if only to limit the potential for XSS exploits, etc.
Additionally, why let the user control any aspect of the input tag? (Might it not ultimately be easier, from a validation perspective, to keep the input tag names as 'custom_1', 'custom_2', etc. and then map these as required?)
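A minimal TypeScript sketch of that whitelisting idea (the exact allowed character set is a judgement call for your application); the alternative is to ignore the user-supplied label entirely and emit custom_1, custom_2, ... with a separate mapping table:

// Reduce a user-supplied label to a safe subset before using it as an input name.
function sanitizeInputName(raw: string): string {
  return raw.trim().replace(/[^A-Za-z0-9._-]/g, "_");
}

console.log(sanitizeInputName("user <script> name")); // "user__script__name"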