Unprintable characters in MySQL - mysql

I want to test if a certain BLOB starts with character 255 (\xff). Is there a way to encode this character in a literal string?
Table 9.1. Special Character Escape Sequences gives certain special characters but not a way to encode arbitrary characters.
Failing a way to encode characters, is there a different workaround?

Use a hexadecimal literal, e.g. X'FF'. See http://dev.mysql.com/doc/refman/5.0/en/hexadecimal-literals.html
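As a rough, runnable illustration of the hexadecimal literal, the sketch below uses SQLite through Python's standard library instead of MySQL, only because it needs no server; SQLite happens to accept the same X'FF' syntax. The table name t and column name data are made up for the example. In MySQL the comparison would presumably look like LEFT(data, 1) = X'FF', but that form is not tested here.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption made for
# this sketch); the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, data BLOB)")
conn.executemany(
    "INSERT INTO t (data) VALUES (?)",
    [(b"\xff\x01\x02",), (b"\x00\xff",)],
)

# X'FF' is a one-byte literal with value 255; compare it to the first byte.
rows = conn.execute(
    "SELECT id FROM t WHERE substr(data, 1, 1) = X'FF'"
).fetchall()
matching_ids = [row[0] for row in rows]
print(matching_ids)
```

Only the first row starts with byte 255, so only its id is selected.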

Related

Control characters in JSON string

The JSON specification states that control characters that must be escaped are only with codes from U+0000 to U+001F:
7. Strings
The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks, except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
The main idea of escaping is to avoid damaging the output when a JSON document or message is printed on a terminal or on paper.
But there are other control characters, such as [DEL] from the C0 set and the control characters of the C1 set (U+0080 through U+009F). Shouldn't they also be escaped in JSON strings?
From the JSON specification:
8. String and Character Issues
8.1. Character Encoding
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.
In UTF-8, all code points above 127 are encoded as multiple bytes. Many of those bytes (the continuation bytes 0x80 through 0x9F) have the same values as the C1 control characters. So in order to avoid having those bytes in a UTF-8 encoded JSON string, all of those code points would need to be escaped. This effectively eliminates the use of UTF-8, and the JSON string might as well be encoded in ASCII. As ASCII is a subset of UTF-8, this is not disallowed by the standard. So if you are concerned about putting C1 control characters in the byte stream, just escape them, but requiring every JSON representation to use ASCII would be wildly inefficient in anything but an English environment.
UTF-16 and UTF-32 could not possibly be parsed by something that uses the C1 (or even C0) control characters so the point is rather moot for those encodings.
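The point about UTF-8 bytes can be checked with a small sketch. It uses Python's json module purely as an example encoder; other JSON libraries may make different escaping choices. The euro sign and U+0085 are arbitrary sample characters.

```python
import json

euro = "\u20ac"   # EURO SIGN, an ordinary printable character
nel = "\u0085"    # NEXT LINE, a C1 control character

# UTF-8 encodes the euro sign as E2 82 AC; byte 0x82 lies in 0x80-0x9F,
# the same values as the C1 control range.
euro_bytes = euro.encode("utf-8")
c1_valued = [b for b in euro_bytes if 0x80 <= b <= 0x9F]

# With ensure_ascii=False only U+0000-U+001F, '"' and '\' are escaped,
# so DEL and the C1 character pass through raw.
raw = json.dumps(nel + "\u007f" + "\u0001", ensure_ascii=False)

# With the default ensure_ascii=True everything outside printable ASCII
# is written as a \uXXXX escape.
ascii_only = json.dumps(nel + euro)

print(euro_bytes.hex(), [hex(b) for b in c1_valued])
print(repr(raw))
print(ascii_only)
```

Both outputs are valid JSON; the second simply trades size for a byte stream free of anything outside ASCII.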

Difference In UTF-8 and ISO-8859-1

I have this &#2361 &#2379 and also this \u0936\u093e\u0902\u0924\u093f
But I don't know which encoding each of them belongs to.
The Hindi text gets stored in the database in the first form.
So please tell me what type of encoding it is.
Also, how can I get my Hindi characters in the second form (\u0924\u093f)?
Both &#2361; and \u0939 encode the Unicode character DEVANAGARI LETTER HA: ह
&#2361; is the HTML character reference for the Unicode character 2361 in decimal, which is equivalent to hexadecimal 0939.
\u0939 is the JavaScript escape sequence for the Unicode character 0939 in hexadecimal, which is equivalent to decimal 2361.
Neither is a character encoding; they are two notations for the same code point. The ISO 8859 character sets do not include those characters.
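To make the relationship between the two notations concrete, here is a minimal Python sketch that converts in both directions. It assumes the stored text uses decimal references with a trailing semicolon, and it uses json.dumps only as a convenient way to produce \uXXXX escapes.

```python
import html
import json

# Decimal HTML character reference -> actual character.
ch = html.unescape("&#2361;")

# Actual character -> \uXXXX escape (strip the surrounding JSON quotes).
escaped = json.dumps(ch)[1:-1]

# \uXXXX escapes -> characters -> decimal HTML character references.
word = json.loads('"\\u0936\\u093e\\u0902\\u0924\\u093f"')
as_entities = "".join(f"&#{ord(c)};" for c in word)

# ISO 8859-1 has no Devanagari, so encoding to it fails.
try:
    word.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(ch, escaped, as_entities, latin1_ok)
```

The same ord-based arithmetic applies to any code point, for example 20351 (hexadecimal 4F7F) for 使.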

How to get the represented character for an encoding in Racket?

From an HTML file I got &#20351;. I know it represents the Chinese character 使. But how do I convert it in Racket?
You can use (integer->char 20351) to get that character, or you can use literal syntax, #\u4F7F.

Ampersand character in OBX segment causing problems - HL7 formatting

There are HTML equivalents for "<" and ">" ("&lt;" and "&gt;") in the OBX-5 field, which cause the Terser.get(..) method to fetch only the characters up to the ampersand. The encoding characters in MSH-2 are "^~\&". Is Terser.get(..) failing because there's an encoding character in the OBX-5 field? Is there an easy way to change these entities to "<" and ">"?
Thanks a lot for your help.
Yes, it fails because the ampersand has been declared as the subcomponent separator and the message you are trying to process is not valid -- it should not contain (unescaped) HTML character entities (&lt; and &gt;).
If you cannot help how the incoming messages are encoded you should preprocess the message before giving it to terser, replacing illegal characters. I'm pretty sure HAPI cannot help you there.
In a valid HL7v2 message, the data type used in OBX-5 is determined by OBX-2. OBX-5 should only contain the characters and escape sequences allowed by declared data type. < and > are among them (if not declared as separators in MSH-2).
The HL7 standard defines escape sequences for the separator and delimiter characters (e.g. \T\ is the escape sequence for the subcomponent separator).
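The preprocessing step suggested above could look roughly like the following sketch. It is written in Python for brevity, not in the Java that HAPI uses. The function name is hypothetical, and it assumes the default encoding characters "^~\&" and that only &lt;, &gt; and &amp; occur in the incoming data.

```python
def replace_html_entities(message: str) -> str:
    """Rewrite HTML entities so that no stray '&' is left in the message.

    Hypothetical helper; assumes MSH-2 is "^~\\&", so '&' is the
    subcomponent separator and a literal ampersand must become \\T\\.
    """
    message = message.replace("&lt;", "<").replace("&gt;", ">")
    # A literal ampersand is not allowed unescaped; use the HL7 escape.
    return message.replace("&amp;", "\\T\\")


# Made-up sample segments for illustration only.
msh = "MSH|^~\\&|LAB|HOSP"
obx = "OBX|1|ST|1234^Result||value &lt; 5 &amp; &gt; 1||||||F"

cleaned_msh = replace_html_entities(msh)
cleaned_obx = replace_html_entities(obx)
print(cleaned_obx)
```

The MSH segment is left untouched because its "&" is not part of any of the three entities. The cleaned text would then be handed to the parser in place of the original message.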

Why doesn't JSON data include special characters?

Why doesn't JSON data support special characters?
If JSON data includes special characters, e.g. \r, /, \b, \t, you must transfer them, but why?
JSON supports all Unicode characters in strings. What do you mean by "transferring"?
Those characters need to be escaped because the JSON specification says so. For some characters the reason is simple -- for example, double quotes need to be escaped because an unescaped double quote ends the String value, so there would be no way to tell the end marker from a quote character in the content. For linefeeds, the reason was probably to enforce the limitation that no String value spans multiple text lines; and for other control characters, to avoid "invisible characters". This is similar to the escaping required by XML or CSV; all textual data formats require escaping, or prohibit the use of certain characters.
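A short sketch of what this escaping looks like in practice, using Python's json module as one example implementation (the sample text is made up):

```python
import json

text = 'say "hi"\n\tdone'

# The encoder escapes the quotes, the linefeed and the tab.
encoded = json.dumps(text)

# Decoding restores the original characters exactly.
round_trip = json.loads(encoded)

# A raw (unescaped) linefeed inside a string is rejected by the parser.
try:
    json.loads('"line1\nline2"')
    raw_newline_accepted = True
except json.JSONDecodeError:
    raw_newline_accepted = False

print(encoded)
print(round_trip == text, raw_newline_accepted)
```

So nothing is lost: every character can appear in a JSON string, some of them only in escaped form.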