Convert a Mix of ASCII and Escape Sequences Into Binary

I have an encoded binary given as a long string of ASCII characters, hex escape sequences, and null escape sequences. How can I convert it back into a binary file?
Sample:
Res\x01\x0c\0\0\0\x12\xff\x88\x01\x01\xfccave\x02>\0\x01\x03EOF\0
The solutions I've tried, such as converting hex to binary
(https://unix.stackexchange.com/questions/352569/converting-from-binary-to-hex-and-back),
either can't account for the plain ASCII characters or for the escaped null characters.
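One workaround that handles all three cases (literal ASCII, \xHH escapes, and \0) is to let a C-style escape decoder do the interpretation. The sketch below assumes the escaped text is available as a byte string with literal backslashes (e.g. read from a file), and it uses `codecs.escape_decode`, an undocumented but long-standing CPython helper:

```python
import codecs

# Assumed input: the escaped text exactly as it appears on disk, with
# literal backslash sequences (note the rb'' raw-bytes literal).
raw = rb'Res\x01\x0c\0\0\0\x12\xff\x88\x01\x01\xfccave\x02>\0\x01\x03EOF\0'

# escape_decode interprets C-style escapes (\xHH, \0, \n, ...) and
# passes plain ASCII bytes through unchanged.
data, _ = codecs.escape_decode(raw)

# Write the decoded bytes out as a raw binary file.
with open('out.bin', 'wb') as f:
    f.write(data)

print(data.hex())  # 27 bytes for the sample above
```

Since `escape_decode` is not part of the documented API, a more conservative alternative is `raw.decode('latin-1').encode().decode('unicode_escape').encode('latin-1')`, which round-trips through the documented `unicode_escape` codec.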

Related

Control characters in JSON string

The JSON specification states that the only control characters that must be escaped are those with codes U+0000 through U+001F:
7. Strings
The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks, except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
The main idea of escaping is to avoid damaging the output when printing a JSON document or message on a terminal or on paper.
But there are other control characters, like DEL from the C0 set, as well as the C1 set (U+0080 through U+009F). Shouldn't they also be escaped in JSON strings?
From the JSON specification:
8. String and Character Issues
8.1. Character Encoding
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.
In UTF-8, all code points above 127 are encoded in multiple bytes. About half of those bytes are in the C1 control character range. So in order to avoid having those bytes in a UTF-8 encoded JSON string, all of those code points would need to be escaped. This effectively eliminates the use of UTF-8, and the JSON string might as well be encoded in ASCII. As ASCII is a subset of UTF-8, this is not disallowed by the standard. So if you are concerned about putting C1 control characters in the byte stream, just escape them, but requiring every JSON representation to use ASCII would be wildly inefficient in anything but an English environment.
UTF-16 and UTF-32 could not possibly be parsed by something that uses the C1 (or even C0) control characters so the point is rather moot for those encodings.
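The C0/C1 distinction can be seen concretely with Python's json module (used here purely as an illustration; the spec does not mandate either output form):

```python
import json

# U+0001 is a C0 control (must be escaped); U+0085 (NEL) is a C1 control.
s = "a\u0001b\u0085c"

# Default ensure_ascii=True: every non-ASCII code point, including the
# C1 control, is written as a \uXXXX escape.
print(json.dumps(s))

# ensure_ascii=False: only the mandatory C0 control is escaped; the C1
# control is emitted as raw UTF-8, which is still valid JSON.
encoded = json.dumps(s, ensure_ascii=False)
print(encoded)

# Both forms round-trip to the same string.
assert json.loads(encoded) == json.loads(json.dumps(s)) == s
```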

Difference Between UTF-8 and ISO-8859-1

I have this: &#2361; &#2379; and also this: \u0936\u093e\u0902\u0924\u093f
But I don't know what encodings they belong to.
The Hindi text gets stored in the database using the first form.
So please tell me what kind of encoding it is,
and also how to get my Hindi characters in the second form (\u0924\u093f).
Both &#2361; and \u0939 encode the Unicode character DEVANAGARI LETTER HA: ह
&#2361; is the HTML numeric entity that refers to the code point 2361 in decimal, which is equivalent to hexadecimal 0939.
\u0939 is the JavaScript escape sequence that refers to the code point 0939 in hexadecimal, which is equivalent to decimal 2361.
ISO 8859 does not include those characters.
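Since both notations are just renderings of the same code points, converting between them is mechanical. A sketch using Python's standard html module (the decimal entities for the second string are my own conversion, shown for illustration):

```python
import html

# &#2361; is an HTML numeric entity: code point 2361 decimal = 0x939 hex.
ch = html.unescape('&#2361;')
assert ch == '\u0939' == 'ह'
assert ord(ch) == 2361 and hex(ord(ch)) == '0x939'

# Going from a character to the JavaScript-style \uXXXX escape:
print('\\u%04x' % ord(ch))   # \u0939

# A whole string of decimal entities converted to escape form:
word = html.unescape('&#2358;&#2366;&#2306;&#2340;&#2367;')
print(''.join('\\u%04x' % ord(c) for c in word))  # \u0936\u093e\u0902\u0924\u093f
```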

Does MySQL's HEX() return any characters other than alphanumeric?

I'm doing HEX(AES_ENCRYPT('my data', 'key')) in a script of mine, and was wondering if HEX() returns data other than alphanumeric.
Does HEX() return special characters or anything other than alphanumeric data?
The Documentation states:
Hex() -
For a string argument str, HEX() returns a hexadecimal string
representation of str where each character in str is converted to two
hexadecimal digits. The inverse of this operation is performed by the
UNHEX() function.
I would say the answer is no.
No: it returns only pairs of hexadecimal digits, i.e. characters from the set 0123456789ABCDEF.
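The round-trip can be sanity-checked outside MySQL; Python's bytes.hex()/bytes.fromhex() behave analogously to HEX()/UNHEX() (two hex digits per byte, drawn only from 0-9 and A-F):

```python
data = b'my data \xff\x00'        # arbitrary bytes, printable or not

h = data.hex().upper()            # HEX() analogue: two hex digits per byte
print(h)                          # 6D79206461746120FF00

assert set(h) <= set('0123456789ABCDEF')   # only hex digits, nothing else
assert len(h) == 2 * len(data)             # output is always twice as long
assert bytes.fromhex(h) == data            # UNHEX() analogue round-trips
```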

Confused by HTML5, UTF-8 and 8859-1

Yesterday I upgraded an HTML page from "4.01 strict" to HTML5.
* http://r0k.us/rock/games/CoH/HallsOfHeroes/
The character encoding is iso-8859-1. The W3C validator (http://validator.w3.org) fails and won't even parse the page when utf-8 is specified as the charset, apparently because I use footnote characters such as ². They are in the upper 128 positions of the character set. What confuses me is that I keep reading that the first 256 code points of UTF-8 are the same as 8859-1.
Does anyone know why the page won't validate as utf-8?
Actually, only the first 128 code points of UTF-8 are encoded as in ASCII; UTF-8 is not ASCII, and in particular the next 128 code points are encoded differently.
You need to re-save the files as UTF-8 if you want them to be served as UTF-8.
The character ² ("SUPERSCRIPT TWO") is represented by the number 0xb2 (178 decimal) -- but it's represented differently in 8859-1 and UTF-8.
In 8859-1, it's represented as a single byte with the value 0xb2.
In UTF-8, it's represented as two consecutive bytes with the values 0xc2, 0xb2. See here for an explanation of the encoding.
(8859-1 is more compact than UTF-8 for files containing 8-bit characters, but it is incapable of representing anything past code point 255. UTF-8 is compatible with ASCII, and with 8859-1 for 7-bit characters; it is reasonably compact for most text and can represent more than a million distinct characters.)
A file containing only 7-bit characters can be interpreted either as ASCII, 8859-1, or UTF-8. A file containing 8-bit characters cannot; it has to be translated.
If you're on a Unix-like system with the iconv command installed, this filter:
iconv -f iso-8859-1 -t utf-8
will perform the appropriate translation, reading from standard input and writing to standard output.
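The same byte-level difference, and what the iconv translation effectively does, can be demonstrated in Python:

```python
s = '\u00b2'   # SUPERSCRIPT TWO

# One byte in 8859-1, two bytes in UTF-8:
assert s.encode('iso-8859-1') == b'\xb2'
assert s.encode('utf-8') == b'\xc2\xb2'

# What "iconv -f iso-8859-1 -t utf-8" does, byte for byte:
latin1_bytes = b'2\xb2 = 4'                 # "2² = 4" encoded in 8859-1
utf8_bytes = latin1_bytes.decode('iso-8859-1').encode('utf-8')
print(utf8_bytes)                           # b'2\xc2\xb2 = 4'
```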

Unprintable characters in MySQL

I want to test if a certain BLOB starts with character 255 (\xff). Is there a way to encode this character in a literal string?
Table 9.1. Special Character Escape Sequences gives certain special characters but not a way to encode arbitrary characters.
Failing a way to encode characters, is there a different workaround?
Use a hexadecimal literal, e.g. X'FF'. See http://dev.mysql.com/doc/refman/5.0/en/hexadecimal-literals.html
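For reference, X'FF' denotes the single byte 0xFF, so the test amounts to comparing the first byte of the BLOB. A client-side sketch of the same check in Python (the BLOB contents here are hypothetical):

```python
# The hexadecimal literal X'FF' is the single byte 0xFF:
assert bytes.fromhex('FF') == b'\xff'

# Equivalent of "does the BLOB start with X'FF'?", done client-side:
blob = b'\xff\xd8\xff\xe0'          # hypothetical BLOB (JPEG header bytes)
print(blob.startswith(b'\xff'))     # True
```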