Does MySQL's HEX() return any characters other than alphanumeric ones?

I'm doing HEX(AES_ENCRYPT('my data', 'key')) in a script of mine, and was wondering whether HEX() returns anything other than alphanumeric data.
Does HEX() return special characters or anything other than alphanumeric data?

The Documentation states:
Hex() -
For a string argument str, HEX() returns a hexadecimal string
representation of str where each character in str is converted to two
hexadecimal digits. The inverse of this operation is performed by the
UNHEX() function.
I would say the answer is no.

No - it returns only pairs of hexadecimal characters, i.e. characters from the set: 0123456789ABCDEF
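As a rough illustration outside MySQL, the Python sketch below hex-encodes arbitrary bytes the way HEX() is documented to and checks the resulting alphabet. The random bytes merely stand in for AES_ENCRYPT() output, and `hex_upper` is a hypothetical helper name, not a MySQL function.

```python
import os

HEX_ALPHABET = set("0123456789ABCDEF")

def hex_upper(data: bytes) -> str:
    """Two uppercase hex digits per input byte, mirroring what HEX() is documented to return."""
    return data.hex().upper()

# Random bytes stand in for ciphertext; any byte value from 0x00 to 0xFF may occur.
ciphertext_like = os.urandom(32)
encoded = hex_upper(ciphertext_like)

print(encoded)
print(set(encoded) <= HEX_ALPHABET)              # True
print(len(encoded) == 2 * len(ciphertext_like))  # True
```

Whatever bytes go in, the output stays within the 16-character alphabet and is exactly twice as long as the input.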

MySQL REGEXP word boundary detection with german umlauts when using BINARY Operator

I made a strange discovery. If I execute the following SQL-Command:
SELECT 'Konzessionäre' REGEXP '[[:<:]]Konzession[[:>:]]'
it gives me the result - as expected - 0
But if I do the same together with the BINARY operator:
SELECT BINARY 'Konzessionäre' REGEXP '[[:<:]]Konzession[[:>:]]'
the result is 1, so I think there is a MySQL problem with the regexp word boundary detection and German umlauts (like the "ä" here) in conjunction with the BINARY operator. As another example, I can run this query:
SELECT BINARY 'Konzessionsäre' REGEXP '[[:<:]]Konzession[[:>:]]'
So here the result is 0 - as I would expect. So how can I solve this? Is this probably a bug in MySQL?
Thanks
By casting your string as BINARY you have stripped its associated character set property. So it's unclear how the word-boundary pattern should match. I'd guess it matches only ASCII values A-Z, a-z, 0-9, and also _.
When casting the string as BINARY, MySQL knows nothing about any other higher character values that also should be considered alphanumeric, because which characters should be alphanumeric depends on the character set.
I guess you are using BINARY to make this a case-sensitive regular expression search. Apparently, this has the unintended consequence of spoiling the word-boundary pattern-match.
You should not use BINARY in this comparison. You could do a secondary comparison to check for case-sensitive matching, but not with word boundaries.
SELECT (BINARY 'Konzessionäre' REGEXP 'Konzession') AND ('Konzessionäre' REGEXP '[[:<:]]Konzession[[:>:]]')
MySQL's REGEXP works with bytes, not characters. So, in CHARACTER SET utf8, ä is 2 bytes. It is unclear what the definition of "word boundary" is in such a situation.
Recent versions of MariaDB have a better regexp engine.
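The difference between byte-wise and character-aware word boundaries can be reproduced outside MySQL. The sketch below uses Python's `re` module purely as an analogy (it is not MySQL's regexp engine, and the function names are made up for the example): the same pattern is run once on text and once on the UTF-8 bytes of that text.

```python
import re

PATTERN = r"\bKonzession\b"

def matches_as_text(s: str) -> bool:
    # Character-aware matching: "ä" counts as a word character,
    # so there is no word boundary between "n" and "ä".
    return re.search(PATTERN, s) is not None

def matches_as_bytes(s: str) -> bool:
    # Byte-wise matching: only ASCII letters, digits and "_" are word
    # characters, so the first byte of "ä" (0xC3) looks like a boundary.
    return re.search(PATTERN.encode("ascii"), s.encode("utf-8")) is not None

for word in ("Konzessionäre", "Konzessionsäre"):
    print(word, matches_as_text(word), matches_as_bytes(word))
```

Here "Konzessionäre" matches only in the byte-wise case, while "Konzessionsäre" matches in neither, which is the same pattern of results the question reports with and without BINARY.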

MySQL commands and encoding

I will use PHP notation for variables to make life easy.
Suppose the database is UTF-8, and the client is set to UTF-8.
There are two sides to the question. Knowing that the ASCII for ' (quote) is 39 decimal:
Client Side
When the query variable, $title, is escaped using a function such as real_escape_string(), will the function escape every byte that has the value 39 separately? Or will it check whether the byte of value 39 is part of a UTF-8 symbol?
Server Side
SELECT * from STORIES WHERE title = 'Hello'
What does MySQL assume the query encoding to be? This includes the part:
SELECT * from STORIES WHERE title = '
Then, if a $filteredTitle happens to contain a byte 39 that is part of a UTF-8 symbol, how does MySQL know that it is not a quote?
Let's look at two issues.
When providing a SELECT statement, strings must be "escaped". Otherwise, there would be syntax problems with quotes inside quotes. In particular ', ", and \ must be preceded by a backslash to avoid confusion. mysqli_real_escape_string() provides that function. (Don't use mysql_real_escape_string(), it belongs to the deprecated mysql_* API.) No "un-escaping" is needed when you SELECT the string.
The ASCII apostrophe (decimal 39, hex 27, sometimes called "single quote") is commonly used in many programming languages for quoting strings. A long list of utf8 "quotes" can be found here.
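On the question of whether byte 39 can hide inside a multi-byte symbol: in UTF-8 it cannot, because every byte of a multi-byte sequence has its high bit set (value 128 or above), while the apostrophe is 39. The Python sketch below checks this for a few sample characters; the helper name is hypothetical and the sample list is arbitrary.

```python
def multibyte_bytes_are_non_ascii(ch: str) -> bool:
    """True if ch is plain ASCII, or if every byte of its UTF-8 encoding is >= 0x80."""
    encoded = ch.encode("utf-8")
    if len(encoded) == 1:
        return True  # single-byte characters are ASCII by definition
    return all(b >= 0x80 for b in encoded)

# A typographic quote, an umlaut, Thai, Hebrew and an emoji (2 to 4 bytes each).
samples = ["\u2019", "ä", "ก", "מ", "😀"]
results = {ch: multibyte_bytes_are_non_ascii(ch) for ch in samples}

for ch, ok in results.items():
    print(repr(ch), ch.encode("utf-8").hex(), ok)

# The curly quote U+2019 is three bytes, none of which is 0x27.
print(0x27 in "\u2019".encode("utf-8"))  # False
```

Other multi-byte encodings do not all share this property, which is one reason to rely on the connection-aware escaping function rather than a hand-rolled byte replacement.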

How to detect Thai language in an SQL query

I have a string column in a table, and some of those strings contain Thai text. An example of a Thai string is:
อักษรไทย
Is there such way to query/find a string like this in a column?
You could search for strings that start with a character in the Thai Unicode block (i.e. between U+0E01 and U+0E5B):
WHERE string BETWEEN 'ก' AND '๛'
Of course this won't include strings that start with some other character and go on to include Thai language, such as those that start with a number. For that, you would have to use a much less performant regular expression:
WHERE string RLIKE '[ก-๛]'
Note however the warning in the manual:
Warning
The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
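If the byte-wise caveat is a concern, one alternative is to do the check on decoded strings in application code after fetching the rows. This is a minimal Python sketch under that assumption, using the same U+0E01 to U+0E5B range as above; the function names are illustrative only.

```python
THAI_FIRST = "\u0e01"  # ก
THAI_LAST = "\u0e5b"   # ๛

def contains_thai(text: str) -> bool:
    """True if any character falls in the U+0E01..U+0E5B range."""
    return any(THAI_FIRST <= ch <= THAI_LAST for ch in text)

def starts_with_thai(text: str) -> bool:
    """True if the first character falls in that range (the BETWEEN-style check)."""
    return bool(text) and THAI_FIRST <= text[0] <= THAI_LAST

for value in ("อักษรไทย", "123 ไทย", "hello"):
    print(value, starts_with_thai(value), contains_thai(value))
```

The comparison works on whole code points, so it is unaffected by how many bytes each character occupies in storage.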
You can do some back and forth conversion between character sets.
where convert(string, 'AL32UTF8') =
convert(convert(string, 'TH8TISASCII'), 'AL32UTF8', 'TH8TISASCII' )
will be true if string is made only of Thai and ASCII characters, so if you add
AND convert(string, 'AL32UTF8') != convert(string, 'US7ASCII')
you filter out the strings made only of ASCII and are left with the strings that contain Thai.
Unfortunately, this will not work if your strings contain something outside of ASCII and Thai.
Note: Some of the convert calls may be superfluous depending on your database default encoding.

Unprintable characters in MySQL

I want to test if a certain BLOB starts with character 255 (\xff). Is there a way to encode this character in a literal string?
Table 9.1. Special Character Escape Sequences gives certain special characters but not a way to encode arbitrary characters.
Failing a way to encode characters, is there a different workaround?
Use a hexadecimal literal, e.g. X'FF'. See http://dev.mysql.com/doc/refman/5.0/en/hexadecimal-literals.html
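For comparison, the same idea expressed on the application side: a byte value is written in hex and compared against the start of the binary data. This is only a Python analogy of the X'FF' literal, with a hypothetical helper name and made-up sample data.

```python
# bytes.fromhex("FF") plays the role of the X'FF' literal: one byte with value 255.
PREFIX = bytes.fromhex("FF")

def starts_with_ff(blob: bytes) -> bool:
    """True if the binary value begins with the byte 0xFF."""
    return blob.startswith(PREFIX)

print(starts_with_ff(b"\xff\xd8\xff\xe0"))  # True
print(starts_with_ff(b"\x00\xff"))          # False
```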

iPhone: Decode characters like \U05de

I used SBJsonParser to parse a JSON string.
Inside, instead of Hebrew characters, I got a string full of escape sequences of the form \U05de.
What would be the best way to decode these back to Hebrew characters,
so I can put them on controls like UIFieldView?
Eventually I ran a loop over the string, looking for the characters \u.
Whenever the loop found such a substring, I took a range of 6 characters starting at that index,
giving me a substring such as \u052v that needed to be fixed.
On this substring I ran the method [str JSONValue], which gave me the correct character; then I simply replaced all occurrences of \u052v (for example) with the corrected character.
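The find-and-replace loop described above can be sketched compactly. The version below is in Python rather than Objective-C and is only an illustration of the technique: it replaces every \u or \U followed by exactly four hex digits with the corresponding character. The function name is hypothetical, and surrogate pairs (characters outside the Basic Multilingual Plane) are not handled.

```python
import re

# A backslash, then "u" or "U", then exactly four hexadecimal digits.
_ESCAPE = re.compile(r"\\[uU]([0-9a-fA-F]{4})")

def decode_unicode_escapes(text: str) -> str:
    """Replace each \\uXXXX / \\UXXXX escape with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

print(decode_unicode_escapes(r"\U05de\u05d9\u05dd"))  # מים
```

Sequences that are not valid escapes, such as a \u followed by non-hex characters, are left untouched.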