Problem: MySQL's uuid() default collation does not compare to configured connnection collation.
I have a database + tables + fields created with charset: utf-8 and collation utf8_polish_ci.
The my.cnf is as follows:
init_connect='SET NAMES utf8 COLLATE utf8_polish_ci'
character-set-server=utf8
collation-server=utf8_polish_ci
character sets:
mysql> show variables like "char%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
collations:
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+----------------+
| Variable_name | Value |
+----------------------+----------------+
| collation_connection | utf8_polish_ci |
| collation_database | utf8_polish_ci |
| collation_server | utf8_polish_ci |
+----------------------+----------------+
Now, when using the uuid() function, following error is returned:
mysql> select replace(uuid(),'-','');
ERROR 1270 (HY000): Illegal mix of collations (utf8_general_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE) for operation 'replace'
This happens, due to uuid()'s default collation seems to be utf8_general_ci.
mysql> select charset(uuid()), collation(uuid());
+-----------------+-------------------+
| charset(uuid()) | collation(uuid()) |
+-----------------+-------------------+
| utf8 | utf8_general_ci |
+-----------------+-------------------+
Is there a way, to change the default collation used by uuid() so that it matches the collation_connection?
In our environment we write SQL updates that are executed on different MySQL databases with different collations. Therefore, to force a collation by specifying it is not an option.
(This is not really an answer, but an attempt at isolating what causes the problem and what might fix it.)
Get in a DATABASE with a totally irrelevant CHARACTER SET and COLLATION.
mysql> CREATE DATABASE `so40064402` /*!40100 DEFAULT CHARACTER SET ucs2 COLLATE ucs2_bin */
mysql> USE so40064402;
Database changed
Establish utf8_polish for the client:
mysql> SET NAMES utf8 COLLATE utf8_polish_ci;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'c%a%t%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 | -- from SET NAMES
| character_set_connection | utf8 | -- from SET NAMES
| character_set_database | ucs2 | -- from DATABASE
| character_set_filesystem | binary | -- (constant)
| character_set_results | utf8 | -- from SET NAMES
| character_set_server | utf8mb4 |
| character_set_system | utf8 | -- (constant)
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | utf8_polish_ci | -- from SET NAMES
| collation_database | ucs2_bin | -- from DATABASE
| collation_server | utf8mb4_unicode_520_ci |
+--------------------------+----------------------------+
11 rows in set (0.00 sec)
mysql> select charset(uuid()), collation(uuid());
+-----------------+-------------------+
| charset(uuid()) | collation(uuid()) |
+-----------------+-------------------+
| utf8 | utf8_general_ci | -- part of the problem, but can't fix this
+-----------------+-------------------+
1 row in set (0.00 sec)
mysql> select replace(uuid(),'-','');
ERROR 1270 (HY000): Illegal mix of collations
(utf8_general_ci,COERCIBLE),
(utf8_polish_ci,COERCIBLE),
(utf8_polish_ci,COERCIBLE) for operation 'replace'
mysql>
mysql>
mysql>
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_polish_ci;
Query OK, 0 rows affected (0.00 sec)
Now let's change SET NAMES only. Now it works!?? In spite of UUID() being utf8!?
mysql> SHOW VARIABLES LIKE 'c%a%t%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 | -- from SET NAMES
| character_set_connection | utf8mb4 | -- from SET NAMES
| character_set_database | ucs2 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 | -- from SET NAMES
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | utf8mb4_polish_ci | -- from SET NAMES
| collation_database | ucs2_bin |
| collation_server | utf8mb4_unicode_520_ci |
+--------------------------+----------------------------+
11 rows in set (0.00 sec)
mysql> select replace(uuid(),'-','');
+----------------------------------+
| replace(uuid(),'-','') |
+----------------------------------+
| ea841aacf83b11e8a66580fa5b3669ce |
+----------------------------------+
1 row in set (0.00 sec)
mysql>
I found a straightforward, if daft-looking, workaround: cast(uuid() as char):
Example:
MariaDB [(none)]> set collation_connection = 'utf8_polish_ci';
Query OK, 0 rows affected (0.000 sec)
# Broken
MariaDB [(none)]> select replace(uuid(), '-', '');
ERROR 1270 (HY000): Illegal mix of collations (utf8_general_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE) for operation 'replace'
# Working
MariaDB [(none)]> select replace(cast(uuid() as char), '-', '');
+----------------------------------------+
| replace(cast(uuid() as char), '-', '') |
+----------------------------------------+
| 0e1bc84c0ffb11ec875c0242ac140002 |
+----------------------------------------+
1 row in set (0.000 sec)
What I believe is happening here is that uuid() is generating a string with what is ultimately an arbitrarily-selected charset+collation (utf8_general_ci I guess). Casting it to char converts it to a string with the connection-specified charset+collation. This should match the charset+collation of the '-' and '' literals in the query.
MariaDB 10.3 is my environment. If I'm interpreting this correctly, I suspect the collation mismatch in the output of uuid() is a bug - it should match connection collation here - and so I expect its behavior to change eventually.
Related
I am trying to create a database using the utf8mb4 character set and utf8mb4_unicode_ci collation. However, I don't seem to be able to insert unicode characters into my tables.
What I have done:
SET NAMES utf8mb4;
CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE mydb;
CREATE TABLE test (val VARCHAR(16));
INSERT INTO test (val) VALUES ("á");
ERROR 1366 (22007): Incorrect string value: '\xA0' for column `mydb`.`test`.`val` at row 1
If I don't use SET NAMES utf8mb4;, then I can insert the "á" character without issue.
These are my default character set variables:
show variables like 'char%'; show variables like 'collation%';
+--------------------------+-----------------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------------+
| character_set_client | cp850 |
| character_set_connection | cp850 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | cp850 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MariaDB 10.5\share\charsets\ |
+--------------------------+-----------------------------------------------+
8 rows in set (0.000 sec)
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | cp850_general_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8_general_ci |
+----------------------+--------------------+
3 rows in set (0.000 sec)
And after using SET NAMES:
show variables like 'char%'; show variables like 'collation%';
+--------------------------+-----------------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MariaDB 10.5\share\charsets\ |
+--------------------------+-----------------------------------------------+
8 rows in set (0.000 sec)
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8_general_ci |
+----------------------+--------------------+
3 rows in set (0.000 sec)
How can I fix this issue so I can insert characters in the utf8mb4 character set?
Your text (or .sql) file itself is encoded in cp850 and not in utf-8.
You can see that encoded value is a single byte - UTF-8 encoding should be at least 2 bytes.
In order to use SET NAMES utf8mb4; command, your file needs to be converted to utf-8. Some advanced editors allow that, and even windows notepad can save a text file as utf-8 in modern versions.
If you are using Windows cmd, the command "chcp" controls the "code page". chcp 65001 provides utf8, but it needs a special charset installed, too.
To set the font in the console window: Right-click on the title of the window → Properties → Font → pick Lucida Console
I did a conversion of my database to utf8mb4, yet it still returns incorrect UTF8 characters:
For example, Café becomes Café
Here are my mysql collation variables:
mysql> SHOW VARIABLES LIKE 'char%'; SHOW VARIABLES LIKE 'collation%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8mb4_unicode_ci |
+----------------------+--------------------+
Also, my DB has slowed down at least 10x since switching to utf8.
Mojibake. This is the classic case of
The bytes you have in the client are correctly encoded in utf8mb4 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8mb4.)
The column in the tables may or may not have been CHARACTER SET utf8mb4, but it should have been that.
If you need to fix for the data it takes a "2-step ALTER", something like
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8mb4 ...;
where the lengths are big enough and the other "..." have whatever else (NOT NULL, etc) was already on the column.
When insert emoji character in mysql interactive interface, I found some phenomena very confusing. Hope someone could clear it. Now see below:
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
CREATE TABLE `t` (
`data` varchar(100) CHARACTER SET utf8mb4 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> insert into t select '\U+1F600';
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x80' for column 'data' at row 1
mysql> set names utf8mb4;
mysql> insert into t select '\U+1F600';
Query OK, 1 row affected (0.00 sec)
mysql> select * from t;
+------+
| data |
+------+
| 😀 |
+------+
mysql> select data, hex(data) from t;
+------+-----------+
| data | hex(data) |
+------+-----------+
| 😀 | F09F9880 |
+------+-----------+
Why do I need execute set names utf8mb4 explicitly? From error message, it seems it resolved the data content to four byte(f0 9f 98 80) successully? Why still can't insert successfully?
Below is another puzzle for me.
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
mysql> insert into t select '\U+1F600';
Query OK, 1 row affected (0.01 sec)
mysql> select data,hex(data) from t;
+------+--------------------+
| data | hex(data) |
+------+--------------------+
| 😀 | C3B0C5B8CB9CE282AC |
+------+--------------------+
I have to say I feel a little shock about this. In my opinion only utf8mb4 support emoji character, but now latin1 support emoji character too.
Anybody can clear it for me. Thanks!
You can insert UTF8 data into a latin1 table, but MySQL won't treat the byte stream as a UTF8 character. So you won't be able to query against it for example. If your application understands the UTF8 byte stream then it will look like its working OK. But the table charset really needs to be utf8 (or utf8mb4) if MySQL is to understand those bytes as Unicode characters.
I've implemented simple web server to have proxy to MySQL using 'sqljocky' package.
And I have issue with character encoding, cyrillic glyphs displays incorrectly:
ÐавÑдов ÐиÑалий instead of Давыдов Витайлий
EDIT: Table collation is utf8_general_ci.
I've tried to query SET NAMES UTF8:
pool.query('set names utf8');
[UPDATED] Then I've created my.cnf in /etc/ directory with this content:
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
Output of show variables like "%char%";
mysql> show variables like "%char%";
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.6.12-osx10.7-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0,00 sec)
Output of show variables like 'collation%'
mysql> show variables like 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0,00 sec)
But still having incorrect displayed characters.
How to get it displayed correctly?
There was a bug in sqljocky which meant that unicode characters weren't being encoded properly. I have fixed some of the places where the bug was occurring in v0.5.5 which I have just published. When I have more time I will make sure that it is fixed everywhere.
I don't understand why the following comparison don't work in utf8 (mysql 5.5.27 - dotdeb).
select '00:00:00' < curtime();
Can you explain how mysql evaluates this expression because I don't see any logical reasons why this is not working :
mysql> set names 'utf8';
Query OK, 0 rows affected (0.00 sec)
mysql> select '00:00:00' < curtime();
ERROR 1267 (HY000): Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,NUMERIC) for operation '<'
mysql> set names 'latin1';
Query OK, 0 rows affected (0.00 sec)
mysql> select '00:00:00' < curtime();
+------------------------+
| '00:00:00' < curtime() |
+------------------------+
| 1 |
+------------------------+
1 row in set (0.00 sec
An other example, it works if the comparison is done in a certain order :
mysql> set names 'utf8';
Query OK, 0 rows affected (0.00 sec)
mysql> select '00:00:00' < curtime();
ERROR 1267 (HY000): Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,NUMERIC) for operation '<'
mysql> select curtime() > '00:00:00';
+------------------------+
| curtime() > '00:00:00' |
+------------------------+
| 1 |
+------------------------+
1 row in set (0.00 sec)
The collation and character variables in utf8 :
mysql> show variables like 'collation%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
This is working with MariaDb (5.5.25-MariaDB)
MariaDB [(none)]> set names 'utf8';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> select '00:00:00' < curtime();
+------------------------+
| '00:00:00' < curtime() |
+------------------------+
| 1 |
+------------------------+
1 row in set (0.00 sec)
MariaDB [(none)]> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
MariaDB [(none)]> SHOW VARIABLES LIKE 'collation%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
You are trying to compare different data types - string and time, in this case MySQL uses type conversion and throws an error. Try to use STR_TO_DATE function to convert string to TIME and compare two identical types:
SET NAMES 'utf8';
SELECT STR_TO_DATE('00:00:00', '%H:%i:%s') < CURTIME();
More information here - Type Conversion in Expression Evaluation.
Check this solution, its working.
You need to cast value to time.
Eg 1 -
select cast('00:00:00' as time) < curtime();
Eg 2 -
CURTIME()<= ADDTIME(time_column,'-00:15:00')