Problem: MySQL's uuid() default collation does not match the configured connection collation.
I have a database, tables and fields created with character set utf8 and collation utf8_polish_ci.
The my.cnf is as follows:
init_connect='SET NAMES utf8 COLLATE utf8_polish_ci'
character-set-server=utf8
collation-server=utf8_polish_ci
character sets:
mysql> show variables like "char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
collations:
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+----------------+
| Variable_name        | Value          |
+----------------------+----------------+
| collation_connection | utf8_polish_ci |
| collation_database   | utf8_polish_ci |
| collation_server     | utf8_polish_ci |
+----------------------+----------------+
Now, when using the uuid() function, the following error is returned:
mysql> select replace(uuid(),'-','');
ERROR 1270 (HY000): Illegal mix of collations (utf8_general_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE) for operation 'replace'
This happens because uuid()'s default collation appears to be utf8_general_ci:
mysql> select charset(uuid()), collation(uuid());
+-----------------+-------------------+
| charset(uuid()) | collation(uuid()) |
+-----------------+-------------------+
| utf8            | utf8_general_ci   |
+-----------------+-------------------+
Is there a way to change the default collation used by uuid() so that it matches collation_connection?
In our environment we write SQL updates that are executed on different MySQL databases with different collations. Therefore, forcing a collation by specifying it explicitly is not an option.
(This is not really an answer, but an attempt at isolating what causes the problem and what might fix it.)
Start in a DATABASE with a totally unrelated CHARACTER SET and COLLATION:
mysql> CREATE DATABASE `so40064402` /*!40100 DEFAULT CHARACTER SET ucs2 COLLATE ucs2_bin */;
mysql> USE so40064402;
Database changed
Establish utf8_polish_ci for the client:
mysql> SET NAMES utf8 COLLATE utf8_polish_ci;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'c%a%t%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | -- from SET NAMES
| character_set_connection | utf8                       | -- from SET NAMES
| character_set_database   | ucs2                       | -- from DATABASE
| character_set_filesystem | binary                     | -- (constant)
| character_set_results    | utf8                       | -- from SET NAMES
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       | -- (constant)
| character_sets_dir       | /usr/share/mysql/charsets/ |
| collation_connection     | utf8_polish_ci             | -- from SET NAMES
| collation_database       | ucs2_bin                   | -- from DATABASE
| collation_server         | utf8mb4_unicode_520_ci     |
+--------------------------+----------------------------+
11 rows in set (0.00 sec)
mysql> select charset(uuid()), collation(uuid());
+-----------------+-------------------+
| charset(uuid()) | collation(uuid()) |
+-----------------+-------------------+
| utf8            | utf8_general_ci   | -- part of the problem, but can't fix this
+-----------------+-------------------+
1 row in set (0.00 sec)
mysql> select replace(uuid(),'-','');
ERROR 1270 (HY000): Illegal mix of collations
(utf8_general_ci,COERCIBLE),
(utf8_polish_ci,COERCIBLE),
(utf8_polish_ci,COERCIBLE) for operation 'replace'
Now let's change only SET NAMES:
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_polish_ci;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'c%a%t%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    | -- from SET NAMES
| character_set_connection | utf8mb4                    | -- from SET NAMES
| character_set_database   | ucs2                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    | -- from SET NAMES
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
| collation_connection     | utf8mb4_polish_ci          | -- from SET NAMES
| collation_database       | ucs2_bin                   |
| collation_server         | utf8mb4_unicode_520_ci     |
+--------------------------+----------------------------+
11 rows in set (0.00 sec)
mysql> select replace(uuid(),'-','');
+----------------------------------+
| replace(uuid(),'-','')           |
+----------------------------------+
| ea841aacf83b11e8a66580fa5b3669ce |
+----------------------------------+
1 row in set (0.00 sec)
Now it works!?? In spite of UUID() still being utf8!?
I found a straightforward, if daft-looking, workaround: cast(uuid() as char):
Example:
MariaDB [(none)]> set collation_connection = 'utf8_polish_ci';
Query OK, 0 rows affected (0.000 sec)
# Broken
MariaDB [(none)]> select replace(uuid(), '-', '');
ERROR 1270 (HY000): Illegal mix of collations (utf8_general_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE) for operation 'replace'
# Working
MariaDB [(none)]> select replace(cast(uuid() as char), '-', '');
+----------------------------------------+
| replace(cast(uuid() as char), '-', '') |
+----------------------------------------+
| 0e1bc84c0ffb11ec875c0242ac140002       |
+----------------------------------------+
1 row in set (0.000 sec)
What I believe is happening here is that uuid() is generating a string with what is ultimately an arbitrarily-selected charset+collation (utf8_general_ci I guess). Casting it to char converts it to a string with the connection-specified charset+collation. This should match the charset+collation of the '-' and '' literals in the query.
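The reasoning above can be modeled with a deliberately simplified Python sketch of MySQL's collation coercibility rules (the numeric levels are the ones listed in the MySQL manual): the operand with the lowest coercibility value decides the collation, and a tie between different collations at that level is the "Illegal mix of collations" error. The `Operand` class and the `resolve_collation`/`try_resolve` functions are hypothetical names for illustration; the real server applies further exceptions this sketch ignores, and the coercibility given to the CAST result is an assumption.

```python
from dataclasses import dataclass

# Coercibility levels as listed in the MySQL manual; a lower value wins.
EXPLICIT, NO_COLLATION, IMPLICIT, SYSCONST, COERCIBLE = 0, 1, 2, 3, 4


@dataclass(frozen=True)
class Operand:
    collation: str
    coercibility: int


class IllegalMixOfCollations(Exception):
    """Stands in for MySQL ERROR 1270 in this toy model."""


def resolve_collation(*operands: Operand) -> str:
    # The operand(s) with the lowest coercibility value decide the collation.
    strongest = min(op.coercibility for op in operands)
    candidates = {op.collation for op in operands if op.coercibility == strongest}
    # Simplification: any tie between different collations is an error here;
    # the real server has further exceptions (e.g. for _bin collations).
    if len(candidates) > 1:
        raise IllegalMixOfCollations(sorted(candidates))
    return candidates.pop()


def try_resolve(*operands: Operand) -> str:
    try:
        return resolve_collation(*operands)
    except IllegalMixOfCollations as err:
        return f"ERROR 1270: illegal mix of {err}"


# replace(uuid(), '-', '') under SET NAMES utf8 COLLATE utf8_polish_ci
uuid_result = Operand("utf8_general_ci", COERCIBLE)  # as shown in the error message
literal = Operand("utf8_polish_ci", COERCIBLE)       # the '-' and '' literals
# Assumption: CAST(uuid() AS CHAR) takes the connection collation. Its exact
# coercibility is not modeled; with equal collations it does not matter here.
cast_result = Operand("utf8_polish_ci", COERCIBLE)

print(try_resolve(uuid_result, literal, literal))  # ERROR 1270: ...
print(try_resolve(cast_result, literal, literal))  # utf8_polish_ci
```

The model shows why the cast helps without any explicit COLLATE: once every operand carries the same collation there is nothing left to conflict.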
MariaDB 10.3 is my environment. If I'm interpreting this correctly, I suspect the collation mismatch in the output of uuid() is a bug (it should match the connection collation here), and so I expect its behavior to change eventually.
I have a problem inserting data with Polish characters into a MySQL DB. I'm working on Windows 8 and Ubuntu. On Windows there is no problem, but on Ubuntu I cannot insert characters like "żąśźćłż"; in their place I get "?????". I have checked with TRACE-level logging: my application passes the correct strings to the prepared query, but in the DB I see "???????". I can insert these characters via the command line and that works, so probably there is some problem with the connector? Or some other setting. I have tried changing:
mysql> show variables like "collation%";
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8_general_ci    |
| collation_database   | latin1_swedish_ci  |
| collation_server     | latin1_swedish_ci  |
+----------------------+--------------------+
to utf8_general_ci everywhere, but after restarting the mysql service the values come back unchanged, along with:
mysql> show variables like "character%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
I cannot set utf8 for the database and server.
Does anyone have any ideas?
Adding the line
character_set_server = utf8
in the [mysqld] section of the MySQL configuration file (my.ini or my.cnf) should set the new value the next time the MySQL server is started.
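A likely reason the letters arrive as question marks: with the database and server defaults at latin1, the text is presumably converted to latin1 somewhere between the application and the column (in the connector or in the server), and latin1 has no code points for these Polish letters, so the conversion substitutes '?'. The Python sketch below mimics that lossy step; `store_in_column` is a hypothetical helper, and treating MySQL's latin1 as Python's cp1252 codec is an approximation.

```python
def store_in_column(value: str, column_charset: str) -> str:
    """Hypothetical helper: round-trip `value` through a column charset.

    Characters the charset cannot represent become '?', mimicking the lossy
    latin1 conversion that appears to happen in the question.
    """
    return value.encode(column_charset, errors="replace").decode(column_charset)


polish = "żąśźćłż"
# Approximation: MySQL's latin1 behaves like Python's cp1252 for these letters.
print(store_in_column(polish, "cp1252"))  # ???????
print(store_in_column(polish, "utf-8"))   # żąśźćłż
```

Once the '?' has been written, the original letters are gone, which is why the server charset needs to be fixed before inserting rather than afterwards.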
I converted my database to utf8mb4, yet it still returns incorrect UTF-8 characters.
For example, Café becomes CafÃ©.
Here are my mysql collation variables:
mysql> SHOW VARIABLES LIKE 'char%'; SHOW VARIABLES LIKE 'collation%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_unicode_ci |
| collation_database   | utf8_general_ci    |
| collation_server     | utf8mb4_unicode_ci |
+----------------------+--------------------+
Also, my DB has slowed down at least 10x since switching to utf8.
Mojibake. This is the classic case where:
The bytes you have in the client are correctly encoded in utf8mb4 (good).
You connected with SET NAMES latin1 (or set_charset('latin1') or ...), probably by default. (It should have been utf8mb4.)
The column in the tables may or may not have been CHARACTER SET utf8mb4, but it should have been that.
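At the byte level, the case described above amounts to UTF-8 bytes being read back as latin1. A minimal Python sketch, assuming MySQL's latin1 can be approximated by Python's cp1252 codec:

```python
original = "Café"
# The client really sends UTF-8: 'é' becomes the two bytes 0xC3 0xA9.
sent = original.encode("utf-8")
# Somewhere those bytes are treated as latin1, so each byte becomes its own character.
mojibake = sent.decode("cp1252")
print(mojibake)  # CafÃ©
# Reversing the mistaken step recovers the text, as long as no bytes were lost.
print(mojibake.encode("cp1252").decode("utf-8"))  # Café
```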
If you need to fix the data, it takes a "2-step ALTER", something like:
ALTER TABLE Tbl MODIFY COLUMN col VARBINARY(...) ...;
ALTER TABLE Tbl MODIFY COLUMN col VARCHAR(...) ... CHARACTER SET utf8mb4 ...;
where the lengths are big enough and the other "..." have whatever else (NOT NULL, etc) was already on the column.
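Why two steps rather than one: a direct MODIFY to utf8mb4 makes the server transcode the existing bytes as if they were genuine latin1 characters, while going through VARBINARY keeps the bytes and only changes how they are interpreted. The Python sketch below models both paths for a latin1 column that, by assumption, already holds UTF-8 bytes; the function names are hypothetical and cp1252 stands in for MySQL's latin1.

```python
def one_step_alter(stored: bytes) -> str:
    # MODIFY straight from latin1 to utf8mb4: the server transcodes, reading
    # each existing byte as a latin1 character, so the mojibake is preserved.
    return stored.decode("cp1252")


def two_step_alter(stored: bytes) -> str:
    # Step 1, VARBINARY: the bytes are kept exactly; only the charset label goes.
    # Step 2, VARCHAR ... CHARACTER SET utf8mb4: the same bytes are read as UTF-8.
    return bytes(stored).decode("utf-8")


# Model: a latin1 column that actually holds the UTF-8 bytes for "Café".
stored = "Café".encode("utf-8")
print(one_step_alter(stored))  # CafÃ©
print(two_step_alter(stored))  # Café
```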
I've implemented a simple web server that proxies to MySQL using the 'sqljocky' package.
I have an issue with character encoding: Cyrillic glyphs are displayed incorrectly:
ÐавÑдов ÐиÑалий instead of Давыдов Витайлий
EDIT: Table collation is utf8_general_ci.
I've tried running SET NAMES UTF8:
pool.query('set names utf8');
[UPDATED] Then I created my.cnf in the /etc/ directory with this content:
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci
Output of show variables like "%char%";
mysql> show variables like "%char%";
+--------------------------+--------------------------------------------------------+
| Variable_name            | Value                                                  |
+--------------------------+--------------------------------------------------------+
| character_set_client     | utf8                                                   |
| character_set_connection | utf8                                                   |
| character_set_database   | utf8                                                   |
| character_set_filesystem | binary                                                 |
| character_set_results    | utf8                                                   |
| character_set_server     | utf8                                                   |
| character_set_system     | utf8                                                   |
| character_sets_dir       | /usr/local/mysql-5.6.12-osx10.7-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0,00 sec)
Output of show variables like 'collation%'
mysql> show variables like 'collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0,00 sec)
But the characters are still displayed incorrectly.
How can I get them displayed correctly?
There was a bug in sqljocky which meant that unicode characters weren't being encoded properly. I have fixed some of the places where the bug was occurring in v0.5.5 which I have just published. When I have more time I will make sure that it is fixed everywhere.
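The garbled name in the question fits a recognizable pattern: basic Cyrillic letters take two bytes in UTF-8, with a lead byte of 0xD0 or 0xD1, and when those bytes are decoded one byte per character the lead bytes show up as 'Ð' and 'Ñ'. A small Python sketch of that pattern (it illustrates the symptom only and makes no claim about sqljocky's internals):

```python
surname = "Давыдов"
raw = surname.encode("utf-8")
# Basic Cyrillic letters (U+0410..U+044F) take two bytes in UTF-8,
# and the first byte is always 0xD0 or 0xD1.
assert all(lead in (0xD0, 0xD1) for lead in raw[0::2])
# Decoding one byte per character turns those lead bytes into 'Ð' and 'Ñ'.
garbled = raw.decode("latin-1")
print(garbled.count("Ð"), garbled.count("Ñ"))  # 6 1
# Nothing is lost yet: re-encoding and decoding properly restores the name.
print(garbled.encode("latin-1").decode("utf-8"))  # Давыдов
```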
Update:
It turns out this is not directly related to the DB server itself, but to the client's encoding. If the client uses utf8 encoding, the German character is rendered incorrectly; if the client uses cp850, it is rendered correctly. But I need to use utf8, since there may be other classes of characters that the app needs to deal with. What should I do?
Original:
I have two database servers, viewed from the same mysql client: server1 renders the German characters correctly, server2 does not. The following are the differences. I'm baffled, since server2 uses utf8 more. What could be the cause of this?
server1's encodings:
mysql> SHOW VARIABLES LIKE "character\_set\_database";
+------------------------+--------+
| Variable_name          | Value  |
+------------------------+--------+
| character_set_database | latin1 |
+------------------------+--------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | cp850  |
| character_set_connection | cp850  |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | cp850  |
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+
server2's encodings:
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | utf8   |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+
7 rows in set (0.01 sec)
mysql> SHOW VARIABLES LIKE "character\_set\_database";
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| character_set_database | utf8  |
+------------------------+-------+
1 row in set (0.00 sec)
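A plausible reading of the update, assuming the mysql client runs in a Windows console using code page 850 (server1's character_set_client = cp850 points that way): with client charset cp850 the server sends cp850 bytes and the console renders them as cp850, whereas with utf8 the server sends UTF-8 bytes that the console still renders as cp850. The Python sketch below reproduces both cases; the sample word is hypothetical, since the question does not say which German character broke.

```python
word = "Grüße"  # hypothetical sample; the question does not name the character

# Client charset utf8: the server sends UTF-8, but the console still paints cp850.
utf8_on_cp850_console = word.encode("utf-8").decode("cp850")
# Client charset cp850 (server1): the bytes and the console agree.
cp850_on_cp850_console = word.encode("cp850").decode("cp850")

print(utf8_on_cp850_console)
print(cp850_on_cp850_console)  # Grüße
```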