When inserting an emoji character in the mysql interactive client, I ran into some behavior that I find very confusing. I hope someone can clear it up. See below:
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
CREATE TABLE `t` (
`data` varchar(100) CHARACTER SET utf8mb4 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> insert into t select '\U+1F600';
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x80' for column 'data' at row 1
mysql> set names utf8mb4;
mysql> insert into t select '\U+1F600';
Query OK, 1 row affected (0.00 sec)
mysql> select * from t;
+------+
| data |
+------+
| 😀 |
+------+
mysql> select data, hex(data) from t;
+------+-----------+
| data | hex(data) |
+------+-----------+
| 😀 | F09F9880 |
+------+-----------+
Why do I need to execute set names utf8mb4 explicitly? Judging from the error message, the data was already resolved to four bytes (f0 9f 98 80) successfully, so why does the insert still fail?
Below is another puzzle for me.
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
mysql> insert into t select '\U+1F600';
Query OK, 1 row affected (0.01 sec)
mysql> select data,hex(data) from t;
+------+--------------------+
| data | hex(data) |
+------+--------------------+
| 😀 | C3B0C5B8CB9CE282AC |
+------+--------------------+
I have to say I was a little shocked by this. As far as I understood, only utf8mb4 supports emoji characters, but now latin1 appears to support them too.
Can anybody clear this up for me? Thanks!
You can insert UTF8 data into a latin1 table, but MySQL won't treat the byte stream as UTF8 characters, so you won't be able to query against it, for example. If your application understands the UTF8 byte stream, it will look like it's working OK. But the table charset really needs to be utf8 (or utf8mb4) if MySQL is to understand those bytes as Unicode characters.
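The two hex dumps in the question can be reproduced outside MySQL. The snippet below is only a sketch of the byte-level conversions, not of MySQL itself: it assumes the terminal sends UTF-8, and it uses Python's cp1252 codec as a stand-in for MySQL's latin1. It shows why the emoji does not fit the 3-byte utf8 charset, and how a latin1 connection turns its four bytes into four separate characters that the utf8mb4 column then stores as nine bytes.

```python
# Sketch only: models the byte-level conversions, not MySQL itself.
emoji = "\U0001F600"                  # the character from the question
utf8_bytes = emoji.encode("utf-8")    # what a UTF-8 terminal sends
utf8_hex = utf8_bytes.hex().upper()

# MySQL's "utf8" (utf8mb3) stores at most 3 bytes per character,
# so a 4-byte sequence is rejected until the connection uses utf8mb4.
fits_in_utf8mb3 = len(utf8_bytes) <= 3

# With character_set_client = latin1, every byte is taken as one character.
# Assumption: Python's cp1252 codec stands in for MySQL's latin1.
misread = utf8_bytes.decode("cp1252")

# Those four characters are then converted to utf8mb4 for the column.
stored_hex = misread.encode("utf-8").hex().upper()

print(utf8_hex, fits_in_utf8mb3)
print(len(misread), stored_hex)
```

On the way back out the same conversion runs in reverse, which is why the client still displays the emoji even though the column holds four unrelated characters rather than one emoji.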
Related
I currently have MariaDB version 10.4.18 on CentOS 8.0. When I try to save a string with stylized fonts like the one below,
𝘈𝘴𝘵𝘳𝘪 𝘈𝘯𝘢𝘯𝘵𝘢
MariaDB saved them as "??? ????"
The statement
mysql> insert into testings(test) values ('𝘈𝘴𝘵𝘳𝘪 𝘈𝘯𝘢𝘯𝘵𝘢');
Here is my database's charset and collation
mysql> select @@collation_database;
+----------------------+
| @@collation_database |
+----------------------+
| utf8mb4_unicode_ci |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT @@character_set_database;
+--------------------------+
| @@character_set_database |
+--------------------------+
| utf8mb4 |
+--------------------------+
The table
mysql> SHOW FULL COLUMNS FROM testings;
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| test | text | utf8mb4_unicode_ci | NO | | NULL | | select,insert,update,references | |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
Can anyone point me to right direction?
As answered by @Akina, I edited my database configuration with the parameters below:
SET collation_connection = 'utf8mb4_unicode_ci';
SET character_set_client = 'utf8mb4';
SET character_set_results = 'utf8mb4';
SET character_set_system = 'utf8mb4';
Now it works!
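Stylized letters like these live outside the Basic Multilingual Plane, so each one needs four bytes in UTF-8. One plausible explanation for the question marks, offered here as an assumption rather than a confirmed diagnosis, is that the connection charset was a 3-byte one and each 4-byte character was replaced with '?'. The helper names below are hypothetical and only model that idea in Python; they do not call MariaDB.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


def simulate_three_byte_charset(text: str) -> str:
    """Hypothetical model: replace every 4-byte character with '?'."""
    return "".join("?" if ord(ch) > 0xFFFF else ch for ch in text)


# Two mathematical sans-serif italic letters (U+1D608, U+1D634) and a plain word.
styled = "\U0001D608\U0001D634"
plain = "Astri"

print(needs_utf8mb4(styled), needs_utf8mb4(plain))
print(simulate_three_byte_charset(styled + " " + plain))
```

A check like needs_utf8mb4 can be run in the application before inserting, to flag strings that will only survive when client, connection, and column all use utf8mb4.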
I am trying to create a database using the utf8mb4 character set and utf8mb4_unicode_ci collation. However, I don't seem to be able to insert unicode characters into my tables.
What I have done:
SET NAMES utf8mb4;
CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE mydb;
CREATE TABLE test (val VARCHAR(16));
INSERT INTO test (val) VALUES ("á");
ERROR 1366 (22007): Incorrect string value: '\xA0' for column `mydb`.`test`.`val` at row 1
If I don't use SET NAMES utf8mb4;, then I can insert the "á" character without issue.
These are my default character set variables:
show variables like 'char%'; show variables like 'collation%';
+--------------------------+-----------------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------------+
| character_set_client | cp850 |
| character_set_connection | cp850 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | cp850 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MariaDB 10.5\share\charsets\ |
+--------------------------+-----------------------------------------------+
8 rows in set (0.000 sec)
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | cp850_general_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8_general_ci |
+----------------------+--------------------+
3 rows in set (0.000 sec)
And after using SET NAMES:
show variables like 'char%'; show variables like 'collation%';
+--------------------------+-----------------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MariaDB 10.5\share\charsets\ |
+--------------------------+-----------------------------------------------+
8 rows in set (0.000 sec)
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8_general_ci |
+----------------------+--------------------+
3 rows in set (0.000 sec)
How can I fix this issue so I can insert characters in the utf8mb4 character set?
Your text (or .sql) file itself is encoded in cp850 and not in utf-8.
You can see in the error that the encoded value is a single byte (\xA0, which is á in cp850). In UTF-8, á takes two bytes (C3 A1).
In order to use SET NAMES utf8mb4; command, your file needs to be converted to utf-8. Some advanced editors allow that, and even windows notepad can save a text file as utf-8 in modern versions.
If you are using Windows cmd, the command "chcp" controls the "code page". chcp 65001 provides utf8, but it needs a special charset installed, too.
To set the font in the console window: Right-click on the title of the window → Properties → Font → pick Lucida Console
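The single byte in the error message can be checked outside the database. This is a small sketch using Python's built-in codecs, with a hypothetical helper is_valid_utf8 written just for this illustration. It shows that á is the one byte A0 in cp850, two bytes in UTF-8, and that the lone A0 byte is not valid UTF-8, which is consistent with the server rejecting it after SET NAMES utf8mb4.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


char = "\u00e1"                     # the letter á
cp850_bytes = char.encode("cp850")  # what a cp850 console or file contains
utf8_bytes = char.encode("utf-8")   # what SET NAMES utf8mb4 promises the server

print(cp850_bytes.hex(), is_valid_utf8(cp850_bytes))
print(utf8_bytes.hex(), is_valid_utf8(utf8_bytes))
```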
Problem: the default collation of MySQL's uuid() does not match the configured connection collation.
I have a database + tables + fields created with charset: utf-8 and collation utf8_polish_ci.
The my.cnf is as follows:
init_connect='SET NAMES utf8 COLLATE utf8_polish_ci'
character-set-server=utf8
collation-server=utf8_polish_ci
character sets:
mysql> show variables like "char%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
collations:
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+----------------+
| Variable_name | Value |
+----------------------+----------------+
| collation_connection | utf8_polish_ci |
| collation_database | utf8_polish_ci |
| collation_server | utf8_polish_ci |
+----------------------+----------------+
Now, when using the uuid() function, the following error is returned:
mysql> select replace(uuid(),'-','');
ERROR 1270 (HY000): Illegal mix of collations (utf8_general_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE) for operation 'replace'
This happens because uuid()'s default collation seems to be utf8_general_ci.
mysql> select charset(uuid()), collation(uuid());
+-----------------+-------------------+
| charset(uuid()) | collation(uuid()) |
+-----------------+-------------------+
| utf8 | utf8_general_ci |
+-----------------+-------------------+
Is there a way to change the default collation used by uuid() so that it matches the collation_connection?
In our environment we write SQL updates that are executed on different MySQL databases with different collations. Therefore, forcing a collation by specifying it explicitly is not an option.
(This is not really an answer, but an attempt at isolating what causes the problem and what might fix it.)
Get in a DATABASE with a totally irrelevant CHARACTER SET and COLLATION.
mysql> CREATE DATABASE `so40064402` /*!40100 DEFAULT CHARACTER SET ucs2 COLLATE ucs2_bin */;
mysql> USE so40064402;
Database changed
Establish utf8_polish for the client:
mysql> SET NAMES utf8 COLLATE utf8_polish_ci;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'c%a%t%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 | -- from SET NAMES
| character_set_connection | utf8 | -- from SET NAMES
| character_set_database | ucs2 | -- from DATABASE
| character_set_filesystem | binary | -- (constant)
| character_set_results | utf8 | -- from SET NAMES
| character_set_server | utf8mb4 |
| character_set_system | utf8 | -- (constant)
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | utf8_polish_ci | -- from SET NAMES
| collation_database | ucs2_bin | -- from DATABASE
| collation_server | utf8mb4_unicode_520_ci |
+--------------------------+----------------------------+
11 rows in set (0.00 sec)
mysql> select charset(uuid()), collation(uuid());
+-----------------+-------------------+
| charset(uuid()) | collation(uuid()) |
+-----------------+-------------------+
| utf8 | utf8_general_ci | -- part of the problem, but can't fix this
+-----------------+-------------------+
1 row in set (0.00 sec)
mysql> select replace(uuid(),'-','');
ERROR 1270 (HY000): Illegal mix of collations
(utf8_general_ci,COERCIBLE),
(utf8_polish_ci,COERCIBLE),
(utf8_polish_ci,COERCIBLE) for operation 'replace'
Now let's change only SET NAMES. It works!? In spite of UUID() being utf8!?
mysql> SET NAMES utf8mb4 COLLATE utf8mb4_polish_ci;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW VARIABLES LIKE 'c%a%t%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 | -- from SET NAMES
| character_set_connection | utf8mb4 | -- from SET NAMES
| character_set_database | ucs2 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 | -- from SET NAMES
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | utf8mb4_polish_ci | -- from SET NAMES
| collation_database | ucs2_bin |
| collation_server | utf8mb4_unicode_520_ci |
+--------------------------+----------------------------+
11 rows in set (0.00 sec)
mysql> select replace(uuid(),'-','');
+----------------------------------+
| replace(uuid(),'-','') |
+----------------------------------+
| ea841aacf83b11e8a66580fa5b3669ce |
+----------------------------------+
1 row in set (0.00 sec)
mysql>
I found a straightforward, if daft-looking, workaround: cast(uuid() as char):
Example:
MariaDB [(none)]> set collation_connection = 'utf8_polish_ci';
Query OK, 0 rows affected (0.000 sec)
# Broken
MariaDB [(none)]> select replace(uuid(), '-', '');
ERROR 1270 (HY000): Illegal mix of collations (utf8_general_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE), (utf8_polish_ci,COERCIBLE) for operation 'replace'
# Working
MariaDB [(none)]> select replace(cast(uuid() as char), '-', '');
+----------------------------------------+
| replace(cast(uuid() as char), '-', '') |
+----------------------------------------+
| 0e1bc84c0ffb11ec875c0242ac140002 |
+----------------------------------------+
1 row in set (0.000 sec)
What I believe is happening here is that uuid() is generating a string with what is ultimately an arbitrarily-selected charset+collation (utf8_general_ci I guess). Casting it to char converts it to a string with the connection-specified charset+collation. This should match the charset+collation of the '-' and '' literals in the query.
MariaDB 10.3 is my environment. If I'm interpreting this correctly, I suspect the collation mismatch in the output of uuid() is a bug - it should match connection collation here - and so I expect its behavior to change eventually.
I have a table named CHINESE which has only one column NAME.
The output of SHOW VARIABLES LIKE 'char%' is:
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.1.73-osx10.6-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
When I run this query: INSERT INTO CHINESE VALUES ('你好'), the values get inserted.
But, when I try to execute this query: SELECT * FROM CHINESE, the result is:
+------+
| NAME |
+------+
| ?? |
+------+
The result of SELECT HEX(NAME) FROM CHINESE is:
+-----------+
| HEX(NAME) |
+-----------+
| 3F3F |
+-----------+
Where am I making a mistake?
If your MySQL version is >= 5.5.3, use utf8mb4.
Alter the original table
ALTER TABLE $tablename
CONVERT TO CHARACTER SET utf8mb4
COLLATE utf8mb4_general_ci;
Create new table
CREATE TABLE $tablename (
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Modify Column
ALTER TABLE $tablename
MODIFY $col1
VARCHAR(191)
CHARACTER SET utf8mb4;
Reference: MySQL documentation, Column Character Set Conversion
Try the following to change the character set: SET NAMES 'big5';
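The 3F3F in the question is what one would expect when two characters that a single-byte charset cannot represent are each replaced with '?' (0x3F). The snippet below is a sketch of that lossy step in Python, using the latin-1 codec as an approximation of a latin1 column; it does not reproduce MySQL's own conversion code.

```python
text = "\u4f60\u597d"  # the two CJK characters 你好

# A latin1 column has no slot for these characters, so each becomes '?'.
lossy = text.encode("latin-1", errors="replace")
lossy_hex = lossy.hex().upper()

# For comparison: the six bytes a utf8 or utf8mb4 column would hold.
utf8_hex = text.encode("utf-8").hex().upper()

print(lossy, lossy_hex)
print(utf8_hex)
```

Once the '?' bytes are stored, the original text is gone, so the rows have to be re-inserted after the column charset is changed.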
Update:
It turns out this is not directly related to the DB server itself, but to the client's encoding. If the client uses utf8, the German character is rendered incorrectly. If the client uses cp850, the German character is rendered correctly. But I need to use utf8, since there might be other classes of characters that the app needs to deal with. What should I do?
Original:
I have two database servers, viewed from the same mysql client. server1 renders the German characters correctly, server2 does not. The differences are shown below. I'm baffled, since server2 uses utf8 in more places. What could be the cause of this?
server1's encodings:
mysql> SHOW VARIABLES LIKE "character\_set\_database";
+------------------------+--------+
| Variable_name | Value |
+------------------------+--------+
| character_set_database | latin1 |
+------------------------+--------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | cp850 |
| character_set_connection | cp850 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | cp850 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+--------+
server2's encodings:
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.01 sec)
mysql> SHOW VARIABLES LIKE "character\_set\_database";
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.00 sec)
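One way to picture the update above: when character_set_results is utf8, the server sends UTF-8 bytes, and a console that decodes them as cp850 shows two box-drawing characters instead of one letter. The sketch below assumes a cp850 console, which is inferred from server1's settings and not stated outright in the question, and uses ü as a sample German character.

```python
original = "\u00fc"              # the letter ü
sent = original.encode("utf-8")  # bytes returned when character_set_results = utf8

shown_by_cp850_console = sent.decode("cp850")  # terminal misreads the two bytes
shown_by_utf8_console = sent.decode("utf-8")   # terminal agrees with the server

print(sent.hex(), repr(shown_by_cp850_console), repr(shown_by_utf8_console))

# The data itself is intact: undoing the wrong decode recovers the letter.
recovered = shown_by_cp850_console.encode("cp850").decode("utf-8")
print(recovered == original)
```

The point of the sketch is that the connection charset and the terminal's code page have to agree. Keeping utf8 on the connection therefore means switching the console to UTF-8 as well, rather than changing anything on the server.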