I am trying to insert some emoji characters into a table in MySQL, but values are stored as question marks (????).
I made sure to create the database with the proper utf8mb4 encoding:
mysql> describe users;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(191) | YES | | NULL | |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.01 sec)
Then, to check whether MySQL understands emoji at all, I did this:
mysql> select '🌰';
+------+
| 🌰 |
+------+
| 🌰 |
+------+
1 row in set (0.00 sec)
Then I did this:
mysql> insert into users (name) values ('🌰');
Query OK, 1 row affected, 1 warning (0.05 sec)
mysql> select * from users;
+----+------------+
| id | name |
+----+------------+
| 21 | فاضل |
| 30 | سلاحف |
| 46 | ???? |
| 47 | ???? |
| 48 | ???? |
| 49 | ???? |
+----+------------+
6 rows in set (0.01 sec)
I don't know how to fix this.
EDIT: As requested in the comments, I ran the following command:
mysql> SHOW VARIABLES LIKE 'character%';
+--------------------------+-------------------------+
| Variable_name | Value |
+--------------------------+-------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /static/share/charsets/ |
+--------------------------+-------------------------+
8 rows in set (0.00 sec)
Your connection is set up for utf8; it needs to be set up for utf8mb4.
How did you set it? Change to whichever of these applies.
SET NAMES utf8mb4
PDO(... charset=utf8mb4)
mysqli::set_charset('utf8mb4')
etc
Emoji are 4-byte UTF-8 sequences, while MySQL's utf8 character set stores at most 3 bytes per character; hence the four question marks.
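To see why the Arabic rows survive but the emoji does not, here is a small Python sketch (not part of the answer; the function names are illustrative) that counts the UTF-8 bytes of each character and checks them against the 3-byte limit of MySQL's utf8 (utf8mb3):

```python
def utf8_lengths(text: str) -> list[int]:
    """Return the number of UTF-8 bytes used by each character of text."""
    return [len(ch.encode("utf-8")) for ch in text]


def fits_mysql_utf8(text: str) -> bool:
    """True if every character needs at most 3 bytes, the limit of MySQL's
    legacy utf8 (utf8mb3); 4-byte characters such as emoji need utf8mb4."""
    return all(n <= 3 for n in utf8_lengths(text))


if __name__ == "__main__":
    for sample in ["فاضل", "🌰"]:
        print(sample, utf8_lengths(sample), fits_mysql_utf8(sample))
```

Run as a script, it should report 2 bytes per character (True) for فاضل and 4 bytes (False) for 🌰, which is why a utf8mb4 column alone is not enough while the connection is still utf8.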
I currently have MariaDB 10.4.18 on CentOS 8.0. When I try to save a string in stylized fonts, like the one below,
𝘈𝘴𝘵𝘳𝘪 𝘈𝘯𝘢𝘯𝘵𝘢
MariaDB saves it as "??? ????".
The statement:
mysql> insert into testings(test) values ('𝘈𝘴𝘵𝘳𝘪 𝘈𝘯𝘢𝘯𝘵𝘢');
Here are my database's character set and collation:
mysql> select @@collation_database;
+----------------------+
| @@collation_database |
+----------------------+
| utf8mb4_unicode_ci |
+----------------------+
1 row in set (0.00 sec)
mysql> SELECT @@character_set_database;
+--------------------------+
| @@character_set_database |
+--------------------------+
| utf8mb4 |
+--------------------------+
The table:
mysql> SHOW FULL COLUMNS FROM testings;
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| test | text | utf8mb4_unicode_ci | NO | | NULL | | select,insert,update,references | |
+-------+------+--------------------+------+-----+---------+-------+---------------------------------+---------+
Can anyone point me in the right direction?
As answered by @Akina, I edited my database config with the parameters below:
SET collation_connection = 'utf8mb4_unicode_ci';
SET character_set_client = 'utf8mb4';
SET character_set_results = 'utf8mb4';
SET character_set_system = 'utf8mb4';
Now it works!
While investigating a failing migration, I found the following strange behaviour: using SET NAMES utf8 in my client session changes the behaviour of REPLACE(uuid(),'','') calls.
mysql> select replace(uuid(),'','') from mysql.user;
+--------------------------------------+
| replace(uuid(),'','') |
+--------------------------------------+
| 4b483d57-ecdc-11e8-844f-0242ac120002 |
| 4b483d57-ecdc-11e8-844f-0242ac120002 |
| 4b483d57-ecdc-11e8-844f-0242ac120002 |
+--------------------------------------+
3 rows in set (0.00 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0.01 sec)
mysql> select replace(uuid(),'','') from mysql.user;
+--------------------------------------+
| replace(uuid(),'','') |
+--------------------------------------+
| 539c0b5c-ecdc-11e8-844f-0242ac120002 |
| 539c0b79-ecdc-11e8-844f-0242ac120002 |
| 539c0b7f-ecdc-11e8-844f-0242ac120002 |
+--------------------------------------+
3 rows in set (0.01 sec)
As you can see, the generated UUIDs are unique only after setting NAMES to utf8. I found out about SET NAMES utf8 by running the query through MySQL Workbench.
I would greatly appreciate it if someone could explain how character sets (NAMES) influence the output of REPLACE(UUID(), ...) calls. Thanks in advance.
Update: here is a snippet showing that the problem 1) is not with UUID() generating non-unique values and 2) relates to the utf8mb4 charset:
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
mysql> select uuid, replace(uuid,'','') from ( select uuid() as uuid from mysql.user) tmp;
+--------------------------------------+--------------------------------------+
| uuid | replace(uuid,'','') |
+--------------------------------------+--------------------------------------+
| e41d2fe4-ed70-11e8-844f-0242ac120002 | e41d2dc2-ed70-11e8-844f-0242ac120002 |
| e41d3042-ed70-11e8-844f-0242ac120002 | e41d2dc2-ed70-11e8-844f-0242ac120002 |
| e41d309c-ed70-11e8-844f-0242ac120002 | e41d2dc2-ed70-11e8-844f-0242ac120002 |
+--------------------------------------+--------------------------------------+
3 rows in set (0.00 sec)
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> select uuid, replace(uuid,'','') from ( select uuid() as uuid from mysql.user) tmp;
+--------------------------------------+--------------------------------------+
| uuid | replace(uuid,'','') |
+--------------------------------------+--------------------------------------+
| e9059092-ed70-11e8-844f-0242ac120002 | e9059117-ed70-11e8-844f-0242ac120002 |
| e90591a1-ed70-11e8-844f-0242ac120002 | e905923e-ed70-11e8-844f-0242ac120002 |
| e9059380-ed70-11e8-844f-0242ac120002 | e90593e1-ed70-11e8-844f-0242ac120002 |
+--------------------------------------+--------------------------------------+
3 rows in set (0.00 sec)
mysql> set names utf8mb4;
Query OK, 0 rows affected (0.00 sec)
mysql> select uuid, replace(uuid,'','') from ( select uuid() as uuid from mysql.user) tmp;
+--------------------------------------+--------------------------------------+
| uuid | replace(uuid,'','') |
+--------------------------------------+--------------------------------------+
| ef564f32-ed70-11e8-844f-0242ac120002 | ef564d0c-ed70-11e8-844f-0242ac120002 |
| ef564fa4-ed70-11e8-844f-0242ac120002 | ef564d0c-ed70-11e8-844f-0242ac120002 |
| ef565019-ed70-11e8-844f-0242ac120002 | ef564d0c-ed70-11e8-844f-0242ac120002 |
+--------------------------------------+--------------------------------------+
3 rows in set (0.00 sec)
Update 2: I added the EXPLAIN queries below to trace the actual SQL, as suggested by @qbolec. They reveal that the use of CONVERT(... using ...) is the culprit for the non-unique UUIDs. I still don't understand exactly why, as this is not the behaviour I expect from the CONVERT() function.
mysql> set names utf8mb4;
Query OK, 0 rows affected (0.00 sec)
mysql> explain select replace(uuid(),'','') from mysql.user;\W
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | user | NULL | index | NULL | PRIMARY | 276 | NULL | 7 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select replace(convert(uuid() using utf8mb4),'','') AS `replace(uuid(),'','')` from `mysql`.`user`
Show warnings enabled.
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> explain select replace(uuid(),'','') from mysql.user;\W
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | user | NULL | index | NULL | PRIMARY | 276 | NULL | 7 | 100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select replace(uuid(),'','') AS `replace(uuid(),'','')` from `mysql`.`user`
Show warnings enabled.
After installing MySQL, I created a table and added some records like this:
create table score (
student_id int,
subject varchar(10),
subject_score int
);
insert into score values (001,'数学',50),(002,'数学',60),(003,'数学',70);
and the result is:
mysql> select * from score;
+------------+---------+---------------+
| student_id | subject | subject_score |
+------------+---------+---------------+
| 1 | ?? | 50 |
| 2 | ?? | 60 |
| 3 | ?? | 70 |
+------------+---------+---------------+
3 rows in set (0.00 sec)
Then I changed the character set:
mysql> alter table score character set utf8;
mysql> alter table score modify subject varchar(10) character set utf8;
mysql> set character_set_server=utf8;
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.5.47-linux2.6-x86_64/share/charsets/ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)
and my table structure is:
mysql> show create table score\G;
*************************** 1. row ***************************
Table: score
Create Table: CREATE TABLE `score` (
`student_id` int(11) DEFAULT NULL,
`subject` varchar(10) DEFAULT NULL,
`subject_score` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
I think everything is fine, but when I query my table, the Chinese characters still don't display correctly:
mysql> select * from score;
+------------+---------+---------------+
| student_id | subject | subject_score |
+------------+---------+---------------+
| 1 | ?? | 50 |
| 2 | ?? | 60 |
| 3 | ?? | 70 |
+------------+---------+---------------+
3 rows in set (0.00 sec)
But when I insert the same records again, the new records are fine!
mysql> insert into score values (001,'数学',50),(002,'数学',60),(003,'数学',70);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select * from score;
+------------+---------+---------------+
| student_id | subject | subject_score |
+------------+---------+---------------+
| 1 | ?? | 50 |
| 2 | ?? | 60 |
| 3 | ?? | 70 |
| 1 | 数学 | 50 |
| 2 | 数学 | 60 |
| 3 | 数学 | 70 |
+------------+---------+---------------+
How can I save my old records? I need somebody's help. Thank you very much!
Use UTF-8 when you create the table:
CREATE TABLE table_name (...) CHARACTER SET = utf8;
Use UTF-8 on the connection when you insert into the table:
SET NAMES utf8; INSERT INTO table_name (ABC, VAL) VALUES (...);
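Neither step brings back rows that were already inserted. Assuming the subject column was latin1 when those rows were written (the ALTER TABLE statements suggest it was not utf8 before), MySQL would have found no latin1 mapping for 数学 and stored a literal ? (0x3F) per character. The Python sketch below only models that substitution (errors="replace" is Python's analogue, not MySQL code; the function name is hypothetical) to show that the original text is no longer in the table, so those rows have to be re-inserted or updated from the original source data:

```python
def store_in_latin1_column(text: str) -> str:
    """Model a conversion into latin1 in which characters with no latin1
    mapping become '?', the way the ?? rows appear to have been stored."""
    stored_bytes = text.encode("latin-1", errors="replace")  # unmappable -> b"?"
    return stored_bytes.decode("latin-1")


if __name__ == "__main__":
    stored = store_in_latin1_column("数学")
    print(stored)                          # ??
    print(stored.encode("latin-1").hex())  # 3f3f: nothing of 数学 is left
```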
When inserting emoji characters in the mysql interactive client, I found some very confusing behaviour. I hope someone can clear it up. See below:
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
CREATE TABLE `t` (
`data` varchar(100) CHARACTER SET utf8mb4 DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> insert into t select '😀';
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x80' for column 'data' at row 1
mysql> set names utf8mb4;
mysql> insert into t select '😀';
Query OK, 1 row affected (0.00 sec)
mysql> select * from t;
+------+
| data |
+------+
| 😀 |
+------+
mysql> select data, hex(data) from t;
+------+-----------+
| data | hex(data) |
+------+-----------+
| 😀 | F09F9880 |
+------+-----------+
Why do I need to execute SET NAMES utf8mb4 explicitly? From the error message, it seems the data was already resolved to the four bytes (F0 9F 98 80) successfully. Why can it still not be inserted?
Below is another puzzle for me.
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
mysql> insert into t select '😀';
Query OK, 1 row affected (0.01 sec)
mysql> select data,hex(data) from t;
+------+--------------------+
| data | hex(data) |
+------+--------------------+
| 😀 | C3B0C5B8CB9CE282AC |
+------+--------------------+
I have to say I am a little shocked by this. I thought only utf8mb4 supported emoji characters, but now latin1 seems to support them too.
Can anybody clear this up for me? Thanks!
You can insert UTF-8 data into a latin1 table, but MySQL won't treat the byte stream as UTF-8 characters, so you won't be able to query against it, for example. If your application understands the UTF-8 byte stream, it will look like it's working OK. But the table charset really needs to be utf8 (or utf8mb4) if MySQL is to understand those bytes as Unicode characters.
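The hex value in the second session can be reproduced outside MySQL. There the column is utf8mb4 but the connection is latin1, so the server reads each of the four UTF-8 bytes of 😀 as a separate latin1 character and stores those four characters as UTF-8 (double encoding). MySQL documents its latin1 as Windows cp1252, so this Python sketch uses Python's cp1252 codec; the function names are made up for illustration:

```python
def double_encode(text: str) -> bytes:
    """Bytes stored when UTF-8 input arrives over a latin1 connection and is
    written to a utf8mb4 column."""
    utf8_bytes = text.encode("utf-8")        # what the UTF-8 terminal sends
    as_latin1 = utf8_bytes.decode("cp1252")  # server: one latin1 char per byte
    return as_latin1.encode("utf-8")         # stored in the utf8mb4 column


def undo_double_encoding(stored: bytes) -> str:
    """Reverse the steps above to recover the original text."""
    return stored.decode("utf-8").encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    stored = double_encode("😀")
    print(stored.hex().upper())          # C3B0C5B8CB9CE282AC
    print(undo_double_encoding(stored))  # 😀
```

Reading the column back over the same latin1 connection reverses the conversion to the original four bytes, which is why the terminal still shows 😀. Python's cp1252 codec rejects five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) that MySQL's latin1 may accept, so treat this as an illustration rather than a general repair routine.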
I've declared a field in my InnoDB/MySQL table as
VARCHAR(255) CHARACTER SET utf8 NOT NULL
However, when inserting, my data is truncated at 255 bytes, not 255 characters. This might chop a trailing multi-byte code point in two, leaving an invalid character.
Any ideas what I might be doing wrong?
EDIT:
A sample session looks like this:
mysql> update channel set comment="ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬x" where id = 1;
Query OK, 0 rows affected, 1 warning (0.00 sec)
Rows matched: 1 Changed: 0 Warnings: 1
mysql> select id, channelName, comment from channel;
+----+-------------+------------------------------------------------------------------------------------------
| id | channelName | comment |
+----+-------------+-----------------------------------------------------------------------------------------
| 1 | foo | ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩ�� |
+----+-------------+-----------------------------------------------------------------------------------------
1 row in set (0.00 sec)
Via mysql-admin, I looked at the comment field and saw that it is indeed VARCHAR(255) and uses "UTF-8 Unicode".
From the command
show full columns from channel
I get:
+-----------------------------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-----------------------------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| channelName | varchar(255) | utf8_general_ci | NO | | NULL | | select,insert,update,references | |
| comment | varchar(255) | utf8_general_ci | NO | | NULL | | select,insert,update,references | |
+-----------------------------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
mysql> SHOW VARIABLES LIKE 'character_set%'
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
According to the manual, you should be fine:
MySQL interprets length specifications in character column definitions in character units. (Before MySQL 4.1, column lengths were interpreted in bytes.) This applies to CHAR, VARCHAR, and the TEXT types.
Do you happen to be using a pre-4.1 version of MySQL?
This is a stab in the dark, but are you using UTF-8 as the connection and client character sets? Issue SHOW VARIABLES LIKE 'character_set%' and see whether it tells you UTF-8 or latin1.
Perhaps if you are using the wrong connection/client character sets, the UTF-8 bytes are reinterpreted as single-byte characters and stored that way in the database.
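That theory can be sketched in Python. This is a simplified model under stated assumptions: the terminal sends UTF-8, the latin1 connection turns each byte into one character (Python's latin-1 codec is used because it maps all 256 byte values, whereas MySQL's latin1 is cp1252), the VARCHAR(255) column keeps 255 characters, and the value is converted back to the same bytes on SELECT. The function name and sample string are hypothetical:

```python
def round_trip_over_latin1(text: str, max_chars: int = 255) -> str:
    """Model storing UTF-8 text through a latin1 connection in a
    VARCHAR(max_chars) utf8 column and reading it back."""
    raw = text.encode("utf-8")           # bytes sent by the UTF-8 terminal
    as_chars = raw.decode("latin-1")     # latin1 connection: one character per byte
    stored = as_chars[:max_chars]        # limit counts characters, i.e. original bytes here
    returned = stored.encode("latin-1")  # converted back to the same bytes on SELECT
    # The terminal decodes UTF-8 again; a cut multi-byte sequence becomes U+FFFD.
    return returned.decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "x" + "ᚠ" * 100  # 1 + 300 = 301 UTF-8 bytes
    result = round_trip_over_latin1(sample)
    print(len(result), repr(result[-3:]))
```

Here the 255-byte cut lands after 84 complete runes plus two bytes of the 85th, so the result ends with a replacement character, much like the trailing �� in the session above.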