I am Chinese, and I cannot understand why the result of "SELECT 'ä' = 'ae' COLLATE latin1_german2_ci", which is the example in the MySQL documentation, is 1.
I have also read another article. It sets the charset to latin1 first, and then the result of "SELECT 'ä' = 'ae' COLLATE latin1_german2_ci" becomes 0. Why are the two results of the same SQL different? Is it because of the charset difference?
From the MySQL documentation:
mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci;
+-----------------------------------------+
| 'ä' LIKE 'ae' COLLATE latin1_german2_ci |
+-----------------------------------------+
| 0 |
+-----------------------------------------+
mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci;
+--------------------------------------+
| 'ä' = 'ae' COLLATE latin1_german2_ci |
+--------------------------------------+
| 1 |
+--------------------------------------+
From the other article:
mysql> set charset latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci;
+---------------------------------------+
| 'ä' = 'ae' COLLATE latin1_german2_ci |
+---------------------------------------+
| 0 |
+---------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci;
+------------------------------------------+
| 'ä' LIKE 'ae' COLLATE latin1_german2_ci |
+------------------------------------------+
| 0 |
+------------------------------------------+
1 row in set (0.00 sec)
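One possible explanation, offered here as an assumption and not something either transcript confirms: if the terminal sends UTF-8 bytes while the connection is declared as latin1, the server does not receive a single 'ä' at all. The sketch below uses plain Python (no MySQL involved) to show what those bytes look like when read as latin1.

```python
# Assumption: the terminal encodes input as UTF-8, but the connection
# charset tells the server to interpret incoming bytes as latin1.
typed = "ä"

# Bytes a UTF-8 terminal would send for the single character 'ä'.
sent_bytes = typed.encode("utf-8")

# The same bytes read as latin1: two unrelated characters, not 'ä'.
seen_as_latin1 = sent_bytes.decode("latin-1")

print(sent_bytes.hex())          # raw bytes on the wire, in hex
print(len(typed), len(seen_as_latin1))
```

Under that assumption the comparison is no longer between 'ä' and 'ae', so a result of 0 would not be surprising.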
Related
For example, ἐν and Ἐν should be treated as the same, but both should be distinguished from ἕν/Ἓν. I've tried utf8_bin, which seems to be the closest, but it is also case sensitive.
mysql> select 'ἐν' = 'Ἐν' collate utf8mb4_0900_as_ci;
+----------------------------------------------+
| 'ἐν' = 'Ἐν' collate utf8mb4_0900_as_ci |
+----------------------------------------------+
| 1 |
+----------------------------------------------+
mysql> select 'ἐν' = 'ἕν' collate utf8mb4_0900_as_ci;
+----------------------------------------------+
| 'ἐν' = 'ἕν' collate utf8mb4_0900_as_ci |
+----------------------------------------------+
| 0 |
+----------------------------------------------+
In MySQL, what is the best way of programmatically retrieving the character set and the collation of the current database?
Is the following:
SELECT
default_character_set_name, default_collation_name
FROM
information_schema.SCHEMATA
WHERE
SCHEMA_NAME = SCHEMA()
equivalent to the following?
select @@character_set_database, @@collation_database
According to the documentation:
character_set_database
...
The character set used by the default
database. The server sets this variable whenever the default database
changes. If there is no default database, the variable has the same
value as character_set_server.
...
and
collation_database
...
The collation used by the default database. The
server sets this variable whenever the default database changes. If
there is no default database, the variable has the same value as
collation_server.
...
Both statements should therefore return the same result:
SELECT
default_character_set_name, default_collation_name
FROM
information_schema.SCHEMATA
WHERE
SCHEMA_NAME = SCHEMA()
and
select @@character_set_database, @@collation_database
as demonstrated in the following test:
mysql> DROP DATABASE IF EXISTS `my_database`;
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT SCHEMA();
+----------+
| SCHEMA() |
+----------+
| NULL |
+----------+
1 row in set (0.00 sec)
mysql> SELECT
-> @@SESSION.character_set_database,
-> @@SESSION.collation_database,
-> @@SESSION.character_set_server,
-> @@SESSION.collation_server;
+----------------------------------+------------------------------+--------------------------------+----------------------------+
| @@SESSION.character_set_database | @@SESSION.collation_database | @@SESSION.character_set_server | @@SESSION.collation_server |
+----------------------------------+------------------------------+--------------------------------+----------------------------+
| latin1 | latin1_swedish_ci | latin1 | latin1_swedish_ci |
+----------------------------------+------------------------------+--------------------------------+----------------------------+
1 row in set (0.00 sec)
mysql> CREATE DATABASE IF NOT EXISTS `my_database`
-> CHARACTER SET utf8mb4
-> COLLATE utf8mb4_general_ci;
Query OK, 1 row affected (0.00 sec)
mysql> SELECT SCHEMA();
+----------+
| SCHEMA() |
+----------+
| NULL |
+----------+
1 row in set (0.00 sec)
mysql> SELECT
-> @@SESSION.character_set_database,
-> @@SESSION.collation_database,
-> @@SESSION.character_set_server,
-> @@SESSION.collation_server;
+----------------------------------+------------------------------+--------------------------------+----------------------------+
| @@SESSION.character_set_database | @@SESSION.collation_database | @@SESSION.character_set_server | @@SESSION.collation_server |
+----------------------------------+------------------------------+--------------------------------+----------------------------+
| latin1 | latin1_swedish_ci | latin1 | latin1_swedish_ci |
+----------------------------------+------------------------------+--------------------------------+----------------------------+
1 row in set (0.00 sec)
mysql> USE `my_database`;
Database changed
mysql> SELECT SCHEMA();
+-------------+
| SCHEMA() |
+-------------+
| my_database |
+-------------+
1 row in set (0.00 sec)
mysql> SELECT
-> @@SESSION.character_set_database,
-> @@SESSION.collation_database,
-> @@SESSION.character_set_server,
-> @@SESSION.collation_server;
+----------------------------------+------------------------------+--------------------------------+----------------------------+
| @@SESSION.character_set_database | @@SESSION.collation_database | @@SESSION.character_set_server | @@SESSION.collation_server |
+----------------------------------+------------------------------+--------------------------------+----------------------------+
| utf8mb4 | utf8mb4_general_ci | latin1 | latin1_swedish_ci |
+----------------------------------+------------------------------+--------------------------------+----------------------------+
1 row in set (0.00 sec)
mysql> SELECT
-> `DEFAULT_CHARACTER_SET_NAME`,
-> `DEFAULT_COLLATION_NAME`
-> FROM
-> `information_schema`.`SCHEMATA`
-> WHERE
-> SCHEMA_NAME = SCHEMA();
+----------------------------+------------------------+
| DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+----------------------------+------------------------+
| utf8mb4 | utf8mb4_general_ci |
+----------------------------+------------------------+
1 row in set (0.00 sec)
However, combining both statements into a single query would not be wrong either:
mysql> USE `my_database`;
Database changed
mysql> SELECT
-> `DEFAULT_CHARACTER_SET_NAME`,
-> `DEFAULT_COLLATION_NAME`
-> FROM
-> `information_schema`.`SCHEMATA`
-> WHERE
-> SCHEMA_NAME = SCHEMA() AND
-> `DEFAULT_CHARACTER_SET_NAME` = @@SESSION.character_set_database AND
-> `DEFAULT_COLLATION_NAME` = @@SESSION.collation_database;
+----------------------------+------------------------+
| DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+----------------------------+------------------------+
| utf8mb4 | utf8mb4_general_ci |
+----------------------------+------------------------+
1 row in set (0.00 sec)
I am having a problem with collation. I want to set collation to support the Japanese language. For example, when table.firstname has 'あ', a query with 'ぁ' should return the record. Thanks in advance.
That's like "uppercase" and "lowercase", correct?
mysql> SELECT 'あ' = 'ぁ' COLLATE utf8_general_ci;
+---------------------------------------+
| 'あ' = 'ぁ' COLLATE utf8_general_ci |
+---------------------------------------+
| 0 |
+---------------------------------------+
mysql> SELECT 'あ' = 'ぁ' COLLATE utf8_unicode_ci;
+---------------------------------------+
| 'あ' = 'ぁ' COLLATE utf8_unicode_ci |
+---------------------------------------+
| 1 |
+---------------------------------------+
mysql> SELECT 'あ' = 'ぁ' COLLATE utf8_unicode_520_ci;
+-------------------------------------------+
| 'あ' = 'ぁ' COLLATE utf8_unicode_520_ci |
+-------------------------------------------+
| 1 |
+-------------------------------------------+
I recommend changing your column to use COLLATE utf8_unicode_520_ci (or utf8mb4_unicode_520_ci).
If you expect to be including Chinese, then be sure to use utf8mb4. (Perhaps this advice applies to Kanji, too.)
In a Django application with a MySQL back end, users try to insert notes that contain smileys, hearts and other Unicode characters. MySQL refuses the operation with an error:
(1366, "Incorrect string value: '\\xE2\\x9D\\xA4\\xEF\\xB8\\x8F' for column 'note' at row 1")
(The column in question has the longtext type. The Unicode characters in this case are valid: a heart and a modifier, https://codepoints.net/U+2764 and https://codepoints.net/U+FE0F, so it is not a case of 4-byte UTF-8 characters. I made sure that MySQL's default character set is utf-8.)
What is interesting is that I cannot fully reproduce this error in my local development environment. One particular difference is that there MySQL only emits a warning for this anomaly.
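The byte sequence in the error message can be checked against the two code points without touching the database. This is a minimal Python sketch (the variable names are only illustrative) that encodes each character and reports how many UTF-8 bytes it needs.

```python
# The two code points named in the question: U+2764 and U+FE0F.
note = "\u2764\ufe0f"

# Encode each character separately to see its UTF-8 byte length.
for ch in note:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex().upper()} ({len(encoded)} bytes)")

# Longest single-character encoding in the note.
max_bytes = max(len(ch.encode("utf-8")) for ch in note)
print("max bytes per character:", max_bytes)
```

Both characters fit in three bytes, which matches the claim that this is not a 4-byte character problem.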
Update1:
This is still bothering me:
mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name="sblive";
+----------------------------+
| default_character_set_name |
+----------------------------+
| latin1 |
+----------------------------+
1 row in set (0.00 sec)
I converted the specific table's charset to utf-8:
mysql> alter table uploads_uploads convert to character set utf8 COLLATE utf8_general_ci;
Query OK, 1209036 rows affected (1 min 10.31 sec)
Records: 1209036 Duplicates: 0 Warnings: 0
mysql> SELECT character_set_name FROM information_schema.`COLUMNS` WHERE table_schema = "sblive" AND table_name = "uploads_uploads" AND column_name = "note";
+--------------------+
| character_set_name |
+--------------------+
| utf8 |
+--------------------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
mysql> SHOW VARIABLES LIKE '%colla%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
You are asking for ❤️ followed by a "non-spacing" "VARIATION SELECTOR-16".
Your bytes are utf8 -- good
Your connection needs to specify utf8 -- does it?
Your TEXT column needs to be declared CHARACTER SET utf8 -- is it? Use SHOW CREATE TABLE to verify.
If you are using HTML, it needs to say charset=UTF-8 -- does it?
I suggest switching to utf8mb4 if the back-end users are likely to enter more emoticons -- emoji will need it.
Addenda
Let's check the data. Please run this:
SELECT col, HEX(col) FROM ...
Those two characters should produce hex E29DA4 and EFB88F. If you see C3A2C29DC2A4C3AFC2B8C28F, you have "double encoding", which is a messier problem. 2764FE0F would indicate utf16, I think.
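The three hex strings above can be reproduced outside MySQL. The sketch below is plain Python and assumes that "double encoding" means UTF-8 bytes that were misread as latin1 and then encoded to UTF-8 a second time; Python's latin-1 codec stands in for the server's latin1 here.

```python
# The heart plus VARIATION SELECTOR-16 from the question.
text = "\u2764\ufe0f"

# Correctly stored UTF-8.
utf8_hex = text.encode("utf-8").hex().upper()

# Double encoding: UTF-8 bytes misread as latin1, then encoded as UTF-8 again.
double_hex = text.encode("utf-8").decode("latin-1").encode("utf-8").hex().upper()

# UTF-16 (big-endian, no byte-order mark).
utf16_hex = text.encode("utf-16-be").hex().upper()

print(utf8_hex)
print(double_hex)
print(utf16_hex)
```

Comparing the output of HEX(col) with these three values shows which of the cases applies to the stored data.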
The value is cut off at the character 💀.
Why is this happening?
create table tmp2(t1 varchar(100));
insert into tmp2 values('before💀after');
mysql> select * from tmp2;
+--------+
| t1 |
+--------+
| before |
+--------+
1 row in set (0.01 sec)
I ran the following commands, which returned some useful information:
mysql> SHOW FULL COLUMNS FROM tmp2;
+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| t1 | varchar(100) | utf8_general_ci | YES | | NULL | | select,insert,update,references | |
+-------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
and this,
mysql> SELECT character_set_name FROM information_schema.`COLUMNS` WHERE table_schema = "test" AND table_name = "tmp2" AND column_name = "t1";
+--------------------+
| character_set_name |
+--------------------+
| utf8 |
+--------------------+
1 row in set (0.00 sec)
I'm testing this on Ubuntu with the mysql command line.
I found the solution here
I learned that some characters are not included in MySQL's utf8
There is a good article here
I needed to change the column from utf8 to utf8mb4, and then it worked:
alter table tmp2 modify t1 varchar(100) character set utf8mb4;
SET NAMES utf8mb4;
insert tmp2 values('before💀after');
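To see in advance which strings will hit this limit, one can check for characters outside the Basic Multilingual Plane, since those are the ones that need four bytes in UTF-8. The helper names below are hypothetical, and the prefix function only mirrors the truncation seen in the output above; it is not a model of MySQL internals.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character needs 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def prefix_before_4byte_char(text: str) -> str:
    """Return the part of text that precedes the first 4-byte character."""
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return text[:index]
    return text


value = "before\U0001F480after"  # same string as in the INSERT above
print(needs_utf8mb4(value))
print(prefix_before_4byte_char(value))
print("\U0001F480".encode("utf-8").hex().upper())
```

A check like this could run before inserting into a column that is still declared as utf8.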