MySQL: best UTF setting for accent-sensitive but case-insensitive comparison

For example, ἐν and Ἐν should be treated as the same, but should be distinguished from ἕν/Ἓν. I've tried utf8_bin, which seems to be the closest, but it is also case-sensitive.

mysql> select 'ἐν' = 'Ἐν' collate utf8mb4_0900_as_ci;
+----------------------------------------------+
| 'ἐν' = 'Ἐν' collate utf8mb4_0900_as_ci |
+----------------------------------------------+
| 1 |
+----------------------------------------------+
mysql> select 'ἐν' = 'ἕν' collate utf8mb4_0900_as_ci;
+----------------------------------------------+
| 'ἐν' = 'ἕν' collate utf8mb4_0900_as_ci |
+----------------------------------------------+
| 0 |
+----------------------------------------------+
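To make this the default behavior instead of a per-query override, the collation could be set on the column itself. A minimal sketch, assuming MySQL 8.0 or later (where the utf8mb4_0900_* collations are available), a utf8mb4 connection, and a hypothetical table greek_words with a column word:

```sql
-- Hypothetical table; the names greek_words and word are illustrative only.
CREATE TABLE greek_words (
  word VARCHAR(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci
);

INSERT INTO greek_words (word) VALUES ('ἐν'), ('Ἐν'), ('ἕν');

-- The comparison uses the column's accent-sensitive, case-insensitive collation.
SELECT word FROM greek_words WHERE word = 'ἐν';
```

Given the comparison results above, this query would be expected to match ἐν and Ἐν but not ἕν.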

Related

MySQL distinct select for emoji always return 1 emoji out of 2

I have these emojis stored in a reaction column: 😏, 😇
If I run the query select distinct reaction from chat_reactions where chat_id = 593, it only selects the first one. My column's collation is utf8mb4_unicode_ci, but this issue doesn't happen for ❤️ (red heart emoji).
How can I solve this?
Solve this by using a collation that treats the emojis as distinct.
Collation defines which characters are equal, less than, or greater than other characters.
Demo: I created a table like yours and filled it with the three emojis:
mysql> select reaction from chat_reactions;
+----------+
| reaction |
+----------+
| 😏 |
| 😇 |
| ❤️ |
+----------+
In utf8mb4_unicode_ci collation, those face emojis are considered the same, so they are collapsed into one row:
mysql> select distinct reaction collate utf8mb4_unicode_ci from chat_reactions;
+-------------------------------------+
| reaction collate utf8mb4_unicode_ci |
+-------------------------------------+
| 😏 |
| ❤️ |
+-------------------------------------+
But the more modern collations treat the face emojis as distinct:
mysql> select distinct reaction collate utf8mb4_unicode_520_ci from chat_reactions;
+-----------------------------------------+
| reaction collate utf8mb4_unicode_520_ci |
+-----------------------------------------+
| 😏 |
| 😇 |
| ❤️ |
+-----------------------------------------+
mysql> select distinct reaction collate utf8mb4_0900_ai_ci from chat_reactions;
+-------------------------------------+
| reaction collate utf8mb4_0900_ai_ci |
+-------------------------------------+
| 😏 |
| 😇 |
| ❤️ |
+-------------------------------------+
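If the newer collation should apply to every query, not only those with an explicit COLLATE clause, the column definition can be changed. A sketch, assuming the reaction column is a VARCHAR(32) (the question does not show the real type or length) and a MySQL 8.0 server (on older versions utf8mb4_unicode_520_ci could be substituted):

```sql
-- Assumption: reaction is VARCHAR(32). MODIFY needs the full column definition
-- restated, so adjust the type, length, and NULL/DEFAULT attributes to match.
ALTER TABLE chat_reactions
  MODIFY reaction VARCHAR(32)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

-- With the column collation changed, no COLLATE override is needed.
SELECT DISTINCT reaction FROM chat_reactions WHERE chat_id = 593;
```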

How do I make my query only return names with a specific lower-case letter?

SELECT *
FROM County
WHERE LOWER(Name) LIKE "%u%";
I'm trying to return only rows where the County name contains a lower-case "u" somewhere. For some reason, the query above returns several rows where Name contains only an upper-case "U", which is not what I want. I don't understand why.
Thanks in advance!
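Try a binary (byte-wise) comparison. LOWER(Name) converts every "U" to "u" before the match, so the original query cannot tell them apart:
SELECT *
FROM County
WHERE BINARY Name LIKE '%u%';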
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=44048a2d080036ce9905340d6ebbf3e3
CREATE TABLE County (
Name VARCHAR(30)
);
insert into County values
('Test1'),
('Test2'),
('Tust3'),
('TeAt4'),
('TeAt5'),
('TUst6'),
('Tust7');
Result:
Name
Tust3
Tust7
mysql> show variables like '%character%';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| character_set_client | cp850 |
| character_set_connection | cp850 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | cp850 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)
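My character_set_client is cp850, and I can use the collation latin1_general_cs here. (A collation can only be applied to strings of its own character set, so this assumes the Name column is stored as latin1.) More information on these collations can be found in the documentation.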
SELECT *
FROM County
WHERE Name COLLATE latin1_general_cs LIKE "%u%"
The query above should find all rows whose name contains a lower-case u.
The collating sequence latin1_bin also works (as given in the other answers):
SELECT *
FROM County
WHERE Name COLLATE latin1_bin LIKE "%u%"
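If the Name column is stored as utf8mb4 instead of latin1 (an assumption, since the table's character set is not shown), the latin1 collations above do not apply, and MySQL would reject them as not valid for the character set. A sketch of the same filter with utf8mb4 collations:

```sql
-- Assumes County.Name uses the utf8mb4 character set.
-- Byte-wise comparison: case-sensitive and accent-sensitive.
SELECT *
FROM County
WHERE Name COLLATE utf8mb4_bin LIKE '%u%';

-- MySQL 8.0+ alternative: linguistic, accent- and case-sensitive collation.
SELECT *
FROM County
WHERE Name COLLATE utf8mb4_0900_as_cs LIKE '%u%';
```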

Why the result of "SELECT 'ä' = 'ae' COLLATE latin1_german2_ci;" is 1?

I am Chinese, and I cannot understand why the result of "SELECT 'ä' = 'ae' COLLATE latin1_german2_ci", which is the example in the MySQL documentation, is 1.
Besides, I have read another article. It sets the charset to latin1 first, and then the result of "SELECT 'ä' = 'ae' COLLATE latin1_german2_ci" becomes 0. Why are the two results of the same SQL different? Is it because of the charset difference?
From the MySQL documentation:
mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci;
+-----------------------------------------+
| 'ä' LIKE 'ae' COLLATE latin1_german2_ci |
+-----------------------------------------+
| 0 |
+-----------------------------------------+
mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci;
+--------------------------------------+
| 'ä' = 'ae' COLLATE latin1_german2_ci |
+--------------------------------------+
| 1 |
+--------------------------------------+
From the other article:
mysql> set charset latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci;
+---------------------------------------+
| 'ä' = 'ae' COLLATE latin1_german2_ci |
+---------------------------------------+
| 0 |
+---------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci;
+------------------------------------------+
| 'ä' LIKE 'ae' COLLATE latin1_german2_ci |
+------------------------------------------+
| 0 |
+------------------------------------------+
1 row in set (0.00 sec)
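One way to investigate the difference is to check how the server interprets the literal under each client character set. The following diagnostic sketch is not taken from either source. It assumes the terminal sends UTF-8 bytes, in which case a latin1 client setting would make the server read 'ä' as two latin1 characters instead of one:

```sql
-- Inspect the literal as the server received it.
-- A CHAR_LENGTH of 2 suggests the UTF-8 bytes of 'ä' were read as two latin1 characters.
SELECT HEX('ä'), CHAR_LENGTH('ä'), CHARSET('ä');

-- Build the latin1 character ä (byte E4) explicitly, independent of the
-- client character set, and compare it under the German phone-book collation.
SELECT _latin1 X'E4' = _latin1 'ae' COLLATE latin1_german2_ci;
```

If the second statement returns 1, as the documentation example does, while the plain-literal version returns 0, a mismatch between the terminal encoding and the client character set is a likely cause.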

Using case sensitive match in CASE WHEN function

The command
case when ltrim(rtrim(City_old)) = ltrim(rtrim(City_New)) then 'Y'
doesn't consider case sensitive differences.
Can someone please help me in using a case-sensitive match in case when function? Thanks in advance
Case sensitivity or insensitivity is based on the string collation defined for your columns. MySQL defaults to a case-insensitive collation, so all comparisons will ignore case by default.
mysql> select case when 'city' = 'City' then 'Y' else 'N' end as matches;
+---------+
| matches |
+---------+
| Y |
+---------+
You can make a comparison in a case-sensitive manner by overriding the collation:
mysql> select case when 'city' collate utf8mb4_bin = 'City' then 'Y' else 'N' end as matches;
+---------+
| matches |
+---------+
| N |
+---------+
You must choose a collation that is compatible with the character set of the string you are comparing. You can check which compatible collations are supported by your current MySQL instance:
mysql> SELECT * FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE character_set_name='utf8mb4';
+------------------------+--------------------+
| COLLATION_NAME | CHARACTER_SET_NAME |
+------------------------+--------------------+
| utf8mb4_general_ci | utf8mb4 |
| utf8mb4_bin | utf8mb4 |
| utf8mb4_unicode_ci | utf8mb4 |
| utf8mb4_icelandic_ci | utf8mb4 |
. . .
All the collations ending with _ci are case-insensitive. The only case-sensitive option above is utf8mb4_bin.
Likewise for utf8:
mysql> SELECT * FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE character_set_name='utf8';
+--------------------------+--------------------+
| COLLATION_NAME | CHARACTER_SET_NAME |
+--------------------------+--------------------+
| utf8_general_ci | utf8 |
| utf8_bin | utf8 |
| utf8_unicode_ci | utf8 |
| utf8_icelandic_ci | utf8 |
. . .
The choice also depends on the MySQL version you use. New character sets and collations keep being introduced to improve MySQL's support for standards. For example, in MySQL 8.0, you can use the collation utf8mb4_0900_as_cs.
Read https://dev.mysql.com/doc/refman/8.0/en/case-sensitivity.html for more details.
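To see which case-sensitive options a particular server offers, the collation list can be filtered by suffix. A small sketch, assuming the default SQL mode (backslash escapes enabled):

```sql
-- List the case-sensitive (_cs) and binary (_bin) collations for utf8mb4
-- that this server supports. The backslash escapes the LIKE wildcard "_".
SELECT COLLATION_NAME
FROM INFORMATION_SCHEMA.COLLATIONS
WHERE CHARACTER_SET_NAME = 'utf8mb4'
  AND (COLLATION_NAME LIKE '%\_cs' OR COLLATION_NAME LIKE '%\_bin');
```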
Case sensitivity in MySQL is often achieved by using the binary operator:
(case when binary ltrim(rtrim(City_old)) = binary ltrim(rtrim(City_New))
then 'Y' else 'N'
end) as is_same
This assumes that the original collation of the two strings is the same (which seems reasonable for two columns in the same table).
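The same check can also be written with a COLLATE override instead of the BINARY operator, which newer MySQL 8.0 releases flag as deprecated. A sketch, assuming both columns use the utf8mb4 character set and live in a hypothetical table city_changes:

```sql
-- Hypothetical table name; City_old and City_New come from the question.
-- TRIM removes leading and trailing spaces, like ltrim(rtrim(...)).
SELECT
  CASE
    WHEN TRIM(City_old) = TRIM(City_New) COLLATE utf8mb4_bin THEN 'Y'
    ELSE 'N'
  END AS is_same
FROM city_changes;
```

For columns in a different character set, the collation would need to be swapped for one that belongs to that character set.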

How to set schema collation in MySQL for Japanese

I am having a problem with collation. I want to set collation to support the Japanese language. For example, when table.firstname has 'あ', a query with 'ぁ' should return the record. Thanks in advance.
That's like "uppercase" and "lowercase", correct?
mysql> SELECT 'あ' = 'ぁ' COLLATE utf8_general_ci;
+---------------------------------------+
| 'あ' = 'ぁ' COLLATE utf8_general_ci |
+---------------------------------------+
| 0 |
+---------------------------------------+
mysql> SELECT 'あ' = 'ぁ' COLLATE utf8_unicode_ci;
+---------------------------------------+
| 'あ' = 'ぁ' COLLATE utf8_unicode_ci |
+---------------------------------------+
| 1 |
+---------------------------------------+
mysql> SELECT 'あ' = 'ぁ' COLLATE utf8_unicode_520_ci;
+-------------------------------------------+
| 'あ' = 'ぁ' COLLATE utf8_unicode_520_ci |
+-------------------------------------------+
| 1 |
+-------------------------------------------+
I recommend changing your column to use COLLATE utf8_unicode_520_ci (or utf8mb4_unicode_520_ci).
If you expect to be including Chinese, then be sure to use utf8mb4. (Perhaps this advice applies to Kanji, too.)
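A sketch of how that change might look. The table name users, the schema name my_schema, and the column length are assumptions for illustration, and only firstname comes from the question:

```sql
-- Hypothetical table name and column length; restate the real column definition.
ALTER TABLE users
  MODIFY firstname VARCHAR(100)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

-- Optionally change the schema default so that new tables inherit it
-- (existing columns are not converted by this statement).
ALTER DATABASE my_schema
  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci;

-- Per the comparisons above, this should now also find rows storing 'あ'.
SELECT * FROM users WHERE firstname = 'ぁ';
```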