The command
case when ltrim(rtrim(City_old)) = ltrim(rtrim(City_New)) then 'Y'
doesn't take differences in letter case into account.
Can someone please help me use a case-sensitive match in a CASE WHEN expression? Thanks in advance.
Case sensitivity or insensitivity is based on the string collation defined for your columns. MySQL defaults to a case-insensitive collation, so all comparisons will ignore case by default.
mysql> select case when 'city' = 'City' then 'Y' else 'N' end as matches;
+---------+
| matches |
+---------+
| Y |
+---------+
You can make a comparison in a case-sensitive manner by overriding the collation:
mysql> select case when 'city' collate utf8mb4_bin = 'City' then 'Y' else 'N' end as matches;
+---------+
| matches |
+---------+
| N |
+---------+
You must choose a collation that is compatible with the character set of the string you are comparing. You can check which compatible collations are supported by your current MySQL instance:
mysql> SELECT * FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE character_set_name='utf8mb4';
+------------------------+--------------------+
| COLLATION_NAME | CHARACTER_SET_NAME |
+------------------------+--------------------+
| utf8mb4_general_ci | utf8mb4 |
| utf8mb4_bin | utf8mb4 |
| utf8mb4_unicode_ci | utf8mb4 |
| utf8mb4_icelandic_ci | utf8mb4 |
. . .
All the collations ending with _ci are case-insensitive. The only case-sensitive option above is utf8mb4_bin.
Likewise for utf8:
mysql> SELECT * FROM INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY
WHERE character_set_name='utf8';
+--------------------------+--------------------+
| COLLATION_NAME | CHARACTER_SET_NAME |
+--------------------------+--------------------+
| utf8_general_ci | utf8 |
| utf8_bin | utf8 |
| utf8_unicode_ci | utf8 |
| utf8_icelandic_ci | utf8 |
. . .
The choice also depends on the MySQL version you use. New character sets and collations keep being introduced to improve MySQL's support for standards. For example, in MySQL 8.0, you can use the collation utf8mb4_0900_as_cs, which is both accent-sensitive and case-sensitive.
Read https://dev.mysql.com/doc/refman/8.0/en/case-sensitivity.html for more details.
Case sensitivity in MySQL is often achieved by using the BINARY operator:
(case when binary ltrim(rtrim(City_old)) = binary ltrim(rtrim(City_New))
then 'Y' else 'N'
end) as is_same
This assumes that the original collation of the two strings is the same (which seems reasonable for two columns in the same table).
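To make the difference between the two kinds of comparison concrete outside of MySQL, here is a rough Python sketch. It is only an analogy: the helper names (trim_spaces, equal_ci, equal_bin, is_same) are made up for illustration, casefold() merely approximates a _ci collation, and comparing the encoded bytes approximates BINARY or a _bin collation.

```python
def trim_spaces(s: str) -> str:
    # Like ltrim(rtrim(...)): strip leading/trailing spaces only.
    return s.strip(" ")


def equal_ci(a: str, b: str) -> bool:
    # Rough stand-in for a case-insensitive (_ci) comparison.
    return trim_spaces(a).casefold() == trim_spaces(b).casefold()


def equal_bin(a: str, b: str) -> bool:
    # Rough stand-in for BINARY / a _bin collation: compare raw bytes.
    return trim_spaces(a).encode("utf-8") == trim_spaces(b).encode("utf-8")


def is_same(city_old: str, city_new: str) -> str:
    # Mirrors the CASE WHEN ... THEN 'Y' ELSE 'N' END expression.
    return "Y" if equal_bin(city_old, city_new) else "N"


print(equal_ci("city", "City"))   # True
print(equal_bin("city", "City"))  # False
```

With these helpers, is_same(" Paris ", "Paris") gives 'Y' while is_same("paris", "Paris") gives 'N', which is the behaviour the question asks for.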
SELECT *
FROM County
WHERE LOWER(Name) LIKE "%u%";
I'm trying to return only rows where the County name contains a lowercase "u" somewhere. For some reason, the query above returns several rows where Name contains only an uppercase "U", which is not what I want. I don't understand why.
Thanks in advance!
Try:
SELECT *
FROM County
WHERE
BINARY name like '%u%' ;
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=44048a2d080036ce9905340d6ebbf3e3
CREATE TABLE County (
Name varchar(30 )
);
insert into County values
('Test1'),
('Test2'),
('Tust3'),
('TeAt4'),
('TeAt5'),
('TUst6'),
('Tust7');
Result:
Name
Tust3
Tust7
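Why the original query also matched uppercase names can be sketched in Python with the demo rows above. This is an analogy, assuming Python's substring test behaves like a byte-wise LIKE '%u%'; the variable names are made up for illustration.

```python
names = ["Test1", "Test2", "Tust3", "TeAt4", "TeAt5", "TUst6", "Tust7"]

# LOWER(Name) LIKE '%u%': lowering first turns every "U" into "u",
# so names that contain only an uppercase "U" match as well.
lowered_match = [n for n in names if "u" in n.lower()]

# BINARY Name LIKE '%u%': the stored characters are compared as-is.
binary_match = [n for n in names if "u" in n]

print(lowered_match)  # ['Tust3', 'TUst6', 'Tust7']
print(binary_match)   # ['Tust3', 'Tust7']
```

The second list agrees with the result of the BINARY query shown in the demo.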
mysql> show variables like '%character%';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| character_set_client | cp850 |
| character_set_connection | cp850 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | cp850 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)
Because my character_set_client is cp850, I can use the case-sensitive collation latin1_general_cs. More information on collations can be found in the documentation.
SELECT *
FROM County
WHERE Name COLLATE latin1_general_cs LIKE "%u%"
The above query should find all records containing a lowercase u.
The collation latin1_bin also works (as given in the other answers):
SELECT *
FROM County
WHERE Name COLLATE latin1_bin LIKE "%u%"
I have a table with a column, which has cp1251_general_ci collation. I don't want to change column collation, but I want to get data in utf8 encoding.
Is there a way to select the data so that it comes back as if the column had the utf8_general_ci collation?
I.e. I need something like this
SELECT CONVERT_TO_UTF8(weirdColumn) FROM weirdTable
Here's a demo table using the cp1251 encoding. I'll insert some Cyrillic characters into it.
mysql> CREATE TABLE weirdTable (weirdColumn text) ENGINE=InnoDB DEFAULT CHARSET=cp1251;
mysql> insert into weirdTable values ('ЂЃЉЌ');
mysql> select * from weirdTable;
+-------------+
| weirdColumn |
+-------------+
| ЂЃЉЌ |
+-------------+
Use MySQL's CONVERT() function to force the characters to a different encoding:
mysql> select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
+-----------------+
| weirdColumnUtf8 |
+-----------------+
| ЂЃЉЌ |
+-----------------+
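What the conversion does to the stored bytes can be illustrated outside of MySQL. The following Python sketch is an analogy only: it assumes Python's cp1251 and utf-8 codecs correspond to MySQL's cp1251 and utf8 character sets for these characters, and the variable names are made up.

```python
text = "ЂЃЉЌ"

# How a cp1251 column would store the string: one byte per character.
cp1251_bytes = text.encode("cp1251")

# Rough equivalent of CONVERT(weirdColumn USING utf8):
# decode from the source encoding, then re-encode as UTF-8.
utf8_bytes = cp1251_bytes.decode("cp1251").encode("utf-8")

print(len(cp1251_bytes))                    # 4
print(len(utf8_bytes))                      # 8
print(utf8_bytes.decode("utf-8") == text)   # True
```

The characters are unchanged; only their byte representation differs, which is why the two SELECT outputs above look identical.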
Here's proof that the result has been converted to utf8. I create a table using metadata from the query result:
mysql> create table w2
as select convert(weirdColumn using utf8) as weirdColumnUtf8 from weirdTable;
Query OK, 1 row affected (0.07 sec)
Records: 1 Duplicates: 0 Warnings: 0
mysql> show create table w2\G
*************************** 1. row ***************************
Table: w2
Create Table: CREATE TABLE `w2` (
`weirdColumnUtf8` longtext CHARACTER SET utf8
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
1 row in set (0.00 sec)
mysql> select * from w2;
+-----------------+
| weirdColumnUtf8 |
+-----------------+
| ЂЃЉЌ |
+-----------------+
On my MySQL instance, utf8mb4 is the default character encoding. That's okay; it's a superset of utf8, and the utf8 encoding is enough to store these characters. In general, though, if you are going to use utf8, there's no reason not to use utf8mb4 instead.
If you change the character encoding, you cannot keep the cp1251 collation. Collations are specific to encodings. But you can use one of the collations associated with utf8 or utf8mb4. You can see the available collations for a given character encoding:
mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------------+---------+-----+---------+----------+---------+---------------+
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+--------------------------+---------+-----+---------+----------+---------+---------------+
...
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | PAD SPACE |
| utf8_general_mysql500_ci | utf8 | 223 | | Yes | 1 | PAD SPACE |
...
We have a MySQL InnoDB table, with a text field COLLATE utf8mb4_unicode_ci. I need to search for rows that contain any emoji characters. I've searched through quite a few SO questions, but people seem to have a list of emojis they are searching for. I'm actually looking for a solution that will find ANY emoji.
Here are some posts that are not helping.
This one seems to come closest to actually providing me with what I'm looking for, but the OP hasn't actually posted his search code.
Thanks!
I had a situation where migrating a database from one server to another caused emoji to disappear, so I had to find all rows in the original table that contained 4-byte UTF-8 (emoji) characters.
This query worked as expected:
SELECT field FROM `table` WHERE HEX(field) RLIKE "^(..)*F.";
Before doing anything, check that you are using utf8mb4 on your database, tables, AND connection:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
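The HEX(field) RLIKE "^(..)*F." trick can be reproduced in Python to see why it works. This is a sketch under the assumption that the text is UTF-8 encoded; has_four_byte_char is a hypothetical helper name. In UTF-8, only the lead byte of a 4-byte sequence has a hex value starting with F, and the (..)* part keeps the match aligned on byte boundaries so that the second digit of a byte such as 6F does not count.

```python
import re

# Same pattern as in the SQL query: any number of whole bytes (two hex
# digits each), followed by a byte whose first hex digit is F.
FOUR_BYTE_LEAD = re.compile(r"^(..)*F.")


def has_four_byte_char(text: str) -> bool:
    hex_string = text.encode("utf-8").hex().upper()  # like HEX(field)
    return FOUR_BYTE_LEAD.match(hex_string) is not None


print(has_four_byte_char("hello 😀"))  # True
print(has_four_byte_char("hello"))     # False ("o" is 6F, but F is not byte-aligned)
print(has_four_byte_char("ЂЃЉЌ"))      # False (2-byte characters)
```

Note that this finds any character outside the Basic Multilingual Plane, not only emoji, and that it misses symbols encoded in three bytes such as U+263A.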
Could this work, maybe?
Use the lib_mysqludf_preg library from the MySQL UDF repository to run PCRE regular expressions directly in MySQL:
[\x{23}\x{2A}\x{30}-\x{39}\x{A9}\x{AE}\x{203C}\x{2049}\x{2122}\x{2139}\x{2194}-\x{2199}\x{21A9}-\x{21AA}\x{231A}-\x{231B}\x{2328}\x{23CF}\x{23E9}-\x{23F3}\x{23F8}-\x{23FA}\x{24C2}\x{25AA}-\x{25AB}\x{25B6}\x{25C0}\x{25FB}-\x{25FE}\x{2600}-\x{2604}\x{260E}\x{2611}\x{2614}-\x{2615}\x{2618}\x{261D}\x{2620}\x{2622}-\x{2623}\x{2626}\x{262A}\x{262E}-\x{262F}\x{2638}-\x{263A}\x{2640}\x{2642}\x{2648}-\x{2653}\x{2660}\x{2663}\x{2665}-\x{2666}\x{2668}\x{267B}\x{267F}\x{2692}-\x{2697}\x{2699}\x{269B}-\x{269C}\x{26A0}-\x{26A1}\x{26AA}-\x{26AB}\x{26B0}-\x{26B1}\x{26BD}-\x{26BE}\x{26C4}-\x{26C5}\x{26C8}\x{26CE}-\x{26CF}\x{26D1}\x{26D3}-\x{26D4}\x{26E9}-\x{26EA}\x{26F0}-\x{26F5}\x{26F7}-\x{26FA}\x{26FD}\x{2702}\x{2705}\x{2708}-\x{270D}\x{270F}\x{2712}\x{2714}\x{2716}\x{271D}\x{2721}\x{2728}\x{2733}-\x{2734}\x{2744}\x{2747}\x{274C}\x{274E}\x{2753}-\x{2755}\x{2757}\x{2763}-\x{2764}\x{2795}-\x{2797}\x{27A1}\x{27B0}\x{27BF}\x{2934}-\x{2935}\x{2B05}-\x{2B07}\x{2B1B}-\x{2B1C}\x{2B50}\x{2B55}\x{3030}\x{303D}\x{3297}\x{3299}\x{1F004}\x{1F0CF}\x{1F170}-\x{1F171}\x{1F17E}-\x{1F17F}\x{1F18E}\x{1F191}-\x{1F19A}\x{1F1E6}-\x{1F1FF}\x{1F201}-\x{1F202}\x{1F21A}\x{1F22F}\x{1F232}-\x{1F23A}\x{1F250}-\x{1F251}\x{1F300}-\x{1F321}\x{1F324}-\x{1F393}\x{1F396}-\x{1F397}\x{1F399}-\x{1F39B}\x{1F39E}-\x{1F3F0}\x{1F3F3}-\x{1F3F5}\x{1F3F7}-\x{1F4FD}\x{1F4FF}-\x{1F53D}\x{1F549}-\x{1F54E}\x{1F550}-\x{1F567}\x{1F56F}-\x{1F570}\x{1F573}-\x{1F57A}\x{1F587}\x{1F58A}-\x{1F58D}\x{1F590}\x{1F595}-\x{1F596}\x{1F5A4}-\x{1F5A5}\x{1F5A8}\x{1F5B1}-\x{1F5B2}\x{1F5BC}\x{1F5C2}-\x{1F5C4}\x{1F5D1}-\x{1F5D3}\x{1F5DC}-\x{1F5DE}\x{1F5E1}\x{1F5E3}\x{1F5E8}\x{1F5EF}\x{1F5F3}\x{1F5FA}-\x{1F64F}\x{1F680}-\x{1F6C5}\x{1F6CB}-\x{1F6D2}\x{1F6E0}-\x{1F6E5}\x{1F6E9}\x{1F6EB}-\x{1F6EC}\x{1F6F0}\x{1F6F3}-\x{1F6F6}\x{1F910}-\x{1F91E}\x{1F920}-\x{1F927}\x{1F930}\x{1F933}-\x{1F93A}\x{1F93C}-\x{1F93E}\x{1F940}-\x{1F945}\x{1F947}-\x{1F94B}\x{1F950}-\x{1F95E}\x{1F980}-\x{1F991}\x{1F9C0}]
In my opinion, the easy way is to create a table with all the emoji codes and then join it to your table with a LIKE condition.
Here is how to insert emojis in MySQL:
create table emojis (
e varchar(100) COLLATE utf8mb4_unicode_ci
);
insert into emojis values
( _utf8mb4 0xF09F9881 COLLATE utf8mb4_unicode_ci),
( _utf8mb4 '😂' );
The final query should look like:
select distinct yt.id
from your_table yt
inner join emojis e
on yt.some_column like concat('%', e.e, '%')
Why does the column name change to a question mark after executing set names utf8mb4? See below:
mysql> show variables like 'character%' ;
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
mysql> select '\U+1F600';
+------+
| 😀 |
+------+
| 😀 |
+------+
mysql> set names utf8mb4;
mysql> select '\U+1F600';
+------+
| ? |
+------+
| 😀 |
+------+
As I understand it, utf8mb4 is designed to support these emoji characters. Why does the column name change to a question mark after switching to utf8mb4?
In addition, I copied the emoji character from a website (http://getemoji.com/) and pasted it into the terminal. If I instead type '\U+1F600' manually, I get the following:
mysql> select '\U+1F600' ;
+---------+
| U+1F600 |
+---------+
| U+1F600 |
+---------+
So I guess something happens implicitly when I paste it into the terminal, and this implicit conversion (😀 --> '\U+1F600') may explain the phenomenon.
This appears to be expected behaviour according to the MySQL documentation, which states that metadata is stored in utf8 (the 3-byte version, not utf8mb4).
It is returned to the client in character_set_results (utf8mb4); however, your virtual column name is most likely stored as utf8 so that it is compatible and comparable with all other metadata, and thus the 4-byte character is lost even though it does not belong to a real table.
See here:
https://dev.mysql.com/doc/refman/5.6/en/charset-metadata.html
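The explanation above can be illustrated with a small Python sketch. It is not MySQL's actual code path, only a model of the claim: a character set limited to three bytes per character cannot hold code points above U+FFFF, so such characters are replaced. The function name to_three_byte_utf8 is hypothetical.

```python
def to_three_byte_utf8(text: str, replacement: str = "?") -> str:
    # Code points above U+FFFF need four bytes in UTF-8, which a
    # 3-byte character set cannot represent; substitute them.
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


print(len("😀".encode("utf-8")))        # 4
print(to_three_byte_utf8("😀"))         # ?
print(to_three_byte_utf8("column_a"))   # column_a
```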
I found more info by using Wireshark. See below:
Before executing set names utf8mb4
After executing set names utf8mb4
In this case the server can't find a charset number, so the column name becomes a question mark. It seems that it does not matter which charset number is used, as long as it is not Unknown. If I execute set names latin1, the response packet info is:
These queries both give the result I expect:
SELECT sex
FROM ponies
ORDER BY sex COLLATE latin1_swedish_ci ASC
SELECT sex
FROM ponies
ORDER BY CONVERT(sex USING utf8) COLLATE utf8_general_ci ASC
| f |
| f |
| m |
| m |
+---+
But this query gives a different result:
SELECT sex FROM ponies ORDER BY sex ASC
| m |
| m |
| f |
| f |
+---+
Here's the configuration:
SHOW VARIABLES LIKE 'collation\_%'
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
The table collation is latin1_swedish_ci.
MySQL server is 5.5.16.
Table Collations
Collation defaults are stored on a table-by-table basis. There is a server-set default, but that is applied to the table at the time it is created.
To find the collation for a specific table, run this query:
SHOW TABLE STATUS LIKE 'ponies'\G
You should see output like this:
*************************** 1. row ***************************
Name: ponies
Engine: MyISAM
Version: 10
Row_format: Fixed
Rows: 8
Avg_row_length: 20
Data_length: 160
Max_data_length: 5629499534213119
Index_length: 1024
Data_free: 0
Auto_increment: NULL
Create_time: 2012-02-27 10:16:25
Update_time: 2012-02-27 10:17:40
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)
And you can see the Collation setting in that result.
Column collations
You can also override collation settings on particular columns within a table. A create table statement like this would create a latin1_swedish_ci table, with a utf8_polish_ci column:
CREATE TABLE ponies (
sex CHAR(1) COLLATE utf8_polish_ci
) CHARACTER SET latin1 COLLATE latin1_swedish_ci;
The best way to view the results of this is like this:
SHOW FULL COLUMNS FROM ponies;
Output:
+-------+---------+----------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+-------+---------+----------------+------+-----+---------+-------+---------------------------------+---------+
| sex | char(1) | utf8_polish_ci | YES | | NULL | | select,insert,update,references | |
+-------+---------+----------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.00 sec)
The documentation says it uses a case insensitive character comparison by default. I don't see why you are not getting that result though.
The documentation also suggests using the binary qualifier for case sensitive comparison. I wonder if that would affect your result?:
SELECT sex FROM ponies ORDER BY BINARY sex ASC
This behaviour can be observed when sex is an ENUM, in which case it is sorted by the numerical position in the ENUM definition by default. Only when a collation is explicitly given is it sorted in alphabetical order.
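The two sort orders can be modelled in a few lines of Python. This is a sketch that assumes the column was declared as ENUM('m', 'f'), which the question does not show but which would be consistent with the output; the variable names are made up.

```python
enum_definition = ["m", "f"]   # assumed declaration order: ENUM('m', 'f')
rows = ["f", "m", "f", "m"]

# ORDER BY sex: an ENUM sorts by its position in the definition.
by_enum_index = sorted(rows, key=enum_definition.index)

# ORDER BY sex COLLATE ...: the values are compared as strings.
by_collation = sorted(rows)

print(by_enum_index)  # ['m', 'm', 'f', 'f']
print(by_collation)   # ['f', 'f', 'm', 'm']
```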