How would I search for text that contains emojis? - mysql

We have a MySQL InnoDB table, with a text field COLLATE utf8mb4_unicode_ci. I need to search for rows that contain any emoji characters. I've searched through quite a few SO questions, but people seem to have a list of emojis they are searching for. I'm actually looking for a solution that will find ANY emoji.
Here are some posts that are not helping.
This one seems to come closest to actually providing me with what I'm looking for, but the OP hasn't actually posted his search code.
Thanks!

I've had situation where db migration from one server to another caused emoji to disappear. So I had to find all rows in original table which contained high utf8 (emoji) characters.
This query worked as expected:
SELECT field FROM `table` WHERE HEX(field) RLIKE "^(..)*F.";
before doing anything check if you are using utf8mb4 on your db, tables AND connection:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+

Could this work maybe ?
Use the lib_mysqludf_preg library from the mysql UDF repository for PCRE regular expressions directly in mysql
[\x{23}\x{2A}\x{30}-\x{39}\x{A9}\x{AE}\x{203C}\x{2049}\x{2122}\x{2139}\x{2194}-\x{2199}\x{21A9}-\x{21AA}\x{231A}-\x{231B}\x{2328}\x{23CF}\x{23E9}-\x{23F3}\x{23F8}-\x{23FA}\x{24C2}\x{25AA}-\x{25AB}\x{25B6}\x{25C0}\x{25FB}-\x{25FE}\x{2600}-\x{2604}\x{260E}\x{2611}\x{2614}-\x{2615}\x{2618}\x{261D}\x{2620}\x{2622}-\x{2623}\x{2626}\x{262A}\x{262E}-\x{262F}\x{2638}-\x{263A}\x{2640}\x{2642}\x{2648}-\x{2653}\x{2660}\x{2663}\x{2665}-\x{2666}\x{2668}\x{267B}\x{267F}\x{2692}-\x{2697}\x{2699}\x{269B}-\x{269C}\x{26A0}-\x{26A1}\x{26AA}-\x{26AB}\x{26B0}-\x{26B1}\x{26BD}-\x{26BE}\x{26C4}-\x{26C5}\x{26C8}\x{26CE}-\x{26CF}\x{26D1}\x{26D3}-\x{26D4}\x{26E9}-\x{26EA}\x{26F0}-\x{26F5}\x{26F7}-\x{26FA}\x{26FD}\x{2702}\x{2705}\x{2708}-\x{270D}\x{270F}\x{2712}\x{2714}\x{2716}\x{271D}\x{2721}\x{2728}\x{2733}-\x{2734}\x{2744}\x{2747}\x{274C}\x{274E}\x{2753}-\x{2755}\x{2757}\x{2763}-\x{2764}\x{2795}-\x{2797}\x{27A1}\x{27B0}\x{27BF}\x{2934}-\x{2935}\x{2B05}-\x{2B07}\x{2B1B}-\x{2B1C}\x{2B50}\x{2B55}\x{3030}\x{303D}\x{3297}\x{3299}\x{1F004}\x{1F0CF}\x{1F170}-\x{1F171}\x{1F17E}-\x{1F17F}\x{1F18E}\x{1F191}-\x{1F19A}\x{1F1E6}-\x{1F1FF}\x{1F201}-\x{1F202}\x{1F21A}\x{1F22F}\x{1F232}-\x{1F23A}\x{1F250}-\x{1F251}\x{1F300}-\x{1F321}\x{1F324}-\x{1F393}\x{1F396}-\x{1F397}\x{1F399}-\x{1F39B}\x{1F39E}-\x{1F3F0}\x{1F3F3}-\x{1F3F5}\x{1F3F7}-\x{1F4FD}\x{1F4FF}-\x{1F53D}\x{1F549}-\x{1F54E}\x{1F550}-\x{1F567}\x{1F56F}-\x{1F570}\x{1F573}-\x{1F57A}\x{1F587}\x{1F58A}-\x{1F58D}\x{1F590}\x{1F595}-\x{1F596}\x{1F5A4}-\x{1F5A5}\x{1F5A8}\x{1F5B1}-\x{1F5B2}\x{1F5BC}\x{1F5C2}-\x{1F5C4}\x{1F5D1}-\x{1F5D3}\x{1F5DC}-\x{1F5DE}\x{1F5E1}\x{1F5E3}\x{1F5E8}\x{1F5EF}\x{1F5F3}\x{1F5FA}-\x{1F64F}\x{1F680}-\x{1F6C5}\x{1F6CB}-\x{1F6D2}\x{1F6E0}-\x{1F6E5}\x{1F6E9}\x{1F6EB}-\x{1F6EC}\x{1F6F0}\x{1F6F3}-\x{1F6F6}\x{1F910}-\x{1F91E}\x{1F920}-\x{1F927}\x{1F930}\x{1F933}-\x{1F93A}\x{1F93C}-\x{1F93E}\x{1F940}-\x{1F945}\x{1F947}-\x{1F94B}\x{1F950}-\x{1F95E}\x{1F980}-\x{1F991}\x{1F9C0}]

In my opinion the easy way is to create a table with all emoji codes and then make a join through like condition to your table.
I share here how to insert emotis on mysql:
create table emojis (
e varchar(100) COLLATE utf8mb4_unicode_ci
);
insert into emojis values
( _utf8mb4 0xF09F9881 COLLATE utf8mb4_unicode_ci),
( _utf8mb4 '😂' );
The final query should look like:
select distinct yt.id
from your_table yt
inner join emojis e
on yt.some_column like '%' + e.e + '%'

Related

How do I make my query only return names with a specific lower-case letter?

SELECT *
FROM County
WHERE LOWER(Name) LIKE "%u%";
Im trying to return only rows where County names contain a lower case "u" somewhere in its name. For some reason with the query above I return several rows where Name only contain an upper case "U" -- which is not what I want. I dont understand...
Thanks in advance!
Try :
SELECT *
FROM County
WHERE
BINARY name like '%u%' ;
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=44048a2d080036ce9905340d6ebbf3e3
CREATE TABLE County (
Name varchar(30 )
);
insert into County values
('Test1'),
('Test2'),
('Tust3'),
('TeAt4'),
('TeAt5'),
('TUst6'),
('Tust7');
Result:
Name
Tust3
Tust7
mysql> show variables like '%character%';
+--------------------------+---------------------------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------------------------+
| character_set_client | cp850 |
| character_set_connection | cp850 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | cp850 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.00 sec)
Because my character_set_client is cp850, i can use the matching collating sequence latin1_general_cs. More info on this collating sequences if found in the documentation
SELECT *
FROM County
WHERE Name COLLATE latin1_general_cs LIKE "%u%"
Above query should find all records with a small letter u.
The collating sequence latin1_bin also works (as given in the other answers):
SELECT *
FROM County
WHERE Name COLLATE latin1_bin LIKE "%u%"

MySQL 5.6 create view with unicode character set

MySQL 5.6. I can't get a string constant within a view to populate correctly against a database with default UCS2 character set. Works fine on 5.7.
I've created a minimally reproducible example, below.
DROP SCHEMA IF EXISTS test3;
CREATE SCHEMA test3 CHARACTER SET ucs2;
CONNECT test3;
CREATE TABLE testtable (
testname VARCHAR(15)
);
INSERT INTO testTable( testname ) VALUES ('foo');
INSERT INTO testTable( testname ) VALUES ('bar');
CREATE OR REPLACE VIEW testview AS
SELECT * FROM testtable
WHERE testname = 'foo';
SELECT * FROM testview;
^^^ This select statement returns no results.
MySQL [test3]> show create view testview \G
*************************** 1. row ***************************
View: testview
Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`#`localhost`
SQL SECURITY DEFINER VIEW `testview` AS select `testtable`.`testname` AS
`testname` from `testtable` where (`testtable`.`testname` = '\0\0\0f\0\0\0o\0\0\0o')
character_set_client: utf8
collation_connection: utf8_general_ci
What is that, utf32??
The following does work, but I don't want to write the collation directly into the statement, as this needs to be portable code and the syntax looks non-standard:
CREATE OR REPLACE VIEW testview AS
SELECT * FROM testtable
WHERE testname = 'foo' COLLATE utf8_general_ci;
I have tried setting the client, connection, and server character sets to ucs2 and utf16 but this changed nothing. Likewise with the collations to *_general_ci.
Any ideas?
Edit:
MySQL [test3]> show variables like "char%";
+--------------------------+------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | ucs2 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MySQL\mysql-5.6.36-winx64\share\charsets\ |
+--------------------------+------------------------------------------------------------+
There is essentially no reason to ever use usc2 or utf16 or utf32 in MySQL tables. Use utf8mb4 only. (Or utf8 if you have an old version of MySQL.)
Please provide SHOW VARIABLES LIKE "char%"; Certain things should not be changed:
mysql> SHOW VARIABLES LIKE "char%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary | <--
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 | <--
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
When you created the view, you did not set the charset. I can see that from your SHOW when it said:
character_set_client: utf8

Does mysql latin1 also support emoji character?

Now because below phenomenon I feel I totally do not understand character set. At first I think only utf8mb4 support Emoji character e.g. 😀.
See below:
As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters
But accidentally I found this phenomenon,see below:
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
mysql> show create table t4\G
*************************** 1. row ***************************
Table: t4
Create Table: CREATE TABLE `t4` (
`data` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> insert into t4 select '\U+1F600';
mysql> select * from t4;
+------+
| data |
+------+
| 😀 |
+------+
Now I'm very confused, it seems latin1 also could support emoji character. I know it must be an illusion, but I don't know how to clear it?
You cannot store anything other than iso-8859-1 characters into an latin1 field without converting it to e.g. base64
It might work, but will fail later at some point. In special having multibyte characters like emoticons.

MySQL UTF8 Issue

Okay, I have tried to import "CSV" file into MySQL for the past 24 hours but have failed miserably.
I have set name, set char and there is nothing left that I have not set to UTF8 but it still is not working. Not just for the DB and Tables, but for the server as well, still no use.
I am importing directly into MySQL so it is not PHP issue. I will be grateful if anyone can highlight where am I going wrong.
mysql> SHOW CREATE DATABASE `dict_2`;
+----------+--------------------------------------------------------------------
---------------------+
| Database | Create Database
|
+----------+--------------------------------------------------------------------
---------------------+
| dict_2 | CREATE DATABASE `dict_2` /*!40100 DEFAULT CHARACTER SET utf8 COLLAT
E utf8_unicode_ci */ |
+----------+--------------------------------------------------------------------
---------------------+
1 row in set (0.00 sec)
mysql> show variables like "%character%"; show variables like "%collation%";
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
In its current form, this question is impossible to answer.
We're left guessing...
That you're using a MySQL LOAD DATA statement.
You've verified that the characterset encoding of the .csv file is not ucs2.
You've verified that the characterset encoding of the .csv file is utf8 (i.e. matches the character_set_database system variable), of that you've specified the appropriate characterset in the CHARACTER SET clause of the LOAD DATA statement.
Beyond that, there's a whole slew of other things that might be wrong, but we're still just guessing.
Very frequently when something MySQL "fail miserably", there's some sort of indication, like an error message, or some other behavior that we can observe and describe.
In the question, the description of the failure mode is beyond vague, it's entirely non-existent.

phpMyAdmin mysql saved Emoji as Question mark

I want to save Emoji into MySql database, and I realize, three bytes Emoji is saved correctly in the database, but 4 byte emoji have been saved as question marks. It seems like I did fully convert utf8 to utf8mb4, but I dont know what exactly is missing here. My MySQL version is 5.5.29, when I do a SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; in MySql shell, it shows the following:
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
Now, for testing purpose, I only have 1 database with 1 table created to test emoji saving. I created the database through phpMyAdmin, and created the table through MySql shell:
CREATE TABLE `test_emojis` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
and it still does not work (still question marks).
However, I found something interesting, I see question marks in phpMyAdmin, but I can see emoji icon properly in Mysql shell if I type select * from test_emoji; any ideas?
Can someone help please?
Thanks
phpMyAdmin has hardcoded utf8 charset so you would have to edit it's code to change this. For future versions it's fixed in fb30c14 (this shows you also where to change these values).
Upgrade your phpMyAdmin to >= 4.3.9 and problem solved.