I want to save emoji to a MySQL database. Three-byte emoji are saved correctly, but four-byte emoji are saved as question marks. As far as I can tell I have fully converted from utf8 to utf8mb4, so I don't know what is still missing. My MySQL version is 5.5.29. When I run SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%'; in the MySQL shell, it shows the following:
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
Now, for testing purposes, I have only one database with one table, created to test emoji saving. I created the database through phpMyAdmin and the table through the MySQL shell:
CREATE TABLE `test_emojis` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`content` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
It still does not work (still question marks).
However, I found something interesting: I see question marks in phpMyAdmin, but the emoji display properly in the MySQL shell if I type select * from test_emojis; Any ideas?
Can someone help please?
Thanks
phpMyAdmin has a hardcoded utf8 charset, so you would have to edit its code to change this. This is fixed for future versions in commit fb30c14 (which also shows where to change these values).
Upgrading phpMyAdmin to >= 4.3.9 solves the problem.
I have a database in which I require that only two fields in one table allow for 4 byte emojis to be stored. I did this (obviously with the correct table and column names):
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
I know it worked, because when I do show create table chatbots_proposalarea; it shows me this:
CREATE TABLE `chatbots_proposalarea` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(50) COLLATE utf8mb4_bin NOT NULL,
`proposal` varchar(1500) COLLATE utf8mb4_bin DEFAULT NULL,
`candidate_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `chatbots_proposalare_candidate_id_6465160e_fk_chatbots_` (`candidate_id`),
CONSTRAINT `chatbots_proposalare_candidate_id_6465160e_fk_chatbots_` FOREIGN KEY (`candidate_id`) REFERENCES `chatbots_candidate` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
The fields "name" and "proposal" do seem to have the utf8mb4_bin collation, and the commands ran without errors. However, when I try to save a value there such as "Seguridad 🍞", I get the error
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x8D\x9E' for column 'name' at row 1
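For readers wondering what those bytes are: a small Python sketch (Python is not part of the original setup, it is only used here to inspect the bytes) suggests that the rejected value is simply the UTF-8 encoding of the emoji itself, and that it needs four bytes. That is the case a 3-byte utf8 path cannot carry.

```python
# The byte sequence quoted in ERROR 1366, copied from the message above.
rejected = b"\xF0\x9F\x8D\x9E"

# Decoding it as UTF-8 yields a single character whose code point lies
# above U+FFFF, i.e. outside the Basic Multilingual Plane.
char = rejected.decode("utf-8")
code_point = ord(char)


def utf8_length(ch: str) -> int:
    """Return how many bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


print(f"U+{code_point:X} needs {utf8_length(char)} bytes")  # U+1F35E needs 4 bytes
```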
Any help in discovering what I am missing would be much appreciated.
NOTES
This is on a Django project, and mounted on an Ubuntu server, and the SQL version is mysql Ver 14.14 Distrib 5.7.28, for Linux (x86_64)
I don't see why that would make a difference, but the same happens when I update directly to the database by doing
UPDATE chatbots_proposalarea SET name='Seguridad 🍞' where id=1;
A solution was recommended elsewhere, but in that case triggers were the cause of the issue, which does not apply here.
UPDATE
If it shows any important information, when I run show variables where Variable_name like 'character\_set\_%' or Variable_name like 'collation%';
I get the following:
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8mb4_general_ci |
+--------------------------+--------------------+
I changed these values to the same that a coworker has in one of his projects, in which he also requires 4 byte emojis to be stored.
The table is not at fault; the connection is. In the Django settings, set the connection charset, something like:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        ...
        'OPTIONS': {
            'charset': 'utf8mb4',
            'use_unicode': True,
        },
    },
}
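Keep in mind that SHOW VARIABLES run in the mysql shell reports the settings of that shell's own session, not of the connection Django opens. A rough sketch of how one might check the application's session instead is below. The helper names session_charsets and session_uses_utf8mb4 are hypothetical; they only assume a DB-API style cursor with execute() and fetchall().

```python
# Hypothetical helpers: they accept any DB-API cursor, for example the one
# returned by django.db.connection.cursor() inside the running project.
SESSION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def session_charsets(cursor):
    """Return the per-session charset variables as a {name: value} dict."""
    cursor.execute(
        "SHOW VARIABLES WHERE Variable_name IN "
        "('character_set_client', 'character_set_connection', "
        "'character_set_results')"
    )
    return {name: value for name, value in cursor.fetchall()}


def session_uses_utf8mb4(cursor):
    """True only if all three session variables report utf8mb4."""
    charsets = session_charsets(cursor)
    return all(charsets.get(name) == "utf8mb4" for name in SESSION_VARS)
```

From python manage.py shell this could be called as: from django.db import connection, then with connection.cursor() as cur: print(session_charsets(cur)). If any of the three values is still utf8, the 'charset' option is probably not reaching the driver.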
We have a MySQL InnoDB table, with a text field COLLATE utf8mb4_unicode_ci. I need to search for rows that contain any emoji characters. I've searched through quite a few SO questions, but people seem to have a list of emojis they are searching for. I'm actually looking for a solution that will find ANY emoji.
Here are some posts that are not helping.
This one seems to come closest to actually providing me with what I'm looking for, but the OP hasn't actually posted his search code.
Thanks!
I had a situation where a database migration from one server to another caused emoji to disappear, so I had to find all rows in the original table that contained high UTF-8 (emoji) characters.
This query worked as expected:
SELECT field FROM `table` WHERE HEX(field) RLIKE "^(..)*F.";
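To see why the pattern works: in valid UTF-8, a byte whose high nibble is F can only be the lead byte of a four-byte sequence, and the ^(..)* prefix keeps the match aligned to whole bytes in the hex string. The Python sketch below mirrors the same check outside the database (the function names are made up for illustration). Note that this finds four-byte characters, which is not exactly the same set as emoji: some emoji such as U+2600 take only three bytes, and not every four-byte character is an emoji.

```python
import re

# Same pattern as the RLIKE above: skip whole bytes (two hex digits each),
# then require a byte whose first hex digit is F.
ALIGNED_F_BYTE = re.compile(r"^(..)*F.")


def hex_utf8(text: str) -> str:
    """Uppercase hex of the UTF-8 bytes, comparable to MySQL's HEX()."""
    return text.encode("utf-8").hex().upper()


def has_four_byte_char(text: str) -> bool:
    """True if the UTF-8 encoding contains a lead byte F0..F4."""
    return ALIGNED_F_BYTE.search(hex_utf8(text)) is not None


def has_supplementary_char(text: str) -> bool:
    """Equivalent check on code points instead of bytes."""
    return any(ord(ch) > 0xFFFF for ch in text)
```

The byte alignment matters: the hex of "测试" is E6B58BE8AF95, which contains the digits F9 straddling two bytes, so an unanchored search for F. would report a false positive.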
Before doing anything, check that you are using utf8mb4 on your database, tables AND connection:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
Could this work, maybe?
Use the lib_mysqludf_preg library from the MySQL UDF repository to run PCRE regular expressions directly in MySQL:
[\x{23}\x{2A}\x{30}-\x{39}\x{A9}\x{AE}\x{203C}\x{2049}\x{2122}\x{2139}\x{2194}-\x{2199}\x{21A9}-\x{21AA}\x{231A}-\x{231B}\x{2328}\x{23CF}\x{23E9}-\x{23F3}\x{23F8}-\x{23FA}\x{24C2}\x{25AA}-\x{25AB}\x{25B6}\x{25C0}\x{25FB}-\x{25FE}\x{2600}-\x{2604}\x{260E}\x{2611}\x{2614}-\x{2615}\x{2618}\x{261D}\x{2620}\x{2622}-\x{2623}\x{2626}\x{262A}\x{262E}-\x{262F}\x{2638}-\x{263A}\x{2640}\x{2642}\x{2648}-\x{2653}\x{2660}\x{2663}\x{2665}-\x{2666}\x{2668}\x{267B}\x{267F}\x{2692}-\x{2697}\x{2699}\x{269B}-\x{269C}\x{26A0}-\x{26A1}\x{26AA}-\x{26AB}\x{26B0}-\x{26B1}\x{26BD}-\x{26BE}\x{26C4}-\x{26C5}\x{26C8}\x{26CE}-\x{26CF}\x{26D1}\x{26D3}-\x{26D4}\x{26E9}-\x{26EA}\x{26F0}-\x{26F5}\x{26F7}-\x{26FA}\x{26FD}\x{2702}\x{2705}\x{2708}-\x{270D}\x{270F}\x{2712}\x{2714}\x{2716}\x{271D}\x{2721}\x{2728}\x{2733}-\x{2734}\x{2744}\x{2747}\x{274C}\x{274E}\x{2753}-\x{2755}\x{2757}\x{2763}-\x{2764}\x{2795}-\x{2797}\x{27A1}\x{27B0}\x{27BF}\x{2934}-\x{2935}\x{2B05}-\x{2B07}\x{2B1B}-\x{2B1C}\x{2B50}\x{2B55}\x{3030}\x{303D}\x{3297}\x{3299}\x{1F004}\x{1F0CF}\x{1F170}-\x{1F171}\x{1F17E}-\x{1F17F}\x{1F18E}\x{1F191}-\x{1F19A}\x{1F1E6}-\x{1F1FF}\x{1F201}-\x{1F202}\x{1F21A}\x{1F22F}\x{1F232}-\x{1F23A}\x{1F250}-\x{1F251}\x{1F300}-\x{1F321}\x{1F324}-\x{1F393}\x{1F396}-\x{1F397}\x{1F399}-\x{1F39B}\x{1F39E}-\x{1F3F0}\x{1F3F3}-\x{1F3F5}\x{1F3F7}-\x{1F4FD}\x{1F4FF}-\x{1F53D}\x{1F549}-\x{1F54E}\x{1F550}-\x{1F567}\x{1F56F}-\x{1F570}\x{1F573}-\x{1F57A}\x{1F587}\x{1F58A}-\x{1F58D}\x{1F590}\x{1F595}-\x{1F596}\x{1F5A4}-\x{1F5A5}\x{1F5A8}\x{1F5B1}-\x{1F5B2}\x{1F5BC}\x{1F5C2}-\x{1F5C4}\x{1F5D1}-\x{1F5D3}\x{1F5DC}-\x{1F5DE}\x{1F5E1}\x{1F5E3}\x{1F5E8}\x{1F5EF}\x{1F5F3}\x{1F5FA}-\x{1F64F}\x{1F680}-\x{1F6C5}\x{1F6CB}-\x{1F6D2}\x{1F6E0}-\x{1F6E5}\x{1F6E9}\x{1F6EB}-\x{1F6EC}\x{1F6F0}\x{1F6F3}-\x{1F6F6}\x{1F910}-\x{1F91E}\x{1F920}-\x{1F927}\x{1F930}\x{1F933}-\x{1F93A}\x{1F93C}-\x{1F93E}\x{1F940}-\x{1F945}\x{1F947}-\x{1F94B}\x{1F950}-\x{1F95E}\x{1F980}-\x{1F991}\x{1F9C0}]
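To try the idea without installing a UDF, the same kind of character class can be tested in Python first. The sketch below deliberately uses only a handful of the ranges from the class above, so it is an abbreviated illustration, not the full list. It also leaves out \x{23}, \x{2A} and \x{30}-\x{39} (the characters #, * and the digits 0-9), since on their own those would match ordinary text.

```python
import re

# A small subset of the code point ranges listed above (assumption: enough
# for a demonstration, NOT a complete emoji definition).
EMOJI_SUBSET = re.compile(
    "["
    "\u2600-\u2604"
    "\U0001F300-\U0001F321"
    "\U0001F324-\U0001F393"
    "\U0001F5FA-\U0001F64F"
    "\U0001F680-\U0001F6C5"
    "]"
)


def contains_emoji(text: str) -> bool:
    """True if the text contains a character from the subset above."""
    return EMOJI_SUBSET.search(text) is not None
```

Unlike a check for four-byte sequences, a range list like this also catches three-byte symbols such as U+2600, at the cost of having to keep the list up to date.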
In my opinion, the easiest way is to create a table with all emoji codes and then join it to your table through a LIKE condition.
Here is how to insert emojis into MySQL:
create table emojis (
e varchar(100) COLLATE utf8mb4_unicode_ci
);
insert into emojis values
( _utf8mb4 0xF09F9881 COLLATE utf8mb4_unicode_ci),
( _utf8mb4 '😂' );
The final query should look like:
select distinct yt.id
from your_table yt
inner join emojis e
on yt.some_column like concat('%', e.e, '%')
I am having a problem inserting Unicode text (Chinese here) into MySQL. For example, I want to run:
insert into site_parameter(name) values("测试");
However, in the mysql terminal it becomes:
mysql> insert into site_parameter(name) values("");
Query OK, 1 row affected (0.00 sec)
I can't even type Chinese in the mysql terminal.
Here is my.conf:
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
I have checked the collation:
mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
mysql> show create table site_parameter;
| site_parameter | CREATE TABLE `site_parameter` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
`description` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=388 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
1 row in set (0.00 sec)
Could anyone help?
The console window (CMD) has a charset (code page) property.
You can check the current one, or set another, with the chcp command:
>chcp
(outputs the current code page)
>chcp 65001
(sets it to UTF-8)
More information about this command: Chcp.
As Devart said, the link below does provide the solution:
https://superuser.com/questions/55224/change-the-default-codepage-from-latin1-to-utf8-on-a-linux-machine
Here is the solution detail:
Edit /var/lib/locales/supported.d/local and add your locale to the list of supported locales if it isn't there already, eg:
en_US UTF-8
Regenerate the supported locales on your machine:
sudo dpkg-reconfigure locales
Open /etc/default/locale and check if LANG and LANGUAGE are changed:
LANG="en_US"
LANGUAGE="en_US:UTF-8"
If they are not, update them manually now.
Reboot.
Because of the behavior below, I feel I do not understand character sets at all. At first I thought only utf8mb4 supports emoji characters such as 😀.
See below:
As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character supports supplemental characters
But I accidentally found the following:
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
mysql> show create table t4\G
*************************** 1. row ***************************
Table: t4
Create Table: CREATE TABLE `t4` (
`data` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> insert into t4 select '\U+1F600';
mysql> select * from t4;
+------+
| data |
+------+
| 😀 |
+------+
Now I'm very confused: it seems latin1 can also store emoji characters. I know it must be an illusion, but I don't know how to dispel it.
You cannot store anything other than ISO-8859-1 characters in a latin1 field without first converting them, e.g. to base64.
It might appear to work, but it will fail at some point later, especially with multibyte characters such as emoticons.
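One plausible explanation of the illusion in the question: the client, connection and column are all latin1, so the four UTF-8 bytes typed in a UTF-8 terminal pass through unconverted and are stored as four separate latin1 characters. On the way back the terminal reinterprets those bytes as UTF-8 and draws one emoji. The sketch below replays that round trip in Python, using Python's latin-1 codec as a stand-in for MySQL's latin1 (which is actually cp1252-based), so treat it as an approximation of the mechanism.

```python
emoji = "\U0001F600"

# What a UTF-8 terminal actually sends for the emoji: four bytes.
utf8_bytes = emoji.encode("utf-8")

# What a latin1 column believes it received: four one-byte characters.
stored = utf8_bytes.decode("latin-1")

# On SELECT the bytes come back unchanged, and the UTF-8 terminal
# reassembles them into a single glyph.
shown = stored.encode("latin-1").decode("utf-8")

print(len(utf8_bytes), len(stored), shown == emoji)  # 4 4 True
```

The data only looks right as long as every client makes the same mistake. Character counts, sorting and any client that connects with utf8mb4 will see four unrelated characters instead of one emoji.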
Okay, I have been trying to import a CSV file into MySQL for the past 24 hours and have failed miserably.
I have run SET NAMES and SET CHARACTER SET, and there is nothing left that I have not set to UTF8, not just for the database and tables but for the server as well. It still does not work.
I am importing directly into MySQL, so it is not a PHP issue. I would be grateful if anyone could point out where I am going wrong.
mysql> SHOW CREATE DATABASE `dict_2`;
+----------+-----------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------------------------------+
| dict_2 | CREATE DATABASE `dict_2` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */ |
+----------+-----------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show variables like "%character%"; show variables like "%collation%";
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
In its current form, this question is impossible to answer.
We're left guessing...
That you're using a MySQL LOAD DATA statement.
That you've verified that the character set encoding of the .csv file is not ucs2.
That you've verified that the character set encoding of the .csv file is utf8 (i.e. matches the character_set_database system variable), or that you've specified the appropriate character set in the CHARACTER SET clause of the LOAD DATA statement.
Beyond that, there's a whole slew of other things that might be wrong, but we're still just guessing.
Very frequently, when something in MySQL "fails miserably", there's some sort of indication, like an error message, or some other behavior that we can observe and describe.
In the question, the description of the failure mode is beyond vague; it's entirely non-existent.
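One of those guesses, the encoding of the .csv file itself, can be checked before involving MySQL at all. Below is a minimal Python sketch, assuming the file is small enough to read into memory; sniff_encoding is a hypothetical helper name, and the check is a rough heuristic (byte order mark, then a trial decode), not a full encoding detector.

```python
import codecs


def sniff_encoding(raw: bytes) -> str:
    """Roughly classify raw file bytes by BOM and UTF-8 validity."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # A UTF-16 BOM is typical of files saved as "Unicode" on Windows,
        # which MySQL would need to treat as ucs2/utf16, not utf8.
        return "utf-16 (ucs2-like)"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"
```

For example, sniff_encoding(open("words.csv", "rb").read()), where words.csv is a placeholder file name. A result other than "utf-8" would suggest converting the file or naming its real encoding in the CHARACTER SET clause of LOAD DATA.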