Storing emojis in MySQL

I would like to store emojis in MySQL (version 5.7.18).
My table structure looks like this:
CREATE TABLE `message_message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`message` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
`created_at` datetime(6) NOT NULL,
`is_read` tinyint(1) NOT NULL,
`chat_id` int(11) NOT NULL,
PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
I am saving emojis only in the message field, and I can see that they get saved with question marks (?☺️???).
Is there a way for me to read these values directly from the table? (Actually, I would like to see the emojis in a table viewer.) I am using Sequel Pro for viewing the table (if that matters).
The exact MySQL query that I am running:
INSERT INTO message_message(message, created_at, msg_sender_id, chat_id, is_read) VALUES ('💁👍', UTC_TIME(), 110, 164, False)
If I run a SELECT query on this table, it looks like this:
+--------------+
| message      |
+--------------+
| 😁           |
| 😁💁👍       |
| 💁👍         |
| 💁👍         |
| 💁👍         |
| 💁👍         |
+--------------+
Does it look like the data is stored correctly?

Apparently, your data is stored correctly.
You provided the string F09F9281F09F918D as the result of SELECT HEX(message) for the data inserted with:
INSERT INTO message_message(message, created_at, msg_sender_id, chat_id, is_read) VALUES ('💁👍', UTC_TIME(), 110, 164, False)
And if you check the UTF-8 encoding of both emojis:
F0 9F 92 81 for 💁
F0 9F 91 8D for 👍
you will find that they exactly match what you already have.
This means your code is correct. If you have any problems with your GUI application, they come down to the application's configuration or Unicode support, which is a bit off-topic for Stack Overflow.
References:
https://unicode-table.com/en/1F481/
https://unicode-table.com/en/1F44D/
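The byte check above can be reproduced outside MySQL entirely; for example, a small Python sketch (plain Unicode facts, nothing MySQL-specific):

```python
# UTF-8 byte sequences for the two emoji, matching the HEX(message) output.
assert '💁'.encode('utf-8').hex() == 'f09f9281'  # U+1F481 PERSON TIPPING HAND
assert '👍'.encode('utf-8').hex() == 'f09f918d'  # U+1F44D THUMBS UP SIGN
print('f09f9281f09f918d')  # concatenated, this is exactly the stored value
```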

I think your table collation must be properly configured too:
CREATE TABLE `message_message` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`message` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
`created_at` datetime(6) NOT NULL,
`is_read` tinyint(1) NOT NULL,
`chat_id` int(11) NOT NULL,
PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

Make sure your table's character set and collation are CHARACTER SET utf8mb4 COLLATE utf8mb4_bin. To update this (in your case), the query would be:
ALTER TABLE message_message CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
Also make sure your database's default character set is utf8mb4. To check it, the query would be:
SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = 'DBNAME';
And to update it:
ALTER DATABASE DBNAME CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

Related

How do I do a case-insensitive MySQL query when columns use utf8mb4_bin collation?

I have a first column of type varchar(190) that uses the utf8mb4_bin collation.
When I perform the following query, I only get back the rows where first is exactly Joe, as expected:
SELECT first, last FROM person WHERE first = 'Joe'
What I would like to get is Joe, joe, jOe, joE, jOE, JoE, JOE, and JOe. Basically a case-insensitive search on a case-sensitive field.
How do I do this?
CREATE TABLE `person` (
`id` int NOT NULL AUTO_INCREMENT,
`first` varchar(190) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
`middle` varchar(190) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
`last` varchar(190) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin DEFAULT NULL,
`job` varchar(190) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
UNIQUE KEY `names_unq` (`first`,`middle`,`last`,`job`),
KEY `index_job` (`job`),
KEY `index_first` (`first`,`job`),
KEY `index_first_last` (`first`,`last`,`job`),
KEY `index_middle` (`middle`,`job`),
KEY `index_last` (`job`,`last`)
) ENGINE=InnoDB AUTO_INCREMENT=99750823 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
You can specify a collation in a string comparison expression to override the collation used in the comparison. Read https://dev.mysql.com/doc/refman/8.0/en/charset-literal.html for more details on this.
CREATE TABLE `person` (
`first` text COLLATE utf8mb4_bin,
`last` text COLLATE utf8mb4_bin
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
mysql> select first, last from person where first = 'Joe';
+-------+-------+
| first | last  |
+-------+-------+
| Joe   | Grant |
+-------+-------+
mysql> select first, last from person where first = 'joe';
Empty set (0.00 sec)
mysql> select first, last from person where first = 'joe' collate utf8mb4_unicode_ci;
+-------+-------+
| first | last  |
+-------+-------+
| Joe   | Grant |
+-------+-------+
Use COLLATE utf8mb4_unicode_ci, as it makes the comparison case-insensitive against whatever filter condition you have given.
The simplest way to do this is to use UPPER().
NB: This is not optimised (unless there is an index on UPPER(first), which is unlikely), but it is a quick fix for simple queries. I do not recommend it for large-scale or production queries without testing the speed/cost.
SELECT first, last FROM person WHERE UPPER(first) = 'JOE';
If you are matching a parameter, you may need to use UPPER() on both sides, as in:
SELECT first, last FROM person WHERE UPPER(first) = UPPER(#name);
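What the UPPER()-based comparison does can be sketched application-side. The rows and the name parameter below are made-up examples, not data from the question:

```python
# A binary (case-sensitive) column only matches after both sides are
# folded to one case - the Python analogue of UPPER(first) = UPPER(#name).
rows = [('Joe', 'Grant'), ('joe', 'Smith'), ('JOE', 'Doe'), ('Jane', 'Roe')]
name = 'jOe'  # hypothetical query parameter
matches = [(first, last) for first, last in rows
           if first.upper() == name.upper()]
print(matches)  # every casing of 'joe' matches; 'Jane' does not
```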

MySQL utf8 multibyte (utf8mb4) insert duplicate entry problem

I have two words ('বাঁধা' and 'বাধা') to insert into a MySQL (8.0.12 - MySQL Community Server - GPL) table. The word 'বাঁধা' is inserted correctly. But when inserting 'বাধা', MySQL produces an error:
INSERT INTO lc6_words(jp_word, jp_fcharascii) VALUES('বাঁধা', 2476);
/*Query OK*/
INSERT INTO lc6_words(jp_word, jp_fcharascii) VALUES('বাধা', 2476);
/*#1062 - Duplicate entry 'বাধা' for key 'jp_word'*/
The table structure:
CREATE TABLE IF NOT EXISTS `lc6_words` (
`jp_wkey` BIGINT NOT NULL AUTO_INCREMENT,
`jp_word` varchar(255) NOT NULL,
`jp_fcharascii` int UNSIGNED NOT NULL,
`jp_word_occ` BIGINT UNSIGNED NOT NULL DEFAULT 1,
UNIQUE(`jp_word`),
PRIMARY KEY (`jp_wkey`)
) ENGINE=MyISAM DEFAULT CHARSET=UTF8MB4 COLLATE=utf8mb4_bin;
Relevant queries and their output:
SELECT jp_wkey FROM lc6_words WHERE BINARY jp_word='বাঁধা';
/* 1 */
SELECT jp_wkey FROM lc6_words WHERE BINARY jp_word='বাধা';
/* Empty */
Thanks for reading this far. And some more too if you share your thoughts :).
There seems to be a problem with the collation. After running the command below, everything worked perfectly:
ALTER TABLE lc6_words MODIFY jp_word VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
Note: The VARCHAR size changed from 255 to 191.
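The two words really are distinct strings, so a true binary collation must treat them as different; a quick Python check (code points taken from the Unicode Bengali block) makes the difference visible:

```python
# The first word contains U+0981 BENGALI SIGN CANDRABINDU; the second
# does not. Under a *_bin collation these compare as distinct byte strings.
w1 = '\u09ac\u09be\u0981\u09a7\u09be'  # বাঁধা
w2 = '\u09ac\u09be\u09a7\u09be'        # বাধা
assert w1 != w2
assert len(w1.encode('utf-8')) == 15   # five 3-byte UTF-8 characters
assert len(w2.encode('utf-8')) == 12   # four 3-byte UTF-8 characters
```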

MySQL Character Set and Collate

I use MySQL 5.7, but I do not know how to configure it to display Vietnamese correctly.
I have set
CREATE DATABASE brt
DEFAULT CHARACTER SET utf8 COLLATE utf8_vietnamese_ci;
After that I used LOAD DATA LOCAL INFILE to load Vietnamese-language data into the database.
But I often get results where the Vietnamese characters are displayed incorrectly.
For the detailed code and files, please check my GitHub repository at the following link:
https://github.com/fivermori/mysql
Please show me how to solve this. Thanks.
As @ysth suggests, using utf8mb4 will save you a world of trouble going forward. If you change your CREATE statements to look like this, you should be good:
CREATE DATABASE `brt` DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
USE `brt`;
DROP TABLE IF EXISTS `fixedAssets`;
CREATE TABLE IF NOT EXISTS `fixedAssets` (
`id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`code` varchar(250) UNIQUE NOT NULL DEFAULT '',
`name` varchar(250) NOT NULL DEFAULT '',
`type` varchar(250) NOT NULL DEFAULT '',
`createdDate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE INDEX `idx_fa_main` ON `fixedAssets` (`code`);
I've tested this using the data that you provided and get the expected query results:
name
----------------------------------------------------------------
Mould Terminal box cover BN90/112 612536030 39 tháng
Mould W2206-045-9911-VN #3 ( 43 tháng)
Mould Flange BN90/B5 614260271 ( 43 tháng)
Mould 151*1237PH04pC11 ( 10 năm)
Transfer 24221 - 2112 ( sửa chữa nhà xưởng Space T 07-2016 ) BR2
Using the utf8mb4 character set and utf8mb4_unicode_ci collation is usually one of the simpler ways to ensure that your database can correctly display everything from plain ASCII to modern emoji and everything in between.
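As a side note on why utf8mb4 matters here, a small Python check (plain UTF-8 byte widths, nothing MySQL-specific) shows the difference:

```python
# MySQL's legacy 'utf8' charset (utf8mb3) stores at most 3 bytes per
# character. Vietnamese letters fit within 3 bytes, but emoji and other
# supplementary-plane characters need 4 - hence utf8mb4.
assert len('\u1ebf'.encode('utf-8')) == 3   # ế (U+1EBF), fits in utf8mb3
assert len('😁'.encode('utf-8')) == 4       # U+1F601, requires utf8mb4
```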

pt-online-schema-change error with big composite primary key

I have a table with a composite primary key, with the structure below:
CREATE TABLE `field_name_test` (
`id_type` varchar(128) NOT NULL DEFAULT '',
`desc` varchar(128) NOT NULL DEFAULT '',
`deleted` tinyint(4) NOT NULL DEFAULT '0',
`type_id` int(10) unsigned NOT NULL,
`rev_id` int(10) unsigned NOT NULL,
`lang` varchar(32) NOT NULL DEFAULT '',
`delta` int(10) unsigned NOT NULL,
`fname_value` varchar(255) DEFAULT NULL,
`fname_format` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id_type`,`type_id`,`rev_id`,`deleted`,`delta`,`lang`),
KEY `id_type` (`id_type`),
KEY `desc` (`desc`),
KEY `deleted` (`deleted`),
KEY `type_id` (`type_id`),
KEY `rev_id` (`rev_id`),
KEY `lang` (`lang`),
KEY `fname_format` (`fname_format`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I'm running pt-online-schema-change to change the collation of the table. It works fine with other tables, but this one gives the error below:
pt-online-schema-change --execute --password=#### --user=#### --socket=#### --port=#### --chunk-time=1 --recursion-method=none --no-drop-old-table --alter "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci , CHANGE desc desc varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci , CHANGE id_type id_type varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci , CHANGE lang lang varchar(32) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci , ROW_FORMAT=DYNAMIC , LOCK=SHARED, ALGORITHM=COPY" D=db,t=field_name_test,h=localhost
No slaves found. See --recursion-method if host ###### has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering db.field_name_test...
Creating new table...
Created new table db._field_name_test_new OK.
Altering new table...
Altered db._field_name_test_new OK.
2017-09-15T09:18:47 Creating triggers...
2017-09-15T09:18:47 Created triggers OK.
2017-09-15T09:18:47 Copying approximately 3843064 rows...
2017-09-15T09:18:47 Dropping triggers...
2017-09-15T09:18:47 Dropped triggers OK.
2017-09-15T09:18:47 Dropping new table...
2017-09-15T09:18:47 Dropped new table OK.
db.field_name_test was not altered.
2017-09-15T09:18:47 Error copying rows from db.field_name_test to db._field_name_test_new: 2017-09-15T09:18:47 Error copying rows at chunk 1 of db.field_name_test because MySQL used only 390 bytes of the PRIMARY index instead of 497. See the --[no]check-plan documentation for more information.
I'm running the above on a 3-node Galera cluster.
So I have the following concerns about pt-online-schema-change:
1) What solutions are there for cases like the above?
2) Is it possible to run pt-online-schema-change in parallel on the same database?
Please let me know if you need any other input. Thanks in advance.

MySQL UTF-8 Collation not working as I would expect [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Looking for case insensitive MySQL collation where “a” != “ä”
I'm struggling with this utf8 nonsense. I create a test table:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(20) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
I insert a single row:
INSERT INTO `test`(`name`) VALUES ('Cryptïc');
I query against the table:
SELECT `name` FROM `test` WHERE `name` LIKE 'Cryptic';
I get result set:
+---------+
| name    |
+---------+
| Cryptïc |
+---------+
i should not equal ï. A little help?
Use utf8_bin instead of utf8_general_ci.
With utf8_general_ci, similar characters (like i and ï) are treated as the same character in comparisons and sorting. The comparison is also case insensitive (hence the _ci), which means that i and I are also treated the same.
Other collations, like utf8_unicode_ci, sort better, but still 'fail' on such comparisons.
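The behaviour described above can be seen outside MySQL too. utf8_general_ci roughly compares accented letters by their base character, and NFD decomposition makes that base letter visible; a short Python sketch:

```python
import unicodedata

# 'ï' decomposes into a base 'i' plus a combining diaeresis, which is
# (roughly) why accent-insensitive collations treat 'ï' and 'i' alike.
decomposed = unicodedata.normalize('NFD', 'ï')
assert decomposed[0] == 'i'
assert unicodedata.name(decomposed[1]) == 'COMBINING DIAERESIS'
```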