Saving emoji characters in mysql database - mysql

I am trying to save some data on mysql database, input contains emoji characters like this : '\U0001f60a\U0001f48d' and I'm getting this error:
1366, "Incorrect string value: '\\xF0\\x9F\\x98\\x8A\\xF0\\x9F...' for column 'caption' at row 1"
I searched over net and read a lot of answers include these:
MySQL utf8mb4, Errors when saving Emojis or MySQL utf8mb4, Errors when saving Emojis or https://mathiasbynens.be/notes/mysql-utf8mb4#character-sets or http://www.java2s.com/Tutorial/MySQL/0080__Table/charactersetsystem.htm but nothing worked !!
I have different problems:
here is mydb info:
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8_general_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)
I tried to change character_set_server value to utf8mb4 by
mysql>SET character_set_server = utf8mb4
Query OK, 0 rows affected (0.00 sec)
But when restart mysqld everything revert !
I don't have any /etc/my.cnf file in also, and I edited /etc/mysql/my.cnf file instead.
What should I do?
How can I save emoji file in my database?

1st or 2nd line in source code (to have literals in the code utf8-encoded: # -- coding: utf-8 --
Your columns/tables need to be CHARACTER SET utf8mb4
The python package "MySQL-python" version needs to be at least 1.2.5 in order to handle utf8mb4.
self.query('SET NAMES utf8mb4') may be necessary.
Django needs client_encoding: 'UTF8' -- I don't know if that should be 'utf8mb4`.
References:
https://code.djangoproject.com/ticket/18392
http://mysql.rjweb.org/doc.php/charcoll#python

Related

Force MariaDB clients to use utf8mb4

I'm running into an issue where I'm getting differently ordered results when querying with PHP Versus the command line. From my research, it appears that in some cases that bad encoding can cause problems with the order of the results.
That said, all my DB tables are encoded as utf8mb4, with the collation utf8mb4_general_ci. However, it doesnt seem that the mysql variables are set correctly.
I'm on Mysql 5.5.5-10.1.26-MariaDb.
Here are my CNF settings, but to be honest I don't know what I'm doing here:
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
[mariadb]
[mysqld]
character-set-server=utf8mb4
character_set_client=utf8mb4
collation-server=utf8mb4_general_ci
The variables output from mysql:
character_set_client utf8
character_set_connection utf8
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8
character_set_server utf8mb4
character_set_system utf8
collation_connection utf8_general_ci
collation_database utf8mb4_unicode_ci
collation_server utf8mb4_general_ci
Update: A person has asked for how I'm connecting to the database:
$this->connection = new PDO('mysql:host='.DB_SERVER.';dbname='.DB_NAME.';port='.DB_PORT, DB_USER, DB_PASS, $options);
Update: I've switched to utf8mb4_unicode_ci (as per suggestions in answers below).
You want to have character-set-client-handshake = FALSE as well.
With /etc/my.cnf.d/character-set.cnf
# https://scottlinux.com/2017/03/04/mysql-mariadb-set-character-set-and-collation-to-utf8/
# https://mariadb.com/kb/en/library/setting-character-sets-and-collations/
# https://medium.com/#adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434
# https://stackoverflow.com/questions/47566730/force-mariadb-clients-to-use-utf8mb4
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
collation-server = utf8mb4_unicode_ci
init-connect = 'SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci'
character-set-server = utf8mb4
I get everything to be utf8mb41
MariaDB [(none)]> show variables like 'char%'; show variables like 'collation%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+----------------------+--------------------+
3 rows in set (0.00 sec)
MariaDB [(none)]>
however without the character-set-client-handshake line some are still utf8
MariaDB [(none)]> show variables like 'char%'; show variables like 'collation%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+----------------------+--------------------+
3 rows in set (0.01 sec)
MariaDB [(none)]>
1 character_set_system is always utf8.
You should probably use utf8mb4_unicode_ci instead of utf8mb4_general_ci as it's more accurate. Unless you're running MariaDB on a system with an old/limited CPU and performance is a huge concern.
That being said, the solution is to set init_connect in your MariaDB configuration (or --init-connect on the command line):
init_connect = "SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci"
Either way is fine. I am not recommending one way over the other. Both are equally valid approaches.
Your MariaDB configuration may be in my.cnf or a file included by my.cnf, typically found under /etc/mysql. Check your system documentation for details. Because you are configuring a server variable, as indicated by the MariaDB documentation linked to above, you should set the variable in the server part of the configuration file. The server part of the configuration files is indicated by the INI section names ending in "d". An INI section is denoted by a keyword surrounded by square brackets, e.g. "[section]". The "d" stands for "daemon", which is standard UNIX nomenclature for a server process. You can set the variable in either the [mysqld] section or the [mariadb] section. Because the init_connect server variable is common to both MySQL and MariaDB, I would recommend you put it under [mysqld].
I see that you are setting character_set_client=utf8mb4 in your pasted configuration. You don't need to do this. You can delete or comment out the line. Comments are lines starting with pound symbol (#), also known as a hash mark, octothorp, or number sign.
Any and all clients that connect to the server will execute these command(s) before any other commands are processed.
init_connect is not performed by anyone connecting as root, so it is not as universal as you would like.
SET NAMES utf8mb4 sets 3 things; experiment to see that. You need all 3.
If you weren't as far back as 5.5, I would recommend utf8mb4_unicode_520_ci as being a better collation: "Unicode collation names now may include a version number to indicate the Unicode Collation Algorithm (UCA) version on which the collation is based. Initial collations thus created use version UCA 5.2.0. For example, utf8_unicode_520_ci is based on UCA 5.2.0. UCA-based Unicode collation names that do not include a version number are based on version 4.0.0."
Version 8.0 has Unicode 9.0 standard.
Back to the question: There is no perfect solution; the user can override whatever you do -- either through ignorance or through malice.
You could police the tables created, but that won't keep them from connecting incorrectly. Or correctly, but with a different charset. It is valid to do SET NAMES latin1, then provide latin1-encode bytes. MySQL will convert as it stores/fetches.
But if they have utf8-encoded bytes, but say SET NAMES latin1, you get "double encoding". This "bug" destroys any chance of collating correctly, but is otherwise (usually) transparent. That is, stuff is messed up as it is stored, then un-messed up as it is fetched.
To fix this warning you should edit
/etc/my.cnf (my.ini on Windows)
Simply add/set in the file
[client]
default-character-set=utf8mb4
[mysql]
default-character-set=utf8mb4
[mysqld]
collation-server=utf8mb4_unicode_ci
init-connect='SET NAMES utf8mb4'
character-set-server=utf8mb4

Mysql 5.5 how to set up everything to utf8?

I just want everything default to utf8. I've checked this question but nothing help.
Currently, My /etc/my.cnf is
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
But when I restart the server, create a new database, it is still latin1(character_set_database and character_set_server):
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> show variables like 'collation%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
When I create a database, It is latin1:
mysql> create database d1;
Query OK, 1 row affected (0.00 sec)
mysql> use d1;
Database changed
mysql> show variables like "character_set_database";
+------------------------+--------+
| Variable_name | Value |
+------------------------+--------+
| character_set_database | latin1 |
+------------------------+--------+
1 row in set (0.00 sec)
When I create a table in this database, it can't recognize valid utf8 啊:
mysql> create table t1(name varchar(1) default '啊');
ERROR 1067 (42000): Invalid default value for 'name'
I know alter database d1 character set utf8; will fix this. But I just want everything default to utf8, is it possible?
This is tricky.
The character set and collation for the default database can be
determined from the values of the character_set_database and
collation_database system variables. The server sets these variables
whenever the default database changes. If there is no default
database, the variables have the same value as the corresponding
server-level system variables, character_set_server and
collation_server.
So one would assume the default for the collation-database is the same as the default for the collation-server variable.
Please check the following:
Is there any other config that would override your my.cnf, like /etc/mysql/mysql.cnf or ~/.my.cnf ?
The client (not server!) is setting its own collation upon startup, so you could set a client collation/encoding through [mysql] (not mysqld) or look if this is already set somewhere.
You do SHOW VARIABLES ... - this is querying SESSION based variables, try to query explicitly global settings through SHOW GLOBAL VARIABLES ...

MySQL UTF8 Issue

Okay, I have tried to import "CSV" file into MySQL for the past 24 hours but have failed miserably.
I have set name, set char and there is nothing left that I have not set to UTF8 but it still is not working. Not just for the DB and Tables, but for the server as well, still no use.
I am importing directly into MySQL so it is not PHP issue. I will be grateful if anyone can highlight where am I going wrong.
mysql> SHOW CREATE DATABASE `dict_2`;
+----------+--------------------------------------------------------------------
---------------------+
| Database | Create Database
|
+----------+--------------------------------------------------------------------
---------------------+
| dict_2 | CREATE DATABASE `dict_2` /*!40100 DEFAULT CHARACTER SET utf8 COLLAT
E utf8_unicode_ci */ |
+----------+--------------------------------------------------------------------
---------------------+
1 row in set (0.00 sec)
mysql> show variables like "%character%"; show variables like "%collation%";
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
In its current form, this question is impossible to answer.
We're left guessing...
That you're using a MySQL LOAD DATA statement.
You've verified that the characterset encoding of the .csv file is not ucs2.
You've verified that the characterset encoding of the .csv file is utf8 (i.e. matches the character_set_database system variable), of that you've specified the appropriate characterset in the CHARACTER SET clause of the LOAD DATA statement.
Beyond that, there's a whole slew of other things that might be wrong, but we're still just guessing.
Very frequently when something MySQL "fail miserably", there's some sort of indication, like an error message, or some other behavior that we can observe and describe.
In the question, the description of the failure mode is beyond vague, it's entirely non-existent.

How do I convert the OpenShift MySQL 5.1 cartridge to UTF-8

The default MySQL 5.1 cartridge apparently creates all its tables with the latin1 character set. I have an application (Review Board, a python/Django application) that has some issues unless the DB is running as UTF-8. How do I change that? I can't just edit my.cnf because it will be wiped at the next cartridge restart.
mysql> show variables like 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
I cannot change this setting in my.cnf, because to the best of my knowledge, there exists no OpenShift environment variable to set the character encoding. How do I persistently change this (ideally in my OpenShift hooks so this will persist into future deployments) and update my existing tables to UTF-8?
I found a solution but not a perfect one :
In openshift installing phpMyAdmin,
Find and change server settings, the relevant character variables changed from latin1 to utf8.
Hope that helps

Rails show question marks(????) for my input utf8 data

I have set every encoding set variable I can figure out to utf8.
In database.yml:
development: &development
adapter: mysql2
encoding: utf8
In my.cnf:
[client]
default-character-set = utf8
[mysqld]
default-character-set = utf8
skip-character-set-client-handshake
character-set-server = utf8
collation-server = utf8_general_ci
init-connect = SET NAMES utf8
And if I run mysql client in terminal:
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
mysql> show variables like 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
But it's to beat the air. When I insert utf8 data from Rails app, it finally becomes ????????????.
What do I miss?
Check not global settings but when you are connected to specific database for application. When you changed settings for mysql you have also change settings for your app database.
Simple way to check it is to log to mysql into app db:
mysql app_db_production -u db_user -p
or rails command:
rails dbconsole production
For my app it looks like this:
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> show variables like 'collation%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8_general_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
Command for changing database collation and charset:
mysql> alter database app_db_production CHARACTER SET utf8 COLLATE utf8_general_ci ;
Query OK, 1 row affected (0.00 sec)
And remeber to change charset and collation for all your tables:
ALTER TABLE tablename CHARACTER SET utf8 COLLATE utf8_general_ci; # changes for new records
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; # migrates old records
Now it should work.
I had the same problem. I added characterEncoding to the end of mysql connection string:
use this: jdbc:mysql://localhost/dbname?characterEncoding=utf8
instead of this: jdbc:mysql://localhost/dbname
Okay for anybody else for whom the #Ravbaker answer does not cut it .. some more tips
MySQL has encoding specified in multiple levels : server, database, connection, table and even field/column. My problem was that the field/column was forced to latin (which over rides all the other encodings). I set the field back to the table encoding (which was utf-8) and the world was good again.
Most of these settings can be set at the usual places: my.cnf, alter queries and rails database.yml file.
ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;
was the query which did the trick for me.
For server / connection encodings use my.cnf and database.yml
For database / table / column encodings use queries
(You can also achieve these by other means)
Do you have this in the HTML?
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
or on HTML5 pages with <!doctype html>
<meta charset="utf-8">
You may need this to let the browser send strings in utf8.
I have some problem today! It's solved by drop my table and creating new, then db:migrate and all is pretty works!
WARNING: IT WILL DELETE ALL YOUR DATA IN THIS TABLE
So:
$ mysql -u USER -p
mysql > drop database YOURDB_NAME_development;
mysql > create database YOURDB_NAME_development CHARACTER SET utf8 COLLATE utf8_general_ci;
mysql > \q
$ rake db:migrate
Well done!