Incorrect String Value in MySQL db

I am running a webapp on Ubuntu 16.04.4.
The stack is as follows
Python 3.5.2
MySQL 5.7.22
Flask
Flask-SQLAlchemy
The webapp has a feature for admins to upload some text using an .xlsx file, which is read with openpyxl inside the webapp. However, while saving I am getting errors like:
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1366, "Incorrect string value: '\\xC4\\x9B nep...'
At first I was able to delete the characters that were causing trouble (e.g. zero-width spaces), but that approach no longer works.
After reading a bit online, I think the cause could be that my db is not using utf8mb4. Could someone guide me through updating my db and all its tables? I do not know much about SQL.
As the webapp is used in production, I would rather not try tutorials that may be outdated.
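For context, the bytes quoted in the error message can be decoded to see which character MySQL rejected. The sketch below uses only the Python standard library. It assumes that MySQL's latin1 can be approximated by Python's cp1252 codec, and the helper names are hypothetical.

```python
# The error message shows the bytes \xC4\x9B; they are one UTF-8 encoded character.
REJECTED = b"\xc4\x9b"


def fits_in_latin1(text: str) -> bool:
    """Return True if every character can be stored in a latin1 column.

    Assumption: MySQL's latin1 is approximated by Python's cp1252 codec.
    """
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def needs_utf8mb4(text: str) -> bool:
    """Return True if any character lies outside the BMP (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)


char = REJECTED.decode("utf-8")
print(repr(char), fits_in_latin1(char), needs_utf8mb4(char))
print(fits_in_latin1("\u200b"))     # zero-width space
print(needs_utf8mb4("\U0001F600"))  # an emoji
```

The rejected character needs only two bytes in UTF-8, so the failure here points at the latin1 tables rather than at the 3-byte limit of MySQL's utf8.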

It seems to work now. I took the following steps:
Started the mysql cli with:
mysql -u root -p
Logged in using the root pw.
Checked the default parameters using
show variables like "%character%";
which gave me:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
and
show variables like "%collation%";
which gave me
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
So I edited /etc/mysql/my.cnf
I added:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
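As a quick sanity check before restarting, the fragment above can be parsed to confirm that each option landed in the intended section. This is a minimal sketch with Python's configparser, not a full my.cnf parser. The function name is hypothetical, and lines starting with "!" (such as !includedir) are skipped because configparser does not understand them.

```python
import configparser

# The fragment added to /etc/mysql/my.cnf in the step above.
MY_CNF = """
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
"""


def read_options(text: str) -> dict:
    """Parse my.cnf-style text into {section: {option: value}}."""
    # Drop directives such as "!includedir ..." that configparser cannot read.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("!")]
    # allow_no_value accepts bare flags; their value is reported as None.
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string("\n".join(lines))
    return {name: dict(parser[name]) for name in parser.sections()}


options = read_options(MY_CNF)
print(options["mysqld"]["character-set-server"])
print(options["client"]["default-character-set"])
```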
Restarting mysql (sudo service mysql restart) and running the same commands as above now gave me
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
and
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+----------------------+--------------------+
So I looked up the table settings using
SHOW TABLE STATUS FROM databasename;
They still used collations such as latin1_swedish_ci.
I used the following to change the database setting:
ALTER DATABASE databasename CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
and the following for each table:
use databasename;
ALTER TABLE assessments CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Looking at the table settings again showed that latin1_swedish_ci had changed to utf8mb4_unicode_ci.
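With more than a handful of tables, typing one ALTER per table gets tedious. The sketch below only builds the statements as strings so they can be reviewed before being run in the mysql client. The function name is hypothetical, and "users" is a made-up second table; "assessments" and "databasename" come from the steps above.

```python
def convert_statements(database, tables,
                       charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build the ALTER DATABASE / ALTER TABLE statements for the given tables."""

    def quote(name):
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    statements = [
        f"ALTER DATABASE {quote(database)} "
        f"CHARACTER SET {charset} COLLATE {collation};"
    ]
    for table in tables:
        statements.append(
            f"ALTER TABLE {quote(database)}.{quote(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


# "users" is a hypothetical table name used only for illustration.
for sql in convert_statements("databasename", ["assessments", "users"]):
    print(sql)
```

Since CONVERT TO rewrites the stored data, taking a backup first is a sensible precaution on a production database.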
Then I changed the SQLAlchemy connection URL to use ?encoding=utf8mb4 at the end.
I restarted MySQL and the webapp again. Since then it has been working properly.
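A note on the connection URL: the answer appends ?encoding=utf8mb4, whereas the query parameter described in SQLAlchemy's MySQL dialect documentation for this purpose is charset. The sketch below swaps in charset for that reason. It only builds the URL string with the standard library, and the credentials and function name are hypothetical.

```python
from urllib.parse import quote_plus, urlencode


def mysql_url(user, password, host, database, charset="utf8mb4"):
    """Build a mysql:// URL string with the charset as a query parameter."""
    query = urlencode({"charset": charset})
    # quote_plus protects characters such as "/" or "@" in the credentials.
    return (
        f"mysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?{query}"
    )


# Hypothetical credentials, for illustration only.
url = mysql_url("webapp", "s3cret/pw", "localhost", "databasename")
print(url)
```

In a Flask-SQLAlchemy app the resulting string would typically be assigned to app.config["SQLALCHEMY_DATABASE_URI"].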

Related

Polish chars cant be inserted to DB via jdbc MYSQL

I have a problem inserting data with Polish characters into a MySQL DB. I'm working on Windows 8 and Ubuntu. On Windows there is no problem, but on Ubuntu I cannot insert characters such as "żąśźćłż"; in their place I get "?????". I have checked with TRACE-level logging: my application puts the correct strings into the prepared query, but in the db I see "???????". I can insert these characters via the command line and it works, so probably there is some problem with the connector, or with some other settings. I have tried to change:
mysql> show variables like "collation%";
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+--------------------+
to
utf8_general_ci
everywhere, but after restarting the mysql service it comes back the same, along with
mysql> show variables like "character%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
I cannot set utf8 for the database and server.
Does anyone have any ideas?
Adding the line
character_set_server = utf8
in the [mysqld] section of the MySQL configuration file (my.ini or my.cnf) should set the new value the next time the MySQL server is started.
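One plausible explanation for the "?????" symptom is that the text gets converted to a character set that has no Polish letters, in which case each unrepresentable character is replaced by a question mark. The sketch below reproduces that effect in Python. It assumes cp1252 as a stand-in for MySQL's latin1 and does not touch a database or JDBC.

```python
POLISH = "żąśźćłż"

# Converting to a charset that lacks these letters replaces each one with "?".
# Assumption: cp1252 stands in for MySQL's latin1.
as_latin1 = POLISH.encode("cp1252", errors="replace")
print(as_latin1)

# UTF-8 can represent all of them, so the round trip is lossless.
as_utf8 = POLISH.encode("utf-8")
print(len(as_utf8), as_utf8.decode("utf-8") == POLISH)
```

Once a character has been stored as "?", the original letter cannot be recovered from the database, so the fix has to happen before the insert.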

How to configure my.cnf for multiple CHARACTER SET of database in one instance

In one instance I have two databases:
1st database -> my_db
2nd database -> sample_db
mysql> show global variables like 'char%';
+--------------------------+-------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /rdsdbbin/mysql-5.6.27.R1/share/charsets/ |
+--------------------------+-------------------------------------------+
mysql> show global variables like 'coll%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
mysql> use my_db;
SHOW CREATE DATABASE my_db ;
+-------------------+-------------------------------------------------------------------------------------------+
| Database | Create Database |
+-------------------+-------------------------------------------------------------------------------------------+
| plum_production_1 | CREATE DATABASE `my_db` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */ |
+-------------------+-------------------------------------------------------------------------------------------+
1st database: my_db
mysql> show variables like '%coll%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_unicode_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-------------------+
mysql> show variables like '%char%';
+--------------------------+-------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /rdsdbbin/mysql-5.6.27.R1/share/charsets/ |
+--------------------------+-------------------------------------------+
2nd database:
mysql> use sample_db;
mysql> show create database sample_db;
+-----------------+----------------------------------------------------------------------------+
| Database | Create Database |
+-----------------+----------------------------------------------------------------------------+
| plum_production | CREATE DATABASE `plum_production` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+-----------------+----------------------------------------------------------------------------+
mysql> show variables like '%char%';
+--------------------------+-------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /rdsdbbin/mysql-5.6.27.R1/share/charsets/ |
+--------------------------+-------------------------------------------+
mysql> show variables like '%coll%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
How should I configure my.cnf when the databases require different collations? I need:
my_db - character set utf8, collation utf8_unicode_ci.
sample_db - character set latin1, collation latin1_swedish_ci.
With the above configuration I am facing some issues: tables are locked when trying to insert records into multiple tables (all except the first table of the insert statement), and other queries are too slow. Temporarily I changed my_db to character set latin1, collation latin1_swedish_ci, and now it is working fine.
But that is not what I require.
For the my_db tables and columns I need character set utf8, collation utf8_unicode_ci. To get this done I altered:
Database: ALTER DATABASE my_db CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Tables: for all tables, ALTER TABLE table_names CHARACTER SET utf8 COLLATE utf8_unicode_ci;
To convert all columns: ALTER TABLE table_names CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Am I on the right track? Is there anything to change other than in my.cnf?
And for sample_db: character set latin1, collation latin1_swedish_ci.
We are using AWS RDS; my.cnf looks like this:
[mysqld]
character_set_client: utf8
character_set_database: utf8
character_set_results: utf8
character_set_connection: utf8
character_set_server: utf8
collation_connection: utf8_unicode_ci
collation_server: utf8_unicode_ci
And how should I configure my.cnf for a local instance (not in AWS)? For example:
[client]
[mysql]
[mysqld]
When connecting, how can I SET NAMES utf8mb4? Is it required to specify it every time I connect to that db? I asked many questions because I am confused and scared of data loss. Thanks in advance.
my.cnf is mostly defaults that can be overridden. If you have a mixture, don't worry about it; focus on the other settings.
Client
What client do you have? (All I see is the mysql command-line tool.) Probably the client should always be utf8mb4 (the MySQL character set equivalent to what the outside world calls UTF-8).
When connecting, use the connection parameters to establish CHARACTER SET utf8mb4, possibly by doing SET NAMES utf8mb4;
Data in Columns
Each column can have a CHARACTER SET and COLLATION. If not specified, they default from the CREATE TABLE. If that does not specify, it defaults from the CREATE DATABASE. Etc.
So, be sure each column is the way it needs to be. Use SHOW CREATE TABLE to verify.
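The fallback order described above can be pictured as a simple chain. The sketch below is only an illustration of that order, with a hypothetical function name. In MySQL the default is resolved when the column is created, so changing a database default later does not change existing columns.

```python
def effective_charset(column=None, table=None, database=None, server="latin1"):
    """Return the first explicitly specified charset, most specific first."""
    for candidate in (column, table, database):
        if candidate is not None:
            return candidate
    # Nothing was specified anywhere, so the server default applies.
    return server


print(effective_charset(database="utf8"))
print(effective_charset(column="latin1", table="utf8mb4", database="utf8"))
print(effective_charset())
```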
Client to/from Columns
MySQL transcodes data as it goes between the client and the server. So, it is OK to have the client using utf8mb4, but INSERTing/SELECTing a column that is declared latin1. (Some combinations won't work.)
Corollary: There is no problem if one DB is latin1 and another is utf8.
Garbage
See "best practice" in http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored . If you get gibberish, see that link for further debugging/cures.
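As a small illustration of how such gibberish can arise: if UTF-8 bytes are interpreted as latin1, every non-ASCII character turns into two or more unrelated characters. The sketch below shows the effect in Python, again assuming cp1252 as a stand-in for MySQL's latin1.

```python
original = "é"

# UTF-8 bytes misread as latin1 (cp1252 here) turn one character into two.
mojibake = original.encode("utf-8").decode("cp1252")
print(mojibake)

# Reversing the wrong step recovers the text, as long as no bytes were lost.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == original)
```

Whether stored data can be repaired this way depends on exactly how it was written, so it is worth testing on a copy first.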

Overriding my.cnf template in RDS using AWS OpsWorks

I'm trying to change my RDS instance default character set to utf8mb4 so I can support emojis. I have a repo with all my recipes, which I've used in the past to customize my deployments. I followed this guide from AWS, but when I deploy the app, the changes aren't reflected in the database. I also made sure to create a metadata.rb file in the root of the mysql directory in my custom cookbooks repo.
I also set up a new RDS instance using a new parameter group where the appropriate character sets and collations are set to utf8mb4. This DB is also set as my datasource in my app in OpsWorks.
In Rails, I also set the encoding and collation to utf8mb4...
default: &default
  adapter: mysql2
  encoding: utf8mb4
  collation: utf8mb4_general_ci
  ...
If I ssh into my application server and then connect to MySQL, this is what I see when querying for global variables...
mysql> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
But when I do the same for non-global variables, I see this:
mysql> show variables like 'char%';
+--------------------------+-------------------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /rdsdbbin/mysql-5.6.23.R1/share/charsets/ |
+--------------------------+-------------------------------------------+
8 rows in set (0.00 sec)
I should also mention that this works fine locally after I set the appropriate tables and columns to utf8mb4 using a migration.
At this point, I can't figure out why I can't get the character sets to apply correctly. Hopefully someone smarter than me can help me figure this out!
Thanks
After digging a bit deeper, I realized that I missed a critical step when updating custom cookbooks. I didn't realize that simply doing a deployment wouldn't retrieve the new recipes. After I ran the command to "Update Custom Cookbooks" and then did a deployment, it worked.
Hopefully this helps someone in the future.

MySQL utf8mb4 on Amazon RDS: global variables set correctly but variables not set

I'm trying to convert my Amazon RDS server to use utf8mb4 encoding instead of utf8. I've followed the guide here and it has worked for the most part (global variables are set through my new parameter group in RDS), but my session variables are not being set correctly, which means that I'm not able to use the new encoding.
When I run:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
I see:
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8_general_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
Which is obviously incorrect, but when I run:
SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
I see:
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
This is correct, but for some reason these global values are not taking effect when I restart the server. I can set the variables manually after restarts, but I don't understand why they aren't set initially.
To clarify Aaron's answer, this was (at least for us) caused by the character set/collation of our connection to the database; the database itself is set up correctly. When you use any client to connect to the db -- whether it be SQLyog, MySQL Workbench, the built-in client or any other -- there is a character set and collation associated with that connection. Thus you need to change this connection charset/collation to utf8mb4 and utf8mb4_unicode_ci, and the values of
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
will then all show correctly. If you take a close look at the "problem" values in the original question you'll notice they are _client, _connection etc. which should have given me an obvious clue that the problem was with my mysql client and not the database itself.
Fixed it. It was an issue with my local installation of MySQL - not the server. I had to change the default encoding that it sent and it worked great from there.
For me this was because I had skipped the instruction (in the original linked guide) to modify /etc/my.cnf:
[client]
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
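To make the point about connection-level settings concrete, the two outputs from the question can be compared programmatically. The sketch below copies the values from the SHOW GLOBAL VARIABLES and SHOW VARIABLES tables above into dictionaries and lists the names that differ. The function name is hypothetical.

```python
# Values copied from the SHOW GLOBAL VARIABLES output in the question.
GLOBAL_VARS = {
    "character_set_client": "utf8mb4",
    "character_set_connection": "utf8mb4",
    "character_set_database": "utf8mb4",
    "character_set_filesystem": "binary",
    "character_set_results": "utf8mb4",
    "character_set_server": "utf8mb4",
    "character_set_system": "utf8",
    "collation_connection": "utf8mb4_unicode_ci",
    "collation_database": "utf8mb4_unicode_ci",
    "collation_server": "utf8mb4_unicode_ci",
}

# Session values from SHOW VARIABLES: start from the globals and
# override the entries that the question shows as different.
SESSION_VARS = dict(
    GLOBAL_VARS,
    character_set_client="utf8",
    character_set_connection="utf8",
    character_set_results="utf8",
    collation_connection="utf8_general_ci",
)


def differing(global_vars, session_vars):
    """Return the sorted variable names whose session value differs."""
    return sorted(
        name for name, value in global_vars.items()
        if session_vars.get(name) != value
    )


for name in differing(GLOBAL_VARS, SESSION_VARS):
    print(name, GLOBAL_VARS[name], "->", SESSION_VARS[name])
```

Every name it reports is a client, connection or results setting, which matches the explanation that the connecting client, not the server, chose those values.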

MySQL 5.5 utf8 trouble

I've done all the things that worked on previous versions of MySQL (and new to MySQL 5.5) to set utf8 encoding.
Now I have output of
SHOW GLOBAL VARIABLES LIKE '%char%';
exactly what I wanted:
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
And these settings in [mysqld]:
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
In [client] and [mysql]:
default-character-set= utf8
But the default table encoding on creation is still latin1!
Am I missing something else needed with MySQL 5.5 to make it work?
Thanks in advance!
The default table encoding is the same as the current database encoding.
You can check it with
SHOW CREATE DATABASE dbname;
PS: it's a good practice to always specify encoding explicitly. That way you won't rely on server settings.
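One way to follow that advice is to put the character set and collation into every CREATE TABLE statement. The sketch below builds such a statement as a string. The function name, the table "notes" and its columns are hypothetical, and the defaults mirror the utf8 settings from the question.

```python
def create_table_sql(table, columns,
                     charset="utf8", collation="utf8_unicode_ci"):
    """Build a CREATE TABLE statement with an explicit charset and collation."""
    body = ",\n  ".join(columns)
    return (
        f"CREATE TABLE `{table}` (\n  {body}\n) "
        f"DEFAULT CHARACTER SET {charset} COLLATE {collation};"
    )


# "notes" and its columns are made up for illustration.
sql = create_table_sql("notes", ["id INT PRIMARY KEY", "body TEXT"])
print(sql)
```

A table created this way keeps its encoding even if the database or server defaults are changed later.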