MySQL UTF8 Issue

Okay, I have been trying to import a CSV file into MySQL for the past 24 hours but have failed miserably.
I have run SET NAMES and SET CHARACTER SET, and there is nothing left that I have not set to UTF8, but it is still not working. I did this not just for the DB and tables but for the server as well, still to no avail.
I am importing directly into MySQL, so it is not a PHP issue. I will be grateful if anyone can point out where I am going wrong.
mysql> SHOW CREATE DATABASE `dict_2`;
+----------+-----------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------------------------------+
| dict_2 | CREATE DATABASE `dict_2` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */ |
+----------+-----------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show variables like "%character%"; show variables like "%collation%";
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

In its current form, this question is impossible to answer.
We're left guessing:
That you're using a MySQL LOAD DATA statement.
That you've verified the character set encoding of the .csv file is not ucs2.
That you've verified the character set encoding of the .csv file is utf8 (i.e. it matches the character_set_database system variable), or that you've specified the appropriate character set in the CHARACTER SET clause of the LOAD DATA statement.
Beyond that, there's a whole slew of other things that might be wrong, but we'd still just be guessing.
Very frequently when something in MySQL "fails miserably", there's some sort of indication, like an error message, or some other behavior that we can observe and describe.
In the question, the description of the failure mode is beyond vague; it's entirely non-existent.
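One of the guesses above, the encoding of the .csv file itself, can be checked outside MySQL. Here is a minimal Python sketch, assuming the whole file fits in memory. The name sniff_encoding is a hypothetical helper (not part of any MySQL tooling), and it only distinguishes a few coarse cases:

```python
import codecs


def sniff_encoding(raw: bytes) -> str:
    """Return a coarse encoding label for the full contents of a CSV file."""
    # A byte-order mark is the cheapest hint about how the file was saved.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # No BOM: the file is usable as utf8 only if every byte sequence decodes.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "not-utf-8"


# Hypothetical usage; "data.csv" is a placeholder path:
# with open("data.csv", "rb") as handle:
#     print(sniff_encoding(handle.read()))
```

A file reported as "utf-16" or "not-utf-8" would most likely need to be re-saved as UTF-8 (or loaded with a matching CHARACTER SET clause) before the import can work.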

Related

Mysql 5.5 how to set up everything to utf8?

I just want everything to default to utf8. I've checked this question, but nothing helped.
Currently, my /etc/my.cnf is
[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
But when I restart the server and create a new database, it is still latin1 (see character_set_database and character_set_server):
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> show variables like 'collation%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
When I create a database, it is latin1:
mysql> create database d1;
Query OK, 1 row affected (0.00 sec)
mysql> use d1;
Database changed
mysql> show variables like "character_set_database";
+------------------------+--------+
| Variable_name | Value |
+------------------------+--------+
| character_set_database | latin1 |
+------------------------+--------+
1 row in set (0.00 sec)
When I create a table in this database, it does not accept the valid utf8 character 啊:
mysql> create table t1(name varchar(1) default '啊');
ERROR 1067 (42000): Invalid default value for 'name'
I know that alter database d1 character set utf8; will fix this, but I want everything to default to utf8. Is that possible?
This is tricky. The MySQL manual says:
The character set and collation for the default database can be
determined from the values of the character_set_database and
collation_database system variables. The server sets these variables
whenever the default database changes. If there is no default
database, the variables have the same value as the corresponding
server-level system variables, character_set_server and
collation_server.
So one would assume that the default for collation_database is the same as the value of collation_server.
Please check the following:
Is there any other config file that would override your my.cnf, like /etc/mysql/mysql.cnf or ~/.my.cnf?
The client (not the server!) sets its own collation on startup, so you could set a client collation/encoding through the [mysql] section (not [mysqld]), or check whether this is already set somewhere.
SHOW VARIABLES ... queries session-level variables. Try querying the global settings explicitly with SHOW GLOBAL VARIABLES ...
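To see which option file actually sets the server character set, each candidate file can be inspected in turn. Below is a rough Python sketch that uses the standard configparser module on the text of one option file. The function name server_charset_settings is made up for illustration. Python's parser is only an approximation of MySQL's own option-file parsing, so directives such as !includedir are not followed.

```python
import configparser


def server_charset_settings(cnf_text: str) -> dict:
    """Collect charset-related options from the [mysqld] section of one file."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return {}
    wanted = {"character_set_server", "collation_server", "init_connect"}
    found = {}
    for key, value in parser.items("mysqld"):
        # MySQL accepts both dashes and underscores in option names.
        normalized = key.replace("-", "_")
        if normalized in wanted:
            found[normalized] = value
    return found


# The my.cnf from the question, as a string:
my_cnf = """[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
"""
print(server_charset_settings(my_cnf))
```

Running the same function over /etc/mysql/mysql.cnf, ~/.my.cnf and any other candidate file would show whether a later file sets a different value.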

Saving emoji characters in mysql database

I am trying to save some data to a MySQL database. The input contains emoji characters like '\U0001f60a\U0001f48d', and I'm getting this error:
1366, "Incorrect string value: '\\xF0\\x9F\\x98\\x8A\\xF0\\x9F...' for column 'caption' at row 1"
I searched the net and read a lot of answers, including these:
MySQL utf8mb4, Errors when saving Emojis or https://mathiasbynens.be/notes/mysql-utf8mb4#character-sets or http://www.java2s.com/Tutorial/MySQL/0080__Table/charactersetsystem.htm but nothing worked!
I have several different problems.
Here is my DB info:
mysql> SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8_general_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)
I tried to change the character_set_server value to utf8mb4 with
mysql> SET character_set_server = utf8mb4
Query OK, 0 rows affected (0.00 sec)
But when I restart mysqld, everything reverts!
I don't have an /etc/my.cnf file either, so I edited the /etc/mysql/my.cnf file instead.
What should I do?
How can I save emoji characters in my database?
Put this on the 1st or 2nd line of the source code (so that literals in the code are utf8-encoded): # -*- coding: utf-8 -*-
Your columns/tables need to be CHARACTER SET utf8mb4.
The Python package "MySQL-python" needs to be at least version 1.2.5 in order to handle utf8mb4.
self.query('SET NAMES utf8mb4') may be necessary.
Django needs client_encoding: 'UTF8' -- I don't know if that should be 'utf8mb4'.
References:
https://code.djangoproject.com/ticket/18392
http://mysql.rjweb.org/doc.php/charcoll#python
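The reason utf8mb4 is needed can be seen from the bytes in the error message. A small Python sketch (the helper name needs_utf8mb4 is hypothetical) showing that the two emoji from the question encode to four bytes each, which MySQL's three-byte utf8 cannot hold:

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane,
    i.e. takes four bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


caption = "\U0001f60a\U0001f48d"

# Each emoji is a four-byte sequence; the first one matches the
# \xF0\x9F\x98\x8A shown in the "Incorrect string value" error.
for ch in caption:
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "), len(ch.encode("utf-8")))

print(needs_utf8mb4(caption))
```

A check like this could be run on input before an insert to confirm that a failing value really is a four-byte case rather than some other encoding problem.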

MySQL Incorrect string value error

In a Django application with a MySQL DB back end, users try to insert notes that contain smileys, hearts and similar Unicode characters. MySQL refuses the operations with an error:
(1366, "Incorrect string value: '\\xE2\\x9D\\xA4\\xEF\\xB8\\x8F' for column 'note' at row 1")
(The column in question has the longtext type. The Unicode characters in this case are valid: a heart and a modifier, https://codepoints.net/U+2764 and https://codepoints.net/U+FE0F, so they are not 4-byte UTF-8 characters. I made sure that MySQL's default character set is utf-8.)
What is interesting is that I cannot fully reproduce this error in my local development environment. One particular difference is that MySQL only emits a warning for this anomaly there.
Update1:
This is still bothering me:
mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name="sblive";
+----------------------------+
| default_character_set_name |
+----------------------------+
| latin1 |
+----------------------------+
1 row in set (0.00 sec)
I converted the specific table's charset to utf-8:
mysql> alter table uploads_uploads convert to character set utf8 COLLATE utf8_general_ci;
Query OK, 1209036 rows affected (1 min 10.31 sec)
Records: 1209036 Duplicates: 0 Warnings: 0
mysql> SELECT character_set_name FROM information_schema.`COLUMNS` WHERE table_schema = "sblive" AND table_name = "uploads_uploads" AND column_name = "note";
+--------------------+
| character_set_name |
+--------------------+
| utf8 |
+--------------------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
mysql> SHOW VARIABLES LIKE '%colla%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
You are asking for ❤️, i.e. a heart followed by a "non-spacing" "VARIATION SELECTOR-16".
Your bytes are utf8 -- good.
Your connection needs to specify utf8 -- does it?
Your TEXT column needs to be declared CHARACTER SET utf8 -- is it? Use SHOW CREATE TABLE to verify.
If you are using HTML, it needs to say charset=UTF-8 -- does it?
I suggest you switch to utf8mb4 if the users are likely to enter more emoticons -- the 'Emoji' will need it.
Addenda
Let's check the data... Please run this:
SELECT col, HEX(col) FROM ...
Those two characters should deliver hex E29DA4 and EFB88F. If you see C3A2C29DC2A4C3AFC2B8C28F, you have "double encoding", which is a messier problem. 2764FE0F would indicate utf16, I think.
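The three hex strings mentioned above can be reproduced in Python, which may help when interpreting the HEX(col) output. This is a minimal sketch; classify_hex is a hypothetical helper, and Python's "latin-1" codec stands in for the mis-declared connection charset that causes double encoding:

```python
def classify_hex(stored_hex: str, expected: str) -> str:
    """Compare the HEX(col) output against what `expected` would produce."""
    stored = bytes.fromhex(stored_hex)
    utf8 = expected.encode("utf-8")
    if stored == utf8:
        return "utf8"
    # Double encoding: the UTF-8 bytes were treated as latin1 characters
    # and then encoded to UTF-8 a second time.
    if stored == utf8.decode("latin-1").encode("utf-8"):
        return "double-encoded"
    if stored == expected.encode("utf-16-be"):
        return "utf16"
    return "unknown"


heart = "\u2764\ufe0f"  # heart + VARIATION SELECTOR-16
print(classify_hex("E29DA4EFB88F", heart))
print(classify_hex("C3A2C29DC2A4C3AFC2B8C28F", heart))
print(classify_hex("2764FE0F", heart))
```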

Why after executing set names utf8mb4, the column name changes to question mark?

Why after executing set names utf8mb4, the column name changes to question mark? See below:
mysql> show variables like 'character%' ;
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
mysql> select '\U+1F600';
+------+
| 😀 |
+------+
| 😀 |
+------+
mysql> set names utf8mb4;
mysql> select '\U+1F600';
+------+
| ? |
+------+
| 😀 |
+------+
As I understand it, utf8mb4 is designed to support these emoji characters. So why does the column name change to a question mark after switching to utf8mb4?
In addition, I copied the emoji character from a website (http://getemoji.com/) and pasted it into the terminal. If I instead type '\U+1F600' manually, see below:
mysql> select '\U+1F600' ;
+---------+
| U+1F600 |
+---------+
| U+1F600 |
+---------+
So I guess something happens implicitly when I paste it into the terminal, and this implicit conversion (😀 --> '\U+1F600') could perhaps explain the phenomenon.
This appears to be expected behaviour according to the MySQL documentation, which states that metadata is stored in utf8 (the non-4-byte version).
It is returned to the client in character_set_results (utf8mb4); however, your virtual column name is most likely stored as utf8 to be compatible and comparable with all other metadata, and thus the 4-byte character is lost even though it is not in a real table.
See here:
https://dev.mysql.com/doc/refman/5.6/en/charset-metadata.html
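The effect described in this answer can be imitated in a few lines of Python. This is only an illustration of the idea that a three-byte character set has no slot for U+1F600. It does not reproduce MySQL's actual conversion code, and as_utf8mb3 is an invented name:

```python
def as_utf8mb3(text: str, replacement: str = "?") -> str:
    """Replace every character that needs four UTF-8 bytes,
    mimicking a lossy conversion to a three-byte character set."""
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


print(as_utf8mb3("\U0001F600"))  # the column name from the question
print(as_utf8mb3("name"))        # ordinary identifiers are unaffected
```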
I found more info by using Wireshark. See below:
[Wireshark capture: before executing set names utf8mb4]
[Wireshark capture: after executing set names utf8mb4]
In this case the server can't find a charset number, so the column name becomes a question mark. It seems not to matter which charset number is used, as long as it is not Unknown. If I execute set names latin1, the response packet info is:
[Wireshark capture: after executing set names latin1]

Retrieving latin1 encoded results with JDBC

I am trying to retrieve result sets from a MySQL database using JDBC, which are then used to generate reports in BiRT. The connection string is set up in BiRT.
The database is latin1:
SHOW VARIABLES LIKE 'c%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
| collation_connection | latin1_swedish_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
| completion_type | 0 |
| concurrent_insert | 1 |
| connect_timeout | 5 |
+--------------------------+----------------------------+
So I have been trying to correct the strange-looking encoding of the results that are returned (German characters). I thought it would make sense to use the "characterSetResults" property to retrieve the result set as "latin1", like this:
jdbc:mysql://localhost:3306/statistics?useUnicode=true&characterEncoding=latin1&characterSetResults=latin1
This connection string fails, and by elimination I have found that the property
characterSetResults=latin1
is what causes the connection to fail. The error is a long Java error which means little to me. It starts with:
org.eclipse.birt.report.data.oda.jdbc.JDBCException: There is an error in get connection, Communications link failure
Last packet sent to the server was 38 ms ago..
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.doConnect(JDBCDriverManager.java:262)
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.getConnection(JDBCDriverManager.java:186)
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.tryCreateConnection(JDBCDriverManager.java:706)
at org.eclipse.birt.report.data.oda.jdbc.JDBCDriverManager.testConnection(JDBCDriverManager.java:634)
at org.eclipse.birt.report.data.oda.jdbc.ui.util.DriverLoader.testConnection(DriverLoader.java:120)
at org.eclipse.birt.report.data.oda.jdbc.ui.util.DriverLoader.testConnection(DriverLoader.java:133)
at org.eclipse.birt.report.data.oda.jdbc.ui.profile.JDBCSelectionPageHelper.testConnection(JDBCSelectionPageHelper.java:687)
at org.eclipse.birt.report.data.oda.jdbc.ui.profile.JDBCSelectionPageHelper.access$7(JDBCSelectionPageHelper.java:655)
at org.eclipse.birt.report.data.oda.jdbc.ui.profile.JDBCSelectionPageHelper$7.widgetSelected(JDBCSelectionPageHelper.java:578)
at org.eclipse.swt.widgets.TypedListener.handleEvent(TypedListener.java:234)
If I change this to:
characterSetResults=utf8
the connection string connects without errors, but the encoding issue remains.
Does anyone know the correct way to retrieve latin1? And yes, I know UTF8 is the thing to use, but this is not my database....
Thank you for reading this,
Stephen
After some digging: have you tried characterSetResults=ISO8859_1? This is equivalent to latin1, and there is evidence that MySQL handles it much better.
I do not have a DB to test this on, but from what I read it looks to be spot-on for what you need.
When specifying character encodings on the client side, use Java-style names (see the MySQL Connector/J reference, connector-j-reference-charsets). So it is supposed to work by using jdbc:mysql://localhost:3306/statistics?useUnicode=true&characterEncoding=utf-8&characterSetResults=Cp1252
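"Strange looking" German characters are typically what results when bytes written in one encoding are decoded as another. Here is a short Python sketch of that effect, and of how close latin1 (ISO 8859-1) and Cp1252 are for German text. The helper names are hypothetical, and Python's codecs are used here only as stand-ins for the Java charset names:

```python
def misread(text: str, stored_as: str = "utf-8", read_as: str = "cp1252") -> str:
    """Show what `text` looks like when its bytes are decoded with the wrong charset."""
    return text.encode(stored_as).decode(read_as)


def differs_between_latin1_and_cp1252(text: str) -> bool:
    """True if the two charsets would produce different bytes (or one cannot encode it)."""
    try:
        return text.encode("latin-1") != text.encode("cp1252")
    except UnicodeEncodeError:
        return True


print(misread("Grüße"))                                # UTF-8 bytes read as Cp1252
print(differs_between_latin1_and_cp1252("äöüßÄÖÜ"))    # German letters: identical bytes
print(differs_between_latin1_and_cp1252("€"))          # the euro sign exists only in Cp1252
```

If the report shows output like the first line, the stored bytes are probably UTF-8 rather than latin1, which would point at the data rather than at the connection string.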