MySQL 5.6 create view with unicode character set - mysql

In MySQL 5.6, I can't get a string constant within a view to be stored correctly in a database whose default character set is UCS2. The same code works fine on 5.7.
I've created a minimal reproducible example below.
DROP SCHEMA IF EXISTS test3;
CREATE SCHEMA test3 CHARACTER SET ucs2;
CONNECT test3;
CREATE TABLE testtable (
testname VARCHAR(15)
);
INSERT INTO testtable( testname ) VALUES ('foo');
INSERT INTO testtable( testname ) VALUES ('bar');
CREATE OR REPLACE VIEW testview AS
SELECT * FROM testtable
WHERE testname = 'foo';
SELECT * FROM testview;
^^^ This select statement returns no results.
MySQL [test3]> show create view testview \G
*************************** 1. row ***************************
View: testview
Create View: CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost`
SQL SECURITY DEFINER VIEW `testview` AS select `testtable`.`testname` AS
`testname` from `testtable` where (`testtable`.`testname` = '\0\0\0f\0\0\0o\0\0\0o')
character_set_client: utf8
collation_connection: utf8_general_ci
What is that, utf32??
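To make the byte pattern in the view definition easier to reason about, here is a small Python sketch (not MySQL code) that reproduces it in two ways. It only shows that both readings yield the same twelve bytes for an ASCII literal; which of them MySQL 5.6 actually performs is an assumption that this sketch does not settle. Python's utf-16-be codec is used as a stand-in for ucs2, which is equivalent for characters in the Basic Multilingual Plane.

```python
# The literal as it appears in SHOW CREATE VIEW, written out as raw bytes.
stored = b"\x00\x00\x00f\x00\x00\x00o\x00\x00\x00o"

# Reading 1: 'foo' encoded directly as big-endian UTF-32.
as_utf32 = "foo".encode("utf-32-be")

# Reading 2: 'foo' converted to UCS-2 once, then every resulting byte
# treated as a one-byte character and converted to UCS-2 a second time.
once = "foo".encode("utf-16-be")  # b'\x00f\x00o\x00o'
twice = once.decode("latin-1").encode("utf-16-be")

print(as_utf32 == stored)
print(twice == stored)
```

Because the two readings are indistinguishable for plain ASCII, a literal containing a non-ASCII character would be needed to tell them apart.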
The following does work, but I don't want to write the collation directly into the statement, as this needs to be portable code and the syntax looks non-standard:
CREATE OR REPLACE VIEW testview AS
SELECT * FROM testtable
WHERE testname = 'foo' COLLATE utf8_general_ci;
I have tried setting the client, connection, and server character sets to ucs2 and utf16 but this changed nothing. Likewise with the collations to *_general_ci.
Any ideas?
Edit:
MySQL [test3]> show variables like "char%";
+--------------------------+------------------------------------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | ucs2 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | C:\Program Files\MySQL\mysql-5.6.36-winx64\share\charsets\ |
+--------------------------+------------------------------------------------------------+

There is essentially no reason to ever use ucs2, utf16 or utf32 in MySQL tables. Use utf8mb4 only. (Or utf8 if you have an old version of MySQL.)
Please provide SHOW VARIABLES LIKE "char%"; Certain things should not be changed:
mysql> SHOW VARIABLES LIKE "char%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary | <--
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 | <--
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
When you created the view, you did not set the charset. I can see that from your SHOW when it said:
character_set_client: utf8

Related

Does mysql latin1 also support emoji character?

Because of the behaviour below, I feel I don't understand character sets at all. At first I thought that only utf8mb4 supports emoji characters such as 😀.
See below:
As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and supports supplementary characters
But I accidentally found the following:
mysql> show variables like 'character%';
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
mysql> show create table t4\G
*************************** 1. row ***************************
Table: t4
Create Table: CREATE TABLE `t4` (
`data` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> insert into t4 select '\U+1F600';
mysql> select * from t4;
+------+
| data |
+------+
| 😀 |
+------+
Now I'm very confused: it seems latin1 can also store emoji characters. I know it must be an illusion, but I don't know how to explain it.
You cannot store anything other than ISO-8859-1 characters in a latin1 field without first converting it to something like base64.
It might appear to work, but it will fail later at some point, especially with multibyte characters such as emoji.
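The reason it appears to work can be sketched in Python. This is an illustration only, under two assumptions: the terminal sends and displays UTF-8, and Python's latin-1 codec is a close enough stand-in for MySQL's latin1. Since client, connection and column are all declared latin1, no conversion takes place and the bytes pass through unchanged in both directions.

```python
emoji = "\U0001F600"  # 😀

# What a UTF-8 terminal actually sends for the emoji: four bytes.
sent = emoji.encode("utf-8")

# The server was told the client speaks latin1, so it sees four
# separate one-byte characters rather than one emoji.
stored = sent.decode("latin-1")

# On SELECT the same four characters go back out as the same four bytes,
# and the UTF-8 terminal renders them as the emoji again.
returned = stored.encode("latin-1")
shown = returned.decode("utf-8")

print(len(stored))     # number of characters the server thinks it holds
print(shown == emoji)  # the round trip looks lossless on screen
```

The column really holds four latin1 characters, so length checks, sorting, comparisons and any later conversion to another character set operate on those four characters and not on the emoji.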

Why after executing set names utf8mb4, the column name changes to question mark?

Why after executing set names utf8mb4, the column name changes to question mark? See below:
mysql> show variables like 'character%' ;
+--------------------------+---------------------------------------+
| Variable_name | Value |
+--------------------------+---------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/mysql/server-5.6/share/charsets/ |
+--------------------------+---------------------------------------+
mysql> select '\U+1F600';
+------+
| 😀 |
+------+
| 😀 |
+------+
mysql> set names utf8mb4;
mysql> select '\U+1F600';
+------+
| ? |
+------+
| 😀 |
+------+
As I understand it, utf8mb4 is designed to support emoji characters. Why does the column name change to a question mark after switching to utf8mb4?
In addition, I copied the emoji character from a website (http://getemoji.com/) and pasted it into the terminal. If I type '\U+1F600' manually instead, I get this:
mysql> select '\U+1F600' ;
+---------+
| U+1F600 |
+---------+
| U+1F600 |
+---------+
So I guess something happens implicitly when I paste it into the terminal, and this implicit conversion (😀 --> '\U+1F600') might explain the phenomenon.
This appears to be expected behaviour according to the MySQL documentation, which states that metadata is stored in utf8 (the 3-byte version, not utf8mb4).
The column name is returned to the client in character_set_results (utf8mb4), but most likely the name of your virtual column is held in utf8 so that it is compatible and comparable with all other metadata. The 4-byte character is therefore lost, even though the column does not belong to a real table.
See here:
https://dev.mysql.com/doc/refman/5.6/en/charset-metadata.html
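As a rough illustration of the 3-byte limit mentioned above, the following Python sketch checks whether a string could be represented in a UTF-8 variant capped at three bytes per character. The helper name fits_in_three_byte_utf8 is made up for this example and is not part of MySQL or of any library.

```python
def fits_in_three_byte_utf8(text: str) -> bool:
    """Return True if every character needs at most three bytes in UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(fits_in_three_byte_utf8("testname"))    # plain ASCII column name
print(fits_in_three_byte_utf8("\U0001F600"))  # 😀 needs four bytes
```

Any character above U+FFFF, which includes most emoji, needs four bytes and so fails this check.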
I found more information by using Wireshark. See below:
Before executing set names utf8mb4
After executing set names utf8mb4
In this case the server can't find a charset number, so the column name becomes a question mark. It seems that the particular charset number does not matter, as long as it is not Unknown. If I execute set names latin1, the response packet info is:

MySQL UTF8 Issue

Okay, I have been trying to import a "CSV" file into MySQL for the past 24 hours but have failed miserably.
I have tried set names and set charset, and there is nothing left that I have not set to UTF8, but it is still not working. I did this not just for the DB and tables but for the server as well, to no avail.
I am importing directly into MySQL, so it is not a PHP issue. I will be grateful if anyone can point out where I am going wrong.
mysql> SHOW CREATE DATABASE `dict_2`;
+----------+-----------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------------------------------+
| dict_2 | CREATE DATABASE `dict_2` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */ |
+----------+-----------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show variables like "%character%"; show variables like "%collation%";
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
In its current form, this question is impossible to answer.
We're left guessing that:
You're using a MySQL LOAD DATA statement.
You've verified that the character set encoding of the .csv file is not ucs2.
You've verified that the character set encoding of the .csv file is utf8 (i.e. matches the character_set_database system variable), or that you've specified the appropriate character set in the CHARACTER SET clause of the LOAD DATA statement.
Beyond that, there's a whole slew of other things that might be wrong, but we're still just guessing.
Very frequently when something in MySQL "fails miserably", there's some sort of indication, like an error message, or some other behavior that we can observe and describe.
In the question, the description of the failure mode is beyond vague; it's entirely non-existent.
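One way to check the file's encoding before blaming MySQL is a quick script outside the database. Below is a minimal Python sketch; the function name describe_encoding and its return strings are invented for this example. It only separates three coarse cases: a UTF-16 style byte order mark (typical of ucs2/utf16 exports), bytes that decode cleanly as UTF-8, and everything else.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Coarsely classify raw file bytes by encoding."""
    # A UTF-16 byte order mark suggests a ucs2/utf16 export, not utf8.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16/ucs2 (BOM found)"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"


# Sample rows encoded three different ways.
print(describe_encoding("caf\u00e9,1\n".encode("utf-8")))
print(describe_encoding("caf\u00e9,1\n".encode("latin-1")))
print(describe_encoding("caf\u00e9,1\n".encode("utf-16")))
```

For a real file, the bytes would come from opening it in binary mode. A file containing only ASCII passes the UTF-8 check as well, so a "utf-8" result does not prove how non-ASCII rows would have been encoded.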

COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

I am trying to fix a character encoding issue. Previously the collation for this column was set to utf8_general_ci, which caused issues because it is accent insensitive.
I'm trying to find all the entries in the database that could have been affected.
set names utf8;
select * from table1 t1 join table2 t2 on (t1.pid=t2.pid and t1.id != t2.id) collate utf8_general_ci;
However, this generates the error:
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'
The database is now defined with DEFAULT CHARACTER SET utf8
The table is defined with CHARSET=utf8
The "pid" column is defined with: CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
The server version is Server version: 5.5.37-MariaDB-0ubuntu0.14.04.1 (Ubuntu)
Question: Why am I getting an error about latin1 when latin1 doesn't seem to be present anywhere in the table / schema definition?
MariaDB [(none)]> SHOW VARIABLES LIKE '%char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
MariaDB [(none)]> SHOW VARIABLES LIKE '%collation%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | latin1_swedish_ci |
+----------------------+-------------------+
First, run this query:
SHOW VARIABLES LIKE '%char%';
You have character_set_server='latin1' shown in your post ...
So, go into your my.cnf and add or uncomment these lines:
character-set-server = utf8
collation-server = utf8_unicode_ci
Restart the server.
The same error is produced in MariaDB (10.1.36-MariaDB) by combining parentheses with the COLLATE clause. My SQL was different but the error was the same. I had:
SELECT *
FROM table1
WHERE (field = 'STRING') COLLATE utf8_bin;
Omitting the parentheses solved it for me:
SELECT *
FROM table1
WHERE field = 'STRING' COLLATE utf8_bin;
In my case I created a database and gave the collation 'utf8_general_ci' but the required collation was 'latin1'. After changing my collation type to latin1_bin the error was gone.

Rails show question marks(????) for my input utf8 data

I have set every encoding set variable I can figure out to utf8.
In database.yml:
development: &development
  adapter: mysql2
  encoding: utf8
In my.cnf:
[client]
default-character-set = utf8
[mysqld]
default-character-set = utf8
skip-character-set-client-handshake
character-set-server = utf8
collation-server = utf8_general_ci
init-connect = SET NAMES utf8
And if I run mysql client in terminal:
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
mysql> show variables like 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
But it is all in vain. When I insert utf8 data from the Rails app, it ends up stored as ????????????.
What do I miss?
Don't check the global settings; check the settings while you are connected to your application's specific database. When you change the settings for MySQL, you also have to change the settings for your app database.
A simple way to check is to log in to MySQL using the app db:
mysql app_db_production -u db_user -p
or rails command:
rails dbconsole production
For my app it looks like this:
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
mysql> show variables like 'collation%';
+----------------------+-------------------+
| Variable_name | Value |
+----------------------+-------------------+
| collation_connection | utf8_general_ci |
| collation_database | latin1_swedish_ci |
| collation_server | utf8_general_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)
Command for changing database collation and charset:
mysql> alter database app_db_production CHARACTER SET utf8 COLLATE utf8_general_ci ;
Query OK, 1 row affected (0.00 sec)
And remember to change the charset and collation for all your tables:
ALTER TABLE tablename CHARACTER SET utf8 COLLATE utf8_general_ci; # changes only the table default used for newly added columns
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; # converts existing columns and their data
Now it should work.
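To see why a latin1 database or column turns such input into question marks, here is a small Python analogy (not MySQL code). It assumes the server substitutes '?' for every character that the target character set cannot represent, and uses Python's latin-1 codec with errors="replace" to imitate that. The function name to_single_byte_column is made up for this sketch.

```python
def to_single_byte_column(text: str) -> bytes:
    """Imitate storing text in a latin1 column: unmappable characters become '?'."""
    return text.encode("latin-1", errors="replace")


print(to_single_byte_column("caf\u00e9"))           # representable in latin1
print(to_single_byte_column("\u65e5\u672c\u8a9e"))  # three CJK characters
```

The substitution is lossy: once the question marks are stored, converting the table to utf8 afterwards does not bring the original characters back, so the affected rows have to be inserted again.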
I had the same problem. I added characterEncoding to the end of mysql connection string:
use this: jdbc:mysql://localhost/dbname?characterEncoding=utf8
instead of this: jdbc:mysql://localhost/dbname
Okay, for anybody else for whom @Ravbaker's answer does not cut it, here are some more tips.
MySQL has the encoding specified at multiple levels: server, database, connection, table and even field/column. My problem was that the field/column was forced to latin (which overrides all the other encodings). I set the field back to the table encoding (which was utf-8) and the world was good again.
Most of these settings can be set at the usual places: my.cnf, alter queries and rails database.yml file.
ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;
was the query which did the trick for me.
For server / connection encodings use my.cnf and database.yml
For database / table / column encodings use queries
(You can also achieve these by other means)
Do you have this in the HTML?
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
or on HTML5 pages with <!doctype html>
<meta charset="utf-8">
You may need this to let the browser send strings in utf8.
I had the same problem today. I solved it by dropping my database and creating a new one, then running db:migrate, and now everything works.
WARNING: THIS WILL DELETE ALL YOUR DATA IN THIS DATABASE
So:
$ mysql -u USER -p
mysql > drop database YOURDB_NAME_development;
mysql > create database YOURDB_NAME_development CHARACTER SET utf8 COLLATE utf8_general_ci;
mysql > \q
$ rake db:migrate
Well done!