COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'binary'? - mysql

mysql> SELECT LOCATE("n", "München") COLLATE utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'binary'
How do I get rid of this error?
What I already tried (copy&paste):
$ mysql -u admin -p $DATABASE
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.69 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SELECT LOCATE("n", "München") COLLATE utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'binary'
mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT LOCATE("n", "München") COLLATE utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'binary'
mysql> SELECT LOCATE(_utf8"n", _utf8"München") COLLATE utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'binary'
mysql> SHOW VARIABLES LIKE "character_set_database";
+------------------------+-------+
| Variable_name | Value |
+------------------------+-------+
| character_set_database | utf8 |
+------------------------+-------+
1 row in set (0.00 sec)

Possibly the server has been compiled with a default character set of binary, so that string literals are being interpreted as such, or the client is set to use a binary mode when communicating with the server. You can change the client and connection character set by calling SET NAMES utf8 (though this is not recommended if your SQL statements are being issued from PHP, for example, as PHP will have its own commands for setting the connection character set). See Connection Character Sets and Collations in the MySQL reference manual.
Alternatively you can use "introducers" to specify explicitly the charset used for the string literals in your LOCATE function, for instance:
LOCATE(_utf8"n", _utf8"München")
See the reference manual page Character String Literal Character Set and Collation for more details.

The COLLATE in my example applies to the return value of LOCATE, not to
its arguments. LOCATE returns an integer, which has the character set binary, hence the error.
To set the collation of the arguments:
mysql> SELECT LOCATE(_utf8"n" COLLATE utf8_general_ci,
_utf8"München" COLLATE utf8_general_ci) AS locate;
+--------+
| locate |
+--------+
| 3 |
+--------+
1 row in set (0.00 sec)
My motivation actually was finding out whether MySQL takes the collation
into account when searching for the substring. Unfortunately it does
not. See the result of the second command:
mysql> SELECT LOCATE(_utf8"ü" COLLATE utf8_general_ci,
_utf8"München" COLLATE utf8_general_ci) AS locate;
+--------+
| locate |
+--------+
| 2 |
+--------+
1 row in set (0.00 sec)
mysql> SELECT LOCATE(_utf8"u" COLLATE utf8_general_ci,
_utf8"München" COLLATE utf8_general_ci) AS locate;
+--------+
| locate |
+--------+
| 0 |
+--------+
1 row in set (0.00 sec)
Test with a temporary table (collation taken into account in WHERE clause, but not in
LOCATE):
mysql> CREATE TEMPORARY TABLE test
(text VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci);
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO test VALUES("München");
Query OK, 1 row affected (0.00 sec)
mysql> SELECT text FROM test WHERE text LIKE "%u%";
+---------+
| text |
+---------+
| München |
+---------+
1 row in set (0.00 sec)
mysql> SELECT LOCATE("u", text) AS locate FROM test WHERE text LIKE "%u%";
+--------+
| locate |
+--------+
| 0 |
+--------+
1 row in set (0.01 sec)
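For comparison, here is a minimal Python sketch of what an accent-insensitive substring search would return for the same strings. It is only a rough approximation of a collation such as utf8_general_ci, not MySQL's actual algorithm, and the helper names fold and locate_folded are made up for illustration. Positions can shift whenever folding changes the length of the string.

```python
import unicodedata

CITY = "M\u00fcnchen"  # "München" with a precomposed u-umlaut

def fold(text: str) -> str:
    """Strip combining marks and case (rough stand-in for an accent-insensitive collation)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def locate_folded(needle: str, haystack: str) -> int:
    """1-based position of needle in haystack after folding, 0 if absent (like LOCATE)."""
    return fold(haystack).find(fold(needle)) + 1

print(locate_folded("u", CITY))  # 2: the umlaut is folded to a plain "u"
print(CITY.find("u") + 1)        # 0: exact code point search, same as the LOCATE result above
```

If the position has to come from the server itself, this kind of folding would need to happen on both arguments before LOCATE is called.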

I know this is late, but I hope it helps someone. I kept getting the same error and I knew my charsets and collations were fine.
Check for '#' symbols in your statement that don't belong. I was testing my stored procedure out as a select statement with variables, then when creating the stored proc forgot to remove the '#' symbols. Needless to say, I felt very silly.
I also know this doesn't seem to be the case in this question but this is my first SO post and I don't have enough rep to do much else, so I apologize.

Related

Mariadb query utf8 escaped string

I am using 5.5.65-MariaDB MariaDB Server.
I have a table with a column of type MEDIUMTEXT, named "remoteData", where I store a JSON string.
String values in this JSON string are stored as escaped Unicode sequences (\uXXXX), for example
"patientFirstName":"\u0395\u039b\u0395\u03a5\u0398\u0395\u03a1\u0399\u039f\u03a3"
The above value is the Greek Name "ΕΛΕΥΘΕΡΙΟΣ".
I am trying to search this column using the query
Select * from sync_details where remoteData like "%ΛΕΥΘΕΡ%"
but I get an empty set.
I assume this is because of the values being escaped but I don't know what to do.
EDIT: The query will run through php so we can use a solution that includes php functions.
Thank you in advance.
Christoforos
With a database defined to use CHARACTER SET utf8 and a utf8_general_ci collation it should just work like this:
CREATE DATABASE IF NOT EXISTS `test` CHARACTER SET utf8 COLLATE utf8_general_ci;
CREATE TABLE `test`.`sync_details` (`remoteData` MEDIUMTEXT);
INSERT INTO `test`.`sync_details` (`remoteData`) VALUES ('{"patientFirstName":"\\u0395\\u039b\\u0395\\u03a5\\u0398\\u0395\\u03a1\\u0399\\u039f\\u03a3"}');
SELECT `remoteData` FROM `test`.`sync_details` WHERE `remoteData` LIKE '%ΛΕΥΘΕΡ%';
+----------------------------------------------+
| remoteData |
+----------------------------------------------+
| {"patientFirstName": "ΕΛΕΥΘΕΡΙΟΣ"} |
+----------------------------------------------+
1 row in set (0,00 sec)
You could also try JSON_EXTRACT to get structured data from the stored JSON object. I just tested it like this:
SELECT JSON_EXTRACT(`remoteData`, "$.patientFirstName")
FROM `test`.`sync_details`
WHERE JSON_EXTRACT(`remoteData`, "$.patientFirstName")
LIKE '%ΛΕΥΘΕΡ%';
+--------------------------------------------------+
| JSON_EXTRACT(`remoteData`, "$.patientFirstName") |
+--------------------------------------------------+
| "ΕΛΕΥΘΕΡΙΟΣ" |
+--------------------------------------------------+
1 row in set (0,00 sec)
To index data in the JSON object you could add a "Generated Column" to your table using the GENERATED ALWAYS syntax
ALTER TABLE `test`.`sync_details` ADD COLUMN `firstName` VARCHAR(100) GENERATED ALWAYS AS (`remoteData` ->> '$.patientFirstName');
CREATE INDEX `firstnames_idx` ON `test`.`sync_details`(`firstName`);
SELECT `firstName` FROM `test`.`sync_details` WHERE `firstName` LIKE '%ΛΕΥΘΕΡ%';
+----------------------+
| firstName |
+----------------------+
| ΕΛΕΥΘΕΡΙΟΣ |
+----------------------+
1 row in set (0,00 sec)
This will only work with MariaDB >= 10.2 and with a utf8 encoded db and a utf8_general_ci collation.
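If the column really contains the literal \uXXXX escapes, as the question describes, another option is to escape the search term the same way on the client before building the LIKE pattern. The question mentions PHP; the sketch below uses Python only to show the idea. The function name is hypothetical, and it assumes the stored escapes use the same form that json.dumps produces (lowercase hex, as in the sample row).

```python
import json

def like_pattern_for_escaped_json(term: str) -> str:
    """Build a LIKE pattern that matches `term` as stored with JSON \\uXXXX escapes."""
    # json.dumps escapes non-ASCII characters as \uXXXX; strip the surrounding quotes.
    escaped = json.dumps(term, ensure_ascii=True)[1:-1]
    # LIKE treats backslash, % and _ specially, so escape them (backslash first).
    for special in ("\\", "%", "_"):
        escaped = escaped.replace(special, "\\" + special)
    return "%" + escaped + "%"

pattern = like_pattern_for_escaped_json("\u039b\u0395\u03a5\u0398\u0395\u03a1")  # "ΛΕΥΘΕΡ"
print(pattern)
# Hypothetical usage with a DB-API driver, passing the pattern as a bound parameter:
# cursor.execute("SELECT * FROM sync_details WHERE remoteData LIKE %s", (pattern,))
```

Passing the pattern as a bound parameter avoids a further round of escaping for the SQL string literal.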

How to change character encoding for column in mysql table

I have a table with data already in it. I would like to change the character encoding for one of the columns. Currently the column seems to have two encodings. Even after changing it, I see the same results.
Current Encoding
mysql> SELECT character_set_name FROM information_schema.`COLUMNS`
-> WHERE table_name = "mytable"
-> AND column_name = "my_col";
+--------------------+
| character_set_name |
+--------------------+
| latin1 |
| utf8 |
+--------------------+
2 rows in set (0.02 sec)
Changing the encoding (0 rows are affected)
mysql> ALTER TABLE mytable MODIFY my_col LONGTEXT CHARACTER SET utf8;
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
You probably get two rows because there are two tables named mytable in two different databases.
Run SELECT * ... instead of SELECT character_set_name ... to see which row belongs to which database.
ALTER TABLE mytable MODIFY my_col LONGTEXT CHARACTER SET utf8; is safe only if there are no values in mytable.my_col yet.
A table declared to be latin1, and containing latin1 bytes can be converted to utf8 via
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8;

How do I insert Chinese characters into MySQL from a script?

I have the following script
set username utf8;
insert into tables values ('Active','活跃')
However, after the script runs, the inserted value for the Chinese characters is
活跃
What did I miss here?
You have to change the collation to a Chinese one, for example by using the Big5 character set:
CREATE TABLE big5 (BIG5 CHAR(1) CHARACTER SET BIG5);
mysql> INSERT INTO big5 VALUES (0xf9dc);
mysql> SELECT * FROM big5;
+------+
| big5 |
+------+
| 嫺 |
+------+
See also the MySQL documentation on the Big5 Chinese character set.
In the database, also set the collation of the fields to utf8_general_ci. If you can set the collation of the database and tables to utf8_general_ci as well, that is even better.
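The garbled value in the question looks like the classic pattern of UTF-8 bytes being read as a single-byte Western encoding somewhere between the script and the table. A small Python sketch reproduces it, assuming the wrong encoding is cp1252 (which MySQL's latin1 is based on). The helper names are made up for illustration.

```python
def mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode as UTF-8, then misread the bytes as a single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)

def repair(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo the misreading, assuming no bytes were lost along the way."""
    return garbled.encode(wrong_encoding).decode("utf-8")

original = "\u6d3b\u8dc3"         # the two Chinese characters from the INSERT
garbled = mojibake(original)
print(garbled)                    # the six Latin characters shown in the question
print(repair(garbled) == original)  # True
```

If this matches what you see, the data is most likely being sent or displayed with the wrong connection character set, rather than the table being unable to store Chinese. Note that a few byte values are undefined in Python's cp1252 codec, so this round trip does not work for every string.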

utf8mb4 characters not surviving "LOAD DATA INFILE"

I have a csv file containing some characters that lie outside Unicode BMP, for example the character 🀀. They are SMP characters, so they need to be stored in utf8mb4 charset and utf8mb4_general_ci collation in MySQL instead of utf8 charset and utf8_general_ci collation.
So here are my SQL queries.
MariaDB [tweets]> set names 'utf8mb4';
Query OK, 0 rows affected (0.01 sec)
MariaDB [tweets]> create table test (a text) collate utf8mb4_general_ci;
Query OK, 0 rows affected (0.06 sec)
MariaDB [tweets]> insert into test (a) values ('🀀');
Query OK, 1 row affected (0.03 sec)
MariaDB [tweets]> select * from test;
+------+
| a |
+------+
| 🀀 |
+------+
1 row in set (0.00 sec)
No warnings. Everything is right. Now I want to load that csv file. For test, the file has only one line.
MariaDB [tweets]> load data local infile 't.csv' into table wzyboy character set utf8mb4 fields terminated by ',' enclosed by '"' lines terminated by '\n\n' (tweet_id,in_reply_to_status_id,in_reply_to_user_id,retweeted_status_id,retweeted_status_user_id,timestamp,source,text,expanded_urls);
Query OK, 1 row affected, 7 warnings (0.01 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 7
The warning message is:
| Warning | 1366 | Incorrect string value: '\xF0\x9F\x80\x80' for column 'text' at row 1 |
All my working environments (OS, terminal, etc.) use UTF-8. I have specified utf8mb4 in every place I could think of, and if I manually INSERT INTO it works just fine. However, when I use LOAD DATA INFILE [...] CHARACTER SET utf8mb4 [...] it fails with the warning "Incorrect string value".
Problem solved.
It was my mistake. During the experiment I only ran TRUNCATE TABLE and did not re-create the table. So the database and the table were both utf8mb4, but the columns were still utf8...
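To check up front which values need a utf8mb4 column, you can look for code points outside the BMP. This is a minimal Python sketch; the function name needs_utf8mb4 is made up for illustration. The byte sequence it prints for the tile character is the same one reported in the warning above.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if text contains a code point outside the BMP (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)

tile = "\U0001F000"  # the Mahjong tile character from the question

print(tile.encode("utf-8"))       # b'\xf0\x9f\x80\x80'
print(needs_utf8mb4(tile))        # True
print(needs_utf8mb4("M\u00fcnchen"))  # False: every character fits in 3 bytes or fewer
```

A column declared as utf8 in MySQL or MariaDB stores at most 3 bytes per character, so any value for which this returns True will trigger the warning.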

MySQL LIKE operator with wildcard and backslash

I'm frustrated with MySQL's pattern escaping in the LIKE operator.
root#dev> create table foo(name varchar(255));
Query OK, 0 rows affected (0.02 sec)
root#dev> insert into foo values('with\\slash');
Query OK, 1 row affected (0.00 sec)
root#dev> insert into foo values('\\slash');
Query OK, 1 row affected (0.00 sec)
root#dev> select * from foo where name like '%\\\\%';
Empty set (0.01 sec)
root#dev> select * from foo;
+------------+
| name |
+------------+
| with\slash |
| \slash |
+------------+
2 rows in set (0.00 sec)
root#dev> select * from foo where name like '%\\\\%';
Empty set (0.00 sec)
root#dev> select * from foo where name like binary '%\\\\%';
+------------+
| name |
+------------+
| with\slash |
| \slash |
+------------+
2 rows in set (0.00 sec)
According to MySQL docs: http://dev.mysql.com/doc/refman/5.5/en/string-comparison-functions.html#operator_like
%\\\\% is the right operand, so why does it yield no result?
EDIT:
The database I'm testing that in has character_set_database set to utf8. To further my investigation, I created the same setup in a database that has character_set_database set to latin1, and guess what, '%\\\\%' works!
EDIT:
The problem can be reproduced and it's the field collation problem. Details: http://bugs.mysql.com/bug.php?id=63829
In MySQL 5.6.10, with the text field collation utf8mb4_unicode_520_ci this can be achieved by using 5 backslash characters instead of 4, i.e:
select * from foo where name like binary '%\\\\\%';
Somehow, against all expectations, this properly finds all rows with backslashes.
At least this should work until the MySQL field collation bug above is fixed. Considering that it has been more than 5 years since the bug was discovered, any app designed with this may outlive its usefulness before MySQL is even fixed, so it should be a pretty reliable workaround.
With MySQL 5.0.12 dev on Windows 10 I got the following results when I changed the query from
SELECT * FROM `foo` WHERE `name` LIKE '%http:\/\/%'
to
SELECT * FROM `foo` WHERE `name` LIKE '%http:\\\\\\\%'
it works, even though the original field content is the first string, with forward slashes. MySQL seems to have interpreted the forward slashes as backslashes.
It seems it has some relation to that MySQL bug: http://bugs.mysql.com/bug.php?id=46659
I think you are connecting to mysql without specifying the correct --character-set-server option (which defaults to latin1 with collation latin1_swedish_ci), while the console uses utf-8 as its current charset. That causes incorrect character conversions and comparisons when you deal with data that is supposed to be converted to utf8 from the charset given by --character-set-server.
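As background on why the documentation calls for four backslashes: the pattern is unescaped twice, once by the string-literal parser and once by LIKE. The Python sketch below only builds the literal. The helper names are hypothetical, and it assumes the default sql_mode (NO_BACKSLASH_ESCAPES not set). It does not model the collation bug discussed above.

```python
def escape_for_like(term: str) -> str:
    """Escape the characters LIKE treats specially (backslash first)."""
    for special in ("\\", "%", "_"):
        term = term.replace(special, "\\" + special)
    return term

def as_sql_literal(value: str) -> str:
    """Quote a value as a MySQL string literal (backslash escapes enabled)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"

# Match any value containing one literal backslash.
like_level = "%" + escape_for_like("\\") + "%"  # two backslashes for LIKE
literal = as_sql_literal(like_level)             # doubled again for the literal

print(literal)  # '%\\\\%'
```

With a bound parameter instead of a string literal, only the LIKE-level escaping is needed, which leaves two backslashes in the value you pass.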