How to fix UTF-8 double-encoded data in a large MySQL database

I have approximately 70 databases, each containing 750+ tables (with exactly the same structure) and a lot of data. The problem is that only a few databases are set to utf8 and the others are latin1, so the latin1 databases have saved double-encoded values such as æŽ¥è¿‘åˆå ± for 接近初報.
So I want to convert all my databases to utf8mb4 so that they store data correctly, but this obviously requires converting the existing double-encoded data to utf8mb4 as well.
I have the following SQL query to convert the data:
UPDATE tbl SET col = IFNULL(CONVERT(CONVERT(CONVERT(col USING latin1) USING binary) USING utf8), col);
But the problem is that my databases are very large, and converting the data to utf8 this way will take a lot of time. Is there an easy way to update the data for a whole database in one go, or some other easy approach?
Many thanks

You really should be using utf8mb4 for Chinese; some Chinese characters are not representable in MySQL's 3-byte utf8.
A slightly shorter expression:
CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4)
Which case applies? See http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases. You probably need the third of these:
CHARACTER SET latin1, but with utf8 bytes in it; leave the bytes alone while fixing the charset:
First, let's assume you have this declaration for tbl.col:
col VARCHAR(111) CHARACTER SET latin1 NOT NULL
Then, to convert the column without changing the bytes:
ALTER TABLE tbl MODIFY COLUMN col VARBINARY(111) NOT NULL;
ALTER TABLE tbl MODIFY COLUMN col VARCHAR(111) CHARACTER SET utf8mb4 NOT NULL;
Note: If you start with TEXT, use BLOB as the intermediate definition. Since ALTER needs to know all the details (size, nullness, etc.), it is quite messy to generate the ALTERs dynamically.
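It is messy, but for the common case information_schema can at least draft the statements. This is a sketch under stated assumptions: it only handles latin1 VARCHAR columns of base tables, it carries over the length and NOT NULL but not DEFAULT values, COMMENTs or non-default collations, and it skips TEXT columns (which would need BLOB as the intermediate type). The output is meant to be reviewed before running; widening columns to utf8mb4 can also run into row-size or index-length limits.

```sql
-- Draft the two-step ALTER pair for every latin1 VARCHAR column of a base
-- table outside the system schemas. This only prints statements.
SELECT CONCAT(
         'ALTER TABLE `', c.table_schema, '`.`', c.table_name,
         '` MODIFY COLUMN `', c.column_name,
         '` VARBINARY(', c.character_maximum_length, ')',
         IF(c.is_nullable = 'NO', ' NOT NULL', ' NULL'), ';\n',
         'ALTER TABLE `', c.table_schema, '`.`', c.table_name,
         '` MODIFY COLUMN `', c.column_name,
         '` VARCHAR(', c.character_maximum_length, ') CHARACTER SET utf8mb4',
         IF(c.is_nullable = 'NO', ' NOT NULL', ' NULL'), ';'
       ) AS alter_stmts
FROM information_schema.columns c
JOIN information_schema.tables t
  ON t.table_schema = c.table_schema AND t.table_name = c.table_name
WHERE t.table_type = 'BASE TABLE'
  AND c.character_set_name = 'latin1'
  AND c.data_type = 'varchar'
  AND c.table_schema NOT IN ('mysql', 'information_schema',
                             'performance_schema', 'sys');
```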
CHARACTER SET utf8mb4 with double-encoding:
UPDATE tbl SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4);
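Because the UPDATE rewrites every row, it may be worth comparing the current and converted values on a sample first. A minimal check, reusing the tbl.col placeholder names from above:

```sql
-- Current value next to what the UPDATE would store.
-- Values that were NOT double-encoded can come out damaged
-- (e.g. as '?' characters), so they are worth spotting before updating.
SELECT col,
       CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4) AS fixed
FROM tbl
LIMIT 20;
```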
CHARACTER SET latin1 with double-encoding: Do the 2-step ALTER, then fix the double-encoding.
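For the example column above, the latin1-with-double-encoding case could be rehearsed on a scratch copy before touching the real table. A sketch: tbl, col and VARCHAR(111) NOT NULL are the placeholders used above, and tbl_test is a hypothetical name.

```sql
-- Work on a scratch copy first.
CREATE TABLE tbl_test LIKE tbl;
INSERT INTO tbl_test SELECT * FROM tbl;

-- Step 1: relabel the stored bytes as utf8mb4 without changing them.
ALTER TABLE tbl_test MODIFY COLUMN col VARBINARY(111) NOT NULL;
ALTER TABLE tbl_test MODIFY COLUMN col VARCHAR(111) CHARACTER SET utf8mb4 NOT NULL;

-- Step 2: undo the remaining layer of double-encoding.
UPDATE tbl_test
SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4);

-- Inspect a few rows before repeating the steps on tbl itself.
SELECT col FROM tbl_test LIMIT 20;
```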
Going through the tables:
SELECT CONCAT('UPDATE `', table_schema, '`.`', table_name, '`
SET `', column_name, '` = CONVERT(BINARY(CONVERT(`', column_name,
'` USING latin1)) USING utf8mb4);')
FROM information_schema.columns
WHERE character_set_name = 'latin1'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');
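Then copy & paste the output, but run it only after the 2-step ALTERs; otherwise the converted value is written straight back into a latin1 column. (Or write a Stored Procedure to execute the statements.)
Caveat: The SELECT may pick more tables/columns than it should.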
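A stored procedure version might look roughly like this. It is a sketch, not a tested tool: the procedure name fix_double_encoding is hypothetical, it assumes the 2-step ALTERs have already turned the affected columns into utf8mb4, and it rewrites every utf8mb4 text column of the schema it is given, so it should only be called for databases that really were double-encoded (not for the ones that were already correct utf8). The parameter and variables are declared utf8mb4 so that comparing them with information_schema names does not itself trigger a collation error.

```sql
DELIMITER //
CREATE PROCEDURE fix_double_encoding(IN p_schema VARCHAR(64) CHARACTER SET utf8mb4)
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE v_table, v_col VARCHAR(64) CHARACTER SET utf8mb4;
  -- Text-like utf8mb4 columns of base tables in the given schema.
  DECLARE cur CURSOR FOR
    SELECT c.table_name, c.column_name
    FROM information_schema.columns c
    JOIN information_schema.tables t
      ON t.table_schema = c.table_schema AND t.table_name = c.table_name
    WHERE c.table_schema = p_schema
      AND t.table_type = 'BASE TABLE'
      AND c.character_set_name = 'utf8mb4'
      AND c.data_type IN ('char', 'varchar', 'tinytext', 'text',
                          'mediumtext', 'longtext');
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  OPEN cur;
  col_loop: LOOP
    FETCH cur INTO v_table, v_col;
    IF done THEN
      LEAVE col_loop;
    END IF;
    -- Build and run one UPDATE per column (dynamic SQL needs PREPARE).
    SET @fix_sql = CONCAT('UPDATE `', p_schema, '`.`', v_table, '` SET `',
                          v_col, '` = CONVERT(BINARY(CONVERT(`', v_col,
                          '` USING latin1)) USING utf8mb4)');
    PREPARE fix_stmt FROM @fix_sql;
    EXECUTE fix_stmt;
    DEALLOCATE PREPARE fix_stmt;
  END LOOP;
  CLOSE cur;
END //
DELIMITER ;

-- Usage, one database at a time (db_01 is a placeholder):
-- CALL fix_double_encoding('db_01');
```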

Related

changing the collation of information_schema itself [duplicate]

I am getting the error below when trying to do a SELECT through a stored procedure in MySQL:
Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '='
Any idea on what might be going wrong here?
The collation of the table is latin1_general_ci and that of the column in the where clause is latin1_general_cs.
This is generally caused by comparing two strings of incompatible collation or by attempting to select data of different collation into a combined column.
The clause COLLATE allows you to specify the collation used in the query.
For example, the following WHERE clause will always give the error you posted:
WHERE 'A' COLLATE latin1_general_ci = 'A' COLLATE latin1_general_cs
Your solution is to specify a shared collation for the two columns within the query. Here is an example that uses the COLLATE clause:
SELECT * FROM `table` ORDER BY `key` COLLATE latin1_general_ci;
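Applied to the situation in the question (a latin1_general_cs column compared with a latin1_general_ci one), the COLLATE clause goes on one side of the comparison. The table t and its columns are hypothetical:

```sql
-- cs_col is latin1_general_cs, ci_col is latin1_general_ci.
-- An explicit COLLATE has coercibility 0, so it decides the collation:
-- the comparison runs case-insensitively and no "Illegal mix" error occurs.
SELECT *
FROM t
WHERE cs_col = ci_col COLLATE latin1_general_ci;
```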
Another option is to use the BINARY operator:
BINARY str is the shorthand for CAST(str AS BINARY).
Your solution might look something like this:
SELECT * FROM `table` WHERE BINARY a = BINARY b;
or,
SELECT * FROM `table` ORDER BY BINARY a;
Please keep in mind that, as pointed out by Jacob Stamm in the comments, "casting columns to compare them will cause any indexing on that column to be ignored".
For much greater detail about this collation business, I highly recommend eggyal's excellent answer to this same question.
TL;DR
Either change the collation of one (or both) of the strings so that they match, or else add a COLLATE clause to your expression.
What is this "collation" stuff anyway?
As documented under Character Sets and Collations in General:
A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.
Suppose that we have an alphabet with four letters: “A”, “B”, “a”, “b”. We give each letter a number: “A” = 0, “B” = 1, “a” = 2, “b” = 3. The letter “A” is a symbol, the number 0 is the encoding for “A”, and the combination of all four letters and their encodings is a character set.
Suppose that we want to compare two string values, “A” and “B”. The simplest way to do this is to look at the encodings: 0 for “A” and 1 for “B”. Because 0 is less than 1, we say “A” is less than “B”. What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): “compare the encodings.” We call this simplest of all possible collations a binary collation.
But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters “a” and “b” as equivalent to “A” and “B”; (2) then compare the encodings. We call this a case-insensitive collation. It is a little more complex than a binary collation.
In real life, most character sets have many characters: not just “A” and “B” but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents (an “accent” is a mark attached to a character as in German “Ö”), and for multiple-character mappings (such as the rule that “Ö” = “OE” in one of the two German collations).
Further examples are given under Examples of the Effect of Collation.
Okay, but how does MySQL decide which collation to use for a given expression?
As documented under Collation of Expressions:
In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison operation. For example, in the following cases, it should be clear that the collation is the collation of column x:
SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;
However, with multiple operands, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should the comparison use the collation of the column x, or of the string literal 'Y'? Both x and 'Y' have collations, so which collation takes precedence?
Standard SQL resolves such questions using what used to be called “coercibility” rules.
[ deletia ]
MySQL uses coercibility values with the following rules to resolve ambiguities:
Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then:
If both sides are Unicode, or both sides are not Unicode, it is an error.
If one of the sides has a Unicode character set, and another side has a non-Unicode character set, the side with Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement does not return an error:
SELECT CONCAT(utf8_column, latin1_column) FROM t1;
It returns a result that has a character set of utf8 and the same collation as utf8_column. Values of latin1_column are automatically converted to utf8 before concatenating.
For an operation with operands from the same character set but that mix a _bin collation and a _ci or _cs collation, the _bin collation is used. This is similar to how operations that mix nonbinary and binary strings evaluate the operands as binary strings, except that it is for collations rather than data types.
So what is an "illegal mix of collations"?
An "illegal mix of collations" occurs when an expression compares two strings of different collations but of equal coercibility and the coercibility rules cannot help to resolve the conflict. It is the situation described under the third bullet-point in the above quotation.
The particular error given in the question, Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '=', tells us that there was an equality comparison between two non-Unicode strings of equal coercibility. It furthermore tells us that the collations were not given explicitly in the statement but rather were implied from the strings' sources (such as column metadata).
That's all very well, but how does one resolve such errors?
As the manual extracts quoted above suggest, this problem can be resolved in a number of ways, of which two are sensible and to be recommended:
Change the collation of one (or both) of the strings so that they match and there is no longer any ambiguity.
How this can be done depends on where the string came from: Literal expressions take the collation specified in the collation_connection system variable; values from tables take the collation specified in their column metadata.
Force one string to not be coercible.
I omitted the following quote from the above:
MySQL assigns coercibility values as follows:
An explicit COLLATE clause has a coercibility of 0. (Not coercible at all.)
The concatenation of two strings with different collations has a coercibility of 1.
The collation of a column or a stored routine parameter or local variable has a coercibility of 2.
A “system constant” (the string returned by functions such as USER() or VERSION()) has a coercibility of 3.
The collation of a literal has a coercibility of 4.
NULL or an expression that is derived from NULL has a coercibility of 5.
Thus simply adding a COLLATE clause to one of the strings used in the comparison will force use of that collation.
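MySQL's COERCIBILITY() and COLLATION() functions report these values directly, which can help when diagnosing the error. A quick check; t and col are hypothetical, and the collation names returned depend on the server and connection settings:

```sql
SELECT COERCIBILITY(_latin1'abc' COLLATE latin1_general_ci); -- 0: explicit COLLATE
SELECT COERCIBILITY(col) FROM t LIMIT 1;                     -- 2: column
SELECT COERCIBILITY(USER());                                 -- 3: system constant
SELECT COERCIBILITY('abc');                                  -- 4: literal
-- Which collations would actually meet in a comparison:
SELECT COLLATION(col), COLLATION('abc') FROM t LIMIT 1;
```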
The other options, by contrast, would be terribly bad practice if they were deployed merely to resolve this error:
Force one (or both) of the strings to have some other coercibility value so that one takes precedence.
Use of CONCAT() or CONCAT_WS() would result in a string with a coercibility of 1; and (if in a stored routine) use of parameters/local variables would result in strings with a coercibility of 2.
Change the encodings of one (or both) of the strings so that one is Unicode and the other is not.
This could be done via transcoding with CONVERT(expr USING transcoding_name); or via changing the underlying character set of the data (e.g. modifying the column, changing character_set_connection for literal values, or sending them from the client in a different encoding and changing character_set_client / adding a character set introducer). Note that changing encoding will lead to other problems if some desired characters cannot be encoded in the new character set.
Change the encodings of one (or both) of the strings so that they are both the same and change one string to use the relevant _bin collation.
Methods for changing encodings and collations have been detailed above. This approach would be of little use if one actually needs to apply more advanced collation rules than are offered by the _bin collation.
Adding my 2c to the discussion for future googlers.
I was investigating a similar issue where I got the following error when using custom functions that received a varchar parameter:
Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8_general_ci,IMPLICIT) for operation '='
Using the following query:
mysql> show variables like "collation_database";
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
I was able to tell that the DB was using utf8_general_ci, while the tables were defined using utf8_unicode_ci:
mysql> show table status;
+--------------+-----------------+
| Name | Collation |
+--------------+-----------------+
| my_view | NULL |
| my_table | utf8_unicode_ci |
...
Notice that the view has a NULL collation. It appears that views and functions do have collation definitions even though this query shows NULL for the view: the collation used is the DB collation that was in effect when the view/function was created.
The sad solution was to both change the db collation and recreate the views/functions to force them to use the current collation.
Changing the db's collation:
ALTER DATABASE mydb DEFAULT COLLATE utf8_unicode_ci;
Changing the table collation:
ALTER TABLE mytable CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
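To find the views that may need recreating, information_schema.VIEWS records the client character set and connection collation that were in effect when each view was created. A sketch, with mydb as the placeholder database name used above:

```sql
-- Settings each view was stored with.
SELECT table_name, character_set_client, collation_connection
FROM information_schema.views
WHERE table_schema = 'mydb';

-- A view can then be recreated from its definition, e.g. run
-- SHOW CREATE VIEW mydb.my_view; and re-issue it as CREATE OR REPLACE VIEW
-- from a connection that uses the desired collation.
```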
I hope this will help someone.
Sometimes it can be dangerous to convert charsets, especially on databases with huge amounts of data. I think the best option is to use the "binary" operator:
e.g.: WHERE binary table1.column1 = binary table2.column1
I had a similar problem when trying to use the FIND_IN_SET function with a string variable:
SET @my_var = 'string1,string2';
SELECT * from my_table WHERE FIND_IN_SET(column_name,@my_var);
and was receiving the error:
Error Code: 1267. Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8_general_ci,IMPLICIT) for operation 'find_in_set'
Short answer:
There is no need to change any collation_YYYY variables; just add the correct collation to your variable assignment, i.e.:
SET @my_var = 'string1,string2' COLLATE utf8_unicode_ci;
SELECT * from my_table WHERE FIND_IN_SET(column_name,@my_var);
Long answer:
I first checked the collation variables:
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
Then I checked the table collation:
mysql> SHOW CREATE TABLE my_table;
CREATE TABLE `my_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`column_name` varchar(40) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=125 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
This means that my variable was configured with the default collation of utf8_general_ci while my table was configured as utf8_unicode_ci.
By adding the COLLATE clause to the variable assignment, the variable's collation matched the collation configured for the table.
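To confirm that the variable now carries the column's collation, COLLATION() can be checked on both sides. This reuses the names from the example above and assumes a utf8 connection, as in the SHOW VARIABLES output (on MySQL 8.0 the name may be reported as utf8mb3_unicode_ci):

```sql
SET @my_var = 'string1,string2' COLLATE utf8_unicode_ci;
-- Both values should report the same collation before FIND_IN_SET runs.
SELECT COLLATION(@my_var), COLLATION(column_name)
FROM my_table
LIMIT 1;
```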
The following solution worked for me:
CONVERT( Table1.FromColumn USING utf8) = CONVERT(Table2.ToColumn USING utf8)
Solution if literals are involved.
I am using Pentaho Data Integration and don't get to specify the SQL syntax.
Using a very simple DB lookup gave the error
"Illegal mix of collations (cp850_general_ci,COERCIBLE) and (latin1_swedish_ci,COERCIBLE) for operation '='"
The generated code was
"SELECT DATA_DATE AS latest_DATA_DATE FROM hr_cc_normalised_data_date_v WHERE PSEUDO_KEY = ?"
To cut a long story short, the lookup was against a view, and when I issued:
mysql> show full columns from hr_cc_normalised_data_date_v;
+------------+------------+-------------------+------+-----+
| Field | Type | Collation | Null | Key |
+------------+------------+-------------------+------+-----+
| PSEUDO_KEY | varchar(1) | cp850_general_ci | NO | |
| DATA_DATE | varchar(8) | latin1_general_cs | YES | |
+------------+------------+-------------------+------+-----+
which explains where the 'cp850_general_ci' comes from.
The view was simply created with SELECT 'X', ......
According to the manual, a literal like this takes its character set and collation from the connection (character_set_connection / collation_connection), not from the server settings, which were correctly defined as latin1 and latin1_general_cs. The connection that created the view was evidently using cp850, so I forced the character set and collation when creating the view:
CREATE OR REPLACE VIEW hr_cc_normalised_data_date_v AS
SELECT convert('X' using latin1) COLLATE latin1_general_cs AS PSEUDO_KEY
, DATA_DATE
FROM HR_COSTCENTRE_NORMALISED_mV
LIMIT 1;
Now it shows latin1_general_cs for both columns and the error has gone away. :)
If the columns that you are having trouble with are "hashes", then consider the following...
If the "hash" is a binary string, you should really use BINARY(...) datatype.
If the "hash" is a hex string, you do not need utf8, and should avoid such because of character checks, etc. For example, MySQL's MD5(...) yields a fixed-length 32-byte hex string. SHA1(...) gives a 40-byte hex string. This could be stored into CHAR(32) CHARACTER SET ascii (or 40 for sha1).
Or, better yet, store UNHEX(MD5(...)) into BINARY(16). This cuts in half the size of the column. (It does, however, make it rather unprintable.) SELECT HEX(hash) ... if you want it readable.
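A minimal sketch of the BINARY(16) approach; the table file_hashes and column digest are hypothetical names:

```sql
-- The MD5 digest stored as 16 raw bytes instead of 32 hex characters.
CREATE TABLE file_hashes (
  digest BINARY(16) NOT NULL PRIMARY KEY
);

INSERT INTO file_hashes (digest) VALUES (UNHEX(MD5('hello')));

-- Binary-to-binary comparison: no collation is involved.
-- HEX() turns the stored bytes back into a readable string.
SELECT HEX(digest)
FROM file_hashes
WHERE digest = UNHEX(MD5('hello'));
```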
Comparing two BINARY columns has no collation issues.
Very interesting... Now, be ready. I looked at all of the "add collate" solutions, and to me those are band-aid fixes. The reality is that the database design was "bad". Yes, standards change and new things get added, blah blah, but that does not change the fact of the bad database design. I refuse to go down the route of adding "collate" all over my SQL statements just to get my queries to work. The only solution that works for me, and that will virtually eliminate the need to tweak my code in the future, is to redesign the database/tables to match the character set that I will live with and embrace for the long term. In this case, I chose the character set "utf8mb4".
So the solution when you encounter that "illegal" error message is to redesign your database and tables. It is much easier and quicker than it sounds. Exporting your data and re-importing it from a CSV may not even be required. Change the character set of the database and make sure the character sets of all your tables match it.
Use these commands to guide you:
SHOW VARIABLES LIKE "collation_database";
SHOW TABLE STATUS;
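For reference, such a redesign usually comes down to statements like the following. This is a sketch: mydb is a placeholder, utf8mb4_unicode_ci is just one possible collation, and CONVERT TO transcodes the existing data, so it assumes the stored text is correctly encoded (double-encoded data needs fixing first).

```sql
-- Change the database default (applies to tables created later).
ALTER DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Draft one CONVERT TO per existing base table; review, then run them.
SELECT CONCAT('ALTER TABLE `', table_schema, '`.`', table_name,
              '` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;')
FROM information_schema.tables
WHERE table_schema = 'mydb'
  AND table_type = 'BASE TABLE';

-- Views and stored routines keep their old settings until recreated.
```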
Now, if you enjoy adding "collate" here and there and beefing up your code with forceful "overrides", be my guest.
MySQL really dislikes mixing collations unless it can coerce them to the same one (which clearly is not feasible in your case). Can't you just force the same collation to be used via a COLLATE clause? (or the simpler BINARY shortcut if applicable...).
A possible solution is to convert the entire database to UTF8 (see also this question).
I used ALTER DATABASE mydb DEFAULT COLLATE utf8_unicode_ci;, but it didn't work.
In this query:
Select * from table1, table2 where table1.field = date_format(table2.field,'%H');
This worked for me:
Select * from table1, table2 where concat(table1.field) = date_format(table2.field,'%H');
Yes, only a concat.
Another source of collation issues is the mysql.proc table. Check the collations of your stored procedures and functions:
SELECT
p.db, p.db_collation, p.type, COUNT(*) cnt
FROM mysql.proc p
GROUP BY p.db, p.db_collation, p.type;
Also pay attention to mysql.proc.collation_connection and mysql.proc.character_set_client columns.
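The mysql.proc table exists only up to MySQL 5.7; MySQL 8.0 keeps routines in its data dictionary instead. On 8.0, a roughly equivalent check can be made through information_schema.ROUTINES:

```sql
-- Group routines by the collation settings they were created with.
SELECT r.routine_schema, r.database_collation, r.routine_type,
       r.character_set_client, r.collation_connection, COUNT(*) AS cnt
FROM information_schema.routines r
GROUP BY r.routine_schema, r.database_collation, r.routine_type,
         r.character_set_client, r.collation_connection;
```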
If you have phpMyAdmin installed, you can follow the instructions given at https://mediatemple.net/community/products/dv/204403914/default-mysql-character-set-and-collation. You have to match the collation of the database with that of all the tables, as well as the fields of the tables, and then recompile all the stored procedures and functions. With that, everything should work again.
I personally had this problem in a procedure.
If you don't want to alter the table, you can try converting your parameter inside the procedure.
I tried several uses of COLLATE (with a SET inside the SELECT), but none of them worked for me.
CONVERT(my_param USING utf32) did the trick.
In my case the default return type of a function was the type/collation of the database (utf8mb4_general_ci), but the database column was ascii:
WHERE ascii_col = md5(concat_ws(',', a,b,c))
The quick fix was:
WHERE ascii_col = BINARY md5(concat_ws(',', a,b,c))
Run this code in the SQL query window ("Run SQL query/queries on database"):
ALTER TABLE `table_name` CHANGE `column_name` `column_name` VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL;
Please replace table_name and column_name with appropriate name.

mysql ERROR 1270 in mysql.exe command line for operation 'replace' [duplicate]

Am getting the below error when trying to do a select through a stored procedure in MySQL.
Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '='
Any idea on what might be going wrong here?
The collation of the table is latin1_general_ci and that of the column in the where clause is latin1_general_cs.
This is generally caused by comparing two strings of incompatible collation or by attempting to select data of different collation into a combined column.
The clause COLLATE allows you to specify the collation used in the query.
For example, the following WHERE clause will always give the error you posted:
WHERE 'A' COLLATE latin1_general_ci = 'A' COLLATE latin1_general_cs
Your solution is to specify a shared collation for the two columns within the query. Here is an example that uses the COLLATE clause:
SELECT * FROM table ORDER BY key COLLATE latin1_general_ci;
Another option is to use the BINARY operator:
BINARY str is the shorthand for CAST(str AS BINARY).
Your solution might look something like this:
SELECT * FROM table WHERE BINARY a = BINARY b;
or,
SELECT * FROM table ORDER BY BINARY a;
Please keep in mind that, as pointed out by Jacob Stamm in the comments, "casting columns to compare them will cause any indexing on that column to be ignored".
For much greater detail about this collation business, I highly recommend eggyal's excellent answer to this same question.
TL;DR
Either change the collation of one (or both) of the strings so that they match, or else add a COLLATE clause to your expression.
What is this "collation" stuff anyway?
As documented under Character Sets and Collations in General:
A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.
Suppose that we have an alphabet with four letters: “A”, “B”, “a”, “b”. We give each letter a number: “A” = 0, “B” = 1, “a” = 2, “b” = 3. The letter “A” is a symbol, the number 0 is the encoding for “A”, and the combination of all four letters and their encodings is a character set.
Suppose that we want to compare two string values, “A” and “B”. The simplest way to do this is to look at the encodings: 0 for “A” and 1 for “B”. Because 0 is less than 1, we say “A” is less than “B”. What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): “compare the encodings.” We call this simplest of all possible collations a binary collation.
But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters “a” and “b” as equivalent to “A” and “B”; (2) then compare the encodings. We call this a case-insensitive collation. It is a little more complex than a binary collation.
In real life, most character sets have many characters: not just “A” and “B” but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents (an “accent” is a mark attached to a character as in German “Ö”), and for multiple-character mappings (such as the rule that “Ö” = “OE” in one of the two German collations).
Further examples are given under Examples of the Effect of Collation.
Okay, but how does MySQL decide which collation to use for a given expression?
As documented under Collation of Expressions:
In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison operation. For example, in the following cases, it should be clear that the collation is the collation of column charset_name:
SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;
However, with multiple operands, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should the comparison use the collation of the column x, or of the string literal 'Y'? Both x and 'Y' have collations, so which collation takes precedence?
Standard SQL resolves such questions using what used to be called “coercibility” rules.
[ deletia ]
MySQL uses coercibility values with the following rules to resolve ambiguities:
Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then:
If both sides are Unicode, or both sides are not Unicode, it is an error.
If one of the sides has a Unicode character set, and another side has a non-Unicode character set, the side with Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement does not return an error:
SELECT CONCAT(utf8_column, latin1_column) FROM t1;
It returns a result that has a character set of utf8 and the same collation as utf8_column. Values of latin1_column are automatically converted to utf8 before concatenating.
For an operation with operands from the same character set but that mix a _bin collation and a _ci or _cs collation, the _bin collation is used. This is similar to how operations that mix nonbinary and binary strings evaluate the operands as binary strings, except that it is for collations rather than data types.
So what is an "illegal mix of collations"?
An "illegal mix of collations" occurs when an expression compares two strings of different collations but of equal coercibility and the coercibility rules cannot help to resolve the conflict. It is the situation described under the third bullet-point in the above quotation.
The particular error given in the question, Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '=', tells us that there was an equality comparison between two non-Unicode strings of equal coercibility. It furthermore tells us that the collations were not given explicitly in the statement but rather were implied from the strings' sources (such as column metadata).
That's all very well, but how does one resolve such errors?
As the manual extracts quoted above suggest, this problem can be resolved in a number of ways, of which two are sensible and to be recommended:
Change the collation of one (or both) of the strings so that they match and there is no longer any ambiguity.
How this can be done depends upon from where the string has come: Literal expressions take the collation specified in the collation_connection system variable; values from tables take the collation specified in their column metadata.
Force one string to not be coercible.
I omitted the following quote from the above:
MySQL assigns coercibility values as follows:
An explicit COLLATE clause has a coercibility of 0. (Not coercible at all.)
The concatenation of two strings with different collations has a coercibility of 1.
The collation of a column or a stored routine parameter or local variable has a coercibility of 2.
A “system constant” (the string returned by functions such as USER() or VERSION()) has a coercibility of 3.
The collation of a literal has a coercibility of 4.
NULL or an expression that is derived from NULL has a coercibility of 5.
Thus simply adding a COLLATE clause to one of the strings used in the comparison will force use of that collation.
Whilst the others would be terribly bad practice if they were deployed merely to resolve this error:
Force one (or both) of the strings to have some other coercibility value so that one takes precedence.
Use of CONCAT() or CONCAT_WS() would result in a string with a coercibility of 1; and (if in a stored routine) use of parameters/local variables would result in strings with a coercibility of 2.
Change the encodings of one (or both) of the strings so that one is Unicode and the other is not.
This could be done via transcoding with CONVERT(expr USING transcoding_name); or via changing the underlying character set of the data (e.g. modifying the column, changing character_set_connection for literal values, or sending them from the client in a different encoding and changing character_set_client / adding a character set introducer). Note that changing encoding will lead to other problems if some desired characters cannot be encoded in the new character set.
Change the encodings of one (or both) of the strings so that they are both the same and change one string to use the relevant _bin collation.
Methods for changing encodings and collations have been detailed above. This approach would be of little use if one actually needs to apply more advanced collation rules than are offered by the _bin collation.
Adding my 2c to the discussion for future googlers.
I was investigating a similar issue where I got the following error when using custom functions that recieved a varchar parameter:
Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and
(utf8_general_ci,IMPLICIT) for operation '='
Using the following query:
mysql> show variables like "collation_database";
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
I was able to tell that the DB was using utf8_general_ci, while the tables were defined using utf8_unicode_ci:
mysql> show table status;
+--------------+-----------------+
| Name | Collation |
+--------------+-----------------+
| my_view | NULL |
| my_table | utf8_unicode_ci |
...
Notice that the views have NULL collation. It appears that views and functions have collation definitions even though this query shows null for one view. The collation used is the DB collation that was defined when the view/function were created.
The sad solution was to both change the db collation and recreate the views/functions to force them to use the current collation.
Changing the db's collation:
ALTER DATABASE mydb DEFAULT COLLATE utf8_unicode_ci;
Changing the table collation:
ALTER TABLE mydb CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
I hope this will help someone.
Sometimes it can be dangerous to convert charsets, specially on databases with huge amounts of data. I think the best option is to use the "binary" operator:
e.g : WHERE binary table1.column1 = binary table2.column1
I had a similar problem, was trying to use the FIND_IN_SET procedure with a string variable.
SET #my_var = 'string1,string2';
SELECT * from my_table WHERE FIND_IN_SET(column_name,#my_var);
and was receiving the error
Error Code: 1267. Illegal mix of collations (utf8_unicode_ci,IMPLICIT)
and (utf8_general_ci,IMPLICIT) for operation 'find_in_set'
Short answer:
No need to change any collation_YYYY variables, just add the correct collation next to your variable declaration, i.e.
SET #my_var = 'string1,string2' COLLATE utf8_unicode_ci;
SELECT * from my_table WHERE FIND_IN_SET(column_name,#my_var);
Long answer:
I first checked the collation variables:
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
+----------------------+-----------------+
| collation_database | utf8_general_ci |
+----------------------+-----------------+
| collation_server | utf8_general_ci |
+----------------------+-----------------+
Then I checked the table collation:
mysql> SHOW CREATE TABLE my_table;
CREATE TABLE `my_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`column_name` varchar(40) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=125 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
This means that my variable was configured with the default collation of utf8_general_ci while my table was configured as utf8_unicode_ci.
By adding a COLLATE clause to the variable declaration, I made the variable's collation match the collation configured for the table.
The solution below worked for me:
CONVERT( Table1.FromColumn USING utf8) = CONVERT(Table2.ToColumn USING utf8)
Solution if literals are involved.
I am using Pentaho Data Integration and don't get to specify the SQL syntax.
Using a very simple DB lookup gave the error
"Illegal mix of collations (cp850_general_ci,COERCIBLE) and (latin1_swedish_ci,COERCIBLE) for operation '='"
The generated code was
"SELECT DATA_DATE AS latest_DATA_DATE FROM hr_cc_normalised_data_date_v WHERE PSEUDO_KEY = ?"
To cut a long story short, the lookup was against a view, and when I issued
mysql> show full columns from hr_cc_normalised_data_date_v;
+------------+------------+-------------------+------+-----+
| Field | Type | Collation | Null | Key |
+------------+------------+-------------------+------+-----+
| PSEUDO_KEY | varchar(1) | cp850_general_ci | NO | |
| DATA_DATE | varchar(8) | latin1_general_cs | YES | |
+------------+------------+-------------------+------+-----+
which explains where the 'cp850_general_ci' comes from.
The view was simply created with 'SELECT 'X',......'
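I expected literals like this to inherit their character set and collation from the server settings, which were correctly defined as 'latin1' and 'latin1_general_cs'. According to the manual, however, a literal takes the character set and collation of the connection (character_set_connection and collation_connection), so the cp850 presumably came from the client session in which the view was created. Either way, I forced it in the creation of the view: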
CREATE OR REPLACE VIEW hr_cc_normalised_data_date_v AS
SELECT convert('X' using latin1) COLLATE latin1_general_cs AS PSEUDO_KEY
, DATA_DATE
FROM HR_COSTCENTRE_NORMALISED_mV
LIMIT 1;
Now it shows latin1_general_cs for both columns and the error has gone away. :)
If the columns that you are having trouble with are "hashes", then consider the following...
If the "hash" is a binary string, you should really use BINARY(...) datatype.
If the "hash" is a hex string, you do not need utf8 and should avoid it because of the character checks, etc. For example, MySQL's MD5(...) yields a fixed-length 32-byte hex string, and SHA1(...) gives a 40-byte hex string. These can be stored in CHAR(32) CHARACTER SET ascii (or CHAR(40) for SHA1).
Or, better yet, store UNHEX(MD5(...)) in BINARY(16). This halves the size of the column. (It does, however, make it rather unprintable.) Use SELECT HEX(hash) ... if you want it readable.
Comparing two BINARY columns has no collation issues.
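To make the size difference concrete, here is a small client-side Python sketch using the standard hashlib and binascii modules. It does not talk to MySQL. It shows that the MD5 hex digest is 32 characters, the raw digest is 16 bytes, and hex-encoding the raw bytes restores the readable form.

```python
import binascii
import hashlib

def md5_forms(text: str):
    """Return (hex digest, raw digest) of the UTF-8 bytes of text."""
    digest = hashlib.md5(text.encode("utf-8"))
    return digest.hexdigest(), digest.digest()

hex_form, raw_form = md5_forms("hello")
print(len(hex_form))  # 32: fits CHAR(32) CHARACTER SET ascii
print(len(raw_form))  # 16: fits BINARY(16), like UNHEX(MD5(...))
# Hex-encoding the raw bytes gives back the readable digest, as HEX(hash)
# does in MySQL (MySQL's HEX() prints uppercase; hexlify prints lowercase).
print(binascii.hexlify(raw_form).decode("ascii") == hex_form)  # True
print(len(hashlib.sha1(b"hello").hexdigest()))  # 40: fits CHAR(40)
```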
Very interesting... Now, be ready. I looked at all of the "add collate" solutions, and to me those are band-aid fixes. The reality is that the database design was "bad". Yes, standards change and new things get added, blah blah, but that does not change the fact that the database design is bad. I refuse to take the route of adding "collate" all over my SQL statements just to get a query to work. The only solution that works for me, and that will virtually eliminate the need to tweak my code in the future, is to redesign the database/tables to match the character set that I will live with and embrace for the long term. In this case, I chose to go with the character set "utf8mb4".
So the solution when you encounter that "illegal" error message is to redesign your database and tables. It is much easier and quicker than it sounds. Exporting your data and re-importing it from a CSV may not even be required. Change the character set of the database and make sure the character sets of all your tables match it.
Use these commands to guide you:
SHOW VARIABLES LIKE "collation_database";
SHOW TABLE STATUS;
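If there are many tables, a tiny helper can list the ones whose default collation differs from collation_database. This Python sketch assumes you have already fetched the Name and Collation columns of SHOW TABLE STATUS with whatever client you use. The function name and the sample rows are hypothetical.

```python
def mismatched_tables(rows, database_collation):
    """Return names of tables whose collation differs from the database default.

    rows: (name, collation) pairs, e.g. the Name and Collation columns of
    SHOW TABLE STATUS, fetched beforehand with any MySQL client.
    """
    return [
        name
        for name, collation in rows
        # Views report a NULL collation (None in Python), so skip them.
        if collation is not None and collation != database_collation
    ]

# Hypothetical sample rows, not output from a real server.
status_rows = [
    ("users", "utf8mb4_general_ci"),
    ("ratings", "latin1_swedish_ci"),
    ("recent_ratings_v", None),
]
print(mismatched_tables(status_rows, "utf8mb4_general_ci"))  # ['ratings']
```

Table defaults are only part of the picture. Individual columns can carry their own collation, and information_schema.columns reports it in its collation_name column.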
Now, if you enjoy adding "collate" here and there and beefing up your code with forced "overrides", be my guest.
MySQL really dislikes mixing collations unless it can coerce them to the same one (which clearly is not feasible in your case). Can't you just force the same collation to be used via a COLLATE clause? (or the simpler BINARY shortcut if applicable...).
A possible solution is to convert the entire database to UTF8 (see also this question).
I used ALTER DATABASE mydb DEFAULT COLLATE utf8_unicode_ci;, but it didn't work.
In this query:
Select * from table1, table2 where table1.field = date_format(table2.field,'%H');
This works for me:
Select * from table1, table2 where concat(table1.field) = date_format(table2.field,'%H');
Yes, only a concat.
Another source of collation issues is the mysql.proc table. Check the collations of your stored procedures and functions (mysql.proc exists up to MySQL 5.7; in 8.0 the same information is in information_schema.ROUTINES):
SELECT
p.db, p.db_collation, p.type, COUNT(*) cnt
FROM mysql.proc p
GROUP BY p.db, p.db_collation, p.type;
Also pay attention to the mysql.proc.collation_connection and mysql.proc.character_set_client columns.
If you have phpMyAdmin installed, you can follow the instructions given in the following link:
https://mediatemple.net/community/products/dv/204403914/default-mysql-character-set-and-collation
You have to match the collation of the database with that of all the tables, as well as the fields of the tables, and then recompile all the stored procedures and functions. With that, everything should work again.
I personally had this problem in a procedure.
If you don't want to alter the table, you can try converting your parameter inside the procedure.
I tried several uses of COLLATE (with a SET inside the SELECT), but none worked for me.
CONVERT(my_param USING utf32) did the trick.
In my case, the default return type of a function used the database's type/collation (utf8mb4_general_ci), but the database column was ascii:
WHERE ascii_col = md5(concat_ws(',', a,b,c))
The quick fix was:
WHERE ascii_col = BINARY md5(concat_ws(',', a,b,c))
Run this code in the SQL query window ("Run SQL query/queries on database"):
ALTER TABLE `table_name` CHANGE `column_name` `column_name` VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL;
Replace table_name and column_name with the appropriate names.

Illegal mix of collations error in mysql query

Is there any way to compare the generated range column in this MySQL query?
SELECT ue.bundle,ue.timestamp,b.id,bv.id as bundleVersionId,bv.start_date,bv.end_date, bv.type,ue.type from (
SELECT bundle,timestamp,tenant, case when Document_Id ='' then 'potrait'
WHEN Document_Id<>'' then 'persisted' end as type from uds_expanded ) ue
JOIN bundle b on b.name=ue.bundle join bundle_version bv on b.id=bv.bundle_id
WHERE ue.tenant='02306' and ue.timestamp >= bv.start_date and ue.timestamp <= bv.end_date and ue.type = bv.type;
I am getting the following error when I try to compare the types
Error Code: 1267. Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,IMPLICIT) for operation '='
Stick to one encoding/collation for your entire system. Right now you seem to be using UTF8 in one place and latin1 in another. Convert the latter to UTF8 as well and you'll be good.
You can convert a table to UTF8 using:
alter table <some_table> convert to character set utf8 collate utf8_general_ci;
I think the issue is sometimes that we use different ORM utilities to generate tables and then test queries from the MySQL command line or MySQL Workbench; the problem then comes from differences between the table collation and the collation of the command-line client or app we use. A simple fix is to declare the variables you use to test the query against table columns with an explicit collation, e.g.:
MySQL> set @testCode = 'test2' collate utf8_unicode_ci;
Query OK, 0 rows affected (0.00 sec)
MySQL> select * from test where code = @testCode;
Be aware that individual columns can have their own collation.
For example, Doctrine generates VARCHAR columns as CHARACTER SET utf8 COLLATE utf8_unicode_ci, and changing the table's default collation doesn't affect the individual columns.
You can change the column's collation with this command:
ALTER TABLE `table`
CHANGE COLUMN `test` `test` VARCHAR(15) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
Or, in the MySQL Workbench interface, right-click the table, choose Alter Table, then click a column and modify it.
Use ascii_bin wherever possible; it will match up with almost any collation.

How to setup MySQL to handle unicode diacriticals properly?

This is an odd puzzle. AFAIK, utf8_bin should guarantee that every accent is stored in the database properly, i.e. without some strange conversion to ASCII. So I have a table with:
DEFAULT CHARSET=utf8 COLLATE=utf8_bin
and yet when I compare, query, or otherwise use entries such as "Krąków" and "Kraków", MySQL says they are the same string.
Out of curiosity, I also tried utf8_polish_ci, and MySQL claims that for Polish speakers "a" and "ą" make no difference.
So how do I set up a MySQL table so that I can store Unicode strings safely, without losing accents and the like?
Server: MySQL 5.5 + openSUSE 11.4, client: Windows 7 + MySQL Workbench 5.2.
Update -- CREATE TABLE
CREATE TABLE `Cities` (
`city_Name` VARCHAR(145) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`city_Name`)
) DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Please note that I cannot set utf8_bin separately for the column: because the entire table is utf8_bin, the column's collation is in effect reset to the default.
All credit for the solution goes to bobince, so please upvote his comment on my question.
The solution to the problem is somewhat strange, and I would risk saying MySQL is broken in this regard.
So, let's say I created a table with utf8 and didn't set anything for the column. Later I realize I need strict comparison of characters, so I change the collation of the table AND the column to utf8_bin. Solved?
No. Now MySQL sees this: the table is indeed utf8_bin, and the column is also utf8_bin, which means the column uses the DEFAULT collation of the table. However, MySQL does not realize that the previous default is not the same as the current default, and thus comparison still does not work.
So you first have to move the column off that default, to some alien value outside the collation "family" (for "utf8xxx", that means anything other than "utf8xxx"). Once it is moved off, and the column collation no longer says "default", you can set utf8_bin, which now equals the default; but since you are coming from a non-default collation, everything kicks in as expected.
Do not forget to apply the changes at each step.
The MySQL default charset and collation (which are server-wide but can be changed per connection) apply at the time a table is created. Changing the defaults after the table is created doesn't affect existing tables.
Character sets and collations are attributes of individual columns. They can be set from a table-wide default but they do belong to columns.
A charset of utf8 should be sufficient to allow all European languages to be represented correctly. You should definitely be able to store "a" and "ą" as two different characters.
A collation of utf8_bin yields a case-sensitive and accent-sensitive collation.
Here are some examples of the difference between text value and collation behavior. I'm using three sample strings: 'abcd', 'ĄBCD' , and 'ąbcd'. The last two have the A-ogonek letter.
This first example shows that with the utf8 character set and the utf8_general_ci collation, the strings each display as the user typed them, but they compare equal. That's to be expected in a collation that doesn't distinguish between a and ą. It's a typical case-insensitive collation, where all the variant characters sort equal to the base character without any diacritical marks.
SET NAMES 'utf8' COLLATE 'utf8_general_ci';
SELECT 'abcd', 'ąbcd' , 'abcd' < 'ąbcd', 'abcd' = 'ąbcd';
Comparison results: false, true
This next example shows that in the case-insensitive Polish-language collation, a comes before ą. I don't know Polish, but I suspect Polish telephone books have the As and the Ą's separated.
SET NAMES 'utf8' COLLATE 'utf8_polish_ci';
SELECT 'abcd', 'ĄBCD' , 'ąbcd', 'abcd' < 'ĄBCD', 'abcd' < 'ąbcd' , 'ąbcd' = 'ĄBCD'
Comparison results: true, true, true
This next example shows what happens with the utf8_bin collation.
SET NAMES 'utf8' COLLATE 'utf8_bin';
SELECT 'abcd', 'ĄBCD' , 'ąbcd', 'abcd' < 'ĄBCD', 'abcd' < 'ąbcd' , 'ąbcd' = 'ĄBCD'
Comparison results: true, true, false
There's one non-intuitive thing to notice in this case: 'abcd' < 'ĄBCD' is true (whereas 'abcd' < 'ABCD' with pure ASCII is false). That's a strange result if you're thinking linguistically. It happens because both A-ogonek characters have binary values in utf8 that are higher than all of the abc and ABC characters. So if you use the utf8_bin collation for ORDER BY operations, you'll get linguistically strange results.
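The byte values behind this are easy to check. In this Python sketch, comparing UTF-8 encoded bytes stands in for a binary collation (for UTF-8, byte order matches code-point order). It illustrates the ordering only and does not query MySQL.

```python
def utf8_bytes(s: str) -> bytes:
    """Encode to UTF-8; comparing these bytes mimics a binary collation."""
    return s.encode("utf-8")

print(utf8_bytes("Ą"), utf8_bytes("ą"))  # b'\xc4\x84' b'\xc4\x85'
print(utf8_bytes("abcd") < utf8_bytes("ĄBCD"))  # True: 0x61 < 0xC4
print(utf8_bytes("abcd") < utf8_bytes("ABCD"))  # False: 0x61 > 0x41
print(utf8_bytes("ąbcd") == utf8_bytes("ĄBCD"))  # False
# Byte order puts both A-ogonek forms after every plain ASCII letter.
print(sorted(["ąbcd", "ĄBCD", "abcd", "ABCD"], key=utf8_bytes))
# ['ABCD', 'abcd', 'ĄBCD', 'ąbcd']
```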
You're saying that 'Krąków' and 'Kraków' compare equal, and that you're puzzled by that. They do compare equal when the collation in use is utf8_general_ci. But they don't with either utf8_bin or utf8_polish_ci. According to the Polish-language support in MySQL, these two spellings of the city's name are different.
As you design your application, you need to sort out how you want all this to work linguistically. Are 'Krąków' and 'Kraków' the same place? Are 'Ąaron' and 'Aaron' the same person? If so, you want utf8_general_ci.
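To get a feel for the two behaviors, here is a rough Python analogue. It is not MySQL's actual algorithm. Decomposing with unicodedata.normalize("NFD", ...), dropping combining marks, and case-folding makes 'Krąków' and 'Kraków' compare equal, much as utf8_general_ci does, while a plain comparison keeps them apart, as utf8_bin does. MySQL's collations use their own weight tables, so treat this as an illustration only. For example, 'ł' has no decomposition, so this sketch leaves it distinct from 'l'.

```python
import unicodedata

def accent_insensitive_key(s: str) -> str:
    """Rough stand-in for an accent- and case-insensitive collation key."""
    decomposed = unicodedata.normalize("NFD", s)
    # Drop combining marks such as U+0328 (ogonek) and U+0301 (acute accent).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

print(accent_insensitive_key("Krąków") == accent_insensitive_key("Kraków"))  # True
print("Krąków" == "Kraków")  # False: exact comparison, like a _bin collation
print(accent_insensitive_key("Ąaron") == accent_insensitive_key("Aaron"))  # True
```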
You could consider altering the table you've shown like this:
ALTER TABLE Cities
MODIFY COLUMN city_Name
VARCHAR(145)
CHARACTER SET utf8
COLLATE utf8_general_ci
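This sets the column's collation explicitly. Note that utf8_general_ci treats 'Krąków' and 'Kraków' as equal; if they must stay distinct, use utf8_polish_ci or utf8_bin in the COLLATE clause instead.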

Illegal mix of collations error in MySql

Just got this answer from a previous question and it works a treat!
SELECT username, (SUM(rating)/COUNT(*)) as TheAverage, Count(*) as TheCount
FROM ratings WHERE month='Aug' GROUP BY username HAVING TheCount > 4
ORDER BY TheAverage DESC, TheCount DESC
But when I stick this extra bit in it gives this error:
#1267 - Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '='
SELECT username, (SUM(rating)/COUNT(*)) as TheAverage, Count(*) as TheCount FROM
ratings WHERE month='Aug'
AND username IN (SELECT username FROM users WHERE gender = 1)
GROUP BY username HAVING TheCount > 4 ORDER BY TheAverage DESC, TheCount DESC
The table is:
id, username, rating, month
Here's how to check which columns are the wrong collation:
SELECT table_schema, table_name, column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE collation_name = 'latin1_general_ci'
ORDER BY table_schema, table_name,ordinal_position;
And here's the query to fix it:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET latin1 COLLATE 'latin1_swedish_ci';
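With many tables, typing the fix by hand gets tedious. This Python sketch turns the (table_schema, table_name, ...) rows returned by the information_schema query above into ALTER TABLE statements that you can review and run. The helper names are hypothetical, and so are the sample rows.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

def alter_statements(rows, charset="latin1", collation="latin1_swedish_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per distinct table.

    rows: tuples starting with (table_schema, table_name), e.g. the rows of
    the information_schema.columns query, which has one row per column.
    """
    seen = set()
    statements = []
    for schema, table, *_rest in rows:
        if (schema, table) in seen:
            continue  # several mismatched columns in one table: one ALTER
        seen.add((schema, table))
        statements.append(
            f"ALTER TABLE {quote_identifier(schema)}.{quote_identifier(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements

# Hypothetical rows in the query's column order (trailing columns trimmed).
rows = [
    ("mydb", "users", "username"),
    ("mydb", "users", "email"),
    ("mydb", "ratings", "username"),
]
for statement in alter_statements(rows):
    print(statement)
```

Review the generated statements before running them. CONVERT TO CHARACTER SET changes every character column in the table, not only the ones the query listed, and it rebuilds the table, which can take a while on large tables.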
Check the collation of each table, and make sure that they all have the same collation.
After that, also check the collation of each table column that you use in the operation.
I had encountered the same error, and that trick worked for me.
In these (very rare) cases:
two tables that really need different collation types
values not coming from a table, but from an explicit enumeration, for instance:
SELECT 1 AS numbers UNION ALL SELECT 2 UNION ALL SELECT 3
you can compare the values between the different tables by using CAST or CONVERT:
CAST('my text' AS CHAR CHARACTER SET utf8)
CONVERT('my text' USING utf8)
See the CONVERT and CAST documentation on the MySQL website.
I was getting this same error in phpMyAdmin and applied the solution indicated here, which worked for me:
ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci
Illegal mix of collations MySQL Error
I would also recommend a general collation (e.g. utf8_general_ci, the default for utf8) rather than a Swedish one, unless your application actually uses Swedish.
I think you should convert to utf8:
-- set utf8 for the connection
SET collation_connection = 'utf8_general_ci';
-- change the CHARACTER SET of the DB to utf8
ALTER DATABASE dbName CHARACTER SET utf8 COLLATE utf8_general_ci;
-- change the CHARACTER SET of the table to utf8
ALTER TABLE tableName CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
I also got the same error, but in my case the main problem was that the parameter I was checking in the WHERE condition contained an unknown hidden character (+%A0).
Converted, A0 is 160 in decimal (a non-breaking space), and 160 was outside the range of characters the database knew, so the database could not recognize it as a character. Another factor is that my table column is VARCHAR.
My solution was to check for characters like that and remove them before running the SQL command,
e.g. (this strips every non-digit, so it only suits numeric parameters): preg_replace('/\D/', '', $myParameter);
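The same clean-up can be done on the client in other languages. This Python sketch, with hypothetical function names, shows two options. The first replaces non-breaking spaces and keeps the rest of the text. The second, like the preg_replace call above, keeps digits only, which is appropriate only when the parameter is numeric.

```python
import re

def clean_parameter(value: str) -> str:
    """Turn non-breaking spaces (U+00A0, decimal 160) into normal spaces.

    str.strip() then trims whitespace, including NBSP, from both ends.
    """
    return value.replace("\u00a0", " ").strip()

def digits_only(value: str) -> str:
    """Keep only ASCII digits, like preg_replace('/\\D/', '', $value)."""
    return re.sub(r"\D", "", value, flags=re.ASCII)

print(ord("\u00a0"))  # 160 (0xA0)
print(repr(clean_parameter("New\u00a0York\u00a0")))  # 'New York'
print(digits_only(" \u00a002306"))  # 02306
```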
Check that your users.gender column is an INTEGER.
Try: alter table users convert to character set latin1 collate latin1_swedish_ci;
You need to change each column Collation from latin1_general_ci to latin1_swedish_ci
I got this same error inside a stored procedure, in the WHERE clause. I discovered that the problem occurred with a locally declared variable that had previously been loaded from the same table/column.
I resolved it by casting the data to a plain CHAR type.
In short, this error is caused by MySQL trying to do an operation on two things which have different collation settings. If you make the settings match, the error will go away. Of course, you need to choose the right setting for your database, depending on what it is going to be used for.
Here's some good advice on choosing between two very common utf8 collations: What's the difference between utf8_general_ci and utf8_unicode_ci
If you are using phpMyAdmin you can do this systematically by working through the tables mentioned in your error message, and checking the collation type for each column. First you should check which is the overall collation setting for your database - phpMyAdmin can tell you this and change it if necessary. But each column in each table can have its own setting. Normally you will want all these to match.
In a small database this is easy enough to do by hand, and in any case if you read the error message in full it will usually point you to the right place. Don't forget to look at the 'structure' settings for columns with subtables in as well. When you find a collation that does not match you can change it using phpMyAdmin directly, no need to use the query window. Then try your operation again. If the error persists, keep looking!
The fix here is mainly to cast the field, e.g. CAST(field AS CHAR) or CAST(field AS DATE). (MySQL's CAST does not accept VARCHAR as a target type; use CHAR.)
I had this problem not because I'm storing in different collations, but because my column type is JSON, which is binary.
Fixed it like this:
select table.field COLLATE utf8mb4_0900_ai_ci AS fieldName
Use ascii_bin wherever possible; it will match up with almost any collation.
A username seldom accepts special characters anyway.
If you want to avoid changing syntax to solve this problem, try this:
Update your MySQL to version 5.5 or greater.
This resolved the problem for me.
I had the same problem, with a collation warning for a field that is set from 0 to 1. All the columns' collations were the same. We tried changing the collations again, but nothing fixed the issue.
In the end, we updated the field to NULL and then updated it to 1, and this overcame the collation problem.
I was getting "Illegal mix of collations" while creating a category in Bagisto. Running these commands (thank you @Quy Le) solved the issue for me:
-- set utf8 for the connection
SET collation_connection = 'utf8_general_ci';
-- change the CHARACTER SET of the DB to utf8
ALTER DATABASE dbName CHARACTER SET utf8 COLLATE utf8_general_ci;
-- change the category tables
ALTER TABLE categories CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE category_translations CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
In my case it was something strange. I read an API key from a file and then sent it to the server, where an SQL query was made. The problem was the BOM (byte order mark) that Windows Notepad left at the start of the file; it was causing this error:
SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
I just removed it and everything worked like a charm
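If the file is read from Python, the standard "utf-8-sig" codec drops a leading BOM automatically. Here is a minimal sketch that uses a temporary file in place of the real key file. The file and the key value are made up.

```python
import os
import tempfile

def read_api_key(path: str) -> str:
    """Read a key file, dropping a leading UTF-8 BOM if one is present."""
    # The 'utf-8-sig' codec skips EF BB BF at the start and otherwise
    # decodes exactly like plain 'utf-8'.
    with open(path, encoding="utf-8-sig") as f:
        return f.read().strip()

# Simulate a key file saved with a BOM (the key itself is made up).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbfmy-secret-key\r\n")
    path = tmp.name

try:
    key = read_api_key(path)
    with open(path, encoding="utf-8") as f:
        naive = f.read().strip()  # plain 'utf-8' keeps the BOM as U+FEFF
finally:
    os.remove(path)

print(repr(key))  # 'my-secret-key'
print(naive.startswith("\ufeff"))  # True: strip() does not remove U+FEFF
```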
You need to set 'utf8' for all parameters in each function. This was my case:
SELECT username, AVG(rating) as TheAverage, COUNT(*) as TheCount
FROM ratings
WHERE month='Aug'
AND username COLLATE latin1_general_ci IN
(
SELECT username
FROM users
WHERE gender = 1
)
GROUP BY
username
HAVING
TheCount > 4
ORDER BY
TheAverage DESC, TheCount DESC;
Make sure your version of MySQL supports subqueries (4.1+). Next, you could try rewriting your query to something like this:
SELECT ratings.username, (SUM(rating)/COUNT(*)) as TheAverage, Count(*) as TheCount FROM ratings, users
WHERE ratings.month='Aug' and ratings.username = users.username
AND users.gender = 1
GROUP BY ratings.username
HAVING TheCount > 4 ORDER BY TheAverage DESC, TheCount DESC