How to handle umlauts in a MySQL `WHERE` clause? - mysql

My Table collation is utf8_general_ci and character set is utf8. If I run a query like the following, it does not show me any value.
SELECT * FROM mytable WHERE myfield = "Björn Borg"
OR myfield = "FrüFrü & Tigerlily";
So how I can fetch the values from table ?

The problem can be in your connection charset. Try to run
SET NAMES 'utf8mb4'
And then your query.

There are more places where MySQL defines character sets, including at the column level.
To check the database settings:
SHOW VARIABLES LIKE 'char%';
SHOW VARIABLES LIKE 'collation%';
To check the table settings: SHOW CREATE TABLE tablename;
To check the individual columns: SHOW FULL COLUMNS IN tablename;
If you created a database with the wrong encoding, you will have to change every single occurence of a bad charset in all of these.

Related

Which column attributes in a MySQL statement can specified in a CREATE TABLE / ALTER TABLE statement?

Task at Hand : Need to create tables in MySQL basis a dump from the columns table from Information.Schema of another database. I do not have access to the original DB or the DB architect. I would like to know which of the following attributes found in extracts from the Information_Schema of the original database CAN be specified in CREATE TABLE statements. The objective is to create a Tables which are ditto as per the original.
Problem at Hand: I understand that some of these attributes are specified by the user while creating tables while some may be calculated by MySQL from the data in tables. While I am reading and understanding on each of these attributes, I am unable to quickly ascertain which of the attributes listed below are calculated by MySQL and not user specified and hence can be ignored while writing CREATE TABLE statements.
CHARACTER_MAXIMUM_LENGTH
CHARACTER_OCTET_LENGTH
NUMERIC_PRECISION
NUMERIC_PRECISION_RADIX
NUMERIC_SCALE
DATETIME_PRECISION
CHARACTER_SET_CATALOG
CHARACTER_SET_SCHEMA
CHARACTER_SET_NAME
COLLATION_CATALOG
COLLATION_SCHEMA
COLLATION_NAME
DOMAIN_CATALOG
DOMAIN_SCHEMA
DOMAIN_NAME
Please read the MySQL Manual Page
For CHAR and VARCHAR and similar character (non-TEXT) columns; the CHARACTER_MAXIMUM_LENGTH value is just that;
CREATE TABLE pet (name VARCHAR(20), owner VARCHAR(20),
species VARCHAR(20), sex CHAR(1), birth DATE, death DATE);
^^^^ ^^^
This is the CHARACTER_MAXIMUM_LENGTH value of this column.
CHARACTER_OCTET_LENGTH should be the same as CHARACTER_MAXIMUM_LENGTH, except for multi-byte character sets.
For multi-byte characters sets you should probably be using utf8mb4_ sets. So the
CHARACTER_OCTET_LENGTH may be CHARACTER_MAXIMUM_LENGTH x 4 (for 4-byte full UTF-8; note that utf8_ MySQL character sets/collations are 3-byte only). While this value is auto-generated upon table creation, if you're building these data sets manually you might need to calculate this one yourself.
Which character set you use will also relate to which CHARACTER_SET_SCHEMA
and CHARACTER_SET_NAME is set. For example utf8mb4_general_ci would
For NUMERIC_* values; please read here
COLLATION_* (ex Schema) stuff is user set but if not user set then database and/or table defaults are used.
CHARACTER_SET_* (ex Schema) stuff is user set but if not user set then database and/or table defaults are used.
DATETIME_PRECISION is database/OS set and is not set on TABLE CREATE
DOMAIN_* values are not found on the MySQL Manual and seem to be invalid
So;
CHARACTER_MAXIMUM_LENGTH - User set *
CHARACTER_OCTET_LENGTH - Auto generated, from User set/default details (*)
NUMERIC_PRECISION - User set
NUMERIC_PRECISION_RADIX - User set for specific spacial-type columns
NUMERIC_SCALE - User set
DATETIME_PRECISION - System set
CHARACTER_SET_CATALOG - System set catalogue of possible values.
CHARACTER_SET_SCHEMA - The generated Schema of possible values from the Catalogue
CHARACTER_SET_NAME - User set but defaults to MySQL / Db default values *
COLLATION_CATALOG - System set catalogue of possible values.
COLLATION_SCHEMA - The generated Schema of possible values from the Catalogue
COLLATION_NAME - User set but defaults to MySQL / Db default values *
DOMAIN_CATALOG - Unknown. no records of this type.
DOMAIN_SCHEMA - Unknown. no records of this type.
DOMAIN_NAME - Unknown. no records of this type.

How to fix UTF-8 double encoded data on large MySQL database

I have approx 70 databases, each database contains 750+ tables (exact same structure), and lot of data stored but, the problem is only few databases set to utf8 and others are latin1, so latin1 database saved double encoded values like 接近åˆå ± for 接近初報
So i want to convert all my databases to utf8mb4 so it should save correct data, but this will obviously requires existing double encoded data to convert to utf8mb4
I have following sql query to convert data.
UPDATE table SET col = IFNULL(CONVERT(CONVERT(CONVERT(col USING latin1) USING binary) USING utf8), col )
But the problem is my databases are very large and this will take lot of time to convert data to utf8. so is there any easy way to update data for whole database in one go or something else which is easy?
Many thanks
You really should be using utf8mb4 for Chinese; some Chinese characters are not representable in MySQL's 3-byte utf8.
A slightly shorter expression:
CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4)
Which case? see http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases -- You probably need the 3rd of these:
CHARACTER SET latin1, but have utf8 bytes in it; leave bytes alone while fixing charset:
First, lets assume you have this declaration for tbl.col:
col VARCHAR(111) CHARACTER SET latin1 NOT NULL
Then, to convert the column without changing the bytes:
ALTER TABLE tbl MODIFY COLUMN col VARBINARY(111) NOT NULL;
ALTER TABLE tbl MODIFY COLUMN col VARCHAR(111) CHARACTER SET utf8mb4 NOT NULL;
Note: If you start with TEXT, use BLOB as the intermediate definition. Since ALTER needs to know all the details (size, nullness, etc), it is quite messy to dynamically create the ALTERs.
CHARACTER SET utf8mb4 with double-encoding:
UPDATE tbl SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4);
CHARACTER SET latin1 with double-encoding: Do the 2-step ALTER, then fix the double-encoding.
Going through the tables:
SELECT CONCAT("UPDATE ", table_schema, ".", table_name, "
SET ", column_name, " = CONVERT(BINARY(CONVERT(", column_name,
" USING latin1)) USING utf8mb4);")
FROM information_schema.columns
WHERE character_set_name = 'latin1';
Then copy & paste the output. (Or write a Stored Procedure to do the execute.)
Caveat: The SELECTs may pick more tables/columns than it should.

Illegal mix of collations error in mysql query

Is there any way to compare the generated range column in the mysql query ?
SELECT ue.bundle,ue.timestamp,b.id,bv.id as bundleVersionId,bv.start_date,bv.end_date, bv.type,ue.type from (
SELECT bundle,timestamp,tenant, case when Document_Id ='' then 'potrait'
WHEN Document_Id<>'' then 'persisted' end as type from uds_expanded ) ue
JOIN bundle b on b.name=ue.bundle join bundle_version bv on b.id=bv.bundle_id
WHERE ue.tenant='02306' and ue.timestamp >= bv.start_date and ue.timestamp <=bv.end_date and **ue.type=bv.type ;**
I am getting the following error when I try to compare the types
Error Code: 1267. Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,IMPLICIT) for operation '=' 0.000 sec
Stick to one encoding/collation for your entire system. Right now you seem to be using UTF8 one place and latin1 in another place. Convert the latter to use UTF8 as well and you'll be good.
You can change the collation to UTF8 using
alter table <some_table> convert to character set utf8 collate utf8_general_ci;
I think sometimes the issue is we use different orm utilities to generate table and then we want to test queries either by mysql command line or MySql workbench, then this problem comes due to differences of table collation and the command line or app we use. simple way is to define your variables (ones used to test the query against table columns)
ex:
MySQL>
MySQL> set #testCode = 'test2' collate utf8_unicode_ci;
Query OK, 0 rows affected (0.00 sec)
MySQL> select * from test where code = #testCode;
full details
Be aware that the single columns can have their collation.
For example, Doctrine generates columns of VARCHAR type as CHARACTER SET utf8 COLLATE utf8_unicode_ci, and changing the table collation doesen't affect the single columns.
You can change the column's collation with this command:
ALTER TABLE `table`
CHANGE COLUMN `test` `test` VARCHAR(15) CHARACTER SET 'utf8' COLLATE 'utf8_general_ci'
or in MySql Workbench interface-> right click on the table-> Alter Table and then in the interface click on a column and modify it.
Use ascii_bin where ever possible, it will match up with almost any collation.

MySQL: char_length(), wrong value for Russian

I am using char_length() to measure the size of "Русский": strangely, instead of telling me that it's 7 chars, it tells me there are 14. Interestingly if the query is simply...
SELECT CHAR_LENGTH('Русский')
...the answer is correct. However if I query the DB instead, the anser is 14:
SELECT CHAR_LENGTH(text) FROM locales WHERE lang = 'ru-RU' AND name = 'lang_name'
Anybody go any ideas what I might be doing wrong? I can confirm that the collation is utf8_general_ci and the table is MyISAM
Thanks,
Adrien
EDIT: My end objective is to be able to measure the lengths of records in a table containing single and double-byte chracters (eg. English & Russian, but not limited to these two languages only)
Because of two bytes is used for each UTF8 char.
See http://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_char-length
mysql> set names utf8;
mysql> SELECT CHAR_LENGTH('Русский'); result - 7
mysql> SELECT CHAR_LENGTH('test'); result - 4
create table test123 (
text VARCHAR(255) NOT NULL DEFAULT '',
text_text TEXT) Engine=Innodb default charset=UTF8;
insert into test123 VALUES('русский','test русский');
SELECT CHAR_LENGTH(text),CHAR_LENGTH(text_text) from test123; result - 7 and 12
I have tested work with: set names koi8r; create table and so on and got invalid result.
So the solution is recreate table and insert all data after setting set names UTF8.
the function return it's anwser guided by the most adjacent charset avaiable
in the case of a column, the column definition
in the case of a literal, the connection default
review the column charset with:
SELECT CHARACTER_SET_NAME FROM information_schema.`COLUMNS`
where table_name = 'locales'
and column_name = 'text'
be careful, it is not filtered by table_schema

Illegal mix of collations error in MySql

Just got this answer from a previous question and it works a treat!
SELECT username, (SUM(rating)/COUNT(*)) as TheAverage, Count(*) as TheCount
FROM ratings WHERE month='Aug' GROUP BY username HAVING TheCount > 4
ORDER BY TheAverage DESC, TheCount DESC
But when I stick this extra bit in it gives this error:
Documentation #1267 - Illegal mix of
collations
(latin1_swedish_ci,IMPLICIT) and
(latin1_general_ci,IMPLICIT) for
operation '='
SELECT username, (SUM(rating)/COUNT(*)) as TheAverage, Count(*) as TheCount FROM
ratings WHERE month='Aug'
**AND username IN (SELECT username FROM users WHERE gender =1)**
GROUP BY username HAVING TheCount > 4 ORDER BY TheAverage DESC, TheCount DESC
The table is:
id, username, rating, month
Here's how to check which columns are the wrong collation:
SELECT table_schema, table_name, column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE collation_name = 'latin1_general_ci'
ORDER BY table_schema, table_name,ordinal_position;
And here's the query to fix it:
ALTER TABLE tbl_name CONVERT TO CHARACTER SET latin1 COLLATE 'latin1_swedish_ci';
Link
Check the collation type of each table, and make sure that they have the same collation.
After that check also the collation type of each table field that you have use in operation.
I had encountered the same error, and that tricks works on me.
[MySQL]
In these (very rare) cases:
two tables that really need different collation types
values not coming from a table, but from an explicit enumeration, for instance:
SELECT 1 AS numbers UNION ALL SELECT 2 UNION ALL SELECT 3
you can compare the values between the different tables by using CAST or CONVERT:
CAST('my text' AS CHAR CHARACTER SET utf8)
CONVERT('my text' USING utf8)
See CONVERT and CAST documentation on MySQL website.
I was getting this same error on PhpMyadmin and did the solution indicated here which worked for me
ALTER TABLE table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci
Illegal mix of collations MySQL Error
Also I would recommend going with General instead of swedish since that one is default and not to use the language unless your application is using Swedish.
I think you should convert to utf8
--set utf8 for connection
SET collation_connection = 'utf8_general_ci'
--change CHARACTER SET of DB to utf8
ALTER DATABASE dbName CHARACTER SET utf8 COLLATE utf8_general_ci
--change CHARACTER SET of table to utf8
ALTER TABLE tableName CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci
I also got same error, but in my case main problem was in where condition the parameter that i'm checking was having some unknown hidden character (+%A0)
When A0 convert I got 160 but 160 was out of the range of the character that db knows, that's why database cannot recognize it as character other thing is my table column is varchar
the solution that I did was I checked there is some characters like that and remove those before run the sql command
ex:- preg_replace('/\D/', '', $myParameter);
Check that your users.gender column is an INTEGER.
Try: alter table users convert to character set latin1 collate latin1_swedish_ci;
You need to change each column Collation from latin1_general_ci to latin1_swedish_ci
I got this same error inside a stored procedure, in the where clause. i discovered that the problem ocurred with a local declared variable, previously loaded by the same table/column.
I resolved it casting the data to single char type.
In short, this error is caused by MySQL trying to do an operation on two things which have different collation settings. If you make the settings match, the error will go away. Of course, you need to choose the right setting for your database, depending on what it is going to be used for.
Here's some good advice on choosing between two very common utf8 collations: What's the difference between utf8_general_ci and utf8_unicode_ci
If you are using phpMyAdmin you can do this systematically by working through the tables mentioned in your error message, and checking the collation type for each column. First you should check which is the overall collation setting for your database - phpMyAdmin can tell you this and change it if necessary. But each column in each table can have its own setting. Normally you will want all these to match.
In a small database this is easy enough to do by hand, and in any case if you read the error message in full it will usually point you to the right place. Don't forget to look at the 'structure' settings for columns with subtables in as well. When you find a collation that does not match you can change it using phpMyAdmin directly, no need to use the query window. Then try your operation again. If the error persists, keep looking!
The problem here mainly, just Cast the field like this cast(field as varchar) or cast(fields as date)
I had this problem not because I'm storing in different collations, but because my column type is JSON, which is binary.
Fixed it like this:
select table.field COLLATE utf8mb4_0900_ai_ci AS fieldName
Use ascii_bin where ever possible, it will match up with almost any collation.
A username seldom accepts special characters anyway.
If you want to avoid changing syntax to solve this problem, try this:
Update your MySQL to version 5.5 or greater.
This resolved the problem for me.
I have the same problem with collection warning for a field that is set from 0 to 1. All columns collections was the same. We try to change collections again but nothing fix this issue.
At the end we update the field to NULL and after that we update to 1 and this overcomes the collection problem.
Was getting Illegal mix of collations while creating a category in Bagisto. Running these commands (thank you #Quy Le) solved the issue for me:
--set utf8 for connection
SET collation_connection = 'utf8_general_ci'
--change CHARACTER SET of DB to utf8
ALTER DATABASE dbName CHARACTER SET utf8 COLLATE utf8_general_ci
--change category tables
ALTER TABLE categories CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci
ALTER TABLE category_translations CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci
In my case it was something strange. I read an api key from a file and then I send it to the server where a SQL query is made. The problem was the BOM character that the Windows notepad left, it was causing the error that says:
SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='
I just removed it and everything worked like a charm
You need to set 'utf8' for all parameters in each Function. It's my case:
SELECT username, AVG(rating) as TheAverage, COUNT(*) as TheCount
FROM ratings
WHERE month='Aug'
AND username COLLATE latin1_general_ci IN
(
SELECT username
FROM users
WHERE gender = 1
)
GROUP BY
username
HAVING
TheCount > 4
ORDER BY
TheAverage DESC, TheCount DESC;
Make sure your version of MySQL supports subqueries (4.1+). Next, you could try rewriting your query to something like this:
SELECT ratings.username, (SUM(rating)/COUNT(*)) as TheAverage, Count(*) as TheCount FROM ratings, users
WHERE ratings.month='Aug' and ratings.username = users.username
AND users.gender = 1
GROUP BY ratings.username
HAVING TheCount > 4 ORDER BY TheAverage DESC, TheCount DESC