MySQL case-sensitive search for a utf8_bin field

I created a table and gave a field the utf8_bin collation so that I could add a unique index to it. Now I need to do case-insensitive searches, but when I ran some queries with the COLLATE keyword, I got:
mysql> select * from page where pageTitle="Something" Collate utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'
mysql> select * from page where pageTitle="Something" Collate latin1_general_ci;
ERROR 1267 (HY000): Illegal mix of collations (utf8_bin,IMPLICIT) and (latin1_general_ci,EXPLICIT) for operation '='
I am pretty new to SQL, so I was wondering if anyone could help.

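A string in MySQL has a character set and a collation. utf8 is the character set, and utf8_bin is one of its collations. The first error shows that your string literal is latin1 (the connection character set), so a utf8 collation cannot be applied to it. To compare the literal to a utf8 column, convert it to utf8 by prefixing it with the _charset introducer: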
_utf8 'Something'
A collation is only valid for its own character set. The case-sensitive (binary) collation for utf8 is utf8_bin, which you can specify like this:
_utf8 'Something' collate utf8_bin
With these conversions, the query should work:
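select * from page where pageTitle = _utf8 'Something' collate utf8_bin
For a case-insensitive search, use a _ci collation of the same character set instead, for example _utf8 'Something' collate utf8_general_ci.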
The _charset introducer only works with string literals. To convert the character set of a column value inside a query, use CONVERT(... USING ...). This is useful when you want to compare the pageTitle field in another character set, as in:
select * from page
where convert(pageTitle using latin1) collate latin1_general_cs = 'Something'
To see the character set and collation for a column named 'col' in a table called 'TAB', try:
select distinct collation(col), charset(col) from TAB
A list of all character sets and collations can be found with:
show character set
show collation
And all valid collations for utf8 can be found with:
show collation where charset = 'utf8'

Also note that forcing a collation, e.g. with COLLATE utf8_general_ci or COLLATE latin1_general_ci, can prevent MySQL from using existing indexes on the column when the forced collation differs from the column's own collation. This could become a performance bottleneck later.

Try this; it works for me:
SELECT * FROM users WHERE UPPER(name) = UPPER('josé') COLLATE utf8_bin;
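The UPPER() query above works by folding both sides to upper case and then comparing them byte for byte under utf8_bin. A rough Python analogy (the function names are illustrative, and this is not MySQL's collation code):

```python
def binary_equal(a: str, b: str) -> bool:
    # Like a _bin collation: compare the encoded bytes exactly.
    return a.encode("utf-8") == b.encode("utf-8")


def upper_equal(a: str, b: str) -> bool:
    # Like UPPER(a) = UPPER(b) COLLATE utf8_bin: fold case first, then compare bytes.
    return binary_equal(a.upper(), b.upper())


print(binary_equal("José", "josé"))  # False: case differs
print(upper_equal("José", "josé"))   # True: both become "JOSÉ"
print(upper_equal("José", "jose"))   # False: the accent still differs
```

Note that this approach stays accent sensitive, and wrapping the column in UPPER() generally keeps MySQL from using an index on that column.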

May I ask why you need to change the collation explicitly in the SELECT? Why not just give the column the collation you want to use when retrieving and sorting records?
Your searches are case sensitive because the column has a binary collation. Try a general collation such as utf8_general_ci instead. For more information about case sensitivity and collations, see:
Case Sensitivity in String Searches

Related

Incorrect string value: '\xC3' for column 'description' at row 1 in MySQL

In my table, I have a column named description.
Because of this error, I've tried many solutions from SO that change the collation.
I've tried the following collations:
1) utf8mb4_unicode_ci
2) utf8_general_ci
Here, SHOW FULL COLUMNS FROM your_table;
Does anyone know the right collation for strings like '\xC3'?
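To support full UTF-8 Unicode, for example emoji, you should use the utf8mb4 character set with the utf8mb4_unicode_ci collation. MySQL's utf8 is outdated: it stores at most 3 bytes per character. In your case, '\xC3' is the first byte of a two-byte UTF-8 character such as À.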
You can find a full explanation at https://mathiasbynens.be/notes/mysql-utf8mb4.
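The \xC3 in the error message is a single byte, not a whole character: in UTF-8 it is the lead byte of two-byte characters such as À (C3 80), é (C3 A9) and ü (C3 BC), while most emoji take four bytes. A small Python sketch to inspect this; fits_in_mysql_utf8 is a hypothetical helper that only checks the 3-byte limit of MySQL's legacy utf8, not any other column constraint:

```python
def utf8_bytes(ch: str) -> str:
    # Hex bytes of one character in UTF-8, e.g. "C3 80" for "À".
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def fits_in_mysql_utf8(text: str) -> bool:
    # MySQL's legacy utf8 (utf8mb3) stores at most 3 bytes per character;
    # anything longer, such as most emoji, needs utf8mb4.
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


for ch in ["A", "À", "é", "ü", "😀"]:
    print(f"U+{ord(ch):04X} {ch}: {utf8_bytes(ch)}")

print(fits_in_mysql_utf8("Schüring"))  # True
print(fits_in_mysql_utf8("😀"))        # False
```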
You can check the current collations of your table like this:
SHOW FULL COLUMNS FROM your_table;
I assume your description column has type TEXT; otherwise, you might need to change the type.
To convert the table, including its existing character columns, to utf8mb4 you can use:
ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4;
But this gives the columns the default collation of utf8mb4, which is not necessarily utf8mb4_unicode_ci.
To change the collation of your column you should use:
ALTER TABLE your_table MODIFY description TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Try this first:
ALTER TABLE your_database_name.your_table CONVERT TO CHARACTER SET utf8
Or, if the above solution doesn't work, run the following after connecting to your database:
SET NAMES 'utf8';
SET CHARACTER SET utf8;

How can I set a utf8mb4 collation for a column in MySQL?

Here is the command I use:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
It works well. Now I need to set utf8mb4_unicode_ci for a column (since characters are currently shown as ???). Here is my new command:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8mb4_unicode_ci;
But sadly MySQL throws:
ERROR 1253 (42000): COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER
Any idea?
The first part of the COLLATION name must match the CHARACTER SET name.
CHARACTER SET utf8mb4 is needed for Emoji and some Chinese characters.
Let's back up to the real problem: the question marks.
COLLATION refers to the rules of ordering and sorting, not encoding.
CHARACTER SET refers to the encoding. This should be consistent at all stages. Question marks come from inconsistencies.
The answer "Trouble with UTF-8 characters; what I see is not what I stored" lists these likely suspects for question marks:
The bytes to be stored are not encoded as utf8/utf8mb4. Fix this.
The column in the database is not CHARACTER SET utf8mb4. Fix this if you need 4-byte UTF-8. (Use SHOW CREATE TABLE.)
Also, check that the connection during reading is UTF-8. The details depend on the application doing the connecting.
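Both symptoms of inconsistent encodings can be reproduced outside MySQL. As an analogy in Python (not MySQL's internals): a character that the target character set cannot represent is replaced by ?, and UTF-8 bytes read back as latin1 turn into mojibake. The function names are hypothetical:

```python
def store_as_latin1(text: str) -> bytes:
    # Characters latin1 cannot represent become "?" (the question-mark symptom).
    return text.encode("latin-1", errors="replace")


def misread_utf8_as_latin1(text: str) -> str:
    # UTF-8 bytes decoded with the wrong character set give mojibake.
    return text.encode("utf-8").decode("latin-1")


print(store_as_latin1("IDĘ"))              # b'ID?'
print(store_as_latin1("Schüring"))         # b'Sch\xfcring': ü exists in latin1
print(misread_utf8_as_latin1("Schüring"))  # SchÃ¼ring
```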
This worked for me:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
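The rule that a collation name starts with its character set name can be checked mechanically. A minimal Python sketch; collation_matches_charset is a hypothetical helper doing a simple name check, whereas the authoritative list for a character set comes from SHOW COLLATION WHERE Charset = 'utf8mb4':

```python
def collation_matches_charset(charset: str, collation: str) -> bool:
    # Collation names start with their character set name plus "_",
    # e.g. utf8mb4_unicode_ci belongs to utf8mb4, not to utf8.
    # The binary character set's only collation is simply named "binary".
    if charset == "binary":
        return collation == "binary"
    return collation.startswith(charset + "_")


attempts = [
    ("utf8", "utf8_unicode_ci"),        # first command: accepted
    ("utf8", "utf8mb4_unicode_ci"),     # second command: ERROR 1253
    ("utf8mb4", "utf8mb4_unicode_ci"),  # the command that worked
]
for charset, collation in attempts:
    print(charset, collation, collation_matches_charset(charset, collation))
```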

Polish and German accented letters in MySQL

I want to store Polish and German characters in a unique column.
When I alter the database with:
alter database osa character set utf8 collate utf8_general_ci;
I have a problem with German characters:
sql> insert into company(uuid, name) VALUE ("1","IDE")
[2016-11-27 10:37:35] 1 row affected in 13ms
sql> insert into company(uuid, name) VALUE ("2","IDĘ")
[2016-11-27 10:37:37] 1 row affected in 9ms
sql> insert into company(uuid, name) VALUE ("3","Schuring")
[2016-11-27 10:37:38] 1 row affected in 13ms
sql> insert into company(uuid, name) VALUE ("4","Schüring")
[2016-11-27 10:37:39] [23000][1062] Duplicate entry 'Schüring' for key 'UK_niu8sfil2gxywcru9ah3r4ec5'
Which collation do I have to use?
Edit:
utf8_unicode_ci does not work either.
The _ci in the COLLATION name stands for "case insensitive". Unfortunately, in practice it also means "accent insensitive". So to get E and Ę (or u and ü) treated as different, you need a _bin collation: either utf8_bin or utf8mb4_bin.
utf8mb4 is needed for emoji and some Chinese characters, plus a few other rarely used characters.
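To see why a unique index rejects Schüring after Schuring under an accent-insensitive collation, here is a rough Python approximation of a case- and accent-insensitive comparison key (decompose with NFD, drop combining marks, casefold) next to a binary one. It is not MySQL's weight table: the exact set of folded letters differs between collations (in the log above, utf8_general_ci kept IDE and IDĘ apart but merged Schuring and Schüring), so this only shows the idea. The function names are hypothetical:

```python
import unicodedata


def ci_ai_key(text: str) -> str:
    # Approximates a case- and accent-insensitive collation key.
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def bin_key(text: str) -> bytes:
    # A _bin collation compares the encoded bytes as-is.
    return text.encode("utf-8")


a, b = "Schuring", "Schüring"
print(ci_ai_key(a) == ci_ai_key(b))  # True: would collide in a unique index
print(bin_key(a) == bin_key(b))      # False: distinct under a _bin collation
```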
Replace all occurrences of utf8_general_ci with utf8_unicode_ci. Apparently utf8_general_ci is broken; see "What are the differences between utf8_general_ci and utf8_unicode_ci?":
utf8_general_ci is a very simple — and on Unicode, very broken — collation, one that gives incorrect results on general Unicode text.
Maybe you should try utf8mb4_unicode_ci?
MySQL's utf8 character set cannot store all UTF-8 characters:
https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
alter database osa character set utf8mb4 COLLATE utf8mb4_bin;
This works for me. @Maciek Bryński, thank you for the hint.

How to use COLLATE in MySQL prepared statement

I need to specify COLLATE in connection with LIKE inside a prepared statement inside a stored procedure, e.g. <col> LIKE ? COLLATE utf8_unicode_ci. I'm getting the following error:
COLLATION 'utf8_unicode_ci' is not valid for CHARACTER SET 'binary'
I have also tried casting the parameter in each of these ways: LIKE _utf8 ? COLLATE utf8_unicode_ci, LIKE CONVERT(? AS utf8) COLLATE utf8_unicode_ci and LIKE CAST(? AS varchar CHARACTER SET utf8) COLLATE utf8_unicode_ci, but then the error is something like:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'varchar CHARACTER SET utf8) COLLATE utf8_unicode_ci ...
Any hint would be greatly appreciated.
AFAIK, the _utf8 introducer is part of the syntax for string literals. Since a ? placeholder is not a string literal, you cannot use it there.
The signature for CAST() is this:
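CAST(expr AS type)
VARCHAR is not one of the accepted types for CAST(); CHAR is, which is why the last attempt is a syntax error.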
I think you really want CONVERT():
CONVERT(expr,type), CONVERT(expr USING transcoding_name)
[...]
CONVERT() with USING converts data between different character sets.
In MySQL, transcoding names are the same as the corresponding
character set names. For example, this statement converts the string
'abc' in the default character set to the corresponding string in the
utf8 character set:
SELECT CONVERT('abc' USING utf8);

Convert from utf8_general_ci to utf8_unicode_ci

I have a utf8_general_ci database that I'm interested in converting to utf8_unicode_ci.
I've tried the following commands:
ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci; (for every single table)
But that seems to change the character set only for future data; it doesn't convert the existing data from utf8_general_ci to utf8_unicode_ci.
Is there any way to convert the existing data to utf8_unicode_ci?
Use SHOW CREATE TABLE to see whether it really set the CHARACTER SET and COLLATION on the columns, not just the defaults.
What was the CHARACTER SET before the ALTERs?
Do SELECT col, HEX(col) ... for some field that should have utf8 in it. This will help us determine whether you really have utf8 in the table. The encoding of a character differs by CHARACTER SET, and HEX reveals which encoding is actually stored.
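To interpret the HEX(col) output, it helps to know what one character looks like under each storage scenario. A Python sketch for é (hex_variants is a hypothetical helper, and the latin1 variant only works for characters that exist in latin1): C3A9 suggests correct UTF-8, E9 suggests latin1, and C383C2A9 is the usual pattern when UTF-8 bytes were treated as latin1 and converted to UTF-8 a second time:

```python
def hex_variants(ch: str) -> dict:
    utf8 = ch.encode("utf-8")
    return {
        # Correctly stored UTF-8.
        "utf8": utf8.hex().upper(),
        # Stored in latin1 (fails for characters outside latin1).
        "latin1": ch.encode("latin-1").hex().upper(),
        # UTF-8 bytes misread as latin1 and converted to UTF-8 again.
        "double_encoded": utf8.decode("latin-1").encode("utf-8").hex().upper(),
    }


print(hex_variants("é"))
```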
The ordering (WHERE, ORDER BY, etc) is controlled by COLLATION. The indexes probably had to be rebuilt based on your ALTER TABLE. Did big tables with indexes take a 'long' time to convert?
To actually see the difference between utf8_general_ci and utf8_unicode_ci, you need a "combining accent" or, more simply, the German ß versus ss:
mysql> SELECT 'ß' = 'ss' COLLATE utf8_general_ci,
'ß' = 'ss' COLLATE utf8_unicode_ci;
+-------------------------------------+-------------------------------------+
| 'ß' = 'ss' COLLATE utf8_general_ci | 'ß' = 'ss' COLLATE utf8_unicode_ci |
+-------------------------------------+-------------------------------------+
| 0 | 1 |
+-------------------------------------+-------------------------------------+
However, to test that in your tables, you would need to store those values and use WHERE or GROUP_CONCAT or something else to determine the equality.
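Python's string methods show a similar split, as an analogy only: plain lower-casing keeps ß apart from ss, like the 0 for utf8_general_ci, while Unicode full case folding (str.casefold) expands ß to ss, much like the 1 returned by utf8_unicode_ci. Neither is MySQL's actual collation algorithm:

```python
def compare(a: str, b: str):
    # lower(): simple case mapping, so "ß" stays "ß".
    # casefold(): Unicode full case folding, which expands "ß" to "ss".
    return a.lower() == b.lower(), a.casefold() == b.casefold()


for a, b in [("ß", "ss"), ("Straße", "STRASSE")]:
    simple, folded = compare(a, b)
    print(f"{a} vs {b}: lower={simple}, casefold={folded}")
```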
What 'proof' do you have that the ALTERs failed to achieve the collation change?
(Addressing other comments: REPAIR should be irrelevant. CONVERT TO tells the ALTER to actually modify the data, so it should have done the desired action.)
You have to change the collation of every field in every table. As you say, the collation of the table is only the default value for fields created later, and the collation of the database is only the default value for tables created later.
As Lorenz Meyer said, the collation of the table is only the default value for fields created later and you need to set the defaults for the columns explicitly too.
Such a change looks like:
ALTER TABLE mytable CHANGE mycolumn mycolumn varchar(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
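Since every column needs its own change, the statements can be generated from the column list. A minimal sketch, assuming the rows come from a query such as SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = 'dbname' AND CHARACTER_SET_NAME IS NOT NULL; the sample rows and the function name are hypothetical. CHANGE redefines the whole column, so attributes such as NOT NULL, DEFAULT or COMMENT would have to be restated, which this sketch does not do:

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (table name, column name, column type)


def change_column_statements(rows: Iterable[Row],
                             charset: str = "utf8mb4",
                             collation: str = "utf8mb4_unicode_ci") -> List[str]:
    # One ALTER TABLE ... CHANGE per character column, in the form shown above.
    statements = []
    for table, column, column_type in rows:
        statements.append(
            f"ALTER TABLE `{table}` CHANGE `{column}` `{column}` {column_type} "
            f"CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


# Hypothetical rows, e.g. as returned by the information_schema query.
sample_rows = [("mytable", "mycolumn", "varchar(15)"), ("mytable", "notes", "text")]
for stmt in change_column_statements(sample_rows):
    print(stmt)
```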