MySQL transform case-sensitive unique field into unique case-insensitive field - mysql

This is an interesting challenge question as there can be multiple ways to solve this problem :)
I have an ID column that is unique but case-sensitive, ex:
----- ID ------
0018000001K6dkh -> record 1
0018000001K6dkH -> record 2 (different from record 1)
Because the column uses a case-insensitive utf8 collation (the MySQL default for utf8), MySQL treats the two ID values as identical:
SELECT COUNT(*) FROM table WHERE id='0018000001K6dkh' //returns 2 instead of 1
SELECT COUNT(*) FROM table WHERE id='0018000001K6dkH' //returns 2 instead of 1
I need to build a MySQL function that I can apply to this column to transform every ID into a value that stays unique even when compared case-insensitively, ex:
function('0018000001K6dkh') = something
function('0018000001K6dkH') = something different
function('0018000000ttzlB') = something even different
function('0018000000tTzlB') = something even more different
Note: I don't want to use collations, because I would have to change all the WHEREs and JOINs in my application (Laravel PHP). I want to change the IDs permanently.

According to the official MySQL documentation, you can set the collation on a per-column basis (see the data_type section for CHAR, VARCHAR, TEXT, ENUM, and SET types).
This should change the default manner in which they are compared to string literals and each other, but I am not familiar with the rules used when comparing fields of differing collation with each other.

This is from the MySQL documentation: http://dev.mysql.com/doc/refman/5.7/en/case-sensitivity.html
To cause a case-sensitive comparison of nonbinary strings to be case
insensitive, use COLLATE to name a case-insensitive collation. The
strings in the following example normally are case sensitive, but
COLLATE changes the comparison to be case insensitive:
SET @s1 = 'MySQL' COLLATE latin1_bin;
SET @s2 = 'mysql' COLLATE latin1_bin;
SELECT @s1 = @s2;
returns 0 (false)
SELECT @s1 COLLATE latin1_swedish_ci = @s2;
returns 1 (true)
Or use BINARY:
SELECT COUNT(*) FROM table WHERE BINARY id='0018000001K6dkh'
SELECT COUNT(*) FROM table WHERE BINARY id='0018000001K6dkH'
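If the IDs really must be rewritten permanently, as the question asks, one option is to encode each ID so that letter case is carried by digits instead of by case. The sketch below is only an illustration in Python, not a tested migration: the function name to_case_insensitive_id is hypothetical, and hex encoding is just one possible transform. MySQL's built-in HEX() applies the same kind of encoding to a string argument, so the equivalent could be done server-side. The cost is that each ID doubles in length (15 characters become 30).

```python
def to_case_insensitive_id(case_sensitive_id: str) -> str:
    """Hex-encode the ID's bytes so that case differences become digit differences."""
    # 'h' is byte 0x68 and 'H' is byte 0x48, so they encode to "68" and "48",
    # which stay distinct under a case-insensitive comparison.
    return case_sensitive_id.encode("utf-8").hex().upper()


ids = ["0018000001K6dkh", "0018000001K6dkH", "0018000000ttzlB", "0018000000tTzlB"]
encoded = [to_case_insensitive_id(i) for i in ids]

# Lowercasing simulates a case-insensitive comparison: all four values stay distinct.
assert len({e.lower() for e in encoded}) == len(ids)

for original, new_id in zip(ids, encoded):
    print(original, "->", new_id)
```

Because the encoding is reversible, the original IDs can be recovered later by decoding the hex string.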

Related

Are WHERE clause in Mysql not case-sensitive?

Is the WHERE clause not case-sensitive?
I tried different versions of a query against this table data:
username(TEXT) pass(TEXT)
admin admin
The above table has only one entry
SELECT pass FROM lbdb_user WHERE username = "Admin";
SELECT pass FROM lbdb_user WHERE username = "admiN";
SELECT pass FROM lbdb_user WHERE username = "admin ";
All of the above queries returned the same result.
Mysql collate, searching case sensitive
MySQL queries are usually not case sensitive by default. To make a query case sensitive, you can use the MySQL COLLATE clause.
The collate clause lets you specify a collation, which basically is a set of rules for comparing characters in a given character set.
The suffixes ci, cs, bin of a collation stand for case insensitive, case sensitive and binary, respectively. A binary collation such as utf8_bin is case sensitive as well since it compares the characters based on their numeric values.
SELECT * FROM users WHERE name like 'cRaZy' COLLATE utf8_bin;

Make different lower and upper in mysql select

I want to search for a name in the database, but I want to select only Bill, not biLL or BiLL or any other variant, just "Bill". When I use this query, it returns Bill, BiLL, BILL, bilL and so on.
query=`select * from names where name='Bill'`
To quote the documentation:
The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation. For example, if you are comparing a column and a string that both have the latin1 character set, you can use the COLLATE operator to cause either operand to have the latin1_general_cs or latin1_bin collation
You can overcome this by explicitly using a case sensitive collation:
select * from names where name='Bill' COLLATE latin1_general_cs
Another solution is to set the collation of the column itself to a case-sensitive or binary collation, such as utf8mb4_bin for a utf8mb4 column.

Accent sensitive matches in MySQL

My MySQL table treats cód and cod as the same. How can I avoid this problem?
The table is COLLATE='utf8_general_ci'
Check the character set and collation of the database, the table, and the columns you want to compare.
Make sure they are set to:
COLLATE='utf8_bin'
Make sure the collation name does not end in _ci, which stands for case insensitive.
If you cannot change the table or database settings, you can apply the collation in your queries:
SELECT * FROM tablex WHERE LOWER(column) = 'cód' COLLATE utf8_bin

How to setup MySQL to handle unicode diacriticals properly?

This is an odd puzzle, AFAIK utf8_bin should guarantee that every accent is stored in the database properly, i.e. without some strange conversion to ASCII. So I have such table with:
DEFAULT CHARSET=utf8 COLLATE=utf8_bin
and yet when I try to compare/query/whatever such entries as "Krąków" and "Kraków" according to MySQL this is the same string.
Out of curiosity I also tried utf8_polish, and MySQL claims that for Polish guys "a" and "ą" do not make any difference.
So how to setup MySQL table, so I could store unicode strings safely, without losing accents and alike?
Server: MySQL 5.5 + openSUSE 11.4, client: Windows 7 + MySQL Workbench 5.2.
Update -- CREATE TABLE
CREATE TABLE `Cities` (
`city_Name` VARCHAR(145) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`city_Name`)
) DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Please note that I cannot set a different utf8_bin for column, because entire table is utf8_bin, so in effect collation for column is reset to default.
All credits of the solution go to bobince, so please upvote his comment to my question.
The solution to the problem is somewhat strange, and I would risk saying MySQL is broken in this regard.
So, let's say I created a table with utf8 and didn't do anything for column. Later I realize I need strict comparison of characters, so I change the collation for table AND columns to utf8_bin. Solved?
No, now MySQL sees this -- the table is indeed utf8_bin, but column is also utf8_bin, which means column uses the DEFAULT collation of the table. However MySQL does not realize that the previous default is not the same as current default. And thus comparison still does not work.
So you have to shake off that default for column, to some alien value out of scope of the collation "family" (in case of "utf8xxx" means no other "utf8xxx"). Once it is shaken off, and you see entry which does not say "default" at column collation, you can set utf8_bin, which now evaluates to default, but since we come from non-default collation, everything kicks in as expected.
Do not forget to apply the changes at each step.
The MySQL default charset and collation (which are server-wide but can be changed per connection) apply at the time a table is created. Changing the defaults after the table is created doesn't affect existing tables.
Character sets and collations are attributes of individual columns. They can be set from a table-wide default but they do belong to columns.
A charset of utf8 should be sufficient to allow all European languages to be represented correctly. You should definitely be able to store "a" and "ą" as two different characters.
A collation of utf8_bin yields a case- and accent-sensitive collation.
Here are some examples of the difference between text value and collation behavior. I'm using three sample strings: 'abcd', 'ĄBCD' , and 'ąbcd'. The last two have the A-ogonek letter.
This first example says that with utf8 character representation and utf8_general_ci collation, that the three strings each display as specified by the user, but that they compare equal. That's to be expected in a collation that doesn't distinguish between a and ą. That's a typical case insensitive collation, where all the variant characters are sorted equal to the character without any diacritical marks.
SET NAMES 'utf8' COLLATE 'utf8_general_ci';
SELECT 'abcd', 'ąbcd' , 'abcd' < 'ąbcd', 'abcd' = 'ąbcd';
false true
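To make the idea of "variant characters sort equal to the base character" concrete, here is a rough Python model. It is an approximation for illustration only, not how MySQL implements utf8_general_ci (MySQL uses its own weight tables), and the function name general_ci_like_key is hypothetical. It strips combining accents after Unicode decomposition and then uppercases; letters that have no Unicode decomposition are left unchanged by this sketch.

```python
import unicodedata


def general_ci_like_key(text: str) -> str:
    """Approximate comparison key: drop combining accents, then uppercase."""
    # NFD splits 'ą' into 'a' plus a combining ogonek, 'ó' into 'o' plus an acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() is non-zero for combining marks, so this keeps only base characters.
    base_letters = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_letters.upper()


# Strings with the same key are treated as equal by this model.
print(general_ci_like_key("abcd") == general_ci_like_key("ąbcd"))
print(general_ci_like_key("ĄBCD") == general_ci_like_key("ąbcd"))
```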
This next example shows that in the case-insensitive Polish-language collation, a comes before ą. I don't know Polish, but I suspect Polish telephone books have the As and the Ą's separated.
SET NAMES 'utf8' COLLATE 'utf8_polish_ci';
SELECT 'abcd', 'ĄBCD' , 'ąbcd', 'abcd' < 'ĄBCD', 'abcd' < 'ąbcd' , 'ąbcd' = 'ĄBCD'
true true true
This next example shows what happens with the utf8_bin collation.
SET NAMES 'utf8' COLLATE 'utf8_bin';
SELECT 'abcd', 'ĄBCD' , 'ąbcd', 'abcd' < 'ĄBCD', 'abcd' < 'ąbcd' , 'ąbcd' = 'ĄBCD'
true true false
There's one non-intuitive thing to notice in this case. 'abcd' < 'ĄBCD' is true (whereas 'abcd' < 'ABCD' with pure ASCII is false). That's a strange result if you're thinking linguistically. It happens because both A-ogonek characters have binary values in utf8 that are higher than all the abc and ABC characters. So: if you use the utf8_bin collation for ORDER BY operations, you'll get linguistically strange results.
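The byte values behind that result can be checked outside MySQL. The following Python sketch uses plain byte-wise comparison of the UTF-8 encodings as a stand-in for a binary collation; the helper name utf8_bytes is hypothetical.

```python
def utf8_bytes(text: str) -> bytes:
    """Return the UTF-8 encoding that a binary comparison would look at."""
    return text.encode("utf-8")


# Both A-ogonek letters encode to two bytes starting with 0xC4,
# which is higher than any ASCII letter (a-z are 0x61-0x7A, A-Z are 0x41-0x5A).
print(utf8_bytes("Ą").hex(), utf8_bytes("ą").hex())

# Byte-wise ordering: 'a' (0x61) sorts before 'Ą' (0xC4 ...) but after 'A' (0x41).
print(utf8_bytes("abcd") < utf8_bytes("ĄBCD"))
print(utf8_bytes("abcd") < utf8_bytes("ABCD"))
```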
You're saying that 'Krąków' and 'Kraków' compare equal, and that you're puzzled by that. They do compare equal when the collation in use is utf8_general_ci. But they don't with either utf8_bin or utf8_polish_ci. According to the Polish-language support in MySQL, these two spellings of the city's name are different.
As you design your application, you need to sort out how you want all this to work linguistically. Are 'Krąków' and 'Kraków' the same place? Are 'Ąaron' and 'Aaron' the same person? If so, you want utf8_general_ci.
You could consider altering the table you've shown like this:
ALTER TABLE Cities
MODIFY COLUMN city_Name
VARCHAR(145)
CHARACTER SET utf8
COLLATE utf8_general_ci
This will set the column in your table the way you want it.

Special Characters and a simple select query

I have got a problem with a simple Select Query and special chars. I want to select the name Änne.
SELECT * FROM `names` WHERE `name` = 'Änne'
utf8_general_ci
Änne
Anne
okay, ...
utf8_general_ci is a very simple collation. What it does is remove all accents, convert to upper case, and compare using the code of the resulting "base letter".
http://forums.mysql.com/read.php?103,187048,188748
utf8_unicode_ci
Änne
Anne
why?
utf8_bin
Änne
utf8_bin seems to be the right choice at this point, but I have to do my search case-insensitively.
SELECT * FROM `names` WHERE `name` = 'änne'
utf8_bin
none
Is there no way to do so?
I could use php ucwords() to uppercase the first letters, but i would prefer to find a DB solution.
edit: ucwords('änne') = änne, so I can't use that either
SELECT * FROM `names` WHERE lower(`name`) = 'änne'
is working for me, because i don't have a difference between 'Änne' and 'änne' in my DB.
what about:
SELECT * FROM `names` WHERE upper(`name`) = upper("änne")
Quoting doc:
The default character set and collation are latin1 and
latin1_swedish_ci, so nonbinary string comparisons are case
insensitive by default. This means that if you search with col_name
LIKE 'a%', you get all column values that start with A or a. To make
this search case sensitive, make sure that one of the operands has a
case sensitive or binary collation
That means you get case-sensitive results because you have set a binary collation. You can set the column collation to utf8_general_ci and override it in individual searches, using a collation that belongs to the column's character set:
col_name COLLATE utf8_bin LIKE 'a%'
There is an error in your MySQL code:
SELECT * FROM names WHERE name = "Änne"
Remove the quotes around the table name and the field name.