MySQL: Collate in query - any side effects? - mysql

My OpenCart table collation is utf8_bin, unfortunately I can't search for product names with accent in their name. I searched on Google and just found that the collation must be utf8_general_ci for accent compatible and case insensitive search.
What If I add collate declaration to the search query?
SELECT *
FROM `address`
COLLATE utf8_general_ci
LIMIT 0 , 30
Does it have any (bad) side effect? I red about problems with indexing, performance? Or it is totally safe?

I'm afraid you have to consider the side effects on query performance, especially those using indexes. Here is a simple test:
mysql> create table aaa (a1 varchar(100) collate latin1_general_ci, tot int);
insert into aaa values('test1',3) , ('test2',4), ('test5',5);
mysql> create index aindex on aaa (a1);
Query OK, 0 rows affected (0.59 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> desc aaa;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| a1 | varchar(100) | YES | MUL | NULL | |
| tot | int(11) | YES | | NULL | |
+-------+--------------+------+-----+---------+-------+
2 rows in set (0.53 sec)
mysql> explain select * from aaa where a1='test1' ;
+----+-------------+-------+------+---------------+--------+---------+-------+--
----+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | r
ows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------+--
----+-----------------------+
| 1 | SIMPLE | aaa | ref | aindex | aindex | 103 | const |
1 | Using index condition |
+----+-------------+-------+------+---------------+--------+---------+-------+--
----+-----------------------+
1 row in set (0.13 sec)
mysql> explain select * from aaa where a1='test1' collate utf8_general_ci;
+----+-------------+-------+------+---------------+------+---------+------+-----
-+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows
| Extra |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+-------------+
| 1 | SIMPLE | aaa | ALL | NULL | NULL | NULL | NULL | 3
| Using where |
+----+-------------+-------+------+---------------+------+---------+------+-----
-+-------------+
1 row in set (0.06 sec)
You can see that MySQL is stopping using the index on a1 when you search it using another collation, which can be a huge problem for you.
To make sure your indexes are being used for queries, you may have to change your column collation to the most frequently used one.

If practical, change the column definition(s).
ALTER TABLE tbl
MODIFY col VARCHAR(...) COLLATE utf8_general_ci ...;
(You should include anything else that was already in the column definition.) If you have multiple columns to modify, do them all in the same ALTER (for speed).
If, for some reason, you cannot do the ALTER, then, yes, you can tweak the SELECT to change the collation:
The SELECT you mentioned had no WHERE clause for filtering, so let me change the test case:
Let's say you have this, which will find only 'San Jose':
SELECT *
FROM tbl
WHERE city = 'San Jose'
To include San José:
SELECT *
FROM tbl
WHERE city COLLATE utf8_general_ci = 'San Jose'
If you might have "combining accents", consider using utf8_unicode_ci. More on Combining Diacriticals and More on your topic.
As for side effects? None except for on potentially big one: The index on the column cannot be used. In my second SELECT (above), INDEX(city) is useless. The ALTER avoids this performance penalty on the SELECT, but the one-time ALTER, itself, is costly.

In using of COLLATE in SQL statements, I don't find that usage, Anyway for explaining about your main question of effects of using collations I found some tips, but at first:
From dev.mysql.com:
Nonbinary strings (as stored in the CHAR, VARCHAR, and TEXT data types) have a character set and collation. A given character set can have several collations, each of which defines a particular sorting and comparison order for the characters in the set.
Collation is merely the ordering that is used for string comparisons—it has (almost) nothing to do with the character encoding that is used for data storage. I say almost because collations can only be used with certain character sets, so changing collation may force a change in the character encoding.
To the extent that the character encoding is modified, MySQL will correctly re-encode values to the new character set whether going from single to multi-byte or vice-versa. Beware that any values that become too large for the column will be truncated.[1]
The practical advantage of binary collation is its speed, as string comparison is very simple/fast. In general case, indexes with binary might not produce expected results for sort, however for exact matches they can be useful.[2]
With multiple operands, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should the comparison use the collation of the column x, or of the string literal 'Y'? Both x and 'Y' have collations, so which collation takes precedence?
Standard SQL resolves such questions using what used to be called “coercibility” rules. [3]
If you change the collation of a field, ORDER BY -[also in WHERE]- cannot use any INDEX; hence it could be surprisingly inefficient. [4]
Since the forced collation is defined over the same character set as the column's encoding, there won't be any performance impact(versus defining that collation as the column's default; whereas utf8_general_ci will almost certainly perform slower in comparisons than utf8_bin due the extra lookups/computation required).
However, if one forced a collation that is defined over a different character set, MySQL would have to transcode the column's values (which would have a performance impact).[5]

This might help: UTF-8: General? Bin? Unicode?
Please note that utf8_bin is also case sensitive. So I would go for altering table collation to utf8_general_ci and have peace of mind for the future.

Related

Why is MySQL string not case sensitive? [duplicate]

I have a function that returns five characters with mixed case. If I do a query on this string it will return the value regardless of case.
How can I make MySQL string queries case sensitive?
The good news is that if you need to make a case-sensitive query, it is very easy to do:
SELECT * FROM `table` WHERE BINARY `column` = 'value'
http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation. For example, if you are comparing a column and a string that both have the latin1 character set, you can use the COLLATE operator to cause either operand to have the latin1_general_cs or latin1_bin collation:
col_name COLLATE latin1_general_cs LIKE 'a%'
col_name LIKE 'a%' COLLATE latin1_general_cs
col_name COLLATE latin1_bin LIKE 'a%'
col_name LIKE 'a%' COLLATE latin1_bin
If you want a column always to be treated in case-sensitive fashion, declare it with a case sensitive or binary collation.
The answer posted by Craig White has a big performance penalty
SELECT * FROM `table` WHERE BINARY `column` = 'value'
because it doesn't use indexes. So, either you need to change the table collation like mention here https://dev.mysql.com/doc/refman/5.7/en/case-sensitivity.html.
OR
Easiest fix, you should use a BINARY of value.
SELECT * FROM `table` WHERE `column` = BINARY 'value'
E.g.
mysql> EXPLAIN SELECT * FROM temp1 WHERE BINARY col1 = "ABC" AND col2 = "DEF" ;
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | temp1 | ALL | NULL | NULL | NULL | NULL | 190543 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
VS
mysql> EXPLAIN SELECT * FROM temp1 WHERE col1 = BINARY "ABC" AND col2 = "DEF" ;
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| 1 | SIMPLE | temp1 | range | col1_2e9e898e | col1_2e9e898e | 93 | NULL | 2 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
enter code here
1 row in set (0.00 sec)
Instead of using the = operator, you may want to use LIKE or LIKE BINARY
// this returns 1 (true)
select 'A' like 'a'
// this returns 0 (false)
select 'A' like binary 'a'
select * from user where username like binary 'a'
It will take 'a' and not 'A' in its condition
The most correct way to perform a case sensitive string comparison without changing the collation of the column being queried is to explicitly specify a character set and collation for the value that the column is being compared to.
select * from `table` where `column` = convert('value' using utf8mb4) collate utf8mb4_bin;
Why not use binary?
Using the binary operator is inadvisable because it compares the actual bytes of the encoded strings. If you compare the actual bytes of two strings encoded using the different character sets two strings that should be considered the same they may not be equal. For example if you have a column that uses the latin1 character set, and your server/session character set is utf8mb4, then when you compare the column with a string containing an accent such as 'café' it will not match rows containing that same string! This is because in latin1 é is encoded as the byte 0xE9 but in utf8 it is two bytes: 0xC3A9.
Why use convert as well as collate?
Collations must match the character set. So if your server or session is set to use the latin1 character set you must use collate latin1_bin but if your character set is utf8mb4 you must use collate utf8mb4_bin. Therefore the most robust solution is to always convert the value into the most flexible character set, and use the binary collation for that character set.
Why apply the convert and collate to the value and not the column?
When you apply any transforming function to a column before making a comparison it prevents the query engine from using an index if one exists for the column, which could dramatically slow down your query. Therefore it is always better to transform the value instead where possible. When a comparison is performed between two string values and one of them has an explicitly specified collation, the query engine will use the explicit collation, regardless of which value it is applied to.
Accent Sensitivity
It is important to note that MySql is not only case insensitive for columns using an _ci collation (which is typically the default), but also accent insensitive. This means that 'é' = 'e'. Using a binary collation (or the binary operator) will make string comparisons accent sensitive as well as case sensitive.
What is utf8mb4?
The utf8 character set in MySql is an alias for utf8mb3 which has been deprecated in recent versions because it does not support 4 byte characters (which is important for encoding strings like 🐈). If you wish to use the UTF8 character encoding with MySql then you should be using the utf8mb4 charset.
To make use of an index before using the BINARY, you could do something like this if you have large tables.
SELECT
*
FROM
(SELECT * FROM `table` WHERE `column` = 'value') as firstresult
WHERE
BINARY `column` = 'value'
The subquery would result in a really small case-insensitive subset of which you then select the only case-sensitive match.
You can use BINARY to case sensitive like this
select * from tb_app where BINARY android_package='com.Mtime';
unfortunately this sql can't use index, you will suffer a performance hit on queries reliant on that index
mysql> explain select * from tb_app where BINARY android_package='com.Mtime';
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | tb_app | NULL | ALL | NULL | NULL | NULL | NULL | 1590351 | 100.00 | Using where |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-------------+
Fortunately, I have a few tricks to solve this problem
mysql> explain select * from tb_app where android_package='com.Mtime' and BINARY android_package='com.Mtime';
+----+-------------+--------+------------+------+---------------------------+---------------------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------------------+---------------------------+---------+-------+------+----------+-----------------------+
| 1 | SIMPLE | tb_app | NULL | ref | idx_android_pkg | idx_android_pkg | 771 | const | 1 | 100.00 | Using index condition |
+----+-------------+--------+------------+------+---------------------------+---------------------------+---------+-------+------+----------+-----------------------+
Following is for MySQL versions equal to or higher than 5.5.
Add to /etc/mysql/my.cnf
[mysqld]
...
character-set-server=utf8
collation-server=utf8_bin
...
All other collations I tried seemed to be case-insensitive, only "utf8_bin" worked.
Do not forget to restart mysql after this:
sudo service mysql restart
According to http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html there is also a "latin1_bin".
The "utf8_general_cs" was not accepted by mysql startup. (I read "_cs" as "case-sensitive" - ???).
No need to changes anything on DB level, just you have to changes in SQL Query it will work.
Example -
"SELECT * FROM <TABLE> where userId = '" + iv_userId + "' AND password = BINARY '" + iv_password + "'";
Binary keyword will make case sensitive.
Excellent!
I share with you, code from a function that compares passwords:
SET pSignal =
(SELECT DECODE(r.usignal,'YOURSTRINGKEY') FROM rsw_uds r WHERE r.uname =
in_usdname AND r.uvige = 1);
SET pSuccess =(SELECT in_usdsignal LIKE BINARY pSignal);
IF pSuccess = 1 THEN
/*Your code if match*/
ELSE
/*Your code if don't match*/
END IF;
For those looking to do case sensitive comparison with a regular expression using RLIKE or REGEXP, you can instead use REGEXP_LIKE() with match type c like this:
SELECT * FROM `table` WHERE REGEXP_LIKE(`column`, 'value', 'c');
mysql is not case sensitive by default, try changing the language collation to latin1_general_cs

Searching emoji from varchar column returns different record

I'm using MySQL5.6. The DB character set is utf8mb4.
When I search emoji as below, I got unexpected results.
mysql> SELECT id, hex(title) FROM tags WHERE title = 0xF09F9886;
+-----+------------+
| id | hex(title) |
+-----+------------+
| 165 | F09F9886 |
| 166 | F09F9884 |
+-----+------------+
It should return only id=165. Does anyone know this why?
I found how to fix it. It was a problem of collation. I used default collation value, I presume it's utf8mb4_general_ci. When I changed that utf8mb4_bin, MySQL returned right result.
You can change collation as below.
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

In MySQL, should I quote numbers or not?

For example - I create database and a table from cli and insert some data:
CREATE DATABASE testdb CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
USE testdb;
CREATE TABLE test (id INT, str VARCHAR(100)) TYPE=innodb CHARACTER SET 'utf8' COLLATE 'utf8_general_ci';
INSERT INTO test VALUES (9, 'some string');
Now I can do this and these examples do work (so - quotes don't affect anything it seems):
SELECT * FROM test WHERE id = '9';
INSERT INTO test VALUES ('11', 'some string');
So - in these examples I've selected a row by a string that actually stored as INT in mysql and then I inserted a string in a column that is INT.
I don't quite get why this works the way it works here. Why is string allowed to be inserted in an INT column?
Can I insert all MySQL data types as strings?
Is this behavior standard across different RDBMS?
MySQL is a lot like PHP, and will auto-convert data types as best it can. Since you're working with an int field (left-hand side), it'll try to transparently convert the right-hand-side of the argument into an int as well, so '9' just becomes 9.
Strictly speaking, the quotes are unnecessary, and force MySQL to do a typecasting/conversion, so it wastes a bit of CPU time. In practice, unless you're running a Google-sized operation, such conversion overhead is going to be microscopically small.
You should never put quotes around numbers. There is a valid reason for this.
The real issue comes down to type casting. When you put numbers inside quotes, it is treated as a string and MySQL must convert it to a number before it can execute the query. While this may take a small amount of time, the real problems start to occur when MySQL doesn't do a good job of converting your string. For example, MySQL will convert basic strings like '123' to the integer 123, but will convert some larger numbers, like '18015376320243459', to floating point. Since floating point can be rounded, your queries may return inconsistent results. Learn more about type casting here. Depending on your server hardware and software, these results will vary. MySQL explains this.
If you are worried about SQL injections, always check the value first and use PHP to strip out any non numbers. You can use preg_replace for this: preg_replace("/[^0-9]/", "", $string)
In addition, if you write your SQL queries with quotes they will not work on databases like PostgreSQL or Oracle.
Check this, you can understand better ...
mysql> EXPLAIN SELECT COUNT(1) FROM test_no WHERE varchar_num=0000194701461220130201115347;
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | test_no | index | Uniq_idx_varchar_num | Uniq_idx_varchar_num | 63 | NULL | 3126240 | Using where; Using index |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT COUNT(1) FROM test_no WHERE varchar_num='0000194701461220130201115347';
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
| 1 | SIMPLE | test_no | const | Uniq_idx_varchar_num | Uniq_idx_varchar_num | 63 | const | 1 | Using index |
+----+-------------+------------------------+-------+-------------------+-------------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
mysql>
mysql>
mysql> SELECT COUNT(1) FROM test_no WHERE varchar_num=0000194701461220130201115347;
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set, 1 warning (7.94 sec)
mysql> SELECT COUNT(1) FROM test_no WHERE varchar_num='0000194701461220130201115347';
+----------+
| COUNT(1) |
+----------+
| 1 |
+----------+
1 row in set (0.00 sec)
AFAIK it is standard, but it is considered bad practice because
- using it in a WHERE clause will prevent the optimizer from using indices (explain plan should show that)
- the database has to do additional work to convert the string to a number
- if you're using this for floating-point numbers ('9.4'), you'll run into trouble if client and server use different language settings (9.4 vs 9,4)
In short: don't do it (but YMMV)
This is not standard behavior.
For MySQL 5.5. this is the default SQL Mode
mysql> select ##sql_mode;
+------------+
| ##sql_mode |
+------------+
| |
+------------+
1 row in set (0.00 sec)
ANSI and TRADITIONAL are used more rigorously by Oracle and PostgreSQL. The SQL Modes MySQL permits must be set IF AND ONLY IF you want to make the SQL more ANSI-compliant. Otherwise, you don't have to touch a thing. I've never done so.
It depends on the column type!
if you run
SELECT * FROM `users` WHERE `username` = 0;
in mysql/maria-db you will get all the records where username IS NOT NULL.
Always quote values if the column is of type string (char, varchar,...) otherwise you'll get unexpected results!
You don't need to quote the numbers but it is always a good habit if you do as it is consistent.
The issue is, let's say that we have a table called users, which has a column called current_balance of type FLOAT, if you run this query:
UPDATE `users` SET `current_balance`='231608.09' WHERE `user_id`=9;
The current_balance field will be updated to 231608, because MySQL made a rounding, similarly if you try this query:
UPDATE `users` SET `current_balance`='231608.55' WHERE `user_id`=9;
The current_balance field will be updated to 231609

How can I make SQL case sensitive string comparison on MySQL?

I have a function that returns five characters with mixed case. If I do a query on this string it will return the value regardless of case.
How can I make MySQL string queries case sensitive?
The good news is that if you need to make a case-sensitive query, it is very easy to do:
SELECT * FROM `table` WHERE BINARY `column` = 'value'
http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation. For example, if you are comparing a column and a string that both have the latin1 character set, you can use the COLLATE operator to cause either operand to have the latin1_general_cs or latin1_bin collation:
col_name COLLATE latin1_general_cs LIKE 'a%'
col_name LIKE 'a%' COLLATE latin1_general_cs
col_name COLLATE latin1_bin LIKE 'a%'
col_name LIKE 'a%' COLLATE latin1_bin
If you want a column always to be treated in case-sensitive fashion, declare it with a case sensitive or binary collation.
The answer posted by Craig White has a big performance penalty
SELECT * FROM `table` WHERE BINARY `column` = 'value'
because it doesn't use indexes. So, either you need to change the table collation like mention here https://dev.mysql.com/doc/refman/5.7/en/case-sensitivity.html.
OR
Easiest fix, you should use a BINARY of value.
SELECT * FROM `table` WHERE `column` = BINARY 'value'
E.g.
mysql> EXPLAIN SELECT * FROM temp1 WHERE BINARY col1 = "ABC" AND col2 = "DEF" ;
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | temp1 | ALL | NULL | NULL | NULL | NULL | 190543 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
VS
mysql> EXPLAIN SELECT * FROM temp1 WHERE col1 = BINARY "ABC" AND col2 = "DEF" ;
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
| 1 | SIMPLE | temp1 | range | col1_2e9e898e | col1_2e9e898e | 93 | NULL | 2 | Using index condition; Using where |
+----+-------------+-------+-------+---------------+---------------+---------+------+------+------------------------------------+
enter code here
1 row in set (0.00 sec)
Instead of using the = operator, you may want to use LIKE or LIKE BINARY
// this returns 1 (true)
select 'A' like 'a'
// this returns 0 (false)
select 'A' like binary 'a'
select * from user where username like binary 'a'
It will take 'a' and not 'A' in its condition
The most correct way to perform a case sensitive string comparison without changing the collation of the column being queried is to explicitly specify a character set and collation for the value that the column is being compared to.
select * from `table` where `column` = convert('value' using utf8mb4) collate utf8mb4_bin;
Why not use binary?
Using the binary operator is inadvisable because it compares the actual bytes of the encoded strings. If you compare the actual bytes of two strings encoded using the different character sets two strings that should be considered the same they may not be equal. For example if you have a column that uses the latin1 character set, and your server/session character set is utf8mb4, then when you compare the column with a string containing an accent such as 'café' it will not match rows containing that same string! This is because in latin1 é is encoded as the byte 0xE9 but in utf8 it is two bytes: 0xC3A9.
Why use convert as well as collate?
Collations must match the character set. So if your server or session is set to use the latin1 character set you must use collate latin1_bin but if your character set is utf8mb4 you must use collate utf8mb4_bin. Therefore the most robust solution is to always convert the value into the most flexible character set, and use the binary collation for that character set.
Why apply the convert and collate to the value and not the column?
When you apply any transforming function to a column before making a comparison it prevents the query engine from using an index if one exists for the column, which could dramatically slow down your query. Therefore it is always better to transform the value instead where possible. When a comparison is performed between two string values and one of them has an explicitly specified collation, the query engine will use the explicit collation, regardless of which value it is applied to.
Accent Sensitivity
It is important to note that MySql is not only case insensitive for columns using an _ci collation (which is typically the default), but also accent insensitive. This means that 'é' = 'e'. Using a binary collation (or the binary operator) will make string comparisons accent sensitive as well as case sensitive.
What is utf8mb4?
The utf8 character set in MySql is an alias for utf8mb3 which has been deprecated in recent versions because it does not support 4 byte characters (which is important for encoding strings like 🐈). If you wish to use the UTF8 character encoding with MySql then you should be using the utf8mb4 charset.
To make use of an index before using the BINARY, you could do something like this if you have large tables.
SELECT
*
FROM
(SELECT * FROM `table` WHERE `column` = 'value') as firstresult
WHERE
BINARY `column` = 'value'
The subquery would result in a really small case-insensitive subset of which you then select the only case-sensitive match.
You can use BINARY to case sensitive like this
select * from tb_app where BINARY android_package='com.Mtime';
unfortunately this sql can't use index, you will suffer a performance hit on queries reliant on that index
mysql> explain select * from tb_app where BINARY android_package='com.Mtime';
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | tb_app | NULL | ALL | NULL | NULL | NULL | NULL | 1590351 | 100.00 | Using where |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-------------+
Fortunately, I have a few tricks to solve this problem
mysql> explain select * from tb_app where android_package='com.Mtime' and BINARY android_package='com.Mtime';
+----+-------------+--------+------------+------+---------------------------+---------------------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------------------+---------------------------+---------+-------+------+----------+-----------------------+
| 1 | SIMPLE | tb_app | NULL | ref | idx_android_pkg | idx_android_pkg | 771 | const | 1 | 100.00 | Using index condition |
+----+-------------+--------+------------+------+---------------------------+---------------------------+---------+-------+------+----------+-----------------------+
Following is for MySQL versions equal to or higher than 5.5.
Add to /etc/mysql/my.cnf
[mysqld]
...
character-set-server=utf8
collation-server=utf8_bin
...
All other collations I tried seemed to be case-insensitive, only "utf8_bin" worked.
Do not forget to restart mysql after this:
sudo service mysql restart
According to http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html there is also a "latin1_bin".
The "utf8_general_cs" was not accepted by mysql startup. (I read "_cs" as "case-sensitive" - ???).
No need to changes anything on DB level, just you have to changes in SQL Query it will work.
Example -
"SELECT * FROM <TABLE> where userId = '" + iv_userId + "' AND password = BINARY '" + iv_password + "'";
Binary keyword will make case sensitive.
Excellent!
I share with you, code from a function that compares passwords:
SET pSignal =
(SELECT DECODE(r.usignal,'YOURSTRINGKEY') FROM rsw_uds r WHERE r.uname =
in_usdname AND r.uvige = 1);
SET pSuccess =(SELECT in_usdsignal LIKE BINARY pSignal);
IF pSuccess = 1 THEN
/*Your code if match*/
ELSE
/*Your code if don't match*/
END IF;
For those looking to do case sensitive comparison with a regular expression using RLIKE or REGEXP, you can instead use REGEXP_LIKE() with match type c like this:
SELECT * FROM `table` WHERE REGEXP_LIKE(`column`, 'value', 'c');
mysql is not case sensitive by default, try changing the language collation to latin1_general_cs

How do I convert a column to ASCII on the fly without saving to check for matches with an external ASCII string?

I have a member search function where you can give parts of names and the return should be all members having at least one of username, firstname or lastname matching that input. The problem here is that some names have 'weird' characters like the é in Renée and the user doesn't wanna type the weird character but the normal ASCII substitute e.
In PHP I convert the input string to ASCII using iconv (just in case someone types weird characters). In the database however I should also convert the weird chars to ASCII (obviously) for the strings to match.
I tried the following:
SELECT
CONVERT(_latin1'Renée' USING ascii) t1,
CAST(_latin1'Renée' AS CHAR CHARACTER SET ASCII) t2;
(That's two tries.) Both don't work. Both have Ren?e as output. The question mark should be an e. It's alright if it outputs Ren?ee since I can just remove all question marks after the convert.
As you can imagine, the columns I want to query are encoded Latin1.
Thanks.
You don't need to convert anything. Your requirement is to compare two strings and ask if they're equal, ignoring accents; the database server can use a collation to do that for you:
Non-UCA collations have a one-to-one
mapping from character code to weight.
In MySQL, such collations are case
insensitive and accent insensitive.
utf8_general_ci is an example: 'a',
'A', 'À', and 'á' each have different
character codes but all have a weight
of 0x0041 and compare as equal.
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
| 1 | 1 | 1 |
+-----------+-----------+-----------+
1 row in set (0.06 sec)
First off, it should work this way:
SELECT * FROM `test` WHERE `name` COLLATE utf8_general_ci LIKE '%renee%';
Where the test table is:
+-----+--------+
| id | name |
+-----+--------+
| 1 | Renée |
| 2 | Renêe |
| 3 | Renee |
+-----+--------+
What is your MySQL version, and how do you try to match things?
One of the other possible solutions is transliteration.
Related: PHP Transliteration
Transliterating the input should not be a problem, but transliterating the values from the permanent storage (e.g. db) real-time during the search may not be feasible. So you can add three more fields like: username_slug, firstname_slug and lastname_slug. When inserting/modifying a record, set the slug values appropriately. And when searching, search the transliterated input against that slug fields.
+------+----------+---------------+----------+---------------+ ...
| id | username | username_slug | lastname | lastname_slug | ...
+------+----------+---------------+----------+---------------+ ...
| 1 | Renée | renee | La Niña | la-nina | ...
| 2 | Renêe | renee | ... | ... | ...
| 3 | Renee | renee | ... | ... | ...
+------+----------+---------------+----------+---------------+ ...
A search for "renee" or "renèe" would match all of the records.
As a side effect, you may be able to use that fields for generating SEF (search engine friendly) links, hence they are named ,..._slug, e.g. example.com/users/renee. Of course, in that case you should check for the uniqueness of the slug field.
#vincebowdren answer above works, I'm just adding this as an answer for formatting purposes:
CREATE TABLE `members` (
`id` int(11) DEFAULT NULL,
`lastname` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL
);
insert into members values (1, 'test6ë');
select id from members where lastname like 'test6e%';
Yields
+------+
| id |
+------+
| 1 |
+------+
And using Latin1,
set names latin1;
CREATE TABLE `members2` (
`id` int(11) DEFAULT NULL,
`lastname` varchar(20) CHARACTER SET latin1 DEFAULT NULL
);
insert into members2 values (1, 'Renée');
select id from members2 where lastname like '%Renee%';
will yield:
+------+
| id |
+------+
| 1 |
+------+
Of course, the OP should have the same charset in the application (PHP), connection (MySQL on Linux used to default to latin1 in 5.0, but defaults to UTF8 in 5.1), and in the field datatype to have less unknowns. Collations take care of the rest.
EDIT: I wrote should to have a better control over everything, but the following also works:
set names latin1;
select id from members where lastname like 'test6ë%';
Because, once the connection charset is set, MySQL does the conversion internally. In this case, it will convert somehow convert and compare the UTF8 string (from DB) to the latin1 (from query).
EDIT 2: Some skepticism requires me to provide an even more convincing example:
Given the statements above, here what I did more. Make sure the terminal is in UTF8.
set names utf8;
insert into members values (5, 'Renée'), (6, 'Renêe'), (7, 'Renèe');
select members.id, members.lastname, members2.id, members2.lastname
from members inner join members2 using (lastname);
Remember that members is in utf8 and members2 is in latin1.
+------+----------+------+----------+
| id | lastname | id | lastname |
+------+----------+------+----------+
| 5 | Renée | 1 | Renée |
| 6 | Renêe | 1 | Renée |
| 7 | Renèe | 1 | Renée |
+------+----------+------+----------+
which proves with the correct settings, the collation does the work for you.
The CAST() operator in the context of character encodings translates from one method of character storage to another — it does not change the actual characters, which is what you are after. An é character is what it is in any character set, it is not an e. You need to convert accented characters to non-accented characters, which is a different issue and has been asked a number of times previously (normalizing accented characters in MySQL queries).
I am unsure if there is a way to do this directly in MySQL, short of having a translation table and going through letter by letter. It would most likely be easier to write a PHP script to go through the database and make the translations.