Case-sensitive find in Rails - MySQL

I'm using MySQL with the utf8_general_ci collation, and for most of my searches that is fine. But for one field of one model I want to find a record with a case-sensitive match. How can I do that?

It is MySQL, not Ruby on Rails, that makes the query case-insensitive.
See http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
You can make the columns that require case sensitivity case-sensitive, either by:
- changing the fields to BINARY or VARBINARY instead of CHAR and VARCHAR, or
- giving the fields a binary collation (e.g. latin1_bin):
CREATE TABLE tbl_name (
...
data VARCHAR(255) COLLATE latin1_bin
)
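Or you can modify your queries to use the COLLATE operator (for a utf8 column like yours, use utf8_bin rather than latin1_bin, because a collation must belong to the column's character set):
SELECT * FROM tbl_name WHERE col_name COLLATE latin1_bin LIKE 'a%'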

If you always want to search that column case-sensitively, the best option is to define it with the utf8_bin collation.
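To see the difference between a column's own collation and a per-query COLLATE override without a MySQL server, here is a small sketch using Python's built-in sqlite3 module. It is a stand-in, not MySQL: SQLite's NOCASE collation plays the role of utf8_general_ci and BINARY plays the role of utf8_bin, and tbl_name/col_name are illustrative names.

```python
import sqlite3


def demo_case_sensitive_lookup(search):
    """Return (default_matches, binary_matches) for `search`.

    SQLite stands in for MySQL here: NOCASE approximates utf8_general_ci
    and BINARY approximates utf8_bin. tbl_name/col_name are illustrative.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Column collation is case-insensitive, like the asker's setup.
        conn.execute("CREATE TABLE tbl_name (col_name TEXT COLLATE NOCASE)")
        conn.executemany(
            "INSERT INTO tbl_name (col_name) VALUES (?)", [("abc",), ("ABC",)]
        )
        # Default: the column's NOCASE collation decides, so both rows match.
        # (ORDER BY ... COLLATE BINARY just makes the result order stable.)
        default = conn.execute(
            "SELECT col_name FROM tbl_name WHERE col_name = ? "
            "ORDER BY col_name COLLATE BINARY",
            (search,),
        ).fetchall()
        # Query-level override: COLLATE BINARY makes this comparison exact.
        binary = conn.execute(
            "SELECT col_name FROM tbl_name WHERE col_name COLLATE BINARY = ?",
            (search,),
        ).fetchall()
        return [row[0] for row in default], [row[0] for row in binary]
    finally:
        conn.close()


print(demo_case_sensitive_lookup("abc"))  # (['ABC', 'abc'], ['abc'])
```

In MySQL the override is the COLLATE utf8_bin clause itself; collation names and character-set rules differ between the two databases, so treat this only as an illustration of the idea.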

Related

Laravel: How to create a table with a case-sensitive (binary) column?

I would like to use base62 unique identifiers, but my columns are not case-sensitive, so F1 is treated the same as f1 when I search for it. In MySQL I would simply do:
CREATE TABLE USERS
(
USER_NAME VARCHAR(10) BINARY
)
So in Laravel I'd expect it to look like:
$table->string('base62_id', 10)->binary();
However, I don't think ->binary() exists in Laravel for this purpose. So how would I do that?
I understand this question is old, but in case anyone stumbles upon it: at the time of writing, Laravel is at version 8, and this is valid:
$table->string("case_sensitive_id")->charset("utf8")->collation("utf8_bin")->nullable();
This achieves case sensitivity without any raw ALTER statements.
So this is the answer:
DB::statement("ALTER TABLE `mytable` ADD `base62_id` VARCHAR( 10 ) CHARACTER SET utf8 COLLATE utf8_bin UNIQUE AFTER `id` , ADD INDEX ( `base62_id` )");
The key is to use CHARACTER SET utf8 COLLATE utf8_bin to make it case-sensitive.
Thanks. My source: http://blog.birdhouse.org/2010/10/24/base62-urls-django/comment-page-1/
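The base62 problem is mostly about the UNIQUE constraint: under a case-insensitive collation, F1 and f1 collide. The sketch below shows that effect with Python's sqlite3 module as an assumed stand-in for MySQL (SQLite's NOCASE and BINARY collations, not utf8_general_ci and utf8_bin). The table and column names come from the question; try_insert_ids is a hypothetical helper.

```python
import sqlite3

# The collation name is interpolated into DDL, so restrict it to the two
# SQLite collations this sketch uses.
ALLOWED_COLLATIONS = {"NOCASE", "BINARY"}


def try_insert_ids(collation, ids):
    """Insert ids into a UNIQUE column declared with `collation`.

    Returns the ids that were accepted; ids that the UNIQUE index treats
    as duplicates are skipped.
    """
    if collation not in ALLOWED_COLLATIONS:
        raise ValueError(f"unsupported collation: {collation}")
    conn = sqlite3.connect(":memory:")
    try:
        # The UNIQUE index inherits the column's declared collation.
        conn.execute(
            f"CREATE TABLE mytable (base62_id TEXT COLLATE {collation} UNIQUE)"
        )
        accepted = []
        for value in ids:
            try:
                conn.execute(
                    "INSERT INTO mytable (base62_id) VALUES (?)", (value,)
                )
                accepted.append(value)
            except sqlite3.IntegrityError:
                pass  # rejected as a duplicate under this collation
        return accepted
    finally:
        conn.close()


print(try_insert_ids("NOCASE", ["F1", "f1"]))  # ['F1']
print(try_insert_ids("BINARY", ["F1", "f1"]))  # ['F1', 'f1']
```

With the case-insensitive collation the second ID is rejected as a duplicate; with the binary collation both base62 IDs are stored, which is the behaviour utf8_bin gives in MySQL.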

Convert from utf8_general_ci to utf8_unicode_ci

I have a utf8_general_ci database that I'm interested in converting to utf8_unicode_ci.
I've tried the following commands:
ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci; (for every single table)
But that seems to change the charset only for future data; it doesn't convert the existing data from utf8_general_ci to utf8_unicode_ci.
Is there any way to convert the existing data to utf8_unicode_ci?
Run SHOW CREATE TABLE to see whether it really set the CHARACTER SET and COLLATION on the columns, not just the defaults.
What was the CHARACTER SET before the ALTERs?
Do SELECT col, HEX(col) ... for some field that should have utf8 in it. This will help us determine whether you really have utf8 in the table. The byte encoding of a character depends on the CHARACTER SET; HEX reveals which encoding is actually stored.
Comparisons and ordering (WHERE, ORDER BY, etc.) are controlled by the COLLATION. The indexes probably had to be rebuilt by your ALTER TABLE. Did big tables with indexes take a 'long' time to convert?
To actually see the difference between utf8_general_ci and utf8_unicode_ci, you need a "combining accent" or, more simply, the German ß versus ss:
mysql> SELECT 'ß' = 'ss' COLLATE utf8_general_ci,
'ß' = 'ss' COLLATE utf8_unicode_ci;
+-------------------------------------+-------------------------------------+
| 'ß' = 'ss' COLLATE utf8_general_ci | 'ß' = 'ss' COLLATE utf8_unicode_ci |
+-------------------------------------+-------------------------------------+
| 0 | 1 |
+-------------------------------------+-------------------------------------+
However, to test that in your tables, you would need to store those values and use WHERE or GROUP_CONCAT or something else to determine the equality.
What 'proof' do you have that the ALTERs failed to achieve the collation change?
(Addressing other comments: REPAIR should be irrelevant. CONVERT TO tells the ALTER to actually modify the data, so it should have done the desired action.)
You have to change the collation of every field in every table. As you say, the collation of the table is only the default value for fields created later, and the collation of the database is only the default value for tables created later.
As Lorenz Meyer said, the collation of the table is only the default value for fields created later, so you need to set the collation of each existing column explicitly too.
Such a change looks like:
ALTER TABLE mytable CHANGE mycolumn mycolumn varchar(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
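When many columns need the same change, the statements can be generated instead of typed by hand. This is a hypothetical Python helper (build_column_changes is not part of any library) that turns rows shaped like information_schema.COLUMNS (TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE) into statements like the one above. It assumes you fetch those rows yourself. CHANGE replaces the whole column definition, so attributes this sketch does not carry over (DEFAULT, COMMENT, and so on) would be lost and must be added back.

```python
def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_column_changes(columns, charset="utf8mb4",
                         collation="utf8mb4_unicode_ci"):
    """Return one ALTER TABLE ... CHANGE statement per column.

    `columns` holds (table, column, column_type, is_nullable) tuples, as
    read from information_schema.COLUMNS. Only NULL / NOT NULL is carried
    over; DEFAULT, COMMENT and other attributes are NOT preserved here.
    """
    statements = []
    for table, column, column_type, is_nullable in columns:
        null_clause = "NULL" if is_nullable == "YES" else "NOT NULL"
        col = quote_ident(column)
        statements.append(
            f"ALTER TABLE {quote_ident(table)} CHANGE {col} {col} {column_type} "
            f"CHARACTER SET {charset} COLLATE {collation} {null_clause};"
        )
    return statements


for stmt in build_column_changes([("mytable", "mycolumn", "varchar(15)", "YES")]):
    print(stmt)
```

Review the generated statements before running them; for whole tables, the ALTER TABLE ... CONVERT TO form from the question rewrites all character columns in one go.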

Case Insensitivity of MySQL

Hello everyone, I am using MySQL 5.0, and when I run my queries through my Java web application, they are case-insensitive.
First query:
select * from market where company='abc'
Second query:
select * from market where company='ABC'
Both queries give me the same results. I only want rows whose company is "abc", not "ABC".
How can I solve this problem? Thanks.
You need to set the proper collation.
http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
You can use a binary collation for the field, for instance utf8_bin
http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
If you always want case-sensitive searches on that field, you can use ALTER TABLE to set utf8_bin or another binary collation, for example:
ALTER TABLE `market` CHANGE `company` `company` VARCHAR( 255 )
CHARACTER SET utf8 COLLATE utf8_bin NULL DEFAULT NULL

MYSQL case sensitive search for utf8_bin field

I created a table and set the collation to utf8_bin in order to be able to add a unique index to a field. Now I need to do case-insensitive searches, but when I ran some queries with the COLLATE keyword, I got:
mysql> select * from page where pageTitle="Something" Collate utf8_general_ci;
ERROR 1253 (42000): COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'
mysql> select * from page where pageTitle="Something" Collate latin1_general_ci;
ERROR 1267 (HY000): Illegal mix of collations (utf8_bin,IMPLICIT) and (latin1_general_ci,EXPLICIT) for operation '='
I am pretty new to SQL, so I was wondering if anyone could help.
A string in MySQL has a character set and a collation. utf8 is the character set, and utf8_bin is one of its collations. To compare your string literal to a utf8 column, convert it to utf8 by prefixing it with the _charset introducer notation:
_utf8 'Something'
A collation is only valid for its own character set. The case-sensitive collation for utf8 appears to be utf8_bin, which you can specify like this:
_utf8 'Something' collate utf8_bin
With these conversions, the query should work:
select * from page where pageTitle = _utf8 'Something' collate utf8_bin
The _charset prefix works with string literals. To change the character set of a field, there is CONVERT ... USING. This is useful when you'd like to convert the pageTitle field to another character set, as in:
select * from page
where convert(pageTitle using latin1) collate latin1_general_cs = 'Something'
To see the character set and collation for a column named 'col' in a table called 'TAB', try:
select distinct collation(col), charset(col) from TAB
A list of all character sets and collations can be found with:
show character set
show collation
And all valid collations for utf8 can be found with:
show collation where charset = 'utf8'
Also note that forcing a collation in the query, as with "COLLATE utf8_general_ci" or "COLLATE latin1_general_ci", prevents MySQL from using existing indexes on the column. This could become a performance bottleneck later.
Try this; it works for me:
SELECT * FROM users WHERE UPPER(name) = UPPER('josé') COLLATE utf8_bin;
May I ask why you need to explicitly change the collation when you do a SELECT? Why not just give the column the collation you want the records compared and sorted with?
Your searches are case-sensitive because the column has a binary collation. Try using the general collation instead. For more information about case sensitivity and collations, look here:
Case Sensitivity in String Searches
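The opposite situation from the first thread (a binary column that sometimes needs a case-insensitive search) can be sketched with Python's sqlite3 module. This is an assumed stand-in rather than MySQL: the BINARY column approximates a utf8_bin column, COLLATE NOCASE in the query approximates forcing a case-insensitive collation, and NOCASE only folds ASCII letters. The page/pageTitle names come from the question; find_page_titles is a hypothetical helper.

```python
import sqlite3


def find_page_titles(search):
    """Return (exact_matches, case_insensitive_matches) for `search`.

    The BINARY column approximates a MySQL utf8_bin column; COLLATE NOCASE
    in the query overrides it for one comparison (ASCII folding only).
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Case-sensitive UNIQUE column, so 'Something' and 'something' coexist.
        conn.execute("CREATE TABLE page (pageTitle TEXT COLLATE BINARY UNIQUE)")
        conn.executemany(
            "INSERT INTO page (pageTitle) VALUES (?)",
            [("Something",), ("something",), ("Other",)],
        )
        # Uses the column's BINARY collation: exact, case-sensitive match.
        exact = conn.execute(
            "SELECT pageTitle FROM page WHERE pageTitle = ?", (search,)
        ).fetchall()
        # Overrides the collation for this comparison only; ORDER BY still
        # uses the column's BINARY collation.
        loose = conn.execute(
            "SELECT pageTitle FROM page WHERE pageTitle COLLATE NOCASE = ? "
            "ORDER BY pageTitle",
            (search,),
        ).fetchall()
        return [r[0] for r in exact], [r[0] for r in loose]
    finally:
        conn.close()


print(find_page_titles("SOMETHING"))  # ([], ['Something', 'something'])
```

As with the index warning above, in SQLite too an index built with one collation is generally not usable for a comparison under another, so frequent case-insensitive lookups on a large table may justify a separately indexed column.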

MySQL error: "Column 'columnname' cannot be part of FULLTEXT index"

Recently I changed a bunch of columns to utf8_general_ci (the default UTF-8 collation), but when attempting to change a particular column, I received the MySQL error:
Column 'node_content' cannot be part of FULLTEXT index
Looking through the docs, it appears that MySQL has a problem with FULLTEXT indexes on some multi-byte charsets such as UCS-2, but that it should work on UTF-8.
I'm on the latest stable MySQL 5.0.x release (5.0.77 I believe).
Oops, so I have found the answer to my problem:
All columns of a FULLTEXT index must have not only the same character set but also the same collation.
My FULLTEXT index had utf8_unicode_ci on one of its columns, and utf8_general_ci on its other columns.
Just to add to Thomas's good advice: to sort things out in phpMyAdmin, you have to change the character set for all columns AT THE SAME TIME.
I just wasted half a day trying again and again to change the columns one at a time and continually getting the error message about the FULLTEXT index.
For DBeaver and other database tool users:
When you use the interface to modify more than one column, the tool generates commands like this:
ALTER TABLE databaseName.tableName MODIFY COLUMN columnName1 text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL;
ALTER TABLE databaseName.tableName MODIFY COLUMN columnName2 varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL;
This does not work, because the columns of the FULLTEXT index must have their charsets changed at the same time.
So you have to change them manually, in one command:
ALTER TABLE databaseName.tableName
MODIFY COLUMN columnName1 text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL,
MODIFY COLUMN columnName2 varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL;
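If you generate such changes from a script, a small helper can merge the per-column modifications into a single ALTER TABLE. This is a hypothetical Python sketch (build_single_alter and quote_ident are illustrative names, not a DBeaver or MySQL API). It only builds the SQL text, assumes the column types are already known, and does not carry over DEFAULT or COMMENT attributes.

```python
def quote_ident(name):
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_single_alter(schema, table, columns,
                       charset="utf8mb4", collation="utf8mb4_general_ci"):
    """Build ONE ALTER TABLE that modifies all `columns` together.

    `columns` is a list of (name, column_type, nullable) tuples. Changing
    every column of a FULLTEXT index in one statement avoids the
    intermediate state where the indexed columns have mixed collations.
    """
    if not columns:
        raise ValueError("at least one column is required")
    clauses = []
    for name, column_type, nullable in columns:
        null_clause = "NULL" if nullable else "NOT NULL"
        clauses.append(
            f"MODIFY COLUMN {quote_ident(name)} {column_type} "
            f"CHARACTER SET {charset} COLLATE {collation} {null_clause}"
        )
    header = f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)}"
    return header + "\n" + ",\n".join(clauses) + ";"


print(build_single_alter(
    "databaseName", "tableName",
    [("columnName1", "text", True), ("columnName2", "varchar(128)", True)],
))
```

Building one statement, rather than one per column, mirrors the manual fix above: MySQL sees all the indexed columns change together.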
utf8 or utf8mb4? See here.