Case sensitive uniqueness and case insensitive search - mysql

I have a table with a field a using encoding utf8 and collation utf8_unicode_ci:
CREATE TABLE dictionary (
a varchar(128) NOT NULL
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The collation utf8_unicode_ci is required for an efficient case-insensitive search that handles expansions and ligatures. For this purpose I have the index:
CREATE INDEX a_idx on dictionary(a);
Problem: Additionally, I must ensure that all stored values of the field a are unique, but in a case-sensitive way.
German example: "blühen" and "Blühen" must both be stored in the table. But adding "Blühen" a second time should not be possible.
Is there built-in functionality in MySQL to have both?
Unfortunately it seems not to be possible to set the collation for the index in MySQL 5.1.
Solutions to this problem include a uniqueness check before insert or a trigger. Both are far less elegant than using a unique index.

Well, there are 2 ways to accomplish this:
using _bin collation
change your datatype to VARBINARY
Case 1: using _bin collation
Create your table as follows:
CREATE TABLE `dictionary` (
`a` VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
UNIQUE KEY `idx_un_a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Please note:
the datatype and collation (utf8_bin) of the column a
the UNIQUE index on column a
Case 2: using VARBINARY datatype
Create your table as follows:
CREATE TABLE `dictionary` (
`a` VARBINARY(128) NOT NULL,
UNIQUE KEY `idx_uniq_a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Please note:
the new datatype VARBINARY
the UNIQUE index on column a
So, both of the above will serve your purpose. That is, they will both allow values like 'abc', 'Abc', 'ABC', 'aBc' etc., but will not allow the same value again if the case matches.
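To make the effect concrete, here is a minimal sketch against the Case 1 table (the utf8_bin column). The sample words come from the question; the outcomes noted in the comments are what a binary collation should produce, not captured output. The last query assumes case-insensitive lookups are still wanted and overrides the collation for that one query; because the unique index is ordered by utf8_bin, that comparison most likely cannot use it.

```sql
-- Assumes the Case 1 table above: column a declared as utf8 / utf8_bin with a UNIQUE key.
INSERT INTO dictionary (a) VALUES ('blühen');  -- accepted
INSERT INTO dictionary (a) VALUES ('Blühen');  -- accepted: differs from 'blühen' in case
INSERT INTO dictionary (a) VALUES ('Blühen');  -- expected to fail with a duplicate-key error

-- Plain comparisons on this column are now case-sensitive as well.
SELECT a FROM dictionary WHERE a = 'blühen';   -- matches only the lowercase row

-- Overriding the collation for one query restores case-insensitive matching.
SELECT a FROM dictionary WHERE a COLLATE utf8_unicode_ci = 'blühen';  -- matches both rows
```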
Please note that giving a column a "_bin" collation is different from using a binary datatype. For details, refer to the following links:
The BINARY and VARBINARY datatypes
The _bin and binary Collations
I hope the above helps!

You can achieve this by adding an additional column, a_lower.
CREATE TABLE `dictionary` (
`a` VARCHAR(128) NOT NULL,
`a_lower` VARCHAR(128) NOT NULL,
UNIQUE KEY `idx_un_a_lower` (`a_lower`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Insert that goes like this:
insert into dictionary set a = x, a_lower = lower(x);
Select can now be case-insensitive:
select * from dictionary where a_lower like lower('search_term%')
Note that an indexed column can hold at most 191 characters if the whole value is to be indexed. InnoDB limits an index key to 767 bytes (with the COMPACT or REDUNDANT row format), and a character in the utf8mb4 character set can take up to 4 bytes, so 767 / 4 = 191.75, i.e. 191 characters. With the utf8 character set, which takes at most 3 bytes per character, the limit is 767 / 3 = 255.67, i.e. 255 characters.
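The arithmetic can be checked directly in MySQL, where DIV performs integer division. The 767-byte figure is taken from the note above, not queried from the server, and the column aliases are made up for illustration.

```sql
-- Maximum fully indexable characters for a 767-byte index key.
SELECT 767 DIV 4 AS max_chars_utf8mb4,  -- 191
       767 DIV 3 AS max_chars_utf8;     -- 255
```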

SELECT * FROM dictionary WHERE a COLLATE utf8_general_ci = 'abc'
Try this. It worked for me.

Related

What is the BIN column property in MySQL Workbench used for?

I have done a fair amount of research and read the MySQL documentation, but I cannot seem to find a good explanation of the BINARY (BIN) column property in a MySQL table.
Could someone explain when this should be checked and what it is used for?
The BIN column means the column uses a binary collation.
I tested this by creating a table with a VARCHAR datatype and I checked the BIN column in MySQL Workbench.
Then I viewed the DDL for the table in the command-line client:
mysql> show create table mytable\G
*************************** 1. row ***************************
Table: mytable
Create Table: CREATE TABLE `mytable` (
`title` varchar(45) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
PRIMARY KEY (`title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
You can see that the collation is utf8mb4_bin, which is the binary collation for that character set.
String comparisons to that column will use byte-by-byte comparison instead of using character equivalences according to any unicode-compatible collation.
So it's case-sensitive, and characters will compare as different even if they differ only in diacritics. For example 'e' = 'é' is false in binary comparisons.
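The difference can be seen without any table by comparing literals under each collation. This is a small sketch that assumes the client really sends UTF-8 and that the session character set is utf8mb4; utf8mb4_unicode_ci is used here only as a representative non-binary collation.

```sql
-- Make the server read the literals below as utf8mb4.
SET NAMES utf8mb4;

SELECT 'e' = 'é' COLLATE utf8mb4_bin        AS bin_accent,  -- 0: different bytes
       'a' = 'A' COLLATE utf8mb4_bin        AS bin_case,    -- 0: different bytes
       'e' = 'é' COLLATE utf8mb4_unicode_ci AS uca_accent,  -- 1: accent ignored
       'a' = 'A' COLLATE utf8mb4_unicode_ci AS uca_case;    -- 1: case ignored
```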

MySQL. Can't insert russian characters into column with a type VARCHAR(45)

I have a problem when trying to execute this line in MySQL (Workbench):
INSERT INTO classification (`Type`, `Subtype`) VALUES ("тип", "подтип");
I have tried to set different charsets for table classification : cp1251, utf-8, utf8mb4, cp1251_bin.
This is a table with all charsets in my database that I have found, maybe it will help you:
UPD. I have found a solution. However, I had to change my table, so now the table risk is an edited table classification. The result of SHOW CREATE TABLE risk is:
CREATE TABLE `risk` (
`IdRisk` int(11) NOT NULL AUTO_INCREMENT,
`IdSubtype` int(11) DEFAULT NULL,
`Content` varchar(4000) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`IdRisk`),
KEY `FK_subtype_risk_idx` (`IdSubtype`),
CONSTRAINT `FK_subtype_risk` FOREIGN KEY (`IdSubtype`) REFERENCES `subtype` (`IdSubtype`) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=48 DEFAULT CHARSET=latin1
I can't find the solution to this issue. I hope that someone knows one.
Thank You!
The CHARACTER SET for the table is the default for columns in the table. Please provide SHOW CREATE TABLE so we can verify what the columns are set to.
What is the encoding of the bytes in the client? cp1251 is different than utf8; utf8mb4 == utf8 for Russian.
In what way are things bad? Based on the symptom, see this for specific tips on what else might be set incorrectly.
Perhaps it was your change to NVARCHAR that forced CHARACTER SET utf8 on the columns?
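A sketch of the checks these comments point at, assuming the original classification table and a client that sends UTF-8. utf8mb4 is one reasonable target character set here, not necessarily what the asker ended up with, and the conversion should be tried on a copy of the data first.

```sql
-- Show the collation (and hence character set) actually applied to each column.
SHOW FULL COLUMNS FROM classification;

-- Convert every character column, and the table default, to utf8mb4.
ALTER TABLE classification CONVERT TO CHARACTER SET utf8mb4;

-- Tell the server that this client sends and expects utf8mb4.
SET NAMES utf8mb4;

INSERT INTO classification (`Type`, `Subtype`) VALUES ('тип', 'подтип');
```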

finding values case insensitively with emojis

I have a table with varchar value that needs to store text values with emojis:
CREATE TABLE `my_table` (
`id` bigint(11) NOT NULL AUTO_INCREMENT,
`value` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `value_idx` (`value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Now I need to do selects on this table to find all values starting with a given prefix. The selects must be case-insensitive and must match emojis as well. So far I have found 4 options, all of which have trade-offs:
I can use utf8mb4_unicode_ci collation and do selects like
select * from my_table where value like 'prefix%'
It will find all values starting with the prefix, ignoring case, but will not find anything if the prefix contains emojis
I can set the collation to utf8mb4_bin and my selects will find values if the prefix contains emojis, but they will be case-sensitive
I can do
select * from my_table where LOWER(value) like 'prefix%'
and it will work case insensitively and with emojis, but will not use index
And finally I can save all values in lower case and use utf8mb4_bin collation, but saving in lower case is also the trade off
Is there any solution that would allow me to do "like" selects ignoring case of the prefix and allowing to have emojis in prefix?
UPD: I do not have problems with storing emojis, I have problems with finding them with "like" select keeping case insensitive collation
The solution is to use MySQL 5.6+ with the utf8mb4_unicode_520_ci collation, which does not treat all 4-byte characters as equal.
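Applied to the table from the question, the change might look like the sketch below. It assumes MySQL 5.6 or later and a utf8mb4 connection; the search prefix is a made-up example. Since the column and its index share the new collation, the prefix search should remain eligible to use value_idx.

```sql
-- Switch the column to the UCA 5.2.0 based collation.
ALTER TABLE my_table
  MODIFY `value` varchar(100)
  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci DEFAULT NULL;

-- Hypothetical prefix containing an emoji; matching is still case-insensitive.
SELECT * FROM my_table WHERE `value` LIKE '🙂 hello%';
```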

MySql table columns have different CHARSET and COLLATION even when SELECT data is from the same source table and column?

In creating a simple (temporary) MySQL table, taking data from the same column of the same source table, the two resulting columns wind up with different CHARACTER SET and resulting default COLLATION settings:
mysql> CREATE TABLE tempDates
SELECT SUBDATE(MAX(EventDate), INTERVAL 90 DAY) AS StartDate,
MAX(EventDate) AS EndDate FROM james_bond_007
WHERE EventCategory = 'Successful_Kills';
Here is the output showing the resulting table structures:
mysql> SHOW CREATE TABLE tempDates;
CREATE TABLE `tempDates` (
`StartDate` varchar(29) CHARACTER SET utf8 DEFAULT NULL,
`EndDate` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
I ran an alter table command, but NOTHING changed:
ALTER TABLE tempdates CHARACTER SET latin1 COLLATE latin1_swedish_ci;
From a curiosity standpoint, I want to know why this happens, and from a practical standpoint, how do I make this not happen?
The result I want is for all columns to have the server defaults: CHARACTER SET latin1 COLLATE latin1_swedish_ci
Even better would be a way to impose the server defaults on all columns so I don't have to type more than I want to in future queries of this type.
@Rick James
This solved my problem so I want to mark it answered.
If you've a moment, perhaps an explanation as to why? (gives me another excuse to upvote you and accept your answer)
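The answer that resolved this is not reproduced here, so the following is only a sketch of one documented way to get the desired result. ALTER TABLE ... CHARACTER SET changes just the table default used for columns added later, which is consistent with nothing appearing to change; CONVERT TO CHARACTER SET rewrites the existing character columns as well.

```sql
-- Changes only the default for future columns; existing columns keep their settings.
ALTER TABLE tempDates CHARACTER SET latin1 COLLATE latin1_swedish_ci;

-- Converts every existing character column and sets the table default.
ALTER TABLE tempDates CONVERT TO CHARACTER SET latin1 COLLATE latin1_swedish_ci;

-- Verify the result.
SHOW CREATE TABLE tempDates;
```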

Can MySQL automatically specify `_utf8` for inserts to UTF-8 columns?

I have a table like this, where one column is latin1, the other is UTF-8:
Create Table: CREATE TABLE `names` (
`name_english` varchar(255) NOT NULL,
`name_chinese` varchar(255) character set utf8 default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
When I do an insert, I have to type _utf8 before values being inserted into UTF-8 columns:
insert into names set name_english = "hooey", name_chinese = _utf8 "鬼佬";
However, since MySQL should know that name_chinese is a UTF-8 column, it should be able to know to use _utf8 automatically.
Is there any way to tell MySQL to use _utf8 automatically, so that when I'm programmatically making prepared statements, I don't have to worry about including it with the right parameters?
Why not use UTF-8 for the whole table?
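Two sketches of how the introducer can be avoided, assuming the client really sends UTF-8 bytes. The first declares the connection character set, so that string literals (and, as far as I know, prepared-statement parameters) are interpreted as utf8 and converted to each column's character set on insert. The second follows the suggestion above and converts the whole table; it still relies on the connection character set being correct.

```sql
-- Option 1: declare the client/connection character set once per session.
SET NAMES utf8;
INSERT INTO names SET name_english = 'hooey', name_chinese = '鬼佬';

-- Option 2: make every character column in the table utf8.
ALTER TABLE names CONVERT TO CHARACTER SET utf8;
```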