case sensitive LIKE on VARCHAR field with cp1251 encoding

I have an InnoDB table with a VARCHAR(250) cp1251_general_ci field named comment.
I'm trying to run a case-sensitive search on this field.
SELECT comment
FROM body_legend
WHERE comment LIKE '%ТТ%'
GROUP BY comment
This works as expected, but it's case insensitive.
I tried using BINARY, like this:
SELECT comment
FROM body_legend
WHERE comment LIKE BINARY '%ТТ%'
GROUP BY comment
but it returns an empty result.
I tried using COLLATE, like this:
SELECT comment
FROM body_legend
WHERE comment LIKE '%ТТ%' COLLATE cp1251_general_ci
but it returns an error:
COLLATION 'cp1251_general_ci' is not valid for CHARACTER SET 'utf8mb4'
How can I make the search case sensitive? I would be glad if the answer also explained why my queries did not work.

Self-answer:
SELECT comment
FROM body_legend
WHERE comment
LIKE BINARY CONVERT( '%ТТ%' USING cp1251);
It seems that when I type this query into phpMyAdmin, the string literal is read as utf8, so a byte-wise comparison with the cp1251 value stored in the table naturally does not match.
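The byte mismatch described in this self-answer can be reproduced outside MySQL. The Python sketch below is only an illustration and does not talk to a database. It assumes the client sends the pattern as UTF-8 and the column stores cp1251; the row value is hypothetical.

```python
# Cyrillic capital letter TE (U+0422) twice, as in the LIKE pattern.
pattern = "\u0422\u0422"

# Bytes of the pattern when the connection character set is UTF-8.
as_utf8 = pattern.encode("utf-8")

# Bytes of the same text as stored in a cp1251 column.
as_cp1251 = pattern.encode("cp1251")

print(as_utf8.hex())    # d0a2d0a2
print(as_cp1251.hex())  # d2d2

# Hypothetical row value, encoded the way the cp1251 column would store it.
stored_value = "\u0422\u0422 test".encode("cp1251")

# A binary comparison looks only at raw bytes, so the UTF-8 pattern
# cannot be found inside the cp1251 value until it is converted.
print(as_utf8 in stored_value)    # False
print(as_cp1251 in stored_value)  # True
```

This is the conversion that CONVERT(... USING cp1251) performs on the literal before BINARY compares the bytes.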

Related

How to search for the exact string value in MySQL?

I got this code:
select * from locality WHERE name ="ISTASYON"
The query runs without errors in MySQL, but the result is incorrect.
Result:
28BAF9346A41E4E4E0501AAC4524363B 0 iSTASYON
402881a4523b52d201523b6c2afb4166 0 İSTASYON
402881a4523b52d201523b6c38b7417c 0 İSTASYON
402881a4523b52d201523baa9faf0092 0 İSTASYON
402881a4523b52d201523baab059009f 0 İSTASYON
402881a4523b52d201523baad01a00b7 0 İSTASYON
58441bc4c054447ebe1cddbfeef958b5 0 ISTASYON
fa7fb88d1d4c41feb497b08f42066c82 1 2016-04-19 09:53:41.000000 İSTASYON
My problem is that the results contain ISTASYON, İSTASYON, ıSTASYON and iSTASYON, but I want only ISTASYON.
How can I solve this problem?

You can use the COLLATE or BINARY operator to force binary comparison:
SELECT * FROM locality WHERE name COLLATE utf8_bin = "ISTASYON"
or
SELECT * FROM locality WHERE BINARY name = "ISTASYON"
If you want the column always to be treated in this fashion, declare it with a binary collation.
See the docs for more info.

You can use a SELECT query like this; I use it too and it works well.
Select * from `users` where username COLLATE latin1_general_cs LIKE '%$email%'

The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'i%', you get all column values that start with I or i. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation.
col_name COLLATE latin1_general_cs LIKE 'i%'
col_name LIKE 'i%' COLLATE latin1_general_cs
col_name COLLATE latin1_bin LIKE 'i%'
col_name LIKE 'i%' COLLATE latin1_bin

select * from locality WHERE BINARY name ="ISTASYON"
or
select * from locality WHERE name = BINARY "ISTASYON"

Plan A: This will be case-sensitive and accent sensitive:
It would be more efficient (after setup) to have the column be utf8_bin (or utf8mb4_bin). latin1 (and latin1_bin) will not suffice because of the Turkish characters.
The setup is
ALTER TABLE tbl MODIFY COLUMN ... COLLATE utf8_bin ...;
Caution: If you also want to do case folding and/or accent stripping with the same column(s) in other situations, this will hurt those situations.
Plan B: This addresses the dotless ı and a few other Turkish idiosyncrasies.
COLLATION utf8_turkish_ci (or utf8mb4_turkish_ci) treats these as equal: I=ı. It then treats the following as a separate group that sorts later, with all members equal to one another: i=Ì=Í=Î=Ï=ì=í=î=ï=Ī=ī=Į=į=İ. Ü=ü come after U and before V. Similarly, Ğ=ğ come after G and before H. See utf8 collations for more details.
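To see why a binary collation (Plan A) separates the four spellings, and why latin1 cannot hold them, it helps to look at the letters themselves. The Python sketch below is an illustration only: it does not reproduce MySQL's collation rules, and it uses Python's cp1252 codec as a stand-in for MySQL's latin1.

```python
import unicodedata

# The four "i" letters that show up in the ISTASYON results.
letters = ["I", "i", "\u0130", "\u0131"]

# UTF-8 bytes of each letter, as hex.
utf8_hex = {}
for ch in letters:
    utf8_hex[ch] = ch.encode("utf-8").hex()
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {utf8_hex[ch]}")


def fits_cp1252(text):
    """Return True if every character of text exists in cp1252."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# Byte-wise equality keeps only the exact spelling.
target = "ISTASYON".encode("utf-8")
candidates = ["ISTASYON", "iSTASYON", "\u0130STASYON", "\u0131STASYON"]
exact = [c for c in candidates if c.encode("utf-8") == target]

print(exact)                         # ['ISTASYON']
print(fits_cp1252("\u0130STASYON"))  # False
```

The dotted capital İ and the dotless ı each take two bytes in UTF-8, so a byte-for-byte comparison can never confuse them with plain I or i.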

MySQL string comparison with special characters

I have created an autocomplete that matches against a list of names in a database.
The database that I'm working with contains a ton of names with special characters, but the end users are most likely going to search with the English equivalent of those names, e.g. Bela Bartok for Béla Bartók and Dvorak for Dvořák, etc. Currently, doing the English searches returns no results.
I have come across threads saying that the way to solve this is to change your MySQL collation to utf8 (which I have done to no avail).
I think that this may be because I used utf8_unicode_ci, but the one that would get the results that I want is utf8_general_ci. The problem with the latter though is that all the comments say to no longer use it.
Does anyone know how I can solve this problem?

If you know the list of special characters and what their equivalents in plain English are, then you can do the following:
lower case the string
replace the characters with the lower case equivalents
search against that "plain English" column
You will need to use the full text searching of MySQL in order to search against the text or come up with a home grown solution for how you're going to handle that.

Just tested with both utf8_general_ci and utf8_unicode_ci collations and it worked like a charm in both cases.
Here is the MySQL code I used to run my test:
CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`text` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `test` (`id`, `text`) VALUES (NULL, 'Dvořák'), (NULL, 'Béla Bartók');
SELECT * FROM `test` WHERE `text` LIKE '%dvorak%';
The above SELECT statement returns:
id text
--------------
1 Dvořák
Note: During my test I set all the collations to the desired one. The database collation, the table collation and the column collation as well.
Could it be that there's a bug in your PHP application?

I found the solution to my problem. Changing the collation to utf8_unicode_ci works perfectly fine. My problem was that I needed to use REGEXP in my query instead of LIKE, but REGEXP obviously doesn't work in this situation!
So in short, changing your collation to utf8_unicode_ci will allow you to compare Dvorak and Dvořák using = or LIKE, but not one of the REGEXP equivalents.

First, let's see if the data is stored correctly. Do
SELECT name, HEX(name) FROM ... WHERE ...;
Béla may come out (ignoring the spaces)
42 C3A9 6C 61 -- if correctly encoded with utf8 (é = C3A9)
42 E9 6C 61 -- if encoded with latin1 (é = E9)
The "Collation" (utf8_general_ci or utf8_unicode_ci) makes no difference for the examples you gave. Both treat é = e. See extensive list of equivalences for utf8 collations.
After you determine the encoding, we can proceed to prescribe a cure.
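The two hex dumps above can be reproduced without a database. This Python sketch is only an illustration of what HEX(name) would show for each encoding; the helper name looks_like_utf8 is made up for the example.

```python
# "Béla", with the accented letter written as an escape to avoid
# depending on the encoding of this source file.
name = "B\u00e9la"

utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")

print(utf8_bytes.hex())    # 42c3a96c61
print(latin1_bytes.hex())  # 42e96c61


def looks_like_utf8(raw):
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The latin1 byte E9 followed by 6C is not a valid UTF-8 sequence.
print(looks_like_utf8(utf8_bytes))    # True
print(looks_like_utf8(latin1_bytes))  # False
```

Pure ASCII names look identical in both encodings, so the check only says something for rows that contain accented characters.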

Taking a hint from Rick James, using:
SELECT * FROM `test` WHERE HEX(`column`) = HEX('Dvořák');
This should work. If you need a case-insensitive query, you'll need to lower/upper both sides in addition to the HEX check.

A more up to date collation is utf8mb4_unicode_520_ci.
Note, it does NOT work for utf8mb4_unicode_ci. See the comparison here: https://stackoverflow.com/a/59805600/857113

Make different lower and upper in mysql select

I want to search for a name in the database, but I want to select only "Bill", not biLL or BiLL or any other variant. When I use this query, it returns Bill, BiLL, BILL, bilL and so on.
query=`select * from names where name='Bill'`

To quote the documentation:
The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation. For example, if you are comparing a column and a string that both have the latin1 character set, you can use the COLLATE operator to cause either operand to have the latin1_general_cs or latin1_bin collation
You can overcome this by explicitly using a case sensitive collation:
select * from names where name='Bill' COLLATE latin1_general_cs

Another solution is to set the collation of the column to utf8mb4_unicode_520 or any standard case-sensitive collation.

Special Characters and a simple select query

I have got a problem with a simple Select Query and special chars. I want to select the name Änne.
SELECT * FROM `names` WHERE `name` = 'Änne'
utf8_general_ci
Änne
Anne
okay, ...
utf8 general ci is a very simple collation. What it does it just
removes all accents then converts to upper case and uses the code of this sort of "base letter" result letter to compare.
http://forums.mysql.com/read.php?103,187048,188748
utf8_unicode_ci
Änne
Anne
why?
utf8_bin
Änne
utf8_bin seems to be the right choice at this point, but I need my search to be case insensitive.
SELECT * FROM `names` WHERE `name` = 'änne'
utf8_bin
none
Is there no way to do so?
I could use PHP's ucwords() to uppercase the first letters, but I would prefer to find a DB solution.
Edit: ucwords('änne') = änne, so I can't use that either.
SELECT * FROM `names` WHERE lower(`name`) = 'änne'
is working for me, because I don't need to distinguish between 'Änne' and 'änne' in my DB.

what about:
SELECT * FROM `names` WHERE upper(`name`) = upper("änne")
Quoting doc:
The default character set and collation are latin1 and
latin1_swedish_ci, so nonbinary string comparisons are case
insensitive by default. This means that if you search with col_name
LIKE 'a%', you get all column values that start with A or a. To make
this search case sensitive, make sure that one of the operands has a
case sensitive or binary collation
That means you get case-sensitive results because you have set a binary collation. You can set the column collation to utf8_general_ci and override it in individual searches:
col_name COLLATE latin1_general_cs LIKE 'a%'

There is an error in your MySQL code:
SELECT * FROM names WHERE name = "Änne"
Remove the quotes around the table name and the field name.

Looking for case insensitive MySQL collation where "a" != "ä"

I'm looking for a MySQL collation for UTF8 which is case insensitive and distinguishes between "a" and "ä" (or more generally, between umlauted / accented characters and their "pure" form). utf8_general_ci does the former, utf8_bin the latter, but neither does both. If there is no such collation, what can I do to get as close as possible in a WHERE clause?

My recommendation would be to use utf8_bin and in your WHERE clause, force both sides of your comparison to upper or lower case.
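The idea behind this recommendation can be sketched in Python: fold the case of both sides first, then compare the bytes exactly, as a binary collation would. This is an illustration only, not MySQL code. It assumes both strings use the same precomposed (NFC) form, and the function name matches is made up for the example.

```python
def matches(stored, search):
    """Case-insensitive but accent-sensitive comparison (hypothetical helper).

    Both sides are lower-cased, then compared byte for byte.
    """
    return stored.lower().encode("utf-8") == search.lower().encode("utf-8")


# \u00c4 is A with diaeresis (capital), \u00e4 is the small form.
print(matches("\u00c4nne", "\u00e4nne"))  # True: differs only in case
print(matches("\u00c4nne", "anne"))       # False: umlaut is kept distinct
print(matches("Anne", "ANNE"))            # True: differs only in case
```

Case folding on both sides removes the case difference, while the exact comparison keeps "a" and "ä" apart.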

It works fine here with utf8_german2_ci as collation:
SELECT * FROM tablename WHERE fieldname LIKE "würz%" COLLATE utf8_german2_ci

I checked utf8_bin like this
CREATE TABLE tmp2 (utf8_bin VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_bin);
INSERT INTO tmp2 VALUES ('nói');
select * from tmp2 where utf8_bin='noi';

You could try utf8_swedish_ci, it's both case insensitive and distinguishes between a and ä (but treats e.g. ü like y).
Collations are language-dependent, and it seems German doesn't have its own collation in MySQL. (I had a look at your profile, which says you're German.)
