I'm looking for a MySQL collation for UTF8 which is case insensitive and distinguishes between "a" and "ä" (or more generally, between umlauted / accented characters and their "pure" form). utf8_general_ci does the former, utf8_bin the latter, but none does both. If there is no such collation, what can I do to get as close as possible in a WHERE clause?
My recommendation would be to use utf8_bin and in your WHERE clause, force both sides of your comparison to upper or lower case.
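A minimal sketch of that approach, assuming a hypothetical table names with a utf8_bin column name and a utf8 connection character set (neither comes from the question):

```sql
-- Hypothetical table; the column uses the binary collation utf8_bin.
CREATE TABLE names (
  name VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_bin
);
INSERT INTO names VALUES ('Änne'), ('änne'), ('Anne');

-- Folding both sides to lower case removes the case difference,
-- while the binary comparison still keeps 'ä' and 'a' apart.
SELECT name FROM names WHERE LOWER(name) = LOWER('ÄNNE');
```

Wrapping the column in LOWER() generally prevents MySQL from using an ordinary index on it, so this may be slow on large tables.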
It works fine here with utf8_german2_ci as collation:
SELECT * FROM tablename WHERE fieldname LIKE "würz%" COLLATE utf8_german2_ci
I checked utf8_bin like this
CREATE TABLE tmp2 (utf8_bin VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_bin);
INSERT INTO tmp2 VALUES ('nói');
select * from tmp2 where utf8_bin='noi';
You could try utf8_swedish_ci, it's both case insensitive and distinguishes between a and ä (but treats e.g. ü like y).
Collations are language-dependent, and it seems German doesn't have its own collation in MySQL. (I had a look at your profile, which says you're German.)
I got this code:
select * from locality WHERE name ="ISTASYON"
The query runs fine in MySQL, but the result is not what I expect.
Result code:
28BAF9346A41E4E4E0501AAC4524363B 0 iSTASYON
402881a4523b52d201523b6c2afb4166 0 İSTASYON
402881a4523b52d201523b6c38b7417c 0 İSTASYON
402881a4523b52d201523baa9faf0092 0 İSTASYON
402881a4523b52d201523baab059009f 0 İSTASYON
402881a4523b52d201523baad01a00b7 0 İSTASYON
58441bc4c054447ebe1cddbfeef958b5 0 ISTASYON
fa7fb88d1d4c41feb497b08f42066c82 1 2016-04-19 09:53:41.000000 İSTASYON
My problem is that the results contain ISTASYON, İSTASYON, ıSTASYON and iSTASYON, but I only want ISTASYON.
How can I solve this problem?
You can use the COLLATE or BINARY operator to force binary comparison:
SELECT * FROM locality WHERE name COLLATE utf8_bin = "ISTASYON"
or
SELECT * FROM locality WHERE BINARY name = "ISTASYON"
If you want the column always to be treated in this fashion, declare it with a binary collation.
See the docs for more info.
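Declaring the column with a binary collation might look like the sketch below. The VARCHAR(255) type and the utf8 character set are assumptions, since the question does not show the table definition.

```sql
-- VARCHAR(255) is an assumption; repeat the column's real type and attributes.
ALTER TABLE locality
  MODIFY name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin;

-- After the change, a plain comparison is case and accent sensitive.
SELECT * FROM locality WHERE name = 'ISTASYON';
```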
You can use a SELECT query like this; I use it too and find it helpful.
Select * from `users` where username COLLATE latin1_general_cs LIKE '%$email%'
The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'i%', you get all column values that start with I or i. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation.
col_name COLLATE latin1_general_cs LIKE 'i%'
col_name LIKE 'i%' COLLATE latin1_general_cs
col_name COLLATE latin1_bin LIKE 'i%'
col_name LIKE 'i%' COLLATE latin1_bin
select * from locality WHERE BINARY name ="ISTASYON"
or
select * from locality WHERE name = BINARY "ISTASYON"
Plan A: This will be case-sensitive and accent sensitive:
It would be more efficient (after setup) to have the column be utf8_bin (or utf8mb4_bin). latin1 (and latin1_bin) will not suffice because of the Turkish characters.
The setup is
ALTER TABLE tbl MODIFY COLUMN ... COLLATE utf8_bin ...;
Caution: If you also want to do case folding and/or accent stripping with the same column(s) in other situations, this will hurt those situations.
Plan B: This addresses the dotless ı and a few other Turkish idiosyncrasies.
COLLATION utf8_turkish_ci (or utf8mb4_turkish_ci) treats these equal: I=ı, then treats the following as different and later, but all equal: i=Ì=Í=Î=Ï=ì=í=î=ï=Ī=ī=Į=į=İ. Ü=ü come after U and before V. Similarly Ğ=ğ come after G and before H. See utf8 collations for more details.
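A hedged sketch of Plan B applied to the query from the question, assuming locality.name is stored as utf8 (use utf8mb4_turkish_ci if the column is utf8mb4):

```sql
-- Applies the Turkish collation to the column for this comparison only.
SELECT * FROM locality
WHERE name COLLATE utf8_turkish_ci = 'ISTASYON';
```

Going by the equivalences listed above, this should keep İSTASYON and iSTASYON out of the result, but it still treats I and ı as equal and remains case insensitive for the other letters.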
I want to search for a name in a database, but I only want to select Bill, not biLL or BiLL or any other variant, just "Bill". When I use this query, it returns Bill, BiLL, BILL, bilL and so on.
query=`select * from names where name='Bill'`
To quote the documentation:
The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation. For example, if you are comparing a column and a string that both have the latin1 character set, you can use the COLLATE operator to cause either operand to have the latin1_general_cs or latin1_bin collation
You can overcome this by explicitly using a case sensitive collation:
select * from names where name='Bill' COLLATE latin1_general_cs
Another solution is to set the collation of the column itself to a case-sensitive or binary collation such as utf8mb4_bin. Note that utf8mb4_unicode_520_ci, despite being a standard collation, is case insensitive.
I have an InnoDB table with VARCHAR(250) cp1251_general_ci field named comment.
I'm trying to do a case-sensitive search on this field.
SELECT comment
FROM body_legend
WHERE comment LIKE '%ТТ%'
GROUP BY comment
works as expected, but it's case insensitive.
I tried to use BINARY like
SELECT comment
FROM body_legend
WHERE comment LIKE BINARY '%ТТ%'
GROUP BY comment
it returns an empty result.
I tried to use COLLATE like
SELECT comment
FROM body_legend
WHERE comment LIKE '%ТТ%' COLLATE cp1251_general_ci
it returns error
COLLATION 'cp1251_general_ci' is not valid for CHARACTER SET 'utf8mb4'
How can I make the search case sensitive? I would be glad if the answer also explained why my queries did not work.
Self-answer:
SELECT comment
FROM body_legend
WHERE comment
LIKE BINARY CONVERT( '%ТТ%' USING cp1251);
It seems that when I write this query in phpMyAdmin, the literal is read as utf8, so a byte-wise comparison with the cp1251 value in the table of course does not match.
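An alternative sketch, offered as an assumption and not tested against the asker's table: apply a case-sensitive cp1251 collation to the column instead of the literal. The COLLATE attempt in the question failed because the literal arrives as utf8mb4, and a cp1251 collation cannot be attached to a utf8mb4 string; the column, however, already is cp1251.

```sql
-- cp1251_general_cs is the case-sensitive counterpart of cp1251_general_ci.
SELECT comment
FROM body_legend
WHERE comment COLLATE cp1251_general_cs LIKE '%ТТ%'
GROUP BY comment;
```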
1/ Sometimes you write
ALTER DATABASE [MyDb] COLLATE SQL_Latin1_General_CP1_CI_AS
A moment later, if you try
ALTER DATABASE [MyDb] COLLATE FRENCH_CI_AS
it may not succeed. Why is that?
2/ If table1 is SQL_Latin1_General_CP1_CI_AS and table2 is FRENCH_CI_AS,
one cannot write ...where table1.field1= table2.field2
What is the consequence of COLLATE in a database or table?
You can specify the collation name in the comparison:
table1.field1 = table2.field2 COLLATE FRENCH_CI_AS
Otherwise, use ALTER COLUMN if you want to change the collation of the field itself:
ALTER TABLE [dbo].[TableName]
ALTER COLUMN [FieldName] [DataType] COLLATE [Collate Name] NOT NULL
This answer is based on experience from working with Ms Sql Server. I have only a little experience from working with other databases, but I imagine much of this would apply for them as well.
From Msdn (here):
Collations let users sort and compare strings according to their own
conventions
Collations are used to let sql server know which characters are considered the same. If a case insensitive collation is used, X and x are considered the same. If a case sensitive collation is used, they are different. In an accent insensitive collation e and é might be considered the same, while in an accent sensitive collation they are different.
At least with Ms Sql Server, if you are comparing two strings that are stored in different collations, you must tell the server which collation to use in the comparison by using the COLLATE clause already mentioned in other answers. Not sure about how this is done in other databases.
If your table contains a row with a column that contains the text ÖBc and you select from this table like so:
select COL from TBL where COL = 'obc'
Then the row will be found using a case and accent insensitive collation like latin1_general_ci_ai. Accent insensitivity means that O and Ö are the same, and case insensitivity takes care of the case mismatch.
With the same select and collation latin1_general_cs_ai the row will not be found because of case sensitivity.
Similarly with collation latin1_general_ci_as, the row will not be found because of accent sensitivity.
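The three cases can be tried without a table. This is a T-SQL sketch for SQL Server; it compares literals directly, which is an illustration choice and not part of the example above.

```sql
-- T-SQL sketch; the N'' prefix makes the literals Unicode strings.
SELECT
  CASE WHEN N'ÖBc' = N'obc' COLLATE Latin1_General_CI_AI
       THEN 'found' ELSE 'not found' END AS ci_ai,
  CASE WHEN N'ÖBc' = N'obc' COLLATE Latin1_General_CS_AI
       THEN 'found' ELSE 'not found' END AS cs_ai,
  CASE WHEN N'ÖBc' = N'obc' COLLATE Latin1_General_CI_AS
       THEN 'found' ELSE 'not found' END AS ci_as;
```

Only the first column should report a match, in line with the description above.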
Collation is also used to determine alphabetical sort order when sorting results. This determines whether upper and lower case characters are sorted differently and also where accented characters are sorted.
You can compare items with different collations. For SQL Server you use:
table1.field1 = table2.field2 COLLATE FRENCH_CI_AS
or
table1.field1 = table2.field2 COLLATE SQL_Latin1_General_CP1_CI_AS
(or whatever the field1 collation is).
Note: You should try and leave the collation as the server default, unless you have a specific reason to do otherwise.
I have got a problem with a simple Select Query and special chars. I want to select the name Änne.
SELECT * FROM `names` WHERE `name` = 'Änne'
utf8_general_ci
Änne
Anne
okay, ...
utf8 general ci is a very simple collation. What it does it just
removes all accents then converts to upper case and uses the code of this sort of "base letter" result letter to compare.
http://forums.mysql.com/read.php?103,187048,188748
utf8_unicode_ci
Änne
Anne
why?
utf8_bin
Änne
utf8_bin seems to be the right choice at this point, but I have to do my search case insensitively.
SELECT * FROM `names` WHERE `name` = 'änne'
utf8_bin
none
Is there no way to do so?
I could use php ucwords() to uppercase the first letters, but i would prefer to find a DB solution.
edit: ucwords('änne') = änne, so I can't use that either
SELECT * FROM `names` WHERE lower(`name`) = 'änne'
is working for me, because i don't have a difference between 'Änne' and 'änne' in my DB.
what about:
SELECT * FROM `names` WHERE upper(`name`) = upper("änne")
Quoting doc:
The default character set and collation are latin1 and
latin1_swedish_ci, so nonbinary string comparisons are case
insensitive by default. This means that if you search with col_name
LIKE 'a%', you get all column values that start with A or a. To make
this search case sensitive, make sure that one of the operands has a
case sensitive or binary collation
That means you get case-sensitive results because you have set a binary collation. You can set the column's collation to utf8_general_ci and override it in searches:
col_name COLLATE latin1_general_cs LIKE 'a%'
There is an error in your MySQL code:
SELECT * FROM names WHERE name = "Änne"
Remove the quotes around the table name and the field name.