MySQL diacritics-insensitive search

I have a Romanian dictionary database. The word table has a column named Word, which uses the utf8_romanian_ci collation. This column holds all the words. Most Romanian words have diacritics: acasă, mâine, etc.
I am trying to run a query that ignores the diacritics. Something like:
SELECT * FROM WordList where Word = 'acasa'
should return the word acasă
I tried:
SET NAMES utf8;
before the query, but it does not work.
I also tried
SELECT * FROM WordList where Word = 'acasa' COLLATE utf8_bin
It does not work either.
Any idea what might work?

Try adding COLLATE utf8_unicode_ci to the query:
SELECT *
FROM WordList
WHERE Word = _utf8 'acasa' COLLATE utf8_unicode_ci
Test on SQL Fiddle
More info:
MySQL: Unicode Character Sets
MySQL: Using COLLATE in SQL Statements
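To see why the collation matters here, the same comparison can be run on string literals under several collations. This is a minimal sketch, assuming a connection that uses utf8mb4 (hence the utf8mb4_ collation names rather than the utf8_ ones above); the column aliases are made up for illustration.

```sql
-- Assumption: the client sends UTF-8 text; SET NAMES makes literals utf8mb4.
SET NAMES utf8mb4;

SELECT
  'acasă' = 'acasa' COLLATE utf8mb4_unicode_ci  AS unicode_ci,   -- accent-insensitive collation
  'acasă' = 'acasa' COLLATE utf8mb4_romanian_ci AS romanian_ci,  -- Romanian tailoring, as on the Word column
  'acasă' = 'acasa' COLLATE utf8mb4_bin         AS bin;          -- byte-wise comparison
```

A 1 in a column means that collation treats the two spellings as equal, which is the behaviour the search needs.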

Related

Replace specific character MySQL search on utf8mb4_unicode_ci

I want to search my database for the character İ - "latin capital letter i with dot above (U+0130)" - and replace it with a regular I (U+0049).
For example, I want to transform "SİNG" to "SING".
The database collation is utf8mb4_unicode_ci.
I can find the characters using COLLATE utf8mb4_bin
SELECT * FROM `benches` WHERE `inscription` LIKE '%İ%' COLLATE utf8mb4_bin
But I can't replace it.
UPDATE `benches` SET inscription = REPLACE(inscription, 'İ', 'I') WHERE INSTR(inscription, 'İ') > 0 COLLATE utf8mb4_bin
I get the error
#1253 - COLLATION 'utf8mb4_bin' is not valid for CHARACTER SET 'latin1'
Which is weird because the database and column are definitely utf8mb4_unicode_ci
So, what magic invocation do I need to search and replace a specific Unicode character from within a string?
The quick fix might be
UPDATE `benches`
SET inscription = REPLACE(inscription, _utf8mb4'İ' COLLATE utf8mb4_bin, 'I')
WHERE INSTR(inscription, _utf8mb4'İ' COLLATE utf8mb4_bin) > 0
A better fix might be to execute this after connecting:
SET NAMES utf8mb4;
If neither of these works, please provide a test case that includes creating and populating a table, plus the UPDATE. It may take some experimentation to conjure up another potential solution.
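The error message mentions latin1 even though the column is utf8mb4, which suggests the string literal is being read in the connection character set. A small sketch for checking this, using only built-in statements and functions; what it reports depends on your client configuration.

```sql
-- Show the character sets the server assumes for this connection.
SHOW VARIABLES LIKE 'character_set_%';

-- Show which character set and collation a plain string literal receives.
SELECT CHARSET('I') AS literal_charset, COLLATION('I') AS literal_collation;

-- After this, literals sent by the client are treated as utf8mb4.
SET NAMES utf8mb4;
SELECT CHARSET('I') AS literal_charset, COLLATION('I') AS literal_collation;
```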
I have had success with a query like:
UPDATE `benches`
SET inscription = REPLACE(inscription, 'İ', 'I')
WHERE inscription LIKE '%İ%' COLLATE utf8mb4_bin;

How to search for the exact string value in MySQL?

I got this code:
select * from locality WHERE name ="ISTASYON"
The query runs without errors in MySQL, but the result is incorrect.
Result:
28BAF9346A41E4E4E0501AAC4524363B 0 iSTASYON
402881a4523b52d201523b6c2afb4166 0 İSTASYON
402881a4523b52d201523b6c38b7417c 0 İSTASYON
402881a4523b52d201523baa9faf0092 0 İSTASYON
402881a4523b52d201523baab059009f 0 İSTASYON
402881a4523b52d201523baad01a00b7 0 İSTASYON
58441bc4c054447ebe1cddbfeef958b5 0 ISTASYON
fa7fb88d1d4c41feb497b08f42066c82 1 2016-04-19 09:53:41.000000 İSTASYON
My problem is that the results contain ISTASYON, İSTASYON, ıSTASYON and iSTASYON, but I want only ISTASYON.
How can I solve this problem?
You can use the COLLATE or BINARY operator to force binary comparison:
SELECT * FROM locality WHERE name COLLATE utf8_bin = "ISTASYON"
or
SELECT * FROM locality WHERE BINARY name = "ISTASYON"
If you want the column always to be treated in this fashion, declare it with a binary collation.
See the docs for more info.
You can use a SELECT query like this one; I use it as well and find it helpful.
Select * from `users` where username COLLATE latin1_general_cs LIKE '%$email%'
The default character set and collation are latin1 and latin1_swedish_ci, so nonbinary string comparisons are case insensitive by default. This means that if you search with col_name LIKE 'i%', you get all column values that start with I or i. To make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation.
col_name COLLATE latin1_general_cs LIKE 'i%'
col_name LIKE 'i%' COLLATE latin1_general_cs
col_name COLLATE latin1_bin LIKE 'i%'
col_name LIKE 'i%' COLLATE latin1_bin
select * from locality WHERE BINARY name ="ISTASYON"
or
select * from locality WHERE name = BINARY "ISTASYON"
Plan A: This will be case-sensitive and accent sensitive:
It would be more efficient (after setup) to have the column be utf8_bin (or utf8mb4_bin). latin1 (and latin1_bin) will not suffice because of the Turkish characters.
The setup is
ALTER TABLE tbl MODIFY COLUMN ... COLLATE utf8_bin ...;
Caution: If you also want to do case folding and/or accent stripping with the same column(s) in other situations, this will hurt those situations.
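For a concrete version of the Plan A setup, the sketch below uses the locality table and name column from the question. The VARCHAR(255) type and NOT NULL are assumptions; MODIFY COLUMN restates the whole column definition, so use the column's real type and attributes (SHOW CREATE TABLE displays them).

```sql
-- Inspect the current definition first, so nothing is lost in the MODIFY.
SHOW CREATE TABLE locality;

-- Assumed type and nullability; replace with the real ones.
ALTER TABLE locality
  MODIFY COLUMN name VARCHAR(255)
    CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;

-- With a binary collation on the column, a plain comparison is exact.
SELECT * FROM locality WHERE name = 'ISTASYON';
```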
Plan B: This addresses the dotless ı and a few other Turkish idiosyncrasies.
The utf8_turkish_ci collation (or utf8mb4_turkish_ci) treats I and ı as equal (I=ı). It then treats the following as a different group that sorts later, with all members equal to each other: i=Ì=Í=Î=Ï=ì=í=î=ï=Ī=ī=Į=į=İ. Ü=ü come after U and before V. Similarly, Ğ=ğ come after G and before H. See utf8 collations for more details.

Special characters select issues with MySQL

Based on http://www.i18nqa.com/debug/utf8-debug.html I want to perform a search in my MySQL table to see if I have rows that have encoding problems.
If I run the following query :
select t.col1 from table t where t.col1 like '%Ú%'
it returns every row whose t.col1 contains the characters 'as'.
How can I change the query so that it fetches only the rows containing '%Ú%', and not all rows that contain '%as%'?
Try this if you are using the latin1_swedish_ci collation:
select t.col1 from table t where t.col1 regexp '^[Ú]';
With MySQL's collations, case-folding and accent-stripping go together.
If you want neither, use the ..._bin collation for the character set you are using.
WHERE foo LIKE '%Ú%' COLLATE utf8_bin
Even faster would be to declare foo to be COLLATE utf8_bin instead of whatever you have. (Note: the default for utf8 is utf8_general_ci.)

Special Characters and a simple select query

I have got a problem with a simple Select Query and special chars. I want to select the name Änne.
SELECT * FROM `names` WHERE `name` = 'Änne'
utf8_general_ci
Änne
Anne
okay, ...
utf8 general ci is a very simple collation. What it does it just
removes all accents then converts to upper case and uses the code of this sort of "base letter" result letter to compare.
http://forums.mysql.com/read.php?103,187048,188748
utf8_unicode_ci
Änne
Anne
why?
utf8_bin
Änne
utf8_bin seems to be the right choice at this point, but I have to make my search case-insensitive.
SELECT * FROM `names` WHERE `name` = 'änne'
utf8_bin
none
Is there no way to do so?
I could use PHP's ucwords() to uppercase the first letters, but I would prefer to find a DB solution.
Edit: ucwords('änne') = änne, so I can't use that either.
SELECT * FROM `names` WHERE lower(`name`) = 'änne'
is working for me, because I don't have a difference between 'Änne' and 'änne' in my DB.
what about:
SELECT * FROM `names` WHERE upper(`name`) = upper("änne")
Quoting doc:
The default character set and collation are latin1 and
latin1_swedish_ci, so nonbinary string comparisons are case
insensitive by default. This means that if you search with col_name
LIKE 'a%', you get all column values that start with A or a. To make
this search case sensitive, make sure that one of the operands has a
case sensitive or binary collation
That means the case-sensitive results occur because you have set a binary collation. You can set the column's collation to utf8_general_ci and override it in individual searches with a collation that belongs to the column's character set:
col_name COLLATE utf8_bin LIKE 'a%'
There is an error in your MySQL code:
SELECT * FROM names WHERE name = "Änne"
Remove the quotes around the table name and the field name.

Looking for case insensitive MySQL collation where "a" != "ä"

I'm looking for a MySQL collation for UTF8 which is case insensitive and distinguishes between "a" and "ä" (or more generally, between umlauted / accented characters and their "pure" form). utf8_general_ci does the former, utf8_bin the latter, but neither does both. If there is no such collation, what can I do to get as close as possible in a WHERE clause?
My recommendation would be to use utf8_bin and in your WHERE clause, force both sides of your comparison to upper or lower case.
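A sketch of that recommendation, assuming a hypothetical table tablename whose column fieldname is declared with utf8_bin. Applying a function to the column generally prevents an ordinary index on it from being used, so this suits smaller tables best.

```sql
-- fieldname is assumed to be CHARACTER SET utf8 COLLATE utf8_bin.
-- Lower-casing both sides removes case differences; the binary
-- collation still keeps 'a' and 'ä' distinct.
SELECT *
FROM tablename
WHERE LOWER(fieldname) = LOWER('Würz');
```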
It works fine here with utf8_german2_ci as collation:
SELECT * FROM tablename WHERE fieldname LIKE "würz%" COLLATE utf8_german2_ci
I checked utf8_bin like this
CREATE TABLE tmp2 (utf8_bin VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_bin);
INSERT INTO tmp2 VALUES ('nói');
select * from tmp2 where utf8_bin='noi';
You could try utf8_swedish_ci; it is both case-insensitive and distinguishes between a and ä (but treats e.g. ü like y).
Collations are language-dependent, and it seems German doesn't have its own collation in MySQL. (I had a look at your profile, which says you're German.)
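If MySQL 8.0 or later is available, it ships accent-sensitive, case-insensitive collations for utf8mb4, for example utf8mb4_0900_as_ci. A minimal sketch, assuming a utf8mb4 connection; this collation does not exist in older versions.

```sql
SET NAMES utf8mb4;

SELECT
  'a' = 'A' COLLATE utf8mb4_0900_as_ci AS same_letter_other_case,  -- case is ignored
  'a' = 'ä' COLLATE utf8mb4_0900_as_ci AS plain_vs_umlaut;         -- accents are significant
```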