MySQL: select distinct letters, including extended Latin characters

Original question:
Table structure:
CREATE TABLE `texts` (
`letter` VARCHAR(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`text` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
INDEX (`letter` ASC),
INDEX (`text` ASC)
)
ENGINE InnoDB
CHARACTER SET utf8
COLLATE utf8_general_ci;
Sample data:
INSERT INTO `texts`
(`letter`, `text`)
VALUES
('a', 'Apple'),
('ā', 'Ābols'),
('b', 'Bull'),
('c', 'Cell'),
('č', 'Čakste');
The query which I'm executing:
SELECT DISTINCT `letter` FROM `texts`;
Expected results:
`letter`
a
ā
b
c
č
Actual results:
`letter`
a
b
c
I've tried many utf8 collations (utf8_[bin|general_ci|unicode_ci],
utf8mb4_[bin|general_ci|unicode_ci] etc.), but none of them work. How do I
fix this?
Edit for clarification: what I want is not just to get all the letters
out, but also get them in the order I specified in the expected
results. utf8_bin gets all the letters, but they are ordered in the
wrong way - extended latin characters follow only after all the basic
latin characters (example: a, b, c, ā, č). Also, the actual table I'm
using has many texts per letter, so grouping is a must.
Edit #2: here's the full table data from the live site - http://pastebin.com/cH2DUzf3
Executing that SQL and running the following query after that:
SELECT DISTINCT BINARY `letter` FROM `texts` ORDER BY `letter` ASC
yields almost perfect results, with one exception: the letter 'ū' is before 'u', which is weird to say the least, because all other extended latin letters show up after their basic latin versions. How do I solve this one last problem?

Use the BINARY operator (see the MySQL manual) so that DISTINCT compares the letters byte by byte:
SELECT DISTINCT BINARY `letter` FROM `texts`
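On the ordering problem from Edit #2: `ORDER BY letter` in that query most likely sorts by the table column, whose utf8_general_ci collation treats 'u' and 'ū' as equal, so their relative order is not defined. A minimal sketch of one way around that, assuming the `texts` table from the question (the derived-table alias `d` is only illustrative): deduplicate under a binary collation, then sort by an accent-insensitive collation and break ties on the byte value.

```sql
SELECT letter
FROM (
    -- utf8_bin keeps 'a' and 'ā' as two distinct values
    SELECT DISTINCT letter COLLATE utf8_bin AS letter
    FROM texts
) AS d
ORDER BY letter COLLATE utf8_unicode_ci,  -- groups each base letter with its accented forms
         letter COLLATE utf8_bin;         -- tie-break: plain ASCII letter sorts first
-- Intended order for the sample data: a, ā, b, c, č
```

The tie-break works because an unaccented ASCII letter is a single byte below 0x80, while the accented forms are encoded in UTF-8 with a lead byte above 0x80.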

Related

Matching db values with special chars

Is there a way to match values in a query if the stored data has special characters and the search term doesn't?
For example: I want to match a column with the following value:
Doña Ana
but I can only search using
Dona Ana
You may collate the column of interest to Latin general, which doesn't have accents:
SELECT *
FROM yourTable
WHERE name COLLATE latin1_general_ci = 'Dona Ana';
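Note that a latin1 collation can only be applied to a latin1 expression; on a utf8 or utf8mb4 column MySQL rejects it with an "is not valid for CHARACTER SET" error. A hedged variant, assuming the same placeholder names as above (`yourTable`, `name`) and a server that supports utf8mb4: convert the value to utf8mb4 and compare under utf8mb4_general_ci, which ignores both accents and case.

```sql
SELECT *
FROM yourTable
-- CONVERT makes the expression utf8mb4 regardless of how the column is stored,
-- so the utf8mb4 collation is always valid here
WHERE CONVERT(name USING utf8mb4) COLLATE utf8mb4_general_ci = 'Dona Ana';
```

Wrapping the column like this generally prevents MySQL from using an index on `name`; if the column already uses an accent-insensitive utf8mb4 collation, a plain `name = 'Dona Ana'` is enough.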

What is wrong with MySql LIKE operator on utf8_turkish_ci collation?

I've a table like below:
wordId | word
---------------------------------
1 | axxe
2 | test word
3 | another test word
I'm trying to run the query below to find the records beginning with the letters "ax".
SELECT * FROM `words` WHERE word LIKE 'ax%'
MySQL cannot find anything.
But, if I try one of the queries below I can see the correct record (the word "axxe") on the results.
SELECT * FROM `words` WHERE word='axxe'
SELECT * FROM `words` WHERE word LIKE '%ax%'
SELECT * FROM `words` WHERE word LIKE 'a%'
Why can't MySQL find the correct value for the first query? I've tried to run this both on the command line and phpMyAdmin but the result is the same.
This is SHOW CREATE TABLE output:
CREATE TABLE `words` (
`wordId` int(11) NOT NULL auto_increment,
`word` text collate utf8_turkish_ci NOT NULL,
PRIMARY KEY (`wordId`)
) ENGINE=MyISAM AUTO_INCREMENT=2853 DEFAULT CHARSET=utf8 COLLATE=utf8_turkish_ci
TL;DR:
Update your MySQL version.
I created a simulation of your problem here:
Create table:
CREATE TABLE `turky` (
`id` int(5) NOT NULL AUTO_INCREMENT,
`word` text COLLATE utf8_turkish_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=utf8 COLLATE=utf8_turkish_ci
Insert data:
INSERT INTO `turky` (`id`, `word`) VALUES
(1, 'axxe'),
(2, 'test word'),
(3, 'axxxxxe'),
(4, 'another test word');
Run test query (that works):
SELECT * FROM `turky` WHERE `word`='axxe'
Result:
1, 'axxe',
Run test query 2 (that works):
SELECT * FROM `turky` WHERE word LIKE '%ax%'
Result:
1, 'axxe',
3, 'axxxxxe',
Run test query 3 (that works):
SELECT * FROM `turky` WHERE word LIKE 'a%'
Result:
1, 'axxe',
3, 'axxxxxe',
4, 'another test word';
Run test query 4 (the one that fails for you):
SELECT * FROM `turky` WHERE `word` LIKE 'ax%'
Result:
1, 'axxe',
3, 'axxxxxe',
This works in MySQL, using phpMyAdmin.
Versions:
MySQL: 5.6.35
phpMyAdmin: 4.6.6
The modern Turkish alphabet doesn't contain the letter "x", so this may (though probably doesn't) cause some obscure interference with how the Turkish collation compares that character.
A web search for Turkish-language bugs in MySQL turns up half a dozen, but none of them appears to match your specific case.
The only explanation I can see from my own testing (above), using the table and SQL details you've given us, is that you have an older version of MySQL that includes some Turkish-language bugs.
If your MySQL version is up to date (or at least more recent than mine), then the issue seems to be specific to your setup and your data, so I highly doubt we can find and reproduce it :-(
More Diagnostic stuff:
As commented by Jacob H, see if this issue still occurs after casting to binary:
SELECT * FROM `turky` WHERE BINARY `word` LIKE CONCAT(BINARY 'ax','%');
Result:
1, 'axxe',
3, 'axxxxxe',
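To narrow this down further, it may help to evaluate the same pattern under several collations side by side. This is only a diagnostic sketch against the `turky` test table above; the column aliases are made up for readability.

```sql
-- Server version, to compare against the 5.6.35 used in the test above
SELECT VERSION();

-- Evaluate the same prefix pattern under three collations, row by row
SELECT `word`,
       `word` LIKE 'ax%'                         AS like_turkish,
       `word` COLLATE utf8_general_ci LIKE 'ax%' AS like_general,
       BINARY `word` LIKE 'ax%'                  AS like_binary
FROM `turky`;
```

If only the `like_turkish` column disagrees with the others, the collation is the likely culprit; if all three agree here but the original `WHERE` query still returns nothing, the problem is more likely in how the index on the real table is used.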

INSTR(str,substr) does not work when str contains 'é' or 'ë' and substr only 'e'

In another post on stackoverflow, I read that INSTR could be used to order results by relevance.
My understanding of `col LIKE '%str%'` and `INSTR(col, 'str')` is that they both behave the same. There seems to be a difference in how collations are handled.
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO users (name)
VALUES ('Joël'), ('René');
SELECT * FROM users WHERE name LIKE '%joel%'; -- 1 record returned
SELECT * FROM users WHERE name LIKE '%rene%'; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'joel') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'rene') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'joël') > 0; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'rené') > 0; -- 1 record returned
Although INSTR does some conversion, it finds ë in é.
SELECT INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë');
-- returns 1, 0, 0
Am I missing something?
http://sqlfiddle.com/#!2/9bf21/6 (using mysql-version: 5.5.22)
This is due to bug 70767 on LOCATE() and INSTR(), which has been verified.
Though the INSTR() documentation states that it can be used with multi-byte strings, it doesn't seem to work, as you note, with collations like utf8_general_ci, which should be case- and accent-insensitive:
This function is multi-byte safe, and is case sensitive only if at least one argument is a binary string.
The bug report states that MySQL respects the collation here only when the matching substring also has the same number of bytes as the target:
However, you can easily observe that they do not (completely) respect collations when looking for one string inside another one. It seems that what's happening is that MySQL looks for a substring which is collation-equal to the target which has exactly the same length in bytes as the target. This is only rarely true.
To adapt the report's example, if you create the following table:
create table t ( needle varchar(10), haystack varchar(10)
) COLLATE=utf8_general_ci;
insert into t values ("A", "a"), ("A", "XaX");
insert into t values ("A", "á"), ("A", "XáX");
insert into t values ("Á", "a"), ("Á", "XaX");
insert into t values ("Å", "á"), ("Å", "XáX");
then run this query, you can see the same behaviour demonstrated:
select needle
, haystack
, needle=haystack as `=`
, haystack LIKE CONCAT('%',needle,'%') as `like`
, instr(haystack, needle) as `instr`
from t;
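Since the report ties the behaviour to byte length, it can help to look at the byte and character lengths of each pair next to each other. A small sketch against the same table `t`; the column aliases are illustrative.

```sql
SELECT needle,
       haystack,
       LENGTH(needle)        AS needle_bytes,    -- length in bytes
       CHAR_LENGTH(needle)   AS needle_chars,    -- length in characters
       LENGTH(haystack)      AS haystack_bytes,
       CHAR_LENGTH(haystack) AS haystack_chars
FROM t;
```

Rows where the needle and the collation-equal character in the haystack differ in byte length are the ones where, according to the report, INSTR() and LOCATE() miss the match while `=` and LIKE find it.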

SQL - how to return exact matches only (special characters)

I have a table with words in Spanish (INT id_word, VARCHAR(255) word). Let's suppose the table has these records:
1 casa
2 pantalon
If I search for the word pantalón (with a special char ó) it should not return any rows. How do I select exact matches only? It is currently returning the 2nd row.
SELECT * FROM words WHERE word='pantalón';
Thanks!
Solution from ifx: I changed the word field's collation to utf8_bin.
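For reference, a sketch of what that change can look like in MySQL, assuming `word` is a utf8 VARCHAR(255) column as described in the question. Note that utf8_bin is case-sensitive as well as accent-sensitive.

```sql
-- Per query: compare under a binary collation without altering the table
SELECT * FROM words WHERE word COLLATE utf8_bin = 'pantalón';

-- Permanently: change the column's collation.
-- MODIFY restates the whole column definition, so repeat any NOT NULL
-- or DEFAULT that the real column has.
ALTER TABLE words
  MODIFY word VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin;
```

The per-query form is handy for testing, but it usually cannot use an index on `word`; the ALTER TABLE form keeps index lookups available.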
The reason this happens is down to the collation. There are collations that are accent-sensitive (which you want in this case) and others that are accent-insensitive (which is what you currently have configured). There are also case-sensitive and case-insensitive collations.
The following code (SQL Server syntax) produces the correct result:
create table test (
id int identity(1,1),
value nvarchar(100) collate SQL_Latin1_General_Cp437_CI_AS
)
insert into test values ('casa')
insert into test values ('pantalon')
select value collate SQL_Latin1_General_Cp437_CS_AS from test where value = 'pantalón'
The below code produces the incorrect result:
drop table test
go
create table test (
id int identity(1,1),
value nvarchar(100) collate SQL_Latin1_General_Cp437_CI_AI
)
insert into test values ('casa')
insert into test values ('pantalon')
select value collate SQL_Latin1_General_Cp437_CS_AS from test where value = 'pantalón'
The key here is the collation - AI means Accent-insensitive, AS means accent-sensitive.
I have this problem in our language too, so I keep two columns for names: SearchColumn and ViewColumn. When saving data, I replace the special characters with plain ones and store the result in SearchColumn. When a user searches, I apply the same replacement function to the search term and look it up in SearchColumn; if it matches, I display the value of ViewColumn.
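A minimal MySQL sketch of that two-column idea. The table name `search_demo` and the column names are hypothetical, and the folding here handles only one character; a real mapping would need one REPLACE (or an application-side function) per character to fold.

```sql
CREATE TABLE search_demo (
  id            INT NOT NULL AUTO_INCREMENT,
  view_column   VARCHAR(255) NOT NULL,  -- original text, shown to the user
  search_column VARCHAR(255) NOT NULL,  -- folded text, used for matching
  PRIMARY KEY (id),
  INDEX (search_column)
);

-- Fold on write (only ó -> o in this sketch)
INSERT INTO search_demo (view_column, search_column)
VALUES ('pantalón', REPLACE('pantalón', 'ó', 'o'));

-- Fold the search term the same way, match on the folded column,
-- and display the original text
SELECT view_column
FROM search_demo
WHERE search_column = REPLACE('pantalon', 'ó', 'o');
```

Because the folded value is stored, the lookup can use the index on `search_column` instead of applying a collation or function to every row.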

MySQL Charset Fails When DateTime Column on WHERE Clause

I have a MySQL query like below:
SELECT YA.PROGRAMNAME
FROM TABLE_Y YA
WHERE YA.PROGRAMID=9845
AND YA.ENDDATE>NOW()
When executed, this query doesn't correctly return the language-specific characters (here, Turkish characters).
Above query outputs:
Zaman?n Tan???
Turkish characters are replaced with question marks.
If I omit the last line of the query, it works correctly, but then I can't check whether the program's end date is later than now.
SELECT YA.PROGRAMNAME
FROM TABLE_Y YA
WHERE YA.PROGRAMID=9845
/* Outputs: Zamanın Tanığı */
On a side note: the charset for TABLE_Y is latin5.
What can I do to have it working correctly? Thanks.
Strange behavior with WHERE clause.
I'd suggest setting the session character set before selecting, e.g.:
SET NAMES latin5; -- or SET NAMES utf8;
SELECT YA.PROGRAMNAME
FROM TABLE_Y YA
...
EDIT
Cannot reproduce the problem. There may be a problem on the client, but I'm not sure.
CREATE TABLE table_y(
PROGRAMID INT(11) NOT NULL AUTO_INCREMENT,
PROGRAMNAME VARCHAR(255) DEFAULT NULL,
ENDDATE DATE DEFAULT NULL,
PRIMARY KEY (PROGRAMID)
)
ENGINE = INNODB
CHARACTER SET latin5
COLLATE latin5_turkish_ci;
INSERT INTO table_y VALUES
(1, 'Zamanın Tanığı', '2014-10-03'),
(2, 'Zamanın Tanığı', '2011-06-03');
SET NAMES latin1;
SELECT YA.PROGRAMNAME FROM table_y YA WHERE YA.PROGRAMID = 1 AND YA.ENDDATE > NOW();
---------------------------
Zaman?n Tan???
SET NAMES latin5;
SELECT YA.PROGRAMNAME FROM table_y YA WHERE YA.PROGRAMID = 1 AND YA.ENDDATE > NOW();
---------------------------
Zamanın Tanığı
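To see which side is responsible, it may help to inspect both the session settings and the stored bytes. A short diagnostic sketch against the `table_y` test table above; the column aliases are illustrative.

```sql
-- Character sets the current connection is using;
-- SET NAMES changes character_set_client, _connection and _results
SHOW VARIABLES LIKE 'character_set_%';

-- What is actually stored: the column's character set and the raw bytes
SELECT CHARSET(PROGRAMNAME) AS column_charset,
       HEX(PROGRAMNAME)     AS stored_bytes
FROM table_y
WHERE PROGRAMID = 1;
```

If the hex dump already contains 3F (the byte for '?') where the Turkish letters should be, the data was damaged on insert; if the bytes look intact, the question marks are being introduced when the result is converted to `character_set_results` for the client.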