I have a MySQL query like the one below:
SELECT YA.PROGRAMNAME
FROM TABLE_Y YA
WHERE YA.PROGRAMID=9845
AND YA.ENDDATE>NOW()
When executed, this query doesn't correctly return language-specific characters (here, Turkish characters).
The query above outputs:
Zaman?n Tan???
Turkish characters are replaced with question marks.
If I omit the last line of the query, it works correctly, but then I lose the check that the program's end date is later than now.
SELECT YA.PROGRAMNAME
FROM TABLE_Y YA
WHERE YA.PROGRAMID=9845
/* Outputs: Zamanın Tanığı */
On a side note: the charset for TABLE_Y is latin5.
What can I do to have it working correctly? Thanks.
Strange behavior for a WHERE clause.
I'd suggest configuring the session character set before selecting, e.g.:
SET NAMES latin5; -- or SET NAMES utf8;
SELECT YA.PROGRAMNAME
FROM TABLE_Y YA
...
EDIT
Cannot reproduce the problem. There may be a problem on the client, but I'm not sure.
CREATE TABLE table_y(
PROGRAMID INT(11) NOT NULL AUTO_INCREMENT,
PROGRAMNAME VARCHAR(255) DEFAULT NULL,
ENDDATE DATE DEFAULT NULL,
PRIMARY KEY (PROGRAMID)
)
ENGINE = INNODB
CHARACTER SET latin5
COLLATE latin5_turkish_ci;
INSERT INTO table_y VALUES
(1, 'Zamanın Tanığı', '2014-10-03'),
(2, 'Zamanın Tanığı', '2011-06-03');
SET NAMES latin1;
SELECT YA.PROGRAMNAME FROM TABLE_Y YA WHERE YA.PROGRAMID = 1 AND YA.ENDDATE > NOW();
---------------------------
Zaman?n Tan???
SET NAMES latin5;
SELECT YA.PROGRAMNAME FROM TABLE_Y YA WHERE YA.PROGRAMID = 1 AND YA.ENDDATE > NOW();
---------------------------
Zamanın Tanığı
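To check whether the client connection is the culprit, it may help to look at the character sets the session is actually using and at the bytes stored in the column. A minimal sketch, reusing the table from the example above; SHOW VARIABLES and HEX() are standard MySQL, but the values you see depend on your client configuration:

```sql
-- Character sets in effect for this connection (client, connection, results).
SHOW VARIABLES LIKE 'character_set%';

-- Stored bytes of the name. If the hex dump already contains 3F ('?') where
-- the Turkish letters should be, the data was damaged on insert, not on read.
SELECT YA.PROGRAMNAME, HEX(YA.PROGRAMNAME)
FROM TABLE_Y YA
WHERE YA.PROGRAMID = 1;
```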
Related
Original question:
Table structure:
CREATE TABLE `texts` (
`letter` VARCHAR(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`text` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
INDEX (`letter` ASC),
INDEX (`text` ASC)
)
ENGINE InnoDB
CHARACTER SET utf8
COLLATE utf8_general_ci;
Sample data:
INSERT INTO `texts`
(`letter`, `text`)
VALUES
('a', 'Apple'),
('ā', 'Ābols'),
('b', 'Bull'),
('c', 'Cell'),
('č', 'Čakste');
The query which I'm executing:
SELECT DISTINCT `letter` FROM `texts`;
Expected results:
`letter`
a
ā
b
c
č
Actual results:
`letter`
a
b
c
I've tried many utf8 collations (utf8_[bin|general_ci|unicode_ci],
utf8mb4_[bin|general_ci|unicode_ci], etc.), but none of them work. How do I
fix this?
Edit for clarification: what I want is not just to get all the letters
out, but also get them in the order I specified in the expected
results. utf8_bin gets all the letters, but they are ordered in the
wrong way - extended latin characters follow only after all the basic
latin characters (example: a, b, c, ā, č). Also, the actual table I'm
using has many texts per letter, so grouping is a must.
Edit #2: here's the full table data from the live site - http://pastebin.com/cH2DUzf3
Executing that SQL and running the following query after that:
SELECT DISTINCT BINARY `letter` FROM `texts` ORDER BY `letter` ASC
yields almost perfect results, with one exception: the letter 'ū' is before 'u', which is weird to say the least, because all other extended latin letters show up after their basic latin versions. How do I solve this one last problem?
Check Manual for BINARY type
SELECT DISTINCT BINARY `letter` FROM `texts`
Check SQL Fiddle
In another post on Stack Overflow, I read that INSTR could be used to order results by relevance.
My understanding of col LIKE '%str%' and INSTR(col, 'str') is that they both behave the same. There seems to be a difference in how collations are handled, though.
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO users (name)
VALUES ('Joël'), ('René');
SELECT * FROM users WHERE name LIKE '%joel%'; -- 1 record returned
SELECT * FROM users WHERE name LIKE '%rene%'; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'joel') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'rene') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'joël') > 0; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'rené') > 0; -- 1 record returned
INSTR does seem to do some conversion, though: it finds ë in é.
SELECT INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë');
-- returns 1, 0, 0
Am I missing something?
http://sqlfiddle.com/#!2/9bf21/6 (using mysql-version: 5.5.22)
This is due to bug 70767 on LOCATE() and INSTR(), which has been verified.
Though the INSTR() documentation states that it can be used for multi-byte strings, it doesn't seem to work, as you note, with collations like utf8_general_ci, which should be case and accent insensitive:
This function is multi-byte safe, and is case sensitive only if at least one argument is a binary string.
The bug report states that although MySQL does this correctly, it only does so when the number of bytes is also identical:
However, you can easily observe that they do not (completely) respect collations when looking for one string inside another one. It seems that what's happening is that MySQL looks for a substring which is collation-equal to the target which has exactly the same length in bytes as the target. This is only rarely true.
To adapt the report's example, if you create the following table:
create table t ( needle varchar(10), haystack varchar(10)
) COLLATE=utf8_general_ci;
insert into t values ("A", "a"), ("A", "XaX");
insert into t values ("A", "á"), ("A", "XáX");
insert into t values ("Á", "a"), ("Á", "XaX");
insert into t values ("Å", "á"), ("Å", "XáX");
then run this query, you can see the same behaviour demonstrated:
select needle
, haystack
, needle=haystack as `=`
, haystack LIKE CONCAT('%',needle,'%') as `like`
, instr(haystack, needle) as `instr` -- INSTR(str, substr): look for needle inside haystack
from t;
SQL Fiddle
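Until the bug is fixed, one possible workaround for the original goal (ordering by relevance) is to rely on LIKE alone, which respected the collation in the tests above. A rough sketch against the users table from the question; ranking prefix matches first is only an assumed notion of relevance:

```sql
SELECT *
FROM users
WHERE name LIKE '%joel%'                  -- collation-aware match, per the tests above
ORDER BY (name LIKE 'joel%') DESC, name;  -- 1 for prefix matches, 0 otherwise
```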
We currently have an interesting problem regarding MySQL's sort order on an ENUM field. The field's ENUM entries are listed in the order we want. Just to be safe, we added a CONCAT around the column so that it would be cast to CHAR and ordered alphabetically, as suggested by the MySQL reference (MySQL Reference - Enum):
Make sure that the column is sorted lexically rather than by index number by coding ORDER BY CAST(col AS CHAR) or ORDER BY CONCAT(col).
But that didn't produce the expected results, so we started to investigate further. It seems that ORDER BY doesn't work on a combination of an ENUM column and the CONCAT function. I've written the following sample script, which should show my point:
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_bin DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO test
VALUES ('b'), ('c'), ('a');
SELECT * FROM test; -- b, c, a
SELECT * FROM test ORDER BY col1 ASC; -- a, b, c
SELECT * FROM test ORDER BY CAST(col1 AS CHAR) ASC; -- a, b, c
SELECT * FROM test ORDER BY CAST(col1 AS BINARY) ASC; -- a, b, c
SELECT * FROM test ORDER BY CONCAT(col1) ASC; -- b, c, a - This goes wrong
I currently suspect some kind of problem with the collation or encoding, but I'm not sure. My database's default encoding is also utf8. The MySQL version is 5.6.12, but it seems to be reproducible with MySQL 5.1. The storage engine is MyISAM, but it also occurs with the MEMORY engine.
Any help would be appreciated.
Update:
It seems the problem occurs only in MySQL 5.6 and is caused by the collation of the column. With the first CREATE TABLE statement, the queries work fine.
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_general_ci DEFAULT NULL
)
With the second they don't.
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_bin DEFAULT NULL
)
The collation of the table and/or database doesn't seem to affect the queries. The queries can be tested in this SQL Fiddle.
Strange, it works in this fiddle. Do you have a trigger or something?
http://sqlfiddle.com/#!2/0976a/2
But in 5.6 it goes haywire:
http://sqlfiddle.com/#!9/0976a/1
Probably a MySQL bug.
Moreover, if you list the values in the enum in the "proper" order, it works:
http://sqlfiddle.com/#!9/a3784/1
From the docs:
ENUM values are sorted based on their index numbers, which depend on
the order in which the enumeration members were listed in the column
specification. For example, 'b' sorts before 'a' for ENUM('b', 'a').
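The difference between index order and lexical order can be seen with a small sketch. The table enum_demo is hypothetical and exists only for illustration; col + 0 is the documented way to read an ENUM value's index:

```sql
-- enum_demo is a hypothetical table used only for illustration.
CREATE TABLE enum_demo (col ENUM('b', 'a'));
INSERT INTO enum_demo VALUES ('a'), ('b');

-- col + 0 exposes the internal index: 'b' is 1 and 'a' is 2,
-- so a plain ORDER BY col lists 'b' first.
SELECT col, col + 0 AS idx FROM enum_demo ORDER BY col;

-- Casting to CHAR sorts by the string value instead of the index.
SELECT col FROM enum_demo ORDER BY CAST(col AS CHAR);
```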
As per the documentation, the Handling of Enumeration Literals section states that:
If you store a number into an ENUM column, the number is treated as
the index into the possible values, and the value stored is the
enumeration member with that index. (However, this does not work with
LOAD DATA, which treats all input as strings.) If the numeric value is
quoted, it is still interpreted as an index if there is no matching
string in the list of enumeration values. For these reasons, it is not
advisable to define an ENUM column with enumeration values that look
like numbers, because this can easily become confusing.
For example, the following column has enumeration members with string values of '0', '1', and '2', but numeric index values of 1, 2, and 3:
numbers ENUM('0','1','2')
If you store 2, it is interpreted as an
index value, and becomes '1' (the value with index 2). If you store
'2', it matches an enumeration value, so it is stored as '2'. If you
store '3', it does not match any enumeration value, so it is treated
as an index and becomes '2' (the value with index 3).
mysql> INSERT INTO t (numbers) VALUES(2),('2'),('3');
mysql> SELECT * FROM t;
+---------+
| numbers |
+---------+
| 1 |
| 2 |
| 2 |
+---------+
In your case:
INSERT INTO test
VALUES ('2'), ('3'), ('1');
Index value of '2' is 2, '3' is 3 and '1' is 1.
So the output is 2,3,1
I created a table like this in MySQL:
DROP TABLE IF EXISTS `barcode`;
CREATE TABLE `barcode` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(40) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO `barcode` VALUES ('1', 'abc');
INSERT INTO `barcode` VALUES ('2', 'abc ');
Then I query data from table barcode:
SELECT * FROM barcode WHERE `code` = 'abc ';
The result is:
+----+------+
| id | code |
+----+------+
|  1 | abc  |
|  2 | abc  |
+----+------+
But I want the result set to contain only one record. I worked around it with:
SELECT * FROM barcode WHERE `code` = binary 'abc ';
The result is 1 record. But I'm using NHibernate with MySQL, which generates the queries from the table mappings. How can I resolve this case?
There is no other fix for it. Either you mark a single comparison as binary, or you set the whole database connection to binary (with SET NAMES binary, which may have other side effects!).
Basically, that 'lazy' comparison is a hard-coded feature of MySQL. To disable it on demand, you can use a binary comparison, which you apparently already do. This is not a 'workaround' but the real fix.
From the MySQL manual:
All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces
Of course, there are plenty of other possibilities to achieve the same result from a user's perspective, e.g.:
WHERE field = 'abc ' AND CHAR_LENGTH(field) = CHAR_LENGTH('abc ')
WHERE field REGEXP 'abc[[:space:]]'
The problem with these is that they effectively disable fast index lookups, so your query always results in a full table scan. With huge datasets that makes a big difference.
Again: PADSPACE is the default for MySQL's [VAR]CHAR comparisons. You can (and should) disable it by using BINARY. This is the intended way of doing this.
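The two behaviors can be compared side by side on plain literals. A minimal sketch, assuming the connection uses a PAD SPACE collation as the quoted manual passage describes:

```sql
SELECT 'abc' = 'abc '        AS padded_equal,  -- trailing space ignored under PAD SPACE
       BINARY 'abc' = 'abc ' AS binary_equal;  -- byte-wise comparison, the space counts
```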
You can try matching with a regular expression:
SELECT * FROM barcode WHERE `code` REGEXP 'abc[[:space:]]'
I was just working on a case like that, where using LIKE with a wildcard (%) gave an unexpected result. While searching I also found STRCMP(text1, text2), one of MySQL's string comparison functions, which compares two strings. However, using BINARY with LIKE solved the problem for me.
SELECT * FROM barcode WHERE `code` LIKE BINARY 'abc ';
You could do this:
SELECT * FROM barcode WHERE `code` = 'abc '
AND CHAR_LENGTH(`code`)=CHAR_LENGTH('abc ');
Assuming you only want one result, you could use LIMIT:
SELECT * FROM barcode WHERE `code` = 'abc ' LIMIT 1;
To do exact string matching, you could specify a collation:
SELECT *
FROM barcode
WHERE code COLLATE utf8_bin = 'abc';
The sentence right after the one quoted by Kaii basically says "use LIKE":
“Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant
and the example below it shows that 'Monty' = 'Monty ' is true, but 'Monty' LIKE 'Monty ' is not.
However, if you use LIKE, beware of literal strings containing the '%', '_' or '\' characters: '%' and '_' are wildcard characters, and '\' is the escape character.
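A short sketch of that caveat, using the barcode table from the question; the barcode value '50%_off ' is made up for illustration:

```sql
-- Unescaped, % and _ act as wildcards, so this also matches e.g. '50 percent off '.
SELECT * FROM barcode WHERE `code` LIKE '50%_off ';

-- Escaped with the default escape character, they are matched literally.
SELECT * FROM barcode WHERE `code` LIKE '50\%\_off ';

-- The same with an explicitly chosen escape character.
SELECT * FROM barcode WHERE `code` LIKE '50|%|_off ' ESCAPE '|';
```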
For example, a table column could contain the values New, new, or NEW. How can I write a query that returns only rows matching
SELECT * FROM myTable
WHERE myColumn = 'New'
but doesn't return rows containing new or NEW?
For MySQL, a simple option is:
SELECT * FROM myTable
WHERE myColumn = 'New'
AND BINARY(myColumn) = BINARY('New');
The second condition is logically sufficient, but it makes the query slow if the table is big (the index on myColumn cannot be used). Combining the two conditions allows the index to be used for the first one, with the second then filtering out rows whose case does not match.
You can use COLLATE in your WHERE clause:
SELECT *
FROM myTable
WHERE myColumn COLLATE latin1_general_cs = 'New'
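Note that a collation has to belong to the column's character set, so latin1_general_cs only applies to latin1 columns. Assuming myColumn is stored as utf8 instead, a sketch of the equivalent query would use a utf8 collation such as utf8_bin:

```sql
SELECT *
FROM myTable
WHERE myColumn COLLATE utf8_bin = 'New';  -- binary collation: case- and accent-sensitive
```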
The best way to make this column case sensitive is to change the column's collation: if its charset is utf8, change the collation to utf8_bin. After the column is modified, searches on it are case sensitive.
For example, I have a table named people with a column named "name":
alter table people
modify column name varchar(50) charset utf8 collate utf8_bin;
Note: You can also change the column's data type from VARCHAR to VARBINARY; that works fine too.
SELECT * FROM users WHERE BINARY userid = 'Rahul';
MySQL does not check letter case by default, so for a username or user ID we should also check the case for better security.
I hope this helps.