Is there a way to match values in a query if the stored data has special characters and the search query doesn't?
For example, I want to match a column with the following value:
Doña Ana
but I can only search using
Dona Ana
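You can apply a collation to the column of interest in the comparison, for example Latin general, which this answer relies on to ignore accents. Note that the collation must belong to the column's character set (latin1 here):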
SELECT *
FROM yourTable
WHERE name COLLATE latin1_general_ci = 'Dona Ana';
Related
I am trying to create a function that accepts all characters, including international accent marks. Any string containing a comma or an exclamation mark should be rejected.
So far I have created a table with a column that stores the values.
I need to move forward only those values that contain no comma or exclamation mark.
The regex I am using is below:
IF column_value not REGEXP concat('[',x'21','-',x'2C',x'2E','-',x'40',x'5B','-',x'60',x'7B','-',x'7E',x'A1','-',x'BF',']') then
SET is_valid = 1;
This is not the right regexp to reject rows that contain the characters I don't need. Everything else should be stored in the utf8_unicode_ci column I have created.
For now, all values up to hex code DF are marked valid.
The remaining values are marked invalid. For example, è is marked invalid.
Can you please help?
UPDATE table_name SET is_valid = 1 WHERE column_value NOT REGEXP '[,!]+';
or
UPDATE table_name SET is_valid = 1 WHERE column_value REGEXP '^[^,!]*$';
A short clarification about your last question, about è:
What I see is a Latin small letter e with a combining grave accent. In Unicode that is \u0065\u0300, and in UTF-8 it is three bytes: \x65\xCC\x80.
REGEXP checks each byte separately (this is the case in MySQL versions before 8.0). Let's look at your filter:
[
\x21-\x2C //PASS
\x2E-\x40 //PASS
\x5B-\x60 //PASS
\x7B-\x7E //PASS
\xA1-\xBF //PASS
]
But if this is a Latin small letter e with grave, \u00e8, then the UTF-8 encoding is \xC3\xA8:
[
\x21-\x2C //PASS
\x2E-\x40 //PASS
\x5B-\x60 //PASS
\x7B-\x7E //PASS
\xA1-\xBF // \xA8 IS FILTERED THERE
]
The two forms of è (decomposed and precomposed) look the same on screen, and that is what causes the confusion. It is a good reason to simplify the regex and keep Unicode out of it as far as possible.
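To make the byte-level difference concrete, here is a minimal Python sketch (Python is only used for illustration; it is not part of the original answer). It encodes both forms of the letter and checks each byte against the ranges of the filter from the question. The helper name hits_filter is made up for this example.

```python
import unicodedata

decomposed = "e\u0300"   # 'e' followed by U+0300 COMBINING GRAVE ACCENT
precomposed = "\u00e8"   # U+00E8 LATIN SMALL LETTER E WITH GRAVE

decomposed_bytes = decomposed.encode("utf-8")
precomposed_bytes = precomposed.encode("utf-8")
print(decomposed_bytes.hex(" "))   # 65 cc 80
print(precomposed_bytes.hex(" "))  # c3 a8

# Byte ranges taken from the character class in the question.
RANGES = [(0x21, 0x2C), (0x2E, 0x40), (0x5B, 0x60), (0x7B, 0x7E), (0xA1, 0xBF)]

def hits_filter(data):
    """Return True if any single byte falls inside one of the ranges
    (hypothetical helper modelling a byte-wise character class)."""
    return any(lo <= b <= hi for b in data for lo, hi in RANGES)

print(hits_filter(decomposed_bytes))   # False: none of 65, cc, 80 is in a range
print(hits_filter(precomposed_bytes))  # True: a8 falls inside a1-bf

# Normalizing to NFC turns the decomposed form into the precomposed one.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
print(same_after_nfc)                  # True
```

If the data can arrive in either form, normalizing it to one form before storing it would at least make the two spellings behave the same way.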
Thanks for your guidance.
I have created a table of special characters and their hex codes. I convert each string with hex('value of string'), then use instr(hex('value of string'), #hex_value_from_table).
#hex_value_from_table takes the value of each special character from the table, and instr checks whether it occurs in the string. This way, if the set of rejected characters changes, I don't change the regexp; instead I add or delete entries in the "value" table of special characters.
Query:
create table special_char_hex_codes
(
char_name varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Character to be rejected',
hex_value varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Character hexadecimal value'
);
set sql_safe_updates = 0;
delete from special_char_hex_codes;
insert into special_char_hex_codes select '€',hex('€') ;
select instr(hex('Wilâmer'),trim(hex_value)) as str_exists
from
(
select char_name,hex_value from special_char_hex_codes
) as test;
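One thing to watch with searching inside a hex string: a match can start in the middle of a byte. The Python sketch below (an illustration only, not the poster's code) shows a string with no comma whose hex form still contains '2C', and a byte-level check that avoids this. The function name contains_rejected and the sample strings are made up for the example.

```python
def contains_rejected(value, rejected_hex):
    """Return True if the UTF-8 bytes of value contain any rejected
    character, given as hex strings such as '2C' or 'E282AC'.
    Hypothetical helper; searches whole bytes, not hex digits."""
    data = value.encode("utf-8")
    return any(bytes.fromhex(h) in data for h in rejected_hex)

sample = "B\u00e9"                              # 'B' followed by e-acute, no comma
sample_hex = sample.encode("utf-8").hex().upper()
print(sample_hex)                               # 42C3A9

# Searching the hex text finds '2C' straddling the bytes 42 and C3.
print("2C" in sample_hex)                       # True (false positive)

# Searching the bytes themselves does not.
print(contains_rejected(sample, ["2C"]))        # False
print(contains_rejected("B,\u00e9", ["2C"]))    # True
print(contains_rejected("5 \u20ac", ["E282AC"]))  # True, the euro sign
```

In SQL, a comparable safeguard would be to accept only matches that start at an odd position of the hex string, since every byte occupies two hex digits.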
Original question:
Table structure:
CREATE TABLE `texts` (
`letter` VARCHAR(1) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
`text` VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
INDEX (`letter` ASC),
INDEX (`text` ASC)
)
ENGINE InnoDB
CHARACTER SET utf8
COLLATE utf8_general_ci;
Sample data:
INSERT INTO `texts`
(`letter`, `text`)
VALUES
('a', 'Apple'),
('ā', 'Ābols'),
('b', 'Bull'),
('c', 'Cell'),
('č', 'Čakste');
The query which I'm executing:
SELECT DISTINCT `letter` FROM `texts`;
Expected results:
`letter`
a
ā
b
c
č
Actual results:
`letter`
a
b
c
I've tried many utf8 collations (utf8_[bin|general_ci|unicode_ci],
utf8mb4_[bin|general_ci|unicode_ci] etc), none of them work. How to
fix this?
Edit for clarification: what I want is not just to get all the letters
out, but also get them in the order I specified in the expected
results. utf8_bin gets all the letters, but they are ordered in the
wrong way - extended latin characters follow only after all the basic
latin characters (example: a, b, c, ā, č). Also, the actual table I'm
using has many texts per letter, so grouping is a must.
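The ordering seen with utf8_bin follows from comparing encoded bytes, which for UTF-8 is the same as comparing code points. The Python sketch below (illustration only) reproduces that order, and then sorts by the decomposed form of each letter, which happens to give the order from the expected results. The second sort is only a model of the desired order, not how any MySQL collation is implemented.

```python
import unicodedata

letters = ["\u010d", "b", "\u0101", "c", "a"]   # c-caron, b, a-macron, c, a

# Plain sorting compares code points, like a binary collation does.
binary_order = sorted(letters)
print(binary_order)    # ['a', 'b', 'c', 'ā', 'č']

# Sorting by the NFD (decomposed) form puts each accented letter
# right after its base letter: 'a' < 'a' + macron < 'b' < ...
accent_after_base = sorted(letters, key=lambda s: unicodedata.normalize("NFD", s))
print(accent_after_base)   # ['a', 'ā', 'b', 'c', 'č']
```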
Edit #2: here's the full table data from the live site - http://pastebin.com/cH2DUzf3
Executing that SQL and running the following query after that:
SELECT DISTINCT BINARY `letter` FROM `texts` ORDER BY `letter` ASC
yields almost perfect results, with one exception: the letter 'ū' is before 'u', which is weird to say the least, because all other extended latin letters show up after their basic latin versions. How do I solve this one last problem?
Check Manual for BINARY type
SELECT DISTINCT BINARY `letter` FROM `texts`
Check SQL Fiddle
I have a table with 10 columns, one of which is username. The username column stores the student's name, which may contain uppercase and lowercase letters.
I want to separate the students by case: if the username contains any uppercase letter, the row should be listed.
I only want to query the username column. Other columns contain uppercase letters too, but the listing should be based on username only. I have tried several queries and none of them works. Please advise.
I want to list the rows with any uppercase letter in the username column.
I have tried these queries:
SELECT * FROM accounts WHERE LOWER(username) LIKE '%q'
did not work
SELECT * FROM accounts WHERE UPPER(username) = UPPER('%q')
did not work
SELECT * FROM accounts where username COLLATE latin1_swedish_ci = '%q'
did not work
SELECT * FROM accounts WHERE username REGEXP '[A-Z]';
did not work
SELECT * FROM accounts WHERE username REGEXP '^[[:upper:]+]$'
did not work
SELECT *
FROM accounts
WHERE CAST(username AS BINARY) RLIKE '[A-Z]';
CREATE TABLE accounts (
id int,
username varchar(50)
) CHARACTER SET latin1 COLLATE latin1_general_cs;
SELECT * FROM accounts WHERE username REGEXP '^[A-Z]+$';
Make sure you use a case-sensitive collation, COLLATE latin1_general_cs (a _ci collation is case-insensitive, so the pattern would also match lowercase letters).
You were on the right track with the collation, but the table itself needs to use that collation, not just the query. You could create a new table with the collation, insert your current rows into it, and then try REGEXP or the other methods.
Select ALL fields that contains only UPPERCASE letters
The following query will work fine
select * from TABLE where CAST( COL_NAME AS BINARY) = upper(COL_NAME);
First, make sure the field you are searching on has a case-sensitive collation such as latin1_general_cs (if you are using a Latin character set). Then you can search for uppercase or lowercase, whichever you are looking for (e.g. WHERE username LIKE '%q%' or WHERE username LIKE '%Q%').
By default, MySQL string comparisons are case-insensitive, so this is more complicated than a single SELECT statement. If you want to do this comparison often, convert the username column to one of the binary types listed here:
http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html
If you don't want to do this often, consider saving off the results of the current table to a temp table, altering that table with a case sensitive string type, and then using your regex.
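Note that the answers in this thread test two different conditions: "contains at least one uppercase letter" (what the question asks for) and "consists only of uppercase letters". A small Python sketch of the difference, assuming plain ASCII letters; the function names are made up for this example:

```python
import re

def has_uppercase(name):
    """True if the name contains at least one ASCII uppercase letter."""
    return re.search(r"[A-Z]", name) is not None

def is_all_uppercase(name):
    """True if the name consists only of ASCII uppercase letters."""
    return re.fullmatch(r"[A-Z]+", name) is not None

for name in ["john", "johnSmith", "JOHN"]:
    print(name, has_uppercase(name), is_all_uppercase(name))
```

In a case-sensitive MySQL comparison, the first condition corresponds to the unanchored pattern '[A-Z]' and the second to the anchored pattern '^[A-Z]+$'.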
I created table like that in MySQL:
DROP TABLE IF EXISTS `barcode`;
CREATE TABLE `barcode` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(40) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO `barcode` VALUES ('1', 'abc');
INSERT INTO `barcode` VALUES ('2', 'abc ');
Then I query data from table barcode:
SELECT * FROM barcode WHERE `code` = 'abc ';
The result is:
+-----+-------+
| id | code |
+-----+-------+
| 1 | abc |
+-----+-------+
| 2 | abc |
+-----+-------+
But I want the result set to contain only 1 record. I worked around it with:
SELECT * FROM barcode WHERE `code` = binary 'abc ';
The result is 1 record. However, I'm using NHibernate with MySQL, which generates the queries from the table mapping. How can I resolve this in that case?
There is no other fix for it. Either you specify a single comparison as being binary or you set the whole database connection to binary. (doing SET NAMES binary, which may have other side effects!)
Basically, that 'lazy' comparison is a feature of MySQL which is hard coded. To disable it (on demand!), you can use a binary compare, what you apparently already do. This is not a 'workaround' but the real fix.
from the MySQL Manual:
All MySQL collations are of type PADSPACE. This means that all CHAR and VARCHAR values in MySQL are compared without regard to any trailing spaces
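As a rough model of what the quoted rule means for equality tests (a Python sketch for illustration, not MySQL's actual implementation; both function names are made up):

```python
def pad_space_equal(a, b):
    """Model of a PADSPACE equality test: trailing spaces are ignored."""
    return a.rstrip(" ") == b.rstrip(" ")

def binary_equal(a, b):
    """Model of a binary comparison: every byte counts."""
    return a.encode("utf-8") == b.encode("utf-8")

print(pad_space_equal("abc", "abc "))  # True, so both rows match
print(binary_equal("abc", "abc "))     # False, so only the exact value matches
```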
Of course, there are plenty of other possibilities to achieve the same result from a user's perspective, e.g.:
WHERE field = 'abc ' AND CHAR_LENGTH(field) = CHAR_LENGTH('abc ')
WHERE field REGEXP 'abc[[:space:]]'
The problem with these is that they effectively disable fast index lookups, so your query always results in a full table scan. With huge datasets that makes a big difference.
Again: PADSPACE is the default for MySQL's [VAR]CHAR comparison. You can (and should) disable it by using BINARY. This is the intended way of doing this.
You can try with a regular expression matching :
SELECT * FROM barcode WHERE `code` REGEXP 'abc[[:space:]]'
I was working on a similar case, where using LIKE with a wildcard (%) gave an unexpected result. While searching I also found STRCMP(text1, text2), one of MySQL's string comparison functions, which compares two strings. However, using BINARY with LIKE solved the problem for me.
SELECT * FROM barcode WHERE `code` LIKE BINARY 'abc ';
You could do this:
SELECT * FROM barcode WHERE `code` = 'abc '
AND CHAR_LENGTH(`code`)=CHAR_LENGTH('abc ');
I am assuming you only want one result, you could use LIMIT
SELECT * FROM barcode WHERE `code` = 'abc ' LIMIT 1;
To do exact string matching you could use Collation
SELECT *
FROM barcode
WHERE code COLLATE utf8_bin = 'abc';
The sentence right after the one quoted by Kaii basically says "use LIKE" :
“Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant
and the example below shows that 'Monty' = 'Monty ' is true, but not 'Monty' LIKE 'Monty '.
However, if you use LIKE, beware of literal strings containing the '%', '_' or '\' characters : '%' and '_' are wildcard characters, '\' is used to escape sequences.
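If the value comes from application code, the wildcard problem can be handled by escaping the literal before it is used as a LIKE pattern. A minimal Python sketch, assuming the default backslash escape character; the function name escape_like is made up for this example:

```python
def escape_like(literal, escape="\\"):
    """Escape %, _ and the escape character itself so that the
    result matches only the literal text when used with LIKE."""
    out = []
    for ch in literal:
        if ch in ("%", "_", escape):
            out.append(escape)
        out.append(ch)
    return "".join(out)

print(escape_like("abc "))      # unchanged: 'abc '
print(escape_like("50%_off"))   # 50\%\_off
```

The escaped value should still be passed to the query as a bound parameter rather than concatenated into the SQL text.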
I have a table with words in Spanish (INT id_word, VARCHAR(255) word). Let's suppose the table has these records:
1 casa
2 pantalon
If I search for the word pantalón (with a special char ó) it should not return any rows. How do I select exact matches only? It is currently returning the 2nd row.
SELECT * FROM words WHERE word='pantalón';
Thanks!
Solution from ifx: I changed the word field's collation to utf8_bin.
The reason this happens is down to the collation. There are collations that are accent-sensitive (which you want in this case) and others that are accent-insensitive (which is what you currently have configured). There are also case-sensitive and case-insensitive collations.
The following code (SQL Server syntax) produces the correct result:
create table test (
id int identity(1,1),
value nvarchar(100) collate SQL_Latin1_General_Cp437_CI_AS
)
insert into test values ('casa')
insert into test values ('pantalon')
select value collate SQL_Latin1_General_Cp437_CS_AS from test where value = 'pantalón'
The below code produces the incorrect result:
drop table test
go
create table test (
id int identity(1,1),
value nvarchar(100) collate SQL_Latin1_General_Cp437_CI_AI
)
insert into test values ('casa')
insert into test values ('pantalon')
select value collate SQL_Latin1_General_Cp437_CS_AS from test where value = 'pantalón'
The key here is the collation - AI means Accent-insensitive, AS means accent-sensitive.
I have this problem in our language too, so I did the following. I have two columns for names, one named SearchColumn and the other ViewColumn. When saving data, I replace the special characters with other characters and store the result in SearchColumn. When a user searches for something, I apply the same function to the search term and look it up in SearchColumn; if it matches, I display the value of ViewColumn.
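The replacement function is not shown in this answer. One possible version, sketched in Python under the assumption that "special characters" means accented Latin letters, is to decompose each character and drop the combining marks. The name fold_for_search is made up for this example.

```python
import unicodedata

def fold_for_search(text):
    """Strip accents and fold case, for storing in a search column
    and for normalizing the user's search term the same way."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

print(fold_for_search("Do\u00f1a Ana"))    # dona ana
print(fold_for_search("pantal\u00f3n"))    # pantalon
```

The same function has to be applied both when saving and when searching. Letters that have no decomposition (for example ø or ß) would need an explicit replacement table.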