MySQL - LIKE statement not working with special characters

I am having an issue with the following:
Inside my table I have the following:
ID Long Latt city
1 n/a n/a Newcastle-upon-Tyne
2 n/a n/a Newcastle Upon Tyne
3 n/a n/a Stoke-on-Trent
4 n/a n/a Stoke on Trent
If someone enters "Newcastle Upon Tyne" in the search, I want both of them to show. My SQL statement is:
select * from `properties` where `city` LIKE '%Newcastle Upon Tyne%'
But only one shows. It's a LIKE statement, and "Newcastle-Upon-Tyne" and "Newcastle Upon Tyne" are similar, so why is only the exact match showing in this instance?

The LIKE comparison is returning TRUE only for that one row.
Without wildcards, the LIKE comparison is essentially equivalent to an equality comparison:
SELECT 'ab d' = 'ab d' --> TRUE
, 'ab d' LIKE 'ab d' --> TRUE
The difference is that LIKE supports two wildcard characters in the pattern on the right side: the percent sign (%) and the underscore (_). The % character matches any sequence of zero or more characters. The _ character matches exactly one character.
Compare the results from
city LIKE 'Newcastle_Upon_Tyne'
city LIKE 'Newcastle%Upon%Tyne'
Both of those would evaluate to true for values of city
'Newcastle-Upon-Tyne'
'Newcastle7Upon4Tyne'
Additionally, the one with the percent signs would also evaluate to TRUE for values of city such as
'NewcastleUpon56789Tyne'
'Newcastle FEE- UponFI Tyne'
If you want more precise matching than the LIKE comparison provides, you could use a regular expression instead:
city REGEXP 'Newcastle[ -]Upon[ -]Tyne'
This would return TRUE for city values
'Newcastle Upon Tyne'
'Newcastle-Upon Tyne'
'Newcastle Upon-Tyne'
'Newcastle-Upon-Tyne'
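To see the wildcard behaviour on the table from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made purely so the example is self-contained; SQLite's LIKE also treats % and _ as wildcards and ignores ASCII case by default, but collation details can differ from a real MySQL server.

```python
import sqlite3

# In-memory stand-in for the `properties` table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE properties (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO properties (id, city) VALUES (?, ?)",
    [
        (1, "Newcastle-upon-Tyne"),
        (2, "Newcastle Upon Tyne"),
        (3, "Stoke-on-Trent"),
        (4, "Stoke on Trent"),
    ],
)


def cities_like(pattern):
    """Return the cities matching a LIKE pattern, ordered by id."""
    rows = conn.execute(
        "SELECT city FROM properties WHERE city LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


# Spaces are literal characters, so only the row with spaces matches.
literal_match = cities_like("%Newcastle Upon Tyne%")
# Each _ stands for exactly one character (a space or a hyphen here).
underscore_match = cities_like("Newcastle_Upon_Tyne")
# Each % stands for any run of characters, including none.
percent_match = cities_like("Newcastle%Upon%Tyne")

print(literal_match)
print(underscore_match)
print(percent_match)
```

The first query returns only the row written with spaces, while the other two return both Newcastle rows.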

Because a space is not a wildcard character. In a LIKE pattern, a space is treated like any other literal character. If you do something like this:
select * from `properties` where `city` LIKE '% %'
It will find all of the records that contain a space.
If you want any records that contain the words in that order, regardless of the characters between them, you can do this:
select * from `properties` where `city` LIKE '%Newcastle%Upon%Tyne%'
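If the search term comes from a user, the pattern has to be built in application code. The sketch below shows one possible way to do that in Python; the helper name to_like_pattern is hypothetical, and the sketch does not escape any % or _ that the user types, which a real application would probably want to handle.

```python
import re


def to_like_pattern(search_term):
    """Turn 'Newcastle Upon Tyne' into '%Newcastle%Upon%Tyne%'.

    Words are split on runs of whitespace or hyphens, then joined with %
    so that any characters may appear between them.
    """
    words = [w for w in re.split(r"[\s-]+", search_term.strip()) if w]
    return "%" + "%".join(words) + "%"


pattern = to_like_pattern("Newcastle Upon Tyne")
print(pattern)
```

The resulting string should be passed to the query as a bound parameter rather than concatenated into the SQL text.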

Related

How do I use the BETWEEN operator for text searches in a MySQL database?

I have a SQL table on which I use the BETWEEN operator.
The BETWEEN operator selects values within a range. The values can be numbers, text, or dates.
stu_id name city pin
1 Raj Ranchi 123456
2 sonu Delhi 652345
3 ANU KOLKATA 879845
4 K.K's Company Delhi 345546
5 J.K's Company Delhi 123456
I have queries like this:
SELECT * FROM student WHERE stu_id BETWEEN 2 AND 4 -- including 2 & 4
SELECT * FROM `student` WHERE name BETWEEN 'A' AND 'K' -- including A but not K
My question is: why is K not included? I want names starting with K in the results as well.
Don't use BETWEEN until you really understand it. That is just general advice. BETWEEN is inclusive, so your second query is equivalent to:
WHERE name >= 'A' AND
name <= 'K'
Because of the equality, 'K' itself is included in the result set. However, names that start with 'K' and are longer than one character, such as "Ka", are not, because they sort after 'K'.
Instead, be explicit:
WHERE name >= 'A' AND
name < 'L'
Of course, BETWEEN can be useful. However, it is useful for discrete values, such as integers. It is a bit dangerous with numbers with decimals, strings, and date/time values. That is why I encourage you to express the logic as inequalities.
To supplement Gordon's answer, one way to get what you're expecting is to turn your name into a discrete set of values:
SELECT * FROM `student` WHERE LEFT(name, 1) between 'A' and 'K'
You need to appreciate that K.K's Company is alphabetically AFTER the letter K on its own, so it is not BETWEEN 'A' and 'K', in the same way that 4.1 is not BETWEEN 2 and 4.
Stripping the value down to its first character makes it work as you expect, but take note: you should generally avoid running functions on column values. With a million names, MySQL has to reduce a million strings to their first letter, and it may no longer be able to use an index on name, which batters performance.
Instead, you could use:
SELECT * FROM `student` WHERE name >= 'A' and name < 'L'
This is more likely to permit the use of an index, as you aren't manipulating the stored values before comparing them.
It works because it asks for everything up to but not including L, which includes all of your names starting with K, even kzzzzzzzz. Numerically, it is equivalent to saying number >= 2 and number < 5, which gives you all the numbers starting with 2, 3 or 4 (like the 4.1 from before) but not 5.
Remember that BETWEEN is inclusive at both ends. Always revert to a pattern of a >= b and a < c, a >= c and a < d when you want to specify ranges that capture all possible values.
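The difference between the two range styles can be checked on the sample data. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL (an assumption for the sake of a runnable example). SQLite compares text bytewise by default, whereas MySQL's default collations ignore case, so results for lowercase names could differ on a real server; the sample rows here give the same outcome either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student (stu_id, name) VALUES (?, ?)",
    [
        (1, "Raj"),
        (2, "sonu"),
        (3, "ANU"),
        (4, "K.K's Company"),
        (5, "J.K's Company"),
    ],
)


def names_where(condition):
    """Return names satisfying the given WHERE condition, ordered by stu_id."""
    sql = "SELECT name FROM student WHERE " + condition + " ORDER BY stu_id"
    return [row[0] for row in conn.execute(sql)]


# Inclusive range: 'K' itself would match, but "K.K's Company" sorts after 'K'.
between_names = names_where("name BETWEEN 'A' AND 'K'")
# Half-open range: everything from 'A' up to, but not including, 'L'.
half_open_names = names_where("name >= 'A' AND name < 'L'")

print(between_names)
print(half_open_names)
```

Only the half-open range picks up "K.K's Company".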
Strings are compared in lexicographical order, so 'K.K's Company' > 'K'.
Another option is to compare only the first character of the name, extracted with SUBSTRING. I've updated your script here. It will include the last record as well.
SELECT * FROM student WHERE SUBSTRING(name FROM 1 FOR 1)
BETWEEN 'A' AND 'K';
I hope this helps.

MySQL select UTF-8 string with '=' but not with 'LIKE'

I have a table with some words that come from medieval books and contain accented letters that no longer exist in the modern latin1 alphabet. I can represent these letters easily with UTF-8 combining characters. For example, to create a "J" with a tilde, I use the UTF-8 sequence \u004A+\u0303 and the J becomes accented with a tilde.
The table uses utf8 encoding and the field collation is utf8_unicode_ci.
My problem is the following: If I try to select the entire string, I receive the correct answer. If I try to select using 'LIKE', I receive the wrong answer.
For example:
mysql> select word, hex(word) from oldword where word = 'hua';
+--------+--------------+
| word | hex(word) |
+--------+--------------+
| hũa | 6875CC8361 |
| huã | 6875C3A3 |
| hua | 687561 |
| hũã | 6875CC83C3A3 |
+--------+--------------+
4 rows in set (0,04 sec)
mysql> select word, hex(word) from oldword where word like 'hua';
+-------+------------+
| word | hex(word) |
+-------+------------+
| huã | 6875C3A3 |
| hua | 687561 |
+-------+------------+
2 rows in set (0,04 sec)
I don't want to search only for the entire word. I want to find words that start with some substring; sometimes the search term happens to be the entire word.
How could I select the partial string using like and match all the strings?
I tried to create a custom collation using this information, but the server became unstable and only after a lot of trials and errors I was able to revert to the utf8_unicode_ci collation again and the server returned to normal condition.
EDIT: There's a problem with this site and some characters don't display correctly. Please see the results on these pastebins:
http://pastebin.com/mckJTLFX
http://pastebin.com/WP87QvgB
After seeing Marcus Adams' answer I realized that the REPLACE function could be the solution for this problem, although he didn't mention this function.
I have only two different combining characters (acute and tilde), combined with other ASCII characters: for example j with tilde, j with acute, m with tilde, s with tilde, and so on. So I just have to remove these two characters when using LIKE.
After searching the manual, I learned about the UNHEX function that helped me to properly represent the combining characters alone in the query to remove them.
The combining tilde is represented by CC83 in HEX code and the acute is represented by CC81 in HEX.
So, the query that solves my problem is this one.
SELECT word, REPLACE(REPLACE(word, UNHEX("CC83"), ""), UNHEX("CC81"), "")
FROM oldword WHERE REPLACE(REPLACE(word, UNHEX("CC83"), ""), UNHEX("CC81"), "")
LIKE 'hua%';
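The hex values in this query can be sanity-checked outside the database. The following Python sketch decodes the bytes quoted above and strips the two combining marks, which mirrors what the nested REPLACE calls do; the helper name strip_combining is hypothetical. Note that Python's plain string comparison does not apply a collation, so unlike MySQL it would not treat a precomposed ã as equal to a.

```python
# UTF-8 bytes CC83 and CC81 decode to the two combining marks.
TILDE = bytes.fromhex("CC83").decode("utf-8")  # U+0303 COMBINING TILDE
ACUTE = bytes.fromhex("CC81").decode("utf-8")  # U+0301 COMBINING ACUTE ACCENT

# First row of the question's output: h, u, combining tilde, a.
word = bytes.fromhex("6875CC8361").decode("utf-8")


def strip_combining(text):
    """Remove the two combining marks, like the nested REPLACE calls."""
    return text.replace(TILDE, "").replace(ACUTE, "")


code_points = len(word)          # the word displays as 3 letters
stripped = strip_combining(word)

print(code_points)
print(stripped.startswith("hua"))
```

The word is stored as four code points, which is why a character-by-character prefix comparison fails until the combining mark is removed.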
The problem is that LIKE performs the comparison character-by-character, and a letter followed by the "combining tilde" is literally two characters, even though it displays as one (assuming your client supports displaying it as such).
There will never be a case where comparing e.g. hu~a to hua character-by-character will match because it's comparing ~ with a for the third character.
Collations (and coercions) work in your favor and handle such things when comparing the string as a whole, but not when comparing character-by-character.
Even if you considered using SUBSTRING() as a hack instead of using LIKE with a wildcard % to perform a prefix search, consider the following:
SELECT SUBSTRING('hũa', 1, 3) = 'hua'
-> 0
SELECT SUBSTRING('hũa', 1, 4) = 'hua'
-> 1
You kind of have to know the length you're going for or brute force it like this:
SELECT * FROM oldword
WHERE SUBSTRING(word, 1, 3) = 'hua'
OR SUBSTRING(word, 1, 4) = 'hua'
OR SUBSTRING(word, 1, 5) = 'hua'
OR SUBSTRING(word, 1, 6) = 'hua'
According to this:
ũ collates equal to plain U in all utf8 collations on 5.6.
j́ collates equal to plain J in most collations; exceptions:
utf8_general_ci, because it is actually j plus an accent. And the "general" collations only look at one character (as distinguished from byte) at a time. Most collations take into consideration multiple characters, such as ch or ll in Spanish or ss in German.
utf8_roman_ci, which is quite an oddball. j́=i=j
(LIKE does not exactly follow the regular rules of collation. I am not versed in the details, but I think the fact that j́ is represented as 2 characters causes it to work differently in LIKE than in WHERE or ORDER BY. Furthermore, I don't know whether REPLACE() collates like LIKE or like the other places.)
You can use the % symbol like a wildcard character. For example this:
SELECT word
FROM myTable
WHERE word LIKE 'hua%';
This will pull all records that start with hua and have 0+ characters following it. Here is an SQL Fiddle example.

Search string mysql db that can have spaces, be in another string, etc

I have this database which contains product codes like
EXA 075 11112
0423654
3 574 662 123
JOLA 22354 5
LUCS 2245 785
I use a query with %LIKE% to list the products matching a string entered by the user. For example, "22" would list
JOLA 22354 5
LUCS 2245 785
The problem is that the user does not necessarily know the format of the code, so they type in 07511112 and get no results, because "EXA 075 11112" is not matched by %LIKE%.
Is there a way to construct the query so that it strips all spaces from the product field before the search occurs, and then searches with %LIKE% using the search string, also stripped of spaces? I guess it should then match all entries. Or is there another way?
I cannot run replace ' ', '' on the column itself; the codes must remain as they are now.
You could use the REPLACE function:
select *
from mytable
where REPLACE( `productcode` , ' ' , '' ) like '%searchparam%'
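Here is a minimal sketch of the same idea, again using Python's sqlite3 module as a stand-in for MySQL (an assumption; SQLite also provides a REPLACE function). The table name products and the helper search_codes are hypothetical. The search term has its spaces removed in application code and is passed as a bound parameter, while the stored codes are left untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, productcode TEXT)")
conn.executemany(
    "INSERT INTO products (id, productcode) VALUES (?, ?)",
    [
        (1, "EXA 075 11112"),
        (2, "0423654"),
        (3, "3 574 662 123"),
        (4, "JOLA 22354 5"),
        (5, "LUCS 2245 785"),
    ],
)


def search_codes(term):
    """Match the term against the codes with spaces removed on both sides."""
    compact_term = term.replace(" ", "")
    rows = conn.execute(
        "SELECT productcode FROM products "
        "WHERE REPLACE(productcode, ' ', '') LIKE '%' || ? || '%' "
        "ORDER BY id",
        (compact_term,),
    )
    return [row[0] for row in rows]


print(search_codes("07511112"))
print(search_codes("22"))
```

Because the function is applied to every row, an index on productcode is unlikely to help with this query.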

Mysql get values from column having more than certain characters without punctuation

How can I get all the values from mysql table field, having more than 10 characters without any special characters (space, line breaks, colons, etc.)
Let's say I have table name myTable and the field I want to get values from is myColumn.
myColumn
--------
1234
------
123 456
------
123:456
-------
1234
5678
--------
123-456
----------------
1234567890123
So here I would like to get all the field values except the first one, i.e. 1234.
Any help is much appreciated.
Thanks
UPDATE:
Sorry if I was unable to give proper description of my problem. I have tried it again:
If there is count of more than 10 characters without punctuation, then retrieve that as well.
Retrieve all the values which have special characters like line break, spaces, etc.
Yes, I have primary key in this table if this helps.
The logic seems to be "more than 10 characters OR has special punctuation":
where length(mycol) > 10 or
mycol regexp '[^a-zA-Z0-9]'
SELECT MyColumn
FROM MyTable
WHERE MyColumn RLIKE '([a-z0-9].*){10}'
[a-z0-9] matches a normal character.
([a-z0-9].*) matches a normal character followed by anything.
{10} matches the preceding regexp 10 times.
The result is that this matches 10 normal characters with anything between them.
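The "more than 10 characters OR has special punctuation" condition from the first answer can be tried against the sample values with a short Python sketch. The function name is_wanted is hypothetical, and Python's re module stands in for MySQL's regexp here, which is an assumption; for this simple character class the two should behave alike.

```python
import re


def is_wanted(value):
    """True if the value is longer than 10 characters or contains
    anything other than ASCII letters and digits."""
    return len(value) > 10 or re.search(r"[^a-zA-Z0-9]", value) is not None


values = [
    "1234",
    "123 456",
    "123:456",
    "1234\n5678",
    "123-456",
    "1234567890123",
]

selected = [v for v in values if is_wanted(v)]
print(selected)
```

Every sample value except the plain 1234 is selected.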

mysql replace string + next one char

Is it possible to REPLACE a string + next character in MySQL? Something like LIKE underscore.
For example, if text column is this:
12 13 14 14_B 15 14_A, REPLACE all 14_* with an empty character, and replaced text should be:
12 13 14 15
You'll be looking to do this using a regular expression UDF in MySQL. Key ingredients are
regular expression UDF - check here
The regular expression itself
If you will only ever see 2 to 4 occurrences that need replacing, here is a poor man's working approach (SQL Fiddle):
SELECT *,IF(LOCATE('14_',B)+3<=Length(B),
INSERT(B,LOCATE('14_',B),4,''),B) C
FROM
(
SELECT *,IF(LOCATE('14_',A)+3<=Length(A),
INSERT(A,LOCATE('14_',A),4,''),A) B
FROM (
SELECT *,IF(LOCATE('14_',x)+3<=Length(X),
INSERT(X,LOCATE('14_',x),4,''),X) A
FROM X
) Q1
) Q2
I've only catered for 3 replacements but you can easily expand the pattern. Include only the columns from the base table needed in the outermost query.
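For comparison, this is what the regular-expression approach looks like when sketched in Python with the re module. The helper name drop_token_with_suffix is hypothetical, and doing the replacement in application code rather than in SQL is an assumption of this sketch. MySQL 8.0 and later also ship a built-in REGEXP_REPLACE() function that can take a similar pattern.

```python
import re


def drop_token_with_suffix(text, prefix="14_"):
    """Remove every occurrence of the prefix plus the one character after it,
    along with any whitespace directly before the occurrence."""
    pattern = r"\s*" + re.escape(prefix) + r"."
    return re.sub(pattern, "", text).strip()


cleaned = drop_token_with_suffix("12 13 14 14_B 15 14_A")
print(cleaned)
```

Unlike the nested-subquery version, this handles any number of occurrences in one pass.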