Confusion about MySQL LIKE search and = search

I ran into this while searching for something in MySQL. Here are the details.
Say I have a table named test with a column named content. In one specific record, the content column holds:
["
/^\w{2,}/","
/^[a-z][a-z0-9]+$/","
/^[a-z0-9]+$/","
/^[a-z]\d+$/"]
There is a linefeed character at the end of each line except the last.
So when I used LIKE to search for this record, I wrote this SQL:
select * from test where `content` like
'[\"\n/^\\\\w{2,}/\",\"\n/^[a-z][a-z0-9]+$/\",\"\n/^[a-z0-9]+$/\",\"\n/^[a-z]\\\\d+$/\"]'
It returned the right result. But when I changed LIKE to =, the statement didn't work. After several attempts, I arrived at this statement, which did work:
select * from test where `content` =
'[\"\n/^\\w{2,}/\",\"\n/^[a-z][a-z0-9]+$/\",\"\n/^[a-z0-9]+$/\",\"\n/^[a-z]\\d+$/\"]'
It worked. So here is my question:
Why on earth do LIKE and = use different escaping strategies? In the LIKE statement I have to write \\\\w and \\\\d, while in the = statement \\w and \\d work just fine.

MySQL LIKE operator: selecting data based on patterns
The LIKE operator is commonly used to select data based on patterns. Using it correctly matters for query performance; for example, a pattern that begins with a wildcard, such as '%abc', cannot use an index on the column to narrow the search.
The LIKE operator selects rows from a table based on a specified pattern, which is why it usually appears in the WHERE clause of a SELECT statement.
MySQL provides two wildcard characters for use with the LIKE operator: the percent sign % and the underscore _.
The percentage (%) wildcard allows you to match any string of zero or more characters.
The underscore (_) wildcard allows you to match any single character.
Comparison operations result in a value of 1 (TRUE), 0 (FALSE), or NULL. These operations work for both numbers and strings. Strings are automatically converted to numbers and numbers to strings as necessary.
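The automatic string-to-number conversion is one place where = and LIKE part ways. Here is a rough Python model, not MySQL's implementation: it assumes MySQL reads the longest leading numeric prefix of the string and treats a string with no such prefix as 0, and it ignores the truncation warning MySQL issues. The helper names are hypothetical.

```python
import re

# Longest leading numeric prefix: optional spaces, sign, digits, fraction, exponent.
NUMERIC_PREFIX = re.compile(r"\s*([+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)")


def string_to_number(s: str) -> float:
    """Rough model of MySQL's implicit string-to-number conversion."""
    m = NUMERIC_PREFIX.match(s)
    # No numeric prefix at all: the string is treated as 0.
    return float(m.group(1)) if m else 0.0


def mysql_eq_number(s: str, n: float) -> bool:
    """Model of `s = n` with s a string and n a number: compared as numbers."""
    return string_to_number(s) == n


print(mysql_eq_number("10abc", 10))  # True
print(mysql_eq_number("abc", 0))     # True
```

Under this model '10abc' = 10 comes out true, whereas '10abc' LIKE '10' is false, because LIKE compares the whole string character by character.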
The following relational comparison operators can be used to compare not only scalar operands, but row operands:
= > < >= <= <> !=
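Note: = is the equality operator, while LIKE performs simple pattern matching. That is also why the escaping differs: MySQL's string-literal parser strips one level of backslashes in both queries, but LIKE then treats backslash as its escape character and strips a second level. So a literal backslash in the data must be written as \\\\ in a LIKE pattern, but only as \\ in an = comparison.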
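To make the two passes concrete, here is a minimal Python model, assuming the default SQL mode (NO_BACKSLASH_ESCAPES disabled) and LIKE's default escape character. parse_literal and like_match are hypothetical names; the parser knows only MySQL's documented literal escape sequences, the matcher is case-sensitive, and neither is MySQL's actual code. The three constants are the stored value and the text between the quotes in the question's = and LIKE queries.

```python
# Simplified model of the two backslash passes (not MySQL's real code).

# Escapes recognized in a MySQL string literal; any other \x becomes x.
LITERAL_ESCAPES = {
    "0": "\0", "'": "'", '"': '"', "b": "\b", "n": "\n",
    "r": "\r", "t": "\t", "Z": "\x1a", "\\": "\\",
}


def parse_literal(body: str) -> str:
    """Pass 1: what the SQL parser makes of the text between the quotes."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in "%_":
                out.append("\\" + nxt)  # kept so that LIKE still sees the escape
            else:
                out.append(LITERAL_ESCAPES.get(nxt, nxt))  # e.g. \w -> w
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def like_match(value: str, pattern: str) -> bool:
    """Pass 2: LIKE matching with backslash as the escape character (case-sensitive)."""
    if not pattern:
        return not value
    head = pattern[0]
    if head == "%":
        # % may absorb any number of characters, including none.
        return any(like_match(value[k:], pattern[1:]) for k in range(len(value) + 1))
    if head == "\\" and len(pattern) > 1:
        # Escaped character: it must match literally.
        return value[:1] == pattern[1] and like_match(value[1:], pattern[2:])
    if head == "_":
        return bool(value) and like_match(value[1:], pattern[1:])
    return value[:1] == head and like_match(value[1:], pattern[1:])


# The stored column value (real newlines, single backslashes before w and d).
STORED = '["\n/^\\w{2,}/","\n/^[a-z][a-z0-9]+$/","\n/^[a-z0-9]+$/","\n/^[a-z]\\d+$/"]'
# What was typed between the quotes in the = and LIKE queries.
EQ_BODY = r'[\"\n/^\\w{2,}/\",\"\n/^[a-z][a-z0-9]+$/\",\"\n/^[a-z0-9]+$/\",\"\n/^[a-z]\\d+$/\"]'
LIKE_BODY = r'[\"\n/^\\\\w{2,}/\",\"\n/^[a-z][a-z0-9]+$/\",\"\n/^[a-z0-9]+$/\",\"\n/^[a-z]\\\\d+$/\"]'

print(parse_literal(EQ_BODY) == STORED)              # True: = compares after pass 1 only
print(like_match(STORED, parse_literal(LIKE_BODY)))  # True: pass 2 turns \\ into \
print(like_match(STORED, parse_literal(EQ_BODY)))    # False: LIKE reads \w as a plain w
```

The last line is the failure the question describes in reverse: with only two backslashes, LIKE receives \w and treats it as an escaped w rather than a backslash followed by w.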

Related

MySQL - search for patterns

I'm trying to figure out if someone has an elegant way to look for patterns in data stored in a varchar field where a value is not known -- meaning I can't use LIKE. For example, say a table called test looked like this:
id, str
and the data looked like this:
1, YUUUY
2, DDDMM
3, MMMMT
4, XMXMX
and I want to do a select that will return anything where the value of str has a pattern that matches the pattern ABABA. ABABA here shows a pattern and not literal letters. So the only one that matches this pattern would be id = 4. Is there a regular expression that I can use to pattern match like this? To make sure I'm clear regarding the patterns:
The pattern for id=1 is ABBBA.
The pattern for id=2 is AAABB.
The pattern for id=3 is AAAAB.
When running the query, all I will know is the pattern to search for.
Alternatively, if it makes it easier, I can have the table set up like:
id,c1,c2,c3,c4,c5
and the data would look like this:
1,Y,U,U,U,Y
2,D,D,D,M,M
3,M,M,M,M,T
4,X,M,X,M,X
Not sure if that makes it easier, but I think regexp is out the window if the data is set up like that.
No regular expression support in MySQL to do that kind of pattern matching, no.
SQL wasn't specifically designed for pattern matching of strings (or patterns of values in separate columns.)
But... we could come up with something workable, even if it's not a regular expression and it's not elegant.
Assuming we don't have a custom-built user-defined function, and we want to use native MySQL functions and expressions...
And assuming that the patterns we are looking for are guaranteed to consist of only two distinct characters...
And assuming that we're looking at exactly five character positions...
And assuming that the pattern string we're matching to will always begin with the letter 'A', and the "other" letter in the pattern will always be 'B'...
It wouldn't be overly ugly to do something like this:
SELECT t.id
, t.str
FROM mytable t
WHERE CONCAT('A'
,IF(MID(t.str,2,1)=MID(t.str,1,1),'A','B')
,IF(MID(t.str,3,1)=MID(t.str,1,1),'A','B')
,IF(MID(t.str,4,1)=MID(t.str,1,1),'A','B')
,IF(MID(t.str,5,1)=MID(t.str,1,1),'A','B')
) = 'ABBBA'
The first character in the string is automatically converted to an 'A'.
If the second character matches the first character, it's also an 'A'; otherwise it's a 'B'.
We do the same thing for the third, fourth and fifth characters.
Concatenate the 'A' and 'B' characters into a single string, and we can now perform an equality comparison to a pattern string, consisting of 'A' and 'B', starting with an 'A'.
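Here is a small Python model of the CONCAT/IF expression above. The function name is hypothetical, and it compares case-sensitively, unlike MySQL's default collation. It reproduces the encodings of the four sample rows and also shows how XYYZX ends up encoded as ABBBA.

```python
def answer_encoding(s: str) -> str:
    """Model of CONCAT('A', IF(MID(s,2,1)=MID(s,1,1),'A','B'), ... up to position 5)."""
    first = s[0:1]
    letters = ["A"]  # the first character is always encoded as 'A'
    for pos in range(2, 6):
        ch = s[pos - 1:pos]  # MID(s, pos, 1); '' past the end of the string
        letters.append("A" if ch == first else "B")
    return "".join(letters)


rows = {1: "YUUUY", 2: "DDDMM", 3: "MMMMT", 4: "XMXMX"}
for row_id, value in rows.items():
    print(row_id, value, answer_encoding(value))
# 1 YUUUY ABBBA
# 2 DDDMM AAABB
# 3 MMMMT AAAAB
# 4 XMXMX ABABA

print(answer_encoding("XYYZX"))  # ABBBA, although X, Y and Z are three distinct letters
```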
But that is going to fall apart if the stated assumptions aren't true: if str is shorter than five characters, or if it contains more than two distinct characters. Because every character is compared only against the first one, this would see str=XYYZX as matching pattern ABBBA: the first character is automatically an 'A', the fifth character matches the first so it's an 'A', and all of the other characters don't match the first, so they are all 'B', even though they aren't the same as each other.
And so on.
We could add some additional checks.
For example, to guarantee that str is exactly five characters in length...
AND CHAR_LENGTH(t.str)=5
Note that the default collation in MySQL is case insensitive. That means a str value of MmmmM would be converted to 'AAAAA', not 'ABBBA'. And a str value of MmmKk would match 'AAABB'.
Unfortunately, it doesn't look like MySQL supports backreferences to regex groups. I was hoping you could do something like this to match ABBBA, for example:
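A different technique from the one in this answer, sketched in Python rather than MySQL: relabel each distinct character as A, B, C, ... in order of first appearance and compare the result with the pattern. This drops the two-letter and length assumptions, because XYYZX becomes ABBCA and a four-character value can never equal a five-letter pattern. The pattern itself must be written in the same canonical form (first letter A, each new letter the next in the alphabet). The fold_case flag only approximates a case-insensitive collation with str.casefold(); that is an assumption, not MySQL's collation logic.

```python
import string


def canonical_pattern(s: str, fold_case: bool = False) -> str:
    """Relabel distinct characters as A, B, C, ... in order of first appearance."""
    if fold_case:
        s = s.casefold()  # rough stand-in for a case-insensitive collation
    labels = {}
    out = []
    for ch in s:
        if ch not in labels:
            # Assumes at most 26 distinct characters per value.
            labels[ch] = string.ascii_uppercase[len(labels)]
        out.append(labels[ch])
    return "".join(out)


print(canonical_pattern("XMXMX"))                  # ABABA
print(canonical_pattern("XYYZX"))                  # ABBCA, so it no longer equals ABBBA
print(canonical_pattern("MmmmM"))                  # ABBBA
print(canonical_pattern("MmmmM", fold_case=True))  # AAAAA
```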
([A-Z])([A-Z])\2\2\1
Example here: http://regexr.com/3d8gu
It looks like there is a MySQL plugin that might support it:
https://github.com/mysqludf/lib_mysqludf_preg
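Python's re module does support backreferences, so the regex from this answer can at least be tried there (this is Python, not MySQL). One catch: as written it also accepts AAAAA, because nothing stops group 2 from capturing the same letter as group 1. A negative lookahead fixes that.

```python
import re

# The regex from the answer: \1 and \2 repeat what groups 1 and 2 captured.
LOOSE = re.compile(r"([A-Z])([A-Z])\2\2\1")
# (?!\1) is a negative lookahead: the second letter may not equal the first.
STRICT = re.compile(r"([A-Z])(?!\1)([A-Z])\2\2\1")

for s in ["YUUUY", "XYYZX", "AAAAA"]:
    print(s, bool(LOOSE.fullmatch(s)), bool(STRICT.fullmatch(s)))
# YUUUY True True
# XYYZX False False
# AAAAA True False
```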
Here is a real hacky way to do it.
ABBBA (or YUUUY, etc):
SELECT id, str FROM test WHERE
substring(str,1,1) = substring(str,5,1) AND
substring(str,2,1) = substring(str,3,1) AND
substring(str,3,1) = substring(str,4,1) AND
substring(str,1,1) != substring(str,2,1);
AAABB (or DDDMM, etc):
SELECT id, str FROM test WHERE
substring(str,1,1) = substring(str,2,1) AND
substring(str,2,1) = substring(str,3,1) AND
substring(str,4,1) = substring(str,5,1) AND
substring(str,3,1) != substring(str,4,1);
AAAAB (or MMMMT, etc):
SELECT id, str FROM test WHERE
substring(str,1,1) = substring(str,2,1) AND
substring(str,2,1) = substring(str,3,1) AND
substring(str,3,1) = substring(str,4,1) AND
substring(str,4,1) != substring(str,5,1);
You get the picture...
It would be similar if you separated the data into different columns. Instead of comparing substrings you would just compare the columns.
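Writing these predicates by hand gets tedious, so here is a hypothetical Python helper that generates them from a pattern string: each repeated letter must equal its first occurrence, the first occurrences of different letters must differ, and the length must match. Under a case-insensitive collation, != treats 'M' and 'm' as the same character. The column name is pasted into the SQL text directly, so it must be a trusted identifier, never user input.

```python
def where_clause(pattern: str, col: str = "str") -> str:
    """Build MySQL predicates matching values shaped like `pattern`, e.g. 'ABBBA'.

    `col` is pasted into the SQL text as-is, so it must be a trusted column name.
    """
    first_pos = {}  # label -> 1-based position of its first occurrence
    conds = []
    for pos, label in enumerate(pattern, start=1):
        if label in first_pos:
            # Repeated label: same character as its first occurrence.
            conds.append(f"SUBSTRING({col},{first_pos[label]},1) = SUBSTRING({col},{pos},1)")
        else:
            first_pos[label] = pos
    firsts = list(first_pos.values())
    for i, a in enumerate(firsts):
        for b in firsts[i + 1:]:
            # Different labels: different characters.
            conds.append(f"SUBSTRING({col},{a},1) != SUBSTRING({col},{b},1)")
    conds.append(f"CHAR_LENGTH({col}) = {len(pattern)}")
    return " AND\n".join(conds)


print(f"SELECT id, str FROM test WHERE\n{where_clause('ABBBA')};")
```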

Matching escape characters using the LIKE operator in MySQL

I want to match a string containing escape characters (backslashes) against a particular column in a table.
SELECT * FROM table WHERE col LIKE 'MESSRESTAURANGER AB\\MESSVEGEN 1\\STOCKH';
Although there is matching data in the table, the query returns an empty set. The same query works fine in Oracle. What is the issue with MySQL?
You are missing the % wildcards:
SELECT * FROM table WHERE col LIKE '%MESSRESTAURANGER AB\\MESSVEGEN 1\\STOCKH%';
But it should work without escaping:
SELECT * FROM table WHERE col LIKE '%MESSRESTAURANGER AB\MESSVEGEN 1\STOCKH%';
Fiddle http://sqlfiddle.com/#!9/a7ba59/2
EDIT:
SELECT * FROM t WHERE n LIKE '%MESSRESTAURANGER AB\\\\\\\\MESSVEGEN 1\\\\\\\\STOCKH%'
Because MySQL uses C escape syntax in strings (for example, “\n” to
represent a newline character), you must double any “\” that you use
in LIKE strings. For example, to search for “\n”, specify it as “\\n”.
To search for “\”, specify it as \\\\; this is because the
backslashes are stripped once by the parser and again when the pattern
match is made, leaving a single backslash to be matched against.
Fiddle http://sqlfiddle.com/#!9/ac46b/9
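Going the other way, here is a minimal sketch of building the quoted literal for a LIKE that should find some text literally, assuming the default SQL mode and LIKE's default escape character. The helper name is hypothetical. The text is escaped twice, once for the LIKE matcher and once for the string-literal parser, which is where the four backslashes per literal backslash come from.

```python
def like_contains_literal(text: str) -> str:
    """Quoted MySQL literal for `col LIKE <literal>` that finds `text` anywhere in col.

    Assumes the default SQL mode and LIKE's default escape character (backslash).
    """
    # Step 1: escape for the LIKE matcher, which treats \, % and _ specially.
    pattern = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    # Step 2: escape for the string-literal parser, which strips one level of \.
    body = pattern.replace("\\", "\\\\").replace("'", "\\'")
    return "'%" + body + "%'"


print(like_contains_literal("AB\\MESSVEGEN"))  # '%AB\\\\MESSVEGEN%'
```

By the same arithmetic, the eight backslashes in the EDIT's query end up matching two consecutive backslashes in the data.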

A function in R to mimic "LIKE" in MySQL (update)

I need a function in R that mimics the functionality of LIKE in MySQL.
(I need to validate outcomes of SQL queries and R scripts against each other. If I had a function that exists to mimic the functionality of LIKE, great, that reduces my workload.)
I am adding some of the behaviors of LIKE from the link above. As you can see, there are ways in which LIKE differs from the standard grep regex.
LIKE (description from the link)
Pattern matching using SQL simple regular expression comparison. Returns 1 (TRUE) or 0 (FALSE).
Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator:
Trailing spaces are significant
With LIKE you can use the following two wildcard characters in the pattern.
Character Description
% Matches any number of characters, even zero characters
_ Matches exactly one character
In MySQL, LIKE is permitted on numeric expressions. (This is an extension to the standard SQL LIKE.)
mysql> SELECT 10 LIKE '1%';
-> 1
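The behaviors listed above can be approximated by translating the LIKE pattern into a regular expression. Here is a minimal sketch in Python rather than R (the function names are hypothetical, and case-insensitivity via re.IGNORECASE only approximates MySQL's default collations); the same translation idea could be wrapped around grepl() in R. The match covers the whole value, so trailing spaces are significant, and numbers are converted to strings first, as in 10 LIKE '1%'.

```python
import re


def like_to_regex(pattern: str, escape: str = "\\") -> str:
    """Translate a SQL LIKE pattern into an equivalent Python regex (use with fullmatch)."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char matches literally
            i += 2
            continue
        if ch == "%":
            parts.append(".*")  # any number of characters, even zero
        elif ch == "_":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def like(value, pattern: str, case_sensitive: bool = False) -> bool:
    """Approximate MySQL LIKE: whole-value match, trailing spaces significant."""
    flags = re.DOTALL  # let % and _ match newlines too
    if not case_sensitive:
        flags |= re.IGNORECASE  # rough stand-in for a case-insensitive collation
    # str() mirrors MySQL allowing LIKE on numbers, as in 10 LIKE '1%'.
    return re.fullmatch(like_to_regex(pattern), str(value), flags) is not None


print(like(10, "1%"))                     # True
print(like("mytext_is_here", "mytext%"))  # True
print(like("abc ", "abc"))                # False: the trailing space counts
```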
Try the sqldf package. It lets you write SQL queries against a data.frame.
For example:
require(sqldf)
data(CO2)
new.data <- sqldf("select * from CO2 where Plant like 'Qn%'")
Try ?grepl or the sqldf package:
df=data.frame(A=c("mytext_is_here","anothertext_is_here","mytext_is_also_here"),B=1:3)
df
firstSolution = subset(df, grepl("^mytext", A))
library("sqldf")
secondSolution = sqldf("select * from df where A like 'mytext%'")
Source: page 8 of http://cran.r-project.org/web/packages/sqldf/sqldf.pdf
I think you could use the grepl function in R to do the same.
grepl does (partial) regular-expression matching and returns a logical vector, which you can later use to subset data, alone or combined with other conditions.
You can also put the ! operator in front of grepl to filter out the elements that match the expression.
For example:
sample=c("data","ddata","ddata1")
filtered_data=grepl("dd",sample)
# it gives a logical vector FALSE TRUE TRUE
#and it can be used as follows to find out all the elements that have a string "dd" in it.
sample[grepl("dd",sample)]
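Please note that grepl is case sensitive by default (pass ignore.case = TRUE to change this), whereas LIKE under MySQL's default collations is case insensitive.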

Using REGEX to alter field data in a mysql query

I have two databases, both containing phone numbers. I need to find all instances of duplicate phone numbers, but the phone formats in database 1 differ wildly from those in database 2.
I'd like to strip out all non-digit characters and just compare the two 10-digit strings to determine if it's a duplicate, something like:
SELECT b.phone as barPhone, sp.phone as SPPhone FROM bars b JOIN single_platform_bars sp ON sp.phone.REGEX = b.phone.REGEX
Is such a thing even possible in a mysql query? If so, how do I go about accomplishing this?
EDIT: Looks like it is, in fact, a thing you can do! Hooray! The following query returned exactly what I needed:
SELECT b.phone, b.id, sp.phone, sp.id
FROM bars b JOIN single_platform_bars sp ON REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(b.phone,' ',''),'-',''),'(',''),')',''),'.','') = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(sp.phone,' ',''),'-',''),'(',''),')',''),'.','')
Before version 8.0, MySQL doesn't support returning the "match" of a regular expression: the REGEXP operator only returns 1 or 0, depending on whether an expression matches the regular expression. (MySQL 8.0 added REGEXP_REPLACE() and REGEXP_SUBSTR().)
You can use the REPLACE function to replace a specific character, and you can nest those calls. But that would be unwieldy for all "non-digit" characters. For example, to remove spaces, dashes, and opening and closing parentheses:
REPLACE(REPLACE(REPLACE(REPLACE(sp.phone,' ',''),'-',''),'(',''),')','')
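The limitation is easy to see in a small Python model (function names hypothetical): a REPLACE chain removes only the characters it lists, so anything else, such as '+' or '/', survives, while a digits-only filter drops every non-digit. replace_chain mirrors the five-REPLACE chain from the asker's working query. Note that a leading country code 1 still remains after digit filtering; keeping only the last ten digits would be an extra assumption about the data.

```python
import re


def replace_chain(phone: str) -> str:
    """Mirror the nested REPLACE() calls: drop ' ', '-', '(', ')' and '.' only."""
    for ch in (" ", "-", "(", ")", "."):
        phone = phone.replace(ch, "")
    return phone


def digits_only(phone: str) -> str:
    """Keep ASCII digits and drop every other character."""
    return re.sub(r"[^0-9]", "", phone)


for raw in ["(555) 123-4567", "555.123.4567", "+1 555/123-4567"]:
    print(repr(raw), replace_chain(raw), digits_only(raw))
```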
One approach is to create user defined function to return just the digits from a string. But if you don't want to create a user defined function...
This can be done in native MySQL. This approach is a bit unwieldy, but it is workable for strings of "reasonable" length.
SELECT CONCAT(IF(SUBSTR(sp.phone,1,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,1,1),'')
,IF(SUBSTR(sp.phone,2,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,2,1),'')
,IF(SUBSTR(sp.phone,3,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,3,1),'')
,IF(SUBSTR(sp.phone,4,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,4,1),'')
,IF(SUBSTR(sp.phone,5,1) REGEXP '^[0-9]$',SUBSTR(sp.phone,5,1),'')
) AS phone_digits
FROM single_platform_bars sp
To unpack that a bit... we extract a single character from the first position in the string, check if it's a digit, if it is a digit, we return the character, otherwise we return an empty string. We repeat this for the second, third, etc. characters in the string. We concatenate all of the returned characters and empty strings back into a single string.
Obviously, the expression above is checking only the first five characters of the string, you would need to extend this, basically adding a line for each position you want to check...
And unwieldy expressions like this can be included in a predicate (in a WHERE clause). (I've just shown it in the SELECT list for convenience.)
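Here is a Python model of that per-character expression, plus a hypothetical generator for the SQL text, so the expression can be extended to as many positions as needed without typing each line. concat_if_digits mirrors only the logic (one SUBSTR per position, keep it if it is a digit); digits_expr emits the same CONCAT(IF(...)) shape as the query above.

```python
def concat_if_digits(s: str, positions: int = 5) -> str:
    """Model of the CONCAT(IF(SUBSTR(...) REGEXP '^[0-9]$', ...)) expression."""
    kept = []
    for pos in range(1, positions + 1):
        ch = s[pos - 1:pos]  # SUBSTR(s, pos, 1); '' once past the end
        if ch != "" and ch in "0123456789":
            kept.append(ch)
    return "".join(kept)


def digits_expr(col: str, positions: int) -> str:
    """Generate the same SQL expression for any number of positions."""
    lines = [
        f"IF(SUBSTR({col},{pos},1) REGEXP '^[0-9]$',SUBSTR({col},{pos},1),'')"
        for pos in range(1, positions + 1)
    ]
    return "CONCAT(" + "\n,".join(lines) + ")"


print(concat_if_digits("(555) 123-4567"))      # 555: only five positions checked
print(concat_if_digits("(555) 123-4567", 14))  # 5551234567
print(digits_expr("sp.phone", 14))
```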
MySQL doesn't support such string operations natively. You will either need to use a UDF like this, or else create a stored function that iterates over a string parameter concatenating to its return value every digit that it encounters.

What's the difference between '=' operator and LIKE when not using wildcards

I'm asking because I couldn't find a question about the same situation: when I use LIKE, I get CONSISTENT RESULTS, and when I use the = operator, I get INCONSISTENT RESULTS.
THE CASE
I have a BIG VIEW (viewX) with multiple inner joins and left joins, where some columns have null values, because the database definition allows for that.
When I open this VIEW, I see, for example, 8 rows as the result.
When I run, for example, select * from viewX where column_int = 34 and type_string = 'xyz', this query shows me 100 rows that aren't in the result of the view. [INCONSISTENT]
BUT
When I run select * from viewX where column_int = 34 and type_string like 'xyz', this query shows me only 4 rows, which do appear in the view when I open it (see above). [CONSISTENT]
Does anyone have an idea of what is happening here?
From the documentation.....
'Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator: '
more importantly (when using LIKE):
'string comparisons are not case sensitive unless one of the operands is a binary string'
from :
http://dev.mysql.com/doc/refman/5.0/en/string-comparison-functions.html
Per the MySQL documentation, LIKE does function differently than =, especially when values have trailing spaces: LIKE treats trailing spaces as significant, while = ignores them under PAD SPACE collations.
You need to post your actual query but I'm guessing it's related to the known variances.
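A rough Python model of the trailing-space difference, assuming a case-insensitive PAD SPACE collation (the behavior of MySQL collations before the NO PAD utf8mb4_0900 family introduced in 8.0). casefold() only approximates collation rules, and the function names are hypothetical. It shows one way = can return rows that LIKE filters out; whether that explains the view's results depends on the actual data.

```python
def eq_pad_space_ci(a: str, b: str) -> bool:
    """Approximate = under a PAD SPACE, case-insensitive collation."""
    # PAD SPACE: trailing spaces are ignored when comparing.
    return a.rstrip(" ").casefold() == b.rstrip(" ").casefold()


def like_no_wildcards_ci(a: str, b: str) -> bool:
    """Approximate LIKE with a wildcard-free pattern: trailing spaces are significant."""
    return a.casefold() == b.casefold()


values = ["xyz", "xyz ", "XYZ", "xyz  "]
print([v for v in values if eq_pad_space_ci(v, "xyz")])       # all four values
print([v for v in values if like_no_wildcards_ci(v, "xyz")])  # ['xyz', 'XYZ']
```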