SQL Regex last character search not working - mysql

I'm using a regex for a specific search, but the last separator is being ignored.
It must search for |49213[A-Z]| but instead searches for |49213[A-Z]
SELECT * FROM table WHERE (data REGEXP '/\|49213[A-Z]+\|/')

Why are you using | in the pattern? Why the +?
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]\|')
If you want multiple:
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]+\|')
or:
SELECT * FROM table WHERE (data REGEXP '[|]49213[A-Z][|]')

Aha. That is rather subtle.
\ escapes certain characters that have special meaning.
But it does not seem to do so for | ("or") or . ("any byte"), etc.
So, \| is the same as |.
But the regexp parser does not like having either side of "or" being empty. (I suspect this is a "bug"). Hence the error message.
https://dev.mysql.com/doc/refman/5.7/en/regexp.html says
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:
SELECT '1+2' REGEXP '1+2';    -> 0
SELECT '1+2' REGEXP '1\+2';   -> 0
SELECT '1+2' REGEXP '1\\+2';  -> 1
The best fix seems to be [|] or \\| instead of \| when you want the pipe character.
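The two layers of parsing can be sketched in Python. This is only a simplified model, not MySQL's actual parser: mysql_unescape is a hypothetical helper that imitates how MySQL reads the body of a quoted string (assuming the default SQL mode, where backslash is the escape character), and Python's re module stands in for the regex library.

```python
import re

def mysql_unescape(literal_body: str) -> str:
    # Simplified model of MySQL string-literal parsing (hypothetical helper).
    # Known escapes are translated; for any other character the backslash
    # is dropped and the character itself is kept.
    known = {"n": "\n", "t": "\t", "r": "\r", "0": "\0", "b": "\b",
             "Z": "\x1a", "\\": "\\", "'": "'", '"': '"'}
    out = []
    i = 0
    while i < len(literal_body):
        ch = literal_body[i]
        if ch == "\\" and i + 1 < len(literal_body):
            nxt = literal_body[i + 1]
            if nxt in "%_":
                out.append("\\" + nxt)   # \% and \_ keep their backslash
            else:
                out.append(known.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# What the regex library receives for each way of writing the pattern in SQL.
one_backslash = mysql_unescape(r"\|49213[A-Z]\|")      # -> |49213[A-Z]|
two_backslashes = mysql_unescape(r"\\|49213[A-Z]\\|")  # -> \|49213[A-Z]\|
brackets = mysql_unescape(r"[|]49213[A-Z][|]")         # unchanged

print(one_backslash, two_backslashes, brackets, sep="\n")
```

With a single backslash the regex library never sees an escape at all, which is why the pipes end up being treated as "or". The doubled backslash and the [|] form both reach the regex library with the pipe still protected.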
Someday, the REGEXP parser in MySQL will be upgraded to PCRE as in MariaDB. Then a lot more features will come, and this 'bug' may go away.

Related

Types of Wildcards in MySql

My query:
Select * From tableName Where columnName Like "[PST]%"
is not giving the expected result.
Why does this wildcard not work in MySql?
If you want to filter on strings that contain any 'P', 'S', or 'T', then you can use a regex:
where col rlike '[PST]'
If you want strings that contain substring 'PST', then no need for square brackets - and like is enough:
where col like '%PST%'
If you want the substring 'PST' at the start of the string, then the regex solution looks like:
where col rlike '^PST'
And the like option would be:
where col like 'PST%'
MySQL's LIKE syntax is documented here: https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html
Standard SQL from decades ago defined only two wildcards: % and _. These are the only wildcards an SQL product needs to support if they want to say they are SQL compliant and support the LIKE predicate.
% matches zero or more of any characters. It's analogous to .* in regular expressions.
_ matches exactly one of any character. It's analogous to . in regular expressions.
Also if you want to match a literal '%' or '_', you need to escape it, i.e. put a backslash before it:
WHERE title LIKE 'The 7\% Solution'
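To make the analogy with regular expressions concrete, here is a rough Python sketch that translates a LIKE pattern into a regex. The function names like_to_regex and like are made up for illustration, backslash is assumed to be the escape character (MySQL's default), and the case-insensitivity of MySQL's default collations is not modelled.

```python
import re

def like_to_regex(pattern: str, escape: str = "\\") -> str:
    # Translate a LIKE pattern into a regex body (illustrative sketch).
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # zero or more of any character
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))  # [ ] and friends mean nothing in LIKE
        i += 1
    return "".join(parts)

def like(value: str, pattern: str) -> bool:
    # LIKE must match the whole value, hence fullmatch.
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None

print(like("Sam", "[PST]%"))        # False: brackets are literal text here
print(like("[PST]xyz", "[PST]%"))   # True
print(like("The 7% Solution", r"The 7\% Solution"))  # True
```

Under this model "[PST]%" only matches values that literally begin with the five characters [PST], which is why the query in the question returns nothing useful in MySQL.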
Microsoft SQL Server's LIKE syntax is documented here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-ver15
They support the % and _ wildcards and an escape character (declared with the ESCAPE clause; there is no default), but they extend standard SQL with two other forms:
[a-z] matches one character, but only characters in the range inside the brackets. This is similar in regular expressions. The - is a range operator, unless it appears at the start or end of the string inside the brackets.
[^a-z] matches one character, which must not be one of the characters in the range inside the brackets. Also the same in regular expressions.
These are not standard forms of wildcards for the LIKE predicate, and other brands of SQL database don't support them.
Later versions of the SQL standard introduced a new predicate SIMILAR TO which supports much richer patterns and wildcards, since the right-side operand is a string which contains a regular expression. But since this predicate was introduced in a later edition of the SQL standard, some implementations had already developed their own solution that was almost the same.
MySQL called the operator REGEXP and RLIKE is a synonym (https://dev.mysql.com/doc/refman/8.0/en/regexp.html).
It was requested in https://bugs.mysql.com/bug.php?id=746 to support SIMILAR TO syntax to help MySQL comply with the SQL standard, but the request was turned down, because it had subtly different behavior to the existing REGEXP/RLIKE operator.
Microsoft SQL Server has partial support of regular expression wildcards in the LIKE operator, and also a dbo.RegexMatch() function.
SQLite has a GLOB operator, and so on.
Thanks everyone!
For this specific question, we need to use REGEXP:
Select * From tableName Where ColumnName Regexp "^[PST]";
For more detail on regular expressions (REGEXP):
https://www.youtube.com/watch?v=KoltE-JUY0c

MySQL REGEXP failing to limit # of occurrences (?!)

I have a table with a lot of individual words in it (Column name 'qWord') with contents including 'Utility', 'Utter', 'Unicorn' and 'Utile'
I'm trying to do a SELECT to find qWord strings which have at most one instance of the letter 't'.
Using REGEXP I thought it would be a trivial statement like:
SELECT *
FROM entries.qentries
WHERE (qWord REGEXP 'T{0,1}')
but I'm still getting 'Utter' and 'Utility' in the output -- along with 'Utile' and 'Unicorn'.
So what am I missing here?
(FWIW: MySQL 8.0.11, Community edition running on a Windows 8.1 machine)
Here's the full REGEXP and my apologies for not posting it initially. I'm looking for words composed only of specific letters and that part works fine.
But I also want words with a limited number of a given letter, say t:
SELECT * FROM entries.entries WHERE
(qWord NOT REGEXP 'C|F|G|I|J|K|P|Q|S|V|W|X|Y|Z|-')
AND (qWord REGEXP 'A|B|D|E|H|L|M|N|O|R|T|U')
AND (qWord REGEXP 't{0,1}') ;
I've also tried (qWord REGEXP 't{0}|t{1}') as well as (qWord REGEXP '(?<=[^t]|^)(t{0}|t{1})(?:[^t]|$)' )
without success, so I remain stuck
You can use the following regex:
SELECT *
FROM entries.qentries
WHERE (qWord REGEXP '^[^tT]*[tT]?[^tT]*$')
Explanations:
^, $ starting and ending anchors (this is needed to avoid word partial match)
[^tT]* any character that is not a t or a T 0 or more times
[tT]? at most one occurrence of t or T (? is equivalent to {0,1})
[^tT]* any character that is not a t or a T 0 or more times
Additional Notes:
[^tT]: this character class accepts anything that is not a t or a T, so spaces, ., \n and other characters are also accepted. If you want to accept only letters other than t and T, you can use [a-su-zA-SU-Z]. To allow additional characters, add them at the end of the class: [a-su-zA-SU-Z -] also accepts words containing spaces and -.
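The difference between the unanchored and the anchored pattern can be checked quickly outside the database. The sketch below uses Python's re module as a stand-in for MySQL's regex engine (re.IGNORECASE imitates a case-insensitive collation, which is an assumption about the column); the word list is the one from the question.

```python
import re

words = ["Utility", "Utter", "Unicorn", "Utile"]

# Unanchored: 't{0,1}' is already satisfied by zero t's at position 0,
# so every string matches, no matter how many t's it contains.
unanchored = [w for w in words if re.search(r"t{0,1}", w, re.IGNORECASE)]

# Anchored: the whole word must consist of non-t characters around
# at most one t.
anchored = re.compile(r"^[^tT]*[tT]?[^tT]*$")
at_most_one_t = [w for w in words if anchored.search(w)]

print(unanchored)      # ['Utility', 'Utter', 'Unicorn', 'Utile']
print(at_most_one_t)   # ['Unicorn', 'Utile']
```

A regex match only has to find one place where the pattern fits, so a quantifier that allows zero occurrences needs anchors and an explicit description of everything else in the string before it can limit anything.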

MySQL regex matching at least 2 dots

Consider the following regex
#(.*\..*){2,}
Expected behaviour:
a#b doesn't match
a#b.c doesn't match
a#b.c.d matches
a#b.c.d.e matches
and so on
Testing in regexpal it works as expected.
Using it in a MySQL SELECT doesn't work as expected. Query:
SELECT * FROM `users` where mail regexp '#(.*\..*){2,}'
is returning lines like
foo#example.com
that should not match the given regex. Why?
I think the answer to your question is here.
Because MySQL uses the C escape syntax in strings (for example, "\n"
to represent the newline character), you must double any "\" that you
use in your REGEXP strings.
MySQL Reference
Because your middle dot wasn't properly escaped it was treated as just another wildcard and in the end your expression was effectively collapsed to #.{2,} or #..+
@anubhava's answer is probably a better substitute for what you tried to do, though I would note @dasblinkenlight's comment about using the character class [.], which makes it easy to drop in a regex you've already tested at RegexPal.
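The effect of the lost backslash can be reproduced outside MySQL. In this sketch Python's re module stands in for the regex engine, the helper name matching is made up, and the sample addresses are the ones from the question. The second pattern is what the regex engine ends up with once the string parser has dropped the single backslash.

```python
import re

addresses = ["a#b", "a#b.c", "a#b.c.d", "a#b.c.d.e", "foo#example.com"]

def matching(pattern):
    # Return the addresses in which the pattern is found anywhere.
    return [a for a in addresses if re.search(pattern, a)]

escaped_dot = matching(r"#(.*\..*){2,}")     # the regex as tested in RegexPal
unescaped_dot = matching(r"#(.*..*){2,}")    # the regex after the backslash is lost
bracket_dot = matching(r"#([^.]*[.]){2,}")   # the [.] form needs no backslash

print(escaped_dot)     # ['a#b.c.d', 'a#b.c.d.e']
print(unescaped_dot)   # ['a#b.c', 'a#b.c.d', 'a#b.c.d.e', 'foo#example.com']
print(bracket_dot)     # ['a#b.c.d', 'a#b.c.d.e']
```

Without the escape, every dot in the pattern is a wildcard, so any address with at least two characters after the # matches.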
You can use:
SELECT * FROM `users` where mail REGEXP '([^.]*\\.){2}'
to enforce at least 2 dots in mail column.
I would match two dots in MySQL using like:
where col like '%#%.%.%'
The problem with your code is that .* (match-everything dot) matches dot '.' character. Replacing it with [^.]* fixes the problem:
SELECT *
FROM `users`
where mail regexp '#([^.]*[.]){2,}'
Note the use of [.] in place of the equivalent \.. This syntax makes it easier to embed the regex into programming languages that use backslash as escape character in their string literals.

Mysql regex error #1139 using literal -

I tried running this query:
SELECT column FROM table WHERE column REGEXP '[^A-Za-z\-\']'
but this returns
#1139 - Got error 'invalid character range' from regexp
which seems to me like the - in the character class is not being escaped, and is instead read as an invalid range. Is there some other way that it's supposed to be escaped for MySQL to treat it as the literal -?
This regex works as expected outside of mysql, https://regex101.com/r/wE8vY5/1.
I came up with an alternative to that regex which is
SELECT column FROM table WHERE column NOT REGEXP '([:alpha:]|-|\')'
so the question isn't how do I get this to work. The question is why doesn't the first regex work?
Here's a SQL fiddle of the issue, http://sqlfiddle.com/#!9/f8a006/1.
Also, there is no language being used here, query is being run at DB level.
Regex in PHP: http://sandbox.onlinephpfunctions.com/code/10f5fe2939bdbbbebcc986c171a97c0d63d06e55
Regex in JS: https://jsfiddle.net/6ay4zmrb/
Just change the order.
SELECT column FROM table WHERE column REGEXP '[^-A-Za-z\']'
@Avinash Raj is correct: the - must be first (or last). The \ is not an escape character inside a POSIX bracket expression, and POSIX is what MySQL uses; see https://dev.mysql.com/doc/refman/5.1/en/regexp.html.
One key syntactic difference is that the backslash is NOT a metacharacter in a POSIX bracket expression.
-http://www.regular-expressions.info/posixbrackets.html
What special characters must be escaped in regular expressions?
Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally.
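The "clever placement" rule can be tried out with a small sketch. Python's re module is not a POSIX engine, so this only illustrates the placement trick, which happens to work in both; has_disallowed_char is a made-up helper name.

```python
import re

# The hyphen comes first (right after the ^), so it cannot be read as a
# range operator. No backslash is needed anywhere in the class.
not_allowed = re.compile(r"[^-A-Za-z']")

def has_disallowed_char(value: str) -> bool:
    # True if the value contains anything other than letters, - and '.
    return not_allowed.search(value) is not None

print(has_disallowed_char("O'Neil-Smith"))  # False
print(has_disallowed_char("Smith2"))        # True
```

Putting the hyphen last, as in [^A-Za-z'-], is the other placement that avoids the range interpretation.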

Select special characters mysql

I need to select from fields that can contain special characters, for example:
+--------------+
| code         |
+--------------+
| **4058947"_\ |
| **4123/"_\   |
| sew'-8947"_\ |
+--------------+
I tried this:
select code from table where code REGEXP '[(|**4058947"_\|)]';
select code from table where code REGEXP '[(**4058947"_\)]';
select code from table where code REGEXP '^[(**4058947"_\)]';
but these queries return all rows, and this query returns nothing:
select code from table where code REGEXP '^[(**4058947"_\)]$';
I need it to return only the first row, or whichever row I specify.
To select only one row, you could just do this if it doesn't matter which one.
SELECT code FROM table LIMIT 1
If it does matter, drop the regex.
SELECT code FROM table WHERE code = "**4058947\"_\\"
To match those special characters (in this case, " and \), you need to "escape" them. (That's what it's called. I didn't make that up.) In most mainstream languages this is done by putting a backslash in front of the character (MySQL does it this way too). The backslash is the escape character, and a backslash with another character behind it is called an escape sequence. As you can see, I escaped the quote and the backslash in the code value I want to match, so it should work now.
If you need to keep the regexes (which I hope is not the case, since you have the literal string you want to match against), the same thing applies. Escape quotes and backslashes and you'll be fine, provided you drop the parentheses and brackets. Note that in a regex, you need to escape far more characters, because some characters (for example | [ ] ( ) * +) have a special function in a regex. This is very handy, but becomes a bit of a problem when you need to match a string with such a character in it. In that case, you need to escape it, but with a double backslash! This is because MySQL first parses the pattern as a string literal: \\ becomes a single backslash, and a backslash in front of a character that needs no string-level escaping is silently dropped. Only then is the result parsed as a regex. This gets ugly very quickly, since it means matching a backslash with a MySQL regex requires 4 backslashes! Two in the regex, but this needs to be doubled, since MySQL parses it as a string first!
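The two rounds of escaping can be written out as two small functions. This is a sketch under stated assumptions: regex_escape and sql_string_literal are hypothetical helpers, the list of regex metacharacters is a conservative guess, and the default SQL mode is assumed (backslash escapes enabled, that is, NO_BACKSLASH_ESCAPES is off).

```python
import re

# Characters treated as regex metacharacters in this sketch.
REGEX_SPECIALS = set(".^$*+?()[]{}|\\")

def regex_escape(text: str) -> str:
    # Layer 1: put a backslash in front of every regex metacharacter.
    return "".join("\\" + ch if ch in REGEX_SPECIALS else ch for ch in text)

def sql_string_literal(text: str) -> str:
    # Layer 2: double backslashes and escape single quotes for the SQL string.
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

value = '**4058947"_\\'                      # the stored text: **4058947"_\
pattern = "^" + regex_escape(value) + "$"    # what the regex engine should see
literal = sql_string_literal(pattern)        # what has to be typed in the query

print(pattern)   # ^\*\*4058947"_\\$
print(literal)   # '^\\*\\*4058947"_\\\\$'
```

The single trailing backslash in the data ends up as four backslashes in the query text, as described above. If the pattern is passed as a bound parameter rather than pasted into the SQL text, only the first layer should be needed.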