How to make this REGEX below work for MySql? - mysql

I have written regex and tested it online, works fine. When I test in terminal, MySQL console, it doesn't match and I get an empty set. I believe MySQL regexp syntax is somehow different but I cannot find the right way.
This is data I use:
edu.ba;
medu.ba;
edu.ba;
med.edu.ba;
edu.com;
edu.ba
I should get only edu.ba matches including; if there is some. Works fine except in actual query.
(\;+|^)\bedu.ba\b(\;+|$|\n)
Is there anything I could change to get the same results?

You want to match edu.ba in between semi-colons or start/end of string. The word boundaries are redundant here (although if you want to experiment, the MySQL regex before MySQL v8 used [[:<:]] / [[:>:]] word boundaries, and in MySQL v8+, you need to use double backslashes with \b - '\\b').
Use
(;|^)edu[.]ba(;|$)
Details
(;|^) - ; or start of string
edu[.]ba - edu.ba literal string (dot inside brackets always matches a literal dot)
(;|$) - ; or end of string.

Related

MySQL REGEXP_SUBSTR() escaping issue?

Please take the following example regex:
https://regexr.com/4ek7r
As you can see, the regex works great and matches the sizes (e.g. 3/16" etc) from the product descriptions.
I'm trying to implement this in MySQL 8.0.15 using REGEXP_SUBSTR()
As per the documentation I have doubled up the escape characters but the regex is not working.
Please see the following SQL fiddle:
https://www.db-fiddle.com/f/e6Ez3XCdU5Ahs91z6TQA8P/0
As you can see, REGEXP_SUBSTR() returns NULL
I'm presuming this is an escape issue - but i'm not 100% sure.
How can I ensure MySQL returns the 1st match per product (row) akin to the regexr.com example?
Cheers
Edit: 28/05/2019 - root cause
Wiktor's answer below solved my problem and his regex was much cleaner & well worth the upvote. That said, i didn't understand why my original version was not working after the port from SQL Server to MySQL. I finally noticed the problem this morning - it had nothing to do with the regex, it was a rookie error in string concatenation! Specifically, I was using UPPER(Description + ' ') (i.e. using +) - which works fine in SQL Server but obviously; MySQL forces numeric! So i was essentially running my regex against a 0! Replacing the + with CONCAT actually fixed my original query with original regex - just thought i'd share this in case it helps anyone else!
In MySQL v8.x that supports ICU regex, you may use
SELECT Description, REGEXP_SUBSTR(Description, '(?im)(?=\\b(?:[0-9/]+(?:\\.[0-9/]+)?\\s*(?:[X-]|$)|[0-9/\\s]+(?:\\.[0-9/]+)?(?:[CM]?M|["”TH])))[0-9/\\s.]+(?:[CM]?M|["”TH])?(?:\\s*[/X-]\\s*[0-9/\\s.]+(?:[CM]?M|["”TH])?)?(?=[.\\s()]|$)') AS Size FROM tbl_Example
The main points:
The flags can be used as inline options, (?mi), m will enable multiline mode when ^ and $ match start/end of a line and i will enable case insensitive mode
[$] matches a $ char, to match end of a line position, you need to move $ out of a character class, use alternations in this case ((?=[\.\s\(\)$]) -> (?=[.\s()]|$), yes, do not escape what does not have to be escaped, too)
Matching fractional number part, it is better to use a (?:\.[0-9/]+)? like pattern (it matches an optional sequence of . and then 1 or more digits or /s)
(C|M)? is better written as [CM]? (a character class is more efficient)

Issue with Regexp with mySQL query

I'm trying to build a search query which searches for a word in a string and finds matches based on the following criteria:
The word is surrounded by a combination of either a space, period or comma
The word is at the start of the string and ends with a space, period or comma
The word is at the end of the string and is followed by a space, period or comma
It's a full match, i.e. the entire string is just the word
For example, if the word is 'php' the following strings would be matches:
php
mysql, php, javascript
php.mysql
javascript php
But for instance it wouldn't match:
php5
I've tried the following query:
SELECT * FROM candidate WHERE skillset REGEXP '^|[., ]php[., ]|$'
However that doesn't work, it returns every record as a match which is wrong.
Without the ^| and |$ in there, i.e.
SELECT * FROM candidate WHERE skillset REGEXP '[., ]php[., ]'
It successfully finds matches where 'php' is somewhere in the string except the start and end of the string. So the problem must be with the ^| and |$ part of the regexp.
How can I add those conditions in to make it work as required?
Try '\bphp\b', \b is a word boundary and might just be exactly what you need because it looks for the whole word php.
For MySQL, word boundaries are represented with [[:<:]] and [[:>:]] instead of \b, so use the query '[[:<:]]php[[:>:]]'. More info on word boundaries here.
Well, you can play around a bit with regex101.com
Something I found that works for you but doesn't exactly follow your rules is:
/(?=[" ".,]?php[" ".,]?)(?=php[\W])/
This uses the lookahead operator, ?=, to do AND
The first portion of the regex is
[" ".,]?php[" ".,]?
This will match anything that has a space, period, or comma before or after the php, but at most only one.
The section portion of the regex is
php[\W]
This will match anything that is php, followed by a non-character. In other words, it will NOT match php followed by a character, digit, or underscore.
It's not the perfect answer for your set of rules, but it does work with your sample data set. Play around on regex101.com and try to make a perfect one.

MySQL regex matching at least 2 dots

Consider the following regex
#(.*\..*){2,}
Expected behaviour:
a#b doesnt match
a#b.c doesnt match
a#b.c.d matches
a#b.c.d.e matches
and so on
Testing in regexpal it works as expected.
Using it it in a mysql select doesn't work as expected. Query:
SELECT * FROM `users` where mail regexp '#(.*\..*){2,}'
is returning lines like
foo#example.com
that should not match the given regex. Why?
I think the answer to your question is here.
Because MySQL uses the C escape syntax in strings (for example, “\n”
to represent the newline character), you must double any “\” that you
use in your REGEXP strings.
MYSQL Reference
Because your middle dot wasn't properly escaped it was treated as just another wildcard and in the end your expression was effectively collapsed to #.{2,} or #..+
#anubhava's answer is probably a better substitute for what you tried to do though I would note #dasblinkenlight's comment about using the character class [.] which will make it easy to drop in a regex you've already tested in at RegexPal.
You can use:
SELECT * FROM `users` where mail REGEXP '([^.]*\\.){2}'
to enforce at least 2 dots in mail column.
I would match two dots in MySQL using like:
where col like '%#.%.%'
The problem with your code is that .* (match-everything dot) matches dot '.' character. Replacing it with [^.]* fixes the problem:
SELECT *
FROM `users`
where mail regexp '#([^.]*[.]){2,}'
Note the use of [.] in place of the equivalent \.. This syntax makes it easier to embed the regex into programming languages that use backslash as escape character in their string literals.
Demo.

Mysql regex error #1139 using literal -

I tried running this query:
SELECT column FROM table WHERE column REGEXP '[^A-Za-z\-\']'
but this returns
#1139 - Got error 'invalid character range' from regexp
which seems to me like the - in the character class is not being escaped, and instead read as an invalid range. Is there some other way that it's suppose to be escaped for mysql to be the literal -?
This regex works as expected outside of mysql, https://regex101.com/r/wE8vY5/1.
I came up with an alternative to that regex which is
SELECT column FROM table WHERE column NOT REGEXP '([:alpha:]|-|\')'
so the question isn't how do I get this to work. The question is why doesn't the first regex work?
Here's a SQL fiddle of the issue, http://sqlfiddle.com/#!9/f8a006/1.
Also, there is no language being used here, query is being run at DB level.
Regex in PHP: http://sandbox.onlinephpfunctions.com/code/10f5fe2939bdbbbebcc986c171a97c0d63d06e55
Regex in JS: https://jsfiddle.net/6ay4zmrb/
Just change the order.
SELECT column FROM table WHERE column REGEXP '[^-A-Za-z\']'
#Avinash Raj is correct the - must be first (or last). The \ is not an escape character in POSIX, which is what mysql uses, https://dev.mysql.com/doc/refman/5.1/en/regexp.html.
One key syntactic difference is that the backslash is NOT a metacharacter in a POSIX bracket expression.
-http://www.regular-expressions.info/posixbrackets.html
What special characters must be escaped in regular expressions?
Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally

Regex Search in phpMyAdmin

Attempting to change the "files" folder location in a Drupal site from /files to /sites/default/files.
In order to avoid changing anything else such as
http://www.google.com/profiles/
I'm trying to use a basic regular expression with a word boundary.
\bfiles/
A quick check in regexpal is working as expected, but when I enter the above in the phpMyAdmin search , checking the "as regular expression" checkbox, I don't get the expected result.
Two questions:
How should I write my expression with a word boundary so that it works in phpMyAdmin?
I'm really a newbie at SQL statements! Would it be possible to write a SQL query that would simply look for every occurrence of "files/" & replace it with "sites/default/files/"?
According to the MySql docs, the regex flavour used is POSIX 1003.2. For this flavour of regex, word boundaries are as follows:
[[:<:]] (beginning) [[:>:]] (end)
so your regex would be:
[[:<:]]files/
If you want to use sql to search and replace all instances of [[:<:]]files/ from a specific field in a table, you could use a UDF such as the one found here
Also, you should be aware of the following while using regex with MySql:
Because MySQL uses the C escape syntax in strings (for example, “\n”
to represent the newline character), you must double any “\” that you
use in your REGEXP strings.