REGEXP MySQL - several Groups no matter which order - mysql

I'm trying to write a regex in MySQL, but after 4 hours of reading other examples and trial and error, I hope someone can help me fix my regex and get it working.
What I need is:
to match several strings in a text separated by "###", no matter in which position/order they appear inside "###TEXT###TEXT###".
My regex so far works, but only if the strings appear in exactly the same order as in the regex. So I need to tell my regex: find STRING1 && STRING2 inside ### ... ###, no matter in which position.
###([^#]*)(9034==1-wellig)([^#]*)(9037==DIN C4)([^#]*)###
My Text:
###9021==220|9034==1-wellig|9023==356|9024==230|9037==DIN C4###9021==220|9034==2-wellig|9023==356|9037==DIN C4|9024==230###9021==220|9034==1-wellig|9023==356|9037==DIN C4|9024==230###
When I modify my text to something like this ("9037==DIN C4" before "9034==1-wellig"), it does not work:
###9021==220|9037==DIN C4|9034==1-wellig|9023==356|9024==230###9021==220|9034==2-wellig|9023==356|9037==DIN C4|9024==230###9021==220|9037==DIN C4|9034==1-wellig|9023==356|9024==230###
Example: https://regex101.com/r/amal7c/1
I hope I explained my problem clearly. I'm sure it's only one small change, but I can't get it to work...
Best Regards
Tom

You cannot use lookarounds in MySQL's POSIX-style regex engine (the one used before MySQL 8.0), so you need to use alternation:
WHERE col REGEXP '###[^#]*(9034==1-wellig[^#]*9037==DIN C4|9037==DIN C4[^#]*9034==1-wellig)[^#]*###'
See this regex demo.
Details
### - literal ###
[^#]* - zero or more chars other than #
(9034==1-wellig[^#]*9037==DIN C4|9037==DIN C4[^#]*9034==1-wellig) - an alternation capturing group matching
9034==1-wellig[^#]*9037==DIN C4 - 9034==1-wellig, zero or more chars other than # and 9037==DIN C4
| - or
9037==DIN C4[^#]*9034==1-wellig - 9037==DIN C4, zero or more chars other than # and 9034==1-wellig
[^#]*### - zero or more chars other than # followed with ###.
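
The alternation approach generalizes to more than two required strings by listing every ordering. Below is a minimal sketch in Python, used only as a stand-in to check the pattern logic outside MySQL. The helper name any_order_pattern is hypothetical, and the generated pattern uses Python's escaping rules, so it may need adjusting before it is pasted into a MySQL string literal.

```python
import re
from itertools import permutations

def any_order_pattern(tokens, sep="###"):
    """Build a regex requiring all tokens, in any order, inside one sep...sep record.

    Assumes records never contain '#', just like [^#]* in the answer above.
    """
    gap = "[^#]*"
    # One alternative per ordering: 2 tokens -> 2 alternatives, 3 tokens -> 6, ...
    orderings = (gap.join(re.escape(t) for t in p) for p in permutations(tokens))
    return f"{sep}{gap}(?:{'|'.join(orderings)}){gap}{sep}"

pattern = re.compile(any_order_pattern(["9034==1-wellig", "9037==DIN C4"]))

original = ("###9021==220|9034==1-wellig|9023==356|9024==230|9037==DIN C4"
            "###9021==220|9034==2-wellig|9023==356|9037==DIN C4|9024==230###")
swapped = ("###9021==220|9037==DIN C4|9034==1-wellig|9023==356|9024==230"
           "###9021==220|9034==2-wellig|9023==356|9037==DIN C4|9024==230###")
# Both strings occur here, but never inside the same record.
split_up = "###9034==1-wellig|9023==356###9037==DIN C4|9024==230###"

for name, text in [("original", original), ("swapped", swapped), ("split_up", split_up)]:
    print(name, bool(pattern.search(text)))
```

The number of alternatives grows factorially with the number of required strings, so this stays practical only for a handful of them.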

OMG, OK, I think I just found the solution myself. It works on regex101 but not in MySQL :/
###(?=[^#]*(9034==1-wellig))(?=[^#]*(9037==DIN C4))[^#]*###
But if any of you experts can improve my regex, I'd also be very thankful :)
https://regex101.com/r/amal7c/2

Related

Find %20 and replace with - in Dreamweaver

I have a line of code:
mainURL=Sample%20Line%20Of%20Code.php
I want to replace the %20 with a dash (-) using Dreamweaver's find and replace
I have tried several variations of:
Find - mainURL=([^\s]*).php and mainURL=([^%20]*).php
Replace - mainURL=([^-]*).php
It seems to find my string, but it literally replaces it with mainURL=([^-]*).php. I'm fairly new to regular expressions and could use some help.
Thanks in advance - Tom
You may use
Find What: (\G(?!\A)|mainURL=(?=\S*\.php))((?:(?!\.php)\S)*?)%20
Replace with: $1$2-
See the regex demo.
Details
(\G(?!\A)|mainURL=(?=\S*\.php)) - Group 1: either the end of the previous match (\G(?!\A)) or mainURL= that is followed with 0+ non-whitespace chars and a .php substring
((?:(?!\.php)\S)*?) - Group 2: any non-whitespace char that is not a starting point for .php substring, 0+ repetitions, as few as possible
%20 - a literal substring.
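
The \G-based pattern depends on the find/replace engine supporting \G. As a cross-check outside Dreamweaver, the same result can be sketched in Python, whose re module has no \G. This swaps in a different technique: match the whole mainURL=....php token first, then replace inside it. The function name dash_main_url is made up for illustration.

```python
import re

def dash_main_url(text):
    """Replace %20 with '-' only inside mainURL=<non-whitespace>.php tokens."""
    token = re.compile(r"mainURL=\S*?\.php")
    # The callback rewrites just the matched token; other %20 stay untouched.
    return token.sub(lambda m: m.group(0).replace("%20", "-"), text)

print(dash_main_url("mainURL=Sample%20Line%20Of%20Code.php"))
```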

MySQL RegEx to match two consecutive digits that are the same

I am using the following RegEx in MySQL to match two consecutive digits that are the same anywhere in a string:
^.*([[:digit:]])\1+.*$
It matches correctly the following strings:
8831
5011
9931
but it also matches
9318
and it doesn't match
3449
Is the problem around .* or is it something else?
There's no way to check for the same thing twice directly; instead you need to check for all possibilities. Luckily, since you are only looking at 10 digits, it's relatively easy:
(11|22|33|44|55|66|77|88|99|00)
I don't think MySQL regular expressions have back references. You can do the more verbose:
where col regexp '00|11|22|33|44|55|66|77|88|99'
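
A likely reason for the odd results in the question, assuming the pattern was passed as an ordinary MySQL string literal with default escape handling: the backslash in \1 is dropped, so the engine sees a literal 1 and the pattern means "a digit followed by one or more 1s". The sketch below uses Python's re only as a stand-in to check that reading against the question's data, then builds the alternation from the answers. [0-9] replaces [[:digit:]], which Python does not support.

```python
import re

samples = ["8831", "5011", "9931", "9318", "3449"]

# What the pattern degrades to if '\1' reaches the engine as a plain '1'
# (an assumption about MySQL's string-literal escape handling).
degraded = re.compile(r"^.*([0-9])1+.*$")

# The alternation from the answers, generated instead of typed out.
doubled = re.compile("|".join(d * 2 for d in "0123456789"))

for s in samples:
    print(s, bool(degraded.search(s)), bool(doubled.search(s)))
```

Under that assumption the degraded pattern accepts 9318 and rejects 3449, as reported, while the alternation does the opposite.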

MySQL matching this regex while it shouldn't

I'm trying to recognize quoting (citing) somebody else's sentence in a markdown text, which I have in my local copy of the MySQL GHTorrent dataset. So I wrote this query:
select * from github_discussions where body rlike '(.)*(\s){1,}(>)(\s){1,}(.)+';
It matches some unwanted data which, according to https://regex101.com/, it should not match with this particular regular expression.
Test string:
`Params` is plural -> contain<s>s</s>
Matched on MySQL database, not matched at regex101 dot com.
Obvious example of quoting, but not matched at db:
Yes, I believe so.\r\n\r\n\r\n\r\nK\r\n\r\n> On 19-Jul-2014, at 17:33, Stefan Karpinski <notifications#github.com> wrote:\r\n> \r\n> This is the standard 3-clause BSD license, right?\r\n> \r\n> —\r\n> Reply to this email directly or view it on GitHub.
Moreover, MySQL Workbench didn't show those carriage return and newline symbols until I copy-pasted the text here.
Can I normalize (remove \r and \n) with some update query?
Is MySQL's regex implementation different from POSIX standard regex?
Do you by any chance have a maximally clean solution for recognizing quoting in a markdown text?
Thanks!
You've got an awful lot of parens in there. Try this, which is functionally what you have above:
select * from github_discussions where body rlike '.*[[:blank:]]+>[[:blank:]]+.+'
However, I'm not sure that's really what you want. This would happily match this line:
this is before > and after
which by my understanding is not a quoted string in markdown. Instead I would anchor it at the beginning like this:
select * from github_discussions where body rlike '^[[:blank:]]*>[[:blank:]]+'
That will match a greater-than sign at the beginning of the line, optionally preceded by whitespace. Is that what you are looking for?
I'm not sure if your data has newlines embedded. If so, you may need to look into ways of having your regex identify newlines using the ^ anchoring symbol. As is the well accepted conclusion in regex literature, that is left as an exercise for the student. :-)
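
Two points in this thread can be checked outside the database. First, a possible reason the original query matched the test string: in a MySQL string literal with default settings the backslash in \s is dropped, and the older regex library has no \s shorthand anyway, so the pattern effectively looks for the letter s on both sides of >, which <s>s</s> satisfies. Second, the quote marker can be anchored per line rather than per string. The sketch below uses Python's re only as a stand-in: [ \t] approximates [[:blank:]], and re.MULTILINE is Python's way of letting ^ match after each newline, not a MySQL option.

```python
import re

test_string = "`Params` is plural -> contain<s>s</s>"
quoted = ("Yes, I believe so.\r\n\r\nK\r\n\r\n"
          "> On 19-Jul-2014, at 17:33, Stefan Karpinski wrote:\r\n"
          "> \r\n> This is the standard 3-clause BSD license, right?")

# Intended pattern: whitespace, '>', whitespace.
intended = re.compile(r"(.)*(\s){1,}(>)(\s){1,}(.)+")
# Same pattern if '\s' reaches the engine as a literal 's' (assumption).
literal_s = re.compile(r"(.)*(s){1,}(>)(s){1,}(.)+")

# Line-anchored quote marker: optional blanks, '>', at least one blank.
quote_line = re.compile(r"^[ \t]*>[ \t]+", re.MULTILINE)

print("intended on test string:", bool(intended.search(test_string)))
print("literal-s on test string:", bool(literal_s.search(test_string)))
print("quote line in quoted mail:", bool(quote_line.search(quoted)))
print("quote line in plain text:", bool(quote_line.search("this is before > and after")))
```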

MySQL Regular Expression [a-z]\.[a-z] but not a.m. or p.m

Evening,
I want to search some columns in a MySQL table for any instances of [a-z]\.[a-z], for example:
John.than, Ame.ica, Llan.antffraid etc.
but I don't want this to include the strings 'a.m.' OR 'p.m.'. I have tried using (?!a.m.|p.m.) but this does not work. It returns the error: "Got error 'repetition-operator operand invalid' from regexp".
I have the following regular expression:
REGEXP BINARY '[a-z]\\\.[a-z]'
N.B. If a column includes a.m. OR p.m. but also contains a string like bro.ken, it needs to be returned.
Build your regex step by step:
You want everything except a "standalone" a.m or p.m:
[b-oq-z]{1}\.[a-ln-z]{1} matches everything of the format x.y that is not a.# or p.# or #.m
However, this also misses a.a, a.b, a.c ..., so add those cases:
a\.[^m] (same for the p-cases: p\.[^m])
a.m is valid when there are chars in front of the a: kra.m, tra.m. The same applies to p.m: erp.m
[a-z]{1}[ap]\.m covers this condition.
Now we are missing strings where the second part is longer: a.mod, p.markt:
[ap]\.m[a-z]+ covers that one.
Finally, only the ones ending with .m but having a different prefix are missing:
[b-oq-z]{1}\.m
This should now cover all possible use cases. Simply combine the patterns with OR (|) and you are done:
([b-oq-z]{1}\.[a-ln-z]{1}|a\.[^m]|p\.[^m]|[a-z]{1}[ap]\.m|[ap]\.m[a-z]+|[b-oq-z]{1}\.m)
Note: This will NOT give you the exact match groups. But since you use it in a SQL query, all that matters is whether there is a match. (ark.m will be matched by k.m, but it fulfills your specification.)
Keep in mind: when creating a regular expression, there is no single right solution, just working ones and non-working ones. a\.[^m]|p\.[^m] is equal to [ap]\.[^m], which reduces the pattern by one OR.
You have found the perfect regex pattern when 2 conditions are met:
It works!
You can understand it, when looking at it in 4 months!
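
The step-by-step construction above can be written as a list of parts joined with |, which keeps each case visible and easy to test. This is a minimal sketch in Python, used as a stand-in for checking the logic rather than as MySQL syntax. The variable names and the extra sample sentences are illustrative, and the a- and p-cases are merged into [ap]\.[^m] as the note above suggests.

```python
import re

parts = [
    r"[b-oq-z]\.[a-ln-z]",  # x.y where x is not a/p and y is not m
    r"[ap]\.[^m]",          # a.? / p.? where ? is not m
    r"[a-z][ap]\.m",        # a.m / p.m with a letter in front: kra.m, erp.m
    r"[ap]\.m[a-z]+",       # a.m / p.m with letters after: a.mod, p.markt
    r"[b-oq-z]\.m",         # ?.m with a prefix other than a/p
]
pattern = re.compile("|".join(parts))

should_match = ["John.than", "Ame.ica", "Llan.antffraid", "kra.m", "a.mod",
                "at 8 a.m. the bro.ken one"]
should_not_match = ["at 8 a.m.", "closes 5 p.m. sharp"]

for text in should_match + should_not_match:
    print(text, bool(pattern.search(text)))
```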
If you can use assertions, this might work, but not sure about backtracking.
# (?=^.*(?:(?!a\.m|p\.m)[a-z]\.[a-z]|(?:a\.m|p\.m).*(?!a\.m|p\.m)[a-z]\.[a-z]))
(?=
^
.*
(?:
(?! a\.m | p\.m )
[a-z] \. [a-z]
|
(?: a\.m | p\.m )
.*
(?! a\.m | p\.m )
[a-z] \. [a-z]
)
)
I would do it like this:
SELECT 'Ame.ica wakes up at 8 a.m.' REGEXP
'[b-oq-z]\\.[a-ln-z]|[ap]\\.[^m]|[^ap]\\.m|[[:alpha:]][ap]\\.m|[ap]\\.m[[:alpha:]]' findme,
'America wakes up at 8 a.m.' REGEXP
'[b-oq-z]\\.[a-ln-z]|[ap]\\.[^m]|[^ap]\\.m|[[:alpha:]][ap]\\.m|[ap]\\.m[[:alpha:]]' dontfindme
It's a shorter and therefore slightly faster version of dognose's answer. It's also tailored to MySQL, which has the slightly odd [[:alpha:]] class.

Regexp for whole words, punctuation included

I'm completely new to MySQL regexp and, after 3 days of searching the web, I've had to give up!
I need to find words in a database, exact words. But I need to take care of punctuation too (it would be nice if I could provide my own list of punctuation symbols). These words might be at the beginning of the database record, at the end, or in the middle. They can also be the only data in the record. I don't care about uppercase or lowercase.
Thus, I need to retrieve words in such cases:
Lion - Lion. - Lion, - Lion; - Lion... (dots) - Lion… (hellip) etc. (there can be many other punctuation symbols)
or
'Lion' - "Lion" - (Lion) - <Lion> - /Lion/ etc.
but those are incorrect:
Lion.tiger - Lions - superlion - <Lion" - (Lion> - Lion.....
I've tried dozens of regexp provided on several websites but none could solve that accurately.
Cheers.
You should be able to use word boundaries. Otherwise you'll have to declare all of the characters that you're willing to accept.
select id,myRecord from my_records where myRecord REGEXP '[[:<:]]lion[[:>:]]';
Here's an SQL Fiddle using some of the examples that you've provided for matching.
Note that because the . character in Lion.tiger is typically considered a word barrier (end of sentence declaration) in English, you might need to make a special case where you're not matching it.
select id,myRecord from my_records where
myRecord REGEXP '[[:<:]]lion[[:>:]]' and
myRecord NOT REGEXP '[[:<:]]lion\\.[[:alpha:]]';
Example fiddle.
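
The two-condition query can be mirrored outside the database to try it against the question's examples. This is a sketch in Python, where \b stands in for MySQL's [[:<:]] and [[:>:]] markers and re.IGNORECASE stands in for a case-insensitive collation. is_whole_word is a hypothetical helper name.

```python
import re

def is_whole_word(text, word="lion"):
    """True if word occurs as a whole word and is not glued to a following word by a dot."""
    w = re.escape(word)
    whole = re.search(rf"\b{w}\b", text, re.IGNORECASE)
    dotted = re.search(rf"\b{w}\.[A-Za-z]", text, re.IGNORECASE)
    return whole is not None and dotted is None

for sample in ["Lion", "Lion.", "Lion,", "(Lion)", "'Lion'",
               "Lion.tiger", "Lions", "superlion"]:
    print(sample, is_whole_word(sample))
```

Mismatched wrappers such as <Lion" and long dot runs such as Lion..... still pass this check, so those cases from the question would need an explicit list of accepted punctuation.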