I have an underscore-delimited string of words such as:
word1_word2_word3_word4 and a list of allowed values, such as word1 and word3.
The goal is to filter out the values that are not allowed and replace them with, let's say, "...", so the resulting string will be word1_..._word3_...
This needs to be done in MySQL, and I plan to use REGEXP_REPLACE, but all my attempts to come up with a working regex that handles all instances (such as the first and last word) have failed.
To simplify things, I tried adding leading and trailing underscores to the string so it becomes _word1_word2_word3_word4_ and then using:
(?<=_)[^_]+(?=_) which nicely matches every string between delimiters; however, I could not figure out how to exclude word1 and word3.
Just add a negative lookahead for word1|word3 right before the start of the match:
(?<=_)(?!word1|word3)[^_]+(?=_)
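The lookahead can be sanity-checked outside MySQL. The sketch below uses Python's re module, which accepts the same lookaround syntax for this pattern; it only exercises the regex, not MySQL's REGEXP_REPLACE. It assumes the padding trick from the question and strips the helper underscores again afterwards; the function name filter_words is hypothetical.

```python
import re

# Lookbehind for "_", negative lookahead for the allowed words, then the field itself
PATTERN = r"(?<=_)(?!word1|word3)[^_]+(?=_)"


def filter_words(s: str, replacement: str = "...") -> str:
    """Pad with underscores so the first and last fields are delimited too,
    replace every field that is not allowed, then drop the padding again."""
    padded = f"_{s}_"
    return re.sub(PATTERN, replacement, padded)[1:-1]


print(filter_words("word1_word2_word3_word4"))  # word1_..._word3_...
```

Because (?!word1|word3) only checks a prefix, a field such as word10 is kept as well; the (?![^_]) refinement further down deals with that.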
If the match may also start at the beginning of the string or end at the end of the string (without a _ delimiter), then add ^ and $ as alternatives inside the lookarounds:
(?<=_|^)(?!word1|word3)[^_]+(?=_|$)
https://regex101.com/r/QFb1p7/1
You may use
(?<![^_])(?!(?:word1|word3)(?![^_]))[^_]+(?![^_])
Note: (?<![^_]) is equivalent to (?<=^|_), and (?![^_]) is equivalent to (?=_|$), but they are more efficient. Why is (?![^_]) used inside (?!(?:word1|word3)(?![^_]))? Because you may still want to match word10 or word345.
Details
(?<![^_]) - start of string or _
(?!(?:word1|word3)(?![^_])) - fails the match if the field is exactly word1 or word3, i.e. followed by the end of the string or _
[^_]+ - 1+ chars other than underscores
(?![^_]) - end of string or _
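This version needs no padding, and since both lookarounds are fixed-width it also compiles in Python's re module (which rejects the alternating (?<=_|^) form from the previous answer as a variable-width lookbehind). A minimal check of the regex alone, not of MySQL's REGEXP_REPLACE:

```python
import re

# (?<![^_])                    : start of string or "_" before the field
# (?!(?:word1|word3)(?![^_]))  : the field is not exactly word1 or word3
# [^_]+                        : the field itself
# (?![^_])                     : end of string or "_" after the field
PATTERN = re.compile(r"(?<![^_])(?!(?:word1|word3)(?![^_]))[^_]+(?![^_])")

for s in ["word1_word2_word3_word4", "word10_word3_word345"]:
    print(PATTERN.sub("...", s))
# word1_..._word3_...
# ..._word3_...
```

The second input shows why the inner (?![^_]) matters: word10 and word345 start with an allowed word but are still replaced.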
Say I have this string:
["teads.tv, 15429, reseller, 15a9c44f6d26cbe1 ","video.unrulymedia.com,367782854,reseller","google.com, pub-8173359565788166, direct, f08c47fec0942fa0","google.com, pub-8804303781641925, reseller, f08c47fec0942fa0 "]
I am trying to extract all the text strings such as teads.tv, google.com, etc.
Each text string appears as "text.text, (a quote, the text, then a comma), but there are also combinations of ", with no characters in between.
I tried this Regex expression:
"(.*?)\,
but this also captures the empty combinations.
How can I modify the regex so that it captures only the combinations that have a string between " and ,?
Cheers,
If there should be at least a single non-whitespace char present other than ", [ and ], you can match optional whitespace chars and then use a negated character class listing all the characters that should not be matched, repeated 1 or more times:
"(\s*[^\][\s",]+),
A broader variation is to repeat 1+ times any char except a comma:
"([^,]+),
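To compare the question's attempt with the two patterns above on the sample input, here is a small Python sketch using re.findall, which returns the Group 1 values. The [ inside the negated class is escaped for Python, which does not change its meaning; the variable names are only for illustration.

```python
import re

s = '["teads.tv, 15429, reseller, 15a9c44f6d26cbe1 ","video.unrulymedia.com,367782854,reseller","google.com, pub-8173359565788166, direct, f08c47fec0942fa0","google.com, pub-8804303781641925, reseller, f08c47fec0942fa0 "]'

# Question's attempt: the lazy .*? may match nothing, so every '",' yields ''
original = re.findall(r'"(.*?)\,', s)

# Negated class excluding whitespace, quotes, brackets and commas
strict = re.findall(r'"(\s*[^\]\[\s",]+),', s)

# Broader variant: 1+ chars other than a comma
broad = re.findall(r'"([^,]+),', s)

print(original)  # ['teads.tv', '', 'video.unrulymedia.com', '', 'google.com', '', 'google.com']
print(strict)    # ['teads.tv', 'video.unrulymedia.com', 'google.com', 'google.com']
print(broad)     # ['teads.tv', 'video.unrulymedia.com', 'google.com', 'google.com']
```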
How about using + (one or more) instead of * (zero or more) as the quantifier:
"(.+?),
Additionally, you do not need to escape the , with a backslash.
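Switching to .+? does remove the empty captures, but it is worth checking what the lazy dot does on this particular input: at a closing quote followed by a comma it cannot stop after zero characters, so it keeps going until the next comma and swallows the ," sequence. A minimal Python check of the same pattern (comma left unescaped):

```python
import re

s = '["teads.tv, 15429, reseller, 15a9c44f6d26cbe1 ","video.unrulymedia.com,367782854,reseller","google.com, pub-8173359565788166, direct, f08c47fec0942fa0","google.com, pub-8804303781641925, reseller, f08c47fec0942fa0 "]'

# .+? must consume at least one char, so it can run across '",' to the next comma
lazy_plus = re.findall(r'"(.+?),', s)
print(lazy_plus)
# ['teads.tv', ',"video.unrulymedia.com', ',"google.com', ',"google.com']
```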
If we read the question as retrieving strings in dotted notation, such as domain names, then we are looking for the first string after each ".
This regex grabs the run of word chars, dots and hyphens that follows a quote, without including the quote itself.
const regEx = /(?:\")([\w\d\.\-]+)/g;
const input = '["teads.tv, 15429, reseller, 15a9c44f6d26cbe1 ","video.unrulymedia.com,367782854,reseller","google.com, pub-8173359565788166, direct, f08c47fec0942fa0","google.com, pub-8804303781641925, reseller, f08c47fec0942fa0 "]';
const regMatch = Array.from(input.matchAll(regEx), m => m[1]);
console.log(regMatch)
I can't figure out why this regex
/[\|]?([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures all the input, while this one
/[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures only |Func
The input string is |Func(param1, param2, param32, param54, param293, par13am, param)|
Also, how can I match a repeated capturing group in a normal way? E.g., I have the regex
/\(\(\s*([a-z\_]+){1}(?:\s+\,\s+(\d+)*)*\s*\)\)/gui
and the input string is (( string , 1 , 2 )).
Regex101 says "a repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations...". I've tried to follow this tip, but it didn't help me.
Your /[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g regex does not match the rest of the input because you did not define a pattern to match the words inside the parentheses. You might fix it as \|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?, but then all the values inside the parentheses would be matched into one single group that you would have to split later.
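A minimal Python sketch of that fix followed by the split step. Splitting with re.split on \s*,\s* is an assumed choice here; the answer does not prescribe how to split. Python's re module does not support \G or \K, so it cannot run the preg_match_all approach below as-is.

```python
import re

s = "|Func(param1, param2, param32, param54, param293, par13am, param)|"

# The fixed pattern: Group 1 = function name, Group 2 = everything inside (...)
m = re.search(r"\|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?", s)
name, params = m.group(1), m.group(2)

# Group 2 holds all parameters in one string, so split it afterwards
param_list = re.split(r"\s*,\s*", params)
print(name, param_list)
```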
It is not possible to get an arbitrary number of captures with a PCRE regex, since with repeated captures only the last captured value is stored in the group buffer.
What you can do instead is get multiple matches with preg_match_all, capturing the initial delimiter.
So, to match each parameter as a separate match, you may use
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\()\K\w+
Details:
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\() - either the end of the previous match (\G(?!\A)) and a comma enclosed in 0+ whitespaces (\s*,\s*), or 1+ | symbols (\|+), followed by 1+ alphanumeric chars (captured into Group 1, ([a-z0-9A-Z]+)) and a ( symbol (\()
\K - omit the text matched so far
\w+ - 1+ word chars.
Friends,
I want to fetch hashtags from a field.
select PREG_RLIKE("/[[:<:]]abcd[[:>:]]/","okok got it #abcd");
//output 1
BUT
select PREG_RLIKE("/[[:<:]]#abcd[[:>:]]/","okok got it #abcd");
//output 0
I don't understand why the # is not being taken into account.
Please help
The pattern matches:
[[:<:]] - a leading word boundary
#abcd - a literal string
[[:>:]] - a trailing word boundary.
Since a leading word boundary is a location between a non-word char and a word char (or between the start of the string and a word char), you can't expect it to match between a space and a hash symbol (#), as both are non-word chars.
Since you are using a PCRE-based UDF, use lookarounds instead:
select PREG_RLIKE("/(?<!\\w)#abcd(?!\\w)/","okok got it #abcd");
The (?<!\w) negative lookbehind acts like a leading word boundary, failing the match if the search term is preceded by a word char, and the (?!\w) negative lookahead fails the match if the search term is followed by a word char.
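The same behavior can be reproduced with Python's re module, where \b plays the role of the [[:<:]]/[[:>:]] boundaries (it needs a word char on one side). This is only a sketch of the regex logic, not of the PREG_RLIKE UDF itself.

```python
import re

text = "okok got it #abcd"

# \b needs a word char on one side, but " " and "#" are both non-word chars
no_boundary = re.search(r"\b#abcd\b", text)
# The lookarounds only require that no word char touches the search term
with_lookarounds = re.search(r"(?<!\w)#abcd(?!\w)", text)
# A word char right before "#" (or right after "d") still makes it fail
embedded = re.search(r"(?<!\w)#abcd(?!\w)", "x#abcdef")

print(no_boundary)               # None
print(with_lookarounds.group())  # #abcd
print(embedded)                  # None
```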
I am trying to do an iregex match in Django with the regular expression
reg_string = r"(\w|\d|\b|\s)+h(\w|\d|\b|\s)+(\w|\d|\b|\s)+anto(\w|\d|\b|\s)+"
self.queryset.filter(name__iregex=r"%s"%(reg_string,))
using "The Canton" as the name, but it does not return any value, while with Python's re.search it works:
import re
print(re.search(r'(\w|\d|\b|\s)+h(\w|\d|\b|\s)+(\w|\d|\b|\s)+anto(\w|\d|\b|\s)+', 'The Canton', re.I).group())
I am using MySQL 5.7. Does anyone know how to fix this?
Note that MySQL REGEXP does not support shorthand character classes like \s, \d, \w, etc. It supports some basic POSIX character classes like [:digit:], [:alpha:], [:alnum:], etc.
Even if you keep using the pattern in Python, you should not write (\w|\d|\b|\s)+: it matches and captures a single word char, digit, word boundary (which is zero-width) or whitespace, 1 or more times, rewriting the buffer of Group N with the latest char the engine matched. You could rewrite it as a single character class, [\w\s]+ (\d is already covered by \w).
Now, your pattern in MySQL will look like
[_[:alnum:][:space:]]+h[_[:alnum:][:space:]]+anto[_[:alnum:][:space:]]+
where [\w\s]+ is turned into [_[:alnum:][:space:]]+:
[ - start of a bracket expression
_ - an underscore (as \w matches _ and [:alnum:] does not)
[:alnum:] - an alphanumeric char
[:space:] - any whitespace char
] - end of the bracket expression
+ - quantifier, 1 or more times.
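Python does not understand POSIX bracket classes such as [:alnum:], so the MySQL pattern itself has to be tested in MySQL. What can be checked in Python is that the [\w\s]+ rewrite still matches the same name as the original alternation-based pattern. A minimal sketch:

```python
import re

name = "The Canton"
original = r"(\w|\d|\b|\s)+h(\w|\d|\b|\s)+(\w|\d|\b|\s)+anto(\w|\d|\b|\s)+"
# One character class per run, and no capturing groups to rewrite
simplified = r"[\w\s]+h[\w\s]+anto[\w\s]+"

orig_match = re.search(original, name, re.I)
simple_match = re.search(simplified, name, re.I)

print(orig_match is not None)  # True
print(simple_match.group())    # The Canton
```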
I need a regex that will only find matches where the entire string matches my query.
For instance if I do a search for movies with the name "Red October" I only want to match on that exact title (case insensitive) but not match titles like "The Hunt For Red October". Not quite sure I know how to do this. Anyone know?
Thanks!
Try the following regular expression:
^Red October$
By default, regular expressions are case sensitive. The ^ marks the start of the matching text and $ the end.
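Since the question asks for a case-insensitive match, that has to be switched on explicitly. A minimal sketch in Python, where the flag is re.IGNORECASE (other languages name it differently, e.g. an i modifier):

```python
import re

# re.IGNORECASE gives the case-insensitive matching the question asks for
pattern = re.compile(r"^Red October$", re.IGNORECASE)

for title in ["Red October", "red october", "The Hunt For Red October"]:
    print(title, "->", pattern.search(title) is not None)
# Red October -> True
# red october -> True
# The Hunt For Red October -> False
```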
Generally, and with default settings, ^ and $ anchors are a good way of ensuring that a regex matches an entire string.
A few caveats, though:
If you have alternation in your regex, be sure to enclose your regex in a non-capturing group before surrounding it with ^ and $:
^foo|bar$
is of course different from
^(?:foo|bar)$
Also, ^ and $ can take on a different meaning (start/end of line instead of start/end of string) if certain options are set. In text editors that support regular expressions, this is usually the default behaviour. In some languages, especially Ruby, this behaviour cannot even be switched off.
Therefore there is another set of anchors that are guaranteed to only match at the start/end of the entire string:
\A matches at the start of the string.
\Z matches at the end of the string or before a final line break.
\z matches at the very end of the string.
But not all languages support these anchors, most notably JavaScript.
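The caveats above can be tried out in Python. One Python-specific detail: Python's \Z matches only at the very end of the string, so it behaves like the \z described above rather than like \Z in other flavors. The helper name matches is just for this sketch.

```python
import re


def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """True if re.search finds the pattern anywhere in text."""
    return re.search(pattern, text, flags) is not None


# Alternation: ^foo|bar$ means "foo at the start" OR "bar at the end"
print(matches(r"^foo|bar$", "foo and more"))      # True
print(matches(r"^(?:foo|bar)$", "foo and more"))  # False

text = "Intro\nRed October\nEnd"
# With re.MULTILINE, ^ and $ match at line boundaries, so one line is enough
print(matches(r"^Red October$", text, re.MULTILINE))    # True
# \A and \Z ignore re.MULTILINE and only match at the string boundaries
print(matches(r"\ARed October\Z", text, re.MULTILINE))  # False

# Python's $ also matches before a final "\n"; Python's \Z does not
print(matches(r"^Red October$", "Red October\n"))    # True
print(matches(r"\ARed October\Z", "Red October\n"))  # False
# re.fullmatch is another way to require that the whole string matches
print(re.fullmatch(r"Red October", "Red October\n") is not None)  # False
```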
I know this may be a little late, but maybe it will come in handy for someone else.
Simplest way:
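// Requires: using System.Text.RegularExpressions;
var someString = "...";
var someRegex = "...";
var match = Regex.Match(someString, someRegex);
// Pass only if the match covers the whole input string
if (match.Success && match.Value.Length == someString.Length) {
    //pass
} else {
    //fail
}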
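One thing to watch with the length comparison: Regex.Match returns the leftmost match, and backtracking engines such as .NET's and Python's try alternatives from left to right, so a shorter match can win even when a full-length one exists. A sketch of the effect in Python, where re.search plays the role of Regex.Match and re.fullmatch is the built-in whole-string check:

```python
import re

s = "ab"
pattern = "a|ab"

# Leftmost match: the engine tries "a" first and stops there
m = re.search(pattern, s)
length_check = m is not None and len(m.group()) == len(s)
print(m.group(), length_check)  # a False

# fullmatch backtracks into the alternation until the whole string is covered
full = re.fullmatch(pattern, s)
print(full is not None)         # True
```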
Use the ^ and $ anchors to denote where the regex pattern sits relative to the start and end of the string:
Regex.IsMatch("Red October", "^Red October$"); // true (pass)
Regex.IsMatch("The Hunt for Red October", "^Red October$"); // false (fail)
You need to enclose your regex in ^ (start of string) and $ (end of string):
^Red October$
If the string may contain regex metacharacters (., {, }, (, ), $, etc.), I suggest using
^\QYourString\E$
\Q starts quoting all the characters until \E.
Otherwise the regex may match the wrong text or even be invalid.
If the language takes the regex as a string literal (as in the example), the backslashes must be doubled:
^\\QYourString\\E$
Hope this tip helps somebody.
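Not every flavor supports \Q...\E; Python's re, for one, rejects \Q as a bad escape, so there the string is escaped with re.escape instead. A sketch with a hypothetical title that contains metacharacters:

```python
import re

title = "Red October (1990)"  # hypothetical title with regex metacharacters

# Used raw, "(1990)" is a group matching "1990" without the parentheses
plain = re.fullmatch(title, title)
# re.escape backslash-escapes the metacharacters, similar to \Q...\E
escaped = re.fullmatch(re.escape(title), title)

print(plain)                # None
print(escaped is not None)  # True
```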
Sorry, but that's a little unclear.
From what I read, you want to do a simple string comparison. You don't need a regex for that.
string myTest = "Red October";
bool isMatch = (myTest.ToLower() == "Red October".ToLower());
Console.WriteLine(isMatch); // True
isMatch = (myTest.ToLower() == "The Hunt for Red October".ToLower());
Console.WriteLine(isMatch); // False
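The same plain comparison in Python, using str.casefold() for the case-insensitive part; the helper name same_title is hypothetical:

```python
def same_title(a: str, b: str) -> bool:
    """Case-insensitive equality; casefold() is a stronger lower() meant for this."""
    return a.casefold() == b.casefold()


print(same_title("Red October", "red october"))               # True
print(same_title("Red October", "The Hunt for Red October"))  # False
```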
You can do it like this. For example, if I only want to match the lowercase letter e once, it can be checked with myRegex.IsMatch(); note that this particular pattern only accepts three-character strings with an e in the middle:
^[^e][e]{1}[^e]$
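To see what the pattern accepts, here is a Python sketch comparing it with ^[^e]*e[^e]*$, which is an assumed reading of the intent (strings of any length containing exactly one lowercase e):

```python
import re

three_chars = re.compile(r"^[^e][e]{1}[^e]$")  # the pattern above
exactly_one_e = re.compile(r"^[^e]*e[^e]*$")   # assumed intent: any length, one "e"

for word in ["bed", "bread", "beer", "dog"]:
    print(word, three_chars.search(word) is not None, exactly_one_e.search(word) is not None)
# bed True True
# bread False True
# beer False False
# dog False False
```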