Regexp to match JSON key:value pairs with commas in value [duplicate] - json

Can't get why this regex (regex101)
/[\|]?([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures all the input, while this (regex101)
/[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g
captures only |Func
Input string is |Func(param1, param2, param32, param54, param293, par13am, param)|
Also, how can I capture every iteration of a repeated capturing group? E.g. I have this regex:
/\(\(\s*([a-z\_]+){1}(?:\s+\,\s+(\d+)*)*\s*\)\)/gui
And input string is (( string , 1 , 2 )).
Regex101 says "a repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations...". I tried to follow this tip, but it didn't help.

Your /[\|]+([a-z0-9A-Z]+)(?:[\(]?[,][\)]?)?[\|]?/g regex does not match because you did not define a pattern to match the words inside parentheses. You might fix it as \|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?, but all the values inside parentheses would be matched into one single group that you would have to split later.
It is not possible to get an arbitrary number of captures with a PCRE regex, as in case of repeated captures only the last captured value is stored in the group buffer.
What you can do is get multiple matches with preg_match_all, capturing the initial delimiter.
So, to match the second string, you may use
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\()\K\w+
See the regex demo.
Details:
(?:\G(?!\A)\s*,\s*|\|+([a-z0-9A-Z]+)\() - either the end of the previous match (\G(?!\A)) and a comma enclosed with 0+ whitespaces (\s*,\s*), or 1+ | symbols (\|+), followed with 1+ alphanumeric chars (captured into Group 1, ([a-z0-9A-Z]+)) and a ( symbol (\()
\K - omit the text matched so far
\w+ - 1+ word chars.
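For readers outside PHP, the two points made in this answer (a repeated group keeps only its last iteration, and the values can instead be captured as one group and split afterwards) can be sketched in Python. This is only an illustration under assumptions: Python's built-in re module stands in for PCRE, it does not support \G or \K, and the first pattern and all variable names are made up for the example.

```python
import re

text = "|Func(param1, param2, param32, param54, param293, par13am, param)|"

# Hypothetical pattern with a repeated capturing group (group 2).
# After the match, group 2 holds only the value from its last iteration.
repeated = re.search(r"\|+([a-zA-Z0-9]+)\((?:\s*(\w+)\s*,?)+\)", text)
last_only = repeated.group(2)

# The pattern suggested in the answer: the whole argument list lands in
# a single group, which is then split on commas in a second step.
call = re.search(r"\|+([a-z0-9A-Z]+)(?:\(?(\w+(?:\s*,\s*\w+)*)\)?)?\|?", text)
name = call.group(1)
params = re.split(r"\s*,\s*", call.group(2))

print(last_only)
print(name, params)
```

The split step is the price of not having one capture per repetition; the \G-based pattern above avoids it in PCRE by producing one match per parameter.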

Related

Extracting text string between 2 characters with Regex

Say I have this string:
["teads.tv, 15429, reseller, 15a9c44f6d26cbe1 ","video.unrulymedia.com,367782854,reseller","google.com, pub-8173359565788166, direct, f08c47fec0942fa0","google.com, pub-8804303781641925, reseller, f08c47fec0942fa0 "]
I am trying to extract all the text strings like teads.tv, google.com and etc.
Each text string is placed in the following way "text.text,, but there are also combinations of ", without any character in between.
I tried this Regex expression:
"(.*?)\,
but I also capture the empty combinations, you can check it out here.
How can I modify the Regex expression, so it would capture only the combination with a string between ",?
Cheers,
If there should be at least one non-whitespace char other than " , [ ] present, you can match optional whitespace chars and then use a negated character class that lists all the characters that should not be matched, repeated 1 or more times:
"(\s*[^\][\s",]+),
Regex demo
The more broad variation is to repeat 1+ times any char except a comma:
"([^,]+),
Regex demo
How about using + (one or more) instead of * (zero or more) as quantifier:
"(.+?),
Additionally, you may not need to escape , with backslash.
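To compare the patterns suggested so far, here is a small Python sketch that runs each of them over the input from the question. It is an illustration only: Python's re module is an assumption (the question does not name a regex flavor), and the variable names are invented. In this run the two negated-class patterns return just the domain names, while the "(.+?), variant can also start at a closing quote and pick up the following ," sequence.

```python
import re

data = '["teads.tv, 15429, reseller, 15a9c44f6d26cbe1 ","video.unrulymedia.com,367782854,reseller","google.com, pub-8173359565788166, direct, f08c47fec0942fa0","google.com, pub-8804303781641925, reseller, f08c47fec0942fa0 "]'

# Negated character class: no whitespace, quote, comma or bracket allowed
negated = re.findall(r'"(\s*[^\][\s",]+),', data)

# Broader variant: anything except a comma, one or more times
no_comma = re.findall(r'"([^,]+),', data)

# Lazy dot with +: requires at least one character, but that character
# may itself be a comma, so a closing quote can start a match too
lazy_plus = re.findall(r'"(.+?),', data)

print(negated)
print(no_comma)
print(lazy_plus)
```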
Reading the question as retrieving the string with a dotted notation such as domain names means that we are looking for the first string after a ".
This regex will grab strings with dots within them, but avoid the quote characters.
const regEx = /(?:\")([\w\d\.\-]+)/g;
const input = '["teads.tv, 15429, reseller, 15a9c44f6d26cbe1 ","video.unrulymedia.com,367782854,reseller","google.com, pub-8173359565788166, direct, f08c47fec0942fa0","google.com, pub-8804303781641925, reseller, f08c47fec0942fa0 "]';
const regMatch = Array.from(input.matchAll(regEx), m => m[1]);
console.log(regMatch)

Regex grouping: must start with /, optional group of characters alpha-numeric with forward slashes and total 1-255 characters

I have an HTML5 input element with a pattern attribute. I'm having some trouble with an optional group.
The (relative) URL must start with a forward slash (I have this working).
The total (relative) URL may contain a total of up to 255 characters.
All characters from 2-255 must be (lowercase) alpha-numeric or a forward slash.
Separately the forward slash regex works and the 2-255 part works for alpha-numeric and forward slashes. However I'm having trouble allowing both groups with the second group being optional.
What I have confirmed to work:
pattern="^\/"
pattern="[a-z0-9\/]"
However I can't determine how to allow the second group as an option (I've tried adding the ? after the ending square bracket in example without luck).
I also am not sure how to combine the length ({255,}) bit to the total pattern expression.
How do I combine all three aspects of the regular expression?
Note: tags seem to be broken at the moment of posting this.
You can use
pattern="/[a-z0-9/]{0,254}"
By the way, you do not need ^ or $ in the pattern regex: it must match the whole string anyway, and it will be parsed as the ^(?:/[a-z0-9/]{0,254})$ pattern. That is, it will match a string that starts with / and then contains 0 to 254 lowercase ASCII letters, digits or slashes up to the end of the string.
Note that / should only be escaped in regex literals where / is used as a delimiter char. pattern regexps are defined with literal strings.
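The length arithmetic (one leading slash plus at most 254 further characters gives 255 in total) can be checked outside the browser. The sketch below is an assumption-laden approximation: it uses Python's re.fullmatch to imitate the implicit ^(?:...)$ wrapping of the pattern attribute, and is_valid_path is a hypothetical helper name.

```python
import re

# Same expression as in the pattern attribute above
PATTERN = r"/[a-z0-9/]{0,254}"


def is_valid_path(value: str) -> bool:
    """Return True if the whole value matches, as an HTML5 pattern would require."""
    return re.fullmatch(PATTERN, value) is not None


print(is_valid_path("/"))                 # a lone slash: the second part is optional
print(is_valid_path("/blog/2020/post1"))  # lowercase letters, digits and slashes
print(is_valid_path("blog"))              # no leading slash
print(is_valid_path("/" + "a" * 254))     # 255 characters in total
print(is_valid_path("/" + "a" * 255))     # 256 characters in total
```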

Regex / Pattern HTML email

Is there a way to combine two regexes?
I have this one, which prevents the user from using this email (test#test.com):
pattern="^((?!test#test.com).)*$"
I also have one which validates email syntax
pattern="[a-z0-9._%+-]{3,}#[a-z]{3,}([.]{1}[a-z]{2,}|[.]{1}[a-z]{2,}[.]{1}[a-z]{2,})"
How can I merge those two regexes in order to prevent the user from using test#test.com and to validate the email syntax?
I tried to use an OR operator (a single pipe), but I am missing something and it doesn't work...
Thanks !
It seems you may use
pattern="(?!test#test\.com$)[a-z0-9._%+-]{3,}#[a-z]{3,}\.[a-z]{2,}(?:\.[a-z]{2,})?"
Note that the HTML5 patterns are automatically anchored as they are wrapped with ^(?: and )$ at the start/end, so no need adding ^ and $ at the start/end of the pattern.
The (?!test#test\.com$) negative lookahead will fail the match if the input string is equal to the test#test.com string (unlike your first regex that only fails the input that contains the email).
The rest is your second pattern, I only removed {1} that are implicit and contracted an alternation group to a \.[a-z]{2,}(?:\.[a-z]{2,})? where (?:\.[a-z]{2,})? is an optional non-capturing group matching 1 or 0 sequences of . and 2 or more lowercase ASCII letters.
Add A-Z to the character classes to also support uppercase ASCII letters.
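The effect of the anchored negative lookahead can be tried out with a short Python sketch. It is only an approximation of the browser behavior: re.fullmatch is assumed as a stand-in for the implicit anchoring of the pattern attribute, is_allowed is a hypothetical helper, and the # in the addresses is kept exactly as it appears in the patterns above.

```python
import re

# Combined pattern from the answer: lookahead first, then the syntax check
PATTERN = (
    r"(?!test#test\.com$)"
    r"[a-z0-9._%+-]{3,}#[a-z]{3,}\.[a-z]{2,}(?:\.[a-z]{2,})?"
)


def is_allowed(address: str) -> bool:
    """Return True if the whole address passes both checks."""
    return re.fullmatch(PATTERN, address) is not None


print(is_allowed("test#test.com"))     # rejected by the lookahead
print(is_allowed("test#test.com.au"))  # only the exact string is blocked
print(is_allowed("user#example.com"))  # passes the syntax check
print(is_allowed("ab#example.com"))    # local part shorter than 3 chars
```

Because the lookahead ends with $, it blocks the exact address only; longer addresses that merely start with it still go through the syntax check.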

To fetch hash tags from a string in mysql

Friends,
I want to fetch hashtags from a field.
select PREG_RLIKE("/[[:<:]]abcd[[:>:]]/","okok got it #abcd");
//output 1
BUT
select PREG_RLIKE("/[[:<:]]#abcd[[:>:]]/","okok got it #abcd");
//output 0
I don't understand why the # is not being taken into account.
Please help
The pattern matches:
[[:<:]] - a leading word boundary
#abcd - a literal string
[[:>:]] - a trailing word boundary.
Since a leading word boundary is a location between a non-word and a word char (or start of a string and a word char), you can't expect it to be matched between a space (non-word char) and a hash symbol (#).
Since you are using a PCRE based UDF function, use lookarounds:
select PREG_RLIKE("/(?<!\\w)#abcd(?!\\w)/","okok got it #abcd");
The (?<!\w) negative lookbehind acts like a leading word boundary failing the match if the search term is preceded with a word char, and (?!\w) negative lookahead fails the match if the search term is followed with a word char.
See the regex demo.
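The difference between word boundaries and lookarounds around a # can be reproduced in Python. This is a sketch under assumptions: Python's \b is used as a rough counterpart of [[:<:]] and [[:>:]], and the re module stands in for the PCRE-based UDF.

```python
import re

text = "okok got it #abcd"

# \b needs a word char on one side; a space followed by # has none,
# so the boundary before # cannot match
with_boundaries = re.search(r"\b#abcd\b", text)

# Lookarounds only check that no word char is adjacent to the hashtag
lookaround = r"(?<!\w)#abcd(?!\w)"
with_lookarounds = re.search(lookaround, text)

print(with_boundaries)
print(with_lookarounds.group())

# The lookarounds still reject a hashtag glued to other word chars
print(re.search(lookaround, "ok#abcd"))
print(re.search(lookaround, "got it #abcdef"))
```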

MySQL Regex for matching exactly 3 same chars but not 4 same chars within a larger string

I am trying to write a regex for mysql in PHP to find (at least one occurrence of) exactly 3 of the same characters in a row, but not 4 (or more) of the same.
Eg for "000" I want to find:
0//////0/00/ LS///////000
000////0/00/ LS//////////
0//////0/00/ LS////000///
0//////000// LS//////000/
0//////000// LS//00000000
but not:
0//////0000/ LS//////////
0//////0000/ LS//////////
0/////00000/ LS//////////
I have tried the code below which I thought would match 3 zeros preceded and followed by zero or more chars which are not 0, but this resulted in some rows with single 0's and some 000000's
REGEXP '[^0]*[0{3}][^0]*'
Many thanks.
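If you plan to use a regex in MySQL, you cannot use lookarounds (this holds for versions before 8.0.4, whose regex engine does not support them; later versions switched to the ICU library, which does). Thus, you can use alternation with negated character class and anchors: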
(^|[^0])0{3}([^0]|$)
See the regex demo
Explanation:
(^|[^0]) - a group matching either the start of string (^) or a character other than 0
0{3} - exactly 3 zeros
([^0]|$) - a group matching either a character other than 0 or the end of string ($).
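The pattern can be checked against the sample rows from the question with a short Python sketch. It is an illustration only: Python's re module is assumed to behave like MySQL's REGEXP for this simple pattern, and the variable names are invented. The sketch also runs the original attempt, where [0{3}] is a character class matching a single 0, {, 3 or }, which is consistent with the unwanted rows the question reports.

```python
import re

EXACTLY_THREE = re.compile(r"(^|[^0])0{3}([^0]|$)")
ATTEMPT = re.compile(r"[^0]*[0{3}][^0]*")  # pattern from the question

wanted = [
    "0//////0/00/ LS///////000",
    "000////0/00/ LS//////////",
    "0//////0/00/ LS////000///",
    "0//////000// LS//////000/",
    "0//////000// LS//00000000",
]
unwanted = [
    "0//////0000/ LS//////////",
    "0/////00000/ LS//////////",
]

# True where a run of exactly three zeros is found somewhere in the row
wanted_hits = [bool(EXACTLY_THREE.search(row)) for row in wanted]
unwanted_hits = [bool(EXACTLY_THREE.search(row)) for row in unwanted]

# The original attempt matches any row containing a single 0
attempt_hits = [bool(ATTEMPT.search(row)) for row in unwanted]

print(wanted_hits)
print(unwanted_hits)
print(attempt_hits)
```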