Regex to match a username - html

I am trying to create a regex to validate usernames. It should enforce the following rules:
At most one special character (._-) is allowed, and it must not be the first or last character of the string
The first character cannot be a number
All the other characters allowed are letters and numbers
The total length should be between 3 and 20 chars
This is for an HTML validation pattern, so sadly it must be one big regex.
So far this is what I've got:
^(?=(?![0-9])[A-Za-z0-9]+[._-]?[A-Za-z0-9]+).{3,20}
But the positive lookahead only checks a prefix of the string, and the trailing .{3,20} then accepts any characters, so more than one special character can get through, which is not what I want. I don't know how to correct that.

You should split your regex into two parts (not two expressions!) to make your life easier.
First, match the format the username needs to have:
^[a-zA-Z][a-zA-Z0-9]*[._-]?[a-zA-Z0-9]+$
Now we just need to validate the length constraint. To avoid interfering with the pattern we already have, you can use a non-consuming match (a positive lookahead) that only validates the number of characters. It is essentially a hack for creating an AND condition inside a regular expression: (?=^.{3,20}$)
The regex will only try to match the valid format if the length constraint is satisfied. The lookahead is non-consuming, so after it succeeds the engine is still at the start of the string.
So, all together:
(?=^.{3,20}$)^[a-zA-Z][a-zA-Z0-9]*[._-]?[a-zA-Z0-9]+$
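To check the combined pattern against a few inputs, here is a minimal sketch in Python, whose `re` module handles this lookahead the same way for these cases. The function name `is_valid_username` and the sample strings are made up for illustration; they are not part of the original answer.

```python
import re

# Combined pattern from the answer: length lookahead first, then the format check.
USERNAME = re.compile(r"(?=^.{3,20}$)^[a-zA-Z][a-zA-Z0-9]*[._-]?[a-zA-Z0-9]+$")

def is_valid_username(name: str) -> bool:
    """Return True when `name` satisfies both the length and the format rule."""
    return USERNAME.match(name) is not None

# Hypothetical sample inputs covering each rule.
samples = ["john.doe", "a_b", "ab", "1abc", ".abc", "abc.", "a.b.c", "a" * 21]
for s in samples:
    print(f"{s!r}: {is_valid_username(s)}")
```

In an HTML pattern attribute the whole value has to match anyway, so the explicit ^ and $ are redundant there but harmless.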

I think you need to use ? instead of +, so that the special character is matched at most once.
^(?=(?![0-9])?[A-Za-z0-9]?[._-]?[A-Za-z0-9]+).{3,20}

Related

Finding common phrases in rows that have dynamic content

I'm using MySQL, and I am trying to find common strings above a given character length within a series of highly dynamic messages. Each message may contain a common phrase, but the phrase is surrounded by reference codes or names that don't follow a specific format on either side of the string. For example, these are the types of common phrases I'm trying to scan for, with dynamic content embedded in different formats (https://screencast.com/t/rlABTWitQ)
The end result I am looking for is something akin to this (https://screencast.com/t/qXzrGNFuf)
Because the formats of these messages are so variable, I can't get anything working with SUBSTRING_INDEX or REGEXP (as far as my amateur familiarity with REGEXP has taken me).
SELECT LEFT("first_middle_last", CHAR_LENGTH("first_middle_last") - LOCATE('_', REVERSE("first_middle_last")));
I can't use something like this, as it would only strip on one specific character. As you can see, the strings vary too much in format.

Can we validate .0 using regex

I'm building a field that should only accept whole numbers, so I added a regex validation like this: /^\d{1,3}$/. It validates whole-number entries and rejects decimals from .1 upwards, e.g. it marks 1.1 as invalid, but when I tried to input 1.0 it was accepted. Is there a regex that will also check for .0?
^\d{1,3}(\.0)?$ accepts one-, two- or three-digit whole numbers, optionally followed by .0.
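A quick way to see what each pattern accepts is to run both against a few strings. This is a minimal Python sketch; the helper name `accepts` and the sample values are illustrative only. On raw strings the original pattern rejects 1.0, so if the form accepted it, the value may have been normalized to a number before the test ran (that is a guess, not something stated in the question).

```python
import re

ORIGINAL = re.compile(r"^\d{1,3}$")           # pattern from the question
WITH_ZERO = re.compile(r"^\d{1,3}(\.0)?$")    # pattern from the answer

def accepts(pattern: re.Pattern, value: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return pattern.match(value) is not None

for value in ["1", "123", "1.0", "1.1", "1234", "1.", "1.00"]:
    print(f"{value!r}: original={accepts(ORIGINAL, value)}, "
          f"with .0={accepts(WITH_ZERO, value)}")
```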

REGEX differentiating inner match from outer match

I am working with REGEX on complex JSON representing objects, each represented by UUID's. The problem is the REGEX that matches each individual object also matches a larger pattern. Take, for example, the following:
{_id:"(UUID)" value:"x"}(additional info here),{_id:"(UUID)" value:"y"}(additional info here)
Now, if I use a pattern such as /{_id:"(.+?)".+value:"(.+?)"}/g to grab the ID and value of each object, won't it match the larger pattern, i.e. the first ID and the last value, instead of matching each object individually?
What's the best way to ensure each group is individually pulled and not a larger pattern which also matches?
The problem with your regex /{_id:"(.+?)".+value:"(.+?)"}/g
was that the middle .+ is greedy; it should be .+?
So now the regex is:
/{_id:"(.+?)".+?value:"(.+?)"}/g
https://regex101.com/r/xK0qJ8/2
I was able to figure it out: I wasn't using the non-greedy "?" correctly. I was able to get each object individually by using the following:
/{_id:"(.+?)".+?value:"(.+?)"}/g
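The difference between the greedy and the non-greedy middle part can be reproduced with a small Python sketch. The IDs `id-1` and `id-2` are placeholder values standing in for the UUIDs, and the braces are escaped here only to keep the pattern unambiguous in Python.

```python
import re

# Sample input shaped like the one in the question; the IDs are placeholders.
text = ('{_id:"id-1" value:"x"}(additional info here),'
        '{_id:"id-2" value:"y"}(additional info here)')

# Greedy middle part (.+): runs to the last 'value:' in the string.
greedy = re.compile(r'\{_id:"(.+?)".+value:"(.+?)"\}')
# Non-greedy middle part (.+?): stops at the nearest 'value:'.
lazy = re.compile(r'\{_id:"(.+?)".+?value:"(.+?)"\}')

greedy_matches = greedy.findall(text)
lazy_matches = lazy.findall(text)

print(greedy_matches)  # [('id-1', 'y')]
print(lazy_matches)    # [('id-1', 'x'), ('id-2', 'y')]
```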

MySQL: Is it safe to lowercase or uppercase regular expression?

I use regular expressions in MySQL on multibyte-encoded (UTF-8) data, but I need them to match case-insensitively. As MySQL has a bug (unresolved for many years) that prevents it from matching multibyte-encoded strings case-insensitively, I am trying to simulate the case insensitivity by lowercasing both the value and the regexp pattern. Is it safe to lowercase a regexp pattern this way? I mean, are there any edge cases I forgot?
Could the following cause any problems?
LOWER('šárKA') REGEXP LOWER('^Šárka$')
Update: I edited the question to be more concrete.
MySQL documentation:
The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
This is a bug filed in 2007 that is still unresolved. However, I can't just switch databases to solve this issue. I need MySQL to somehow consider 'Š' equal to 'š', even if that means hacking it with a not-so-elegant solution. Characters other than accented (multi-byte) ones match fine, with no issues.
The i option for the regex will make sure it matches case-insensitively.
Example:
'^(?i)Foo$' // (?i) turns on case insensitivity for the rest of the regex
'/^Foo$/i' // the i option turns off case sensitivity
Note that these may not work in your particular flavour of regex (which you haven't mentioned), so make sure you consult your manual for the correct syntax.
Update:
From here: http://dev.mysql.com/doc/refman/5.1/en/regexp.html
REGEXP is not case sensitive, except when used with binary strings.
As no one actually answered my original question, I did my own research and realized that it is not safe to lowercase or uppercase a regular expression without any other processing. To be precise, it is safe with theoretically pure regular expressions, but practically every implementation adds character classes and special directives that can be vulnerable to case changes:
Escape sequences like \n, \t, etc.
Character classes like \W (non-alphanumeric) and \w (alphanumeric).
Character classes like [.characters.], [=character_class=], or [:character_class:] (MySQL regular expressions dialect).
Lowercasing or uppercasing \W and \w could completely change the regular expression's meaning. This leads to the following conclusions:
The presented solution, applied naively, is a no-go.
The presented solution is possible, but the regular expression must be lowercased in a more sophisticated way than just by using LOWER or something similar. It has to be parsed and the case has to be changed carefully.
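The hazard can be illustrated with a small sketch. It uses Python's `re` module and `str.lower()` as stand-ins for MySQL's REGEXP and LOWER, which is an assumption made purely for demonstration. The helper `lower_outside_escapes` is hypothetical and only protects backslash escapes; it does not handle bracket classes such as [:character_class:].

```python
import re

def lower_outside_escapes(pattern: str) -> str:
    """Lowercase a pattern but leave backslash escapes such as \\W untouched."""
    def repl(m: re.Match) -> str:
        token = m.group(0)
        # Tokens starting with a backslash are escapes: keep them as written.
        return token if token.startswith("\\") else token.lower()
    # Split the pattern into escape pairs and runs of ordinary characters.
    return re.sub(r"\\.|[^\\]+", repl, pattern)

pattern = r"^Šárka\W$"          # expects one non-word character at the end
subject = "šárKA!".lower()      # 'šárka!'

naive = pattern.lower()                     # \W silently becomes \w
careful = lower_outside_escapes(pattern)    # \W is preserved

print(naive, bool(re.search(naive, subject)))
print(careful, bool(re.search(careful, subject)))
```

With the naive version the trailing "!" no longer matches, because the class has flipped from non-word to word characters.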

Email address regular expression with max length

I have a regular expression that I am using for client-side HTML5 validation, and I need to add a max length element to it. Here is my regular expression:
@pattern = @"^([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)$"
How would I for example limit it to 50 characters?
EDIT: I need to check the max length in the same regular expression, as I am using HTML5 validation, which currently only allows checking against the required and pattern attributes.
If you absolutely must use a regex, add a lookahead assertion at the start of the regex:
@pattern = @"^(?!.{51})([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)$"
The (?!.{51}) asserts that it's impossible to match 51 characters starting from the beginning of the string, without actually consuming any of the characters, so they are still available for the actual regex match.
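The effect of the negative lookahead at the 50/51-character boundary can be checked with a short Python sketch. It assumes the separator between the local part and the domain is @, splits the pattern into the hypothetical names `LOCAL` and `DOMAIN` for readability, and uses made-up example.com addresses.

```python
import re

# The two halves of the email pattern, split only for readability.
LOCAL = r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
DOMAIN = r"(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"

# (?!.{51}) fails as soon as 51 characters can be read from the start,
# so only strings of at most 50 characters reach the real pattern.
EMAIL_MAX_50 = re.compile(r"^(?!.{51})(" + LOCAL + "@" + DOMAIN + r")$")

def is_valid_email(address: str) -> bool:
    """Return True for addresses that match the pattern and are <= 50 chars."""
    return EMAIL_MAX_50.match(address) is not None

at_limit = "a" * 38 + "@example.com"     # built to be exactly 50 characters
over_limit = "a" * 39 + "@example.com"   # one character too long

print(len(at_limit), is_valid_email(at_limit))
print(len(over_limit), is_valid_email(over_limit))
```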