I have some knowledge of MySQL regexp syntax, but I am not proficient. I was looking for a way to construct a pattern that selects all names from a MySQL table that contain unusual characters resulting from international input, such as Spanish names containing ñ (an n with a tilde on top).
I came upon the following pattern, which I tried, and it worked.
[^a-zA-Z0-9#:. \'\-`,\&]
The query is:
SELECT *
FROM orders_table
WHERE customers_name REGEXP '[^a-zA-Z0-9#:. \'\-`,\&]'
However I would like to understand how this pattern was constructed and what each part means.
The idea is that anything between [...] is a character class, which matches any single character that's in the set between the [ and ].
Adding a ^ to the start of the list of characters negates the class, which means it matches any character NOT in the set. Putting a ^ anywhere but the start of the [ ... ] means it's just a regular ^ character to match, and in no case does ^ inside a character class mean a start-of-line anchor.
Ranges work, such as a-z. If you want a literal dash in the set, you either put it first (possibly after the ^) or last, or, in engines that support escapes inside a class, quote it with \
Edit: the other special characters - # : etc. - are not special in this context, they just match as regular characters.
[a-zA-Z0-9#:. \'-`,\&] - Let us consider all the important parts of this pattern.
[] - denotes a character class which matches any single character within [].
[a-zA-Z0-9] -- One character (lowercase or uppercase) that is in the range of a-z,A-Z OR 0-9.
All other characters which are present within the [] match for a particular single character.
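The single quote ' is escaped with "\" because it would otherwise end the SQL string literal. The ampersand & has no special meaning in a MySQL string or in a character class, so its "\" is harmless but unnecessary.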
To match a pattern which has a character other than any of these symbols within [], we use a negation "^" at the beginning.
Thus, as you said, your pattern selects all names from a MySQL table that contain characters outside this set.
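To see the negated class at work outside the database, here is a minimal sketch using Python's re module rather than MySQL. The sample names and the UNUSUAL name are made up for illustration, and the hyphen is escaped explicitly so that it cannot be read as a range.

```python
import re

# Negated character class: matches any single character that is NOT
# a letter, digit, or one of the listed punctuation characters.
# The hyphen is escaped so it is a literal, not part of a range.
UNUSUAL = re.compile(r"[^a-zA-Z0-9#:. '\-`,&]")

# Hypothetical sample data, not taken from the question.
names = ["John O'Neil", "Peña", "Smith & Sons", "Zoë"]

# search() succeeds as soon as one character outside the set is found.
flagged = [name for name in names if UNUSUAL.search(name)]
print(flagged)
```

Only the names containing ñ and ë are flagged; the apostrophe, space, and ampersand are all inside the allowed set.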
I have an HTML5 input element with a pattern attribute. I'm having some trouble with an optional group.
The (relative) URL must start with a forward slash (I have this working).
The total (relative) URL may contain a total of up to 255 characters.
All characters from 2-255 must be (lowercase) alpha-numeric or a forward slash.
Separately the forward slash regex works and the 2-255 part works for alpha-numeric and forward slashes. However I'm having trouble allowing both groups with the second group being optional.
What I have confirmed to work:
pattern="^\/"
pattern="[a-z0-9\/]"
However I can't determine how to allow the second group as an option (I've tried adding the ? after the ending square bracket in example without luck).
I also am not sure how to combine the length ({255,}) bit to the total pattern expression.
How do I combine all three aspects of the regular expression?
Note: tags seem to be broken at the moment of posting this.
You can use
pattern="/[a-z0-9/]{0,254}"
You do not need ^ or $ in a pattern regex, by the way: the value must match the whole pattern anyway, because the pattern is compiled as ^(?:/[a-z0-9/]{0,254})$. That is, it matches a string that starts with / and then contains 0 to 254 lowercase ASCII letters, digits, or slashes up to the end of the string.
Note that / should only be escaped in regex literals where / is used as a delimiter char. pattern regexps are defined with literal strings.
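The whole-value matching can be approximated outside the browser. The sketch below uses Python's re.fullmatch as a stand-in for the implicit anchoring of the pattern attribute; is_valid_path is a hypothetical helper name, and Python's engine is not the one browsers use, so treat it as an illustration only.

```python
import re

# One leading slash, then 0 to 254 lowercase letters, digits, or slashes,
# for a total length of 1 to 255 characters.
PATH_PATTERN = re.compile(r"/[a-z0-9/]{0,254}")

def is_valid_path(value: str) -> bool:
    # fullmatch() requires the entire value to match, much like the
    # implicit ^(?:...)$ wrapping applied to the pattern attribute.
    return PATH_PATTERN.fullmatch(value) is not None

for sample in ["/", "/blog/2020/post1", "blog", "/Blog", "/" + "a" * 255]:
    print(repr(sample[:20]), is_valid_path(sample))
```

The last sample is 256 characters long and is rejected, which is why the quantifier is {0,254} rather than {255,}.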
I have a password field and I need to check if it has at least 8 characters and if it has the following characters:
! @ # $ % ^ & *
I tried to do it using a pattern, and it's not working as expected:
<div class="col-sm-6 form-group">
<input type="password" class="form-control" id="Clave" name="txtClave"
pattern='/[!@#$%^&*(),.?":{}|<>]/g.{8,}'
title="Must contain one of the following characters: ! @ # $ % ^ & *, and at least 8 characters" required>
</div>
Try Regex: ^(?=.*[!@#$%^&*(),.?":{}|<>]).{8,}$
The best mechanism for combining multiple tests in a single regex is a lookahead. An ordinary regex moves through a string looking for a match, which means that when it finds a match it is no longer at the beginning of the string. A lookahead looks for a match without actually moving (hence the name "lookahead"). The basic format is (?=<regex>) and you can combine as many as you like into a single pattern.
In this case, you have two conditions, so you'll want to combine two lookaheads. We've already seen the first -- .{8,} -- but in a lookahead you want a little more than that: you need to ensure that the regex matches the entire string, so end the lookahead with an end-of-string anchor. Many engines offer \A and \z for the start and end of the string, but the JavaScript engine behind the HTML pattern attribute supports neither, and pattern is already anchored at both ends, so use $ inside the lookahead. Put it together and the first part of your pattern is (?=.{8,}$). (This precaution is unnecessary in your specific case, because you'll accept passwords with more than eight characters, but it's still good practice.)
The second condition, matching any of eight specific characters, starts with the class [!@#$%^&*]. But in a lookahead that starts at the beginning of the string and never moves, that class would match only the first character. You need a regex that matches anywhere in the string. An easy way to do this is .*[!@#$%^&*], which matches zero or more characters followed by one of your special characters. In a lookahead, that would be (?=.*[!@#$%^&*]). "Easy" is not always best, however: the .* construct is comparatively inefficient, because it first consumes the entire string and then has to backtrack one character at a time until it finds a special character, which can be computationally expensive.
A much more efficient way to do something like this is [^!@#$%^&*]*[!@#$%^&*]. This matches zero or more characters that are not in your special set, followed by exactly one character that is. (A caret (^) as the first character in a bracketed class means to negate the class; a caret anywhere else in the class is just a literal caret as a member of the class.) It's more efficient because it scans forward only as far as the first special character and can stop immediately once it finds a match. Putting that in a lookahead gives us (?=[^!@#$%^&*]*[!@#$%^&*]).
Now you can combine the two lookaheads into your "pattern". Lookaheads consume nothing, and pattern must match the whole value, so finish with .* to consume the string:
pattern='(?=.{8,}$)(?=[^!@#$%^&*]*[!@#$%^&*]).*'
That should match any password with eight or more characters, at least one of which is one of your eight special characters: ! @ # $ % ^ & *
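As a quick check of the two-lookahead idea, here is a rough sketch in Python rather than in the browser. It assumes the intended special set is ! @ # $ % ^ & *, uses re.fullmatch to imitate the whole-value matching of the pattern attribute, and ends the pattern with .* so that something actually consumes the string. The name is_acceptable is hypothetical.

```python
import re

# Lookahead 1: at least eight characters up to the end of the string.
# Lookahead 2: any run of non-special characters, then one special one.
# The trailing .* consumes the value, since lookaheads consume nothing.
PASSWORD = re.compile(r"(?=.{8,}$)(?=[^!@#$%^&*]*[!@#$%^&*]).*")

def is_acceptable(password: str) -> bool:
    # fullmatch() requires the whole value to match the pattern.
    return PASSWORD.fullmatch(password) is not None

for candidate in ["abc", "abcdefgh", "abcdefg!", "@bcdefgh", "a!c"]:
    print(candidate, is_acceptable(candidate))
```

Both conditions are tested from the same starting position, so the order of the lookaheads does not change the result.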
I am trying to do an iregex match in Django with the regular expression
reg_string = r"(\w|\d|\b|\s)+h(\w|\d|\b|\s)+(\w|\d|\b|\s)+anto(\w|\d|\b|\s)+"
self.queryset.filter(name__iregex=reg_string)
against the name "The Canton", but it returns nothing, whereas the same pattern works with Python's re.search:
print(re.search(r'(\w|\d|\b|\s)+h(\w|\d|\b|\s)+(\w|\d|\b|\s)+anto(\w|\d|\b|\s)+', 'The Canton', re.I).group())
I am using MySQL 5.7. Does anyone know how to fix this?
Note that REGEXP in MySQL 5.7 does not support shorthand character classes like \s, \d, or \w. It supports POSIX character classes such as [:digit:], [:alpha:], and [:alnum:]. (MySQL 8.0.4 and later use the ICU regex library, which does support the shorthands.)
Even if you keep using the pattern in Python, you should not write (\w|\d|\b|\s)+: it matches and captures a single char that is a word char, digit, word boundary, or whitespace, one or more times, overwriting the group's buffer with the latest char the engine matched on each iteration. Since \d is already covered by \w, and \b matches no character at all, you can rewrite it as a single character class: [\w\s]+.
Now, your pattern in MySQL will look like
[_[:alnum:][:space:]]+h[_[:alnum:][:space:]]+anto[_[:alnum:][:space:]]+
where [\w\s]+ is turned into [_[:alnum:][:space:]]+:
[ - start of a bracket expression
_ - an underscore (as \w matches _ and [:alnum:] does not)
[:alnum:] - an alphanumeric char
[:space:] - any whitespace char
] - end of the bracket expression
+ - quantifier, 1 or more times.
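The translation above can be sketched as one template filled in with two spellings of the same class, one for Python and one for MySQL 5.7. The build_pattern helper and the constant names are hypothetical; only the Python variant is executed here, since the POSIX spelling is meant to be passed to the database (for example via name__iregex).

```python
import re

# Two spellings of "one or more word or whitespace characters".
PY_CLASS = r"[\w\s]+"                    # Python's re module
MYSQL_CLASS = "[_[:alnum:][:space:]]+"   # POSIX bracket expression

def build_pattern(char_class: str) -> str:
    # Same shape as the pattern in the answer: class, "h", class, "anto", class.
    return f"{char_class}h{char_class}anto{char_class}"

python_pattern = build_pattern(PY_CLASS)
mysql_pattern = build_pattern(MYSQL_CLASS)

# Check the Python spelling locally, case-insensitively.
match = re.search(python_pattern, "The Canton", re.I)
print(match.group() if match else None)

# The MySQL spelling is the string to hand to the database query.
print(mysql_pattern)
```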
I am trying to search my codebase for code that calls a function named "foo", so I am searching for "foo(", but the results I'm getting include everything with the word foo in it, including CSS, comments, and strings that don't even have the trailing open parenthesis.
Anyone know how to do a search for strings that include special characters like ),"'?
When searching for special characters, try putting the escape character \ before the character, e.g. "foo\(".
Additionally, I found a reply for a similar question (see http://marc.info/?l=opensolaris-opengrok-discuss&m=115776447032671). It seems that frequently occurring special characters are not indexed because of performance issues, therefore it might not be possible to effectively search for such pattern.
OpenGrok supports escaping special characters that are part of the query syntax. The current special characters are:
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
To escape these characters, put a \ before the character. For example, to search for (1+1):2, use the query \(1\+1\)\:2
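The escaping rule can be sketched as a small helper that builds the query string before it is typed or sent. This is an illustration in Python, not part of OpenGrok; escape_query is a hypothetical name, and each & and | is escaped individually, which is one simple way to cover && and ||. Whether the index can then find the punctuation still depends on how the text was tokenized.

```python
# Every character that takes part in the special sequences listed above.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/')

def escape_query(text: str) -> str:
    # Prefix each special character with a backslash; leave the rest alone.
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

print(escape_query("(1+1):2"))
print(escape_query("foo("))
```

The first call prints the same escaped query as the example above, and the second gives the escaped form of a search for a call to foo.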
Suppose I got this string to be expected: 100:~# or 100:~/tmp
This really means, I need to match the terminal prompt for a machine (which may or may not contain the path). Normally, with this regex pattern:
100:(~|/)(/+[a-zA-Z0-9]*)*#
It works for an input string such as: 100:~/foo/bar/foo/baz#
You can test it here: Regex Pal
But using Expect in Tcl, I have to add -re to match such a pattern. However, I am not allowed to do so. I tried the above pattern without -re, and it failed.
The current pattern for matching 100:~# or 100:~/tmp is very simple: 100:[~/]*#, and I was told that it is a shell expression for matching strings, not a regular expression. The 100:[~/]*# pattern means it matches anything between 100:[~/] (~ and / are optional) and #. The * character is meant to match anything, as opposed to the regular *, which means zero or more in the traditional regex sense.
What exactly is the pattern-matching expression in Expect without the -re flag?
They are known as "glob" patterns. They are styled after the shell's pattern matching. The documentation is here: http://tcl.tk/man/tcl8.5/TclCmd/string.htm#M40
*
Matches any sequence of characters in string, including a null string.
?
Matches any single character in string.
[chars]
Matches any character in the set given by chars. If a sequence of the form x-y appears in chars, then any character between x and y, inclusive, will match. When used with -nocase, the end points of the range are converted to lower case first. Whereas {[A-z]} matches “_” when matching case-sensitively (since “_” falls between the “Z” and “a”), with -nocase this is considered like {[A-Za-z]} (and probably what was meant in the first place).
\x
Matches the single character x. This provides a way of avoiding the special interpretation of the characters *?[]\ in pattern.
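To get a feel for these rules without a Tcl session, here is a small sketch using Python's fnmatch module. It is a separate implementation of shell-style globbing, not Tcl's string match (for instance, it has no \x escape), so it is only an analogy; the sample prompts are made up. It shows that [~/] stands for exactly one character and that * matches any run of characters, including none.

```python
from fnmatch import fnmatchcase

# The glob from the question: "100:", one of ~ or /, anything, then "#".
PROMPT_GLOB = "100:[~/]*#"

candidates = [
    "100:~#",                  # [~/] matches "~", * matches nothing
    "100:~/foo/bar/foo/baz#",  # * matches "/foo/bar/foo/baz"
    "100:/tmp#",               # [~/] matches "/"
    "100:#",                   # nothing for [~/] to match
    "200:~#",                  # literal prefix differs
]

# fnmatchcase() matches the whole string, case-sensitively.
results = {text: fnmatchcase(text, PROMPT_GLOB) for text in candidates}
for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

The fourth sample fails because a bracket set always consumes one character, so ~ and / are alternatives rather than optional.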