Regex all uppercase with special characters - mysql

I have a regex '^[A0-Z9]+$' that works until it reaches strings with 'special' characters like a period or dash.
List:
UPPER
lower
UPPER lower
lower UPPER
TEST
test
UPPER2.2-1
UPPER2
Gives:
UPPER
TEST
UPPER2
How do I get the regex to also accept non-alphanumeric characters, so that it includes UPPER2.2-1 as well?
Here is a link showing it in real time: http://www.rubular.com/r/ev23M7G1O3
This is for MySQL REGEXP.
EDIT: I didn't specify that I wanted to allow all non-alphanumeric characters (including spaces). With the help of others here, I arrived at '^[A-Z-0-9[:punct:][:space:]]+$'. Is there anything wrong with this?
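A rough way to try the EDIT's idea outside MySQL, as a sketch only: Python's re has no POSIX classes such as [:punct:], so this assumes string.punctuation (ASCII punctuation only) as a stand-in for [:punct:] and \s for [:space:]. The stray - after A-Z is left out because the punctuation set already includes the hyphen. "UPPER TEST" is a hypothetical extra value, not from the question, and MySQL's engine and collation rules may behave differently.

```python
import re
import string

# Values from the question, plus one hypothetical value containing a space.
values = ["UPPER", "lower", "UPPER lower", "lower UPPER",
          "TEST", "test", "UPPER2.2-1", "UPPER2", "UPPER TEST"]

# Approximation of '^[A-Z0-9[:punct:][:space:]]+$':
# string.punctuation stands in for [:punct:] (ASCII only), \s for [:space:].
punct = re.escape(string.punctuation)  # escape ], \, ^, - etc. for the class
pattern = re.compile(r"^[A-Z0-9" + punct + r"\s]+$")

matches = [v for v in values if pattern.match(v)]
print(matches)  # ['UPPER', 'TEST', 'UPPER2.2-1', 'UPPER2', 'UPPER TEST']
```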

Try
'^[A-Z0-9.-]+$'
You just need to add the special characters to the character class (the bracket expression), optionally escaping them.
Additionally, if you choose not to escape the -, be aware that it should be placed at the start or the end of the bracket expression; otherwise it may be interpreted as delimiting a range.
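To see the effect on the sample list, here is a quick check in Python. It is a sketch that uses Python's re as a stand-in for MySQL REGEXP; note that MySQL may compare case-insensitively depending on the column's collation (or binary-ness), while Python's re is case-sensitive by default.

```python
import re

# Sample values from the question.
values = ["UPPER", "lower", "UPPER lower", "lower UPPER",
          "TEST", "test", "UPPER2.2-1", "UPPER2"]

# Uppercase letters, digits, '.' and '-'; the '-' is last, so it is literal.
pattern = re.compile(r"^[A-Z0-9.-]+$")

matches = [v for v in values if pattern.match(v)]
print(matches)  # ['UPPER', 'TEST', 'UPPER2.2-1', 'UPPER2']
```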
As for your updated question: if you want to allow everything except spaces, try a negated class such as:
^[^ ]+$
which will match everything except for a space.
If instead what you want is everything except spaces and lowercase letters, you will likely want:
^[^ a-z]+$
The 'trick' used here is adding a caret (^) right after the opening [ of the bracket expression. This negates the class: it matches any character that is not listed.
Following the pattern, we can also apply this 'trick' to get everything but lowercase letters like this:
^[^a-z]+$
I'm not really sure which of the three above you want, but at the very least this should serve as a good example of what you can do with character classes.
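Here is a small Python sketch running the three negated classes against the question's list, with Python's re standing in for MySQL. "UPPER TEST" is a hypothetical extra value (not in the question), added so the last two patterns give different results.

```python
import re

# Values from the question, plus a hypothetical "UPPER TEST".
values = ["UPPER", "lower", "UPPER lower", "lower UPPER",
          "TEST", "test", "UPPER2.2-1", "UPPER2", "UPPER TEST"]

patterns = {
    "no_space": r"^[^ ]+$",              # anything except a space
    "no_space_no_lower": r"^[^ a-z]+$",  # no spaces and no lowercase letters
    "no_lower": r"^[^a-z]+$",            # no lowercase letters (spaces allowed)
}

results = {name: [v for v in values if re.match(p, v)]
           for name, p in patterns.items()}

for name, matched in results.items():
    print(name, matched)
```

On this list, "no_lower" is the only one that accepts "UPPER TEST", and "no_space" is the only one that accepts "lower" and "test".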

I believe you are looking for a match on (a single?) uppercase word, where "word" can contain pretty much anything.
^[^a-z\s]+$
...or, if you want to allow several words separated by spaces, probably just
^[^a-z]+$

You just need to add the . and the - to the class. In theory you don't need to escape them because they are inside the brackets, but I like to escape them anyway, to remind myself to escape when I have to.
'^[A-Z0-9\.\-]+$'

Try a regular expression like the one below:
'^[A0-Z0\\.\\-]+$'

Related

How to limit simple form input to 50 characters

Is it possible to limit a simple form input to only 50 characters without javascript?
I have used the max_length attribute; however, it counts blank spaces, which is not what I want.
I've attempted to use pattern (as suggested in another post), but I can't seem to get that to work either.
Thanks
I don't know why you don't want it to include blanks.
Usually I use max_length including blanks and leave it to the user to trim any excess whitespace. I'm not disagreeing; I honestly don't know what your requirement is.
If you want to allow leading and trailing whitespace, and are willing to leave it to the user to collapse excess whitespace within the text to a single whitespace character, then this is the pattern you want:
<input pattern="^\s*.{0,50}\s*$">
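Sometimes, for multiline regular expressions, \A is used instead of ^ and \z instead of $. However, the HTML pattern attribute uses JavaScript regular expression syntax, which does not support \A or \z; the browser also anchors the pattern to the whole value automatically, so ^ and $ are optional here.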
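To check the behaviour outside a browser, here is a small Python sketch. It assumes Python's re treats this particular pattern the same way as the browser's JavaScript engine, and uses re.fullmatch to mimic the whole-value matching of the pattern attribute. The helper accepts() and the sample strings are made up for illustration. The last case shows the caveat above: whitespace inside the text still counts toward the 50 characters.

```python
import re

# Pattern from the answer; fullmatch mimics HTML's whole-value anchoring.
PATTERN = re.compile(r"^\s*.{0,50}\s*$")

def accepts(value: str) -> bool:
    """Hypothetical helper: True if the value would satisfy the pattern."""
    return PATTERN.fullmatch(value) is not None

padded = "  " + "a" * 50 + "   "  # 50 characters plus leading/trailing spaces
too_long = "a" * 51                # 51 non-space characters
inner_spaces = "a " * 30           # only 30 letters, but 60 characters overall

print(accepts(padded), accepts(too_long), accepts(inner_spaces))  # True False False
```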

HTML Input Pattern Comma and Letters

What is the correct pattern for a text input to only allow uppercase letters, lowercase letters, and commas?
I know that this is correct for the letters:
pattern="[a-zA-Z]"
but I don't know how to allow commas.
Thanks for any help!
Short answer:
pattern="^[a-zA-Z,]*$"
A couple of comments:
* means zero or more characters, so this pattern will also allow empty fields. If you want to guarantee that it contains at least one character, use + instead of *.
^ means the beginning of the string and $ the end. If you don't use them, a regex search would also accept something like "!#123asdSDADS,,,21312312(2", since only part of it has to match. (The HTML pattern attribute already anchors the pattern to the whole value, so in that context they are redundant but harmless.)
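A short Python illustration of both comments, as a sketch: Python's re stands in for the browser's engine, and "Smith,John" is a made-up value.

```python
import re

sample = "!#123asdSDADS,,,21312312(2"

# Unanchored search: finds a run of letters/commas anywhere in the string.
partial = re.search(r"[a-zA-Z,]+", sample)
print(partial.group())  # asdSDADS,,,

# Anchored: the whole value must consist of letters and commas.
anchored = r"^[a-zA-Z,]*$"
print(re.match(anchored, sample))              # None
print(bool(re.match(anchored, "Smith,John")))  # True

# * accepts an empty value, + requires at least one character.
print(bool(re.fullmatch(r"[a-zA-Z,]*", "")))  # True
print(bool(re.fullmatch(r"[a-zA-Z,]+", "")))  # False
```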

Regex find two characters in order, between others, ignoring punctuation

I'm trying to filter using regex in MySQL.
The field is a text field, and I want to find all values that match 'MD' or similar ('M.D.', 'M. D.', 'DDS, M.D.', etc.).
I do not want to accept values that contain M and D as part of another acronym (e.g., 'DMD'). However, 'DMD, M.D.' I would want to find.
Apologies if this is a simple task - I read through some regex tutorials and couldn't figure this out! Thanks.
Update:
With help from the suggestions I arrived at the following solution:
(\s|^)M\.?\s*D\.?
which works for all of my cases. The quotes in my question were only there to indicate a string; they are not part of the string.
You can use a regex like this:
\b(M\.?\s*D\.?|D\.?\s*D\.?\s*S\.?)
If I have understood your requirement:
'([^'.]*[ ,]*M[. ]*D[. ]*)'
This looks for M and D preceded by a space, comma or ', separated by zero or more dots and spaces, and followed by '.
It matches all of the content between the ' marks.
test: https://regex101.com/r/oV2kV8/2
In the end I found this solution works:
(\s|^)M\.?\s*D\.?(\s|$)
This allows for the 'MD' to be at the start or after another credential and to have spaces or periods or nothing between the letters.
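The final pattern can be checked against the examples from the question with a quick Python sketch. This assumes Python's re handles \s, ^ and $ the same way as the MySQL version in use, which may not hold for every MySQL version.

```python
import re

# The asker's final pattern: M and D, each optionally followed by a dot,
# optional whitespace between them, bounded by whitespace or string start/end.
MD = re.compile(r"(\s|^)M\.?\s*D\.?(\s|$)")

samples = ["MD", "M.D.", "M. D.", "DDS, M.D.", "DMD", "DMD, M.D."]
found = {s: MD.search(s) is not None for s in samples}
print(found)  # every sample is True except "DMD"
```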

Regex matching Google Cache url (matching entire href parameter when it contains a word)

Disclaimer: I know that HTML and regex should not be mixed, but this is an exceptional case.
I need to parse Google Search results and extract cache URLs. I have this in the page:
<a href="/url?q=http://webcache.googleusercontent.com/search%3Fq%3Dcache:gsNKb7ku3ewJ:somedata&ei=MyIIUtrZAcPX7AaVzIHwDg&ved=0CB8QIDAC&usg=AFQjCNGcnWfdzQiTKwyAMmI-M-xzxII5Ag">Cached</a>
I tried simple things like href=[\'"]?([^\'" >]+), but that is not what I need. I want to extract a single parameter (q) from the href. I need to get:
http://webcache.googleusercontent.com/search%3Fq%3Dcache:gsNKb7ku3ewJ:somedata
In other words, everything between "url?q=" and the first "&", but only when that text contains the word "webcache".
If your language supports positive look-behinds:
(?<=q=).*?(?=[&"])
Otherwise, use this expression and take capture group 1 (\1):
(?:q=)(.*?)(?=[&"])
Explanation:
.*? is the body of our expression. Just match everything, but don't be greedy!
(?<=q=) is a positive look-behind, which says "q=" should come before the match
(?=[&"]) is a positive look ahead, which says "either & or a quote should come after the match"
Because we make it not greedy with the ?, it'll stop at the first quote or ampersand. Otherwise it'd match all of the way to the closing quote.
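Both variants can be tried on the link from the question with a short Python sketch (Python's re supports fixed-width look-behinds such as (?<=q=); other engines may not).

```python
import re

html = ('<a href="/url?q=http://webcache.googleusercontent.com/search%3Fq%3D'
        'cache:gsNKb7ku3ewJ:somedata&ei=MyIIUtrZAcPX7AaVzIHwDg&ved=0CB8QIDAC'
        '&usg=AFQjCNGcnWfdzQiTKwyAMmI-M-xzxII5Ag">Cached</a>')

# Look-behind version: the whole match is the q value.
with_lookbehind = re.search(r'(?<=q=).*?(?=[&"])', html).group()

# Capture-group version for engines without look-behind: take group 1.
with_group = re.search(r'(?:q=)(.*?)(?=[&"])', html).group(1)

print(with_lookbehind)
print(with_group)
```

Both print http://webcache.googleusercontent.com/search%3Fq%3Dcache:gsNKb7ku3ewJ:somedata. The encoded %3Fq%3D inside the value does not contain a literal "q=", so the first "q=" found is the one after "url?".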
Use a look-behind at the start and a look-ahead at the end to assert the surrounding text, and include the keyword in the regex itself:
(?<=url\?q=)[^&]*webcache[^&]*(?=&)
Using [^&]* on both sides of the keyword ensures that it occurs before the first &, i.e. within the target string.
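A Python sketch of this keyword-based version, using the link from the question plus a hypothetical non-cache link (made up here) to show that it is skipped:

```python
import re

# Look-behind for "url?q=", the keyword before the first "&", look-ahead for "&".
CACHE_Q = re.compile(r"(?<=url\?q=)[^&]*webcache[^&]*(?=&)")

cached = ('<a href="/url?q=http://webcache.googleusercontent.com/search%3Fq%3D'
          'cache:gsNKb7ku3ewJ:somedata&ei=MyIIUtrZAcPX7AaVzIHwDg&ved=0CB8QIDAC'
          '&usg=AFQjCNGcnWfdzQiTKwyAMmI-M-xzxII5Ag">Cached</a>')
# Hypothetical link without the keyword.
other = '<a href="/url?q=http://example.com/page&ei=abc">Page</a>'

print(CACHE_Q.search(cached).group())
print(CACHE_Q.search(other))  # None
```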

MySQL RegEx Search Query

I am writing a custom search engine for my website and am trying to make use of MySQL's REGEXP feature. I would like to search for a word delimited by spaces, to avoid matching words with suffixes or prefixes. For example, when I search for "appreciate" I want appreciate, not appreciated, unappreciate or unappreciated. Any ideas on how I could do this with MySQL's REGEXP? My idea was to look for spaces, something like this:
^appreciate$|^appreciate[:space:]|[:space:]appreciate$|[:space:]appreciate[:space:]
I am sure there is a better way of doing it, and I have no idea whether that even works.
I think what you want is something like this:
SELECT 'I appreciate you' REGEXP '[[:<:]]appreciate[[:>:]]'; /* matches */
[[:<:]] and [[:>:]] are word boundaries. From the manual:
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
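Edit: just to clarify, this also handles cases where the word is followed by a newline character, a comma, etc. Note that these markers belong to the Henry Spencer regex library that MySQL used before 8.0.4; MySQL 8.0.4 and later use ICU, which does not support them, so use \b there instead (written '\\b' inside a string literal).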
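For comparison, the same idea in Python's re uses \b, which plays roughly the role of [[:<:]] and [[:>:]]. This is a sketch: the strings other than "I appreciate you" are made up, and Python's notion of a word character (Unicode-aware for str patterns) may differ from MySQL's.

```python
import re

# \b matches between a word character and a non-word character
# (or the start/end of the string).
WORD = re.compile(r"\bappreciate\b")

texts = ["I appreciate you", "appreciated", "unappreciate",
         "unappreciated", "I appreciate, thanks", "appreciate\nmore"]
hits = [t for t in texts if WORD.search(t)]
print(hits)  # ['I appreciate you', 'I appreciate, thanks', 'appreciate\nmore']
```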
What about:
^\s*appreciate(\s+.*)*$
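Between the start of the string and the word there may be zero or more whitespace characters,
then comes the word,
then, if anything comes after it, it has to start with whitespace.
Note that this only finds the word at the start of the value (after optional leading whitespace).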
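A quick Python check of this pattern (a sketch using Python's re in place of MySQL; the sample strings are made up) shows that it rejects "appreciated", but also rejects the word when it is not at the start of the value:

```python
import re

# Optional leading whitespace, the word, then optionally whitespace + anything.
START_WORD = re.compile(r"^\s*appreciate(\s+.*)*$")

texts = ["appreciate you", "  appreciate", "appreciated", "I appreciate you"]
results = {t: bool(START_WORD.match(t)) for t in texts}
print(results)  # True, True, False, False in that order
```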
Alternatively, you can look for non-alphabetic characters around the word:
[^[:alpha:]]+
... or just word boundaries:
[[:<:]]foo[[:>:]]
Before making a choice, don't forget to test with commas, dots and non-English characters. Also take into account that MySQL's REGEXP before version 8.0.4 is not multi-byte safe, so it does not fully support regular expressions on multi-byte strings such as UTF-8.