I'm trying to write a query to identify what rows have special characters in them, but I want it to ignore spaces
So far I've got
SELECT word FROM `games_hangman_words` WHERE word REGEXP '[^[:alnum:]]'
Currently this matches those that use all special characters, what I want is to ignore if the special character is space
So if I have these rows
Alice
4 Kings
Another Story
Ene-tan
Go-Busters Logo
Lea's Request
I want it to match
Ene-tan, Go-Busters Logo and Lea's Request
Simply extend your class.
... WHERE word REGEXP '[^[:alnum:] ]' ...
for only a "regular" space (ASCII 32) or
... WHERE word REGEXP '[^[:alnum:][:space:]]' ...
for all kind of white space characters.
Related
I'm building a word unscrambler using MySQL, Think about it like the SCRABBLE game, there is a string which is the letter tiles and the query should return all words that can be constructed from these letters, I was able to achieve that using this query:
SELECT * FROM words
WHERE word REGEXP '^[hello]{2,}$'
AND NOT word REGEXP 'h(.*?h){1}|e(.*?e){1}|l(.*?l){2}|l(.*?l){2}|o(.*?o){1}'
The first part of the query makes sure that the output words are constructed from the letter tiles, the second part takes care of the words occurrences, so the above query will return words like: hello, hell, hole, etc..
My issue is when there is a blank tile (a wildcard), so for example if the string was: "he?lo", the "?" Can be replaced with any letter, so for example it will output: helio, helot.
Can someone suggest any modification on the query that will make it support the wildcards and also takes care of the occurrence. (The blank tiles could be up to 2)
I've got something that comes close. With a single blank tile, use:
SELECT * FROM words
WHERE word REGEXP '^[acre]*.[acre]*$'
AND word not REGEXP 'a(.*?a){1}|r(.*?r){1}|c(.*?c){1}|e(.*?e){1}'
with 2 blank tiles use:
SELECT * FROM words
WHERE word REGEXP '^[acre]*.[acre]*.[acre]*$'
AND word NOT REGEXP 'a(.*?a){1}|r(.*?r){1}|c(.*?c){1}|e(.*?e){1}'
The . in the first regexp allows a character that isn't one of the tiles with a letter on it.
The only problem with this is that the second regexp prevents duplicates of the lettered tiles, but a blank should be allowed to duplicate one of the letters. I'm not sure how to fix this. You could add 1 to the counts in {}, but then it would allow you to duplicate multiple letters even though you only have one blank tile.
A possible starting point:
Sort the letters in the words; sort the letters in the tiles (eg, "ehllo", "acer", "aerr").
That will avoid some of the ORing, but still has other complexities.
If this is really Scrabble, what about the need to attach to an existing letter or letters? And do you primarily want to find a way to use all 7 letters?
I am optimizing my query since full text search returns irrevelant result when there is numbers and repetitive keywords in text.
What I want to do is to extract numbers on text and add X amount of point to relevance when sorting the result.
Everything works smoothly besides one thing;
When I want to extract and prioritize result with number Z, it also counts other numbers that includes number Z in any part of it.
For Example;
Sample Data
###############
Text 55.A
Text 55_B
Text #55ABC
Text 551234.
Text 55677#
Text 556
Query
###############
... CASE WHEN (myTable.title like "% 55%") THEN ...
Expected output
###############
Text 55.A
Text 55_B
Actual output
###############
Text 55.A
Text 55_B
Text #55ABC
Text 551234.
Text 55677#
Text 556
How could I use REGEXP with LIKE, there can be symbols and characters after number I have given.
Thanks in advance
You may use
REGEXP '([[:<:]]|_)55([[:>:]]|_)'
If you are using MySQL 8.x and newer that use ICU regex library use
REGEXP '(\\b|_)55(\\b|_)'
See the regex demo
The (\\b|_) matches a word boundary or a _, the ([[:<:]]|_) matches a starting word boundary or _ and ([[:>:]]|_) matches a trailing word boundary or _.
Why do I get 0 when running this expression?
SELECT 'Nr. 1700-902-8423. asdasdasd' REGEXP '1+[ ,.-/\]*7+[ ,.-/\]*0+[ ,.-/\]*0+[ ,.-/\]*9+[ ,.-/\]*0+[ ,.-/\]*2+[ ,.-/\]*8+[ ,.-/\]*4+[ ,.-/\]*2+[ ,.-/\]*3+';
I need to get true, when the text contains the specified number (17009028423). There can be symbols ,.-/\ between digits.
For example, if I have number 17009028423, I need get true when in text is:
1700-902-8423
17-00,902-84.23
170/09-0.28\423
1700..902 842-3
17,.009028 4//2\3
etc.
Thanks.
There are two problems with your regular expression. First is that backslash in \] escapes the special meaning of ] to denote a character class. You need to escape your backslash: \\]. Another problem is that - denotes a range [ and ] (e.g. [a-zA-Z]). You need to escape that too or put it at the end like [a-zA-Z-] (as #tenub said). Plus the backslashes should be escaped themselves, which makes:
SELECT 'Nr. 1700-902-8423. asdasdasd' REGEXP '1[ ,./\\\\-]*7[ ,./\\\\-]*0[ ,./\\\\-]*0[ ,./\\\\-]*9[ ,./\\\\-]*0[ ,./\\\\-]*2[ ,./\\\\-]*8[ ,./\\\\-]*4[ ,./\\\\-]*2[ ,./\\\\-]*3'
You can check for yourself.
I also removed + signs in case you want to match each number only once.
I have a regex '^[A0-Z9]+$' that works until it reaches strings with 'special' characters like a period or dash.
List:
UPPER
lower
UPPER lower
lower UPPER
TEST
test
UPPER2.2-1
UPPER2
Gives:
UPPER
TEST
UPPER2
How do I get the regex to ignore non-alphanumeric characters also so it includes UPPER2.2-1 also?
I have a link here to show it 'real-time': http://www.rubular.com/r/ev23M7G1O3
This is for MySQL REGEX
EDIT: I didn't specify I wanted all non-alphanumeric characters (including spaces), but with the help of others here it led me to this: '^[A-Z-0-9[:punct:][:space:]]+$' is there anything wrong with this?
Try
'^[A-Z0-9.-]+$'
You just need to add the special characters to the group, optionally escaping them.
Additionally if you choose not to escape the -, be aware that it should be placed at the start or the end of the grouping expression to avoid the chance that it may be interpreted as delimiting a range.
To your updated question, if you want all non-whitespace, try using a group such as:
^[^ ]+$
which will match everything except for a space.
If instead what you wanted is all non-whitespace and non-lowercase, you likely will want to use:
^[^ a-z]+$
The 'trick' used here is adding a caret symbol after the opening [ in the group expression. This indicates that we want the negation of the match.
Following the pattern, we can also apply this 'trick' to get everything but lowercase letters like this:
^[^a-z]+$
I'm not really sure which of the 3 above you want, but if nothing else, this ought to serve as a good example of what you can do with character classes.
I believe you are looking for (one?) uppercase-word match, where word is pretty much anything.
^[^a-z\s]+$
...or if you want to allow more words with spaces, then probably just
^[^a-z]+$
You just need to put in the . and -. In theory, you don't need to escape because they are inside the brackets, but I like to to remind myself to escape when I have to.
'^[A-Z0-9\.\-]+$'
Try regular expression as below:
'^[A0-Z0\\.\\-]+$'
I am writing a custom search engine for my website. I am trying to make use of MySQL REGEXP feature. I would like to be able to search for a word separated by spaces to avoid the chances of getting suffixes or prefixes on a word. For example I am trying to search for "appreciate" I want appreciate, not appreciated or unappreciate or unappreciated. Any ideas on how I could do this with MySQL's REGEXP? My idea for this was to look for spaces like maybe so:
^appreciate$|^appreciate[:space:]|[:space:]appreciate$|[:space:]appreciate[:space:]
I am sure they is a better way of doing it and I have no idea if that even works
I think what you want is something like this:
SELECT 'I appreciate you' REGEXP '[[:<:]]appreciate[[:>:]]'; /* matches */
[[<:]] and [[>:]] are word boundaries. From the manual:
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
Edit: just to clarify, this also deals with situations where there's a newline character after the word, or a comma, etc
What about:
^\s*appreciate(\s+.*)*$
Between the start and the word there may be 0+ whitespace parts
then comes the word
then if something comes after that, it has to start with whitespace
You can seek for non-alphabetic characters:
[^[:alpha:]]+
... or just word boundaries:
[[:<:]]foo[[:>:]]
Before making a choice, don't forget to make some tests with commas, dots and non-English chars. Also, take into account that MySQL does not fully support regular expressions in multi-byte strings (such as UTF-8).