I'm building a word unscrambler using MySQL. Think of it like the game SCRABBLE: there is a string representing the letter tiles, and the query should return all words that can be constructed from those letters. I was able to achieve that using this query:
SELECT * FROM words
WHERE word REGEXP '^[hello]{2,}$'
AND NOT word REGEXP 'h(.*?h){1}|e(.*?e){1}|l(.*?l){2}|l(.*?l){2}|o(.*?o){1}'
The first part of the query makes sure that the output words are constructed only from the letter tiles; the second part takes care of how many times each letter may occur. So the above query will return words like hello, hell, hole, etc.
My issue is when there is a blank tile (a wildcard). For example, if the string was "he?lo", the "?" can be replaced with any letter, so the output should also include helio and helot.
Can someone suggest a modification to the query that makes it support wildcards while still taking care of the letter occurrences? (There can be up to 2 blank tiles.)
I've got something that comes close. With a single blank tile, use:
SELECT * FROM words
WHERE word REGEXP '^[acre]*.[acre]*$'
AND word NOT REGEXP 'a(.*?a){1}|r(.*?r){1}|c(.*?c){1}|e(.*?e){1}'
With 2 blank tiles, use:
SELECT * FROM words
WHERE word REGEXP '^[acre]*.[acre]*.[acre]*$'
AND word NOT REGEXP 'a(.*?a){1}|r(.*?r){1}|c(.*?c){1}|e(.*?e){1}'
The . in the first regexp allows a character that isn't one of the tiles with a letter on it.
The only problem with this is that the second regexp prevents duplicates of the lettered tiles, but a blank should be allowed to duplicate one of the letters. I'm not sure how to fix this. You could add 1 to the counts in {}, but then it would allow you to duplicate multiple letters even though you only have one blank tile.
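The counting problem described above is awkward to express in a regexp but simple in application code. Here is a minimal sketch, assuming the candidate words are fetched first and filtered afterwards; the function names countLetters and canBuild are hypothetical, and the minimum word length of 2 from the original query is not enforced here. Each blank tile covers exactly one missing letter, so one blank cannot duplicate several letters:

```javascript
// Count how many times each character occurs in a string.
function countLetters(text) {
  const counts = new Map();
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  return counts;
}

// Return true if `word` can be built from `tiles`, where '?' is a blank tile.
// Every letter the tiles cannot supply must be paid for by one blank.
function canBuild(word, tiles) {
  const available = countLetters(tiles.toLowerCase());
  const blanks = available.get('?') || 0;
  let shortfall = 0;
  for (const [letter, needed] of countLetters(word.toLowerCase())) {
    const have = available.get(letter) || 0;
    if (needed > have) {
      shortfall += needed - have;
    }
  }
  return shortfall <= blanks;
}

const candidates = ['hello', 'helio', 'helot', 'hollo', 'hill', 'jello'];
console.log(candidates.filter((w) => canBuild(w, 'he?lo')));
```

With the tiles "he?lo", words such as hollo or jello are rejected because they would need two blanks.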
A possible starting point:
Sort the letters in the words; sort the letters in the tiles (e.g., "ehllo", "acer", "aerr").
That will avoid some of the ORing, but still has other complexities.
If this is really Scrabble, what about the need to attach to an existing letter or letters? And do you primarily want to find a way to use all 7 letters?
I'm trying to write a query to identify what rows have special characters in them, but I want it to ignore spaces
So far I've got
SELECT word FROM `games_hangman_words` WHERE word REGEXP '[^[:alnum:]]'
Currently this matches every row that contains any special character; what I want is for it to ignore spaces, so that a space does not count as a special character.
So if I have these rows
Alice
4 Kings
Another Story
Ene-tan
Go-Busters Logo
Lea's Request
I want it to match
Ene-tan, Go-Busters Logo and Lea's Request
Simply extend your class.
... WHERE word REGEXP '[^[:alnum:] ]' ...
for only a "regular" space (ASCII 32) or
... WHERE word REGEXP '[^[:alnum:][:space:]]' ...
for all kinds of whitespace characters.
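To check the class against the sample rows outside the database, here is a small sketch. JavaScript has no POSIX classes, so [:alnum:] is approximated here with the ASCII ranges A-Za-z0-9, which is an assumption; the function name hasSpecialCharacter is hypothetical:

```javascript
// Sample rows taken from the question.
const words = [
  'Alice',
  '4 Kings',
  'Another Story',
  'Ene-tan',
  'Go-Busters Logo',
  "Lea's Request",
];

// True if the word contains anything other than ASCII letters, digits or a
// plain space. This mirrors '[^[:alnum:] ]' for ASCII input only.
function hasSpecialCharacter(word) {
  return /[^A-Za-z0-9 ]/.test(word);
}

console.log(words.filter(hasSpecialCharacter));
```

Only Ene-tan, Go-Busters Logo and Lea's Request pass the filter, which is the result the question asks for.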
I would like to replace the text in a google doc. At the moment I have place markers as follows
Invoice ##invoiceNumber##
I replace the invoice number with
body.replaceText('##invoiceNumber##',invoiceNumber);
This works, but I can only run the script once, because afterwards ##invoiceNumber## is no longer in the document. I was thinking I could replace the text after "Invoice", since that part stays the same. appendParagraph looks like it might do the trick, but I can't figure it out. I think something like body.appendParagraph("Invoice") would select the area? I'm not sure how to append to it after that.
You could try something like this I think:
body.replaceText('Invoice \\w{1,9} ', 'Invoice ' + invoiceNumber + ' ');
I don't know how long your invoice numbers are, but that will accept from 1 to 9 word characters preceded by a space and followed by a space (the replacement puts the trailing space back so the pattern still matches on the next run). The pattern might have to be modified depending on your text.
Word characters: [A-Za-z0-9_]
If your invoice numbers are unique enough perhaps you could just replace them.
Reference
Regular Expression Syntax
Note: the regex pattern is passed as a string rather than a regular expression
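The idea of anchoring on the fixed label rather than on a placeholder can be tried in plain JavaScript. This is only a sketch of the technique using String.prototype.replace on an ordinary string, not the Apps Script document API; the function name setInvoiceNumber and the sample text are assumptions:

```javascript
// Replace whatever number currently follows the fixed label "Invoice ".
// Because the label itself is kept, the function can be run repeatedly.
function setInvoiceNumber(text, invoiceNumber) {
  // A replacer function is used so that '$' in the number is taken literally.
  return text.replace(/Invoice \w{1,9}/, () => 'Invoice ' + invoiceNumber);
}

let text = 'Invoice 1001 due in 30 days';
text = setInvoiceNumber(text, '1002');
text = setInvoiceNumber(text, '1003');
console.log(text);
```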
I'm trying to replace every occurrence of a word in a text (an HTML file), together with everything around it up to a ", a ' or a ( behind it or a ) ahead of it, with a regex using Node.js.
My problem is that when I have two words to replace, say 3.png and 13.png, 13.png also gets replaced because it matches 3.png, so when I come to replace 13.png it is no longer in the text.
My ideal solution would be :
if matched pattern contains a /
then it must exact match after / and replace everything around (slash included) until we meet one of these characters (excluded) " or a ' or a ( for behind or a )
else exact match between "" or '' or ()
You can find here a regex101 example
Currently I'm sorting my words to search like so:
imgjson.sort((a, b) => b.name.length - a.name.length);
in order to replace the longest words first which solves my problem because we replace 13.png first then 3.png but I would like to know if this can be done with js regex?
Thanks a lot for your reply and time!
As @PushpeshKumarRajwanshi said, use \b.
If you want to be more precise and learn more about regex, you can use https://regex101.com/.
In the bottom-right corner you can find all the special characters and functions of regex you may need.
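One way to express the "ideal solution" from the question directly is sketched below. It requires the file name to start right after an opening delimiter or right after a slash, so 3.png can no longer match inside 13.png (\b gives the same guarantee for names that start with a word character). It assumes the name sits immediately before the closing delimiter; the helper names escapeRegExp and replaceAsset and the sample HTML are hypothetical:

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace `name`, plus any path in front of it, inside "...", '...' or (...).
// The optional prefix must end with '/', so the name must follow either the
// opening delimiter or a slash exactly.
function replaceAsset(html, name, replacement) {
  const pattern = new RegExp(
    '(["\'(])(?:[^"\'()]*\\/)?' + escapeRegExp(name) + '(?=["\')])',
    'g'
  );
  // Keep the opening delimiter and substitute everything after it.
  return html.replace(pattern, (match, open) => open + replacement);
}

let html = '<img src="img/3.png"><img src=\'img/13.png\'><i style="background:url(3.png)">';
html = replaceAsset(html, '3.png', 'a.png');
html = replaceAsset(html, '13.png', 'b.png');
console.log(html);
```

With this pattern the order of replacement no longer matters, so sorting by length is not needed.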
I store lyrics of songs and also allow chords to be added by putting them between square brackets (e.g: [Dm7]). Here's an example of lyrics stored in my database:
Left my fear [Dm7]by the side of the [B]road
Hear You[C] speak won't let[E] go
Fall to my knees
...
What I want to do is search for lyrics in songs. For example I might want to search for the lyrics fear by the side . The problem is the [Dm7] in my example above does not allow a simple LIKE search.
Is it possible to do a search (REGEX?) that excludes text such as [Dm7] from a query? If so how? Please note that the chords between the square brackets can vary.
You might like to consider a fulltext index, and then use match() against() in your where clause. Example:
create fulltext index ftx on songs(lyrics);
select *
from songs
where match(lyrics) against('fear by the side');
The matching is a little fuzzy, and you can't use the boolean mode matching because the chords don't have whitespace on both sides, but the normal mode should be sufficient.
The 'fuzziness' of the match can be used to provide a match ranking - works best on english language, which this seems to be. For example:
-- "rank" is a reserved word in MySQL 8.0.2 and later, so a different alias is used
select match(lyrics) against('fear by the side') as score,
lyrics from songs
where match(lyrics) against('fear by the side')
order by score desc;
Would sort the results by best match, and also return the matching rank.
The fulltext index also has a boolean mode, which, as the name suggests, can be used to force the results to include or exclude certain words, like so:
match(column) against('+word -otherword' in boolean mode) would return all rows for which column contains word but does not have otherword.
Your fulltext index can also be multi-column, if you desire.
Thanks to @SvenB and his suggestion of this post, this is the answer I ended up with:
REPLACE(col, SUBSTRING(col, (LOCATE('[', col)), LOCATE(']', col) - (LOCATE('[', col)) + 1), '') LIKE '%fear by the side%'
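It's a bit messy but works! Note that LOCATE returns the first occurrence, so this only strips the first bracketed chord in the column (and any identical copies of it). I think in the long term FULLTEXT search is the way to go, based on others' comments.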
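If the filtering can happen in application code instead of in the query, removing every bracketed chord is a one-line regex. A minimal sketch, assuming chords never contain a closing bracket; the function names stripChords and lyricsContain are hypothetical:

```javascript
// Remove every [chord] marker, whatever text is between the brackets.
function stripChords(lyrics) {
  return lyrics.replace(/\[[^\]]*\]/g, '');
}

// Case-insensitive phrase search on the chord-free lyrics.
function lyricsContain(lyrics, phrase) {
  return stripChords(lyrics).toLowerCase().includes(phrase.toLowerCase());
}

const line = 'Left my fear [Dm7]by the side of the [B]road';
console.log(stripChords(line));
console.log(lyricsContain(line, 'fear by the side'));
```

Another common design is to store the stripped text in a second column when the lyrics are saved, so that a plain LIKE search works on it.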
I am writing a custom search engine for my website. I am trying to make use of MySQL REGEXP feature. I would like to be able to search for a word separated by spaces to avoid the chances of getting suffixes or prefixes on a word. For example I am trying to search for "appreciate" I want appreciate, not appreciated or unappreciate or unappreciated. Any ideas on how I could do this with MySQL's REGEXP? My idea for this was to look for spaces like maybe so:
^appreciate$|^appreciate[:space:]|[:space:]appreciate$|[:space:]appreciate[:space:]
I am sure there is a better way of doing it, and I have no idea if that even works.
I think what you want is something like this:
SELECT 'I appreciate you' REGEXP '[[:<:]]appreciate[[:>:]]'; /* matches */
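[[:<:]] and [[:>:]] are word boundaries (in MySQL 8.0.4 and later, which use the ICU regex library, these markers are not supported; write \\b instead). From the manual: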
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
Edit: just to clarify, this also deals with situations where there's a newline character after the word, or a comma, etc
What about:
^\s*appreciate(\s+.*)*$
Between the start and the word there may be 0+ whitespace parts
then comes the word
then if something comes after that, it has to start with whitespace
You can look for non-alphabetic characters:
[^[:alpha:]]+
... or just word boundaries:
[[:<:]]foo[[:>:]]
Before making a choice, don't forget to make some tests with commas, dots and non-English chars. Also, take into account that MySQL does not fully support regular expressions in multi-byte strings (such as UTF-8).
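The kind of test suggested above can be sketched quickly outside MySQL. The example uses JavaScript's \b, which plays the same role as the word-boundary markers but is ASCII-only by default, so non-English characters need separate checking; the function name containsWholeWord is hypothetical:

```javascript
// True if `word` occurs in `text` as a whole word (case-insensitive).
function containsWholeWord(text, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('\\b' + escaped + '\\b', 'i').test(text);
}

const samples = [
  'I appreciate you',
  'appreciate, thanks.',
  'an unappreciated effort',
  'much appreciated',
];
for (const sample of samples) {
  console.log(sample, containsWholeWord(sample, 'appreciate'));
}
```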