MySQL query: match words that begin with a term within the content

I'm attempting to replicate the following regex pattern in a MySQL query: http://regexr.com/3gt57
I can't use LIKE because I need to match words that begin with the submitted term, not words that merely contain it.
I can't seem to use the pattern in a REGEXP query:
SELECT * FROM serialised_post
WHERE post_content REGEXP '\bSugg\S*';
Any help would be appreciated

In MySQL regular expressions before 8.0.4, \S is written [^[:space:]] (a negated bracket expression matching any character other than whitespace; [:space:] is the POSIX character class for whitespace), and \b (here, a leading word boundary) is written [[:<:]].
Use
WHERE post_content REGEXP '[[:<:]]Sugg[^[:space:]]*';
See the MySQL regular expression documentation for more details.
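As a rough illustration (not part of the original answer), the translation above could be wrapped in a small helper. The Python sketch below uses a hypothetical function name, starts_with_word_pattern, assumes the pre-8.0.4 (Spencer) syntax shown above, and assumes the search term is plain alphanumeric text so that nothing needs escaping. Python's own re module appears only to show what the original \bSugg\S* pattern is expected to match; it is a different engine from MySQL's.

```python
import re


def starts_with_word_pattern(term: str) -> str:
    """Build a Spencer-syntax pattern for 'a word that begins with term'.

    Hypothetical helper: assumes term is ASCII alphanumeric, so it
    contains no regex metacharacters that would need escaping.
    """
    if not (term.isascii() and term.isalnum()):
        raise ValueError("this sketch only handles plain alphanumeric terms")
    # [[:<:]] = start-of-word boundary, [^[:space:]]* = rest of the word
    return "[[:<:]]" + term + "[^[:space:]]*"


# What the original PCRE-style pattern matches, checked with Python's re:
sample = "A Suggestion, not unSuggested"
matches = re.findall(r"\bSugg\S*", sample)

# The equivalent pattern text for the older MySQL regex library:
pattern = starts_with_word_pattern("Sugg")
```

The resulting string would typically be passed as a bound parameter, for example to WHERE post_content REGEXP ?, so that no extra SQL-level quoting is needed.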

Related

SQL Regex last character search not working

I'm using a regex for a specific search, but the last separator is being ignored.
It should search for |49213[A-Z]| but instead searches for |49213[A-Z]
SELECT * FROM table WHERE (data REGEXP '/\|49213[A-Z]+\|/')
Why are you using | in the pattern? Why the +?
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]\|')
If you want multiple:
SELECT * FROM table WHERE (data REGEXP '\|49213[A-Z]+\|')
or:
SELECT * FROM table WHERE (data REGEXP '[|]49213[A-Z][|]')
Aha. That is rather subtle.
\ escapes certain characters that have special meaning.
But it does not seem to do so for | ("or") or . ("any byte"), etc.
So, \| is the same as |.
But the regexp parser does not like having either side of "or" being empty. (I suspect this is a "bug"). Hence the error message.
https://dev.mysql.com/doc/refman/5.7/en/regexp.html says
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2 that contains the special + character, only the last of the following regular expressions is the correct one:
mysql> SELECT '1+2' REGEXP '1+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\\+2'; -> 1
The best fix seems to be [|] or \\| instead of \| when you want the pipe character.
Someday, the REGEXP parser in MySQL will be upgraded to PCRE as in MariaDB. Then a lot more features will come, and this 'bug' may go away.
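To make the two layers of escaping a bit more concrete, here is a minimal Python sketch. The helper name to_mysql_string_literal is made up for illustration, and it assumes the default SQL mode, in which backslash is an escape character inside string literals (that is, NO_BACKSLASH_ESCAPES is not enabled). It shows how the regex text the library should see differs from what has to be typed into a SQL string literal.

```python
def to_mysql_string_literal(regex_text: str) -> str:
    """Quote regex_text as a single-quoted MySQL string literal.

    Hypothetical helper. Assumes the default SQL mode, where backslash
    is an escape character inside string literals.
    """
    # Double each backslash first, then double each single quote.
    escaped = regex_text.replace("\\", "\\\\").replace("'", "''")
    return "'" + escaped + "'"


# Regex text as the regex library should see it:
escaped_pipe = r"\|49213[A-Z]\|"   # backslash-escaped pipes
bracket_pipe = "[|]49213[A-Z][|]"  # bracket form, no backslashes needed

# What would have to be written in the SQL statement itself:
literal_escaped = to_mysql_string_literal(escaped_pipe)
literal_bracket = to_mysql_string_literal(bracket_pipe)
```

The bracket form comes out unchanged apart from the surrounding quotes, which is one reason [|] is less error-prone than the backslash form.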

How to search for text in a mysql column and also include punctuation

My sql search utilizes the regular expression:
$sql = " select * from table where regexp CONCAT('[[:<:]]', :searchTerm, '[[:>:]]') ";
:searchTerm is a bound search variable.
Currently, if I execute my search for the term "well", a bunch of results come up. However, if I search for "Well," with a comma, the query result is empty.
I am assuming that this happens because '[[:<:]]' and '[[:>:]]' are word boundary markers, and thus, all punctuation is ignored.
Does anyone know how I can structure the query to also include punctuation?
You can use the built-in punctuation character class [:punct:] to match a punctuation character, followed by a ?, which matches zero or one of the character or character class before it (in this case, the punctuation class).
For example:
$sql = "select * from table where regexp CONCAT('[[:<:]]', :searchTerm, '[[:punct:]]?[[:>:]]') ";
Fair warning: I don't have much experience with MySQL and I didn't test the example above. It should work, or at least point you in the right direction. You can find out more in the Regular Expression Operators section of the MySQL documentation.
Additionally, I'm not sure which characters are included in the built-in punctuation character class, but you can define your own using square brackets and include just the punctuation characters you want, for example [,.!?;:].

Mysql regex error #1139 using literal -

I tried running this query:
SELECT column FROM table WHERE column REGEXP '[^A-Za-z\-\']'
but this returns
#1139 - Got error 'invalid character range' from regexp
which seems to me like the - in the character class is not being escaped, and is instead read as an invalid range. Is there some other way that it's supposed to be escaped for MySQL to treat it as a literal -?
This regex works as expected outside of mysql, https://regex101.com/r/wE8vY5/1.
I came up with an alternative to that regex which is
SELECT column FROM table WHERE column NOT REGEXP '([:alpha:]|-|\')'
so the question isn't how do I get this to work. The question is why doesn't the first regex work?
Here's a SQL fiddle of the issue, http://sqlfiddle.com/#!9/f8a006/1.
Also, there is no language being used here, query is being run at DB level.
Regex in PHP: http://sandbox.onlinephpfunctions.com/code/10f5fe2939bdbbbebcc986c171a97c0d63d06e55
Regex in JS: https://jsfiddle.net/6ay4zmrb/
Just change the order.
SELECT column FROM table WHERE column REGEXP '[^-A-Za-z\']'
@Avinash Raj is correct: the - must be first (or last). The \ is not an escape character inside a POSIX bracket expression, which is the syntax MySQL uses; see https://dev.mysql.com/doc/refman/5.1/en/regexp.html.
One key syntactic difference is that the backslash is NOT a metacharacter in a POSIX bracket expression.
-http://www.regular-expressions.info/posixbrackets.html
What special characters must be escaped in regular expressions?
Inside character classes, the backslash is a literal character in POSIX regular expressions. You cannot use it to escape anything. You have to use "clever placement" if you want to include character class metacharacters as literals. Put the ^ anywhere except at the start, the ] at the start, and the - at the start or the end of the character class to match these literally.
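The "clever placement" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: posix_bracket is a hypothetical helper, it puts ] first, ^ after other characters, and - last, and it simply rejects [ so that it never accidentally forms [:, [. or [= inside the brackets.

```python
def posix_bracket(ranges=(), literals="", negate=False):
    """Build a POSIX bracket expression using placement, not escaping.

    ranges   -- sequence of ranges such as ("A-Z", "a-z")
    literals -- single characters to match literally
    Hypothetical helper: '[' is rejected rather than handled.
    """
    if "[" in literals:
        raise ValueError("'[' is not handled by this sketch")
    has_caret = "^" in literals
    has_hyphen = "-" in literals
    plain = "".join(c for c in literals if c not in "]^-")
    # ']' goes first so it is read as a literal, then ranges, then the rest.
    body = ("]" if "]" in literals else "") + "".join(ranges) + plain
    if has_caret and not body and not negate:
        if not has_hyphen:
            raise ValueError("a lone '^' cannot be placed inside brackets")
        body = "-^"  # hyphen first, caret second
    else:
        # '^' anywhere but the start, '-' at the very end.
        body += ("^" if has_caret else "") + ("-" if has_hyphen else "")
    if not body:
        raise ValueError("nothing to match")
    return "[" + ("^" if negate else "") + body + "]"


# The class from the question, with the hyphen moved to the end:
not_name_char = posix_bracket(("A-Z", "a-z"), "-'", negate=True)
```

Here the hyphen ends up last rather than first; both placements are allowed by the rule quoted above. The apostrophe still needs SQL-level quoting if the result is written directly into a string literal instead of being bound as a parameter.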

MySQL REGEXP word boundaries [[:<:]] [[:>:]] and double quotes

I'm trying to match some whole-word-expressions with the MySQL REGEXP function. There is a problem, when there are double quotes involved.
The MySQL documentation says: "To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters."
But these queries all return 0:
SELECT '"word"' REGEXP '[[:<:]]"word"[[:>:]]'; -> 0
SELECT '"word"' REGEXP '[[:<:]]\"word\"[[:>:]]'; -> 0
SELECT '"word"' REGEXP '[[:<:]]\\"word\\"[[:>:]]'; -> 0
SELECT '"word"' REGEXP '[[:<:]] word [[:>:]]'; -> 0
SELECT '"word"' REGEXP '[[:<:]][[.".]]word[[.".]][[:>:]]'; -> 0
What else can I try to get a 1? Or is this impossible?
Let me quote the documentation first:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and
end of words, respectively. A word is a sequence of word characters
that is not preceded by or followed by word characters. A word
character is an alphanumeric character in the alnum class or an
underscore (_).
The documentation shows the reason for your problem, and it has nothing to do with escaping. You are trying to match the word boundary [[:<:]] right at the beginning of the string, which won't work: a word boundary separates a word character from a non-word character, but here the first character is a ", which isn't a word character, so there is no word boundary before it. The same goes for the last " and [[:>:]].
In order for this to work, you need to change your expression a bit to this one:
"[[:<:]]word[[:>:]]"
 ^^^^^^^    ^^^^^^^
Notice how the word boundary separates a non-word character " from a word character w in the beginning and a " from d at the end of the string.
EDIT: If you always want to use a word boundary at the start and end of the string without knowing if there will be an actual boundary then you might use the following expression:
([[:<:]]|^)"word"([[:>:]]|$)
This matches either a word boundary or the start of the string (^) at the beginning, and either a word boundary or the end of the string ($) at the end. I really advise you to study the data you are trying to match, look for common patterns, and not use regular expressions if they are not the right tool for the job.
SQL Fiddle Demo
From MySQL 8.0.4 onward, use: \\bword\\b
ref. https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-compatibility
In MySQL 8 and above
Adding to Oleksiy Muzalyev's answer
https://dev.mysql.com/doc/refman/8.0/en/regexp.html#regexp-compatibility
In MySQL 8.0.4 and above, you have to use:
\bword\b
where \b is the ICU syntax for a word boundary. The previous Spencer library used [[:<:]] and [[:>:]] to represent word boundaries.
When actually using this as part of a query, I've had to escape the escape character \ so my query actually looked like
SELECT * FROM table WHERE field RLIKE '\\bterm\\b'
When querying from PHP, use SINGLE quotes to do the same thing
$sql = 'SELECT * FROM table WHERE field RLIKE ?';
$args = ['\\bterm\\b'];
...
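If the same code has to run against servers on either side of the 8.0.4 change, the choice of boundary tokens could be made at run time. The following is a small Python sketch under stated assumptions: both function names are made up, the version is passed in as a tuple such as (8, 0, 4), and only MySQL is considered (MariaDB uses a different regex library, so this check would not apply there).

```python
def word_boundaries(server_version: tuple) -> tuple:
    """Return (start, end) word-boundary tokens for a MySQL version.

    Hypothetical helper: server_version is a tuple such as (8, 0, 4).
    """
    if server_version >= (8, 0, 4):
        return (r"\b", r"\b")          # ICU syntax
    return ("[[:<:]]", "[[:>:]]")      # Spencer syntax


def whole_word_pattern(word: str, server_version: tuple) -> str:
    """Wrap word in the boundary tokens that fit the given version.

    Assumes word contains no regex metacharacters.
    """
    start, end = word_boundaries(server_version)
    return start + word + end


new_style = whole_word_pattern("term", (8, 0, 4))
old_style = whole_word_pattern("term", (5, 7, 30))
```

The strings returned here are the regex text itself, so they suit bound parameters; when written directly into a SQL string literal, the backslashes have to be doubled as in the RLIKE query above.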
You need to be a little more sophisticated:
SELECT '"word"' REGEXP '"word"'; --> 1
SELECT '"This is" what I need' REGEXP '"This is" what I need[[:>:]]'; --> 1
That is,
If the test string begins/ends with a 'letter', then precede/follow the string with [[:<:]]/[[:>:]].
This is as opposed to blindly tacking those onto the string. After all, you are already inspecting the search string for special regexp characters to escape them. This is just another task in that vein. The definition of 'letter' should match whatever the word-boundary tokens look for.
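A minimal sketch of that idea in Python, assuming the older Spencer syntax and treating ASCII letters, digits and underscore as "word" characters. The function names and the set of characters that get escaped are my own assumptions, not something taken from the MySQL documentation.

```python
# Characters assumed to need a backslash outside bracket expressions.
SPECIALS = set(".[]()*+?{}|^$\\")


def is_word_char(c: str) -> bool:
    """ASCII letter, digit or underscore (assumed 'word' definition)."""
    return c.isascii() and (c.isalnum() or c == "_")


def escape_regex(text: str) -> str:
    """Backslash-escape the assumed special characters in text."""
    return "".join("\\" + c if c in SPECIALS else c for c in text)


def bounded_pattern(term: str) -> str:
    """Add [[:<:]] / [[:>:]] only where the term starts/ends with a word char."""
    if not term:
        raise ValueError("empty search term")
    start = "[[:<:]]" if is_word_char(term[0]) else ""
    end = "[[:>:]]" if is_word_char(term[-1]) else ""
    return start + escape_regex(term) + end


quoted = bounded_pattern('"word"')
plain = bounded_pattern("well")
trailing_comma = bounded_pattern("Well,")
```

With this approach a term such as "Well," gets a leading boundary but no trailing one, since [[:>:]] cannot match right after a comma.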

How do you find words with hyphens in a MySQL REGEXP query using word boundaries?

I have a MYSQL query to try to find words with hyphens. I am using the MYSQL word boundary.
SELECT COUNT(id)
AS count
FROM table
WHERE (name REGEXP '^[[:<:]]some-words-with-hyphens[[:>:]]/')
This seems to work, although the following does not (see the - after the word "hyphens"):
SELECT COUNT(id)
AS count
FROM table
WHERE (words REGEXP '^[[:<:]]some-words-with-hyphens-[[:>:]]/')
I tried to escape the -'s with \- but that did not seem to change the result. I also tried to put the - in brackets like [-], but that did not seem to change the result.
What would be the proper way to write this query with the understanding that hyphens will be within and possibly at the end of the "word"?
As documented under Regular Expressions:
A regular expression for the REGEXP operator may use any of the following special characters and constructs:
[ deletia ]
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
Since - and / are both non-word characters, the [[:>:]] construct does not match the point between them.
It's not clear why you're using these constructs at all, as the following ought to do the trick:
words REGEXP '^some-words-with-hyphens-/'
Indeed, it's not clear why you're even using regular expressions in this case, as simple pattern matching can achieve the same:
words LIKE 'some-words-with-hyphens-/%'
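If the prefix comes from user input, the LIKE route needs its own escaping, because % and _ are wildcards. A short Python sketch, assuming MySQL's default LIKE escape character (backslash) and a hypothetical helper name, like_prefix:

```python
def like_prefix(text: str) -> str:
    """Build a LIKE pattern that matches values starting with text.

    Hypothetical helper. Assumes the default LIKE escape character,
    backslash, and that the result is passed as a bound parameter.
    """
    escaped = (
        text.replace("\\", "\\\\")  # escape the escape character first
            .replace("%", "\\%")    # literal percent sign
            .replace("_", "\\_")    # literal underscore
    )
    return escaped + "%"


hyphen_prefix = like_prefix("some-words-with-hyphens-/")
```

Hyphens and slashes have no special meaning in LIKE, so the example from the answer comes out unchanged apart from the trailing %.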
Assuming that some-words-with-hyphens is actually a regex and not some verbatim text, you could simply add an optional - at the end of the regex in order to match a trailing dash if it's present:
WHERE (name REGEXP '^[[:<:]]some-words-with-hyphens[[:>:]]-?/')
@eggyal has already explained why the word boundary matches before that hyphen.