MySQL regex for word boundary containing '#'

I'm trying to search for an example phrase, '#test123', using a regex like:
SELECT (...) WHERE x RLIKE '[[:<:]]#test123[[:>:]]'
With no luck. Probably the word-boundary marker '[[:<:]]' does not treat '#' as a word character.
How can I achieve this? Is there a way to use a word-boundary marker in a MySQL regex, but with exceptions?

MySQL 5.7 Reference Manual / ... / Regular Expressions:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and
end of words, respectively. A word is a sequence of word characters
that is not preceded by or followed by word characters. A word
character is an alphanumeric character in the alnum class or an
underscore (_).
So # is not a word character, which means the position right before or after it can be a word boundary. We need to expand the "word characters" class to include # too. The simplest way is to enumerate the custom word characters directly, a-z0-9_#:
SELECT * FROM
(
SELECT '#test123' AS x UNION ALL
SELECT 'and #test123 too' UNION ALL
SELECT 'not#test123not' UNION ALL
SELECT 'not#test123' UNION ALL
SELECT '#test123not' UNION ALL
SELECT 'not # test123' UNION ALL
SELECT 'test123' UNION ALL
SELECT '#west123'
) t
WHERE x RLIKE '([^a-z0-9_#]|^)#test123([^a-z0-9_#]|$)';
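To sanity-check the pattern outside MySQL, here is a minimal Python sketch that runs the same character-class regex over the sample rows. It assumes Python's `re` treats this simple pattern the same way MySQL does, since no POSIX word markers are involved. It uses `re.IGNORECASE` to mimic a case-insensitive collation. The names `ROWS`, `PATTERN` and `matching_rows` are hypothetical.

```python
import re

# Same sample rows as the derived table in the SQL query above.
ROWS = [
    "#test123",
    "and #test123 too",
    "not#test123not",
    "not#test123",
    "#test123not",
    "not # test123",
    "test123",
    "#west123",
]

# Custom "word characters" a-z0-9_# must not touch the match on either side;
# otherwise the match must sit at the start (^) or end ($) of the string.
PATTERN = re.compile(r"([^a-z0-9_#]|^)#test123([^a-z0-9_#]|$)", re.IGNORECASE)


def matching_rows(rows):
    """Return the rows the pattern finds, like WHERE x RLIKE '...'."""
    return [x for x in rows if PATTERN.search(x)]


print(matching_rows(ROWS))  # ['#test123', 'and #test123 too']
```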
Result:
x
----------------
#test123
and #test123 too

I think you can use the expression below instead:
'[[.#.]][[:<:]]test123[[:>:]]'
Note: don't put non-word literals inside [[:<:]]...[[:>:]]; match such characters with the [[.x.]] collating-element syntax instead.
Or (with thanks to @Y.B.):
'(^|.*[^a-zA-Z0-9_])[[.#.]][[:<:]]test123[[:>:]]'
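Python's `re` has no `[[:<:]]`/`[[:>:]]` markers and no collating elements. The following sketch therefore translates the second expression using `\b` and a literal `#`, and runs it over the same sample rows as the first answer. Treat it as an approximation of the MySQL 5.7 behaviour, not an exact port. The names `ROWS`, `PATTERN` and `matching_rows` are hypothetical.

```python
import re

ROWS = [
    "#test123",
    "and #test123 too",
    "not#test123not",
    "not#test123",
    "#test123not",
    "not # test123",
    "test123",
    "#west123",
]

# Assumed translation of the second expression:
#   (^|[^a-zA-Z0-9_])  start of string, or a non-word character before '#'
#   \btest123\b        stands in for [[:<:]]test123[[:>:]]
# The original leading ".*" is dropped because re.search is unanchored anyway.
PATTERN = re.compile(r"(^|[^a-zA-Z0-9_])#\btest123\b")


def matching_rows(rows):
    """Return the rows in which the pattern occurs anywhere."""
    return [x for x in rows if PATTERN.search(x)]


print(matching_rows(ROWS))  # ['#test123', 'and #test123 too']
```

The prefix group is what rejects 'not#test123'. Without it, '#' preceded by a letter would still be accepted, because the word markers only constrain the text after '#'.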

Related

MySQL 8 + Regex Word Boundaries

I want to search for the term 'ed' at the start or end of a word. The following SQL statement only matches an exact word:
SELECT * FROM ul_product
where productname REGEXP '\\bed\\b'
If I do the following, I get results where 'ed' is at the start or end of a word:
SELECT * FROM ul_product
where productname REGEXP '(\\bed)|(ed\\b)'
Is this how it's supposed to work?
The description of word boundaries and the examples online led me to believe that statement 1 would produce the results of statement 2.
I can use the statements I've created as-is for my 'exact' and 'partial' matching, but is this right?
The regex '\\bed\\b' searches for 'ed' surrounded by word boundaries; in other words, it searches for the whole word 'ed'.
On the other hand, the regex '(\\bed)|(ed\\b)' searches for either '\\bed' or 'ed\\b' (the pipe character means "or" in regexes). So it matches 'ed' at the beginning of a word or at the end of a word, which seems to be what you want.
Note that the parentheses are not necessary here. You could just write this as:
where productname REGEXP '\\bed|ed\\b'
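To see the difference between the two patterns, here is a small Python sketch. It assumes Python's `\b` behaves like the ICU `\b` used by MySQL 8.0 for plain ASCII words like these. It uses `re.IGNORECASE` to mimic a case-insensitive collation. Inside a Python raw string the backslash is written once, whereas in the SQL string literal it is doubled ('\\b'). The sample values and the helper `matches` are hypothetical.

```python
import re

WORDS = ["ed", "edit", "red", "bedroom", "Ed Sheeran"]

# Whole word 'ed' only.
EXACT = re.compile(r"\bed\b", re.IGNORECASE)
# 'ed' at the start of a word OR at the end of a word.
PARTIAL = re.compile(r"\bed|ed\b", re.IGNORECASE)


def matches(pattern, values):
    """Return the values in which the pattern occurs anywhere, like REGEXP."""
    return [v for v in values if pattern.search(v)]


print(matches(EXACT, WORDS))    # ['ed', 'Ed Sheeran']
print(matches(PARTIAL, WORDS))  # ['ed', 'edit', 'red', 'Ed Sheeran']
```

'bedroom' matches neither pattern: its 'ed' has word characters on both sides.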

MySQL regex search with digit

How would I do the following in MySQL?
SELECT * FROM table WHERE search REGEXP '.+season\d+\s?.+' limit 10;
I want to match something like:
"hello this is season1 how are you?"
But not:
"hello this is season1episode1 how are you?"
\d and \s are not available in MySQL before 8.0, so use character classes instead.
You can replace \d with [[:digit:]] or [0-9], and \s with [[:space:]]; if you only need to match a literal space, [[= =]] or [ ] also works.
SELECT * FROM table WHERE search REGEXP '.+season[[:digit:]]+[[= =]].+' LIMIT 10
-- or...
SELECT * FROM table WHERE search REGEXP '.+season[0-9]+[ ].+' LIMIT 10
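Here is a quick Python check of the second query's pattern against the two example strings from the question. Python's `re` has no POSIX classes, so `[0-9]` stands in for `[[:digit:]]` and a literal space for `[[= =]]` / `[ ]`. The equivalence with MySQL's engine for this simple pattern is an assumption. `PATTERN` and `is_match` are hypothetical names.

```python
import re

# Python stand-in for REGEXP '.+season[0-9]+[ ].+'
PATTERN = re.compile(r".+season[0-9]+ .+")


def is_match(text):
    """True if the pattern occurs anywhere in text, as with REGEXP."""
    return PATTERN.search(text) is not None


print(is_match("hello this is season1 how are you?"))          # True
print(is_match("hello this is season1episode1 how are you?"))  # False
```

The second string fails because the required space after the digits is an 'e' there.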
Before MySQL 8.0:
REGEXP "season[0-9]+[[:>:]]"
means "season", at least one digit, then a word boundary. Note that punctuation also ends a word, so "season1," matches too.
REGEXP "season[0-9]+[^a-zA-Z]"
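might also work for you: it says that the digits must be followed by a character that is not a letter, so it does not match when the digits are at the very end of the string.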
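MySQL 8.0 changes the word-boundary syntax to:
REGEXP "season[0-9]+\\b"
(The backslash is doubled because, in the default SQL mode, \b inside a MySQL string literal is read as a backspace escape; with NO_BACKSLASH_ESCAPES enabled, use a single backslash.)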
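The sketch below compares the "not followed by a letter" pattern with the word-boundary pattern on a few sample strings. It assumes Python's `\b` plays the role of MySQL 8.0's `\b` (written '\\b' inside a SQL string literal) for ASCII text like this. The sample strings, `NOT_LETTER`, `BOUNDARY` and `check` are hypothetical.

```python
import re

# Digits followed by some character that is not a letter.
NOT_LETTER = re.compile(r"season[0-9]+[^a-zA-Z]")
# Digits followed by a word boundary.
BOUNDARY = re.compile(r"season[0-9]+\b")


def check(text):
    """Return (matches NOT_LETTER, matches BOUNDARY) for text."""
    return bool(NOT_LETTER.search(text)), bool(BOUNDARY.search(text))


for text in [
    "hello this is season1 how are you?",
    "hello this is season1episode1 how are you?",
    "watch season1",
    "season1, part 2",
]:
    print(repr(text), check(text))
```

Both patterns reject "season1episode1". They differ on "watch season1": the `[^a-zA-Z]` version needs a character after the digits, while the boundary also matches at the end of the string. Both accept "season1," because a comma is neither a letter nor a word character.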

MySQL query to get all rows where a specific character is non-English

I have a table with an nvarchar column.
It contains two kinds of values:
values with only digit characters
values with digit characters where the 3rd character is a non-English character
I want a query that returns all rows whose 3rd character is non-English.
EDIT
Using WHERE SUBSTRING(<table>.ColumnName, 3, 1) NOT BETWEEN '0' AND '9' worked for me as well.
I'd use REGEXP_LIKE with a regex that checks that the third character isn't a digit (the leading ^ anchors the check to the start of the string):
SELECT *
FROM mytable
WHERE REGEXP_LIKE(mycol, '^..[^[:digit:]]')
In MySQL versions older than 8.0, you could use the REGEXP operator:
SELECT *
FROM mytable
WHERE mycol REGEXP '^..[^[:digit:]]'
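REGEXP and REGEXP_LIKE report a match anywhere in the string unless the pattern starts with `^`. The Python sketch below compares an unanchored and an anchored version of the third-character check. `[0-9]` stands in for `[[:digit:]]`, which Python's `re` does not support. The value '1234a' is a hypothetical value outside the two kinds described in the question, included to show where the two versions diverge.

```python
import re

UNANCHORED = re.compile(r"..[^0-9].*")
ANCHORED = re.compile(r"^..[^0-9]")


def third_char_not_digit(pattern, value):
    """Mimics REGEXP / REGEXP_LIKE: True if the pattern occurs anywhere."""
    return pattern.search(value) is not None


for value in ["123456", "12é456", "1234a"]:
    print(value,
          third_char_not_digit(UNANCHORED, value),
          third_char_not_digit(ANCHORED, value))
# 123456 False False
# 12é456 True True
# 1234a True False
```

For the question's data both versions agree, but only the anchored one actually tests position 3.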
You can use the RLIKE operator. The query below matches rows whose third character is neither a digit nor an English letter:
SELECT * FROM
mytable
where SUBSTR(mycol,3,1) NOT RLIKE '^[A-Za-z0-9]$';
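The SUBSTR-based query can be mirrored in Python to check its logic on a few sample values. This sketch assumes that SUBSTR counts characters (not bytes) and that `[A-Za-z0-9]` is compared as plain ASCII. Collation effects such as accent-insensitivity are not modelled. The sample rows and the helper name are hypothetical.

```python
import re

# Emulates: SUBSTR(mycol, 3, 1) NOT RLIKE '^[A-Za-z0-9]$'
ASCII_ALNUM = re.compile(r"^[A-Za-z0-9]$")


def third_char_is_non_english(value):
    # SUBSTR positions are 1-based, so position 3 is Python index 2.
    # Slicing returns '' for short strings, as SUBSTR does.
    third = value[2:3]
    return ASCII_ALNUM.search(third) is None


ROWS = ["123456", "12é456", "12a456"]
print([v for v in ROWS if third_char_is_non_english(v)])  # ['12é456']
```

Unlike the "not a digit" regex, this version also rejects rows whose third character is an English letter, such as '12a456'.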

Special chars in SQL regex - match word boundary with special chars

I've got a search function that builds a query. My goal is to search for an exact word, so if the phrase is 'hello' it should return only results containing 'hello' (not 'xhello', 'helloxx', etc.). My code looks like this:
SELECT (...) WHERE x RLIKE '[[:<:]]word[[:>:]]'
It works in most cases, BUT
the problems start when the phrase is, e.g., '$hello' or 'helloĊ': the special characters break it.
Is there a way to handle this?
Try one of the following (RLIKE is a synonym for REGEXP, so each pattern works with either operator):
SELECT * FROM table WHERE x RLIKE '(^|[[:space:]])Hello([[:space:]]|$)'
or
SELECT * FROM table WHERE x RLIKE '(^| )Hello( |$)'
or
SELECT * FROM table WHERE x REGEXP '(^|[[:space:]])Hello([[:space:]]|$)'
or
SELECT * FROM test WHERE name REGEXP '(^| )Hello( |$)'
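Here is a minimal Python sketch of the whitespace-delimited approach. It assumes `\s` is close enough to `[[:space:]]`, and uses `re.IGNORECASE` to mimic a case-insensitive collation. It uses `re.escape()` so that a phrase such as '$hello' is matched literally. In the SQL version the same care is needed, because '$' is an end-of-string anchor in MySQL regexes too; it would have to be written as '\\$' or '[$]' inside the string literal. The helper names are hypothetical.

```python
import re


def whole_word_pattern(phrase):
    # Start of string or whitespace, the escaped phrase, then whitespace or end.
    return re.compile(r"(^|\s)" + re.escape(phrase) + r"(\s|$)", re.IGNORECASE)


def contains_word(text, phrase):
    """True if phrase occurs in text delimited by whitespace or string edges."""
    return whole_word_pattern(phrase).search(text) is not None


print(contains_word("say $hello now", "$hello"))  # True
print(contains_word("x$hello", "$hello"))         # False
print(contains_word("helloĊ there", "helloĊ"))    # True
print(contains_word("Hello world", "hello"))      # True
print(contains_word("xhello", "hello"))           # False
```

The trade-off of using spaces as delimiters: a word followed by punctuation, as in "hello, world", no longer counts as a match for 'hello'.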

More efficient word boundary query in MySQL

I have a table with half a million phrases, and I am doing word matching using this query:
SELECT * FROM `searchIndex` WHERE `indexData` RLIKE '[[:<:]]Hirt'
The indexData field has a FULLTEXT index and is datatype longtext.
I want to match items like
"Alois M. Hirt"
"Show Biz - Al Hirt, in a new role, ..."
"Al Hirt's Sinatraville open 9 p..."
"Hirt will be playing..."
but not "shirt", "thirteen", "thirty", etc.
The query works, but it frequently takes 3 seconds to return, and I wonder whether there is a better, more efficient way of doing this word-boundary match.
If I were to add another index to indexData, what would be the correct key length to use?
There is no need for a FULLTEXT index; MySQL has special markers for word boundaries. From the MySQL documentation:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
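To confirm that a start-of-word marker gives the wanted matches on the question's examples, here is a small Python sketch. It assumes `\bHirt` is an adequate stand-in for `[[:<:]]Hirt`, since a boundary directly before a word character is a start of word. It uses `re.IGNORECASE` to mimic a case-insensitive collation. It only checks which rows match, not how fast: a REGEXP predicate has to be evaluated row by row. The helper name `matching` is hypothetical.

```python
import re

# Word boundary immediately before 'Hirt'.
START_OF_WORD_HIRT = re.compile(r"\bHirt", re.IGNORECASE)

TEXTS = [
    "Alois M. Hirt",
    "Show Biz - Al Hirt, in a new role, ...",
    "Al Hirt's Sinatraville open 9 p...",
    "Hirt will be playing...",
    "shirt",
    "thirteen",
    "thirty",
]


def matching(texts):
    """Return the texts that contain a word starting with 'Hirt'."""
    return [t for t in texts if START_OF_WORD_HIRT.search(t)]


for t in matching(TEXTS):
    print(t)  # prints the four "Hirt" examples, none of the "shirt"-like words
```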
setsuna's answer worked very well:
SELECT * FROM searchIndex WHERE MATCH (indexData) AGAINST ('Hirt*' IN BOOLEAN MODE);