More efficient word boundary query in mySQL


I have a table with 1/2 million phrases and I am doing word matching using this query:
SELECT * FROM `searchIndex` WHERE `indexData` RLIKE '[[:<:]]Hirt'
The indexData field has a FULLTEXT index and is datatype longtext.
I want to match on items like
"Alois M. Hirt"
"Show Biz - Al Hirt, in a new role, ..."
"Al Hirt's Sinatraville open 9 p..."
"Hirt will be playing..."
and not on "shirt" or "thirteen" or "thirty" etc.
The query is succeeding but it frequently takes 3 seconds to return and I wondered if there was a better, more efficient way of doing this word boundary match?
If I were to add another index to indexData what would be the correct keylength to use?
TIA

No need to have a FULLTEXT index. MySQL has special markers for word boundaries. From the MySQL doc:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0

setsuna's answer worked very well:
SELECT * FROM searchIndex WHERE MATCH (indexData) AGAINST ('Hirt*' IN BOOLEAN MODE);
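To see which of the sample phrases a word-start match should accept, here is a minimal sketch in Python. It assumes that Python's `\b` is an acceptable stand-in for MySQL's `[[:<:]]` on these ASCII samples, and it uses a case-insensitive match on the assumption that the column has a case-insensitive collation. It only illustrates the matching rule; it says nothing about the speed of either query.

```python
import re

# Assumption: Python's \b approximates MySQL's [[:<:]] for plain ASCII text.
# IGNORECASE assumes a case-insensitive (non-binary) column collation.
WORD_START_HIRT = re.compile(r"\bHirt", re.IGNORECASE)

phrases = [
    "Alois M. Hirt",
    "Show Biz - Al Hirt, in a new role, ...",
    "Al Hirt's Sinatraville open 9 p...",
    "Hirt will be playing...",
    "shirt",
    "thirteen",
    "thirty",
]

# Keep only the phrases where "Hirt" begins a word.
matches = [p for p in phrases if WORD_START_HIRT.search(p)]

for phrase in matches:
    print(phrase)
```

The first four phrases are kept, while "shirt", "thirteen" and "thirty" are rejected because "hirt" is preceded by a word character in each of them.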

Related

MySQL 8 + Regex Word Boundaries

I want to search for the term 'ed' at the start or end of a word, but the following SQL statement only returns exact whole-word matches.
SELECT * FROM ul_product
where productname REGEXP '\\bed\\b'
If I do the following, I get results where 'ed' is at the start or end of a word:
SELECT * FROM ul_product
where productname REGEXP '(\\bed)|(ed\\b)'
Is this how it's supposed to work?
The description of word boundary and examples online led me to believe statement 1 would produce the results of statement 2.
I can use the statements I've created as is for my 'exact' and 'partial' matching, but is this right?

The regex '\\bed\\b' searches for 'ed' surrounded by word boundaries; in other words, it searches for the whole word 'ed'.
On the other hand, the regex '(\\bed)|(ed\\b)' searches for either '\\bed' or 'ed\\b' (the pipe character stands for "or" in regexes). So it matches 'ed' at the beginning of a word or at the end of a word, which seems to be what you want.
Note that the parentheses are not necessary here. You could just write this as:
where productname REGEXP '\\bed|ed\\b'
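The difference between the two patterns can be checked quickly outside the database. The sketch below uses Python's `re` module, whose `\b` behaves like the `\b` in the question for plain ASCII text. The product names are made up for illustration and are not from the `ul_product` table.

```python
import re

# Whole word only: a boundary is required on both sides of "ed".
exact = re.compile(r"\bed\b")
# "ed" at the start of a word OR at the end of a word.
partial = re.compile(r"\bed|ed\b")

# Hypothetical product names, chosen to cover each case.
names = ["ed", "bed frame", "edible glitter", "redwood table"]

exact_hits = [n for n in names if exact.search(n)]
partial_hits = [n for n in names if partial.search(n)]

print(exact_hits)    # names containing "ed" as a whole word
print(partial_hits)  # names with a word that starts or ends with "ed"
```

"redwood table" is rejected by both patterns because its "ed" sits in the middle of a word, with a word character on each side.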

Search using REGEXP with no repeating characters in mysql

I have a database of English words and I want to use REGEXP to search it.
I used this query:
SELECT * FROM `english` WHERE CHAR_LENGTH(words)=4 AND `words` REGEXP '^[oofd]+$'
It works, and it selects words like food, foo, of, do,
but it also selects words like fooo, with three o's, even though my regexp contains only two o's.
What is the right regular expression to select words without using a character more often than it is listed?
For example, if oo is given, it should select only words with two, one or zero o's, not three or more.
I searched the internet and came up with this:
^(?:([oofd])(?!.*\1))
but it gives me this error:
#1139 - Got error 'repetition-operator operand invalid' from regexp

Check for the presence of two vowels in sequence and the absence of three or more vowels in sequence:
SELECT *
FROM english
WHERE
words REGEXP '^.*[aeiou]{2}.*$' AND
words NOT REGEXP '^.*[aeiou]{3}.*$'
If you only want to look for certain repeated vowels, you should be able to adapt this answer fairly easily.
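As a rough check of the two conditions in this answer, the sketch below applies the same pair of tests in Python. The word list is an assumption made up for illustration. The `^.*` and `.*$` wrappers from the SQL are left out because a search already matches anywhere in the string.

```python
import re

# Same character classes as the SQL above, without the redundant ^.* and .*$.
two_vowels = re.compile(r"[aeiou]{2}")
three_vowels = re.compile(r"[aeiou]{3}")

# Hypothetical sample words.
words = ["food", "fooo", "do", "of", "queue"]

# Keep words that have two vowels in a row but never three in a row.
kept = [w for w in words
        if two_vowels.search(w) and not three_vowels.search(w)]

print(kept)
```

"fooo" and "queue" are dropped by the second condition, and "do" and "of" never satisfy the first one, so only "food" remains.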

MySQL regex for word boundary containing '#'

I'm trying to search for an example phrase: '#test123' using regex like:
SELECT (...) WHERE x RLIKE '[[:<:]]#test123[[:>:]]'
With no luck. Probably the word boundary marker '[[:<:]]' does not treat '#' as a word character.
How can I achieve this? Is there a way to use a word boundary in a MySQL regex, but with exceptions?

MySQL 5.7 Reference Manual / ... / Regular Expressions:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and
end of words, respectively. A word is a sequence of word characters
that is not preceded by or followed by word characters. A word
character is an alphanumeric character in the alnum class or an
underscore (_).
So # acts as a word boundary; it is not a word character. We need to expand the "word characters" class to include # too. The simplest way is to enumerate the custom word characters directly, as a-z0-9_#:
SELECT * FROM
(
SELECT '#test123' AS x UNION ALL
SELECT 'and #test123 too' UNION ALL
SELECT 'not#test123not' UNION ALL
SELECT 'not#test123' UNION ALL
SELECT '#test123not' UNION ALL
SELECT 'not # test123' UNION ALL
SELECT 'test123' UNION ALL
SELECT '#west123'
) t
WHERE x RLIKE '([^a-z0-9_#]|^)#test123([^a-z0-9_#]|$)';
Result:
x
----------------
#test123
and #test123 too
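The same pattern can be tried outside MySQL. This is a minimal sketch in Python, run against the eight test strings from the query above. It assumes a case-insensitive comparison, as RLIKE would do on a non-binary string.

```python
import re

# The custom-boundary pattern from the answer: the match must be preceded by
# the start of the string or a character outside a-z0-9_#, and followed by
# the end of the string or a character outside a-z0-9_#.
pattern = re.compile(r"([^a-z0-9_#]|^)#test123([^a-z0-9_#]|$)", re.IGNORECASE)

rows = [
    "#test123",
    "and #test123 too",
    "not#test123not",
    "not#test123",
    "#test123not",
    "not # test123",
    "test123",
    "#west123",
]

selected = [x for x in rows if pattern.search(x)]

for x in selected:
    print(x)
```

Only the first two strings are selected, which agrees with the result shown above.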

I think you can use below expression instead:
'[.#.][[:<:]]test123[[:>:]]'
Note: don't use non-word literals inside [[:<:]][[:>:]] and use [..] for characters.
Or (with thanks to @Y.B.)
'(^|.*[^a-zA-Z0-9_])[.#.][[:<:]]test123[[:>:]]'

Mysql REGEXP select numbers starting with given digit(s)

My table has a column with comma-separated numbers (possibly followed by a space, too); those numbers can have from five to twelve digits.
9645811, 9646011,9645911, 9646111
or
41031, 41027, 559645811, 5501006009
I need to select the rows with that column containing a number STARTING with given digits. In the above examples, only the first has to be selected. What I've tried so far:
SELECT myfield FROM mytable
WHERE myfield REGEXP ('(^|[,\s]+)(96458[\d]*)([,\s]*|$)');
However the query returns no results. I'd like to select only the first row, where there is a number STARTING with 96458.
Any help would be appreciated :)

You need to use a starting word boundary [[:<:]]:
SELECT myfield FROM mytable WHERE myfield REGEXP ('[[:<:]]96458');
See the MySQL regex syntax for more details.
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively.
See this SQL fiddle.
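To illustrate why a starting word boundary is enough here, the following sketch runs the equivalent check in Python on the two example rows from the question. Python's `\b` is used as a stand-in for `[[:<:]]`, which is an assumption that holds for digits, commas and spaces.

```python
import re

# \b before 96458 requires that the digits begin a number, since a comma,
# a space or the start of the string precedes them.
starts_with_96458 = re.compile(r"\b96458")

rows = [
    "9645811, 9646011,9645911, 9646111",
    "41031, 41027, 559645811, 5501006009",
]

selected = [r for r in rows if starts_with_96458.search(r)]

print(selected)
```

The second row contains 96458 only inside 559645811, where it is preceded by another digit, so there is no word boundary and the row is not selected.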

MySQL query, checking for single vs double digits

I have a column in my client-provided database that has values such as '2; 3; 14' or '1', etc. I am using MySQL. How do I write the query so that
1) I can check if the column contains a number (1, for example)
2) I won't get a 'hit' if I am checking for a '1' and the value is actually '14', for example.
Thanks in advance

If the column is a varchar and you want to return the row when searching for '1' in '1;3;14', you can use the REGEXP operator with word boundary markers:
select * from test
where col regexp '[[:<:]]1[[:>:]]'
SQL FIddle Demo
From MySQL docs
Word Boundary Markers [[:<:]], [[:>:]]
These markers stand for word boundaries.
They match the beginning and end of words, respectively.
A word is a sequence of word characters that is not preceded by or followed by word characters.
A word character is an alphanumeric character in the alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
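A short sketch of the same idea in Python, using `\b1\b` as a stand-in for `[[:<:]]1[[:>:]]`. The first three values come from the question and answer above; the last one is a made-up extra case.

```python
import re

# A boundary on both sides means "1" must be a complete number,
# not a digit inside 14 or 11.
whole_one = re.compile(r"\b1\b")

values = ["2; 3; 14", "1", "1;3;14", "11; 21"]

hits = [v for v in values if whole_one.search(v)]

print(hits)
```

The values "2; 3; 14" and "11; 21" are rejected because every 1 in them is adjacent to another digit.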
