MySQL 8 + Regex Word Boundaries

I want to search for the term 'ed' at the start or end of a word, but the following SQL statement only matches an exact word:
SELECT * FROM ul_product
where productname REGEXP '\\bed\\b'
If I do the following, I get results where 'ed' is at the start or end of a word:
SELECT * FROM ul_product
where productname REGEXP '(\\bed)|(ed\\b)'
Is this how it's supposed to work?
The description of word boundary and examples online led me to believe statement 1 would produce the results of statement 2.
I can use the statements I've created as is for my 'exact' and 'partial' matching, but is this right?

The regex '\\bed\\b' searches for 'ed' surrounded by word boundaries - in other words, it searches for the word 'ed'.
On the other hand, the regex '(\\bed)|(ed\\b)' searches for either '\\bed' or 'ed\\b' (the pipe character stands for "or" in regexes). So it matches 'ed' at the beginning of a word or at the end of a word - which seems to be what you want.
Note that the parentheses are not necessary here. You could just write this as:
where productname REGEXP '\\bed|ed\\b'
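The difference between the two patterns can be checked outside the database. The following is a minimal sketch using Python's re module, on the assumption that its \b behaves like MySQL 8's \\b for plain ASCII text such as this; the product names are made up for illustration and do not come from the original table.

```python
import re

# Hypothetical product names, invented for this illustration.
names = ["ed", "bed frame", "edible oil", "red bed", "media player"]

exact = re.compile(r"\bed\b")        # 'ed' as a whole word
partial = re.compile(r"\bed|ed\b")   # 'ed' at the start or at the end of a word

exact_hits = [n for n in names if exact.search(n)]
partial_hits = [n for n in names if partial.search(n)]

print(exact_hits)    # only the standalone word 'ed'
print(partial_hits)  # also words that begin or end with 'ed'
```

'media player' is matched by neither pattern, because its 'ed' sits in the middle of a word.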

Related

How to find variable pattern in MySql with Regex?

I am trying to pull a product code from a long string formatted like a URL. The pattern is always 3 letters followed by 3 or 4 numbers (ex. ???### or ???####). I have tried using REGEXP and LIKE syntax, but my results are off for both and I am not sure which operators to use.
The first select statement is close to trimming the URL to show just the code, but oftentimes will show a random string of numbers it may find in the URL string.
The second select statement is more rudimentary, but I am unsure which operators to use.
Which would be the quickest solution?
SELECT columnName, SUBSTR(columnName, LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", columnName), LENGTH(columnName) - LOCATE(columnName REGEXP "[^=\-][a-zA-Z]{3}[\d]{3,4}", REVERSE(columnName))) AS extractedData FROM tableName
SELECT columnName FROM tableName WHERE columnName LIKE '%___###%' OR columnName LIKE '%___####%'
-- Will take a substring of this result as well
Example Data:
randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc
In this case, the desired string is "xyz123" and the location of said pattern is variable based on each entry.
EDIT
SELECT column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), SUBSTR(column, LOCATE(column REGEXP "([a-zA-Z]{3}[0-9]{3,4}$)", column), LENGTH(column) - LOCATE(column REGEXP "^.*[a-zA-Z]{3}[0-9]{3,4}", REVERSE(column))) AS extractData From mainTable
This expression is still not grabbing the right data, but I feel like it may get me closer.
I suggest using
REGEXP_SUBSTR(column, '(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])')
Details:
(?<=[&?]random_code=[^&#]{0,256}-) - immediately on the left, there must be & or ?, random_code=, and then zero to 256 chars other than & and # followed with a - char
[a-zA-Z]{3} - three ASCII letters
[0-9]{3,4} - three to four ASCII digits
(?![^&#]) - that are followed either with &, # or end of string.
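The same idea can be tried quickly outside MySQL. This is a rough sketch in Python, not the REGEXP_SUBSTR call itself: Python's re module rejects variable-width lookbehinds, so the prefix is matched normally and the code is taken from a capture group instead. The helper name extract_code is hypothetical.

```python
import re

# Capture-group variant of the pattern: the random_code=...- prefix is
# consumed by the match, and the product code is returned as group 1.
PATTERN = re.compile(r"[&?]random_code=[^&#]*-([a-zA-Z]{3}[0-9]{3,4})(?![^&#])")

def extract_code(url):
    """Return the product code, or None when the pattern does not match."""
    m = PATTERN.search(url)
    return m.group(1) if m else None

# Shared prefix of the sample URLs from the question.
base = "randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-"

print(extract_code(base + "xyz123&hello_world=us&etc_etc"))  # three letters, three digits
print(extract_code(base + "xyz89&hello_world=us&etc_etc"))   # too few digits, so no match
```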
See the online demo:
WITH cte AS ( SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz123&hello_world=us&etc_etc' val
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz4567&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz89&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-xyz00000&hello_world=us&etc_etc'
UNION ALL
SELECT 'randomwebsite.com/3982356923abcd1ab?random_code=12480712_ABC_DEF_ANOTHER_CODE-aaaaa11111&hello_world=us&etc_etc')
SELECT REGEXP_SUBSTR(val,'(?<=[&?]random_code=[^&#]{0,256}-)[a-zA-Z]{3}[0-9]{3,4}(?![^&#])') output
FROM cte
Output:
I'd make use of capture groups:
(?<=[=\-\\])([a-zA-Z]{3}[\d]{3,4})(?=[&])
I assume that with [^=\-] you wanted to match a code preceded by "-", "\" or "=" without including those characters in the result. To do that, use a positive lookbehind, (?<=...).
I also added a lookahead, (?=...), for "&".
If you'd like to fidget more with regex I recommend RegExr

MySQL command to get first letter of last name

Hello, I have made a dummy table that I am practicing with, and I am trying to get the first letter of the last name. For example, for Aba Kadabra and Alfa Kadabra the first letter of their last name is 'K', so I was testing some queries such as...
select * from employees
where full_name like 'K%'
select * from employees
where full_name like '%K'
Neither of these worked. Can anyone tell me the best way to accomplish this?
Because % works that way. See here
So 'K%' just returns every full_name that starts with K,
and '%K' returns every full_name that ends with K.
What you need is '% K%', which matches a space followed by K anywhere in the name. Please test it.
MySQL LIKE operator checks whether a specific character string matches
a specified pattern.
The LIKE operator does a pattern matching comparison. The operand to
the right of the LIKE operator contains the pattern and the left hand
operand contains the string to match against the pattern. A percent
symbol ("%") in the LIKE pattern matches any sequence of zero or more
characters in the string. An underscore ("_") in the LIKE pattern
matches any single character in the string. Any other character
matches itself or its lower/upper case equivalent (i.e.
case-insensitive matching). (A bug: SQLite only understands
upper/lower case for ASCII characters by default. The LIKE operator is
case sensitive by default for unicode characters that are beyond the
ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ'
LIKE 'Æ' is FALSE.)
You can use below query:
select * from employees where full_name like '% K%'
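To see the difference between the patterns without a MySQL server, here is a small sketch using Python's built-in sqlite3 module. It assumes SQLite's LIKE treats % the same way MySQL's does for this case, and the extra rows 'Karl Smith' and 'Mark Jones' are invented for contrast.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (full_name TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?)",
    [("Aba Kadabra",), ("Alfa Kadabra",), ("Karl Smith",), ("Mark Jones",)],
)

def names_like(pattern):
    """Return the full names matching a LIKE pattern, sorted by name."""
    rows = conn.execute(
        "SELECT full_name FROM employees WHERE full_name LIKE ? ORDER BY full_name",
        (pattern,),
    )
    return [r[0] for r in rows]

print(names_like("K%"))    # names whose first character is K
print(names_like("% K%"))  # names with a word after a space that starts with K
```

Note that '% K%' relies on a single space before the last name, so it would also match a middle name starting with K.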

Search using REGEXP with no repeating characters in mysql

I have a database of English words and I want to use REGEXP to search it.
I used this query:
SELECT * FROM `english` WHERE CHAR_LENGTH(words)=4 AND `words` REGEXP '^[oofd]+$'
It works and selects words like food, foo, of, do,
but it also selects words like fooo with three o's, although my regexp contains only two o's.
What is the right regular expression to select words without exceeding the number of times each character is listed?
For example, if a character is listed twice, like oo, it should select only words with two, one, or zero o's, not three or more.
I searched the internet and came up with this:
^(?:([oofd])(?!.*\1))
but it gives me an error:
#1139 - Got error 'repetition-operator operand invalid' from regexp
Check for the presence of two vowels in sequence and the absence of three or more vowels in sequence:
SELECT *
FROM english
WHERE
words REGEXP '^.*[aeiou]{2}.*$' AND
words NOT REGEXP '^.*[aeiou]{3}.*$'
If you only want to look for certain repeated vowels, you should be able to adapt this answer fairly easily.
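The two-condition check above can be tried on a few words with Python's re module. This is only a sketch of the same pair of tests (a match on two vowels in sequence and no match on three); the function name has_double_but_not_triple is hypothetical.

```python
import re

DOUBLE = re.compile(r"[aeiou]{2}")  # two vowels in sequence somewhere in the word
TRIPLE = re.compile(r"[aeiou]{3}")  # three vowels in sequence somewhere in the word

def has_double_but_not_triple(word):
    """Mirror of: words REGEXP double AND words NOT REGEXP triple."""
    return bool(DOUBLE.search(word)) and not TRIPLE.search(word)

for w in ["food", "fooo", "do"]:
    print(w, has_double_but_not_triple(w))
```

The leading ^.* and trailing .*$ from the SQL patterns are dropped here because search() already looks for the pattern anywhere in the string.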

Mysql REGEXP select numbers starting with given digit(s)

My table has a column with comma-separated numbers (possibly with a space after the comma); those numbers can have from five to twelve digits.
9645811, 9646011,9645911, 9646111
or
41031, 41027, 559645811, 5501006009
I need to select the rows with that column containing a number STARTING with given digits. In the above examples, only the first has to be selected. What I've tried so far:
SELECT myfield FROM mytable
WHERE myfield REGEXP ('(^|[,\s]+)(96458[\d]*)([,\s]*|$)');
However the query returns no results. I'd like to select only the first row, where there is a number STARTING with 96458.
Any help would be appreciated :)
You need to use a starting word boundary [[:<:]]:
SELECT myfield FROM mytable WHERE myfield REGEXP ('[[:<:]]96458');
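See the MySQL regex syntax for more details. Note that these markers belong to the older regex library; MySQL 8.0.4 and later use ICU, where the word boundary is written \\b instead.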
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively.
See this SQL fiddle.
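The effect of anchoring the digits to the start of a word can be sketched with Python's re module, where \b plays the role of the boundary marker here (an assumption that holds for these digit-only values). The two rows are the sample values from the question.

```python
import re

# \b before the digits requires 96458 to begin a number, not sit inside one.
STARTS_WITH = re.compile(r"\b96458")

rows = [
    "9645811, 9646011,9645911, 9646111",
    "41031, 41027, 559645811, 5501006009",
]

matching = [r for r in rows if STARTS_WITH.search(r)]
print(matching)  # the second row is skipped: 96458 only occurs inside 559645811
```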

More efficient word boundary query in mySQL

I have a table with half a million phrases and I am doing word matching using this query:
SELECT * FROM `searchIndex` WHERE `indexData` RLIKE '[[:<:]]Hirt'
The indexData field has a FULLTEXT index and is datatype longtext.
I want to match on items like
"Alois M. Hirt"
"Show Biz - Al Hirt, in a new role, ..."
"Al Hirt's Sinatraville open 9 p..."
"Hirt will be playing..."
and not on "shirt" or "thirteen" or "thirty" etc.
The query is succeeding but it frequently takes 3 seconds to return and I wondered if there was a better, more efficient way of doing this word boundary match?
If I were to add another index to indexData what would be the correct keylength to use?
TIA
No need to have a FULLTEXT index. MySQL has special markers for word boundaries. From the MySQL doc:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
setsuna's answer worked very well:
SELECT * FROM searchIndex WHERE MATCH (indexData) AGAINST ('Hirt*' IN BOOLEAN MODE);