Search for “whole word match” in MySQL [duplicate] - mysql

I want to search for an exact word using a SELECT query in MySQL.
For example, my table column contains:
"This is a sample mail to test Auto Decline Invitation."
Query:
SELECT * FROM `test` where text REGEXP '[[:<:]]Invitation.[[:>:]]'
In the above example I need to select all records that match 'Invitation.'

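Instead of using REGEXP, you could also use the LIKE pattern matching operator. Note that LIKE with % on both sides finds 'Invitation.' anywhere in the text, including inside a longer word, so it is a substring match rather than a strict whole-word match.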
A sample query could be:
SELECT * FROM `test` WHERE `text` LIKE '%Invitation.%';
Edit
Otherwise, if LIKE doesn't match your requirements, you can of course use REGEXP.
For a REGEXP (MySQL 5.7) expression, you'll want to use (mentioned by Wiktor):
SELECT * FROM `test` WHERE `text` REGEXP '[[:<:]]Invitation[.]';
For a REGEXP (MySQL 8.0) expression, you'll want to use:
SELECT * FROM `test` WHERE `text` REGEXP '\\bInvitation\\.';
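The [[:<:]] and [[:>:]] markers and the \b operator offer similar functionality: [[:<:]] and [[:>:]] match the start and the end of a word respectively, while \b matches either boundary. The pattern in the question fails because [[:>:]] requires a word character immediately before it, and the period is not a word character, so the closing marker has to be dropped after the period. MySQL 5.7 is a little more explicit, as described at the bottom of its regular expression documentation page. MySQL 8.0 uses International Components for Unicode (ICU), whereas 5.7 uses Henry Spencer's implementation of regular expressions.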
From the MySQL 8.0 docs:
MySQL implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. (Prior to MySQL 8.0.4, MySQL used Henry Spencer's implementation of regular expressions, which operates in byte-wise fashion and is not multibyte safe.)
If you search that documentation page for \b, you'll find this clarification of how ICU and Spencer regular expression handling differ:
The Spencer library supports word-beginning and word-end boundary markers ([[:<:]] and [[:>:]] notation). ICU does not. For ICU, you can use \b to match word boundaries; double the backslash because MySQL interprets it as the escape character within strings.
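As a minimal sketch of the two variants, the statements below test the sample sentence from the question as a string literal, so no table is needed. The aliases matches_57 and matches_80 are made up for illustration, and each statement is only meant for the server generation named in its comment, so run the one that corresponds to your version.

```sql
-- Sample sentence from the question, tested as a literal (no table required).

-- MySQL 5.7 and earlier (Spencer library): word-start marker, then a literal period.
-- [.] is a bracket expression, so the period is not treated as "any character".
SELECT 'This is a sample mail to test Auto Decline Invitation.'
       REGEXP '[[:<:]]Invitation[.]' AS matches_57;

-- MySQL 8.0.4 and later (ICU library): \b is the word boundary.
-- The backslashes are doubled because MySQL removes one level of escaping
-- from the string literal before the regex engine sees the pattern.
SELECT 'This is a sample mail to test Auto Decline Invitation.'
       REGEXP '\\bInvitation\\.' AS matches_80;
```

Each statement should report a match for this sentence on the matching server version.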
Bit of a learning experience for me too, thanks Wiktor!

Related

MySQL Regular expression with alternation group not working

I'm trying to match the string "محمد مصلح حسن القطان" in a column of a MySQL table using a regular expression that allows for the different variants of the letter "ا". I have tried this:
SELECT caseTitle FROM cases where caseTitle REGEXP 'قط([ا|أ|آ|إ])ن';
For some reason it doesn't work, when I try this
SELECT caseTitle FROM cases where caseTitle REGEXP 'قط([ا|أ|آ|إ])';
It works and matches the string. I'm using Google Cloud SQL with MySQL 5.7 and, unfortunately, I can't define a custom collation for Arabic letters, which would have solved my problem, so I had to use regular expressions.
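No answer is recorded for this question. One likely explanation, assuming the byte-wise regex library that MySQL 5.7 uses and UTF-8 data, is that the bracket expression is treated as a set of single bytes. Each of these Arabic letters is two bytes in UTF-8 (for example D8 A7 for ا and D9 86 for ن), and | inside [...] is just a literal character. The bracket expression therefore matches only the first byte of the letter, after which ن cannot match the leftover second byte. The following is a sketch of a workaround using plain alternation, not a verified fix for that particular server.

```sql
-- Assumes MySQL 5.7 (byte-wise regex library) and a UTF-8 connection and column.
-- Table and column names are the ones from the question.

-- Plain alternation keeps every multibyte letter together as a byte sequence,
-- instead of splitting it into single bytes as a bracket expression does.
SELECT caseTitle
FROM cases
WHERE caseTitle REGEXP 'قط(ا|أ|آ|إ)ن';

-- HEX() shows the bytes that the regex library actually works on.
SELECT HEX('ا') AS alef_bytes, HEX('ن') AS noon_bytes;
```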

Types of Wildcards in MySql

My query:
Select * From tableName Where columnName Like "[PST]%"
is not giving the expected result.
Why does this wildcard not work in MySql?
If you want to filter on strings that contain any 'P', 'S', or 'T', then you can use a regex:
where col rlike '[PST]'
If you want strings that contain substring 'PST', then no need for square brackets - and like is enough:
where col like '%PST%'
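If you want strings that start with any one of those characters, then the regex solution looks like:
where col rlike '^[PST]'
If you want strings that start with the substring 'PST', anchor the regex without the brackets:
where col rlike '^PST'
And the equivalent like option would be:
where col like 'PST%'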
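To see the difference between these forms without a table, here is a small sketch that applies them to string literals. The sample values and column aliases are invented for illustration.

```sql
-- Each expression returns 1 for a match and 0 otherwise; literals stand in for col.
SELECT
  'Stockholm' RLIKE '^[PST]' AS starts_with_p_s_or_t,  -- any one of P, S, T comes first
  'Stockholm' RLIKE '^PST'   AS starts_with_pst_regex, -- the substring PST comes first
  'PSTN line' LIKE  'PST%'   AS starts_with_pst_like,  -- same test written with LIKE
  'Stockholm' LIKE  '[PST]%' AS bracket_in_like;       -- brackets are literal text in LIKE
```

The last column shows why the query in the question returns nothing useful: in MySQL, LIKE only looks for values that literally begin with the five characters [PST].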
MySQL's LIKE syntax is documented here: https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html
Standard SQL from decades ago defined only two wildcards: % and _. These are the only wildcards an SQL product needs to support if they want to say they are SQL compliant and support the LIKE predicate.
% matches zero or more of any characters. It's analogous to .* in regular expressions.
_ matches exactly one of any character. It's analogous to . in regular expressions.
Also if you want to match a literal '%' or '_', you need to escape it, i.e. put a backslash before it:
WHERE title LIKE 'The 7\% Solution'
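A quick way to check the two wildcards and the escape is to run them against literals. This is only a sketch: the sample strings and aliases are made up, and it assumes the default SQL mode, in which the backslash acts as the escape character.

```sql
-- Each expression returns 1 for a match and 0 otherwise.
SELECT
  'cat' LIKE 'c%'  AS percent_any_length,   -- % matches zero or more characters
  'cat' LIKE 'c_t' AS underscore_one_char,  -- _ matches exactly one character
  'The 7% Solution' LIKE 'The 7\% Solution' AS escaped_percent,   -- literal % required
  'The 70 Solution' LIKE 'The 7\% Solution' AS escaped_no_match;  -- 0 is not a literal %
```

If the backslash is not available as an escape on your server, the standard ESCAPE clause lets you pick another character, for example LIKE 'The 7!% Solution' ESCAPE '!'.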
Microsoft SQL Server's LIKE syntax is documented here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-ver15
They support the % and _ wildcards, plus an escape character that you declare with the ESCAPE clause, and they extend standard SQL with two other forms:
[a-z] matches one character, but only characters in the range inside the brackets. This is similar to a character class in regular expressions. The - is a range operator, unless it appears at the start or end of the string inside the brackets.
[^a-z] matches one character, which must not be one of the characters in the range inside the brackets. Also the same in regular expressions.
These are not standard forms of wildcards for the LIKE predicate, and other brands of SQL database don't support them.
Later versions of the SQL standard introduced a new predicate SIMILAR TO which supports much richer patterns and wildcards, since the right-side operand is a string which contains a regular expression. But since this predicate was introduced in a later edition of the SQL standard, some implementations had already developed their own solution that was almost the same.
MySQL called the operator REGEXP and RLIKE is a synonym (https://dev.mysql.com/doc/refman/8.0/en/regexp.html).
It was requested in https://bugs.mysql.com/bug.php?id=746 to support SIMILAR TO syntax to help MySQL comply with the SQL standard, but the request was turned down, because it had subtly different behavior to the existing REGEXP/RLIKE operator.
Microsoft SQL Server has partial support of regular expression wildcards in the LIKE operator, and also a dbo.RegexMatch() function.
SQLite has a GLOB operator, and so on.
Thanks everyone!
For this specific question, we need to use REGEXP:
Select * From tableName Where ColumnName Regexp "^[PST]";
For more detail on regular expressions (REGEXP):
https://www.youtube.com/watch?v=KoltE-JUY0c

MySQL query to find matching string using REGEXP not working

I am using MySQL 5.5.
I have a table named nutritions, having a column serving_data with text datatype.
Some of the values in serving_data column are like:
[{"label":"1 3\/4 cups","unit":"3\/4 cups"},{"label":"1 cups","unit":"3\/4 cups"},{"label":"1 container (7 cups ea.)","unit":"3\/4 cups"}]
Now, I want to find records containing serving_data like 1 3\/4 cups .
For that I've made a query,
SELECT id,`name`,`nutrition_data`,`serving_data`
FROM `nutritions` WHERE serving_data REGEXP '(\d\s\\\D\d\scup)+';
But it does not seem to work.
Also I've tried
SELECT id,`name`,`nutrition_data`,`serving_data`
FROM `nutritions` WHERE serving_data REGEXP '/(\d\s\\\D\d\scup)+/g';
If I use the same pattern on http://regexr.com/, it seems to match.
Can anyone help me?
Note that in the MySQL regex flavor used before version 8.0.4 (including the 5.5 mentioned in the question), you cannot use shorthand classes like \d, \D or \s; replace them with [0-9], [^0-9] and [[:space:]] respectively.
You may use
REGEXP '[0-9]+[[:space:]][0-9]+\\\\/[0-9]+[[:space:]]+cup'
See the regex demo (note that in general, regex101.com does not support MySQL regex flavor, but the PCRE option supports the POSIX character classes like [:digit:], [:space:], so it is only used for a demo here, not as a proof it works with MySQL REGEXP).
Pattern details:
[0-9]+ - 1 or more digits
[[:space:]] - a whitespace
[0-9]+ - 1 or more digits
\\\\/ - a literal \/ char sequence
[0-9]+[[:space:]]+cup - 1 or more digits, 1 or more whitespaces, cup.
Note that you can make the match on cup more precise with a word boundary: add a [[:>:]] pattern after it to match cup only as a whole word. Keep in mind that the pattern would then no longer match cups.
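The escaping is the easiest part to get wrong, so here is a sketch that runs the suggested pattern against a shortened copy of the sample value as a literal. It assumes the default SQL mode, where the backslash is the escape character inside string literals; the alias matches is made up.

```sql
-- In the subject literal, \\ produces one stored backslash, so the text
-- contains the two characters \/ exactly as in the JSON from the question.
-- In the pattern literal, \\\\ becomes \\ for the regex engine,
-- which then reads it as one literal backslash.
SELECT '{"label":"1 3\\/4 cups"}'
       REGEXP '[0-9]+[[:space:]][0-9]+\\\\/[0-9]+[[:space:]]+cup' AS matches;
```

Once this returns a match, the same pattern can replace the one in the WHERE clause of the original query against the nutritions table.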

mysql regex utf-8 characters

I am trying to get data from MySQL database via REGEX with or without special utf-8 characters.
Let me explain with an example:
If the user enters a word like sirena, it should return rows that include words like sirena, siréna, šíreňá, and so on.
It should also work the other way around: when the user enters siréná, it should return the same results.
I am trying to search via REGEX; my query looks like this:
SELECT * FROM `content` WHERE `text` REGEXP '[sšŠ][iíÍ][rŕŔřŘ][eéÉěĚ][nňŇ][AaáÁäÄ0]'
It works only when the database contains the word sirena, but not when it contains the word siréňa.
Is it because of something with UTF-8 and MySQL? (The collation of the MySQL column is utf8_general_ci.)
Thank you!
MySQL's regular expression library does not support utf-8 (this applies to versions before 8.0.4, which use the byte-wise Henry Spencer library).
See Bug #30241 Regular expression problems, which has been open since 2007. They will have to change the regular expression library they use before that can be fixed, and I haven't found any announcement of when or if they will do this.
The only workaround I've seen is to search for specific HEX strings:
mysql> SELECT * FROM `content` WHERE HEX(`text`) REGEXP 'C3A9C588';
+----------+
| text     |
+----------+
| siréňa   |
+----------+
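To build such a hex pattern yourself, HEX() can be applied to the characters you are looking for. The sketch below assumes a UTF-8 connection character set, under which é is C3A9 and ň is C588. It also adds one refinement that is not part of the answer above: a plain hex search can match across a byte boundary, so the second query only accepts matches that start on a whole byte.

```sql
-- Assumes a UTF-8 connection character set; the alias utf8_bytes is made up.
SELECT HEX('éň') AS utf8_bytes;

-- ([0-9A-F]{2})* consumes whole bytes (two hex digits each) from the start,
-- so C3A9C588 can only match at an even position in the hex string.
SELECT * FROM `content`
WHERE HEX(`text`) REGEXP '^([0-9A-F]{2})*C3A9C588';
```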
Re your comment:
No, I don't know of any solution with MySQL.
You might have to switch to PostgreSQL, because that RDBMS supports \u codes for UTF characters in their regular expression syntax.
Try something like ... REGEXP '(a|b|[ab])'
SELECT * FROM `content` WHERE `text` REGEXP '(s|š|Š|[sšŠ])(i|í|Í|[iíÍ])(r|ŕ|Ŕ|ř|Ř|[rŕŔřŘ])(e|é|É|ě|Ě|[eéÉěĚ])(n|ň|Ň|[nňŇ])(A|a|á|Á|ä|Ä|0|[AaáÁäÄ0])'
It works for me!
Use the lib_mysqludf_preg library from the mysql UDF repository for PCRE regular expressions directly in mysql
Although MySQL's built-in regular expression library does not support utf-8, the lib_mysqludf_preg library from the mysql UDF repository provides utf-8 compatible PCRE regular expressions as user-defined functions that you can call directly in MySQL.
http://www.mysqludf.org/
https://github.com/mysqludf/lib_mysqludf_preg#readme

MySQL query with non-printing characters (left-to-right mark)

I just found myself lost in the interesting situation that I need to query MySQL for fields containing a so called Left-to-right mark.
As the nature of this character is to be non-printing, thus invisible, I'm unable to simply copy/paste it into a query.
As mentioned in the linked Wikipedia article, the Left-to-right mark is Unicode character U+200E, which is a fact that I'm sure is the key to success in my current adventure.
My question is: How do I use raw Unicode in a MySQL query? Something along the lines of:
SELECT * FROM users WHERE username LIKE '%\U+200E%'
or
SELECT * FROM users WHERE username REGEXP '\U+200E'
or whatever the correct syntax for Unicode in MySQL is and depending on whether this is supported with LIKE and/or REGEXP.
To get a unicode char, something like this should work:
SELECT CHAR(<number> USING utf8);
Also, avoid REGEXP for this on MySQL versions before 8.0.4, because the regexp library they use is very old and doesn't support multi-byte character sets.
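As a rough sketch of how that could look for this case: U+200E (the left-to-right mark) is the byte sequence E2 80 8E in UTF-8, and U+200F (the right-to-left mark) is E2 80 8F. The first statement only checks that CHAR() produces the expected bytes. The second compares bytes directly, which is a choice made here so that the collation cannot influence the match, and it assumes the column is stored as utf8 or utf8mb4. The alias lrm_bytes is made up.

```sql
-- Build the character from its UTF-8 bytes and show them again as hex.
SELECT HEX(CHAR(0xE2808E USING utf8)) AS lrm_bytes;

-- Find usernames containing the mark with a byte-wise comparison;
-- the table and column names are the ones used in the question.
SELECT *
FROM users
WHERE CAST(username AS BINARY) LIKE CONCAT('%', 0xE2808E, '%');
```

If a character-based comparison is preferred, the CHAR(... USING utf8) expression can be placed inside CONCAT() instead of the hex literal, without the CAST.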