Convert and extract row using mysql and regular expression pattern - mysql

How do I extract a converted value from my db?
SELECT name FROM clients LIMIT 1;
It will list: "13's Automotors"
But I want to show "13SAUTOMOTORS", listing only letters and numbers without spaces.
Server type: Percona Server Server version: 5.6.40-84.0-log - Percona
Server (GPL), Release 84.0, Revision 47234b3 Protocol version: 10

Newer versions (MySQL 8 and MariaDB 10+) support the REGEXP_REPLACE() function.
SELECT regexp_replace(name, '[^\\d\\w]', '') as converted_name
FROM clients
will replace all non-digit and non-word characters with an empty string.
If you need the result in uppercase, use UPPER()
SELECT upper(regexp_replace(name, '[^\\d\\w]', '')) as converted_name
FROM clients
If your version doesn't support REGEXP_REPLACE(), consider doing the conversion in your application language. Since you've tagged your question with mysqli, I assume that you are using PHP. Then you can use preg_replace() and strtoupper():
$row['converted_name'] = preg_replace('/[^\\d\\w]/', '', $row['name']);
$row['converted_name'] = strtoupper($row['converted_name']);

In older versions of MySQL you can use a bunch of nested replace operations for this.
SELECT UPPER(REPLACE(REPLACE('3''s Automotors', ' ',''),'''','')) val
It's a bit nasty because you need a nested REPLACE for each possible character you want to remove, but it works.
Notice you must represent single-quote characters ' in string constants in MySQL queries by doubling them. So, '''' represents the single character ' as a string constant.
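If nesting REPLACE calls gets unwieldy, the clean-up can also be done application-side, as the other answer suggests for PHP. Below is a minimal Python sketch of the same idea. The function name normalize_name is hypothetical, and the sketch assumes that only ASCII letters and digits should survive (note that \w would also keep underscores).

```python
import re

def normalize_name(name: str) -> str:
    """Keep only ASCII letters and digits, then uppercase the result."""
    # [^0-9A-Za-z] is stricter than [^\d\w]: it also removes underscores
    # and non-ASCII word characters.
    return re.sub(r"[^0-9A-Za-z]", "", name).upper()

print(normalize_name("13's Automotors"))  # 13SAUTOMOTORS
```

Doing this in the application keeps the query simple, at the cost of not being able to filter or sort on the converted value in SQL.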

Difference in behavior on CONCAT_WS between systems

So, this is a simple situation but I wanted to understand what's causing this issue. I have the following code (modified for example):
SELECT `Transactions`.*, CONCAT_WS(" ", `People`.`first_name`, `People`.`last_name`) AS full_name ...
On my local machine I have:
Windows 10
Apache 2.4.25
PHP 7.4.11
MySQL 5.7.25
With that combination the code above works fine.
On the remote server I have:
Ubuntu 20.04.1 LTS
Apache 2.4.41
PHP 7.4.3
MySQL 8.0.19
So, I have a section which uses a data table and the data table uses server-side processing to obtain the information. On my local it shows the information correctly, but on my remote server I always got an empty array. So I tried executing the same SQL command in my remote server and I got this error:
#1054 - Unknown column ' ' in 'field list'
My SQL was correctly formed so I thought maybe the problem was related to the CONCAT_WS function.
So I decided to modify it to:
SELECT `Transactions`.*, CONCAT_WS(' ', `People`.`first_name`, `People`.`last_name`) AS full_name ...
I basically changed CONCAT_WS(" ", to CONCAT_WS(' ', and the code worked as intended.
I am not sure whether this matters, but is this a change in MySQL's requirements for using CONCAT_WS, or something else?
Is it ok if I use it with single quotes elsewhere?
I suggest you run this on both systems:
SELECT @@sql_mode;
You will find that on your 8.0 server, the sql mode includes either the modes ANSI or ANSI_QUOTES.
Explanation:
The double-quotes have different meanings in MySQL depending on which sql mode is in effect.
By default, double-quotes are the same as single-quotes: they delimit a string literal.
If the sql mode is ANSI or ANSI_QUOTES, then the double-quotes delimit an identifier, acting like the back-ticks.
So the same code can behave differently on different MySQL instances. This has nothing to do with the difference between 5.7 and 8.0, because the sql mode behaves the same on these two versions. Neither version enables the ANSI or ANSI_QUOTES modes by default, so you or someone else must have enabled that mode on your 8.0 server.
This is why in this expression:
CONCAT_WS(" ", ...)
The first argument " " is treated as a string literal on one server, and on the other server it is treated as a column whose name is a single space (which is legal SQL, even if it's weird).
It's safest to always use single-quotes to delimit a string literal, and to always use back-ticks to delimit an identifier.
Never use double-quotes for either case in MySQL, because your code that uses double-quotes in SQL queries will break if someone changes the sql mode.
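As a rough illustration of the rule above, here is a small Python sketch that inspects the string returned by SELECT @@sql_mode and reports whether double-quotes would delimit identifiers. The function name is hypothetical, and it assumes the value has already been fetched as the usual comma-separated list.

```python
def double_quotes_are_identifiers(sql_mode: str) -> bool:
    """Return True if this @@sql_mode value makes "..." an identifier."""
    modes = {m.strip().upper() for m in sql_mode.split(",") if m.strip()}
    # ANSI is a combination mode that includes ANSI_QUOTES.
    return bool(modes & {"ANSI", "ANSI_QUOTES"})

print(double_quotes_are_identifiers("ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES"))  # False
print(double_quotes_are_identifiers("ANSI_QUOTES,STRICT_TRANS_TABLES"))         # True
```

A check like this could run once at connection time to flag servers where queries written with double-quoted strings would break.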

Types of Wildcards in MySql

My query:
Select * From tableName Where columnName Like "[PST]%"
is not giving the expected result.
Why does this wildcard not work in MySql?
If you want to filter on strings that contain any 'P', 'S', or 'T', then you can use a regex:
where col rlike '[PST]'
If you want strings that contain substring 'PST', then no need for square brackets - and like is enough:
where col like '%PST%'
If you want the matching character(s) at the start of the string, then the regex solution looks like:
where col rlike '^PST'
And the like option would be:
where col like 'PST%'
MySQL's LIKE syntax is documented here: https://dev.mysql.com/doc/refman/8.0/en/pattern-matching.html
Standard SQL from decades ago defined only two wildcards: % and _. These are the only wildcards an SQL product needs to support if they want to say they are SQL compliant and support the LIKE predicate.
% matches zero or more of any characters. It's analogous to .* in regular expressions.
_ matches exactly one of any character. It's analogous to . in regular expressions.
Also if you want to match a literal '%' or '_', you need to escape it, i.e. put a backslash before it:
WHERE title LIKE 'The 7\% Solution'
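To make the two wildcards and the backslash escape concrete, here is a minimal Python sketch that translates a LIKE pattern into a regular expression. The helper names are hypothetical, and the sketch is case-sensitive, whereas MySQL's LIKE follows the column collation. It also shows why the pattern from the question fails: square brackets have no special meaning in LIKE.

```python
import re

def like_to_regex(pattern: str, escape: str = "\\") -> re.Pattern:
    """Translate a LIKE pattern: % -> .*, _ -> ., escaped chars are literal."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # The character after the escape is taken literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)

def like_matches(value: str, pattern: str) -> bool:
    # LIKE must match the whole value, hence fullmatch.
    return like_to_regex(pattern).fullmatch(value) is not None

print(like_matches("Paris", "P%"))          # True
print(like_matches("Paris", "[PST]%"))      # False: brackets are literal
print(like_matches("[PST]abc", "[PST]%"))   # True
```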
Microsoft SQL Server's LIKE syntax is documented here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-ver15
They support the % and _ wildcards, and an escape character that you declare with the ESCAPE clause (there is no default one), but they extend standard SQL with two other forms:
[a-z] matches one character, but only characters in the range inside the brackets. This is similar in regular expressions. The - is a range operator, unless it appears at the start or end of the string inside the brackets.
[^a-z] matches one character, which must not be one of the characters in the range inside the brackets. Also the same in regular expressions.
These are not standard forms of wildcards for the LIKE predicate, and other brands of SQL database don't support them.
Later versions of the SQL standard introduced a new predicate SIMILAR TO which supports much richer patterns and wildcards, since the right-side operand is a string which contains a regular expression. But since this predicate was introduced in a later edition of the SQL standard, some implementations had already developed their own solution that was almost the same.
MySQL called the operator REGEXP and RLIKE is a synonym (https://dev.mysql.com/doc/refman/8.0/en/regexp.html).
It was requested in https://bugs.mysql.com/bug.php?id=746 to support SIMILAR TO syntax to help MySQL comply with the SQL standard, but the request was turned down, because it had subtly different behavior to the existing REGEXP/RLIKE operator.
Microsoft SQL Server has partial support of regular expression wildcards in the LIKE operator, and also a dbo.RegexMatch() function.
SQLite has a GLOB operator, and so on.
Thanks everyone!
For this specific question, we need to use REGEXP:
Select * From tableName Where ColumnName Regexp "^[PST]";
For more detail on regular expressions (REGEXP):
https://www.youtube.com/watch?v=KoltE-JUY0c

MySQL REGEXP_SUBSTR() escaping issue?

Please take the following example regex:
https://regexr.com/4ek7r
As you can see, the regex works great and matches the sizes (e.g. 3/16" etc) from the product descriptions.
I'm trying to implement this in MySQL 8.0.15 using REGEXP_SUBSTR()
As per the documentation I have doubled up the escape characters but the regex is not working.
Please see the following SQL fiddle:
https://www.db-fiddle.com/f/e6Ez3XCdU5Ahs91z6TQA8P/0
As you can see, REGEXP_SUBSTR() returns NULL
I'm presuming this is an escape issue - but i'm not 100% sure.
How can I ensure MySQL returns the 1st match per product (row) akin to the regexr.com example?
Cheers
Edit: 28/05/2019 - root cause
Wiktor's answer below solved my problem and his regex was much cleaner & well worth the upvote. That said, I didn't understand why my original version was not working after the port from SQL Server to MySQL. I finally noticed the problem this morning: it had nothing to do with the regex, it was a rookie error in string concatenation! Specifically, I was using UPPER(Description + ' ') (i.e. using +), which works fine in SQL Server, but in MySQL + is purely arithmetic, so both operands were converted to numbers! So I was essentially running my regex against a 0! Replacing the + with CONCAT actually fixed my original query with the original regex. Just thought I'd share this in case it helps anyone else!
In MySQL v8.x that supports ICU regex, you may use
SELECT Description, REGEXP_SUBSTR(Description, '(?im)(?=\\b(?:[0-9/]+(?:\\.[0-9/]+)?\\s*(?:[X-]|$)|[0-9/\\s]+(?:\\.[0-9/]+)?(?:[CM]?M|["”TH])))[0-9/\\s.]+(?:[CM]?M|["”TH])?(?:\\s*[/X-]\\s*[0-9/\\s.]+(?:[CM]?M|["”TH])?)?(?=[.\\s()]|$)') AS Size FROM tbl_Example
The main points:
The flags can be used as inline options, (?im): m enables multiline mode, in which ^ and $ match the start/end of a line, and i enables case-insensitive mode
[$] matches a literal $ char. To match the end-of-line position, move $ out of the character class and use an alternation instead: (?=[\.\s\(\)$]) -> (?=[.\s()]|$). Also, do not escape what does not have to be escaped
To match the fractional part of a number, it is better to use a pattern like (?:\.[0-9/]+)? (it matches an optional sequence of . followed by 1 or more digits or /s)
(C|M)? is better written as [CM]? (a character class is more efficient)
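The point about $ inside a character class is easy to check outside the database. The sketch below uses Python's re module rather than ICU (an assumption on my part; the two engines agree on this particular behavior), and the sample text and shortened pattern are made up for illustration.

```python
import re

text = 'HOSE 3/16"'

# Inside a character class, $ is just a literal dollar sign, so this
# lookahead needs a real '.', whitespace, '(', ')' or '$' character next.
in_class = re.search(r'[0-9/]+"(?=[.\s()$])', text)

# Moved out of the class, $ is an anchor again and matches at the end.
as_anchor = re.search(r'[0-9/]+"(?=[.\s()]|$)', text)

print(in_class)            # None
print(as_anchor.group(0))  # 3/16"
```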

MySQL for replace with wildcard

I'm trying to write a SQL update to replace a specific xml node with a new string:
UPDATE table
SET Configuration = REPLACE(Configuration,
"<tag>%%ANY_VALUE%%</tag>"
"<tag>NEW_DATA</tag>");
So that
<root><tag>SDADAS</tag></root>
becomes
<root><tag>NEW_DATA</tag></root>
Is there a syntax I'm missing for this type of request?
Update: MySQL 8.0 has a function REGEXP_REPLACE().
Below is my answer from 2014, which still applies to any version of MySQL before 8.0:
REPLACE() does not have any support for wildcards, patterns, regular expressions, etc. REPLACE() only replaces one constant string for another constant string.
You could try something complex, to pick out the leading part of the string and the trailing part of the string:
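UPDATE `table`
SET Configuration = CONCAT(
  -- everything up to and including the opening <tag> (5 characters)
  SUBSTR(Configuration, 1, LOCATE('<tag>', Configuration) + 4),
  'NEW_DATA',
  -- everything from the closing </tag> onward
  SUBSTR(Configuration, LOCATE('</tag>', Configuration))
);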
But this doesn't work for cases when you have multiple occurrences of <tag>.
You may have to fetch the row back into an application, perform string replacement using your favorite language, and post the row back. In other words, a three-step process for each row.
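The string-replacement step of that three-step process might look like the Python sketch below. The function name is hypothetical, and the sketch assumes the tag has no attributes, is not nested inside itself, and that the new value is already XML-safe. Fetching and writing back the row is left to whatever database driver you use.

```python
import re

def replace_tag_content(xml: str, tag: str, new_value: str) -> str:
    """Replace the text inside every <tag>...</tag> pair with new_value."""
    # Non-greedy .*? stops at the first closing tag; DOTALL lets it span newlines.
    pattern = re.compile(
        r"(<{0}>).*?(</{0}>)".format(re.escape(tag)), re.DOTALL
    )
    # A function replacement avoids backslash handling in new_value.
    return pattern.sub(lambda m: m.group(1) + new_value + m.group(2), xml)

print(replace_tag_content("<root><tag>SDADAS</tag></root>", "tag", "NEW_DATA"))
# <root><tag>NEW_DATA</tag></root>
```

Unlike the LOCATE-based query, this handles multiple occurrences of the tag in one value.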

Regexp - back references, translating code from PHP to MySQL

I have a regular expression that works in PHP, but not the MySQL REGEXP function.
'(.)\1{2,}'
In PHP this matches any char that is repeated 2 or more times. How can I translate this to work with the MySQL function?
Sorry, you can't. MySQL before 8.0 uses POSIX regex, which doesn't support back-references. If you must perform such matches in one of those versions, your only option is to install a UDF such as lib_mysqludf_preg.
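For comparison, the pattern is easy to exercise in application code. Below is a minimal Python sketch (Python's re module is my substitution for the PHP mentioned in the question, and the helper name is hypothetical). Note that (.)\1{2,} needs one character plus at least two more copies of it, so it fires on three or more identical characters in a row.

```python
import re

# \1 refers back to whatever the first group (.) captured.
REPEATED = re.compile(r"(.)\1{2,}")

def has_triple_repeat(value: str) -> bool:
    """Return True if any character occurs three or more times in a row."""
    return REPEATED.search(value) is not None

print(has_triple_repeat("heeello"))  # True
print(has_triple_repeat("hello"))    # False
```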