MySQL replace ASCII characters - mysql

I have a weird problem with MySQL. I am trying to match two strings, but one of them contains extra characters.
The initial string looks like this: 'Ascot '
When I select:
select ascii(substring(name, 1, 1)), ascii(substring(name, 7, 1))
I get 194, 194. But when I replace:
select replace(name, char(194), '' )
it shows as '?Ascot?' in phpMyAdmin and no match is found. Can someone please help me with this?
The problem column is defined as utf8mb4_unicode_ci. I am trying to match it against a column from another table defined as utf8_general_ci.
I tried changing utf8_general_ci to utf8mb4_unicode_ci, but it made no difference.
When I do substring(name, 2, 5), it matches, so the solution should be to replace those characters.
EDIT:
I tried the following function to remove non alphanumeric characters and it seems to work now:
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE v_char VARCHAR(1);
DECLARE v_parseStr VARCHAR(255) DEFAULT ' ';
WHILE (i <= CHAR_LENGTH(prm_strInput) ) DO #CHAR_LENGTH counts characters like SUBSTR does; LENGTH counts bytes
SET v_char = SUBSTR(prm_strInput,i,1);
IF v_char REGEXP '^[A-Za-z0-9 ]+$' THEN #alphanumeric
SET v_parseStr = CONCAT(v_parseStr,v_char);
END IF;
SET i = i + 1;
END WHILE;
RETURN trim(v_parseStr);
END
But this is extremely inefficient...

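Set the connection character set before running your MySQL query (note that the mysql_* functions were removed in PHP 7; mysqli_set_charset() is the mysqli counterpart):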
mysql_query("SET NAMES 'utf8'");
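A possible explanation for the symptoms in the question, offered as a sketch rather than a confirmed diagnosis: 194 is 0xC2, the first byte of a two-byte UTF-8 sequence, and ASCII() reports only the leftmost byte. If the stray character is a no-break space (U+00A0, bytes C2 A0), which is an assumption because the question never shows the second byte, then replacing char(194) removes only half of the character and leaves an invalid byte behind, which would fit the '?Ascot?' display. The table name places is hypothetical.

```sql
-- Inspect the raw bytes first; a no-break space appears as C2A0.
SELECT name, HEX(name) FROM places;

-- Replace the complete two-byte character, not just its first byte.
-- Assumption: the stray character is U+00A0; adjust the hex to what HEX() shows.
SELECT REPLACE(name, CONVERT(UNHEX('C2A0') USING utf8mb4), '') AS cleaned
FROM places;
```

Checking HEX() first avoids guessing: whatever byte pair surrounds the visible text is the value to pass to UNHEX().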

Related

How to replace words in specific column using mysql? [duplicate]

I have a table with ~500k rows; the varchar(255) UTF8 column filename contains a file name.
I'm trying to strip various strange characters out of the filename and thought I'd use a character class: [^a-zA-Z0-9()_ .\-]
Now, is there a function in MySQL that lets you replace through a regular expression? I'm looking for functionality similar to the REPLACE() function; a simplified example follows:
SELECT REPLACE('stackowerflow', 'ower', 'over');
Output: "stackoverflow"
/* does something like this exist? */
SELECT X_REG_REPLACE('Stackoverflow','/[A-Zf]/','-');
Output: "-tackover-low"
I know about REGEXP/RLIKE, but those only check if there is a match, not what the match is.
(I could do a "SELECT pkey_id,filename FROM foo WHERE filename RLIKE '[^a-zA-Z0-9()_ .\-]'" from a PHP script, do a preg_replace and then "UPDATE foo ... WHERE pkey_id=...", but that looks like a last-resort slow & ugly hack)
MySQL 8.0+:
You can use the native REGEXP_REPLACE function.
Older versions:
You can use a user-defined function (UDF) like mysql-udf-regexp.
If you are using MariaDB or MySQL 8.0, they have a function
REGEXP_REPLACE(col, regexp, replace)
See MariaDB docs and PCRE Regular expression enhancements
Note that you can use regexp grouping as well (I found that very useful):
SELECT REGEXP_REPLACE("stackoverflow", "(stack)(over)(flow)", '\\2 - \\1 - \\3')
returns
over - stack - flow
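One caveat worth checking on your own server: the \\N backreference syntax above is what MariaDB's PCRE-based implementation accepts. MySQL 8.0 uses ICU, where captured groups in the replacement string are written $1, $2, and so on. A minimal sketch for MySQL 8.0:

```sql
-- MySQL 8.0 (ICU): reference capture groups with $N in the replacement string.
SELECT REGEXP_REPLACE('stackoverflow', '(stack)(over)(flow)', '$2 - $1 - $3');
-- over - stack - flow
```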
My brute force method to get this to work was just:
Dump the table - mysqldump -u user -p database table > dump.sql
Find and replace a couple of patterns - find /path/to/dump.sql -type f -exec sed -i 's/old_string/new_string/g' {} \; There are obviously other regular expressions you could run on the file as well, for example with perl.
Import the table - mysql -u user -p database < dump.sql (a mysqldump file is a SQL script, so it is loaded with the mysql client; mysqlimport expects delimited data files instead)
If you want to make sure the string doesn't occur elsewhere in your dataset, run a few regular expressions first to check that every occurrence appears in a similar context. It's also not hard to create a backup before you run a replace, in case you accidentally destroy information.
With MySQL 8.0+ you can use the native REGEXP_REPLACE function.
12.5.2 Regular Expressions:
REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])
Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.
and Regular expression support:
Previously, MySQL used the Henry Spencer regular expression library to support regular expression operators (REGEXP, RLIKE).
Regular expression support has been reimplemented using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. The REGEXP_LIKE() function performs regular expression matching in the manner of the REGEXP and RLIKE operators, which now are synonyms for that function. In addition, the REGEXP_INSTR(), REGEXP_REPLACE(), and REGEXP_SUBSTR() functions are available to find match positions and perform substring substitution and extraction, respectively.
SELECT REGEXP_REPLACE('Stackoverflow','[A-Zf]','-',1,0,'c');
-- Output:
-tackover-low
DBFiddle Demo
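Applied to the original question, a sketch for MySQL 8.0+ might look like the following. The table foo and column filename come from the question; the backslash before the hyphen is doubled because the SQL string literal consumes one level of escaping before the pattern reaches the regex engine. Previewing with a SELECT first is a precaution, not a requirement.

```sql
-- Preview which rows would change and what they would become.
SELECT filename,
       REGEXP_REPLACE(filename, '[^a-zA-Z0-9()_ .\\-]', '') AS cleaned
FROM foo
WHERE filename REGEXP '[^a-zA-Z0-9()_ .\\-]';

-- Strip every character outside the allowed set.
UPDATE foo
SET filename = REGEXP_REPLACE(filename, '[^a-zA-Z0-9()_ .\\-]', '')
WHERE filename REGEXP '[^a-zA-Z0-9()_ .\\-]';
```

The WHERE clause keeps the UPDATE from touching rows that are already clean.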
I recently wrote a MySQL function to replace strings using regular expressions. You can find my post at the following location:
http://techras.wordpress.com/2011/06/02/regex-replace-for-mysql/
Here is the function code:
DELIMITER $$
CREATE FUNCTION `regex_replace`(pattern VARCHAR(1000),replacement VARCHAR(1000),original VARCHAR(1000))
RETURNS VARCHAR(1000)
DETERMINISTIC
BEGIN
DECLARE temp VARCHAR(1000);
DECLARE ch VARCHAR(1);
DECLARE i INT;
SET i = 1;
SET temp = '';
IF original REGEXP pattern THEN
loop_label: LOOP
IF i>CHAR_LENGTH(original) THEN
LEAVE loop_label;
END IF;
SET ch = SUBSTRING(original,i,1);
IF NOT ch REGEXP pattern THEN
SET temp = CONCAT(temp,ch);
ELSE
SET temp = CONCAT(temp,replacement);
END IF;
SET i=i+1;
END LOOP;
ELSE
SET temp = original;
END IF;
RETURN temp;
END$$
DELIMITER ;
Example execution:
mysql> select regex_replace('[^a-zA-Z0-9\-]','','2my test3_text-to. check \\ my- sql (regular) ,expressions ._,');
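Because this function tests one character at a time against the pattern, it behaves like a character-class replace: it works when the pattern describes single characters and leaves the text unchanged for multi-character patterns. A small illustration, traced by hand from the function body above rather than taken from the original post:

```sql
-- Each digit matches the pattern on its own, so each one is replaced.
SELECT regex_replace('[0-9]', '#', 'abc123');
-- abc###

-- No single character matches 'ab', so nothing is replaced.
SELECT regex_replace('ab', 'X', 'abc');
-- abc
```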
We solved this problem without using a regex.
This query replaces only exact whole-word matches.
update employee set
employee_firstname =
trim(REPLACE(concat(" ",employee_firstname," "),' jay ',' abc '))
Example:
emp_id employee_firstname
1 jay
2 jay ajay
3 jay
After executing query result:
emp_id employee_firstname
1 abc
2 abc ajay
3 abc
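One edge case to be aware of, traced by hand: adjacent words share a single space, so two consecutive occurrences are not both replaced in one pass. The literal value below is only illustrative.

```sql
-- ' jay jay ' contains only one non-overlapping ' jay ', so one copy survives.
SELECT TRIM(REPLACE(CONCAT(' ', 'jay jay', ' '), ' jay ', ' abc '));
-- abc jay
```

If such values can occur in the data, running the same UPDATE a second time picks up the remaining occurrence.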
UPDATE 2: A useful set of regex functions including REGEXP_REPLACE has now been provided in MySQL 8.0. This renders reading on unnecessary unless you're constrained to using an earlier version.
UPDATE 1: Have now made this into a blog post: http://stevettt.blogspot.co.uk/2018/02/a-mysql-regular-expression-replace.html
The following expands upon the function provided by Rasika Godawatte but trawls through all necessary substrings rather than just testing single characters:
-- ------------------------------------------------------------------------------------
-- USAGE
-- ------------------------------------------------------------------------------------
-- SELECT reg_replace(<subject>,
-- <pattern>,
-- <replacement>,
-- <greedy>,
-- <minMatchLen>,
-- <maxMatchLen>);
-- where:
-- <subject> is the string to look in for doing the replacements
-- <pattern> is the regular expression to match against
-- <replacement> is the replacement string
-- <greedy> is TRUE for greedy matching or FALSE for non-greedy matching
-- <minMatchLen> specifies the minimum match length
-- <maxMatchLen> specifies the maximum match length
-- (minMatchLen and maxMatchLen are used to improve efficiency but are
-- optional and can be set to 0 or NULL if not known/required)
-- Example:
-- SELECT reg_replace(txt, '^[Tt][^ ]* ', 'a', TRUE, 2, 0) FROM tbl;
DROP FUNCTION IF EXISTS reg_replace;
DELIMITER //
CREATE FUNCTION reg_replace(subject VARCHAR(21845), pattern VARCHAR(21845),
replacement VARCHAR(21845), greedy BOOLEAN, minMatchLen INT, maxMatchLen INT)
RETURNS VARCHAR(21845) DETERMINISTIC BEGIN
DECLARE result, subStr, usePattern VARCHAR(21845);
DECLARE startPos, prevStartPos, startInc, len, lenInc INT;
IF subject REGEXP pattern THEN
SET result = '';
-- Sanitize input parameter values
SET minMatchLen = IF(minMatchLen IS NULL OR minMatchLen < 1, 1, minMatchLen);
SET maxMatchLen = IF(maxMatchLen IS NULL OR maxMatchLen < 1
OR maxMatchLen > CHAR_LENGTH(subject),
CHAR_LENGTH(subject), maxMatchLen);
-- Set the pattern to use to match an entire string rather than part of a string
SET usePattern = IF (LEFT(pattern, 1) = '^', pattern, CONCAT('^', pattern));
SET usePattern = IF (RIGHT(pattern, 1) = '$', usePattern, CONCAT(usePattern, '$'));
-- Set start position to 1 if pattern starts with ^ or doesn't end with $.
IF LEFT(pattern, 1) = '^' OR RIGHT(pattern, 1) <> '$' THEN
SET startPos = 1, startInc = 1;
-- Otherwise (i.e. pattern ends with $ but doesn't start with ^): Set start pos
-- to the min or max match length from the end (depending on "greedy" flag).
ELSEIF greedy THEN
SET startPos = CHAR_LENGTH(subject) - maxMatchLen + 1, startInc = 1;
ELSE
SET startPos = CHAR_LENGTH(subject) - minMatchLen + 1, startInc = -1;
END IF;
WHILE startPos >= 1 AND startPos <= CHAR_LENGTH(subject)
AND startPos + minMatchLen - 1 <= CHAR_LENGTH(subject)
AND !(LEFT(pattern, 1) = '^' AND startPos <> 1)
AND !(RIGHT(pattern, 1) = '$'
AND startPos + maxMatchLen - 1 < CHAR_LENGTH(subject)) DO
-- Set start length to maximum if matching greedily or pattern ends with $.
-- Otherwise set starting length to the minimum match length.
IF greedy OR RIGHT(pattern, 1) = '$' THEN
SET len = LEAST(CHAR_LENGTH(subject) - startPos + 1, maxMatchLen), lenInc = -1;
ELSE
SET len = minMatchLen, lenInc = 1;
END IF;
SET prevStartPos = startPos;
lenLoop: WHILE len >= 1 AND len <= maxMatchLen
AND startPos + len - 1 <= CHAR_LENGTH(subject)
AND !(RIGHT(pattern, 1) = '$'
AND startPos + len - 1 <> CHAR_LENGTH(subject)) DO
SET subStr = SUBSTRING(subject, startPos, len);
IF subStr REGEXP usePattern THEN
SET result = IF(startInc = 1,
CONCAT(result, replacement), CONCAT(replacement, result));
SET startPos = startPos + startInc * len;
LEAVE lenLoop;
END IF;
SET len = len + lenInc;
END WHILE;
IF (startPos = prevStartPos) THEN
SET result = IF(startInc = 1, CONCAT(result, SUBSTRING(subject, startPos, 1)),
CONCAT(SUBSTRING(subject, startPos, 1), result));
SET startPos = startPos + startInc;
END IF;
END WHILE;
IF startInc = 1 AND startPos <= CHAR_LENGTH(subject) THEN
SET result = CONCAT(result, RIGHT(subject, CHAR_LENGTH(subject) + 1 - startPos));
ELSEIF startInc = -1 AND startPos >= 1 THEN
SET result = CONCAT(LEFT(subject, startPos), result);
END IF;
ELSE
SET result = subject;
END IF;
RETURN result;
END//
DELIMITER ;
Demo
Rextester Demo
Limitations
This method is of course going to take a while when the subject string is large. Update: Have now added minimum and maximum match length parameters for improved efficiency when these are known (zero = unknown/unlimited).
It won't allow substitution of backreferences (e.g. \1, \2 etc.) to replace capturing groups. If this functionality is needed, please see this answer which attempts to provide a workaround by updating the function to allow a secondary find and replace within each found match (at the expense of increased complexity).
If ^ and/or $ is used in the pattern, they must be at the very start and very end respectively - e.g. patterns such as (^start|end$) are not supported.
There is a "greedy" flag to specify whether the overall matching should be greedy or non-greedy. Combining greedy and lazy matching within a single regular expression (e.g. a.*?b.*) is not supported.
Usage Examples
The function has been used to answer the following StackOverflow questions:
How to count words in MySQL / regular expression replacer?
How to extract the nth word and count word occurrences in a MySQL string?
How to extract two consecutive digits from a text field in MySQL?
How to remove all non-alpha numeric characters from a string in MySQL?
How to replace every other instance of a particular character in a MySQL string?
How to get all distinct words of a specified minimum length from multiple columns in a MySQL table?
I'm happy to report that since this question was asked, now there is a satisfactory answer! Take a look at this terrific package:
https://github.com/mysqludf/lib_mysqludf_preg
Sample SQL:
SELECT PREG_REPLACE('/(.*?)(fox)/' , 'dog' , 'the quick brown fox' ) AS demo;
I found the package from this blog post as linked on this question.
You 'can' do it ... but it's not very wise ... this is about as daring as I'll try ... as far as full RegEx support goes, you're much better off using perl or the like.
UPDATE db.tbl
SET column =
CASE
WHEN column REGEXP '[[:<:]]WORD_TO_REPLACE[[:>:]]'
THEN REPLACE(column,'WORD_TO_REPLACE','REPLACEMENT')
END
WHERE column REGEXP '[[:<:]]WORD_TO_REPLACE[[:>:]]'
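Two things to watch with this approach, noted as caveats rather than corrections to the answer: the [[:<:]] and [[:>:]] markers belong to the older Henry Spencer library and are not accepted by the ICU engine in MySQL 8.0, where \b plays that role; and REPLACE() rewrites every occurrence of the substring in a matching row, including ones inside longer words. A sketch for MySQL 8.0+ that handles both with one function, using the hypothetical names tbl and col:

```sql
-- \b marks a word boundary in ICU; the backslash is doubled for the SQL string.
UPDATE tbl
SET col = REGEXP_REPLACE(col, '\\bWORD_TO_REPLACE\\b', 'REPLACEMENT')
WHERE col REGEXP '\\bWORD_TO_REPLACE\\b';
```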
I think there is an easy way to achieve this, and it's working fine for me (REGEXP_REPLACE requires MySQL 8.0+ or MariaDB).
To SELECT rows using REGEX
SELECT * FROM `table_name` WHERE `column_name_to_find` REGEXP 'string-to-find'
To UPDATE rows using REGEX
UPDATE `table_name` SET column_name_to_find=REGEXP_REPLACE(column_name_to_find, 'string-to-find', 'string-to-replace') WHERE column_name_to_find REGEXP 'string-to-find'
REGEXP Reference:
https://www.geeksforgeeks.org/mysql-regular-expressions-regexp/
We can use an IF() condition in the SELECT query, as below.
Suppose that we want to replace any of "ABC1", "ABC2", "ABC3", ... with "ABC" and leave a plain "ABC" as it is. Using REGEXP and an IF() condition in the SELECT query, we can achieve this.
Syntax:
SELECT IF(column_name REGEXP 'ABC[0-9]$','ABC',column_name)
FROM table1
WHERE column_name LIKE 'ABC%';
Example:
SELECT IF('ABC1' REGEXP 'ABC[0-9]$','ABC','ABC1');
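A few hand-checked cases may help show what the pattern does and does not match. The unanchored 'ABC[0-9]$' accepts exactly one digit at the end and does not require the value to start with ABC (the LIKE 'ABC%' filter takes care of that in the query above). The anchored variant in the last statement is a suggestion, not part of the original answer.

```sql
SELECT IF('ABC12' REGEXP 'ABC[0-9]$', 'ABC', 'ABC12');
-- ABC12 (two trailing digits, so no match)

SELECT IF('XABC1' REGEXP 'ABC[0-9]$', 'ABC', 'XABC1');
-- ABC (the pattern is not anchored at the start)

SELECT IF('ABC12' REGEXP '^ABC[0-9]*$', 'ABC', 'ABC12');
-- ABC (anchored, any number of digits)
```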
The one below basically finds the first match from the left and then replaces all occurrences of it (tested in mysql-5.6).
Usage:
SELECT REGEX_REPLACE('dis ambiguity', 'dis[[:space:]]*ambiguity', 'disambiguity');
Implementation:
DELIMITER $$
CREATE FUNCTION REGEX_REPLACE(
var_original VARCHAR(1000),
var_pattern VARCHAR(1000),
var_replacement VARCHAR(1000)
) RETURNS
VARCHAR(1000)
COMMENT 'Based on https://techras.wordpress.com/2011/06/02/regex-replace-for-mysql/'
BEGIN
DECLARE var_replaced VARCHAR(1000) DEFAULT var_original;
DECLARE var_leftmost_match VARCHAR(1000) DEFAULT
REGEX_CAPTURE_LEFTMOST(var_original, var_pattern);
WHILE var_leftmost_match IS NOT NULL DO
IF var_replacement <> var_leftmost_match THEN
SET var_replaced = REPLACE(var_replaced, var_leftmost_match, var_replacement);
SET var_leftmost_match = REGEX_CAPTURE_LEFTMOST(var_replaced, var_pattern);
ELSE
SET var_leftmost_match = NULL;
END IF;
END WHILE;
RETURN var_replaced;
END $$
DELIMITER ;
DELIMITER $$
CREATE FUNCTION REGEX_CAPTURE_LEFTMOST(
var_original VARCHAR(1000),
var_pattern VARCHAR(1000)
) RETURNS
VARCHAR(1000)
COMMENT '
Captures the leftmost substring that matches the [var_pattern]
IN [var_original], OR NULL if no match.
'
BEGIN
DECLARE var_temp_l VARCHAR(1000);
DECLARE var_temp_r VARCHAR(1000);
DECLARE var_left_trim_index INT;
DECLARE var_right_trim_index INT;
SET var_left_trim_index = 1;
SET var_right_trim_index = 1;
SET var_temp_l = '';
SET var_temp_r = '';
WHILE (CHAR_LENGTH(var_original) >= var_left_trim_index) DO
SET var_temp_l = LEFT(var_original, var_left_trim_index);
IF var_temp_l REGEXP var_pattern THEN
WHILE (CHAR_LENGTH(var_temp_l) >= var_right_trim_index) DO
SET var_temp_r = RIGHT(var_temp_l, var_right_trim_index);
IF var_temp_r REGEXP var_pattern THEN
RETURN var_temp_r;
END IF;
SET var_right_trim_index = var_right_trim_index + 1;
END WHILE;
END IF;
SET var_left_trim_index = var_left_trim_index + 1;
END WHILE;
RETURN NULL;
END $$
DELIMITER ;
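Yes, you can, provided you want to overwrite the whole value: this sets column_name to a fixed string in every row where the pattern matches, rather than replacing only the matching characters.
UPDATE table_name
SET column_name = 'search_str_name'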
WHERE column_name REGEXP '[^a-zA-Z0-9()_ .\-]';

MySQL extract data from text using RegEx [duplicate]


Replacing entire word in mysql with wildcards [duplicate]

I have a table with ~500k rows; varchar(255) UTF8 column filename contains a file name;
I'm trying to strip out various strange characters out of the filename - thought I'd use a character class: [^a-zA-Z0-9()_ .\-]
Now, is there a function in MySQL that lets you replace through a regular expression? I'm looking for a similar functionality to REPLACE() function - simplified example follows:
SELECT REPLACE('stackowerflow', 'ower', 'over');
Output: "stackoverflow"
/* does something like this exist? */
SELECT X_REG_REPLACE('Stackoverflow','/[A-Zf]/','-');
Output: "-tackover-low"
I know about REGEXP/RLIKE, but those only check if there is a match, not what the match is.
(I could do a "SELECT pkey_id,filename FROM foo WHERE filename RLIKE '[^a-zA-Z0-9()_ .\-]'" from a PHP script, do a preg_replace and then "UPDATE foo ... WHERE pkey_id=...", but that looks like a last-resort slow & ugly hack)
MySQL 8.0+:
You can use the native REGEXP_REPLACE function.
Older versions:
You can use a user-defined function (UDF) like mysql-udf-regexp.
If you are using MariaDB or MySQL 8.0, they have a function
REGEXP_REPLACE(col, regexp, replace)
See MariaDB docs and PCRE Regular expression enhancements
Note that you can use regexp grouping as well (I found that very useful):
SELECT REGEXP_REPLACE("stackoverflow", "(stack)(over)(flow)", '\\2 - \\1 - \\3')
returns
over - stack - flow
My brute force method to get this to work was just:
Dump the table - mysqldump -u user -p database table > dump.sql
Find and replace a couple patterns - find /path/to/dump.sql -type f -exec sed -i 's/old_string/new_string/g' {} \;, There are obviously other perl regeular expressions you could perform on the file as well.
Import the table - mysqlimport -u user -p database table < dump.sql
If you want to make sure the string isn't elsewhere in your dataset, run a few regular expressions to make sure they all occur in a similar environment. It's also not that tough to create a backup before you run a replace, in case you accidentally destroy something that loses depth of information.
With MySQL 8.0+ you could use natively REGEXP_REPLACE function.
12.5.2 Regular Expressions:
REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])
Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.
and Regular expression support:
Previously, MySQL used the Henry Spencer regular expression library to support regular expression operators (REGEXP, RLIKE).
Regular expression support has been reimplemented using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. The REGEXP_LIKE() function performs regular expression matching in the manner of the REGEXP and RLIKE operators, which now are synonyms for that function. In addition, the REGEXP_INSTR(), REGEXP_REPLACE(), and REGEXP_SUBSTR() functions are available to find match positions and perform substring substitution and extraction, respectively.
SELECT REGEXP_REPLACE('Stackoverflow','[A-Zf]','-',1,0,'c');
-- Output:
-tackover-low
DBFiddle Demo
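Applied to the original question (stripping unwanted characters from a filename column), a minimal sketch for MySQL 8.0+ could look like the following. The table name `foo` is an assumption; the character class is the one from the question, with the backslash doubled because MySQL consumes one level of escaping inside string literals.

```sql
-- Preview what would change before touching any data.
SELECT filename,
       REGEXP_REPLACE(filename, '[^a-zA-Z0-9()_ .\\-]', '') AS cleaned
FROM foo   -- hypothetical table name
WHERE filename REGEXP '[^a-zA-Z0-9()_ .\\-]';

-- Then apply it, restricted to the rows that actually contain such characters.
UPDATE foo
SET filename = REGEXP_REPLACE(filename, '[^a-zA-Z0-9()_ .\\-]', '')
WHERE filename REGEXP '[^a-zA-Z0-9()_ .\\-]';
```

The WHERE clause keeps the UPDATE from rewriting rows that are already clean, which matters on a table with ~500k rows.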
I recently wrote a MySQL function to replace strings using regular expressions. You can find my post at the following location:
http://techras.wordpress.com/2011/06/02/regex-replace-for-mysql/
Here is the function code:
DELIMITER $$
CREATE FUNCTION `regex_replace`(pattern VARCHAR(1000),replacement VARCHAR(1000),original VARCHAR(1000))
RETURNS VARCHAR(1000)
DETERMINISTIC
BEGIN
DECLARE temp VARCHAR(1000);
DECLARE ch VARCHAR(1);
DECLARE i INT;
SET i = 1;
SET temp = '';
IF original REGEXP pattern THEN
loop_label: LOOP
IF i>CHAR_LENGTH(original) THEN
LEAVE loop_label;
END IF;
SET ch = SUBSTRING(original,i,1);
IF NOT ch REGEXP pattern THEN
SET temp = CONCAT(temp,ch);
ELSE
SET temp = CONCAT(temp,replacement);
END IF;
SET i=i+1;
END LOOP;
ELSE
SET temp = original;
END IF;
RETURN temp;
END$$
DELIMITER ;
Example execution:
mysql> select regex_replace('[^a-zA-Z0-9\-]','','2my test3_text-to. check \\ my- sql (regular) ,expressions ._,');
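Because the function above tests one character at a time against the pattern, it behaves like a per-character filter rather than a general regex replace. A short sketch of what that means in practice, assuming `regex_replace` has been created exactly as listed:

```sql
-- Single-character classes work as expected.
SELECT regex_replace('[0-9]', '#', 'abc123') AS per_char;      -- abc###

-- A quantifier does not collapse a run: each digit is still replaced separately.
SELECT regex_replace('[0-9]+', '#', 'abc123') AS with_plus;    -- abc###

-- A multi-character pattern never matches a single character,
-- so the input comes back unchanged.
SELECT regex_replace('ab', 'X', 'cabd') AS multi_char;         -- cabd
```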
We can solve this problem without using regex.
This query replaces only exact whole-word matches.
update employee set
employee_firstname =
trim(REPLACE(concat(" ",employee_firstname," "),' jay ',' abc '))
Example:
emp_id employee_firstname
1 jay
2 jay ajay
3 jay
Result after executing the query:
emp_id employee_firstname
1 abc
2 abc ajay
3 abc
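One caveat of the space-padding trick is worth checking before relying on it: REPLACE() does not rescan text it has already consumed, so two adjacent occurrences share a single separating space and the second one is missed. A self-contained sketch with made-up sample values (no table needed):

```sql
SELECT name,
       TRIM(REPLACE(CONCAT(' ', name, ' '), ' jay ', ' abc ')) AS replaced
FROM (
    SELECT 'jay' AS name
    UNION ALL SELECT 'jay ajay'
    UNION ALL SELECT 'jay jay'
) AS sample;
-- jay      -> abc
-- jay ajay -> abc ajay
-- jay jay  -> abc jay   (the second 'jay' is not replaced)
```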
UPDATE 2: A useful set of regex functions, including REGEXP_REPLACE, has now been provided in MySQL 8.0. This makes the rest of this answer unnecessary unless you're constrained to using an earlier version.
UPDATE 1: Have now made this into a blog post: http://stevettt.blogspot.co.uk/2018/02/a-mysql-regular-expression-replace.html
The following expands upon the function provided by Rasika Godawatte but trawls through all necessary substrings rather than just testing single characters:
-- ------------------------------------------------------------------------------------
-- USAGE
-- ------------------------------------------------------------------------------------
-- SELECT reg_replace(<subject>,
-- <pattern>,
-- <replacement>,
-- <greedy>,
-- <minMatchLen>,
-- <maxMatchLen>);
-- where:
-- <subject> is the string to look in for doing the replacements
-- <pattern> is the regular expression to match against
-- <replacement> is the replacement string
-- <greedy> is TRUE for greedy matching or FALSE for non-greedy matching
-- <minMatchLen> specifies the minimum match length
-- <maxMatchLen> specifies the maximum match length
-- (minMatchLen and maxMatchLen are used to improve efficiency but are
-- optional and can be set to 0 or NULL if not known/required)
-- Example:
-- SELECT reg_replace(txt, '^[Tt][^ ]* ', 'a', TRUE, 2, 0) FROM tbl;
DROP FUNCTION IF EXISTS reg_replace;
DELIMITER //
CREATE FUNCTION reg_replace(subject VARCHAR(21845), pattern VARCHAR(21845),
replacement VARCHAR(21845), greedy BOOLEAN, minMatchLen INT, maxMatchLen INT)
RETURNS VARCHAR(21845) DETERMINISTIC BEGIN
DECLARE result, subStr, usePattern VARCHAR(21845);
DECLARE startPos, prevStartPos, startInc, len, lenInc INT;
IF subject REGEXP pattern THEN
SET result = '';
-- Sanitize input parameter values
SET minMatchLen = IF(minMatchLen IS NULL OR minMatchLen < 1, 1, minMatchLen);
SET maxMatchLen = IF(maxMatchLen IS NULL OR maxMatchLen < 1
OR maxMatchLen > CHAR_LENGTH(subject),
CHAR_LENGTH(subject), maxMatchLen);
-- Set the pattern to use to match an entire string rather than part of a string
SET usePattern = IF (LEFT(pattern, 1) = '^', pattern, CONCAT('^', pattern));
SET usePattern = IF (RIGHT(pattern, 1) = '$', usePattern, CONCAT(usePattern, '$'));
-- Set start position to 1 if pattern starts with ^ or doesn't end with $.
IF LEFT(pattern, 1) = '^' OR RIGHT(pattern, 1) <> '$' THEN
SET startPos = 1, startInc = 1;
-- Otherwise (i.e. pattern ends with $ but doesn't start with ^): Set start pos
-- to the min or max match length from the end (depending on "greedy" flag).
ELSEIF greedy THEN
SET startPos = CHAR_LENGTH(subject) - maxMatchLen + 1, startInc = 1;
ELSE
SET startPos = CHAR_LENGTH(subject) - minMatchLen + 1, startInc = -1;
END IF;
WHILE startPos >= 1 AND startPos <= CHAR_LENGTH(subject)
AND startPos + minMatchLen - 1 <= CHAR_LENGTH(subject)
AND !(LEFT(pattern, 1) = '^' AND startPos <> 1)
AND !(RIGHT(pattern, 1) = '$'
AND startPos + maxMatchLen - 1 < CHAR_LENGTH(subject)) DO
-- Set start length to maximum if matching greedily or pattern ends with $.
-- Otherwise set starting length to the minimum match length.
IF greedy OR RIGHT(pattern, 1) = '$' THEN
SET len = LEAST(CHAR_LENGTH(subject) - startPos + 1, maxMatchLen), lenInc = -1;
ELSE
SET len = minMatchLen, lenInc = 1;
END IF;
SET prevStartPos = startPos;
lenLoop: WHILE len >= 1 AND len <= maxMatchLen
AND startPos + len - 1 <= CHAR_LENGTH(subject)
AND !(RIGHT(pattern, 1) = '$'
AND startPos + len - 1 <> CHAR_LENGTH(subject)) DO
SET subStr = SUBSTRING(subject, startPos, len);
IF subStr REGEXP usePattern THEN
SET result = IF(startInc = 1,
CONCAT(result, replacement), CONCAT(replacement, result));
SET startPos = startPos + startInc * len;
LEAVE lenLoop;
END IF;
SET len = len + lenInc;
END WHILE;
IF (startPos = prevStartPos) THEN
SET result = IF(startInc = 1, CONCAT(result, SUBSTRING(subject, startPos, 1)),
CONCAT(SUBSTRING(subject, startPos, 1), result));
SET startPos = startPos + startInc;
END IF;
END WHILE;
IF startInc = 1 AND startPos <= CHAR_LENGTH(subject) THEN
SET result = CONCAT(result, RIGHT(subject, CHAR_LENGTH(subject) + 1 - startPos));
ELSEIF startInc = -1 AND startPos >= 1 THEN
SET result = CONCAT(LEFT(subject, startPos), result);
END IF;
ELSE
SET result = subject;
END IF;
RETURN result;
END//
DELIMITER ;
Demo
Rextester Demo
Limitations
This method is of course going to take a while when the subject string is large. Update: Have now added minimum and maximum match length parameters for improved efficiency when these are known (zero = unknown/unlimited).
It won't allow substitution of backreferences (e.g. \1, \2 etc.) to replace capturing groups. If this functionality is needed, please see this answer which attempts to provide a workaround by updating the function to allow a secondary find and replace within each found match (at the expense of increased complexity).
If ^ and/or $ is used in the pattern, they must be at the very start and very end respectively - e.g. patterns such as (^start|end$) are not supported.
There is a "greedy" flag to specify whether the overall matching should be greedy or non-greedy. Combining greedy and lazy matching within a single regular expression (e.g. a.*?b.*) is not supported.
Usage Examples
The function has been used to answer the following StackOverflow questions:
How to count words in MySQL / regular expression replacer?
How to extract the nth word and count word occurrences in a MySQL string?
How to extract two consecutive digits from a text field in MySQL?
How to remove all non-alpha numeric characters from a string in MySQL?
How to replace every other instance of a particular character in a MySQL string?
How to get all distinct words of a specified minimum length from multiple columns in a MySQL table?
I'm happy to report that since this question was asked, now there is a satisfactory answer! Take a look at this terrific package:
https://github.com/mysqludf/lib_mysqludf_preg
Sample SQL:
SELECT PREG_REPLACE('/(.*?)(fox)/' , 'dog' , 'the quick brown fox' ) AS demo;
I found the package from this blog post as linked on this question.
You 'can' do it ... but it's not very wise ... this is about as daring as I'll try ... as far as full RegEx support goes, you're much better off using perl or the like.
UPDATE db.tbl
SET column =
CASE
WHEN column REGEXP '[[:<:]]WORD_TO_REPLACE[[:>:]]'
THEN REPLACE(column,'WORD_TO_REPLACE','REPLACEMENT')
END
WHERE column REGEXP '[[:<:]]WORD_TO_REPLACE[[:>:]]'
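Two things to keep in mind with the statement above: REPLACE() also rewrites the word where it appears inside a longer word in the matched rows, and the `[[:<:]]` / `[[:>:]]` markers belong to the older Henry Spencer engine, which MySQL 8.0 replaced with ICU. A hedged sketch of an equivalent for MySQL 8.0+, using `\b` word boundaries; `tbl` and `col` are placeholder names:

```sql
-- MySQL 8.0+ (ICU): \b marks a word boundary; the backslash is doubled
-- because it is also the escape character inside MySQL string literals.
UPDATE tbl
SET col = REGEXP_REPLACE(col, '\\bWORD_TO_REPLACE\\b', 'REPLACEMENT')
WHERE col REGEXP '\\bWORD_TO_REPLACE\\b';
```

Here only whole-word occurrences are replaced, since the boundary check is part of the replacement itself and not just of the row filter.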
I think there is an easy way to achieve this, and it's working fine for me (REGEXP_REPLACE requires MySQL 8.0+ or MariaDB).
To SELECT rows using REGEX
SELECT * FROM `table_name` WHERE `column_name_to_find` REGEXP 'string-to-find'
To UPDATE rows using REGEX
UPDATE `table_name` SET column_name_to_find=REGEXP_REPLACE(column_name_to_find, 'string-to-find', 'string-to-replace') WHERE column_name_to_find REGEXP 'string-to-find'
REGEXP Reference:
https://www.geeksforgeeks.org/mysql-regular-expressions-regexp/
We can use an IF() condition in the SELECT query, as below:
Suppose we want to replace any of "ABC", "ABC1", "ABC2", "ABC3", ... with "ABC". We can achieve this by combining REGEXP with IF() in the SELECT query.
Syntax:
SELECT IF(column_name REGEXP 'ABC[0-9]$','ABC',column_name)
FROM table1
WHERE column_name LIKE 'ABC%';
Example:
SELECT IF('ABC1' REGEXP 'ABC[0-9]$','ABC','ABC1');
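A small self-contained check of the IF() approach, using a few made-up sample values in place of table1. It also anchors the pattern with ^, which is an addition not present in the original query, so that a value such as 'XABC2' is left alone even without the LIKE filter:

```sql
SELECT v,
       IF(v REGEXP '^ABC[0-9]$', 'ABC', v) AS normalized
FROM (
    SELECT 'ABC1' AS v
    UNION ALL SELECT 'ABC'
    UNION ALL SELECT 'XABC2'
) AS sample;
-- ABC1  -> ABC
-- ABC   -> ABC    (no trailing digit, returned unchanged)
-- XABC2 -> XABC2  (rejected by the ^ anchor)
```

Note that this replaces the whole value when it matches; it does not edit part of a longer string.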
The one below basically finds the first match from the left and then replaces all occurrences of it (tested in mysql-5.6).
Usage:
SELECT REGEX_REPLACE('dis ambiguity', 'dis[[:space:]]*ambiguity', 'disambiguity');
Implementation:
DELIMITER $$
CREATE FUNCTION REGEX_REPLACE(
var_original VARCHAR(1000),
var_pattern VARCHAR(1000),
var_replacement VARCHAR(1000)
) RETURNS
VARCHAR(1000)
COMMENT 'Based on https://techras.wordpress.com/2011/06/02/regex-replace-for-mysql/'
BEGIN
DECLARE var_replaced VARCHAR(1000) DEFAULT var_original;
DECLARE var_leftmost_match VARCHAR(1000) DEFAULT
REGEX_CAPTURE_LEFTMOST(var_original, var_pattern);
WHILE var_leftmost_match IS NOT NULL DO
IF var_replacement <> var_leftmost_match THEN
SET var_replaced = REPLACE(var_replaced, var_leftmost_match, var_replacement);
SET var_leftmost_match = REGEX_CAPTURE_LEFTMOST(var_replaced, var_pattern);
ELSE
SET var_leftmost_match = NULL;
END IF;
END WHILE;
RETURN var_replaced;
END $$
DELIMITER ;
DELIMITER $$
CREATE FUNCTION REGEX_CAPTURE_LEFTMOST(
var_original VARCHAR(1000),
var_pattern VARCHAR(1000)
) RETURNS
VARCHAR(1000)
COMMENT '
Captures the leftmost substring that matches the [var_pattern]
IN [var_original], OR NULL if no match.
'
BEGIN
DECLARE var_temp_l VARCHAR(1000);
DECLARE var_temp_r VARCHAR(1000);
DECLARE var_left_trim_index INT;
DECLARE var_right_trim_index INT;
SET var_left_trim_index = 1;
SET var_right_trim_index = 1;
SET var_temp_l = '';
SET var_temp_r = '';
WHILE (CHAR_LENGTH(var_original) >= var_left_trim_index) DO
SET var_temp_l = LEFT(var_original, var_left_trim_index);
IF var_temp_l REGEXP var_pattern THEN
WHILE (CHAR_LENGTH(var_temp_l) >= var_right_trim_index) DO
SET var_temp_r = RIGHT(var_temp_l, var_right_trim_index);
IF var_temp_r REGEXP var_pattern THEN
RETURN var_temp_r;
END IF;
SET var_right_trim_index = var_right_trim_index + 1;
END WHILE;
END IF;
SET var_left_trim_index = var_left_trim_index + 1;
END WHILE;
RETURN NULL;
END $$
DELIMITER ;
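Yes, you can use a regular expression to select the rows to update. Note that this overwrites the whole column value with a fixed string; it does not replace only the matched characters:
UPDATE table_name
SET column_name = 'search_str_name'
WHERE column_name REGEXP '[^a-zA-Z0-9()_ .\-]';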

REGEX in mysql query

I have a table with an address column.
A value for address looks like "#12-3/98 avenue street", which contains digits, special characters and letters.
I want to write a MySQL query using regex to remove the special characters from the address value.
Example: "12398avenuestreet" is the value I want after removing the special characters.
Thank you.
Maybe this function will help you:
DELIMITER $$
CREATE FUNCTION strip_non_alpha(
    _dirty_string VARCHAR(255)  -- widened from 40 so longer addresses are not cut off; adjust to your column
)
RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
    DECLARE _length INT;
    DECLARE _position INT;
    DECLARE _current_char VARCHAR(1);
    DECLARE _clean_string VARCHAR(255);
    SET _clean_string = '';
    -- CHAR_LENGTH counts characters; LENGTH counts bytes and overshoots on multibyte text
    SET _length = CHAR_LENGTH(_dirty_string);
    SET _position = 1;
    WHILE _position <= _length DO
        SET _current_char = SUBSTRING(_dirty_string, _position, 1);
        -- Keep only letters and digits; everything else is dropped
        IF _current_char REGEXP '[A-Za-z0-9]' THEN
            SET _clean_string = CONCAT(_clean_string, _current_char);
        END IF;
        SET _position = _position + 1;
    END WHILE;
    RETURN _clean_string;
END$$
DELIMITER ;
You can then call it like this:
update mytable set address = strip_non_alpha(address);
You don't need RegExp for simple character replacement.
MySQL string functions
Unfortunately, before MySQL 8.0, MySQL regular expressions are "match only": you cannot do a replace in your query. This leaves you with only something like this (which is very, very clumsy):
SELECT REPLACE(REPLACE(address, '?', ''), '#', '') -- and many many other nested replaces
FROM table
Or put this logic inside your application (the best option here).
Before MySQL 8.0, MySQL regular expressions are only for pattern matching, not replacing, so your best bet is to create a function or to use REPLACE() repeatedly.
As far as I know, it is not possible to replace via MySQL regex, since these functions are only used for matching.
Alternatively, you can use MySQL Replace for this:
SELECT REPLACE(REPLACE(REPLACE(REPLACE(address, '#', ''), '-', ''), '/', ''), ' ', '') FROM table;
Which will remove #, -, / and spaces and result in the string you want.
You may use this MySQL UDF. And then simply,
update my_table set my_column = PREG_REPLACE('/[^A-Za-z0-9]/' , '' , my_column);
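On MySQL 8.0+ or MariaDB, where REGEXP_REPLACE is built in, the same cleanup needs neither a UDF nor nested REPLACE calls. A minimal sketch using the sample value from the question; `mytable` is the placeholder table name from the earlier answer:

```sql
-- Check the pattern against the sample value first.
SELECT REGEXP_REPLACE('#12-3/98 avenue street', '[^A-Za-z0-9]', '') AS cleaned;
-- 12398avenuestreet

-- Then apply it to the column.
UPDATE mytable
SET address = REGEXP_REPLACE(address, '[^A-Za-z0-9]', '');
```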