sql - select column but remove non alphanumeric in result [duplicate] - mysql

I'm working on a routine that compares strings, but for better efficiency I need to remove all characters that are not letters or numbers.
I'm using multiple REPLACE functions now, but maybe there is a faster and nicer solution ?

Using MySQL 8.0 or higher
Courtesy of michal.jakubeczy's answer below, replacing by Regex is now supported by MySQL:
UPDATE {table} SET {column} = REGEXP_REPLACE({column}, '[^0-9a-zA-Z ]', '')
Using MySQL 5.7 or lower
Regex isn't supported here. I had to create my own function called alphanum which stripped the chars for me:
DROP FUNCTION IF EXISTS alphanum;
DELIMITER |
CREATE FUNCTION alphanum( str CHAR(255) ) RETURNS CHAR(255) DETERMINISTIC
BEGIN
DECLARE i, len SMALLINT DEFAULT 1;
DECLARE ret CHAR(255) DEFAULT '';
DECLARE c CHAR(1);
IF str IS NOT NULL THEN
SET len = CHAR_LENGTH( str );
REPEAT
BEGIN
SET c = MID( str, i, 1 );
IF c REGEXP '[[:alnum:]]' THEN
SET ret=CONCAT(ret,c);
END IF;
SET i = i + 1;
END;
UNTIL i > len END REPEAT;
ELSE
SET ret='';
END IF;
RETURN ret;
END |
DELIMITER ;
Now I can do:
select 'This works finally!', alphanum('This works finally!');
and I get:
+---------------------+---------------------------------+
| This works finally! | alphanum('This works finally!') |
+---------------------+---------------------------------+
| This works finally! | Thisworksfinally |
+---------------------+---------------------------------+
1 row in set (0.00 sec)
Hurray!

From a performance point of view,
(and on the assumption that you read more than you write)
I think the best way would be to pre calculate and store a stripped version of the column,
This way you do the transform less.
You can then put an index on the new column and get the database to do the work for you.

SELECT teststring REGEXP '[[:alnum:]]+';
SELECT * FROM testtable WHERE test REGEXP '[[:alnum:]]+';
See: http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Scroll down to the section that says: [:character_class:]
If you want to manipulate strings the fastest way will be to use a str_udf, see:
https://github.com/hholzgra/mysql-udf-regexp

Since MySQL 8.0 you can use regular expression to remove non alphanumeric characters from a string. There is method REGEXP_REPLACE
Here is the code to remove non-alphanumeric characters:
UPDATE {table} SET {column} = REGEXP_REPLACE({column}, '[^0-9a-zA-Z ]', '')

Straight and battletested solution for latin and cyrillic characters:
DELIMITER //
CREATE FUNCTION `remove_non_numeric_and_letters`(input TEXT)
RETURNS TEXT
BEGIN
DECLARE output TEXT DEFAULT '';
DECLARE iterator INT DEFAULT 1;
WHILE iterator < (LENGTH(input) + 1) DO
IF SUBSTRING(input, iterator, 1) IN
('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я')
THEN
SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
END IF;
SET iterator = iterator + 1;
END WHILE;
RETURN output;
END //
DELIMITER ;
Usage:
-- outputs "hello12356"
SELECT remove_non_numeric_and_letters('hello - 12356-привет ""]')

The fastest way I was able to find (and using ) is with convert().
from Doc. CONVERT() with USING is used to convert data between different character sets.
Example:
convert(string USING ascii)
In your case the right character set will be self defined
NOTE from Doc. The USING form of CONVERT() is available as of 4.1.0.

Based on the answer by Ryan Shillington, modified to work with strings longer than 255 characters and preserving spaces from the original string.
FYI there is lower(str) in the end.
I used this to compare strings:
DROP FUNCTION IF EXISTS spacealphanum;
DELIMITER $$
CREATE FUNCTION `spacealphanum`( str TEXT ) RETURNS TEXT CHARSET utf8
BEGIN
DECLARE i, len SMALLINT DEFAULT 1;
DECLARE ret TEXT DEFAULT '';
DECLARE c CHAR(1);
SET len = CHAR_LENGTH( str );
REPEAT
BEGIN
SET c = MID( str, i, 1 );
IF c REGEXP '[[:alnum:]]' THEN
SET ret=CONCAT(ret,c);
ELSEIF c = ' ' THEN
SET ret=CONCAT(ret," ");
END IF;
SET i = i + 1;
END;
UNTIL i > len END REPEAT;
SET ret = lower(ret);
RETURN ret;
END $$
DELIMITER ;

Be careful, characters like ’ or » are considered as alpha by MySQL.
It better to use something like :
IF c BETWEEN 'a' AND 'z' OR c BETWEEN 'A' AND 'Z' OR c BETWEEN '0' AND
'9' OR c = '-' THEN

I have written this UDF. However, it only trims special characters at the beginning of the string. It also converts the string to lower case. You can update this function if desired.
DELIMITER //
DROP FUNCTION IF EXISTS DELETE_DOUBLE_SPACES//
CREATE FUNCTION DELETE_DOUBLE_SPACES ( title VARCHAR(250) )
RETURNS VARCHAR(250) DETERMINISTIC
BEGIN
DECLARE result VARCHAR(250);
SET result = REPLACE( title, ' ', ' ' );
WHILE (result <> title) DO
SET title = result;
SET result = REPLACE( title, ' ', ' ' );
END WHILE;
RETURN result;
END//
DROP FUNCTION IF EXISTS LFILTER//
CREATE FUNCTION LFILTER ( title VARCHAR(250) )
RETURNS VARCHAR(250) DETERMINISTIC
BEGIN
WHILE (1=1) DO
IF( ASCII(title) BETWEEN ASCII('a') AND ASCII('z')
OR ASCII(title) BETWEEN ASCII('A') AND ASCII('Z')
OR ASCII(title) BETWEEN ASCII('0') AND ASCII('9')
) THEN
SET title = LOWER( title );
SET title = REPLACE(
REPLACE(
REPLACE(
title,
CHAR(10), ' '
),
CHAR(13), ' '
) ,
CHAR(9), ' '
);
SET title = DELETE_DOUBLE_SPACES( title );
RETURN title;
ELSE
SET title = SUBSTRING( title, 2 );
END IF;
END WHILE;
END//
DELIMITER ;
SELECT LFILTER(' !##$%^&*()_+1a b');
Also, you could use regular expressions but this requires installing a MySql extension.

This can be done with a regular expression replacer function I posted in another answer and have blogged about here. It may not be the most efficient solution possible and might look overkill for the job in hand - but like a Swiss army knife, it may come in useful for other reasons.
It can be seen in action removing all non-alphanumeric characters in this Rextester online demo.
SQL (excluding the function code for brevity):
SELECT txt,
reg_replace(txt,
'[^a-zA-Z0-9]+',
'',
TRUE,
0,
0
) AS `reg_replaced`
FROM test;

I had a similar problem with trying to match last names in our database that were slightly different. For example, sometimes people entered the same person's name as "McDonald" and also as "Mc Donald", or "St John" and "St. John".
Instead of trying to convert the Mysql data, I solved the problem by creating a function (in PHP) that would take a string and create an alpha-only regular expression:
function alpha_only_regex($str) {
$alpha_only = str_split(preg_replace('/[^A-Z]/i', '', $str));
return '^[^a-zA-Z]*'.implode('[^a-zA-Z]*', $alpha_only).'[^a-zA-Z]*$';
}
Now I can search the database with a query like this:
$lastname_regex = alpha_only_regex($lastname);
$query = "SELECT * FROM my_table WHERE lastname REGEXP '$lastname_regex';

So far, the only alternative approach less complicated than the other answers here is to determine the full set of special characters of the column, i.e. all the special characters that are in use in that column at the moment, and then do a sequential replace of all those characters, e.g.
update pages set slug = lower(replace(replace(replace(replace(name, ' ', ''), '-', ''), '.', ''), '&', '')); # replacing just space, -, ., & only
.
This is only advisable on a known set of data, otherwise it's
trivial for some special characters to slip past with a
blacklist approach instead of a whitelist approach.
Obviously, the simplest way is to pre-validate the data outside of sql due to the lack of robust built-in whitelisting (e.g. via a regex replace).

I needed to get only alphabetic characters of a string in a procedure, and did:
SET #source = "whatever you want";
SET #target = '';
SET #i = 1;
SET #len = LENGTH(#source);
WHILE #i <= #len DO
SET #char = SUBSTRING(#source, #i, 1);
IF ((ORD(#char) >= 65 && ORD(#char) <= 90) || (ORD(#char) >= 97 && ORD(#char) <= 122)) THEN
SET #target = CONCAT(#target, #char);
END IF;
SET #i = #i + 1;
END WHILE;

Needed to replace non-alphanumeric characters rather than remove non-alphanumeric characters so I have created this based on Ryan Shillington's alphanum. Works for strings up to 255 characters in length
DROP FUNCTION IF EXISTS alphanumreplace;
DELIMITER |
CREATE FUNCTION alphanumreplace( str CHAR(255), d CHAR(32) ) RETURNS CHAR(255)
BEGIN
DECLARE i, len SMALLINT DEFAULT 1;
DECLARE ret CHAR(32) DEFAULT '';
DECLARE c CHAR(1);
SET len = CHAR_LENGTH( str );
REPEAT
BEGIN
SET c = MID( str, i, 1 );
IF c REGEXP '[[:alnum:]]' THEN SET ret=CONCAT(ret,c);
ELSE SET ret=CONCAT(ret,d);
END IF;
SET i = i + 1;
END;
UNTIL i > len END REPEAT;
RETURN ret;
END |
DELIMITER ;
Example:
select 'hello world!',alphanum('hello world!'),alphanumreplace('hello world!','-');
+--------------+--------------------------+-------------------------------------+
| hello world! | alphanum('hello world!') | alphanumreplace('hello world!','-') |
+--------------+--------------------------+-------------------------------------+
| hello world! | helloworld | hello-world- |
+--------------+--------------------------+-------------------------------------+
You'll need to add the alphanum function seperately if you want that, I just have it here for the example.

I tried a few solutions but at the end used replace. My data set is part numbers and I fairly know what to expect. But just for sanity, I used PHP to build the long query:
$dirty = array(' ', '-', '.', ',', ':', '?', '/', '!', '&', '#');
$query = 'part_no';
foreach ($dirty as $dirt) {
$query = "replace($query,'$dirt','')";
}
echo $query;
This outputs something I used to get a headache from:
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(part_no,' ',''),'-',''),'.',''),',',''),':',''),'?',''),'/',''),'!',''),'&',''),'#','')

if you are using php then....
try{
$con = new PDO ("mysql:host=localhost;dbname=dbasename","root","");
}
catch(PDOException $e){
echo "error".$e-getMessage();
}
$select = $con->prepare("SELECT * FROM table");
$select->setFetchMode(PDO::FETCH_ASSOC);
$select->execute();
while($data=$select->fetch()){
$id = $data['id'];
$column = $data['column'];
$column = preg_replace("/[^a-zA-Z0-9]+/", " ", $column); //remove all special characters
$update = $con->prepare("UPDATE table SET column=:column WHERE id='$id'");
$update->bindParam(':column', $column );
$update->execute();
// echo $column."<br>";
}

the alphanum function (self answered) have a bug, but I don't know why.
For text "cas synt ls 75W140 1L" return "cassyntls75W1401", "L" from the end is missing some how.
Now I use
delimiter //
DROP FUNCTION IF EXISTS alphanum //
CREATE FUNCTION alphanum(prm_strInput varchar(255))
RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE v_char VARCHAR(1);
DECLARE v_parseStr VARCHAR(255) DEFAULT ' ';
WHILE (i <= LENGTH(prm_strInput) ) DO
SET v_char = SUBSTR(prm_strInput,i,1);
IF v_char REGEXP '^[A-Za-z0-9]+$' THEN
SET v_parseStr = CONCAT(v_parseStr,v_char);
END IF;
SET i = i + 1;
END WHILE;
RETURN trim(v_parseStr);
END
//
(found on google)

Probably a silly suggestion compared to others:
if(!preg_match("/^[a-zA-Z0-9]$/",$string)){
$sortedString=preg_replace("/^[a-zA-Z0-9]+$/","",$string);
}

Related

Mysql search & replace with random characters

I have in text some URLS src="https://example.com/public/images/someimage.jpg?itok=WDGFySy"
I need remove in every url this garbage token ?itok=WDGFySy, all tokens obviously are random :).
I try do it directly in database like this:
UPDATE wp_posts SET post_content = REPLACE(post_content, 'itok=[[:xdigit:]]{8}', '') WHERE post_content LIKE 'itok=[[:xdigit:]]{8}
';
But i cant find any of this tokens like this. LIKE this [a-fA-F0-9]{8} also wont help. Any advice? Thank you for any suggestions.
You can only use regex if you have MySQL 8.0.
However, if your links are in separate fields, it is possible to create and use SP to clean them up:
DELIMITER $$
CREATE FUNCTION CLEANUP(
aString VARCHAR(255)
, aName VARCHAR(15)
)
RETURNS VARCHAR(255)
BEGIN
SET #u = SUBSTRING_INDEX(aString, '?', 1);
SET #q = SUBSTRING_INDEX(aString, '?', -1);
IF #q = aString THEN
RETURN aString; -- no '?' char found
ELSE
SET #f = LOCATE(CONCAT(aName, '='), #q);
SET #query = IF(#f > 1, SUBSTR(#q, 1, #f - 1), '');
IF #f > 0 THEN
SET #t = LOCATE('&', #q, #f + LENGTH(aName) + 1);
IF #t > 0 THEN
SET #query = CONCAT(#query, SUBSTR(#q, #t + 1));
END IF;
END IF;
IF #query = '' THEN
RETURN #u;
ELSE
RETURN CONCAT(#u, '?', TRIM('&' FROM #query));
END IF;
END IF;
END
$$
DELIMITER ;
Then:
SELECT
CLEANUP('https://example.com/public/images/someimage.jpg?itok=WDGFySy&b=2', 'itok')
, CLEANUP('https://example.com/public/images/someimage.jpg?itok=WDGFySy', 'itok')
, CLEANUP('https://example.com/public/images/someimage.jpg?a=1&itok=WDGFySy', 'itok')
, CLEANUP('https://example.com/public/images/someimage.jpg?a=1&itok=WDGFySy&b=2', 'itok')
;

MySQL: SUM() of trailing number in VARCHAR [duplicate]

I'm looking to find records in a table that match a specific number that the user enters. So, the user may enter 12345, but this could be 123zz4-5 in the database.
I imagine something like this would work, if PHP functions worked in MySQL.
SELECT * FROM foo WHERE preg_replace("/[^0-9]/","",bar) = '12345'
What's the equivalent function or way to do this with just MySQL?
Speed is not important.
I realise that this is an ancient topic but upon googling this problem I couldn't find a simple solution (I saw the venerable agents but think this is a simpler solution) so here's a function I wrote, seems to work quite well.
DROP FUNCTION IF EXISTS STRIP_NON_DIGIT;
DELIMITER $$
CREATE FUNCTION STRIP_NON_DIGIT(input VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE output VARCHAR(255) DEFAULT '';
DECLARE iterator INT DEFAULT 1;
WHILE iterator < (LENGTH(input) + 1) DO
IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
END IF;
SET iterator = iterator + 1;
END WHILE;
RETURN output;
END
$$
You can easily do what you want with REGEXP_REPLACE (compatible with MySQL 8+ and MariaDB 10.0.5+)
REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])
Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.
Go to REGEXP_REPLACE doc: MySQL or MariaDB
Try it:
SELECT REGEXP_REPLACE('123asd12333', '[a-zA-Z]+', '');
Output:
12312333
Updated 2022: as per Marlom's answer, you can now use REGEX_REPLACE - which will perform even better than my historical answer here.
Most upvoted answer above isn't the fastest.
Full kudos to them for giving a working proposal to bounce off!
This is an improved version:
DELIMITER ;;
DROP FUNCTION IF EXISTS `STRIP_NON_DIGIT`;;
CREATE DEFINER=`root`#`localhost` FUNCTION `STRIP_NON_DIGIT`(input VARCHAR(255)) RETURNS VARCHAR(255) CHARSET utf8
READS SQL DATA
BEGIN
DECLARE output VARCHAR(255) DEFAULT '';
DECLARE iterator INT DEFAULT 1;
DECLARE lastDigit INT DEFAULT 1;
DECLARE len INT;
SET len = LENGTH(input) + 1;
WHILE iterator < len DO
-- skip past all digits
SET lastDigit = iterator;
WHILE ORD(SUBSTRING(input, iterator, 1)) BETWEEN 48 AND 57 AND iterator < len DO
SET iterator = iterator + 1;
END WHILE;
IF iterator != lastDigit THEN
SET output = CONCAT(output, SUBSTRING(input, lastDigit, iterator - lastDigit));
END IF;
WHILE ORD(SUBSTRING(input, iterator, 1)) NOT BETWEEN 48 AND 57 AND iterator < len DO
SET iterator = iterator + 1;
END WHILE;
END WHILE;
RETURN output;
END;;
Testing 5000 times on a test server:
-- original
Execution Time : 7.389 sec
Execution Time : 7.257 sec
Execution Time : 7.506 sec
-- ORD between not string IN
Execution Time : 4.031 sec
-- With less substrings
Execution Time : 3.243 sec
Execution Time : 3.415 sec
Execution Time : 2.848 sec
While it's not pretty and it shows results that don't match, this helps:
SELECT * FROM foo WHERE bar LIKE = '%1%2%3%4%5%'
I would still like to find a better solution similar to the item in the original question.
There's no regexp replace, only a plain string REPLACE().
MySQL has the REGEXP operator, but it's only a match tester not a replacer, so you would have to turn the logic inside-out:
SELECT * FROM foo WHERE bar REGEXP '[^0-9]*1[^0-9]*2[^0-9]*3[^0-9]*4[^0-9]*5[^0-9]*';
This is like your version with LIKE but matches more accurately. Both will perform equally badly, needing a full table scan without indexes.
On MySQL 8.0+ there is a new native function called REGEXP_REPLACE. A clean solution to this question would be:
SELECT * FROM foo WHERE REGEXP_REPLACE(bar,'[^0-9]+',"") = '12345'
The simplest way I can think to do it is to use the MySQL REGEXP operator a la:
WHERE foo LIKE '1\D*2\D*3\D*4\D*5'
It's not especially pretty but MySQL doesn't have a preg_replace function so I think it's the best you're going to get.
Personally, if this only-numeric data is so important, I'd keep a separate field just to contain the stripped data. It'll make your lookups a lot faster than with the regular expression search.
This blog post details how to strip non-numeric characters from a string via a MySQL function:
SELECT NumericOnly("asdf11asf");
returns 11
http://venerableagents.wordpress.com/2011/01/29/mysql-numeric-functions/
There's no regex replace as far as I'm concerned, but I found this solution;
--Create a table with numbers
DROP TABLE IF EXISTS ints;
CREATE TABLE ints (i INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO ints (i) VALUES
( 1), ( 2), ( 3), ( 4), ( 5), ( 6), ( 7), ( 8), ( 9), (10),
(11), (12), (13), (14), (15), (16), (17), (18), (19), (20);
--Then extract the numbers from the specified column
SELECT
bar,
GROUP_CONCAT(SUBSTRING(bar, i, 1) ORDER BY i SEPARATOR '')
FROM foo
JOIN ints ON i BETWEEN 1 AND LENGTH(bar)
WHERE
SUBSTRING(bar, i, 1) IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
GROUP BY bar;
It works for me and I use MySQL 5.0
Also I found this place that could help.
I have a similar situation, matching products to barcodes where the barcode doesn't store none alpha numerics sometimes, so 102.2234 in the DB needs to be found when searching for 1022234.
In the end I just added a new field, reference_number to the products tables, and have php strip out the none alpha numerics in the product_number to populate reference_number whenever a new products is added.
You'd need to do a one time scan of the table to create all the reference_number fields for existing products.
You can then setup your index, even if speed is not a factor for this operation, it is still a good idea to keep the database running well so this query doesn't bog it down and slow down other queries.
I came across this solution. The top answer by user1467716 will work in phpMyAdmin with a small change: add a second delimiter tag to the end of the code.
phpMyAdmin version is 4.1.14; MySQL version 5.6.20
I also added a length limiter using
DECLARE count INT DEFAULT 0; in the declarations
AND count < 5 in the WHILE statement
SET COUNT=COUNT+1; in the IF statement
Final form:
DROP FUNCTION IF EXISTS STRIP_NON_DIGIT;
DELIMITER $$
CREATE FUNCTION STRIP_NON_DIGIT(input VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE output VARCHAR(255) DEFAULT '';
DECLARE iterator INT DEFAULT 1;
DECLARE count INT DEFAULT 0;
WHILE iterator < (LENGTH(input) + 1) AND count < 5 DO --limits to 5 chars
IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
SET COUNT=COUNT+1;
END IF;
SET iterator = iterator + 1;
END WHILE;
RETURN output;
END
$$
DELIMITER $$ --added this
How big is table with foo? If it is small, and speed really doesn't matter, you might pull the row ID and foo, loop over it using the PHP replace functions to compare, and then pull the info you want by row number.
Of course, if the table is too big, this won't work well.
try this example. this is used for phone numbers, however you can modify it for your needs.
-- function removes non numberic characters from input
-- returne only the numbers in the string
CREATE DEFINER =`root`#`localhost` FUNCTION `remove_alpha`(inputPhoneNumber VARCHAR(50))
RETURNS VARCHAR(50)
CHARSET latin1
DETERMINISTIC
BEGIN
DECLARE inputLenght INT DEFAULT 0;
-- var for our iteration
DECLARE counter INT DEFAULT 1;
-- if null is passed, we still return an tempty string
DECLARE sanitizedText VARCHAR(50) DEFAULT '';
-- holder of each character during the iteration
DECLARE oneChar VARCHAR(1) DEFAULT '';
-- we'll process only if it is not null.
IF NOT ISNULL(inputPhoneNumber)
THEN
SET inputLenght = LENGTH(inputPhoneNumber);
WHILE counter <= inputLenght DO
SET oneChar = SUBSTRING(inputPhoneNumber, counter, 1);
IF (oneChar REGEXP ('^[0-9]+$'))
THEN
SET sanitizedText = Concat(sanitizedText, oneChar);
END IF;
SET counter = counter + 1;
END WHILE;
END IF;
RETURN sanitizedText;
END
to use this user defined function (UDF).
let's say you have a column of phone numbers:
col1
(513)983-3983
1-838-338-9898
phone983-889-8383
select remove_alpha(col1) from mytable
The result would be;
5139833983
18383389898
9838898383
thought I would share this since I built it off the function from here. I rearranged just so I can read it easier (I'm just server side).
You call it by passing in a table name and column name to have it strip all existing non-numeric characters from that column. I inherited a lot of bad table structures that put a ton of int fields as varchar so I needed a way to clean these up quickly before I can modify the column to an integer.
drop procedure if exists strip_non_numeric_characters;
DELIMITER ;;
CREATE PROCEDURE `strip_non_numeric_characters`(
tablename varchar(100)
,columnname varchar(100)
)
BEGIN
-- =============================================
-- Author: <Author,,David Melton>
-- Create date: <Create Date,,2/26/2019>
-- Description: <Description,,loops through data and strips out the bad characters in whatever table and column you pass it>
-- =============================================
#this idea was generated from the idea STRIP_NON_DIGIT function
#https://stackoverflow.com/questions/287105/mysql-strip-non-numeric-characters-to-compare
declare input,output varchar(255);
declare iterator,lastDigit,len,counter int;
declare date_updated varchar(100);
select column_name
into date_updated
from information_schema.columns
where table_schema = database()
and extra rlike 'on update CURRENT_TIMESTAMP'
and table_name = tablename
limit 1;
#only goes up to 255 so people don't run this for a longtext field
#just to be careful, i've excluded columns that are part of keys, that could potentially mess something else up
set #find_column_length =
concat("select character_maximum_length
into #len
from information_schema.columns
where table_schema = '",database(),"'
and column_name = '",columnname,"'
and table_name = '",tablename,"'
and length(ifnull(character_maximum_length,100)) < 255
and data_type in ('char','varchar')
and column_key = '';");
prepare stmt from #find_column_length;
execute stmt;
deallocate prepare stmt;
set counter = 1;
set len = #len;
while counter <= ifnull(len,1) DO
#this just removes it by putting all the characters before and after the character i'm looking at
#you have to start at the end of the field otherwise the lengths don't stay in order and you have to run it multiple times
set #update_query =
concat("update `",tablename,"`
set `",columnname,"` = concat(substring(`",columnname,"`,1,",len - counter,"),SUBSTRING(`",columnname,"`,",len - counter,",",counter - 1,"))
",if(date_updated is not null,concat(",`",date_updated,"` = `",date_updated,"`
"),''),
"where SUBSTRING(`",columnname,"`,",len - counter,", 1) not REGEXP '^[0-9]+$';");
prepare stmt from #update_query;
execute stmt;
deallocate prepare stmt;
set counter = counter + 1;
end while;
END ;;
DELIMITER ;
To search for numbers that match a particular numeric pattern in a string, first remove all the alphabets and special characters in a similar manner as below then convert the value to an integer and then search
SELECT *
FROM foo
WHERE Convert(Regexp_replace(bar, '[a-zA-Z]+', ''), signed) = 12345
I think you don't need complicated functions for that.
I've found a REGEXP_REPLACE solution using built-in mysql character class names. You can read about them in a table in the docs. Basically, they are mysql-specific names for commonly matched groups of characters like [:alnum:] for alpha-numeric characters, [:alpha:] for only alphabetic characters and so on.
So my version of the REGEXP_REPLACE:
REGEXP_REPLACE('My number is: +59 (29) 889-23-56', '[[:alpha:][:blank:][:punct:][:cntrl:]]', '')
will yield 59298892356 as per requirements.
Being someone that has version 5.7, doesn't have the privilege to create functions, and it's not practical to bring my data down into my code, I found Nelson Miranda's Answer amazing. I wanted to share it in a subquery version which I found more useful.
DROP TABLE IF EXISTS ints;
CREATE TABLE ints (i INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO ints (i) VALUES
( 1), ( 2), ( 3), ( 4), ( 5), ( 6), ( 7), ( 8), ( 9), (10),
(11), (12), (13), (14), (15), (16), (17), (18), (19), (20);
SELECT * FROM foo f
WHERE (SELECT GROUP_CONCAT(SUBSTRING(f.bar, i, 1) ORDER BY i SEPARATOR '')
FROM ints
WHERE i BETWEEN 1 AND LENGTH(f.bar)
AND SUBSTRING(f.bar, i, 1) IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')) = '12345'
Note: The ints table can be a temporary table.
If you are on MySQL 5.7 or below, and you just need something quick and dirty without having to define a new function, and you have a small number of known non-numerics to filter out, something like this can work well...
Select replace(replace(replace(replace(replace(phone, '+1', ''), '(', ''), ')', ''), '-', ''), ' ', '') from customers;

mysql replace regular expression what ever in the braces in a string

Below are the few rows of a column data in my mysql database
Data
test(victoryyyyy)king
java(vaaaarrrryy)side
(vkittt)sap
flex(vuuuuu)
k(vhhhhyyy)kk(abcd)k
In all rows there is random string that starts with
(v
and ends with
)
My task :- I have to replace all string from '(v' to ')' with empty space ( that is ' ') I shouldn't touch other values in the braces , in the above case I should not replace (abcd) in the last row
I mean for the above example the result should be
test king
java side
sap
flex
kkk(abcd)k
Could any one please help me ?
Thank You
Regards
Kiran
Mysql doesn't support regexes for replace tasks.
So you can only use string functions to find and substr necessary part.
I wrote my own function for this and it is working . Thanks every one .
drop FUNCTION replace_v;
CREATE FUNCTION replace_v (village varchar(100)) RETURNS varchar(100)
DETERMINISTIC
BEGIN
DECLARE a INT Default 0 ;
DECLARE lengthofstring INT Default 0 ;
DECLARE returnString varchar(100) Default '' ;
DECLARE charString varchar(100) Default '' ;
DECLARE found char(1) default 'N';
declare seccharString varchar(100) Default '' ;
set lengthofstring = length(village);
simple_loop: LOOP
SET a=a+1;
set charString=substr(village,a,1);
if(charString = '(') then
set seccharString=substr(village,a+1,1);
if( seccharString = 'v' || seccharString = 'V' || seccharString = 'p' || seccharString = 'P'
|| seccharString = 'm' || seccharString = 'M' ) then
set found='y';
end if;
end if ;
if(found='n') then
set returnString = concat (returnString,charString);
end if;
if(charString = ')') then
set found='n';
end if ;
IF a>=lengthofstring THEN
LEAVE simple_loop;
END IF;
END LOOP simple_loop;
if ( found = 'y') then
set returnString =village;
end if;
RETURN (replace( returnString,'&', ' '));
END

Case-insensitive REPLACE in MySQL?

MySQL runs pretty much all string comparisons under the default collation... except the REPLACE command. I have a case-insensitive collation and need to run a case-insensitive REPLACE. Is there any way to force REPLACE to use the current collation rather than always doing case-sensitive comparisons? I'm willing to upgrade my MySQL (currently running 5.1) to get added functionality...
mysql> charset utf8 collation utf8_unicode_ci;
Charset changed
mysql> select 'abc' like '%B%';
+------------------+
| 'abc' like '%B%' |
+------------------+
| 1 |
+------------------+
mysql> select replace('aAbBcC', 'a', 'f');
+-----------------------------+
| replace('aAbBcC', 'a', 'f') |
+-----------------------------+
| fAbBcC | <--- *NOT* 'ffbBcC'
+-----------------------------+
If replace(lower()) doesn't work, you'll need to create another function.
My 2 cents.
Since many people have migrated from MySQL to MariaDB, those people will have available a new function called REGEXP_REPLACE. Use it as you would a normal replace, but the pattern is a regular expression.
This is a working example:
UPDATE `myTable`
SET `myField` = REGEXP_REPLACE(`myField`, '(?i)my insensitive string', 'new string')
WHERE `myField` REGEXP '(?i)my insensitive string'
The option (?i) makes all the subsequent matches case insensitive (if put at the beginning of the pattern like I have then it all is insensitive).
See here for more information: https://mariadb.com/kb/en/mariadb/pcre/
Edit: as of MySQL 8.0 you can now use the regexp_replace function too, see documentation: https://dev.mysql.com/doc/refman/8.0/en/regexp.html
Alternative function for one spoken by fvox.
DELIMITER |
CREATE FUNCTION case_insensitive_replace ( REPLACE_WHERE text, REPLACE_THIS text, REPLACE_WITH text )
RETURNS text
DETERMINISTIC
BEGIN
DECLARE last_occurency int DEFAULT '1';
IF LCASE(REPLACE_THIS) = LCASE(REPLACE_WITH) OR LENGTH(REPLACE_THIS) < 1 THEN
RETURN REPLACE_WHERE;
END IF;
WHILE Locate( LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency ) > 0 DO
BEGIN
SET last_occurency = Locate(LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE));
SET REPLACE_WHERE = Insert( REPLACE_WHERE, last_occurency, LENGTH(REPLACE_THIS), REPLACE_WITH);
SET last_occurency = last_occurency + LENGTH(REPLACE_WITH);
END;
END WHILE;
RETURN REPLACE_WHERE;
END;
|
DELIMITER ;
Small test:
SET #str = BINARY 'New York';
SELECT case_insensitive_replace(#str, 'y', 'K');
Answers: New Kork
This modification of Luist's answer allows one to replace the needle with a differently cased version of the needle (two lines change).
DELIMITER |
CREATE FUNCTION case_insensitive_replace ( REPLACE_WHERE text, REPLACE_THIS text, REPLACE_WITH text )
RETURNS text
DETERMINISTIC
BEGIN
DECLARE last_occurency int DEFAULT '1';
IF LENGTH(REPLACE_THIS) < 1 THEN
RETURN REPLACE_WHERE;
END IF;
WHILE Locate( LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency ) > 0 DO
BEGIN
SET last_occurency = Locate(LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency);
SET REPLACE_WHERE = Insert( REPLACE_WHERE, last_occurency, LENGTH(REPLACE_THIS), REPLACE_WITH);
SET last_occurency = last_occurency + LENGTH(REPLACE_WITH);
END;
END WHILE;
RETURN REPLACE_WHERE;
END;
|
DELIMITER ;
I went with http://pento.net/2009/02/15/case-insensitive-replace-for-mysql/ (in fvox's answer) which performs the case insensitive search with case sensitive replacement and without changing the case of what should be unaffected characters elsewhere in the searched string.
N.B. the comment further down that same page stating that CHAR(255) should be changed to VARCHAR(255) - this seemed to be required for me as well.
In the previous answers, and the pento.net link, the arguments to LOCATE() are lower-cased.
This is a waste of resources, as LOCATE is case-insensitive by default:
mysql> select locate('el', 'HELLo');
+-----------------------+
| locate('el', 'HELLo') |
+-----------------------+
| 2 |
+-----------------------+
You can replace
WHILE Locate( LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency ) > 0 DO
with
WHILE Locate(REPLACE_THIS, REPLACE_WHERE, last_occurency ) > 0 DO
etc.
In case of 'special' characters there is unexpected behaviour:
SELECT case_insensitive_replace('A', 'Ã', 'a')
Gives
a
Which is unexpected... since we only want to replace the à not A
What is even more weird:
SELECT LOCATE('Ã', 'A');
gives
0
Which is the correct result... seems to have to do with encoding of the parameters of the stored procedure...
I like to use a search and replace function I created when I need to replace without worrying about the case of the original or search strings. This routine bails out quickly if you pass in an empty/null search string or a null replace string without altering the incoming string. I also added a safe count down just in case somehow the search keep looping. This way we don't get stuck in a loop forever. Alter the starting number if you think it is too low.
delimiter //
DROP FUNCTION IF EXISTS `replace_nocase`//
CREATE FUNCTION `replace_nocase`(raw text, find_str varchar(1000), replace_str varchar(1000)) RETURNS text
CHARACTER SET utf8
DETERMINISTIC
BEGIN
declare ret text;
declare len int;
declare hit int;
declare safe int;
if find_str is null or find_str='' or replace_str is null then
return raw;
end if;
set safe=10000;
set ret=raw;
set len=length(find_str);
set hit=LOCATE(find_str,ret);
while hit>0 and safe>0 do
set ret=concat(substring(ret,1,hit-1),replace_str,substring(ret,hit+len));
set hit=LOCATE(find_str,ret,hit+1);
set safe=safe-1;
end while;
return ret;
END//
This question is a bit old but I ran into the same problem and the answers given didn't allow me to solve it entirely.
I wanted the result to retain the case of the original string.
So I made a small modification to the replace_ci function proposed by fvox :
DELIMITER $$
DROP FUNCTION IF EXISTS `replace_ci`$$
CREATE FUNCTION `replace_ci` (str TEXT, needle CHAR(255), str_rep CHAR(255))
RETURNS TEXT
DETERMINISTIC
BEGIN
DECLARE return_str TEXT DEFAULT '';
DECLARE lower_str TEXT;
DECLARE lower_needle TEXT;
DECLARE tmp_needle TEXT;
DECLARE str_origin_char CHAR(1);
DECLARE str_rep_char CHAR(1);
DECLARE final_str_rep TEXT DEFAULT '';
DECLARE pos INT DEFAULT 1;
DECLARE old_pos INT DEFAULT 1;
DECLARE needle_pos INT DEFAULT 1;
IF needle = '' THEN
RETURN str;
END IF;
SELECT LOWER(str) INTO lower_str;
SELECT LOWER(needle) INTO lower_needle;
SELECT LOCATE(lower_needle, lower_str, pos) INTO pos;
WHILE pos > 0 DO
SELECT substr(str, pos, char_length(needle)) INTO tmp_needle;
SELECT '' INTO final_str_rep;
SELECT 1 INTO needle_pos;
WHILE needle_pos <= char_length(tmp_needle) DO
SELECT substr(tmp_needle, needle_pos, 1) INTO str_origin_char;
SELECT SUBSTR(str_rep, needle_pos, 1) INTO str_rep_char;
SELECT CONCAT(final_str_rep, IF(BINARY str_origin_char = LOWER(str_origin_char), LOWER(str_rep_char), IF(BINARY str_origin_char = UPPER(str_origin_char), UPPER(str_rep_char), str_rep_char))) INTO final_str_rep;
SELECT (needle_pos + 1) INTO needle_pos;
END WHILE;
SELECT CONCAT(return_str, SUBSTR(str, old_pos, pos - old_pos), final_str_rep) INTO return_str;
SELECT pos + CHAR_LENGTH(needle) INTO pos;
SELECT pos INTO old_pos;
SELECT LOCATE(lower_needle, lower_str, pos) INTO pos;
END WHILE;
SELECT CONCAT(return_str, SUBSTR(str, old_pos, CHAR_LENGTH(str))) INTO return_str;
RETURN return_str;
END$$
DELIMITER ;
Example of use :
SELECT replace_ci( 'MySQL', 'm', 'e' ) as replaced;
Will return :
| replaced |
| --- |
| EySQL |

MySQL strip non-numeric characters to compare

I'm looking to find records in a table that match a specific number that the user enters. So, the user may enter 12345, but this could be 123zz4-5 in the database.
I imagine something like this would work, if PHP functions worked in MySQL.
SELECT * FROM foo WHERE preg_replace("/[^0-9]/","",bar) = '12345'
What's the equivalent function or way to do this with just MySQL?
Speed is not important.
I realise that this is an ancient topic but upon googling this problem I couldn't find a simple solution (I saw the venerable agents but think this is a simpler solution) so here's a function I wrote, seems to work quite well.
DROP FUNCTION IF EXISTS STRIP_NON_DIGIT;
DELIMITER $$
CREATE FUNCTION STRIP_NON_DIGIT(input VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE output VARCHAR(255) DEFAULT '';
DECLARE iterator INT DEFAULT 1;
WHILE iterator < (LENGTH(input) + 1) DO
IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
END IF;
SET iterator = iterator + 1;
END WHILE;
RETURN output;
END
$$
You can easily do what you want with REGEXP_REPLACE (compatible with MySQL 8+ and MariaDB 10.0.5+)
REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])
Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.
Go to REGEXP_REPLACE doc: MySQL or MariaDB
Try it:
SELECT REGEXP_REPLACE('123asd12333', '[a-zA-Z]+', '');
Output:
12312333
Updated 2022: as per Marlom's answer, you can now use REGEX_REPLACE - which will perform even better than my historical answer here.
Most upvoted answer above isn't the fastest.
Full kudos to them for giving a working proposal to bounce off!
This is an improved version:
DELIMITER ;;
DROP FUNCTION IF EXISTS `STRIP_NON_DIGIT`;;
CREATE DEFINER=`root`#`localhost` FUNCTION `STRIP_NON_DIGIT`(input VARCHAR(255)) RETURNS VARCHAR(255) CHARSET utf8
READS SQL DATA
BEGIN
DECLARE output VARCHAR(255) DEFAULT '';
DECLARE iterator INT DEFAULT 1;
DECLARE lastDigit INT DEFAULT 1;
DECLARE len INT;
SET len = LENGTH(input) + 1;
WHILE iterator < len DO
-- skip past all digits
SET lastDigit = iterator;
WHILE ORD(SUBSTRING(input, iterator, 1)) BETWEEN 48 AND 57 AND iterator < len DO
SET iterator = iterator + 1;
END WHILE;
IF iterator != lastDigit THEN
SET output = CONCAT(output, SUBSTRING(input, lastDigit, iterator - lastDigit));
END IF;
WHILE ORD(SUBSTRING(input, iterator, 1)) NOT BETWEEN 48 AND 57 AND iterator < len DO
SET iterator = iterator + 1;
END WHILE;
END WHILE;
RETURN output;
END;;
Testing 5000 times on a test server:
-- original
Execution Time : 7.389 sec
Execution Time : 7.257 sec
Execution Time : 7.506 sec
-- ORD between not string IN
Execution Time : 4.031 sec
-- With less substrings
Execution Time : 3.243 sec
Execution Time : 3.415 sec
Execution Time : 2.848 sec
While it's not pretty and it shows results that don't match, this helps:
SELECT * FROM foo WHERE bar LIKE = '%1%2%3%4%5%'
I would still like to find a better solution similar to the item in the original question.
There's no regexp replace, only a plain string REPLACE().
MySQL has the REGEXP operator, but it's only a match tester not a replacer, so you would have to turn the logic inside-out:
SELECT * FROM foo WHERE bar REGEXP '[^0-9]*1[^0-9]*2[^0-9]*3[^0-9]*4[^0-9]*5[^0-9]*';
This is like your version with LIKE but matches more accurately. Both will perform equally badly, needing a full table scan without indexes.
On MySQL 8.0+ there is a new native function called REGEXP_REPLACE. A clean solution to this question would be:
SELECT * FROM foo WHERE REGEXP_REPLACE(bar,'[^0-9]+',"") = '12345'
The simplest way I can think to do it is to use the MySQL REGEXP operator a la:
WHERE foo LIKE '1\D*2\D*3\D*4\D*5'
It's not especially pretty but MySQL doesn't have a preg_replace function so I think it's the best you're going to get.
Personally, if this only-numeric data is so important, I'd keep a separate field just to contain the stripped data. It'll make your lookups a lot faster than with the regular expression search.
This blog post details how to strip non-numeric characters from a string via a MySQL function:
SELECT NumericOnly("asdf11asf");
returns 11
http://venerableagents.wordpress.com/2011/01/29/mysql-numeric-functions/
There's no regex replace as far as I'm concerned, but I found this solution;
--Create a table with numbers
DROP TABLE IF EXISTS ints;
CREATE TABLE ints (i INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO ints (i) VALUES
( 1), ( 2), ( 3), ( 4), ( 5), ( 6), ( 7), ( 8), ( 9), (10),
(11), (12), (13), (14), (15), (16), (17), (18), (19), (20);
--Then extract the numbers from the specified column
SELECT
bar,
GROUP_CONCAT(SUBSTRING(bar, i, 1) ORDER BY i SEPARATOR '')
FROM foo
JOIN ints ON i BETWEEN 1 AND LENGTH(bar)
WHERE
SUBSTRING(bar, i, 1) IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
GROUP BY bar;
It works for me and I use MySQL 5.0
Also I found this place that could help.
I have a similar situation, matching products to barcodes where the barcode doesn't store none alpha numerics sometimes, so 102.2234 in the DB needs to be found when searching for 1022234.
In the end I just added a new field, reference_number to the products tables, and have php strip out the none alpha numerics in the product_number to populate reference_number whenever a new products is added.
You'd need to do a one time scan of the table to create all the reference_number fields for existing products.
You can then setup your index, even if speed is not a factor for this operation, it is still a good idea to keep the database running well so this query doesn't bog it down and slow down other queries.
I came across this solution. The top answer by user1467716 will work in phpMyAdmin with a small change: add a second delimiter tag to the end of the code.
phpMyAdmin version is 4.1.14; MySQL version 5.6.20
I also added a length limiter using
DECLARE count INT DEFAULT 0; in the declarations
AND count < 5 in the WHILE statement
SET COUNT=COUNT+1; in the IF statement
Final form:
DROP FUNCTION IF EXISTS STRIP_NON_DIGIT;
DELIMITER $$
CREATE FUNCTION STRIP_NON_DIGIT(input VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE output VARCHAR(255) DEFAULT '';
DECLARE iterator INT DEFAULT 1;
DECLARE count INT DEFAULT 0;
WHILE iterator < (LENGTH(input) + 1) AND count < 5 DO --limits to 5 chars
IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
SET COUNT=COUNT+1;
END IF;
SET iterator = iterator + 1;
END WHILE;
RETURN output;
END
$$
DELIMITER $$ --added this
How big is table with foo? If it is small, and speed really doesn't matter, you might pull the row ID and foo, loop over it using the PHP replace functions to compare, and then pull the info you want by row number.
Of course, if the table is too big, this won't work well.
try this example. this is used for phone numbers, however you can modify it for your needs.
-- function removes non numberic characters from input
-- returne only the numbers in the string
CREATE DEFINER =`root`#`localhost` FUNCTION `remove_alpha`(inputPhoneNumber VARCHAR(50))
RETURNS VARCHAR(50)
CHARSET latin1
DETERMINISTIC
BEGIN
DECLARE inputLenght INT DEFAULT 0;
-- var for our iteration
DECLARE counter INT DEFAULT 1;
-- if null is passed, we still return an tempty string
DECLARE sanitizedText VARCHAR(50) DEFAULT '';
-- holder of each character during the iteration
DECLARE oneChar VARCHAR(1) DEFAULT '';
-- we'll process only if it is not null.
IF NOT ISNULL(inputPhoneNumber)
THEN
SET inputLenght = LENGTH(inputPhoneNumber);
WHILE counter <= inputLenght DO
SET oneChar = SUBSTRING(inputPhoneNumber, counter, 1);
IF (oneChar REGEXP ('^[0-9]+$'))
THEN
SET sanitizedText = Concat(sanitizedText, oneChar);
END IF;
SET counter = counter + 1;
END WHILE;
END IF;
RETURN sanitizedText;
END
to use this user defined function (UDF).
let's say you have a column of phone numbers:
col1
(513)983-3983
1-838-338-9898
phone983-889-8383
select remove_alpha(col1) from mytable
The result would be;
5139833983
18383389898
9838898383
thought I would share this since I built it off the function from here. I rearranged just so I can read it easier (I'm just server side).
You call it by passing in a table name and column name to have it strip all existing non-numeric characters from that column. I inherited a lot of bad table structures that put a ton of int fields as varchar so I needed a way to clean these up quickly before I can modify the column to an integer.
drop procedure if exists strip_non_numeric_characters;
DELIMITER ;;
CREATE PROCEDURE `strip_non_numeric_characters`(
tablename varchar(100)
,columnname varchar(100)
)
BEGIN
-- =============================================
-- Author: <Author,,David Melton>
-- Create date: <Create Date,,2/26/2019>
-- Description: <Description,,loops through data and strips out the bad characters in whatever table and column you pass it>
-- =============================================
#this idea was generated from the idea STRIP_NON_DIGIT function
#https://stackoverflow.com/questions/287105/mysql-strip-non-numeric-characters-to-compare
declare input,output varchar(255);
declare iterator,lastDigit,len,counter int;
declare date_updated varchar(100);
select column_name
into date_updated
from information_schema.columns
where table_schema = database()
and extra rlike 'on update CURRENT_TIMESTAMP'
and table_name = tablename
limit 1;
#only goes up to 255 so people don't run this for a longtext field
#just to be careful, i've excluded columns that are part of keys, that could potentially mess something else up
set #find_column_length =
concat("select character_maximum_length
into #len
from information_schema.columns
where table_schema = '",database(),"'
and column_name = '",columnname,"'
and table_name = '",tablename,"'
and length(ifnull(character_maximum_length,100)) < 255
and data_type in ('char','varchar')
and column_key = '';");
prepare stmt from #find_column_length;
execute stmt;
deallocate prepare stmt;
set counter = 1;
set len = #len;
while counter <= ifnull(len,1) DO
#this just removes it by putting all the characters before and after the character i'm looking at
#you have to start at the end of the field otherwise the lengths don't stay in order and you have to run it multiple times
set #update_query =
concat("update `",tablename,"`
set `",columnname,"` = concat(substring(`",columnname,"`,1,",len - counter,"),SUBSTRING(`",columnname,"`,",len - counter,",",counter - 1,"))
",if(date_updated is not null,concat(",`",date_updated,"` = `",date_updated,"`
"),''),
"where SUBSTRING(`",columnname,"`,",len - counter,", 1) not REGEXP '^[0-9]+$';");
prepare stmt from #update_query;
execute stmt;
deallocate prepare stmt;
set counter = counter + 1;
end while;
END ;;
DELIMITER ;
To search for numbers that match a particular numeric pattern in a string, first remove all the alphabets and special characters in a similar manner as below then convert the value to an integer and then search
SELECT *
FROM foo
WHERE Convert(Regexp_replace(bar, '[a-zA-Z]+', ''), signed) = 12345
I think you don't need complicated functions for that.
I've found a REGEXP_REPLACE solution using built-in mysql character class names. You can read about them in a table in the docs. Basically, they are mysql-specific names for commonly matched groups of characters like [:alnum:] for alpha-numeric characters, [:alpha:] for only alphabetic characters and so on.
So my version of the REGEXP_REPLACE:
REGEXP_REPLACE('My number is: +59 (29) 889-23-56', '[[:alpha:][:blank:][:punct:][:cntrl:]]', '')
will yield 59298892356 as per requirements.
Being someone that has version 5.7, doesn't have the privilege to create functions, and it's not practical to bring my data down into my code, I found Nelson Miranda's Answer amazing. I wanted to share it in a subquery version which I found more useful.
DROP TABLE IF EXISTS ints;
CREATE TABLE ints (i INT UNSIGNED NOT NULL PRIMARY KEY);
INSERT INTO ints (i) VALUES
( 1), ( 2), ( 3), ( 4), ( 5), ( 6), ( 7), ( 8), ( 9), (10),
(11), (12), (13), (14), (15), (16), (17), (18), (19), (20);
SELECT * FROM foo f
WHERE (SELECT GROUP_CONCAT(SUBSTRING(f.bar, i, 1) ORDER BY i SEPARATOR '')
FROM ints
WHERE i BETWEEN 1 AND LENGTH(f.bar)
AND SUBSTRING(f.bar, i, 1) IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')) = '12345'
Note: The ints table can be a temporary table.
If you are on MySQL 5.7 or below, and you just need something quick and dirty without having to define a new function, and you have a small number of known non-numerics to filter out, something like this can work well...
Select replace(replace(replace(replace(replace(phone, '+1', ''), '(', ''), ')', ''), '-', ''), ' ', '') from customers;