REGEX in mysql query - mysql

I have a table with an address column.
A value for address looks like "#12-3/98 avenue street", which contains numbers, special characters and letters.
I want to write a MySQL query using regex to remove the special characters from the address value.
Example: "12398avenuestreet" is the value I want after removing the special characters.
Thank you.

Maybe this function will help you:
DELIMITER //
CREATE FUNCTION strip_non_alpha(
    _dirty_string VARCHAR(40)
)
RETURNS VARCHAR(40)
DETERMINISTIC
BEGIN
    DECLARE _length INT;
    DECLARE _position INT;
    DECLARE _current_char VARCHAR(1);
    DECLARE _clean_string VARCHAR(40);
    SET _clean_string = '';
    -- CHAR_LENGTH counts characters; LENGTH would count bytes
    SET _length = CHAR_LENGTH(_dirty_string);
    SET _position = 1;
    -- Walk the string one character at a time
    WHILE _position <= _length DO
        SET _current_char = SUBSTRING(_dirty_string, _position, 1);
        -- Keep only ASCII letters and digits
        IF _current_char REGEXP '[A-Za-z0-9]' THEN
            SET _clean_string = CONCAT(_clean_string, _current_char);
        END IF;
        SET _position = _position + 1;
    END WHILE;
    RETURN CONCAT('', _clean_string);
END;
//
DELIMITER ;
Then call it like this:
update mytable set address = strip_non_alpha(address);

You don't need RegExp for simple character replacement. See the MySQL string functions.

Unfortunately, before version 8.0 MySQL regular expressions are "match only": you cannot do a replace in your query (REGEXP_REPLACE() was only added in MySQL 8.0). This leaves you with only something like this (which is very, very clumsy):
SELECT REPLACE(REPLACE(address, '?', ''), '#', '') -- and many, many other nested replaces
FROM mytable
Or put this logic inside your application (the best option here).
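If the cleanup is moved into the application, as suggested above, it could look roughly like the following Python sketch. The function name strip_special is hypothetical, and the character class (ASCII letters and digits only) is an assumption taken from the stored function posted earlier in this thread.

```python
import re

# Anything that is not an ASCII letter or digit counts as "special" here.
_NON_ALNUM = re.compile(r"[^A-Za-z0-9]")


def strip_special(address: str) -> str:
    """Return the address with every non-alphanumeric character removed."""
    return _NON_ALNUM.sub("", address)


print(strip_special("#12-3/98 avenue street"))  # 12398avenuestreet
```

The cleaned value can then be written back with an ordinary parameterized UPDATE.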

MySQL regular expressions are only for pattern matching, not replacing, so your best bet is to create a function or use REPLACE() repeatedly.

As far as I know, it is not possible to replace via MySQL regex, since these functions are only used for matching.
Alternatively, you can use MySQL REPLACE() for this:
SELECT REPLACE(REPLACE(REPLACE(REPLACE(address, '#', ''), '-', ''), '/', ''), ' ', '') FROM mytable;
This removes #, -, / and spaces, and results in the string you want.

You may use this MySQL UDF. And then simply,
update my_table set my_column = PREG_REPLACE('/[^A-Za-z0-9]/' , '' , my_column);

Related

Oracle INSTR replacement in MySQL

Requirement: I used INSTR() in Oracle before, but now I want to achieve the same effect using MySQL functions.
INSTR(A.SOME_THING.B,".",1,2)<>0 --ORACLE
As far as I can tell, that's not difficult for simple cases. But as the number of parameters rises, the MySQL "replacement" for the same Oracle functionality gets worse.
Your code:
instr(some_thing, '.', 1, 2)
means
search through some_thing
for a dot
starting from the first position
and find the dot's second occurrence
You can't do that in a simple manner in MySQL; you'll need a user-defined function. Something like this (the source is "INSTR Function - Oracle to MySQL Migration"; I suggest you have a look at the whole document. I'm posting the code here because links might get broken):
DELIMITER //
CREATE FUNCTION INSTR4 (p_str VARCHAR(8000), p_substr VARCHAR(255),
p_start INT, p_occurrence INT)
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE v_found INT DEFAULT p_occurrence;
DECLARE v_pos INT DEFAULT p_start;
lbl:
WHILE 1=1
DO
-- Find the next occurrence
SET v_pos = LOCATE(p_substr, p_str, v_pos);
-- Nothing found
IF v_pos IS NULL OR v_pos = 0 THEN
RETURN v_pos;
END IF;
-- The required occurrence found
IF v_found = 1 THEN
LEAVE lbl;
END IF;
-- Prepare to find another one occurrence
SET v_found = v_found - 1;
SET v_pos = v_pos + 1;
END WHILE;
RETURN v_pos;
END;
//
DELIMITER ;
Use it as
SELECT INSTR4('abcbcb', 'b', 3, 2);
and get 6 as a result.
In Oracle, the code
INSTR(column, '.', 1, 2) <> 0 --ORACLE
checks whether the column value contains at least 2 dot characters.
In MySQL this can be replaced with, for example,
LENGTH(column) - LENGTH(REPLACE(column, '.', '')) >= 2
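To make the two approaches above easier to compare, here is a small Python sketch of the same logic. The names instr4 and has_two_dots are hypothetical; instr4 mirrors the four-argument search (1-based positions, 0 when not found, occurrence assumed to be at least 1), and has_two_dots mirrors the length-difference trick.

```python
def instr4(text: str, sub: str, start: int = 1, occurrence: int = 1) -> int:
    """Return the 1-based position of the n-th occurrence of sub, or 0."""
    pos = start - 1  # convert the 1-based start to a 0-based index
    for _ in range(occurrence):
        pos = text.find(sub, pos)
        if pos == -1:
            return 0
        pos += 1  # 1-based position of this match, and next search start
    return pos


def has_two_dots(text: str) -> bool:
    """Length-difference trick: count dots by removing them."""
    return len(text) - len(text.replace(".", "")) >= 2


print(instr4("abcbcb", "b", 3, 2))  # 6
print(has_two_dots("a.b.c"))        # True
```

Both checks agree for the dot case: instr4(value, ".", 1, 2) is non-zero exactly when has_two_dots(value) is true.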

Understanding SQL Language

Can someone help me understand this query better? I want to remove all special characters from my string but I don't understand how to apply it to my own query. I found this query on Stack Overflow and it seems to work for some people. I'm assuming @str is my string name but I don't know what @expres stands for. And do I need a SELECT/FROM statement?
DECLARE @str VARCHAR(400)
DECLARE @expres VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!]%'
SET @str = '(remove) ~special~ *characters. from string in sql!'
WHILE PATINDEX( @expres, @str ) > 0
SET @str = Replace(REPLACE( @str, SUBSTRING( @str, PATINDEX( @expres, @str ), 1 ),''),'-',' ')
From the code above there are a couple of things you need to understand first:
@ in SQL Server marks a variable: DECLARE creates the variable under that name, and SET assigns a value to it.
@expres in this case is the list of characters you want to remove from the string, so anything inside the [] will be searched for in the next section of the code.
PATINDEX is a function that searches through @str to see if there are any matches with what you put in @expres. If there is one, it returns the index of the start of the match.
Putting this condition inside the WHILE means that it will loop over @str until there is no match, meaning all matches have been removed.
The final SET line is where the removal happens. This is accomplished using REPLACE.
REPLACE takes 3 arguments: the string you are searching through (in this case @str), the substring you are trying to replace, and what you will replace it with. The inner REPLACE swaps the matched character for '' (removing it); the outer REPLACE swaps '-' for ' '.
The SUBSTRING inside the REPLACE extracts the first character to be replaced. To do this it needs to know where the match starts, therefore it uses PATINDEX to find its index.
I hope that was clear enough. You can find the documentation for SUBSTRING, PATINDEX and REPLACE here.
If you analyze the SQL you will see that you are stripping characters from the @str variable. Therefore you need to set it to the value you want the characters to be stripped from.
use SBLReporting
DECLARE @str VARCHAR(400)
SELECT @str = name from bbnet.customerrelationship --here you set the @str variable to your desired value
DECLARE @expres VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!]%'
WHILE PATINDEX( @expres, @str ) > 0 SET @str = Replace(REPLACE( @str, SUBSTRING( @str, PATINDEX( @expres, @str ), 1 ),''),'-',' ')
SELECT @str -- this will be your stripped value
You need to enter the string to be stripped of its special characters as @str.
The snippet DECLARE @str VARCHAR(400) simply says that @str is a variable of type varchar that can contain up to 400 characters.
Let's suppose your string is "Special Characters Are Things Like $@#%"; your code would be:
DECLARE @str VARCHAR(400)
DECLARE @expres VARCHAR(50) = '%[~,@,#,$,%,&,*,(,),.,!]%'
SET @str = 'Special Characters Are Things Like $@#%'
WHILE PATINDEX( @expres, @str ) > 0
SET @str = Replace(REPLACE( @str, SUBSTRING( @str, PATINDEX( @expres, @str ), 1 ),''),'-',' ')
Finally execute SELECT @str and you should see "Special Characters Are Things Like " as your output.
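The loop explained in the answers above can be traced in a few lines of Python. This is only a sketch of the control flow, not SQL Server code: the name strip_specials is hypothetical, re.search stands in for PATINDEX, and str.replace stands in for REPLACE. Note that the commas inside the brackets of the pattern are themselves part of the character list, so commas get removed too.

```python
import re

# Same character list as in the pattern from the thread (comma included).
SPECIALS = "~,@#$%&*().!"
_PATTERN = re.compile("[" + re.escape(SPECIALS) + "]")


def strip_specials(text: str) -> str:
    """Repeat 'find first special character, remove all its copies'."""
    while True:
        match = _PATTERN.search(text)        # plays the role of PATINDEX
        if match is None:                    # no match left: loop ends
            return text
        found = match.group(0)               # SUBSTRING(..., index, 1)
        # Inner replace removes the character, outer turns '-' into ' '.
        text = text.replace(found, "").replace("-", " ")


print(strip_specials("(remove) ~special~ *characters. from string in sql!"))
# remove special characters from string in sql
```

As in the original loop, the '-' replacement only runs when at least one special character was found.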

MySQL replace ASCII characters

I have some weird problem with MySQL. I am trying to match two strings, but in one string there is extra characters.
Initial string looks like this 'Ascot '
When I select:
select ascii(substring(name, 1, 1)), ascii(substring(name, 7, 1))
I get 194, 194. But when I replace:
select replace(name, char(194), '' )
it shows as '?Ascot?' in phpMyAdmin and no matching is done. Can someone please help me with this?
Problem column is defined as utf8mb4_unicode_ci. I am trying to match this with column from another table defined as utf8_general_ci.
Tried to change utf8_general_ci to utf8mb4_unicode_ci but with no results.
When I do substring(name, 2, 5) then it matches. So the solution should be to replace those characters.
EDIT:
I tried the following function to remove non alphanumeric characters and it seems to work now:
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE v_char VARCHAR(1);
DECLARE v_parseStr VARCHAR(255) DEFAULT ' ';
WHILE (i <= LENGTH(prm_strInput) ) DO
SET v_char = SUBSTR(prm_strInput,i,1);
IF v_char REGEXP '^[A-Za-z0-9 ]+$' THEN #alphanumeric
SET v_parseStr = CONCAT(v_parseStr,v_char);
END IF;
SET i = i + 1;
END WHILE;
RETURN trim(v_parseStr);
END
But this is extremely inefficient...
Use this before your mysql query:
mysql_query("SET NAMES 'utf8'");
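One possible explanation for the 194 in the question, offered here only as an assumption: the stray characters may be no-break spaces (U+00A0), which UTF-8 encodes as the two bytes C2 A0. The thread itself only shows that ASCII() returns 194 (0xC2). The Python sketch below illustrates why deleting the single byte 194 would leave a broken character behind, while removing the whole character works.

```python
# Assumption: the stray character is U+00A0 (no-break space).
NBSP = "\u00a0"
name = NBSP + "Ascot" + NBSP

raw = name.encode("utf-8")
first_byte = raw[0]                              # lead byte of C2 A0
seventh_char_byte = name[6].encode("utf-8")[0]   # same for the 7th character

# Deleting only byte 194 (0xC2) leaves a dangling 0xA0: invalid UTF-8.
broken = raw.replace(bytes([194]), b"")
shown = broken.decode("utf-8", errors="replace")

# Removing the whole character gives the expected value.
clean = name.replace(NBSP, "")

print(first_byte, seventh_char_byte, repr(shown), repr(clean))
```

If that assumption holds, removing the full two-byte sequence on the MySQL side, for example with something like REPLACE(name, UNHEX('C2A0'), ''), may be worth trying.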

MySQL selecting string with multi special characters

I'm having a problem selecting strings from database. The problem is if you have +(123)-4 56-7 in row and if you are searching with a string 1234567 it wouldn't find any results. Any suggestions?
You can use the REPLACE() function to remove special characters in MySQL. I don't know if it's very efficient, but it should work.
There is already another thread on SO which covers a very similar question, see here.
If it is always this kind of pattern you're searching for, and your table is rather large, I advise against REPLACE() or REGEXP, which of course will do the job if tweaked properly.
Better add a column with the plain phone numbers, which doesn't contain any formatting character data at all - or even better, a hash of the phone numbers. This way, you could add an index to the new column and search against this. From a database perspective, this is much easier, and much faster.
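A minimal sketch of how the extra column could be filled from the application, assuming "plain" means digits only. The function name normalize_phone is hypothetical; the same normalization would be applied to the search term before querying the indexed column.

```python
import re


def normalize_phone(raw: str) -> str:
    """Keep only the ASCII digits of a formatted phone number."""
    return re.sub(r"[^0-9]", "", raw)


stored = normalize_phone("+(123)-4 56-7")   # value for the new column
wanted = normalize_phone("1234567")         # normalized search term
print(stored, stored == wanted)             # 1234567 True
```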
You can use user-defined functions to extract the numeric characters from a string. First, a helper that tests whether a value is numeric:
CREATE FUNCTION IsNumeric (val varchar(255)) RETURNS tinyint
RETURN val REGEXP '^(-|\\+){0,1}([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$';
CREATE FUNCTION GetNumeric (val VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE idx INT DEFAULT 0;
IF ISNULL(val) THEN RETURN NULL; END IF;
IF LENGTH(val) = 0 THEN RETURN ""; END IF;
SET idx = LENGTH(val);
WHILE idx > 0 DO
IF IsNumeric(SUBSTRING(val,idx,1)) = 0 THEN
SET val = REPLACE(val,SUBSTRING(val,idx,1),"");
SET idx = LENGTH(val)+1;
END IF;
SET idx = idx - 1;
END WHILE;
RETURN val;
END;
Then
SELECT columns FROM mytable
WHERE GetNumeric(phonenumber) LIKE '%1234567%';
Or a query using the REPLACE function:
SELECT * FROM phoneTable WHERE REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phone, '+', ''), '-', ''), '(', ''), ')', ''), ' ', '') LIKE '%1234567%'

MySQL find_in_set with multiple search string

I find that find_in_set only searches by a single string:
find_in_set('a', 'a,b,c,d')
In the above example, 'a' is the only string used for the search.
Is there any way to use find_in_set-like functionality and search by multiple strings, like:
find_in_set('a,b,c', 'a,b,c,d')
In the above example, I want to search by the three strings 'a,b,c'.
One way I see is using OR:
find_in_set('a', 'a,b,c,d') OR find_in_set('b', 'a,b,c,d') OR find_in_set('c', 'a,b,c,d')
Is there any other way than this?
There is no native function to do it, but you can achieve your aim using the following trick:
WHERE CONCAT(",", `setcolumn`, ",") REGEXP ",(val1|val2|val3),"
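The reason the trick wraps both the column and the pattern in commas is that every element then has a comma on each side, so only whole elements can match. A small Python sketch of the same idea (the function name set_contains_any is hypothetical, and re.search stands in for MySQL's REGEXP):

```python
import re


def set_contains_any(set_column: str, values: list[str]) -> bool:
    """True if any of the values is a whole element of the comma list."""
    # Build ",(val1|val2|val3)," with the values escaped for regex use.
    pattern = ",(" + "|".join(re.escape(v) for v in values) + "),"
    # Pad the column with commas so first and last elements match too.
    return re.search(pattern, "," + set_column + ",") is not None


print(set_contains_any("a,b,c,d", ["a", "x"]))  # True
print(set_contains_any("ab,cd", ["a"]))         # False
```

Like the OR chain in the question, this matches when any one of the values is present, not only when all of them are.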
The MySQL function find_in_set() can search only for one string in a set of strings.
The first argument is a single string, so there is no way to make it parse your comma-separated string into several strings (you can't use commas in SET elements at all!). The second argument is a SET, which is represented by a comma-separated string. So find_in_set('a,b,c', 'a,b,c,d') runs without error, but by definition it can never find the string 'a,b,c' in any SET, because that string contains commas.
You can also use this custom function
CREATE FUNCTION SPLIT_STR(
x VARCHAR(255),
delim VARCHAR(12),
pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
delim, '');
DELIMITER $$
CREATE FUNCTION `FIND_SET_EQUALS`(`s1` VARCHAR(200), `s2` VARCHAR(200))
RETURNS TINYINT(1)
LANGUAGE SQL
BEGIN
DECLARE a INT Default 0 ;
DECLARE isEquals TINYINT(1) Default 0 ;
DECLARE str VARCHAR(255);
IF s1 IS NOT NULL AND s2 IS NOT NULL THEN
simple_loop: LOOP
SET a=a+1;
SET str= SPLIT_STR(s2,",",a);
IF str='' THEN
LEAVE simple_loop;
END IF;
#Do check is in set
IF FIND_IN_SET(str, s1)=0 THEN
SET isEquals=0;
LEAVE simple_loop;
END IF;
SET isEquals=1;
END LOOP simple_loop;
END IF;
RETURN isEquals;
END;
$$
DELIMITER ;
SELECT FIND_SET_EQUALS('a,c,b', 'a,b,c'); -- 1
SELECT FIND_SET_EQUALS('a,c', 'a,b,c'); -- 0
SELECT FIND_SET_EQUALS(null, 'a,b,c'); -- 0
Wow, I'm surprised no one ever mentioned this here. In a nutshell, if you know the order of your members, you can query with a single bitwise operation.
SELECT * FROM example_table WHERE (example_set & mbits) = mbits;
Explanation:
If we had a set that has members in this order: "HTML", "CSS", "PHP", "JS"... etc.
That's how they're interpreted in MySQL:
"HTML" = 0001 = 1
"CSS" = 0010 = 2
"PHP" = 0100 = 4
"JS" = 1000 = 8
So for example, if you want to query all rows that have "HTML" and "CSS" in their sets, then you'll write
SELECT * FROM example_table WHERE (example_set & 3) = 3;
Because 0011 is 3 which is both 0001 "HTML" and 0010 "CSS".
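The arithmetic behind the mask can be checked with a short Python sketch. The member order is the example order from this answer; the helper names mask_for and has_all are hypothetical.

```python
# Members in declaration order; the n-th member (0-based) has bit value 1 << n.
SET_MEMBERS = ["HTML", "CSS", "PHP", "JS"]


def mask_for(members: list[str]) -> int:
    """Combine the bit values of the requested members into one mask."""
    return sum(1 << SET_MEMBERS.index(m) for m in members)


def has_all(set_value: int, mask: int) -> bool:
    """Same test as (example_set & mbits) = mbits in the query."""
    return (set_value & mask) == mask


mbits = mask_for(["HTML", "CSS"])
print(mbits)                   # 3
print(has_all(0b0111, mbits))  # True: HTML, CSS and PHP are set
print(has_all(0b0101, mbits))  # False: CSS is missing
```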
Your sets can still be queried using the other methods like REGEXP , LIKE, FIND_IN_SET(), and so on. Use whatever you need.
Amazing answer by @Pavel Perminov, and also a nice comment by @doru about checking dynamically.
Building on those, here is the condition I use in PHP: CONCAT(',','" . $country_lang_id . "', ',') REGEXP CONCAT(',(', REPLACE(YourColumnName, ',', '|'), '),'). The query below may be useful for someone looking for ready-made PHP code.
$country_lang_id = "1,2";
$sql = "select a.* from tablename a where CONCAT(',','" . $country_lang_id . "', ',') REGEXP CONCAT(',(', REPLACE(a.country_lang_id, ',', '|'), '),') ";
You can also use the LIKE operator, for instance:
where setcolumn like '%a,b%'
or
where 'a,b,c,d' like '%b,c%'
which might work in some situations.
You can use IN to match a column against a list of values:
SELECT * FROM mytable WHERE myvals IN ('a', 'b', 'c', 'd')