MySQL: SELECT part of a string with regex (find and extract a number) [duplicate]

I have a MySQL database and I have a query as:
SELECT `id`, `originaltext` FROM `source` WHERE `originaltext` regexp '[0-9][0-9]'
This finds every originaltext value that contains a two-digit number.
I need MySQL to return those numbers as a field so I can manipulate them further.
Ideally I could also add a criterion that the number must be > 20, but I can do that separately as well.

If you want more regular expression power in your database, you can consider using LIB_MYSQLUDF_PREG. This is an open source library of MySQL user functions that imports the PCRE library. LIB_MYSQLUDF_PREG is delivered in source code form only. To use it, you'll need to be able to compile it and install it into your MySQL server. Installing this library does not change MySQL's built-in regex support in any way. It merely makes the following additional functions available:
PREG_CAPTURE extracts a regex match from a string.
PREG_POSITION returns the position at which a regular expression matches a string.
PREG_REPLACE performs a search-and-replace on a string.
PREG_RLIKE tests whether a regex matches a string.
All these functions take a regular expression as their first parameter. This regular expression must be formatted like a Perl regular expression operator. E.g. to test if regex matches the subject case insensitively, you'd use the MySQL code PREG_RLIKE('/regex/i', subject). This is similar to PHP's preg functions, which also require the extra // delimiters for regular expressions inside the PHP string.
If you want something simpler, you could adapt this function to better suit your needs.
CREATE FUNCTION REGEXP_EXTRACT(string TEXT, exp TEXT)
-- Extract the first longest string that matches the regular expression
-- If the string is 'ABCD', check all strings and see what matches: 'ABCD', 'ABC', 'AB', 'A', 'BCD', 'BC', 'B', 'CD', 'C', 'D'
-- It's not smart enough to handle things like (A)|(BCD) correctly in that it will return the whole string, not just the matching token.
RETURNS TEXT
DETERMINISTIC
BEGIN
DECLARE s INT DEFAULT 1;
DECLARE e INT;
DECLARE adjustStart TINYINT DEFAULT 1;
DECLARE adjustEnd TINYINT DEFAULT 1;
-- Because REGEXP matches anywhere in the string, and we only want the part that matches, adjust the expression to add '^' and '$'
-- Of course, if those are already there, don't add them, but change the method of extraction accordingly.
IF LEFT(exp, 1) = '^' THEN
SET adjustStart = 0;
ELSE
SET exp = CONCAT('^', exp);
END IF;
IF RIGHT(exp, 1) = '$' THEN
SET adjustEnd = 0;
ELSE
SET exp = CONCAT(exp, '$');
END IF;
-- Loop through the string, moving the end pointer back towards the start pointer, then advance the start pointer and repeat
-- Bail out of the loops early if the original expression started with '^' or ended with '$', since that means the pointers can't move
WHILE (s <= LENGTH(string)) DO
SET e = LENGTH(string);
WHILE (e >= s) DO
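-- SUBSTRING takes a length as its third argument, so convert the end position e into one
IF SUBSTRING(string, s, e - s + 1) REGEXP exp THEN
RETURN SUBSTRING(string, s, e - s + 1);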
END IF;
IF adjustEnd THEN
SET e = e - 1;
ELSE
SET e = s - 1; -- ugh, such a hack to end it early
END IF;
END WHILE;
IF adjustStart THEN
SET s = s + 1;
ELSE
SET s = LENGTH(string) + 1; -- ugh, such a hack to end it early
END IF;
END WHILE;
RETURN NULL;
END

Before MySQL 8.0 there isn't any syntax for extracting text using regular expressions. You can use REGEXP to identify the rows containing two consecutive digits, but to extract them you have to use the ordinary string manipulation functions, which is very difficult in this case.
Alternatives:
Select the entire value from the database then use a regular expression on the client.
Use a different database that has better support for the SQL standard (may not be an option, I know). Then you can use this: SUBSTRING(originaltext from '%#"[0-9]{2}#"%' for '#').

I think the cleaner way is to use REGEXP_SUBSTR() (available from MySQL 8.0):
This extracts the first two consecutive digits:
SELECT REGEXP_SUBSTR(`originalText`,'[0-9]{2}') AS `twoDigits` FROM `source`;
This extracts the first two consecutive digits that form a number from 20 to 99 (for example, 1112 returns NULL; 1521 returns 52):
SELECT REGEXP_SUBSTR(`originalText`,'[2-9][0-9]') AS `twoDigits` FROM `source`;
I tested both in v8.0 and they work. That's all, good luck!
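To cover the "> 20" part of the question numerically instead of through the pattern, the extracted text can be cast and filtered. This is only a sketch for MySQL 8.0+, reusing the `source` table and `originaltext` column from the question; the aliases `two_digits` and `extracted` are made up for the example.

```sql
-- Extract the first two-digit run per row, cast it to a number, then filter.
SELECT id, two_digits
FROM (
    SELECT `id`,
           CAST(REGEXP_SUBSTR(`originaltext`, '[0-9]{2}') AS UNSIGNED) AS two_digits
    FROM `source`
    WHERE `originaltext` REGEXP '[0-9]{2}'   -- skip rows with no match at all
) AS extracted
WHERE two_digits > 20;
```

Note that this looks only at the first two-digit run in each row, so a value like 1521 yields 15 and is filtered out, whereas the '[2-9][0-9]' pattern would pick 52 from it. Which behaviour is right depends on your data.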

I'm having the same issue, and this is the solution I found (it won't work in all cases):
use LOCATE() to find the beginning and the end of the string you want to match
use MID() to extract the substring in between
keep the REGEXP to match only the rows where you are sure to find a match.
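The three steps above might look like this when the number sits between two known markers. This is a rough sketch: the square brackets, the literal sample string and the alias names are assumptions for illustration, not part of the original question.

```sql
-- LOCATE() finds both markers, MID() takes what lies between them,
-- and REGEXP keeps only rows where both markers surround digits.
SELECT MID(txt,
           LOCATE('[', txt) + 1,
           LOCATE(']', txt) - LOCATE('[', txt) - 1) AS between_brackets
FROM (SELECT 'abc [42] def' AS txt) AS t
WHERE txt REGEXP '\\[[0-9]+\\]';
-- between_brackets: '42'
```

It only works when the markers are predictable, which is why it won't fit every case.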

I use this code as a stored function. It should extract any number made of digits in a single block. It is part of my wider library.
DELIMITER $$
-- 2013.04 michal#glebowski.pl
-- FindNumberInText("ab 234 95 cd", TRUE) => 234
-- FindNumberInText("ab 234 95 cd", FALSE) => 95
DROP FUNCTION IF EXISTS FindNumberInText$$
CREATE FUNCTION FindNumberInText(_input VARCHAR(64), _fromLeft BOOLEAN) RETURNS VARCHAR(32)
BEGIN
DECLARE _r VARCHAR(32) DEFAULT '';
DECLARE _i INTEGER DEFAULT 1;
DECLARE _start INTEGER DEFAULT 0;
DECLARE _IsCharNumeric BOOLEAN;
IF NOT _fromLeft THEN SET _input = REVERSE(_input); END IF;
_loop: REPEAT
SET _IsCharNumeric = LOCATE(MID(_input, _i, 1), "0123456789") > 0;
IF _IsCharNumeric THEN
IF _start = 0 THEN SET _start = _i; END IF;
ELSE
IF _start > 0 THEN LEAVE _loop; END IF;
END IF;
SET _i = _i + 1;
UNTIL _i > length(_input) END REPEAT;
IF _start > 0 THEN
SET _r = MID(_input, _start, _i - _start);
IF NOT _fromLeft THEN SET _r = REVERSE(_r); END IF;
END IF;
RETURN _r;
END$$
DELIMITER ;
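Applied to the original question, a call could look roughly like the following. This is a sketch that assumes the FindNumberInText function above has been created, and it reuses the `source` table from the question; LEFT(..., 64) is there only because the function's parameter is declared as VARCHAR(64).

```sql
-- Return the first block of digits in each row as a number and keep values above 20.
SELECT `id`,
       CAST(FindNumberInText(LEFT(`originaltext`, 64), TRUE) AS UNSIGNED) AS first_number
FROM `source`
WHERE `originaltext` REGEXP '[0-9][0-9]'
  AND CAST(FindNumberInText(LEFT(`originaltext`, 64), TRUE) AS UNSIGNED) > 20;
```

Keep in mind that the function returns the first block of digits of any length, so 'ab 5 cd 42' gives 5 rather than 42, and only the first 64 characters are examined here.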

If you want to return a part of a string:
SELECT id, SUBSTRING(columnName, LOCATE('partOfString', columnName), 10) FROM tableName;
LOCATE() returns the starting position of the matching string, which becomes the starting position for SUBSTRING(). The 10 is the number of characters to return.
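One thing to watch: LOCATE() returns 0 when the search string is absent, and SUBSTRING() with a start position of 0 returns an empty string. If you would rather get NULL for rows without a match, wrapping the position in NULLIF() is one option. A small sketch with made-up sample strings:

```sql
-- NULLIF turns the "not found" position 0 into NULL, so SUBSTRING returns NULL too.
SELECT txt,
       SUBSTRING(txt, NULLIF(LOCATE('needle', txt), 0), 10) AS part
FROM (
    SELECT 'hay needle stack' AS txt
    UNION ALL
    SELECT 'nothing here'
) AS t;
-- 'hay needle stack' -> 'needle sta'
-- 'nothing here'     -> NULL
```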

I know it's been quite a while since this question was asked but came across it and thought it would be a good challenge for my custom regex replacer - see this blog post.
...And the good news is it can, although it needs to be called quite a few times. See this online rextester demo, which shows the workings that got to the SQL below.
SELECT reg_replace(
reg_replace(
reg_replace(
reg_replace(
reg_replace(
reg_replace(
reg_replace(txt,
'[^0-9]+',
',',
TRUE,
1, -- Min match length
0 -- No max match length
),
'([0-9]{3,}|,[0-9],)',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
'^[0-9],',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
',[0-9]$',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
',{2,}',
',',
TRUE,
1, -- Min match length
0 -- No max match length
),
'^,',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
',$',
'',
TRUE,
1, -- Min match length
0 -- No max match length
) AS `csv`
FROM tbl;

Related

Oracle INSTR replacement in MySQL

Requirements: I used to do this with instr() in Oracle, but now I want to achieve the same effect with MySQL functions.
INSTR(A.SOME_THING.B,".",1,2)<>0 --ORACLE
As far as I can tell, that's not difficult for simple cases. But as the number of parameters rises, the MySQL "replacement" for the same Oracle functionality gets worse.
Your code:
instr(some_thing, '.', 1, 2)
means
search through some_thing
for a dot
starting from the first position
and find dot's second occurrence
you can't do that in a simple manner using MySQL, as you'll need a user-defined function. Something like this (source is INSTR Function - Oracle to MySQL Migration; I suggest you have a look at the whole document. I'm posting code here because links might get broken):
DELIMITER //
CREATE FUNCTION INSTR4 (p_str VARCHAR(8000), p_substr VARCHAR(255),
p_start INT, p_occurrence INT)
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE v_found INT DEFAULT p_occurrence;
DECLARE v_pos INT DEFAULT p_start;
lbl:
WHILE 1=1
DO
-- Find the next occurrence
SET v_pos = LOCATE(p_substr, p_str, v_pos);
-- Nothing found
IF v_pos IS NULL OR v_pos = 0 THEN
RETURN v_pos;
END IF;
-- The required occurrence found
IF v_found = 1 THEN
LEAVE lbl;
END IF;
-- Prepare to find another one occurrence
SET v_found = v_found - 1;
SET v_pos = v_pos + 1;
END WHILE;
RETURN v_pos;
END;
//
DELIMITER ;
Use it as
SELECT INSTR4('abcbcb', 'b', 3, 2);
and get 6 as a result.
In Oracle the code
INSTR(column, '.', 1, 2) <> 0 --ORACLE
checks whether the column value contains at least 2 dot characters.
In MySQL this can be replaced with, for example,
LENGTH(column) - LENGTH(REPLACE(column, '.', '')) >= 2
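If you need the actual position of the second dot and not just the yes/no check, SUBSTRING_INDEX() can stand in for the simple "search from position 1" case without a stored function. A sketch, assuming the search starts at the first character; the sample value and alias names are for illustration only:

```sql
-- SUBSTRING_INDEX(s, '.', 2) is everything before the 2nd dot,
-- so its length + 1 is the position of that dot.
-- The CASE guards against values with fewer than 2 dots (returns 0, like INSTR).
SELECT CASE
         WHEN CHAR_LENGTH(s) - CHAR_LENGTH(REPLACE(s, '.', '')) >= 2
         THEN CHAR_LENGTH(SUBSTRING_INDEX(s, '.', 2)) + 1
         ELSE 0
       END AS second_dot_pos
FROM (SELECT 'A.SOME_THING.B' AS s) AS t;
-- second_dot_pos: 13
```

For a start position other than 1 you are back to something like the INSTR4 function.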

How to make uppercase only the odd indexes of a string in MySQL?

I'm trying to make only the odd indexes of a string in uppercase (whereas the even indexes to be in lowercase) in MySQL.
For example: StackOverflow -> StAcKoVeRfLoW or hello -> HeLlO.
I found a way to do this by extracting one letter at a time using the mid function, then concatenating based on which index the letter is at:
SET @x='hello';
SELECT @x as Initial,
Concat(ucase(mid(@x,1,1)),lcase(mid(@x,2,1)),ucase(mid(@x,3,1)),lcase(mid(@x,4,1)),ucase(mid(@x,5,1)))
as Final;
However, I'm interested in whether there is a way to simplify this, since it does not scale to longer strings. So basically, is there a way to modify it to something like:
Concat(ucase(mid(@x,odd index,1)),lcase(mid(@x,even index,1)))?
This is probably most simply done in your application, but can be achieved in MySQL. For MySQL 8+ you can use a recursive CTE to extract the individual letters from the string and GROUP_CONCAT to put them back together, changing the case on an alternating basis:
WITH RECURSIVE INITIAL AS (
SELECT 'StackOverflow' AS x
),
CTE AS (
-- pos records each letter's position so the result can be reassembled in order
SELECT 1 AS pos, 1 AS upper, SUBSTRING(x, 1, 1) AS letter, SUBSTRING(x, 2) AS remainder
FROM INITIAL
UNION ALL
SELECT pos + 1, 1 - upper, SUBSTRING(remainder, 1, 1), SUBSTRING(remainder, 2)
FROM CTE
WHERE LENGTH(remainder) > 0
)
SELECT GROUP_CONCAT(CASE WHEN upper THEN UPPER(letter) ELSE LOWER(letter) END ORDER BY pos SEPARATOR '') AS new
FROM CTE
Output:
StAcKoVeRfLoW
In versions lower than 8, you can use a user-defined function:
DELIMITER //
CREATE FUNCTION AlterCase(initial TEXT)
RETURNS TEXT
DETERMINISTIC
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE l CHAR(1);
DECLARE new TEXT DEFAULT '';
WHILE i <= LENGTH(initial) DO
SET l = SUBSTRING(initial, i, 1);
SET new = CONCAT(new,
CASE WHEN i % 2 = 1 THEN UPPER(l) ELSE LOWER(l) END);
SET i = i + 1;
END WHILE;
RETURN new;
END //
DELIMITER ;
And call it as
SELECT AlterCase('StackOverflow')
Output:
StAcKoVeRfLoW
Note the function will work in MySQL 8+ too.
Demo on dbfiddle

MySQL selecting string with multi special characters

I'm having a problem selecting strings from the database. If a row contains +(123)-4 56-7 and you search with the string 1234567, it doesn't find any results. Any suggestions?
You can use the REPLACE() function to remove special characters in MySQL. I don't know if it's very efficient, but it should work.
There is already another thread on SO which covers a very similar question, see here.
If it is always this kind of pattern you're searching for, and your table is rather large, I advise against REPLACE() or REGEXP, although of course they will do the job if tweaked properly.
Better to add a column with the plain phone numbers, containing no formatting characters at all, or even better, a hash of the phone numbers. This way you can add an index to the new column and search against it. From a database perspective, this is much easier and much faster.
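One way to get such a column without maintaining it by hand is a stored generated column (MySQL 5.7+). This is only a sketch: the `phoneTable` and `phone` names are borrowed from the REPLACE query in this thread, while `phone_digits` and `idx_phone_digits` are made-up names, and the list of stripped characters is an assumption about what your data contains.

```sql
-- Keep a digits-only copy of the phone number and index it.
ALTER TABLE phoneTable
  ADD COLUMN phone_digits VARCHAR(32)
    GENERATED ALWAYS AS (
      REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(phone, '+', ''), '-', ''), '(', ''), ')', ''), ' ', '')
    ) STORED,
  ADD INDEX idx_phone_digits (phone_digits);

-- Exact and prefix lookups can use the index.
SELECT * FROM phoneTable WHERE phone_digits = '1234567';
SELECT * FROM phoneTable WHERE phone_digits LIKE '1234%';
```

A LIKE pattern with a leading wildcard ('%1234567%') still works on the new column, but it generally cannot use the index.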
You can use user-defined functions to get the numeric characters from a string. First a helper, IsNumeric, which the second function calls to test a single character:
CREATE FUNCTION IsNumeric (val varchar(255)) RETURNS tinyint
RETURN val REGEXP '^(-|\\+){0,1}([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$';
CREATE FUNCTION GetNumeric (val VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE idx INT DEFAULT 0;
IF ISNULL(val) THEN RETURN NULL; END IF;
IF LENGTH(val) = 0 THEN RETURN ""; END IF;
SET idx = LENGTH(val);
WHILE idx > 0 DO
IF IsNumeric(SUBSTRING(val,idx,1)) = 0 THEN
SET val = REPLACE(val,SUBSTRING(val,idx,1),"");
SET idx = LENGTH(val)+1;
END IF;
SET idx = idx - 1;
END WHILE;
RETURN val;
END;
Then
SELECT columns FROM table
WHERE GetNumeric(phonenumber) LIKE '%1234567%';
Query using the REPLACE function, stripping each formatting character in turn:
select * from phoneTable where replace(replace(replace(replace(replace(phone, '+', ''), '-', ''), '(', ''), ')', ''), ' ', '') LIKE '%1234567%'
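On MySQL 8.0+ the nested REPLACE calls can probably be collapsed into a single REGEXP_REPLACE() that drops everything except digits. A sketch using the same `phoneTable` and `phone` names as the query above:

```sql
-- Quick check on the value from the question:
SELECT REGEXP_REPLACE('+(123)-4 56-7', '[^0-9]', '') AS digits_only;
-- digits_only: '1234567'

-- The same expression used as a filter:
SELECT *
FROM phoneTable
WHERE REGEXP_REPLACE(phone, '[^0-9]', '') LIKE '%1234567%';
```

Like the REPLACE version, this evaluates the expression for every row, so it will not scale well on a large table.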

MySQL find_in_set with multiple search string

I find that find_in_set only searches by a single string:
find_in_set('a', 'a,b,c,d')
In the above example, 'a' is the only string used for the search.
Is there any way to get find_in_set-like functionality and search by multiple strings, like:
find_in_set('a,b,c', 'a,b,c,d')
In the above example, I want to search by three strings 'a,b,c'.
One way I see is using OR
find_in_set('a', 'a,b,c,d') OR find_in_set('b', 'a,b,c,d') OR find_in_set('c', 'a,b,c,d')
Is there any other way than this?
There is no native function to do it, but you can achieve your aim using the following trick:
WHERE CONCAT(",", `setcolumn`, ",") REGEXP ",(val1|val2|val3),"
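Note that the REGEXP trick matches when any one of the values is present. If you need all of them, chaining FIND_IN_SET() with AND is one option. A small sketch with literal strings standing in for the column; the aliases are made up:

```sql
SELECT
  -- 1 if at least one of a, x, y is an element of the list
  CONCAT(',', 'a,b,c,d', ',') REGEXP ',(a|x|y),' AS has_any,
  -- 1 only if both a and x are elements of the list
  (FIND_IN_SET('a', 'a,b,c,d') > 0 AND FIND_IN_SET('x', 'a,b,c,d') > 0) AS has_all;
-- has_any: 1, has_all: 0
```

If the search values can contain regex metacharacters such as . or |, they would need escaping before being placed in the pattern.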
The MySQL function find_in_set() can search for only one string in a set of strings.
The first argument is a single string, so the function will not split a comma-separated value into several search strings (SET elements cannot contain commas at all). The second argument is a SET, which is represented as a comma-separated string. That is why find_in_set('a,b,c', 'a,b,c,d') runs without error but can never find anything: by definition no SET element equals 'a,b,c', because it contains commas.
You can also use this custom function
CREATE FUNCTION SPLIT_STR(
x VARCHAR(255),
delim VARCHAR(12),
pos INT
)
RETURNS VARCHAR(255)
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
delim, '');
DELIMITER $$
CREATE FUNCTION `FIND_SET_EQUALS`(`s1` VARCHAR(200), `s2` VARCHAR(200))
RETURNS TINYINT(1)
LANGUAGE SQL
BEGIN
DECLARE a INT Default 0 ;
DECLARE isEquals TINYINT(1) Default 0 ;
DECLARE str VARCHAR(255);
IF s1 IS NOT NULL AND s2 IS NOT NULL THEN
simple_loop: LOOP
SET a=a+1;
SET str= SPLIT_STR(s2,",",a);
IF str='' THEN
LEAVE simple_loop;
END IF;
#Do check is in set
IF FIND_IN_SET(str, s1)=0 THEN
SET isEquals=0;
LEAVE simple_loop;
END IF;
SET isEquals=1;
END LOOP simple_loop;
END IF;
RETURN isEquals;
END;
$$
DELIMITER ;
SELECT FIND_SET_EQUALS('a,c,b', 'a,b,c'); -- returns 1
SELECT FIND_SET_EQUALS('a,c', 'a,b,c'); -- returns 0
SELECT FIND_SET_EQUALS(null, 'a,b,c'); -- returns 0
Wow, I'm surprised no one has mentioned this here. In a nutshell, if you know the order of your members, you can query with a single bitwise operation.
SELECT * FROM example_table WHERE (example_set & mbits) = mbits;
Explanation:
Suppose we have a set whose members are defined in this order: "HTML", "CSS", "PHP", "JS"... etc.
This is how MySQL interprets them:
"HTML" = 0001 = 1
"CSS" = 0010 = 2
"PHP" = 0100 = 4
"JS" = 1000 = 8
So for example, if you want to query all rows that have "HTML" and "CSS" in their sets, then you'll write
SELECT * FROM example_table WHERE (example_set & 3) = 3;
Because 0011 is 3 which is both 0001 "HTML" and 0010 "CSS".
Your sets can still be queried using the other methods like REGEXP , LIKE, FIND_IN_SET(), and so on. Use whatever you need.
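To see the bit values in practice, here is a small self-contained sketch. The table definition and sample rows are assumptions made up for the example; only the `example_table` and `example_set` names come from the answer above.

```sql
CREATE TABLE example_table (
  id INT PRIMARY KEY,
  example_set SET('HTML', 'CSS', 'PHP', 'JS')
);

INSERT INTO example_table VALUES
  (1, 'HTML,CSS'),       -- bits 0011 = 3
  (2, 'HTML,JS'),        -- bits 1001 = 9
  (3, 'HTML,CSS,PHP');   -- bits 0111 = 7

-- Rows whose set contains both HTML (1) and CSS (2); adding 0 exposes the numeric value.
SELECT id, example_set, example_set + 0 AS bits
FROM example_table
WHERE (example_set & 3) = 3;
-- returns ids 1 and 3
```

The mask depends on the order in the column definition, so it has to be recalculated if members are ever reordered.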
Amazing answer by @Pavel Perminov, and a nice comment by @doru about checking dynamically.
Building on those, this is the condition I use from PHP: CONCAT(',','" . $country_lang_id . "', ',') REGEXP CONCAT(',(', REPLACE(YourColumnName, ',', '|'), '),'). The query below may be useful for someone who is looking for ready-made PHP code.
$country_lang_id = "1,2";
$sql = "select a.* from tablename a where CONCAT(',','" . $country_lang_id . "', ',') REGEXP CONCAT(',(', REPLACE(a.country_lang_id, ',', '|'), '),') ";
You can also use the LIKE operator, for instance:
where setcolumn like '%a,b%'
or
where 'a,b,c,d' like '%b,c%'
which might work in some situations.
You can use IN to match a column that holds a single value against several values:
SELECT * FROM tableName WHERE myvals IN ('a','b','c','d')