Comparison on numeric part of string in MySQL - mysql

I have a field that contains version information such as:
V12.0
V1.0
BE0.50
VV24
I want to query for version greater then n. The number I want to do the comparison on is preceded by a non-fixed number of characters. Is it possible to do something like:
SELECT version FROM table WHERE int_part(version) > 10
V12.0
VV24

In this case it seems that version number appears in end only, alphabets are in beginning. So, probable solution will be to reverse the string, type cast it to get numbers, then reverse back the number obtained.
Try following solution
SELECT version
FROM table
HAVING CAST(REVERSE(IF(LOCATE(".", version), (CAST(REVERSE(version) AS DECIMAL(4,2))), (CAST(REVERSE(version) AS UNSIGNED INTEGER)))) AS DECIMAL(4,2)) > 12;
Hope it helps...

Seem like you have to create you own function to do this:
CREATE FUNCTION IsNumeric (val varchar(255)) RETURNS tinyint
RETURN val REGEXP '^(-|\\+){0,1}([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$';
CREATE FUNCTION NumericOnly (val VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE idx INT DEFAULT 0;
IF ISNULL(val) THEN RETURN NULL; END IF;
IF LENGTH(val) = 0 THEN RETURN ""; END IF;
SET idx = LENGTH(val);
WHILE idx > 0 DO
IF IsNumeric(SUBSTRING(val,idx,1)) = 0 THEN
SET val = REPLACE(val,SUBSTRING(val,idx,1),"");
SET idx = LENGTH(val)+1;
END IF;
SET idx = idx - 1;
END WHILE;
RETURN val;
END;
select NumericOnly('vv24');
+---------------------+
| NumericOnly('vv24') |
+---------------------+
| 24 |
+---------------------+

Big downside of MySQL not supporting Regular Expressions in REPLACE method. If I have to do any matching of sorts, I usually pull all data to my PHP and process it there. In long run it is better solution. Of course you could always use User Defined Functions if you prefer.
http://dev.mysql.com/doc/refman/5.0/en/adding-udf.html
#shubhansh
Would 24 not change to 0.24 and failed > 12 ?

Related

mysql SELECT part of string with Regex (find and extract number) [duplicate]

I have a MySQL database and I have a query as:
SELECT `id`, `originaltext` FROM `source` WHERE `originaltext` regexp '[0-9][0-9]'
This detects all originaltexts which have numbers with 2 digits in it.
I need MySQL to return those numbers as a field, so i can manipulate them further.
Ideally, if I can add additional criteria that is should be > 20 would be great, but i can do that separately as well.
If you want more regular expression power in your database, you can consider using LIB_MYSQLUDF_PREG. This is an open source library of MySQL user functions that imports the PCRE library. LIB_MYSQLUDF_PREG is delivered in source code form only. To use it, you'll need to be able to compile it and install it into your MySQL server. Installing this library does not change MySQL's built-in regex support in any way. It merely makes the following additional functions available:
PREG_CAPTURE extracts a regex match from a string. PREG_POSITION returns the position at which a regular expression matches a string. PREG_REPLACE performs a search-and-replace on a string. PREG_RLIKE tests whether a regex matches a string.
All these functions take a regular expression as their first parameter. This regular expression must be formatted like a Perl regular expression operator. E.g. to test if regex matches the subject case insensitively, you'd use the MySQL code PREG_RLIKE('/regex/i', subject). This is similar to PHP's preg functions, which also require the extra // delimiters for regular expressions inside the PHP string.
If you want something more simpler, you could alter this function to suit better your needs.
CREATE FUNCTION REGEXP_EXTRACT(string TEXT, exp TEXT)
-- Extract the first longest string that matches the regular expression
-- If the string is 'ABCD', check all strings and see what matches: 'ABCD', 'ABC', 'AB', 'A', 'BCD', 'BC', 'B', 'CD', 'C', 'D'
-- It's not smart enough to handle things like (A)|(BCD) correctly in that it will return the whole string, not just the matching token.
RETURNS TEXT
DETERMINISTIC
BEGIN
DECLARE s INT DEFAULT 1;
DECLARE e INT;
DECLARE adjustStart TINYINT DEFAULT 1;
DECLARE adjustEnd TINYINT DEFAULT 1;
-- Because REGEXP matches anywhere in the string, and we only want the part that matches, adjust the expression to add '^' and '$'
-- Of course, if those are already there, don't add them, but change the method of extraction accordingly.
IF LEFT(exp, 1) = '^' THEN
SET adjustStart = 0;
ELSE
SET exp = CONCAT('^', exp);
END IF;
IF RIGHT(exp, 1) = '$' THEN
SET adjustEnd = 0;
ELSE
SET exp = CONCAT(exp, '$');
END IF;
-- Loop through the string, moving the end pointer back towards the start pointer, then advance the start pointer and repeat
-- Bail out of the loops early if the original expression started with '^' or ended with '$', since that means the pointers can't move
WHILE (s <= LENGTH(string)) DO
SET e = LENGTH(string);
WHILE (e >= s) DO
IF SUBSTRING(string, s, e) REGEXP exp THEN
RETURN SUBSTRING(string, s, e);
END IF;
IF adjustEnd THEN
SET e = e - 1;
ELSE
SET e = s - 1; -- ugh, such a hack to end it early
END IF;
END WHILE;
IF adjustStart THEN
SET s = s + 1;
ELSE
SET s = LENGTH(string) + 1; -- ugh, such a hack to end it early
END IF;
END WHILE;
RETURN NULL;
END
There isn't any syntax in MySQL for extracting text using regular expressions. You can use the REGEXP to identify the rows containing two consecutive digits, but to extract them you have to use the ordinary string manipulation functions which is very difficult in this case.
Alternatives:
Select the entire value from the database then use a regular expression on the client.
Use a different database that has better support for the SQL standard (may not be an option, I know). Then you can use this: SUBSTRING(originaltext from '%#[0-9]{2}#%' for '#').
I think the cleaner way is using REGEXP_SUBSTR():
This extracts exactly two any digits:
SELECT REGEXP_SUBSTR(`originalText`,'[0-9]{2}') AS `twoDigits` FROM `source`;
This extracts exactly two digits, but from 20-99 (example: 1112 return null; 1521 returns 52):
SELECT REGEXP_SUBSTR(`originalText`,'[2-9][0-9]') AS `twoDigits` FROM `source`;
I test both in v8.0 and they work. That's all, good luck!
I'm having the same issue, and this is the solution I found (but it won't work in all cases) :
use LOCATE() to find the beginning and the end of the string you wan't to match
use MID() to extract the substring in between...
keep the regexp to match only the rows where you are sure to find a match.
I used my code as a Stored Procedure (Function), shall work to extract any number built from digits in a single block. This is a part of my wider library.
DELIMITER $$
-- 2013.04 michal#glebowski.pl
-- FindNumberInText("ab 234 95 cd", TRUE) => 234
-- FindNumberInText("ab 234 95 cd", FALSE) => 95
DROP FUNCTION IF EXISTS FindNumberInText$$
CREATE FUNCTION FindNumberInText(_input VARCHAR(64), _fromLeft BOOLEAN) RETURNS VARCHAR(32)
BEGIN
DECLARE _r VARCHAR(32) DEFAULT '';
DECLARE _i INTEGER DEFAULT 1;
DECLARE _start INTEGER DEFAULT 0;
DECLARE _IsCharNumeric BOOLEAN;
IF NOT _fromLeft THEN SET _input = REVERSE(_input); END IF;
_loop: REPEAT
SET _IsCharNumeric = LOCATE(MID(_input, _i, 1), "0123456789") > 0;
IF _IsCharNumeric THEN
IF _start = 0 THEN SET _start = _i; END IF;
ELSE
IF _start > 0 THEN LEAVE _loop; END IF;
END IF;
SET _i = _i + 1;
UNTIL _i > length(_input) END REPEAT;
IF _start > 0 THEN
SET _r = MID(_input, _start, _i - _start);
IF NOT _fromLeft THEN SET _r = REVERSE(_r); END IF;
END IF;
RETURN _r;
END$$
If you want to return a part of a string :
SELECT id , substring(columnName,(locate('partOfString',columnName)),10) from tableName;
Locate() will return the starting postion of the matching string which becomes starting position of Function Substring()
I know it's been quite a while since this question was asked but came across it and thought it would be a good challenge for my custom regex replacer - see this blog post.
...And the good news is it can, although it needs to be called quite a few times. See this online rextester demo, which shows the workings that got to the SQL below.
SELECT reg_replace(
reg_replace(
reg_replace(
reg_replace(
reg_replace(
reg_replace(
reg_replace(txt,
'[^0-9]+',
',',
TRUE,
1, -- Min match length
0 -- No max match length
),
'([0-9]{3,}|,[0-9],)',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
'^[0-9],',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
',[0-9]$',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
',{2,}',
',',
TRUE,
1, -- Min match length
0 -- No max match length
),
'^,',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
',$',
'',
TRUE,
1, -- Min match length
0 -- No max match length
) AS `csv`
FROM tbl;

LIKE function for htmlspecialchars encoded row

i have summernote text editor and doing htmlspecialchars($VALUE) before insert to db, then htmlspecialchars_decode($VALUE) after getting to maintain text editor's changes...
But i also need to do search function for this row, so how to use LIKE function when row (VARCHAR) is encoded with htmlspecialchars()? is there any SQL function to strip those tags while selecting to perform LIKE function?
P.S
I'm using PDO and my query looks like:
$this->db->prepare("SELECT * FROM t_tasks WHERE a_text LIKE '%$value%'");
and here, t_text looks like
<p><i style="background-color: rgb(255, 255, 0);">cdsfsadfasdf</i></p>
Noisy Disclaimer: the following works as designed, and addresses the question being asked, but that does not make it a good idea or an example of best practice.
Quite the contrary, I would suggest. Sargability is completely defeated, and as you can see from reviewing the code, I have to go through some needless juggling and gyrations, because SQL is simply not the right tool for this job.
But, I wrote this when I needed it for an environment where I had no option but to work with data that was stored with encoded HTML entities -- and for legacy reasons could not be changed.
It's a MySQL stored function that converts entities to their utf8-encoded equivalent character. For example:
mysql> SELECT decode_entities('I ♥ doing “clever” things.') AS decoded_string;
+----------------------------------+
| decoded_string |
+----------------------------------+
| I ♥ doing “clever” things. |
+----------------------------------+
1 row in set (0.00 sec)
So, for your query, if we wanted to test whether this...
<p><i style="background-color: rgb(255, 255, 0);">cdsfsadfasdf</i></p>
...is LIKE '<p><i style=%'...
mysql> SELECT decode_entities('<p><i style="background-color: rgb(255, 255, 0);">cdsfsadfasdf</i></p>') LIKE '<p><i style=%' AS this_matches;
+--------------+
| this_matches |
+--------------+
| 1 |
+--------------+
1 row in set (0.00 sec)
...we find that it is.
After defining the function, you'd use...
$stmt = $this->db->prepare("SELECT t.*, decoded_entities(t.a_text) AS a_text_decoded FROM t_tasks t WHERE decode_entities(t.a_text) LIKE CONCAT('%', :value, '%'));
Here's the function:
DELIMITER $$
DROP FUNCTION IF EXISTS `decode_entities` $$
CREATE FUNCTION `decode_entities`(str LONGTEXT charset utf8) RETURNS longtext CHARSET utf8
NO SQL
DETERMINISTIC
BEGIN
-- decode HTML entities in database strings.
-- this processing is somewhat intensive due to the fact that this is clearly not something the database is
-- necessarily optimal place to accomlish; because of this, the function is optimized to quickly return strings that can't possibly contain entities
-- otherwise, we walk the string, looking for & ... ; then checking the matched inner contents for numeric (&#nnn;) and hex (?) literals,
-- failing that, we search for a named entity in the static string; if we end up with a decimal value, we utf-8 encode that value and replace
-- the entity, in place, in the string, with the utf-8 character; then advance our character pointer by one and then try again.
-- if we can't successfully decipher something that looks like an entity, we leave it as it was
-- the ordering of the values in the "entities' blob (entities are case sensitive) is something of a performance consideration; it may be desirable
-- that the most likely encountered entities in a given application be placed first in the blob, because there is a performance difference
-- of perhaps 30 usec (on a 1 GHz Opteron) when matching the first one compared to matching the last one
-- copy/pasted from https://stackoverflow.com/a/49498332/1695906
IF str IS NULL OR str NOT LIKE '%&%;%' THEN
RETURN str;
END IF;
BEGIN
DECLARE entities BLOB DEFAULT 'AElig,198,Aacute,193,Acirc,194,Agrave,192,Alpha,913,Aring,197,Atilde,195,Auml,196,Beta,914,Ccedil,199,Chi,935,Dagger,8225,Delta,916,ETH,208,Eacute,201,Ecirc,202,Egrave,200,Epsilon,917,Eta,919,Euml,203,Gamma,915,Iacute,205,Icirc,206,Igrave,204,Iota,921,Iuml,207,Kappa,922,Lambda,923,Mu,924,Ntilde,209,Nu,925,OElig,338,Oacute,211,Ocirc,212,Ograve,210,Omega,937,Omicron,927,Oslash,216,Otilde,213,Ouml,214,Phi,934,Pi,928,Prime,8243,Psi,936,Rho,929,Scaron,352,Sigma,931,THORN,222,Tau,932,Theta,920,Uacute,218,Ucirc,219,Ugrave,217,Upsilon,933,Uuml,220,Xi,926,Yacute,221,Yuml,376,Zeta,918,aacute,225,acirc,226,acute,180,aelig,230,agrave,224,alefsym,8501,alpha,945,amp,38,and,8743,ang,8736,apos,39,aring,229,asymp,8776,atilde,227,auml,228,bdquo,8222,beta,946,brvbar,166,bull,8226,cap,8745,ccedil,231,cedil,184,cent,162,chi,967,circ,710,clubs,9827,cong,8773,copy,169,crarr,8629,cup,8746,curren,164,dArr,8659,dagger,8224,darr,8595,deg,176,delta,948,diams,9830,divide,247,eacute,233,ecirc,234,egrave,232,empty,8709,emsp,8195,ensp,8194,epsilon,949,equiv,8801,eta,951,eth,240,euml,235,euro,8364,exist,8707,fnof,402,forall,8704,frac12,189,frac14,188,frac34,190,frasl,8260,gamma,947,ge,8805,gt,62,hArr,8660,harr,8596,hearts,9829,hellip,8230,iacute,237,icirc,238,iexcl,161,igrave,236,image,8465,infin,8734,int,8747,iota,953,iquest,191,isin,8712,iuml,239,kappa,954,lArr,8656,lambda,955,lang,9001,laquo,171,larr,8592,lceil,8968,ldquo,8220,le,8804,lfloor,8970,lowast,8727,loz,9674,lrm,8206,lsaquo,8249,lsquo,8216,lt,60,macr,175,mdash,8212,micro,181,middot,183,minus,8722,mu,956,nabla,8711,nbsp,160,ndash,8211,ne,8800,ni,8715,not,172,notin,8713,nsub,8836,ntilde,241,nu,957,oacute,243,ocirc,244,oelig,339,ograve,242,oline,8254,omega,969,omicron,959,oplus,8853,or,8744,ordf,170,ordm,186,oslash,248,otilde,245,otimes,8855,ouml,246,para,182,part,8706,permil,8240,perp,8869,phi,966,pi,960,piv,982,plusmn,177,pound,163,prime,8242,prod,8719,prop,8733,psi,968,quot,34,rArr,8658,radic,8730,rang,9002,raquo,187,rarr,8594,rceil,8969,rdquo,8221,real,8476,reg,174,rfloor,8971,rho,961,rlm,8207,rsaquo,8250,rsquo,8217,sbquo,8218,scaron,353,sdot,8901,sect,167,shy,173,sigma,963,sigmaf,962,sim,8764,spades,9824,sub,8834,sube,8838,sum,8721,sup1,185,sup2,178,sup3,179,sup,8835,supe,8839,szlig,223,tau,964,there4,8756,theta,952,thetasym,977,thinsp,8201,thorn,254,tilde,732,times,215,trade,8482,uArr,8657,uacute,250,uarr,8593,ucirc,251,ugrave,249,uml,168,upsih,978,upsilon,965,uuml,252,weierp,8472,xi,958,yacute,253,yen,165,yuml,255,zeta,950,zwj,8205,zwnj,8204';
DECLARE len BIGINT UNSIGNED DEFAULT LENGTH(str);
DECLARE ptr BIGINT UNSIGNED DEFAULT 0;
DECLARE nxtamp BIGINT UNSIGNED DEFAULT NULL;
DECLARE nxtsem BIGINT UNSIGNED DEFAULT NULL;
DECLARE sbstr LONGTEXT DEFAULT NULL;
DECLARE decval SMALLINT UNSIGNED DEFAULT NULL;
DECLARE setpos SMALLINT UNSIGNED DEFAULT NULL;
DECLARE uenc TINYTEXT DEFAULT NULL;
walk:
LOOP
SET ptr = ptr + 1;
IF ptr >= len THEN
LEAVE walk;
END IF;
SET nxtamp = LOCATE('&',str,ptr);
IF NOT nxtamp THEN
LEAVE walk;
END IF;
SET nxtsem = LOCATE(';',str,ptr + 1);
IF NOT nxtsem THEN
LEAVE walk;
END IF;
IF nxtsem < nxtamp THEN
ITERATE walk;
END IF;
SET sbstr = SUBSTRING(str FROM nxtamp +1 FOR nxtsem - nxtamp - 1);
IF sbstr RLIKE '^#[0-9]+$' THEN
SET decval = TRIM(LEADING '#' FROM sbstr);
ELSEIF sbstr RLIKE '^#x[0-9a-f]+$' THEN
SET decval = CONV(TRIM(LEADING '#x' FROM sbstr),16,10);
ELSE
SET setpos = FIND_IN_SET(sbstr,entities);
IF setpos > 0 THEN
SET decval = SUBSTRING_INDEX(SUBSTRING_INDEX(entities,',',setpos + 1),',',-1);
ELSE
ITERATE walk;
END IF;
END IF;
IF (decval > 0) THEN
SET uenc = CHAR(CASE
WHEN decval <= 0x7F THEN decval
WHEN decval <= 0x7FF THEN 0xC080 | ((decval >> 6) << 8) | (decval & 0x3F)
WHEN decval <= 0xFFFF THEN 0xE08080 | (((decval >> 12) & 0x0F ) << 16) | (((decval >> 6) & 0x3F ) << 8) | (decval & 0x3F)
ELSE NULL END);
IF uenc IS NOT NULL AND LENGTH(uenc) > 0 THEN
SET str = INSERT(str, nxtamp, 1 + nxtsem - nxtamp, uenc);
END IF;
END IF;
END LOOP;
RETURN str;
END;
END $$
DELIMITER ;
(n.b. these things are not called "tags" -- they are "HTML entities.")
use binding param eg:
$stmt = $this->db->prepare("SELECT * FROM t_tasks WHERE a_text LIKE CONCAT('%', :value, '%'));
$stmt->bindParam(':value', $value);

MySQL selecting string with multi special characters

I'm having a problem selecting strings from database. The problem is if you have +(123)-4 56-7 in row and if you are searching with a string 1234567 it wouldn't find any results. Any suggestions?
You can use the REPLACE() method to remove special characters in mysql, don't know if it's very efficient though. But it should work.
There is already another thread in SO which covers a very similar question, see here.
If it is always this kind of pattern you're searching, and your table is rather large, I advice against REPLACE() or REGEX() - which ofc will do the job if tweaked properly.
Better add a column with the plain phone numbers, which doesn't contain any formatting character data at all - or even better, a hash of the phone numbers. This way, you could add an index to the new column and search against this. From a database perspective, this is much easier, and much faster.
You can use User Defined Function to get Numeric values from string.
CREATE FUNCTION GetNumeric (val varchar(255)) RETURNS tinyint
RETURN val REGEXP '^(-|\\+){0,1}([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$';
CREATE FUNCTION GetNumeric (val VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE idx INT DEFAULT 0;
IF ISNULL(val) THEN RETURN NULL; END IF;
IF LENGTH(val) = 0 THEN RETURN ""; END IF;
SET idx = LENGTH(val);
WHILE idx > 0 DO
IF IsNumeric(SUBSTRING(val,idx,1)) = 0 THEN
SET val = REPLACE(val,SUBSTRING(val,idx,1),"");
SET idx = LENGTH(val)+1;
END IF;
SET idx = idx - 1;
END WHILE;
RETURN val;
END;
Then
Select columns from table
where GetNumeric(phonenumber) like %1234567%;
Query using replace function as -
select * from phoneTable where replace(replace(replace(phone, '+', ''), '-', ''), ')', '(') LIKE '%123%'

mysql get ascii code dump for string

In MySQL, is there a way in a simple SELECT to obtain a sequence of ASCII code/code points for each character in a varchar value? I'm more familiar with Oracle, which has the DUMP function that can be used for this.
For example, select some_function('abcd') would return something like 96,97,98,99?
This is about the closest equivalent I'm aware of in MySQL:
mysql> select hex('abcd');
+-------------+
| hex('abcd') |
+-------------+
| 61626364 |
+-------------+
1 row in set (0.00 sec)
I don't know of a mysql_function that will do that, but you can take the string in php, convert to an array, then take the ordinal of the character.
$char_value_array = {};
foreach($mysql_fetched_str as $char)
array_push($char_value_array, ord($char)
You can create a function like this:
CREATE FUNCTION dump (s CHAR(20)) RETURNS CHAR(50) DETERMINISTIC
BEGIN
DECLARE result CHAR(50);
DECLARE i INT;
DECLARE l INT;
SET result = ASCII(SUBSTRING(s,1,1));
SET l = LENGTH(s);
SET i = 2;
WHILE (i <= l) DO
SET result = CONCAT(result, ',', ASCII(SUBSTRING(s,i,1)));
SET i = i + 1;
END WHILE;
RETURN result;
END;
And then use it in the SELECT:
SELECT dump('abcd') FROM test LIMIT 1
Increase CHAR(20) and CHAR(50) definitions if you need to use it with longer strings.

How to extract two consecutive digits from a text field in MySQL?

I have a MySQL database and I have a query as:
SELECT `id`, `originaltext` FROM `source` WHERE `originaltext` regexp '[0-9][0-9]'
This detects all originaltexts which have numbers with 2 digits in it.
I need MySQL to return those numbers as a field, so i can manipulate them further.
Ideally, if I can add additional criteria that is should be > 20 would be great, but i can do that separately as well.
If you want more regular expression power in your database, you can consider using LIB_MYSQLUDF_PREG. This is an open source library of MySQL user functions that imports the PCRE library. LIB_MYSQLUDF_PREG is delivered in source code form only. To use it, you'll need to be able to compile it and install it into your MySQL server. Installing this library does not change MySQL's built-in regex support in any way. It merely makes the following additional functions available:
PREG_CAPTURE extracts a regex match from a string. PREG_POSITION returns the position at which a regular expression matches a string. PREG_REPLACE performs a search-and-replace on a string. PREG_RLIKE tests whether a regex matches a string.
All these functions take a regular expression as their first parameter. This regular expression must be formatted like a Perl regular expression operator. E.g. to test if regex matches the subject case insensitively, you'd use the MySQL code PREG_RLIKE('/regex/i', subject). This is similar to PHP's preg functions, which also require the extra // delimiters for regular expressions inside the PHP string.
If you want something more simpler, you could alter this function to suit better your needs.
CREATE FUNCTION REGEXP_EXTRACT(string TEXT, exp TEXT)
-- Extract the first longest string that matches the regular expression
-- If the string is 'ABCD', check all strings and see what matches: 'ABCD', 'ABC', 'AB', 'A', 'BCD', 'BC', 'B', 'CD', 'C', 'D'
-- It's not smart enough to handle things like (A)|(BCD) correctly in that it will return the whole string, not just the matching token.
RETURNS TEXT
DETERMINISTIC
BEGIN
DECLARE s INT DEFAULT 1;
DECLARE e INT;
DECLARE adjustStart TINYINT DEFAULT 1;
DECLARE adjustEnd TINYINT DEFAULT 1;
-- Because REGEXP matches anywhere in the string, and we only want the part that matches, adjust the expression to add '^' and '$'
-- Of course, if those are already there, don't add them, but change the method of extraction accordingly.
IF LEFT(exp, 1) = '^' THEN
SET adjustStart = 0;
ELSE
SET exp = CONCAT('^', exp);
END IF;
IF RIGHT(exp, 1) = '$' THEN
SET adjustEnd = 0;
ELSE
SET exp = CONCAT(exp, '$');
END IF;
-- Loop through the string, moving the end pointer back towards the start pointer, then advance the start pointer and repeat
-- Bail out of the loops early if the original expression started with '^' or ended with '$', since that means the pointers can't move
WHILE (s <= LENGTH(string)) DO
SET e = LENGTH(string);
WHILE (e >= s) DO
IF SUBSTRING(string, s, e) REGEXP exp THEN
RETURN SUBSTRING(string, s, e);
END IF;
IF adjustEnd THEN
SET e = e - 1;
ELSE
SET e = s - 1; -- ugh, such a hack to end it early
END IF;
END WHILE;
IF adjustStart THEN
SET s = s + 1;
ELSE
SET s = LENGTH(string) + 1; -- ugh, such a hack to end it early
END IF;
END WHILE;
RETURN NULL;
END
There isn't any syntax in MySQL for extracting text using regular expressions. You can use the REGEXP to identify the rows containing two consecutive digits, but to extract them you have to use the ordinary string manipulation functions which is very difficult in this case.
Alternatives:
Select the entire value from the database then use a regular expression on the client.
Use a different database that has better support for the SQL standard (may not be an option, I know). Then you can use this: SUBSTRING(originaltext from '%#[0-9]{2}#%' for '#').
I think the cleaner way is using REGEXP_SUBSTR():
This extracts exactly two any digits:
SELECT REGEXP_SUBSTR(`originalText`,'[0-9]{2}') AS `twoDigits` FROM `source`;
This extracts exactly two digits, but from 20-99 (example: 1112 return null; 1521 returns 52):
SELECT REGEXP_SUBSTR(`originalText`,'[2-9][0-9]') AS `twoDigits` FROM `source`;
I test both in v8.0 and they work. That's all, good luck!
I'm having the same issue, and this is the solution I found (but it won't work in all cases) :
use LOCATE() to find the beginning and the end of the string you wan't to match
use MID() to extract the substring in between...
keep the regexp to match only the rows where you are sure to find a match.
I used my code as a Stored Procedure (Function), shall work to extract any number built from digits in a single block. This is a part of my wider library.
DELIMITER $$
-- 2013.04 michal#glebowski.pl
-- FindNumberInText("ab 234 95 cd", TRUE) => 234
-- FindNumberInText("ab 234 95 cd", FALSE) => 95
DROP FUNCTION IF EXISTS FindNumberInText$$
CREATE FUNCTION FindNumberInText(_input VARCHAR(64), _fromLeft BOOLEAN) RETURNS VARCHAR(32)
BEGIN
DECLARE _r VARCHAR(32) DEFAULT '';
DECLARE _i INTEGER DEFAULT 1;
DECLARE _start INTEGER DEFAULT 0;
DECLARE _IsCharNumeric BOOLEAN;
IF NOT _fromLeft THEN SET _input = REVERSE(_input); END IF;
_loop: REPEAT
SET _IsCharNumeric = LOCATE(MID(_input, _i, 1), "0123456789") > 0;
IF _IsCharNumeric THEN
IF _start = 0 THEN SET _start = _i; END IF;
ELSE
IF _start > 0 THEN LEAVE _loop; END IF;
END IF;
SET _i = _i + 1;
UNTIL _i > length(_input) END REPEAT;
IF _start > 0 THEN
SET _r = MID(_input, _start, _i - _start);
IF NOT _fromLeft THEN SET _r = REVERSE(_r); END IF;
END IF;
RETURN _r;
END$$
If you want to return a part of a string :
SELECT id , substring(columnName,(locate('partOfString',columnName)),10) from tableName;
Locate() will return the starting postion of the matching string which becomes starting position of Function Substring()
I know it's been quite a while since this question was asked but came across it and thought it would be a good challenge for my custom regex replacer - see this blog post.
...And the good news is it can, although it needs to be called quite a few times. See this online rextester demo, which shows the workings that got to the SQL below.
SELECT reg_replace(
reg_replace(
reg_replace(
reg_replace(
reg_replace(
reg_replace(
reg_replace(txt,
'[^0-9]+',
',',
TRUE,
1, -- Min match length
0 -- No max match length
),
'([0-9]{3,}|,[0-9],)',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
'^[0-9],',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
',[0-9]$',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
',{2,}',
',',
TRUE,
1, -- Min match length
0 -- No max match length
),
'^,',
'',
TRUE,
1, -- Min match length
0 -- No max match length
),
',$',
'',
TRUE,
1, -- Min match length
0 -- No max match length
) AS `csv`
FROM tbl;