LIKE function for htmlspecialchars encoded row - mysql

I have a Summernote text editor and call htmlspecialchars($VALUE) before inserting into the DB, then htmlspecialchars_decode($VALUE) after reading it back, to preserve the editor's formatting.
But I also need a search function for this column, so how do I use LIKE when the column (VARCHAR) is encoded with htmlspecialchars()? Is there any SQL function to strip those tags while selecting, so that LIKE can be performed?
P.S
I'm using PDO and my query looks like:
$this->db->prepare("SELECT * FROM t_tasks WHERE a_text LIKE '%$value%'");
and here, a_text looks like
&lt;p&gt;&lt;i style=&quot;background-color: rgb(255, 255, 0);&quot;&gt;cdsfsadfasdf&lt;/i&gt;&lt;/p&gt;

Noisy Disclaimer: the following works as designed, and addresses the question being asked, but that does not make it a good idea or an example of best practice.
Quite the contrary, I would suggest. Sargability is completely defeated, and as you can see from reviewing the code, I have to go through some needless juggling and gyrations, because SQL is simply not the right tool for this job.
But, I wrote this when I needed it for an environment where I had no option but to work with data that was stored with encoded HTML entities -- and for legacy reasons could not be changed.
It's a MySQL stored function that converts entities to their utf8-encoded equivalent character. For example:
mysql> SELECT decode_entities('I &hearts; doing &ldquo;clever&rdquo; things.') AS decoded_string;
+----------------------------------+
| decoded_string |
+----------------------------------+
| I ♥ doing “clever” things. |
+----------------------------------+
1 row in set (0.00 sec)
So, for your query, if we wanted to test whether this...
&lt;p&gt;&lt;i style=&quot;background-color: rgb(255, 255, 0);&quot;&gt;cdsfsadfasdf&lt;/i&gt;&lt;/p&gt;
...is LIKE '<p><i style=%'...
mysql> SELECT decode_entities('&lt;p&gt;&lt;i style=&quot;background-color: rgb(255, 255, 0);&quot;&gt;cdsfsadfasdf&lt;/i&gt;&lt;/p&gt;') LIKE '<p><i style=%' AS this_matches;
+--------------+
| this_matches |
+--------------+
| 1 |
+--------------+
1 row in set (0.00 sec)
...we find that it is.
After defining the function, you'd use...
$stmt = $this->db->prepare("SELECT t.*, decode_entities(t.a_text) AS a_text_decoded FROM t_tasks t WHERE decode_entities(t.a_text) LIKE CONCAT('%', :value, '%')");
Here's the function:
DELIMITER $$
DROP FUNCTION IF EXISTS `decode_entities` $$
CREATE FUNCTION `decode_entities`(str LONGTEXT charset utf8) RETURNS longtext CHARSET utf8
NO SQL
DETERMINISTIC
BEGIN
-- decode HTML entities in database strings.
-- this processing is somewhat intensive because the database is clearly not the
-- optimal place to accomplish it; because of this, the function is optimized to quickly return strings that can't possibly contain entities
-- otherwise, we walk the string, looking for & ... ; then checking the matched inner contents for numeric (&#nnn;) and hex (&#xhh;) literals,
-- failing that, we search for a named entity in the static string; if we end up with a decimal value, we utf-8 encode that value and replace
-- the entity, in place, in the string, with the utf-8 character; then advance our character pointer by one and then try again.
-- if we can't successfully decipher something that looks like an entity, we leave it as it was
-- the ordering of the values in the "entities" blob (entities are case sensitive) is something of a performance consideration; it may be desirable
-- that the most likely encountered entities in a given application be placed first in the blob, because there is a performance difference
-- of perhaps 30 usec (on a 1 GHz Opteron) when matching the first one compared to matching the last one
-- copy/pasted from https://stackoverflow.com/a/49498332/1695906
IF str IS NULL OR str NOT LIKE '%&%;%' THEN
RETURN str;
END IF;
BEGIN
DECLARE entities BLOB DEFAULT 'AElig,198,Aacute,193,Acirc,194,Agrave,192,Alpha,913,Aring,197,Atilde,195,Auml,196,Beta,914,Ccedil,199,Chi,935,Dagger,8225,Delta,916,ETH,208,Eacute,201,Ecirc,202,Egrave,200,Epsilon,917,Eta,919,Euml,203,Gamma,915,Iacute,205,Icirc,206,Igrave,204,Iota,921,Iuml,207,Kappa,922,Lambda,923,Mu,924,Ntilde,209,Nu,925,OElig,338,Oacute,211,Ocirc,212,Ograve,210,Omega,937,Omicron,927,Oslash,216,Otilde,213,Ouml,214,Phi,934,Pi,928,Prime,8243,Psi,936,Rho,929,Scaron,352,Sigma,931,THORN,222,Tau,932,Theta,920,Uacute,218,Ucirc,219,Ugrave,217,Upsilon,933,Uuml,220,Xi,926,Yacute,221,Yuml,376,Zeta,918,aacute,225,acirc,226,acute,180,aelig,230,agrave,224,alefsym,8501,alpha,945,amp,38,and,8743,ang,8736,apos,39,aring,229,asymp,8776,atilde,227,auml,228,bdquo,8222,beta,946,brvbar,166,bull,8226,cap,8745,ccedil,231,cedil,184,cent,162,chi,967,circ,710,clubs,9827,cong,8773,copy,169,crarr,8629,cup,8746,curren,164,dArr,8659,dagger,8224,darr,8595,deg,176,delta,948,diams,9830,divide,247,eacute,233,ecirc,234,egrave,232,empty,8709,emsp,8195,ensp,8194,epsilon,949,equiv,8801,eta,951,eth,240,euml,235,euro,8364,exist,8707,fnof,402,forall,8704,frac12,189,frac14,188,frac34,190,frasl,8260,gamma,947,ge,8805,gt,62,hArr,8660,harr,8596,hearts,9829,hellip,8230,iacute,237,icirc,238,iexcl,161,igrave,236,image,8465,infin,8734,int,8747,iota,953,iquest,191,isin,8712,iuml,239,kappa,954,lArr,8656,lambda,955,lang,9001,laquo,171,larr,8592,lceil,8968,ldquo,8220,le,8804,lfloor,8970,lowast,8727,loz,9674,lrm,8206,lsaquo,8249,lsquo,8216,lt,60,macr,175,mdash,8212,micro,181,middot,183,minus,8722,mu,956,nabla,8711,nbsp,160,ndash,8211,ne,8800,ni,8715,not,172,notin,8713,nsub,8836,ntilde,241,nu,957,oacute,243,ocirc,244,oelig,339,ograve,242,oline,8254,omega,969,omicron,959,oplus,8853,or,8744,ordf,170,ordm,186,oslash,248,otilde,245,otimes,8855,ouml,246,para,182,part,8706,permil,8240,perp,8869,phi,966,pi,960,piv,982,plusmn,177,pound,163,prime,8242,prod,8719,prop,8733,psi,968,quot,34,rArr,8658,radic,8730,rang,9002,raquo,187,rarr,8594,rceil,8969,rdquo,8221,real,8476,reg,174,rfloor,8971,rho,961,rlm,8207,rsaquo,8250,rsquo,8217,sbquo,8218,scaron,353,sdot,8901,sect,167,shy,173,sigma,963,sigmaf,962,sim,8764,spades,9824,sub,8834,sube,8838,sum,8721,sup1,185,sup2,178,sup3,179,sup,8835,supe,8839,szlig,223,tau,964,there4,8756,theta,952,thetasym,977,thinsp,8201,thorn,254,tilde,732,times,215,trade,8482,uArr,8657,uacute,250,uarr,8593,ucirc,251,ugrave,249,uml,168,upsih,978,upsilon,965,uuml,252,weierp,8472,xi,958,yacute,253,yen,165,yuml,255,zeta,950,zwj,8205,zwnj,8204';
DECLARE len BIGINT UNSIGNED DEFAULT LENGTH(str);
DECLARE ptr BIGINT UNSIGNED DEFAULT 0;
DECLARE nxtamp BIGINT UNSIGNED DEFAULT NULL;
DECLARE nxtsem BIGINT UNSIGNED DEFAULT NULL;
DECLARE sbstr LONGTEXT DEFAULT NULL;
DECLARE decval SMALLINT UNSIGNED DEFAULT NULL;
DECLARE setpos SMALLINT UNSIGNED DEFAULT NULL;
DECLARE uenc TINYTEXT DEFAULT NULL;
walk:
LOOP
SET ptr = ptr + 1;
IF ptr >= len THEN
LEAVE walk;
END IF;
SET nxtamp = LOCATE('&',str,ptr);
IF NOT nxtamp THEN
LEAVE walk;
END IF;
SET nxtsem = LOCATE(';',str,ptr + 1);
IF NOT nxtsem THEN
LEAVE walk;
END IF;
IF nxtsem < nxtamp THEN
ITERATE walk;
END IF;
SET sbstr = SUBSTRING(str FROM nxtamp +1 FOR nxtsem - nxtamp - 1);
IF sbstr RLIKE '^#[0-9]+$' THEN
SET decval = TRIM(LEADING '#' FROM sbstr);
ELSEIF sbstr RLIKE '^#x[0-9a-f]+$' THEN
SET decval = CONV(TRIM(LEADING '#x' FROM sbstr),16,10);
ELSE
SET setpos = FIND_IN_SET(sbstr,entities);
IF setpos > 0 THEN
SET decval = SUBSTRING_INDEX(SUBSTRING_INDEX(entities,',',setpos + 1),',',-1);
ELSE
ITERATE walk;
END IF;
END IF;
IF (decval > 0) THEN
SET uenc = CHAR(CASE
WHEN decval <= 0x7F THEN decval
WHEN decval <= 0x7FF THEN 0xC080 | ((decval >> 6) << 8) | (decval & 0x3F)
WHEN decval <= 0xFFFF THEN 0xE08080 | (((decval >> 12) & 0x0F ) << 16) | (((decval >> 6) & 0x3F ) << 8) | (decval & 0x3F)
ELSE NULL END);
IF uenc IS NOT NULL AND LENGTH(uenc) > 0 THEN
SET str = INSERT(str, nxtamp, 1 + nxtsem - nxtamp, uenc);
END IF;
END IF;
END LOOP;
RETURN str;
END;
END $$
DELIMITER ;
(n.b. these things are not called "tags" -- they are "HTML entities.")

Use a bound parameter, e.g.:
$stmt = $this->db->prepare("SELECT * FROM t_tasks WHERE a_text LIKE CONCAT('%', :value, '%')");
$stmt->bindParam(':value', $value);
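A different angle, not taken from either answer above: rather than decoding the column, encode the search term the same way before comparing. This is only a sketch. It assumes htmlspecialchars() ran with flags that convert just &, <, > and double quotes, and @needle and @encoded are placeholder names. In application code the encoding would more naturally happen in PHP, by binding htmlspecialchars($value) instead of $value.

```sql
-- Hypothetical search term as typed by the user (raw HTML, not encoded).
SET @needle = '<i style="background';

-- Mimic htmlspecialchars(): '&' has to be replaced first so that the
-- ampersands introduced by the other replacements are not encoded twice.
SET @encoded = REPLACE(REPLACE(REPLACE(REPLACE(@needle,
                 '&', '&amp;'), '<', '&lt;'), '>', '&gt;'), '"', '&quot;');

-- Compare against a value stored in its encoded form.
SELECT '&lt;p&gt;&lt;i style=&quot;background-color: rgb(255, 255, 0);&quot;&gt;cdsfsadfasdf&lt;/i&gt;&lt;/p&gt;'
       LIKE CONCAT('%', @encoded, '%') AS this_matches;
-- this_matches: 1
```

Any % or _ in the search term still acts as a LIKE wildcard, and the leading % still rules out an index lookup; the gain is only that no per-row decoding function is needed.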

Related

Crossfade two or more strings in MySQL or MariaDB

I'm looking for a lean way of overlapping/crossfading two or more strings in MySQL or MariaDB.
There is a base string like this:
XXXOOOOOOOXXX
Then there are n strings that need to crossfade that base string. The rule in this demo case is that X takes priority. The strings can be of different lengths.
So these strings overlapping the base string:
OOOOOXOOOOOOOOOX
XXOOOXXOOOOOXXXXO
should result in:
XXXOOXXOOOXXXXXXO
I could do this in PHP, but maybe there is a function inside MySQL or MariaDB that makes it faster.
There is no built-in function in MySQL that does this, but one can be created:
CREATE DEFINER=`root`@`localhost` FUNCTION `crossfade`(
a VARCHAR(100),
b VARCHAR(100)) RETURNS varchar(100) CHARSET utf8mb4
BEGIN
DECLARE i INTEGER DEFAULT 1;
DECLARE r VARCHAR(100) DEFAULT '';
-- Make input equally long
IF LENGTH(a) < LENGTH(b) THEN
SET A = RPAD(A,LENGTH(b),' ');
ELSE
SET B = RPAD(B,LENGTH(a),' ');
END IF;
WHILE i<=LENGTH(a) DO
IF substring(a,i,1)='X' THEN
SET r = concat(r,substring(a,i,1));
ELSE
SET r = concat(r,substring(b,i,1));
END IF;
SET i = i + 1;
END WHILE;
RETURN r;
END
Example:
SELECT crossfade('OOOOOXOOOOOOOOOX','XXOOOXXOOOOOXXXXO');
output:
+ ------------------------------------------------------ +
| crossfade('OOOOOXOOOOOOOOOX','XXOOOXXOOOOOXXXXO') |
+ ------------------------------------------------------ +
| XXOOOXXOOOOOXXXXO |
+ ------------------------------------------------------ +
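Since the function takes two strings, the question's case of a base string plus n overlays can be handled by nesting the calls. A small usage sketch, assuming the crossfade function above has already been created:

```sql
-- Fold both overlay strings into the base string, one call per overlay.
SELECT crossfade(
         crossfade('XXXOOOOOOOXXX', 'OOOOOXOOOOOOOOOX'),
         'XXOOOXXOOOOOXXXXO'
       ) AS result;
-- result: XXXOOXXOOOXXXXXXO
```

One caveat with the function as written: when the second argument is the shorter one, it is padded with spaces, and those spaces end up in the result wherever the first argument has no X. Passing the strings in order of increasing length, as here, avoids that.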

mySQL set a varchar without the special characters

I use mySQL as a DBMS,
I have these rows in my table:
product_name | product_code | prod_type
prod1#00X | 1 |
#prod2#00X | 2 |
+prod3##00X | 3 |
I want to set prod_type to the product_name without the special characters.
=> prod_type
prod100X
prod200X
prod300X
(I can have other special characters not only '#' and '+')
How can I do that?
Method 1:
You can use the REPLACE() function to remove special characters in MySQL. I don't know how efficient it is, but it should work.
Like below:
SELECT Replace(Replace(product_name,'#',''),'+','') as prod_type
From Table1
Fiddle Demo
Method 2:
If you have other special characters as well, go with this (Source)
-- ----------------------------
-- Function structure for `udf_cleanString`
-- ----------------------------
DROP FUNCTION IF EXISTS `udf_cleanString`;
DELIMITER ;;
CREATE FUNCTION `udf_cleanString`(`in_str` varchar(4096)) RETURNS varchar(4096) CHARSET utf8
BEGIN
DECLARE out_str VARCHAR(4096) DEFAULT '';
DECLARE c VARCHAR(4096) DEFAULT '';
DECLARE pointer INT DEFAULT 1;
IF ISNULL(in_str) THEN
RETURN NULL;
ELSE
WHILE pointer <= LENGTH(in_str) DO
SET c = MID(in_str, pointer, 1);
IF ASCII(c) > 31 AND ASCII(c) < 127 THEN
SET out_str = CONCAT(out_str, c);
END IF;
SET pointer = pointer + 1;
END WHILE;
END IF;
RETURN out_str;
END
;;
DELIMITER ;
After that just call the function as follows:
SELECT product_name, udf_cleanString(product_name) AS 'product_Type'
FROM table1;
SELECT Replace(Replace(product_name,'#',''),'+','')
From Table1
If there are other special characters, try nested REPLACE calls like this:
select REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(product_name, '/', ''),'(',''),')',''),' ',''),'+',''),'-',''),'#','')
from Table1;
or try using a regex.
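For the regex route, MySQL 8.0 and recent MariaDB releases provide REGEXP_REPLACE. A minimal sketch, assuming one of those versions and assuming that "special" means anything other than ASCII letters and digits; the sample values come from the question. (By contrast, udf_cleanString above keeps every printable ASCII character, so it would leave '#' and '+' in place.)

```sql
-- Strip everything that is not an ASCII letter or digit.
SELECT p.product_name,
       REGEXP_REPLACE(p.product_name, '[^A-Za-z0-9]', '') AS prod_type
FROM (
  SELECT 'prod1#00X' AS product_name UNION ALL
  SELECT '#prod2#00X' UNION ALL
  SELECT '+prod3##00X'
) AS p;
-- prod_type: prod100X, prod200X, prod300X
```

The same expression can be used on the right-hand side of an UPDATE ... SET prod_type = ... statement.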
What you can do is:
Create a function to remove the special characters; you can find out how from the reference.
Use the query Update YourTable set prod_type = YourFunction(product_name)

Comparison on numeric part of string in MySQL

I have a field that contains version information such as:
V12.0
V1.0
BE0.50
VV24
I want to query for versions greater than n. The number I want to do the comparison on is preceded by a non-fixed number of characters. Is it possible to do something like:
SELECT version FROM table WHERE int_part(version) > 10
V12.0
VV24
In this case it seems that the version number appears only at the end, with the letters at the beginning. So a probable solution is to reverse the string, cast it to get the number, then reverse the resulting number back.
Try the following solution:
SELECT version
FROM table
HAVING CAST(REVERSE(IF(LOCATE(".", version), (CAST(REVERSE(version) AS DECIMAL(4,2))), (CAST(REVERSE(version) AS UNSIGNED INTEGER)))) AS DECIMAL(4,2)) > 12;
Hope it helps...
Seems like you have to create your own function to do this:
CREATE FUNCTION IsNumeric (val varchar(255)) RETURNS tinyint
RETURN val REGEXP '^(-|\\+){0,1}([0-9]+\\.[0-9]*|[0-9]*\\.[0-9]+|[0-9]+)$';
CREATE FUNCTION NumericOnly (val VARCHAR(255))
RETURNS VARCHAR(255)
BEGIN
DECLARE idx INT DEFAULT 0;
IF ISNULL(val) THEN RETURN NULL; END IF;
IF LENGTH(val) = 0 THEN RETURN ""; END IF;
SET idx = LENGTH(val);
WHILE idx > 0 DO
IF IsNumeric(SUBSTRING(val,idx,1)) = 0 THEN
SET val = REPLACE(val,SUBSTRING(val,idx,1),"");
SET idx = LENGTH(val)+1;
END IF;
SET idx = idx - 1;
END WHILE;
RETURN val;
END;
select NumericOnly('vv24');
+---------------------+
| NumericOnly('vv24') |
+---------------------+
| 24 |
+---------------------+
A big downside of MySQL is that it does not support regular expressions in the REPLACE function. If I have to do any matching of this sort, I usually pull all the data into PHP and process it there. In the long run it is a better solution. Of course you could always use User Defined Functions if you prefer.
http://dev.mysql.com/doc/refman/5.0/en/adding-udf.html
@shubhansh
Wouldn't 24 change to 0.24 and fail the > 12 test?
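On MySQL 8.0 or a recent MariaDB, REGEXP_SUBSTR can pull out the numeric part directly, which sidesteps the reversing trick. A sketch under that version assumption, using the sample values from the question; the derived table v stands in for the real table:

```sql
-- Take the first run of digits (with an optional fractional part)
-- and compare it numerically.
SELECT v.version
FROM (
  SELECT 'V12.0' AS version UNION ALL
  SELECT 'V1.0' UNION ALL
  SELECT 'BE0.50' UNION ALL
  SELECT 'VV24'
) AS v
WHERE CAST(REGEXP_SUBSTR(v.version, '[0-9]+([.][0-9]+)?') AS DECIMAL(10,2)) > 10;
-- returns V12.0 and VV24
```

The dot is written as [.] rather than with a backslash so the pattern does not depend on how the server treats backslash escapes in string literals.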

Case-insensitive REPLACE in MySQL?

MySQL runs pretty much all string comparisons under the default collation... except the REPLACE command. I have a case-insensitive collation and need to run a case-insensitive REPLACE. Is there any way to force REPLACE to use the current collation rather than always doing case-sensitive comparisons? I'm willing to upgrade my MySQL (currently running 5.1) to get added functionality...
mysql> charset utf8 collation utf8_unicode_ci;
Charset changed
mysql> select 'abc' like '%B%';
+------------------+
| 'abc' like '%B%' |
+------------------+
| 1 |
+------------------+
mysql> select replace('aAbBcC', 'a', 'f');
+-----------------------------+
| replace('aAbBcC', 'a', 'f') |
+-----------------------------+
| fAbBcC | <--- *NOT* 'ffbBcC'
+-----------------------------+
If replace(lower()) doesn't work, you'll need to create another function.
My 2 cents.
Since many people have migrated from MySQL to MariaDB, those people will have available a new function called REGEXP_REPLACE. Use it as you would a normal replace, but the pattern is a regular expression.
This is a working example:
UPDATE `myTable`
SET `myField` = REGEXP_REPLACE(`myField`, '(?i)my insensitive string', 'new string')
WHERE `myField` REGEXP '(?i)my insensitive string'
The option (?i) makes all the subsequent matches case insensitive (if put at the beginning of the pattern like I have then it all is insensitive).
See here for more information: https://mariadb.com/kb/en/mariadb/pcre/
Edit: as of MySQL 8.0 you can now use the regexp_replace function too, see documentation: https://dev.mysql.com/doc/refman/8.0/en/regexp.html
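In MySQL 8.0 the case handling can also be requested explicitly through the optional match_type argument instead of the inline (?i) flag. A sketch, assuming MySQL 8.0.4 or later; as far as I know MariaDB's REGEXP_REPLACE accepts only three arguments, so the (?i) form is the one to use there.

```sql
-- Arguments: expr, pattern, replacement, start position,
-- occurrence (0 = replace all), match_type ('i' = case-insensitive).
SELECT REGEXP_REPLACE('aAbBcC', 'a', 'f', 1, 0, 'i') AS replaced;
-- replaced: ffbBcC
```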
An alternative to the function mentioned by fvox.
DELIMITER |
CREATE FUNCTION case_insensitive_replace ( REPLACE_WHERE text, REPLACE_THIS text, REPLACE_WITH text )
RETURNS text
DETERMINISTIC
BEGIN
DECLARE last_occurency int DEFAULT '1';
IF LCASE(REPLACE_THIS) = LCASE(REPLACE_WITH) OR LENGTH(REPLACE_THIS) < 1 THEN
RETURN REPLACE_WHERE;
END IF;
WHILE Locate( LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency ) > 0 DO
BEGIN
SET last_occurency = Locate(LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE));
SET REPLACE_WHERE = Insert( REPLACE_WHERE, last_occurency, LENGTH(REPLACE_THIS), REPLACE_WITH);
SET last_occurency = last_occurency + LENGTH(REPLACE_WITH);
END;
END WHILE;
RETURN REPLACE_WHERE;
END;
|
DELIMITER ;
Small test:
SET @str = BINARY 'New York';
SELECT case_insensitive_replace(@str, 'y', 'K');
Returns: New Kork
This modification of Luist's answer allows one to replace the needle with a differently cased version of the needle (two lines change).
DELIMITER |
CREATE FUNCTION case_insensitive_replace ( REPLACE_WHERE text, REPLACE_THIS text, REPLACE_WITH text )
RETURNS text
DETERMINISTIC
BEGIN
DECLARE last_occurency int DEFAULT '1';
IF LENGTH(REPLACE_THIS) < 1 THEN
RETURN REPLACE_WHERE;
END IF;
WHILE Locate( LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency ) > 0 DO
BEGIN
SET last_occurency = Locate(LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency);
SET REPLACE_WHERE = Insert( REPLACE_WHERE, last_occurency, LENGTH(REPLACE_THIS), REPLACE_WITH);
SET last_occurency = last_occurency + LENGTH(REPLACE_WITH);
END;
END WHILE;
RETURN REPLACE_WHERE;
END;
|
DELIMITER ;
I went with http://pento.net/2009/02/15/case-insensitive-replace-for-mysql/ (in fvox's answer) which performs the case insensitive search with case sensitive replacement and without changing the case of what should be unaffected characters elsewhere in the searched string.
N.B. the comment further down that same page stating that CHAR(255) should be changed to VARCHAR(255) - this seemed to be required for me as well.
In the previous answers, and the pento.net link, the arguments to LOCATE() are lower-cased.
This is a waste of resources, as LOCATE is already case-insensitive unless one of its arguments is a binary string (or the collation in use is case-sensitive):
mysql> select locate('el', 'HELLo');
+-----------------------+
| locate('el', 'HELLo') |
+-----------------------+
| 2 |
+-----------------------+
You can replace
WHILE Locate( LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency ) > 0 DO
with
WHILE Locate(REPLACE_THIS, REPLACE_WHERE, last_occurency ) > 0 DO
etc.
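The case-insensitivity depends on the arguments, though: LOCATE compares case-sensitively as soon as one argument is a binary string. A quick check, assuming the connection uses a case-insensitive collation such as the utf8_unicode_ci one from the question:

```sql
SELECT LOCATE('el', 'HELLo')                 AS ci_match,      -- 2
       LOCATE(CAST('el' AS BINARY), 'HELLo') AS binary_match;  -- 0
```

So dropping the LCASE() calls is only safe when neither the needle nor the haystack is binary.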
In case of 'special' characters there is unexpected behaviour:
SELECT case_insensitive_replace('A', 'Ã', 'a')
Gives
a
Which is unexpected... since we only want to replace the Ã not A
What is even more weird:
SELECT LOCATE('Ã', 'A');
gives
0
Which is the correct result... seems to have to do with encoding of the parameters of the stored procedure...
I like to use a search and replace function I created for when I need to replace without worrying about the case of the original or search strings. This routine bails out quickly, without altering the incoming string, if you pass in an empty/null search string or a null replace string. I also added a safety countdown in case the search somehow keeps looping, so we don't get stuck in a loop forever. Raise the starting number if you think it is too low.
delimiter //
DROP FUNCTION IF EXISTS `replace_nocase`//
CREATE FUNCTION `replace_nocase`(raw text, find_str varchar(1000), replace_str varchar(1000)) RETURNS text
CHARACTER SET utf8
DETERMINISTIC
BEGIN
declare ret text;
declare len int;
declare hit int;
declare safe int;
if find_str is null or find_str='' or replace_str is null then
return raw;
end if;
set safe=10000;
set ret=raw;
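set len=char_length(find_str);
set hit=LOCATE(find_str,ret);
while hit>0 and safe>0 do
set ret=concat(substring(ret,1,hit-1),replace_str,substring(ret,hit+len));
-- resume after the inserted text so a replacement that contains find_str is not matched again
set hit=LOCATE(find_str,ret,hit+char_length(replace_str));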
set safe=safe-1;
end while;
return ret;
END//
delimiter ;
This question is a bit old but I ran into the same problem and the answers given didn't allow me to solve it entirely.
I wanted the result to retain the case of the original string.
So I made a small modification to the replace_ci function proposed by fvox :
DELIMITER $$
DROP FUNCTION IF EXISTS `replace_ci`$$
CREATE FUNCTION `replace_ci` (str TEXT, needle CHAR(255), str_rep CHAR(255))
RETURNS TEXT
DETERMINISTIC
BEGIN
DECLARE return_str TEXT DEFAULT '';
DECLARE lower_str TEXT;
DECLARE lower_needle TEXT;
DECLARE tmp_needle TEXT;
DECLARE str_origin_char CHAR(1);
DECLARE str_rep_char CHAR(1);
DECLARE final_str_rep TEXT DEFAULT '';
DECLARE pos INT DEFAULT 1;
DECLARE old_pos INT DEFAULT 1;
DECLARE needle_pos INT DEFAULT 1;
IF needle = '' THEN
RETURN str;
END IF;
SELECT LOWER(str) INTO lower_str;
SELECT LOWER(needle) INTO lower_needle;
SELECT LOCATE(lower_needle, lower_str, pos) INTO pos;
WHILE pos > 0 DO
SELECT substr(str, pos, char_length(needle)) INTO tmp_needle;
SELECT '' INTO final_str_rep;
SELECT 1 INTO needle_pos;
WHILE needle_pos <= char_length(tmp_needle) DO
SELECT substr(tmp_needle, needle_pos, 1) INTO str_origin_char;
SELECT SUBSTR(str_rep, needle_pos, 1) INTO str_rep_char;
SELECT CONCAT(final_str_rep, IF(BINARY str_origin_char = LOWER(str_origin_char), LOWER(str_rep_char), IF(BINARY str_origin_char = UPPER(str_origin_char), UPPER(str_rep_char), str_rep_char))) INTO final_str_rep;
SELECT (needle_pos + 1) INTO needle_pos;
END WHILE;
SELECT CONCAT(return_str, SUBSTR(str, old_pos, pos - old_pos), final_str_rep) INTO return_str;
SELECT pos + CHAR_LENGTH(needle) INTO pos;
SELECT pos INTO old_pos;
SELECT LOCATE(lower_needle, lower_str, pos) INTO pos;
END WHILE;
SELECT CONCAT(return_str, SUBSTR(str, old_pos, CHAR_LENGTH(str))) INTO return_str;
RETURN return_str;
END$$
DELIMITER ;
Example of use :
SELECT replace_ci( 'MySQL', 'm', 'e' ) as replaced;
Will return :
| replaced |
| --- |
| EySQL |

How do you extract a numerical value from a string in a MySQL query?

I have a table with two columns: price (int) and price_display (varchar).
price is the actual numerical price, e.g. "9990"
price_display is the visual representation, e.g. "$9.99" or "9.99Fr"
I've been able to confirm the two columns match via regexp:
price_display not regexp
format(price/1000, 2)
But in the case of a mismatch, I want to extract the value from the price_display column and set it into the price column, all within the context of an update statement. I've not been able to figure out how.
Thanks.
This function does the job of only returning the digits 0-9 from the string, which does the job nicely to solve your issue, regardless of what prefixes or postfixes you have.
http://www.artfulsoftware.com/infotree/queries.php?&bw=1280#815
Copied here for reference:
SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS digits;
DELIMITER |
CREATE FUNCTION digits( str CHAR(32) ) RETURNS CHAR(32)
BEGIN
DECLARE i, len SMALLINT DEFAULT 1;
DECLARE ret CHAR(32) DEFAULT '';
DECLARE c CHAR(1);
IF str IS NULL
THEN
RETURN "";
END IF;
SET len = CHAR_LENGTH( str );
REPEAT
BEGIN
SET c = MID( str, i, 1 );
IF c BETWEEN '0' AND '9' THEN
SET ret=CONCAT(ret,c);
END IF;
SET i = i + 1;
END;
UNTIL i > len END REPEAT;
RETURN ret;
END |
DELIMITER ;
SELECT digits('$10.00Fr');
#returns 1000
One approach would be to use REPLACE() function:
UPDATE my_table
SET price = replace(replace(replace(price_display,'Fr',''),'$',''),'.','')
WHERE price_display not regexp format(price/1000, 2);
This works for the example data you gave:
'$9.99'
'9.99Fr'
Both result in 999 in my test. With an update like this, it's important to be sure to back up the database first, and be cognizant of the formats of the items. You can see all the "baddies" by doing this query:
SELECT DISTINCT price_display
FROM my_table
WHERE price_display not regexp format(price/1000, 2)
ORDER BY price_display;
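Note that in the question price is the displayed amount times 1000 (9990 for "$9.99"), so stripping the punctuation alone gives 999. A hedged alternative, assuming MySQL 8.0 or a recent MariaDB for REGEXP_SUBSTR, is to extract the decimal number and scale it; the derived table d stands in for the real table:

```sql
-- Pull out the first decimal number and convert it to thousandths.
SELECT d.price_display,
       ROUND(CAST(REGEXP_SUBSTR(d.price_display, '[0-9]+([.][0-9]+)?')
                  AS DECIMAL(12,3)) * 1000) AS price
FROM (
  SELECT '$9.99' AS price_display UNION ALL
  SELECT '9.99Fr'
) AS d;
-- price: 9990 for both rows
```

The same expression can replace the nested replace() calls in the UPDATE above.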
For me CASTING the field did the trick:
CAST( price AS UNSIGNED ) // For positive integer
CAST( price AS SIGNED ) // For negative and positive integer
IF(CAST(price AS UNSIGNED)=0,REVERSE(CAST(REVERSE(price) AS UNSIGNED)),CAST(price AS UNSIGNED)) // Fix when price starts with something other than a digit
For more details see:
https://dev.mysql.com/doc/refman/5.0/en/cast-functions.html
This is a "coding horror"; relational database schemas should NOT be written like this!
You're having to write complex and unnecessary code to validate the data.
Try something like this:
SELECT CONCAT('$',(price/1000)) AS Price FROM ...
In addition, you can use a float, double or real instead of an integer.
If you need to store currency data, you might consider adding a currency field or use the systems locale functions to display it in the correct format.
I created a function that detects the first number in a string and returns it; if there is none, it returns 0.
DROP FUNCTION IF EXISTS extractNumber;
DELIMITER //
CREATE FUNCTION extractNumber (string1 VARCHAR(255)) RETURNS INT(11)
BEGIN
DECLARE position, result, longitude INT(11) DEFAULT 0;
DECLARE string2 VARCHAR(255);
SET longitude = LENGTH(string1);
SET result = CONVERT(string1, SIGNED);
IF result = 0 THEN
IF string1 REGEXP('[0-9]') THEN
SET position = 2;
checkString:WHILE position <= longitude DO
SET string2 = SUBSTR(string1 FROM position);
IF CONVERT(string2, SIGNED) != 0 THEN
SET result = CONVERT(string2, SIGNED);
LEAVE checkString;
END IF;
SET position = position + 1;
END WHILE;
END IF;
END IF;
RETURN result;
END //
DELIMITER ;
Return last number from the string:
DELIMITER //
CREATE FUNCTION getLastNumber(str VARCHAR(255)) RETURNS INT(11)
BEGIN
DECLARE last_number, str_length, position INT(11) DEFAULT 0;
DECLARE temp_char VARCHAR(1);
DECLARE temp_char_before VARCHAR(1);
IF str IS NULL THEN
RETURN -1;
END IF;
SET str_length = LENGTH(str);
WHILE position <= str_length DO
SET temp_char = MID(str, position, 1);
IF position > 0 THEN
SET temp_char_before = MID(str, position - 1, 1);
END IF;
IF temp_char BETWEEN '0' AND '9' THEN
SET last_number = last_number * 10 + temp_char;
END IF;
IF (temp_char_before NOT BETWEEN '0' AND '9') AND
(temp_char BETWEEN '0' AND '9') THEN
SET last_number = temp_char;
END IF;
SET position = position + 1;
END WHILE;
RETURN last_number;
END//
DELIMITER ;
Then call the function:
select getLastNumber("ssss111www222w");
prints 222
select getLastNumber("ssss111www222www3332");
prints 3332