i have summernote text editor and doing htmlspecialchars($VALUE) before insert to db, then htmlspecialchars_decode($VALUE) after getting to maintain text editor's changes...
But i also need to do search function for this row, so how to use LIKE function when row (VARCHAR) is encoded with htmlspecialchars()? is there any SQL function to strip those tags while selecting to perform LIKE function?
P.S
I'm using PDO and my query looks like:
$this->db->prepare("SELECT * FROM t_tasks WHERE a_text LIKE '%$value%'");
and here, t_text looks like
<p><i style="background-color: rgb(255, 255, 0);">cdsfsadfasdf</i></p>
Noisy Disclaimer: the following works as designed, and addresses the question being asked, but that does not make it a good idea or an example of best practice.
Quite the contrary, I would suggest. Sargability is completely defeated, and as you can see from reviewing the code, I have to go through some needless juggling and gyrations, because SQL is simply not the right tool for this job.
But, I wrote this when I needed it for an environment where I had no option but to work with data that was stored with encoded HTML entities -- and for legacy reasons could not be changed.
It's a MySQL stored function that converts entities to their utf8-encoded equivalent character. For example:
mysql> SELECT decode_entities('I ♥ doing “clever” things.') AS decoded_string;
+----------------------------------+
| decoded_string |
+----------------------------------+
| I ♥ doing “clever” things. |
+----------------------------------+
1 row in set (0.00 sec)
So, for your query, if we wanted to test whether this...
<p><i style="background-color: rgb(255, 255, 0);">cdsfsadfasdf</i></p>
...is LIKE '<p><i style=%'...
mysql> SELECT decode_entities('<p><i style="background-color: rgb(255, 255, 0);">cdsfsadfasdf</i></p>') LIKE '<p><i style=%' AS this_matches;
+--------------+
| this_matches |
+--------------+
| 1 |
+--------------+
1 row in set (0.00 sec)
...we find that it is.
After defining the function, you'd use...
$stmt = $this->db->prepare("SELECT t.*, decoded_entities(t.a_text) AS a_text_decoded FROM t_tasks t WHERE decode_entities(t.a_text) LIKE CONCAT('%', :value, '%'));
Here's the function:
DELIMITER $$
DROP FUNCTION IF EXISTS `decode_entities` $$
CREATE FUNCTION `decode_entities`(str LONGTEXT charset utf8) RETURNS longtext CHARSET utf8
NO SQL
DETERMINISTIC
BEGIN
-- decode HTML entities in database strings.
-- this processing is somewhat intensive due to the fact that this is clearly not something the database is
-- necessarily optimal place to accomlish; because of this, the function is optimized to quickly return strings that can't possibly contain entities
-- otherwise, we walk the string, looking for & ... ; then checking the matched inner contents for numeric (&#nnn;) and hex (?) literals,
-- failing that, we search for a named entity in the static string; if we end up with a decimal value, we utf-8 encode that value and replace
-- the entity, in place, in the string, with the utf-8 character; then advance our character pointer by one and then try again.
-- if we can't successfully decipher something that looks like an entity, we leave it as it was
-- the ordering of the values in the "entities' blob (entities are case sensitive) is something of a performance consideration; it may be desirable
-- that the most likely encountered entities in a given application be placed first in the blob, because there is a performance difference
-- of perhaps 30 usec (on a 1 GHz Opteron) when matching the first one compared to matching the last one
-- copy/pasted from https://stackoverflow.com/a/49498332/1695906
IF str IS NULL OR str NOT LIKE '%&%;%' THEN
RETURN str;
END IF;
BEGIN
DECLARE entities BLOB DEFAULT 'AElig,198,Aacute,193,Acirc,194,Agrave,192,Alpha,913,Aring,197,Atilde,195,Auml,196,Beta,914,Ccedil,199,Chi,935,Dagger,8225,Delta,916,ETH,208,Eacute,201,Ecirc,202,Egrave,200,Epsilon,917,Eta,919,Euml,203,Gamma,915,Iacute,205,Icirc,206,Igrave,204,Iota,921,Iuml,207,Kappa,922,Lambda,923,Mu,924,Ntilde,209,Nu,925,OElig,338,Oacute,211,Ocirc,212,Ograve,210,Omega,937,Omicron,927,Oslash,216,Otilde,213,Ouml,214,Phi,934,Pi,928,Prime,8243,Psi,936,Rho,929,Scaron,352,Sigma,931,THORN,222,Tau,932,Theta,920,Uacute,218,Ucirc,219,Ugrave,217,Upsilon,933,Uuml,220,Xi,926,Yacute,221,Yuml,376,Zeta,918,aacute,225,acirc,226,acute,180,aelig,230,agrave,224,alefsym,8501,alpha,945,amp,38,and,8743,ang,8736,apos,39,aring,229,asymp,8776,atilde,227,auml,228,bdquo,8222,beta,946,brvbar,166,bull,8226,cap,8745,ccedil,231,cedil,184,cent,162,chi,967,circ,710,clubs,9827,cong,8773,copy,169,crarr,8629,cup,8746,curren,164,dArr,8659,dagger,8224,darr,8595,deg,176,delta,948,diams,9830,divide,247,eacute,233,ecirc,234,egrave,232,empty,8709,emsp,8195,ensp,8194,epsilon,949,equiv,8801,eta,951,eth,240,euml,235,euro,8364,exist,8707,fnof,402,forall,8704,frac12,189,frac14,188,frac34,190,frasl,8260,gamma,947,ge,8805,gt,62,hArr,8660,harr,8596,hearts,9829,hellip,8230,iacute,237,icirc,238,iexcl,161,igrave,236,image,8465,infin,8734,int,8747,iota,953,iquest,191,isin,8712,iuml,239,kappa,954,lArr,8656,lambda,955,lang,9001,laquo,171,larr,8592,lceil,8968,ldquo,8220,le,8804,lfloor,8970,lowast,8727,loz,9674,lrm,8206,lsaquo,8249,lsquo,8216,lt,60,macr,175,mdash,8212,micro,181,middot,183,minus,8722,mu,956,nabla,8711,nbsp,160,ndash,8211,ne,8800,ni,8715,not,172,notin,8713,nsub,8836,ntilde,241,nu,957,oacute,243,ocirc,244,oelig,339,ograve,242,oline,8254,omega,969,omicron,959,oplus,8853,or,8744,ordf,170,ordm,186,oslash,248,otilde,245,otimes,8855,ouml,246,para,182,part,8706,permil,8240,perp,8869,phi,966,pi,960,piv,982,plusmn,177,pound,163,prime,8242,prod,8719,prop,8733,psi,968,quot,34,rArr,8658,radic,8730,rang,9002,raquo,187,rarr,8594,rceil,8969,rdquo,8221,real,8476,reg,174,rfloor,8971,rho,961,rlm,8207,rsaquo,8250,rsquo,8217,sbquo,8218,scaron,353,sdot,8901,sect,167,shy,173,sigma,963,sigmaf,962,sim,8764,spades,9824,sub,8834,sube,8838,sum,8721,sup1,185,sup2,178,sup3,179,sup,8835,supe,8839,szlig,223,tau,964,there4,8756,theta,952,thetasym,977,thinsp,8201,thorn,254,tilde,732,times,215,trade,8482,uArr,8657,uacute,250,uarr,8593,ucirc,251,ugrave,249,uml,168,upsih,978,upsilon,965,uuml,252,weierp,8472,xi,958,yacute,253,yen,165,yuml,255,zeta,950,zwj,8205,zwnj,8204';
DECLARE len BIGINT UNSIGNED DEFAULT LENGTH(str);
DECLARE ptr BIGINT UNSIGNED DEFAULT 0;
DECLARE nxtamp BIGINT UNSIGNED DEFAULT NULL;
DECLARE nxtsem BIGINT UNSIGNED DEFAULT NULL;
DECLARE sbstr LONGTEXT DEFAULT NULL;
DECLARE decval SMALLINT UNSIGNED DEFAULT NULL;
DECLARE setpos SMALLINT UNSIGNED DEFAULT NULL;
DECLARE uenc TINYTEXT DEFAULT NULL;
walk:
LOOP
SET ptr = ptr + 1;
IF ptr >= len THEN
LEAVE walk;
END IF;
SET nxtamp = LOCATE('&',str,ptr);
IF NOT nxtamp THEN
LEAVE walk;
END IF;
SET nxtsem = LOCATE(';',str,ptr + 1);
IF NOT nxtsem THEN
LEAVE walk;
END IF;
IF nxtsem < nxtamp THEN
ITERATE walk;
END IF;
SET sbstr = SUBSTRING(str FROM nxtamp +1 FOR nxtsem - nxtamp - 1);
IF sbstr RLIKE '^#[0-9]+$' THEN
SET decval = TRIM(LEADING '#' FROM sbstr);
ELSEIF sbstr RLIKE '^#x[0-9a-f]+$' THEN
SET decval = CONV(TRIM(LEADING '#x' FROM sbstr),16,10);
ELSE
SET setpos = FIND_IN_SET(sbstr,entities);
IF setpos > 0 THEN
SET decval = SUBSTRING_INDEX(SUBSTRING_INDEX(entities,',',setpos + 1),',',-1);
ELSE
ITERATE walk;
END IF;
END IF;
IF (decval > 0) THEN
SET uenc = CHAR(CASE
WHEN decval <= 0x7F THEN decval
WHEN decval <= 0x7FF THEN 0xC080 | ((decval >> 6) << 8) | (decval & 0x3F)
WHEN decval <= 0xFFFF THEN 0xE08080 | (((decval >> 12) & 0x0F ) << 16) | (((decval >> 6) & 0x3F ) << 8) | (decval & 0x3F)
ELSE NULL END);
IF uenc IS NOT NULL AND LENGTH(uenc) > 0 THEN
SET str = INSERT(str, nxtamp, 1 + nxtsem - nxtamp, uenc);
END IF;
END IF;
END LOOP;
RETURN str;
END;
END $$
DELIMITER ;
(n.b. these things are not called "tags" -- they are "HTML entities.")
use binding param eg:
$stmt = $this->db->prepare("SELECT * FROM t_tasks WHERE a_text LIKE CONCAT('%', :value, '%'));
$stmt->bindParam(':value', $value);
MySQL runs pretty much all string comparisons under the default collation... except the REPLACE command. I have a case-insensitive collation and need to run a case-insensitive REPLACE. Is there any way to force REPLACE to use the current collation rather than always doing case-sensitive comparisons? I'm willing to upgrade my MySQL (currently running 5.1) to get added functionality...
mysql> charset utf8 collation utf8_unicode_ci;
Charset changed
mysql> select 'abc' like '%B%';
+------------------+
| 'abc' like '%B%' |
+------------------+
| 1 |
+------------------+
mysql> select replace('aAbBcC', 'a', 'f');
+-----------------------------+
| replace('aAbBcC', 'a', 'f') |
+-----------------------------+
| fAbBcC | <--- *NOT* 'ffbBcC'
+-----------------------------+
If replace(lower()) doesn't work, you'll need to create another function.
My 2 cents.
Since many people have migrated from MySQL to MariaDB, those people will have available a new function called REGEXP_REPLACE. Use it as you would a normal replace, but the pattern is a regular expression.
This is a working example:
UPDATE `myTable`
SET `myField` = REGEXP_REPLACE(`myField`, '(?i)my insensitive string', 'new string')
WHERE `myField` REGEXP '(?i)my insensitive string'
The option (?i) makes all the subsequent matches case insensitive (if put at the beginning of the pattern like I have then it all is insensitive).
See here for more information: https://mariadb.com/kb/en/mariadb/pcre/
Edit: as of MySQL 8.0 you can now use the regexp_replace function too, see documentation: https://dev.mysql.com/doc/refman/8.0/en/regexp.html
Alternative function for one spoken by fvox.
DELIMITER |
CREATE FUNCTION case_insensitive_replace ( REPLACE_WHERE text, REPLACE_THIS text, REPLACE_WITH text )
RETURNS text
DETERMINISTIC
BEGIN
DECLARE last_occurency int DEFAULT '1';
IF LCASE(REPLACE_THIS) = LCASE(REPLACE_WITH) OR LENGTH(REPLACE_THIS) < 1 THEN
RETURN REPLACE_WHERE;
END IF;
WHILE Locate( LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency ) > 0 DO
BEGIN
SET last_occurency = Locate(LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE));
SET REPLACE_WHERE = Insert( REPLACE_WHERE, last_occurency, LENGTH(REPLACE_THIS), REPLACE_WITH);
SET last_occurency = last_occurency + LENGTH(REPLACE_WITH);
END;
END WHILE;
RETURN REPLACE_WHERE;
END;
|
DELIMITER ;
Small test:
SET #str = BINARY 'New York';
SELECT case_insensitive_replace(#str, 'y', 'K');
Answers: New Kork
This modification of Luist's answer allows one to replace the needle with a differently cased version of the needle (two lines change).
DELIMITER |
CREATE FUNCTION case_insensitive_replace ( REPLACE_WHERE text, REPLACE_THIS text, REPLACE_WITH text )
RETURNS text
DETERMINISTIC
BEGIN
DECLARE last_occurency int DEFAULT '1';
IF LENGTH(REPLACE_THIS) < 1 THEN
RETURN REPLACE_WHERE;
END IF;
WHILE Locate( LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency ) > 0 DO
BEGIN
SET last_occurency = Locate(LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency);
SET REPLACE_WHERE = Insert( REPLACE_WHERE, last_occurency, LENGTH(REPLACE_THIS), REPLACE_WITH);
SET last_occurency = last_occurency + LENGTH(REPLACE_WITH);
END;
END WHILE;
RETURN REPLACE_WHERE;
END;
|
DELIMITER ;
I went with http://pento.net/2009/02/15/case-insensitive-replace-for-mysql/ (in fvox's answer) which performs the case insensitive search with case sensitive replacement and without changing the case of what should be unaffected characters elsewhere in the searched string.
N.B. the comment further down that same page stating that CHAR(255) should be changed to VARCHAR(255) - this seemed to be required for me as well.
In the previous answers, and the pento.net link, the arguments to LOCATE() are lower-cased.
This is a waste of resources, as LOCATE is case-insensitive by default:
mysql> select locate('el', 'HELLo');
+-----------------------+
| locate('el', 'HELLo') |
+-----------------------+
| 2 |
+-----------------------+
You can replace
WHILE Locate( LCASE(REPLACE_THIS), LCASE(REPLACE_WHERE), last_occurency ) > 0 DO
with
WHILE Locate(REPLACE_THIS, REPLACE_WHERE, last_occurency ) > 0 DO
etc.
In case of 'special' characters there is unexpected behaviour:
SELECT case_insensitive_replace('A', 'Ã', 'a')
Gives
a
Which is unexpected... since we only want to replace the à not A
What is even more weird:
SELECT LOCATE('Ã', 'A');
gives
0
Which is the correct result... seems to have to do with encoding of the parameters of the stored procedure...
I like to use a search and replace function I created when I need to replace without worrying about the case of the original or search strings. This routine bails out quickly if you pass in an empty/null search string or a null replace string without altering the incoming string. I also added a safe count down just in case somehow the search keep looping. This way we don't get stuck in a loop forever. Alter the starting number if you think it is too low.
delimiter //
DROP FUNCTION IF EXISTS `replace_nocase`//
CREATE FUNCTION `replace_nocase`(raw text, find_str varchar(1000), replace_str varchar(1000)) RETURNS text
CHARACTER SET utf8
DETERMINISTIC
BEGIN
declare ret text;
declare len int;
declare hit int;
declare safe int;
if find_str is null or find_str='' or replace_str is null then
return raw;
end if;
set safe=10000;
set ret=raw;
set len=length(find_str);
set hit=LOCATE(find_str,ret);
while hit>0 and safe>0 do
set ret=concat(substring(ret,1,hit-1),replace_str,substring(ret,hit+len));
set hit=LOCATE(find_str,ret,hit+1);
set safe=safe-1;
end while;
return ret;
END//
This question is a bit old but I ran into the same problem and the answers given didn't allow me to solve it entirely.
I wanted the result to retain the case of the original string.
So I made a small modification to the replace_ci function proposed by fvox :
DELIMITER $$
DROP FUNCTION IF EXISTS `replace_ci`$$
CREATE FUNCTION `replace_ci` (str TEXT, needle CHAR(255), str_rep CHAR(255))
RETURNS TEXT
DETERMINISTIC
BEGIN
DECLARE return_str TEXT DEFAULT '';
DECLARE lower_str TEXT;
DECLARE lower_needle TEXT;
DECLARE tmp_needle TEXT;
DECLARE str_origin_char CHAR(1);
DECLARE str_rep_char CHAR(1);
DECLARE final_str_rep TEXT DEFAULT '';
DECLARE pos INT DEFAULT 1;
DECLARE old_pos INT DEFAULT 1;
DECLARE needle_pos INT DEFAULT 1;
IF needle = '' THEN
RETURN str;
END IF;
SELECT LOWER(str) INTO lower_str;
SELECT LOWER(needle) INTO lower_needle;
SELECT LOCATE(lower_needle, lower_str, pos) INTO pos;
WHILE pos > 0 DO
SELECT substr(str, pos, char_length(needle)) INTO tmp_needle;
SELECT '' INTO final_str_rep;
SELECT 1 INTO needle_pos;
WHILE needle_pos <= char_length(tmp_needle) DO
SELECT substr(tmp_needle, needle_pos, 1) INTO str_origin_char;
SELECT SUBSTR(str_rep, needle_pos, 1) INTO str_rep_char;
SELECT CONCAT(final_str_rep, IF(BINARY str_origin_char = LOWER(str_origin_char), LOWER(str_rep_char), IF(BINARY str_origin_char = UPPER(str_origin_char), UPPER(str_rep_char), str_rep_char))) INTO final_str_rep;
SELECT (needle_pos + 1) INTO needle_pos;
END WHILE;
SELECT CONCAT(return_str, SUBSTR(str, old_pos, pos - old_pos), final_str_rep) INTO return_str;
SELECT pos + CHAR_LENGTH(needle) INTO pos;
SELECT pos INTO old_pos;
SELECT LOCATE(lower_needle, lower_str, pos) INTO pos;
END WHILE;
SELECT CONCAT(return_str, SUBSTR(str, old_pos, CHAR_LENGTH(str))) INTO return_str;
RETURN return_str;
END$$
DELIMITER ;
Example of use :
SELECT replace_ci( 'MySQL', 'm', 'e' ) as replaced;
Will return :
| replaced |
| --- |
| EySQL |