preg_replace: capturing an escaped single quote inside a single-quoted expression - mysql

In a WordPress theme, I am using the "posts_where" filter to add search on the "excerpt" field. It works except when there is a single quote in the search string, which leads to a SQL syntax error.
It seems to be a bug in the preg_replace call in my posts_where filter.
For example, for the string "o'kine", the $where string received in the posts_where filter is:
"AND (((cn_posts.post_title LIKE '%o\'kine%') OR (cn_posts.post_content LIKE '%o\'kine%')))"
Then this is my preg_replace call to add the post_excerpt field:
$where = preg_replace(
"/post_title\s+LIKE\s*(\'[^\']+\')/",
"post_title LIKE $1) OR (post_excerpt LIKE $1", $where );
And the value of $where afterwards:
"AND (((cn_posts.post_title LIKE '%o\') OR (post_excerpt LIKE '%o\'kine%') OR (cn_posts.post_content LIKE '%o\'kine%')))"
See the '%o\' part that is causing the SQL syntax error.
The expected result would be:
"AND (((cn_posts.post_title LIKE '%o\'kine%') OR (post_excerpt LIKE '%o\'kine%') OR (cn_posts.post_content LIKE '%o\'kine%')))"
The bug is clearly in my regular expression, more precisely in my capturing parentheses. How do I deal with the possibility of zero or more (escaped) single quotes in my search string?
EDIT: With Casimir et Hippolyte's answer, this is the working filter that handles a single quote in the search string:
function cn_search_where( $where ) {
$where = preg_replace(
"/post_title\s+LIKE\s*('[^'\\\\]*+(?s:\\\\.[^'\\\\]*)*+')/",
"post_title LIKE $1) OR (post_excerpt LIKE $1", $where );
return $where;
}
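The sketch below is a minimal harness that runs both the original pattern and the working one on the sample $where string from the question. The function names broken_where and fixed_where are only for illustration. The first pattern's capture stops at the backslash-escaped quote. The second consumes each backslash together with the character that follows it.

```php
<?php
// Sample WHERE clause from the question: the quote in o'kine arrives escaped as \'
$where = "AND (((cn_posts.post_title LIKE '%o\\'kine%') OR (cn_posts.post_content LIKE '%o\\'kine%')))";

// Original pattern: [^']+ stops at the escaped quote, so $1 captures only '%o\'
function broken_where(string $where): string {
    return preg_replace(
        "/post_title\s+LIKE\s*(\'[^\']+\')/",
        'post_title LIKE $1) OR (post_excerpt LIKE $1',
        $where
    );
}

// Working pattern: \\. eats a backslash plus the escaped character,
// so an escaped quote can no longer end the captured literal
function fixed_where(string $where): string {
    return preg_replace(
        "/post_title\s+LIKE\s*('[^'\\\\]*+(?s:\\\\.[^'\\\\]*)*+')/",
        'post_title LIKE $1) OR (post_excerpt LIKE $1',
        $where
    );
}

echo broken_where($where), "\n"; // the broken string shown in the question
echo fixed_where($where), "\n";  // the expected string shown in the question
```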

The subpattern to match a quoted string that may contain escaped quotes (or other escaped characters) is:
'[^'\\]*+(?s:\\.[^'\\]*)*+'
(Note that to represent a literal backslash in a regex pattern, you must escape it, since the backslash is a special character.)
So in a PHP string (where backslashes need to be escaped once more):
$pattern = "~'[^'\\\\]*+(?s:\\\\.[^'\\\\]*)*+'~";
With this information, I think you can build the pattern yourself.
details:
' # a literal single quote
[^'\\]*+ # zero or more characters that are not a single quote or a backslash
(?s: # open a non-capture group with the s modifier (the dot can match newlines)
\\. # an escaped character
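[^'\\]* # zero or more characters that are not a single quote or a backslash
)*+ # repeat the group zero or more times
' # a literal closing single quote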
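Here is a quick standalone check of the subpattern with preg_match_all. It runs on a made-up SQL fragment containing three literals: one with an escaped quote, one with an escaped backslash right before the closing quote, and one that is empty.

```php
<?php
// The quoted-string subpattern from above, as a PHP double-quoted string
$pattern = "~'[^'\\\\]*+(?s:\\\\.[^'\\\\]*)*+'~";

// Actual text of $sql: SELECT 'it\'s', 'C:\\', ''
$sql = "SELECT 'it\\'s', 'C:\\\\', ''";

preg_match_all($pattern, $sql, $m);
print_r($m[0]); // three complete literals: 'it\'s', 'C:\\' and ''
```

The pattern [^'\\]*+ never consumes a backslash. Every backslash therefore goes through the \\. branch together with the character it escapes, which is why 'C:\\' is still closed correctly.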

How to have a whole string as a normal string in PostgreSQL's regular match?

In my SQL code, I receive an input string as a regular-expression filter. I want the whole string to be treated as a plain string, even if it includes special characters.
Just look below:
do $$ declare
jdata jsonb='[{"name":"Dog 3*240+1*120"}]'::jsonb;
vfilter1 text='dog';
vfilter2 text='3*240+1*120';
vexists bool=false;
begin
select jdata @? concat('$[*] ? (@.name like_regex "',vfilter1,'" flag "i")')::jsonpath into vexists;
raise notice 'exists:%',vexists; --the result is true
select jdata @? concat('$[*] ? (@.name like_regex "',vfilter2,'" flag "i")')::jsonpath into vexists;
raise notice 'exists:%',vexists;-- the result is false
end;
$$ language plpgsql;
The string 3*240+1*120 includes + and * characters; perhaps this causes the regular match to treat them as special characters. In my code, I just want the whole vfilter string, special characters included, to be matched as a plain string.
What should I do?
You should read the documentation for the feature you are using.
The optional flag string may include one or more of the characters i
for case-insensitive match, m to allow ^ and $ to match at newlines, s
to allow . to match a newline, and q to quote the whole pattern
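(reducing the behavior to a simple substring match).
So in your case, flag "iq" (case-insensitive plus quoted pattern) should make vfilter2 match as a plain substring.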

Powershell - How to build a function to replace certain special characters with others

I'm trying to build a function in Powershell that I can use every time I have to replace a list of special characters with other characters. For example, this is the list of the special chars:
$Specialchars = '["ü","ä","ö","ß","æ","Œ","œ","°","~","Ø"]'
And I want that function to replace for example the "ü" with "ue" or the "Ø" with "o" and so on. The idea is to run the function against any string that I want to have those special characters replaced the way I want.
For now, I tried this:
function ReplaceSpecialChars($chars)
{
return $chars -replace "Ø","o"
return $chars -replace "~",""
return $chars -replace "œ","oe"
}
It only works for the first special character, "Ø". It does not work, for example, when I run it against a string that contains "~". I'm not a PowerShell expert at all, so I was wondering if somebody has some hints to help me.
Thanks a lot !
A return statement exits the function immediately, so only the first -replace ever runs. You must assign the result of each replacement before doing the next one.
Here is an updated version of the function, including a way to define your replacements:
function ReplaceSpecialChars([string]$string) {
# define replacements:
@{
"ä" = "ae"
"ß" = "ss"
# ...
}.GetEnumerator() | foreach {
$string = $string.Replace($_.Key, $_.Value)
}
return $string
}
Note that you might want to define lowercase + uppercase replacements.

Escape string for use in MySQL fulltext search

I am using Laravel 4 and have set up the following query:
if(Input::get('keyword')) {
$keyword = Input::get('keyword');
$search = DB::connection()->getPdo()->quote($keyword);
$query->whereRaw("MATCH(resources.name, resources.description, resources.website, resources.additional_info) AGAINST(? IN BOOLEAN MODE)",
array($search)
);
}
This query runs fine under normal use. However, if the user enters a string such as ++, an error is thrown. According to the MySQL docs, some operators, such as + and -, have specific purposes. Is there a function that will escape these special characters in a string, so it can be used in a full-text search like the one above without throwing errors?
Here is an example of an error which is thrown:
{"error":{"type":"Illuminate\\Database\\QueryException","message":"SQLSTATE[42000]: Syntax error or access violation: 1064 syntax error, unexpected '+' (SQL: select * from `resources` where `duplicate` = 0 and MATCH(resources.name, resources.description, resources.website, resources.additional_info) AGAINST('c++' IN BOOLEAN MODE))","file":"\/var\/www\/html\/[...]\/vendor\/laravel\/framework\/src\/Illuminate\/Database\/Connection.php","line":555}}
Solutions I've tried:
$search = str_ireplace(['+', '-'], ' ', $keyword);
$search = filter_var($keyword, FILTER_SANITIZE_STRING);
$search = DB::connection()->getPdo()->quote($keyword);
I'm assuming I will need to use regex. What's the best approach here?
Only words and operators have meaning in Boolean search mode. The operators are: +, -, > <, ( ), ~, *, ", @distance. After some research, I found that word characters are uppercase and lowercase letters, digits and _. I think you can use one of two approaches:
Replace all non-word characters with spaces (I prefer this approach). This can be done with a regex:
$search = preg_replace('/[^\p{L}\p{N}_]+/u', ' ', $keyword);
Replace operator characters with spaces:
$search = preg_replace('/[+\-><\(\)~*\"@]+/', ' ', $keyword);
Only words are indexed by the full-text search engine and can be searched. Non-word characters aren't indexed, so it does not make sense to leave them in the search string.
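The sketch below compares the two approaches on a sample keyword. The function names strip_non_words and strip_operators are hypothetical. In the operator class, @ stands for the @distance operator. The results are deliberately not trimmed, so the difference in spacing stays visible.

```php
<?php
// Approach 1: collapse every run of non-word characters into one space
function strip_non_words(string $keyword): string {
    return preg_replace('/[^\p{L}\p{N}_]+/u', ' ', $keyword);
}

// Approach 2: replace only runs of Boolean-mode operator characters
function strip_operators(string $keyword): string {
    return preg_replace('/[+\-><\(\)~*\"@]+/', ' ', $keyword);
}

$keyword = '-foo +bar (baz) node.js';
echo '[' . strip_non_words($keyword) . "]\n"; // [ foo bar baz node js]
echo '[' . strip_operators($keyword) . "]\n"; // [ foo  bar  baz  node.js]
```

Approach 1 also splits node.js into two words, while approach 2 keeps the dot. In both cases, wrapping the call in trim() removes the leading and trailing spaces before the value is bound to AGAINST().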
References:
Boolean Full-Text Searches
Fine-Tuning MySQL Full-Text Search (see: "Character Set Modifications")
PHP: preg_replace
PHP: Unicode character properties
PHP: Possible modifiers in regex patterns
While the answer from Rimas is technically correct, it will suit you only if you do not want users to use the MATCH operators, because it strips them all out. For example, I do want to allow all of them except @distance in the search forms on my site, so I came up with this:
#Trim first ($string is the raw user input)
$newValue = preg_replace('/^\p{Z}+|\p{Z}+$/u', '', $string);
#Remove all symbols except allowed operators and space. @distance is not included, since it's unlikely a human will use it through a UI form
$newValue = preg_replace('/[^\p{L}\p{N}_+\-<>~()"* ]/u', '', $newValue);
#Remove all operators that can only precede text and that are not preceded by either the beginning of the string or a space
$newValue = preg_replace('/(?<!^| )[+\-<>~]/u', '', $newValue);
#Remove all double quotes and asterisks, that are not preceded by either beginning of string, letter, number or space
$newValue = preg_replace('/(?<![\p{L}\p{N}_ ]|^)[*"]/u', '', $newValue);
#Remove all double quotes and asterisks that are inside text (between two letters, numbers or underscores), keeping the surrounding characters
$newValue = preg_replace('/([\p{L}\p{N}_])([*"])([\p{L}\p{N}_])/u', '$1$3', $newValue);
#Remove all opening parenthesis which are not preceded by beginning of string or space
$newValue = preg_replace('/(?<!^| )\(/u', '', $newValue);
#Remove all closing parentheses that are not preceded by a letter, number or underscore, or that are not followed by a space or the end of the string
$newValue = preg_replace('/(?<![\p{L}\p{N}_])\)|\)(?! |$)/u', '', $newValue);
#Remove all double quotes if the count is not even
if (substr_count($newValue, '"') % 2 !== 0) {
$newValue = preg_replace('/"/u', '', $newValue);
}
#Remove all parenthesis if count of closing does not match count of opening ones
if (substr_count($newValue, '(') !== substr_count($newValue, ')')) {
$newValue = preg_replace('/[()]/u', '', $newValue);
}
Unfortunately, I was not able to figure out a way to do this in one regex, so it takes multiple passes. It's also possible that I am missing some edge cases. Any additions or corrections are appreciated, either here or as an issue at https://github.com/Simbiat/database, where I implement this.

preg_match_all: get text inside quotes except in html tags

I've recently used a pattern to replace straight double quotes with pairs of opening/closing curly double quotes.
$string = preg_replace('/(\")([^\"]+)(\")/','“$2”',$string);
It works fine when $string is a sentence, even a paragraph.
But…
My function can also be called to do the job on a chunk of HTML code, and then it no longer works as expected:
$string = preg_replace('/(\")([^\"]+)(\")/','“$2”','<a href="page.html">Something "with" quotes</a>');
returns
<a href=“page.html”>Something “with” quotes</a>
And that's a problem…
So I thought I could do it in two passes: extract text within tags, then replace quotes.
I tried this
$pattern='/<[^>]+>(.*)<\/[^>]+>/';
And it works for instance if the string is
$string='Something "with" quotes';
But it's not working with strings like:
$string='Something "with" quotes Something "with" quotes';
Any idea?
Bertrand
The usual reply, I guess... As has already been pointed out, you should not parse HTML with regex. You can take a look at PHP Simple HTML DOM Parser to extract the text and then apply your regex. From what you have already said, the regex itself seems to work just fine.
This tutorial should put you in the right direction.
I'm quite sure that this will end in a flame war but this works:
echo do_replace('Something "with" quotes')."\n";
echo do_replace('Something "with" quotes Something "with" quotes')."\n";
function do_replace($string){
preg_match_all('/<([^"]*?|"[^"]*")*>/', $string, $matches);
$matches = array_flip($matches[0]);
$uuid = md5(mt_rand());
while(strpos($string, $uuid) !== false) $uuid = md5(mt_rand());
// if you want better (time) guarantees, you could build a prefix tree and search it for a string not in it (would be O(n))
foreach($matches as $key => $value)
$matches[$key] = $uuid.$value.'_'; // the '_' terminator stops placeholder 1 from also matching the start of placeholder 10
$string = str_replace(array_keys($matches), $matches, $string);
$string = preg_replace('/\"([^\"<]+)\"/','“$1”', $string);
return str_replace($matches, array_keys($matches), $string);
}
Output:
Something “with” quotes
Something “with” quotes Something “with” quotes
With a custom state machine, you could even do it without the first replace and the replace back. I recommend using a parser anyway.
I finally found a way:
extract text that can be inside, or outside (before, after) any tag (if any)
use a callback to find quotes by pair and replace them.
code
// create_function() was removed in PHP 8.0; an anonymous function does the same job
$string = preg_replace_callback('/[^<>]*(?!([^<]+)?>)/sim', function ($matches) { return preg_replace('/(\")([^\"]+)(\")/', '“$2”', $matches[0]); }, $string);
Bertrand, resurrecting this question because it had a simple solution that lets you do the replace in one go—no need for a callback. (Found your question while doing some research for a general question about how to exclude patterns in regex.)
Here's our simple regex:
<[^>]*>(*SKIP)(*F)|"([^"]*)"
The left side of the alternation matches complete <tags> then deliberately fails. The right side matches double-quoted strings, and we know they are the right strings because they were not matched by the expression on the left.
This code shows how to use the regex (see the results at the bottom of the online demo):
<?php
$regex = '~<[^>]*>(*SKIP)(*F)|"([^"]*)"~';
$subject = 'Something "with" quotes Something "with" quotes';
$replaced = preg_replace($regex,"“$1”",$subject);
echo $replaced."<br />\n";
?>
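The subject strings in this question lost their HTML markup. Here is the same regex wrapped in a small function (the name curly_quotes_outside_tags is made up) and run on explicit HTML input. The quotes inside the href and class attributes are left alone.

```php
<?php
// Left branch: match a whole tag, then (*SKIP)(*F) discards it and resumes after the tag.
// Right branch: a straight-quoted run outside tags, captured in group 1.
function curly_quotes_outside_tags(string $html): string {
    return preg_replace('~<[^>]*>(*SKIP)(*F)|"([^"]*)"~', '“$1”', $html);
}

echo curly_quotes_outside_tags('<a href="page.html">Something "with" quotes</a>'), "\n";
// <a href="page.html">Something “with” quotes</a>
echo curly_quotes_outside_tags('<p class="note">Say "hi"</p> and "bye"'), "\n";
// <p class="note">Say “hi”</p> and “bye”
```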
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...

How do I escape special characters in MySQL?

For example:
select * from tablename where fields like "%string "hi" %";
Error:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'hi" "' at line 1
How do I build this query?
The information provided in this answer can lead to insecure programming practices.
The information provided here depends highly on MySQL configuration, including (but not limited to) the program version, the database client and character-encoding used.
See http://dev.mysql.com/doc/refman/5.0/en/string-literals.html
MySQL recognizes the following escape sequences.
\0 An ASCII NUL (0x00) character.
\' A single quote (“'”) character.
\" A double quote (“"”) character.
\b A backspace character.
\n A newline (linefeed) character.
\r A carriage return character.
\t A tab character.
\Z ASCII 26 (Control-Z). See note following the table.
\\ A backslash (“\”) character.
\% A “%” character. See note following the table.
\_ A “_” character. See note following the table.
So you need
select * from tablename where fields like "%string \"hi\" %";
Although as Bill Karwin notes below, using double quotes for string delimiters isn't standard SQL, so it's good practice to use single quotes. This simplifies things:
select * from tablename where fields like '%string "hi" %';
I've developed my own MySQL escape method in Java (if useful for anyone).
See class code below.
Warning: wrong if NO_BACKSLASH_ESCAPES SQL mode is enabled.
import java.util.HashMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Wrapper class added so the snippet compiles; the class name is arbitrary.
public class MySqlEscaper
{
private static final HashMap<String,String> sqlTokens;
private static final Pattern sqlTokenPattern;
static
{
//MySQL escape sequences: http://dev.mysql.com/doc/refman/5.1/en/string-syntax.html
String[][] search_regex_replacement = new String[][]
{
//search string search regex sql replacement regex
{ "\u0000" , "\\x00" , "\\\\0" },
{ "'" , "'" , "\\\\'" },
{ "\"" , "\"" , "\\\\\"" },
{ "\b" , "\\x08" , "\\\\b" },
{ "\n" , "\\n" , "\\\\n" },
{ "\r" , "\\r" , "\\\\r" },
{ "\t" , "\\t" , "\\\\t" },
{ "\u001A" , "\\x1A" , "\\\\Z" },
{ "\\" , "\\\\" , "\\\\\\\\" }
};
sqlTokens = new HashMap<String,String>();
String patternStr = "";
for (String[] srr : search_regex_replacement)
{
// literal character -> replacement, already escaped for appendReplacement
sqlTokens.put(srr[0], srr[2]);
// build one alternation of all the search regexes
patternStr += (patternStr.isEmpty() ? "" : "|") + srr[1];
}
sqlTokenPattern = Pattern.compile('(' + patternStr + ')');
}
public static String escape(String s)
{
Matcher matcher = sqlTokenPattern.matcher(s);
StringBuffer sb = new StringBuffer();
while(matcher.find())
{
// replace each special character with its MySQL escape sequence
matcher.appendReplacement(sb, sqlTokens.get(matcher.group(1)));
}
matcher.appendTail(sb);
return sb.toString();
}
}
You should use single-quotes for string delimiters. The single-quote is the standard SQL string delimiter, and double-quotes are identifier delimiters (so you can use special words or characters in the names of tables or columns).
In MySQL, double-quotes work (nonstandardly) as a string delimiter by default (unless you set ANSI SQL mode). If you ever use another brand of SQL database, you'll benefit from getting into the habit of using quotes standardly.
Another handy benefit of using single-quotes is that the literal double-quote characters within your string don't need to be escaped:
select * from tablename where fields like '%string "hi" %';
MySQL has the string function QUOTE(), which returns its argument as a properly escaped, single-quoted string literal, so it should solve the problem.
For strings like that, for me the most comfortable way to do it is doubling the ' or ", as explained in the MySQL manual:
There are several ways to include quote characters within a string:
A “'” inside a string quoted with “'” may be written as “''”.
A “"” inside a string quoted with “"” may be written as “""”.
Precede the quote character by an escape character (“\”).
A “'” inside a string quoted with “"” needs no special treatment and need not be doubled or escaped. In the same way, “"” inside a string quoted with “'” needs no special treatment.
It is from http://dev.mysql.com/doc/refman/5.0/en/string-literals.html.
You can use mysql_real_escape_string. mysql_real_escape_string() does not escape % and _, so you should escape MySQL wildcards (% and _) separately.
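Here is a minimal sketch of escaping the wildcards separately. It rests on two assumptions. First, the value is then passed through a PDO prepared statement; the helper name escape_like and the $pdo connection are hypothetical. Second, backslash is LIKE's escape character, which is MySQL's default unless NO_BACKSLASH_ESCAPES is enabled. Note also that the old mysql extension, including mysql_real_escape_string(), was removed in PHP 7.0.

```php
<?php
// Escape the LIKE wildcards (% and _) and the escape character itself (\)
function escape_like(string $value): string {
    return strtr($value, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']);
}

// Hypothetical usage with an existing PDO connection $pdo:
// $stmt = $pdo->prepare('SELECT * FROM tablename WHERE fields LIKE ?');
// $stmt->execute(['%' . escape_like($userInput) . '%']);

echo escape_like('50%_off\\'), "\n"; // 50\%\_off\\
```

strtr() with an array replaces in a single pass, so the backslashes it inserts are not escaped a second time.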
To test inserting double quotes in MySQL from the terminal, you can insert a placeholder and replace it afterwards:
Schema: TableName(Name, DString)
insert into TableName values("Name","My QQDoubleQuotedStringQQ");
After inserting the value, you can update it in the database, replacing the placeholder with double quotes (or single quotes):
update TableName set DString = replace(DString, 'QQ', '"');
If you're using a variable when searching in a string, mysql_real_escape_string() is good for you. Just my suggestion:
$char = "and way's 'hihi'";
$myvar = mysql_real_escape_string($char);
$sql = "select * from tablename where fields like '%string $myvar %'";