In my SQL code, I receive an input string that is used as the filter of a regular-expression match. I want the whole string to be treated as a plain string, even if it contains special characters.
Here is an example:
do $$ declare
jdata jsonb='[{"name":"Dog 3*240+1*120"}]'::jsonb;
vfilter1 text='dog';
vfilter2 text='3*240+1*120';
vexists bool=false;
begin
select jdata @? concat('$[*] ? (@.name like_regex "',vfilter1,'" flag "i")')::jsonpath into vexists;
raise notice 'exists:%',vexists; -- the result is true
select jdata @? concat('$[*] ? (@.name like_regex "',vfilter2,'" flag "i")')::jsonpath into vexists;
raise notice 'exists:%',vexists; -- the result is false
end;
$$ language plpgsql;
The string 3*240+1*120 includes the + and * characters, and I suspect the regular-expression match treats them as special characters. In my code, I want the whole vfilter string, including any special characters, to be matched as a plain string.
What should I do?
You should read the documentation for the feature you are using:
The optional flag string may include one or more of the characters i
for case-insensitive match, m to allow ^ and $ to match at newlines, s
to allow . to match a newline, and q to quote the whole pattern
(reducing the behavior to a simple substring match).
So adding q to the flag string (flag "iq" instead of flag "i") should make like_regex treat 3*240+1*120 as a plain substring.
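For comparison, the same "quote the whole pattern" idea can be sketched outside PostgreSQL. This is only an illustration in Python, not part of the jsonpath API: the helper name contains_literal is hypothetical, and re.escape plays the role that the q flag plays in like_regex.

```python
import re

def contains_literal(text: str, needle: str) -> bool:
    """Case-insensitive substring test that treats `needle` as plain text."""
    # re.escape backslash-escapes every regex metacharacter in the needle,
    # so characters such as * and + lose their special meaning.
    return re.search(re.escape(needle), text, flags=re.IGNORECASE) is not None

name = "Dog 3*240+1*120"
print(contains_literal(name, "dog"))          # True
print(contains_literal(name, "3*240+1*120"))  # True
# Without escaping, * and + act as quantifiers and the match fails:
print(re.search("3*240+1*120", name, flags=re.IGNORECASE) is not None)  # False
```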
In a WordPress theme, I am using the "posts_where" filter to add search on the "excerpt" field. It works except when there is a single quote in the search string, which leads to an SQL syntax error.
It looks like a bug in the preg_replace call inside my posts_where filter.
For example, for the string "o'kine" , the $where string received in the posts_where filter is :
"AND (((cn_posts.post_title LIKE '%o\'kine%') OR (cn_posts.post_content LIKE '%o\'kine%')))"
Then this is my preg_replace to add the post_excerpt field :
$where = preg_replace(
"/post_title\s+LIKE\s*(\'[^\']+\')/",
"post_title LIKE $1) OR (post_excerpt LIKE $1", $where );
And the value of the $where after :
"AND (((cn_posts.post_title LIKE '%o\') OR (post_excerpt LIKE '%o\'kine%') OR (cn_posts.post_content LIKE '%o\'kine%')))"
Note the '%o\' part, which causes the SQL syntax error.
The expected result would be :
"AND (((cn_posts.post_title LIKE '%o\'kine%') OR (post_excerpt LIKE '%o\'kine%') OR (cn_posts.post_content LIKE '%o\'kine%')))"
The bug is clearly in my regular expression, more precisely in my capturing parentheses. How can I handle zero or more single quotes in the search string?
EDIT: Thanks to Casimir et Hippolyte's answer, this is the working filter, which handles single quotes in the search string:
function cn_search_where( $where ) {
$where = preg_replace(
"/post_title\s+LIKE\s*('[^'\\\\]*+(?s:\\\\.[^'\\\\]*)*+')/",
"post_title LIKE $1) OR (post_excerpt LIKE $1", $where );
return $where;
}
The subpattern that matches a quoted string which may contain escaped quotes (or other escaped characters) is:
'[^'\\]*+(?s:\\.[^'\\]*)*+'
(Note that a literal backslash must itself be escaped in a regex pattern, since the backslash is a special character.)
So in a PHP string (where backslashes need to be escaped one more time):
$pattern = "~'[^'\\\\]*+(?s:\\\\.[^'\\\\]*)*+'~";
With this information, I think you can build the pattern yourself.
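Details:
' # a literal single quote
[^'\\]*+ # zero or more characters that are not a single quote or a backslash (possessive)
(?s: # open a non-capturing group with the s modifier (the dot can match newlines)
\\. # a backslash followed by any character, i.e. an escaped character
[^'\\]* # zero or more characters that are not a single quote or a backslash
)*+ # repeat the group zero or more times (possessive)
' # the closing single quote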
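To see the subpattern at work on the $where string from the question, here is a minimal Python sketch. It is an illustration rather than the WordPress filter itself: the function name add_excerpt_search is hypothetical, and the possessive quantifiers (*+) are replaced with plain ones because older Python versions do not support them. That changes backtracking behaviour but not what the pattern matches here.

```python
import re

# A single-quoted SQL string literal in which any character may be
# backslash-escaped (same structure as the subpattern described above).
QUOTED = r"'[^'\\]*(?:\\.[^'\\]*)*'"
# re.DOTALL stands in for the (?s:...) group so an escaped newline also matches.
TITLE_LIKE = re.compile(r"post_title\s+LIKE\s*(" + QUOTED + r")", re.DOTALL)

def add_excerpt_search(where: str) -> str:
    """Duplicate the post_title LIKE clause for post_excerpt."""
    return TITLE_LIKE.sub(r"post_title LIKE \1) OR (post_excerpt LIKE \1", where)

where = r"AND (((cn_posts.post_title LIKE '%o\'kine%') OR (cn_posts.post_content LIKE '%o\'kine%')))"
print(add_excerpt_search(where))
```

With the escaped quote consumed as a backslash-plus-character pair, the captured group is the whole literal '%o\'kine%' rather than stopping at '%o\'.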
I am using opencsv 2.3 and it does not appear to handle escape characters as I expect. I need to be able to handle an escaped separator in a CSV file that does not use quoting characters.
Sample test code:
CSVReader reader = new CSVReader(new FileReader("D:/Temp/test.csv"), ',', '"', '\\');
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
for (String string : nextLine) {
System.out.println("Field [" + string + "].");
}
}
and the csv file:
first field,second\,field
and the output:
Field [first field].
Field [second].
Field [field].
Note that if I change the csv to
first field,"second\,field"
then I get the output I am after:
Field [first field].
Field [second,field].
However, in my case I do not have the option of modifying the source CSV.
Unfortunately it looks like opencsv does not support escaping of separator characters unless they're in quotes. The following method (taken from opencsv's source) is called when an escape character is encountered.
protected boolean isNextCharacterEscapable(String nextLine, boolean inQuotes, int i) {
return inQuotes // we are in quotes, therefore there can be escaped quotes in here.
&& nextLine.length() > (i + 1) // there is indeed another character to check.
&& (nextLine.charAt(i + 1) == quotechar || nextLine.charAt(i + 1) == this.escape);
}
As you can see, this method only returns true inside quotes, and only if the character following the escape character is a quote character or another escape character. You could patch the library to do this, but in its current form it won't let you do what you're trying to do.
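If patching the library is not an option, one workaround is to split such lines by hand. The following is a minimal sketch of that idea in Python, not opencsv code: the function name split_escaped is hypothetical, and it assumes the file uses no quoting at all, that the escape character makes the next character literal, and that fields never contain line breaks.

```python
def split_escaped(line: str, sep: str = ",", esc: str = "\\") -> list[str]:
    """Split `line` on unescaped separators; an escaped character is kept literally."""
    fields: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == esc and i + 1 < len(line):
            # Keep the escaped character as-is and drop the escape itself.
            current.append(line[i + 1])
            i += 2
            continue
        if ch == sep:
            fields.append("".join(current))
            current = []
        else:
            # Ordinary character (or a lone escape at the very end of the line).
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields

print(split_escaped(r"first field,second\,field"))
```

For the sample line from the question this yields the two fields "first field" and "second,field".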
I am using Laravel 4 and have set up the following query:
if(Input::get('keyword')) {
$keyword = Input::get('keyword');
$search = DB::connection()->getPdo()->quote($keyword);
$query->whereRaw("MATCH(resources.name, resources.description, resources.website, resources.additional_info) AGAINST(? IN BOOLEAN MODE)",
array($search)
);
}
This query runs fine under normal use. However, if the user enters a string such as ++, an error is thrown. Looking at the MySQL docs, some characters, such as + and -, have specific purposes. Is there a function that escapes these special characters so the string can be used in a full-text search like the one above without throwing errors?
Here is an example of an error which is thrown:
{"error":{"type":"Illuminate\\Database\\QueryException","message":"SQLSTATE[42000]: Syntax error or access violation: 1064 syntax error, unexpected '+' (SQL: select * from `resources` where `duplicate` = 0 and MATCH(resources.name, resources.description, resources.website, resources.additional_info) AGAINST('c++' IN BOOLEAN MODE))","file":"\/var\/www\/html\/[...]\/vendor\/laravel\/framework\/src\/Illuminate\/Database\/Connection.php","line":555}}
Solutions I've tried:
$search = str_ireplace(['+', '-'], ' ', $keyword);
$search = filter_var($keyword, FILTER_SANITIZE_STRING);
$search = DB::connection()->getPdo()->quote($keyword);
I'm assuming I will need to use regex. What's the best approach here?
In Boolean search mode, only words and operators have meaning. The operators are: +, -, >, <, ( ), ~, *, ", @distance. After some research I found that word characters are upper-case and lower-case letters, digits, and _. I think you can use one of two approaches:
1. Replace all non-word characters with spaces (I prefer this approach). This can be done with a regex:
$search = preg_replace('/[^\p{L}\p{N}_]+/u', ' ', $keyword);
2. Replace only the operator characters with spaces:
$search = preg_replace('/[+\-><\(\)~*\"@]+/', ' ', $keyword);
The full-text search engine indexes only words, so only words can be searched. Non-word characters are not indexed, so it makes no sense to leave them in the search string.
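The first approach can be sketched in Python as follows. This is an illustration only: the function name to_plain_terms is hypothetical, and Python's \W is used as an approximation of the [^\p{L}\p{N}_] class above, since the standard re module has no \p{...} properties (the two classes are close but not guaranteed identical).

```python
import re

def to_plain_terms(keyword: str) -> str:
    """Replace every run of non-word characters with a single space."""
    # For str patterns, \W matches anything that is not a Unicode letter,
    # digit, or underscore, so all Boolean-mode operators are removed.
    return re.sub(r"\W+", " ", keyword).strip()

print(to_plain_terms("c++"))            # c
print(to_plain_terms("+foo -bar @3"))   # foo bar 3
```

The trailing strip() is an addition of this sketch; it only removes the spaces left where leading or trailing operators used to be.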
References:
Boolean Full-Text Searches
Fine-Tuning MySQL Full-Text Search (see: "Character Set Modifications")
PHP: preg_replace
PHP: Unicode character properties
PHP: Possible modifiers in regex patterns
While the answer from Rimas is technically correct, it suits you only if you do not want users to use the MATCH operators, because it strips them all. For example, I want to allow all of them except @distance in the search forms on my site, so I came up with this:
#Trim first
$newValue = preg_replace('/^\p{Z}+|\p{Z}+$/u', '', $string);
#Remove all symbols except allowed operators and space. @distance is not included, since it's unlikely a human will be using it through a UI form
$newValue = preg_replace('/[^\p{L}\p{N}_+\-<>~()"* ]/u', '', $newValue);
#Remove all operators, that can only precede a text and that are not preceded by either beginning of string or space
$newValue = preg_replace('/(?<!^| )[+\-<>~]/u', '', $newValue);
#Remove all double quotes and asterisks, that are not preceded by either beginning of string, letter, number or space
$newValue = preg_replace('/(?<![\p{L}\p{N}_ ]|^)[*"]/u', '', $newValue);
#Remove all double quotes and asterisks that are inside text, keeping the surrounding characters
$newValue = preg_replace('/([\p{L}\p{N}_])([*"])([\p{L}\p{N}_])/u', '$1$3', $newValue);
#Remove all opening parenthesis which are not preceded by beginning of string or space
$newValue = preg_replace('/(?<!^| )\(/u', '', $newValue);
#Remove all closing parentheses which are not preceded by a letter, number or underscore, or are not followed by end of string or space
$newValue = preg_replace('/(?<![\p{L}\p{N}_])\)|\)(?! |$)/u', '', $newValue);
#Remove all double quotes if the count is not even
if (substr_count($newValue, '"') % 2 !== 0) {
$newValue = preg_replace('/"/u', '', $newValue);
}
#Remove all parenthesis if count of closing does not match count of opening ones
if (substr_count($newValue, '(') !== substr_count($newValue, ')')) {
$newValue = preg_replace('/[()]/u', '', $newValue);
}
Unfortunately I was not able to figure out a way to do this in a single regex, hence the multiple passes. It's also possible that I am missing some edge cases. Any additions or corrections are appreciated: either here, or create an issue for https://github.com/Simbiat/database where I implement this.
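The last two steps (dropping quotes or parentheses when they cannot be balanced) do not need a regex at all. A minimal Python sketch of just those two checks, with the hypothetical name drop_unbalanced, might look like this:

```python
def drop_unbalanced(query: str) -> str:
    """Remove all double quotes and/or parentheses when their counts cannot pair up."""
    # An odd number of double quotes can never form complete phrases.
    if query.count('"') % 2 != 0:
        query = query.replace('"', "")
    # Differing counts of ( and ) mean at least one parenthesis is unmatched.
    if query.count("(") != query.count(")"):
        query = query.replace("(", "").replace(")", "")
    return query

print(drop_unbalanced('"foo bar'))          # foo bar
print(drop_unbalanced('+(foo bar) "baz"'))  # unchanged
```

Like the PHP version, this only compares counts, so an input such as )foo( passes the check even though the parentheses are in the wrong order.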
I've a String with this value: Hello ""World""
I'd like to replace the double-double quotes characters with a single double quote, like this: Hello "World"
How can I do that? I usually call the split().join() methods, but I cannot find a regular expression that works.
You can do a string replace with regex like this:
str = str.replace(/""/g, '"');
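Since the thing being replaced is a fixed two-character sequence, a plain (non-regex) replacement works just as well. As a small sketch of both variants in Python (the helper name collapse_doubled_quotes is hypothetical):

```python
import re

def collapse_doubled_quotes(text: str) -> str:
    """Turn every pair of adjacent double quotes into a single double quote."""
    # str.replace substitutes all non-overlapping occurrences, left to right.
    return text.replace('""', '"')

print(collapse_doubled_quotes('Hello ""World""'))  # Hello "World"
# The regex form is equivalent here; a double quote needs no escaping in a pattern.
print(re.sub('""', '"', 'Hello ""World""'))         # Hello "World"
```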