I use full-text indexing to find results faster, and it works well except when the term I search for is attached to an underscore inside the database record.
My database records:
article.title
++++++++++++++++++++++++++++++
My article 123456 created
------------------------------
My article new_123456 created
------------------------------
My article 123456_new created
My match against query:
MATCH(article.title) AGAINST ( "123456*" IN BOOLEAN MODE )
This query returns only the first record. The other two are ignored because the term "123456" is attached to an underscore ( _ ), either before or after it.
What did I do wrong, and how can I fix this?
There are many things that can mess up FULLTEXT:
Punctuation
"stop words"
min word "length"
Language
It is sometimes best to edit the data before storing it. In your case, replacing "_" with " " might be the 'right' solution. That could be done either in your application code as you insert strings, or by using MySQL's REPLACE() as the string is INSERTed.
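As a rough illustration of the "edit the data before storing it" idea, here is a minimal Python sketch. The built-in FULLTEXT parser treats the underscore as a word character, so "new_123456" is indexed as a single word. The function name normalize_title is hypothetical, and the sketch assumes the application can store the normalized text (possibly in a separate column) alongside or instead of the original.

```python
def normalize_title(title: str) -> str:
    """Replace underscores with spaces so each token is indexed as its own word."""
    return title.replace("_", " ")


# The sample titles from the question, normalized before INSERT.
titles = [
    "My article 123456 created",
    "My article new_123456 created",
    "My article 123456_new created",
]
normalized = [normalize_title(t) for t in titles]
```

After this step every title contains "123456" as a standalone word, which is what the boolean-mode search term "123456*" needs.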
Maybe the question was not clear: comparing character by character, what difference between the two mail strings below could explain why some strings fail and some do not, given that both consist only of [a-z] characters and sit in the same table (meaning the same collation)? Does anyone have a clue?
I've found several debates explaining the use of LIKE and = in MySQL, but no satisfying answer for this issue:
I've found that searching for whole mailboxes (complete addresses, not truncated, so no wildcards) using LIKE fails to return a few of them in my script, but they do match when I use the = sign instead.
(Of course currently function is updated and working properly with equal sign, but I would appreciate if someone could help me bring some light into this).
I can't reproduce the mailboxes for obvious security reasons, but I can reproduce the structure of the last one I noticed failing (each "x" represents a lowercase, non-special Latin character [a-z]):
select id from table_name where email = "xxx.xxxxxxxxx.xxxxxx#gmail.com"
Returns the id
select id from table_name where email like "xxx.xxxxxxxxx.xxxxxx#gmail.com"
Returns NULL
And this is driving me nuts, because normally it works fine. For example, with a structure like this:
select id from table_name where email = "xxxxx.xxxxxxxxx#gmail.com"
Returns the id
select id from table_name where email like "xxxxx.xxxxxxxxx#gmail.com"
Returns the id too
#_#
Maybe there's something wrong with the LIKE matching when there is more than one dot in the mailbox structure? Any other idea?
Thanks in advance for your time fellows.
I'm developing a Java desktop application that connects to a database, and I would like to know the following. As far as I know, prepared statements prevent SQL injection as long as you don't concatenate user data directly into the query. Today, though, I figured out that they don't escape pattern characters inside a string (like '%' for the LIKE operator), because they only escape characters that could break out of the string itself and alter the query. So, if the user does:
Search = "%Dogs"; // User input
Query = "SELECT * FROM Table WHERE Field LIKE ?";
blah.setString(1, Search);
It will return all the rows where Field ends with 'Dogs', because the user's '%' is treated as a wildcard.
Now I ask:
1-) Is this bad or dangerous in general?
2-) Is there a full list of the pattern characters that MySQL interprets inside a string? If so, could you please share it with me?
Thank you.
If the user uses such meta characters in their search, the results may or may not be catastrophic, but a search for %% could be bad. A valid search for %Dogs may also not return the results the user was expecting which affects their experience.
LIKE only offers two meta characters (% and _), so you can escape them both yourself when they come from users (for example, something akin to Search = Search.replaceAll("%", "\\\\%"), and likewise for _).
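To make the escaping concrete, here is a minimal sketch in Python rather than Java. The helper name escape_like is hypothetical, and SQLite (from the standard library) stands in for MySQL so that the example runs on its own. One difference to keep in mind: SQLite needs an explicit ESCAPE clause, whereas MySQL uses backslash as the LIKE escape character by default.

```python
import sqlite3


def escape_like(term: str, escape: str = "\\") -> str:
    """Escape the escape character first, then the two LIKE wildcards."""
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


# In-memory table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT)")
conn.executemany(
    "INSERT INTO animals VALUES (?)",
    [("Dogs",), ("Hot Dogs",), ("%Dogs",)],
)

user_input = "%Dogs"
# The application adds its own trailing wildcard; the user's '%' stays literal.
pattern = escape_like(user_input) + "%"
rows = conn.execute(
    "SELECT name FROM animals WHERE name LIKE ? ESCAPE '\\'", (pattern,)
).fetchall()
```

With the escaping in place, only the row that literally starts with "%Dogs" is selected; without it, the same input would match all three rows.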
I need some help with a RegEx. The concept is simple, but the actual solution is well beyond anything I know how to figure out. If anyone could explain how I could achieve my desired effect (and provide an explanation with any example code) it'd be much appreciated!
Basically, imagine a database table that stores the following string:
'My name is $1. I wonder who $2 is.'
First, bear in mind that the dollar sign-number format IS set in stone. That's not just for this example--that's how these wildcards will actually be stored. I would like an input like the following to be able to return the above string.
'My name is John. I wonder who Sarah is.'
How would I create a query that searches with wildcards in this format, and then returns the applicable rows? I imagine a regular expression would be the best way. Bear in mind that, theoretically, any number of wildcards should be acceptable.
Right now, this is the part of my existing query that drags the content out of the database. The concatenation, et cetera, is there because in a single database cell, there are multiple strings concatenated by a vertical bar.
AND CONCAT('|', content, '|')
LIKE CONCAT('%|', '" . mysql_real_escape_string($in) . "', '|%')
I need to modify ^this line to work with the variables that are a part of the query, while keeping the current effect (vertical bars, etc) in place. If the RegEx also takes into account the bars, then the CONCAT() functions can be removed.
Here is an example string with concatenation as it might appear in the database:
Hello, my name is $1.|Hello, I'm $1.|$1 is my name!
The query should be able to match with any of those chunks in the string, and then return that row if there is a match. The variables $1 should be treated as wildcards. Vertical bars will always delimit chunks.
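For MySQL, this article is a nice guide which should help you. The regexp would be "(\$)(\d+)" (on MySQL versions before 8.0, write [0-9] instead of \d, which the older regex library does not support). Here's a query I ripped off the article: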
SELECT * FROM posts WHERE content REGEXP '(\\$)(\\d+)';
After retrieving data, use this handy function:
function ParseData($query, $data) {
    $matches = array();
    // Replace one $N placeholder per pass until none remain.
    while (preg_match("/(\\$)(\\d+)/", $query, $matches)) {
        if (array_key_exists(substr($matches[0], 1), $data))
            $query = str_replace($matches[0], "'" . mysql_real_escape_string($data[substr($matches[0], 1)]) . "'", $query);
        else
            $query = str_replace($matches[0], "''", $query);
    }
    return $query;
}
Usage:
$query = "$1 went to $2's house";
$data = array(
    '1' => 'Bob',
    '2' => 'Joe'
);
echo ParseData($query, $data); // Prints: 'Bob' went to 'Joe''s house (each value is wrapped in quotes)
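Going one step further than filling in placeholders, the matching the question asks about (an input such as 'My name is John. I wonder who Sarah is.' matched against stored chunks where $N acts as a wildcard) could be done in application code once candidate rows are fetched. The following Python sketch is only one possible approach. The names chunk_to_regex and matches_row are hypothetical, and it assumes each wildcard must match at least one character.

```python
import re

# A placeholder is a dollar sign followed by one or more digits, e.g. $1 or $12.
PLACEHOLDER = re.compile(r"\$\d+")


def chunk_to_regex(chunk: str) -> re.Pattern:
    """Turn one stored chunk into a regex in which every $N matches any text."""
    literal_parts = PLACEHOLDER.split(chunk)
    # Escape the literal text so characters like '.' or '!' are not special.
    body = "(.+?)".join(re.escape(part) for part in literal_parts)
    return re.compile(body, re.DOTALL)


def matches_row(content: str, user_input: str) -> bool:
    """Return True if the input matches any '|'-delimited chunk of the row."""
    return any(
        chunk_to_regex(chunk).fullmatch(user_input)
        for chunk in content.split("|")
    )


row = "Hello, my name is $1.|Hello, I'm $1.|$1 is my name!"
hit = matches_row(row, "Hello, I'm John.")
miss = matches_row(row, "Goodbye, John.")
```

Because the row is split on the vertical bar before matching, the CONCAT('|', ...) trick from the original query is not needed in this variant.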
If you aren't set on using $1 and $2 and could change them around a bit, you could take a look at this:
http://php.net/manual/en/function.sprintf.php
E.G.
<?php
$num = 5;
$location = 'tree';
$format = 'There are %d monkeys in the %s';
printf($format, $num, $location);
?>
If you want to find entries in the database, then you can use a LIKE statement:
SELECT statement FROM myTable WHERE statement LIKE '%$1%'
Which will find all statements that include $1. I'm assuming that the first number to replace will always be $1 - it doesn't matter, in that case, that the total number of wildcards is arbitrary, as we're just looking for the first one.
The PHP replacement is a little trickier. You could probably do something like:
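$count = 1;
// Compare against false explicitly: strpos() returns 0 for a match at the start.
while (strpos($statement, "$" . $count) !== false) {
    $statement = str_replace("$" . $count, $array[$count], $statement);
    $count++; // move on to the next placeholder
}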
(I've not tested that, so there might be typos in there, but it should be enough to give the general idea.)
The one downside is that it will fail if you have ten or more parameters in your string to replace: the pass that looks for $1 will also replace the first two characters of $10.
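One way around the $1 versus $10 collision is to match each placeholder as a whole with a regular expression instead of replacing one number at a time. Here is a minimal sketch in Python; the name fill_placeholders is hypothetical, and missing values are replaced with an empty string as an assumption.

```python
import re


def fill_placeholders(statement: str, values: dict) -> str:
    """Replace every $N with values[N]; unknown placeholders become ''."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1))
        return str(values.get(index, ""))

    # \d+ is greedy, so "$10" is consumed whole rather than as "$1" plus "0".
    return re.sub(r"\$(\d+)", substitute, statement)


filled = fill_placeholders("$1 sits next to $10", {1: "Bob", 10: "Joe"})
```

The whole string is processed in a single pass, so the order in which the placeholders appear does not matter.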
I asked a different, but similar, question, and I think the solution applies to this question just as well.
https://stackoverflow.com/a/10763476/1382779
I have an InnoDB database table of 12 character codes which often need to be entered by a user.
Occasionally, a user will enter the code incorrectly (for example typing a lower case L instead of a 1 etc).
I'm trying to write a query that will find similar codes to the one they have entered but using LIKE '%code%' gives me way too many results, many of which contain only one matching character.
Is there a way to perform a more detailed check?
Edit - Case sensitive not required.
Any advice appreciated.
Thanks.
Have a look at soundex. Commonly misspelled strings have the same soundex code, so you can query for:
where soundex(Code) like soundex(UserInput)
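Soundex is a phonetic code, so it may not separate look-alike typos such as a lower case L typed for a 1. As a hedged alternative, the candidates could be ranked by similarity in application code. The sketch below uses Python's standard difflib module. The function name similar_codes, the sample codes, and the 0.8 cutoff are all assumptions for illustration, and it presumes the list of codes is small enough to load into memory.

```python
import difflib


def similar_codes(user_input: str, known_codes: list, cutoff: float = 0.8) -> list:
    """Return known codes that closely resemble the input, best match first."""
    # Case sensitivity is not required, so compare everything in upper case.
    candidates = [code.upper() for code in known_codes]
    return difflib.get_close_matches(
        user_input.upper(), candidates, n=5, cutoff=cutoff
    )


# Made-up 12-character codes; the user typed a lower case L instead of a 1.
codes = ["AB12CD34EF56", "ZZ99YY88XX77", "AB12CD34EF57"]
suggestions = similar_codes("ABl2CD34EF56", codes)
```

Raising the cutoff toward 1.0 narrows the suggestions to near-identical codes, which addresses the "way too many results" problem with LIKE '%code%'.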
Use LIKE without the % wildcard for that:
SELECT `code` FROM table WHERE code LIKE 'user_input'
This will also take trailing spaces into account:
SELECT 'a' = 'a ' returns 1, whereas SELECT 'a' LIKE 'a ' returns 0.
reference