SQL replace function with MATCH() AGAINST() - mysql

I would like to use the REPLACE() function inside a MATCH() call to remove \n character sequences before it searches for matching rows. Otherwise, for example, if the text is FULLTEXT\nsearch and the search is search, it will not match.
Here is my (simplified) query:
SELECT * FROM messages WHERE MATCH(REPLACE(body,'\\n',' ')) AGAINST ('mysearch' IN BOOLEAN MODE)
But it throws an error...
[EDIT]
After Shadow's answer, I tried this:
SELECT * FROM (SELECT REPLACE(body,'\\n',' ') AS rb FROM messages) AS rbody WHERE MATCH(rb) AGAINST ('mysearch');
I think the idea is correct, but I get the error ERROR 1210 (HY000): Incorrect arguments to MATCH. I think this is because I didn't index the column rb (FULLTEXT INDEX (rb)), so the MATCH() AGAINST() operation won't work.
So I am updating my question: how can one index a column of a subquery?

The answer is that you cannot dynamically remove the \n character sequence within a MATCH() call. As the MySQL manual on MATCH() says:
MATCH() takes a comma-separated list that names the columns to be searched.
You either have to store \n differently (not as a character sequence), or you need a separate field in which these characters are already filtered out, and use that additional field for fulltext searches.
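A minimal sketch of the "separate field" approach, assuming the application builds the filtered copy itself before the INSERT. The function name normalize_for_search and the idea of storing its result in an extra FULLTEXT-indexed column (for example a hypothetical search_body) are illustrations, not part of the original answer.

```python
def normalize_for_search(body: str) -> str:
    """Return a copy of body suitable for a FULLTEXT-indexed column.

    The literal two-character sequence backslash + 'n' (as in the question)
    and real line breaks become spaces, so 'FULLTEXT\\nsearch' is indexed
    as the two words 'FULLTEXT' and 'search'.
    """
    # Replace the escaped sequence first: a backslash followed by 'n'
    text = body.replace("\\n", " ")
    # Also treat genuine CR/LF characters as word separators
    text = text.replace("\r", " ").replace("\n", " ")
    # Collapse runs of whitespace into single spaces and trim the ends
    return " ".join(text.split())


# Hypothetical row for an INSERT that fills both columns:
# (body, search_body) -> the original text and its searchable copy
raw = "FULLTEXT\\nsearch"
row = (raw, normalize_for_search(raw))
```

MATCH() would then name the hypothetical search_body column, while the original body column stays untouched for display.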

Actually, while waiting for a better solution, I will just add a column raw_body to my table, where I will store the exact body (I won't escape it with real_escape_string; I will just manually replace " and ' with \" and \'), and I will prepare the query and bind the params. However, I don't know if it is secure enough against SQL injection.
[UPDATE]
Actually, I found out that I didn't even need to manually escape quotes, since the prepared statement is enough to prevent SQL injection. So I think I will keep this solution for the moment.
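To illustrate why manual escaping is unnecessary with bound parameters, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for a MySQL driver. This is an assumption made for a self-contained example: MySQL client libraries use their own placeholder syntax, but the principle of passing the value separately from the SQL text is the same.

```python
import sqlite3


def store_and_fetch(raw_body: str) -> str:
    """Insert raw_body via a bound parameter and read it back unchanged."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE messages (id INTEGER PRIMARY KEY, raw_body TEXT)"
        )
        # The value is bound to the ? placeholder instead of being pasted
        # into the SQL text, so quotes in raw_body need no manual escaping
        conn.execute("INSERT INTO messages (raw_body) VALUES (?)", (raw_body,))
        (stored,) = conn.execute("SELECT raw_body FROM messages").fetchone()
        return stored
    finally:
        conn.close()


tricky = "It's a \"quoted\" body'); DROP TABLE messages; --"
print(store_and_fetch(tricky) == tricky)  # True
```

The quotes and the injection attempt come back as ordinary data. Escaping them by hand first would store the backslashes as well.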

Related

How to get # word from database field with mysql?

I have a database field called "vCaption" that contains sentences.
Somewhere in those sentences there are words that start with a # symbol. I need those particular words from the sentence, and if no word starting with # exists in the record, it should return null.
For example:
"my #childhood image from #1992 with my #Dad"
I have the above record in my table.
What I need is only these three words:
childhood, 1992, Dad.
I tried REGEXP and other MySQL functions, but they don't give me exactly what I need.
Please help me here.
SELECT vCaption FROM tbl_post WHERE vCaption REGEXP '(?<= #|^#)\S*'
I wrote the query above. It returns this error:
"#1139 - Got error 'repetition-operator operand invalid' from regexp"
To select the rows that have words starting with a #, you can use this:
SELECT mycolumn FROM mytable WHERE mycolumn REGEXP "#[[:alnum:]]+";
But this only selects the rows you want, not the individual words.
You could:
try to transform mycolumn in the SELECT using MySQL string functions to remove unwanted words... frankly, not sure if that's possible.
post-process the selected rows in your language to extract the words you want. For instance, in PHP, preg_match_all("~#[[:alnum:]]+~", $yourstring, $m) would put all the #words into $m[0].
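The same post-processing step, sketched in Python rather than PHP. Python's re module has no POSIX [[:alnum:]] class, so \w (letters, digits and underscore) is used instead, which is a slight difference from the PHP pattern. Returning None for captions without tags follows the question's "return null" requirement.

```python
import re
from typing import List, Optional

# '#' followed by one or more word characters (letters, digits, underscore)
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(caption: str) -> Optional[List[str]]:
    """Return the words that follow '#', or None when there are none."""
    words = HASHTAG.findall(caption)
    return words or None


print(extract_hashtags("my #childhood image from #1992 with my #Dad"))
# ['childhood', '1992', 'Dad']
```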

mysql replace last character if match

I have a table called media with a column called accounts_used in which the rows appear in the following format:
68146, 67342, 60577, 61506, 67194, 67034, 63484, 49113, 61518, 66971, 67511,
67351, 63621, 67725, 63638, 68141, 66114, 67262, 67537, 67537, 61765, 63701,
67087, 62641, 61294, 67063, 67049, 67038, 67170, 67147, 67289, 61264, 67091,
63690, 63505, 63505, 49172, 52313, 67070, 66945, 67234, 62265, 61368, 67870,
67211, 67586, 49240, 67538, 67538, 67809, 67183, 67164, 62712, 67519, 66895,
67693, 60266, 60266, 67593, 67031, 67137, 62570, 60682, 61195, 67569, 67569,
67069, 62082, 67345, 61748, 61553, 52029, 66877, 62630, 67196, 67196, 67196,
67196, 67196, 67196, 66873, 63677, 68174, 67127, 63594, 67107, 60419, 66601,
68156, 67203, 68161, 60233, 66586, 52654, 63570, 66887, 67191, 60877, 52108,
67131, 61784, 67566, 67162, 67073, 67092, 67064, 60133, 66907, 67559, 66846,
60490, 60347, 66558, 48737, 61539, 67236, 68135, 67238 , 63656, 67585, 67512
If the row has a comma at the end, I want to remove it. For example, if the row looks like this:
1,2,3,4,5,6,
I want to replace it to just this
1,2,3,4,5,6
Is this possible to do using just a simple query?
It is a bad idea to store lists of ids in rows. But, you are doing it. You can fix this by doing:
update media
set accounts_used = left(accounts_used, length(accounts_used) - 1)
where accounts_used like '%,';
Instead, you should have a MediaAccounts table, with one row per media item and account.
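A rough sketch of migrating the comma-separated strings into such a table. It assumes a hypothetical media_accounts(media_id, account_id) table with a primary key on the pair, so repeated ids in one list (the sample data contains several) are collapsed to a single row. The function only builds the rows to insert.

```python
from typing import List, Tuple


def to_media_account_rows(media_id: int, accounts_used: str) -> List[Tuple[int, int]]:
    """Split a comma-separated id list into (media_id, account_id) pairs.

    Empty pieces (e.g. from a trailing comma) are skipped, and stray spaces
    around ids such as '67238 ,' are ignored. Duplicate ids are kept once,
    in first-seen order, to respect the assumed primary key on the pair.
    """
    rows: List[Tuple[int, int]] = []
    seen = set()
    for piece in accounts_used.split(","):
        piece = piece.strip()
        if not piece:
            continue
        account_id = int(piece)
        if account_id not in seen:
            seen.add(account_id)
            rows.append((media_id, account_id))
    return rows
```

Once the data is stored this way, the trailing-comma problem cannot occur.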
EDIT:
Possibly, the row ends with a ', ' rather than just a comma:
update media
set accounts_used = left(accounts_used, length(accounts_used) - 2)
where accounts_used like '%, ';
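The two UPDATE statements handle two different endings. A value ending in ", " is not matched by LIKE '%,' because its last character is a space. This Python sketch of the same trim logic, an illustration rather than part of the answer, makes both cases explicit.

```python
def trim_trailing_separator(value: str) -> str:
    """Drop one trailing ', ' or ',' from value, mirroring the two UPDATEs."""
    # Case from the EDIT: the value ends with a comma followed by a space
    if value.endswith(", "):
        return value[:-2]
    # Original case: the value ends with a bare comma
    if value.endswith(","):
        return value[:-1]
    return value
```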
We faced a similar string-replacement issue with a large dataset of bibliographic entries, where we also needed to trim extraneous punctuation from a large number of strings that had been imported verbatim from another system. Many of the records in our dataset also contained Unicode characters, so we needed an SQL query that would find the records that needed updating and then update them in a way that was compatible with Unicode (multibyte characters) under MySQL.
In testing with our dataset, I found that searching for the records we needed to update using MySQL's LEFT() and RIGHT() substring functions performed better than using a LIKE pattern-match query. Additionally, MySQL's LENGTH() function returns the number of bytes in a string rather than the number of characters. The distinction is important for string fields that may contain multibyte character sequences, because MySQL's substring functions count characters, not bytes. Thus LENGTH() did not work in our case, where many of the strings under test contained multibyte characters. These requirements resulted in an UPDATE query of the following form:
UPDATE media
SET accounts_used = LEFT(accounts_used, CHAR_LENGTH(accounts_used) - 1)
WHERE RIGHT(accounts_used, 1) = ',';
The query selects records in the media table whose accounts_used column ends with a comma. The WHERE RIGHT(accounts_used, 1) = ',' clause does the filtering, where RIGHT() returns a substring of the specified length taken from the right of the provided string/column. The query then uses LEFT(accounts_used, CHAR_LENGTH(accounts_used) - 1) to perform the trim, removing the last character from the accounts_used value, where LEFT() returns a substring of the specified length taken from the left of the provided string/column.
Here the use of the multibyte-aware CHAR_LENGTH() function, rather than the basic LENGTH() function, was important in our case because of the many records in our dataset that contained multibyte characters. If you are only dealing with ASCII or another single-byte character set, LENGTH() would work perfectly well; in that case CHAR_LENGTH() and LENGTH() return the same count and could even be used interchangeably. When dealing with data that could contain multibyte characters, or if in doubt, use CHAR_LENGTH() instead, as it returns an accurate character count in either case.
Please note that the table and column names used in the example query above match those in the original question and should be modified as needed to suit your own dataset.
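The byte-versus-character pitfall can be reproduced outside MySQL. The sketch below models LENGTH() as the UTF-8 byte count and CHAR_LENGTH() as the number of code points. This assumes a utf8mb4 column. mysql_left is a hypothetical helper that imitates LEFT().

```python
def mysql_left(s: str, n: int) -> str:
    """Like MySQL LEFT(): the first n characters (empty if n <= 0)."""
    return s[:max(n, 0)]


def length_bytes(s: str) -> int:
    """Like LENGTH() on a utf8mb4 column: size in bytes."""
    return len(s.encode("utf-8"))


def char_length(s: str) -> int:
    """Like CHAR_LENGTH(): number of characters (code points)."""
    return len(s)


value = "M\u00fcller,"  # 'Müller,': 7 characters, 8 bytes in UTF-8
print(length_bytes(value), char_length(value))     # 8 7
print(mysql_left(value, length_bytes(value) - 1))  # Müller,  (comma kept)
print(mysql_left(value, char_length(value) - 1))   # Müller
```

With LENGTH(), the requested prefix is one character too long for each extra byte, so the trailing comma survives the "trim".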

PHP/MYSQL match against query

I am trying to run a MATCH ... AGAINST query and it is not working. I created a FULLTEXT index on the two fields, but I am getting an SQL error right before the word 'relationship'. Here is the SQL:
"SELECT * FROM pages WHERE MATCH (shdescript,ldescript) AGAINST (romance, relationship)";
I have also tried searching only against shdescript and only against ldescript, but I get the same error. I've also tried the search string without spaces. As far as I know, you are supposed to have the words of the search string separated by commas in parentheses. What am I doing wrong? Thanks.
Add quotes around your search string.
"SELECT * FROM pages WHERE MATCH (shdescript,ldescript) AGAINST ('romance relationship')";
Also make sure you protect yourself against the nasty SQL injection threat.
Try quoting your search string as a single literal (i.e. 'romance relationship'):
SELECT * FROM pages WHERE MATCH (shdescript,ldescript) AGAINST ('romance relationship')
I believe your AGAINST must be in quotes. From:
http://dev.mysql.com/doc/refman/4.1/en/fulltext-search.html#function_match
AGAINST takes a string to search for, and an optional modifier that
indicates what type of search to perform. The search string must be a
literal string, not a variable or a column name.
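Since AGAINST() takes one search string, several words go into that single string separated by spaces, not into a comma-separated list. This sketch shows one way to do that from application code. It assumes a Python MySQL driver that uses the %s placeholder style, such as PyMySQL or mysql-connector-python, where you would call cursor.execute(sql, params). build_match_query is a hypothetical helper.

```python
from typing import Iterable, Tuple


def build_match_query(words: Iterable[str]) -> Tuple[str, Tuple[str]]:
    """Return (sql, params) with all search words in ONE AGAINST() string."""
    search = " ".join(w.strip() for w in words if w.strip())
    sql = (
        "SELECT * FROM pages "
        "WHERE MATCH (shdescript, ldescript) AGAINST (%s)"
    )
    # The whole search string is a single value, e.g. 'romance relationship'
    return sql, (search,)
```

Passing the search string as a parameter also covers the SQL injection concern raised above.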

match (row) against ('text') wont return results

$qstring = "SELECT titulo as value, id FROM blogs WHERE titulo LIKE '%".$term."%' LIMIT 5";
$qstring = "SELECT titulo as value, id FROM blogs WHERE MATCH(titulo) AGAINST ('.$term.') LIMIT 5";
The first one returns results, but they are not really related to the query.
The second returns:
Can't find FULLTEXT index matching the column list
Why?
Check the value in $term: for a FULLTEXT index search, each word must be longer than 3 characters, otherwise the search returns no rows.
The minimum and maximum lengths of words to be indexed are defined by the ft_min_word_len and ft_max_word_len system variables. The default minimum value is four characters. If you change either value, you must rebuild your FULLTEXT indexes. For example, if you want three-character words to be searchable, you can set ft_min_word_len=3 in the [mysqld] section of an option file.
MATCH() only works on fields that have a FULLTEXT index on them, exactly as the error message says. You'd have to do:
ALTER TABLE blogs ADD FULLTEXT INDEX titulo_ft (titulo);
before you can use fulltext operations on the field.
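Related to the ft_min_word_len point above, a rough way to predict an empty result is to check which words of $term could have been indexed at all. This Python sketch is only an approximation. It splits on non-word characters, assumes the default minimum length of 4, and does not account for stopwords, which the real parser also skips.

```python
import re
from typing import List

DEFAULT_FT_MIN_WORD_LEN = 4  # default value of ft_min_word_len


def indexable_words(term: str, min_len: int = DEFAULT_FT_MIN_WORD_LEN) -> List[str]:
    """Approximate which words of term a FULLTEXT index could match.

    Words shorter than min_len are dropped, as they would not have been
    indexed. Stopwords are not taken into account here.
    """
    return [w for w in re.findall(r"\w+", term) if len(w) >= min_len]
```

If the list is empty, MATCH() ... AGAINST() cannot return rows for that term.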
As the error message implies, you can't use MATCH ... AGAINST unless there is a FULLTEXT index on the field you are comparing.
The LIKE query should work, though. I think the problem may be the double quotes in your pattern, which are superfluous and will require corresponding quotes in the database value. Please show what database data you are trying to match.
In addition to the FULLTEXT index mentioned by others, it looks like you are not properly quoting your text in the AGAINST clause. I think it should be:
AGAINST ('".$term."')
Or else, since you already have double quotes around your query just embed the variable:
AGAINST ('$term')

How to allow fulltext searching with hyphens in the search query

I have keywords like "some-or-other" where the hyphens matter when searching my MySQL database. I'm currently using the fulltext function.
Is there a way to escape the hyphen character?
I know that one option is to comment out #define HYPHEN_IS_DELIM in the myisam/ftdefs.h file, but unfortunately my host does not allow this. Is there another option out there?
Here's the code I have right now:
$search_input = $_GET['search_input'];
$keyword_safe = mysql_real_escape_string($search_input);
$keyword_safe_fix = "*'\"" . $keyword_safe . "\"'*";
$sql = "
SELECT *,
MATCH(coln1, coln2, coln3) AGAINST('$keyword_safe_fix') AS score
FROM table_name
WHERE MATCH(coln1, coln2, coln3) AGAINST('$keyword_safe_fix')
ORDER BY score DESC
";
From here http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
One solution to find a word with dashes or hyphens in it is to use a FULLTEXT search IN BOOLEAN MODE and to enclose the word with the hyphen/dash in double quotes.
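A sketch of building such a phrase search from application code. It assumes a %s-style MySQL driver placeholder, and build_phrase_query is a hypothetical helper. Double quotes delimit the phrase in boolean mode, so any quotes the user typed are stripped instead of escaped.

```python
from typing import Tuple


def build_phrase_query(keyword: str) -> Tuple[str, Tuple[str]]:
    """Search for keyword as an exact phrase IN BOOLEAN MODE."""
    # Double quotes delimit the phrase, so strip any the user typed
    phrase = '"' + keyword.replace('"', "") + '"'
    sql = (
        "SELECT * FROM table_name "
        "WHERE MATCH(coln1, coln2, coln3) AGAINST (%s IN BOOLEAN MODE)"
    )
    return sql, (phrase,)
```

Compared with the question's code, this avoids the extra '*' and single quotes around the phrase and lets the driver handle escaping.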
Or from here http://bugs.mysql.com/bug.php?id=2095
There is another workaround. It was recently added to the manual:
"Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. You can edit the contents in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes."
I have not tried it myself.
Edit: here is some additional info from http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html
A phrase that is enclosed within double quote (“"”) characters matches only rows that contain the phrase literally, as it was typed. The full-text engine splits the phrase into words and performs a search in the FULLTEXT index for the words. Prior to MySQL 5.0.3, the engine then performed a substring search for the phrase in the records that were found, so the match must include nonword characters in the phrase. As of MySQL 5.0.3, nonword characters need not be matched exactly: Phrase searching requires only that matches contain exactly the same words as the phrase and in the same order. For example, "test phrase" matches "test, phrase" in MySQL 5.0.3, but not before.
If the phrase contains no words that are in the index, the result is empty. For example, if all words are either stopwords or shorter than the minimum length of indexed words, the result is empty.
Some people would suggest using the following query:
SELECT id, text
FROM texts
WHERE MATCH(text) AGAINST('well-known' IN BOOLEAN MODE)
HAVING text LIKE '%well-known%';
But with that approach you need many variants, depending on which fulltext operators are used. Task: build such a query for +well-known +(>35-hour <39-hour) working week*. Too complex!
And do not forget the default value of ft_min_word_len: a search for up-to-date matches only on date, because up and to are too short to be indexed.
Trick
Because of that, I prefer a trick that makes constructions with HAVING etc. unnecessary:
Instead of storing only the text "The Up-to-Date Sorcerer" is a well-known science fiction short story. in your database table, copy the hyphenated words, without hyphens, to the end of the text inside a comment: "The Up-to-Date Sorcerer" is a well-known science fiction short story.<!-- UptoDate wellknown -->
If the user searches for up-to-date, remove the hyphens in the SQL query:
MATCH(text) AGAINST('uptodate ' IN BOOLEAN MODE)
That way, your user can find up-to-date as one word instead of getting every result that contains only date (because ft_min_word_len drops up and to).
Of course before you echo the texts you should remove the <!-- ... --> comments.
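The three steps of this trick can be sketched in Python as follows. The regular expressions are assumptions: a "hyphenated word" is two or more runs of word characters joined by hyphens. Only hyphens between two word characters are removed from the query, so a leading '-' operator survives.

```python
import re

# A word made of two or more parts joined by hyphens, e.g. 'Up-to-Date'
HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")
# Hyphens that sit between two word characters (not a leading '-' operator)
INNER_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")
SEARCH_SUFFIX = re.compile(r"<!--.*?-->", re.DOTALL)


def add_search_suffix(text: str) -> str:
    """Append de-hyphenated copies of hyphenated words inside an HTML comment."""
    words = HYPHENATED.findall(text)
    if not words:
        return text
    joined = " ".join(w.replace("-", "") for w in words)
    return f"{text}<!-- {joined} -->"


def normalize_query(query: str) -> str:
    """Remove inner hyphens from the user's query, keeping '-' operators."""
    return INNER_HYPHEN.sub("", query)


def strip_search_suffix(text: str) -> str:
    """Remove the <!-- ... --> comments before echoing the text."""
    return SEARCH_SUFFIX.sub("", text)
```

Applied to the example sentence, add_search_suffix produces the stored text shown above, and normalize_query turns up-to-date into uptodate.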
Advantages
the query is simpler
the user is able to use all fulltext operators as usual
the query is faster.
If a user searches for -well-known +science, MySQL treats that as must not include *well*, could include *known*, and must include *science*. This isn't what the user expected. The trick solves that, too, as the SQL query searches for -wellknown +science.
Maybe it is simpler to use the BINARY operator:
SELECT *
FROM your_table_name
WHERE BINARY your_column = BINARY "Foo-Bar%AFK+LOL"
http://dev.mysql.com/doc/refman/5.0/en/cast-functions.html#operator_binary
The BINARY operator casts the string following it to a binary string. This is an easy way to force a column comparison to be done byte by byte rather than character by character. This causes the comparison to be case sensitive even if the column is not defined as BINARY or BLOB. BINARY also causes trailing spaces to be significant.
My preferred solution to this is to remove the hyphen from the search term and from the data being searched. I keep two columns in my full-text table - search and return. search contains sanitised data with various characters removed, and is what the users' search terms are compared to, after my code has sanitised those as well.
Then I display the return column.
It does mean I have two copies of the data in my database, but for me that trade-off is well worth it. My FT table is only ~500k rows, so it's not a big deal in my use case.
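A minimal sketch of this two-column approach. The exact set of characters the author removes isn't given, so sanitise below is a hypothetical cleaner: it drops hyphens, turns other punctuation into spaces, and must be applied both to the stored text and to the user's search terms.

```python
import re
from typing import Dict


def sanitise(text: str) -> str:
    """Hypothetical cleaner used for BOTH stored data and user search terms."""
    # Remove hyphens so 'some-or-other' becomes the single word 'someorother'
    text = text.replace("-", "")
    # Turn any other punctuation into spaces, then collapse whitespace
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def make_row(original: str) -> Dict[str, str]:
    """Row for the full-text table: 'search' is matched, 'return' is displayed."""
    return {"search": sanitise(original), "return": original}
```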