MySQL Full-Text Search acting weird in boolean mode

I am trying to use the ngram parser in MySQL FTS (Full-Text Search). I created this table and added some rows (I set the ngram size to 3 using ngram_token_size=3 in my.ini):
CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
bookid INT,
chapter INT,
txt LONGTEXT,
FULLTEXT (txt) WITH PARSER ngram
) ENGINE=InnoDB CHARACTER SET utf8mb4;
Then I started searching. My first query returns 0 rows, which is correct:
SELECT * FROM articles WHERE MATCH (txt) AGAINST ('myterm' in boolean mode);
But this query returns some rows that do not contain myterm!
SELECT * FROM articles WHERE MATCH (txt) AGAINST ('"myterm" #1' in boolean mode);
This one also returns rows that do not contain myterm:
SELECT * FROM articles WHERE MATCH (txt) AGAINST ('+myterm +anotherterm' in boolean mode);
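The MySQL manual describes the ngram parser as splitting text into contiguous tokens of ngram_token_size characters (tokens never span a space), and says a boolean-mode term is converted into an ngram phrase search, while a natural-language term becomes a union of its ngrams. The Python sketch below is a simplified model of that description, not MySQL itself: stopword handling is ignored and the helper names are hypothetical. It shows that a row which never contains myterm can still share trigrams with it, so any evaluation path that ends up matching individual ngrams rather than the full ngram phrase would return such a row.

```python
def ngrams(text: str, n: int = 3) -> list[str]:
    """Contiguous n-character tokens per whitespace-separated word (simplified model).

    Tokens never span a space, and a word shorter than n yields no tokens
    because the range below is empty.
    """
    tokens: list[str] = []
    for word in text.lower().split():
        tokens.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return tokens


def matches_any_ngram(term: str, doc: str, n: int = 3) -> bool:
    """Union semantics: the doc shares at least one ngram with the term."""
    doc_tokens = set(ngrams(doc, n))
    return any(tok in doc_tokens for tok in ngrams(term, n))


def matches_ngram_phrase(term: str, doc: str, n: int = 3) -> bool:
    """Phrase semantics: the term's ngrams appear consecutively in the doc."""
    want, have = ngrams(term, n), ngrams(doc, n)
    k = len(want)
    return k > 0 and any(have[i:i + k] == want for i in range(len(have) - k + 1))


doc = "the mystery term"
print(ngrams("myterm"))                     # ['myt', 'yte', 'ter', 'erm']
print(matches_any_ngram("myterm", doc))     # True: shares 'ter' and 'erm'
print(matches_ngram_phrase("myterm", doc))  # False: no consecutive myt yte ter erm
```

Under this model, a search that required the whole ngram phrase would reject "the mystery term", while one that only needed overlapping ngrams would accept it.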

Related

MySQL full text search - no partial recognition

I'm trying to build a keyword search tool based on MySQL. I can only get results for full words, but I would like to get results for partial matches too.
(My DB structure and content are shown in the update below.)
This query works:
SELECT * FROM chromext_keywords WHERE MATCH(keyword) AGAINST ('Redmi');
But this one doesn't work (no results):
SELECT * FROM chromext_keywords WHERE MATCH(keyword) AGAINST ('red');
I tried adding % but it did not solve the problem. I tried both natural language mode and boolean mode, but neither helped.
Update with the CREATE TABLE statement:
CREATE TABLE chromext_keywords (
id int(10) NOT NULL,
keyword text NOT NULL,
blacklist text NOT NULL,
category text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
And the INSERT statement:
INSERT INTO chromext_keywords (id, keyword, blacklist, category) VALUES
(1, 'Redmi Note 10', '9,8,pro', '2'),
(2, 'Realme GT', '6,7,8,narzo', '2');
And I added the full-text index:
ALTER TABLE chromext_keywords
ADD UNIQUE KEY id (id);
ALTER TABLE chromext_keywords ADD FULLTEXT KEY keyword (keyword);
I have tried both InnoDB and MyISAM.
Am I missing something?
Thanks
You should check the minimum word length setting.
In MySQL, the minimum word length for full-text search on MyISAM tables is controlled by the parameter
ft_min_word_len
and its default value of 4 means only words of 4 or more characters are indexed (InnoDB uses innodb_ft_min_token_size instead).
Take a look at the related docs:
https://dev.mysql.com/doc/refman/8.0/en/fulltext-fine-tuning.html
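The minimum length explains why short tokens vanish, but a plain term is also compared against whole indexed words. The Python sketch below is a simplified model, assuming whitespace tokenization and no stopwords (helper names are hypothetical, not MySQL APIs). It shows that with a cutoff of 4 the token 10 is never indexed, and that red does not match Redmi even when the cutoff is lowered to 3, because red is not a whole word in that row.

```python
def indexed_words(text: str, min_len: int = 4) -> set[str]:
    """Whole words a simplified full-text index would keep (ft_min_word_len-style cutoff)."""
    return {w for w in text.lower().split() if len(w) >= min_len}


def whole_word_hit(term: str, text: str, min_len: int = 4) -> bool:
    """A plain (non-wildcard) term matches only if it equals an indexed word."""
    return term.lower() in indexed_words(text, min_len)


row = "Redmi Note 10"
print(sorted(indexed_words(row)))             # ['note', 'redmi']  ('10' is too short)
print(whole_word_hit("Redmi", row))           # True
print(whole_word_hit("red", row, min_len=3))  # False: 'red' is not a whole word here
```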
I finally found the answer.
The following query works:
SELECT * FROM chromext_keywords WHERE MATCH(keyword) AGAINST ('(re*)' IN BOOLEAN MODE);
With multiple keywords:
SELECT * FROM chromext_keywords WHERE MATCH(keyword) AGAINST ('(+red* +not*)' IN BOOLEAN MODE);
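The * truncation operator turns each term into a prefix match, and the MySQL manual notes that a truncated word is not stripped from a boolean query even when it is shorter than the minimum word length, which is why re* works. The Python sketch below models only required +prefix* terms under whitespace tokenization; the function name is hypothetical and it is not MySQL's query parser.

```python
def boolean_prefix_match(query: str, text: str) -> bool:
    """Simplified model of '+prefix*' terms: every term must prefix some word in text."""
    words = text.lower().split()
    for term in query.split():
        if not (term.startswith("+") and term.endswith("*")):
            raise ValueError(f"this sketch only handles +prefix* terms, got {term!r}")
        prefix = term[1:-1].lower()
        if not any(w.startswith(prefix) for w in words):
            return False
    return True


print(boolean_prefix_match("+red* +not*", "Redmi Note 10"))  # True
print(boolean_prefix_match("+red* +not*", "Realme GT"))      # False: no word starts with 'red'
print(boolean_prefix_match("+re*", "Realme GT"))             # True
```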
I still need to figure out how to cover spelling mistakes. If anyone has an idea, let me know.

MySQL full-text MATCH/AGAINST does not always return results

I have the following table setup:
CREATE TABLE IF NOT EXISTS `search_table` (
`fulltext_id` int(10) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Entity ID',
`data_index` longtext COMMENT 'Data index',
PRIMARY KEY (`fulltext_id`),
FULLTEXT KEY `FTI_CATALOGSEARCH_FULLTEXT_DATA_INDEX` (`data_index`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='Search table'
AUTO_INCREMENT=1;
INSERT INTO `search_table` (`fulltext_id`, `data_index`)
VALUES (1, 'Test Hello abc');
Then I run a full-text search on it with 3 different search terms:
SELECT `s`.`fulltext_id`, MATCH (s.data_index) AGAINST ('Test' IN BOOLEAN MODE) AS `relevance` FROM `search_table` AS `s`
WHERE (MATCH (s.data_index) AGAINST ('Test' IN BOOLEAN MODE));
SELECT `s`.`fulltext_id`, MATCH (s.data_index) AGAINST ('Hello' IN BOOLEAN MODE) AS `relevance` FROM `search_table` AS `s`
WHERE (MATCH (s.data_index) AGAINST ('Hello' IN BOOLEAN MODE));
SELECT `s`.`fulltext_id`, MATCH (s.data_index) AGAINST ('abc' IN BOOLEAN MODE) AS `relevance` FROM `search_table` AS `s`
WHERE (MATCH (s.data_index) AGAINST ('abc' IN BOOLEAN MODE));
Only the first query (the search for Test) returns a result; the other two do not. Why?
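You should check the list of currently defined stopwords. InnoDB's default list can be queried like this (MyISAM tables, like the one in the question, use a separate built-in list that is documented on the page linked below):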
mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
More information about MySQL stopwords can be found here: https://dev.mysql.com/doc/refman/5.7/en/fulltext-stopwords.html
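"Hello", for example, is in MyISAM's built-in stopword list, so it is ignored during full-text matching. "abc" is also missing from the index for a different reason: it is only 3 characters, shorter than MyISAM's default ft_min_word_len of 4. The comments on the linked MySQL documentation page also link to user-compiled English stopword lists, such as https://www.ranks.nl/stopwords/.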
Note that MySQL, like other database engines, lets you specify your own custom stopword list, so you should check both the predefined system stopword list and any custom stopword lists in use.

MySQL stopwords and MATCH

I'm hoping someone can help with a question about MySQL stopwords and MATCH.
If I run the following MySQL statements:
CREATE TABLE `tbladvertstest` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`SearchCompany` varchar(250) DEFAULT NULL,
PRIMARY KEY (`id`),
FULLTEXT KEY `SearchCompany` (`SearchCompany`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `tbladvertstest` (`id`, `SearchCompany`) VALUES (NULL, 'All Bar One');
SELECT * FROM `tbladvertstest` WHERE MATCH (`SearchCompany`) AGAINST ('"All Bar One"' IN BOOLEAN MODE);
I expected the query to return all rows where SearchCompany is "All Bar One", but it returns zero rows. I assume MySQL is checking each word individually rather than the full string, and that stopwords and the minimum word length are why it returns no results. Am I right? If so, is it possible to get MySQL to treat it as a single string?
By default, MySQL FULLTEXT indexes on MyISAM tables do not index words shorter than 4 characters. If you want to index 3-letter words, you need to set the ft_min_word_len system variable (or innodb_ft_min_token_size for InnoDB) to 3 or lower, then restart mysqld and rebuild your table.
For example, add this to the [mysqld] section of your my.cnf file:
ft_min_word_len=3
Then restart mysqld.
Then run this command to rebuild the table:
alter table `tbladvertstest` force;
Now your query should work:
mysql > SELECT *
-> FROM `tbladvertstest`
-> WHERE MATCH (`SearchCompany`) AGAINST ('+"All Bar One"' IN BOOLEAN MODE) ;
+----+---------------+
| id | SearchCompany |
+----+---------------+
| 1 | All Bar One |
+----+---------------+
1 row in set (0.00 sec)
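The MySQL manual's description of quoted phrases in boolean mode explains the empty result: the engine looks up the phrase's words in the FULLTEXT index, and if none of them are indexed (because they are stopwords or shorter than the minimum length) the result is empty. The Python sketch below models that rule plus the in-order word check. The stopword set is an assumed excerpt of MyISAM's built-in list and the helper is hypothetical; it is not MySQL's implementation.

```python
STOPWORDS_EXCERPT = {"all", "one"}  # assumption: illustrative excerpt, not the full list


def phrase_search(phrase: str, text: str, min_len: int = 4) -> bool:
    """Simplified model of a quoted phrase in MyISAM boolean mode."""
    words = phrase.lower().split()
    tokens = text.lower().split()
    indexed = {t for t in tokens if len(t) >= min_len and t not in STOPWORDS_EXCERPT}
    if not any(w in indexed for w in words):
        return False  # no phrase word is in the index, so the result is empty
    k = len(words)
    # The row must also contain the phrase words consecutively, in order.
    return any(tokens[i:i + k] == words for i in range(len(tokens) - k + 1))


row = "All Bar One"
print(phrase_search("All Bar One", row))             # False with the default cutoff of 4
print(phrase_search("All Bar One", row, min_len=3))  # True once 'bar' is indexed
```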
You need to specify the operators. If you want to require the whole phrase, prefix it with + and put it in double quotes:
SELECT * FROM `tbladvertstest` WHERE MATCH(SearchCompany) AGAINST ('+"All Bar One"' IN BOOLEAN MODE);

MySQL MATCH AGAINST with a numeric keyword

I use a MySQL full-text index.
I found that it cannot match a numeric keyword such as '1' in '1,2,3' or '1 2 3'.
I use this query:
SELECT * FROM users u WHERE MATCH(u.leader_uids) AGAINST('1' IN BOOLEAN MODE);
How can I solve this issue?
Thanks a lot!
Here is an example; I hope it works for you:
MATCH (field) AGAINST ('+856049' IN BOOLEAN MODE)
It will only work with words of at least the minimum length (4 characters by default for MyISAM), so you must prepend a prefix to each leader_uid before saving it. Example:
CREATE TABLE mytable(
id INT NOT NULL KEY AUTO_INCREMENT,
myfield TEXT,
FULLTEXT KEY ix_mytable (myfield)
);
INSERT INTO mytable (myfield) VALUES
('id_1 id_2 id_3'),
('id_8'),
('id_4 id_1');
SELECT * FROM mytable
WHERE MATCH(myfield) AGAINST ('+id_1' IN BOOLEAN MODE);
-- will select rows 1 and 3
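The prefix trick can be sketched in Python to show why it works: every stored ID becomes a word long enough to be indexed, and matching is by whole word, so id_1 does not accidentally match id_10. This is a simplified model assuming whitespace tokenization and that id_1 is treated as a single word, as in the example above; encode_uids and fulltext_hit are hypothetical helpers, not MySQL functions.

```python
def encode_uids(uids: list[int], prefix: str = "id_") -> str:
    """Store each ID as a prefixed word so it clears the minimum token length."""
    return " ".join(f"{prefix}{u}" for u in uids)


def fulltext_hit(term: str, text: str, min_len: int = 4) -> bool:
    """Simplified whole-word match against tokens that meet the length cutoff."""
    tokens = {t for t in text.lower().split() if len(t) >= min_len}
    return term.lower() in tokens


rows = {1: encode_uids([1, 2, 3]), 2: encode_uids([8]), 3: encode_uids([4, 1])}
print(rows[1])                                                            # id_1 id_2 id_3
print([rid for rid, text in rows.items() if fulltext_hit("id_1", text)])  # [1, 3]
print(fulltext_hit("1", "1 2 3"))               # False: '1' is too short to be indexed
print(fulltext_hit("id_1", encode_uids([10])))  # False: whole-word match only
```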
You can also change the minimum number of characters required for a word in the MySQL configuration:
https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_ft_min_token_size

INSTR(str,substr) does not work when str contains 'é' or 'ë' and substr only 'e'

In another post on stackoverflow, I read that INSTR could be used to order results by relevance.
My understanding is that col LIKE '%str%' and INSTR(col, 'str') behave the same. However, there seems to be a difference in how collations are handled.
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO users (name)
VALUES ('Joël'), ('René');
SELECT * FROM users WHERE name LIKE '%joel%'; -- 1 record returned
SELECT * FROM users WHERE name LIKE '%rene%'; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'joel') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'rene') > 0; -- 0 records returned
SELECT * FROM users WHERE INSTR(name, 'joël') > 0; -- 1 record returned
SELECT * FROM users WHERE INSTR(name, 'rené') > 0; -- 1 record returned
INSTR does do some conversion, though: it finds ë in é.
SELECT INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë');
-- returns 1, 0, 0
Am I missing something?
http://sqlfiddle.com/#!2/9bf21/6 (using MySQL version 5.5.22)
This is due to bug 70767 on LOCATE() and INSTR(), which has been verified.
Though the INSTR() documentation states that it is multi-byte safe, it doesn't seem to work, as you note, with collations like utf8_general_ci, which should be case- and accent-insensitive. The documentation says:
This function is multi-byte safe, and is case sensitive only if at least one argument is a binary string.
The bug report states that although MySQL does this correctly, it only does so when the number of bytes is also identical:
However, you can easily observe that they do not (completely) respect collations when looking for one string inside another one. It seems that what's happening is that MySQL looks for a substring which is collation-equal to the target which has exactly the same length in bytes as the target. This is only rarely true.
To adapt the report's example, if you create the following table:
create table t ( needle varchar(10), haystack varchar(10)
) COLLATE=utf8_general_ci;
insert into t values ("A", "a"), ("A", "XaX");
insert into t values ("A", "á"), ("A", "XáX");
insert into t values ("Á", "a"), ("Á", "XaX");
insert into t values ("Å", "á"), ("Å", "XáX");
and then run this query, you can see the same behaviour demonstrated:
select needle
, haystack
, needle=haystack as `=`
, haystack LIKE CONCAT('%',needle,'%') as `like`
, instr(haystack, needle) as `instr`
from t;
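The byte-length explanation from the bug report can be modelled in a few lines of Python. This is a sketch under loud assumptions: fold() approximates utf8_general_ci by stripping accents with unicodedata and case-folding, which is not MySQL's actual collation algorithm, and buggy_instr_contains() is a hypothetical reimplementation of the behaviour the report describes, not MySQL's code. With those assumptions it reproduces the question's results, including the "1, 0, 0" from INSTR('é', 'ë'), INSTR('é', 'e'), INSTR('e', 'ë').

```python
import unicodedata
from collections.abc import Iterator


def fold(s: str) -> str:
    """Rough stand-in for an accent- and case-insensitive collation (assumption)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def substrings(haystack: str) -> Iterator[str]:
    """Every non-empty contiguous substring of haystack."""
    for i in range(len(haystack)):
        for j in range(i + 1, len(haystack) + 1):
            yield haystack[i:j]


def like_contains(haystack: str, needle: str) -> bool:
    """LIKE '%needle%': any collation-equal substring counts."""
    target = fold(needle)
    return any(fold(sub) == target for sub in substrings(haystack))


def buggy_instr_contains(haystack: str, needle: str) -> bool:
    """Bug 70767 as described: the substring must also match the needle's UTF-8 byte length."""
    target, nbytes = fold(needle), len(needle.encode("utf-8"))
    return any(fold(sub) == target and len(sub.encode("utf-8")) == nbytes
               for sub in substrings(haystack))


print(like_contains("Joël", "joel"), buggy_instr_contains("Joël", "joel"))  # True False
print(buggy_instr_contains("Joël", "joël"))                                # True
print([buggy_instr_contains(h, n) for h, n in [("é", "ë"), ("é", "e"), ("e", "ë")]])
```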