MySQL stop words and MATCH

I'm hoping someone can help with a question regarding MySQL stopwords and MATCH.
If I run the following MySQL queries:
CREATE TABLE `tbladvertstest` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `SearchCompany` varchar(250) DEFAULT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `SearchCompany` (`SearchCompany`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `tbladvertstest` (`id`, `SearchCompany`) VALUES (NULL, 'All Bar One');
SELECT * FROM `tbladvertstest` WHERE MATCH (`SearchCompany`) AGAINST ('"All Bar One"' IN BOOLEAN MODE);
I expected the query to return all rows where SearchCompany is "All Bar One", but it returns zero rows. I'm assuming MySQL checks each word individually rather than the full string, and that the stopwords and the minimum word length are why it's not returning any results. Am I right? If so, is it possible to get MySQL to treat it as a single string?

By default, MySQL FULLTEXT indexes on MyISAM tables do not index words shorter than 4 characters. To index 3-letter words, set the ft_min_word_len system variable (or innodb_ft_min_token_size for InnoDB) to 3 or lower, then restart mysqld and rebuild the table so its FULLTEXT index is regenerated.
For example, add this to the [mysqld] section of your my.cnf file:
ft_min_word_len=3
Then restart mysqld.
Then run this command to rebuild the table:
alter table `tbladvertstest` force;
Now your query should work:
mysql> SELECT *
    -> FROM `tbladvertstest`
    -> WHERE MATCH (`SearchCompany`) AGAINST ('+"All Bar One"' IN BOOLEAN MODE);
+----+---------------+
| id | SearchCompany |
+----+---------------+
| 1 | All Bar One |
+----+---------------+
1 row in set (0.00 sec)
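To make the mechanism concrete, here is a small Python model of which words end up in the index. It is a sketch, not MySQL's parser: the word-splitting rule is simplified, and the optional `stopwords` argument is only a stand-in for the much longer real stopword list.

```python
import re


def index_tokens(text, min_word_len=4, stopwords=frozenset()):
    """Return the words a simplified FULLTEXT parser would put in the index.

    Assumptions (not MySQL's exact rules): a word is a run of letters, digits
    or underscores; words shorter than min_word_len and words listed in
    `stopwords` are dropped.
    """
    words = re.findall(r"\w+", text.lower())
    return [w for w in words if len(w) >= min_word_len and w not in stopwords]


# MyISAM default (ft_min_word_len=4): every word of the row is too short.
print(index_tokens("All Bar One"))                  # []
# After ft_min_word_len=3 and an index rebuild, all three words are indexed.
print(index_tokens("All Bar One", min_word_len=3))  # ['all', 'bar', 'one']
```

With nothing indexed for the row, there is nothing for the phrase search to find, which matches the zero-row result in the question.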

You need to specify the operators. To require the whole phrase, prefix it with + and put it in double quotes:
SELECT * FROM `tbladvertstest` WHERE MATCH(SearchCompany) AGAINST ('+"All Bar One"' IN BOOLEAN MODE);
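The difference between plain words and words carrying an operator can be sketched with the following Python model of the documented boolean-mode rules. It handles single words only (phrases, the `*` operator and relevance ranking are left out) and is an illustration, not MySQL's implementation.

```python
def boolean_match(row_words, required=(), excluded=(), optional=()):
    """Simplified model of IN BOOLEAN MODE matching for single words.

    required -> terms written with a leading +
    excluded -> terms written with a leading -
    optional -> terms written with no operator
    """
    words = {w.lower() for w in row_words}
    if any(t.lower() not in words for t in required):
        return False  # a +term is missing
    if any(t.lower() in words for t in excluded):
        return False  # a -term is present
    if required:
        return True   # optional terms would only affect ranking
    # With no +terms, at least one optional term must be present,
    # so a query made only of -terms matches nothing.
    return any(t.lower() in words for t in optional)


row = ["all", "bar", "one"]
print(boolean_match(row, required=["bar"]))                    # True
print(boolean_match(row, required=["bar"], excluded=["one"]))  # False
print(boolean_match(row, excluded=["pub"]))                    # False
```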

Related

MySQL Full-Text Search acting weird in boolean mode

I am trying to use the ngram parser with MySQL FTS (Full-Text Search). I created this table and added some rows (I set the ngram size to 3 with ngram_token_size=3 in my.ini):
CREATE TABLE articles (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  bookid INT,
  chapter INT,
  txt LONGTEXT,
  FULLTEXT (txt) WITH PARSER ngram
) ENGINE=InnoDB CHARACTER SET utf8mb4;
Then I started searching. My first query returns 0 rows, which is correct:
SELECT * FROM articles WHERE MATCH (txt) AGAINST ('myterm' in boolean mode);
But when I run this query, it returns some rows that do not contain myterm!
SELECT * FROM articles WHERE MATCH (txt) AGAINST ('"myterm" #1' in boolean mode);
This also returns rows that do not contain myterm:
SELECT * FROM articles WHERE MATCH (txt) AGAINST ('+myterm +anotherterm' in boolean mode);
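The MySQL manual's section on the ngram parser describes how search terms are rewritten: in natural language mode a term becomes the union of its n-grams, while in boolean mode it becomes an n-gram phrase. The Python model below is a simplified sketch (not MySQL's code; the sample row is made up) showing how a row can match "myterm" without containing it when the n-grams are matched independently. Whether this is exactly what happens with the `#1` and `+myterm +anotherterm` queries above is an open question.

```python
def ngrams(word, n=3):
    """Overlapping n-grams of one word; words shorter than n yield none in this model."""
    word = word.lower()
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def text_ngrams(text, n=3):
    """N-gram sequence for a text, splitting on whitespace first."""
    grams = []
    for word in text.split():
        grams.extend(ngrams(word, n))
    return grams


def union_match(text, term, n=3):
    """Natural-language-style matching: any shared n-gram is enough."""
    text_grams = set(text_ngrams(text, n))
    return any(g in text_grams for g in ngrams(term, n))


def phrase_match(text, term, n=3):
    """Phrase-style matching: the term's n-grams must appear consecutively."""
    seq, pat = text_ngrams(text, n), ngrams(term, n)
    if not pat:
        return False
    return any(seq[i:i + len(pat)] == pat for i in range(len(seq) - len(pat) + 1))


print(ngrams("myterm"))                # ['myt', 'yte', 'ter', 'erm']
sample = "an interminable story"       # hypothetical row text
print(union_match(sample, "myterm"))   # True: shares 'ter' and 'erm'
print(phrase_match(sample, "myterm"))  # False
```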

MySQL full text search - no partial recognition

I'm trying to build a keyword search tool based on MySQL, and I can only get results for full words, while I would also like to get results for partial matches.
This query works:
SELECT * FROM chromext_keywords WHERE MATCH (keyword) AGAINST ('Redmi');
But this one doesn't (no results):
SELECT * FROM chromext_keywords WHERE MATCH (keyword) AGAINST ('red');
I tried adding %, but it did not solve the problem. I tried natural language mode as well as boolean mode, but neither helped.
Update with the CREATE TABLE statement:
CREATE TABLE chromext_keywords (
id int(10) NOT NULL,
keyword text NOT NULL,
blacklist text NOT NULL,
category text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
And the INSERT:
INSERT INTO chromext_keywords (id, keyword, blacklist, category) VALUES
(1, 'Redmi Note 10', '9,8,pro', '2'),
(2, 'Realme GT', '6,7,8,narzo', '2');
And I added the FULLTEXT index:
ALTER TABLE chromext_keywords
ADD UNIQUE KEY id (id);
ALTER TABLE chromext_keywords ADD FULLTEXT KEY keyword (keyword);
I have tried both InnoDB and MyISAM.
Am I missing something?
Thanks
You should check the minimum word length setting.
In MySQL (MyISAM), the minimum word length for full-text search is controlled by the parameter
ft_min_word_len
and its default value is 4, so only words longer than 3 characters are indexed.
Take a look at the related docs:
https://dev.mysql.com/doc/refman/8.0/en/fulltext-fine-tuning.html
I finally found the answer.
The following query works:
SELECT * FROM chromext_keywords WHERE MATCH (keyword) AGAINST ('(re*)' IN BOOLEAN MODE);
With multiple keywords:
SELECT * FROM chromext_keywords WHERE MATCH (keyword) AGAINST ('(+red* +not*)' IN BOOLEAN MODE);
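Why the `*` version works can be modelled in a few lines of Python: without operators, full-text search compares whole words, while the boolean-mode truncation operator compares word prefixes. This is a simplified sketch (the tokenizer and case folding are assumptions), and it ignores the minimum word length and stopword rules.

```python
import re


def words_of(text):
    """Simplified tokenizer: lower-cased runs of letters, digits, underscores."""
    return re.findall(r"\w+", text.lower())


def word_match(text, term):
    """A plain term matches only a whole word: 'red' does not match 'redmi'."""
    return term.lower() in words_of(text)


def prefix_match(text, prefix):
    """Models the truncation operator: 'red*' matches words starting with 'red'."""
    return any(w.startswith(prefix.lower()) for w in words_of(text))


rows = ["Redmi Note 10", "Realme GT"]  # rows from the question
print([r for r in rows if word_match(r, "red")])    # []
print([r for r in rows if prefix_match(r, "red")])  # ['Redmi Note 10']
print([r for r in rows if prefix_match(r, "re")])   # ['Redmi Note 10', 'Realme GT']
# '+red* +not*': every prefix is required
print([r for r in rows if all(prefix_match(r, p) for p in ("red", "not"))])  # ['Redmi Note 10']
```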
I still need to figure out how to cover spelling mistakes. If anyone has an idea, let me know.

Full-text search in MySQL doesn't retrieve all rows

I have a problem with a query in MySQL.
This is what I did:
CREATE TABLE `dar`.`MyTable` (
`MyCol` VARCHAR(100) NOT NULL,
FULLTEXT INDEX `Index_1`(`MyCol`)
)
ENGINE = MyISAM;
INSERT INTO MyTable (MyCol)
VALUES ('6002.C3'),
('6002'),
('6002R1'),
('6003.C4'),
('AA6002.X'),
('BB 6002.X');
This is not necessary, but I did it anyway:
REPAIR TABLE MyTable QUICK;
Now, i execute the next query:
SELECT MyCol FROM MyTable
WHERE MATCH(MyCol) AGAINST ('6002*');
And it doesn't return any rows!!
I changed the ft_min_word_len parameter to 2, but nothing changed.
When I delete the row with 'BB 6002.X', the query returns 2 rows!!
6002
6002.C3
That is creepy.
Any idea what is happening here?
I need the query to return:
6002.C3
6002
6002R1
Plus, if possible, also:
AA6002.X
BB 6002.X
Thanks in advance!!
You have hit the 50% threshold in your dataset. Try:
SELECT MyCol FROM MyTable
WHERE MATCH(MyCol) AGAINST ('6003');
And see what the result is.
The 50% threshold has a significant implication when you first try full-text searching to see how it works: If you create a table and insert only one or two rows of text into it, every word in the text occurs in at least 50% of the rows. As a result, no search returns any results. Be sure to insert at least three rows, and preferably many more. Users who need to bypass the 50% limitation can use the boolean search mode; see Section 12.9.2, “Boolean Full-Text Searches”.
http://dev.mysql.com/doc/refman/5.0/en/fulltext-natural-language.html
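The threshold accounts for what the question describes. In the Python model below (a sketch, assuming the period acts as a word delimiter and that the trailing `*` in `'6002*'` has no effect outside boolean mode), `6002` occurs in 3 of the 6 rows, so it is ignored; after deleting `BB 6002.X` it occurs in 2 of 5 rows and those two rows come back. The last function models the boolean-mode workaround mentioned in the quoted documentation.

```python
import re


def tokens(text, min_word_len=4):
    """Indexed words of one row (simplified; '.' and spaces are delimiters)."""
    return {w for w in re.findall(r"\w+", text.lower()) if len(w) >= min_word_len}


def natural_language_search(rows, word, min_word_len=4):
    """Rows containing `word`, unless it occurs in at least half of the rows."""
    hits = [r for r in rows if word.lower() in tokens(r, min_word_len)]
    if 2 * len(hits) >= len(rows):
        return []  # 50% threshold: the word is treated as too common
    return hits


def boolean_prefix_search(rows, prefix, min_word_len=4):
    """Boolean mode 'prefix*': no 50% threshold, matches word prefixes."""
    return [r for r in rows
            if any(w.startswith(prefix.lower()) for w in tokens(r, min_word_len))]


rows = ["6002.C3", "6002", "6002R1", "6003.C4", "AA6002.X", "BB 6002.X"]
print(natural_language_search(rows, "6002"))       # []
print(natural_language_search(rows[:-1], "6002"))  # ['6002.C3', '6002']
print(natural_language_search(rows, "6003"))       # ['6003.C4']
print(boolean_prefix_search(rows, "6002"))         # ['6002.C3', '6002', '6002R1', 'BB 6002.X']
```

Note that in this model `AA6002.X` is still not found, because `*` only matches the start of a word.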

MySQL Order By doesn't work on Concat(enum)

Currently we have an interesting problem regarding the sort order of an ENUM field in MySQL. The field's enum entries are listed in the order we want. Just to be safe, we wrapped the column in CONCAT so that it would be cast to a string and sorted alphabetically, as suggested by the MySQL reference (MySQL Reference - Enum):
Make sure that the column is sorted lexically rather than by index number by coding ORDER BY CAST(col AS CHAR) or ORDER BY CONCAT(col).
But that didn't produce the expected results, so we investigated further. It seems that ORDER BY doesn't work on a combination of an enum and the CONCAT function. I've written the following sample script to show my point:
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_bin DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
INSERT INTO test
VALUES ('b'), ('c'), ('a');
SELECT * FROM test; -- b, c, a
SELECT * FROM test ORDER BY col1 ASC; -- a, b, c
SELECT * FROM test ORDER BY CAST(col1 AS CHAR) ASC; -- a, b, c
SELECT * FROM test ORDER BY CAST(col1 AS BINARY) ASC; -- a, b, c
SELECT * FROM test ORDER BY CONCAT(col1) ASC; -- b, c, a - This goes wrong
I currently suspect some kind of problem with the collation/encoding, but I'm not sure. My database's default encoding is also utf8. The MySQL version is 5.6.12, but it seems to be reproducible with MySQL 5.1. The storage engine is MyISAM, but it also occurs with the MEMORY engine.
Any help would be appreciated.
Update:
It seems the problem occurs only in MySQL 5.6 and is caused by the collation of the column. With the first CREATE TABLE statement, the queries work fine:
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_general_ci DEFAULT NULL
)
With the second, they don't:
CREATE TABLE test (
`col1` enum('a','b','c') COLLATE utf8_bin DEFAULT NULL
)
The collation of the table and/or database doesn't seem to affect the queries. The queries can be tested in this SQL Fiddle.
Strange, it works in this fiddle. Do you have a trigger or something?
http://sqlfiddle.com/#!2/0976a/2
But in 5.6 it goes haywire:
http://sqlfiddle.com/#!9/0976a/1
Probably a MySQL bug.
Moreover, if you list the enum values in the "proper" order, it works:
http://sqlfiddle.com/#!9/a3784/1
In the docs:
ENUM values are sorted based on their index numbers, which depend on
the order in which the enumeration members were listed in the column
specification. For example, 'b' sorts before 'a' for ENUM('b', 'a').
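The documented rule can be modelled in Python (a sketch, not MySQL's code; plain code point order stands in for a binary collation such as utf8_bin). It also shows that for the question's ENUM('a','b','c'), sorting by index and sorting by string both give a, b, c, so the b, c, a result of ORDER BY CONCAT(col1) matches neither rule.

```python
def order_enum(values, members, by_string=False):
    """Model ORDER BY on an ENUM column defined as ENUM(*members).

    by_string=False models ORDER BY col (by 1-based index number);
    by_string=True models ORDER BY CAST(col AS CHAR) (by string value).
    """
    index = {m: i + 1 for i, m in enumerate(members)}
    if by_string:
        return sorted(values)
    return sorted(values, key=lambda v: index[v])


print(order_enum(["a", "b"], ("b", "a")))                  # ['b', 'a']
print(order_enum(["a", "b"], ("b", "a"), by_string=True))  # ['a', 'b']
# The question's ENUM('a','b','c') is declared alphabetically, so both agree:
print(order_enum(["b", "c", "a"], ("a", "b", "c")))                  # ['a', 'b', 'c']
print(order_enum(["b", "c", "a"], ("a", "b", "c"), by_string=True))  # ['a', 'b', 'c']
```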
As per the documentation, the Handling of Enumeration Literals section states:
If you store a number into an ENUM column, the number is treated as
the index into the possible values, and the value stored is the
enumeration member with that index. (However, this does not work with
LOAD DATA, which treats all input as strings.) If the numeric value is
quoted, it is still interpreted as an index if there is no matching
string in the list of enumeration values. For these reasons, it is not
advisable to define an ENUM column with enumeration values that look
like numbers, because this can easily become confusing.
For example, the following column has enumeration members with string values of '0', '1', and '2', but numeric index values of 1, 2, and 3:
numbers ENUM('0','1','2')
If you store 2, it is interpreted as an
index value, and becomes '1' (the value with index 2). If you store
'2', it matches an enumeration value, so it is stored as '2'. If you
store '3', it does not match any enumeration value, so it is treated
as an index and becomes '2' (the value with index 3).
mysql> INSERT INTO t (numbers) VALUES(2),('2'),('3');
mysql> SELECT * FROM t;
+---------+
| numbers |
+---------+
| 1 |
| 2 |
| 2 |
+---------+
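The three quoted rules can be checked with a small Python model of how an inserted literal is mapped onto an ENUM (a sketch of the documented behaviour only; index 0 and out-of-range values simply raise an error here, and MySQL's own handling of those cases is not modelled).

```python
def enum_store(value, members):
    """Model the value stored when `value` is inserted into ENUM(*members).

    - an unquoted number (int) is a 1-based index into members;
    - a quoted string equal to a member is stored as that member;
    - a quoted string matching no member is interpreted as an index.
    """
    if isinstance(value, str) and value in members:
        return value
    index = int(value)
    if not 1 <= index <= len(members):
        raise ValueError(f"index {index} out of range for {members}")
    return members[index - 1]


numbers = ("0", "1", "2")
print([enum_store(v, numbers) for v in (2, "2", "3")])  # ['1', '2', '2']
```

This reproduces the 1, 2, 2 result shown in the table.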
In your case:
INSERT INTO test
VALUES ('2'), ('3'), ('1');
Index value of '2' is 2, '3' is 3 and '1' is 1.
So the output is 2,3,1

Best query to update a big MySQL table against a bad-word list

I have a table with 8 million rows, which needs to be scanned against a huge list of badwords.
My first idea was to:
UPDATE `master` SET `blacklisted`='1' WHERE MATCH (`content-desc`, `content-title`) AGAINST ('badword1 | badword2 | badword3 | "and many more"' IN BOOLEAN MODE);
Unfortunately, this version missed some words and was not case-insensitive!
My next try was:
$badwords = array("badword1", "badword2", "badword3", "and-many-more");
foreach ($badwords as $name)
{
    // $db is assumed to be an open mysqli connection (not shown in the question);
    // both content columns are MEDIUMTEXT
    $escaped = $db->real_escape_string($name);
    $sql = "UPDATE `master` SET `blacklisted`='1' WHERE CONCAT(`content-title`, `content-desc`) LIKE '%" . $escaped . "%'";
    $db->query($sql);
    sleep(6);
    // Could limiting this query to 100,000 rows and adding another foreach loop help?
    // What would that loop look like: (SELECT COUNT(*) FROM master) / 100,000?
}
That is a lot of queries, and it killed my server immediately!
Maybe the idea in the comments could help, but how would I implement it?
What is the best way to run this update without stressing the MySQL server too much?
Thank you!
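One way to follow the commented idea is to run one UPDATE per id range instead of one full-table UPDATE per bad word, pausing between batches. The Python sketch below only builds the statements; it assumes a hypothetical integer primary key `id` on `master` and a DB-API driver with `%s` placeholders (each pair would be run with `cursor.execute(sql, params)`). It keeps the question's LIKE approach, using CONCAT_WS with a space separator instead of CONCAT so a match cannot span the end of the title and the start of the description.

```python
def batched_updates(max_id, bad_words, batch_size=100_000):
    """Yield (sql, params) pairs that blacklist one id range per statement.

    Assumptions: `master` has an integer primary key `id` (hypothetical) and
    the driver uses %s placeholders (DB-API "format" paramstyle).
    LIKE wildcards (% and _) inside bad words are not escaped here.
    """
    if not bad_words:
        return  # nothing to match; avoid generating an empty OR group
    like = "CONCAT_WS(' ', `content-title`, `content-desc`) LIKE %s"
    condition = " OR ".join(like for _ in bad_words)
    patterns = [f"%{word}%" for word in bad_words]
    sql = ("UPDATE `master` SET `blacklisted` = '1' "
           f"WHERE `id` BETWEEN %s AND %s AND ({condition})")
    for start in range(1, max_id + 1, batch_size):
        end = min(start + batch_size - 1, max_id)
        yield sql, [start, end, *patterns]


for sql, params in batched_updates(250_000, ["badword1", "badword2"]):
    print(params[:2])  # [1, 100000], then [100001, 200000], then [200001, 250000]
```

Each statement checks all bad words in a single pass over its id range, so the number of statements depends on the table size rather than on the length of the bad-word list.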
Not sure how this will perform on your table, but you can do a case-insensitive comparison as part of a join clause.
Say you have the table you want to scan (with 8M rows):
CREATE TABLE IF NOT EXISTS haystack (
word varchar(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- Dumping data for table haystack
INSERT INTO haystack (word) VALUES
('a cat is'),
('category'),
('cat'),
('decatur'),
('dog'),
('pigeon'),
('eagle'),
('a beagle'),
('Cat'),
('CAT');
And a table of bad words:
CREATE TABLE IF NOT EXISTS needles (
bad_word varchar(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- Dumping data for table needles
INSERT INTO needles (bad_word) VALUES
('cat'),
('eagle');
The following query JOINs the two tables case-insensitively, using substring matching:
SELECT * FROM haystack AS h JOIN needles AS n ON h.word COLLATE utf8_general_ci LIKE CONCAT('%' , n.bad_word , '%');
You can, of course, easily turn this into an UPDATE. If you want to avoid flagging innocent words like "Dickson", "Stitsville", or "Assume", it gets much trickier. Here are the results I got:
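+----------+----------+
| word     | bad_word |
+----------+----------+
| a cat is | cat      |
| category | cat      |
| cat      | cat      |
| decatur  | cat      |
| eagle    | eagle    |
| a beagle | eagle    |
| Cat      | cat      |
| CAT      | cat      |
+----------+----------+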
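The join can be modelled in Python to see which rows it flags. `like_join` mirrors the answer's case-insensitive `LIKE CONCAT('%', bad_word, '%')`; `whole_word_join` is a swapped-in, stricter variant (not part of the answer) that requires the bad word to appear as a whole word, which drops matches such as "category" and "decatur". Both are sketches using the sample data above.

```python
import re

haystack = ["a cat is", "category", "cat", "decatur", "dog",
            "pigeon", "eagle", "a beagle", "Cat", "CAT"]
needles = ["cat", "eagle"]


def like_join(haystack, needles):
    """Case-insensitive substring match, like LIKE '%bad_word%' with a _ci collation."""
    return [(h, n) for h in haystack for n in needles if n.lower() in h.lower()]


def whole_word_join(haystack, needles):
    """Stricter variant: the bad word must appear as a whole word."""
    return [(h, n) for h in haystack for n in needles
            if re.search(r"\b" + re.escape(n) + r"\b", h, re.IGNORECASE)]


print(len(like_join(haystack, needles)))        # 8, the same pairs as the table
print(len(whole_word_join(haystack, needles)))  # 5
```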