Full Text Search not working with numbers - mysql

I have this table with song titles, artist etc.
| id | artist | title | search_tags |
| 1 | miley cyrus | 23 | miley cyrus 23 |
This is my query:
select * from music where match(search_tags) against ('+$search_value*' IN BOOLEAN MODE)
It works fine when I search for miley, but it doesn't show any results when I search for 23.
Note: I'm using MySQL 5.6 with InnoDB on Windows 10 and ft_min_word_len=1

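Searching for "23" does not work because the token is too short to be stored in the fulltext index.
By default, InnoDB only indexes tokens of at least 3 characters (innodb_ft_min_token_size = 3; the MyISAM setting ft_min_word_len defaults to 4). ft_min_word_len only applies to MyISAM, so setting it to 1 has no effect on an InnoDB table. You will need to lower innodb_ft_min_token_size to 1 or 2 for your query to work.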
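To see why "23" never reaches the index, here is a minimal Python sketch of the length filter. It is a simplified model, not InnoDB's real parser: it assumes tokens are runs of word characters and ignores stopwords, and the helper name indexed_tokens is hypothetical.

```python
import re


def indexed_tokens(text, min_token_size=3):
    """Simplified model of the fulltext parser (hypothetical helper).

    Splits on non-word characters, lower-cases, and drops every token shorter
    than min_token_size. Stopwords and InnoDB's exact tokenizer rules are ignored.
    """
    return [t for t in re.findall(r"\w+", text.lower()) if len(t) >= min_token_size]


print(indexed_tokens("miley cyrus 23"))     # -> ['miley', 'cyrus']  (InnoDB default of 3)
print(indexed_tokens("miley cyrus 23", 1))  # -> ['miley', 'cyrus', '23']
```

Only tokens that survive this filter can ever be matched, which is why the table has to be rebuilt after the setting changes.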
What you need to do here is:
Add this line to the [mysqld] section of your my.ini file
innodb_ft_min_token_size = 1
Restart MySQL Service
After the server comes back up, rebuild the table (and with it the fulltext index) by issuing a no-op ALTER command:
ALTER TABLE table_name ENGINE=INNODB;
Run the query again and it should work :)
Good Luck

Notice that ft_min_word_len is 4 by default, and it only applies to MyISAM; for InnoDB the equivalent setting is innodb_ft_min_token_size, which defaults to 3. The token 23 is only 2 characters long, so it will not be indexed at all. You will have to do three (3) things:
STEP 01 : Open the my.ini file inside the MySQL folder (here, \bin\mysql\mysql5.7.24) and add this to the [mysqld] section:
innodb_ft_min_token_size = 1
STEP 02 : Restart mysql
STEP 03 : Reindex all indexes in the music table
You could just drop and add the FULLTEXT index

Related

FullText Search Innodb Fails, MyIsam Returns Results

I've upgraded a table from MyISAM to InnoDB but am not getting the same results. The InnoDB table returns a 0 score when there should be some relation. The MyISAM table returns a match for the same term (I kept a copy of the old table so I can still run the same query).
SELECT MATCH (COLUMNS) AGAINST ('+"Term Ex"' IN BOOLEAN MODE) as score
FROM table_myisam
where id = 1;
Returns:
+-------+
| score |
+-------+
| 1 |
+-------+
but:
SELECT MATCH (COLUMNS) AGAINST ('+"Term Ex"' IN BOOLEAN MODE) as score
FROM table
where id = 1;
returns:
+-------+
| score |
+-------+
| 0 |
+-------+
I thought "ex" might not have been indexed because innodb_ft_min_token_size was set to 3. I lowered that to 1 and optimized the table, but that had no effect. The column contents are 99 characters long, so I presumed the whole column wasn't indexed because of innodb_ft_max_token_size. I increased that as well, to 150, and ran the optimize again, but again got the same result.
The only difference between these tables is the engine and the character set. This table is using utf8, the myisam table is using latin1.
Has anyone seen this behavior, or have advice on how to resolve it?
UPDATE:
I added ft_stopword_file="" to my my.cnf and ran OPTIMIZE TABLE table again. This time I got
optimize | note | Table does not support optimize, doing recreate + analyze instead
The query worked after this change. "Ex" is not a stop word, though, so I'm not sure why it would make a difference.
A new query that fails though is:
SELECT MATCH (Columns) AGAINST ('+Term +Ex +in' IN BOOLEAN MODE) as score FROM Table where id = 1;
+-------+
| score |
+-------+
| 0 |
+-------+
The word "in" causes this to fail, even though it is the next word in my table.
SELECT MATCH (Columns) AGAINST ('+Term +Ex' IN BOOLEAN MODE) as score FROM Table where id = 1;
+--------------------+
| score |
+--------------------+
| 219.30206298828125 |
+--------------------+
I also tried CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;, then updated my.cnf with innodb_ft_server_stopword_table='db/my_stopwords'. I restarted and ran:
show variables like 'innodb_ft_server_stopword_table';
which brought back:
+---------------------------------+---------------------------+
| Variable_name | Value |
+---------------------------------+---------------------------+
| innodb_ft_server_stopword_table | 'db/my_stopwords'; |
+---------------------------------+---------------------------+
so I thought the word "in" would no longer cause the query to fail, but it still does. I also tried OPTIMIZE TABLE table again, and even ALTER TABLE table DROP INDEX ... and ALTER TABLE table ADD FULLTEXT KEY ..., none of which had any effect.
Second Update
The issue is with the stop words.
$userinput = preg_replace('/\b(a|about|an|are|as|at|be|by|com|de|en|for|from|how|i|in|is|it|la|of|on|or|that|the|this|to|was|what|when|where|who|will|with|und|the|www)\b/', '', $userinput);
resolves the issue but that doesn't appear as a good solution to me. I'd like a solution that avoids the stop words breaking this in mysql.
Stopword table data:
CREATE TABLE `my_stopwords` (
`value` varchar(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
and
Name: my_stopwords
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 0
Avg_row_length: 0
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: NULL
Create_time: 2019-04-09 17:39:55
Update_time: NULL
Check_time: NULL
Collation: latin1_swedish_ci
Checksum: NULL
Create_options:
Comment:
There are several differences between MyISAM's FULLTEXT and InnoDB's. I think you were caught by the handling of 'short' words and/or stop words. MyISAM will show rows, but InnoDB will fail to.
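Here is a toy Python model of that failure mode. It assumes, as the question's results suggest, that a +term which never made it into the InnoDB index makes the whole boolean search return no match. The tokenizer is simplified, and the function names and sample row are hypothetical.

```python
import re


def indexed_tokens(text, min_token_size=3, stopwords=frozenset()):
    # Hypothetical model: tokens that are too short or are stopwords never reach the index.
    return {t for t in re.findall(r"\w+", text.lower())
            if len(t) >= min_token_size and t not in stopwords}


def required_terms_match(text, terms, min_token_size=3, stopwords=frozenset()):
    # '+a +b' IN BOOLEAN MODE needs every required term to be found in the index,
    # so a single unindexed term (short or stopword) is enough to get no match.
    indexed = indexed_tokens(text, min_token_size, stopwords)
    return all(term.lower() in indexed for term in terms)


row = "Term Ex in use"
print(required_terms_match(row, ["Term", "Ex"]))                   # False: "ex" is too short
print(required_terms_match(row, ["Term", "Ex"], 1))                # True once the size is 1
print(required_terms_match(row, ["Term", "Ex", "in"], 1, {"in"}))  # False: "in" is a stopword
```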
What I have done when using FT (after switching to InnoDB) is to filter the user's input to avoid short words. It takes extra effort but gets me the rows I want. My case is slightly different, since the resulting query is something like the following. Note that I have added + to require the words, but not on words shorter than 3 characters (my innodb_ft_min_token_size is 3). These searches were for "build a table" and "build the table":
WHERE match(description) AGAINST('+build* a +table*' IN BOOLEAN MODE)
WHERE match(description) AGAINST('+build* +the* +table*' IN BOOLEAN MODE)
(The trailing * may be redundant; I have not investigated that.)
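That input filtering can be sketched as a small Python helper. The name build_boolean_query and the optional stopwords parameter are assumptions for illustration; with the defaults it reproduces the two AGAINST strings above.

```python
import re


def build_boolean_query(user_input, min_token_size=3, stopwords=frozenset()):
    """Hypothetical helper: turn free text into an AGAINST(... IN BOOLEAN MODE) string.

    Words long enough to be indexed get '+' (required) and '*' (prefix match).
    Shorter words, and any listed stopwords, are passed through bare so they
    cannot make the whole search fail.
    """
    parts = []
    for raw in user_input.lower().split():
        word = re.sub(r"\W", "", raw)  # strip boolean operators and punctuation
        if not word:
            continue
        if len(word) >= min_token_size and word not in stopwords:
            parts.append(f"+{word}*")
        else:
            parts.append(word)
    return " ".join(parts)


print(build_boolean_query("build a table"))    # -> +build* a +table*
print(build_boolean_query("build the table"))  # -> +build* +the* +table*
```

Pass the resulting string to MySQL as a bound parameter rather than interpolating it into the SQL text.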
Another approach
Since FT is very efficient at non-short, non-stop words, do the search with two phases, each being optional: To search for "a long word", do
WHERE MATCH(d) AGAINST ('+long +word' IN BOOLEAN MODE)
AND d REGEXP '[[:<:]]a[[:>:]]'
The first part whittles down the possible rows rapidly by looking for 'long' and 'word' (as words). The second part makes sure there is a word a in the string, too. The REGEXP is costly but will be applied only to those rows that pass the first test.
To search just for "long word":
WHERE MATCH(d) AGAINST ('+long +word' IN BOOLEAN MODE)
To search just for the word "a":
WHERE d REGEXP '[[:<:]]a[[:>:]]'
Caveat: This case will be slow.
Note: My examples allow for the words to be in any order, and in any location in the string. That is, this string will match in all my examples: "She was longing for a word from him."
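Here is a rough sketch of how an application might assemble the two-phase WHERE clause from a search phrase. two_phase_where is a hypothetical helper; d is the column name from the examples, and min_token_size=3 assumes the same token size as the earlier example.

```python
def two_phase_where(phrase, min_token_size=3, stopwords=frozenset(), column="d"):
    """Hypothetical helper that builds a two-phase WHERE clause.

    Long, non-stop words go into one MATCH ... AGAINST (fast, uses the FULLTEXT
    index); every other word gets its own word-boundary REGEXP (slow, but only
    applied to rows that survive the MATCH when there is one).
    Only purely alphanumeric words are kept, so no SQL escaping is needed here.
    """
    words = [w for w in phrase.lower().split() if w.isalnum()]
    ft_words = [w for w in words if len(w) >= min_token_size and w not in stopwords]
    other_words = [w for w in words if w not in ft_words]
    clauses = []
    if ft_words:
        terms = " ".join("+" + w for w in ft_words)
        clauses.append(f"MATCH({column}) AGAINST ('{terms}' IN BOOLEAN MODE)")
    for w in other_words:
        clauses.append(f"{column} REGEXP '[[:<:]]{w}[[:>:]]'")
    return "\n  AND ".join(clauses)


print(two_phase_where("a long word"))
```

The [[:<:]] and [[:>:]] word-boundary markers are the MySQL 5.x syntax used in the examples; MySQL 8.0's ICU-based REGEXP does not accept them and uses \b instead.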
Here is a step by step procedure which should have reproduced your problem. (This is actually how you should have written your question.) The environment is a freshly installed VM with Debian 9.8 and Percona Server Ver 5.6.43-84.3.
Create an InnoDB table with a fulltext index and some dummy data:
create table test.ft_innodb (
txt text,
fulltext index (txt)
) engine=innodb charset=utf8 collate=utf8_unicode_ci;
insert into test.ft_innodb (txt) values
('Some dummy text'),
('Text with a long and short stop words in it ex');
Execute a test query to verify that it doesn't yet work as needed:
select txt
, match(t.txt) against ('+some' in boolean mode) as score0
, match(t.txt) against ('+with' in boolean mode) as score1
, match(t.txt) against ('+in' in boolean mode) as score2
, match(t.txt) against ('+ex' in boolean mode) as score3
from test.ft_innodb t;
Result (rounded):
txt | score0 | score1 | score2 | score3
-----------------------------------------------|--------|--------|--------|-------
Some dummy text | 0.0906 | 0 | 0 | 0
Text with a long and short stop words in it ex | 0 | 0 | 0 | 0
As you see, it's not working with stop words ("+with") or with short words ("+ex").
Create an empty InnoDB table for custom stop words:
create table test.my_stopwords (value varchar(30)) engine=innodb;
Edit /etc/mysql/my.cnf and add the following two lines in the [mysqld] block:
[mysqld]
# other settings
innodb_ft_server_stopword_table = "test/my_stopwords"
innodb_ft_min_token_size = 1
Restart MySQL with service mysql restart
Run the test query again (the result should be the same).
Rebuild the fulltext index with
optimize table test.ft_innodb;
It will actually rebuild the entire table, including all indexes.
Execute the test query again. Now the result is:
txt | score0 | score1 | score2 | score3
-----------------------------------------------|--------|--------|--------|-------
Some dummy text | 0.0906 | 0 | 0 | 0
Text with a long and short stop words in it ex | 0 | 0.0906 | 0.0906 | 0.0906
You see it works just fine for me. And it's quite simple to reproduce. (Again - This is how you should have written your question.)
Since your description is more chaotic than detailed, it's difficult to say what could have gone wrong for you. For example:
CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
This doesn't say in which database you created that table. Note that I have prefixed all my tables with the corresponding database. Now consider the following: I change my.cnf and set innodb_ft_server_stopword_table = "db/my_stopwords". Note that there is no such table on my server (not even the schema db exists). I restart the MySQL server and check the new settings with
show variables like 'innodb_ft_server_stopword_table';
This returns:
Variable_name | Value
--------------------------------|----------------
innodb_ft_server_stopword_table | db/my_stopwords
And after optimize table test.ft_innodb; the test query returns this:
txt | score0 | score1 | score2 | score3
-----------------------------------------------|--------|--------|--------|-------
Some dummy text | 0.0906 | 0 | 0 | 0
Text with a long and short stop words in it ex | 0 | 0 | 0 | 0.0906
You see? It's not working with stopwords any more, but it does work with short non-stop words like "+ex". So make sure that the table you defined in innodb_ft_server_stopword_table actually exists.
A common technique in searching is to make an extra column with the 'sanitized' string to search in. Then add the FULLTEXT index to that column instead of the original column.
In your case, removing the stopwords is the main difference. But there may also be punctuation that could (should?) be removed. Sometimes hyphenated words, contractions, part numbers or model numbers cause trouble. They can be modified to change the punctuation or spacing, making them friendlier to the FT requirements and/or to the way your users type their input. Another option is to add common misspellings of the words in the column to the search-string column.
Sure, this is more work than you would like to have to do. But I think it provides a viable solution.
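Here is a minimal sketch of such a sanitizing step in Python. It assumes a hypothetical extra column (say search_text) that carries the FULLTEXT index. The stopword list is a tiny illustrative subset, and the joining rules are only one possible choice.

```python
import re

# Assumption: a tiny illustrative stopword list; use the same list the server uses.
STOPWORDS = frozenset({"a", "an", "in", "is", "the", "with"})


def sanitize_for_ft(text, stopwords=STOPWORDS):
    """Hypothetical normalizer for a separate FULLTEXT-indexed search column.

    Apply the same function to the stored text and to the user's search input.
    """
    text = text.lower()
    # Join hyphenated words, part numbers and contractions: "xr-500" -> "xr500", "isn't" -> "isnt"
    text = re.sub(r"(?<=\w)['-](?=\w)", "", text)
    # Turn any remaining punctuation into spaces
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(w for w in text.split() if w not in stopwords)


print(sanitize_for_ft("The XR-500 is in stock, isn't it?"))  # -> xr500 stock isnt it
```

The application would write this value into the extra column on every insert or update, and run MATCH ... AGAINST on that column with a sanitized search string.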

MySQL performance for version 5.7 vs. 5.6

I have noticed a particular performance issue that I am unsure on how to deal with.
I am in the process of migrating a web application from one server to another with very similar specifications. The new server typically outperforms the old server to be clear.
The old server is running MySQL 5.6.35
The new server is running MySQL 5.7.17
Both the new and old server have virtually identical MySQL configurations.
Both the new and old server are running the exact same database perfectly duplicated.
The web application in question is Magento 1.9.3.2.
In Magento, the following function
Mage_Catalog_Model_Category::getChildrenCategories()
is intended to list all the immediate children categories given a certain category.
In my case, this function bubbles down eventually to this query:
SELECT `main_table`.`entity_id`
, main_table.`name`
, main_table.`path`
, `main_table`.`is_active`
, `main_table`.`is_anchor`
, `url_rewrite`.`request_path`
FROM `catalog_category_flat_store_1` AS `main_table`
LEFT JOIN `core_url_rewrite` AS `url_rewrite`
ON url_rewrite.category_id=main_table.entity_id
AND url_rewrite.is_system=1
AND url_rewrite.store_id = 1
AND url_rewrite.id_path LIKE 'category/%'
WHERE (main_table.include_in_menu = '1')
AND (main_table.is_active = '1')
AND (main_table.path LIKE '1/494/%')
AND (`level` <= 2)
ORDER BY `main_table`.`position` ASC;
While the structure of this query is the same for any Magento installation, the values will obviously differ slightly from one Magento installation to another, and depending on which category the function is looking at.
My catalog_category_flat_store_1 table has 214 rows.
My url_rewrite table has 1,734,316 rows.
This query, when executed on its own directly into MySQL performs very differently between MySQL versions.
I am using SQLyog to profile this query.
In MySQL 5.6, the above query performs in 0.04 seconds. The profile for this query looks like this: https://codepen.io/Petce/full/JNKEpy/
In MySQL 5.7, the above query performs in 1.952 seconds. The profile for this query looks like this: https://codepen.io/Petce/full/gWMgKZ/
As you can see, the same query on almost the exact same setup is virtually 2 seconds slower, and I am unsure as to why.
For some reason, MySQL 5.7 does not want to use the table index to help produce the result set.
Anyone out there with more experience/knowledge can explain what is going on here and how to go about fixing it?
I believe the issue has something to do with the way the MySQL 5.7 optimizer works. For some reason, it appears to think that a full table scan is the way to go. I can drastically improve the query performance by setting max_seeks_for_key very low (like 100), or by lowering range_optimizer_max_mem_size enough to force it to throw a warning.
Doing either of these makes the query almost 10x faster, down to 0.2 sec. However, this is still about five times slower than MySQL 5.6, which executes it in 0.04 seconds, and I don't think either of these is a good idea, as I'm not sure what other implications they would have.
It is also very difficult to modify the query, as it is generated by the Magento framework; changing it would require customising the Magento codebase, which I'd like to avoid. I'm also not even sure whether it is the only query that is affected.
I have included the minor versions for my MySQL installations. I am now attempting to update MySQL 5.7.17 to 5.7.18 (the latest build) to see if there is any update to the performance.
After upgrading to MySQL 5.7.18 I saw no improvement. In order to bring the system back to a stable high performing state, we decided to downgrade back to MySQL 5.6.30. After doing the downgrade we saw an instant improvement.
The above query executed in MySQL 5.6.30 on the NEW server executed in 0.036 seconds.
Wow! This is the first time I have seen something useful from Profiling. Dynamically creating an index is a new Optimization feature from Oracle. But it looks like that was not the best plan for this case.
First, I will recommend that you file a bug at http://bugs.mysql.com -- they don't like to have regressions, especially this egregious. If possible, provide EXPLAIN FORMAT=JSON SELECT... and "Optimizer trace". (I do not accept tweaking obscure tunables as an acceptable answer, but thanks for discovering them.)
Back to helping you...
If you don't need LEFT, don't use it. It returns NULLs when there are no matching rows in the 'right' table; will that happen in your case?
Please provide SHOW CREATE TABLE. Meanwhile, I will guess that you don't have INDEX(include_in_menu, is_active, path). The first two can be in either order; path needs to be last.
And INDEX(category_id, is_system, store_id, id_path) with id_path last.
Your query seems to have a pattern that works well for turning into a subquery:
(Note: this even preserves the semantics of LEFT.)
SELECT `main_table`.`entity_id` , main_table.`name` , main_table.`path` ,
`main_table`.`is_active` , `main_table`.`is_anchor` ,
( SELECT `request_path`
FROM `core_url_rewrite` AS url_rewrite
WHERE url_rewrite.category_id=main_table.entity_id
AND url_rewrite.is_system = 1
AND url_rewrite.store_id = 1
AND url_rewrite.id_path LIKE 'category/%'
) as request_path
FROM `catalog_category_flat_store_1` AS `main_table`
WHERE (main_table.include_in_menu = '1')
AND (main_table.is_active = '1')
AND (main_table.path like '1/494/%')
AND (`level` <= 2)
ORDER BY `main_table`.`position` ASC
LIMIT 0, 1000
(The suggested indexes apply here, too.)
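To make the remark about preserving the semantics of LEFT concrete, here is a toy Python model of both query shapes, with the join filters (is_system, store_id, id_path) left out. It also shows the caveat: the two are only equivalent when each category has at most one matching rewrite row. With two matches, the LEFT JOIN returns two rows, while MySQL rejects the scalar subquery with "Subquery returns more than 1 row". Whether Magento's data ever has such duplicates is an assumption to check.

```python
def left_join(categories, rewrites):
    """Rows produced by main_table LEFT JOIN url_rewrite (join filters omitted)."""
    rows = []
    for cat in categories:
        paths = [r["request_path"] for r in rewrites if r["category_id"] == cat["entity_id"]]
        if paths:
            rows.extend((cat["entity_id"], p) for p in paths)  # one row per match
        else:
            rows.append((cat["entity_id"], None))  # LEFT keeps the category with NULL
    return rows


def scalar_subquery(categories, rewrites):
    """Rows produced by the correlated (SELECT request_path ...) column."""
    rows = []
    for cat in categories:
        paths = [r["request_path"] for r in rewrites if r["category_id"] == cat["entity_id"]]
        if len(paths) > 1:
            # MySQL reports "Subquery returns more than 1 row" in this situation
            raise ValueError("Subquery returns more than 1 row")
        rows.append((cat["entity_id"], paths[0] if paths else None))
    return rows


cats = [{"entity_id": 10}, {"entity_id": 11}]
rewrites = [{"category_id": 10, "request_path": "shoes.html"}]
print(left_join(cats, rewrites) == scalar_subquery(cats, rewrites))  # True
```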
This is not an ANSWER, only a reply to the comment by @Nigel Ren.
Here you can see that LIKE also uses an index.
mysql> SELECT *
-> FROM testdb
-> WHERE
-> vals LIKE 'text%';
+----+---------------------------------------+
| id | vals |
+----+---------------------------------------+
| 3 | text for line number 3 |
| 1 | textline 1 we rqwe rq wer qwer q wer |
| 2 | textline 2 asdf asd fas f asf wer 3 |
+----+---------------------------------------+
3 rows in set (0,00 sec)
mysql> EXPLAIN
-> SELECT *
-> FROM testdb
-> WHERE
-> vals LIKE 'text%';
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testdb | NULL | range | vals | vals | 515 | NULL | 3 | 100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0,01 sec)
mysql>
Sample with LEFT():
mysql> SELECT *
-> FROM testdb
-> WHERE
-> LEFT(vals,4) = 'text';
+----+---------------------------------------+
| id | vals |
+----+---------------------------------------+
| 3 | text for line number 3 |
| 1 | textline 1 we rqwe rq wer qwer q wer |
| 2 | textline 2 asdf asd fas f asf wer 3 |
+----+---------------------------------------+
3 rows in set (0,01 sec)
mysql> EXPLAIN
-> SELECT *
-> FROM testdb
-> WHERE
-> LEFT(vals,4) = 'text';
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testdb | NULL | index | NULL | vals | 515 | NULL | 5 | 100.00 | Using where; Using index |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0,01 sec)
mysql>
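The difference between the two EXPLAIN plans (range vs. index) can be modeled with a sorted Python list standing in for the B-tree on vals. This is a simplified sketch: Python's code-point ordering stands in for the column collation, and the function names are hypothetical.

```python
import bisect


def like_prefix_scan(sorted_keys, prefix):
    """Model of `vals LIKE 'text%'` on an index: seek once, then read a contiguous range."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches, examined = [], 0
    for key in sorted_keys[start:]:
        examined += 1
        if not key.startswith(prefix):
            break  # past the end of the prefix range
        matches.append(key)
    return matches, examined


def left_function_scan(sorted_keys, prefix):
    """Model of `LEFT(vals, 4) = 'text'`: the function hides the column, so every entry is read."""
    matches = [k for k in sorted_keys if k[: len(prefix)] == prefix]
    return matches, len(sorted_keys)


keys = sorted(["abc", "text for line number 3", "textline 1", "textline 2", "zzz"])
print(like_prefix_scan(keys, "text"))    # (['text for line number 3', 'textline 1', 'textline 2'], 4)
print(left_function_scan(keys, "text"))  # (['text for line number 3', 'textline 1', 'textline 2'], 5)
```

Both return the same rows, but only the prefix form can stop early, which is what type = range means in the first EXPLAIN.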

Mysql search for string and number using MATCH() AGAINST()

I have a problem with the MATCH AGAINST function.
The following queries give me the same result:
SELECT * FROM models WHERE MATCH(name) AGAINST('Fiat 500')
SELECT * FROM models WHERE MATCH(name) AGAINST('Fiat')
How can I search for both strings and numbers in a column of a FULL TEXT table?
Thanks
If you need Fiat and 500 anywhere, in any order, then
SELECT * FROM models WHERE MATCH(name) AGAINST('+Fiat +500' IN BOOLEAN MODE);
If you need Fiat 500 together as a phrase, then
SELECT * FROM models WHERE MATCH(name) AGAINST('+"Fiat 500"' IN BOOLEAN MODE);
If you need Fiat, with 500 optional (rows that also contain 500 rank higher), then
SELECT * FROM models WHERE MATCH(name) AGAINST('+Fiat 500' IN BOOLEAN MODE);
If you need 500, with Fiat optional (rows that also contain Fiat rank higher), then
SELECT * FROM models WHERE MATCH(name) AGAINST('Fiat +500' IN BOOLEAN MODE);
Give it a Try !!!
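Here is a toy Python model of what the + operator and bare words do in the four queries above. It only checks for presence, and its score is just a count of matched terms, not MySQL's relevance formula. It also ignores the minimum word length entirely. The names TERM and boolean_match are hypothetical.

```python
import re

# One query term: optional leading '+', then either a "quoted phrase" or a bare word.
TERM = re.compile(r'(\+?)(?:"([^"]*)"|(\S+))')


def boolean_match(text, query):
    """Toy model of '+' and bare terms IN BOOLEAN MODE.

    Returns (matched, score), where score simply counts the terms found.
    A row must contain every '+' term; bare terms are optional, but a row
    needs at least one term present to match at all.
    """
    padded = " " + " ".join(text.lower().split()) + " "
    score = 0
    for plus, phrase, word in TERM.findall(query.lower()):
        present = f" {phrase or word} " in padded
        if plus and not present:
            return False, 0
        if present:
            score += 1
    return score > 0, score


rows = ["Fiat 500", "Fiat Punto", "Abarth 500"]
for q in ["+Fiat +500", '+"Fiat 500"', "+Fiat 500", "Fiat +500"]:
    print(q, [boolean_match(row, q)[0] for row in rows])
```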
UPDATE 2013-01-28 18:28 EDT
Here are the default settings for FULLTEXT searching
mysql> show variables like 'ft%';
+--------------------------+----------------+
| Variable_name | Value |
+--------------------------+----------------+
| ft_boolean_syntax | + -><()~*:""&| |
| ft_max_word_len | 84 |
| ft_min_word_len | 4 |
| ft_query_expansion_limit | 20 |
| ft_stopword_file | (built-in) |
+--------------------------+----------------+
5 rows in set (0.00 sec)
mysql>
Notice that ft_min_word_len is 4 by default. The token 500 has length 3, so it will not be indexed at all. You will have to do three (3) things:
STEP 01 : Configure for smaller string tokens
Add this to /etc/my.cnf
[mysqld]
ft_min_word_len = 1
STEP 02 : Restart mysql
service mysql restart
STEP 03 : Reindex all indexes in the models table
You could just drop and add the FULLTEXT index
or do it in stages and see how big it will get in advance
CREATE TABLE models_new LIKE models;
ALTER TABLE models_new DROP INDEX name;
ALTER TABLE models_new ADD FULLTEXT name (name);
ALTER TABLE models_new DISABLE KEYS;
INSERT INTO models_new SELECT * FROM models;
ALTER TABLE models_new ENABLE KEYS;
ALTER TABLE models RENAME models_old;
ALTER TABLE models_new RENAME models;
When you are satisfied this worked, then run
DROP TABLE models_old;
Give it a Try !!!
Just in case it helps others: if you're using InnoDB, you need to use a different set of my.cnf properties,
e.g.
show variables like 'innodb_ft%';
and in my.cnf set the following value instead of ft_min_word_len
innodb_ft_min_token_size=1

mysql dump query hangs

I ran mysql -u root -p gf < ~/gf_backup.sql to restore my db. However, when I look at the process list, I see that one query has been stuck for a long time. I do not know why.
mysql> show processlist;
+-----+------+-----------+-------------+---------+-------+-----------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+------+-----------+-------------+---------+-------+-----------+------------------------------------------------------------------------------------------------------+
| 662 | root | localhost | gf | Query | 18925 | query end | INSERT INTO `gf_1` VALUES (1767654,'90026','Lddd',3343,34349),(1 |
| 672 | root | localhost | gf | Query | 0 | NULL | show processlist |
+-----+------+-----------+-------------+---------+-------+-----------+------------------------------------------------------------------------------------------------------+
Please check free space with the df -h command (if under Linux/Unix). If you're out of space, do not kill or restart MySQL; free some space and let it catch up with the changes.
You may also want to check the max_allowed_packet setting in my.cnf and set it to something like 256M; please refer to http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_max_allowed_packet
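For scripted checks, the free-space figure that df -h reports can be read with Python's standard library. /var/lib/mysql is only an assumed datadir (check yours with SELECT @@datadir;), and the 1 GiB threshold is arbitrary.

```python
import shutil

# Assumption: a common Linux datadir; check yours with SELECT @@datadir;
MYSQL_DATADIR = "/var/lib/mysql"


def free_gib(path=MYSQL_DATADIR):
    """Free space (GiB) on the filesystem holding `path`, similar to the
    "Avail" column of `df -h`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def enough_space(path=MYSQL_DATADIR, min_free_gib=1.0):
    # min_free_gib is an arbitrary illustrative threshold
    return free_gib(path) >= min_free_gib
```

For example, `print(f"{free_gib():.1f} GiB free")` on the database server before restarting the import.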
Probably your dump is very large and contains much normalized data (records split into a bunch of tables, with a bunch of foreign key constraints, indexes and so on).
If so, you may try to remove all constraints and index definitions from the SQL file, then import the data and re-create the former removed directives. This is a well-known trick to speed up imports, because INSERT commands without validation of any constraints are a lot faster, and creation of an index and so on afterwards can be done in a single transaction.
See also: http://support.tigertech.net/mysql-large-inserts
Of course, you should kill the query first, and remove any partial data it has already created.
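The remove-then-recreate trick can be partly automated. Below is a rough sketch for a single CREATE TABLE statement, assuming mysqldump's layout with one column or index definition per line. split_create_table is hypothetical. It keeps the PRIMARY KEY in place, since InnoDB stores rows in primary-key order. Test it on a copy of the dump first.

```python
import re

# Secondary index and constraint lines as mysqldump writes them (assumed layout).
DEFERRABLE = re.compile(r"^\s*(?:(?:UNIQUE|FULLTEXT|SPATIAL)\s+)?KEY\b|^\s*CONSTRAINT\b")


def split_create_table(create_sql):
    """Return (CREATE TABLE without secondary keys/constraints, list of ALTER TABLE ... ADD ...)."""
    lines = create_sql.strip().splitlines()
    header, footer = lines[0], lines[-1]  # "CREATE TABLE `t` (" and ") ENGINE=...;"
    m = re.match(r"CREATE TABLE (`[^`]+`)", header)
    if m is None:
        raise ValueError("not a mysqldump-style CREATE TABLE statement")
    table = m.group(1)
    kept, deferred = [], []
    for line in lines[1:-1]:
        if DEFERRABLE.match(line):
            deferred.append(line.strip().rstrip(","))
        else:
            kept.append(line.rstrip().rstrip(","))  # PRIMARY KEY and columns stay
    create = header + "\n" + ",\n".join(kept) + "\n" + footer
    alters = [f"ALTER TABLE {table} ADD {d};" for d in deferred]
    return create, alters


create, alters = split_create_table(
    "CREATE TABLE `gf_1` (\n"
    "  `id` int NOT NULL,\n"
    "  `code` varchar(10) DEFAULT NULL,\n"
    "  PRIMARY KEY (`id`),\n"
    "  KEY `idx_code` (`code`)\n"
    ") ENGINE=InnoDB;"
)
print(create)
print("\n".join(alters))  # run these after the INSERTs have finished
```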

MySQL using only first 20 characters of a string when ordering records

It appears that when I’m using order by name statement where name has a varchar(255) type, MySQL at my server doesn’t place records in proper order if they have same 20 first characters of name field. It seems like MySQL doesn’t care about 21st character at all: it actually preserves the same incorrect order when sorting in descending order.
I replicated my table on another MySQL installation and everything is OK there. But what do I do about this limitation on a server? I can’t reinstall MySQL there because I’m using shared hosting.
Update: the name field doesn’t belong to any index and creating index on this field doesn’t help either.
MySQL version is 5.1.55, engine is MyISAM.
Update 2: I originally used cp1251_general_ci collation but then I tried other collations and got the exact same result. For strings I used '123456789012345678901'/'123456789012345678902' and 'abcdefghijklmnopqrstauvwxyz'/'abcdefghijklmnopqrstbuvwxyz', same result.
Ordering seems to not take into consideration all characters starting from the 21st, but otherwise it’s working as it should.
Interestingly, when using ORDER BY substring(name, 2) the 21st character matters but the 22nd does not.
Could you check your max_length_for_sort_data and max_sort_length variables? The default for both is 1024; if one of them is set to 20, that explains it all.
mysqladmin -u root -p variables | grep sort
Enter password:
| max_length_for_sort_data | 1024 |
| max_sort_length | 1024 |
| myisam_max_sort_file_size | 2146435072 |
| myisam_sort_buffer_size | 8388608 |
| optimizer_switch | index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on |
| sort_buffer_size | 2097144 |
You can find more info in the ORDER BY optimization chapter of the MySQL server manual and in the max_length_for_sort_data definition.
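Here is a small Python model of the symptom, assuming the server compares only the first 20 characters (as a max_sort_length of 20 would). MySQL itself makes no promise about the order of rows whose sort keys tie. Python's stable sort is used here only to mimic the reported "same order in both directions" behavior. order_by is a hypothetical helper.

```python
def order_by(values, max_sort_length=1024, descending=False):
    """Model of ORDER BY on a string column where only the first
    max_sort_length characters are compared (hypothetical helper).

    Python's sort is stable, even with reverse=True, so values whose compared
    prefixes are equal keep their original relative order either way.
    """
    return sorted(values, key=lambda s: s[:max_sort_length], reverse=descending)


names = ["123456789012345678902", "123456789012345678901"]
print(order_by(names, 20))                   # unchanged order
print(order_by(names, 20, descending=True))  # still unchanged
print(order_by(names))                       # correct order with the default 1024
```

The substring(name, 2) observation fits the same model: skipping the first character shifts the 20 compared characters so that they now reach the 21st character, but not the 22nd.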