MySQL FULLTEXT not working properly after a period of time - mysql

I created a little table table for testing, in which there is the attribute "video_name" with the FULLTEXT modifier:
| videos | CREATE TABLE `videos` (
`video_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`video_name` varchar(70) DEFAULT NULL,
PRIMARY KEY (`video_id`),
UNIQUE KEY `video_name_2` (`video_name`),
FULLTEXT KEY `video_name` (`video_name`),
CONSTRAINT `no_empty_name` CHECK (`video_name` <> '')
) ENGINE=InnoDB AUTO_INCREMENT=153 DEFAULT CHARSET=utf8mb4 |
With this, I can do searches in both the NATURAL LANGUAGE and BOOLEAN mode.
Sadly, It seems that the modifier It isn't working correctly if I don't use the DB after a period of time.
The first time something went wrong, I noticed that the search was only working with the newer entries, only getting empty results with the older ones, even in BOOLEAN mode searching with the exact value of the "video_name", so the problem isn't because the number of rows. At that moment I fixed it by removing the modifier and applying It once again.
Now, today I was having problems again, but this time the entries that I have just added where the ones not working. Like the last time, re-applying the modifier fixed the problem, but, as we can see, It's pretty much guaranteed that the problems will appear again.
Is this behaviour correct? Am I missing some configuration?
Thank you for your time.
EDIT:
This is what the command SHOW VARIABLES LIKE 'innodb_ft%' returns:
+---------------------------------+------------+
| Variable_name | Value |
+---------------------------------+------------+
| innodb_ft_aux_table | |
| innodb_ft_cache_size | 8000000 |
| innodb_ft_enable_diag_print | OFF |
| innodb_ft_enable_stopword | ON |
| innodb_ft_max_token_size | 84 |
| innodb_ft_min_token_size | 3 |
| innodb_ft_num_word_optimize | 2000 |
| innodb_ft_result_cache_limit | 2000000000 |
| innodb_ft_server_stopword_table | |
| innodb_ft_sort_pll_degree | 2 |
| innodb_ft_total_cache_size | 640000000 |
| innodb_ft_user_stopword_table | |
+---------------------------------+------------+

Related

MySQL doesn't use my index while it declares it will in explain statement

I recently encounter a problem involving MySQL DBSM.
The Table is like this:
CREATE TABLE `orders` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(60) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
`sex` enum('男','女') DEFAULT NULL,
`amount` float(10,2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_i` (`name`),
KEY `sex` (`sex`)
) ENGINE=InnoDB AUTO_INCREMENT=5000001 DEFAULT CHARSET=utf8
As is shown above ,I create a single colume index on col name
I want to perform a range query on name, and the explain statement is
mysql> explain select * from orders where name like '王%';
+----+-------------+--------+------------+-------+---------------+--------+---------+------+-------+----------+----------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+--------+---------+------+-------+----------+----------------------------------+
| 1 | SIMPLE | orders | NULL | range | name_i | name_i | 183 | NULL | 20630 | 100.00 | Using index condition; Using MRR |
+----+-------------+--------+------------+-------+---------------+--------+---------+------+-------+----------+----------------------------------+
1 row in set, 1 warning (0.10 sec)
so it should use the index name_i and finish the query in a flash(my classmate spent 0.07 sec)
however , this is how it turned out:
| 4998119 | 王缝 | 27 | 男 | 159.21 |
| 4998232 | 王求葬 | 19 | 男 | 335.65 |
| 4998397 | 王倘予 | 49 | 女 | 103.39 |
| 4998482 | 王厚 | 77 | 男 | 960.69 |
| 4998703 | 王啄淋 | 73 | 女 | 458.85 |
| 4999106 | 王般埋 | 70 | 女 | 700.98 |
| 4999359 | 王胆具 | 31 | 女 | 362.83 |
| 4999510 | 王铁脾 | 31 | 女 | 973.09 |
| 4999880 | 王战万 | 59 | 女 | 127.28 |
| 4999928 | 王忆 | 42 | 女 | 72.47 |
+---------+--------+------+------+--------+
11160 rows in set (3.43 sec)
And it seems to not use the index at all, because the data is sorted by the primary key id rather than col name(besides it is too slow ,comparing to 0.07 sec).
Has anyone encountered the problem too?
What percentage of the table is "Kings" (王) ? If it is more than about 20%, it will choose to do a table scan instead of use the index. (And this may actually be faster.) (Based on Comments, 0.22% of the table is Kings.)
EXPLAIN and the execution of the query are separate things. Although I don't remember proving this, it is possible that the EXPLAIN might say one thing, but the query would work another way.
Do you have 5 million rows in the table? Was the cache 'cold' when you first ran it? And it had to fetch 11,160 rows from disk? Then the second time, all was in cache, so much faster?
Was the table loaded in "alphabetical" (or whatever the Chinese word for that is) order? If so, there is a good chance the ids and the names are in the same order?
Apparently you are using utf8_general_ci COLLATION? Maybe it does not sort Chinese well. (Provide a test case; I'll do some tests.)
I do not understand why it mentioned MRR.
I, too, am baffled by "1 min 32.24sec". The ORDER BY name should have further encouraged the Optimizer to use INDEX(name). Can you turn on "Optimizer trace".
To really see whether it used the index, do this:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
If the big number(s) look like the number of rows in the table, then it did a table scan. If they look more like 11160, then they used the index.

ERROR 188 (HY000): FTS query exceeds result cache limit mysql

I have full text index on text column of my table with around 11 million rows.
Table structure :
CREATE TABLE `review` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`comments` text COLLATE utf8mb4_unicode_ci,
`title` varchar(1000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `reviewer_id` (`reviewer_id`),
FULLTEXT KEY `comments` (`comments`)
) ENGINE=InnoDB AUTO_INCREMENT=273001866 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED;
I tried search like :
SELECT
id
FROM
review
WHERE MATCH (comments) AGAINST ('"This is review is only for Campus tours and not for University itself as no one can write a review on University"' IN BOOLEAN MODE)
This is throwing following error :
ERROR 188 (HY000): FTS query exceeds result cache limit
Google says this was a bug which has been fixed by mysql in 5.7. I am using 5.7.19.
Any idea how to resolve this.
Copy pasting ft variables:
mysql> show global variables like 'innodb_ft%';
+---------------------------------+--------------------+
| Variable_name | Value |
+---------------------------------+--------------------+
| innodb_ft_aux_table | |
| innodb_ft_cache_size | 8000000 |
| innodb_ft_enable_diag_print | OFF |
| innodb_ft_enable_stopword | ON |
| innodb_ft_max_token_size | 84 |
| innodb_ft_min_token_size | 3 |
| innodb_ft_num_word_optimize | 2000 |
| innodb_ft_result_cache_limit | 2000000000 |
| innodb_ft_server_stopword_table | local/my_stopwords |
| innodb_ft_sort_pll_degree | 2 |
| innodb_ft_total_cache_size | 640000000 |
| innodb_ft_user_stopword_table | |
+---------------------------------+--------------------+
12 rows in set (0.00 sec)
I have encountered this problem after adding a full-text index to a text column in a huge table (10 mln rows).
Previously I solved this by restarting the server, but now I can't restart it because it's performing some computations.
Solved this by adjusting this setting (2 times more than default):
SET GLOBAL innodb_ft_result_cache_limit = 4000000000;

MySQL: Slow avg query for 411M rows

I have a simple table (created by django) - engine InnoDB:
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| correlation | double | NO | | NULL | |
| gene1_id | int(10) unsigned | NO | MUL | NULL | |
| gene2_id | int(10) unsigned | NO | MUL | NULL | |
+-------------+------------------+------+-----+---------+----------------+
The table has more than 411 million rows.
(The target table will have around 461M rows, 21471*21470 rows)
My main query looks like this, there might be up to 10 genes specified at most.
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL
This query is very slow, it takes almost 2 mins to run:
21471 rows in set (1 min 11.03 sec)
Indexes (cardinality looks strange - too small?):
Non_unique| Key_name | Seq_in_index | Column_name | Collation | Cardinality |
0 | PRIMARY | 1 | id | A | 411512194 |
1 | c_gene1_id_6b1d81605661118_fk_genes_gene_entrez | 1 | gene1_id | A | 18 |
1 | c_gene2_id_2d0044eaa6fd8c0f_fk_genes_gene_entrez | 1 | gene2_id | A | 18 |
I just run select count(*) on that table and it took 22 mins:
select count(*) from predictions_genescorrelation;
+-----------+
| count(*) |
+-----------+
| 411512002 |
+-----------+
1 row in set (22 min 45.05 sec)
What could be wrong?
I suspect that mysql configuration is not set up right.
During the import of data I experienced problem with space, so that might also affected the database, although I ran check table later - it took 2hours and stated OK.
Additionally - the cardinality of the indexes look strange. I have set up smaller database locally and there values are totally different (254945589,56528,17).
Should I redo indexes?
What params should I check of MySQL?
My tables are set up as InnoDB, would MyISAM make any difference?
Thanks,
matali
https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/
SELECT COUNT(*) queries are very slow without WHERE clause or without SELECT COUNT(id) ... USE INDEX (PRIMARY).
to speedup this:
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL
you should have composite key on (gene2_id, gene1_id, correlation) in that order. try
About index-cardinality: stats of Innodb tables are approximate, not accurate (sometimes insane). there even was (IS?) a bug-report https://bugs.mysql.com/bug.php?id=58382
Try to ANALIZE table and watch cardinality again

Redesigning a self-referencing mysql table

I have inherited a legacy application which makes use of a self-referencing table to facilitate a hierarchical structure. This results in recursive method calls which are creating a "bad smell".
The parent_id column references the primary key of the same table, and there are roughly 25 million records here:
+-------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+----------------+
| phase_id | int(10) | NO | PRI | NULL | auto_increment |
| plat_id | int(10) | YES | MUL | NULL | |
| name | text | YES | | NULL | |
| parent_id | int(10) | YES | MUL | NULL | |
| plan_id | int(10) | YES | MUL | NULL | |
+-------------+---------+------+-----+---------+----------------+
mysql> show table status like 'ref'\G
*************************** 1. row ***************************
Name: phase
Engine: MyISAM
Version: 10
Row_format: Dynamic
Rows: 25223658
Avg_row_length: 20
Data_length: 509450960
Max_data_length: 281474976710655
Index_length: 1026267136
Data_free: 0
Auto_increment: 25238013
I have a few questions about this kind of structure:
Is it generally bad practice to implement a self-referencing table? The main negative I can think of is that it is difficult/impossible to get the maximum depth of the hierarchy in a single query as there may be an X number children.
Is it worth redesigning this? Having so much data makes it more difficult to move it around.
What are my options? I have heard a little bit about table partitioning, but don't know if it is suitable in my scenario.
Any pointers would be really appreciated
ns
We ended up scrapping the existing self-referencing table. And created a new table to house a more simple model.
Thanks to Sebas for the Link above. There is a lot of goodness in there!

MySQL Index is bigger than the data stored

I have a database with the following stats
Tables Data Index Total
11 579,6 MB 0,9 GB 1,5 GB
So as you can see the Index is close to 2x bigger. And there is one table with ~7 million rows that takes up at least 99% of this.
I also have two indexes that are very similar
a) UNIQUE KEY `idx_customer_invoice` (`customer_id`,`invoice_no`),
b) KEY `idx_customer_invoice_order` (`customer_id`,`invoice_no`,`order_no`)
Update: Here is the table definition (at least structurally) of the largest table
CREATE TABLE `invoices` (
`id` int(10) unsigned NOT NULL auto_increment,
`customer_id` int(10) unsigned NOT NULL,
`order_no` varchar(10) default NULL,
`invoice_no` varchar(20) default NULL,
`customer_no` varchar(20) default NULL,
`name` varchar(45) NOT NULL default '',
`archived` tinyint(4) default NULL,
`invoiced` tinyint(4) default NULL,
`time` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
`group` int(11) default NULL,
`customer_group` int(11) default NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_customer_invoice` (`customer_id`,`invoice_no`),
KEY `idx_time` (`time`),
KEY `idx_order` (`order_no`),
KEY `idx_customer_invoice_order` (`customer_id`,`invoice_no`,`order_no`)
) ENGINE=InnoDB AUTO_INCREMENT=9146048 DEFAULT CHARSET=latin1 |
Update 2:
mysql> show indexes from invoices;
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| invoices | 0 | PRIMARY | 1 | id | A | 7578066 | NULL | NULL | | BTREE | |
| invoices | 0 | idx_customer_invoice | 1 | customer_id | A | 17 | NULL | NULL | | BTREE | |
| invoices | 0 | idx_customer_invoice | 2 | invoice_no | A | 7578066 | NULL | NULL | YES | BTREE | |
| invoices | 1 | idx_time | 1 | time | A | 541290 | NULL | NULL | | BTREE | |
| invoices | 1 | idx_order | 1 | order_no | A | 6091 | NULL | NULL | YES | BTREE | |
| invoices | 1 | idx_customer_invoice_order | 1 | customer_id | A | 17 | NULL | NULL | | BTREE | |
| invoices | 1 | idx_customer_invoice_order | 2 | invoice_no | A | 7578066 | NULL | NULL | YES | BTREE | |
| invoices | 1 | idx_customer_invoice_order | 3 | order_no | A | 7578066 | NULL | NULL | YES | BTREE | |
+----------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
My questions are:
Is there a way to find unused indexes in MySQL?
Are there any common mistakes that impact the size of the index?
Can indexA safely be removed?
How can you measure the size of each index? All I get is the total of all indexes.
You can remove index A, because, as you have noted, it is a subset of another index. And it's possible to do this without disrupting normal processing.
The size of the index files is not alarming in itself and it can easily be true that the net benefit is positive. In other words, the usefulness and value of an index shouldn't be discounted because it results in a large file.
Index design is a complex and subtle art involving a deep understanding of the query optimizer explanations and extensive testing. But one common mistake is to include too few fields in an index in order to make it smaller. Another is to test indexes with insufficient, or insufficiently representative data.
I may be wrong, but the first index (idx_customer_invoice) is UNIQUE, the second (idx_customer_invoice_order) is not, so you'll probably lose the uniqueness constraint when you remove it. No?
Is there a way to find unused indexes in MySQL?
The database engine optimizer will select a proper index when attempting to optimize your query. Depending on when you collected statistics on your indexes last, the index which is chosen will vary. Unused indexes could suddenly become used because of new data repartition.
Can indexA safely be removed?
I would say yes, if indexA and indexB are B-Tree indexes. This is because an index that starts with the same columns in the same order will have the same structure.
use
show indexes from table;
to define what indexes do you have in a particular table. Cardinality would tell how useful your index is.
You can remove your indexes safely (it will not break a table), but beware: some queries might execute slower. First you should analyze your queries to decide whether you need a certain index or not.
I don't think you can find out data length of a particular index, though.
BUT, I think you probably think that if indexes length is greater than data length twice is something abnormal... Well, you are wrong. All of your indexes might be useful ;) If you have a table that provides a lot of information and you have to search on it upon a large number of column, it can easily be that indexes of this table will 2 times bigger in size that the tables data.
indexA can remove because there's a
indexB include indexA
what impact your index length is
your column type and column length
use:
select index_length from information_schema.tables
where table_name='your_table_name' and
table_schema='your_db_name';
get your table index_length