Optimize COUNT(*) with MATCH ... AGAINST

Optimize COUNT(*) with MATCH ... AGAINST - mysql

I am using COUNT(*) with MATCH() ... AGAINST(). My specific query is as follows:
SELECT COUNT(*) FROM `source_code` WHERE MATCH(`html`) AGAINST ('title');
I get results after a few seconds:
+----------+
| count(*) |
+----------+
| 17346 |
+----------+
1 row in set (16.30 sec)
After running the query multiple times, the query always takes around 16 seconds to complete.
Is there any way to speed up this query? Why isn't query cache caching the results of this query?
In case it's helpful, here is the EXPLAIN and CREATE TABLE statements:
EXPLAIN SELECT COUNT(*) FROM `source_code` WHERE MATCH(`html_w`) AGAINST ('title');
+----+-------------+-------------+----------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+----------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | source_code | fulltext | html | html | 0 | | 1 | Using where |
+----+-------------+-------------+----------+---------------+--------+---------+------+------+-------------+
Looks like the index is being used. (Maybe the overhead is that the query is still Using where? Is it normal for key_len to be 0?)
SHOW CREATE TABLE `source_code`;
CREATE TABLE `source_code` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`url` varchar(255) NOT NULL,
`domain` varchar(255) DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`html` longtext,
`crawled` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`),
KEY `crawled` (`crawled`),
KEY `domain` (`domain`),
FULLTEXT KEY `html` (`html`)
) ENGINE=MyISAM AUTO_INCREMENT=78707 DEFAULT CHARSET=latin1
Nothing too crazy in the CREATE TABLE statement.

Unlike many other databases, mysql is very good had handling select count(*) queries when there is an index that covers the entire table. In your case you do have an index that covers the whole table but it's different from a normal primary key since it's a full text index.
You can see that the query analyzer tries to use that index (possible_keys) but it's actually unable to us it.
The key_len column indicates the length of the key that MySQL decided
to use. The length is NULL if the key column says NULL. Note that the
value of key_len enables you to determine how many parts of a
multiple-part key MySQL actually uses
It's most unusual for key_len to be 0 instead of null, but what it means is that 0 parts of your index was used for the query.
As for how to optimize this? THe answer is it's very difficult. The only thing I can think of is to create a stop word list and the other is to set the minimum word length. Both these go into your my.cnf file.

Related

Mariadb - select count() using index but select * not using proper index

Version - 10.4.25-MariaDB
I have a table where column(name) is a second part of primary key(idarchive,name).
When i run count(*) on table where name like 'done%', its using the index on field name properly but when i run select * its not using the separate index instead using primary key and slowing down the query.
Any idea what we can do here ?
any changes in optimizer switch or any other alternative which can help ?
Note - we can't use force index as queries are not controlable.
Table structure:
CREATE TABLE `table1 ` (
`idarchive` int(10) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`idsite` int(10) unsigned DEFAULT NULL,
`date1` date DEFAULT NULL,
`date2` date DEFAULT NULL,
`period` tinyint(3) unsigned DEFAULT NULL,
`ts_archived` datetime DEFAULT NULL,
`value` double DEFAULT NULL,
PRIMARY KEY (`idarchive`,`name`),
KEY `index_idsite_dates_period` (`idsite`,`date1`,`date2`,`period`,`ts_archived`),
KEY `index_period_archived` (`period`,`ts_archived`),
KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Queries:
explain select count(*) from table1 WHERE name like 'done%' ;
+------+-------------+-------------------------------+-------+---------------+------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------------------------------+-------+---------------+------+---------+------+---------+--------------------------+
| 1 | SIMPLE | table1 | range | name | name | 767 | NULL | 9131455 | Using where; Using index |
+------+-------------+-------------------------------+-------+---------------+------+---------+------+---------+--------------------------+
1 row in set (0.000 sec)
explain select * from table1 WHERE name like 'done%' ;
+------+-------------+-------------------------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------------------------------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | table1 | ALL | name | NULL | NULL | NULL | 18262910 | Using where |
+------+-------------+-------------------------------+------+---------------+------+---------+------+----------+-------------+
1 row in set (0.000 sec) ```

Your SELECT COUNT(*) ... LIKE 'constant%' query is covered by your index on your name column. That is, the entire query can be satisfied by reading the index. So the query planner decides to range-scan your index to generate the result.
On the other hand, your SELECT * query needs all columns from all rows of the table. That can't be satisfied from any of your indexes. And, it's possible your WHERE name like 'done%' filter reads a significant fraction of the table, enough so the query planner decides the fastest way to satisfy it is to scan the entire table. The query planner figures this out by using statistics on the contents of the table, plus some knowledge of the relative costs of IO and CPU.
If you have just inserted many rows into the table you could try doing ANALYZE TABLE table1 and then rerun the query. Maybe after the table's statistics are updated you'll get a different query plan.
And, if you don't need all the columns, you could stop using SELECT * and instead name the columns you need. SELECT * is a notorious query-performance antipattern, because it often returns column data that never gets used. Once you know exactly what columns you want, you could create a covering index to provide them.
These days the query planner does a pretty good job of optimizing simple queries such as yours.
In MariaDB you can say ANALYZE FORMAT=JSON SELECT.... It will run the query and show you details of the actual execution plan it used.

MySQL update query not using indexed columns

I tried the following SQL query to update the table INDEXED_MERCHANT where I have 10000 records in the table. I indexed both "Name" and "A" as Indexed keys to improve the update query performance. By executing the command SHOW CREATE TABLE INDEXED_MERCHANT; I'll get an output result as follows:
INDEXED_MERCHANT | CREATE TABLE `INDEXED_MERCHANT` (
`ID` varchar(50) NOT NULL,
`NAME` varchar(200) DEFAULT NULL,
`ONLINE_STATUS` varchar(10) NOT NULL,
`A` varchar(100) DEFAULT NULL,
`B` varchar(200) DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `NAME` (`NAME`,`A`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
Here this means both my keys are recognized as indexed keys. When I execute the following command, the "Extra" column result says the query doesn't using the index keys. How should I achieve my goal ?
Executed query : EXPLAIN EXTENDED UPDATE INDEXED_MERCHANT SET ONLINE_STATUS = '0' WHERE NAME = 'A 205' AND A = 'P 205';
Result:
+----+-------------+------------------+-------+---------------+------+---------+-------------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | **Extra** |
+----+-------------+------------------+-------+---------------+------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | INDEXED_MERCHANT | range | NAME | NAME | 906 | const,const | 1 | 100.00 | **Using where** |
+----+-------------+------------------+-------+---------------+------+---------+-------------+------+----------+-------------+

Your query is hitting "NAME" index as evident in explain output column "key".
Here is the explanation of key and extra columns from mysql documentation
key
The key column indicates the key (index) that MySQL actually decided to use. If MySQL decides to use one of the possible_keys indexes to look up rows, that index is listed as the key value.
Using where
A WHERE clause is used to restrict which rows to match against the next table or send to the client. Unless you specifically intend to fetch or examine all rows from the table, you may have something wrong in your query if the Extra value is not Using where and the table join type is ALL or index. Even if you are using an index for all parts of a WHERE clause, you may see Using where if the column can be NULL.

mysql file sort happens even with indexes -- How can I fix

I have a simple query below that I can to make sure runs fast if the table. I did an explain on the query and it says Using where; Using file sort. Is there a way to get rid of the file sort? The data has only about 25 items in it now; but it could end up by 300 or more.
mysql> show create table phppos_categories;
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| phppos_categories | CREATE TABLE `phppos_categories` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `phppos_categories_ibfk_1` (`parent_id`),
KEY `name` (`name`),
KEY `parent_id` (`parent_id`),
CONSTRAINT `phppos_categories_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `phppos_categories` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM (`phppos_categories`) WHERE `parent_id` = 5 ORDER BY `name` asc;
+----+-----------+-------------------+
| id | parent_id | name |
+----+-----------+-------------------+
| 3 | 5 | Basketball Shoes |
| 7 | 5 | Basketball Shorts |
+----+-----------+-------------------+
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM (`phppos_categories`) WHERE `parent_id` = 5 ORDER BY `name` asc;
+----+-------------+-------------------+------+------------------------------------+--------------------------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+------+------------------------------------+--------------------------+---------+-------+------+-----------------------------+
| 1 | SIMPLE | phppos_categories | ref | phppos_categories_ibfk_1,parent_id | phppos_categories_ibfk_1 | 5 | const | 2 | Using where; Using filesort |
+----+-------------+-------------------+------+------------------------------------+--------------------------+---------+-------+------+-----------------------------+
1 row in set (0.00 sec)
mysql>

You can potentially remove the filesort here by adding a multi-column index on (parent_id, name).
ALTER TABLE phppos_categories ADD INDEX (parent_id,name);
Generally speaking MySQL will only use a single index when it has to potentially sort with one of them (if you are simply using two indexes to query the table, it MAY use two indexes using index-merge but often that is not the case anyway). The solution is to create a single index covering all of the columns you need to query.
Second to this, MySQL can only do a "range" search or "sort" with the last column in the index that it uses. Any columns before that must be an exact equality match.
On that basis we can create an index with parent_id first, which has an exact equality match (=5) and then on name which is your order constraint.
Main thing to be aware of is that you don't want to add more indexes than necessary, and an index to suit every single possible query may not be sensible, especially given the additional storage space and work required to keep the index up to date. As part of that comparison, you also need to consider how often the table is updated. If it is seldom updated then more indexes are potentially less of an issue.
See here for some more information:
https://dev.mysql.com/doc/refman/5.6/en/order-by-optimization.html

You are seeing 'using filesort' because you are ordering by the name column which is a varchar(255) field.
ORDER BY `name` asc;
Explain reports that the number of records examined is only 2, and the index on the parent_id is being used. There are 2 rows in your result set. Therefore, MySQL did not do any extra or unnecessary work.
Personally, I would not worry about trying to get rid of the 'using filesort' in this case. Even if you had 300 rows in the table, you are still limiting the result set with where on an indexed field (parent_id), and the number of examined rows will equal the number in your result set.

Why would an indexed column return results slowly when querying for `IS NULL`?

I have a table with 25 million rows, indexed appropriately.
But adding the clause AND status IS NULL turns a super fast query into a crazy slow query.
Please help me speed it up.
Query:
SELECT
student_id,
grade,
status
FROM
grades
WHERE
class_id = 1
AND status IS NULL -- This line delays results from <200ms to 40-70s!
AND grade BETWEEN 0 AND 0.7
LIMIT 25;
Table:
CREATE TABLE IF NOT EXISTS `grades` (
`student_id` BIGINT(20) NOT NULL,
`class_id` INT(11) NOT NULL,
`grade` FLOAT(10,6) DEFAULT NULL,
`status` INT(11) DEFAULT NULL,
UNIQUE KEY `unique_key` (`student_id`,`class_id`),
KEY `class_id` (`class_id`),
KEY `status` (`status`),
KEY `grade` (`grade`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Local development shows results instantly (<200ms). Production server is huge slowdown (40-70 seconds!).
Can you point me in the right direction to debug?
Explain:
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
| 1 | SIMPLE | grades | index_merge | class_id,status,grade | status,class_id | 5,4 | NULL | 26811 | Using intersect(status,class_id); Using where |
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+

A SELECT statement can only use one index per table.
Presumably the query before just did a scan using the sole index class_id for your condition class_id=1. Which will probably filter your result set nicely before checking the other conditions.
The optimiser is 'incorrectly' choosing an index merge on class_id and status for the second query and checking 26811 rows which is probably not optimal. You could hint at the class_id index by adding USING INDEX (class_id) to the end of the FROM clause.
You may get some joy with a composite index on (class_id,status,grade) which may run the query faster as it can match the first two and then range scan the grade. I'm not sure how this works with null though.
I'm guessing the ORDER BY pushed the optimiser to choose the class_id index again and returned your query to it's original speed.

MySQL 5.1 using filesort event when an index is present

Probably I'm missing some silly thing... Apparently MySQL 5.1 keeps doing a Filesort even when there is an index that matches exactly the column in the ORDER BY clause. To post it here, I've oversimplified the data model, but the issue is still happening:
Table definition:
CREATE TABLE `event` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`owner_id` int(11) DEFAULT NULL,
`date_created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `owner_id` (`owner_id`),
KEY `date_created` (`date_created`),
CONSTRAINT `event_ibfk_1` FOREIGN KEY (`owner_id`) REFERENCES `user_profile` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;
My problem is that event a simple SELECT is showing "Using filesort":
explain select * from event order by date_created desc;
And the result for the query explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE event ALL NULL NULL NULL NULL 6 Using filesort
Is there any way for this type of queries to use the index insteas of doing a filesort?
Thanks in advance to everybody.

Since your CREATE TABLE statement indicates that you have less than 10 rows (AUTO_INCREMENT=7) and using FORCE INDEX on my installation will make MySQL use the index, I'm guessing the optimizer thinks a table scan is faster (less random I/O) than an index scan (since you're selecting all columns, not just date_created). This is confirmed by the following:
mysql> explain select date_created from event order by date_created;
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| 1 | SIMPLE | event | index | NULL | date_created | 9 | NULL | 1 | Using index |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
1 row in set (0.00 sec)
In the above case, the index scan is faster because only the indexed column needs to be returned.
The MySQL documentation has some cases where using an index is considered slower: http://dev.mysql.com/doc/refman/5.1/en/how-to-avoid-table-scan.html

Q: Is there any way for this type of queries to use the index instead of doing a filesort?
A: To have MySQL use the index if at all possible, try:
EXPLAIN SELECT * FROM event FORCE INDEX (date_created) ORDER BY date_created DESC;
By using the FORCE INDEX (index_name), this tells MySQL to make use of the index if it's at all possible. Absent that directive, MySQL will choose the most efficient way to return the result set. A filesort may be more efficient than using the index.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Optimize COUNT(*) with MATCH ... AGAINST - mysql

Related

Mariadb - select count() using index but select * not using proper index

MySQL update query not using indexed columns

mysql file sort happens even with indexes -- How can I fix

Why would an indexed column return results slowly when querying for `IS NULL`?

MySQL 5.1 using filesort event when an index is present

Categories

Resources