MariaDB SELECT with index used but looks like table scan

I have a MariaDB 10.4 instance with a huge table (about 100 million rows) for storing crawled posts. The table contains 4 columns, one of which is lastUpdate (datetime) and indexed.
Recently I tried to select posts by lastUpdate. Most queries return quickly using the index, but some take minutes, return fewer records, and look like a table scan.
This is the EXPLAIN of the query without any conditions.
> explain select 1 from SourceAttr;
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
| 1 | SIMPLE | SourceAttr | index | NULL | idxCreateDate | 5 | NULL | 79830491 | Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
This is the query EXPLAIN and the number of rows returned for the slow one. The rows value in this EXPLAIN is almost equal to the one above.
> explain select 1 from SourceAttr where (lastUpdate >= '2020-01-11 11:46:37' AND lastUpdate < '2020-01-12 11:46:37');
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
| 1 | SIMPLE | SourceAttr | index | idxLastUpdate | idxLastUpdate | 5 | NULL | 79827437 | Using where; Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
> select 1 from SourceAttr where (lastUpdate >= '2020-01-11 11:46:37' AND lastUpdate < '2020-01-12 11:46:37');
394454 rows in set (14 min 40.908 sec)
This is the fast one.
> explain select 1 from SourceAttr where (lastUpdate >= '2020-01-15 11:46:37' AND lastUpdate < '2020-01-16 11:46:37');
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | SourceAttr | range | idxLastUpdate | idxLastUpdate | 5 | NULL | 3699041 | Using where; Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
> select 1 from SourceAttr where (lastUpdate >= '2020-01-15 11:46:37' AND lastUpdate < '2020-01-16 11:46:37');
1352552 rows in set (2.982 sec)
Any idea what might cause this?
Thanks a lot.

When you see type: index, it's called an index scan. This is almost as bad as a table scan.
Notice the rows: 79827437 in the EXPLAIN of the two slow queries. This means it's examining over 79 million entries in the scanned index, either idxCreateDate or idxLastUpdate. So it's basically examining every index entry, which takes nearly as long as examining every row of the table.
Whereas the quick query says rows: 3699041, so it's estimating fewer than 3.7 million rows examined. More than 20x fewer.
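If stale index statistics are what push the optimizer into the full index scan (an assumption; the thread doesn't confirm the cause), two things worth trying on MariaDB:
-- Refresh the index statistics, then re-check the plan; with accurate
-- statistics the optimizer should pick type: range, as in the fast query.
ANALYZE TABLE SourceAttr;
EXPLAIN SELECT 1 FROM SourceAttr
WHERE lastUpdate >= '2020-01-11 11:46:37'
  AND lastUpdate <  '2020-01-12 11:46:37';

-- MariaDB's ANALYZE statement actually runs the query and reports real row
-- counts next to the optimizer's estimates, showing how far off they are.
ANALYZE SELECT 1 FROM SourceAttr
WHERE lastUpdate >= '2020-01-11 11:46:37'
  AND lastUpdate <  '2020-01-12 11:46:37';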

Related

Speed up query with large table DESC limit 1

MariaDB 10 (myisam)
The query executes rather slowly, taking about 90 seconds.
I tried deleting some old rows and then optimizing the table.
SELECT CEIL(rate * 8 / 1000000)
FROM db.Octets
WHERE id = 5344
ORDER BY datetime DESC
LIMIT 1;
The query takes a really long time to execute. Here is the EXPLAIN:
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
| 1 | SIMPLE | Octets | index | NULL | Octets_1_idx | 8 | NULL | 1 | Using where |
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
You could try adding a redundant composite index:
CREATE INDEX idx2 ON Octets (id, datetime, rate);
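With that index the whole query can be answered from the index alone: an equality seek on id, entries already ordered by datetime for the ORDER BY ... DESC LIMIT 1, and rate stored right in the index so the table is never touched. A sketch of how to verify (the expected plan is my assumption, not from the thread):
EXPLAIN
SELECT CEIL(rate * 8 / 1000000)
FROM db.Octets
WHERE id = 5344
ORDER BY datetime DESC
LIMIT 1;
-- Hoped-for plan: type: ref, key: idx2, Extra: Using index; that is, one
-- backward index seek instead of the Octets_1_idx scan shown above.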

MySQL - Improvement on count(*) aggregation with composite index keys

I have a table with the following structure, with almost 120,000 rows:
desc user_group_report
+--------------+----------------------+------+-----+-------------------+-------+
| Field        | Type                 | Null | Key | Default           | Extra |
+--------------+----------------------+------+-----+-------------------+-------+
| user_id      | int(11)              | YES  | MUL | NULL              |       |
| group_id     | int(11)              | YES  | MUL | NULL              |       |
| type_id      | int(11)              | YES  |     | NULL              |       |
| group_desc   | varchar(128)         | NO   |     | NULL              |       |
| status       | enum('open','close') | NO   |     | NULL              |       |
| last_updated | datetime             | NO   |     | CURRENT_TIMESTAMP |       |
+--------------+----------------------+------+-----+-------------------+-------+
I have indexes on the following keys:
user_group_type(user_id,group_id,group_type)
group_type(group_id,type_id)
user_type(user_id,type_id)
user_group(user_id,group_id)
My issue is that I am running a count(*) aggregation on the above table, grouping by group_id, with a WHERE clause on type_id.
Here is the query :
select count(*) user_count, group_id
from user_group_report
where type_id = 1
group by group_id;
and here is the explain plan (query taking 0.3 secs on average):
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
| 1 | SIMPLE | user_group_report | index | user_group_type,group_type,user_group | group_type | 10 | NULL | 119811 | Using where; Using index |
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
As I understand it, the query almost does a full index scan because of the complex indexes. When I tried adding an index on group_id, the rows value in the explain plan dropped to almost half, but the query execution time increased to 0.4-0.5 secs.
I have tried different ways of adding and removing indexes, but none of them reduces the time taken.
Assuming the table structure cannot be changed and the query is independent of other tables, can someone suggest a better way to optimize the above query, or point out anything I am missing here?
PS:
I have already tried to modify the query to the following but couldn't find any improvement.
select count(user_id) user_count, group_id
from user_group_report
where type_id = 1
group by group_id;
Any little help is appreciated.
Edit:
As per the suggestions, I added a new index:
type_group on (type_id, group_id)
This is the new explain plan. The rows value in the explain is reduced, but the query execution time is still the same.
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
| 1 | SIMPLE | user_group_report | ref | user_group_type,type_group,user_group | type_group | 5 | const | 59846 | Using where; Using index |
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
EDIT 2:
Adding details as suggested in answers/comments
select count(*)
from user_group_report
where type_id = 1
This query itself is taking 0.25 secs to execute.
and here is the explain plan:
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
| 1 | SIMPLE | user_group_report | ref | type_group | type_group | 5 | const | 59866 | Using index |
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
I believe the column order in your group_type index is wrong. Try switching the attributes:
CREATE INDEX ix_type_group ON user_group_report (type_id, group_id);
This index is better for your query because you specify type_id = 1 in the WHERE clause. The query processor finds the first record with type_id = 1 in the index and then scans the index records with this type_id, performing the aggregation as it goes. With such an index, only the relevant records in the index are accessed, which is not possible with the group_type index.
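A side note of mine, not from this answer: because the type_id = 1 entries inside such an index are already ordered by group_id, the GROUP BY needs no temporary table or sort. One way to check:
-- With (type_id, group_id) the aggregation is a single ordered range scan;
-- "Using temporary" / "Using filesort" should not appear under Extra.
EXPLAIN
SELECT COUNT(*) AS user_count, group_id
FROM user_group_report
WHERE type_id = 1
GROUP BY group_id;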
If type_id is selective (i.e. it reduces the search space significantly), creating an index on type_id, group_id should help significantly.
This is because it reduces the number of records that need to be grouped (everything with type_id != 1 is removed first), and only then does the grouping/counting.
EDIT:
Following on from the comments, it seems we need to figure out more about where the bottleneck is - finding the records, or grouping/summing.
The first step would be to measure the performance of:
select count(*)
from user_group_report
where type_id = 1
If that is significantly faster, the bottleneck is likely in the grouping rather than in finding the records. If it's just as slow, it's in finding the records in the first place.
Do most of the columns really need to be NULLable? Change to NOT NULL where applicable.
What percentage of the table has type_id = 1? If it is most of the table, then that would explain why you don't see much improvement. Meanwhile, the EXPLAIN seems to be thinking there are only two distinct values for type_id, hence it says only half the table will be scanned -- this number cannot be trusted.
To get more insight into what is going on, please do these:
EXPLAIN FORMAT=JSON SELECT...;
And
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
We can help interpret the data you get there.
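As an illustration of that procedure (the interpretation below is generic; the counter values you get will be your own):
FLUSH STATUS;
SELECT COUNT(*) AS user_count, group_id
FROM user_group_report
WHERE type_id = 1
GROUP BY group_id;
SHOW SESSION STATUS LIKE 'Handler%';
-- Handler_read_key  : number of index seeks performed
-- Handler_read_next : entries read in index order after a seek; if this is
--   near the ~59846 rows estimate, the time really is going into walking
--   every type_id = 1 entry.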

Index on DATE(column)

If I use the following to compare a date with a datetime:
SELECT * FROM `calendar` WHERE DATE(startTime) = '2010-04-29'
can I still use an index on startTime?
After I create an index:
create index mi_date on calendar(startTime);
EXPLAIN result:
+----+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | calendar | index | NULL | mi_date | 6 | NULL | 25 | Using where; Using index |
+----+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
And with the query
SELECT * FROM `calendar` WHERE startTime like '2010-04-29 %'
the EXPLAIN shows:
+----+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | calendar | index | mi_date | mi_date | 6 | NULL | 25 | Using where; Using index |
+----+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
The difference from the first EXPLAIN is that the possible_keys column is not NULL.
And there is also this query:
SELECT * FROM `calendar` WHERE startTime BETWEEN '2010-04-29 00:00:00' AND '2010-04-29 23:59:59'
EXPLAIN:
+----+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | calendar | range | mi_date | mi_date | 6 | NULL | 16 | Using where; Using index |
+----+-------------+--------------+-------+---------------+---------+---------+------+------+--------------------------+
Here the type column is range, and only 16 rows are examined, which matches the result of my query.
P.S.: I have 25 rows in my table.
With EXPLAIN
mysql> EXPLAIN SELECT * FROM `calendar` WHERE DATE(startTime) = '2010-04-29'
you can check which index MySQL uses. But normally MySQL doesn't use an index when the column is wrapped in DATE().
Try
SELECT * FROM `calendar` WHERE startTime BETWEEN '2010-04-29 00:00:00' AND '2010-04-29 23:59:59'
You can check by adding EXPLAIN before your query; it will report which keys are being used. Usually the use of functions disables indexed searches, but you can still use indexes by rewriting your query:
SELECT * FROM `calendar` WHERE startTime like '2010-04-29 %'
Well, as you can see in the results of the EXPLAIN, in this case (yours, using DATE(..)) MySQL is already using the index. Check the ref field:
ref – Shows the columns or constants that are compared to the index
named in the key column. MySQL will either pick a constant value to be
compared or a column itself based on the query execution plan. You can
see this in the example given below.
Which is better? At this point they are likely about the same. You can try other queries and look at the rows field.
rows – lists the number of records that were examined to produce the
output. This is another important column worth focusing on when optimizing
queries, especially for queries that use JOIN and subqueries.
The one with fewer rows examined is best.
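One more variant worth knowing (my addition, not from the answers above): a half-open range is just as index-friendly as BETWEEN, and unlike an upper bound of '23:59:59' it cannot miss DATETIME values with fractional seconds:
-- The bare column is compared against constants, so the mi_date index can
-- be range-scanned; the half-open upper bound covers the entire day.
SELECT *
FROM `calendar`
WHERE startTime >= '2010-04-29'
  AND startTime < '2010-04-29' + INTERVAL 1 DAY;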

MySQL equality slower than greater-than

I'm having a curious issue with my MySQL database. The following query runs in 0.003 seconds:
SELECT * FROM `post` where `thread_id` > 12117484 and `index` > -1 limit 1;
If I change the second > to an =, the query doesn't complete (it runs for over a minute):
SELECT * FROM `post` where `thread_id` > 12117484 and `index` = 0 limit 1;
It's worth noting that the result from the first query has index = 0. I know it's bad form to name a column index...but it's the database I've been given. Here's the MySQL EXPLAIN for the second query:
+----+-------------+-------+-------+---------------------+---------------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------+---------------------+---------+------+------+------------------------------------+
| 1 | SIMPLE | post | range | post_thread_id_idx1 | post_thread_id_idx1 | 5 | NULL | 1 | Using index condition; Using where |
+----+-------------+-------+-------+---------------------+---------------------+---------+------+------+------------------------------------+
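No answer is quoted here, but a common approach for this range-plus-equality shape (a sketch assuming no such index exists yet; the index name is invented) is to put the equality column first in a composite index:
-- `index` is a reserved word, hence the backticks, as in the question.
CREATE INDEX post_index_thread_idx ON `post` (`index`, thread_id);

-- The optimizer can now seek straight to `index` = 0 and scan thread_id in
-- order within that group, stopping at the first row past 12117484.
SELECT * FROM `post`
WHERE `thread_id` > 12117484 AND `index` = 0
LIMIT 1;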

Why is my query slower if a date column is in the SELECT part of my query?

I have a strange behavior with my mysql query below:
SELECT domain_id, domain_name, domain_lastupdate
FROM domains
WHERE domain_id > 300000 LIMIT 2000
takes ~15 seconds...
while
SELECT domain_id, domain_name
FROM domains
WHERE domain_id > 300000 LIMIT 2000
takes ~0.05 seconds...
I've tried different ids with different limits, running one before the other and then the other way around to avoid cached results, but I always end up with dramatic time differences.
I have 1 index on the domain_id, 1 on the domain_name, but none with both columns...
I just don't get it...
The domain_lastupdate column is a simple DATE column.
Here's the EXPLAIN output of both queries:
explain SELECT domain_id, domain_name, domain_lastupdate FROM domains WHERE domain_id > 255000 LIMIT 500;
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+-------------+
| 1 | SIMPLE | domains | range | UN_domainid | UN_domainid | 4 | NULL | 12575357 | Using where |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+-------------+
1 row in set (0.00 sec)
second one:
explain SELECT domain_id, domain_name FROM domains WHERE domain_id > 255000 LIMIT 500;
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+--------------------------+
| 1 | SIMPLE | domains | range | UN_domainid | UN_domainid | 4 | NULL | 12575369 | Using where; Using index |
+----+-------------+---------+-------+---------------+-------------+---------+------+----------+--------------------------+
1 row in set (0.01 sec)
Any idea why the first one doesn't use the index?
When you are pulling out only the non-date columns that you have indexed, the server can read your data directly out of the index and needn't go to the table at all. To get the date it has to hit the table, so the date column needs to be in an index too.
Also, you could create a multi-column index. Make sure you have domain_id as the first column in the index.
What you want to use is what is called a covering index.
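A minimal sketch of such a covering index (the index name is made up for illustration):
-- All three selected columns live in the index, so the server can answer
-- the query from the index alone; EXPLAIN should then show
-- Extra: Using where; Using index, just like the fast two-column query.
CREATE INDEX ix_domains_cover
ON domains (domain_id, domain_name, domain_lastupdate);

SELECT domain_id, domain_name, domain_lastupdate
FROM domains
WHERE domain_id > 300000
LIMIT 2000;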