Why is MySQL indexing taking too much time for the < operator?

This is my MySQL table demo, which has more than 7 million rows:
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id | varchar(42) | YES | MUL | NULL | |
| date | datetime | YES | MUL | NULL | |
| text | varchar(100) | YES | | NULL | |
+-------+--------------+------+-----+---------+-------+
I read that indexes work sequentially.
Case 1:
select * from demo where id="43984a7e-edcf-11ea-92c7-509a4cb89342" order by date limit 30;
I created an index on (id, date); it works fine and the query executes very fast.
But look at the cases below.
Case 2:
Below is my SQL query.
select * from demo where id>"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
To make the above query faster I created an index on (id, date), but it takes more than 10 seconds.
Then I created another index on (date) alone, and the query took less than 1 second. Why is the composite index (id, date) so much slower than the (date) index in this case?
Case 3:
select * from demo where id<"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
For this query, even the (date) index takes more than 1.8 seconds. Why is the < operator not optimized by either index, (date) or (id, date)?
And according to EXPLAIN this query examines only around 300 rows, yet it still takes more than 1.8 seconds. Why?
mysql> explain select * from demo where id<"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | demo | NULL | index | demoindex1,demoindex2 | demoindex3 | 6 | NULL | 323 | 36.30 | Using where |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
Any suggestions for how to create an index in Case 3 to optimize it?

In your first query, the index can be used for both the where clause and the ordering. So it will be very fast.
For the second query, the index can only be used for the where clause. Because of the inequality, the matching index entries span many id values, so they are no longer in date order. The engine therefore has to sort the matching rows explicitly.
In addition, I imagine that the second query returns much more data than the first -- a fair amount of data if it takes 10 seconds to sort it.
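The ordering argument above can be illustrated with a small sketch. It is not MySQL code: it models a composite index as a sorted Python list, and the rows, the short ids and the integer dates are all made-up stand-ins for the demo table.

```python
# Hypothetical miniature of the demo table: (id, date) pairs.
rows = [("a", 5), ("a", 9), ("b", 2), ("b", 7), ("c", 1), ("c", 8)]

# A composite index on (id, date) keeps entries sorted by id first, then date.
index_id_date = sorted(rows)


def dates_in_index_order(entries, predicate):
    """Return the dates of matching entries in the order the index stores them."""
    return [date for (id_, date) in entries if predicate(id_)]


def is_sorted(values):
    """True when the values are already in ascending order."""
    return all(x <= y for x, y in zip(values, values[1:]))


# Case 1: equality on id. The matching slice is already ordered by date.
equal_dates = dates_in_index_order(index_id_date, lambda i: i == "b")

# Case 2: range on id. The slice spans several ids, so dates are interleaved.
range_dates = dates_in_index_order(index_id_date, lambda i: i > "a")

print(equal_dates, is_sorted(equal_dates))
print(range_dates, is_sorted(range_dates))
```

With an equality on id the dates come out of the index already ordered, so no sort is needed. With a range on id they do not, which is why an extra sort step appears.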

Related

MySQL Index - not full table is in index

I have a simple InnoDB table with 1M+ rows and some simple indexes.
I need to sort this table by first_public and id columns and get some of them, this is why I've indexed first_public column.
first_public is unique at the moment, but in real life it might be not.
mysql> desc table;
+--------------+-------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-------------------------+------+-----+---------+----------------+
| id | bigint unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
| id_category | int | NO | MUL | NULL | |
| active | smallint | NO | | NULL | |
| status | enum('public','hidden') | NO | | NULL | |
| first_public | datetime | YES | MUL | NULL | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
+--------------+-------------------------+------+-----+---------+----------------+
8 rows in set (0.06 sec)
It works well for offsets up to around 130000:
mysql> explain select id from table where active = 1 and status = 'public' order by first_public desc, id desc limit 24 offset 130341;
+----+-------------+--------+------------+-------+---------------+---------------------+---------+------+--------+----------+----------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+---------------------+---------+------+--------+----------+----------------------------------+
| 1 | SIMPLE | table | NULL | index | NULL | firstPublicDateIndx | 6 | NULL | 130365 | 5.00 | Using where; Backward index scan |
+----+-------------+--------+------------+-------+---------------+---------------------+---------+------+--------+----------+----------------------------------+
1 row in set, 1 warning (0.00 sec)
But when I try to get the next rows (with an offset of 140000+), it looks like MySQL doesn't use the first_public index at all.
mysql> explain select id from table where active = 1 and status = 'public' order by first_public desc, id desc limit 24 offset 140341;
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
| 1 | SIMPLE | table | NULL | ALL | NULL | NULL | NULL | NULL | 1133533 | 5.00 | Using where; Using filesort |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-----------------------------+
1 row in set, 1 warning (0.00 sec)
I tried adding the first_public column to the select clause, but nothing changed.
What am I doing wrong?
MySQL's optimizer tries to estimate the cost of doing your query, to decide if it's worth using an index. Sometimes it compares the cost of using the index versus just reading the rows in order, and discarding the ones that don't belong in the result.
In this case, once the OFFSET reached about 140k, it gave up on using the index.
Keep in mind how OFFSET works. There's no way of looking up the location of an offset by an index. Indexes help to look up rows by value, not by position. So to do an OFFSET query, it has to examine all the rows from the first matching row on up. Then it discards the rows it examined up to the offset, and then counts out enough rows to meet the LIMIT and returns those.
It's like if you wanted to read pages 500-510 in a book, but to do this, you had to read pages 1-499 first. Then when someone asks you to read pages 511-520, you have to read pages 1-510 over again.
Eventually the offset gets to be so large that the optimizer estimates it is less expensive to scan the whole table and sort it than to read 140,000 index entries plus the 140,000 rows they point to.
What?!? Is OFFSET really so expensive? Yes, it is. It's much more common to look up rows by value, so MySQL is optimized for that usage.
So if you can reimagine your pagination queries to look up rows by value instead of using LIMIT/OFFSET, you'll be much happier.
For example, suppose you read "page" 1000, and you see that the highest id value on that page is 13999. When the client requests the next page, you can do the query:
SELECT ... FROM mytable WHERE id > 13999 ORDER BY id LIMIT 24;
This does the lookup by the value of id, which is optimized because it utilizes the primary key index. Then it reads just 24 rows and returns them (MySQL is at least smart enough to stop reading after it reaches the OFFSET + LIMIT rows).
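As a rough illustration of "remembering where you left off", the sketch below runs both styles of pagination against an in-memory SQLite database. SQLite is only a stand-in so the example can run anywhere; the mytable contents, the page size and the helper names page_by_offset and page_after are assumptions made for the example. It shows that the two approaches return the same page, not how fast MySQL executes them.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table contents are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, name) VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 201)],
)

PAGE_SIZE = 24


def page_by_offset(page_number):
    """LIMIT/OFFSET pagination: the engine must step over `offset` rows first."""
    offset = page_number * PAGE_SIZE
    return conn.execute(
        "SELECT id FROM mytable ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


def page_after(last_seen_id):
    """Keyset pagination: seek by value to the row after the last one seen."""
    return conn.execute(
        "SELECT id FROM mytable WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()


first_page = page_after(0)
last_id = first_page[-1][0]  # remember where the first page ended
second_page_keyset = page_after(last_id)
second_page_offset = page_by_offset(1)

print(second_page_keyset == second_page_offset)
```

For an ORDER BY on more than one column, such as first_public and id in the question, the remembered position would have to include both values.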
The best index is
INDEX(active, status, first_public, id)
Using huge offsets is terribly inefficient -- it must scan over 140341 + 24 rows to perform the query.
If you are trying to "walk through" the table, use the technique of "remembering where you left off". More discussion of this: http://mysql.rjweb.org/doc.php/pagination
The reason for the Optimizer to abandon the index: It decided that the bouncing back and forth between the index and the table was possibly worse than simply scanning the entire table. (The cutoff is about 20%, but varies widely.)

Why does an indexed mysql query filtered on less char values result in more rows examined?

When I run the following query, I see the expected rows examined as 40
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
AND s.status in ('A', 'T')
AND s.period_end_action in ('R','C')
ORDER BY s._id ASC limit 20;
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | s | index | status,current_period_end_date | PRIMARY | 4 | NULL | 40 | Using where |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
But when I run this query that simply changes AND s.period_end_action in ('R','C') to AND s.period_end_action = 'C', I see the expected rows examined as 611
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
AND s.status in ('A', 'T')
AND s.period_end_action = 'C'
ORDER BY s._id ASC limit 20;
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | s | index | status,current_period_end_date | PRIMARY | 4 | NULL | 611 | Using where |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
I have the following indexes on the subscription table:
_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
INDEX(status, period_end_action),
INDEX(current_period_end_date),
Any ideas? I don't understand why removing one of the period_end_action values would cause such a large increase in rows examined?
(I agree with others that EXPLAIN often has terrible row estimates.)
Actually the numbers might be reasonable (though I doubt it). The optimizer decided to do a table scan in both cases. And the query with fewer options for period_end_action probably has to scan farther to get the 20 rows. This is because it punted on using either of your secondary indexes.
These indexes are more likely to help your second query:
INDEX(period_end_action, _id)
INDEX(period_end_action, status)
INDEX(period_end_action, current_period_end_date)
The optimal index usually starts with any columns tested by =.
Since there is no such thing for your first query, the Optimizer probably decided to scan in _id order so that it could avoid the "sort" mandated by ORDER BY.

Speed up query with large table DESC limit 1

MariaDB 10 (myisam)
Query executes rather slowly, takes about 90 seconds.
I tried deleting some old rows and then optimizing the table.
SELECT ceil(rate * 8 / 1000000)
FROM db.Octets
WHERE id = 5344
order by datetime DESC
LIMIT 1;
Query takes a really long time to execute.
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
| 1 | SIMPLE | Octets | index | NULL | Octets_1_idx | 8 | NULL | 1 | Using where |
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
You could try adding a composite covering index:
create index idx2 on Octets (id, datetime, rate);

MySQL- Improvement on count(*) aggregation with composite index keys

I have a table with the following structure and almost 120000 rows:
desc user_group_report
+------------------+----------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+----------+------+-----+-------------------+-------+
| user_id | int | YES | MUL | NULL | |
| group_id | int(11) | YES | MUL | NULL | |
| type_id | int(11) | YES | | NULL | |
| group_desc | varchar(128) | NO | | NULL | |
| status | enum('open','close') | NO | | NULL | |
| last_updated | datetime | NO | | CURRENT_TIMESTAMP | |
+------------------+----------+------+-----+-------------------+-------+
I have indexes on the following keys :
user_group_type(user_id,group_id,group_type)
group_type(group_id,type_id)
user_type(user_id,type_id)
user_group(user_id,group_id)
My issue is that I am running a count(*) aggregation on the above table, grouped by group_id with a filter on type_id.
Here is the query :
select count(*) user_count, group_id
from user_group_report
where type_id = 1
group by group_id;
and here is the explain plan (query taking 0.3 secs on average):
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
| 1 | SIMPLE | user_group_report | index | user_group_type,group_type,user_group | group_type | 10 | NULL | 119811 | Using where; Using index |
+----+-------------+------------------+-------+---------------------------------+---------+---------+------+--------+--------------------------+
As I understand it, the query does almost a full scan because of the complex indexes. When I add an index on group_id alone, the rows value in the explain plan drops to almost half, but the query execution time increases to 0.4-0.5 secs.
I have tried different ways of adding and removing indexes, but none of them reduces the time taken.
Assuming the table structure cannot be changed and the query is independent of other tables, can someone suggest a better way to optimize the above query, or point out anything I am missing?
PS:
I have already tried to modify the query to the following but couldn't find any improvement.
select count(user_id) user_count, group_id
from user_group_report
where type_id = 1
group by group_id;
Any little help is appreciated.
Edit:
As per the suggestions, I added a new index
type_group on (type_id,group_id)
This is the new explain plan. The number of rows in explain is reduced, but the query execution time is still the same.
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
| 1 | SIMPLE | user_group_report | ref | user_group_type,type_group,user_group | type_group | 5 | const | 59846 | Using where; Using index |
+----+-------------+------------------+------+---------------------------------+---------+---------+-------+-------+--------------------------+
EDIT 2:
Adding details as suggested in answers/comments
select count(*)
from user_group_report
where type_id = 1
This query itself is taking 0.25 secs to execute.
and here is the explain plan:
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
| 1 | SIMPLE | user_group_report | ref | type_group | type_group | 5 | const | 59866 | Using index |
+----+-------------+------------------+------+---------------+---------+---------+-------+-------+-------------+
I believe the column order in your group_type index is wrong. Try switching the columns:
create index ix_type_group on user_group_report(type_id,group_id)
This index is better for your query because you specify type_id = 1 in the where clause. The query processor finds the first record with type_id = 1 in the index, then scans only the index records with this type_id and performs the aggregation. With such an index, only the relevant index records are accessed, which is not possible with the group_type index.
If type_id is selective (i.e. it reduces the search space significantly), creating an index on type_id, group_id should help significantly.
This is because it reduces the number of records that need to be grouped first (remove everything where type_id != 1), and only then does the grouping/summing.
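A minimal sketch of why the (type_id, group_id) order helps, modelling the index as a sorted Python list rather than a real B-tree. The sample values and the function name count_by_group are invented for the illustration.

```python
from bisect import bisect_left
from itertools import groupby

# Hypothetical (type_id, group_id) values, one pair per table row.
rows = [(1, 10), (2, 10), (1, 20), (1, 10), (3, 30), (1, 20), (2, 20), (1, 10)]

# An index on (type_id, group_id) stores the pairs sorted in that column order.
index_type_group = sorted(rows)


def count_by_group(index, type_id):
    """Count rows per group_id for one type_id, reading one contiguous slice."""
    lo = bisect_left(index, (type_id,))      # first entry with this type_id
    hi = bisect_left(index, (type_id + 1,))  # first entry with a larger type_id
    matching = index[lo:hi]
    # Within the slice the entries are already ordered by group_id,
    # so each group is a run of adjacent entries.
    counts = {
        group_id: sum(1 for _ in run)
        for group_id, run in groupby(matching, key=lambda entry: entry[1])
    }
    return counts, len(matching)


counts, entries_read = count_by_group(index_type_group, 1)
print(counts, entries_read)
```

Only the entries with the requested type_id are read, and no separate sort or temporary table is needed for the grouping. If most of the table has type_id = 1, that slice is still most of the index, which would fit the small improvement reported in the question.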
EDIT:
Following on from the comments, it seems we need to figure out more about where the bottleneck is - finding the records, or grouping/summing.
The first step would be to measure the performance of:
select count(*)
from user_group_report
where type_id = 1
If that is significantly faster, the bottleneck is more likely in the grouping than in finding the records. If that's just as slow, it's in finding the records in the first place.
Do most of the columns really need to be NULLable? Change to NOT NULL where applicable.
What percentage of the table has type_id = 1? If it is most of the table, then that would explain why you don't see much improvement. Meanwhile, the EXPLAIN seems to be thinking there are only two distinct values for type_id, hence it says only half the table will be scanned -- this number cannot be trusted.
To get more insight into what is going on, please do these:
EXPLAIN FORMAT=JSON SELECT...;
And
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
We can help interpret the data you get there. (Here is a brief discussion of such.)

Is index dependent on selected columns?

Most of my queries filter on time, so I created an index on the created time column. But the index seems to be used only if I select just the indexed column. Does a MySQL index depend on the selected columns?
My Assumption On Index
I thought an index is like the index page of a telephone directory. For example, if I want to find "Mark", the index page shows on which page the names starting with "M" begin. I assumed MySQL works the same way.
Table
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Name | varchar(100) | YES | | NULL | |
| OPERATION | varchar(100) | YES | | NULL | |
| PID | int(11) | YES | | NULL | |
| CREATED_TIME | bigint(20) | YES | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
Indexes On the table.
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| IndexTest | 0 | PRIMARY | 1 | ID | A | 10261 | NULL | NULL | | BTREE | | |
| IndexTest | 1 | t_dx | 1 | CREATED_TIME | A | 410 | NULL | NULL | YES | BTREE | | |
+-----------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Queries Using Indexes:
explain select * from IndexTest where ID < 5;
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | IndexTest | range | PRIMARY | PRIMARY | 4 | NULL | 4 | Using where |
+----+-------------+-----------+-------+---------------+---------+---------+------+------+-------------+
explain select CREATED_TIME from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
| 1 | SIMPLE | IndexTest | range | t_dx | t_dx | 9 | NULL | 5248 | Using where; Using index |
+----+-------------+-----------+-------+---------------+------+---------+------+------+--------------------------+
Queries Not using Indexes
explain select count(distinct(PID)) from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | IndexTest | ALL | t_dx | NULL | NULL | NULL | 10261 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
explain select PID from IndexTest where CREATED_TIME > UNIX_TIMESTAMP(CURRENT_DATE())*1000;
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | IndexTest | ALL | t_dx | NULL | NULL | NULL | 10261 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+-------+-------------+
Short answer: No.
Whether indexes are used depends on the expression in your WHERE clause, JOINs etc, but not on the columns you select.
But no rule without an exception (or actually a long list of those):
Long answer: Usually not
There are a number of factors used by the MySQL Optimizer in order to determine whether it should use an index.
The optimizer may decide to ignore an index if...
another (otherwise non-optimal) index saves it from accessing the table data at all
it fails to understand that an expression is a constant
its estimates suggest it will return the full table anyway
using it would cause the creation of a temporary file
... and tons of other reasons, some of which seem not to be documented anywhere
Sometimes the choices made by said optimizer are... erm... let's call them sub-optimal. Now what do you do in those cases?
You can help the optimizer by doing an OPTIMIZE TABLE and/or ANALYZE TABLE. That is easy to do, and sometimes helps.
You can make it use a certain index with the USE INDEX(indexname) or FORCE INDEX(indexname) syntax
You can make it ignore a certain index with the IGNORE INDEX(indexname) syntax
More details on Index Hints, Optimize Table and Analyze Table on the MySQL documentation website.
Actually, it makes no difference whether you select the column or not. Indexes are used for lookups, that is, for quickly reducing the number of records that need to be retrieved. That usually makes them useful where you have joins or where conditions. Indexes also help a lot with ordering.
Updates and deletes can also be sped up quite a lot by indexes on the columns in their where conditions.
As an example:
table: id int pk ai, col1 ... indexed, col2 ...
select * from table -> does not use an index
select id from table where col1 = something -> uses the col1 index although it is not selected.
Looking at the second query, mysql does a lookup in the index, locates the records, then in this case stops and delivers (both id and col1 have index and id happens to be pk, so no need for a secondary lookup).
Situation changes a little in this case:
select col2 from table where col1 = something
This will make internally 2 lookups: 1 for the condition, and 1 on the pk for delivering the col2 data. Please notice that again, you don't need to select the col1 column to use the index.
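The one-lookup versus two-lookup difference can be sketched with plain dictionaries. This assumes an InnoDB-style layout in which a secondary index entry stores the primary key; the table contents and the function names are hypothetical.

```python
# Hypothetical table: id is the primary key, col1 is indexed, col2 is not.
# Clustered index: primary key -> full row.
clustered_index = {
    1: {"col1": "red", "col2": "apple"},
    2: {"col1": "blue", "col2": "plum"},
    3: {"col1": "red", "col2": "cherry"},
}

# Secondary index on col1: value -> primary keys of the matching rows.
secondary_index_col1 = {"blue": [2], "red": [1, 3]}


def select_id_where_col1(value):
    """select id ... where col1 = value: answered from the secondary index alone."""
    ids = list(secondary_index_col1.get(value, []))
    return ids, 0  # zero extra row lookups


def select_col2_where_col1(value):
    """select col2 ... where col1 = value: needs one row lookup per match."""
    results = []
    row_lookups = 0
    for pk in secondary_index_col1.get(value, []):
        results.append(clustered_index[pk]["col2"])  # second lookup, by primary key
        row_lookups += 1
    return results, row_lookups


print(select_id_where_col1("red"))
print(select_col2_where_col1("red"))
```

Both queries use the col1 index for the where condition. The selected columns only decide whether a second lookup into the row data is needed.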
Getting back to your query, the problem lies with: UNIX_TIMESTAMP(CURRENT_DATE())*1000;
If you remove that, your index will be used for lookups.
Is mysql index is dependant the selected columns?.
Yes, absolutely.
For example:
MySQL cannot use the index to perform lookups if the columns do not form a leftmost
prefix of the index. Suppose that you have the SELECT statements shown here:
SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
If an index exists on (col1, col2, col3), only the first two queries use the index.
The third and fourth queries do involve indexed columns, but (col2) and (col2, col3)
are not leftmost prefixes of (col1, col2, col3).
Have a read through the extensive documentation.
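The leftmost-prefix rule quoted above can be pictured with a sorted list standing in for an index on (col1, col2, col3). This is only a sketch: the entries and the helper prefix_lookup are invented, and a real B-tree is more involved.

```python
from bisect import bisect_left

# Hypothetical index on (col1, col2, col3): entries sorted in that column order.
index = sorted([
    (1, "x", 100), (1, "y", 101), (2, "x", 102), (2, "y", 103), (3, "x", 104),
])


def prefix_lookup(entries, prefix):
    """Binary-search to the first entry whose leading columns equal `prefix`."""
    lo = bisect_left(entries, prefix)
    matches = []
    for entry in entries[lo:]:
        if entry[:len(prefix)] != prefix:
            break  # matching entries are contiguous, so the scan can stop here
        matches.append(entry)
    return matches


# WHERE col1 = 2 and WHERE col1 = 2 AND col2 = 'y' are leftmost prefixes.
by_col1 = prefix_lookup(index, (2,))
by_col1_col2 = prefix_lookup(index, (2, "y"))

# WHERE col2 = 'x' is not a prefix: the matches are scattered through the
# index, so there is no single position to seek to.
positions_col2_x = [i for i, entry in enumerate(index) if entry[1] == "x"]

print(by_col1)
print(by_col1_col2)
print(positions_col2_x)
```

The first two lookups land on one contiguous run of entries. The col2-only condition matches entries spread across the whole list, which is why the index cannot be used to seek for it.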
For MySQL queries, the answer is yes, but not in every case.
The first query:
explain select * from IndexTest where ID < 5;
uses the table's clustered index. With InnoDB that is the table's primary key, so the query uses PRIMARY.
The second query:
select CREATED_TIME from IndexTest where CREATED_TIME >
UNIX_TIMESTAMP(CURRENT_DATE())*1000;
fetches only the indexed column, so MySQL can answer it from the index without reading the table data. That is why your explain result shows "Using index".
The third query:
select count(distinct(PID)) from IndexTest where CREATED_TIME >
UNIX_TIMESTAMP(CURRENT_DATE())*1000;
behaves much like this one:
select PID from IndexTest where
CREATED_TIME>UNIX_TIMESTAMP(CURRENT_DATE())*1000 group by PID
MySQL could use the index to fetch this data too, but because of how many rows the where condition matches, the optimizer estimates that fetching them through the index is more expensive than scanning the whole table. You can use FORCE INDEX to override that choice.
The same reasoning applies to your last query.
I hope this answer helps.
Indexing helps speed up the search on that particular column and its associated data rather than on the table data as a whole. So you have to include the indexed column to speed up the select.