So I have a table about 2GB in size, I run 2 queries and one takes about 200ms and the other takes over 8s, both return '0' which is correct. I added device_id and time_server as indexes and assumed it would make them quicker, it did.. well for one of the queries. So why is there huge difference in time taken to query the same table?
Have I just been unlucky in that one query is running in memory and the other is running from disk as I've hit the limit of innodb_buffer_pool_size?
Why the difference in the row counts from EXPLAIN, if both return a count of 0, I'd have thought it would do a full table scan and the row count would be identical?
Its worth noting that CPU, RAM, Disk I/O etc.. are all fine with nothing obvious that could slow it down. Repeatedly running the queries gives the same results, so its consistent.
Query 1
mysql> SELECT count(*) AS count
FROM mydb.gps
WHERE device_id = 780 AND time_server > '2021-08-03 16:32:48';
+-------+
| count |
+-------+
| 0 |
+-------+
1 row in set (8.20 sec)
Query 2:
mysql> SELECT count(*) AS count
FROM mydb.gps
WHERE device_id = 430 AND time_server > '2021-08-03 16:32:48';
+-------+
| count |
+-------+
| 0 |
+-------+
1 row in set (0.02 sec)
If I run an explain on them:
Query 1
mysql> EXPLAIN
-> SELECT count(*) AS count FROM mydb.gps WHERE device_id = 780 AND time_server > '2021-08-03 16:32:48';
+----+-------------+-------+------------+------+-----------------------+-----------+---------+-------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-----------------------+-----------+---------+-------+--------+----------+-------------+
| 1 | SIMPLE | gps | NULL | ref | device_id,time_server | device_id | 5 | const | 282416 | 2.12 | Using where |
+----+-------------+-------+------------+------+-----------------------+-----------+---------+-------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Query 2
mysql> EXPLAIN
-> SELECT count(*) AS count FROM mydb.gps WHERE device_id = 430 AND time_server > '2021-08-03 16:32:48';
+----+-------------+-------+------------+------+-----------------------+-----------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-----------------------+-----------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | gps | NULL | ref | device_id,time_server | device_id | 5 | const | 2001 | 2.12 | Using where |
+----+-------------+-------+------------+------+-----------------------+-----------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
You have more than 100x more rows with the first device_id, so the query has more rows to scan to check the time_server value. You may be able to improve it by creating a multi-column index:
ALTER TABLE gps ADD INDEX (device_id, time_server);
Related
Here is the table data
id
time
amount
1
20221104
15
2
20221104
10
3
20221105
7
4
20221105
19
5
20221106
10
The id and time field is asc, but time can be same.
The rows are very large, so we don't want to use page limit offset method, but with cursor id.
first query:
select * from t where time > xxx and time < yyy order by id asc limit 10;
get the biggest id zzz, then
next query:
select * from t where time > xxx and time < yyy and id > zzz order by id asc limit 10;
How should I build the index?
If I use id as index, the time range will cause huge scan if time is far away.
And If I use time as index, seek id will not be effective.
The following index should be enough for both queries:
alter table t add index `time_id` (`time`,`id`);
Note, use proper date/datetime data types , will save a lot of pain in the future
The key is composite index by leftmost prefixing principle. But both queries here start with
range expression. So I suppose that simply creating index on (a, b) is unable to optimize effectively
because indexing process stops after range condition. It is enough to create index like this:
CREATE INDEX index_time ON t (`time`)
More can be referenced here:
https://www.ibm.com/docs/en/informix-servers/12.10?topic=indexes-use-composite
https://orangematter.solarwinds.com/2019/02/05/the-left-prefix-index-rule/
First I agree with #ErgestBasha Suggestion:
If you follow the general performance rules:
CREATE TABLE t (id INT, time DATE, amount DEC(3,1));
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO t VALUES
-> (1, '2022-11-04', 15),
-> (2, '2022-11-04', 10),
-> (3, '2022-11-05', 7),
-> (4, '2022-11-05', 19),
-> (5, '2022-11-06', 10);
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM t;
+------+------------+--------+
| id | time | amount |
+------+------------+--------+
| 1 | 2022-11-04 | 15.0 |
| 2 | 2022-11-04 | 10.0 |
| 3 | 2022-11-05 | 7.0 |
| 4 | 2022-11-05 | 19.0 |
| 5 | 2022-11-06 | 10.0 |
+------+------------+--------+
5 rows in set (0.00 sec)
mysql> ALTER TABLE t ADD INDEX idx_time_id (time,id);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> SHOW INDEXES FROM t;
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| t | 1 | idx_time_id | 1 | time | A | 3 | NULL | NULL | YES | BTREE | | | YES | NULL |
| t | 1 | idx_time_id | 2 | id | A | 5 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.01 sec)
mysql> SELECT * FROM t WHERE time > '2022-11-04' AND time < '2022-11-06' AND id > 3 ORDER BY id ASC LIMIT 10;
+------+------------+--------+
| id | time | amount |
+------+------------+--------+
| 4 | 2022-11-05 | 19.0 |
+------+------------+--------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t WHERE time > '2022-11-04' AND time < '2022-11-06' AND id > 3 ORDER BY id ASC LIMIT 10;
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+---------------------------------------+
| 1 | SIMPLE | t | NULL | range | idx_time_id | idx_time_id | 4 | NULL | 2 | 33.33 | Using index condition; Using filesort |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)
As you can see, It uses the indexes defined (time,id) and uses the range scan access method. Also Extra column you can see that index is used during operation!
See this for iterating through a compound key:
http://mysql.rjweb.org/doc.php/deletebig#iterating_through_a_compound_key
It cannot be done with two ANDs; tt needs one AND and one OR.
See this for why OFFSET should be avoided when Paginating
I have a table located in RAM and doing some performance tests.
Let's consider a sample query, adding explain sentences along with results
mysql> explain update users_ram set balance = balance + speed where sub = 1;
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| 1 | UPDATE | users_ram | NULL | ALL | NULL | NULL | NULL | NULL | 2333333 | 100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set (0.00 sec)
mysql> update users_ram set balance = balance + speed where sub = 1;
Query OK, 1166970 rows affected (0.37 sec)
Rows matched: 1166970 Changed: 1166970 Warnings: 0
As you can see, it takes 0.37 sec without index. Then I'm creating an index on the sub column, which is an int column with just two possible values of 0 and 1, and surprisingly nothing changes
mysql> create index sub on users_ram (sub);
Query OK, 2333333 rows affected (2.04 sec)
Records: 2333333 Duplicates: 0 Warnings: 0
mysql> show index from lords.users_ram;
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| users_ram | 0 | user | 1 | user | NULL | 2333333 | NULL | NULL | YES | HASH | | |
| users_ram | 1 | sub | 1 | sub | NULL | 2 | NULL | NULL | | HASH | | |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
mysql> explain update users_ram set balance = balance + speed where sub = 1;
+----+-------------+-----------+------------+-------+---------------+------+---------+-------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+------+---------+-------+---------+----------+-------------+
| 1 | UPDATE | users_ram | NULL | range | sub | sub | 5 | const | 1166666 | 100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+------+---------+-------+---------+----------+-------------+
1 row in set (0.00 sec)
mysql> update users_ram set balance = balance + speed where sub = 1;
Query OK, 1166970 rows affected (0.37 sec)
Rows matched: 1166970 Changed: 1166970 Warnings: 0
If I remove the index and add it again, but now using btree, it gets even more weird
mysql> explain update users_ram set balance = balance + speed where sub = 1;
+----+-------------+-----------+------------+-------+---------------+------+---------+-------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+------+---------+-------+---------+----------+-------------+
| 1 | UPDATE | users_ram | NULL | range | sub | sub | 5 | const | 1057987 | 100.00 | Using where |
+----+-------------+-----------+------------+-------+---------------+------+---------+-------+---------+----------+-------------+
1 row in set (0.00 sec)
mysql> update users_ram set balance = balance + speed where sub = 1;
Query OK, 1166970 rows affected (0.62 sec)
Rows matched: 1166970 Changed: 1166970 Warnings: 0
How could adding an index could have no effect or even slow down the query?
Let's take into account that I'm not modifying the column which is indexed, so mysql doesn't have to do an extra write operation, so really I can't get what's really happening here.
"table located in RAM" -- I suspect that is technically incorrect. The possibilities (in MySQL):
The table lives on disk, but it is usually fully cached in the in-RAM "buffer_pool".
The table is ENGINE=MEMORY. But that is used only for temp stuff; it is completely lost if the server goes down.
update users_ram set balance = balance + speed where sub = 1;
The table users_ram needs some index starting with sub. With such, it can go directly to the row(s). But...
It seems that there are 1166970 such rows. That seems like half the table?? At which point, the index is pretty useless. But...
Updating 1M rows is terribly slow, regardless of indexing.
Plan A: Avoid the UPDATE. Perhaps this can be done by storing speed in some other table and doing the + whenever you read the data. (It is generally bad schema design to need huge updates like that.)
Plan B: Update in chunks: http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks
How the heck did you get index-type to be HASH? Perhaps `ENGINE=MEMORY? What version of MySQL?
What is speed? Another column? A constant?
Please provide SHOW CREATE TABLE users_ram -- There are some other things we need to see, such as the PRIMARY KEY and ENGINE.
(I need some of the above info before tackling "How could adding an index could have no effect or even slow down the query?")
I have a MariaDB 10.4 with a hung table (about 100 million rows) for storing crawled posts. The table contains 4x columns, and one of them is lastUpadate (datetime) and indexed.
Recently I try to select posts by lastUpdate. Most of them returns fast with index used, but some takes minutes with fewer records returned and looks like a table scan.
This is the query explain without conditions.
> explain select 1 from SourceAttr;
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
| 1 | SIMPLE | SourceAttr | index | NULL | idxCreateDate | 5 | NULL | 79830491 | Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
This is the query explain and number of rows returned for the slow one. The number of rows in the explain is almost equals to the above one.
> select 1 from SourceAttr where (lastUpdate >= '2020-01-11 11:46:37' AND lastUpdate < '2020-01-12 11:46:37');
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
| 1 | SIMPLE | SourceAttr | index | idxLastUpdate | idxLastUpdate | 5 | NULL | 79827437 | Using where; Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
> select 1 from SourceAttr where (lastUpdate >= '2020-01-11 11:46:37' AND lastUpdate < '2020-01-12 11:46:37');
394454 rows in set (14 min 40.908 sec)
The is the fast one.
> explain select 1 from SourceAttr where (lastUpdate >= '2020-01-15 11:46:37' AND lastUpdate < '2020-01-16 11:46:37');
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | SourceAttr | range | idxLastUpdate | idxLastUpdate | 5 | NULL | 3699041 | Using where; Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
> select 1 from SourceAttr where (lastUpdate >= '2020-01-15 11:46:37' AND lastUpdate < '2020-01-16 11:46:37');
1352552 rows in set (2.982 sec)
Any reason what might cause this ?
Thanks a lot.
When you see type: index it's called an index scan. This is almost as bad as a table-scan.
Notice the rows: 79827437 in the EXPLAIN of the two slow queries. This means it's examining over 79 million items in the scanned index, either idxCreateDate or idxLastUpdate. So it's basically examining every index entry, which takes nearly as long as examining every row of the table.
Whereas the quick query says rows: 3699041 so it's estimating less than 3.7 million rows examined. More than 20x fewer.
mysql> EXPLAIN SELECT fldjobitemid, fldstatus, tblbulkreportjobitems.fldparticipantid, CONCAT(fldFirstName, ' ', fldLastName) as full_name FROM tblbulkreportjobitems FORCE INDEX (fldparticipantid) JOIN tblparticipant ON tblparticipant.fldParticipantId = tblbulkreportjobitems.fldparticipantid WHERE fldjobid = 9 ORDER BY fldjobitemid
-> ;
+----+-------------+-----------------------+--------+--------------------------+---------+---------+------------------------------------------------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------------+--------+--------------------------+---------+---------+------------------------------------------------------+------+-----------------------------+
| 1 | SIMPLE | tblbulkreportjobitems | ALL | fldparticipantid | NULL | NULL | NULL | 869 | Using where; Using filesort |
| 1 | SIMPLE | tblparticipant | eq_ref | PRIMARY,fldParticipantId | PRIMARY | 4 | medicus_devel.tblbulkreportjobitems.fldparticipantid | 1 | Using where |
+----+-------------+-----------------------+--------+--------------------------+---------+---------+------------------------------------------------------+------+-----------------------------+
2 rows in set (0.05 sec)
Why is it using a filesort still?
The MySQL Query Optimizer will always overrule your choice of index if the number of keys is 5% of the number of rows in the table.
Run these queries please:
SELECT COUNT(1) FROM tblbulkreportjobitems;
I guess this should be 869 (from the explain plan)
SELECT COUNT(1) FROM tblbulkreportjobitems WHERE fldjobid = 9;
SELECT COUNT(1),fldjobid FROM tblbulkreportjobitems GROUP BY fldjobid WITH ROLLUP;
SELECT COUNT(1),fldjobid FROM tblbulkreportjobitems GROUP BY fldjobid WITH ROLLUP;
You will see which rows that appear more than 5% of the total rows. In that case, the MySQL Query Optimizer will choose a full table scan over using a lopsided index.
If you had fldparticipantid in the WHERE clause, then you will get a different result.
I have two (characteristic_list and measure_list) tables that are related to each other by a column called 'm_id'. I want to retrieve records using filters (columns from characteristic_list) within a date range (columns from measure_list). When I gave the following SQL using INNER JOIN, it takes a while to retrieve the record. What am I doing wrong?
mysql> explain select c.power_set_point, m.value, m.uut_id, m.m_id, m.measurement_status, m.step_name from measure_list as m INNER JOIN characteristic_lis
t as c ON (m.m_id=c.m_id) WHERE (m.sequence_end_time BETWEEN '2010-06-18' AND '2010-06-20');
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | c | ALL | NULL | NULL | NULL | NULL | 82952 | |
| 1 | SIMPLE | m | ALL | NULL | NULL | NULL | NULL | 85321 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+-------+-------------+
2 rows in set (0.00 sec)
mysql> select count(*) from measure_list;
+----------+
| count(*) |
+----------+
| 83635 |
+----------+
1 row in set (0.18 sec)
mysql> select count(*) from characteristic_list;
+----------+
| count(*) |
+----------+
| 83635 |
+----------+
1 row in set (0.10 sec)
The reason this query takes a while to execute is because it has to scan the entire table. You never want to see "ALL" as the type of the query. To speed things up, you need to make smart decisions about what to index.
See the following documents at the MySQL site:
http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html
http://dev.mysql.com/doc/refman/5.1/en/using-explain.html
As an add-on to the previous answer by Dan, you should consider indexing the join columns and the where columns. In this case, that means the m_id cols in both tables and the sequence_end_time in the measure_list table. They are small enough that you could add an index, run explain plan and time it, then change the index and compare. Should be relatively quick to solve.