Speed up query with large table DESC limit 1 - mysql

MariaDB 10 (myisam)
Query executes rather slowly, takes about 90 seconds.
I tried deleting some old rows and then optimizing the table.
SELECT ceil(rate * 8 / 1000000)
FROM db.Octets
WHERE id = 5344
order by datetime DESC
LIMIT 1;
Query takes a really long time to execute.
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+
| 1 | SIMPLE | Octets | index | NULL | Octets_1_idx | 8 | NULL | 1 | Using where |
+------+-------------+----------------+-------+---------------+------------------+---------+------+------+-------------+

You could try adding a composite (redundant) index that covers the query:
create index idx2 on Octets (id, datetime, rate);
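To check whether the new index is actually picked up, the plan can be inspected again. This is a minimal sketch that assumes idx2 has been created as suggested; the exact EXPLAIN output depends on the server version and the data.

```sql
-- Assumes idx2 (id, datetime, rate) from the suggestion above exists.
EXPLAIN
SELECT CEIL(rate * 8 / 1000000)
FROM db.Octets
WHERE id = 5344
ORDER BY datetime DESC
LIMIT 1;
-- What one would hope to see: key = idx2 and "Using index" in Extra,
-- instead of a full walk of Octets_1_idx (type = index, possible_keys = NULL).
```

Because the index starts with id and continues with datetime, the server can jump to id = 5344 and read the newest entry directly; rate is included so the row itself does not need to be fetched.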

Why MySQL indexing is taking too much time for < operator?

This is my MySQL table demo, which has more than 7 million rows:
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id | varchar(42) | YES | MUL | NULL | |
| date | datetime | YES | MUL | NULL | |
| text | varchar(100) | YES | | NULL | |
+-------+--------------+------+-----+---------+-------+
I read that indexes work sequentially.
Case 1:
select * from demo where id="43984a7e-edcf-11ea-92c7-509a4cb89342" order by date limit 30;
I created an (id, date) index and it works fine; the query executes very fast.
But look at the cases below.
Case 2:
Below is my SQL query.
select * from demo where id>"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
To execute the above query faster I created an index on (id, date), but it takes more than 10 sec.
Then I made another index on (date). This took less than 1 sec. Why is the composite index (id, date) so much slower than the (date) index in this case?
Case 3:
select * from demo where id<"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
For this query, even with the (date) index it takes more than 1.8 sec. Why is the < operator not optimized by either index, (date) or (id, date)?
And this query only goes through around 300 rows, yet it still takes more than 1.8 sec. Why?
mysql> explain select * from demo where id<"43984a7e-edcf-11ea-92c7-509a4cb89342" order by date desc limit 30;
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
| 1 | SIMPLE | demo | NULL | index | demoindex1,demoindex2 | demoindex3 | 6 | NULL | 323 | 36.30 | Using where |
+----+-------------+-------+------------+-------+-----------------------+------------+---------+------+------+----------+-------------+
Any suggestions for how to create an index in Case 3 to optimize it?
In your first query, the index can be used for both the where clause and the ordering. So it will be very fast.
For the second query, the index can only be used for the where clause. Because of the inequality, information about the date is no longer in order. So the engine needs to explicitly order.
In addition, I imagine that the second query returns much more data than the first -- a fair amount of data if it takes 10 seconds to sort it.
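The difference between the two strategies can be made visible by forcing each index in turn and comparing the plans. This is only a sketch: the index names come from the EXPLAIN output in the question, and the mapping of names to columns is an inference (demoindex3 has key_len 6, which fits a nullable datetime, so it is assumed to be the (date) index; demoindex1 is assumed to start with id).

```sql
-- Strategy A: walk the (date) index in order, filter on id, stop after 30 hits.
-- Assumption: demoindex3 is the index on (date).
EXPLAIN
SELECT * FROM demo FORCE INDEX (demoindex3)
WHERE id < '43984a7e-edcf-11ea-92c7-509a4cb89342'
ORDER BY date DESC
LIMIT 30;

-- Strategy B: range scan on an index that starts with id, then sort by date.
-- Assumption: demoindex1 starts with id.
EXPLAIN
SELECT * FROM demo FORCE INDEX (demoindex1)
WHERE id < '43984a7e-edcf-11ea-92c7-509a4cb89342'
ORDER BY date DESC
LIMIT 30;
-- With strategy B, "Using filesort" in Extra would indicate the explicit
-- sort described above.
```

Strategy A is fast only when matching rows turn up early in date order; strategy B pays for sorting every row that satisfies the range condition.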

MySQL - nested select queries running many time slower than sequential queries (on a large table)

I have a MySQL query that I am having performance problems with that I do not understand. When I try to debug and run the overall query as a sequence of separate subqueries they seem to perform reasonably well, given the volume of data. When I combine them into a single nested query I get much much much longer execution times.
The main ratings table mentioned below is approx 30 million rows (4GB of disk space), with a couple of foreign keys (it's a many-to-many table linking users and items with a small amount of additional supplementary user specific item information - approx 13 fields and 30 bytes).
Query 1 - approx 23s
SELECT COUNT(1) FROM (SELECT fields FROM ratings WHERE (id >= 0 AND id < 10000)
AND item_type = 1) AS t1;
Query 1 saved to table - approx 65s if I save the results to a temporary table
CREATE TABLE temp_table SELECT fields FROM ratings WHERE (id >= 0 AND id < 10000)
AND item_type = 1;
Query 2 - approx 3s
SELECT COUNT(1) FROM temp_table WHERE id IN (SELECT id from item_stats WHERE
ratings_count > 1000);
Based on this I would expect a combined query to be approx 30s or so, and not more than approx 70s.
Combined query (Query 1 + Query 2) - indeterminate time (10s of minutes before I give up and cancel)
SELECT COUNT(1) from (SELECT * FROM (SELECT fields FROM ratings WHERE (id >= 0
AND id < 10000) AND item_type = 1) AS t1 WHERE t1.id IN (SELECT id FROM
item_stats WHERE ratings_count > 1000)) as t2;
Can anyone help explain this difference and guide me in creating a query that works? If I need to I can rely on the sequential queries (which would take approx 70s), but that is cumbersome and does not seem the right way to go.
I have tried using INNER JOIN instead of IN but this did not seem to make much difference. The ID count from the item_stats table is about 2700 IDs.
It's using MySQL 8.0 on a laptop (16GB RAM, SSD).
Response to suggestions / questions:
Query 1
EXPLAIN select user_id, game_id, item_type_id, rating, plays, own, bgg_last_modified from collections where (user_id >= 0 and user_id < 10000) and item_type_id = 1;
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | collections | NULL | ALL | user_id | NULL | NULL | NULL | 32898400 | 1.31 | Using where |
+----+-------------+-------------+------------+------+---------------+------+---------+------+----------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Query 2
EXPLAIN select * from temp_coll where game_id in (select game_id from games_ratings_stats where (ratings_count > 1000) or (ratings_count > 500 and ratings_avg >= 7.0));
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| 1 | SIMPLE | <subquery2> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | NULL |
| 1 | SIMPLE | temp_coll | NULL | ALL | NULL | NULL | NULL | NULL | 1674386 | 10.00 | Using where; Using join buffer (hash join) |
| 2 | MATERIALIZED | games_ratings_stats | NULL | ALL | NULL | NULL | NULL | NULL | 81585 | 40.74 | Using where |
+----+--------------+---------------------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
3 rows in set, 1 warning (0.00 sec)
Combined query
EXPLAIN select * from (select user_id, game_id, item_type_id, rating, plays, own, bgg_last_modified from collections where (user_id >= 0 and user_id < 10000) and item_type_id = 1) as t1 where t1.game_id in (select game_id from games_ratings_stats where (ratings_count > 1000) or (ratings_count > 500 and ratings_avg >= 7.0));
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
| 1 | SIMPLE | <subquery3> | NULL | ALL | NULL | NULL | NULL | NULL | NULL | 100.00 | Using where |
| 1 | SIMPLE | collections | NULL | ref | user_id,game_id | game_id | 5 | <subquery3>.game_id | 199 | 1.31 | Using where |
| 3 | MATERIALIZED | games_ratings_stats | NULL | ALL | NULL | NULL | NULL | NULL | 81585 | 40.74 | Using where |
+----+--------------+---------------------+------------+------+-----------------+---------+---------+---------------------+-------+----------+-------------+
3 rows in set, 1 warning (0.00 sec)
Your query appears to be functionally identical to the following (rather implausible) query:
SELECT COUNT(*) total
FROM ratings r
JOIN item_stats s
ON s.id = r.id
WHERE r.id >= 0
AND r.id < 10000
AND r.item_type = 1
AND s.ratings_count > 1000
r.id is, presumably, the PRIMARY KEY, so it's automatically included in any InnoDB index, which leaves just item_type and ratings_count requiring indexes.
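As a rough sketch of what that could look like, using the simplified table and column names from the rewritten query (the index names are hypothetical, and this assumes id really is the primary key of both tables):

```sql
-- Hypothetical index names; columns follow the simplified query above.
ALTER TABLE ratings    ADD INDEX idx_item_type (item_type);
ALTER TABLE item_stats ADD INDEX idx_ratings_count (ratings_count);

-- Re-check the plan afterwards.
EXPLAIN
SELECT COUNT(*) total
FROM ratings r
JOIN item_stats s
  ON s.id = r.id
WHERE r.id >= 0
  AND r.id < 10000
  AND r.item_type = 1
  AND s.ratings_count > 1000;
```

Whether the optimizer actually uses these indexes depends on how selective item_type and ratings_count are in the real data.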
You would benefit a lot from an online tutorial on learning how to read the EXPLAIN plan. The EXPLAINS you shared clearly show missing indexes.
As a general rule, queries should not take 23 seconds or 65 seconds, even with millions of rows. Proper indexes + partitioning should resolve the slowness.
Query 1: The user_id index on that table is not helping performance, as 99% of users are within the range in the where clause. You can add an index on item_type_id:
ALTER TABLE collections ADD KEY (item_type_id)
Query 2: The temp_coll table is missing a game_id index. Also, I'm not sure if the underlying code for games_ratings_stats has an index on ratings_count and if that would help. I don't have experience with MySQL materialized tables.
ALTER TABLE temp_coll ADD KEY (game_id)
Query 3:
Would benefit from above indexes.
Increasing the InnoDB Buffer Pool Size (now set to 8GB) seems to have made a significant improvement. If anyone has any further setup or tuning advice on MySQL then that would be appreciated!
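For reference, a minimal sketch of how the buffer pool size can be inspected and changed at runtime on MySQL 8.0. The 8 GiB value mirrors the figure mentioned above; the right size depends on how much RAM the machine can spare.

```sql
-- Current size, shown in GiB.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gib;

-- Resize online to 8 GiB (8 * 1024 * 1024 * 1024 bytes).
SET GLOBAL innodb_buffer_pool_size = 8589934592;

-- Alternatively, SET PERSIST also keeps the setting across restarts.
SET PERSIST innodb_buffer_pool_size = 8589934592;
```

The server may round the value up to a multiple of the buffer pool chunk size, so the value read back can differ slightly from the one requested.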

MariaDB SELECT with index used but looks like table scan

I have a MariaDB 10.4 with a huge table (about 100 million rows) for storing crawled posts. The table contains four columns, and one of them is lastUpdate (datetime) and indexed.
Recently I tried to select posts by lastUpdate. Most of the queries return fast with the index used, but some take minutes while returning fewer records, and look like a table scan.
This is the query explain without conditions.
> explain select 1 from SourceAttr;
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
| 1 | SIMPLE | SourceAttr | index | NULL | idxCreateDate | 5 | NULL | 79830491 | Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+-------------+
This is the query explain and number of rows returned for the slow one. The number of rows in the explain is almost equal to the above one.
> explain select 1 from SourceAttr where (lastUpdate >= '2020-01-11 11:46:37' AND lastUpdate < '2020-01-12 11:46:37');
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
| 1 | SIMPLE | SourceAttr | index | idxLastUpdate | idxLastUpdate | 5 | NULL | 79827437 | Using where; Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+----------+--------------------------+
> select 1 from SourceAttr where (lastUpdate >= '2020-01-11 11:46:37' AND lastUpdate < '2020-01-12 11:46:37');
394454 rows in set (14 min 40.908 sec)
This is the fast one.
> explain select 1 from SourceAttr where (lastUpdate >= '2020-01-15 11:46:37' AND lastUpdate < '2020-01-16 11:46:37');
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
| 1 | SIMPLE | SourceAttr | range | idxLastUpdate | idxLastUpdate | 5 | NULL | 3699041 | Using where; Using index |
+------+-------------+------------+-------+---------------+---------------+---------+------+---------+--------------------------+
> select 1 from SourceAttr where (lastUpdate >= '2020-01-15 11:46:37' AND lastUpdate < '2020-01-16 11:46:37');
1352552 rows in set (2.982 sec)
Any idea what might cause this?
Thanks a lot.
When you see type: index it's called an index scan. This is almost as bad as a table-scan.
Notice the rows: 79827437 in the EXPLAIN of the two slow queries. This means it's examining over 79 million items in the scanned index, either idxCreateDate or idxLastUpdate. So it's basically examining every index entry, which takes nearly as long as examining every row of the table.
Whereas the quick query says rows: 3699041 so it's estimating less than 3.7 million rows examined. More than 20x fewer.
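The two plans can be put side by side to see the difference in the type and rows columns. The sketch below also refreshes the index statistics first; that step is an assumption on my part (stale statistics are only one possible reason for the optimizer to choose an index scan over a range scan) and is not something the answer above claims.

```sql
-- Assumption: refreshing statistics may change the optimizer's choice.
ANALYZE TABLE SourceAttr;

-- Slow case in the question: type = index, rows = 79827437.
EXPLAIN SELECT 1 FROM SourceAttr
WHERE lastUpdate >= '2020-01-11 11:46:37'
  AND lastUpdate <  '2020-01-12 11:46:37';

-- Fast case in the question: type = range, rows = 3699041.
EXPLAIN SELECT 1 FROM SourceAttr
WHERE lastUpdate >= '2020-01-15 11:46:37'
  AND lastUpdate <  '2020-01-16 11:46:37';
```

If the first query still shows type = index afterwards, the cause lies elsewhere and the statistics were not the problem.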

mysql time for select not same as real lines

I ran into an unexpected result on my MySQL server:
the more rows match, the less time the query takes??
I have one table, and the total number of rows for each filter is:
select count(*) from tcr where eid=648;
+----------+
| count(*) |
+----------+
| 11336 |
select count(*) from tcr where eid=997;
+----------+
| count(*) |
+----------+
| 1262307 |
But the query time is the opposite of the total number of rows for each filter:
select * from tcr where eid=648 order by start_time desc limit 0,10;
[data display]
10 rows in set (16.92 sec)
select * from tcr where eid=997 order by start_time desc limit 0,10;
[data display]
10 rows in set (0.21 sec)
"reset query cache" was executed before every query.
The indexes on table tcr are:
KEY `cridx_eid` (`eid`),
KEY `cridx_start_time` (`start_time`)
BTW, here is the explain result. It is very strange, but it looks more consistent with the timings we get (eid=997 shows fewer rows than eid=648):
explain select * from talk_call_record where eid=648 order by start_time desc limit 0,10;
+----+-------------+------------------+-------+---------------+------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------+------------------+---------+------+------+-------------+
| 1 | SIMPLE | talk_call_record | index | cridx_eid | cridx_start_time | 5 | NULL | 3672 | Using where |
explain select * from talk_call_record where eid=997 order by start_time desc limit 0,10;
+----+-------------+------------------+-------+---------------+------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------+------------------+---------+------+------+-------------+
| 1 | SIMPLE | talk_call_record | index | cridx_eid | cridx_start_time | 5 | NULL | 32 | Using where |
First, you must have a very large table.
MySQL is using the index on start_time for the queries. What is happening is that it is "walking" through the table, one row at a time. It happens to find eid=997 much more quickly than it finds eid=648. It only has to find 10 records, so the engine stops when it gets to the 10th one.
What can you do? The optimal index for the query is a composite index on (eid, start_time). This will go directly to the values that you want.
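A minimal sketch of that composite index, using the table and column names from the question (the index name is hypothetical):

```sql
-- Hypothetical index name; eid first for the equality filter,
-- start_time second so rows come out already ordered.
ALTER TABLE talk_call_record
  ADD INDEX cridx_eid_start_time (eid, start_time);

-- Re-check the plan for the slow case.
EXPLAIN
SELECT * FROM talk_call_record
WHERE eid = 648
ORDER BY start_time DESC
LIMIT 0, 10;
```

With this index the engine can seek to eid = 648 and read the 10 newest entries directly, so the run time should no longer depend on where that eid's rows happen to fall in start_time order.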

MySQL equality slower than greater-than

I'm having a curious issue with my MySQL database. The following query runs in 0.003 seconds:
SELECT * FROM `post` where `thread_id` > 12117484 and `index` > -1 limit 1;
If I change the second > to an =, the query doesn't complete (it runs for over a minute):
SELECT * FROM `post` where `thread_id` > 12117484 and `index` = 0 limit 1;
It's worth noting that the result from the first query has index = 0. I know it's bad form to name a column index...but it's the database I've been given. Here's the MySQL Explain for the second query:
+----+-------------+-------+-------+---------------------+---------------------+---------+------+------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------------+---------------------+---------+------+------+------------------------------------+
| 1 | SIMPLE | post | range | post_thread_id_idx1 | post_thread_id_idx1 | 5 | NULL | 1 | Using index condition; Using where |
+----+-------------+-------+-------+---------------------+---------------------+---------+------+------+------------------------------------+