Why do different primary key queries have a huge speed difference in InnoDB? - mysql

I have a simple table Test:
id, primary key;
id2, secondary index;
plus 50+ other columns of various types.
I know that if I run select id from Test, MySQL will use the secondary index id2 rather than the primary (clustered) index, as stated in this post.
If I force queries to use the primary index, why do the execution times differ so much depending on which columns I select?
Query 1
select id, url from Test order by id limit 1000000, 1 takes only 500 ms+, and here is the explain:
MySQL [x]> explain select id, url from Test order by id limit 1000000, 1;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
| 1 | SIMPLE | Test | NULL | index | NULL | PRIMARY | 8 | NULL | 1000001 | 100.00 | NULL |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
Query 2
select * from Test order by id limit 1000000, 1 takes 2000 ms+, and here is the explain:
MySQL [x]> explain select * from Test order by ID limit 1000000, 1;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
| 1 | SIMPLE | Test | NULL | index | NULL | PRIMARY | 8 | NULL | 1000001 | 100.00 | NULL |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+---------+----------+-------+
1 row in set, 1 warning (0.00 sec)
I don't see any difference between the two EXPLAIN outputs. So why is there such a huge difference in execution time when both use the same clustered index?

For the following query:
select id, url from t order by id limit 1000000, 1
MySQL seems to read 1,000,000 rows ordered by id instead of skipping them.
I would suggest changing the query to this:
select * from t where id = (select id from t order by id limit 1000000, 1)
MySQL seems to do a better job of skipping the 1,000,000 rows when the limit is placed inside a subquery.
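Another common formulation of the same idea is the "deferred join" (late row lookup) pattern; a minimal sketch, assuming id is the primary key as in the question:
select t.*
from t
join (select id from t order by id limit 1000000, 1) x on x.id = t.id;
The derived table walks only the id values, and the full row is materialized just once, for the single id that survives the limit.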

OK, I finally found the reason: it comes down to how MySQL implements LIMIT. (Sorry, I could only find a Chinese explanation of this; there is no English version.)
In Query 1 and Query 2 above, here is what LIMIT does:
MySQL reads the clustered index and gets the first row;
MySQL converts the first row into a result row;
then, before sending it to the client, MySQL sees that there is a limit 1000000, so the first row is not the right answer...
MySQL then moves on to the 2nd row and converts it into a result row;
again, before sending it to the client, MySQL sees the limit 1000000, so the second row is not the right answer either...;
and so on, until it reaches the 1000001st row; after converting it, that row finally matches the limit 1000000, 1 clause;
so this is the right answer, and MySQL sends it to the client.
However, by then it has converted 1,000,000 rows in total. So the question above comes down to the cost of 'converting all columns (select *) x 1,000,000 rows' vs. 'converting one or two columns (select id, url) x 1,000,000 rows'. No wonder the former is far slower than the latter.
I don't know why MySQL's LIMIT behaves so clumsily, but it just does...

1. Check the SQL profile to get more information:
mysql> show profile;
2. MySQL's EXPLAIN is not very powerful yet.
3. What kind of scenario needs limit 10000?
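For reference, a minimal sketch of how query profiling is typically used (note that SHOW PROFILE is deprecated in recent MySQL versions in favor of the Performance Schema):
SET profiling = 1;           -- enable profiling for this session
SELECT id, url FROM Test ORDER BY id LIMIT 1000000, 1;
SHOW PROFILES;               -- list profiled statements and their query IDs
SHOW PROFILE FOR QUERY 1;    -- per-stage timing breakdown for that query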

Related

Performance drop upgrading from MySQL 5.7.33 to 8.0.31 - why did it stop using the Index Condition Pushdown Optimization?

I have a table like this (details elided for readability):
CREATE TABLE UserData (
id bigint NOT NULL AUTO_INCREMENT,
userId bigint NOT NULL DEFAULT '0', ...
c6 int NOT NULL DEFAULT '0', ...
hidden int NOT NULL DEFAULT '0', ...
c22 int NOT NULL DEFAULT '0', ...
PRIMARY KEY (id), ...
KEY userId_hidden_c6_c22_idx (userId,hidden,c6,c22), ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3
and was happily doing queries on it like this in MySQL 5.7:
mysql> select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----
| ...
+----
10 rows in set (0.03 sec)
However, in MySQL 8.0 these queries started doing this:
mysql> select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----
| ...
+----
10 rows in set (1.56 sec)
Explain shows the following, 5.7:
mysql> explain select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+---------------------------------------+
| 1 | SIMPLE | UserData | NULL | ref | userId_hidden_c6_c22_idx | userId_hidden_c6_c22_idx | 12 | const,const | 78062 | 100.00 | Using index condition; Using filesort |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+---------------------------------------+
8.0:
mysql> explain select * from UserData use index (userId_hidden_c6_c22_idx) where (userId = 123 AND hidden = 0) order by id DESC limit 10 offset 0;
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+----------------+
| 1 | SIMPLE | UserData | NULL | ref | userId_hidden_c6_c22_idx | userId_hidden_c6_c22_idx | 12 | const,const | 79298 | 100.00 | Using filesort |
+----+-------------+----------+------------+------+---------------+---------------+---------+-------------+-------+----------+----------------+
The main difference seems to be that 5.7 is Using index condition; Using filesort and 8.0 is only Using filesort.
Why did 8.0 stop using the index condition, and how can I get it to start using it?
EDIT: Why did performance drop 10-100x with MySQL 8.0? It looks like it's because it stopped using the Index Condition Pushdown Optimization - how can I get it to start using it?
The table has ~150M rows in it, and that user has ~75k records, so I guess it could be a change in the size-based heuristics or whatever goes into the MySQL decision making?
In the EXPLAIN you show, the type column is ref and the key column names the index, which indicates it is using that index to optimize the lookup.
You are misinterpreting what "index condition" means in the Extra column. Admittedly, it does sound as if the index is being used when that note is present and not used when it is absent.
The note about "index condition" refers to Index Condition Pushdown, which is not about whether the index is used at all; rather, it is about delegating additional filter conditions to the storage engine level. Read about it here: https://dev.mysql.com/doc/refman/8.0/en/index-condition-pushdown-optimization.html
It's unfortunate that the notes reported by EXPLAIN are so difficult to understand. You really have to study a lot of documentation to understand how to read those notes.
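If you suspect ICP specifically, one way to test it (a sketch using the documented optimizer_switch system variable) is to toggle the flag in your session and compare timings of the same query:
SET optimizer_switch = 'index_condition_pushdown=off';
-- run the query and note the time
SET optimizer_switch = 'index_condition_pushdown=on';
-- run it again and compare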
The following index would be much faster in either version, because the scan could stop after 10 rows; that is, the "filesort" would be avoided:
INDEX(userId, hidden, id)
This won't do "Using index" (aka "covering"), but neither did your attempts. That is different from "Using index condition" (aka "ICP", as you point out).
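A sketch of adding that index (the index name here is made up):
ALTER TABLE UserData ADD INDEX userId_hidden_id_idx (userId, hidden, id);
With equality conditions on userId and hidden, the entries in this index are already ordered by id, so the ORDER BY id DESC LIMIT 10 can be satisfied by reading the index backwards and stopping after 10 rows.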
Try these to get more insight:
EXPLAIN FORMAT=JSON SELECT ...
EXPLAIN ANALYZE SELECT ...
(No, I cannot explain the regression.)

Trouble optimizing MySQL query containing range expression, order by, and limit

I'm having trouble optimizing what I think is a reasonable / straightforward query in MySQL. After spending a few late nights on this, I thought I'd post my question here as I'm sure the solution will be obvious to somebody.
Here are the columns in a simplified version of my table T:
id: varchar(32) not null (primary key)
timestamp: bigint(20) unsigned not null
family: char(32) not null
size: bigint(20) unsigned not null
And here's the query that I need to optimize:
select
id
from
T
where
family = 'some constant'
and
timestamp between T1 and T2
order by
size desc
limit
5
My table is fairly large (~630M rows) so I'm hoping that an index can do most of the work for me... but I'm having trouble picking the right columns for my index.
It seems that in order for MySQL to use an index to answer a range query (like what I'm doing w/ the timestamp), that column must be the last column in the index. But then there's the "order by", which is on a different column. I'm not sure which one of these columns should be last in my index, so I've tried creating the following indices:
i1, on (family, timestamp, size)
i2, on (family, size, timestamp)
... but neither of these seems to help.
Any idea what I'm doing wrong?
(BTW I'm running MySQL 8 in Amazon RDS, in case that makes a difference.)
Thanks in advance for any helpful suggestions you may have!
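For reference, a sketch of the two index definitions described above:
CREATE INDEX i1 ON T (family, timestamp, size);
CREATE INDEX i2 ON T (family, size, timestamp);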
EDIT #1 ---------------------------------------
I just created this simplified table that I described above, and copied 10M rows worth of data from the original table to the simplified table, just to keep things clean. Then I ran the following query:
mysql> select
-> id, size
-> from
-> T
-> where
-> family = 'be0bf4a203797729f38c6355b6d80903'
-> and
-> timestamp between 1578460425887 and 1584710866343
-> order by
-> size desc;
... and it took 1.27 seconds. I really need this to be faster; otherwise this sort of query (which I need to run many times per second) will take much too long on the real dataset.
Here are the results of an EXPLAIN on the query above:
+----+-------------+-------+------------+-------+---------------------------+---------------------------+---------+------+--------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------------------+---------------------------+---------+------+--------+----------+------------------------------------------+
| 1 | SIMPLE | T | NULL | range | idx_family_timestamp_size | idx_family_timestamp_size | 136 | NULL | 178324 | 100.00 | Using where; Using index; Using filesort |
+----+-------------+-------+------------+-------+---------------------------+---------------------------+---------+------+--------+----------+------------------------------------------+
I bet it's the filesort that's killing performance. Any ideas?
EDIT #2 ---------------------------------------
Oops, I just realized that I forgot the LIMIT in EDIT #1's query. I've fixed that here, and also grew T to 100M rows -- so now it's 10x the size that it was in my previous edit.
Now my query takes almost 10 sec. to run, and the results of the EXPLAIN are as follows:
mysql> explain
-> select
-> id, size
-> from
-> T
-> where
-> family = 'be0bf4a203797729f38c6355b6d80903'
-> and
-> timestamp between 1578460425887 and 1584710866343
-> order by
-> size desc
-> limit
-> 5;
+----+-------------+-------+------------+-------+-----------------------------------------------------+---------------------------+---------+------+--------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-----------------------------------------------------+---------------------------+---------+------+--------+----------+--------------------------------------------------------+
| 1 | SIMPLE | T | NULL | range | idx_family_timestamp_size,idx_family_size_timestamp | idx_family_size_timestamp | 144 | NULL | 410744 | 100.00 | Using where; Using index for skip scan; Using filesort |
+----+-------------+-------+------------+-------+-----------------------------------------------------+---------------------------+---------+------+--------+----------+--------------------------------------------------------+

Should I use derived table in this situation?

I need to fetch 10 random rows from a table. The query below won't do, as it is going to be very slow at large scale (I've read strong arguments against it):
SELECT `title` FROM table1 WHERE id1 = 10527 and id2 = 37821 ORDER BY RAND() LIMIT 10;
EXPLAIN:
select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------------+-------------+------+---------------+-------+---------+------+------+----------------+
SIMPLE | table1 | ref | id1,id2 | id2 | 5 | const| 7 | Using where; Using temporary; Using filesort
I tried the following workaround:
SELECT * FROM
(SELECT `title`, RAND() as n1
FROM table1
WHERE id1 = 10527 and id2 = 37821) TTA
ORDER BY n1 LIMIT 10;
EXPLAIN:
select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------------+-------------+------+---------------+-------+---------+------+------+----------------+
PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | Using filesort |
DERIVED | table1 | ref | id1,id2 | id2 | 5 |const | 7 | Using where |
But I've also read a couple of statements against using derived tables.
Could you please tell me whether the latter query is going to be any improvement?
You should try the first method to see if it works for you. If you have an index on table1(id1, id2) and there are not very many occurrences of any given value pair, then the performance is probably fine for what you want to do.
Your second query is going to have somewhat worse performance than the first. The issue with the performance of order by rand() is not the time taken to calculate random numbers. The issue is the order by, and your second query is basically doing the same thing, with the additional overhead of a derived table.
If you know that there are always at least, say, 1,000 matching rows, then the following would generally work faster:
SELECT `title`
FROM table1
WHERE id1 = 10527 and id2 = 37821 and rand() < 0.05
ORDER BY RAND()
LIMIT 10;
This would take a random sample of about 5% of the data and with 1,000 matching rows, you would almost always have at least 10 rows to choose from.
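As a quick sanity check on those numbers (my arithmetic, not part of the original answer): with 1,000 matching rows and a sampling probability of 0.05, the expected sample size is 1,000 x 0.05 = 50 rows, and the probability that a Binomial(1000, 0.05) draw yields fewer than 10 rows is vanishingly small, so the outer LIMIT 10 almost always has enough rows to choose from.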

Improve SQL statement in MySQL to run faster. I want to merge fields, but merged from different rows

Why does this code take a long time to run?
SELECT
concat((select Sname from member order by rand() limit 1),
Ssurname) as tee
FROM member
But this code is very fast to run:
SELECT
concat(Sname,
Ssurname) as tee
FROM member
For every result row returned by your first example, MySQL must produce another row from which to CONCAT() Sname, with a custom (random) order. Because order by rand() is used, the whole table must be reordered randomly for every row in your table. That is likely to be a very expensive operation, since the result of the subquery cannot be cached.
In the second example, a simple rowset is returned. Sname and Ssurname are concatenated from columns in the same row.
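If the intent is to pick one random Sname and reuse it for every row, a rewrite that evaluates the random pick only once may help; a sketch (note this changes the semantics: every output row gets the same randomly chosen Sname, whereas the original re-rolls per row):
SELECT CONCAT(r.Sname, m.Ssurname) AS tee
FROM member m
CROSS JOIN (SELECT Sname FROM member ORDER BY RAND() LIMIT 1) r;
Because the derived table is materialized once, the expensive ORDER BY RAND() runs a single time instead of once per row.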
I ran an EXPLAIN on a similar query, having one indexed column concatenated against a non-indexed subquery. MySQL uses a temporary table to compute the subquery.
mysql> EXPLAIN SELECT CONCAT(g_userName, (SELECT g_fullName FROM g2_User ORDER BY RAND() LIMIT 1)) FROM g2_User;
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
| 1 | PRIMARY | g2_User | index | NULL | g_userName | 98 | NULL | 5 | Using index |
| 2 | UNCACHEABLE SUBQUERY | g2_User | ALL | NULL | NULL | NULL | NULL | 5 | Using temporary; Using filesort |
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+

Getting a Column's Max Value

Is there any tangible difference (speed/efficiency) between these statements? Assume the column is indexed.
SELECT MAX(someIntColumn) AS someIntColumn FROM someTable
or
SELECT someIntColumn FROM someTable ORDER BY someIntColumn DESC LIMIT 1
This depends largely on the query optimizer in your SQL implementation. At best, they will have the same performance. Typically, however, the first query is potentially much faster.
The first query essentially asks for the DBMS to inspect every value in someIntColumn and pick the largest one.
The second query asks the DBMS to sort all the values in someIntColumn from largest to smallest and pick the first one. Depending on the number of rows in the table and the existence (or lack thereof) of an index on the column, this could be significantly slower.
If the query optimizer is sophisticated enough to realize that the second query is equivalent to the first one, you are in luck. But if you retarget your app to another DBMS, you might get unexpectedly poor performance.
EDIT based on the explain plans:
The explain plans show that max(column) is more efficient: its plan says "Select tables optimized away", meaning the optimizer resolved the MAX() from the index during optimization, so no table rows are read at execution time.
EXPLAIN SELECT version from schema_migrations order by version desc limit 1;
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
| 1 | SIMPLE | schema_migrations | index | NULL | unique_schema_migrations | 767 | NULL | 1 | Using index |
+----+-------------+-------------------+-------+---------------+--------------------------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN SELECT max(version) FROM schema_migrations ;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.00 sec)