MySQL stops using index when SQL_CALC_FOUND_ROWS is added - mysql

I have a query which is running far slower than it should. I have distilled the problem down to a simple select statement (some fields have been renamed for privacy):
SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS id, date_started, date_complete, status
FROM table_a
ORDER BY date DESC
LIMIT 0, 100
When SQL_CALC_FOUND_ROWS is used, the query completes in about 0.70 seconds; when SQL_CALC_FOUND_ROWS is removed, the query completes in about 0.0005 seconds (in both cases SQL_NO_CACHE is used in the query).
table_a has an index on the date field.
Apparently SQL_CALC_FOUND_ROWS can prevent an index from being used:
So, obvious conclusion from this simple test is: when we have
appropriate indexes for WHERE/ORDER clause in our query, it is much
faster to use two separate queries instead of one with
SQL_CALC_FOUND_ROWS.
I have confirmed this. No index is used when SQL_CALC_FOUND_ROWS is included:
EXPLAIN SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS id, date_started, date_complete, status FROM table_a ORDER BY date DESC limit 0, 100;
+----+-------------+-------------+------+---------------+------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+--------+----------------+
| 1 | SIMPLE | table_a | ALL | NULL | NULL | NULL | NULL | 132208 | Using filesort |
+----+-------------+-------------+------+---------------+------+---------+------+--------+----------------+
But when SQL_CALC_FOUND_ROWS is not used then the index on the date field is used:
EXPLAIN SELECT SQL_NO_CACHE id, date_started, date_complete, status FROM table_a ORDER BY date DESC limit 0, 100;
+----+-------------+-------------+-------+---------------+------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------+------+---------+------+--------+-------+
| 1 | SIMPLE | table_a | index | NULL | date | 13 | NULL | 132208 | |
+----+-------------+-------------+-------+---------------+------+---------+------+--------+-------+
Is there any way to speed the query up without removing SQL_CALC_FOUND_ROWS from the query?
I'm using MySQL version 5.0.51a-3ubuntu5.1-log.
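For reference, the alternative recommended in the quoted conclusion (two separate queries instead of one with SQL_CALC_FOUND_ROWS) can be sketched as below. This is only an illustration of the pattern, not a measurement: it assumes Python's built-in sqlite3 as a stand-in for MySQL, a simplified table_a with made-up rows, and a hypothetical helper fetch_page.

```python
import sqlite3

# SQLite stands in for MySQL here purely so the sketch is runnable anywhere;
# the table layout and the data are simplified, made-up assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, date TEXT, status TEXT)")
conn.execute("CREATE INDEX idx_date ON table_a (date)")
conn.executemany(
    "INSERT INTO table_a (date, status) VALUES (?, ?)",
    [(f"2009-01-{d:02d}", "done") for d in range(1, 29) for _ in range(10)],
)

def fetch_page(conn, offset, page_size):
    """Return (rows, total) using two statements instead of SQL_CALC_FOUND_ROWS."""
    # Query 1: the page itself, free to use the index on date for ORDER BY.
    rows = conn.execute(
        "SELECT id, date, status FROM table_a ORDER BY date DESC LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    # Query 2: the total row count, issued separately.
    total = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
    return rows, total

rows, total = fetch_page(conn, 0, 100)
print(len(rows), "rows on this page out of", total)
```

Whether this is actually faster on a given MySQL version and table has to be checked with EXPLAIN and timing on the real data.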

Related

Why does an indexed mysql query filtered on less char values result in more rows examined?

When I run the following query, I see the expected rows examined as 40
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
AND s.status in ('A', 'T')
AND s.period_end_action in ('R','C')
ORDER BY s._id ASC limit 20;
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | s | index | status,current_period_end_date | PRIMARY | 4 | NULL | 40 | Using where |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
But when I run this query that simply changes AND s.period_end_action in ('R','C') to AND s.period_end_action = 'C', I see the expected rows examined as 611
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
AND s.status in ('A', 'T')
AND s.period_end_action = 'C'
ORDER BY s._id ASC limit 20;
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | s | index | status,current_period_end_date | PRIMARY | 4 | NULL | 611 | Using where |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
I have the following indexes on the subscription table:
_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
INDEX(status, period_end_action),
INDEX(current_period_end_date),
Any ideas? I don't understand why removing one of the period_end_action values would cause such a large increase in rows examined.
(I agree with others that EXPLAIN often has terrible row estimates.)
Actually the numbers might be reasonable (though I doubt it). The optimizer decided to do a table scan in both cases. And the query with fewer options for period_end_action probably has to scan farther to get the 20 rows. This is because it punted on using either of your secondary indexes.
These indexes are more likely to help your second query:
INDEX(period_end_action, _id)
INDEX(period_end_action, status)
INDEX(period_end_action, current_period_end_date)
The optimal index usually starts with any columns tested with =.
Since there is no such thing for your first query, the Optimizer probably decided to scan in _id order so that it could avoid the "sort" mandated by ORDER BY.
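To make the "has to scan farther" argument concrete, here is a rough sketch in Python. It assumes matching rows are spread evenly in _id order, so that a scan needs about LIMIT / selectivity rows; the selectivities below are hypothetical values picked only to mirror the two EXPLAIN estimates, and rows_scanned_until_limit is an illustrative helper, not anything MySQL exposes.

```python
import random

def rows_scanned_until_limit(match_prob, limit, seed=0):
    """Walk rows in primary-key order and count how many are read
    before `limit` rows pass the WHERE filter."""
    rng = random.Random(seed)
    scanned = matched = 0
    while matched < limit:
        scanned += 1
        if rng.random() < match_prob:  # this row satisfies the WHERE clause
            matched += 1
    return scanned

# Hypothetical selectivities: 20 matches per 40 rows vs. 20 matches per 611 rows.
for label, p in [("IN ('R','C')", 20 / 40), ("= 'C'", 20 / 611)]:
    simulated = rows_scanned_until_limit(p, 20)
    print(label, "expected about", round(20 / p), "rows, simulated", simulated)
```

The fewer rows satisfy the filter, the further the scan has to go before LIMIT 20 is satisfied, which is consistent with the jump from 40 to 611 in the estimates.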

How to make Mysql use index for selects with unary condition in 'where'

I have a query in Ruby on Rails application with a strange unary condition in where:
SELECT * FROM messages WHERE (active) ORDER BY id DESC;
I didn't even know that such conditions are allowed and can't find documentation describing this syntax anywhere. Experiments show that this is some kind of an equivalent to
SELECT * FROM messages WHERE active!=0 ORDER BY id DESC;
The problem is that MySQL uses an index for the second variant only:
mysql> explain SELECT * FROM messages WHERE (active) ORDER BY id DESC;
+----+-------------+----------+------+---------------+------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | messages | ALL | NULL | NULL | NULL | NULL | 560646 | Using where; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+--------+-----------------------------+
mysql> explain SELECT * FROM messages WHERE active!=0 ORDER BY id DESC;
+----+-------------+----------+-------+------------------+--------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+------------------+--------+---------+------+------+---------------------------------------+
| 1 | SIMPLE | messages | range | active_id,active | active | 2 | NULL | 1394 | Using index condition; Using filesort |
+----+-------------+----------+-------+------------------+--------+---------+------+------+---------------------------------------+
I can't change the query text since, as it was explained to me, the application generates queries on the fly and they are not stored anywhere. So my questions are:
Do I understand the meaning of this unary clause correctly?
Why don't such queries use indices?
Is it possible to make MySQL use an index for this query without changing the query text?
You are right, both clauses should have the same result (assuming an int type).
The server code doesn't appear to recognise the equivalence. Testing on MySQL 8.0 shows the same query plan. MariaDB 10.2 and 10.3 both appear to use an index in both cases.
No.
If active only takes the values 0 and 1, a SELECT * FROM messages WHERE active=1 ORDER BY id DESC query will be able to use the index for ordering (hence no filesort), provided id is the primary key.

mysql time for select not same as real lines

I got an unexpected result from my MySQL server:
the more rows match, the less time the query takes?
I have one table, and the total row count for each filter is:
select count(*) from tcr where eid=648;
+----------+
| count(*) |
+----------+
| 11336 |
select count(*) from tcr where eid=997;
+----------+
| count(*) |
+----------+
| 1262307 |
But the query times are the opposite of what the row counts for each filter suggest:
select * from tcr where eid=648 order by start_time desc limit 0,10;
[data display]
10 rows in set (16.92 sec)
select * from tcr where eid=997 order by start_time desc limit 0,10;
[data display]
10 rows in set (0.21 sec)
"reset query cache" was executed before every query.
The indexes on table tcr are:
KEY `cridx_eid` (`eid`),
KEY `cridx_start_time` (`start_time`)
BTW, here is the EXPLAIN output. It is very strange, but it looks more like the timings we get (eid=997 is estimated to examine fewer rows than eid=648):
explain select * from talk_call_record where eid=648 order by start_time desc limit 0,10;
+----+-------------+------------------+-------+---------------+------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------+------------------+---------+------+------+-------------+
| 1 | SIMPLE | talk_call_record | index | cridx_eid | cridx_start_time | 5 | NULL | 3672 | Using where |
explain select * from talk_call_record where eid=997 order by start_time desc limit 0,10;
+----+-------------+------------------+-------+---------------+------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------+-------+---------------+------------------+---------+------+------+-------------+
| 1 | SIMPLE | talk_call_record | index | cridx_eid | cridx_start_time | 5 | NULL | 32 | Using where |
First, you must have a very large table.
MySQL is using the index on start_time for the queries. What is happening is that it is "walking" through the table, one row at a time. It happens to find eid=997 much more quickly than it finds eid=648. It only has to find 10 records, so the engine stops when it gets to the 10th one.
What can you do? The optimal index for the query is a composite index on (eid, start_time). This will go directly to the values that you want.
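The difference between walking the start_time index and seeking in a composite (eid, start_time) index can be sketched in Python. Everything here is an illustrative assumption: the table is a made-up list in which eid 648 is rare and only occurs among the oldest rows, and the two helper functions merely emulate what the storage engine would do.

```python
import bisect

# Hypothetical table of (start_time, eid) rows: eid 648 is rare and only
# appears among the oldest rows, eid 997 fills everything else.
rows = [(t, 648 if t < 10_000 and t % 20 == 0 else 997) for t in range(100_000)]

def walk_start_time_index(rows, eid, limit):
    """Emulate ORDER BY start_time DESC via an index on start_time alone:
    read rows newest-first and filter on eid until `limit` rows match."""
    scanned, found = 0, []
    for start_time, row_eid in reversed(rows):
        scanned += 1
        if row_eid == eid:
            found.append(start_time)
            if len(found) == limit:
                break
    return found, scanned

# Emulate a composite index on (eid, start_time) as a sorted list of tuples.
composite = sorted((eid, t) for t, eid in rows)

def seek_composite_index(composite, eid, limit):
    """Jump to the end of the eid range and read `limit` entries backwards."""
    lo = bisect.bisect_left(composite, (eid,))
    hi = bisect.bisect_left(composite, (eid + 1,))
    picked = composite[max(lo, hi - limit):hi]
    return [t for _, t in reversed(picked)], len(picked)

for eid in (648, 997):
    _, scanned = walk_start_time_index(rows, eid, 10)
    _, touched = seek_composite_index(composite, eid, 10)
    print(eid, "single-column walk read", scanned, "rows; composite seek read", touched)
```

In this toy data the walk for the rare eid reads most of the table before it finds ten matches, while the composite lookup reads only the ten entries it returns, regardless of how the eid values are distributed over time.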

Should I use derived table in this situation?

I need to fetch 10 random rows from a table, the query below will not do it as it is going to be very slow on a large scale (I've read strong arguments against it):
SELECT `title` FROM table1 WHERE id1 = 10527 and id2 = 37821 ORDER BY RAND() LIMIT 10;
EXPLAIN:
select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------------+-------------+------+---------------+-------+---------+------+------+----------------+
SIMPLE | table1 | ref | id1,id2 | id2 | 5 | const| 7 | Using where; Using temporary; Using filesort
I tried the following workaround:
SELECT * FROM
(SELECT `title`, RAND() as n1
FROM table1
WHERE id1 = 10527 and id2 = 37821) TTA
ORDER BY n1 LIMIT 10;
EXPLAIN:
select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
------------+-------------+------+---------------+-------+---------+------+------+----------------+
PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | Using filesort |
DERIVED | table1 | ref | id1,id2 | id2 | 5 |const | 7 | Using where |
But I've also read a couple of statements against using derived tables.
Could you please tell me if the latter query is going to make any improvement?
You should try the first method to see if it works for you. If you have an index on table1(id1, id2) and there are not very many occurrences of any given value pair, then the performance is probably fine for what you want to do.
Your second query is going to have somewhat worse performance than the first. The issue with the performance of order by rand() is not the time taken to calculate random numbers. The issue is the order by, and your second query is basically doing the same thing, with the additional overhead of a derived table.
If you know that there are always at least, say, 1000 matching rows, then the following would generally work faster:
SELECT `title`
FROM table1
WHERE id1 = 10527 and id2 = 37821 and rand() < 0.05
ORDER BY RAND()
LIMIT 10;
This would take a random sample of about 5% of the data and with 1,000 matching rows, you would almost always have at least 10 rows to choose from.
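The "almost always" claim can be checked with a short calculation. Assuming each matching row passes the rand() < 0.05 filter independently, the number of surviving rows follows a binomial distribution; prob_at_least below is a hypothetical helper written for this check.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance that the rand() < p
    filter keeps at least k of n matching rows."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1 - below

# 1,000 matching rows with a 5% sample, as in the answer above.
print(prob_at_least(10, 1000, 0.05))
# For comparison: only 200 matching rows with the same 5% sample.
print(prob_at_least(10, 200, 0.05))
```

With 1,000 matching rows the expected sample size is 50, so falling below 10 is extremely unlikely; with far fewer matching rows the sampling fraction would need to be raised, or the query may return fewer than 10 rows.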

Improve SQL statement in mysql to run faster. I want to merge field but merge on different rows

Why does this code take a long time to run?
SELECT
concat((select Sname from member order by rand() limit 1),
Ssurname) as tee
FROM member
But this code is very fast to run:
SELECT
concat(Sname,
Ssurname) as tee
FROM member
For every result row returned by your first example, MySQL must produce another row from which to CONCAT() Sname, with a custom (random) order. Because order by rand() is used, the whole table must be reordered randomly for every row in your table. That is likely to be a very expensive operation, since the result of the subquery cannot be cached.
In the second example, a simple rowset is returned. Sname and Ssurname are concatenated from columns in the same row.
I ran an EXPLAIN on a similar query, having one indexed column concatenated against a non-indexed subquery. MySQL is using a temporary table to compute the subqueries.
mysql> EXPLAIN SELECT CONCAT(g_userName, (SELECT g_fullName FROM g2_User ORDER BY RAND() LIMIT 1)) FROM g2_User;
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
| 1 | PRIMARY | g2_User | index | NULL | g_userName | 98 | NULL | 5 | Using index |
| 2 | UNCACHEABLE SUBQUERY | g2_User | ALL | NULL | NULL | NULL | NULL | 5 | Using temporary; Using filesort |
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
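The cost difference can be illustrated with a rough Python model of the two queries. This is a sketch under assumptions: the member rows are made up, the shuffle stands in for the random sort, and the counter only approximates the work the UNCACHEABLE SUBQUERY repeats for every outer row.

```python
import random

def concat_with_random_sname(members, rng):
    """Emulate the slow query: for each outer row, shuffle the whole table
    (ORDER BY RAND()), take the first Sname (LIMIT 1), and concatenate."""
    rows_sorted = 0
    out = []
    for _, surname in members:
        shuffled = members[:]      # the subquery re-reads the whole table
        rng.shuffle(shuffled)      # stand-in for the random sort
        rows_sorted += len(shuffled)
        out.append(shuffled[0][0] + surname)
    return out, rows_sorted

def concat_same_row(members):
    """Emulate the fast query: one pass, both columns from the same row."""
    return [sname + surname for sname, surname in members]

members = [(f"S{i}", f"N{i}") for i in range(500)]  # hypothetical data
slow_result, work = concat_with_random_sname(members, random.Random(0))
fast_result = concat_same_row(members)
print("rows sorted by the subquery:", work)
print("rows read by the plain query:", len(members))
```

The work done by the first form grows with the square of the table size, while the second form grows linearly, which matches the behaviour described above.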