Should I use derived table in this situation? - mysql

I need to fetch 10 random rows from a table. The query below is not suitable because it will be very slow on a large scale (I've read strong arguments against it):
SELECT `title` FROM table1 WHERE id1 = 10527 and id2 = 37821 ORDER BY RAND() LIMIT 10;
EXPLAIN:
select_type | table  | type | possible_keys | key | key_len | ref   | rows | Extra
------------+--------+------+---------------+-----+---------+-------+------+---------------------------------------------
SIMPLE      | table1 | ref  | id1,id2       | id2 | 5       | const | 7    | Using where; Using temporary; Using filesort
I tried the following workaround:
SELECT * FROM
(SELECT `title`, RAND() as n1
FROM table1
WHERE id1 = 10527 and id2 = 37821) TTA
ORDER BY n1 LIMIT 10;
EXPLAIN:
select_type | table      | type | possible_keys | key  | key_len | ref   | rows | Extra
------------+------------+------+---------------+------+---------+-------+------+---------------
PRIMARY     | <derived2> | ALL  | NULL          | NULL | NULL    | NULL  | 7    | Using filesort
DERIVED     | table1     | ref  | id1,id2       | id2  | 5       | const | 7    | Using where
But I've also read a couple of statements against using derived tables.
Could you please tell me if the latter query is going to make any improvement?

You should try the first method to see if it works for you. If you have an index on table1(id1, id2) and there are not very many occurrences of any given value pair, then the performance is probably fine for what you want to do.
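The composite index mentioned above could be created roughly as follows. This is only a sketch: the index name idx_id1_id2 is a placeholder, and it assumes no equivalent two-column index already exists (the key_len of 5 in the question's EXPLAIN output suggests that the key named id2 covers a single column).

```sql
-- Hypothetical index name; adjust to your own naming scheme.
ALTER TABLE table1 ADD INDEX idx_id1_id2 (id1, id2);

-- Re-check the plan afterwards. If the optimizer picks the new index,
-- the "key" column should show it and "ref" should list two constants.
EXPLAIN
SELECT `title`
FROM table1
WHERE id1 = 10527 AND id2 = 37821
ORDER BY RAND()
LIMIT 10;
```

The sort for ORDER BY RAND() still happens, but only over the rows that match both conditions.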
Your second query is going to have somewhat worse performance than the first. The issue with the performance of order by rand() is not the time taken to calculate random numbers. The issue is the order by, and your second query is basically doing the same thing, with the additional overhead of a derived table.
If you know that there are always at least, say, 1000 matching rows, then the following would generally work faster:
SELECT `title`
FROM table1
WHERE id1 = 10527 and id2 = 37821 and rand() < 0.05
ORDER BY RAND()
LIMIT 10;
This takes a random sample of about 5% of the data before sorting. With 1,000 matching rows that is about 50 rows on average, so you would almost always have at least 10 rows to choose from.

Related

Why does an indexed mysql query filtered on less char values result in more rows examined?

When I run the following query, I see the expected rows examined as 40
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
AND s.status in ('A', 'T')
AND s.period_end_action in ('R','C')
ORDER BY s._id ASC limit 20;
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | s | index | status,current_period_end_date | PRIMARY | 4 | NULL | 40 | Using where |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
But when I run this query that simply changes AND s.period_end_action in ('R','C') to AND s.period_end_action = 'C', I see the expected rows examined as 611
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
AND s.status in ('A', 'T')
AND s.period_end_action = 'C'
ORDER BY s._id ASC limit 20;
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | s | index | status,current_period_end_date | PRIMARY | 4 | NULL | 611 | Using where |
+----+-------------+-------+-------+--------------------------------+---------+---------+------+------+-------------+
I have the following indexes on the subscription table:
_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
INDEX(status, period_end_action),
INDEX(current_period_end_date),
Any ideas? I don't understand why removing one of the period_end_action values would cause such a large increase in rows examined.
(I agree with others that EXPLAIN often has terrible row estimates.)
Actually the numbers might be reasonable (though I doubt it). In both cases the optimizer decided to scan the table in PRIMARY KEY order (type index, key PRIMARY) instead of using either of your secondary indexes. The query with fewer accepted values for period_end_action probably has to scan farther to find 20 matching rows.
These indexes are more likely to help your second query:
INDEX(period_end_action, _id)
INDEX(period_end_action, status)
INDEX(period_end_action, current_period_end_date)
The optimal index usually starts with any columns tested with =.
Since there is no such thing for your first query, the Optimizer probably decided to scan in _id order so that it could avoid the "sort" mandated by ORDER BY.
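To try the first of the suggested indexes, something like the following should work. It is a sketch under assumptions: the index name idx_action_id is made up, and whether the optimizer actually chooses the index depends on the data distribution.

```sql
-- Hypothetical index name. The column tested with = comes first,
-- followed by the ORDER BY column.
ALTER TABLE subscription ADD INDEX idx_action_id (period_end_action, _id);

-- Compare the "key" and "rows" columns with the earlier plan.
EXPLAIN SELECT s.* FROM subscription s
WHERE s.current_period_end_date <= NOW()
  AND s.status IN ('A', 'T')
  AND s.period_end_action = 'C'
ORDER BY s._id ASC LIMIT 20;
```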

How to make Mysql use index for selects with unary condition in 'where'

I have a query in Ruby on Rails application with a strange unary condition in where:
SELECT * FROM messages WHERE (active) ORDER BY id DESC;
I didn't even know that such conditions are allowed and can't find documentation describing this syntax anywhere. Experiments show that this is some kind of an equivalent to
SELECT * FROM messages WHERE active!=0 ORDER BY id DESC;
The problem is that MySQL uses an index for the second variant only:
mysql> explain SELECT * FROM messages WHERE (active) ORDER BY id DESC;
+----+-------------+----------+------+---------------+------+---------+------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+--------+-----------------------------+
| 1 | SIMPLE | messages | ALL | NULL | NULL | NULL | NULL | 560646 | Using where; Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+--------+-----------------------------+
mysql> explain SELECT * FROM messages WHERE active!=0 ORDER BY id DESC;
+----+-------------+----------+-------+------------------+--------+---------+------+------+---------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+------------------+--------+---------+------+------+---------------------------------------+
| 1 | SIMPLE | messages | range | active_id,active | active | 2 | NULL | 1394 | Using index condition; Using filesort |
+----+-------------+----------+-------+------------------+--------+---------+------+------+---------------------------------------+
I can't change the query text since, as it was explained to me, the application generates queries on the fly and they are not stored anywhere. So my questions are:
1. Do I understand the meaning of this unary clause correctly?
2. Why don't such queries use indices?
3. Is it possible to make MySQL use an index on this one without changing the query text?
1. You are right, both clauses should give the same result (assuming an int type).
2. The server code doesn't appear to recognise the equivalence. Testing on MySQL 8.0 shows the same query plan. MariaDB 10.2 and 10.3 both appear to use an index in both cases.
3. No.
If the values in active are only 0 or 1, a SELECT * FROM messages WHERE active=1 ORDER BY id DESC query will be able to use the index for ordering (hence no filesort), provided id is the primary key.
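For comparison, the plan of the equality form can be checked as below. This assumes active only ever holds 0 or 1 and that the query text could be changed, which the question says is not currently possible, so treat it as a diagnostic rather than a fix.

```sql
-- Assumption: active is 0 or 1, so active = 1 selects the same rows
-- as WHERE (active) or WHERE active != 0.
EXPLAIN SELECT * FROM messages WHERE active = 1 ORDER BY id DESC;
```

If the index is used for both filtering and ordering, the Extra column should no longer show Using filesort.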

mysql query optimization strategy on single table covering index

I am currently working on MySQL query optimization. My table contains 200 million records.
After a lot of searching, I finally decided to use a covering index.
So I created an index with the columns in this order:
alter table table_name add index
index_name(MODEL, COUNTRY, REGION, NETWORK, E_TUAL,
ECHS, DEVID, COUNTRY_CODE, SOURCE);
When I run this query, performance is good compared to before:
SELECT E_TUAL,ECHS, DEVID, MODEL, COUNTRY, REGION, COUNTRY_CODE, NETWORK, SOURCE
FROM table_name
WHERE model = 'fox | s453' AND country = 'india' AND
E_TUAL <= '1435755600' AND
E_TUAL >= '1433163600'
ORDER BY E_TUAL DESC
LIMIT 101 OFFSET 0;
+----+-------------+-----------+------+---------------+--------+---------+-------------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+--------+---------+-------------+------+-----------------------------+
| 1 | SIMPLE | mytable | ref | genrl | genrl | 131 | const | 239 | Using where; Using filesort |
+----+-------------+-----------+------+---------------+--------+---------+-------------+------+-----------------------------+
But this query is worse than it was before adding the index:
SELECT E_TUAL,ECHS, DEVID, MODEL,COUNTRY, REGION, COUNTRY_CODE, NETWORK, SOURCE
FROM table_name
WHERE model = 'fox | s453' AND
country is not null AND
E_TUAL <= '1435755600' AND
E_TUAL >= '1433163600'
ORDER BY E_TUAL DESC
LIMIT 101 OFFSET 0;
+----+-------------+-----------+-------+---------------+--------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+-------+---------------+--------+---------+------+------+-----------------------------+
| 1 | SIMPLE | mytable | range | genrl | genrl | 131 | NULL | 1105 | Using where; Using filesort |
+----+-------------+-----------+-------+---------------+--------+---------+------+------+-----------------------------+
I added country IS NOT NULL to keep the column order so that the MySQL optimizer would use the index.
Please help: how can I improve efficiency, and still make use of the index, when the condition on one of the columns is absent?
I am not good at English, so I apologize for any mistakes.
Dispense with the covering index and use two separate indexes (they need distinct names; the names here are placeholders):
alter table table_name add index idx_model_country_etual(MODEL, COUNTRY, E_TUAL);
for the first query. And:
alter table table_name add index idx_model_etual(MODEL, E_TUAL);
for the second.
A covering index might provide marginal improvement, but it will use a lot of space. Instead, focus on the WHERE clauses and then the ORDER BY if you can use those for the index.
As a note: your ordering of the columns was not optimal for either query.
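With an index on (MODEL, E_TUAL) in place, the second query may not need the country IS NOT NULL condition at all. The sketch below assumes that condition was added only to keep the old index usable, as the question describes. Note that dropping it also returns rows whose country is NULL, so keep it if those rows must be excluded.

```sql
-- Assumes an index on (MODEL, E_TUAL) exists.
-- BETWEEN is inclusive, matching the original >= and <= bounds.
SELECT E_TUAL, ECHS, DEVID, MODEL, COUNTRY, REGION, COUNTRY_CODE, NETWORK, SOURCE
FROM table_name
WHERE model = 'fox | s453'
  AND E_TUAL BETWEEN '1433163600' AND '1435755600'
ORDER BY E_TUAL DESC
LIMIT 101 OFFSET 0;
```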

Why does this MySQL query not use the index properly?

Sorry that this is such a specific and probably cliche question, but it is really causing me major problems.
Every day I have to run several hundred thousand SELECT statements that look like these two (this is one example, but they're all pretty much the same, just with a different word1):
SELECT pibn,COUNT(*) AS aaa FROM research_storage1
USE INDEX (word21pibn)
WHERE word1=270299 AND word2=0
GROUP BY pibn
ORDER BY aaa DESC
LIMIT 1000;
SELECT pibn,page FROM research_storage1
USE INDEX (word12num)
WHERE word1=270299 AND word2=0
ORDER BY num DESC
LIMIT 1000;
The first statement is quick-as-a-flash and takes a fraction of a second. The second statement takes about 2 seconds, which is way too long considering I have hundreds of thousands to do.
The indexes are:
word21pibn: word2, word1, pibn
word12num: word1, word2, num
The results of EXPLAIN (for both EXTENDED and PARTITIONS) are:
mysql> explain extended SELECT pibn,COUNT(*) AS aaa FROM research_storage1 USE INDEX (word21pibn) WHERE word1=270299 AND word2=0 GROUP BY pibn ORDER BY aaa DESC LIMIT 1000;
+----+-------------+-------------------+------+---------------+------------+---------+-------------+------+----------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+------+---------------+------------+---------+-------------+------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | research_storage1 | ref | word21pibn | word21pibn | 6 | const,const | 1549 | 100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+------+---------------+------------+---------+-------------+------+----------+-----------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain partitions SELECT pibn,COUNT(*) AS aaa FROM research_storage1 USE INDEX (word21pibn) WHERE word1=270299 AND word2=0 GROUP BY pibn ORDER BY aaa DESC LIMIT 1000;
+----+-------------+-------------------+------------+------+---------------+------------+---------+-------------+------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+------------+------+---------------+------------+---------+-------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | research_storage1 | p99 | ref | word21pibn | word21pibn | 6 | const,const | 1549 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------------------+------------+------+---------------+------------+---------+-------------+------+-----------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain extended SELECT pibn,page FROM research_storage1 USE INDEX (word12num) WHERE word1=270299 AND word2=0 ORDER BY num DESC LIMIT 1000;
+----+-------------+-------------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | research_storage1 | ref | word12num | word12num | 6 | const,const | 818 | 100.00 | Using where |
+----+-------------+-------------------+------+---------------+-----------+---------+-------------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain partitions SELECT pibn,page FROM research_storage1 USE INDEX (word12num) WHERE word1=270299 AND word2=0 ORDER BY num DESC LIMIT 1000;
+----+-------------+-------------------+------------+------+---------------+-----------+---------+-------------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+------------+------+---------------+-----------+---------+-------------+------+-------------+
| 1 | SIMPLE | research_storage1 | p99 | ref | word12num | word12num | 6 | const,const | 818 | Using where |
+----+-------------+-------------------+------------+------+---------------+-----------+---------+-------------+------+-------------+
1 row in set (0.00 sec)
The only difference I see is that the second statement does not have Using index in the extra column of describe. Though this does not make sense because the index was designed for that statement, so I don't see why it would not be used.
Any idea?
Try adding the pibn and page columns to the word12num compound index. Then all the information you need for your query will be in the index, like it is in your first query.
Edit: I missed the pibn column you're selecting; sorry about that.
If your compound index turns out to contain (word1, word2, num, pibn, page) then everything in your second query can come from the index.
If you look at the Extra column under your first query's EXPLAIN, one of the notes in there is Using index (@sebas pointed this out). This actually means "using the index only": the server can satisfy your query by just consulting the index without having to consult the table. That's why it is so fast: the server doesn't have to bang the disk heads around random-accessing the table to get the extra columns. Using index is not present in your second query's EXPLAIN.
The columns mentioned in WHERE come first. Then we have the columns in ORDER BY. Finally we have the columns you're simply SELECTing. Why use this particular order for columns in the index? The server finds its way to the first index entry matching the SELECT, then can read the index sequentially to satisfy the query.
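The column ordering described above could be sketched like this. The index name word12num_cover is hypothetical, and building the index on a large partitioned table may take a while.

```sql
-- Hypothetical index name. Column order: WHERE columns (word1, word2),
-- then the ORDER BY column (num), then the remaining selected columns.
ALTER TABLE research_storage1
  ADD INDEX word12num_cover (word1, word2, num, pibn, page);

-- Check whether Extra now includes "Using index".
EXPLAIN SELECT pibn, page FROM research_storage1
USE INDEX (word12num_cover)
WHERE word1 = 270299 AND word2 = 0
ORDER BY num DESC
LIMIT 1000;
```

Once the new index is confirmed to work, the original word12num index is redundant because its columns are a prefix of the new one.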
It is indeed expensive to construct and maintain a compound index on a big table. You are looking at a basic tradeoff in DBMS design: do you want to spend time constructing the table or looking things up in it? Only you know whether it's better to incur the cost when building the table or when looking things up in it.

Improve SQL statement in mysql to run faster. I want to merge field but merge on different rows

Why does this code take a long time to run?
SELECT
concat((select Sname from member order by rand() limit 1),
Ssurname) as tee
FROM member
But this code runs very fast:
SELECT
concat(Sname,
Ssurname) as tee
FROM member
For every result row returned by your first example, MySQL must produce another row from which to CONCAT() Sname, with a custom (random) order. Because order by rand() is used, the whole table must be reordered randomly for every row in your table. That is likely to be a very expensive operation, since the result of the subquery cannot be cached.
In the second example, a simple rowset is returned. Sname and Ssurname are concatenated from columns in the same row.
I ran an EXPLAIN on a similar query, having one indexed column concatenated against a non-indexed subquery. MySQL is using a temporary table to compute the subqueries.
mysql> EXPLAIN SELECT CONCAT(g_userName, (SELECT g_fullName FROM g2_User ORDER BY RAND() LIMIT 1)) FROM g2_User;
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
| 1 | PRIMARY | g2_User | index | NULL | g_userName | 98 | NULL | 5 | Using index |
| 2 | UNCACHEABLE SUBQUERY | g2_User | ALL | NULL | NULL | NULL | NULL | 5 | Using temporary; Using filesort |
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
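If a single random Sname shared by all rows is acceptable, the random pick can be done once in a derived table instead of once per row. This is only a sketch, and it changes the result: every output row gets the same random Sname, which is not what the original query produces.

```sql
-- The derived table r is computed once and yields one random Sname,
-- which is then combined with every row of member.
SELECT CONCAT(r.Sname, m.Ssurname) AS tee
FROM member AS m
CROSS JOIN (SELECT Sname FROM member ORDER BY RAND() LIMIT 1) AS r;
```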