MySQL optimisation needs explanation

I needed to get values from the "latest" (i.e. highest record id) record for each value of a field (server_name in this case).
I had already added a server_name_id index on server_name and id.
My first attempt took minutes to run.
SELECT server_name, state
FROM replication_client as a
WHERE id = (
SELECT MAX(id)
FROM replication_client
WHERE server_name = a.server_name)
ORDER BY server_name
My second attempt took 0.001s to run.
SELECT rep.server_name, state FROM (
SELECT server_name, MAX(id) AS max_id
FROM replication_client
GROUP BY server_name) AS newest,
replication_client AS rep
WHERE rep.id = newest.max_id
ORDER BY server_name
What is the principle behind this optimisation? (I'd like to be able to write optimised queries without trial and error.)
P.S. EXPLAIN output for both queries:
mysql> EXPLAIN
->
-> SELECT server_name, state
-> FROM replication_client as a
-> WHERE id = (SELECT MAX(id) FROM replication_client WHERE server_name = a.server_name)
-> ORDER BY server_name
-> ;
+----+--------------------+--------------------+------+----------------+----------------+---------+-------------------+--------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------------+------+----------------+----------------+---------+-------------------+--------+-----------------------------+
| 1 | PRIMARY | a | ALL | NULL | NULL | NULL | NULL | 630711 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | replication_client | ref | server_name_id | server_name_id | 18 | mrg.a.server_name | 45050 | Using index |
+----+--------------------+--------------------+------+----------------+----------------+---------+-------------------+--------+-----------------------------+
mysql> explain
-> SELECT rep.server_name, state FROM (
-> SELECT server_name, MAX(id) AS max_id
-> FROM replication_client
-> GROUP BY server_name) AS newest,
-> replication_client AS rep
-> WHERE rep.id = newest.max_id
-> ORDER BY server_name
-> ;
+----+-------------+--------------------+--------+---------------+----------------+---------+---------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+--------+---------------+----------------+---------+---------------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | Using temporary; Using filesort |
| 1 | PRIMARY | rep | eq_ref | PRIMARY | PRIMARY | 4 | newest.max_id | 1 | |
| 2 | DERIVED | replication_client | range | NULL | server_name_id | 18 | NULL | 15 | Using index for group-by |
+----+-------------+--------------------+--------+---------------+----------------+---------+---------------+------+---------------------------------+

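Well, the whole thing is fairly self-explanatory once you look at two words in your first EXPLAIN plan: DEPENDENT SUBQUERY.
This means that the subquery is executed again for every row your WHERE condition examines: here about 630,711 rows of a, each triggering an index lookup estimated at about 45,050 rows. Of course this can be slow as hell. In the second plan, the derived table is built once with "Using index for group-by" (about 15 index entries read), and each resulting max_id is then fetched through the PRIMARY key (eq_ref).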
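Also note that a query is logically processed in a fixed order:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
(This is the logical order; the optimizer may execute things differently as long as the result is the same.) In your second query the filtering happens in the FROM clause: the derived table newest reduces replication_client to one max id per server_name before the join, so the outer query only has to fetch those few rows by primary key. When you can filter in the FROM clause like this, it is usually better than filtering row by row in the WHERE clause.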
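A minimal sketch to check that the two queries return the same rows, using Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption worth stating loudly: SQLite's planner is different, so the sketch shows that the results are equivalent, not MySQL's timings or plans. The sample rows are made up, and the comma join from the question is written as an explicit JOIN ... ON, which is equivalent.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table (assumption: only the columns
# the queries use; the rows below are made up for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE replication_client (
    id INTEGER PRIMARY KEY,
    server_name TEXT NOT NULL,
    state TEXT NOT NULL
);
CREATE INDEX server_name_id ON replication_client (server_name, id);
INSERT INTO replication_client (id, server_name, state) VALUES
    (1, 'db1', 'ok'), (2, 'db2', 'ok'), (3, 'db1', 'lagging'),
    (4, 'db2', 'stopped'), (5, 'db1', 'ok');
""")

# First attempt: a correlated (dependent) subquery, evaluated per outer row.
CORRELATED = """
SELECT server_name, state
FROM replication_client AS a
WHERE id = (SELECT MAX(id) FROM replication_client
            WHERE server_name = a.server_name)
ORDER BY server_name
"""

# Second attempt: compute MAX(id) per server once in a derived table,
# then join back to the table on the primary key.
DERIVED = """
SELECT rep.server_name, rep.state
FROM (SELECT server_name, MAX(id) AS max_id
      FROM replication_client
      GROUP BY server_name) AS newest
JOIN replication_client AS rep ON rep.id = newest.max_id
ORDER BY rep.server_name
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
derived_rows = conn.execute(DERIVED).fetchall()
print(derived_rows)  # [('db1', 'ok'), ('db2', 'stopped')]
```

The derived-table form only needs one pass over the (server_name, id) index to find each group's maximum, which is why MySQL can use "Using index for group-by" for it.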

Related

Index not used in subquery WHERE IN clause

MySQL version is 5.5.40-0+wheezy1-log.
I have this query:
SELECT cycle_id, sum(fst_field) + sum(snd_field) AS tot_sum
FROM mytable WHERE parent_id IN (
SELECT id FROM mytable WHERE cycle_id = 2662
)
I have these indexes:
parent_id
parent_id, cycle_id, fst_field, snd_field
If I execute the command
EXPLAIN EXTENDED SELECT cycle_id, sum(fst_field) + sum(snd_field) AS tot_sum
FROM mytable WHERE parent_id IN (
SELECT id FROM mytable WHERE cycle_id = 2662
)
This is the result:
+----+--------------------+-----------+-----------------+----------------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-----------+-----------------+----------------------+---------+---------+------+--------+----------+-------------+
| 1 | PRIMARY | mytable | ALL | NULL | NULL | NULL | NULL | 185971 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | mytable | unique_subquery | PRIMARY,cycle_id_idx | PRIMARY | 4 | func | 1 | 100.00 | Using where |
+----+--------------------+-----------+-----------------+----------------------+---------+---------+------+--------+----------+-------------+
It does not use any index for the outer table. I tried adding other composite indexes (several of them), without success.
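I don't remember whether 5.5 still had very crude handling of IN ( SELECT ... ). If so, that would probably explain the problem; the EXPLAIN is consistent with it, since the IN subquery is run as a DEPENDENT SUBQUERY, once for every row of the full scan of the outer table.
Consider upgrading to 5.6, 5.7 or 8.0 (5.6 added semijoin optimizations for IN ( SELECT ... )).
Convert the query to use a JOIN.
INDEX(cycle_id) is needed.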
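A sketch of the suggested JOIN rewrite, again with sqlite3 standing in for MySQL and made-up rows; the index names are hypothetical. Because id is the primary key, each child row joins to at most one parent, so the JOIN returns the same total as IN ( SELECT ... ). cycle_id is left out of the SELECT list here because, without GROUP BY, its value would come from an arbitrary row.

```python
import sqlite3

# SQLite stand-in for mytable (assumption: hypothetical index names, made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER,
    cycle_id INTEGER NOT NULL,
    fst_field INTEGER NOT NULL,
    snd_field INTEGER NOT NULL
);
CREATE INDEX parent_id_idx ON mytable (parent_id);
CREATE INDEX cycle_id_idx ON mytable (cycle_id);
INSERT INTO mytable VALUES
    (1, NULL, 2662, 0, 0),
    (2, NULL, 2662, 0, 0),
    (3, NULL, 1000, 0, 0),
    (4, 1, 2663, 10, 1),
    (5, 1, 2663, 20, 2),
    (6, 2, 2663, 30, 3),
    (7, 3, 1001, 99, 9);
""")

# Original form: IN ( SELECT ... ), which the question's EXPLAIN shows
# MySQL 5.5 running as a dependent subquery.
IN_SQL = """
SELECT SUM(fst_field) + SUM(snd_field) AS tot_sum
FROM mytable
WHERE parent_id IN (SELECT id FROM mytable WHERE cycle_id = ?)
"""

# JOIN form: find the parents via cycle_id, then their children via
# parent_id. parent.id is unique, so no child row is counted twice.
JOIN_SQL = """
SELECT SUM(child.fst_field) + SUM(child.snd_field) AS tot_sum
FROM mytable AS parent
JOIN mytable AS child ON child.parent_id = parent.id
WHERE parent.cycle_id = ?
"""

in_total = conn.execute(IN_SQL, (2662,)).fetchone()[0]
join_total = conn.execute(JOIN_SQL, (2662,)).fetchone()[0]
print(in_total, join_total)  # 66 66
```

If the inner column were not unique, the JOIN could count a child once per matching parent; IN would not, so the equivalence relies on id being the primary key.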

About MySQL's Leftmost Prefix Matching Optimization

I now have a table like this:
> DESC userInfo;
+--------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------+---------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | char(32) | NO | MUL | NULL | |
| age | tinyint(3) unsigned | NO | | NULL | |
| gender | tinyint(1) | NO | | 1 | |
+--------+---------------------+------+-----+---------+----------------+
I made (name, age) a composite (joint) unique index:
> SHOW INDEX FROM userInfo;
+----------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+--------------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+--------------------+
| userInfo | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | | |
| userInfo | 0 | joint_unique_index | 1 | name | A | 0 | NULL | NULL | | BTREE | | joint unique index |
| userInfo | 0 | joint_unique_index | 2 | age | A | 0 | NULL | NULL | | BTREE | | joint unique index |
+----------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+--------------------+
3 rows in set (0.00 sec)
Now, when I use the following query, its type is ALL:
> DESC SELECT * FROM userInfo WHERE age = 18;
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | userInfo | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+----------+------------+------+---------------+------+---------+------+------+----------+-------------+
I can understand this behavior, because according to the leftmost prefix matching feature, age will not be used as an index column when querying.
But when I use the following query, its type is index:
> DESC SELECT name, age FROM userInfo WHERE age = 18;
+----+-------------+----------+------------+-------+---------------+--------------------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+--------------------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | userInfo | NULL | index | NULL | joint_unique_index | 132 | NULL | 1 | 100.00 | Using where; Using index |
+----+-------------+----------+------------+-------+---------------+--------------------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
I can't understand how this result is produced. According to the first example, using age as the query condition does not satisfy the leftmost prefix rule, yet the type is actually index! Is this a MySQL optimization?
When I make sure the query condition uses the leftmost indexed column (name), the type is always ref, as shown below:
> DESC SELECT * FROM userInfo WHERE name = "Jack";
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | userInfo | NULL | ref | joint_unique_index | joint_unique_index | 128 | const | 1 | 100.00 | NULL |
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
> DESC SELECT name, age FROM userInfo WHERE name = "Jack";
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------------+
| 1 | SIMPLE | userInfo | NULL | ref | joint_unique_index | joint_unique_index | 128 | const | 1 | 100.00 | Using index |
+----+-------------+----------+------------+------+--------------------+--------------------+---------+-------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
Please tell me why, when I filter on age, the first result is ALL but the second is index. Is this the result of a MySQL optimization?
In other words, with SELECT * the index is not used, but with SELECT joint_col1, joint_col2 FROM userInfo it is (the type is index). Why does this difference occur?
Simplifying a bit, an index (name, age) is basically the same as if you had another table (name, age, id) with a copy of those values. The primary key is (for InnoDB) included for technical reasons - MySQL uses it to find the full row in the original table.
So you can basically think of it as if you have 2 tables: (id, name, age, gender) and (name, age, id), both with the same amount of rows. And both have the ability to jump to/skip specific rows if you provide the leftmost columns.
If you do
SELECT * FROM userInfo WHERE age = 18;
MySQL has to read, as you expected, every row of the table, as there is no way to find rows with age = 18 faster - just as you concluded, there is no index with age as the leftmost column.
If you do
SELECT name, age FROM userInfo WHERE age = 18;
the situation doesn't change a lot: MySQL will also have to read every row, and still cannot use the index on (name, age) to limit the number of rows it has to read.
But MySQL can use a trick: since you only need the columns name and age, it can read all rows from the index-"table" and still have all information it needs, as the index is a covering index (it covers all required columns).
Why would MySQL do that? Because it has to read less data in total than when reading the complete table: the index stores the information you want in fewer bytes (as it doesn't include gender). Reading less data to get all the information you need is better/faster than reading more data to get the same information, so MySQL does just that.
But to emphasize it: your query still has to read all rows; it is still basically a full table scan ("ALL"), just on a "table" (the index) with fewer columns, to save some bytes. While you won't notice a difference with one tinyint column, if your table has many or large columns, the speedup can be significant.
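The "index as a second table" picture can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how InnoDB is implemented: the index is a sorted list of (name, age, id) tuples, a lookup on the leading column is a binary search, and a filter on age alone must read every entry, although those entries already contain everything SELECT name, age needs. The rows are made up.

```python
from bisect import bisect_left

# Made-up rows standing in for the userInfo table, keyed by primary key id.
TABLE = {
    1: {"id": 1, "name": "Jack", "age": 18, "gender": 1},
    2: {"id": 2, "name": "Anna", "age": 30, "gender": 0},
    3: {"id": 3, "name": "Zoe", "age": 18, "gender": 0},
    4: {"id": 4, "name": "Jack", "age": 25, "gender": 1},
}


def build_index(cols):
    """Toy secondary index: sorted tuples of the index columns plus the id."""
    return sorted(tuple(row[c] for c in cols) + (row["id"],) for row in TABLE.values())


def seek_leftmost(index, value):
    """Equality on the leftmost column: binary-search to the first match,
    then read forward while that column still matches (EXPLAIN type 'ref')."""
    i = bisect_left(index, (value,))
    hits = []
    while i < len(index) and index[i][0] == value:
        hits.append(index[i])
        i += 1
    return hits


def scan_on_second(index, value):
    """Equality on the second column only: the sort order does not help,
    so every entry is read (EXPLAIN type 'index'). Returns the hits and
    the number of entries examined."""
    hits = [entry for entry in index if entry[1] == value]
    return hits, len(index)


name_age = build_index(("name", "age"))

# WHERE name = 'Jack': a seek that touches only the matching entries.
jack = seek_leftmost(name_age, "Jack")

# SELECT name, age ... WHERE age = 18: all entries are read, but each already
# holds name and age, so the table itself is never visited (covering index).
hits, examined = scan_on_second(name_age, 18)
covering_result = [(name, age) for name, age, _id in hits]

# SELECT * ... WHERE age = 18 also needs gender, which the index lacks, so
# (as in the question's EXPLAIN) the whole table is scanned instead.
full_rows = [row for row in TABLE.values() if row["age"] == 18]

print(jack, examined, covering_result)
```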
The "leftmost" rule applies to the WHERE clause versus the INDEX.
INDEX(name, age) is useful for WHERE name = '...' or WHERE name = '...' AND ((anything else)) because name is leftmost in the index.
What you have is WHERE age = ... ((and nothing else)), so you need INDEX(age) (or INDEX(age, ...)).
In particular, SELECT name, age FROM userInfo WHERE age = 18;:
INDEX(age) -- good
INDEX(age, name) -- better because it is "covering".
The order of columns in the WHERE does not matter; the order in the INDEX does matter.
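A sketch of the effect of INDEX(age, name), using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN. Assumption: the wording of the plan output differs from MySQL's, but the idea of a covering index is the same. The index name idx_age_name is hypothetical.

```python
import sqlite3

# SQLite stand-in for userInfo; an empty table is enough to see the plans.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userInfo (
    id INTEGER PRIMARY KEY,
    name CHAR(32) NOT NULL,
    age INTEGER NOT NULL,
    gender INTEGER NOT NULL DEFAULT 1
);
CREATE UNIQUE INDEX joint_unique_index ON userInfo (name, age);
-- Hypothetical extra index with age as the leftmost column.
CREATE INDEX idx_age_name ON userInfo (age, name);
""")


def plan(sql, *params):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


covering_plan = plan("SELECT name, age FROM userInfo WHERE age = ?", 18)
star_plan = plan("SELECT * FROM userInfo WHERE age = ?", 18)
print(covering_plan)
print(star_plan)
```

With the age-first index, the name/age query becomes a search on a covering index, while SELECT * still uses the index to find the rows but has to visit the table for gender.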

Optimize MySQL Sub-Query

Is there a way this query can be optimized? It looks redundant:
SELECT
SUM((SELECT
IFNULL(SUM(trx.totalAmount), 0)
FROM trx
WHERE
FIND_IN_SET (trx.clientOrderId, "B6A8DB9568,6E7705B487,59C4D4234D,1D9CD4EF96,4C373E8CDE,E818BEE48F,6610555669,ECF388E288,32FD93075C,B03417425B,18FD77061A,1C39E4BD04,C92B970E55,0920F06DFA,EEFB4AAADA,FC2D9FF9AD") > 0
AND trx.txnType IN ('REFUND', 'VOID')
)) as refunds,
SUM((SELECT
IFNULL(SUM(trx.totalAmount), 0)
FROM trx
WHERE
FIND_IN_SET (trx.clientOrderId, "B6A8DB9568,6E7705B487,59C4D4234D,1D9CD4EF96,4C373E8CDE,E818BEE48F,6610555669,ECF388E288,32FD93075C,B03417425B,18FD77061A,1C39E4BD04,C92B970E55,0920F06DFA,EEFB4AAADA,FC2D9FF9AD") > 0
AND trx.txnType = 'SALE'
AND trx.billingCycleNumber != 1
)) AS lifetimeRevenue
Please note that this is just part of the query; there are about 10 more of these in the original query, so I really need to know whether it can be optimized.
Thanks, guys.
The problem with using subqueries like that is that each subquery has to scan the full table. Also, using FIND_IN_SET() the way you are using it forces a full table scan even if you have indexes, because the column is passed into a function. So you are doing 12 full table scans.
Here's a solution that does not use subqueries at all. It scans the table once for the matching clientOrderId values, to get a superset of all the rows that match any of the txnType values you need.
Then each sum of totalAmount is conditional: if the row's txnType is one of the wanted types, the row contributes its totalAmount, otherwise it contributes zero. Zero adds nothing to the sum, so it's as if you had skipped the rows with a non-matching txnType.
SELECT
SUM(IF(trx.txnType IN ('REFUND', 'VOID'), trx.totalAmount, 0)) AS refunds,
SUM(IF(trx.txnType = 'SALE' AND trx.billingCycleNumber != 1, trx.totalAmount, 0)) AS lifetimeRevenue
FROM trx
WHERE trx.clientOrderId IN (
'B6A8DB9568', '6E7705B487', '59C4D4234D', '1D9CD4EF96',
'4C373E8CDE', 'E818BEE48F', '6610555669', 'ECF388E288',
'32FD93075C', 'B03417425B', '18FD77061A', '1C39E4BD04',
'C92B970E55', '0920F06DFA', 'EEFB4AAADA', 'FC2D9FF9AD')
AND trx.txnType IN ('REFUND', 'VOID', 'SALE');
You should have an index on (clientOrderId) for this query. Since you have two IN() predicates, the WHERE clause will only use the index for the first column in the index anyway.
Don't use a FIND_IN_SET() expression, because it won't use an index for the WHERE clause.
You said there are 10 more terms in the query. So I anticipate that there are some different types of expressions in those terms. I'm not going to answer any "but what if the next terms look like something different...". I have shown you the method of unraveling the subquery into one single-pass query. Applying it to other terms in your query is up to you.
Here's a demo I tested:
create table trx (
clientOrderId char(10),
txnType enum('REFUND','VOID','SALE'),
totalAmount numeric(9,2),
billingCycleNumber int default 0,
key (clientOrderId)
);
+---------------+---------+-------------+--------------------+
| clientOrderId | txnType | totalAmount | billingCycleNumber |
+---------------+---------+-------------+--------------------+
| B6A8DB9568 | REFUND | 42.00 | 0 |
| 59C4D4234D | SALE | 84.00 | 0 |
+---------------+---------+-------------+--------------------+
Here's the EXPLAIN for your query:
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | SUBQUERY | trx | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where |
| 2 | SUBQUERY | trx | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
Notice one subquery for each term, and each one does "type=ALL" as its table access.
Here's the EXPLAIN for my query:
+----+-------------+-------+------------+-------+---------------+---------------+---------+------+------+----------+------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------------+---------+------+------+----------+------------------------------------+
| 1 | SIMPLE | trx | NULL | range | clientOrderId | clientOrderId | 11 | NULL | 16 | 50.00 | Using index condition; Using where |
+----+-------------+-------+------------+-------+---------------+---------------+---------+------+------+----------+------------------------------------+
One simple table access, using an index.
The result from both your query and my query given the example data I tried:
+---------+-----------------+
| refunds | lifetimeRevenue |
+---------+-----------------+
| 42.00 | 84.00 |
+---------+-----------------+
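The same comparison can be reproduced outside MySQL with a short sqlite3 script on the two demo rows above. Assumptions: SQLite has no FIND_IN_SET(), so the per-term version also uses the IN list; MySQL's IF(cond, a, b) is written as the portable CASE WHEN cond THEN a ELSE b END; totalAmount is declared REAL; and the redundant outer SUM() around each subquery is dropped.

```python
import sqlite3

ORDER_IDS = (
    "B6A8DB9568", "6E7705B487", "59C4D4234D", "1D9CD4EF96",
    "4C373E8CDE", "E818BEE48F", "6610555669", "ECF388E288",
    "32FD93075C", "B03417425B", "18FD77061A", "1C39E4BD04",
    "C92B970E55", "0920F06DFA", "EEFB4AAADA", "FC2D9FF9AD",
)
MARKS = ", ".join("?" * len(ORDER_IDS))  # "?, ?, ..." one per order id

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trx (
    clientOrderId TEXT,
    txnType TEXT,
    totalAmount REAL,
    billingCycleNumber INTEGER DEFAULT 0
);
CREATE INDEX idx_clientOrderId ON trx (clientOrderId);
INSERT INTO trx VALUES
    ('B6A8DB9568', 'REFUND', 42.00, 0),
    ('59C4D4234D', 'SALE', 84.00, 0);
""")

# One scalar subquery per term: each term filters the table separately.
PER_TERM = f"""
SELECT
  (SELECT IFNULL(SUM(totalAmount), 0) FROM trx
   WHERE clientOrderId IN ({MARKS}) AND txnType IN ('REFUND', 'VOID')) AS refunds,
  (SELECT IFNULL(SUM(totalAmount), 0) FROM trx
   WHERE clientOrderId IN ({MARKS}) AND txnType = 'SALE'
     AND billingCycleNumber != 1) AS lifetimeRevenue
"""

# Single pass: fetch the superset of rows once, then let each conditional
# SUM pick out its own rows (non-matching rows contribute 0).
SINGLE_PASS = f"""
SELECT
  SUM(CASE WHEN txnType IN ('REFUND', 'VOID') THEN totalAmount ELSE 0 END) AS refunds,
  SUM(CASE WHEN txnType = 'SALE' AND billingCycleNumber != 1
           THEN totalAmount ELSE 0 END) AS lifetimeRevenue
FROM trx
WHERE clientOrderId IN ({MARKS})
  AND txnType IN ('REFUND', 'VOID', 'SALE')
"""

per_term = conn.execute(PER_TERM, ORDER_IDS * 2).fetchone()
single_pass = conn.execute(SINGLE_PASS, ORDER_IDS).fetchone()
print(per_term, single_pass)  # (42.0, 84.0) (42.0, 84.0)
```

One difference to keep in mind: SUM() over zero matching rows returns NULL, whereas the per-term version wraps each sum in IFNULL(..., 0); wrap the conditional sums in COALESCE() if callers expect 0.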

MySQL Indexing - In vs. Equals indexing issues

The following queries run instantly on the MySQL server:
SELECT table_name.id
FROM table_name
WHERE table_name.id in (10000)
SELECT table_name.id
from table_name
where table_name.id = (SELECT table_name.id
FROM table_name
WHERE table_name.id in (10000)
);
But if I change the second query as follows, it takes more than 20 seconds:
SELECT table_name.id
from table_name
where table_name.id in (SELECT table_name.id
FROM table_name
WHERE table_name.id in (10000)
);
Running EXPLAIN gives the following output. There seem to be some issues with how MySQL uses the index together with the IN keyword.
For first query:
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | table_name | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
For second query:
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
| 1 | PRIMARY | table_name | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
| 2 | SUBQUERY | table_name | const | PRIMARY | PRIMARY | 4 | | 1 | Using index |
+----+-------------+---------------+-------+---------------+---------+---------+-------+------+-------------+
For third query:
+----+--------------------+------------+-------+---------------+---------+---------+-------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+-------+---------------+---------+---------+-------+---------+--------------------------+
| 1 | PRIMARY | table_name | index | NULL | sentTo | 5 | NULL | 6250751 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | table_name | const | PRIMARY | PRIMARY | 4 | const | 1 | Using index |
+----+--------------------+------------+-------+---------------+---------+---------+-------+---------+--------------------------+
I am using InnoDB and have tried changing the third query to forcibly use the index as indicated by the following category.
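In the = case (your second query), the subquery is a plain SUBQUERY: it runs once, and its single result is compared against the primary key (= can only compare against one value).
In the IN case (your third query), you get a kind of Cartesian multiplication (each per each): the IN is turned into a DEPENDENT SUBQUERY that is re-run for every row of the outer table, and the outer table is read with a full index scan (about 6.25 million rows). That is very bad for performance.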
Try to use joins for these cases.
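A minimal sketch of the join rewrite, with sqlite3 standing in for MySQL and a generated table (the sentTo column and its values are placeholders, and idx_sentTo is a hypothetical index name). The derived table is evaluated once and then joined on the primary key; DISTINCT guards against the JOIN multiplying outer rows if the inner query returned repeated values, which IN would have silently ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, sentTo INTEGER)")
conn.executemany(
    "INSERT INTO table_name (id, sentTo) VALUES (?, ?)",
    [(i, i % 7) for i in range(1, 20001)],  # placeholder data
)
conn.execute("CREATE INDEX idx_sentTo ON table_name (sentTo)")

# The form from the question: IN ( SELECT ... ).
IN_SQL = """
SELECT table_name.id FROM table_name
WHERE table_name.id IN (SELECT table_name.id FROM table_name
                        WHERE table_name.id IN (10000))
"""

# Join form: the derived table s is evaluated once, and each of its ids is
# matched to the outer table through the primary key. DISTINCT keeps the
# join from multiplying outer rows if s ever contained repeated ids.
JOIN_SQL = """
SELECT t.id
FROM (SELECT DISTINCT id FROM table_name WHERE id IN (10000)) AS s
JOIN table_name AS t ON t.id = s.id
"""

in_ids = conn.execute(IN_SQL).fetchall()
join_ids = conn.execute(JOIN_SQL).fetchall()
print(in_ids, join_ids)  # [(10000,)] [(10000,)]
```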

MySQL using Filesort and Query is very slow?

I have a query:
SELECT listings.*, listingagents.agentid
FROM listings
LEFT JOIN listingagents ON (listingagents.id = listings.listingagentid)
LEFT JOIN `ignore` ON (`ignore`.system_key = listings.listingid)
WHERE `ignore`.id IS NULL
ORDER BY listings.id ASC
I am trying to improve the performance of this query since it is very slow and it is putting a heavy load on the MySQL server.
When I run EXPLAIN, the output shows:
+--------+-------------+---------------+--------+---------------+------------+---------+----------------------------+--------+-------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+--------+-------------+---------------+--------+---------------+------------+---------+----------------------------+--------+-------------------------+
| 1 | SIMPLE | listings | ALL | NULL | NULL | NULL | NULL | 383360 | Using filesort |
| 1 | SIMPLE | listingagents | eq_ref | PRIMARY | PRIMARY | 4 | db.listings.listingagen... | 1 | |
| 1 | SIMPLE | ignore | ref | system_key | system_key | 1 | const | 404 | Using where; Not exists |
+--------+-------------+---------------+--------+---------------+------------+---------+----------------------------+--------+-------------------------+
I tried to do a simple query:
SELECT listings.*
FROM listings
ORDER BY listings.id ASC
That query also shows "Using filesort".
The fields "listings.id", "listingagents.id" and "ignore.id" are primary keys.
The fields "listingagents.id" and "ignore.system_key" have indexes.
What can I do to improve the 1st query?
Try to reduce the listings range (currently 383,360 rows) by adding a condition, e.g. id > x, or a LIMIT.
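One way to apply the id > x / LIMIT suggestion is keyset pagination: process listings in batches ordered by the primary key, remembering the last id seen. The sketch below uses sqlite3 as a stand-in for MySQL, with a reduced schema (the listingagents join is left out) and made-up rows; fetch_page and iter_pages are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (id INTEGER PRIMARY KEY, listingid TEXT NOT NULL);
CREATE TABLE "ignore" (id INTEGER PRIMARY KEY, system_key TEXT NOT NULL);
CREATE INDEX idx_system_key ON "ignore" (system_key);
""")
conn.executemany("INSERT INTO listings (id, listingid) VALUES (?, ?)",
                 [(i, f"L{i}") for i in range(1, 251)])
conn.executemany('INSERT INTO "ignore" (system_key) VALUES (?)',
                 [("L5",), ("L150",)])


def fetch_page(after_id, page_size):
    """Return up to page_size non-ignored listings with id > after_id.

    The id > ? range plus LIMIT lets the database walk the primary key from
    where the previous page stopped instead of reading and sorting all rows.
    """
    return conn.execute(
        """
        SELECT l.id, l.listingid
        FROM listings AS l
        LEFT JOIN "ignore" AS ig ON ig.system_key = l.listingid
        WHERE ig.id IS NULL AND l.id > ?
        ORDER BY l.id
        LIMIT ?
        """,
        (after_id, page_size),
    ).fetchall()


def iter_pages(page_size=100):
    """Yield successive pages, remembering the last id seen (keyset paging)."""
    last_id = 0  # assumes positive ids, as with AUTO_INCREMENT
    while True:
        page = fetch_page(last_id, page_size)
        if not page:
            return
        yield page
        last_id = page[-1][0]


pages = list(iter_pages(100))
print([len(p) for p in pages])  # [100, 100, 48]
```

Unlike OFFSET-based paging, each page starts with a range seek on the primary key, so later pages are not slower than earlier ones.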