Slow join with order query - mysql

I have a problem with the speed of query. Question is similar to this one, but can't find solution. Explain says that MySQL is using: Using where; Using index; Using temporary; Using filesort
Slow query:
select
distinct(`books`.`id`)
from `books`
join `books_genres` on `books_genres`.`book_id` = `books`.`id`
where
`books`.`is_status` = 'active' and `books`.`master_book` = 'true'
and `books_genres`.`genre_id` in(380,381,384,385,1359)
order by
`books`.`livelib_read_num` DESC, `books`.`id` DESC
limit 0,25
#25 rows (0.319 s)
But if I remove order statement from query it is really fast:
select sql_no_cache
distinct(`books`.`id`)
from `books`
join `books_genres` on `books_genres`.`book_id` = `books`.`id`
where
`books`.`is_status` = 'active' and `books`.`master_book` = 'true'
and `books_genres`.`genre_id` in(380,381,384,385,1359)
limit 0,25
#25 rows (0.005 s)
Explain:
+------+-------------+--------------+--------+---------------------------------------------------------------------------------------------------------------------+------------------+---------+--------------------------------+--------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------+--------+---------------------------------------------------------------------------------------------------------------------+------------------+---------+--------------------------------+--------+-----------------------------------------------------------+
| 1 | SIMPLE | books_genres | range | book_id,categorie_id,book_id2,genre_id_book_id | genre_id_book_id | 10 | NULL | 194890 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | books | eq_ref | PRIMARY,is_status,master_book,is_status_master_book,is_status_master_book_indexed,is_status_donor_no_ru_master_book | PRIMARY | 4 | knigogid3.books_genres.book_id | 1 | Using where |
+------+-------------+--------------+--------+---------------------------------------------------------------------------------------------------------------------+------------------+---------+--------------------------------+--------+-----------------------------------------------------------+
2 rows in set (0.00 sec)
My tables:
CREATE TABLE `books_genres` (
`book_id` int(11) DEFAULT NULL,
`genre_id` int(11) DEFAULT NULL,
`sort` tinyint(4) DEFAULT NULL,
UNIQUE KEY `book_id` (`book_id`,`genre_id`),
KEY `categorie_id` (`genre_id`),
KEY `sort` (`sort`),
KEY `book_id2` (`book_id`),
KEY `genre_id_book_id` (`genre_id`,`book_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `books` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`is_status` enum('active','parser','incorrect','extremist','delete','fulldeteled') NOT NULL DEFAULT 'active',
`livelib_book_id` int(11) DEFAULT NULL,
`master_book` enum('true','false') DEFAULT 'true'
PRIMARY KEY (`id`),
KEY `is_status` (`is_status`),
KEY `master_book` (`master_book`),
KEY `livelib_book_id` (`livelib_book_id`),
KEY `livelib_read_num` (`livelib_read_num`),
KEY `is_status_master_book` (`is_status`,`master_book`),
KEY `livelib_book_id_master_book` (`livelib_book_id`,`master_book`),
KEY `is_status_master_book_indexed` (`is_status`,`master_book`,`indexed`),
KEY `is_status_donor_no_ru_master_book` (`is_status`,`donor`,`no_ru`,`master_book`),
KEY `livelib_url_master_book_is_status` (`livelib_url`,`master_book`,`is_status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Problems with books_genres.
It has no PRIMARY KEY.
All columns are nullable. Will you ever insert a row with any NULLs?
Recommend (after saying NOT NULL on all columns):
PRIMARY KEY(`book_id`,`genre_id`)
INDEX(genre_id, book_id, sort)
and remove all the rest.
I don't see livelib_read_num in the table???
In the other table, remove any indexes that are the exact prefix of some other index.
These might help with speed. (Again, filter out prefix indexes that are redundant.) (These are "covering" indexes, which helps a little.)
books: INDEX(is_status, master_book, livelib_read_num, id)
books: INDEX(livelib_read_num, id, is_status, master_book)
The second index may cause the Optimizer to give preference to ORDER BY. (That is a risky optimization, since it might have to scan the entire index without finding 25 relevant rows.)

SELECT sql_no_cache
`books`.`id`
FROM
`books`
use index(books_idx_is_stat_master_livelib_id)
WHERE
(
1 = 1
AND `books`.`is_status` = 'active'
AND `books`.`master_book` = 'true'
)
AND (
EXISTS (
SELECT
1
FROM
`books_genres`
WHERE
(
`books_genres`.`book_id` = `books`.`id`
)
AND (
`books_genres`.`genre_id` IN (
380, 381, 384, 385, 1359
)
)
)
)
ORDER BY
`books`.`livelib_read_num` DESC,
`books`.`id` DESC LIMIT 0,
25;
25 rows in set (0.07 sec)

Related

Need help optimizing sql JOIN query and indexes on large tables

I have a query with a JOIN on three tables that is taking a very long time to run. I created an index on one of my tables for the foreign key (user_shared_url_id) and two columns (event_result, enabled) in the WHERE clause, so it's an index of three columns total. There seems to be no different from when I simply use an index of the foreign key (user_shared_url_id). The other two tables are using single column indexes. My users table has about 20,000 rows, but the other two tables are quite large, with ~20 million rows. I can't get a query that takes less than a minute or so to finish. Can anyone think of any potential optimizations I can make to speed this up? Are there other indexes or improvements to my custom index that I can work with?
The tables:
CREATE TABLE `users` (
`user_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`roles` varchar(500) DEFAULT NULL,
`first_name` varchar(200) DEFAULT NULL,
`last_name` varchar(100) DEFAULT NULL,
`org_id` int(11) unsigned NOT NULL,
`user_email` varchar(100) NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`user_id`),
KEY `org_id` (`org_id`),
KEY `status` (`status`),
KEY `org_id_user_id` (`org_id`,`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=162524 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
CREATE TABLE `user_shared_urls` (
`user_id` int(11) unsigned NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`user_shared_url_id` int(11) NOT NULL AUTO_INCREMENT,
`target_url` text,
PRIMARY KEY (`user_shared_url_id`),
KEY `user_id` (`user_id`),
KEY `user_id_usu_id` (`user_id`,`user_shared_url_id`)
) ENGINE=InnoDB AUTO_INCREMENT=62449105 DEFAULT CHARSET=utf8 |
CREATE TABLE `user_share_events` (
`user_share_event_id` int(11) NOT NULL AUTO_INCREMENT,
`event_result` tinyint(1) unsigned DEFAULT NULL,
`user_shared_url_id` int(11) NOT NULL,
`enabled` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`user_share_event_id`),
KEY `user_shared_url_id` (`user_shared_url_id`),
KEY `usuid_enabled_result` (`user_shared_url_id`,`enabled`,`event_result`)
) ENGINE=InnoDB AUTO_INCREMENT=35067339 DEFAULT CHARSET=utf8 |
My indexes:
CREATE INDEX org_id_user_id ON users(org_id, user_id);
CREATE INDEX user_id_usu_id ON user_shared_urls(user_id, user_shared_url_id);
CREATE INDEX usuid_enabled_result ON user_share_events(user_shared_url_id,enabled,event_result);
My query:
SELECT
users.user_id,
users.user_email "user_email",
users.roles "role",
CONCAT(users.first_name, ' ', users.last_name) "name",
usus.target_url
FROM
users
JOIN user_shared_urls usus ON usus.user_id = users.user_id
JOIN user_share_events uses ON usus.user_shared_url_id = uses.user_shared_url_id
WHERE
users.org_id = 1523
AND
uses.enabled = '1'
AND
uses.event_result = 1
Explain output of the above query:
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
| 1 | SIMPLE | users | ref | PRIMARY,org_id,org_id_user_id | org_id | 4 | const | 1235 | NULL |
| 1 | SIMPLE | usus | ref | PRIMARY,user_id,user_id_usu_id | user_id_usu_id | 4 | luster.users.user_id | 213 | NULL |
| 1 | SIMPLE | uses | ref | user_shared_url_id,user_and_service,result_service_occurred,usuid_enabled_result | user_shared_url_id | 4 | luster.usus.user_shared_url_id | 1 | Using where |
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
3 rows in set (0.00 sec)
(Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)
Change that index you added to
INDEX(user_shared_url_id, -- = and used for the JOIN
enabled, -- =
event_result) -- Last (not an = test)
The order of columns in an INDEX is important. Start with the columns that are tested for = (or IS NULL).
Then remove the FORCE INDEX and run the EXPLAIN again.
Are these tables in a 1:many relationship? Tell us which way.
Another comment: If event_result really has only two values (true/false) and you are using NULL for false, then change the query from
uses.event_result IS NOT NULL
to
uses.event_result = 1
The point is that the Optimizer likes to optimize =, but sees NOT NULL as being any of 256 possible values; very far from =. With this query change, your index should work. And even be picked without using FORCE.
For this query:
SELECT u.user_id, u.user_email, u.roles "role",
CONCAT(u.first_name, ' ', u.last_name) "name",
usu.target_url
FROM user_shared_urls usu JOIN
users u
ON usu.user_id = u.user_id JOIN
user_share_events usev
ON usus.user_shared_url_id = usev.user_shared_url_id
WHERE u.org_id = 1010 AND
usev.event_result IS NOT NULL AND
usev.enabled = 1;
Probably the best indexes are:
users(org_id, user_id)
user_shared_urls(user_id, user_shared_url_id)
user_share_events(user_shared_url_id, enabled, event_result)
This assumes that the filtering on org_id is more selective than the other filters.

MySql group by optimization - avoid tmp table and/or filesort

I have a slow query, without the group by is fast (0.1-0.3 seconds), but with the (required) group by the duration is around 10-15s.
The query joins two tables, events (near 50 million rows) and events_locations (5 million rows).
Query:
SELECT `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id`AS `entity_type_id`, el.some_id
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1
AND el.other_id = '1'
AND time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;
Table events:
CREATE TABLE `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_type_id` int(11) NOT NULL,
`entity_type_id` int(11) NOT NULL,
`entity_id` varchar(64) NOT NULL,
`alias` varchar(64) NOT NULL,
`time_stamp` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `entity_id` (`entity_id`),
KEY `event_type_idx` (`event_type_id`),
KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table events_locations
CREATE TABLE `events_locations` (
`event_id` bigint(20) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
`some_id` bigint(20) DEFAULT NULL,
`other_id` bigint(20) DEFAULT NULL,
`time_span` bigint(20) DEFAULT NULL,
`group_alias` varchar(64) NOT NULL,
KEY `some_id_idx` (`some_id`),
KEY `idx_events_group_alias` (`group_alias`),
KEY `idx_event_id` (`event_id`),
CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The explain:
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| 1 | SIMPLE | ea | ALL | 'idx_event_id' | NULL | NULL | NULL | 5152834 | 'Using where; Using temporary; Using filesort' |
| 1 | SIMPLE | e | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8' | 'name.ea.event_id' | 1 | |
+----+-------------+----------------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
2 rows in set (0.08 sec)
From the doc:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
I already tried:
Create an index by 'el.some_id , el.group_alias'
Decrease the varchar size to 20
Increase the size of sort_buffer_size and read_rnd_buffer_size;
Any suggestions for performance tuning would be much appreciated!
In your case events table has time_span as indexing property. So before joining both tables first select required records from events table for specific date range with required details. Then join the event_location by using table relation properties.
Check your MySql Explain keyword to check how does your approach your table records. It will tell you how much rows are scanned for before selecting required records.
Number of rows that are scanned also involve in query execution time. Use my below logic to reduce the number of rows that are scanned.
SELECT
`e`.`id` AS `event_id`,
`e`.`time_stamp` AS `time_stamp`,
`el`.`latitude` AS `latitude`,
`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,
`e`.`entity_id` AS `asset_name`,
`el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,
`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`,
`el`.`some_id` as `some_id`
FROM
(select
`id` AS `event_id`,
`time_stamp` AS `time_stamp`,
`entity_id` AS `asset_name`,
`event_type_id` AS `event_type_id`,
`entity_type_id` AS `entity_type_id`
from
`events`
WHERE
time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
) AS `e`
JOIN `events_locations` `el` ON `e`.`event_id` = `el`.`event_id`
WHERE
`el`.`other_id` = '1'
GROUP BY
`e`.`event_type_id` ,
`el`.`some_id` ,
`el`.`group_alias`;
The relationship between these tables is 1:1, so, I asked me why is a group by required and I found some duplicated rows, 200 in 50000 rows. So, somehow, my system is inserting duplicates and someone put that group by (years ago) instead of seek of the bug.
So, I will mark this as solved, more or less...

How to optimize join when search in categories

I have a table with items:
CREATE TABLE `ost_content` (
`uid` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`type` enum('media','serial','season','series') NOT NULL,
`alias` varchar(200) NOT NULL,
`views` mediumint(7) NOT NULL DEFAULT '0',
`ratings_count` enum('0','1','2','4','5') NOT NULL DEFAULT '0',
`ratings_sum` mediumint(5) NOT NULL DEFAULT '0',
`upload_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`conversion_status` enum('converting','error','success','announcement') NOT NULL DEFAULT 'converting',
PRIMARY KEY (`uid`),
UNIQUE KEY `idx_uid_type` (`uid`,`type`),
KEY `idx_type` (`type`),
KEY `idx_upload_date DESC` (`upload_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And table, that connect items with categories:
CREATE TABLE `ost_categories2media` (
`categories2media_id` mediumint(6) unsigned NOT NULL AUTO_INCREMENT,
`categories2media_category_id` smallint(5) unsigned NOT NULL,
`categories2media_uid` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`categories2media_id`),
KEY `categories2media_media_id` (`categories2media_uid`),
KEY `categories2media_category_id` (`categories2media_category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=501114 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Than, I executing query:
SELECT
c1.uid,
c1.alias,
c1.type,
c1.views,
c1.upload_date,
c1.ratings_sum,
c1.ratings_count,
c1.conversion_status
FROM
ost_content c1
LEFT JOIN ost_categories2media c2m ON c2m.categories2media_uid = c1.uid
WHERE
c2m.categories2media_category_id = '53'
AND c1.conversion_status IN ('success', 'announcement')
AND c1.type IN ('serial', 'media')
ORDER BY
c1.upload_date DESC
LIMIT 16, 16
It executing slow, categories2media_category_id check many rows:
+----+-------------+-------+--------+--------------------------------------------------------+------------------------------+---------+---------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------------------------------------+------------------------------+---------+---------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | c2m | ref | categories2media_media_id,categories2media_category_id | categories2media_category_id | 2 | const | 32076 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | c1 | eq_ref | PRIMARY,idx_uid_type,idx_type | PRIMARY | 3 | uakino.c2m.categories2media_uid | 1 | Using where |
+----+-------------+-------+--------+--------------------------------------------------------+------------------------------+---------+---------------------------------+-------+----------------------------------------------+
How I can optimize or rewrite this query?
Mysql indexes are like cooks, too many of them aren't very useful because mysql uses only one index per table. Let's look at ost_categories2media,
That's three separate indexes on three columns. You are better off with two indexes like this.
PRIMARY KEY (`categories2media_id`),
KEY `categories2media_media_id` (`categories2media_uid`,`categories2media_category_id`)
Now mysql no longer has to decide between an index on categories2media_uid or categories2media_category_id it has an index that covers both!
Looking at your ost_content table we see
PRIMARY KEY (`uid`),
UNIQUE KEY `idx_uid_type` (`uid`,`type`),
KEY `idx_type` (`type`),
KEY `idx_upload_date DESC` (`upload_date`)
Some of these indexes are a bit redundant. Any query that filters on the uid field can use the PK while any query that filters on type can use idx_type that means idx_uid_type is there just to enforce the uniqueness. But we can make it more usefull like this:
PRIMARY KEY (`uid`),
UNIQUE KEY `idx_uid_type` (`type`,`uid`),
KEY `idx_upload_date DESC` (`upload_date`)
We've got rid of one index! that ought to make your indexes a lot faster. You still have an index on upload_date that isn't used in this particulary query. So how about a composite index for that?
PRIMARY KEY (`uid`),
UNIQUE KEY `idx_uid_type` (`type`,`uid`),
KEY `idx_upload_date DESC` (`uid`,`upload_date`)
First, the LEFT JOIN is not necessary. So, you can write the query as:
SELECT c.*
FROM ost_content c JOIN
ost_categories2media c2m
ON c2m.categories2media_uid = c.uid
WHERE c2m.categories2media_category_id = '53' AND
c.conversion_status IN ('success', 'announcement') AND
c.type IN ('serial', 'media')
ORDER BY c.upload_date DESC
LIMIT 16, 16;
Unfortunately, your conditions on the content table are not simple = conditions. If they were, and index on ost_content(conversion_status, type, uid) would be recommended. This might still be the better option.
Another option is to go the other way: An index on ost_categories2media(categories2media_category_id, categories2media_uid).
You might find that the first composite index and this query work best:
SELECT c.*
FROM ((SELECT c.*
FROM ost_content c JOIN
ost_categories2media c2m
ON c2m.categories2media_uid = c.uid
WHERE c2m.categories2media_category_id = '53' AND
c.conversion_status = 'success' AND
c.type IN ('serial', 'media')
) UNION ALL
(SELECT c.*
FROM ost_content c JOIN
ost_categories2media c2m
ON c2m.categories2media_uid = c.uid
WHERE c2m.categories2media_category_id = '53' AND
c.conversion_status = 'announcement' AND
c.type IN ('serial', 'media')
)
) c
ORDER BY c.upload_date DESC
LIMIT 16, 16;
This looks more complicated, but each subquery can take advantage of the index, so it might have improved performance.

Distinct vs Group By

I have two tables like this.
The 'order' table has 21886 rows.
CREATE TABLE `order` (
`id` bigint(20) unsigned NOT NULL,
`reg_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_reg_date` (`reg_date`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `order_detail_products` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) unsigned NOT NULL,
`order_detail_id` int(11) NOT NULL,
`prod_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_order_detail_id` (`order_detail_id`,`prod_id`),
KEY `idx_order_id` (`order_id`,`order_detail_id`,`prod_id`)
) ENGINE=InnoDB AUTO_INCREMENT=572375 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
My question is here.
MariaDB [test]> explain
-> SELECT DISTINCT A.id
-> FROM order A
-> JOIN order_detail_products B ON A.id = B.order_id
-> ORDER BY A.reg_date DESC LIMIT 100, 30;
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
| 1 | SIMPLE | A | index | PRIMARY | idx_reg_date | 8 | NULL | 22151 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | B | ref | idx_order_id | idx_order_id | 8 | bom_20140804.A.id | 2 | Using index; Distinct |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
MariaDB [test]> explain
-> SELECT A.id
-> FROM order A
-> JOIN order_detail_products B ON A.id = B.order_id
-> GROUP BY A.id
-> ORDER BY A.reg_date DESC LIMIT 100, 30;
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
| 1 | SIMPLE | A | index | PRIMARY | idx_reg_date | 8 | NULL | 65 | Using index; Using temporary |
| 1 | SIMPLE | B | ref | idx_order_id | idx_order_id | 8 | bom_20140804.A.id | 2 | Using index |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
Listed above, two queries returns same result but distinct is too slow(explain too many rows).
What's the difference?
It is usually advised to use DISTINCT instead of GROUP BY, since that is what you actually want, and let the optimizer choose the "best" execution plan. However - no optimizer is perfect. Using DISTINCT the optimizer can have more options for an execution plan. But that also means that it has more options to choose a bad plan.
You write that the DISTINCT query is "slow", but you don't tell any numbers. In my test (with 10 times as many rows on MariaDB 10.0.19 and 10.3.13) the DISTINCT query is like (only) 25% slower (562ms/453ms). The EXPLAIN result is no help at all. It's even "lying". With LIMIT 100, 30 it would need to read at least 130 rows (that's what my EXPLAIN actually schows for GROUP BY), but it shows you 65.
I can't explain the 25% difference in execution time, but it seems that the engine is doing a full table/index scan in any case, and sorts the result before it can skip 100 and select 30 rows.
The best plan would probably be:
Read rows from idx_reg_date index (table A) one by one in descending order
Look if there is a match in the idx_order_id index (table B)
Skip 100 matching rows
Send 30 matching rows
Exit
If there are like 10% of rows in A which have no match in B, this plan would read something like 143 rows from A.
Best I could do to somehow force this plan is:
SELECT A.id
FROM `order` A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30
OFFSET 100
This query returns the same result in 156 ms (3 times faster than GROUP BY). But that is still too slow. And it's probaly still reading all rows in table A.
We can proof that a better plan can exist with a "little" subquery trick:
SELECT A.id
FROM (
SELECT id, reg_date
FROM `order`
ORDER BY reg_date DESC
LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30
OFFSET 100
This query executes in "no time" (~ 0 ms) and returns the same result on my test data. And though it's not 100% reliable, it shows that the optimizer is not doing a good job.
So what are my conclusions:
The optimizer does not always do the best job and sometimes needs help
Even when we know "the best plan", we can not always enforce it
DISTINCT is not always faster than GROUP BY
When no index can be used for all clauses - things are getting quite tricky
Test schema and dummy data:
drop table if exists `order`;
CREATE TABLE `order` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`reg_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_reg_date` (`reg_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
insert into `order`(reg_date)
select from_unixtime(floor(rand(1) * 1000000000)) as reg_date
from information_schema.COLUMNS a
, information_schema.COLUMNS b
limit 218860;
drop table if exists `order_detail_products`;
CREATE TABLE `order_detail_products` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) unsigned NOT NULL,
`order_detail_id` int(11) NOT NULL,
`prod_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_order_detail_id` (`order_detail_id`,`prod_id`),
KEY `idx_order_id` (`order_id`,`order_detail_id`,`prod_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
insert into order_detail_products(id, order_id, order_detail_id, prod_id)
select null as id
, floor(rand(2)*218860)+1 as order_id
, 0 as order_detail_id
, 0 as prod_id
from information_schema.COLUMNS a
, information_schema.COLUMNS b
limit 437320;
Queries:
SELECT DISTINCT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 562 ms
SELECT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
GROUP BY A.id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 453 ms
SELECT A.id
FROM `order` A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 156 ms
SELECT A.id
FROM (
SELECT id, reg_date
FROM `order`
ORDER BY reg_date DESC
LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- ~ 0 ms
I believe your select distinct is slow because you broke the index by matching on another table. In most cases, select distinct will be faster. But in this case, since you are matching on parameters of another table, the index is broken and is much slower.

Optimize MySQL join query to remove Using temporary and use an index?

I have a query with ORDER BY name and the index on name is being ignored.
How can I optimize the query to use an index and get rid of Using temporary from EXPLAIN?
I have log-queries-not-using-indexes enabled and I'm seeing this query thousands of times.
Here's the query:
SELECT l.parent_id, j.id, j.location_id, j.currency, j.frequency, ROUND((j.salary_min + j.salary_max)/2) as salary
FROM jobs AS j
JOIN location AS l
ON j.location_id = l.id
WHERE j.salary_min !=0
AND j.status != 'Rejected'
AND l.published =1
AND date_sub(now(), interval 1 month) <= j.effected_date
ORDER BY l.name
The explain:
+----+-------------+-------+--------+----------------------------------+---------------+---------+----------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------+---------------+---------+----------------------------+------+----------------------------------------------+
| 1 | SIMPLE | j | range | effected_date,location_id,status | effected_date | 9 | NULL | 562 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | l | eq_ref | PRIMARY | PRIMARY | 4 | esljw_joomla.j.location_id | 1 | Using where |
+----+-------------+-------+--------+----------------------------------+---------------+---------+----------------------------+------+----------------------------------------------+
2 rows in set (0.01 sec)
And the table structure:
CREATE TABLE IF NOT EXISTS `jobs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`location_id` varchar(255) NOT NULL,
`status` varchar(255) DEFAULT NULL,
`currency` varchar(255) DEFAULT NULL,
`salary_min` int(11) DEFAULT NULL,
`salary_max` int(11) DEFAULT NULL,
`effected_date` datetime DEFAULT NULL,
`frequency` varchar(255) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
KEY `effected_date` (`effected_date`),
KEY `location_id` (`location_id`),
KEY `status` (`status`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=10130 ;
CREATE TABLE IF NOT EXISTS `location` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) DEFAULT NULL,
`parent_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=304 ;
Add published + id composite index to location table
Move l.published =1 condition to the ON clause
This is what you can to do in your case. But probbaly you'll never get rid of using temporary since you're sorting not by primary table, but by joined table.
It's because you've listed job first. Change the order of the tables, like this:
SELECT l.parent_id, j.id, j.location_id, j.currency, j.frequency, ROUND((j.salary_min + j.salary_max)/2) as salary
FROM location AS l
JOIN jobs AS j ON j.location_id = l.id
WHERE j.salary_min !=0
AND j.status != 'Rejected'
AND l.published =1
AND date_sub(now(), interval 1 month) <= j.effected_date
ORDER BY l.name
Try it and post how it goes.
Many times I've done queries where you have proper primary table as first in query, with good indexes, adding STRAIGHT_JOIN alone can fix a query. So, with your existing criteria, you should be good with your date index and use that as the primary criteria... such as
SELECT STRAIGHT_JOIN
L.Parent_ID,
J.id,
J.location_id,
J.currency,
J.frequency,
ROUND(( J.salary_min + J.salary_max) / 2 ) as Salary
FROM
jobs J
join Location L
on J.Location_ID = L.ID
AND L.Published = 1
WHERE
J.Effected_Date >= date_sub(now(), interval 1 month)
AND J.salary_min != 0
AND J.status != 'Rejected'
ORDER BY
L.name