I have the following query:
SELECT
analytics.source AS referrer,
COUNT(analytics.id) AS frequency,
SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10
The analytics table has 60M rows and the transactions table has 3M rows.
When I run an EXPLAIN on this query, I get:
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
| # id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
| '1' | 'SIMPLE' | 'analytics' | 'ref' | 'analytics_user_id | analytics_source' | 'analytics_user_id' | '5' | 'const' | '337662' | 'Using where; Using temporary; Using filesort' |
| '1' | 'SIMPLE' | 'transactions' | 'ref' | 'tran_analytics' | 'tran_analytics' | '5' | 'dijishop2.analytics.id' | '1' | NULL | |
+------+--------------+-----------------+--------+---------------------+-------------------+----------------------+---------------------------+----------+-----------+-------------------------------------------------+
I can't figure out how to optimise this query as it's already very basic. It takes around 70 seconds to run this query.
Here are the indexes that exist:
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| # Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| 'analytics' | '0' | 'PRIMARY' | '1' | 'id' | 'A' | '56934235' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_user_id' | '1' | 'user_id' | 'A' | '130583' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_product_id' | '1' | 'product_id' | 'A' | '490812' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_affil_user_id' | '1' | 'affil_user_id' | 'A' | '55222' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_source' | '1' | 'source' | 'A' | '24604' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_country_name' | '1' | 'country_name' | 'A' | '39510' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_gordon' | '1' | 'id' | 'A' | '56934235' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_gordon' | '2' | 'user_id' | 'A' | '56934235' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'analytics' | '1' | 'analytics_gordon' | '3' | 'source' | 'A' | '56934235' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| # Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| 'transactions' | '0' | 'PRIMARY' | '1' | 'id' | 'A' | '2436151' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'tran_user_id' | '1' | 'user_id' | 'A' | '56654' | NULL | NULL | '' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'transaction_id' | '1' | 'transaction_id' | 'A' | '2436151' | '191' | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'tran_analytics' | '1' | 'analytics' | 'A' | '2436151' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'tran_status' | '1' | 'status' | 'A' | '22' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'gordon_trans' | '1' | 'status' | 'A' | '22' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
| 'transactions' | '1' | 'gordon_trans' | '2' | 'analytics' | 'A' | '2436151' | NULL | NULL | 'YES' | 'BTREE' | '' | '' |
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
Simplified schema for the two tables before adding any extra indexes as suggested as it didn't improve the situation.
CREATE TABLE `analytics` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`affil_user_id` int(11) DEFAULT NULL,
`product_id` int(11) DEFAULT NULL,
`medium` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`source` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`terms` varchar(1024) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`is_browser` tinyint(1) DEFAULT NULL,
`is_mobile` tinyint(1) DEFAULT NULL,
`is_robot` tinyint(1) DEFAULT NULL,
`browser` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`mobile` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`robot` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`platform` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`referrer` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`domain` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`ip` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`continent_code` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`country_name` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`city` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `analytics_user_id` (`user_id`),
KEY `analytics_product_id` (`product_id`),
KEY `analytics_affil_user_id` (`affil_user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=64821325 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `transactions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`transaction_id` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`user_id` int(11) NOT NULL,
`pay_key` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`sender_email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`amount` decimal(10,2) DEFAULT NULL,
`currency` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`status` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`analytics` int(11) DEFAULT NULL,
`ip_address` varchar(46) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`session_id` varchar(60) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`eu_vat_applied` int(1) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `tran_user_id` (`user_id`),
KEY `transaction_id` (`transaction_id`(191)),
KEY `tran_analytics` (`analytics`),
KEY `tran_status` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=10019356 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
If the above can not be optimised any further. Any implementation advice on summary tables will be great. We are using a LAMP stack on AWS. The above query is running on RDS (m1.large).
I would create the following indexes (b-tree indexes):
analytics(user_id, source, id)
transactions(analytics, status)
This is different from Gordon's suggestion.
The order of columns in the index is important.
You filter by specific analytics.user_id, so this field has to be the first in the index.
Then you group by analytics.source. To avoid sorting by source this should be the next field of the index. You also reference analytics.id, so it is better to have this field as part of the index, put it last. Is MySQL capable of reading just the index and not touching the table? I don't know, but it is rather easy to test.
Index on transactions has to start with analytics, because it would be used in the JOIN. We also need status.
SELECT
analytics.source AS referrer,
COUNT(analytics.id) AS frequency,
SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10
First some analysis...
SELECT a.source AS referrer,
COUNT(*) AS frequency, -- See question below
SUM(t.status = 'COMPLETED') AS sales
FROM analytics AS a
LEFT JOIN transactions AS t ON a.id = t.analytics AS a
WHERE a.user_id = 52094
GROUP BY a.source
ORDER BY frequency DESC
LIMIT 10
If the mapping from a to t is "one-to-many", then you need to consider whether the COUNT and SUM have the correct values or inflated values. As the query stands, they are "inflated". The JOIN occurs before the aggregation, so you are counting the number of transactions and how many were completed. I'll assume that is desired.
Note: The usual pattern is COUNT(*); saying COUNT(x) implies checking x for being NULL. I suspect that check is not needed?
This index handles the WHERE and is "covering":
analytics: INDEX(user_id, source, id) -- user_id first
transactions: INDEX(analytics, status) -- in this order
The GROUP BY may or may not require a 'sort'. The ORDER BY, being different than the GROUP BY, definitely will need a sort. And the entire grouped set of rows will need to be sorted; there is no shortcut for the LIMIT.
Normally, Summary tables are date-oriented. That is, the PRIMARY KEY includes a 'date' and some other dimensions. Perhaps, keying by date and user_id would make sense? How many transactions per day does the average user have? If at least 10, then let's consider a Summary table. Also, it is important not to be UPDATEing or DELETEing old records. More
I would probably have
user_id ...,
source ...,
dy DATE ...,
status ...,
freq MEDIUMINT UNSIGNED NOT NULL,
status_ct MEDIUMINT UNSIGNED NOT NULL,
PRIMARY KEY(user_id, status, source, dy)
Then the query becomes
SELECT source AS referrer,
SUM(freq) AS frequency,
SUM(status_ct) AS completed_sales
FROM Summary
WHERE user_id = 52094
AND status = 'COMPLETED'
GROUP BY source
ORDER BY frequency DESC
LIMIT 10
The speed comes from many factors
Smaller table (fewer rows to look at)
No JOIN
More useful index
(It still needs the extra sort.)
Even without the summary table, there may be some speedups...
How big are the tables? How big is `innodb_buffer_pool_size?
Normalizing some of the strings that are both bulky and repetitive could make that table not I/O-bound.
This is awful: KEY (transaction_id(191)); See here for 5 ways to fix it.
IP addresses do not need 255 bytes, nor utf8mb4_unicode_ci. (39) and ascii are sufficient.
For this query:
SELECT a.source AS referrer,
COUNT(*) AS frequency,
SUM( t.status = 'COMPLETED' ) AS sales
FROM analytics a LEFT JOIN
transactions t
ON a.id = t.analytics
WHERE a.user_id = 52094
GROUP BY a.source
ORDER BY frequency DESC
LIMIT 10 ;
You want an index on analytics(user_id, id, source) and transactions(analytics, status).
Try below and let me know if this helps.
SELECT
analytics.source AS referrer,
COUNT(analytics.id) AS frequency,
SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM (SELECT * FROM analytics where user_id = 52094) analytics
LEFT JOIN (SELECT analytics, status from transactions where analytics = 52094) transactions ON analytics.id = transactions.analytics
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10
Could you try below Approach:
SELECT
analytics.source AS referrer,
COUNT(analytics.id) AS frequency,
SUM(sales) AS sales
FROM analytics
LEFT JOIN(
SELECT transactions.Analytics, (CASE WHEN transactions.status = 'COMPLETED' THEN 1 ELSE 0 END) AS sales
FROM analytics INNER JOIN transactions ON analytics.id = transactions.analytics
) Tra
ON analytics.id = Tra.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10
This query potentially joins millions of analytics records with transactions records and calculates the sum (including the status check) on millions of records.
If we could first apply the LIMIT 10 and then do the join and calculate the sum, we could speed up the query.
Unfortunately, we need the analytics.id for the join, which gets lost after applying the GROUP BY. But maybe analytics.source is selective enough to boost the query anyway.
My Idea is therefore to calculate the frequencies, limit by them, to return the analytics.source and frequency in a subquery and to use this result to filter the analytics in the main query, which then does the rest of the joins and calculations on a hopefully much reduced number of records.
Minimal subquery (note: no join, no sum, returns 10 records):
SELECT
source,
COUNT(id) AS frequency
FROM analytics
WHERE user_id = 52094
GROUP BY source
ORDER BY frequency DESC
LIMIT 10
The full query using the above query as subquery x:
SELECT
x.source AS referrer,
x.frequency,
SUM(IF(t.status = 'COMPLETED', 1, 0)) AS sales
FROM
(<subquery here>) x
INNER JOIN analytics a
ON x.source = a.source -- This reduces the number of records
LEFT JOIN transactions t
ON a.id = t.analytics
WHERE a.user_id = 52094 -- We could have several users per source
GROUP BY x.source, x.frequency
ORDER BY x.frequency DESC
If this does not yield the expected performance boost, this could be due to MySQL applying the joins in an unexpected order. As explained here "Is there a way to force MySQL execution order?", you could replace the join by STRAIGHT_JOIN in this case.
Only Problem I find in your query is
GROUP BY analytics.source
ORDER BY frequency DESC
because of this query is doing filesort using temporary table.
One way to avoid this is by creating another table like
CREATE TABLE `analytics_aggr` (
`source` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`frequency` int(10) DEFAULT NULL,
`sales` int(10) DEFAULT NULL,
KEY `sales` (`sales`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;`
insert data into analytics_aggr using below query
insert into analytics_aggr SELECT
analytics.source AS referrer,
COUNT(analytics.id) AS frequency,
SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY null
Now you can easily get you data using
select * from analytics_aggr order by sales desc
Try this
SELECT
a.source AS referrer,
COUNT(a.id) AS frequency,
SUM(t.sales) AS sales
FROM (Select id, source From analytics Where user_id = 52094) a
LEFT JOIN (Select analytics, case when status = 'COMPLETED' Then 1 else 0 end as sales
From transactions) t ON a.id = t.analytics
GROUP BY a.source
ORDER BY frequency DESC
LIMIT 10
I'm proposing this because you said "they are massive table" but this sql using very few columns only. In this case if we use inline view with require columns only then it will be good
Note: memory also will play important role here. So confirm the memory before decide the inline view
I would try to separate querying from the two tables. Since you need only top 10 sources, I would get them first and then query from transactions the sales column:
SELECT source as referrer
,frequency
,(select count(*)
from transactions t
where t.analytics in (select distinct id
from analytics
where user_id = 52094
and source = by_frequency.source)
and status = 'completed'
) as sales
from (SELECT analytics.source
,count(*) as frequency
from analytics
where analytics.user_id = 52094
group by analytics.source
order by frequency desc
limit 10
) by_frequency
It may be also faster without the distinct
I would try subquery:
SELECT a.source AS referrer,
COUNT(*) AS frequency,
SUM((SELECT COUNT(*) FROM transactions t
WHERE a.id = t.analytics AND t.status = 'COMPLETED')) AS sales
FROM analytics a
WHERE a.user_id = 52094
GROUP BY a.source
ORDER BY frequency DESC
LIMIT 10;
Plus indexes exactly as #Gordon's answer: analytics(user_id, id, source) and transactions(analytics, status).
I am assuming the predicate, user_id = 52094, is for illustration purpose and in application, the selected user_id is a variable.
I also assume that ACID property is not very important here.
(1) Therefore, I will maintain two replica tables with only the necessary fields (it is similar to the indices Vladimir had suggested above) using a utility table.
CREATE TABLE mv_anal (
`id` int(11) NOT NULL,
`user_id` int(11) DEFAULT NULL,
`source` varchar(45),
PRIMARY KEY (`id`)
);
CREATE TABLE mv_trans (
`id` int(11) NOT NULL,
`status` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`analytics` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE util (
last_updated_anal int (11) NOT NULL,
last_updated_trans int (11) NOT NULL
);
INSERT INTO util (0, 0);
The gain here is that we will be reading a relatively smaller projections of the original tables -- hopefully, OS level and DB level caches work and they aren't read from slower secondary storage but from faster RAM. This can be a very great gain.
Here is how I updated the two tables (the below is a transaction run by a cron) :
-- TRANSACTION STARTS --
INSERT INTO mv_trans
SELECT id, IF (status = 'COMPLETE', 1, 0) AS status, analysis
FROM transactions JOIN util
ON util.last_updated_trans <= transactions.id
UPDATE util
SET last_updated_trans = sub.m
FROM (SELECT MAX (id) AS m FROM mv_trans) sub;
-- TRANSACTION COMMITS --
-- similar transaction for mv_anal.
(2) Now, I will tackle the selectivity to reduce sequential scan time. I will have to build a b-tree index on user_id, source and id (in this sequence) on mv_anal.
Note: the above can be achieved by just creating index on analytics table but building such an index requires reading big table with 60M rows. My method requires the index building to read only very thin table. Thus, we can rebuild the btree more frequently (to counter the skew problem as the table is append-only).
This is how I make sure the high selectivity is achieved when querying and to counter skewing btree problem.
(3) In PostgreSQL, WITH subqueries are always materialized. I hope similarly for MySQL. Therefore, as the last mile of optimization:
WITH sub_anal AS (
SELECT user_id, source AS referrer, COUNT (id) AS frequency
FROM mv_anal
WHERE user_id = 52094
GROUP BY user_id, source
ORDER BY COUNT (id) DESC
LIMIT 10
)
SELECT sa.referrer, sa.frequency, SUM (status) AS sales
FROM sub_anal AS sa
JOIN mv_anal anal
ON sa.referrer = anal.source AND sa.user_id = anal.user_id
JOIN mv_trans AS trans
ON anal.id = trans.analytics
Late to the party. I think you'll need to load one index into MySQL's cache. The NLJ is probably killing performance. Here's how I see it:
The Path
Your query is simple. It has two tables and the "path" is very clear:
The optimizer should plan on reading the analytics table first.
The optimizer should plan on reading the transactions table second. This is because you are using a LEFT OUTER JOIN. No much discussion on this one.
Besides, the analytics table is 60 million rows and the best path should filter rows as soon as possible on this one.
The Access
Once the path is clear, you need to decide if you want to use an Index Access or a Table Access. Both have pros and cons. However, you want to improve the SELECT performance:
You should choose Index Access.
Avoid hybrid access. Therefore, you should avoid any Table Access (fetches) at all cost. Translation: place all the participating columns in indexes.
The Filtering
Again, you want high performance for the SELECT. Therefore:
You should perform the filtering at the index level, not at the table level.
Row Aggregation
After filtering, the next step is to aggregate rows by GROUP BY analytics.source. This can be improved by placing the source column as the first column in the index.
Optimal Indexes for Path, Access, Filtering, and Aggregation
Considering all the above, you should include all mentioned columns into indexes. The following indexes should improve the response time:
create index ix1_analytics on analytics (user_id, source, id);
create index ix2_transactions on transactions (analytics, status);
These indexes fulfill the "path", the "access", and the "filtering" strategies decribed above.
The Index Cache
Finally -- and this is critical -- load the secondary index into MySQL's memory cache. MySQL is performing a NLJ (Nested Loop Join) -- a 'ref' in MySQL lingo -- and needs to access the second one randomly nearly 200k times.
Unfortunately, I don't know for sure how to load the index into MySQL's cache. The use of FORCE may work, as in:
SELECT
analytics.source AS referrer,
COUNT(analytics.id) AS frequency,
SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions FORCE index (ix2_transactions)
ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10
Make sure you have enough cache space. Here's a short question/answer to figure out: How to figure out if mysql index fits entirely in memory
Good luck! Oh, and post the results.
This question has definitely received a lot of attention so I'm sure all obvious solutions have been tried. I did not see something that addresses the LEFT JOIN in the query, though.
I have noticed that LEFT JOIN statements usually force query planners into hash join which are fast for a small number of results, but terribly slow for a large number of results. As noted in #Rick James' answer, since the join in the original query is on the identity field analytics.id, this will generate large number of results. A hash join will yield terrible performance results. The suggestion below addresses this below without any schema or processing changes.
Since the aggregation is by analytics.source, I would try a query that creates separate aggregations for frequency by source and sales by source and defers the left join until after aggregation is complete. This should allow the indexes to be used best (typically this is a merge join for large data sets).
Here is my suggestion:
SELECT t1.source AS referrer, t1.frequency, t2.sales
FROM (
-- Frequency by source
SELECT a.source, COUNT(a.id) AS frequency
FROM analytics a
WHERE a.user_id=52094
GROUP BY a.source
) t1
LEFT JOIN (
-- Sales by source
SELECT a.source,
SUM(IF(t.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics a
JOIN transactions t
WHERE a.id = t.analytics
AND t.status = 'COMPLETED'
AND a.user_id=52094
GROUP by a.source
) t2
ON t1.source = t2.source
ORDER BY frequency DESC
LIMIT 10
Hope this helps.
Related
I have a problem with the speed of query. Question is similar to this one, but can't find solution. Explain says that MySQL is using: Using where; Using index; Using temporary; Using filesort
Slow query:
select
distinct(`books`.`id`)
from `books`
join `books_genres` on `books_genres`.`book_id` = `books`.`id`
where
`books`.`is_status` = 'active' and `books`.`master_book` = 'true'
and `books_genres`.`genre_id` in(380,381,384,385,1359)
order by
`books`.`livelib_read_num` DESC, `books`.`id` DESC
limit 0,25
#25 rows (0.319 s)
But if I remove order statement from query it is really fast:
select sql_no_cache
distinct(`books`.`id`)
from `books`
join `books_genres` on `books_genres`.`book_id` = `books`.`id`
where
`books`.`is_status` = 'active' and `books`.`master_book` = 'true'
and `books_genres`.`genre_id` in(380,381,384,385,1359)
limit 0,25
#25 rows (0.005 s)
Explain:
+------+-------------+--------------+--------+---------------------------------------------------------------------------------------------------------------------+------------------+---------+--------------------------------+--------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+--------------+--------+---------------------------------------------------------------------------------------------------------------------+------------------+---------+--------------------------------+--------+-----------------------------------------------------------+
| 1 | SIMPLE | books_genres | range | book_id,categorie_id,book_id2,genre_id_book_id | genre_id_book_id | 10 | NULL | 194890 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | books | eq_ref | PRIMARY,is_status,master_book,is_status_master_book,is_status_master_book_indexed,is_status_donor_no_ru_master_book | PRIMARY | 4 | knigogid3.books_genres.book_id | 1 | Using where |
+------+-------------+--------------+--------+---------------------------------------------------------------------------------------------------------------------+------------------+---------+--------------------------------+--------+-----------------------------------------------------------+
2 rows in set (0.00 sec)
My tables:
CREATE TABLE `books_genres` (
`book_id` int(11) DEFAULT NULL,
`genre_id` int(11) DEFAULT NULL,
`sort` tinyint(4) DEFAULT NULL,
UNIQUE KEY `book_id` (`book_id`,`genre_id`),
KEY `categorie_id` (`genre_id`),
KEY `sort` (`sort`),
KEY `book_id2` (`book_id`),
KEY `genre_id_book_id` (`genre_id`,`book_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `books` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`is_status` enum('active','parser','incorrect','extremist','delete','fulldeteled') NOT NULL DEFAULT 'active',
`livelib_book_id` int(11) DEFAULT NULL,
`master_book` enum('true','false') DEFAULT 'true'
PRIMARY KEY (`id`),
KEY `is_status` (`is_status`),
KEY `master_book` (`master_book`),
KEY `livelib_book_id` (`livelib_book_id`),
KEY `livelib_read_num` (`livelib_read_num`),
KEY `is_status_master_book` (`is_status`,`master_book`),
KEY `livelib_book_id_master_book` (`livelib_book_id`,`master_book`),
KEY `is_status_master_book_indexed` (`is_status`,`master_book`,`indexed`),
KEY `is_status_donor_no_ru_master_book` (`is_status`,`donor`,`no_ru`,`master_book`),
KEY `livelib_url_master_book_is_status` (`livelib_url`,`master_book`,`is_status`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Problems with books_genres.
It has no PRIMARY KEY.
All columns are nullable. Will you ever insert a row with any NULLs?
Recommend (after saying NOT NULL on all columns):
PRIMARY KEY(`book_id`,`genre_id`)
INDEX(genre_id, book_id, sort)
and remove all the rest.
I don't see livelib_read_num in the table???
In the other table, remove any indexes that are the exact prefix of some other index.
These might help with speed. (Again, filter out prefix indexes that are redundant.) (These are "covering" indexes, which helps a little.)
books: INDEX(is_status, master_book, livelib_read_num, id)
books: INDEX(livelib_read_num, id, is_status, master_book)
The second index may cause the Optimizer to give preference to ORDER BY. (That is a risky optimization, since it might have to scan the entire index without finding 25 relevant rows.)
SELECT sql_no_cache
`books`.`id`
FROM
`books`
use index(books_idx_is_stat_master_livelib_id)
WHERE
(
1 = 1
AND `books`.`is_status` = 'active'
AND `books`.`master_book` = 'true'
)
AND (
EXISTS (
SELECT
1
FROM
`books_genres`
WHERE
(
`books_genres`.`book_id` = `books`.`id`
)
AND (
`books_genres`.`genre_id` IN (
380, 381, 384, 385, 1359
)
)
)
)
ORDER BY
`books`.`livelib_read_num` DESC,
`books`.`id` DESC LIMIT 0,
25;
25 rows in set (0.07 sec)
I have a query with a JOIN on three tables that is taking a very long time to run. I created an index on one of my tables for the foreign key (user_shared_url_id) and two columns (event_result, enabled) in the WHERE clause, so it's an index of three columns total. There seems to be no different from when I simply use an index of the foreign key (user_shared_url_id). The other two tables are using single column indexes. My users table has about 20,000 rows, but the other two tables are quite large, with ~20 million rows. I can't get a query that takes less than a minute or so to finish. Can anyone think of any potential optimizations I can make to speed this up? Are there other indexes or improvements to my custom index that I can work with?
The tables:
CREATE TABLE `users` (
`user_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`roles` varchar(500) DEFAULT NULL,
`first_name` varchar(200) DEFAULT NULL,
`last_name` varchar(100) DEFAULT NULL,
`org_id` int(11) unsigned NOT NULL,
`user_email` varchar(100) NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`user_id`),
KEY `org_id` (`org_id`),
KEY `status` (`status`),
KEY `org_id_user_id` (`org_id`,`user_id`)
) ENGINE=MyISAM AUTO_INCREMENT=162524 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
CREATE TABLE `user_shared_urls` (
`user_id` int(11) unsigned NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`user_shared_url_id` int(11) NOT NULL AUTO_INCREMENT,
`target_url` text,
PRIMARY KEY (`user_shared_url_id`),
KEY `user_id` (`user_id`),
KEY `user_id_usu_id` (`user_id`,`user_shared_url_id`)
) ENGINE=InnoDB AUTO_INCREMENT=62449105 DEFAULT CHARSET=utf8 |
CREATE TABLE `user_share_events` (
`user_share_event_id` int(11) NOT NULL AUTO_INCREMENT,
`event_result` tinyint(1) unsigned DEFAULT NULL,
`user_shared_url_id` int(11) NOT NULL,
`enabled` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`user_share_event_id`),
KEY `user_shared_url_id` (`user_shared_url_id`),
KEY `usuid_enabled_result` (`user_shared_url_id`,`enabled`,`event_result`)
) ENGINE=InnoDB AUTO_INCREMENT=35067339 DEFAULT CHARSET=utf8 |
My indexes:
CREATE INDEX org_id_user_id ON users(org_id, user_id);
CREATE INDEX user_id_usu_id ON user_shared_urls(user_id, user_shared_url_id);
CREATE INDEX usuid_enabled_result ON user_share_events(user_shared_url_id,enabled,event_result);
My query:
SELECT
users.user_id,
users.user_email "user_email",
users.roles "role",
CONCAT(users.first_name, ' ', users.last_name) "name",
usus.target_url
FROM
users
JOIN user_shared_urls usus ON usus.user_id = users.user_id
JOIN user_share_events uses ON usus.user_shared_url_id = uses.user_shared_url_id
WHERE
users.org_id = 1523
AND
uses.enabled = '1'
AND
uses.event_result = 1
Explain output of the above query:
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
| 1 | SIMPLE | users | ref | PRIMARY,org_id,org_id_user_id | org_id | 4 | const | 1235 | NULL |
| 1 | SIMPLE | usus | ref | PRIMARY,user_id,user_id_usu_id | user_id_usu_id | 4 | luster.users.user_id | 213 | NULL |
| 1 | SIMPLE | uses | ref | user_shared_url_id,user_and_service,result_service_occurred,usuid_enabled_result | user_shared_url_id | 4 | luster.usus.user_shared_url_id | 1 | Using where |
+----+-------------+-------+------+----------------------------------------------------------------------------------+--------------------+---------+--------------------------------+------+-------------+
3 rows in set (0.00 sec)
(Please use SHOW CREATE TABLE; it is more descriptive than DESCRIBE.)
Change that index you added to
INDEX(user_shared_url_id, -- = and used for the JOIN
enabled, -- =
event_result) -- Last (not an = test)
The order of columns in an INDEX is important. Start with the columns that are tested for = (or IS NULL).
Then remove the FORCE INDEX and run the EXPLAIN again.
Are these tables in a 1:many relationship? Tell us which way.
Another comment: If event_result really has only two values (true/false) and you are using NULL for false, then change the query from
uses.event_result IS NOT NULL
to
uses.event_result = 1
The point is that the Optimizer likes to optimize =, but sees NOT NULL as being any of 256 possible values; very far from =. With this query change, your index should work. And even be picked without using FORCE.
For this query:
SELECT u.user_id, u.user_email, u.roles "role",
CONCAT(u.first_name, ' ', u.last_name) "name",
usu.target_url
FROM user_shared_urls usu JOIN
users u
ON usu.user_id = u.user_id JOIN
user_share_events usev
ON usus.user_shared_url_id = usev.user_shared_url_id
WHERE u.org_id = 1010 AND
usev.event_result IS NOT NULL AND
usev.enabled = 1;
Probably the best indexes are:
users(org_id, user_id)
user_shared_urls(user_id, user_shared_url_id)
user_share_events(user_shared_url_id, enabled, event_result)
This assumes that the filtering on org_id is more selective than the other filters.
I have a slow query, without the group by is fast (0.1-0.3 seconds), but with the (required) group by the duration is around 10-15s.
The query joins two tables, events (near 50 million rows) and events_locations (5 million rows).
Query:
SELECT `e`.`id` AS `event_id`,`e`.`time_stamp` AS `time_stamp`,`el`.`latitude` AS `latitude`,`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,`e`.`entity_id` AS `asset_name`, `el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id`AS `entity_type_id`, el.some_id
FROM events e
INNER JOIN events_locations el ON el.event_id = e.id
WHERE 1=1
AND el.other_id = '1'
AND time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
GROUP BY `e`.`event_type_id` , `el`.`some_id` , `el`.`group_alias`;
Table events:
CREATE TABLE `events` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event_type_id` int(11) NOT NULL,
`entity_type_id` int(11) NOT NULL,
`entity_id` varchar(64) NOT NULL,
`alias` varchar(64) NOT NULL,
`time_stamp` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `entity_id` (`entity_id`),
KEY `event_type_idx` (`event_type_id`),
KEY `idx_events_time_stamp` (`time_stamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Table events_locations
CREATE TABLE `events_locations` (
`event_id` bigint(20) NOT NULL,
`latitude` double NOT NULL,
`longitude` double NOT NULL,
`some_id` bigint(20) DEFAULT NULL,
`other_id` bigint(20) DEFAULT NULL,
`time_span` bigint(20) DEFAULT NULL,
`group_alias` varchar(64) NOT NULL,
KEY `some_id_idx` (`some_id`),
KEY `idx_events_group_alias` (`group_alias`),
KEY `idx_event_id` (`event_id`),
CONSTRAINT `fk_event_id` FOREIGN KEY (`event_id`) REFERENCES `events` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The explain:
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
| 1 | SIMPLE | ea | ALL | 'idx_event_id' | NULL | NULL | NULL | 5152834 | 'Using where; Using temporary; Using filesort' |
| 1 | SIMPLE | e | eq_ref | 'PRIMARY,idx_events_time_stamp' | PRIMARY | '8' | 'name.ea.event_id' | 1 | |
+----+-------------+----------------+---------------------------------+---------+---------+-------------------------------------------+----------+------------------------------------------------+
2 rows in set (0.08 sec)
From the doc:
Temporary tables can be created under conditions such as these:
If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.
DISTINCT combined with ORDER BY may require a temporary table.
If you use the SQL_SMALL_RESULT option, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
I already tried:
Create an index by 'el.some_id , el.group_alias'
Decrease the varchar size to 20
Increase the size of sort_buffer_size and read_rnd_buffer_size;
Any suggestions for performance tuning would be much appreciated!
In your case events table has time_span as indexing property. So before joining both tables first select required records from events table for specific date range with required details. Then join the event_location by using table relation properties.
Check your MySql Explain keyword to check how does your approach your table records. It will tell you how much rows are scanned for before selecting required records.
Number of rows that are scanned also involve in query execution time. Use my below logic to reduce the number of rows that are scanned.
SELECT
`e`.`id` AS `event_id`,
`e`.`time_stamp` AS `time_stamp`,
`el`.`latitude` AS `latitude`,
`el`.`longitude` AS `longitude`,
`el`.`time_span` AS `extra`,
`e`.`entity_id` AS `asset_name`,
`el`.`other_id` AS `geozone_id`,
`el`.`group_alias` AS `group_alias`,
`e`.`event_type_id` AS `event_type_id`,
`e`.`entity_type_id` AS `entity_type_id`,
`el`.`some_id` as `some_id`
FROM
(select
`id` AS `event_id`,
`time_stamp` AS `time_stamp`,
`entity_id` AS `asset_name`,
`event_type_id` AS `event_type_id`,
`entity_type_id` AS `entity_type_id`
from
`events`
WHERE
time_stamp >= '2018-01-01'
AND time_stamp <= '2019-06-02'
) AS `e`
JOIN `events_locations` `el` ON `e`.`event_id` = `el`.`event_id`
WHERE
`el`.`other_id` = '1'
GROUP BY
`e`.`event_type_id` ,
`el`.`some_id` ,
`el`.`group_alias`;
The relationship between these tables is 1:1, so, I asked me why is a group by required and I found some duplicated rows, 200 in 50000 rows. So, somehow, my system is inserting duplicates and someone put that group by (years ago) instead of seek of the bug.
So, I will mark this as solved, more or less...
I have two tables like this.
The 'order' table has 21886 rows.
CREATE TABLE `order` (
`id` bigint(20) unsigned NOT NULL,
`reg_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_reg_date` (`reg_date`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
CREATE TABLE `order_detail_products` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) unsigned NOT NULL,
`order_detail_id` int(11) NOT NULL,
`prod_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_order_detail_id` (`order_detail_id`,`prod_id`),
KEY `idx_order_id` (`order_id`,`order_detail_id`,`prod_id`)
) ENGINE=InnoDB AUTO_INCREMENT=572375 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
My question is here.
MariaDB [test]> explain
-> SELECT DISTINCT A.id
-> FROM order A
-> JOIN order_detail_products B ON A.id = B.order_id
-> ORDER BY A.reg_date DESC LIMIT 100, 30;
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
| 1 | SIMPLE | A | index | PRIMARY | idx_reg_date | 8 | NULL | 22151 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | B | ref | idx_order_id | idx_order_id | 8 | bom_20140804.A.id | 2 | Using index; Distinct |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
MariaDB [test]> explain
-> SELECT A.id
-> FROM order A
-> JOIN order_detail_products B ON A.id = B.order_id
-> GROUP BY A.id
-> ORDER BY A.reg_date DESC LIMIT 100, 30;
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
| 1 | SIMPLE | A | index | PRIMARY | idx_reg_date | 8 | NULL | 65 | Using index; Using temporary |
| 1 | SIMPLE | B | ref | idx_order_id | idx_order_id | 8 | bom_20140804.A.id | 2 | Using index |
+------+-------------+-------+-------+---------------+--------------+---------+-------------------+------+------------------------------+
Listed above, two queries returns same result but distinct is too slow(explain too many rows).
What's the difference?
It is usually advised to use DISTINCT instead of GROUP BY, since that is what you actually want, and let the optimizer choose the "best" execution plan. However - no optimizer is perfect. Using DISTINCT the optimizer can have more options for an execution plan. But that also means that it has more options to choose a bad plan.
You write that the DISTINCT query is "slow", but you don't tell any numbers. In my test (with 10 times as many rows on MariaDB 10.0.19 and 10.3.13) the DISTINCT query is like (only) 25% slower (562ms/453ms). The EXPLAIN result is no help at all. It's even "lying". With LIMIT 100, 30 it would need to read at least 130 rows (that's what my EXPLAIN actually schows for GROUP BY), but it shows you 65.
I can't explain the 25% difference in execution time, but it seems that the engine is doing a full table/index scan in any case, and sorts the result before it can skip 100 and select 30 rows.
The best plan would probably be:
Read rows from idx_reg_date index (table A) one by one in descending order
Look if there is a match in the idx_order_id index (table B)
Skip 100 matching rows
Send 30 matching rows
Exit
If there are like 10% of rows in A which have no match in B, this plan would read something like 143 rows from A.
Best I could do to somehow force this plan is:
SELECT A.id
FROM `order` A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30
OFFSET 100
This query returns the same result in 156 ms (3 times faster than GROUP BY). But that is still too slow. And it's probaly still reading all rows in table A.
We can proof that a better plan can exist with a "little" subquery trick:
SELECT A.id
FROM (
SELECT id, reg_date
FROM `order`
ORDER BY reg_date DESC
LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30
OFFSET 100
This query executes in "no time" (~ 0 ms) and returns the same result on my test data. And though it's not 100% reliable, it shows that the optimizer is not doing a good job.
So what are my conclusions:
The optimizer does not always do the best job and sometimes needs help
Even when we know "the best plan", we can not always enforce it
DISTINCT is not always faster than GROUP BY
When no index can be used for all clauses - things are getting quite tricky
Test schema and dummy data:
drop table if exists `order`;
CREATE TABLE `order` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`reg_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `idx_reg_date` (`reg_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
insert into `order`(reg_date)
select from_unixtime(floor(rand(1) * 1000000000)) as reg_date
from information_schema.COLUMNS a
, information_schema.COLUMNS b
limit 218860;
drop table if exists `order_detail_products`;
CREATE TABLE `order_detail_products` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`order_id` bigint(20) unsigned NOT NULL,
`order_detail_id` int(11) NOT NULL,
`prod_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_order_detail_id` (`order_detail_id`,`prod_id`),
KEY `idx_order_id` (`order_id`,`order_detail_id`,`prod_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
insert into order_detail_products(id, order_id, order_detail_id, prod_id)
select null as id
, floor(rand(2)*218860)+1 as order_id
, 0 as order_detail_id
, 0 as prod_id
from information_schema.COLUMNS a
, information_schema.COLUMNS b
limit 437320;
Queries:
SELECT DISTINCT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 562 ms
SELECT A.id
FROM `order` A
JOIN order_detail_products B ON A.id = B.order_id
GROUP BY A.id
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 453 ms
SELECT A.id
FROM `order` A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- 156 ms
SELECT A.id
FROM (
SELECT id, reg_date
FROM `order`
ORDER BY reg_date DESC
LIMIT 1000
) A
WHERE EXISTS (SELECT * FROM order_detail_products B WHERE A.id = B.order_id)
ORDER BY A.reg_date DESC
LIMIT 30 OFFSET 100;
-- ~ 0 ms
I believe your select distinct is slow because you broke the index by matching on another table. In most cases, select distinct will be faster. But in this case, since you are matching on parameters of another table, the index is broken and is much slower.
I have a query with ORDER BY name and the index on name is being ignored.
How can I optimize the query to use an index and get rid of Using temporary from EXPLAIN?
I have log-queries-not-using-indexes enabled and I'm seeing this query thousands of times.
Here's the query:
SELECT l.parent_id, j.id, j.location_id, j.currency, j.frequency, ROUND((j.salary_min + j.salary_max)/2) as salary
FROM jobs AS j
JOIN location AS l
ON j.location_id = l.id
WHERE j.salary_min !=0
AND j.status != 'Rejected'
AND l.published =1
AND date_sub(now(), interval 1 month) <= j.effected_date
ORDER BY l.name
The explain:
+----+-------------+-------+--------+----------------------------------+---------------+---------+----------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------+---------------+---------+----------------------------+------+----------------------------------------------+
| 1 | SIMPLE | j | range | effected_date,location_id,status | effected_date | 9 | NULL | 562 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | l | eq_ref | PRIMARY | PRIMARY | 4 | esljw_joomla.j.location_id | 1 | Using where |
+----+-------------+-------+--------+----------------------------------+---------------+---------+----------------------------+------+----------------------------------------------+
2 rows in set (0.01 sec)
And the table structure:
CREATE TABLE IF NOT EXISTS `jobs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`location_id` varchar(255) NOT NULL,
`status` varchar(255) DEFAULT NULL,
`currency` varchar(255) DEFAULT NULL,
`salary_min` int(11) DEFAULT NULL,
`salary_max` int(11) DEFAULT NULL,
`effected_date` datetime DEFAULT NULL,
`frequency` varchar(255) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
KEY `effected_date` (`effected_date`),
KEY `location_id` (`location_id`),
KEY `status` (`status`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=10130 ;
CREATE TABLE IF NOT EXISTS `location` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) DEFAULT NULL,
`parent_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=304 ;
Add published + id composite index to location table
Move l.published =1 condition to the ON clause
This is what you can to do in your case. But probbaly you'll never get rid of using temporary since you're sorting not by primary table, but by joined table.
It's because you've listed job first. Change the order of the tables, like this:
SELECT l.parent_id, j.id, j.location_id, j.currency, j.frequency, ROUND((j.salary_min + j.salary_max)/2) as salary
FROM location AS l
JOIN jobs AS j ON j.location_id = l.id
WHERE j.salary_min !=0
AND j.status != 'Rejected'
AND l.published =1
AND date_sub(now(), interval 1 month) <= j.effected_date
ORDER BY l.name
Try it and post how it goes.
Many times I've done queries where you have proper primary table as first in query, with good indexes, adding STRAIGHT_JOIN alone can fix a query. So, with your existing criteria, you should be good with your date index and use that as the primary criteria... such as
SELECT STRAIGHT_JOIN
L.Parent_ID,
J.id,
J.location_id,
J.currency,
J.frequency,
ROUND(( J.salary_min + J.salary_max) / 2 ) as Salary
FROM
jobs J
join Location L
on J.Location_ID = L.ID
AND L.Published = 1
WHERE
J.Effected_Date >= date_sub(now(), interval 1 month)
AND J.salary_min != 0
AND J.status != 'Rejected'
ORDER BY
L.name