How to make JOINS faster?

How to make JOINS faster? - mysql

I had this query to start out with:
SELECT DISTINCT spentits.*
FROM `spentits`
WHERE (spentits.user_id IN
(SELECT following_id
FROM `follows`
WHERE `follows`.`follower_id` = '44'
AND `follows`.`accepted` = 1)
OR spentits.user_id = '44')
ORDER BY id DESC
LIMIT 15 OFFSET 0
This query takes 10ms to execute.
But once I add a simple join in:
SELECT DISTINCT spentits.*
FROM `spentits`
LEFT JOIN wishlist_items ON wishlist_items.user_id = 44 AND wishlist_items.spentit_id = spentits.id
WHERE (spentits.user_id IN
(SELECT following_id
FROM `follows`
WHERE `follows`.`follower_id` = '44'
AND `follows`.`accepted` = 1)
OR spentits.user_id = '44')
ORDER BY id DESC
LIMIT 15 OFFSET 0
This execute time increased by 11x. Now it takes around 120ms to execute. What's interesting is that if I remove either the LEFT JOIN clause or the ORDER BY id DESC , the time goes back to 10ms.
I am new to databases so I don't understand this. Why is it that removing either one of these clauses speeds it up 11x ? And how can I keep it as is but make it faster?
I have indexes on spentits.user_id, follows.follower_id, follows.accepted, and on primary ids of each table.
EXPLAIN:
1 PRIMARY spentits index index_spentits_on_user_id PRIMARY 4 NULL 15 Using where; Using temporary
1 PRIMARY wishlist_items ref index_wishlist_items_on_user_id,index_wishlist_items_on_spentit_id index_wishlist_items_on_spentit_id 5 spentit.spentits.id 1 Using where; Distinct
2 SUBQUERY follows index_merge index_follows_on_follower_id,index_follows_on_following_id,index_follows_on_accepted
index_follows_on_follower_id,index_follows_on_accepted 5,2 NULL 566 Using intersect(index_follows_on_follower_id,index_follows_on_accepted); Using where

You should have index also on:
wishlist_items.spentit_id
Because you are joining over that column

The LEFT JOIN is easy to explain: A cross product of all entries against all other entries is made. The conditions of the join (in your case: Take all entries on the left and find fitting ones on the right) are applied afterwards. So if your spentits table is large it will take the server some time. Would suggest you get rid of your subquery and make three joins. Start with the smallest table to avoid big amounts of data.

In the 2nd example the subselect runs for every spentits.user_id.
If you write is like this it will be faster because the subselect runs once:
SELECT DISTINCT spentits.*
FROM `spentits`, (SELECT following_id
FROM `follows`
WHERE `follows`.`follower_id` = '44'
AND `follows`.`accepted` = 1)
OR spentits.user_id = '44') as `follow`
LEFT JOIN wishlist_items ON wishlist_items.user_id = 44 AND wishlist_items.spentit_id = spentits.id
WHERE (spentits.user_id IN
(follow)
ORDER BY id DESC
LIMIT 15 OFFSET 0
As you can see the subselect moved to the FROM-part of the query and creates a imaginary tabel (or view).
This imaginary tabel is a inline-view.
JOINs and inline-views are faster every time than a subselect in the WHERE-part.

Related

Understaing the difference between two queries from performance point

I have this two version of the same query. Both produce same results (164 rows). But the second one takes .5 sec while the 1st one takes 17 sec. Can someone explain what's going on here?
TABLE organizations : 11988 ROWS
TABLE transaction_metas : 58232 ROWS
TABLE contracts_history : 219469 ROWS
# TAKES 17 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
# TAKES .6 SEC
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
left join (select * from `transaction_metas` where contract_token in (select token from `contracts_history` where seller_id = 850)) as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token` and `tm`.`field` = '1'
WHERE `contracts_history`.`seller_id` = '850'
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Explain Results:
First Query: https://prnt.sc/hjtiw6
Second Query: https://prnt.sc/hjtjjg
As based on my debugging of the first query it was clear that left join to transaction_metas table was making it slow, So I tried to limit its rows instead of joining to the full table. It seems to work but I don't understand why.

Join is a set of combinations from rows in your tables. That in mind, in the first query the engine combines all the results to filter just after. In second case one it applies the filter before it tries make the combinations.
The best case would make use of filter in JOIN clause without subquery.
Much like this:
SELECT contracts_history.buyer_id as id, org.name, SUM(transactions_count) as transactions_count, GROUP_CONCAT(DISTINCT(tm.value)) as balancing_authorities
From `contracts_history`
INNER JOIN `organizations` as `org`
ON `org`.`id` = `contracts_history`.`buyer_id`
AND `contracts_history`.`seller_id` = '850'
LEFT JOIN `transaction_metas` as `tm`
ON `tm`.`contract_token` = `contracts_history`.`token`
AND `tm`.`field` = 1
GROUP BY `contracts_history`.`buyer_id` ORDER BY `balancing_authorities` DESC
Note: When you reduce the size of the join tables by filtering with subqueries, it may allow the rows fit into the buffer. Nice trick to small buffer limit.
A Better explication:
https://dev.mysql.com/doc/refman/5.5/en/explain-output.html

MySQL query optimization - getting the last post of all threads

My MySQL query is loading very slow (over 30 secs), I was wondering what tweaks I can make to optimize it.
The query should return the last post with the string "?" of all threads.
SELECT FeedbackId, ParentFeedbackId, PageId, FeedbackTitle, FeedbackText, FeedbackDate
FROM ReaderFeedback AS c
LEFT JOIN (
SELECT max(FeedbackId) AS MaxFeedbackId
FROM ReaderFeedback
WHERE ParentFeedbackId IS NOT NULL
GROUP BY ParentFeedbackId
) AS d ON d.MaxFeedbackId = c.FeedbackId
WHERE ParentFeedbackId IS NOT NULL
AND FeedbackText LIKE '%?%'
GROUP BY ParentFeedbackId
ORDER BY d.MaxFeedbackId DESC LIMIT 50

Before discuss this problem, I have formatted your SQL:
SELECT feedbackid,
parentfeedbackid,
pageid,
feedbacktitle,
feedbacktext,
feedbackdate
FROM readerfeedback AS c
LEFT JOIN (SELECT Max(feedbackid) AS MaxFeedbackId
FROM readerfeedback
WHERE parentfeedbackid IS NOT NULL
GROUP BY parentfeedbackid) AS d
ON d.maxfeedbackid = c.feedbackid
WHERE parentfeedbackid IS NOT NULL
AND feedbacktext LIKE '%?%'
GROUP BY parentfeedbackid
ORDER BY d.maxfeedbackid DESC
LIMIT 50
Since there is an Inefficient query criteria in your SQL:
feedbacktext LIKE '%?%'
Which is not able to take benefit from Index and needs a full scan, I suggest you to add a new field
isQuestion BOOLEAN
to your table, and then add logic in your program to assign this field when insert/update a feedbacktext.
Finally your can query based on this field and take benefit from index.

Firstly your SQL is not valid. The outer Group by is not valid.
According to the SQL the second group by is not needed. I moved the 2 where into inner SQL, as well as the limit, wonder if the following is quicker:
SELECT FeedbackId, ParentFeedbackId, PageId, FeedbackTitle, FeedbackText, FeedbackDate
FROM ReaderFeedback AS c
JOIN (
SELECT max(FeedbackId) AS MaxFeedbackId
FROM ReaderFeedback
WHERE ParentFeedbackId IS NOT NULL
AND FeedbackText LIKE '%?%'
GROUP BY ParentFeedbackId
ORDER BY 1 DESC LIMIT 50
) AS d ON d.MaxFeedbackId = c.FeedbackId
Please have a look at your table structure, see if there is any normalisation be downed for speed concern.

Poor Performance from MySQL JOIN - How to Make Improvements?

A bit of a generic question title but I have the following query:
SELECT t.from_number, COUNT(*) AS calls
FROM t
WHERE t.organisation_id = 999
AND t.direction = 'inbound'
AND t.start_time BETWEEN '2014-03-26' AND NOW()
AND t.from_number != ''
GROUP BY t.from_number
ORDER BY calls DESC LIMIT 20
and it executes in 488ms.
However, aswell as retrieving the data from that table I need to lookup who the number belongs to.
SELECT t.from_number, COUNT(*) AS calls
FROM t
LEFT JOIN n on CONCAT('44', n.number) = t.from_number
WHERE t.organisation_id = 999
AND t.direction = 'inbound'
AND t.start_time BETWEEN '2014-03-26' AND NOW()
AND t.from_number != ''
GROUP BY t.from_number
ORDER BY calls DESC LIMIT 20
As soon as I add the JOIN the query execution time jumps up to anything from 8 - 12 seconds and that's only to find the organisation that the number belongs to, I'd need yet another join after that to retrieve the organisation name from the organisations table.
The cardinality of t and n are > 2,000,000 and ~ 63,000 respectively, and, as you can guess from above, the numbers are stored slightly differently in each:
t stores numbers as 123456789 since the country code (44) is stored in a separate column but n stores numbers as 44123456789 which is why I need to use the CONCAT but I didn't think this would affect performance since it's not in the WHERE clause.
As far as I can tell, I have indexed the important columns in each table.
Are there any suggestions on how I can improve the performance of queries when it comes to these tables?
Update
EXPLAIN output added
id, select_type, table, possible_keys, key, key_len, ref, rows, Extra
1 SIMPLE t index_merge organisation_id,start_time,direction,from_number organisation_id,direction 4,13 NULL 4174 Using intersect(organisation_id,direction); Using where; Using temporary; Using filesort
1 SIMPLE n index NULL number 768 NULL 62759 Using index

The problem is on the JOIN clause:
LEFT JOIN n on CONCAT('44', n.number) = t.from_number
It is joining the tables using the result of the function CONCAT('44', n.number).
Some databases (as Oracle), can create an index based on a funcion, but others (as MySQL) cannot. So, it cannot use any index on table n to make the join.
A solution would be to create a new column on n with the result of the used function and to index it.
You could use a code similar to:
ALTER TABLE n ADD COLUMN extended_number varchar(128) null;
UPDATE n
SET extended_number = CONCAT('44', number);
CREATE INDEX ext_numb_idx
ON n.extended_number;
After this, modify the JOIN clause of the query:
SELECT t.from_number, COUNT(*) AS calls
FROM t
LEFT JOIN n on n.extended_number = t.from_number
WHERE t.organisation_id = 999
AND t.direction = 'inbound'
AND t.start_time BETWEEN '2014-03-26' AND NOW()
AND t.from_number != ''
GROUP BY t.from_number
ORDER BY calls DESC LIMIT 20
Then MySQL will use the newly created index and will execute the query much faster.

How to limiting subquery requests to one?

I was thinking a way to using one query with a subquery instead of using two seperate queries.
But turns out using a subquery is causing multiple requests for each row in result set. Is there a way to limit that count subquery result only one with in a combined query ?
SELECT `ad_general`.`id`,
( SELECT count(`ad_general`.`id`) AS count
FROM (`ad_general`)
WHERE `city` = 708 ) AS count,
FROM (`ad_general`)
WHERE `ad_general`.`city` = '708'
ORDER BY `ad_general`.`id` DESC
LIMIT 15
May be using a join can solve the problem but dunno how ?

SELECT ad_general.id, stats.cnt
FROM ad_general
JOIN (
SELECT count(*) as cnt
FROM ad_general
WHERE city = 708
) AS stats
WHERE ad_general.city = 708
ORDER BY ad_general.id DESC
LIMIT 15;
The explicit table names aren't required, but are used both for clarity and maintainability (the explicit table names will prevent any imbiguities should the schema for ad_general or the generated table ever change).

You can self-join (join the table to itself table) and apply aggregate function to the second.
SELECT `adgen`.`id`, COUNT(`adgen_count`.`id`) AS `count`
FROM `ad_general` AS `adgen`
JOIN `ad_general` AS `adgen_count` ON `adgen_count`.city = 708
WHERE `adgen`.`city` = 708
GROUP BY `adgen`.`id`
ORDER BY `adgen`.`id` DESC
LIMIT 15
However, it's impossible to say what the appropriate grouping is without knowing the structure of the table.

How to optimize query looking for rows where conditional join rows do not exist?

I've got a table of keywords that I regularly refresh against a remote search API, and I have another table that gets a row each each time I refresh one of the keywords. I use this table to block multiple processes from stepping on each other and refreshing the same keyword, as well as stat collection. So when I spin up my program, it queries for all the keywords that don't have a request currently in process, and don't have a successful one within the last 15 mins, or whatever the interval is. All was working fine for awhile, but now the keywords_requests table has almost 2 million rows in it and things are bogging down badly. I've got indexes on almost every column in the keywords_requests table, but to no avail.
I'm logging slow queries and this one is taking forever, as you can see. What can I do?
# Query_time: 20 Lock_time: 0 Rows_sent: 568 Rows_examined: 1826718
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT JOIN `keywords_requests` as KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
AND KeywordsRequest.created > FROM_UNIXTIME(1234551323)
)
WHERE KeywordsRequest.id IS NULL
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;

It seems your most selective index on Keywords is one on KeywordRequest.created.
Try to rewrite query this way:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` as kr
WHERE created > FROM_UNIXTIME(1234567890) /* Happy unix_time! */
) AS KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
AND (KeywordsRequest.status = 'success' OR KeywordsRequest.status = 'active')
AND KeywordsRequest.source_id = '29'
)
WHERE keyword_id IS NULL;
It will (hopefully) hash join two not so large sources.
And Bill Karwin is right, you don't need the GROUP BY or ORDER BY
There is no fine control over the plans in MySQL, but you can try (try) to improve your query in the following ways:
Create a composite index on (keyword_id, status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
keyword_id = id
AND status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
UNION
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN `keywords_requests` kr
ON (
keyword_id = id
AND status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
WHERE keyword_id IS NULL
This ideally should use NESTED LOOPS on your index.
Create a composite index on (status, source_id, created) and make it so:
SELECT Keyword.id, Keyword.keyword
FROM `keywords` as Keyword
LEFT OUTER JOIN (
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'success'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
UNION ALL
SELECT *
FROM `keywords_requests` kr
WHERE
status = 'active'
AND source_id = '29'
AND created > FROM_UNIXTIME(1234567890)
)
ON keyword_id = id
WHERE keyword_id IS NULL
This will hopefully use HASH JOIN on even more restricted hash table.

When diagnosing MySQL query performance, one of the first things you need to analyze is the report from EXPLAIN.
If you learn to read the information EXPLAIN gives you, then you can see where queries are failing to make use of indexes, or where they are causing expensive filesorts, or other performance red flags.
I notice in your query, the GROUP BY is irrelevant, since there will be only one NULL row returned from KeywordRequests. Also the ORDER BY is irrelevant, since you're ordering by a column that will always be NULL due to your WHERE clause. If you remove these clauses, you'll probably eliminate a filesort.
Also consider rewriting the query into other forms, and measure the performance of each. For example:
SELECT k.id, k.keyword
FROM `keywords` AS k
WHERE NOT EXISTS (
SELECT * FROM `keywords_requests` AS kr
WHERE kr.keyword_id = k.id
AND kr.status IN ('success', 'active')
AND kr.source_id = '29'
AND kr.created > FROM_UNIXTIME(1234551323)
);
Other tips:
Is kr.source_id an integer? If so, compare to the integer 29 instead of the string '29'.
Are there appropriate indexes on keyword_id, status, source_id, created? Perhaps even a compound index over all four columns would be best, since MySQL will use only one index per table in a given query.
You did a screenshot of your EXPLAIN output and posted a link in the comments. I see that the query is not using an index from Keywords, which makes sense since you're scanning every row in that table anyway. The phrase "Not exists" indicates that MySQL has optimized the LEFT OUTER JOIN a bit.
I think this should be improved over your original query. The GROUP BY/ORDER BY was probably causing it to save an intermediate data set as a temporary table, and sorting it on disk (which is very slow!). What you'd look for is "Using temporary; using filesort" in the Extra column of EXPLAIN information.
So you may have improved it enough already to mitigate the bottleneck for now.
I do notice that the possible keys probably indicate that you have individual indexes on four columns. You may be able to improve that by creating a compound index:
CREATE INDEX kr_cover ON keywords_requests
(keyword_id, created, source_id, status);
You can give MySQL a hint to use a specific index:
... FROM `keywords_requests` AS kr USE INDEX (kr_cover) WHERE ...

Dunno about MySQL but in MSSQL the lines of attack I would take are:
1) Create a covering index on KeywordsRequest status, source_id and created
2) UNION the results tog et around the OR on KeywordsRequest.status
3) Use NOT EXISTS instead o the Outer Join (and try with UNION instead of OR too)

Try this
SELECT Keyword.id, Keyword.keyword
FROM keywords as Keyword
LEFT JOIN (select * from keywords_requests where source_id = '29' and (status = 'success' OR status = 'active')
AND source_id = '29'
AND created > FROM_UNIXTIME(1234551323)
AND id IS NULL
) as KeywordsRequest
ON (
KeywordsRequest.keyword_id = Keyword.id
)
GROUP BY Keyword.id
ORDER BY KeywordsRequest.created ASC;

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008