I put together a query that retrieves the rows that have no match in a join (as suggested by this SO question):
SELECT cf.f_id
FROM comments_following AS cf
INNER JOIN comments AS c ON cf.c_id = c.id
WHERE NOT EXISTS (
SELECT 1 FROM follows WHERE f_id = cf.f_id
)
Any ideas on how to speed this up? It looks through anywhere from 30k to 200k rows and appears to be using indexes, but the query still times out.
EXPLAIN/DESCRIBE info:
id  select_type         table      type   possible_keys  key      key_len  ref      rows   Extra
1   PRIMARY             c          ALL    PRIMARY        NULL     NULL     NULL     39119
1   PRIMARY             cf         ref    c_id,c_id_2    c_id     8        ...c.id  11     Using where; Using index
2   DEPENDENT SUBQUERY  following  index  NULL           PRIMARY  8        NULL     35612  Using where; Using index
The comments table isn't used anywhere else in the query. Is the join there for filtering? If not, try:
SELECT cf.f_id
FROM comments_following cf
WHERE NOT EXISTS (
SELECT 1 FROM follows WHERE follows.f_id = cf.f_id
)
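By the way, if this generates an error (because follows.f_id does not exist), then that is the problem. In your original query, an unqualified f_id that is not found in follows resolves to the outer cf.f_id, so the condition compares cf.f_id with itself. You would think you have a correlated subquery on follows, but there is not really one.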
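Here is a small sketch of that pitfall. It uses Python's built-in sqlite3 module as a stand-in for MySQL: column-name resolution works the same way here, but nothing about MySQL's optimizer is reproduced. The schema is hypothetical. follows is given a column named followed_id instead of f_id.

```python
import sqlite3


def run(sql):
    """Run `sql` against a throwaway schema in which `follows` has no f_id column."""
    con = sqlite3.connect(":memory:")
    # Hypothetical schema: follows uses followed_id, not f_id.
    con.executescript("""
        CREATE TABLE comments_following (c_id INTEGER, f_id INTEGER);
        CREATE TABLE follows (followed_id INTEGER);
        INSERT INTO comments_following VALUES (1, 10), (2, 20);
        INSERT INTO follows VALUES (10);
    """)
    try:
        return con.execute(sql).fetchall()
    finally:
        con.close()


# f_id is not a column of follows, so it resolves to the OUTER cf.f_id.
# The test becomes cf.f_id = cf.f_id, which is true whenever cf.f_id is not NULL,
# so NOT EXISTS is false for every row and the query silently returns nothing.
unqualified = run("""
    SELECT cf.f_id FROM comments_following AS cf
    WHERE NOT EXISTS (SELECT 1 FROM follows WHERE f_id = cf.f_id)
""")

# Qualifying the column turns the silent mistake into an error.
try:
    run("""
        SELECT cf.f_id FROM comments_following AS cf
        WHERE NOT EXISTS (SELECT 1 FROM follows WHERE follows.f_id = cf.f_id)
    """)
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(unqualified)  # []
print(qualified_error)
```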
Or the left outer join version:
SELECT cf.f_id
FROM comments_following cf left outer join
follows f
on f.f_id = cf.f_id
where f.f_id is null
Having an index on follows(f_id) should make both these versions run faster.
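Here is a minimal sketch, again using an in-memory SQLite database as a stand-in and made-up rows. It checks two things. First, the NOT EXISTS and LEFT JOIN ... IS NULL versions return the same rows, including a NULL f_id. Second, the plan for the join probes follows through the new index. The index name idx_follows_f_id is hypothetical, and SQLite's plan only illustrates the idea. It says nothing about MySQL timings.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Made-up rows; only the table and column names come from the question.
con.executescript("""
    CREATE TABLE comments_following (c_id INTEGER, f_id INTEGER);
    CREATE TABLE follows (f_id INTEGER);
    INSERT INTO comments_following VALUES (1, 10), (1, 20), (2, 30), (3, NULL);
    INSERT INTO follows VALUES (10), (30);
    CREATE INDEX idx_follows_f_id ON follows (f_id);  -- hypothetical index name
""")

not_exists_sql = """
    SELECT cf.f_id FROM comments_following cf
    WHERE NOT EXISTS (SELECT 1 FROM follows WHERE follows.f_id = cf.f_id)
"""
left_join_sql = """
    SELECT cf.f_id FROM comments_following cf
    LEFT OUTER JOIN follows f ON f.f_id = cf.f_id
    WHERE f.f_id IS NULL
"""

# Sort by repr so rows containing NULL (None) can be ordered alongside int rows.
not_exists_rows = sorted(con.execute(not_exists_sql).fetchall(), key=repr)
left_join_rows = sorted(con.execute(left_join_sql).fetchall(), key=repr)

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = con.execute("EXPLAIN QUERY PLAN " + left_join_sql).fetchall()
uses_index = any("idx_follows_f_id" in str(row[-1]) for row in plan)

print(not_exists_rows)  # [(20,), (None,)]
print(left_join_rows)   # [(20,), (None,)]
print(uses_index)
```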
A LEFT JOIN is sometimes faster than a WHERE NOT EXISTS subquery; try:
SELECT cf.f_id
FROM comments_following AS cf
INNER JOIN comments AS c ON cf.c_id = c.id
LEFT JOIN follows AS f ON f.f_id = cf.f_id
WHERE f.f_id IS NULL
The answer to this problem was to place a second index on follows.f_id.
I'm still learning my way around indexes: shouldn't the JOINs in the outer query be using the primary key index? Do indexes not work in combination with subqueries? Thanks!
SELECT SQL_BIG_RESULT
I.item_group_id
FROM
(
SELECT SQL_BIG_RESULT
MAX(ITM.id) as max_id
FROM a_movements M
JOIN a_items_to_movements ITM ON ITM.movement_id = M.id -- Index used
WHERE M.warehouse_id IN (...) -- Index used
GROUP BY ITM.item_id
ORDER BY NULL
) X
JOIN a_items_to_movements ITM ON ITM.id = X.max_id -- Index not used
JOIN a_movements M ON M.id = ITM.movement_id
AND M.direction = 0
AND M.settled IS NOT NULL
JOIN a_items I ON I.id = ITM.item_id -- Index not used
GROUP BY I.item_group_id
ORDER BY NULL
EDIT: attached EXPLAIN output here: https://imgur.com/PdO3mIo
How to fetch rows where a joined subquery is null?
SELECT *
FROM bank_recon b
LEFT JOIN (
SELECT o.bank_recon_id
FROM data_voucher_ocr_bank o
LEFT JOIN data_voucher v ON v.id=o.data_voucher_id
WHERE v.is_ocr_verified=1
LIMIT 1
) s ON s.bank_recon_id=b.id
WHERE s IS NULL
Update
When I run the subquery on its own, a row is returned or not depending on whether is_ocr_verified is set:
SELECT o.bank_recon_id
FROM data_voucher_ocr_bank o
LEFT JOIN data_voucher v ON v.id=o.data_voucher_id
WHERE v.is_ocr_verified=1 && o.bank_recon_id=320062
But when I use this query, the row is returned no matter what!?
SELECT b.txt, b.amount
FROM bank_recon b
LEFT JOIN (
SELECT o.bank_recon_id
FROM data_voucher_ocr_bank o
LEFT JOIN data_voucher v ON v.id=o.data_voucher_id
WHERE v.is_ocr_verified=1
LIMIT 1
) s ON s.bank_recon_id=b.id
WHERE b.id=320062 && s.bank_recon_id IS NULL
Specify a column in your WHERE clause, not just the subquery alias:
WHERE s.bank_recon_id IS NULL
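The LIMIT 1 inside the derived table may also explain the "everything is fetched" result. With LIMIT 1, s holds at most one verified row, so the anti-join can exclude at most one bank_recon row. Below is a sketch with SQLite and made-up rows that keeps the question's query shape. In the sample data, recon ids 1 and 2 are verified and 3 is not.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical data: recon rows 1 and 2 have a verified voucher, row 3 does not.
con.executescript("""
    CREATE TABLE bank_recon (id INTEGER PRIMARY KEY, txt TEXT, amount REAL);
    CREATE TABLE data_voucher (id INTEGER PRIMARY KEY, is_ocr_verified INTEGER);
    CREATE TABLE data_voucher_ocr_bank (bank_recon_id INTEGER, data_voucher_id INTEGER);
    INSERT INTO bank_recon VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30);
    INSERT INTO data_voucher VALUES (100, 1), (200, 1), (300, 0);
    INSERT INTO data_voucher_ocr_bank VALUES (1, 100), (2, 200), (3, 300);
""")


def unverified_ids(limit_clause):
    # limit_clause is only ever one of our own constants, so the f-string is safe here.
    sql = f"""
        SELECT b.id
        FROM bank_recon b
        LEFT JOIN (
            SELECT o.bank_recon_id
            FROM data_voucher_ocr_bank o
            LEFT JOIN data_voucher v ON v.id = o.data_voucher_id
            WHERE v.is_ocr_verified = 1
            {limit_clause}
        ) s ON s.bank_recon_id = b.id
        WHERE s.bank_recon_id IS NULL
        ORDER BY b.id
    """
    return [row[0] for row in con.execute(sql)]


with_limit = unverified_ids("LIMIT 1")  # s keeps ONE arbitrary verified row
without_limit = unverified_ids("")      # s keeps every verified row

print(with_limit)     # two ids, depending on which verified row LIMIT 1 kept
print(without_limit)  # [3]
```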
An anti-join written as LEFT JOIN ... IS NULL (which is what you are trying to apply here) is a technique we use when the straightforward NOT IN or NOT EXISTS has performance issues in a particular DBMS.
Provided data_voucher_ocr_bank.bank_recon_id cannot be null, we can use:
SELECT txt, amount
FROM bank_recon
WHERE id NOT IN
(
SELECT bank_recon_id
FROM data_voucher_ocr_bank
WHERE data_voucher_id IN (SELECT id FROM data_voucher WHERE is_ocr_verified = 1)
);
(Otherwise we'd add AND bank_recon_id IS NOT NULL or use NOT EXISTS instead.)
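Here is a sketch of why that NULL caveat matters. It uses SQLite and hypothetical rows, and leaves out the data_voucher filter to keep it short. A single NULL in the subquery makes NOT IN return nothing. Adding the IS NOT NULL filter, or switching to NOT EXISTS, gives the expected rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE bank_recon (id INTEGER PRIMARY KEY);
    CREATE TABLE data_voucher_ocr_bank (bank_recon_id INTEGER);
    INSERT INTO bank_recon VALUES (1), (2), (3);
    -- The NULL bank_recon_id is what breaks NOT IN.
    INSERT INTO data_voucher_ocr_bank VALUES (1), (NULL);
""")


def ids(sql):
    return [row[0] for row in con.execute(sql)]


# id NOT IN (1, NULL) is FALSE or UNKNOWN for every id, never TRUE.
not_in = ids("SELECT id FROM bank_recon WHERE id NOT IN "
             "(SELECT bank_recon_id FROM data_voucher_ocr_bank) ORDER BY id")

# Filtering the NULLs out of the subquery restores the expected result.
not_in_filtered = ids("SELECT id FROM bank_recon WHERE id NOT IN "
                      "(SELECT bank_recon_id FROM data_voucher_ocr_bank "
                      " WHERE bank_recon_id IS NOT NULL) ORDER BY id")

# NOT EXISTS is not affected by the NULL row.
not_exists = ids("SELECT id FROM bank_recon b WHERE NOT EXISTS "
                 "(SELECT 1 FROM data_voucher_ocr_bank o "
                 " WHERE o.bank_recon_id = b.id) ORDER BY id")

print(not_in)           # []
print(not_in_filtered)  # [2, 3]
print(not_exists)       # [2, 3]
```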
Can someone tell me how to write the following SQL
SELECT url_source_wp.url
FROM url_source_wp
WHERE url_source_wp.id NOT IN (
SELECT url_done_wp.url_source_wp
FROM url_done_wp
WHERE (url_done_wp.url_group = 4) AND (hash IS NULL)) LIMIT 50;
using a join?
I tried:
SELECT url_source_wp.url
FROM url_source_wp
LEFT OUTER JOIN url_done_wp ON url_source_wp.id = url_done_wp.url_source_wp
WHERE url_done_wp.url_group = 4 AND url_source_wp.hash is NULL LIMIT 50
But the result is not the same.
The problem is that the first query is very, very slow.
I believe that you are looking for something like this:
SELECT url_source_wp.url
FROM url_source_wp
LEFT OUTER JOIN url_done_wp
ON url_source_wp.id = url_done_wp.url_source_wp AND url_done_wp.url_group = 4 AND hash IS NULL
WHERE url_done_wp.url_source_wp IS NULL
LIMIT 50
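To see why the first attempt returned different rows, here is a sketch that compares three queries: the NOT IN query, this LEFT JOIN with the filters in ON, and a variant with the filters in WHERE. It uses SQLite and invented data. It assumes hash is a column of url_done_wp, since the schema is not shown in the question. LIMIT 50 is dropped so the full results can be compared.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumption: hash lives on url_done_wp; the rows are made up.
con.executescript("""
    CREATE TABLE url_source_wp (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE url_done_wp (url_source_wp INTEGER, url_group INTEGER, hash TEXT);
    INSERT INTO url_source_wp VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO url_done_wp VALUES (1, 4, NULL), (2, 4, 'x'), (3, 5, NULL);
""")


def urls(sql):
    return [row[0] for row in con.execute(sql)]


not_in = urls("""
    SELECT url FROM url_source_wp
    WHERE id NOT IN (SELECT url_source_wp FROM url_done_wp
                     WHERE url_group = 4 AND hash IS NULL)
    ORDER BY id
""")

# Filters in ON only decide which url_done_wp rows count as a match.
join_on = urls("""
    SELECT s.url FROM url_source_wp s
    LEFT OUTER JOIN url_done_wp d
      ON s.id = d.url_source_wp AND d.url_group = 4 AND d.hash IS NULL
    WHERE d.url_source_wp IS NULL
    ORDER BY s.id
""")

# Filters in WHERE run after the join and keep only the matching rows.
join_where = urls("""
    SELECT s.url FROM url_source_wp s
    LEFT OUTER JOIN url_done_wp d ON s.id = d.url_source_wp
    WHERE d.url_group = 4 AND d.hash IS NULL
    ORDER BY s.id
""")

print(not_in)      # ['b', 'c', 'd']
print(join_on)     # ['b', 'c', 'd']
print(join_where)  # ['a']
```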
Shouldn't you just negate the two conditions in the WHERE clause?
Since you used a subquery with NOT IN, I assume you're trying to get all the url_source_wp records whose ids are referenced in the url_done_wp table by the FK url_source_wp, and which do NOT have url_group = 4 and have a hash column that is NOT NULL.
An INNER JOIN should be fine too.
So it should be:
SELECT url_source_wp.url
FROM url_source_wp
INNER JOIN url_done_wp ON url_source_wp.id = url_done_wp.url_source_wp
WHERE url_done_wp.url_group != 4 AND url_source_wp.hash IS NOT NULL LIMIT 50
I've been puzzling over this problem in MySQL 5.0.51a for quite a while now:
When I use a LEFT JOIN to join a table AND use a column of the joined table in the WHERE clause, MySQL fails to use the joined table's primary index for the JOIN; even FORCE INDEX (PRIMARY) fails.
If no column of the joined table is in the WHERE clause, everything works fine.
If the GROUP BY is removed, the index is also used.
Yet I need both of them.
Faulty (in my case up to 1000 seconds of execution time):
SELECT *
FROM tbl_contract co
LEFT JOIN tbl_customer cu ON cu.customer_id = co.customer_id
WHERE cu.marketing_allowed = 1 AND co.marketing_allowed = 1
GROUP BY cu.id
ORDER BY cu.name ASC
Working, but not solving my problems:
SELECT *
FROM tbl_contract co
LEFT JOIN tbl_customer cu ON cu.customer_id = co.customer_id
GROUP BY co.id
Table structures (simplified, as the real tables are more complex):
tbl_contract:
id: INT(11) PRIMARY
customer_id: INT(11)
marketing_allowed: TINYINT(1)
tbl_customer:
customer_id: INT(11) PRIMARY
marketing_allowed: TINYINT(1)
MySQL's EXPLAIN lists PRIMARY as a possible key for the join, but doesn't use it.
There is one workaround:
SELECT (...)
HAVING cu.marketing_allowed = 1
This solves the problem, BUT we use the query in other contexts where we can only select ONE column in the whole statement, and HAVING needs the marketing_allowed column to be in the SELECT list.
I also noticed that running ANALYZE TABLE on the tables in question makes MySQL 5.5.8 on my local system do the right thing, but I cannot always ensure that ANALYZE has been run right before the statement. Anyway, this workaround does not work under MySQL 5.0.51a on our production server. :(
Is there a special rule in MySQL that I didn't notice? Why are indexes not used for a LEFT JOIN when columns of the joined table appear in the WHERE clause? Why can't I force them?
Thx in advance,
René
[EDIT]
Thanks to some replies I was able to optimize the query using an INNER JOIN. Unfortunately, even though the query looks absolutely fine, MySQL still refuses to use an index when there is an ORDER BY clause, as I found out:
SELECT *
FROM tbl_contract co
INNER JOIN tbl_customer cu ON cu.customer_id = co.customer_id AND cu.marketing_allowed = 1
WHERE cu.marketing_allowed = 1
ORDER BY cu.name ASC
If you leave out the ORDER BY, MySQL uses the index correctly.
I have removed the GROUP BY as it has no relevance in the example.
[EDIT2]
Forcing indexes does not help either. So the question is: why does MySQL not use an index for the join, when the ORDER BY is executed AFTER joining and after the WHERE clause has reduced the result set? This should usually not influence the join...
I'm not sure I understand what you are asking, but
SELECT *
FROM tbl_contract co
LEFT JOIN tbl_customer cu ON cu.customer_id = co.customer_id
WHERE cu.marketing_allowed = 1 AND co.marketing_allowed = 1
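will not actually do an outer join. For a contract without a matching customer, cu.marketing_allowed is NULL, so the WHERE condition cu.marketing_allowed = 1 discards that row, and the query behaves like an INNER JOIN.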
You probably meant to use:
SELECT *
FROM tbl_contract co
LEFT JOIN tbl_customer cu
ON cu.customer_id = co.customer_id
AND cu.marketing_allowed = 1
WHERE co.marketing_allowed = 1
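Here is a sketch with SQLite and made-up rows that shows the difference. With the customer filter in WHERE, the LEFT JOIN returns the same rows as an INNER JOIN. Moving the filter into ON keeps contracts whose customer is missing or has opted out, with NULLs on the customer side.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Simplified versions of the two tables from the question, with made-up rows.
con.executescript("""
    CREATE TABLE tbl_customer (customer_id INTEGER PRIMARY KEY, marketing_allowed INTEGER);
    CREATE TABLE tbl_contract (id INTEGER PRIMARY KEY, customer_id INTEGER,
                               marketing_allowed INTEGER);
    INSERT INTO tbl_customer VALUES (1, 1), (2, 0);
    -- Contract 12's customer opted out; contract 13's customer row is missing.
    INSERT INTO tbl_contract VALUES (11, 1, 1), (12, 2, 1), (13, 3, 1);
""")


def rows(sql):
    return con.execute(sql).fetchall()


# Filter on the joined table in WHERE: the NULL-extended rows are thrown away.
where_version = rows("""
    SELECT co.id, cu.customer_id FROM tbl_contract co
    LEFT JOIN tbl_customer cu ON cu.customer_id = co.customer_id
    WHERE cu.marketing_allowed = 1 AND co.marketing_allowed = 1
    ORDER BY co.id
""")

# The same filters with a plain INNER JOIN.
inner_version = rows("""
    SELECT co.id, cu.customer_id FROM tbl_contract co
    INNER JOIN tbl_customer cu ON cu.customer_id = co.customer_id
    WHERE cu.marketing_allowed = 1 AND co.marketing_allowed = 1
    ORDER BY co.id
""")

# Customer filter moved into ON: every qualifying contract survives.
on_version = rows("""
    SELECT co.id, cu.customer_id FROM tbl_contract co
    LEFT JOIN tbl_customer cu
      ON cu.customer_id = co.customer_id AND cu.marketing_allowed = 1
    WHERE co.marketing_allowed = 1
    ORDER BY co.id
""")

print(where_version)  # [(11, 1)]
print(on_version)     # [(11, 1), (12, None), (13, None)]
```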
I had the same trouble: the MySQL optimizer was not using indexes for a JOIN with conditions. I changed my SQL statement from a JOIN to subqueries:
SELECT
t1.field1,
t1.field2,
...
-- correlated scalar subquery: must return at most one row per t1 row
(SELECT
t2.field3
FROM table2 t2
WHERE t2.fieldX=t1.fieldX
) AS field3,
(SELECT
t2.field4
FROM table2 t2
WHERE t2.fieldX=t1.fieldX
) AS field4
FROM table1 t1
WHERE t1.fieldZ='valueZ'
ORDER BY t1.sortedField
This query is much more complicated, but since indexes are used, it is also much faster.
You could also use STRAIGHT_JOIN, but performance is better with the query above. Here's a comparison on my DB with 100k rows in table1 and 20k in table2:
0.00s using the query above
0.10s using STRAIGHT_JOIN
0.30s using JOIN
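Here is a sketch of that rewrite with SQLite, using the placeholder names from the answer and invented rows. The scalar subqueries behave like a LEFT JOIN: a missing match gives NULL. Each subquery must also return at most one row, so fieldX is declared UNIQUE in table2 here to guarantee that. The sketch only checks that the results agree. The timing difference is MySQL-specific and is not reproduced.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# table1/table2 and the field names are the answer's placeholders; rows are made up.
con.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, fieldX INTEGER,
                         fieldZ TEXT, sortedField INTEGER);
    CREATE TABLE table2 (fieldX INTEGER UNIQUE, field3 TEXT, field4 TEXT);
    INSERT INTO table1 VALUES (1, 7, 'valueZ', 2), (2, 8, 'valueZ', 1),
                              (3, 9, 'other', 3), (4, 10, 'valueZ', 4);
    INSERT INTO table2 VALUES (7, 'p', 'q'), (8, 'r', 's');
""")

# One correlated scalar subquery per fetched column, as in the answer.
subquery_rows = con.execute("""
    SELECT t1.id,
           (SELECT t2.field3 FROM table2 t2 WHERE t2.fieldX = t1.fieldX) AS field3,
           (SELECT t2.field4 FROM table2 t2 WHERE t2.fieldX = t1.fieldX) AS field4
    FROM table1 t1
    WHERE t1.fieldZ = 'valueZ'
    ORDER BY t1.sortedField
""").fetchall()

# The equivalent join form: LEFT JOIN, because an unmatched subquery yields NULL.
join_rows = con.execute("""
    SELECT t1.id, t2.field3, t2.field4
    FROM table1 t1
    LEFT JOIN table2 t2 ON t2.fieldX = t1.fieldX
    WHERE t1.fieldZ = 'valueZ'
    ORDER BY t1.sortedField
""").fetchall()

print(subquery_rows)  # [(2, 'r', 's'), (1, 'p', 'q'), (4, None, None)]
```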
Have you tried putting multiple conditions in the JOIN clause?
SELECT *
FROM tbl_contract co
LEFT JOIN tbl_customer cu ON cu.customer_id = co.customer_id AND cu.marketing_allowed = 1
WHERE co.marketing_allowed = 1
So... which one is faster? (NULL values are not an issue, and the columns are indexed.)
SELECT * FROM A
JOIN B b ON b.id = a.id
JOIN C c ON c.id = b.id
WHERE A.id = '12345'
Using Left Joins:
SELECT * FROM A
LEFT JOIN B ON B.id=A.bid
LEFT JOIN C ON C.id=B.cid
WHERE A.id = '12345'
Here is the actual query; both versions return the same result.
Implicit-join query (0.2693 sec):
EXPLAIN EXTENDED SELECT *
FROM friend_events, zcms_users, user_events, events
WHERE friend_events.userid = '13006'
AND friend_events.state =0
AND UNIX_TIMESTAMP( friend_events.t ) >=1258923485
AND friend_events.xid = user_events.id
AND user_events.eid = events.eid
AND events.active =1
AND zcms_users.id = user_events.userid
EXPLAIN
id  select_type  table          type    possible_keys        key      key_len  ref                   rows   Extra
1   SIMPLE       zcms_users     ALL     PRIMARY              NULL     NULL     NULL                  43082
1   SIMPLE       user_events    ref     PRIMARY,eid,userid   userid   4        zcms_users.id         1
1   SIMPLE       events         eq_ref  PRIMARY,active       PRIMARY  4        user_events.eid       1      Using where
1   SIMPLE       friend_events  eq_ref  PRIMARY              PRIMARY  8        user_events.id,const  1      Using where
LEFT JOIN query (0.0393 sec):
EXPLAIN EXTENDED SELECT *
FROM `friend_events`
LEFT JOIN `user_events` ON user_events.id = friend_events.xid
LEFT JOIN `events` ON user_events.eid = events.eid
LEFT JOIN `zcms_users` ON user_events.userid = zcms_users.id
WHERE (
events.active =1
)
AND (
friend_events.userid = '13006'
)
AND (
friend_events.state =0
)
AND (
UNIX_TIMESTAMP( friend_events.t ) >=1258923485
)
EXPLAIN
id  select_type  table          type    possible_keys   key      key_len  ref                 rows   Extra
1   SIMPLE       friend_events  ALL     PRIMARY         NULL     NULL     NULL                53113  Using where
1   SIMPLE       user_events    eq_ref  PRIMARY,eid     PRIMARY  4        friend_events.xid   1      Using where
1   SIMPLE       zcms_users     eq_ref  PRIMARY         PRIMARY  4        user_events.userid  1
1   SIMPLE       events         eq_ref  PRIMARY,active  PRIMARY  4        user_events.eid     1      Using where
It depends; run them both to find out; then run an 'explain select' for an explanation.
The actual performance difference may range from "virtually non-existent" to "pretty significant" depending on how many rows in A with id='12345' have no matching records in B and C.
Update (based on posted query plans)
When you use INNER JOIN it doesn't matter (results-wise, not performance-wise) which table you start with, so the optimizer tries to pick the one it thinks would perform best. It seems you have indexes on all appropriate PK / FK columns. Either you don't have an index on friend_events.userid, or there are too many records with userid = '13006' and the index is not being used. Either way, the optimizer picks the table with fewer rows as the "base", in this case zcms_users.
When you use LEFT JOIN it does matter (results-wise) which table to start with; thus friend_events is picked. Now why it takes less time that way I'm not quite sure; I'm guessing friend_events.userid condition helps. If you were to add an index (is it really varchar, btw? not numeric?) on that, your INNER JOIN might behave differently (and become faster) as well.
The INNER JOIN has to do an extra check to remove any records from A that don't have matching records in B and C. Depending on the number of records initially returned from A it COULD have an impact.
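Here is a sketch of that extra check with SQLite and hypothetical rows. For both queries it uses the A.bid / B.cid join columns from the LEFT JOIN version. When the B row has no matching C row, the INNER JOIN version drops the A row entirely. The LEFT JOIN version keeps it, with NULLs for C.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Made-up tables following the LEFT JOIN query's shape: A.bid -> B.id, B.cid -> C.id.
con.executescript("""
    CREATE TABLE A (id TEXT PRIMARY KEY, bid INTEGER);
    CREATE TABLE B (id INTEGER PRIMARY KEY, cid INTEGER);
    CREATE TABLE C (id INTEGER PRIMARY KEY);
    INSERT INTO A VALUES ('12345', 1);
    INSERT INTO B VALUES (1, 99);  -- the B row exists, but C has no id 99
""")

inner_rows = con.execute("""
    SELECT A.id, B.id, C.id FROM A
    JOIN B ON B.id = A.bid
    JOIN C ON C.id = B.cid
    WHERE A.id = '12345'
""").fetchall()

left_rows = con.execute("""
    SELECT A.id, B.id, C.id FROM A
    LEFT JOIN B ON B.id = A.bid
    LEFT JOIN C ON C.id = B.cid
    WHERE A.id = '12345'
""").fetchall()

print(inner_rows)  # []
print(left_rows)   # [('12345', 1, None)]
```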
LEFT JOIN shows all data from A and shows data from B/C only if the condition is true. An INNER JOIN, on the other hand, has to do some extra checking on both tables, so I guess that explains why LEFT JOIN is faster.
Use EXPLAIN to see the query plan. It's probably the same plan for both cases, so I doubt it makes much difference, assuming there are no rows that don't match. But these are two different queries so it really doesn't make sense to compare them - you should just use the correct one.
Why not use the "INNER JOIN" keyword instead of "LEFT JOIN"?