How can I merge these two left joins into a single one? - mysql

How can I merge these two left joins: http://sqlfiddle.com/#!9/1d2954/69/0
SELECT d.`id`, (adcount + bdcount)
FROM `docs` d
LEFT JOIN
(
SELECT da.`doc_id`, COUNT(da.`doc_id`) AS adcount FROM `docs_scod_a` da
INNER JOIN `scod_a` a ON a.`id` = da.`scod_a_id`
WHERE a.`ver_a` IN ('AA', 'AB')
GROUP BY da.`doc_id`
) ad ON ad.`doc_id` = d.`id`
LEFT JOIN
(
SELECT db.`doc_id`, COUNT(db.`doc_id`) AS bdcount FROM `docs_scod_b` db
INNER JOIN `scod_b` b ON b.`id` = db.`scod_b_id`
WHERE b.`ver_b` IN ('BA', 'BB')
GROUP BY db.`doc_id`
) bd ON bd.`doc_id` = d.`id`
to be a Single left join just to ease its use in my code, while making it no less slower?

Let me first emphasize that your method of doing the calculation is the better method. You have two separate dimensions and aggregating them separately is often the most efficient method for doing the calculation. It is also the most scalable method.
That said, your query should be equivalent to this version:
SELECT d.id,
count(distinct a.id),
count(distinct b.id)
FROM docs d left join
docs_scod_a da
ON da.doc_id = d.id LEFT JOIN
scod_a a
ON a.id = da.scod_a_id AND a.ver_a IN ('AA', 'AB') LEFT JOIN
docs_scod_b db
ON db.doc_id = d.id LEFT JOIN
scod_b b
ON b.id = db.scod_b_id AND b.ver_b IN ('BA', 'BB')
GROUP BY d.id
ORDER BY d.id;
This query is more expensive than it looks, because the COUNT(DISTINCT) incurs additional overhead compared to COUNT().
And here is the SQL Fiddle.
And, because LEFT JOIN can return NULL values, your query is more correctly written as:
SELECT d.`id`, COALESCE(adcount, 0) + COALESCE(bdcount, 0)
If you were having problems with the results, this small change might fix those problems.

Performance may be a big problem, depending on sizes of each table. It appears to be an "inflate-deflate" situation since it first "inflates" the number of rows via JOIN, then "deflates" via GROUP BY. The formulation below avoids inflation-deflation.
But first, if I understand this subquery correctly, this
SELECT da.`doc_id`, COUNT(da.`doc_id`) AS adcount
FROM `docs_scod_a` da
INNER JOIN `scod_a` a ON a.`id` = da.`scod_a_id`
WHERE a.`ver_a` IN ('AA', 'AB')
GROUP BY da.`doc_id`
can be rewritten as
SELECT `doc_id`,
( SELECT COUNT(*)
FROM `scod_a`
WHERE `id` = da.`scod_a_id`
AND `ver_a` IN ('AA', 'AB')
) AS adcount
FROM `docs_scod_a` AS da
If that is correct, then the entire query becomes
SELECT d.id,
( SELECT COUNT(*)
FROM docs_scod_a ds
JOIN scod_a s ON s.id = ds.scod_a_id
WHERE ds.doc_id = d.id
AND s.ver_a IN ('AA', 'AB')
) +
( SELECT COUNT(*)
FROM docs_scod_b ds
JOIN scod_b s ON s.id = ds.scod_b_id
WHERE ds.doc_id = d.id
AND s.ver_b IN ('BA', 'BB')
)
FROM docs AS d
Which needs these indexes:
docs_scod_a: (doc_id, scod_a_id), (scod_a_id, doc_id)
docs_scod_b: (doc_id, scod_b_id), (scod_b_id, doc_id)
scod_a: (ver_a, id)
scod_b: (ver_b, id)
docs: -- presumably has PRIMARY KEY(id)
Note the lack of GROUP BY.
docs_scod_a smells like a many-to-many mapping table. I recommend you follow the tips here.
(No COALESCE is needed since COUNT will simply return zero.)
(I don't know whether my version is better (faster or whatever) than Gordon's, nor whether my indexes will help his formulation.)

Related

FULL table scan in LEFT JOIN using OR

How can this query be optimized to avoid the full table scan described below?
I've got a slow query that's taking approximately 15 seconds to return.
Let's get this part out of the way - I've confirmed all indexes are there.
When I run EXPLAIN, it shows that there is a FULL TABLE scan ran on the crosswalk table (the index for fromQuestionCategoryJoinID is not used, even if I attempt to force) - if I remove either of the fields and the OR, the index is used and query completes in milliseconds.
SELECT c.id, c.name, GROUP_CONCAT(DISTINCT tags.externalDisplayID SEPARATOR ', ') AS tags
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id OR cw.fromQuestionCategoryJoinID = qcatjsub.id)
-- index used if I remove OR, eg.: LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id)
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
GROUP BY c.id
ORDER BY c.name, tags.externalDisplayID;
Split the query into two queries for each part of the OR. Then combine them with UNION.
SELECT id, name, GROUP_CONCAT(DISTINCT externalDisplayID SEPARATOR ', ') AS tags
FROM (
SELECT c.id, c.name, tags.externalDisplayID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN crosswalk cw on cw.fromQuestionCategoryJoinID = qcatj.id
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
UNION ALL
SELECT c.id, c.name, tags.externalDisplayID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw on cw.fromQuestionCategoryJoinID = qcatjsub.id
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
) AS x
GROUP BY x.id
ORDER BY x.name
Also, it doesn't make sense to include externalDisplayID in ORDER BY, because that will order by its value from a random row in the group. You could put ORDER BY externalDisplayID in the GROUP_CONCAT() arguments if that's what you want.
There is a second inefficiency going on here. I call it "explode-implode". First a bunch of JOINs (potentially) expand the number of rows in an intermediate table, then GROUP BY c.id collapses the number of rows back to what you started with (one row of output per row of checkpoint).
Before trying to help with that, please answer:
Is LEFT really needed?
How many rows in each table? (Especially in cw)
Can you get rid of DISTINCT?
Barmar's answer can possibly be improved upon by delaying the JOINs to qcj1andtagsuntil after theUNION`:
SELECT ...
FROM ( SELECT ...
FROM first few tables
UNION ALL
SELECT ...
FROM first few tables
) AS u
[LEFT] JOIN qcj1
[LEFT] JOIN tags
GROUP BY ...
ORDER BY ...
Another optimization (again building on Barmar's)
GROUP BY x.id
ORDER BY x.name
-->
GROUP BY x.name, x.id
ORDER BY x.name, x.id
When the items in GROUP BY and ORDER BY are the "same", they can be done in a single action, thereby saving (at least) a sort.
x.name, x.id is deterministic, where as x.name might put two rows with the same name in a different order, depending (perhaps) on the phase of the moon.
These indexes may help:
qcheckj: INDEX(checklistID, questionID)
qcatj: INDEX(questionID, id)
qcatjsub: INDEX(parentQuestionID, id)
cw: INDEX(fromQuestionCategoryJoinID, toQuestionCategoryJoinID)

BigQuery - Left Join two tables ON a column OR another

I want to perform a left join on two tables. The field I will join by is an email address. As the table I on the left has two email fields which may be different, I want that if the joining by the first email fails and returns null values, perform a second join on the other email field. Lastly, I want to throw away those entries which have not been matched with any of the two joins.
I have thought of doing something like this:
SELECT *
FROM a
LEFT JOIN b
ON
a.address1 = b.email
OR a.address2 = b.email
However, this returned an error message saying LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.
What is the correct way of achieving this?
OR in JOIN conditions makes is really hard to optimize queries. So, BigQuery makes it hard to use OR.
I think you can rephrase the query by doing:
SELECT *
FROM a CROSS JOIN
UNNEST(ARRAY[address1, address2]) address LEFT JOIN
b
ON address = b.email;
You might be able to phrase your logic using a union:
SELECT * FROM a LEFT JOIN b ON a.address1 = b.email
UNION
SELECT * FROM a LEFT JOIN b ON a.address2 = b.email;
Consider below (BigQuery) approach
SELECT * EXCEPT(row_key, priority)
FROM (
SELECT *, TO_JSON_STRING(a) row_key, IF(b.email IS NULL, 3, 1) priority
FROM `project.dataset.tableA` a LEFT JOIN `project.dataset.tableB` b
ON a.address1 = b.email
UNION ALL
SELECT *, TO_JSON_STRING(a), IF(b.email IS NULL, 3, 2)
FROM `project.dataset.tableA` a LEFT JOIN `project.dataset.tableB` b
ON a.address2 = b.email
)
WHERE true
QUALIFY ROW_NUMBER() OVER(PARTITION BY row_key ORDER BY priority NULLS LAST) = 1
Or you can use less verbose / more generic (so you can avoid redundant code fragments which can be useful if you need more than just two conditions) version of above
SELECT * EXCEPT(row_key, priority)
FROM (
SELECT a.*, b.*, TO_JSON_STRING(a) row_key, IF(b.email IS NULL, 3, c.priority) priority
FROM `project.dataset.tableA` a,
UNNEST([STRUCT(address1 as address, 1 as priority), (address2, 2)]) c
LEFT JOIN `project.dataset.tableB` b
ON c.address = b.email
)
WHERE TRUE
QUALIFY ROW_NUMBER() OVER(PARTITION BY row_key ORDER BY priority NULLS LAST) = 1

Collapse rows with equal values in particular columns in one

I have a table with deals data written in it. I need to query said deals while joining on other tables, however d.originalorderid is not a unique entry and I get several duplicate entries. I want to:
For each unique d.originalorderid select one row
This row should be most recent (largest ID)
How would I go about this? This is the query I have right now.
SELECT d.id,
d.date,
d.ip, d.panmask,
d.merchantorderid,
d.amount,
d.cardholder,
d.bankhumanname,
d.cardtypeid,
d.bankcountrycode,
d.usercountrycode,
mc.paymentkey as merchantname,
dt.status,
d.merchantcontract,
dt.tag,
d.originalorderid,
ds.refnumber,
ds.dealauthcode,
mc.processingid,
pc.Name as processing,
d.customparams
FROM Deal as d
LEFT JOIN MerchantContract as mc ON mc.Id = d.MerchantContract
LEFT JOIN DealTrace as dt ON d.Id = dt.DealId
AND dt.id = (SELECT MAX(id)
FROM DealTrace WITH (nolock)
WHERE DealId = d.id)
LEFT JOIN DealSummary ds ON d.Id = ds.DealId
AND ds.id = (SELECT MAX(id)
FROM DealSummary WITH (nolock)
WHERE DealId = d.id)
LEFT JOIN Processing pc on mc.ProcessingId = pc.id
WHERE (d.MerchantContract IN ('12'))
ORDER BY ID desc OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY
If I understand the requirement, instead of joining to the Deal table, join to a correlated subquery on it which returns the deal with the highest deal id which has the same original order id. Test it but I think its OK..
SELECT ....
FROM
(SELECT *
FROM Deal d1
WHERE d1.Id=(SELECT MAX(Id)
FROM Deal d2
WHERE d2.OriginalOrderId=d1.OriginalOrderId)) d
LEFT JOIN MerchantContract as mc ON mc.Id = d.MerchantContract
etc etc

Wrong expectation when count and join 3 table

I have 3 tables: broadcasts, broadcasts_likes and broadcasts_viewers:
I want return result include total_likes, total_viewers but when I run this sql, the expectation is wrong, could you help me check this. I just want use sql normally, don't use sub query. Here is my sql:
SELECT `b`.*,
COUNT(`bl`.`broadcast_id`) AS `total_likes`,
COUNT(`bv`.`broadcast_id`) AS `total_viewers`
FROM `broadcasts` `b`
LEFT JOIN `broadcasts_likes` `bl`
ON `b`.`broadcast_id` = `bl`.`broadcast_id`
LEFT JOIN `broadcasts_viewers` `bv`
ON `b`.`broadcast_id` = `bv`.`broadcast_id`
GROUP BY `b`.`broadcast_id`;
The simplest solution is to switch to count(distinct):
SELECT `b`.*,
COUNT(DISTINCT `bl`.profile_id`) AS `total_likes`,
COUNT(DISTINCT `bv`.view_profile_id) AS `total_viewers`
FROM `broadcasts` `b` LEFT JOIN
`broadcasts_likes` `bl`
ON `b`.`broadcast_id` = `bl`.`broadcast_id` LEFT JOIN
`broadcasts_viewers` `bv`
ON `b`.`broadcast_id` = `bv`.`broadcast_id`
GROUP BY `b`.`broadcast_id`;
If broadcast have lots of views and lots of likes, this won't scale particularly well. In that case, you can aggregate the tables before joining or use correlated subqueries:
SELECT `b`.*,
(SELECT COUNT(*)
FROM broadcasts_likes bl
WHERE `b`.`broadcast_id` = `bl`.`broadcast_id`
) as `total_likes`,
(SELECT COUNT(*)
FROM broadcasts_viewers bv
WHERE `b`.`broadcast_id` = `bv`.`broadcast_id`
) as `total_viewers`
FROM `broadcasts` `b`;
Actually, with indexes on broadcasts_likes(broadcast_id) and broadcasts_viewers(broadcast_id), this very should have very good performance.

How to optimize this complected query?

While working with following query on mysql, Its getting locked,
SELECT event_list.*
FROM event_list
INNER JOIN members
ON members.profilenam=event_list.even_loc
WHERE (even_own IN (SELECT frd_id
FROM network
WHERE mem_id='911'
GROUP BY frd_id)
OR even_own = '911' )
AND event_list.even_active = 'y'
GROUP BY event_list.even_id
ORDER BY event_list.even_stat ASC
The Inner query inside IN constraint has many frd_id, So because of that above query is slooow..., So please help.
Thanks.
Try this:
SELECT el.*
FROM event_list el
INNER JOIN members m ON m.profilenam = el.even_loc
WHERE el.even_active = 'y' AND
(el.even_own = 911 OR EXISTS (SELECT 1 FROM network n WHERE n.mem_id=911 AND n.frd_id = el.even_own))
GROUP BY el.even_id
ORDER BY el.even_stat ASC
You don't need the GROUP BY on the inner query, that will be making the database engine do a lot of unneeded work.
If you put even_own = '911' before the select from network, then if even_own IS 911 then it will not have to do the subquery.
Also why do you have a group by on the subquery?
Also run explain plan top find out what is taking the time.
This might work better:
( SELECT e.*
FROM event_list AS e
INNER JOIN members AS m ON m.profilenam = e.even_loc
JOIN network AS n ON e.even_own = n.frd_id
WHERE n.mem_id = '911'
AND e.even_active = 'y'
ORDER BY e.even_stat ASC )
UNION DISTINCT
( SELECT e.*
FROM event_list AS e
INNER JOIN members AS m ON m.profilenam = e.even_loc
WHERE e.even_own = '911'
AND e.even_active = 'y' )
ORDER BY e.even_stat ASC
Since I don't know whether the JOINs one-to-many (or what), I threw in DISTINCT to avoid dups. There may be a better way, or it may be unnecessary (that is, UNION ALL).
Notice how I avoid two things that are performance killers:
OR -- turned into UNION
IN (SELECT...) -- turned into JOIN.
I made aliases to cut down on the clutter. I moved the ORDER BY outside the UNION (and added parens to make it work right).