Wrong expectation when count and join 3 table - mysql

I have 3 tables: broadcasts, broadcasts_likes and broadcasts_viewers:
I want return result include total_likes, total_viewers but when I run this sql, the expectation is wrong, could you help me check this. I just want use sql normally, don't use sub query. Here is my sql:
SELECT `b`.*,
COUNT(`bl`.`broadcast_id`) AS `total_likes`,
COUNT(`bv`.`broadcast_id`) AS `total_viewers`
FROM `broadcasts` `b`
LEFT JOIN `broadcasts_likes` `bl`
ON `b`.`broadcast_id` = `bl`.`broadcast_id`
LEFT JOIN `broadcasts_viewers` `bv`
ON `b`.`broadcast_id` = `bv`.`broadcast_id`
GROUP BY `b`.`broadcast_id`;

The simplest solution is to switch to count(distinct):
SELECT `b`.*,
COUNT(DISTINCT `bl`.profile_id`) AS `total_likes`,
COUNT(DISTINCT `bv`.view_profile_id) AS `total_viewers`
FROM `broadcasts` `b` LEFT JOIN
`broadcasts_likes` `bl`
ON `b`.`broadcast_id` = `bl`.`broadcast_id` LEFT JOIN
`broadcasts_viewers` `bv`
ON `b`.`broadcast_id` = `bv`.`broadcast_id`
GROUP BY `b`.`broadcast_id`;
If broadcast have lots of views and lots of likes, this won't scale particularly well. In that case, you can aggregate the tables before joining or use correlated subqueries:
SELECT `b`.*,
(SELECT COUNT(*)
FROM broadcasts_likes bl
WHERE `b`.`broadcast_id` = `bl`.`broadcast_id`
) as `total_likes`,
(SELECT COUNT(*)
FROM broadcasts_viewers bv
WHERE `b`.`broadcast_id` = `bv`.`broadcast_id`
) as `total_viewers`
FROM `broadcasts` `b`;
Actually, with indexes on broadcasts_likes(broadcast_id) and broadcasts_viewers(broadcast_id), this very should have very good performance.

Related

How can I merge these two left joins into a single one?

How can I merge these two left joins: http://sqlfiddle.com/#!9/1d2954/69/0
SELECT d.`id`, (adcount + bdcount)
FROM `docs` d
LEFT JOIN
(
SELECT da.`doc_id`, COUNT(da.`doc_id`) AS adcount FROM `docs_scod_a` da
INNER JOIN `scod_a` a ON a.`id` = da.`scod_a_id`
WHERE a.`ver_a` IN ('AA', 'AB')
GROUP BY da.`doc_id`
) ad ON ad.`doc_id` = d.`id`
LEFT JOIN
(
SELECT db.`doc_id`, COUNT(db.`doc_id`) AS bdcount FROM `docs_scod_b` db
INNER JOIN `scod_b` b ON b.`id` = db.`scod_b_id`
WHERE b.`ver_b` IN ('BA', 'BB')
GROUP BY db.`doc_id`
) bd ON bd.`doc_id` = d.`id`
to be a Single left join just to ease its use in my code, while making it no less slower?
Let me first emphasize that your method of doing the calculation is the better method. You have two separate dimensions and aggregating them separately is often the most efficient method for doing the calculation. It is also the most scalable method.
That said, your query should be equivalent to this version:
SELECT d.id,
count(distinct a.id),
count(distinct b.id)
FROM docs d left join
docs_scod_a da
ON da.doc_id = d.id LEFT JOIN
scod_a a
ON a.id = da.scod_a_id AND a.ver_a IN ('AA', 'AB') LEFT JOIN
docs_scod_b db
ON db.doc_id = d.id LEFT JOIN
scod_b b
ON b.id = db.scod_b_id AND b.ver_b IN ('BA', 'BB')
GROUP BY d.id
ORDER BY d.id;
This query is more expensive than it looks, because the COUNT(DISTINCT) incurs additional overhead compared to COUNT().
And here is the SQL Fiddle.
And, because LEFT JOIN can return NULL values, your query is more correctly written as:
SELECT d.`id`, COALESCE(adcount, 0) + COALESCE(bdcount, 0)
If you were having problems with the results, this small change might fix those problems.
Performance may be a big problem, depending on sizes of each table. It appears to be an "inflate-deflate" situation since it first "inflates" the number of rows via JOIN, then "deflates" via GROUP BY. The formulation below avoids inflation-deflation.
But first, if I understand this subquery correctly, this
SELECT da.`doc_id`, COUNT(da.`doc_id`) AS adcount
FROM `docs_scod_a` da
INNER JOIN `scod_a` a ON a.`id` = da.`scod_a_id`
WHERE a.`ver_a` IN ('AA', 'AB')
GROUP BY da.`doc_id`
can be rewritten as
SELECT `doc_id`,
( SELECT COUNT(*)
FROM `scod_a`
WHERE `id` = da.`scod_a_id`
AND `ver_a` IN ('AA', 'AB')
) AS adcount
FROM `docs_scod_a` AS da
If that is correct, then the entire query becomes
SELECT d.id,
( SELECT COUNT(*)
FROM docs_scod_a ds
JOIN scod_a s ON s.id = ds.scod_a_id
WHERE ds.doc_id = d.id
AND s.ver_a IN ('AA', 'AB')
) +
( SELECT COUNT(*)
FROM docs_scod_b ds
JOIN scod_b s ON s.id = ds.scod_b_id
WHERE ds.doc_id = d.id
AND s.ver_b IN ('BA', 'BB')
)
FROM docs AS d
Which needs these indexes:
docs_scod_a: (doc_id, scod_a_id), (scod_a_id, doc_id)
docs_scod_b: (doc_id, scod_b_id), (scod_b_id, doc_id)
scod_a: (ver_a, id)
scod_b: (ver_b, id)
docs: -- presumably has PRIMARY KEY(id)
Note the lack of GROUP BY.
docs_scod_a smells like a many-to-many mapping table. I recommend you follow the tips here.
(No COALESCE is needed since COUNT will simply return zero.)
(I don't know whether my version is better (faster or whatever) than Gordon's, nor whether my indexes will help his formulation.)

How to access parent column from a subquery within a join

I'm trying to left join the second table useri_ban based on the users' ids, with the extra condition: useri_ban.start_ban = max_start.
In order for me to calculate max_start, I have to run the following subquery:
(SELECT MAX(ub.start_ban) AS max_start, user_id FROM useri_ban ub WHERE ub.user_id = useri.id)
Furthermore, in order to add max_start to every row, I need to inner join this subquery's result into the main result. However, it seems that once I apply that join, the subquery is no longer able to access useri.id.
What am I doing wrong?
SELECT
useri.id as id,
useri.email as email,
useri_ban.warning_type_id as warning_type_id,
useri_ban.type as type,
useri.created_at AS created_at
FROM `useri`
inner join
(SELECT MAX(ub.start_ban) AS max_start, user_id FROM useri_ban ub WHERE ub.user_id = useri.id) `temp`
on `useri`.`id` = `temp`.`user_id`
left join `useri_ban` on `useri_ban`.`user_id` = `useri`.`id` and `useri_ban`.`start_ban` = `max_start`
Does this solve your problem? You need GROUP BY in the inner query instead of another join.
SELECT useri.id, useri.email, maxQuery.maxStartBan
FROM useri
INNER JOIN
(
SELECT useri_ban.user_id ubid, MAX(useri_ban.startban) maxStartBan
FROM useri_ban
GROUP BY useri_ban.user_id
) AS maxQuery
ON maxQuery.ubid = useri.id;

count associate rows in multiple tables with left join

I'm trying to select Posts with the associate numbers of Comments and Likes.
This is my query
SELECT `waller_posts`.*,
COUNT(waller_comments.id) AS num_comments,
COUNT(waller_likes.id) AS num_likes
FROM `waller_posts`
LEFT JOIN `waller_comments` ON `waller_comments`.`post_id` = `waller_posts`.`id`
LEFT JOIN `waller_likes` ON `waller_likes`.`post_id` = `waller_posts`.`id`
WHERE `wall_id` = 1
AND `wall_type` = "User"
GROUP BY `waller_posts`.`id`
When I add the second left join in this case of the likes, the results of the num_comments and num_likes came wrong. How can I perform this kind of query?
The query builds up to give you every possible combination of comments and likes on a post.
Probably easiest to just use COUNT(DISTINCT...) :-
SELECT `waller_posts`.*,
COUNT(DISTINCT waller_comments.id) AS num_comments,
COUNT(DISTINCT waller_likes.id) AS num_likes
FROM `waller_posts`
LEFT JOIN `waller_comments` ON `waller_comments`.`post_id` = `waller_posts`.`id`
LEFT JOIN `waller_likes` ON `waller_likes`.`post_id` = `waller_posts`.`id`
WHERE `wall_id` = 1
AND `wall_type` = "User"
GROUP BY `waller_posts`.`id`
Note that your query is relying on a feature of MySQL but which would cause an error in most flavours of SQL. For most flavours of SQL you need to list ALL the non aggregate columns in the GROUP BY clause.
Use Distinct clause because it will display combination of like and comment table data
SELECT `waller_posts`.*,
COUNT(DISTINCT waller_comments.id) AS num_comments,
COUNT(DISTINCT waller_likes.id) AS num_likes
FROM `waller_posts`
LEFT JOIN `waller_comments` ON `waller_comments`.`post_id` = `waller_posts`.`id`
LEFT JOIN `waller_likes` ON `waller_likes`.`post_id` = `waller_posts`.`id`
WHERE `wall_id` = 1
AND `wall_type` = "User"
GROUP BY `waller_posts`.`id`

How to optimize this complected query?

While working with following query on mysql, Its getting locked,
SELECT event_list.*
FROM event_list
INNER JOIN members
ON members.profilenam=event_list.even_loc
WHERE (even_own IN (SELECT frd_id
FROM network
WHERE mem_id='911'
GROUP BY frd_id)
OR even_own = '911' )
AND event_list.even_active = 'y'
GROUP BY event_list.even_id
ORDER BY event_list.even_stat ASC
The Inner query inside IN constraint has many frd_id, So because of that above query is slooow..., So please help.
Thanks.
Try this:
SELECT el.*
FROM event_list el
INNER JOIN members m ON m.profilenam = el.even_loc
WHERE el.even_active = 'y' AND
(el.even_own = 911 OR EXISTS (SELECT 1 FROM network n WHERE n.mem_id=911 AND n.frd_id = el.even_own))
GROUP BY el.even_id
ORDER BY el.even_stat ASC
You don't need the GROUP BY on the inner query, that will be making the database engine do a lot of unneeded work.
If you put even_own = '911' before the select from network, then if even_own IS 911 then it will not have to do the subquery.
Also why do you have a group by on the subquery?
Also run explain plan top find out what is taking the time.
This might work better:
( SELECT e.*
FROM event_list AS e
INNER JOIN members AS m ON m.profilenam = e.even_loc
JOIN network AS n ON e.even_own = n.frd_id
WHERE n.mem_id = '911'
AND e.even_active = 'y'
ORDER BY e.even_stat ASC )
UNION DISTINCT
( SELECT e.*
FROM event_list AS e
INNER JOIN members AS m ON m.profilenam = e.even_loc
WHERE e.even_own = '911'
AND e.even_active = 'y' )
ORDER BY e.even_stat ASC
Since I don't know whether the JOINs one-to-many (or what), I threw in DISTINCT to avoid dups. There may be a better way, or it may be unnecessary (that is, UNION ALL).
Notice how I avoid two things that are performance killers:
OR -- turned into UNION
IN (SELECT...) -- turned into JOIN.
I made aliases to cut down on the clutter. I moved the ORDER BY outside the UNION (and added parens to make it work right).

Is it possible to convert this subquery into a join?

I want to replace the subquery with a join, if possible.
SELECT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON fftenant_surveyanswer.surveyquestion_id = 1
AND fftenant_surveyanswer.`surveyresult_id` IN (SELECT y.`surveyresult_id` FROM `fftenant_farmer_surveyresults` y WHERE y.farmer_id = `fftenant_farmer`.`person_ptr_id`)
I tried:
SELECT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`#, T5.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_farmer_surveyresults`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_farmer_surveyresults`.`farmer_id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON (`fftenant_farmer_surveyresults`.`surveyresult_id` = `fftenant_surveyanswer`.`surveyresult_id`)
AND fftenant_surveyanswer.surveyquestion_id = 1
But that gave me one record per farmer per survey result for that farmer. I only want one record per farmer as returned by the first query.
A join may be faster on most RDBMs, but the real reason I asked this question is I just can't seem to formulate a join to replace the subquery and I want to know if it's even possible.
You could use DISTINCT or GROUP BY, as mvds and Brilliand suggest, but I think it's closer to the query's design intent if you change the last join to an inner-join, but elevating its precedence:
SELECT farmer.person_ptr_id, surveyanswer.text_value
FROM fftenant_farmer AS farmer
INNER
JOIN fftenant_person AS person
ON person.id = farmer.person_ptr_id
LEFT
OUTER
JOIN
( fftenant_farmer_surveyresults AS farmer_surveyresults
INNER
JOIN fftenant_surveyanswer AS surveyanswer
ON surveyanswer.surveyresult_id = farmer_surveyresults.surveyresult_id
AND surveyanswer.surveyquestion_id = 1
)
ON farmer_surveyresults.farmer_id = farmer.person_ptr_id
Broadly speaking, this will end up giving the same results as the DISTINCT or GROUP BY approach, but in a more principled, less ad hoc way, IMHO.
Use SELECT DISTINCT or GROUP BY to remove the duplicate entries.
Changing your attempt as little as possible:
SELECT DISTINCT `fftenant_farmer`.`person_ptr_id`, `fftenant_surveyanswer`.`text_value`#, T5.`text_value`
FROM `fftenant_farmer`
INNER JOIN `fftenant_person`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_person`.`id`)
LEFT OUTER JOIN `fftenant_farmer_surveyresults`
ON (`fftenant_farmer`.`person_ptr_id` = `fftenant_farmer_surveyresults`.`farmer_id`)
LEFT OUTER JOIN `fftenant_surveyanswer`
ON (`fftenant_farmer_surveyresults`.`surveyresult_id` = `fftenant_surveyanswer`.`surveyresult_id`)
AND fftenant_surveyanswer.surveyquestion_id = 1
the real reason I asked this question is I just can't seem to formulate a join to replace the subquery and I want to know if it's even possible
Then consider a much simpler example to begin with e.g.
SELECT *
FROM T1
WHERE id IN (SELECT id FROM T2);
This is known as a semi join and if desired may be re-written using (among other possibilities) a JOIN with a SELECT clause to a) project only from the 'outer' table, and b) return only DISTINCT rows:
SELECT DISTINCT T1.*
FROM T1
JOIN T2 USING (id);