weird problem with select count from multiple tables (with joins) - mysql

I'm having an odd problem with the following query. It works correctly:
the COUNT part in it gets me the number of comments on a given 'hintout'.
I'm trying to add a similar count that gets the number of 'votes' for each hintout. This is the query:
SELECT h.*
, h.permalink AS hintout_permalink
, hi.hinter_name
, hi.permalink
, hf.user_id AS followed_hid
, ci.city_id, ci.city_name, co.country_id, co.country_name, ht.thank_id
, COUNT(hc.comment_id) AS commentsCount
FROM hintouts AS h
INNER JOIN hinter_follows AS hf ON h.hinter_id = hf.hinter_id
INNER JOIN hinters AS hi ON h.hinter_id = hi.hinter_id
LEFT JOIN cities AS ci ON h.city_id = ci.city_id
LEFT JOIN countries as co ON h.country_id = co.country_id
LEFT JOIN hintout_thanks AS ht ON (h.hintout_id = ht.hintout_id
AND ht.thanker_user_id = 1)
LEFT JOIN hintout_comments AS hc ON hc.hintout_id = h.hintout_id
WHERE hf.user_id = 1
GROUP BY h.hintout_id
I tried to add the following to the select part:
COUNT(ht2.thanks_id) AS thanksCount
and the following on the join:
LEFT JOIN hintout_thanks AS ht2 ON h.hintout_id = ht2.hintout_id
but the weird thing, for which I could not find any answer or solution,
is that the moment I add this additional part, the count for comments gets ruined (I get wrong and weird numbers), and I get the same number for the thanks.
I can't understand why or how to fix it, and I'm avoiding nested queries,
so any help or pointers would be greatly appreciated!
ps: this might have been posted twice, but I can't find the previous post

When you add
LEFT JOIN hintout_thanks AS ht2 ON h.hintout_id = ht2.hintout_id
the number of rows increases: every row from hc is repeated once for each matching ht2 row, and all of those repeats get counted in COUNT(hc.comment_id).
You can replace
COUNT(hc.comment_id) -- counts the duplicated rows too
/* with */
COUNT(DISTINCT hc.comment_id) -- only counts unique ids
to count each id only once.
This only helps for columns whose values are unique. On values that are not unique, like co.country_name, COUNT(DISTINCT ...) will not give a row count, because it only counts the distinct countries (if all your results are in the USA, the count will be 1).
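To see the row multiplication concretely, here is a minimal sketch. It uses Python's sqlite3 module with an in-memory database rather than MySQL, and a made-up miniature of the schema (one hintout, two comments, three thanks), so treat it as an illustration of the effect rather than the asker's actual data.

```python
import sqlite3

# Hypothetical miniature of the schema: 1 hintout, 2 comments, 3 thanks.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hintouts (hintout_id INTEGER PRIMARY KEY);
    CREATE TABLE hintout_comments (comment_id INTEGER PRIMARY KEY, hintout_id INTEGER);
    CREATE TABLE hintout_thanks (thank_id INTEGER PRIMARY KEY, hintout_id INTEGER);
    INSERT INTO hintouts VALUES (1);
    INSERT INTO hintout_comments VALUES (10, 1), (11, 1);
    INSERT INTO hintout_thanks VALUES (20, 1), (21, 1), (22, 1);
""")

# Two one-to-many joins produce 2 x 3 = 6 rows for the single hintout.
row = conn.execute("""
    SELECT COUNT(hc.comment_id),           -- counts all 6 joined rows
           COUNT(ht2.thank_id),            -- also 6: the "same number" symptom
           COUNT(DISTINCT hc.comment_id),  -- 2 real comments
           COUNT(DISTINCT ht2.thank_id)    -- 3 real thanks
    FROM hintouts AS h
    LEFT JOIN hintout_comments AS hc ON hc.hintout_id = h.hintout_id
    LEFT JOIN hintout_thanks AS ht2 ON ht2.hintout_id = h.hintout_id
    GROUP BY h.hintout_id
""").fetchone()

print(row)  # (6, 6, 2, 3)
```

Both plain counts come out as 6, which matches the "wrong number, and the same number for thanks" behaviour described in the question.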
Quassnoi has solved the whole count problem by putting the counts in sub-selects, so that the extra rows caused by all those joins do not influence the counts.

SELECT h.*, h.permalink AS hintout_permalink, hi.hinter_name,
hi.permalink, hf.user_id AS followed_hid,
ci.city_id, ci.city_name, co.country_id, co.country_name,
ht.thank_id,
COALESCE(
(
SELECT COUNT(*)
FROM hintout_comments hci
WHERE hci.hintout_id = h.hintout_id
), 0) AS commentsCount,
COALESCE(
(
SELECT COUNT(*)
FROM hintout_thanks hti
WHERE hti.hintout_id = h.hintout_id
), 0) AS thanksCount
FROM hintouts AS h
JOIN hinter_follows AS hf
ON hf.hinter_id = h.hinter_id
JOIN hinters AS hi
ON hi.hinter_id = h.hinter_id
LEFT JOIN
cities AS ci
ON ci.city_id = h.city_id
LEFT JOIN
countries as co
ON co.country_id = h.country_id
LEFT JOIN
hintout_thanks AS ht
ON ht.hintout_id = h.hintout_id
AND ht.thanker_user_id=1
WHERE hf.user_id = 1
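
A stripped-down sketch of the sub-select idea, again in Python's sqlite3 with a hypothetical two-hintout data set (the table contents are invented for the demo). Each count is computed by its own correlated subquery, so neither one sees the rows of the other table. Note that COUNT(*) in a scalar subquery without GROUP BY already returns 0 when nothing matches, so the COALESCE wrapper is harmless but not strictly required.

```python
import sqlite3

# Hypothetical data: hintout 1 has 2 comments and 3 thanks, hintout 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hintouts (hintout_id INTEGER PRIMARY KEY);
    CREATE TABLE hintout_comments (comment_id INTEGER PRIMARY KEY, hintout_id INTEGER);
    CREATE TABLE hintout_thanks (thank_id INTEGER PRIMARY KEY, hintout_id INTEGER);
    INSERT INTO hintouts VALUES (1), (2);
    INSERT INTO hintout_comments VALUES (10, 1), (11, 1);
    INSERT INTO hintout_thanks VALUES (20, 1), (21, 1), (22, 1);
""")

# One correlated subquery per count; no join between comments and thanks.
rows = conn.execute("""
    SELECT h.hintout_id,
           (SELECT COUNT(*) FROM hintout_comments hci
             WHERE hci.hintout_id = h.hintout_id) AS commentsCount,
           (SELECT COUNT(*) FROM hintout_thanks hti
             WHERE hti.hintout_id = h.hintout_id) AS thanksCount
    FROM hintouts AS h
    ORDER BY h.hintout_id
""").fetchall()

print(rows)  # [(1, 2, 3), (2, 0, 0)]
```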

Related

FULL table scan in LEFT JOIN using OR

How can this query be optimized to avoid the full table scan described below?
I've got a slow query that's taking approximately 15 seconds to return.
Let's get this part out of the way - I've confirmed all indexes are there.
When I run EXPLAIN, it shows that a FULL TABLE scan is run on the crosswalk table (the index on fromQuestionCategoryJoinID is not used, even if I attempt to force it). If I remove either of the fields and the OR, the index is used and the query completes in milliseconds.
SELECT c.id, c.name, GROUP_CONCAT(DISTINCT tags.externalDisplayID SEPARATOR ', ') AS tags
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id OR cw.fromQuestionCategoryJoinID = qcatjsub.id)
-- index used if I remove OR, eg.: LEFT JOIN crosswalk cw on (cw.fromQuestionCategoryJoinID = qcatj.id)
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
GROUP BY c.id
ORDER BY c.name, tags.externalDisplayID;
Split the query into two queries for each part of the OR. Then combine them with UNION.
SELECT id, name, GROUP_CONCAT(DISTINCT externalDisplayID SEPARATOR ', ') AS tags
FROM (
SELECT c.id, c.name, tags.externalDisplayID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN crosswalk cw on cw.fromQuestionCategoryJoinID = qcatj.id
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
UNION ALL
SELECT c.id, c.name, tags.externalDisplayID
FROM checklist c
LEFT JOIN questionchecklistjoin qcheckj on qcheckj.checklistID = c.id
LEFT JOIN questioncategoryjoin qcatj ON qcatj.questionID = qcheckj.questionID
LEFT JOIN questioncategoryjoin qcatjsub on qcatjsub.parentQuestionID = qcatj.questionID
LEFT JOIN crosswalk cw on cw.fromQuestionCategoryJoinID = qcatjsub.id
LEFT JOIN questioncategoryjoin qcj1 on qcj1.id = cw.toQuestionCategoryJoinID
LEFT JOIN question tags on tags.id = qcj1.questionID
) AS x
GROUP BY x.id
ORDER BY x.name
Also, it doesn't make sense to include externalDisplayID in ORDER BY, because that will order by its value from a random row in the group. You could put ORDER BY externalDisplayID in the GROUP_CONCAT() arguments if that's what you want.
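As a sanity check that splitting the OR into a UNION ALL returns the same tags, here is a small sketch in Python's sqlite3. The schema is a deliberately simplified, hypothetical stand-in (a single `qcj` table plays both qcatj and qcatjsub, and `crosswalk` carries the tag directly), and it only compares results; it says nothing about how MySQL's optimizer will plan either form.

```python
import sqlite3

# Hypothetical simplified schema: qcj rows belong to a checklist or to a parent qcj row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE checklist (id INTEGER PRIMARY KEY);
    CREATE TABLE qcj (id INTEGER PRIMARY KEY, checklist_id INTEGER, parent_id INTEGER);
    CREATE TABLE crosswalk (from_id INTEGER, tag TEXT);
    INSERT INTO checklist VALUES (1), (2), (3);
    INSERT INTO qcj VALUES (1, 1, NULL), (2, NULL, 1), (3, 2, NULL);
    INSERT INTO crosswalk VALUES (1, 'A'), (2, 'B'), (3, 'C');
""")

or_join = """
    SELECT c.id, GROUP_CONCAT(DISTINCT cw.tag)
    FROM checklist c
    LEFT JOIN qcj qcatj ON qcatj.checklist_id = c.id
    LEFT JOIN qcj qcatjsub ON qcatjsub.parent_id = qcatj.id
    LEFT JOIN crosswalk cw ON (cw.from_id = qcatj.id OR cw.from_id = qcatjsub.id)
    GROUP BY c.id
"""

union_all = """
    SELECT id, GROUP_CONCAT(DISTINCT tag)
    FROM (
        SELECT c.id AS id, cw.tag AS tag
        FROM checklist c
        LEFT JOIN qcj qcatj ON qcatj.checklist_id = c.id
        LEFT JOIN crosswalk cw ON cw.from_id = qcatj.id
        UNION ALL
        SELECT c.id AS id, cw.tag AS tag
        FROM checklist c
        LEFT JOIN qcj qcatj ON qcatj.checklist_id = c.id
        LEFT JOIN qcj qcatjsub ON qcatjsub.parent_id = qcatj.id
        LEFT JOIN crosswalk cw ON cw.from_id = qcatjsub.id
    ) AS x
    GROUP BY id
"""

def tags_by_id(sql):
    # Compare as sets, since the order inside GROUP_CONCAT is not guaranteed.
    return {cid: set(tags.split(",")) if tags else set()
            for cid, tags in conn.execute(sql)}

or_result = tags_by_id(or_join)
union_result = tags_by_id(union_all)
print(or_result == union_result)  # True
```

The NULL rows that each UNION branch produces for unmatched checklists are ignored by GROUP_CONCAT, which is why checklist 3 still shows up with no tags in both versions.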
There is a second inefficiency going on here. I call it "explode-implode". First a bunch of JOINs (potentially) expand the number of rows in an intermediate table, then GROUP BY c.id collapses the number of rows back to what you started with (one row of output per row of checklist).
Before trying to help with that, please answer:
Is LEFT really needed?
How many rows in each table? (Especially in cw)
Can you get rid of DISTINCT?
Barmar's answer can possibly be improved upon by delaying the JOINs to qcj1 and tags until after the UNION:
SELECT ...
FROM ( SELECT ...
FROM first few tables
UNION ALL
SELECT ...
FROM first few tables
) AS u
[LEFT] JOIN qcj1
[LEFT] JOIN tags
GROUP BY ...
ORDER BY ...
Another optimization (again building on Barmar's)
GROUP BY x.id
ORDER BY x.name
-->
GROUP BY x.name, x.id
ORDER BY x.name, x.id
When the items in GROUP BY and ORDER BY are the "same", they can be done in a single action, thereby saving (at least) a sort.
x.name, x.id is deterministic, whereas x.name alone might put two rows with the same name in a different order, depending (perhaps) on the phase of the moon.
These indexes may help:
qcheckj: INDEX(checklistID, questionID)
qcatj: INDEX(questionID, id)
qcatjsub: INDEX(parentQuestionID, id)
cw: INDEX(fromQuestionCategoryJoinID, toQuestionCategoryJoinID)

Adjusting a SELECT statement containing a LEFT OUTER JOIN to an UPDATE statement from the same table- is it possible?

As a newbie, I've hunted high and low to find a solution to my problem. I'm hoping someone can shed some light on it.
I have a SELECT statement that spits out Reports as desired. What I'd like to do is have an UPDATE statement that updates my table column named report with the resulting number. Any suggestions would be greatly appreciated.
SELECT
l.id,
l.plate,
COUNT(*) AS Reports
FROM
coh_items AS l
LEFT OUTER JOIN coh_items AS r
ON
l.id >= r.id AND l.plate = r.plate
GROUP BY
l.id,
l.plate
;
Is this what you want?
update coh_items i left join
(select i.id, i.plate, count(*) as reports
from coh_items i left join
coh_items i2
on i.id >= i2.id and i.plate = i2.plate
group by i.id, i.plate
) i2
using (id, plate)
set i.reports = i2.reports;
The left join seems unnecessary because any given row is always going to match itself (assuming the comparison columns are not NULL).
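
A quick way to convince yourself of both points (what the self-join counts, and that LEFT vs INNER makes no difference) is the sketch below. It runs in Python's sqlite3, not MySQL, with four invented rows. The UPDATE at the end uses a correlated subquery, which is a swapped-in technique that SQLite accepts; MySQL generally refuses a subquery that reads the table being updated, which is presumably why the answer above joins to a derived table instead.

```python
import sqlite3

# Hypothetical sample rows; "reports" is the column to be filled in.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coh_items (id INTEGER PRIMARY KEY, plate TEXT, reports INTEGER);
    INSERT INTO coh_items (id, plate) VALUES (1, 'A'), (2, 'B'), (3, 'A'), (4, 'A');
""")

# The question's SELECT, parameterised on the join type.
query = """
    SELECT l.id, l.plate, COUNT(*) AS reports
    FROM coh_items AS l
    {join} coh_items AS r ON l.id >= r.id AND l.plate = r.plate
    GROUP BY l.id, l.plate
    ORDER BY l.id
"""
left_rows = conn.execute(query.format(join="LEFT OUTER JOIN")).fetchall()
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()

# SQLite-only shortcut: store the running count with a correlated subquery.
conn.execute("""
    UPDATE coh_items
    SET reports = (SELECT COUNT(*) FROM coh_items AS r
                   WHERE r.plate = coh_items.plate AND r.id <= coh_items.id)
""")
stored = conn.execute(
    "SELECT id, plate, reports FROM coh_items ORDER BY id").fetchall()

print(left_rows)  # [(1, 'A', 1), (2, 'B', 1), (3, 'A', 2), (4, 'A', 3)]
```

So the count is the position of each row among the rows with the same plate, ordered by id.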

I'm getting more than one result per left row from a LEFT JOIN with a subquery

With this query I'm supposed to get the latest message from the chats table for every agreement I get, as well as all the information including business name, etc.
I kind of solved it using GROUP BY in the subquery, but that is not how I want to fix this, because I don't understand why it acts like a RIGHT JOIN, and why it doesn't order the rows the way I meant in the subquery:
SELECT agreements.id, agreements.`date`, agreements.state, business.name, chat.message
FROM ((agreements JOIN
business_admin
ON agreements.business = business_admin.business AND business_admin.user = 1
) LEFT JOIN
business
ON business.id = agreements.business
) LEFT JOIN
(SELECT agreements_chat.agreement, agreements_chat.message
FROM agreements_chat
WHERE origin = 0
ORDER BY agreements_chat.`date` DESC
) AS chat
ON agreements.id = chat.agreement
I really appreciate your help, thank you so much!
It's not working because the subquery in your left join returns more than one row per agreement, hence the duplicated rows you get.
SELECT agreements.id,
agreements.`date`,
agreements.state,
business.name,
chat.message
FROM agreements
JOIN business_admin
ON agreements.business = business_admin.business AND
business_admin.user = 1
LEFT JOIN
business
ON business.id = agreements.business
LEFT JOIN
(
-- date of the latest origin = 0 message per agreement
SELECT agreement, max(`date`) last_date
FROM agreements_chat
WHERE origin = 0
GROUP BY agreement
) last_chat
ON last_chat.agreement = agreements.id
LEFT JOIN
agreements_chat chat
ON chat.origin = 0 AND
chat.agreement = last_chat.agreement AND
chat.`date` = last_chat.last_date
Note that (as per @GordonLinoff's comment) you don't need parentheses around your joins.
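
For reference, one common way to pick the latest row per group is to compute MAX(`date`) per agreement first and then join back to the chat table on both the agreement and that date. Below is a minimal sketch of that pattern in Python's sqlite3 (not MySQL), reduced to two tables with invented rows; it assumes no two messages of the same agreement share the exact same date.

```python
import sqlite3

# Hypothetical data: agreement 1 has two origin-0 messages plus one origin-1
# message, agreement 2 has a single message, agreement 3 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agreements (id INTEGER PRIMARY KEY);
    CREATE TABLE agreements_chat (agreement INTEGER, message TEXT,
                                  `date` TEXT, origin INTEGER);
    INSERT INTO agreements VALUES (1), (2), (3);
    INSERT INTO agreements_chat VALUES
        (1, 'old',        '2020-01-01', 0),
        (1, 'new',        '2020-01-02', 0),
        (1, 'other side', '2020-01-03', 1),
        (2, 'only',       '2020-01-01', 0);
""")

rows = conn.execute("""
    SELECT a.id, chat.message
    FROM agreements a
    LEFT JOIN (
        -- latest origin = 0 date per agreement
        SELECT agreement, MAX(`date`) AS last_date
        FROM agreements_chat
        WHERE origin = 0
        GROUP BY agreement
    ) last_chat ON last_chat.agreement = a.id
    LEFT JOIN agreements_chat chat
           ON chat.agreement = last_chat.agreement
          AND chat.`date` = last_chat.last_date
          AND chat.origin = 0
    ORDER BY a.id
""").fetchall()

print(rows)  # [(1, 'new'), (2, 'only'), (3, None)]
```

Each agreement appears exactly once, and agreements without any message are kept with a NULL message because both joins are LEFT joins.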

sql counts wrong number of likes

I have written an SQL statement that, besides all the other columns, should return the number of comments and the number of likes of a certain post. It works perfectly when I don't try to get the number of times it has been shared too. When I try to get the number of shares as well, it returns a wrong number of likes that seems to be either the number of shares and likes combined or something like that. Here is the code:
SELECT
[...],
count(CS.commentId) as shares,
count(CL.commentId) as numberOfLikes
FROM
(SELECT *
FROM accountSpecifics
WHERE institutionId= '{$keyword['id']}') `AS`
INNER JOIN
account A ON A.id = `AS`.accountId
INNER JOIN
comment C ON C.accountId = A.id
LEFT JOIN
commentLikes CL ON C.commentId = CL.commentId
LEFT JOIN
commentShares CS ON C.commentId = CS.commentId
GROUP BY
C.time
ORDER BY
year, month, hour, month
Could you also tell me if you think this is an efficient SQL statement or if you would do it differently? thank you!
Do this instead:
SELECT
[...],
(select count(*) from commentShares CS where C.commentId = CS.commentId) as shares,
(select count(*) from commentLikes CL where C.commentId = CL.commentId) as numberOfLikes
FROM
(SELECT *
FROM accountSpecifics
WHERE institutionId= '{$keyword['id']}') `AS`
INNER JOIN account A ON A.id = `AS`.accountId
INNER JOIN comment C ON C.accountId = A.id
GROUP BY C.time
ORDER BY year, month, hour, month
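If you use JOINs, you're getting back one result set, and every COUNT(column) counts the rows of that same joined result (those where the column is not NULL). With two one-to-many joins the likes and shares multiply each other, so both counts come out the same, and in this case wrong. Subqueries are what you need here. Good luck!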
EDIT: as posted below, count(distinct something) can also work, but it's making the database do more work than necessary for the answer you want to end up with.
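Quick fix (the counted column has to be unique per row of commentShares / commentLikes, such as that table's own key; commentId itself is the same for every share or like of one comment):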
SELECT
[...],
count(DISTINCT CS.commentId) as shares,
count(DISTINCT CL.commentId) as numberOfLikes
Better approach:
SELECT [...]
, Coalesce(shares.numberOfShares, 0) As numberOfShares
, Coalesce(likes.numberOfLikes , 0) As numberOfLikes
FROM [...]
LEFT
JOIN (
SELECT commentId
, Count(*) As numberOfShares
FROM commentShares
GROUP
BY commentId
) As shares
ON shares.commentId = c.commentId
LEFT
JOIN (
SELECT commentId
, Count(*) As numberOfLikes
FROM commentLikes
GROUP
BY commentId
) As likes
ON likes.commentId = c.commentId
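
Here is a small runnable sketch of the pre-aggregated derived-table approach, using Python's sqlite3 instead of MySQL and invented data (comment 1 has 2 likes and 3 shares, comment 2 has 1 like and no shares). Because each derived table has at most one row per commentId, joining both of them cannot multiply rows.

```python
import sqlite3

# Hypothetical data: one row per like / share, keyed by commentId.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (commentId INTEGER PRIMARY KEY);
    CREATE TABLE commentLikes (commentId INTEGER);
    CREATE TABLE commentShares (commentId INTEGER);
    INSERT INTO comment VALUES (1), (2);
    INSERT INTO commentLikes VALUES (1), (1), (2);
    INSERT INTO commentShares VALUES (1), (1), (1);
""")

rows = conn.execute("""
    SELECT c.commentId,
           COALESCE(shares.numberOfShares, 0) AS numberOfShares,
           COALESCE(likes.numberOfLikes, 0) AS numberOfLikes
    FROM comment c
    LEFT JOIN (SELECT commentId, COUNT(*) AS numberOfShares
               FROM commentShares GROUP BY commentId) AS shares
           ON shares.commentId = c.commentId
    LEFT JOIN (SELECT commentId, COUNT(*) AS numberOfLikes
               FROM commentLikes GROUP BY commentId) AS likes
           ON likes.commentId = c.commentId
    ORDER BY c.commentId
""").fetchall()

print(rows)  # [(1, 3, 2), (2, 0, 1)]
```

The COALESCE matters here, unlike with a scalar COUNT(*) subquery: comment 2 has no row at all in the shares derived table, so the LEFT JOIN yields NULL.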

Sums are being multiplied in query

I am working on 2 problems for homework and after many hours I have just about solved them both, the last issue I have is that both of my queries are coming back with doubled numerical values instead of single.
Here is what I have:
SELECT SUM(P.AMT_PAID) AS TOTAL_PAID, C.CITATION_ID, C.DATE_ISSUED, SUM(V.FINE_CHARGED) AS TOTAL_CHARGED
FROM PAYMENT P, CITATION C, VIOLATION_CITATION V
WHERE V.CITATION_ID = C.CITATION_ID
AND C.CITATION_ID = P.CITATION_ID
GROUP BY C.CITATION_ID;
and my other one:
SELECT C.CITATION_ID, C.DATE_ISSUED, SUM(V.FINE_CHARGED) AS TOTAL_CHARGED, SUM(P.AMT_PAID) AS TOTAL_PAID, SUM(V.FINE_CHARGED) - SUM(P.AMT_PAID) AS TOTAL_OWED
FROM (CITATION C)
LEFT JOIN VIOLATION_CITATION V
ON V.CITATION_ID = C.CITATION_ID
LEFT JOIN PAYMENT P
ON P.CITATION_ID = C.CITATION_ID
GROUP BY C.CITATION_ID
ORDER BY TOTAL_OWED DESC;
I am sure there is just something that I am overlooking. If someone else could kindly tell me where I went awry it would be a great help.
Select Sum(P.Amt_Paid) As Total_Paid, C.Citation_Id
, C.Date_Issued, Sum(V.Fine_Charged) As Total_Charged
From Payment P
Join Citation C
On C.Citation_Id = P.Citation_Id
Join Violation_Citation V
On V.Citation_Id = C.Citation_Id
Group By C.Citation_Id
First, you should use the JOIN syntax instead of the comma-delimited list of tables. It is easier to read, more standardized, and helps prevent problems caused by overlooking a filtering clause.
Second, the most likely reason for a sum that is too large is the join to the VIOLATION_CITATION table. If you remove the Group By and the columns with aggregate functions, you will likely see that P.AMT_PAID is repeated for each instance of VIOLATION_CITATION. Perhaps the following will solve the problem:
Select Coalesce(PaidByCitation.TotalAmtPaid,0) As Total_Paid
, C.Citation_Id, C.Date_Issued
, Coalesce(ViolationByCitation.TotalCharged,0) As Total_Charged
, Coalesce(ViolationByCitation.TotalCharged,0)
- Coalesce(PaidByCitation.TotalAmtPaid,0) As Total_Owed
From Citation As C
Left Join (
Select P.Citation_Id, Sum( P.Amt_Paid ) As TotalAmtPaid
From Payment As P
Group By P.Citation_Id
) As PaidByCitation
On PaidByCitation.Citation_Id = C.Citation_Id
Left Join (
Select V.Citation_Id, Sum( V.Fine_Charged ) As TotalCharged
From Violation_Citation As V
Group By V.Citation_Id
) As ViolationByCitation
On ViolationByCitation.Citation_Id = C.Citation_Id
The use of Coalesce is to ensure that if the left join returns no rows for a given Citation_ID value, that we replace the Null with zero.
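
To make the repeated rows visible, the sketch below builds a tiny hypothetical data set in Python's sqlite3 (one citation, two payments, two violations; the payment_id and violation_id columns are invented for ordering) and prints the un-aggregated join next to the naive and the per-table sums. It is only an illustration of the effect, not the homework schema.

```python
import sqlite3

# Hypothetical data: citation 1 has payments of 50 and 25, fines of 100 and 40.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE citation (citation_id INTEGER PRIMARY KEY);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY,
                          citation_id INTEGER, amt_paid INTEGER);
    CREATE TABLE violation_citation (violation_id INTEGER PRIMARY KEY,
                                     citation_id INTEGER, fine_charged INTEGER);
    INSERT INTO citation VALUES (1);
    INSERT INTO payment VALUES (1, 1, 50), (2, 1, 25);
    INSERT INTO violation_citation VALUES (1, 1, 100), (2, 1, 40);
""")

joined = """
    FROM citation c
    JOIN payment p ON p.citation_id = c.citation_id
    JOIN violation_citation v ON v.citation_id = c.citation_id
"""

# Without GROUP BY: every payment is paired with every violation.
raw_rows = conn.execute(
    "SELECT p.amt_paid, v.fine_charged" + joined +
    "ORDER BY p.payment_id, v.violation_id").fetchall()

# Summing over those paired rows doubles both totals.
naive = conn.execute(
    "SELECT SUM(p.amt_paid), SUM(v.fine_charged)" + joined).fetchone()

# Summing each table on its own gives the real totals.
paid = conn.execute(
    "SELECT SUM(amt_paid) FROM payment WHERE citation_id = 1").fetchone()[0]
charged = conn.execute(
    "SELECT SUM(fine_charged) FROM violation_citation WHERE citation_id = 1"
).fetchone()[0]

print(raw_rows)       # [(50, 100), (50, 40), (25, 100), (25, 40)]
print(naive)          # (150, 280)
print(paid, charged)  # 75 140
```

With two rows on each side, every amount appears twice, which is exactly the doubling described in the question. SUM(DISTINCT ...) is not a safe workaround, since two genuine payments of the same amount would be collapsed into one.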