Consolidate 2 queries with similar where clauses but different aggregate function targets - mysql

I have 2 queries that work fine separately. Given they are similar, I'd like to consolidate them into one performant query. Seems straightforward as the where clauses are similar. But the sum, count, and min functions all apply to different rows and get in the way.
Context:
Users can score (or rate) a location and get points
User A can refer User B and get referral points when User B first submits a score
Points expire after a certain date
Goal is to build a leaderboard of users and their total points for scoring and referring for a particular location (area/country)
Positional parameters are filled in with hard values for 'Massachusetts', 'United States', and the scoreDateTime expiration date and are unfortunately duplicated in both select subqueries.
Question:
How can the query below be reorganized to combine constraints? There must be a way to start with a list of scores from a specific location after a certain date. The only complication is to get User B's first score date and only offer referral points to User A if it is after the expiration date.
select scoring.userId, scoring.points + referring.points as leaderPoints
from (
select userId, sum(ratingPoints) as points
from scores s, locations l
where s.locationId = l.locationId and
l.locationArea = 'Massachusetts' and
l.locationCountry = 'United States' and
s.scoreDateTime > '2016-04-16 18:50:53.154' and
s.userId != 0
group by s.userId
) as scoring
join (
select u1.userId, count(*) * 20 as points
from users u0
join users u1 on u0.userId = u1.userId
join users u2 on u2.referredByEmail = u1.emailAddress
join scores s on u2.userId = s.userId
join locations l on s.locationId = l.locationId
where l.locationArea = 'Massachusetts' and
l.locationCountry = 'United States' and
scoreDateTime = (
select min(scoreDateTime)
from scores
where userId = u2.userId
) and
scoreDateTime >= '2016-04-16 18:50:53.154'
group by u1.userId
) as referring on scoring.userId = referring.userId
order by leaderPoints desc
limit 10;

This is untested code, but it should do the trick. The Cross Apply is for readability...it'll hurt performance, but this doesn't seem to be a particularly process-intensive query, so I would keep it.
Please give it a try and let me know if you have any questions.
SELECT U.UserID,
ISNULL(SUM(CASE WHEN S.UserID IS NULL THEN 0 ELSE S.ratingPoints END), 0) AS [Rating Points],
ISNULL(SUM(CASE WHEN SS.userID IS NULL THEN 0 ELSE 20 END), 0) AS [Referral Points]
FROM Users U
LEFT OUTER JOIN scores S
ON S.userID = U.userID
AND S.scoreDateTime >= '2016-04-16 18:50:53.154'
LEFT OUTER JOIN locations L
ON S.locationID = L.locationID
AND L.locationArea = 'Massachusetts'
AND L.LocationCountry = 'United States'
LEFT OUTER JOIN Users U2
ON U2.referredByEmail = U.emailAddress
LEFT OUTER JOIN scores SS
ON SS.userID = U2.userID
LEFT OUTER JOIN locations LL
ON SS.locationID = LL.locationID
AND LL.locationArea = 'Massachusetts'
AND LL.locationCountry = 'United States'
AND SS.scoreDateTime >= '2016-04-16 18:50:53.154'
AND SS.scoreDateTime =
(
SELECT MIN(scoreDateTime)
FROM scores
where userID = U2.userID
)
GROUP BY U.userID
EDIT:
Modified answer to remove Cross Apply

Thanks Stan Shaw but I was unable to get your query to work on MySQL to test the results. However, I did notice a special case that was not covered by my original query. A user can get refer points from areas in which they themselves have not submitted scores. As long as the new user scores in that area, they get refer points there.
Here is the final query I'm using. I was not able to consolidate the duplicate where clauses in a way that appeared performant.
select userId, sum(points) as leaderPoints
from (
select s.userId, sum(s.ratingPoints) as points
from scores s, locations l
where s.locationId = l.locationId and
l.locationArea = 'Georgia' and
l.locationCountry = 'United States' and
s.scoreDateTime >= '2016-04-05 03:00:00.000' and
s.userId != 1
group by userId
union all -- plain union would collapse identical (userId, 20) referral rows into one
select u1.userId, 20 as points
from users u0, users u1, users u2, scores s, locations l
where u0.userId = u1.userId and
u2.referredByEmail = u1.emailAddress and
u2.userId = s.userId and
s.locationId = l.locationId and
l.locationArea = 'Georgia' and
l.locationCountry = 'United States' and
scoreDateTime >= '2016-04-05 03:00:00.000' and
scoreDateTime = (
select min(scoreDateTime)
from scores
where userId = u2.userId
)
) as pointsEarned
group by userId
order by leaderPoints desc
limit 10;
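A note on combining the two point sources with a set operator: the sketch below is a reduced version of the idea, run against an in-memory SQLite database rather than MySQL. The tables rating_totals and referrals and their contents are hypothetical stand-ins for the two subqueries. It suggests that UNION removes duplicate rows, so two referrals worth 20 points each would count once, while UNION ALL keeps both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rating_totals (userId INTEGER, points INTEGER);  -- hypothetical
CREATE TABLE referrals (userId INTEGER, points INTEGER);      -- hypothetical
INSERT INTO rating_totals VALUES (7, 15), (8, 40);
-- user 7 referred two new users, 20 points each
INSERT INTO referrals VALUES (7, 20), (7, 20);
""")

def leaderboard(set_operator):
    # set_operator is either "UNION" or "UNION ALL"
    sql = f"""
        SELECT userId, SUM(points) AS leaderPoints
        FROM (
            SELECT userId, points FROM rating_totals
            {set_operator}
            SELECT userId, points FROM referrals
        ) AS pointsEarned
        GROUP BY userId
        ORDER BY leaderPoints DESC, userId
    """
    return conn.execute(sql).fetchall()

with_union = leaderboard("UNION")          # duplicate (7, 20) rows collapse
with_union_all = leaderboard("UNION ALL")  # every referral row is kept
print(with_union)
print(with_union_all)
```

With this toy data, user 7 totals 35 under UNION and 55 under UNION ALL, which changes the leaderboard order.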

Related

Is there a method of counting an attribute that is in a GROUP BY clause?

I have created the select statement below to list all the customers that have been to multiple merchants.
I want to create another statement to display how many of those customers have been to each merchant.
What is the optimal method of approaching this problem?
Lists out all customers that have been to multiple merchants.
WITH valentinesDayMerchant AS (
SELECT m.MerchantId, m.MerchantGroupId, m.WebsiteName
FROM Merchant m
INNER JOIN OpeningHours oh ON m.MerchantId = oh.MerchantId AND oh.DayOfWeek = 'TUE'
LEFT JOIN devices.DeviceConnectionState AS dcs ON dcs.MerchantId = oh.MerchantId
WHERE MerchantStatus = '-' AND (m.PrinterType IN ('V','O') OR dcs.State = 1 OR dcs.StateTransitionDateTime > '2023-01-23')
)
SELECT DISTINCT ul.UserLoginId, ul.FullName, ul.EmailAddress, ul.Mobile
FROM dbo.UserLogin AS ul
INNER JOIN dbo.Patron AS p ON p.UserLoginId = ul.UserLoginId
INNER JOIN valentinesDayMerchant AS m ON (m.MerchantId = ul.ReferringMerchantId OR m.MerchantId IN (SELECT pml.MerchantId FROM dbo.PatronMerchantLink AS pml WHERE pml.PatronId = p.PatronId AND ISNULL(pml.IsBanned, 0) = 0))
LEFT JOIN (
SELECT mg.MerchantGroupId, mg.MerchantGroupName, groupHost.HostName [GroupHostName]
FROM dbo.MerchantGroup AS mg
INNER JOIN dbo.Merchant AS parent ON parent.MerchantId = mg.ParentMerchantId
INNER JOIN dbo.HttpHostName AS groupHost ON groupHost.MerchantID = parent.MerchantId AND groupHost.Priority = 0
) mGroup ON mGroup.MerchantGroupId = m.MerchantGroupId
LEFT JOIN (
SELECT po.PatronId, MAX(po.OrderDateTime) [LastOrder]
FROM dbo.PatronsOrder AS po
GROUP BY po.PatronId
) orders ON orders.PatronId = p.PatronId
INNER JOIN dbo.HttpHostName AS hhn ON hhn.MerchantID = m.MerchantId AND hhn.Priority = 1
WHERE ul.UserLoginId NOT IN (1,2,100,372) AND ul.UserStatus <> 'D' AND (
ISNULL(orders.LastOrder, '2000-01-01') > '2020-01-01' OR ul.RegistrationDate > '2022-01-01'
)
GROUP BY ul.UserLoginId, ul.FullName, ul.EmailAddress, ul.Mobile
HAVING COUNT(m.MerchantId) > 1
Methods I have tried include adding the merchant name to a group by and displaying the count of the customers, however this does not work as I cannot have anything related to the Merchant in the GROUP BY, or I wouldn't be able to use HAVING clause to identify the customers that have been to multiple merchants. I have also tried selecting all the merchants and counting the distinct customers which doesn't work as it takes into account all the customers, not specifically the customers that have been to multiple merchants only.

MYSQL LEFT JOINing the latest record from another table

I have a training_stats table (current due training) and I also have a completed_training table.
What I want to do is query due training with the last completed date from the completed table.
I've nearly got what I want, I get the due training, but they are duplicated with each completed record(as there are many completed records to each current due), and I only want single rows and the latest completed date.
I've been trying to use MAX, and when I run the MAX query independently, I get the last record. But when the MAX query is in the join, it is returning all completed rows.
This is the query that I am using:
SELECT s.course_stat_id
,o.org_name
,u.id
,u.first_name
,u.last_name
,a.area_id
,a.area_name
,tc.course_id
,tc.course_name
,s.assigned_on
,s.due
,s.pass_mark
,s.completed_on
,completed.complete_training_id
,completed.complete_date
FROM training_stats s
JOIN organisations o ON o.org_id = s.org_id
LEFT JOIN (
SELECT complete_training_id
,user_id
,area_id
,course_id
,max(completed_on) AS complete_date
FROM completed_training
GROUP BY complete_training_id
) completed ON completed.user_id = s.user_id
AND completed.area_id = s.area_id
AND completed.course_id = s.course_id
LEFT JOIN users u ON u.id = s.user_id
LEFT JOIN areas a ON a.area_id = s.area_id
LEFT JOIN training_courses tc ON tc.course_id = s.course_id
WHERE u.active = 1
AND o.active = 1
AND s.assigned = 1
Can you see what I am doing wrong?
I'm not exactly sure of your expected results, but the failure is probably in your GROUP BY and JOIN. Your GROUP BY is only on the training ID, but you are also pulling user, area and course as well as the max completion date for that training ID. Your GROUP BY and JOIN should match the unique characteristics.
Without seeing data, I interpret "complete_training_id" as an auto-increment column for that table. If so, there would only ever be one record for each ID, and grouping by it returns every row.
However, the completed training table can hold multiple training days for a single user, area and course, of which you want the most recent. For example, someone attending college needs to take many computer classes that are refreshers of prior ones, so assume all share the same course ID. A person could take them in 2012, 2014 and 2016. You would want the user/area/course row showing the 2016 training. So let's look at that first.
select
ct.user_id,
ct.area_id,
ct.course_id,
max(ct.completed_on) AS complete_date
FROM
completed_training ct
GROUP BY
ct.user_id,
ct.area_id,
ct.course_id
Now, for each user, area and course of study, I have one record with the most recent completion date. Now let's pull the rest of the details. Since you need the completed training ID too, I applied MAX() to it in the query below. The ID should by default increase every time a new record is added, so a training completed a year ago would have a lower ID than one completed today. That way you get both the completed ID and its corresponding date for a given user, area and course.
SELECT
s.course_stat_id,
o.org_name,
u.id,
u.first_name,
u.last_name,
a.area_id,
a.area_name,
tc.course_id,
tc.course_name,
s.assigned_on,
s.due,
s.pass_mark,
s.completed_on,
ct.complete_training_id,
ct.complete_date
FROM
training_stats s
JOIN organisations o
ON s.org_id = o.org_id
AND o.active = 1
LEFT JOIN
( select
ct.user_id,
ct.area_id,
ct.course_id,
max(ct.complete_training_id ) as complete_training_id,
max(ct.completed_on) AS complete_date
FROM
completed_training ct
GROUP BY
ct.user_id,
ct.area_id,
ct.course_id ) ct
on s.user_id = ct.user_id
AND s.area_id = ct.area_id
AND s.course_id = ct.course_id
JOIN users u
ON s.user_id = u.id
AND u.active = 1
LEFT JOIN areas a
ON s.area_id = a.area_id
LEFT JOIN training_courses tc
ON s.course_id = tc.course_id
WHERE
s.assigned = 1
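To see the effect of the GROUP BY columns in isolation, here is a minimal sketch using an in-memory SQLite database instead of MySQL. The sample rows are made up, and only the completed_training table from the question is modelled. It compares grouping by complete_training_id with grouping by user, area and course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE completed_training (
    complete_training_id INTEGER PRIMARY KEY,
    user_id INTEGER, area_id INTEGER, course_id INTEGER,
    completed_on TEXT
);
-- made-up rows: user 10 repeated the same course three times
INSERT INTO completed_training VALUES
    (1, 10, 1, 100, '2012-05-01'),
    (2, 10, 1, 100, '2014-05-01'),
    (3, 10, 1, 100, '2016-05-01'),
    (4, 11, 1, 100, '2015-03-01');
""")

# Grouping by the unique ID: every row is its own group, so nothing collapses.
by_id = conn.execute("""
    SELECT user_id, MAX(completed_on)
    FROM completed_training
    GROUP BY complete_training_id
    ORDER BY complete_training_id
""").fetchall()

# Grouping by the join keys: one row per user/area/course with the latest date.
by_keys = conn.execute("""
    SELECT user_id, area_id, course_id,
           MAX(complete_training_id), MAX(completed_on)
    FROM completed_training
    GROUP BY user_id, area_id, course_id
    ORDER BY user_id
""").fetchall()

print(len(by_id))
print(by_keys)
```

Grouping by the ID still returns four rows, while grouping by the three join keys returns one row per user with the 2016 and 2015 dates respectively.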
I'm not 100% sure of that. First, run this query. It should list all completed training, with a rnk from 1 (latest) to n (oldest).
SELECT complete_training_id
,user_id
,area_id
,course_id
,completed_on AS complete_date
,@curRank := case when complete_training_id <> @cur_complete_training_id then 0 else @curRank + 1 end rnk
FROM completed_training, (select @curRank := 0, @cur_complete_training_id := 0) vars
ORDER BY complete_training_id, completed_on DESC
If true, the answer is :
SELECT s.course_stat_id
,o.org_name
,u.id
,u.first_name
,u.last_name
,a.area_id
,a.area_name
,tc.course_id
,tc.course_name
,s.assigned_on
,s.due
,s.pass_mark
,s.completed_on
,completed.complete_training_id
,completed.complete_date
FROM training_stats s
JOIN organisations o ON o.org_id = s.org_id
LEFT JOIN (
SELECT complete_training_id
,user_id
,area_id
,course_id
,completed_on AS complete_date
,@curRank := case when complete_training_id <> @cur_complete_training_id then 0 else @curRank + 1 end rnk
FROM completed_training, (select @curRank := 0, @cur_complete_training_id := 0) vars
ORDER BY complete_training_id, completed_on DESC
) completed ON completed.user_id = s.user_id and completed.rnk = 1
AND completed.area_id = s.area_id
AND completed.course_id = s.course_id
LEFT JOIN users u ON u.id = s.user_id
LEFT JOIN areas a ON a.area_id = s.area_id
LEFT JOIN training_courses tc ON tc.course_id = s.course_id
WHERE u.active = 1
AND o.active = 1
AND s.assigned = 1

Issue returning row content data alongside Max(score) by class & round, using group by

I want to return the personal best for a user, by class & round; but the Date Shot is coming out incorrect. Help please - so frustrating!
SELECT
c.Class,
r.Round,
h.shootdate as 'Date Shot',
max(h.Score) AS 'Personal Best'
FROM history h, classes c, rounds r
WHERE c.id = h.classid AND r.id = h.roundid AND h.userid = 1
GROUP BY c.Class, r.Round
You could use a self join on the history table to pick the row with the maximum score for the user per classid and roundid.
SELECT
c.Class,
r.Round,
h.shootdate as 'Date Shot',
h.Score AS 'Personal Best'
FROM history h
JOIN (
SELECT classid, roundid, max(score) score
FROM history
WHERE userid = 1
GROUP BY classid, roundid
) h1 ON h.classid = h1.classid AND h.roundid = h1.roundid AND h.score = h1.score
JOIN classes c ON c.id = h.classid
JOIN rounds r ON r.id = h.roundid
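WHERE h.userid = 1 -- required: h is not filtered by user, so another user's row with the same class, round and score would also match
In your query you are selecting shootdate, which is not in the GROUP BY, so the value returned is not guaranteed to come from the row where the score is max. Also, use explicit join syntax to relate your tables.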
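The self-join approach can be tried on a few made-up rows. This sketch uses an in-memory SQLite database rather than MySQL and leaves out the classes and rounds lookups. It also shows what the outer filter on userid does when a different user happens to have the same score.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (
    userid INTEGER, classid INTEGER, roundid INTEGER,
    shootdate TEXT, score INTEGER
);
-- made-up rows for illustration
INSERT INTO history VALUES
    (1, 1, 1, '2016-01-10', 500),
    (1, 1, 1, '2016-02-10', 540),  -- user 1's personal best
    (1, 1, 1, '2016-03-10', 520),
    (2, 1, 1, '2016-04-10', 540);  -- another user with the same score
""")

query = """
    SELECT h.classid, h.roundid, h.shootdate, h.score
    FROM history h
    JOIN (
        SELECT classid, roundid, MAX(score) AS score
        FROM history
        WHERE userid = 1
        GROUP BY classid, roundid
    ) h1 ON h.classid = h1.classid
        AND h.roundid = h1.roundid
        AND h.score = h1.score
    {outer_filter}
    ORDER BY h.shootdate
"""

unfiltered = conn.execute(query.format(outer_filter="")).fetchall()
filtered = conn.execute(query.format(outer_filter="WHERE h.userid = 1")).fetchall()
print(unfiltered)
print(filtered)
```

Without the outer filter, the other user's 540 also matches the join, so two rows come back. With it, only the 2016-02-10 row for user 1 remains.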

Issues with duplicated Ids in aggregated subquery

I have this query: (apologies for complexity, I'm not certain what I can remove without impacting the question)
SELECT COUNT(*) AS total,
SUM(o.total) AS total_loss,
SUM((SELECT SUM(cost_price) FROM `orders_items` WHERE orders_id = o.orders_id)) AS cost_total ,
SUM((SELECT COUNT(*) FROM refunds AS r1 WHERE r1.order_id = r.order_id AND NOT r.reason IS NULL)) AS refund_count ,
SUM((SELECT COUNT(*) FROM exchanges AS e1 WHERE e1.order_id = e.order_id AND e.type = :countResend AND NOT e.reason IS NULL)) AS resend_count ,
SUM((SELECT COUNT(*) FROM exchanges AS e2 WHERE e2.order_id = e.order_id AND e.type = :countExchange AND NOT e.reason IS NULL)) AS exchange_count
FROM orders AS o
JOIN sales_channel_config AS s ON o.sales_channel = s.sales_channel AND o.sub_sales_channel = s.sub_sales_channel
JOIN courier_service AS cs ON o.courier_service = cs.code
LEFT JOIN refunds AS r ON o.orders_id = r.order_id
JOIN orders_items AS oi ON o.orders_id = oi.orders_id
JOIN third_party_config AS tc ON SUBSTRING(oi.product_id_new, 3, 2) = tc.code
LEFT JOIN exchanges AS e ON o.orders_id = e.order_id
WHERE 1 = 1
AND o.tracking_num NOT IN (:cancelStatus)
AND (o.order_date >= :startDate AND o.order_date <= :endDate)
AND o.courier_service = :courier
AND SUBSTRING(oi.product_id_new, 3, 2) = :supplier
AND (NOT r.reason IS NULL OR NOT e.reason IS NULL)
The problem I'm having is that the various SUM((query)) clauses are counting duplicate orders, which is proving difficult to resolve. For example:
SUM((SELECT COUNT(DISTINCT r1.order_id) FROM refunds AS r1 WHERE r1.order_id = r.order_id AND NOT r.reason IS NULL)) AS refund_count ,
And
SUM((SELECT COUNT(*) FROM refunds AS r1 WHERE r1.order_id = r.order_id AND NOT r.reason IS NULL GROUP BY r1.order_id)) AS refund_count ,
Do not lower the resulting SUM at all. I have confirmed that the data returned will contain duplicates via another structurally identical query that returns rows from the parent query. When the other query is run without duplicate filtering, the counts match correctly so I'm confident that my problem query is accurate aside from including duplicated order ids.
So can anyone suggest another approach I might try?
For anyone who might benefit:
I removed most of the select logic and grouped on orders_id, which gives me an entirely accurate list of relevant orders:
SELECT o.orders_id AS order_id, r.id AS refund_id, e.id AS exchange_id, e.type AS exchange_type
FROM orders AS o
JOIN sales_channel_config AS s ON o.sales_channel = s.sales_channel AND o.sub_sales_channel = s.sub_sales_channel
JOIN courier_service AS cs ON o.courier_service = cs.code
LEFT JOIN refunds AS r ON o.orders_id = r.order_id
JOIN orders_items AS oi ON o.orders_id = oi.orders_id
JOIN third_party_config AS tc ON SUBSTRING(oi.product_id_new, 3, 2) = tc.code
LEFT JOIN exchanges AS e ON o.orders_id = e.order_id
WHERE 1 = 1
AND o.tracking_num NOT IN (:cancelStatus)
AND (o.order_date >= :startDate
AND o.order_date <= :endDate)
AND o.courier_service = :courier
AND SUBSTRING(oi.product_id_new, 3, 2) = :supplier
AND (NOT r.reason IS NULL OR NOT e.reason IS NULL)
GROUP BY (o.orders_id)
I've bitten the bullet here. I'm going to do some post processing to get the counts themselves, which is at least possible for me now.
Still don't understand why getting distinct values in the sub selects failed though.
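One possible explanation, offered as a guess rather than a diagnosis: the correlated subquery is evaluated once per row of the outer join, and a DISTINCT inside it only affects what it returns for that one row. If the joins to orders_items, refunds and exchanges multiply the rows for an order, the outer SUM adds the same per-order count several times. The sketch below reproduces that effect with made-up data in an in-memory SQLite database, using only three of the tables from the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (orders_id INTEGER);
CREATE TABLE orders_items (orders_id INTEGER, cost_price REAL);
CREATE TABLE refunds (order_id INTEGER, reason TEXT);
-- made-up data: one order with two items and a single refund
INSERT INTO orders VALUES (1);
INSERT INTO orders_items VALUES (1, 5.0), (1, 7.0);
INSERT INTO refunds VALUES (1, 'damaged');
""")

joins = """
    FROM orders o
    JOIN orders_items oi ON oi.orders_id = o.orders_id
    LEFT JOIN refunds r ON r.order_id = o.orders_id
"""

# The join produces one row per (order, item, refund) combination.
joined_rows = conn.execute("SELECT COUNT(*) " + joins).fetchone()[0]

# DISTINCT inside the subquery yields 1 for each outer row, and SUM adds it per row.
refund_count = conn.execute("""
    SELECT SUM((SELECT COUNT(DISTINCT r1.order_id)
                FROM refunds r1
                WHERE r1.order_id = r.order_id))
""" + joins).fetchone()[0]

# Counting refunds directly, without the item join, avoids the multiplication.
direct_count = conn.execute("""
    SELECT COUNT(*) FROM refunds r
    WHERE r.reason IS NOT NULL
      AND r.order_id IN (SELECT orders_id FROM orders_items)
""").fetchone()[0]

print(joined_rows, refund_count, direct_count)
```

Here the single refund is reported as 2 by the SUM over the joined rows and as 1 when counted directly.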

How can I adjust a JOIN clause so that rows that have columns with NULL values are returned in the result?

How can I adjust this JOIN clause so that rows with a NULL value for the CountLocId or CountNatId columns are returned in the result?
In other words, if there is no match in the local_ads table, I still want the user's result from the nat_ads table to be returned -- and vice-versa.
SELECT u.franchise, CountLocId, TotalPrice, CountNatId, TotalNMoney, (
TotalPrice + TotalNMoney
)TotalRev
FROM users u
LEFT JOIN local_rev lr ON u.user_id = lr.user_id
LEFT JOIN (
SELECT lrr_id, COUNT( lad_id ) CountLocId, SUM( price ) TotalPrice
FROM local_ads
GROUP BY lrr_id
)la ON lr.lrr_id = la.lrr_id
LEFT JOIN nat_rev nr ON u.user_id = nr.user_id
INNER JOIN (
SELECT nrr_id, COUNT( nad_id ) CountNatId, SUM( tmoney ) TotalNMoney
FROM nat_ads
WHERE MONTH = 'April'
GROUP BY nrr_id
)na ON nr.nrr_id = na.nrr_id
WHERE lr.month = 'April'
AND franchise != 'Corporate'
ORDER BY franchise
Thanks in advance for your help!
Try the following condition when making the left join. This will take all rows from the right table with the matched condition, e.g.:
LEFT JOIN local_rev lr ON (u.user_id = lr.user_id) or (u.user_id IS NULL)
Use this template, as it ensures that:
you have only one record per user_id (notice that all subqueries have a GROUP BY user_id), so for one record in the users table you have one (or no) record in each subquery
independent joins (and their calculated data) are not mixed together
SELECT u.franchise, one.CountLocId, one.TotalPrice, two.CountNatId, two.TotalNMoney, (COALESCE(one.TotalPrice,0) + COALESCE(two.TotalNMoney,0)) TotalRev
FROM users u
LEFT JOIN (
SELECT x.user_id, sum(xORy.whatever) as TotalPrice, count(xORy.whatever) as CountLocId
FROM x -- where x is local_rev or local_ads I dont know
LEFT JOIN y on x.... = y.... -- where y is local_rev or local_ads I dont know
GROUP BY x.user_id
) as one on u.user_id = one.user_id
LEFT JOIN (
SELECT x.user_id, sum(xORy.whatever) as TotalNMoney, count(xORy.whatever) as CountNatId
FROM x -- where x is nat_rev or nat_ads I dont know
LEFT JOIN y on x.... = y.... -- where y is nat_rev or nat_ads I dont know
GROUP BY x.user_id
) as two on u.user_id = two.user_id
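A filled-in version of the template can be tried on toy data. The sketch below uses an in-memory SQLite database, and the tables local_sales and national_sales are hypothetical simplifications that fold each rev/ads pair from the question into one table keyed by user_id. It only aims to show that users with data on one side still appear, with NULL on the other side and a usable TotalRev.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, franchise TEXT);
CREATE TABLE local_sales (user_id INTEGER, price INTEGER);      -- hypothetical
CREATE TABLE national_sales (user_id INTEGER, tmoney INTEGER);  -- hypothetical
INSERT INTO users VALUES (1, 'Boston'), (2, 'Denver');
INSERT INTO local_sales VALUES (1, 100), (1, 50);  -- only Boston has local ads
INSERT INTO national_sales VALUES (2, 300);        -- only Denver has national ads
""")

rows = conn.execute("""
    SELECT u.franchise,
           one.CountLocId, one.TotalPrice,
           two.CountNatId, two.TotalNMoney,
           COALESCE(one.TotalPrice, 0) + COALESCE(two.TotalNMoney, 0) AS TotalRev
    FROM users u
    LEFT JOIN (
        SELECT user_id, COUNT(*) AS CountLocId, SUM(price) AS TotalPrice
        FROM local_sales
        GROUP BY user_id
    ) AS one ON u.user_id = one.user_id
    LEFT JOIN (
        SELECT user_id, COUNT(*) AS CountNatId, SUM(tmoney) AS TotalNMoney
        FROM national_sales
        GROUP BY user_id
    ) AS two ON u.user_id = two.user_id
    ORDER BY u.franchise
""").fetchall()

for row in rows:
    print(row)
```

Both franchises are returned: the missing side shows as None (NULL) and COALESCE keeps TotalRev from turning NULL.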