First time on here, hoping for help. (MySQL) I tried to use subqueries in a SELECT statement, but when I GROUP BY, each subquery's single aggregate value comes out as the same value on every row of the result. This implies they are not grouped, right? How close am I to getting this right? Thanks
SELECT
c.name, ca.name, DATE_FORMAT(sp.created,'%Y%m') AS yr_month,
ss.signup_source, count(sp.seller_profile_id) AS No_seller_profiles,
(SELECT SUM(seller_invoice.gbp_value)/100
FROM seller_invoice JOIN seller_profile
ON seller_invoice.seller_profile_id = seller_profile.seller_profile_id
WHERE seller_invoice.created BETWEEN seller_profile.created AND ADDDATE(seller_profile.created, INTERVAL 30 DAY)),
(SELECT count(project_response.project_response_id)
FROM project_response JOIN seller_profile
ON project_response.seller_profile_id = seller_profile.seller_profile_id
WHERE project_response.created BETWEEN seller_profile.created AND ADDDATE(seller_profile.created, INTERVAL 30 DAY) AND project_response.is_visible_to_seller = 1)
FROM seller_profile AS sp
JOIN country AS c ON sp.country_id = c.country_id
JOIN seller_category AS sc ON sp.seller_profile_id = sc.seller_profile_id
JOIN category AS ca ON sc.category_id = ca.category_id
JOIN seller_signup_source AS ss ON sp.seller_profile_id = ss.seller_profile_id
WHERE sp.created BETWEEN '2018-11-01' AND '2018-12-31'
GROUP BY 1,2,3,4;
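One possible direction, offered as an untested sketch rather than a definitive fix: the two subqueries above never refer to sp, so each is computed once over the whole table and repeated on every row. Pre-aggregating each metric per seller_profile_id in a derived table and joining it to sp lets the outer GROUP BY sum it per group. The aliases inv, resp, gbp_30d, responses_30d and the output column names are made up for illustration; the table and column names come from the question.

```sql
SELECT
    c.name AS country,
    ca.name AS category,
    DATE_FORMAT(sp.created, '%Y%m') AS yr_month,
    ss.signup_source,
    COUNT(sp.seller_profile_id) AS no_seller_profiles,
    SUM(inv.gbp_30d) AS invoiced_gbp_30d,       -- NULL when no seller in the group invoiced
    SUM(resp.responses_30d) AS responses_30d
FROM seller_profile AS sp
JOIN country AS c ON sp.country_id = c.country_id
JOIN seller_category AS sc ON sp.seller_profile_id = sc.seller_profile_id
JOIN category AS ca ON sc.category_id = ca.category_id
JOIN seller_signup_source AS ss ON sp.seller_profile_id = ss.seller_profile_id
-- One row per seller: invoiced value in the first 30 days after sign-up.
LEFT JOIN (
    SELECT p.seller_profile_id, SUM(si.gbp_value) / 100 AS gbp_30d
    FROM seller_profile AS p
    JOIN seller_invoice AS si ON si.seller_profile_id = p.seller_profile_id
    WHERE si.created BETWEEN p.created AND ADDDATE(p.created, INTERVAL 30 DAY)
    GROUP BY p.seller_profile_id
) AS inv ON inv.seller_profile_id = sp.seller_profile_id
-- One row per seller: visible project responses in the same 30-day window.
LEFT JOIN (
    SELECT p.seller_profile_id, COUNT(pr.project_response_id) AS responses_30d
    FROM seller_profile AS p
    JOIN project_response AS pr ON pr.seller_profile_id = p.seller_profile_id
    WHERE pr.created BETWEEN p.created AND ADDDATE(p.created, INTERVAL 30 DAY)
      AND pr.is_visible_to_seller = 1
    GROUP BY p.seller_profile_id
) AS resp ON resp.seller_profile_id = sp.seller_profile_id
WHERE sp.created BETWEEN '2018-11-01' AND '2018-12-31'
GROUP BY 1, 2, 3, 4;
```

Note that a seller listed under several categories or signup sources is counted once in each of those groups, so the group totals will not add up to the overall total.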
I have a training_stats table (current due training) and I also have a completed_training table.
What I want to do is query due training with the last completed date from the completed table.
I've nearly got what I want: I get the due training, but each row is duplicated once per completed record (there are many completed records for each current due training), and I only want a single row with the latest completed date.
I've been trying to use MAX. When I run the MAX query on its own, I get the last record, but when the MAX query is in the join, it returns all completed rows.
This is the query that I am using:
SELECT s.course_stat_id
,o.org_name
,u.id
,u.first_name
,u.last_name
,a.area_id
,a.area_name
,tc.course_id
,tc.course_name
,s.assigned_on
,s.due
,s.pass_mark
,s.completed_on
,completed.complete_training_id
,completed.complete_date
FROM training_stats s
JOIN organisations o ON o.org_id = s.org_id
LEFT JOIN (
SELECT complete_training_id
,user_id
,area_id
,course_id
,max(completed_on) AS complete_date
FROM completed_training
GROUP BY complete_training_id
) completed ON completed.user_id = s.user_id
AND completed.area_id = s.area_id
AND completed.course_id = s.course_id
LEFT JOIN users u ON u.id = s.user_id
LEFT JOIN areas a ON a.area_id = s.area_id
LEFT JOIN training_courses tc ON tc.course_id = s.course_id
WHERE u.active = 1
AND o.active = 1
AND s.assigned = 1
Can you see what I am doing wrong?
I'm not entirely sure of your expected results, but the failure is PROBABLY in your GROUP BY and JOIN. Your GROUP BY is ONLY on the training ID, but you are also pulling user, area and course, as well as the max completed date for that training ID, user, area and course. Your GROUP BY and JOIN should match the unique characteristics.
Without seeing the data, my reading of the query is that "complete_training_id" is an auto-increment column for that table. If so, there would only ever be one record per ID, so grouping by it collapses nothing and every completed row comes back.
That said, the completed training table can hold multiple training dates for a single user, area and course, and you want the most recent one. For example, someone attending college needs to take many computer classes that are refreshers of earlier ones, so assume they all share the same course ID. A person could take them in 2012, 2014 and 2016. You would want the user/area/course row showing the 2016 training. So let's look at that first.
select
ct.user_id,
ct.area_id,
ct.course_id,
max(ct.completed_on) AS complete_date
FROM
completed_training ct
GROUP BY
ct.user_id,
ct.area_id,
ct.course_id
Now, for each user, area and course of study, I have one record with the most recent completion date. NOW let's pull the rest of the details. Since you need the completed training ID too, I applied MAX() to it in the query below. The ID should by default increase every time a new record is added, so a training completed a year ago would have a lower ID than one completed today. That way you get both the completed ID and its corresponding date for a given user, area and course.
SELECT
s.course_stat_id,
o.org_name,
u.id,
u.first_name,
u.last_name,
a.area_id,
a.area_name,
tc.course_id,
tc.course_name,
s.assigned_on,
s.due,
s.pass_mark,
s.completed_on,
ct.complete_training_id,
ct.complete_date
FROM
training_stats s
JOIN organisations o
ON s.org_id = o.org_id
AND o.active = 1
LEFT JOIN
( select
ct.user_id,
ct.area_id,
ct.course_id,
max(ct.complete_training_id ) as complete_training_id,
max(ct.completed_on) AS complete_date
FROM
completed_training ct
GROUP BY
ct.user_id,
ct.area_id,
ct.course_id ) ct
on s.user_id = ct.user_id
AND s.area_id = ct.area_id
AND s.course_id = ct.course_id
JOIN users u
ON s.user_id = u.id
AND u.active = 1
LEFT JOIN areas a
ON s.area_id = a.area_id
LEFT JOIN training_courses tc
ON s.course_id = tc.course_id
WHERE
s.assigned = 1
I'm not 100% sure of that. First, run this query. It should list all completed training, with a rnk from 1 (latest) to n (oldest).
SELECT complete_training_id
,user_id
,area_id
,course_id
,completed_on AS complete_date
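-- @ marks a MySQL user variable; MySQL does not guarantee the evaluation order of these assignments, so check the ranks
,@curRank := case when complete_training_id <> @cur_complete_training_id then 1 else @curRank + 1 end rnk
,@cur_complete_training_id := complete_training_id AS cur_id
FROM completed_training, (select @curRank := 0, @cur_complete_training_id := 0) vars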
ORDER BY complete_training_id, completed_on DESC
If true, the answer is:
SELECT s.course_stat_id
,o.org_name
,u.id
,u.first_name
,u.last_name
,a.area_id
,a.area_name
,tc.course_id
,tc.course_name
,s.assigned_on
,s.due
,s.pass_mark
,s.completed_on
,completed.complete_training_id
,completed.complete_date
FROM training_stats s
JOIN organisations o ON o.org_id = s.org_id
LEFT JOIN (
SELECT complete_training_id
,user_id
,area_id
,course_id
,completed_on AS complete_date
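-- @ marks a MySQL user variable; MySQL does not guarantee the evaluation order of these assignments, so check the ranks
,@curRank := case when complete_training_id <> @cur_complete_training_id then 1 else @curRank + 1 end rnk
,@cur_complete_training_id := complete_training_id AS cur_id
FROM completed_training, (select @curRank := 0, @cur_complete_training_id := 0) vars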
ORDER BY complete_training_id, completed_on DESC
) completed ON completed.user_id = s.user_id and completed.rnk = 1
AND completed.area_id = s.area_id
AND completed.course_id = s.course_id
LEFT JOIN users u ON u.id = s.user_id
LEFT JOIN areas a ON a.area_id = s.area_id
LEFT JOIN training_courses tc ON tc.course_id = s.course_id
WHERE u.active = 1
AND o.active = 1
AND s.assigned = 1
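If the server is MySQL 8.0 or later (an assumption, the question does not say), the same "rank, then keep rank 1" idea can be sketched with the ROW_NUMBER() window function instead of user variables. Here the ranking restarts per user, area and course, which are the columns the outer query joins on; the alias ranked is made up for illustration.

```sql
-- Assumes MySQL 8.0+; window functions are not available in earlier versions.
SELECT complete_training_id, user_id, area_id, course_id, complete_date
FROM (
    SELECT ct.complete_training_id,
           ct.user_id,
           ct.area_id,
           ct.course_id,
           ct.completed_on AS complete_date,
           -- rnk = 1 is the most recent completion for this user/area/course;
           -- the ID breaks ties between completions on the same date.
           ROW_NUMBER() OVER (
               PARTITION BY ct.user_id, ct.area_id, ct.course_id
               ORDER BY ct.completed_on DESC, ct.complete_training_id DESC
           ) AS rnk
    FROM completed_training ct
) ranked
WHERE rnk = 1;
```

Used as the derived table in the LEFT JOIN, this returns at most one completed row per due training, and the ID and date are guaranteed to come from the same row.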
I want to count the patient diagnoses per municipality and the consultations per municipality, so the result should be:
diagnoses per municipality + consultations per municipality
SELECT COUNT(consultations.id) +
(SELECT COUNT(patientdiagnosis.id)
FROM consultations
LEFT JOIN patientdiagnosis
ON patientdiagnosis.consultation_id = consultations.id
LEFT JOIN patients
ON consultations.patient_id = patients.id
LEFT JOIN rcitymun
ON patients.municipality = rcitymun.citycode
/*GROUP BY PER MUNICIPALITY SHOULD BE HERE*/
) as encounters, rcitymun.cityname
FROM consultations
LEFT JOIN patients
ON consultations.patient_id = patients.id
LEFT JOIN rcitymun
ON patients.municipality = rcitymun.citycode
GROUP BY patients.municipality;
current output:
encounters municipality
10323 BATAC
10423 NUEVA ERA
The encounter count is huge because the subquery counts all diagnoses instead of the diagnoses per municipality.
What I want is to count the diagnoses per municipality.
The desired output is something like this:
encounters municipality
105 BATAC
70 NUEVA ERA
It may be possible to reduce this by one subquery, but often it is best to start with independently grouped subqueries.
SELECT
rcitymun.cityname
, SUM(c.consult_count) consult_count
, SUM(d.diag_count) diag_count
FROM patients
INNER JOIN rcitymun ON patients.municipality = rcitymun.citycode
LEFT JOIN (
SELECT
consultations.patient_id
, COUNT(*) consult_count
FROM consultations
GROUP BY
consultations.patient_id
) c ON patients.id = c.patient_id
LEFT JOIN (
SELECT
consultations.patient_id
, COUNT(*) diag_count
FROM consultations
INNER JOIN patientdiagnosis ON patientdiagnosis.consultation_id = consultations.id
GROUP BY
consultations.patient_id
) d ON patients.id = d.patient_id
GROUP BY
rcitymun.cityname
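The question asked for one encounters figure per municipality rather than two columns. A sketch of how the two counts could be added together, assuming the same tables and joins as above: the COALESCE calls are there because SUM() over only NULLs returns NULL (a municipality whose patients have no diagnoses, say), and NULL plus a number is NULL.

```sql
SELECT
      rcitymun.cityname
    , COALESCE(SUM(c.consult_count), 0)
      + COALESCE(SUM(d.diag_count), 0) AS encounters
FROM patients
INNER JOIN rcitymun ON patients.municipality = rcitymun.citycode
-- Consultations per patient.
LEFT JOIN (
    SELECT consultations.patient_id, COUNT(*) AS consult_count
    FROM consultations
    GROUP BY consultations.patient_id
) c ON patients.id = c.patient_id
-- Diagnoses per patient, reached through that patient's consultations.
LEFT JOIN (
    SELECT consultations.patient_id, COUNT(*) AS diag_count
    FROM consultations
    INNER JOIN patientdiagnosis ON patientdiagnosis.consultation_id = consultations.id
    GROUP BY consultations.patient_id
) d ON patients.id = d.patient_id
GROUP BY rcitymun.cityname;
```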
Another annoying student here!
Today I spent hours trying to combine (select) 2 already joined SQL outputs plus the ID of the original table into a single table output, which ultimately resulted in this query:
SELECT * FROM(
SELECT fd1.User_idUser,avg(fd1.caloryIntake)
AS 'workdays'
FROM fact_dailysnapshot fd1
INNER JOIN dim_day dd1 ON dd1.DATE_SK = fd1.DATE_SK
WHERE dd1.weekend_ind = 'N'
GROUP BY fd1.User_idUser
ORDER BY fd1.User_idUser) A,
(SELECT avg(fd1.caloryIntake) AS 'weekend'
FROM fact_dailysnapshot fd1
INNER JOIN dim_day dd1 ON dd1.DATE_SK = fd1.DATE_SK
WHERE dd1.weekend_ind = 'Y'
GROUP BY fd1.User_idUser
ORDER BY fd1.User_idUser) B;
Which translates into…
Now this is a false result, the second column gives an almost constant value for all user entries. I think this must be solved with some kind of EXTRA join but I literally ran out of ideas. Thanks in advance..!
Your JOIN is missing an ON clause to relate the two subqueries on User_idUser.
But, the simplest way to write the query uses conditional aggregation:
SELECT fd1.User_idUser,
avg(case when dd1.weekend_ind = 'N' then fd1.caloryIntake end) as weekday_avg,
avg(case when dd1.weekend_ind = 'Y' then fd1.caloryIntake end) as weekend_avg
FROM fact_dailysnapshot fd1 INNER JOIN
dim_day dd1
ON dd1.DATE_SK = fd1.DATE_SK
GROUP BY fd1.User_idUser
ORDER BY fd1.User_idUser;
This is one query instead of two.
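To see why this works, here is a small self-contained sketch with made-up inline rows (the names t, user_id and calories are hypothetical stand-ins for the joined fact and dimension columns). A CASE with no ELSE yields NULL for rows that do not match, and AVG() skips NULLs, so each average only sees its own kind of day.

```sql
SELECT t.user_id,
       AVG(CASE WHEN t.weekend_ind = 'N' THEN t.calories END) AS weekday_avg,
       AVG(CASE WHEN t.weekend_ind = 'Y' THEN t.calories END) AS weekend_avg
FROM (
    -- Hypothetical sample data: user 1 has two weekdays and one weekend day,
    -- user 2 has a single weekday and no weekend rows.
    SELECT 1 AS user_id, 'N' AS weekend_ind, 2000 AS calories
    UNION ALL SELECT 1, 'N', 2200
    UNION ALL SELECT 1, 'Y', 3000
    UNION ALL SELECT 2, 'N', 1800
) t
GROUP BY t.user_id
ORDER BY t.user_id;
-- User 1: weekday_avg is the mean of 2000 and 2200, weekend_avg is 3000.
-- User 2: weekend_avg is NULL because no row matches the 'Y' condition.
```

A user with no weekend rows still appears in the result, with a NULL weekend average.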
If I understand correctly, this is what you are looking for:
SELECT A.User_idUser, A.workdays, B.weekend
FROM (
SELECT fd1.User_idUser, avg(fd1.caloryIntake) AS 'workdays'
FROM fact_dailysnapshot fd1
INNER JOIN dim_day dd1
ON dd1.DATE_SK = fd1.DATE_SK
WHERE dd1.weekend_ind = 'N'
GROUP BY fd1.User_idUser
ORDER BY fd1.User_idUser) A
JOIN
(SELECT fd1.User_idUser, avg(fd1.caloryIntake) AS 'weekend'
FROM fact_dailysnapshot fd1
INNER JOIN dim_day dd1
ON dd1.DATE_SK = fd1.DATE_SK
WHERE dd1.weekend_ind = 'Y'
GROUP BY fd1.User_idUser
ORDER BY fd1.User_idUser) B
ON A.User_idUser = B.User_idUser
Each subquery gives you all users by ID with their workday or weekend average. You need to JOIN the results of the two queries on the user ID.
I have four tables: person, loan, ca, payments.
I would like to get the sum of all payment amounts and cash advance amounts that have the same ID as a loan, joined with a person, for a specific date.
Here is my code, but the sums are calculated incorrectly:
SELECT pd.*,
l.total_loan_amount,
sum(c.ca_total_amount) AS ctot,
sum(p.payment_amount)
FROM personal_data pd
LEFT JOIN loans l
ON pd.id_personal_data = l.id_personal_data
LEFT JOIN ca c
ON l.id_loan = c.id_loan
LEFT JOIN payments p
ON l.id_loan = p.id_loan
WHERE l.loan_date = curDate()
AND (
c.ca_date = curDate()
OR c.ca_date IS NULL
)
AND (
p.payment_date = curDate()
OR p.payment_date IS NULL
)
GROUP BY pd.id_personal_data
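Doing that can return wrong totals. When a loan has several rows in both ca and payments, the joins pair every cash advance with every payment, so each amount is summed more than once. The id may also be missing from one of the tables.
Try using a subquery for each total you want to retrieve.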
SELECT pd.*,
l.total_loan_amount,
c.totalCA,
p.totalPayment
FROM personal_data pd
LEFT JOIN loans l
ON pd.id_personal_data = l.id_personal_data
LEFT JOIN
(
SELECT id_loan, SUM(ca_total_amount) totalCA
FROM ca
-- WHERE DATE(ca_date) = DATE(CURDATE()) OR
-- ca_date IS NULL
GROUP BY id_loan
) c ON l.id_loan = c.id_loan
LEFT JOIN
(
SELECT id_loan, SUM(payment_amount) totalPayment
FROM payments
-- WHERE DATE(payment_date) = DATE(CURDATE()) OR
-- payment_date IS NULL
GROUP BY id_loan
) p ON l.id_loan = p.id_loan
WHERE DATE(l.loan_date) = DATE(curDate())
I think the dates on each payment and cash advance are irrelevant, because you are looking for their totals based on the date of the loan.
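A small self-contained sketch of why the single-query version overcounts, using made-up inline rows: one loan with two cash advances (100 and 50) and three payments (10, 20 and 30). The column names follow the question; the values are hypothetical.

```sql
SELECT SUM(c.ca_total_amount) AS ctot,   -- each advance is repeated once per payment
       SUM(p.payment_amount)  AS ptot    -- each payment is repeated once per advance
FROM (SELECT 1 AS id_loan) l
LEFT JOIN (
    SELECT 1 AS id_loan, 100 AS ca_total_amount
    UNION ALL SELECT 1, 50
) c ON l.id_loan = c.id_loan
LEFT JOIN (
    SELECT 1 AS id_loan, 10 AS payment_amount
    UNION ALL SELECT 1, 20
    UNION ALL SELECT 1, 30
) p ON l.id_loan = p.id_loan;
-- The joins produce 2 x 3 = 6 rows, so ctot is 450 instead of 150
-- and ptot is 120 instead of 60.
```

Aggregating ca and payments per id_loan before joining, as in the query above, keeps each side at one row per loan, so nothing is multiplied.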