MySQL SELECT query with double JOIN statement

Hi guys. I need to select the customer_id values from the 'base' table where the created date is the current date and where the customer doesn't have a budget_item executed on the current date. My query returns an incorrect result: it shows the customer_id of all budget_id values that are not on that date.
What is wrong with my statement?
PS: I cannot consolidate the tables.
SELECT base.customer_id
FROM base
LEFT JOIN budget ON base.customer_id = budget.customer_id
LEFT JOIN budget_item ON budget.budget_id = budget_item.budget_id
WHERE
CAST(base.created as Date) = CURDATE()
AND budget_item.execution_date <> CURDATE();

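The condition budget_item.execution_date <> CURDATE() keeps every customer that has at least one item on some other date, and it drops customers with no items at all, because comparing NULL yields NULL. Instead, left join budget/budget_item specifically looking for items with the current date, then exclude them in WHERE by checking that some non-nullable column of budget_item is NULL (indicating no record was joined):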
SELECT base.customer_id
FROM base
LEFT JOIN budget ON base.customer_id = budget.customer_id
LEFT JOIN budget_item ON budget.budget_id = budget_item.budget_id AND budget_item.execution_date = CURDATE()
WHERE
CAST(base.created as Date) = CURDATE()
AND budget_item.budget_id IS NULL;
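Some prefer using NOT EXISTS for this. The efficiency should be about the same, and so should the result as long as each customer has at most one budget. If a customer has several budgets, the LEFT JOIN version still returns the customer whenever any one of those budgets has no item for today, whereas NOT EXISTS checks all of the customer's budgets together: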
SELECT base.customer_id
FROM base
WHERE
CAST(base.created as Date) = CURDATE()
AND NOT EXISTS (
SELECT 1
FROM budget
JOIN budget_item ON budget.budget_id = budget_item.budget_id AND budget_item.execution_date = CURDATE()
WHERE budget.customer_id = base.customer_id
);
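To see how the three queries behave on concrete rows, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, a fixed TODAY string instead of CURDATE(), and made-up sample data; the table and column names follow the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL and TODAY replaces CURDATE().
TODAY = "2024-01-10"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base (customer_id INTEGER, created TEXT);
CREATE TABLE budget (budget_id INTEGER, customer_id INTEGER);
CREATE TABLE budget_item (budget_id INTEGER, execution_date TEXT);
INSERT INTO base VALUES (1, '2024-01-10'), (2, '2024-01-10'),
                        (3, '2024-01-10'), (4, '2024-01-10');
-- 1: items today and yesterday; 2: item yesterday only; 3: no budget;
-- 4: two budgets, only budget 40 has an item today
INSERT INTO budget VALUES (10, 1), (20, 2), (40, 4), (41, 4);
INSERT INTO budget_item VALUES (10, '2024-01-10'), (10, '2024-01-09'),
                               (20, '2024-01-09'), (40, '2024-01-10');
""")

ORIGINAL_SQL = """
SELECT base.customer_id FROM base
LEFT JOIN budget ON base.customer_id = budget.customer_id
LEFT JOIN budget_item ON budget.budget_id = budget_item.budget_id
WHERE base.created = ? AND budget_item.execution_date <> ?
ORDER BY base.customer_id
"""

LEFT_JOIN_SQL = """
SELECT base.customer_id FROM base
LEFT JOIN budget ON base.customer_id = budget.customer_id
LEFT JOIN budget_item ON budget.budget_id = budget_item.budget_id
    AND budget_item.execution_date = ?
WHERE base.created = ? AND budget_item.budget_id IS NULL
ORDER BY base.customer_id
"""

NOT_EXISTS_SQL = """
SELECT base.customer_id FROM base
WHERE base.created = ? AND NOT EXISTS (
    SELECT 1 FROM budget
    JOIN budget_item ON budget.budget_id = budget_item.budget_id
        AND budget_item.execution_date = ?
    WHERE budget.customer_id = base.customer_id
)
ORDER BY base.customer_id
"""

def ids(sql):
    # Both placeholders in every query receive the same date.
    return [row[0] for row in conn.execute(sql, (TODAY, TODAY))]

original = ids(ORIGINAL_SQL)      # [1, 2]: wrong, 1 has an item today, 3 is lost
left_join = ids(LEFT_JOIN_SQL)    # [2, 3, 4]: 4 slips through via budget 41
not_exists = ids(NOT_EXISTS_SQL)  # [2, 3]
print(original, left_join, not_exists)
```

Customer 4 is the case to watch: with more than one budget per customer, only the NOT EXISTS form looks at all of the customer's budgets at once.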

Use aggregation:
SELECT base.customer_id
FROM base LEFT JOIN
budget bu
ON base.customer_id = bu.customer_id LEFT JOIN
budget_item bi
ON bu.budget_id = bi.budget_id
WHERE CAST(base.created as Date) = CURDATE()
GROUP BY base.customer_id
HAVING COALESCE(SUM(bi.execution_date = CURDATE()), 0) = 0; -- COALESCE keeps customers with no budget items, for whom SUM(...) is NULL
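One thing worth checking with the aggregation approach is how it treats customers that have no budget items at all: SUM over only NULL values is NULL, and a HAVING condition that evaluates to NULL drops the group. The sketch below illustrates this under some assumptions: SQLite via Python's sqlite3 instead of MySQL, a fixed date string instead of CURDATE(), and a simplified hypothetical schema in which budget_item links directly to the customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base (customer_id INTEGER);
-- Hypothetical simplification: items reference the customer directly.
CREATE TABLE budget_item (customer_id INTEGER, execution_date TEXT);
INSERT INTO base VALUES (1), (2), (3);
-- 1: item today; 2: item yesterday; 3: no items at all
INSERT INTO budget_item VALUES (1, '2024-01-10'), (2, '2024-01-09');
""")

TEMPLATE = """
SELECT b.customer_id
FROM base b
LEFT JOIN budget_item bi ON bi.customer_id = b.customer_id
GROUP BY b.customer_id
HAVING {condition}
ORDER BY b.customer_id
"""

def ids(condition):
    sql = TEMPLATE.format(condition=condition)
    return [row[0] for row in conn.execute(sql)]

# SUM(...) is NULL for customer 3, so "NULL = 0" filters that group out.
plain = ids("SUM(bi.execution_date = '2024-01-10') = 0")
# COALESCE maps the NULL sum to 0 and keeps customer 3.
guarded = ids("COALESCE(SUM(bi.execution_date = '2024-01-10'), 0) = 0")
print(plain, guarded)  # [2] [2, 3]
```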

MYSQL use GROUP BY to SUM timediff to get a total open time

I have multiple tables that I have to join together in order to work out how long tickets have been open. I am using the following query (convoluted, I know!):
SELECT DISTINCT u_db.environments.name AS Env_name, TIMEDIFF(u_db.tickets.close_date, u_db.tickets.created_date) AS Total_open_time
FROM u_db.tickets
INNER JOIN u_db.ticket_units
ON u_db.tickets.id = u_db.ticket_units.ticket_id
INNER JOIN u_db.units
ON u_db.ticket_units.unit_id = u_db.units.id
INNER JOIN u_db.locations
ON u_db.units.location_id = u_db.locations.id
INNER JOIN u_db.location_groups
ON u_db.locations.locations_group_id = u_db.location_groups.id
INNER JOIN u_db.environments
ON u_db.location_groups.environment = u_db.environments.id
WHERE u_db.tickets.created_date >= '2021-09-01 00:00:00'
AND u_db.tickets.created_date < '2021-10-01 00:00:00'
AND u_db.location_groups.id IN (50,17,46,45,48,49)
AND u_db.tickets.id IN (132357,132361,132372,132473);
Note: the close_date and created_date are stored as TIMESTAMP.
This generates the following output:
Env_name Total_open_time
GA 27:38:59
GA 01:43:51
GR 04:32:58
GR 49:39:19
However, I would like to group by Env_name and SUM the Total_open_times for each group, so my desired output is:
Env_name Total_open_time
GA 29:22:50
GR 54:12:17
I cannot seem to get the times to sum when I group by Env_name. Any suggestions on how to achieve this would be greatly appreciated!
You can sum the differences in seconds and then convert the total back to a time with SEC_TO_TIME. Note that the result is limited to the range of the TIME type, which tops out at 838:59:59:
SELECT u_db.environments.name AS Env_name, SEC_TO_TIME(SUM(TIMESTAMPDIFF(SECOND, u_db.tickets.created_date, u_db.tickets.close_date))) AS Total_open_time
FROM u_db.tickets
INNER JOIN u_db.ticket_units
ON u_db.tickets.id = u_db.ticket_units.ticket_id
INNER JOIN u_db.units
ON u_db.ticket_units.unit_id = u_db.units.id
INNER JOIN u_db.locations
ON u_db.units.location_id = u_db.locations.id
INNER JOIN u_db.location_groups
ON u_db.locations.locations_group_id = u_db.location_groups.id
INNER JOIN u_db.environments
ON u_db.location_groups.environment = u_db.environments.id
WHERE u_db.tickets.created_date >= '2021-09-01 00:00:00'
AND u_db.tickets.created_date < '2021-10-01 00:00:00'
AND u_db.location_groups.id IN (50,17,46,45,48,49)
AND u_db.tickets.id IN (132357,132361,132372,132473)
GROUP BY Env_name
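The arithmetic behind this can be checked against the figures in the question. The sketch below is only an illustration under stated assumptions: it uses SQLite via Python's sqlite3 rather than MySQL, a single hypothetical tickets table with an env_name column in place of the joins, and timestamps invented so that the four open times match the ones listed above. The helper sec_to_time is a hypothetical stand-in for MySQL's SEC_TO_TIME.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened table; the real query reaches env_name through joins.
CREATE TABLE tickets (env_name TEXT, created_date TEXT, close_date TEXT);
INSERT INTO tickets VALUES
  ('GA', '2021-09-01 00:00:00', '2021-09-02 03:38:59'),  -- open 27:38:59
  ('GA', '2021-09-01 00:00:00', '2021-09-01 01:43:51'),  -- open 01:43:51
  ('GR', '2021-09-01 00:00:00', '2021-09-01 04:32:58'),  -- open 04:32:58
  ('GR', '2021-09-01 00:00:00', '2021-09-03 01:39:19');  -- open 49:39:19
""")

def sec_to_time(total_seconds):
    """Format a number of seconds as HH:MM:SS (hours may exceed 24)."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# Sum the per-ticket differences in seconds, one total per environment.
rows = conn.execute("""
SELECT env_name,
       SUM(CAST(strftime('%s', close_date) AS INTEGER)
         - CAST(strftime('%s', created_date) AS INTEGER)) AS open_seconds
FROM tickets
GROUP BY env_name
ORDER BY env_name
""").fetchall()

totals = [(name, sec_to_time(seconds)) for name, seconds in rows]
print(totals)  # [('GA', '29:22:50'), ('GR', '54:12:17')]
```

The totals agree with the desired output in the question, which suggests that summing seconds first and formatting last is the right order of operations.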

GROUP BY from inner SELECT subquery is ignored in column sum

I have the following query:
SELECT YEAR(T.date), MONTH(T.date), T.production, T.lineID, SUM(rework + scrap)
FROM
(SELECT MAX(positionID), date, production, lineID
FROM productionPerPosition
WHERE lineID = 2
AND date BETWEEN '2017-01-01' AND '2017-01-31'
GROUP BY date) AS T
INNER JOIN linePosition lp ON lp.lineID = T.lineID
INNER JOIN fttErrorType fet ON fet.positionID = lp.positionID
INNER JOIN fttData fd ON fd.errorID = fet.errorID
AND fd.date = T.date
GROUP BY YEAR(T.date), MONTH(T.date)
which gives this result
Now, I would like to group these results by year and month to get the sum of production and the sum of the last column. I've tried this query:
SELECT YEAR(T.date), MONTH(T.date), SUM(T.production), T.lineID, SUM(rework + scrap)
FROM
(SELECT MAX(positionID), date, production, lineID
FROM productionPerPosition
WHERE lineID = 2
AND date BETWEEN '2017-01-01' AND '2017-01-31'
GROUP BY date) AS T
INNER JOIN linePosition lp ON lp.lineID = T.lineID
INNER JOIN fttErrorType fet ON fet.positionID = lp.positionID
INNER JOIN fttData fd ON fd.errorID = fet.errorID
AND fd.date = T.date
GROUP BY YEAR(T.date), MONTH(T.date)
Which gives me
Here the production sum is wrong! It seems that the GROUP BY on the 7th line of the first query is ignored.
Any idea how I could get the needed result?
Edit: In the inner SELECT I have separate production values for several different positions (positionID), but I'm using only the production from the position that has the highest positionID.
The GROUP BY clauses are missing some grouping columns, which is why you are getting unexpected results:
SELECT YEAR(T.date), MONTH(T.date), SUM(T.production), T.lineID, SUM(rework + scrap)
FROM
(SELECT MAX(positionID), date, production, lineID
FROM productionPerPosition
WHERE lineID = 2
AND date BETWEEN '2017-01-01' AND '2017-01-31'
GROUP BY date, production, lineID) AS T
INNER JOIN linePosition lp ON lp.lineID = T.lineID
INNER JOIN fttErrorType fet ON fet.positionID = lp.positionID
INNER JOIN fttData fd ON fd.errorID = fet.errorID
AND fd.date = T.date
GROUP BY YEAR(T.date), MONTH(T.date), T.lineID
As explained in e4c5's comment, you have to add all the unaggregated fields to your GROUP BY. I did this in both the inner SELECT and the main SELECT:
SELECT YEAR(T.date), MONTH(T.date), SUM(T.production), T.lineID, SUM(rework + scrap)
FROM
(SELECT MAX(positionID), date, production, lineID
FROM productionPerPosition
WHERE lineID = 2
AND date BETWEEN '2017-01-01' AND '2017-01-31'
GROUP BY date, production, lineID) AS T
INNER JOIN linePosition lp ON lp.lineID = T.lineID
INNER JOIN fttErrorType fet ON fet.positionID = lp.positionID
INNER JOIN fttData fd ON fd.errorID = fet.errorID
AND fd.date = T.date
GROUP BY YEAR(T.date), MONTH(T.date), T.lineID

Where is the time stamp for uploaded/entered aggregate reports physically stored in DHIS2 and how do I pull the time stamp data from it?

I have been trying to find the table that stores the time stamp for an uploaded aggregate report in DHIS2. I need that data for a report I am creating using an SQL view. I managed to find a table called datavalueaudit that has a time stamp column, but every query I try pulls the time stamp for the data elements that make up the aggregate report. I am still a newbie at both SQL and DHIS2, so I do not know how to go about solving the problem.
I am trying to modify a query that I kind of understand but still find rather complicated.
The following is the original query:
SELECT DISTINCT p.startdate, prov.name AS province, par.name AS cheifdom, ou.name AS village,
  regexp_replace(ou.phonenumber, '+260', '0'),
  CASE WHEN b.reported IS NULL THEN 0::integer ELSE 1::integer END AS reported
FROM datasetsource dss
CROSS JOIN (
  SELECT DISTINCT periodid, startdate
  FROM period
  WHERE startdate <= now()
  AND periodtypeid = (SELECT periodtypeid FROM periodtype WHERE name ='Monthly')
  ORDER BY startdate DESC OFFSET 1 LIMIT 12
) p
LEFT JOIN (
  SELECT DISTINCT dv.sourceid, dv.periodid, TRUE AS reported
  FROM datavalue dv
  INNER JOIN (
    SELECT DISTINCT periodid, startdate
    FROM period
    WHERE startdate <= now()
    AND periodtypeid = (SELECT periodtypeid FROM periodtype WHERE name ='Monthly')
    ORDER BY startdate DESC OFFSET 1 LIMIT 12
  ) a ON dv.periodid = a.periodid
  WHERE dv.dataelementid IN (
    SELECT DISTINCT dataelementid
    FROM datasetmembers
    WHERE datasetid = (SELECT datasetid FROM dataset WHERE uid = 'Hbcr2fLc9jM')
  )
) b ON b.sourceid = dss.sourceid AND b.periodid = p.periodid
INNER JOIN organisationunit ou ON dss.sourceid = ou.organisationunitid
INNER JOIN organisationunit par ON ou.parentid = par.organisationunitid
INNER JOIN organisationunit prov ON par.parentid = prov.organisationunitid
INNER JOIN _periodstructure ps ON p.periodid = ps.periodid
WHERE dss.datasetid = (SELECT datasetid FROM dataset WHERE uid = 'Hbcr2fLc9jM')
ORDER BY prov.name, par.name, ou.name, p.startdate
The following is the one I tried modifying:
SELECT DISTINCT p.startdate, prov.name AS province, par.name AS cheifdom, ou.name AS village,
  regexp_replace(ou.phonenumber, '+260', '0'),
  CASE WHEN b.reported IS NULL THEN 0::integer ELSE 1::integer END AS reported,
  dva.timestamp AS "Reports On Time"
FROM datasetsource dss
CROSS JOIN (
  SELECT DISTINCT periodid, startdate
  FROM period
  WHERE startdate <= now()
  AND periodtypeid = (SELECT periodtypeid FROM periodtype WHERE name ='Monthly')
  ORDER BY startdate DESC OFFSET 1 LIMIT 12
) p
LEFT JOIN (
  SELECT DISTINCT dv.sourceid, dv.periodid, TRUE AS reported
  FROM datavalue dv
  INNER JOIN (
    SELECT DISTINCT periodid, startdate
    FROM period
    WHERE startdate <= now()
    AND periodtypeid = (SELECT periodtypeid FROM periodtype WHERE name ='Monthly')
    ORDER BY startdate DESC OFFSET 1 LIMIT 12
  ) a ON dv.periodid = a.periodid
  WHERE dv.dataelementid IN (
    SELECT DISTINCT dataelementid
    FROM datasetmembers
    WHERE datasetid = (SELECT datasetid FROM dataset WHERE uid = 'Hbcr2fLc9jM')
  )
) b ON b.sourceid = dss.sourceid AND b.periodid = p.periodid
LEFT JOIN (
  SELECT DISTINCT dv.timestamp, dv.periodid, TRUE AS reported
  FROM datavalueaudit dv
  INNER JOIN (
    SELECT DISTINCT periodid, startdate
    FROM period
    WHERE startdate <= now()
    AND periodtypeid = (SELECT periodtypeid FROM periodtype WHERE name ='Monthly')
    ORDER BY startdate DESC OFFSET 1 LIMIT 12
  ) a ON dv.periodid = a.periodid
  WHERE dv.dataelementid IN (
    SELECT DISTINCT MAX(dataelementid)
    FROM datasetmembers
    WHERE datasetid = '29827'
    GROUP BY datasetid
  )
) k ON k.periodid = p.periodid
INNER JOIN organisationunit ou ON dss.sourceid = ou.organisationunitid
LEFT JOIN datavalueaudit dv ON dss.sourceid = dv.organisationunitid
INNER JOIN datavalueaudit dva ON k.timestamp = dva.timestamp
INNER JOIN organisationunit par ON ou.parentid = par.organisationunitid
INNER JOIN organisationunit prov ON par.parentid = prov.organisationunitid
INNER JOIN _periodstructure ps ON p.periodid = ps.periodid
WHERE dss.datasetid = (SELECT datasetid FROM dataset WHERE uid = 'Hbcr2fLc9jM')
ORDER BY prov.name, par.name, ou.name, p.startdate, dva.timestamp
The query I tried modifying only pulls the time stamps of when each of the data elements in the dataset of the completed aggregate report was uploaded, instead of the time stamp of when the completed aggregate report itself was uploaded.
I would like to add a new column that pulls the time stamp from the table that stores it, but only for when a completed aggregate report (record) has been uploaded.
There is no direct notion of an aggregate report in DHIS 2. Looking at the created column of the datavalue table will give you an approximation. If your data upload client uses data value sets and sets the completed property to true, effectively creating a complete data set registration, you can query the completedatasetregistration table for records.

MySQL getting sum of tables with the same id

I have four tables: person, loan, ca, and payments.
I would like to get the sum of all payment amounts and cash advance amounts that have the same ID as the loan, joined with a person, for a specific date.
Here is my code, but the sum is calculated incorrectly:
SELECT pd.*,
l.total_loan_amount,
sum(c.ca_total_amount) AS ctot,
sum(p.payment_amount)
FROM personal_data pd
LEFT JOIN loans l
ON pd.id_personal_data = l.id_personal_data
LEFT JOIN ca c
ON l.id_loan = c.id_loan
LEFT JOIN payments p
ON l.id_loan = p.id_loan
WHERE l.loan_date = curDate()
AND (
c.ca_date = curDate()
OR c.ca_date IS NULL
)
AND (
p.payment_date = curDate()
OR p.payment_date IS NULL
)
GROUP BY pd.id_personal_data
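Doing that may retrieve invalid results. When both ca and payments are joined directly to loans, every cash advance row is paired with every payment row of the same loan, so amounts are counted more than once. On top of that, an id may or may not be present in the other table.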
Try using a subquery for each column you want to retrieve.
SELECT pd.*,
l.total_loan_amount,
c.totalCA,
p.totalPayment
FROM personal_data pd
LEFT JOIN loans l
ON pd.id_personal_data = l.id_personal_data
LEFT JOIN
(
SELECT id_loan, SUM(ca_total_amount) totalCA
FROM ca
-- WHERE DATE(ca_date) = DATE(CURDATE()) OR
-- ca_date IS NULL
GROUP BY id_loan
) c ON l.id_loan = c.id_loan
LEFT JOIN
(
SELECT id_loan, SUM(payment_amount) totalPayment
FROM payments
-- WHERE DATE(payment_date) = DATE(CURDATE()) OR
-- payment_date IS NULL
GROUP BY id_loan
) p ON l.id_loan = p.id_loan
WHERE DATE(l.loan_date) = DATE(curDate())
I think the dates on each payment and cash advance are irrelevant, because you are looking for totals based on the date of the loan.
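The difference between joining the detail tables directly and joining pre-aggregated subqueries can be shown with a handful of rows. This is a minimal sketch, assuming SQLite via Python's sqlite3 in place of MySQL, made-up amounts, and only the columns needed for the sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loans (id_loan INTEGER);
CREATE TABLE ca (id_loan INTEGER, ca_total_amount INTEGER);
CREATE TABLE payments (id_loan INTEGER, payment_amount INTEGER);
INSERT INTO loans VALUES (1);
INSERT INTO ca VALUES (1, 100), (1, 200);                 -- true total: 300
INSERT INTO payments VALUES (1, 10), (1, 20), (1, 30);    -- true total: 60
""")

# Direct joins: 2 cash advances x 3 payments = 6 rows for loan 1,
# so each amount is summed several times.
direct = conn.execute("""
SELECT SUM(c.ca_total_amount), SUM(p.payment_amount)
FROM loans l
LEFT JOIN ca c ON l.id_loan = c.id_loan
LEFT JOIN payments p ON l.id_loan = p.id_loan
GROUP BY l.id_loan
""").fetchone()

# Pre-aggregated subqueries: at most one row per loan on each side.
pre_aggregated = conn.execute("""
SELECT c.totalCA, p.totalPayment
FROM loans l
LEFT JOIN (SELECT id_loan, SUM(ca_total_amount) AS totalCA
           FROM ca GROUP BY id_loan) c ON l.id_loan = c.id_loan
LEFT JOIN (SELECT id_loan, SUM(payment_amount) AS totalPayment
           FROM payments GROUP BY id_loan) p ON l.id_loan = p.id_loan
""").fetchone()

print(direct)          # (900, 120)
print(pre_aggregated)  # (300, 60)
```

With the direct joins the cash advance total is tripled (once per payment) and the payment total is doubled (once per cash advance), which matches the kind of inflated sums described in the question.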

Problem with simple SQL join

This relates to a webpage that should show all upcoming events, and mark any that are in the current user's diary.
diary table
diary_id
member_id
event_id
event table
event_id
region_id
...
region table
region_id
...
member table
member_id
...
QUERY:
SELECT event.*, region.name, diary.diary_id
FROM event, region
LEFT JOIN diary on diary.member_id = 10 AND diary.event_id = event.event_id
WHERE region.region_id = event.region_id AND `date` >= NOW()
This is returning unknown column event.event_id and I can't figure out why. I'm no SQL whiz but expected this would just work and give me a NULL in the diary_id column for all events that are not in the user's diary
You are mixing join syntax. Try this instead.
SELECT event.*,
region.name,
diary.diary_id
FROM event
INNER JOIN region
ON region.region_id = event.region_id
LEFT JOIN diary
ON diary.member_id = 10
AND diary.event_id = event.event_id
WHERE `date` >= NOW()
Update
Your problem with not finding event_id is caused by FROM event, region. In MySQL, JOIN binds more tightly than the comma operator, so region LEFT JOIN diary is evaluated first and event is not yet visible in the ON clause. Change your query as suggested above. It would also be possible to fix it by switching the order of the tables to FROM region, event, but don't do that. Use the explicit join syntax introduced to the SQL language some 20 years ago.
Don't put diary.member_id = 10 in the WHERE clause if you want to keep the left join. A WHERE condition on the outer-joined table rejects the NULL-filled rows, which silently turns your left join into an inner join.
The below should do better:
SELECT event.*, region.name, diary.diary_id
FROM event
JOIN region on region.region_id = event.region_id
LEFT JOIN ( select diary_id, event_id
from diary
where diary.member_id = 10 ) diary
ON diary.event_id = event.event_id
WHERE `date` >= NOW()
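The effect of where the member filter sits can be tried out on a few rows. The following is a small sketch, assuming SQLite via Python's sqlite3 rather than MySQL, invented sample rows, and only the event and diary tables from the question (the region join and date filter are left out for brevity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (event_id INTEGER, name TEXT);
CREATE TABLE diary (diary_id INTEGER, member_id INTEGER, event_id INTEGER);
INSERT INTO event VALUES (1, 'concert'), (2, 'lecture'), (3, 'match');
-- member 10 has event 1 in their diary; member 11 has event 2
INSERT INTO diary VALUES (100, 10, 1), (101, 11, 2);
""")

# Filter in the ON clause: every event is kept, diary_id is NULL when
# member 10 has no entry for it.
filter_in_on = conn.execute("""
SELECT e.event_id, d.diary_id
FROM event e
LEFT JOIN diary d ON d.event_id = e.event_id AND d.member_id = 10
ORDER BY e.event_id
""").fetchall()

# Filter in the WHERE clause: rows where d.member_id is NULL (or belongs
# to another member) are rejected, so only member 10's events remain.
filter_in_where = conn.execute("""
SELECT e.event_id, d.diary_id
FROM event e
LEFT JOIN diary d ON d.event_id = e.event_id
WHERE d.member_id = 10
ORDER BY e.event_id
""").fetchall()

print(filter_in_on)     # [(1, 100), (2, None), (3, None)]
print(filter_in_where)  # [(1, 100)]
```

Filtering inside a derived table, as in the query above, gives the same rows as filtering in the ON clause; only the WHERE placement loses the events that are not in the member's diary.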
First of all I wouldn't put diary.member_id = 10 in the Join:
SELECT event.*, region.name, diary.diary_id
FROM `event`, region
LEFT JOIN diary ON diary.event_id = event.event_id
WHERE region.region_id = event.region_id AND `date` >= NOW() AND diary.member_id = 10
Are you sure that event.event_id is not event.id?