Why MySQL full outer join returns nulls?
Hi
I have the following data:
s_id,date,p_id,amount_sold
1, '2015-10-01', 1, 10
2, '2015-10-01', 2, 12
7, '2015-10-01', 1, 11
3, '2015-10-02', 1, 11
4, '2015-10-02', 2, 10
5, '2015-10-15', 1, 22
6, '2015-10-16', 2, 20
8, '2015-10-22', 3, 444
and i want my query to output something like this: (A = sum of amount_sold for p_id=1 for that date,B = sum of amount_sold for p_id=2 for that date)
date,A,B,Difference
'2015-10-01',21,12,9
'2015-10-02',11,10,1
'2015-10-15',22,0,22
'2015-10-01',0,20,-20
I tried with this query, but the order its returning is having NULLS and the output is wrong:
SELECT A.p_id,A.date,sum(A.amount_sold) A,B.Bs, (sum(A.amount_sold) - B.Bs) as difference FROM sales as A
LEFT JOIN (
SELECT SUM( amount_sold ) Bs,p_id,s_id, DATE
FROM sales
WHERE p_id =2
group by date
) as B ON A.s_id = B.s_id
where A.p_id=1 or B.p_id=2
group by A.date, A.p_id
UNION
SELECT A.p_id,A.date,sum(A.amount_sold) A,B.Bs, (sum(A.amount_sold) - B.Bs) as difference FROM sales as A
RIGHT JOIN (
SELECT SUM( amount_sold ) Bs,p_id,s_id, DATE
FROM sales
WHERE p_id =2
group by date
) as B ON A.s_id = B.s_id
where B.p_id=2
group by A.date, A.p_id
It returned:
p_id date A Bs difference
1 2015-10-01 21 NULL NULL
2 2015-10-01 12 12 0
1 2015-10-02 11 NULL NULL
2 2015-10-02 10 10 0
1 2015-10-15 22 NULL NULL
2 2015-10-16 20 20 0
What am i doing wrong here? and what is the correct way of doing it? any help would be appreciated.
A full join isn't needed. You can use conditional aggregation instead:
select
date,
sum(case when p_id = 1 then amount_sold else 0 end) a,
sum(case when p_id = 2 then amount_sold else 0 end) b,
sum(case when p_id = 1 then amount_sold else 0 end)
- sum(case when p_id = 2 then amount_sold else 0 end) difference
from sales
where p_id in (1,2)
group by date
Related
I have two tables that I am trying to LEFT join but I am not getting the expected results.
Rooms have multiple Children on different days, however Children are only counted in a Room after they have started and if they have hours allocated on that day. The output I am trying to achieve is this.
Room | MaxNum | Mon(Week1) | Tue(Week1) | Mon(Week2) | Tue(Week2)
Blue | 5 | 4 | 4 | 3 | 2
Green | 10 | 10 | 10 | 9 | 9
Red | 15 | 15 | 15 | 15 | 15
Here is the schema and some data...
create table Rooms(
id INT,
RoomName VARCHAR(10),
MaxNum INT
);
create table Children (
id INT,
RoomID INT,
MonHrs INT,
TueHrs INT,
StartDate DATE
);
INSERT INTO Rooms VALUES (1, 'Blue', 5);
INSERT INTO Rooms VALUES (2, 'Green', 10);
INSERT INTO Rooms VALUES (3, 'Red', 15);
INSERT INTO Children VALUES (1, 1, 5, 0, '2018-12-02');
INSERT INTO Children VALUES (2, 1, 0, 5, '2018-12-02');
INSERT INTO Children VALUES (3, 1, 5, 5, '2018-12-09');
INSERT INTO Children VALUES (4, 1, 0, 5, '2018-12-09');
INSERT INTO Children VALUES (5, 2, 5, 0, '2018-12-09');
INSERT INTO Children VALUES (6, 2, 0, 5, '2018-12-09');
The SQL I am having trouble with is this. It may not be the correct approach.
SELECT R.RoomName, R.MaxNum,
R.MaxNum - SUM(CASE WHEN C1.MonHrs > 0 THEN 1 ELSE 0 END) AS Mon1,
R.MaxNum - SUM(CASE WHEN C1.TueHrs > 0 THEN 1 ELSE 0 END) AS Tue1,
R.MaxNum - SUM(CASE WHEN C2.MonHrs > 0 THEN 1 ELSE 0 END) AS Mon2,
R.MaxNum - SUM(CASE WHEN C2.TueHrs > 0 THEN 1 ELSE 0 END) AS Tue2
FROM Rooms R
LEFT JOIN Children C1
ON R.id = C1.RoomID
AND C1.StartDate <= '2018-12-02'
LEFT JOIN Children C2
ON R.id = C2.RoomID
AND C2.StartDate <= '2018-12-09'
GROUP BY R.RoomName;
There is a double up happening on the Rows in the LEFT JOINs that is throwing the counts way off and I don't know how to prevent them. You can see the effect if you replace the SELECT with *
Any suggestions would help a lot.
This sort of problem usually surfaces from doing an aggregation in a too broad point in the query, which then results in duplicate counting of records. Try aggregating the Children table in a separate subquery:
SELECT
R.RoomName,
R.MaxNum,
R.MaxNum - C.Mon1 AS Mon1,
R.MaxNum - C.Tue1 AS Tue1,
R.MaxNum - C.Mon2 AS Mon2,
R.MaxNum - C.Tue2 AS Tue2
FROM Rooms R
LEFT JOIN
(
SELECT
RoomID,
COUNT(CASE WHEN MonHrs > 0 AND StartDate <= '2018-12-02'
THEN 1 END) AS Mon1,
COUNT(CASE WHEN TueHrs > 0 AND StartDate <= '2018-12-02'
THEN 1 END) AS Tue1,
COUNT(CASE WHEN MonHrs > 0 AND StartDate <= '2018-12-09'
THEN 1 END) AS Mon2,
COUNT(CASE WHEN TueHrs > 0 AND StartDate <= '2018-12-09'
THEN 1 END) AS Tue2
FROM Children
GROUP BY RoomID
) C
ON R.id = C.RoomID;
Note that we can avoid the double left join in your original query by instead using conditional aggregation on the start date.
Late edit: You probably don't even need a subquery at all, q.v. the answer by #Salman. But either of our answers should resolve the double counting problem.
You need to use one LEFT JOIN and move the date filter from JOIN condition to the aggregate:
SELECT R.id, R.RoomName, R.MaxNum
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-02' AND C.MonHrs > 0 THEN 1 END) AS Mon1
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-02' AND C.TueHrs > 0 THEN 1 END) AS Tue1
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-09' AND C.MonHrs > 0 THEN 1 END) AS Mon2
, R.MaxNum - COUNT(CASE WHEN C.StartDate <= '2018-12-09' AND C.TueHrs > 0 THEN 1 END) AS Tue2
FROM Rooms R
LEFT JOIN Children C ON R.id = C.RoomID
GROUP BY R.id, R.RoomName, R.MaxNum
I have the following (simplified) database schema:
Persons:
[Id] [Name]
-------------------
1 'Peter'
2 'John'
3 'Anna'
Items:
[Id] [ItemName] [ItemStatus]
-------------------
10 'Cake' 1
20 'Dog' 2
ItemDocuments:
[Id] [ItemId] [DocumentName] [Date]
-------------------
101 10 'CakeDocument1' '2016-01-01 00:00:00'
201 20 'DogDocument1' '2016-02-02 00:00:00'
301 10 'CakeDocument2' '2016-03-03 00:00:00'
401 20 'DogDocument2' '2016-04-04 00:00:00'
DocumentProcessors:
[PersonId] [DocumentId]
-------------------
1 101
1 201
2 301
I have also set up an SQL fiddle to play with: http://www.sqlfiddle.com/#!3/e6082
The relation logic is the following: every Person can work on zero or infinite number of ItemDocuments (many-to-many); each ItemDocument belongs to exactly one Item (one-to-many). Item has status 1 - Active, 2 - Closed
What I need is a report that fulfills the following requirements:
for each person in Persons table, display count of Items that have ItemDocuments related to this person
the counts should be split in two columns by ItemStatus
the query should be filterable by two optional date periods (using two BETWEEN conditions on ItemDocuments.Date field) and the Item counts should also be split into two periods
if a Person does not have any ItemDocuments assigned, it still should be shown in the results with all count values set to 0
if a Person has more than one ItemDocument for an Item, the Item still should be counted only once
Essentially, here is how the results should look like if I use both periods to NULL (to read all the data):
[PersonName] [Active Items for period 1] [Closed Items for period 1] [Active Items for period 2] [Closed Items for period 2]
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
'Peter' 1 1 1 1
'John' 1 0 1 0
'Anna' 0 0 0 0
While I can create an SQL query for each requirement separately, I have a problem to understand how to combine all of them together into one.
For example, I can split ItemStatus counts in two columns using
COUNT(CASE WHEN t.ItemStatus = 1 THEN 1 ELSE NULL END) AS Active,
COUNT(CASE WHEN t.ItemStatus = 2 THEN 1 ELSE NULL END) AS Closed
and I can filter by two periods (with max/min date constants from MS SQL server specification to avoid NULLs for optional period dates) using
between coalesce(#start1, '1753-01-01') and coalesce(#end1, '9999-12-31')
between coalesce(#start2, '1753-01-01') and coalesce(#end2, '9999-12-31')
but how to combine all of this together, considering also JOINs between tables?
Is there any technique, join or MS SQL Server specific approach to do this in efficient way?
My first attempt seems to work as required but it looks like ugly subquery duplications multiple times:
DECLARE #start1 DATETIME, #start2 DATETIME, #end1 DATETIME, #end2 DATETIME
-- SET #start2 = '2017-01-01'
SELECT
p.Name,
(SELECT COUNT(1)
FROM Items i
WHERE i.ItemStatus = 1 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(#start1, '1753-01-01') AND COALESCE(#end1, '9999-12-31')
)
) AS Active1,
(SELECT COUNT(*)
FROM Items i
WHERE i.ItemStatus = 2 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(#start1, '1753-01-01') AND COALESCE(#end1, '9999-12-31')
)
) AS Closed1,
(SELECT COUNT(1)
FROM Items i
WHERE i.ItemStatus = 1 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(#start2, '1753-01-01') AND COALESCE(#end2, '9999-12-31')
)
) AS Active2,
(SELECT COUNT(*)
FROM Items i
WHERE i.ItemStatus = 2 AND EXISTS(
SELECT 1
FROM DocumentProcessors AS dcp
INNER JOIN ItemDocuments AS idc ON dcp.DocumentId = idc.Id
WHERE dcp.PersonId = p.Id AND idc.ItemId = i.Id
AND idc.Date BETWEEN COALESCE(#start2, '1753-01-01') AND COALESCE(#end2, '9999-12-31')
)
) AS Closed2
FROM Persons p
I'm not absolutely sure if I really got what you want, but you might try this
WITH AllData AS
(
SELECT p.Id AS PersonId
,p.Name AS Person
,id.Date AS DocDate
,id.DocumentName AS DocName
,i.ItemName AS ItemName
,i.ItemStatus AS ItemStatus
,CASE WHEN id.Date BETWEEN COALESCE(#start1, '1753-01-01') AND COALESCE(#end1, '9999-12-31') THEN 1 ELSE 0 END AS InPeriod1
,CASE WHEN id.Date BETWEEN COALESCE(#start2, '1753-01-01') AND COALESCE(#end2, '9999-12-31') THEN 1 ELSE 0 END AS InPeriod2
FROM Persons AS p
LEFT JOIN DocumentProcessors AS dp ON p.Id=dp.PersonId
LEFT JOIN ItemDocuments AS id ON dp.DocumentId=id.Id
LEFT JOIN Items AS i ON id.ItemId=i.Id
)
SELECT PersonID
,Person
,COUNT(CASE WHEN ItemStatus = 1 AND InPeriod1 = 1 THEN 1 ELSE NULL END) AS ActiveIn1
,COUNT(CASE WHEN ItemStatus = 2 AND InPeriod1 = 1 THEN 1 ELSE NULL END) AS ClosedIn1
,COUNT(CASE WHEN ItemStatus = 1 AND InPeriod2 = 1 THEN 1 ELSE NULL END) AS ActiveIn2
,COUNT(CASE WHEN ItemStatus = 2 AND InPeriod2 = 1 THEN 1 ELSE NULL END) AS ClosedIn2
FROM AllData
GROUP BY PersonID,Person
I have the following query and the sub query is producing a strange result when left joined. Here is what is happening (row with StoreID 1 is returning all null in joined table) and what I really want :
TABLE A
StoreID unitsales1 unitsales2 unitsales3
1 NULL NULL 6
2 12 12 12
3 12 NULL 12
TABLE B
StoreID prevunitsales1 prevunitsales2 prevunitsales3
2 NULL NULL 6
3 12 12 12
What I am getting
LEFT JOIN ON A.StoreId = B.StoreId
StoreID unitsales1 unitsales2 unitsales3 prevunitsales1 prevunitsales2 prevunitsales3
1 NULL NULL NULL NULL NULL NULL
2 12 12 12 NULL NULL 6
3 12 NULL 12 12 12 12
What I want
LEFT JOIN ON A.StoreId = B.StoreId
StoreID unitsales1 unitsales2 unitsales3 prevunitsales1 prevunitsales2 prevunitsales3
1 NULL NULL 6 NULL NULL NULL
2 12 12 12 NULL NULL 6
3 12 NULL 12 12 12 12
Here is the query:
SELECT SQL_CALC_FOUND_ROWS #storeid:=z.id,z.biz_name, z.wf_store_name, z.e_address, z.e_city, z.e_state, z.e_postal, IFNULL(total_sales - prev_total_sales,'CV') as diff_total_sales, IFNULL(d_source,'N/A') as d_source, IFNULL(unit_sales1 - prev_unit_sales1,(SELECT IFNULL(max(datetimesql),'NS') FROM storeCheckRecords WHERE store_id=#storeid AND upc=855555000019)) as diff_unit_sales1, IFNULL(unit_sales2 - prev_unit_sales2,(SELECT IFNULL(max(datetimesql),'NS') FROM storeCheckRecords WHERE store_id=#storeid AND upc=855555000022)) as diff_unit_sales2, IFNULL(unit_sales3 - prev_unit_sales3,(SELECT IFNULL(max(datetimesql),'NS') FROM storeCheckRecords WHERE store_id=#storeid AND upc=855555000025)) as diff_unit_sales3, IFNULL(unit_sales4 - prev_unit_sales4,(SELECT IFNULL(max(datetimesql),'NS') FROM storeCheckRecords WHERE store_id=#storeid AND upc=855555000028)) as diff_unit_sales4 FROM
(SELECT s1.id,s1.biz_name as biz_name, s1.wf_store_name as wf_store_name, s1.e_address as e_address,s1.e_city as e_city,s1.e_state as e_state,s1.e_postal as e_postal,sum(s2.unit_sales) as total_sales, sum(s2.unit_sales/4.28571428571) as week_avg,group_concat(DISTINCT s2.d_source separator ',') as d_source,
SUM(CASE u.id WHEN 1 THEN s2.unit_sales ELSE NULL END) AS unit_sales1,SUM(CASE u.id WHEN 2 THEN s2.unit_sales ELSE NULL END) AS unit_sales2,SUM(CASE u.id WHEN 3 THEN s2.unit_sales ELSE NULL END) AS unit_sales3,SUM(CASE u.id WHEN 4 THEN s2.unit_sales ELSE NULL END) AS unit_sales4
FROM allStores as s1
INNER JOIN storeCheckRecords AS s2
ON s1.id = s2.store_id
AND s2.datetimesql BETWEEN '2015-07-01' AND '2015-07-31'
AND s1.key_retailer LIKE 'FRESH THYME'
INNER JOIN ( SELECT 1 AS id, '855555000019' AS upc UNION SELECT 2, '855555000022' UNION SELECT 3, '855555000025' UNION SELECT 4, '855555000028' ) u ON u.upc = s2.upc
GROUP BY
s1.id) x
LEFT OUTER JOIN
(SELECT s1.id,s1.biz_name as prev_biz_name, s1.wf_store_name as prev_wf_store_name, s1.e_address as prev_e_address,s1.e_city as prev_e_city,s1.e_state as prev_e_state,s1.e_postal as prev_e_postal,sum(s2.unit_sales) as prev_total_sales, sum(s2.unit_sales/4.28571428571) as prev_week_avg,group_concat(DISTINCT s2.d_source separator ',') as prev_d_source,
SUM(CASE u.id WHEN 1 THEN s2.unit_sales ELSE 0 END) AS prev_unit_sales1,SUM(CASE u.id WHEN 2 THEN s2.unit_sales ELSE 0 END) AS prev_unit_sales2,SUM(CASE u.id WHEN 3 THEN s2.unit_sales ELSE 0 END) AS prev_unit_sales3,SUM(CASE u.id WHEN 4 THEN s2.unit_sales ELSE 0 END) AS prev_unit_sales4
FROM allStores as s1
INNER JOIN storeCheckRecords AS s2
ON s1.id = s2.store_id
AND s2.datetimesql BETWEEN '2015-06-01' AND '2015-06-30'
AND s1.key_retailer LIKE 'FRESH THYME'
INNER JOIN ( SELECT 1 AS id, '855555000019' AS upc UNION SELECT 2, '855555000022' UNION SELECT 3, '855555000025' UNION SELECT 4, '855555000028' ) u ON u.upc = s2.upc
GROUP BY
s1.id) y
ON x.id = y.id
RIGHT JOIN
(SELECT s1.id,s1.biz_name,s1.wf_store_name,s1.e_address,s1.e_city,s1.e_state,s1.e_postal
FROM allStores as s1
WHERE 1
AND s1.key_retailer LIKE 'FRESH THYME') z
ON y.id = z.id
ORDER BY biz_name ASC
LIMIT 0, 1000
What am I missing?
Comparing NULL with NULL using == never matches, which is why there is no table2 row that matches the first row of table1.
You can use <==> instead if you really want to do this.
mysql> SELECT 1 <=> 1, NULL <=> NULL, 1 <=> NULL;
-> 1, 1, 0
mysql> SELECT 1 = 1, NULL = NULL, 1 = NULL;
-> 1, NULL, NULL
I have a table like this:
score
id week status
1 1 0
2 1 1
3 1 0
4 1 0
1 2 0
2 2 1
3 2 0
4 2 0
1 3 1
2 3 1
3 3 1
4 3 0
I want to get all the id's of people who have a status of zero for all weeks except for week 3. something like this:
Result:
result:
id w1.status w2.status w3.status
1 0 0 1
3 0 0 1
I have this query, but it is terribly inefficient on larger datasets.
SELECT w1.id, w1.status, w2.status, w3.status
FROM
(SELECT s.id, s.status
FROM score s
WHERE s.week = 1) w1
LEFT JOIN
(SELECT s.id, s.status
FROM score s
WHERE s.week = 2) w2 ON w1.id=w2.id
LEFT JOIN
(SELECT s.id, s.status
FROM score s
WHERE s.week = 3) w3 ON w1.id=w3.id
WHERE w1.status=0 AND w2.status=0 AND w3.status=1
I am looking for a more efficient way to calculate the above.
select id
from score
where week in (1, 2, 3)
group by id
having sum(
case
when week in (1, 2) and status = 0 then 1
when week = 3 and status = 1 then 1
else 0
end
) = 3
Or more generically...
select id
from score
group by id
having
sum(case when status = 0 then 1 else 0 end) = count(*) - 1
and min(case when status = 1 then week else null end) = max(week)
You can do using not exists as
select
t1.id,
'0' as `w1_status` ,
'0' as `w2_status`,
'1' as `w3_status`
from score t1
where
t1.week = 3
and t1.status = 1
and not exists(
select 1 from score t2
where t1.id = t2.id and t1.week <> t2.week and t2.status = 1
);
For better performance you can add index in the table as
alter table score add index week_status_idx (week,status);
In case of static number of weeks (1-3), group_concat may be used as a hack..
Concept:
SELECT
id,
group_concat(status) as totalStatus
/*(w1,w2=0,w3=1 always!)*/
FROM
tableName
WHERE
totalStatus = '(0,0,1)' /* w1=0,w2=1,w3=1 */
GROUP BY
id
ORDER BY
week ASC
(Written on the go. Not tested)
SELECT p1.id, p1.status, p2.status, p3.status
FROM score p1
JOIN score p2 ON p1.id = p2.id
JOIN score p3 ON p2.id = p3.id
WHERE p1.week = 1
AND p1.status = 0
AND p2.week = 2
AND p2.status = 0
AND p3.week = 3
AND p3.status = 1
Try this, should work
It seemed so easy.
I am getting following table by using COALESCE. I need to perform distinct on row level.
1 1 5 5 5 (null)
2 2 2 2 25 25
3 7 35 35 35 35
That's what I am looking for.
1 5 null
2 25
3 7 35
Here's a Demo on http://sqlfiddle.com/#!3/e945b/5/0
This is the only way I can think of doing it.
Do not currently have enough time to explain its operation, so please post questions in comments;
WITH DataCTE (RowID, a, b, c, d, e, f) AS
(
SELECT 1, 1, 1, 5, 5, 5, NULL UNION ALL
SELECT 2, 2, 2, 2, 2, 25, 25 UNION ALL
SELECT 3, 3, 7, 35, 35, 35, 35
)
,UnPivotted AS
(
SELECT DC.RowID
,CA.Distinctcol
,OrdinalCol = ROW_NUMBER() OVER (PARTITION BY DC.RowID ORDER BY CA.Distinctcol)
FROM DataCTE DC
CROSS
APPLY (
SELECT Distinctcol
FROM
(
SELECT Distinctcol = a UNION
SELECT b UNION
SELECT c UNION
SELECT d UNION
SELECT e UNION
SELECT f
)DT
WHERE Distinctcol IS NOT NULL
) CA(Distinctcol)
)
SELECT RowID
,Col1 = MAX(CASE WHEN OrdinalCol = 1 THEN Distinctcol ELSE NULL END)
,Col2 = MAX(CASE WHEN OrdinalCol = 2 THEN Distinctcol ELSE NULL END)
,Col3 = MAX(CASE WHEN OrdinalCol = 3 THEN Distinctcol ELSE NULL END)
,Col4 = MAX(CASE WHEN OrdinalCol = 4 THEN Distinctcol ELSE NULL END)
,Col5 = MAX(CASE WHEN OrdinalCol = 5 THEN Distinctcol ELSE NULL END)
FROM UnPivotted
GROUP BY RowID