In an interview I was asked to write an SQL query that reports the IDs of customers whose total purchases are strictly increasing every year.
The table looks something like this:
The expected output is 11 and 22, because 33 is not strictly increasing.
I was not able to solve it; however, on googling I found this solution:
WITH year_cte AS (
SELECT customer_id,
YEAR(order_date) AS year,
SUM(price) AS total
FROM Orders
GROUP BY customer_id, year
ORDER BY NULL
)
SELECT a.customer_id
FROM year_cte a
LEFT JOIN year_cte b
ON b.customer_id = a.customer_id AND b.year = a.year + 1
GROUP BY a.customer_id
HAVING SUM(a.total >= IFNULL(b.total, 0)) = 1 -- Did not get this SUM
ORDER BY NULL;
I understand the solution apart from one line:
HAVING SUM(a.total >= IFNULL(b.total, 0)) = 1 -- Did not get this SUM
Can someone please help me understand why there is a condition inside SUM() and why the result is compared with 1?
Each joined row pairs a customer's total for one year (table a) with the same customer's total for the following year (table b).
In MySQL a comparison evaluates to 1 when true and 0 when false, so SUM(a.total >= IFNULL(b.total, 0)) counts the rows where the total did not increase. You want the customer_ids whose totals are strictly increasing, so you check that exactly 1 row satisfies the condition.
That happens when the customer has a strictly increasing streak of totals over the years. The = 1 accounts for the last year: it has no following year, so b.total is NULL, IFNULL turns it into 0, and a.total >= 0 is true.
Using customer_id = 11 as an example:
customer_id | year | total
--------------------------
11 | 2019 | 150
11 | 2020 | 200
11 | 2021 | 250
For a.year = 2019 and b.year = 2020 you will have 150 >= 200, which is false (0).
For a.year = 2020 and b.year = 2021 you will have 200 >= 250, which is false (0).
For a.year = 2021 there is no row for b.year = 2022, so you will have 250 >= 0, which is true (1).
The SUM of this expression is 0 + 0 + 1 = 1.
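To see the counting behaviour for yourself, here is a minimal sketch that runs the same join in an in-memory SQLite database from Python (SQLite also evaluates a comparison to 1 or 0). The year_totals table stands in for the year_cte result, and the rows for customers 22 and 33 are made up for illustration; only customer 11 comes from the example above.

```python
import sqlite3

def not_increasing_counts():
    """Return (customer_id, number of rows where the total did not increase)."""
    conn = sqlite3.connect(":memory:")
    # year_totals is a hypothetical stand-in for the year_cte result.
    conn.execute(
        "CREATE TABLE year_totals (customer_id INTEGER, year INTEGER, total INTEGER)"
    )
    conn.executemany(
        "INSERT INTO year_totals VALUES (?, ?, ?)",
        [
            (11, 2019, 150), (11, 2020, 200), (11, 2021, 250),  # from the example
            (22, 2020, 100), (22, 2021, 300),                   # made-up rows
            (33, 2019, 200), (33, 2020, 150), (33, 2021, 300),  # made-up rows
        ],
    )
    rows = conn.execute(
        """
        SELECT a.customer_id,
               SUM(a.total >= IFNULL(b.total, 0)) AS not_increasing
        FROM year_totals a
        LEFT JOIN year_totals b
               ON b.customer_id = a.customer_id AND b.year = a.year + 1
        GROUP BY a.customer_id
        ORDER BY a.customer_id
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for customer_id, count in not_increasing_counts():
        print(customer_id, count)
```

Customers 11 and 22 get a count of 1 (only their last year matches), while 33 gets 2 because of the drop from 200 to 150, so HAVING ... = 1 keeps 11 and 22.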
I have two tables namely "appointment" and "skills_data".
Structure of appointment table is:
id_ap || ap_meet_date || id_skill || ap_status.
The possible values of ap_status are complete, confirm, cancel and missed.
And the skills_data table contains two columns namely:
id_skill || skill
I want to get the total number of appointments for each of these conditions:
ap_status = ('complete' and 'confirm'),
ap_status = 'cancel' and
ap_status = 'missed'
GROUP BY id_skill and year and
order by year DESC
I tried the query below, which only gives me the count for one condition, but I want the other two as well, with the GROUP BY and ORDER BY clauses mentioned above.
If no record matches a condition (for example, zero appointments missed in 2018 for a skill), the output should show 0 for that count.
Could someone please suggest whether I should use multiple SELECT queries or a CASE clause to get my expected results? I have a lot of records in the appointment table and want an efficient way to query them. Thank you!
SELECT a.id_skill, YEAR(a.ap_meet_date) As year, s.skill,COUNT(*) as count_comp_conf
FROM appointment a,skills_data s WHERE a.id_skill=s.id_skill and a.ap_status IN ('complete', 'confirm')
GROUP BY `id_skill`, `year`
ORDER BY `YEAR` DESC
Output from my query:
id_skill | year | skill | count_comp_conf
-----------------------------------------
1 2018 A 20
2 2018 B 15
1 2019 A 10
2 2019 B 12
3 2019 C 10
My expected output should be like this:
id_skill | year | skill | count_comp_conf | count_cancel | count_missed
------------------------------------------------------------------------
1 2018 A 20 5 1
2 2018 B 15 8 0
1 2019 A 10 4 1
2 2019 B 12 0 5
3 2019 C 10 2 2
You can use conditional aggregation with a CASE WHEN expression:
SELECT a.id_skill, YEAR(a.ap_meet_date) As year, s.skill,
COUNT(case when a.ap_status IN ('complete', 'confirm') then 1 end) as count_comp_conf,
COUNT(case when a.ap_status = 'cancel' then 1 end) as count_cancel,
COUNT(case when a.ap_status = 'missed' then 1 end) as count_missed
FROM appointment a inner join skills_data s on a.id_skill=s.id_skill
GROUP BY `id_skill`, `year`
ORDER BY `YEAR` DESC
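If you want to try the conditional aggregation on a small data set, here is a rough sketch using Python's sqlite3 module. It assumes SQLite rather than MySQL, so YEAR() is replaced by strftime('%Y', ...) and the alias is called yr; the sample rows are invented. COUNT ignores the NULL that CASE returns when no branch matches, which is why a status with no rows shows up as 0.

```python
import sqlite3

def count_by_status():
    """Count appointments per skill and year, one column per status group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE appointment "
        "(id_ap INTEGER, ap_meet_date TEXT, id_skill INTEGER, ap_status TEXT)"
    )
    conn.execute("CREATE TABLE skills_data (id_skill INTEGER, skill TEXT)")
    conn.executemany("INSERT INTO skills_data VALUES (?, ?)", [(1, "A"), (2, "B")])
    # Invented sample rows, just enough to exercise every branch.
    conn.executemany(
        "INSERT INTO appointment VALUES (?, ?, ?, ?)",
        [
            (1, "2018-03-01", 1, "complete"),
            (2, "2018-04-01", 1, "confirm"),
            (3, "2018-05-01", 1, "cancel"),
            (4, "2019-01-10", 1, "missed"),
            (5, "2019-02-10", 2, "complete"),
            (6, "2019-03-10", 2, "missed"),
            (7, "2019-03-11", 2, "missed"),
        ],
    )
    rows = conn.execute(
        """
        SELECT a.id_skill,
               CAST(strftime('%Y', a.ap_meet_date) AS INTEGER) AS yr,
               s.skill,
               COUNT(CASE WHEN a.ap_status IN ('complete', 'confirm') THEN 1 END),
               COUNT(CASE WHEN a.ap_status = 'cancel' THEN 1 END),
               COUNT(CASE WHEN a.ap_status = 'missed' THEN 1 END)
        FROM appointment a
        JOIN skills_data s ON a.id_skill = s.id_skill
        GROUP BY a.id_skill, yr, s.skill
        ORDER BY yr DESC, a.id_skill
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in count_by_status():
        print(row)
```

The table is scanned once and all three counts come out of the same GROUP BY, so there is no need for three separate SELECT queries.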
SELECT a.id_skill,
YEAR(a.ap_meet_date) As year,
s.skill,
SUM(IF(a.ap_status IN ('complete', 'confirm'),1,0)) AS count_comp_conf,
SUM(IF(a.ap_status='cancel',1,0)) AS count_cancel,
SUM(IF(a.ap_status='missed',1,0)) AS count_missed
FROM appointment a,skills_data s WHERE a.id_skill=s.id_skill
GROUP BY `id_skill`, `year`
ORDER BY `YEAR` DESC;
Please try using an IF condition along with SUM.
The query below should give you the output.
select id_skill,
year,
skill,
count_comp_conf,
count_cancel,
count_missed
from (select a.id_skill, year(a.ap_meet_date) as year, s.skill,
sum(case when a.ap_status in ('complete', 'confirm') then 1 else 0 end) as count_comp_conf,
sum(case when a.ap_status = 'cancel' then 1 else 0 end) as count_cancel,
sum(case when a.ap_status = 'missed' then 1 else 0 end) as count_missed
from appointment a join skills_data s on (a.id_skill = s.id_skill)
group by a.id_skill, year(a.ap_meet_date), s.skill) t
order by year desc;
The following query,
select shelf_id, issue_date, current_qty
from Stock
where barcode = '555' and issue_date <= '2018-05-30 14:28:32'
will give the following results,
10 2018-05-25 00:00:00 5
10 2018-05-28 00:00:00 55
5 2018-05-29 00:00:00 100
Adding group by shelf_id leads to this result,
10 2018-05-25 00:00:00 5
5 2018-05-29 00:00:00 100
The desired result is the following.
10 2018-05-28 00:00:00 55
5 2018-05-29 00:00:00 100
The reasoning behind this is that for each group I would like to return the row with the latest issue_date.
limit 1 limits the total groups returned to just one.
having issue_date... would be a possible solution, but I do not know how to get the closest date to Max(issue_date).
Is it possible at all to accomplish this without using a subquery?
Edit:
The second condition in the where clause, issue_date <= '2018-05-30 14:28:32', is a user input (issue_date <= ?2) meant to initially filter the table. The query should then group the results per shelf_id but return the row with the closest date to max(issue_date). So I don't see how I could just replace this condition with a subquery.
Don't use group by! You are trying to filter the rows. Here is one method:
select s.*
from stock s
where s.issue_date = (select max(s2.issue_date) from stock s2 where s2.shelf_id = s.shelf_id);
As a bonus, with an index on stock(shelf_id, issue_date), the performance should be better than the group by.
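Here is a small sketch of the correlated-subquery approach that also keeps the barcode and cutoff filters from the question, run against SQLite from Python. The first three rows are the ones from the question; the fourth row (dated after the cutoff) is made up to show why the filter has to be repeated inside the subquery.

```python
import sqlite3

def latest_row_per_shelf(barcode, cutoff):
    """Return, per shelf, the row with the latest issue_date at or before cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Stock "
        "(shelf_id INTEGER, barcode TEXT, issue_date TEXT, current_qty INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Stock VALUES (?, ?, ?, ?)",
        [
            (10, "555", "2018-05-25 00:00:00", 5),
            (10, "555", "2018-05-28 00:00:00", 55),
            (5, "555", "2018-05-29 00:00:00", 100),
            (10, "555", "2018-06-01 00:00:00", 7),  # made-up row after the cutoff
        ],
    )
    rows = conn.execute(
        """
        SELECT s.shelf_id, s.issue_date, s.current_qty
        FROM Stock s
        WHERE s.barcode = :barcode
          AND s.issue_date = (SELECT MAX(s2.issue_date)
                              FROM Stock s2
                              WHERE s2.shelf_id = s.shelf_id
                                AND s2.barcode = s.barcode
                                AND s2.issue_date <= :cutoff)
        ORDER BY s.issue_date
        """,
        {"barcode": barcode, "cutoff": cutoff},
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in latest_row_per_shelf("555", "2018-05-30 14:28:32"):
        print(row)
```

The dates are stored as ISO-formatted text here, so plain string comparison orders them correctly; with a real DATETIME column the query itself stays the same.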
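If you have an identity column, you can use a LIMIT clause:
select s.*
from Stock s
where barcode = '555' and issue_date <= '2018-05-30 14:28:32' and
identity_col = (select s1.identity_col
from Stock s1
where s1.shelf_id = s.shelf_id
-- apply the same filters here so the latest matching row is picked
and s1.barcode = s.barcode and s1.issue_date <= '2018-05-30 14:28:32'
order by s1.issue_date desc
limit 1
);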
You may try it this way:
select ST.shelf_id, ST.issue_date, ST.current_qty
from Stock as ST INNER JOIN (select shelf_id, MAX(issue_date) AS issue_date
from Stock
where barcode = '555' and issue_date <= '2018-05-30 14:28:32'
GROUP BY shelf_id) AS A ON ST.shelf_id = A.shelf_id and ST.issue_date = A.issue_date
As long as (shelf_id, issue_date) is unique, this should work. Please let me know if I'm wrong.
I am trying to classify the transaction dates against an aging policy. When the time since LastDate, measured from the current date, exceeds the Aging Days limit for the location, the row should show up as Overage; otherwise it should show as Within.
Here is the current table:
+---------+------+----------+-------------+
|LastDate | Part | Location | Aging Days |
+---------+------+----------+-------------+
12/1/2016 123 VVV 90
8/10/2017 444 RRR 10
8/01/2017 144 PR 21
7/15/2017 12 RRR 10
Here is the query:
select
q.lastdate,
r.part, r.location,
a.agingpolicy as 'Aging Days'
from opsintranexcel r (nolock)
left join InventoryAging a (nolock) on r.location=a.location
left join (select part,MAX(trandate) as lastdate from opsintran group by
part) q on r.part=q.part
Here is the extra column I want added in:
+---------+------+----------+------------+---------+
|LastDate | Part | Location | Aging Days | Age |
+---------+------+----------+------------+---------+
12/1/2016 123 VVV 90 Overage
8/10/2017 444 RRR 10 Within
8/01/2017 144 PR 21 Within
7/15/2017 12 RRR 10 Overage
I appreciate your help.
I think the code below will work for you:
SELECT
q.lastdate,
r.part,
r.location,
a.agingpolicy as 'Aging Days',
'Age' =
CASE
WHEN DATEDIFF( day, q.LastDate, GETDATE() ) > a.agingpolicy THEN 'Overage'
ELSE 'Within'
END
FROM opsintranexcel r (nolock)
LEFT JOIN InventoryAging a (nolock) on r.location=a.location
LEFT JOIN (
SELECT part,MAX(trandate) as lastdate
FROM opsintran
WHERE trantype='II' and PerPost>='201601'
GROUP BY part) q ON r.part=q.part
You can check whether the difference between the current date and the lastdate value is over or within the aging days:
CASE WHEN DATEDIFF(day, q.lastdate, GETDATE()) > a.agingpolicy
THEN 'Overage'
ELSE 'Within'
END AS age
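As a quick way to check the CASE logic against the sample rows, here is a sketch that runs it in SQLite from Python. SQLite has no DATEDIFF, so the day difference is computed with julianday() instead, and the current date is passed in as a parameter so the result is reproducible. The inventory table name and the reference date 2017-08-15 are assumptions chosen for this illustration; the rows are the ones from the question, rewritten as ISO dates.

```python
import sqlite3

def classify_aging(today):
    """Label each part 'Overage' or 'Within' relative to the given date."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical flat table holding the already-joined columns from the question.
    conn.execute(
        "CREATE TABLE inventory "
        "(lastdate TEXT, part INTEGER, location TEXT, agingpolicy INTEGER)"
    )
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?, ?)",
        [
            ("2016-12-01", 123, "VVV", 90),
            ("2017-08-10", 444, "RRR", 10),
            ("2017-08-01", 144, "PR", 21),
            ("2017-07-15", 12, "RRR", 10),
        ],
    )
    rows = conn.execute(
        """
        SELECT part,
               CASE WHEN julianday(:today) - julianday(lastdate) > agingpolicy
                    THEN 'Overage'
                    ELSE 'Within'
               END AS age
        FROM inventory
        ORDER BY lastdate
        """,
        {"today": today},
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for part, age in classify_aging("2017-08-15"):
        print(part, age)
```

With that reference date, parts 123 and 12 come out as Overage and parts 144 and 444 as Within, which matches the expected table in the question.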
You should modify your query as:
select
q.lastdate,
r.part, r.location,
a.agingpolicy as 'Aging Days',
IIF(DATEDIFF(day, q.lastdate, GETDATE()) > a.agingpolicy, 'Overage', 'Within') as 'Age'
from opsintranexcel r (nolock)
left join InventoryAging a (nolock) on r.location=a.location
left join (select part,MAX(trandate) as lastdate from opsintran where
trantype='II' and PerPost>='201601' group by part) q on r.part=q.part
I need some help to solve an issue with my query. I want to join the output of two select statements:
1st
select extract(year from createdDate) as year,
count(extract(year from createdDate)) as count
from table
where to_user_id= 322
group by extract(year from createdDate);
and its output
Year Count
2014 18
2015 117
2016 9
and 2nd query
select count(extract(year from createdDate)) as count
from table
where userId=322
group by extract(year from createdDate);
and its output
Count
18
110
11
I want to combine these two results into one table.
I want this type of output:
Year Count Count
2014 18 18
2015 117 110
2016 9 11
Note that I use to_user_id in query 1 but userId in query 2.
I tried to solve this, but I got repeated values in the output.
Does anyone know the solution?
Write them as subqueries and join them together.
SELECT a.year, a.count AS t_user_count, b.count AS user_count
FROM (SELECT YEAR(createdDate) AS year, COUNT(*) AS count
FROM table
WHERE to_user_id = 322
GROUP BY year) AS a
JOIN (SELECT YEAR(createdDate) AS year, COUNT(*) AS count
FROM table
WHERE userId = 322
GROUP BY year) AS b
ON a.year = b.year
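For a runnable illustration of joining the two grouped subqueries, here is a sketch in Python with sqlite3. The table is called messages here because the real table name is not given, the rows are invented, and strftime('%Y', ...) is used because SQLite has no YEAR() function.

```python
import sqlite3

def yearly_counts(user_id):
    """Return (year, rows received by user_id, rows sent by user_id)."""
    conn = sqlite3.connect(":memory:")
    # messages is a hypothetical name; the question only calls it "table".
    conn.execute(
        "CREATE TABLE messages (createdDate TEXT, userId INTEGER, to_user_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [
            ("2014-01-05", 1, 322),
            ("2014-06-01", 2, 322),
            ("2014-07-01", 322, 9),
            ("2015-02-01", 3, 322),
            ("2015-03-01", 322, 4),
            ("2015-04-01", 322, 5),
        ],
    )
    rows = conn.execute(
        """
        SELECT a.yr, a.cnt AS to_user_count, b.cnt AS user_count
        FROM (SELECT CAST(strftime('%Y', createdDate) AS INTEGER) AS yr,
                     COUNT(*) AS cnt
              FROM messages
              WHERE to_user_id = :uid
              GROUP BY yr) AS a
        JOIN (SELECT CAST(strftime('%Y', createdDate) AS INTEGER) AS yr,
                     COUNT(*) AS cnt
              FROM messages
              WHERE userId = :uid
              GROUP BY yr) AS b
          ON a.yr = b.yr
        ORDER BY a.yr
        """,
        {"uid": user_id},
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in yearly_counts(322):
        print(row)
```

Each subquery returns one row per year, so the join cannot produce repeated values. Note that an inner join drops a year that appears in only one of the two subqueries; a LEFT JOIN from the side that has every year would keep it.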
Table 1:
Date PlacementID CampaignID Impressions
04/01/2014 100 10 1000
04/01/2014 101 10 1500
04/01/2014 100 11 500
Table 2:
Date PlacementID CampaignID Cost
04/01/2014 100 10 5000
04/01/2014 101 10 6000
04/01/2014 100 11 7000
04/01/2014 103 10 8000
When I join these tables using a FULL JOIN or LEFT JOIN statement, I do not get the unmatched record, which is the last row in Table 2 (PlacementID 103, CampaignID 10, Cost 8000). I have checked the raw data and files, and this record exists in only one of the two sources. However, I want to include such records in the final table. How can I do that? The two tables come from two different sources, and I only get the records common to both.
Moreover, I found that the missing values are exactly what the final figures need, so I want to include everything. I am including my SQL script below:
SELECT A.placementid,
A.campaignid,
A.date,
Sum(A.impressions) AS Impressions,
Sum(CASE
WHEN C.placement_count > 1 THEN ( B.cost / C.placement_count )
ELSE B.cost
END) AS Cost
FROM table1 A
FULL JOIN table2 B
ON A.placementid = B.placementid
AND A.campaignid = B.campaignid
AND A.date = B.date
LEFT JOIN (SELECT Count(placementid) AS Placement_Count,
placementid, campaignid,
date
FROM table1
GROUP BY placementid,
campaignid,
date) c
ON A.placementid = C.placementid
AND A.campaignid = C.campaignid
AND A.date = C.date
GROUP BY A.placementid,
A.campaignid,
A.date
I am dividing Cost by the placement count because in the source the cost was allocated only once per placement, while in the actual table the same PlacementID repeats more than once on the same date.
As you didn't provide any expected output, I am guessing here, but if the result you want is this:
PlacementID CampaignID Date Impressions Cost
----------- ----------- ----------------------- ----------- -----------
100 10 2014-04-01 02:00:00.000 1000 5000
100 11 2014-04-01 02:00:00.000 500 7000
101 10 2014-04-01 02:00:00.000 1500 6000
103 10 2014-04-01 02:00:00.000 NULL 8000
Then the following query should do it:
SELECT COALESCE(A.PlacementID,b.placementid) AS PlacementID,
COALESCE(A.campaignid, b.campaignid) AS CampaignID,
COALESCE(A.date, b.date) AS [Date],
SUM(A.impressions) AS Impressions,
SUM(CASE
WHEN C.placement_count > 1 THEN ( B.cost / C.placement_count )
ELSE B.cost
END ) AS Cost
FROM table1 A
FULL JOIN table2 B
ON A.[PlacementID] = B.placementid
AND A.campaignid = B.campaignid
AND A.date = B.date
LEFT JOIN (SELECT COUNT(PlacementID) AS Placement_Count,
placementid, campaignid,
date
FROM table1
GROUP BY placementid,
campaignid,
date) c
ON A.[PlacementID] = C.placementid
AND A.campaignid = C.campaignid
AND A.date = C.date
GROUP BY COALESCE(A.PlacementID, B.PlacementID),
COALESCE(A.campaignid, b.campaignid),
COALESCE(A.date, b.date)
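If you want to experiment with why the row for PlacementID 103 goes missing, here is a simplified sketch in Python with sqlite3 using the sample rows from the question. It is not the query above: FULL JOIN only exists in SQLite from version 3.39, so this sketch emulates it with a LEFT JOIN plus a UNION ALL of the table2 rows that found no match, and it leaves out the Placement_Count division. The column name dt replaces Date purely for this illustration.

```python
import sqlite3

def merged_rows():
    """Combine both sources, keeping rows that exist in only one of them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 "
        "(dt TEXT, PlacementID INTEGER, CampaignID INTEGER, Impressions INTEGER)"
    )
    conn.execute(
        "CREATE TABLE table2 "
        "(dt TEXT, PlacementID INTEGER, CampaignID INTEGER, Cost INTEGER)"
    )
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?, ?)",
        [
            ("2014-04-01", 100, 10, 1000),
            ("2014-04-01", 101, 10, 1500),
            ("2014-04-01", 100, 11, 500),
        ],
    )
    conn.executemany(
        "INSERT INTO table2 VALUES (?, ?, ?, ?)",
        [
            ("2014-04-01", 100, 10, 5000),
            ("2014-04-01", 101, 10, 6000),
            ("2014-04-01", 100, 11, 7000),
            ("2014-04-01", 103, 10, 8000),
        ],
    )
    rows = conn.execute(
        """
        -- rows from table1, with the matching cost if there is one
        SELECT a.PlacementID, a.CampaignID, a.dt, a.Impressions, b.Cost
        FROM table1 a
        LEFT JOIN table2 b
               ON a.PlacementID = b.PlacementID
              AND a.CampaignID = b.CampaignID
              AND a.dt = b.dt
        UNION ALL
        -- rows that exist only in table2; the keys come from b here,
        -- which is the job COALESCE does in a real FULL JOIN
        SELECT b.PlacementID, b.CampaignID, b.dt, NULL, b.Cost
        FROM table2 b
        LEFT JOIN table1 a
               ON a.PlacementID = b.PlacementID
              AND a.CampaignID = b.CampaignID
              AND a.dt = b.dt
        WHERE a.PlacementID IS NULL
        ORDER BY 1, 2
        """
    ).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for row in merged_rows():
        print(row)
```

The point carried over from the answer is that the key columns of an unmatched row must be taken from table2; selecting and grouping only by the table1 columns collapses that row into NULL keys.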