Need to calculate a few metrics from a dataset using SQL - separate queries - MySQL

The dataset looks like this. It is a sample dataset of employee login activity, stored in a table named activity.
I need to calculate a few metrics. I was able to do this with Python data frames, but I am new to MySQL.
1. What is the average number of employees active per day in January 2018, by dept_id? (I was able to do about half of it, but the results are not correct.)
2. Number of unique active employees (logins > 0) in January 2018 for each dept_id. (I was able to do it.)
3. Month-over-month growth for all dept_id from December 2017 to January 2018, where at least one employee was active (logins > 0). I have no idea how to do this in SQL.
4. Fraction of users who were active in each dept_id in December 2017 and were also active in the same dept_id in January 2018.
5. How many employees logged in on 3 or more consecutive days in January 2018?
Any help would be appreciated.
Query written for case 1:
select dept_id,
DAU
from
(
select dept_id
, month(date)
, year(date)
, avg(logins) as DAU
from
(select * from activity where login >0)
where year(date) =2018
and month(date) =1
group by dept_id, month(date), year(date)
)
Textual Format Dataset
date dept_id emp_id logins
29-11-2017 ABC001 A1 1
30-11-2017 ABC002 A2 2
01-12-2017 XYZ001 A3 0
01-12-2017 XYZ002 A4 1
03-12-2017 ABC001 D2 4
04-12-2017 ABC002 D1 1
05-12-2017 XYZ001 A6 2
05-12-2017 XYZ002 A7 3
30-12-2017 ABC001 A8 0
01-01-2018 ABC002 A2 6
02-01-2018 XYZ001 A10 4
03-01-2018 XYZ002 A11 2
04-01-2018 ABC001 A1 2
04-01-2018 ABC002 A2 0
05-01-2018 XYZ001 A13 4
05-01-2018 XYZ001 A6 2
05-01-2018 XYZ002 A7 1
06-01-2018 XYZ001 A6 2
06-01-2018 XYZ002 A7 3
07-01-2018 XYZ001 A6 3
07-01-2018 XYZ002 A7 4
06-01-2018 XYZ002 A14 3
30-01-2018 ABC001 A15 2

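Let me know if this works; otherwise I will update the answer. I don't have MySQL installed, so I wasn't able to check. The queries below are written in Oracle syntax (to_char, nvl, rownum), so they need translating for MySQL, for example to DATE_FORMAT and IFNULL.
DATE is a keyword in Oracle. In MySQL it is a non-reserved keyword, so a column named date works unquoted; if you want to quote it, use backticks (`date`) rather than double quotes.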
Case 1:
SELECT dept_id,
AVG(cnt) average_emp
FROM (SELECT dept_id,
days,
COUNT(emp_id) cnt
FROM (SELECT dept_id,
emp_id,
SUM(logins) logins,
to_char(DATES, 'dd') days
FROM mytable
WHERE to_char(DATES,'mmyyyy') = '012018'
GROUP BY dept_id,
emp_id,
to_char(DATES, 'dd') )
WHERE logins > 0
GROUP BY dept_id,
days )
GROUP BY dept_id;
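For MySQL itself, Case 1 could be sketched roughly as follows. This is a minimal sketch, not a tested solution: it assumes the table is named activity as in the question, that `date` is stored as a real DATE column (the sample shows dd-mm-yyyy text), and that "active" means logins > 0.

```sql
-- Assumption: activity(`date` DATE, dept_id, emp_id, logins).
-- Step 1: count distinct active employees per department per day.
-- Step 2: average those daily counts per department.
SELECT dept_id,
       AVG(active_emps) AS avg_active_per_day
FROM (
    SELECT dept_id,
           `date`,
           COUNT(DISTINCT emp_id) AS active_emps
    FROM activity
    WHERE logins > 0
      AND `date` >= '2018-01-01'
      AND `date` <  '2018-02-01'
    GROUP BY dept_id, `date`
) AS daily
GROUP BY dept_id;
```

Note that this averages only over days on which a department had at least one active employee. If the average should run over all 31 calendar days, SUM(active_emps) / 31 would be the expression to use instead of AVG(active_emps).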
Case 2:
SELECT dept_id,
COUNT(emp_id)
FROM (SELECT dept_id,
emp_id,
SUM(logins) logins
FROM mytable
WHERE to_char(DATES,'mmyyyy') = '012018'
GROUP BY dept_id,
emp_id )
WHERE logins > 0
GROUP BY dept_id;
Case 3:
SELECT months,
users,
ROUND( (users - nvl(LAG(users) OVER (ORDER BY rownum),users) ) / nvl(LAG(users) OVER (ORDER BY ROWNUM), 1)
, 2) growth_rate
FROM (SELECT to_char(mt.DATES, 'MON-YYYY') months,
count(mt.EMP_ID) users
FROM (SELECT *
FROM MYTABLE
ORDER BY DATES) mt
WHERE mt.DATES >= to_date('DEC-2017', 'MON-YYYY')
AND mt.DATES <= to_date('JAN-2018', 'MON-YYYY')
GROUP BY to_char(mt.DATES, 'MON-YYYY')
ORDER BY to_date(months, 'MON-YYYY') ) oq
WHERE exists(SELECT 1
FROM MYTABLE iq
WHERE to_char(iq.DATES, 'MON-YYYY') = oq.months
AND iq.EMP_ID IN (SELECT EMP_ID
FROM MYTABLE
WHERE iq.LOGINS > 0) );
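A MySQL take on Case 3 per dept_id might look like the sketch below. It assumes the question's table activity with a DATE column `date`, and it interprets "growth" as (January active employees - December active employees) / December active employees; that interpretation is an assumption, since the question does not define the formula.

```sql
-- Assumption: growth = (jan_users - dec_users) / dec_users per department.
SELECT dept_id,
       dec_users,
       jan_users,
       -- NULLIF avoids dividing by zero when nobody was active in December
       (jan_users - dec_users) / NULLIF(dec_users, 0) AS growth_rate
FROM (
    SELECT dept_id,
           COUNT(DISTINCT CASE WHEN `date` <  '2018-01-01' THEN emp_id END) AS dec_users,
           COUNT(DISTINCT CASE WHEN `date` >= '2018-01-01' THEN emp_id END) AS jan_users
    FROM activity
    WHERE logins > 0
      AND `date` >= '2017-12-01'
      AND `date` <  '2018-02-01'
    GROUP BY dept_id
) AS monthly;
```

Because the WHERE clause keeps only rows with logins > 0, departments with no active employee in either month do not appear in the result at all.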
Case 4:
SELECT dept_id,
emp_id
FROM (SELECT dept_id,
emp_id
FROM mytable
WHERE to_char(DATES,'mmyyyy') = '122017'
AND logins > 0
GROUP BY dept_id,
emp_id )
INTERSECT
SELECT dept_id,
emp_id
FROM (SELECT dept_id,
emp_id
FROM mytable
WHERE to_char(DATES,'mmyyyy') = '012018'
AND logins > 0
GROUP BY dept_id,
emp_id )
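INTERSECT is only available in recent MySQL 8.0 releases, and the query above lists the retained employees rather than computing a fraction. A possible MySQL sketch that returns the fraction per dept_id with a LEFT JOIN is shown below; it assumes the question's table activity with a DATE column `date`.

```sql
-- dec_active: distinct (dept_id, emp_id) pairs active in December 2017.
-- jan_active: the same for January 2018.
-- COUNT(j.emp_id) counts only December employees that also appear in January.
SELECT d.dept_id,
       COUNT(j.emp_id) / COUNT(*) AS retained_fraction
FROM (
    SELECT DISTINCT dept_id, emp_id
    FROM activity
    WHERE logins > 0
      AND `date` >= '2017-12-01'
      AND `date` <  '2018-01-01'
) AS d
LEFT JOIN (
    SELECT DISTINCT dept_id, emp_id
    FROM activity
    WHERE logins > 0
      AND `date` >= '2018-01-01'
      AND `date` <  '2018-02-01'
) AS j
  ON j.dept_id = d.dept_id
 AND j.emp_id  = d.emp_id
GROUP BY d.dept_id;
```

The denominator here is the number of employees active in December, so departments with no December activity are omitted.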
Case 5:
-- not foolproof
SELECT COUNT(*) emp_cnt
FROM (SELECT emp_id,
DENSE_RANK() OVER(ORDER BY DATES) rn,
COUNT(*) OVER(PARTITION BY emp_id ORDER BY DATES) cnt
FROM mytable
WHERE to_char(DATES,'mmyyyy') = '012018'
AND logins > 0
ORDER BY rn,
cnt )
WHERE rn = cnt
AND rn >= 3;
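A more robust way to find runs of consecutive days is the "gaps and islands" technique. The sketch below assumes MySQL 8.0+ (for CTEs and window functions) and the question's table activity with a DATE column `date`. Within one employee, subtracting the row number from the date yields the same value for every day of an unbroken run, so grouping on that value isolates each run.

```sql
WITH active_days AS (
    -- one row per employee per active day in January 2018
    SELECT DISTINCT emp_id, `date`
    FROM activity
    WHERE logins > 0
      AND `date` >= '2018-01-01'
      AND `date` <  '2018-02-01'
),
numbered AS (
    SELECT emp_id,
           `date`,
           ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY `date`) AS rn
    FROM active_days
),
streaks AS (
    -- date minus row number is constant within a run of consecutive days
    SELECT emp_id
    FROM numbered
    GROUP BY emp_id, DATE_SUB(`date`, INTERVAL rn DAY)
    HAVING COUNT(*) >= 3
)
SELECT COUNT(DISTINCT emp_id) AS emp_cnt
FROM streaks;
```

This counts an employee once even if they have several qualifying runs, and it ignores dept_id; add dept_id to the DISTINCT, PARTITION BY and GROUP BY lists if runs should be tracked per department.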

Related

Hackerrank Problem:15 days of learning SQL (Stuck at count section in subpart)

I have been trying to solve the problem below:
https://www.hackerrank.com/challenges/15-days-of-learning-sql/problem?isFullScreen=true
I am stuck at finding the count of hacker_ids who have made a submission on every date from the given start date onward. Below is my solution. The max_submissions part gives, for each date, the hacker with the most submissions (lowest hacker_id on ties), and that part comes out correct. In the final query, however, I cannot get the proper counts: it returns 35 for every date. Only the second output column, the unique hacker count, is wrong. I either get 35 for all dates or other values that differ from the expected output, even though the logic seems correct to me.
with max_submissions
as
(
Select t.submission_date,t.hacker_id,t.cnt,h.name From
(Select * from
(Select submission_date, hacker_id, cnt, dense_rank() over (partition by submission_date order by cnt desc,hacker_id asc) as rn
from
(Select
submission_date, hacker_id, count(submission_id) cnt
from
submissions
where submission_date between '2016-03-01' and '2016-03-15'
group by submission_date, hacker_id
)
)where rn =1
) t join
hackers h on t.hacker_id=h.hacker_id
),
t1
as
(
select hacker_id
from
(
Select
hacker_id, lead(submission_date) over ( order by hacker_id,submission_date)
-submission_date cnt
from
submissions
where submission_date between '2016-03-01' and '2016-03-15'
order by hacker_id asc, submission_date asc)
group by hacker_id having sum(case when cnt=1 then 1 else 0 end) =14)
select s.submission_date,count( t1.hacker_id)
from submissions s
join
t1 on
s.hacker_id=t1.hacker_id
group by s.submission_date;
This should give you the correct result:
WITH calendar (day) AS (
-- Generate a calendar so we don't need to assume that there will always be a submission
-- every day.
SELECT DATE '2016-03-01' + LEVEL - 1 AS day
FROM DUAL
CONNECT BY LEVEL <= 15
),
daily_hacker_submissions (submission_date, hacker_id, num_submissions) AS (
-- Find the number of submissions for hackers on each day.
SELECT c.day,
hacker_id,
COUNT(*) AS num_submissions
FROM calendar c
LEFT OUTER JOIN submissions s
ON (
-- Don't assume dates are always midnight.
c.day <= s.submission_date
AND s.submission_date < c.day + 1
)
GROUP BY
c.day,
s.hacker_id
),
daily_submissions (submission_date, num_hackers, hacker_id ) AS (
-- Find the number of hackers on each day and the hacker with the greatest number of
-- submissions and the least hacker id.
SELECT submission_date,
COUNT(DISTINCT hacker_id),
MIN(hacker_id) KEEP (DENSE_RANK LAST ORDER BY num_submissions)
FROM daily_hacker_submissions
GROUP BY
submission_date
)
-- Include the hacker's name
SELECT d.submission_date,
d.num_hackers,
d.hacker_id,
h.name
FROM daily_submissions d
LEFT OUTER JOIN hackers h
ON (d.hacker_id = h.hacker_id)
Which, for the sample data:
CREATE TABLE submissions (submission_date, submission_id, hacker_id, score) AS
SELECT DATE '2016-03-01', 1, 1, 80 FROM DUAL UNION ALL
SELECT DATE '2016-03-01', 2, 1, 90 FROM DUAL UNION ALL
SELECT DATE '2016-03-01', 3, 1, 100 FROM DUAL UNION ALL
SELECT DATE '2016-03-01', 4, 2, 90 FROM DUAL UNION ALL
SELECT DATE '2016-03-01', 5, 2, 100 FROM DUAL UNION ALL
SELECT DATE '2016-03-02', 6, 1, 100 FROM DUAL UNION ALL
SELECT DATE '2016-03-02', 7, 2, 90 FROM DUAL UNION ALL
SELECT DATE '2016-03-02', 8, 2, 100 FROM DUAL UNION ALL
SELECT DATE '2016-03-02', 9, 3, 80 FROM DUAL UNION ALL
SELECT DATE '2016-03-02', 10, 3, 100 FROM DUAL;
CREATE TABLE hackers (hacker_id, name) AS
SELECT 1, 'Alice' FROM DUAL UNION ALL
SELECT 2, 'Betty' FROM DUAL UNION ALL
SELECT 3, 'Carol' FROM DUAL;
Outputs:
SUBMISSION_DATE | NUM_HACKERS | HACKER_ID | NAME
--------------------------------------------------------
2016-03-01 00:00:00 | 2 | 1 | Alice
2016-03-02 00:00:00 | 3 | 2 | Betty
2016-03-03 00:00:00 | 0 | null | null
... | ... | ... | ...
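For the specific count the question asks about (hackers who have submitted on every day from the start date up to the current date), one possible approach in MySQL 8.0+ is sketched below. It assumes submission_date is a DATE without a time part and that the contest starts on 2016-03-01. The idea: a hacker's k-th distinct submission date equals the k-th calendar day only if they have not skipped any day so far.

```sql
WITH daily AS (
    -- one row per hacker per day with at least one submission
    SELECT DISTINCT submission_date, hacker_id
    FROM submissions
    WHERE submission_date BETWEEN '2016-03-01' AND '2016-03-15'
),
ranked AS (
    SELECT submission_date,
           hacker_id,
           ROW_NUMBER() OVER (PARTITION BY hacker_id
                              ORDER BY submission_date) AS nth_day
    FROM daily
)
SELECT submission_date,
       COUNT(*) AS num_hackers
FROM ranked
-- keep a row only if the hacker has submitted on every day up to this one
WHERE nth_day = DATEDIFF(submission_date, '2016-03-01') + 1
GROUP BY submission_date
ORDER BY submission_date;
```

Days on which nobody has an unbroken streak are simply absent from this result; joining against a generated calendar would be needed to show them with a count of 0.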

MySQL: select duplicates within a date interval

I have table orders:
id | login_name | success | order_date
---------------------------------------
1 | login1 | 0 | 2021-01-05
2 | login2 | 0 | 2021-01-06
3 | login3 | 0 | 2021-01-08
4 | login1 | 1 | 2021-01-04
5 | login2 | 0 | 2021-01-01
I need to select id and login_name for orders with success=0 for which another order by the same login_name exists with an order_date within 60 days before or after.
The result should be:
1 - login1, 2 - login2, 5 - login2
I have this, but I don't think it is the right way:
SELECT id, login_name, COUNT(*)
FROM orders
WHERE success=0
GROUP BY login_name
HAVING COUNT(*) > 1
You can use EXISTS as follows:
SELECT r.id, r.login_name
FROM orders r
WHERE r.success = 0
  AND EXISTS (
    -- another order by the same login within 60 days either side
    SELECT 1
    FROM orders rr
    WHERE rr.login_name = r.login_name
      AND rr.id <> r.id
      AND ABS(DATEDIFF(rr.order_date, r.order_date)) <= 60
  );
If you want orders that appear within 60 days of each other, you can use lag() and lead() (available in MySQL 8.0+):
select o.*
from (select o.*,
             lag(order_date) over (partition by login_name order by order_date) as prev_order_date,
             lead(order_date) over (partition by login_name order by order_date) as next_order_date
      from orders o
     ) o
where prev_order_date > order_date - interval 60 day or
      next_order_date < order_date + interval 60 day;

Group by a summed variable

SELECT deposit.numberSuccessfulDeposits, count(distinct userid)
FROM deposit WHERE deposit.asOfDate between '2016-04-01 00:00:00' and '2016-04-03 23:59:59'
AND deposit.licenseeId = 1306
GROUP BY deposit.numberSuccessfulDeposits
Sample output
numberSuccessfulDeposits count(distinct userid)
0 228
1 878
2 90
3 37
4 17
However, if Bob made 1 deposit on Monday and 3 deposits on Tuesday, then it will count towards both "1" and "3" for number of successful deposits.
numberSuccessfulDeposits count(distinct userid)
0 ##
1 1
2 ##
3 1
4 ##
Ideally, it should only count towards "4"
numberSuccessfulDeposits count(distinct userid)
0 ##
1 ##
2 ##
3 ##
4 1
Thoughts?
Change the grouping to be per user and sum all occurrences of deposits. Then count the users for each deposit total:
SELECT
numberSuccessfulDeposits,
COUNT(userid) AS users_count
FROM (
SELECT
sum(numberSuccessfulDeposits) AS numberSuccessfulDeposits,
userid
FROM deposit
WHERE asOfDate between '2016-04-01 00:00:00' and '2016-04-03 23:59:59'
AND licenseeId = 1306
GROUP BY userid
) t
GROUP BY numberSuccessfulDeposits
Edit: Grouping deposits into the categories 0, 1, 2, 3+ would look like this (the ::TEXT cast is PostgreSQL syntax):
SELECT
numberSuccessfulDeposits,
COUNT(userid) AS user_count
FROM (
SELECT
CASE WHEN numberSuccessfulDeposits >= 3 THEN '3+' ELSE numberSuccessfulDeposits::TEXT END AS numberSuccessfulDeposits,
userid
FROM (
SELECT
sum(numberSuccessfulDeposits) AS numberSuccessfulDeposits,
userid
FROM deposit
WHERE asOfDate between '2016-04-01 00:00:00' and '2016-04-03 23:59:59'
AND licenseeId = 1306
GROUP BY userid
) t
) f
GROUP BY numberSuccessfulDeposits
Calculate the per-user sum in a subquery, then the per-total count in the main query.
SELECT totalDeposits, COUNT(*)
FROM (SELECT userid, SUM(numberSuccessfulDeposits) AS totalDeposits
FROM deposit
WHERE deposit.asOfDate between '2016-04-01 00:00:00' and '2016-04-03 23:59:59'
AND deposit.licenseeId = 1306
GROUP BY userid) AS subquery
GROUP BY totalDeposits

Adding Number of Days of Reservation Over Every Month Between Range

I'm calculating
the number of days a reservation took place over every month (for every month since the first record)
A total price based on the total # of days and rate.
INSERT INTO `reservations`
(`id`, `user_id`, `property_id`, `actual_check_in`,`actual_check_out`)
VALUES
(5148, 1, 2, '2014-01-01', '2014-01-10'),
(5149, 1, 2, '2014-02-03', '2014-02-10'),
(5151, 1, 2, '2014-02-02', '2014-02-15'),
(5153, 1, 2, '2014-03-05', '2014-03-10'),
(5153, 1, 2, '2014-02-20', '2014-03-30');
SELECT
YEAR(month.d),
MONTHNAME(month.d),
r.property_id,
SUM(
DATEDIFF(LEAST(actual_check_out, LAST_DAY(month.d)), GREATEST(actual_check_in, month.d))
) AS days,
SUM(days*p.rate),
MIN(r.actual_check_in) as firstDate,
MAX(r.actual_check_out) as lastDate
FROM reservations as r
LEFT JOIN property as p on r.property_id=p.id
RIGHT JOIN (
select
DATE_FORMAT(m1, '%Y%m%d') as d
from
(
select
(firstDate - INTERVAL DAYOFMONTH(firstDate)-1 DAY)
+INTERVAL m MONTH as m1
from
(
select @rownum:=@rownum+1 as m from
(select 1 union select 2 union select 3 union select 4) t1,
(select 1 union select 2 union select 3 union select 4) t2,
(select 1 union select 2 union select 3 union select 4) t3,
(select 1 union select 2 union select 3 union select 4) t4,
(select @rownum:=-1) t0
) d1
) d2
where m1<=lastDate
order by m1
) AS month ON
actual_check_in <= LAST_DAY(month.d)
AND month.d <= actual_check_out
GROUP BY user_id, month.d
Troubles I'm having:
getting MySQL to accept a variable for firstDate & lastDate in the joined query
I want to sum the monthly number of days together, for reservations by the same user, for the same month. I'm trying to turn the proper parts into a subquery to calculate that but having trouble..
http://sqlfiddle.com/#!9/71e34/1
I would like to have the results like (if the property rate is 150/day):
DATE | USER | #Days | Total Rate
--------------------------------------
01/2014 | 1 | 9 | 1350
01/2014 | 2 | 0 | 0
02/2014 | 1 | 30 | 4500
02/2014 | 2 | 0 | 0
03/2014 | 1 | 35 | 5250
03/2014 | 2 | 0 | 0
04/2014 | 1 | 0 | 0
04/2014 | 2 | 0 | 0
* # days can be more than the # of days in a month because there might be multiple reservations existing during that month
UPDATE: This almost solved the problem, but I'm having trouble getting the second large select statement to calculate the prices properly. The query only takes into account the first property rate, instead of selecting the rates as per the join statement. Any help?
select
r.user_id,
DATE_FORMAT(m1, '%b %Y') as date,
(SELECT
SUM(
DATEDIFF(LEAST(actual_check_out, LAST_DAY(m1)), GREATEST(actual_check_in, m1))
) AS numdays
FROM reservations
where actual_check_in <= LAST_DAY(m1)
AND m1 <= actual_check_out
AND user_id=r.user_id
GROUP BY m1) as days,
(SELECT
SUM(
DATEDIFF(LEAST(r.actual_check_out, LAST_DAY(m1)), GREATEST(r.actual_check_in, m1))
) *p.rate
FROM reservations as r
left join property as p
on r.property_id=p.id
where actual_check_in <= LAST_DAY(m1)
AND m1 <= actual_check_out
AND user_id=r.user_id
GROUP BY m1) as price
from (
select ('2015-01-01' - INTERVAL DAYOFMONTH('2015-01-01')-1 DAY) +INTERVAL m MONTH as m1 from (
select @rownum:=@rownum+1 as m from
(select 1 union select 2 union select 3 union select 4) t1,
(select 1 union select 2 union select 3 union select 4) t2,
(select 1 union select 2 union select 3 union select 4) t3,
(select 1 union select 2 union select 3 union select 4) t4,
(select @rownum:=-1) t0
) d1
) d2
cross join reservations as r
where m1<=CURDATE() group by user_id, m1 order by m1
http://sqlfiddle.com/#!9/36035/21
I'm still not sure what you are asking for, but the query below may point you in the right direction:
SELECT DATE_FORMAT(r.actual_check_in, '%m/%Y') AS mnth, r.user_id,
DATEDIFF(MAX(r.actual_check_out),MIN(r.actual_check_in)) AS days,
DATEDIFF(MAX(r.actual_check_out),MIN(r.actual_check_in))*p.rate AS totalRate
FROM reservations r
JOIN property p ON r.property_id=p.id
GROUP BY DATE_FORMAT(r.actual_check_in, '%m/%Y'), r.user_id;
This returns data like below:
mnth user_id days totalRate
------- ------- ------ -----------
01/2014 1 9 1350
02/2014 1 56 8400
03/2014 1 5 750
http://sqlfiddle.com/#!9/36035/36
select
r.user_id as userId,
DATE_FORMAT(m1, '%b %Y') as date,
(SELECT
SUM(
DATEDIFF(LEAST(actual_check_out, LAST_DAY(m1)), GREATEST(actual_check_in, m1))
) AS numdays
FROM reservations
where actual_check_in <= LAST_DAY(m1)
AND m1 <= actual_check_out
AND user_id=userId
GROUP BY m1) as days,
(SELECT
sum(DATEDIFF(LEAST(r.actual_check_out, LAST_DAY(m1)), GREATEST(r.actual_check_in, m1))*p.rate)
FROM reservations as r
left join property as p
on r.property_id=p.id
where r.actual_check_in <= LAST_DAY(m1)
AND m1 <= r.actual_check_out
AND r.user_id=userId
GROUP BY m1) as price
from (
select ('2015-01-01' - INTERVAL DAYOFMONTH('2015-01-01')-1 DAY) +INTERVAL m MONTH as m1 from (
select @rownum:=@rownum+1 as m from
(select 1 union select 2 union select 3 union select 4) t1,
(select 1 union select 2 union select 3 union select 4) t2,
(select 1 union select 2 union select 3 union select 4) t3,
(select 1 union select 2 union select 3 union select 4) t4,
(select @rownum:=-1) t0
) d1
) d2
cross join reservations as r
where m1<=CURDATE() group by user_id, m1 order by m1
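On MySQL 8.0+, the month list can be generated with a recursive CTE instead of the user-variable trick, and days and price can come from a single join. The following is only a sketch under several assumptions: actual_check_in and actual_check_out are DATE columns, property has a rate column as in the queries above, and the 2014-01-01 to 2014-04-01 range is hard-coded for illustration. It also clamps each stay to the first day of the next month rather than to LAST_DAY, because clamping to LAST_DAY drops the night that crosses a month boundary (for the 2014-02-20 to 2014-03-30 stay it yields 8 + 29 = 37 days instead of 38).

```sql
WITH RECURSIVE months (month_start) AS (
    SELECT DATE '2014-01-01'          -- assumed first month of the report
    UNION ALL
    SELECT month_start + INTERVAL 1 MONTH
    FROM months
    WHERE month_start < '2014-04-01'  -- assumed last month of the report
)
SELECT DATE_FORMAT(m.month_start, '%m/%Y') AS mnth,
       r.user_id,
       -- days of each stay that fall inside the month, summed per user
       SUM(DATEDIFF(LEAST(r.actual_check_out, m.month_start + INTERVAL 1 MONTH),
                    GREATEST(r.actual_check_in, m.month_start))) AS days,
       -- multiply inside the SUM so each reservation uses its own property rate
       SUM(DATEDIFF(LEAST(r.actual_check_out, m.month_start + INTERVAL 1 MONTH),
                    GREATEST(r.actual_check_in, m.month_start)) * p.rate) AS total_rate
FROM months AS m
JOIN reservations AS r
  ON r.actual_check_in  < m.month_start + INTERVAL 1 MONTH
 AND r.actual_check_out > m.month_start
JOIN property AS p
  ON p.id = r.property_id
GROUP BY m.month_start, r.user_id
ORDER BY m.month_start, r.user_id;
```

Because of the inner joins, month and user combinations with no reservation are omitted rather than shown with 0; producing those rows would need a cross join of months with a user list followed by a LEFT JOIN to reservations.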

specific status on consecutive days

I have a MySQL table ATT with the columns EMP_ID, ATT_DATE and ATT_STATUS, where ATT_STATUS takes the values 1 (Present), 2 (Absent) and 3 (Weekly-off). I want to find the EMP_IDs that have status 2 on 10 consecutive days within a given date range.
Please help
Please give this a try:
SELECT EMP_ID FROM (
SELECT
IF((@prevDate!=(q.ATT_DATE - INTERVAL 1 DAY)) OR (@prevEmp!=q.EMP_ID) OR (q.ATT_STATUS != 2), @rownum:=@rownum+1, @rownum:=@rownum) AS rownumber,
@prevDate:=q.ATT_DATE,
@prevEmp:=q.EMP_ID,
q.*
FROM (
SELECT
EMP_ID
, ATT_DATE
, ATT_STATUS
FROM
org_tb_dailyattendance, (SELECT @rownum:=0, @prevDate:='', @prevEmp:=0) vars
WHERE ATT_DATE BETWEEN '2013-01-01' AND '2013-02-15'
ORDER BY EMP_ID, ATT_DATE, ATT_STATUS
) q
) sq
GROUP BY EMP_ID, rownumber
HAVING COUNT(*) >= 10
The logic is to first sort the table by employee id and date. Then introduce a row number which increases only if
the days are not consecutive or
the employee id is not the previous one or
the status is not 2
Then I just grouped by this row number and checked whether there are at least 10 rows in each group. Those should be the employees who were absent for 10 days or more.
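An alternative that does not depend on the evaluation order of user variables is a self-join over a 10-day window. This is a sketch, assuming the table and column names from the question (ATT, EMP_ID, ATT_DATE, ATT_STATUS), that ATT_DATE is a DATE column, and that a weekly-off day (status 3) breaks a run. For each absent day it counts the distinct absent days of the same employee in the 10-day window starting there; a count of 10 means every day in the window was an absence.

```sql
SELECT DISTINCT a.EMP_ID
FROM ATT AS a
JOIN ATT AS b
  ON b.EMP_ID = a.EMP_ID
 AND b.ATT_STATUS = 2
 AND b.ATT_DATE BETWEEN a.ATT_DATE AND a.ATT_DATE + INTERVAL 9 DAY
WHERE a.ATT_STATUS = 2
  -- the window must start early enough to fit inside the date range
  AND a.ATT_DATE BETWEEN DATE '2013-01-01'
                     AND DATE '2013-02-15' - INTERVAL 9 DAY
GROUP BY a.EMP_ID, a.ATT_DATE
HAVING COUNT(DISTINCT b.ATT_DATE) = 10;
```

The self-join touches up to 10 rows per absent day, so an index on (EMP_ID, ATT_DATE) would likely help on a large table.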
Have you tried something like this? Note that it counts all status-2 days per employee, not consecutive runs:
SELECT EMP_ID, COUNT(*) AS absent_count, MIN(ATT_DATE) AS first_absent_date
FROM ATT
WHERE ATT_STATUS = 2
GROUP BY EMP_ID
HAVING COUNT(*) >= 10