COUNT DISTINCT + COUNT GROUP BY HAVING (value) + GROUP BY months - mysql

I have a table with columns: cid, date
Sample table data (note: cid actually contains string values such as 'otsytb8o7sbs50w9doghwzvfy0vb8f9h', and many of them are duplicated):
cid date
--------------------------------------------------------
1 2015-10-10 04:57:57
2 2015-10-10 05:03:58
3 2015-10-10 05:24:49
4 2015-10-10 05:28:24
5 2015-10-10 05:28:26
6 2015-10-10 05:28:40
7 2015-10-10 05:30:39
8 2015-10-10 05:33:04
9 2015-10-10 05:35:42
9 2015-10-10 05:36:03
I want to get the following:
Count of Distinct cid as uniqVisits
Count of cid HAVING (count <= 1) as bounced
Grouped by month
I want to get the bounce rate per month from cookie IDs (cid).
So I am looking for the COUNT of unique cookie IDs that appear only once (count <= 1) as bounced, and the COUNT of DISTINCT cids as total unique visitors, grouped by month.
Desired result:
uniqVisits | bounced | month
-----------|---------|-------
2345 | 325 | 2015-10
-----------|---------|-------
7345 | 734 | 2015-11
-----------|---------|-------
3982 | 823 | 2015-12
-----------|---------|-------
4291 | 639 | 2016-01
I have tried a lot of methods. The query below is the closest I can get, but it gives me the error: "Operand should contain 1 column(s)"
SELECT count(*) AS bounced,
( SELECT count( DISTINCT(cid) ) AS uniqVisits,
SUBSTR(DATE(date),1,7) AS month
FROM table ) AS uniqVisits
FROM (
SELECT COUNT(cid) AS bounced,
SUBSTR(DATE(date),1,7) AS month
FROM table
GROUP BY cid
HAVING (count <= 1)
) AS x
GROUP BY month
How can I write this query so that it gives me the result shown in the "Desired result" table above?
BTW: I also tried the query below, but it times out and then throws a server error. It also does not group the second subquery by month, obviously because of the "cid having count <= 1" condition.
SELECT c1.uniqVisits,
c1.month,
c2.bounced
FROM ( SELECT COUNT(DISTINCT t1.cid) AS `uniqVisits`,
SUBSTR(DATE(t1.date),1,7) AS `month`
FROM table t1
GROUP BY month
) c1
JOIN ( SELECT COUNT(*) AS `bounced`,
SUBSTR(DATE(t2.date),1,7) AS `month`
FROM table t2
GROUP BY month, cid HAVING (count <= 1)
) c2
ON c2.month = c1.month
ORDER BY c1.month

So I have resolved this:
SELECT uniqVisitors, COUNT(*) AS bounced, T1.month
FROM (
SELECT cid,
SUBSTR(DATE(date),1,7) AS month
FROM table
GROUP BY cid
HAVING COUNT(*) <= 1
) T1
LEFT JOIN
( SELECT count( DISTINCT(cid) ) AS uniqVisitors,
SUBSTR(DATE(date),1,7) AS month
FROM table
GROUP By month ) T2
ON T1.month = T2.month
GROUP BY month
Gives me:
uniqVisitors | bounced | month
-------------|---------|--------
7237 | 6822 | 2015-10
12597 | 12136 | 2015-11
12980 | 12573 | 2015-12
12091 | 11695 | 2016-01
5396 | 5134 | 2016-02
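One thing to check in a query of this shape: grouping by cid alone judges a bounce over the whole table and takes the month from whichever row the engine picks for that cid. Below is a minimal sketch of a per-month variant, which first counts hits per (month, cid) and then aggregates per month. It is run through Python's sqlite3 as a stand-in for MySQL; the table name visits and the sample rows are made up for illustration, and in MySQL DATE_FORMAT(date, '%Y-%m') would take the place of strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (cid TEXT, date TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("a", "2015-10-10 04:57:57"),
        ("b", "2015-10-10 05:03:58"),
        ("b", "2015-10-11 05:24:49"),  # b came back in October: not a bounce
        ("a", "2015-11-02 05:28:24"),
        ("c", "2015-11-03 05:28:26"),
        ("c", "2015-11-03 05:30:39"),  # c has two hits in November
        ("d", "2015-11-04 05:33:04"),
    ],
)

# Inner query: one row per (month, cid) with that visitor's hit count.
# Outer query: per month, count the visitors and those with exactly one hit.
query = """
SELECT COUNT(*) AS uniqVisits,
       SUM(CASE WHEN hits = 1 THEN 1 ELSE 0 END) AS bounced,
       month
FROM (
    SELECT strftime('%Y-%m', date) AS month, cid, COUNT(*) AS hits
    FROM visits
    GROUP BY month, cid
) per_visitor
GROUP BY month
ORDER BY month
"""
rows = conn.execute(query).fetchall()
for uniq_visits, bounced, month in rows:
    print(month, uniq_visits, bounced)
```

With this definition a cid that shows up once in October and once in November (like "a" here) counts as a bounce in both months. If a bounce should be judged over all time instead, the inner grouping would need to change.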

Related

Using mysql 8 window functions

My salary table looks like this,
employeeId Salary salaryEffectiveFrom
19966 10000.00 2022-07-01
19966 20000.00 2022-07-15
My role/grades table looks like this,
employeeId grade roleEffectiveFrom
19966 grade 3 2022-07-01
19966 grade 2 2022-07-10
I am trying to work out the salary paid for each grade, taking into account the effective dates in both tables.
Grade 3 is effective from 1-July-2022. Grade 2 is effective from 10-July-2022, which implies grade 3 is effective until the 9th of July, i.e. 9 days.
Grade 2 is effective from 10-July-2022 onwards.
A salary of 10000 is effective from 1-July-2022 until 14-July-2022, as the salary of 20000 is effective from the 15th. Therefore grade 3 had a salary of 10000 for 9 days, grade 2 had a salary of 10000 for 5 days (the 10th to the 14th), and grade 2 has a salary of 20000 from the 15th onwards. The role effectiveFrom date takes precedence over the salary effectiveFrom date.
This query,
SELECT er.employeeId,
es.salary,
`grade`,
date(er.effectiveFrom) roleEffectiveFrom,
date(es.effectiveFrom) salaryEffectiveFrom,
DATEDIFF(LEAST(COALESCE(LEAD(er.effectiveFrom)
OVER (PARTITION BY er.employeeId ORDER By er.effectiveFrom),
DATE_ADD(LAST_DAY(er.effectiveFrom),INTERVAL 1 DAY)),
DATE_ADD(LAST_DAY(er.effectiveFrom),INTERVAL 1 DAY)),
er.effectiveFrom) as '#Days' ,
ROUND((salary * 12) / 365, 2) dailyRate
FROM EmployeeRole er
join EmployeeSalary es ON (es.employeeId = er.employeeId)
and er.employeeId = 19966
;
gives me the result set shown below,
employeeId Salary grade roleEffectiveFrom salaryEffectiveFrom Days dailyRate
19966 10000.00 grade 3 2022-07-01 2022-07-01 0 328.77
19966 20000.00 grade 3 2022-07-01 2022-07-15 9 657.53
19966 10000.00 grade 2 2022-07-10 2022-07-01 0 328.77
19966 20000.00 grade 2 2022-07-10 2022-07-15 22 657.53
Grade 3 is effective for 9 days in July, so I want the total salary for those 9 days as a separate column, using the daily rate column: 328.77 * 9 = 2958.93. I am unable to do this because I am getting the days on the wrong row, i.e. 9 should be the result for the first row.
dbfiddle
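Merge the effective dates from the two tables, use LEAD to find where each period ends, then use correlated subqueries to look up the grade and salary for each period: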
with cte as
(
SELECT employeeid,effectivefrom from EMPLOYEEROLE
union
select employeeid,effectivefrom from employeesalary
)
,cte1 as
(select employeeid,effectivefrom,
coalesce(
date_sub(lead(effectivefrom) over (partition by employeeid order by effectivefrom),interval 1 day) ,
now()) nexteff
from cte
)
select *,
datediff(nexteff,effectivefrom) + 1 diff,
(select grade from employeerole e where e.effectivefrom <= cte1.effectivefrom order by e.effectivefrom desc limit 1) grade,
(select salary from employeesalary e where e.effectivefrom <= cte1.nexteff order by e.effectivefrom desc limit 1) salary
from cte1;
+------------+---------------------+---------------------+------+---------+--------+
| employeeid | effectivefrom | nexteff | diff | grade | salary |
+------------+---------------------+---------------------+------+---------+--------+
| 19966 | 2022-07-01 00:00:00 | 2022-07-09 00:00:00 | 9 | grade 3 | 10000 |
| 19966 | 2022-07-10 00:00:00 | 2022-07-14 00:00:00 | 5 | grade 2 | 10000 |
| 19966 | 2022-07-15 00:00:00 | 2022-10-08 08:51:49 | 86 | grade 2 | 20000 |
+------------+---------------------+---------------------+------+---------+--------+
3 rows in set (0.003 sec)
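To end the last period at the end of its month instead of at now(), join in the latest effective date per employee:
with cte as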
(
SELECT employeeid,effectivefrom from EMPLOYEEROLE
union
select employeeid,effectivefrom from employeesalary
)
,cte1 as
(select cte.employeeid,effectivefrom,
coalesce(
date_sub(lead(effectivefrom) over (partition by employeeid order by effectivefrom),interval 1 day) ,
last_day(maxdt)) nexteff
from cte
JOIN (select cte.employeeid,max(effectivefrom) maxdt from cte group by employeeid) c1
on c1.employeeid = cte.employeeid
)
select *,
datediff(nexteff,effectivefrom) + 1 diff,
(select grade from employeerole e where e.effectivefrom <= cte1.effectivefrom order by e.effectivefrom desc limit 1) grade,
(select salary from employeesalary e where e.effectivefrom <= cte1.nexteff order by e.effectivefrom desc limit 1) salary
from cte1;
+------------+---------------------+---------------------+------+---------+--------+
| employeeid | effectivefrom | nexteff | diff | grade | salary |
+------------+---------------------+---------------------+------+---------+--------+
| 19966 | 2022-07-01 00:00:00 | 2022-07-09 00:00:00 | 9 | grade 3 | 10000 |
| 19966 | 2022-07-10 00:00:00 | 2022-07-14 00:00:00 | 5 | grade 2 | 10000 |
| 19966 | 2022-07-15 00:00:00 | 2022-07-31 00:00:00 | 17 | grade 2 | 20000 |
+------------+---------------------+---------------------+------+---------+--------+
3 rows in set (0.004 sec)
I think if it were me, I'd generate a list containing an entry for each day with the effective grade and salary, and then just aggregate at the end. Take a look at this fiddle:
https://dbfiddle.uk/4t2RW2M2
I've started with the aggregate query, just so we can see the output, then I break out pieces of the query to show intermediate outputs. Here is the query that generates the final output:
SELECT grade, gradeEffective, salary, salaryEffective,
min(dt) as startsOn, max(dt) as endsOn, count(*) as days,
dailyRate,
sum(dailyRate) as pay
FROM (
SELECT DISTINCT dt, grade, gradeEffective, salary, salaryEffective,
ROUND((salary * 12) / 365, 2) as dailyRate
FROM (
SELECT dts.dt,
first_value(r.grade) OVER w as grade,
first_value(r.effectiveFrom) OVER w as gradeEffective,
first_value(s.salary) OVER w as salary,
first_value(s.effectiveFrom) OVER w as salaryEffective
FROM (
WITH RECURSIVE dates(n) AS (SELECT 0 UNION SELECT n + 1 FROM dates WHERE n + 1 <= 30)
SELECT '2022-07-01' + INTERVAL n DAY as dt FROM dates
) dts
LEFT JOIN EmployeeSalary s ON dts.dt >= s.effectiveFrom
LEFT JOIN EmployeeRole r on dts.dt >= r.effectiveFrom
WINDOW w AS (
PARTITION BY dts.dt
ORDER BY r.effectiveFrom DESC, s.effectiveFrom DESC
ROWS UNBOUNDED PRECEDING
)
) z
) a GROUP BY grade, gradeEffective, salary, salaryEffective, dailyRate
ORDER BY min(dt);
Now, the first thing I've done is create a list of dates using a recursive CTE:
WITH RECURSIVE dates(n) AS (SELECT 0 UNION SELECT n + 1 FROM dates WHERE n + 1 <= 30)
SELECT '2022-07-01' + INTERVAL n DAY as dt FROM dates
which produces a list of dates from July 1st to July 31st.
Take that list of dates and left join both of your tables to it, like so:
SELECT *
FROM (
WITH RECURSIVE dates(n) AS (SELECT 0 UNION SELECT n + 1 FROM dates WHERE n + 1 <= 30)
SELECT '2022-07-01' + INTERVAL n DAY as dt FROM dates
) dts
LEFT JOIN EmployeeSalary s ON dts.dt >= s.effectiveFrom
LEFT JOIN EmployeeRole r on dts.dt >= r.effectiveFrom
with the dt greater than or equal to the effective dates. Notice that after the 9th you start to get duplicate rows for each date.
We'll create a window to get the first values for grade and salary for each date, and we'll order first by role effectiveFrom and then salary effectiveFrom, to fulfil your priority condition.
SELECT dts.dt,
first_value(r.grade) OVER w as grade,
first_value(r.effectiveFrom) OVER w as gradeEffective,
first_value(s.salary) OVER w as salary,
first_value(s.effectiveFrom) OVER w as salaryEffective
FROM (
WITH RECURSIVE dates(n) AS (SELECT 0 UNION SELECT n + 1 FROM dates WHERE n + 1 <= 30)
SELECT '2022-07-01' + INTERVAL n DAY as dt FROM dates
) dts
LEFT JOIN EmployeeSalary s ON dts.dt >= s.effectiveFrom
LEFT JOIN EmployeeRole r on dts.dt >= r.effectiveFrom
WINDOW w AS (
PARTITION BY dts.dt
ORDER BY r.effectiveFrom DESC, s.effectiveFrom DESC
ROWS UNBOUNDED PRECEDING
);
This is still going to leave us multiple entries for some dates, although they are duplicates, so let's use that output in a new query, using DISTINCT to leave us only one copy of each row and using the opportunity to add the daily rate field:
SELECT DISTINCT dt, grade, gradeEffective, salary, salaryEffective,
ROUND((salary * 12) / 365, 2) as dailyRate
FROM (
SELECT dts.dt,
first_value(r.grade) OVER w as grade,
first_value(r.effectiveFrom) OVER w as gradeEffective,
first_value(s.salary) OVER w as salary,
first_value(s.effectiveFrom) OVER w as salaryEffective
FROM (
WITH RECURSIVE dates(n) AS (SELECT 0 UNION SELECT n + 1 FROM dates WHERE n + 1 <= 30)
SELECT '2022-07-01' + INTERVAL n DAY as dt FROM dates
) dts
LEFT JOIN EmployeeSalary s ON dts.dt >= s.effectiveFrom
LEFT JOIN EmployeeRole r on dts.dt >= r.effectiveFrom
WINDOW w AS (
PARTITION BY dts.dt
ORDER BY r.effectiveFrom DESC, s.effectiveFrom DESC
ROWS UNBOUNDED PRECEDING
)
) z;
This produces the deduplicated daily data, and now all we have to do is use aggregation to pull out the sums for each combination of grade and salary, which is the query that I started off with.
Let me know if this is what you were looking for, or if anything is unclear.
Since the start and end conditions weren't fleshed out in the question, I just created the date list arbitrarily. It's not difficult to generate the list based on the first effectiveFrom in both tables, and here is an example that runs from that start date until current:
WITH RECURSIVE dates(n) AS (
SELECT min(effectiveFrom) FROM (
select effectiveFrom from EmployeeRole UNION
select effectiveFrom from EmployeeSalary
) z
UNION SELECT n + INTERVAL 1 DAY FROM dates WHERE n <= now()
)
SELECT n as dt FROM dates
I also didn't handle multiple employees, since there was only one given and I would just be guessing at the shape of the actual data.
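For anyone who wants to try the one-row-per-day idea without a MySQL 8 server, here is a minimal sketch run through Python's sqlite3. It is not the query above: it swaps the first_value window for two correlated subqueries that pick the latest grade and salary effective on each day, and it uses SQLite's date(dt, '+1 day') where MySQL would use dt + INTERVAL 1 DAY. The July date range is hard-coded and a single employee is assumed, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeSalary (employeeId INTEGER, salary REAL, effectiveFrom TEXT);
CREATE TABLE EmployeeRole (employeeId INTEGER, grade TEXT, effectiveFrom TEXT);
INSERT INTO EmployeeSalary VALUES (19966, 10000.00, '2022-07-01'),
                                  (19966, 20000.00, '2022-07-15');
INSERT INTO EmployeeRole VALUES (19966, 'grade 3', '2022-07-01'),
                                (19966, 'grade 2', '2022-07-10');
""")

query = """
WITH RECURSIVE dates(dt) AS (
    -- one row per day of July 2022
    SELECT '2022-07-01'
    UNION ALL
    SELECT date(dt, '+1 day') FROM dates WHERE dt < '2022-07-31'
),
daily AS (
    -- for each day, the most recent grade and salary already in effect
    SELECT d.dt,
           (SELECT r.grade FROM EmployeeRole r
             WHERE r.effectiveFrom <= d.dt
             ORDER BY r.effectiveFrom DESC LIMIT 1) AS grade,
           (SELECT s.salary FROM EmployeeSalary s
             WHERE s.effectiveFrom <= d.dt
             ORDER BY s.effectiveFrom DESC LIMIT 1) AS salary
    FROM dates d
)
SELECT grade, salary, MIN(dt) AS startsOn, MAX(dt) AS endsOn, COUNT(*) AS days
FROM daily
GROUP BY grade, salary
ORDER BY startsOn
"""
segments = conn.execute(query).fetchall()
for segment in segments:
    print(segment)
```

Grouping by (grade, salary) is enough here because each combination forms one contiguous run of days. If the same combination could come back after a gap, the runs would need an extra grouping key.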
You can start by adding two new columns (i.e. tmpFrom and tmpTo), which give the dates needed to calculate the 9 days.
SELECT
er.employeeId,
es.salary,
`grade`,
date(er.effectiveFrom) roleEffectiveFrom,
date(es.effectiveFrom) salaryEffectiveFrom,
DATEDIFF(LEAST(COALESCE(LEAD(er.effectiveFrom)
OVER (PARTITION BY er.employeeId ORDER By er.effectiveFrom),
DATE_ADD(LAST_DAY(er.effectiveFrom),INTERVAL 1 DAY)),
DATE_ADD(LAST_DAY(er.effectiveFrom),INTERVAL 1 DAY)),
er.effectiveFrom) as '#Days' ,
ROUND((salary * 12) / 365, 2) dailyRate,
date(er.effectiveFrom) tmpFrom,
(select e2.effectiveFrom
from EmployeeRole e2
where e2.employeeId = er.employeeId and e2.effectiveFrom > er.effectiveFrom
order by e2.effectiveFrom
limit 1) as tmpTo
FROM EmployeeRole er
join EmployeeSalary es ON (es.employeeId = er.employeeId)
and er.employeeId = 19966
order by er.effectiveFrom
;
In the query above I used a sub-select, which might hurt performance. You can study window functions and check whether one of them suits your needs better than this sub-query.
It's up to you to calculate the number of days between those two columns, but you will also need to handle the NULL value in tmpTo, which should become the end of the month (though I am not sure I remember your problem correctly...)
see: DBFIDDLE

Creating an overdraft statement

I'm currently stuck on how to create a statement that shows daily overdraft amounts for a particular council.
I have the following tables: councils, users, markets, market_transactions, user_deposits.
market_transactions run daily, reducing a user's account balance. When the account balance is used up, the user goes into overdraft (negative balance). When users make a deposit their account balance increases.
I have included the following tables to show how transactions and deposits are stored.
If I reverse today's transactions I'm able to get the account balance a user had yesterday, but formulating a query to get the daily OD amount is where the problem is.
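USERS
| user_id | name | account_bal |
|---------|-------|-------------|
| 1 | Wells | -5 |
| 2 | James | 100 |
| 3 | Joy | 10 |
| 4 | Mumbi | -300 |
DEPOSITS
| id | user_id | amount | date |
|----|---------|--------|------------|
| 1 | 1 | 5 | 2021-04-26 |
| 2 | 3 | 10 | 2021-04-26 |
| 3 | 3 | 5 | 2021-04-25 |
| 4 | 4 | 5 | 2021-04-25 |
TRANSACTIONS
| id | user_id | amount_tendered | date |
|----|---------|-----------------|------------|
| 1 | 1 | 5 | 2021-04-27 |
| 2 | 2 | 10 | 2021-04-26 |
| 3 | 3 | 15 | 2021-04-26 |
| 4 | 4 | 50 | 2021-04-25 |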
The Relationships are as follows,
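COUNCILS
| council_id | name |
|------------|------|
| 1 | a |
| 2 | b |
| 3 | c |
MARKETS
| market_id | name | council_id |
|-----------|------|------------|
| 1 | x | 3 |
| 2 | y | 1 |
| 3 | z | 2 |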
MARKET_USER_LINK
| id | market_id | user_id |
|----|-----------|---------|
| 1 | 1 | 3 |
| 2 | 2 | 2 |
| 3 | 3 | 1 |
I'm running this SQL query to get the total amount users have spent and subtracting it from the current user account balance.
I don't know if I can use this to figure out the account balance for each day.
SELECT u.user_id, total_spent, total_deposits,m.council_id
FROM users u
JOIN market_user_link ul ON ul.user_id= u.user_id
LEFT JOIN markets m ON ul.market_id =m.market_id
LEFT JOIN councils c ON m.council_id =c.council_id
LEFT JOIN (
SELECT user_id, SUM(amount_tendered) AS total_spent
FROM transactions
WHERE DATE(date) BETWEEN DATE('2021-02-01') AND DATE(NOW())
GROUP BY user_id
) t ON t.user_id= u.user_id
ORDER BY user_id, total_spent ASC
The result looks like this when run:
| user_id | total_spent | council_id |
|-------------|----------------|------------|
| 1 | 50.00 | 1 |
| 2 | 2.00 | 3 |
I was hoping to reverse transactions and deposits done to get the account balance for a day then get the sum of users with an account balance < 0... But this has just failed to work.
The goal is to produce a query that shows daily overdraft (Only SUM the total account balance of users with account balance below 0 ) for a particular council.
Expected Result
| date | council_id | o_d_amount |
|------------|------------|------------|
| 2021-04-24 | 1 | -300.00 |
| 2021-04-24 | 2 | -60.00 |
| 2021-04-24 | 3 | -900.00 |
| 2021-04-25 | 1 | -600.00 |
| 2021-04-25 | 2 | -100.00 |
| 2021-04-25 | 3 | -1200.00 |
This is actually not that hard, but the way you asked makes it hard to follow.
Also, your expected result should match the data you provided.
Edited: the previous solution was wrong. It counted withdrawals and deposits more than once when there was more than one event for a user/date.
Start by having the total exchanged on each day, like
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d
group by user_id, date
order by user_id, date;
What follows gets the state of the account only on days that had any deposits or withdrawals.
To get results for all days (and not just those with account movement) you just have to change the cross join part to use a table with all the dates you want (as in "Get all dates between two dates in SQL Server"), but I digress...
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id -- exchange on each day
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates -- all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
order by dates.date desc, c.council_id, u.name;
From there you can rearrange to get the result you want.
select date, council_id
, sum(case when amount_on_start_of_day<0 then amount_on_start_of_day else 0 end) o_d_amount_start
, sum(case when amount_on_end_of_day<0 then amount_on_end_of_day else 0 end) o_d_amount_end
from (
select dates.date, c.council_id, u.name username
, u.account_bal - sum(case when e.date >= dates.date then e.exchanged_on_day else 0 end) as amount_on_start_of_day
, u.account_bal - sum(case when e.date > dates.date then e.exchanged_on_day else 0 end) as amount_on_end_of_day
from councils c
inner join markets m on c.council_id=m.council_id
inner join market_user_link mul on m.market_id=mul.market_id
inner join users u on mul.user_id=u.user_id
left join (
select user_id, date, sum(amount) exchanged_on_day from (
select user_id, date, amount amount from deposits
union all select user_id, date, -amount_tendered amount from transactions
) d group by user_id, date
) e on u.user_id=e.user_id -- exchange on each day
cross join (select distinct date from (select date from deposits union select date from transactions) datesInternal) dates -- all days that had a transaction
group by dates.date, c.council_id, u.name, u.account_bal
) result
group by date, council_id
order by date;
You can check it on https://www.db-fiddle.com/f/msScT6B5F7FjU2aQXVr2da/6
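For reference, the reversal idea (current balance minus every movement dated after the day in question) can be tried locally with the sketch below. It is a simplified variant, not the query above: it only computes end-of-day figures, uses a correlated subquery instead of the join plus CASE, and runs on SQLite through Python's sqlite3, where the two-argument MIN(x, 0) plays the role of MySQL's LEAST(x, 0). The link row for user 4 is an assumption, since the question's data leaves that user unlinked, and each user is assumed to belong to one market only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, name TEXT, account_bal INTEGER);
CREATE TABLE deposits (id INTEGER, user_id INTEGER, amount INTEGER, date TEXT);
CREATE TABLE transactions (id INTEGER, user_id INTEGER, amount_tendered INTEGER, date TEXT);
CREATE TABLE markets (market_id INTEGER, name TEXT, council_id INTEGER);
CREATE TABLE market_user_link (id INTEGER, market_id INTEGER, user_id INTEGER);
INSERT INTO users VALUES (1,'Wells',-5),(2,'James',100),(3,'Joy',10),(4,'Mumbi',-300);
INSERT INTO deposits VALUES (1,1,5,'2021-04-26'),(2,3,10,'2021-04-26'),
                            (3,3,5,'2021-04-25'),(4,4,5,'2021-04-25');
INSERT INTO transactions VALUES (1,1,5,'2021-04-27'),(2,2,10,'2021-04-26'),
                                (3,3,15,'2021-04-26'),(4,4,50,'2021-04-25');
INSERT INTO markets VALUES (1,'x',3),(2,'y',1),(3,'z',2);
INSERT INTO market_user_link VALUES (1,1,3),(2,2,2),(3,3,1);
""")
# Assumed extra row: put user 4 in market 3 so the negative balance shows up.
conn.execute("INSERT INTO market_user_link VALUES (4, 3, 4)")

query = """
WITH movements AS (
    -- deposits raise the balance, transactions lower it
    SELECT user_id, date, amount AS delta FROM deposits
    UNION ALL
    SELECT user_id, date, -amount_tendered FROM transactions
),
days AS (SELECT DISTINCT date FROM movements),
balances AS (
    -- end-of-day balance = current balance minus everything that happened later
    SELECT d.date, u.user_id,
           u.account_bal - COALESCE(
               (SELECT SUM(m.delta) FROM movements m
                 WHERE m.user_id = u.user_id AND m.date > d.date), 0) AS end_bal
    FROM days d CROSS JOIN users u
)
SELECT b.date, mk.council_id, SUM(MIN(b.end_bal, 0)) AS o_d_amount
FROM balances b
JOIN market_user_link l ON l.user_id = b.user_id
JOIN markets mk ON mk.market_id = l.market_id
GROUP BY b.date, mk.council_id
ORDER BY b.date, mk.council_id
"""
overdraft = conn.execute(query).fetchall()
for row in overdraft:
    print(row)
```

Only days with at least one deposit or transaction appear. A calendar table or a recursive date list in place of days would fill the gaps.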
Basically the query maps users to councils, calculates periods of overdraft for users, then aggregates over councils. I assume that the starting balance is dated the start of the month, '2021-04-01' (it could be the ending balance as well, see below); change it as needed. I also assume that a negative starting balance counts as an overdraft. For simplicity and debugging, the query is divided into a number of steps.
with uc as (
select distinct m.council_id, mul.user_id
from markets m
join market_user_link mul on m.market_id = mul.market_id
),
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
sum(sum(s)) over(partition by user_id order by date) rt
from (
select user_id, date, -amount_tendered s
from transactions
union all
select user_id, date, amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, 0 s
union all
select '2021-04-01' d, account_bal
) se
) t
group by user_id, date
),
user_overdraft as (
select user_id, date, nxt, least(rt, 0) ovd
from user_running_total
where date <= date(NOW())
),
dates as (
select date
from user_overdraft
union
select nxt
from user_overdraft
),
council__overdraft as (
select uc.council_id, d.date, sum(uo.ovd) total_overdraft, lag(sum(uo.ovd), 1, sum(uo.ovd) - 1) over(partition by uc.council_id order by d.date) prev_ovd
from uc
cross join dates d
join user_overdraft uo on uc.user_id = uo.user_id and d.date between uo.date and uo.nxt
group by uc.council_id, d.date
)
select council_id, date, total_overdraft
from council__overdraft
where total_overdraft <> prev_ovd
order by date, council_id
In fact council__overdraft is quite usable on its own; the last step just compacts the output by excluding intermediate dates on which the overdraft has not changed.
With following sample data:
users
user_id name account_bal
1 Wells -5
2 James 100
3 Joy 10
4 Mumbi -300
deposits, ordered by date, extra row added for the last date
id user_id amount date
3 3 5 2021-04-25
4 4 5 2021-04-25
1 1 5 2021-04-26
2 3 10 2021-04-26
5 3 73 2021-05-06
transactions, ordered by date (note the added row, to illustrate running total in action)
id user_id amount_tendered date
5 4 50 2021-04-25
2 2 10 2021-04-26
3 3 15 2021-04-26
1 1 5 2021-04-27
4 3 17 2021-04-27
councils
council_id name
1 a
2 b
3 c
markets
market_id name council_id
1 x 3
2 y 1
3 z 2
market_user_link
id market_id user_id
1 1 3
2 2 2
3 3 1
4 3 4
the query output is
council_id date overdraft
1 2021-04-01 0
2 2021-04-01 -305
3 2021-04-01 0
2 2021-04-25 -350
2 2021-04-26 -345
2 2021-04-27 -350
3 2021-04-27 -7
3 2021-05-06 0
Alternatively, provided the users table is holding a closing (NOW()) balance, replace user_running_total CTE with the following code
user_running_total as (
select user_id, date,
coalesce(lead(date) over(partition by user_id order by date) - interval 1 day, date) nxt,
coalesce(sum(sum(s)) over(partition by user_id order by date desc
rows between unbounded preceding and 1 preceding), sum(s)) rt
from (
select user_id, date, amount_tendered s
from transactions
union all
select user_id, date, -amount
from deposits
union all
select user_id, se.d, se.s
from users
cross join lateral (
select date(NOW() + interval 1 day) d, account_bal s
union all
select '2021-04-01' d, 0
) se
) t
where DATE(date) between date '2021-04-01' and date(NOW() + interval 1 day)
group by user_id, date
),
This way the query starts with the closing balance, dated the day after now, and rolls the running total back in reverse order to '2021-04-01' as the starting date.
Output
council_id date overdraft
1 2021-04-01 0
2 2021-04-01 -260
3 2021-04-01 -46
2 2021-04-25 -305
3 2021-04-25 -41
2 2021-04-26 -300
3 2021-04-26 -46
2 2021-04-27 -305
3 2021-04-27 -63
3 2021-05-06 0
db-fiddle both versions

How can I count distinct months names in a set of date values?

I have a table 'Data' with two fields, Data_date1 and Data_date2, and I want to count the rows in each field by month.
This is my data:
Table: Data
Data_date1 Data_date2
---------------------------------
2019-07-23 2019-01-23
2019-08-23 2019-01-24
2019-08-24 2019-02-23
2019-09-21 2019-07-23
2019-09-22 2019-09-22
2019-09-23 2019-09-23
and I want results like this:
Month Count_Date1 Count_Date2
Jan 0 2
Feb 0 1
July 1 1
Aug 2 0
Sep 3 2
Try this:
SELECT MONTH(data_date) m
,SUM(d=1) d1
,SUM(d=2) d2
FROM
(SELECT 1 d, data_date1 data_date FROM my_table
UNION ALL
SELECT 2, data_date2 FROM my_table
) x
GROUP BY m
Here’s some setup with which to test this query, which produces the desired results:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(data_date1 DATE NOT NULL
,data_date2 DATE NOT NULL
);
INSERT INTO my_table VALUES
('2019-07-23','2019-01-23'),
('2019-08-23','2019-01-24'),
('2019-08-24','2019-02-23'),
('2019-09-21','2019-07-23'),
('2019-09-22','2019-09-22'),
('2019-09-23','2019-09-23');
You can use union all and group by:
select month(dte), sum(cnt1), sum(cnt2)
from ((select data_date1 as dte, 1 as cnt1, 0 as cnt2
from t
) union all
(select data_date2, 0, 1
from t
)
) dd
group by month(dte);
This shows the month number rather than the month name.
If you want the month name, you would do:
select monthname(dte), sum(cnt1), sum(cnt2)
from ((select data_date1 as dte, 1 as cnt1, 0 as cnt2
from t
) union all
(select data_date2, 0, 1
from t
)
) dd
group by monthname(dte), month(dte)
order by month(dte);
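A detail worth checking in any query of this shape is the set operator: plain UNION removes duplicate rows, so two rows with the same date in the same column collapse into one before they are counted, while UNION ALL keeps them. The sketch below shows the difference on the sample data plus one deliberately duplicated row, which is an addition for illustration and not in the original data. It runs on SQLite through Python's sqlite3, so strftime('%m', ...) stands in for MySQL's MONTH().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (data_date1 TEXT, data_date2 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("2019-07-23", "2019-01-23"),
        ("2019-08-23", "2019-01-24"),
        ("2019-08-24", "2019-02-23"),
        ("2019-09-21", "2019-07-23"),
        ("2019-09-22", "2019-09-22"),
        ("2019-09-23", "2019-09-23"),
        ("2019-09-23", "2019-09-23"),  # added duplicate row, for illustration only
    ],
)


def monthly_counts(set_op):
    """Count dates per month for both columns, combining them with set_op."""
    query = f"""
    SELECT CAST(strftime('%m', data_date) AS INTEGER) AS m,
           SUM(d = 1) AS d1,
           SUM(d = 2) AS d2
    FROM (
        SELECT 1 AS d, data_date1 AS data_date FROM my_table
        {set_op}
        SELECT 2, data_date2 FROM my_table
    ) x
    GROUP BY m
    ORDER BY m
    """
    return conn.execute(query).fetchall()


all_rows = monthly_counts("UNION ALL")  # every row is counted
deduped = monthly_counts("UNION")       # identical (d, date) pairs are merged
print(all_rows)
print(deduped)
```

The two results differ only for September, where the duplicated row is counted under UNION ALL (4 and 3) and merged away under UNION (3 and 2).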

SQL retrieving most recent record (dated) per ID?

SELECT detailsID,`Topic 1 Scores`, MAX(Date) as "Date"
FROM Information.scores
WHERE `Topic 1 Scores` IS NOT NULL
GROUP BY `detailsID`,`Topic 1 Scores`
This prints:
detailsID, Topic 1 Scores, MAX(Date)
2 0 26/09/2017
2 45 26/09/2017
2 100 26/09/2017
3 30 25/09/2017
3 80 14/10/2017
rather than selecting only the most recent date per detailsID, which would be:
2 100 26/09/2017
3 80 14/10/2017
I want to retrieve the most recent non-null `Topic 1 Scores` value (by date) for each detailsID. There are only detailsID 2 and 3 here, so only two rows should be returned.
Solution 1 attempt
Inner subquery
You can do this:
SELECT t1.detailsID, t1.`Topic 1 Scores`, t1.date
FROM scores as t1
INNER JOIN
(
SELECT detailsID, MAX(date) as "LatestDate"
FROM scores
WHERE `Topic 1 Scores` IS NOT NULL
GROUP BY `detailsID`
) AS t2 ON t1.detailsID = t2.detailsID AND t1.date = t2.LatestDate
Demo
The subquery will give you the most recent date for each detailsID then in the outer query, there is a join with the original table to eliminate all the rows except those with the most recent date.
Update:
There are some rows with the same latest date, which is why you get multiple rows with the same date and the same detailsID. To solve this you can add another aggregate for the score, so that you have only one row for each detailsID with the latest date and max score:
SELECT t1.detailsID, t1.`Topic 1 Scores`, t1.date
FROM scores as t1
INNER JOIN
(
SELECT detailsID, MAX(`Topic 1 Scores`) AS MaxScore, MAX(date) as "LatestDate"
FROM scores
WHERE `Topic 1 Scores` IS NOT NULL
GROUP BY `detailsID`
) AS t2 ON t1.detailsID = t2.detailsID
AND t1.date = t2.LatestDate
AND t1.`Topic 1 Scores` = t2.MaxScore
updated demo
Results:
| detailsID | Topic 1 Scores | date |
|-----------|----------------|------------|
| 2 | 100 | 2017-09-26 |
| 3 | 80 | 2017-10-14 |
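Alternatively, on MySQL 8 or later you can number the rows per detailsID with ROW_NUMBER() and keep the first one:
WITH MYCTE AS
(
SELECT DetailsId, `Topic 1 Scores`, ROW_NUMBER() OVER (PARTITION BY DetailsID ORDER BY Date DESC) Num
FROM Scores
WHERE `Topic 1 Scores` IS NOT NULL -- skip rows without a score
)
SELECT * FROM MYCTE WHERE Num = 1;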
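One more variant that may be worth testing: taking MAX(date) and MAX(score) separately only works when the highest score happens to fall on the latest date. The sketch below instead keeps a row only if no other non-null row for the same detailsID is later, or equally late with a higher score. It is an alternative technique (NOT EXISTS), not any of the queries above, and it runs on SQLite through Python's sqlite3. The rows are taken from the output shown in the question, plus one assumed NULL row, with dates rewritten as ISO strings so that they compare correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (detailsID INTEGER, `Topic 1 Scores` INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        (2, 0, "2017-09-26"),
        (2, 45, "2017-09-26"),
        (2, 100, "2017-09-26"),
        (3, 30, "2017-09-25"),
        (3, 80, "2017-10-14"),
        (3, None, "2017-10-20"),  # assumed row: a later entry without a score
    ],
)

# Keep a row only when no "better" row exists for the same detailsID:
# better means a later date, or the same date with a higher score.
query = """
SELECT s.detailsID, s.`Topic 1 Scores`, s.date
FROM scores s
WHERE s.`Topic 1 Scores` IS NOT NULL
  AND NOT EXISTS (
      SELECT 1 FROM scores t
      WHERE t.detailsID = s.detailsID
        AND t.`Topic 1 Scores` IS NOT NULL
        AND (t.date > s.date
             OR (t.date = s.date AND t.`Topic 1 Scores` > s.`Topic 1 Scores`))
  )
ORDER BY s.detailsID
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Breaking date ties by the highest score matches the expected result in the question (100 for detailsID 2). If another tie-break is wanted, only the last condition needs to change.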

Use a sub query result

I have a table with numbers and dates (one number per date, and the dates aren't necessarily at regular intervals).
I would like to get the count of dates on which a given number isn't in the table.
Where I am:
select *
from
(
select
date from nums
where chiffre=1
order by date desc
limit 2
) as f
I get this :
date
--------------
2014-09-07
--------------
2014-07-26
Basically, I would then build this query dynamically:
select * from nums where date between "2014-07-26" and "2014-09-07"
Then, as a second step, I want to go through the whole table (here I limited it to the first 2 rows, but I want to compare rows 2 and 3, then 3 and 4, etc.)
The goal is to get this:
date | actual_number_of_real_dates_between_two_given_dates
2014-09-07 - 2014-07-26 | 20
2014-04-02 - 2014-02-12 | 13
etc...
How can I do this? Thanks.
Edit:
What I have (just an example, dates and "chiffre" are more complex) :
date | chiffre
2014-09-30 | 2
2014-09-29 | 1
2014-09-28 | 2
2014-09-27 | 2
2014-09-26 | 1
2014-09-25 | 2
2014-09-24 | 2
etc...
What I need for the number "1":
actual_number_of_real_dates_between_two_given_dates
1
3
etc...
Edit 2:
My updated query thanks to Gordon Linoff
select count(n.id) as difference
from nums n inner join
(select min(date) as d1, max(date) as d2
from (select date from nums where chiffre=1 order by date desc limit 2) d
) dd
where n.date between dd.d1 and dd.d2
How can I compare row 2 with row 3, row 3 with row 4, etc., and not only the last 2?
Should I use a loop, or can I do it without one?
Does this do what you want?
select count(distinct n.date) as numDates,
(datediff(dd.d2, dd.d1) + 1) as datesInPeriod,
(datediff(dd.d2, dd.d1) + 1 - count(distinct n.date)) as missingDates
from nums n cross join
(select date('2014-07-26') as d1, date('2014-09-07') as d2) dd
where n.date between dd.d1 and dd.d2;
EDIT:
If you just want the last two dates:
select count(distinct n.date) as numDates,
(datediff(dd.d2, dd.d1) + 1) as datesInPeriod,
(datediff(dd.d2, dd.d1) + 1 - count(distinct n.date)) as missingDates
from nums n cross join
(select min(date) as d1, max(date) as d2
from (select date from nums order by date desc limit 2) d
) dd
where n.date between dd.d1 and dd.d2;
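To compare every pair of consecutive occurrences (2 with 3, 3 with 4, and so on) without a loop, one option is to pair each occurrence of the number with the previous one through a correlated MAX, then count the rows in between. The sketch below is one possible reading of the question, run on SQLite through Python's sqlite3: it counts rows after the previous occurrence up to and including the current one, which gives the 3 from the example. That endpoint convention is an assumption, and the two oldest rows are made-up additions so that there is a second pair to show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (id INTEGER PRIMARY KEY, date TEXT, chiffre INTEGER)")
conn.executemany(
    "INSERT INTO nums (date, chiffre) VALUES (?, ?)",
    [
        ("2014-09-30", 2),
        ("2014-09-29", 1),
        ("2014-09-28", 2),
        ("2014-09-27", 2),
        ("2014-09-26", 1),
        ("2014-09-25", 2),
        ("2014-09-24", 2),
        ("2014-09-22", 2),  # assumed row, not in the question
        ("2014-09-20", 1),  # assumed row, not in the question
    ],
)

query = """
SELECT p.prev_date, p.cur_date,
       -- rows dated after the previous occurrence, up to the current one
       (SELECT COUNT(*) FROM nums n
         WHERE n.date > p.prev_date AND n.date <= p.cur_date) AS rows_in_gap
FROM (
    -- pair each occurrence of 1 with the closest earlier occurrence
    SELECT cur.date AS cur_date,
           (SELECT MAX(prev.date) FROM nums prev
             WHERE prev.chiffre = 1 AND prev.date < cur.date) AS prev_date
    FROM nums cur
    WHERE cur.chiffre = 1
) p
WHERE p.prev_date IS NOT NULL
ORDER BY p.cur_date DESC
"""
gaps = conn.execute(query).fetchall()
for row in gaps:
    print(row)
```

The oldest occurrence has no predecessor and is dropped by the final WHERE. On MySQL 8 the inner pairing could also be written with LAG(date) OVER (ORDER BY date).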