Here's my table, showing user names and the timestamp they scored a point:
id user date
1 Aaron 23/02/2012 22:44
2 Betty 23/02/2012 22:47
3 Carlos 24/02/2012 16:01
4 David 28/02/2012 11:40
5 David 28/02/2012 12:32
6 David 28/02/2012 16:59
7 Aaron 2/03/2012 13:46
8 Aaron 30/03/2012 18:37
9 Betty 30/03/2012 19:58
10 Emma 9/04/2012 6:49
11 Emma 9/04/2012 13:19
12 Emma 9/04/2012 18:20
13 Emma 9/04/2012 20:46
14 Aaron 10/04/2012 15:47
15 Betty 10/04/2012 19:15
16 Betty 10/04/2012 20:40
17 Carlos 11/04/2012 9:44
18 Carlos 11/04/2012 20:01
19 David 11/04/2012 23:17
20 David 12/04/2012 17:09
And here is the results table I am trying to achieve, i.e. an x axis showing month-year, and a y axis displaying the number of users who reached a certain points threshold within that month:
date 1 point First time? 2 points First time? 3 points First time? 4 points First time? Total
Feb-12 A,B,C A,B,C D D 4
Mar-12 B A A 3
Apr-12 A,B,C B,C,D B,C,D E E 4
I've only got as far as calculating the total number of points and the total number of distinct scorers within a given month:
SELECT DISTINCT CONCAT (MONTHNAME(date), ' ', YEAR(date)) as 'date', COUNT(id) as total_points, COUNT(distinct referrer_id) as number_of_scorers
from points
group by CONCAT (MONTH(date), ' ', YEAR(date))
order by YEAR(date), MONTH(date)
which is only giving me:
date total_points number_of_scorers
Feb-12 6 4
Mar-12 3 3
etc.
So my questions are:
How can I amend the query to show me which users reached each point threshold within each month?
How can I amend the query to show me which users reached each point threshold for the first time within that month?
Thanks
The basic query you need is this:
select date_format(date, '%Y-%m') as yyyymm, user, count(*) as points
from t
group by date_format(date, '%Y-%m'), user;
This gets the number of points for each user in a month.
The rest is just aggregations, joins, and conditions:
select ymu.yyyymm,
group_concat(case when ymu.points = 1 then ymu.user end) as Points1_Users,
group_concat(case when ymu.points = 1 and ymu.yyyymm = u.min_yyyymm then ymu.user end) as Points1_Users_First,
group_concat(case when ymu.points = 2 then ymu.user end) as Points2_Users,
group_concat(case when ymu.points = 2 and ymu.yyyymm = u.min_yyyymm then ymu.user end) as Points2_Users_First
from (select date_format(date, '%Y-%m') as yyyymm, user, count(*) as points
from t
group by date_format(date, '%Y-%m'), user
) ymu join
(select user, min(yyyymm) as min_yyyymm
from (select date_format(date, '%Y-%m') as yyyymm, user, count(*) as points
from t
group by date_format(date, '%Y-%m'), user
) t
group by user
) u
on ymu.user = u.user
group by ymu.yyyymm
order by ymu.yyyymm;
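To check the shape of that result, here is a minimal sketch that runs the same idea against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so strftime() replaces date_format(), and the dates are rewritten in ISO format; both are assumptions of this sketch, not part of the original query. Note that "first" here means the first month in which the user scored at all.

```python
import sqlite3

# Rows from the question; dates are rewritten in ISO format so that
# SQLite's strftime() can parse them (an assumption of this sketch).
ROWS = [
    (1, "Aaron", "2012-02-23 22:44"), (2, "Betty", "2012-02-23 22:47"),
    (3, "Carlos", "2012-02-24 16:01"), (4, "David", "2012-02-28 11:40"),
    (5, "David", "2012-02-28 12:32"), (6, "David", "2012-02-28 16:59"),
    (7, "Aaron", "2012-03-02 13:46"), (8, "Aaron", "2012-03-30 18:37"),
    (9, "Betty", "2012-03-30 19:58"), (10, "Emma", "2012-04-09 06:49"),
    (11, "Emma", "2012-04-09 13:19"), (12, "Emma", "2012-04-09 18:20"),
    (13, "Emma", "2012-04-09 20:46"), (14, "Aaron", "2012-04-10 15:47"),
    (15, "Betty", "2012-04-10 19:15"), (16, "Betty", "2012-04-10 20:40"),
    (17, "Carlos", "2012-04-11 09:44"), (18, "Carlos", "2012-04-11 20:01"),
    (19, "David", "2012-04-11 23:17"), (20, "David", "2012-04-12 17:09"),
]

# ymu: points per user per month; u: first month in which each user scored.
QUERY = """
SELECT ymu.yyyymm,
       group_concat(CASE WHEN ymu.points = 1 THEN ymu.user END),
       group_concat(CASE WHEN ymu.points = 1
                          AND ymu.yyyymm = u.min_yyyymm THEN ymu.user END),
       group_concat(CASE WHEN ymu.points = 2 THEN ymu.user END)
FROM (SELECT strftime('%Y-%m', date) AS yyyymm, user, COUNT(*) AS points
      FROM t
      GROUP BY strftime('%Y-%m', date), user) AS ymu
JOIN (SELECT user, MIN(strftime('%Y-%m', date)) AS min_yyyymm
      FROM t
      GROUP BY user) AS u
  ON ymu.user = u.user
GROUP BY ymu.yyyymm
ORDER BY ymu.yyyymm
"""


def names(csv):
    """Split a group_concat result into a sorted list; NULL becomes []."""
    return sorted(csv.split(",")) if csv else []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, user TEXT, date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# month -> (users with 1 point, of those first-timers, users with 2 points)
result = {
    month: (names(p1), names(p1_first), names(p2))
    for month, p1, p1_first, p2 in conn.execute(QUERY)
}
for month, lists in result.items():
    print(month, lists)
```

The lists are sorted in Python because the order of names inside group_concat is not guaranteed without an explicit ordering.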
Related
I have a main dataset (users) as follows.
ID Username Status
1 John Active
2 Mike Active
3 Ann Deactive
4 Leta Active
5 Lena Active
6 Lara Active
7 Mitch Active
I also have a revenue table as follows.
subuser hour Revenue
John_01 2/26/2022 5:00 5
Mike_01 2/26/2022 7:00 8
Mike_02 2/26/2022 7:00 22
Leta_03 2/26/2022 7:00 67
Leta_07 2/26/2022 9:00 56
Mitch_07 2/26/2022 11:00 34
Now I need to get a table as follows.
User Total Usage
John 5
Mike 22
Leta 123
Lena 0
Lara 0
Mitch 0
Here I need to sum the revenue over all hours for each user, matching the subuser prefix (the part before the underscore) against the main user table. If several subusers of the same user share the same hour, only the maximum revenue for that hour should count, and the other values should be ignored.
Ex:
Mike_01 2/26/2022 7:00 8
Mike_02 2/26/2022 7:00 22
Here Mike_01 2/26/2022 7:00 8 should be ignored.
So I tried the following.
SELECT
u.Username,
COALESCE(SUM(Revenue), 0) AS TOTAL USAGE
FROM users u
LEFT JOIN revenuetable e
ON SUBSTRING_INDEX(e.subuser, '_', 1) = u.Username AND
e.Hour BETWEEN 'XXX' and 'XXX'
where u.Status='Active'
GROUP BY
u.Username
order by u.ID.
But this doesn't take the maximum value when the same hour repeats. Can someone show me where I went wrong?
Update:
Is there any method other than using window functions?
If you are using a MySQL version that supports row_number() (8.0 or later), join to a derived table that removes the unwanted rows.
SELECT
u.Username,
COALESCE(SUM(e.Revenue), 0) AS Total_Usage
FROM users u
LEFT JOIN (
SELECT r.*
, ROW_NUMBER() OVER (PARTITION BY SUBSTRING_INDEX(r.subuser, '_', 1), r.hour ORDER BY r.Revenue DESC) AS rn
FROM revenuetable r ) e
ON SUBSTRING_INDEX(e.subuser, '_', 1) = u.Username AND e.rn = 1
AND e.hour BETWEEN 'XXX' AND 'XXX'
WHERE u.Status = 'Active'
GROUP BY
u.ID, u.Username
ORDER BY u.ID
row_number() with the OVER clause numbers the rows within each user and hour by descending revenue, so the row with the highest revenue gets rn = 1, and only those rows survive the join condition.
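On the follow-up about avoiding window functions: one possible approach is a derived table that takes MAX(Revenue) per user prefix and hour, then sums those maxima. The sketch below is a swapped-in technique, not the query above, and it runs on Python's built-in sqlite3 module, so substr()/instr() stand in for MySQL's SUBSTRING_INDEX(). The hour filter is left out, which is why Mitch totals 34 here rather than the 0 shown in the question's expected output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (ID INTEGER, Username TEXT, Status TEXT);
CREATE TABLE revenuetable (subuser TEXT, hour TEXT, Revenue INTEGER);
INSERT INTO users VALUES
  (1, 'John', 'Active'), (2, 'Mike', 'Active'), (3, 'Ann', 'Deactive'),
  (4, 'Leta', 'Active'), (5, 'Lena', 'Active'), (6, 'Lara', 'Active'),
  (7, 'Mitch', 'Active');
INSERT INTO revenuetable VALUES
  ('John_01', '2/26/2022 5:00', 5),   ('Mike_01', '2/26/2022 7:00', 8),
  ('Mike_02', '2/26/2022 7:00', 22),  ('Leta_03', '2/26/2022 7:00', 67),
  ('Leta_07', '2/26/2022 9:00', 56),  ('Mitch_07', '2/26/2022 11:00', 34);
""")

# Inner query: one row per (user prefix, hour) holding the maximum revenue.
# Outer query: sum those maxima per active user; users with no rows get 0.
QUERY = """
SELECT u.Username, COALESCE(SUM(e.max_revenue), 0) AS total_usage
FROM users u
LEFT JOIN (
    SELECT substr(subuser, 1, instr(subuser, '_') - 1) AS base_user,
           hour,
           MAX(Revenue) AS max_revenue
    FROM revenuetable
    GROUP BY substr(subuser, 1, instr(subuser, '_') - 1), hour
) e ON e.base_user = u.Username
WHERE u.Status = 'Active'
GROUP BY u.ID, u.Username
ORDER BY u.ID
"""

totals = conn.execute(QUERY).fetchall()
for username, total in totals:
    print(username, total)
```

In MySQL the same shape should work with SUBSTRING_INDEX(subuser, '_', 1) in place of the substr()/instr() expression, with the hour range added to the inner WHERE clause or the join condition.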
I have the following data (base_data):
visit_date user_id
11/12/2021 Jake
11/12/2021 Amy
12/12/2021 Holt
12/12/2021 Jake
13/12/2021 Amy
13/12/2021 Jake
14/12/2021 Jake
14/12/2021 Holt
Two users visit on the 11th, and only one of them visits on the 12th. Hence, with the 11th as the first day, Day_1 = 2 and Day_2 = 1.
According to my query, I get the following result after pivoting rcohortday as row and day_number as column:
Date Day_1 Day_2 Day_3 Day_4
11/12/2021 2 1 2 1
12/12/2021 1 0 1
13/12/2021 0 0
14/12/2021 1
However, the 12/12/2021 row doesn't consider the user that arrived on the 1st and the 2nd day. I want it to consider totals for that day regardless if the user had visited in the previous days or not.
My desired result would be:
Date Day_1 Day_2 Day_3 Day_4
11/12/2021 2 1 2 1
12/12/2021 2 1 1
13/12/2021 2 1
14/12/2021 1
Let me know if you need any more clarity, especially with the examples.
The following is my query:
with user_cohorts as (
SELECT user_id
, MIN(DATETRUNC(to_date(visit_date, 'yyyymmdd'),'dd')) as cohortday
FROM base_data
GROUP BY user_id
),
visit_day as (
SELECT user_id
, (DATEDIFF(to_date(visit_date, 'yyyymmdd'),cohortday, 'dd')+1) as day_number
, count(distinct user_id) as user_count
FROM base_data
LEFT JOIN user_cohorts USING(user_id)
GROUP BY user_id, DATEDIFF(to_date(visit_date, 'yyyymmdd'),cohortday, 'dd')
),
cohort_size as (
SELECT count(*) as user_count
, cohortday
FROM user_cohorts
GROUP BY cohortday
ORDER BY cohortday
),
retention_table as (
SELECT c.cohortday as rcohortday
, o.day_number
, sum(user_count) as user_count
FROM visit_day o
LEFT JOIN user_cohorts c USING (user_id)
group by c.cohortday
, o.day_number
)
select * from retention_table
I am using MaxCompute SQL, which is an Alibaba technology. It's similar to MySQL.
I have a table that looks like this
| id | date registered | date cancelled |
| 1 | 2021-01-01 | 2021-03-02 |
| 2 | 2021-01-05 | 2021-01-21 |
| 3 | 2021-02-04 | 2021-02-25 |
| 4 | 2021-02-16 | 2021-03-26 |
How do I write a query in MySQL that gives me the counts of registered and cancelled users for each month?
I can do it for just one of the dates, but I don't know how to combine both dates.
For example, for a single date I would do this:
SELECT date_format(`users`.`dateregistered`,_utf8'%Y-%m') AS `DateREegistered`, count(0) AS `Registration Count`
FROM `users`
GROUP BY date_format(`users`.`dateregistered`,_utf8'%Y-%m')
But I want something like this
| Date | Registered Count | Cancelled Count |
| 2021-01 | 2 | 1 |
| 2021-02 | 2 | 1 |
| 2021-03 | 0 | 2 |
Please let me know if you have any ideas.
You can join the distinct months appearing in date registered and date cancelled to the table and use conditional aggregation:
SELECT t.Date,
SUM(t.Date = date_format(dateregistered, '%Y-%m')) `Registered Count`,
SUM(t.Date = date_format(datecancelled, '%Y-%m')) `Cancelled Count`
FROM (
SELECT date_format(dateregistered, '%Y-%m') Date FROM users
UNION
SELECT date_format(datecancelled, '%Y-%m') FROM users
) t INNER JOIN users u
ON t.Date IN (date_format(dateregistered, '%Y-%m'), date_format(datecancelled, '%Y-%m'))
GROUP BY t.Date
Results:
| Date | Registered Count | Cancelled Count |
| 2021-01 | 2 | 1 |
| 2021-02 | 2 | 1 |
| 2021-03 | 0 | 2 |
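As a cross-check on those counts, here is a minimal sketch of a different formulation: stack the two date columns with UNION ALL, tagging each row as a registration or a cancellation, and then sum the tags per month. This is an alternative technique, not the join shown above. It runs on Python's built-in sqlite3 module, so strftime() stands in for date_format(); the column names dateregistered and datecancelled are taken from the queries in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, dateregistered TEXT, datecancelled TEXT);
INSERT INTO users VALUES
  (1, '2021-01-01', '2021-03-02'),
  (2, '2021-01-05', '2021-01-21'),
  (3, '2021-02-04', '2021-02-25'),
  (4, '2021-02-16', '2021-03-26');
""")

# Each user contributes one "registered" event and, if a cancellation
# date is present, one "cancelled" event; the outer query sums per month.
QUERY = """
SELECT month,
       SUM(registered) AS registered_count,
       SUM(cancelled)  AS cancelled_count
FROM (
    SELECT strftime('%Y-%m', dateregistered) AS month,
           1 AS registered, 0 AS cancelled
    FROM users
    UNION ALL
    SELECT strftime('%Y-%m', datecancelled), 0, 1
    FROM users
    WHERE datecancelled IS NOT NULL
) AS events
GROUP BY month
ORDER BY month
"""

counts = conn.execute(QUERY).fetchall()
for month, registered, cancelled in counts:
    print(month, registered, cancelled)
```

The IS NOT NULL filter is there so that users who have not cancelled do not produce a row with a NULL month.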
I'm looking for some logic that might not be acceptable.
My requirement is a count of customers (new customers, repeat customers) based on the previous and current month.
For example, from this data:
DATE NAME
2016-01-01 A
2016-01-01 B
2016-01-01 C
2016-01-05 E
2016-01-05 F
2016-01-25 G
2016-01-25 H
2016-02-25 A
2016-02-25 E
2016-02-10 X
2016-02-11 Y
2016-02-13 F
Output like this
MONTH NewCustomer RepeatCustomer CustomerCount of reference month (here, JAN)
FEB 2 3 7
The same goes for the following months.
Any suggestions? Thanks!
I don't know what the reference month is, but you can get the first two columns by combining the first time you see a customer with who visits in each month:
select date_format(c.date, '%Y-%m') as yyyymm,
count(distinct c.name) as NumCustomers,
count(distinct case when date_format(c.date, '%Y-%m') <> date_format(cc.start_date, '%Y-%m')
then c.name
end) as NumRepeatCustomers
from customers c join
(select c.name, min(c.date) as start_date
from customers c
group by c.name
) cc
on c.name = cc.name
group by date_format(c.date, '%Y-%m')
order by yyyymm;
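Here is a minimal sketch of the same first-seen-date idea, extended to report new and repeat customers side by side for the sample data in the question. It runs on Python's built-in sqlite3 module, so strftime() stands in for date_format(). The table name customers comes from the query above; the output column names are made up for this sketch. "Repeat" here means the customer was first seen in any earlier month, which is an assumption about what the question intends.

```python
import sqlite3

ROWS = [
    ("2016-01-01", "A"), ("2016-01-01", "B"), ("2016-01-01", "C"),
    ("2016-01-05", "E"), ("2016-01-05", "F"), ("2016-01-25", "G"),
    ("2016-01-25", "H"), ("2016-02-25", "A"), ("2016-02-25", "E"),
    ("2016-02-10", "X"), ("2016-02-11", "Y"), ("2016-02-13", "F"),
]

# cc holds each customer's first visit date. A visit counts as "new" when
# it falls in the same month as that first visit, and "repeat" otherwise.
QUERY = """
SELECT strftime('%Y-%m', c.date) AS yyyymm,
       COUNT(DISTINCT CASE WHEN strftime('%Y-%m', c.date)
                              = strftime('%Y-%m', cc.start_date)
                           THEN c.name END) AS new_customers,
       COUNT(DISTINCT CASE WHEN strftime('%Y-%m', c.date)
                              <> strftime('%Y-%m', cc.start_date)
                           THEN c.name END) AS repeat_customers
FROM customers c
JOIN (SELECT name, MIN(date) AS start_date
      FROM customers
      GROUP BY name) cc
  ON c.name = cc.name
GROUP BY strftime('%Y-%m', c.date)
ORDER BY yyyymm
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (date TEXT, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", ROWS)

monthly = conn.execute(QUERY).fetchall()
for yyyymm, new_count, repeat_count in monthly:
    print(yyyymm, new_count, repeat_count)
```

For February this gives 2 new and 3 repeat customers, matching the output in the question, and the 7 new customers in the January row correspond to the reference-month count.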
I want to list attendance grouped by month, with the days remaining to complete the course in the next column. The course has 10 days.
Example data
ID Name Date
1 Sandy 2015-05-06
2 Candy 2015-05-06
3 Sandy 2015-05-28
4 Candy 2015-05-29
5 Candy 2015-06-01
Preferred output
| Name | Month | Attended | Remaining|
| Sandy| May | 2 | 8 |
| Candy| May | 2 | 8 |
| Candy| June | 1 | 7 |
If I use GROUP BY DATE_FORMAT(date, '%Y%m'), Name and try to do the calculation, it does not work.
You need two different aggregates:
The number of days attended in the current month for a given user.
The number of days attended in all months up to and including the current month for a given user.
That's a tad fiddly, so 'tis time to do some Test-Driven Query Design (TDQD).
The table in the question is anonymous, which is a common and irritating situation. So, the table is henceforth CourseAttendance, with the three columns shown in the data (ID, Name, Date).
Number of days attended by user in a specific month
Assuming the expression DATE_FORMAT(date, '%Y-%m') is syntactically valid, and that neither Date nor Month as a column name causes problems, then:
SELECT DATE_FORMAT(Date, '%Y-%m') AS Month,
Name,
COUNT(*) AS NumDays
FROM CourseAttendance
GROUP BY Month, Name
This should produce:
Month Name NumDays
2015-05 Sandy 2
2015-05 Candy 2
2015-06 Candy 1
Number of days attended by user up to and including a specific month
This time, the aggregate has to be over all dates less than or equal to the converted month value:
SELECT D.Month, D.Name, SUM(C.NumDays) AS TotDays
FROM (SELECT DISTINCT DATE_FORMAT(Date, '%Y-%m') AS Month, Name
FROM CourseAttendance
) AS D
JOIN (SELECT DATE_FORMAT(Date, '%Y-%m') AS Month,
Name,
COUNT(*) AS NumDays
FROM CourseAttendance
GROUP BY Month, Name
) AS C
ON C.Month <= D.Month AND C.Name = D.Name
GROUP BY D.Month, D.Name
This should give the output:
Month Name TotDays
2015-05 Sandy 2
2015-05 Candy 2
2015-06 Candy 3
Assembling the final result
The two previous result tables need to be joined on Month and Name, to yield the result:
SELECT A.Name, A.Month, A.NumDays AS Attended, (10 - B.TotDays) AS Remaining
FROM (SELECT DATE_FORMAT(Date, '%Y-%m') AS Month,
Name,
COUNT(*) AS NumDays
FROM CourseAttendance
GROUP BY Month, Name
) AS A
JOIN (SELECT D.Month, D.Name, SUM(C.NumDays) AS TotDays
FROM (SELECT DISTINCT DATE_FORMAT(Date, '%Y-%m') AS Month, Name
FROM CourseAttendance
) AS D
JOIN (SELECT DATE_FORMAT(Date, '%Y-%m') AS Month,
Name,
COUNT(*) AS NumDays
FROM CourseAttendance
GROUP BY Month, Name
) AS C
ON C.Month <= D.Month AND C.Name = D.Name
GROUP BY D.Month, D.Name
) AS B
ON A.Month = B.Month AND A.Name = B.Name
ORDER BY A.Name, A.Month
This should give output like:
Name Month Attended Remaining
Candy 2015-05 2 8
Candy 2015-06 1 7
Sandy 2015-05 2 8
You can fettle the month value into a month name if you need to. You can also fettle the sort order if you want month and name within month, etc.
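To confirm the expected output, here is a minimal sketch that computes the same two aggregates on the sample data, using a correlated subquery for the running total instead of the self-join above. That is a swapped-in formulation, not the query from this answer. It runs on Python's built-in sqlite3 module, so strftime() stands in for DATE_FORMAT(); the table name CourseAttendance and the 10-day course length are taken from the text.

```python
import sqlite3

ROWS = [
    (1, "Sandy", "2015-05-06"),
    (2, "Candy", "2015-05-06"),
    (3, "Sandy", "2015-05-28"),
    (4, "Candy", "2015-05-29"),
    (5, "Candy", "2015-06-01"),
]

# m: days attended per user per month.
# The subquery counts all of that user's days up to and including the
# month, and the remainder is taken against the 10-day course length.
QUERY = """
SELECT m.Name, m.Month, m.Attended,
       10 - (SELECT COUNT(*)
             FROM CourseAttendance c
             WHERE c.Name = m.Name
               AND strftime('%Y-%m', c.Date) <= m.Month) AS Remaining
FROM (SELECT Name,
             strftime('%Y-%m', Date) AS Month,
             COUNT(*) AS Attended
      FROM CourseAttendance
      GROUP BY Name, strftime('%Y-%m', Date)) AS m
ORDER BY m.Name, m.Month
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CourseAttendance (ID INTEGER, Name TEXT, Date TEXT)")
conn.executemany("INSERT INTO CourseAttendance VALUES (?, ?, ?)", ROWS)

report = conn.execute(QUERY).fetchall()
for name, month, attended, remaining in report:
    print(name, month, attended, remaining)
```

Comparing months as 'YYYY-MM' strings works because that format sorts chronologically.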