MySQL: Selecting a monthly total occurrence count - mysql

In MySQL, I have a table similar to:
id user_id date
1 1 2014-09-27
2 1 2014-11-05
3 1 2014-11-14
4 2 2014-12-03
5 1 2014-12-23
I would like to count the distinct users in each month and sum those monthly counts.
Expected output: 4
2014-09 = 1 user
2014-10 = 0 users
2014-11 = 1 user // user 1 is present twice in November, but I want him counted only once per month
2014-12 = 2 users
Total expected = 4
So far, my query is:
SELECT count(id)
FROM myTable u1
WHERE EXISTS(
SELECT id
FROM myTable u2
WHERE u2.user_id = u1.user_id
AND DATE_SUB(u2.date, INTERVAL 1 MONTH) > u1.date
);
It outputs the correct amount, but on my (not so heavy) table it takes hours to execute. Any hints to make this one lighter or faster?
Bonus:
Since INTERVAL 1 MONTH is not available in DQL, is there any way to do it with a Doctrine QueryBuilder?

Try this!
It should give you exactly what you need...
-- Returns one row per month; add up the `total` column for the overall figure.
SELECT
EXTRACT(YEAR FROM `date`) AS the_year,
EXTRACT(MONTH FROM `date`) AS the_month,
COUNT( DISTINCT user_id ) AS total
FROM
myTable
GROUP BY
EXTRACT(YEAR FROM `date`),
EXTRACT(MONTH FROM `date`);

For your problem, I would:
Create a subquery that counts the distinct people per month.
Create an outer query that sums the subquery's results.
Here is a working example (with your data): sqlFiddle
And here is the query:
SELECT SUM(nb_people)
FROM (
-- This subquery returns the number of distinct people in each month.
-- Grouping by year as well keeps the same month of different years apart.
SELECT COUNT(DISTINCT user_id) AS nb_people, YEAR(`date`), MONTH(`date`)
FROM test
GROUP BY YEAR(`date`), MONTH(`date`)
) AS subQuery
;
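The same total can also be reached by counting distinct (user, month) pairs directly. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, so strftime replaces MySQL's date functions; the function name monthly_user_total is made up for this illustration. In MySQL, an expression such as COUNT(DISTINCT user_id, DATE_FORMAT(`date`, '%Y-%m')) should express the same idea.

```python
import sqlite3


def monthly_user_total(rows):
    """Sum, over all months, the number of distinct users seen in each month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    # Each distinct (user, year-month) pair counts once, so repeated visits
    # within a month collapse and empty months contribute nothing.
    (total,) = conn.execute(
        """
        SELECT COUNT(*)
        FROM (SELECT DISTINCT user_id, strftime('%Y-%m', date) FROM myTable)
        """
    ).fetchone()
    conn.close()
    return total


# Sample data from the question.
rows = [
    (1, 1, "2014-09-27"),
    (2, 1, "2014-11-05"),
    (3, 1, "2014-11-14"),
    (4, 2, "2014-12-03"),
    (5, 1, "2014-12-23"),
]
print(monthly_user_total(rows))  # 4
```

Because this needs only one pass with a grouping step, it avoids the correlated EXISTS subquery that made the original query slow.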

-- In MySQL, + is numeric addition, so build the month label with DATE_FORMAT instead.
SELECT COUNT(DISTINCT user_id), DATE_FORMAT(date, '%Y-%m')
FROM MyTable
GROUP BY DATE_FORMAT(date, '%Y-%m')

Related

Getting the number of users for this year and last year in SQL

My table is like this:
root_tstamp userId
2022-01-26T00:13:24.725+00:00 d2212
2022-01-26T00:13:24.669+00:00 ad323
2022-01-26T00:13:24.629+00:00 adfae
2022-01-26T00:13:24.573+00:00 adfa3
2022-01-26T00:13:24.552+00:00 adfef
... ...
2021-01-26T00:12:24.725+00:00 d2212
2021-01-26T00:15:24.669+00:00 daddfe
2021-01-26T00:14:24.629+00:00 adfda
2021-01-26T00:12:24.573+00:00 466eff
2021-01-26T00:12:24.552+00:00 adfafe
I want to get the number of users in the current year and in the previous year, as shown below, using SQL.
Date Users previous_year
2022-01-01 10 5
2022-01-02 20 15
The code is written as follows.
select CAST(root_tstamp as DATE) as Date,
count(DISTINCT userid) as users,
count(Distinct case when CAST(root_tstamp as DATE) = dateadd(MONTH,-12,CAST(root_tstamp as DATE)) then userid end) as previous_year
FROM table1
But it returns 0 for previous_year values.
How can I fix that?
Possible solution for SQL Server:
WITH cte AS ( SELECT 2022 [year]
UNION ALL
SELECT 2021 )
SELECT cte.[year],
COUNT(DISTINCT test.userId) current_users_amount,
COUNT(DISTINCT CASE WHEN YEAR(test.root_tstamp) < cte.[year]
THEN test.userId
END) previous_users_amount
FROM test
JOIN cte ON YEAR(test.root_tstamp) <= cte.[year]
GROUP BY cte.[year]
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=88b78aad9acd965bdbac4c85a0b81927
This query (for MySQL) returns the number of unique userIds per day for rows whose root_tstamp is in the current year, together with the number of unique userIds for the same day last year. If there is no record for a day in the current year, no row is displayed for that day. If there are rows for the current year but none for the same day last year, the last-year column shows 0, because COUNT ignores the NULLs produced by the LEFT JOIN.
SELECT cast(ty.root_tstamp as date) as Dte,
COUNT(DISTINCT ty.userId) as users_this_day,
count(distinct lysd.userid) as users_sameday_lastyear
FROM test ty
left join
test lysd
on cast(lysd.root_tstamp as date)=date_add(cast(ty.root_tstamp as date), interval -1 year)
WHERE YEAR(ty.root_tstamp) = year(current_date())
GROUP BY Dte
If you wish to show output rows for calendar days even if there are no rows in current year and/or last year, then you also need a calendar table to be introduced (let's hope that it is not what you need)
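The self-join on "same day, one year earlier" can be tried out with a small sketch. It uses Python's built-in sqlite3 module in place of MySQL, so date(x, '-1 year') stands in for date_add(..., interval -1 year). The function name, the simplified timestamps and the sample users are assumptions made for this illustration, and the year is passed in as a parameter instead of being read from the clock.

```python
import sqlite3


def users_by_day_with_last_year(rows, year):
    """Per day of `year`: distinct users that day and on the same day a year earlier."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (root_tstamp TEXT, userId TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT date(ty.root_tstamp)      AS dte,
               COUNT(DISTINCT ty.userId) AS users_this_day,
               COUNT(DISTINCT ly.userId) AS users_sameday_lastyear
        FROM test ty
        LEFT JOIN test ly
               ON date(ly.root_tstamp) = date(ty.root_tstamp, '-1 year')
        WHERE strftime('%Y', ty.root_tstamp) = ?
        GROUP BY dte
        ORDER BY dte
        """,
        (str(year),),
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample rows (not the question's data).
rows = [
    ("2022-01-26 00:13:24", "a"),
    ("2022-01-26 00:13:25", "b"),
    ("2022-01-26 00:13:26", "c"),
    ("2022-01-27 08:00:00", "a"),
    ("2021-01-26 00:12:24", "a"),
    ("2021-01-26 00:15:24", "d"),
]
for row in users_by_day_with_last_year(rows, 2022):
    print(row)
# ('2022-01-26', 3, 2)
# ('2022-01-27', 1, 0)
```

Note that the join compares a function of the timestamp on both sides, so on a large table it may not be able to use an index on root_tstamp.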

Get count of department with total employees for period of month in mysql

I have a requirement to get the count of distinct departments along with the total number of employees for a one-month period, but unfortunately my query is not working and throws an error.
My table
Department_id employee_id date_time
1 1 2020-02-01
1 2 2020-02-04
3 7 2020-02-06
1 4 2020-02-07
expected output
total department=2
total employee of all department=4
It should all be based on the last month of records, but I am getting an SQL syntax error.
Query:
SELECT COUNT(DISTINCT department_id) x, COUNT(*) y
FROM department
WHERE date_time>=DATE_FORMAT(NOW() ,'%Y-%m-01')
AND date_time<DATE(NOW()+INTERVAL 1 DAY and status='1'
You can combine them within only one query :
SELECT COUNT(DISTINCT Department_id), COUNT(DISTINCT employee_id)
FROM department
WHERE date_time >= NOW() - INTERVAL 1 MONTH
AND status = '1';
counting both distinctly.
Update : If you mean to stay within the current month, then also
AND date_time>=DATE_FORMAT(NOW() ,'%Y-%m-01')
might be added to this query as in your original one.
It seems you should use MONTH instead of DAY, and you are missing a closing bracket after the interval:
SELECT COUNT(DISTINCT department_id) AS departments,
COUNT(*) AS employees
FROM department
WHERE date_time>=DATE_FORMAT(NOW() ,'%Y-%m-01')
AND date_time < DATE(NOW()+INTERVAL 1 MONTH)
AND status = '1';
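If the goal is strictly "the current calendar month", one option is a half-open range from the first day of the month up to (but not including) the first day of the next month. The sketch below computes those bounds in Python and runs the count in sqlite3 as a stand-in for MySQL. The helper names month_bounds and department_counts, the status values and the extra March row are assumptions for this illustration; the question does not show the status column.

```python
import sqlite3
from datetime import date


def month_bounds(today):
    """Return (first day of today's month, first day of the next month)."""
    start = today.replace(day=1)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end


def department_counts(rows, today):
    """Distinct departments and employees with status '1' in today's month."""
    start, end = month_bounds(today)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE department ("
        "department_id INTEGER, employee_id INTEGER, date_time TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO department VALUES (?, ?, ?, ?)", rows)
    # ISO dates stored as text compare correctly as strings.
    result = conn.execute(
        """
        SELECT COUNT(DISTINCT department_id), COUNT(DISTINCT employee_id)
        FROM department
        WHERE date_time >= ? AND date_time < ? AND status = '1'
        """,
        (start.isoformat(), end.isoformat()),
    ).fetchone()
    conn.close()
    return result


# The question's rows (status assumed to be '1') plus one row from the next month.
rows = [
    (1, 1, "2020-02-01", "1"),
    (1, 2, "2020-02-04", "1"),
    (3, 7, "2020-02-06", "1"),
    (1, 4, "2020-02-07", "1"),
    (2, 9, "2020-03-01", "1"),
]
print(department_counts(rows, date(2020, 2, 15)))  # (2, 4)
```

The half-open upper bound keeps the March row out regardless of which day of February the query is run on.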

Percent on basis of row count and row count on basis of condition in single table in mysql

I have four tables with the following structure.
Table 1:
Project - have unique project names (prj_name)
Table 2:
my_records - have the following fields:
record_id,prj_name,my_dept,record_submit_date,record_state
Table 3:
record_states have multiple states where 'Completed' is one.
Table 4:custom_dept_list
dept_name
I need to get the percentage of (records have state as completed) and (Total records) grouped by my_project where my_dept in custom_dept_list and record_submit_date is greater than "some date"
I have tried the following:
Query:
select prj_name,count(record_id) as total,((select count(record_id) from
my_records where record_state='Completed')/(count(record_id)))*100 as
percent from my_records,custom_dept_list where record_state='Completed'
and record_submit_date >= ( CURDATE() - INTERVAL 15 DAY ) and
my_dept=dept_name group by prj_name order by percent desc;
Total records for project A = 50
Total records for project A with record_state='Completed' = 30
Ratio is not coming - (30/50)*100 = 60
It is giving some very big value.
Below is the data from my_records; I have removed record_submit_date to keep it simple:
|1|prj1|dept1|Completed
|2|prj1|dept1|XYZ
|3|prj1|dept1|Completed
|4|prj1|dept2|XYZ
|5|prj1|dept2|Completed
|6|prj1|dept1|XYZ
|7|prj1|dept1|XYZ
|8|prj1|dept1|XYZ
|9|prj1|dept2|XYZ
|10|prj1|dept2|XYZ
|11|prj1|dept2|Completed
|12|prj1|dept2|Completed
|13|prj1|dept2|Completed
|14|prj1|dept3|XYZ
|15|prj1|dept4|Completed
|16|prj1|dept4|XYZ
|17|prj1|dept5|Completed
|18|prj1|dept6|XYZ
|19|prj1|dept7|XYZ
|20|prj1|dept8|XYZ
|21|prj1|dept10|XYZ
|22|prj1|dept2|XYZ
|23|prj1|dept2|Completed
|24|prj1|dept2|Completed
|25|prj1|dept2|Completed
Data From Custom_dept_List:
dept_name
dept1
dept3
dept4
dept5
dept6
dept8
dept10
I have tried the following queries :
Query 1
select count(record_id) as count,prj_name from my_records,custom_dept_list where my_dept=dept_name group by prj_name order by count desc;
Output -- 13
Query 2
select count(record_id) as count,prj_name from my_records,custom_dept_list where my_dept=dept_name and record_state='Completed' group by prj_name order by count desc;
Output -- 4
Query 3
select prj_name,count(record_id) as total,count(case when record_state='Completed' then record_id end) /count(record_id) *100 as percent from my_records join custom_dept_list on my_dept = dept_name where record_state = 'Completed' group by prj_name order by percent desc;
Output :
prj_name total percent
prj1 4 100.0000
First of all, please use proper join instead of multiple tables in your from clause.
Then, you don't need that inner query to get the count with a specific record_state, you can use a case inside the count:
select prj_name,
count(record_id) as total,
count(case when record_state='Completed' then record_id end) /
count(record_id) * 100 as percent
from my_records
join custom_dept_list
on my_dept = dept_name
where record_submit_date >= ( CURDATE() - INTERVAL 15 DAY )
group by prj_name
order by percent desc;
Your problem was probably caused by that inner query, that was not counting each project's completed records, but all the completed records instead.
You do not need the record_state = 'Completed' condition in the WHERE clause. Because of it, only completed records are counted in the total, so the percentage is always 100. Try without it:
select prj_name,
count(record_id) as total,
count(case when record_state='Completed' then record_id end) /
count(record_id) * 100 as percent
from my_records
join custom_dept_list
on my_dept = dept_name
where record_submit_date >= ( CURDATE() - INTERVAL 15 DAY )
group by prj_name
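To check the conditional aggregation against the sample data from the question, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The date filter is left out because the question removed record_submit_date from the sample, the function name completion_by_project is made up for this illustration, and 100.0 is used so that SQLite does not perform integer division.

```python
import sqlite3


def completion_by_project(records, depts):
    """Per project: total records, completed records and completed percentage."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_records ("
        "record_id INTEGER, prj_name TEXT, my_dept TEXT, record_state TEXT)"
    )
    conn.execute("CREATE TABLE custom_dept_list (dept_name TEXT)")
    conn.executemany("INSERT INTO my_records VALUES (?, ?, ?, ?)", records)
    conn.executemany("INSERT INTO custom_dept_list VALUES (?)", [(d,) for d in depts])
    # No filter on record_state in WHERE: the condition lives inside the aggregate,
    # so the denominator still counts every record of the project.
    result = conn.execute(
        """
        SELECT prj_name,
               COUNT(record_id)                 AS total,
               SUM(record_state = 'Completed')  AS completed,
               100.0 * SUM(record_state = 'Completed') / COUNT(record_id) AS percent
        FROM my_records
        JOIN custom_dept_list ON my_dept = dept_name
        GROUP BY prj_name
        ORDER BY percent DESC
        """
    ).fetchall()
    conn.close()
    return result


C, X = "Completed", "XYZ"
# (department, state) for record_id 1..25, copied from the question.
sample = [
    ("dept1", C), ("dept1", X), ("dept1", C), ("dept2", X), ("dept2", C),
    ("dept1", X), ("dept1", X), ("dept1", X), ("dept2", X), ("dept2", X),
    ("dept2", C), ("dept2", C), ("dept2", C), ("dept3", X), ("dept4", C),
    ("dept4", X), ("dept5", C), ("dept6", X), ("dept7", X), ("dept8", X),
    ("dept10", X), ("dept2", X), ("dept2", C), ("dept2", C), ("dept2", C),
]
records = [(i + 1, "prj1", dept, state) for i, (dept, state) in enumerate(sample)]
depts = ["dept1", "dept3", "dept4", "dept5", "dept6", "dept8", "dept10"]

name, total, completed, percent = completion_by_project(records, depts)[0]
print(name, total, completed, round(percent, 2))  # prj1 13 4 30.77
```

The totals match the counts reported in the question (13 records in the listed departments, 4 of them completed).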

Mysql- exclude the rows which are having same minutes

I want a result that excludes the rows that share the same minute.
Example: customers table
id date
1 2015-07-23 00:06:56
2 2015-07-23 00:11:38
3 2015-07-23 01:10:16
4 2015-07-23 01:10:13
5 2015-07-24 01:13:26
6 2015-07-24 01:13:13
I want the query to exclude ids 3, 4, 5 and 6, because (3 & 4) and (5 & 6) share the same minute.
So expected result is:
id date
1 2015-07-23 00:06:56
2 2015-07-23 00:11:38
Thanks in advance!
Try this out. Subquery finds out the hour and minutes that are duplicated. Outer query selects all records except the one returned by the subquery.
select *
from customers
where date_format(dt, '%H%i') not in
(
select date_format(dt, '%H%i')
from customers a
group by date_format(dt, '%H%i')
having count(*) > 1
)
Example: http://sqlfiddle.com/#!9/88a24/4
Filter out duplicate hour + minute by day
select *
from test
where date_format(dt, '%Y%j%H%i') not in
(
select date_format(dt, '%Y%j%H%i')
from test a
group by date_format(dt, '%Y%j%H%i')
having count(*) > 1
)
Example: http://sqlfiddle.com/#!9/fa2da/2
Another example: http://sqlfiddle.com/#!9/7eeaf/1 (using data from updated question)
You can form a group using the year, month, day, hour and minute portions of the date column, and then only retain groups which have one record. This filters out all records which have duplicate minutes. Then I JOIN this derived table back to your customers table for the result. This solution joins against a derived table instead of filtering with NOT IN.
SELECT c.id, c.date
FROM customers c
INNER JOIN
(
-- HAVING keeps only single-row groups, so MIN(id) is that row's id
SELECT MIN(id) AS id
FROM customers
GROUP BY
EXTRACT(YEAR FROM date),
EXTRACT(MONTH FROM date),
EXTRACT(DAY FROM date),
EXTRACT(HOUR FROM date),
EXTRACT(MINUTE FROM date)
HAVING COUNT(*) = 1
) t
ON c.id = t.id
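The "keep only minutes that occur once" idea can be verified against the question's sample data with a short sketch. It uses Python's sqlite3 module as a stand-in for MySQL, so strftime('%Y-%m-%d %H:%M', ...) plays the role of DATE_FORMAT or the EXTRACT list; the function name rows_with_unique_minute is made up for this illustration.

```python
import sqlite3


def rows_with_unique_minute(rows):
    """Return rows whose (day, hour, minute) is not shared with any other row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, date TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT id, date
        FROM customers
        WHERE strftime('%Y-%m-%d %H:%M', date) IN (
            -- minute buckets that contain exactly one row
            SELECT strftime('%Y-%m-%d %H:%M', date)
            FROM customers
            GROUP BY strftime('%Y-%m-%d %H:%M', date)
            HAVING COUNT(*) = 1
        )
        ORDER BY id
        """
    ).fetchall()
    conn.close()
    return result


# Sample data from the question.
rows = [
    (1, "2015-07-23 00:06:56"),
    (2, "2015-07-23 00:11:38"),
    (3, "2015-07-23 01:10:16"),
    (4, "2015-07-23 01:10:13"),
    (5, "2015-07-24 01:13:26"),
    (6, "2015-07-24 01:13:13"),
]
print(rows_with_unique_minute(rows))
# [(1, '2015-07-23 00:06:56'), (2, '2015-07-23 00:11:38')]
```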
Try this query
select * from tablename
group by date
having count(*)=1

How to write an extensible SQL to find the users who continuously login for n days

Suppose I have a table (Oracle or MySQL) that stores the dates on which users log in.
How can I write SQL (or something else) to find the users who have logged in on n consecutive days?
For example:
userID | logindate
1000 2014-01-10
1000 2014-01-11
1000 2014-02-01
1000 2014-02-02
1001 2014-02-01
1001 2014-02-02
1001 2014-02-03
1001 2014-02-04
1001 2014-02-05
1002 2014-02-01
1002 2014-02-03
1002 2014-02-05
.....
We can see that user 1000 logged in on two consecutive days in 2014, user 1001 logged in on 5 consecutive days, and user 1002 never logged in on consecutive days.
The SQL should be extensible, meaning I can pick any value of n and, by modifying it slightly or passing a new parameter, still get the expected results.
Thank you!
As we don't know which DBMS you are using (you named both MySQL and Oracle), here are two solutions that both do the same thing: order the rows and subtract the row number, in days, from the login date (so if the 6th record is 2014-02-12 and the 7th is 2014-02-13, both result in 2014-02-06). Then we group by user and that group day and count the days. Finally we group by user to find the longest series.
Here is a solution for a dbms with analytic window functions (e.g. Oracle):
select userid, max(days)
from
(
select userid, groupday, count(*) as days
from
(
select
userid, logindate - row_number() over (partition by userid order by logindate) as groupday
from mytable
)
group by userid, groupday
)
group by userid
--having max(days) >= 3
And here is a MySQL query (untested, because I don't have MySQL available):
select
userid, max(days)
from
(
select
userid, date_add(logindate, interval -rn day) as groupday, count(*) as days
from
(
select
userid, logindate,
@row_num := @row_num + 1 as rn -- not named row_number: reserved word in MySQL 8
from mytable
cross join (select @row_num := 0) r
order by userid, logindate
) numbered -- MySQL requires an alias on every derived table
group by userid, groupday
) streaks
group by userid
-- having max(days) >= 3
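The row-number trick can be checked against the question's sample data with the sketch below. It runs the window-function variant in Python's sqlite3 module as a stand-in for Oracle or MySQL 8, and assumes the bundled SQLite is version 3.25 or newer (needed for ROW_NUMBER) and that each user has at most one row per day. The function name longest_streaks and its min_days parameter are made up for this illustration; min_days plays the role of n.

```python
import sqlite3


def longest_streaks(rows, min_days=1):
    """Return (userid, longest run of consecutive login days) with run >= min_days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (userid INTEGER, logindate TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT userid, MAX(days) AS longest
        FROM (
            SELECT userid, groupday, COUNT(*) AS days
            FROM (
                -- Consecutive dates minus consecutive row numbers give the
                -- same value, so each run of days shares one groupday.
                SELECT userid,
                       julianday(logindate)
                         - ROW_NUMBER() OVER (PARTITION BY userid
                                              ORDER BY logindate) AS groupday
                FROM mytable
            )
            GROUP BY userid, groupday
        )
        GROUP BY userid
        HAVING MAX(days) >= ?
        ORDER BY userid
        """,
        (min_days,),
    ).fetchall()
    conn.close()
    return result


# Sample data from the question.
rows = [
    (1000, "2014-01-10"), (1000, "2014-01-11"),
    (1000, "2014-02-01"), (1000, "2014-02-02"),
    (1001, "2014-02-01"), (1001, "2014-02-02"), (1001, "2014-02-03"),
    (1001, "2014-02-04"), (1001, "2014-02-05"),
    (1002, "2014-02-01"), (1002, "2014-02-03"), (1002, "2014-02-05"),
]
print(longest_streaks(rows))              # [(1000, 2), (1001, 5), (1002, 1)]
print(longest_streaks(rows, min_days=3))  # [(1001, 5)]
```

Changing n only means passing a different min_days, which is the kind of extensibility the question asks for.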
I think the following query will give you a very extensible parametrization:
select z.userid, count(*) continuous_login_days
from
(
with max_dates as
( -- Get max date for every user ID
select t.userid, max(t.logindate) max_date
from test t
group by t.userid
),
ranks as
( -- Get ranks for login dates per user
select t.*,
row_number() over
(partition by t.userid order by t.logindate desc) rnk
from test t
)
-- So here, we select continuous days by checking if rank inside group
-- (per user ID) matches login date compared to max date
select r.userid, r.logindate, r.rnk, m.max_date
from ranks r, max_dates m
where m.userid = r.userid
and r.logindate + r.rnk - 1 = m.max_date -- here is the key
) z
-- Then we only group by user ID to get the number of continuous days
group by z.userid
;
Here is the result:
USERID CONTINUOUS_LOGIN_DAYS
1 1000 2
2 1001 5
3 1002 1
So you can just choose by querying field CONTINUOUS_LOGIN_DAYS.
EDIT : If you want to choose from all ranges (not only the last one), my query structure no longer works because it relied on the last range. But here is a workaround:
with w as
( -- Parameter
select 2 nb_cont_days from dual
)
select *
from
(
select t.*,
-- Get number of days around
(select count(*) from test t2
where t2.userid = t.userid
and t2.logindate between t.logindate - nb_cont_days + 1
and t.logindate) m1,
-- Get also number of days more in the past, and in the future
(select count(*) from test t2
where t2.userid = t.userid
and t2.logindate between t.logindate - nb_cont_days
and t.logindate + 1) m2,
w.nb_cont_days
from w, test t
) x
-- If these 2 fields match, then we have what we want
where x.m1 = x.nb_cont_days
and x.m2 = x.nb_cont_days
order by 1, 2
You just have to change the parameter in the WITH clause, so you can even create a function from this query to call it with this parameter.
SELECT userID,count(userID) as numOfDays FROM LOGINTABLE WHERE logindate between '2014-01-01' AND '2014-02-28'
GROUP BY userID
This way you can check the number of login days per user within a specific period.