Calculating consecutive occurrences in MySQL - mysql

I have a quick question about window functions in MySQL:
SELECT
Client,
User,
Date,
Flag,
lag(Date) over (partition by Client,User order by Date asc) as last_date,
lag(Flag) over (partition by Client,User order by Date asc) as last_flag,
case when Flag = 1 and last_flag = 1 then 1 else 0 end as consecutive
FROM db.tbl
This query returns something like the below. I am trying to work out how many consecutive times the Flag column was 1 most recently for each user. For example, if a user had 11110000111, we should take the final three occurrences of 1 and report a consecutive flag count of 3.
I also need to extract the start and end dates of that consecutive run.
How would I go about doing this? Can anyone help? :)
Using the example of 11110000111, we should extract only the trailing 111 and therefore the 3 most recent dates for that customer. So in the below, we would take 10.01.2023 as the first date and 24.01.2023 as the last date, and the consecutive count should be 3.
Output:

Use aggregation and string functions:
WITH cte AS (
SELECT Client, User,
GROUP_CONCAT(CASE WHEN Flag THEN Date END ORDER BY Date) AS dates,
CHAR_LENGTH(SUBSTRING_INDEX(GROUP_CONCAT(Flag ORDER BY Date SEPARATOR ''), '0', '-1')) AS consecutive
FROM tablename
GROUP BY Client, User
)
SELECT Client, User,
NULLIF(SUBSTRING_INDEX(SUBSTRING_INDEX(dates, ',', -consecutive), ',', 1), '') AS first_date,
CASE WHEN consecutive > 0 THEN SUBSTRING_INDEX(dates, ',', -1) END AS last_date,
consecutive
FROM cte;
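For intuition, the string trick in the query above can be sketched outside the database. This is only an illustration in Python, not part of the answer's SQL: the function name and the sample data are made up (only 10.01.2023 and 24.01.2023 come from the question), and it assumes the flags arrive already ordered by date as a string of 0s and 1s.

```python
def trailing_streak(flags, flagged_dates):
    """Mimic the SUBSTRING_INDEX logic for one Client/User pair.

    flags:         string of '0'/'1' ordered by date, e.g. "11110000111"
    flagged_dates: dates of the rows whose flag is 1, in the same order
    Both are hypothetical stand-ins for the two GROUP_CONCAT results.
    """
    # SUBSTRING_INDEX(flags, '0', -1): everything after the last '0'
    consecutive = len(flags.rpartition("0")[2])
    if consecutive == 0:
        return None, None, 0
    streak = flagged_dates[-consecutive:]
    return streak[0], streak[-1], consecutive


# Placeholder dates; only the first and last date of the final run are from the question.
flagged = ["d1", "d2", "d3", "d4", "10.01.2023", "17.01.2023", "24.01.2023"]
print(trailing_streak("11110000111", flagged))
```

If the flag string ends in 0, the part after the last 0 is empty, so the count is 0 and both dates come back empty, which matches the NULLIF/CASE handling in the query.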
Another solution with window functions and conditional aggregation:
WITH
cte1 AS (SELECT *, SUM(NOT Flag) OVER (PARTITION BY Client, User ORDER BY Date) AS grp FROM tablename),
cte2 AS (SELECT *, MAX(grp) OVER (PARTITION BY Client, User) AS max_grp FROM cte1)
SELECT Client, User,
MIN(CASE WHEN Flag THEN Date END) AS first_date,
MAX(CASE WHEN Flag THEN Date END) AS last_date,
SUM(Flag) AS consecutive
FROM cte2
WHERE grp = max_grp
GROUP BY Client, User;
See the demo.
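The grouping idea in the window-function query can be sketched in plain Python as well. This is a rough illustration under assumptions, not the answer's code: the function name is invented, the rows are for a single Client/User pair, and the weekly spacing of the sample dates is assumed (the question only gives 10.01.2023 and 24.01.2023).

```python
from datetime import date, timedelta


def latest_streak(rows):
    """rows: iterable of (day, flag) tuples for one Client/User pair."""
    grp = 0
    tagged = []
    for day, flag in sorted(rows):          # ORDER BY Date
        grp += 0 if flag else 1             # SUM(NOT Flag) OVER (... ORDER BY Date)
        tagged.append((day, flag, grp))
    if not tagged:
        return None, None, 0
    max_grp = tagged[-1][2]                 # MAX(grp): the running sum never decreases
    days = [d for d, f, g in tagged if g == max_grp and f]
    if not days:                            # the most recent row has Flag = 0
        return None, None, 0
    return min(days), max(days), len(days)


# Hypothetical weekly rows spelling out 11110000111, ending on 24.01.2023.
start = date(2022, 11, 15)
sample = [(start + timedelta(weeks=i), int(c)) for i, c in enumerate("11110000111")]
print(latest_streak(sample))
```

Every 0 starts a new group, so the last group holds the final 0 (if any) plus the 1s after it; filtering on the flag inside that group leaves only the most recent run.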

I made an attempt to get the result with simpler queries; my approach also takes advantage of the LastDate and LastFlag columns.
Run here
WITH eTT
AS
( SELECT Client, User, NULLIF(MAX(Date),
(SELECT MAX(Date) FROM tt t2 WHERE t1.Client=t2.Client AND t1.User=t2.User)) as endDate
FROM tt t1 WHERE LastFlag=0 OR LastFlag IS NULL GROUP BY Client, User
)
SELECT Client, User,
(CASE WHEN MAX(endDate) IS NULL THEN NULL ELSE MIN(Date) END) as first_date,
(CASE WHEN MAX(endDate) IS NULL THEN NULL ELSE MAX(Date) END) as last_date,
(CASE WHEN MAX(endDate) IS NULL THEN NULL ELSE COUNT(endDate) END) as consecutive
FROM tt LEFT JOIN eTT USING (Client, User)
WHERE Date >= endDate OR endDate IS null GROUP BY Client, User;
EDIT
The original table doesn't have LastDate and LastFlag columns; they were created by the OP's initial query.
The method used there apparently isn't supported, but I get the impression that the OP somehow manages to do it on their side.
Hence another CTE called tt, containing that query, can be added before eTT.

Related

Summing of count result at same level

I'm trying to sum the results of count(id) at the same level, in order to find out the relative portion of the count(id) from the overall count.
The count is grouped by the respective previous number, and I want to stay at the same table and have it all together.
`
select totalattempts,
       count(totalattempts) allattempts,
       count(case when success > 0 then totalattempts else null end) successfulattempts
from (
    select *,
           case when success > 0 then attemptspresuccess + 1 else attemptspresuccess end totalattempts
    from (
        select orderid,
               count(orderid) attemptspresuccess,
               count(case when recoveredPaymentId is not null then recoveredPaymentId end) success
        from (
            select orderid, recoveredPaymentId
            from errors
            where platform = 'woo'
        ) alitable
        group by orderid
    ) minitable
) finaltable
group by totalattempts
order by totalattempts asc
`
I need to add another column that, to put it simply, would contain count(totalattempts)/sum(count(totalattempts)).
I'm basically running out of ideas.
I can't use window functions, as this runs in a Retool app, which doesn't support them.
Assuming some test data here:
DECLARE @table TABLE (AttemptNumber INT IDENTITY, Success BIT)
INSERT INTO @table (Success) VALUES
(0),(0),(0),(0),(1),(1),(0),(0),(0),(0),(0),(1),(0),(1),(0),(0),
(0),(0),(1),(0),(0),(0),(0),(1),(0),(1),(0),(0),(0),(1),(0),(0)
It sounds like you want to know how many attempts there were, how many were successful, and what that is as a percentage?
SELECT COUNT(Success) AS TotalCount,
COUNT(CASE WHEN Success = 1 THEN 1 END) AS SuccessCount,
COUNT(CASE WHEN Success = 1.0 THEN 1 END)/(COUNT(Success)+.0) AS SuccessPct
FROM @table
TotalCount SuccessCount SuccessPct
--------------------------------------
32 8 0.2500000000000
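If the goal is each group's share of the overall count without window functions, one option is to divide by a scalar subquery that counts all rows. The sketch below is an assumption-laden illustration rather than the answer's method: it uses SQLite through Python instead of the asker's database, and the table finaltable with its four rows is invented to stand in for the derived table in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the question's derived table "finaltable".
conn.execute("CREATE TABLE finaltable (orderid INTEGER, totalattempts INTEGER)")
conn.executemany(
    "INSERT INTO finaltable VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 3)],
)

share_rows = conn.execute(
    """
    SELECT totalattempts,
           COUNT(*) AS allattempts,
           -- the scalar subquery plays the role of sum(count(...)) over all groups
           COUNT(*) * 1.0 / (SELECT COUNT(*) FROM finaltable) AS share
    FROM finaltable
    GROUP BY totalattempts
    ORDER BY totalattempts
    """
).fetchall()
print(share_rows)
```

In the original query the subquery would have to repeat the nested derived table (or the whole thing could be wrapped in a view), since a derived table alias cannot be referenced from a separate subquery.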

MYSQL - Filter consecutive not null dates

Get only the biggest date:
These are check-in and check-out records of employees. Sometimes they make two or more entries in the system in a row; in this sample there were two check-outs in a row. Assuming these rows are always ordered, I would like to keep the latest date in the case of a check-out and the earliest date in the case of a check-in.
In that case I would like to have this:
The smaller date was excluded:
DEMO
Try this: in the big CASE expression I increment a counter by one whenever checkin switches from null to not null or the other way around. Then it's enough to group by that counter, taking the max of checkout and the min of checkin:
select @checkinLag := null, @rn := 0;
select max(id),
       functionario,
       loja,
       min(checkin),
       max(checkout)
from (
    select case when (checkinLag is null and checkin is not null) or
                     (checkinLag is not null and checkin is null)
                then @rn := @rn + 1 else @rn end rn,
           checkin,
           checkout,
           loja,
           id,
           functionario
    from (
        select @checkinLag checkinLag,
               @checkinLag := checkin,
               checkin,
               checkout,
               loja,
               id,
               functionario
        from dummyTable
        order by coalesce(checkin, checkout)
    ) a
) a group by functionario, loja, rn
I have used subqueries to guarantee the order in which the expressions are evaluated (assigning and using @checkinLag), as Gordon Linoff pointed out.
Demo
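The run-collapsing idea (start a new group whenever the rows switch between check-in and check-out, then keep one value per group) can be sketched in Python with itertools.groupby. This is only an illustration under assumptions: the function name and timestamps are invented, the rows belong to a single functionario/loja pair, and they are already ordered by time.

```python
from itertools import groupby


def collapse_runs(rows):
    """rows: dicts with 'checkin' and 'checkout', exactly one of them set."""
    result = []
    # groupby starts a new group each time the key changes, like the rn counter
    for is_checkin, run in groupby(rows, key=lambda r: r["checkin"] is not None):
        run = list(run)
        if is_checkin:
            result.append(("checkin", min(r["checkin"] for r in run)))
        else:
            result.append(("checkout", max(r["checkout"] for r in run)))
    return result


# Hypothetical sample: two check-outs in a row, as in the question.
sample_rows = [
    {"checkin": "2017-01-02 08:00:00", "checkout": None},
    {"checkin": None, "checkout": "2017-01-02 12:00:00"},
    {"checkin": None, "checkout": "2017-01-02 12:01:00"},
    {"checkin": "2017-01-02 13:00:00", "checkout": None},
]
print(collapse_runs(sample_rows))
```

ISO-formatted timestamp strings compare correctly as text, which is why min and max work here without parsing them.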
My solution:
Select
*
from dummyTable base
where (base.checkout is null or not exists (
select
1
from dummyTable co
where co.checkout between base.checkout and DATE_ADD(base.checkout, INTERVAL 5 SECOND)
and base.id <> co.id
and base.functionario = co.functionario
and base.loja = co.loja
)) and (base.checkin is null or not exists (
select
1
from dummyTable ci
where ci.checkin between DATE_SUB(base.checkin, INTERVAL 5 SECOND) and base.checkin
and base.id <> ci.id
and base.functionario = ci.functionario
and base.loja = ci.loja
));
You can test the query here. The rows do not need to be ordered. I chose 5 seconds as the interval within which repeated check-ins/check-outs should be ignored.

Select column(s) corresponding to max/min of another column without joins

I have a table (id, employee_id, device_id, logged_time) [simplified] that logs attendances of employees from biometric devices.
I generate reports showing the first in and last out time of each employee by date.
Currently, I am able to fetch the first in and last out time of each employee by date, but I also need to fetch the first in and last out device_ids of each employee. The entries are not in sequential order of the logged time.
I do not want to (and probably cannot) use joins as in one of the reports the columns are dynamically generated and can lead to thousands of joins. Furthermore, these are subqueries and are joined to other queries to get further details.
A sample setup of the table and queries are at http://sqlfiddle.com/#!9/3bc755/4
The first one just lists the entry and exit time by date of every employee
select
attendance_logs.employee_id,
DATE(attendance_logs.logged_time) as date,
TIME(MIN(attendance_logs.logged_time)) as entry_time,
TIME(MAX(attendance_logs.logged_time)) as exit_time
from attendance_logs
group by date, attendance_logs.employee_id
The second one builds up an attendance chart given a date range
select
`attendance_logs`.`employee_id`,
DATE(MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END)) as date_2017_09_18,
MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END) as entry_2017_09_18,
MAX(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-18' THEN `attendance_logs`.`logged_time` END) as exit_2017_09_18,
DATE(MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END)) as date_2017_09_19,
MIN(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END) as entry_2017_09_19,
MAX(case when DATE(`attendance_logs`.`logged_time`) = '2017-09-19' THEN `attendance_logs`.`logged_time` END) as exit_2017_09_19
/*
* dynamically generated columns for dates in date range
*/
from `attendance_logs`
where `attendance_logs`.`logged_time` >= '2017-09-18 00:00:00' and `attendance_logs`.`logged_time` <= '2017-09-19 23:59:59'
group by `attendance_logs`.`employee_id`;
Tried:
Similar to the max and min logged_time of each date using CASE, I tried to select the device_id where logged_time is max/min.
```
MIN(case
when
`attendance_logs`.`logged_time` = MIN(
case when DATE(`attendance_logs`.`logged_time`)
= '2017-09-18' THEN `attendance_logs`.`logged_time` END
)
then `attendance_logs`.`device_id` end) as entry_device_2017_09_18
```
This results in an "Invalid use of group function" error.
A quick hack for your query is to pick the device id for in and out by using GROUP_CONCAT within SUBSTRING_INDEX:
SUBSTRING_INDEX(GROUP_CONCAT(case when DATE(`l`.`logged_time`) = '2017-09-18' THEN `l`.`device_id` END ORDER BY `l`.`logged_time` desc),',',1) exit_device_2017_09_18,
Or, if the device id will be the same for each in and its out, it can simply be written with GROUP_CONCAT only:
GROUP_CONCAT(DISTINCT case when DATE(`l`.`logged_time`) = '2017-09-18' THEN `l`.`device_id` END)
DEMO
To avoid joins I suggest you try "correlated subqueries" instead:
select
employee_id
, logdate
, TIME(entry_time) entry_time
, (select MIN(l.device_id)
from attendance_logs l
where l.employee_id = d.employee_id
and l.logged_time = d.entry_time) entry_device
, TIME(exit_time) exit_time
, (select MAX(l.device_id)
from attendance_logs l
where l.employee_id = d.employee_id
and l.logged_time = d.exit_time) exit_device
from (
select
attendance_logs.employee_id
, DATE(attendance_logs.logged_time) as logdate
, MIN(attendance_logs.logged_time) as entry_time
, MAX(attendance_logs.logged_time) as exit_time
from attendance_logs
group by
attendance_logs.employee_id
, DATE(attendance_logs.logged_time)
) d
;
see: http://sqlfiddle.com/#!9/06e0e2/3
Note: I have used MIN() and MAX() on those subqueries only to avoid any possibility that these return more than one value. You could use limit 1 instead if you prefer.
Note also: I do not normally recommend correlated subqueries as they can cause performance issues, but they do supply the data you need.
Oh, and please try to avoid using date as a column name; it isn't good practice.
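To make the intended result concrete, here is a small Python sketch of "first in / last out device per employee per day". It is an illustration only, not a replacement for the SQL: the function name and sample logs are invented, and ties on the same timestamp are broken the same way as the MIN()/MAX() in the subqueries above (smallest device id for entry, largest for exit).

```python
from collections import defaultdict
from datetime import date, datetime, time


def first_last_devices(logs):
    """logs: iterable of (employee_id, device_id, logged_time) in any order."""
    per_day = defaultdict(list)
    for employee_id, device_id, logged_time in logs:
        per_day[(employee_id, logged_time.date())].append((logged_time, device_id))

    report = {}
    for key, entries in per_day.items():
        entry_time, entry_device = min(entries)   # earliest log, lowest device on ties
        exit_time, exit_device = max(entries)     # latest log, highest device on ties
        report[key] = (entry_time.time(), entry_device, exit_time.time(), exit_device)
    return report


# Hypothetical logs, deliberately not in time order.
sample_logs = [
    (1, 10, datetime(2017, 9, 18, 17, 5)),
    (1, 20, datetime(2017, 9, 18, 9, 0)),
    (1, 30, datetime(2017, 9, 18, 12, 30)),
]
print(first_last_devices(sample_logs))
```

Pairing each timestamp with its device id before taking min or max is what keeps the device attached to the right log row, which is the part a plain MIN(logged_time) loses.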

SQL: Union or Self Join

I have a simple table: user(id, date, task)
The task field contains either "download" or "upload"
I want to figure out the number of users who do each action per day.
Output: date, # of users who downloaded, # of users who uploaded
I first ran into the issue of using a subquery in the aggregate count function of the select, so I thought I should be using a self join here to break apart the data in the "task" column.
I thought I could create two tables, one for each case, and then combine those and count, but I am having trouble finishing this out:
SELECT id, date, task as task_download
FROM user
WHERE task = 'download'
SELECT id, date, task as task_upload
FROM user
WHERE task = 'upload'
select `date`,
COUNT( distinct CASE WHEN task = 'download' then id end ) 'download',
COUNT( distinct CASE WHEN task = 'upload' then id end ) 'upload'
from user
group by `date`
I would say neither. A query like this will do the job:
select `date`,
count(distinct case when task = 'download' then id else null end) as downloads,
count(distinct case when task = 'upload' then id else null end) as uploads
from user
where task in ('download', 'upload')
group by `date`
This assumes that date is a column containing only the date part, not the complete timestamp, and that id is the user id. You can use the DISTINCT keyword within aggregate functions; that's what I did here.
To have this query run reasonably fast, I recommend an index on (task, date).
If, however, date contains the complete timestamp (i.e. including the time-part) you would want to group differently:
select date(`date`) as `day`,
count(distinct case when task = 'download' then id else null end) as downloads,
count(distinct case when task = 'upload' then id else null end) as uploads
from user
where task in ('download', 'upload')
group by date(`date`)
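A quick way to check this kind of conditional aggregation is to run it against a few rows. The sketch below uses SQLite through Python purely for convenience, so it is an assumption that the behaviour carries over unchanged to your MySQL setup; the sample rows are invented, and the date column holds full timestamps as in the second variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (id INTEGER, date TEXT, task TEXT)')
conn.executemany(
    'INSERT INTO "user" VALUES (?, ?, ?)',
    [
        (1, "2024-01-01 08:00:00", "download"),
        (1, "2024-01-01 09:30:00", "download"),  # same user again: counted once
        (2, "2024-01-01 10:00:00", "upload"),
        (1, "2024-01-02 11:00:00", "upload"),
        (2, "2024-01-02 12:00:00", "upload"),
    ],
)

daily = conn.execute(
    """
    SELECT date(date) AS day,
           COUNT(DISTINCT CASE WHEN task = 'download' THEN id END) AS downloads,
           COUNT(DISTINCT CASE WHEN task = 'upload' THEN id END) AS uploads
    FROM "user"
    WHERE task IN ('download', 'upload')
    GROUP BY date(date)
    ORDER BY day
    """
).fetchall()
print(daily)
```

The CASE yields NULL for rows of the other task, and COUNT(DISTINCT ...) ignores NULLs, so each column counts only the users who performed that task on that day.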
You can do it with sub-queries, e.g.:
SELECT `date` AS `day`,
(SELECT COUNT(*) FROM activity WHERE date = day AND activity = 'upload') AS upload_count,
(SELECT COUNT(*) FROM activity WHERE date = day AND activity = 'download') AS download_count
FROM activity
GROUP BY date;
Here's the SQL Fiddle.
First count distinct users by date and task, and then sum users depending on each task by date.
select date,
sum(case when task = 'upload' then num_users else 0 end) as "upload",
sum(case when task = 'download' then num_users else 0 end) as "download"
from (
select date, task, count(distinct id) num_users
from usert
group by date, task
) x
group by date
;
Check it here: http://rextester.com/ZACFB64945
If you want the distinct users, then that suggests count(distinct):
SELECT date,
COUNT(DISTINCT CASE WHEN task = 'upload' THEN userid END) as uploads,
COUNT(DISTINCT CASE WHEN task = 'download' THEN userid END) as downloads
FROM user
GROUP BY date
ORDER BY date;
If you want distinct actions then you can do this as:
SELECT date,
SUM( (task = 'upload')::int ) as uploads,
SUM( (task = 'download')::int) as downloads
FROM user
GROUP BY date
ORDER BY date;
This uses a convenient Postgres shorthand for counting the boolean expressions.
I'd use conditional aggregation.
To get a count of the number of users that performed at least one upload on a given date (but only increment the count by one for that user for that date, even if that user performed more than one upload on the same date), we can use a COUNT(DISTINCT user) expression.
To get a count of the total number of uploads, we can use a COUNT or SUM.
SELECT DATE(t.date) AS `date`
, COUNT(DISTINCT IF(t.task='upload' ,t.user,NULL)) AS cnt_users_who_uploaded
, COUNT(DISTINCT IF(t.task='download',t.user,NULL)) AS cnt_users_who_downloaded
, SUM(IF(t.task='upload' ,1,0)) AS cnt_uploads
, SUM(IF(t.task='download',1,0)) AS cnt_downloads
FROM user t
GROUP BY DATE(t.date)
ORDER BY DATE(t.date)
Note: this will not return counts of zero for dates that do not appear in the table.
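If zero rows are needed for the missing dates, one option is to fill them in on the application side after running the query. A minimal sketch in Python, assuming the query result has already been loaded into a dict keyed by date (the function name and the sample counts are made up):

```python
from datetime import date, timedelta


def fill_missing_days(counts, start, end):
    """counts: {day: (uploads, downloads)} as returned by the query.

    Returns a dict covering every day from start to end inclusive,
    with (0, 0) for days that have no rows in the table.
    """
    filled = {}
    day = start
    while day <= end:
        filled[day] = counts.get(day, (0, 0))
        day += timedelta(days=1)
    return filled


# Hypothetical query result with a gap on 2 January.
query_result = {date(2024, 1, 1): (3, 5), date(2024, 1, 3): (1, 0)}
print(fill_missing_days(query_result, date(2024, 1, 1), date(2024, 1, 3)))
```

The alternative on the database side is to join against a calendar table or a generated list of dates, which keeps everything in one query at the cost of maintaining that list.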

PHP MySQL Group By question

I have a column inside my table: tbl_customers that distinguishes a customer record as either a LEAD or a CUS.
The column is simply recordtype, which is a char(1). I populate it with either C or L.
Obviously C = customer, while L = lead.
I want to run a query that groups by the day the record was created, so I have a column called: datecreated.
Here's where I get confused with the grouping.
I want to display (in one query) the COUNT of customers and the COUNT of leads for a particular day or date range. I can successfully pull the number for either recordtype C or recordtype L, but that takes 2 queries.
Here's what I have so far:
SELECT COUNT(customerid) AS `count`, datecreated
FROM `tbl_customers`
WHERE `datecreated` BETWEEN '".$startdate."' AND '".$enddate."'
AND `recordtype` = 'C'
GROUP BY `datecreated` ASC
As expected, this displays 2 columns (the count of customer records and the datecreated).
Is there a way to display both in one query, while still grouping by the datecreated column?
You can group by multiple columns.
SELECT COUNT(customerid) AS `count`, datecreated, `recordtype`
FROM `tbl_customers`
WHERE `datecreated` BETWEEN '".$startdate."' AND '".$enddate."'
GROUP BY `datecreated` ASC, `recordtype`
SELECT COUNT(customerid) AS `count`,
datecreated,
SUM(`recordtype` = 'C') AS CountOfC,
SUM(`recordtype` = 'L') AS CountOfL
FROM `tbl_customers`
WHERE `datecreated` BETWEEN '".$startdate."' AND '".$enddate."'
GROUP BY `datecreated` ASC
See Is it possible to count two columns in the same query
There are two solutions, depending on whether you want the two counts in separate rows or in separate columns.
In separate rows:
SELECT datecreated, recordtype, COUNT(*)
FROM tbl_customers
WHERE datecreated BETWEEN '...' AND '...'
GROUP BY datecreated, recordtype
In separate columns (this is called pivoting the table):
SELECT datecreated,
SUM(recordtype = 'C') AS count_customers,
SUM(recordtype = 'L') AS count_leads
FROM tbl_customers
WHERE datecreated BETWEEN '...' AND '...'
GROUP BY datecreated
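The pivot works because a comparison such as recordtype = 'C' evaluates to 1 or 0, so SUM adds up the matches. Here is a small runnable check, using SQLite through Python as a stand-in for MySQL (an assumption; SQLite treats the comparison the same way) with invented sample rows. It also binds the date range as parameters instead of building the string by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_customers (customerid INTEGER, datecreated TEXT, recordtype TEXT)")
conn.executemany(
    "INSERT INTO tbl_customers VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "C"),
        (2, "2024-01-01", "L"),
        (3, "2024-01-01", "C"),
        (4, "2024-01-02", "L"),
        (5, "2024-01-05", "C"),  # outside the requested range
    ],
)

pivot = conn.execute(
    """
    SELECT datecreated,
           SUM(recordtype = 'C') AS count_customers,
           SUM(recordtype = 'L') AS count_leads
    FROM tbl_customers
    WHERE datecreated BETWEEN ? AND ?
    GROUP BY datecreated
    ORDER BY datecreated
    """,
    ("2024-01-01", "2024-01-02"),
).fetchall()
print(pivot)
```

In PHP the equivalent of the bound parameters would be a prepared statement, which avoids quoting problems in the BETWEEN clause.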
Use:
$query = sprintf("SELECT COUNT(c.customerid) AS count,
c.datecreated,
SUM(CASE WHEN c.recordtype = 'C' THEN 1 ELSE 0 END) AS CountOfC,
SUM(CASE WHEN c.recordtype = 'L' THEN 1 ELSE 0 END) AS CountOfL
FROM tbl_customers c
WHERE c.datecreated BETWEEN STR_TO_DATE('%s', '%%Y-%%m-%%d %%H:%%i')
AND STR_TO_DATE('%s', '%%Y-%%m-%%d %%H:%%i')
GROUP BY c.datecreated",
$startdate, $enddate);
You need to fill out the date format - see STR_TO_DATE for details.
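One thing to watch with printf-style templates: the date format itself contains percent signs, and a literal percent has to be written as %% so the formatter does not treat %Y, %m or %d as placeholders. The sketch below shows the effect with Python's % operator, which follows the same %% convention as PHP's sprintf; the dates are made-up sample values, and in real code bound parameters are usually the safer route.

```python
# Only the two %s placeholders are substituted; every %% collapses to a single %.
template = (
    "WHERE c.datecreated BETWEEN STR_TO_DATE('%s', '%%Y-%%m-%%d %%H:%%i') "
    "AND STR_TO_DATE('%s', '%%Y-%%m-%%d %%H:%%i')"
)

clause = template % ("2024-01-01 00:00", "2024-01-31 23:59")
print(clause)
```

After formatting, the clause contains the plain '%Y-%m-%d %H:%i' pattern that STR_TO_DATE expects.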