This is a follow-up question to "MySQL count / track streaks or consecutive dates".
The solution provided by Matt to my earlier question works great, but I'm running into an issue now that I'm dealing with 1 extra column (prod_cond). A product can be used or new, and will be listed under the same prod_id. In these cases the streaks are not calculated correctly anymore.
I created an example here: http://sqlfiddle.com/#!9/3f04c3/17
I haven't been able to get them to display correctly with this additional column.
+----+-----------+---------+-----------+---------------------+
| id | seller_id | prod_id | prod_cond | date                |
+----+-----------+---------+-----------+---------------------+
| 1  | 283       | 4243    | 1         | 2016-10-10 23:55:01 |
| 2  | 283       | 4243    | 2         | 2016-10-10 02:01:06 |
| 3  | 283       | 4243    | 1         | 2016-10-11 23:55:06 |
| 4  | 283       | 4243    | 2         | 2016-10-11 23:55:07 |
| 5  | 283       | 4243    | 1         | 2016-10-12 23:55:07 |
| 6  | 283       | 4243    | 2         | 2016-10-13 23:55:07 |
| 7  | 283       | 4243    | 1         | 2016-10-14 23:55:07 |
| 8  | 283       | 4243    | 2         | 2016-10-14 23:57:06 |
| 9  | 283       | 4243    | 1         | 2016-10-15 23:57:06 |
| 10 | 283       | 4243    | 2         | 2016-10-15 23:57:06 |
+----+-----------+---------+-----------+---------------------+
So basically I need to identify each block of consecutive dates for each seller (seller_id) selling products (prod_id) with product condition (prod_cond) new (1) or used (2).
This is what the result should look like:
+-----------+---------+---------+----------------+--------------+
| seller_id | prod_id | cond_id | streak in days | begin streak |
+-----------+---------+---------+----------------+--------------+
| 283       | 4243    | 1       | 3              | 2016-10-10   |
| 283       | 4243    | 1       | 2              | 2016-10-14   |
| 283       | 4243    | 2       | 2              | 2016-10-10   |
| 283       | 4243    | 2       | 3              | 2016-10-13   |
+-----------+---------+---------+----------------+--------------+
But as you can see here: http://sqlfiddle.com/#!9/3f04c3/17
It is not working correctly.
In MySQL, you would do this using variables:
select seller_id, prod_id, cond_id, count(*) as numdays,
       min(date), max(date)
from (select t.*,
             (@rn := if(@grp = concat_ws(':', seller_id, prod_id, cond_id), @rn + 1,
                        if(@grp := concat_ws(':', seller_id, prod_id, cond_id), @rn + 1, @rn + 1)
                       )
             ) rn
      from transact t cross join
           (select @grp := 0, @rn := '') params
      order by seller_id, prod_id, cond_id, date
     ) t
group by seller_id, prod_id, cond_id, date_sub(date(date), interval rn day)
The idea is that for each group -- based on seller, product, and condition -- the query enumerates the dates. Then, the date minus the enumerated value is constant for consecutive dates.
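To see why "date minus row number" is constant inside a streak, here is a minimal Python sketch of the same idea, run on the sample rows from the question. It is only an illustration, not the MySQL query itself: the function name `streaks` and the tuple layout are assumptions made for this example, and the rows are reduced to one entry per day.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (seller_id, prod_id, prod_cond, day).
rows = [
    (283, 4243, 1, date(2016, 10, 10)),
    (283, 4243, 2, date(2016, 10, 10)),
    (283, 4243, 1, date(2016, 10, 11)),
    (283, 4243, 2, date(2016, 10, 11)),
    (283, 4243, 1, date(2016, 10, 12)),
    (283, 4243, 2, date(2016, 10, 13)),
    (283, 4243, 1, date(2016, 10, 14)),
    (283, 4243, 2, date(2016, 10, 14)),
    (283, 4243, 1, date(2016, 10, 15)),
    (283, 4243, 2, date(2016, 10, 15)),
]

def streaks(rows):
    """Return (seller, prod, cond, streak_days, first_day) per block of consecutive days."""
    result = []
    ordered = sorted(set(rows))  # one row per key and day, sorted like the ORDER BY
    for key, group in groupby(ordered, key=lambda r: r[:3]):
        days = [r[3] for r in group]
        # A day minus its position is the same value for every day of one streak.
        numbered = [(d - timedelta(days=i), d) for i, d in enumerate(days)]
        for _, run in groupby(numbered, key=lambda p: p[0]):
            run_days = [d for _, d in run]
            result.append((*key, len(run_days), run_days[0]))
    return result

for row in streaks(rows):
    print(row)
```

For condition 1 the days 10, 11, 12 all map to the same anchor, while 14 and 15 map to a later one, which is what the GROUP BY on `date_sub(date(date), interval rn day)` relies on.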
Here is a SQL Fiddle showing it working.
I have a table that stores facial login data of employees based upon employee id. I need to get the earliest login for each employee on a day and all other logins to be ignored. I know how to get latest or earliest record for each employee but I am unable to figure out how to get earliest entry in each day by each employee.
+----+-----------+--------------------------------------+-------------+-----------------------+
| id | camera_id | image_name | employee_id | created_at |
+----+-----------+--------------------------------------+-------------+-----------------------+
| 10 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-11 10:40:20 |
| 11 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-11 10:40:22 |
| 14 | 2 | 3p74yrq35nfaazwdo8auguvn2h5hpugtfvvw | 2 | 2020-07-11 12:07:24 |
| 15 | 2 | hpa2am40ufke7o7q2y733hh83h7ykxxdgkof | 16 | 2020-07-11 12:09:35 |
| 16 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-11 12:09:41 |
| 17 | 2 | tapufkiuj5toxfdoikjicbe3k7tl32yj5khp | 16 | 2020-07-12 12:09:47 |
| 18 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-12 14:40:20 |
| 19 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-12 15:40:22 |
| 20 | 2 | 3p74yrq35nfaazwdo8auguvn2h5hpugtfvvw | 2 | 2020-07-12 16:07:24 |
| 21 | 2 | hpa2am40ufke7o7q2y733hh83h7ykxxdgkof | 16 | 2020-07-12 17:09:35 |
| 22 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-13 12:09:41 |
+----+-----------+--------------------------------------+-------------+-----------------------+
The result should look like this:
+----+-----------+--------------------------------------+-------------+-----------------------+
| id | camera_id | image_name | employee_id | created_at |
+----+-----------+--------------------------------------+-------------+-----------------------+
| 10 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-11 10:40:20 |
| 11 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-11 10:40:22 |
| 17 | 2 | tapufkiuj5toxfdoikjicbe3k7tl32yj5khp | 16 | 2020-07-12 12:09:47 |
| 19 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-12 15:40:22 |
| 22 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-13 12:09:41 |
+----+-----------+--------------------------------------+-------------+-----------------------+
You can do:
select *
from t
where (employee_id, created_at) in (
select employee_id, min(created_at)
from t
group by employee_id, date(created_at)
)
how to get earliest entry in each day by each employee
You can filter with a correlated subquery:
select t.*
from mytable t
where t.created_at = (
select min(t1.created_at)
from mytable t1
where
t1.employee_id = t.employee_id
and t1.created_at >= date(t.created_at)
and t1.created_at < date(t.created_at) + interval 1 day
)
This query would take advantage of an index on (employee_id, created_at).
Or, if you are running MySQL 8.0, you can use window functions:
select *
from (
select
t.*,
row_number() over(
partition by employee_id, date(created_at)
order by created_at
) rn
from mytable t
) t
where rn = 1
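All three queries share the same logic: group the rows by employee and calendar day, then keep the row with the smallest timestamp in each group. A small Python sketch of that logic, using a subset of the sample rows, may make it easier to check by hand. The function name `earliest_per_day` and the reduced column set are assumptions for this example.

```python
from datetime import datetime

# A few (id, employee_id, created_at) rows taken from the question's sample.
logins = [
    (10, 16, "2020-07-11 10:40:20"),
    (11, 2, "2020-07-11 10:40:22"),
    (14, 2, "2020-07-11 12:07:24"),
    (15, 16, "2020-07-11 12:09:35"),
    (17, 16, "2020-07-12 12:09:47"),
    (18, 16, "2020-07-12 14:40:20"),
    (19, 2, "2020-07-12 15:40:22"),
    (22, 2, "2020-07-13 12:09:41"),
]

def earliest_per_day(rows):
    """Keep, for each (employee, calendar day), the id of the earliest row."""
    best = {}
    for row_id, employee_id, created_at in rows:
        ts = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S")
        key = (employee_id, ts.date())  # same grouping as employee_id, date(created_at)
        if key not in best or ts < best[key][1]:
            best[key] = (row_id, ts)
    return sorted(row_id for row_id, _ in best.values())

print(earliest_per_day(logins))
```

The ids it keeps (10, 11, 17, 19, 22) are the same rows shown in the expected result of the question.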
I have a table in an old version of MySQL 5.x like this:
+---------+------------+------------+
| Task_ID | Start_Date | End_Date |
+---------+------------+------------+
| 1 | 2015-10-15 | 2015-10-16 |
| 2 | 2015-10-17 | 2015-10-18 |
| 3 | 2015-10-19 | 2015-10-20 |
| 4 | 2015-10-21 | 2015-10-22 |
| 5 | 2015-11-01 | 2015-11-02 |
| 6 | 2015-11-17 | 2015-11-18 |
| 7 | 2015-10-11 | 2015-10-12 |
| 8 | 2015-10-12 | 2015-10-13 |
| 9 | 2015-11-11 | 2015-11-12 |
| 10 | 2015-11-12 | 2015-11-13 |
| 11 | 2015-10-01 | 2015-10-02 |
| 12 | 2015-10-02 | 2015-10-03 |
| 13 | 2015-10-03 | 2015-10-04 |
| 14 | 2015-10-04 | 2015-10-05 |
| 15 | 2015-11-04 | 2015-11-05 |
| 16 | 2015-11-05 | 2015-11-06 |
| 17 | 2015-11-06 | 2015-11-07 |
| 18 | 2015-11-07 | 2015-11-08 |
| 19 | 2015-10-25 | 2015-10-26 |
| 20 | 2015-10-26 | 2015-10-27 |
| 21 | 2015-10-27 | 2015-10-28 |
| 22 | 2015-10-28 | 2015-10-29 |
| 23 | 2015-10-29 | 2015-10-30 |
| 24 | 2015-10-30 | 2015-10-31 |
+---------+------------+------------+
If the End_Date of the tasks are consecutive,
then they are part of the same project.
I am interested in finding the total number of different projects completed.
If there is more than one project that have the same number of completion days,
then order by the Start_Date of the project.
For this few sample records the expected output would be:
2015-10-15 2015-10-16
2015-10-17 2015-10-18
2015-10-19 2015-10-20
2015-10-21 2015-10-22
2015-11-01 2015-11-02
2015-11-17 2015-11-18
2015-10-11 2015-10-13
2015-11-11 2015-11-13
2015-10-01 2015-10-05
2015-11-04 2015-11-08
2015-10-25 2015-10-31
I am a bit jammed with this.
I would really appreciate any help. Thanks.
Following query should work:
select tmp.projectid, date_sub(max(tmp.ed2), interval max(tmp.projectdays) day) start_date,
       max(tmp.ed2) end_date,
       max(tmp.projectdays) No_Of_ProjectDays
from
(
    select t1.task_id tid1, t1.start_date sd1, t1.end_date ed1,
           t2.task_id tid2, t2.start_date sd2, t2.end_date ed2,
           case when datediff(t2.start_date, ifnull(t1.start_date,'1000-01-01')) != 1
                then (@pid := @pid + 1)
                else (@pid := @pid)
           end as ProjectId,
           case when datediff(t2.start_date, ifnull(t1.start_date,'1000-01-01')) != 1
                then (@pdays := 1)
                else (@pdays := @pdays + 1)
           end as ProjectDays
    from tasks t1 right join tasks t2
         on t2.task_id = t1.task_id + 1
    cross join (select @pid := 1, @pdays := 1) vars
) tmp
group by tmp.projectid
order by max(tmp.projectdays), start_date
Please find the Demo here.
EDIT : I have made changes in the query and link according to new data sample. Please have a look.
This answers -- and answers correctly -- the original version of this question.
Hmmmm . . . I think you can use variables. The simplest way is to generate a sequential number and then subtract this value to get a constant for adjacent rows from the date:
select min(start_date), max(end_date)
from (select t.*, (@rn := @rn + 1) as rn
      from (select t.* from tasks t order by end_date) t cross join
           (select @rn := 0) params
     ) t
group by (end_date - interval rn day);
Here is a db<>fiddle.
It's a slightly tricky problem, but the query below works.
It builds two derived tables from the Projects table: one holding every Start_Date that does not appear as an End_Date (the first day of a project),
and one holding every End_Date that does not appear as a Start_Date (the last day of a project).
It then pairs each such Start_Date with the End_Dates later than it (WHERE Start_Date < End_Date), groups by Start_Date,
and takes MIN(End_Date), which is the end of the project that begins on that Start_Date.
DATEDIFF(MIN(End_Date), Start_Date) gives project_duration, which is then used for ordering.
SELECT Start_Date, MIN(End_Date) AS End_Date, DATEDIFF(MIN(End_Date), Start_Date) AS project_duration
FROM
(SELECT Start_Date FROM Projects WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) a,
(SELECT End_Date FROM Projects WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) b
WHERE Start_Date < End_Date
GROUP BY Start_Date
ORDER BY project_duration ASC, Start_Date ASC;
expected output
+------------+------------+------------------+
| Start_Date | End_Date   | project_duration |
+------------+------------+------------------+
| 2015-10-15 | 2015-10-16 | 1                |
| 2015-10-17 | 2015-10-18 | 1                |
| 2015-10-19 | 2015-10-20 | 1                |
| 2015-10-21 | 2015-10-22 | 1                |
| 2015-11-01 | 2015-11-02 | 1                |
| 2015-11-17 | 2015-11-18 | 1                |
| 2015-10-11 | 2015-10-13 | 2                |
| 2015-11-11 | 2015-11-13 | 2                |
| 2015-10-01 | 2015-10-05 | 4                |
| 2015-11-04 | 2015-11-08 | 4                |
| 2015-10-25 | 2015-10-31 | 6                |
+------------+------------+------------------+
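The NOT IN approach can also be traced in a few lines of Python, which may help when checking the query by hand. This is only a sketch under the same assumption the query makes (tasks of one project chain so that each End_Date equals the next Start_Date). The function name `projects` is made up for this example, and only a subset of the sample rows is used.

```python
from datetime import date

# (Start_Date, End_Date) pairs: a subset of the question's sample rows.
tasks = [
    (date(2015, 10, 15), date(2015, 10, 16)),
    (date(2015, 10, 17), date(2015, 10, 18)),
    (date(2015, 10, 11), date(2015, 10, 12)),
    (date(2015, 10, 12), date(2015, 10, 13)),
    (date(2015, 10, 1), date(2015, 10, 2)),
    (date(2015, 10, 2), date(2015, 10, 3)),
    (date(2015, 10, 3), date(2015, 10, 4)),
    (date(2015, 10, 4), date(2015, 10, 5)),
]

def projects(tasks):
    """Return (start, end, duration_days) per project, ordered by duration then start."""
    starts = {s for s, _ in tasks}
    ends = {e for _, e in tasks}
    project_starts = starts - ends  # Start_Date NOT IN (SELECT End_Date ...)
    project_ends = ends - starts    # End_Date NOT IN (SELECT Start_Date ...)
    result = []
    for s in project_starts:
        # MIN(End_Date) among the project ends that lie after this start
        e = min(end for end in project_ends if end > s)
        result.append((s, e, (e - s).days))
    return sorted(result, key=lambda p: (p[2], p[0]))

for start, end, days in projects(tasks):
    print(start, end, days)
```

For this subset it yields two one-day projects, then 2015-10-11 to 2015-10-13, then 2015-10-01 to 2015-10-05, in the same order as the corresponding rows of the table above.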
I have the following table below.
The timeStamp is the moment that the status began.
Some rows don't indicate a status change (like the second row), so they add no new information and can be ignored.
I would like to calculate (using MySQL 5.7) the total amount of time spent in each status.
| timeStamp           | status |
|---------------------|--------|
| 2019-12-10 14:00:00 | 1      |
| 2019-12-10 14:10:00 | 1      | // this row could be ignored
| 2019-12-11 14:00:00 | 2      | // 24 more hours in status 1
| 2019-12-11 14:10:00 | 2      |
| 2019-12-12 14:00:00 | 1      | // 24 more hours in status 2
| 2019-12-14 14:00:00 | 2      | // 48 more hours in status 1
| 2019-12-16 14:10:00 | 2      |
| 2019-12-17 14:20:00 | 2      |
| 2019-12-18 14:00:00 | 3      | // 96 more hours in status 2
| 2019-12-19 14:00:00 | 1      | // 24 more hours in status 3
I would like to see a result table like the one below.
| status | amount_of_time |
|-------------------------|
| 1 | 72 hours |
| 2 | 120 hours |
| 3 | 24 hours |
What complicates this is that the statuses don't occur in order: the sequence is not 1, 2, 3.
In the example above it is 1, 2, 1, 2, 3, 1, so I can't simply use the MIN timestamp per status.
Get the timestamp of the following row in a subquery and calculate the difference to the timestamp of the current row:
select t1.status, timestampdiff(second,
t1.timeStamp,
(
select min(t2.timeStamp)
from mytable t2
where t2.timeStamp > t1.timeStamp
)
) as diff
from mytable t1;
This will return:
| status | diff |
| ------ | ------ |
| 1 | 600 |
| 1 | 86400 |
| 2 | 600 |
| 2 | 85800 |
| 1 | 172800 |
| 2 | 173400 |
| 2 | 87000 |
| 2 | 85200 |
| 3 | 86400 |
| 1 | NULL |
View on DB Fiddle
From here it's just a matter of GROUP BY and SUM:
select status, sum(diff) as duratation_in_seconds
from (
select t1.status, timestampdiff(second,
t1.timeStamp,
(
select min(t2.timeStamp)
from mytable t2
where t2.timeStamp > t1.timeStamp
)
) as diff
from mytable t1
) x
group by status;
Result:
| status | duratation_in_seconds |
| ------ | --------------------- |
| 1 | 259800 |
| 2 | 432000 |
| 3 | 86400 |
View on DB Fiddle
If you want the time in hours, change the first line to
select status, round(sum(diff)/3600) as duratation_in_hours
and you will get:
| status | duratation_in_hours |
| ------ | ------------------- |
| 1 | 72 |
| 2 | 120 |
| 3 | 24 |
View on DB Fiddle
You might though want to use floor() instead of round(). That's not clear from your question.
In MySQL 8 you could use the LEAD() window function to get the timestamp of the next row:
select status, sum(diff) as duratation_in_seconds
from (
select
status,
timestampdiff(second, timeStamp, lead(timeStamp) over (order by timeStamp)) as diff
from mytable
) x
group by status;
View on DB Fiddle
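The "difference to the next row" idea can be checked outside the database with a short Python sketch. It uses the rows from the question, assuming the third timestamp is 2019-12-11 14:00:00 (as its "24 hours in status 1" comment implies). The function name `hours_per_status` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# (timeStamp, status) rows from the question; the third timestamp is an assumption.
rows = [
    ("2019-12-10 14:00:00", 1),
    ("2019-12-10 14:10:00", 1),
    ("2019-12-11 14:00:00", 2),
    ("2019-12-11 14:10:00", 2),
    ("2019-12-12 14:00:00", 1),
    ("2019-12-14 14:00:00", 2),
    ("2019-12-16 14:10:00", 2),
    ("2019-12-17 14:20:00", 2),
    ("2019-12-18 14:00:00", 3),
    ("2019-12-19 14:00:00", 1),
]

def hours_per_status(rows):
    """Sum, per status, the time from each row to the next row in timestamp order."""
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), status) for ts, status in rows
    )
    seconds = defaultdict(float)
    # Pair each row with its successor, like LEAD(timeStamp) OVER (ORDER BY timeStamp).
    for (ts, status), (next_ts, _) in zip(parsed, parsed[1:]):
        seconds[status] += (next_ts - ts).total_seconds()
    return {status: total / 3600 for status, total in sorted(seconds.items())}

print(hours_per_status(rows))
```

The last row has no successor and contributes nothing, which corresponds to the NULL diff in the SQL version. Rows that repeat the current status need no special handling, because their partial durations simply add up.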
I'm wondering how to select the second smallest value from a mysql table, grouped on a non-numeric column. If I have a table that looks like this:
+----+----------+------------+--------+------------+
| id | customer | order_type | amount | created_dt |
+----+----------+------------+--------+------------+
| 1 | 1 | web | 5 | 2017-01-01 |
| 2 | 1 | web | 7 | 2017-01-05 |
| 3 | 2 | web | 2 | 2017-01-07 |
| 4 | 3 | web | 2 | 2017-02-01 |
| 5 | 3 | web | 3 | 2017-02-01 |
| 6 | 2 | web | 5 | 2017-03-15 |
| 7 | 1 | in_person | 7 | 2017-02-01 |
| 8 | 3 | web | 8 | 2017-01-01 |
| 9 | 2 | web | 1 | 2017-04-01 |
+----+----------+------------+--------+------------+
I want to count the number of second orders in each month/year. I also have a customer table (which is where the customer ids come from). I can find the number of customers with at least 2 orders, grouped by the customer's created date, by querying
select date(c.created_dt) as create_date, count(c.id)
from customer c
where c.id in
    (select o1.customer
     from orders o1
     where
         (select count(o.created_dt)
          from orders o
          where o1.customer = o.customer and o.order_type in ('web')
         ) > 1
    )
group by 1;
However, that result groups customers by their created date, and I can't seem to figure out how to find the number of second orders by date.
The desired output I'd like to see, based on the data above, is:
+-------+------+---------------+
| month | year | second_orders |
+-------+------+---------------+
| 1 | 2017 | 1 |
| 2 | 2017 | 1 |
| 3 | 2017 | 1 |
+-------+------+---------------+
One way to approach this
SELECT YEAR(created_dt) year, MONTH(created_dt) month, COUNT(*) second_orders
FROM (
    SELECT created_dt,
           @rn := IF(@c = customer, @rn + 1, 1) rn,
           @c := customer
    FROM orders CROSS JOIN (
        SELECT @c := NULL, @rn := 1
    ) i
    WHERE order_type = 'web'
    ORDER BY customer, id
) q
WHERE rn = 2
GROUP BY YEAR(created_dt), MONTH(created_dt)
ORDER BY year, month
Here is a dbfiddle demo
Output:
+------+-------+---------------+
| year | month | second_orders |
+------+-------+---------------+
| 2017 | 1 | 1 |
| 2017 | 2 | 1 |
| 2017 | 3 | 1 |
+------+-------+---------------+
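The numbering logic of the query (number each customer's web orders by id, keep the one numbered 2, then count per month) can be traced with a small Python sketch on the sample data. The function name `second_orders_by_month` is an assumption for this example. Like the query, it orders by id rather than by created_dt.

```python
from collections import Counter, defaultdict

# (id, customer, order_type, created_dt) rows from the question.
orders = [
    (1, 1, "web", "2017-01-01"),
    (2, 1, "web", "2017-01-05"),
    (3, 2, "web", "2017-01-07"),
    (4, 3, "web", "2017-02-01"),
    (5, 3, "web", "2017-02-01"),
    (6, 2, "web", "2017-03-15"),
    (7, 1, "in_person", "2017-02-01"),
    (8, 3, "web", "2017-01-01"),
    (9, 2, "web", "2017-04-01"),
]

def second_orders_by_month(orders):
    """Count, per (year, month), the orders that are a customer's second web order."""
    per_customer = defaultdict(list)
    # Sorting the tuples sorts by id first, matching ORDER BY customer, id per customer.
    for order_id, customer, order_type, created_dt in sorted(orders):
        if order_type == "web":
            per_customer[customer].append(created_dt)
    counts = Counter()
    for dates in per_customer.values():
        if len(dates) >= 2:
            year, month, _ = dates[1].split("-")  # the row that would get rn = 2
            counts[(int(year), int(month))] += 1
    return sorted(counts.items())

print(second_orders_by_month(orders))
```

Customer 3 shows why the ordering column matters: by id the second order falls in February, whereas ordering by created_dt would put order 8 (2017-01-01) first.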
+----+-----------+---------+---------------------+
| id | seller_id | prod_id | date                |
+----+-----------+---------+---------------------+
| 1  | 283       | 4243    | 2016-10-10 23:55:01 |
| 2  | 287       | 4243    | 2016-10-10 02:01:06 |
| 3  | 283       | 4243    | 2016-10-11 23:55:06 |
| 4  | 311       | 4243    | 2016-10-11 23:55:07 |
| 5  | 283       | 4243    | 2016-10-12 23:55:07 |
| 6  | 283       | 4243    | 2016-10-13 23:55:07 |
| 7  | 311       | 4243    | 2016-10-13 23:55:07 |
| 8  | 287       | 4243    | 2016-10-14 23:57:06 |
| 9  | 311       | 4243    | 2016-10-14 23:57:06 |
| 10 | 311       | 4243    | 2016-10-15 23:57:06 |
+----+-----------+---------+---------------------+
From the table above, how would I extract the following information using a MySQL query?
+-----------+---------+----------------+--------------+
| seller_id | prod_id | streak in days | begin streak |
+-----------+---------+----------------+--------------+
| 283       | 4243    | 4              | 2016-10-10   |
| 287       | 4243    | 1              | 2016-10-10   |
| 311       | 4243    | 1              | 2016-10-11   |
| 311       | 4243    | 3              | 2016-10-13   |
| 287       | 4243    | 1              | 2016-10-14   |
+-----------+---------+----------------+--------------+
So basically I need to identify each block of consecutive dates for each seller (seller_id) selling products (prod_id).
I limited this example to 1 prod_id and only a range of a few days, but sellers do sell more than 1 product (prod_id)
SELECT
    seller_id
    ,prod_id
    ,COUNT(*) as StreakInDays
    ,MIN(DateCol) as BeginStreak
FROM
    (
    SELECT
        seller_id
        ,prod_id
        ,DATE(DateCol) as DateCol
        ,(@rn:= if((@seller = seller_id) AND (@prod = prod_id), @rn + 1,
                   if((@seller:= seller_id) AND (@prod:= prod_id), 1, 1)
                  )
         ) as RowNumber
    FROM
        Transact t
        CROSS JOIN (SELECT @seller:=0, @prod:=0, @rn:=0) var
    ORDER BY
        seller_id
        ,prod_id
        ,DATE(DateCol)
    ) t
GROUP BY
    seller_id
    ,prod_id
    ,DATE_SUB(DateCol, INTERVAL RowNumber Day)
ORDER BY
    prod_id
    ,DATE_SUB(DateCol, INTERVAL RowNumber Day)
    ,seller_id
Generate a row number partitioned by seller_id and prod_id. Then use Date - RowNumber as a grouping key, and you can get to your answer by simple aggregation.
SQL Fiddle to show you it works for multiple products, sellers etc. http://sqlfiddle.com/#!9/0a0c44/8/0
Note: if the same seller can have more than 1 transaction for a product on the same day, the duplicate days would break the row numbering. In that case, replace Transact with a derived table of DISTINCT seller_id, prod_id, DATE(date) before generating the row number, like this:
SELECT
    seller_id
    ,prod_id
    ,COUNT(*) as StreakInDays
    ,MIN(DateCol) as BeginStreak
FROM
    (
    SELECT
        seller_id
        ,prod_id
        ,DateCol
        ,(@rn:= if((@seller = seller_id) AND (@prod = prod_id), @rn + 1,
                   if((@seller:= seller_id) AND (@prod:= prod_id), 1, 1)
                  )
         ) as RowNumber
    FROM
        (SELECT DISTINCT seller_id, prod_id, DATE(DateCol) as DateCol
         FROM
            Transact
        ) t
        CROSS JOIN (SELECT @seller:=0, @prod:=0, @rn:=0) var
    ORDER BY
        seller_id
        ,prod_id
        ,DateCol
    ) t
GROUP BY
    seller_id
    ,prod_id
    ,DATE_SUB(DateCol, INTERVAL RowNumber Day)
ORDER BY
    prod_id
    ,DATE_SUB(DateCol, INTERVAL RowNumber Day)
    ,seller_id
http://sqlfiddle.com/#!9/0a0c44/11
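To illustrate why the DISTINCT step matters, here is a rough Python sketch of the same row-number technique with and without deduplication. The transactions are hypothetical (seller 283 sells product 4243 twice on 2016-10-11), and the function name `streaks` and its `dedupe` flag are made up for this example.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical transactions with two sales on 2016-10-11.
transactions = [
    (283, 4243, datetime(2016, 10, 10, 23, 55, 1)),
    (283, 4243, datetime(2016, 10, 11, 9, 0, 0)),
    (283, 4243, datetime(2016, 10, 11, 23, 55, 6)),
    (283, 4243, datetime(2016, 10, 12, 23, 55, 7)),
]

def streaks(transactions, dedupe=True):
    """Return (seller, prod, streak_days, first_day) using day minus row number."""
    rows = [(s, p, ts.date()) for s, p, ts in transactions]
    if dedupe:
        rows = set(rows)  # like SELECT DISTINCT seller_id, prod_id, DATE(DateCol)
    result = []
    for key, group in groupby(sorted(rows), key=lambda r: r[:2]):
        days = [r[2] for r in group]
        # Row numbers restart at 1 for each (seller, product) pair.
        anchors = [(d - timedelta(days=i), d) for i, d in enumerate(days, start=1)]
        for _, run in groupby(anchors, key=lambda a: a[0]):
            run_days = [d for _, d in run]
            result.append((*key, len(run_days), min(run_days)))
    return result

print(streaks(transactions, dedupe=True))
print(streaks(transactions, dedupe=False))
```

With deduplication the three days form one streak of 3. Without it, the repeated day shifts the row numbers, and the single streak is split into two groups of 2 rows each.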