MySQL count rows with similar timestamp - mysql

Is there any way to count a given run of timestamps that are close to each other, but not necessarily in a fixed time frame?
I.e., not grouped by hour or minute, but rather grouped by how close the current row's timestamp is to the next row's timestamp. If the next row is within "x" seconds/minutes, add that row to the group; otherwise start a new group.
Given this data:
+----+---------+---------------------+
| id | item_id | event_date |
+----+---------+---------------------+
| 1 | 1 | 2013-05-17 11:59:59 |
| 2 | 1 | 2013-05-17 12:00:00 |
| 3 | 1 | 2013-05-17 12:00:02 |
| 4 | 1 | 2013-05-17 12:00:03 |
| 5 | 3 | 2013-05-17 14:05:00 |
| 6 | 3 | 2013-05-17 14:05:01 |
| 7 | 3 | 2013-05-17 15:30:00 |
| 8 | 3 | 2013-05-17 15:30:01 |
| 9 | 3 | 2013-05-17 15:30:02 |
| 10 | 1 | 2013-05-18 09:12:00 |
| 11 | 1 | 2013-05-18 09:13:30 |
| 12 | 1 | 2013-05-18 09:13:45 |
| 13 | 1 | 2013-05-18 09:14:00 |
| 14 | 2 | 2013-05-20 15:45:00 |
| 15 | 2 | 2013-05-20 15:45:03 |
| 16 | 2 | 2013-05-20 15:45:10 |
| 17 | 2 | 2013-05-23 07:36:00 |
| 18 | 2 | 2013-05-23 07:36:10 |
| 19 | 2 | 2013-05-23 07:36:12 |
| 20 | 2 | 2013-05-23 07:36:15 |
| 21 | 1 | 2013-05-24 11:55:00 |
| 22 | 1 | 2013-05-24 11:55:02 |
+----+---------+---------------------+
Desired Results:
+---------+-------+---------------------+
| item_id | total | last_date_in_group |
+---------+-------+---------------------+
| 1 | 4 | 2013-05-17 12:00:03 |
| 3 | 2 | 2013-05-17 14:05:01 |
| 3 | 3 | 2013-05-17 15:30:02 |
| 1 | 4 | 2013-05-18 09:14:00 |
| 2 | 3 | 2013-05-20 15:45:10 |
| 2 | 4 | 2013-05-23 07:36:15 |
| 1 | 2 | 2013-05-24 11:55:02 |
+---------+-------+---------------------+

This is a little complicated. To start, you need the time of the next event for each record. The following subquery adds such a time (nexted) if it is within bounds:
select t.*,
(select event_date
from t t2
where t2.item_id = t.item_id and
t2.event_date > t.event_date and
<date comparison here>
order by event_date limit 1
) as nexted
from t
This uses a correlated subquery. The <date comparison here> is for whatever date comparison you want. When there is no record, the value will be NULL.
Now, with this information (nexted) there is a trick to get the grouping. For any record, it is the first event time afterwards where nexted is NULL. This will be the last event in the series. Unfortunately, this requires two levels of nested correlated subqueries (or joins with aggregations). The result looks a bit unwieldy:
-- grp is the end of the run: the first event at or after this row
-- (same item) that has no close-enough successor (nexted is null).
select item_id, grp, MIN(event_date) as start_date, MAX(event_date) as end_date,
COUNT(*) as num_dates
from (select t.*,
(select min(t3.event_date)
from (select t1.*,
(select event_date
from t t2
where t2.item_id = t1.item_id and
t2.event_date > t1.event_date and
<date comparison here>
order by event_date limit 1
) as nexted
from t t1
) t3
where t3.nexted is null and
t3.item_id = t.item_id and
t3.event_date >= t.event_date
) as grp
from t
) s
group by item_id, grp;
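If window functions are available (MySQL 8.0+), the same grouping can be written without nested correlated subqueries. This is a different technique from the one above: flag each row that starts a new run, then take a running sum of the flags to number the runs. Below is a minimal sketch that runs the idea against the question's sample data using Python's built-in sqlite3 module as a stand-in for MySQL (assuming SQLite 3.25 or newer). The table name events, the helper count_runs, and the 300-second gap are assumptions for illustration; in MySQL the time difference would be computed with TIMESTAMPDIFF(SECOND, ...) instead of strftime.

```python
import sqlite3

# Sample rows from the question: (id, item_id, event_date).
ROWS = [
    (1, 1, "2013-05-17 11:59:59"), (2, 1, "2013-05-17 12:00:00"),
    (3, 1, "2013-05-17 12:00:02"), (4, 1, "2013-05-17 12:00:03"),
    (5, 3, "2013-05-17 14:05:00"), (6, 3, "2013-05-17 14:05:01"),
    (7, 3, "2013-05-17 15:30:00"), (8, 3, "2013-05-17 15:30:01"),
    (9, 3, "2013-05-17 15:30:02"), (10, 1, "2013-05-18 09:12:00"),
    (11, 1, "2013-05-18 09:13:30"), (12, 1, "2013-05-18 09:13:45"),
    (13, 1, "2013-05-18 09:14:00"), (14, 2, "2013-05-20 15:45:00"),
    (15, 2, "2013-05-20 15:45:03"), (16, 2, "2013-05-20 15:45:10"),
    (17, 2, "2013-05-23 07:36:00"), (18, 2, "2013-05-23 07:36:10"),
    (19, 2, "2013-05-23 07:36:12"), (20, 2, "2013-05-23 07:36:15"),
    (21, 1, "2013-05-24 11:55:00"), (22, 1, "2013-05-24 11:55:02"),
]

QUERY = """
WITH flagged AS (
    -- is_start = 1 when the row has no predecessor for its item,
    -- or the predecessor is further away than the allowed gap.
    SELECT id, item_id, event_date,
           CASE WHEN CAST(strftime('%s', event_date) AS INTEGER)
                   - CAST(strftime('%s', LAG(event_date) OVER (
                         PARTITION BY item_id ORDER BY event_date, id)) AS INTEGER)
                   <= :gap
                THEN 0 ELSE 1 END AS is_start
    FROM events
),
numbered AS (
    -- A running sum of the start flags numbers the runs within each item.
    SELECT item_id, event_date,
           SUM(is_start) OVER (
               PARTITION BY item_id ORDER BY event_date, id) AS run_no
    FROM flagged
)
SELECT item_id, COUNT(*) AS total, MAX(event_date) AS last_date_in_group
FROM numbered
GROUP BY item_id, run_no
ORDER BY last_date_in_group
"""


def count_runs(gap_seconds=300):
    """Return (item_id, total, last_date_in_group) for each run of close rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, item_id INTEGER, event_date TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, {"gap": gap_seconds}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in count_runs(300):
        print(row)
```

With a 300-second gap this should reproduce the seven rows of the desired results. Unlike a fixed window measured from the first row of a run, the comparison is always against the immediately preceding row, so a long chain of closely spaced events stays in one group.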

What about finding each individual record's local associations, and then grouping on the max event date found for each record? This is based on a static time interval (5 minutes in my example):
SELECT item_id, MAX(total) AS total, last_date_in_group FROM (
SELECT t1.item_id, COUNT(*) AS total, COALESCE(GREATEST(t1.event_date, MAX(t2.event_date)), t1.event_date) AS last_date_in_group
FROM table_name t1
-- only rows of the same item count as neighbours
LEFT JOIN table_name t2 ON t2.item_id = t1.item_id AND t2.event_date BETWEEN t1.event_date AND t1.event_date + INTERVAL 5 MINUTE
GROUP BY t1.id, t1.item_id, t1.event_date
) t
GROUP BY item_id, last_date_in_group

Related

SQL - Select records that their columns do not follow the same order

Given the following table, where the series number and the date should increase together:
+----+--------+------------+
| id | series | date |
+----+--------+------------+
| 1 | 10 | 2020-08-13 |
| 2 | 9 | 2020-08-02 |
| 3 | 8 | 2020-06-23 |
| 4 | 7 | 2020-06-08 |
| 5 | 6 | 2020-05-20 |
| 6 | 5 | 2020-05-05 |
| 7 | 4 | 2020-05-01 |
+----+--------+------------+
Is there a way to check whether there are records that do not follow this pattern?
For example, row 2 has a bigger series number than row 3, but its date is earlier:
+----+--------+------------+
| id | series | date |
+----+--------+------------+
| 1 | 10 | 2020-08-13 |
| 2 | 9 | 2020-06-02 |
| 3 | 8 | 2020-07-23 |
| 4 | 7 | 2020-06-08 |
| 5 | 6 | 2020-05-20 |
| 6 | 5 | 2020-05-05 |
| 7 | 4 | 2020-05-01 |
+----+--------+------------+
You can use window functions:
select *
from (
select t.*, lead(date) over(order by series) lead_date
from mytable t
) t
where date > lead_date
Alternatively:
select *
from (
select t.*, lead(series) over(order by date) lead_series
from mytable t
) t
where series > lead_series
You can use lag():
select t.*
from (select t.*,
lag(id) over (order by series) as prev_id_series,
lag(id) over (order by date) as prev_id_date
from t
) t
where prev_id_series <> prev_id_date;
You can fetch problematic rows and their corresponding conflicting rows using SELF JOIN like this (assuming your table is called "series"):
SELECT s1.id AS row_id, s1.series AS row_series, s1.date AS row_date,
s2.id AS conflict_id, s2.series AS conflict_series, s2.date AS conflict_date
FROM series AS s1
JOIN series AS s2
ON s1.series > s2.series AND s1.date < s2.date;
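To see what the first window-function check returns on the second sample table, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (assuming SQLite 3.25 or newer for LEAD). The helper name out_of_order_rows is hypothetical; the table and column names follow the answer above.

```python
import sqlite3

# Second sample table from the question: (id, series, date).
ROWS = [
    (1, 10, "2020-08-13"), (2, 9, "2020-06-02"), (3, 8, "2020-07-23"),
    (4, 7, "2020-06-08"), (5, 6, "2020-05-20"), (6, 5, "2020-05-05"),
    (7, 4, "2020-05-01"),
]

# A row is out of order when its date is later than the date of the
# row with the next higher series number.
QUERY = """
SELECT id, series, date, lead_date
FROM (
    SELECT t.*, LEAD(date) OVER (ORDER BY series) AS lead_date
    FROM mytable t
) t
WHERE date > lead_date
"""


def out_of_order_rows():
    """Return the rows whose date is later than that of the next series."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, series INTEGER, date TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(out_of_order_rows())
```

On this data the check flags row 3 (series 8), whose date is later than that of series 9. Note that it reports only one side of each conflicting pair; the self-join version lists both rows of every conflict.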

Select complete record with earliest timestamp on a day for each employee [duplicate]

I have a table that stores facial login data of employees, keyed by employee id. I need the earliest login for each employee on each day, ignoring all other logins. I know how to get the latest or earliest record for each employee, but I can't figure out how to get the earliest entry per day for each employee.
+----+-----------+--------------------------------------+-------------+-----------------------+
| id | camera_id | image_name | employee_id | created_at |
+----+-----------+--------------------------------------+-------------+-----------------------+
| 10 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-11 10:40:20 |
| 11 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-11 10:40:22 |
| 14 | 2 | 3p74yrq35nfaazwdo8auguvn2h5hpugtfvvw | 2 | 2020-07-11 12:07:24 |
| 15 | 2 | hpa2am40ufke7o7q2y733hh83h7ykxxdgkof | 16 | 2020-07-11 12:09:35 |
| 16 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-11 12:09:41 |
| 17 | 2 | tapufkiuj5toxfdoikjicbe3k7tl32yj5khp | 16 | 2020-07-12 12:09:47 |
| 18 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-12 14:40:20 |
| 19 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-12 15:40:22 |
| 20 | 2 | 3p74yrq35nfaazwdo8auguvn2h5hpugtfvvw | 2 | 2020-07-12 16:07:24 |
| 21 | 2 | hpa2am40ufke7o7q2y733hh83h7ykxxdgkof | 16 | 2020-07-12 17:09:35 |
| 22 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-13 12:09:41 |
+----+-----------+--------------------------------------+-------------+-----------------------+
The result should look like this:
+----+-----------+--------------------------------------+-------------+-----------------------+
| id | camera_id | image_name | employee_id | created_at |
+----+-----------+--------------------------------------+-------------+-----------------------+
| 10 | 2 | pjcc7vf142pec6li7k8kqxuqvnmhm0tyo8ib | 16 | 2020-07-11 10:40:20 |
| 11 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-11 10:40:22 |
| 17 | 2 | tapufkiuj5toxfdoikjicbe3k7tl32yj5khp | 16 | 2020-07-12 12:09:47 |
| 19 | 2 | 9iizfdtk3m81a745ut7tzqzqh8kf9ipz2u02 | 2 | 2020-07-12 15:40:22 |
| 22 | 2 | g7adgyzloab2t4z7xx2id0a9cjqx8ojfni99 | 2 | 2020-07-13 12:09:41 |
+----+-----------+--------------------------------------+-------------+-----------------------+
You can do:
select *
from t
where (employee_id, created_at) in (
select employee_id, min(created_at)
from t
group by employee_id, date(created_at)
)
To get the earliest entry on each day for each employee, you can filter with a correlated subquery:
select t.*
from mytable t
where t.created_at = (
select min(t1.created_at)
from mytable t1
where
t1.employee_id = t.employee_id
and t1.created_at >= date(t.created_at)
and t1.created_at < date(t.created_at) + interval 1 day
)
This query would take advantage of an index on (employee_id, created_at).
Or, if you are running MySQL 8.0, you can use window functions:
select *
from (
select
t.*,
row_number() over(
partition by employee_id, date(created_at)
order by created_at
) rn
from mytable t
) t
where rn = 1
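The window-function variant can be checked against the sample data. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (assuming SQLite 3.25 or newer). To keep it short, only the id, employee_id, and created_at columns are loaded; the helper name earliest_logins is hypothetical.

```python
import sqlite3

# Sample rows from the question, reduced to (id, employee_id, created_at).
ROWS = [
    (10, 16, "2020-07-11 10:40:20"), (11, 2, "2020-07-11 10:40:22"),
    (14, 2, "2020-07-11 12:07:24"), (15, 16, "2020-07-11 12:09:35"),
    (16, 2, "2020-07-11 12:09:41"), (17, 16, "2020-07-12 12:09:47"),
    (18, 16, "2020-07-12 14:40:20"), (19, 2, "2020-07-12 15:40:22"),
    (20, 2, "2020-07-12 16:07:24"), (21, 16, "2020-07-12 17:09:35"),
    (22, 2, "2020-07-13 12:09:41"),
]

# Number the logins of each employee within each calendar day,
# earliest first, and keep only the first one.
QUERY = """
SELECT id, employee_id, created_at
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY employee_id, date(created_at)
               ORDER BY created_at
           ) AS rn
    FROM mytable t
) ranked
WHERE rn = 1
ORDER BY id
"""


def earliest_logins():
    """Return the earliest login per employee per day, ordered by id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER, employee_id INTEGER, created_at TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in earliest_logins():
        print(row)
```

This should return ids 10, 11, 17, 19, and 22, matching the expected result in the question. If two logins of one employee share the exact same timestamp, ROW_NUMBER picks one of them arbitrarily; adding id to the ORDER BY would make the choice deterministic.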

Get first and last record number in every date exists in table

I am trying to show invoices for every single day, so I grouped by the created date and summed the subtotal. This is how I did it:
SELECT
`main_table`.*,
SUM(subtotal) AS `total_sales`
FROM
`sales_invoice` AS `main_table`
GROUP BY
DATE_FORMAT(created_at, "%m-%y")
It's working, but I also want to get the "Invoice # from" and "Invoice # to" for every date. Is it possible to do this with a single query?
EDIT:
Table Structure:
------------------------------------------------
| id | inoice_no | created_at | subtotal
| 1 | 34 | 2015-03-17 05:55:27 | 5
| 2 | 35 | 2015-03-17 12:35:00 | 7
| 3 | 36 | 2015-03-20 01:40:00 | 3
| 4 | 37 | 2015-03-20 07:05:13 | 6
| 5 | 38 | 2015-03-20 10:25:23 | 1
| 6 | 39 | 2015-03-24 12:00:00 | 6
------------------------------------------------
Output
---------------------------------------------------------------
| id | inoice_no | created_at | subtotal | total_sales
| 2 | 35 | 2015-03-17 12:35:00 | 7 | 12
| 5 | 38 | 2015-03-20 10:25:23 | 1 | 10
| 6 | 39 | 2015-03-24 12:00:00 | 6 | 6
-----------------------------------------------------------------
What I Expect
---------------------------------------------------------------
| id | inoice_no | created_at | subtotal | total_sales | in_from | in_to
| 2 | 35 | 2015-03-17 12:35:00 | 7 | 12 | 34 | 35
| 5 | 38 | 2015-03-20 10:25:23 | 1 | 10 | 36 | 38
| 6 | 39 | 2015-03-24 12:00:00 | 6 | 6 | 39 | 39
-----------------------------------------------------------------
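If your invoice number is an INTEGER, the query below gives the result you want. It groups by DATE(created_at), because DATE_FORMAT(created_at, "%m-%y") groups by month and year rather than by day:
-- inoice_no is spelled as in the question's table structure
SELECT DATE(A.created_at) AS InvoiceDate,
MIN(A.inoice_no) AS FromInvoiceNo,
MAX(A.inoice_no) AS ToInvoiceNo,
SUM(A.subtotal) AS total_sales
FROM sales_invoice AS A
GROUP BY InvoiceDate;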
I assume salesid is the primary key of the sales_invoice table.
-- each SELECT is parenthesized so that its own ORDER BY ... LIMIT applies inside the UNION
select * from(
(SELECT
`main_table`.*,
SUM(subtotal) AS `total_sales`
FROM
`sales_invoice` AS `main_table`
GROUP BY
DATE_FORMAT(created_at, "%m-%y")
order by main_table.salesid limit 1)
union all
(SELECT
`main_table`.*,
SUM(subtotal) AS `total_sales`
FROM
`sales_invoice` AS `main_table`
GROUP BY
DATE_FORMAT(created_at, "%m-%y")
order by main_table.salesid desc limit 1)
)a
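The MIN/MAX aggregation can be checked against the sample table from the question. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL, grouping by calendar day. The helper name daily_invoice_ranges is hypothetical, and the column inoice_no is spelled as in the question's table structure.

```python
import sqlite3

# Sample rows from the question: (id, inoice_no, created_at, subtotal).
ROWS = [
    (1, 34, "2015-03-17 05:55:27", 5),
    (2, 35, "2015-03-17 12:35:00", 7),
    (3, 36, "2015-03-20 01:40:00", 3),
    (4, 37, "2015-03-20 07:05:13", 6),
    (5, 38, "2015-03-20 10:25:23", 1),
    (6, 39, "2015-03-24 12:00:00", 6),
]

# One output row per day: first and last invoice number plus the daily total.
QUERY = """
SELECT date(created_at) AS invoice_date,
       MIN(inoice_no)   AS in_from,
       MAX(inoice_no)   AS in_to,
       SUM(subtotal)    AS total_sales
FROM sales_invoice
GROUP BY date(created_at)
ORDER BY invoice_date
"""


def daily_invoice_ranges():
    """Return (invoice_date, in_from, in_to, total_sales) per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_invoice "
        "(id INTEGER, inoice_no INTEGER, created_at TEXT, subtotal INTEGER)"
    )
    conn.executemany("INSERT INTO sales_invoice VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in daily_invoice_ranges():
        print(row)
```

Selecting only aggregates and the grouping expression avoids relying on `main_table`.* together with GROUP BY, where the non-aggregated columns are not guaranteed to come from any particular row of the group.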

mysql select most recent row by date for each user

I'm trying to select the most recent rows for every unique userid where pid = 50 and active = 1. I haven't been able to figure it out.
Here is a sample table
+-----+----------+-------+-----------------------+---------+
| id | userid | pid | start_date | active |
+-----+----------+-------+-----------------------+---------+
| 1 | 4 | 50 | 2015-05-15 12:00:00 | 1 |
| 2 | 4 | 50 | 2015-05-16 12:00:00 | 1 |
| 3 | 4 | 50 | 2015-05-17 12:00:00 | 0 |
| 4 | 4 | 51 | 2015-06-29 12:00:00 | 1 |
| 5 | 4 | 51 | 2015-06-30 12:00:00 | 1 |
| 6 | 5 | 50 | 2015-07-05 12:00:00 | 1 |
| 7 | 5 | 50 | 2015-07-06 12:00:00 | 1 |
| 8 | 5 | 51 | 2015-07-08 12:00:00 | 1 |
+-----+----------+-------+-----------------------+---------+
Desired Result
+-----+----------+-------+-----------------------+---------+
| id | userid | pid | start_date | active |
+-----+----------+-------+-----------------------+---------+
| 2 | 4 | 50 | 2015-05-16 12:00:00 | 1 |
| 7 | 5 | 50 | 2015-07-06 12:00:00 | 1 |
+-----+----------+-------+-----------------------+---------+
I've tried a bunch of things, and this is the closest I got, but unfortunately it is not quite there.
SELECT *
FROM mytable t1
WHERE
(
SELECT COUNT(*)
FROM mytable t2
WHERE
t1.userid = t2.userid
AND t1.start_date < t2.start_date
) < 1
AND pid = 50
AND active = 1
ORDER BY start_date DESC
plan
get last record grouping by userid where pid is 50 and is active
inner join to mytable to get the record info associated with last
query
select
my.*
from
(
select userid, pid, active, max(start_date) as lst
from mytable
where pid = 50
and active = 1
group by userid, pid, active
) maxd
inner join mytable my
on maxd.userid = my.userid
and maxd.pid = my.pid
and maxd.active = my.active
and maxd.lst = my.start_date
;
output
+----+--------+-----+------------------------+--------+
| id | userid | pid | start_date | active |
+----+--------+-----+------------------------+--------+
| 2 | 4 | 50 | May, 16 2015 12:00:00 | 1 |
| 7 | 5 | 50 | July, 06 2015 12:00:00 | 1 |
+----+--------+-----+------------------------+--------+
notes
As suggested by @Strawberry, the query was updated to also join on pid and active. This avoids returning a record that is not active, or not pid 50, but happens to have the exact same date.
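The join-on-maximum query can be checked against the sample table from the question. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the helper name latest_rows and its parameters are hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, userid, pid, start_date, active).
ROWS = [
    (1, 4, 50, "2015-05-15 12:00:00", 1),
    (2, 4, 50, "2015-05-16 12:00:00", 1),
    (3, 4, 50, "2015-05-17 12:00:00", 0),
    (4, 4, 51, "2015-06-29 12:00:00", 1),
    (5, 4, 51, "2015-06-30 12:00:00", 1),
    (6, 5, 50, "2015-07-05 12:00:00", 1),
    (7, 5, 50, "2015-07-06 12:00:00", 1),
    (8, 5, 51, "2015-07-08 12:00:00", 1),
]

# Find the latest start_date per user among the filtered rows, then join
# back on every grouping column to fetch the complete matching row.
QUERY = """
SELECT my.id, my.userid, my.pid, my.start_date, my.active
FROM (
    SELECT userid, pid, active, MAX(start_date) AS lst
    FROM mytable
    WHERE pid = :pid AND active = :active
    GROUP BY userid, pid, active
) maxd
INNER JOIN mytable my
    ON maxd.userid = my.userid
   AND maxd.pid = my.pid
   AND maxd.active = my.active
   AND maxd.lst = my.start_date
ORDER BY my.id
"""


def latest_rows(pid=50, active=1):
    """Return the most recent matching row for each userid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable "
        "(id INTEGER, userid INTEGER, pid INTEGER, start_date TEXT, active INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY, {"pid": pid, "active": active}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_rows():
        print(row)
```

For pid 50 and active 1 this should return rows 2 and 7. Row 3 is newer for user 4 but inactive, which is why the filter has to be applied before taking the maximum rather than after.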

Filter out closest duplicated rows (but not completely all) from MySQL table

In a table, I need to filter out adjacent duplicate rows that have the same status_id (but not all duplicates) when the user_id is the same. GROUP BY or DISTINCT did not help in this situation. Here is an example:
---------------------------------------------------
| id | user_id | status_id | date |
---------------------------------------------------
| 1 | 10 | 1 | 2010-10-10 10:00:10|
| 2 | 10 | 1 | 2010-10-11 10:00:10|
| 3 | 10 | 1 | 2010-10-12 10:00:10|
| 4 | 10 | 2 | 2010-10-13 10:00:10|
| 5 | 10 | 4 | 2010-10-14 10:00:10|
| 6 | 10 | 4 | 2010-10-15 10:00:10|
| 7 | 10 | 2 | 2010-10-16 10:00:10|
| 8 | 10 | 2 | 2010-10-17 10:00:10|
| 9 | 10 | 1 | 2010-10-18 10:00:10|
| 10 | 10 | 1 | 2010-10-19 10:00:10|
The result has to look like this:
---------------------------------------------------
| id | user_id | status_id | date |
---------------------------------------------------
| 1 | 10 | 1 | 2010-10-10 10:00:10|
| 4 | 10 | 2 | 2010-10-13 10:00:10|
| 5 | 10 | 4 | 2010-10-14 10:00:10|
| 7 | 10 | 2 | 2010-10-16 10:00:10|
| 9 | 10 | 1 | 2010-10-18 10:00:10|
The oldest entries (by date) should remain in the table.
You want to keep each row where the previous status is different, based on the id or date column.
If your ids are really sequential (as they are in the question), you can do this with a convenient join:
select t.*
from t left outer join
t tprev
on t.id = tprev.id+1 and
t.user_id = tprev.user_id
where tprev.id is null or tprev.status_id <> t.status_id;
If the ids are not sequential, you can get the previous one using a correlated subquery:
select t.*
from (select t.*,
(select t2.status_id
from t t2
where t2.user_id = t.user_id and
t2.id < t.id
order by t2.id desc
limit 1
) as prevstatus
from t
) t
where prevstatus is null or prevstatus <> t.status_id;
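Where window functions are available (MySQL 8.0+), LAG can replace the correlated subquery. This is a swapped-in alternative to the two queries above, not the answer's own method. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (assuming SQLite 3.25 or newer); the helper name first_of_each_run is hypothetical, and the table is called t as in the answer.

```python
import sqlite3

# Sample rows from the question: (id, user_id, status_id, date).
ROWS = [
    (1, 10, 1, "2010-10-10 10:00:10"), (2, 10, 1, "2010-10-11 10:00:10"),
    (3, 10, 1, "2010-10-12 10:00:10"), (4, 10, 2, "2010-10-13 10:00:10"),
    (5, 10, 4, "2010-10-14 10:00:10"), (6, 10, 4, "2010-10-15 10:00:10"),
    (7, 10, 2, "2010-10-16 10:00:10"), (8, 10, 2, "2010-10-17 10:00:10"),
    (9, 10, 1, "2010-10-18 10:00:10"), (10, 10, 1, "2010-10-19 10:00:10"),
]

# Keep a row when it is the user's first row, or when its status differs
# from the status of that user's previous row (by date, then id).
QUERY = """
SELECT id, user_id, status_id, date
FROM (
    SELECT t.*,
           LAG(status_id) OVER (
               PARTITION BY user_id ORDER BY date, id
           ) AS prev_status
    FROM t
) x
WHERE prev_status IS NULL OR prev_status <> status_id
ORDER BY id
"""


def first_of_each_run():
    """Return the oldest row of every run of equal status_id per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, user_id INTEGER, status_id INTEGER, date TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in first_of_each_run():
        print(row)
```

On the sample data this should keep ids 1, 4, 5, 7, and 9, the oldest row of each run. It does not depend on the ids being sequential, since the previous row is determined by the ordering inside the window.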