Mysql query on 2 or 3 tables - mysql

| ds |
|sales_id| date_issued |
| 1 | 2016-11-30 01:00:00 |
| 2 | 2016-11-30 02:00:00 |
| 3 | 2016-11-30 03:00:00 |
| dsr |
| dsr_id | quantity | date_returned |
| 5 | 1 | 2016-11-30 01:01:00|
| 6 | 1 | 2016-11-30 01:11:00|
| 7 | 3 | 2016-11-30 02:21:00|
| 8 | 1 | 2016-11-30 02:31:00|
| 9 | 2 | 2016-11-30 03:02:00|
Which query would apply this logic?
ADD the quantities of dsr WHERE its date_returned is on or after one date_issued AND before the following date_issued
so that the result would be:
| 2 |
| 4 |
| 2 |
The idea would be something like this:
| dsr |
| dsr_id | quantity | date_returned |
| 5 | 1 | 2016-11-30 01:01:00| --- The 1st and 2nd rows are added together
| 6 | 1 | 2016-11-30 01:11:00| --/ because '2016-11-30 01:00:00' (the 1st date_issued) <= date_returned < '2016-11-30 02:00:00' (the following date_issued)
| 7 | 3 | 2016-11-30 02:21:00| --- Same idea for these two,
| 8 | 1 | 2016-11-30 02:31:00| --/ since their date_returned falls between the 2nd and 3rd date_issued
| 9 | 2 | 2016-11-30 03:02:00|
I know this could be easily done programmatically but I just want to know and learn how to do it in SQL and if it is easier in SQL.

SQL DEMO
SELECT start_date, end_date, SUM(D.quantity)
FROM (
SELECT ds1.`sales_id`,
ds1.`date_issued` start_date,
COALESCE(ds2.`date_issued`, CURRENT_DATE) as end_date
FROM ds as ds1
LEFT JOIN ds as ds2
ON ds1.`sales_id` = ds2.`sales_id` - 1
) R
JOIN dsr D
ON D.`date_returned` >= start_date
AND D.`date_returned` < end_date
GROUP BY start_date, end_date
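The join on `sales_id - 1` only works while the ids are consecutive. As a hedged alternative, the sketch below finds "the following date_issued" with the LEAD() window function, which MySQL 8.0+ also provides. It is run here against an in-memory SQLite database through Python's sqlite3 purely so that it is self-contained; that harness, and leaving the last interval open-ended instead of using CURRENT_DATE, are assumptions of this sketch rather than part of the answer above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption, so the sketch runs
# without a server); LEAD() is also available in MySQL 8.0+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ds  (sales_id INTEGER, date_issued TEXT);
CREATE TABLE dsr (dsr_id INTEGER, quantity INTEGER, date_returned TEXT);
INSERT INTO ds VALUES
  (1, '2016-11-30 01:00:00'),
  (2, '2016-11-30 02:00:00'),
  (3, '2016-11-30 03:00:00');
INSERT INTO dsr VALUES
  (5, 1, '2016-11-30 01:01:00'),
  (6, 1, '2016-11-30 01:11:00'),
  (7, 3, '2016-11-30 02:21:00'),
  (8, 1, '2016-11-30 02:31:00'),
  (9, 2, '2016-11-30 03:02:00');
""")

QUERY = """
SELECT r.start_date, r.end_date, SUM(d.quantity) AS returned
FROM (
    -- Pair every date_issued with the next one in time order.
    SELECT date_issued AS start_date,
           LEAD(date_issued) OVER (ORDER BY date_issued) AS end_date
    FROM ds
) AS r
JOIN dsr AS d
  ON d.date_returned >= r.start_date
 AND (r.end_date IS NULL OR d.date_returned < r.end_date)
GROUP BY r.start_date, r.end_date
ORDER BY r.start_date
"""

totals = [row[2] for row in conn.execute(QUERY)]
print(totals)  # [2, 4, 2]
```

Because the pairing is done by date order rather than by id arithmetic, gaps in `sales_id` do not change the result.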

Related

MySQL - select average of column A for first N entries from column B

I have a ratings table, where each user can add one rating a day. But each user might miss several days between ratings.
I'd like to get the average rating for each user_id's first 7 entries of created_at.
My table:
mysql> desc entries;
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| rating | tinyint(4) | NO | | NULL | |
| user_id | int(10) unsigned | NO | MUL | NULL | |
| created_at | timestamp | YES | | NULL | |
+------------+------------------+------+-----+---------+----------------+
Ideally I'd just get something like:
+------------+------------------+
| day | average_rating |
+------------+------------------+
| 1 | 2.53 |
+------------+------------------+
| 2 | 4.30 |
+------------+------------------+
| 3 | 3.67 |
+------------+------------------+
| 4 | 5.50 |
+------------+------------------+
| 5 | 7.23 |
+------------+------------------+
| 6 | 6.98 |
+------------+------------------+
| 7 | 7.22 |
+------------+------------------+
The closest I've been able to get is:
SELECT rating, user_id, created_at FROM entries ORDER BY user_id asc, created_at desc
Which isn't very close at all...
Is it even possible? Will the performance be terrible? It's something that would need to run every time a web page is loaded, so would it be better to just run this once a day and save the results? (to another table!?)
edit - second attempt
Working towards a solution, I think this would get the rating for each user's first day:
select rating from entries where user_id in
(select user_id from entries order by created_at limit 1);
But I get:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
So now I'm going to play around with JOIN to see if that helps.
edit - third attempt, getting closer
I found this stackoverflow post, which is closer to what I want.
select e1.* from entries e1 left join entries e2
on (e1.user_id = e2.user_id and e1.created_at > e2.created_at)
where e2.id is null;
It gets the rating for the first day for each user.
Next step is to work out how to get days 2 to 7. I can't use e1.created_at > e2.created_at for that, so I'm really confused now.
edit - fourth attempt
Okay, I think it's not possible. Once I worked out how to turn off 'full group by' mode, I realised I'll probably need to use a subquery with limit <user_id>, <day_num>, for which I get:
ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
My current method is to just get the entire table, and use PHP to calculate the average for each day.
If I understand correctly you want to take the last 7 ratings the user gave, ordered by the date they gave the rating. The last 7 ratings of one user may fall on different days to another user, however they will be averaged together regardless of date.
First we need to order the data by user and date and give each user their own incrementing row count. I do this by adding two variables, one for the last user id and one for the row number:
select e.created_at,
e.rating,
if(@lastUser=user_id, @row := @row+1, @row := 1) as row,
@lastUser := e.user_id as user_id
from entries e,
( select @row := 0, @lastUser := 0 ) vars
order by e.user_id asc,
e.created_at desc;
If the previous user_id is different we reset the row counter to 1. The result from this is:
+---------------------+--------+------+---------+
| created_at | rating | row | user_id |
+---------------------+--------+------+---------+
| 2017-01-10 00:00:00 | 1 | 1 | 1 |
| 2017-01-09 00:00:00 | 1 | 2 | 1 |
| 2017-01-08 00:00:00 | 1 | 3 | 1 |
| 2017-01-07 00:00:00 | 1 | 4 | 1 |
| 2017-01-06 00:00:00 | 1 | 5 | 1 |
| 2017-01-05 00:00:00 | 1 | 6 | 1 |
| 2017-01-04 00:00:00 | 1 | 7 | 1 |
| 2017-01-03 00:00:00 | 1 | 8 | 1 |
| 2017-01-02 00:00:00 | 1 | 9 | 1 |
| 2017-01-01 00:00:00 | 1 | 10 | 1 |
| 2017-01-13 00:00:00 | 1 | 1 | 2 |
| 2017-01-11 00:00:00 | 1 | 2 | 2 |
| 2017-01-09 00:00:00 | 1 | 3 | 2 |
| 2017-01-07 00:00:00 | 1 | 4 | 2 |
| 2017-01-05 00:00:00 | 1 | 5 | 2 |
| 2017-01-03 00:00:00 | 1 | 6 | 2 |
| 2017-01-01 00:00:00 | 1 | 7 | 2 |
| 2017-01-13 00:00:00 | 1 | 1 | 3 |
| 2017-01-01 00:00:00 | 1 | 2 | 3 |
| 2017-01-03 00:00:00 | 1 | 1 | 4 |
| 2017-01-01 00:00:00 | 1 | 2 | 4 |
| 2017-01-02 00:00:00 | 1 | 1 | 5 |
+---------------------+--------+------+---------+
We now simply wrap this in another statement to select the avg where the row number is less than or equal to seven.
select e1.row day, avg(e1.rating) avg
from (
select e.created_at,
e.rating,
if(@lastUser=user_id, @row := @row+1, @row := 1) as row,
@lastUser := e.user_id as user_id
from entries e,
( select @row := 0, @lastUser := 0 ) vars
order by e.user_id asc,
e.created_at desc) e1
where e1.row <=7
group by e1.row;
This outputs:
+------+--------+
| day | avg |
+------+--------+
| 1 | 1.0000 |
| 2 | 1.0000 |
| 3 | 1.0000 |
| 4 | 1.0000 |
| 5 | 1.0000 |
| 6 | 1.0000 |
| 7 | 1.0000 |
+------+--------+
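On MySQL 8.0+ the per-user row counter can also be produced with ROW_NUMBER() instead of user variables. The sketch below is one way to do that, ordering by created_at ascending so that it numbers each user's first entries, as the question asks (the answer above orders descending, which gives the last ones). It runs on SQLite via Python's sqlite3 only so it is self-contained, and the sample rows are made up for illustration.

```python
import sqlite3

# SQLite is used as a stand-in for MySQL 8.0+ (assumption); the rows are
# hypothetical sample data, not the asker's table contents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, rating INTEGER, "
    "user_id INTEGER, created_at TEXT)"
)
# User 1 rates 1..8 on eight consecutive days; user 2 rates twice with a gap.
rows = [(r, 1, f"2017-01-{r:02d} 00:00:00") for r in range(1, 9)]
rows += [(3, 2, "2017-01-02 00:00:00"), (5, 2, "2017-01-09 00:00:00")]
conn.executemany(
    "INSERT INTO entries (rating, user_id, created_at) VALUES (?, ?, ?)", rows
)

QUERY = """
SELECT day, AVG(rating) AS average_rating
FROM (
    -- Number each user's entries 1, 2, 3, ... in date order.
    SELECT rating,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) AS day
    FROM entries
) AS numbered
WHERE day <= 7
GROUP BY day
ORDER BY day
"""

averages = conn.execute(QUERY).fetchall()
for day, average_rating in averages:
    print(day, average_rating)
```

User 1's eighth entry is dropped by `day <= 7`, and days 3 to 7 are averaged over user 1 alone because user 2 has only two entries.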

mysql display each type in row and group by date range

I am not sure whether this data structure can produce the result I want.
http://sqlfiddle.com/#!9/84939
This is the data, please ignore the duration column.
+----+---------------------+---------------------+---------------------+----------+--------+------+
| id | created_date | start_date | end_date | duration | status | type |
+----+---------------------+---------------------+---------------------+----------+--------+------+
| 1 | 2016-04-05 15:23:29 | 2016-08-15 10:21:53 | 2016-08-19 00:00:00 | 30 | 1 | 2 |
| 2 | 2016-04-06 15:23:29 | 2016-08-15 10:21:53 | 2016-08-19 00:00:00 | 30 | 1 | 1 |
| 3 | 2016-04-06 15:23:29 | 2016-08-15 10:21:53 | 2016-08-19 00:00:00 | 30 | 1 | 3 |
| 4 | 2016-04-06 15:23:29 | 2016-08-17 10:21:53 | 2016-08-19 00:00:00 | 30 | 1 | 1 |
| 5 | 2016-04-06 15:23:29 | 2016-08-17 09:21:53 | 2016-08-19 00:00:00 | 30 | 1 | 1 |
| 6 | 2016-04-06 15:23:29 | 2016-08-01 09:21:53 | 2016-08-31 00:00:00 | 30 | 1 | 1 |
| 7 | 2016-04-06 15:23:29 | 2016-08-01 09:21:53 | 2016-08-31 00:00:00 | 30 | 0 | 1 |
| 8 | 2016-04-06 15:23:29 | 2016-08-15 09:21:53 | 2016-08-16 00:00:00 | 30 | 1 | 2 |
| 9 | 2016-04-06 15:23:29 | 2016-08-16 09:21:53 | 2016-08-17 00:00:00 | 30 | 1 | 3 |
| 10 | 2016-04-06 15:23:29 | 2016-08-19 09:21:53 | 2016-08-20 00:00:00 | 30 | 1 | 2 |
+----+---------------------+---------------------+---------------------+----------+--------+------+
I want to filter the report from 2016-08-15 until 2016-08-19. For 2016-08-19, where the end time is exactly 00:00:00, I am not sure whether it should count, but for my example I count it because it is in the range.
This is the summary done by me manually:-
(type-2)15,16,17,18,19
(type-1)15,16,17,18,19
(type-3)15,16,17,18,19
(type-1)17,18,19
(type-1)17,18,19
(type-1)15,16,17,18,19
(type-1)15,16,17,18,19
(type-2)15,16
(type-3)16,17
(type-2)19,20
This is the result I would like to generate in sql return data.
+------------+--------+-----------+-----------+-----------+
| date | ct_all | ct_type_1 | ct_type_2 | ct_type_3 |
+------------+--------+-----------+-----------+-----------+
| 2016-08-15 | 6 | 3 | 2 | 1 |
| 2016-08-16 | 7 | 3 | 2 | 2 |
| 2016-08-17 | 8 | 5 | 1 | 2 |
| 2016-08-18 | 7 | 5 | 1 | 1 |
| 2016-08-19 | 8 | 5 | 2 | 1 |
+------------+--------+-----------+-----------+-----------+
ct_all = count all
ct_type_1 = count total for type 1
As long as a day falls between a row's start_date and end_date, that row is counted for that day.
Normally a date search is based on a single column, e.g. created_date, and I can use >= and <= to find the range. But this table has both a start and an end date, and I am not sure whether it can be done.
You have three different things going on here.
an enumeration of days.
a DATETIME range filter.
a so-called pivot, pivoting rows by type into columns.
It's helpful to take these one at a time.
First, I guess you have five days you wish to filter, [15-Aug-2016 - 19-Aug-2016] inclusive. You want to make a list of all those days. This little query will do that. (http://sqlfiddle.com/#!9/84939/21/0)
SELECT CONVERT('2016-08-15' + INTERVAL seq DAY, DATETIME) AS CURDATE
FROM (SELECT 0 AS SEQ UNION ALL SELECT 1 UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4
) seq_0_to_4
(Notice something: The MariaDB fork of MySQL has sequence tables like seq_0_to_4 built in so you don't have to do all this UNION ALL stuff.)
Second, you want to get a list of the type values occurring on each day. You can get that to happen with a LEFT JOIN, like so (http://sqlfiddle.com/#!9/84939/26/0).
SELECT seq.curdate, record.type
FROM (
SELECT CONVERT('2016-08-15' + INTERVAL seq DAY, DATETIME) AS CURDATE
FROM (SELECT 0 AS SEQ UNION ALL SELECT 1 UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4
) seq_0_to_4
) seq
LEFT JOIN record ON seq.curdate >= DATE(record.start_date)
AND seq.curdate <= DATE(record.end_date)
This gives you a list of curdate and type values.
The ON condition of that join chooses record rows that start on or before each date and end on or after it.
Finally, you need to do a pivot operation to summarize the counts of type values. That looks something like this. (http://sqlfiddle.com/#!9/84939/28/0)
SELECT curdate,
COUNT(type) ct_all,
SUM(CASE WHEN type = 1 THEN 1 ELSE 0 END) ct_1,
SUM(CASE WHEN type = 2 THEN 1 ELSE 0 END) ct_2,
SUM(CASE WHEN type = 3 THEN 1 ELSE 0 END) ct_3
FROM (the above query) d
GROUP BY curdate
ORDER BY curdate
This is a case where the structured part of Structured Query Language is necessary.
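Putting the three steps together, here is a hedged end-to-end sketch using the rows from the question. It runs on SQLite through Python's sqlite3 so it needs no server. The `days` calendar table, filled from Python, is an assumption of this sketch that replaces the UNION ALL sequence, and created_date, duration and status are left out because the join does not use them.

```python
import sqlite3
from datetime import date, timedelta

# SQLite stands in for MySQL (assumption); `days` is a hypothetical
# calendar table holding the five report dates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record (id INTEGER, start_date TEXT, end_date TEXT, type INTEGER);
CREATE TABLE days (day TEXT);
""")
records = [
    (1, "2016-08-15 10:21:53", "2016-08-19 00:00:00", 2),
    (2, "2016-08-15 10:21:53", "2016-08-19 00:00:00", 1),
    (3, "2016-08-15 10:21:53", "2016-08-19 00:00:00", 3),
    (4, "2016-08-17 10:21:53", "2016-08-19 00:00:00", 1),
    (5, "2016-08-17 09:21:53", "2016-08-19 00:00:00", 1),
    (6, "2016-08-01 09:21:53", "2016-08-31 00:00:00", 1),
    (7, "2016-08-01 09:21:53", "2016-08-31 00:00:00", 1),
    (8, "2016-08-15 09:21:53", "2016-08-16 00:00:00", 2),
    (9, "2016-08-16 09:21:53", "2016-08-17 00:00:00", 3),
    (10, "2016-08-19 09:21:53", "2016-08-20 00:00:00", 2),
]
conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?)", records)

# Step 1: enumerate the days 2016-08-15 .. 2016-08-19.
first_day = date(2016, 8, 15)
conn.executemany(
    "INSERT INTO days VALUES (?)",
    [((first_day + timedelta(days=n)).isoformat(),) for n in range(5)],
)

# Steps 2 and 3: range join, then pivot the types into columns.
QUERY = """
SELECT days.day,
       COUNT(record.type) AS ct_all,
       SUM(CASE WHEN record.type = 1 THEN 1 ELSE 0 END) AS ct_type_1,
       SUM(CASE WHEN record.type = 2 THEN 1 ELSE 0 END) AS ct_type_2,
       SUM(CASE WHEN record.type = 3 THEN 1 ELSE 0 END) AS ct_type_3
FROM days
LEFT JOIN record
  ON days.day >= DATE(record.start_date)
 AND days.day <= DATE(record.end_date)
GROUP BY days.day
ORDER BY days.day
"""

report = conn.execute(QUERY).fetchall()
for line in report:
    print(line)
```

With these rows the counts agree with the table the asker worked out by hand, for example 6 / 3 / 2 / 1 on 2016-08-15.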

Calculate timediff but exclude weekend

How can I calculate the total hours between now and a given date, excluding weekends?
I'm trying it this way:
select id, creationTime,
time_format(timediff(now(), creationTime), '%H:%i:%s') AS totalspenttime
from tblrezgo where DAYOFWEEK(creationTime) NOT IN (1,7)
This query should remove Saturdays and Sundays from the calculation, but it seems to include those two days as well.
By running query:
select id, creationTime, DAYOFWEEK(creationTime) FROM tblrezgo
Output is:
+-------------+---------------------+------------+
| ID | creationTime | DAYOFWEEK |
+-------------+---------------------+------------+
| 1 | 2015-10-23 17:12:05 | 6 |
+-------------+---------------------+------------+
| 2 | 2015-10-24 10:23:11 | 7 |
+-------------+---------------------+------------+
| 3 | 2015-10-24 11:51:04 | 7 |
+-------------+---------------------+------------+
| 4 | 2015-10-26 14:30:28 | 2 |
+-------------+---------------------+------------+
| 5 | 2015-10-26 08:24:59 | 2 |
+-------------+---------------------+------------+
| 6 | 2015-10-26 17:29:03 | 2 |
+-------------+---------------------+------------+
| 7 | 2015-10-27 08:16:45 | 3 |
+-------------+---------------------+------------+
If I run my query, totalspenttime for ID = 1 is about 86 hours, which is not correct. I've checked, and it should be about 41 hours so far (if we exclude the two weekend days).
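A note on why the attempt behaves this way: the WHERE clause only drops rows whose creationTime itself falls on a weekend. It does not subtract weekend hours from the span between creationTime and now. To exclude them, the span has to be split into days and only the weekday parts summed. The sketch below shows that logic in application code. The helper names `weekday_seconds` and `as_hms` are made up for illustration, and a fixed timestamp (2015-10-27 08:16:45, the latest creationTime in the table) stands in for now().

```python
from datetime import datetime, timedelta


def weekday_seconds(start: datetime, end: datetime) -> int:
    """Seconds between start and end, skipping Saturdays and Sundays."""
    total = 0
    cursor = start
    while cursor < end:
        # Midnight that closes the calendar day containing `cursor`.
        next_midnight = datetime.combine(
            cursor.date(), datetime.min.time()
        ) + timedelta(days=1)
        chunk_end = min(next_midnight, end)
        if cursor.weekday() < 5:  # Monday=0 ... Friday=4
            total += int((chunk_end - cursor).total_seconds())
        cursor = chunk_end
    return total


def as_hms(seconds: int) -> str:
    """Format a number of seconds as HH:MM:SS (hours may exceed 24)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Row ID = 1 from the table, measured up to an assumed "now".
spent = weekday_seconds(
    datetime(2015, 10, 23, 17, 12, 5),   # Friday
    datetime(2015, 10, 27, 8, 16, 45),   # Tuesday
)
print(as_hms(spent))  # 39:04:40
```

That is the rest of Friday, all of Monday and part of Tuesday, which is in line with the roughly 41 hours the asker expected a couple of hours later.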

get amount between range [closed]

This is my simple table:
+-----------+----------------+-----------+
| id | date | meter |
+-----------+----------------+-----------+
| 1 | 2103-11-01 | 5 |
| 2 | 2103-11-10 | 8 |
| 4 | 2103-11-14 | 10 |
| 6 | 2103-11-20 | 18 |
| 7 | 2103-11-25 | 25 |
| 10 | 2103-11-29 | 30 |
+-----------+----------------+-----------+
How do I get the meter usage between each pair of consecutive readings,
like below?
+----------------+----------------+-------+-----+--------+
| date1 | date2 | start | end | amount |
+----------------+----------------+-------+-----+--------+
| 2013-11-01 | 2013-11-10 | 5 | 8 | 3 |
| 2013-11-10 | 2013-11-14 | 8 | 10 | 2 |
| 2013-11-14 | 2013-11-20 | 10 | 18 | 8 |
| 2013-11-20 | 2013-11-25 | 18 | 25 | 7 |
| 2013-11-25 | 2013-11-29 | 25 | 30 | 5 |
+----------------+----------------+-------+-----+--------+
Edit:
I got it:
select meters1.date as date1, min(meters2.date) as date2, meters1.meter as start,
meters2.meter as end, (meters2.meter - meters1.meter) as amount
from meters meters1, meters meters2 where meters1.date < meters2.date
group by date1;
Outputs:
+------------+------------+-------+-----+--------+
| date1 | date2 | start | end | amount |
+------------+------------+-------+-----+--------+
| 2013-11-01 | 2013-11-10 | 5 | 8 | 3 |
| 2013-11-10 | 2013-11-14 | 8 | 10 | 2 |
| 2013-11-14 | 2013-11-20 | 10 | 18 | 8 |
| 2013-11-20 | 2013-11-25 | 18 | 25 | 7 |
| 2013-11-25 | 2013-11-29 | 25 | 30 | 5 |
+------------+------------+-------+-----+--------+
Original Post:
This is most of the way there:
select meters1.date as date1, meters2.date as date2, meters1.meter as start,
meters2.meter as end, (meters2.meter - meters1.meter) as amount
from meters meters1, meters meters2 having date1 < date2 order by date1;
It outputs:
+------------+------------+-------+-----+--------+
| date1 | date2 | start | end | amount |
+------------+------------+-------+-----+--------+
| 2013-11-01 | 2013-11-10 | 5 | 8 | 3 |
| 2013-11-01 | 2013-11-20 | 5 | 18 | 13 |
| 2013-11-01 | 2013-11-29 | 5 | 30 | 25 |
| 2013-11-01 | 2013-11-14 | 5 | 10 | 5 |
| 2013-11-01 | 2013-11-25 | 5 | 25 | 20 |
| 2013-11-10 | 2013-11-20 | 8 | 18 | 10 |
| 2013-11-10 | 2013-11-29 | 8 | 30 | 22 |
| 2013-11-10 | 2013-11-14 | 8 | 10 | 2 |
| 2013-11-10 | 2013-11-25 | 8 | 25 | 17 |
| 2013-11-14 | 2013-11-25 | 10 | 25 | 15 |
| 2013-11-14 | 2013-11-20 | 10 | 18 | 8 |
| 2013-11-14 | 2013-11-29 | 10 | 30 | 20 |
| 2013-11-20 | 2013-11-25 | 18 | 25 | 7 |
| 2013-11-20 | 2013-11-29 | 18 | 30 | 12 |
| 2013-11-25 | 2013-11-29 | 25 | 30 | 5 |
+------------+------------+-------+-----+--------+
If it's SQL server try it this way
WITH cte AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY date) rnum
FROM table1
)
SELECT c.date date1, p.date date2, c.meter [start], p.meter [end], p.meter - c.meter amount
FROM cte c JOIN cte p
ON c.rnum = p.rnum - 1
Here is SQLFiddle demo
If it's MySQL then you can do
SELECT date1, date2, meter1, meter2, meter2 - meter1 amount
FROM
(
SELECT @d date2, date date1, @m meter2, meter meter1, @d := date, @m := meter
FROM table1 CROSS JOIN (SELECT @d := NULL, @m := NULL) i
ORDER BY date DESC
) q
WHERE date2 IS NOT NULL
ORDER BY date1
Here is SQLFiddle demo
Output in both cases:
| DATE1 | DATE2 | START | END | AMOUNT |
|------------|------------|-------|-----|--------|
| 2103-11-01 | 2103-11-10 | 5 | 8 | 3 |
| 2103-11-10 | 2103-11-14 | 8 | 10 | 2 |
| 2103-11-14 | 2103-11-20 | 10 | 18 | 8 |
| 2103-11-20 | 2103-11-25 | 18 | 25 | 7 |
| 2103-11-25 | 2103-11-29 | 25 | 30 | 5 |
MySql
SELECT DATES.date1,
DATES.date2,
m1.meter as start,
m2.meter as end,
m2.meter - m1.meter as amount
FROM
(SELECT date as date1,
(SELECT min(date)
FROM tableName t2
WHERE t2.date > t1.date) as date2
FROM tableName t1
)DATES,
tableName m1,
tableName m2
WHERE DATES.date2 IS NOT NULL
AND m1.date = DATES.date1
AND m2.date = DATES.date2
ORDER BY DATES.date1
sqlFiddle here
in MS-SQL SERVER 2002 change the word end to "end" as it complains about syntax near end
You haven't made it clear whether you're really using mySQL or SQL Server but I'm posting a solution that works for SQL 2008 and above. Might work for 2005 but I can't test that.
-- Set up a temp table with sample data
DECLARE @testData AS TABLE(
id int,
dt date,
meter int)
INSERT @testData(id, dt, meter) VALUES
(1, '2013-11-01', 5)
,(2, '2013-11-10', 8)
,(4, '2013-11-14', 10)
,(6, '2013-11-20', 18)
,(7, '2013-11-25', 25)
,(10, '2013-11-29',30)
---------------------------------------------
-- Begin SQL Server solution
;WITH cte AS (
SELECT
ROW_NUMBER() OVER (ORDER BY id) AS rownum
,id
,dt
,meter
FROM
@testData AS [date2]
)
)
SELECT
t1.id
,t1.dt AS [date1]
,t2.dt AS [date2]
,t1.meter AS [start]
,t2.meter AS [end]
,t2.meter - t1.meter AS [amount]
FROM
cte t1
LEFT OUTER JOIN cte t2 ON (t2.rownum = t1.rownum + 1)
WHERE
t2.dt IS NOT NULL
If you're using MySQL, then a self-join will work well here. Join the table to itself, using an ON clause to make sure you don't join the same record to itself. This will give you ((N * N) - N) permutations of your data, where N is the number of original rows.
SELECT
...
FROM
tableName first
JOIN
tableName second
ON first.id != second.id
Then, it's all about SELECTing the right stuff (including the calculation of the difference between the two meter values). To get the columns in the result set you posted, you'd probably want to SELECT:
first.date AS date1,
second.date AS date2,
first.meter AS start,
second.meter AS end,
ABS(first.meter - second.meter) AS amount
Edit
Ah, I see. I'd envisioned something like an inter-city mileage chart that you used to see on road maps (where you'd have the same cities in the rows and columns, and the cell in the intersection would indicate the number of miles between those two cities).
But it looks like you just want to compare values from one date to the next. If that's the case, you can take advantage of the way MySQL handles GROUPing and ORDERing... but be careful, because I'm not sure this is guaranteed:
mysql> SELECT
table1.date AS date1,
table2.date AS date2,
table1.meter AS start,
table2.meter AS end,
ABS(table1.meter - table2.meter) AS amount
FROM tableName table1
JOIN tableName table2
WHERE table2.date > table1.date
GROUP BY table1.date
ORDER BY table2.date - table1.date;
+---------------------+---------------------+-------+------+--------+
| date1 | date2 | start | end | amount |
+---------------------+---------------------+-------+------+--------+
| 2103-11-25 00:00:00 | 2103-11-29 00:00:00 | 25 | 30 | 5 |
| 2103-11-10 00:00:00 | 2103-11-14 00:00:00 | 8 | 10 | 2 |
| 2103-11-20 00:00:00 | 2103-11-25 00:00:00 | 18 | 25 | 7 |
| 2103-11-14 00:00:00 | 2103-11-20 00:00:00 | 10 | 18 | 8 |
| 2103-11-01 00:00:00 | 2103-11-10 00:00:00 | 5 | 8 | 3 |
+---------------------+---------------------+-------+------+--------+
5 rows in set (0.00 sec)
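Since the GROUP BY trick above is, as noted, not guaranteed, here is a hedged sketch of the same pairing done with the LEAD() window function, which MySQL 8.0+ supports. It is run on SQLite via Python's sqlite3 only to keep it self-contained. The table name `meters` is taken from the asker's own query, the dates use the 2013 values from the desired output, and the aliases `start_meter` / `end_meter` are my own choice to avoid the `end` keyword.

```python
import sqlite3

# SQLite stands in for MySQL 8.0+ here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meters (id INTEGER, date TEXT, meter INTEGER)")
conn.executemany(
    "INSERT INTO meters VALUES (?, ?, ?)",
    [
        (1, "2013-11-01", 5),
        (2, "2013-11-10", 8),
        (4, "2013-11-14", 10),
        (6, "2013-11-20", 18),
        (7, "2013-11-25", 25),
        (10, "2013-11-29", 30),
    ],
)

QUERY = """
SELECT date1, date2, start_meter, end_meter,
       end_meter - start_meter AS amount
FROM (
    -- LEAD() looks one row ahead in date order.
    SELECT date  AS date1,
           LEAD(date)  OVER (ORDER BY date) AS date2,
           meter AS start_meter,
           LEAD(meter) OVER (ORDER BY date) AS end_meter
    FROM meters
) AS paired
WHERE date2 IS NOT NULL   -- the last reading has no successor
ORDER BY date1
"""

pairs = conn.execute(QUERY).fetchall()
amounts = [row[4] for row in pairs]
print(amounts)  # [3, 2, 8, 7, 5]
```

The ordering is stated explicitly in the OVER clause, so the result does not depend on which row the server happens to pick for a group.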

MySQL count rows with similar timestamp

Is there any way to count a given run of timestamps that are close to each other, but not necessarily in a fixed time frame?
I.e., not grouped by hour or minute, but rather grouped by how close the current row's timestamp is to the next row's timestamp. If the next row is within "x" seconds/minutes then add that row to the group, otherwise start a new group.
Given this data:
+----+---------+---------------------+
| id | item_id | event_date |
+----+---------+---------------------+
| 1 | 1 | 2013-05-17 11:59:59 |
| 2 | 1 | 2013-05-17 12:00:00 |
| 3 | 1 | 2013-05-17 12:00:02 |
| 4 | 1 | 2013-05-17 12:00:03 |
| 5 | 3 | 2013-05-17 14:05:00 |
| 6 | 3 | 2013-05-17 14:05:01 |
| 7 | 3 | 2013-05-17 15:30:00 |
| 8 | 3 | 2013-05-17 15:30:01 |
| 9 | 3 | 2013-05-17 15:30:02 |
| 10 | 1 | 2013-05-18 09:12:00 |
| 11 | 1 | 2013-05-18 09:13:30 |
| 12 | 1 | 2013-05-18 09:13:45 |
| 13 | 1 | 2013-05-18 09:14:00 |
| 14 | 2 | 2013-05-20 15:45:00 |
| 15 | 2 | 2013-05-20 15:45:03 |
| 16 | 2 | 2013-05-20 15:45:10 |
| 17 | 2 | 2013-05-23 07:36:00 |
| 18 | 2 | 2013-05-23 07:36:10 |
| 19 | 2 | 2013-05-23 07:36:12 |
| 20 | 2 | 2013-05-23 07:36:15 |
| 21 | 1 | 2013-05-24 11:55:00 |
| 22 | 1 | 2013-05-24 11:55:02 |
+----+---------+---------------------+
Desired Results:
+---------+-------+---------------------+
| item_id | total | last_date_in_group |
+---------+-------+---------------------+
| 1 | 4 | 2013-05-17 12:00:03 |
| 3 | 2 | 2013-05-17 14:05:01 |
| 3 | 3 | 2013-05-17 15:30:02 |
| 1 | 4 | 2013-05-18 09:14:00 |
| 2 | 3 | 2013-05-20 15:45:10 |
| 2 | 4 | 2013-05-23 07:36:15 |
| 1 | 2 | 2013-05-24 11:55:02 |
+---------+-------+---------------------+
This is a little complicated. To start, what you need is the time of the next event for each record. The following subquery adds such a time (nexted), if it is within bounds:
select t.*,
(select event_date
from t t2
where t2.item_id = t.item_id and
t2.event_date > t.event_date and
<date comparison here>
order by event_date limit 1
) as nexted
from t
This uses a correlated subquery. The <date comparison here> is for whatever date comparison you want. When there is no record, the value will be NULL.
Now, with this information (nexted) there is a trick to get the grouping. For any record, it is the first event time afterwards where nexted is NULL. This will be the last event in the series. Unfortunately, this requires two levels of nested correlated subqueries (or joins with aggregations). The result looks a bit unwieldy:
select item_id, grp, MIN(event_date) as start_date, MAX(event_date) as end_date,
COUNT(*) as num_dates
from (select t.*,
(select min(t2.event_date)
from (select t1.*,
(select t3.event_date
from t t3
where t3.item_id = t1.item_id and
t3.event_date > t1.event_date and
<date comparison here>
order by t3.event_date limit 1
) as nexted
from t t1
) t2
where t2.nexted is null and
t2.item_id = t.item_id and
t2.event_date >= t.event_date
) as grp
from t
) s
group by item_id, grp;
What about approaching it from finding each individual record's local associations, and then grouping on the max event date from each record's discoveries. This is based on a static differential time interval (5 minutes in my example)
SELECT item_id, MAX(total), MAX(last_date_in_group) AS last_date_in_group FROM (
SELECT t1.item_id, COUNT(*) AS total, COALESCE(GREATEST(t1.event_date, MAX(t2.event_date)), t1.event_date) AS last_date_in_group
FROM table_name t1
LEFT JOIN table_name t2 ON t2.event_date BETWEEN t1.event_date AND t1.event_date + INTERVAL 5 MINUTE
GROUP BY t1.id
) t
GROUP BY last_date_in_group
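Another way to look at this is as a "gaps and islands" problem: flag each row that starts a new run (its gap to the previous row of the same item exceeds the threshold), then take a running sum of those flags as the group number. The sketch below does that with LAG() and a windowed SUM(), using the sample data from the question. It is run on SQLite via Python's sqlite3 only so it is self-contained. The table name `events` and the 300-second threshold are assumptions, and the `strftime('%s', ...)` arithmetic is SQLite-specific (on MySQL 8.0+ something like TIMESTAMPDIFF(SECOND, ...) would take its place).

```python
import sqlite3

GAP_SECONDS = 300  # assumed threshold: the 5 minutes used in the answer above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, item_id INTEGER, event_date TEXT)")
sample = [
    (1, "2013-05-17 11:59:59"), (1, "2013-05-17 12:00:00"),
    (1, "2013-05-17 12:00:02"), (1, "2013-05-17 12:00:03"),
    (3, "2013-05-17 14:05:00"), (3, "2013-05-17 14:05:01"),
    (3, "2013-05-17 15:30:00"), (3, "2013-05-17 15:30:01"),
    (3, "2013-05-17 15:30:02"), (1, "2013-05-18 09:12:00"),
    (1, "2013-05-18 09:13:30"), (1, "2013-05-18 09:13:45"),
    (1, "2013-05-18 09:14:00"), (2, "2013-05-20 15:45:00"),
    (2, "2013-05-20 15:45:03"), (2, "2013-05-20 15:45:10"),
    (2, "2013-05-23 07:36:00"), (2, "2013-05-23 07:36:10"),
    (2, "2013-05-23 07:36:12"), (2, "2013-05-23 07:36:15"),
    (1, "2013-05-24 11:55:00"), (1, "2013-05-24 11:55:02"),
]
conn.executemany("INSERT INTO events (item_id, event_date) VALUES (?, ?)", sample)

QUERY = """
WITH flagged AS (
    -- 1 when the row starts a new run, 0 when it continues the previous one.
    -- The first row per item has no predecessor, so the comparison is NULL
    -- and the CASE falls through to 1.
    SELECT item_id, event_date,
           CASE
             WHEN CAST(strftime('%s', event_date) AS INTEGER)
                - CAST(strftime('%s', LAG(event_date) OVER (
                      PARTITION BY item_id ORDER BY event_date)) AS INTEGER)
                  <= :gap
             THEN 0 ELSE 1
           END AS starts_group
    FROM events
),
numbered AS (
    -- Running total of the flags gives each run its own number per item.
    SELECT item_id, event_date,
           SUM(starts_group) OVER (
               PARTITION BY item_id ORDER BY event_date) AS grp
    FROM flagged
)
SELECT item_id, COUNT(*) AS total, MAX(event_date) AS last_date_in_group
FROM numbered
GROUP BY item_id, grp
ORDER BY last_date_in_group
"""

groups = conn.execute(QUERY, {"gap": GAP_SECONDS}).fetchall()
for row in groups:
    print(row)
```

With a 5-minute threshold this reproduces the desired results table from the question, including the run of four on 2013-05-18 where one gap is 90 seconds.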