Number duplicate records in a MySQL table

I have a table with the following schema:
id  control  code  amount
1   200      12    300
2   400      12    300
3   200      12    300
4   100      10    400
5   100      10    400
6   500      13    500
I am trying to list the duplicate records on a UI. Using the following query I can retrieve the duplicate records and show them on the UI:
select * from mwt group by control,code,amount having count(id) > 1;
id  control  code  amount
1   200      12    300
4   100      10    400
Here the records with id 1 and 4 are duplicates of 3 and 5 respectively.
On the UI, the user will click a check-box adjacent to a record, and the corresponding duplicate records should be populated into the UI. To make things easier, I am trying to populate another column named dup_id. Using this dup_id it is possible to filter the results on the UI, which are in JSON format.
How to create a result set similar to the one shown below?
id  control  code  amount  dup_id
1   200      12    300     1
2   400      12    300
3   200      12    300     1
4   100      10    400     4
5   100      10    400     4
6   500      13    500

This seems like a simpler solution than that suggested by @kickstarter - but maybe I've misunderstood the requirement...
SELECT x.*
     , y.dup_id
FROM my_table x
LEFT JOIN
     ( SELECT MIN(id) dup_id
            , control
            , code
            , amount
       FROM my_table
       GROUP BY control, code, amount
       HAVING COUNT(*) > 1
     ) y
  ON y.control = x.control
 AND y.code = x.code
 AND y.amount = x.amount;
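Run against the sample data in the question, this should produce exactly the requested result, with dup_id carrying the lowest id of each duplicate group and NULL for the rows that have no duplicates:
id  control  code  amount  dup_id
1   200      12    300     1
2   400      12    300     NULL
3   200      12    300     1
4   100      10    400     4
5   100      10    400     4
6   500      13    500     NULL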

Depending on how accurate the order has to be, you could do something like this.
This gets all the unique control / code / amount combinations with a count, to provide a flag saying whether a row is a duplicate, ordered by control / code / amount so that they are in order. It does a cross join to initialise a few user variables.
Then it calculates a counter, only incrementing it if any of control / code / amount have changed AND the row is a duplicate. It then sets user variables to store the previous values of control / code / amount.
The outer query then orders the results back into id order.
SELECT sub3.id,
       sub3.control,
       sub3.code,
       sub3.amount,
       sub3.dup_id
FROM
(
    SELECT sub2.id,
           sub2.control,
           sub2.code,
           sub2.amount,
           @cnt:=IF(@control=control AND @code=code AND @amount=amount AND sub2.id_count IS NOT NULL, @cnt, IF(sub2.id_count IS NULL, @cnt, @cnt + 1)),
           @control:=control,
           @code:=code,
           @amount:=amount,
           IF(sub2.id_count IS NULL, NULL, @cnt) AS dup_id
    FROM
    (
        SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub1.id_count
        FROM mwt
        LEFT OUTER JOIN
        (
            SELECT control, code, amount, COUNT(id) AS id_count
            FROM mwt
            GROUP BY control, code, amount
            HAVING id_count > 1
        ) sub1
        ON mwt.control = sub1.control
        AND mwt.code = sub1.code
        AND mwt.amount = sub1.amount
        ORDER BY mwt.control, mwt.code, mwt.amount
    ) sub2
    CROSS JOIN
    (
        SELECT @cnt:=0, @control:=0, @code:=0, @amount:=0
    ) sub0
) sub3
ORDER BY id
Note that this is ordering by control, code and amount, so not an exact match for your required output (which would require getting the first duplicates ordered by id first).
EDIT - Simpler and better way to do it. This gets all the duplicate rows with the min id for those duplicates (ordered by the min id), and uses a user variable to add a sequence number for those. Then LEFT OUTER JOINs that back against the main table to put that sequence number in all the matching rows.
SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub2.dup_id
FROM mwt
LEFT OUTER JOIN
(
    SELECT sub1.id, sub1.control, sub1.code, sub1.amount, @cnt:=@cnt+1 AS dup_id
    FROM
    (
        SELECT MIN(id) AS id, control, code, amount
        FROM mwt
        GROUP BY control, code, amount
        HAVING COUNT(id) > 1
        ORDER BY id
    ) sub1
    CROSS JOIN
    (
        SELECT @cnt:=0
    ) sub0
) sub2
ON mwt.control = sub2.control
AND mwt.code = sub2.code
AND mwt.amount = sub2.amount
ORDER BY mwt.id
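Not part of the original answers, but if you are on MySQL 8.0 or later, the user-variable tricks can be avoided entirely with window functions. This sketch labels each duplicated row with the lowest id of its group (which matches the desired output above) and leaves dup_id NULL for rows without duplicates:
SELECT id, control, code, amount,
       CASE WHEN COUNT(*) OVER (PARTITION BY control, code, amount) > 1
            THEN MIN(id) OVER (PARTITION BY control, code, amount)
       END AS dup_id
FROM mwt
ORDER BY id;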

Would you need a dup_id column? I hope this can be achieved with a simple query like the one below:
select id
     , control
     , code
     , amount
from mwt
where control = ?  -- control of the selected record
  and code = ?     -- code of the selected record
  and amount = ?   -- amount of the selected record
  and id <> ?      -- id of the selected record
You can very well omit the last id <> ? predicate if the requirement is to list the duplicates including the selected record.

Related

Retrieve the record with the newest change per group

I've got a table locations:
user | timestamp | state | other_data
-------------------------------------
1    | 100       | 1     | some_data
1    | 200       | 1     | some_data
1    | 300       | 0     | some_data
1    | 400       | 0     | some_data
2    | 100       | 0     | some_data
2    | 200       | 0     | some_data
This is for a location tracking app. A location has two states (1 for "user is within range" and 0 for "user is out of range").
Now I want to retrieve the last time a user's location state has changed.
Example for user = 1
user | timestamp | state | other_data
-------------------------------------
1    | 300       | 0     | some_data
Because this was the first location update that has the same state value as the "current" (timestamp 400) record.
Higher-level description: I want to display the user something like "You have been in / out of range since [timestamp]"
The faster the solution, the better, of course.
I would use ranks to order the rows and then pick the min timestamp of the first ranked rows.
select user, min(timestamp) as timestamp, min(state) as state
from
  (select l.*, @rn:=case when @user=user and @state=state then @rn
                         when @user<>user then 1
                         else @rn+1 end as rnk
         , @user:=user, @state:=state
   from locations l
   cross join (select @rn:=0, @user:='', @state:='') r
   order by user, timestamp desc
  ) t
where rnk=1
group by user
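As an aside (not from the original answer): on MySQL 8.0+ the same idea can be expressed with window functions instead of user variables. A sketch, assuming the locations table above: number each user's runs of consecutive equal states, then keep the earliest row of the latest run.
with flagged as (
    select user, timestamp, state,
           case when state <=> lag(state) over (partition by user order by timestamp)
                then 0 else 1 end as is_change
    from locations
),
runs as (
    select user, timestamp, state,
           sum(is_change) over (partition by user order by timestamp) as run_no
    from flagged
)
select user, min(timestamp) as timestamp, min(state) as state
from runs r
where run_no = (select max(run_no) from runs where user = r.user)
group by user;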
You can do this with a correlated subquery:
select l.*
from locations l
where l.timestamp = (select max(l2.timestamp)
                     from locations l2
                     where l2.user = l.user
                    );
For this to work well, you want an index on locations(user, timestamp).
This can be faster than the join and group by approach. In particular, the correlated subquery can make use of an index, but a group by on the whole table often does not (in MySQL).
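For instance (a sketch; the index name is arbitrary):
ALTER TABLE locations ADD INDEX idx_user_timestamp (user, timestamp);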
As far as I am aware the only way to achieve this is a self join. Something a bit like:
Select table.* From table
Inner Join
(Select id, max(timestamp) as tm from table group by id) as m
On m.tm = table.timestamp and m.id = table.id
Syntax is for MS SQL; it should transfer to MySQL though. You might have to specify column names instead of table.*

MySQL: Group By Clause Not Using Index when used with Case

I'm using MySQL.
I can't change the DB structure, so that's not an option, sadly.
THE ISSUE:
When I use GROUP BY with CASE (as needed in my situation), MySQL uses
filesort and the delay is humongous (approx 2-3 minutes):
http://sqlfiddle.com/#!9/f97d8/11/0
But when I don't use CASE, just GROUP BY group_id, MySQL easily uses the
index and the result is fast:
http://sqlfiddle.com/#!9/f97d8/12/0
Scenario (detailed):
Table msgs, containing records of sent messages, with fields:
id,
user_id, (the guy who sent the message)
type, (0 means it's a group msg. All the msgs sent under this are marked by group_id. So let's say group_id = 5 sent 5 msgs; the table will have 5 records with group_id = 5 and type = 0. For type > 0, group_id will be NULL, because all other types have no group_id as they are individual msgs sent to a single recipient)
group_id (if type = 0, will contain the group_id, else NULL)
The table contains approx 10 million records for user id 50001, with different types (i.e. group as well as individual msgs).
Now the QUERY:
SELECT msgs.*
FROM msgs
INNER JOIN accounts ON (msgs.user_id = accounts.id)
WHERE 1
  AND msgs.user_id IN (50111)
  AND msgs.type IN (0, 1, 5, 7)
GROUP BY CASE `msgs`.`type` WHEN 0 THEN `msgs`.`group_id` ELSE `msgs`.`id` END
ORDER BY `msgs`.`group_id` DESC
LIMIT 100
I HAVE to get the summary in a single QUERY,
so msgs sent to a group, let's say group 5 (with 5 records in this table), will be shown as 1 record in the summary (I may show COUNT later, but that's not an issue).
The individual msgs have NULL as group_id, so I can't just put GROUP BY group_id, because that would group all individual msgs into a single record, which is not acceptable.
Sample output can be something like:
id  owner_id  type  group_id  COUNT
1   50001     0     2         5
1   50001     1     NULL      1
1   50001     4     NULL      1
1   50001     0     7         5
1   50001     5     NULL      1
1   50001     5     NULL      1
1   50001     5     NULL      1
1   50001     0     10        5
Now the problem is that the GROUP BY condition using CASE (which I currently think I need, because I only need to group by group_id when type = 0) is causing a lot of delay, because it doesn't use indexes the way a plain GROUP BY group_id does. Please view the SQLFiddles above to see the EXPLAIN results.
Can anyone please give advice on how to get it optimized?
UPDATE
I tried a workaround that somehow works out (it drops the initial queries to 1 sec). Using a union, it minimizes the resultset (whose size is what forces SQL to write to disk for the filesort) by limiting the resultset of group msgs and of individual msgs separately (view query below):
-- The first part of the union retrieves group msgs (those with type 0, which need to be grouped by group_id). It applies a limit to contain the out-of-control result set.
-- The second part retrieves individual msgs (those with type != 0, grouped by msgs.id; not strictly necessary, but just to be safe from duplicate entries due to joins). It applies a limit to contain the out-of-control result set.
-- It joins the two to retrieve the desired resultset.
Here's the query:
SELECT *
FROM
(
    (
        SELECT msgs.id AS reference_id, user_id, type, group_id
        FROM msgs
        INNER JOIN accounts ON (msgs.user_id = accounts.id)
        WHERE 1
          AND accounts.id IN (50111) AND type = 0
        GROUP BY msgs.group_id
        ORDER BY msgs.id DESC
        LIMIT 40
    )
    UNION ALL
    (
        SELECT msgs.id AS reference_id, user_id, type, group_id
        FROM msgs
        INNER JOIN accounts ON (msgs.user_id = accounts.id)
        WHERE 1
          AND msgs.type != 0
          AND accounts.id IN (50111)
        GROUP BY msgs.id
        ORDER BY msgs.id
        LIMIT 40
    )
) AS temp
ORDER BY reference_id
LIMIT 20,20
But it has a lot of caveats:
- I need to handle the limit in the inner queries as well. Let's say 20 recs per page, and I'm on page 4. For the inner queries I need to apply LIMIT 0,80, since I'm not sure which of the two parts had how many records in the previous 3 pages. So, as the records per page and the number of pages grow, my query grows heavier. Let's say 1K recs per page and I'm on page 100, or page 1K: the load gets heavier and time increases exponentially.
- I need to handle ordering in the inner queries and then apply it to the resultset prepared by the union; conditions need to be applied to both inner queries separately (but that's not much of an issue).
- Can't use SQL_CALC_FOUND_ROWS, so I will need to get the count using separate queries.
The main issue is the first one: the higher I go with the pagination, the heavier it gets.
Would this run faster?
SELECT id, user_id, type, group_id
FROM
( SELECT id, user_id, type, group_id, IFNULL(group_id, id) AS foo
    FROM msgs
    WHERE user_id IN (50111)
      AND type IN (0, 1, 5, 7)
) AS t  -- the derived table needs an alias
GROUP BY foo
ORDER BY `group_id` DESC
LIMIT 100
It needs INDEX(user_id, type).
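For instance (a sketch; the index name is arbitrary):
ALTER TABLE msgs ADD INDEX idx_user_type (user_id, type);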
Does this give the 'correct' answer?
SELECT DISTINCT *
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
GROUP BY IFNULL(group_id, id)
ORDER BY `group_id` DESC
LIMIT 100
(It needs the same index)

SELECT rows with minimum count(*)

Let's say I have a simple table voting with columns
id (primary key), token (int), candidate (int), rank (int).
I want to extract all rows having a specific rank, grouped by candidate, and most importantly only those with the minimum count(*).
So far I have reached:
SELECT candidate, count( * ) AS count
FROM voting
WHERE rank =1
AND candidate <200
GROUP BY candidate
HAVING count = min( count )
But it is returning an empty set. If I replace min(count) with the actual minimum value, it works properly.
I have also tried
SELECT candidate,min(count)
FROM (SELECT candidate,count(*) AS count
FROM voting
where rank = 1
AND candidate < 200
group by candidate
order by count(*)
) AS temp
But this resulted in only 1 row. I have 3 rows with the same min count but with different candidates, and I want all 3 of those rows.
Can anyone help me? The number of rows sharing the same minimum count(*) value would also help.
The sample is quite big, so I am showing some dummy values:
id  token          candidate  rank
1   $sampleToken1  101        1
2   $sampleToken2  102        1
3   $sampleToken3  103        1
4   $sampleToken4  102        1
Here, when grouped according to candidate, there are 3 rows in the count(*) results:
candidate  count(*)
101        1
103        1
102        2
I want the top 2 rows to be shown, i.e. those with count(*) = 1 or whatever the minimum is.
Try to use this script as a pattern:
-- find minimum count
SELECT MIN(cnt) INTO @min FROM (SELECT COUNT(*) cnt FROM voting GROUP BY candidate) t;
-- show records with minimum count
SELECT t1.* FROM voting t1
JOIN (SELECT candidate FROM voting GROUP BY candidate HAVING COUNT(*) = @min) t2
  ON t1.candidate = t2.candidate;
Remove your HAVING clause completely, it is not correctly written,
and add a subselect into the WHERE clause to fit that criteria
(i.e. select candidate, count(*) as count from voting where rank = 1 and count = (select ..... )).
The HAVING clause cannot use the MIN function the way you are trying. Replace the MIN function with an absolute value, such as HAVING count > 10.
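A complete version of that idea might look like this (a sketch: the derived table computes each candidate's count, and the subquery in HAVING picks the minimum of those counts):
SELECT candidate, COUNT(*) AS cnt
FROM voting
WHERE `rank` = 1 AND candidate < 200
GROUP BY candidate
HAVING cnt = (SELECT MIN(c)
              FROM (SELECT COUNT(*) AS c
                    FROM voting
                    WHERE `rank` = 1 AND candidate < 200
                    GROUP BY candidate) AS counts);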

Elegant mysql to select, group, combine multiple rows from one table

Here is a simplified version of my table:
group  price  spec
a      1      .
a      2      ..
b      1      ...
b      2
c      .
.      .
.      .
I'd like to produce a result like this: (I'll refer to this as result_table)
price_a | spec_a | price_b | spec_b | price_c ... | total_cost
1 (min) | .      | 1 (min) | ..     | ...         | = 1+1+...
Basically I want to:
select the rows containing the min price within each group
combine columns into a single row
I know this can be done using several queries and/or combined with some non-SQL processing on the results, but I suspect that there may be better solutions.
The reason that I want to do task 2 (combine columns into a single row)
is because I want to do something like the following with the result_table:
select *,
(result_table.total_cost + table1.price + table2.price) as total_combined_cost
from result_table
right join table1
right join table2
This may be too much to ask for, so here is some other thoughts on the problem:
Instead of trying to combine multiple rows (task 2), store them in a temporary table
(which would make it easier to calculate the total_cost using sum).
Feel free to drop any thoughts; it doesn't have to be a complete answer. I'll consider it brilliant enough if you have an elegant way to do task 1!
==Edited/Added 6 Feb 2012==
The goal of my program is to identify the best combinations of items with minimal cost (and preferably possessing higher utilitarian value at the same time).
Considering @ypercube's comment about a large number of groups, a temporary table seems to be the only feasible solution. It has also been pointed out that there is no pivoting function in MySQL (although it can be implemented, it's not necessary to perform such an operation).
Okay, after studying @Johan's answer, I'm thinking about something like this for task 1:
select * from
(
    select * from
    result_table
    order by price asc
) as ordered_table
group by `group`
;
Although it looks dodgy (it relies on MySQL returning the first row of each group, which is not guaranteed), it seems to work.
==Edited/Added 7 Feb 2012==
Since more than one combination may produce the same min value, I have modified my answer:
select result_table.* from
(
    select * from
    (
        select * from
        result_table
        order by price asc
    ) as ordered_table
    group by `group`
) as single_min_table
inner join result_table
    on result_table.group = single_min_table.group
    and result_table.price = single_min_table.price
;
However, I have just realised there is another problem I need to deal with:
I cannot ignore all the spec columns, since there is a provider property. Items from different providers may or may not be able to be assembled together, so to be safe (and to simplify my problem) I have decided to combine items from the same provider only, so the problem becomes:
For example, if I have an initial table like this (with only 2 groups and 2 providers):
id  group  price  spec  provider
1   a      1      .     x
2   a      2      ..    y
3   a      3      ...   y
4   b      1      ...   y
5   b      2            x
6   b      3            z
I need to combine
id  group  price  spec  provider
1   a      1      .     x
5   b      2            x
and
2   a      2      ..    y
4   b      1      ...   y
Record (id 6) can be eliminated from the choices since it does not have all the groups available.
So the task is not necessarily to select only the min of each group; rather, it's to select one from each group so that for each provider I have a minimal combined cost.
You cannot pivot in MySQL, but you can group results together.
The GROUP_CONCAT function will give you a result like this:
column A  column B       column C  column D
groups    specs          prices    sum(price)
a,b,c     some,list,xyz  1,5,7     13
Here's a sample query:
(The query assumes you have a primary (or unique) key called id defined on the target table).
SELECT
    GROUP_CONCAT(a.`group`) as groups
    ,GROUP_CONCAT(a.spec) as specs
    ,GROUP_CONCAT(a.price) as prices
    ,SUM(a.price) as total_of_min_prices
FROM
    ( SELECT price, spec, `group` FROM table1
      WHERE id IN
          (SELECT MIN(id) as id FROM table1 GROUP BY `group` HAVING price = MIN(price))
    ) AS a
See: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
Producing the total_cost only:
SELECT SUM(min_price) AS total_cost
FROM
( SELECT MIN(price) AS min_price
FROM TableX
GROUP BY `group`
) AS grp
If a result set with the minimum prices returned in rows (not in columns) per group is fine, then your problem is of the greatest-n-per-group type. There are various methods to solve it. Here's one:
SELECT tg.grp
     , tm.price AS min_price
     , tm.spec
FROM
    ( SELECT DISTINCT `group` AS grp
      FROM TableX
    ) AS tg
JOIN TableX AS tm
  ON tm.PK =            -- the Primary Key of the table
     ( SELECT tmin.PK
       FROM TableX AS tmin
       WHERE tmin.`group` = tg.grp
       ORDER BY tmin.price ASC
       LIMIT 1
     )
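Not part of the original answer, but on MySQL 8.0+ the same greatest-n-per-group result can be had with a window function (a sketch, reusing the TableX name from above):
SELECT grp, price AS min_price, spec
FROM ( SELECT `group` AS grp, price, spec,
              ROW_NUMBER() OVER (PARTITION BY `group` ORDER BY price ASC) AS rn
       FROM TableX
     ) AS ranked
WHERE rn = 1;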

MySQL GROUP BY DateTime +/- 3 seconds

Suppose I have a table with 3 columns:
id (PK, int)
timestamp (datetime)
title (text)
I have the following records:
1, 2010-01-01 15:00:00, Some Title
2, 2010-01-01 15:00:02, Some Title
3, 2010-01-02 15:00:00, Some Title
I need to do a GROUP BY records that are within 3 seconds of each other. For this table, rows 1 and 2 would be grouped together.
There is a similar question here: Mysql DateTime group by 15 mins
I also found this: http://www.artfulsoftware.com/infotree/queries.php#106
I don't know how to convert these methods into something that will work for seconds. The trouble with the method on the SO question is that it seems to me that it would only work for records falling within a bin of time that starts at a known point. For instance, if I were to get FLOOR() to work with seconds, at an interval of 5 seconds, a time of 15:00:04 would be grouped with 15:00:01, but not grouped with 15:00:06.
Does this make sense? Please let me know if further clarification is needed.
EDIT: For the set of numbers, {1, 2, 3, 4, 5, 6, 7, 50, 51, 60}, it seems it might be best to group them {1, 2, 3, 4, 5, 6, 7}, {50, 51}, {60}, so that each grouping row depends on if the row is within 3 seconds of the previous. I know this changes things a bit, I'm sorry for being wishywashy on this.
I am trying to fuzzy-match logs from different servers. Server #1 may log an item, "Item #1", and Server #2 will log that same item, "Item #1", within a few seconds of server #1. I need to do some aggregate functions on both log lines. Unfortunately, I only have title to go on, due to the nature of the server software.
I'm using Tom H.'s excellent idea but doing it a little differently here:
Instead of finding all the rows that are the beginnings of chains, we can find all times that are the beginnings of chains, then go back and find the rows that match the times.
Query #1 here should tell you which times are the beginnings of chains by finding which times do not have any times below them but within 3 seconds:
SELECT DISTINCT a.Timestamp
FROM Table a
LEFT JOIN Table b
    ON (b.Timestamp >= a.Timestamp - INTERVAL 3 SECOND
        AND b.Timestamp < a.Timestamp)
WHERE b.Timestamp IS NULL
And then for each row, we can find the largest chain-starting timestamp that is no later than our timestamp with Query #2:
SELECT Table.id, MAX(StartOfChains.Timestamp) AS ChainStartTime
FROM Table
JOIN ([query #1]) StartOfChains
    ON Table.Timestamp >= StartOfChains.Timestamp
GROUP BY Table.id
Once we have that, we can GROUP BY it as you wanted.
SELECT COUNT(*) --or whatever
FROM Table
JOIN ([query #2]) GroupingQuery
ON Table.id = GroupingQuery.id
GROUP BY GroupingQuery.ChainStartTime
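Putting the three steps together (a sketch; Table is the placeholder table name used in this answer, so treat it as a template rather than something to run verbatim):
SELECT GroupingQuery.ChainStartTime, COUNT(*) AS rows_in_group
FROM Table
JOIN (
    SELECT Table.id, MAX(StartOfChains.Timestamp) AS ChainStartTime
    FROM Table
    JOIN (
        SELECT DISTINCT a.Timestamp
        FROM Table a
        LEFT JOIN Table b
            ON (b.Timestamp >= a.Timestamp - INTERVAL 3 SECOND
                AND b.Timestamp < a.Timestamp)
        WHERE b.Timestamp IS NULL
    ) StartOfChains ON Table.Timestamp >= StartOfChains.Timestamp
    GROUP BY Table.id
) GroupingQuery ON Table.id = GroupingQuery.id
GROUP BY GroupingQuery.ChainStartTime;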
I'm not entirely sure this is distinct enough from Tom H's answer to be posted separately, but it sounded like you were having trouble with implementation, and I was thinking about it, so I thought I'd post again. Good luck!
Now that I think I understand your problem, based on your comment response to OMG Ponies, I think that I have a set-based solution. The idea is to first find the start of any chains based on the title. The start of a chain is going to be defined as any row where there is no match within three seconds prior to that row:
SELECT
MT1.my_id,
MT1.title,
MT1.my_time
FROM
My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
MT2.title = MT1.title AND
(
MT2.my_time < MT1.my_time OR
(MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
) AND
MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
WHERE
MT2.my_id IS NULL
Now we can assume that any non-chain starters belong to the chain starter that appeared before them. Since MySQL doesn't support CTEs, you might want to throw the above results into a temporary table, as that would save you the multiple joins to the same subquery below.
SELECT
SQ1.my_id,
COUNT(*) -- You didn't say what you were trying to calculate, just that you needed to group them
FROM
(
SELECT
MT1.my_id,
MT1.title,
MT1.my_time
FROM
My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
MT2.title = MT1.title AND
(
MT2.my_time < MT1.my_time OR
(MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
) AND
MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
WHERE
MT2.my_id IS NULL
) SQ1
INNER JOIN My_Table MT3 ON
MT3.title = SQ1.title AND
MT3.my_time >= SQ1.my_time
LEFT OUTER JOIN
(
SELECT
MT1.my_id,
MT1.title,
MT1.my_time
FROM
My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
MT2.title = MT1.title AND
(
MT2.my_time < MT1.my_time OR
(MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
) AND
MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
WHERE
MT2.my_id IS NULL
) SQ2 ON
SQ2.title = SQ1.title AND
SQ2.my_time > SQ1.my_time AND
SQ2.my_time <= MT3.my_time
WHERE
SQ2.my_id IS NULL
This would look much simpler if you could use CTEs or if you used a temporary table. Using the temporary table might also help performance.
Also, there will be issues with this if you can have timestamps that match exactly. If that's the case then you will need to tweak the query slightly to use a combination of the id and the timestamp to distinguish rows with matching timestamp values.
EDIT: Changed the queries to handle exact matches by timestamp.
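For instance, the chain-starter subquery could be materialized once in a temporary table (a sketch; the table name is made up):
CREATE TEMPORARY TABLE chain_starters AS
SELECT MT1.my_id, MT1.title, MT1.my_time
FROM My_Table MT1
LEFT OUTER JOIN My_Table MT2 ON
    MT2.title = MT1.title AND
    (
        MT2.my_time < MT1.my_time OR
        (MT2.my_time = MT1.my_time AND MT2.my_id < MT1.my_id)
    ) AND
    MT2.my_time >= MT1.my_time - INTERVAL 3 SECOND
WHERE MT2.my_id IS NULL;
The SQ1 and SQ2 references in the query above could then both read from chain_starters instead of repeating the subquery.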
Warning: Long answer. This should work, and is fairly neat, except for one step in the middle where you have to be willing to run an INSERT statement over and over until it doesn't do anything, since we can't do recursive CTE things in MySQL.
I'm going to use this data as the example instead of yours:
id Timestamp
1 1:00:00
2 1:00:03
3 1:00:06
4 1:00:10
Here is the first query to write:
SELECT a.id as aid, b.id as bid
FROM Table a
JOIN Table b
    ON (ABS(TIMESTAMPDIFF(SECOND, a.Timestamp, b.Timestamp)) <= 3)  -- i.e. a.Timestamp is within 3 seconds of b.Timestamp
It returns:
aid bid
1 1
1 2
2 1
2 2
2 3
3 2
3 3
4 4
Let's create a nice table to hold those things that won't allow duplicates:
CREATE TABLE
Adjacency
( aid INT(11)
, bid INT(11)
, PRIMARY KEY (aid, bid) -- important for later
)
Now the challenge is to find something like the transitive closure of that relation.
To do so, let's find the next level of links. By that I mean: since we have 1 2 and 2 3 in the Adjacency table, we should add 1 3:
INSERT IGNORE INTO Adjacency(aid,bid)
SELECT adj1.aid, adj2.bid
FROM Adjacency adj1
JOIN Adjacency adj2
ON (adj1.bid = adj2.aid)
This is the non-elegant part: You'll need to run the above INSERT statement over and over until it doesn't add any rows to the table. I don't know if there is a neat way to do that.
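One way to automate it (a sketch, not from the original answer; the procedure name is made up) is a stored procedure that repeats the insert until ROW_COUNT() reports that no new rows were added:
DELIMITER //
CREATE PROCEDURE close_adjacency()
BEGIN
    REPEAT
        INSERT IGNORE INTO Adjacency (aid, bid)
        SELECT adj1.aid, adj2.bid
        FROM Adjacency adj1
        JOIN Adjacency adj2 ON (adj1.bid = adj2.aid);
    UNTIL ROW_COUNT() = 0 END REPEAT;
END//
DELIMITER ;
CALL close_adjacency();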
Once this is over, you will have a transitively-closed relation like this:
aid bid
1 1
1 2
1 3 --added
2 1
2 2
2 3
3 1 --added
3 2
3 3
4 4
And now for the punchline:
SELECT aid, GROUP_CONCAT( bid ) AS Neighbors
FROM Adjacency
GROUP BY aid
returns:
aid Neighbors
1 1,2,3
2 1,2,3
3 1,2,3
4 4
So
SELECT DISTINCT Neighbors
FROM (
SELECT aid, GROUP_CONCAT( bid ) AS Neighbors
FROM Adjacency
GROUP BY aid
) Groupings
returns
Neighbors
1,2,3
4
Whew!
I like @Chris Cunningham's answer, but here's another take on it.
First, my understanding of your problem statement (correct me if I'm wrong):
You want to look at your event log as a sequence, ordered by the time of the event,
and partition it into groups, defining the boundary as being an interval of
more than 3 seconds between two adjacent rows in the sequence.
I work mostly in SQL Server, so I'm using SQL Server syntax. It shouldn't be too difficult to translate into MySQL SQL.
So, first our event log table:
--
-- our event log table
--
create table dbo.eventLog
(
id int not null ,
dtLogged datetime not null ,
title varchar(200) not null ,
primary key nonclustered ( id ) ,
unique clustered ( dtLogged , id )
)
Given the above understanding of the problem statement, the following query should give you the upper and lower bounds of your groups. It's a simple nested select statement with two GROUP BYs to collapse things:
The innermost select defines the upper bound of each group. That upper boundary defines a group.
The outer select defines the lower bound of each group.
Every row in the table should fall into one of the groups so defined, and any given group may well consist of a single date/time value.
[edited: the upper bound is the lowest date/time value where the interval is more than 3 seconds]
select dtFrom = min( t.dtFrom ) ,
dtThru = t.dtThru
from ( select dtFrom = t1.dtLogged ,
dtThru = min( t2.dtLogged )
from dbo.EventLog t1
left join dbo.EventLog t2 on t2.dtLogged >= t1.dtLogged
and datediff(second,t1.dtLogged,t2.dtLogged) > 3
group by t1.dtLogged
) t
group by t.dtThru
You could then pull rows from the event log and tag them with the group to which they belong thus:
select *
from ( select dtFrom = min( t.dtFrom ) ,
dtThru = t.dtThru
from ( select dtFrom = t1.dtLogged ,
dtThru = min( t2.dtLogged )
from dbo.EventLog t1
left join dbo.EventLog t2 on t2.dtLogged >= t1.dtLogged
and datediff(second,t1.dtLogged,t2.dtLogged) > 3
group by t1.dtLogged
) t
group by t.dtThru
) period
join dbo.EventLog t on t.dtLogged >= period.dtFrom
and t.dtLogged <= coalesce( period.dtThru , t.dtLogged )
order by period.dtFrom , period.dtThru , t.dtLogged
Each row is tagged with its group via the dtFrom and dtThru columns returned. You could get fancy and assign an integral row number to each group if you want.
Simple query:
SELECT * FROM time_history GROUP BY ROUND(UNIX_TIMESTAMP(time_stamp)/3);