MySQL combine rows on conditions

I have a table like:
user | area | start | end
1    | 1    | 12    | 18
1    | 1    | 19    | 27
1    | 1    | 29    | 55
1    | 1    | 80    | 99
This means a 'user' appeared in an 'area' from time 'start' to time 'end'; areas can overlap.
What I want is a result like:
user | start-end
1    | 12-18,19-27,29-55
1    | 80-99
In other words: combine appearances whose time difference is less than a specified value, e.g. (row2.start - row1.end < 10), so that one result row stands for one 'visit' to the area by a user.
Currently I can distinguish each visit and get the count of visits by comparing the table with itself in one SQL statement, but I can't find a way to get the result above.
Any help is appreciated.
Explanation: the first 3 appearances are linked together as a single visit because row2.start - row1.end < 10 and row3.start - row2.end < 10; the last appearance is a new visit because 80 (row4.start) - 55 (row3.end) >= 10.

We need two steps:
1 - combine a row with its predecessor so that the start and the previous end are in the same row
SELECT
  user, area, start, end,
  @lastend AS lastend,       -- end of the previous row, read before it is overwritten
  @lastend:=end AS ignoreme  -- remember this row's end for the next row
FROM
  tablename,
  (SELECT @lastend:=0) AS init
ORDER BY user, area, start, end;
2 - use the difference as a grouping criterion
SELECT
  ...
FROM
  ...
  (SELECT @groupnum:=0) AS groupinit
GROUP BY
  ... ,
  -- start a new group whenever the gap to the previous row is 10 or more
  IF(start-lastend>=10,@groupnum:=@groupnum+1,@groupnum)
Now let's combine it:
SELECT
  user, area,
  GROUP_CONCAT(CONCAT(start,"-",end)) AS start_end
FROM (
  SELECT
    user, area, start, end, @lastend AS lastend, @lastend:=end AS ignoreme
  FROM
    tablename,
    (SELECT @lastend:=0) AS init
  ORDER BY user, area, start, end
) AS baseview,
(SELECT @groupnum:=0) AS groupinit
GROUP BY
  user, area,
  IF(start-lastend>=10,@groupnum:=@groupnum+1,@groupnum)
Edit
Fixed typos and verified: SQLfiddle
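The grouping rule itself can be checked outside the database. The following is a minimal sketch in Python, not the answer's method: the function name group_visits and the max_gap parameter are made up for illustration, and unlike the query it resets the previous end whenever the user or area changes. In MySQL 8.0 and later, where assigning user variables inside expressions is deprecated, window functions such as LAG() would be the usual way to express the same comparison.

```python
def group_visits(rows, max_gap=10):
    """Group (user, area, start, end) rows into visits.

    A new visit starts when the user/area changes or when the gap between
    this row's start and the previous row's end is max_gap or more.
    """
    visits = []
    prev_key = None
    prev_end = None
    for user, area, start, end in sorted(rows):
        key = (user, area)
        if key != prev_key or start - prev_end >= max_gap:
            visits.append((user, area, []))  # open a new visit
        visits[-1][2].append(f"{start}-{end}")
        prev_key, prev_end = key, end
    return [(u, a, ",".join(spans)) for u, a, spans in visits]


# Sample data taken from the question.
rows = [(1, 1, 12, 18), (1, 1, 19, 27), (1, 1, 29, 55), (1, 1, 80, 99)]
for visit in group_visits(rows):
    print(visit)
# (1, 1, '12-18,19-27,29-55')
# (1, 1, '80-99')
```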

Related

MySQL accessing previous row values

I have two columns Page_from and Page_to. When the records are sorted, they appear as...
Page_from  Page_to
1          4
5          7
9          11
Here page number 8 is missing.
I want to find the missing page number, so I must be able to compare the value of Page_to in the previous row with Page_from in the current row.
You can find the beginning of a missing sequence by finding the previous record, and comparing the previous page_to and the current page_from. If there is a gap, you can get both the first and last page in the gap.
select t.prev_to + 1 as missing_page_from, t.page_from - 1 as missing_page_to
from (select t.*,
             -- page_to of the closest earlier row
             (select tprev.page_to
              from t tprev
              where tprev.page_from < t.page_from
              order by tprev.page_from desc
              limit 1
             ) as prev_to
      from t
     ) t
where prev_to is not null and
      prev_to <> t.page_from - 1;
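The idea of comparing each row's page_from with the previous row's page_to can be tried end to end on the sample data. This is only a sketch: it runs against SQLite through Python's sqlite3 module rather than MySQL, and the table name pages and the aliases cur, prev and p are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (page_from INTEGER, page_to INTEGER)")
conn.executemany("INSERT INTO pages VALUES (?, ?)", [(1, 4), (5, 7), (9, 11)])

gaps = conn.execute("""
    SELECT p.prev_to + 1   AS missing_page_from,
           p.page_from - 1 AS missing_page_to
    FROM (SELECT cur.page_from,
                 -- page_to of the closest earlier row, NULL for the first row
                 (SELECT prev.page_to
                  FROM pages prev
                  WHERE prev.page_from < cur.page_from
                  ORDER BY prev.page_from DESC
                  LIMIT 1) AS prev_to
          FROM pages cur) p
    WHERE p.prev_to IS NOT NULL
      AND p.prev_to <> p.page_from - 1
    ORDER BY missing_page_from
""").fetchall()

print(gaps)  # [(8, 8)]
```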

Mysql query to skip rows and check for status changes

I'm building a mysql query but I'm stuck... (I'm logging each minute)
I have 3 tables. Logs, log_field, log_value.
logs -> id, create_time
log_value -> id, log_id,log_field_id,value
log_field -> id, name (two of the entries are status and username)
The values for status can be online,offline and idle...
What I would like to get from my query is:
When someone's status changes in my logs, I want a row with create_time, username, status.
So for a given user, I want my query to skip rows until a new status appears...
And I need to be able to put a time interval in which status changes are ignored.
Can someone please help ?
A caveat up front: nothing in your post differentiates an actual "User" (such as a user ID), so consider what happens if you have two "John Smith" names.
First, an introduction to MySQL @variables. You can think of them as an inline program running while the query is processing rows. You create variables, then change them as each row gets processed, IN THE SAME order as the := assignments occur in the field selection, which is critical. I'll cover that shortly.
Next, an initial premise. You have a field table of all possible fields that can/do get logged. Two of them matter here: one is for the user's name, the other for the status whose changes you are looking for. I don't know what those internal "ID" numbers are, but they would have to be fixed values per your existing table. In my scenario, I am assuming that field ID = 1 is for the user's name and field ID = 2 is the status column. Otherwise, you would need two more joins to the field table just to confirm which field was the one you wanted. Obviously my "ID" field values will not match your production tables, so please change those accordingly.
Here's the query...
select FinalAlias.*
from (
  select
    PQ.*,
    if( @lastUser = PQ.LogUser, 1, 0 ) as SameUser,
    @lastTime := if( @lastUser = PQ.LogUser, @lastTime, @ignoreTime ) as lastChange,
    if( PQ.create_time > @lastTime + interval 20 minute, 1, 0 ) as BeyondInterval,
    @lastTime := PQ.create_time as chgTime,
    @lastUser := PQ.LogUser as chgUser
  from
    ( select
        ByStatus.id,
        l.create_time,
        ByStatus.Value LogStatus,
        ByUser.Value LogUser
      from
        log_value as ByStatus
        join logs l
          on ByStatus.log_id = l.id
        join log_value as ByUser
          on ByStatus.log_id = ByUser.log_id
          AND ByUser.log_field_id = 1
      where
        ByStatus.log_field_id = 2
      order by
        ByUser.Value,
        l.create_time ) PQ,
    ( select @lastUser := '',
             @lastTime := now(),
             @ignoreTime := now() ) sqlvars
) FinalAlias
where
  SameUser = 1
  and BeyondInterval = 1
Now, what's going on. The inner-most query (result alias PQ representing "PreQuery") is just asking for all log values where the field_id = 2 (status column) exists. From that log entry, go to the log table for its creation time. While we're at it, join AGAIN to the log value table on the same log ID, but this time also look for field_id = 1 so we can get the user name.
Once that is done, get the log ID, Creation time, Status Value and Who it was for all pre-sorted on a per-user basis and sequentially time oriented. This is the critical step. The data must be pre-organized by user/time to compare the "last" time for a given user to the "next" time their log status changed.
Now, the MySQL @variables. Join the prequery to another select of @variables which is given an "sqlvars" query alias. This will pre-initialize the variables @lastUser, @lastTime and @ignoreTime. Now, look at what I'm doing in this section of the field list:
if( @lastUser = PQ.LogUser, 1, 0 ) as SameUser,
@lastTime := if( @lastUser = PQ.LogUser, @lastTime, @ignoreTime ) as lastChange,
if( PQ.create_time > @lastTime + interval 20 minute, 1, 0 ) as BeyondInterval,
@lastTime := PQ.create_time as chgTime,
@lastUser := PQ.LogUser as chgUser
This is like doing the following pseudocode in a loop for every record (which is already sequentially ordered by person and their respective log time):
FOR EACH ROW IN RESULT SET
  Set a flag "SameUser" = 1 if the value of @lastUser is the same
  as the current person record we are looking at
  if the last user is the same as the previous record
    use the @lastTime value as the "lastChange" column
  else
    use the @ignoreTime value as the last change column
  Now, build another flag based on the current record create time
  and whatever the @lastTime value is, based on a 20 minute interval.
  Set it to 1 if MORE THAN the 20 minute interval has passed.
  Now the key to cycling to the next record:
  force @lastTime = current record create_time
  force @lastUser = current user
END FOR LOOP
So, if you have the following as a result of the prequery... (leaving date portion off)
create  status  user    sameuser  lastchange  20minFlag  carry to next row compare
07:34   online  Bill    0         09:05       0          07:34  Bill
07:52   idle    Bill    1         07:34       0          07:52  Bill
08:16   online  Bill    1         07:52       1          08:16  Bill
07:44   online  Mark    0         09:05       0          07:44  Mark
07:37   idle    Monica  0         09:05       0          07:37  Monica
08:03   online  Monica  1         07:37       1          08:03  Monica
Notice the first record for Bill. The same-user flag = 0 since there was nobody before him. The last change was 9:05 (via the NOW() when creating the sqlvars variables), but then look at the "carry to next row compare" columns. This is setting @lastTime and @lastUser after the current row has been compared as needed.
Next row for Bill. It sees he is same as last user previous row, so the SameUser flag is set to 1. We now know that we have a good "Last Time" to compare against the current record "Create Time". So, from 7:34 to 7:52 is 18 minutes and LESS than our 20 minute interval so the 20 minute flag is set to 0. Now, we retain the current 7:52 and Bill for third row.
Third row for Bill. Still Same User (flag=1), last change of 7:52 compared to now 8:16 and we have 24 minutes... So the 20 minute flag = 1. Retain 8:16 and Bill for next row.
First row for Mark. Same User = 0 since last user was Bill. Uses same 9:05 ignore time and don't care about 20 min flag, but now save 7:44 and Mark for next row compare.
On to Monica. Different than Mark, so SameUser = 0, etc to finish similar to Bill.
So, now we have all the pieces and rows considered. Now, take all these and wrap them up as the "FinalAlias" of the query and all we do is apply a WHERE clause for "SameUser = 1" AND "20 Minute Flag" has been reached.
You can strip down the final column list as needed, and remove the where clause to look at results, but be sure to add an outer ORDER BY clause for name/create_time to see similar pattern as I have here.
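The row-by-row walk described above can be replayed in a few lines of Python. This is only an illustrative sketch of the two flags, not the query itself: the function name status_changes, the min_gap parameter and the date 2024-01-01 are placeholders, and, like the query, it checks the user and the time interval but does not compare status values.

```python
from datetime import datetime, timedelta


def status_changes(rows, min_gap=timedelta(minutes=20)):
    """Keep (create_time, status, user) rows whose previous row belongs to
    the same user and lies more than min_gap in the past."""
    kept = []
    last_user, last_time = None, None
    # Pre-sort by user, then time, as the inner "PQ" query does.
    for create_time, status, user in sorted(rows, key=lambda r: (r[2], r[0])):
        same_user = user == last_user
        if same_user and create_time > last_time + min_gap:
            kept.append((create_time, status, user))
        # Carry the current row forward for the next comparison.
        last_user, last_time = user, create_time
    return kept


def at(hour, minute):
    return datetime(2024, 1, 1, hour, minute)  # arbitrary placeholder date


rows = [
    (at(7, 34), "online", "Bill"), (at(7, 52), "idle", "Bill"),
    (at(8, 16), "online", "Bill"), (at(7, 44), "online", "Mark"),
    (at(7, 37), "idle", "Monica"), (at(8, 3), "online", "Monica"),
]
kept = status_changes(rows)
for create_time, status, user in kept:
    print(create_time.strftime("%H:%M"), status, user)
# 08:16 online Bill
# 08:03 online Monica
```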

Checking consecutive values at a MySQL query

I have a MySQL table like this:
ID - Time - Value
And I'm getting every pair of ID, Time (grouped by ID) where Value is greater than a certain threshold. So basically, I'm getting every ID which has a value greater than the threshold at least once. The query looks like this:
SELECT ID, Time FROM mydb.MYTABLE
WHERE Value>%s AND Time>=%s AND Time<=%s
GROUP BY ID
EDIT: The Time checks let me restrict the query to a time range of my choice within the data in the table; they have nothing else to do with what I am asking.
It works perfectly, but now I want to add some filtering: I want to skip those times where the value is greater than the threshold (let's call them alarms) if there is no alarm at the Time just before or just after. I mean: if the alarm occurs at a single, isolated instant of time instead of two consecutive instants of time, I'll consider it a false alarm and keep it out of the query response.
Of course I can do this with a call for each Id to check for this, but I'd like to do this in a single query to make it faster. I guess I could use conditionals, but I don't have that expertise at MySQL.
Any help?
EDIT2: Example for Threshold = 10
ID - Time - Value
1 - 2004 - 9
1 - 2005 - 11
1 - 2006 - 8
2 - 2107 - 12
2 - 2109 - 13
3 - 3402 - 11
3 - 3403 - 12
In this example, only ID 3 should be a valid alarm, since 2 consecutive time values for this ID have their value > threshold. ID 1 has a single, isolated alarm, so it should be filtered. For ID 2 there are 2 alarms, but not consecutive, so it should also be filtered.
Something like this:
10 - is a threshold
0 - minimum of the time period
100000 - maximum of the time period
select ID, min(Time)
from
(
  SELECT ID, Time,
    (select max(time) from t
     where Time<t1.Time
       and Id=t1.Id
       and Value>10) LAG_G,
    (select max(time) from t
     where Time<t1.Time
       and Id=t1.Id
       and Value<=10) LAG_L,
    (select min(time) from t
     where Time>t1.Time
       and Id=t1.Id
       and Value>10) LEAD_G,
    (select min(time) from t
     where Time>t1.Time
       and Id=t1.Id
       and Value<=10) LEAD_L
  FROM t as t1
  WHERE Value>10 AND Time>=0 AND Time<=100000
) t3
where ifnull(LAG_G,0)>ifnull(LAG_L,0)
   OR
   ifnull(LEAD_G,100000)<ifnull(LEAD_L,100000)
GROUP BY ID
SQLFiddle demo
This query works for searching near records.
If you need to search records by Time (+1, -1 ) as you've mentioned in the comment try this query:
select ID, min(Time) from t as t1
where Value>10
  AND Time>=%s2 AND Time<=%s1
  and
  (
    Exists(select 1 from t where Value>10
           and Id=t1.Id
           and Time=t1.Time-1)
    OR
    Exists(select 1 from t where Value>10
           and Id=t1.Id
           and Time=t1.Time+1)
  )
group by ID
SQLFiddle demo
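To see the Time-1 / Time+1 variant on the question's sample data, here is a small sketch run through Python's sqlite3 module. It uses SQLite rather than MySQL, and the table name readings, the aliases r and n, and the parameter names are assumptions for the example; the threshold of 10 and the 0 to 100000 range are the values used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, time INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(1, 2004, 9), (1, 2005, 11), (1, 2006, 8),
     (2, 2107, 12), (2, 2109, 13),
     (3, 3402, 11), (3, 3403, 12)],
)

params = {"threshold": 10, "t_min": 0, "t_max": 100000}
alarms = conn.execute("""
    SELECT r.id, MIN(r.time)
    FROM readings AS r
    WHERE r.value > :threshold
      AND r.time BETWEEN :t_min AND :t_max
      -- keep an alarm only if the instant just before or just after
      -- is also above the threshold for the same id
      AND (EXISTS (SELECT 1 FROM readings AS n
                   WHERE n.id = r.id AND n.value > :threshold
                     AND n.time = r.time - 1)
           OR EXISTS (SELECT 1 FROM readings AS n
                      WHERE n.id = r.id AND n.value > :threshold
                        AND n.time = r.time + 1))
    GROUP BY r.id
""", params).fetchall()

print(alarms)  # [(3, 3402)]
```

ID 1 drops out because its only alarm (2005) has no neighbouring alarm, and ID 2 drops out because 2107 and 2109 are not adjacent.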
Something like this alarm?
SELECT ID, Time, sum(if(Value>%threshold, 1, 0)) alert_active
FROM mydb.MYTABLE
WHERE Value>%s3 AND Time>=%s2 AND Time<=%s1
GROUP BY ID;
I don't understand this part exactly:
In this example, only ID 3 should be a valid alarm, since 2
consecutive time values for this ID have their value > threshold. ID 1
has a single, isolated alarm, so it should be filtered. For ID 2 there
are 2 alarms, but not consecutive, so it should also be filtered.
I guess that you want to filter the alerts:
SELECT ID, Time
FROM mydb.MYTABLE
WHERE Value>%s3 AND Time>=%s2 AND Time<=%s1
GROUP BY ID
having value<%threshold;

SQL Query Help - Grouping By Sequences of Digits

I have a table, which includes the following columns and data:
id  dtime       instance  data  dtype
1   2012-10-22  10000     d     1
2   2012-10-22  10000     d     1
..
7   2012-10-22  10004     d     1
..
15  2012-10-22  10000     #     1
16  2012-10-22  10004     d     1
17  2012-10-22  10000     d     1
I want to group sequences of 'd's in the data column, with the '#' at the end of the sequence.
This could have been done by grouping via the instance column, which is an individual stream of data, however there can be multiple sequences within the stream.
I also want to end a sequence if there are no data columns in the same instance for, say, 3 seconds after the last data of that instance and no '#'s have been found within that interval.
I have managed to do exactly this using cursors and while loops, which worked reasonably well for tables with 1000s of rows, however this query will be used on far more rows eventually, and these two methods would take around a minute with a dataset of just 3-5000 rows.
Reading on this website and others, it seems that set-based logic may be the way to go, however I can think of no way to do what I need without some kind of loop on each row that compares it to every other to build the 'sequences'.
If anyone could help, or point me in the direction of something that could, it would be greatly appreciated. :)
I would ideally like the data to be output in the following format:
datacount  instance  lastdata  dtime
20         10000     #         2012-10-22
19         10000     d         2012-10-22
22         10004     #         2012-10-22
20         10022     #         2012-10-22
Where (datacount) is a count of the number of rows in a 'sequence' (which is the data leading up to a '#' or 3 second delay), (instance) is the instance ID from the original table, (lastdata) is the last data value in the sequence, (dtime) is the datetime value of the last data value.
Let me show you how to do this for the final '#'. The time difference follows a similar idea. The key idea is to get the next '#' after the current row. For this you need a correlated subquery. After that, you can do a group by:
select groupid, count(*) as NumInSeq, max(dtime) as LastDateTime
from (select t.*,
             (select min(t2.id) from t t2 where t2.id > t.id and t2.data = '#'
             ) as groupid
      from t
     ) t
group by groupid
Handling the time sequence is a bit more complicated. It is something like this:
select groupid, count(*) as NumInSeq, max(dtime) as LastDateTime,
       (case when sum(case when data = '#' then 1 else 0 end) > 0 then '#' else 'd' end) as FinalData
from (select t.*,
             (select min(t2.id)
              from t t2
              where t2.id > t.id and
                    (t2.data = '#' or UNIX_TIMESTAMP(t2.dtime) - UNIX_TIMESTAMP(t.dtime) < 3)
             ) as groupid
      from t
     ) t
group by groupid
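The "next '#'" grouping can be tried on a tiny data set. The sketch below is an adaptation under stated assumptions, not the answer's query: it runs on SQLite via Python's sqlite3, the table name stream and its seven rows are invented for the example, it matches only within the same instance, and it uses >= so that each '#' row closes its own sequence, which is one reading of the question. Rows after the last '#' get a NULL group and are filtered out here; the 3-second rule is not covered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stream (id INTEGER, instance INTEGER, data TEXT)")
conn.executemany(
    "INSERT INTO stream VALUES (?, ?, ?)",
    [(1, 10000, "d"), (2, 10000, "d"), (3, 10004, "d"), (4, 10000, "#"),
     (5, 10004, "d"), (6, 10004, "#"), (7, 10000, "d")],
)

sequences = conn.execute("""
    SELECT g.instance, g.groupid, COUNT(*) AS datacount
    FROM (SELECT s.instance,
                 -- id of the first '#' at or after this row, same instance
                 (SELECT MIN(s2.id)
                  FROM stream s2
                  WHERE s2.instance = s.instance
                    AND s2.id >= s.id
                    AND s2.data = '#') AS groupid
          FROM stream s) g
    WHERE g.groupid IS NOT NULL
    GROUP BY g.instance, g.groupid
    ORDER BY g.groupid
""").fetchall()

print(sequences)  # [(10000, 4, 3), (10004, 6, 3)]
```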

Retrieve maximum value from a table containing duplicate values according to a condition

I have a table tbl_usertests from which I want to retrieve the user who has the maximum testscore for each test.
Note: User here means usertestid, which is unique.
Its columns are:
pk_usertestid attemptdate uploaddate fk_tbl_tests_testid fk_tbl_users_userid testscore totalquestionsnotattempted totalquestionscorrect totalquestionsincorrect totalquestions timetaken iscurrent
data :
1;NULL;"2010-06-24 22:48:07";"11";"3";"1";"53";"1";"21";"75";"92";"1"
2;NULL;"2010-06-25 01:21:37";"11";"4";"13";"0";"13";"62";"75";"801";"1"
3;NULL;"2010-06-25 01:21:50";"10";"4";"17";"5";"17";"53";"75";"640";"1"
4;NULL;"2010-06-25 01:24:23";"11";"4";"13";"0";"13";"62";"75";"801";"1"
5;NULL;"2010-06-25 01:24:47";"10";"4";"17";"5";"17";"53";"75";"640";"1"
6;NULL;"2010-06-25 01:36:04";"11";"5";"13";"0";"13";"62";"75";"801";"1"
7;NULL;"2010-06-25 01:47:26";"7";"5";"10";"1";"10";"49";"60";"302";"1"
My Query is :
SELECT max(`testscore`) , `fk_tbl_tests_testid` , `fk_tbl_users_userid` , `pk_usertestid`
FROM `tbl_usertests`
GROUP BY `fk_tbl_tests_testid`
This query output:
max(`testscore`)  fk_tbl_tests_testid  fk_tbl_users_userid  pk_usertestid
10                7                    5                    7
17                10                   4                    3
13                11                   3                    1
But the problem is that if there are two users who have the same score, it displays only one user because I have used a GROUP BY clause.
E.g. for testid = 10 I have two records (pk_usertestid 3 and 5) but it displays 3 only.
When two users have the same testscore, I want the one whose upload date is earlier. It should display usertestid = 3, since the upload date of 3 is earlier than that of 5.
Right now its displaying 3 but it is due to group by clause.
I am unable to construct the query.
Please help me on this
Thanks
Try this:
SELECT t.`fk_tbl_tests_testid`, t.`fk_tbl_users_userid`, t.`pk_usertestid`, maxscores.maxscore
FROM `tbl_usertests` t
JOIN (SELECT `fk_tbl_tests_testid`, max(`testscore`) AS maxscore
      FROM `tbl_usertests`
      GROUP BY `fk_tbl_tests_testid`) maxscores
  ON t.`fk_tbl_tests_testid` = maxscores.`fk_tbl_tests_testid`
  AND t.`testscore` = maxscores.maxscore
The logic is to separate the whole thing into two parts: get the maximum (or any other aggregate) value for each group (this is the subquery part), then join each row of the original table to its group's aggregate, keeping only the rows whose score equals that maximum.
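The question additionally asks that ties on the top score go to the row with the earlier upload date. One way to express both conditions, shown here as a sketch rather than as part of the answer, is a NOT EXISTS filter: keep a row only if no other row for the same test has a higher score, or the same score with an earlier uploaddate. The example runs on SQLite through Python's sqlite3, keeps only the five columns it needs, and assumes testscore is numeric; rows tied on both score and upload date would all be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tbl_usertests (
        pk_usertestid INTEGER, uploaddate TEXT,
        fk_tbl_tests_testid INTEGER, fk_tbl_users_userid INTEGER,
        testscore INTEGER)
""")
# Values copied from the question's data dump (unused columns dropped).
conn.executemany(
    "INSERT INTO tbl_usertests VALUES (?, ?, ?, ?, ?)",
    [(1, "2010-06-24 22:48:07", 11, 3, 1),
     (2, "2010-06-25 01:21:37", 11, 4, 13),
     (3, "2010-06-25 01:21:50", 10, 4, 17),
     (4, "2010-06-25 01:24:23", 11, 4, 13),
     (5, "2010-06-25 01:24:47", 10, 4, 17),
     (6, "2010-06-25 01:36:04", 11, 5, 13),
     (7, "2010-06-25 01:47:26", 7, 5, 10)],
)

best = conn.execute("""
    SELECT t.fk_tbl_tests_testid, t.fk_tbl_users_userid,
           t.pk_usertestid, t.testscore
    FROM tbl_usertests t
    WHERE NOT EXISTS (
        SELECT 1
        FROM tbl_usertests o
        WHERE o.fk_tbl_tests_testid = t.fk_tbl_tests_testid
          AND (o.testscore > t.testscore
               OR (o.testscore = t.testscore
                   AND o.uploaddate < t.uploaddate)))
    ORDER BY t.fk_tbl_tests_testid
""").fetchall()

for row in best:
    print(row)
# (7, 5, 7, 10)
# (10, 4, 3, 17)
# (11, 4, 2, 13)
```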