MySQL - self join optimization - mysql

I have a table of phone events by HomeId. Each row has an EventId (on hook, off hook, ring, DTMF, etc.), TimeStamp, Sequence (auto increment) and HomeId. I'm working on a query to find specific types of occurrences (i.e. inbound or outbound calls) and their duration.
I had planned on doing this using a multiple self-join on this table to pick out the sequences of events that usually indicate one type of occurrence or the other. E.g. an inbound call would be a period of inactivity, followed by no DTMF, then ringing and caller ID (possibly), then an off hook. I would find the next on hook and thus have the duration.
My table is indexed by HomeId, EventId and Sequence and has ~60K records. When I do an EXPLAIN of my query it shows the indexes being used and 75, 75, 1, 1, 748 for the row counts. Seems pretty doable. But when I run the query it takes more than 10 minutes (at which point the MySQL Query Browser times out).
Query for outbound calls:
select pe0.HomeId, pe1.Stamp, pe1.mSec, timediff( pe4.Stamp, pe0.Stamp ) from Phone_Events pe0
join Phone_Events pe1 on pe0.HomeId = pe1.HomeId and pe1.Sequence = pe0.Sequence - 1 and abs(timediff( pe0.Stamp, pe1.Stamp )) > 10
join Phone_Events pe2 on pe0.HomeId = pe2.HomeId and pe2.Sequence = pe0.Sequence + 1 and pe2.EventId = 22
join Phone_Events pe4 on pe4.HomeId = pe0.HomeId and pe4.EventId = 30 and pe4.Stamp > pe0.Stamp
where pe0.eventId = 12 and pe0.HomeId = 111
AND
NOT EXISTS(SELECT * FROM Phone_Events pe3
WHERE pe3.HomeId = pe0.HomeId
AND pe3.EventId not in( 13, 22 )
AND pe3.Stamp > pe0.Stamp and pe3.Stamp < pe4.Stamp );
Is there something specific to self-joining that makes this slow? Is there a better way to optimize this? The killer seems to be the NOT EXISTS portion - it is there to make sure there are no events between the last 'on hook' and the current 'off hook'.
EDIT: EventId's as follows:
'1', 'device connection'
'2', 'device disconnection'
'3', 'device alarm'
'11', 'ring start'
'12', 'off hook'
'13', 'hang up(other end)'
'15', 'missed call'
'21', 'caller id'
'22', 'dtmf'
'24', 'device error'
'30', 'on hook'
'31', 'ring stop'

Complete rewrite based on new information. How I approached this was to start with an inner-most query to get all the records we care about based exclusively on HomeID = 111, and make sure they come back pre-sorted by sequence (there is an index on HomeID, Sequence). As we all know, a phone call starts by picking up the phone (EventID = 12), getting dial tone and dialing out (DTMF, EventID = 22), someone answering, and continues until the phone is back on the hook (EventID = 30). If it's a hang up (EventID = 13), we want to ignore it.
I don't know why you are looking at the sequence # PRIOR to the current call, or whether it really has any bearing. It looks like you are just trying to get completed calls and their duration. That said, I would remove the LEFT JOIN on Phone_Events and the corresponding WHERE clause; it may have been there while you were just trying to figure this out.
Anyhow, back to the logic. The inner-most query guarantees the call events come out in order; you won't have two simultaneous calls. So by getting them in order first, I then join to SQLVars (which creates the inline variable @NextCall for the query). Its purpose is to identify every time a new call is about to begin (EventID = 12). If so, take whatever the sequence number is and save it. This value remains the same until the next call, so all the other event rows get the same "starting sequence ID". In addition, I'm looking for the other events: an EventID = 22 at the starting sequence + 1, set as a flag; then the max time for the start of the call (only set when EventID = 12) and for the end of the call (EventID = 30); and finally a flag based on your check for a hang up (EventID = 13), i.e. don't consider the call if it was a hang up that never connected through.
By doing a group by, I've in essence, rolled-up each call to its own line... grouped by the home ID, and the sequence number used to initiate the actual phone call. Once THAT is done, I can then query the data and compute the call duration since the start/end time are on the same row, no self-self-self joins involved.
Finally, the WHERE clause kicks out any phone calls that HAD a hang up. Again, I don't know if you still need the comparison of the call's start time against the last ending event.
SELECT
      PreGroupedCalls.*,
      timediff( PreGroupedCalls.CallEndTime, PreGroupedCalls.CallStartTime ) CallDuration
   from
      ( SELECT
              Calls.HomeID,
              @NextCall := if( Calls.EventID = 12, Calls.Sequence, @NextCall ) as NextNewCall,
              MAX( if( Calls.EventID = 12, Calls.Stamp, 0 )) as CallStartTime,
              MAX( if( Calls.EventID = 30, Calls.Stamp, 0 )) as CallEndTime,
              MAX( if( Calls.EventID = 22 and Calls.Sequence = @NextCall + 1, 1, 0 )) as HadDTMFEntry,
              MAX( if( Calls.EventID = 13, 1, 0 )) as WasAHangUp
           from
              ( select pe.HomeId,
                       pe.Sequence,
                       pe.EventID,
                       pe.Stamp
                    from
                       Phone_Events pe
                    where
                       pe.HomeID = 111
                    order by
                       pe.Sequence ) Calls,
              ( select @NextCall := 0 ) SQLVars
           group by
              Calls.HomeID,
              NextNewCall ) PreGroupedCalls
      LEFT JOIN Phone_Events PriorCallEvent
         ON PriorCallEvent.Sequence = PreGroupedCalls.NextNewCall - 1
   where
      PreGroupedCalls.WasAHangUp = 0
      AND ( PriorCallEvent.Sequence IS NULL
         OR abs(timediff( PriorCallEvent.Stamp, PreGroupedCalls.CallStartTime )) > 10 )
COMMENT FROM FEEDBACK / ERROR REPORTED
To fix the DOUBLE error, you will obviously need to make a slight change in the SQLVars select. Try the following:
( select @NextCall := CAST( 0 as INT ) ) SQLVars
Now, what the IF() is doing... let's take a look.
@NextCall := if( Calls.EventID = 12, Calls.Sequence, @NextCall )
means: look at the EventID. If it is a 12 (i.e. off hook), grab whatever the sequence number is for that entry; this becomes the new "starting sequence" of another call. If not, just keep whatever the last value was, as it is a continuation of a call in progress. Now, let's look at some simulated data to help better illustrate all the columns.
Original data, and the value @NextCall will ultimately hold for each row...
HomeID  Sequence  EventID  Stamp    @NextCall
111     1         12       8:00:00  1    beginning of a new call
111     2         22       8:00:01  1    not a new "12" event, keep last value
111     3         30       8:05:00  1    call ended, phone back on hook
111     4         12       8:09:00  4    new call, use the sequence of THIS entry
111     5         22       8:09:01  4    same call
111     6         13       8:09:15  4    same call, but a hang up
111     7         30       8:09:16  4    same call, phone back on hook
111     8         12       8:15:30  8    new call, get sequence ID
111     9         22       8:15:31  8    same call...
111     10        30       8:37:15  8    same call ending...
Now, the query SHOULD create something like this
HomeID  NextNewCall  CallStartTime  CallEndTime  HadDTMFEntry  WasAHangUp
111     1            8:00:00        8:05:00      1             0
111     4            8:09:00        8:09:16      1             1
111     8            8:15:30        8:37:15      1             0
As you can see, @NextCall keeps all the sequential entries for a given call grouped together, so you don't have to rely on greater-than/less-than span comparisons. The events always follow a certain path, so whichever sequence started the call is the basis for the rest of the events until the next call starts, and then THAT sequence is grabbed for THAT call's group.
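To sanity-check the grouping idea outside MySQL, here is a small Python sketch (hypothetical, just mirroring the running "next call" variable described above) run against the simulated rows:

```python
from collections import OrderedDict

# Simulated event rows: (sequence, event_id, stamp), as in the table above.
events = [
    (1, 12, "8:00:00"), (2, 22, "8:00:01"), (3, 30, "8:05:00"),
    (4, 12, "8:09:00"), (5, 22, "8:09:01"), (6, 13, "8:09:15"),
    (7, 30, "8:09:16"),
    (8, 12, "8:15:30"), (9, 22, "8:15:31"), (10, 30, "8:37:15"),
]

def roll_up_calls(events):
    """Group events into calls keyed by the sequence of their off-hook (12)
    event, mirroring what the running variable does inside the GROUP BY query."""
    calls = OrderedDict()
    next_call = 0
    for seq, event_id, stamp in events:
        if event_id == 12:            # off hook starts a new call
            next_call = seq
        call = calls.setdefault(next_call, {
            "start": None, "end": None, "dtmf": 0, "hangup": 0})
        if event_id == 12:
            call["start"] = stamp
        elif event_id == 30:          # on hook ends the call
            call["end"] = stamp
        elif event_id == 22 and seq == next_call + 1:
            call["dtmf"] = 1          # DTMF immediately after off hook
        elif event_id == 13:
            call["hangup"] = 1
    return calls
```

Running this over the sample rows produces the same three rolled-up calls as the expected result table.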
Yup, it's a lot to grasp... but hopefully now more digestible for you :)

Related

mysql order with many criterias on two tables

I'm trying to sort a visitor list by several different criteria and got stuck, as I can't figure out how to do this.
I have a queue of people who check in first, and the list is generated from that. A client is marked as showedUp if he comes to the door (after being called with his number on the list). If someone comes late, he must be at the end of the list. Another thing: the list starts every time with a different number.
Day 1 -> List from 1 to 160
Day 2 -> List from 33 to 160, 1 to 32
Day 3 -> List from 65 to 160, 1 to 64
If someone comes late, meaning the number after him has already been called, he should be added to the end of the list. E.g. with a list of 1 to 160, if 10 was late (as 20 was already called), the order should be 1 to 160, then 10. With a different starting number it would be 33 to 160, 1 to 32, then 10. The criterion here is: if a placeNr after your number has already been called (showedUp), then you go to the end of the list.
Tables
clients (id, name, placeNr)
visits (id, pid, checkInTime, showedUp, showedUpTime)
Select
SELECT clients.id AS id, visits.id AS visitId, clients.placeNr AS placeNr, clients.name AS name
FROM clients, visits
WHERE clients.id = visits.pid
  AND visits.checkInTime >= '1447286401' AND visits.checkInTime <= '1447372799'
ORDER BY clients.placeNr < '1',
  if(visits.showedUpTime < visits.checkInTime, clients.placeNr, 1),
  clients.placeNr
So how do I get the late showers at the end of my list?
Thank you very much in advance!
If I follow your logic, you need a way to determine whether or not someone is late. The following is the structure you want for this type of query; I think I've captured the rules in your question:
select c.id AS id, v.id AS visitId, c.placeNr, c.name,
(case when v.showedUpTime >
(select min(v2.checkInTime)
from visits v2 join
clients c2
on v2.pid = c2.id
where date(v2.showedUpTime) = date(v.showedUpTime) and
c2.placeNr > c.placeNr
)
then 1 else 0 end) as IsLate
from clients c join
visits v
on c.id = v.pid
order by date(v.showedUpTime),
isLate,
c.placeNr;
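If it helps to see the rule outside SQL, here is a rough Python sketch of the lateness test. This is a simplification that compares showed-up times directly, per the question's description of "called"; the placeNr values and times are invented for illustration:

```python
# Hypothetical sample rows for one day: placeNr and the time each visitor
# actually showed up (all values invented for illustration).
visits = [
    {"placeNr": 5,  "showedUp": "08:05"},
    {"placeNr": 10, "showedUp": "09:30"},   # number 10 came late
    {"placeNr": 20, "showedUp": "08:20"},
]

def mark_late(visits):
    """A visitor is late when some higher placeNr was already called (showed
    up) before they did; late visitors then sort to the end of the list."""
    out = []
    for v in visits:
        higher = [w["showedUp"] for w in visits if w["placeNr"] > v["placeNr"]]
        is_late = bool(higher) and v["showedUp"] > min(higher)
        out.append((v["placeNr"], is_late))
    out.sort(key=lambda t: (t[1], t[0]))  # punctual by placeNr first, late last
    return out
```

With the sample data, visitor 10 lands at the end of the list, after 5 and 20.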

Generate time list from query with date range and multiple joins

Struggling to get my head round a complicated SQL query.
Here's a sqlfiddle with the tables/data http://sqlfiddle.com/#!2/7de65
Might make more sense if I explain what the tables are doing;
schedules is a list of train schedules, calling is a list of calling points for that schedule, ordered as the train will pass them. activations are created when it's confirmed the train is going to run, and a movement is created as the train moves over a specified calling point.
calling is associated with schedules via the calling.sid. activations are associated with schedules via activations.sid. movement is associated with activations via movement.activation, and with calling via movement.calling_id.
Now the actual problem;
I want to generate a list of trains active per minute. A train is considered active if
It has at least 1 movement associated with its activation (i.e. it has left its origin)
It does not have a movement associated with its final calling point
It was activated less than 24 hours ago
A train should always be considered active if all of these criteria are met, so listed in the count.
With the data in the above sqlfiddle, a train leaves its first calling point at 14:20 and arrives at its last calling point at 15:04, so it should be included in the count for every minute from 14:20 to 15:04. I'm wondering if someone could shed some light on how to do this. I wouldn't consider myself a SQL expert - probably why I'm struggling. (I wouldn't actually consider myself vaguely competent, but that's a different issue, or maybe the same, I'm not sure.)
I've started going down this sort of line
SELECT
YEAR( activations.activated ),
MONTH( activations.activated ),
DAY( activations.activated ),
HOUR( activations.activated ),
MINUTE( activations.activated ),
count(activations.id)
FROM activations, movement, calling, schedules
WHERE activations.id = movement.activation AND movement.calling_id = calling.id AND schedules.id = activations.sid
GROUP BY DAYOFYEAR( activations.activated ) , HOUR( activations.activated ), MINUTE(activations.activated )
But I know that's wrong, because a train will only be listed once, however long it's activated.
I also thought about doing it directly in Python with a loop for each minute of the specified period, and it sort of works like that, but it's super slow (getting active trains at one-minute resolution over 24 hours results in 1440 queries - not exactly optimized). So I'm thinking it either has to be some clever grouping, or some sort of loop within SQL, but I have no idea how to do either.
So if I ran the query for 14:18 through to 15:07 I would get something like
+-----------------+------------------+
| Timestamp | Active services |
+-----------------+------------------+
| 14:18 1/1/2014 | 0 |
| 14:19 1/1/2014 | 0 |
| 14:20 1/1/2014 | 1 |
| 14:21 1/1/2014 | 1 |
| 14:22 1/1/2014 | 1 |
[...
Identical record for every minute through to
...]
| 15:03 1/1/2014 | 1 |
| 15:04 1/1/2014 | 1 |
| 15:05 1/1/2014 | 0 |
| 15:06 1/1/2014 | 0 |
| 15:07 1/1/2014 | 0 |
+-----------------+------------------+
(Format of the time stamp is not important so long as I can parse it later)
In my head, I can see it sort of working like this (pseudocode)
while time is between report_start_date and report_end_date:
    records = count(
        activations where number of movements(
            movement.actual < time
        ) > 0                      // number of movements created before current minute
        and
        movement.calling_id = calling_points(
            actual < minute
        ).last.id does not exist   // as of this minute, no movement for last calling point
        and
        activations.activated > now - 24 hours   // was activated less than 24 hours ago
    )
    result timestamp, records
    time + 1 minute
I've pretty much got the records = count() bit sorted; it's just either looping over or grouping by time that I'm not sure about. I can group by the date of the first movement record, but again the record will only show for the first minute. I want it to show for every minute it's active.
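The per-minute idea in the pseudocode above can be sketched in plain Python. This is a toy version with a single activation whose first and last movement times are hard-coded from the sample data, and with inclusive boundaries to match the expected output table:

```python
from datetime import datetime, timedelta

fmt = "%Y-%m-%d %H:%M"
# One activation with its first and last movement times, per the sample data.
first_movement = datetime.strptime("2014-01-01 14:20", fmt)
last_movement  = datetime.strptime("2014-01-01 15:04", fmt)

def active_services(start, end):
    """Count active trains for each minute: active once the first movement has
    happened, and until the movement for the final calling point (inclusive,
    matching the sample output above)."""
    results = []
    t = start
    while t <= end:
        active = 1 if first_movement <= t <= last_movement else 0
        results.append((t.strftime("%H:%M"), active))
        t += timedelta(minutes=1)
    return results
```

Running it for 14:18 through 15:07 reproduces the 0/1 transitions shown in the expected table.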
Bonus points
I'm actually trying to implement this in SQLAlchemy (hence the tag). I'm trying to get the basics working in SQL before moving it into a SQLAlchemy query, but if you can do it in both SQL and SQLAlchemy/Python you'll get something - I'm not quite sure what yet, and it may be hypothetical.
Before anyone who actually knows about this stuff criticizes me: an activation doesn't confirm a train will run, but it's close enough for my current purposes. My final query will exclude cancellations and such, but I'm just trying to get the basics down first.
In order to generate a result for every possible minute, I would not rely on every possible minute appearing as a value in one of the existing tables. For this reason I would actually create a "static" table in the database which stores only those timestamps, and construct the query starting from there. I have done the following:
CREATE TABLE "static_time" (
"yyyymmddhhmm" datetime NOT NULL,
PRIMARY KEY ("yyyymmddhhmm")
);
NOTE: for all the tests I used an sqlite database, so you might need to change some places to use the corresponding MySQL constructs.
I also added all the data for a period of 2 days for testing. You should probably do the same for the period from when you want to run the first analysis until some year far in the future (for example 2050-12-31T23:59:00). I did this using sqlalchemy, but I am sure it would make sense to do it directly using some function or a loop:
class StaticTime(Base):
    __tablename__ = 'static_time'
    __table_args__ = ({'autoload': True, },)

# ...

def populate_static_time():
    print "Adding static times"
    sdt = datetime(2014, 1, 1)
    edt = sdt + timedelta(days=2)
    cdt = sdt
    while cdt <= edt:
        session.add(StaticTime(yyyymmddhhmm=cdt))
        cdt += timedelta(minutes=1)
    session.commit()

populate_static_time()
Further I assumed your SA model including defined relationships as per below:
# MODEL
class Schedule(Base):
    __tablename__ = 'schedules'
    __table_args__ = ({'autoload': True, },)

class Calling(Base):
    __tablename__ = 'calling'
    __table_args__ = ({'autoload': True, },)

class Activation(Base):
    __tablename__ = 'activations'
    __table_args__ = ({'autoload': True, },)
    # relationships:
    schedule = relationship("Schedule")

class Movement(Base):
    __tablename__ = 'movement'
    __table_args__ = ({'autoload': True, },)
    # relationships:
    # #note: use activation_rel, as activation is a column name
    activation_rel = relationship("Activation", backref="movements")
Now, lets build the query:
# 0. start with all times and proper counting (columns in SELECT)
q = session.query(
    StaticTime.yyyymmddhhmm.label("yyyymmddhhmm"),
    func.count(Activation.id.distinct()).label("count"),
)

# 1. join on the trains which are active (or finished, which will be excluded later)
q = q.filter(Activation.movements.any(Movement.actual < StaticTime.yyyymmddhhmm))

# 2. join on the trains which are not finished (or no rows for those that did not)
# 2.a) subquery to get the "last" calling per sid
last_calling_sqry = (session.query(
        Calling.sid.label("sid"),
        func.max(Calling.id).label("max_calling_id"),
    )
    .group_by(Calling.sid)
).subquery("xxx")

# 2.b) subquery to find the movement for the "last" calling
train_done_at_sqry = (session.query(
        Activation.id.label("activation_id"),
        Movement.actual.label("arrived_time"),
    )
    .join(last_calling_sqry, Activation.sid == last_calling_sqry.c.sid)
    .join(Movement, and_(
        Movement.calling_id == last_calling_sqry.c.max_calling_id,
        Movement.activation == Activation.id,
    ))
).subquery("yyy")

# 2.c) let's use it now
q = q.outerjoin(train_done_at_sqry,
    train_done_at_sqry.c.activation_id == Activation.id,
)

# 2.d) only those that arrived "after" the currently tested time
q = q.filter(train_done_at_sqry.c.arrived_time >= StaticTime.yyyymmddhhmm)

# 3. add filter to use only those trains that started in the last 24 hours
# #note: not needed in case step-X below is used, as it filters as well
# #TODO: replace func.date(...) with the MySQL version
q = q.filter(Activation.activated >= func.date("now", "-1 days"))

# 4. filter and group by
q = q.group_by(StaticTime.yyyymmddhhmm)
q = q.order_by(StaticTime.yyyymmddhhmm)
# #NOTE: at this point "q" will return only those minutes which have at least 1 active train

# X. FINALLY: WRAP AGAIN TO HAVE ALL MINUTES (also those with no active trains)
sub = q.subquery("sub")
w = session.query(
    StaticTime.yyyymmddhhmm.label("Timestamp"),
    func.ifnull(sub.c.count, 0).label("Active Services")
)
w = w.outerjoin(sub, sub.c.yyyymmddhhmm == StaticTime.yyyymmddhhmm)
# #TODO: replace func.date(...) with the MySQL version
w = w.filter(Activation.activated >= func.date("now", "-1 days"))

for a in w:
    print a
This is a rather complicated query, and given only the data you provided, it is very difficult to test different scenarios. But hopefully you will be able to compare it to your current results, and the code will give you some hints on how this can be done. Also, I might have joined on the wrong columns in some places (actual vs planned). Again, this will probably not work on MySQL as-is (I do not have it and do not know it very well).
Bonus (reversed): the SQL statement generated by the w query for sqlite. You might find it easier to start with raw SQL and gradually move towards sqlalchemy.
SELECT static_time.yyyymmddhhmm AS "Timestamp", ifnull(sub.count, ?) AS "Active Services"
FROM static_time
LEFT OUTER JOIN (
SELECT static_time.yyyymmddhhmm AS yyyymmddhhmm, count(DISTINCT activations.id) AS count
FROM activations, static_time
LEFT OUTER JOIN (
SELECT activations.id AS activation_id, movement.actual AS arrived_time
FROM activations
JOIN (
SELECT calling.sid AS sid, max(calling.id) AS max_calling_id
FROM calling
GROUP BY calling.sid
) AS xxx
ON activations.sid = xxx.sid
JOIN movement
ON movement.calling_id = xxx.max_calling_id AND movement.activation = activations.id
) AS yyy
ON yyy.activation_id = activations.id
WHERE (EXISTS (SELECT 1
FROM movement
WHERE activations.id = movement.activation AND movement.actual < static_time.yyyymmddhhmm)
)
AND yyy.arrived_time >= static_time.yyyymmddhhmm
GROUP BY static_time.yyyymmddhhmm
ORDER BY static_time.yyyymmddhhmm
) AS sub
ON sub.yyyymmddhhmm = static_time.yyyymmddhhmm
WHERE static_time.yyyymmddhhmm >= ? AND static_time.yyyymmddhhmm <= ?
PARAMS: (0, '2014-01-01 14:15:00.000000', '2014-01-01 15:10:00.000000')

Mysql query to skip rows and check for status changes

I'm building a mysql query but I'm stuck... (I'm logging each minute)
I have 3 tables. Logs, log_field, log_value.
logs -> id, create_time
log_value -> id, log_id,log_field_id,value
log_field -> id, name (one on the entries is status and username)
The values for status can be online,offline and idle...
What I would like to see is from my query is:
When in my logs someone changes from status, I want a row with create_time, username, status.
So for a given user, I want my query to skip rows until a new status appears...
And I need to be able to put a time interval in which status changes are ignored.
Can someone please help ?
First, a caveat: you have nothing listed in your post to differentiate an actual "user" (such as a user ID) - what happens if you have two "John Smith" names?
First, an introduction to MySQL @variables. You can think of them as an inline program running while the query is processing rows. You create variables, then change them as each row gets processed, IN THE SAME order as the := assignments occur in the field selection, which is critical. I'll cover that shortly.
First, an initial premise. You have a log_field table of all possible fields that can/do get logged. Two of them matter here: one for the user's name, another for the status you are looking to track changes in. I don't know what those internal ID numbers are, but they would have to be fixed values per your existing table. In my scenario, I am assuming that field ID = 1 is the user's name and field ID = 2 is the status column; otherwise, you would need two more joins to the field table just to confirm which field was the one you wanted. Obviously my ID field values will not match your production tables, so please change them accordingly.
Here's the query...
select FinalAlias.*
   from (
      select
         PQ.*,
         if( @lastUser = PQ.LogUser, 1, 0 ) as SameUser,
         @lastTime := if( @lastUser = PQ.LogUser, @lastTime, @ignoreTime ) as lastChange,
         if( PQ.create_time > @lastTime + interval 20 minute, 1, 0 ) as BeyondInterval,
         @lastTime := PQ.create_time as chgTime,
         @lastUser := PQ.LogUser as chgUser
      from
         ( select
              ByStatus.id,
              l.create_time,
              ByStatus.Value LogStatus,
              ByUser.Value LogUser
           from
              log_value as ByStatus
                 join logs l
                    on ByStatus.log_id = l.id
                 join log_value as ByUser
                    on ByStatus.log_id = ByUser.log_id
                    AND ByUser.log_field_id = 1
           where
              ByStatus.log_field_id = 2
           order by
              ByUser.Value,
              l.create_time ) PQ,
         ( select @lastUser := '',
                  @lastTime := now(),
                  @ignoreTime := now() ) sqlvars
      ) FinalAlias
   where
      SameUser = 1
      and BeyondInterval = 1
Now, what's going on. The inner-most query (result alias PQ, representing "PreQuery") is just asking for all log values where field_id = 2 (the status column) exists. From that log entry, go to the logs table for its creation time... and while we're at it, join AGAIN to the log_value table on the same log ID, but this time looking for field_id = 1 so we can get the user name.
Once that is done, we have the log ID, creation time, status value and who it was for, all pre-sorted on a per-user basis and sequentially time-ordered. This is the critical step: the data must be pre-organized by user/time to compare the "last" time for a given user to the "next" time their log status changed.
Now, the MySQL @variables. Join the prequery to another select of @variables, given the query alias "sqlvars". This pre-initializes the variables @lastUser, @lastTime and @ignoreTime. Now, look at what I'm doing in the field list in this section:
if( @lastUser = PQ.LogUser, 1, 0 ) as SameUser,
@lastTime := if( @lastUser = PQ.LogUser, @lastTime, @ignoreTime ) as lastChange,
if( PQ.create_time > @lastTime + interval 20 minute, 1, 0 ) as BeyondInterval,
@lastTime := PQ.create_time as chgTime,
@lastUser := PQ.LogUser as chgUser
This is like doing the following pseudo code in a loop for every record (which is already ordered by person and their respective log times):
FOR EACH ROW IN RESULT SET
    Set a flag "SameUser" = 1 if the value of @lastUser is the same
        as the current person record we are looking at.
    If the user is the same as on the previous record,
        use @lastTime as the "lastChange" column,
    else
        use @ignoreTime as the last change column.
    Now, build another flag based on the current record's create time
        and whatever the @lastTime value is, using a 20 minute interval.
        Set it to 1 if AT LEAST the 20 minute interval has been met.
    Now, the key to cycling to the next record:
        force @lastTime = current record's create_time
        force @lastUser = current user
END FOR LOOP
So, if you have the following as a result of the prequery... (leaving date portion off)
create  status  user    sameuser  lastchange  20minFlag  carry to next row compare
07:34   online  Bill    0         09:05       0          07:34  Bill
07:52   idle    Bill    1         07:34       0          07:52  Bill
08:16   online  Bill    1         07:52       1          08:16  Bill
07:44   online  Mark    0         09:05       0          07:44  Mark
07:37   idle    Monica  0         09:05       0          07:37  Monica
08:03   online  Monica  1         07:37       1          08:03  Monica
Notice the first record for Bill. The SameUser flag = 0 since there was nobody before him. The last change was 9:05 (via the NOW() used when creating the sqlvars variables), but then look at the "carry to next row compare" columns. This is @lastTime and @lastUser being set after the current row was done being compared as needed.
Next row for Bill. It sees he is the same as the user on the previous row, so the SameUser flag is set to 1. We now know we have a good "last time" to compare against the current record's create time. From 7:34 to 7:52 is 18 minutes, LESS than our 20 minute interval, so the 20 minute flag is set to 0. We then retain 7:52 and Bill for the third row.
Third row for Bill. Still Same User (flag=1), last change of 7:52 compared to now 8:16 and we have 24 minutes... So the 20 minute flag = 1. Retain 8:16 and Bill for next row.
First row for Mark. Same User = 0 since last user was Bill. Uses same 9:05 ignore time and don't care about 20 min flag, but now save 7:44 and Mark for next row compare.
On to Monica. Different than Mark, so SameUser = 0, etc to finish similar to Bill.
So, now we have all the pieces and rows considered. Now, take all these and wrap them up as the "FinalAlias" of the query and all we do is apply a WHERE clause for "SameUser = 1" AND "20 Minute Flag" has been reached.
You can strip down the final column list as needed, and remove the where clause to look at results, but be sure to add an outer ORDER BY clause for name/create_time to see similar pattern as I have here.
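The row-by-row walkthrough above can be replayed in a few lines of Python. This is only a sketch of the variable logic, using the sample rows from the table, not a substitute for the SQL:

```python
from datetime import datetime, timedelta

fmt = "%H:%M"
# (create_time, status, user) rows, pre-sorted by user then time -- the same
# ordering the inner PQ query guarantees.
rows = [
    ("07:34", "online", "Bill"),   ("07:52", "idle",   "Bill"),
    ("08:16", "online", "Bill"),   ("07:44", "online", "Mark"),
    ("07:37", "idle",   "Monica"), ("08:03", "online", "Monica"),
]

def changes_beyond(rows, minutes=20):
    """Replay of the last-user/last-time variable logic: keep a row only when
    it is for the same user as the previous row AND more than `minutes` have
    passed since that previous row."""
    result, last_user, last_time = [], None, None
    for create, status, user in rows:
        t = datetime.strptime(create, fmt)
        same_user = user == last_user
        if same_user and t > last_time + timedelta(minutes=minutes):
            result.append((create, status, user))
        last_user, last_time = user, t   # carry to next row compare
    return result
```

With the sample data this keeps exactly Bill's 08:16 change (24 minutes after 07:52) and Monica's 08:03 change (26 minutes after 07:37), matching the walkthrough.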

SQL - Find all down times and the lengths of the downtimes from MySQL data (set of rows with time stamps and status messages)

I have started monitoring my ISP's downtimes with a looping PHP script which checks the connection automatically every 5 seconds and stores the result in MySQL database. The scripts checks if it's able to reach a couple of remote websites and logs the result. The time and status of the check are always stored in the database.
The structure of the table is following:
id (auto increment)
time (time stamp)
status (varchar)
Now to my issue.
I have the data, but I don't know how to use it to achieve the result I would like to get. Basically I would like to find all the periods of time when the connection was down and for how long the connection was down.
For instance if we have 10 rows with following data
0 | 2012-07-24 22:23:00 | up
1 | 2012-07-24 22:23:05 | up
2 | 2012-07-24 22:23:10 | down
3 | 2012-07-24 22:23:16 | down
4 | 2012-07-24 22:23:21 | up
5 | 2012-07-24 22:23:26 | down
6 | 2012-07-24 22:23:32 | down
7 | 2012-07-24 22:23:37 | up
8 | 2012-07-24 22:23:42 | up
9 | 2012-07-24 22:23:47 | up
the query should return the periods (from 22:23:10 to 22:23:21, and from 22:23:26 to 22:23:37). So the query should find always the time between the first time the connection goes down, and the first time the connection is up again.
One method I thought could work was finding all the rows where the connection goes down or up, but how could I find these rows? And is there some better solution than this?
I really don't know what the query should look like, so the help would be highly appreciated.
Thank you, regards Lassi
Here's one approach.
Start by getting the status rows in order by timestamp (inline view aliased as s). Then use MySQL user variables to keep the values from previous rows, as you process through each row.
What we're really looking for is an 'up' status that immediately follows a sequence of 'down' status. And when we find that row with the 'up' status, what we really need is the earliest timestamp from the preceding series of 'down' status.
So, something like this will work:
SELECT d.start_down
, d.ended_down
FROM (SELECT @i := @i + 1 AS i
, @start := IF(s.status = 'down' AND (@status = 'up' OR @i = 1), s.time, @start) AS start_down
, @ended := IF(s.status = 'up' AND @status = 'down', s.time, NULL) AS ended_down
, @status := s.status
FROM (SELECT t.time
, t.status
FROM mydata t
WHERE t.status IN ('up','down')
ORDER BY t.time ASC, t.status ASC
) s
JOIN (SELECT @i := 0, @status := 'up', @ended := NULL, @start := NULL) i
) d
WHERE d.start_down IS NOT NULL
AND d.ended_down IS NOT NULL
This works for the particular data set you show.
What this doesn't handle (what it doesn't return) is a 'down' period that is not yet ended, that is, a sequence of 'down' status with no following 'up' status.
To avoid a filesort operation to return the rows in order, you'll want a covering index on (time,status). This query will generate a temporary (MyISAM) table to materialize the inline view aliased as d.
NOTE: To understand what this query is doing, peel off that outermost query, and run just the query for the inline view aliased as d (you can add s.time to the select list.)
This query is getting every row with an 'up' or 'down' status. The "trick" is that it is assigning both a "start" and "end" time (marking a down period) on only the rows that end a 'down' period. (That is, the first row with an 'up' status following rows with a 'down' status.) This is where the real work is done, the outermost query just filters out all the "extra" rows in this resultset (that we don't need.)
SELECT @i := @i + 1 AS i
, @start := IF(s.status = 'down' AND (@status = 'up' OR @i = 1), s.time, @start) AS start_down
, @ended := IF(s.status = 'up' AND @status = 'down', s.time, NULL) AS ended_down
, @status := s.status
, s.time
FROM (SELECT t.time
, t.status
FROM mydata t
WHERE t.status IN ('up','down')
ORDER BY t.time ASC, t.status ASC
) s
JOIN (SELECT @i := 0, @status := 'up', @ended := NULL, @start := NULL) i
The purpose of inline view aliased as s is to get the rows ordered by timestamp value, so we can process them in sequence. The inline view aliased as i is just there so we can initialize some user variables at the start of the query.
If we were running on Oracle or SQL Server, we could make use of "analytic functions" or "ranking functions" (as they are named, respectively.) MySQL doesn't provide anything like that, so we have to "roll our own".
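For readers who want to verify the scan logic outside MySQL, here is a minimal Python equivalent of the user-variable pass, run over the question's sample rows:

```python
# (timestamp, status) rows in time order, as in the question's sample data.
rows = [
    ("22:23:00", "up"),   ("22:23:05", "up"),   ("22:23:10", "down"),
    ("22:23:16", "down"), ("22:23:21", "up"),   ("22:23:26", "down"),
    ("22:23:32", "down"), ("22:23:37", "up"),   ("22:23:42", "up"),
    ("22:23:47", "up"),
]

def down_periods(rows):
    """Mirror of the user-variable scan: remember the time the status first
    went 'down'; when it returns to 'up', emit (start_down, ended_down)."""
    periods, status, start = [], "up", None
    for time, new_status in rows:
        if new_status == "down" and status == "up":
            start = time                    # first row of a down stretch
        elif new_status == "up" and status == "down":
            periods.append((start, time))   # 'up' row that ends the stretch
        status = new_status
    return periods
```

Like the SQL version, this does not report a trailing down stretch that has not yet ended.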
I don't really have time to adapt this to work for your setup right now, but I'm doing pretty much the same thing on a web page to monitor when a computer was turned off and when it was turned back on, then calculating the total time it was on.
I also don't know if you have access to PHP, if not completely ignore this. If you do, you might be able to adapt something like this:
$lasttype = "OFF";
$ontime = 0;
$totalontime = 0;

$query2 = "SELECT
               log_unixtime,
               status
           FROM somefaketablename
           ORDER BY
               log_unixtime asc;";

$result2 = mysql_query($query2);
while ($row2 = mysql_fetch_array($result2)) {
    if ($lasttype == "OFF" && $row2['status'] == "ON") {
        $ontime = $row2['log_unixtime'];
    } elseif ($lasttype == "ON" && $row2['status'] == "OFF") {
        $thisblockontime = $row2['log_unixtime'] - $ontime;
        $totalontime += $thisblockontime;
    }
    $lasttype = $row2['status'];
}
Basically, you start out with a fake row that says the computer is off, then loop through each real row.
IF the computer was off, but is now on, set a variable to see when it was turned on, then keep looping...
Keep looping until the computer was ON but is now OFF. When that happens, subtract the previously stored turn-on time from the current row's time. That shows how long it was on for that group of "ON"s.
Like I said, you'll have to adapt that pretty heavily to get it to do what you want, but if you replace "computer on/off" with "connection up/down", it's essentially the same idea...
One thing that makes this work is that I'm storing dates as integers, as a unix timestamp. So you might have to convert your dates so the subtraction works.
I'm unsure if this works (if not, just comment).
The idea: select rows only if the row with an id 1 smaller than the current id has a different status (therefore selecting the first entry of any period), and determine the end time through the >= and the same status.
SELECT ou.id AS outerId,
ou.timeColumn AS currentRowTime,
ou.status AS currentRowStatus,
( SELECT max(time)
FROM statusTable
WHERE time >= ou.timeColumn AND status = ou.status) AS endTime
FROM statusTable ou
WHERE ou.status !=
(SELECT status
FROM statusTable
WHERE id = (ou.id -1))

Use mysql to work out in and out times of vehicle at customer, multiple stops and entries for each customer

I have a mysql table that contains data as per the screenshot below.
My requirement is to generate a mysql query that will show me the in and out time for each customer.
The issue I have is that I cannot simply use MIN or MAX, as the vehicle might have visited the same customer two or three times within the period.
So the output I am looking for is:
Vehicle: RB10
Customer: Hulamin
In: 10:19
out: 10:35
Time Taken: 16 min
In: 11:14
out: 11:29
Time Taken: 15 min
ave time taken: 15.5 min
and the same for each of the other sites and vehicles as required.
How do I tell mysql to take the smallest in time before the corresponding out time and report?
Many thanks for the assistance.
You could do it using SQL variables to help control when the address changes, even IF the same address occurs multiple times. Without having MySQL readily available, I would approach it like below. Start with an inner query that stamps a "GroupSeq" based on a change in either vehicle and/or address, keeping the order sequential by date/time. For each row, @lastGroup is either left alone or incremented by 1; THEN @lastAddress and @lastVehicle are updated as the basis for comparing the NEXT record selected into the result set.
Per your example, the results of each customer would be (all these same vehicle, so not duplicating display of that column)
Address GroupSeq
Hulamin 1
SACD 2
UL 3
NP 4
Hulamin 5
SACD 6
After that, you can then properly do your MIN/MAX based on the GroupSeq assigned.
select
      PreQuery.Vehicle,
      PreQuery.Address,
      PreQuery.GroupSeq,
      MIN( PreQuery.`DateTime` ) as InTime,
      MAX( PreQuery.`DateTime` ) as OutTime
   from
      ( select
           YT.Vehicle,
           YT.Address,
           YT.`DateTime`,
           YT.Direction,
           @lastGroup := @lastGroup + if( @lastAddress = YT.Address
                                      AND @lastVehicle = YT.Vehicle, 0, 1 ) as GroupSeq,
           @lastVehicle := YT.Vehicle as justVarVehicleChange,
           @lastAddress := YT.Address as justVarAddressChange
        from
           YourTable YT,
           ( select @lastVehicle := '',
                    @lastAddress := '',
                    @lastGroup := 0 ) SQLVars
        order by
           YT.`DateTime` ) PreQuery
   Group By
      PreQuery.Vehicle,
      PreQuery.Address,
      PreQuery.GroupSeq
The above SHOULD result in something like
Vehicle  Address  GroupSeq  InTime  OutTime
RB10     Hulamin  1         10:19   10:35
RB10     SACD     2         10:37   10:40
RB10     UL       3         10:41   11:06
RB10     NP       4         11:07   11:14
RB10     Hulamin  5         11:14   11:28
RB10     SACD     6         11:29   12:21
Now, the above sample does not actually compute the total time taken per in/out, nor the average time per vehicle/customer, but you can add those computations once you understand and have this part working.
Please note, this is based on the natural order by date/time. It looks like one transaction from beginning to end can have many "IN"s, but ALWAYS ends with an "OUT" before proceeding to the next customer address. If this is an incorrect assumption, modifications would obviously need to be made.
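As a cross-check of the GroupSeq idea, here is a small Python sketch. The intermediate rows are invented for illustration; only the in/out boundaries come from the expected results above:

```python
# (vehicle, address, time) rows in date/time order; times taken from the
# expected result table (only the boundary rows of each visit are shown).
rows = [
    ("RB10", "Hulamin", "10:19"), ("RB10", "Hulamin", "10:35"),
    ("RB10", "SACD",    "10:37"), ("RB10", "SACD",    "10:40"),
    ("RB10", "UL",      "10:41"), ("RB10", "UL",      "11:06"),
    ("RB10", "NP",      "11:07"), ("RB10", "NP",      "11:14"),
    ("RB10", "Hulamin", "11:14"), ("RB10", "Hulamin", "11:28"),
    ("RB10", "SACD",    "11:29"), ("RB10", "SACD",    "12:21"),
]

def stamp_groups(rows):
    """Replay of the group-sequence logic: bump the group number whenever the
    vehicle/address pair changes, then take MIN/MAX time per group."""
    groups, last_key, seq = {}, None, 0
    for vehicle, address, time in rows:
        key = (vehicle, address)
        if key != last_key:
            seq += 1                                   # new visit starts
            groups[seq] = (vehicle, address, time, time)
        else:
            v, a, in_t, _ = groups[seq]
            groups[seq] = (v, a, in_t, time)           # extend OutTime
        last_key = key
    return groups
```

Note how the two Hulamin visits land in separate groups (1 and 5), which is exactly why a plain MIN/MAX per customer would not work.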