SQL select from date ranges - mysql

I have a table of RADIUS session records that includes start time, stop time, and MAC address. I have a requirement to collect a list of users that were online during two time ranges. I believe I'm getting a list of all users online during the time ranges with the following query:
SELECT s_session_id, s_start_time, s_stop_time, s_calling_station_id
FROM sessions
WHERE (
("2015-10-01 08:00:00" BETWEEN s_start_time AND s_stop_time OR "2015-10-01 08:30:00" BETWEEN s_start_time AND s_stop_time)
OR
("2015-10-01 12:00:00" BETWEEN s_start_time AND s_stop_time OR "2015-10-01 12:30:00" BETWEEN s_start_time AND s_stop_time)
)
ORDER BY s_start_time;
But the next step, isolating details for only those users online during both periods, is eluding me. The closest I get is adding
GROUP BY s_calling_station_id HAVING COUNT(s_calling_station_id) > 1
but that doesn't provide me with all the session details.
Fiddle is here: http://sqlfiddle.com/#!9/1df471/1
Thanks for any assistance!

Use a self-join. Use column aliases so you can access the columns from each session with different names.
SELECT s1.s_calling_station_id,
s1.s_session_id AS s1_session_id, s1.s_start_time AS s1_start_time, s1.s_stop_time AS s1_stop_time,
s2.s_session_id AS s2_session_id, s2.s_start_time AS s2_start_time, s2.s_stop_time AS s2_stop_time
FROM sessions AS s1
JOIN sessions AS s2
ON s1.s_calling_station_id = s2.s_calling_station_id
AND s1.s_session_id != s2.s_session_id
WHERE ("2015-10-01 08:00:00" BETWEEN s1.s_start_time AND s1.s_stop_time OR "2015-10-01 08:30:00" BETWEEN s1.s_start_time AND s1.s_stop_time)
AND
("2015-10-01 12:00:00" BETWEEN s2.s_start_time AND s2.s_stop_time OR "2015-10-01 12:30:00" BETWEEN s2.s_start_time AND s2.s_stop_time)
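The self-join can be sanity-checked on a toy dataset. Below is a sketch using SQLite via Python; the table contents and the shortened MAC addresses are made up, and only bb:bb has sessions covering both windows:

```python
import sqlite3

# Hypothetical miniature of the sessions table: id, start, stop, MAC.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (
    s_session_id INTEGER,
    s_start_time TEXT,
    s_stop_time  TEXT,
    s_calling_station_id TEXT
);
INSERT INTO sessions VALUES
    (1, '2015-10-01 07:50:00', '2015-10-01 09:00:00', 'aa:aa'),  -- morning only
    (2, '2015-10-01 07:55:00', '2015-10-01 08:40:00', 'bb:bb'),  -- morning...
    (3, '2015-10-01 11:50:00', '2015-10-01 12:40:00', 'bb:bb'),  -- ...and midday
    (4, '2015-10-01 11:55:00', '2015-10-01 13:00:00', 'cc:cc');  -- midday only
""")

# Self-join: s1 must cover the morning window, s2 the midday window,
# and both rows must belong to the same MAC address.
rows = conn.execute("""
    SELECT DISTINCT s1.s_calling_station_id
    FROM sessions AS s1
    JOIN sessions AS s2
      ON s1.s_calling_station_id = s2.s_calling_station_id
     AND s1.s_session_id != s2.s_session_id
    WHERE ('2015-10-01 08:00:00' BETWEEN s1.s_start_time AND s1.s_stop_time
        OR '2015-10-01 08:30:00' BETWEEN s1.s_start_time AND s1.s_stop_time)
      AND ('2015-10-01 12:00:00' BETWEEN s2.s_start_time AND s2.s_stop_time
        OR '2015-10-01 12:30:00' BETWEEN s2.s_start_time AND s2.s_stop_time)
""").fetchall()
print(rows)  # [('bb:bb',)] -- only bb:bb was online during both windows
```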

Although this question already has an accepted answer, I'd like to add this one (it avoids duplicates and pulls from the sessions table the data of all sessions that fulfill the condition):
First, create a table that holds the filtered data (the MAC addresses that have connections in both intervals):
create table temp_sessions
select s1.s_calling_station_id
, if(@t1_1 between s1.s_start_time and s1.s_stop_time or @t1_2 between s1.s_start_time and s1.s_stop_time, s1.s_session_id, null) as s_1
, if(@t2_1 between s2.s_start_time and s2.s_stop_time or @t2_2 between s2.s_start_time and s2.s_stop_time, s2.s_session_id, null) as s_2
from -- I use user variables because it will make it easier to modify the time intervals if needed
(select @t1_1 := '2015-10-01 08:00:00', @t1_2 := '2015-10-01 08:30:00'
, @t2_1 := '2015-10-01 12:00:00', @t2_2 := '2015-10-01 12:30:00') as init
, sessions as s1
inner join sessions as s2
on s1.s_calling_station_id = s2.s_calling_station_id
and s1.s_session_id != s2.s_session_id
having s_1 is not null and s_2 is not null;
And now, simply use this table to get what you need:
select sessions.*
from sessions
inner join (
select s_calling_station_id, s_1 as s_session_id
from temp_sessions
union
select s_calling_station_id, s_2 as s_session_id
from temp_sessions
) as a using (s_calling_station_id, s_session_id);
Here's the SQL fiddle
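The UNION-then-join step of this answer can be demonstrated on a hand-built temp_sessions table. Here's a sketch using SQLite via Python; the data is hypothetical and the temp table is pre-filled rather than built by the self-join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (s_session_id INTEGER, s_calling_station_id TEXT);
INSERT INTO sessions VALUES (1, 'bb:bb'), (2, 'bb:bb'), (3, 'cc:cc');

-- temp_sessions pairs a morning session (s_1) with a midday session (s_2).
CREATE TABLE temp_sessions (s_calling_station_id TEXT, s_1 INTEGER, s_2 INTEGER);
INSERT INTO temp_sessions VALUES ('bb:bb', 1, 2);
""")

# UNION (not UNION ALL) collapses duplicate (mac, session) pairs, so each
# qualifying session row comes back exactly once from the final join.
rows = conn.execute("""
    SELECT sessions.*
    FROM sessions
    JOIN (SELECT s_calling_station_id, s_1 AS s_session_id FROM temp_sessions
          UNION
          SELECT s_calling_station_id, s_2 AS s_session_id FROM temp_sessions
    ) AS a USING (s_calling_station_id, s_session_id)
    ORDER BY s_session_id
""").fetchall()
print(rows)  # [(1, 'bb:bb'), (2, 'bb:bb')]
```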

Related

Optimizing Parameterized MySQL Queries

I have a query with a number of parameters which, if I run it from within MySQL Workbench, takes around a second to run.
If I take this query, get rid of the parameters, and substitute the values into it instead, it takes about 22 seconds to run; the same happens if I convert the query to a parameterized stored procedure and run that (it then also takes about 22 seconds).
I've enabled profiling on MySQL and I can see a few things there. For example, it shows the number of rows examined and there's an order of difference (20,000 to 400,000) which I assume is the reason for the 20x increase in processing time.
The other difference in the profile is that the parameterized query sent from MySQL Workbench still has the parameters in it (e.g. where limit < @lim), while in the sproc the values have been set (where limit < 300).
I've tried this a number of different ways: I'm using JetBrains's DataGrip (as well as MySQL Workbench), and it behaves like MySQL Workbench (it sends the @ parameters through). I've tried executing the queries and the sproc from MySQL Workbench, DataGrip, Java (JDBC) and .NET. I've also tried prepared statements in Java, but I can't get anywhere near the performance of sending the 'raw' SQL to MySQL.
I feel like I'm missing something obvious here but I don't know what it is.
The query is relatively complex: it has a CTE, a couple of sub-selects and a couple of joins, but as I said it runs quickly straight from MySQL.
My main question is why the query is 20x faster in one format than another.
Does the way the query is sent to MySQL have anything to do with this (the '@' values being sent through), and can I replicate it in a stored procedure?
Updated 1st Jan
Thanks for the comments. I didn't post the query originally as I'm more interested in the general concepts around the use of variables/parameters and how I could take advantage of that (or not).
Here is the original query:
with tmp_bat as (select bd.MatchId,
bd.matchtype,
bd.playerid,
bd.teamid,
bd.opponentsid,
bd.inningsnumber,
bd.dismissal,
bd.dismissaltype,
bd.bowlerid,
bd.fielderid,
bd.score,
bd.position,
bd.notout,
bd.balls,
bd.minutes,
bd.fours,
bd.sixes,
bd.hundred,
bd.fifty,
bd.duck,
bd.captain,
bd.wicketkeeper,
m.hometeamid,
m.awayteamid,
m.matchdesignator,
m.matchtitle,
m.location,
m.tossteamid,
m.resultstring,
m.whowonid,
m.howmuch,
m.victorytype,
m.duration,
m.ballsperover,
m.daynight,
m.LocationId
from (select *
from battingdetails
where matchid in
(select id
from matches
where id in (select matchid from battingdetails)
and matchtype = @match_type
)) as bd
join matches m on m.id = bd.matchid
join extramatchdetails emd1
on emd1.MatchId = m.Id
and emd1.TeamId = bd.TeamId
join extramatchdetails emd2
on emd2.MatchId = m.Id
and emd2.TeamId = bd.TeamId
)
select players.fullname name,
teams.teams team,
'' opponents,
players.sortnamepart,
innings.matches,
innings.innings,
innings.notouts,
innings.runs,
HS.score highestscore,
HS.NotOut,
CAST(TRUNCATE(innings.runs / (CAST((Innings.Innings - innings.notOuts) AS DECIMAL)),
2) AS DECIMAL(7, 2)) 'Avg',
innings.hundreds,
innings.fifties,
innings.ducks,
innings.fours,
innings.sixes,
innings.balls,
CONCAT(grounds.CountryName, ' - ', grounds.KnownAs) Ground,
'' Year,
'' CountryName
from (select count(case when inningsnumber = 1 then 1 end) matches,
count(case when dismissaltype != 11 and dismissaltype != 14 then 1 end) innings,
LocationId,
playerid,
MatchType,
SUM(score) runs,
SUM(notout) notouts,
SUM(hundred) Hundreds,
SUM(fifty) Fifties,
SUM(duck) Ducks,
SUM(fours) Fours,
SUM(sixes) Sixes,
SUM(balls) Balls
from tmp_bat
group by MatchType, playerid, LocationId) as innings
JOIN players ON players.id = innings.playerid
join grounds on Grounds.GroundId = LocationId and grounds.MatchType = innings.MatchType
join
(select pt.playerid, t.matchtype, GROUP_CONCAT(t.name SEPARATOR ', ') as teams
from playersteams pt
join teams t on pt.teamid = t.id
group by pt.playerid, t.matchtype)
as teams on teams.playerid = innings.playerid and teams.matchtype = innings.MatchType
JOIN
(SELECT playerid,
LocationId,
MAX(Score) Score,
MAX(NotOut) NotOut
FROM (SELECT battingdetails.playerid,
battingdetails.score,
battingdetails.notout,
battingdetails.LocationId
FROM tmp_bat as battingdetails
JOIN (SELECT battingdetails.playerid,
battingdetails.LocationId,
MAX(battingdetails.Score) AS score
FROM tmp_bat as battingdetails
GROUP BY battingdetails.playerid,
battingdetails.LocationId,
battingdetails.playerid) AS maxscore
ON battingdetails.score = maxscore.score
AND battingdetails.playerid = maxscore.playerid
AND battingdetails.LocationId = maxscore.LocationId ) AS internal
GROUP BY internal.playerid, internal.LocationId) AS HS
ON HS.playerid = innings.playerid and hs.LocationId = innings.LocationId
where innings.runs >= @runs_limit
order by runs desc, KnownAs, SortNamePart
limit 0, 300;
Wherever you see '@match_type' I substitute a value ('t'). This query takes ~1.1 secs to run. The query with the hard-coded values rather than the variables takes ~3.5 secs (see the other note below). The EXPLAIN for this query gives this:
1,PRIMARY,<derived7>,,ALL,,,,,219291,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,teams.playerid,1,100,
1,PRIMARY,<derived2>,,ref,<auto_key3>,<auto_key3>,26,"teams.playerid,teams.matchtype",11,100,Using where
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,innings.LocationId,1,10,Using where
1,PRIMARY,<derived8>,,ref,<auto_key0>,<auto_key0>,8,"teams.playerid,innings.LocationId",169,100,
8,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
8,DERIVED,<derived14>,,ref,<auto_key0>,<auto_key0>,13,"battingdetails.PlayerId,battingdetails.LocationId,battingdetails.Score",10,100,Using index
14,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
3,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where
3,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
3,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
3,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
3,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
3,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
and the EXPLAIN for the query with the hardcoded values looks like this:
1,PRIMARY,<derived8>,,ALL,,,,,20097,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,HS.PlayerId,1,100,
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,HS.LocationId,1,100,Using where
1,PRIMARY,<derived2>,,ref,<auto_key0>,<auto_key0>,30,"HS.LocationId,HS.PlayerId,grounds.MatchType",17,100,Using where
1,PRIMARY,<derived7>,,ref,<auto_key0>,<auto_key0>,46,"HS.PlayerId,innings.MatchType",10,100,Using where
8,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
8,DERIVED,m,,eq_ref,"PRIMARY,LocationId",PRIMARY,4,matches.Id,1,100,
8,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
8,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
8,DERIVED,<derived14>,,ref,<auto_key2>,<auto_key2>,4,m.LocationId,17,100,
8,DERIVED,battingdetails,,ref,"PlayerId,TeamId,Score,MatchId,match_team",MatchId,8,"matches.Id,maxscore.PlayerId",1,3.56,Using where
8,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
14,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
14,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
14,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
14,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
14,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
14,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
2,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
2,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
2,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
2,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
2,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
Pointers as to ways to improve my SQL are always welcome (I'm definitely not a database person), but I'd still like to understand whether I can use the SQL with the variables from code, and why that improves the performance by so much.
Update 2 1st Jan
AAArrrggghhh. My machine rebooted overnight and now the queries are generally running much quicker. It's still 1 sec vs 3 secs, but the 20-times slowdown does seem to have disappeared.
In your WITH construct, aren't you overthinking your select in (select in (select in ...))? It could be simplified to the innings CTE I have in my solution.
Also, you were joining to extraMatchDetails TWICE on the same match and team conditions, but never utilized either of those tables in the WITH CTE, rendering that component useless, doesn't it? However, the MATCH table has homeTeamID and AwayTeamID, which is what I THINK your actual intent was.
Also, your WITH CTE is pulling many columns not needed or used in the subsequent result, such as Captain and WicketKeeper.
So, I have restructured... pre-query the batting details once up front, summarized, and then you should be able to join off that.
Hopefully this MIGHT be a better fit in function and performance for your needs.
with innings as
(
select
bd.matchId,
bd.matchtype,
bd.playerid,
m.locationId,
count(case when bd.inningsnumber = 1 then 1 end) matches,
count(case when bd.dismissaltype not in ( 11, 14 ) then 1 end) innings,
SUM(bd.score) runs,
SUM(bd.notout) notouts,
SUM(bd.hundred) Hundreds,
SUM(bd.fifty) Fifties,
SUM(bd.duck) Ducks,
SUM(bd.fours) Fours,
SUM(bd.sixes) Sixes,
SUM(bd.balls) Balls
from
battingDetails bd
join Match m
on bd.MatchID = m.MatchID
where
matchtype = @match_type
group by
bd.matchId,
bd.matchType,
bd.playerid,
m.locationId
)
select
p.fullname playerFullName,
p.sortnamepart,
CONCAT(g.CountryName, ' - ', g.KnownAs) Ground,
t.team,
i.matches,
i.innings,
i.runs,
i.notouts,
i.hundreds,
i.fifties,
i.ducks,
i.fours,
i.sixes,
i.balls,
CAST( TRUNCATE( i.runs / (CAST((i.Innings - i.notOuts) AS DECIMAL)), 2) AS DECIMAL(7, 2)) 'Avg',
hs.maxScore,
hs.maxNotOut,
'' opponents,
'' Year,
'' CountryName
from
innings i
JOIN players p
ON i.playerid = p.id
join grounds g
on i.locationId = g.GroundId
and i.matchType = g.matchType
join
(select
pt.playerid,
t.matchtype,
GROUP_CONCAT(t.name SEPARATOR ', ') team
from
playersteams pt
join teams t
on pt.teamid = t.id
group by
pt.playerid,
t.matchtype) as t
on i.playerid = t.playerid
and i.MatchType = t.matchtype
join
( select
i2.playerid,
i2.locationid,
max( i2.score ) maxScore,
max( i2.notOut ) maxNotOut
from
innings i2
group by
i2.playerid,
i2.LocationId ) HS
on i.playerid = HS.playerid
AND i.locationid = HS.locationid
where
i.runs >= @runs_limit
order by
i.runs desc,
g.KnownAs,
p.SortNamePart
limit
0, 300;
Now, I know that you stated that after the server reboot performance is better, but what you DO have really appears to be overbloated queries.
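The conditional-aggregation pattern in the innings CTE (count(case when ... then 1 end)) is easy to verify on toy data. Below is a sketch using SQLite via Python; the column subset and values are made up, and NOT IN (11, 14) matches the original query's "dismissaltype != 11 and dismissaltype != 14" intent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE battingdetails (
    playerid INTEGER, inningsnumber INTEGER, dismissaltype INTEGER,
    score INTEGER, notout INTEGER
);
-- player 1, three rows; dismissal types 11/14 are excluded from 'innings'.
INSERT INTO battingdetails VALUES
    (1, 1, 5,  50, 0),
    (1, 2, 11,  0, 0),
    (1, 1, 5,  30, 1);
""")

# CASE yields NULL when the condition is false, and COUNT ignores NULLs,
# so each COUNT(CASE ...) counts only the rows matching its condition.
row = conn.execute("""
    SELECT playerid,
           COUNT(CASE WHEN inningsnumber = 1 THEN 1 END)             AS matches,
           COUNT(CASE WHEN dismissaltype NOT IN (11, 14) THEN 1 END) AS innings,
           SUM(score)  AS runs,
           SUM(notout) AS notouts
    FROM battingdetails
    GROUP BY playerid
""").fetchone()
print(row)  # (1, 2, 2, 80, 1)
```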
Not sure this is the correct answer but I thought I'd post this in case other people have the same issue.
The issue seems to be the use of CTEs in a stored procedure. I have a query that creates a CTE and then uses that CTE 8 times. If I run this query using interpolated variables it takes about 0.8 sec; if I turn it into a stored procedure and use the stored procedure parameters, it takes about a minute (between 45 and 63 seconds) to run!
I've found a couple of ways of fixing this. One is to use multiple temporary tables (8 in this case), as MySQL cannot re-use a temp table in a query; this gets the query time right down but just doesn't feel like a maintainable or scalable solution. The other fix is to leave the variables in place and assign them from the stored procedure parameters; this also has no real performance issues. So my sproc looks like this:
create procedure bowling_individual_career_records_by_year_for_team_vs_opponent(IN team_id INT,
IN opponents_id INT)
begin
set @team_id = team_id;
set @opponents_id = opponents_id;
# use these variables in the SQL below
...
end
Not sure this is the best solution but it works for me and keeps the structure of the SQL the same as it was previously.

SQL: Getting an order's status timestamps in a nested query

I'm working on an orders table which has all the details regarding the orders that were allocated. A sample of the data is:
Order ID Order Status Action Date
23424 ALC 1571467792094280
23424 PIK 1571467792999990
23424 PAK 1571469792999990
23424 SHP 1579967792999990
33755 ALC 1581467792238640
33755 PIK 1581467792238640
33755 PAK 1581467792238640
33755 SHP 1581467792238640
In the table I have order ID, status, and action_date (action_date is updated whenever the order status changes, and it is a Unix timestamp).
I'm trying to write a query that can provide me the Order ID, ALC_AT, PIK_AT, PAK_AT, SHP_AT
Basically, I want all the timestamp updates against an order ID within one row. I know it can be done via a nested query, but I'm unable to figure out how to do it.
Any help would be highly appreciated.
Edit (as asked, providing the sample result):
Order ID Order Status ALC_AT PIK_AT PAK_AT SHP_AT
23424 SHP 1571467792094280 1571467792999990 1571469792999990 1579967792999990
I am not sure how it is done in MySQL, but below is how it would be done in Oracle.
You can search more for PIVOT in MySQL to help you with the same.
select *
from (select order_id,
status,
action_date
from orders)
pivot (max(action_date)
for status in ( 'ALC' as alc_at, 'PIK' as pik_at, 'PAK' as pak_at, 'SHP' as shp_at))
Hope this will help you.
EDIT for mysql:
select *
from (select "order.order_number",
"shipment.status",
from_unixtime("action_date"/1000000) as "action_date"
from order_table
where "order.order_number" = '2019-10-19-N2-6411')
pivot (max("action_date")
for "shipment_status" in ( 'ALC' AS 'ALC_AT', 'PIK' AS 'PIK_AT', 'PAK'
AS 'PAK_AT', 'SHP' AS 'SHP_AT'))
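Note that MySQL itself has no PIVOT operator; the usual MySQL equivalent is conditional aggregation, one MAX(CASE ...) per status column. Here's a sketch using SQLite via Python, with the sample data from the question (table and column names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, status TEXT, action_date INTEGER);
INSERT INTO orders VALUES
    (23424, 'ALC', 1571467792094280),
    (23424, 'PIK', 1571467792999990),
    (23424, 'PAK', 1571469792999990),
    (23424, 'SHP', 1579967792999990);
""")

# One output column per status: MAX() picks the single matching timestamp
# in each group, everything else contributes NULL.
row = conn.execute("""
    SELECT order_id,
           MAX(CASE WHEN status = 'ALC' THEN action_date END) AS alc_at,
           MAX(CASE WHEN status = 'PIK' THEN action_date END) AS pik_at,
           MAX(CASE WHEN status = 'PAK' THEN action_date END) AS pak_at,
           MAX(CASE WHEN status = 'SHP' THEN action_date END) AS shp_at
    FROM orders
    GROUP BY order_id
""").fetchone()
print(row)
# (23424, 1571467792094280, 1571467792999990, 1571469792999990, 1579967792999990)
```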

Get the No of Pending, Accepted and Rejected Users from two tables

I have two tables namely register and expressinterest
where the register table contains user-related columns, some of which are:
i) matri_id (unique but not primary)
ii) mobile
iii) email
iv) last_login
and the expressinterest table contains data related to match details, where the columns are namely:
i) ei_sender
ii) ei_receiver
iii) ei_response
iv) ei_sent_date
I am comparing the matri_id of the register table to ei_sender and ei_receiver of the expressinterest table; one user can send requests to other users and receive requests from other users.
I have to get the counts of the Pending, Accepted and Rejected statuses for all the users present in the register table. But when I run the query it's very slow: it takes around 45-60 seconds to fetch only 5000 rows, and the data comes back in the wrong shape (a single ID appears in 3 rows: Accepted in one row, Rejected in one row and Pending in one row). I want all the counts in a single row, like this:
r.matri_id | r.email | r.mobile | pending_count | accepted_count | rejected_count | r.last_login
Some queries which I have tried so far are:
select r.matri_id, r.email, r.mobile, r.last_login, e.receiver_response, count(e.receiver_response), e.ei_sender, e.ei_receiver from register r, expressinterest e where r.matri_id = e.ei_sender or r.matri_id = e.ei_receiver GROUP BY e.receiver_response, r.matri_id ORDER BY r.last_login DESC
This is what I want, but it takes 5-6 seconds to execute:
select matri_id, email, mobile, last_login, (select count(ei_sender) from expressinterest where ei_receiver=matri_id and receiver_response = 'Pending') AS pending_count_mine,
(select count(ei_sender) from expressinterest where ei_sender=matri_id and receiver_response = 'Accepted') AS accepted_count,
(select count(ei_sender) from expressinterest where ei_sender=matri_id and receiver_response = 'Rejected') AS rejected_count FROM register ORDER BY last_login DESC
Thanks
You can replace the multiple correlated subqueries with a single subquery and a left join:
select r.matri_id,
r.email,
r.mobile,
r.last_login,
t.accepted_count,
t.rejected_count,
t.pending_count
from register r
left join (
select ei_sender,
sum(receiver_response = 'Accepted') as accepted_count,
sum(receiver_response = 'Rejected') as rejected_count,
sum(receiver_response = 'Pending') as pending_count
from expressinterest
where receiver_response in ('Rejected', 'Accepted', 'Pending')
group by ei_sender
) t on r.matri_id = t.ei_sender;
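The SUM(condition) trick in this answer relies on MySQL evaluating a boolean expression as 0 or 1, which SQLite also does, so the shape of the result can be checked with a small script (table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE register (matri_id TEXT, email TEXT);
CREATE TABLE expressinterest (ei_sender TEXT, receiver_response TEXT);
INSERT INTO register VALUES ('M1', 'a@x.com'), ('M2', 'b@x.com');
INSERT INTO expressinterest VALUES
    ('M1', 'Accepted'), ('M1', 'Accepted'), ('M1', 'Pending'),
    ('M2', 'Rejected');
""")

# SUM over a boolean expression counts the rows where it is true,
# so all three counts come from one pass over expressinterest.
rows = conn.execute("""
    SELECT r.matri_id,
           t.accepted_count, t.rejected_count, t.pending_count
    FROM register r
    LEFT JOIN (
        SELECT ei_sender,
               SUM(receiver_response = 'Accepted') AS accepted_count,
               SUM(receiver_response = 'Rejected') AS rejected_count,
               SUM(receiver_response = 'Pending')  AS pending_count
        FROM expressinterest
        GROUP BY ei_sender
    ) t ON r.matri_id = t.ei_sender
    ORDER BY r.matri_id
""").fetchall()
print(rows)  # [('M1', 2, 0, 1), ('M2', 0, 1, 0)]
```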

SQL query not finding correct answer

I am trying to figure out website visits. Every visit within 30 minutes should count as one visit for that user.
My table looks like this
TimeUser, Userid, OrderID
10/7/2013 14:37:14 _26Tf-0PjaS0dpiZXB61Rg 151078706
10/7/2013 14:39:59 _26Tf-0PjaS0dpiZXB61Rg 151078706
10/7/2013 14:40:35 _26Tf-0PjaS0dpiZXB61Rg 151078706
10/11/2013 0:09:23 _2MrGz4L_d5AF3UHpP-oJQ 151078706
10/2/2013 20:55:05 _4Pb2wEwiQomUny_XwVuvQ 151078706
10/2/2013 20:55:06 _4Pb2wEwiQomUny_XwVuvQ 151078706
10/2/2013 20:55:06 _4Pb2wEwiQomUny_XwVuvQ 151078706
In this case 151078706 should return 3 visits.
I think my SQL query looks right, but when I check my answer against my Excel-derived visit numbers, some orders are off by 5%. I am one hundred percent sure the Excel numbers are correct.
Here is what I have so far. If anyone sees any issue with my query, please correct me. Also, are there any better ways to find visits?
SET @row_num=0,
@temp_row=1;
SELECT orderidtable.orders,
count(orderidtable.users)
FROM
(SELECT temptab.temprow,
temptab.userid users,
temptab.orderid orders,
temptab.TimeUser
FROM
(SELECT @row_num := @row_num + 1 AS rownumber, TimeUser,
userid,
orderid
FROM order.order_dec
ORDER BY orderid,
userid,
timeuser) subtable ,
(SELECT @temp_row := @temp_row+1 AS temprow, Timeuser,
userid,
orderid
FROM
order.order_dec
ORDER BY orderid,
userid,
timeuser) temptab
WHERE (subtable.rownumber=temptab.temprow
AND abs(Time_To_Sec(subtable.TimeUser)-Time_To_Sec(temptab.TimeUser))>=1800)
OR (subtable.rownumber=temptab.temprow
AND subtable.userid<>temptab.userid)
OR (subtable.rownumber=temptab.temprow
AND subtable.orderid<>temptab.orderid)) orderidtable
GROUP BY orderidtable.orders
Numbering the rows is the right strategy; your query goes wrong in the WHERE condition.
The algorithm to solve it would be:
Number the rows ordering by orderid, userid, timeuser. Make two copies (subtable and temptab) of this dataset, as you are already doing.
Join these tables on the following condition:
subtable.rownumber = temptab.temprow + 1
What we are trying to do here is join the tables so that a row of subtable joins with the row of temptab whose row number is 1 less than its own. We do this to be able to compare a user's consecutive visit times. (You have already arranged this offset by setting @row_num=0, @temp_row=1.) This is the only condition we should apply to the JOIN.
Now in the SELECT statement use a CASE expression like below:
(CASE WHEN subtable.orderid = temptab.orderid AND subtable.userid = temptab.userid AND (Time_To_Sec(subtable.TimeUser) - Time_To_Sec(temptab.TimeUser)) < 1800 THEN 0
ELSE 1 END) AS IsVisit
Now in an outer query GROUP BY order_id and in SELECT sum up IsVisit.
Let me know should you need more clarity or let me know if it worked.
Addendum:
From the previous query, you can try replacing the join condition with subtable.rownumber = temptab.temprow + 4, and in the SELECT statement replace the CASE expression of the above query with the following:
(CASE WHEN subtable.orderid = temptab.orderid AND subtable.userid = temptab.userid AND (Time_To_Sec(subtable.TimeUser) - Time_To_Sec(temptab.TimeUser)) < 900 THEN 1
ELSE 0 END) AS IsVisit
Take UNION of the result set returned by previous query and this one, and then apply GROUP BY.
One issue I see: Your query is overly complex.
What about this?
Now then, both your original and this query will err when there's a visit near midnight, and another visit right shortly after it - in this case, both queries will count them as 2 visits when they really should be counted as one, if I understood your request correctly. From this simplified query, though, it should be easy for you to do the required change.
SELECT orderidtable.OrderID, COUNT(orderidtable.UserID) visits
FROM (
SELECT Timeuser, Userid, OrderID
FROM order.order_dec SubTab1
WHERE NOT EXISTS (
SELECT 1 FROM order.order_dec SubTab2
WHERE SubTab1.OrderID = SubTab2.OrderID
AND SubTab2.TimeUser > SubTab1.TimeUser
AND Time_To_Sec(SubTab2.TimeUser)
BETWEEN Time_To_Sec(SubTab1.TimeUser)
AND Time_To_Sec(SubTab1.TimeUser)+1800
)
) orderidtable
GROUP BY orderidtable.OrderID
I think just one full scan of the table is sufficient for what you want, as follows.
You can test it here: http://www.sqlfiddle.com/#!2/a5dbcd/1.
Although my query has not been tested on much sample data, I think only a minor change would be needed if it has bugs.
SELECT MAX(current_uv) AS uv
FROM (
SELECT orderid, userid, timeuser,
IF(orderid != @prev_orderid, @prev_timeuser := 0, @prev_timeuser) AS prev_timeuser,
@prev_orderid := orderid AS prev_orderid,
IF(userid != @prev_userid, @prev_timeuser := 0, @prev_timeuser) AS prev_timeuser2,
@prev_userid := userid AS prev_userid,
IF(TO_SECONDS(timeuser) - @prev_timeuser > 1800, @current_uv := @current_uv + 1, @current_uv) AS current_uv,
@prev_timeuser := TO_SECONDS(timeuser) AS prev_timeuser3
FROM order_dec,
(SELECT @prev_orderid := 0, @prev_userid := '', @prev_timeuser := 0, @current_uv := 0) init
ORDER BY orderid, userid, timeuser
) x;
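As a cross-check against the Excel numbers, the visit-counting rule (a new visit whenever the user/order changes, or more than 30 minutes pass since the previous hit) can be re-implemented in a few lines of Python. This mirrors the consecutive-row comparison used by the SQL answers; the timestamps are the question's sample data rewritten in sortable ISO form, and the user ids are shortened:

```python
from datetime import datetime, timedelta

# (timeuser, userid, orderid) rows from the question, ISO timestamps.
rows = [
    ("2013-10-07 14:37:14", "_26Tf", "151078706"),
    ("2013-10-07 14:39:59", "_26Tf", "151078706"),
    ("2013-10-07 14:40:35", "_26Tf", "151078706"),
    ("2013-10-11 00:09:23", "_2MrG", "151078706"),
    ("2013-10-02 20:55:05", "_4Pb2", "151078706"),
    ("2013-10-02 20:55:06", "_4Pb2", "151078706"),
    ("2013-10-02 20:55:06", "_4Pb2", "151078706"),
]

def count_visits(rows, gap=timedelta(minutes=30)):
    """A hit starts a new visit when the user/order changes or the
    previous hit by the same user is more than `gap` earlier."""
    visits = {}
    prev_key, prev_time = None, None
    for ts, user, order in sorted(rows, key=lambda r: (r[2], r[1], r[0])):
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if (user, order) != prev_key or t - prev_time > gap:
            visits[order] = visits.get(order, 0) + 1
        prev_key, prev_time = (user, order), t
    return visits

print(count_visits(rows))  # {'151078706': 3}
```

Unlike the Time_To_Sec approaches, comparing full datetimes also handles visits that straddle midnight correctly.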

SQL - Find all down times and the lengths of the downtimes from MySQL data (set of rows with time stamps and status messages)

I have started monitoring my ISP's downtimes with a looping PHP script which automatically checks the connection every 5 seconds and stores the result in a MySQL database. The script checks whether it is able to reach a couple of remote websites and logs the result. The time and status of the check are always stored in the database.
The structure of the table is following:
id (auto increment)
time (time stamp)
status (varchar)
Now to my issue.
I have the data, but I don't know how to use it to achieve the result I would like to get. Basically I would like to find all the periods of time when the connection was down and for how long the connection was down.
For instance if we have 10 rows with following data
0 | 2012-07-24 22:23:00 | up
1 | 2012-07-24 22:23:05 | up
2 | 2012-07-24 22:23:10 | down
3 | 2012-07-24 22:23:16 | down
4 | 2012-07-24 22:23:21 | up
5 | 2012-07-24 22:23:26 | down
6 | 2012-07-24 22:23:32 | down
7 | 2012-07-24 22:23:37 | up
8 | 2012-07-24 22:23:42 | up
9 | 2012-07-24 22:23:47 | up
the query should return the periods (from 22:23:10 to 22:23:21, and from 22:23:26 to 22:23:37). So the query should always find the time between the first time the connection goes down and the first time the connection is up again.
One method I thought could work is finding all the rows where the connection goes down or comes back up, but how could I find these rows? And is there some better solution than this?
I really don't know what the query should look like, so the help would be highly appreciated.
Thank you, regards Lassi
Here's one approach.
Start by getting the status rows in order by timestamp (inline view aliased as s). Then use MySQL user variables to keep the values from previous rows, as you process through each row.
What we're really looking for is an 'up' status that immediately follows a sequence of 'down' status. And when we find that row with the 'up' status, what we really need is the earliest timestamp from the preceding series of 'down' status.
So, something like this will work:
SELECT d.start_down
, d.ended_down
FROM (SELECT @i := @i + 1 AS i
, @start := IF(s.status = 'down' AND (@status = 'up' OR @i = 1), s.time, @start) AS start_down
, @ended := IF(s.status = 'up' AND @status = 'down', s.time, NULL) AS ended_down
, @status := s.status
FROM (SELECT t.time
, t.status
FROM mydata t
WHERE t.status IN ('up','down')
ORDER BY t.time ASC, t.status ASC
) s
JOIN (SELECT @i := 0, @status := 'up', @ended := NULL, @start := NULL) i
) d
WHERE d.start_down IS NOT NULL
AND d.ended_down IS NOT NULL
This works for the particular data set you show.
What this doesn't handle (what it doesn't return) is a 'down' period that is not yet ended, that is, a sequence of 'down' status with no following 'up' status.
To avoid a filesort operation to return the rows in order, you'll want a covering index on (time,status). This query will generate a temporary (MyISAM) table to materialize the inline view aliased as d.
NOTE: To understand what this query is doing, peel off that outermost query, and run just the query for the inline view aliased as d (you can add s.time to the select list.)
This query is getting every row with an 'up' or 'down' status. The "trick" is that it is assigning both a "start" and "end" time (marking a down period) on only the rows that end a 'down' period. (That is, the first row with an 'up' status following rows with a 'down' status.) This is where the real work is done, the outermost query just filters out all the "extra" rows in this resultset (that we don't need.)
SELECT @i := @i + 1 AS i
, @start := IF(s.status = 'down' AND (@status = 'up' OR @i = 1), s.time, @start) AS start_down
, @ended := IF(s.status = 'up' AND @status = 'down', s.time, NULL) AS ended_down
, @status := s.status
, s.time
FROM (SELECT t.time
, t.status
FROM mydata t
WHERE t.status IN ('up','down')
ORDER BY t.time ASC, t.status ASC
) s
JOIN (SELECT @i := 0, @status := 'up', @ended := NULL, @start := NULL) i
The purpose of inline view aliased as s is to get the rows ordered by timestamp value, so we can process them in sequence. The inline view aliased as i is just there so we can initialize some user variables at the start of the query.
If we were running on Oracle or SQL Server, we could make use of "analytic functions" or "ranking functions" (as they are named, respectively.) MySQL doesn't provide anything like that, so we have to "roll our own".
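For reference, the same first-down-to-first-up logic can be mirrored in a short script, which also covers the unfinished-outage case the user-variable query doesn't return (sample data taken from the question):

```python
rows = [  # (timestamp, status), ordered by time
    ("2012-07-24 22:23:00", "up"),
    ("2012-07-24 22:23:05", "up"),
    ("2012-07-24 22:23:10", "down"),
    ("2012-07-24 22:23:16", "down"),
    ("2012-07-24 22:23:21", "up"),
    ("2012-07-24 22:23:26", "down"),
    ("2012-07-24 22:23:32", "down"),
    ("2012-07-24 22:23:37", "up"),
    ("2012-07-24 22:23:42", "up"),
    ("2012-07-24 22:23:47", "up"),
]

def down_periods(rows):
    """Each period runs from the first 'down' probe to the first 'up'
    probe that follows it; an unfinished outage is reported with None."""
    periods, start = [], None
    for ts, status in rows:
        if status == "down" and start is None:
            start = ts
        elif status == "up" and start is not None:
            periods.append((start, ts))
            start = None
    if start is not None:
        periods.append((start, None))  # still down at end of data
    return periods

print(down_periods(rows))
# [('2012-07-24 22:23:10', '2012-07-24 22:23:21'),
#  ('2012-07-24 22:23:26', '2012-07-24 22:23:37')]
```

On MySQL 8.0+, the window functions LAG() and LEAD() make this kind of query much simpler, but they weren't available when user variables were the standard workaround.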
I don't really have time to adapt this to work for your setup right now, but I'm doing pretty much the same thing on a web page to monitor when a computer was turned off, and when it was turned back on, then calculating the total time it was on for...
I also don't know if you have access to PHP, if not completely ignore this. If you do, you might be able to adapt something like this:
$lasttype="OFF";
$ontime=0;
$totalontime=0;
$query2 = " SELECT
log_unixtime,
status
FROM somefaketablename
ORDER BY
log_unixtime asc
;";
$result2=mysql_query($query2);
while($row2=mysql_fetch_array($result2)){
if($lasttype=="OFF" && $row2['status']=="ON"){
$ontime = $row2['log_unixtime'];
}elseif($lasttype=="ON" && $row2['status']=="OFF"){
$thisblockontime=$row2['log_unixtime']-$ontime;
$totalontime+=($thisblockontime);
}
$lasttype=$row2['status'];
}
Basically, you start out with a fake row that says the computer is off, then loop through each real row.
IF the computer was off, but is now on, set a variable to see when it was turned on, then keep looping...
Keep looping until the computer was ON, but is now OFF. When that happens, subtract the previously-stored time it was turned on from the current row's time. That shows how long it was on for, for that group of "ON's".
Like I said, you'll have to adapt that pretty heavily to get it to do what you want, but if you replace "computer on/off" with "connection up/down", it's essentially the same idea...
One thing that makes this work is that I'm storing dates as integers, as a unix timestamp. So you might have to convert your dates so the subtraction works.
I'm unsure if this works (if not, just comment).
What it does: select rows only if the row with an id 1 smaller than the current id has a different status (thereby selecting the first entry of any period), and determine the end time through the >= and the same status.
SELECT ou.id AS outerId,
ou.timeColumn AS currentRowTime,
ou.status AS currentRowStatus,
( SELECT max(time)
FROM statusTable
WHERE time >= ou.timeColumn AND status = ou.status) AS endTime
FROM statusTable ou
WHERE ou.status !=
(SELECT status
FROM statusTable
WHERE id = (ou.id -1))