I have this Log table:
ID | controllerID | alert | value | timestamp
1  | 1            | ping  | 5     | 11:43:20
2  | 3            | ping  | 4     | 11:45:21
3  | 1            | low   | 3     | 11:46:25
4  | 1            | ping  | 7     | 11:43:27
5  | 1            | ping  | 12    | 11:44:29
6  | 1            | ping  | 5656  | 11:45:28
7  | 1            | ping  | 56    | 11:46:02
8  | 1            | low   | 64    | 11:46:45
9  | 1            | ping  | 6     | 11:48:02
I am looking for a specific controller and a specific type of alert:
WHERE controllerID = 1 and alert = 'ping'
I need to output all pairs of rows whose timestamps differ by more than a minute, with no other 'ping' alert between them. So my output will be rows 4 and 5, and rows 7 and 9:
ID | controllerID | alert | value | timestamp
4  | 1            | ping  | 7     | 11:43:27
5  | 1            | ping  | 12    | 11:44:29
7  | 1            | ping  | 56    | 11:46:02
9  | 1            | ping  | 6     | 11:48:02
How can I implement this? It can take two or more queries, but it must be MySQL only.
I thought of filtering the necessary rows, ordering them by timestamp, then adding a new column holding the time difference and selecting all rows where it is more than a minute, but I am not sure how to add this new column.
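The idea described above (filter, order by timestamp, compare each row with the previous one) can be checked outside the database first. The following is only a Python sketch of that logic on the sample rows, not a MySQL query; the function name gap_pairs and its parameters are made up for illustration.

```python
from datetime import datetime, timedelta

# Rows copied from the Log table: (ID, controllerID, alert, value, timestamp)
LOG = [
    (1, 1, "ping", 5, "11:43:20"),
    (2, 3, "ping", 4, "11:45:21"),
    (3, 1, "low", 3, "11:46:25"),
    (4, 1, "ping", 7, "11:43:27"),
    (5, 1, "ping", 12, "11:44:29"),
    (6, 1, "ping", 5656, "11:45:28"),
    (7, 1, "ping", 56, "11:46:02"),
    (8, 1, "low", 64, "11:46:45"),
    (9, 1, "ping", 6, "11:48:02"),
]


def parse_time(text):
    """Parse an HH:MM:SS string (the date part is irrelevant for this sample)."""
    return datetime.strptime(text, "%H:%M:%S")


def gap_pairs(rows, controller_id, alert, min_gap=timedelta(minutes=1)):
    """Return (earlier, later) rows that are adjacent in time and more than min_gap apart."""
    # Step 1: keep only the requested controller and alert type.
    selected = [r for r in rows if r[1] == controller_id and r[2] == alert]
    # Step 2: order by timestamp so that neighbours have no other matching row between them.
    selected.sort(key=lambda r: parse_time(r[4]))
    # Step 3: compare each row with its predecessor.
    pairs = []
    for prev, cur in zip(selected, selected[1:]):
        if parse_time(cur[4]) - parse_time(prev[4]) > min_gap:
            pairs.append((prev, cur))
    return pairs


pair_ids = [(a[0], b[0]) for a, b in gap_pairs(LOG, 1, "ping")]
print(pair_ids)  # [(4, 5), (7, 9)]
```

The "time difference" column from the question corresponds to the subtraction in step 3; in SQL the same comparison needs each row joined to its predecessor in timestamp order.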
I am investigating certain effects within a household, between partners. I have panel data (person-year) for several variables, plus a partner id. I would like to regress a person's outcome on the independent-variable values of their partner, but I don't know how to specify this in Stata.
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year id pid y x)
1 1 3 9 2
2 1 3 10 4
3 1 . 11 6
1 2 4 20 2
2 2 4 21 6
3 2 3 22 7
1 3 1 25 5
2 3 1 30 10
3 3 2 35 15
1 4 2 20 4
2 4 2 30 6
3 4 . 40 8
end
* pooled regression
reg y x
* fixed effects regression
xtset id year
xtreg y x, fe
I can do pooled and fixed effects regressions. But even for the pooled / simple regression, how can I regress someone's outcome on somebody else's independent variable?
Actually for Person 1, I need to regress 9/10/11 on 5/10/. and so on.
Person 2: regress 20/21/22 on 4/6/15
Person 3: regress 25/30/35 on 2/4/7
Person 4: regress 20/30/40 on 2/6/.
Idea: If there is no option in the regress function, I guess I could create new variables for each independent variable I have and name it x_partner. In this example x_partner should contain 5,10,.,4,6,15,2,4,7,2,6,. but I still don't know how to achieve this.
bysort id (year): egen x_partner = x[pid] // rough idea
The rough idea won't work. egen needs one of its own functions specified, and that alone makes the syntax illegal.
But the essence here is to look up the partner's values and put them in new variables aligned with each identifier.
Thanks for using dataex.
rangestat from SSC, a community-contributed command, allows a one-line solution. Consider
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year id pid y x)
1 1 3 9 2
2 1 3 10 4
3 1 . 11 6
1 2 4 20 2
2 2 4 21 6
3 2 3 22 7
1 3 1 25 5
2 3 1 30 10
3 3 2 35 15
1 4 2 20 4
2 4 2 30 6
3 4 . 40 8
end
ssc install rangestat
rangestat wanted_y=y wanted_x=x if !missing(id, pid), interval(id pid pid) by(year)
list, sepby(id)
+-------------------------------------------------+
| year id pid y x wanted_y wanted_x |
|-------------------------------------------------|
1. | 1 1 3 9 2 25 5 |
2. | 2 1 3 10 4 30 10 |
3. | 3 1 . 11 6 . . |
|-------------------------------------------------|
4. | 1 2 4 20 2 20 4 |
5. | 2 2 4 21 6 30 6 |
6. | 3 2 3 22 7 35 15 |
|-------------------------------------------------|
7. | 1 3 1 25 5 9 2 |
8. | 2 3 1 30 10 10 4 |
9. | 3 3 2 35 15 22 7 |
|-------------------------------------------------|
10. | 1 4 2 20 4 20 2 |
11. | 2 4 2 30 6 21 6 |
12. | 3 4 . 40 8 . . |
+-------------------------------------------------+
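The lookup itself can be cross-checked outside Stata. The following Python sketch is only an illustration of the logic (it is not how rangestat works internally): index x by (id, year), then look up each row's (pid, year). None stands in for Stata's missing value, and the names used are made up.

```python
# (year, id, pid, y, x) rows from the dataex listing; None stands for Stata's missing value "."
ROWS = [
    (1, 1, 3, 9, 2), (2, 1, 3, 10, 4), (3, 1, None, 11, 6),
    (1, 2, 4, 20, 2), (2, 2, 4, 21, 6), (3, 2, 3, 22, 7),
    (1, 3, 1, 25, 5), (2, 3, 1, 30, 10), (3, 3, 2, 35, 15),
    (1, 4, 2, 20, 4), (2, 4, 2, 30, 6), (3, 4, None, 40, 8),
]

# Index every person's own x by (id, year).
x_by_person_year = {(person, year): x for year, person, _pid, _y, x in ROWS}

# For each row, fetch the partner's x in the same year; a missing pid yields None.
x_partner = [x_by_person_year.get((pid, year)) for year, _person, pid, _y, _x in ROWS]

print(x_partner)  # [5, 10, None, 4, 6, 15, 2, 4, 7, 2, 6, None]
```

The result agrees with the wanted_x column above and with the values listed in the question.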
Basically I am trying to calculate shots received in golf for various four balls, here is my data:-
DatePlayed PlayerID HCap Groups Hole01 Hole02 Hole03 Shots
----------------------------------------------------------------------
2018-11-10 001 15 2 7 3 6
2018-11-10 004 20 1 7 4 6
2018-11-10 025 20 2 7 4 5
2018-11-10 047 17 1 8 3 6
2018-11-10 048 20 2 8 4 6
2018-11-10 056 17 1 6 3 5
2018-11-10 087 18 1 7 3 5
I want to retrieve the above rows with an additional column, calculated per group: (the player's handicap - the lowest handicap in the group) x .75
I can achieve it with a GROUP BY, but then I need to aggregate everything. Is there a way I can return the rows as above? Here is the query that returns the value:
SELECT
PlayerID,
MIN(Handicap),
MIN(Hole01) AS Hole01,
MIN(Hole02) AS Hole02,
MIN(Hole03) AS Hole03,
MIN(CourseID) AS CourseID,
Groups,
ROUND(
MIN((Handicap -
(SELECT MIN(Handicap) FROM Results AS t
WHERE DatePlayed='2018-11-10 00:00:00' AND t.Groups=Results.Groups)) *.75))
AS Shots
FROM
Results
WHERE
Results.DatePlayed='2018-11-10 00:00:00'
GROUP BY
DatePlayed, Groups, PlayerID
This returns:
PlayerID MIN(Handicap) Hole01 Hole02 Hole03 CourseID Groups Shots
-----------------------------------------------------------------
4 20 7 4 6 1 1 2
47 17 8 3 6 1 1 0
56 17 6 3 5 1 1 0
87 18 7 3 5 1 1 1
1 15 7 3 6 1 2 0
25 20 7 4 5 1 2 4
48 20 8 4 6 1 2 4
Sorry about the formatting; I couldn't see how to get my table in here. Any help will be much appreciated. I am using the latest MySQL from Ubuntu 18.04.
Not an answer; too long for a comment...
First off, I happily know nothing about golf, so what follows might not be optimal, but it must, at least, be a step in the right direction...
A normalized schema might look something like this...
rounds
round_id DatePlayed PlayerID HCap Groups
1 2018-11-10 1 15 2
2 2018-11-10 4 20 1
round_detail
round_id hole shots
1 1 7
1 2 3
1 3 6
2 1 7
2 2 4
2 3 6
Hi guys, I have found the solution: I need to drop the MIN immediately after the ROUND, and then the query no longer needs a GROUP BY.
SELECT
PlayerID,
Handicap,
Hole01,
Hole02,
Hole03,
CourseID,
Groups,
ROUND((Handicap -
(SELECT MIN(Handicap) FROM Results AS t
WHERE DatePlayed='2018-11-10 00:00:00'
AND t.Groups=Results.Groups))
*.75) AS Shots
FROM
Results
WHERE
Results.DatePlayed='2018-11-10 00:00:00'
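As a cross-check of the arithmetic, here is a small Python sketch of the same per-group calculation on the handicaps from the question. It is not a replacement for the query; the function name shots_received is made up, and the rounding is an assumption chosen to match how MySQL's ROUND treats the exact, non-negative values in this example.

```python
import math

# (PlayerID, HCap, Groups) taken from the question's table for 2018-11-10
PLAYERS = [
    ("001", 15, 2), ("004", 20, 1), ("025", 20, 2), ("047", 17, 1),
    ("048", 20, 2), ("056", 17, 1), ("087", 18, 1),
]


def shots_received(players, factor=0.75):
    """Return {PlayerID: shots}, where shots = round((HCap - lowest HCap in group) * factor)."""
    # Lowest handicap per group: the role of the correlated subquery.
    lowest = {}
    for _player, hcap, group in players:
        lowest[group] = min(hcap, lowest.get(group, hcap))
    # floor(x + 0.5) rounds halves up; the differences here are never negative.
    return {
        player: math.floor((hcap - lowest[group]) * factor + 0.5)
        for player, hcap, group in players
    }


shots = shots_received(PLAYERS)
print(shots)
# {'001': 0, '004': 2, '025': 4, '047': 0, '048': 4, '056': 0, '087': 1}
```

These values agree with the Shots column returned by the original GROUP BY query.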
I have to count how many times a user has called again within the next 7 days (the window must be flexible: 7 days or more).
The query should only consider call dates that are at least 7 days earlier than the last date in the table.
My data looks something like this:
call_date user
2017-05-01 100
2017-05-01 500
2017-05-02 200
2017-05-02 300
2017-05-03 300
2017-05-04 100
2017-05-05 400
2017-05-06 500
2017-05-07 600
2017-05-08 200
2017-05-09 700
2017-05-10 500
2017-05-11 400
2017-05-12 300
2017-05-13 100
2017-05-14 200
The desired output of the query is:
call_date user count
2017-05-01 100 2
2017-05-01 500 2
2017-05-02 200 2
2017-05-02 300 2
2017-05-03 300 1
2017-05-04 100 1
2017-05-05 400 2
2017-05-06 500 2
2017-05-07 600 1
Explanation:
While listing the date the first contact should be considered (user 100 called on 2017-05-01, 2017-05-04 and 2017-05-13) but only 2017-05-01 displayed
For user 100, only records within 7 days should be considered hence count of user 100 becomes 2 (2017-05-01 and 2017-05-04; excluding 2017-05-13 since falls out of range) for call_date 2017-05-01
No records after 2017-05-07 are considered because it is the date which is 7 days earlier than the max date i.e. 2017-05-14
This query has to run on 25+ million records, hence an optimized query would be an added advantage.
I am quite unsure how to approach this problem; a detailed explanation with the query would be much appreciated.
Assuming this is your table definition (I've changed user to user_id to avoid clashing with a reserved keyword):
CREATE TABLE calls
(
call_date date NOT NULL,
user_id integer NOT NULL
/* no primary key. There *can* be duplicate rows, that could be
changed if call_date were instead call_datetime. Then:
PRIMARY KEY (user_id, call_datetime)
Assumes users cannot make simultaneous calls, nor any faster than
the datetime resolution.
*/
)
;
-- These indexes will help `using index` query plans.
CREATE INDEX idx_calls_user_id_call_date ON calls(user_id, call_date) ;
CREATE INDEX idx_calls_call_date_user_id ON calls(call_date, user_id) ;
... and that we import your data. We can then query the database with:
SELECT
call_date, user_id,
-- Count of the number of calls on `call_date` for `user_id`
count(call_date) AS count_on_date,
-- Count of the number of calls between `call_date` and the next 6 days (including both)
(SELECT count(call_date) FROM calls c1 WHERE c1.user_id = c.user_id AND c1.call_date BETWEEN c.call_date AND c.call_date + interval 6 day) AS count_next_7_days
FROM
calls c
-- The next JOIN is used to retrieve the `reference date`, and do it only once.
-- This will allow to take into account only dates from (2017-05-14 - 13 day) = 2017-05-01 and (2017-05-14 - 7 day) = 2017-05-07
JOIN (SELECT max(call_date) AS ref_date FROM calls) AS d ON c.call_date BETWEEN ref_date - interval 13 day AND ref_date - interval 7 day
GROUP BY
call_date, user_id
ORDER BY
call_date, user_id ;
This query will return:
call_date | user_id | count_on_date | count_next_7_days
:--------- | ------: | ------------: | ----------------:
2017-05-01 | 100 | 1 | 2
2017-05-01 | 500 | 1 | 2
2017-05-02 | 200 | 1 | 2
2017-05-02 | 300 | 1 | 2
2017-05-03 | 300 | 1 | 1
2017-05-04 | 100 | 1 | 1
2017-05-05 | 400 | 1 | 2
2017-05-06 | 500 | 1 | 2
2017-05-07 | 600 | 1 | 1
dbfiddle here
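To sanity-check the expected counts independently of the SQL, the counting rule can be sketched in Python on the sample data. This is a brute-force illustration under assumptions, not something tuned for 25+ million rows: the function name repeat_counts and the window_days parameter are made up, and only the upper cutoff from the question is applied (the query's lower bound at ref_date - 13 days makes no difference on this sample).

```python
from datetime import date, timedelta

# (call_date, user) rows from the question
CALLS = [
    (date(2017, 5, 1), 100), (date(2017, 5, 1), 500), (date(2017, 5, 2), 200),
    (date(2017, 5, 2), 300), (date(2017, 5, 3), 300), (date(2017, 5, 4), 100),
    (date(2017, 5, 5), 400), (date(2017, 5, 6), 500), (date(2017, 5, 7), 600),
    (date(2017, 5, 8), 200), (date(2017, 5, 9), 700), (date(2017, 5, 10), 500),
    (date(2017, 5, 11), 400), (date(2017, 5, 12), 300), (date(2017, 5, 13), 100),
    (date(2017, 5, 14), 200),
]


def repeat_counts(calls, window_days=7):
    """For each (call_date, user) old enough, count that user's calls in the next window_days days."""
    # Only dates at least window_days before the latest date are reported.
    cutoff = max(d for d, _ in calls) - timedelta(days=window_days)
    dates_by_user = {}
    for d, user in calls:
        dates_by_user.setdefault(user, []).append(d)
    result = []
    for d, user in sorted(set(calls)):
        if d > cutoff:
            continue
        # The window includes the call date itself plus the following window_days - 1 days.
        window_end = d + timedelta(days=window_days - 1)
        count = sum(1 for other in dates_by_user[user] if d <= other <= window_end)
        result.append((d.isoformat(), user, count))
    return result


counts = repeat_counts(CALLS)
for row in counts:
    print(row)
```

On the sample data this gives the same nine rows and counts as the desired output in the question.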
Have you tried DAYOFWEEK() function? This link should be helpful.
I need some direction on how to create this MySQL query.
I have a table that looks like this.
id group name value
1 1 user mike
2 1 setting 1
3 2 user joe
4 2 setting 2
5 3 user jill
6 3 setting 1
7 4 user mark
8 4 setting 1
9 4 other 22
I would like to write a query that groups users that have identical settings (i.e. mike and jill would be grouped in this example, but not mark, because of "other").
Ultimately I am trying to consolidate the table so that it looks like this. If I can figure out the query to group them properly, I will use PHP to combine the values and save them back to the DB.
id group name value
1 1 user mike OR jill
2 1 setting 1
3 2 user joe
4 2 setting 2
7 4 user mark
8 4 setting 1
9 4 other 22
Thank you!
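One way to think about the grouping is to treat every non-user row of a group as part of a signature and merge the groups whose signatures are equal. The following Python sketch illustrates that idea on the sample rows; it is not a MySQL query, and the function name merge_identical_groups is made up.

```python
from collections import defaultdict

# (id, group, name, value) rows from the question's table
ROWS = [
    (1, 1, "user", "mike"), (2, 1, "setting", "1"),
    (3, 2, "user", "joe"), (4, 2, "setting", "2"),
    (5, 3, "user", "jill"), (6, 3, "setting", "1"),
    (7, 4, "user", "mark"), (8, 4, "setting", "1"), (9, 4, "other", "22"),
]


def merge_identical_groups(rows):
    """Join the users of all groups whose non-user (name, value) rows are identical."""
    users = {}
    signature = defaultdict(set)
    for _id, group, name, value in rows:
        if name == "user":
            users[group] = value
        else:
            signature[group].add((name, value))
    # Groups with the same frozenset of (name, value) pairs share one bucket.
    merged = defaultdict(list)
    for group in sorted(users):
        merged[frozenset(signature[group])].append(users[group])
    return [" OR ".join(names) for names in merged.values()]


combined = merge_identical_groups(ROWS)
print(combined)  # ['mike OR jill', 'joe', 'mark']
```

mark stays separate because his signature also contains ("other", "22").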
I have a database with workers, stations and sessions. A session describes at which time which worker was at which station. I managed to build a query that gives me the duration of the overlap of each pair of sessions.
SELECT
sA.station_id,
sA.worker_id AS worker1,
sB.worker_id AS worker2,
SEC_TO_TIME(
TIME_TO_SEC(LEAST(sA.end,sB.end)) - TIME_TO_SEC(GREATEST(sA.start,sB.start))
) AS overlap
FROM
`sessions` AS sA,
`sessions` AS sB
WHERE
sA.station_id = sB.station_id
AND
sA.station_id = 6
AND (
sA.start BETWEEN sB.start AND sB.end
OR
sA.end BETWEEN sB.start AND sB.end
)
With this query I get a result like this:
station_id worker1 worker2 overlap
6 1 1 09:00:00
6 2 1 02:30:00
6 5 1 00:00:00
6 1 1 09:00:00
6 2 1 01:30:00
6 3 1 09:00:00
...
6 12 3 02:00:00
6 14 3 01:00:00
6 17 3 02:00:00
...
What I would like now is to sum up the overlap for every combination of worker1 and worker2 to get the overall overlap duration.
I tried different ways of using SUM() and GROUP BY, but I never got the wanted result.
SELECT
...
SEC_TO_TIME(
SUM(TIME_TO_SEC(LEAST(sA.end,sB.end)) - TIME_TO_SEC(GREATEST(sA.start,sB.start)))
) AS overlap
...
#has as result
station_id worker1 worker2 overlap
6 1 1 838:59:59
#in combination with
GROUP BY
worker1
#i get
station_id worker1 worker2 overlap
6 1 1 532:30:00
6 2 1 -33:00:00
6 3 1 270:30:00
6 5 1 598:30:00
6 6 1 542:00:00
6 7 1 508:00:00
6 8 5 53:00:00
6 9 1 54:30:00
6 10 1 310:00:00
6 11 1 -108:00:00
6 12 1 593:30:00
6 14 1 97:30:00
6 15 1 -53:30:00
6 17 1 293:30:00
The last result is close, but I am still missing a lot of combinations. I also don't understand why the combination 8 - 5 is displayed.
Thanks for your help (and for taking the time to read).
Aaargh, sorry for my stupidity, the solution was fairly simple:
....
SUM(((UNIX_TIMESTAMP(LEAST(sA.end,sB.end))-UNIX_TIMESTAMP(GREATEST(sA.start,sB.start)))/3600))
...
GROUP BY station_id, worker1, worker2
ORDER BY worker1, worker2
I switched to UNIX timestamps and converted the result to hours by dividing by 3600, because my former approach with TIME_TO_SEC and SEC_TO_TIME only used the TIME part of the DATETIME field and therefore produced some wrong numbers. With MySQL 5.5 I could use TO_SECONDS, but unfortunately my server is still running 5.1.
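The grouping by station and both workers, with overlaps computed on full date-times, can be illustrated with a short Python sketch. The session rows below are hypothetical (the question does not show the sessions table), and the sketch makes two simplifying assumptions that differ from the original query: a session is never paired with itself, and only positive overlaps are added.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sessions: (station_id, worker_id, start, end). Not data from the question.
SESSIONS = [
    (6, 1, datetime(2011, 1, 10, 22, 0), datetime(2011, 1, 11, 6, 0)),  # crosses midnight
    (6, 2, datetime(2011, 1, 11, 1, 0), datetime(2011, 1, 11, 3, 30)),
    (6, 1, datetime(2011, 1, 12, 8, 0), datetime(2011, 1, 12, 12, 0)),
    (6, 2, datetime(2011, 1, 12, 10, 0), datetime(2011, 1, 12, 14, 0)),
]


def overlap_hours(sessions):
    """Sum overlap hours per (station_id, worker1, worker2) using full date-times."""
    totals = defaultdict(float)
    for i, (station_a, worker_a, start_a, end_a) in enumerate(sessions):
        for j, (station_b, worker_b, start_b, end_b) in enumerate(sessions):
            if i == j or station_a != station_b:
                continue
            # Equivalent of LEAST(end) - GREATEST(start), but on whole date-times.
            seconds = (min(end_a, end_b) - max(start_a, start_b)).total_seconds()
            if seconds > 0:
                totals[(station_a, worker_a, worker_b)] += seconds / 3600
    return dict(totals)


totals = overlap_hours(SESSIONS)
print(totals)  # {(6, 1, 2): 4.5, (6, 2, 1): 4.5}
```

Each worker pair appears once per direction, which mirrors what GROUP BY station_id, worker1, worker2 produces on the self-join.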