I have two tables of time series data that I need to query together, and I don't know how to do it properly.
The first table is time series data of device measurements. Each device is associated with a source and the data contains an hourly measurement. In this example there are 5 devices (101-105) with data for 5 days (June 1-5).
device_id date_time source_id meas
101 2016-06-01 00:00 ABC 105
101 2016-06-01 01:00 ABC 102
101 2016-06-01 02:00 ABC 103
...
101 2016-06-05 23:00 ABC 107
102 2016-06-01 00:00 XYZ 102
...
105 2016-06-05 23:00 XYZ 104
The second table is time series data of source measurements. Each source has three hourly measurements (meas_1, meas_2 and meas_3).
source_id date_time meas_1 meas_2 meas_3
ABC 2016-06-01 00:00 100 101 102
ABC 2016-06-01 01:00 99 100 105
ABC 2016-06-01 02:00 104 108 109
...
ABC 2016-06-05 23:00 102 109 102
XYZ 2016-06-01 00:00 105 106 103
...
XYZ 2016-06-05 23:00 103 105 101
I am looking for a query to get the data for a specified date range that grabs the device's measurements and its associated source's measurements. This example is the result for querying for device 101 from June 2-4.
device_id date_time d.meas s.meas_1 s.meas_2 s.meas_3
101 2016-06-02 00:00 105 100 101 102
101 2016-06-02 01:00 102 99 100 105
101 2016-06-02 02:00 103 104 108 109
...
101 2016-06-04 23:00 107 102 109 102
The actual data set could get large, with, let's say, 100,000 devices and 90 days of hourly measurements, so any help on properly indexing the tables would be appreciated. I'm using MySQL.
UPDATE - Solved
Here's the query I used:
SELECT d.device_id, d.date_time, d.meas, s.meas_1, s.meas_2, s.meas_3
FROM devices AS d
JOIN sources AS s
ON d.source_id = s.source_id AND d.date_time = s.date_time AND d.device_id = '101' AND d.date_time >= '2016-06-02 00:00' AND d.date_time <= '2016-06-04 23:00'
ORDER BY d.date_time;
For what it's worth, it also worked with the filters in a WHERE clause rather than in the JOIN, but it performed more slowly. Thanks for the help.
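On the indexing part of the question: a common starting point (an assumption here, not something measured against the real data) is one composite index per table, with the equality column first and the range column second, i.e. (device_id, date_time) on devices and (source_id, date_time) on sources. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with hypothetical index names and a handful of made-up rows, just to show the shape of the schema and the query. For an inner join, filters in WHERE are logically equivalent to filters in ON, so the WHERE form is used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE devices (device_id INTEGER, date_time TEXT, source_id TEXT, meas INTEGER);
CREATE TABLE sources (source_id TEXT, date_time TEXT,
                      meas_1 INTEGER, meas_2 INTEGER, meas_3 INTEGER);
-- Hypothetical index names; equality column first, range column second.
CREATE INDEX idx_devices_device_time ON devices (device_id, date_time);
CREATE INDEX idx_sources_source_time ON sources (source_id, date_time);
""")

# Made-up sample rows (not the full data set from the question).
conn.executemany("INSERT INTO devices VALUES (?, ?, ?, ?)", [
    (101, "2016-06-01 23:00", "ABC", 105),
    (101, "2016-06-02 00:00", "ABC", 102),
    (101, "2016-06-04 23:00", "ABC", 103),
    (101, "2016-06-05 00:00", "ABC", 107),
    (102, "2016-06-02 00:00", "XYZ", 102),
])
conn.executemany("INSERT INTO sources VALUES (?, ?, ?, ?, ?)", [
    ("ABC", "2016-06-01 23:00", 100, 101, 102),
    ("ABC", "2016-06-02 00:00", 99, 100, 105),
    ("ABC", "2016-06-04 23:00", 104, 108, 109),
    ("ABC", "2016-06-05 00:00", 102, 109, 102),
    ("XYZ", "2016-06-02 00:00", 105, 106, 103),
])

def device_with_source(conn, device_id, start, end):
    """Return a device's measurements joined to its source's measurements."""
    return conn.execute("""
        SELECT d.device_id, d.date_time, d.meas, s.meas_1, s.meas_2, s.meas_3
        FROM devices AS d
        JOIN sources AS s
          ON s.source_id = d.source_id AND s.date_time = d.date_time
        WHERE d.device_id = ? AND d.date_time >= ? AND d.date_time <= ?
        ORDER BY d.date_time
    """, (device_id, start, end)).fetchall()

rows = device_with_source(conn, 101, "2016-06-02 00:00", "2016-06-04 23:00")
for row in rows:
    print(row)
```

In MySQL the same two indexes could be declared with CREATE INDEX or as part of the table definition. Whether they help enough at 100,000 devices is something to confirm with EXPLAIN on the real tables.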
I have a table like this:
user_id order_id create_time payment_amount product
101 10001 2018-04-02 5:26 48000 chair
102 10002 2018-04-02 7:44 25000 sofa
101 10003 2018-04-02 8:34 320000 ac
101 10004 2018-04-02 8:37 180000 water
103 10005 2018-04-02 9:32 21000 chair
102 10006 2018-04-02 9:33 200000 game console
103 10007 2018-04-02 9:36 11000 chair
107 10008 2018-04-02 11:05 10000 sofa
105 10009 2018-04-02 11:06 49000 ac
101 10010 2018-04-02 12:05 1200000 cc
105 10011 2018-04-02 12:12 98000 ac
103 10012 2018-04-02 13:11 85000 insurance
106 10013 2018-04-02 13:11 240000 cable tv
108 10014 2018-04-02 13:15 800000 financing
106 10015 2018-04-02 13:18 210000 phone
My goal is to find which users made consecutive transactions less than 10 minutes apart.
I'm using MySQL.
Based on the format of your dates in the table, you will need to convert them using STR_TO_DATE to use them in a query. If your column is actually a datetime type, and that is just your display code outputting that format, just replace STR_TO_DATE(xxx, '%m/%d/%Y %k:%i') in this query with xxx.
The way to find orders within 10 minutes of each other is to self-join the table on the same user_id, a different order_id, and the second order's time falling between the first order's time and 10 minutes after it:
SELECT t1.user_id, t1.create_time AS order1_time, t2.create_time AS order2_time
FROM transactions t1
JOIN transactions t2 ON t2.user_id = t1.user_id
AND t2.order_id != t1.order_id
AND STR_TO_DATE(t2.create_time, '%m/%d/%Y %k:%i') BETWEEN
STR_TO_DATE(t1.create_time, '%m/%d/%Y %k:%i')
AND STR_TO_DATE(t1.create_time, '%m/%d/%Y %k:%i') + INTERVAL 10 MINUTE
Output:
user_id order1_time order2_time
101 4/2/2018 8:34 4/2/2018 8:37
103 4/2/2018 9:32 4/2/2018 9:36
106 4/2/2018 13:11 4/2/2018 13:18
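To sanity-check the self-join logic outside MySQL, here is a small sketch using Python's sqlite3 module. It assumes create_time is stored as zero-padded 'YYYY-MM-DD HH:MM' text, so STR_TO_DATE is not needed, and SQLite's strftime with a '+10 minutes' modifier stands in for + INTERVAL 10 MINUTE. Only a subset of the sample rows is loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (user_id INTEGER, order_id INTEGER, create_time TEXT)"
)
# Subset of the question's rows, with hours zero-padded (an assumption).
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    (101, 10001, "2018-04-02 05:26"),
    (101, 10003, "2018-04-02 08:34"),
    (101, 10004, "2018-04-02 08:37"),
    (103, 10005, "2018-04-02 09:32"),
    (103, 10007, "2018-04-02 09:36"),
    (105, 10009, "2018-04-02 11:06"),
    (105, 10011, "2018-04-02 12:12"),
    (106, 10013, "2018-04-02 13:11"),
    (106, 10015, "2018-04-02 13:18"),
])

# Same user, different order, second order within 10 minutes after the first.
pairs = conn.execute("""
    SELECT t1.user_id, t1.create_time AS order1_time, t2.create_time AS order2_time
    FROM transactions AS t1
    JOIN transactions AS t2
      ON t2.user_id = t1.user_id
     AND t2.order_id != t1.order_id
     AND t2.create_time BETWEEN t1.create_time
         AND strftime('%Y-%m-%d %H:%M', t1.create_time, '+10 minutes')
    ORDER BY t1.user_id, t1.create_time
""").fetchall()

for pair in pairs:
    print(pair)
```

Note that BETWEEN is inclusive at both ends, so two orders exactly 10 minutes apart would also be returned, and two orders with identical timestamps would be returned once in each direction.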
Use this query:
SELECT user_id FROM `table_name` WHERE create_time < DATE_SUB(NOW(), INTERVAL 10 MINUTE) GROUP BY user_id HAVING count(user_id) > 1
I have 3 database tables with sample data given below
Meas_id - integer(Foreign keyed to Measurement.meas_id)
Tool_id - integer(Foreign keyed to Events.Machine_id)
Processdate- Timestamp with timezone (UTC)
CreatedDate- Timestamp with timezone (UTC)
Readings
Meas_id Tool_id Status Processdate
1 13 Completed 2016-01-01 01:34:11
1 28 Failed 2016-01-01 08:37:11
1 54 Failed 2016-01-02 16:04:12
1 32 Completed 2016-01-04 07:13:11
1 39 Completed 2016-01-04 14:14:14
1 12 Completed 2016-01-05 22:10:09
1 9 Completed 2015-12-28 13:11:07
1 17 Completed 2016-01-25 13:14:11
1 27 Completed 2016-01-15 14:15:16
1 31 Failed 2016-01-07 16:08:04
2 113 Completed 2016-01-01 01:34:11
2 128 Failed 2016-01-01 08:37:11
2 154 Failed 2016-01-02 16:04:12
2 132 Completed 2016-01-04 07:13:11
2 139 Completed 2016-01-04 14:14:14
2 112 Completed 2016-01-05 22:10:09
2 90 Completed 2015-12-28 13:11:07
2 117 Completed 2016-01-25 13:14:11
2 127 Completed 2016-01-15 14:15:16
2 131 Failed 2016-01-07 16:08:04
Events
Meas_id Machine_id Event_Name CreatedDate
1 13 Success 2015-12-27 01:34:11
1 17 Error 2015-12-27 08:37:11
1 28 Success 2015-12-27 16:04:12
1 9 Success 2015-12-28 07:13:11
1 54 Success 2015-12-28 14:14:14
1 31 Error 2015-12-28 22:10:09
1 32 Success 2015-12-29 13:11:07
1 39 Success 2015-12-29 13:14:11
1 12 Success 2015-12-31 14:15:16
1 27 Success 2016-01-01 16:08:04
2 113 Success 2015-12-27 01:34:11
2 117 Error 2015-12-27 08:37:11
2 128 Success 2015-12-27 16:04:12
2 90 Success 2015-12-28 07:13:11
2 154 Success 2015-12-28 14:14:14
2 131 Error 2015-12-28 22:10:09
2 132 Success 2015-12-29 13:11:07
2 139 Success 2015-12-29 13:14:11
2 112 Success 2015-12-31 14:15:16
2 127 Success 2016-01-01 16:08:04
Measurement
Meas_id Meas_name
1 Length
2 Breadth
For each measurement (‘length’ and ‘breadth’) and each day of the week, I am trying to calculate the percentage of success in the first week of 2016 for all completed measurements of tools/machines within 168 hours of their creation date.
My Desired Output is
Measurement DayofTheWeek PercentageSuccess
Length 1 50
Length 2 0
Length 3 0
Length 4 100
Length 5 100
Length 6 0
Length 7 0
Breadth 1 50
Breadth 2 0
Breadth 3 0
Breadth 4 100
Breadth 5 100
Breadth 6 0
Breadth 7 0
I tried doing it this way, but I am certainly missing some logic and it's not working.
Select m.Meas_name,
datepart(dd, Processdate) as DayofTheWeek,
(Count(m.Meas_name)* 100 / (Select Count(Event_Name) From Events where Event_Name = 'Success')) as PercentageSuccess
FROM Readings r JOIN
Measurements m
ON r.Meas_id = m.Meas_id
JOIN Events e
ON e.Meas_id = m.Meas_id
WHERE m.Meas_name IN ('Length', 'Breadth') AND
r.Status = 'Completed' AND
e.CreatedDate >= DATEADD(hh, -168, GETDATE())
GROUP BY m.Meas_name, datepart(dd, Processdate);
Kindly provide inputs on an optimized way of achieving it.
Nice, I got downvoted for a correct answer, probably because my answer wasn't very clear. It is kind of hard to explain, so here is an edit aimed at your comment and the downvoter (who I think was just retaliating).
Anyway, your join of the 3 tables, while valid, replicates the data in your Events table. Because of that, the way you are counting the records will always be exaggerated and incorrect. Your calculation for the percentage also happens to be backwards.
On the join, it looks like you are just missing the use of Tool_id. You could try something like the following:
SELECT
m.Meas_name
,DAYOFWEEK(r.ProcessDate) as DayOfTheWeek
,(COUNT(CASE WHEN e.Event_Name = 'Success' THEN e.Meas_id END)/(COUNT(e.Meas_id) * 1.0)) * 100 as PercentageSuccess
FROM
Measurements m
INNER JOIN Events e
ON m.Meas_id = e.Meas_id
INNER JOIN Readings r
ON e.Meas_id = r.Meas_id
AND e.Machine_id = r.Tool_id
AND r.Status = 'Completed'
AND r.ProcessDate BETWEEN '2016-01-01' AND '2016-01-07'
WHERE
m.Meas_name IN ('Length','Breadth')
GROUP BY
m.Meas_name
,DAYOFWEEK(r.ProcessDate)
Note this is written for MySQL because that is what is tagged in your post. If you actually want SQL Server, as your syntax suggests, let me know. Also, I am guessing that you really want to filter by ProcessDate, but if you want to filter by Events.CreatedDate, then put that in the ON condition of the Events join.
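As a rough illustration of the conditional-aggregation idea above, here is a sketch using Python's sqlite3 module with a few made-up rows (not the question's data). Two things in it are assumptions rather than part of the query above: SQLite's strftime('%w') plus 1 stands in for MySQL's DAYOFWEEK (1 = Sunday), and the date filter is written as a half-open range, because with timestamp values BETWEEN '2016-01-01' AND '2016-01-07' stops at midnight at the start of January 7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Measurements (Meas_id INTEGER, Meas_name TEXT);
CREATE TABLE Events (Meas_id INTEGER, Machine_id INTEGER, Event_Name TEXT);
CREATE TABLE Readings (Meas_id INTEGER, Tool_id INTEGER, Status TEXT, Processdate TEXT);

-- Made-up rows for illustration only.
INSERT INTO Measurements VALUES (1, 'Length');
INSERT INTO Events VALUES
  (1, 13, 'Success'), (1, 17, 'Error'), (1, 28, 'Success'), (1, 32, 'Success');
INSERT INTO Readings VALUES
  (1, 13, 'Completed', '2016-01-01 01:34:11'),
  (1, 17, 'Completed', '2016-01-01 09:00:00'),
  (1, 28, 'Failed',    '2016-01-01 08:37:11'),
  (1, 32, 'Completed', '2016-01-04 07:13:11');
""")

# Joining on both Meas_id and the tool/machine id avoids multiplying rows.
result = conn.execute("""
    SELECT m.Meas_name,
           CAST(strftime('%w', r.Processdate) AS INTEGER) + 1 AS DayOfTheWeek,
           COUNT(CASE WHEN e.Event_Name = 'Success' THEN 1 END) * 100.0 / COUNT(*)
               AS PercentageSuccess
    FROM Measurements AS m
    JOIN Events AS e ON e.Meas_id = m.Meas_id
    JOIN Readings AS r ON r.Meas_id = e.Meas_id AND r.Tool_id = e.Machine_id
    WHERE r.Status = 'Completed'
      AND r.Processdate >= '2016-01-01' AND r.Processdate < '2016-01-08'
    GROUP BY m.Meas_name, DayOfTheWeek
    ORDER BY DayOfTheWeek
""").fetchall()

for row in result:
    print(row)
```

Days with no completed readings simply do not appear in this result. Getting the zero rows shown in the desired output would need an extra step, such as a left join from a list of the seven weekdays.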
I have this table that has the name of the employee and their phone time duration in mysql. The table looks like this:
Caller Emplid Calldate Call_Duration
Jack 333 1/1/2016 43
Jack 333 1/2/2016 45
Jack 333 1/3/2016 87
Jack 333 2/4/2016 44
Jack 333 2/5/2016 234
jack 333 2/6/2016 431
Jeff 111 1/1/2016 23
Jeff 111 1/2/2016 54
Jeff 111 1/3/2016 67
I am trying to calculate the running daily average of each employee's call duration, by day, within each month. For example, the running average for April covers only April, and the running average for May should start again on the 1st of May and end on the 31st. I have tried many approaches, and MySQL does not have pivot and partition functions like SQL Server. The set of employees who make calls changes daily, so I need something that dynamically handles the number of employees making calls.
The output should look like this:
Caller Emplid Calldate Call_Duration Running_avg
Jack 333 1/1/2016 43 43
Jack 333 1/2/2016 45 44
Jack 333 1/3/2016 87 58.33333333
Jack 333 2/4/2016 44 44
Jack 333 2/5/2016 234 139
Jack 333 2/6/2016 431 236.3333333
Jeff 111 1/1/2016 23 23
Jeff 111 1/2/2016 54 38.5
Jeff 111 1/3/2016 67 48
This is the query I started with:
SELECT row_number,Month_Year,Callername,Calldate,Caller_Emplid,`Sum of Call`,`Sum of Call`/row_number as AvgCall,
@`sum of call`:=@`sum of call`+ `sum of call` overallCall,
@row_number:=row_number overallrow_number,
@RunningTotal:=@`sum of call`/@row_number runningTotal
FROM
(SELECT
@row_number:=
CASE WHEN @CallerName=CallerName and date_format(calldate,'%d') = date_format(calldate,'%d') and
date_format(calldate,'%m') = date_format(calldate,'%m')
THEN @row_number+1
ELSE 1 END AS row_number,@CallerName:=CallerName AS Callername,Calldate,Caller_Emplid,Month_Year,`Sum of Call`
FROM transform_data_2, (SELECT @row_number:=0,@CallerName:='') AS t
ORDER BY callername) a
JOIN (SELECT @`Sum of call`:= 0) t
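For comparison, the running average described above can be written with a window function, which MySQL supports from version 8.0 onward. The sketch below is not the asker's method. It uses Python's sqlite3 module (needs SQLite 3.25 or later) as a stand-in, assumes a hypothetical table named calls with Calldate stored as 'YYYY-MM-DD' text, and partitions by employee and month so the average restarts each month. In MySQL the month expression would be something like DATE_FORMAT(Calldate, '%Y-%m') instead of strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (Caller TEXT, Emplid INTEGER, Calldate TEXT, Call_Duration INTEGER)"
)
conn.executemany("INSERT INTO calls VALUES (?, ?, ?, ?)", [
    ("Jack", 333, "2016-01-01", 43),
    ("Jack", 333, "2016-01-02", 45),
    ("Jack", 333, "2016-01-03", 87),
    ("Jack", 333, "2016-02-04", 44),
    ("Jack", 333, "2016-02-05", 234),
    ("Jack", 333, "2016-02-06", 431),
    ("Jeff", 111, "2016-01-01", 23),
    ("Jeff", 111, "2016-01-02", 54),
    ("Jeff", 111, "2016-01-03", 67),
])

# Average of all rows so far within the same employee and calendar month.
running = conn.execute("""
    SELECT Caller, Calldate, Call_Duration,
           AVG(Call_Duration) OVER (
               PARTITION BY Emplid, strftime('%Y-%m', Calldate)
               ORDER BY Calldate
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS Running_avg
    FROM calls
    ORDER BY Caller, Calldate
""").fetchall()

for row in running:
    print(row)
```

Because the partition is per employee, the number of employees who made calls on a given day does not need special handling.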
I have 2 tables, payment_scroll and virtual.
The virtual table contains 4 fields:
ID self_id net_amount date
1 101 600 1-1-2012
2 102 700 5-8-2012
3 103 900 13-11-2012
4 104 1100 16-9-2012
In the payment_scroll table, the net_amount field is updated from a gridview on the front end. After net_amount is updated, the same table looks like this:
ID self_id net_amount date
1 101 950 3-4-2012
2 102 1100 11-6-2012
3 103 900 13-11-2012
4 104 1100 16-9-2012
I want to update the second table, payment_scroll, from the virtual table in the manner shown below.
ID self_id total_amount p1 d1 p2 d2 p3 d3 p4 d4..........upto p100 d100
1 101 5000 600 (1-1-2012) 950 (3-4-2012)
2 102 9650 700 (5-8-2012) 1100 (11-6-2012)
3 103 8000 900 (13-11-2012)
4 104 1100 1100 (16-9-2012)
Please suggest the right query to do this.
I'm working on a problem of finding mean processing times. I'm trying to eliminate outlier data by essentially performing an average on only the best 80% of the data.
I am struggling to adapt existing Top N per Group solutions to perform averaging per group. Using SQL Server 2008.
Here is a sample of what the table looks like:
OpID | ProcessMin | Datestamp
2 | 234 | 2012-01-26 09:07:29.000
2 | 222 | 2012-01-26 10:04:22.000
3 | 127 | 2012-01-26 11:09:51.000
3 | 134 | 2012-01-26 05:02:11.000
3 | 566 | 2012-01-26 05:27:31.000
4 | 234 | 2012-01-26 04:08:41.000
I want it to take the lowest 80% of the ProcessMin for each OpID, and take the average of that array. Any help would be appreciated!
* UPDATE *
Given the following table:
OpID ProcessMin Datestamp
602 33 46:54.0
602 36 38:59.0
602 37 18:45.0
602 39 22:01.0
602 41 36:43.0
602 42 33:00.0
602 49 03:48.0
602 51 22:08.0
602 69 39:15.0
602 105 59:56.0
603 13 34:07.0
603 18 07:17.0
603 31 57:07.0
603 39 01:52.0
603 39 01:02.0
603 40 40:10.0
603 46 22:56.0
603 47 11:03.0
603 48 40:13.0
603 56 25:01.0
I would expect this output:
OpID ProcessMin
602 41
603 34.125
Notice that since there are 10 data points for each OpID, it would only average the lowest 8 values (80%).
You can use ntile to split each OpID's rows into 5 buckets ordered by ProcessMin and keep the first 4:
select OpID,
avg(ProcessMin) as ProcessMin
from
(
select OpID,
ProcessMin,
ntile(5) over(partition by OpID order by ProcessMin) as nt
from YourTable
) as T
where nt <= 4
group by OpID
If ProcessMin is an integer you can do avg(cast(ProcessMin as float)) as ProcessMin to get the decimal average value.
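To check the ntile approach against the expected output in the question, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later) in place of SQL Server, loading the 20 ProcessMin values from the update. The column alias AvgProcessMin and the CAST to REAL are choices made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (OpID INTEGER, ProcessMin INTEGER)")

# ProcessMin values from the question's update, keyed by OpID.
data = {
    602: [33, 36, 37, 39, 41, 42, 49, 51, 69, 105],
    603: [13, 18, 31, 39, 39, 40, 46, 47, 48, 56],
}
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(op_id, value) for op_id, values in data.items() for value in values],
)

# NTILE(5) numbers the buckets 1..5 per OpID; nt <= 4 drops the top bucket.
result = conn.execute("""
    SELECT OpID, AVG(CAST(ProcessMin AS REAL)) AS AvgProcessMin
    FROM (
        SELECT OpID, ProcessMin,
               NTILE(5) OVER (PARTITION BY OpID ORDER BY ProcessMin) AS nt
        FROM YourTable
    ) AS T
    WHERE nt <= 4
    GROUP BY OpID
    ORDER BY OpID
""").fetchall()

print(result)
```

One thing to keep in mind: when the row count for an OpID is not a multiple of 5, ntile makes the earlier buckets one row larger, so the kept share is only approximately 80%. For example, with 6 rows the buckets hold 2, 1, 1, 1 and 1 rows, and 5 of the 6 rows are averaged.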