This is one of my MySQL tables:
idpk | id    | status    | timestamp
1    | 43857 | AVAILABLE | 2023-01-07 09:14:19
2    | 43857 | OFFLINE   | 2023-01-07 18:14:19
3    | 43857 | AVAILABLE | 2023-01-07 23:14:19
4    | 43860 | AVAILABLE | 2023-01-08 09:14:19
5    | 43860 | OFFLINE   | 2023-01-08 18:14:19
6    | 43860 | AVAILABLE | 2023-01-08 22:14:19
7    | 43857 | OFFLINE   | 2023-01-08 23:14:19
I track multiple IDs and their current status. My goal is to know the offline time of each ID within a 24-hour period.
For example:
id -> 43857: on 2023-01-07 it would have an offline time of 5 hours.
id -> 43860: on 2023-01-08 it would have an offline time of 4 hours.
The offline time should only be counted from an "OFFLINE" status until the next "AVAILABLE" status.
Currently I do this via Python: I select the affected rows within 24 hours, calculate the offline time and save it in another table.
Is there a better way to calculate this? For example directly in SQL?
I'm also open to new ideas, e.g. better ways to store the data in MySQL to make the calculation easier.
Btw.:
I also found this question, which is pretty much what I need, but can someone explain that query to me a little bit?
It would eliminate all timestamps for which no AVAILABLE status is reached afterwards:
WITH CTE AS (
    SELECT t1.id, t1.`timestamp`,
           (SELECT `timestamp`
            FROM tab1 t2
            WHERE t2.id = t1.id
              AND t2.`status` = 'AVAILABLE'
              AND t1.`timestamp` < t2.`timestamp`
            ORDER BY t2.`idpk` ASC LIMIT 1) AS atime
    FROM tab1 t1
    WHERE t1.`status` = 'OFFLINE'
)
SELECT
    id,
    CONCAT(
        TIMESTAMPDIFF(DAY, `timestamp`, atime), ' days ',
        MOD(TIMESTAMPDIFF(HOUR, `timestamp`, atime), 24), ' hours ',
        MOD(TIMESTAMPDIFF(MINUTE, `timestamp`, atime), 60), ' minutes ',
        ' OFFLINE') AS `offline`
FROM CTE
WHERE atime IS NOT NULL
id    | offline
43857 | 0 days 5 hours 0 minutes OFFLINE
43860 | 0 days 4 hours 0 minutes OFFLINE
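For comparison, on MySQL 8.0+ / MariaDB 10.2+ the same result can be computed with window functions instead of the correlated subquery. This is only a sketch, assuming the same tab1 table as in the query above: LEAD() pairs each row with the next row of the same id, and only OFFLINE -> AVAILABLE pairs are summed per id and day.

-- Sketch (MySQL 8.0+ / MariaDB 10.2+): LEAD() fetches the following row per id.
WITH ordered AS (
    SELECT id,
           `status`,
           `timestamp`,
           LEAD(`status`)    OVER (PARTITION BY id ORDER BY `timestamp`) AS next_status,
           LEAD(`timestamp`) OVER (PARTITION BY id ORDER BY `timestamp`) AS next_time
    FROM tab1
)
SELECT id,
       DATE(`timestamp`) AS offline_day,
       -- sum the OFFLINE -> AVAILABLE gaps per id and day, in hours
       SUM(TIMESTAMPDIFF(MINUTE, `timestamp`, next_time)) / 60 AS offline_hours
FROM ordered
WHERE `status` = 'OFFLINE'
  AND next_status = 'AVAILABLE'
GROUP BY id, DATE(`timestamp`);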
I come from MS SQL Server and I'm relatively new to MySQL / MariaDB 10 (at least in a deeper way than just "SELECT * FROM [Table]"). I have now searched for several hours on Google and Stack Overflow, but I haven't found a solution to my problem yet. If it's relevant in any way: I use MySQL Workbench for writing my code.
The Background
I have a new data logging project for saving and displaying data from several temperature and humidity sensors within the house. I save it in following table:
ID | Time                | Device | Temperature | Humidity
1  | 2022-01-09 13:34:00 | 1      | 20.1        | 52.3
2  | 2022-01-09 13:35:00 | 1      | 20.0        | 52.3
3  | 2022-01-09 13:36:00 | 1      | 20.1        | 52.4
4  | 2022-01-09 13:37:00 | 1      | 20.1        | 52.5
5  | 2022-01-09 13:38:00 | 1      | 20.0        | 52.5
6  | 2022-01-09 13:39:00 | 1      | 20.1        | 52.6
I query the needed data for a chart using a stored procedure. The temperature values, which are rounded to 0.1 °C, have the disadvantage that they naturally flip back and forth by 0.1 even when the temperature is pretty stable. So I thought of a moving average over the last 10 minutes to smooth the values, which works perfectly with an AVG window function.
Here is a simplified version of my procedure:
CREATE PROCEDURE `stpGetSensorData`(sensorId INT, startDate VARCHAR(8))
BEGIN
DECLARE FromDate DATE;
SET FromDate = STR_TO_DATE(startDate, '%Y%m%d');
SELECT
L.ID,
L.Time,
L.Device,
AVG(L.Temperature) OVER (ORDER BY L.Time ROWS BETWEEN 10 PRECEDING AND 0 FOLLOWING) AS Temperature,
AVG(L.Humidity) OVER (ORDER BY L.Time ROWS BETWEEN 10 PRECEDING AND 0 FOLLOWING) AS Humidity
FROM
LoggedData AS L
WHERE
Device = sensorId
AND Time < DATE_ADD(FromDate, INTERVAL 1 DAY)
AND Time >= FromDate
ORDER BY Time DESC;
END
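It is called with the sensor ID and a start date in '%Y%m%d' format, for example:

CALL stpGetSensorData(1, '20220109');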
The challenge
Now I thought I would let the end user decide about the size of the window, i.e. an average over the last 5, 10, 30, 60, ... minutes. But when I try to use a parameter in the window function, it leads to the error: "averageRows is not valid at this position".
Here is the code:
CREATE PROCEDURE `stpGetSensorData`(sensorId INT, startDate VARCHAR(8), averageRows INT)
BEGIN
DECLARE FromDate DATE;
SET FromDate = STR_TO_DATE(startDate, '%Y%m%d');
SELECT
L.ID,
L.Time,
L.Device,
AVG(L.Temperature) OVER (ORDER BY L.Time ROWS BETWEEN averageRows PRECEDING AND 0 FOLLOWING) AS Temperature,
AVG(L.Humidity) OVER (ORDER BY L.Time ROWS BETWEEN averageRows PRECEDING AND 0 FOLLOWING) AS Humidity
FROM
LoggedData AS L
WHERE
Device = sensorId
AND Time < DATE_ADD(FromDate, INTERVAL 1 DAY)
AND Time >= FromDate
ORDER BY Time DESC;
END
I guess it's possible to solve this using dynamic SQL, but I try to avoid dynamic SQL wherever possible and thought there must be a 'normal' solution as well that I'm just too blind to see.
Any smart ideas?
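For completeness, the dynamic SQL variant I'm trying to avoid would presumably look something like this (untested sketch, same table and parameters as above; the frame size is spliced into the statement text because the frame clause won't take a variable):

CREATE PROCEDURE `stpGetSensorDataDyn`(sensorId INT, startDate VARCHAR(8), averageRows INT)
BEGIN
    DECLARE FromDate DATE;
    SET FromDate = STR_TO_DATE(startDate, '%Y%m%d');

    -- build the statement text with the frame size inlined
    SET @sql = CONCAT(
        'SELECT L.ID, L.Time, L.Device, ',
        'AVG(L.Temperature) OVER (ORDER BY L.Time ROWS BETWEEN ', averageRows, ' PRECEDING AND CURRENT ROW) AS Temperature, ',
        'AVG(L.Humidity) OVER (ORDER BY L.Time ROWS BETWEEN ', averageRows, ' PRECEDING AND CURRENT ROW) AS Humidity ',
        'FROM LoggedData AS L ',
        'WHERE L.Device = ? AND L.Time >= ? AND L.Time < DATE_ADD(?, INTERVAL 1 DAY) ',
        'ORDER BY L.Time DESC');

    -- prepared statements can only bind user variables
    SET @sensorId = sensorId, @fromDate = FromDate;
    PREPARE stmt FROM @sql;
    EXECUTE stmt USING @sensorId, @fromDate, @fromDate;
    DEALLOCATE PREPARE stmt;
END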
I currently have two tables in mysql that are collecting data. One collects every 3 hours and the other every 15 minutes. I am trying to create a forecast using this data using machine learning techniques which requires the data to be of the same time interval.
My question is: how can I get the average value for every 3 hours from both tables into a new table that organises them by date and hour? So in a format like:
Date / Hour / Table 1 3-hour average / Table 2 3-hour average
Table 1 (every 3 hours):
datetime(timestamp)/level(double)
Table 2 (every 15 minutes):
datetime(timestamp)/value(double)
I have managed to create a table that gives me the average per day, but I'd prefer something more accurate!
This is the statement I've already got for daily averages:
SELECT DAY(`datetime`),
AVG(`level`) AS `Table 1 Average` ,
(SELECT AVG(`value`) AS `Table 2 Daily Average`
FROM `table2`
WHERE DAY(`datetime`) = DAY(table1.datetime)
GROUP BY DAY(`datetime`))
FROM `table1` GROUP BY DAY(`datetime`);
Thanks!
You really just need to change the subquery. I think this may do what you want:
SELECT t1.*,
       (SELECT AVG(t2.`value`)
        FROM `table2` t2
        WHERE t2.datetime <= t1.datetime AND
              t2.datetime >= DATE_SUB(t1.datetime, INTERVAL 3 HOUR)
       ) AS t2_moving_average
FROM `table1` t1;
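If what you need is fixed 3-hour buckets lined up with table1's readings rather than a moving window, something along these lines might also work (sketch only, assuming table1(datetime, level) and table2(datetime, value) as described in the question):

SELECT DATE(t1.`datetime`) AS `Date`,
       HOUR(t1.`datetime`) AS `Hour`,
       t1.`level`          AS `Table 1 value`,   -- table1 already has one reading per 3 hours
       (SELECT AVG(t2.`value`)
        FROM `table2` t2
        WHERE t2.`datetime` >= t1.`datetime`
          AND t2.`datetime` <  t1.`datetime` + INTERVAL 3 HOUR) AS `Table 2 3-hour average`
FROM `table1` t1
ORDER BY t1.`datetime`;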
I've been looking at several other SO questions but I could not make out a solution from these. First, the description, then what I'm missing from the other threads. (Heads up: I'm very well aware of the non-normalised structure of our database, which is something I have addressed in meetings before but this is what we have and what I have to work with.)
Background description
We have a machine that manufactures products in 25 positions. These products' production data is being logged in a table that among other things logs current and voltage for every position. This is only logged when the machine is actually producing products (i.e. has a product in the machine). The time where no product is present, nothing is being logged.
This machine can run in two different production modes: full production and R&D production. Full production means that products are being inserted continuously so that every instance has a product at all times (i.e. 25 products are present in the machine at all times). The second mode, R&D production, only produces one product at a time (i.e. one product enters the machine, goes through the 25 instances one by one and when this one is finished, the second product enters the machine).
To clarify: every position logs data once every second whenever a product is present, which means 25 instances per second when full production is running. When R&D mode is running, position 1 will have ~20 instances for 20 consecutive seconds, position 2 will have ~20 instances for the next 20 consecutive seconds and so on.
Table structure
Productiondata:
id (autoincrement)
productID
position
time (timestamp for logged data)
current (amperes)
voltage (volts)
Question
We want to calculate the uptime of the machine, but we want to separate the uptime for production mode and R&D mode, and we want to separate this data on a weekly basis.
Guessed solution
Since we have instances logged every second, I can count the number of DISTINCT time values in the table to find the total uptime for production and R&D mode combined. To find R&D mode, I can safely say that whenever a time value has only one entry, the machine is running in R&D mode (production mode would have 25 entries per time value).
Progress so far
I have the following query which sums up all distinct instances to find both production and R&D mode:
SELECT YEARWEEK(time) AS YWeek, COUNT(DISTINCT time) AS Time_Seconds, ROUND(COUNT(DISTINCT time)/3600, 1) AS Time_Hours
FROM Database.productiondata
WHERE YEARWEEK(time) >= YEARWEEK(curdate()) - 21
GROUP BY YWeek;
This query counts how many DISTINCT time values there are in the table and groups the counts by week.
Problem
The above query counts every time value that exists in the table, but I want to count ONLY the time values that appear exactly once. Basically, I'm trying to do something like: IF COUNT(time) = 1 then count that value, IF COUNT(time) > 1 then don't count it at all (DISTINCT still counts it).
I looked at several other SO threads, but almost all explain how to find unique values with DISTINCT, which only accomplishes half of what I'm looking for. The closest I got was this which uses a HAVING clause. I'm currently stuck at the following:
SELECT YEARWEEK(time) as YWeek, COUNT(Distinct time) As Time_Seconds, ROUND(COUNT(Distinct time)/3600, 1) As Time_Hours
FROM
(SELECT * FROM Database.productiondata
WHERE time > '2014-01-01 00:00:00'
GROUP BY time
HAVING count(time) = 1) as temptime
GROUP BY YWeek
ORDER BY YWeek;
The problem here is that we have a GROUP BY time inside the nested SELECT, which takes forever (~5 million rows only for this year, so I can understand that). I mean, syntactically I think this is correct, but it takes forever to execute. Even EXPLAIN for this times out.
And that is where I am. Is this the correct approach or is there any other way that is smarter/requires less query time/avoids the group by time clause?
EDIT: As a sample, we have this table (apologies for formatting, don't know how to make a table format here on SO)
id position time
1 1 1
2 2 1
3 5 1
4 19 1
... ... ...
25 7 1
26 3 2
27 6 2
... ... ...
This table shows what it looks like when a production run is going on. As you can see, there is no general structure for which position gets the first entry when logging the data in the table; what happens is that the 25 positions get logged every second and the data is then added to the table depending on how fast the PLC sends the data for each position. The following table shows what the table looks like when it runs in R&D mode.
id position time
245 1 1
246 1 2
247 1 3
... ... ...
269 1 25
270 2 26
271 2 27
... ... ...
Since all the data is consolidated into one single table, we want to find out how many instances there are when COUNT(time) is exactly equal to 1, or we could look for every instance when COUNT(time) is strictly larger than 1.
EDIT2: As a reply to Alan, the suggestion gives me
YWeek Time_Seconds Time_Hours
201352 1 0.0
201352 1 0.0
201352 1 0.0
... ... ...
201352 1 0.0 (1000 row limit)
Whereas my desired output is
Yweek Time_Seconds Time_Hours
201352 2146 35.8
201401 5789 96.5
... ... ...
201419 8924 148.7
EDIT3: I have gathered the tries and the results so far here with a description in gray above the queries.
You might achieve better results by eliminating your sub select:
SELECT YEARWEEK(time) AS YWeek,
       COUNT(time) AS Time_Seconds,
       ROUND(COUNT(time)/3600, 1) AS Time_Hours
FROM Database.productiondata
WHERE time > '2014-01-01 00:00:00'
GROUP BY YWeek
HAVING COUNT(time) = 1
ORDER BY YWeek;
I'm assuming time has an index on it, but if it does not you could expect a significant improvement in performance by adding one.
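For example (the index name is arbitrary):

ALTER TABLE Database.productiondata ADD INDEX idx_time (`time`);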
UPDATE:
Per the recently added sample data, I'm not sure your approach is correct. The time column appears to be an INT representing seconds while you're treating it as a DATETIME with YEARWEEK. Below I have a working example in SQL that does exactly what you asked IF time is actually a DATETIME column:
DECLARE @table TABLE
(
    id INT,
    [position] INT,
    [time] DATETIME
)
INSERT INTO @table VALUES ( 1, 1, DATEADD(week, -1, GETDATE()) )
INSERT INTO @table VALUES ( 1, 1, DATEADD(week, -2, GETDATE()) )
INSERT INTO @table VALUES ( 1, 1, DATEADD(week, -2, GETDATE()) )
INSERT INTO @table VALUES ( 1, 1, DATEADD(week, -2, GETDATE()) )
INSERT INTO @table VALUES ( 1, 1, DATEADD(week, -2, GETDATE()) )
INSERT INTO @table VALUES ( 1, 1, DATEADD(week, -3, GETDATE()) )
INSERT INTO @table VALUES ( 1, 1, DATEADD(week, -3, GETDATE()) )

SELECT  CAST(DATEPART(year, [time]) AS VARCHAR)
        + CAST(DATEPART(week, [time]) AS VARCHAR) AS YWeek,
        COUNT([time]) AS Time_Seconds,
        ROUND(COUNT([time]) / 3600, 1) AS Time_Hours
FROM    @table
WHERE   [time] > '2014-01-01 00:00:00'
GROUP BY DATEPART(year, [time]),
         DATEPART(week, [time])
HAVING  COUNT([time]) > 0
ORDER BY YWeek;
SELECT pd1.*
FROM Database.productiondata pd1
LEFT JOIN Database.productiondata pd2 ON pd1.time=pd2.time AND pd1.id<pd2.id
WHERE pd1.time > '2014-01-01 00:00:00' AND pd2.time > '2014-01-01 00:00:00'
AND pd2.id IS NULL
You can LEFT JOIN the table to itself and keep only the rows that have no matching row.
UPDATE: The query below works in the SQL fiddle:
SELECT pd1.*
FROM productiondata pd1
LEFT JOIN productiondata pd2
  ON pd1.time = pd2.time AND pd1.id < pd2.id
WHERE pd1.time > '2014-01-01 00:00:00' AND pd2.id IS NULL;
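Since the self-join matches rows on time and compares id, a composite index covering both columns should help it, e.g. (name arbitrary):

ALTER TABLE productiondata ADD INDEX idx_time_id (`time`, `id`);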
I have incoming sensor data that is stored in a table. Each record for a sensor has a few "counter" columns. Counters are nothing but a snapshot in time. Example records are shown below.
id | sensor | counter 1 | counter 2 | timestamp
1  | 5      | 100       | 200       | 10:00 AM
2  | 5      | 125       | 210       | 10:01 AM
...
60 | 5      | 1000      | 800       | 11:00 AM
I have thousands of such sensors sending snapshots of their counter values over time. What I need to do is, given a time period, say between 10 AM and 11 AM, chart the delta between consecutive records. The delta is always taken relative to the previous record.
What is the SQL query to get the delta's between consecutive records?
Second question I have is, what is a good table design to store such data which is self-referencing. Probably as a linked list with each record pointing to the previous record?
The generic SQL is:
select t.*,
(select counter1 from t t2 where t2.sensor = t.sensor and t2.timestamp < t.timestamp
order by timestamp desc
limit 1
) prevCounter1
from t
The limit 1 depends on the database. It might be top 1 (SQL Server, Sybase) or where rownum = 1 (Oracle).
You need to do this for each counter.
If you are using Postgres, Oracle, or SQL Server 2012, you can use lag():
select t.*,
lag(counter1) over (partition by sensor order by timestamp) as prevCounter1
from t
In MySQL, you will be best off if you put in a prevID. If not, an index on sensor, timestamp should give you reasonable performance.
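On MySQL 8.0 or later, lag() is available in MySQL as well, so the same pattern works there too. A minimal sketch (the time range is just an example; the first row per sensor in the range has a NULL delta because its predecessor falls outside the filter):

-- MySQL 8.0+: per-sensor delta against the previous snapshot
SELECT t.*,
       counter1 - LAG(counter1) OVER (PARTITION BY sensor ORDER BY `timestamp`) AS delta_counter1,
       counter2 - LAG(counter2) OVER (PARTITION BY sensor ORDER BY `timestamp`) AS delta_counter2
FROM t
WHERE `timestamp` >= '2023-01-01 10:00:00'
  AND `timestamp` <  '2023-01-01 11:00:00';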
I want to get the number of registrations over a time period (say a week), which isn't that hard to do, but I was wondering if it is in any way possible in MySQL to return a zero for days that have no registrations.
An example:
DATA:
ID_Profile datCreate
1 2009-02-25 16:45:58
2 2009-02-25 16:45:58
3 2009-02-25 16:45:58
4 2009-02-26 10:23:39
5 2009-02-27 15:07:56
6 2009-03-05 11:57:30
SQL:
SELECT
DAY(datCreate) as RegistrationDate,
COUNT(ID_Profile) as NumberOfRegistrations
FROM tbl_profile
WHERE DATE(datCreate) > DATE_SUB(CURDATE(),INTERVAL 9 DAY)
GROUP BY RegistrationDate
ORDER BY datCreate ASC;
In this case the result would be:
RegistrationDate NumberOfRegistrations
25 3
26 1
27 1
5 1
Obviously I'm missing a couple of days in between. Currently I'm solving this in my php code, but I was wondering if MySQL has any way to automatically return 0 for the missing days/rows. This would be the desired result:
RegistrationDate NumberOfRegistrations
25 3
26 1
27 1
28 0
1 0
2 0
3 0
4 0
5 1
This way we can use MySQL to solve any problems concerning the number of days in a month instead of relying on PHP code to calculate how many days each month has, since MySQL has this functionality built in.
Thanks in advance
No, but one workaround would be to create a single-column table with a date primary key, preloaded with dates for each day. You'd have dates from your earliest starting point right through to some far off future.
Now, you can LEFT JOIN your statistical data against it - then you'll get nulls for those days with no data. If you really want a zero rather than null, use IFNULL(colname, 0)
Thanks to Paul Dixon I found the solution. Anyone interested in how I solved this, read on:
First, create a stored procedure (which I found somewhere) to populate a table with all the dates for the year:
CREATE TABLE calendar (dt DATE NOT NULL);

CREATE PROCEDURE sp_calendar(IN start_date DATE, IN end_date DATE, OUT result_text TEXT)
BEGIN
    SET @begin = 'INSERT INTO calendar(dt) VALUES ';
    SET @date = start_date;
    SET @max = SUBDATE(end_date, INTERVAL 1 DAY);
    SET @temp = '';
    REPEAT
        SET @temp = CONCAT(@temp, '(''', @date, '''), ');
        SET @date = ADDDATE(@date, INTERVAL 1 DAY);
    UNTIL @date > @max
    END REPEAT;
    SET @temp = CONCAT(@temp, '(''', @date, ''')');
    SET result_text = CONCAT(@begin, @temp);
END

CALL sp_calendar('2009-01-01', '2010-01-01', @z);
SELECT @z;
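Note that the procedure only builds the INSERT statement text into @z; to actually fill the calendar table you still have to run it, for example via a prepared statement:

-- execute the generated INSERT
PREPARE stmt FROM @z;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;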
Then change the query to add the left join:
SELECT
DAY(dt) as RegistrationDate,
COUNT(ID_Profile) as NumberOfRegistrations
FROM calendar
LEFT JOIN
tbl_profile ON calendar.dt = tbl_profile.datCreate
WHERE dt BETWEEN DATE_SUB(CURDATE(),INTERVAL 6 DAY) AND CURDATE()
GROUP BY RegistrationDate
ORDER BY dt ASC
And we're done.
Thanks all for the quick replies and solution.