Finding entire fluctuation in a dataset - mysql

I have a table of historic data for a set of tanks in a MySQL database. I want to find fluctuations in the volume of tank contents of greater than 200 gallons/hour. My SQL statement thus far is:
SELECT t1.tankhistid AS start, t2.tankhistid AS `end`
FROM
(SELECT * from tankhistory WHERE tankid = ? AND curtime BETWEEN ? AND ?) AS t1
INNER JOIN
(SELECT * from tankhistory WHERE tankid = ? AND curtime BETWEEN ? AND ?) AS t2
ON t1.tankid = t2.tankid AND t1.curtime < t2.curtime
WHERE TIMESTAMPDIFF(HOUR, t1.curtime, t2.curtime) < 1 AND ABS(t1.vol - t2.vol) > 200
ORDER BY t1.tankhistid, t2.tankhistid
In the code above, curtime is a timestamp at the time of inserting the record, tankhistid is the table integer primary key, tankid is the individual tank id, and vol is the volume reading.
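For reference, the table shape being described might look roughly like this (column types are guesses; the question does not include the DDL):
CREATE TABLE tankhistory (
    tankhistid INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- row id
    tankid     INT UNSIGNED NOT NULL,                             -- which tank
    curtime    TIMESTAMP NOT NULL,                                -- reading time
    vol        DECIMAL(10,2) NOT NULL,                            -- volume in gallons
    KEY idx_tank_time (tankid, curtime)                           -- supports the tankid/curtime range scan
);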
This returns far too many rows: readings are collected every 5 minutes, and a fluctuation can span hours (multiple rows with the same id appearing in the end column and then the start column) or just over 10 minutes (multiple rows with the same start or end id). Example output:
7514576,7515478
7515232,7515478
7515314,7515478
7515396,7515478
7515478,7515560
7515478,7515642
7515478,7515724
Note that all of these rows should collapse into just one: 7514576,7515724. The query also takes 4 minutes for just one day of a single tank's data, so any speed-up would be welcome. I am guessing there is a way to use the current query as a subquery, but I am not sure how to do the filtering.
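One possible direction, assuming MySQL 8.0 or later (window functions are not available before that): compute each reading's swing within the trailing hour, keep only the readings where it exceeds 200, and then group contiguous flagged rows instead of comparing every pair. A sketch of the first step:
SELECT *
FROM (
    SELECT tankhistid, curtime, vol,
           MAX(vol) OVER w - MIN(vol) OVER w AS swing_last_hour  -- max minus min within the trailing hour
    FROM tankhistory
    WHERE tankid = ? AND curtime BETWEEN ? AND ?
    WINDOW w AS (ORDER BY curtime
                 RANGE BETWEEN INTERVAL 1 HOUR PRECEDING AND CURRENT ROW)
) AS hourly
WHERE swing_last_hour > 200
ORDER BY curtime;
Because each source row appears only once, this avoids the self-join blow-up, and an index on (tankid, curtime) keeps the range scan cheap.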

Related

How to update a SQL database table based on a condition from another table in the same database

I have two tables in a database:
table_1(device_ID, date,voltage)
table_2(device_ID,device_status)
I am trying to create an event to execute every 5 minutes.
What I am trying to achieve is: select device_ID from table_1 if there is no new data within the last 10 minutes, and then update table_2 accordingly, i.e., set device_status to 0.
How do I pass conditions between the two tables?
BEGIN
  select device_ID from table_1 where date = DATE_SUB(NOW(), INTERVAL 10 MINUTE);
  -- here I get device_IDs that do have data within the last 10 minutes,
  -- but I need the device_IDs that have no data.
  -- How do I update table_2 based on that condition?
END
You can use the results of your first query as a subquery to de-select rows (by using NOT IN) for the UPDATE:
UPDATE table_2
SET device_status = 0
WHERE device_ID NOT IN (SELECT device_ID
                        FROM table_1
                        WHERE date > DATE_SUB(NOW(), INTERVAL 10 MINUTE))
Note: I think you probably want >, not =, in the WHERE condition of the subquery.
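Since the question mentions running this every 5 minutes, the UPDATE could be wrapped in a scheduled event. A sketch, assuming the event scheduler is enabled (SET GLOBAL event_scheduler = ON) and the account has the EVENT privilege; the event name is illustrative:
CREATE EVENT IF NOT EXISTS mark_stale_devices
ON SCHEDULE EVERY 5 MINUTE
DO
  UPDATE table_2
  SET device_status = 0
  WHERE device_ID NOT IN (SELECT device_ID
                          FROM table_1
                          WHERE date > DATE_SUB(NOW(), INTERVAL 10 MINUTE));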

Finding First Appearing Value in a List of Duplicate Values

I have a table that stores the statuses an application goes through. Some applications go through the same status multiple times, and each time an application enters a status, the time of the status change is recorded.
How can I pull a list of applications based on the first time each application goes through a particular status within a specified date range? Below is what I have tried thus far:
SELECT d1.STATUS,
d1.APPL_ID
FROM APP_STATUS d1
LEFT JOIN APP_STATUS d2 ON d1.APPL_ID = d2.APPL_ID AND d1.STATUS = 'AT_CUSTOMER' AND d2.STATUS = 'AT_CUSTOMER'
WHERE DATE(d1.STATUS_CREATE_DT) >= '2014-10-26'
AND DATE(d1.STATUS_CREATE_DT) <= '2014-11-25'
AND d2.STATUS IS NULL
GROUP BY d1.APPL_ID;
To get the first time each application reaches the status, try this query:
select appl_id, min(status_create_dt) as first_dt
from app_status
where status = 'AT_CUSTOMER' and
      status_create_dt >= '2014-10-26' and
      status_create_dt < date('2014-11-25') + interval 1 day
group by appl_id;
I think this does what you need. If you want more columns, you can join this back to app_status (a sketch follows below).
Note that I changed the date logic a bit: the date function is applied only to the constant side of the comparison. This allows the query to take advantage of an index on STATUS_CREATE_DT, if one exists.
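If you do need the full rows, a sketch of joining the aggregate back to the table (untested; column names are taken from the question):
SELECT s.*
FROM app_status AS s
JOIN (SELECT appl_id, MIN(status_create_dt) AS first_dt
      FROM app_status
      WHERE status = 'AT_CUSTOMER'
        AND status_create_dt >= '2014-10-26'
        AND status_create_dt < DATE('2014-11-25') + INTERVAL 1 DAY
      GROUP BY appl_id) AS f
  ON f.appl_id = s.appl_id
 AND f.first_dt = s.status_create_dt
WHERE s.status = 'AT_CUSTOMER';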

MariaDB: Optimum Way to Update Billions of Records

I am looking for an optimal way to update billions of records in one table (Table 3 in the example below). Each entry is associated with a timestamp that has millisecond resolution. In this example, Table 3 is out of date, while Tables 1 and 2 are up to date with the real entries of their respective data. I do not have anything that links Tables 1 and 2 to Table 3. If more information is needed, please let me know, as I am not a database expert.
Table 1 has 4 columns:
Timestamp T_0 PRIMARY KEY (ex: '2014-07-04 16:17:16.800000')
X1_T1 VARCHAR
X2_T1 VARCHAR
X3_T1 VARCHAR
Table 2 has 4 columns:
Timestamp T_0 PRIMARY KEY (ex: '2014-07-04 16:17:16.800000')
X1_T2 VARCHAR
X2_T2 VARCHAR
X3_T2 VARCHAR
Table 3 has 7 columns:
Timestamp T_0 PRIMARY KEY (ex: '2014-07-04 16:17:16.800000')
X1_T1 VARCHAR
X2_T1 VARCHAR
X3_T1 VARCHAR
X1_T2 VARCHAR
X2_T2 VARCHAR
X3_T2 VARCHAR
I was successful at updating table 3 using a procedure that loops through timestamps and updates each row using the command:
SET tmp_T_0 = (SELECT '2014-01-05 17:00:00.000000'); -- set to the start of the table's timestamps
label1: LOOP
  UPDATE TABLE3 SET
    X1_T1 = (select X1_T1 FROM TABLE1 where T_0 = tmp_T_0),
    X2_T1 = (select X2_T1 FROM TABLE1 where T_0 = tmp_T_0),
    X3_T1 = (select X3_T1 FROM TABLE1 where T_0 = tmp_T_0),
    X1_T2 = (select X1_T2 FROM TABLE2 where T_0 = tmp_T_0),
    X2_T2 = (select X2_T2 FROM TABLE2 where T_0 = tmp_T_0),
    X3_T2 = (select X3_T2 FROM TABLE2 where T_0 = tmp_T_0)
  WHERE T_0 = tmp_T_0;
  SET tmp_T_0 = (SELECT TIMESTAMP(tmp_T_0, '00:00:00.001')); -- add one millisecond and continue
  SET LoopInt = (SELECT (LoopInt + 1));
  IF LoopInt < LoopEnd THEN
    ITERATE label1;
  END IF;
  LEAVE label1;
END LOOP label1;
The above method takes around 53 seconds for 100,000 entries. That is not acceptable, because it would take around 100 days to complete the rest of the entries.
It should be noted that Table 3 does not necessarily have data from Tables 1 and/or 2 for every one of its timestamp entries (i.e., a timestamp in Table 3 may have data for X1_T1, X2_T1, and X3_T1 while the other values X1_T2, X2_T2, and X3_T2 are NULL).
Any suggestions would help.
Thank you
What about trying this query to pull one hour's worth of info from TABLE1 to TABLE3?
UPDATE TABLE3 AS t3
JOIN TABLE1 AS t1 ON t3.T_0 = t1.T_0
SET t3.X1_T1 = IFNULL(t1.X1_T1,t3.X1_T1),
t3.X2_T1 = IFNULL(t1.X2_T1,t3.X2_T1),
t3.X3_T1 = IFNULL(t1.X3_T1,t3.X3_T1)
WHERE t3.T_0 >= '2014-01-05' + INTERVAL 0 HOUR
AND t3.T_0 < '2014-01-05' + INTERVAL 1 HOUR
What's going on? First, the WHERE clause limits the query's scope to one hour. That's handy because you can test stuff. Also, you're going to want to loop this job hour by hour so your queries don't run for too long. If you're using InnoDB or Aria as the storage engine and you don't limit the scope of your queries, you'll blow out the transaction rollback space too.
You can run this query many times, each time with a change to the HOUR interval, like so.
WHERE t3.T_0 >= '2014-01-05' + INTERVAL 1 HOUR
AND t3.T_0 < '2014-01-05' + INTERVAL 2 HOUR
You're JOINing TABLE1 to TABLE3. That's valid because you've said TABLE3 contains every possible timestamp, and TABLE1 doesn't. The way I have written this query, it won't touch the rows of TABLE3 that don't have corresponding rows in TABLE1. I think that's what you want.
Finally, the IFNULL() function arranges only to change the TABLE3 data when there's non-NULL TABLE1 data.
Look, if your TABLE1 data is sparse (that is, it has lots of randomly scattered valid values in a table that is mostly NULL) you probably want to use three queries like this instead, so you don't actually change rows in TABLE3 unless you have new data. Changing values in rows is relatively expensive.
UPDATE TABLE3 AS t3
JOIN TABLE1 AS t1 ON t3.T_0 = t1.T_0
SET t3.X1_T1 = t1.X1_T1
WHERE t3.T_0 >= '2014-01-05' + INTERVAL 0 HOUR
  AND t3.T_0 < '2014-01-05' + INTERVAL 1 HOUR
  AND t1.X1_T1 IS NOT NULL;

UPDATE TABLE3 AS t3
JOIN TABLE1 AS t1 ON t3.T_0 = t1.T_0
SET t3.X2_T1 = t1.X2_T1
WHERE t3.T_0 >= '2014-01-05' + INTERVAL 0 HOUR
  AND t3.T_0 < '2014-01-05' + INTERVAL 1 HOUR
  AND t1.X2_T1 IS NOT NULL;

UPDATE TABLE3 AS t3
JOIN TABLE1 AS t1 ON t3.T_0 = t1.T_0
SET t3.X3_T1 = t1.X3_T1
WHERE t3.T_0 >= '2014-01-05' + INTERVAL 0 HOUR
  AND t3.T_0 < '2014-01-05' + INTERVAL 1 HOUR
  AND t1.X3_T1 IS NOT NULL;
You will need to repeat all this for your TABLE2 data.
You may want to run this whole thing in a single query. Don't do that! This is the kind of job you need to be able to do an hour at a time and restart when needed. I suggest an hour at a time, but that is 3.6 megarows. You might want to do even smaller chunks at a time, like 6 minutes (360 kilorows).
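A sketch of what the hour-by-hour driver could look like as a stored procedure (the procedure name and DATETIME parameters are illustrative, not from the question):
DELIMITER //
CREATE PROCEDURE backfill_table3_from_table1(IN p_start DATETIME, IN p_end DATETIME)
BEGIN
  DECLARE chunk_start DATETIME DEFAULT p_start;
  WHILE chunk_start < p_end DO
    UPDATE TABLE3 AS t3
    JOIN TABLE1 AS t1 ON t3.T_0 = t1.T_0
    SET t3.X1_T1 = IFNULL(t1.X1_T1, t3.X1_T1),
        t3.X2_T1 = IFNULL(t1.X2_T1, t3.X2_T1),
        t3.X3_T1 = IFNULL(t1.X3_T1, t3.X3_T1)
    WHERE t3.T_0 >= chunk_start
      AND t3.T_0 <  chunk_start + INTERVAL 1 HOUR;  -- one hour per iteration
    SET chunk_start = chunk_start + INTERVAL 1 HOUR;
  END WHILE;
END //
DELIMITER ;
Calling it for a day or a week at a time makes the job easy to restart after a failure without redoing finished chunks.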
If I were you I'd definitely debug this whole deal on a copy of a couple of days' worth of your TABLE3.

Using JOIN with DISTINCT and prioritize one table

I am trying to combine data from 2 tables.
Those 2 tables both contain data from the same sensor (let's say a sensor that measures CO2, with 1 entry per 10 minutes).
The first table contains validated data. Let's call it station1_validated. The 2nd table contains raw data. Let's call this one station1_nrt.
While the raw-data table contains live data, the validated table contains only data points that are at least 1 month old. (Validating the data and manually checking it afterwards takes time, and it happens only once a month.)
What I am trying to do now is combine the data of those 2 tables to display live data on a website. However, when a validated data point is available, it should be prioritized over the raw data point.
The relevant columns for this are:
timed [bigint(20)]: Contains the datetime as a Unix timestamp in milliseconds since 1970-01-01
CO2 [double]: Contains the measured concentration of CO2 in ppm (parts per million)
I wrote this basic SQL:
SELECT *
FROM
    (SELECT timed, CO2, '2' tab
     FROM station1_nrt
     WHERE TIMED >= 1386932400000
       AND TIMED <= 1386939600000
       AND TIMED NOT IN (SELECT timed
                         FROM station1_nrt
                         WHERE CO2 IS NOT NULL
                           AND TIMED >= 1386932400000
                           AND TIMED <= 1386939600000)
     UNION
     SELECT timed, CO2, '1' tab
     FROM station1_validated
     WHERE CO2 IS NOT NULL
       AND TIMED >= 1386932400000
       AND TIMED <= 1386939600000) a
ORDER BY timed
This does not work correctly: it selects only those data points where both tables have an entry.
I would like to do this with a JOIN instead, as it would be much faster, but I don't know how to do a JOIN with a DISTINCT (or something similar) while prioritizing one table. Could someone help me out with this (or explain it)?
You haven't mentioned whether there are records in station1_validated that don't exist in station1_nrt, so I use a FULL JOIN. If all rows from station1_validated exist in station1_nrt, you can use a LEFT JOIN instead.
Something like this
SELECT IFNULL(n.timed,v.timed) as timed,
CASE WHEN v.timed IS NOT NULL THEN v.CO2 ELSE n.CO2 END as CO2,
CASE WHEN v.timed IS NOT NULL THEN '1' ELSE '2' END as tab
FROM station1_nrt as n
FULL JOIN station1_validated as v ON n.timed=v.timed AND v.CO2 IS NOT NULL
WHERE
( n.TIMED between 1386932400000 AND 1386939600000
or
v.TIMED between 1386932400000 AND 1386939600000
)
AND
(n.CO2 IS NOT NULL OR v.CO2 IS NOT NULL)
MySQL has an IF that would probably work for you. You would have to select specific columns, though you could build the query programmatically.
SELECT
IF(DATE_SUB(NOW(), INTERVAL 1 MONTH) < FROM_UNIXTIME(nrt.TIMED),
val.value,
nrt.value
) AS value
-- Similar for other values
FROM
station1_nrt AS nrt
JOIN station1_validated AS val USING(id)
ORDER BY TIMED
Note that the USING(id) is a placeholder. Presumably there is some indexed column you can join the two tables on.
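Since both tables share the timed column described in the question, a variant of the idea above could join on it directly; a sketch (untested, using the timestamps from the question):
SELECT nrt.timed,
       IFNULL(val.CO2, nrt.CO2) AS CO2   -- prefer the validated reading, fall back to raw
FROM station1_nrt AS nrt
LEFT JOIN station1_validated AS val USING (timed)
WHERE nrt.timed BETWEEN 1386932400000 AND 1386939600000
ORDER BY nrt.timed;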
You can join and then use IFs in the fields to choose the validated values if they exist. Something like:
SELECT
    IFNULL(s1val.timed, s1.timed) AS timed,
    IFNULL(s1val.CO2, s1.CO2) AS CO2,
    IF(s1val.timed IS NULL, '2', '1') AS tab
FROM
    station1_nrt s1
    LEFT JOIN station1_validated s1val ON (s1.TIMED = s1val.TIMED)
WHERE
    -- Any necessary where clauses
@Jim, @valex, @ExplosionPills
I managed to write a SQL SELECT that emulates a FULL OUTER JOIN (as there is no FULL JOIN in MySQL) and returns the validated value if it exists. If no validated data is available, it returns the raw value.
So this is the SQL I am using now:
SET @StartTime = 1356998400000;
SET @EndTime = 1386546000000;
SELECT
    timed,
    IFNULL(mergedData.validatedValue, mergedData.rawValue) AS value
FROM
    ((SELECT
        from_unixtime(timed / 1000) AS timed,
        rawData.NOX AS rawValue,
        validatedData.NOX AS validatedValue
    FROM
        nabelnrt_bas AS rawData
        LEFT JOIN nabelvalidated_bas AS validatedData USING (timed)
    WHERE
        (rawData.timed > @StartTime AND rawData.timed < @EndTime)
        OR (validatedData.timed > @StartTime AND validatedData.timed < @EndTime)
    ) UNION (
    SELECT
        from_unixtime(timed / 1000) AS timed,
        rawData.NOX AS rawValue,
        validatedData.NOX AS validatedValue
    FROM
        nabelnrt_bas AS rawData
        RIGHT JOIN nabelvalidated_bas AS validatedData USING (timed)
    WHERE
        (rawData.timed > @StartTime AND rawData.timed < @EndTime)
        OR (validatedData.timed > @StartTime AND validatedData.timed < @EndTime)
    )
    ORDER BY timed DESC) AS mergedData

SQL - Calculating variable moving average over variable lengths

FIRST: This question is NOT a duplicate. I have already asked this on here and it was closed as a duplicate. While it is similar to other threads on Stack Overflow, it is actually far more complex. Please read the post before assuming it is a duplicate.
I am trying to calculate variable moving averages crossover with variable dates.
That is: I want to prompt the user for 3 values and 1 option. The input is through a web front end so I can build/edit the query based on input or have multiple queries if needed.
X = 1st moving average term (an N-day moving average; any number 1-N)
Y = 2nd moving average term (an N-day moving average; any number 1-N)
Z = Number of days back from the present to search for the occurrence of:
option = Over/Under (> or <: X passing over Y, or X passing under Y)
The X-day moving average passing over OR under the Y-day moving average
within the past Z days.
My database is structured:
tbl_daily_data
id
stock_id
date
adj_close
And:
tbl_stocks
stock_id
symbol
I have a btree index on:
daily_data(stock_id, date, adj_close)
stock_id
I am stuck on this query and having a lot of trouble writing it. If the variables were fixed it would seem trivial, but because X, Y, and Z are all 100% independent of each other (the query could look, for example, for a 5-day moving average within the past 100 days, or a 100-day moving average within the past 5 days), I am having a lot of trouble coding it.
Please help! :(
Edit: I've been told some more context might be helpful?
We are creating an open stock analytic system where users can perform trend analysis. I have a database containing 3500 stocks and their price histories going back to 1970.
This query will be running every day in order to find stocks that match certain criteria
for example:
10 day moving average crossing over 20 day moving average within 5
days
20 day crossing UNDER 10 day moving average within 5 days
55 day crossing UNDER 22 day moving average within 100 days
But each user may be interested in a different analysis, so I cannot just store the moving average with each row; it must be calculated.
I am not sure if I fully understand the question ... but something like this might help you get where you need to go: sqlfiddle
SET @X := 5;
SET @Y := 3;
SET @Z := 25;
SET @option := 'under';

select * from (
    SELECT stock_id,
           datediff(current_date(), date) days_ago,
           adj_close,
           (
               SELECT AVG(adj_close) AS moving_average
               FROM tbl_daily_data T2
               WHERE (
                   SELECT COUNT(*)
                   FROM tbl_daily_data T3
                   WHERE date BETWEEN T2.date AND T1.date
               ) BETWEEN 1 AND @X
           ) move_av_1,
           (
               SELECT AVG(adj_close) AS moving_average
               FROM tbl_daily_data T2
               WHERE (
                   SELECT COUNT(*)
                   FROM tbl_daily_data T3
                   WHERE date BETWEEN T2.date AND T1.date
               ) BETWEEN 1 AND @Y
           ) move_av_2
    FROM tbl_daily_data T1
    WHERE datediff(current_date(), date) <= @Z
) x
WHERE
    case when @option = 'over'  and move_av_1 > move_av_2 then 1 else 0 end +
    case when @option = 'under' and move_av_2 > move_av_1 then 1 else 0 end > 0
ORDER BY stock_id, days_ago
Based on the answer by @Tom H here: How do I calculate a moving average using MySQL?
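As a usage example, the first scenario listed in the question (a 10 day moving average crossing over the 20 day moving average within 5 days) would be parameterized as:
SET @X := 10;          -- first moving-average window, in days
SET @Y := 20;          -- second moving-average window, in days
SET @Z := 5;           -- how many days back to look
SET @option := 'over';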