Calculating time difference of every other row from a table - mysql

Note: The data for my question is on SQLFiddle, where you can query it.
How the table is created
I take data from a table and put it into a temp table using the logic below. The BETWEEN start and end timestamps are generated dynamically by other logic in the stored procedure.
SET @RowNum = 0;
DROP TEMPORARY TABLE IF EXISTS temp;
CREATE TEMPORARY TABLE temp AS
SELECT @RowNum := @RowNum + 1 RowNum
, TimeStr
, Value
FROM mytable
WHERE TimeStr BETWEEN '2018-01-31 06:15:56' AND '2018-01-31 19:27:09'
AND iQuality = 3 ORDER BY TimeStr;
This gives me a temp table whose RowNum increments by one in TimeStr order, starting from the oldest record, so the oldest record is RowNum 1.
Temp Table
The Data
You can access this temp table data and experiment with queries on the SQLFiddle I've created. You'll also see a few things I tried there, which don't give me what I need.
Attempt to Clarify Further
I need the elapsed time for each ON/OFF set, based on the TimeStr values in that set, and I can compute it with the TIMEDIFF() function.
I'm having a hard time figuring out how to get a result for each ON/OFF pair of records. The records are always ordered from oldest to newest, and the row number always starts at 1.
Somehow I need to give every two consecutive records (by RowNum) a matching CycleNum, starting at 1 and incrementing by one for each ON/OFF cycle.
I can use TIMEDIFF(MAX(TimeStr), MIN(TimeStr)) as the duration, but I'm not sure how best to group every two consecutive RowNum records so that each set gets an incrementing CycleNum value.
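If RowNum is gap-free, starts at 1, and ON/OFF strictly alternate starting with ON, one way to derive CycleNum is integer division: rows 1 and 2 map to cycle 1, rows 3 and 4 to cycle 2, and so on. In MySQL that would be `(RowNum + 1) DIV 2`, grouped, with `TIMEDIFF(MAX(TimeStr), MIN(TimeStr))` per group. The sketch below checks the idea with Python's built-in `sqlite3`. SQLite has no `DIV` or `TIMEDIFF`, so it uses integer `/` and `strftime('%s', ...)` instead. The table name `temp_rows` and the rows are hypothetical.

```python
import sqlite3

# Hypothetical rows shaped like the temp table: RowNum, TimeStr, Value (1 = ON, 0 = OFF).
ROWS = [
    (1, "2018-01-31 06:15:56", 1),
    (2, "2018-01-31 06:20:00", 0),
    (3, "2018-01-31 07:00:00", 1),
    (4, "2018-01-31 07:30:30", 0),
]

def cycles(rows):
    """Group consecutive RowNum pairs into cycles; assumes gap-free RowNum starting at 1."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE temp_rows (RowNum INTEGER, TimeStr TEXT, Value INTEGER)")
    con.executemany("INSERT INTO temp_rows VALUES (?, ?, ?)", rows)
    # (RowNum + 1) / 2 is integer division for INTEGER operands in SQLite,
    # like (RowNum + 1) DIV 2 in MySQL: rows 1,2 -> 1; rows 3,4 -> 2.
    query = """
        SELECT (RowNum + 1) / 2 AS CycleNum,
               MIN(TimeStr)     AS StartTime,
               MAX(TimeStr)     AS EndTime,
               strftime('%s', MAX(TimeStr)) - strftime('%s', MIN(TimeStr)) AS DurationSec
        FROM temp_rows
        GROUP BY CycleNum
        ORDER BY CycleNum
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

for row in cycles(ROWS):
    print(row)
```

If a cycle can be missing its OFF row, the pairing shifts for every later cycle, so this only holds under the stated assumptions.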
Expected Output
The expected output should look like the screenshot below, covering all ON/OFF cycles, i.e. every two consecutive RowNum values grouped in sequence.
Output Clarification
I need the output to include each ON/OFF cycle's start time, end time, and the duration between the start and stop.

If you can guarantee two things:
That the row numbers are strictly sequential with no gaps.
That the on/off flag is always alternating.
Then you can do this with a relatively simple join. The code looks like:
SELECT (@rn := @rn + 1) AS cycle, t.*, tnext.timestr,
timediff(tnext.timestr, t.timestr) AS duration
FROM temp t JOIN
temp tnext
ON t.rownum = tnext.rownum - 1 and
t.value = 1 and
tnext.value = 0 cross join
(SELECT @rn := 0) params;
If these conditions are not true, then more complex logic is needed.
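If RowNum can have gaps, one option on MySQL 8.0+ is the LEAD() window function. It fetches the next row's TimeStr and Value in time order, so the pairing no longer relies on RowNum arithmetic. This is a different technique from the join above. It still assumes each ON row is immediately followed by its matching OFF row. The sketch runs the equivalent SQL through Python's `sqlite3` and assumes SQLite 3.25+ for window functions. The rows and the `temp_rows` name are hypothetical. In MySQL you would use `TIMEDIFF(NextTime, TimeStr)` instead of the `strftime` arithmetic.

```python
import sqlite3

# Hypothetical rows: RowNum jumps from 2 to 5, which breaks t.RowNum = tnext.RowNum - 1.
ROWS = [
    (1, "2018-01-31 06:15:56", 1),
    (2, "2018-01-31 06:20:00", 0),
    (5, "2018-01-31 07:00:00", 1),
    (6, "2018-01-31 07:30:30", 0),
]

QUERY = """
    SELECT ROW_NUMBER() OVER (ORDER BY TimeStr) AS Cycle,
           TimeStr  AS StartTime,
           NextTime AS EndTime,
           strftime('%s', NextTime) - strftime('%s', TimeStr) AS DurationSec
    FROM (
        SELECT TimeStr, Value,
               LEAD(TimeStr) OVER (ORDER BY TimeStr) AS NextTime,
               LEAD(Value)   OVER (ORDER BY TimeStr) AS NextValue
        FROM temp_rows
    ) AS paired
    WHERE Value = 1 AND NextValue = 0  -- keep ON rows whose next row is OFF
    ORDER BY StartTime
"""

def on_off_cycles(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE temp_rows (RowNum INTEGER, TimeStr TEXT, Value INTEGER)")
    con.executemany("INSERT INTO temp_rows VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in on_off_cycles(ROWS):
    print(row)
```

ROW_NUMBER() in the outer query is evaluated after the WHERE filter, so cycles are numbered 1, 2, ... over the kept ON rows only.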

Here is a simpler one:
SELECT
t1.TimeStr AS StartTime,
t2.TimeStr AS EndTime,
TIMEDIFF(t2.TimeStr, t1.TimeStr) AS Duration
FROM temp t1
INNER JOIN temp t2 ON t2.RowNum = t1.RowNum + 1
WHERE
t2.Value = 0
AND t1.Value = 1

A quick and dirty way to do it would be this:
SELECT
T1.TimeStr AS StartTime,
(SELECT T2.TimeStr FROM temp AS T2 WHERE T2.RowNum = T1.RowNum+1) AS StopTime,
TIMEDIFF((SELECT T2.TimeStr FROM temp AS T2 WHERE T2.RowNum = T1.RowNum+1),
T1.TimeStr) AS Duration
FROM temp AS T1
WHERE Value = 1;
There are probably better ways to do this; running two subqueries per row will be slow.
You could do it in two steps:
CREATE TEMPORARY TABLE startstop AS
SELECT
T1.TimeStr AS StartTime,
(SELECT T2.TimeStr FROM temp AS T2 WHERE T2.RowNum = T1.RowNum+1) AS StopTime,
0 AS Duration
FROM temp AS T1
WHERE Value = 1;
UPDATE startstop SET Duration = TIMESTAMPDIFF(SECOND, StartTime, StopTime); -- duration in seconds
However I cannot test this in the Fiddle.
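Subtracting two DATETIME values with `-` in MySQL does not give seconds. In numeric context, each DATETIME becomes a number of the form YYYYMMDDhhmmss, so the difference is an arithmetic artifact rather than a duration. TIMESTAMPDIFF(SECOND, StartTime, StopTime) or TIMEDIFF() gives a real time span. The plain-Python sketch below mimics both calculations on two hypothetical times. `as_mysql_number` and `elapsed_seconds` are illustrative helpers, not MySQL functions.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def as_mysql_number(ts: str) -> int:
    """Mimic a DATETIME in MySQL numeric context: 'YYYY-MM-DD hh:mm:ss' -> YYYYMMDDhhmmss."""
    return int(datetime.strptime(ts, FMT).strftime("%Y%m%d%H%M%S"))

def elapsed_seconds(start: str, stop: str) -> int:
    """The value TIMESTAMPDIFF(SECOND, start, stop) is meant to return."""
    return int((datetime.strptime(stop, FMT) - datetime.strptime(start, FMT)).total_seconds())

start, stop = "2018-01-31 06:15:56", "2018-01-31 06:20:00"
print(as_mysql_number(stop) - as_mysql_number(start))  # 444: digit arithmetic, not a duration
print(elapsed_seconds(start, stop))                    # 244: actual seconds elapsed
```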

Related

MySQL self-join optimization while calculating move averages

I have created a MySQL query that calculates moving averages using multiple self-joins, as shown below. It takes a lot of time, with about 100k rows per query. Is there any way to optimize it further?
select a.rownum,a.ma_small_price, b.ma_medium_price from
(SELECT t3.rownum, AVG(t.last_price) as 'ma_small_price'
FROM temp_data t3
left JOIN temp_data t ON t.rownum BETWEEN ifnull(t3.rownum,0) - @psmall AND t3.rownum
GROUP BY t3.rownum) a
inner join
(SELECT t3.rownum, AVG(t.last_price) as 'ma_medium_price'
FROM temp_data t3
left JOIN temp_data t ON t.rownum BETWEEN ifnull(t3.rownum,0) - @pmedium AND t3.rownum
GROUP BY t3.rownum) b on a.rownum = b.rownum
OVER ( ... ) is disappointingly slow -- in both MySQL 8.0 and MariaDB 10.x.
I like "exponential moving average" as being easier to compute than "moving average". The following is roughly equivalent to what Nick proposed. This runs faster, but has slightly different results:
SELECT rownum,
@small := @small + 0.5 * (last_price - @small) AS mae_small_price,
@med := @med + 0.2 * (last_price - @med) AS mae_med_price
FROM ( SELECT @small := 10, @med := 10 ) AS init
JOIN temp_data
ORDER BY rownum;
The coefficient controls how fast the exponential moving average adapts to changes in the data. It should be greater than 0 and less than 1.
The "10" that I initialized the EMA to was a rough guess of the average. It biases the first few values but is gradually swamped as more values are folded in.
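The user-variable query applies the recurrence EMA = EMA + alpha * (price - EMA) row by row in rownum order. The plain-Python sketch below uses the answer's coefficients 0.5 and 0.2 and its seed of 10. The prices are hypothetical, and the function name `ema` is illustrative.

```python
def ema(prices, alpha, seed):
    """Exponential moving average: each step moves `alpha` of the way toward the new price."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    value = seed
    out = []
    for price in prices:
        value = value + alpha * (price - value)
        out.append(value)
    return out

prices = [12.0, 14.0, 10.0]  # hypothetical last_price values in rownum order
print(ema(prices, 0.5, 10))  # small:  [11.0, 12.5, 11.25]
print(ema(prices, 0.2, 10))  # medium: adapts more slowly to each new price
```

A larger alpha tracks recent prices more closely. Each step needs only the previous EMA, which is why this is cheaper than averaging a full window per row.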
Since you're running MySQL 8 you should be able to use window functions to get the same result more efficiently. Without seeing sample data it's hard to be 100% certain but this should be close. Note that to use variables in a window frame, you need to use a prepared statement:
SET @sql = '
SELECT rownum,
AVG(last_price) OVER (ORDER BY rownum ROWS BETWEEN ? PRECEDING AND CURRENT ROW) AS ma_small_price,
AVG(last_price) OVER (ORDER BY rownum ROWS BETWEEN ? PRECEDING AND CURRENT ROW) AS ma_medium_price
FROM temp_data';
PREPARE stmt FROM @sql;
EXECUTE stmt USING @psmall, @pmedium;
Demo on dbfiddle
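To see what `ROWS BETWEEN n PRECEDING AND CURRENT ROW` covers without a MySQL server, the sketch below runs the same window query through Python's `sqlite3` and assumes SQLite 3.25+. Instead of `?` placeholders in the frame clause, it formats the frame sizes into the SQL after forcing them to non-negative integers. This plays the role the prepared statement plays above. The prices and the function name are hypothetical.

```python
import sqlite3

def moving_averages(prices, small, medium):
    """Trailing averages over the current row plus `small` / `medium` preceding rows."""
    small, medium = int(small), int(medium)  # inlined into SQL, so force integers
    if small < 0 or medium < 0:
        raise ValueError("frame sizes must be non-negative")
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE temp_data (rownum INTEGER PRIMARY KEY, last_price REAL)")
    con.executemany("INSERT INTO temp_data VALUES (?, ?)", list(enumerate(prices, start=1)))
    sql = f"""
        SELECT rownum,
               AVG(last_price) OVER (ORDER BY rownum ROWS BETWEEN {small} PRECEDING AND CURRENT ROW),
               AVG(last_price) OVER (ORDER BY rownum ROWS BETWEEN {medium} PRECEDING AND CURRENT ROW)
        FROM temp_data
        ORDER BY rownum
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

print(moving_averages([1.0, 2.0, 3.0, 4.0], small=1, medium=2))
```

With `small=1`, each average covers two rows. That matches the self-join condition `t.rownum BETWEEN t3.rownum - @psmall AND t3.rownum` when rownum has no gaps.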

Getting previous row in MySQL

I'm stuck on a MySQL problem I haven't been able to solve yet. I have the following query, which returns the month-year and the number of new users for each period on my platform:
select
u.period ,
u.count_new as new_users
from
(select DATE_FORMAT(u.registration_date,'%Y-%m') as period, count(distinct u.id) as count_new from users u group by DATE_FORMAT(u.registration_date,'%Y-%m')) u
order by period desc;
The result is the table:
period,new_users
2016-10,103699
2016-09,149001
2016-08,169841
2016-07,150672
2016-06,148920
2016-05,160206
2016-04,147715
2016-03,173394
2016-02,157743
2016-01,173013
So, for each month-year, I need the difference between that period and the previous month-year. I need a result table like this:
period,new_users
2016-10,calculate(103699 - 149001)
2016-09,calculate(149001- 169841)
2016-08,calculate(169841- 150672)
2016-07,So on...
2016-06,...
2016-05,...
2016-04,...
2016-03,...
2016-02,...
2016-01,...
Any ideas? =/
Thanks
You should be able to use a similar approach as I posted in another S/O question. You are on a good track to start. You have your inner query get the counts and have it ordered in the final direction you need. By using inline mysql variables, you can have a holding column of the previous record's value, then use that as computation base for the next result, then set the variable to the new balance to be used for each subsequent cycle.
The JOIN to the SqlVars alias does not have any "ON" condition as the SqlVars would only return a single row anyhow and would not result in any Cartesian product.
select
u.period,
if( @prevCount = -1, 0, u.count_new - @prevCount ) as new_users,
@prevCount := u.count_new as HoldColumnForNextCycle
from
( select
DATE_FORMAT(u.registration_date,'%Y-%m') as period,
count(distinct u.id) as count_new
from
users u
group by
DATE_FORMAT(u.registration_date,'%Y-%m') ) u
JOIN ( select @prevCount := -1 ) as SqlVars
order by
u.period desc;
You may have to play with it a little, as there is no "starting" point in the counts, so the first entry in either sort direction may look strange. I start the @prevCount variable at -1, so the first record processed gets a new_users value of 0. Then, whatever the distinct new-user count was for that record, I assign it back to @prevCount as the basis for the next record processed. Yes, it is an extra column in the result set that can be ignored, but it is needed. Again, it is just a per-row placeholder, and you can see in the result how it gets its value as each row is processed.
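On MySQL 8.0 or later, the LAG() window function computes the same previous-row difference without user variables. This is a different technique from the answer above. In MySQL the grouped DATE_FORMAT subquery would feed LAG directly. The sketch below stands in a small `monthly` table, a hypothetical name, holding three months copied from the question's result. It runs through Python's `sqlite3` and assumes SQLite 3.25+.

```python
import sqlite3

# Counts copied from the question's result table (three most recent months).
COUNTS = [("2016-08", 169841), ("2016-09", 149001), ("2016-10", 103699)]

def month_over_month(counts):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE monthly (period TEXT PRIMARY KEY, count_new INTEGER)")
    con.executemany("INSERT INTO monthly VALUES (?, ?)", counts)
    # LAG looks one row back in ascending period order; the oldest month has no
    # predecessor, so its difference is NULL (None in Python).
    rows = con.execute("""
        SELECT period,
               count_new - LAG(count_new) OVER (ORDER BY period) AS diff
        FROM monthly
        ORDER BY period DESC
    """).fetchall()
    con.close()
    return rows

for row in month_over_month(COUNTS):
    print(row)
```

The window's ORDER BY is independent of the final ORDER BY, so the result can be listed newest-first while each difference is still "this month minus the previous month".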
I would create a temp table with two columns and then fill it using a cursor that
does something like this (I don't remember the exact syntax, so this is just pseudo-code):
@val = CURSOR.col2 - (select col2 from OriginalTable t2 where (t2.Period = (CURSOR.Period-1)))
INSERT tmpTable (Period, NewUsers) Values (CURSOR.Period, @val)

Find difference between sequential rows mysql, no row ID

Objective: get the difference between the value in a row and the value in the next row (I'm using MySQL). Say we have the table "events":
step              timestamp
Leave for store   1400000000
Buy hamburgers    1400000002
Big party         1400000005
So the result we'd expect is:
2
3
Complication 1: My table doesn't have an ID column, so I can't do this:
select (e2.timestamp - e1.timestamp)
from events e1, events e2
where (e1.id + 1) = e2.id
Complication 2: I'm using a database connection (Splunk) that won't allow me to create or alter temporary tables (otherwise I'd just add an id column). Am I hosed?
thank you!
Use a user variable to hold the timestamp from the previous row. Initializing it to NULL makes the first row's diff NULL instead of the full timestamp.
SELECT step, timestamp - @prevtime AS diff, @prevtime := timestamp
FROM events
CROSS JOIN (SELECT @prevtime := NULL) AS x
ORDER BY timestamp
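On a MySQL 8.0+ server, a read-only query using LEAD() needs neither an id column nor a temporary table. It orders by timestamp and reads the next row's value directly. This is an alternative to the user-variable answer. The sketch runs the same SQL through Python's `sqlite3` and assumes SQLite 3.25+. It uses the three events from the question, loaded into an in-memory table only so the query has data to read.

```python
import sqlite3

# The three events from the question; the table has no id column.
EVENTS = [
    ("Leave for store", 1400000000),
    ("Buy hamburgers", 1400000002),
    ("Big party", 1400000005),
]

def gaps_to_next(events):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (step TEXT, timestamp INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?)", events)
    # LEAD(timestamp) is the next row's timestamp; the last row has none (NULL).
    rows = con.execute("""
        SELECT diff FROM (
            SELECT timestamp,
                   LEAD(timestamp) OVER (ORDER BY timestamp) - timestamp AS diff
            FROM events
        ) AS d
        WHERE diff IS NOT NULL
        ORDER BY timestamp
    """).fetchall()
    con.close()
    return [r[0] for r in rows]

print(gaps_to_next(EVENTS))  # [2, 3]
```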

MySQL query index & performance improvements

I have created an application to track progress in League of Legends for me and my friends. For this purpose, I collect information about the current rank several times a day into my MySQL database. To fetch the results and show them in a graph, I use the following queries:
SELECT
lol_summoner.name as name, grid.series + ? as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
series_tmp grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + ?
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC
SELECT
lol_summoner.name as name, grid.series + ? as timestamp,
AVG(NULLIF(lol.points, 0)) as points
FROM
series_tmp grid
JOIN
lol ON lol.timestamp >= grid.series AND lol.timestamp < grid.series + ?
JOIN
lol_summoner ON lol.summoner = lol_summoner.id
WHERE
lol_summoner.name IN (". str_repeat('?, ', count($names) - 1) ."?)
GROUP BY
lol_summoner.name, grid.series
ORDER BY
name, timestamp ASC
The first query is used when I want to retrieve all players saved in the database. The grid table is a temporary table that generates timestamps at a specific interval, so the information is retrieved in chunks of that interval. The two ? placeholders in this query hold the interval. The second query is used when I want information for specific players only.
The grid table is produced by the following stored procedure, which is called with three parameters (n_first: first timestamp, n_last: last timestamp, n_increment: increment between two timestamps):
BEGIN
-- Create tmp table
DROP TEMPORARY TABLE IF EXISTS series_tmp;
CREATE TEMPORARY TABLE series_tmp (
series bigint
) engine = memory;
WHILE n_first <= n_last DO
-- Insert in tmp table
INSERT INTO series_tmp (series) VALUES (n_first);
-- Advance by n_increment
SET n_first = n_first + n_increment;
END WHILE;
END
The query works and finishes in reasonable time (~10 seconds) but I am thankful for any help to improve the query by either rewriting it or adding additional indexes to the database.
Edit:
After reviewing @Rick James's answer, I modified the queries as follows:
SELECT lol_summoner.name as name, (lol.timestamp div :range) * :range + :half_range as timestamp, AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
GROUP by lol_summoner.name, lol.timestamp div :range
ORDER by name, timestamp ASC
SELECT lol_summoner.name as name, (lol.timestamp div :range) * :range + :half_range as timestamp, AVG(NULLIF(lol.points, 0)) as points
FROM lol
JOIN lol_summoner ON lol.summoner = lol_summoner.id
WHERE lol_summoner.name IN (<NAMES>)
GROUP by lol_summoner.name, lol.timestamp div " . $steps . "
ORDER by name, timestamp ASC
This improved the query execution time considerably (it now finishes well under 1 s).
Problem 1 and Solution
You need a series of integers between two values? And they differ by 1? Or by some larger value?
First, create a permanent table of the numbers from 0 to some large enough value:
CREATE TABLE Num10 ( n INT );
INSERT INTO Num10 VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
CREATE TABLE Nums ( n INT, PRIMARY KEY(n))
SELECT a.n*1000 + b.n*100 + c.n*10 + d.n AS n
FROM Num10 AS a
JOIN Num10 AS b -- note "cross join"
JOIN Num10 AS c
JOIN Num10 AS d;
Now Nums has 0..9999. (Make it bigger if you might need more.)
To get a sequence of consecutive numbers from 123 through 234:
SELECT 123 + n FROM Nums WHERE n < 234-123+1;
To get a sequence of consecutive numbers from 12345 through 23456, in steps of 15:
SELECT 12345 + 15*n FROM Nums WHERE n < (23456-12345+1)/15;
JOIN to a SELECT like one of those instead of to series_tmp.
Barring other issues, that should significantly speed things up.
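The digit cross-join can be checked end to end with Python's `sqlite3`. One portability difference matters here. MySQL's `/` is decimal division, so `(23456-12345+1)/15` is 740.8. SQLite's `/` on integers truncates to 740, which would drop the last step. The sketch therefore divides by `15.0`. It also inserts into a pre-created `Nums` table, because SQLite's `CREATE TABLE ... AS SELECT` cannot declare a primary key.

```python
import sqlite3

def build_nums(con):
    """Create Num10 (0..9) and Nums (0..9999) by cross-joining four copies of Num10."""
    con.execute("CREATE TABLE Num10 (n INTEGER)")
    con.executemany("INSERT INTO Num10 VALUES (?)", [(d,) for d in range(10)])
    con.execute("CREATE TABLE Nums (n INTEGER PRIMARY KEY)")
    con.execute("""
        INSERT INTO Nums (n)
        SELECT a.n*1000 + b.n*100 + c.n*10 + d.n
        FROM Num10 AS a CROSS JOIN Num10 AS b CROSS JOIN Num10 AS c CROSS JOIN Num10 AS d
    """)

con = sqlite3.connect(":memory:")
build_nums(con)

# 123 through 234 inclusive.
consecutive = [r[0] for r in con.execute(
    "SELECT 123 + n FROM Nums WHERE n < 234-123+1 ORDER BY n")]
# 12345 through 23456 in steps of 15 (15.0 forces decimal division, as in MySQL).
stepped = [r[0] for r in con.execute(
    "SELECT 12345 + 15*n FROM Nums WHERE n < (23456-12345+1)/15.0 ORDER BY n")]

print(consecutive[0], consecutive[-1], len(consecutive))  # 123 234 112
print(stepped[0], stepped[-1], len(stepped))              # 12345 23445 741
```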
Problem 2
You are GROUPing BY series, but ORDERing by timestamp. They are related, so you might get the 'right' answer. But think about it.
Problem 3
You seem to be building "buckets" (called "series"?) from "timestamps". Is this correct? If so, let's work backwards: turn a "timestamp" into a "bucket" number:
bucket_number = FLOOR((timestamp - start) / bucket_size)
By doing that throughout, you can avoid 'Problem 1' and eliminate my solution to it. That is, reformulate the entire queries in terms of buckets.
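The bucket formula, combined with the question's `AVG(NULLIF(points, 0))`, can be sketched in plain Python to see which rows land together. The names, timestamps and points are hypothetical, and `bucket_averages` is an illustrative helper. Python's `//` floors, which matches MySQL's `DIV` only when `timestamp - start` is non-negative.

```python
from collections import defaultdict

def bucket_averages(rows, start, bucket_size):
    """Average non-zero points per (name, bucket); zeros are ignored like AVG(NULLIF(points, 0))."""
    groups = defaultdict(list)
    for name, ts, points in rows:
        bucket = (ts - start) // bucket_size  # same as DIV for ts >= start
        groups[(name, bucket)].append(points)
    result = {}
    for key, values in sorted(groups.items()):
        non_zero = [p for p in values if p != 0]
        # AVG over only NULLs is NULL in SQL, so an all-zero bucket maps to None.
        result[key] = sum(non_zero) / len(non_zero) if non_zero else None
    return result

rows = [("alice", 0, 10), ("alice", 30, 20), ("alice", 60, 0), ("alice", 90, 40), ("bob", 10, 0)]
print(bucket_averages(rows, start=0, bucket_size=60))
```

The bucket's display timestamp can then be derived from the bucket number, e.g. `start + bucket * bucket_size + bucket_size // 2` for the midpoint. That mirrors the `+ :half_range` in the edited queries.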

Updating sort keys after delete

I have a table with a field sort_id. This field holds the numbers 1 to n, which define the order of the records.
Now I want to delete some records and afterwards reorder the table. Therefore I need a query that finds the gaps and changes the sort_id field accordingly.
Sure, I could do something like this:
SELECT sort_id FROM table WHERE id = 5
Then save the sort_id and afterwards:
DELETE FROM table WHERE id = 5
UPDATE table SET sort_id = sort_id - 1 WHERE sort_id > {sort_id from above}
But I'd like to do the reordering process in one step.
Mladen and Arvo have good ideas, but unfortunately in MySQL you can't SELECT and UPDATE the same table in the same statement (even in a subquery). This is a known limitation of MySQL.
Here's a solution that uses MySQL user variables:
SET @i := 0;
UPDATE mytable
SET sort_id = (@i := @i + 1)
ORDER BY sort_id;
For what it's worth, I wouldn't bother doing this anyway. If your sort_id is used only for sorting and not as a kind of "row number", then the rows are still in sorted order after you delete the row where id=5. The values don't have to be consecutive for sorting.
For SQL Server 2005, this is how you get the new sequence:
SELECT UniqueId, row_number() over(order by sort_id) as RN
FROM table
To update the table, join that SELECT to your UPDATE:
update t1
set sort_id = t2.RN
FROM table t1
join (SELECT UniqueId, row_number() over(order by sort_id) as RN FROM table) t2
on t1.UniqueId = t2.UniqueId
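The same compute-then-apply idea can be sketched with Python's `sqlite3`, assuming SQLite 3.25+. ROW_NUMBER() computes the new positions in a SELECT. The updates are then applied by primary key, and the delete and renumbering run inside one transaction. The table name `items`, its columns, and the function name are hypothetical stand-ins for the question's table.

```python
import sqlite3

def delete_and_renumber(con, row_id):
    """Delete one row, then rewrite sort_id as 1..n in the existing sort order, atomically."""
    with con:  # commits on success, rolls back if anything raises
        con.execute("DELETE FROM items WHERE id = ?", (row_id,))
        new_order = con.execute(
            "SELECT ROW_NUMBER() OVER (ORDER BY sort_id) AS rn, id FROM items"
        ).fetchall()
        # Each (rn, id) pair fills the two placeholders: new sort_id, then key.
        con.executemany("UPDATE items SET sort_id = ? WHERE id = ?", new_order)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, sort_id INTEGER)")
con.executemany("INSERT INTO items VALUES (?, ?)", [(i, i) for i in range(1, 7)])
delete_and_renumber(con, 5)
print(con.execute("SELECT id, sort_id FROM items ORDER BY sort_id").fetchall())
```

If sort_id had a UNIQUE constraint, the row-by-row updates could collide midway, so this sketch assumes none.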
I don't know MySQL's syntax variations and cannot test the query live, but something like the following should at least give you an idea:
update table t1
set sort_id = (select count(*) from table t2 where t2.sort_id <= t1.sort_id)