Time interval calculation in time series using SQL - mysql

I have a MySQL table like this
CREATE TABLE IF NOT EXISTS `vals` (
`DT` datetime NOT NULL,
`value` INT(11) NOT NULL,
PRIMARY KEY (`DT`)
);
DT is a unique date and time.
Sample data:
INSERT INTO `vals` (`DT`,`value`) VALUES
('2011-02-05 06:05:00', 300),
('2011-02-05 11:05:00', 250),
('2011-02-05 14:35:00', 145),
('2011-02-05 16:45:00', 100),
('2011-02-05 18:50:00', 125),
('2011-02-05 19:25:00', 100),
('2011-02-05 21:10:00', 125),
('2011-02-06 00:30:00', 150);
I need to get something like this:
start|end|value
NULL,'2011-02-05 06:05:00',300
'2011-02-05 06:05:00','2011-02-05 11:05:00',250
'2011-02-05 11:05:00','2011-02-05 14:35:00',145
'2011-02-05 14:35:00','2011-02-05 16:45:00',100
'2011-02-05 16:45:00','2011-02-05 18:50:00',125
'2011-02-05 18:50:00','2011-02-05 19:25:00',100
'2011-02-05 19:25:00','2011-02-05 21:10:00',125
'2011-02-05 21:10:00','2011-02-06 00:30:00',150
'2011-02-06 00:30:00',NULL,NULL
I tried the following query:
SELECT T1.DT AS `start`,T2.DT AS `stop`, T2.value AS value FROM (
SELECT DT FROM vals
) T1
LEFT JOIN (
SELECT DT,value FROM vals
) T2
ON T2.DT > T1.DT ORDER BY T1.DT ASC
but it returns too many rows (29 instead of 9) and I could not find any way to limit this using SQL. Is it possible in MySQL?

Use a subquery
SELECT
(
select max(T1.DT)
from vals T1
where T1.DT < T2.DT
) AS `start`,
T2.DT AS `stop`,
T2.value AS value
FROM vals T2
ORDER BY T2.DT ASC
You can also use a MySQL-specific solution employing user variables:
SELECT CAST( @dt AS DATETIME ) AS `start` , @dt := DT AS `stop` , `value`
FROM (SELECT @dt := NULL) dt, vals
ORDER BY DT ASC
But you need to do it precisely:
The ORDER BY must be present, otherwise the variables don't roll forward properly.
The variable needs to be reset to NULL within the query, using a subquery to set it; otherwise, if you run the query twice in a row, the second run will not start with NULL.
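On MySQL 8.0 and later, which added window functions, the same start/stop pairs can probably be produced with LAG() and no user variables at all. Below is a minimal sketch of that idea. It runs the SQL through Python's standard sqlite3 module as a stand-in for a MySQL server (an assumption made only to keep the example self-contained; it needs SQLite 3.25 or newer), with the table and data from the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vals (DT TEXT PRIMARY KEY, value INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO vals (DT, value) VALUES (?, ?)",
    [
        ("2011-02-05 06:05:00", 300),
        ("2011-02-05 11:05:00", 250),
        ("2011-02-05 14:35:00", 145),
        ("2011-02-05 16:45:00", 100),
        ("2011-02-05 18:50:00", 125),
        ("2011-02-05 19:25:00", 100),
        ("2011-02-05 21:10:00", 125),
        ("2011-02-06 00:30:00", 150),
    ],
)

# LAG(DT) yields the DT of the previous row in DT order, or NULL for the first row.
query = """
SELECT LAG(DT) OVER (ORDER BY DT) AS start, DT AS stop, value
FROM vals
ORDER BY DT
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This returns one row per reading (8 rows for the sample data), with NULL as the start of the first interval. The trailing open-ended row from the desired output is not produced.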

You can use a server-side variable to simulate it:
select @myvar as `start`, DT as `stop`, value, @myvar := DT as next_rows_start
from vals
order by DT
In practice the variables are evaluated left to right, so the two references to @myvar (start and next_rows_start) produce two different values. Note that the MySQL manual does not guarantee this evaluation order, and assigning user variables inside a SELECT is deprecated in MySQL 8.0.
Just remember to reset @myvar to null before and/or after the query, otherwise the second and subsequent runs will have a wrong first row:
select @myvar := null

This would be easier if the table had a running ID column that follows the same order as the times in DT. If you don't want to change the table, you can copy it into a helper table:
drop table if exists temp;
CREATE TABLE temp (
`id` INT(11) AUTO_INCREMENT,
`DT` datetime NOT NULL,
`value` INT(11) NOT NULL,
PRIMARY KEY (`id`)
);
insert into temp (DT,value) select * from vals order by DT asc;
select t1.DT as `start`, t2.DT as `end`, t2.value
from temp t2
left join temp t1 ON t2.id = t1.id + 1;

Related

Efficient SQL query to find gap in consecutive numeric data (MySQL)

I have a table with column "time" (INT unsigned), every row represents one second and I need to find gaps in time (missing seconds).
I have tried with this query (to find the first time before a gap):
SELECT t1.time
FROM `table` AS t1
LEFT JOIN `table` AS t2 ON t2.time=(t1.time+1)
WHERE t2.time IS NULL
ORDER BY TIME ASC
LIMIT 1
And it works but it's too slow for big tables (near 100M rows)
Is there some faster solution?
EXPLAIN query:
SHOW CREATE:
CREATE TABLE `candles` (
`time` int(10) unsigned NOT NULL,
`open` float unsigned NOT NULL,
`high` float unsigned NOT NULL,
`low` float unsigned NOT NULL,
`close` float unsigned NOT NULL,
`vb` int(10) unsigned NOT NULL,
`vs` int(10) unsigned NOT NULL,
`trades` int(10) unsigned NOT NULL,
PRIMARY KEY (`time`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
If the database version is 8.0 or later, a recursive common table expression can be used, such as:
WITH RECURSIVE cte AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1 AS value
FROM cte
WHERE cte.n < (SELECT MAX(time) FROM tab )
)
SELECT n AS gaps
FROM cte
LEFT JOIN tab
ON n=time
WHERE cte.n > (SELECT MIN(time) FROM tab )
AND time IS NULL
In MySQL 5.7, this is a use case where user variables might be helpful:
select max(time)
from (
select t.time, @rn := @rn + 1 as rn
from (select time from mytable order by time) t
cross join (select @rn := 0) r
) t
group by time - rn
This addresses the question as a gaps-and-islands problem. The idea is to identify groups of records where time increments without gaps (the islands). For this, we assign an incrementing id to each row, ordered by time; whenever the difference between time and the auto-increment changes, you know there is a gap.
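To make the time - rn grouping concrete, here is a small sketch on hypothetical data. It uses ROW_NUMBER() in place of the user variable (so it corresponds to MySQL 8.0 or later rather than 5.7) and runs through Python's sqlite3 module as a stand-in for MySQL; both substitutions are assumptions made for the sake of a runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (time INTEGER PRIMARY KEY)")
# Hypothetical sample: seconds 1-3, 6-7 and 10, so gaps follow 3 and 7.
conn.executemany(
    "INSERT INTO mytable (time) VALUES (?)",
    [(t,) for t in (1, 2, 3, 6, 7, 10)],
)

# time - rn stays constant inside a run of consecutive seconds and
# jumps whenever a second is missing, so it identifies each island.
query = """
SELECT MIN(time) AS island_start, MAX(time) AS island_end
FROM (
    SELECT time, ROW_NUMBER() OVER (ORDER BY time) AS rn
    FROM mytable
) AS t
GROUP BY time - rn
ORDER BY island_start
"""
islands = conn.execute(query).fetchall()
print(islands)  # [(1, 3), (6, 7), (10, 10)]
```

Each island_end except the last one is a "time before a gap", which is what the question asks for.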
With MySQL 8, you can use LEAD():
select time from (
select time, lead(time, 1) over (order by time) next_time
from `table`
) t
where time+1 != next_time
In earlier versions, I might do something like:
select prev_time as time from (
select @prev_time+0 as prev_time,if(@prev_time:=time,time,time) as time
from (select @prev_time:=null) initvars
cross join (select time from `table` order by time) t
) t
where time != prev_time+1
Either will not include the greatest time, where your original query would have.
I think the group by required to treat it as a strict gaps and islands problem would be too expensive with that many records.
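The difference in the last row can be checked on a small hypothetical data set. The sketch below runs the LEAD() query and the original LEFT JOIN query side by side, using Python's sqlite3 module as a stand-in for MySQL 8 (an assumption; SQLite 3.25 or newer is needed for window functions). The table name candles is taken from the SHOW CREATE output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candles (time INTEGER PRIMARY KEY)")
# Hypothetical sample with gaps after 3 and after 7; 10 is the greatest time.
conn.executemany(
    "INSERT INTO candles (time) VALUES (?)",
    [(t,) for t in (1, 2, 3, 6, 7, 10)],
)

lead_query = """
SELECT time FROM (
    SELECT time, LEAD(time, 1) OVER (ORDER BY time) AS next_time
    FROM candles
) AS t
WHERE time + 1 != next_time
ORDER BY time
"""

join_query = """
SELECT t1.time
FROM candles AS t1
LEFT JOIN candles AS t2 ON t2.time = t1.time + 1
WHERE t2.time IS NULL
ORDER BY t1.time
"""

# For the last row next_time is NULL, so the comparison is NULL and the row is dropped.
lead_rows = [r[0] for r in conn.execute(lead_query)]
join_rows = [r[0] for r in conn.execute(join_query)]
print(lead_rows)  # [3, 7]
print(join_rows)  # [3, 7, 10]
```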

MySql CASE Execute Query Returns Operand should contain 1 columns

The following code returns Operand should contain 1 columns.
SELECT
CASE WHEN
(SELECT COUNT(1) FROM `student` WHERE `join_date` > '2017-03-21 09:00:00') > 0
THEN
(SELECT * FROM `student` WHERE `join_date` >= CAST(CAST('2017-03-21 09:00:00' AS DATE) AS DATETIME))
END
but the following works. Why?
SELECT
CASE WHEN
(SELECT COUNT(1) FROM `student` WHERE `join_date` > '2017-03-21 00:00:00') > 0
THEN
(SELECT `foo`)
ELSE
(SELECT `bar`)
END
What if I want to perform a check and execute one of two different queries depending on the result?
I want to achieve the following (this works fine in SQL):
IF (SELECT COUNT(*) FROM table WHERE term LIKE "term") > 4000
EXECUTE (SELECT * FROM table1)
ELSE
EXECUTE (SELECT * FROM table2)
A subquery used as a CASE result must be scalar, meaning it may return only one column and at most one row. If you force each subselect to return a single column of a single row, the first form works as well:
SELECT
CASE WHEN
(SELECT COUNT(1) FROM `student` WHERE `join_date` > '2017-03-21 00:00:00') > 0
THEN
(SELECT your_column FROM `student` ORDER BY your_column LIMIT 1)
ELSE
(SELECT your_column FROM `teacher` ORDER BY your_column LIMIT 1)
END
You should also add a proper ORDER BY on the column you need (named your_column in the sample) so that the first row is well defined.
You can select from both tables using UNION ALL and excluding conditions.
SELECT * FROM `student`
WHERE EXISTS (SELECT * FROM `student` WHERE `join_date` > '2017-03-21 00:00:00')
UNION ALL
SELECT * FROM `teacher`
WHERE NOT EXISTS (SELECT * FROM `student` WHERE `join_date` > '2017-03-21 00:00:00')
Note that the table schemas should be the same.
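As a quick check of how the two mutually exclusive conditions select one branch or the other, here is a sketch with a hypothetical two-column schema (name, join_date) for both tables. It runs through Python's sqlite3 module as a stand-in for MySQL, which is an assumption made to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; both tables share the same column layout, as UNION ALL requires.
conn.execute("CREATE TABLE student (name TEXT, join_date TEXT)")
conn.execute("CREATE TABLE teacher (name TEXT, join_date TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [("ann", "2017-03-20 08:00:00"), ("bob", "2017-03-22 10:00:00")],
)
conn.execute("INSERT INTO teacher VALUES ('carol', '2016-01-01 00:00:00')")

query = """
SELECT * FROM student
WHERE EXISTS (SELECT * FROM student WHERE join_date > '2017-03-21 00:00:00')
UNION ALL
SELECT * FROM teacher
WHERE NOT EXISTS (SELECT * FROM student WHERE join_date > '2017-03-21 00:00:00')
"""

# bob joined after the cutoff, so only the student branch returns rows.
with_recent = sorted(conn.execute(query).fetchall())

# Without any student after the cutoff, only the teacher branch returns rows.
conn.execute("DELETE FROM student WHERE name = 'bob'")
without_recent = conn.execute(query).fetchall()

print(with_recent)
print(without_recent)
```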

Select from an explicit table in mysql

I am trying to do a join on data that does not exist in my database, and never changes.
I want to do:
SELECT val, campaign FROM values
LEFT JOIN (SELECT campaign, start, end FROM (
('Spring 2104', '2014-05-01', '2014-08-01'),
('Winter 2014', '2014-08-01', '2014-12-31')
) as campaign_table ON (
values.date > campaign_table.start AND
values.date < campaign_table.end
)
Is that possible? I could create a temporary table, but for what I am trying to do that does not actually work.
You could use union all to create the dummy set. This is a viable solution considering there are only a handful of rows in your dummy dataset.
SELECT val
,campaign
FROM
`values`
LEFT JOIN (
SELECT 'Spring 2104' AS campaign
,'2014-05-01' AS `start`
,'2014-08-01' AS `end`
UNION ALL
SELECT 'Winter 2014'
,'2014-08-01'
,'2014-12-31'
) AS campaign_table ON
`values`.date > campaign_table.`start`
AND
`values`.date < campaign_table.`end`
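The UNION ALL approach can be tried end to end with a few hypothetical rows. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption for a self-contained example); the columns val and date come from the question, while the sample rows are made up. The table name is quoted with backticks because VALUES is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `values` (val INTEGER, date TEXT)")
# Hypothetical rows: one per campaign and one outside both date ranges.
conn.executemany(
    "INSERT INTO `values` (val, date) VALUES (?, ?)",
    [(1, "2014-06-15"), (2, "2014-10-01"), (3, "2015-01-01")],
)

# The derived table is built inline; column names come from the first SELECT.
query = """
SELECT v.val, c.campaign
FROM `values` AS v
LEFT JOIN (
    SELECT 'Spring 2104' AS campaign, '2014-05-01' AS start, '2014-08-01' AS `end`
    UNION ALL
    SELECT 'Winter 2014', '2014-08-01', '2014-12-31'
) AS c ON v.date > c.start AND v.date < c.`end`
ORDER BY v.val
"""
matched = conn.execute(query).fetchall()
print(matched)  # [(1, 'Spring 2104'), (2, 'Winter 2014'), (3, None)]
```

Because of the LEFT JOIN, rows that fall outside every campaign are kept with a NULL campaign.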
Maybe you need this, executing all the statements at once:
CREATE TABLE IF NOT EXISTS `tempo`( `campaign_name` VARCHAR(100), `start` DATE, `end` DATE );
INSERT INTO tempo(campaign_name, `start`, `end`) VALUES ('Spring 2104', '2014-05-01', '2014-08-01'),('Winter 2014', '2014-08-01', '2014-12-31');
SELECT t1.val, t2.campaign_name FROM `values` t1, `tempo` t2 WHERE t1.date BETWEEN t2.`start` AND t2.`end`;
DROP TABLE `tempo`;
You can also use CREATE TEMPORARY TABLE instead.

How many different ways are there to get the second row in a SQL search?

Let's say I was looking for the second-highest record.
Sample Table:
CREATE TABLE `my_table` (
`id` int(2) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`value` int(10),
PRIMARY KEY (`id`)
);
INSERT INTO `my_table` (`id`, `name`, `value`) VALUES (NULL, 'foo', '200'), (NULL, 'bar', '100'), (NULL, 'baz', '0'), (NULL, 'quux', '300');
The second highest value is foo. How many ways can you get this result?
The obvious example is:
SELECT name FROM my_table ORDER BY value DESC LIMIT 1 OFFSET 1;
Can you think of other examples?
I was trying this one, but LIMIT & IN/ALL/ANY/SOME subquery is not supported.
SELECT name FROM my_table WHERE value IN (
SELECT MIN(value) FROM my_table ORDER BY value DESC LIMIT 1
) LIMIT 1;
Eduardo's solution in standard SQL
select *
from (
select id,
name,
value,
row_number() over (order by value desc) as rn
from my_table t
) t
where rn = 2 -- change the constant to pick any other row
This works on any modern DBMS; in MySQL it requires version 8.0 or later, which added window functions. This solution is usually faster than solutions using sub-selects. It can also easily return the 3rd, 4th, ... row (again, this is achievable with Eduardo's solution as well).
It can also be adjusted to count by groups (adding a partition by) so the "greatest-n-per-group" problem can be solved with the same pattern.
Here is a SQLFiddle to play around with: http://sqlfiddle.com/#!12/286d0/1
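The per-group variant mentioned above might look like the following sketch. The grp column and the sample rows are hypothetical additions to my_table, and the SQL is run through Python's sqlite3 module as a stand-in for any DBMS with window functions (SQLite 3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# grp is a hypothetical grouping column that the original my_table does not have.
conn.execute("CREATE TABLE my_table (grp TEXT, name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO my_table (grp, name, value) VALUES (?, ?, ?)",
    [
        ("a", "foo", 200),
        ("a", "bar", 100),
        ("a", "quux", 300),
        ("b", "baz", 0),
        ("b", "qux", 50),
    ],
)

# PARTITION BY restarts the numbering for every group, so rn = 2 is the
# second-highest value within each grp.
query = """
SELECT grp, name, value
FROM (
    SELECT grp, name, value,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY value DESC) AS rn
    FROM my_table
) AS t
WHERE rn = 2
ORDER BY grp
"""
second_per_group = conn.execute(query).fetchall()
print(second_per_group)  # [('a', 'foo', 200), ('b', 'baz', 0)]
```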
This only works for exactly the second highest:
SELECT * FROM my_table two
WHERE EXISTS (
SELECT * FROM my_table one
WHERE one.value > two.value
AND NOT EXISTS (
SELECT * FROM my_table zero
WHERE zero.value > one.value
)
)
LIMIT 1
;
This one emulates a window function rank() for platforms that don't have them. It can also be adapted for ranks <> 2 by altering one constant:
SELECT one.*
-- , 1+COALESCE(agg.rnk,0) AS rnk
FROM my_table one
LEFT JOIN (
SELECT one.id , COUNT(*) AS rnk
FROM my_table one
JOIN my_table cnt ON cnt.value > one.value
GROUP BY one.id
) agg ON agg.id = one.id
WHERE agg.rnk=1 -- the aggregate starts counting at zero
;
Both solutions need functional self-joins (I don't know if mysql allows them, IIRC it only disallows them if the table is the target for updates or deletes)
The below one does not need window functions, but uses a recursive query to enumerate the rankings:
WITH RECURSIVE agg AS (
SELECT one.id
, one.value
, 1 AS rnk
FROM my_table one
WHERE NOT EXISTS (
SELECT * FROM my_table zero
WHERE zero.value > one.value
)
UNION ALL
SELECT two.id
, two.value
, agg.rnk+1 AS rnk
FROM my_table two
JOIN agg ON two.value < agg.value
WHERE NOT EXISTS (
SELECT * FROM my_table nx
WHERE nx.value > two.value
AND nx.value < agg.value
)
)
SELECT * FROM agg
WHERE rnk = 2
;
(the recursive query will not work in MySQL versions before 8.0, which lack recursive CTEs)
You can use inline initialization like this:
select * from (
select id,
name,
value,
@curRank := @curRank + 1 AS `rank`
from my_table t, (SELECT @curRank := 0) r
order by value desc
) tb
where tb.`rank` = 2
SELECT name
FROM my_table
WHERE value < (SELECT max(value) FROM my_table)
ORDER BY value DESC
LIMIT 1
SELECT name
FROM my_table
WHERE value = (
SELECT min(r.value)
FROM (
SELECT name, value
FROM my_table
ORDER BY value DESC
LIMIT 2
) r
)
LIMIT 1

MySQL Group By get title column with smallest date value

I have a query like:
SELECT *
FROM table
GROUP BY sid
ORDER BY datestart desc
LIMIT 10
which returns the last 10 sid groups.
For each of these groups, I need the title column of the row with the lowest datestart value
I tried using
SELECT *, min(datestart)
but that didn't return the row with the smallest datestart value, just the lowest datestart. I need the title from the lowest datestart.
(Relevant) Table Structure:
CREATE TABLE `table` (
`title` varchar(1000) NOT NULL,
`datestart` timestamp NOT NULL default CURRENT_TIMESTAMP,
`sid` bigint(12) unsigned NOT NULL,
KEY `datestart` (`datestart`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Any ideas?
Updated answer
select t1.* from `table` as t1
inner join (
select sid,min(datestart) as elder
from `table`
group by sid
order by elder desc limit 10) as t2
on t1.sid = t2.sid and t1.datestart = t2.elder
Use a composite index on (sid,datestart)
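To see the join against the per-sid minimum in action, here is a sketch with hypothetical rows. It runs through Python's sqlite3 module as a stand-in for MySQL (an assumption for a self-contained example); the table name is quoted because TABLE is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE `table` (title TEXT, datestart TEXT, sid INTEGER)")
# Hypothetical rows: two sid groups with two rows each.
conn.executemany(
    "INSERT INTO `table` (title, datestart, sid) VALUES (?, ?, ?)",
    [
        ("first A", "2020-01-01 00:00:00", 1),
        ("second A", "2020-02-01 00:00:00", 1),
        ("first B", "2020-03-01 00:00:00", 2),
        ("second B", "2020-04-01 00:00:00", 2),
    ],
)

# The derived table finds the earliest datestart per sid; joining back on
# (sid, datestart) recovers the full row, including its title.
query = """
SELECT t1.sid, t1.title, t1.datestart
FROM `table` AS t1
INNER JOIN (
    SELECT sid, MIN(datestart) AS elder
    FROM `table`
    GROUP BY sid
    ORDER BY elder DESC
    LIMIT 10
) AS t2 ON t1.sid = t2.sid AND t1.datestart = t2.elder
ORDER BY t1.datestart DESC
"""
earliest = conn.execute(query).fetchall()
for row in earliest:
    print(row)
```

If two rows of the same sid share the minimum datestart, both are returned, so a tie-breaker would be needed in that case.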
Try this query. It is an anti-join: a row is kept only when no other row with the same sid satisfies the join condition. As written, it returns the row with the latest datestart for each sid; to get the row with the lowest datestart, change Table_2.datestart > Table_1.datestart to Table_2.datestart < Table_1.datestart. The column references must be qualified with Table_1, otherwise title and datestart are ambiguous.
SELECT Table_1.title, Table_1.datestart
FROM `table` AS Table_1
LEFT JOIN `table` AS Table_2 ON (Table_2.sid = Table_1.sid AND Table_2.datestart > Table_1.datestart)
WHERE Table_2.sid IS NULL;