Finding the last value at each 15-minute interval in a dataset - MySQL

ID Timestamp Value
1 11:59:54 10
1 12:04:00 20
1 12:12:00 31
1 12:16:00 10
1 12:48:00 05
I want the result set to include every quarter-hour mark (12:00, 12:15, ...) within the data range, each carrying the last value recorded before it:
ID Timestamp Value
1 11:59:54 10
1 12:00:00 10
1 12:04:00 20
1 12:12:00 31
1 12:15:00 31
1 12:16:00 10
1 12:30:00 10
1 12:45:00 10
1 12:48:00 05

More coffee will probably lead to a simpler solution, but consider the following...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,timestamp TIME
,value INT NOT NULL
);
INSERT INTO my_table VALUES
(1 ,'11:59:54',10),
(2 ,'12:04:00',20),
(3 ,'12:12:00',31),
(4 ,'12:16:00',10),
(5 ,'12:48:00',05);
... in addition, I have a table of integers (0 to 9) that looks like this:
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
So...
SELECT a.timestamp
, b.value
FROM
( SELECT x.*
, MAX(y.timestamp) max_timestamp
FROM
( SELECT timestamp
FROM my_table
UNION
SELECT SEC_TO_TIME((i2.i*10+i1.i)*900)
FROM ints i1
, ints i2
WHERE SEC_TO_TIME((i2.i*10+i1.i)*900)
BETWEEN (SELECT MIN(timestamp) FROM my_table)
AND (SELECT MAX(timestamp) FROM my_table)
) x
LEFT
JOIN my_table y
ON y.timestamp <= x.timestamp
GROUP
BY x.timestamp
) a
JOIN my_table b
ON b.timestamp = a.max_timestamp
ORDER
BY a.timestamp;
+-----------+-------+
| timestamp | value |
+-----------+-------+
| 11:59:54 | 10 |
| 12:00:00 | 10 |
| 12:04:00 | 20 |
| 12:12:00 | 31 |
| 12:15:00 | 31 |
| 12:16:00 | 10 |
| 12:30:00 | 10 |
| 12:45:00 | 10 |
| 12:48:00 | 5 |
+-----------+-------+

The idea is as follows (this answer is for SAP HANA rather than MySQL). Use SERIES_GENERATE_TIME() to generate the missing timestamps at 15-minute intervals and union them with the existing data in your table T. You would then want LAST_VALUE with IGNORE NULLS, but IGNORE NULLS is not implemented in HANA, so you need a workaround. I use COUNT() as a window function to count the non-null values up to each row. I do the same on the original data and then join both on that count. This way, each generated row repeats the last non-null value.
select X.ID, X.TIME, Y.VALUE from (
select ID, TIME, value,
count(VALUE) over (order by TIME rows between unbounded preceding and current row) as CNT
from (
--add the missing 15 minute interval timestamps
select 1 as ID, GENERATED_PERIOD_START as TIME, NULL as VALUE
from SERIES_GENERATE_TIME('INTERVAL 15 MINUTE', '12:00:00', '13:00:00')
union all
select ID, TIME, VALUE from T
)
) as X join (
select ID, TIME, value,
count(value) over (order by TIME rows between unbounded preceding and current row) as CNT
from T
) as Y on X.CNT = Y.CNT
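The running-count workaround above is not HANA-specific. Below is a minimal sketch of it using Python's built-in sqlite3 module, so it runs without a database server (SQLite 3.25+ is assumed for window functions). The table name readings, its ts/value columns, and the SQLite-only time(sec, 'unixepoch') call are illustrative assumptions; in MySQL 8.0 you would generate the quarter-hour marks with SEC_TO_TIME instead.

```python
import sqlite3

# Sample data from the question (a single ID, so no PARTITION BY is needed).
READINGS = [("11:59:54", 10), ("12:04:00", 20), ("12:12:00", 31),
            ("12:16:00", 10), ("12:48:00", 5)]

FILL_QUERY = """
WITH RECURSIVE ticks(sec) AS (          -- every 15 minutes of the day, in seconds
    SELECT 0
    UNION ALL
    SELECT sec + 900 FROM ticks WHERE sec + 900 < 86400
),
grid AS (                               -- quarter-hour marks inside the data range
    SELECT time(sec, 'unixepoch') AS ts, NULL AS value
    FROM ticks
    WHERE time(sec, 'unixepoch') BETWEEN (SELECT MIN(ts) FROM readings)
                                     AND (SELECT MAX(ts) FROM readings)
      AND time(sec, 'unixepoch') NOT IN (SELECT ts FROM readings)
),
x AS (                                  -- real + NULL rows, running count of non-NULLs
    SELECT ts, COUNT(value) OVER (ORDER BY ts
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt
    FROM (SELECT ts, value FROM readings
          UNION ALL
          SELECT ts, value FROM grid) AS combined
),
y AS (                                  -- real rows only, so cnt runs 1..n
    SELECT ts, value, COUNT(value) OVER (ORDER BY ts
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cnt
    FROM readings
)
SELECT x.ts, y.value
FROM x JOIN y ON x.cnt = y.cnt          -- each NULL row picks up the last real value
ORDER BY x.ts
"""


def fill_quarter_hours(rows):
    """Return (ts, value) pairs where quarter-hour marks carry the last value."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE readings (ts TEXT PRIMARY KEY, value INTEGER NOT NULL)")
        con.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        return con.execute(FILL_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for ts, value in fill_quarter_hours(READINGS):
        print(ts, value)
```

Each generated mark has the same running count as the last real row before it, which is what makes the equi-join on cnt act like LAST_VALUE ... IGNORE NULLS.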

Related

Find rows with incorrect dates in historical data

I have a table that is a historical log. I recently fixed a bug that was writing incorrect dates to it: the dates should be sequential, but in some rows the date is out of sequence, much older than the previous date.
How can I get all the out-of-sequence rows for each entity_id? In the example below, I should get rows 5 and 10.
The table has millions of rows and thousands of different entities. I was thinking of comparing the results of ordering by date and by id, but that is a lot of manual work.
| id | entity_id | time_stamp | note |
|----|-----------|------------|-------|
| 1 | 7 | 2019-01-22 | |
| 2 | 9 | 2019-01-05 | |
| 3 | 6 | 2019-03-14 | |
| 4 | 9 | 2019-04-20 | |
| 5 | 6 | 2015-10-04 | WRONG |
| 6 | 9 | 2019-07-15 | |
| 7 | 3 | 2019-07-04 | |
| 8 | 7 | 2019-06-01 | |
| 9 | 6 | 2019-11-04 | |
| 10 | 7 | 2019-03-04 | WRONG |
Is there a function to compare each date with the previous date for the same entity_id? I'm completely lost here and not sure how to clean the data. The database is MySQL, by the way.
If you are running MySQL 8.0, you can use lag(); the idea is to order records by id within groups having the same entity_id, and then to filter on records where the current timestamp is smaller than the previous one:
select t.*
from (
select t.*, lag(time_stamp) over(partition by entity_id order by id) lag_time_stamp
from mytable t
) t
where time_stamp < lag_time_stamp
In earlier versions, one option is to use a correlated subquery to get the previous timestamp:
select t.*
from mytable t
where time_stamp < (
select time_stamp
from mytable t1
where t1.entity_id = t.entity_id and t1.id < t.id
order by id desc
limit 1
)
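Both queries above can be checked against the sample rows without a MySQL server. This is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25+ is assumed for LAG); the table name mytable comes from the answer, and the queries only select id so the result is easy to compare.

```python
import sqlite3

# Sample rows from the question: (id, entity_id, time_stamp).
LOG_ROWS = [(1, 7, "2019-01-22"), (2, 9, "2019-01-05"), (3, 6, "2019-03-14"),
            (4, 9, "2019-04-20"), (5, 6, "2015-10-04"), (6, 9, "2019-07-15"),
            (7, 3, "2019-07-04"), (8, 7, "2019-06-01"), (9, 6, "2019-11-04"),
            (10, 7, "2019-03-04")]

# Window-function version: compare each row with the previous row of its entity.
LAG_QUERY = """
SELECT t.id
FROM (
    SELECT t.*, LAG(time_stamp) OVER (PARTITION BY entity_id ORDER BY id) AS lag_time_stamp
    FROM mytable t
) t
WHERE time_stamp < lag_time_stamp
ORDER BY t.id
"""

# Correlated-subquery version for engines without window functions.
SUBQUERY_QUERY = """
SELECT t.id
FROM mytable t
WHERE time_stamp < (
    SELECT time_stamp
    FROM mytable t1
    WHERE t1.entity_id = t.entity_id AND t1.id < t.id
    ORDER BY id DESC
    LIMIT 1
)
ORDER BY t.id
"""


def wrong_ids(query):
    """Run one of the queries above against the sample data and return the ids."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, "
                    "entity_id INTEGER, time_stamp TEXT)")
        con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", LOG_ROWS)
        return [row[0] for row in con.execute(query)]
    finally:
        con.close()


if __name__ == "__main__":
    print(wrong_ids(LAG_QUERY), wrong_ids(SUBQUERY_QUERY))
```

The first row of each entity has no predecessor, so the comparison is against NULL and the row is filtered out in both versions.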
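Alternatively, flag each row that has an earlier row (smaller id) for the same entity with a later date:
SELECT s1.*
FROM sourcetable s1
WHERE EXISTS ( SELECT NULL
FROM sourcetable s2
WHERE s2.id < s1.id
AND s1.entity_id = s2.entity_id
AND s2.time_stamp > s1.time_stamp )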
An index on (entity_id, id, time_stamp) or (entity_id, time_stamp, id) will improve performance.

How to get lowest value from table

Problem 1
How can I get the lowest non-null value in the table for each ID_CAR? For example, for ID_CAR 1 the lowest value is 50, for ID_CAR 2 it is 50, and for ID_CAR 3 it is 300. I don't want duplicates: I need only one value per car.
ID_CAR | col_1 | col_2 | col_3 | col_4 | col_5 | col_6
1 | null | 250 | 300 | null | 900 | null
2 | 100 | null | 300 | 600 | 200 | 100
1 | 300 | 100 | 800 | 100 | 50 | 900
3 | 300 | 4000 | null | null | null | null
2 | null | null | null | 50 | null | 100
4 | 400 | 900 | 500 | 700 | 800 | 500
Problem 2
In this example, the values in col_* are days. I need to add those days to col_date and get the lowest resulting date. For example, the lowest date for ID_CAR 1 is 2018-01-03 (col_2) and for ID_CAR 2 it is 2018-01-14 (col_1 of the row dated 2018-01-13).
ID_CAR | col_1 | col_2 | col_3 | col_4 | col_5 | col_6 | col_date
1 | null | 2 | 3 | null | 5 | null | 2018-01-01
2 | 1 | null | 3 | 6 | 10 | 10 | 2018-01-13
1 | 3 | 20 | 80 | 10 | 50 | 90 | 2018-01-02
3 | 30 | 40 | null | null | null | null | 2018-01-03
2 | null | null | null | 5 | null | 10 | 2018-01-10
4 | 10 | 9 | 5 | 70 | 8 | 50 | 2018-01-07
Without UNION, you can simply combine the LEAST and MIN functions:
select
ID_CAR,min(least(col_1,col_2,col_3,col_4,col_5,col_6)) lowest_value
from
table
group by
ID_CAR
If you have NULL values, you need the IFNULL or COALESCE function, because in MySQL LEAST returns NULL when any argument is NULL:
select
ID_CAR,
min(least(
ifnull(col_1,~0),
ifnull(col_2,~0),
ifnull(col_3,~0),
ifnull(col_4,~0),
ifnull(col_5,~0),
ifnull(col_6,~0)
)) as lowest_value
from
table
group by
ID_CAR
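~0 is the maximum BIGINT UNSIGNED value in MySQL (18446744073709551615).
The opposite of LEAST is GREATEST.
The opposite of MIN is MAX ;-)
LEAST and GREATEST also exist in Oracle, PostgreSQL, and Hive, but using ~0 as the maximum value is MySQL-specific (and PostgreSQL's LEAST ignores NULLs instead of returning NULL).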
For Problem 2, something like this:
select
ID_CAR,
min(least(
-- a NULL day count becomes a far-future date, so LEAST never picks it
coalesce(DATE_ADD(col_date, INTERVAL col_1 DAY), DATE '9999-12-31'),
coalesce(DATE_ADD(col_date, INTERVAL col_2 DAY), DATE '9999-12-31'),
coalesce(DATE_ADD(col_date, INTERVAL col_3 DAY), DATE '9999-12-31'),
coalesce(DATE_ADD(col_date, INTERVAL col_4 DAY), DATE '9999-12-31'),
coalesce(DATE_ADD(col_date, INTERVAL col_5 DAY), DATE '9999-12-31'),
coalesce(DATE_ADD(col_date, INTERVAL col_6 DAY), DATE '9999-12-31')
)) as lowest_date
from
table
group by
ID_CAR
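or like this, provided not all columns can be NULL and every row of an ID_CAR has the same col_date (col_date is neither grouped nor aggregated, so this form is also rejected when ONLY_FULL_GROUP_BY is enabled):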
select
ID_CAR,
DATE_ADD(col_date, INTERVAL min(least(
ifnull(col_1,~0),
ifnull(col_2,~0),
ifnull(col_3,~0),
ifnull(col_4,~0),
ifnull(col_5,~0),
ifnull(col_6,~0)
)) DAY) as lowest_date
from
table
group by
ID_CAR
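Because each row has its own col_date, another option is to unpivot the day columns, skip the NULL cells, and let MIN pick the earliest computed date. This is a minimal sketch using Python's built-in sqlite3 module; the table name cars is a hypothetical stand-in (table is a reserved word), and SQLite's date(col_date, '+N days') plays the role of MySQL's DATE_ADD(col_date, INTERVAL N DAY).

```python
import sqlite3

# Problem 2 sample rows: (id_car, col_1..col_6, col_date); None means NULL.
PROBLEM2_ROWS = [
    (1, None, 2, 3, None, 5, None, "2018-01-01"),
    (2, 1, None, 3, 6, 10, 10, "2018-01-13"),
    (1, 3, 20, 80, 10, 50, 90, "2018-01-02"),
    (3, 30, 40, None, None, None, None, "2018-01-03"),
    (2, None, None, None, 5, None, 10, "2018-01-10"),
    (4, 10, 9, 5, 70, 8, 50, "2018-01-07"),
]

LOWEST_DATE_QUERY = """
SELECT id_car, MIN(date(col_date, '+' || days || ' days')) AS lowest_date
FROM (
    SELECT id_car, col_date, col_1 AS days FROM cars
    UNION ALL SELECT id_car, col_date, col_2 FROM cars
    UNION ALL SELECT id_car, col_date, col_3 FROM cars
    UNION ALL SELECT id_car, col_date, col_4 FROM cars
    UNION ALL SELECT id_car, col_date, col_5 FROM cars
    UNION ALL SELECT id_car, col_date, col_6 FROM cars
) AS unpivoted
WHERE days IS NOT NULL        -- skip empty cells instead of treating them as 0 days
GROUP BY id_car
ORDER BY id_car
"""


def lowest_dates(rows):
    """Map each id_car to its earliest col_date + col_N (as 'YYYY-MM-DD')."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE cars (id_car INTEGER, col_1 INTEGER, col_2 INTEGER, "
                    "col_3 INTEGER, col_4 INTEGER, col_5 INTEGER, col_6 INTEGER, "
                    "col_date TEXT)")
        con.executemany("INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
        return dict(con.execute(LOWEST_DATE_QUERY).fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    print(lowest_dates(PROBLEM2_ROWS))
```

Filtering the NULL cells out before the date arithmetic avoids needing a sentinel value at all, and the per-row col_date is applied before aggregation.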
You can use UNION:
select id_car, min(val) as lowest_value
from (select id_car, col_1 as Val
from table union
select id_car, col_2
from table
. . .
) t
group by id_car;
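A complete, runnable version of the UNION idea is sketched below with Python's built-in sqlite3 module; the table name cars is a hypothetical stand-in for table, and all six columns are spelled out. MIN simply ignores the NULL cells, so no sentinel value is needed.

```python
import sqlite3

# Problem 1 sample rows: (id_car, col_1..col_6); None means NULL.
PROBLEM1_ROWS = [
    (1, None, 250, 300, None, 900, None),
    (2, 100, None, 300, 600, 200, 100),
    (1, 300, 100, 800, 100, 50, 900),
    (3, 300, 4000, None, None, None, None),
    (2, None, None, None, 50, None, 100),
    (4, 400, 900, 500, 700, 800, 500),
]

LOWEST_VALUE_QUERY = """
SELECT id_car, MIN(val) AS lowest_value   -- MIN skips the NULL cells
FROM (
    SELECT id_car, col_1 AS val FROM cars
    UNION SELECT id_car, col_2 FROM cars
    UNION SELECT id_car, col_3 FROM cars
    UNION SELECT id_car, col_4 FROM cars
    UNION SELECT id_car, col_5 FROM cars
    UNION SELECT id_car, col_6 FROM cars
) AS t
GROUP BY id_car
ORDER BY id_car
"""


def lowest_values(rows):
    """Map each id_car to the smallest non-NULL value across col_1..col_6."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE cars (id_car INTEGER, col_1 INTEGER, col_2 INTEGER, "
                    "col_3 INTEGER, col_4 INTEGER, col_5 INTEGER, col_6 INTEGER)")
        con.executemany("INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
        return dict(con.execute(LOWEST_VALUE_QUERY).fetchall())
    finally:
        con.close()


if __name__ == "__main__":
    print(lowest_values(PROBLEM1_ROWS))
```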
The following query will give you the required result
select tab.ID_CAR, min(tab.val) as lowest_value from
(
(select ID_CAR,min(col_1) val
from table
group by ID_CAR)
union
(select ID_CAR,min(col_2) val
from table
group by ID_CAR)
union
(select ID_CAR,min(col_3) val
from table
group by ID_CAR)
union
(select ID_CAR,min(col_4) val
from table
group by ID_CAR)
union
(select ID_CAR,min(col_5) val
from table
group by ID_CAR)
union
(select ID_CAR,min(col_6) val
from table
group by ID_CAR)
) tab
group by tab.ID_CAR
Try this. If you expect values greater than 9999999999999999999, use a higher sentinel value:
select id_car,
min(least(coalesce(col_1,9999999999999999999),coalesce(col_2,9999999999999999999),coalesce(col_3,9999999999999999999),
coalesce(col_4,9999999999999999999),coalesce(col_5,9999999999999999999),coalesce(col_6,9999999999999999999)
)
) as min_val
from your_table
group by id_car
The naive approach would be using least:
SELECT ID_CAR, least(t.col_1, t.col_2, t.col_3, t.col_4, t.col_5, t.col_6)
FROM
(SELECT ID_CAR, min(col_1) as col_1, min(col_2) as col_2, min(col_3) as col_3, min(col_4) as col_4, min(col_5) as col_5, min(col_6) as col_6
FROM YOUR_TABLE GROUP BY ID_CAR) t;
However, if any argument to LEAST is NULL, LEAST returns NULL. So you need to convert the NULLs to a high value (a hack, but it works in practice; see the other answers for this).
Which means doing something like this:
SELECT ID_CAR, LEAST(col_1, col_2, col_3,
col_4, col_5, col_6) as l
FROM
(SELECT ID_CAR,
IFNULL(min(col_1), 9999) as col_1,
IFNULL(min(col_2), 9999) as col_2,
IFNULL(min(col_3), 9999) as col_3,
IFNULL(min(col_4), 9999) as col_4,
IFNULL(min(col_5), 9999) as col_5,
IFNULL(min(col_6), 9999) as col_6
FROM YOUR_TABLE GROUP BY ID_CAR) t;
However, it might be better to use a trick to convert your table
into a three-column table of the form:
car_id | attr | value
1 1 NULL ; or use strings such as "size"
1 2 250

How to select a row before a specific row on MySQL, if the table is ordered by date?

I have a select result like this:
ID | DATE
----------------
10 | 2014-07-23
7 | 2014-07-24
8 | 2014-07-24
9 | 2014-07-24
1 | 2014-07-25
2 | 2014-07-25
6 | 2014-07-25
3 | 2014-07-26
4 | 2014-07-27
5 | 2014-07-28
The result above is ordered by date. Now I want to select the row immediately before:
2 | 2014-07-25
which is:
1 | 2014-07-25
I don't know the exact ID in advance, and the same condition must also work if I want the row before:
3 | 2014-07-26
which is:
6 | 2014-07-25
What condition should I use?
UPDATE
Tried this:
SET @rank=0;
SELECT @rank:=@rank+1 AS rank, t1.*
FROM table t1
Then I got this:
RANK | ID | DATE
----------------
1 | 10 | 2014-07-23
2 | 7 | 2014-07-24
3 | 8 | 2014-07-24
4 | 9 | 2014-07-24
5 | 1 | 2014-07-25
6 | 2 | 2014-07-25
7 | 6 | 2014-07-25
8 | 3 | 2014-07-26
9 | 4 | 2014-07-27
10 | 5 | 2014-07-28
Then I tried this:
SET @rank=0;
SELECT @rank:=@rank+1 AS rank, t1.*
FROM table t1
WHERE rank < 3;
I got this error: Unknown column 'rank' in 'where clause'.
Here's one way...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(ID INT NOT NULL PRIMARY KEY
,DATE DATE NOT NULL
);
INSERT INTO my_table VALUES
(10 ,'2014-07-23'),
(7 ,'2014-07-24'),
(8 ,'2014-07-24'),
(9 ,'2014-07-24'),
(1 ,'2014-07-25'),
(2 ,'2014-07-25'),
(6 ,'2014-07-25'),
(3 ,'2014-07-26'),
(4 ,'2014-07-27'),
(5 ,'2014-07-28');
SELECT a.id
, a.date
, b.id b_id
, b.date b_date
FROM
( SELECT x.*
, COUNT(*) rnk
FROM my_table x
JOIN my_table y
ON (y.date < x.date)
OR (y.date = x.date AND y.id <= x.id)
GROUP
BY x.date
, x.id
) a
LEFT
JOIN
( SELECT x.*
, COUNT(*) rnk
FROM my_table x
JOIN my_table y
ON (y.date < x.date)
OR (y.date = x.date AND y.id <= x.id)
GROUP
BY x.date
, x.id
) b
ON b.rnk = a.rnk - 1
ORDER
BY a.rnk;
+----+------------+------+------------+
| id | date | b_id | b_date |
+----+------------+------+------------+
| 10 | 2014-07-23 | NULL | NULL |
| 7 | 2014-07-24 | 10 | 2014-07-23 |
| 8 | 2014-07-24 | 7 | 2014-07-24 |
| 9 | 2014-07-24 | 8 | 2014-07-24 |
| 1 | 2014-07-25 | 9 | 2014-07-24 |
| 2 | 2014-07-25 | 1 | 2014-07-25 |
| 6 | 2014-07-25 | 2 | 2014-07-25 |
| 3 | 2014-07-26 | 6 | 2014-07-25 |
| 4 | 2014-07-27 | 3 | 2014-07-26 |
| 5 | 2014-07-28 | 4 | 2014-07-27 |
+----+------------+------+------------+
... but you can also do this (quicker) with variables.
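The self-join ranking can be tried out without a MySQL server. This minimal sketch uses Python's built-in sqlite3 module and a CTE instead of repeating the derived table (MySQL 8.0 also supports CTEs); the alias rnk is used because RANK is a reserved word in MySQL 8.0, and previous_id is a hypothetical helper name.

```python
import sqlite3

# Rows from the question: (id, date).
ORDERED_ROWS = [(10, "2014-07-23"), (7, "2014-07-24"), (8, "2014-07-24"),
                (9, "2014-07-24"), (1, "2014-07-25"), (2, "2014-07-25"),
                (6, "2014-07-25"), (3, "2014-07-26"), (4, "2014-07-27"),
                (5, "2014-07-28")]

# rnk = position of a row in (date, id) order, found by counting the rows that
# sort at or before it; the previous row is the one whose rnk is one less.
PREV_QUERY = """
WITH ranked AS (
    SELECT x.id, x.date, COUNT(*) AS rnk
    FROM my_table x
    JOIN my_table y
      ON (y.date < x.date) OR (y.date = x.date AND y.id <= x.id)
    GROUP BY x.date, x.id
)
SELECT b.id
FROM ranked a
LEFT JOIN ranked b ON b.rnk = a.rnk - 1
WHERE a.id = ?
"""


def previous_id(target_id):
    """Return the id of the row just before target_id, or None if there is none."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, date TEXT NOT NULL)")
        con.executemany("INSERT INTO my_table VALUES (?, ?)", ORDERED_ROWS)
        row = con.execute(PREV_QUERY, (target_id,)).fetchone()
        return row[0] if row else None
    finally:
        con.close()


if __name__ == "__main__":
    print(previous_id(2), previous_id(3))
```

Counting rows with a self-join is quadratic in the number of rows, so on a large table a window function such as ROW_NUMBER() or LAG() (MySQL 8.0+) is usually the better choice.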
You can add a row id to the select like this:
SELECT @rowid:=@rowid+1 as rowid,
t1.* FROM yourdatabase.tablename t1, (SELECT @rowid:=0) as rowids;
Then you can run a simple query for the row whose rowid is one less than that of the row you start from.
This uses a subquery that joins the table to itself: one side is restricted to the date you are checking and is matched against rows with smaller dates, and MAX picks the highest of those smaller dates.
This is then joined against another subquery that gets the highest ID for each date, and the result is joined back to the table itself to get the other details from that row.
SELECT t.*
FROM table1 t
INNER JOIN
(
SELECT MAX(b.date) AS latest_prev_date
FROM table1 a
INNER JOIN table1 b
ON a.date > b.date
WHERE a.date = '2014-07-26'
) sub0
ON t.date = sub0.latest_prev_date
INNER JOIN
(
SELECT date, MAX(ID) AS latest_prev_id
FROM table1
GROUP BY date
) sub1
ON t.ID = sub1.latest_prev_id
AND sub1.date = sub0.latest_prev_date
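Run against the sample data, the nearest-earlier-date approach returns the highest-ID row of the nearest earlier date. That gives 6 for 2014-07-26, matching the question, but for a date shared by several rows (such as ID 2 on 2014-07-25) it skips to the previous date instead of returning ID 1. This minimal sketch uses Python's built-in sqlite3 module; the table name table1 follows the answer, and last_id_before_date is a hypothetical helper.

```python
import sqlite3

# Rows from the question: (id, date).
DATE_ROWS = [(10, "2014-07-23"), (7, "2014-07-24"), (8, "2014-07-24"),
             (9, "2014-07-24"), (1, "2014-07-25"), (2, "2014-07-25"),
             (6, "2014-07-25"), (3, "2014-07-26"), (4, "2014-07-27"),
             (5, "2014-07-28")]

# Highest-ID row of the nearest date strictly before :d.
PREV_DATE_QUERY = """
SELECT t.id
FROM table1 t
JOIN (
    SELECT MAX(b.date) AS latest_prev_date
    FROM table1 a
    JOIN table1 b ON a.date > b.date
    WHERE a.date = :d
) sub0 ON t.date = sub0.latest_prev_date
JOIN (
    SELECT date, MAX(id) AS latest_prev_id
    FROM table1
    GROUP BY date
) sub1 ON t.id = sub1.latest_prev_id AND sub1.date = sub0.latest_prev_date
"""


def last_id_before_date(d):
    """Return the highest id on the nearest date before d, or None."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, date TEXT NOT NULL)")
        con.executemany("INSERT INTO table1 VALUES (?, ?)", DATE_ROWS)
        row = con.execute(PREV_DATE_QUERY, {"d": d}).fetchone()
        return row[0] if row else None
    finally:
        con.close()


if __name__ == "__main__":
    print(last_id_before_date("2014-07-26"), last_id_before_date("2014-07-25"))
```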
If you want to use user-defined variables, this is one way to do it:
SELECT
tab.id, temp.id, temp.date
FROM
(
SELECT
@A:=@A + 1 AS rank_col, t.date, t.id
FROM
myTable t
CROSS JOIN (SELECT @A:=0) join_table
) AS tab
LEFT JOIN
(
SELECT
@B:=@B + 1 AS rank_col, t2.*
FROM myTable t2
CROSS JOIN (SELECT @B:=0) join_table1
) temp ON temp.rank_col = tab.rank_col - 1;

MySQL: efficiently converting event logs to time series

I have a table recording the start time and end time of events of interest:
CREATE TABLE event_log (start_time DATETIME, end_time DATETIME);
INSERT INTO event_log VALUES ("2013-06-03 09:00:00","2013-06-03 09:00:05"), ("2013-06-03 09:00:03","2013-06-03 09:00:07"), ("2013-06-03 09:00:10","2013-06-03 09:00:12");
+---------------------+---------------------+
| start_time | end_time |
+---------------------+---------------------+
| 2013-06-03 09:00:00 | 2013-06-03 09:00:05 |
| 2013-06-03 09:00:03 | 2013-06-03 09:00:07 |
| 2013-06-03 09:00:10 | 2013-06-03 09:00:12 |
+---------------------+---------------------+
I am looking for a way to create a "time series" table where one column is a time index and another column is the count of events in progress at that time. I can do it with a subquery and a generator:
SET @first_time := (SELECT MIN(start_time) FROM event_log);
SET @last_time := (SELECT MAX(end_time) FROM event_log);
CREATE OR REPLACE VIEW generator_16
AS SELECT 0 n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL
SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL
SELECT 15;
CREATE TABLE time_series (t DATETIME, event_count INT(11))
SELECT @first_time + INTERVAL n SECOND t, NULL AS event_count
FROM generator_16
WHERE @first_time + INTERVAL n SECOND <= @last_time;
UPDATE time_series
SET event_count= (SELECT COUNT(*) FROM event_log
WHERE start_time<=t AND end_time>=t);
+---------------------+-------------+
| t | event_count |
+---------------------+-------------+
| 2013-06-03 09:00:00 | 1 |
| 2013-06-03 09:00:01 | 1 |
| 2013-06-03 09:00:02 | 1 |
| 2013-06-03 09:00:03 | 2 |
| 2013-06-03 09:00:04 | 2 |
| 2013-06-03 09:00:05 | 2 |
| 2013-06-03 09:00:06 | 1 |
| 2013-06-03 09:00:07 | 1 |
| 2013-06-03 09:00:08 | 0 |
| 2013-06-03 09:00:09 | 0 |
| 2013-06-03 09:00:10 | 1 |
| 2013-06-03 09:00:11 | 1 |
| 2013-06-03 09:00:12 | 1 |
+---------------------+-------------+
Is there a more efficient way to do it? This method requires a subquery for every time index. Would there, for example, be a way to do it that requires one subquery per "event_log" record? My real problem has 500k time index entries and 1k events; it's taking a little longer than I would like (about 90 seconds).
The "generator" snippet came from http://use-the-index-luke.com/blog/2011-07-30/mysql-row-generator . Clearly one of the larger generators, like the 64k version or the 1M version, would be needed for larger problems.
The only changes happen at start_time and end_time.
So, if you were to
select distinct start_time As time_point from event_log
UNION
select distinct end_time As time_point from event_log
... that would give you all the "points" at which you need a snapshot.
If you create that in a temporary table (say TEMP_POINTS) and join it back to event_log, you should be able to count the number of events at each "point".
CREATE TABLE NON_ZERO_POINTS (time_point DATETIME, event_count INT(11))
select time_point, count(*) AS event_count
from TEMP_POINTS
join event_log on time_point between start_time and end_time
group by time_point;
It might be worth creating an index on NON_ZERO_POINTS(time_point).
Then, you could use NON_ZERO_POINTS in your update thus:
UPDATE time_series
SET event_count= (SELECT event_count FROM NON_ZERO_POINTS
WHERE t=time_point);
Also, do you need to update time_series? If not, you could just use it in a query:
select ts.t, coalesce(nz.event_count, 0) as event_count
from time_series ts
left join NON_ZERO_POINTS nz
on ts.t = nz.time_point
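Building on the observation that counts only change at event boundaries, one option is a sweep-line technique (named plainly here; it is not the exact method in this answer): emit +1 at each start_time and -1 one second after each end_time, because end_time is inclusive in the question's UPDATE, then take a running SUM in a single pass. This is a minimal sketch using Python's built-in sqlite3 module; it assumes whole-second timestamps and SQLite 3.25+ for window functions. In MySQL 8.0 the same shape would use end_time + INTERVAL 1 SECOND, and a recursive CTE of 500k rows would need cte_max_recursion_depth raised above its default of 1000 (or the existing generator view could supply the grid).

```python
import sqlite3

# Events from the question: (start_time, end_time), end_time inclusive.
EVENTS = [("2013-06-03 09:00:00", "2013-06-03 09:00:05"),
          ("2013-06-03 09:00:03", "2013-06-03 09:00:07"),
          ("2013-06-03 09:00:10", "2013-06-03 09:00:12")]

SWEEP_QUERY = """
WITH RECURSIVE
bounds AS (SELECT MIN(start_time) AS lo, MAX(end_time) AS hi FROM event_log),
grid(t) AS (                               -- one row per second from lo to hi
    SELECT lo FROM bounds
    UNION ALL
    SELECT datetime(t, '+1 second') FROM grid, bounds WHERE t < hi
),
deltas(t, d) AS (                          -- +1 when an event starts,
    SELECT start_time, 1 FROM event_log    -- -1 the second after it ends
    UNION ALL
    SELECT datetime(end_time, '+1 second'), -1 FROM event_log
),
merged(t, d, is_grid) AS (
    SELECT t, d, 0 FROM deltas
    UNION ALL
    SELECT t, 0, 1 FROM grid
),
running AS (                               -- deltas sort before grid rows at equal t
    SELECT t, is_grid,
           SUM(d) OVER (ORDER BY t, is_grid
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS event_count
    FROM merged
)
SELECT t, event_count FROM running WHERE is_grid = 1 ORDER BY t
"""


def time_series(events):
    """Return (t, event_count) for every second between the first start and last end."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE event_log (start_time TEXT, end_time TEXT)")
        con.executemany("INSERT INTO event_log VALUES (?, ?)", events)
        return con.execute(SWEEP_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for t, count in time_series(EVENTS):
        print(t, count)
```

The work is one sort over (events x 2 + time indexes) rows instead of one subquery per time index, which is the shape of speed-up the question asks about.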

MySQL Query output using if else or case..?

I have this table:
mysql> select * FROM testa;
+---------+-------+
| month_x | money |
+---------+-------+
| 11101 | 12345 |
| 11105 | 100 |
| 11105 | 100 |
| 11105 | 100 |
| 11105 | 100 |
| 11106 | 12345 |
+---------+-------+
6 rows in set (0.00 sec)
where the last two digits of month_x are the month. Now I want my output to be:
Month TOTAL
01 12345
02 0
03 0
04 0
05 400
06 12345
07 0
08 0
09 0
10 0
11 0
12 0
Is this possible using IF/ELSE or CASE?
You can use modular arithmetic to obtain the trailing two digits (they're the remainder when the number is divided by 100), then assuming you wish to sum money when your data is "grouped by" month:
SELECT month_x % 100 AS Month, SUM(money) AS TOTAL
FROM testa
GROUP BY Month
ORDER BY Month ASC;
Alternatively, you could rely on MySQL's implicit type conversion and use its string functions:
SELECT RIGHT(month_x, 2) AS Month, SUM(money) AS TOTAL
FROM testa
GROUP BY Month
ORDER BY Month ASC;
UPDATE
As @shiplu.mokadd.im states, to show every month (even those for which you have no data), you need to obtain the numbers 1 through 12 from a temporary table. However, you can create such a temporary table within your query using UNION:
SELECT 1
UNION SELECT 2
UNION SELECT 3 -- etc
Therefore:
SELECT months.Month, COALESCE(SUM(money), 0) AS TOTAL
FROM testa
RIGHT JOIN (
SELECT 1 AS Month
UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6
UNION SELECT 7 UNION SELECT 8 UNION SELECT 9 UNION SELECT 10 UNION SELECT 11
UNION SELECT 12
) months ON testa.month_x % 100 = months.Month
GROUP BY months.Month
ORDER BY months.Month;
However, note that one usually doesn't do this in the database, as it really belongs in the presentation layer: from whatever language you use to access the database, you'd loop over 1...12 and assume TOTAL to be 0 if there's no corresponding record in the result set.
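The presentation-layer alternative can be sketched in a few lines. This minimal example uses Python's built-in sqlite3 module as a stand-in for the MySQL connection; monthly_totals is a hypothetical helper, and the zero-padded month strings only mirror the question's desired output.

```python
import sqlite3

# Rows from the question: (month_x, money).
TESTA_ROWS = [(11101, 12345), (11105, 100), (11105, 100),
              (11105, 100), (11105, 100), (11106, 12345)]


def monthly_totals(rows):
    """Return [("01", total), ..., ("12", total)], filling missing months with 0."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE testa (month_x INTEGER, money INTEGER)")
        con.executemany("INSERT INTO testa VALUES (?, ?)", rows)
        # The database only returns the months that actually have data...
        found = dict(con.execute(
            "SELECT month_x % 100, SUM(money) FROM testa GROUP BY month_x % 100"
        ).fetchall())
    finally:
        con.close()
    # ...and the application loops over 1..12, treating absent months as 0.
    return [(f"{m:02d}", found.get(m, 0)) for m in range(1, 13)]


if __name__ == "__main__":
    for month, total in monthly_totals(TESTA_ROWS):
        print(month, total)
```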
For this, you first need to create a table containing the month numbers.
CREATE TABLE `months` (
`mon` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `months` VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12);
Then execute this query,
SELECT m.mon,
IF(Sum(t.money) IS NULL, 0, Sum(t.money)) AS `money`
FROM testa t
RIGHT OUTER JOIN months m
ON ( t.month_x%100 = m.mon )
GROUP BY m.mon;
The result is:
+------+-------+
| mon | money |
+------+-------+
| 1 | 12345 |
| 2 | 0 |
| 3 | 0 |
| 4 | 0 |
| 5 | 400 |
| 6 | 12345 |
| 7 | 0 |
| 8 | 0 |
| 9 | 0 |
| 10 | 0 |
| 11 | 0 |
| 12 | 0 |
+------+-------+
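You can use IF statements - yes, although the IF statement is only available inside stored programs; in a plain query, use the IF() function or CASE.
Look at this:
http://dev.mysql.com/doc/refman/5.5/en/if-statement.html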