MySQL Finding the next row value by ID - mysql

id month status
1 1997-11-01 A
1 2015-08-01 B
2 2010-01-01 A
2 2010-02-01 B
2 2012-10-01 C
That I would like to format to be:
id month lead_month status
1 1997-11-01 2015-08-01 A
1 2015-08-01 NOW() B
2 2010-01-01 2010-02-01 A
2 2010-02-01 2012-10-01 B
2 2012-10-01 NOW() C
MySQL is new to me, and I have trouble wrapping my head around variables. I would prefer to use a simple LEAD() with a PARTITION but unfortunately, I can't.
Here's my attempt, which doesn't work:
SET @lead = '1995-01-01'; -- arbitrary floor
select id, month, status, @lead, @lead:=month from table
The output looks like this, which also doesn't check if the id's from row to row are the same:
id month lead_month status
1 1997-11-01 1995-01-01 A
1 2015-08-01 1997-11-01 B
2 2010-01-01 2015-08-01 A
2 2010-02-01 2010-01-01 B
2 2012-10-01 2010-02-01 C

Don't muck around with variables in MySQL. That sort of logic would better reside in whatever language you are using for your application. This, however, can be done in SQL.
My first instinct is simply to save that data in an extra column. Don't worry about the size of the db; there aren't enough months in the universe for it to become a problem.
There is also something wrong with your ids: these should almost always be primary keys, i.e. unique.
If you insist on your scheme, you can use a join. Assuming consecutive unique ids:
SELECT a.id, a.month, b.month AS lead_month, a.status FROM `table` AS a LEFT JOIN `table` AS b ON b.id = a.id + 1;

You can use a correlated subquery:
select t.*,
(select t2.month
from t t2
where t.id = t2.id and t2.month > t.month
order by t2.month asc
limit 1
) as next_month
from t;
If you want to replace the value for the last month for each id, then you can use coalesce():
select t.*,
coalesce((select t2.month
from t t2
where t.id = t2.id and t2.month > t.month
order by t2.month asc
limit 1
), now()) as next_month
from t;
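A minimal sketch of the next-month lookup, runnable without a MySQL server. It uses Python's standard-library sqlite3 module as a stand-in for MySQL, which is an assumption of this example, as are the table name t and the fixed 'NOW' placeholder string (used instead of now() so the output is reproducible). The subquery picks, for each row, the earliest later month with the same id.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, month TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "1997-11-01", "A"),
        (1, "2015-08-01", "B"),
        (2, "2010-01-01", "A"),
        (2, "2010-02-01", "B"),
        (2, "2012-10-01", "C"),
    ],
)

# For each row, find the earliest later month with the same id.
# 'NOW' is a fixed placeholder; in MySQL you would put now() there.
NEXT_MONTH_SQL = """
SELECT t.id, t.month,
       COALESCE((SELECT t2.month
                 FROM t AS t2
                 WHERE t2.id = t.id AND t2.month > t.month
                 ORDER BY t2.month
                 LIMIT 1), 'NOW') AS lead_month,
       t.status
FROM t
ORDER BY t.id, t.month
"""

lead_rows = conn.execute(NEXT_MONTH_SQL).fetchall()
for row in lead_rows:
    print(row)
```

The ISO date strings compare correctly as text, which is why the sketch can store month as TEXT.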

Related

Join on SQL using the most recent data instead of the equal one

I need to join 2 dataframes using month and name. However, in one of these dataframes I don't have all the monthly result, so I want to repeat the most recent one.
For example,
Dataframe A
name score month
Alex 20 2020/01
Alex 30 2020/03
Dataframe B
name month tenure
Alex 2020/01 1
Alex 2020/02 2
Alex 2020/03 3
Join A+B using name and month - expected result
name month score tenure
Alex 2020/01 20 1
Alex 2020/02 20 2 --> repeat the score from the most recent date
Alex 2020/03 30 3
Does anyone know how I can do that?
You can use a correlated subquery:
select b.*,
(select a.score
from a
where a.name = b.name and a.month <= b.month
order by a.month desc
limit 1
) as score
from b;
Or, you can use window functions and a join:
select b.*, a.score
from b left join
(select a.*,
lead(month) over (partition by name order by month) as next_month
from a
) a
on b.name = a.name and
b.month >= a.month and
(b.month < a.next_month or a.next_month is null);
This method is convenient if you want to fetch multiple columns from a.
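To check the "most recent score" logic against the sample data, here is a small sketch using Python's standard-library sqlite3 module (a stand-in chosen for this example, not the asker's database). It runs the correlated-subquery variant; the table names a and b follow the answer above.

```python
import sqlite3

# In-memory SQLite database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (name TEXT, score INTEGER, month TEXT)")
conn.execute("CREATE TABLE b (name TEXT, month TEXT, tenure INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?)",
    [("Alex", 20, "2020/01"), ("Alex", 30, "2020/03")],
)
conn.executemany(
    "INSERT INTO b VALUES (?, ?, ?)",
    [("Alex", "2020/01", 1), ("Alex", "2020/02", 2), ("Alex", "2020/03", 3)],
)

# For every row of b, take the score from the latest row of a
# whose month is not after b.month (an "as-of" lookup).
ASOF_SQL = """
SELECT b.name, b.month,
       (SELECT a.score
        FROM a
        WHERE a.name = b.name AND a.month <= b.month
        ORDER BY a.month DESC
        LIMIT 1) AS score,
       b.tenure
FROM b
ORDER BY b.name, b.month
"""

asof_rows = conn.execute(ASOF_SQL).fetchall()
for row in asof_rows:
    print(row)
```

The comparison a.month <= b.month works here because zero-padded 'YYYY/MM' strings sort in chronological order.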

Select difference based on record having minimum and maximum date in MySql

Below is my table; let's call it account:
ID accountID score tracking_date
1 1 3 2014-09-25 00:01:05
2 2 4 2014-09-26 01:05:18
3 1 6 2014-09-27 09:23:05
4 2 9 2014-09-28 20:01:05
5 1 1 2014-09-28 23:21:34
6 3 7 2014-09-21 00:01:00
7 2 1 2014-09-22 01:45:24
8 2 9 2014-09-27 14:01:43
9 3 1 2014-09-24 22:01:27
I want to select the record with the max date for each accountID, along with the difference between its score and the score of the record with the minimum tracking_date for that accountID. So I want output like below:
ID accountID score_with_maxdate diff_score_with_mindate max_tracking_date
1 1 1 -2 2014-09-28 23:21:34
2 2 9 8 2014-09-28 20:01:05
3 3 1 -6 2014-09-24 22:01:27
Any help?
Here is one option. We can self-join a subquery which finds both the min and max tracking dates, for each account, twice to your original table. This will bring in all metadata for those max tracking date records, including the scores.
SELECT
t1.accountID,
t2.score AS score_with_maxdate,
t2.score - t3.score AS diff_score_with_mindate,
t1.max_tracking_date
FROM
(
SELECT
accountID,
MAX(tracking_date) AS max_tracking_date,
MIN(tracking_date) AS min_tracking_date
FROM yourTable
GROUP BY accountID
) t1
INNER JOIN yourTable t2
ON t1.accountId = t2.accountID AND t2.tracking_date = t1.max_tracking_date
INNER JOIN yourTable t3
ON t1.accountId = t3.accountID AND t3.tracking_date = t1.min_tracking_date
ORDER BY
t1.accountID;
This is a somewhat tricky question. I think conditional aggregation is a convenient way to solve the problem:
select min(t.id) as id, t.accountId,
max(case when t.tracking_date = t2.max_td then t.score end) as score_with_maxdate,
sum(case when t.tracking_date = t2.max_td then t.score
when t.tracking_date = t2.min_td then - t.score
end) as diff_score_with_mindate,
max(t.tracking_date) as max_tracking_date
from t join
(select t2.accountId, min(t2.tracking_date) as min_td, max(t2.tracking_date) as max_td
from t t2
group by t2.accountId
) t2
on t.accountId = t2.accountId
group by t.accountId;
Another hackish way of getting the same results, using an aggregate and string functions:
select t.accountID,
t.score_with_maxdate,
t.score_with_maxdate - t.score_with_mindate diff_score_with_mindate,
t.max_tracking_date
from(
select accountID,
substring_index(group_concat(score order by tracking_date desc),',', 1) + 0 score_with_maxdate,
substring_index(group_concat(score order by tracking_date asc),',', 1) + 0 score_with_mindate,
max(tracking_date) max_tracking_date
from demo
group by accountID
) t
But I would suggest going with the other solutions mentioned by Tim & Gordon.
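The expected output in the question can be reproduced with a short sketch. It uses Python's standard-library sqlite3 module as a stand-in for MySQL (an assumption of this example) and the join-based approach: one derived table with the min and max tracking dates per account, joined back twice to fetch the two scores.

```python
import sqlite3

# In-memory SQLite database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER, accountID INTEGER, score INTEGER, tracking_date TEXT)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?, ?)",
    [
        (1, 1, 3, "2014-09-25 00:01:05"),
        (2, 2, 4, "2014-09-26 01:05:18"),
        (3, 1, 6, "2014-09-27 09:23:05"),
        (4, 2, 9, "2014-09-28 20:01:05"),
        (5, 1, 1, "2014-09-28 23:21:34"),
        (6, 3, 7, "2014-09-21 00:01:00"),
        (7, 2, 1, "2014-09-22 01:45:24"),
        (8, 2, 9, "2014-09-27 14:01:43"),
        (9, 3, 1, "2014-09-24 22:01:27"),
    ],
)

# d holds the first and last tracking date per account; hi and lo
# are the rows of account at those two dates.
DIFF_SQL = """
SELECT d.accountID,
       hi.score AS score_with_maxdate,
       hi.score - lo.score AS diff_score_with_mindate,
       d.max_td AS max_tracking_date
FROM (SELECT accountID,
             MIN(tracking_date) AS min_td,
             MAX(tracking_date) AS max_td
      FROM account
      GROUP BY accountID) AS d
JOIN account AS hi ON hi.accountID = d.accountID AND hi.tracking_date = d.max_td
JOIN account AS lo ON lo.accountID = d.accountID AND lo.tracking_date = d.min_td
ORDER BY d.accountID
"""

diff_rows = conn.execute(DIFF_SQL).fetchall()
for row in diff_rows:
    print(row)
```

An account with a single record joins to the same row twice here, so its difference comes out as 0.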

MySQL: select entries with a certain count within a certain period

I have a MySQL table with a datetime column. How can I find all groups with at least 5 entries within 10 minutes?
My only idea is to write a program (in whatever language) and loop over the timestamps, check always 5 (..) successive entries, calculate the time span between the last and the first and check whether it is below the limit.
Can this be done using a single SQL query too?
(The scenario is simplified and the numbers are just examples.)
As requested, here comes an example:
id | timestamp | other_column
---|---------------------|-------------
3 | 2017-01-01 11:00:00 | thank
2 | 2017-01-01 11:01:00 | you
1 | 2017-01-01 11:02:00 | for
* 6 | 2017-01-01 11:20:00 | your
* 5 | 2017-01-01 11:21:00 | efforts
* 4 | 2017-01-01 11:22:00 | to
* 7 | 2017-01-01 11:23:00 | help
* 8 | 2017-01-01 11:24:00 | me
9 | 2017-01-01 11:40:00 | :
10 | 2017-01-01 11:41:00 | )
If the count limit is 5 and the timespan limit is 10 minutes, I'd like to get the entries marked with "*". The "id" column is the primary key of the table, but the order is not always the order of the timestamps. The "other_column" is used for a where clause. The table has about 1 million entries.
Try to break this down logically. Sorry for the rough sketches, I'm a little short on time.
select t1.id, t1.timestamp, t2.timestamp
from yourtable t1
inner join yourtable t2 on t2.timestamp >= t1.timestamp and t2.timestamp < t1.timestamp + interval 10 minute
(timestamp + interval 10 minute is MySQL syntax; date_add() does the same job)
So this will give you a relatively giant list of all IDs joined to any other IDs within a 10-minute interval (including one row for itself). (At this point I'm only picking out the first row of each group; it is easier to grab the 'header row' by its timestamp plus 10 minutes and worry about the rest in the next step.) If we group by the ID and time, we get a count of how many rows were within 10 minutes:
select t1.id, t1.timestamp, count(1)
from yourtable t1
inner join yourtable t2 on t2.timestamp >= t1.timestamp and t2.timestamp < t1.timestamp + interval 10 minute
group by t1.id, t1.timestamp
having count(1) > 4
This will now give you a list of all the IDs and timestamps that have themselves and 4 or more other records within 10 minutes after that timestamp. Now it depends on how you want to group from here. If you want each of the 5 lines, we can call the query above a subquery and join it back to the main table to get the rows you want returned. The distinct keeps a row from being returned twice when two header rows overlap.
select distinct t3.*
from
(select t1.id, t1.timestamp, count(1) as cnt
from yourtable t1
inner join yourtable t2
on t2.timestamp >= t1.timestamp and t2.timestamp < t1.timestamp + interval 10 minute
group by t1.id, t1.timestamp
having count(1) > 4) a
inner join yourtable t3 on t3.timestamp >= a.timestamp and t3.timestamp < a.timestamp + interval 10 minute
And that should give you IDs 4-8 and their info returned (order as you see fit).
My apologies that I don't have the time to test, but the logic should work.
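The header-row logic can be checked against the sample data with a short sketch. It uses Python's standard-library sqlite3 module rather than MySQL, so the interval arithmetic is written with SQLite's datetime(ts, '+10 minutes'); the table name entries and the column name ts are assumptions made for this example.

```python
import sqlite3

# In-memory SQLite database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, ts TEXT, other_column TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        (3, "2017-01-01 11:00:00", "thank"),
        (2, "2017-01-01 11:01:00", "you"),
        (1, "2017-01-01 11:02:00", "for"),
        (6, "2017-01-01 11:20:00", "your"),
        (5, "2017-01-01 11:21:00", "efforts"),
        (4, "2017-01-01 11:22:00", "to"),
        (7, "2017-01-01 11:23:00", "help"),
        (8, "2017-01-01 11:24:00", "me"),
        (9, "2017-01-01 11:40:00", ":"),
        (10, "2017-01-01 11:41:00", ")"),
    ],
)

# h: "header" rows that start a 10-minute window holding at least 5 rows.
# The outer join then returns every row that falls inside such a window.
BURST_SQL = """
SELECT DISTINCT t3.id, t3.ts
FROM (SELECT t1.id, t1.ts
      FROM entries AS t1
      JOIN entries AS t2
        ON t2.ts >= t1.ts AND t2.ts < datetime(t1.ts, '+10 minutes')
      GROUP BY t1.id, t1.ts
      HAVING COUNT(*) >= 5) AS h
JOIN entries AS t3
  ON t3.ts >= h.ts AND t3.ts < datetime(h.ts, '+10 minutes')
ORDER BY t3.ts
"""

burst_ids = [row[0] for row in conn.execute(BURST_SQL)]
print(burst_ids)
```

On a table with about a million rows, the self-join will need an index on the timestamp column to be practical.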

How can I combine the values of a single column in SQL

I have the table in the following format.
BatchID BatchTime
1 10:00:00
2 13:00:00
3 16:00:00
4 19:00:00
And what I actually need is:
BatchID BatchTime
1 10:00:00 - 13:00:00
2 13:00:00 - 16:00:00
3 16:00:00 - 19:00:00
4 19:00:00 - 10:00:00
Assuming you have consecutive ids you can do the following:
SELECT a.id,a.dt date_a, b.dt date_b FROM tbl a
INNER JOIN tbl b ON b.id=a.id % (SELECT MAX(id) FROM tbl) + 1
ORDER BY a.id
See here: http://sqlfiddle.com/#!3/24791/6
Should, however, the times dt not be ascending with id (which incidentally could also have gaps) then the following will still work:
WITH t AS (SELECT id,dt,ROW_NUMBER() OVER (ORDER BY dt) n FROM tbl)
SELECT a.id,a.dt date_a, b.dt date_b FROM t a
INNER JOIN t b ON b.n=a.n % (SELECT MAX(n) FROM t) + 1
ORDER BY a.n -- order by ascending times in table a
See here: http://sqlfiddle.com/#!3/74ed6/1
The window function ROW_NUMBER() puts the times in an ascending order in the common table expression t. After that two ts are joined in a cyclic manner (using modulus % on the newly generated row number n).
a.n % (SELECT MAX(n) FROM t) + 1 will always calculate the "next" line in the cyclic order with which to join table t with alias b.
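The cyclic "next row" arithmetic is easy to see outside SQL. Below is a minimal Python sketch of the same idea; the function name cyclic_ranges is hypothetical. With 0-based positions the next index is (i + 1) % n, which corresponds to n % MAX(n) + 1 for the 1-based row numbers used in the query.

```python
from typing import List, Tuple


def cyclic_ranges(batches: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Pair each batch time with the next one, wrapping the last back to the first."""
    # Order by time, mirroring ROW_NUMBER() OVER (ORDER BY dt).
    ordered = sorted(batches, key=lambda b: b[1])
    n = len(ordered)
    result = []
    for i, (batch_id, start) in enumerate(ordered):
        # (i + 1) % n is the next position, wrapping to 0 after the last row.
        _, end = ordered[(i + 1) % n]
        result.append((batch_id, f"{start} - {end}"))
    return result


batch_ranges = cyclic_ranges(
    [(1, "10:00:00"), (2, "13:00:00"), (3, "16:00:00"), (4, "19:00:00")]
)
for batch_id, time_range in batch_ranges:
    print(batch_id, time_range)
```

Sorting by the time string works here because the times are zero-padded 'HH:MM:SS' values.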

one mysql query getting, per each row, the average of the previous three rows

i have something like this:
id | value
---------------
201311 | 10
201312 | 15
201401 | 20
201402 | 5
201403 | 17
and i need a result like this:
201311 | NULL or 0
201312 | 3.3 // 10/3
201401 | 8.3 // (15+10)/3
201402 | 15 // (20+15+10)/3
201403 | 13.3 // (5+20+15)/3
So far, i got to the point where i can get the AVG of the last three previous rows like this:
select AVG(c.value) FROM (select b.value from `table` as b where b.id < 201401 order by b.id DESC LIMIT 3) as c
passing the id manually. I'm not able to do it for each id.
Any ideas would be much appreciated!
thanks a lot.
regards
I think you'll have to write a stored procedure, use a cursor, iterate through the table and populate a new table using the values calculated in your cursor loop. If you need help with writing out the cursor loop, just drop a comment and I can get you an example.
i got to this now:
SELECT a.id, (select AVG(b.value) FROM `table` as b where b.id < a.id AND str_to_date(CONCAT(b.id,'01'), '%Y%m%d') >= DATE_SUB(str_to_date(CONCAT(a.id,'01'), '%Y%m%d'), INTERVAL 3 MONTH)) FROM `table` as a WHERE 1
But i'm quite sure there should be a better/cleaner solution
select a.id,coalesce(b.value,0) from test a left outer join
(select a.id, sum(b.value)/3 as value from
(select @row:=@row+1 as rownum,id,value from test,(select @row:=0)r) a,
(select @row1:=@row1+1 as rownum,id,value from test,(select @row1:=0)r) b
where b.rownum in (a.rownum-1,a.rownum-2,a.rownum-3)
group by a.rownum) b
on a.id=b.id;
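For comparison, here is a sketch that avoids user variables altogether. It is a different technique from the answer above: a self-join that keeps, for each row, the earlier rows with fewer than three other rows in between. It is run here through Python's standard-library sqlite3 module as a stand-in for MySQL, which is an assumption of this example; the table name test follows the query above.

```python
import sqlite3

# In-memory SQLite database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(201311, 10), (201312, 15), (201401, 20), (201402, 5), (201403, 17)],
)

# b is one of the three rows immediately before a when fewer than
# three rows lie strictly between b and a. The sum is always divided
# by 3, as in the expected output, not by the number of rows found.
PREV3_SQL = """
SELECT a.id, SUM(b.value) / 3.0 AS avg_prev3
FROM test AS a
LEFT JOIN test AS b
  ON b.id < a.id
 AND (SELECT COUNT(*) FROM test AS c WHERE c.id > b.id AND c.id < a.id) < 3
GROUP BY a.id
ORDER BY a.id
"""

prev3_rows = [
    (row_id, None if avg is None else round(avg, 1))
    for row_id, avg in conn.execute(PREV3_SQL)
]
for row in prev3_rows:
    print(row)
```

The first row has no predecessors, so its value is NULL (None in Python); wrap the sum in COALESCE(..., 0) if 0 is preferred.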