Calculate the average, minimum, and maximum interval between dates - MySQL

I am trying to do this with SQL. I have a transaction table which contains a transaction_date column. After grouping by date, I got this list:
| transaction_date |
| 2019-03-01 |
| 2019-03-04 |
| 2019-03-05 |
| ... |
From these 3 transaction dates, I want to achieve:
Average = ((4-1) + (5-4)) / 2 = 2 days (the DATEDIFF between each pair of consecutive dates)
Minimum = 1 day
Maximum = 3 days
Is there a good way to express this in SQL, before I resort to iterating over all the rows with WHILE?
Thanks in advance.

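If your MySQL version doesn't support the LAG or LEAD functions, you can use a correlated subquery to fetch the next transaction_date for each row, then use DATEDIFF in a derived table to get the gap in days. The last date has no successor, so its gap is NULL and the aggregates ignore it.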
Query 1:
SELECT avg(diffDt),min(diffDt),MAX(diffDt)
FROM (
    SELECT DATEDIFF((SELECT transaction_date
                     FROM T tt
                     WHERE tt.transaction_date > t1.transaction_date
                     ORDER BY tt.transaction_date
                     LIMIT 1
                    ),transaction_date) diffDt
    FROM T t1
) t1
Results:
| avg(diffDt) | min(diffDt) | MAX(diffDt) |
|-------------|-------------|-------------|
| 2 | 1 | 3 |
If your MySQL version is 8.0 or higher, you can use the LEAD window function instead of the subquery.
Query #1
SELECT avg(diffDt),min(diffDt),MAX(diffDt)
FROM (
SELECT DATEDIFF(LEAD(transaction_date) OVER(ORDER BY transaction_date),transaction_date) diffDt
FROM T t1
) t1;
| avg(diffDt) | min(diffDt) | MAX(diffDt) |
| ----------- | ----------- | ----------- |
| 2 | 1 | 3 |
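As a cross-check outside the database, the same gap arithmetic can be sketched in a few lines of Python. This is only an illustration under the assumption that the dates have already been fetched into a list; the function name gap_stats is made up for this example and is not part of either query above.

```python
from datetime import date

def gap_stats(dates):
    """Return (average, minimum, maximum) gap in days between consecutive dates."""
    ordered = sorted(set(dates))  # one entry per distinct date, oldest first
    if len(ordered) < 2:
        return None  # fewer than two dates: there is no gap to measure
    # Pair every date with its successor, like LEAD() or the correlated subquery does.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps), min(gaps), max(gaps)

# The three sample dates from the question.
sample = [date(2019, 3, 1), date(2019, 3, 4), date(2019, 3, 5)]
print(gap_stats(sample))  # (2.0, 1, 3)
```

The result agrees with the 2 / 1 / 3 shown by both queries for the three sample dates.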

Related

How to select rows with the latest date and calculate another field based on the row

I have two tables, vehicle and vehicle_maintenance.
vehicle
-----------------------------------
| v_id | v_name | v_no |
-----------------------------------
| 1 | car1 | car123 |
-----------------------------------
| 2 | car2 | car456 |
-----------------------------------
vehicle_maintenance
-----------------------------------------------------------------------
| v_main_id | v_id | v_main_date | v_main_remainder |
-----------------------------------------------------------------------
| 1 | 1 | 2020/10/10 | 1 |
| 2 | 1 | 2020/10/20 | 2 |
| 3 | 2 | 2020/10/04 | 365 |
| 4 | 2 | 2020/10/15 | 5 |
-----------------------------------------------------------------------
I want to get the latest maintenance details for each car. For example, car2's latest maintenance date is 2020/10/15, and I want to work out its next maintenance date from the v_main_remainder field: the next maintenance date will be 2020/10/20 (the maintenance date plus 5 days). I also want to calculate the number of days left until the next maintenance date. Supposing today is 2020/10/10, it should show 10 days left.
Here is my query
SELECT
v.v_id,
v.v_name,
v.v_no,
max(vm.v_main_date) as renewal_date,
datediff(
DATE_ADD(
max(vm.v_main_date), INTERVAL +vm.v_main_remainder day
),
now()
) as day_left
FROM vehicle as v, vehicle_maintenance as vm
GROUP BY v.v_id
But the problem is that the vm.v_main_remainder value used in the date_add function is taken from the first row rather than from the row with the latest date.
Here is the result
-----------------------------------------------------------------------
| v_id | v_name | v_no | renewal_date | day_left |
-----------------------------------------------------------------------
| 1 | car1 | car123 | 2020/10/20 | 11 |
-----------------------------------------------------------------------
| 2 | car2 | car456 | 2020/10/15 | 370 |
-----------------------------------------------------------------------
For starters, your query is missing a join condition between the two tables, so it produces a cartesian product. This type of problem is much easier to spot when using explicit joins.
Then: you want to filter on the latest maintenance record per car, so aggregation is not appropriate.
One option uses window functions, available in MySQL 8.0:
select v.v_id, v.v_name, v.v_no, vm.v_main_date as renewal_date,
    datediff(vm.v_main_date + interval vm.v_main_remainder day, current_date) as day_left
from vehicle as v
inner join (
    -- rank each vehicle's maintenance rows, newest first
    select vm.*, row_number() over(partition by v_id order by v_main_date desc) rn
    from vehicle_maintenance as vm
) as vm on vm.v_id = v.v_id
where vm.rn = 1
Note that I changed now() to current_date, so datediff() works consistently on dates rather than datetimes.
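To sanity-check the expected numbers, here is a small Python sketch of the same idea: keep only the latest maintenance row per vehicle, add its remainder, and count the days until then. It assumes the sample rows from the question and a fixed "today" of 2020-10-10; the function name days_left_per_vehicle is invented for this illustration.

```python
from datetime import date, timedelta

# Sample rows from the question: (v_main_id, v_id, v_main_date, v_main_remainder)
maintenance = [
    (1, 1, date(2020, 10, 10), 1),
    (2, 1, date(2020, 10, 20), 2),
    (3, 2, date(2020, 10, 4), 365),
    (4, 2, date(2020, 10, 15), 5),
]

def days_left_per_vehicle(rows, today):
    """Map v_id -> days until the next maintenance, based on the latest row only."""
    latest = {}
    for _, v_id, main_date, remainder in rows:
        # Keep the row with the greatest date per vehicle (the rn = 1 row in the query).
        if v_id not in latest or main_date > latest[v_id][0]:
            latest[v_id] = (main_date, remainder)
    return {
        v_id: (main_date + timedelta(days=remainder) - today).days
        for v_id, (main_date, remainder) in latest.items()
    }

# "today" is fixed to the date assumed in the question.
print(days_left_per_vehicle(maintenance, date(2020, 10, 10)))  # {1: 12, 2: 10}
```

For car2 this gives the 10 days the question expects; the remainder comes from the same row as the latest date, which is what the aggregate version got wrong.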

Seek rows with incorrect dates in historic data

I have a table that is a historic log. I recently fixed a bug that was writing incorrect dates to that table. The dates should be consecutive, meaning each new row for an entity should not be older than the previous one, but in some cases a row has a date much older than the previous row's date.
How can I get all the rows that are out of sequence for each entity_id? In the example below I should get rows 5 and 10.
The table has millions of rows and thousands of different entities. I was thinking of comparing the results of ordering by date and by id, but that is a lot of manual work.
| id | entity_id | time_stamp |
|--------|-------------|---------------|
| 1 | 7 | 2019-01-22 |
| 2 | 9 | 2019-01-05 |
| 3 | 6 | 2019-03-14 |
| 4 | 9 | 2019-04-20 |
| 5 | 6 | 2015-10-04 | WRONG
| 6 | 9 | 2019-07-15 |
| 7 | 3 | 2019-07-04 |
| 8 | 7 | 2019-06-01 |
| 9 | 6 | 2019-11-04 |
| 10 | 7 | 2019-03-04 | WRONG
Is there a function to compare each row with the previous date for the same entity id? I'm completely lost here and not sure how to clean the data. The database is MySQL, by the way.
If you are running MySQL 8.0, you can use lag(); the idea is to order records by id within groups having the same entity_id, and then to filter on records where the current timestamp is smaller than the previous one:
select t.*
from (
select t.*, lag(time_stamp) over(partition by entity_id order by id) lag_time_stamp
from mytable t
) t
where time_stamp < lag_time_stamp
In earlier versions, one option is to use a correlated subquery to get the previous timestamp:
select t.*
from mytable t
where time_stamp < (
select time_stamp
from mytable t1
where t1.entity_id = t.entity_id and t1.id < t.id
order by id desc
limit 1
)
SELECT s1.*
FROM sourcetable s1
-- keep rows for which an earlier row (smaller id) of the same entity has a later time_stamp
WHERE EXISTS ( SELECT NULL
FROM sourcetable s2
WHERE s2.id < s1.id
AND s1.entity_id = s2.entity_id
AND s2.time_stamp > s1.time_stamp )
An index on (entity_id, id, time_stamp) or (entity_id, time_stamp, id) will improve performance.
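The "compare with the previous row of the same entity" rule can also be sketched in Python, which may help when checking a sample of the data by hand. This is only an illustration using the ten sample rows from the question; the function name out_of_order_ids is hypothetical.

```python
from datetime import date

# Sample rows from the question: (id, entity_id, time_stamp)
rows = [
    (1, 7, date(2019, 1, 22)),
    (2, 9, date(2019, 1, 5)),
    (3, 6, date(2019, 3, 14)),
    (4, 9, date(2019, 4, 20)),
    (5, 6, date(2015, 10, 4)),
    (6, 9, date(2019, 7, 15)),
    (7, 3, date(2019, 7, 4)),
    (8, 7, date(2019, 6, 1)),
    (9, 6, date(2019, 11, 4)),
    (10, 7, date(2019, 3, 4)),
]

def out_of_order_ids(rows):
    """Return ids whose time_stamp is older than the previous row of the same entity."""
    previous = {}  # entity_id -> time_stamp of the most recent row seen so far
    flagged = []
    for row_id, entity_id, stamp in sorted(rows):  # tuples sort by id first
        if entity_id in previous and stamp < previous[entity_id]:
            flagged.append(row_id)
        previous[entity_id] = stamp  # like LAG(): always compare to the row just before
    return flagged

print(out_of_order_ids(rows))  # [5, 10]
```

Note that a row following a bad row is compared against the bad date, exactly as with lag(), so it is not flagged as long as it is newer than that bad date.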

How to get the average time between multiple dates

What I'm trying to do is bucket my customers based on their transaction frequency. I have the date recorded for every time they transact, but I can't work out how to get the average delta between consecutive dates. What I effectively want is a table showing me:
| User | Average Frequency
| 1 | 15
| 2 | 15
| 3 | 35
...
The data I currently have is formatted like this:
| User | Transaction Date
| 1 | 2018-01-01
| 1 | 2018-01-15
| 1 | 2018-02-01
| 2 | 2018-06-01
| 2 | 2018-06-18
| 2 | 2018-07-01
| 3 | 2019-01-01
| 3 | 2019-02-05
...
So basically, each customer will have multiple transactions, and I want to understand how to get the delta between each pair of consecutive dates and then the average of those deltas.
I know the datediff function and how it works, but I can't work out how to split the transactions up. I also know that the offset function is available in tools like Looker, but I don't know the syntax behind it.
Thanks
In MySQL 8+ you can use LAG to get the previous Transaction Date for each row and then use DATEDIFF to get the difference between two consecutive dates. You can then take the average of those values:
SELECT User, AVG(delta) AS `Average Frequency`
FROM (SELECT User,
DATEDIFF(`Transaction Date`, LAG(`Transaction Date`) OVER (PARTITION BY User ORDER BY `Transaction Date`)) AS delta
FROM transactions) t
GROUP BY User
Output:
| User | Average Frequency |
| 1 | 15.5 |
| 2 | 15 |
| 3 | 35 |
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(user INT NOT NULL
,transaction_date DATE
,PRIMARY KEY(user,transaction_date)
);
INSERT INTO my_table VALUES
(1,'2018-01-01'),
(1,'2018-01-15'),
(1,'2018-02-01'),
(2,'2018-06-01'),
(2,'2018-06-18'),
(2,'2018-07-01'),
(3,'2019-01-01'),
(3,'2019-02-05');
SELECT user
, AVG(delta) avg_delta
FROM
( SELECT x.*
, DATEDIFF(x.transaction_date,MAX(y.transaction_date)) delta
FROM my_table x
JOIN my_table y
ON y.user = x.user
AND y.transaction_date < x.transaction_date
GROUP
BY x.user
, x.transaction_date
) a
GROUP
BY user;
+------+-----------+
| user | avg_delta |
+------+-----------+
| 1 | 15.5000 |
| 2 | 15.0000 |
| 3 | 35.0000 |
+------+-----------+
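The per-user averages can be reproduced outside MySQL with a short Python sketch, shown here only as a cross-check. It assumes the eight sample rows from the question; the function name average_gap_per_user is made up for this example.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (user, transaction_date)
transactions = [
    (1, date(2018, 1, 1)), (1, date(2018, 1, 15)), (1, date(2018, 2, 1)),
    (2, date(2018, 6, 1)), (2, date(2018, 6, 18)), (2, date(2018, 7, 1)),
    (3, date(2019, 1, 1)), (3, date(2019, 2, 5)),
]

def average_gap_per_user(rows):
    """Map user -> average number of days between consecutive transactions."""
    by_user = defaultdict(list)
    for user, day in rows:
        by_user[user].append(day)
    result = {}
    for user, days in by_user.items():
        days.sort()
        gaps = [(later - earlier).days for earlier, later in zip(days, days[1:])]
        if gaps:  # users with a single transaction have no gap and are skipped
            result[user] = sum(gaps) / len(gaps)
    return result

print(average_gap_per_user(transactions))  # {1: 15.5, 2: 15.0, 3: 35.0}
```

User 1 comes out at 15.5 rather than the 15 written in the question, because the two gaps are 14 and 17 days.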
I don't know what to say other than use a GROUP BY.
SELECT User, AVG(DATEDIFF(...))
FROM ...
GROUP BY User

SQL calculate timediff between intervals including a time from a separate table

I have 2 different tables called observations and intervals.
observations:
id | type      | start
------------------------------------
1 | classroom | 2017-06-07 16:18:40
2 | classroom | 2017-06-01 15:12:00
intervals:
+----+----------------+--------+------+---------------------+
| id | observation_id | number | task | time |
+----+----------------+--------+------+---------------------+
| 1 | 1 | 1 | 1 | 07/06/2017 16:18:48 |
| 2 | 1 | 2 | 0 | 07/06/2017 16:18:55 |
| 3 | 1 | 3 | 1 | 07/06/2017 16:19:00 |
| 4 | 2 | 1 | 3 | 01/06/2017 15:12:10 |
| 5 | 2 | 2 | 1 | 01/06/2017 15:12:15 |
+----+----------------+--------+------+---------------------+
I want a view that will display:
observation_id | time_on_task (total time in seconds where task = 1)
1 | 13
2 | 5
So I must first check whether the first interval of an observation has task = 1. If it does, I must take the difference between that interval's time and the start from the observations table, and add that to the total time. From then on, whenever task = 1, I just add the time difference between the current interval and the previous interval.
I know I can use:
select observation_id, TIME_TO_SEC(TIMEDIFF(max(time),min(time)))
from your_table
group by observation_id
to find the total time in the intervals table between all intervals outside of the first one.
But
1. I need to only include interval times where task = 1. (The endtime for the interval is the one listed)
2. Need the timediff between the first interval and initial start (from observations table) if number = 1
I'm still new to the Stack Overflow community, but you could try the SQL
LAG() function (available in MySQL 8.0 and later).
For instance, using an outer SELECT statement, where MyTable, Col1, Col2, Created_Datetime and SomeCondition are placeholders:
SELECT Col1, Col2, TIMESTAMPDIFF(SECOND, prev.prevtime, prev.Created_Datetime) AS Difference
FROM ( SELECT Col1, Col2, Created_Datetime,
LAG(Created_Datetime) OVER (ORDER BY Created_Datetime) AS prevtime
FROM MyTable
WHERE SomeCondition) AS prev
Sorry if it looks goofy, still trying to learn to format code here.
https://explainextended.com/2009/03/12/analytic-functions-optimizing-lag-lead-first_value-last_value/
Hope it helps
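The accumulation rule described in the question (measure each interval from the previous interval's time, or from the observation start for the first one, and count it only when task = 1) can be sketched in Python as follows. This is an illustration, not a replacement for the SQL view: it assumes the interval times are day/month/year, and the function name time_on_task is invented here.

```python
from datetime import datetime

# observation id -> start (from the observations table)
observations = {
    1: datetime(2017, 6, 7, 16, 18, 40),
    2: datetime(2017, 6, 1, 15, 12, 0),
}

# (observation_id, number, task, time), times read as dd/mm/yyyy
intervals = [
    (1, 1, 1, datetime(2017, 6, 7, 16, 18, 48)),
    (1, 2, 0, datetime(2017, 6, 7, 16, 18, 55)),
    (1, 3, 1, datetime(2017, 6, 7, 16, 19, 0)),
    (2, 1, 3, datetime(2017, 6, 1, 15, 12, 10)),
    (2, 2, 1, datetime(2017, 6, 1, 15, 12, 15)),
]

def time_on_task(observations, intervals, task=1):
    """Map observation_id -> total seconds spent in intervals with the given task."""
    totals = {obs_id: 0 for obs_id in observations}
    previous = dict(observations)  # the first interval is measured from the start
    for obs_id, _number, interval_task, end in sorted(intervals):  # by observation, then number
        if interval_task == task:
            totals[obs_id] += int((end - previous[obs_id]).total_seconds())
        previous[obs_id] = end  # every interval ends where the next one begins
    return totals

print(time_on_task(observations, intervals))  # {1: 13, 2: 5}
```

In SQL terms, "previous" corresponds to LAG(time) per observation_id, falling back to the observation's start when there is no earlier interval.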

How to get the latest items distinctively in a row?

I want to get the remaining (latest) balance of each cardnumber from the rows. Below is a sample of the table.
trans_id | cardnumber | trans_date | balance
---------------------------------------------------------------
1 | 1000005240000008 | 2009-07-03 04:54:27 | 88
2 | 1000005120000008 | 2009-07-04 05:00:07 | 2
3 | 1000005110000008 | 2009-07-05 13:18:39 | 3
4 | 1000005110000008 | 2009-07-06 13:18:39 | 4
5 | 1000005110000008 | 2009-07-07 14:25:32 | 4.5
6 | 1000005120000002 | 2009-07-08 16:50:51 | -1
7 | 1000005240000002 | 2009-07-09 17:03:17 | 1
The result should look like this:
trans_id | cardnumber | trans_date | balance
---------------------------------------------------------------
1 | 1000005110000008 | 2009-07-07 14:25:32 | 4.5
2 | 1000005120000002 | 2009-07-08 16:50:51 | -1
3 | 1000005240000002 | 2009-07-09 17:03:17 | 1
I already have a query but it goes something like this:
SELECT cardnumber, MAX(balance), trans_date
FROM transactions
GROUP BY cardnumber
I really need help on this, I'm having a hard time. :(
Thanks in advance.
Mark
I don't have a MySQL in front of me at the moment, but something like this should work:
SELECT latest.cardnumber, latest.max_trans_date, t2.balance
FROM
(
SELECT t1.cardnumber, MAX(t1.trans_date) AS max_trans_date
FROM transactions t1
GROUP BY t1.cardnumber
) latest
JOIN transactions t2 ON (
latest.cardnumber = t2.cardnumber AND
latest.max_trans_date = t2.trans_date
)
Probably requires 5.0.x or later. There may be a better way. It's 3AM :-D
Almost the same as derobert's, but the other way around. The idea is that you make a subquery that returns each cardnumber with its latest (max) transaction date and then join that with the original table. This of course assumes that no two transactions on the same cardnumber occur at exactly the same time.
SELECT t1.trans_id, t1.cardnumber, t1.trans_date, t1.balance
FROM transactions AS t1
JOIN (SELECT MAX(trans_date) AS max_trans_date, cardnumber FROM transactions GROUP BY cardnumber) AS t2
ON t2.cardnumber = t1.cardnumber AND t2.max_trans_date = t1.trans_date
SELECT * FROM transactions WHERE (cardnumber,trans_date) in (SELECT cardnumber, MAX(trans_date) FROM transactions GROUP BY cardnumber);
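All three queries follow the same pattern: find the latest trans_date per cardnumber, then keep the row that carries it. A minimal Python sketch of that pattern, using the seven sample rows from the question, is shown below; the function name latest_per_card is made up for this illustration.

```python
from datetime import datetime

# Sample rows from the question: (trans_id, cardnumber, trans_date, balance)
transactions = [
    (1, "1000005240000008", datetime(2009, 7, 3, 4, 54, 27), 88),
    (2, "1000005120000008", datetime(2009, 7, 4, 5, 0, 7), 2),
    (3, "1000005110000008", datetime(2009, 7, 5, 13, 18, 39), 3),
    (4, "1000005110000008", datetime(2009, 7, 6, 13, 18, 39), 4),
    (5, "1000005110000008", datetime(2009, 7, 7, 14, 25, 32), 4.5),
    (6, "1000005120000002", datetime(2009, 7, 8, 16, 50, 51), -1),
    (7, "1000005240000002", datetime(2009, 7, 9, 17, 3, 17), 1),
]

def latest_per_card(rows):
    """Return one row per cardnumber: the one with the greatest trans_date."""
    latest = {}
    for row in rows:
        card, when = row[1], row[2]
        # Replace the stored row only when this one is strictly newer.
        if card not in latest or when > latest[card][2]:
            latest[card] = row
    return sorted(latest.values(), key=lambda r: r[1])  # ordered by cardnumber

for trans_id, card, when, balance in latest_per_card(transactions):
    print(trans_id, card, when, balance)
```

With the sample data, card 1000005110000008 resolves to trans_id 5 with balance 4.5, and every card that appears only once keeps its single row.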