SQL Work out the average time difference between total rows - mysql

I've searched around SO and can't seem to find a question with an answer that works for me. I have a table with almost 2 million rows, and each row has a MySQL datetime field.
I'd like to work out (in seconds) how often a row was inserted, so work out the average difference between the dates of all the rows with a SQL query.
Any ideas?
-- EDIT --
Here's what my table looks like
id, name, date (datetime), age, gender

If you want to know how often (on average) a row was inserted, I don't think you need to calculate all the differences. You only need to sum up the differences between adjacent rows (adjacent based on the timestamp) and divide the result by the number of the summands.
The formula
((T_1 - T_0) + (T_2 - T_1) + … + (T_N - T_{N-1})) / N
can obviously be simplified to merely
(T_N - T_0) / N
where N is the number of differences, that is, the number of rows minus one.
So, the query would be something like this:
SELECT TIMESTAMPDIFF(SECOND, MIN(date), MAX(date)) / (COUNT(*) - 1)
FROM atable
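Make sure the table has more than one row, or the divisor will be zero. In MySQL a division by zero in a SELECT yields NULL rather than a number (some other databases raise a division-by-zero error instead). If you would rather get 0 back, you can use a simple trick: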
SELECT
IFNULL(TIMESTAMPDIFF(SECOND, MIN(date), MAX(date)) / NULLIF(COUNT(*) - 1, 0), 0)
FROM atable
Now you can safely run the query against a table with a single row.
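As a quick sanity check on the simplification, here is a minimal Python sketch that computes the average gap both ways. The sample timestamps and the function names are made up for illustration; they only stand in for the date column of the table.

```python
from datetime import datetime

# Hypothetical sample timestamps, standing in for the `date` column.
stamps = sorted([
    datetime(2011, 5, 1, 12, 0, 0),
    datetime(2011, 5, 1, 12, 0, 10),
    datetime(2011, 5, 1, 12, 0, 40),
    datetime(2011, 5, 1, 12, 2, 0),
])

def average_gap_pairwise(ts):
    """Average of the differences between adjacent timestamps, in seconds."""
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    return sum(gaps) / len(gaps)

def average_gap_shortcut(ts):
    """Same value via (max - min) / (count - 1); 0 for fewer than two rows."""
    if len(ts) < 2:
        return 0.0
    return (max(ts) - min(ts)).total_seconds() / (len(ts) - 1)

print(average_gap_pairwise(stamps), average_gap_shortcut(stamps))
```

Both functions return the same value because the intermediate timestamps cancel out in the sum, which is why the query only needs MIN, MAX and COUNT.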

Give this a shot:
select AVG(theDelay) from (
select TIMESTAMPDIFF(SECOND,a.date, b.date) as theDelay
from myTable a
join myTable b on b.date = (select MIN(x.date)
from myTable x
where x.date > a.date)
) p
The inner query joins each row with the next row (by date) and returns the number of seconds between them. That query is then encapsulated and is queried for the average number of seconds.
EDIT: If your ID column is auto-incrementing and they are in date order, you can speed it up a bit by joining to the next ID row rather than the MIN next date.
select AVG(theDelay) from (
select TIMESTAMPDIFF(SECOND,a.date, b.date) as theDelay
from myTable a
join myTable b on b.id = (select MIN(x.id)
from myTable x
where x.id > a.id)
) p
EDIT2: As brilliantly commented by Mikael Eriksson, you may be able to just do:
select TIMESTAMPDIFF(SECOND, MIN(date), MAX(date)) / (COUNT(*) - 1) from myTable
There's a lot you can do with this to eliminate off-peak hours or big spans without a new record, using the join syntax in my first example.
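To illustrate what excluding big spans could look like, here is a small Python sketch of the idea. The function name, the sample data and the one-hour threshold are assumptions chosen for the example, not part of the original answer.

```python
from datetime import datetime, timedelta

def average_gap_ignoring_idle(timestamps, max_gap_seconds):
    """Average gap between adjacent timestamps, skipping gaps above the threshold.

    Returns None when no gap survives the filter.
    """
    ts = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    kept = [g for g in gaps if g <= max_gap_seconds]
    return sum(kept) / len(kept) if kept else None

# Hypothetical data: three quick inserts, a ten-hour pause, then one more insert.
base = datetime(2011, 5, 1, 9, 0, 0)
sample = [base + timedelta(seconds=s) for s in (0, 20, 60, 36060, 36100)]

print(average_gap_ignoring_idle(sample, max_gap_seconds=3600))
```

In the join-based query, this roughly corresponds to adding a condition on theDelay to the inner select, which is not possible with the MIN/MAX shortcut because that one never looks at individual gaps.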

Try this:
select avg(diff) as AverageSecondsBetweenDates
from (
select TIMESTAMPDIFF(SECOND, t1.MyDate, min(t2.MyDate)) as diff
from MyTable t1
inner join MyTable t2 on t2.MyDate > t1.MyDate
group by t1.MyDate
) a

Related

Mysql. how to join two tables from this example?

Hi everybody.
I have two queries.
Query 1 shows the list of dates with the time set to 22:00, taken from one table:
SELECT DATE_FORMAT(tt.create_time,"%Y-%m-%d 22:00:00") AS DAY,tt.id
FROM tick tt
GROUP BY DATE_FORMAT(tt.create_time,"%Y-%m-%d")
Query 2 shows the number of records whose create_time is earlier than the date specified in the query:
SELECT COUNT(*) AS count FROM
(SELECT * FROM
(SELECT * FROM tick_history th
WHERE th.create_time < '2019-04-15 22:00:00'
ORDER BY th.id DESC) AS t1
GROUP BY t1.tick_id) AS t2
WHERE t2.state NOT IN (1,4,9) AND t2.queue = 1
Is it possible to combine these two queries so that the first column holds the dates from the first query and the second column holds the count from the second query for each of those dates?
That is, as if each date were substituted into the second query and the count calculated.
Is it possible? Please help with this query.

Is there a way to locate/detect streaks of data from several columns at the same time in MYSQL?

I've been trying for a while to come up with code that calculates streaks from several columns at the same time, for a table where I need to find streaks of values that are above 0. At first, I managed to use a formula that shows the rungroup, a column counting the rows up to the current one whose value differs from the value in the current row, as shown below:
select descrip,
`1.01`,
(select count(*)
from `all_data` dp
where dp.`1.01` <> dpo.`1.01`
and dp.descrip <= dpo.descrip) as rungroup,
`1.02`,
(select count(*)
from `all_data` dp
where dp.`1.02` <> dpo.`1.02`
and dp.descrip <= dpo.descrip) as rungroup_2
from `all_data` dpo;
1.01 and 1.02 are the names of the columns, and descrip is used to order the data. This model works so far, but I don't know how I could use it within another query to show streaks from both columns. Is there a way to do that?
You should definitely build a UNION query over your two columns.
Something like this
SELECT descrip, thecol,
(select count(*)
from `all_data` dp
where (dp.`1.01` <> dpo.thecol AND dp.`1.02` <> dpo.thecol)
and dp.descrip <= dpo.descrip) as rungroup
FROM
(
SELECT *
FROM
(
SELECT '1.01' AS origcol, descrip, `1.01` AS thecol from `all_data`
UNION ALL
SELECT '1.02' AS origcol, descrip, `1.02` AS thecol from `all_data`
) AS u
ORDER BY descrip, thecol
) dpo;
However I am totally unsure about this part :
where (dp.`1.01` <> dpo.thecol AND dp.`1.02` <> dpo.thecol)
it might be an OR instead of an AND. It is not easy without seeing the data.
Just play around with my query, decompose to get only the UNION subquery, fix... and you'll get it.
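For checking the SQL output against a known-good result, it can help to compute the streaks outside the database first. The following Python sketch does that for runs of values above 0; the function name and the sample values are hypothetical, and the rows are assumed to be already ordered by descrip.

```python
from itertools import groupby

def positive_streaks(values):
    """Return (start_index, length) for each run of consecutive values > 0."""
    streaks = []
    index = 0
    for is_positive, run in groupby(values, key=lambda v: v > 0):
        length = len(list(run))
        if is_positive:
            streaks.append((index, length))
        index += length
    return streaks

# Hypothetical column contents, already ordered by descrip.
columns = {
    "1.01": [0, 2, 3, 0, 1, 1, 1],
    "1.02": [5, 0, 0, 4, 4, 0, 0],
}

for name, values in columns.items():
    print(name, positive_streaks(values))
```

Each column is processed independently here, which mirrors the UNION ALL approach of stacking the columns and treating origcol as a grouping key.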

How to calculate time difference between current and previous row in MySQL

I have a MySQL table t1. What I want to do is calculate the difference between each row and the previous one and save the value in a new column called diff, like this:
TICKETID| DATENEW | DIFF
16743 12:36:46 0
16744 12:51:25 15. minute
16745 12:57:25 6.5 minute
..........
.......
etc
I know there are similar questions, but I've tried all of the solutions posted here with no success, so how can I write this query?
To get the time difference in minutes between the current and previous row, you can use timestampdiff on datenew and the previous time, which you can get via a subquery:
select ticketid, datenew,
timestampdiff(minute,(select datenew from mytable t2
where t2.ticketid < t1.ticketid order by t2.ticketid desc limit 1),datenew) as diff
from mytable t1
Update
Here's another way, using a variable to store the previous datenew value, that might be faster:
select ticketid, datenew, timestampdiff(minute,prevdatenew,datenew) as diff
from (
select ticketid, datenew, @prevDateNew as prevdatenew,
@prevDateNew := datenew
from mytable order by ticketid
) t1
select
t1.*
,coalesce(timestampdiff(MINUTE,t2.dt,t1.dt),0) as tdiff
from t t1 left join t t2
on t1.id = t2.id+1
order by t1.id
As you are only looking for the difference between the current row and the previous one, you can join each row to the previous row and calculate the time difference in minutes.
Note: This assumes there are no missing id's in the table. You might have to change the join condition if there were missing id's.
SQL Fiddle: http://www.sqlfiddle.com/#!9/4dcae/15
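The previous-row logic can also be sketched in plain Python, which makes it easy to see what happens when ids are not contiguous. The rows below are hypothetical (note the missing id 16745), and the function name is made up for the example.

```python
from datetime import datetime

# Hypothetical (ticketid, datenew) rows with a gap in the ids.
rows = [
    (16743, datetime(2015, 1, 1, 12, 36, 46)),
    (16744, datetime(2015, 1, 1, 12, 51, 25)),
    (16746, datetime(2015, 1, 1, 12, 57, 25)),
]

def diffs_to_previous(rows):
    """Return (ticketid, minutes since the previous row), ordered by ticketid.

    The first row has no predecessor and gets 0.0.
    """
    result = []
    previous = None
    for ticketid, datenew in sorted(rows):
        if previous is None:
            minutes = 0.0
        else:
            minutes = (datenew - previous).total_seconds() / 60
        result.append((ticketid, minutes))
        previous = datenew
    return result

for ticketid, minutes in diffs_to_previous(rows):
    print(ticketid, round(minutes, 2))
```

Because the loop walks the rows in order instead of matching on id + 1, a missing id does not break the chain. Note that this sketch keeps fractional minutes, whereas TIMESTAMPDIFF(MINUTE, ...) returns whole minutes only.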

MySQL join date columns with 1-month lag and performance issues

Note: I found this similar question but it does not address my issue, so I do not believe this is a duplicate.
I have two simple MySQL tables (created with the MyISAM engine), Table1 and Table2.
Both of the tables have 3 columns, a date-type column, an integer ID column, and a float value column. Both tables have about 3 million records and are very straightforward.
The contents of the tables looks like this (with Date and Id as primary keys):
Date Id Var1
2012-1-27 1 0.1
2012-1-27 2 0.5
2012-2-28 1 0.6
2012-2-28 2 0.7
(assume Var1 becomes Var2 for the second table).
Note that for each (year, month, ID) triplet, there will only be a single entry. But the actual day of the month that appears is not necessarily the final day, nor is it the final weekday, nor is it the final business day, etc... It's just some day of the month. This day is important as an observation day in other tables, but the day-of-month itself doesn't matter between Table1 and Table2.
Because of this, I cannot rely on Date + INTERVAL 1 MONTH to produce the matching day-of-month for the date it should match to that is one month ahead.
I'm looking to join the two tables on Date and Id, but where the values from the second table (Var2) come from one month ahead of Var1.
This sort of code will accomplish it, but I am noticing a significant performance degradation with this, explained below.
-- This is exceptionally slow for me
SELECT b.Date,
b.Id,
a.Var1,
b.Var2
FROM Table1 a
JOIN Table2 b
ON a.Id = b.Id
AND YEAR(a.Date + INTERVAL 1 MONTH) = YEAR(b.Date)
AND MONTH(a.Date + INTERVAL 1 MONTH) = MONTH(b.Date)
-- This returns quickly, but if I use it as a sub-query
-- then the parent query is very slow.
SELECT Date + INTERVAL 1 MONTH as FutureDate,
Id,
Var1
FROM Table1
-- That is, the above is fast, but this is super slow:
select b.Date,
b.Id,
a.Var1,
b.Var2
FROM (SELECT Date + INTERVAL 1 MONTH as FutureDate,
Id,
Var1
FROM Table1) a
JOIN Table2 b
ON YEAR(a.FutureDate) = YEAR(b.Date)
AND MONTH(a.FutureDate) = MONTH(b.Date)
AND a.Id = b.Id
I've tried re-ordering the JOIN criteria, thinking maybe that matching on Id first in the code would change the query execution plan, but it seems to make no difference.
When I say "super slow", I mean that option #1 from the code above doesn't return the results for all 3 million records even if I wait for over an hour. Option #2 returns in less than 10 minutes, but then option number three takes longer than 1 hour again.
I don't understand why the introduction of the date lag makes it take so long.
How can I:
1. profile the queries to understand why they take a long time?
2. write a better query for joining tables based on a 1-month date lag (where the day-of-month that results from the 1-month lag may cause mismatches)?
Here is an alternative approach:
SELECT b.Date, b.Id,
(select a.var1
from Table1 a
where a.id = b.id and a.date < b.date
order by a.date desc
limit 1
) as var1,
b.Var2
FROM Table2 b;
Be sure the primary index is set up with id first and then date on Table1. Otherwise, create another index Table1(id, date).
Note that this assumes that the preceding date is for the preceding month.
Here's another alternative way to go about this:
SELECT thismonth.Date,
thismonth.Id,
thismonth.Var1 AS Var1_thismonth,
lastmonth.Var1 AS Var1_lastmonth
FROM Table2 AS thismonth
JOIN
(SELECT id, Var1,
DATE(DATE_FORMAT(Date,'%Y-%m-01')) as MonthStart
FROM Table2
) AS lastmonth
ON ( thismonth.id = lastmonth.id
AND thismonth.Date >= lastmonth.MonthStart + INTERVAL 1 MONTH
AND thismonth.Date < lastmonth.MonthStart + INTERVAL 2 MONTH
)
To get this to perform ideally, I think you're going to need a compound covering index on (id, Date, Var1).
It works by generating a derived table containing Id,MonthStart,Var1 and then joining the original table to it by a sequence of range scans. Hence the compound covering index.
The other answers gave very useful tips, but ultimately, without making significant modifications to the index structure of my data (which is not feasible at the moment), those methods would not work faster (in any meaningful sense) than what I had already tried in the question.
Ollie Jones gave me the idea to use date formatting, and coupling that with the TIMESTAMPDIFF function seems to make it passably fast, though I still welcome any comments explaining why the use of YEAR, MONTH, DATE_FORMAT, and TIMESTAMPDIFF have such wildly different performance properties.
SELECT b.Date,
b.Id,
b.Var2,
a.Date,
a.Id,
a.Var1
FROM Table1 a
JOIN Table2 b
ON a.Id = b.Id
AND (TIMESTAMPDIFF(MONTH,
DATE_FORMAT(a.Date, '%Y-%m-01'),
DATE_FORMAT(b.Date, '%Y-%m-01')) = 1)
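The join condition above effectively compares month counters and ignores the day-of-month. A minimal Python sketch of that idea follows; the helper names are hypothetical and the dates are taken from the sample rows in the question plus a year-boundary case.

```python
from datetime import date

def month_index(d):
    """Number of months since year 0, ignoring the day-of-month."""
    return d.year * 12 + (d.month - 1)

def is_one_month_ahead(earlier, later):
    """True when `later` falls in the calendar month right after `earlier`."""
    return month_index(later) - month_index(earlier) == 1

print(is_one_month_ahead(date(2012, 1, 27), date(2012, 2, 28)))
print(is_one_month_ahead(date(2012, 12, 31), date(2013, 1, 1)))
print(is_one_month_ahead(date(2012, 1, 31), date(2012, 3, 1)))
```

Working with a single month number avoids the separate YEAR and MONTH comparisons and handles the December to January rollover without special cases.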

SELECT * FROM table while condition=true?

I want to select rows from a table while a condition is true:
SELECT * FROM (SELECT * FROM `table1` `t1` ORDER BY t1.date) `t2` WHILE t2.id!=5
When the WHILE condition becomes false, it should stop selecting further rows.
Please help; I have already searched a lot, including many similar questions on Stack Overflow, but I can't get it to work.
Please don't suggest WHERE. I want a solution in SQL, not in PHP or anything else.
OK the real problem is here
SELECT *,(SELECT SUM(t2.amount) FROM (select * from transaction as t1 order by t1.date) `t2`) as total_per_transition FROM transaction
Here I want to calculate the total balance at each transaction.
First find the first date where the condition fails, so where id=5:
SELECT date
FROM table1
WHERE id = 5
ORDER BY date
LIMIT 1
Then make the above a derived table (we call it lim) and join it to the original table to get all rows with previous dates: t.date < lim.date
SELECT t.*
FROM table1 AS t
LEFT JOIN
( SELECT date
FROM table1
WHERE id = 5
ORDER BY date
LIMIT 1
) AS lim
ON t.date < COALESCE(lim.date, '9999-12-31') ;
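The same "find the cutoff date, then keep everything before it" logic can be sketched in Python. The function name and the sample rows are assumptions made for the example; date.max plays the role of the '9999-12-31' fallback.

```python
from datetime import date

def rows_before_first_match(rows, stop_id):
    """rows: iterable of (id, date) tuples.

    Returns the rows dated strictly before the earliest row whose id equals
    stop_id, ordered by date. If no row has that id, all rows are returned.
    """
    rows = list(rows)
    stop_dates = [d for row_id, d in rows if row_id == stop_id]
    cutoff = min(stop_dates) if stop_dates else date.max
    return sorted((r for r in rows if r[1] < cutoff), key=lambda r: r[1])

# Hypothetical table contents.
table1 = [
    (1, date(2013, 1, 1)),
    (2, date(2013, 1, 5)),
    (5, date(2013, 1, 9)),
    (7, date(2013, 1, 12)),
]

print(rows_before_first_match(table1, stop_id=5))
print(len(rows_before_first_match(table1, stop_id=99)))
```

As in the SQL version, rows that share the cutoff date are excluded because the comparison is strict.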
The COALESCE() is for the case when there are no rows at all with id=5 - and in that case we want all rows from the table.