MySQL: Average interval between records - mysql

Assume this table:
id date
----------------
1 2010-12-12
2 2010-12-13
3 2010-12-18
4 2010-12-22
5 2010-12-23
How do I find the average intervals between these dates, using MySQL queries only?
For instance, the calculation on this table will be
(
( 2010-12-13 - 2010-12-12 )
+ ( 2010-12-18 - 2010-12-13 )
+ ( 2010-12-22 - 2010-12-18 )
+ ( 2010-12-23 - 2010-12-22 )
) / 4
----------------------------------
= ( 1 DAY + 5 DAY + 4 DAY + 1 DAY ) / 4
= 2.75 DAY

Intuitively, what you are asking should be equivalent to the interval between the first and last dates, divided by the number of dates minus 1.
Let me explain more thoroughly. Imagine the dates are points on a line (+ are dates present, - are dates missing, the first date is the 12th, and I changed the last date to Dec 24th for illustration purposes):
++----+---+-+
Now, what you really want to do, is evenly space your dates out between these lines, and find how long it is between each of them:
+--+--+--+--+
To do that, you simply take the number of days between the last and first days, in this case 24 - 12 = 12, and divide it by the number of intervals you have to space out, in this case 4: 12 / 4 = 3.
With a MySQL query
SELECT DATEDIFF(MAX(dt), MIN(dt)) / (COUNT(dt) - 1) FROM a;
This works on this table (with your values it returns 2.75):
CREATE TABLE IF NOT EXISTS `a` (
`dt` date NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `a` (`dt`) VALUES
('2010-12-12'),
('2010-12-13'),
('2010-12-18'),
('2010-12-22'),
('2010-12-24');
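The reason the shortcut works is that the consecutive differences telescope: every date except the first and last cancels out of the sum. As a rough check outside the database, here is a minimal Python sketch using the dates from the question's table (ending on 2010-12-23). The function names are made up for illustration.

```python
from datetime import date

# Dates from the question's table (the last one is 2010-12-23 there).
dates = [date(2010, 12, 12), date(2010, 12, 13), date(2010, 12, 18),
         date(2010, 12, 22), date(2010, 12, 23)]

def average_of_gaps(ds):
    """Average the differences (in days) between consecutive sorted dates."""
    ds = sorted(ds)
    gaps = [(b - a).days for a, b in zip(ds, ds[1:])]
    return sum(gaps) / len(gaps)

def span_over_count(ds):
    """Mirror DATEDIFF(MAX(dt), MIN(dt)) / (COUNT(dt) - 1)."""
    return (max(ds) - min(ds)).days / (len(ds) - 1)

print(average_of_gaps(dates), span_over_count(dates))
```

Both functions need at least two dates; with fewer, the Python version divides by zero.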

If the ids are uniformly incremented without gaps, join the table to itself on id+1:
SELECT d.id, d.date, n.date, datediff(n.date, d.date)
FROM dates d
JOIN dates n ON(n.id = d.id + 1)
Then GROUP BY and average as needed.
If the ids are not uniform, do an inner query to assign ordered ids first.
I guess you'll also need to add a subquery to get the total number of rows.
Alternatively
Create an aggregate function that keeps track of the previous date, and a running sum and count. You'll still need to select from a subquery to force the ordering by date (actually, I'm not sure if that's guaranteed in MySQL).
Come to think of it, this is a much better way of doing it.
And Even Simpler
Just noting that Vegard's solution is much better.

The following query returns the correct result:
SELECT AVG(
DATEDIFF(i.date, (SELECT MAX(date)
FROM intervals WHERE date < i.date)
)
)
FROM intervals i
but it runs a dependent subquery which might be really inefficient with no index and on a larger number of rows.

You need to do a self join, get the differences using the DATEDIFF function, and average them.

Related

SQL Query to get distinct values from a table and the difference between ordered rows

I have a real time data table with time stamps for different data points
Time_stamp, UID, Parameter1, Parameter2, ....
I have 400 UIDs so each time_stamp is repeated 400 times
I want to write a query that uses this table to check whether the real-time data flow to the SQL database is working as expected: a new timestamp should be available every 5 minutes.
For this, I usually query the DISTINCT values of time_stamp in the table, order them descending, do a visual inspection, and copy them to Excel to calculate the difference in minutes between subsequent distinct time_stamp values.
Any difference over 5 min means I have a problem. I am trying to figure out how I can do something similar in SQL, maybe get a table that looks like this. I tried to use LEAD and DISTINCT together but could not write the code myself; I'm just getting started with SQL.
Time_stamp, LEAD over last timestamp
Thank you for your help
You can use lag analytical function as follows:
select t.* from
(select t.*,
lag(Time_stamp) over (order by Time_stamp) as lg_ts
from your_Table t) t
where timestampdiff('minute',lg_ts,Time_stamp) > 5
Or you can also use the not exists as follows:
select t.*
from your_table t
where not exists
(select 1 from your_table tt
where tt.Time_stamp < t.Time_stamp
and timestampdiff('minute',tt.Time_stamp,t.Time_stamp) <= 5)
and t.Time_stamp <> (select min(tt.Time_stamp) from your_table tt)
lead() or lag() is the right approach (depending on whether you want to see the row at the start or end of the gap).
For the time comparison, I recommend direct comparisons:
select t.*
from (select t.*,
lead(Time_stamp) over (partition by uid order by Time_stamp) as next_time_stamp
from t
) t
where next_time_stamp > time_stamp + interval 5 minute;
Note: exactly 5 minutes seems unlikely. You might want a fudge factor such as:
where next_time_stamp > time_stamp + interval 5*60 + 10 second;
timestampdiff() counts the number of "boundaries" between two values. So, the difference in minutes between 00:00:59 and 00:01:02 is 1. And the difference between 00:00:00 and 00:00:59 is 0.
So, a difference of "5 minutes" could really be 4 minutes and 1 second or could be 5 minutes and 59 seconds.
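To make the "boundaries" idea concrete, here is a small Python sketch that contrasts counting minute boundaries with counting whole elapsed minutes. It only illustrates the two definitions. Whether a given engine's difference function uses one or the other varies, so check its documentation. The function names and the 2024-01-01 date are arbitrary choices for the example.

```python
from datetime import datetime

def boundary_minutes(start, end):
    """Count minute boundaries crossed: truncate both values to the minute first."""
    s = start.replace(second=0, microsecond=0)
    e = end.replace(second=0, microsecond=0)
    return int((e - s).total_seconds() // 60)

def elapsed_minutes(start, end):
    """Count whole minutes of actual elapsed time."""
    return int((end - start).total_seconds() // 60)

a = datetime(2024, 1, 1, 0, 0, 59)
b = datetime(2024, 1, 1, 0, 1, 2)
c = datetime(2024, 1, 1, 0, 0, 0)

print(boundary_minutes(a, b), elapsed_minutes(a, b))  # one boundary, but only 3 seconds elapsed
print(boundary_minutes(c, a), elapsed_minutes(c, a))  # no boundary crossed in 59 seconds
```

Comparing the timestamps directly, as in the query above, avoids depending on either definition.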

Finding records in a range, rounding down when needed

This is a bit difficult to describe, and I'm not sure if this can be done in SQL. Using the following example data set:
ID Count Date
1 0 1/1/2015
2 3 1/5/2015
3 4 1/6/2015
4 3 1/9/2015
5 9 1/15/2015
I want to return records where the Date column falls into a range. But, if the "from" date doesn't exist in the table, I want to use the most recent date as my "From" select. For example, if my date range is between 1/5 and 1/9, I would expect to have records 2,3, and 4 returned. But, if I have a date range of 1/3 - 1/6 I want to return records 1,2,and 3. I want to include record 1 because, as 1/3 does not exist, I want the value of the Count that is rounded down.
Any thoughts on how this can be done? I'm using MySQL.
Basically, you need to replace the from date with the latest date before or on that date. Let me assume that the variables are @v_from and @v_to.
select e.*
from example e
where e.date >= (select max(e2.date) from example e2 where e2.date <= @v_from) and
e.date <= @v_to;
EDIT AFTER EDIT:
SELECT *
FROM TABLE
WHERE DATE BETWEEN (
SELECT Date
FROM TABLE
WHERE Date <= @Start
ORDER BY Date DESC
LIMIT 1
)
AND @End
Or
SELECT *
FROM TABLE
WHERE DATE BETWEEN (
SELECT MAX(Date)
FROM TABLE
WHERE Date <= @Start
)
AND @End
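The "round the start down to the latest existing date" logic can be sketched outside SQL as follows. This is a Python illustration on the question's sample data, not a replacement for the queries. The function name `rows_in_range` is made up, and the dates are read as month/day/year.

```python
from bisect import bisect_right
from datetime import date

# (id, count, date) rows from the question, already sorted by date.
rows = [
    (1, 0, date(2015, 1, 1)),
    (2, 3, date(2015, 1, 5)),
    (3, 4, date(2015, 1, 6)),
    (4, 3, date(2015, 1, 9)),
    (5, 9, date(2015, 1, 15)),
]

def rows_in_range(rows, start, end):
    """Return rows between start and end, rounding start down to an existing date."""
    dates = [r[2] for r in rows]
    # Index of the latest date that is <= start (the MAX(date) subquery).
    i = bisect_right(dates, start) - 1
    # Assumption: if no date is on or before start, fall back to start itself.
    effective_start = dates[i] if i >= 0 else start
    return [r for r in rows if effective_start <= r[2] <= end]

print([r[0] for r in rows_in_range(rows, date(2015, 1, 3), date(2015, 1, 6))])
```

One difference from the SQL: when no row is on or before the start date, the MAX subquery yields NULL and the query returns nothing, whereas this sketch falls back to the requested start.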

MySQL Query - Include dates without records

I have a report that displays a graph. The X axis uses the date from the below query. Where the query returns no date, I am getting gaps and would prefer to return a value. Is there any way to force a date where there are no records?
SELECT
DATE(instime),
CASE
WHEN direction = 1 AND duration > 0 THEN 'Incoming'
WHEN direction = 2 THEN 'Outgoing'
WHEN direction = 1 AND duration = 0 THEN 'Missed'
END AS type,
COUNT(*)
FROM taxticketitem
GROUP BY
DATE(instime),
CASE
WHEN direction = 1 AND duration > 0 THEN 'Incoming'
WHEN direction = 2 THEN 'Outgoing'
WHEN direction = 1 AND duration = 0 THEN 'Missed'
END
ORDER BY DATE(instime)
One possible way is to create a table of dates and LEFT JOIN your table with them. The table could look something like this:
CREATE TABLE `datelist` (
`date` DATE NOT NULL,
PRIMARY KEY (`date`)
);
and filled with all dates between, say Jan-01-2000 through Dec-31-2050 (here is my Date Generator script).
Next, write your query like this:
SELECT datelist.date, COUNT(taxticketitem.id) AS c
FROM datelist
LEFT JOIN taxticketitem ON datelist.date = DATE(taxticketitem.instime)
WHERE datelist.date BETWEEN '2012-01-01' AND '2012-12-31'
GROUP BY datelist.date
ORDER BY datelist.date
Using LEFT JOIN and counting the non-NULL values from the right table ensures that the count is correct (0 if no row exists for a given date).
You would need a set of dates to LEFT JOIN your table to. Unfortunately, MySQL versions before 8.0 (which added recursive CTEs) lack a way to generate one on the fly.
You would need to prepare a table with, say, 100000 consecutive integers from 0 to 99999 (or how long you think your maximum report range would be):
CREATE TABLE series (number INT NOT NULL PRIMARY KEY);
and use it like this:
SELECT '2013-01-01' + INTERVAL number DAY AS r_date, CASE ... END AS type, COUNT(instime)
FROM series s
LEFT JOIN
taxticketitem ti
ON ti.instime >= '2013-01-01' + INTERVAL number DAY
AND ti.instime < '2013-01-01' + INTERVAL number + 1 DAY
WHERE s.number <= DATEDIFF('2013-02-01', '2013-01-01')
GROUP BY
r_date, type
Had to do something similar before.
You need a subselect to generate the full range of dates you want. The easiest way is to add a number to a start date:
SELECT DATE_ADD(SomeStartDate, INTERVAL (a.i + b.i * 10) DAY)
FROM integers a, integers b
Given a table called integers with a single column called i and 10 rows containing 0 to 9, that SQL will give you a range of 100 days starting at SomeStartDate.
You can then left join your actual data against that to get the full range.
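To see why cross-joining a ten-row digits table with itself covers 100 consecutive days, here is a short Python sketch of the same arithmetic. The function name and the 2013-01-01 start date are placeholders for illustration.

```python
from datetime import date, timedelta
from itertools import product

def date_range_from_digits(start):
    """Mimic DATE_ADD(start, INTERVAL (a.i + b.i * 10) DAY) over integers a, integers b."""
    digits = range(10)  # stands in for the `integers` table (rows 0..9)
    # a is the ones digit and b the tens digit, so offsets cover 0..99 exactly once.
    offsets = sorted(a + b * 10 for a, b in product(digits, digits))
    return [start + timedelta(days=n) for n in offsets]

days = date_range_from_digits(date(2013, 1, 1))
print(len(days), days[0], days[-1])
```

Adding a third copy of the digits (multiplied by 100) would extend the range to 1000 days in the same way.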

Some questions about SQL group by week

I have some problems when coding SQL group by week.
I have a MySQL table named order.
In this entity, there are several attributes, called 'order_id', 'order_date', 'amount', etc.
I want to make a table to show the statistics of past 7 days order sales amount.
I think first I should get the today value.
Since I use Java Server Page, the code like this:
Calendar cal = Calendar.getInstance();
int day = cal.get(Calendar.DATE);
int Month = cal.get(Calendar.MONTH) + 1;
int year = cal.get(Calendar.YEAR);
String today = year + "-" + Month + "-" + day;
then, I need to use group by statement to calculate the SUM of past 7 day total sales amount.
like this:
ResultSet rs=statement.executeQuery("select order_date, SUM(amount) " +
"from `testing`.`order` GROUP BY order_date");
I have a problem here. With my SQL, every order_date is displayed.
How can I modify this SQL so that it only displays the sales amounts for the past seven days?
Besides that, I discovered a problem in my original SQL.
That is, if there are no sales on a given day, no result is displayed for it.
Of course, I know the ResultSet will not contain rows that my SQL does not return.
I just want to know: if I need the sales for each of the past 7 days even when the amount is 0 dollars,
is there another way to show the 0?
Please kindly give me advice if you have any ideas.
Thank you.
The usual approach is to create a calendar table containing all dates, using a script or a stored procedure.
However, if you prefer, you can build a table with just a few dates (in your case the dates of the last week) within a single query.
This is an example:
create table orders(
id int not null auto_increment primary key,
dorder date,
amount int
) engine = myisam;
insert into orders (dorder,amount)
values (curdate(),100),
(curdate(),200),
('2011-02-24',50),
('2011-02-24',150),
('2011-02-22',10),
('2011-02-22',20),
('2011-02-22',30),
('2011-02-22',5),
('2011-02-19',10);
select t.cdate,sum(coalesce(o.amount,0)) as total
from (
select curdate() -
interval tmp.digit * 1 day as `cdate`
from (
select 0 as digit union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 ) as tmp) as t
left join orders as o
on t.cdate = o.dorder and o.dorder >= curdate() - interval 7 day
group by t.cdate
order by t.cdate desc
Hope that it helps. Regards.
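For readers who want to check what the query above should return, here is a rough Python sketch of the same idea: build the list of days first, then look up each day's total and default to 0. It assumes "today" is 2011-02-25 (the query itself uses curdate()), and the function name is invented for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

def daily_totals(orders, today, days_back=7):
    """Sum amounts per day for today and the days_back days before it, newest first."""
    totals = defaultdict(int)
    for order_date, amount in orders:
        totals[order_date] += amount
    result = []
    for n in range(days_back + 1):  # digits 0..7, as in the UNION ALL subquery
        day = today - timedelta(days=n)
        result.append((day, totals.get(day, 0)))  # 0 plays the role of COALESCE
    return result

today = date(2011, 2, 25)  # assumed value of curdate() for this example
orders = [
    (today, 100), (today, 200),
    (date(2011, 2, 24), 50), (date(2011, 2, 24), 150),
    (date(2011, 2, 22), 10), (date(2011, 2, 22), 20),
    (date(2011, 2, 22), 30), (date(2011, 2, 22), 5),
    (date(2011, 2, 19), 10),
]
for day, total in daily_totals(orders, today):
    print(day, total)
```

Note that digits 0 through 7 produce eight rows (today plus the seven days before it), which matches the query.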
To answer your question "How can I modify this SQL so that only display past seven days order sale amount?"
Modify the SQL statement by adding a where clause to it:
WHERE order_date >= @date_7days_ago
The value for this @date_7days_ago variable can be set before your statement (MySQL syntax):
SET @date_7days_ago = DATE_SUB(CURDATE(), INTERVAL 7 DAY);
Adding that where clause to your query will return only those records which order date is in the last seven days.
Hope this helps.
You can try using this:
ResultSet rs = statement.executeQuery(
"SELECT IFNULL(SUM(amount),0) " +
"FROM `testing`.`order` " +
"WHERE order_date >= DATE_SUB('" + today + "', INTERVAL 7 DAY)"
);
This will get you the total sales amount for the last 7 days, and 0 if there were none.

Help needed optimizing MySQL SELECT query

I have a MySQL table like this one:
day int(11)
hour int(11)
amount int(11)
Day is an integer with a value that spans from 0 to 365; assume hour is a timestamp and amount is just a simple integer. What I want to do is select the value of the amount field for a certain group of days (for example from 0 to 10), but I only need the last value of amount available for each day, which in practice is the row where the hour field has its max value within that day. This doesn't sound too hard, but the solution I came up with is completely inefficient.
Here it is:
SELECT q.day, q.amount
FROM amt_table q
WHERE q.day >= 0 AND q.day <= 4 AND q.hour = (
SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day
) GROUP BY day
It takes 5 seconds to execute that query on an 11k-row table, and that is for a span of just 5 days; I may need to select a span of an entire month or year, so this is not a valid solution.
Any help finding another solution or optimizing this one is really appreciated.
EDIT
No indexes are set, but (day, hour, amount) could be a PRIMARY KEY if needed
Use:
SELECT a.day,
a.amount
FROM AMT_TABLE a
JOIN (SELECT t.day,
MAX(t.hour) AS max_hour
FROM AMT_TABLE t
GROUP BY t.day) b ON b.day = a.day
AND b.max_hour = a.hour
WHERE a.day BETWEEN 0 AND 4
I think you're using the GROUP BY a.day just to get a single amount value per day, but it's not reliable because in MySQL, columns not in the GROUP BY are arbitrary -- the value could change. Sadly, MySQL versions before 8.0 don't support analytic (window) functions (ROW_NUMBER, etc.), which is what you'd typically use for cases like these.
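The join above amounts to "find the max hour per day once, then pick the matching row". Here is a minimal single-pass Python sketch of that idea. The sample rows and the function name are invented for illustration and are not from the question.

```python
def last_amount_per_day(rows, first_day, last_day):
    """Return {day: amount} using the row with the largest hour for each day."""
    latest = {}  # day -> (hour, amount) of the latest row seen so far
    for day, hour, amount in rows:
        if first_day <= day <= last_day:
            if day not in latest or hour > latest[day][0]:
                latest[day] = (hour, amount)
    return {day: amount for day, (hour, amount) in sorted(latest.items())}

# Hypothetical (day, hour, amount) rows.
rows = [(0, 100, 5), (0, 200, 7), (1, 150, 3), (1, 90, 9), (5, 10, 1)]
print(last_amount_per_day(rows, 0, 4))
```

If two rows share the same max hour for a day, the SQL join returns both, while this sketch keeps the first one it sees.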
Look at indexes on the primary keys first, then add indexes on the columns used to join tables together. Composite indexes (more than one column to an index) are an option too.
I think the problem is the subquery in the WHERE clause. MySQL runs "SELECT MAX(p.hour) FROM amt_table p WHERE p.day = q.day" as a dependent subquery, re-evaluating it for the rows of the outer query, and with no index each evaluation has to scan the whole table. Not quite efficient :-)