Difficult MySQL Query - Getting Max difference between dates - mysql

I have a MySQL table of the following form
account_id | call_date
1 2013-06-07
1 2013-06-09
1 2013-06-21
2 2012-05-01
2 2012-05-02
2 2012-05-06
I want to write a MySQL query that will get the maximum difference (in days) between successive dates in call_date for each account_id. So for the above example, the result of this query would be
account_id | max_diff
1 12
2 4
I'm not sure how to do this. Is this even possible to do in a MySQL query?
I can do datediff(max(call_date),min(call_date)) but this would ignore dates in between the first and last call dates. I need some way of getting the datediff() between each successive call_date for each account_id, then finding the maximum of those.

I'm sure fp's answer will be faster, but just for fun...
SELECT account_id
, MAX(diff) max_diff
FROM
( SELECT x.account_id
, DATEDIFF(MIN(y.call_date),x.call_date) diff
FROM my_table x
JOIN my_table y
ON y.account_id = x.account_id
AND y.call_date > x.call_date
GROUP
BY x.account_id
, x.call_date
) z
GROUP
BY account_id;

CREATE TABLE t
(`account_id` int, `call_date` date)
;
INSERT INTO t
(`account_id`, `call_date`)
VALUES
(1, '2013-06-07'),
(1, '2013-06-09'),
(1, '2013-06-21'),
(2, '2012-05-01'),
(2, '2012-05-02'),
(2, '2012-05-06')
;
select account_id, max(diff) from (
select
account_id,
timestampdiff(day, coalesce(@prev, call_date), call_date) diff,
@prev := call_date
from
t
, (select @prev:=null) v
order by account_id, call_date
) sq
group by account_id
| ACCOUNT_ID | MAX(DIFF) |
|------------|-----------|
| 1 | 12 |
| 2 | 4 |
see it working live in an sqlfiddle

If you have an index on account_id, call_date, then you can do this rather efficiently without variables:
-- DATEDIFF() is used instead of "-": subtracting DATE values in MySQL treats them as numbers (YYYYMMDD), not as days
select account_id, max(datediff(call_date, prev_call_date)) as diff
from (select t.*,
(select t2.call_date
from t t2
where t2.account_id = t.account_id and t2.call_date < t.call_date
order by t2.call_date desc
limit 1
) as prev_call_date
from t
) x
group by account_id;

Just for educational purposes, doing it with JOIN:
SELECT t1.account_id,
MAX(DATEDIFF(t2.call_date, t1.call_date)) AS max_diff
FROM t t1
LEFT JOIN t t2
ON t2.account_id = t1.account_id
AND t2.call_date > t1.call_date
LEFT JOIN t t3
ON t3.account_id = t1.account_id
AND t3.call_date > t1.call_date
AND t3.call_date < t2.call_date
WHERE t3.account_id IS NULL
GROUP BY t1.account_id
Since you didn't specify, this shows max_diff of NULL for accounts with only 1 call.

SELECT a1.account_id , max(datediff(a1.call_date, a2.call_date)) AS max_diff
FROM account a2, account a1
WHERE a1.account_id = a2.account_id
AND a1.call_date > a2.call_date
AND NOT EXISTS
(SELECT 1 FROM account a3 WHERE a3.account_id = a1.account_id AND a1.call_date > a3.call_date AND a2.call_date < a3.call_date)
GROUP BY a1.account_id
Which gives :
ACCOUNT_ID MAX_DIFF
1 12
2 4
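The answers above predate window functions in MySQL. As a minimal sketch, assuming MySQL 8.0 or later and the table t (account_id, call_date) created in the fiddle above, LAG() can fetch the previous call date per account directly, so no self-join or user variable is needed. The aliases prev_call_date and gaps are only illustrative names.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions).
-- Assumes the table t (account_id, call_date) from the CREATE TABLE above.
SELECT account_id,
       MAX(DATEDIFF(call_date, prev_call_date)) AS max_diff
FROM (
    SELECT account_id,
           call_date,
           -- previous call of the same account; NULL for the first call
           LAG(call_date) OVER (PARTITION BY account_id
                                ORDER BY call_date) AS prev_call_date
    FROM t
) AS gaps
GROUP BY account_id;
```

DATEDIFF() returns NULL when prev_call_date is NULL and MAX() skips NULLs, so an account with a single call gets a max_diff of NULL, the same behaviour as the LEFT JOIN answer.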

Related

Select the last price according to the date using GROUP BY

I'm trying to write a query with a GROUP BY.
Here is an example of my table ticket:
id DtSell Price Qt
1 01-01-2017 3.00 1
1 02-01-2017 2.00 3
2 01-01-2017 5.00 5
2 02-01-2017 8.00 2
And my request :
SELECT id, Price, sum(Qt) FROM ticket
GROUP BY id;
but unfortunately, the price returned is not necessarily the right one; I would like to have the last price according to DtSell like that :
id Price sum(Qt)
1 2.00 4
2 8.00 7
But I couldn't find how to do it.
Can you help me?
Thank you in advance!
You might need a subquery; try the one below:
SELECT
t1.id,
(SELECT t2.price FROM ticket t2 WHERE t2.id=t1.id
ORDER BY t2.DtSell DESC LIMIT 1 ) AS price,
SUM(t1.Qt)
FROM ticket t1 GROUP BY t1.id;
You can do this with a group_concat()/substring_index() trick:
SELECT id, SUM(Qt),
SUBSTRING_INDEX(GROUP_CONCAT(price ORDER BY dtsell DESC), ',', 1) as last_price
FROM ticket
GROUP BY id;
Two notes:
This is subject to internal limits on the length of the intermediate string used for GROUP_CONCAT() (the group_concat_max_len setting, which can easily be changed).
It changes the type of price to a string.
Try this query.
SELECT id, Price, sum(Qt) FROM ticket
GROUP BY id,Price
Your Output;
id Price sum(Qt)
1 3.00 4
2 8.00 7
You can select all rows from ticket grouped by id ( to sum quantity), then join to the rows which have the max dtsell for each id group( to select the price).
http://sqlfiddle.com/#!9/574cb9/8
SELECT t.id
, t3.price
, SUM(t.Qt)
FROM ticket t
JOIN ( SELECT t1.id
, t1.price
FROM ticket t1
JOIN ( SELECT id
, MAX(dtsell) dtsell
FROM ticket
GROUP BY id ) t2
ON t1.id = t2.id
AND t1.dtsell = t2.dtsell ) t3
ON t3.id = t.id
GROUP BY t.id;
You can do it like this (SQL Server syntax, using a table variable):
declare @t table (id int, dtsell date, price numeric(18,2),qt int)
insert into @t
values
(1 ,'01-01-2017', 3.00 , 1),
(1 ,'02-01-2017', 2.00 , 3),
(2 ,'01-01-2017', 5.00 , 5),
(2 ,'02-01-2017', 8.00 , 2)
select x.id,price,z.Qt from (
select id,price,dtsell,row_number() over(partition by id order by dtsell desc ) as rn from @t
)x
inner join (select SUM(qt) as Qt,ID from @t group by id ) z on x.id = z.id
where rn = 1
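The same ROW_NUMBER() idea can be written for MySQL itself. This is a sketch under two assumptions that are not stated in the question: the server is MySQL 8.0 or later, and DtSell is a real DATE column (if it is stored as 'dd-mm-yyyy' text, the ordering would be wrong). The aliases ranked, last_price and total_qt are illustrative.

```sql
-- Sketch only: requires MySQL 8.0+ and assumes ticket.DtSell is a DATE.
SELECT id,
       -- only the row ranked 1 (latest DtSell) contributes a price
       MAX(CASE WHEN rn = 1 THEN Price END) AS last_price,
       SUM(Qt) AS total_qt
FROM (
    SELECT id, Price, Qt,
           ROW_NUMBER() OVER (PARTITION BY id
                              ORDER BY DtSell DESC) AS rn
    FROM ticket
) AS ranked
GROUP BY id;
```

Because the ranking and the sum come from a single pass over the derived table, no second join back to ticket is needed.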

Counting between dates

I need the count of all dates including the nonexistent
SELECT ifnull(COUNT(*),0) as num , date_format(c.dataCupo,"%d/%m/%Y") as data
FROM cupons c
WHERE c.dataCupo between "2017-02-02" AND "2018-05-04" AND c.proveidor!="VINCULADO" and c.empresa=1
group by date_format(c.dataCupo,"%Y-%m-%d")
//And I need to count all months including the nonexistent
SELECT ifnull(COUNT(*),0) as num , date_format(c.dataCupo,"%m/%Y") as data
FROM cupons c
WHERE c.dataCupo between "2017-02-02" AND "2018-05-04" AND c.proveidor!="VINCULADO" and c.empresa=1
group by date_format(c.dataCupo,"%Y-%m")
//And I need to count of all years including the nonexistent
SELECT ifnull(COUNT(*),0) as num , date_format(c.dataCupo,"%Y") as data
FROM cupons c
WHERE c.dataCupo between "2015-02-02" AND "2018-05-04" AND c.proveidor!="VINCULADO" and c.empresa=1
group by date_format(c.dataCupo,"%Y")
The result I want is:
02/02/2017 | 10
03/02/2017 | 0
04/02/2017 | 2
05/02/2017 | 0
....
AND
02/2017 | 50
03/2017 | 0
04/2017 | 10
AND
2015 | 0
2016 | 10
2017 | 15
2018 | 0
Easiest way to do this is with a Calendar table. This table will have a datetime column that you can join to and is really useful for reporting. Here goes an example of how to make one in MySQL.
https://gist.github.com/bryhal/4129042
Now that you have the Calendar table, you can join to it to find counts of all dates in a date range.
All days example:
select ifnull(t.num,0) as num, td.db_date
FROM
time_dimension td
left join
(SELECT ifnull(COUNT(*),0) as num , c.dataCupo as data
FROM cupons c
WHERE c.dataCupo between "2017-02-02" AND "2018-05-04" AND
c.proveidor!="VINCULADO" and c.empresa=1
group by c.dataCupo) t
on t.data = td.db_date
WHERE td.db_date between "2017-02-02" AND "2018-05-04"
All months example:
select
ifnull(sum(t.num),0) as num,
CONCAT(month(td.db_date),"-",year(td.db_date))
FROM
time_dimension td
left join
(SELECT
COUNT(*) as num ,
c.dataCupo as data
FROM cupons c
WHERE c.dataCupo between "2017-02-02" AND "2018-05-04" AND
c.proveidor!="VINCULADO" and c.empresa=1
group by c.dataCupo) t
on t.data = td.db_date
WHERE td.db_date between "2017-02-02" AND "2018-05-04"
group by CONCAT(month(td.db_date),"-",year(td.db_date))
You should create a temporary table that stores every date in your date range:
CREATE TEMPORARY TABLE IF NOT EXISTS AllDateRange engine=memory
SELECT DATE(cal.date) Date
FROM (
SELECT ( case when @prmToDate = @prmFromDate then @prmFromDate else
SUBDATE( @prmFromDate, INTERVAL (DATEDIFF(@prmToDate,@prmFromDate)) DAY) + INTERVAL xc DAY end ) AS Date
FROM (
SELECT @xi:=@xi+1 as xc from
(SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc1,
(SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc2,
(SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc3,
(SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4) xc4,
(SELECT @xi:=+1) xc0
) xxc1
) cal WHERE DATE( cal.date) >= DATE(@prmFromDate) and DATE( cal.date) <= DATE(@prmToDate) ;
And then join it with your table. The filters on cupons go in the ON clause so that dates without rows are kept with a count of 0:
SELECT count(c.empresa) as num , date_format(a.Date,"%d/%m/%Y") as data from AllDateRange a
left join cupons c on a.Date=date(c.dataCupo)
AND c.proveidor!="VINCULADO" and c.empresa=1
group by a.Date;
Similarly, create temporary tables for months and years and join them with your primary table as above to get the results per month and per year.
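If a permanent calendar table or a temporary table is not an option, the date list can also be generated inline. The following is a sketch, assuming MySQL 8.0 or later (recursive CTEs) and the column names from the question; the CTE name days and its column d are made up for illustration.

```sql
-- Sketch only: requires MySQL 8.0+ (WITH RECURSIVE).
WITH RECURSIVE days (d) AS (
    SELECT DATE('2017-02-02')            -- first day of the range
    UNION ALL
    SELECT d + INTERVAL 1 DAY            -- add one day per iteration
    FROM days
    WHERE d < '2018-05-04'               -- stop at the last day
)
SELECT DATE_FORMAT(days.d, '%d/%m/%Y') AS data,
       COUNT(c.dataCupo)               AS num   -- 0 when no coupon matches
FROM days
LEFT JOIN cupons c
       ON DATE(c.dataCupo) = days.d
      AND c.proveidor <> 'VINCULADO'
      AND c.empresa = 1
GROUP BY days.d
ORDER BY days.d;
```

The range here is under 1000 days, which fits the default recursion limit; for longer ranges the cte_max_recursion_depth session variable would need to be raised. Grouping by DATE_FORMAT(days.d, '%Y-%m') or YEAR(days.d) instead should give the per-month and per-year variants.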

How would I return the result of SQL math operations?

So I was taking a test recently with some higher level SQL problems. I only have what I would consider "intermediate" experience in SQL and I've been working on this for a day or so now. I just can't figure it out.
Here's the problem:
You have a table with 4 columns as such:
EmployeeID int unique
EmployeeType int
EmployeeSalary int
Created date
Goal: I need to retrieve the difference between the latest two EmployeeSalary for any EmployeeType with more than 1 entry. It has to be done in one statement (nested queries are fine).
Example Data Set: http://sqlfiddle.com/#!9/0dfc7
EmployeeID | EmployeeType | EmployeeSalary | Created
-----------|--------------|----------------|--------------------
1 | 53 | 50 | 2015-11-15 00:00:00
2 | 66 | 20 | 2014-11-11 04:20:23
3 | 66 | 30 | 2015-11-03 08:26:21
4 | 66 | 10 | 2013-11-02 11:32:47
5 | 78 | 70 | 2009-11-08 04:47:47
6 | 78 | 45 | 2006-11-01 04:42:55
So for this data set, the proper return would be:
EmployeeType | EmployeeSalary
-------------|---------------
66 | 10
78 | 25
The 10 comes from subtracting the latest two EmployeeSalary values (30 - 20) for the EmployeeType of 66. The 25 comes from subtracting the latest two EmployeeSalary values (70-45) for EmployeeType of 78. We skip EmployeeID 53 completely because it only has one value.
This one has been destroying my brain. Any clues?
Thanks!
How to make really simple query complex?
One funny way(not best performance) to do it is:
SELECT final.EmployeeType, SUM(salary) AS difference
FROM (
SELECT b.EmployeeType, b.EmployeeSalary AS salary
FROM tab b
JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
FROM tab
GROUP BY EmployeeType
HAVING COUNT(*) > 1) AS sub
ON b.EmployeeType = sub.EmployeeType
AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 1
UNION ALL
SELECT b.EmployeeType, -b.EmployeeSalary AS salary
FROM tab b
JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
FROM tab
GROUP BY EmployeeType
HAVING COUNT(*) > 1) AS sub
ON b.EmployeeType = sub.EmployeeType
AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 2
) AS final
GROUP BY final.EmployeeType;
SqlFiddleDemo
EDIT:
The key point is that MySQL (before version 8.0) doesn't support window functions, so you need to use equivalent code:
For example solution in SQL Server:
SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary
ELSE -EmployeeSalary END) AS difference
FROM (SELECT *,
ROW_NUMBER() OVER(PARTITION BY EmployeeType ORDER BY Created DESC) AS rn
FROM #tab
) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1
LiveDemo
And MySQL equivalent:
SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary
ELSE -EmployeeSalary END) AS difference
FROM (
SELECT t1.EmployeeType, t1.EmployeeSalary,
count(t2.Created) + 1 as rn
FROM tab t1
LEFT JOIN tab t2
ON t1.EmployeeType = t2.EmployeeType
AND t1.Created < t2.Created
GROUP BY t1.EmployeeType, t1.EmployeeSalary
) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1;
LiveDemo2
The dataset of the fiddle is different from the example above, which is confusing (not to mention a little perverse). Anyway, there's lots of ways to skin this particular cat. Here's one (not the fastest, however):
SELECT a.employeetype, ABS(a.employeesalary-b.employeesalary) diff
FROM
( SELECT x.*
, COUNT(*) rank
FROM employees x
JOIN employees y
ON y.employeetype = x.employeetype
AND y.created >= x.created
GROUP
BY x.employeetype
, x.created
) a
JOIN
( SELECT x.*
, COUNT(*) rank
FROM employees x
JOIN employees y
ON y.employeetype = x.employeetype
AND y.created >= x.created
GROUP
BY x.employeetype
, x.created
) b
ON b.employeetype = a.employeetype
AND b.rank = a.rank+1
WHERE a.rank = 1;
A very similar but faster solution looks like this (although you sometimes need to use different variables for tables a and b, for reasons I still don't fully understand)...
SELECT a.employeetype
, ABS(a.employeesalary-b.employeesalary) diff
FROM
( SELECT x.*
, CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := x.employeetype prev
FROM employees x
, (SELECT @prev := 0, @i:=1) vars
ORDER
BY x.employeetype
, x.created DESC
) a
JOIN
( SELECT x.*
, CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := x.employeetype prev
FROM employees x
, (SELECT @prev := 0, @i:=1) vars
ORDER
BY x.employeetype
, x.created DESC
) b
ON b.employeetype = a.employeetype
AND b.i = a.i + 1
WHERE a.i = 1;
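For completeness, here is how the same result might be obtained with window functions. This is a sketch, assuming MySQL 8.0 or later and a table named employees with the columns from the question; the aliases prev_salary, rn, w and ranked are illustrative.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions, named windows).
SELECT EmployeeType,
       EmployeeSalary - prev_salary AS difference
FROM (
    SELECT EmployeeType,
           EmployeeSalary,
           -- rows are ordered newest first, so LEAD() is the second-latest salary
           LEAD(EmployeeSalary) OVER w AS prev_salary,
           ROW_NUMBER()         OVER w AS rn
    FROM employees
    WINDOW w AS (PARTITION BY EmployeeType ORDER BY Created DESC)
) AS ranked
WHERE rn = 1                      -- keep only the latest row per type
  AND prev_salary IS NOT NULL;    -- skip types with a single entry
```

With the example data from the question this should yield 10 for type 66 (30 - 20) and 25 for type 78 (70 - 45), and no row for type 53. Note that this returns latest minus previous, which can be negative; wrap it in ABS() if only the magnitude matters.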

How to select next record value in sql result

EDIT: To clarify, there are many users and each user has many records; this is a log table of user activities.
How do I find the timestamp difference between every record and the subsequent record that satisfies some condition?
For example, assuming the table is something like this:
| id |u_id| .. | timestamp |
|----|----|----|--------------------|
| 50 | 1 | .. | 2014-04-22 15:35:44|
| 90 | 2 | .. | 2014-04-22 13:35:44|
| .. | .. | .. | ..... |
How do I find the time difference between every record and the next record for only one user id ?
Assuming that you want to do this for all users, the easiest way is to use variables:
select t.*,
if(u_id = @u_id, timediff(`timestamp`, @timestamp), NULL) as diff,
@timestamp := `timestamp`, @u_id := u_id
from table t cross join
(select @timestamp := 0, @u_id := 0) var
order by u_id, timestamp;
It is important that you explicitly order the records to be sure that the processing occurs in sequential order.
Try
select timediff(`timestamp`, @lasttime),
@lasttime := `timestamp`
from your_table
cross join (select @lasttime := 0) d
where u_id = 1
order by id
There are a couple of ways you can do this, the first would be to use a correlated subquery:
SELECT T.id,
T.u_id,
timestamp,
( SELECT T2.timestamp
FROM T AS T2
WHERE T2.u_id = T.u_id
AND T2.timestamp > T.timestamp
ORDER BY T2.Timestamp
LIMIT 1
) AS NextTimeStamp
FROM T;
Or you could do this using JOIN.
SELECT T.id,
T.u_id,
T.timestamp,
T2.timestamp AS NextTimeStamp
FROM T
LEFT JOIN T AS T2
ON T2.u_id = T.u_id
AND T2.timestamp > T.timestamp
LEFT JOIN T AS T3
ON T3.u_id = T.u_id
AND T3.timestamp > T.timestamp
AND T3.timestamp < T2.timestamp
WHERE T3.id IS NULL;
Which one is best will depend on your actual requirements, amount of data, and indexes.
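A third option, sketched here on the assumption that the server is MySQL 8.0 or later, is LEAD(), which reads the next row within each user's partition without a subquery or self-join. The table name T follows the two queries above; next_timestamp, diff and w are illustrative names.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions).
SELECT id,
       u_id,
       `timestamp`,
       LEAD(`timestamp`) OVER w                        AS next_timestamp,
       -- NULL for the last record of each user
       TIMEDIFF(LEAD(`timestamp`) OVER w, `timestamp`) AS diff
FROM T
WINDOW w AS (PARTITION BY u_id ORDER BY `timestamp`);
```

To restrict the output to one user, add WHERE u_id = 1 before the WINDOW clause. To compare only records that satisfy some condition, put that condition in the WHERE clause as well, so the non-matching rows are removed before LEAD() is evaluated.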

MySQL query, MAX() + GROUP BY

Daft SQL question. I have a table like so ('pid' is auto-increment primary col)
CREATE TABLE theTable (
`pid` INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
`timestamp` TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
`cost` INT UNSIGNED NOT NULL,
`rid` INT NOT NULL
) Engine=InnoDB;
Actual table data:
INSERT INTO theTable (`pid`, `timestamp`, `cost`, `rid`)
VALUES
(1, '2011-04-14 01:05:07', 1122, 1),
(2, '2011-04-14 00:05:07', 2233, 1),
(3, '2011-04-14 01:05:41', 4455, 2),
(4, '2011-04-14 01:01:11', 5566, 2),
(5, '2011-04-14 01:06:06', 345, 1),
(6, '2011-04-13 22:06:06', 543, 2),
(7, '2011-04-14 01:14:14', 5435, 3),
(8, '2011-04-14 01:10:13', 6767, 3)
;
I want to get the PID of the latest row for each rid (1 result per unique RID). For the sample data, I'd like:
pid | MAX(timestamp) | rid
-----------------------------------
5 | 2011-04-14 01:06:06 | 1
3 | 2011-04-14 01:05:41 | 2
7 | 2011-04-14 01:14:14 | 3
I've tried running the following query:
SELECT MAX(timestamp),rid,pid FROM theTable GROUP BY rid
and I get:
max(timestamp) ; rid; pid
----------------------------
2011-04-14 01:06:06; 1 ; 1
2011-04-14 01:05:41; 2 ; 3
2011-04-14 01:14:14; 3 ; 7
The PID returned is always the first occurrence of PID for an RID (row / pid 1 is the first time rid 1 is used, row / pid 3 is the first time rid 2 is used, row / pid 7 is the first time rid 3 is used). Although the query returns the max timestamp for each rid, the pids are not the pids that belong to those timestamps in the original table. What query would give me the results I'm looking for?
(Tested in PostgreSQL 9.something)
Identify the rid and timestamp.
select rid, max(timestamp) as ts
from test
group by rid;
1 2011-04-14 18:46:00
2 2011-04-14 14:59:00
Join to it.
select test.pid, test.cost, test.timestamp, test.rid
from test
inner join
(select rid, max(timestamp) as ts
from test
group by rid) maxt
on (test.rid = maxt.rid and test.timestamp = maxt.ts)
select *
from (
select `pid`, `timestamp`, `cost`, `rid`
from theTable
order by `timestamp` desc
) as mynewtable
group by mynewtable.`rid`
order by mynewtable.`timestamp`
Hope I helped !
SELECT t.pid, t.cost, t.timestamp, t.rid
FROM test as t
JOIN (
SELECT rid, max(timestamp) AS maxtimestamp
FROM test GROUP BY rid
) AS tmax
ON t.rid = tmax.rid and t.timestamp = tmax.maxtimestamp
I created an index on rid and timestamp.
SELECT test.pid, test.cost, test.timestamp, test.rid
FROM theTable AS test
LEFT JOIN theTable maxt
ON maxt.rid = test.rid
AND maxt.timestamp > test.timestamp
WHERE maxt.rid IS NULL
Showing rows 0 - 2 (3 total, Query took 0.0104 sec)
This method will select all the desired values from theTable (test), left joining itself (maxt) on all timestamps higher than the one on test with the same rid. When the timestamp is already the highest one on test there are no matches on maxt - which is what we are looking for - values on maxt become NULL. Now we use the WHERE clause maxt.rid IS NULL or any other column on maxt.
You could also have subqueries like that:
SELECT ( SELECT MIN(t2.pid)
FROM test t2
WHERE t2.rid = t.rid
AND t2.timestamp = maxtimestamp
) AS pid
, MAX(t.timestamp) AS maxtimestamp
, t.rid
FROM test t
GROUP BY t.rid
But this way, you'll need one more subquery if you want cost included in the shown columns, etc.
So the GROUP BY and JOIN approach is the better solution.
If you want to avoid a JOIN, you can use:
SELECT pid, rid FROM theTable t1 WHERE t1.pid IN ( SELECT MAX(t2.pid) FROM theTable t2 GROUP BY t2.rid);
Try:
select pid,cost, timestamp, rid from theTable order by timestamp DESC limit 2;
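On newer servers this "latest row per group" problem can also be handled with ROW_NUMBER(). The following is a sketch, assuming MySQL 8.0 or later and the theTable definition from the question; the aliases rn and ranked are illustrative, and the pid DESC tie-breaker is my own assumption about which row to prefer when two rows share a timestamp.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions).
SELECT pid, `timestamp`, cost, rid
FROM (
    SELECT pid, `timestamp`, cost, rid,
           -- rank rows of each rid from newest to oldest
           ROW_NUMBER() OVER (PARTITION BY rid
                              ORDER BY `timestamp` DESC, pid DESC) AS rn
    FROM theTable
) AS ranked
WHERE rn = 1
ORDER BY rid;
```

For the sample data this should return pids 5, 3 and 7 for rids 1, 2 and 3. Unlike the MAX(pid) variant, it does not rely on pid increasing together with timestamp.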