There are a ton of other related questions, but I couldn't quite apply them to my situation. I am using Sequelize, so I'm trying to do this with as few hits to the DB as possible.
My (simplified) table is basically:
| id | dateAcquired | staffId | skillId |
| ---- | ------------ | ------- | ------- |
| 44 | 2017-04-27 | 25 | 2 |
| 341 | 2018-02-01 | 28 | 2 |
| 4787 | 2018-04-04 | 25 | 2 |
| 8852 | 2020-01-31 | 28 | 2 |
I am looking for just the id of the most recent dateAcquired per staffId. (note that the most recent one might be a lower id, otherwise I would have had a solution)
4787
8852
Just in SQL using this query I get the correct date but not the correct id:
SELECT id, staffId, max(`dateAcquired`)
FROM `histories` AS `history`
WHERE `history`.`skillId` = '2'
GROUP BY `staffId`, id
Results in:
44 2018-04-04
341 2020-01-31
Although I know it needs tweaking once I get the query right, my sequelize code is:
models.history.findAll(
{
attributes: [sequelize.fn("max", sequelize.col('dateAcquired')), 'id'],
where: {skillId:id},
group: ["id"]
}
).then(maxIds => {
let ids = maxIds.map(result =>{return result.id;});
models.history.findAll({
include: [
{
model:models.staff
}
],
where: {
id: {
[Op.in]: [ids]
}
}
}).then(results =>
{
res.json(results);
})
})
Thanks for your help!
With NOT EXISTS:
select h.* from histories h
where h.skillid = 2
and not exists (
select 1 from histories
where skillid = h.skillid and staffid = h.staffid and dateAcquired > h.dateAcquired
)
See the demo.
Results:
| id | dateAcquired | staffId | skillId |
| ---- | ------------------- | ------- | ------- |
| 4787 | 2018-04-04 00:00:00 | 25 | 2 |
| 8852 | 2020-01-31 00:00:00 | 28 | 2 |
You can try this (SQL Server syntax):
Create table #test
(id int,
dateAcquired date,
staffid int,
skillid int
)
Insert into #test values (44 , '2017-04-27' , 25 , 2)
Insert into #test values (341 , '2018-02-01' , 28 , 2)
Insert into #test values (4787 , '2018-04-04' , 25 , 2)
Insert into #test values (8852 , '2020-01-31' , 28 , 2)
select id, dateacquired
from
(
select id, dateacquired,
-- number each staff member's rows from newest to oldest
ROW_NUMBER() over (partition by staffid order by dateacquired desc) rn
from #test
where skillid = 2
) a where rn = 1
Query
SELECT t.*
FROM my_table t
LEFT JOIN my_table t2 ON t2.staffId = t.staffId AND t2.skillId = t.skillId AND t2.dateAcquired > t.dateAcquired
WHERE t2.id IS NULL
AND t.skillId = 2;
Explanation
What happens is that each row from t joins onto any rows where the staffId matches and dateAcquired is greater. The only rows that don't join are the ones with the highest values in dateAcquired. We then filter out everything that does join in the WHERE clause.
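As a quick check of the anti-join idea, here is a minimal sketch that runs it against the question's sample rows in an in-memory SQLite database. The table name `histories`, the helper `latest_ids_per_staff`, and the extra `skillId` match in the join are assumptions made for illustration; SQLite stands in for MySQL here.

```python
import sqlite3

# Sample rows from the question: (id, dateAcquired, staffId, skillId).
ROWS = [
    (44, "2017-04-27", 25, 2),
    (341, "2018-02-01", 28, 2),
    (4787, "2018-04-04", 25, 2),
    (8852, "2020-01-31", 28, 2),
]


def latest_ids_per_staff(skill_id):
    """Return the ids of the most recent row per staffId for one skill."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE histories ("
        "id INTEGER PRIMARY KEY, dateAcquired TEXT, "
        "staffId INTEGER, skillId INTEGER)"
    )
    conn.executemany("INSERT INTO histories VALUES (?, ?, ?, ?)", ROWS)
    # A row survives only if no later row exists for the same staff and skill.
    query = """
        SELECT t.id
        FROM histories t
        LEFT JOIN histories t2
               ON t2.staffId = t.staffId
              AND t2.skillId = t.skillId
              AND t2.dateAcquired > t.dateAcquired
        WHERE t2.id IS NULL
          AND t.skillId = ?
        ORDER BY t.id
    """
    ids = [row[0] for row in conn.execute(query, (skill_id,))]
    conn.close()
    return ids


print(latest_ids_per_staff(2))  # [4787, 8852]
```

The ISO `YYYY-MM-DD` strings compare correctly as text, which is why plain `>` works on the date column in this sketch.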
Several possible query patterns to return the specified result
Using an inline view (derived table) to get the latest (maximum) date_acquired for each staffid, and then a join to the base table to get the row(s) that have that latest date_acquired.
SELECT MAX(h.`id`) AS `id`
, h.`staffid`
, h.`dateacquired`
FROM ( SELECT l.`staffid`
, MAX(l.`dateacquired`) AS `max_dateacquired`
FROM `histories` l
WHERE l.`skillid` = '2'
GROUP BY l.`staffid`
) m
JOIN `histories` h
ON h.`staffid` = m.`staffid`
AND h.`dateacquired` = m.`max_dateacquired`
AND h.`skillid` = '2'
GROUP
BY h.`staffid`
, h.`dateacquired`
MySQL 8.0 introduced window functions, which give us another way to get the specified result:
WITH w AS
( SELECT h.id
, h.staffid
, h.dateacquired
, ROW_NUMBER() OVER(PARTITION BY h.staffid ORDER BY h.dateacquired DESC, h.id DESC) AS _rn
FROM `histories` h
WHERE h.skillid = '2'
)
SELECT w.id
, w.staffid
, w.dateacquired
FROM w
WHERE _rn = 1
ORDER
BY w.staffid
We could also use an anti-join pattern, to retrieve rows where there isn't a later dateacquired... assuming id is unique in histories (or at least the (staffid,dateacquired,id) tuple is unique)
SELECT h.id
, h.staffid
, h.dateacquired
FROM `histories` h
-- anti-join
LEFT
JOIN `histories` l
ON l.skillid = '2'
AND l.staffid = h.staffid
AND l.dateacquired >= h.dateacquired
AND ( l.dateacquired > h.dateacquired OR l.id > h.id )
WHERE l.staffid IS NULL
AND h.skillid = '2'
ORDER
BY h.staffid
We could accomplish the same thing, re-writing the anti-join as a NOT EXISTS
SELECT h.id
, h.staffid
, h.dateacquired
FROM `histories` h
WHERE h.skillid = '2'
AND NOT EXISTS
( SELECT 1
FROM `histories` l
WHERE l.skillid = '2'
AND l.staffid = h.staffid
AND l.dateacquired >= h.dateacquired
AND ( l.dateacquired > h.dateacquired OR l.id > h.id )
)
(Note that some of these queries could be simplified a tiny bit if we have a guarantee that (staffid,skillid,dateacquired) tuple is unique. All of the queries above do not assume such a guarantee.)
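To illustrate why the tiebreaker on id matters, here is a small sketch with made-up rows where one staff member has two rows on the same latest date. The data, the variable names, and the use of SQLite instead of MySQL are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE histories ("
    "id INTEGER PRIMARY KEY, dateacquired TEXT, "
    "staffid INTEGER, skillid INTEGER)"
)
# Hypothetical data: ids 10 and 11 tie on the latest date for staff 25.
conn.executemany(
    "INSERT INTO histories VALUES (?, ?, ?, ?)",
    [
        (10, "2018-04-04", 25, 2),
        (11, "2018-04-04", 25, 2),
        (12, "2017-01-01", 25, 2),
        (20, "2020-01-31", 28, 2),
    ],
)

# Only "strictly later" rows disqualify a row, so tied rows all survive.
NO_TIEBREAK = """
    SELECT h.id
    FROM histories h
    WHERE h.skillid = 2
      AND NOT EXISTS (
            SELECT 1 FROM histories l
            WHERE l.skillid = 2
              AND l.staffid = h.staffid
              AND l.dateacquired > h.dateacquired)
    ORDER BY h.id
"""

# A tied row with a higher id also disqualifies, leaving one row per staff.
WITH_TIEBREAK = """
    SELECT h.id
    FROM histories h
    WHERE h.skillid = 2
      AND NOT EXISTS (
            SELECT 1 FROM histories l
            WHERE l.skillid = 2
              AND l.staffid = h.staffid
              AND l.dateacquired >= h.dateacquired
              AND (l.dateacquired > h.dateacquired OR l.id > h.id))
    ORDER BY h.id
"""

without_tiebreak = [r[0] for r in conn.execute(NO_TIEBREAK)]
with_tiebreak = [r[0] for r in conn.execute(WITH_TIEBREAK)]
print(without_tiebreak)  # [10, 11, 20]
print(with_tiebreak)     # [11, 20]
```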
Related
So I was taking a test recently with some higher level SQL problems. I only have what I would consider "intermediate" experience in SQL and I've been working on this for a day or so now. I just can't figure it out.
Here's the problem:
You have a table with 4 columns as such:
EmployeeID int unique
EmployeeType int
EmployeeSalary int
Created date
Goal: I need to retrieve the difference between the latest two EmployeeSalary for any EmployeeType with more than 1 entry. It has to be done in one statement (nested queries are fine).
Example Data Set: http://sqlfiddle.com/#!9/0dfc7
EmployeeID | EmployeeType | EmployeeSalary | Created
-----------|--------------|----------------|--------------------
1 | 53 | 50 | 2015-11-15 00:00:00
2 | 66 | 20 | 2014-11-11 04:20:23
3 | 66 | 30 | 2015-11-03 08:26:21
4 | 66 | 10 | 2013-11-02 11:32:47
5 | 78 | 70 | 2009-11-08 04:47:47
6 | 78 | 45 | 2006-11-01 04:42:55
So for this data set, the proper return would be:
EmployeeType | EmployeeSalary
-------------|---------------
66 | 10
78 | 25
The 10 comes from subtracting the latest two EmployeeSalary values (30 - 20) for the EmployeeType of 66. The 25 comes from subtracting the latest two EmployeeSalary values (70 - 45) for the EmployeeType of 78. We skip EmployeeType 53 completely because it only has one value.
This one has been destroying my brain. Any clues?
Thanks!
How to make really simple query complex?
One funny way (not the best performance) to do it is:
SELECT final.EmployeeType, SUM(salary) AS difference
FROM (
SELECT b.EmployeeType, b.EmployeeSalary AS salary
FROM tab b
JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
FROM tab
GROUP BY EmployeeType
HAVING COUNT(*) > 1) AS sub
ON b.EmployeeType = sub.EmployeeType
AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 1
UNION ALL
SELECT b.EmployeeType, -b.EmployeeSalary AS salary
FROM tab b
JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
FROM tab
GROUP BY EmployeeType
HAVING COUNT(*) > 1) AS sub
ON b.EmployeeType = sub.EmployeeType
AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 2
) AS final
GROUP BY final.EmployeeType;
SqlFiddleDemo
EDIT:
The key point is that MySQL (before version 8.0) doesn't support window functions, so you need to use equivalent code:
For example solution in SQL Server:
SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary
ELSE -EmployeeSalary END) AS difference
FROM (SELECT *,
ROW_NUMBER() OVER(PARTITION BY EmployeeType ORDER BY Created DESC) AS rn
FROM #tab
) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1
LiveDemo
And MySQL equivalent:
SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary
ELSE -EmployeeSalary END) AS difference
FROM (
SELECT t1.EmployeeType, t1.EmployeeSalary,
count(t2.Created) + 1 as rn
FROM tab t1
LEFT JOIN tab t2
ON t1.EmployeeType = t2.EmployeeType
AND t1.Created < t2.Created
GROUP BY t1.EmployeeType, t1.EmployeeSalary
) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1;
LiveDemo2
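The counting self-join used above as a stand-in for ROW_NUMBER() can be checked against the question's sample data. The following is a minimal sketch using an in-memory SQLite database; the table name `tab`, the helper name, and the extra `EmployeeID` in the inner GROUP BY (so that two rows with equal salaries are not merged) are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question: (EmployeeID, EmployeeType, EmployeeSalary, Created).
ROWS = [
    (1, 53, 50, "2015-11-15 00:00:00"),
    (2, 66, 20, "2014-11-11 04:20:23"),
    (3, 66, 30, "2015-11-03 08:26:21"),
    (4, 66, 10, "2013-11-02 11:32:47"),
    (5, 78, 70, "2009-11-08 04:47:47"),
    (6, 78, 45, "2006-11-01 04:42:55"),
]


def latest_two_salary_diff():
    """Return (EmployeeType, latest salary minus previous salary) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tab (EmployeeID INTEGER PRIMARY KEY, "
        "EmployeeType INTEGER, EmployeeSalary INTEGER, Created TEXT)"
    )
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", ROWS)
    # rn = 1 + number of newer rows of the same type, i.e. a manual row number.
    query = """
        SELECT EmployeeType,
               SUM(CASE rn WHEN 1 THEN EmployeeSalary
                           ELSE -EmployeeSalary END) AS difference
        FROM (
            SELECT t1.EmployeeType, t1.EmployeeSalary,
                   COUNT(t2.Created) + 1 AS rn
            FROM tab t1
            LEFT JOIN tab t2
                   ON t1.EmployeeType = t2.EmployeeType
                  AND t1.Created < t2.Created
            GROUP BY t1.EmployeeID, t1.EmployeeType, t1.EmployeeSalary
        ) AS sub
        WHERE rn IN (1, 2)
        GROUP BY EmployeeType
        HAVING COUNT(*) > 1
        ORDER BY EmployeeType
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(latest_two_salary_diff())  # [(66, 10), (78, 25)]
```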
The dataset of the fiddle is different from the example above, which is confusing (not to mention a little perverse). Anyway, there's lots of ways to skin this particular cat. Here's one (not the fastest, however):
SELECT a.employeetype, ABS(a.employeesalary-b.employeesalary) diff
FROM
( SELECT x.*
, COUNT(*) rank
FROM employees x
JOIN employees y
ON y.employeetype = x.employeetype
AND y.created >= x.created
GROUP
BY x.employeetype
, x.created
) a
JOIN
( SELECT x.*
, COUNT(*) rank
FROM employees x
JOIN employees y
ON y.employeetype = x.employeetype
AND y.created >= x.created
GROUP
BY x.employeetype
, x.created
) b
ON b.employeetype = a.employeetype
AND b.rank = a.rank+1
WHERE a.rank = 1;
A very similar but faster solution looks like this (although you sometimes need to assign different variables between tables a and b, for reasons I still don't fully understand)...
SELECT a.employeetype
, ABS(a.employeesalary-b.employeesalary) diff
FROM
( SELECT x.*
, CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := x.employeetype prev
FROM employees x
, (SELECT @prev := 0, @i:=1) vars
ORDER
BY x.employeetype
, x.created DESC
) a
JOIN
( SELECT x.*
, CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := x.employeetype prev
FROM employees x
, (SELECT @prev := 0, @i:=1) vars
ORDER
BY x.employeetype
, x.created DESC
) b
ON b.employeetype = a.employeetype
AND b.i = a.i + 1
WHERE a.i = 1;
I need a "little" help with an SQL query (MySQL).
I have the following tables:
COURIERS table:
+------------+
| COURIER_ID |
+------------+
DELIVERIES table:
+-------------+------------+------------+
| DELIVERY_ID | COURIER_ID | START_DATE |
+-------------+------------+------------+
ORDERS table:
+----------+-------------+-------------+
| ORDER_ID | DELIVERY_ID | FINISH_DATE |
+----------+-------------+-------------+
COORDINATES table:
+-------------+-----+-----+------+
| DELIVERY_ID | LAT | LNG | DATE |
+-------------+-----+-----+------+
In the real database I have more columns in each table, but for this example the above columns are enough.
What do I need?
An SQL query that returns all couriers [COURIER_ID], their last
delivery [DELIVERY_ID] (based on last START_DATE), the
delivery's last coordinate [LAT and LNG] (based on last DATE) and the remaining orders count (total of orders of the last delivery that have no FINISH_DATE).
A courier can have no deliveries, in this case I want DELIVERY_ID =
NULL, LAT = NULL and LNG = NULL in the result.
A delivery can have no coordinates, in this case I want LAT = NULL
and LNG = NULL in the result.
What was I able to do?
SELECT c.`COURIER_ID`,
d.`DELIVERY_ID`,
r.`LAT`,
r.`LNG`,
(SELECT COUNT(DISTINCT `ORDER_ID`)
FROM `ORDERS`
WHERE `DELIVERY_ID` = d.`DELIVERY_ID`
AND `FINISH_DATE` IS NULL) AS REMAINING_ORDERS
FROM `COURIERS` AS c
LEFT JOIN `DELIVERIES` AS d USING (`COURIER_ID`)
LEFT JOIN `COORDINATES` AS r ON r.`DELIVERY_ID` = d.`DELIVERY_ID`
WHERE (CASE WHEN
(SELECT MAX(`START_DATE`)
FROM `DELIVERIES`
WHERE `COURIER_ID` = c.`COURIER_ID`) IS NULL THEN d.`START_DATE` IS NULL ELSE d.`START_DATE` =
(SELECT MAX(`START_DATE`)
FROM `DELIVERIES`
WHERE `COURIER_ID` = c.`COURIER_ID`) END)
AND (CASE WHEN
(SELECT MAX(`DATE`)
FROM `COORDINATES`
WHERE `DELIVERY_ID` = d.`DELIVERY_ID`) IS NULL THEN r.`DATE` IS NULL ELSE r.`DATE` =
(SELECT MAX(`DATE`)
FROM `COORDINATES`
WHERE `DELIVERY_ID` = d.`DELIVERY_ID`) END)
GROUP BY c.`COURIER_ID`
ORDER BY d.`START_DATE` DESC
The problem is that this query is very slow (from 5 to 20 seconds) when I have over 5k COORDINATES, and sometimes it does not return all couriers.
Thank you so much for any solution.
Try this:
SELECT C.COURIER_ID, A.DELIVERY_ID, A.START_DATE,
B.LAT, B.LNG, B.DATE, O.NoOfOrders
FROM COURIERS C
-- Relies on MySQL's non-standard GROUP BY: rejected when ONLY_FULL_GROUP_BY
-- is enabled, and the row kept per group is not guaranteed.
LEFT JOIN ( SELECT *
FROM (SELECT *
FROM DELIVERIES D
ORDER BY D.COURIER_ID, D.START_DATE DESC
) A
GROUP BY COURIER_ID
) AS A ON C.COURIER_ID = A.COURIER_ID
LEFT JOIN ( SELECT *
FROM (SELECT *
FROM COORDINATES CO
ORDER BY CO.DELIVERY_ID, CO.DATE DESC
) B
GROUP BY DELIVERY_ID
) AS B ON A.DELIVERY_ID = B.DELIVERY_ID
LEFT JOIN ( SELECT O.DELIVERY_ID, COUNT(1) NoOfOrders
FROM ORDERS O WHERE FINISH_DATE IS NULL
GROUP BY O.DELIVERY_ID
) AS O ON A.DELIVERY_ID = O.DELIVERY_ID;
I haven't been able to test this query since I don't have a mysql database set up right now, much less with this schema and sample data. But I think this will work for you:
select
c.courier_id
, d.delivery_id
, co.lat
, co.lng
, oc.cnt as remaining_orders
from
couriers c
left join (
select
d.delivery_id
, d.courier_id
from
deliveries d
inner join (
select
d.courier_id
, max(d.start_date) as start_date
from
deliveries d
group by
d.courier_id
) dmax on dmax.courier_id = d.courier_id and dmax.start_date = d.start_date
) d on d.courier_id = c.courier_id
left join (
select
c.delivery_id
, c.lat
, c.lng
from
coordinates c
inner join (
select
c.delivery_id
, max(c.date) as date
from
coordinates c
group by
c.delivery_id
) cmax on cmax.delivery_id = c.delivery_id and cmax.date = c.date
) co on co.delivery_id = d.delivery_id
left join (
select
o.delivery_id
, count(o.order_id) as cnt
from
orders o
where
o.finish_date is null
group by
o.delivery_id
) oc on oc.delivery_id = d.delivery_id
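Since the query above was posted untested, here is a minimal sketch that exercises the same join structure on a few made-up rows in an in-memory SQLite database. The sample data, the helper name `courier_summary`, and the `COALESCE(..., 0)` on the count are assumptions for illustration; the latest delivery is picked per courier, which is what the question asks for.

```python
import sqlite3


def courier_summary():
    """Return (courier_id, delivery_id, lat, lng, remaining_orders) per courier."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE couriers (courier_id INTEGER PRIMARY KEY);
        CREATE TABLE deliveries (delivery_id INTEGER PRIMARY KEY,
                                 courier_id INTEGER, start_date TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             delivery_id INTEGER, finish_date TEXT);
        CREATE TABLE coordinates (delivery_id INTEGER, lat REAL,
                                  lng REAL, date TEXT);

        -- Hypothetical data: courier 3 has no deliveries,
        -- delivery 20 has neither coordinates nor orders.
        INSERT INTO couriers VALUES (1), (2), (3);
        INSERT INTO deliveries VALUES (10, 1, '2020-01-01'),
                                      (11, 1, '2020-02-01'),
                                      (20, 2, '2020-01-15');
        INSERT INTO coordinates VALUES (10, 9.0, 9.0, '2020-01-01 09:00'),
                                       (11, 1.0, 2.0, '2020-02-01 10:00'),
                                       (11, 3.0, 4.0, '2020-02-01 11:00');
        INSERT INTO orders VALUES (100, 11, NULL),
                                  (101, 11, '2020-02-02'),
                                  (102, 11, NULL),
                                  (103, 10, NULL);
    """)
    query = """
        SELECT c.courier_id, d.delivery_id, co.lat, co.lng,
               COALESCE(oc.cnt, 0) AS remaining_orders
        FROM couriers c
        LEFT JOIN (
            -- latest delivery per courier
            SELECT d.delivery_id, d.courier_id
            FROM deliveries d
            JOIN (SELECT courier_id, MAX(start_date) AS start_date
                  FROM deliveries GROUP BY courier_id) dmax
              ON dmax.courier_id = d.courier_id
             AND dmax.start_date = d.start_date
        ) d ON d.courier_id = c.courier_id
        LEFT JOIN (
            -- latest coordinate per delivery
            SELECT x.delivery_id, x.lat, x.lng
            FROM coordinates x
            JOIN (SELECT delivery_id, MAX(date) AS date
                  FROM coordinates GROUP BY delivery_id) xmax
              ON xmax.delivery_id = x.delivery_id
             AND xmax.date = x.date
        ) co ON co.delivery_id = d.delivery_id
        LEFT JOIN (
            -- unfinished orders per delivery
            SELECT delivery_id, COUNT(order_id) AS cnt
            FROM orders
            WHERE finish_date IS NULL
            GROUP BY delivery_id
        ) oc ON oc.delivery_id = d.delivery_id
        ORDER BY c.courier_id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


for row in courier_summary():
    print(row)
```

With this data, courier 1 resolves to delivery 11 with its latest coordinate and two unfinished orders, while couriers 2 and 3 come back with NULLs in the columns that have no matching rows.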
I want to return the date column for each of the rows where max() is used within the SELECT. Or maybe there is a better way of doing this?
This is how I imagine it:
SELECT
MAX(time) as time, [date column from max(time) row] as timedate,
MAX(distance) as distance, [date column from max(distance) row] as distancedate,
MAX(weight) as weight, [date column from max(weight) row] as weightdate
Here is my current SQL, this does not return the date for each of the MAX() rows.
$db->query("SELECT e.id as id, e.name, MAX(ue.time) as time, MAX(ue.weight) as weight, MAX(ue.distance) as distance
FROM `users exercises` as ue
LEFT JOIN `exercises` as e ON exerciseid = e.id
GROUP BY e.id
LIMIT 30");
id | exerciseid | date | weight | distance | time
----------------------------------------------------------
1 | 1 | 2014-06-14 | 100 | 33 | null
2 | 1 | 2013-03-03 | 500 | 11 | null
3 | 1 | 2014-11-11 | null | null | 41
Current Output:
Array
(
[id] => 1
[name] => run
[time] => 41
[weight] => 500
[distance] => 33
)
Expected Output:
Array
(
[id] => 1
[name] => run
[time] => 41
[time_date] => 2014-11-11
[weight] => 500
[weight_date] => 2013-03-03
[distance] => 33
[distance_date] => 2014-06-14
)
SQL Fiddle: http://sqlfiddle.com/#!2/75e53/1
SELECT e.id as id, e.name,
MAX(ue.time) as time,
(
select date
from `users exercises`
WHERE time = MAX(ue.time) AND ue.`userid` = $userid
LIMIT 1
) as time_date,
MAX(ue.weight) as weight,
(
select date
from `users exercises`
WHERE weight = MAX(ue.weight) AND ue.`userid` = $userid
LIMIT 1
) as weight_date,
MAX(ue.distance) as distance,
(
select date
from `users exercises`
WHERE distance = MAX(ue.distance) AND ue.`userid` = $userid
LIMIT 1
) as distance_date
FROM `users exercises` as ue
LEFT JOIN `exercises` as e ON exerciseid = e.id
WHERE ue.`userid` = $userid
GROUP BY e.id
LIMIT 30
There's probably a more efficient way to do this, but sadly my MySQL skills aren't that good; however the code below does what you want:
Solution 1
select
mx.time
, t.date as timedate
, mx.distance
, d.date as distancedate
, mx.weight
, w.date as weightdate
from
(
SELECT
MAX(`time`) as `time`
, MAX(`distance`) as `distance`
, MAX(`weight`) as `weight`
from `users exercises`
) as mx
inner join `users exercises` as t on t.time = mx.time
inner join `users exercises` as d on d.distance = mx.distance
inner join `users exercises` as w on w.weight = mx.weight;
Solution 2
select
mx.time
, (select date from `users exercises` as x where x.time = mx.time limit 1) as timedate
, mx.distance
, (select date from `users exercises` as y where y.distance = mx.distance limit 1) as distancedate
, mx.weight
, (select date from `users exercises` as z where z.weight = mx.weight limit 1) as weightdate
from
(
SELECT
MAX(`time`) as `time`
, MAX(`distance`) as `distance`
, MAX(`weight`) as `weight`
from `users exercises`
) as mx;
For anyone using a database that supports PARTITION BY (window functions), there is a better way of implementing this; sadly, MySQL versions before 8.0 do not support that functionality.
SQL Fiddle: http://sqlfiddle.com/#!2/75e53/13
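The correlated-subquery approach from Solution 2 can also be applied per exercise, which is closer to the grouped output in the question. This is a minimal sketch on an in-memory SQLite database; the table name `user_exercises`, the helper name, and the extra row for exercise 2 are assumptions for illustration.

```python
import sqlite3

# Rows shaped like the question's table: (id, exerciseid, date, weight, distance, time).
ROWS = [
    (1, 1, "2014-06-14", 100, 33, None),
    (2, 1, "2013-03-03", 500, 11, None),
    (3, 1, "2014-11-11", None, None, 41),
    (4, 2, "2015-01-01", 80, None, None),  # hypothetical extra exercise
]


def maxima_with_dates():
    """Return each exercise's max time/weight/distance with the date it was set."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_exercises (id INTEGER PRIMARY KEY, "
        "exerciseid INTEGER, date TEXT, weight INTEGER, "
        "distance INTEGER, time INTEGER)"
    )
    conn.executemany("INSERT INTO user_exercises VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    # mx holds one row of maxima per exercise; each subquery looks up the date
    # of a row that carries that maximum (LIMIT 1 picks one if several tie).
    query = """
        SELECT mx.exerciseid,
               mx.time,
               (SELECT x.date FROM user_exercises x
                 WHERE x.exerciseid = mx.exerciseid AND x.time = mx.time
                 LIMIT 1) AS time_date,
               mx.weight,
               (SELECT y.date FROM user_exercises y
                 WHERE y.exerciseid = mx.exerciseid AND y.weight = mx.weight
                 LIMIT 1) AS weight_date,
               mx.distance,
               (SELECT z.date FROM user_exercises z
                 WHERE z.exerciseid = mx.exerciseid AND z.distance = mx.distance
                 LIMIT 1) AS distance_date
        FROM (SELECT exerciseid,
                     MAX(time) AS time,
                     MAX(weight) AS weight,
                     MAX(distance) AS distance
              FROM user_exercises
              GROUP BY exerciseid) AS mx
        ORDER BY mx.exerciseid
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


for row in maxima_with_dates():
    print(row)
```

A column whose maximum is NULL (no values recorded) gets a NULL date, because `x.time = NULL` never matches.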
I have a MySQL table of the following form
account_id | call_date
1 2013-06-07
1 2013-06-09
1 2013-06-21
2 2012-05-01
2 2012-05-02
2 2012-05-06
I want to write a MySQL query that will get the maximum difference (in days) between successive dates in call_date for each account_id. So for the above example, the result of this query would be
account_id | max_diff
1 12
2 4
I'm not sure how to do this. Is this even possible to do in a MySQL query?
I can do datediff(max(call_date),min(call_date)) but this would ignore dates in between the first and last call dates. I need some way of getting the datediff() between each successive call_date for each account_id, then finding the maximum of those.
I'm sure fp's answer will be faster, but just for fun...
SELECT account_id
, MAX(diff) max_diff
FROM
( SELECT x.account_id
, DATEDIFF(MIN(y.call_date),x.call_date) diff
FROM my_table x
JOIN my_table y
ON y.account_id = x.account_id
AND y.call_date > x.call_date
GROUP
BY x.account_id
, x.call_date
) z
GROUP
BY account_id;
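To see the self-join approach produce the expected 12 and 4, here is a minimal sketch on an in-memory SQLite database. SQLite has no DATEDIFF, so the sketch assumes `julianday()` differences as a substitute; the table name `calls` and the helper name are also assumptions.

```python
import sqlite3

# Sample rows from the question: (account_id, call_date).
ROWS = [
    (1, "2013-06-07"),
    (1, "2013-06-09"),
    (1, "2013-06-21"),
    (2, "2012-05-01"),
    (2, "2012-05-02"),
    (2, "2012-05-06"),
]


def max_gap_days():
    """Return (account_id, largest gap in days between successive calls)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (account_id INTEGER, call_date TEXT)")
    conn.executemany("INSERT INTO calls VALUES (?, ?)", ROWS)
    # For each call, MIN(y.call_date) is the next call on the same account;
    # the outer query keeps the largest of those gaps per account.
    query = """
        SELECT account_id, MAX(diff) AS max_diff
        FROM (
            SELECT x.account_id,
                   CAST(julianday(MIN(y.call_date))
                        - julianday(x.call_date) AS INTEGER) AS diff
            FROM calls x
            JOIN calls y
              ON y.account_id = x.account_id
             AND y.call_date > x.call_date
            GROUP BY x.account_id, x.call_date
        ) z
        GROUP BY account_id
        ORDER BY account_id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(max_gap_days())  # [(1, 12), (2, 4)]
```

Accounts with a single call produce no rows in the inner join, so they are absent from the result rather than reported with a NULL gap.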
CREATE TABLE t
(`account_id` int, `call_date` date)
;
INSERT INTO t
(`account_id`, `call_date`)
VALUES
(1, '2013-06-07'),
(1, '2013-06-09'),
(1, '2013-06-21'),
(2, '2012-05-01'),
(2, '2012-05-02'),
(2, '2012-05-06')
;
select account_id, max(diff) from (
select
account_id,
timestampdiff(day, coalesce(@prev, call_date), call_date) diff,
@prev := call_date
from
t
, (select @prev:=null) v
order by account_id, call_date
) sq
group by account_id
| ACCOUNT_ID | MAX(DIFF) |
|------------|-----------|
| 1 | 12 |
| 2 | 4 |
see it working live in an sqlfiddle
If you have an index on account_id, call_date, then you can do this rather efficiently without variables:
select account_id, max(datediff(call_date, prev_call_date)) as diff
from (select t.*,
(select t2.call_date
from my_table t2
where t2.account_id = t.account_id and t2.call_date < t.call_date
order by t2.call_date desc
limit 1
) as prev_call_date
from my_table t
) t
group by account_id;
Just for educational purposes, doing it with JOIN:
SELECT t1.account_id,
MAX(DATEDIFF(t2.call_date, t1.call_date)) AS max_diff
FROM t t1
LEFT JOIN t t2
ON t2.account_id = t1.account_id
AND t2.call_date > t1.call_date
LEFT JOIN t t3
ON t3.account_id = t1.account_id
AND t3.call_date > t1.call_date
AND t3.call_date < t2.call_date
WHERE t3.account_id IS NULL
GROUP BY t1.account_id
Since you didn't specify, this shows max_diff of NULL for accounts with only 1 call.
SELECT a1.account_id , max(a1.call_date - a2.call_date)
FROM account a2, account a1
WHERE a1.account_id = a2.account_id
AND a1.call_date > a2.call_date
AND NOT EXISTS
(SELECT 1 FROM account a3 WHERE a3.account_id = a1.account_id AND a1.call_date > a3.call_date AND a2.call_date < a3.call_date)
GROUP BY a1.account_id
Which gives :
ACCOUNT_ID MAX(A1.CALL_DATE - A2.CALL_DATE)
1 12
2 4
I have two tables
create table item( id int )
insert into item ( id ) values ( 1 ), ( 2 ), ( 3 )
create table itemstatus
(
itemid int
, ts datetime
, "status" int
)
insert into itemstatus ( itemid, ts, status ) values
( 1, '2013-12-01T12:00:00.000', 1 ),
( 1, '2013-12-01T11:00:00.000', 2 ),
( 1, '2014-01-01T12:00:00.000', 1 ),
( 2, '2011-01-01T12:00:00.000', 1 )
I'd like to get all items with the last status set, in this case
1, '2014-01-01T12:00:00.000', 1
2, '2011-01-01T12:00:00.000', 1
3, NULL, NULL
What's the most efficient way to solve this?
I tried a subselect and got the latest timestamp, but I can't add the status because that field is not included in an aggregate function or the GROUP BY. If I add it, the results are grouped by status (logically), which means I get too many result rows and would have to add a further condition or subselect.
You may use the Fiddle-link for created tables and testdata. The second query includes the status-field.
Edit:
adding a further join does the trick, but I doubt that's the way to do it.
select
i.*
, d.*
, s.status
from
item i
left join ( select ts = max(ts), itemid from itemstatus group by itemid ) d
on 1 = 1
and i.id = d.itemid
left join itemstatus s
on 1 = 1
and s.itemid = d.itemid
and s.ts = d.ts
See SQL-fiddle for testing.
You can use row_number partitioned by itemid and ordered by ts desc to get the latest registration in itemstatus per itemid.
select I.id,
S.ts,
S.status
from item as I
left outer join (
select S.status,
S.ts,
S.itemid,
row_number() over(partition by S.itemid
order by S.ts desc) as rn
from itemstatus as S
) as S
on I.id = S.itemid and
S.rn = 1
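The row_number approach can be tried on the question's data with a minimal sketch like the one below. It uses an in-memory SQLite database rather than SQL Server, which assumes a SQLite build with window-function support (3.25 or later); the helper name is made up for illustration.

```python
import sqlite3


def latest_status_per_item():
    """Return (id, ts, status) for every item, with NULLs when no status exists."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (id INTEGER);
        INSERT INTO item (id) VALUES (1), (2), (3);

        CREATE TABLE itemstatus (itemid INTEGER, ts TEXT, status INTEGER);
        INSERT INTO itemstatus (itemid, ts, status) VALUES
            (1, '2013-12-01T12:00:00.000', 1),
            (1, '2013-12-01T11:00:00.000', 2),
            (1, '2014-01-01T12:00:00.000', 1),
            (2, '2011-01-01T12:00:00.000', 1);
    """)
    # rn = 1 marks the newest status row of each item; the LEFT JOIN keeps
    # items that have no status rows at all.
    query = """
        SELECT i.id, s.ts, s.status
        FROM item AS i
        LEFT OUTER JOIN (
            SELECT itemid, ts, status,
                   ROW_NUMBER() OVER (PARTITION BY itemid
                                      ORDER BY ts DESC) AS rn
            FROM itemstatus
        ) AS s
          ON s.itemid = i.id
         AND s.rn = 1
        ORDER BY i.id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


for row in latest_status_per_item():
    print(row)
```

Putting `rn = 1` in the ON clause rather than in a WHERE clause is what keeps item 3 in the output with NULL values.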