I have a table with the columns member_id, status, and created_at (a timestamp), and I want to extract the latest status for each member_id based on the timestamp value.
member_id  status  created_at
1          ON      1641862225
1          OFF     1641862272
2          OFF     1641862397
3          OFF     1641862401
3          ON      1641862402
So, my ideal query result would be like this:
member_id  status  created_at
1          OFF     1641862272
2          OFF     1641862397
3          ON      1641862402
My go-to approach for this kind of problem is to assign a row number to each row, partitioned by member and sorted by timestamp, and then keep only row number 1.
In MySQL, window functions are only available starting with version 8.0.
SELECT ROW_NUMBER() OVER(PARTITION BY member_id ORDER BY created_at DESC) as row_num,
member_id, status, created_at FROM table
This will generate something like this.
row_num  member_id  status  created_at
1        1          OFF     1641862272
2        1          ON      1641862225
1        2          OFF     1641862397
1        3          ON      1641862402
2        3          OFF     1641862401
Then you use that as a sub query and get the rows where row_num = 1
SELECT member_id, status, created_at FROM (
SELECT ROW_NUMBER() OVER(PARTITION BY member_id ORDER BY created_at DESC) as row_num,
member_id, status, created_at FROM table
) a WHERE row_num = 1
MySQL has supported window functions since v8.0. The solution from crimson589 is preferred for v8+; this solution applies to earlier versions of MySQL, or if you need an alternative to window functions.
After grouping by member_id, we can either join back to the original set to get the status value that corresponds to the MAX(created_at):
SELECT ByMember.member_id
, status.status
, ByMember.created_at
FROM (
SELECT member_id, max(created_at) as created_at
FROM MemberStatus
GROUP BY member_id
) ByMember
JOIN MemberStatus status ON ByMember.member_id = status.member_id AND ByMember.created_at = status.created_at;
Or you could use a sub query instead of the join:
SELECT ByMember.member_id
, (SELECT status.status FROM MemberStatus status WHERE ByMember.member_id = status.member_id AND ByMember.created_at = status.created_at) as status
, ByMember.created_at
FROM (
SELECT member_id, max(created_at) as created_at
FROM MemberStatus
GROUP BY member_id
) ByMember
The JOIN based solution allows you to query additional columns from the original set instead of having multiple sub-queries. I would almost always advocate for the JOIN solution, but sometimes the sub-query is simpler to maintain.
I've set up a fiddle to compare these options: http://sqlfiddle.com/#!9/0edb931/11
You can group by member_id and max of created_at, then a self join with member_id and created_at will give you the latest status.
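A minimal sketch of that group-then-self-join approach, run against the sample data from the question. It uses Python's built-in sqlite3 module purely as a convenient stand-in for MySQL (an assumption made for illustration), and borrows the MemberStatus table name from the answer above. The SQL itself uses only GROUP BY and JOIN, so it does not depend on window functions.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MemberStatus (member_id INTEGER, status TEXT, created_at INTEGER)"
)
conn.executemany(
    "INSERT INTO MemberStatus VALUES (?, ?, ?)",
    [
        (1, "ON", 1641862225),
        (1, "OFF", 1641862272),
        (2, "OFF", 1641862397),
        (3, "OFF", 1641862401),
        (3, "ON", 1641862402),
    ],
)

# Step 1 (derived table): the newest created_at per member_id.
# Step 2 (join): fetch the full row that carries that timestamp.
latest_status = conn.execute(
    """
    SELECT s.member_id, s.status, s.created_at
    FROM MemberStatus s
    JOIN (
        SELECT member_id, MAX(created_at) AS created_at
        FROM MemberStatus
        GROUP BY member_id
    ) latest
      ON latest.member_id = s.member_id
     AND latest.created_at = s.created_at
    ORDER BY s.member_id
    """
).fetchall()

print(latest_status)
```

If two rows for the same member share the same maximum created_at, this join returns both of them.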
I want to fetch the latest entry to the database
I have this data
When I run this query
select id, parent_id, amount, max(created_at) from table group by parent_id
it correctly returns the latest created_at, but not the matching values for the rest of the columns
what I want is
how do I achieve that?
Sorry that I posted image instead of table, the table won't work for some reason
You can fetch the desired output using a correlated subquery. The subquery finds the max created_at for the current row's parent_id, so the outer query keeps only the row with the latest created_at for each parent_id. Please try the query below.
SELECT * FROM yourtable t WHERE t.created_at =
(SELECT MAX(created_at) FROM yourtable WHERE parent_id = t.parent_id);
If the id column in your table is an AUTO_INCREMENT field, then you can fetch the latest entry using the id column too.
SELECT * FROM yourtable t WHERE t.id =
(SELECT MAX(id) FROM yourtable WHERE parent_id = t.parent_id);
That's a good use case for a window function like RANK as a subquery:
SELECT id, parent_id, amount, created_at
FROM (
SELECT id, parent_id, amount, created_at,
RANK() OVER (PARTITION BY parent_id ORDER BY created_at DESC) parentID_rank
FROM yourtable) groupedData
WHERE parentID_rank = 1;
or with ORDER BY clause for the outer query if necessary:
SELECT id, parent_id, amount, created_at
FROM (
SELECT id, parent_id, amount, created_at,
RANK() OVER (PARTITION BY parent_id ORDER BY created_at DESC) parentID_rank
FROM yourtable) groupedData
WHERE parentID_rank = 1
ORDER BY id;
To explain the intention:
The PARTITION BY clause groups your data by the parent_id.
The ORDER BY clause sorts it starting with the latest date.
The WHERE clause just takes the entry with the latest date per parent id only.
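One detail worth seeing in action: RANK gives tied rows the same rank, so if two rows of a parent_id share the latest created_at, both are returned. The sketch below illustrates this with made-up rows (the question's data was only posted as an image, so the values here are hypothetical). It runs the queries through Python's built-in sqlite3 module as a stand-in for MySQL 8, which assumes the bundled SQLite is version 3.25 or newer so that window functions are available. ROW_NUMBER with id as a tie-breaker is shown for comparison.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8 (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, amount INTEGER, created_at TEXT)"
)
# Hypothetical rows; parent_id 20 has two rows with the same latest created_at.
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "2021-01-01"),
        (2, 10, 150, "2021-01-03"),
        (3, 20, 200, "2021-01-02"),
        (4, 20, 250, "2021-01-02"),
    ],
)


def latest_per_parent(window_expr):
    """Return the rows whose window position is 1 within their parent_id."""
    # window_expr is a fixed string from this script, not user input.
    sql = f"""
        SELECT id, parent_id, amount, created_at
        FROM (
            SELECT id, parent_id, amount, created_at,
                   {window_expr} AS pos
            FROM yourtable
        ) ranked
        WHERE pos = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


# RANK keeps every row tied for the latest created_at.
with_rank = latest_per_parent(
    "RANK() OVER (PARTITION BY parent_id ORDER BY created_at DESC)"
)
# ROW_NUMBER with id as tie-breaker keeps exactly one row per parent_id.
with_row_number = latest_per_parent(
    "ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY created_at DESC, id DESC)"
)

print(with_rank)
print(with_row_number)
```

Which behaviour is right depends on whether tied rows should all be reported or exactly one row per parent_id is required.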
The main point here is that your query is invalid. The DBMS should raise an error, but you are working in a cheat mode that MySQL offers, which lets you write such queries without being warned.
My advice: When working in MySQL, always make sure you have set
SET sql_mode = 'ONLY_FULL_GROUP_BY';
As to the query: You are using MAX. Thus you aggregate your data. In your GROUP BY clause you say you want one result row per parent_id. You select the parent_id's maximum created_at. You also select the parent_id's ID, the parent_id itself, and the parent_id's amount. The parent_id's ID??? Is there only one ID per parent_id in your table? The amount? Is there only one amount per parent_id in the table? You must tell the DBMS which ID to show and which amount. You haven't done so, and this makes your query invalid according to standard SQL.
You are running MySQL in cheat mode, however, and so MySQL silently applies ANY_VALUE to all non-aggregated columns. This is what your query is turned into internally:
select
any_value(id),
parent_id,
any_value(amount),
max(created_at)
from table
group by parent_id;
ANY_VALUE means the DBMS is free to pick the attribute from whatever row it likes; you don't care.
What you want instead is not to aggregate your rows, but to filter them. You want to select only those rows with the maximum created_at per parent_id.
There exist several ways to get this result. Here are some options.
Get the maximum created_at per parent_id. Then select the matching rows:
select *
from table
where (parent_id, created_at) in
(
select parent_id, max(created_at)
from table
group by parent_id
);
Select the rows for which no newer created_at exists for the parent_id:
select *
from table t
where not exists
(
select null
from table newer
where newer.parent_id = t.parent_id
and newer.created_at > t.created_at
);
Get the maximum created_at on-the-fly. Then compare the dates:
select id, parent_id, amount, created_at
from
(
select t.*, max(created_at) over (partition by parent_id) as max_created_at
from table t
) with_max_created_at
where created_at = max_created_at;
select id, parent_id, amount, max(created_at)
from table
group by parent_id
order by max(created_at) desc
limit 1
I am trying to understand how to do in MySQL what I usually do in Python.
I have a sales table, with sale_date, user_id and price_USD as columns. Each row is an order made by the user.
I want to get a new table that has all of the orders which cost more than the last order the user made (so in this picture, just the yellow rows).
I know how to get a table with the last order for each user, but I cannot save it on the database.
How do I compare each row's price to a different value by the user_id and get just the larger ones in one 'swoop'?
Thanks
If you are running MySQL 8.0, you can do this with window functions:
select t.*
from (
select
t.*,
first_value(price_usd)
over(partition by user_id order by sale_date desc) last_price_usd
from mytable t
) t
where price_usd > last_price_usd
In earlier versions, you could use a correlated subquery:
select t.*
from mytable t
where t.price_usd > (
select price_usd
from mytable t1
where t1.user_id = t.user_id
order by sale_date desc
limit 1
)
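To see the correlated-subquery version at work, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for illustration). The rows are invented, since the question's data was only shown as a picture, and sale_date is stored as an ISO-formatted string so that sorting it as text matches chronological order.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (sale_date TEXT, user_id INTEGER, price_usd INTEGER)"
)
# Hypothetical orders; each user's last order is the one with the latest sale_date.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2020-01-01", 1, 50),
        ("2020-01-05", 1, 30),
        ("2020-01-10", 1, 40),  # last order of user 1
        ("2020-01-02", 2, 25),
        ("2020-01-03", 2, 20),  # last order of user 2
        ("2020-01-04", 3, 99),  # only order of user 3
    ],
)

# For every row, the subquery looks up the price of that user's most recent
# order; the outer WHERE keeps only the rows that cost more than it.
pricier_than_last = conn.execute(
    """
    SELECT t.sale_date, t.user_id, t.price_usd
    FROM mytable t
    WHERE t.price_usd > (
        SELECT t1.price_usd
        FROM mytable t1
        WHERE t1.user_id = t.user_id
        ORDER BY t1.sale_date DESC
        LIMIT 1
    )
    ORDER BY t.user_id, t.sale_date
    """
).fetchall()

print(pricier_than_last)
```

The last order itself never qualifies, because its price is not strictly greater than itself, so a user with a single order contributes no rows.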
So I'm having an issue with what I expect is a very simple problem, but for the life of me I can't figure it out!
I have a table like this:
id name status date
1 bob good 01/01/2020
2 john good 01/01/2020
3 bob bad 02/01/2020
4 john good 02/01/2020
5 ben good 02/01/2020
I want to retrieve the latest record for each name.
I have tried the following:
SELECT name
,STATUS
,MAX(DATE)
FROM TABLE
GROUP BY name
ORDER BY MAX(DATE)
I thought this worked. However, while it returns a record for bob, john and ben, it shows bob's date as 02/01/2020 but his status as "good" from the other record!
At a loss as to how to do this in the simplest way possible, all help is much appreciated!
Don't think of this as aggregation. Think of this as filtering!
Select t.name, t.status, t.date
from table t
where t.date = (select max(t2.date)
from table t2
where t2.name = t.name
);
You are not aggregating anything. Your result set just wants columns from one row, the row with the maximum date for each name. That is more like filtering than grouping.
With not exists:
select t.* from tablename t
where not exists (
select 1 from tablename
where name = t.name and date > t.date
)
The result is:
every row of the table for which there is not another row with the same name and later date.
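As a quick check of that description, the sketch below runs the NOT EXISTS query on the five rows from the question, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption for illustration). The dates are written as ISO strings, assuming 01/01/2020 and 02/01/2020 mean 1 and 2 January, so that comparing them as text matches comparing them as dates.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, name TEXT, status TEXT, date TEXT)"
)
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (1, "bob", "good", "2020-01-01"),
        (2, "john", "good", "2020-01-01"),
        (3, "bob", "bad", "2020-01-02"),
        (4, "john", "good", "2020-01-02"),
        (5, "ben", "good", "2020-01-02"),
    ],
)

# Keep a row only if no other row with the same name has a later date.
latest_rows = conn.execute(
    """
    SELECT t.*
    FROM tablename t
    WHERE NOT EXISTS (
        SELECT 1
        FROM tablename newer
        WHERE newer.name = t.name
          AND newer.date > t.date
    )
    ORDER BY t.id
    """
).fetchall()

print(latest_rows)
```

Here bob's latest row correctly carries the status "bad", which is the value the GROUP BY attempt in the question got wrong.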
For MySQL 8.0+ you can use the ROW_NUMBER() window function:
select t.id, t.name, t.status, t.date
from (
select *, row_number() over (partition by name order by date desc) rn
from tablename
) t
where t.rn = 1
Maria DB 10.2 apparently. – Ed Jones
SELECT DISTINCT name,
FIRST_VALUE(status) OVER (PARTITION BY name
ORDER BY date DESC) status,
MAX(date) OVER (PARTITION BY name) date
FROM table;
An index on (name, date) will improve performance.
I have a table that looks like this...
user_id, match_id, points_won
1 14 10
1 8 12
1 12 80
2 8 10
3 14 20
3 2 25
I want to write a MySQL query that pulls back the most points a user has won in a single match and includes the match_id in the results. In other words...
user_id, match_id, max_points_won
1 12 80
2 8 10
3 2 25
Of course if I didn't need the match_id I could just do...
select user_id, max(points_won)
from table
group by user_id
But as soon as I add match_id to the "select" and "group by" I have a row for every match, and if I only add the match_id to the "select" (and not the "group by") then it won't correctly relate to the points_won.
Ideally I don't want to do the following either because it doesn't feel particularly safe (e.g. if the user has won the same amount of points on multiple matches)...
SELECT t.user_id, max(t.points_won) max_points_won
, (select t2.match_id
from table t2
where t2.user_id = t.user_id
and t2.points_won = max_points_won) as 'match_of_points_maximum'
FROM table t
GROUP BY t.user_id
Are there any more elegant options for this problem?
This is harder than it needs to be in MySQL. One method is a bit of a hack but it works in most circumstances. That is the group_concat()/substring_index() trick:
select user_id, max(points_won),
substring_index(group_concat(match_id order by points_won desc), ',', 1)
from table
group by user_id;
The group_concat() concatenates together all the match_ids, ordered by the points descending. The substring_index() then takes the first one.
Two important caveats:
The resulting expression has a type of string, regardless of the internal type.
The group_concat() uses an internal buffer, whose length -- by default -- is 1,024 characters. This default length can be changed.
You can use the query:
select user_id, max(points_won)
from table
group by user_id
as a derived table. Joining this to the original table gets you what you want:
select t1.user_id, t1.match_id, t2.max_points_won
from table as t1
join (
select user_id, max(points_won) as max_points_won
from table
group by user_id
) as t2 on t1.user_id = t2.user_id and t1.points_won = t2.max_points_won
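Regarding the tie concern raised in the question: with this join, a user who scored their maximum in several matches gets one row per tied match. The sketch below shows that, plus one possible tie-break (keeping the smallest match_id, an arbitrary choice made here for illustration). It uses Python's built-in sqlite3 module as a stand-in for MySQL, a hypothetical table name match_points, and the question's data with one extra invented row that creates a tie for user 2.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
# match_points is a hypothetical name for the question's table.
conn.execute(
    "CREATE TABLE match_points (user_id INTEGER, match_id INTEGER, points_won INTEGER)"
)
conn.executemany(
    "INSERT INTO match_points VALUES (?, ?, ?)",
    [
        (1, 14, 10), (1, 8, 12), (1, 12, 80),
        (2, 8, 10), (2, 9, 10),  # (2, 9, 10) is an invented row that ties with match 8
        (3, 14, 20), (3, 2, 25),
    ],
)

# Derived table shared by both queries: the maximum points per user.
MAX_PER_USER = """
    SELECT user_id, MAX(points_won) AS max_points_won
    FROM match_points
    GROUP BY user_id
"""

# Plain join: every match that reaches the user's maximum is returned.
all_best_matches = conn.execute(
    f"""
    SELECT t1.user_id, t1.match_id, t2.max_points_won
    FROM match_points AS t1
    JOIN ({MAX_PER_USER}) AS t2
      ON t1.user_id = t2.user_id AND t1.points_won = t2.max_points_won
    ORDER BY t1.user_id, t1.match_id
    """
).fetchall()

# Tie-break: collapse tied matches to the smallest match_id per user.
one_best_match = conn.execute(
    f"""
    SELECT t1.user_id, MIN(t1.match_id) AS match_id, t2.max_points_won
    FROM match_points AS t1
    JOIN ({MAX_PER_USER}) AS t2
      ON t1.user_id = t2.user_id AND t1.points_won = t2.max_points_won
    GROUP BY t1.user_id, t2.max_points_won
    ORDER BY t1.user_id
    """
).fetchall()

print(all_best_matches)
print(one_best_match)
```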
I think you can optimize your query by adding LIMIT 1 to the inner query.
SELECT t.user_id, max(t.points_won) max_points_won
, (select t2.match_id
from table t2
where t2.user_id = t.user_id
and t2.points_won = max_points_won limit 1) as 'match_of_points_maximum'
FROM table t
GROUP BY t.user_id
EDIT: this only works in PostgreSQL, SQL Server, and Oracle (and in MySQL from version 8.0 onward).
You could use row_number :
SELECT USER_ID, MATCH_ID, POINTS_WON
FROM
(
SELECT user_id, match_id, points_won, row_number() over (partition by user_id order by points_won desc) rn
from table
) q
where q.rn = 1
For a similar function, have a look at Gordon Linoff's answer or at this article.
In your example, you partition the result set by user and then order by points_won desc, so that the highest points value comes first.
I have a log table with several statuses. It logs the position of physical objects in an external system. I want to get the latest rows for a status for each distinct physical object.
I need a list of typeids and their quantity for each status, minus the quantity of typeids that have an entry for another status that is later than the row with the status we are looking for.
e.g each status move is recorded but nothing else.
Here's the problem, I don't have a distinct ID for each physical object. I can only calculate how many there are from the state of the log table.
I've tried
SELECT dl.id, dl.status
FROM `log` AS dl
INNER JOIN (
SELECT MAX( `date` ) , id
FROM `log`
GROUP BY id ORDER BY `date` DESC
) AS dl2
WHERE dl.id = dl2.id
but this would require a distinct type id to work.
My table has a primary key id, datetime, status, product type_id. There are four different statuses.
a product must pass through all statuses.
Example Data.
date typeid status id
2014-01-13 PF0180 shopfloor 71941
2014-01-13 ND0355 shopfloor 71940
2014-01-10 ND0355 machine 71938
2014-01-10 ND0355 machine 71937
2014-01-10 ND0282 machine 7193
When selecting results for the status shopfloor, I would want
quantity typeid
1 ND0355
1 PF0180
When selecting for the status machine, I would want
quantity typeid
1 ND0282
1 ND0355
The order of the statuses shouldn't matter; it only matters whether there is a later entry for the product.
If I understood you correctly, this will give you the desired output:
select
l1.typeid,
l1.status,
count(1) - (
select count(1)
from log l2
where l2.typeid = l1.typeid and
l2.date > l1.date
)
from log l1
group by l1.typeid, l1.status;
Check this SQL Fiddle
TYPEID STATUS TOTAL
-----------------------------
ND0282 machine 1
ND0355 machine 1
ND0355 shopfloor 1
PF0180 shopfloor 1
You need to get the greatest date per status, not per id. Then join to the log table where the status and date are the same.
SELECT dl.id, dl.status
FROM `log` AS dl
INNER JOIN (
SELECT status, MAX( `date` ) AS date
FROM `log`
GROUP BY status ORDER BY NULL
) AS dl2 USING (status, date);
It would be helpful to have an index on (status, date) on this table, which would allow the subquery to run as an index-only query.
Everton Agner originally posted this solution, but the reply seems to have disappeared, so I'm adding it (with slight modifications):
select
l1.typeid,
l1.status,
count(1) - (
select count(1)
from log l2
where l2.typeid = l1.typeid and
l2.`date` > l1.`date`
AND l2.status != 'dieshop'
) as quant
from log l1
WHERE l1.status = 'dieshop'
group by l1.typeid;