This question already has answers here:
SQL select only rows with max value on a column
Please assume this table:
// mytable
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 3 | 503 | 12 |
| 7 | 33 | 12 |
| 1 | 771 | 13 |
| 2 | 86 | 13 |
| 1 | 772 | 13 |
| 4 | 652 | 14 |
| 4 | 567 | 14 |
+--------+-------------+---------+
I need to group it by user_id, so here is my query:
select max(num), user_id from mytable
group by user_id
Here is the result:
// res
+--------+---------+
| num | user_id |
+--------+---------+
| 7 | 12 |
| 2 | 13 |
| 4 | 14 |
+--------+---------+
Now I need to also get the business_id of those rows. Here is the expected result:
// expected result
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 7 | 33 | 12 |
| 2 | 86 | 13 |
| 4 | 567 | 14 | -- Either row for user 14 is acceptable, since both have num = 4
+--------+-------------+---------+
Any idea how I can do that?
You don't group. You filter. One method uses window functions such as row_number():
select t.*
from (select t.*,
row_number() over (partition by user_id order by num desc) as seqnum
from mytable t
) t
where seqnum = 1;
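The following is a minimal sketch for trying the row_number() approach locally. It assumes SQLite 3.25 or later (the first version with window functions), accessed through Python's built-in sqlite3 module, as a stand-in for MySQL 8. The helper name `top_row_per_user` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (num, business_id, user_id).
ROWS = [
    (3, 503, 12), (7, 33, 12),
    (1, 771, 13), (2, 86, 13), (1, 772, 13),
    (4, 652, 14), (4, 567, 14),
]


def top_row_per_user(rows):
    """Return one (num, business_id, user_id) tuple per user_id: the row with the largest num."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (num INTEGER, business_id INTEGER, user_id INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # Number the rows inside each user_id partition, largest num first,
    # then keep only the first-numbered row of every partition.
    query = """
        SELECT num, business_id, user_id
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY num DESC) AS seqnum
              FROM mytable t) t
        WHERE seqnum = 1
        ORDER BY user_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_row_per_user(ROWS):
        print(row)
```

The query does not guarantee which business_id is returned for user 14. Adding a tie-breaker inside the OVER clause, such as `ORDER BY num DESC, business_id DESC`, would make the choice deterministic.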
Another method, which can perform slightly better with an index on (user_id, num), is a correlated subquery. Unlike row_number(), it returns every tied row, so user 14 appears twice:
select t.*
from mytable t
where t.num = (select max(t2.num)
from mytable t2
where t2.user_id = t.user_id
);
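The next sketch runs the correlated subquery on the same sample data and shows how it handles ties. It uses SQLite through Python's sqlite3 module in place of MySQL. The index name `idx_user_num` and the helper `rows_matching_user_max` are hypothetical.

```python
import sqlite3

# Sample rows from the question: (num, business_id, user_id).
ROWS = [
    (3, 503, 12), (7, 33, 12),
    (1, 771, 13), (2, 86, 13), (1, 772, 13),
    (4, 652, 14), (4, 567, 14),
]


def rows_matching_user_max(rows):
    """Return every row whose num equals the maximum num of its user_id (ties included)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (num INTEGER, business_id INTEGER, user_id INTEGER)")
    # Composite index on (user_id, num); the subquery can use it to find MAX(num) per user.
    conn.execute("CREATE INDEX idx_user_num ON mytable (user_id, num)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    # Keep a row when its num equals the largest num among rows of the same user.
    query = """
        SELECT t.num, t.business_id, t.user_id
        FROM mytable t
        WHERE t.num = (SELECT MAX(t2.num)
                       FROM mytable t2
                       WHERE t2.user_id = t.user_id)
        ORDER BY t.user_id, t.business_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rows_matching_user_max(ROWS):
        print(row)
```

Because both rows for user 14 match the maximum, both are returned. If exactly one row per user is required, the row_number() version is the better fit.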
You should think "group by" when you want to summarize rows. You should think "where" when you want to choose rows with particular characteristics.
I have the following query:
select count(1) num, business_id, user_id FROM `pos_transactions`
group by user_id, business_id
order by user_id
It returns this:
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 3 | 503 | 12 |
| 7 | 33 | 12 |
| 1 | 771 | 13 |
| 2 | 86 | 13 |
| 1 | 772 | 13 |
| 4 | 652 | 14 |
| 4 | 567 | 14 |
+--------+-------------+---------+
I need to select only one row per user_id: the one with the largest num value. If all num values for a user are identical (e.g. user #14), any one of those rows may be returned. So, here is the expected result:
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 7 | 33 | 12 |
| 2 | 86 | 13 |
| 4 | 567 | 14 |
+--------+-------------+---------+
Any idea how I can do that?
I guess the solution involves something like LIMIT 1 per user, but I have no idea how to write the query.
All I want is to make the result unique per user_id, keeping the row with the largest num.
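Use the MAX() and FIRST_VALUE() window functions (MySQL 8.0+). Window functions are evaluated after GROUP BY, so they can be applied to COUNT(*):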
SELECT DISTINCT
MAX(COUNT(*)) OVER (PARTITION BY user_id) num,
FIRST_VALUE(business_id) OVER (PARTITION BY user_id ORDER BY COUNT(*) DESC) business_id,
user_id
FROM pos_transactions
GROUP BY user_id, business_id
ORDER BY user_id
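The sketch below shows the same aggregate-then-pick idea written as two explicit steps: a CTE that counts transactions per (user_id, business_id), followed by ROW_NUMBER() to keep each user's top business. This is a swapped-in formulation, not the answer's exact query. It runs on SQLite through Python's sqlite3 module, and the transaction rows are synthetic: they are generated only to reproduce the counts shown in the question.

```python
import sqlite3

# Per-(business_id, user_id) counts from the question: (num, business_id, user_id).
COUNTS = [
    (3, 503, 12), (7, 33, 12),
    (1, 771, 13), (2, 86, 13), (1, 772, 13),
    (4, 652, 14), (4, 567, 14),
]


def busiest_business_per_user(counts):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pos_transactions (business_id INTEGER, user_id INTEGER)")
    # Expand each count into that many synthetic transaction rows.
    transactions = [(b, u) for n, b, u in counts for _ in range(n)]
    conn.executemany("INSERT INTO pos_transactions VALUES (?, ?)", transactions)
    query = """
        WITH per_business AS (
            -- Step 1: the question's own aggregation.
            SELECT COUNT(*) AS num, business_id, user_id
            FROM pos_transactions
            GROUP BY user_id, business_id
        ), ranked AS (
            -- Step 2: rank each user's businesses by their count, largest first.
            SELECT num, business_id, user_id,
                   ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY num DESC) AS rn
            FROM per_business
        )
        SELECT num, business_id, user_id
        FROM ranked
        WHERE rn = 1
        ORDER BY user_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in busiest_business_per_user(COUNTS):
        print(row)
```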
This question already has answers here:
How can I SELECT rows with MAX(Column value), PARTITION by another column in MYSQL?
I have this table, and I am trying to get the latest analysis_id for each repo_id:
+----+---------+-------------+
| id | repo_id | analysis_id |
+----+---------+-------------+
| 1 | 20 | 3 |
+----+---------+-------------+
| 2 | 20 | 4 |
+----+---------+-------------+
| 3 | 20 | 5 |
+----+---------+-------------+
| 4 | 21 | 6 |
+----+---------+-------------+
| 5 | 22 | 7 |
+----+---------+-------------+
How do I get the largest analysis_id for each repo_id, so that no repo_id is repeated?
+----+---------+-------------+
| id | repo_id | analysis_id |
+----+---------+-------------+
| 3 | 20 | 5 |
+----+---------+-------------+
| 4 | 21 | 6 |
+----+---------+-------------+
| 5 | 22 | 7 |
+----+---------+-------------+
A general MySQL 8+ friendly solution uses ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY repo_id ORDER BY analysis_id DESC) rn
FROM yourTable
)
SELECT id, repo_id, analysis_id
FROM cte
WHERE rn = 1;
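You are looking for GROUP BY. Note that MAX(id) and MAX(analysis_id) are computed independently, so this returns real rows only if, within each repo_id, the row with the largest analysis_id also has the largest id:
SELECT MAX(id) AS id, repo_id, MAX(analysis_id) AS analysis_id
FROM YOUR_TABLE
GROUP BY repo_id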
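The following sketch shows what can go wrong with the GROUP BY answer when id order does not follow analysis_id order. The rows are hypothetical and chosen to expose the problem, and the example runs on SQLite through Python's sqlite3 module. The helper `group_by_max` is also hypothetical.

```python
import sqlite3

# Hypothetical rows (id, repo_id, analysis_id): for repo 20, the larger id has the SMALLER analysis_id.
ROWS = [
    (1, 20, 5),
    (2, 20, 3),
    (3, 21, 6),
]


def group_by_max(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (id INTEGER, repo_id INTEGER, analysis_id INTEGER)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    # Each MAX() is computed on its own column, independently of the other.
    result = conn.execute(
        "SELECT MAX(id) AS id, repo_id, MAX(analysis_id) AS analysis_id "
        "FROM your_table GROUP BY repo_id ORDER BY repo_id"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    grouped = group_by_max(ROWS)
    print(grouped)
    # Rows in the result that do not exist in the table (id 2 paired with analysis_id 5).
    print([row for row in grouped if row not in ROWS])
```

For repo 20, this combines id 2 with analysis_id 5, a pair that exists in no actual row. The ROW_NUMBER() and NOT EXISTS answers avoid this because each returns a complete existing row.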
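In MySQL 5.x and later (no window functions required), you may use NOT EXISTS to keep only the rows for which no other row with the same repo_id has a larger analysis_id:
SELECT *
FROM tablename t1
WHERE NOT EXISTS ( SELECT NULL
FROM tablename t2
WHERE t1.repo_id = t2.repo_id
AND t1.analysis_id < t2.analysis_id );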
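This sketch runs a NOT EXISTS anti-join that compares analysis_id (the column the question asks about) alongside the ROW_NUMBER() CTE, using the question's data. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL. The table name `tablename` comes from the answer, and the helper `run` is hypothetical.

```python
import sqlite3

# Rows from the question: (id, repo_id, analysis_id).
ROWS = [(1, 20, 3), (2, 20, 4), (3, 20, 5), (4, 21, 6), (5, 22, 7)]

# Keep a row only if no row of the same repo has a larger analysis_id.
NOT_EXISTS_QUERY = """
    SELECT id, repo_id, analysis_id
    FROM tablename t1
    WHERE NOT EXISTS (SELECT NULL
                      FROM tablename t2
                      WHERE t1.repo_id = t2.repo_id
                        AND t1.analysis_id < t2.analysis_id)
    ORDER BY repo_id
"""

# Number rows per repo by descending analysis_id and keep the first.
ROW_NUMBER_QUERY = """
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY repo_id ORDER BY analysis_id DESC) AS rn
        FROM tablename
    )
    SELECT id, repo_id, analysis_id FROM cte WHERE rn = 1 ORDER BY repo_id
"""


def run(query, rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (id INTEGER, repo_id INTEGER, analysis_id INTEGER)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run(NOT_EXISTS_QUERY, ROWS))
    print(run(ROW_NUMBER_QUERY, ROWS))
```

Both queries agree on this data. They differ when two rows of the same repo_id share the largest analysis_id: NOT EXISTS keeps both rows, while ROW_NUMBER() keeps only one.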
Given the following table, where the series number and the date should both increase together:
+----+--------+------------+
| id | series | date |
+----+--------+------------+
| 1 | 10 | 2020-08-13 |
| 2 | 9 | 2020-08-02 |
| 3 | 8 | 2020-06-23 |
| 4 | 7 | 2020-06-08 |
| 5 | 6 | 2020-05-20 |
| 6 | 5 | 2020-05-05 |
| 7 | 4 | 2020-05-01 |
+----+--------+------------+
Is there a way to check for records that do not follow this pattern?
For example, row 2 has a bigger series number than row 3, but its date is earlier:
+----+--------+------------+
| id | series | date |
+----+--------+------------+
| 1 | 10 | 2020-08-13 |
| 2 | 9 | 2020-06-02 |
| 3 | 8 | 2020-07-23 |
| 4 | 7 | 2020-06-08 |
| 5 | 6 | 2020-05-20 |
| 6 | 5 | 2020-05-05 |
| 7 | 4 | 2020-05-01 |
+----+--------+------------+
You can use window functions:
select *
from (
select t.*, lead(date) over(order by series) lead_date
from mytable t
) t
where date > lead_date
Alternatively:
select *
from (
select t.*, lead(series) over(order by date) lead_series
from mytable t
) t
where series > lead_series
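The following sketch runs both lead() variants on the question's out-of-order data, using SQLite (3.25+) through Python's sqlite3 module in place of MySQL. The dates are stored as ISO-8601 text, which sorts chronologically. The helper `flagged_ids` is hypothetical.

```python
import sqlite3

# The question's example in which rows 2 and 3 are out of order: (id, series, date).
ROWS = [
    (1, 10, "2020-08-13"),
    (2, 9, "2020-06-02"),
    (3, 8, "2020-07-23"),
    (4, 7, "2020-06-08"),
    (5, 6, "2020-05-20"),
    (6, 5, "2020-05-05"),
    (7, 4, "2020-05-01"),
]

# Walk in series order; flag a row whose date is later than the next row's date.
BY_SERIES = """
    SELECT id FROM (
        SELECT t.*, LEAD(date) OVER (ORDER BY series) AS lead_date
        FROM mytable t
    ) t
    WHERE date > lead_date
    ORDER BY id
"""

# Walk in date order; flag a row whose series is larger than the next row's series.
BY_DATE = """
    SELECT id FROM (
        SELECT t.*, LEAD(series) OVER (ORDER BY date) AS lead_series
        FROM mytable t
    ) t
    WHERE series > lead_series
    ORDER BY id
"""


def flagged_ids(query, rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, series INTEGER, date TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


if __name__ == "__main__":
    print(flagged_ids(BY_SERIES, ROWS))
    print(flagged_ids(BY_DATE, ROWS))
```

Each variant reports one side of the swapped pair. Ordering by series flags row 3, whose date jumps ahead of the next series. Ordering by date flags row 2, whose series is larger than the next date's series.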
You can use lag():
select t.*
from (select t.*,
lag(id) over (order by series) as prev_id_series,
lag(id) over (order by date) as prev_id_date
from t
) t
where prev_id_series <> prev_id_date;
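This sketch runs the lag() comparison on the same out-of-order data, using SQLite through Python's sqlite3 module. The table is named `t`, as in the answer. The helper `lag_mismatches` is hypothetical.

```python
import sqlite3

# The question's example in which rows 2 and 3 are out of order: (id, series, date).
ROWS = [
    (1, 10, "2020-08-13"),
    (2, 9, "2020-06-02"),
    (3, 8, "2020-07-23"),
    (4, 7, "2020-06-08"),
    (5, 6, "2020-05-20"),
    (6, 5, "2020-05-05"),
    (7, 4, "2020-05-01"),
]

# Compare each row's predecessor under series order with its predecessor under date order.
LAG_QUERY = """
    SELECT id FROM (
        SELECT t.*,
               LAG(id) OVER (ORDER BY series) AS prev_id_series,
               LAG(id) OVER (ORDER BY date) AS prev_id_date
        FROM t
    ) t
    WHERE prev_id_series <> prev_id_date
    ORDER BY id
"""


def lag_mismatches(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, series INTEGER, date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(LAG_QUERY)]
    conn.close()
    return ids


if __name__ == "__main__":
    print(lag_mismatches(ROWS))
```

On this data the query returns ids 1, 2 and 4. It reports rows whose predecessor differs between the two orderings, which is not the same set as the rows that are themselves out of place. For the first row in both orderings, both lag() values are NULL, so the `<>` comparison yields NULL and the row is never reported.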
You can fetch the problematic rows, together with the rows they conflict with, using a self-join like this (assuming your table is called "series"):
SELECT s1.id AS row_id, s1.series AS row_series, s1.date AS row_date,
s2.id AS conflict_id, s2.series AS conflict_series, s2.date AS conflict_date
FROM series AS s1
JOIN series AS s2
ON s1.series > s2.series AND s1.date < s2.date;
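This sketch runs the self-join on the out-of-order data and returns (row_id, conflict_id) pairs. It uses SQLite through Python's sqlite3 module, with the table named "series" as the answer assumes. The helper `conflicts` is hypothetical.

```python
import sqlite3

# The question's example in which rows 2 and 3 are out of order: (id, series, date).
ROWS = [
    (1, 10, "2020-08-13"),
    (2, 9, "2020-06-02"),
    (3, 8, "2020-07-23"),
    (4, 7, "2020-06-08"),
    (5, 6, "2020-05-20"),
    (6, 5, "2020-05-05"),
    (7, 4, "2020-05-01"),
]

# Pair every row with every other row that has a smaller series but a later date.
SELF_JOIN = """
    SELECT s1.id AS row_id, s2.id AS conflict_id
    FROM series AS s1
    JOIN series AS s2
      ON s1.series > s2.series AND s1.date < s2.date
    ORDER BY row_id, conflict_id
"""


def conflicts(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE series (id INTEGER, series INTEGER, date TEXT)")
    conn.executemany("INSERT INTO series VALUES (?, ?, ?)", rows)
    pairs = conn.execute(SELF_JOIN).fetchall()
    conn.close()
    return pairs


if __name__ == "__main__":
    print(conflicts(ROWS))
```

Unlike the lead()/lag() answers, this compares every pair of rows, not just neighbours. As a result, row 2 is reported against both row 3 and row 4, since its date (2020-06-02) is earlier than both. Because the comparison covers all pairs, it can be slower than the window-function versions on large tables.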
So, let's say I have this data:
id | value | group
1 | 100 | A
2 | 120 | A
3 | 150 | B
4 | 170 | B
I want to sort it so that it becomes:
id | value | group
1 | 100 | A
3 | 150 | B
2 | 120 | A
4 | 170 | B
There will be more groups than that, so if the data comes in group order (A,C,B,D,B,C,A), it should become (A,B,C,D,A,B,C).
You can compute a counter column (the number of earlier rows in the same group) and use it to sort the table:
select t.id, t.value, t.`group`
from (
select t.id, t.value, t.`group`,
(select count(*) from tablename
where `group` = t.`group` and id < t.id) counter
from tablename t
) t
order by t.counter, t.`group`
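This sketch runs the counter query on the seven-group example from the question (A,C,B,D,B,C,A). It uses SQLite through Python's sqlite3 module. The id and value numbers are hypothetical and serve only as placeholders. SQLite accepts MySQL-style backticks, so the reserved word `group` is quoted the same way as in the answer.

```python
import sqlite3

# Hypothetical rows (id, value, group) whose groups arrive as A, C, B, D, B, C, A.
ROWS = [
    (1, 100, "A"), (2, 110, "C"), (3, 120, "B"), (4, 130, "D"),
    (5, 140, "B"), (6, 150, "C"), (7, 160, "A"),
]

# counter = how many rows of the same group have a smaller id (0 for the first of each group).
COUNTER_QUERY = """
    SELECT t.id, t.value, t.`group`
    FROM (
        SELECT t.id, t.value, t.`group`,
               (SELECT COUNT(*) FROM tablename
                WHERE `group` = t.`group` AND id < t.id) AS counter
        FROM tablename t
    ) t
    ORDER BY t.counter, t.`group`
"""


def interleave_by_counter(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (id INTEGER, value INTEGER, `group` TEXT)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    result = conn.execute(COUNTER_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in interleave_by_counter(ROWS):
        print(row)
```

All rows with counter 0 come first (one per group, in group order), then all rows with counter 1, and so on. This produces the A,B,C,D,A,B,C pattern.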
Results:
| id | value | group |
| --- | ----- | ----- |
| 1 | 100 | A |
| 3 | 150 | B |
| 2 | 120 | A |
| 4 | 170 | B |
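With MySQL 8.0+ window functions, you can number the rows within each group (ordered by id, so the numbering follows the original row order) and sort by that number:
SELECT *
FROM `tablename`
ORDER BY
row_number() OVER (PARTITION BY `group` ORDER BY id), `group`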
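This sketch runs the row_number() ordering on the same seven-group example, using SQLite through Python's sqlite3 module, which allows window functions in ORDER BY. Within each partition, the rows are numbered by id, on the assumption that id reflects the original row order. The id and value numbers are hypothetical.

```python
import sqlite3

# Hypothetical rows (id, value, group) whose groups arrive as A, C, B, D, B, C, A.
ROWS = [
    (1, 100, "A"), (2, 110, "C"), (3, 120, "B"), (4, 130, "D"),
    (5, 140, "B"), (6, 150, "C"), (7, 160, "A"),
]

# Number rows inside each group by id, then sort by that number and the group name.
ROW_NUMBER_QUERY = """
    SELECT *
    FROM tablename
    ORDER BY ROW_NUMBER() OVER (PARTITION BY `group` ORDER BY id), `group`
"""


def interleave_by_row_number(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (id INTEGER, value INTEGER, `group` TEXT)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    result = conn.execute(ROW_NUMBER_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in interleave_by_row_number(ROWS):
        print(row)
```

Ordering the partition by `group` itself would leave the numbering inside each group unspecified. Ordering by id makes the result deterministic and matches the counter approach.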
My table has rows with duplicate values in specific columns. I would like to remove those duplicates and keep only the row with the latest id.
The columns I want to check and compare are:
sub_id, spec_id, ex_time
So, for this table:
+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
| 1 | 100 | 444 | 09:29 | 2 |
| 2 | 101 | 555 | 10:01 | 10 |
| 3 | 100 | 444 | 09:29 | 23 |
| 4 | 200 | 321 | 05:15 | 5 |
| 5 | 100 | 444 | 09:29 | 8 |
| 6 | 101 | 555 | 10:01 | 1 |
+----+--------+---------+---------+-------+
I would like to get this result:
+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
| 5 | 100 | 444 | 09:29 | 8 |
| 6 | 101 | 555 | 10:01 | 1 |
+----+--------+---------+---------+-------+
Based on this question, I was able to build this query to select all rows that are duplicated across multiple columns:
select t.*
from mytable t join
(select sub_id, spec_id, ex_time, count(*) as NumDuplicates
from mytable
group by sub_id, spec_id, ex_time
having NumDuplicates > 1
) tsum
on t.sub_id = tsum.sub_id and t.spec_id = tsum.spec_id and t.ex_time = tsum.ex_time
But now I'm not sure how to wrap this select in a DELETE query that deletes all the rows except the ones with the highest id.
You can modify your subquery to get the maximum id for each duplicated combination.
Then, when joining back to the main table, add a condition that id must not equal that maximum id.
You can then DELETE the rows in this result set.
Try the following:
DELETE t
FROM mytable AS t
JOIN
(SELECT MAX(id) as max_id,
sub_id,
spec_id,
ex_time,
COUNT(*) as NumDuplicates
FROM mytable
GROUP BY sub_id, spec_id, ex_time
HAVING NumDuplicates > 1
) AS tsum
ON t.sub_id = tsum.sub_id AND
t.spec_id = tsum.spec_id AND
t.ex_time = tsum.ex_time AND
t.id <> tsum.max_id
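SQLite does not support MySQL's multi-table `DELETE t FROM ... JOIN ...` syntax. The sketch below therefore uses an equivalent NOT IN subquery that keeps the MAX(id) of every (sub_id, spec_id, ex_time) group. It assumes those columns contain no NULLs, because the JOIN's `=` comparison and GROUP BY treat NULLs differently. In MySQL itself, a DELETE whose subquery reads the same table fails with error 1093 unless the subquery is wrapped in a derived table, which is one reason the JOIN form above is used there. The helper `dedupe_keep_latest` is hypothetical.

```python
import sqlite3

# Rows from the question: (id, sub_id, spec_id, ex_time, count).
ROWS = [
    (1, 100, 444, "09:29", 2),
    (2, 101, 555, "10:01", 10),
    (3, 100, 444, "09:29", 23),
    (4, 200, 321, "05:15", 5),
    (5, 100, 444, "09:29", 8),
    (6, 101, 555, "10:01", 1),
]


def dedupe_keep_latest(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, sub_id INTEGER, "
        "spec_id INTEGER, ex_time TEXT, `count` INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    # Delete every row that is not the highest id of its (sub_id, spec_id, ex_time) group.
    conn.execute(
        """
        DELETE FROM mytable
        WHERE id NOT IN (SELECT MAX(id)
                         FROM mytable
                         GROUP BY sub_id, spec_id, ex_time)
        """
    )
    conn.commit()
    remaining = conn.execute("SELECT * FROM mytable ORDER BY id").fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in dedupe_keep_latest(ROWS):
        print(row)
```

As with the JOIN version, row 4 survives because it has no duplicates. Only ids 1, 2 and 3 are removed.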