How to select other columns of a table when grouping? [duplicate] - mysql

This question already has answers here:
SQL select only rows with max value on a column [duplicate]
Please assume this table:
// mytable
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 3 | 503 | 12 |
| 7 | 33 | 12 |
| 1 | 771 | 13 |
| 2 | 86 | 13 |
| 1 | 772 | 13 |
| 4 | 652 | 14 |
| 4 | 567 | 14 |
+--------+-------------+---------+
I need to group it by user_id, so here is my query:
select max(num), user_id from mytable
group by user_id
Here is the result:
// res
+--------+---------+
| num | user_id |
+--------+---------+
| 7 | 12 |
| 2 | 13 |
| 4 | 14 |
+--------+---------+
Now I need to also get the business_id of those rows. Here is the expected result:
// expected result
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 7 | 33 | 12 |
| 2 | 86 | 13 |
| 4 | 567 | 14 | -- either 567 or 652 is fine here, since both rows have num = 4
+--------+-------------+---------+
Any idea how I can do that?

You don't group. You filter. One method uses window functions such as row_number():
select t.*
from (select t.*,
row_number() over (partition by user_id order by num desc) as seqnum
from mytable t
) t
where seqnum = 1;
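A minimal sketch of the row_number() approach run against the question's sample data. It assumes Python's built-in sqlite3 module as a stand-in for MySQL 8 (SQLite 3.25+ accepts the same window-function syntax); the helper name top_row_per_user is hypothetical.

```python
import sqlite3

# Sample rows from the question: (num, business_id, user_id).
ROWS = [(3, 503, 12), (7, 33, 12), (1, 771, 13), (2, 86, 13),
        (1, 772, 13), (4, 652, 14), (4, 567, 14)]


def top_row_per_user(conn):
    """Keep the row with the largest num for each user_id (ties broken arbitrarily)."""
    return conn.execute("""
        SELECT num, business_id, user_id
        FROM (SELECT t.*,
                     ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY num DESC) AS seqnum
              FROM mytable t) t
        WHERE seqnum = 1
        ORDER BY user_id
    """).fetchall()


conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("CREATE TABLE mytable (num INTEGER, business_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

result = top_row_per_user(conn)
print(result)  # user 14's business_id may be 652 or 567
```

Adding a tiebreaker to the window, e.g. `ORDER BY num DESC, business_id DESC`, would make the pick for user 14 deterministic instead of arbitrary.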
Another method, which can have slightly better performance with an index on (user_id, num), is a correlated subquery. Unlike row_number(), it returns every tied row, so user 14 would appear twice:
select t.*
from mytable t
where t.num = (select max(t2.num)
from mytable t2
where t2.user_id = t.user_id
);
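A sketch of the correlated subquery on the same sample data, again with Python's sqlite3 standing in for MySQL. The index name idx_user_num is hypothetical; it mirrors the (user_id, num) index suggested above. Because the filter is `num = MAX(num)`, the tie for user 14 comes back as two rows.

```python
import sqlite3

# Sample rows from the question: (num, business_id, user_id).
ROWS = [(3, 503, 12), (7, 33, 12), (1, 771, 13), (2, 86, 13),
        (1, 772, 13), (4, 652, 14), (4, 567, 14)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (num INTEGER, business_id INTEGER, user_id INTEGER)")
# Composite index that supports the per-user MAX(num) lookup.
conn.execute("CREATE INDEX idx_user_num ON mytable (user_id, num)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

tied = conn.execute("""
    SELECT t.num, t.business_id, t.user_id
    FROM mytable t
    WHERE t.num = (SELECT MAX(t2.num)
                   FROM mytable t2
                   WHERE t2.user_id = t.user_id)
    ORDER BY t.user_id, t.business_id
""").fetchall()
print(tied)  # [(7, 33, 12), (2, 86, 13), (4, 567, 14), (4, 652, 14)]
```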
You should think "group by" when you want to summarize rows. You should think "where" when you want to choose rows with particular characteristics.

Related

How to select one row per user conditionally?

I have the following query:
select count(1) num, business_id, user_id FROM `pos_transactions`
group by user_id, business_id
order by user_id
It returns this:
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 3 | 503 | 12 |
| 7 | 33 | 12 |
| 1 | 771 | 13 |
| 2 | 86 | 13 |
| 1 | 772 | 13 |
| 4 | 652 | 14 |
| 4 | 567 | 14 |
+--------+-------------+---------+
I need to select only one row per user_id: the one with the largest num value. If all num values for a user are identical, then just one of them should be selected arbitrarily (e.g. user #14). So, here is the expected result:
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 7 | 33 | 12 |
| 2 | 86 | 13 |
| 4 | 567 | 14 |
+--------+-------------+---------+
Any idea how I can do that?
I guess the solution will involve something like LIMIT 1 per user, but I have no idea how to write the query.
All I want is to make the result unique per user_id, keeping the row with the largest num.
Use MAX() and FIRST_VALUE() window functions:
SELECT DISTINCT
MAX(COUNT(*)) OVER (PARTITION BY user_id) num,
FIRST_VALUE(business_id) OVER (PARTITION BY user_id ORDER BY COUNT(*) DESC) business_id,
user_id
FROM pos_transactions
GROUP BY user_id, business_id
ORDER BY user_id
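A sketch of the MAX()/FIRST_VALUE() idea, assuming Python's sqlite3 as a stand-in for MySQL 8. Two adaptations are made on purpose: the raw pos_transactions rows are hypothetical (each count from the question is expanded into that many identical rows), and the counts are computed in a CTE first so the window functions run over already-grouped rows instead of wrapping COUNT(*) directly.

```python
import sqlite3

# Hypothetical raw data: each (count, business_id, user_id) triple from the
# question is expanded into `count` identical transaction rows.
COUNTS = [(3, 503, 12), (7, 33, 12), (1, 771, 13), (2, 86, 13),
          (1, 772, 13), (4, 652, 14), (4, 567, 14)]

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ stands in for MySQL 8
conn.execute("CREATE TABLE pos_transactions (business_id INTEGER, user_id INTEGER)")
for count, business_id, user_id in COUNTS:
    conn.executemany("INSERT INTO pos_transactions VALUES (?, ?)",
                     [(business_id, user_id)] * count)

# Group first, then take each user's top count and the business_id ranked first.
result = conn.execute("""
    WITH counts AS (
        SELECT COUNT(*) AS cnt, business_id, user_id
        FROM pos_transactions
        GROUP BY user_id, business_id
    )
    SELECT DISTINCT
        MAX(cnt) OVER (PARTITION BY user_id) AS num,
        FIRST_VALUE(business_id) OVER (PARTITION BY user_id ORDER BY cnt DESC) AS business_id,
        user_id
    FROM counts
    ORDER BY user_id
""").fetchall()
print(result)  # user 14's business_id may be 652 or 567
```

DISTINCT is what collapses each user's rows to one, since every row in a partition gets the same MAX and FIRST_VALUE.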

Select non-repeating columns based on another column [duplicate]

This question already has answers here:
How can I SELECT rows with MAX(Column value), PARTITION by another column in MYSQL?
So I have this table and I am trying to get the latest analysis_id for each repo_id:
+----+---------+-------------+
| id | repo_id | analysis_id |
+----+---------+-------------+
| 1 | 20 | 3 |
+----+---------+-------------+
| 2 | 20 | 4 |
+----+---------+-------------+
| 3 | 20 | 5 |
+----+---------+-------------+
| 4 | 21 | 6 |
+----+---------+-------------+
| 5 | 22 | 7 |
+----+---------+-------------+
So how do I get the largest analysis_id for each repo_id, without repeating repo_id?
+----+---------+-------------+
| id | repo_id | analysis_id |
+----+---------+-------------+
| 3 | 20 | 5 |
+----+---------+-------------+
| 4 | 21 | 6 |
+----+---------+-------------+
| 5 | 22 | 7 |
+----+---------+-------------+
A general MySQL 8+ friendly solution uses ROW_NUMBER:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY repo_id ORDER BY analysis_id DESC) rn
FROM yourTable
)
SELECT id, repo_id, analysis_id
FROM cte
WHERE rn = 1;
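You are looking for GROUP BY. Note that the two MAX() values are computed independently, so this only returns real rows when the largest id and the largest analysis_id of each repo_id sit on the same row, as they do in the sample data:
SELECT MAX(id) AS id, repo_id, MAX(analysis_id) AS analysis_id
FROM YOUR_TABLE
GROUP BY repo_id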
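A sketch illustrating the pitfall of computing MAX(id) and MAX(analysis_id) independently, using Python's sqlite3 and hypothetical data (not from the question) in which repo 20's newest id does not carry its largest analysis_id. The grouped query then returns a combination that matches no stored row.

```python
import sqlite3

# Hypothetical rows (id, repo_id, analysis_id): in repo 20 the newer id
# has the smaller analysis_id.
ROWS = [(1, 20, 5), (2, 20, 3), (3, 21, 6)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, repo_id INTEGER, analysis_id INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", ROWS)

grouped = conn.execute("""
    SELECT MAX(id) AS id, repo_id, MAX(analysis_id) AS analysis_id
    FROM your_table
    GROUP BY repo_id
    ORDER BY repo_id
""").fetchall()

# Result rows that do not exist in the table: values mixed from different rows.
mixed = [row for row in grouped if row not in set(ROWS)]
print(grouped, mixed)  # [(2, 20, 5), (3, 21, 6)] [(2, 20, 5)]
```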
In older versions such as MySQL 5.x, which lack window functions, you can use an anti-join:
SELECT *
FROM tablename t1
WHERE NOT EXISTS ( SELECT NULL
FROM tablename t2
WHERE t1.repo_id = t2.repo_id
AND t1.analysis_id < t2.analysis_id )
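A sketch that runs both the NOT EXISTS anti-join (comparing analysis_id, the column the question asks about) and the ROW_NUMBER() CTE on the sample data, using Python's sqlite3 as a stand-in for MySQL. On this data both return the same three rows.

```python
import sqlite3

# Sample rows from the question: (id, repo_id, analysis_id).
ROWS = [(1, 20, 3), (2, 20, 4), (3, 20, 5), (4, 21, 6), (5, 22, 7)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, repo_id INTEGER, analysis_id INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", ROWS)

# Anti-join: keep a row only if no row in the same repo has a larger analysis_id.
not_exists = conn.execute("""
    SELECT id, repo_id, analysis_id
    FROM tablename t1
    WHERE NOT EXISTS (SELECT NULL
                      FROM tablename t2
                      WHERE t2.repo_id = t1.repo_id
                        AND t2.analysis_id > t1.analysis_id)
    ORDER BY repo_id
""").fetchall()

# Window-function version (MySQL 8+ / SQLite 3.25+).
by_row_number = conn.execute("""
    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY repo_id ORDER BY analysis_id DESC) AS rn
        FROM tablename
    )
    SELECT id, repo_id, analysis_id FROM cte WHERE rn = 1 ORDER BY repo_id
""").fetchall()

print(not_exists)  # [(3, 20, 5), (4, 21, 6), (5, 22, 7)]
```

One difference worth knowing: if two rows in a repo tie on the largest analysis_id, the anti-join returns both while ROW_NUMBER() returns one.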

SQL - Select records whose columns do not follow the same order

Given the following table, where the date should increase along with the series number:
+----+--------+------------+
| id | series | date |
+----+--------+------------+
| 1 | 10 | 2020-08-13 |
| 2 | 9 | 2020-08-02 |
| 3 | 8 | 2020-06-23 |
| 4 | 7 | 2020-06-08 |
| 5 | 6 | 2020-05-20 |
| 6 | 5 | 2020-05-05 |
| 7 | 4 | 2020-05-01 |
+----+--------+------------+
Is there a way to check whether any records break this pattern?
For example, here row 2 has a bigger series number than row 3, but its date is earlier than row 3's:
+----+--------+------------+
| id | series | date |
+----+--------+------------+
| 1 | 10 | 2020-08-13 |
| 2 | 9 | 2020-06-02 |
| 3 | 8 | 2020-07-23 |
| 4 | 7 | 2020-06-08 |
| 5 | 6 | 2020-05-20 |
| 6 | 5 | 2020-05-05 |
| 7 | 4 | 2020-05-01 |
+----+--------+------------+
You can use window functions:
select *
from (
select t.*, lead(date) over(order by series) lead_date
from mytable t
) t
where date > lead_date
Alternatively:
select *
from (
select t.*, lead(series) over(order by date) lead_series
from mytable t
) t
where series > lead_series
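A sketch running both LEAD() variants on the out-of-order sample, with Python's sqlite3 standing in for MySQL 8. Dates are stored as ISO-8601 text, which compares correctly as strings. Each variant flags a different member of the bad pair: ordering by series flags row 3, ordering by date flags row 2.

```python
import sqlite3

# Out-of-order sample from the question: (id, series, date).
ROWS = [(1, 10, "2020-08-13"), (2, 9, "2020-06-02"), (3, 8, "2020-07-23"),
        (4, 7, "2020-06-08"), (5, 6, "2020-05-20"), (6, 5, "2020-05-05"),
        (7, 4, "2020-05-01")]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (id INTEGER PRIMARY KEY, series INTEGER, "date" TEXT)')
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)

# Walk in series order; flag a row whose date is after the next row's date.
by_series = [r[0] for r in conn.execute("""
    SELECT id FROM (
        SELECT t.*, LEAD("date") OVER (ORDER BY series) AS lead_date
        FROM mytable t
    ) t
    WHERE "date" > lead_date
    ORDER BY id
""")]

# Walk in date order; flag a row whose series is above the next row's series.
by_date = [r[0] for r in conn.execute("""
    SELECT id FROM (
        SELECT t.*, LEAD(series) OVER (ORDER BY "date") AS lead_series
        FROM mytable t
    ) t
    WHERE series > lead_series
    ORDER BY id
""")]

print(by_series, by_date)  # [3] [2]
```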
You can use lag():
select t.*
from (select t.*,
lag(id) over (order by series) as prev_id_series,
lag(id) over (order by date) as prev_id_date
from t
) t
where prev_id_series <> prev_id_date;
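A sketch of the lag() comparison on the same out-of-order sample, with Python's sqlite3 as a stand-in and the table named t as in the answer. It flags every row whose predecessor differs between the two orderings, so on this data it reports rows 1, 2 and 4 rather than exactly the two rows whose dates are out of order.

```python
import sqlite3

# Out-of-order sample from the question: (id, series, date).
ROWS = [(1, 10, "2020-08-13"), (2, 9, "2020-06-02"), (3, 8, "2020-07-23"),
        (4, 7, "2020-06-08"), (5, 6, "2020-05-20"), (6, 5, "2020-05-05"),
        (7, 4, "2020-05-01")]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, series INTEGER, "date" TEXT)')
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", ROWS)

# Compare each row's predecessor under series order and under date order.
flagged = [r[0] for r in conn.execute("""
    SELECT t.id
    FROM (SELECT t.*,
                 LAG(id) OVER (ORDER BY series) AS prev_id_series,
                 LAG(id) OVER (ORDER BY "date") AS prev_id_date
          FROM t) t
    WHERE prev_id_series <> prev_id_date
    ORDER BY t.id
""")]
print(flagged)  # [1, 2, 4]
```

The first row in both orderings has NULL for both lags, and `NULL <> NULL` is not true, so it is never flagged.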
You can fetch problematic rows and their corresponding conflicting rows using a self-join like this (assuming your table is called "series"):
SELECT s1.id AS row_id, s1.series AS row_series, s1.date AS row_date,
s2.id AS conflict_id, s2.series AS conflict_series, s2.date AS conflict_date
FROM series AS s1
JOIN series AS s2
ON s1.series > s2.series AND s1.date < s2.date;
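A sketch of the self-join on the out-of-order sample, with Python's sqlite3 as a stand-in and the table named series as the answer assumes. Unlike the adjacent-row checks, it lists every conflicting pair, so row 2 is reported against both row 3 and row 4.

```python
import sqlite3

# Out-of-order sample from the question: (id, series, date).
ROWS = [(1, 10, "2020-08-13"), (2, 9, "2020-06-02"), (3, 8, "2020-07-23"),
        (4, 7, "2020-06-08"), (5, 6, "2020-05-20"), (6, 5, "2020-05-05"),
        (7, 4, "2020-05-01")]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE series (id INTEGER PRIMARY KEY, series INTEGER, "date" TEXT)')
conn.executemany("INSERT INTO series VALUES (?, ?, ?)", ROWS)

# A pair conflicts when the higher series number has the earlier date.
pairs = conn.execute("""
    SELECT s1.id AS row_id, s2.id AS conflict_id
    FROM series AS s1
    JOIN series AS s2
      ON s1.series > s2.series AND s1."date" < s2."date"
    ORDER BY s1.id, s2.id
""").fetchall()
print(pairs)  # [(2, 3), (2, 4)]
```

This compares every pair of rows, so it is quadratic in table size; the window-function checks scan each ordering once.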

sort data by specific order sequence (mysql)

So, let's say I have this data:
id | value | group
1 | 100 | A
2 | 120 | A
3 | 150 | B
4 | 170 | B
I want to sort it so it becomes like this:
id | value | group
1 | 100 | A
3 | 150 | B
2 | 120 | A
4 | 170 | B
There will be more groups than that, so if the groups appear in the order (A,C,B,D,B,C,A), the result should be ordered (A,B,C,D,A,B,C).
You can compute a counter for each row (how many earlier rows share its group) in a derived table, and sort by that counter and then by group:
select t.id, t.value, t.`group`
from (
select t.id, t.value, t.`group`,
(select count(*) from tablename
where `group` = t.`group` and id < t.id) counter
from tablename t
) t
order by t.counter, t.`group`
Results:
| id | value | group |
| --- | ----- | ----- |
| 1 | 100 | A |
| 3 | 150 | B |
| 2 | 120 | A |
| 4 | 170 | B |
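A sketch of the counter approach using Python's sqlite3 as a stand-in for MySQL, with double quotes instead of MySQL's backticks around the reserved word group. It checks the question's four rows and a hypothetical seven-row table whose groups appear as A,C,B,D,B,C,A; the helper name interleave is hypothetical.

```python
import sqlite3

# Correlated counter: how many earlier rows (smaller id) share this row's group.
QUERY = """
    SELECT t.id, t.value, t."group"
    FROM (
        SELECT t.id, t.value, t."group",
               (SELECT COUNT(*) FROM tablename
                WHERE "group" = t."group" AND id < t.id) AS counter
        FROM tablename t
    ) t
    ORDER BY t.counter, t."group"
"""


def interleave(rows):
    """Load (id, value, group) rows into a fresh table and return them interleaved."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE tablename (id INTEGER PRIMARY KEY, value INTEGER, "group" TEXT)')
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    return conn.execute(QUERY).fetchall()


small = interleave([(1, 100, "A"), (2, 120, "A"), (3, 150, "B"), (4, 170, "B")])
# Hypothetical larger table whose groups appear as A,C,B,D,B,C,A.
large = interleave([(i, 10 * i, g) for i, g in enumerate("ACBDBCA", start=1)])
print(small)  # [(1, 100, 'A'), (3, 150, 'B'), (2, 120, 'A'), (4, 170, 'B')]
print([g for _, _, g in large])  # ['A', 'B', 'C', 'D', 'A', 'B', 'C']
```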
In MySQL 8.0+ you can approach this by numbering the rows of each group by id and sorting on that number:
SELECT *
FROM `tablename`
ORDER BY
row_number() OVER (PARTITION BY `group` ORDER BY id), `group`
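A sketch of sorting directly on ROW_NUMBER(), assuming Python's sqlite3 as a stand-in for MySQL 8 (SQLite also allows window functions in ORDER BY). The window is ordered by id so the numbering within each group is deterministic.

```python
import sqlite3

# Sample rows from the question: (id, value, group).
ROWS = [(1, 100, "A"), (2, 120, "A"), (3, 150, "B"), (4, 170, "B")]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tablename (id INTEGER PRIMARY KEY, value INTEGER, "group" TEXT)')
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", ROWS)

# Number rows within each group by id, then sort by that number and the group name.
ordered = conn.execute("""
    SELECT id, value, "group"
    FROM tablename
    ORDER BY ROW_NUMBER() OVER (PARTITION BY "group" ORDER BY id), "group"
""").fetchall()
print(ordered)  # [(1, 100, 'A'), (3, 150, 'B'), (2, 120, 'A'), (4, 170, 'B')]
```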

MySQL delete similar rows according to specific columns except the ones with the highest id

My table has rows with duplicate values in specific columns. I would like to remove those rows and keep only the row with the latest (highest) id.
The columns I want to check and compare are:
sub_id, spec_id, ex_time
So, for this table:
+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
| 1 | 100 | 444 | 09:29 | 2 |
| 2 | 101 | 555 | 10:01 | 10 |
| 3 | 100 | 444 | 09:29 | 23 |
| 4 | 200 | 321 | 05:15 | 5 |
| 5 | 100 | 444 | 09:29 | 8 |
| 6 | 101 | 555 | 10:01 | 1 |
+----+--------+---------+---------+-------+
I would like to get this result:
+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
| 5 | 100 | 444 | 09:29 | 8 |
| 6 | 101 | 555 | 10:01 | 1 |
+----+--------+---------+---------+-------+
I was able to build this query to select all duplicate rows across multiple columns, following this question:
select t.*
from mytable t join
(select sub_id, spec_id, ex_time, count(*) as NumDuplicates
from mytable
group by sub_id, spec_id, ex_time
having NumDuplicates > 1
) tsum
on t.sub_id = tsum.sub_id and t.spec_id = tsum.spec_id and t.ex_time = tsum.ex_time
But now I'm not sure how to wrap this SELECT in a DELETE query that removes the duplicates except for the rows with the highest id, as shown here.
You can modify your subquery to get the maximum id for each duplicate combination.
Then, when joining to the main table, add a condition that the id is not equal to that maximum id.
You can then DELETE from this result set.
Try the following:
DELETE t
FROM mytable AS t
JOIN
(SELECT MAX(id) as max_id,
sub_id,
spec_id,
ex_time,
COUNT(*) as NumDuplicates
FROM mytable
GROUP BY sub_id, spec_id, ex_time
HAVING NumDuplicates > 1
) AS tsum
ON t.sub_id = tsum.sub_id AND
t.spec_id = tsum.spec_id AND
t.ex_time = tsum.ex_time AND
t.id <> tsum.max_id
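A sketch of the same cleanup on the sample data, using Python's sqlite3. SQLite has no multi-table `DELETE ... JOIN`, so this swaps in a different technique: keep the MAX(id) of every (sub_id, spec_id, ex_time) combination and delete everything else.

```python
import sqlite3

# Sample rows from the question: (id, sub_id, spec_id, ex_time, count).
ROWS = [(1, 100, 444, "09:29", 2), (2, 101, 555, "10:01", 10),
        (3, 100, 444, "09:29", 23), (4, 200, 321, "05:15", 5),
        (5, 100, 444, "09:29", 8), (6, 101, 555, "10:01", 1)]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (id INTEGER PRIMARY KEY, sub_id INTEGER, '
             'spec_id INTEGER, ex_time TEXT, "count" INTEGER)')
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)

# Delete every row that is not the highest id of its combination.
conn.execute("""
    DELETE FROM mytable
    WHERE id NOT IN (SELECT MAX(id)
                     FROM mytable
                     GROUP BY sub_id, spec_id, ex_time)
""")
remaining = conn.execute("SELECT * FROM mytable ORDER BY id").fetchall()
print(remaining)
```

Row 4 has no duplicates, so it survives here, just as it does with the JOIN-based DELETE (whose HAVING clause skips single-row groups). MySQL rejects this `NOT IN` form when the subquery reads the table being deleted from (error 1093), which is why the MySQL answer joins a derived table instead.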