I have the following query:
select count(1) num, business_id, user_id FROM `pos_transactions`
group by user_id, business_id
order by user_id
It returns this:
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 3 | 503 | 12 |
| 7 | 33 | 12 |
| 1 | 771 | 13 |
| 2 | 86 | 13 |
| 1 | 772 | 13 |
| 4 | 652 | 14 |
| 4 | 567 | 14 |
+--------+-------------+---------+
I need to select only one row per user_id: the one with the largest num value. If the largest num is shared by several rows for a user, then just one of them should be picked at random (e.g. user #14). So, here is the expected result:
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 7 | 33 | 12 |
| 2 | 86 | 13 |
| 4 | 567 | 14 |
+--------+-------------+---------+
Any idea how I can do that?
I guess the solution will be something like a limit of 1 per user, but I have no idea how to write the query.
All I want is to make the result unique per user_id, keeping the row with the largest num.
Use MAX() and FIRST_VALUE() window functions:
SELECT DISTINCT
MAX(COUNT(*)) OVER (PARTITION BY user_id) num,
FIRST_VALUE(business_id) OVER (PARTITION BY user_id ORDER BY COUNT(*) DESC) business_id,
user_id
FROM pos_transactions
GROUP BY user_id, business_id
ORDER BY user_id
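As a runnable illustration of the same idea, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (it assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). It swaps in a different but related technique: the counts are computed in a CTE and then ranked with ROW_NUMBER(). The `id` column, the CTE names `counts` and `ranked`, and the extra `business_id` tie-break are assumptions added for the example, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pos_transactions ("
    "id INTEGER PRIMARY KEY, business_id INT, user_id INT)"
)

# (num, business_id, user_id) as shown in the grouped output of the question.
grouped = [
    (3, 503, 12), (7, 33, 12),
    (1, 771, 13), (2, 86, 13), (1, 772, 13),
    (4, 652, 14), (4, 567, 14),
]
# Expand each grouped row back into `num` individual transactions.
for num, business_id, user_id in grouped:
    conn.executemany(
        "INSERT INTO pos_transactions (business_id, user_id) VALUES (?, ?)",
        [(business_id, user_id)] * num,
    )

query = """
WITH counts AS (
    SELECT COUNT(*) AS num, business_id, user_id
    FROM pos_transactions
    GROUP BY user_id, business_id
),
ranked AS (
    SELECT counts.*,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY num DESC, business_id  -- tie-break is an assumption
           ) AS seqnum
    FROM counts
)
SELECT num, business_id, user_id
FROM ranked
WHERE seqnum = 1
ORDER BY user_id
"""
top_per_user = conn.execute(query).fetchall()
print(top_per_user)
```

The tie-break on `business_id` makes the pick for user 14 deterministic; without it, either of the two tied rows could be returned.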
Please assume this table:
// mytable
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 3 | 503 | 12 |
| 7 | 33 | 12 |
| 1 | 771 | 13 |
| 2 | 86 | 13 |
| 1 | 772 | 13 |
| 4 | 652 | 14 |
| 4 | 567 | 14 |
+--------+-------------+---------+
I need to group it by user_id, so here is my query:
select max(num), user_id from mytable
group by user_id
Here is the result:
// res
+--------+---------+
| num | user_id |
+--------+---------+
| 7 | 12 |
| 2 | 13 |
| 4 | 14 |
+--------+---------+
Now I need to also get the business_id of those rows. Here is the expected result:
// mytable
+--------+-------------+---------+
| num | business_id | user_id |
+--------+-------------+---------+
| 7 | 33 | 12 |
| 2 | 86 | 13 |
| 4 | 567 | 14 | -- This is selected randomly, because of the equality of values
+--------+-------------+---------+
Any idea how I can do that?
You don't group. You filter. One method uses window functions such as row_number():
select t.*
from (select t.*,
row_number() over (partition by user_id order by num desc) as seqnum
from mytable t
) t
where seqnum = 1;
Another method, which can have slightly better performance with an index on (user_id, num), is a correlated subquery. Note that it returns every row tied for the maximum, so user 14 would appear twice:
select t.*
from mytable t
where t.num = (select max(t2.num)
from mytable t2
where t2.user_id = t.user_id
);
You should think "group by" when you want to summarize rows. You should think "where" when you want to choose rows with particular characteristics.
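To see how the two filters differ when there is a tie, here is a small sketch that runs both against the sample data using Python's sqlite3 module as a stand-in for MySQL (assuming SQLite 3.25+ for window functions). The result names `max_filter` and `numbered`, the derived-table alias `ranked`, and the `business_id` tie-break are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (num INT, business_id INT, user_id INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (3, 503, 12), (7, 33, 12),
        (1, 771, 13), (2, 86, 13), (1, 772, 13),
        (4, 652, 14), (4, 567, 14),
    ],
)

# Correlated subquery: keeps every row whose num equals the user's maximum.
max_filter = conn.execute("""
    SELECT t.num, t.business_id, t.user_id
    FROM mytable t
    WHERE t.num = (SELECT MAX(t2.num)
                   FROM mytable t2
                   WHERE t2.user_id = t.user_id)
    ORDER BY t.user_id, t.business_id
""").fetchall()

# ROW_NUMBER(): keeps exactly one row per user, ties broken by business_id.
numbered = conn.execute("""
    SELECT num, business_id, user_id
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY user_id
                     ORDER BY num DESC, business_id
                 ) AS seqnum
          FROM mytable t) AS ranked
    WHERE seqnum = 1
    ORDER BY user_id
""").fetchall()

print(max_filter)
print(numbered)
```

On this data the correlated subquery yields four rows (both tied rows for user 14), while the row_number() version yields three, so the latter is the one that matches the expected result in the question.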
I have the following result set...
Name | Team | Score
A | 1 | 10
B | 1 | 11
C | 2 | 9
D | 2 | 15
and I want to add an extra column to the result set for the team score, so I can sort on it and end up with the following data set...
Name | Team | Score | TeamScore
D | 2 | 15 | 24
C | 2 | 9 | 24
B | 1 | 11 | 21
A | 1 | 10 | 21
So I end up with the top team first with the members in order.
My actual data is way more complicated than this and pulls in data from several tables but if you can solve this one I can solve my bigger issue!
Join the table to a query that returns the total for each team:
select t.*, s.teamscore
from tablename t
inner join (
select team, sum(score) teamscore
from tablename
group by team
) s on s.team = t.team
order by s.teamscore desc, t.team, t.score desc
Results:
| Name | Team | Score | teamscore |
| ---- | ---- | ----- | --------- |
| D | 2 | 15 | 24 |
| C | 2 | 9 | 24 |
| B | 1 | 11 | 21 |
| A | 1 | 10 | 21 |
In MySQL 8+, we can simplify and just use SUM as an analytic function:
SELECT
Name,
Team,
Score,
SUM(Score) OVER (PARTITION BY Team) AS TeamScore
FROM yourTable
ORDER BY
TeamScore DESC,
Team,
Score DESC;
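Both approaches can be checked side by side. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (assuming SQLite 3.25+ for the window function); the table name `scores` is a placeholder chosen for the example, and both queries sort by team total, then team, then score descending so that they match the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (Name TEXT, Team INT, Score INT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("A", 1, 10), ("B", 1, 11), ("C", 2, 9), ("D", 2, 15)],
)

# Approach 1: join each row to a per-team total computed with GROUP BY.
joined = conn.execute("""
    SELECT t.Name, t.Team, t.Score, s.TeamScore
    FROM scores t
    INNER JOIN (SELECT Team, SUM(Score) AS TeamScore
                FROM scores
                GROUP BY Team) s
        ON s.Team = t.Team
    ORDER BY s.TeamScore DESC, t.Team, t.Score DESC
""").fetchall()

# Approach 2: SUM as a window function, no join needed.
windowed = conn.execute("""
    SELECT Name, Team, Score,
           SUM(Score) OVER (PARTITION BY Team) AS TeamScore
    FROM scores
    ORDER BY TeamScore DESC, Team, Score DESC
""").fetchall()

print(joined)
print(windowed)
```

Sorting on `Team` after `TeamScore` keeps the members of two teams with equal totals from being interleaved.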
My table has duplicate values in specific columns. I would like to remove those duplicate rows and keep only the row with the latest id.
The columns I want to check and compare are:
sub_id, spec_id, ex_time
So, for this table
+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
| 1 | 100 | 444 | 09:29 | 2 |
| 2 | 101 | 555 | 10:01 | 10 |
| 3 | 100 | 444 | 09:29 | 23 |
| 4 | 200 | 321 | 05:15 | 5 |
| 5 | 100 | 444 | 09:29 | 8 |
| 6 | 101 | 555 | 10:01 | 1 |
+----+--------+---------+---------+-------+
I would like to get this result
+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
| 5 | 100 | 444 | 09:29 | 8 |
| 6 | 101 | 555 | 10:01 | 1 |
+----+--------+---------+---------+-------+
I was able to build this query to select all rows that are duplicated across multiple columns, following the approach from a related question:
select t.*
from mytable t join
(select id, sub_id, spec_id, ex_time, count(*) as NumDuplicates
from mytable
group by sub_id, spec_id, ex_time
having NumDuplicates > 1
) tsum
on t.sub_id = tsum.sub_id and t.spec_id = tsum.spec_id and t.ex_time = tsum.ex_time
But now I'm not sure how to wrap this select in a delete query that deletes all the duplicate rows except the ones with the highest id.
You can modify your sub-select to also return the maximum id for each combination of duplicated values.
Then, when joining to the main table, add a condition that the row's id is not equal to that maximum id.
You can now delete from this result set.
Try the following:
DELETE t
FROM mytable AS t
JOIN
(SELECT MAX(id) as max_id,
sub_id,
spec_id,
ex_time,
COUNT(*) as NumDuplicates
FROM mytable
GROUP BY sub_id, spec_id, ex_time
HAVING NumDuplicates > 1
) AS tsum
ON t.sub_id = tsum.sub_id AND
t.spec_id = tsum.spec_id AND
t.ex_time = tsum.ex_time AND
t.id <> tsum.max_id
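The effect of "keep only the highest id per group" can be tried out with Python's sqlite3 module. This is only a sketch under stated assumptions: SQLite does not accept the multi-table DELETE ... JOIN syntax used above, so the example swaps in a NOT IN (SELECT MAX(id) ...) formulation with the same outcome. MySQL, in turn, rejects a subquery that reads directly from the table being deleted from, which is one reason the JOIN form is used there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE mytable (id INTEGER PRIMARY KEY, sub_id INT, '
    'spec_id INT, ex_time TEXT, "count" INT)'
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 100, 444, "09:29", 2),
        (2, 101, 555, "10:01", 10),
        (3, 100, 444, "09:29", 23),
        (4, 200, 321, "05:15", 5),
        (5, 100, 444, "09:29", 8),
        (6, 101, 555, "10:01", 1),
    ],
)

# Delete every row that is not the highest id of its
# (sub_id, spec_id, ex_time) group.
conn.execute("""
    DELETE FROM mytable
    WHERE id NOT IN (
        SELECT MAX(id)
        FROM mytable
        GROUP BY sub_id, spec_id, ex_time
    )
""")

remaining = conn.execute(
    'SELECT id, sub_id, spec_id, ex_time, "count" FROM mytable ORDER BY id'
).fetchall()
print(remaining)
```

Rows 5 and 6 survive as the latest of their groups, and row 4 also stays because it has no duplicates to begin with.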
I am not very good at MySQL queries. Can someone help me figure out how to do this?
I have a table like this (let's call it stats):
+----+-------+-----+
| id | memid | qty |
+----+-------+-----+
| 1 | 99 | 0 |
+----+-------+-----+
| 2 | 102 | 22 |
+----+-------+-----+
| 3 | 102 | 10 |
+----+-------+-----+
| 4 | 99 | 100 |
+----+-------+-----+
| 5 | 17 | 25 |
+----+-------+-----+
| 6 | 87 | 72 |
+----+-------+-----+
| 7 | 36 | 0 |
+----+-------+-----+
| 8 | 102 | 6 |
+----+-------+-----+
I need a MySQL query that will sum the qty for each memid and order the results by that total qty, ascending.
Thank you in advance for your help! :)
You can select the overall SUM as an extra column and order the rows by qty, e.g.:
SELECT id, memid, qty, (SELECT SUM(qty) FROM stats) AS total_qty
FROM stats
ORDER BY qty;
Please note that total_qty will be the same for every row, because it is the sum over the whole table.
If you have multiple records per memid and want to calculate the SUM per memid, then you can use GROUP BY, e.g.:
SELECT memid, SUM(qty) AS total_qty
FROM stats
GROUP BY memid
ORDER BY total_qty;
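For a quick check of the per-memid totals on the sample data, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The alias `total_qty` and the secondary sort on `memid` (to keep equal totals in a stable order) are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, memid INT, qty INT)")
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?)",
    [
        (1, 99, 0), (2, 102, 22), (3, 102, 10), (4, 99, 100),
        (5, 17, 25), (6, 87, 72), (7, 36, 0), (8, 102, 6),
    ],
)

# One row per memid with its summed qty, smallest total first.
totals = conn.execute("""
    SELECT memid, SUM(qty) AS total_qty
    FROM stats
    GROUP BY memid
    ORDER BY total_qty ASC, memid
""").fetchall()
print(totals)
```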
I have a table with a bunch of mid's (machine ids) and sid's (sensor ids), and their corresponding v's (values). The id column is a unique row number. (NB: There are other columns in the table, and not all mid's have the same sid's.)
Current Table:
+------+-------+-------+-----+---------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+---------------------+
| 51 | 10 | 1 | 40 | 2015/5/1 11:56:01 |
| 52 | 10 | 2 | 39 | 2015/5/1 11:56:25 |
| 53 | 10 | 2 | 40 | 2015/5/1 11:56:42 |
| 54 | 11 | 1 | 50 | 2015/5/1 11:57:52 |
| 55 | 11 | 2 | 18 | 2015/5/1 11:58:41 |
| 56 | 11 | 2 | 19 | 2015/5/1 11:58:59 |
| 57 | 11 | 3 | 58 | 2015/5/1 11:59:01 |
| 58 | 11 | 3 | 65 | 2015/5/1 11:59:29 |
+------+-------+-------+-----+---------------------+
Q: How would I get the MAX(v) for each sid for each mid?
Expected Output:
+------+-------+-------+-----+---------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+---------------------+
| 51 | 10 | 1 | 40 | 2015/5/1 11:56:01 |
| 53 | 10 | 2 | 40 | 2015/5/1 11:56:42 |
| 54 | 11 | 1 | 50 | 2015/5/1 11:57:52 |
| 56 | 11 | 2 | 19 | 2015/5/1 11:58:59 |
| 58 | 11 | 3 | 65 | 2015/5/1 11:59:29 |
+------+-------+-------+-----+---------------------+
The expected output is the whole row that holds the (single) maximum v for each sid within each mid.
Addendum:
Due to a very big table, I need to place boundaries with dates. For the sample above the two boundary dates should be 2015/05/01 00:00:00 (1st of May'15) till 2015/05/02 00:00:00 (2nd of May'15). Q: How could I add this date boundary?
Find the max v in a subquery for each combination of mid and sid, and then join it with your original table to get the desired result.
select *
from your_table t
join (
select mid, sid, max(v) as v
from your_table
group by mid, sid
) t2 using (mid, sid, v);
Note that if there are multiple rows with the same sid, mid and v, this will return all of them.
As mentioned in the comments, since you have an id column, you can use it in a correlated subquery with LIMIT, like this:
select *
from your_table t1
where id = (select id
from your_table t2
where t1.mid = t2.mid
and t1.sid = t2.sid
order by v desc, id desc
limit 1
);
This will give you one single row per mid, sid combination with max v (and latest id in case of ties).
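The date boundary from the addendum can be added inside the correlated subquery. The following is a hedged sketch using Python's sqlite3 module as a stand-in for MySQL, with several assumptions: the timestamp column is named `ts` and stored as ISO-style text so that string comparison orders correctly, the bounds are passed as the hypothetical parameters `:start_ts` and `:end_ts`, and row 59 is an invented out-of-range row added only to show the boundary taking effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    "id INTEGER PRIMARY KEY, mid INT, sid INT, v INT, ts TEXT)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?, ?)",
    [
        (51, 10, 1, 40, "2015-05-01 11:56:01"),
        (52, 10, 2, 39, "2015-05-01 11:56:25"),
        (53, 10, 2, 40, "2015-05-01 11:56:42"),
        (54, 11, 1, 50, "2015-05-01 11:57:52"),
        (55, 11, 2, 18, "2015-05-01 11:58:41"),
        (56, 11, 2, 19, "2015-05-01 11:58:59"),
        (57, 11, 3, 58, "2015-05-01 11:59:01"),
        (58, 11, 3, 65, "2015-05-01 11:59:29"),
        # Hypothetical row outside the window, not in the original data.
        (59, 11, 3, 99, "2015-05-02 08:00:00"),
    ],
)

query = """
SELECT t1.id, t1.mid, t1.sid, t1.v, t1.ts
FROM your_table t1
WHERE t1.id = (
    SELECT t2.id
    FROM your_table t2
    WHERE t2.mid = t1.mid
      AND t2.sid = t1.sid
      AND t2.ts >= :start_ts   -- inclusive lower bound
      AND t2.ts <  :end_ts     -- exclusive upper bound
    ORDER BY t2.v DESC, t2.id DESC
    LIMIT 1
)
ORDER BY t1.mid, t1.sid
"""
best = conn.execute(
    query,
    {"start_ts": "2015-05-01 00:00:00", "end_ts": "2015-05-02 00:00:00"},
).fetchall()
best_ids = [row[0] for row in best]
print(best_ids)
```

Because the subquery only ever returns ids inside the window, the outer query needs no date filter for correctness; on a very large table, repeating the same range condition on `t1` may still help the optimizer narrow the scan.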
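Use the MAX() function with a GROUP BY clause:
SELECT mid, sid, MAX(v) AS v
FROM MyTable
GROUP BY mid, sid;
This returns the maximum value of v for each combination of mid and sid. It cannot reliably return id or timestamp as well: those columns are neither grouped nor aggregated, so MySQL either rejects them (with ONLY_FULL_GROUP_BY enabled) or takes their values from an arbitrary row in the group.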