Delete all rows except the one with the biggest value - mysql

I store transactions in a table, and I want to delete all transactions (per user_id) except the one with the biggest amount. Here is an example table:
+----+---------+--------+
| id | user_id | amount |
+----+---------+--------+
| 1 | 1 | 10 |
+----+---------+--------+
| 2 | 1 | 20 |
+----+---------+--------+
| 3 | 1 | 30 |
+----+---------+--------+
| 4 | 2 | 50 |
+----+---------+--------+
| 5 | 2 | 100 |
+----+---------+--------+
| 6 | 3 | 2 |
+----+---------+--------+
| 7 | 3 | 4 |
+----+---------+--------+
I want the following result:
+----+---------+--------+
| id | user_id | amount |
+----+---------+--------+
| 3 | 1 | 30 |
+----+---------+--------+
| 5 | 2 | 100 |
+----+---------+--------+
| 7 | 3 | 4 |
+----+---------+--------+
I tried:
DELETE FROM `transactions`
WHERE `user_id` NOT IN (
SELECT `user_id`
FROM (
SELECT MAX(`amount`) AS ts
FROM `transactions` e
WHERE `user_id` = `user_id`
) s
WHERE ts = `transactions`.`amount`
)
ORDER BY `transactions`.`user_id` ASC

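DELETE FROM `transactions`
WHERE id NOT IN
(
    -- MySQL rejects a subquery that reads the DELETE target directly
    -- (error 1093), so the grouped result is wrapped in a derived table
    SELECT max_id
    FROM (
        SELECT MAX(id) AS max_id
        FROM `transactions`
        GROUP BY user_id
    ) AS keep_ids
)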
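The inner query groups the rows by user and selects the highest id for each; every record whose id is not in that list is deleted. This keeps each user's most recently inserted row, which in this sample is also the row with the biggest amount.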
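A quick way to check the keep-the-highest-id approach is to run it against the sample rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite would also accept the subquery without the derived table, but it is kept in the wrapped form MySQL needs). The function name `delete_all_but_max_id` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, user_id, amount)
ROWS = [(1, 1, 10), (2, 1, 20), (3, 1, 30), (4, 2, 50),
        (5, 2, 100), (6, 3, 2), (7, 3, 4)]


def delete_all_but_max_id(rows):
    """Keep only the highest-id row per user_id and return what survives."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INT, amount INT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    # The grouped MAX(id) list sits inside a derived table, the same shape
    # MySQL needs to avoid "can't specify target table" (error 1093).
    conn.execute(
        """
        DELETE FROM transactions
        WHERE id NOT IN (
            SELECT max_id
            FROM (SELECT MAX(id) AS max_id
                  FROM transactions
                  GROUP BY user_id) AS keep_ids
        )
        """
    )
    remaining = conn.execute(
        "SELECT id, user_id, amount FROM transactions ORDER BY id"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    print(delete_all_but_max_id(ROWS))  # prints [(3, 1, 30), (5, 2, 100), (7, 3, 4)]
```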

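I wasn't sure what you meant by "except the latest one", so I assumed you meant the last record inserted for each user, i.e. the row with the highest id. ORDER BY id DESC inside a grouped subquery does not decide which id MySQL returns for each user (and the query is rejected when ONLY_FULL_GROUP_BY is enabled), so MAX(id) picks it explicitly:
DELETE FROM `transactions`
WHERE `id` NOT IN (
    SELECT `max_id`
    FROM (
        -- derived table avoids MySQL error 1093 (target table used in subquery)
        SELECT MAX(`id`) AS `max_id`
        FROM `transactions`
        GROUP BY `user_id`
    ) AS `latest`
)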
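Both answers above keep each user's highest id, which coincides with the biggest amount only in this sample. The sketch below keys on amount instead: it deletes every row whose amount is below its user's maximum. It again uses Python's sqlite3 module as a stand-in for MySQL. The assumptions are these: rows tied on the maximum amount are all kept, and `delete_all_but_max_amount` is a hypothetical name. SQLite accepts this correlated subquery on the DELETE target, but MySQL rejects it (error 1093), so in MySQL you would join against a derived table of per-user maxima instead.

```python
import sqlite3

# Sample rows from the question: (id, user_id, amount)
ROWS = [(1, 1, 10), (2, 1, 20), (3, 1, 30), (4, 2, 50),
        (5, 2, 100), (6, 3, 2), (7, 3, 4)]


def delete_all_but_max_amount(rows):
    """Keep, per user_id, the row(s) with the largest amount."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INT, amount INT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    # Delete a row whenever the same user has a larger amount elsewhere.
    # SQLite allows this correlated subquery on the DELETE target;
    # MySQL does not (error 1093).
    conn.execute(
        """
        DELETE FROM transactions
        WHERE amount < (SELECT MAX(t2.amount)
                        FROM transactions AS t2
                        WHERE t2.user_id = transactions.user_id)
        """
    )
    remaining = conn.execute(
        "SELECT id, user_id, amount FROM transactions ORDER BY id"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    # Here the newer row (id 2) is not the bigger one (id 1)
    print(delete_all_but_max_amount([(1, 1, 50), (2, 1, 10)]))  # prints [(1, 1, 50)]
```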

Related

Mysql delete similar rows according to specific columns except the ones with highest id

My table has rows with duplicate values in specific columns. I would like to remove those rows and keep the row with the latest id.
The columns I want to check and compare are:
sub_id, spec_id, ex_time
So, for this table:
+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
| 1 | 100 | 444 | 09:29 | 2 |
| 2 | 101 | 555 | 10:01 | 10 |
| 3 | 100 | 444 | 09:29 | 23 |
| 4 | 200 | 321 | 05:15 | 5 |
| 5 | 100 | 444 | 09:29 | 8 |
| 6 | 101 | 555 | 10:01 | 1 |
+----+--------+---------+---------+-------+
I would like to get this result:
+----+--------+---------+---------+-------+
| id | sub_id | spec_id | ex_time | count |
+----+--------+---------+---------+-------+
| 5 | 100 | 444 | 09:29 | 8 |
| 6 | 101 | 555 | 10:01 | 1 |
+----+--------+---------+---------+-------+
I was able to build this query to select all duplicate rows across multiple columns, according to this question:
select t.*
from mytable t join
(select sub_id, spec_id, ex_time, count(*) as NumDuplicates
from mytable
group by sub_id, spec_id, ex_time
having NumDuplicates > 1
) tsum
on t.sub_id = tsum.sub_id and t.spec_id = tsum.spec_id and t.ex_time = tsum.ex_time
But now I'm not sure how to wrap this SELECT in a DELETE query that removes the duplicate rows except for the ones with the highest id, as in the desired result above.
You can modify your sub-select query to get the maximum value of id for each duplicate combination.
Then, while joining to the main table, add a condition that the id value is not equal to that maximum id value.
You can then DELETE from this result set.
Try the following:
DELETE t
FROM mytable AS t
JOIN
(SELECT MAX(id) as max_id,
sub_id,
spec_id,
ex_time,
COUNT(*) as NumDuplicates
FROM mytable
GROUP BY sub_id, spec_id, ex_time
HAVING NumDuplicates > 1
) AS tsum
ON t.sub_id = tsum.sub_id AND
t.spec_id = tsum.spec_id AND
t.ex_time = tsum.ex_time AND
t.id <> tsum.max_id
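The same idea can be checked on the sample table with Python's sqlite3 module standing in for MySQL. SQLite has no MySQL-style multi-table DELETE t FROM t JOIN ..., so this sketch swaps in an equivalent NOT IN form: keep MAX(id) for each (sub_id, spec_id, ex_time) combination and delete the rest. Row 4 has no duplicates, so it is its own maximum and survives, just as it does with the JOIN query above (which only touches groups with NumDuplicates > 1). `dedupe_keep_latest` is a hypothetical name.

```python
import sqlite3

# (id, sub_id, spec_id, ex_time, count) from the question
ROWS = [
    (1, 100, 444, "09:29", 2),
    (2, 101, 555, "10:01", 10),
    (3, 100, 444, "09:29", 23),
    (4, 200, 321, "05:15", 5),
    (5, 100, 444, "09:29", 8),
    (6, 101, 555, "10:01", 1),
]


def dedupe_keep_latest(rows):
    """Delete duplicates on (sub_id, spec_id, ex_time), keeping the highest id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE mytable (id INTEGER PRIMARY KEY, sub_id INT, '
        'spec_id INT, ex_time TEXT, "count" INT)'
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", rows)
    # Every (sub_id, spec_id, ex_time) group keeps only its highest id
    conn.execute(
        """
        DELETE FROM mytable
        WHERE id NOT IN (SELECT MAX(id)
                         FROM mytable
                         GROUP BY sub_id, spec_id, ex_time)
        """
    )
    remaining = conn.execute("SELECT id FROM mytable ORDER BY id").fetchall()
    conn.close()
    return [row[0] for row in remaining]


if __name__ == "__main__":
    print(dedupe_keep_latest(ROWS))  # prints [4, 5, 6]
```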

I have an article table that stores the user id, article id and article views, and I want to retrieve the user ids whose total views are greater than 50

I have an article table that stores the user id, article id, and the article views. I want to get all the users from this table together with their total article views (the sum), keeping only those whose total is > 50.
See the table below.
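+----+---------+------------+-------+
| id | user_id | article_id | views |
+----+---------+------------+-------+
| 1 | 2 | 1 | 34 |
| 2 | 2 | 2 | 26 |
| 3 | 3 | 3 | 19 |
| 4 | 3 | 4 | 26 |
| 5 | 4 | 5 | 40 |
| 6 | 4 | 6 | 29 |
+----+---------+------------+-------+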
I want to get something like this:
+---------+-------+
| user_id | views |
+---------+-------+
| 2 | 60 |
| 4 | 69 |
+---------+-------+
You would probably want to use the aggregate function SUM() to total up the views, and GROUP BY to produce one row per user.
mysql> describe `blah`;
+------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| user_id | int(11) | YES | | NULL | |
| article_id | int(11) | YES | | NULL | |
| views | int(11) | YES | | NULL | |
+------------+------------------+------+-----+---------+----------------+
mysql> select * from blah;
+----+---------+------------+-------+
| id | user_id | article_id | views |
+----+---------+------------+-------+
| 1 | 2 | 1 | 34 |
| 2 | 2 | 2 | 26 |
| 3 | 3 | 3 | 19 |
| 4 | 3 | 4 | 26 |
| 5 | 4 | 5 | 40 |
| 6 | 4 | 6 | 29 |
+----+---------+------------+-------+
/* use `SUM` and `GROUP BY` to give desired output */
mysql> select `user_id`, sum( `views` ) from `blah` group by `user_id`;
+---------+----------------+
| user_id | sum( `views` ) |
+---------+----------------+
| 2 | 60 |
| 3 | 45 |
| 4 | 69 |
+---------+----------------+
To limit the results to users whose total is above 50:
mysql> select `user_id`, sum( `views` ) as 'total' from `blah` group by `user_id`
having `total` > 50;
+---------+-------+
| user_id | total |
+---------+-------+
| 2 | 60 |
| 4 | 69 |
+---------+-------+
What you need is GROUP BY, SUM() and HAVING.
You need GROUP BY to tell MySQL to group the entries of the same user together into one entry. You need SUM(views) to tell MySQL that the views of all entries for the same user must be added together. Finally, you need HAVING to tell MySQL that there is a condition at the group level that must be fulfilled for a group to count as a valid result.
SELECT user_id, SUM(views) AS views
FROM your_table  -- placeholder: the question does not name the table
GROUP BY user_id
HAVING SUM(views) > 50
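Here is a sketch that runs the GROUP BY / SUM() / HAVING query on the sample rows, with Python's sqlite3 module standing in for MySQL. The table name `articles` and the function `users_over` are hypothetical, since the question does not name its table. ORDER BY user_id is added only to make the output order predictable.

```python
import sqlite3

# (id, user_id, article_id, views) from the question
ROWS = [(1, 2, 1, 34), (2, 2, 2, 26), (3, 3, 3, 19),
        (4, 3, 4, 26), (5, 4, 5, 40), (6, 4, 6, 29)]


def users_over(rows, threshold):
    """Return (user_id, total views) for users whose total exceeds threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INT, "
        "article_id INT, views INT)"
    )
    conn.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", rows)
    # HAVING filters whole groups after SUM() has been computed per user
    result = conn.execute(
        """
        SELECT user_id, SUM(views) AS views
        FROM articles
        GROUP BY user_id
        HAVING SUM(views) > ?
        ORDER BY user_id
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(users_over(ROWS, 50))  # prints [(2, 60), (4, 69)]
```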

How to select last row for specific user?

I have a table like this:
// requests
+----+----------+-------------+
| id | id_user | unix_time |
+----+----------+-------------+
| 1 | 2353 | 1339412843 |
| 2 | 2353 | 1339412864 |
| 3 | 5462 | 1339412894 |
| 4 | 3422 | 1339412899 |
| 5 | 3422 | 1339412906 |
| 6 | 2353 | 1339412906 |
| 7 | 7785 | 1339412951 |
| 8 | 2353 | 1339413640 |
| 9 | 5462 | 1339413621 |
| 10 | 5462 | 1339414490 |
| 11 | 2353 | 1339414923 |
| 12 | 2353 | 1339419901 |
| 13 | 8007 | 1339424860 |
| 14 | 7785 | 1339424822 |
| 15 | 2353 | 1339424902 |
| 16 | 2353 | 1466272801 |
| 17 | 2353 | 1466272805 |
+----+----------+-------------+
I'm trying to get the last row for a specific user. For example, for id_user = 7785 I want to select this row:
| 14 | 7785 | 1339424822 |
And here is my query:
SELECT unix_time AS last_seen
FROM requests WHERE id = '7785'
ORDER BY unix_time DESC
LIMIT 1
But my query doesn't select any row. What's wrong?
Also, should I create separate single-column indexes on id_user and unix_time, or one multi-column index on (id_user, unix_time)?
You are filtering on the wrong column (id instead of id_user):
SELECT unix_time AS last_seen
FROM requests WHERE id_user = '7785'
ORDER BY unix_time DESC
LIMIT 1
Let a sub-query return each id_user with its highest unix_time, then join with that result.
select t1.*
from requests t1
join (select id_user, max(unix_time) as unix_time
from requests
group by id_user) t2
on t1.id_user = t2.id_user and t1.unix_time = t2.unix_time
This returns every user together with their latest request.
Add WHERE t1.id_user = '7785' if you only want a single user (the column must be qualified, because both t1 and t2 have an id_user column).
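The join approach can be checked on the full sample with Python's sqlite3 module standing in for MySQL (an assumption). `latest_request` is a hypothetical name. The filter is written as t1.id_user because both sides of the join have an id_user column. If a user had two rows sharing the same highest unix_time, both rows would come back.

```python
import sqlite3

# (id, id_user, unix_time) from the question's requests table
REQUESTS = [
    (1, 2353, 1339412843), (2, 2353, 1339412864), (3, 5462, 1339412894),
    (4, 3422, 1339412899), (5, 3422, 1339412906), (6, 2353, 1339412906),
    (7, 7785, 1339412951), (8, 2353, 1339413640), (9, 5462, 1339413621),
    (10, 5462, 1339414490), (11, 2353, 1339414923), (12, 2353, 1339419901),
    (13, 8007, 1339424860), (14, 7785, 1339424822), (15, 2353, 1339424902),
    (16, 2353, 1466272801), (17, 2353, 1466272805),
]


def latest_request(rows, user):
    """Return the row(s) holding the highest unix_time for one id_user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE requests (id INTEGER PRIMARY KEY, id_user INT, unix_time INT)"
    )
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", rows)
    # t2 holds each user's maximum unix_time; joining back recovers the full row
    result = conn.execute(
        """
        SELECT t1.id, t1.id_user, t1.unix_time
        FROM requests AS t1
        JOIN (SELECT id_user, MAX(unix_time) AS unix_time
              FROM requests
              GROUP BY id_user) AS t2
          ON t1.id_user = t2.id_user AND t1.unix_time = t2.unix_time
        WHERE t1.id_user = ?
        ORDER BY t1.id
        """,
        (user,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(latest_request(REQUESTS, 7785))  # prints [(14, 7785, 1339424822)]
```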
In the WHERE condition you are comparing id instead of id_user.
Your mistake is on line two; it should be:
SELECT unix_time AS last_seen
FROM requests WHERE id_user = '7785'
ORDER BY unix_time DESC
LIMIT 1
You have:
WHERE id = '7785'
If you want the row: | 14 | 7785 | 1339424822 |
You need something like:
SELECT unix_time AS last_seen
FROM requests WHERE id_user = '7785'
ORDER BY unix_time DESC
LIMIT 1
This takes the top result of the query where id_user equals '7785'. ORDER BY unix_time DESC puts the rows with the highest unix_time first, so the top row is the user's most recent request. (SELECT TOP 1 is SQL Server syntax; MySQL uses LIMIT 1 instead.)
You can also use the query below:
SELECT SUBSTRING_INDEX(id, ',', 1) AS id, SUBSTRING_INDEX(id_user, ',', 1) AS id_user, unix_time
FROM (SELECT GROUP_CONCAT(id ORDER BY unix_time DESC) AS id, GROUP_CONCAT(id_user ORDER BY unix_time DESC) AS id_user, MAX(unix_time) AS unix_time
      FROM requests GROUP BY id_user
      HAVING id_user = '7785') t;

MySQL sort by sum multiple columns in different tables

I have 3 tables:
Users
| id | name |
|----|-------|
| 1 | One |
| 2 | Two |
| 3 | Three |
Likes
| id | user_id | like |
|----|---------|-------|
| 1 | 1 | 3 |
| 2 | 1 | 5 |
| 3 | 2 | 1 |
| 4 | 3 | 2 |
Transactions
| id | user_id | transaction |
|----|---------|-------------|
| 1 | 1 | -1 |
| 2 | 2 | 5 |
| 3 | 2 | -1 |
| 4 | 3 | 10 |
I need to get the sum of likes.like and transactions.transaction for each user and then sort by that result.
I was able to do it for users and likes:
select users.*, sum(likes.like) as points
from `users`
inner join `likes` on `likes`.`user_id` = `users`.`id`
group by `users`.`id`
order by points desc
But when I add the transactions table like this:
select users.*, (sum(likes.like)+sum(transactions.`transaction`)) as points
from `users`
inner join `likes` on `likes`.`user_id` = `users`.`id`
inner join `transactions` on `transactions`.`user_id` = `users`.`id`
group by `users`.`id`
order by points desc
It shows wrong results.
I expect to see:
| id | name | points |
|----|-------|--------|
| 3 | Three | 12 |
| 1 | One | 7 |
| 2 | Two | 5 |
But get this instead:
| id | name | points |
|----|-------|--------|
| 3 | Three | 12 |
| 1 | One | 6 |
| 2 | Two | 5 |
So, how do I sort users by the sum of likes.like and transactions.transaction?
Thank you!
Since there is not a 1-to-1 relationship between transactions and likes, joining both tables repeats rows (each like is paired with each of the user's transactions), so I think you need to use subqueries:
select users.*,
(select sum(`like`) from likes where user_id = users.id) as points,
(select sum(`transaction`) from transactions where user_id = users.id) as transactions
from users
order by points desc
Updated after more explanation of requirements:
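select users.*,
coalesce((select sum(`like`) from likes where user_id = users.id), 0) +
coalesce((select sum(`transaction`) from transactions where user_id = users.id), 0) as points
-- coalesce(..., 0) keeps a user with no likes or no transactions from getting NULL points
from users
order by points desc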
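Here is a sketch that runs the subquery version on the sample data with Python's sqlite3 module. It makes three assumptions: SQLite stands in for MySQL, so the keyword column names are double-quoted instead of backticked; COALESCE is included for users with no rows in one table; and `rank_users` is a hypothetical name. With the plain double join, user 1's single transaction row is paired with both of that user's like rows, which is how 8 - 1 = 7 turns into 8 - 2 = 6.

```python
import sqlite3

USERS = [(1, "One"), (2, "Two"), (3, "Three")]
LIKES = [(1, 1, 3), (2, 1, 5), (3, 2, 1), (4, 3, 2)]           # (id, user_id, like)
TRANSACTIONS = [(1, 1, -1), (2, 2, 5), (3, 2, -1), (4, 3, 10)]  # (id, user_id, transaction)


def rank_users():
    """Return (id, name, points) ordered by points, highest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute('CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INT, "like" INT)')
    conn.execute(
        'CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INT, "transaction" INT)'
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)", USERS)
    conn.executemany("INSERT INTO likes VALUES (?, ?, ?)", LIKES)
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", TRANSACTIONS)
    # Each table is summed in its own subquery, so rows are never multiplied
    result = conn.execute(
        """
        SELECT users.id, users.name,
               COALESCE((SELECT SUM("like") FROM likes
                         WHERE likes.user_id = users.id), 0)
             + COALESCE((SELECT SUM("transaction") FROM transactions
                         WHERE transactions.user_id = users.id), 0) AS points
        FROM users
        ORDER BY points DESC
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(rank_users())  # prints [(3, 'Three', 12), (1, 'One', 7), (2, 'Two', 5)]
```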

Mysql Calculate rank of teams from different rows

I'm trying to build a kind of peddy paper (a treasure hunt game). For that I have the following tables:
teams
CREATE TABLE `teams` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`creator_id` int(11) NOT NULL,
`friend_id` int(11) DEFAULT NULL,
`team_name` varchar(128) NOT NULL,
PRIMARY KEY (`id`)
);
team_log
CREATE TABLE IF NOT EXISTS `progress_tracker` (
`id` int(8) NOT NULL AUTO_INCREMENT,
`user_id` int(8) NOT NULL,
`team_id` int(11) NOT NULL,
`date` date NOT NULL,
`clues_found` int(11) NOT NULL,
`clues_to_find` int(11) NOT NULL,
PRIMARY KEY (`id`)
);
Each team is composed of two users;
Each user starts out with a variable number of clues found;
clues_found can either increase or decrease. No guarantee that the highest number is the latest;
I need to rank the teams (as a percentage) by the average, over both users in a team, of the clues each user found since joining: clues_found on the row with the latest date minus clues_found on the row with the earliest date, as a percentage of clues_to_find.
For instance if I have the following data for each table:
teams table data
+--------+------------+------------+---------------+
| id | creator_id | friend_id | team_name |
+--------+------------+------------+---------------+
| 1 | 25 | 28 | Test1 |
| 2 | 31 | 5 | Test2 |
+--------+------------+------------+---------------+
team_log table data
+--------+---------+---------+------------+-------------+---------------+
| id | user_id | team_id | date | clues_found | clues_to_find |
+--------+---------+---------+------------+-------------+---------------+
| 1 | 25 | 1 | 2013-01-6 | 3 | 24 |
| 2 | 25 | 1 | 2013-01-8 | 7 | 24 |
| 3 | 25 | 1 | 2013-01-10 | 10 | 24 |
| 4 | 28 | 1 | 2013-01-8 | 5 | 30 |
| 5 | 28 | 1 | 2013-01-14 | 20 | 30 |
| 6 | 31 | 2 | 2013-01-11 | 6 | 14 |
| 7 | 5 | 2 | 2013-01-9 | 2 | 20 |
| 8 | 5 | 2 | 2013-01-10 | 10 | 20 |
| 9 | 5 | 2 | 2013-01-12 | 14 | 20 |
+--------+---------+---------+------------+-------------+---------------+
Desired Result
+-------------+---------------------+
| team_id | team_percentage |
+-------------+---------------------+
| 1 | 39,58333333 |
| 2 | 30 |
+-------------+---------------------+
As a reference this is an intermediate representation which might help to understand:
+-------------+---------+---------------------+
| user_id | team_id | percentage_per_user |
+-------------+---------+---------------------+
| 25 | 1 | 29,16666667 |
| 28 | 1 | 50 |
| 31 | 2 | 0 |
| 5 | 2 | 60 |
+-------------+---------+---------------------+
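Here is the formula worked through by hand. User 25 goes from 3 to 10 clues out of 24, so (10 - 3) / 24 * 100 = 29.1667. User 28 gives (20 - 5) / 30 * 100 = 50. Team 1 is their average, (29.1667 + 50) / 2, which is about 39.58. Below is the same arithmetic as a plain-Python sketch over the sample rows. It involves no database. `user_percentages` and `team_percentages` are hypothetical helpers, and clues_to_find is assumed constant per user, so it is read from the latest row.

```python
from collections import defaultdict
from datetime import date

# team_log sample rows: (id, user_id, team_id, date, clues_found, clues_to_find)
LOG = [
    (1, 25, 1, date(2013, 1, 6), 3, 24),
    (2, 25, 1, date(2013, 1, 8), 7, 24),
    (3, 25, 1, date(2013, 1, 10), 10, 24),
    (4, 28, 1, date(2013, 1, 8), 5, 30),
    (5, 28, 1, date(2013, 1, 14), 20, 30),
    (6, 31, 2, date(2013, 1, 11), 6, 14),
    (7, 5, 2, date(2013, 1, 9), 2, 20),
    (8, 5, 2, date(2013, 1, 10), 10, 20),
    (9, 5, 2, date(2013, 1, 12), 14, 20),
]


def user_percentages(log):
    """Per (user_id, team_id): (last - first clues_found) / clues_to_find * 100."""
    by_user = defaultdict(list)
    for _row_id, user_id, team_id, day, found, to_find in log:
        by_user[(user_id, team_id)].append((day, found, to_find))
    result = {}
    for key, entries in by_user.items():
        entries.sort(key=lambda e: e[0])  # earliest date first
        first, last = entries[0], entries[-1]
        result[key] = (last[1] - first[1]) * 100 / last[2]
    return result


def team_percentages(log):
    """Average the per-user percentages within each team."""
    per_team = defaultdict(list)
    for (_user_id, team_id), pct in user_percentages(log).items():
        per_team[team_id].append(pct)
    return {team_id: sum(p) / len(p) for team_id, p in per_team.items()}


if __name__ == "__main__":
    print(user_percentages(LOG))
    print(team_percentages(LOG))
```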
So far I have the following SQL:
SELECT STRAIGHT_JOIN
tl2.team_id, (tl2.weight - tl1.weight)*100/tl2.clues_to_find
from
( select
team_id,user_id,clues_found
FROM
`team_log`
where 1
group by
team_id, user_id
order by
`date` ) base
join (select team_id, user_id, clues_found, clues_to_find from `team_log` where user_id = base.user_id and team_id = base.team_id group by team_id, user_id order by `date` desc) tl2
But this returns an error as I'm not allowed to use base.user_id inside the second query. I'm also not very sure I'm heading in the right direction.
Can anyone help please?
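Here's another query that produces the correct result. It picks each user's first and last rows with MIN(id) and MAX(id), so it assumes that, for the same user, a higher id always means a later date: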
SELECT calc.team_id, AVG((calc.end_clues - calc.start_clues)/calc.total_clues*100) as team_percentage
FROM
(SELECT log1.user_id, log1.team_id, log1.clues_found as start_clues, log2.clues_found as end_clues, log2.clues_to_find as total_clues FROM team_log log1
JOIN
(SELECT MIN(id) as start_id, MAX(id) as end_id FROM team_log GROUP BY user_id) ids
ON ids.start_id = log1.id
JOIN team_log log2 ON ids.end_id = log2.id) calc
GROUP BY team_id
ORDER BY team_id;
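Please take a look at this and comment. Note that it uses MIN(clues_found) and MAX(clues_found) per user, which equal the values on the earliest and latest dates only if clues never decrease (true for the sample data, although the question says they can):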
Team pct:
select z.team_id, avg(z.pct) as teampct
from (
select x.user_id, y.team_id, x.mndate,
y.mxdate, x.mnclues_found,
y.mxclues_found,
(((y.mxclues_found - x.mnclues_found)*100)
/y.mxclues_tofind) pct
from
(select user_id, team_id, min(date) mndate,
min(clues_found) as mnclues_found
from team_log
group by user_id, team_id) x
left join
(select user_id, team_id, max(date) mxdate,
max(clues_found) as mxclues_found,
max(clues_to_find) as mxclues_tofind
from team_log
group by user_id, team_id) y
on x.user_id = y.user_id and
x.team_id = y.team_id) z
group by z.team_id
;
Results 1:
| USER_ID | TEAM_ID | MNDATE | MXDATE | MNCLUES_FOUND | MXCLUES_FOUND | PCT |
-------------------------------------------------------------------------------------
| 5 | 2 | 13-01-09 | 13-01-12 | 2 | 14 | 60 |
| 25 | 1 | 13-01-06 | 13-01-10 | 3 | 10 | 29.1667 |
| 28 | 1 | 13-01-08 | 13-01-14 | 5 | 20 | 50 |
| 31 | 2 | 13-01-11 | 13-01-11 | 6 | 6 | 0 |
Results final:
| TEAM_ID | TEAMPCT |
----------------------
| 1 | 39.58335 |
| 2 | 30 |
This is a bit ugly, but should work:
select
team_id,
AVG(percentage_per_user) as team_percentage
from (select
team_id,
user_id,
((select clues_found from progress_tracker as x
where x.user_id = m.user_id order by x.date desc limit 0, 1)
- (select clues_found from progress_tracker as y
where y.user_id = m.user_id order by y.date asc limit 0, 1))
* 100 / MAX(clues_to_find)
as percentage_per_user
from progress_tracker as m
group by team_id, user_id
) as userScore
group by team_id
order by team_percentage desc;
Note that the inner query, run by itself, yields your intermediate per-user result.
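Here is a runnable check of the correlated first/last-by-date idea. It makes these assumptions: Python's sqlite3 module stands in for MySQL, and the table is named team_log as in the sample data (the answer's query uses progress_tracker). Dates are stored as zero-padded ISO strings so that ORDER BY date sorts them correctly as text. The subqueries also match team_id, and 100.0 is used so SQLite does not fall back to integer division. `team_ranking` is a hypothetical name. Because it reads clues_found from the rows with the earliest and latest dates, a decreasing count gives a negative percentage rather than being hidden by MIN/MAX.

```python
import sqlite3

# (id, user_id, team_id, date, clues_found, clues_to_find)
LOG = [
    (1, 25, 1, "2013-01-06", 3, 24), (2, 25, 1, "2013-01-08", 7, 24),
    (3, 25, 1, "2013-01-10", 10, 24), (4, 28, 1, "2013-01-08", 5, 30),
    (5, 28, 1, "2013-01-14", 20, 30), (6, 31, 2, "2013-01-11", 6, 14),
    (7, 5, 2, "2013-01-09", 2, 20), (8, 5, 2, "2013-01-10", 10, 20),
    (9, 5, 2, "2013-01-12", 14, 20),
]

QUERY = """
SELECT team_id, AVG(pct) AS team_percentage
FROM (
    SELECT m.team_id, m.user_id,
           ((SELECT x.clues_found FROM team_log AS x
             WHERE x.user_id = m.user_id AND x.team_id = m.team_id
             ORDER BY x.date DESC LIMIT 1)
          - (SELECT y.clues_found FROM team_log AS y
             WHERE y.user_id = m.user_id AND y.team_id = m.team_id
             ORDER BY y.date ASC LIMIT 1)) * 100.0 / MAX(m.clues_to_find) AS pct
    FROM team_log AS m
    GROUP BY m.team_id, m.user_id
) AS user_score
GROUP BY team_id
ORDER BY team_percentage DESC
"""


def team_ranking(rows):
    """Return (team_id, team_percentage) rows, best team first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE team_log (id INTEGER PRIMARY KEY, user_id INT, team_id INT, "
        "date TEXT, clues_found INT, clues_to_find INT)"
    )
    conn.executemany("INSERT INTO team_log VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(team_ranking(LOG))
```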
SELECT `team_id`,
(SUM(CASE WHEN b.`date` IS NULL THEN 0 ELSE `clues_found` * 100 / `clues_to_find` END) -
SUM(CASE WHEN c.`date` IS NULL THEN 0 ELSE `clues_found` * 100 / `clues_to_find` END)) / 2
FROM `team_log` a
LEFT JOIN (
SELECT `team_id`, `user_id`, MAX(date) AS `date`
FROM `team_log`
GROUP BY `team_id`, `user_id`) b
USING (`team_id`, `user_id`, `date`)
LEFT JOIN (
SELECT `team_id`, `user_id`, MIN(date) AS `date`
FROM `team_log`
GROUP BY `team_id`, `user_id`) c
USING (`team_id`, `user_id`, `date`)
GROUP BY `team_id`
Since you say there are always two team members, I've used /2. It would be slightly more complex for variable-sized teams.