Removing bidirectional duplicates in MySQL

I'm modifying phpBB's table to have bidirectional relationships for friends. Unfortunately, people who have already added friends have created duplicate rows:
user1 user2 friend
2 3 true
3 2 true
2 4 true
So I'd like to remove rows 1 and 2 from the example above. This is the query I have so far (it doesn't work at the moment):
DELETE FROM friends WHERE user1 IN (SELECT user1 FROM (SELECT f1.user1 FROM friends f1, friends f2 WHERE f1.user1=f2.user2 AND f1.user2=f2.user1 GROUP BY f1.user1) AS vtable);
It was inspired by Mysql Duplicate Rows (Duplicate detected using 2 columns), but the difference is that I don't have a unique ID column, and I'd like to stay away from adding an extra column.

Apologies if this isn't 100% legal MySQL, I'm an MSSQL user...
DELETE F1
FROM friends F1
INNER JOIN friends F2
ON F2.user1 = F1.user2
AND F2.user2 = F1.user1
WHERE F1.user1 < F1.user2

DELETE r
FROM friends l, friends r
WHERE l.user1 = r.user2
AND l.user2 = r.user1
This deletes both entries. If you'd like to keep one of them, you have to add a condition like the one Will A already proposed, but I suggest using > instead of < so that the row with the smaller user1 id is kept. It just looks better :)
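To see the "keep one of each pair" variant in action, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no multi-table DELETE, so the sketch expresses the same condition with EXISTS; that rewrite is my assumption for illustration only, and on MySQL the DELETE ... JOIN form shown above is the one to use. The friend column is left out for brevity, and the helper name dedupe_friends is hypothetical.

```python
import sqlite3

def dedupe_friends(rows):
    """Load (user1, user2) pairs, drop mirrored duplicates, return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friends (user1 INTEGER, user2 INTEGER)")
    conn.executemany("INSERT INTO friends VALUES (?, ?)", rows)
    # Delete a row only when its mirror image exists AND it is the copy with
    # the larger id first, so the (smaller, larger) copy survives.
    conn.execute(
        """
        DELETE FROM friends
        WHERE user1 > user2
          AND EXISTS (SELECT 1 FROM friends AS m
                      WHERE m.user1 = friends.user2
                        AND m.user2 = friends.user1)
        """
    )
    remaining = conn.execute(
        "SELECT user1, user2 FROM friends ORDER BY user1, user2"
    ).fetchall()
    conn.close()
    return remaining

# The three rows from the question.
remaining = dedupe_friends([(2, 3), (3, 2), (2, 4)])
```

A row such as (4, 2) that has no mirror image is left alone, because the EXISTS test fails for it.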

Related

Get mutual relationship from database table

I have table user_follow which has only two columns: who and whom. They represent links between users when WHO follows WHOM. If two users follow each other, they are considered friends. In such case the table would contain records:
WHO WHOM
1 2
2 1
Where 1 and 2 are simply user IDs.
In order to determine if two users are friends, I have to query the table and use simple condition
SELECT COUNT(*)
FROM user_follow
WHERE (who = 1 AND whom = 2) OR (who = 2 AND whom = 1)
If I get 2 then they are friends.
But if I want to load the list of all of a user's friends, I cannot do it that way. So I came up with a self-join:
SELECT uf2.whom
FROM user_follow AS uf1
LEFT JOIN user_follow AS uf2 ON uf1.who = uf2.whom
WHERE uf1.whom = ? AND uf2.who = ? ORDER BY uf2.whom
I made a dummy table to test it and it works. But I would like for someone to confirm this is the correct solution.

I think your query is correct if you supply the same parameter value for ?. I find the query below simpler to understand.
The query below lists the friends (users who follow each other) of a user:
SELECT uf1.whom
FROM user_follow AS uf1
INNER JOIN user_follow AS uf2
-- filter down to followed users
ON uf1.whom = uf2.who
-- followed users follow their followers
AND uf1.who = uf2.whom
-- the user whose friends are listed
WHERE uf1.who = ?
ORDER BY 1
;
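If you want to try the join on some sample data, here is a small sketch that runs it through Python's sqlite3 module (used here only as a convenient stand-in for MySQL). The function name list_friends and the sample rows are made up for the example.

```python
import sqlite3

def list_friends(follows, user_id):
    """Return the ids that user_id follows and that follow user_id back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_follow (who INTEGER, whom INTEGER)")
    conn.executemany("INSERT INTO user_follow VALUES (?, ?)", follows)
    rows = conn.execute(
        """
        SELECT uf1.whom
        FROM user_follow AS uf1
        INNER JOIN user_follow AS uf2
                ON uf1.whom = uf2.who   -- the followed user ...
               AND uf1.who  = uf2.whom  -- ... follows back
        WHERE uf1.who = ?
        ORDER BY uf1.whom
        """,
        (user_id,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]

# 1 and 2 follow each other; every other link is one-way.
follows = [(1, 2), (2, 1), (1, 3), (4, 1), (3, 4)]
friends_of_1 = list_friends(follows, 1)
```

Only the parameter in the WHERE clause changes per user, so the statement can be prepared once and reused.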

Mutual friends sql

I've seen multiple SO posts on mutual friends, but I've structured the friends table in my db so that there are no duplicates, e.g. (1,2) is stored and (2,1) is not.
Create Table Friends(
user1_id int,
user2_id int
);
and then a constraint to make sure user1_id is always smaller than user2_id, e.g. 4 < 5
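For readers who want to see what such a constraint looks like, here is a minimal sketch. The question does not show how the constraint is declared, so the CHECK clause below is an assumption; it is run against SQLite through Python's sqlite3 module because that is self-contained. On MySQL, CHECK constraints are only enforced from version 8.0.16 onwards, and older versions would need a trigger or application-side ordering instead. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Friends (
        user1_id INTEGER NOT NULL,
        user2_id INTEGER NOT NULL,
        CHECK (user1_id < user2_id)   -- assumed form of the constraint
    )
    """
)

def insert_raw(a, b):
    """Insert the pair exactly as given; the CHECK decides whether it is legal."""
    conn.execute("INSERT INTO Friends VALUES (?, ?)", (a, b))

def add_friendship(a, b):
    """Normalise the pair so the smaller id is always stored first."""
    lo, hi = sorted((a, b))
    insert_raw(lo, hi)

def rejected(a, b):
    """True if the database refuses the pair as given."""
    try:
        insert_raw(a, b)
    except sqlite3.IntegrityError:
        return True
    return False

add_friendship(5, 4)   # stored as (4, 5)
```

Ordering the ids in the application and letting the constraint catch mistakes keeps the "one row per friendship" rule in one place.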
Mutual friends sql with join (Mysql)
I've seen suggestions that mutual friends can be found using a join, so this is what I have. I think it's wrong, though, because when I compare the data in my db with the actual result from the query, I get different results:
SELECT f1.user1_id AS user1, f2.user1_id AS user2, COUNT(f1.user2_id) AS mutual_count
FROM Friends f1
JOIN Friends f2 ON f1.user2_id = f2.user2_id AND f1.user1_id <> f2.user1_id
GROUP BY f1.user1_id, f2.user1_id
ORDER BY mutual_count DESC

There are three join scenarios that I can see.
1 -> 2 -> 3 (mutual friend id between other IDs)
2 -> 3 -> 1 (mutual friend id > other IDs)
2 -> 1 -> 3 (mutual friend id < other IDs)
This can be resolved with this predicate...
ON (f1.user1_id IN (f2.user1_id, f2.user2_id)
OR f1.user2_id IN (f2.user1_id, f2.user2_id))
AND <not joining the row to itself>
But that will totally mess up the optimiser's ability to use indexes.
So, I'd union multiple queries.
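(Written on a phone, so treat it as a sketch. The last two branches use < rather than <> so that every pair is reported once, smaller ID first, just like the first branch.)
SELECT u1, u2, COUNT(*) AS mutual_count FROM
(
-- mutual friend ID between the two IDs
SELECT f1.user1_id AS u1, f2.user2_id AS u2 FROM Friends f1 INNER JOIN Friends f2 ON f1.user2_id = f2.user1_id AND f1.user1_id <> f2.user2_id
UNION ALL
-- mutual friend ID greater than both IDs
SELECT f1.user1_id, f2.user1_id FROM Friends f1 INNER JOIN Friends f2 ON f1.user2_id = f2.user2_id AND f1.user1_id < f2.user1_id
UNION ALL
-- mutual friend ID smaller than both IDs
SELECT f1.user2_id, f2.user2_id FROM Friends f1 INNER JOIN Friends f2 ON f1.user1_id = f2.user1_id AND f1.user2_id < f2.user2_id
) all_combinations
GROUP BY u1, u2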
Each individual query will then be able to make full use of indexes. (Put one index on user1_id and another index on user2_id.)
The result should be less esoteric code than a single join with fairly long CASE statements, and an execution plan with a much lower cost.
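Here is a runnable sketch of the three-way UNION ALL idea, using Python's sqlite3 module as a stand-in for MySQL. It assumes the user1_id < user2_id rule from the question, and it compares with < in the second and third branches so that each pair comes out once with the smaller id first (a choice made for this sketch). The function name and sample rows are made up.

```python
import sqlite3

MUTUAL_COUNTS_SQL = """
SELECT u1, u2, COUNT(*) AS mutual_count
FROM (
    -- mutual friend id lies between the two ids: rows (a, m) and (m, b)
    SELECT f1.user1_id AS u1, f2.user2_id AS u2
    FROM Friends f1 INNER JOIN Friends f2
      ON f1.user2_id = f2.user1_id
    UNION ALL
    -- mutual friend id is larger than both: rows (a, m) and (b, m)
    SELECT f1.user1_id, f2.user1_id
    FROM Friends f1 INNER JOIN Friends f2
      ON f1.user2_id = f2.user2_id AND f1.user1_id < f2.user1_id
    UNION ALL
    -- mutual friend id is smaller than both: rows (m, a) and (m, b)
    SELECT f1.user2_id, f2.user2_id
    FROM Friends f1 INNER JOIN Friends f2
      ON f1.user1_id = f2.user1_id AND f1.user2_id < f2.user2_id
) AS all_combinations
GROUP BY u1, u2
ORDER BY u1, u2
"""

def mutual_counts(pairs):
    """pairs: (user1_id, user2_id) friendships with user1_id < user2_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Friends (user1_id INTEGER, user2_id INTEGER, "
        "CHECK (user1_id < user2_id))"
    )
    conn.executemany("INSERT INTO Friends VALUES (?, ?)", pairs)
    rows = conn.execute(MUTUAL_COUNTS_SQL).fetchall()
    conn.close()
    return rows

# Friendships: 1-2, 1-3, 2-3, 2-4, 3-4
counts = mutual_counts([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)])
```

In this sample, users 1 and 4 are not friends but share friends 2 and 3, so their row carries a count of 2.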

Why is my mutual friends query so slow?

This query used to run very quickly when my database only had a small number of friends in it; however, as the user base has grown I've found the query getting dramatically slower.
The schema for my friends table looks like:
friend_id - entity_id1 - entity_id2 - category
1 1 2 1
2 2 1 1
3 3 2 1
4 2 3 1
5 1 3 1
6 3 1 1
As we can see above, each friend association is stored in both directions. This was implemented to improve the query time for suggested friends.
I am now trying to return mutual friends in my suggested friends payload. However, the query takes around 1 second to run for each user in a while loop, and I currently only have 2000 users, so this will be a huge problem as the application scales further.
The query I am currently using is as follows:
SELECT COUNT(*) AS mutual_count
FROM entity
WHERE EXISTS(
SELECT *
FROM friends
WHERE friends.Entity_Id1 = :friendId AND friends.Category <> 4
AND friends.Entity_Id2 = entity.Entity_Id
)
AND EXISTS(
SELECT *
FROM friends
WHERE friends.Entity_Id1 = :userId AND friends.Category <> 4
AND friends.Entity_Id2 = entity.Entity_Id
)
Where :userId is the logged-in user and :friendId is the user we want to get the mutual friends of. As I said, this query works fine but it's extremely slow. How can I optimise it?

What you have there are two correlated subqueries, which MySQL can run very quickly as long as they are indexed. The best index for both subqueries is this one:
ALTER TABLE friends ADD KEY (Entity_Id1, Category, Entity_Id2)

You have two heavy subqueries, and each one is evaluated for every row of entity.
First eliminate one of them. Try this (it assumes :friendId and :userId are different):
SELECT COUNT(*) AS mutual_count
FROM entity
WHERE (
-- both users must have a link to this entity
SELECT COUNT(DISTINCT friends.Entity_Id1) = 2
FROM friends
WHERE friends.Category <> 4 AND friends.Entity_Id2 = entity.Entity_Id
AND friends.Entity_Id1 IN (:friendId, :userId)
)
Then I suggest an inner join instead of a subquery, with one join per user, maybe something like this:
SELECT COUNT(DISTINCT entity.Entity_Id) AS mutual_count
FROM entity
INNER JOIN friends fu ON fu.Entity_Id2 = entity.Entity_Id
AND fu.Category <> 4
AND fu.Entity_Id1 = :userId
INNER JOIN friends ff ON ff.Entity_Id2 = entity.Entity_Id
AND ff.Category <> 4
AND ff.Entity_Id1 = :friendId
I haven't checked it (and I don't know the table structure), so there may be some syntax errors, but I hope it helps you somehow.
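As a sanity check on the join approach, here is a small sketch that counts mutual friends with a self-join on friends, run through Python's sqlite3 module as a stand-in for MySQL. It drops the entity table entirely, which is my simplification and only valid if every Entity_Id2 refers to an existing entity. The index name idx_friends_lookup, the helper names and the extra rows for user 4 are made up; the index columns are the ones suggested in the other answer.

```python
import sqlite3

MUTUAL_SQL = """
SELECT COUNT(DISTINCT fu.Entity_Id2) AS mutual_count
FROM friends AS fu
INNER JOIN friends AS ff
        ON ff.Entity_Id2 = fu.Entity_Id2
WHERE fu.Entity_Id1 = :userId
  AND ff.Entity_Id1 = :friendId
  AND fu.Category <> 4
  AND ff.Category <> 4
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE friends ("
        "Friend_Id INTEGER PRIMARY KEY, Entity_Id1 INTEGER, "
        "Entity_Id2 INTEGER, Category INTEGER)"
    )
    # Lets each side of the join look up one user's links directly.
    conn.execute(
        "CREATE INDEX idx_friends_lookup "
        "ON friends (Entity_Id1, Category, Entity_Id2)"
    )
    rows = [
        # the six reciprocal rows from the question
        (1, 2, 1), (2, 1, 1), (3, 2, 1), (2, 3, 1), (1, 3, 1), (3, 1, 1),
        # made-up extras: 1 and 4 are linked normally,
        # 2 and 4 are linked with category 4, which the query excludes
        (1, 4, 1), (4, 1, 1), (2, 4, 4), (4, 2, 4),
    ]
    conn.executemany(
        "INSERT INTO friends (Entity_Id1, Entity_Id2, Category) VALUES (?, ?, ?)",
        rows,
    )
    return conn

def mutual_count(conn, user_id, friend_id):
    row = conn.execute(
        MUTUAL_SQL, {"userId": user_id, "friendId": friend_id}
    ).fetchone()
    return row[0]

conn = build_db()
between_1_and_2 = mutual_count(conn, 1, 2)
```

Users 1 and 2 share only user 3 here, since the 2-4 link has category 4 and is filtered out.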

MySQL Query to find friends and number of mutual friends

I have looked through the questions but I can't find anything that does exactly what I need, and I can't figure out how to do it myself.
I have 2 tables, a user table and a friend link table. The user table is a table of all my users:
+---------+------------+---------+---------------+
| user_id | first_name | surname | email         |
+---------+------------+---------+---------------+
| 1       | joe        | bloggs  | joe@test.com  |
| 2       | bill       | bloggs  | bill@test.com |
| 3       | john       | bloggs  | john@test.com |
| 4       | karl       | bloggs  | karl@test.com |
+---------+------------+---------+---------------+
My friend links table then shows all relationships between the users, for example:
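+---------+---------+-----------+--------+
| link_id | user_id | friend_id | status |
+---------+---------+-----------+--------+
| 1       | 1       | 3         | a      |
| 2       | 3       | 1         | a      |
| 3       | 4       | 3         | a      |
| 4       | 3       | 4         | a      |
| 5       | 2       | 3         | a      |
| 6       | 3       | 2         | a      |
+---------+---------+-----------+--------+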
As a note the a in the status column means approved, there could also be r(request) and d(declined).
What I want to do is have a query where if a user does a search it will bring back a list of users that they are currently not already friends with and how many mutual friends each user has with them.
I have managed to get a query for all users that are currently not friends with them. So if the user doing the search had the user id of 1:
SELECT u.user_id,u.first_name,u.surname
FROM users u
LEFT JOIN friend_links fl
ON u.user_id = fl.user_id AND 1 IN (fl.friend_id)
WHERE fl.friend_id IS NULL
AND u.user_id != 1
AND surname LIKE 'bloggs'
How then do I have a count of the number of mutual friends for each returned user?
EDIT:
Just as an edit as I don't think I am being particularly clear with my question.
The query that I currently have above will produce the following set of results:
+---------+------------+---------+
| user_id | first_name | surname |
+---------+------------+---------+
| 2       | bill       | bloggs  |
| 4       | karl       | bloggs  |
+---------+------------+---------+
Those are the users matching the surname bloggs that are not currently friends with joe bloggs (user id 1).
Then I want to have how many mutual friends each of these users has with the user doing the search so the returned results would look like:
+---------+------------+---------+--------+
| user_id | first_name | surname | mutual |
+---------+------------+---------+--------+
| 2       | bill       | bloggs  | 1      |
| 4       | karl       | bloggs  | 1      |
+---------+------------+---------+--------+
Each of these returned users has 1 mutual friend as joe bloggs (user id 1) is friends with john bloggs and john bloggs is friends with both returned users.
I hope this is a bit more clear.
Thanks.

Mutual friends can be found by joining the friend_links table to itself on the friend_id field like so:
SELECT *
FROM friend_links f1 INNER JOIN friend_links f2
ON f1.friend_id = f2.friend_id
WHERE f1.user_id = $person1
AND f2.user_id = $person2
But bear in mind that this, in its worst case, is essentially squaring the number of rows in the friend_links table and can pretty easily jack up your server once you have a non-trivial number of rows. A better option would be to use 2 sub-queries for each user and then join the results of those.
SELECT *
FROM (
SELECT *
FROM friend_links
WHERE user_id = $person1
) p1 INNER JOIN (
SELECT *
FROM friend_links
WHERE user_id = $person2
) p2
ON p1.friend_id = p2.friend_id
Also, you can simplify your friend_links table by removing the surrogate key link_id and just making (user_id,friend_id) the primary key since they must be unique anyway.
Edit:
How would this be applied to the original query of searching for users that aren't already friends, I would like to do both in a single query if possible?
SELECT f2.user_id, COUNT(*) 'friends_in_common'
FROM friend_links f1 LEFT JOIN friend_links f2
ON f1.friend_id = f2.friend_id
WHERE f1.user_id = $person
GROUP BY f2.user_id
ORDER BY friends_in_common DESC
LIMIT $number
I am also thinking that the user_id constraints can be moved from the WHERE clause into the JOIN conditions to reduce the size of the data set created by the self-join and preclude the use of subqueries like in my second example.

This query lists anyone who's not friends with user 1 and whose surname matches '%bloggs%', along with the number of mutual friends:
SELECT
users.user_id,
users.first_name,
users.surname,
Sum(IF(users.user_id = friend_links_1.friend_id, 1, 0)) As mutual
FROM
users inner join
(friend_links INNER JOIN friend_links friend_links_1
ON friend_links.friend_id = friend_links_1.user_id)
ON friend_links.user_id=1 AND users.user_id<>1
WHERE
users.surname LIKE '%bloggs%'
GROUP BY
users.user_id, users.first_name, users.surname
HAVING
Sum(IF(users.user_id = friend_links.friend_id, 1, 0))=0
Just change the user id in the ON clause and the surname in the WHERE clause. I think it should work correctly now!
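For comparison, here is one more way to get the same result, written as a sketch and run through Python's sqlite3 module on the sample data from the question (SQLite is only a stand-in, and I have not run this on MySQL). It assumes that only links with status 'a' count as friendships, which is my reading of the question. The aliases theirs, mine and direct and the function name are made up.

```python
import sqlite3

SEARCH_SQL = """
SELECT u.user_id, u.first_name, u.surname,
       COUNT(theirs.friend_id) AS mutual
FROM users AS u
LEFT JOIN friend_links AS theirs
       ON theirs.user_id = u.user_id
      AND theirs.status = 'a'
      -- keep only their friends who are also the searcher's friends
      AND theirs.friend_id IN (SELECT mine.friend_id
                               FROM friend_links AS mine
                               WHERE mine.user_id = :me
                                 AND mine.status = 'a')
WHERE u.user_id <> :me
  AND u.surname LIKE :surname
  -- skip users the searcher is already friends with
  AND NOT EXISTS (SELECT 1
                  FROM friend_links AS direct
                  WHERE direct.user_id = :me
                    AND direct.friend_id = u.user_id
                    AND direct.status = 'a')
GROUP BY u.user_id, u.first_name, u.surname
ORDER BY u.user_id
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (user_id INTEGER PRIMARY KEY, "
        "first_name TEXT, surname TEXT, email TEXT)"
    )
    conn.execute(
        "CREATE TABLE friend_links (link_id INTEGER PRIMARY KEY, "
        "user_id INTEGER, friend_id INTEGER, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        [(1, "joe", "bloggs", "joe@test.com"),
         (2, "bill", "bloggs", "bill@test.com"),
         (3, "john", "bloggs", "john@test.com"),
         (4, "karl", "bloggs", "karl@test.com")],
    )
    conn.executemany(
        "INSERT INTO friend_links VALUES (?, ?, ?, ?)",
        [(1, 1, 3, "a"), (2, 3, 1, "a"), (3, 4, 3, "a"),
         (4, 3, 4, "a"), (5, 2, 3, "a"), (6, 3, 2, "a")],
    )
    return conn

def search_non_friends(conn, me, surname):
    return conn.execute(SEARCH_SQL, {"me": me, "surname": surname}).fetchall()

conn = build_db()
results = search_non_friends(conn, 1, "bloggs")
```

The LEFT JOIN keeps users with no mutual friends in the result with a count of 0, because COUNT only counts the non-NULL friend_id values.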

If A is friend of B, then B is also a friend of A? Wouldn't it be better to use just a link instead of two links (and instead of two rows in friends_links)? Then you have to use two status columns, status1 and status2, and A is friend of B only if status1 = status2 = "a".
There are many ways to show mutual friends, e.g.:
SELECT friend_id
FROM friend_links
WHERE (friend_links.user_id = $user1 OR friend_links.user_id = $user2)
AND NOT (friend_links.friend_id = $user1 OR friend_links.friend_id = $user2)
GROUP BY friend_id
HAVING Count(*)>1
And this query pairs each user with everyone who is not his/her friend:
SELECT
users.user_id,
users.first_name,
users_1.user_id,
users_1.first_name
FROM
users INNER JOIN users users_1 ON users.user_id <> users_1.user_id
WHERE
NOT EXISTS (SELECT *
FROM friend_links
WHERE
friend_links.user_id = users.user_id
AND friend_links.friend_id = users_1.user_id)
(The only thing I didn't check is the friendship status, but it's easy to add that check.)
I'm still working on it, but it's not easy to combine these two queries together nicely. So this isn't exactly an answer, I'm just showing some ideas that I've tried.
But what do you need exactly? A query that returns every user with anyone who's not his/her friend and the number of friends in common, or is the user_id already given?
With some code it's not a problem to answer your question... but there has to be a nice way just by using SQL! :)
EDIT:
I'm still wondering if there's a better solution to this. In particular, the next query could be extremely slow, but it looks like it might work:
SELECT
users_1.user_id,
users_2.user_id,
Sum(IF(users_1.user_id = friend_links.user_id AND users_2.user_id = friend_links_1.friend_id, 1, 0)) As CommonFriend
FROM
users users_1 INNER JOIN users users_2
ON users_1.user_id <> users_2.user_id,
(friend_links INNER JOIN friend_links friend_links_1
ON friend_links.friend_id = friend_links_1.user_id)
GROUP BY
users_1.user_id,
users_2.user_id
HAVING
Sum(IF(users_1.user_id = friend_links.user_id AND users_2.user_id = friend_links.friend_id, 1, 0))=0
(As before, I didn't check the friendship status.)
If the user is given, you could add WHERE users_1.user_id=$user1, but it's better to keep just one users table and filter the next INNER JOIN with that user.

How to model Friendship relationships

I have been trying to figure out how to do this, and even with looking at other examples, I can't get it figured out, so maybe I can get some personalized help.
I've got two tables, users_status and friendships.
In the users_status table I have a field userid, and several others.
In the friendships table, I have the fields request_to,request_from, and friendship_status.
Basically what I want to do is get all of the status posts by the current user AND those who are friends of the current user (which I can specify in my PHP using a $userid variable).
Here's an example of the friendships table structure. When a friend request is sent, the userid of the sender and receiver are placed in the table, with a friendship_status of 0. When the request is accepted, the friendship_status is set to 1 and those two are now friends.
friendship_id request_from request_to friendship_status
1 111248 111249 1
2 111209 111249 1
3 111209 111248 0
11 111209 111259 1
5 111252 111209 1
12 111261 111209 1
I realize this may not even be the best structure for determining friendships, especially since the site is relationship based and having to check for friendship connections will be a frequently used thing.
Would it perhaps be better to have two separate tables for friend_requests and friendships? If so, how would I structure/manage the friendships table?

You can use a table join (e.g. http://dev.mysql.com/doc/refman/5.0/en/join.html) to find all of the requests.
Actually you can use a subquery here:
SELECT * FROM users_status WHERE userid = "$userid"
OR userid in (SELECT request_to FROM friendships where request_from = "$userid" AND friendship_status = 1)
OR userid in (SELECT request_from FROM friendships where request_to = "$userid" AND friendship_status = 1)
Replace $userid with your user id.
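Here is a small sketch of the same subquery approach that can be run as-is, using Python's sqlite3 module as a stand-in for MySQL and the friendships rows from the question. It binds the user id as a named parameter instead of pasting $userid into the string, which is a choice made for this sketch rather than something the question requires. The post column, the sample statuses and the function name are made up, since the question only says the table has userid and several other fields.

```python
import sqlite3

FEED_SQL = """
SELECT s.userid, s.post
FROM users_status AS s
WHERE s.userid = :me
   OR s.userid IN (SELECT request_to FROM friendships
                   WHERE request_from = :me AND friendship_status = 1)
   OR s.userid IN (SELECT request_from FROM friendships
                   WHERE request_to = :me AND friendship_status = 1)
ORDER BY s.userid
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users_status (userid INTEGER, post TEXT)")
    conn.execute(
        "CREATE TABLE friendships (friendship_id INTEGER PRIMARY KEY, "
        "request_from INTEGER, request_to INTEGER, friendship_status INTEGER)"
    )
    # Rows copied from the question; status 0 means the request is still pending.
    conn.executemany(
        "INSERT INTO friendships VALUES (?, ?, ?, ?)",
        [(1, 111248, 111249, 1), (2, 111209, 111249, 1),
         (3, 111209, 111248, 0), (11, 111209, 111259, 1),
         (5, 111252, 111209, 1), (12, 111261, 111209, 1)],
    )
    # Hypothetical status posts, one per user.
    conn.executemany(
        "INSERT INTO users_status VALUES (?, ?)",
        [(111209, "post by 111209"), (111248, "post by 111248"),
         (111249, "post by 111249"), (111252, "post by 111252")],
    )
    return conn

def feed_user_ids(conn, me):
    """User ids whose posts show up in the feed of `me`."""
    return [row[0] for row in conn.execute(FEED_SQL, {"me": me})]

conn = build_db()
feed = feed_user_ids(conn, 111209)
```

User 111248 does not appear in the feed of 111209 because that request still has friendship_status 0.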

The simplest schema I can think of is:
PENDING_FRIENDSHIPS(request_from, request_to)
FRIENDSHIPS(request_from, request_to)
I also removed the ID because both fields on both tables will be compound primary keys (request_from, request_to).
To get all friends from the current user just run:
select * from friendships
where $currentUser = request_from OR $currentUser = request_to
This returns both columns, so you would have to strip out the current user in PHP.
Another way to get all friends from this schema is to run a UNION:
select request_from from friendships
where request_to = $currentUser
UNION
select request_to from friendships
where request_from = $currentUser
The drawback of this solution is that you're running 2 selects.
