Why is my mutual friends query so slow? - mysql

This query used to run very quickly when my database only had a small number of friends in it, but as the user base has grown the query has become dramatically slower.
The schema for my friends table looks like:
friend_id  entity_id1  entity_id2  category
1          1           2           1
2          2           1           1
3          3           2           1
4          2           3           1
5          1           3           1
6          3           1           1
As we can see above, each friend association is stored reciprocally (one row per direction). This was implemented to improve the query time for suggested friends.
I am now trying to return mutual friends in my suggested friends payload, but the query takes around 1 second to run for each user in a while loop, and I currently have only 2000 users. This will be a huge problem as the application scales further.
The query I am currently using is as follows:
SELECT COUNT(*) AS mutual_count
FROM entity
WHERE EXISTS(
SELECT *
FROM friends
WHERE friends.Entity_Id1 = :friendId AND friends.Category <> 4
AND friends.Entity_Id2 = entity.Entity_Id
)
AND EXISTS(
SELECT *
FROM friends
WHERE friends.Entity_Id1 = :userId AND friends.Category <> 4
AND friends.Entity_Id2 = entity.Entity_Id
)
Here :userId is the logged-in user and :friendId is the user we want to get the mutual friends of. As I said, this query works fine but it's extremely slow. How can I optimise it?

What you have there are two correlated subqueries, which MySQL can run quickly as long as they are backed by a suitable index. An index that covers both subqueries is this one:
ALTER TABLE friends ADD KEY (Entity_Id1, Category, Entity_Id2)

You have two heavy subqueries, and you select all columns in both of them.
First, eliminate one of them. Try this:
SELECT COUNT(*) AS mutual_count
FROM entity
WHERE (
SELECT COUNT(DISTINCT friends.Entity_Id1) = 2
FROM friends
WHERE friends.Category <> 4 AND friends.Entity_Id2 = entity.Entity_Id
AND friends.Entity_Id1 IN (:friendId, :userId)
)
Then I suggest an inner join instead of a subquery, maybe something like this:
SELECT COUNT(DISTINCT f1.Entity_Id2) AS mutual_count
FROM friends f1
INNER JOIN friends f2 ON f2.Entity_Id2 = f1.Entity_Id2
WHERE f1.Entity_Id1 = :userId AND f1.Category <> 4
AND f2.Entity_Id1 = :friendId AND f2.Category <> 4
I haven't tested it (and I don't know the table structure), so there may be some syntax errors, but I hope it helps.
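The join-based approach can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are the ones from the question, and the function name mutual_count is hypothetical. SQLite's planner differs from MySQL's, so it checks the logic rather than the timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friends ("
    " friend_id INTEGER PRIMARY KEY,"
    " entity_id1 INTEGER, entity_id2 INTEGER, category INTEGER)"
)
# Sample rows from the question: every friendship is stored in both directions.
conn.executemany(
    "INSERT INTO friends VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 1), (2, 2, 1, 1), (3, 3, 2, 1),
     (4, 2, 3, 1), (5, 1, 3, 1), (6, 3, 1, 1)],
)
# Same column order as the index suggested in the first answer.
conn.execute(
    "CREATE INDEX idx_friends ON friends (entity_id1, category, entity_id2)"
)


def mutual_count(conn, user_id, friend_id):
    """Count entities that are friends of both users (category 4 excluded)."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT f1.entity_id2)
        FROM friends f1
        JOIN friends f2 ON f2.entity_id2 = f1.entity_id2
        WHERE f1.entity_id1 = :user_id AND f1.category <> 4
          AND f2.entity_id1 = :friend_id AND f2.category <> 4
        """,
        {"user_id": user_id, "friend_id": friend_id},
    ).fetchone()
    return row[0]


print(mutual_count(conn, 1, 2))  # users 1 and 2 share friend 3
```

Because both sides of the join start from a fixed entity_id1, each side can be resolved from the index without scanning the entity table at all.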

Related

Get list of friends n depth

I have a simple table, friend, that holds the id of a person (id_friend) and the id of their friend (id_friend_of).
I'm trying to get all the IDs of friends of a specific user with a depth, so get all people linked to a specific user with a determined depth.
What I'm trying for a depth of 2 (get the friends of the user and the friends of its friends) :
SELECT DISTINCT
a.id_friend_of
FROM friend a
JOIN friend b
ON b.id_friend = a.id_friend_of
WHERE a.id_friend = 1 AND
b.id_friend <> a.id_friend
But it's not working, I'm only getting the friends of the user but not the friends of friends.
What can I do to make this work?
"get the friends of the user and the friends of its friends"
You can get the friends of the user with a simple filtered query on the table and the friends of friends with a self join of the table.
Then use UNION to get the results of the 2 queries, which will also remove duplicates:
SELECT id_friend_of
FROM friend
WHERE id_friend = 1
UNION
SELECT f2.id_friend_of
FROM friend f1 INNER JOIN friend f2
ON f2.id_friend = f1.id_friend_of
WHERE f1.id_friend = 1 AND f2.id_friend_of <> 1
For levels above 2, it's better to use a recursive query (for MySQL 8.0+):
WITH RECURSIVE cte AS (
SELECT *, 1 level
FROM friend
WHERE id_friend = 1
UNION ALL
SELECT f.*, level + 1
FROM cte c INNER JOIN friend f
ON f.id_friend = c.id_friend_of
WHERE f.id_friend_of <> 1 AND level < 2 -- for level = 2
)
SELECT DISTINCT id_friend_of
FROM cte
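As a rough, runnable illustration of the recursive query with the depth passed in as a parameter, here is a sketch using Python's built-in sqlite3 module in place of MySQL 8.0. The helper name friends_within_depth and the sample rows are assumptions made up for the example.

```python
import sqlite3


def friends_within_depth(conn, user_id, max_depth):
    """Return ids reachable from user_id in at most max_depth friend hops."""
    rows = conn.execute(
        """
        WITH RECURSIVE cte(id_friend_of, level) AS (
            SELECT id_friend_of, 1
            FROM friend
            WHERE id_friend = :user_id
            UNION ALL
            SELECT f.id_friend_of, c.level + 1
            FROM cte c
            JOIN friend f ON f.id_friend = c.id_friend_of
            WHERE f.id_friend_of <> :user_id AND c.level < :max_depth
        )
        SELECT DISTINCT id_friend_of FROM cte ORDER BY id_friend_of
        """,
        {"user_id": user_id, "max_depth": max_depth},
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (id_friend INTEGER, id_friend_of INTEGER)")
# Hypothetical data: a chain 1 -> 2 -> 3 -> 4, plus 1 -> 5.
conn.executemany(
    "INSERT INTO friend VALUES (?, ?)", [(1, 2), (2, 3), (3, 4), (1, 5)]
)

print(friends_within_depth(conn, 1, 2))  # direct friends plus friends of friends
```

The level bound in the recursive member is what stops the recursion, so it also protects against cycles in the friend graph.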

Mutual friends sql

I've seen multiple SO posts on mutual friends but I've structured my friends table in my db so that there are no duplicates e.g. (1,2) and not (2,1)
Create Table Friends(
user1_id int,
user2_id int
);
and then a constraint to make sure user1_id is always smaller than user2_id, e.g. 4 < 5.
I've seen suggestions (for example in "Mutual friends sql with join (Mysql)") that mutual friends can be found using a join, so this is what I have. I think it's wrong, though, because when I compare the counts against the data in my db I get different results:
SELECT f1.user1_id AS user1, f2.user1_id AS user2, COUNT(f1.user2_id) AS mutual_count
FROM Friends f1
JOIN Friends f2
ON f1.user2_id = f2.user2_id AND f1.user1_id <> f2.user1_id
GROUP BY f1.user1_id, f2.user1_id
ORDER BY mutual_count DESC
There are three join scenarios that I can see.
1 -> 2 -> 3 (mutual friend id between other IDs)
2 -> 3 -> 1 (mutual friend id > other IDs)
2 -> 1 -> 3 (mutual friend id < other IDs)
This can be resolved with this predicate...
ON (f1.user1_id IN (f2.user1_id, f2.user2_id)
OR f1.user2_id IN (f2.user1_id, f2.user2_id))
AND <not joining the row to itself>
But that will totally mess up the optimiser's ability to use indexes.
So, I'd union multiple queries.
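(Typed on a phone, so treat this as untested.)
SELECT u1, u2, COUNT(*) AS mutual_count
FROM (
-- mutual friend's id lies between the two users' ids
SELECT f1.user1_id AS u1, f2.user2_id AS u2
FROM Friends f1 INNER JOIN Friends f2 ON f1.user2_id = f2.user1_id
UNION ALL
-- mutual friend's id is greater than both users' ids
SELECT f1.user1_id, f2.user1_id
FROM Friends f1 INNER JOIN Friends f2 ON f1.user2_id = f2.user2_id AND f1.user1_id < f2.user1_id
UNION ALL
-- mutual friend's id is smaller than both users' ids
SELECT f1.user2_id, f2.user2_id
FROM Friends f1 INNER JOIN Friends f2 ON f1.user1_id = f2.user1_id AND f1.user2_id < f2.user2_id
) all_combinations
GROUP BY u1, u2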
Each individual query will then be able to fully utilise indexes. (Put one index on user1_id and another on user2_id.)
The result should be less esoteric than the alternative (a single join with fairly long CASE expressions) and should have a much cheaper execution plan.
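To sanity-check the three-way UNION ALL idea against real rows, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample friendships and the helper name mutual_counts are invented for the example, and reporting each pair once with the smaller id first is an assumption about the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Friends (user1_id INTEGER, user2_id INTEGER,"
    " CHECK (user1_id < user2_id))"
)
# Hypothetical friendships, each stored once with the smaller id first.
conn.executemany(
    "INSERT INTO Friends VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)],
)
conn.execute("CREATE INDEX idx_u1 ON Friends (user1_id)")
conn.execute("CREATE INDEX idx_u2 ON Friends (user2_id)")


def mutual_counts(conn):
    """Map each user pair (smaller id first) to its number of mutual friends."""
    rows = conn.execute(
        """
        SELECT u1, u2, COUNT(*) FROM (
            -- mutual friend's id lies between the two users' ids
            SELECT f1.user1_id AS u1, f2.user2_id AS u2
            FROM Friends f1 JOIN Friends f2 ON f1.user2_id = f2.user1_id
            UNION ALL
            -- mutual friend's id is greater than both users' ids
            SELECT f1.user1_id, f2.user1_id
            FROM Friends f1 JOIN Friends f2
              ON f1.user2_id = f2.user2_id AND f1.user1_id < f2.user1_id
            UNION ALL
            -- mutual friend's id is smaller than both users' ids
            SELECT f1.user2_id, f2.user2_id
            FROM Friends f1 JOIN Friends f2
              ON f1.user1_id = f2.user1_id AND f1.user2_id < f2.user2_id
        ) AS all_combinations
        GROUP BY u1, u2
        """
    ).fetchall()
    return {(u1, u2): n for u1, u2, n in rows}


counts = mutual_counts(conn)
print(counts[(1, 2)])  # users 1 and 2 share friends 3 and 4
```

Pairs with no mutual friends simply do not appear in the result, so a lookup for them should default to zero.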

Select rows with Left Outer Join and condition - MySQL

PEOPLE    PEOPLE_FAVS
id        user_id  fav_id
------    -------  ----------
1         1        1
2         1        2
3         1        5
4         2        1
5         2        2
6
I have two tables PEOPLE and PEOPLE_FAVS, I am trying to get all PEOPLE which have not favorited number '5' so it should return
PEOPLE
id
------
2
3
4
5
6
I'm trying with this query:
SELECT `people`.`id`
FROM `people`
LEFT OUTER JOIN `people_favs` ON (`people_favs`.`user_id` = `people`.`id`)
WHERE (`people_favs`.`fav_id` != 5)
GROUP BY `people`.`id`
Here is a SQL Fiddle: http://sqlfiddle.com/#!2/4102b8/3
SELECT p.*
FROM people p
LEFT
JOIN people_favs pf
ON pf.user_id = p.id
AND pf.fav_id = 5
WHERE pf.fav_id IS NULL
http://sqlfiddle.com/#!2/665b6/1
You don't actually need to use an outer join. Outer joins are often used when you want to see ALL rows from one table, regardless of their condition with another. While it would work in this case (as seen by Strawberry's example), you can use the NOT EXISTS operator to check for ids that do not have 5 as a favorite.
As far as I am aware, there is little to no performance difference, but this query is a little shorter. I also feel it is a little more logical, because you aren't really joining information. That's just a personal opinion/thought though.
Try this:
SELECT p.id
FROM people p
WHERE NOT EXISTS (SELECT 1 FROM people_favs pf WHERE pf.fav_id = 5 AND pf.user_id = p.id);
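Both anti-join styles can be compared side by side on the data from the question. The sketch below uses Python's built-in sqlite3 module instead of MySQL; the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE people_favs (user_id INTEGER, fav_id INTEGER)")
conn.executemany("INSERT INTO people VALUES (?)", [(i,) for i in range(1, 7)])
conn.executemany(
    "INSERT INTO people_favs VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 5), (2, 1), (2, 2)],
)


def without_fav_not_exists(conn, fav_id):
    """People with no people_favs row for fav_id, using NOT EXISTS."""
    rows = conn.execute(
        """
        SELECT p.id FROM people p
        WHERE NOT EXISTS (
            SELECT 1 FROM people_favs pf
            WHERE pf.user_id = p.id AND pf.fav_id = ?
        )
        ORDER BY p.id
        """,
        (fav_id,),
    ).fetchall()
    return [r[0] for r in rows]


def without_fav_left_join(conn, fav_id):
    """Same result via LEFT JOIN ... IS NULL; the filter lives in the ON clause."""
    rows = conn.execute(
        """
        SELECT p.id FROM people p
        LEFT JOIN people_favs pf
          ON pf.user_id = p.id AND pf.fav_id = ?
        WHERE pf.fav_id IS NULL
        ORDER BY p.id
        """,
        (fav_id,),
    ).fetchall()
    return [r[0] for r in rows]


print(without_fav_not_exists(conn, 5))
print(without_fav_left_join(conn, 5))
```

The key detail in the LEFT JOIN form is that fav_id = 5 sits in the ON clause; moving it to WHERE would turn the outer join back into an inner join.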
Did you try simply grouping per person and keeping only the groups that contain no row with fav_id = 5:
SELECT `people`.`id`
FROM `people`
LEFT JOIN `people_favs` ON (`people_favs`.`user_id` = `people`.`id`)
GROUP BY `people`.`id`
HAVING COALESCE(SUM(`people_favs`.`fav_id` = 5), 0) = 0

Finding mutual friend in one way relationship table

I want a MySQL query for finding the mutual friends of two users, but
I maintain each friendship as a one-way relationship. For example,
first is the users table:
id  name
1   abc
2   xyz
3   pqr
The second table is friend:
id  user_id  friend_id
1   1        2
2   1        3
3   2        3
Here abc (id=1) is a friend of xyz (id=2), and in the same way xyz is a friend of abc. Now I want to find the mutual friend of abc (id=1) and xyz (id=2), which is pqr, so I need a MySQL query for that.
REVISED
This query will consider the "one way" relationship of a row in the friend table to be a "two way" relationship. That is, it will consider a friend relationship: ('abc','xyz') to be equivalent to the inverse relationship: ('xyz','abc'). (NOTE: we don't have any guarantee that both rows won't appear in the table, so we need to be careful about that. The UNION operator conveniently eliminates duplicates for us.)
This query should satisfy the specification:
SELECT mf.id
, mf.name
FROM (
SELECT fr.user_id AS user_id
, fr.friend_id AS friend_id
FROM friend fr
JOIN users fru
ON fru.id = fr.user_id
WHERE fru.name IN ('abc','xyz')
UNION
SELECT fl.friend_id AS user_id
, fl.user_id AS friend_id
FROM friend fl
JOIN users flf
ON flf.id = fl.friend_id
WHERE flf.name IN ('abc','xyz')
) f
JOIN users mf
ON mf.id = f.friend_id
GROUP BY mf.id, mf.name
HAVING COUNT(1) = 2
ORDER BY mf.id, mf.name
SQL Fiddle here http://sqlfiddle.com/#!2/b23a5/2
A more detailed explanation of how we arrive at this is given below. The original queries below assumed that a row in the friend table represented a "one way" relationship, in that "'abc' ff 'xyz'" did not imply "'xyz' ff 'abc'". But additional comments from the OP hinted that this was not the case.
If there is a unique constraint on friend(user_id,friend_id), then one way to get the result would be to get all of the friends of each user, and get a count of rows for that friend. If the count is 2, then we know a particular friend_id appears for both user 'abc' and for 'xyz'
SELECT mf.id
, mf.name
FROM friend f
JOIN users uu
ON uu.id = f.user_id
JOIN users mf
ON mf.id = f.friend_id
WHERE uu.name IN ('abc','xyz')
GROUP BY mf.id, mf.name
HAVING COUNT(1) = 2
ORDER BY mf.id, mf.name
(This approach can also be extended to find a mutual friend of three or more users, by including more users in the IN list and changing the value we compare COUNT(1) to.)
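The GROUP BY / HAVING idea for an arbitrary number of users might look like the sketch below. It uses Python's built-in sqlite3 module rather than MySQL, treats friend rows as one-way (as this part of the answer does), and counts DISTINCT user_id so that it does not depend on the unique constraint. The function name common_friends is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE friend (id INTEGER PRIMARY KEY,"
    " user_id INTEGER, friend_id INTEGER)"
)
# Rows from the question.
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [(1, "abc"), (2, "xyz"), (3, "pqr")]
)
conn.executemany(
    "INSERT INTO friend VALUES (?, ?, ?)", [(1, 1, 2), (2, 1, 3), (3, 2, 3)]
)


def common_friends(conn, user_ids):
    """Return (id, name) of users who are friends of every id in user_ids."""
    ids = sorted(set(user_ids))
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = f"""
        SELECT mf.id, mf.name
        FROM friend f
        JOIN users mf ON mf.id = f.friend_id
        WHERE f.user_id IN ({placeholders})
        GROUP BY mf.id, mf.name
        HAVING COUNT(DISTINCT f.user_id) = ?
        ORDER BY mf.id
    """
    # The HAVING value must equal the number of distinct users in the IN list.
    return conn.execute(sql, [*ids, len(ids)]).fetchall()


print(common_friends(conn, [1, 2]))
```

Only the placeholders are interpolated into the SQL string; the ids themselves are always bound as parameters.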
This isn't the only query that will return the specified resultset; there are other ways to get it as well.
Another way to get an equivalent result:
SELECT u.id
, u.name
FROM ( SELECT f1.friend_id
FROM friend f1
JOIN users u1
ON u1.id = f1.user_id
WHERE u1.name = 'abc'
) t1
JOIN ( SELECT f2.friend_id
FROM friend f2
JOIN users u2
ON u2.id = f2.user_id
WHERE u2.name = 'xyz'
) t2
ON t2.friend_id = t1.friend_id
JOIN users u
ON u.id = t1.friend_id
ORDER BY u.id, u.name
NOTES
These queries do not check whether user 'abc' is a friend of 'xyz' (the two user names specified in the WHERE clause). It is only finding the common friend of both 'abc' and 'xyz'.
FOLLOWUP
The queries above satisfy the specified requirements, and all the examples and test cases provided in the question.
Now it sounds as if you want a row in that relationship table to be considered a "two way" relationship rather than just a "one way" relationship. It sounds like you want to consider the friend relationship ('abc','xyz') equivalent to ('xyz','abc').
To get that, all that needs to be done is to have the query create the inverse rows, and that makes it easier to query. We just need to be careful that if both rows ('abc','xyz') and ('xyz','abc') already exist, we don't create duplicates of them when we invert them.
To create the inverse rows, we can use a query like this. (It's simpler to look at this when we don't have the JOIN to the users table and we use just the id value.)
SELECT fr.user_id
, fr.friend_id
FROM friend fr
WHERE fr.user_id IN (1,2)
UNION
SELECT fl.friend_id AS user_id
, fl.user_id AS friend_id
FROM friend fl
WHERE fl.friend_id IN (1,2)
It's simpler if we don't include the predicates on the user_id and friend_id columns, but that could be a very large (and expensive) rowset to materialize.
Try this, given that you want to get the mutual friends of users 1 and 2:
CREATE TEMPORARY TABLE tbl1 AS SELECT friend_id FROM friend WHERE user_id = 1;
CREATE TEMPORARY TABLE tbl2 AS SELECT friend_id FROM friend WHERE user_id = 2;
SELECT id, name FROM users WHERE id IN (SELECT f1.friend_id FROM tbl1 f1 JOIN tbl2 f2 ON f1.friend_id = f2.friend_id);

Building SQL query on a one-to-many relationship

I have a search page where I am trying to build a complex search condition on two tables which look something like:
Users
ID  NAME
1   Paul
2   Remy
...
Profiles
FK_USERS_ID  TOPIC      TOPIC_ID
1            language   1
1            language   2
1            expertise  1
1            expertise  2
1            expertise  3
2            language   1
2            language   2
The second table, Profiles, lists the "languages" or the "expertises" (among other stuff) of each user, and TOPIC_ID is a foreign key to another table depending on the topic (if the topic is "language", then TOPIC_ID is the ID of a language in the languages table, etc.).
The search needs to find something like: user name LIKE %PAU%, and the user "has" language 1 and has language 2 and has expertise 1 and has expertise 2.
Any help would be really appreciated! I am performing a LEFT JOIN on the two tables, although I am not sure that is the correct choice. My main problem lies in the "AND": the same user has to have both languages 1 and 2, and at the same time expertise 1 and 2.
I work in PHP and I usually try to avoid inner SELECTs and even joins, but I think an inner SELECT is unavoidable here?
You can accomplish this by building a set of users that match the criteria from your Profiles table, something like this:
SELECT FK_USERS_ID
FROM Profiles
WHERE topic='x'
AND TOPIC_ID IN (1,2)
GROUP BY FK_USERS_ID
HAVING COUNT(1) = 2
Here you list the users that match the topics you need. By grouping by the user id and specifying the number of rows that should be returned, you can effectively say "only those that have x and y in topic z". Just make sure that the number in COUNT(1) = x equals the number of different TOPIC_IDs you are looking for.
You can then query the user table
SELECT ID
FROM Users
WHERE name like '%PAU%'
AND ID IN (<insert above query here>)
You can also do it in a join and a derived table, but the essence should be explained above.
EDIT:
if you are looking for multiple combinations, you can use mysql's multi-column IN:
SELECT FK_USERS_ID
FROM Profiles
WHERE (topic,topic_id) IN (('x',3),('x',5),('y',3),('y',6))
GROUP BY FK_USERS_ID
HAVING COUNT(1) = 4
This will look for users matching the pairs x-3, x-5, y-3 and y-6.
You should be able to build the topic/topic_id pairs easily in PHP and insert them into the SQL string, and also count the number of pairs you generate into a variable to use as the COUNT(1) value. See http://www.mysqlperformanceblog.com/2008/04/04/multi-column-in-clause-unexpected-mysql-issue/ for a discussion of the performance of this approach.
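Building the pair list and the matching count programmatically could look roughly like this. The sketch is in Python with the built-in sqlite3 module rather than PHP and MySQL, and it expands the pairs into OR conditions instead of a multi-column IN so that it runs on any SQLite version. The function name users_matching_all is hypothetical, and it assumes each (user, topic, topic_id) row appears at most once in Profiles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE profiles (fk_users_id INTEGER, topic TEXT, topic_id INTEGER)"
)
# Rows from the question.
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Paul"), (2, "Remy")])
conn.executemany(
    "INSERT INTO profiles VALUES (?, ?, ?)",
    [(1, "language", 1), (1, "language", 2), (1, "expertise", 1),
     (1, "expertise", 2), (1, "expertise", 3),
     (2, "language", 1), (2, "language", 2)],
)


def users_matching_all(conn, name_pattern, pairs):
    """Ids of users whose name matches and who have every (topic, topic_id) pair."""
    pairs = sorted(set(pairs))  # de-duplicate so the count below stays correct
    if not pairs:
        raise ValueError("at least one (topic, topic_id) pair is required")
    conditions = " OR ".join("(topic = ? AND topic_id = ?)" for _ in pairs)
    flat = [value for pair in pairs for value in pair]
    sql = f"""
        SELECT u.id
        FROM users u
        WHERE u.name LIKE ?
          AND u.id IN (
              SELECT fk_users_id
              FROM profiles
              WHERE {conditions}
              GROUP BY fk_users_id
              HAVING COUNT(*) = ?
          )
        ORDER BY u.id
    """
    rows = conn.execute(sql, [name_pattern, *flat, len(pairs)]).fetchall()
    return [r[0] for r in rows]


wanted = [("language", 1), ("language", 2), ("expertise", 1), ("expertise", 2)]
print(users_matching_all(conn, "%PAU%", wanted))
```

Deriving the HAVING value from len(pairs) keeps the count and the pair list from drifting apart when criteria are added or removed.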
Isn't it just a simple classical INNER JOIN?
SELECT
p.topic, p.topic_id
FROM
profiles p
INNER JOIN
users u
ON
u.id = p.fk_users_id
WHERE
u.name LIKE '%Paul%'
This query would return all the languages and expertise, with their IDs, for the users matching the pattern (in this case, containing Paul in their name). Is this what you want? Or something else?
select *
from users u, profiles p
where u.id = p.fk_users_id
and exists (select 1
from profiles
where fk_users_id = u.id
and topic = 'language'
and topic_id = 1)
and exists (select 1
from profiles
where fk_users_id = u.id
and topic = 'language'
and topic_id = 2)
and exists (select 1
from profiles
where fk_users_id = u.id
and topic = 'expertise'
and topic_id = 1)
and exists (select 1
from profiles
where fk_users_id = u.id
and topic = 'expertise'
and topic_id = 2)
and u.name like '%PAU%'
EDIT:
Ok, a slight variation on @cairnz's answer:
SELECT ID
FROM Users
WHERE name like '%PAU%'
AND ID IN (SELECT FK_USERS_ID
FROM Profiles
WHERE (TOPIC_ID = 1 AND TOPIC = 'language')
OR (TOPIC_ID = 2 AND TOPIC = 'language')
OR (TOPIC_ID = 1 AND TOPIC = 'expertise')
OR (TOPIC_ID = 2 AND TOPIC = 'expertise')
GROUP BY FK_USERS_ID
HAVING COUNT(1) = 4)
I would JOIN to the Profiles table once for each condition that you are "requiring". I would also ensure there is an index on the Profiles table covering each part of the key being searched: (FK_USERS_ID, Topic_ID, Topic)
SELECT STRAIGHT_JOIN
U.ID
FROM Users U
JOIN Profiles P1
on U.ID = P1.FK_USERS_ID
AND P1.Topic_Id = 1
AND P1.Topic = 'language'
JOIN Profiles P2
on U.ID = P2.FK_USERS_ID
AND P2.Topic_Id = 2
AND P2.Topic = 'language'
JOIN Profiles P3
on U.ID = P3.FK_USERS_ID
AND P3.Topic_Id = 1
AND P3.Topic = 'expertise'
JOIN Profiles P4
on U.ID = P4.FK_USERS_ID
AND P4.Topic_Id = 2
AND P4.Topic = 'expertise'
WHERE
U.name like '%PAU%'
This way, any additional criteria as expressed in the other answers shouldn't have too much of an impact. The joins are set up as if all the criteria applied simultaneously, and if any are missing, that user is excluded from the result immediately, instead of running a sub-select count for every entry (which I think might be the lag you are encountering).
So each of your "required" criteria takes the same JOIN construct, and as you can see, I'm just incrementing the alias of the join instance.