SQL query with mutual user relationship - mysql

I'm building an SNS where users can follow each other. If user A follows user B and user B also follows user A, they become friends.
Also consider that some popular people (like movie stars) may be followed by hundreds of thousands of users, but a user can follow at most 1000 people.
So given the table below, what is the best SQL query to fetch all friends' ids of user 1?
PS: I'm using MySQL 5.5.
Here is what I have done so far:
SELECT followee_id AS friend_id FROM follow
WHERE follower_id = 1 AND
followee_id IN (SELECT follower_id FROM follow
WHERE followee_id = 1);
CREATE TABLE follow
(
follower_id INT UNSIGNED NOT NULL,
followee_id INT UNSIGNED NOT NULL,
PRIMARY KEY (follower_id, followee_id),
INDEX (followee_id, follower_id)
);

Assuming that by 'best' you mean most performant, and given that a following must be mutual in order to meet your 'friend' criteria:
A filter using followee_id will hit your index better than a filter on follower_id
select
me.follower_id
from
follow me inner join
follow you
on
me.follower_id = you.followee_id
and me.followee_id = you.follower_id
where
me.followee_id = @user
(although note that RDBMSs like MSSQL will default to using your primary key as a clustered index, in which case it's much of a muchness really.)
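To try the self-join above on a small data set, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows, the index name follow_reverse and the helper name mutual_friends are made up for illustration.

```python
import sqlite3


def mutual_friends(conn, user_id):
    """Return ids that follow user_id and are followed back by user_id."""
    rows = conn.execute(
        """
        SELECT me.follower_id AS friend_id
        FROM follow AS me
        JOIN follow AS you
          ON me.follower_id = you.followee_id
         AND me.followee_id = you.follower_id
        WHERE me.followee_id = ?
        ORDER BY friend_id
        """,
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL 5.5
conn.execute(
    """
    CREATE TABLE follow (
        follower_id INTEGER NOT NULL,
        followee_id INTEGER NOT NULL,
        PRIMARY KEY (follower_id, followee_id)
    )
    """
)
conn.execute("CREATE INDEX follow_reverse ON follow (followee_id, follower_id)")

# Hypothetical data: 1<->2 and 1<->3 are mutual, 1->4 and 5->1 are one-way.
conn.executemany(
    "INSERT INTO follow VALUES (?, ?)",
    [(1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (5, 1)],
)

friends = mutual_friends(conn, 1)
print(friends)
```

Only the mutual pairs survive the join, so one-way follows in either direction drop out. On MySQL itself, EXPLAIN would be the way to check which of the two indexes is actually used.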

Related

SQL - Join two distinct result sets

I'm new with SQL and just had my first assignment.
I have the following requirements:
Given is a database of two tables. The first one contains information about the user, like a unique ID per user, the phone number and the city. ID and phone number consist only of numeric digits. The second table contains data about so called „credits“, which a user can own. Again there is a column for the unique user ID, but also the number, the date and the type of credits. A user can have none, one or several entries in the credit table.
I'm still not sure if I got the part right where a user can have none, one or several entries in the credit table. I created these two tables:
CREATE table user
(
user_id INT NOT NULL UNIQUE AUTO_INCREMENT,
user_phone_number INT NOT NULL,
user_city VARCHAR(32) NOT NULL,
PRIMARY KEY (user_id)
);
CREATE table credit
(
credit_user_id INT FOREIGN KEY (user_id),
credit_date date,
credit_number double,
credit_type char(10),
CONSTRAINT chk_type CHECK (credit_type in ('None','A','B','C')),
);
After creating this, I was asked the following questions:
a) The phone number of all users, who own credits of type „A“
SELECT user_phone_number
FROM user, credit
WHERE credit_type = 'A';
b) Like a), but additionally the credit_number of the credits is smaller than 2 or greater than 4
SELECT user_phone_number
FROM user, credit
WHERE (credit_type ='A')
AND (credit_number < 2 OR credit_number > 4);
C) Like a), but additionally the users also own credits of at least one other type.
SELECT user_phone_number
FROM user, credit
WHERE credit_type = 'A'
AND (
SELECT DISTINCT c1.credit_type FROM credit AS c1
JOIN credit a1 ON (c1.credit_type=a1.credit_type)
JOIN credit a2 ON (c1.credit_type=a2.credit_type)
WHERE a2.credit_type<>a1.credit_type);
My problem is that I can't make letter C work, even if both selects seem to work separately. Any ideas or suggestions would be appreciated, thank you!
I'm not sure I understand what you want in C), but there are several things to point out.
You shouldn't use a table name like 'user', because it can be ambiguous (it is a reserved word in some DBMSs).
You should prefer 'JOIN' over 'FROM table1, table2', and avoid mixing both styles.
Have a look here.
You have a ';' inside your query in C); it should only be used to mark the end of the statement.
You can use a nested query, but not like that, directly after 'AND', because AND expects a condition such as a comparison. There are many possibilities: in the SELECT list, after 'FROM', after 'IN', in a join, in a condition...
Quick search on google.
From another post:
PRIMARY KEY(x), UNIQUE(x) -- Since a PRIMARY KEY is by definition (in MySQL) UNIQUE...
Since you want to find everyone with 2 kinds of credit, I'd try to write the query as if I were looking for duplicates. Here are two ways to do that:
With subquery
Find duplicate records in MySQL
Without
Finding duplicate values in MySQL
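As a rough sketch of the "duplicates" idea, one option is to group the credits per user and keep the groups that contain type 'A' and at least two distinct types. The example assumes Python's built-in sqlite3 module instead of MySQL, a trimmed-down version of the two tables, and invented sample rows; the helper name phones_with_a_and_other is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.executescript(
    """
    CREATE TABLE user (
        user_id INTEGER PRIMARY KEY,
        user_phone_number INTEGER NOT NULL
    );
    CREATE TABLE credit (
        credit_user_id INTEGER NOT NULL REFERENCES user (user_id),
        credit_type TEXT NOT NULL
    );
    -- Hypothetical sample rows, not taken from the question.
    INSERT INTO user VALUES (1, 5550001), (2, 5550002), (3, 5550003);
    INSERT INTO credit VALUES
        (1, 'A'), (1, 'B'),   -- A plus another type: should match
        (2, 'A'), (2, 'A'),   -- only A, twice: should not match
        (3, 'B'), (3, 'C');   -- two types but no A: should not match
    """
)


def phones_with_a_and_other(conn):
    """Phone numbers of users owning type 'A' and at least one other type."""
    rows = conn.execute(
        """
        SELECT u.user_phone_number
        FROM user AS u
        JOIN credit AS c ON c.credit_user_id = u.user_id
        GROUP BY u.user_id, u.user_phone_number
        HAVING SUM(CASE WHEN c.credit_type = 'A' THEN 1 ELSE 0 END) > 0
           AND COUNT(DISTINCT c.credit_type) >= 2
        ORDER BY u.user_phone_number
        """
    ).fetchall()
    return [row[0] for row in rows]


phones = phones_with_a_and_other(conn)
print(phones)
```

COUNT(DISTINCT ...) is what keeps a user with several 'A' rows from being counted as having two types.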
Welcome to SO! Here's an approach using the nested query style you're trying to use. I've used explicit JOINs rather than FROM user, credit in the FROM clause, because this makes it clearer that it's a join.
Say your users table looks like this -
user_id user_phone_number user_city
6 75771 Leeds
7 75772 Wakefield
8 75773 Dewsbury
9 75774 Heckmondwike
10 75775 Huddersfield
And your credit table looks like this -
credit_user_id credit_date credit_number credit_type
7 2017-02-13 2 A
7 2017-02-13 2 B
6 2017-02-13 2 A
8 2017-02-13 4 B
The nested query in the AND clause returns records where the credit_type is not A, and the WHERE in the main query selects all records where the credit_type is A, so if a user appears in both, that user must have at least two types of credit -
SELECT user_phone_number
FROM `user` AS u -- backticks quote the table name in MySQL
JOIN credit AS c ON u.user_id = c.credit_user_id
WHERE credit_type = 'A'
AND u.user_id IN (
SELECT user_id
FROM `user` AS u
JOIN credit AS c ON u.user_id = c.credit_user_id
WHERE credit_type <> 'A')
As you can see from the tables, the user with the id of 7 has credit both of type A and B, so we end up with -
user_phone_number
75772
I'd agree that you might want to consider some of the points others have raised above, but won't repeat.

Enhancing performance of SQL query

I am running a query on three tables messages, message_recipients and users.
Table structure of messages table:
id int pk
message_id int
message text
user_id int
...
Index for this table is on user_id, message_id and id.
Table structure of message_recipients table:
id int pk
message_id int
read_date datetime
user_id int
...
Index is on id, message_id and user_id.
Table structure of users table:
id int pk
display_name varchar
...
Index is on id.
I am running the following query against these tables:
SELECT
m.*,
if(m.user_id = 0, 'Campus Manager', u.display_name) AS name,
mr.read_date,
IF(m1.message_id > 0 and m1.user_id=1, true, false) as replied
FROM
messages m
JOIN
message_recipients mr
ON
mr.message_id = m.id
LEFT JOIN
users u
ON
u.UID = m.user_id
LEFT JOIN
messages m1
ON
m1.message_id = m.id
WHERE
mr.user_id = 1
AND
m.published = 1
GROUP BY
mr.message_id
ORDER BY
m.created DESC
EXPLAIN returns the following data for this query:
UPDATE
As suggested by @e4c5, I added a new composite index on (published, user_id, created), and now EXPLAIN shows this:
How can this query be optimized, by adding indexes if required? It currently takes a long time.
GROUP BY needs to list all the non-aggregated columns. I suspect that would be a mess. Why do you need GROUP BY at all?
Why are you linking messages.id to messages_id? Is this a hierarchical table, but the column names aren't like 'parent_id'?
"Index is on id, message_id and user_id" -- is that one composite index or 3 single-column indexes? (It makes a big difference.) It would be better to show us SHOW CREATE TABLE instead of ambiguously paraphrasing.
Is user_id=1 prolific? That is, are you expecting thousands of rows? Is this query only a problem for him?
Using LEFT JOIN implies that m1.message_id could be NULL, yet the reference to it seems to ignore that possibility.
If this is a single table that contains a message thread -- both the main info about the thread and the individual responses -- then I suggest it is a bad design. (I made this mistake once upon a time.) I think it is better to have a table with one row per thread and another table with one row per comment. 1 thread : many comments. So there would be a thread_id in the comment table.
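A minimal sketch of that two-table layout, assuming Python's built-in sqlite3 module rather than MySQL. The table names thread and comment, their columns, the index name and the sample rows are all hypothetical. The "replied" flag is computed with EXISTS, which avoids the LEFT JOIN plus GROUP BY from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.executescript(
    """
    -- One row per thread.
    CREATE TABLE thread (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        subject TEXT NOT NULL
    );
    -- One row per comment, pointing back at its thread.
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        thread_id INTEGER NOT NULL REFERENCES thread (id),
        user_id INTEGER NOT NULL,
        body TEXT NOT NULL
    );
    CREATE INDEX comment_thread_user ON comment (thread_id, user_id);

    -- Hypothetical sample rows.
    INSERT INTO thread VALUES (10, 2, 'Timetable'), (11, 3, 'Exams');
    INSERT INTO comment VALUES
        (100, 10, 1, 'Thanks'),
        (101, 10, 4, 'Noted'),
        (102, 11, 4, 'When?');
    """
)


def threads_with_reply_flag(conn, user_id):
    """List (thread id, whether user_id has commented on it)."""
    rows = conn.execute(
        """
        SELECT t.id,
               EXISTS (SELECT 1
                       FROM comment AS c
                       WHERE c.thread_id = t.id
                         AND c.user_id = ?) AS replied
        FROM thread AS t
        ORDER BY t.id
        """,
        (user_id,),
    ).fetchall()
    return [(thread_id, bool(flag)) for thread_id, flag in rows]


flags = threads_with_reply_flag(conn, 1)
print(flags)
```

Each thread appears exactly once in the result, however many comments it has, so no GROUP BY is needed to collapse duplicates.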
I was able to bring the query time down from 3 seconds to 0.1 second by adding a new index to each of the messages and message_recipients tables and changing the storage engine of the messages table from InnoDB to MyISAM.
A composite index named composite was added to the messages table on these columns, in this order - published, user_id, created
A composite index named message_id_2 was added to the message_recipients table on two columns - message_id, user_id
EXPLAIN Query now shows

Database design for a Fantasy league

Here's the basic schema for my database
Table user{
userid number primary key,
count number
}
Table player{
pid number primary key,
}
Table user-player{
userid number primary key foreign key(user),
pid number primary key foreign key(player)
}
Table temp{
pid number primary key,
points number
}
Here's what I intend to do...
After every match the temp table is updated which holds the id of players that played the last match and the points they earned.
Next run a procedure that will match the pid from temp table with every uid of user-player table having the same pid.
add the points from temp table to the count of user table for every matching uid.
empty temp table.
My question is: considering 200 players and 10000 users, will this method be efficient?
I am going to be using mysql for this.
People often seem to be worried about performance for small databases. Let the DBMS do what it is designed to do. If you find in practice - or preferably under load testing - that you have a performance problem, then take steps to deal with it. Otherwise don't pre-optimize!
Instead of using a temporary table to store one batch of player scores, store all player scores in a transactional table.
Remove the user.count column and replace your temp table with something like this:
Table player_points{
pid number primary key,
match_date datetime primary key,
points number
}
With this you can easily calculate any user's score. You can even recalculate any user's score as of a given date. This is much more powerful and much simpler to maintain. Keeping a current snapshot only makes it impossible to manage should anything go wrong or should one of your users challenge their score.
This query gives you the scores for all users. You can add filters to it to do other things like finding the score for a single user or showing a leader board.
select
U.userid as UserID
, sum(S.points) as TotalScore
from user U
inner join `user-player` J -- backticks are needed because of the hyphen
on U.userid = J.userid
inner join player_points S
on J.pid = S.pid
group by
U.userid
This query would give you a leader board:
select
U.userid as UserID
, sum(S.points) as TotalScore
from user U
inner join `user-player` J -- backticks are needed because of the hyphen
on U.userid = J.userid
inner join player_points S
on J.pid = S.pid
group by
U.userid
order by TotalScore desc
limit 10
This query would give you points awarded to a user by date, which you could graph as-is or cumulatively, for example.
select
S.match_date as MatchDate
, sum(S.points) as TotalScore
from `user-player` J
inner join player_points S
on J.pid = S.pid
where J.userid = 123 -- The user ID you want.
group by
S.match_date
order by S.match_date
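To illustrate recalculating a score "as of a given date", here is a small sketch. It assumes Python's built-in sqlite3 module instead of MySQL, renames user-player to user_player to avoid quoting, stores dates as ISO strings, and uses invented sample rows. The helper name score_as_of is hypothetical. It also assumes a user has owned the same players the whole time, which the schema above does not track.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.executescript(
    """
    CREATE TABLE user_player (
        userid INTEGER NOT NULL,
        pid INTEGER NOT NULL,
        PRIMARY KEY (userid, pid)
    );
    CREATE TABLE player_points (
        pid INTEGER NOT NULL,
        match_date TEXT NOT NULL,  -- ISO dates sort correctly as text
        points INTEGER NOT NULL,
        PRIMARY KEY (pid, match_date)
    );
    -- Hypothetical sample rows.
    INSERT INTO user_player VALUES (123, 1), (123, 2), (456, 2);
    INSERT INTO player_points VALUES
        (1, '2024-01-01', 5), (2, '2024-01-01', 3),
        (1, '2024-01-08', 7), (2, '2024-01-08', 1);
    """
)


def score_as_of(conn, userid, as_of_date):
    """Sum the points of a user's players for matches up to as_of_date."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(s.points), 0)
        FROM user_player AS j
        JOIN player_points AS s ON s.pid = j.pid
        WHERE j.userid = ?
          AND s.match_date <= ?
        """,
        (userid, as_of_date),
    ).fetchone()
    return row[0]


print(score_as_of(conn, 123, "2024-01-01"))
print(score_as_of(conn, 123, "2024-01-08"))
```

Because every match result is kept, the earlier total can still be reproduced after later matches have been recorded.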

MySQL retrieve friends of friends structure and performance

I would simply like to find a database structure in MySQL to get all users friends of friends and the corresponding query to retrieve them. (friend links are bi-directional)
I have found a couple posts related to that, but my concern is the performance:
Structure 1
Many posts suggest a structure where you have a table in which each row represents a friendship link e.g:
CREATE TABLE `friends` (
`user_id` int(10) unsigned NOT NULL,
`friend_id` int(10) unsigned NOT NULL,
)
saying the user '1' has three friend '2','3','4' and user '2' has two friend '1','5' . Your friend table would look like this:
user_id | friend_id
1 | 2
1 | 3
1 | 4
2 | 1
2 | 5
friends of friends query: How to select friends of friends can be seen here: SQL to get friends AND friends of friends of a user. The result of the query for user '1' is supposed to be (1,2,3,4,5)
My concern: The average fb-user has about 140 friends. Frequent users will have a lot more.
If I have 20.000 users this will result in roughly 3 million rows.
Structure 2
If I could use a structure like this:
CREATE TABLE `friends` (
`user_id` int(10) unsigned NOT NULL,
`friend_1` int(10) unsigned NOT NULL,
`friend_2` int(10) unsigned NOT NULL,
`friend_3` int(10) unsigned NOT NULL,
`friend_4` int(10) unsigned NOT NULL,
....
)
My table would look like this (taking example from above):
user_id | friend_1 | friend_2 | friend_3 | ...
1 | 2 | 3 | 4 |
2 | 1 | 5 | |...
Now I have only 20.000 rows.
friends of friends query: To select user friends of friends I tried
Select * FROM friends as a
WHERE a.user_id
IN (
SELECT * FROM friends AS b
WHERE b.user_id = '1'
)
but I get an error "#1241 - Operand should contain 1 column(s) ". I think the problem is, that the sub-selection passes a row, not a column?
Questions
I hope you understand my concern. I would be really really happy about any input to these questions
1)
find a query that returns all friends of friends for a specified user in structure 2?
2)
Which structure allows me to return friends of friends quicker?
In structure 2 I think the "join row with column" could be slow, if it's even possible to use a join here. Thank you for any suggestions. If you can think of any other structures, maybe taking advantage of the small-world-network type, I'd be happy to hear them.
THANK YOU!!
Definitely use the first structure. Queries for the second structure will be huge, hard to maintain and slow because of complicated clauses.
A fast enough query for the first approach:
(
select friend_id
from friends
where user_id = 1
) union (
select distinct ff.friend_id
from
friends f
join friends ff on ff.user_id = f.friend_id
where f.user_id = 1
)
For the best performance you need to have these indexes:
ALTER TABLE `friends` ADD UNIQUE INDEX `friends_idx` (`user_id` ASC, `friend_id` ASC);
ALTER TABLE `friends` ADD INDEX `friends_user_id_idx` (`user_id` ASC);
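The union query can be checked against the sample rows from the question. This is a sketch assuming Python's built-in sqlite3 module rather than MySQL; SQLite does not accept the parentheses around each SELECT, so they are dropped here. The helper name friends_and_fof is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.executescript(
    """
    CREATE TABLE friends (
        user_id INTEGER NOT NULL,
        friend_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, friend_id)
    );
    -- Sample rows from the question.
    INSERT INTO friends VALUES (1, 2), (1, 3), (1, 4), (2, 1), (2, 5);
    """
)


def friends_and_fof(conn, user_id):
    """Return direct friends plus friends of friends, deduplicated."""
    rows = conn.execute(
        """
        SELECT friend_id FROM friends WHERE user_id = ?
        UNION
        SELECT ff.friend_id
        FROM friends AS f
        JOIN friends AS ff ON ff.user_id = f.friend_id
        WHERE f.user_id = ?
        """,
        (user_id, user_id),
    ).fetchall()
    return sorted(row[0] for row in rows)


result = friends_and_fof(conn, 1)
print(result)
```

For user 1 this gives the set (1,2,3,4,5) that the question expects. Note that the user's own id shows up because user 2 lists user 1 as a friend; add a filter on the outer result if that is not wanted.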
I'd say you ought to use the first structure. It's more flexible in my opinion. My solution for the query would be a simple sub-query, like this:
SELECT friend_id FROM friends WHERE user_id IN (
SELECT friend_id FROM friends WHERE user_id='$USER_ID'
);
EDIT: Sorry I just woke up and realized after posting a reply that this wasn't at all what you were looking for. Sry.
Don't use "Structure 2": you cannot create a column for every possible friend when even one user has, let's say, 100 friends (what about 10K friends or more?), and it results in low performance. For structure 1 you can do a simple join of the table to itself:
select u.user_id, f.friend_id
from friends as u
inner join friends as f
on (u.friend_id = f.user_id); -- follow each friend link one step further
EDIT:
Your error #1241 means that you used * in the subselect and the table has more than one column. Your subquery should return just one column (no matter how many rows), so replace your "*" with "user_id" (without quotes).
Solution 1 is not only faster, it is also flexible. I don't recommend a subquery for a simple select like this; just join the table to itself (it's much faster than a subselect).
Solution 2 is, in my opinion, not a solution at all: it's not flexible, it's slower, and it uses more disk space; more columns means less performance in MySQL. How can you index such a thing? And how can you select by friend_id rather than by user_id? Do you look in every column for that friend_id?
As the below answers state, solution 1 is preferred to solution 2. Also solution 1 will work out for a decent amount of data.
However, when things go bigger there is also a third solution - Graph Databases.
When your data model focuses on the "relations" instead of the "objects" RDBMSs don't scale well since they have to perform lookups through the tables concerned. DB Indexes make this easier but it was not enough so Graph Databases came to the rescue.
A Graph DB actually "stores" the relations next to each entity making it much faster to perform tasks like yours.
Here is some information to get you started:
http://www.slideshare.net/maxdemarzi/graph-database-use-cases
Neo4j or OrientDB are among the popular choices.

Please help me write a mysql query to select members of a particular department

Let's say I have 2 tables, employee and department. department has 2 departments, dpt1 and dpt2, and there are 12 employees, 6 in each department, and 4 belonging to both departments. How can I use a MySQL query to find those employees who belong only to dpt2 and not dpt1?
SELECT employee.*
FROM employee
INNER JOIN employee_belongs_to_departments
WHERE employee.id = employee_belongs_to_departments.employee_id
AND employee_belongs_to_departments.department_id = 'dpt2'
AND employee.id NOT IN (
SELECT DISTINCT ebtd2.employee_id
FROM employee_belongs_to_departments ebtd2
WHERE ebtd2.employee_id = employee.id
AND ebtd2.department_id = 'dpt1'
)
The usual approach in this kind of situation would be to have a third "connection" table, called something like employeeDepartment.
CREATE TABLE employeeDepartment (
employee INT UNSIGNED NOT NULL,
department INT UNSIGNED NOT NULL,
PRIMARY KEY (employee, department)
)
and then you also make employee and department foreign keys to the primary keys of the appropriate tables, though this requires you to be using the right storage engine (not that that is hard to do).
This makes doing various things a whole lot easier than the approach you seem to be going for (storing a list of values in one column).
You could then use
SELECT
eD1.employee
FROM
employeeDepartment AS eD1
LEFT JOIN employeeDepartment AS eD2 ON
eD1.employee = eD2.employee AND
eD2.department = 'dpt1' -- replace with appropriate integer
-- if using integers as I suggested
WHERE
eD1.department = 'dpt2' AND -- same comment applies
eD2.employee IS NULL
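Here is a small sketch of the LEFT JOIN ... IS NULL pattern applied to what the question asks for: employees in dpt2 that are not also in dpt1. It assumes Python's built-in sqlite3 module instead of MySQL, keeps the department as text for readability, and uses invented sample rows. The helper name only_in is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in, not MySQL
conn.executescript(
    """
    CREATE TABLE employeeDepartment (
        employee INTEGER NOT NULL,
        department TEXT NOT NULL,
        PRIMARY KEY (employee, department)
    );
    -- Hypothetical rows: employee 2 belongs to both departments.
    INSERT INTO employeeDepartment VALUES
        (1, 'dpt1'), (2, 'dpt1'), (2, 'dpt2'), (3, 'dpt2'), (4, 'dpt2');
    """
)


def only_in(conn, wanted, excluded):
    """Employees that are in `wanted` but have no row for `excluded`."""
    rows = conn.execute(
        """
        SELECT eD1.employee
        FROM employeeDepartment AS eD1
        LEFT JOIN employeeDepartment AS eD2
          ON eD1.employee = eD2.employee
         AND eD2.department = ?
        WHERE eD1.department = ?
          AND eD2.employee IS NULL
        ORDER BY eD1.employee
        """,
        (excluded, wanted),
    ).fetchall()
    return [row[0] for row in rows]


dpt2_only = only_in(conn, "dpt2", "dpt1")
print(dpt2_only)
```

The condition on the excluded department has to sit in the ON clause, not in WHERE; otherwise the unmatched rows that IS NULL relies on are filtered out before they can be tested.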