How can I optimize this SQL query with a large IN clause? - mysql

I have a fairly complicated operation that I'm trying to perform with just one SQL query but I'm not sure if this would be more or less optimal than breaking it up into n queries. Basically, I have a table called "Users" full of user ids and their associated fb_ids (id is the pk and fb_id can be null).
+-----------------+
| id | .. | fb_id |
|====|====|=======|
| 0 | .. | 12345 |
| 1 | .. | 31415 |
| .. | .. | .. |
+-----------------+
I also have another table called "Friends" that represents a friend relationship between two users. This uses their ids (not their fb_ids) and should be a two-way relationship.
+----------------+
| id | friend_id |
|====|===========|
| 0 | 1 |
| 1 | 0 |
| .. | .. |
+----------------+
// user 0 and user 1 are friends
So here's the problem:
We are given a particular user's id ("my_id") and an array of that user's Facebook friends (an array of fb_ids called fb_array). We want to update the Friends table so that it honors a Facebook friendship as a valid friendship among our users. It's important to note that not all of their Facebook friends will have an account in our database, so those friends should be ignored. This query will be called every time the user logs in so it can update our data if they've added any new friends on Facebook. Here's the query I wrote:
INSERT INTO Friends (id, friend_id)
SELECT "my_id", id FROM Users WHERE id IN
(SELECT id FROM Users WHERE fb_id IN fb_array)
AND id NOT IN
(SELECT friend_id FROM Friends WHERE id = "my_id")
The point of the first IN clause is to get the subset of all Users who are also your Facebook friends, and this is the main part I'm worried about. Because the fb_ids are given as an array, I have to parse all of the ids into one giant string separated by commas which makes up "fb_array." I'm worried about the efficiency of having such a huge string for that IN clause (a user may have hundreds or thousands of friends on Facebook). Can you think of any better way to write a query like this?
It's also worth noting that this query doesn't maintain the dual nature of a friend relationship, but that's not what I'm worried about (extending it for this would be trivial).

If I am not mistaken, your query can be simplified, if you have a UNIQUE constraint on the combination (id, friend_id), to:
INSERT IGNORE INTO Friends
(id, friend_id)
SELECT "my_id", id
FROM Users
WHERE fb_id IN fb_array ;
You should have an index on Users (fb_id, id) and test for efficiency. If the number of items in the array is too big (more than a few thousand), you may have to split the array and run the query more than once. Profile with your data and settings.
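As a rough sketch of the "split the array" idea, here is how the insert could be driven from application code. This uses Python's sqlite3 module only so the example runs anywhere; SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE. The helper names (chunks, sync_friends), the chunk size, and the sample rows are assumptions for illustration, not part of the original question.

```python
import sqlite3

def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def sync_friends(conn, my_id, fb_array, chunk_size=500):
    """Insert (my_id, user id) for every known fb_id, skipping existing pairs."""
    for part in chunks(fb_array, chunk_size):
        # One "?" placeholder per fb_id keeps the values out of the SQL string.
        placeholders = ",".join("?" * len(part))
        conn.execute(
            # SQLite dialect: MySQL would use INSERT IGNORE here.
            "INSERT OR IGNORE INTO Friends (id, friend_id) "
            "SELECT ?, id FROM Users WHERE fb_id IN (" + placeholders + ")",
            [my_id, *part],
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, fb_id INTEGER);
    CREATE INDEX idx_users_fb ON Users (fb_id, id);
    CREATE TABLE Friends (id INTEGER, friend_id INTEGER, UNIQUE (id, friend_id));
    INSERT INTO Users VALUES (0, 12345), (1, 31415), (2, 27182), (3, NULL);
    INSERT INTO Friends VALUES (0, 1);
""")

# 31415 is already a friend, 27182 is new, 99999 has no account.
sync_friends(conn, 0, [31415, 27182, 99999], chunk_size=2)
rows = conn.execute(
    "SELECT id, friend_id FROM Friends ORDER BY friend_id"
).fetchall()
print(rows)
```

The UNIQUE constraint is what makes the duplicate pair a silent no-op, so the NOT IN subquery from the question is no longer needed.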

It depends on whether the following columns are nullable (value can be NULL):
USERS.id
FRIENDS.friend_id
Nullable:
SELECT DISTINCT
"my_id", u.id
FROM Users u
WHERE u.fb_id IN fb_array
AND u.id NOT IN (SELECT f.friend_id
FROM FRIENDS f
WHERE f.id = "my_id")
Not Nullable:
SELECT "my_id", u.id
FROM Users u
LEFT JOIN FRIENDS f ON f.friend_id = u.id
AND f.id = "my_id"
WHERE u.fb_id IN fb_array
AND f.friend_id IS NULL
For more info:
http://explainextended.com/2010/05/27/left-join-is-null-vs-not-in-vs-not-exists-nullable-columns/
http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/
Regarding the number of values in your array: the tests run in the two articles mentioned above use 1 million rows with 10,000 distinct values.
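To check that the two forms agree when the columns are declared NOT NULL, a small sketch along these lines can help. It runs on SQLite through Python's sqlite3 module rather than MySQL, and the sample rows are made up, so it only demonstrates that the results match, not which form is faster on your server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, fb_id INTEGER);
    CREATE TABLE Friends (id INTEGER NOT NULL, friend_id INTEGER NOT NULL);
    INSERT INTO Users VALUES (0, 12345), (1, 31415), (2, 27182), (3, 16180);
    INSERT INTO Friends VALUES (0, 1), (1, 0);
""")

my_id = 0
fb_array = [31415, 27182, 16180]
marks = ",".join("?" * len(fb_array))  # one placeholder per fb_id

# Form 1: NOT IN with a subquery.
not_in = conn.execute(
    "SELECT u.id FROM Users u WHERE u.fb_id IN (" + marks + ") "
    "AND u.id NOT IN (SELECT f.friend_id FROM Friends f WHERE f.id = ?) "
    "ORDER BY u.id",
    [*fb_array, my_id],
).fetchall()

# Form 2: LEFT JOIN, keeping only rows that found no match in Friends.
left_join = conn.execute(
    "SELECT u.id FROM Users u "
    "LEFT JOIN Friends f ON f.friend_id = u.id AND f.id = ? "
    "WHERE u.fb_id IN (" + marks + ") AND f.friend_id IS NULL "
    "ORDER BY u.id",
    [my_id, *fb_array],
).fetchall()

print(not_in, left_join)
```

Running EXPLAIN on both statements against your real MySQL tables is the way to see which plan the optimizer actually picks.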

Related

How does stackoverflow find users to give them their notifications?

I'm creating a website like SO. Now I want to know: when I write a comment under Jack's answer/question, what happens? SO sends a notification to Jack, right? So how does SO find Jack?
In other words, should I store the author's user id in the Votes/Comments tables? Here is my current Votes table structure:
// Votes
+----+---------+------------+---------+-------+------------+
| id | post_id | table_code | user_id | value | timestamp |
+----+---------+------------+---------+-------+------------+
// user_id: stores the id of the user who cast the vote
// table_code: needed because there are multiple Posts tables (see the Edit)
Now I want to send a notification to the post owner, but I don't know how to find him. Should I add a new column to the Votes table named owner and store the author id?
Edit: I have to mention that I have four Posts tables (I know this structure is crazy, but in reality the structures of those Posts tables are really different and I can't create just one table instead). Something like this:
// Posts1 (table_code: 1)
+----+-------+-----------+
| id | title | content |
+----+-------+-----------+
// Posts2 (table_code: 2)
+----+-------+-----------+-----------+
| id | title | content | author_id |
+----+-------+-----------+-----------+
// Posts3 (table_code: 3)
+----+-------+-----------+-----------+
| id | title | content | author_id |
+----+-------+-----------+-----------+
// Posts4 (table_code: 4)
+----+-------+-----------+
| id | title | content |
+----+-------+-----------+
By the way, only some of those Posts tables have an author_id column (because I have two Posts tables which are not made by the users). So, as you see, I can't create a foreign key on those Posts tables.
What I need: I want a TRIGGER AFTER INSERT on the Votes table which sends a notification to the author if there is an author_id column (or a query which returns author_id if there is one). Or any other good solution for my problem ...
Votes.post_id should be a foreign key into the Posts table. From there you can get Posts.author_id, and send the notification to that user.
With your multiple Posts# tables, you can't use a real foreign key. But you can write a UNION query that joins with the appropriate table depending on the table_code value.
SELECT p.author_id
FROM Votes AS v
JOIN Posts2 AS p ON p.id = v.post_id
WHERE v.table_code = 2
UNION
SELECT p.author_id
FROM Votes AS v
JOIN Posts3 AS p ON p.id = v.post_id
WHERE v.table_code = 3
Try to avoid storing data that you can get by following foreign keys, so that the information is only stored in one place. If you run into performance problems because of excessive joining, you may need to violate this normalization principle, but only as a last resort.
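A minimal sketch of how that UNION could be used to look up the author for a single vote, for example from the code that sends the notification. It runs on SQLite via Python's sqlite3 module; the function name author_for_vote, the trimmed-down columns, and the sample rows are assumptions. A vote on a Posts table without author_id simply returns no row.

```python
import sqlite3

# Same shape as the UNION above, narrowed to one vote.
# UNION ALL is enough here because at most one branch can match a given vote.
AUTHOR_LOOKUP = """
    SELECT p.author_id FROM Votes AS v
    JOIN Posts2 AS p ON p.id = v.post_id
    WHERE v.table_code = 2 AND v.id = :vote_id
    UNION ALL
    SELECT p.author_id FROM Votes AS v
    JOIN Posts3 AS p ON p.id = v.post_id
    WHERE v.table_code = 3 AND v.id = :vote_id
"""

def author_for_vote(conn, vote_id):
    """Return the author_id to notify, or None if the post has no author."""
    row = conn.execute(AUTHOR_LOOKUP, {"vote_id": vote_id}).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts1 (id INTEGER PRIMARY KEY, title TEXT, content TEXT);
    CREATE TABLE Posts2 (id INTEGER PRIMARY KEY, title TEXT, content TEXT, author_id INTEGER);
    CREATE TABLE Posts3 (id INTEGER PRIMARY KEY, title TEXT, content TEXT, author_id INTEGER);
    CREATE TABLE Votes (id INTEGER PRIMARY KEY, post_id INTEGER, table_code INTEGER,
                        user_id INTEGER, value INTEGER);
    INSERT INTO Posts1 VALUES (10, 'system post', '...');
    INSERT INTO Posts2 VALUES (10, 'question', '...', 100);
    INSERT INTO Posts3 VALUES (10, 'answer', '...', 200);
    INSERT INTO Votes VALUES (1, 10, 2, 7, 1), (2, 10, 3, 7, 1), (3, 10, 1, 7, 1);
""")

print(author_for_vote(conn, 1), author_for_vote(conn, 2), author_for_vote(conn, 3))
```

The same SELECT could be placed inside an AFTER INSERT trigger body, but keeping it in application code avoids duplicating the lookup if notifications are also sent for comments.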

MySQL query get column value similar to given

Sorry if my question seems unclear, I'll try to explain.
I have a column in a row, for example /1/3/5/8/42/239/, let's say I would like to find a similar one where there is as many corresponding "ids" as possible.
Example:
| My Column |
#1 | /1/3/7/2/4/ |
#2 | /1/5/7/2/4/ |
#3 | /1/3/6/8/4/ |
Now, by running the query on #1 I would like to get row #2 as it's the most similar. Is there any way to do it or it's just my fantasy? Thanks for your time.
EDIT:
As suggested I'm expanding my question. This column represents the favourite artists of a user from a music site. I'm searching them like this: MyColumn LIKE '%/ID/%', and I remove an artist by replacing /ID/ with /
Since you did not provide much info about your data I have to fill the gaps with my guesses.
So you have a users table
users table
-----------
id
name
other_stuff
And you like to store which artists are favorites of a user. So you must have an artists table
artists table
-------------
id
name
other_stuff
And to relate you can add another table called favorites
favorites table
---------------
user_id
artist_id
In that table you add a record for every artist that a user likes.
Example data
users
id | name
1 | tom
2 | john
artists
id | name
1 | michael jackson
2 | madonna
3 | deep purple
favorites
user_id | artist_id
1 | 1
1 | 3
2 | 2
To select the favorites of user tom for instance you can do
select a.name
from artists a
join favorites f on f.artist_id = a.id
join users u on f.user_id = u.id
where u.name = 'tom'
And if you add proper indexing to your table then this is really fast!
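With the favorites table in place, the original "most similar row" question becomes a self-join that counts shared artists. The sketch below is one possible way to do it, run on SQLite through Python's sqlite3 module; the function name most_similar is made up, and the data is the three rows from the question stored as (user_id, artist_id) pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE favorites (
        user_id INTEGER, artist_id INTEGER,
        PRIMARY KEY (user_id, artist_id)
    );
    -- #1 /1/3/7/2/4/   #2 /1/5/7/2/4/   #3 /1/3/6/8/4/
    INSERT INTO favorites VALUES
        (1, 1), (1, 3), (1, 7), (1, 2), (1, 4),
        (2, 1), (2, 5), (2, 7), (2, 2), (2, 4),
        (3, 1), (3, 3), (3, 6), (3, 8), (3, 4);
""")

def most_similar(conn, user_id):
    """Rank other users by how many favorite artists they share with user_id."""
    return conn.execute(
        "SELECT other.user_id, COUNT(*) AS shared "
        "FROM favorites AS mine "
        "JOIN favorites AS other "
        "  ON other.artist_id = mine.artist_id "
        " AND other.user_id <> mine.user_id "
        "WHERE mine.user_id = ? "
        "GROUP BY other.user_id "
        "ORDER BY shared DESC, other.user_id",
        (user_id,),
    ).fetchall()

ranking = most_similar(conn, 1)
print(ranking)
```

Here user 2 shares four artists with user 1 and user 3 shares three, so user 2 comes first, which matches the expectation in the question.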
Problem is you're storing this in a really, really awkward way.
I'm guessing you have to deal with an arbitrary number of values. You have two options:
1. Store the multiple IDs in a blob object in JSON format. MySQL versions before 5.7 have no JSON functions built in, but there are user-defined functions that will extract values for you, etc.
See: http://blog.ulf-wendel.de/2013/mysql-5-7-sql-functions-for-json-udf/
Alternatively, switch to PostgreSQL.
2. Add as many columns to your table as the maximum number of IDs you expect to have. So if /1/3/7/2/4/8/ is the longest entry, have 6 columns in your table. Reason this is bad: you'll have sparse columns that'll unnecessarily slow your tables.
I'm sure you could write some horrific regex to accomplish the task, but I caution against using complex regexes on enormous tables.

MySQL: Possible to apply the OR operator across multiple selected rows?

I have three MySQL tables: users, roles and positions.
The users table is pretty self-explanatory. The roles table is a list of job titles a person might hold, such as janitor, president, manager, etc. The roles table also has a long array of boolean permissions, such as access_basement or user_directory_access. If the role has that bit value set to false (or "0"), that role lacks that permission.
Where it gets tricky is that a user might have multiple roles, hence why they are connected by the positions table, which is simply a pairing of the userId and roleId fields. So if I perform a query like:
SELECT * FROM users
LEFT JOIN positions ON users.userId=positions.userId
LEFT JOIN roles ON roles.roleId=positions.roleId
WHERE users.userId=123
I might get results like:
+---------+-----------+-----------------+-----------------------+
| name | title | basement_access | user_directory_access |
+---------+-----------+-----------------+-----------------------+
| Bob | Janitor | true | false |
+---------+-----------+-----------------+-----------------------+
| Bob | President | false | true |
+---------+-----------+-----------------+-----------------------+
Since Bob has two roles, but has different access with each, I'd like to combine the results with a single MySQL query and the logical OR operation across all rows, resulting in a table like:
+---------+-----------------+-----------------------+
| name | basement_access | user_directory_access |
+---------+-----------------+-----------------------+
| Bob | true | true |
+---------+-----------------+-----------------------+
So the question is: is it possible to apply the OR operator across multiple selected MySQL rows?
Thanks!
One way to solve this is to store the role permissions as the values 0 and 1 and sum them per user, so that any non-zero result means at least one role grants the permission. Use a query like:
SELECT u.name, SUM(r.basement_access) AS basement_access, SUM(r.user_directory_access) AS user_directory_access
FROM users u
LEFT JOIN positions p ON u.userId=p.userId
LEFT JOIN roles r ON r.roleId=p.roleId
WHERE u.userId=123
GROUP BY u.userId;
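Note that SUM returns a count rather than a strict true/false. If you want exactly 0 or 1, swapping SUM for MAX gives a logical OR over 0/1 columns (MySQL also has a BIT_OR aggregate). The sketch below compares the two on SQLite via Python's sqlite3 module; the third "Manager" role is a hypothetical addition so that the difference shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE roles (
        roleId INTEGER PRIMARY KEY, title TEXT,
        basement_access INTEGER, user_directory_access INTEGER
    );
    CREATE TABLE positions (userId INTEGER, roleId INTEGER);
    INSERT INTO users VALUES (123, 'Bob');
    INSERT INTO roles VALUES
        (1, 'Janitor', 1, 0), (2, 'President', 0, 1), (3, 'Manager', 1, 1);
    INSERT INTO positions VALUES (123, 1), (123, 2), (123, 3);
""")

# Same query as above with the aggregate function left as a parameter.
QUERY = """
    SELECT u.name,
           {agg}(r.basement_access) AS basement_access,
           {agg}(r.user_directory_access) AS user_directory_access
    FROM users u
    LEFT JOIN positions p ON u.userId = p.userId
    LEFT JOIN roles r ON r.roleId = p.roleId
    WHERE u.userId = ?
    GROUP BY u.userId
"""

summed = conn.execute(QUERY.format(agg="SUM"), (123,)).fetchone()
ored = conn.execute(QUERY.format(agg="MAX"), (123,)).fetchone()
print(summed)  # counts of roles granting each permission
print(ored)    # 1 if any role grants the permission, else 0
```

Either form works if the application only tests for "greater than zero"; MAX just keeps the output in the same 0/1 domain as the input columns.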

add friends/contacts from a table of users to show changes as they happen

My system uses a table of users with several elements including their online status. Now I would like to add a feature so each user can add a contact, i.e. an existing user from the database, to their list of friends or contacts, showing their name and online status. However, I need any changes in the online status to be updated on the contact list as they happen, so creating a new table wouldn't help.
I have been looking into views for this but don't have much experience with databases, so I would like to know if this is the correct way of going about it and get a bit more detail on how to do it.
Here are the steps I was thinking of:
When a user registers, create a view, i.e. view_name = username_view.
To add a contact select data from main users table and add to user's view
To delete a contact delete selected data from view.
I am not sure if this is possible with views, so if it isn't, can someone please help me out?
Thanks.
Entity-relationship (ER) databases handle that by having one table with the entities, "users", and one table for the relationship, "friends", that connects a user to another user.
A query for that can be:
SELECT users.*
FROM friends
LEFT JOIN users ON (users.user_id = friends.friend_id)
WHERE friends.user_id = :user
Example:
A users table, for the entity user:
CREATE TABLE users (user_id SERIAL, name TINYTEXT NOT NULL);
Example users:
user_id | name
1 | Anna
2 | Bertil
3 | Carl
4 | David
5 | Erik
A friends table, for relations between two users (not one table per user, just one):
CREATE TABLE friends (
user_id BIGINT UNSIGNED NOT NULL,
friend_id BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (user_id, friend_id)
);
If Anna adds Bertil and Carl as friends, and Carl adds David and Erik as friends, the content of the table is going to be:
user_id | friend_id
1 | 2
1 | 3
3 | 4
3 | 5
If we want to list the names of Anna's friends, and we already know that Anna's user_id = 1, then we can use this query (like the one above):
SELECT users.name
FROM friends
LEFT JOIN users ON (users.user_id = friends.friend_id)
WHERE friends.user_id = 1
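Because the friends table only stores the pairing, the online status is read from the users table at query time, so a change is visible on the next query without touching the contact list. A small sketch of that, on SQLite via Python's sqlite3 module; the online column and the contacts helper are assumptions, since the question does not show the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "online" is an assumed column name for the status mentioned in the question.
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        online INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE friends (
        user_id INTEGER NOT NULL,
        friend_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, friend_id)
    );
    INSERT INTO users (user_id, name) VALUES
        (1, 'Anna'), (2, 'Bertil'), (3, 'Carl'), (4, 'David'), (5, 'Erik');
    INSERT INTO friends VALUES (1, 2), (1, 3), (3, 4), (3, 5);
""")

def contacts(conn, user_id):
    """Return (name, online) for each friend of user_id, read live from users."""
    return conn.execute(
        "SELECT users.name, users.online FROM friends "
        "JOIN users ON users.user_id = friends.friend_id "
        "WHERE friends.user_id = ? ORDER BY users.name",
        (user_id,),
    ).fetchall()

before = contacts(conn, 1)
# Carl comes online: only the users table changes, friends is untouched.
conn.execute("UPDATE users SET online = 1 WHERE user_id = 3")
after = contacts(conn, 1)
print(before, after)
```

Adding or removing a contact is then a plain INSERT or DELETE on friends, with no per-user view needed.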

Table structure for storing communities in MySQL

I have a list of communities. Each community has a list of members. Currently I am storing each community in a row with the member names separated by a comma. This is good for smaller immutable communities. But as the communities are growing big, let us say with 75,000 members, loading of communities is becoming slower. Also partial loading of a community (let us say random 10 members) is also not very elegant. What would be the best table structure for the communities table in this scenario? Usage of multiple tables is also not an issue if there is a reason for doing that.
Use three tables
`community`
| id | name | other_column_1 | other_column_2 ...
`user`
| id | name | other_column_1 | other_column_2 ...
`community_user`
| id (autoincrement) | community_id | user_id |
Then to get user info for all users in a community you do something like this
SELECT cu.id AS entry_id, u.id, u.name FROM `community_user` AS cu
LEFT JOIN `user` AS u
ON cu.user_id = u.id
WHERE cu.community_id = <community id>
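For the "random 10 members" case from the question, the same structure lets the database pick the sample instead of loading the whole membership. A rough sketch on SQLite via Python's sqlite3 module; the random_members helper, the UNIQUE constraint, and the generated data are assumptions. SQLite uses RANDOM() where MySQL uses RAND().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE community (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE community_user (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        community_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        UNIQUE (community_id, user_id)  -- assumed: one membership row per pair
    );
    INSERT INTO community VALUES (1, 'chess'), (2, 'go');
""")
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [(i, "user%d" % i) for i in range(1, 101)],
)
# Users 1..60 join community 1, users 50..100 join community 2.
conn.executemany(
    "INSERT INTO community_user (community_id, user_id) VALUES (?, ?)",
    [(1, i) for i in range(1, 61)] + [(2, i) for i in range(50, 101)],
)

def random_members(conn, community_id, n=10):
    """Return n randomly chosen (user id, name) pairs from one community."""
    return conn.execute(
        "SELECT u.id, u.name FROM community_user AS cu "
        "JOIN user AS u ON cu.user_id = u.id "
        "WHERE cu.community_id = ? "
        "ORDER BY RANDOM() LIMIT ?",  # MySQL: ORDER BY RAND()
        (community_id, n),
    ).fetchall()

sample = random_members(conn, 1)
print(sample)
```

Ordering by a random value still sorts every member of the community, so for very large communities it is worth profiling this against alternatives before relying on it.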