Best way to store 1:1 user relationships in a relational database - MySQL

What is the best way to store user relationships, e.g. friendships, that must be bidirectional (you're my friend, thus I'm your friend) in a relational database such as MySQL?
I can think of two ways:
Every time a user friends another user, I'd add two rows to a table: row A containing the user ID of the initiating user followed by the UID of the accepting user in the next column, and row B the reverse.
I'd add only one row, UID(initiating user) followed by UID(accepting user), and then just search through both columns when trying to figure out whether user 1 is a friend of user 2.
Surely there is something better?

I would have a link table for friends, or whatever, with two columns that together form the primary key, each also being a foreign key to the User table.
Both columns would hold a UID, and you would have two rows per friendship (A,B and B,A). As long as the two columns form a composite primary key, it should still be in normal form (although others are free to correct me on this).
It's a slightly more complex query, but nothing that can't be abstracted away by a stored procedure or some business logic, and it's in normal form, which is usually nice to have.
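A minimal sketch of the two-rows-per-friendship design, using Python's built-in sqlite3 module as a stand-in for MySQL (the SQL is close to what you would run there). The table and column names (users, friends, uid, user_id, friend_id) are assumptions for illustration. Both rows are written in one transaction so a friendship is never half-stored.

```python
import sqlite3

# SQLite stands in for MySQL here; all names below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE friends (
    user_id   INTEGER NOT NULL REFERENCES users(uid),
    friend_id INTEGER NOT NULL REFERENCES users(uid),
    PRIMARY KEY (user_id, friend_id)  -- composite key, one row per direction
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")])

def add_friendship(a: int, b: int) -> None:
    """Store both directions, (a, b) and (b, a), in one transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.executemany("INSERT INTO friends (user_id, friend_id) VALUES (?, ?)",
                         [(a, b), (b, a)])

def friends_of(uid: int) -> list[int]:
    """Both directions are stored, so one indexed column lookup is enough."""
    rows = conn.execute(
        "SELECT friend_id FROM friends WHERE user_id = ? ORDER BY friend_id", (uid,))
    return [r[0] for r in rows]

def are_friends(a: int, b: int) -> bool:
    row = conn.execute("SELECT 1 FROM friends WHERE user_id = ? AND friend_id = ?",
                       (a, b)).fetchone()
    return row is not None

add_friendship(1, 2)
add_friendship(3, 1)
print(friends_of(1))      # [2, 3]
print(are_friends(2, 1))  # True
```

The extra write per friendship buys queries that touch a single column, which is the trade-off the answer describes.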

You could check which of the two user IDs is lower and store them in a specific order. This way you don't need two rows for one friendship and still keep your queries simple.
user_id_low | user_id_high
A simple query to check whether you're already friends with someone would be:
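<?php
// $pdo is assumed to be an open PDO connection to the MySQL database
$my_id = 2999;
$friend_id = 500;
$lowest  = min($my_id, $friend_id);
$highest = max($my_id, $friend_id);
// Bind the values with a prepared statement instead of interpolating them into the SQL
$stmt = $pdo->prepare("SELECT * FROM friends WHERE user_id_low = ? AND user_id_high = ?");
$stmt->execute([$lowest, $highest]);
$already_friends = $stmt->fetchColumn() !== false;
?>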
Or you could find the lowest/highest user ID in MySQL itself:
<?php
// $pdo: an open PDO connection (assumed)
$stmt = $pdo->prepare("SELECT * FROM friends WHERE user_id_low = LEAST(?, ?) AND user_id_high = GREATEST(?, ?)");
$stmt->execute([$my_id, $friend_id, $my_id, $friend_id]);
?>
And to get all your friends' IDs:
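<?php
// $pdo: an open PDO connection (assumed); IF() returns whichever column is not $my_id
$stmt = $pdo->prepare("SELECT IF(user_id_low = ?, user_id_high, user_id_low) AS friend_id FROM friends WHERE ? IN (user_id_low, user_id_high)");
$stmt->execute([$my_id, $my_id]);
$friend_ids = $stmt->fetchAll(PDO::FETCH_COLUMN);
?>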
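A sketch of the lowest-ID-first design, using Python's sqlite3 module as a stand-in for MySQL. SQLite has no LEAST()/GREATEST() or IF(), so its multi-argument min()/max() and a CASE expression take their place; the function names are illustrative.

```python
import sqlite3

# SQLite stands in for MySQL: multi-argument min()/max() replace LEAST()/GREATEST(),
# and CASE replaces IF(). Function names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE friends (
    user_id_low  INTEGER NOT NULL,
    user_id_high INTEGER NOT NULL,
    PRIMARY KEY (user_id_low, user_id_high),
    CHECK (user_id_low < user_id_high)  -- rejects wrongly ordered pairs and self-friendship
)""")

def add_friendship(a: int, b: int) -> None:
    # Normalize the order inside SQL so callers can pass the IDs either way round.
    conn.execute("INSERT INTO friends VALUES (min(?, ?), max(?, ?))", (a, b, a, b))

def are_friends(a: int, b: int) -> bool:
    row = conn.execute(
        "SELECT 1 FROM friends WHERE user_id_low = min(?, ?) AND user_id_high = max(?, ?)",
        (a, b, a, b)).fetchone()
    return row is not None

def friends_of(uid: int) -> list[int]:
    # Return whichever column is not uid.
    rows = conn.execute("""
        SELECT CASE WHEN user_id_low = ? THEN user_id_high ELSE user_id_low END
        FROM friends
        WHERE ? IN (user_id_low, user_id_high)
        ORDER BY 1""", (uid, uid))
    return [r[0] for r in rows]

add_friendship(2999, 500)  # stored as (500, 2999)
add_friendship(500, 42)    # stored as (42, 500)
print(are_friends(500, 2999))  # True
print(friends_of(500))         # [42, 2999]
```

The CHECK constraint is an optional addition (MySQL enforces CHECK constraints only from 8.0.16 on). Because friends_of searches both columns, a secondary index on user_id_high may be worth adding so a lookup on either column can use an index.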

Using double rows, while it creates extra data, will greatly simplify your queries and allow you to index smartly. I also remember seeing info on Twitter's custom MySQL solution wherein they used an additional field (friend #, basically) to do automatic limiting and paging. It seems pretty smooth:
https://blog.twitter.com/2010/introducing-flockdb

Use a key-value store such as Cassandra.

Related

SQL query to check that many interests match

So I am building a swingers site. Users can search other users by their interests; this is only one of a number of search parameters. The thing is, there are around 100 different interests. When searching for another user, they can select all the interests that user must share. While I can think of ways to do this, I know it is important that the search be as efficient as possible.
The backend uses JDBC to connect to a MySQL database; Java is the backend programming language.
I have considered using one column per interest, but the thing is that the generated SQL query shouldn't need to check all of them when those columns aren't mentioned in the JSON object sent to the server with the search criteria. I also worry I may have to make painful modifications to the table later if I add new columns.
Another thing I thought about was having some kind of long byte array, or a number (used like a byte array), stored in a single column. I could bitwise-AND (&) this with another number corresponding to the interests the user is searching for, but I read somewhere that this is actually quite inefficient, despite it making good sense to me :/
And all of this has to be part of one big SQL query with multiple tables joined into it.
One of the issues with using multiple columns would be the computing power used to run statement.setBoolean on what could be 40 columns.
I thought about generating an xml string in the client then processing that in the sql query.
Any suggestions?
I think the correct term is a bitmask. I could maybe have one table that maps the user's ID to the bitmask for querying users' interests, and another with one entry per interest per user ID for efficiently looking up which user has which interests, if I later require this?
Basically, it would be best to have a separate table with all the interests, with two columns: id and interest.
Then, have a table that links users to interests, user_interests, with the following columns: id, user_id, interest_id. Some knowledge of many-to-many relationships would help a lot here.
Hope it helps!
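To illustrate the answer's many-to-many layout, here is a sketch (Python's sqlite3 standing in for MySQL, sample data invented) of the "user must share all selected interests" search. It keeps only the rows for the requested interests, groups them per user, and returns users whose match count equals the number of requested interests, a pattern usually called relational division. The answer's surrogate id column in user_interests is replaced by a composite primary key here, which is an assumption.

```python
import sqlite3

# SQLite stands in for MySQL. Composite PK instead of a surrogate id is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE interests (id INTEGER PRIMARY KEY, interest TEXT NOT NULL UNIQUE);
CREATE TABLE user_interests (
    user_id     INTEGER NOT NULL,
    interest_id INTEGER NOT NULL REFERENCES interests(id),
    PRIMARY KEY (user_id, interest_id)  -- also guarantees no duplicate rows per user
);
CREATE INDEX idx_interest_user ON user_interests (interest_id, user_id);
""")
conn.executemany("INSERT INTO interests VALUES (?, ?)",
                 [(1, "hiking"), (2, "chess"), (3, "cooking")])
conn.executemany("INSERT INTO user_interests VALUES (?, ?)",
                 [(10, 1), (10, 2), (10, 3), (11, 1), (11, 3), (12, 2)])

def users_with_all(interest_ids: list[int]) -> list[int]:
    """Users that have every interest in interest_ids (relational division)."""
    ids = sorted(set(interest_ids))
    if not ids:
        raise ValueError("at least one interest is required")
    placeholders = ",".join("?" * len(ids))  # only '?' markers go into the SQL text
    sql = f"""
        SELECT user_id
        FROM user_interests
        WHERE interest_id IN ({placeholders})
        GROUP BY user_id
        HAVING COUNT(*) = ?
        ORDER BY user_id"""
    return [r[0] for r in conn.execute(sql, (*ids, len(ids)))]

print(users_with_all([1, 3]))  # [10, 11]
print(users_with_all([2]))     # [10, 12]
```

From Java, the same statement could be built with one ? per selected interest and bound through PreparedStatement.setInt, and it can be joined into the larger search query as a subquery. Adding a new interest is then just a new row, not a schema change.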

MySQL: Best DB model for a "User referral" system

I'm modeling a DB for an application where one of the functions is to get a user from the DB and display in a diagram the selected user, all of that user's referrals, the referrals (if any) of those referrals, and so on, up to 3 referral levels.
I have two ideas on how to model a schema to accomplish this, but I don't know which one is the "best" (in terms of optimization, normalization, etc.).
In one scheme, the referrals are stored in a different table, with only a BOOLEAN to show whether the user was, in fact, referred by another user.
Alternatively, I can replace the BOOLEAN with a nullable INT (if referred, just store an INT; NULL means not referred by anyone).
If there is a better way to accomplish this, suggestions are also welcomed. Thank you.
I'd suggest a third model. You're describing a model where a referral is optional, so perhaps use a table that joins twice to the users table: the first column holds the referred person's ID, and the second the referring user's ID.
Then you know whether there is a referral by joining to this table in your queries.
I would use the second design and perhaps add a closure table for easily finding the referrals across multiple levels (see: http://wiki.pentaho.com/display/EAI/Closure+Generator)
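The answer suggests a closure table; the sketch below swaps in a different technique, a recursive CTE over the second design's nullable column (called referred_by here, a hypothetical name), to fetch referrals up to 3 levels deep. It uses Python's sqlite3 as a stand-in; MySQL supports WITH RECURSIVE from version 8.0. A closure table would instead precompute every (ancestor, descendant, depth) row so the lookup becomes a plain indexed query.

```python
import sqlite3

# SQLite stands in for MySQL 8.0+. referred_by holds the referrer's id, or NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    referred_by INTEGER REFERENCES users(id)  -- NULL means not referred
);
CREATE INDEX idx_users_referred_by ON users (referred_by);
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "root", None),
    (2, "a", 1), (3, "b", 1),
    (4, "a1", 2),
    (5, "a1x", 4),
    (6, "a1xy", 5),  # 4 levels below root: cut off by the depth limit
])

def referrals(user_id: int, max_depth: int = 3) -> list[tuple[int, int]]:
    """(referral_id, level) pairs for up to max_depth levels below user_id."""
    sql = """
        WITH RECURSIVE tree(id, level) AS (
            SELECT id, 1 FROM users WHERE referred_by = ?
            UNION ALL
            SELECT u.id, t.level + 1
            FROM users AS u JOIN tree AS t ON u.referred_by = t.id
            WHERE t.level < ?
        )
        SELECT id, level FROM tree ORDER BY level, id"""
    return conn.execute(sql, (user_id, max_depth)).fetchall()

print(referrals(1))  # [(2, 1), (3, 1), (4, 2), (5, 3)]
```

The level column gives the diagram its tiers directly, and the depth check in the recursive member stops the walk at level 3.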

Modelling ownership in MySQL

I have a table Things and I want to add ownership relations to a table Users. I need to be able to quickly query the owners of a thing and the things a user owns. If I know that there will be at most 50 owners, and I have a rough idea of the probability distribution (PDF) of the number of owners per thing, should I:
add 50 columns to the Things table, like CoOwner1Id, CoOwner2Id, …, CoOwner50Id, or
model this with an Ownerships table which has UserId and ThingId columns, or
would it better to create a table for each thing, for example Thing8321Owners with a row for each owner, or
perhaps a combination of these?
The second choice is the correct one; you should create an intermediate table between the table Things and the table Owners (that contains the details of each owner).
This table should have thing_id and owner_id together as its primary key.
So in the end, you will have 3 tables:
Things (the things details and data)
Owner (the owners details and data)
Ownerships (the assignment of each thing_id to an owner_id)
This is because in a relational DB you should avoid redundant data.
You should definitely go with option 2, because what you are trying to model is a many-to-many relationship. (Many owners can relate to a thing. Many things can relate to an owner.) This is commonly accomplished using what I call a bridging table (which is exactly what option 2 is). It is a standard technique in a normalized database.
The other two options are going to give you nightmares trying to query or maintain.
With option 1 you'll need to join the User table to the Thing table on 50 columns to get all of your results. And what happens when you have a really popular thing that 51 people want to own?
Option 3 is even worse. The only way to easily query the data is to use dynamic sql or write a new query each time because you don't know which Thing*Owners table to join on until you know the ID value of the thing you're looking for. Or you're going to need to join the User table to every single Thing*Owners table. Adding a new thing means creating a whole new table. But at least a thing doesn't have a limit on the number of owners it could possibly have.
Now isn't this:
SELECT Users.Name, Things.Name
FROM Users
INNER JOIN Ownerships ON Users.UserId = Ownerships.UserId
INNER JOIN Things ON Things.ThingId = Ownerships.ThingId;
much easier than any of those other scenarios?
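A sketch of option 2 using the table names from the answer, with Python's sqlite3 standing in for MySQL and invented sample rows. The composite primary key (UserId, ThingId) serves "things a user owns"; the extra index on (ThingId, UserId) is an assumed addition so "owners of a thing" is just as quick.

```python
import sqlite3

# SQLite stands in for MySQL; data is invented, the reverse index is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users  (UserId  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Things (ThingId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Ownerships (
    UserId  INTEGER NOT NULL REFERENCES Users(UserId),
    ThingId INTEGER NOT NULL REFERENCES Things(ThingId),
    PRIMARY KEY (UserId, ThingId)  -- serves "things a user owns"
);
-- reverse index serves "owners of a thing"
CREATE INDEX idx_ownerships_thing ON Ownerships (ThingId, UserId);
""")
conn.executemany("INSERT INTO Users VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO Things VALUES (?, ?)", [(8321, "Boat"), (42, "Car")])
conn.executemany("INSERT INTO Ownerships VALUES (?, ?)", [(1, 8321), (2, 8321), (1, 42)])

def owners_of(thing_id: int) -> list[str]:
    rows = conn.execute("""
        SELECT Users.Name
        FROM Ownerships
        INNER JOIN Users ON Users.UserId = Ownerships.UserId
        WHERE Ownerships.ThingId = ?
        ORDER BY Users.Name""", (thing_id,))
    return [r[0] for r in rows]

def things_of(user_id: int) -> list[str]:
    rows = conn.execute("""
        SELECT Things.Name
        FROM Ownerships
        INNER JOIN Things ON Things.ThingId = Ownerships.ThingId
        WHERE Ownerships.UserId = ?
        ORDER BY Things.Name""", (user_id,))
    return [r[0] for r in rows]

print(owners_of(8321))  # ['Ann', 'Bob']
print(things_of(1))     # ['Boat', 'Car']
```

A 51st owner is just one more row in Ownerships; no schema change is needed.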

How to efficiently design MySQL database for my particular case

I am developing a forum in PHP and MySQL, and I want to make it as efficient as I can.
I have made these two tables:
tbl_threads
tbl_comments
Now, the problem is that there is a Like and a Dislike button under each comment. I have to store the user_name of whoever clicked Like or Dislike, along with the comment_id. I have made a column user_likes and a column user_dislikes in tbl_comments to store the comma-separated user_names. But on this forum, I have read that this is not an efficient way. I have been advised to create a third table to store the likes and dislikes, and to make my database design comply with 1NF.
But the problem is, if I make a third table tbl_user_opinion with two fields like this:
1. comment_id
2. type (like or dislike)
then will I have to run as many SQL queries as there are comments on my page to get the like and dislike data for each comment? Won't that be inefficient? I think there is some confusion on my part here. Can someone clarify this?
You have a relational schema like this:
There are two ways to solve this. The first, the "clean" one, is to build your "like" table and run COUNT(*) on the appropriate column.
The second is to store a counter in each comment, indicating how many upvotes and downvotes it has received.
If you want to check whether a specific user has voted on a comment, you only have to check one entry, which you can easily handle as a separate query and merge the two results outside your database (for this, use a query returning the comment_id and the kind of vote the user cast, for the comments in a specific thread).
Your approach with a comma-separated list does not perform well, because you cannot use it without a huge amount of string parsing. If you have a database, use it!
("One Information - One Dataset"!)
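To address the worry about one query per comment: with a separate opinion table, one aggregate query can return the like and dislike counts for every comment in a thread, and a second query can return the current user's own votes, merged in application code as the answer describes. This sketch uses Python's sqlite3 in place of MySQL; tbl_comments and tbl_user_opinion follow the question, while thread_id, a user_name column in tbl_user_opinion, and the sample data are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; thread_id, user_name placement and data are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_comments (comment_id INTEGER PRIMARY KEY, thread_id INTEGER NOT NULL, body TEXT);
CREATE TABLE tbl_user_opinion (
    comment_id INTEGER NOT NULL REFERENCES tbl_comments(comment_id),
    user_name  TEXT    NOT NULL,
    type       TEXT    NOT NULL CHECK (type IN ('like', 'dislike')),
    PRIMARY KEY (comment_id, user_name)  -- one opinion per user per comment
);
""")
conn.executemany("INSERT INTO tbl_comments VALUES (?, ?, ?)",
                 [(1, 7, "first"), (2, 7, "second"), (3, 8, "other thread")])
conn.executemany("INSERT INTO tbl_user_opinion VALUES (?, ?, ?)",
                 [(1, "ann", "like"), (1, "bob", "like"), (1, "cy", "dislike"),
                  (2, "ann", "dislike")])

def counts_for_thread(thread_id: int) -> list[tuple[int, int, int]]:
    """(comment_id, likes, dislikes) for every comment in the thread, in ONE query."""
    return conn.execute("""
        SELECT c.comment_id,
               COUNT(CASE WHEN o.type = 'like'    THEN 1 END) AS likes,
               COUNT(CASE WHEN o.type = 'dislike' THEN 1 END) AS dislikes
        FROM tbl_comments AS c
        LEFT JOIN tbl_user_opinion AS o ON o.comment_id = c.comment_id
        WHERE c.thread_id = ?
        GROUP BY c.comment_id
        ORDER BY c.comment_id""", (thread_id,)).fetchall()

def my_votes_in_thread(thread_id: int, user_name: str) -> dict[int, str]:
    """comment_id -> 'like'/'dislike' for the comments this user voted on."""
    rows = conn.execute("""
        SELECT o.comment_id, o.type
        FROM tbl_user_opinion AS o
        JOIN tbl_comments AS c ON c.comment_id = o.comment_id
        WHERE c.thread_id = ? AND o.user_name = ?""", (thread_id, user_name))
    return dict(rows)

print(counts_for_thread(7))          # [(1, 2, 1), (2, 0, 1)]
print(my_votes_in_thread(7, "ann"))  # {1: 'like', 2: 'dislike'}
```

So a page needs two queries in total, regardless of how many comments it shows.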
The comma-separated list violates the principle of atomicity, and therefore 1NF. You'll have a hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
This is very clustering-friendly: with {COMMENT_ID, USER_ID} as the primary (clustering) key of both UP_VOTE and DOWN_VOTE, it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
    COMMENT.COMMENT_ID,
    <other COMMENT fields>,
    COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) SCORE
FROM COMMENT
    LEFT JOIN UP_VOTE
        ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
    LEFT JOIN DOWN_VOTE
        ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE
    COMMENT.COMMENT_ID = <whatever>
GROUP BY
    COMMENT.COMMENT_ID,
    <other COMMENT fields>;
Please measure with realistic amounts of data whether that is fast enough for you. If not, denormalize the model: cache the total score in the COMMENT table and keep it current through triggers every time a row is inserted into or deleted from the *_VOTE tables.
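A sketch of that denormalized variant, with Python's sqlite3 standing in for MySQL: COMMENT caches SCORE, and four triggers keep it in step with UP_VOTE and DOWN_VOTE. MySQL's CREATE TRIGGER syntax differs slightly (for example, it requires FOR EACH ROW). The normalized COUNT(DISTINCT ...) query from the answer is run alongside as a cross-check; the sample votes are invented.

```python
import sqlite3

# SQLite stands in for MySQL; trigger names and sample votes are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE COMMENT (COMMENT_ID INTEGER PRIMARY KEY, SCORE INTEGER NOT NULL DEFAULT 0);
CREATE TABLE UP_VOTE (
    COMMENT_ID INTEGER NOT NULL REFERENCES COMMENT(COMMENT_ID),
    USER_ID    INTEGER NOT NULL,
    PRIMARY KEY (COMMENT_ID, USER_ID)
);
CREATE TABLE DOWN_VOTE (
    COMMENT_ID INTEGER NOT NULL REFERENCES COMMENT(COMMENT_ID),
    USER_ID    INTEGER NOT NULL,
    PRIMARY KEY (COMMENT_ID, USER_ID)
);
-- Each trigger adjusts the cached score by one vote.
CREATE TRIGGER up_ins AFTER INSERT ON UP_VOTE
BEGIN UPDATE COMMENT SET SCORE = SCORE + 1 WHERE COMMENT_ID = NEW.COMMENT_ID; END;
CREATE TRIGGER up_del AFTER DELETE ON UP_VOTE
BEGIN UPDATE COMMENT SET SCORE = SCORE - 1 WHERE COMMENT_ID = OLD.COMMENT_ID; END;
CREATE TRIGGER down_ins AFTER INSERT ON DOWN_VOTE
BEGIN UPDATE COMMENT SET SCORE = SCORE - 1 WHERE COMMENT_ID = NEW.COMMENT_ID; END;
CREATE TRIGGER down_del AFTER DELETE ON DOWN_VOTE
BEGIN UPDATE COMMENT SET SCORE = SCORE + 1 WHERE COMMENT_ID = OLD.COMMENT_ID; END;
""")

def cached_score(comment_id: int) -> int:
    """Read the trigger-maintained SCORE column."""
    return conn.execute("SELECT SCORE FROM COMMENT WHERE COMMENT_ID = ?",
                        (comment_id,)).fetchone()[0]

def computed_score(comment_id: int) -> int:
    """Recompute the score from the vote tables (the normalized query)."""
    return conn.execute("""
        SELECT COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID)
        FROM COMMENT
        LEFT JOIN UP_VOTE   ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
        LEFT JOIN DOWN_VOTE ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
        WHERE COMMENT.COMMENT_ID = ?
        GROUP BY COMMENT.COMMENT_ID""", (comment_id,)).fetchone()[0]

conn.execute("INSERT INTO COMMENT (COMMENT_ID) VALUES (1)")
conn.executemany("INSERT INTO UP_VOTE VALUES (1, ?)", [(10,), (11,), (12,)])
conn.execute("INSERT INTO DOWN_VOTE VALUES (1, 13)")
conn.execute("DELETE FROM UP_VOTE WHERE COMMENT_ID = 1 AND USER_ID = 12")

print(cached_score(1), computed_score(1))  # 1 1
```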
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above.[1]
[1] This is one of the reasons why I didn't go with just one VOTE table containing an additional field that can be either 1 (for up-vote) or -1 (for down-vote): it's less efficient to cover with secondary indexes.

MySQL attribute with multiple entries in one column?

This may seem strange, but I am wondering if it is possible for a MySQL table to have a column that contains a list of values. For instance, say I have a table that represents a friends list like Facebook's; how can I simulate this in a table? I'm thinking that you could add the usernames into an attribute column, but I'm not sure that is the best idea or even how to do it. Any suggestions on how to achieve this, or an alternative?
Under certain circumstances you could use the SET type, which is similar in functionality to the ENUM type. It allows you to store one or more predefined values in a field. However, for the Facebook friends case it would not be practical, as the column definition would have to be updated each time a new user is created.
Wolfram's suggestion of a mapping table is definitely the better solution, as it also enables the use of foreign key constraints, which ensure referential integrity when a user is deleted (assuming you use cascading). Also, if you ever need to use the relationship in a JOIN, the mapping table is the only practical solution.
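A sketch of the mapping-table approach with cascading foreign keys, using Python's sqlite3 in place of MySQL (table names and data are illustrative). SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; in MySQL, InnoDB tables enforce them. The example also shows the relationship used in an ordinary JOIN.

```python
import sqlite3

# SQLite stands in for MySQL; names and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
CREATE TABLE friendships (
    user_id   INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    friend_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, friend_id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob"), (3, "cy")])
conn.executemany("INSERT INTO friendships VALUES (?, ?)",
                 [(1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2)])

def friend_names(username: str) -> list[str]:
    """The relationship used in an ordinary JOIN."""
    rows = conn.execute("""
        SELECT f.username
        FROM users AS u
        JOIN friendships AS fs ON fs.user_id = u.id
        JOIN users AS f ON f.id = fs.friend_id
        WHERE u.username = ?
        ORDER BY f.username""", (username,))
    return [r[0] for r in rows]

print(friend_names("ann"))  # ['bob', 'cy']
conn.execute("DELETE FROM users WHERE username = 'ann'")  # cascades to friendships
print(friend_names("bob"))  # ['cy']
```

Deleting ann removes every friendships row that mentions her, so no dangling references remain.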
So what you're looking for, in keeping with the Facebook example, is a table with two columns: the first identifies the user, and the second holds a list of that user's friends?
As far as I know, you can't treat a column like an array. You could have a string containing all the individual names separated by dashes, but that would not be robust.
My suggestion would be to have a friends table with two columns, both of which are VARCHARs (strings). Each column contains the name or ID of one person only, and each row denotes a friendship between those two people.
Then, if you wanted a list of foobar's friends, you would just query:
SELECT *
FROM friends
WHERE user_a = 'foobar' OR user_b = 'foobar';
Now, this would actually give you both columns, one of which is foobar and one of which is his/her friend. So you might have to get a little creative as to separating it into just foobar's friends, but I'm sure you can figure out a way that works for your code.
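One way to separate out just foobar's friends is to query each column on its own and UNION the results, so only the other person's name comes back. This is a sketch using Python's sqlite3 in place of MySQL; the table follows the answer's user_a/user_b layout, while the secondary index and sample names are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; the index on user_b and the names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE friends (
    user_a TEXT NOT NULL,
    user_b TEXT NOT NULL,
    PRIMARY KEY (user_a, user_b)
)""")
conn.execute("CREATE INDEX idx_friends_user_b ON friends (user_b)")
conn.executemany("INSERT INTO friends VALUES (?, ?)",
                 [("foobar", "alice"), ("bob", "foobar"), ("alice", "bob")])

def friends_of(name: str) -> list[str]:
    """Names paired with `name` in either column, without `name` itself."""
    rows = conn.execute("""
        SELECT user_b FROM friends WHERE user_a = ?
        UNION
        SELECT user_a FROM friends WHERE user_b = ?
        ORDER BY 1""", (name, name))
    return [r[0] for r in rows]

print(friends_of("foobar"))  # ['alice', 'bob']
```

Each half of the UNION can use an index (the primary key for user_a, the secondary index for user_b), and UNION also removes duplicates if a pair happens to be stored in both directions.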
While this isn't usually a great idea, MySQL does have this:
SELECT FIND_IN_SET('b','a,b,c,d');  -- returns 2, the position of 'b' in the list
SELECT ... FROM friends WHERE FIND_IN_SET('foo',friends_list) > 0;
So you could do what you asked very easily. It's just not typically suggested.