I have read a number of solutions for a MySQL Facebook friendship table and have decided on a fairly simple table with two fields, user_a and user_b. I would then use a query with a UNION to get a list of all of a user's friends (as they could be in user_a or user_b). My question now is: is it better to have an auto-incrementing unique id or a compound id?
table 1)
user_a, user_b
table 2)
unique_id, user_a, user_b
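As a rough sketch of the UNION lookup described in the question, here is "table 1" with a compound key. SQLite (through Python's sqlite3 module) stands in for MySQL, and the table name friendship and the helper friends_of are assumptions made for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friendship (
        user_a INTEGER NOT NULL,
        user_b INTEGER NOT NULL,
        PRIMARY KEY (user_a, user_b)   -- compound key, no surrogate id
    )
""")
conn.executemany("INSERT INTO friendship VALUES (?, ?)", [(1, 2), (3, 1), (2, 3)])

def friends_of(user_id):
    """Return the friends of user_id, whichever column they were stored in."""
    rows = conn.execute("""
        SELECT user_b AS friend FROM friendship WHERE user_a = ?
        UNION
        SELECT user_a AS friend FROM friendship WHERE user_b = ?
        ORDER BY friend
    """, (user_id, user_id)).fetchall()
    return [row[0] for row in rows]

friends_of_1 = friends_of(1)
print(friends_of_1)
```

The same UNION works unchanged with "table 2"; the unique_id column simply does not take part in the lookup.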
My comments:
Either approach for the key is fine. I would prefer a compound key over a surrogate key to save space and avoid an additional index.
You may require a surrogate key, though: some DALs do not work with compound keys.
Update:
You may consider that friendship is a two-way street. Just because UserA has friended UserB does not mean that UserB has friended UserA. If you track both sides, it makes your queries easier. In that case you do:
Friend
-------
UserID
FriendUserID
So, you are only matching on the UserID column to get the list of the user's friends. If two users friend each other, you put two rows in the table. If one user unfriends another, you remove that one row.
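A minimal sketch of this two-row model, again using SQLite via Python as a stand-in for MySQL. The compound primary key and the helper friends_of are assumptions added for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Friend (
        UserID       INTEGER NOT NULL,
        FriendUserID INTEGER NOT NULL,
        PRIMARY KEY (UserID, FriendUserID)
    )
""")
# Users 1 and 2 friend each other (two rows); user 1 also friends user 3.
conn.executemany("INSERT INTO Friend VALUES (?, ?)", [(1, 2), (2, 1), (1, 3)])

def friends_of(user_id):
    """Only the UserID column is matched, so no UNION is needed."""
    rows = conn.execute(
        "SELECT FriendUserID FROM Friend WHERE UserID = ? ORDER BY FriendUserID",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]

before = friends_of(1)

# User 1 unfriends user 2: remove that one row only.
conn.execute("DELETE FROM Friend WHERE UserID = ? AND FriendUserID = ?", (1, 2))

after = friends_of(1)
still_listed_by_2 = friends_of(2)
print(before, after, still_listed_by_2)
```

Note that user 2's row pointing at user 1 survives the unfriend, which is exactly the one-directional behaviour the answer describes.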
While it is true that the compound key solution seems more elegant from a design perspective and less space-consuming at first glance, there are circumstances in which I'd personally go for an auto-incremented numeric id instead.
If the friendship is referenced elsewhere, it will save more space in the long run to have a single numeric ID as a foreign key in the referencing table than a compound ID. Plus, an index on a single id will be (slightly) shorter and faster than a composite index if you often query on the friendship ID.
Related
Wasn't sure how to word the question so my bad if it sounds weird.
I have a table in my database called friendRequests with the following columns: id, sender_id, recipient_id, and status. How can I make sure that no other row has duplicate recipient_id and sender_id values?
So for example, if I had a row in the table with the following values: (1, 4, 6, 0), how can I make sure that no other row has a sender_id of 4 and a recipient_id of 6 AND that no other row has a sender_id of 6 and a recipient_id of 4?
For the same relation, use a unique constraint on <sender, receiver>.
The inverse relation <receiver, sender> will still be possible, however, because to the unique key constraint the ids are in a different order.
To handle this (using a unique key constraint), you have to add another column, let's call it friendship. There you add a unique key constraint and insert the two user ids, concatenated, BUT ORDERED:
I.e. if user 3 sends a friend request to user 10, you insert 3-10 into that column. If the invitation goes from 10 to 3, you insert 3-10 as well.
This way, you can keep track of WHO initiated the friendship (sender=3, receiver=10) but also ensure that there is no backwards invite (friendship=3-10 already exists).
So that's something like
INSERT INTO friendships(sender, receiver, friendship) VALUES(3,10,'3-10');
or vice versa:
INSERT INTO friendships(sender, receiver, friendship) VALUES(10,3,'3-10');
One of the two constraints will prevent the insertion if the friendship has already been requested. (Actually, the second constraint would be sufficient in every case; the first two columns only serve to determine the active and passive parts of the friendship.)
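The ordered-key idea can be tried out with a short sketch. SQLite via Python stands in for MySQL here, and the helpers pair_key and request_friendship are hypothetical names introduced for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE friendships (
        sender     INTEGER NOT NULL,
        receiver   INTEGER NOT NULL,
        friendship TEXT    NOT NULL UNIQUE,   -- ordered pair, e.g. '3-10'
        UNIQUE (sender, receiver)
    )
""")

def pair_key(a, b):
    """Build the ordered key: the smaller id always comes first."""
    low, high = sorted((a, b))
    return f"{low}-{high}"

def request_friendship(sender, receiver):
    """Insert a request; return False if a unique constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO friendships (sender, receiver, friendship) VALUES (?, ?, ?)",
                (sender, receiver, pair_key(sender, receiver)),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = request_friendship(3, 10)     # new pair
reverse = request_friendship(10, 3)   # same pair, opposite direction
row_count = conn.execute("SELECT COUNT(*) FROM friendships").fetchone()[0]
print(first, reverse, row_count)
```

Building the key in one helper keeps the ordering rule in a single place; if several applications write to the table, each of them has to apply the same rule.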
You have to create a unique index on the table.
create unique index sender_recipient
on friendRequests (sender_id, recipient_id);
What I'm hearing is, if user A invites user B, you don't want to create a new record for user B inviting user A.
I don't think it's possible to enforce a constraint like that, except perhaps through the use of triggers, which I would probably not recommend. I would suggest to try and enforce this in your application.
I'm trying to model a simple poll system. I have 4 tables:
Election
id, title, description
Candidate
id, electionId, name
User
id, (other user details)...
Vote
userId, candidateId
There is a 1-n relation from Election to Candidate. If someone runs in multiple elections, they are listed as multiple candidates.
I'm having trouble figuring out how to constrain each user to one vote in each election at the database level. If I create an electionId column in Vote I create inconsistent or redundant data, but I can't think of any other way to constrain the data like that otherwise.
I feel like this has to be a common problem but I don't know what to call it so my last half an hour of searching hasn't been fruitful. What's the correct approach here?
You could change Candidate's PK to be a composite of electionId, name or at least make that combination a unique constraint in Candidate.
Then you would change Vote to be userId, electionId, name where the PK is userId, electionId and there is a FK pointing to Candidate's electionId, name which is now unique.
This means that userId and electionId are unique for the vote table and there is no redundancy left.
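A sketch of this redesign, assuming SQLite (through Python's sqlite3) as a stand-in for MySQL. The sample data and the helper cast_vote are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Election (id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE User (id INTEGER PRIMARY KEY);
    CREATE TABLE Candidate (
        electionId INTEGER NOT NULL REFERENCES Election(id),
        name       TEXT    NOT NULL,
        PRIMARY KEY (electionId, name)
    );
    CREATE TABLE Vote (
        userId     INTEGER NOT NULL REFERENCES User(id),
        electionId INTEGER NOT NULL,
        name       TEXT    NOT NULL,
        PRIMARY KEY (userId, electionId),          -- one vote per user per election
        FOREIGN KEY (electionId, name) REFERENCES Candidate(electionId, name)
    );
    INSERT INTO Election (id, title) VALUES (1, 'Mayor'), (2, 'Council');
    INSERT INTO User (id) VALUES (1), (2);
    INSERT INTO Candidate VALUES (1, 'Alice'), (1, 'Bob'), (2, 'Alice');
""")

def cast_vote(user_id, election_id, name):
    """Return True if the vote was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO Vote VALUES (?, ?, ?)", (user_id, election_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

first_vote = cast_vote(1, 1, "Alice")        # accepted
second_vote = cast_vote(1, 1, "Bob")         # rejected: same user, same election
other_election = cast_vote(1, 2, "Alice")    # accepted: different election
unknown_candidate = cast_vote(2, 2, "Bob")   # rejected: Bob is not running in election 2
print(first_vote, second_vote, other_election, unknown_candidate)
```

Because electionId in Vote is part of the foreign key to Candidate, it cannot disagree with the candidate's election, which is what removes the redundancy concern from the question.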
You can do this with your current schema by adding validation before the insert into Vote (in MySQL this is done with a BEFORE INSERT trigger). You'd select all votes by that particular user, joined with Candidate on candidateId, and make sure none of the electionIds match the electionId of the candidate the vote is for.
This is completely normalized but expensive. Sometimes it's worth adding redundant fields for the sake of performance. I'd add electionId to Vote in this schema so that inserts don't need such an expensive validation.
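The trigger-based validation on the original schema might look roughly like the sketch below. It uses SQLite via Python, whose trigger syntax differs from MySQL's: SQLite rejects the row with RAISE(ABORT, ...), whereas a MySQL trigger would typically use SIGNAL. The trigger name, sample data and the helper cast_vote are assumptions for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Candidate (id INTEGER PRIMARY KEY, electionId INTEGER NOT NULL, name TEXT);
    CREATE TABLE Vote (
        userId      INTEGER NOT NULL,
        candidateId INTEGER NOT NULL REFERENCES Candidate(id),
        PRIMARY KEY (userId, candidateId)
    );
    INSERT INTO Candidate VALUES (1, 1, 'Alice'), (2, 1, 'Bob'), (3, 2, 'Alice');

    -- Reject the insert if this user already voted for any candidate
    -- standing in the same election as the newly chosen candidate.
    CREATE TRIGGER one_vote_per_election BEFORE INSERT ON Vote
    BEGIN
        SELECT RAISE(ABORT, 'user already voted in this election')
        WHERE EXISTS (
            SELECT 1
            FROM Vote v
            JOIN Candidate prev   ON prev.id = v.candidateId
            JOIN Candidate target ON target.id = NEW.candidateId
            WHERE v.userId = NEW.userId
              AND prev.electionId = target.electionId
        );
    END;
""")

def cast_vote(user_id, candidate_id):
    """Return True if the vote was stored, False if the trigger rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO Vote VALUES (?, ?)", (user_id, candidate_id))
        return True
    except sqlite3.DatabaseError:
        return False

first_vote = cast_vote(1, 1)       # Alice, election 1
same_election = cast_vote(1, 2)    # Bob, election 1: rejected
other_election = cast_vote(1, 3)   # Alice, election 2: accepted
print(first_vote, same_election, other_election)
```

Every insert now pays for the join inside the trigger, which is the cost the answer refers to when it suggests adding electionId to Vote instead.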
Let's assume there is a table with these columns:
-personID,
-personName,
-personInterests
There is also another table, which stores the interests:
-interestID
-interestName
One person can have multiple interests, so I put the serialize()-d or JSON representation of the interest array into the interest field. This is not a string like "reading", but rather an index into the interests table, which stores the possible interests. Something like multiple foreign keys in one field.
The best way would be to use foreign keys, but it is not possible to achieve multiple references in one field...
How do I run such a query without REGEX or splitting the field's content in software? If putting indexes into one field is not the way to go, then how is it possible to achieve a structure like this?
Storing multiple indexes, or any references, in one field is strongly discouraged.
You have to create something that I call a "rendezvous" table.
In your case it has:
- ID
- UserID (foreign key)
- InterestID (foreign key)
Every person can have multiple interests, so when a person adds a new interest, you just add a new row to this table, with NOT NULL foreign keys referencing the person and the desired interest.
On large-scale projects with many possible combinations, it is advisable not to give this table an ID column, but rather to make the two foreign keys together the primary key. Duplicates then become impossible, the table's index is smaller, and lookups consume less computing power.
So the best solution is this:
- UserID (foreign key AND primary key)
- InterestID (foreign key AND primary key)
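A sketch of the "rendezvous" table with the two foreign keys as a composite primary key. SQLite via Python stands in for MySQL; the table name PersonInterest, the sample rows and the helper functions are assumptions for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Person   (personID   INTEGER PRIMARY KEY, personName   TEXT NOT NULL);
    CREATE TABLE Interest (interestID INTEGER PRIMARY KEY, interestName TEXT NOT NULL);
    CREATE TABLE PersonInterest (
        UserID     INTEGER NOT NULL REFERENCES Person(personID),
        InterestID INTEGER NOT NULL REFERENCES Interest(interestID),
        PRIMARY KEY (UserID, InterestID)   -- no separate ID column
    );
    INSERT INTO Person   VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Interest VALUES (1, 'reading'), (2, 'hiking');
    INSERT INTO PersonInterest VALUES (1, 1), (1, 2), (2, 1);
""")

def people_interested_in(interest_name):
    """A plain join replaces any REGEX or string splitting."""
    rows = conn.execute("""
        SELECT p.personName
        FROM Person p
        JOIN PersonInterest pi ON pi.UserID = p.personID
        JOIN Interest i        ON i.interestID = pi.InterestID
        WHERE i.interestName = ?
        ORDER BY p.personName
    """, (interest_name,)).fetchall()
    return [row[0] for row in rows]

def add_interest(user_id, interest_id):
    """Return False when the pair already exists or references a missing row."""
    try:
        with conn:
            conn.execute("INSERT INTO PersonInterest VALUES (?, ?)", (user_id, interest_id))
        return True
    except sqlite3.IntegrityError:
        return False

readers = people_interested_in("reading")
hikers = people_interested_in("hiking")
duplicate = add_interest(1, 1)   # Ann already has 'reading'
print(readers, hikers, duplicate)
```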
I believe the only way you can implement this is to create a third table, which will actually get updated by a trigger (Similar to what Gabor Dani advised)
Table1
-personID,
-personName,
-personInterests
Table2
-interestID
-interestName
Table3
-personInterestID (AutoIncrement Field)
-personID
-interestID
Then you need to write a trigger to do this. A stored procedure may be needed, because you will have to loop through all the values in the field.
1 database with 3 tables: user - photo - vote
- A user can have many photos.
- A photo can have many votes.
- A user can vote on many photos.
- A vote records:
. the result as an int (-1/disliked, 0/neutral, 1/liked)
. the id of the user who voted.
Here is what I have (all FKs are cascade on delete and update):
http://grab.by/iZYE (sid = surrogate id)
My question is: this doesn't seem right, and I have been looking at it for 2 days already and can't confidently move on. How can I optimize this, or am I completely wrong?
MySQL/InnoDB tables are always clustered (more on clustering here and here).
Since the primary key also acts as the clustering key [1], using a surrogate primary key means you are physically sorting the table in an order that has no useful meaning for the client applications and cannot be utilized for querying.
Furthermore, secondary indexes in clustered tables can be "fatter" than in heap-based tables and may require a double lookup.
For these reasons, you'd want to avoid surrogates and use more "natural" keys, similar to this:
({USER_ID, PICTURE_NO} in table VOTE references the same-named fields in PICTURE. The VOTE.VOTER_ID references USER.USER_ID. Use integers for *_ID and *_NO fields if you can.)
This physical model will enable extremely efficient querying for:
Pictures of the given user (a simple range scan on PICTURE primary/clustering index).
Votes on the given picture (a simple range scan on VOTE primary/clustering index). Depending on circumstances, this may actually be fast enough so you don't have to cache the sum in PICTURE.
If you need votes of the given user, change the VOTE PK to: {VOTER_ID, USER_ID, PICTURE_NO}. If you need both (votes of picture and votes of user), keep the existing PK, but create a covering index on {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE}.
[1] In InnoDB. There are DBMSes (such as MS SQL Server) where the clustering key can differ from the primary key.
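The natural-key model above could be sketched as follows. The original diagram is not available, so the column layout is reconstructed from the description and should be read as an assumption, in particular the VOTE primary key {USER_ID, PICTURE_NO, VOTER_ID}. SQLite via Python stands in for MySQL, with WITHOUT ROWID tables playing the role of InnoDB's clustered primary key; the index name and helper functions are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE USER (USER_ID INTEGER PRIMARY KEY);
    CREATE TABLE PICTURE (
        USER_ID    INTEGER NOT NULL REFERENCES USER(USER_ID),
        PICTURE_NO INTEGER NOT NULL,
        PRIMARY KEY (USER_ID, PICTURE_NO)
    ) WITHOUT ROWID;
    CREATE TABLE VOTE (
        USER_ID    INTEGER NOT NULL,
        PICTURE_NO INTEGER NOT NULL,
        VOTER_ID   INTEGER NOT NULL REFERENCES USER(USER_ID),
        VOTE_VALUE INTEGER NOT NULL CHECK (VOTE_VALUE IN (-1, 0, 1)),
        PRIMARY KEY (USER_ID, PICTURE_NO, VOTER_ID),
        FOREIGN KEY (USER_ID, PICTURE_NO) REFERENCES PICTURE(USER_ID, PICTURE_NO)
    ) WITHOUT ROWID;
    -- Covering index for "votes of the given user"
    CREATE INDEX VOTE_BY_VOTER ON VOTE (VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE);

    INSERT INTO USER VALUES (1), (2), (3);
    INSERT INTO PICTURE VALUES (1, 1), (1, 2);
    INSERT INTO VOTE VALUES (1, 1, 2, 1), (1, 1, 3, 1), (1, 2, 2, -1);
""")

def picture_score(user_id, picture_no):
    """Sum the votes on one picture by searching on the VOTE primary key prefix."""
    row = conn.execute(
        "SELECT COALESCE(SUM(VOTE_VALUE), 0) FROM VOTE WHERE USER_ID = ? AND PICTURE_NO = ?",
        (user_id, picture_no),
    ).fetchone()
    return row[0]

def votes_by(voter_id):
    """List (USER_ID, PICTURE_NO, VOTE_VALUE) for everything one voter voted on."""
    return conn.execute(
        "SELECT USER_ID, PICTURE_NO, VOTE_VALUE FROM VOTE "
        "WHERE VOTER_ID = ? ORDER BY USER_ID, PICTURE_NO",
        (voter_id,),
    ).fetchall()

score_first = picture_score(1, 1)
score_second = picture_score(1, 2)
votes_of_user_2 = votes_by(2)
print(score_first, score_second, votes_of_user_2)
```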
The first thing I see is that you have duplicate unique IDs on the tables. You don't need the sid columns; just use user_id, photo_id, and photo_user_id (maybe rename this one to vote_id). Those ID columns should also be INT type, definitely not VARCHARs. You probably don't need the vote total columns on photo; you can just run a query to get the total when you need it and not worry about keeping both tables in sync.
Assuming that you will only allow one vote per user on each photo, the structure of the vote table can be modified so the only columns are user_id, photo_id, and vote_result. You would then make the primary key a composite index on (user_id, photo_id). However, since you're using foreign keys, that makes this table a bit more complicated.
I'm designing a db table that will save a list of user's favorited food items.
I created favorite table with the following schema
id, user_id, food_id
user_id and food_id will be foreign key linking to another table.
I'm just wondering if this is efficient and scalable, because if a user has multiple favorite things then it would need multiple rows of data.
i.e. if a user has 5 favorited food items, then it will take five rows to save the list for that user.
Is this efficient and scalable? What's the best way to optimize this schema?
thnx in advance!!!
tldr; This is called a "join table" and is the correct and scalable approach to model M-M relationships in a relational database. (Depending upon the constraints used it can also model 1-M/1-1 relationships in a "no NULL FK" schema.)
However, I contend that the id column should be omitted here so that the table is only user_id, food_id. The PK will be (user_id, food_id) in this case.
Unlike other tables, where surrogate (aka auto-increment) PKs are sometimes argued for, a surrogate PK generally only adds clutter in a join table as it has a very natural compound PK.
While the PK itself is compound in this case, each "joined" table only relates back by part of the PK. Depending upon queries performed it might also be beneficial to add covering indices on food_id or (food_id, user_id).
Eliminate Surrogate Key: Unless you have a specific reason for the surrogate key id, exclude it from the table.
Fine-tune Indexing: At this point, you just have a composite primary key that is the combination of the two foreign keys. In which order should the PK fields be?
If your application(s) predominantly execute queries such as: "for given user, give me foods", then PK should be {user_id, food_id}.
If the predominant query is "for given food, give me users", then the PK should be {food_id, user_id}.
If both query "directions" are common, add a UNIQUE INDEX that has the same fields as the PK, but in the opposite order. So you'll have a PK on {user_id, food_id} and an index on {food_id, user_id}.
Note that InnoDB tables are clustered, which eliminates (in this case "unnecessary") table heap. Yet, the secondary index discussed above will not cause a double-lookup (since it fully covers the query), nor have a hidden overhead of PK fields (since it indexes the same fields as PK, just in opposite order).
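A small sketch of the PK plus opposite-order index arrangement. SQLite via Python stands in for MySQL/InnoDB, so its storage details differ from the clustering described above. The table name user_food, the index name food_user and the helpers are assumptions, and the foreign keys to the user and food tables are left out to keep the example self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_food (
        user_id INTEGER NOT NULL,
        food_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, food_id)    -- serves "for given user, give me foods"
    );
    -- Same fields in the opposite order: serves "for given food, give me users"
    CREATE UNIQUE INDEX food_user ON user_food (food_id, user_id);
    INSERT INTO user_food VALUES (1, 10), (1, 11), (2, 10);
""")

def foods_of(user_id):
    rows = conn.execute(
        "SELECT food_id FROM user_food WHERE user_id = ? ORDER BY food_id", (user_id,)
    ).fetchall()
    return [row[0] for row in rows]

def users_of(food_id):
    rows = conn.execute(
        "SELECT user_id FROM user_food WHERE food_id = ? ORDER BY user_id", (food_id,)
    ).fetchall()
    return [row[0] for row in rows]

# Ask SQLite which index it would use for the "users of a food" direction.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT user_id FROM user_food WHERE food_id = ?", (10,)
).fetchall()
plan_details = [row[3] for row in plan]

print(foods_of(1), users_of(10))
print(plan_details)
```

On MySQL the equivalent check would be an EXPLAIN on the same query to confirm that the secondary index is chosen.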
For more on designing a junction table, take a look at this post.
In my opinion, you can optimize your table in the following ways:
As a relation table with 2 foreign keys, it doesn't need the "id" field.
Use the InnoDB engine for your table.
Name your relation table "user_2_food", which will make its purpose clearer.
Use the smallest data type that fits, i.e. "smallint" is better than "int", and don't forget the "UNSIGNED" attribute.
Creating the below three Tables will result in an efficient design.
users : userId, username, userdesc
foods : foodId, foodname, fooddesc
userfoodmapping : ufid, userid, foodid, rowstate
The significance of rowstate is, if the user in future doesn't like that food, its state will become -1
You have 2 options in my opinion:
Get rid of the ID field, but in that case, make both your other keys (combined) your primary key
Keep your ID key as the primary key for your table.
In either case, I think this is a proper approach. Once you get into a problem of inefficiency, then you will look at probably how to load part of the table or any other technique. This would do for now.