I need an expert mysql opinion for doing a friends system - mysql

I need some help designing a friends system
The mysql table:
friends_ list
- auto_ id
- friend_ id
- user_ id
- approved_ status
Option 1 = Everytime a user adds a user there is 2 entries added to the DB, we then can get there friends like this
SELECT user_id FROM `friends_list` WHERE friend_id='$userId' and approved_status='yes'
Option 2 = We add 1 entry for every friend added and then get the friend list like this
SELECT friend_id AS FriendId FROM `friends_list` WHERE user_id='$userId' and approved_status='yes'
UNION
SELECT user_id as FriendId FROM `friends_list` WHERE friend_id='$userId' and approved_status='yes'
Of the 2 methods above for having friends on a site like myspace, facebook, all the other sites do, which of the 2 methods above would be best performance?
The 1st method doubles the ammount of rows, example a site with 2 million friend rows would only be 1 million rows the second method.
However does the union method mean there is 2 queries being made, so 2 queries on a million row table instead of 1?
UPDATE
I just ran some test on 60,000 friends and here are the results, and yes the tables are indexed.
Option 1 with 2 entries per friend;
.0007 seconds
Option 2 with 1 entry per friend using UNION to select
.3100 seconds

Option 1.
Do as much work as possible on adding a friend. Adding someone is very rare compare to selecting all friends. Something probably done every time you render a page (on a social site).

If you are indexing on user_id and friend_id, then the two statements should be equivalent in time -- best to test though -- DB's can have surprising results. The UNION is seen by the database and it can use it to optimize the query, though.
Is friendship always mutual and with the same approval status? If so, I'd opt for the second one. If it can be one-way or with separate approvals, then you need the first one, right?

Which dbms are you using? MySQL always use an extra temporary table when executing union queries. Creating these temporary tables creates some overhead to the query and is probably why your union query is slower than the first query.
Have you thought about separating friends and friend request? This would reduce the content of the friends table, you would also be able to delete accepted requests from the friend request table and keep the size of the table down. Another good advantage is that you keep less columns in each table and it is thereby easier to get a fine tuned index on them.
I am currently building this feature myself, and it would be interesting to hear about your experience on the matter.

Answers to this kind of question tend to depend upong the usage patterns. In your case which is most common: adding new friendship relationships or querying the friendship realtionship?
Also you need to consider future flecibility, what otehr uses might there be for the data.
My guess is that option is a cleaner data model for what you are trying to represent. It looks like it allows for expandsion in directions such as asymmetric relationships, and friend of friends. I think that option will prove to be unweildy in the future.

Related

Database design for a chat system

I know there is a lot of posts out there discussing Db design for a chat system, but they didn't explain anything about the scalability of that design, so here my question.
I want to design a Db of a real-time chat between 2 or more users, let's take 2 users first, here what I came up with.
Table 1:
name: User
fields: id, name
Table 2
name: Chat Room
fields: id, user1, user2
Table 3:
name: Message
fields: Chat_room_id, user_id, message
Now considering Facebook in mind, it has around 2 billion active users per month and let say 1 billion of them indulge in chatting and each user sends 100 messages.
which make 100 Billion entries in table: Message, so the question is,
"Will Mysql or Postgres be able to handle this much of entries and show particular chat room messages in real-time ?" if not then what should be the best practice to follow that, I know that it also depends on the server on which RDBMS is installed but still want to know the optimum architecture.
PS: I am using Django as backend and AngularJs for asynchronous behavior
100 Billions rows in one table will never work online. Not only all possible partitioning ways are applied to reduce the sizes, but also separation of active/passive data strategies. But nevertheless all the high maters, the answer:
Postgres is indeed effective working with big data itself.
and yet:
Postgres has not effective enough strategy to fight poor design
Look at your example: table chat_room lists two users in separate columns - what for? You have user_id in messages referencing users.id. And you have chat_room.id in it, so you have data which users were in that chat_room. Now if your idea was to pre-aggregate which users participated in chat_room over time or at all, make it one array column, like (chat_room.id int, users_id bigint[]) or if you want join time and leave time, add corresponding attributes. active/passive data can be implemented using archived chat_rooms in different relation then active ones. Btw aggregation on who participated in that chatroom can be performed on such archiving...
Above is not instructions for action, just expression. There is no best practice for database schema. First make a clear plan what your chat will do, then make db schema, try it, improve, try, improve, try, improve and so on, until everything works. If you have concerns on how it will work with 100 billions of rows - fill it up and check...

Modelling ownership in MySQL

I have a table Things and I want to add ownership relations to a table Users. I need to be able to quickly query the owners of a thing and the things a user owns. If I know that there will be at most 50 owners, and the pdf for the number of owners will probably look like this, should I rather
add 50 columns to the Things table, like CoOwner1Id, CoOwner2Id, …, CoOwner50Id, or
should I model this with a Ownerships table which has UserId and ThingId columns, or
would it better to create a table for each thing, for example Thing8321Owners with a row for each owner, or
perhaps a combination of these?
The second choice is the correct one; you should create an intermediate table between the table Things and the table Owners (that contains the details of each owner).
This table should have the thing_id and the owner_id as the primary key.
So finally, you well have 3 tables:
Things (the things details and data)
Owner (the owners details and data)
Ownerships (the assignment of each thing_id to an owner_id)
Because in a relational DB you should not have any redundant data.
You should definitely go with option 2 because what you are trying to model is a many to many relationship. (Many owners can relate to a thing. Many things can relate to an owner.) This is commonly accomplished using what I call a bridging table. (Which exactly what option 2 is.) It is a standard technique in a normalized database.
The other two options are going to give you nightmares trying to query or maintain.
With option 1 you'll need to join the User table to the Thing table on 50 columns to get all of your results. And what happens when you have a really popular thing that 51 people want to own?
Option 3 is even worse. The only way to easily query the data is to use dynamic sql or write a new query each time because you don't know which Thing*Owners table to join on until you know the ID value of the thing you're looking for. Or you're going to need to join the User table to every single Thing*Owners table. Adding a new thing means creating a whole new table. But at least a thing doesn't have a limit on the number of owners it could possibly have.
Now isn't this:
SELECT Users.Name, Things.Name
FROM Users
INNER JOIN Ownership ON Users.UserId=Ownership.UserId
INNER JOIN Things ON Things.ThingId=Ownership.ThingId
much easier than any of those other scenarios?

MySQL Performance: one query using left join vs. multiple queries

I would like to achieve something like you see on Facebook:
- Posting status
- Comment status
- Like status (like for comments not implemented yet)
My tables structure is like this :
Posts Users Comments Likes
------- ------- -------- -------
ID ID ID ID
UserID Username PostID PostID
Content UserID UserID
Date Content
Date
So at this time when someone access to the main page the system is going to show the 10 lasts posts. My query uses LEFT JOIN on theses tables.
If for example there is 10 posts without any comments and any likes the query will return 10 records.
But for each comment or likes my query will return a new record (row) with some NULL value in the corresponding column.
At the end by simply wanting to retrieve 10 posts my query will return at least 50 rows (if each post has some comments and likes).
I was wondering if that will cause problem in the future. And I was wondering if I should better use multiple queries and parse all the results into an array like:
1. Select the 10 last posts
2. Save the IDs into array and all data into global array
3. Parse the array and make a prepared query for the comments something like:
SELECT * FROM COMMENTS WHERE PostID IN (1, 2, 3, 4, 5, 6,...)
4. Save the result into global array
5. Repeat again for the like table
I hope my explanation was clear enough :) Thank you
Doing one 50 row query reduces the overhead when communicating with the server, on the other hand it adds processing after the rows are retrieved.
It really depends on the overall solution.
However, unless the application is performance critical with the server being the bottleneck, i would go with 10 result sets - one per row, probably using some class/widget/object to display the post on the page.
I'm not an expert, if I understand correctly your option are:
A) the single mega query that will return a lot of NULL's and repeated values.
[Note: By "all" I mean, all you are interested in]
B) Three queries: One for all posts, one for all comments, and one for all likes (all joined with the users table), and then you can process them into objects or structs or dictionaries with whatever language you are using to query the database.
I would go with the second because It is easier, and the increase in order of magnitude seems benign, and probably even more flexible design wise.
What I would prefer NOT to do is one query per post. That would probably become a problem sooner than later. At least much sooner than A or B.

MySQL table design, one row or more pr user?

Using MySQL I have table of users, a table of matches (Updated with the actual result) and a table called users_picks (at first it's always going to be 10 football matches pr. gameweek pr. league because there's only one league as of now, but more leagues will come along eventually, and some of them only have 8 matches pr. gameweek).
In the users_picks table should i store each 'pick' (by pick I mean both 'hometeam score' and 'awayteam score') in a different row, or have all 10 picks in one single row? Both with a FK for user and gameweek. All picks in one row would mean I had columns with appended numbers like this:
Option 1: [pick_id, user_id, league_id, gameweek_id, match1_hometeam_score, match1_awayteam_score, match2_hometeam_score, match2_awayteam_score ... etc]
and that option doesn't quite fill me with joy, and looks a bit stupid. Especially since there's going to be lots of potential NULLs in the db. The second option would mean eventually millions of rows. But would look like this:
Option 2: [pick_id, user_id, league_id, gameweek_id, match_id, hometeam_score, awayteam_score]
What's the best practice? And would it be a PITA to do all sorts of statistics using the second option? eg. Calculating how many matches a user has hit correctly in a specific round, how many alltime correct hits etc.
If I'm not making much sense, I'll try to elaborate anything. I just wan't my table design to be good from the start, so I won't have a huge headache in a couple of months.
Thanks in advance.
The second choice is much better than the first. This is called database normalisation and makes querying easier, not harder. I would suggest reading the linked article, and the related descriptions of the various "normal forms", and aiming for a 3rd Normal Form data structure as a minimum.
To see the flaw in your first option, imagine if there were to be included later a new league with 11 matches. Or 400.
You should read up about database normalization.
When you have a 1:n relation, like in your case one team having many matches, you would create two tables. One table "teams" and a second table "matches" where each row includes the ID of the team which played the match.
In the same manner you should also have separate tables for users, picks and leagues.
Option two is better, provided you INDEX your table properly, since (as you indicate) it will grow quite large. The pick_id is the primary key, but also create an INDEX on the user_id field, as likely the most common query will be
SELECT * FROM `users_pics` WHERE `user_id`=?;
to get all the picks for a given user.

How to properly design a simple favorites and blocked table?

i am currently writing a webapp in rails where users can mark items as favorites and also block them. I came up two ways and wondered which one is more common/better way.
1. Separate join tables
Would it be wise to have 2 tables for this? Like:
users_favorites
- user_id
- item_id
users_blocked
- user_id
- item_id
2. single table
users_marks (or so)
- users_id
- item_id
- type (["fav", "blk"])
Both ways seem to have advantages. Which one would you use and why?
The second one has at least the advantage (if the primary key is users_id + item_id) to make sure that no user will have an item both as favorited and blocked.
I suppose I would got with that second solution -- especially considering the two tables, in the first solution, would have the same structure, which seems strange ; and it also allows you to have all the information in the same place, which might help, in some cases (reporting, for instance ? ).
I would go with #2.
It leaves all the appropriate data in a single table.
Otherwise you might have to resort to a union or distinct joins to get a full list of details.
It's just a different status of an item, so #2 will do the job. What would you do if it would be colors? Two different tables? I don't think so ;)
Edit: You might want the status in a different table and link it with a foreign key, but that's up to you. It depend on how many different status you expect to have. Just these two or many others as well?