Relational Database Design for Friends List / Buddy List Speed - mysql

I am quickly putting together a buddy / friends list where a user will have a list of buddies. I will be using a relational database for this and found the following post:
Buddy List: Relational Database Table Design
So the buddy table might look something like this:
buddy_id username
1 George
2 Henry
3 Jody
4 Cara
And the table for user's buddy lists would look something like this:
user_id buddy_id
2 4
1 4
1 3
My question is how fast would it be if a user had 20,000+ buddies and wanted to pull their entire list in under a second or so. I would be running this on a pretty typical MySQL setup. Would there be any key optimizations or DB configurations to get this fast?

What does "pull their entire list" mean to you?
I can select 20,000 rows from a large "buddy" table (few million rows) in 15 milliseconds on my computer, but that doesn't include network transit time (both directions), formatting, and displaying on a web page. (Which I presume is the point--a web application.)
You'll need an index that covers user_id, but creating a primary key on (user_id, buddy_id) should do that.
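For illustration, a minimal version of that schema might look like this (a sketch assuming MySQL/InnoDB; the names come from the tables above, the column types are my own guesses):
CREATE TABLE buddy (
  buddy_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(64) NOT NULL
) ENGINE=InnoDB;
CREATE TABLE buddy_list (
  user_id INT UNSIGNED NOT NULL,
  buddy_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, buddy_id),  -- the composite key also covers lookups by user_id
  FOREIGN KEY (buddy_id) REFERENCES buddy (buddy_id)
) ENGINE=InnoDB;
-- Pull one user's entire list:
SELECT b.buddy_id, b.username
FROM buddy_list bl
JOIN buddy b ON b.buddy_id = bl.buddy_id
WHERE bl.user_id = 1;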
Scripting languages are useful for generating test data. I'm using ruby today.

Related

Database design for a chat system

I know there are a lot of posts out there discussing DB design for a chat system, but they didn't explain anything about the scalability of that design, so here is my question.
I want to design a DB for a real-time chat between 2 or more users. Let's take 2 users first; here is what I came up with.
Table 1:
name: User
fields: id, name
Table 2:
name: Chat Room
fields: id, user1, user2
Table 3:
name: Message
fields: Chat_room_id, user_id, message
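For concreteness, that schema might look roughly like this in SQL (the plural table names and column types are just placeholders):
CREATE TABLE users (
  id BIGINT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);
CREATE TABLE chat_rooms (
  id BIGINT PRIMARY KEY,
  user1 BIGINT NOT NULL,
  user2 BIGINT NOT NULL,
  FOREIGN KEY (user1) REFERENCES users (id),
  FOREIGN KEY (user2) REFERENCES users (id)
);
CREATE TABLE messages (
  chat_room_id BIGINT NOT NULL,
  user_id BIGINT NOT NULL,
  message TEXT NOT NULL,
  FOREIGN KEY (chat_room_id) REFERENCES chat_rooms (id),
  FOREIGN KEY (user_id) REFERENCES users (id)
);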
Now, keeping Facebook in mind: it has around 2 billion active users per month, and let's say 1 billion of them engage in chatting and each user sends 100 messages.
That makes 100 billion entries in the Message table, so the question is:
"Will MySQL or Postgres be able to handle this many entries and show a particular chat room's messages in real time?" If not, what would be the best practice to follow? I know it also depends on the server the RDBMS is installed on, but I still want to know the optimum architecture.
PS: I am using Django as the backend and AngularJS for the asynchronous behavior.
100 billion rows in one table will never work well online. Not only are all possible partitioning schemes applied to reduce table sizes, but also strategies for separating active from passive data. But, all those high-level matters aside, the answer:
Postgres is indeed effective at working with big data on its own.
and yet:
Postgres has no strategy effective enough to compensate for poor design
Look at your example: the chat_room table lists two users in separate columns. What for? You have user_id in messages referencing users.id, and you have chat_room.id in it, so you already have the data on which users were in that chat_room. Now, if your idea was to pre-aggregate which users participated in a chat_room over time (or at all), make it a single array column, like (chat_room.id int, users_id bigint[]); if you also want join time and leave time, add the corresponding attributes. Active/passive data can be implemented by keeping archived chat_rooms in a different relation than the active ones. By the way, the aggregation of who participated in a chat room can be performed during such archiving...
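A rough sketch of that idea in Postgres (assuming a messages table shaped like the one in the question; the room id 42 and everything else here is purely illustrative):
-- active rooms stay lean; participants are derived from messages
CREATE TABLE chat_rooms (
  id BIGINT PRIMARY KEY
);
-- passive/archived rooms carry the pre-aggregated participant list
CREATE TABLE chat_rooms_archive (
  id BIGINT PRIMARY KEY,
  user_ids BIGINT[] NOT NULL
);
-- archiving room 42: aggregate who actually posted, then move the row
INSERT INTO chat_rooms_archive (id, user_ids)
SELECT m.chat_room_id, array_agg(DISTINCT m.user_id)
FROM messages m
WHERE m.chat_room_id = 42
GROUP BY m.chat_room_id;
DELETE FROM chat_rooms WHERE id = 42;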
The above is not a set of instructions, just an illustration. There is no universal best practice for a database schema. First make a clear plan of what your chat will do, then make the DB schema, try it, improve, try, improve, and so on, until everything works. If you have concerns about how it will behave with 100 billion rows, fill it up and check...

Storing list of users associated with a certain item

This is a bit hard to explain.
I have built an app where users create what I like to call 'raffles' and then other users subscribe to them.
I have a table for the raffles, and I could add a column of type text to it and store all the subscribed users in it, separated by commas,
or I could create a separate table where users are added and associated with the raffle via a field called 'raffle_id' or something like it.
I'm not sure how efficient either of these methods will be in the long run or for scaling.
Some advice would be appreciated.
I would recommend against storing your user information in CSV format. The main reason for this is that CSV will make querying the table by user difficult. It will also make doing updates difficult. SQL databases were designed to handle relational data using tables. So in your case I would design the raffles table to look like this:
raffles (raffle_id, user_id)
And the data might look like this:
1 1
1 3
1 7
2 1
2 2
2 3
2 6
In other words, each record corresponds to a single raffle-user relation. Assuming that you only have a few dozen users and raffles happen every so often, this should scale fine. And if this raffles table ever gets too large at a much later date, you can archive a portion of it.
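A sketch of that association table in SQL (names follow the answer above; the secondary index on user_id helps the reverse lookup, and the ids in the queries are the example values):
CREATE TABLE raffles (
  raffle_id INT NOT NULL,
  user_id INT NOT NULL,
  PRIMARY KEY (raffle_id, user_id),
  KEY (user_id)
);
-- all users subscribed to raffle 2:
SELECT user_id FROM raffles WHERE raffle_id = 2;
-- all raffles user 1 has subscribed to:
SELECT raffle_id FROM raffles WHERE user_id = 1;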
See "What is the best way to add users to multiple groups in a database?"
Raffles are the "Groups". "UserInGroup" becomes UserInRaffle, your join table.

Ideas for stock management using mysql

I am creating a database for a publishing company. The company has around 1300 books and around 6-7 offices. Now I have created a table that displays the stock items in all locations. The table should look like the following to the user:
Book Name Location1 Location2 Location3 ......
History 20000 3000 4354
Computers 4000 688 344
Maths 3046 300 0
...
I already have a Books table which stores all the details of the books; I also have an office table which has the office information. Now if I create a stock management table which shows the information like above, I will end up with a huge table with a lot of repetition if I store my data in the following way:
Column1- Book_ID Column2- Location_ID Column3- Quantity
1 1 20000
1 2 3000
1 3 4354
2 1 4000
2 2 688
...
So, I think this isn't the best way to store data, as it would end up with 1300 (Books) x 7 (Locations) = 9100 rows. Is there a better way of storing the data? I could have 7 additional columns in the Books table instead, but if I create a new location, I will have to add another column to the Books table.
I would appreciate any advice, and your view on whether the above method is suitable or not.
Nope, that's the best way to do it.
What you have is a Many-to-Many relationship between Books and Locations. This is, in almost all cases, stored in the database as an "associative" table between the two main entities. In your case, you also have additional information about that association, namely its "stock" or "quantity" (or, if you think about it like a graph, the magnitude of the connection, or edge weight).
So, it might seem like you have a lot of "duplication", but you don't really. If you were to try to do it any other way, it would be much less flexible. For example, with the design you have now, it doesn't require any database schema change to add another thousand different books or another 20 locations.
If you were to try to put the book quantities inside the Locations table, or the Locations inside the Books table, it would require you to change the layout of the database, and then re-test any code that might use it.
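As a sketch, that associative table could be declared like this (assuming MySQL/InnoDB; the Books and Offices table and key column names in the foreign keys are assumptions based on the question):
CREATE TABLE stock_management (
  Book_ID INT NOT NULL,
  Location_ID INT NOT NULL,
  Quantity INT NOT NULL DEFAULT 0,
  PRIMARY KEY (Book_ID, Location_ID),
  FOREIGN KEY (Book_ID) REFERENCES Books (Book_ID),
  FOREIGN KEY (Location_ID) REFERENCES Offices (Location_ID)
) ENGINE=InnoDB;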
That's the most common (and effective) solution. Most frameworks like Django, Modx and several others implement many-to-many relations via an intermediate table only, using foreign key relations.
Make sure you index your table properly.
ALTER TABLE stock_management ADD INDEX (Book_ID), ADD INDEX (Location_ID);
That really is the best way to do it; you have 9100 independent values to store, so you really do need 9100 rows (fewer, really; the rows where the quantity is 0 can be omitted). Any other way of arranging the data would require the structure of the table to change whenever a location was added.
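If the per-location grid from the question is still wanted for display, it can be produced from the normalized table at query time rather than stored; here is a sketch using conditional aggregation (the Book_Name column and table names are assumptions based on the question, with one SUM/CASE per location):
SELECT b.Book_Name,
       SUM(CASE WHEN s.Location_ID = 1 THEN s.Quantity ELSE 0 END) AS Location1,
       SUM(CASE WHEN s.Location_ID = 2 THEN s.Quantity ELSE 0 END) AS Location2,
       SUM(CASE WHEN s.Location_ID = 3 THEN s.Quantity ELSE 0 END) AS Location3
FROM Books b
LEFT JOIN stock_management s ON s.Book_ID = b.Book_ID
GROUP BY b.Book_ID, b.Book_Name;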

differences between graphs and MYSQL database in facebook?

I can't understand where Facebook really uses MySQL:
The whole database can be seen as a graph:
Account - Like -> Comment
Account <- friend -> Account2
Account - Like -> Link
And what is stored in MySQL?
The text of the posts and notes?
Does Facebook have all these entities (account, post, comment) in its graph DB?
Well, I assume that everything you mentioned is stored in MySQL. Every piece of data that is subject to change, including:
Users
Posts
Comments
Information about uploaded pictures (but not pictures themselves)
Likes
Data about users logging in
Ads
Data about users liking / not liking ads
User settings
etc.
Any data that is subject to change needs to be saved in a database for indexing and fast access. A filesystem is fine if you only ever write data, for example for logging, or if you only need to access the whole data set at once, not parts of it.
But if you need data to be structured and ready to be accessed quickly, then you need to use a database. You may want to read about binary trees: http://en.wikipedia.org/wiki/Binary_tree
About Facebook: if I had to guess, I would say that there are probably hundreds more databases. I don't have access to their servers, so I can't really comment on that :) But as another example, if you install WordPress, it creates 11 different tables. http://codex.wordpress.org/Database_Description
PS. There is no reason Facebook should use MySQL, though. There are a lot of different databases out there.
EDIT: Thanks for pointing out that I misunderstood your question.
Let's take this case: Account <- friend -> Account2
As said before, they have a table like "Users".
The Users table will have columns:
ID (this is the PRIMARY KEY, meant to give a unique ID to each row)
Username (Text field with some length, for example 64 characters)
...And many more...
Now there will be a table "Friends". It will have fields:
ID (again, PRIMARY KEY)
Person1
Person2
Both fields, Person1 and Person2, will be integers pointing to ID in the "Users" table.
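As a sketch in SQL (assuming MySQL; the column lengths and types are illustrative):
CREATE TABLE Users (
  ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  Username VARCHAR(64) NOT NULL
);
CREATE TABLE Friends (
  ID INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  Person1 INT NOT NULL,
  Person2 INT NOT NULL,
  FOREIGN KEY (Person1) REFERENCES Users (ID),
  FOREIGN KEY (Person2) REFERENCES Users (ID)
);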
So if table users has three rows:
ID Username
1 rodi
2 rauni
3 superman
Then table "Friends" would be for example:
ID Person1 Person2
1 1 2
2 2 1
3 1 3
4 3 1
Here row 1 means "rodi is friends with rauni" and row 2 means "rauni is friends with rodi". This is redundant, but I wanted to keep the example simple.
Here is good tutorial: http://www.tizag.com/mysqlTutorial/mysqltables.php
There are many pages there; just keep clicking Next to skip what you already know (I don't know how much you already know).
This is about joining info from two tables: http://www.tizag.com/mysqlTutorial/mysqljoins.php
You could use this to select all rodi's friends from our two tables in one query.
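For example, such a query against the two tables above might look like this (a sketch; it relies on the redundant Person1/Person2 rows shown in the example data):
-- all of rodi's friends, by name
SELECT u2.Username
FROM Users u1
JOIN Friends f ON f.Person1 = u1.ID
JOIN Users u2 ON u2.ID = f.Person2
WHERE u1.Username = 'rodi';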

I need an expert mysql opinion for doing a friends system

I need some help designing a friends system
The mysql table:
friends_list
- auto_id
- friend_id
- user_id
- approved_status
Option 1 = Every time a user adds a friend, 2 entries are added to the DB; we can then get their friends like this:
SELECT user_id FROM `friends_list` WHERE friend_id='$userId' and approved_status='yes'
Option 2 = We add 1 entry for every friend added and then get the friend list like this:
SELECT friend_id AS FriendId FROM `friends_list` WHERE user_id='$userId' and approved_status='yes'
UNION
SELECT user_id as FriendId FROM `friends_list` WHERE friend_id='$userId' and approved_status='yes'
Of the 2 methods above for handling friends on a site like MySpace, Facebook, and all the other sites, which would give the best performance?
The 1st method doubles the number of rows; for example, a site with 2 million friend rows under the first method would only have 1 million rows under the second.
However, does the UNION method mean 2 queries are being made, so 2 queries on a million-row table instead of 1?
UPDATE
I just ran some tests on 60,000 friends and here are the results, and yes, the tables are indexed.
Option 1 with 2 entries per friend:
.0007 seconds
Option 2 with 1 entry per friend using UNION to select
.3100 seconds
Option 1.
Do as much work as possible when adding a friend. Adding someone is very rare compared to selecting all friends, which is probably done every time you render a page (on a social site).
If you are indexing on user_id and friend_id, then the two statements should be equivalent in time -- best to test though, DBs can have surprising results. The UNION is seen by the database, though, and it can use it to optimize the query.
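For reference, the kind of indexing meant here might look like the following (just a sketch; the composite indexes match the WHERE clauses above, but the exact choice depends on your workload):
ALTER TABLE friends_list
  ADD INDEX idx_user_approved (user_id, approved_status),
  ADD INDEX idx_friend_approved (friend_id, approved_status);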
Is friendship always mutual and with the same approval status? If so, I'd opt for the second one. If it can be one-way or with separate approvals, then you need the first one, right?
Which DBMS are you using? MySQL always uses an extra temporary table when executing UNION queries. Creating these temporary tables adds some overhead to the query and is probably why your UNION query is slower than the first one.
Have you thought about separating friends and friend requests? This would reduce the content of the friends table; you would also be able to delete accepted requests from the friend request table and keep its size down. Another advantage is that you keep fewer columns in each table, which makes it easier to build a finely tuned index on them.
I am currently building this feature myself, and it would be interesting to hear about your experience on the matter.
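As a rough sketch of that split (the table names and the user ids 1 and 2 are purely illustrative, not taken from the question):
CREATE TABLE friend_requests (
  from_user_id INT NOT NULL,
  to_user_id INT NOT NULL,
  PRIMARY KEY (from_user_id, to_user_id)
);
CREATE TABLE friends (
  user_id INT NOT NULL,
  friend_id INT NOT NULL,
  PRIMARY KEY (user_id, friend_id)
);
-- on approval: insert the friendship (both directions, as in option 1) and drop the request
INSERT INTO friends (user_id, friend_id) VALUES (1, 2), (2, 1);
DELETE FROM friend_requests WHERE from_user_id = 1 AND to_user_id = 2;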
Answers to this kind of question tend to depend upon the usage patterns. In your case, which is more common: adding new friendship relationships or querying the friendship relationships?
You also need to consider future flexibility: what other uses might there be for the data?
My guess is that the first option is the cleaner data model for what you are trying to represent. It looks like it allows for expansion in directions such as asymmetric relationships and friends of friends. I think the second option will prove to be unwieldy in the future.