Differences between graphs and a MySQL database at Facebook?

I can't understand where Facebook really uses MySQL.
The whole database can be seen as a graph:
Account - Like -> Comment
Account <- friend -> Account2
Account - Like -> Link
So what is stored in MySQL?
The text of the posts and notes?
Does Facebook have all these entities (account, post, comment) in its graph DB?

Well, I assume that everything you mentioned is stored in MySQL. That means every piece of data that is subject to change, including:
Users
Posts
Comments
Information about uploaded pictures (but not pictures themselves)
Likes
Data about users logging in
Ads
Data about users liking / not liking ads
User settings
etc.
Any data that is subject to change needs to be saved in a database for indexing and fast access. The filesystem is fine for write-only data, for example logging, or if you only need to access the whole data at once, not parts of it.
But if you need data to be structured and ready to be accessed quickly, then you need to use a database. You may want to read about binary trees: http://en.wikipedia.org/wiki/Binary_tree
About Facebook: if I had to guess, I would say that there are probably hundreds more databases. I don't have access to their servers, so I can't really comment on that :) But as another example, if you install WordPress, it creates 11 different tables. http://codex.wordpress.org/Database_Description
PS. There is no reason Facebook has to use MySQL, though. There are a lot of different databases out there.
EDIT: Thanks for pointing out that I misunderstood your question.
Let's take this case: Account <- friend -> Account2
As said before, they have a table like "Users".
The Users table will have columns:
ID (This is the PRIMARY KEY. It gives a unique ID to each row.)
Username (Text field with some length, for example 64 characters)
...And many more...
Now there will be a table "Friends". It will have fields:
ID (again, PRIMARY KEY)
Person1
Person2
Both fields Person1 and Person2 will be integers pointing to ID in table "Users".
So if table users has three rows:
ID  Username
1   rodi
2   rauni
3   superman
Then table "Friends" would be for example:
ID  Person1  Person2
1   1        2
2   2        1
3   1        3
4   3        1
Here row 1 means "rodi is friends with rauni" and row 2 means "rauni is friends with rodi". This is redundant, but I wanted to keep the example simple.
Here is a good tutorial: http://www.tizag.com/mysqlTutorial/mysqltables.php
There are many pages there; just keep clicking Next to skip what you already know (I don't know how much that is).
This is about joining info from two tables: http://www.tizag.com/mysqlTutorial/mysqljoins.php
You could use this to select all of rodi's friends from our two tables in one query.
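To make the join concrete, here is a minimal sketch of the Users and Friends tables from this answer. It assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL; the function name friends_of is made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the SQL itself is plain and portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    CREATE TABLE friends (
        id      INTEGER PRIMARY KEY,
        person1 INTEGER NOT NULL REFERENCES users(id),
        person2 INTEGER NOT NULL REFERENCES users(id)
    );
    INSERT INTO users (id, username) VALUES (1, 'rodi'), (2, 'rauni'), (3, 'superman');
    INSERT INTO friends (id, person1, person2) VALUES (1, 1, 2), (2, 2, 1), (3, 1, 3), (4, 3, 1);
""")


def friends_of(username):
    """Return the usernames that the given user lists as friends (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT friend.username
        FROM users AS me
        JOIN friends ON friends.person1 = me.id
        JOIN users AS friend ON friend.id = friends.person2
        WHERE me.username = ?
        ORDER BY friend.username
        """,
        (username,),
    ).fetchall()
    return [row[0] for row in rows]


print(friends_of("rodi"))  # ['rauni', 'superman']
```

The users table is joined twice under different aliases: once to find the person asking, once to turn each Person2 ID back into a username.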

Related

Storing list of users associated with a certain item

This is a bit hard to explain.
I have built an app where users create what I like to call 'raffles', and other users then subscribe to them.
I have a table for the raffles, and I could add a column of type text to it and store all the users in it, separated by commas (,).
Or I could create a separate table where users are added and associated with the raffle via another field called 'raffle_id' or something like it.
I'm not sure how efficient either of these methods will be in the long run or for scaling.
Some advice would be appreciated.
I would recommend against storing your user information in CSV format. The main reason is that CSV will make querying the table by user difficult. It will also make updates difficult. SQL databases were designed to handle relational data using tables. So in your case I would design the raffles table to look like this:
raffles (raffle_id, user_id)
And the data might look like this:
1 1
1 3
1 7
2 1
2 2
2 3
2 6
In other words, each record corresponds to a single raffle-user relation. Assuming that you only have a few dozen users and raffles happen every so often, this should scale fine. And if this raffles table ever gets too large at a much later date, you can archive a portion of it.
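As a rough sketch of why the one-row-per-relation layout is easier to query than a comma-separated column, the example below loads the sample rows above and looks them up in both directions. It assumes Python's sqlite3 module as a stand-in for your real database; the composite primary key and the two helper functions are my additions, not part of the design above.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raffles (
        raffle_id INTEGER NOT NULL,
        user_id   INTEGER NOT NULL,
        PRIMARY KEY (raffle_id, user_id)  -- assumption: a user can enter a raffle only once
    );
    INSERT INTO raffles (raffle_id, user_id) VALUES
        (1, 1), (1, 3), (1, 7), (2, 1), (2, 2), (2, 3), (2, 6);
""")


def users_in_raffle(raffle_id):
    """All user IDs subscribed to one raffle."""
    rows = conn.execute(
        "SELECT user_id FROM raffles WHERE raffle_id = ? ORDER BY user_id",
        (raffle_id,),
    )
    return [row[0] for row in rows]


def raffles_for_user(user_id):
    """All raffle IDs one user is subscribed to (painful with a CSV column)."""
    rows = conn.execute(
        "SELECT raffle_id FROM raffles WHERE user_id = ? ORDER BY raffle_id",
        (user_id,),
    )
    return [row[0] for row in rows]


print(users_in_raffle(2))   # [1, 2, 3, 6]
print(raffles_for_user(3))  # [1, 2]
```

Removing a user from a raffle is then a single DELETE on one row, instead of rewriting a text field.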
See "What is the best way to add users to multiple groups in a database?"
Raffles are the "Groups". "UserInGroup" becomes "UserInRaffle", your join table.

Database design for user driven website

Assuming I want to have a web application that requires storing user information, images, etc. as well as status updates or posts/comments, would I want separate tables?
For example, say I have a "users" table that contains user information like passwords, emails, and typical social networking info like age, location etc. Would it be a good idea to create a second table ("posts") that handles user content such as comments and/or posts?
Table one: "users"
UserID
Username
Age
etc.
Table Two: "posts"
PostID
PostContent
PostAuthor
PostDate
etc
Is this a valid organization? Furthermore if I wanted to keep track of media should I do this in ANOTHER table?
Table Three: "media"
ID
Type
Uploader
etc.
Any help is much appreciated. I'm curious to see if I'm on the right track or just completely lost. I am mostly wondering if I should have many tables or fewer, larger, less segregated tables.
Also of note: so far I planned on keeping information such as followers (or friends) in the 'users' table, but in retrospect I'm not sure that's a good idea.
Thanks in advance.
Generally speaking, to design a database you create a table for each object you will be dealing with. In your example you have Users, Posts, Comments and Media. From that you can flesh out what you want to store for each object. Each item you want to store is a field in the table:
[Users]
ID
Username
PasswordHash
Age
Birthdate
Email
JoinDate
LastLogin
[Posts]
ID
UserID
Title
Content
CreateDate
PostedDate
[Comments]
ID
PostID
UserID
Content
[Media]
ID
Title
Description
FileURI
Looking at the above, you can see a basic structure for holding the information for each object. From the field names you can even tell the relationships between the objects. A post has a UserID, so the post was created by that user. A comment has a PostID and a UserID, so you can see that it was written by a specific person on a specific post.
Once you have the general fields identified, you can look at other aspects of the design. For example, right now the Email field in the Users table means that a user can have one (1) email address, no more. You can solve this in one of two ways. The first is to add more email fields (EmailA, EmailB, EmailC); this generally works if you know there are specific types of emails you are dealing with, for example EmailWork or EmailHome. It doesn't work if you do not know how many emails there will be in total. To solve that, you can pull the emails out into their own table:
[Users]
ID
Username
PasswordHash
Age
Birthdate
JoinDate
LastLogin
[Emails]
ID
UserID
Email
Now you can have any number of emails for a single user. You can do this for just about any database you are trying to design. Take it in small steps and break your bigger objects into smaller ones as needed.
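Here is a small sketch of the Users/Emails split, assuming Python's sqlite3 module as a stand-in for whatever database you end up using. Only the two columns needed for the example are created, and the helper names and sample addresses are invented for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in database; only the columns needed here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL
    );
    CREATE TABLE emails (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        email   TEXT NOT NULL
    );
""")


def add_user(username, addresses):
    """Insert a user plus any number of email rows; returns the new user ID."""
    cur = conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
    user_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO emails (user_id, email) VALUES (?, ?)",
        [(user_id, address) for address in addresses],
    )
    return user_id


def emails_for(user_id):
    """All email addresses stored for one user, in insertion order."""
    rows = conn.execute(
        "SELECT email FROM emails WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]


# Invented sample data.
alice = add_user("alice", ["alice@example.com", "alice@work.example.com"])
bob = add_user("bob", [])
print(emails_for(alice))  # ['alice@example.com', 'alice@work.example.com']
print(emails_for(bob))    # []
```

A user with no email simply has no rows in emails, so no empty EmailA/EmailB columns are needed.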
Update
To deal with friends, think about the relationship you are dealing with. There is one (1) person with many friends. In terms of the tables above, it's one User to many Users. This can be done with a special table that holds no information other than the relationship you are looking for.
[Friends]
UserA
UserB
So if the current user's ID is in UserA, his friend's ID is in UserB, and vice versa. This sets up the friendship so that if you are my friend, then I am your friend. There is no way for me to be your friend without you being mine. If you want to allow one-way friendships, you can set up the table like this:
[Friends]
UserID
FriendID
So if we are both friends with each other, there would have to be 2 records: one for my friendship to you and one for your friendship to me.
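The difference between the two designs shows up in how you query them. The sketch below assumes Python's sqlite3 module as a stand-in database; the second table is renamed one_way_friends only so both designs can live side by side, and the sample IDs are invented.

```python
import sqlite3

# Assumption: in-memory SQLite; table and function names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two-way design: one row per friendship, either user may sit in either column.
    CREATE TABLE friends (
        user_a INTEGER NOT NULL,
        user_b INTEGER NOT NULL,
        PRIMARY KEY (user_a, user_b)
    );
    INSERT INTO friends (user_a, user_b) VALUES (1, 2), (3, 1);

    -- One-way design: one row per direction.
    CREATE TABLE one_way_friends (
        user_id   INTEGER NOT NULL,
        friend_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, friend_id)
    );
    INSERT INTO one_way_friends (user_id, friend_id) VALUES (1, 2), (2, 1), (3, 1);
""")


def two_way_friends_of(user_id):
    """Look in both columns, because the user can be on either side of the row."""
    rows = conn.execute(
        """
        SELECT user_b FROM friends WHERE user_a = ?
        UNION
        SELECT user_a FROM friends WHERE user_b = ?
        ORDER BY 1
        """,
        (user_id, user_id),
    )
    return [row[0] for row in rows]


def one_way_friends_of(user_id):
    """Only rows where this user is the one who declared the friendship."""
    rows = conn.execute(
        "SELECT friend_id FROM one_way_friends WHERE user_id = ? ORDER BY friend_id",
        (user_id,),
    )
    return [row[0] for row in rows]


print(two_way_friends_of(1))  # [2, 3]
print(one_way_friends_of(1))  # [2]  (user 3 added user 1, but not the other way round)
```

The two-way table stores half as many rows but needs the UNION; the one-way table keeps queries simple at the cost of two rows per mutual friendship.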
You need to use multiple tables.
The number of tables depends on how complex you want your interactive site to be. Based on what you have posted, you would need a table that stores information about the users, a table for comments, and more, such as a table to store status types.
For example tbl_Users should store:
1. UserID
2. First Name
3. Last name
4. Email
5. Password (hashed, never plain text)
6. Address
7. City
8. State
9. Country
10. Date of Birth
11. UserStatus
12. Etc
This project sounds like it should be using a relational DB that pulls up records, such as comments, by their related userIDs.
This means that you will need a table that stores the following:
1. CommentID (primary key, int, auto-increment)
2. Comment (text)
3. UserID (foreign key, int)
The comment is attached to a user through a foreign key, which is essentially the userID from the tbl_Users table. You would need to join these tables in an SQL statement in your script to query the information as a single result. See the example code:
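// Cast to int so user input cannot inject SQL; prefer prepared statements in real code.
$userID = (int) $userID;
$sql_userWall = "SELECT tbl_Users.*, tbl_Comments.*, tbl_userStatus.*
FROM tbl_Users
INNER JOIN tbl_Comments ON tbl_Users.userID = tbl_Comments.userID
INNER JOIN tbl_userStatus ON tbl_Users.userStatus = tbl_userStatus.statusID
WHERE tbl_Users.userID = $userID";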
This statement essentially says: get the information of the provided user from the users table, get all the comments that have the same userID attached, and get the userStatus from the table of user statuses.
Therefore you would need a table called tbl_userStatus that holds unique statusIDs (primary key, int, auto-incrementing) along with a text column (varchar) of a set length that may say, for example, "online" or "offline". When you write out the info from a record using PHP, ASP or a similar language, the joined row already contains the text from tbl_userStatus, so a simple line is enough (here $row is assumed to hold one row fetched from the query, and statusText is a hypothetical name for the text column):
<?php echo $row['statusText']; ?>
No extra work necessary. Most of your project time will be spent developing the DB structure and writing SQL statements that correctly retrieve the info you want for each page.
There are many great YouTube video series that describe relational databases and drawing entity-relationship diagrams. This is what you should look into to learn more about creating the type of project you are describing.
One last note: if you wanted comments to be visible to all members of a group, that would be what is known as a many-to-many relationship, which requires additional tables so that multiple users can 'own' a relationship to a single table. You could store a single groupID that refers to a table of groups.
tbl_groups
1. GroupID
2. GroupName
3. More group info, etc
And a table of users registered for the group
tbl_groupMembers
1. membershipCountID (primary key, int, auto-increment)
2. GroupID (foreign key, int)
3. UserID (foreign key, int)
This allows users to register for a group, and lets you inner join them to group-based comments. These relationships take a little more time to understand; the videos will help greatly.
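A rough sketch of that many-to-many setup, assuming Python's sqlite3 module as a stand-in for your database. The table names follow the ones used above, but only a couple of columns are created, and the sample users, groups and helper functions are invented for the example.

```python
import sqlite3

# Assumption: in-memory SQLite; trimmed-down versions of the tables described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Users (
        userID    INTEGER PRIMARY KEY,
        firstName TEXT NOT NULL
    );
    CREATE TABLE tbl_groups (
        groupID   INTEGER PRIMARY KEY,
        groupName TEXT NOT NULL
    );
    CREATE TABLE tbl_groupMembers (
        membershipCountID INTEGER PRIMARY KEY,
        groupID INTEGER NOT NULL REFERENCES tbl_groups(groupID),
        userID  INTEGER NOT NULL REFERENCES tbl_Users(userID)
    );
    -- Invented sample data.
    INSERT INTO tbl_Users (userID, firstName) VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO tbl_groups (groupID, groupName) VALUES (1, 'Hikers'), (2, 'Chess');
    INSERT INTO tbl_groupMembers (groupID, userID) VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")


def members_of(group_name):
    """First names of all users registered for the named group."""
    rows = conn.execute(
        """
        SELECT u.firstName
        FROM tbl_groups AS g
        INNER JOIN tbl_groupMembers AS m ON m.groupID = g.groupID
        INNER JOIN tbl_Users AS u ON u.userID = m.userID
        WHERE g.groupName = ?
        ORDER BY u.firstName
        """,
        (group_name,),
    )
    return [row[0] for row in rows]


def groups_of(user_id):
    """Names of all groups one user belongs to."""
    rows = conn.execute(
        """
        SELECT g.groupName
        FROM tbl_groupMembers AS m
        INNER JOIN tbl_groups AS g ON g.groupID = m.groupID
        WHERE m.userID = ?
        ORDER BY g.groupName
        """,
        (user_id,),
    )
    return [row[0] for row in rows]


print(members_of("Hikers"))  # ['Ann', 'Ben']
print(groups_of(2))          # ['Chess', 'Hikers']
```

Ben appears in both groups without any duplication in tbl_Users or tbl_groups; only the membership table grows.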
I hope this helps, I'll come back and post some YouTube links later that I found helpful learning this stuff.

Relational Database Design for Friends List / Buddy List Speed

I am quickly putting together a buddy / friends list where a user will have a list of buddies. I will be using a relational database for this and found the following post:
Buddy List: Relational Database Table Design
So the buddy table might look something like this:
buddy_id  username
1         George
2         Henry
3         Jody
4         Cara
And the table for user's buddy lists would look something like this:
user_id  buddy_id
2        4
1        4
1        3
My question is how fast this would be if a user had 20,000+ buddies and wanted to pull their entire list in under a second or so. I would be running this on a pretty typical MySQL setup. Are there any key optimizations or DB configurations that would make this fast?
What does "pull their entire list" mean to you?
I can select 20,000 rows from a large "buddy" table (few million rows) in 15 milliseconds on my computer, but that doesn't include network transit time (both directions), formatting, and displaying on a web page. (Which I presume is the point--a web application.)
You'll need an index that covers user_id, but creating a primary key on (user_id, buddy_id) should do that.
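If you want to try this yourself, here is a minimal sketch that builds a buddy-list table with a composite primary key, fills it with generated test data, and checks that the lookup uses the key. It assumes Python's sqlite3 module with in-memory SQLite rather than MySQL, so the timing is only a rough indication and the table name buddy_list is made up.

```python
import sqlite3
import time

# Assumption: in-memory SQLite stands in for MySQL; timings will differ on a real server.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE buddy_list (
        user_id  INTEGER NOT NULL,
        buddy_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, buddy_id)  -- also serves as the index on user_id
    )
""")

# Generated test data: user 1 has 20,000 buddies, users 2..101 have 100 each.
rows = [(1, buddy) for buddy in range(2, 20002)]
rows += [(user, buddy) for user in range(2, 102) for buddy in range(1000, 1100)]
conn.executemany("INSERT INTO buddy_list (user_id, buddy_id) VALUES (?, ?)", rows)


def buddies_of(user_id):
    """Return every buddy ID for one user."""
    cursor = conn.execute(
        "SELECT buddy_id FROM buddy_list WHERE user_id = ? ORDER BY buddy_id",
        (user_id,),
    )
    return [row[0] for row in cursor]


def query_plan(user_id):
    """Ask SQLite how it would run the lookup (detail text is the last column)."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT buddy_id FROM buddy_list WHERE user_id = ?",
        (user_id,),
    ).fetchall()
    return " ".join(str(row[-1]) for row in plan)


start = time.perf_counter()
buddies = buddies_of(1)
elapsed_ms = (time.perf_counter() - start) * 1000
print(len(buddies), "rows in", round(elapsed_ms, 1), "ms")
print(query_plan(1))
```

In MySQL the equivalent check is to run EXPLAIN on the SELECT and confirm the PRIMARY key is used rather than a full table scan.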
Scripting languages are useful for generating test data. I'm using Ruby today.

Database Design for Rental Listings

I'm designing a simple database for a rental listings website,
sort of like classified ads but only for home/room rentals. This is what I've come up with thus far:
Question 1
For the "post" table, I actually wanted more information. For example, there would be a 'facilities' section where the users can select whether there's 'parking' available, do I need a separate table? Or just use 0 for no and 1 for yes?
Question 2
Here's what I did with the "category" table (sorry I don't know how to pretty print yet)
Category_ID 1 is Rent
Category_ID 2 is buildingType
For "categoryProperty" table
Category_ID 1 categoryPropertyID 1 House
Category_ID 1 categoryPropertyID 2 Room
Category_ID 2 categoryPropertyID 3 Apartment
Category_ID 2 categoryPropertyID 4 Condominium
Category_ID 2 categoryPropertyID 5 Detached
Does the above make sense?
Question 3
Users can post whether they are logged in or not. Just that logged in users/members have the advantage of tracking their ads/adjusting the availability.
How do I record the ads that a member has posted? Like their history.
Should I create a "postHistory" table and set the 'postHistory_ID' as FK to "member" table?
Thanks a lot in advance. I appreciate your help, especially just pointing me in the right direction.
Question 1:
Make a separate table with a one-to-one relation; that would be the simplest way:
POST -|-----|- EXTRAS
In EXTRAS you can have every extra field (parking=1/0, in_down_town=1/0, has_a_ghost=1/0).
Question 2:
This does not make sense. You have two options:
In the Post table, create a "type_of_operation" column that can have two values (building_type, rent). Or you can create different tables, but that would make this more complicated (you should analyze whether the same type can be in both states, etc.).
Question 3:
I recommend making your users register, even with a really simple form (email + password).
Seems to be on the right track -- with respect to your specific questions:
Question #1: Assuming there's more than one type of facility (parking; swimming pool; gym) then you have a many-to-many relationship and you want 2 new tables: Facilities and PropertyFacilities. Each Property (or I guess "post") could have multiple rows in the PropertyFacilities table.
Question #2: Not really clear on what you're getting at -- is it that each property type can either be rented whole or rented per room?
Question #3: Good question. What you want is an Active bit, or an ExpireDate, in your POST table. Then anything that becomes inactive or expired is automatically 'historical' data, with no need to move it to a history table. Although you'll have to archive eventually, of course.
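To illustrate the suggestions for Question #1 and Question #3 together, here is a small sketch with a Facilities table, a PropertyFacilities link table, and an expire date on the post. It assumes Python's sqlite3 module as a stand-in database; the column names, sample listings and the helper function are invented for the example.

```python
import sqlite3

# Assumption: in-memory SQLite; column names and sample rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (
        post_id     INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        expire_date TEXT NOT NULL  -- ISO dates (YYYY-MM-DD) compare correctly as text
    );
    CREATE TABLE facilities (
        facility_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE property_facilities (
        post_id     INTEGER NOT NULL REFERENCES post(post_id),
        facility_id INTEGER NOT NULL REFERENCES facilities(facility_id),
        PRIMARY KEY (post_id, facility_id)
    );
    INSERT INTO post (post_id, title, expire_date) VALUES
        (1, 'Room near campus', '2030-01-31'),
        (2, 'Downtown condo', '2020-06-30');
    INSERT INTO facilities (facility_id, name) VALUES
        (1, 'parking'), (2, 'swimming pool'), (3, 'gym');
    INSERT INTO property_facilities (post_id, facility_id) VALUES (1, 1), (2, 1), (2, 3);
""")


def active_posts_with(facility_name, today):
    """Titles of posts that offer the facility and have not expired as of `today`."""
    rows = conn.execute(
        """
        SELECT p.title
        FROM post AS p
        JOIN property_facilities AS pf ON pf.post_id = p.post_id
        JOIN facilities AS f ON f.facility_id = pf.facility_id
        WHERE f.name = ? AND p.expire_date >= ?
        ORDER BY p.title
        """,
        (facility_name, today),
    )
    return [row[0] for row in rows]


print(active_posts_with("parking", "2025-01-01"))  # ['Room near campus']
print(active_posts_with("parking", "2020-01-01"))  # ['Downtown condo', 'Room near campus']
```

The expired condo is still in the post table, so a member's history is just the same query without the date filter.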

Database schema suggestion for widget driven site

I am currently working on restructuring my site's database. As the schema I have now is not one of the best, I thought it would be useful to hear some suggestions from you.
To start off, my site actually consists of widgets. For each widget I need a table for settings (where each instance of the widget has its user defined settings), a table for common (shared items between instances of the same widget) and userdata (users' saved data within an instance of a widget).
Until now, I had the following schema, consisting of 2 databases:
the first database, where I had all site-maintenance tables (e.g. users, widgets installed, logs, notifications, messages etc.) PLUS a table where I joined each widget instance to the user that instantiated it, with a unique ID assigned (so, I have the following columns: user_id, widget_id and unique_id).
the second database, where I kept all widget-related data. That means, for each widget (unique by its widget_id) I had three tables: [widget_id]_settings, [widget_id]_common and [widget_id]_userdata. In each of these tables, each row held the unique_id of the user's widget. This is where all the users' widget data was actually stored.
To give a short example of how my databases worked:
First database:
In the users table I have user_id = 1
In the widgets table I have widget_id = 1
In the users_widgets table I have user_id = 1, widget_id = 1, unique_id = 1
Second database:
In the 1_settings I have unique_id = 1, ..., where ... represents the user's widget settings
In the 1_common I have several rows which represent shared data between instances of the same widget (so, no user specific data here)
In the 1_userdata I have unique_id = 1, ..., where ... represents the user's widget data. An important note here is that this table may contain several rows with the same unique_id (e.g. for a tasks widget, a user can have several tasks for a widget instance)
I hope you understood my database schema, at least roughly.
Now, I want to develop a 'cleaner' schema, so it won't be necessary to have 2 databases and switch from one to the other each time in my application. It would also be great if I found a way NOT to dynamically generate tables in the second database (1_settings, 2_settings, ... , n_settings).
I will greatly appreciate any effort in suggesting any better way of achieving this. Thank you very much in advance!
EDIT:
Should I keep databases like MongoDB or CouchDB in mind when restructuring my databases? I mean for the second database, where it would be better if I didn't have a fixed schema.
Also, how would traditional SQL and NoSQL databases get along on the same site?
A possible schema for the users_widgets table could be:
id | user_id | widget_id
You don't need the unique_id field in the users_widgets table, unless you want to hide the primary key for some reason. In fact, I would rename this table to something a little more memorable like widget_instances, and use widget_instance_id in the remaining tables of the second database.
One way to handle the second set of tables is by using a metadata style:
widget_instance_settings
id | widget_instance_id | key | value
This would include the userdata, because user_id is related to the widget_instance_id, unless you want to allow a user to create multiple instances of the same widget, and have the same data across all instances for some reason.
widget_common_settings
id | widget_id | key | value
This type of schema can be seen in packages like Elgg.
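A minimal sketch of the key/value ("metadata style") settings table, assuming Python's sqlite3 module as a stand-in for your database. The UNIQUE constraint, the helper functions and the sample settings are my assumptions for the example. The column name key is quoted here; in MySQL it is a reserved word and would need backticks.

```python
import sqlite3

# Assumption: in-memory SQLite; one table replaces the per-widget n_settings tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widget_instances (
        id        INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        widget_id INTEGER NOT NULL
    );
    CREATE TABLE widget_instance_settings (
        id                 INTEGER PRIMARY KEY,
        widget_instance_id INTEGER NOT NULL REFERENCES widget_instances(id),
        "key"              TEXT NOT NULL,
        value              TEXT,
        UNIQUE (widget_instance_id, "key")  -- assumption: one value per key per instance
    );
    INSERT INTO widget_instances (id, user_id, widget_id) VALUES (1, 1, 1), (2, 1, 2);
""")


def set_setting(instance_id, key, value):
    """Insert the setting, or overwrite it if the key already exists for this instance."""
    conn.execute(
        'INSERT OR REPLACE INTO widget_instance_settings (widget_instance_id, "key", value) '
        "VALUES (?, ?, ?)",
        (instance_id, key, value),
    )


def settings_for(instance_id):
    """All settings of one widget instance as a dict."""
    rows = conn.execute(
        'SELECT "key", value FROM widget_instance_settings WHERE widget_instance_id = ?',
        (instance_id,),
    )
    return dict(rows.fetchall())


# Invented sample settings.
set_setting(1, "color", "blue")
set_setting(1, "max_items", "10")
set_setting(1, "color", "red")  # overwrites the earlier value
set_setting(2, "city", "Paris")
print(settings_for(1))
```

Every value is stored as text, so the application has to convert types itself; that is the usual trade-off of this style against fixed columns.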
Do you know the settings a widget class and a widget instance could have? If so, these settings could be made columns of the widget_class table (for common settings) and the widget_instance table (for instance-specific settings).
If you don't know them, then you could have a widget_class_settings table that has a many to one relation with the widget_class table and a widget_instance_settings that has a many to one relation to the widget_instance table. Between the widget_instance and the widget_class you could, again, have a many to one relation. The widget_instance could also have a foreign key in the users table, so that you know which user created a specific widget.