I am designing a message/comment table in MySQL for a social networking site. I have a messaging system where two participating users (only two, never more) share a single, unique thread. So whenever a user messages another user, the system checks whether the two users already have a thread; if not, it creates one for them.
create table comment_threads (
  thread_id int primary key auto_increment,
  user_id_1 int,
  user_id_2 int,
  created datetime
);
create table comments (
  comment_id int primary key auto_increment,
  thread_id int,
  comment text,
  created datetime,
  foreign key (thread_id) references comment_threads (thread_id)
);
Each time a user messages another user, I have to check whether the two participants already have a thread, so I have to query the database for that (QUERY 1). If there wasn't any thread, I create one in comment_threads (QUERY 2). Then I post the comment into the comments table (QUERY 3). So I have to run two or three queries per message.
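A minimal sketch of those three queries, assuming example user ids 10 and 20 and normalizing the pair so the smaller id is always user_id_1 (which keeps the lookup unambiguous):

-- QUERY 1: does a thread for this pair already exist?
SELECT thread_id
  FROM comment_threads
 WHERE user_id_1 = LEAST(10, 20)
   AND user_id_2 = GREATEST(10, 20);

-- QUERY 2 (only if no row came back): create the thread
INSERT INTO comment_threads (user_id_1, user_id_2, created)
VALUES (LEAST(10, 20), GREATEST(10, 20), NOW());

-- QUERY 3: post the comment into the thread (123 is an example thread id)
INSERT INTO comments (thread_id, comment, created)
VALUES (123, 'Hello!', NOW());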
Questions:
Are the above tables a correct way of solving my problem, or do they need some correction?
Is there a better way to do it?
You don't really need two tables; one table should be fine:
create table comments (
  comment_id int primary key auto_increment,
  parent_id int,
  user_id_1 int,
  user_id_2 int,
  comment text,
  created datetime
);
For a new thread, set parent_id to 0. For subsequent comments, set parent_id to the ID of the first comment.
This way, you can also do multi-level threaded conversations, and it makes it easy to do stuff like 'comments I've posted'.
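For instance, fetching a whole thread becomes a single query; a sketch, assuming the thread's first comment has comment_id 123:

SELECT *
  FROM comments
 WHERE comment_id = 123
    OR parent_id = 123
 ORDER BY created;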
As per Itay's answer, you should use some caching mechanisms to improve performance.
You can't know which user left which message.
When you instantiate the message system with a thread, cache the two user ids and the thread id; there is no need to hit the DB for each submit. (This is not that simple, but rather a direction for you; you will need some failsafe mechanisms.)
I would buffer the thread in memory and submit to the DB only at a later time (or just use a MEMORY table).
Since this is a two-users-only, one-thread-only system, you can denormalize and keep the entire thread in one record inside a huge text field, where you concatenate the latest entry to the end. Again, this is just a direction and not a complete solution.
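A sketch of that denormalized direction, with assumed table and column names (thread_blobs, body):

CREATE TABLE thread_blobs (
  thread_id INT PRIMARY KEY AUTO_INCREMENT,
  user_id_1 INT,
  user_id_2 INT,
  body MEDIUMTEXT   -- the whole conversation, newest entry appended last
);

-- Appending the latest entry (ids and text are example values):
UPDATE thread_blobs
   SET body = CONCAT(COALESCE(body, ''), '\n[user 10] Hello!')
 WHERE thread_id = 123;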
This is similar to another question but not entirely the same.
My aim is to design a movie reservation system. A user can click one or more empty seats for a movie schedule to reserve them, but he needs to make a payment before 15 minutes are up; otherwise the reserved seats are automatically given up for other users.
I have the following simplified MySQL schema:
CREATE TABLE Movie (
  id INT PRIMARY KEY,
  name VARCHAR(255)
  -- ...
);

CREATE TABLE MovieSched (
  sched_id INT PRIMARY KEY,
  movie_id INT,           -- foreign key referring to Movie
  showtime DATETIME,      -- date and time of the showing
  count_signup INT,       -- number of sign-ups
  max_size INT,           -- max number of seats
  FOREIGN KEY (movie_id) REFERENCES Movie (id)
);

CREATE TABLE MovieSchedSignUp (
  sched_id INT,           -- foreign key referring to MovieSched
  user_id INT,            -- foreign key referring to User
  signup DATETIME,        -- datetime of signup
  FOREIGN KEY (sched_id) REFERENCES MovieSched (sched_id)
);
Every movie schedule has a max_size of seats that users can sign up for. To register a user, I insert a row into MovieSchedSignUp with the current date and time.
A few constraints naturally arise from the requirements:
Due to possible inconsistency between the interface and the database, I need to inform user A when there are not enough seats available at the moment A tries to reserve them (e.g., another user B could have bought all the seats just before A).
I need to atomically insert a row in MovieSchedSignUp while ensuring the schedule is not "overbooked" (count_signup <= max_size) as well as updating count_signup at the same time.
I need to ensure payment is made within 15mins, otherwise the reserved seats have to be freed.
My thoughts are:
Have extra columns in MovieSchedSignUp to keep track of when payment is made.
Use a transaction, but how do I return information about whether there are enough seats or not?
Have a batch job running in the background to delete the "expired" rows in MovieSchedSignUp.
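A sketch of what that batch job might run, assuming a paid_at column (nullable, set on payment) was added to MovieSchedSignUp as suggested in the first thought above:

-- Free reservations older than 15 minutes that were never paid for:
DELETE FROM MovieSchedSignUp
 WHERE paid_at IS NULL
   AND signup < NOW() - INTERVAL 15 MINUTE;

-- Keep the cached counter honest after the purge:
UPDATE MovieSched m
   SET count_signup = (SELECT COUNT(*)
                         FROM MovieSchedSignUp s
                        WHERE s.sched_id = m.sched_id);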
What is the most efficient way to go about doing this? Any other thoughts? I don't really want to use a batch job, but is there any other way out?
I think in this situation you are going to have to use a transaction:
Start the transaction.
Insert the records to be added into a temp table.
Join the temp table against MovieSched and MovieSchedSignUp to check that the combined number of records (temp plus MovieSchedSignUp) isn't greater than max_size.
If OK, do the insert.
If that succeeds, commit the transaction; otherwise roll it back.
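A minimal sketch of that transactional check, using SELECT ... FOR UPDATE on the MovieSched row rather than a temp table (ids are example values):

START TRANSACTION;

-- Lock the schedule row so concurrent sign-ups for it serialize here:
SELECT count_signup, max_size
  FROM MovieSched
 WHERE sched_id = 42
   FOR UPDATE;

-- In application code: if count_signup >= max_size, ROLLBACK and tell
-- the user there are not enough seats. Otherwise:
INSERT INTO MovieSchedSignUp (sched_id, user_id, signup)
VALUES (42, 1001, NOW());

UPDATE MovieSched
   SET count_signup = count_signup + 1
 WHERE sched_id = 42;

COMMIT;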
I am trying to design a notifications architecture where each notification has a UID and needs to be delivered to multiple users. Each user device has a local cache of the latest notifications. When the user device comes online it always checks for any new notifications and pulls all of them meant for that user. The device keeps the UID of the latest notification it synced and uses that UID to fetch newer notifications from the server.
I am wondering about the best way to implement this with MySQL tables so that it scales to more than 500K users.
I have a notification details table where the notification UID is the auto-increment primary key. I would like suggestions about the user mapping table, which could look like this (ignoring the foreign key constraints):
CREATE TABLE user_notifications_mapping (
user_id INT UNSIGNED NOT NULL,
notification_id BIGINT UNSIGNED NOT NULL,
UNIQUE KEY (user_id, notification_id)
) ENGINE=InnoDB;
but I am skeptical about whether it would give the best performance for a query like:
SELECT notification_id FROM user_notifications_mapping WHERE user_id = <user-id> AND notification_id > <last-notification-uid>
If the table is properly indexed, this design is very suitable. Assuming that only a "small" number of notifications will be returned to one given device on synchronisation, a medium-range server will be able to handle hundreds of such requests per second, even if the table is huge (millions of rows).
Now, this table is going to grow very large. But I believe one given notification needs to be sent to one given device only once, so I would consider removing (or archiving to another table) records once a notification has been delivered. Conceptually, this table becomes something like pending_notifications.
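A sketch of that archive-then-delete step, with an assumed archive table name and example ids:

-- Move delivered rows out of the hot table:
INSERT INTO user_notifications_archive (user_id, notification_id)
SELECT user_id, notification_id
  FROM user_notifications_mapping
 WHERE user_id = 42
   AND notification_id <= 1000;

DELETE FROM user_notifications_mapping
 WHERE user_id = 42
   AND notification_id <= 1000;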
[edit]
Given the new information, this table is likely to grow beyond a practical size, so you need to take a different approach. For example, there is probably a way to group your notifications (e.g., they are of a given type, or they originate from a given entity in your application). The same concept can be applied to your users: maybe you want some notifications to be sent to, say, all "customers" or all "administrators".
The underlying idea is to establish the n-n relationship between two entities of smaller cardinality. You wouldn't model the case "some users receive some notifications" but rather "some user groups receive some types of notifications".
Example:
a notification can be an "Announcement", a "Notice" or a "Warning" (notification type)
users can be "Administrators" or "Customers" (user group)
Then the notifications_mapping table would look like this:
+-----------------------+
| notifications_mapping |
+-----------------------+
| notification_type |
| group_id |
+-----------------------+
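One possible concrete shape for that table (column types and sizes are assumptions):

CREATE TABLE notifications_mapping (
  notification_type VARCHAR(30) NOT NULL,
  group_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (notification_type, group_id)
) ENGINE=InnoDB;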
And the corresponding query could be:
SELECT notification_id
FROM notifications_mapping AS map
JOIN user ON user.group_id = map.group_id
JOIN notifications ON notifications.type = map.notification_type
WHERE user_id = <user-id> AND notification_id > <last-notification-uid>
I have an object stored in the database; it is some text with properties.
The text has a rating. I need to store this rating and prevent any single user from raising it more than once. If I store the "text id" and "user id" in another table and count all the records with the needed "text id", I end up with too many records in that table.
There are two ways:
You can use a many-to-many relationship, i.e. a separate table named something like user_likes with user_id and like_id columns that together form the primary key (this makes it possible for a user to like a like_object only once).
Another way, which high-traffic websites use: every user record in the user table has a likes column which is just a serialized array, JSON, or whatever. Before updating this column, your application retrieves the data and looks for the particular like_object_id; if it doesn't exist, you update the database. Note that in this case all the care about data consistency (for instance, a like_object_id that exists in some user record but not in the like_object table) has to be implemented in your application code, not the database.
P.S. Sorry for my English, but I tried to explain as best I could.
If I store the "text id" and "user id" in another table and count all the records with the needed "text id", I end up with too many records in that table.
How do you know what counts as too many records?
Some of the MySQL tables I support have billions of rows. If they need more than that, they split the data to multiple MySQL servers. 1 million rows is not a problem for a MySQL database.
If you want to limit the data so each user can "like" a given text only once, you must store the data separately for each user. This is also true if a user can "unlike" a text they had previously liked.
CREATE TABLE likes (
user_id BIGINT UNSIGNED NOT NULL,
post_id BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (user_id, post_id),
KEY (post_id, user_id)
);
This example table uses its primary key constraint to ensure each user can like a given post only once. The secondary index optimizes queries for the likes on a specific post.
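For illustration, the queries those two indexes serve (ids are example values):

-- Covered by the secondary (post_id, user_id) index:
SELECT COUNT(*) FROM likes WHERE post_id = 123;

-- Covered by the primary key (user_id, post_id):
SELECT post_id FROM likes WHERE user_id = 42;

-- "Like once": a second attempt is silently skipped as a duplicate.
INSERT IGNORE INTO likes (user_id, post_id) VALUES (42, 123);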
This is only 16 bytes per row, plus the size of the index. I filled an InnoDB table with over 1 million rows, and it uses about 60MB.
mysql> show table status\G
Name: likes
Engine: InnoDB
Rows: 1046760
Data_length: 39419904
Index_length: 23658496
It's common to store databases on terabyte-sized storage these days, so a 60MB table doesn't seem too large.
I store the likes with the post itself, though I am not sure about its performance since none of my websites has reached a very heavy load. But I do the following:
CREATE TABLE post (
  id INT PRIMARY KEY,
  likes_count INT,  -- like count, to retrieve it quickly
  likes TEXT        -- ids of the users who liked this post, comma separated
);
When a user likes a post (using AJAX):
the UI updates immediately to show that the user liked the post
AJAX sends a request to the server with the post id and the user id, and the post data is then updated as follows:
post.likes_count += 1;
post.likes += userId + ',' ;
When the user reloads the page, the code checks whether his id is in likes; if it is, the post appears as liked.
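The same update expressed directly in SQL, as a sketch (ids are example values; FIND_IN_SET guards against double-liking):

UPDATE post
   SET likes_count = likes_count + 1,
       likes = CONCAT(COALESCE(likes, ''), '42,')
 WHERE id = 123
   AND NOT FIND_IN_SET('42', COALESCE(likes, ''));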
I am using Ruby on Rails/MySQL and I would like to implement a rating system with the following conditions/features, considering a large number of users and articles:
Each article has 3 rating criteria
Each criterion is a binary function (examples: good/bad, +1/-1, ...)
Each user can vote only once per criterion and per article
I would like to know, under those conditions, the best approaches/techniques for designing the database (for example in UML) in order to ensure optimal performance (query response time, CPU load, ...).
P.S.: I am thinking of a rating system like the one on the Stack Overflow website.
Create a table for the ratings:
CREATE TABLE vote
(
userId INT NOT NULL,
articleId INT NOT NULL,
criterion ENUM('language', 'usefulness', 'depth') NOT NULL, -- or whatever
value BOOLEAN NOT NULL,
PRIMARY KEY (userId, articleId, criterion)
)
This will allow each user to cast at most one vote per article per criterion.
criterion has type ENUM, which allows only the three listed criteria. This constraint lives at the metadata level: if you want to add a criterion, you have to change the table's definition rather than its data.
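A short sketch of how the table is used (ids are example values):

-- Casting a vote; a second vote for the same (user, article, criterion)
-- fails on the primary key:
INSERT INTO vote (userId, articleId, criterion, value)
VALUES (42, 7, 'usefulness', TRUE);

-- Tallying an article's votes per criterion:
SELECT criterion,
       SUM(value = TRUE)  AS up_votes,
       SUM(value = FALSE) AS down_votes
  FROM vote
 WHERE articleId = 7
 GROUP BY criterion;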
I want to make a user group system that imitates the grouping feature of instant messengers.
Each user can create as many groups as they want, but they cannot have groups with duplicate names, and they can put as many friends as they want into any group.
For example, John's friend Jen can be in John's 'school' group and John's 'coworker' group at the same time. And it is totally independent of how Jen groups John on her side.
I'm considering two possible ways to implement this in the user_group table.
1.
user_group (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
group_name VARCHAR(30),
UNIQUE KEY (user_id, group_name)
)
In this case, every group owned by any user gets a globally unique id, so id alone identifies both the owning user and the group name.
2.
user_group (
user_id INT,
group_id INT AUTO_INCREMENT,
group_name VARCHAR(30),
PRIMARY KEY (user_id, group_id),
UNIQUE KEY (user_id, group_name)
)
In this case, group_id restarts for each user, so the same group_id can occur many times in the table, but the primary key pair (user_id, group_id) is unique.
Which way is the better implementation, and why?
What are the advantages and drawbacks of each?
EDIT:
added AUTO_INCREMENT to group_id in the second scenario to ensure it is auto-assigned per user_id.
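Worth noting: MySQL only supports this per-prefix AUTO_INCREMENT behavior (numbering restarting for each user_id) on the second column of a composite key with the MyISAM engine, not InnoDB. A sketch of the MyISAM version:

CREATE TABLE user_group (
  user_id INT NOT NULL,
  group_id INT NOT NULL AUTO_INCREMENT,
  group_name VARCHAR(30) NOT NULL,
  PRIMARY KEY (user_id, group_id),
  UNIQUE KEY (user_id, group_name)
) ENGINE=MyISAM;  -- InnoDB rejects this definition: the AUTO_INCREMENT
                  -- column must lead an index, and numbering is global

-- Each user's groups are numbered 1, 2, 3, ... independently:
INSERT INTO user_group (user_id, group_name) VALUES (1, 'school');
INSERT INTO user_group (user_id, group_name) VALUES (2, 'school');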
EDIT:
'better' means...
- better performance for SELECT/INSERT/UPDATE of friends in a group, since those will be the most common operations on user groups.
- robustness of the database: which one is safer as the user base grows.
- popularity, or the general preference for one over the other.
- flexibility
- extensibility
- usability: which is easier to use.
Personally, I would go with the 1st approach, but it really depends on how your application is going to work. If it were ever possible for ownership of a group to change, or for user profiles to be merged, that would be much easier to handle in the 1st approach than in the 2nd: in the 2nd approach you would have to update not only the user_group table but also any dependent tables with a foreign key to user_group.
This will also be a many-to-many relation (there will be multiple users in a group, and a user will be a member of multiple groups), so it will require a separate joining table. In the 1st approach, this is fairly straightforward:
group_member (
group_id int,
user_id int
)
For your 2nd approach, it would require a 3rd column, which is not only more confusing since you're now including user_id twice, but also requires 50% more storage per row (this may or may not be an issue depending on how large you expect your database to get):
group_member (
owner_id int,
group_id int,
user_id int
)
Also, if you ever plan to move from MySQL to another database platform, this AUTO_INCREMENT behavior may not be supported. In MS SQL Server, for example, an auto-increment field (identity in MSSQL) is always incremented globally, never numbered per index prefix on the table, so to get the same functionality you would have to implement it yourself.
Please define "better".
From my gut, I would pick the 2nd one.
The searchable pieces are broken down more, but that wouldn't be what I'd pick if insert/update performance is a concern.
I see no possible benefit to number 2 at all: it is more complex, more fragile (it would not work at all in SQL Server), and gains nothing. Remember, the group_id has no meaning except to identify a record uniquely; the user will likely only see the group name, not the id. So it doesn't matter if the ids all start from 0 or if there are gaps because a group was rolled back or deleted.