Optimised way of storing threaded conversations? - mysql

We need to design an architecture for storing conversations (one to many).
We are using the following tables (and columns):
messages_table - message_id, sender_id (user_id), timestamp, reference_id
recipients - message_id, recipient_id (user_id)
unread_status_table - message_id, user_id
We store messages in the messages table, and reference_id holds the message_id of the message that started the thread.
The unread status table holds only the messages that are unread.
I am not sure whether we should use a separate table for unread messages. The advantage is that once all messages have been read, the table is empty.
Please help me :)

I don't think there is one single answer to a question like this. It all depends on how many users, how many conversations, the hardware you have, requirements for how responsive the system should be to users, and so forth. If you can get adequate performance by not having a second table for unread messages, then that's probably the better route. Always simplify when possible.
My thought is that, if I had a second table, it probably wouldn't be for unread messages. Instead, in a very high-concurrency situation, that table would have no indexes apart from a primary key, and new messages would be pushed into it. It should allow for very fast operations even when lots of new messages come in. Then a job could run periodically to read a small number of records and push them into the main table, where the operation is slower, and then delete them from the second table. This means that users can continue to create new messages and the system stays responsive, but it might be a little while until recipients see the messages.
But again, this approach is needed only when you have seen that performance requires it.
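A minimal sketch of that staging-table idea, under some loud assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL so it can run anywhere, and the names incoming_messages, body and flush_batch are made up for illustration. It only shows the shape of the periodic job, not a tuned implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    -- Staging table: primary key only, no secondary indexes, so inserts stay cheap.
    CREATE TABLE incoming_messages (
        message_id INTEGER PRIMARY KEY,
        sender_id  INTEGER NOT NULL,
        body       TEXT NOT NULL
    );
    -- Main table: carries the indexes that readers need.
    CREATE TABLE messages (
        message_id INTEGER PRIMARY KEY,
        sender_id  INTEGER NOT NULL,
        body       TEXT NOT NULL
    );
    CREATE INDEX idx_messages_sender ON messages (sender_id);
""")


def flush_batch(conn, batch_size=100):
    """Move up to batch_size rows from staging into the main table."""
    with conn:  # commits on success, rolls back on error
        rows = conn.execute(
            "SELECT message_id, sender_id, body FROM incoming_messages "
            "ORDER BY message_id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return 0
        conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
        conn.executemany(
            "DELETE FROM incoming_messages WHERE message_id = ?",
            [(row[0],) for row in rows],
        )
    return len(rows)


# Writers only ever touch the staging table.
conn.executemany(
    "INSERT INTO incoming_messages (sender_id, body) VALUES (?, ?)",
    [(1, "hi"), (2, "hello"), (1, "bye")],
)
moved = flush_batch(conn, batch_size=2)  # the periodic job would call this
```

The copy and the delete happen in one transaction, so a message is never lost or duplicated if the job fails halfway. The cost is the delay mentioned above: recipients only see a message after the next flush.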

Related

What is the optimal way of setting up a database for a messaging/email application?

I am currently trying to create an email style web app to allow users of my site to contact one another. I have created an SQL table for this, which has the following headings:
id
senderID
recipientID
timestamp
message
read (Boolean to record whether message has been read by recipient)
starred (Boolean to record whether message has been starred by recipient)
archived (Boolean to record whether message has been archived by recipient)
deleted (Boolean to record whether message has been deleted by recipient)
convoID
I have now started to realise that this table is insufficient. For example, if the conversation has been starred by a user, this does not tell me which of the 2 users has starred the convo, etc.
Can anyone suggest a way to avoid the above issue, and maybe suggest a better database structure?
I would recommend splitting your table into two; let's call them "message" and "star". So, they are:
message
-------
id
sender_id
recipient_id
timestamp
read
archived
deleted
convo_id
parent_id
star
----
message_id
user_id
timestamp
As you can see, I added parent_id to message. If you don't need a hierarchical structure, you can drop this column. A separate star table makes it possible to extend the starring feature later. Who knows, maybe in the future all users will be able to star a message, not only the participants of the conversation.
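To make the star table concrete, here is a small sketch. It runs on Python's built-in sqlite3 rather than MySQL (an assumption made only so the example is self-contained), the message table is trimmed to a few columns, and the composite primary key and the helper names star and starred_by are my own additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE message (
        id           INTEGER PRIMARY KEY,
        sender_id    INTEGER NOT NULL,
        recipient_id INTEGER NOT NULL
    );
    -- One row per (message, user): this records WHO starred the message.
    CREATE TABLE star (
        message_id INTEGER NOT NULL REFERENCES message (id),
        user_id    INTEGER NOT NULL,
        timestamp  TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (message_id, user_id)
    );
    INSERT INTO message (id, sender_id, recipient_id) VALUES (1, 10, 20);
""")


def star(conn, message_id, user_id):
    """Star a message for one user; starring twice is a no-op."""
    with conn:
        # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
        conn.execute(
            "INSERT OR IGNORE INTO star (message_id, user_id) VALUES (?, ?)",
            (message_id, user_id),
        )


def starred_by(conn, message_id):
    """Return the ids of the users who starred the given message."""
    rows = conn.execute(
        "SELECT user_id FROM star WHERE message_id = ? ORDER BY user_id",
        (message_id,),
    )
    return [row[0] for row in rows]


star(conn, 1, 20)  # the recipient stars the message
star(conn, 1, 20)  # repeated call, still a single row
```

The same pattern would work for read, archived and deleted if those also need to be tracked per user.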
In addition, there are some good articles about DB normalization. They will really help you build a well-organized DB structure:
What is Normalisation (or Normalization)?
http://www.studytonight.com/dbms/database-normalization.php
http://searchsqlserver.techtarget.com/definition/normalization
It depends on your application and how many users you will have.
For starred, archived and the other flags that both users can set, you can use an enumeration or simply a couple of values rather than a single boolean.
Or you can split each flag in two, for example read into senderRead and recipientRead.

Database design for a chat system

I know there are a lot of posts out there discussing DB design for a chat system, but they don't explain anything about the scalability of that design, so here is my question.
I want to design a DB for a real-time chat between 2 or more users. Let's take 2 users first; here is what I came up with.
Table 1:
name: User
fields: id, name
Table 2
name: Chat Room
fields: id, user1, user2
Table 3:
name: Message
fields: Chat_room_id, user_id, message
Now, with Facebook in mind: it has around 2 billion active users per month. Let's say 1 billion of them chat and each user sends 100 messages.
That makes 100 billion entries in the Message table, so the question is:
"Will MySQL or Postgres be able to handle this many entries and show a particular chat room's messages in real time?" If not, what would be the best practice to follow? I know that it also depends on the server on which the RDBMS is installed, but I still want to know the optimum architecture.
PS: I am using Django as backend and AngularJs for asynchronous behavior
100 billion rows in one table will never work online. At that scale you would not only apply every available partitioning scheme to reduce table sizes, but also separate active data from passive data. But leaving those lofty matters aside, the answer is:
Postgres itself works effectively with big data.
and yet:
Postgres has no strategy effective enough to fight poor design.
Look at your example: the chat_room table lists two users in separate columns. What for? You have user_id in messages referencing users.id, and you have chat_room.id there too, so you already have the data on which users were in that chat_room. If your idea was to pre-aggregate which users participated in a chat_room, over time or at all, make it one array column, like (chat_room.id int, users_id bigint[]), or if you want join time and leave time, add corresponding attributes. Active/passive data can be implemented by keeping archived chat_rooms in a different relation than the active ones. By the way, the aggregation of who participated in a chat room can be performed during that archiving...
The above is not a set of instructions, just an opinion. There is no best practice for database schema. First make a clear plan of what your chat will do, then make the db schema, try it, improve, try, improve, and so on until everything works. If you have concerns about how it will work with 100 billion rows, fill it up and check...
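To illustrate the point that the message table already knows who was in a chat room, here is a rough sketch. It uses Python's built-in sqlite3 instead of Postgres or MySQL (an assumption for the sake of a runnable example), and the index and the participants helper are hypothetical additions, not part of the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in (assumption)
conn.executescript("""
    CREATE TABLE message (
        chat_room_id INTEGER NOT NULL,
        user_id      INTEGER NOT NULL,
        message      TEXT NOT NULL
    );
    -- Hypothetical index so per-room lookups do not scan the whole table.
    CREATE INDEX idx_message_room_user ON message (chat_room_id, user_id);
    INSERT INTO message (chat_room_id, user_id, message) VALUES
        (1, 100, 'hi'),
        (1, 200, 'hey'),
        (1, 100, 'how are you?'),
        (2, 300, 'anyone here?');
""")


def participants(conn, chat_room_id):
    """Derive the users of a room from the messages sent in it."""
    rows = conn.execute(
        "SELECT DISTINCT user_id FROM message "
        "WHERE chat_room_id = ? ORDER BY user_id",
        (chat_room_id,),
    )
    return [row[0] for row in rows]


room_1_users = participants(conn, 1)  # [100, 200]
```

One limitation: a user who joined a room but never wrote anything does not show up here, which is a reason to keep explicit membership data (with join and leave times) if that matters.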

Performance considerations for table design of chat service

We are implementing a chat service, and I'd like some input on table design. Our service uses MySQL, and our DB has 2 tables, Threads and Messages. The Threads table stores all the chat threads, and the Messages table stores all the messages. A chat thread can have multiple messages, while a message belongs to only one thread. Each message is identified by a column in the Messages table called messageId.
We need to get the messageId of the last message of each thread from time to time in our service. I can see 2 options:
1 add a column called lastMessageId to Threads to keep track of the last message; each time a message is inserted into Messages table, we need to update Threads table as well;
2 each time we need the last message's id, perform a query on Messages table to find the last message;
Which option should I take, and why?
I would suggest going with option 2, for the following reasons.
You said that you need the last message id from time to time, which means not very frequently.
Running a query from time to time is less expensive than performing an extra update on every insert.
You can further tune the query by creating indexes on the Messages table.
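A sketch of what option 2 could look like. It assumes that messageId grows with insertion order (for example an auto-increment key), that Messages has a threadId column (the question does not name it), and it runs on Python's built-in sqlite3 as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE Messages (
        messageId INTEGER PRIMARY KEY,
        threadId  INTEGER NOT NULL,   -- hypothetical column name
        body      TEXT
    );
    -- Composite index: the last message of a thread can be read from the index.
    CREATE INDEX idx_messages_thread ON Messages (threadId, messageId);
    INSERT INTO Messages (messageId, threadId, body) VALUES
        (1, 1, 'a'), (2, 2, 'b'), (3, 1, 'c'), (4, 2, 'd'), (5, 1, 'e');
""")


def last_message_ids(conn):
    """Map each threadId to the id of its most recent message."""
    rows = conn.execute(
        "SELECT threadId, MAX(messageId) FROM Messages GROUP BY threadId"
    )
    return dict(rows.fetchall())


def last_message_id(conn, thread_id):
    """Last message id of one thread, or None if the thread has no messages."""
    return conn.execute(
        "SELECT MAX(messageId) FROM Messages WHERE threadId = ?", (thread_id,)
    ).fetchone()[0]


latest = last_message_ids(conn)  # {1: 5, 2: 4}
```

If the lookup later turns out to be on a hot path, option 1 (a lastMessageId column on Threads) is still available; measuring first is cheaper than maintaining the extra update.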

DB for Commenting System

I want to create a two-level status message system. What is the best way to create the tables?
Scope:
User sets a Status Message
Users Reply to the status message
this is a picture showing it
Tables I have created:
users (id, name .... )
status_messages (id, message, time, user_id)
status_message_replies (id, message, time, status_message_id, user_id)
Someone suggested this can be done in a single-table format:
status_messages (id, pid, message, time, user_id)
where pid = selfId or ParentId of the status.
I want to know which is the best method to create the system.
As long as the original messages and the responses have the same structure (set of attributes, or columns) then you can use the single table approach. It has the advantage that you can search over original messages and responses with a single query.
The set of original messages can be found where pid = selfid and the responses where pid <> selfid. If it's important to be able to see the original and response messages separately (without knowledge of the storage mechanism) you can encapsulate the above conditions in two VIEWs: OriginalMessages and Responses.
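The two views could look roughly like this. The sketch runs on Python's built-in sqlite3 as a stand-in for MySQL (an assumption), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE status_messages (
        id      INTEGER PRIMARY KEY,
        pid     INTEGER NOT NULL,     -- own id for an original, parent id for a reply
        message TEXT NOT NULL,
        time    TEXT DEFAULT CURRENT_TIMESTAMP,
        user_id INTEGER NOT NULL
    );
    CREATE VIEW OriginalMessages AS
        SELECT * FROM status_messages WHERE pid = id;
    CREATE VIEW Responses AS
        SELECT * FROM status_messages WHERE pid <> id;
    INSERT INTO status_messages (id, pid, message, user_id) VALUES
        (1, 1, 'Feeling great today', 10),
        (2, 1, 'Glad to hear it', 20),
        (3, 3, 'Anyone up for lunch?', 20),
        (4, 1, 'Same here', 30);
""")

# Callers query the views and never need to know about the pid convention.
originals = [row[0] for row in conn.execute(
    "SELECT id FROM OriginalMessages ORDER BY id")]
replies_to_1 = [row[0] for row in conn.execute(
    "SELECT id FROM Responses WHERE pid = 1 ORDER BY id")]
```

Here originals comes out as [1, 3] and replies_to_1 as [2, 4], while a search across both kinds of message is still a single query on status_messages.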
If the originals and responses have different attributes (for instance, if you want the original to allow links to URLs, photos, etc) you might consider using two separate tables. But even there, I'd probably argue for the one table structure with a separate, extender table for the additional attributes. That means you don't have to store often-empty columns for those original messages that don't use the extended attributes, and you can later easily add the extended attributes to the response messages as well (if desired).
A classical IS-A relationship: every reply is a message with an extra attribute (the message it is a reply to).
This is probably not the best way to model it. You'll be running the risk of having to write a lot of UNION queries over those two tables.
Alternatives:
just one table: status_messages (id, message, time, status_message_id, user_id), and allowing status_message_id to be NULL
use a HAS-A: one table status_messages (id, message, time, user_id) and one table replies (reply_id, replies_to_id)
The former has the disadvantage that working with NULL is tricky in SQL.
The latter will necessitate joins when you want to query replies specifically.
BTW it's much clearer (IMO) to name columns after the relationship they stand for, not the table they refer to.
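A short demonstration of why NULL is tricky in the single-table alternative. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption; the comparison rules shown are standard SQL), and count is just a throwaway helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE status_messages (
        id                INTEGER PRIMARY KEY,
        message           TEXT NOT NULL,
        time              TEXT DEFAULT CURRENT_TIMESTAMP,
        status_message_id INTEGER REFERENCES status_messages (id),  -- NULL for an original
        user_id           INTEGER NOT NULL
    );
    INSERT INTO status_messages (id, message, status_message_id, user_id) VALUES
        (1, 'original status', NULL, 10),
        (2, 'a reply', 1, 20);
""")


def count(where):
    """Count the rows matching a WHERE clause (throwaway helper)."""
    sql = "SELECT COUNT(*) FROM status_messages WHERE " + where
    return conn.execute(sql).fetchone()[0]


with_equals = count("status_message_id = NULL")    # 0: '= NULL' is never true
with_is_null = count("status_message_id IS NULL")  # 1: finds the original
not_reply_to_2 = count("status_message_id <> 2")   # 1: the original is skipped too
```

The last query is the kind that bites: the original message is not a reply to message 2 either, but because its status_message_id is NULL the comparison is unknown and the row is filtered out.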

How to implement one time (per user) messages in a web app?

When you want to alert a user of something once (one time notes about new features, upcoming events, special offers, etc.), what's the best way to do it?
I'm mainly concerned with the data representation, but if there are more issues to think about please point them out. This is my first time approaching this particular problem.
So my thoughts so far ...
You could have a users, a messages, and a seen/acknowledged messages table. When the user acknowledges the messages, we have a new entry in the seen table with a user id & message id pair.
However, the seen table will grow rapidly with the number of users and messages. At some point, this would become unwieldy (any insight when that would be for a single mysql db on a single server?).
Would it be better to just create 1 seen table per message and maybe end up with 20-30 such additional tables to start? Not really a problem. It just comes with the added nuisance of having to create a new table every time there is a new message (of course, that would be automated in the code - still a little more coding).
This is for a project that has 2-3K current users, but the hopes are to grow that to 10K over the next year, and of course, we're looking beyond that, too ...
Edit:
I'm not enthusiastic about the currently top voted method at all. The proposal seems to be to prepopulate a messages table and delete messages as they are seen. This seems to be a lot more work. Not only do you have to add your entire user list each time you add a new message, you also have to add all the messages for a new user each time you add a new user, which is separate logic.
On top of that, the record of a message being "seen" is actually the absence of a record. That does not seem right. Plus, if you later decide to track when messages were seen with a simple timestamp, you'd have to rewrite a lot of code, and other code becomes unusable.
Lastly, could someone tell me why it's so absolutely horrible to add new tables to the database? Doesn't this happen all the time when a new feature is added? Take any CMS: Joomla or Wordpress for example. When you add new plugins, you are creating tables dynamically. So it has to be more nuanced and contextual than "don't do it". What are the pitfalls and what are the circumstances under which you don't do it or it's okay to do?
I can see that you might say: be careful about creating new tables on a production server. Make sure it's been well tested. But ultimately, you're just adding an empty table.
This may require an extended answer, so if anyone knows any articles, please post them.
Edit: Gabriel Sosa gave a nice fleshed-out example of his messages table, and I'll simply create a seen table similar to what I originally posted, although with a timestamp column too. Thanks!
You could have the unseen messages listed in a table, and once the message is displayed you delete that row from the table. And you could also delete rows after X weeks, perhaps, whether or not the users see those messages. That would keep the table from growing unbounded. I'm imagining tables like this:
messages
--------
type PRIMARY KEY
text TEXT
unseenMessages
--------------
id PRIMARY KEY
messageType FOREIGN KEY
user FOREIGN KEY
expirationDate DATE
This unseenMessages table would hold all of the messages in your system, once per message per user. When a user loads a page you check if they have any entries in this table. If so you display those messages and then delete them from the table. Think of it like a message "inbox".
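Here is a rough sketch of that "inbox" behaviour. It runs on Python's built-in sqlite3 as a stand-in for MySQL (an assumption), the sample data is invented, and pop_unseen is a hypothetical helper name. Expiry by expirationDate is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE messages (
        type TEXT PRIMARY KEY,
        text TEXT NOT NULL
    );
    CREATE TABLE unseenMessages (
        id             INTEGER PRIMARY KEY,
        messageType    TEXT NOT NULL REFERENCES messages (type),
        user           INTEGER NOT NULL,
        expirationDate TEXT NOT NULL
    );
    INSERT INTO messages (type, text) VALUES
        ('new_feature', 'Try the new dashboard'),
        ('offer', '20% off this week');
    INSERT INTO unseenMessages (messageType, user, expirationDate) VALUES
        ('new_feature', 1, '2999-01-01'),
        ('offer',       1, '2999-01-01'),
        ('new_feature', 2, '2999-01-01');
""")


def pop_unseen(conn, user_id):
    """Return the texts a user has not seen yet and remove them from the inbox."""
    with conn:  # read and delete in a single transaction
        rows = conn.execute(
            "SELECT u.id, m.text FROM unseenMessages u "
            "JOIN messages m ON m.type = u.messageType "
            "WHERE u.user = ? ORDER BY u.id",
            (user_id,),
        ).fetchall()
        conn.executemany(
            "DELETE FROM unseenMessages WHERE id = ?",
            [(row[0],) for row in rows],
        )
    return [row[1] for row in rows]


first_visit = pop_unseen(conn, 1)   # both messages for user 1
second_visit = pop_unseen(conn, 1)  # empty, nothing left to show
```

Note that once a row is deleted there is no record left that the user ever saw the message, which is the trade-off of this design.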
Also, I would not do anything that involves dynamic table creation. You should never, ever* create tables on the fly. Ever.
* Except temporary tables, of course.
All of your messages should be stored in one table, or at least a fixed number of predefined tables. It is a cardinal database sin to create tables on the fly. The same goes for adding and removing columns on the fly. You just don't change the database schema dynamically. You don't do that in polite society. If you think you have to, you haven't designed the database correctly.
The programming analogue is the eval() function: it's just one of those things that's almost never a good idea. And in all fairness, eval() is okay in certain situations. Creating tables on the fly never is.
Your volume is not intimidating for a modern RDBMS. Keep in mind that there are many 100's of MILLIONS of records sitting in Twitter's MySQL database and other SQL Server and Oracle databases.
I can see two ways to actually solve this
Set a distinct cookie for the particular message, which is good if these messages are a rarity
Create a couple tables to hold the message details
Messages - would hold the message definition including the message text (or HTML in your case?), as well a message status of active/inactive.
User Messages - a cross reference table that includes a row if a user has viewed the message. When the user sees and acknowledges a message you could insert a row into this table.
To determine whether a user should see the message or not, you would query this table and the active messages with the user's ID. If a result is returned, then you should bypass the message, otherwise display it.
I think this provides you an opportunity to scale well into the future since the "User Messages" tables would only be integer association keys between the User/Account table and the Message table. You could also log the user's disposition on the User Message table (acknowledged, viewed, bypassed, etc...)
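The lookup described above can be sketched as follows. The code uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption), and the names user_messages, active, seen_at, pending and acknowledge are illustrative rather than anything prescribed. Instead of checking one message at a time, it asks for all active messages the user has not acknowledged yet, which is the same idea turned into a single anti-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    CREATE TABLE messages (
        id     INTEGER PRIMARY KEY,
        body   TEXT NOT NULL,
        active INTEGER NOT NULL DEFAULT 1
    );
    -- Cross reference: a row exists once a user has acknowledged a message.
    CREATE TABLE user_messages (
        user_id    INTEGER NOT NULL,
        message_id INTEGER NOT NULL REFERENCES messages (id),
        seen_at    TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (user_id, message_id)
    );
    INSERT INTO messages (id, body, active) VALUES
        (1, 'New feature', 1),
        (2, 'Old promo', 0),
        (3, 'Upcoming event', 1);
""")


def pending(conn, user_id):
    """Ids of active messages the user has not acknowledged yet."""
    rows = conn.execute(
        "SELECT m.id FROM messages m "
        "LEFT JOIN user_messages um "
        "  ON um.message_id = m.id AND um.user_id = ? "
        "WHERE m.active = 1 AND um.message_id IS NULL "
        "ORDER BY m.id",
        (user_id,),
    )
    return [row[0] for row in rows]


def acknowledge(conn, user_id, message_id):
    """Record that the user has seen a message; repeated calls are harmless."""
    with conn:
        # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
        conn.execute(
            "INSERT OR IGNORE INTO user_messages (user_id, message_id) "
            "VALUES (?, ?)",
            (user_id, message_id),
        )


before = pending(conn, 7)  # [1, 3]: message 2 is inactive
acknowledge(conn, 7, 1)
after = pending(conn, 7)   # [3]
```

New users and new messages need no extra bookkeeping here: a user with no rows in user_messages simply sees every active message.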
Let me know if this isn't clear, and I can try to explain better or provide a diagram. I'm sure there are some other patterns for doing this as well. Bank of America flashes these things after logging into my online accounts about once every month or so.
This was my approach:
CREATE TABLE IF NOT EXISTS `system_user_messages` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`section` enum('home','account','all') NOT NULL DEFAULT 'home',
`message` varchar(250) NOT NULL,
`message_type` varchar(25) NOT NULL,
`show` tinyint(4) NOT NULL DEFAULT '1',
`allow_dismiss` tinyint(4) NOT NULL DEFAULT '1',
`created_on` datetime NOT NULL,
`dismissed_on` datetime DEFAULT NULL,
`show_order` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx-user_id` (`user_id`),
KEY `idx-section` (`section`),
KEY `message_type` (`message_type`)
);
I added allow_dismiss because you may not want to allow the user to dismiss a message. In my case, when a user's CC is about to expire we don't allow the message to be dismissed, and the system removes it once the user updates the CC information. On the other hand, you may also want to show a message only in certain areas of your site.
I posted the SQL because I think it is clearest this way... I know there are a lot of improvements to make to this schema, but maybe it gives you an idea.
"You could have a users, a messages, and a seen/acknowledged messages table. When the user acknowledges the messages, we have a new entry in the seen table with a user id & message id pair."
That seems pretty reasonable.
"However, the seen table will grow rapidly with the number of users and messages. At some point, this would become unwieldy (any insight when that would be for a single mysql db on a single server?)."
Based on the number of users you're talking about, I wouldn't bet on that being a concern unless you send a lot of messages often. If it makes sense to retire messages, you could have a disabled flag on messages and a background job to remove rows from the seen table (warehouse it!) that correspond to disabled messages.
Performance wise, a lot of it is riding on your server's specs and what else it is doing. You can get a lot of performance out of a well-indexed, simple table even with lots (100s of thousands+) of rows and some configuration tweaks.
John Kugelman's solution (1 entry per user per message) makes more sense if you want to control who sees messages on a person/group level.
The techniques aren't mutually exclusive either.
Update:
Agree with no dynamic table creation. :)
How important are the messages? If they are truly just a view-once "hey, here's what's new", does it matter whether you know that everyone has seen them? (But maybe you want to measure effectiveness?) Why not just display them for a week or a month, or whatever interval your average user logs in at? Any key site updates can be put on a separate change log page for those really interested. If there are special offers, then they are time limited anyway, so why wouldn't users want to be reminded of them each time they log in? Or you could have part of the home page (or a link) dedicated to what's new on the site and record in a cookie whether the user has clicked it since it was last updated, and if not, highlight it for that user. Sorry for the rambling response.