Modeling a simple messenger application in a MySQL database

In this application there are users, conversations and messages.
More than two users can participate in a conversation.
My first idea was this (---- are relations, CAPITAL_WORDS are entities):
MESSAGE ---- CONVERSATION ---- USER
A message contains the sender and the content to be sent;
a conversation contains the users that participate in it.
But this is not enough, because the sender is a user, so there is another link between MESSAGE and USER. If I add this relation I have a problem, because the sender should be a user who participates in the conversation (there is an IS-A relation, if I'm not mistaken).
I really don't know how to model this problem. If a conversation were only ever between 2 users, I would need only MESSAGE and USER. In fact, the CONVERSATION entity is pretty strange.
If I eliminate CONVERSATION from the initial design, I have to add to MESSAGE a field that contains a list of participants. But that way I lose the concept of a conversation, and if I have to retrieve the messages related to a conversation I need a join over all the data, which is impractical.
I will appreciate every suggestion. I don't even know if a database is a good idea for this kind of application. My aim was to build something that is not too difficult to work with (keeping in mind that I may lose some performance).
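One possible way to express "the sender must be a participant of the conversation" is a composite foreign key from the message table to the participant table. The following is only a minimal sketch, not a definitive design: it uses Python's built-in sqlite3 instead of MySQL so that it runs without a server, and the table names (users, conversation, participant, message), the column names and the helper send() are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical schema: participant bridges users and conversation.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE conversation (id INTEGER PRIMARY KEY);
CREATE TABLE participant (
    conversation_id INTEGER NOT NULL REFERENCES conversation(id),
    user_id         INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (conversation_id, user_id)
);
CREATE TABLE message (
    id              INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL,
    sender_id       INTEGER NOT NULL,
    content         TEXT NOT NULL,
    -- The sender must appear in participant for this very conversation.
    FOREIGN KEY (conversation_id, sender_id)
        REFERENCES participant (conversation_id, user_id)
);
""")

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.execute("INSERT INTO conversation (id) VALUES (1)")
conn.executemany("INSERT INTO participant VALUES (?, ?)", [(1, 1), (1, 2)])


def send(conversation_id, sender_id, content):
    """Store a message; return False if the sender is not a participant."""
    try:
        conn.execute(
            "INSERT INTO message (conversation_id, sender_id, content) "
            "VALUES (?, ?, ?)",
            (conversation_id, sender_id, content),
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = send(1, 1, "hi")         # alice participates in conversation 1
rejected = send(1, 3, "let me in")  # carol does not
```

With this layout, retrieving the messages of one conversation is a filter on message.conversation_id and needs no list of participants inside each message.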

Related

Is there a scalability issue in having a one-to-many relationship between participants and conversation?

I have a database design as shown in following entity-relationship diagram (ERD):
https://app.dbdesigner.net/designer/schema/0-social_media-00a3405c-0bcd-4809-9f8e-e86c1b8e5f33
I was wondering if I should have a one-to-many relationship between Participants and Conversation.
Issue: need many joins
The issue is that we need to make a join every time we want to get the id of the Participants of a Conversation to broadcast Messages.
Not only that, but we also need the content of the Messages, meaning we need to make two joins between three tables.
Questions
Is there a more scalable solution for this?
Are there any bottleneck issues?
As an added bonus, is there anything else wrong with the tables?
Scalable because:
If one conversation attracts more and more Users (in their role as participants), you simply have to add rows in the table Participants. Imagine the conversation has a members-list, it's called Participants.
If one User account was deleted, you simply have to search for all his records (associated conversations) in table Participants and delete them as well.
Both cases mean only a modification of Participants, whereas the conversation remains untouched.
Associative Entity
This membership of a User in a Conversation is bridged by a so-called associative relationship, associative table or associative entity. It means one User can participate in zero or many Conversations and, vice versa, one Conversation can have one (at least the creator) or many participating Users.
So the entity/table Participants acts like a bridge: connecting two sides/perspectives.
Broadcast Example
User A wants to broadcast a message to the channel/conversation 1. Now the system needs to determine all recipients. So it looks only within the table Participants for conversation 1 and finds the participating Users A, B and C. All except the sender A should receive the broadcast: B and C.
There was no join involved. A simple query: SELECT user_id FROM participants WHERE conversation_id = 1 AND user_id <> 'A'. Given the Message and assuming that user_ids can be used directly as destination (email-address, phone-number, etc.), the system can immediately send the broadcast out.
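The lookup above can be tried out as a small sketch. It assumes a participants table with the columns conversation_id and user_id and uses Python's built-in sqlite3 rather than MySQL; the helper name recipients() and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE participants (
    conversation_id INTEGER NOT NULL,
    user_id         TEXT    NOT NULL,
    PRIMARY KEY (conversation_id, user_id)
)
""")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "D")],
)


def recipients(conversation_id, sender_id):
    """Everyone in the conversation except the sender; no join needed."""
    rows = conn.execute(
        "SELECT user_id FROM participants "
        "WHERE conversation_id = ? AND user_id <> ? "
        "ORDER BY user_id",
        (conversation_id, sender_id),
    )
    return [row[0] for row in rows]


print(recipients(1, "A"))  # ['B', 'C']
```

Because the primary key starts with conversation_id, this query can be answered from the key's index alone.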

What is the optimal way of setting up a database for a messaging/email application?

I am currently trying to create an email style web app to allow users of my site to contact one another. I have created an SQL table for this, which has the following headings:
id
senderID
recipientID
timestamp
message
read (Boolean to record whether message has been read by recipient)
starred (Boolean to record whether message has been starred by recipient)
archived (Boolean to record whether message has been archived by recipient)
deleted (Boolean to record whether message has been deleted by recipient)
convoID
I have now started to realise that this table is insufficient. For example, if the conversation has been starred by a user, the table does not tell me which of the 2 users starred the convo, etc.
Can anyone suggest a way to avoid the above issue, and maybe suggest a better database structure?
I would recommend splitting your table into two; let's call them "message" and "star". They are:
message
-------
id
sender_id
recipient_id
timestamp
read
archived
deleted
convo_id
parent_id
star
----
message_id
user_id
timestamp
As you can see, I added parent_id to message. If you don't need a hierarchical structure, you can drop this column. A separate star table makes it possible to extend the starring feature. Who knows, maybe in the future all users will be able to star a message, not only the participants of the conversation.
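As a rough sketch of how the star table records who starred what, the snippet below uses Python's built-in sqlite3 with a trimmed-down message table. The body column, the helper names star() and starred_by(), and the composite primary key on star are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (
    id           INTEGER PRIMARY KEY,
    sender_id    INTEGER NOT NULL,
    recipient_id INTEGER NOT NULL,
    body         TEXT    NOT NULL
);
CREATE TABLE star (
    message_id INTEGER NOT NULL REFERENCES message(id),
    user_id    INTEGER NOT NULL,
    timestamp  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (message_id, user_id)  -- one star per user per message
);
""")
conn.execute("INSERT INTO message VALUES (1, 10, 20, 'hello')")


def star(message_id, user_id):
    """Star a message for one user; starring twice is a no-op."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO star (message_id, user_id) VALUES (?, ?)",
        (message_id, user_id),
    )


def starred_by(message_id):
    """Return the ids of all users who starred the message."""
    rows = conn.execute(
        "SELECT user_id FROM star WHERE message_id = ? ORDER BY user_id",
        (message_id,),
    )
    return [row[0] for row in rows]


star(1, 20)  # the recipient stars the message
star(1, 20)  # repeated star is ignored
```

Each row in star answers the original question directly: it names the user who starred the message, so sender and recipient can star independently.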
In addition, there are some nice articles about DB normalization. They will really help you build a well-organized DB structure:
What is Normalisation (or Normalization)?
http://www.studytonight.com/dbms/database-normalization.php
http://searchsqlserver.techtarget.com/definition/normalization
It depends on your application and how many users you will have.
For starred, archived and the other flags that both users can set, you can use an enumeration or simply a pair of values rather than a single boolean.
Or you can split each flag in two, for example read into senderRead and recipientRead.

Django database models - messages and recipients: many-to-many or many-to-one?

I am using MySQL for my database and have the following Message model in my Django app:
class Message(models.Model):
    sender = models.ForeignKey(User, on_delete=models.CASCADE, null=True, related_name='sender_notification')
    recipient = models.ForeignKey(User, on_delete=models.CASCADE, related_name='recipient_notification')
    message = models.TextField()
    read = models.BooleanField(default=False)
    recieved_date = models.DateTimeField(auto_now_add=True)
If I am not mistaken, this way the relationship between recipient/User and Message is one-to-many, since a single Message record can only have one recipient, but a User might have many records in the Message table. However, there will be situations where I want to send the same Message to multiple Users. With my current schema, I can just add multiple Message records, one for each recipient, with the same message text. However, this seems like a lot of duplication. In this case, would my relationship make more sense as a many-to-many one?
Initially I thought the duplication (multiple Messages with the same message field but different recipient fields) was a bad idea, but the more I think about it, it seems like modeling the many-to-many relationship would actually be more difficult, and require more data. Each Message record has a read field to indicate whether the recipient has read the message or not. If I have a single Message record with multiple recipients then it seems like I will need another table to keep track of the recipients who have and have not read the message. This means that if I want to send a message to every user, I will need to add one row to the Message table, but will have to add one row to a "read record" table for each recipient. So not only am I creating more rows total (1 Message, and one read record each for n users, as opposed to n Messages for n users), I am also adding an additional table to the mix.
I have done many searches trying to determine whether it is better to have a single table with many rows, or multiple tables with fewer rows, and it seems like there is no definitive answer; instead, it depends on your data and requirements. I am just looking for some insight into the best direction to go in my particular case, but am also interested in any general rules that I might be able to follow in the future, if such rules exist for this kind of situation.
I am very open to having more work for myself up front in order to do things the "right" way (if there is such a thing in this case), if that happens to be necessary. Thank you!
A way of solving the issue with the "read" flag for multiple users is to use a many-to-many with a custom through model, and put that flag there:
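class UserMessage(models.Model):
    read = models.BooleanField(default=False)
    # on_delete=models.CASCADE is assumed here, mirroring the question's model.
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    # String reference, because Message is defined further down.
    message = models.ForeignKey('Message', on_delete=models.CASCADE)

class Message(models.Model):
    user = models.ManyToManyField(User, through=UserMessage)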
Now each user has an individual "read" field for each message, no matter how many users also read that message. Given the user and the message, you can get the intermediate model with UserMessage.objects.get(user=user, message=message).
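To make the row counts from the question concrete, here is a sketch of roughly what the through model amounts to at the table level. It is not Django code: it uses Python's built-in sqlite3, and the table name user_message, the column is_read (named that way because READ is a reserved word in MySQL) and the helpers send_to_many(), mark_read() and unread_users() are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (
    id      INTEGER PRIMARY KEY,
    message TEXT NOT NULL
);
CREATE TABLE user_message (
    user_id    INTEGER NOT NULL,
    message_id INTEGER NOT NULL REFERENCES message(id),
    is_read    INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, message_id)
);
""")


def send_to_many(text, user_ids):
    """Store the text once, plus one small link row per recipient."""
    cur = conn.execute("INSERT INTO message (message) VALUES (?)", (text,))
    message_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO user_message (user_id, message_id) VALUES (?, ?)",
        [(user_id, message_id) for user_id in user_ids],
    )
    return message_id


def mark_read(user_id, message_id):
    """Flip the per-user flag without touching other recipients."""
    conn.execute(
        "UPDATE user_message SET is_read = 1 "
        "WHERE user_id = ? AND message_id = ?",
        (user_id, message_id),
    )


def unread_users(message_id):
    """Return the ids of recipients who have not read the message yet."""
    rows = conn.execute(
        "SELECT user_id FROM user_message "
        "WHERE message_id = ? AND is_read = 0 ORDER BY user_id",
        (message_id,),
    )
    return [row[0] for row in rows]


mid = send_to_many("maintenance tonight", [1, 2, 3])
mark_read(2, mid)
```

The total is n + 1 rows instead of n, but the n link rows hold only two ids and a flag, while the message text is stored once.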

Basic Normalization Question

This might not exactly be a "normalization" question, it's more the type of data which I am saving.
I've just written a specification for a messaging and email system. The idea is that I need to save all of the messages which are internal to my web service, but also know whether an email has been sent with that message.
Here is the specification.
Specification
Any messages are stored in one table.
Messages can be from unregistered users, or registered users.
An unregistered user message will just have a return email address
A registered user message will have the user id of the sender
Messages are either owned by a User (meaning that they are the one the message was sent to) or shared by user roles.
When a message is owned by a user, we record some information about this message (same table as the message).
a) Has the user opened/read the message?
b) Was an _email sent_ to the owner of the message or is it just an internal message
c) Date the message was first read
d) Date the message was sent
When a message is sent to a group of users, meaning that they are sent to "All Users", or "All Owners" or "All SuperAdmin"...
a) The message is saved once in the messages table with a sent date
b) Each individual open is tracked in a separate table
c) A field records if a direct _email has been sent_, or if it is just saved internally in the system. (separate table)
Messages can be threaded, which means that if a message is responded to, the response is a child of the original message.
Messages have different "Types", meaning that a message can be "System Notice", "Enquiry", "Personal Message", "Private Message", "Transactional Information"
Messages which are linked to an enquiry for a product, will save the ID of the product they are enquiring for. (ie The relevant property).
End Specification
Now the actual question...
As you can see in bullet 1)(b), for a message sent to an individual user I am recording whether an email was also sent for that message.
However, when an email is sent to a group of users, I am recording whether an email was sent in a completely different table,
obviously because I can't save this information in the same table.
What are your opinions on this model? I'm not duplicating any data, but I'm separating where the data is saved. Should I just have an email_sent table to record all of this information?
It is hard to say whether your current design is good or bad. On the surface, I think that it is a mistake to separate the same piece of information into two places. It may seem easier to have a note about an individual email sent in the table which is closer to the individual and notes about emails sent to groups closer to the groups. However, your code is going to have to go looking in two places to find information about any email or about all emails in general.
If the meaning of the flag email_sent is the same for an individual user as it is for a member of a group of users, then looking in two places all the time for what is essentially one kind of information will be tedious (which in code terms comes down to being potentially slow and hard to support).
On the other hand, it may be that email_sent is something that is not important to your transactional or reporting logic and is just a mildly interesting fact that is "coming along for the ride". In this case, trying to force two different email_sent flags into one place may require an inconvenient and inadvisable mash-up of two entities that ought to be distinct because of all of their other, more important attributes.
It is difficult to give a conclusive answer without having a better understanding of your business requirement, but this is the trade-off you have to consider.
Create 3 tables:
MSG with id (key auto), msgtext, type (value U or R), userId/roleId
ROLES with roleId, userId
ACCS with userId, MsgId, date opened, read, etc
MSG records the message, with a type to see if it's from a role or unregistered user
ROLES points one role to many users
ACCS records everything, for a user, registered or not.
To retrieve, join the MSG type U with ACCS
join MSG type R with ROLES and then with ACCS
To retrieve all, UNION them
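The three-table suggestion is terse, so here is one possible reading of it as a runnable sketch using Python's built-in sqlite3. Several things are assumptions rather than part of the answer: that type 'U' means "addressed to one user" and 'R' means "addressed to a role", that the userId/roleId column is a single column called target_id, that an ACCS row is written for every recipient when the message is sent, and the helper name inbox().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE msg (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    msgtext   TEXT    NOT NULL,
    type      TEXT    NOT NULL CHECK (type IN ('U', 'R')),
    target_id INTEGER NOT NULL          -- a user id or a role id
);
CREATE TABLE roles (role_id INTEGER NOT NULL, user_id INTEGER NOT NULL);
CREATE TABLE accs (
    user_id INTEGER NOT NULL,
    msg_id  INTEGER NOT NULL REFERENCES msg(id),
    opened  INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, msg_id)
);
""")

conn.executemany("INSERT INTO msg (id, msgtext, type, target_id) VALUES (?, ?, ?, ?)",
                 [(1, "direct", "U", 7), (2, "to all owners", "R", 100)])
conn.executemany("INSERT INTO roles VALUES (?, ?)", [(100, 7), (100, 8)])
conn.executemany("INSERT INTO accs VALUES (?, ?, ?)",
                 [(7, 1, 0), (7, 2, 1), (8, 2, 0)])


def inbox(user_id):
    """Messages for one user: direct ones UNION those reaching them via a role."""
    rows = conn.execute(
        """
        SELECT m.id, m.msgtext, a.opened
        FROM msg m
        JOIN accs a ON a.msg_id = m.id AND a.user_id = m.target_id
        WHERE m.type = 'U' AND m.target_id = :uid
        UNION
        SELECT m.id, m.msgtext, a.opened
        FROM msg m
        JOIN roles r ON r.role_id = m.target_id
        JOIN accs a ON a.msg_id = m.id AND a.user_id = r.user_id
        WHERE m.type = 'R' AND r.user_id = :uid
        ORDER BY 1
        """,
        {"uid": user_id},
    )
    return rows.fetchall()
```

Under these assumptions the per-recipient state, including any email-sent flag, lives only in ACCS, whether the message was addressed to one user or to a role.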

Database Design: private chat, group chat, and emails

The communication between Facebook users seems to be stored in one long "conversation." So, emails sent and private chat messages exchanged all seem to be part of one long ongoing conversation.
I think this implementation works well for users (at least it does for me). I assume the table design for this part could be implemented this way:
TABLE: message
- message_id
- timestamp
- from_user_id
- to_user_id
- message
What if I wanted to support group chat? Would I do something like this:
TABLE: message
- message_id
- timestamp
- from_user_id
- message
TABLE: message_recipient
- message_recipient_id
- message_id
- to_user_id
I think it'll work. However, I'm wondering if it would make sense to the user if I displayed every single thing that user has ever messaged anyone in one long conversation. It probably won't. Imagine a conversation with Person A mixed with a group conversation with Persons A, B, C, D mixed with a conversation with Person E and so on ....
Any suggestion on what would be a usable concept to implement?
I believe a message should be an entity, regardless of platform or sender/receiver, with id,message,timestamp fields, and a message relation table - like you suggested - with id,message_id,from_id,to_id.
Then, if you are showing a single user to user conversation, you can show every message between them.
For group chats, you should have a table with id,title,timestamp that holds the group chat main record, and another table that holds the users that are part of that group chat, with id,group_chat_id,user_id fields.
Just my opinion and how I would implement it.
Edit: Maybe it would make sense to have from_id on the message entity itself, as a message has to have a singular sender id.
You could also group messages by topics.
You add a topic table. You add a recipients table, tied to a topic. Messages will also be tied to a topic.
You can programmatically find the topics between two users by looking at which topics have exactly those two users as recipients.
You could also separate your messages by giving them a type attribute. For example, type 0 will be an inbox message, type 1 will be a chat message and so on.
If I wanted to have an arbitrary number of recipients in one topic, I would avoid the from_id/to_id combo.
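The topic idea could look roughly like the sketch below, again with Python's built-in sqlite3. The table and column names (topic, topic_recipient, message, from_id, type) follow the answers loosely but are assumptions, as is the helper private_topics(), which looks for topics whose recipients are exactly the two given users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topic (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE topic_recipient (
    topic_id INTEGER NOT NULL REFERENCES topic(id),
    user_id  INTEGER NOT NULL,
    PRIMARY KEY (topic_id, user_id)
);
CREATE TABLE message (
    id       INTEGER PRIMARY KEY,
    topic_id INTEGER NOT NULL REFERENCES topic(id),
    from_id  INTEGER NOT NULL,            -- a message has a single sender
    type     INTEGER NOT NULL DEFAULT 0,  -- e.g. 0 = inbox, 1 = chat
    message  TEXT    NOT NULL
);
""")

conn.executemany("INSERT INTO topic VALUES (?, ?)",
                 [(1, "A and B"), (2, "group"), (3, "A and D")])
conn.executemany("INSERT INTO topic_recipient VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 1), (3, 4)])


def private_topics(user_a, user_b):
    """Topics whose recipient set is exactly {user_a, user_b}."""
    rows = conn.execute(
        """
        SELECT topic_id
        FROM topic_recipient
        GROUP BY topic_id
        HAVING COUNT(*) = 2
           AND SUM(CASE WHEN user_id IN (?, ?) THEN 1 ELSE 0 END) = 2
        ORDER BY topic_id
        """,
        (user_a, user_b),
    )
    return [row[0] for row in rows]


print(private_topics(1, 2))  # [1]
```

Since recipients hang off the topic rather than off each message, a topic can hold any number of users without a to_id column on message.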