This might not exactly be a "normalization" question; it's more about the type of data I am saving.
I've just written a specification for a messaging and email system. The idea is that I need to save all of the messages that are internal to my web service, and also know whether an email has been sent with each message.
Here is the specification.
Specification
Any messages are stored in one table.
Messages can be from unregistered users, or registered users.
An unregistered user message will just have a return email address
A registered user message will have the user id of the sender
Messages are either owned by a user (meaning that the user is the recipient) or shared by user roles.
When a message is owned by a user, we record some information about this message (same table as the message).
a) Has the user opened/read the message?
b) Was an _email sent_ to the owner of the message or is it just an internal message
c) Date the message was first read
d) Date the message was sent
When a message is sent to a group of users, meaning that they are sent to "All Users", or "All Owners" or "All SuperAdmin"...
a) The message is saved once in the messages table with a sent date
b) Each individual open is tracked in a separate table
c) A field records if a direct _email has been sent_, or if it is just saved internally in the system. (separate table)
Messages can be threaded: if a message is responded to, the response is a child of the original message.
Messages have different "Types", meaning that a message can be "System Notice", "Enquiry", "Personal Message", "Private Message", "Transactional Information"
Messages which are linked to an enquiry for a product will save the ID of the product they are enquiring about (i.e. the relevant property).
End Specification
Now the actual question...
As you can see in point (b) of the first list, for a message sent to an individual user I record whether an email was also sent for that message.
However, when a message is sent to a group of users, I record whether an email was sent in a completely different table.
Obviously, that is because I can't save this information in the same table.
What are your opinions on this model? I'm not duplicating any data, but I am separating where the data is saved. Should I just have an email_sent table to record all of this information?
It is hard to say whether your current design is good or bad. On the surface, I think that it is a mistake to separate the same piece of information into two places. It may seem easier to have a note about an individual email sent in the table which is closer to the individual and notes about emails sent to groups closer to the groups. However, your code is going to have to go looking in two places to find information about any email or about all emails in general.
If the meaning of the flag email_sent is the same for an individual user as it is for a member of a group of users, then looking in two places all the time for what is essentially one kind of information will be tedious (which in code terms comes down to being potentially slow and hard to support).
On the other hand, it may be that email_sent is something that is not important to your transactional or reporting logic and is just a mildly interesting fact that is "coming along for the ride". In this case, trying to force two different email_sent flags into one place may require an inconvenient and inadvisable mash-up of two entities that ought to be distinct because of all of their other, more important attributes.
It is difficult to give a conclusive answer without having a better understanding of your business requirement, but this is the trade-off you have to consider.
Create 3 tables:
MSG with id (key auto), msgtext, type (value U or R), userId/roleId
ROLES with roleId, userId
ACCS with userId, MsgId, date opened, read, etc
MSG records the message, with a type to see if it's from a role or unregistered user
ROLES points one role to many users
ACCS records everything, for a user, registered or not.
To retrieve, join the MSG rows of type U with ACCS,
and join the MSG rows of type R with ROLES and then with ACCS.
To retrieve all, UNION them
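The three tables and the UNION can be sketched roughly as follows, using Python's built-in sqlite3 so the example runs on its own. The column names (target_id, email_sent), the sample rows, and the use of LEFT JOIN (so that messages a user has not opened yet still appear) are assumptions made for illustration, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE msg (
    id        INTEGER PRIMARY KEY,
    msgtext   TEXT NOT NULL,
    type      TEXT NOT NULL CHECK (type IN ('U', 'R')),
    target_id INTEGER NOT NULL   -- userId when type = 'U', roleId when type = 'R'
);
CREATE TABLE roles (
    role_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    PRIMARY KEY (role_id, user_id)
);
CREATE TABLE accs (
    user_id     INTEGER NOT NULL,
    msg_id      INTEGER NOT NULL REFERENCES msg(id),
    date_opened TEXT,
    email_sent  INTEGER NOT NULL DEFAULT 0,  -- assumed column: one place for the flag
    PRIMARY KEY (user_id, msg_id)
);
INSERT INTO roles VALUES (10, 1), (10, 2);          -- role 10 has users 1 and 2
INSERT INTO msg VALUES (1, 'Hello user 1', 'U', 1),
                       (2, 'Notice to all owners', 'R', 10);
INSERT INTO accs VALUES (1, 1, '2024-01-01', 1),    -- user 1 opened message 1
                        (2, 2, '2024-01-02', 0);    -- user 2 opened message 2
""")

# Direct messages (type U) joined with ACCS, UNIONed with role messages
# (type R) joined through ROLES and then ACCS.
INBOX_SQL = """
SELECT m.id, m.msgtext, a.date_opened
FROM msg m
LEFT JOIN accs a ON a.msg_id = m.id AND a.user_id = m.target_id
WHERE m.type = 'U' AND m.target_id = :uid
UNION ALL
SELECT m.id, m.msgtext, a.date_opened
FROM msg m
JOIN roles r ON r.role_id = m.target_id
LEFT JOIN accs a ON a.msg_id = m.id AND a.user_id = r.user_id
WHERE m.type = 'R' AND r.user_id = :uid
ORDER BY 1
"""

def inbox(user_id):
    """Return (msg id, text, date opened or None) for one user."""
    return conn.execute(INBOX_SQL, {"uid": user_id}).fetchall()

user1_inbox = inbox(1)
user2_inbox = inbox(2)
```

With this layout the per-user facts (opened date, email flag) live only in ACCS, whether the message was addressed to a user or to a role.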
Related
I am using MySQL for my database and have the following Message model in my Django app:
from django.contrib.auth.models import User  # assumption: the default Django user model
from django.db import models

class Message(models.Model):
    sender = models.ForeignKey(User, on_delete=models.CASCADE, null=True, related_name='sender_notification')
    recipient = models.ForeignKey(User, on_delete=models.CASCADE, related_name='recipient_notification')
    message = models.TextField()
    read = models.BooleanField(default=False)
    recieved_date = models.DateTimeField(auto_now_add=True)
If I am not mistaken, this way the relationship between recipient/User and Message is one-to-many, since a single Message record can only have one recipient, but a User might have many records in the Message table. However, there will be situations where I want to send the same Message to multiple Users. With my current schema, I can just add multiple Message records, one for each recipient, with the same message text. However, this seems like a lot of duplication. In this case, would my relationship make more sense as a many-to-many one?
Initially I thought the duplication (multiple Messages with the same message field but different recipient fields) was a bad idea, but the more I think about it, it seems like modeling the many-to-many relationship would actually be more difficult, and require more data. Each Message record has a read field to indicate whether the recipient has read the message or not. If I have a single Message record with multiple recipients then it seems like I will need another table to keep track of the recipients who have and have not read the message. This means that if I want to send a message to every user, I will need to add one row to the Message table, but will have to add one row to a "read record" table for each recipient. So not only am I creating more rows total (1 Message, and one read record each for n users, as opposed to n Messages for n users), I am also adding an additional table to the mix.
I have done many searches trying to determine whether it is better to have a single table with many rows, or multiple tables with fewer rows, and it seems like there is no definitive answer; instead, it depends on your data and requirements. I am just looking for some insight into the best direction to go in my particular case, but am also interested in any general rules that I might be able to follow in the future, if such rules exist for this kind of situation.
I am very open to having more work for myself up front in order to do things the "right" way (if there is such a thing in this case), if that happens to be necessary. Thank you!
A way of solving the issue with the "read" flag for multiple users is to use a many-to-many with a custom through model, and put that flag there:
class UserMessage(models.Model):
    read = models.BooleanField()
    user = models.ForeignKey(User...)
    message = models.ForeignKey(Message...)

class Message(models.Model):
    user = models.ManyToManyField(User, through=UserMessage)
Now each user has an individual "read" field for each message, no matter how many users also read that message. Given the user and the message, you can get the intermediate model with UserMessage.objects.get(user=user, message=message).
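At the table level, a through model is a join table that carries the flag. The following is a rough sketch of that layout using Python's sqlite3 instead of Django, so it runs without a project; the table names, the is_read column, and the helper functions are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);
CREATE TABLE user_message (
    user_id    INTEGER NOT NULL,
    message_id INTEGER NOT NULL REFERENCES message(id),
    is_read    INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, message_id)
);
""")

def send_to_many(body, user_ids):
    """Store the text once, then add one join row per recipient."""
    cur = conn.execute("INSERT INTO message (body) VALUES (?)", (body,))
    message_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO user_message (user_id, message_id) VALUES (?, ?)",
        [(user_id, message_id) for user_id in user_ids],
    )
    return message_id

def mark_read(user_id, message_id):
    """Flip the flag for one recipient only."""
    conn.execute(
        "UPDATE user_message SET is_read = 1 WHERE user_id = ? AND message_id = ?",
        (user_id, message_id),
    )

def unread_for(user_id):
    """Return the bodies of the messages this user has not read yet."""
    rows = conn.execute(
        "SELECT m.body FROM message m "
        "JOIN user_message um ON um.message_id = m.id "
        "WHERE um.user_id = ? AND um.is_read = 0 ORDER BY m.id",
        (user_id,),
    )
    return [row[0] for row in rows]

msg_id = send_to_many("Maintenance tonight", [1, 2, 3])
mark_read(2, msg_id)
message_rows = conn.execute("SELECT COUNT(*) FROM message").fetchone()[0]
```

The row count is 1 + n rather than n, but the extra rows are narrow (two IDs and a flag), while the message text is stored only once.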
I'm writing an app that stores messages sent to users in a MySQL database. These messages can contain keywords that will be replaced by the user's data. My dilemma at the moment is how best to store the messages.
I have two options:
Store the original message (including keywords) in one table and the recipients in another. When I need to show the message, it can be processed before it is displayed. The biggest problem is that the displayed message will change each time the user changes their data.
Store the original message (including keywords) in one table, and use another table to store the recipients and the processed message each user received. The disadvantage is the possible duplication of data, which can be a headache if the same message is sent to 20,000 users.
I would suggest several tables.
message - table, which will store message text
user - table to store user account information
mail - table to store message_id, user_id_from, user_id_to, is_read and other attributes to be associated with the specific conversation.
In the message table you should store message templates. When a message is fetched for display, it should be rendered. If you need caching (because rendering consumes too many resources), you can add the rendered version of the message to the mail table.
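A small sketch of render-on-fetch with an optional cached copy, using Python's string.Template. The $keyword syntax, the in-memory dictionaries standing in for the three tables, and the rendered field are assumptions for illustration only.

```python
from string import Template

# Stand-ins for the message, user and mail tables (hypothetical data).
message_templates = {1: "Hello $first_name, your plan is $plan."}
users = {7: {"first_name": "Ana", "plan": "basic"}}
mail = [{"message_id": 1, "user_id_to": 7, "is_read": False, "rendered": None}]

def render(mail_row, cache=False):
    """Render the template for one mail row.

    A cached rendering, if present, wins over the current user data.
    """
    if mail_row["rendered"] is not None:
        return mail_row["rendered"]
    template = Template(message_templates[mail_row["message_id"]])
    # safe_substitute leaves unknown keywords in place instead of raising.
    text = template.safe_substitute(users[mail_row["user_id_to"]])
    if cache:
        mail_row["rendered"] = text
    return text

first = render(mail[0])                  # rendered from current user data
users[7]["plan"] = "pro"
second = render(mail[0], cache=True)     # reflects the change, then freezes it
users[7]["plan"] = "enterprise"
third = render(mail[0])                  # served from the cached copy
```

Without the cache the text follows the user's current data; storing the rendered copy freezes what the recipient saw, at the cost of one stored string per recipient.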
Here's what I came up with but I'm not sure which one of these is "the best". Perhaps there's another, better one that I may not know of. Keep in mind that I have both inbox and outbox in my app and messages deleted by either sender or recipient should still be visible to other related users unless they delete it themselves.
Option 1 - simple ManyToMany:
Tables:
User - just user fields
Message - just message fields
User_Message - contains 2 foreign keys: user_id and message_id
Example: When user sends a message, ONE message row is added to the Message table, and TWO rows are added to User_Message, obviously connecting sender and recipient with the added message. Now, this might get a little problematic when let's say I want to fetch only inbox messages because ManyToMany will fetch all of them so I came up with option 2.
Option 2 - OneToMany:
Tables:
User - just user fields
MessageReceived - message fields AND foreign key to user_id
MessageSent - message fields AND foreign key to user_id
Example: When a user sends a message, it is added to both the received and sent tables, but with a different user_id. The sender's ID goes in the sent table and the recipient's ID in the received table.
Now, when I want to fetch only inbox messages, I'm fetching messages from MessageReceived table and while deleting for example inbox (MessageReceived) message, copy of it still stays in MessageSent and is available to sender so everything is fine, however I feel like there's something "not cool" about this one because I'm basically keeping ALMOST the same data in both tables.
Please let me know what you think about this. If there is a better way to do it, I'm listening. Thanks for your time.
EDIT:
Both Madbreaks and Tab Alleman provided really good and somewhat similar solutions so thanks for that. I'm gonna go with Madbreaks one, simply because I prefer to delete the relations in join table instead of keeping a 'deleted' column but that's just my taste. Nevertheless, thank you both for your time and answers.
You shouldn't need to add 2 rows in user_messages for each message - have 3 columns in that table: sender_id, recipient_id, message_id.
EDIT
The deletion scenario you describe in your question, below, changes things. Instead of an n-to-n approach, you likely now have two 1-to-n relationships:
the relationship between sender and their many sent messages
the relationship between a recipient and their many received messages
I would probably have the messages table have a sender ID foreign key. I would then have a message_recipients table that maps user (recipient) ID to message ID.
Now, if a sender can delete a message but the recipients should still be able to access it (and know who the sender is), then you'll need four tables:
users
messages
message_sender (1-to-1 map) -- senders deleting sent messages deletes from here
message_recipients (1-to-n map) -- recipients deleting received messages deletes from here
It's not clear from your question whether or not this is a requirement; I only add it for completeness. You may want a trigger or a subsequent query to determine if/when there are no remaining relationships between the users and messages tables, and at that time (possibly) delete the message itself.
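A rough sketch of the four-table layout with the follow-up cleanup query, written against Python's sqlite3. It assumes messages keeps a sender_id column (so recipients still know the sender after the sender deletes their copy); the column names, sample users, and helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE messages (
    id        INTEGER PRIMARY KEY,
    sender_id INTEGER NOT NULL REFERENCES users(id),
    body      TEXT NOT NULL
);
-- A row here means the message is still in the sender's outbox.
CREATE TABLE message_sender (
    message_id INTEGER PRIMARY KEY REFERENCES messages(id),
    sender_id  INTEGER NOT NULL REFERENCES users(id)
);
-- A row here means the message is still in that recipient's inbox.
CREATE TABLE message_recipients (
    message_id   INTEGER NOT NULL REFERENCES messages(id),
    recipient_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (message_id, recipient_id)
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
""")

def send(sender_id, recipient_ids, body):
    """Insert one message plus its outbox and inbox rows."""
    cur = conn.execute(
        "INSERT INTO messages (sender_id, body) VALUES (?, ?)", (sender_id, body)
    )
    message_id = cur.lastrowid
    conn.execute("INSERT INTO message_sender VALUES (?, ?)", (message_id, sender_id))
    conn.executemany(
        "INSERT INTO message_recipients VALUES (?, ?)",
        [(message_id, recipient_id) for recipient_id in recipient_ids],
    )
    return message_id

def inbox(user_id):
    """Return (message id, sender id, body) rows still in a user's inbox."""
    return conn.execute(
        "SELECT m.id, m.sender_id, m.body FROM messages m "
        "JOIN message_recipients mr ON mr.message_id = m.id "
        "WHERE mr.recipient_id = ? ORDER BY m.id",
        (user_id,),
    ).fetchall()

def purge_orphans():
    """Delete messages that no sender or recipient row points at any more."""
    conn.execute(
        "DELETE FROM messages "
        "WHERE id NOT IN (SELECT message_id FROM message_sender) "
        "AND id NOT IN (SELECT message_id FROM message_recipients)"
    )

mid = send(1, [2], "Lunch?")
conn.execute("DELETE FROM message_sender WHERE message_id = ?", (mid,))  # sender deletes
after_sender_delete = inbox(2)
conn.execute(
    "DELETE FROM message_recipients WHERE message_id = ? AND recipient_id = ?",
    (mid, 2),
)  # recipient deletes
purge_orphans()
remaining = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Here the cleanup runs as an explicit query after each delete; a trigger would be the other option mentioned above.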
Here's what I would do (I am assuming a message can only have one sender, but multiple recipients)
UserTable - Contains UserID and Other info
MessageTable - Has MessageID, SenderID (FK to UserTable.UserID) and other info
MessageRecipientsTable - Has MessageID, RecipientID (FK to UserTable.UserID), and possibly other info like when/if it was received, etc.
If you want a recipient to be able to delete a message and still have it show for the sender (and other recipients), then you would add a "Deleted" column to the MessageRecipientsTable. You would never actually delete a row from the messages table, but when populating a recipient's inbox you would filter out the rows where "Deleted" is true.
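The "Deleted" column approach might look roughly like this in Python's sqlite3. The snake_case table names, the sample IDs, and the helper functions are assumptions; only the idea of filtering on a deleted flag comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (
    id        INTEGER PRIMARY KEY,
    sender_id INTEGER NOT NULL,
    body      TEXT NOT NULL
);
CREATE TABLE message_recipient (
    message_id   INTEGER NOT NULL REFERENCES message(id),
    recipient_id INTEGER NOT NULL,
    deleted      INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (message_id, recipient_id)
);
INSERT INTO message VALUES (1, 100, 'Quarterly report attached');
INSERT INTO message_recipient (message_id, recipient_id) VALUES (1, 200), (1, 300);
""")

def inbox(recipient_id):
    """Return (id, body) for messages the recipient has not deleted."""
    return conn.execute(
        "SELECT m.id, m.body FROM message m "
        "JOIN message_recipient mr ON mr.message_id = m.id "
        "WHERE mr.recipient_id = ? AND mr.deleted = 0 ORDER BY m.id",
        (recipient_id,),
    ).fetchall()

def delete_for_recipient(message_id, recipient_id):
    """Soft delete: flag the recipient's row instead of removing anything."""
    conn.execute(
        "UPDATE message_recipient SET deleted = 1 "
        "WHERE message_id = ? AND recipient_id = ?",
        (message_id, recipient_id),
    )

delete_for_recipient(1, 200)
total_messages = conn.execute("SELECT COUNT(*) FROM message").fetchone()[0]
```

Recipient 200 no longer sees the message, recipient 300 and the sender still do, and no row is ever removed.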
In this application there are users, conversations and messages.
More than 2 users could participate in a conversation.
I was thinking of this (---- are relations, CAPITAL_WORDS are entities):
MESSAGE ---- CONVERSATION ---- USER
msg contains sender and the content to be sent;
conversation contains the users that participate in that conversation;
But this is not enough, because the sender is a user, so there is another link between MSG and USER. If I add this relation I have a problem, because the sender should be a user who participates in the conversation (there is an IS-A relation, if I'm not mistaken).
I really don't know how to model this problem. If the conversation was between only 2 users, I would need only MESSAGE and USER. In fact the CONVERSATION entity is pretty strange.
If I eliminate CONVERSATION from the initial problem, I have to add a field to MESSAGE that contains a list of participants. But then I lose the concept of a conversation, and to retrieve the messages related to a conversation I would need a join across all the data, which is impractical.
I would appreciate any suggestions. I don't even know if a database is a good idea for this kind of application. My thought was to do something not too difficult to work with (keeping in mind that I may lose performance).
I have a program where the user can enter multiple email addresses to receive notifications. I'm creating a field in the database to keep track of this and I'm not sure what would be the best data type to choose for all the email addresses. At this point I believe we will limit it to 4 email addresses.
What data type would be appropriate here for MySQL?
Not sure this is relevant, but I plan to serialize the data (with a PHP function) when processing the email addresses. I'm interested in any feedback on my plans and whether there is a better way to do this.
This indicates that you have 1:many relation of user:email addresses. Create another table with user_id and email columns and link it up to your users table via user_id.
Never serialize data and stick it in a column; you'll regret it later.
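A minimal sketch of the separate table, using Python's sqlite3 so it runs standalone (in MySQL the email column would typically be a VARCHAR). The table and column names, the sample addresses, and enforcing the limit of 4 in application code are assumptions.

```python
import sqlite3

MAX_EMAILS = 4  # limit mentioned in the question, enforced in application code here

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_emails (
    user_id INTEGER NOT NULL REFERENCES users(id),
    email   TEXT NOT NULL,
    PRIMARY KEY (user_id, email)      -- one row per address, no duplicates
);
INSERT INTO users VALUES (1, 'alice');
""")

def add_email(user_id, email):
    """Add one notification address, refusing to exceed MAX_EMAILS."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM user_emails WHERE user_id = ?", (user_id,)
    ).fetchone()
    if count >= MAX_EMAILS:
        raise ValueError(f"user {user_id} already has {MAX_EMAILS} addresses")
    conn.execute(
        "INSERT INTO user_emails (user_id, email) VALUES (?, ?)", (user_id, email)
    )

def users_with_email(email):
    """Look up users by address, which a serialized column cannot do cheaply."""
    rows = conn.execute(
        "SELECT user_id FROM user_emails WHERE email = ? ORDER BY user_id", (email,)
    )
    return [row[0] for row in rows]

for address in ("a@example.com", "b@example.com", "c@example.com", "d@example.com"):
    add_email(1, address)

try:
    add_email(1, "e@example.com")
    fifth_rejected = False
except ValueError:
    fifth_rejected = True
```

Each address is an ordinary row, so it can be indexed, searched, and removed individually, and raising the limit later needs no schema change.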