Database schema for chat: private and group - mysql

I'm trying to design the database schema with the ability to both private chat and group chat. Here's what I've got so far:
So - the theory is that even if the user is just in a one on one private chat, they are still assigned a 'roomID', and each message they send is to that room.
To find out all the rooms they are involved in, I can SELECT a list from the table participants to find out.
This is okay, However it feels to me that the room table is slightly redundant, in that I don't really need a room name, and I could leave it out and simply use the participants table and SELECT DISTINCT roomID FROM particpants to find out the individual rooms.
Can anyone explain to me a better structure or why I should keep the room table at all?

Your schema looks perfectly fine, you might see the others (including myself today) came with more or less the same structure before (Storing messages of different chats in a single database table, Database schema for one-to-one and group chat, Creating a threaded private messaging system like facebook and gmail). I'd really like to note that your visual representation is the best of all, it's so easy to understand and follow :)
In general, I think having "room" ("chat", "conversation") makes sense even if you have no specific properties at the moment (as it might be name, posting_allowed, type (i.e. if you reuse the similar structure not only for private messages and chats but i.e. to public posts with comments) and so on. Single table with the single index ID should be super fast and have close to zero overhead, however it will allow extension quite easily without need to modify all existing code (i.e. one day you decide to add a name to chats).
Keeping the roomID logic "hidden" inside participants table will not be transparent and neither efficient (i.e. when you need to find next ID of the chat), I wouldn't recommend that.

I think you may need to refine your domain model a little - without that, it's hard to say whether your schema is "right".
Taking Slack as a model (note - I haven't done a huge amount of research on this, so the details may be wrong), you might say that your system has "chats".
A chat can be public - i.e. listed for all users to see and join - or private - i.e. not listed for all users, and only available by invitation.
Public chats must have a "name" attribute. Private chats may or may not have a name attribute.
A chat can have 2..n participants.
All 1-1 chats start as private by default.
It is possible to change a private chat to a public chat.
In that case, you have an inheritance/specialisation relationship - "private" and "public" are subtypes of "chat".
The relational model is notoriously bad at dealing with inheritance; there are lots of related questions on SO.

I know this is a little late in the game but I've made a few of these and I always have an active type bool col in the message table. Just incase someone says something you can hide it but still keep a record of it. As well as user_auth in the users table. Sometimes I put in the room table auth_required -> user.user_auth incase you want leveled conversations like in many discords and always a datetime in the message col. Those are the standards at min because you will regret later if you don't have them..

I would do it more like this for a simple chat system with groups and privat chat (two member).
A other posibility is to create a table only for group message and one for privat chat. (to avoid the n:m between group and message table or you use the n:m like a feature and not as a posible bug / logic error). If you want a more complex chat system look at Neville Kuyt post.
I hope I was able to help you.

Related

Database design & normalization

I'm creating a messaging system for a e-learning platform and there are some design concerns that I'd like some feedback on.
First of all, it is important for me and my system to be highly modifiable in the future. As such, maintaining a fairly high normalization across my tables is important.
On to how my system will work:
All members (students or teachers) are part of a virtual classroom.
Teachers can create tasks and exercises in these classrooms and assign them to one or multiple students (member_task table not illustrated).
A student can request help for a specific task or exercise by sending a message to the teachers of the classroom.
Messages sent by students are sent to all the teachers. They cannot address a message to a specific teacher.
Messages sent by teachers can be addressed to one or more students.
Students cannot send messages to other students.
Messages behave like chat, meaning that a private conversation starts between a student and all teachers when they send a message.
Here's the ER diagram I made:
So my question is, is this table normalized properly for my purpose? Is there anything that can be done to reduce redundancy of data across my tables? And out of curiosity, is it in BCNF?
Another question: I don't intend to ever implement delete features anywhere in my system. Only "archiving" where said classroom/task/member/message/whatever is simply hidden/deactivated. So is there any reason to actually use FK?
EDIT: Also, a friend brought to my attention that the Conversations table might be redundant, and it kinda feels so. Thoughts?
Thanks.
In response to your emphasis on "modifiability" which I'm taking to mean with respect to application and schema evolution I'm actually going to suggest a fairly extreme solution. Before that some notes some aspects you've mentioned. First, foreign keys represent meaningful constraints in your data. They should always be defined and enforced. Foreign keys are not there just for cascading delete. Second, the Conversations table is arguably redundant. It would make sense if you had a notion of "session" of chat which would correspond to a Conversation. Otherwise, you just have a bunch of messages throughout time. The Conversation table could also enable a many-to-many relation between messages and tasks/exercises if you wanted to have chats that simultaneously covered multiple exercises, for example.
Now for the extreme suggestion. You could use 6NF. In particular, you might look at its incarnation in anchor modeling. The most notable difference in this approach is each attribute is modeled as a different table. 6NF supports temporal databases (supported in anchor modeling via "historized" attributes/ties). This means handling situations like a student being associated to a task now but not later won't cause all their messages to disappear. Most relevant to you, all schema modifications are non-destructive and additive, so no old code breaks when you make a change.
There are downsides. First, it's a bit weird, and in particular anchor modeling (somewhat gratuitously?) introduces a bunch of new terms. Second, it produces weird queries for most relational databases which they may not optimize well. This can sometimes be resolved with materialized views. Third, at the physical level, every attribute is effectively nullable. Finally, the tooling and support, while present, is pretty young. In particular, for MySQL, you may only be "inspired by" what's provided on the anchor modeling site.
As far as the actual database model would go, it would look roughly similar. Anchor modeling uses the term "anchor" for roughly the same thing as an entity, and "tie" for roughly the same thing as a relation. For simplicity, dropping the Conversation relation (and thus directly connecting Message to Task), the image would be similar: you'd have an anchor for Classroom, Member, Message, and Task, and a tie replacing Recipient that you might called ReceivedMessage representing the relation of "member received message message". The attributes on your entities would be attribute nodes. Making the message attribute on the Message anchor historized would allow messages to be edited if desired and support a history of revisions.
One concern I have is that I don't see a Users table which will hold all the students and teachers info (login, email, system id, role, etc) but I assume there is something similar in our system?
Now, looking into the Members table: usually students change classes every semester or so and you don't want last semesters' students to receive new messages. I would suggest the following:
Members
=============
PK member_id
FK class_id
FK user_id
--------------
join_date
leave_date
active
role
The last two fields might be redundant:
active: is an alternative solution if you want to avoid using dates. This will become false when a user stops being member of this class. Since there is not delete feature, the Members entry has to be preserved for archive purposes (and historical log).
role: Depends on how you setup Users table and roles in your system. If a user entry has role field(s) then this is not needed. However, this field allows for the same user to assume different roles in different classes. Example: a 3rd year student, who was a member of this class 2 years ago, is now working as TA/LA (teaching/lab assistant) for the same class. This depends on how the institution works... in my BSc we had the "rule": anyone with grade > 8.5/10 in Java could volunteer to do workshops to other students (using uni's labs). Finally, this field if used as a mask or a constant, allows for roles to be extended (future-proof)
As for FKs I will always suggest using them for data consistency. Things can get really ugly really fast without FKs. The limitations they impose can be worked around and they are usually needed: What is the purpose of archiving a message with sender_id if the sender has been deleted by accident? Also, note that in most systems FKs are indexed which improves the performance of queries/joins.
Hope the above helps and not confuse things :)

Proper way to model user groups

So I have this application that I'm drawing up and I start to think about my users. Well, My initial thought was to create a table for each group type. I've been thinking this over though and I'm not sure that this is the best way.
Example:
// Users
Users [id, name, email, age, etc]
// User Groups
Player [id, years playing, etc]
Ref [id, certified, etc]
Manufacturer Rep [id, years employed, etc]
So everyone would be making an account, but each user would have a different group. They can also be in multiple different groups. Each group has it's own list of different columns. So what is the best way to do this? Lets say I have 5 groups. Do I need 8 tables + a relational table connecting each one to the user table?
I just want to be sure that this is the best way to organize it before I build it.
Edit:
A player would have columns regarding the gear that they use to play, the teams they've played with, events they've gone to.
A ref would have info regarding the certifications they have and the events they've reffed.
Manufacturer reps would have info regarding their position within the company they rep.
A parent would have information regarding how long they've been involved with the sport, perhaps relations with the users they are parent of.
Just as an example.
Edit 2:
**Player Table
id
user id
started date
stopped date
rank
**Ref Table
id
user id
started date
stopped date
is certified
certified by
verified
**Photographer / Videographer / News Reporter Table
id
user id
started date
stopped date
worked under name
website / channel link
about
verified
**Tournament / Big Game Rep Table
id
user id
started date
stopped date
position
tourney id
verified
**Store / Field / Manufacturer Rep Table
id
user id
started date
stopped date
position
store / field / man. id
verified
This is what I planned out so far. I'm still new to this so I could be doing it completely wrong. And it's only five groups. It was more until I condensed it some.
Although I find it weird having so many entities which are different from each other, but I will ignore this and get to the question.
It depends on the group criteria you need, in the case you described where each group has its own columns and information I guess your design is a good one, especially if you need the information in a readable form in the database. If you need all groups in a single table you will have to save the group relevant information in a kind of object, either a blob, XML string or any other form, but then you will lose the ability to filter on these criteria using the database.
In a relational Database I would do it using the design you described.
The design of your tables greatly depends on the requirements of your software.
E.g. your description of users led me in a wrong direction, I was at first thinking about a "normal" user of a software. Basically name, login-information and stuff like that. This I would never split over different tables as it really makes tasks like login, session handling, ... really complicated.
Another point which surprised me, was that you want to store the equipment in columns of those user's tables. Usually the relationship between a person and his equipment is not 1 to 1 and in most cases the amount of different equipment varies. Thus you usually have a relationship between users and their equipment (1:n). Thus you would design an equipment table and there refer to the owner's user id.
But after you have an idea of which data you have in your application and which relationships exist between your data, the design of the tables and so on is rather straitforward.
The good news is, that your data model and database design will develop over time. Try to start with a basic model, covering the majority of your use cases. Then slowly add more use cases / aspects.
As long as you are in the stage of planning and early implementation phasis, it is rather easy to change your database design.

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I get stuck on database tables design for handling this functionality. Solution is trivial, if we can do this only for one type of thing (eg. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow, as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same counts for comments.
c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles[idUser, idArticle], LikedPlace [idUser, idPlace]. Number of likes will be calculated by queries (which, I assume is bad). And...
I really don't like this design for the last part, it smells badly for me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Photo'
INSERT (user id, typeId, photoId)
and for places:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Place'
INSERT (user id, typeId, placeId)
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
At last, I also wonder which the best place to store counter for how many times the element was liked is. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea
please donĀ“t pay much attention to the field names styling, but more to the relation and structure
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "comment"
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "like"
as far as i understand. several tables are required. There is a many to many relation between them.
Table which stores the user data such as name, surname, birth date with a identity field.
Table which stores data types. these types may be photos, shares, links. each type must has a unique table. therefore, there is a relation between their individual tables and this table.
each different data type has its table. for example, status updates, photos, links.
the last table is for many to many relation storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to made particularly difficult or inefficient my one design choice or the other?
If not favour the one that requires the fewer tables
In this case:
Add Comment: you either pick a particular many/many table or insert into a common table with a known specific identifier for what is being liked, I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using table per entity for comments and etc. More tables - better sharding and scaling. It's not a problem to control many similar tables for all frameworks I know.
One day you'll need to optimize reads from such structure. You can easily create agragating tables over base ones and lose a bit on writes.
One big table with dictionary may become uncontrollable one day.
Definitely go with the second approach where you have one table and store the element type for each row, it will give you a lot more flexibility. Basically when something can logically be done with fewer tables it is almost always better to go with fewer tables. One advantage that comes to my mind right now about your particular case, consider you want to delete all liked elements of a certain user, with your first approach you need to issue one query for each element type but with the second approach it can be done with only one query or consider when you want to add a new element type, with the first approach it involves creating a new table for each new type but with the second approach you shouldn't do anything...

MySQL database model for signups with and without addresses

I've been thinking about this all evening (GMT) but I can't seem to figure out a good solution for this one. Here's the case...
I have to create a signup system which distinguishes 4 kinds of "users":
Individual sign ups (require address info)
Group sign ups (don't require address info)
Group contact (require address info)
Application users (don't require address info)
I really cannot come up with a decent way of modeling this into something that makes sense. I'd greatly appreciate it if you could share your ideas.
Thanks in advance!
Sounds like good case for single table inheritance
Requiring certain data is more a function of your application logic than your database. You can definitely define database columns that don't allow NULL values, but they can be set to "" (empty string) without any errors.
As far as how to structure your database, have two separate tables:
User
UserAddress
When you have a new signup that requires contact info, your application will create records in both tables. When a new signup doesn't require address info, your application will only create a record in the User table.
There are a couple considerations here: first, I like to look at User/Group as a case of a Composite pattern. It clearly meets the requirement: you often have to treat the aggregate and individual versions of the entity interchangeably (as you note). Implementing a composite in a database is not that hard. If you are using an ORM, it is pretty simple (inheritance).
On the other part of the question, you always have the ability to create data structures that are mostly empty. Generally, that's a bad idea. So you can say 'well, in the beginning, we don't have any information about the User so we will just leave all the other fields blank.' A better approach is to try and model the phases as if they were part of an FSM. One of the clearest ways to do this in this particular case is to distinguish between Users, Accounts and some other more domain-specific entity, e.g. Subscriber or Customer. Then, I can come and browse using User, sign up and make an Account, then later when you want address and other personal information, become a subscriber. This would also imply inheritance, and you have the added benefit of being able to have a true representation of the population at any time that doesn't require stupid shenanigans like 'SELECT COUNT(*) WHERE _ not null,' etc.
Here's a suggestion from my end after weighing pro's and con's on this model. As I think the ideal setup is to have all users be a user entity that belong to a group without differentiating groups from individuals (except of course flag a group contact person and creating a link with a groups table) we came up with the alternative to copy the group contact user details to the group members when they group is created.
This way all entities that actually are a person will get their own table.
Could this be a good idea? Awaiting your comments :)
I've decided to go with a construction where group members are separated from the user pool anyway. The group members eventually have no relation with a user since they don't require access to mutating their personal data, that's what a group contact person is for. Eventually I could add a possibility for groups to have multiple contact persons, even distinguishing persons that are or are not allowed to edit any member data.
That's my answer on this one.

Mysql database design

currently Im working on a project that, at first glance, will require many tables in a database. Most of the tables are fairly straightforward however I do have an issue. One of the tables will be a list of members for the website, things like username, password, contact info, bio, education, etc will be included. This is a simple design, however, there is also a need for each member to have their availability entered and store in the database as well. Availability is defined as a date and time range. Like available on 4/5/2011 from 1pm to 6pm EST, or NOT available every friday after 8pm EST. For a single user, this could be a table on its own, but for many users, Im not sure how to go about organizing the data in a manageable fashion. First thought would be to have code to create a table for each user, but that could mean alot of tables in the database in addition to the few I have for other site functions. Logically i could use the username appended to Avail_ or something for the table name ie: Avail_UserBob and then query that as needed. But im curious if anyone can think of a better option than having the potential of hundreds of tables in a single database.
edit
So general agreement would be to have a table for members, unique key being ID for instance. Then have a second table for availability (date, start time, end time, boolean for available or not, and id of member this applies to). Django might sound nice and work well, but i dont have the time to spend learning another framework while working on this project. The 2 table method seems plausable but Im worried about the extra coding required for features that will utilize the availability times to A) build a calender like page to add, edit, or remove entered values, and B) match availabilities with entries from another table that lists games. While I might have more coding, I can live with that as long as the database is sound, functional, and not so messy. Thanks for the input guys.
Not to sound like a troll, but you should take a look into using a web framework to build most of this for you. I'd suggest taking a look at Django. With it you can define the type of fields you wish to store (and how they relate) and Django builds all the SQL statements to make it so. You get a nice admin interface for free so staff can login and add/edit/etc.
You also don't have to worry about building the login/auth/change password, etc. forms. all that session stuff is taken care of by Django. You get to focus on what makes your project/app unique.
And it allow you to build your project really, really fast.
djangoproject.org
I don't have any other framework suggestions that meet your needs. I do... but I think Django will fit the bill.
Create a table to store users. Use its primary key as foreign key in other tables.
The databases are written to hold many many rows in a table. There are not optimized for table creation. So it is not a good idea to create a new table for each user. Instead give each user an unique identifier and put the availability in a separate table. Provide an additional flag to make an entry valid or invalid.
Create a table of users; then create a table of availabilities per user. Don't try to cram availabilities into the user table: that will guarantee giant grief for you later on; and you'll find you have to create an availabilities table then.
Google database normalization to get an idea why.
Take it as truth from one who has suffered such self-inflicted grief :-)