MySQL database model for signups with and without addresses - mysql

I've been thinking about this all evening (GMT) but I can't seem to figure out a good solution for this one. Here's the case...
I have to create a signup system which distinguishes 4 kinds of "users":
Individual sign ups (require address info)
Group sign ups (don't require address info)
Group contact (require address info)
Application users (don't require address info)
I really cannot come up with a decent way of modeling this into something that makes sense. I'd greatly appreciate it if you could share your ideas.
Thanks in advance!

Sounds like a good case for single table inheritance.

Requiring certain data is more a function of your application logic than your database. You can definitely define database columns that don't allow NULL values, but they can be set to "" (empty string) without any errors.
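To illustrate the point about NOT NULL and empty strings, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the signup table and street column are made-up names for the example only.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signup (id INTEGER PRIMARY KEY, street TEXT NOT NULL)"
)

# An empty string is not NULL, so the NOT NULL constraint lets it through.
conn.execute("INSERT INTO signup (street) VALUES ('')")

# A real NULL is rejected by the constraint.
try:
    conn.execute("INSERT INTO signup (street) VALUES (NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM signup").fetchone()[0]
```

So the constraint only guards against NULL; checking that the address is actually filled in remains the application's job.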
As far as how to structure your database, have two separate tables:
User
UserAddress
When you have a new signup that requires contact info, your application will create records in both tables. When a new signup doesn't require address info, your application will only create a record in the User table.
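A rough sketch of that two-table approach, assuming Python with sqlite3 standing in for MySQL. The column names, the kind values and the create_signup helper are all hypothetical; the rule about which kinds need an address is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL
);
CREATE TABLE user_address (
    user_id INTEGER PRIMARY KEY REFERENCES user(id),
    street  TEXT NOT NULL,
    city    TEXT NOT NULL
);
""")

# Application-level rule: these kinds of signup must supply an address.
REQUIRES_ADDRESS = {"individual", "group_contact"}

def create_signup(conn, name, kind, address=None):
    """Insert a user and, if an address is given, a user_address row."""
    if kind in REQUIRES_ADDRESS and address is None:
        raise ValueError(f"{kind} signups require an address")
    user_id = conn.execute(
        "INSERT INTO user (name, kind) VALUES (?, ?)", (name, kind)
    ).lastrowid
    if address is not None:
        conn.execute(
            "INSERT INTO user_address (user_id, street, city) VALUES (?, ?, ?)",
            (user_id, *address),
        )
    return user_id

alice = create_signup(conn, "Alice", "individual", ("1 Main St", "Leeds"))
bob = create_signup(conn, "Bob", "group_member")
with_address = conn.execute("SELECT COUNT(*) FROM user_address").fetchone()[0]
```

Only Alice gets a row in user_address; Bob exists in user alone.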

There are a couple considerations here: first, I like to look at User/Group as a case of a Composite pattern. It clearly meets the requirement: you often have to treat the aggregate and individual versions of the entity interchangeably (as you note). Implementing a composite in a database is not that hard. If you are using an ORM, it is pretty simple (inheritance).
On the other part of the question, you always have the ability to create data structures that are mostly empty. Generally, that's a bad idea. So you can say 'well, in the beginning, we don't have any information about the User so we will just leave all the other fields blank.' A better approach is to try and model the phases as if they were part of an FSM. One of the clearest ways to do this in this particular case is to distinguish between Users, Accounts and some other more domain-specific entity, e.g. Subscriber or Customer. Then, I can come and browse using User, sign up and make an Account, then later when you want address and other personal information, become a subscriber. This would also imply inheritance, and you have the added benefit of being able to have a true representation of the population at any time that doesn't require stupid shenanigans like 'SELECT COUNT(*) WHERE _ not null,' etc.

Here's a suggestion from my end after weighing the pros and cons of this model. I think the ideal setup is to have every user be a user entity that belongs to a group, without differentiating groups from individuals (except, of course, for flagging a group contact person and creating a link to a groups table). As an alternative, we came up with copying the group contact's user details to the group members when the group is created.
This way all entities that actually are a person will get their own table.
Could this be a good idea? Awaiting your comments :)

I've decided to go with a construction where group members are separated from the user pool anyway. The group members eventually have no relation with a user since they don't require access to mutating their personal data, that's what a group contact person is for. Eventually I could add a possibility for groups to have multiple contact persons, even distinguishing persons that are or are not allowed to edit any member data.
That's my answer on this one.

Related

Database schema for chat: private and group

I'm trying to design the database schema with the ability to both private chat and group chat. Here's what I've got so far:
So - the theory is that even if the user is just in a one on one private chat, they are still assigned a 'roomID', and each message they send is to that room.
To find out all the rooms they are involved in, I can SELECT a list from the table participants to find out.
This is okay. However, it feels to me that the room table is slightly redundant, in that I don't really need a room name, and I could leave it out and simply use the participants table and SELECT DISTINCT roomID FROM participants to find out the individual rooms.
Can anyone explain to me a better structure or why I should keep the room table at all?
Your schema looks perfectly fine. You might notice that others (including myself today) have come up with more or less the same structure before (Storing messages of different chats in a single database table, Database schema for one-to-one and group chat, Creating a threaded private messaging system like facebook and gmail). I'd really like to note that your visual representation is the best of all, it's so easy to understand and follow :)
In general, I think having a "room" ("chat", "conversation") makes sense even if it has no specific properties at the moment. Later it might get a name, posting_allowed, a type (e.g. if you reuse a similar structure not only for private messages and chats but also for public posts with comments), and so on. A single table with a single indexed ID should be super fast and have close to zero overhead, yet it will allow extension quite easily without the need to modify all existing code (e.g. if one day you decide to add a name to chats).
Keeping the roomID logic "hidden" inside the participants table would be neither transparent nor efficient (e.g. when you need to find the next ID for a chat), so I wouldn't recommend that.
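Since the diagram from the question isn't reproduced here, this is only a guess at the structure being discussed, written in Python with sqlite3 as a stand-in database. The table and column names (room, participants, message, roomID, userID) and the create_room helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE room (
    roomID INTEGER PRIMARY KEY,
    name   TEXT              -- optional; private chats can leave it NULL
);
CREATE TABLE participants (
    roomID INTEGER NOT NULL REFERENCES room(roomID),
    userID INTEGER NOT NULL,
    PRIMARY KEY (roomID, userID)
);
CREATE TABLE message (
    messageID INTEGER PRIMARY KEY,
    roomID    INTEGER NOT NULL REFERENCES room(roomID),
    userID    INTEGER NOT NULL,
    body      TEXT NOT NULL
);
""")

def create_room(conn, user_ids, name=None):
    """The room table hands out the next roomID; participants just reference it."""
    room_id = conn.execute(
        "INSERT INTO room (name) VALUES (?)", (name,)
    ).lastrowid
    conn.executemany(
        "INSERT INTO participants (roomID, userID) VALUES (?, ?)",
        [(room_id, user_id) for user_id in user_ids],
    )
    return room_id

def rooms_for_user(conn, user_id):
    rows = conn.execute(
        "SELECT roomID FROM participants WHERE userID = ? ORDER BY roomID",
        (user_id,),
    )
    return [row[0] for row in rows]

private = create_room(conn, [1, 2])                  # one-on-one chat
group = create_room(conn, [1, 2, 3], name="project") # group chat
rooms_for_user_1 = rooms_for_user(conn, 1)
rooms_for_user_3 = rooms_for_user(conn, 3)
```

With the room table in place, generating a new roomID is a plain insert, and adding a property such as a name later only touches that one table.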
I think you may need to refine your domain model a little - without that, it's hard to say whether your schema is "right".
Taking Slack as a model (note - I haven't done a huge amount of research on this, so the details may be wrong), you might say that your system has "chats".
A chat can be public - i.e. listed for all users to see and join - or private - i.e. not listed for all users, and only available by invitation.
Public chats must have a "name" attribute. Private chats may or may not have a name attribute.
A chat can have 2..n participants.
All 1-1 chats start as private by default.
It is possible to change a private chat to a public chat.
In that case, you have an inheritance/specialisation relationship - "private" and "public" are subtypes of "chat".
The relational model is notoriously bad at dealing with inheritance; there are lots of related questions on SO.
I know this is a little late in the game, but I've made a few of these and I always have an active bool column in the message table, just in case someone says something and you want to hide it but still keep a record of it. I also add user_auth to the users table. Sometimes I put auth_required -> user.user_auth in the room table, in case you want leveled conversations like in many Discords, and I always put a datetime column in the message table. Those are the minimum standards, because you will regret it later if you don't have them.
I would do it more like this for a simple chat system with groups and private chats (two members).
Another possibility is to create one table only for group messages and one for private chats (to avoid the n:m relationship between the group and message tables, unless you treat the n:m as a feature rather than as a possible bug or logic error). If you want a more complex chat system, look at Neville Kuyt's post.
I hope I was able to help you.

One or two tables: that is the quest*on

I'm trying to setup a database schema for a company which works as a middle man (selling items collected from vendors to buyers).
Both of these entities (vendors and buyers) can be generalized as a client - they both have very similar attributes (name, email, password, address, etc...) and multiple other entities depend on this. For example, invoices are generated for buyers and settlements (a different type of paperwork) are generated for vendors. The thing is that one person (a client) can be a buyer and a vendor at the same time.
The dilemma I'm having is how to set up the database structure for this.
At the moment I'm more in favor of having both vendors and buyers in one table and distinguish between them using something like roles column. Thanks to this approach I would avoid the data redundancy and I could still create views to easily separate vendors from buyers to the outside world.
Am I thinking about this correctly? How would you typically solve this situation? Would it be better to use two separate tables?
Thank you for your advice and experiences :)
If you know the use cases, think about what a rough solution could look like. But that is quite dangerous: in the end, the ingenious data model sometimes becomes too complicated to understand and maintain.
How important is it to decide now? Will your data model or organization be fit for a later change? Can you be agile? Then implement what is best for your current use cases, nothing more!
By the way:
If there is a 1-to-2 relationship between person and role, you should factor out the role rather than duplicate the data. Alternatively, create two attributes, isBuyer and isVendor, or put references to the buyer- and vendor-specific data, if there is any, in these attributes.
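One way the single-table idea with two flags plus views could look, sketched in Python with sqlite3 as a stand-in database. The client table, the is_buyer / is_vendor columns and the view names are illustrative assumptions, not a prescribed layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    email     TEXT NOT NULL UNIQUE,
    is_buyer  INTEGER NOT NULL DEFAULT 0,
    is_vendor INTEGER NOT NULL DEFAULT 0
);
-- Views present buyers and vendors separately to the outside world.
CREATE VIEW buyer  AS SELECT id, name, email FROM client WHERE is_buyer = 1;
CREATE VIEW vendor AS SELECT id, name, email FROM client WHERE is_vendor = 1;
""")

conn.executemany(
    "INSERT INTO client (name, email, is_buyer, is_vendor) VALUES (?, ?, ?, ?)",
    [
        ("Ann", "ann@example.com", 1, 0),  # buyer only
        ("Ben", "ben@example.com", 0, 1),  # vendor only
        ("Cy", "cy@example.com", 1, 1),    # both at the same time
    ],
)

buyers = [row[0] for row in conn.execute("SELECT name FROM buyer ORDER BY name")]
vendors = [row[0] for row in conn.execute("SELECT name FROM vendor ORDER BY name")]
```

A client who is both buyer and vendor is stored once and simply shows up in both views.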

Database design & normalization

I'm creating a messaging system for an e-learning platform and there are some design concerns that I'd like some feedback on.
First of all, it is important for me and my system to be highly modifiable in the future. As such, maintaining a fairly high normalization across my tables is important.
On to how my system will work:
All members (students or teachers) are part of a virtual classroom.
Teachers can create tasks and exercises in these classrooms and assign them to one or multiple students (member_task table not illustrated).
A student can request help for a specific task or exercise by sending a message to the teachers of the classroom.
Messages sent by students are sent to all the teachers. They cannot address a message to a specific teacher.
Messages sent by teachers can be addressed to one or more students.
Students cannot send messages to other students.
Messages behave like chat, meaning that a private conversation starts between a student and all teachers when they send a message.
Here's the ER diagram I made:
So my question is, is this table normalized properly for my purpose? Is there anything that can be done to reduce redundancy of data across my tables? And out of curiosity, is it in BCNF?
Another question: I don't intend to ever implement delete features anywhere in my system. Only "archiving" where said classroom/task/member/message/whatever is simply hidden/deactivated. So is there any reason to actually use FK?
EDIT: Also, a friend brought to my attention that the Conversations table might be redundant, and it kinda feels so. Thoughts?
Thanks.
In response to your emphasis on "modifiability", which I'm taking to mean with respect to application and schema evolution, I'm actually going to suggest a fairly extreme solution. Before that, some notes on some aspects you've mentioned. First, foreign keys represent meaningful constraints in your data. They should always be defined and enforced. Foreign keys are not there just for cascading delete. Second, the Conversations table is arguably redundant. It would make sense if you had a notion of a "session" of chat which would correspond to a Conversation. Otherwise, you just have a bunch of messages throughout time. The Conversation table could also enable a many-to-many relation between messages and tasks/exercises if you wanted to have chats that simultaneously covered multiple exercises, for example.
Now for the extreme suggestion. You could use 6NF. In particular, you might look at its incarnation in anchor modeling. The most notable difference in this approach is each attribute is modeled as a different table. 6NF supports temporal databases (supported in anchor modeling via "historized" attributes/ties). This means handling situations like a student being associated to a task now but not later won't cause all their messages to disappear. Most relevant to you, all schema modifications are non-destructive and additive, so no old code breaks when you make a change.
There are downsides. First, it's a bit weird, and in particular anchor modeling (somewhat gratuitously?) introduces a bunch of new terms. Second, it produces weird queries for most relational databases which they may not optimize well. This can sometimes be resolved with materialized views. Third, at the physical level, every attribute is effectively nullable. Finally, the tooling and support, while present, is pretty young. In particular, for MySQL, you may only be "inspired by" what's provided on the anchor modeling site.
As far as the actual database model would go, it would look roughly similar. Anchor modeling uses the term "anchor" for roughly the same thing as an entity, and "tie" for roughly the same thing as a relation. For simplicity, dropping the Conversation relation (and thus directly connecting Message to Task), the image would be similar: you'd have an anchor for Classroom, Member, Message, and Task, and a tie replacing Recipient that you might call ReceivedMessage, representing the relation "member received message". The attributes on your entities would be attribute nodes. Making the message attribute on the Message anchor historized would allow messages to be edited if desired and support a history of revisions.
One concern I have is that I don't see a Users table which will hold all the students' and teachers' info (login, email, system id, role, etc.), but I assume there is something similar in your system?
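A very loose sketch of the "one table per attribute, historized" idea, in Python with sqlite3 as a stand-in database. It does not follow the official anchor modeling naming conventions or tooling; the table names message and message_text and the changed_at column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- The "anchor": nothing but an identity.
CREATE TABLE message (
    message_id INTEGER PRIMARY KEY
);
-- One attribute in its own table; keeping every revision makes it historized.
CREATE TABLE message_text (
    message_id INTEGER NOT NULL REFERENCES message(message_id),
    changed_at TEXT NOT NULL,   -- ISO 8601 timestamp, sorts chronologically
    text       TEXT NOT NULL,
    PRIMARY KEY (message_id, changed_at)
);
""")

message_id = conn.execute("INSERT INTO message DEFAULT VALUES").lastrowid

# An edit is a new row, never an UPDATE, so earlier revisions are preserved.
conn.executemany(
    "INSERT INTO message_text (message_id, changed_at, text) VALUES (?, ?, ?)",
    [
        (message_id, "2024-01-01T10:00:00", "Plz help w/ task 3"),
        (message_id, "2024-01-01T10:05:00", "Please help with task 3"),
    ],
)

latest = conn.execute(
    "SELECT text FROM message_text WHERE message_id = ? "
    "ORDER BY changed_at DESC LIMIT 1",
    (message_id,),
).fetchone()[0]
revision_count = conn.execute(
    "SELECT COUNT(*) FROM message_text WHERE message_id = ?", (message_id,)
).fetchone()[0]
```

Adding another attribute later means adding another table, which is why schema changes in this style are additive.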
Now, looking into the Members table: usually students change classes every semester or so and you don't want last semesters' students to receive new messages. I would suggest the following:
Members
=============
PK member_id
FK class_id
FK user_id
--------------
join_date
leave_date
active
role
The last two fields might be redundant:
active: an alternative solution if you want to avoid using dates. This will become false when a user stops being a member of this class. Since there is no delete feature, the Members entry has to be preserved for archive purposes (and as a historical log).
role: Depends on how you setup Users table and roles in your system. If a user entry has role field(s) then this is not needed. However, this field allows for the same user to assume different roles in different classes. Example: a 3rd year student, who was a member of this class 2 years ago, is now working as TA/LA (teaching/lab assistant) for the same class. This depends on how the institution works... in my BSc we had the "rule": anyone with grade > 8.5/10 in Java could volunteer to do workshops to other students (using uni's labs). Finally, this field if used as a mask or a constant, allows for roles to be extended (future-proof)
As for FKs I will always suggest using them for data consistency. Things can get really ugly really fast without FKs. The limitations they impose can be worked around and they are usually needed: What is the purpose of archiving a message with sender_id if the sender has been deleted by accident? Also, note that in most systems FKs are indexed which improves the performance of queries/joins.
Hope the above helps and doesn't confuse things :)
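To make the Members suggestion a bit more concrete, here is a sketch in Python with sqlite3 as a stand-in database. The users and classes tables, their columns and the sample dates are assumptions; only the Members columns come from the layout above (the active flag is left out because the dates already carry that information).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE users   (user_id  INTEGER PRIMARY KEY, login TEXT NOT NULL);
CREATE TABLE classes (class_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE members (
    member_id  INTEGER PRIMARY KEY,
    class_id   INTEGER NOT NULL REFERENCES classes(class_id),
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    join_date  TEXT NOT NULL,
    leave_date TEXT,            -- NULL while the membership is ongoing
    role       TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "stu"), (2, "prof")])
conn.execute("INSERT INTO classes VALUES (1, 'Java 101')")
conn.executemany(
    "INSERT INTO members (class_id, user_id, join_date, leave_date, role) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "2023-09-01", "2024-01-31", "student"),  # last semester
        (1, 2, "2023-09-01", None, "teacher"),          # still a member
    ],
)

def current_members(conn, class_id, today):
    """Users whose membership covers the given ISO date."""
    rows = conn.execute(
        "SELECT user_id FROM members WHERE class_id = ? AND join_date <= ? "
        "AND (leave_date IS NULL OR leave_date > ?) ORDER BY user_id",
        (class_id, today, today),
    )
    return [row[0] for row in rows]

recipients = current_members(conn, 1, "2024-03-01")

# The foreign key refuses a membership that points at a non-existent user.
try:
    conn.execute(
        "INSERT INTO members (class_id, user_id, join_date, role) "
        "VALUES (1, 99, '2024-03-01', 'student')"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Last semester's student stays in the table for the archive but no longer shows up as a recipient.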

SQL Database Setup

I'm setting up a database to run practice management software for lawsuits. When adding people associated with the suit, some of them will be repeat parties (eg lawyers for the firm) and some will be one-off parties (witnesses, etc). Looking for input on whether to make 1 "case users" table with values for a user id as well as the rest of the info for the one-off parties, or make 2 tables, one being "case users-firm" with 2 columns for the case and the user id, and another "case users-other" with the one-off party information.
It's pretty common to have a "Persons" table, filled with things common to all people like first name, last name, and a primary key. Then store that key everywhere you might want a person. Who knows? Your lawyer might be a witness. No need to duplicate the entry, when they are in fact the same person.
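A small sketch of the shared Persons table, in Python with sqlite3 as a stand-in database. The cases and case_parties tables, the role values and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE persons (
    person_id  INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL
);
CREATE TABLE cases (
    case_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
-- Links a person to a case in a given role; the person is stored only once.
CREATE TABLE case_parties (
    case_id   INTEGER NOT NULL REFERENCES cases(case_id),
    person_id INTEGER NOT NULL REFERENCES persons(person_id),
    role      TEXT NOT NULL,
    PRIMARY KEY (case_id, person_id, role)
);
""")

conn.execute("INSERT INTO persons VALUES (1, 'Dana', 'Lee')")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [(1, "Smith v. Jones"), (2, "Doe v. Roe")],
)
# The same person is a lawyer on one case and a witness on another.
conn.executemany(
    "INSERT INTO case_parties VALUES (?, ?, ?)",
    [(1, 1, "lawyer"), (2, 1, "witness")],
)

roles_of_dana = conn.execute(
    "SELECT case_id, role FROM case_parties WHERE person_id = ? ORDER BY case_id",
    (1,),
).fetchall()
person_rows = conn.execute("SELECT COUNT(*) FROM persons").fetchone()[0]
```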
I don't see why you would want to have two tables, especially if they are going to have mostly the same fields. On the other hand, if you want to keep a lot more info for the attorneys than for the witnesses (or the other way around) two tables could be beneficial...
Just one other point to add. For privacy and data protection it is best not to store, encourage your users to store, or even set up a framework for storing, any more personal data than you actually need for your purposes. ( In the UK, think 'Data Protection' law ). So unless you are planning on evolving a kind of legal-profession facebook, I would keep the tables separate on this occasion.

Permissions for web site users

I'm working on a web site where each user can have multiple roles/permissions such as basic logging in, ordering products, administrating other users, and so on. On top of this, there are stores, and each store can have multiple users administrating it. Each store also has its own set of permissions.
I've confused myself and am not sure how best to represent this in a db. Right now I'm thinking:
users
roles
users_roles
stores
stores_users
But, should I also have stores_roles and stores_users_roles tables to keep track of separate permissions for the stores or should I keep the roles limited to a single 'roles' table?
I originally thought of having only a single roles table, but then what about users who have roles in multiple stores? I.e., if a user is given a role of let's say 'store product updating' there would need to be some method of determining which store this is referring to. A stores_users_roles table could fix this by having a store_id field, thus a user could have 'store product updating' and 'store product deletion' for store #42 and only 'store product updating' for store #84.
I hope I'm making sense here.
Edit
Thanks for the info everyone. Apparently I have some thinking to do. This is simply a fun project I'm working on, but RBAC has always been something that I wanted to understand better.
This is probably obvious to you by now, but role based access control is hard. My suggestion is, don't try to write your own unless you want that one part to take up all the time you were hoping to spend on the 'cool stuff'.
There are plenty of flexible, thoroughly-tested authorization libraries out there implementing RBAC (sometimes mislabeled as ACL), and my suggestion would be to find one that suits your needs and use it. Don't reinvent the wheel unless you are a wheel geek.
It seems likely to me that if I have permission to do certain roles in a set of stores, then I would probably have the same permissions in each store. So having a single roles table would probably be sufficient. So "joe" can do "store product updating" and "store product deletion", then have a user_stores table to list which stores he has access to. The assumption is for that entire list, he would have the same permissions in all stores.
If the business rules are such that he could update and delete in one store, but only update, no delete, in another store, well then you'll have to get more complex.
In my experience you'll usually be told that you need a lot of flexibility, then once implemented, no one uses it. And the GUI gets very complex and makes it hard to administer.
If the GUI does get complex, I suggest you look at it from the point of view of the store as well as the point of view of the user. In other words, instead of selecting a user, then selecting what permissions they have, and what stores they can access, it may be simpler to first select a store, then select which users have access to which roles in that store. Depends I guess on how many users and how many stores. In a past project I found it far easier to do it one way than the other.
Your model looks ok to me. The only modification I think you need is as to the granularity of the Role. Right now, your role is just an operation.
But first, you need a store_role table, a join table resolving the many-to-many relationship between a role and a store, i.e. one store can have many roles and one role can be done in many stores.
Eg: StoreA can CREATE, UPDATE, DELETE customer. and DELETE customer can be done in StoreA, StoreB and StoreC.
Next, you can freely associate users to store_role_id in the user_store_roles table.
Now, a user_store_role record will have a user_id and a store_role_id:
A collection of
SELECT * FROM USER_STORE_ROLE WHERE user_id = #userID
returns all permitted operations of the user in all the stores.
For a collection of a user's roles in a particular store, inner join the above with the user_store table and add a WHERE clause like
where STORE_ROLE.store_id = #storeID
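The store_role / user_store_role layout could be sketched as follows, using Python with sqlite3 as a stand-in database. The exact column names, the ids and the roles_in_store helper are assumptions; the sample data mirrors the store #42 / store #84 example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users  (user_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE roles  (role_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Which roles exist in which store.
CREATE TABLE store_role (
    store_role_id INTEGER PRIMARY KEY,
    store_id INTEGER NOT NULL REFERENCES stores(store_id),
    role_id  INTEGER NOT NULL REFERENCES roles(role_id),
    UNIQUE (store_id, role_id)
);
-- Which user holds which store-specific role.
CREATE TABLE user_store_role (
    user_id       INTEGER NOT NULL REFERENCES users(user_id),
    store_role_id INTEGER NOT NULL REFERENCES store_role(store_role_id),
    PRIMARY KEY (user_id, store_role_id)
);
""")

conn.execute("INSERT INTO users VALUES (7, 'joe')")
conn.executemany("INSERT INTO stores VALUES (?, ?)", [(42, "A"), (84, "B")])
conn.executemany(
    "INSERT INTO roles VALUES (?, ?)",
    [(1, "store product updating"), (2, "store product deletion")],
)
conn.executemany(
    "INSERT INTO store_role VALUES (?, ?, ?)",
    [(1, 42, 1), (2, 42, 2), (3, 84, 1)],
)
# joe: update + delete in store 42, update only in store 84.
conn.executemany(
    "INSERT INTO user_store_role VALUES (?, ?)", [(7, 1), (7, 2), (7, 3)]
)

def roles_in_store(conn, user_id, store_id):
    """Role names a user holds in one particular store."""
    rows = conn.execute(
        """
        SELECT r.name
        FROM user_store_role usr
        JOIN store_role sr ON sr.store_role_id = usr.store_role_id
        JOIN roles r       ON r.role_id = sr.role_id
        WHERE usr.user_id = ? AND sr.store_id = ?
        ORDER BY r.name
        """,
        (user_id, store_id),
    )
    return [row[0] for row in rows]

roles_42 = roles_in_store(conn, 7, 42)
roles_84 = roles_in_store(conn, 7, 84)
```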
Put a store_id in the user_roles table.
If this is Rails, the user model would use has_many :stores, :through => :roles