Right design to structurally ensure data consistency - mysql

In my current design, I have app_group, student and group_article:
To structurally ensure that a group_article is only associated with a student from that same group, the foreign keys "publisher" and "app_group" are taken from the join entity group_member (1) as opposed to having them issued from student and app_group individually. This way, someone with the right to insert new records into the database cannot introduce incoherent data, such as an article written by a student who isn't even in that group, which would be poor design.
Now, I want to generalize this approach to multiple students or multiple groups. I now have group_message, group_message_in and group_message_out, which form an inheritance chain (group_message is the base, an abstract entity in Symfony, and both group_message_in and group_message_out extend it):
Initially, I was planning to embed the group foreign key on the base class (group_message) and have the sender/recipient (respectively on group_message_out and group_message_in) be taken from student directly:
However, this will leave the database vulnerable to incoherence as in the first example, e.g. a student from group A can be associated with a message that targets a student from group B, which is not desirable (only students from the same group can exchange a group_message).
I'm well aware that I can mitigate this risk in code, but I want a solution similar to (1), and I want to know whether it is achievable with Doctrine, since MySQL itself might have ways of solving a similar problem that aren't supported by Doctrine.

A relational solution to your problem would look something like this:
The integrity that you seek would be achieved by the PK-FK relationships and by assigning a student to a group using the groupName columns.
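Since the diagram is not reproduced here, the following is only a rough sketch of the idea, not the original model. It assumes hypothetical table and column names (group_member, group_message, group_name, sender_id, recipient_id), flattens the group_message_in/group_message_out inheritance chain into a single table for brevity, and uses SQLite through Python purely so the example can be run as-is. The point is that both composite foreign keys share the same group_name column, so sender and recipient must be members of the same group.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE app_group (group_name TEXT PRIMARY KEY);
CREATE TABLE student   (student_id INTEGER PRIMARY KEY);
CREATE TABLE group_member (
    group_name TEXT    NOT NULL REFERENCES app_group (group_name),
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    PRIMARY KEY (group_name, student_id)
);
CREATE TABLE group_message (
    message_id   INTEGER PRIMARY KEY,
    group_name   TEXT    NOT NULL,
    sender_id    INTEGER NOT NULL,
    recipient_id INTEGER NOT NULL,
    -- Both composite keys share group_name, so sender and recipient
    -- must each be a member of that same group.
    FOREIGN KEY (group_name, sender_id)
        REFERENCES group_member (group_name, student_id),
    FOREIGN KEY (group_name, recipient_id)
        REFERENCES group_member (group_name, student_id)
);
"""


def build_db():
    """Create an in-memory database with two groups and three students."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO app_group VALUES (?)", [("A",), ("B",)])
    conn.executemany("INSERT INTO student VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany(
        "INSERT INTO group_member VALUES (?, ?)",
        [("A", 1), ("A", 2), ("B", 3)],
    )
    return conn


def try_message(conn, group_name, sender_id, recipient_id):
    """Return True if the message row is accepted, False if a key rejects it."""
    try:
        conn.execute(
            "INSERT INTO group_message (group_name, sender_id, recipient_id) "
            "VALUES (?, ?, ?)",
            (group_name, sender_id, recipient_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this data, a message inside group A between students 1 and 2 is accepted, while a message from student 1 (group A) to student 3 (group B) is rejected whichever group_name is supplied. MySQL's InnoDB engine supports multi-column foreign keys in the same way.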
Your question then becomes something like "How can I use Doctrine to do the same thing?"
To the best of my knowledge Doctrine uses a set of PHP libraries to create what its proponents call a "persistence layer" that stores what it calls "Entities". With Doctrine, the term "Entity" is a synonym for "Class" in the OO paradigm.
In other words Doctrine stores classes in the data layer.
And now we can see the problem.
A relational schema is a structure of relations which is a completely different kind of artefact than a collection of classes.
The OO/Relational divide has been called an "impedance mismatch". Unfortunately this term obscures more than it reveals.
To quote from the Wikipedia article: "There have been some attempts at building object-oriented database management systems (OODBMS) that would avoid the impedance mismatch problem. They have been less successful in practice than relational databases however, partly due to the limitations of OO principles as a basis for a data model."
I suggest that you also review Ted Neward's article "The Vietnam of Computer Science."

This new answer shows the object-role model, the relational schema that it generates and the logic that is implied by the new constraint (shown by the red arrow)
The object-role model.
This is the logic that is asserted by the fact type Student(.id) is a member of Group(.name)
Now as the domain expert, you can read this verbalization and tell me whether it is True or False in your domain.
Please note that all I did as the modeler, was to change the constraint (shown by the red arrow) and the ORM tool called NORMA generated the new verbalization that you see here.
When the domain expert agrees that the model conforms to the requirements then it takes a few seconds to generate the SQL DDL that can then be used to create a new database schema in an RDBMS.

Related

ER Diagram, Physical Data Model Relations

I am trying to create a very simple database Supermarket management system.
And it seems that I am having a problem with how relations work between entities. I am using PowerDesigner to create the ERD and then generate everything from it (LDM, PDM, OOM). Is this a bad idea?
Now for my main problem. It's between these 3 tables:
Employee(Cashier)
Customer
Orders(Receipt).
The way I did it is:
The customer gathers the products he wants to buy and presents them to the employee, then the employee gets the order for the customer from the machine, so:
There is a relation between the Customer and the Employee (Many to Many) : each customer can request_order from one or more Employee and each Employee can get_order to one or more Customer.
There is a relation between the Employee and the Orders (1 To Many) : each Employee can get one or more orders, each order is fetched by one employee.
The problem is if I want to know the customer related to that specific order... I can't.
How do I fix this? How can I get the specific order that a customer made?
I am still very new to this, so sorry for any obvious mistakes.
I am sticking to the Relational Database context, that you have tagged.
Data Modelling is an iterative process. There is a lot more definition that is needed, before the data model can be complete. Rather than answering the specifics that you request, which would be limited to one iteration; one increment, allow me to provide something more complete, several iterations progressed.
If it is useful, please discuss this data model, and progress it to fulfil all your requirements.
Of course it is too small as an inline graphic. It is also available as a PDF: Supermarket Data Model.
The Standard for Relational Data Modelling since 1983 is IDEF1X. For those unfamiliar with the Standard, refer to the short IDEF1X Introduction.
I am using PowerDesigner to create the ERD and then generate everything from it (LDM, PDM, OOM). Is this a bad idea?
PowerDesigner is great. Just ignore the Oracle-specific nonsense, it pushes you into considering the physical far too early.
Skip the ERD, it is brain-dead in the context of the Relational paradigm, and surpassed by IDEF1X, which is specific to that paradigm.
Use the Entity Level display for ERD equivalence.
For small projects you can ignore the academic distinctions {CDM; LDM; PDM; OOM, etc}.
There is actually just one model: it is "conceptual" at the beginning, and you just progress to "logical", and last, when the "logical" is stable, to the "physical".
Understand that the whole process is Logical.
Unfortunately, in PD you have to have separate "models" or files for each.
Now for my main problem. It's between these 3 tables:
I have solved that issue. And exposed others.
each customer can request_order from one or more Employee and each Employee can get_order to one or more Customer
Yes, but that is the overall result. In each shopping or presentation instance:
a customer can request_order from one Employee (Cashier)
an Employee can get_order from one Customer
The problem is if I want to know the customer related to that specific order... I can't. How do I fix this?
Solved: Each Order is Identified by (CustomerId, DateTime), ie. the Customer who created the Order.
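As a rough illustration of that key (the full data model is in the PDF and is not reproduced here), a minimal sketch could look like the following. The table and column names (customer_order, order_dtm, employee_id) are my assumptions, and SQLite via Python is used only so it can be run directly. Because CustomerId is part of the Order's key, the customer of any given order is known without a separate Customer-Employee relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Hypothetical names. The Order is identified by (customer_id, order_dtm);
-- the cashier who processed it is an ordinary foreign key column.
CREATE TABLE customer_order (
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    order_dtm   TEXT    NOT NULL,
    employee_id INTEGER NOT NULL REFERENCES employee (employee_id),
    PRIMARY KEY (customer_id, order_dtm)
);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO employee VALUES (10, 'Cara');
INSERT INTO customer_order VALUES (1, '2024-01-05 09:00:00', 10);
INSERT INTO customer_order VALUES (2, '2024-01-05 09:05:00', 10);
""")


def orders_handled_by(conn, employee_id):
    """List (customer name, order time) for every order a cashier processed."""
    return conn.execute(
        "SELECT c.name, o.order_dtm "
        "FROM customer_order o "
        "JOIN customer c ON c.customer_id = o.customer_id "
        "WHERE o.employee_id = ? "
        "ORDER BY o.order_dtm",
        (employee_id,),
    ).fetchall()


def place_order(conn, customer_id, order_dtm, employee_id):
    """Return False when the same customer already has an order at that time."""
    try:
        conn.execute(
            "INSERT INTO customer_order VALUES (?, ?, ?)",
            (customer_id, order_dtm, employee_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Here orders_handled_by(conn, 10) returns both orders together with the customer who made each one, and the composite key refuses a second order for the same customer at the same instant.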
Note
Do not mix Process elements (eg. Get_Order) with Data elements (eg. the data model). The two areas are separate, and governed by quite different science. Here we are solving the Data; only the Data; and nothing but the Data. After that, the Process Model is easy.
RecordIds are anti-Relational. They are certainly not needed in a Relational database. Read my other Answers for detailed explanations.
Relational Keys (aka Compound Keys or Composite Keys) are standard fare in a Relational database. They provide far more integrity than a RecordId based file ever can.
You need to be more precise (state the exact sequence) in defining how an Order is created.
Please feel free to comment or ask specific questions.

How to design a "dynamic" relational database

We're building a new piece of software for our company, where we want to manage our inventory.
The goal for the tool is to be customizable by the customer.
My part is mostly on the DB side. We have chosen MariaDB as our DB engine, and while we are working with the rather static functionality of a relational DB, we want to realize a rather dynamic solution.
Our chief programmer has explained to me the basics of the concept I shall implement into our DB:
We want a table which basically just consists of other tables.
Lets call it "maintable".
Maintable shall then reference its "attributes", which are the other tables.
For example, maintable references "Workstations".
"Workstations" then contains attributes like CPU, RAM, Drives, PSU etc..
And now comes the part which I didn't completely understand. The actual VALUES to these attributes in "Workstations" shall not be inserted into "Workstations". Instead, they are packed into another (junction?) table.
The reason for this approach is that the customer shall be able to customize the DB to his needs.
When the customer wants to add another attribute, he shall be able to do so. For example, if a new PSU now requires another attribute for an additional serial number, then the customer shall be able to simply create this new attribute in the front-end input form and then persist it to the DB.
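To make the description concrete, here is a minimal sketch of how I currently picture the structure. This pattern is commonly called entity-attribute-value (EAV). Everything except "maintable" and "Workstations" is a name I made up for illustration (attribute, item, attribute_value), all values are stored as text, and SQLite via Python is used only so the sketch can be run; it is not our actual MariaDB schema.

```python
import sqlite3

# Hypothetical EAV-style layout; names other than maintable are assumptions.
SCHEMA = """
CREATE TABLE maintable (
    type_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE          -- e.g. 'Workstations'
);
CREATE TABLE attribute (
    attribute_id INTEGER PRIMARY KEY,
    type_id      INTEGER NOT NULL REFERENCES maintable (type_id),
    name         TEXT NOT NULL,           -- e.g. 'CPU', 'RAM'
    UNIQUE (type_id, name)
);
CREATE TABLE item (
    item_id INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL REFERENCES maintable (type_id)
);
-- The actual values live here, one row per (item, attribute).
CREATE TABLE attribute_value (
    item_id      INTEGER NOT NULL REFERENCES item (item_id),
    attribute_id INTEGER NOT NULL REFERENCES attribute (attribute_id),
    value        TEXT,
    PRIMARY KEY (item_id, attribute_id)
);
"""


def add_attribute(conn, type_name, attr_name):
    """Let the customer define a new attribute without altering any table."""
    conn.execute(
        "INSERT INTO attribute (type_id, name) "
        "SELECT type_id, ? FROM maintable WHERE name = ?",
        (attr_name, type_name),
    )


def set_value(conn, item_id, attr_name, value):
    """Store a value for an attribute that belongs to the item's type."""
    conn.execute(
        "INSERT OR REPLACE INTO attribute_value (item_id, attribute_id, value) "
        "SELECT i.item_id, a.attribute_id, ? "
        "FROM item i JOIN attribute a ON a.type_id = i.type_id "
        "WHERE i.item_id = ? AND a.name = ?",
        (value, item_id, attr_name),
    )


def read_item(conn, item_id):
    """Return the item's attributes as a {name: value} dict."""
    rows = conn.execute(
        "SELECT a.name, v.value FROM attribute_value v "
        "JOIN attribute a ON a.attribute_id = v.attribute_id "
        "WHERE v.item_id = ?",
        (item_id,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO maintable VALUES (1, 'Workstations')")
conn.execute("INSERT INTO item VALUES (1, 1)")
for attr in ("CPU", "RAM"):
    add_attribute(conn, "Workstations", attr)
set_value(conn, 1, "CPU", "i7")
set_value(conn, 1, "RAM", "16 GB")
```

Adding the extra serial number from the example is then just add_attribute(conn, "Workstations", "PSU serial") followed by set_value(...), with no schema change. The visible cost in this sketch is that every value is untyped text and reading one item requires a join.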
If someone could point to good tutorials explaining this type of DB concept, then I would be glad as well! :=)

Database design & normalization

I'm creating a messaging system for an e-learning platform and there are some design concerns that I'd like some feedback on.
First of all, it is important for me and my system to be highly modifiable in the future. As such, maintaining a fairly high normalization across my tables is important.
On to how my system will work:
All members (students or teachers) are part of a virtual classroom.
Teachers can create tasks and exercises in these classrooms and assign them to one or multiple students (member_task table not illustrated).
A student can request help for a specific task or exercise by sending a message to the teachers of the classroom.
Messages sent by students are sent to all the teachers. They cannot address a message to a specific teacher.
Messages sent by teachers can be addressed to one or more students.
Students cannot send messages to other students.
Messages behave like chat, meaning that a private conversation starts between a student and all teachers when they send a message.
Here's the ER diagram I made:
So my question is, is this table normalized properly for my purpose? Is there anything that can be done to reduce redundancy of data across my tables? And out of curiosity, is it in BCNF?
Another question: I don't intend to ever implement delete features anywhere in my system. Only "archiving" where said classroom/task/member/message/whatever is simply hidden/deactivated. So is there any reason to actually use FK?
EDIT: Also, a friend brought to my attention that the Conversations table might be redundant, and it kinda feels so. Thoughts?
Thanks.
In response to your emphasis on "modifiability", which I'm taking to mean with respect to application and schema evolution, I'm actually going to suggest a fairly extreme solution. Before that, some notes on a few aspects you've mentioned.
First, foreign keys represent meaningful constraints in your data. They should always be defined and enforced. Foreign keys are not there just for cascading deletes.
Second, the Conversations table is arguably redundant. It would make sense if you had a notion of a chat "session" which would correspond to a Conversation. Otherwise, you just have a bunch of messages throughout time. The Conversations table could also enable a many-to-many relation between messages and tasks/exercises if you wanted to have chats that simultaneously covered multiple exercises, for example.
Now for the extreme suggestion. You could use 6NF. In particular, you might look at its incarnation in anchor modeling. The most notable difference in this approach is each attribute is modeled as a different table. 6NF supports temporal databases (supported in anchor modeling via "historized" attributes/ties). This means handling situations like a student being associated to a task now but not later won't cause all their messages to disappear. Most relevant to you, all schema modifications are non-destructive and additive, so no old code breaks when you make a change.
There are downsides. First, it's a bit weird, and in particular anchor modeling (somewhat gratuitously?) introduces a bunch of new terms. Second, it produces weird queries for most relational databases which they may not optimize well. This can sometimes be resolved with materialized views. Third, at the physical level, every attribute is effectively nullable. Finally, the tooling and support, while present, is pretty young. In particular, for MySQL, you may only be "inspired by" what's provided on the anchor modeling site.
As far as the actual database model goes, it would look roughly similar. Anchor modeling uses the term "anchor" for roughly the same thing as an entity, and "tie" for roughly the same thing as a relation. For simplicity, dropping the Conversation relation (and thus directly connecting Message to Task), the image would be similar: you'd have an anchor for Classroom, Member, Message, and Task, and a tie replacing Recipient that you might call ReceivedMessage, representing the relation "member received message". The attributes on your entities would be attribute nodes. Making the message attribute on the Message anchor historized would allow messages to be edited if desired and support a history of revisions.
One concern I have is that I don't see a Users table which will hold all the students' and teachers' info (login, email, system id, role, etc.), but I assume there is something similar in your system?
Now, looking into the Members table: usually students change classes every semester or so, and you don't want last semester's students to receive new messages. I would suggest the following:
Members
=============
PK member_id
FK class_id
FK user_id
--------------
join_date
leave_date
active
role
The last two fields might be redundant:
active: is an alternative solution if you want to avoid using dates. This will become false when a user stops being member of this class. Since there is not delete feature, the Members entry has to be preserved for archive purposes (and historical log).
role: Depends on how you set up the Users table and roles in your system. If a user entry has role field(s) then this is not needed. However, this field allows the same user to assume different roles in different classes. Example: a 3rd year student, who was a member of this class 2 years ago, is now working as TA/LA (teaching/lab assistant) for the same class. This depends on how the institution works... in my BSc we had the "rule": anyone with grade > 8.5/10 in Java could volunteer to do workshops for other students (using the uni's labs). Finally, this field, if used as a mask or a constant, allows for roles to be extended (future-proof).
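A minimal sketch of the suggested Members table and how the dates keep past students out of new messages. The users and classes tables, the sample rows, and the two role values are assumptions made for illustration, the optional active flag is left out in favour of the dates, and SQLite via Python is used only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users   (user_id  INTEGER PRIMARY KEY, login TEXT NOT NULL UNIQUE);
CREATE TABLE classes (class_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE members (
    member_id  INTEGER PRIMARY KEY,
    class_id   INTEGER NOT NULL REFERENCES classes (class_id),
    user_id    INTEGER NOT NULL REFERENCES users (user_id),
    join_date  TEXT NOT NULL,   -- ISO dates compare correctly as text
    leave_date TEXT,            -- NULL while the membership is current
    role       TEXT NOT NULL CHECK (role IN ('student', 'teacher'))
);
INSERT INTO users   VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO classes VALUES (1, 'Java 101');
-- alice was a student two years ago and is now back as a teaching assistant.
INSERT INTO members VALUES (1, 1, 1, '2023-09-01', '2024-01-31', 'student');
INSERT INTO members VALUES (2, 1, 1, '2025-09-01', NULL,         'teacher');
INSERT INTO members VALUES (3, 1, 2, '2025-09-01', NULL,         'student');
INSERT INTO members VALUES (4, 1, 3, '2020-01-01', NULL,         'teacher');
""")


def current_members(conn, class_id, role, on_date):
    """Logins of users holding `role` in the class on the given ISO date."""
    rows = conn.execute(
        "SELECT u.login FROM members m "
        "JOIN users u ON u.user_id = m.user_id "
        "WHERE m.class_id = ? AND m.role = ? "
        "AND m.join_date <= ? "
        "AND (m.leave_date IS NULL OR m.leave_date > ?) "
        "ORDER BY u.login",
        (class_id, role, on_date, on_date),
    ).fetchall()
    return [login for (login,) in rows]
```

With these rows, a message sent to the students of class 1 on 2025-10-01 reaches only bob, while alice's old membership row is preserved for the archive and her new row makes her one of the teachers.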
As for FKs I will always suggest using them for data consistency. Things can get really ugly really fast without FKs. The limitations they impose can be worked around and they are usually needed: What is the purpose of archiving a message with sender_id if the sender has been deleted by accident? Also, note that in most systems FKs are indexed which improves the performance of queries/joins.
Hope the above helps and doesn't confuse things :)

Improving my Database Design for future scalability

Well, I am working on a project which might involve thousands of users & I don't have much experience in databases especially when it involves relationships between entities.
Let me explain my scenario. First there's a User who can log in to our system using his credentials. We have a module in our system which will enable him to create Projects. So that brings a relationship between the User table and the Projects table.
Now there's another module, namely the Team Creation Module, which does what it says. Out of the list of available members, he can pick who he likes and add them to a team. So there are tables for that: Members and Team. Furthermore, a member can be a part of many teams, a team can have many members, and a "User" can be a member as well.
I have designed the database myself but I am not sure if it is a good or bad one. Moreover, I would really appreciate it if someone could point me to good tutorials which show how to insert into or update tables involving relationships.
Here's my design till now:
Update
After a discussion with someone on IRC, I came up with a revised design. I merged "User" & "Members" table as User is also a Member.
My question still remains the same: am I on the right track?
It's great that you're thinking long-term, but your solution won't work long-term.
This is not the first time this sort of thing has been tried. Rely on the wisdom of those who have messed up before. Read data modeling pattern books.
Abstract and Normalize. That's how you get to a good long-term solution.
At least read up on The Party Model. A group and individual are actually the same (abstract) thing.
Put actually different things in different tables. An Address and Member don't belong in the same table.
"Am I on the right track" is not a useful question - we have no way of telling, because it depends on where you are headed.
A couple of things:
it's a good idea to name the relation columns after the relationship. For instance, in the first diagram, the "owner" of the project should not be called users_user_id - that's meaningless. Call it "owner_id" or something that meaningfully describes the relationship between the project and members table.
in the second diagram, you appear to have a "many to many" relationship between members and projects in the members table - but there's no efficient way of storing the id of more than one project in the members table. You need to factor that out into a joining table - projects_members, for instance, just like you did with teams_members.
the "teams_members" table has a primary key called tm_id. A purist would tell you this is wrong - the unique identifier for that table should be the combination of member_id and team_id. You don't need another unique identifier - and in fact it's harmful, because you must guarantee uniqueness of the member_id and team_id combination.
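To illustrate the last two points, here is a minimal sketch with the relationship column named for its meaning and the joining table keyed on the pair itself. The column lists are trimmed-down assumptions rather than your actual diagram, and SQLite via Python is used only so it runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE teams   (team_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE projects (
    project_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    -- Named after the relationship, not after the referenced table.
    owner_id   INTEGER NOT NULL REFERENCES members (member_id)
);
-- No tm_id: the (team_id, member_id) pair is the primary key, so the
-- same member cannot be added to the same team twice.
CREATE TABLE teams_members (
    team_id   INTEGER NOT NULL REFERENCES teams (team_id),
    member_id INTEGER NOT NULL REFERENCES members (member_id),
    PRIMARY KEY (team_id, member_id)
);
INSERT INTO members VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO teams   VALUES (1, 'Backend');
""")


def add_to_team(conn, team_id, member_id):
    """Return True if the membership was stored, False if a key rejected it."""
    try:
        conn.execute(
            "INSERT INTO teams_members (team_id, member_id) VALUES (?, ?)",
            (team_id, member_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling add_to_team(conn, 1, 1) twice succeeds the first time and fails the second, which is exactly the uniqueness guarantee a surrogate tm_id would not give you on its own.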
As Neil says, you probably want to start reading up on this. I can recommend 'Database Systems: Design, Implementation, and Management' by Coronel et al.

CakePHP alternative to Class Table Inheritance?

I want to create a Class Table Inheritance model in CakePHP.
I would like to have a Model called something like ProductBase with the table product_bases to hold all the base information every product should have, like upc, price, etc.
Then have specific product type models extend that. For example ProductRing with the table product_rings to hold specific ring information like ring_size, center_stone, etc.
Then if I retrieve data directly from the ProductBase model, have it pull all types:
// pull all product types
$this->ProductBase->find('all');
Or find specific types only:
// pull only Rings or descendants of the Ring type.
$this->ProductRing->find('all');
Is anything like this possible in CakePHP? If not, what should I be doing instead?
What is the proper Cake way of doing something like this?
I worked with CakePHP for two years and found no satisfactory solution for this, so one day I wrote one. I built a new kind of ORM that works as a plugin on top of CakePHP 2.x. I called it "Cream".
It works similarly to the entities of CakePHP 3.0, but in addition supports multi-table inheritance. It also supports very convenient data structure browsing (lazy loading) and is very easy to configure. In my opinion it is more powerful than what CakePHP 3.0 offers right now. Data structure browsing works as follows:
$entity = new Entity('SomeModel', $somePrimaryKeyValue);
$foo = $entity->RelatedModel()->YetAnotherRelatedModel()->someProperty();
However, it is important to note that in Cream, each entity object is a compound of a series of models and primary key values that are merged together, at least in the case where model inheritance is used. Such a compound looks like:
[<'SomeConcreteModel', primaryKeyValueA>, <'IntermediaryModel', primaryKeyValueB>, <'BaseModel', primaryKeyValueC>]
It is important to notice that you can pick up this entity by any of the given model/primaryKeyValue combinations. They all refer to the same entity.
Using this you can also solve your problem. You can use standard CakePHP find methods to find all the primary key values you want from the base model, or you can use the find methods of models that inherit from it, and then go on to create the entities.
You set up the chain of inheritance/extension by simply writing in your model class:
public $extends = 'YourBaseModel';
In addition, you also need to set up an ordinary CakePHP relationship between the models (hasOne or belongsTo). It works just like in normal OOP, with a chain of models that inherit from their bases. If you use vanilla CakePHP you will only notice that these models are related, but when you start using the Cream interface, all entities merge model/primaryKeyValue pairs into one single object.
Within my GitHub repository there is a PowerPoint file that explains most of the basic features.
https://github.com/erobwen/Cream
Perhaps I should fork the CakePHP project and make a pull request, but for now it is a separate repository. Please feel free to comment or participate in developing "Cream".
Also, for those suggesting that it is best to just "work with the CakePHP flow as intended", I would argue the following. Common estimates suggest that C programs are 2.5 times bigger than their C++ counterparts. Given that the only feature that separates these languages is OOP with inheritance etc., we can deduce that the lack of proper OOP with inheritance requires the programmer to do 150% additional work in the form of repetitive code. Therefore I would argue that a proper model inheritance mechanism in CakePHP is very much needed. Cream is an attempt at this.
You are referring to an ARC relationship (or at least a variation of it). Cake does not handle these types of relationships on the fly. This means you will have to implement your own logic to handle this.
The other option is to categorize the products. If the product can fit into multiple categories, then you will want a HABTM categories for each product. Otherwise, you can use a category column. I suspect it will be a HABTM you are looking for.
PRODUCTS: The table that holds the products.
CATEGORIES: The list of categories any given product can belong to.
CATEGORIES_PRODUCTS: The link between each product and their various categories.
TYPE: This is the flag that will define the type of product (i.e. ring, shoe, pants, etc.)
Then when you want ALL products, you query the products table. When you want a slice of the products (i.e. Rings) you select all the products that belong to the RING category.
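At the table level, the category approach could be sketched roughly as follows. This is plain SQL driven from Python with SQLite so it can be run directly, not CakePHP code; in Cake the same join would be expressed through a HABTM association. The column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE products (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL                    -- flag: 'ring', 'shoe', ...
);
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Link table: one row per (category, product) pair.
CREATE TABLE categories_products (
    category_id INTEGER NOT NULL REFERENCES categories (id),
    product_id  INTEGER NOT NULL REFERENCES products (id),
    PRIMARY KEY (category_id, product_id)
);
INSERT INTO products   VALUES (1, 'Gold band', 'ring'), (2, 'Sneaker', 'shoe');
INSERT INTO categories VALUES (1, 'RING'), (2, 'SHOE'), (3, 'SALE');
INSERT INTO categories_products VALUES (1, 1), (3, 1), (2, 2), (3, 2);
""")


def all_products(conn):
    """Every product, regardless of category."""
    rows = conn.execute("SELECT name FROM products ORDER BY name").fetchall()
    return [name for (name,) in rows]


def products_in_category(conn, category_name):
    """Only the products linked to the named category."""
    rows = conn.execute(
        "SELECT p.name FROM products p "
        "JOIN categories_products cp ON cp.product_id = p.id "
        "JOIN categories c ON c.id = cp.category_id "
        "WHERE c.name = ? ORDER BY p.name",
        (category_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Here all_products(conn) returns both rows, products_in_category(conn, 'RING') returns only the ring, and a product can sit in several categories at once (both are in SALE).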
Now, we need to address the information about the product. For example, not all information will apply to every product. There are a number of ways to do this.
You can build multiple tables to hold the product information. When you pull a product of a given type, you pull its companion information from the table.
Store the information in a text field as serialized data. All of the information can be defined in a settings var and then you can use the serialized data to map to the information.
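The first of those two options (companion tables) might look like this at the table level. The table and column names follow the question (product_bases, product_rings, upc, price, ring_size, center_stone); the shared primary key, the sample rows, and the use of SQLite via Python are assumptions to keep the sketch runnable, and this is not CakePHP code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product_bases (
    id    INTEGER PRIMARY KEY,
    upc   TEXT NOT NULL UNIQUE,
    price REAL NOT NULL
);
-- Companion table: its primary key is also a foreign key to the base row,
-- so a ring row cannot exist without its base product.
CREATE TABLE product_rings (
    id           INTEGER PRIMARY KEY REFERENCES product_bases (id),
    ring_size    REAL NOT NULL,
    center_stone TEXT
);
INSERT INTO product_bases VALUES (1, '0001', 499.0), (2, '0002', 59.0);
INSERT INTO product_rings VALUES (1, 7.5, 'diamond');
""")


def find_all_products(conn):
    """Base information for every product type."""
    rows = conn.execute("SELECT upc FROM product_bases ORDER BY upc").fetchall()
    return [upc for (upc,) in rows]


def find_rings(conn):
    """Rings only: base columns joined with the ring-specific columns."""
    return conn.execute(
        "SELECT b.upc, b.price, r.ring_size, r.center_stone "
        "FROM product_rings r "
        "JOIN product_bases b ON b.id = r.id "
        "ORDER BY b.upc"
    ).fetchall()
```

Querying the base table gives all products, while joining through product_rings gives only the rings with their extra fields, which mirrors the two find('all') calls in the question.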
I hope this helps. Happy coding!