As part of an assessment I need to build a site in PHP that relies on a database, the site can be on anything we want. The only restrictions on the site is the database must have a minumum of seven tables, must be in third normal form and have at minimum one many to many table.
Below is the Schema I have sofar, I am hoping people would be willing to tell me where it is not in third normal form, or if my many to many table (games) will not work.
I have an understanding of third normal form in that each item must be reliant on the primary key there to be no candidate key.
Thoughts?
Related
Scenario: Multiple Types to a single type; one to many.
So for example:
parent multiple type: students table, suppliers table, customers table, hotels table
child single type: banking details
So a student may have multiple banking details, as can a supplier, etc etc.
Layout Option 1 students table (id) + students_banking_details (student_id) table with the appropriate id relationship, repeat per parent type.
Layout Option 2 students table (+others) + banking_details table. banking_details would have a parent_id column for linking and a parent_type field for determining what the parent is (student / supplier / customers etc).
Layout Option 3 students table (+others) + banking_details table. Then I would create another association table per parent type (eg: students_banking_details) for the linking of student_id and banking_details_id.
Layout Option 4 students table (+others) + banking_details table. banking_details would have a column for each parent type, ie: student_id, supplier_id, customers_id - etc.
Other? Your input...
My thoughts on each of these:
Multiple tables of the same type of information seems wrong. If I want to change what gets stored about banking details, thats also several tables I have to change as opposed to one.
Seems like the most viable option. Apparently this doesnt maintain 'referential integrity' though. I don't know how important that is to me if I'm just going to be cleaning up children programatically when I delete the parents?
Same as (2) except with an extra table per type so my logic tells me this would be slower than (2) with more tables and with the same outcome.
Seems dirty to me with a bunch of null fields in the banking_details table.
Before going any further: if you do decide on a design for storing banking details which lacks referential integrity, please tell me who's going to be running it so I can never, ever do business with them. It's that important. Constraints in your application logic may be followed; things happen, exceptions, interruptions, inconsistencies which are later reflected in data because there aren't meaningful safeguards. Constraints in your schema design must be followed. Much safer, and banking data is something to be as safe as possible with.
You're correct in identifying #1 as suboptimal; an account is an account, no matter who owns it. #2 is out because referential integrity is non-negotiable. #3 is, strictly speaking, the most viable approach, although if you know you're never going to need to worry about expanding the number of entities who might have banking details, you could get away with #4 and a CHECK constraint to ensure that each row only has a value for one of the four foreign keys -- but you're using MySQL, which ignores CHECK constraints, so go with #3.
Index your foreign keys and performance will be fine. Views are nice to avoid boilerplate JOINs if you have a need to do that.
So I am building out a set of tables in an existing database at the moment, and have run into a weird problem.
First things first, the tables in question are called Organizations, Applications, and PostOrganizationsApplicants.
Organizations is a pre-existing table that is already populated with lots of data in regards to an organization's information which has been filled out in another form on another portal. EDIT: I cannot edit this table.
Applications is a table that records all information that a user inputs in the application form of the website. It is a new table.
PostOrganizationsApplicants is basically a copy of Organizations. This is also a new table.
The process goes:
1. Go to website and choose between two different web forms, Form A pertains to companies who are in the Organizations table, and Form B pertains to companies who are not in that table.
2a. If Form A is chosen, a lot of the fields in the application will be auto-populated because of their previous submission.
2b. If Form B is chosen, the company has to start from scratch and fill out the entire application from scratch.
3. Any Form B applicants must go into the PostOrganizationsApplicants table.
Now I am extremely new to SQL and Database Management so I may sound pretty stupid, but when I am linking the Organizations and PostOrganizationsApplicants tables to the Applications table, FK's for the OrganizationsID column and PostOrganizationsApplicantsID columns will have lots of empty spaces.
Is this good practice? Is there a better way to structure my tables? I've been racking my brain over this and just can't figure out a better way.
No, it's not necessarily bad practice to allow NULL values for foreign key columns.
If an instance of an entity doesn't have a relationship to an instance of another entity, then storing a NULL in the foreign key column is the normative practice.
From your description of the use case, a "Form A" Applications won't be associated with a row in Organizations or a row in PostOrganizationsApplicants.
Getting the cardinality right is what is important. How manyOrganizations can a given Applications be related to? Zero? One? More than One? And vice versa.
If the relationship is many-to-many, then the usual pattern is to introduce a third relationship table.
Less frequently, we will also implement a relationship table for very sparse relationships, when a relationship to another entity is an exception, rather than the rule.
I'm assuming that the OrganizationsID column you are referring to is in the PostOrganizationsApplicants table (which would mean that a PostOrganizationsApplicants can be associated with (at most) one Organizations.
I'm also assuming that PostOrganizationsApplicantsID column is in the Applications table, which means an instance of Applications can be associated with at most one PostOrganizationsApplicants.
Bottomline, have a "zero-or-one to many" relationship is valid, as long as that supports a suitable representation of the business model.
Why not just add a column to the Organizations table that indicates that the Organization is a "Post" type of organization and set it for the Form B type of applicants? - then, all your orgs are in one table - with a single property to indicate where they came from.
If you can add a new record to Organizations (I hope you can) just
create FK from Organizations as PK of PostOrganizationsApplicants. So
if Organizations has corresponding record in PostOrganizationsApplicants - it's "Post"!
Thanks everybody, I think I found the most efficient way to do it inspired by all of your answers.
My solution below, in case anyone else has a similar problem...
Firstly I will make the PK of PostOrganizationsApplicants the FK of Organizations by making a "link" table.
Then I am going to add a column in PostOrganizationsApplicants which will take in a true/false value on whether they completed the form from the other portal or not. Then I will ask a question in the form whether they have already done the other version of the form or not. If the boolean value is true, then I will point those rows to the Organizations table to auto-populate the forms.
Thanks again!
Asked this here a couple of days ago, but haven't gotten many views, let alone a response, so I'm reposting to stackoverflow.
I'm modeling a DB for a conference ticketing system. In this system attendees are members of an attendee group, which belong to a conference. These relationships are identifying, and therefore FKs must be PKs in the respective children.
My current model:
Q: Is it proper to have attendeeGroupConferenceId FK, as a PK, in the attendee table, as MySQL Workbench has automatically set up for me?
On one side one would get a performance boost by keeping it in there for quick association at "check in". However, it does not strictly necessary since the combination of id, attendeeGroupId, and a corresponding lookup of conferenceId in the respective attendeeGroup table, is enough. (Therefore becomes redundant data.)
To me, it feels like it might violate some form of normalization, but I plan on keeping it in for the speed boost as described. I'm just curious about what proper design says about giving it PK status or not.
You definitely don't need the attendeeGroupConferenceId in your attendee table. It's redundant and notice that candidate key is the combination of (attendeeGroupId, personId), not the attendeeGroupConferenceId alone.
The table attendee also seems to violate the Second normal form (2NF) as it is.
My suggestion is to remove the attribute attendeeGroupConferenceId. In any case you can just join the tables in your queries to get extra info rather than keeping an extra attribute.
I'm designing a website with courses and jobs.
I have a jobs table and courses table, and each job or course is offered by a 'body', which is either an institution(offering courses) or a company(offering jobs). I am deciding between these two options:
option1: use a 'Bodies' table, with a body_type column for both insitutions and companies.
option2: use separate 'institution' and 'company' tables.
My main problem is that there is also a post table where all adverts for courses and jobs are displayed from. Therefore if I go with the first option, I would just need to put a body_id as a record for each post, whereas if I choose the second option, I would need to have an extra join somewhere when displaying posts.
Which option is best? or is there an alternative design?
Don't think so much in terms of SQL syntax and "extra joins", think more in terms of models, entities, attributes, and relations.
At the highest level, your model's central entity is a Post. What are the attributes of a post?
Who posted it
When it was posted
Its contents
Some additional metadata for search purposes
(Others?)
Each of these attributes is either unique to that post and therefore should be in the post table directly, or is not and should be in a table which is related; one obvious example is "who posted it" - this should simply be a PostedBy field with an ID which relates another table for poster/body entities. (NB: Your poster entity does not necessarily have to be your body entity ...)
Your poster/body entity has its own attributes that are either unique to each poster/body, or again, should be in some normalized entity of their own.
Are job posts and course posts substantially different? Perhaps you should consider CoursePosts and JobPosts subset tables with job- and course-specific data, and then join these to your Posts table.
The key thing is to get your model in such a state that all of the entity attributes and relationships make sense where they are. Correctly modeling your actual entities will prevent both performance and logic issues down the line.
For your specific question, if your bodies are generally identical in terms of attributes (name, contact info, etc) then you want to put them in the same table. If they are substantially different, then they should probably be in different tables. And if they are substantially different, and your jobs and courses are substantially different, then definitely consider creating two entirely different data models for JobPosts versus CoursePosts and then simply linking them in some superset table of Posts. But as you can tell, from an object-oriented perspective, if your Posts have nothing in common but perhaps a unique key identifier and some administrative metadata, you might even ask why you're mixing these two entities in your application.
When resolving hierarchies there are usually 3 options:
Kill children: Your option 1
Kill parent: Your option 2
Keep both
I get the issue you're talking about when you kill the parent. Basically, you don't know to what table you have to create a foreign key. So unless you also create a post hierarchy where you have a post related to institution and a separate post table relating to company (horrible solution!) that is a no go. You could also solve this outside the design itself adding metadata in each post stating which table they should join against (not a good option either as your schema will not be self documentation and the data will determine how to join tables... which is error prone).
So I would discard killing the parent. Killing the children works good if you don't have too many different fields between the different tables. Also you should bear in mind that that approach is not good to solve issues wether the children can be both: institution and companies but it doesn't seem to be the case. Killing the children is also the most efficient one.
The third option that you haven't evaluated is the keeping both approach. This way you keep a dummy table containing the shared values between the bodies and each of the bodies have a FK to this "abstract" table (if you know what I mean). This is usually the least efficient way but most likely the most flexible. This way you can easily handle bodies that are of both types, and also that are only of type "body" but not a company nor an institution themselves (if that is even possible or might be possible in the future). You should note that in order to join a post to an institution you should always reference the parent table and then join the parent with the children.
This question might also be useful for you:
What is the best database schema to support values that are only appropriate to specific rows?
I am constructing a database System using Mysql, this will be an application of about 20 tables. The system contains information on farmers, we work with organic certification and need to record a lot of info for that.
In my system, there are related parent-child tables for farmers, producing years and fields/areas - it's a simple representation of the real world in which farmers farm crops on their fields.
I now need to add several status flags for each one of these levels: a farmer can be certified, or his field can be, or the specific year can be; each of these flags has several states and can occur a number of times.
The obvious solution to this would be to add a child table to every one of these tables, and define the states there.
What I wonder if there is an easier way to do this to avoid getting to many tables? Where/how would be best practise to keep that data?
What about an indicator on every table that contains data that may or may not be certified? It's easier than adding new tables.
Or, if "certification" is actually a combination of several pieces/fields of data, then have a single "certification" table, and the other tables can reference it through a foreign key (something like "certification_id", which is the key of the "certification" table).