Normalize two tables with same primary key to 3NF - mysql

I currently have two tables with the same primary key. Is it acceptable for two tables to share a primary key?
Also, are all of these tables in third normal form (3NF)?
Ticket:
-------------------
Ticket_id* PK
Flight_name* FK
Names*
Price
Tax
Number_bags
Travel class:
-------------------
Ticket id * PK
Customer_5star
Customer_normal
Customer_2star
Airmiles
Lounge_discount
ticket_economy
ticket_business
ticket_first
food allowance
drink allowance
the rest of the tables in the database are below
Passengers:
Names* PK
Credit_card_number
Credit_card_issue
Ticket_id *
Address
Flight:
Flight_name* PK
Flight_date
Source_airport_id* FK
Dest_airport_id* FK
Source
Destination
Plane_id*
Airport:
Source_airport_id* PK
Dest_airport_id* PK
Source_airport_country
Dest_airport_country
Pilot:
Pilot_name* PK
Plane id* FK
Pilot_grade
Month
Hours flown
Rate
Plane:
Plane_id* PK
Pilot_name* FK

This is not meant as an answer but it became too long for a comment...
Not to sound harsh, but your model has some serious flaws and you should probably take it back to the drawing board.
Consider what would happen if a Passenger buys a second Ticket for instance. The Passenger table should not hold any reference to tickets. Maybe a passenger can have more than one credit card though? Shouldn't Credit Cards be in their own table? The same applies to Addresses.
Why does the Airport table hold information that really is about destinations (or paths/trips)? You already record trip information in the Flights table. It seems to me that the Airport table should hold information pertaining to a particular airport (like name, location?, IATA code et cetera).
Can a Pilot just be associated with one single Plane? Doesn't sound very likely. The pilot table should not hold information about planes.
And the Planes table should not hold information on pilots as a plane surely can be connected to more than one pilot.
And so on... there are most likely other issues too, but these pointers should give you something to think about.
The only tables that sort of look OK to me are Ticket and Flight.
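The pilot/plane point above can be sketched as a junction table. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name pilot_plane, the surrogate pilot_id, and the model column are hypothetical names that do not appear in the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE pilot (pilot_id INTEGER PRIMARY KEY, pilot_name TEXT NOT NULL);
CREATE TABLE plane (plane_id INTEGER PRIMARY KEY, model TEXT NOT NULL);
-- Hypothetical junction table: one row per (pilot, plane) pairing,
-- so neither pilot nor plane has to carry a reference to the other.
CREATE TABLE pilot_plane (
    pilot_id INTEGER NOT NULL REFERENCES pilot(pilot_id),
    plane_id INTEGER NOT NULL REFERENCES plane(plane_id),
    PRIMARY KEY (pilot_id, plane_id)
);
INSERT INTO pilot VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO plane VALUES (10, 'A320'), (11, 'B737');
INSERT INTO pilot_plane VALUES (1, 10), (1, 11), (2, 10);
""")

# A pilot can fly several planes, and a plane can have several pilots.
planes_for_ada = [row[0] for row in conn.execute(
    "SELECT plane_id FROM pilot_plane WHERE pilot_id = 1 ORDER BY plane_id")]
pilots_for_a320 = [row[0] for row in conn.execute(
    "SELECT pilot_id FROM pilot_plane WHERE plane_id = 10 ORDER BY pilot_id")]
```

The same pattern would probably apply to passengers and tickets, or passengers and credit cards, if those turn out to be many-to-many as well.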

Re same primary key:
Yes there can be multiple tables with the same primary key. Both in principle and in good practice. We declare a primary or other unique column set to say that those columns (and supersets of them) are unique in a table. When that is the case, declare such column sets. This happens all the time.
Eg: A typical reasonable case is "subtyping"/"subtables", where entities of a kind identified by a candidate key of one table are always or sometimes also of the kind identified by the same values in another table. (If always, then the one table's candidate key values are also in the other table's, and so we would declare a foreign key from the one to the other. We would say the one table's kind of entity is a subtype of the other's.) On the other hand, sometimes one table is used with attributes of both kinds, and attributes inapplicable to one kind are not used (ie via NULL or a tag indicating kind).
Whether you should have cases of the same primary key depends on other criteria for good design as applied to your particular situation. You need to learn design including normalization.
Eg: All keys simple and 3NF implies 5NF, so if your two tables have the same set of values as only & simple primary key in every state and they are both in 3NF then their join contains exactly the same information as they do separately. Still, maybe you would keep them separate for clarity of design, for likelihood of change or for performance based on usage. You didn't give that information.
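As a rough illustration of the subtype case described above, here is a sketch in which travel_class reuses ticket's primary key and declares it as a foreign key. It runs on Python's sqlite3 rather than MySQL, the column choice is trimmed down from the question, and the helper add_travel_class is a hypothetical name introduced only for this example.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE ticket (ticket_id INTEGER PRIMARY KEY, price REAL NOT NULL);
-- Same key as ticket: it is both the primary key and a foreign key.
CREATE TABLE travel_class (
    ticket_id INTEGER PRIMARY KEY REFERENCES ticket(ticket_id),
    airmiles INTEGER NOT NULL
);
INSERT INTO ticket VALUES (1, 199.0), (2, 850.0);
INSERT INTO travel_class VALUES (2, 1200);
""")

def add_travel_class(ticket_id, airmiles):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO travel_class VALUES (?, ?)",
                         (ticket_id, airmiles))
        return True
    except sqlite3.IntegrityError:
        return False

orphan_accepted = add_travel_class(99, 500)     # there is no ticket 99
duplicate_accepted = add_travel_class(2, 500)   # ticket 2 already has a row

# Joining on the shared key recombines the two tables without loss.
joined = conn.execute("""
    SELECT t.ticket_id, t.price, c.airmiles
    FROM ticket t JOIN travel_class c ON c.ticket_id = t.ticket_id
""").fetchall()
```

Whether the two tables should stay separate or be merged is still the design decision discussed above; the sketch only shows that sharing the key is mechanically sound.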
Re normal forms:
Normal forms apply to tables. The highest normal form of a table is a property independent of any other table. (Although you might choose that form based on what forms & tables are alternatives.)
In order to normalize or determine a table's highest normal form one needs to know (in general) all the functional dependencies in it. (For normal forms above BCNF, also join dependencies.) You didn't give them. They are determined by what the meaning of the table is (ie how to determine what rows go in it in any given situation) and the possible situations that can arise. You didn't give them. Your expectation that we could tell you about the normal forms your tables are in without giving such information suggests that you do not understand normalization and need to educate yourself about it.
Proper design also needs this information and in general all valid states that can arise from situations that arise. Ie constraints among given tables. You didn't give them.

Having two tables with the same key goes against the idea of removing redundancy in normalization.
Excluding that, are these tables in 1NF and 2NF?
Judging by the Names field, I'd suggest that table1 is not. If multiple names can belong to one ticket, then you need a new table, most likely with a composite key of ticket_id,name.
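A minimal sketch of that suggestion, assuming several names can indeed belong to one ticket. It uses Python's sqlite3 in place of MySQL, and the table name ticket_name is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket (ticket_id INTEGER PRIMARY KEY, price REAL);
-- Hypothetical table: one row per name on a ticket, so the Names
-- column no longer has to hold a list of values.
CREATE TABLE ticket_name (
    ticket_id INTEGER NOT NULL REFERENCES ticket(ticket_id),
    name TEXT NOT NULL,
    PRIMARY KEY (ticket_id, name)
);
INSERT INTO ticket VALUES (1, 300.0);
INSERT INTO ticket_name VALUES (1, 'Ann Lee'), (1, 'Bo Lee');
""")

names_on_ticket = [row[0] for row in conn.execute(
    "SELECT name FROM ticket_name WHERE ticket_id = 1 ORDER BY name")]
```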

Related

MySQL Database Layout/Modelling/Design Approach / Relationships

Scenario: Multiple Types to a single type; one to many.
So for example:
parent multiple type: students table, suppliers table, customers table, hotels table
child single type: banking details
So a student may have multiple banking details, as can a supplier, etc etc.
Layout Option 1 students table (id) + students_banking_details (student_id) table with the appropriate id relationship, repeat per parent type.
Layout Option 2 students table (+others) + banking_details table. banking_details would have a parent_id column for linking and a parent_type field for determining what the parent is (student / supplier / customers etc).
Layout Option 3 students table (+others) + banking_details table. Then I would create another association table per parent type (eg: students_banking_details) for the linking of student_id and banking_details_id.
Layout Option 4 students table (+others) + banking_details table. banking_details would have a column for each parent type, ie: student_id, supplier_id, customers_id - etc.
Other? Your input...
My thoughts on each of these:
1. Multiple tables of the same type of information seems wrong. If I want to change what gets stored about banking details, that's also several tables I have to change as opposed to one.
2. Seems like the most viable option. Apparently this doesn't maintain 'referential integrity' though. I don't know how important that is to me if I'm just going to be cleaning up children programmatically when I delete the parents?
3. Same as (2) except with an extra table per type, so my logic tells me this would be slower than (2), with more tables and the same outcome.
4. Seems dirty to me with a bunch of null fields in the banking_details table.
Before going any further: if you do decide on a design for storing banking details which lacks referential integrity, please tell me who's going to be running it so I can never, ever do business with them. It's that important. Constraints in your application logic may be followed; things happen, exceptions, interruptions, inconsistencies which are later reflected in data because there aren't meaningful safeguards. Constraints in your schema design must be followed. Much safer, and banking data is something to be as safe as possible with.
You're correct in identifying #1 as suboptimal; an account is an account, no matter who owns it. #2 is out because referential integrity is non-negotiable. #3 is, strictly speaking, the most viable approach, although if you know you're never going to need to worry about expanding the number of entities who might have banking details, you could get away with #4 and a CHECK constraint to ensure that each row only has a value for one of the four foreign keys -- but you're using MySQL, which (before version 8.0.16) parses but ignores CHECK constraints, so go with #3.
Index your foreign keys and performance will be fine. Views are nice to avoid boilerplate JOINs if you have a need to do that.
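Option #3 might look roughly like the sketch below, shown for the student parent type only. It runs on Python's sqlite3 instead of MySQL, and the iban column, the index name idx_sbd_banking, the view student_accounts, and the helper link_unknown_student are all hypothetical names chosen for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE banking_details (id INTEGER PRIMARY KEY, iban TEXT NOT NULL);
-- One association table per parent type; both columns are real foreign keys.
CREATE TABLE student_banking_details (
    student_id INTEGER NOT NULL REFERENCES student(id),
    banking_details_id INTEGER NOT NULL REFERENCES banking_details(id),
    PRIMARY KEY (student_id, banking_details_id)
);
-- The primary key already covers student_id; index the other foreign key too.
CREATE INDEX idx_sbd_banking ON student_banking_details (banking_details_id);
-- A view hides the boilerplate JOINs.
CREATE VIEW student_accounts AS
    SELECT s.name, b.iban
    FROM student s
    JOIN student_banking_details sbd ON sbd.student_id = s.id
    JOIN banking_details b ON b.id = sbd.banking_details_id;
INSERT INTO student VALUES (1, 'Ann');
INSERT INTO banking_details VALUES (100, 'XX00-0001'), (101, 'XX00-0002');
INSERT INTO student_banking_details VALUES (1, 100), (1, 101);
""")

accounts = conn.execute(
    "SELECT name, iban FROM student_accounts ORDER BY iban").fetchall()

def link_unknown_student():
    """Try to link an account to a student that does not exist."""
    try:
        with conn:
            conn.execute("INSERT INTO student_banking_details VALUES (42, 100)")
        return True
    except sqlite3.IntegrityError:
        return False

dangling_link_accepted = link_unknown_student()
```

Suppliers, customers and hotels would each get an analogous association table pointing at the same banking_details table.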

Cardinality in ER diagram

I've made a project that's essentially an online bookstore where one can buy books and place the order.
My database contains various tables like:
user
user_shipping_address
user_payment_mode
user_order
order_shipping_address
order_billing_address
order_payment_details
I tried to construct the EERD diagram for this but I am confused about one thing: A user_order can only have one shipping address. I've created a foreign key order_id in the order_shipping_address table that references the primary key order.id. I also have a shipping_address_id foreign key in the table order that references order_shipping_address.id.
When I try to generate the ER diagram, it gives me two different relationships. A 1:1 relationship between the order and the shipping address and a 1:M relationship from the shipping address to the order. I don't know how to structure the foreign key constraints because I feel the order table should contain the shipping_address_id and the shipping address should contain the order_id, right? This just made everything more confusing.
Please help me with this.
Here is my EERD :
This happens because your current design means it's possible for multiple user_order rows to reference the same single shipping_address row.
You need to change the design so that it's impossible for multiple user_order rows to reference the same single shipping_address row.
There are at least two different possible solutions:
Add a UNIQUE constraint on user_order.shipping_address_id
Or: invert the relationship (this is my preferred option, as it eliminates unneeded surrogate keys):
Remove the user_order.shipping_address_id column.
Change shipping_address.id to shipping_address.order_id, so that it's a foreign-key of user_order.id
Make shipping_address.order_id the new primary key of shipping_address.
Note that both of these options are a denormalization, as they prevent the sharing of shipping addresses between different orders (e.g. if the same customer makes the same repeat order a lot), though this can be intentional: if a user's address changes in the future, it won't unintentionally retroactively update old order shipping records.
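The first option, a UNIQUE constraint on the order's shipping_address_id, could be sketched as follows. This uses Python's sqlite3 rather than MySQL, reduces the tables to a couple of columns, and the helper place_order is a hypothetical name for this example.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE order_shipping_address (
    id INTEGER PRIMARY KEY,
    street TEXT NOT NULL
);
CREATE TABLE user_order (
    id INTEGER PRIMARY KEY,
    -- UNIQUE turns the 1:M relationship into 1:1.
    shipping_address_id INTEGER NOT NULL UNIQUE
        REFERENCES order_shipping_address(id)
);
INSERT INTO order_shipping_address VALUES (1, '1 Main St'), (2, '2 Side St');
INSERT INTO user_order VALUES (100, 1);
""")

def place_order(order_id, shipping_address_id):
    """Return True if the order row was accepted, False if a constraint fired."""
    try:
        with conn:
            conn.execute("INSERT INTO user_order VALUES (?, ?)",
                         (order_id, shipping_address_id))
        return True
    except sqlite3.IntegrityError:
        return False

shared_address_accepted = place_order(101, 1)  # address 1 is already used
own_address_accepted = place_order(102, 2)     # address 2 is still free
```

With this in place there is no need for a second foreign key from the address back to the order.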
A few other tips:
Consider using int rather than bigint for your identities - I doubt you'll have over 2 billion rows in each table.
Don't blindly use varchar(255) for all text columns - use it to enforce reasonable constraints on data length, for example state doesn't need to be longer than 2 characters if you're storing abbreviations, ditto zipcode which can be varchar(10) if you're using ZIP+4.
DO NOT STORE FULL CREDIT CARD NUMBERS IN YOUR DATABASE! (as seen in your payment table) - This is a violation of PCI Rules, a massive liability and probably illegal negligence in your jurisdiction. Your payment processor will provide you with a substitute opaque token value (or something similar) as a means of identifying charge-cards and applying future charges to stored payment details - the most you can reasonably store is the last 4 digits. Whether or not you encrypt the data is largely irrelevant.

Does data redundancy in different tables not follow Third Normal Form (3NF)?

I have 4 tables. Each of them contain the following attributes:
Table 1 :
Person (Id (Primary key), Name, Occupation, Location, SecondJob, PerHour, HoursWorked, Phone, Workphone)
Table 2 :
Job (Id (Foreign key that refers to Person), Title, Name, Location, Salary)
Table 3 :
SecondJob (Id (Foreign key that refers to Person), Title, Name)
Table 4:
PhoneNumber (Id (Foreign key that refers to Person), Name, Phone, Workphone)
I can obtain the values of each attribute like Name, Title, Phone and Workphone from the Person table with the following pseudo-SQL statement:
SELECT <attribute_name> FROM Person WHERE Id IN (<person_ids>);
Does the fact that some of the information is being repeated in DIFFERENT TABLES (Data Redundancy), break (ie, not follow) the Third Normal Form (3NF)?
Or should the values be put into the other Tables separately and reason what attribute is identifying with the Primary Key of the Table?
I calculate Salary in Job by getting PerHour and HoursWorked from Person, then multiplying them. I have also heard that this is redundant data, because it is data that you could derive from existing data within the tables.
But, does this break the Third Normal Form??
Does the fact that information is repeated in DIFFERENT TABLES (Data Redundancy), break against 3NF Normalization?
No. A table value or variable is or isn't in a given NF. This is independent of any other table. (We do also talk about a database being in NF when all of its tables are in that NF.)
Normalization can be reasonably said to remove redundancy. But there is lots of redundancy not addressed by normalization. And there is lots of redundancy that is not bad. And duplication is not necessarily redundancy. Just because data is repeated doesn't mean "information" is repeated. What data says by being or not being in a table depends on the meaning of the table.
But you seem to think that just because duplicating data in a different table doesn't violate 3NF that it doesn't violate other principles of good design. That's wrong. Also, it's 5NF that matters. The only reason lower NFs are used is that SQL DBMSs don't support 5NF well.
Or should I just put the values into the other Tables separately and reason what attribute is identifying with the Primary Key of the Table?
I guess you are trying to say, Should I only put the values in one table each and reconstruct the second table via queries involving shared keys? Ie, if you can get the values in a column by querying the rest of the database then should you avoid having that column? Generally speaking, yes.
Your question assumes a misconception. It's not a matter of "(exclusive) or" here. You should do both.
I calculate Salary in Job by getting PerHour and HoursWorked from Person, then multiply them. I heard that this is also redundant Data, due to it being data that you could extrapolate from existing Data in the Tables.
It is redundant given the rest of the database, because you could use a query instead. And if you don't constrain salary values appropriately then that is bad redundancy. Even if you do, the column and constraint complicate the schema.
But does it break 3NF Normalization?
No, because the NF of a table is independent of other tables. But that doesn't mean it's ok.
(If you added Salary to Person, the new table would not be in 3NF. But then, SQL DBMSs have computed columns that make that ok, by making the non-3NF table with Salary a view of the 3NF table without it.)
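The idea of deriving Salary instead of storing it can be sketched with a view. This is an illustration only: it uses Python's sqlite3 rather than MySQL, the Person table is cut down to the relevant columns, and the view name person_with_salary is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    per_hour REAL NOT NULL,
    hours_worked REAL NOT NULL
);
-- Salary is computed on read, so it can never disagree with its inputs.
CREATE VIEW person_with_salary AS
    SELECT id, name, per_hour, hours_worked,
           per_hour * hours_worked AS salary
    FROM person;
INSERT INTO person VALUES (1, 'Ann', 20.0, 40.0);
""")

def salary_of(person_id):
    """Look up the derived salary for one person."""
    return conn.execute(
        "SELECT salary FROM person_with_salary WHERE id = ?",
        (person_id,)).fetchone()[0]

salary_before = salary_of(1)
conn.execute("UPDATE person SET hours_worked = 45.0 WHERE id = 1")
salary_after = salary_of(1)   # follows the update with no extra bookkeeping
```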
Learn some database design method(s) and how they apply principles of good design. Your tables needlessly address overlapping aspects of the application. Also learn about JOIN in writing queries.

Is it proper to make a grand-parent key, a primary key, in its grand-child, in a multi-level identifying relationship?

Asked this here a couple of days ago, but haven't gotten many views, let alone a response, so I'm reposting to stackoverflow.
I'm modeling a DB for a conference ticketing system. In this system attendees are members of an attendee group, which belong to a conference. These relationships are identifying, and therefore FKs must be PKs in the respective children.
My current model:
Q: Is it proper to have attendeeGroupConferenceId FK, as a PK, in the attendee table, as MySQL Workbench has automatically set up for me?
On one hand, one would get a performance boost by keeping it in there for quick association at "check in". However, it is not strictly necessary, since the combination of id, attendeeGroupId, and a corresponding lookup of conferenceId in the respective attendeeGroup table is enough. (It therefore becomes redundant data.)
To me, it feels like it might violate some form of normalization, but I plan on keeping it in for the speed boost as described. I'm just curious about what proper design says about giving it PK status or not.
You definitely don't need the attendeeGroupConferenceId in your attendee table. It's redundant. Note that the candidate key is the combination of (attendeeGroupId, personId), not the attendeeGroupConferenceId alone.
The table attendee also seems to violate the Second normal form (2NF) as it is.
My suggestion is to remove the attribute attendeeGroupConferenceId. In any case you can just join the tables in your queries to get extra info rather than keeping an extra attribute.
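A sketch of that join, assuming a layout along the lines the answer describes. The original diagram is not reproduced here, so the snake_case table and column names and the helper conference_for are guesses made for illustration, and Python's sqlite3 stands in for MySQL.

```python
import sqlite3

# SQLite stands in for MySQL; schema names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conference (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE attendee_group (
    id INTEGER PRIMARY KEY,
    conference_id INTEGER NOT NULL REFERENCES conference(id)
);
-- No conference id here: it is reachable through attendee_group.
CREATE TABLE attendee (
    attendee_group_id INTEGER NOT NULL REFERENCES attendee_group(id),
    person_id INTEGER NOT NULL,
    PRIMARY KEY (attendee_group_id, person_id)
);
INSERT INTO conference VALUES (1, 'DB Design Days');
INSERT INTO attendee_group VALUES (5, 1);
INSERT INTO attendee VALUES (5, 1), (5, 2);
""")

def conference_for(attendee_group_id, person_id):
    """Return the conference id for an attendee, or None if not registered."""
    row = conn.execute("""
        SELECT g.conference_id
        FROM attendee a
        JOIN attendee_group g ON g.id = a.attendee_group_id
        WHERE a.attendee_group_id = ? AND a.person_id = ?
    """, (attendee_group_id, person_id)).fetchone()
    return row[0] if row else None
```

For a check-in lookup this is a primary-key probe on each table, so the join is unlikely to be the bottleneck.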

Always create unique keys whenever possible?

Should you always create unique keys whenever possible?
For example let's say I have a table with three fields, student ID, first name, last name and the student ID is the primary key.
If no two students have the same first & last name, should I create a unique key for those two fields?
Yes, you should use unique indexes even when you already have a primary key when the column or combination of columns are unique. It's good to have constraints in your database to prevent bad data. However, this is not what you have in your case. Even if you currently have no students with duplicate names that can easily happen in the future. Names are not unique in the world.
U.S. Social Security numbers are almost always unique (they can be reused after a number of years, but it's unlikely to ever happen in your case), so they might make for a good candidate for a unique index. If you have non-U.S. students though then you would need to make the column nullable.
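A small sketch of the nullable unique column described above, run on Python's sqlite3 instead of MySQL. The column name ssn, the helper try_insert, and the sample values are made up for illustration. Both SQLite and MySQL allow several NULLs in a UNIQUE column, which is what makes the nullable variant workable.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    ssn TEXT UNIQUE          -- nullable: students without an SSN store NULL
);
""")

def try_insert(first_name, last_name, ssn):
    """Return True if the student was stored, False if a constraint fired."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO student (first_name, last_name, ssn) "
                "VALUES (?, ?, ?)",
                (first_name, last_name, ssn))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert('John', 'Smith', '111-11-1111'),  # first John Smith
    try_insert('John', 'Smith', '222-22-2222'),  # same name, different person
    try_insert('Jane', 'Doe', '111-11-1111'),    # reused SSN is rejected
    try_insert('Li', 'Wei', None),               # no SSN
    try_insert('Ana', 'Silva', None),            # a second NULL is still fine
]
```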
Yes, usually having unique IDs (surrogate keys) is best. In this case, last name and first name are not enough for a primary key. Even if you have no duplicate names now, you can't be sure you won't have two John Smiths in the future.
Don't make the assumption that no two students will have the same name.
When the underlying model suggests it, it is a good idea to create unique keys. Constraints like these will ensure cohesive data and prevent errors. But in your case the underlying model does not suggest this to be the case.
Unique keys should follow business definitions; if the studentID is a "semi-natural" key (it has unique meaning that exists beyond your specific database), then that should suffice as your unique key.
If the studentID is simply an identity value that is assigned by the database as a row-number, then you probably need some other unique key to avoid entering the same student twice.
A primitive primary key with no relation to the data domain is one of the widely accepted best practices
(just imagine: one of your students decides to marry).
Another good practice (though from the NoSQL world) is to use a GUID; this way keys are unique, and different datasets can be mixed in the same table without collisions.
PS: you could save some storage space, but today storage is cheap and there is no need to sacrifice good practices for it.
Yes!
If you ever need to update or delete rows from the table, it is very advantageous to have something to uniquely identify each row in the table.
With your example, I don't think it's possible to guarantee no two students will share the same name. Even adding a date of birth still can't guarantee they'll always be unique. I'd recommend adding an auto incrementing INT or BIGINT as the primary key.
You can always add the Unique constraint as well and remove it if it becomes an issue.
A simple way to do it is to use an auto-generated GUID (Globally Unique Identifier) to identify a student. It is "guaranteed" to be unique every time it is generated. Names can change (like when somebody gets married), but an auto-generated value has no meaning, so it should never need to be changed.
http://en.wikipedia.org/wiki/Globally_unique_identifier
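A minimal sketch of the GUID approach, using Python's uuid module and sqlite3 as a stand-in for MySQL. The table layout and the helper add_student are assumptions for illustration. Note that a random version 4 UUID makes collisions extremely unlikely rather than strictly impossible, which is why the primary key constraint is still declared.

```python
import sqlite3
import uuid

# SQLite stands in for MySQL here (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,   -- GUID stored in its 36-character form
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL
    )
""")

def add_student(first_name, last_name):
    """Insert a student under a freshly generated random GUID and return it."""
    student_id = str(uuid.uuid4())
    conn.execute("INSERT INTO student VALUES (?, ?, ?)",
                 (student_id, first_name, last_name))
    return student_id

sid = add_student('Jane', 'Doe')

# A name change touches only the name column; the key stays the same.
conn.execute("UPDATE student SET last_name = ? WHERE student_id = ?",
             ('Roe', sid))
current = conn.execute(
    "SELECT first_name, last_name FROM student WHERE student_id = ?",
    (sid,)).fetchone()
```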
Your database constraints should be DBMS understood business rules. Is there a business rule that states that no two students may have the same first and last name combination? I presume not, therefore do not create a unique key for those two fields. Perhaps best not to presume, though, and ask a business domain expert e.g. the enrolment officer.
Note that a row in this table is a proposition, i.e. that there exists a student enrolled with first name 'x' and last name 'y' and student ID 'z'. Clearly the DBMS has no concept of whether this proposition is true in the real world. What normally happens is that there will be a trusted source to verify data. The enterprise will authorize an officer (director etc) in this role. Let's say it is the enrolment officer who is responsible for verifying that 'x y' is a real person, that they are eligible to be enrolled, and that the person is who they say they are. Typically, they will require sight of documents (certificates, passport, etc), take up references, interview the person, check public records, etc. Of course, the enrolment officer may delegate their responsibility to other members of staff or engage an agent.
At some point they will be satisfied and for convenience will issue their own identifier, the student ID. Mistakes do happen and it may turn out that this value is not unique, in which case it would be the enrolment officer's responsibility to resolve the problem and issue a new student ID. Perhaps they will use software to generate the value to mitigate against such problems. The student ID will be issued to the student and will be used within the enterprise to identify the person for the convenience of all concerned. They may even be issued with a document (e.g. photo ID card) to assist in identification, based on the level of trust in a given context (e.g. may need to produce photo ID to sit an exam). If the student forgets their ID, loses their issued documents, etc then the enrolment office will be able to retrieve it from records e.g. with reference to copy documents taken during the verification process; they are unlikely to use first name and last name alone.
The point is, the trusted source for the identifier is the enrolment officer on behalf of the enterprise, rather than the database, the DBMS or any other kind of software involved in the process. Therefore, it probably is acceptable to make student ID the sole identifier for students within the database. Consider, however, that an auto-increment column generated on one hardware build of a single DBMS within the enterprise is probably not suitable for the allocation of such significant identifier values.