I am developing an application using PHP and Yii Framework. I've been thinking about the most suitable database structure for the given functionality and here's what I've come up with. Yet I'm not 100% positive that's how it should be done so I've decided to ask the community.
App Description:
Registered users may participate in an event. Every event can have an unlimited number of users (called "participants of the event").
Once the event is over, every participant can leave a feedback about every other participant of the same event.
Database structure:
Since every event can have an unlimited number of users and users can participate in an unlimited number of events, I've created a table "Participant", which resolves the many-to-many relation.
Other tables are self-explanatory.
And here's the most important thing:
Every participant of an event can have the maximum number of feedbacks which equals the number of participants of the same event excluding the given participant (Example, if there are 5 participants of the event, the given participant can receive 4 feedbacks from participants of the same event).
Let me emphasize, only participants of the same event can leave a feedback (and only one) about the given participant.
Below are the steps I took to ensure the integrity of the database:
I've created the "id" column in the "Participant" table to give a unique ID to every user who participates in a certain event. This ID is composite (event_id and user_id concatenated together). So, the participant id of the user 23 who participated in event 14 would be 14-23.
You may ask why I decided to create a separate column with this ID instead of simply defining the primary key like this:
PRIMARY KEY (user_id, event_id)
Read on.
When the event is over, every participant can leave a feedback about the others. Now, this participant ID can be referenced by the foreign keys "sender_id" and "recipient_id" in the feedback table.
Further on, the primary key of the feedback table will also be formed by combining the "recipient_id" and the "sender_id", so if the user 23 wants to leave a feedback about the user 45 (both participated in the event 71), the primary key for the feedback would be: 71-45-71-23.
This approach allows us to make sure on the database level that no participant leaves a feedback about the same participant twice and that a user can't participate in the same event twice.
Questions:
Does this approach have a right to exist?
What are the pros and cons of this approach? Is there any other, better way to approach this functionality?
Can I generate the primary keys based on the values of the other columns automatically on record insertion?
This is a bad design. Just make a 2-column primary key, and 2-column foreign keys to it. This is a fundamental anti-pattern called "encoding information in keys" which (thereby) are called "smart", "intelligent" or "concatenated" keys. A good key is a "dumb" key.
Eg:
Despite it now being easy to implement a Smart Key, it is hard to recommend that you create one of your own that isn't a natural key, because they tend to eventually run into trouble, whatever their advantages, because it makes the databases harder to refactor, imposes an order which is difficult to change and may not be optimal for your queries, requires a string comparison if the Smart Key includes non-numeric characters, and is less effective than a composite key in helping range-based aggregations. It also violates the basic relational guideline that every column should store atomic values.
Smart Keys also tend to outgrow their original coding constraints.
Besides, there is no need to do this.
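To make the 2-column key suggestion concrete, here is a minimal sketch. It uses Python's sqlite3 module only as a convenient stand-in for MySQL, and the table and column names (participant, feedback, body) are my assumptions, not the asker's actual schema. The point is that both foreign keys share event_id, so sender and recipient are forced to be participants of the same event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE participant (
    event_id INTEGER NOT NULL,
    user_id  INTEGER NOT NULL,
    PRIMARY KEY (event_id, user_id)      -- a user joins a given event at most once
);
CREATE TABLE feedback (
    event_id     INTEGER NOT NULL,
    sender_id    INTEGER NOT NULL,
    recipient_id INTEGER NOT NULL,
    body         TEXT,                   -- hypothetical payload column
    PRIMARY KEY (event_id, sender_id, recipient_id),  -- one feedback per pair per event
    FOREIGN KEY (event_id, sender_id)    REFERENCES participant (event_id, user_id),
    FOREIGN KEY (event_id, recipient_id) REFERENCES participant (event_id, user_id),
    CHECK (sender_id <> recipient_id)    -- no feedback about yourself
);
""")
conn.executemany("INSERT INTO participant VALUES (?, ?)", [(71, 23), (71, 45), (14, 99)])
conn.execute("INSERT INTO feedback VALUES (71, 23, 45, 'great teammate')")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Same sender and recipient in the same event a second time.
duplicate_rejected = rejected("INSERT INTO feedback VALUES (71, 23, 45, 'again')")
# User 99 only took part in event 14, not event 71.
outsider_rejected = rejected("INSERT INTO feedback VALUES (71, 99, 45, 'not in event 71')")
# A participant writing about themselves.
self_rejected = rejected("INSERT INTO feedback VALUES (71, 23, 23, 'me')")
```

All three bad inserts are refused by the database itself, with no concatenated "71-45-71-23" style key anywhere.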
Many DBMSes allow "computed columns" whose values are automatically calculated from other columns. To make one a primary key or foreign key you would usually need it "persisted", ie have it take up storage like a normal column vs just being calculated when needed like a view. MySQL did not originally have these, but 5.7.5 has some functionality where they are called "generated columns", which can be "stored". But don't do this for PKs or FKs!
The actual design issue is handling database/SQL subtypes/hierarchies/inheritance/polymorphism.
I have two tables currently with the same primary key, can I have these two tables with the same primary key?
Also, are all the tables in 3rd normal form?
Ticket:
-------------------
Ticket_id* PK
Flight_name* FK
Names*
Price
Tax
Number_bags
Travel class:
-------------------
Ticket id * PK
Customer_5star
Customer_normal
Customer_2star
Airmiles
Lounge_discount
ticket_economy
ticket_business
ticket_first
food allowance
drink allowance
the rest of the tables in the database are below
Passengers:
Names* PK
Credit_card_number
Credit_card_issue
Ticket_id *
Address
Flight:
Flight_name* PK
Flight_date
Source_airport_id* FK
Dest_airport_id* FK
Source
Destination
Plane_id*
Airport:
Source_airport_id* PK
Dest_airport_id* PK
Source_airport_country
Dest_airport_country
Pilot:
Pilot_name* PK
Plane id* FK
Pilot_grade
Month
Hours flown
Rate
Plane:
Plane_id* PK
Pilot_name* FK
This is not meant as an answer but it became too long for a comment...
Not to sound harsh, but your model has some serious flaws and you should probably take it back to the drawing board.
Consider what would happen if a Passenger buys a second Ticket for instance. The Passenger table should not hold any reference to tickets. Maybe a passenger can have more than one credit card though? Shouldn't Credit Cards be in their own table? The same applies to Addresses.
Why does the Airport table hold information that really is about destinations (or paths/trips)? You already record trip information in the Flights table. It seems to me that the Airport table should hold information pertaining to a particular airport (like name, location?, IATA code et cetera).
Can a Pilot just be associated with one single Plane? Doesn't sound very likely. The pilot table should not hold information about planes.
And the Planes table should not hold information on pilots as a plane surely can be connected to more than one pilot.
And so on... there are most likely other issues too, but these pointers should give you something to think about.
The only tables that sort of look ok to me are Ticket and Flight.
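To illustrate the pointers above, here is a rough sketch of how the airport, pilot and plane parts might be split up. It runs on Python's sqlite3 purely for convenience. The surrogate pilot_id, the pilot_plane junction table and the sample rows are my assumptions, not part of the original model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
-- An airport row describes one airport, not a trip.
CREATE TABLE airport (
    airport_id TEXT PRIMARY KEY,      -- hypothetical: e.g. an IATA code
    country    TEXT NOT NULL
);
CREATE TABLE plane (plane_id INTEGER PRIMARY KEY);
CREATE TABLE pilot (
    pilot_id INTEGER PRIMARY KEY,     -- hypothetical surrogate id
    name     TEXT NOT NULL,
    grade    TEXT
);
-- A pilot can fly many planes and a plane can have many pilots.
CREATE TABLE pilot_plane (
    pilot_id INTEGER NOT NULL REFERENCES pilot (pilot_id),
    plane_id INTEGER NOT NULL REFERENCES plane (plane_id),
    PRIMARY KEY (pilot_id, plane_id)
);
-- The trip itself lives in flight, which points at two airports.
CREATE TABLE flight (
    flight_name       TEXT PRIMARY KEY,
    flight_date       TEXT,
    source_airport_id TEXT NOT NULL REFERENCES airport (airport_id),
    dest_airport_id   TEXT NOT NULL REFERENCES airport (airport_id),
    plane_id          INTEGER REFERENCES plane (plane_id)
);
""")
conn.executemany("INSERT INTO airport VALUES (?, ?)", [("LHR", "UK"), ("JFK", "USA")])
conn.executemany("INSERT INTO plane VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO pilot VALUES (?, ?, ?)", [(10, "Ann", "A"), (11, "Raj", "B")])
conn.executemany("INSERT INTO pilot_plane VALUES (?, ?)", [(10, 1), (10, 2), (11, 1)])
conn.execute("INSERT INTO flight VALUES ('FL001', '2024-01-05', 'LHR', 'JFK', 1)")

# One pilot, several planes; one plane, several pilots.
planes_for_ann = [r[0] for r in conn.execute(
    "SELECT plane_id FROM pilot_plane WHERE pilot_id = 10 ORDER BY plane_id")]
pilots_for_plane_1 = [r[0] for r in conn.execute(
    "SELECT pilot_id FROM pilot_plane WHERE plane_id = 1 ORDER BY pilot_id")]
```

Passengers, credit cards and addresses would get the same treatment (their own tables, linked by keys), which I have left out to keep the sketch short.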
Re same primary key:
Yes there can be multiple tables with the same primary key. Both in principle and in good practice. We declare a primary or other unique column set to say that those columns (and supersets of them) are unique in a table. When that is the case, declare such column sets. This happens all the time.
Eg: A typical reasonable case is "subtyping"/"subtables", where entities of a kind identified by a candidate key of one table are always or sometimes also of the kind identified by the same values in another table. (If always then the one table's candidate key values are also in the other table's. And so we would declare a foreign key from the one to the other. We would say the one table's kind of entity is a subtype of the other's.) On the other hand sometimes one table is used with attributes of both kinds and attributes inapplicable to one kind are not used. (Ie via NULL or a tag indicating kind.)
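A minimal sketch of the subtype case, assuming (my reading, not stated in the question) that every Travel class row describes exactly one Ticket. Python's sqlite3 is used only to make it runnable, and the columns are trimmed to one per table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE ticket (
    ticket_id INTEGER PRIMARY KEY,
    price     REAL
);
-- Same primary key as ticket, and also a foreign key to it.
CREATE TABLE travel_class (
    ticket_id INTEGER PRIMARY KEY REFERENCES ticket (ticket_id),
    airmiles  INTEGER
);
""")
conn.execute("INSERT INTO ticket VALUES (1, 250.0)")
conn.execute("INSERT INTO travel_class VALUES (1, 1200)")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# No ticket 99 exists, so the foreign key refuses the row.
orphan_rejected = rejected("INSERT INTO travel_class VALUES (99, 0)")
# Ticket 1 already has its travel_class row, so the primary key refuses a second.
second_row_rejected = rejected("INSERT INTO travel_class VALUES (1, 500)")
```

So two tables sharing a primary key is a perfectly ordinary way to say "at most one of these per one of those".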
Whether you should have cases of the same primary key depends on other criteria for good design as applied to your particular situation. You need to learn design including normalization.
Eg: All keys simple and 3NF implies 5NF, so if your two tables have the same set of values as only & simple primary key in every state and they are both in 3NF then their join contains exactly the same information as they do separately. Still, maybe you would keep them separate for clarity of design, for likelihood of change or for performance based on usage. You didn't give that information.
Re normal forms:
Normal forms apply to tables. The highest normal form of a table is a property independent of any other table. (Although you might choose that form based on what forms & tables are alternatives.)
In order to normalize or determine a table's highest normal form one needs to know (in general) all the functional dependencies in it. (For normal forms above BCNF, also join dependencies.) You didn't give them. They are determined by what the meaning of the table is (ie how to determine what rows go in it in any given situation) and the possible situations that can arise. You didn't give them. Your expectation that we could tell you about the normal forms your tables are in without giving such information suggests that you do not understand normalization and need to educate yourself about it.
Proper design also needs this information and in general all valid states that can arise from situations that arise. Ie constraints among given tables. You didn't give them.
Having two tables with the same key goes against the idea of removing redundancy in normalization.
Excluding that, are these tables in 1NF and 2NF?
Judging by the Names field, I'd suggest that table1 is not. If multiple names can belong to one ticket, then you need a new table, most likely with a composite key of ticket_id,name.
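As a rough sketch of that suggestion (the table name ticket_name and the sample names are made up for illustration, and sqlite3 is only a stand-in DBMS):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE ticket (
    ticket_id INTEGER PRIMARY KEY,
    price     REAL
);
-- One row per name on a ticket instead of a multi-valued Names column.
CREATE TABLE ticket_name (
    ticket_id INTEGER NOT NULL REFERENCES ticket (ticket_id),
    name      TEXT NOT NULL,
    PRIMARY KEY (ticket_id, name)
);
""")
conn.execute("INSERT INTO ticket VALUES (1, 250.0)")
conn.executemany("INSERT INTO ticket_name VALUES (?, ?)", [(1, "Ann Lee"), (1, "Bo Lee")])

names_on_ticket_1 = [r[0] for r in conn.execute(
    "SELECT name FROM ticket_name WHERE ticket_id = 1 ORDER BY name")]

# The composite key stops the same name being listed twice on one ticket.
try:
    conn.execute("INSERT INTO ticket_name VALUES (1, 'Ann Lee')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```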
Asked this here a couple of days ago, but haven't gotten many views, let alone a response, so I'm reposting to stackoverflow.
I'm modeling a DB for a conference ticketing system. In this system attendees are members of an attendee group, which belong to a conference. These relationships are identifying, and therefore FKs must be PKs in the respective children.
My current model:
Q: Is it proper to have attendeeGroupConferenceId FK, as a PK, in the attendee table, as MySQL Workbench has automatically set up for me?
On one side one would get a performance boost by keeping it in there for quick association at "check in". However, it is not strictly necessary since the combination of id, attendeeGroupId, and a corresponding lookup of conferenceId in the respective attendeeGroup table, is enough. (Therefore it becomes redundant data.)
To me, it feels like it might violate some form of normalization, but I plan on keeping it in for the speed boost as described. I'm just curious about what proper design says about giving it PK status or not.
You definitely don't need the attendeeGroupConferenceId in your attendee table. It's redundant, and note that the candidate key is the combination of (attendeeGroupId, personId), not the attendeeGroupConferenceId alone.
The table attendee also seems to violate the Second normal form (2NF) as it is.
My suggestion is to remove the attribute attendeeGroupConferenceId. In any case you can just join the tables in your queries to get extra info rather than keeping an extra attribute.
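Something along these lines, as a sketch only. I am assuming a simplified version of your model where attendeeGroup has a single-column id, and I am using sqlite3 from Python just so the example runs; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE conference (
    id   INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE attendeeGroup (
    id           INTEGER PRIMARY KEY,
    conferenceId INTEGER NOT NULL REFERENCES conference (id)
);
-- No attendeeGroupConferenceId here: the group already determines the conference.
CREATE TABLE attendee (
    attendeeGroupId INTEGER NOT NULL REFERENCES attendeeGroup (id),
    personId        INTEGER NOT NULL,
    PRIMARY KEY (attendeeGroupId, personId)
);
""")
conn.execute("INSERT INTO conference VALUES (1, 'DevConf')")
conn.execute("INSERT INTO attendeeGroup VALUES (10, 1)")
conn.execute("INSERT INTO attendee VALUES (10, 500)")


def conference_of(person_id):
    """Look up a person's conference ids through their attendee group."""
    rows = conn.execute(
        """
        SELECT g.conferenceId
        FROM attendee AS a
        JOIN attendeeGroup AS g ON g.id = a.attendeeGroupId
        WHERE a.personId = ?
        """,
        (person_id,),
    )
    return [r[0] for r in rows]


conference_ids = conference_of(500)
```

If the check-in lookup ever turns out to be slow, that is something to measure and index for, rather than a reason to copy the column up front.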
I am new to MSAccess so I'm not sure about this; do I have to have a primary key for every single table in my database? I have one table which looks something like this:
(http://i108.photobucket.com/albums/n32/lurker3345/ACCESSHELP.png?t=1382688844)
In this case, every field/column has a repeating term. I have tried assigning the primary key to every field but it returns with an error saying that there is a repeated field.
How do I go about this?
Strictly speaking, Yes, every row in a relational database should have a Primary Key (a unique identifier). If doing quick-and-dirty work, you may be able to get away without one.
Internal Tracking ID
Some databases generate a primary key under the covers if you do not assign one explicitly. Every database needs some way to internally track each row.
Natural Key
A natural key is an existing field with meaningful data that happens to identify each row uniquely. For example, if you were tracking people assigned to teams, you might have an "employee_id" column on the "person" table.
Surrogate Key
A surrogate key is an extra column you add to a table, just to assign an arbitrary value as the unique identifier. You might assign a serial number (1, 2, 3, …), or a UUID if your database (such as Postgres) supports that data type. Assigning a serial number or UUID is so common that nearly every database engine provides a built-in facility to help you automatically create such a value and assign to new rows.
My Advice
In my experience, any serious long-term project should always use a surrogate key because every natural key I've ever been tempted to use eventually changes. People change their names (get married, etc.). Employee IDs change when a company gets acquired by another.
If, on the other hand, you are doing a quick-and-dirty job, such as analyzing a single batch of data to produce a chart once and never again, and your data happens to have a natural key then use it. Beware: One-time jobs often have a way of becoming recurring jobs.
Further advice… When importing data from a source outside your control, assign your own identifier even if the import contains a candidate key.
Composite Key
Some database engines offer a composite key feature, also called compound key, where two or more columns in the table are combined to create a single value which once combined should prove unique. For example, in a "person" table, the "first_name", "last_name", and "phone_number" fields might be unique when considered together. Unless two people are married and share the same home phone number while also happening to each be named "Alex" with a shared last name! Because of such collisions as well as the tendency for meaningful data to change and also the overhead of calculating such combined values, it is advisable to stick with simple (single-column) keys unless you have a special situation.
If the data doesn't naturally have a unique field to use as the primary key, add an auto-generated integer column called "Id" or similar.
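Access has its own AutoNumber type for this. As a runnable stand-in, here is the same idea sketched with SQLite from Python. The summary table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE summary (
    Id       INTEGER PRIMARY KEY,  -- SQLite fills this in when it is omitted
    category TEXT,
    amount   INTEGER
)""")

# The first two rows repeat every value, yet each still gets its own Id.
rows = [("fruit", 10), ("fruit", 10), ("veg", 3)]
conn.executemany("INSERT INTO summary (category, amount) VALUES (?, ?)", rows)

ids = [r[0] for r in conn.execute("SELECT Id FROM summary ORDER BY Id")]
```

Because the key carries no meaning, repeated values in the other columns no longer cause a "duplicate" error.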
Read the "how to organize my data" section of this page:
http://www.htmlgoodies.com/primers/database/article.php/3478051
This page shows you how to create one (under "add an autonumber primary key"):
http://office.microsoft.com/en-gb/access-help/create-or-remove-a-primary-key-HA010014099.aspx
If you use a DataAdapter and a Currency Manager, your tables must have a primary key in order to push updates, additions and deletions back to the database. Otherwise, they will not register and you will receive an error.
I lost one week figuring that one out until I added this to the Try-Catch-End Try block: MsgBox(er.ToString) which mentioned "key". From there, I figured it out.
(NB : Having a primary key was not a requisite in VB6)
Not having a primary key usually means your data is poorly structured. However, it looks like you're dealing with summary/aggregate data there, so it probably doesn't matter.
Should you always create unique keys whenever possible?
For example let's say I have a table with three fields, student ID, first name, last name and the student ID is the primary key.
If no two students have the same first & last name, should I create a unique key for those two fields?
Yes, you should use unique indexes even when you already have a primary key when the column or combination of columns are unique. It's good to have constraints in your database to prevent bad data. However, this is not what you have in your case. Even if you currently have no students with duplicate names that can easily happen in the future. Names are not unique in the world.
U.S. Social Security numbers are almost always unique (they can be reused after a number of years, but it's unlikely to ever happen in your case), so they might make for a good candidate for a unique index. If you have non-U.S. students though then you would need to make the column nullable.
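A small sketch of a nullable unique column next to the primary key. The column name ssn and the placeholder values are made up, and it runs on SQLite, where several NULLs may coexist in a UNIQUE column. Not every DBMS treats NULL the same way in unique indexes, so check yours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    ssn        TEXT UNIQUE        -- nullable: non-U.S. students have none
)""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (1, "John", "Smith", "000-00-0001"),   # placeholder value, not a real SSN
    (2, "John", "Smith", None),            # same name is fine, no SSN
    (3, "Ana", "Silva", None),             # a second NULL is accepted in SQLite
])

# A repeated non-NULL SSN is refused by the unique constraint.
try:
    conn.execute("INSERT INTO student VALUES (4, 'Jane', 'Doe', '000-00-0001')")
    duplicate_ssn_rejected = False
except sqlite3.IntegrityError:
    duplicate_ssn_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```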
Yes, usually having unique IDs (surrogate keys) is best. In this case, last name and first name are not enough for a primary key. Even if you have no duplicate names now, you can't be sure you won't have two John Smiths in the future.
Don't make the assumption that no two students will have the same name.
When the underlying model suggests it, it is a good idea to create unique keys. Constraints like these will ensure cohesive data and prevent errors. But in your case the underlying model does not suggest this to be the case.
Unique keys should follow business definitions; if the studentID is a "semi-natural" key (it has unique meaning that exists beyond your specific database), then that should suffice as your unique key.
If the studentID is simply an identity value that is assigned by the database as a row-number, then you probably need some other unique key to avoid entering the same student twice.
A primitive primary key with no relation to the data domain is one of the widely accepted best practices (just imagine one of your students decides to marry).
Another good practice (though from the NoSQL world) is to use a GUID: this way keys are unique, and different datasets can be mixed in the same table without collisions.
PS: you could save some storage space, but storage is cheap today and there is no need to sacrifice good practices for it.
Yes!
If you ever need to update or delete rows from the table, it is very advantageous to have something to uniquely identify each row in the table.
With your example, I don't think it's possible to guarantee no two students will share the same name. Even adding a date of birth still can't guarantee they'll always be unique. I'd recommend adding an auto incrementing INT or BIGINT as the primary key.
You can always add the Unique constraint as well and remove it if it becomes an issue.
A simple way to do it is to use an auto-generated GUID (Globally Unique Identifier) to identify a student. It is "guaranteed" to be unique every time it is generated. Names can change (like when somebody gets married), but an auto-generated value has no meaning, so it should never need to be changed.
http://en.wikipedia.org/wiki/Globally_unique_identifier
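For illustration, a sketch using Python's uuid module to generate the GUID on the application side. The student_guid column and the add_student helper are hypothetical names, and storing the GUID as TEXT in SQLite is just the simplest option for a demo.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student (
    student_guid TEXT PRIMARY KEY,   -- random GUID, carries no meaning
    first_name   TEXT NOT NULL,
    last_name    TEXT NOT NULL
)""")


def add_student(first_name, last_name):
    """Insert a student under a freshly generated random GUID and return it."""
    guid = str(uuid.uuid4())
    conn.execute("INSERT INTO student VALUES (?, ?, ?)", (guid, first_name, last_name))
    return guid


# Two students with identical names still get distinct keys.
first = add_student("John", "Smith")
second = add_student("John", "Smith")

# A name change touches only the name; the key stays put.
conn.execute("UPDATE student SET last_name = 'Jones' WHERE student_guid = ?", (first,))
renamed = conn.execute(
    "SELECT last_name FROM student WHERE student_guid = ?", (first,)).fetchone()[0]
```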
Your database constraints should be DBMS understood business rules. Is there a business rule that states that no two students may have the same first and last name combination? I presume not, therefore do not create a unique key for those two fields. Perhaps best not to presume, though, and ask a business domain expert e.g. the enrolment officer.
Note that a row in this table is a proposition, i.e. that there exists a student enrolled with first name 'x' and last name 'y' and student ID 'z'. Clearly the DBMS has no concept of whether this proposition is true in the real world. What normally happens is that there will be a trusted source to verify data. The enterprise will authorize an officer (director etc) in this role. Let's say it is the enrolment officer who is responsible for verifying that 'x y' is a real person, that they are eligible to be enrolled, and the person is who they say they are. Typically, they will require sight of documents (certificates, passport, etc), take up references, interview the person, check public records, etc. Of course, the enrolment officer may delegate their responsibility to other members of staff or engage an agent.
At some point they will be satisfied and for convenience will issue their own identifier, the student ID. Mistakes do happen and it may turn out that this value is not unique, in which case it would be the enrolment officer's responsibility to resolve the problem and issue a new student ID. Perhaps they will use software to generate the value to mitigate against such problems. The student ID will be issued to the student and will be used within the enterprise to identify the person for the convenience of all concerned. They may even be issued with a document (e.g. photo ID card) to assist in identification, based on the level of trust in a given context (e.g. may need to produce photo ID to sit an exam). If the student forgets their ID, loses their issued documents, etc then the enrolment office will be able to retrieve it from records e.g. with reference to copy documents taken during the verification process; they are unlikely to use first name and last name alone.
The point is, the trusted source for the identifier is the enrolment officer on behalf of the enterprise, rather than the database, the DBMS or any other kind of software involved in the process. Therefore, it probably is acceptable to make student ID the sole identifier for students within the database. Consider, however, that an auto-increment column generated on one hardware build of a single DBMS within the enterprise is probably not suitable for the allocation of such significant identifier values.
I'm building a site similar to Yelp (Recommendation Engine, on a smaller scale though), so there will be three main entities in the system: User, Place (includes businesses), and Event.
Now what I'm wondering about is how to store information such as photos, comments, and 'compliments' (similar to Facebook's "Like") for each of these type of entity, and also for each object they can be applied to (e.g. comment on a recommendation, photo, etc). Right now the way I was doing it was a single table for each i.e.
Photo (id, type, owner_id, is_main, etc...)
where type represents: 1=user, 2=place, 3=event
Comment (id, object_type, object_id, user_id, content, etc, etc...)
where object_type can be a few different objects like photos, recommendations, etc
Compliment (object_id, object_type, compliment_type, user_id)
where object_type can be a few different objects like photos, recommendations, etc
Activity (id, source, source_type, source_id, etc..) //for "activity feed"
where source_type is a user, place, or event
Notification (id, recipient, sender, activity_type, object_type, object_id, etc...)
where object_type & object_id will be used to provide a direct link to the object of the notification e.g. a user's photo that was complimented
But after reading a few posts on SO, I realized I can't maintain referential integrity with a foreign key since that requires a 1:1 relationship and my source_id/object_id fields can relate to an ID in more than one table. So I decided to go with the method of keeping the main entity, but then break it into subsets i.e.
User_Photo (photo_id, user_id) | Place_Photo(photo_id, place_id) | etc...
Photo_Comment (comment_id, photo_id) | Recommendation_Comment(comment_id, rec_id) | etc...
Compliment (id, ...) //would need to add a surrogate key to Compliment table now
Photo_Compliment(compliment_id, photo_id) | Comment_Compliment(compliment_id, comment_id) | etc...
User_Activity(activity_id, user_id) | Place_Activity(activity_id, place_id) | etc...
I was thinking I could just create views joining each sub-table to the main table to get the results I want. Plus I'm thinking it would fit into my object models in Code Igniter as well.
The only table I think I could leave is the notifications table, since there are many object types (forum post, photo, recommendation, etc, etc), and this table will only hold notifications for a week anyway so any ref integrity issues shouldn't be much of a problem (I think).
So am I going about this in a sensible way? Any performance, reliability, or other issues that I may have overlooked?
The only "problem" I can see is that I would end up with a lot of tables (as it is right now I have about 72, so I guess I would end up with a little under 90 tables after I add the extras), and that's not an issue as far as I can tell.
Really grateful for any kind of feedback. Thanks in advance.
EDIT: Just to be clear, I'm not concerned if I end up with another 10 or so tables. From what I know, the number of tables isn't too much of an issue (once they're being used)... unless you had say 200 or so :/
Some propositions for this UoD (universe of discourse)
User named Bob logged in.
User named Bob uploaded photo number 56.
There is a place named London.
Photo number 56 is of place named London.
User named Joe created comment "very nice" on photo number 56.
To introduce object IDs
User (UserID) logged in.
User (UserID) uploaded Photo (PhotoID).
There is Place (PlaceID).
Photo (PhotoID) is of Place (PlaceID).
User (UserID) created Comment (CommentID) on Photo (PhotoID).
Just Fact Types
User logged in.
User uploaded Photo.
Place exists.
Photo is of Place.
User created Comment on Photo.
Now to extract predicates
Predicate                  Predicate Arity
---------------------------------------------
... logged in              1 (Unary predicate)
... uploaded ...           2 (Binary)
... exists                 1 (Unary)
... is of ...              2 (Binary)
... created ... on ...     3 (Ternary)
It looks like each proposition in this UoD may be stated with at most a ternary predicate, so I would suggest something like
Predicate role (Role_1_ID, Role_2_ID, Role_3_ID) is a part that an object plays in a predicate. Substitute the ... in a predicate from left to right with each Role_ID.
Note that only Role_1_ID is mandatory (at least unary predicate), the other two may be NULL.
In this simple model, it is possible to propose anything.
Hence, you would need to implement constraints on the application layer.
For example, you have to make sure that it is possible to create Comment on Place, but not create Place on Place.
Not all predicates represent an action, for example ... logged in is an action while ... is of ... is not.
So, your activity feed would list all Propositions with Predicate.IsAction = True.
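One possible reading of this model, sketched with sqlite3 from Python. The table layout, the Template column, the render helper, and storing role IDs as plain text are all my assumptions made to keep the example short and runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE Predicate (
    PredicateID INTEGER PRIMARY KEY,
    Template    TEXT NOT NULL,       -- e.g. '... created ... on ...'
    Arity       INTEGER NOT NULL CHECK (Arity BETWEEN 1 AND 3),
    IsAction    INTEGER NOT NULL     -- 1 = true, 0 = false
);
CREATE TABLE Proposition (
    PropositionID INTEGER PRIMARY KEY,
    PredicateID   INTEGER NOT NULL REFERENCES Predicate (PredicateID),
    Role_1_ID     TEXT NOT NULL,     -- mandatory: every predicate is at least unary
    Role_2_ID     TEXT,              -- NULL for unary predicates
    Role_3_ID     TEXT               -- NULL for unary and binary predicates
);
""")
conn.executemany("INSERT INTO Predicate VALUES (?, ?, ?, ?)", [
    (1, "... logged in", 1, 1),
    (2, "... uploaded ...", 2, 1),
    (3, "... is of ...", 2, 0),
    (4, "... created ... on ...", 3, 1),
])
conn.executemany(
    "INSERT INTO Proposition (PredicateID, Role_1_ID, Role_2_ID, Role_3_ID) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, "User Bob", None, None),
        (2, "User Bob", "Photo 56", None),
        (3, "Photo 56", "Place London", None),
        (4, "User Joe", "Comment 1", "Photo 56"),
    ],
)


def render(template, roles):
    """Substitute each '...' from left to right with the next non-NULL role."""
    for role in roles:
        if role is not None:
            template = template.replace("...", role, 1)
    return template


# The activity feed lists only propositions whose predicate is an action.
activity_feed = [
    render(template, roles)
    for template, *roles in conn.execute("""
        SELECT p.Template, s.Role_1_ID, s.Role_2_ID, s.Role_3_ID
        FROM Proposition AS s
        JOIN Predicate AS p USING (PredicateID)
        WHERE p.IsAction = 1
        ORDER BY s.PropositionID""")
]
```

"Photo 56 is of Place London" is stored but stays out of the feed because its predicate is not an action. Nothing here stops a nonsensical proposition such as a Place created on a Place, which is the application-layer constraint mentioned above.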
If you rearrange things slightly, you can simplify your comments and compliments. Essentially you want to have a single store of comments and another one of compliments. Your problem is that this won't let you use declarative referential integrity (foreign key constraints).
The way to solve this is to make sure that the objects that can attract comments and compliments are all logical sub-types of one supertype. From a logical perspective, it means you have a "THING_OF_INTEREST" entity (I'm not making a naming convention recommendation here!) and each of the various specific things which attract comments and compliments will be a sub-type of THING_OF_INTEREST. Therefore your comments table will have a "thing_of_interest_id" FK column and similarly for your compliments table. You will still have the sub-type tables, but they will have a 1:1 FK with THING_OF_INTEREST. In other words, THING_OF_INTEREST does the job of giving you a single primary key domain, whereas all of the sub-type tables contain the type-specific attributes. In this way, you can still use declarative referential integrity to enforce your comment and compliment relationships without having to have separate tables for different types of comments and compliments.
From a physical implementation perspective, the most important thing is that your various things of interest all share a common primary key domain. That's what lets your comment table have a single FK value that can be easily joined with whatever that thing of interest happens to be.
Depending on how you go after your comments and recommendations, you probably will (but may not) need to physically implement THING_OF_INTEREST - which will have at least two attributes, the primary key (usually an int) plus a partitioning attribute that tells you which sub-type of thing it is.
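A minimal sketch of the physical version, with only two sub-types shown. Column names such as thing_type, url and body are placeholders of mine, and sqlite3 is used only so the example runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
-- Supertype: one shared primary key domain plus a partitioning attribute.
CREATE TABLE thing_of_interest (
    thing_id   INTEGER PRIMARY KEY,
    thing_type TEXT NOT NULL CHECK (thing_type IN ('photo', 'recommendation'))
);
-- Sub-types: 1:1 with the supertype, holding type-specific attributes.
CREATE TABLE photo (
    thing_id INTEGER PRIMARY KEY REFERENCES thing_of_interest (thing_id),
    url      TEXT
);
CREATE TABLE recommendation (
    thing_id INTEGER PRIMARY KEY REFERENCES thing_of_interest (thing_id),
    body     TEXT
);
-- A single comment table with an ordinary, enforceable foreign key.
CREATE TABLE comment (
    comment_id INTEGER PRIMARY KEY,
    thing_id   INTEGER NOT NULL REFERENCES thing_of_interest (thing_id),
    user_id    INTEGER NOT NULL,
    content    TEXT
);
""")
conn.executemany("INSERT INTO thing_of_interest VALUES (?, ?)",
                 [(1, "photo"), (2, "recommendation")])
conn.execute("INSERT INTO photo VALUES (1, 'photos/56.jpg')")
conn.execute("INSERT INTO recommendation VALUES (2, 'Try the pier at sunset')")
conn.executemany("INSERT INTO comment VALUES (?, ?, ?, ?)",
                 [(100, 1, 7, "very nice"), (101, 2, 7, "agreed")])

# A comment pointing at a thing that does not exist is refused.
try:
    conn.execute("INSERT INTO comment VALUES (102, 999, 7, 'lost')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# One join tells you what kind of thing each comment is about.
commented_types = conn.execute("""
    SELECT c.comment_id, t.thing_type
    FROM comment AS c
    JOIN thing_of_interest AS t USING (thing_id)
    ORDER BY c.comment_id""").fetchall()
```

The compliments table would follow the same pattern with its own thing_id foreign key.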
If you need referential integrity (RI) there is no better way to do it than to use many-to-many junction tables. True, you end up having a lot of tables in the system, but that's the cost you need to pay. It also has some other benefits going this route, for instance you get some sort of partitioning for free: you get the data partitioned by their relation type, each in its own table. This offers RI but it is not 100% safe either, for instance there's nothing to guarantee you that a comment belongs to a photo and to that photo alone, you'd need to enforce this kind of constraints manually should you need them.
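Here is a sketch of the junction-table route, including the gap just described. Table names follow the question's Photo_Comment / Recommendation_Comment idea, the rest is assumed, and sqlite3 is only a stand-in DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
CREATE TABLE photo (photo_id INTEGER PRIMARY KEY);
CREATE TABLE recommendation (rec_id INTEGER PRIMARY KEY);
CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, content TEXT);

-- One junction table per relation type, each with real foreign keys.
CREATE TABLE photo_comment (
    comment_id INTEGER PRIMARY KEY REFERENCES comment (comment_id),
    photo_id   INTEGER NOT NULL REFERENCES photo (photo_id)
);
CREATE TABLE recommendation_comment (
    comment_id INTEGER PRIMARY KEY REFERENCES comment (comment_id),
    rec_id     INTEGER NOT NULL REFERENCES recommendation (rec_id)
);
""")
conn.execute("INSERT INTO photo VALUES (1)")
conn.execute("INSERT INTO recommendation VALUES (1)")
conn.executemany("INSERT INTO comment VALUES (?, ?)", [(1, "very nice"), (2, "hmm")])
conn.execute("INSERT INTO photo_comment VALUES (1, 1)")


def accepted(sql):
    """Return True if the database accepts the statement."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# RI works: comment 2 cannot be attached to a photo that does not exist.
dangling_accepted = accepted("INSERT INTO photo_comment VALUES (2, 99)")

# But comment 1, already on a photo, can also be attached to a recommendation.
cross_attached = accepted("INSERT INTO recommendation_comment VALUES (1, 1)")
```

The second insert going through is exactly the kind of rule you would have to enforce yourself (in application code or with triggers) if a comment must belong to one object only.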
On the other hand, going with a generic solution like you already did gets you faster off the ground and it's way easier to extend in the future but there'll be no RI unless you'll code it manually (which is very complex and a lot harder to deal with than the alternative M:M for every relation type).
Just to mention another alternative, similar to your existing implementation, you could use a custom M:M junction table to handle all your relations regardless of their type: object1_type, object1_id, object2_type, object2_id. Simple but no other benefit beside very easy to implement and extend. I'd only recommend it if you don't need RI and you got yourself a lot of tables, all interlinked.