DB Design: Favor Abstraction or Foreign-Key Constraints? - language-agnostic

Say we have this scenario:
Artist ==< Album ==< Track
// i.e., one Artist can have many Albums, and one Album can have many Tracks
In this case, all 3 entities have basically the same fields:
ID
Name
A foreign key for the one-many relationship to the corresponding children (Artist to Album and Album to Track)
A typical solution to the provided scenario would be three tables with the same fields (ArtistID, AlbumID, etc.) and foreign key constraints on the one-many relationship field.
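A minimal sketch of that three-table layout, run through Python's built-in sqlite3 module so it can be tried directly. The column names and the try_insert helper are assumptions for illustration, and the foreign key is placed on the child side (Album points at Artist, Track points at Album), which is the usual way to model a one-many relationship.

```python
import sqlite3

def build_music_schema() -> sqlite3.Connection:
    """Create Artist -> Album -> Track as three tables with foreign keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript("""
        CREATE TABLE artist (
            artist_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE album (
            album_id  INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            artist_id INTEGER NOT NULL REFERENCES artist(artist_id)
        );
        CREATE TABLE track (
            track_id  INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            album_id  INTEGER NOT NULL REFERENCES album(album_id)
        );
    """)
    return conn

def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_music_schema()
conn.execute("INSERT INTO artist VALUES (1, 'Some Artist')")
conn.execute("INSERT INTO album VALUES (10, 'Some Album', 1)")
# A track on an existing album is accepted
ok = try_insert(conn, "INSERT INTO track VALUES (?, ?, ?)", (100, "Track A", 10))
# A track pointing at a missing album is rejected by the foreign key
bad = try_insert(conn, "INSERT INTO track VALUES (?, ?, ?)", (101, "Track B", 999))
print(ok, bad)
```

Because each foreign key names a specific parent table, a Track can only ever point at an Album, never at an Artist.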
But can we, in this case, incorporate a form of inheritance to avoid repeating the same fields? I'm talking about something of this sort:
Table: EntityType(EntityTypeID, EntityName)
This table would hold 3 entities (1. Artist, 2. Album, 3. Track)
Table: Entities(EntityID, Name, RelField, EntityTypeID)
This table will hold the name of the entity (like the name of
an artist for example), the one-many field (foreign-key
of EntityID) and EntityTypeID holding 1 for Artist, 2 for Album
and so on.
What do you think about the above design? Does it make sense to incorporate "OOP concepts" in this DB scenario?
And finally, would you prefer having the foreign-key constraints of the first scenario or the more generic approach (with the risk of linking an Artist with a Track, for example, since there is no check that the inputted foreign-key value really is that of an Album)?
..btw, come to think of it, I think you can actually check if an inputted value of the RelField of an Artist corresponds to an Album, with triggers maybe?
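For what it's worth, here is a rough sketch of what such a trigger check could look like on the generic design, again using Python's sqlite3. It assumes RelField sits on the child row and points at its parent, and that the type ids are ordered as in the question (1 Artist, 2 Album, 3 Track), so a valid parent is always exactly one type "up". The snake_case names and the add_entity helper are made up for the example.

```python
import sqlite3

def build_generic_schema() -> sqlite3.Connection:
    """Single Entities table plus a trigger that checks the parent's type."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE entity_type (
            entity_type_id INTEGER PRIMARY KEY,
            entity_name    TEXT NOT NULL
        );
        INSERT INTO entity_type VALUES (1, 'Artist'), (2, 'Album'), (3, 'Track');

        CREATE TABLE entities (
            entity_id      INTEGER PRIMARY KEY,
            name           TEXT NOT NULL,
            rel_field      INTEGER REFERENCES entities(entity_id),
            entity_type_id INTEGER NOT NULL REFERENCES entity_type(entity_type_id)
        );

        -- The plain FK only proves the parent exists; this trigger also
        -- checks its type. It covers INSERT only; UPDATE would need its own.
        CREATE TRIGGER entities_parent_type
        BEFORE INSERT ON entities
        WHEN NEW.entity_type_id > 1
        BEGIN
            SELECT RAISE(ABORT, 'parent has the wrong entity type')
            WHERE NOT EXISTS (
                SELECT 1 FROM entities p
                WHERE p.entity_id = NEW.rel_field
                  AND p.entity_type_id = NEW.entity_type_id - 1
            );
        END;
    """)
    return conn

def add_entity(conn, entity_id, name, rel_field, entity_type_id) -> bool:
    """Return True if the row is accepted, False if the trigger or FK rejects it."""
    try:
        conn.execute(
            "INSERT INTO entities VALUES (?, ?, ?, ?)",
            (entity_id, name, rel_field, entity_type_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_generic_schema()
results = [
    add_entity(conn, 1, "Some Artist", None, 1),  # artist, no parent
    add_entity(conn, 2, "Some Album", 1, 2),      # album under an artist
    add_entity(conn, 3, "Stray Track", 1, 3),     # track under an artist: rejected
    add_entity(conn, 4, "Some Track", 2, 3),      # track under an album
]
print(results)
```

So it can be done, but the rule that a foreign key would state in one line now lives in procedural code that has to be kept in sync with the type table.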

I have recently seen this very idea of abstraction implemented consistently, and the application and its database became a monster to maintain and troubleshoot. I will stay away from this technique. The simpler, the better, is my mantra.

There's very little chance that the additional fields that will inevitably accumulate on the various entities will be as obliging. Nothing to be gained by not reflecting reality in a reasonably close fashion.
I don't imagine you'd even likely conflate these entities in your regular OO design.
This reminds me (but only slightly) of an attempt I saw once to implement everything in a single table (named "Entity") with another table (named "Attributes") and a junction table between them.

By sticking all three together, you make your queries less readable (unless you then decompose the three categories as views) and you make searching and indexing more difficult.
Plus, at some point you'll want to add attributes to one category, which aren't attributes for the others. Sticking all three together gives you no room for change without ripping out chunks of your system.
Don't get so clever you trip yourself up.

The only advantage I can see to doing it in your OOP way is if there are other element types added in future (i.e., other than artist, album and track). In that case, you wouldn't need a schema change.
However, I'd tend to opt for the non-OOP way and just change the schema in that case. Some problems you have with the OOP solution are:
what if you want to add the birthdate of artist?
what if you want to store duration of albums and tracks?
what if you want to store track type?
Basically, what if you want to store something that's specific only to one or two of the element types?

If you're into this sort of thing, then take a look at table inheritance in PostgreSQL.
-- Primary keys are not inherited, so each child table declares its own.
create table Artist (id integer not null primary key, name varchar(50));
create table Album (primary key (id), parent integer references Artist (id)) inherits (Artist);
create table Track (primary key (id), parent integer references Album (id)) inherits (Artist);

I agree with le dorfier, you might get some reuse out of the notion of a base entity (ID, Name) but beyond that point the concepts of Artist, Album, and Track will diverge.
And a more realistic model would probably have to deal with the fact that multiple artists may contribute to a single track on an album...

Related

Normalize two tables with same primary key to 3NF

I currently have two tables with the same primary key. Can I have two tables with the same primary key?
Also, are all the tables in 3rd normal form?
Ticket:
-------------------
Ticket_id* PK
Flight_name* FK
Names*
Price
Tax
Number_bags
Travel class:
-------------------
Ticket id * PK
Customer_5star
Customer_normal
Customer_2star
Airmiles
Lounge_discount
ticket_economy
ticket_business
ticket_first
food allowance
drink allowance
the rest of the tables in the database are below
Passengers:
Names* PK
Credit_card_number
Credit_card_issue
Ticket_id *
Address
Flight:
Flight_name* PK
Flight_date
Source_airport_id* FK
Dest_airport_id* FK
Source
Destination
Plane_id*
Airport:
Source_airport_id* PK
Dest_airport_id* PK
Source_airport_country
Dest_airport_country
Pilot:
Pilot_name* PK
Plane id* FK
Pilot_grade
Month
Hours flown
Rate
Plane:
Plane_id* PK
Pilot_name* FK
This is not meant as an answer but it became too long for a comment...
Not to sound harsh, but your model has some serious flaws and you should probably take it back to the drawing board.
Consider what would happen if a Passenger buys a second Ticket for instance. The Passenger table should not hold any reference to tickets. Maybe a passenger can have more than one credit card though? Shouldn't Credit Cards be in their own table? The same applies to Addresses.
Why does the Airport table hold information that really is about destinations (or paths/trips)? You already record trip information in the Flights table. It seems to me that the Airport table should hold information pertaining to a particular airport (like name, location?, IATA code et cetera).
Can a Pilot just be associated with one single Plane? Doesn't sound very likely. The pilot table should not hold information about planes.
And the Planes table should not hold information on pilots as a plane surely can be connected to more than one pilot.
And so on... there are most likely other issues too, but these pointers should give you something to think about.
The only tables that sort of look ok to me are Ticket and Flight.
Re same primary key:
Yes there can be multiple tables with the same primary key. Both in principle and in good practice. We declare a primary or other unique column set to say that those columns (and supersets of them) are unique in a table. When that is the case, declare such column sets. This happens all the time.
Eg: A typical reasonable case is "subtyping"/"subtables", where entities of a kind identified by a candidate key of one table are always or sometimes also of the kind identified by the same values in another table. (If always then the one table's candidate key values are also in the other table's. And so we would declare a foreign key from the one to the other. We would say the one table's kind of entity is a subtype of the other's.) On the other hand sometimes one table is used with attributes of both kinds and attributes inapplicable to one kind are not used. (Ie via NULL or a tag indicating kind.)
Whether you should have cases of the same primary key depends on other criteria for good design as applied to your particular situation. You need to learn design including normalization.
Eg: All keys simple and 3NF implies 5NF, so if your two tables have the same set of values as only & simple primary key in every state and they are both in 3NF then their join contains exactly the same information as they do separately. Still, maybe you would keep them separate for clarity of design, for likelihood of change or for performance based on usage. You didn't give that information.
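As a small illustration of the subtype case, here is a sketch in Python's sqlite3 with two tables that share the same primary key values, a foreign key from one to the other, and the join that recombines them. Only a couple of columns from the question (Price, Airmiles) are kept, and the function names are invented for the example; whether this split suits the real data depends on the dependencies that were not given.

```python
import sqlite3

def build_ticket_schema() -> sqlite3.Connection:
    """Two tables keyed by the same ticket_id, linked by a foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE ticket (
            ticket_id INTEGER PRIMARY KEY,
            price     REAL NOT NULL
        );
        -- Same key; the FK says every travel_class row is also a ticket
        CREATE TABLE travel_class (
            ticket_id INTEGER PRIMARY KEY REFERENCES ticket(ticket_id),
            airmiles  INTEGER NOT NULL
        );
        INSERT INTO ticket VALUES (1, 120.0), (2, 450.0);
        INSERT INTO travel_class VALUES (1, 300), (2, 900);
    """)
    return conn

def joined_rows(conn: sqlite3.Connection) -> list:
    """Recombine both tables on the shared key."""
    return conn.execute("""
        SELECT t.ticket_id, t.price, c.airmiles
        FROM ticket t
        JOIN travel_class c ON c.ticket_id = t.ticket_id
        ORDER BY t.ticket_id
    """).fetchall()

conn = build_ticket_schema()
for row in joined_rows(conn):
    print(row)
```

With the same key values present in both tables, the join loses nothing, so keeping them apart or merging them is a design choice and not a correctness question.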
Re normal forms:
Normal forms apply to tables. The highest normal form of a table is a property independent of any other table. (Although you might choose that form based on what forms & tables are alternatives.)
In order to normalize or determine a table's highest normal form one needs to know (in general) all the functional dependencies in it. (For normal forms above BCNF, also join dependencies.) You didn't give them. They are determined by what the meaning of the table is (ie how to determine what rows go in it in any given situation) and the possible situations that can arise. You didn't give them. Your expectation that we could tell you about the normal forms your tables are in without giving such information suggests that you do not understand normalization and need to educate yourself about it.
Proper design also needs this information and in general all valid states that can arise from situations that arise. Ie constraints among given tables. You didn't give them.
Having two tables with the same key goes against the idea of removing redundancy in normalization.
Excluding that, are these tables in 1NF and 2NF?
Judging by the Names field, I'd suggest that table1 is not. If multiple names can belong to one ticket, then you need a new table, most likely with a composite key of ticket_id,name.

What is the Best Practice for Composite Key with JPA?

I am creating a DB for my project and I am facing a doubt regarding best practice.
My concrete case is:
I have a table that stores the floors of a building called "floor"
I have a second table that stores the buildings called "building"
I have a third table that stores the relationship between them, called building_x_floor
The problem is this 3rd table.
What should I do?
Have only two columns, one holding a FK to the PK of building and another holding an FK to the PK of floor;
Have the two columns above and a third column with a PK, and control consistency with a trigger that forbids inserting a duplicate tuple of (idbuilding, idfloor)?
My first thought was to use the first option, but after googling around and talking to people I heard that it is not always the best option.
So I am asking for guidance.
I am Using MySQL 5.6.17
You don't need a third table, because there is a one-to-many relationship between building and floor.
So one building has many floors and a floor belongs to one building. Don't get things complicated. Even if you do need a table with composite keys, you should be careful: in JPA you need to override the equals and hashCode methods of the key class.
I am still not comfortable with that approach. I am not saying it is wrong or inappropriate, far from it. I am trying to understand how the information would be organized and how well it would perform.
If I have a 1:* relationship, like a student who may be attending more than one subject during their university course within a semester, I would have the 3rd table with (semester, idstudent, iddiscipline).
If I try to get rid of the join table, my relationship would be made with a FK inside the student table or inside the subject table. That does not make sense, because the student table holds the registration data of a person, while the discipline table holds the data of a discipline, like content, hours... it is more of a parametric table.
So I would need a table for the join.
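To make the two options from the question concrete, here is a sketch of such a join table in both styles. It uses Python's sqlite3 instead of MySQL 5.6 so that it runs standalone; the table names and the is_accepted helper are assumptions. The point is that in either style a declared key does the duplicate check, so no trigger is needed.

```python
import sqlite3

def build_enrollment_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE student (idstudent INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE discipline (iddiscipline INTEGER PRIMARY KEY, title TEXT NOT NULL);

        -- Option 1: the foreign-key columns themselves form the primary key
        CREATE TABLE enrollment (
            semester     TEXT    NOT NULL,
            idstudent    INTEGER NOT NULL REFERENCES student(idstudent),
            iddiscipline INTEGER NOT NULL REFERENCES discipline(iddiscipline),
            PRIMARY KEY (semester, idstudent, iddiscipline)
        );

        -- Option 2: surrogate key, with UNIQUE doing the job of the trigger
        CREATE TABLE enrollment_surrogate (
            idenrollment INTEGER PRIMARY KEY,
            semester     TEXT    NOT NULL,
            idstudent    INTEGER NOT NULL REFERENCES student(idstudent),
            iddiscipline INTEGER NOT NULL REFERENCES discipline(iddiscipline),
            UNIQUE (semester, idstudent, iddiscipline)
        );

        INSERT INTO student VALUES (1, 'Ana');
        INSERT INTO discipline VALUES (7, 'Databases');
    """)
    return conn

def is_accepted(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_enrollment_schema()
row = ("2014-1", 1, 7)
composite = "INSERT INTO enrollment VALUES (?, ?, ?)"
surrogate = ("INSERT INTO enrollment_surrogate (semester, idstudent, iddiscipline) "
             "VALUES (?, ?, ?)")
# First insert succeeds, the identical second one is rejected, in both styles
print(is_accepted(conn, composite, row), is_accepted(conn, composite, row))
print(is_accepted(conn, surrogate, row), is_accepted(conn, surrogate, row))
```

The surrogate variant mainly pays off when other tables need to reference an enrollment row with a single-column foreign key, or when the ORM mapping of composite keys is more trouble than it is worth.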

Doctrine ClassTable vs SingleTable inheritance (specific for a project)

I'm developing an art web where users can publish different types of art: Images, Literature, Fonts, etc.. my question is about the database structure for the Work table.
Each work has basically the same fields (id, owner, name, description) but also some unique fields:
Image: image_path, album_id (relation)
Literature: text, book_id (relation)
Fonts: file_path
What will be the best table structure? Please keep in mind that I'll have Comments and other relational tables pointing to Work
Single Table Inheritance
Pros:
easy to manage and use in relation. No JOINS are required.
Cons:
no separation of the unique fields (FontWork will have book_id, album_id, etc)
Class Table Inheritance
Pros:
each table will have only its unique fields.
Cons:
Performance. multiple JOINS for about every query executed.
I would like to hear your opinion about it and also get new implementation ideas!
Thanks :)
I would recommend that you start with simple Single Table inheritance because your classes have few fields, along with a type attribute (a tiny integer) to separate the different entities.
This also means that your comments and work tables will join to one table. As the numbers grow you can partition your table to improve performance.
Bottom line is start simple and make more complex as your needs change.
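A rough sketch of what the single-table layout could look like at the SQL level, written with Python's sqlite3 so it runs as-is. This is plain SQL and not Doctrine mapping code; the type codes (1 image, 2 literature, 3 font), the comment table columns and the CHECK constraints are assumptions added for illustration. The CHECKs are one way to soften the "no separation" con by refusing type-specific columns on the wrong type.

```python
import sqlite3

def build_work_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE work (
            id          INTEGER PRIMARY KEY,
            type        INTEGER NOT NULL CHECK (type IN (1, 2, 3)),
            owner       TEXT NOT NULL,
            name        TEXT NOT NULL,
            description TEXT,
            image_path  TEXT,      -- image only
            album_id    INTEGER,   -- image only
            "text"      TEXT,      -- literature only
            book_id     INTEGER,   -- literature only
            file_path   TEXT,      -- font only
            CHECK (type = 1 OR (image_path IS NULL AND album_id IS NULL)),
            CHECK (type = 2 OR ("text" IS NULL AND book_id IS NULL)),
            CHECK (type = 3 OR file_path IS NULL)
        );
        -- Comments point at one table, whatever the kind of work
        CREATE TABLE comment (
            id      INTEGER PRIMARY KEY,
            work_id INTEGER NOT NULL REFERENCES work(id),
            content TEXT NOT NULL
        );
    """)
    return conn

def comments_with_work(conn: sqlite3.Connection) -> list:
    """One join reaches every kind of work."""
    return conn.execute("""
        SELECT c.content, w.name, w.type
        FROM comment c JOIN work w ON w.id = c.work_id
        ORDER BY c.id
    """).fetchall()

conn = build_work_schema()
conn.execute("INSERT INTO work (id, type, owner, name, image_path) "
             "VALUES (1, 1, 'ann', 'Sunset', 'img/sunset.png')")
conn.execute("INSERT INTO work (id, type, owner, name, file_path) "
             "VALUES (2, 3, 'bob', 'Mono', 'fonts/mono.ttf')")
conn.execute("INSERT INTO comment VALUES (1, 1, 'Lovely')")
conn.execute("INSERT INTO comment VALUES (2, 2, 'Nice glyphs')")
print(comments_with_work(conn))
```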

Trying to avoid "Polymorphic Associations" and uphold foreign key referential integrity

I'm building a site similar to Yelp (Recommendation Engine, on a smaller scale though), so there will be three main entities in the system: User, Place (includes businesses), and Event.
Now what I'm wondering about is how to store information such as photos, comments, and 'compliments' (similar to Facebook's "Like") for each of these type of entity, and also for each object they can be applied to (e.g. comment on a recommendation, photo, etc). Right now the way I was doing it was a single table for each i.e.
Photo (id, type, owner_id, is_main, etc...)
where type represents: 1=user, 2=place, 3=event
Comment (id, object_type, object_id, user_id, content, etc, etc...)
where object_type can be a few different objects like photos, recommendations, etc
Compliment (object_id, object_type, compliment_type, user_id)
where object_type can be a few different objects like photos, recommendations, etc
Activity (id, source, source_type, source_id, etc..) //for "activity feed"
where source_type is a user, place, or event
Notification (id, recipient, sender, activity_type, object_type, object_id, etc...)
where object_type & object_id will be used to provide a direct link to the object of the notification e.g. a user's photo that was complimented
But after reading a few posts on SO, I realized I can't maintain referential integrity with a foreign key since that requires a 1:1 relationship and my source_id/object_id fields can relate to an ID in more than one table. So I decided to go with the method of keeping the main entity, but then break it into subsets i.e.
User_Photo (photo_id, user_id) | Place_Photo(photo_id, place_id) | etc...
Photo_Comment (comment_id, photo_id) | Recommendation_Comment(comment_id, rec_id) | etc...
Compliment (id, ...) //would need to add a surrogate key to Compliment table now
Photo_Compliment(compliment_id, photo_id) | Comment_Compliment(compliment_id, comment_id) | etc...
User_Activity(activity_id, user_id) | Place_Activity(activity_id, place_id) | etc...
I was thinking I could just create views joining each sub-table to the main table to get the results I want. Plus I'm thinking it would fit into my object models in Code Igniter as well.
The only table I think I could leave is the notifications table, since there are many object types (forum post, photo, recommendation, etc, etc), and this table will only hold notifications for a week anyway so any ref integrity issues shouldn't be much of a problem (I think).
So am I going about this in a sensible way? Any performance, reliability, or other issues that I may have overlooked?
The only "problem" I can see is that I would end up with a lot of tables (as it is right now I have about 72, so I guess I would end up with a little under 90 tables after I add the extras), and that's not an issue as far as I can tell.
Really grateful for any kind of feedback. Thanks in advance.
EDIT: Just to be clear, I'm not concerned if I end up with another 10 or so tables. From what I know, the number of tables isn't too much of an issue (once they're being used)... unless you had say 200 or so :/
Some propositions for this UoD (universe of discourse)
User named Bob logged in.
User named Bob uploaded photo number 56.
There is a place named London.
Photo number 56 is of place named London.
User named Joe created comment "very nice" on photo number 56.
To introduce object IDs
User (UserID) logged in.
User (UserID) uploaded Photo (PhotoID).
There is Place (PlaceID).
Photo (PhotoID) is of Place (PlaceID).
User (UserID) created Comment (CommentID) on Photo (PhotoID).
Just Fact Types
User logged in.
User uploaded Photo.
Place exists.
Photo is of Place.
User created Comment on Photo.
Now to extract predicates
Predicate               Predicate Arity
---------------------------------------------
... logged in           1 (Unary predicate)
... uploaded ...        2 (Binary)
... exists              1 (Unary)
... is of ...           2 (Binary)
... created ... on ...  3 (Ternary)
It looks like each proposition in this UoD may be stated with at most a ternary predicate,
so I would suggest something like
Predicate role (Role_1_ID, Role_2_ID, Role_3_ID) is a part that an object plays in a predicate. Substitute the ... in a predicate from left to right with each Role_ID.
Note that only Role_1_ID is mandatory (at least unary predicate), the other two may be NULL.
In this simple model, it is possible to propose anything.
Hence, you would need to implement constraints on the application layer.
For example, you have to make sure that it is possible to create Comment on Place, but not create Place on Place.
Not all predicates represent an action, for example ... logged in is an action while ... is of ... is not.
So, your activity feed would list all Propositions with Predicate.IsAction = True.
If you rearrange things slightly, you can simplify your comments and compliments. Essentially you want to have a single store of comments and another one of compliments. Your problem is that this won't let you use declarative referential integrity (foreign key constraints).
The way to solve this is to make sure that the objects that can attract comments and compliments are all logical sub-types of one supertype. From a logical perspective, it means you have a "THING_OF_INTEREST" entity (I'm not making a naming convention recommendation here!) and each of the various specific things which attract comments and compliments will be a sub-type of THING_OF_INTEREST. Therefore your comments table will have a "thing_of_interest_id" FK column and similarly for your compliments table. You will still have the sub-type tables, but they will have a 1:1 FK with THING_OF_INTEREST. In other words, THING_OF_INTEREST does the job of giving you a single primary key domain, whereas all of the sub-type tables contain the type-specific attributes. In this way, you can still use declarative referential integrity to enforce your comment and compliment relationships without having to have separate tables for different types of comments and compliments.
From a physical implementation perspective, the most important thing is that your various things of interest all share a common primary key domain. That's what lets your comment table have a single FK value that can be easily joined with whatever that thing of interest happens to be.
Depending on how you go after your comments and recommendations, you probably will (but may not) need to physically implement THING_OF_INTEREST - which will have at least two attributes, the primary key (usually an int) plus a partitioning attribute that tells you which sub-type of thing it is.
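Here is one possible sketch of that supertype arrangement, using Python's sqlite3. The table and column names are placeholders, only two sub-types are shown, and the composite (thing_id, thing_type) foreign key is an extra assumption layered on top of the description above: it uses the partitioning attribute to stop one id from being claimed by two different sub-type tables.

```python
import sqlite3

def build_supertype_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- One primary key domain for everything that can attract comments
        CREATE TABLE thing_of_interest (
            thing_id   INTEGER PRIMARY KEY,
            thing_type TEXT NOT NULL CHECK (thing_type IN ('photo', 'place')),
            UNIQUE (thing_id, thing_type)
        );
        -- Sub-type tables hold the type-specific attributes, 1:1 with the supertype
        CREATE TABLE photo (
            thing_id   INTEGER PRIMARY KEY,
            thing_type TEXT NOT NULL DEFAULT 'photo' CHECK (thing_type = 'photo'),
            path       TEXT NOT NULL,
            FOREIGN KEY (thing_id, thing_type)
                REFERENCES thing_of_interest(thing_id, thing_type)
        );
        CREATE TABLE place (
            thing_id   INTEGER PRIMARY KEY,
            thing_type TEXT NOT NULL DEFAULT 'place' CHECK (thing_type = 'place'),
            name       TEXT NOT NULL,
            FOREIGN KEY (thing_id, thing_type)
                REFERENCES thing_of_interest(thing_id, thing_type)
        );
        -- A single comment table with an ordinary, declarative foreign key
        CREATE TABLE comment (
            comment_id INTEGER PRIMARY KEY,
            thing_id   INTEGER NOT NULL REFERENCES thing_of_interest(thing_id),
            user_id    INTEGER NOT NULL,
            content    TEXT NOT NULL
        );
    """)
    return conn

def accepted(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_supertype_schema()
conn.execute("INSERT INTO thing_of_interest VALUES (56, 'photo')")
conn.execute("INSERT INTO thing_of_interest VALUES (57, 'place')")
checks = [
    accepted(conn, "INSERT INTO photo (thing_id, path) VALUES (56, 'p/56.jpg')"),
    accepted(conn, "INSERT INTO place (thing_id, name) VALUES (56, 'London')"),  # 56 is a photo
    accepted(conn, "INSERT INTO comment VALUES (1, 56, 2, 'very nice')"),
    accepted(conn, "INSERT INTO comment VALUES (2, 999, 2, 'orphan')"),          # no such thing
]
print(checks)
```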
If you need referential integrity (RI) there is no better way to do it than to use many-to-many junction tables. True, you end up having a lot of tables in the system, but that's the cost you need to pay. It also has some other benefits going this route, for instance you get some sort of partitioning for free: you get the data partitioned by their relation type, each in its own table. This offers RI but it is not 100% safe either, for instance there's nothing to guarantee you that a comment belongs to a photo and to that photo alone, you'd need to enforce this kind of constraints manually should you need them.
On the other hand, going with a generic solution like you already did gets you faster off the ground and it's way easier to extend in the future but there'll be no RI unless you'll code it manually (which is very complex and a lot harder to deal with than the alternative M:M for every relation type).
Just to mention another alternative, similar to your existing implementation, you could use a custom M:M junction table to handle all your relations regardless of their type: object1_type, object1_id, object2_type, object2_id. Simple but no other benefit beside very easy to implement and extend. I'd only recommend it if you don't need RI and you got yourself a lot of tables, all interlinked.

Database Formatting for Album Tracks

I would like to store an album's track names in a single field in a database.
The number of tracks is arbitrary for each album.
Each album is one record in the table.
Each track must be linked to a specific URL which also should be stored in the database somewhere.
Is it possible to do this by storing them in a single field, or is a relational table for the track names/urls the only way to go?
Table: Album
ID/PK (your choice of primary key philosophy)
AlbumName
Table: Track
ID/PK (optional, could make AlbumFK, TrackNumber the primary key)
AlbumFK REFERENCES (Album.PK)
TrackNumber
TrackName
TrackURL
It's entirely possible, you could store the field as comma-separated or XML data for example.
Whether it's sensible is another question - if you ever want to query how many albums have more than 10 tracks for example you aren't going to be able to write an SQL query for that and you'll have to resort to pulling the data back into your application and dissecting it there which is not ideal.
Another option is to store the data in a separate "tracks" table (i.e. normalised), but also provide a view on those tables that gives the data as a single field in a denormalised manner. Then you get the benefit of properly structured data and the ability to query the data as a single field from the view.
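A small sketch of that last option, using Python's sqlite3: a normalised track table, a view that flattens the names into one field with SQLite's group_concat, and the "more than N tracks" question answered in plain SQL. The names are illustrative, and group_concat does not promise any particular order of the concatenated items.

```python
import sqlite3

def build_album_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE album (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE track (
            album_id     INTEGER NOT NULL REFERENCES album(id),
            track_number INTEGER NOT NULL,
            name         TEXT NOT NULL,
            url          TEXT NOT NULL,
            PRIMARY KEY (album_id, track_number)
        );
        -- Denormalised, read-only presentation of the same data
        CREATE VIEW album_tracklist AS
        SELECT a.id AS album_id,
               a.name AS album_name,
               group_concat(t.name, ', ') AS track_names
        FROM album a JOIN track t ON t.album_id = a.id
        GROUP BY a.id, a.name;
    """)
    return conn

def albums_with_more_than(conn: sqlite3.Connection, n: int) -> list:
    """Names of albums that have more than n tracks."""
    rows = conn.execute("""
        SELECT a.name
        FROM album a JOIN track t ON t.album_id = a.id
        GROUP BY a.id, a.name
        HAVING COUNT(*) > ?
        ORDER BY a.name
    """, (n,)).fetchall()
    return [name for (name,) in rows]

conn = build_album_schema()
conn.executemany("INSERT INTO album VALUES (?, ?)", [(1, "Long Album"), (2, "Single")])
conn.executemany("INSERT INTO track VALUES (?, ?, ?, ?)", [
    (1, 1, "One", "http://example.com/1"),
    (1, 2, "Two", "http://example.com/2"),
    (1, 3, "Three", "http://example.com/3"),
    (2, 1, "Only", "http://example.com/4"),
])
print(albums_with_more_than(conn, 2))
print(conn.execute("SELECT album_name, track_names FROM album_tracklist").fetchall())
```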
Conventional approach would be to have one table with a row for each track (with any meta data). Have another table for each Album, and a third table that records the association for which tracks are on which album(s) and in which order.
Use two tables, one for albums, and one for tracks.
Album
-----
Id
Name
Artist
etc...
Track
-----
Id
AlbumId(Foreign Key to Album Table)
Name
URL
You could also augment this with a third table that joined the trackId and AlbumId fields (so don't have the AlbumId in the Track table). The advantage of this second approach would be that it would allow you to reuse a recording when it appeared on many albums (such as compilations).
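A sketch of that three-table variant, assuming a position column to record track order (the column and function names are made up for the example), run through Python's sqlite3. The same recording appears on two albums without being stored twice.

```python
import sqlite3

def build_compilation_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE album (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE track (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            url  TEXT NOT NULL
        );
        -- Which track sits at which position on which album
        CREATE TABLE album_track (
            album_id INTEGER NOT NULL REFERENCES album(id),
            track_id INTEGER NOT NULL REFERENCES track(id),
            position INTEGER NOT NULL,
            PRIMARY KEY (album_id, position)
        );
    """)
    return conn

def tracklist(conn: sqlite3.Connection, album_id: int) -> list:
    """Track names of one album, in playing order."""
    rows = conn.execute("""
        SELECT t.name
        FROM album_track at JOIN track t ON t.id = at.track_id
        WHERE at.album_id = ?
        ORDER BY at.position
    """, (album_id,)).fetchall()
    return [name for (name,) in rows]

def albums_containing(conn: sqlite3.Connection, track_id: int) -> list:
    """Names of every album a given track appears on."""
    rows = conn.execute("""
        SELECT a.name
        FROM album_track at JOIN album a ON a.id = at.album_id
        WHERE at.track_id = ?
        ORDER BY a.id
    """, (track_id,)).fetchall()
    return [name for (name,) in rows]

conn = build_compilation_schema()
conn.executemany("INSERT INTO album VALUES (?, ?)", [(1, "Original"), (2, "Compilation")])
conn.executemany("INSERT INTO track VALUES (?, ?, ?)", [
    (1, "Song A", "http://example.com/a"),
    (2, "Song B", "http://example.com/b"),
    (3, "Song C", "http://example.com/c"),
])
conn.executemany("INSERT INTO album_track VALUES (?, ?, ?)", [
    (1, 1, 1), (1, 2, 2),   # Original: Song A, Song B
    (2, 3, 1), (2, 1, 2),   # Compilation: Song C, then Song A again
])
print(tracklist(conn, 2))
print(albums_containing(conn, 1))
```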
The Wikipedia article on Database Normalization makes a reasonable effort to explain the purpose of normalization ... and the sorts of anomalies that the normalization rules are intended to prevent.