My question might be dumb since I think it's a very common design issue, and I guess there is a simple and usual solution to it:
I have a table Producer and a table Movie
ONE Producer has produced MANY Movies
ONE Producer has ONE favorite Movie among the ones he has produced
How do I implement this in MySQL?
1. Just one ONE-TO-MANY relation between Producer and Movie, plus a 'favorite' boolean attribute in the Movie table.
2. One ONE-TO-MANY relation to represent the 'has produced' relation, and a ONE-TO-ONE relation to represent the 'is favorite' relation.
The first solution seems more natural to me, but when the producer wants to change his favorite movie, I guess the second solution is more efficient. Finding a producer's favorite movie should also be more efficient with solution #2.
What am I missing? Is there a best solution? If not in which case should I use solution #1 and solution #2?
(Of course, my problem is a bit more complex than the example above...)
Option (1) is not easy or efficient to enforce declaratively. Plus, you end up wasting space on all non-favorite movies.
Option (2) is the way to go. Unfortunately, its circular dependency leads to a chicken-and-egg problem, which is solved:
either by deferring one of the FKs (if the DBMS supports it, which MySQL unfortunately doesn't),
or by leaving the FK in user NULL-able, which is less than ideal since the user can now have zero or one favorite movies (as opposed to strictly one).
Assuming you want the same movie to be relatable to multiple users (making it a many-to-many relationship, not one-to-many as you stated), your model would end up looking something like this in a DBMS supporting deferrable FKs:
But you don't have the luxury of deferring the constraints in MySQL, so you'll be forced to do something like this:
CHECK (FAVORITE_USER_ID IS NULL OR FAVORITE_USER_ID = USER_ID)
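One way to read the CHECK above is as part of a junction table whose nullable FAVORITE_USER_ID column is also UNIQUE, so each user can mark at most one row as the favorite. The sketch below is only an illustration under that assumption: the table and column layout is guessed, and SQLite (via Python's sqlite3) stands in for MySQL. Note that MySQL only enforces CHECK constraints from version 8.0.16 onward.

```python
import sqlite3

# SQLite stands in for MySQL; the table layout is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users  (user_id  INTEGER PRIMARY KEY);
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY);
CREATE TABLE user_movie (
    user_id          INTEGER NOT NULL REFERENCES users (user_id),
    movie_id         INTEGER NOT NULL REFERENCES movies (movie_id),
    -- NULL on ordinary rows; equal to user_id on the favorite row.
    favorite_user_id INTEGER UNIQUE,
    PRIMARY KEY (user_id, movie_id),
    CHECK (favorite_user_id IS NULL OR favorite_user_id = user_id)
);
INSERT INTO users  VALUES (1);
INSERT INTO movies VALUES (10), (11), (12);
INSERT INTO user_movie VALUES (1, 10, 1);     -- user 1's favorite
INSERT INTO user_movie VALUES (1, 11, NULL);  -- related, but not the favorite
""")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO user_movie VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


second_favorite_ok = try_insert((1, 12, 1))   # UNIQUE blocks a second favorite
wrong_owner_ok = try_insert((1, 12, 2))       # CHECK blocks a mismatched owner
plain_row_ok = try_insert((1, 12, None))      # an ordinary row is fine
favorite = conn.execute(
    "SELECT movie_id FROM user_movie WHERE favorite_user_id = ?", (1,)
).fetchone()[0]
```

As the answer notes, this still allows a user with no favorite at all; the constraints only cap the count at one.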
The boolean attribute not only takes up a lot more space, but also looks like it would prevent more than one user from having a favourite.
The second solution sounds correct. Have a field in User to represent each user's favourite and an additional table to make the one to many "has watched" relationship.
If you need to make sure that the user has actually watched his/her favourite movie, you should add that logic as a business rule in your data access object.
In general, I would go with solution 2. Along with making queries for the favorite simpler, this has the added benefit of a built-in constraint, limiting the user to having only one favorite movie. Furthermore, this will be slightly smaller, having one id per user instead of one boolean per watched movie-user. One side-effect, however, is that you are able to favorite a movie without watching it.
Solution 1 would be the preferred choice only if you wanted to extend your favorites system in the future to include more than one favorite. However, it would appear that this is explicitly not desirable.
It is also worth mentioning that the relationship between users and movies is many-to-many, not one-to-many, as each user has their own list of watched movies. Therefore you will need a third table to link the two. Unless of course you are just having a list of uncorrelated strings for each user, but I doubt this is the case.
If you are sure there will always only be one movie I would have a "favorite movie ID" attribute in the user table, along with a many-to-many relation between users and movies (the same movie can be watched by many users)
So, three tables total, flag in the users table.
I think the second solution is more feasible. You can create two tables for these relations, one named HasWatched and the other named Favorite. Both tables consist of two columns, userId and movieId. The primary key of the HasWatched table is the (userId, movieId) tuple, but the primary key of the Favorite table is only userId. That way you can apply all the constraints.
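A minimal sketch of the two tables described above, using Python's sqlite3 with SQLite standing in for MySQL. The composite foreign key from Favorite to HasWatched is an added assumption (it makes "the favorite must have been watched" declarative); the answer itself only specifies the primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE HasWatched (
    userId  INTEGER NOT NULL,
    movieId INTEGER NOT NULL,
    PRIMARY KEY (userId, movieId)
);
CREATE TABLE Favorite (
    userId  INTEGER PRIMARY KEY,   -- primary key on userId: one favorite per user
    movieId INTEGER NOT NULL,
    -- Assumption: a favorite must be a movie the user has watched.
    FOREIGN KEY (userId, movieId) REFERENCES HasWatched (userId, movieId)
);
INSERT INTO HasWatched VALUES (1, 10), (1, 11);
INSERT INTO Favorite   VALUES (1, 10);
""")


def rejected(sql, params):
    """Return True if the statement violates a constraint."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


second_favorite = rejected("INSERT INTO Favorite VALUES (?, ?)", (1, 11))
unwatched_favorite = rejected("INSERT INTO Favorite VALUES (?, ?)", (2, 99))

# Changing the favorite is a single-row update.
with conn:
    conn.execute("UPDATE Favorite SET movieId = ? WHERE userId = ?", (11, 1))
current = conn.execute(
    "SELECT movieId FROM Favorite WHERE userId = ?", (1,)
).fetchone()[0]
```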
I would add a third table. I don't think this is a one-to-many relation; it's many-to-many: one user can watch many films, and one film can be watched by many users.
I would create a users table, a movie table, and a user-movie-watched table with user_id and movie_id,
and add a favorite_movie field pointing to the movie_id of the user's favorite.
Related
I'm not looking for the answer, I am just looking for some guidance or a little clarity here. I need to design a database as if I worked for Redbox, and I'm trying to track movies, actors and directors. So I am assuming I need three different tables, but I just don't understand how to "track" it. Would I create a custom ID for each movie and something that tracks where the kiosks are? Like I said, I think I can do this, but I just don't fully understand it.
Any help is appreciated
In broad strokes here is what you need:
(Basic relational rules and strategy apply, so every table needs to have a Primary Key, and the keys will be used to relate the tables together).
movie:
One row per movie, with title, rating, year, etc.
person:
Add to that a related person table with one row for any person who might be a cast or crew member in any film.
credit, credit_type:
Now relate Movie <-> Person
Since this is a many to many relationship you need a table between the two. Typically this would be called "credit" and you need a credit_type table that will describe the credit (actor, director, writer, producer, etc).
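The movie, person, credit and credit_type tables described above could be sketched roughly as follows, using Python's sqlite3 with SQLite as a stand-in database. The column names and sample rows are illustrative assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movie       (movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER);
CREATE TABLE person      (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE credit_type (credit_type_id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
-- credit sits between movie and person and says what the person did.
CREATE TABLE credit (
    movie_id       INTEGER NOT NULL REFERENCES movie (movie_id),
    person_id      INTEGER NOT NULL REFERENCES person (person_id),
    credit_type_id INTEGER NOT NULL REFERENCES credit_type (credit_type_id),
    PRIMARY KEY (movie_id, person_id, credit_type_id)
);
INSERT INTO movie VALUES (1, 'Example Movie', 2001);
INSERT INTO person VALUES (1, 'Person A'), (2, 'Person B');
INSERT INTO credit_type VALUES (1, 'actor'), (2, 'director');
-- Person A both acts in and directs the movie; Person B only acts.
INSERT INTO credit VALUES (1, 1, 1), (1, 1, 2), (1, 2, 1);
""")


def people_with_credit(movie_id, label):
    """Names of everyone holding the given credit type on a movie."""
    rows = conn.execute("""
        SELECT p.name
        FROM credit c
        JOIN person p      ON p.person_id = c.person_id
        JOIN credit_type t ON t.credit_type_id = c.credit_type_id
        WHERE c.movie_id = ? AND t.label = ?
        ORDER BY p.name
    """, (movie_id, label))
    return [name for (name,) in rows]


directors = people_with_credit(1, "director")
actors = people_with_credit(1, "actor")
```

Putting credit_type_id in the primary key lets the same person hold several credits on one movie.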
Of course that has nothing to do with your "tracking" question. For that you would need a slew of tables:
inventory:
Here is where you have one row for every copy of a movie that exists. It should be obvious that there will be a foreign key for a movie in this table. In the real world there would be an assigned id that would then be printed out as a barcode and attached to the disk + sleeve of the physical material.
kiosk:
For every Kiosk there is a row, along with location information, which could be an address perhaps along with a note, in case there are multiple kiosks at the same location.
kiosk_bin:
For every Kiosk, you will have 1-M bins, each with a number identifying it.
I wouldn't do it this way, but you could for simplicity add a column in kiosk_bin that would be a foreign key to the inventory table. In this way you are able to indicate that an inventory (a single copy of one particular movie) is sitting in a kiosk_bin.
member:
These are the people subscribed to the service.
member_checkout:
When a member gets a movie from a kiosk/kiosk_bin, a row gets created here, with the inventory_id, and the date, and the system would update the kiosk_bin row to remove the inventory_id and show that the bin is now empty and could accept another inventory copy.
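To make the checkout step concrete, here is a rough sketch of the tracking tables using the simplified "foreign key in kiosk_bin" variant mentioned above (which the answer itself says it would not choose). It uses Python's sqlite3; all column names and the check_out helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movie     (movie_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY,   -- one row per physical copy
                        movie_id INTEGER NOT NULL REFERENCES movie (movie_id));
CREATE TABLE kiosk     (kiosk_id INTEGER PRIMARY KEY, address TEXT, note TEXT);
CREATE TABLE kiosk_bin (
    kiosk_id     INTEGER NOT NULL REFERENCES kiosk (kiosk_id),
    bin_number   INTEGER NOT NULL,
    -- NULL means the bin is empty; UNIQUE keeps a copy out of two bins at once.
    inventory_id INTEGER UNIQUE REFERENCES inventory (inventory_id),
    PRIMARY KEY (kiosk_id, bin_number)
);
CREATE TABLE member    (member_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE member_checkout (
    checkout_id  INTEGER PRIMARY KEY,
    member_id    INTEGER NOT NULL REFERENCES member (member_id),
    inventory_id INTEGER NOT NULL REFERENCES inventory (inventory_id),
    checked_out  TEXT NOT NULL
);
INSERT INTO movie VALUES (1, 'Example Movie');
INSERT INTO inventory VALUES (100, 1);
INSERT INTO kiosk VALUES (1, '1 Example Street', NULL);
INSERT INTO kiosk_bin VALUES (1, 1, 100);
INSERT INTO member VALUES (7, 'Member A');
""")


def check_out(member_id, inventory_id, when):
    """Record the checkout and empty the bin in a single transaction."""
    with conn:
        conn.execute(
            "INSERT INTO member_checkout (member_id, inventory_id, checked_out) "
            "VALUES (?, ?, ?)",
            (member_id, inventory_id, when),
        )
        conn.execute(
            "UPDATE kiosk_bin SET inventory_id = NULL WHERE inventory_id = ?",
            (inventory_id,),
        )


check_out(7, 100, "2024-01-01")
empty_bins = conn.execute(
    "SELECT kiosk_id, bin_number FROM kiosk_bin WHERE inventory_id IS NULL"
).fetchall()
checkouts = conn.execute("SELECT COUNT(*) FROM member_checkout").fetchone()[0]
```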
As you can see, this is non-trivial. Database design of any relatively complicated business process is going to be more than 3 tables, I'm sorry to say.
Here's an ERD that illustrates some of the basic movie to credit relations I did for another similar question. The tables were named a bit differently but you should be able to match them up.
I was studying relationships in RDBMS. I have understood the basic concept behind mapping relationships, but I am not able to spot them.
The three possibilities:
one to many (most common) requires a PK - FK relationship. Two tables involved.
many to many (less common) requires a junction table. Three tables involved.
one to one (very rare). One table involved.
When I begin a project, I am not able to tell the first two apart, and I am not clear in my head.
The examples I study help for a brief moment, but not when I need to put these principles into practice.
This is the place where most beginners falter.
How can I spot these relationships? Is there a simpler way?
Don't look at relationships from a technical perspective. Use analogies and real-life examples when trying to envision relationships in your head.
For example, let's say we have a library database.
A library must have books.
M:M
Each Book may have been written by multiple Authors and each Author may have written multiple Books. Thus it is a many-to-many relationship which will reflect into 3 tables in the database.
1:M
Each Book must also have a Publisher, but a Book may only have one Publisher and a Publisher can publish many Books. Thus it is a one-to-many relationship and it reflects with the PublisherId being referenced in the Books table.
A simple analogy like this one explains relationships to their core. When you try to look at them through a technical lens you're only making it harder on yourself. What's actually difficult is applying real world data scenarios when constructing your database.
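The library example above might translate into tables like these. This is a sketch with Python's sqlite3; the table names, column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Publishers (PublisherId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Authors    (AuthorId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- 1:M: each book row carries exactly one PublisherId.
CREATE TABLE Books (
    BookId      INTEGER PRIMARY KEY,
    Title       TEXT NOT NULL,
    PublisherId INTEGER NOT NULL REFERENCES Publishers (PublisherId)
);
-- M:M: the junction table is the third table.
CREATE TABLE BookAuthors (
    BookId   INTEGER NOT NULL REFERENCES Books (BookId),
    AuthorId INTEGER NOT NULL REFERENCES Authors (AuthorId),
    PRIMARY KEY (BookId, AuthorId)
);
INSERT INTO Publishers VALUES (1, 'Publisher P');
INSERT INTO Authors VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO Books VALUES (1, 'Book X', 1), (2, 'Book Y', 1);
INSERT INTO BookAuthors VALUES (1, 1), (1, 2), (2, 1);
""")

# One book, many authors ...
authors_of_x = [name for (name,) in conn.execute(
    "SELECT a.Name FROM BookAuthors ba JOIN Authors a ON a.AuthorId = ba.AuthorId "
    "WHERE ba.BookId = 1 ORDER BY a.Name")]
# ... and one author, many books.
books_by_a = [title for (title,) in conn.execute(
    "SELECT b.Title FROM BookAuthors ba JOIN Books b ON b.BookId = ba.BookId "
    "WHERE ba.AuthorId = 1 ORDER BY b.Title")]
```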
I think the reason you are not getting the answers that you need is because of the way you are framing the question. Instead of asking “How do I spot the correct type of relationship between entities”, think about “How do my functional needs dictate what relationship to implement”. Database design doesn’t drive the function; it’s the functional needs that drive the relationships you need to implement.
When designing a database structure, you need to identify all the entities. Entities are all the facts that you want to store: lists of things like book titles, invoices, countries, dog species, etc. Then to identify your relationships, you have to consider the types of questions you will want to ask your database. It takes a bit of forward thinking sometimes… just because nobody is asking the question now doesn’t mean that it might not ever be asked. So you can’t ask the universe “what is the relationship between these lists of facts?” because there is no definitive answer. You define the universe… I only want to know answers to these types of questions; therefore I need to use this type of relationship.
Let’s examine an example relation between two common entities: a table of customers and a table of store locations. There is no “correct” way to relate these entities without first defining what you need to know about them. Let’s say you work for a retailer and you want to give a customer a default store designation so they can see products on the website that their local store has in stock. This only requires a one-to-many relationship between a store and the customer. Designing the relationship this way ensures that one store can have many customers as their default and each customer can only have one default store. To implement this relationship is as easy as adding a DefaultStore field to your Customer table as a foreign key that links to the primary key of the Store table.
The same two entities above might have alternate requirements for the relationship definition in a different context. Let’s say that I need to be able to give the customer the opportunity to select a list of favorite stores so that they can query about in stock information about all of them at once. This requires a many-to-many relationship because you want one customer to be able to relate to many stores and each store can also relate to many customers. To implement a many-to-many relationship requires a little more overhead because you will have to create a separate table to define the relationship links, but you get this additional functionality. You might call your relationship table something like CustomerStoreFavorites and would have as its primary key as the combined primary keys from each of the entities: (CustomerID, StoreID). You could also add attributes to the relationship, like possibly a LastOrderDate field to specify the last date that the customer ordered something from a particular store.
You could technically define both types of relationships for the same two entities. As an example: maybe you need to give the customer the option to select a default store, but you also need to be able to record the last date that a customer ordered something from a particular store. You could implement the DefaultStore field on the Customer table with the foreign key to the Store table and also create a relationship table to track all the stores that a customer has ordered from.
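Both relationships between the same two entities can live side by side. The sketch below (Python's sqlite3, with SQLite as a stand-in database) uses the DefaultStore, CustomerStoreFavorites and LastOrderDate names from the text; the remaining columns and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Store (StoreID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Customer (
    CustomerID   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    -- One-to-many: many customers may share one default store.
    DefaultStore INTEGER REFERENCES Store (StoreID)
);
-- Many-to-many: one row per (customer, store) link, with its own attribute.
CREATE TABLE CustomerStoreFavorites (
    CustomerID    INTEGER NOT NULL REFERENCES Customer (CustomerID),
    StoreID       INTEGER NOT NULL REFERENCES Store (StoreID),
    LastOrderDate TEXT,
    PRIMARY KEY (CustomerID, StoreID)
);
INSERT INTO Store VALUES (1, 'North'), (2, 'South');
INSERT INTO Customer VALUES (1, 'Customer A', 1), (2, 'Customer B', 1);
INSERT INTO CustomerStoreFavorites VALUES
    (1, 1, '2024-01-05'), (1, 2, NULL), (2, 2, NULL);
""")

customers_defaulting_to_north = conn.execute(
    "SELECT COUNT(*) FROM Customer WHERE DefaultStore = ?", (1,)
).fetchone()[0]
favorites_of_a = [name for (name,) in conn.execute(
    "SELECT s.Name FROM CustomerStoreFavorites f JOIN Store s ON s.StoreID = f.StoreID "
    "WHERE f.CustomerID = ? ORDER BY s.Name", (1,))]
```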
If you had some weird situation where every customer had their own store, then you wouldn’t even need to create two tables for your entities because you can fit all the attributes for both the customer and the store into one table.
In short, the way you determine which type of relationship to implement is to ask yourself what questions you will need to ask the database. The way you design it will restrict the relational data you can collect as well as the queries you can ask. If I design a one-to-many relationship from the store to the customer, I won’t be able to ask questions about all the stores that each customer has ordered from unless I can get to that information though other relationships. For example, I could create an entity called "purchases" which has a one-to-many relationship to the customer and store. If each purchase is defined to relate to one customer and one store, now I can query “what stores has this customer ordered from?” In fact with this structure I am able to capture and report on a much richer source of information about all of the customer's purchases at any store. So you also need to consider the context of all the other relationships in your database to decide which relationship to implement between two particular entities.
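As a small illustration of the "purchases" idea, the sketch below answers "what stores has this customer ordered from?" without any direct customer-to-store relationship. It uses Python's sqlite3; the Purchases columns and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Store    (StoreID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
-- Each purchase relates to exactly one customer and one store.
CREATE TABLE Purchases (
    PurchaseID  INTEGER PRIMARY KEY,
    CustomerID  INTEGER NOT NULL REFERENCES Customer (CustomerID),
    StoreID     INTEGER NOT NULL REFERENCES Store (StoreID),
    PurchasedOn TEXT NOT NULL
);
INSERT INTO Store VALUES (1, 'North'), (2, 'South'), (3, 'East');
INSERT INTO Customer VALUES (1, 'Customer A');
INSERT INTO Purchases VALUES
    (1, 1, 1, '2024-01-05'), (2, 1, 2, '2024-02-01'), (3, 1, 1, '2024-03-09');
""")

# "What stores has this customer ordered from?"
stores_ordered_from = [name for (name,) in conn.execute(
    "SELECT DISTINCT s.Name FROM Purchases p JOIN Store s ON s.StoreID = p.StoreID "
    "WHERE p.CustomerID = ? ORDER BY s.Name", (1,))]

# The same table also yields the last order date per store.
last_order = dict(conn.execute(
    "SELECT s.Name, MAX(p.PurchasedOn) FROM Purchases p "
    "JOIN Store s ON s.StoreID = p.StoreID "
    "WHERE p.CustomerID = ? GROUP BY s.Name", (1,)))
```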
There is no magic formula, so it just takes practice, experience, and a little creativity. ER Diagrams are a great way to get your design out of your head and onto paper so that you can analyze your design and ensure that you can get the right types of questions answered. There are also a lot of books and resources to learn about database architecture. One good book I learned a lot from was “Database System Concepts” by Abraham Silberschatz and Henry Korth.
Say you have two tables A and B. Consider an entry from A and think of how many entries from B it could possibly be related with at most: only one, or more? Then consider an entry from B and think of how many entries in A it could be related with.
Some examples:
Table A: Mothers, Table B: Children. Each child has only one mother but a mother may have one or more children. Mothers and Children have a one-to-many relationship.
Table A: Doctors, Table B: Patients. Each patient may be visiting one or more doctors and each doctor treats one or more patients. So they have a many-to-many relationship.
An example of one to one:
LicencePlate to Vehicle. One licence plate belongs to one vehicle and one vehicle has one licence plate.
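If the two sides of a one-to-one relationship are kept in separate tables, one common way to enforce it is a foreign key that is also UNIQUE. The sketch below uses Python's sqlite3; the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Vehicle (VehicleId INTEGER PRIMARY KEY, Model TEXT NOT NULL);
CREATE TABLE LicencePlate (
    PlateNumber TEXT PRIMARY KEY,
    -- UNIQUE turns the usual one-to-many foreign key into one-to-one.
    VehicleId   INTEGER NOT NULL UNIQUE REFERENCES Vehicle (VehicleId)
);
INSERT INTO Vehicle VALUES (1, 'Model M');
INSERT INTO LicencePlate VALUES ('ABC-123', 1);
""")

# A second plate for the same vehicle violates the UNIQUE constraint.
try:
    with conn:
        conn.execute("INSERT INTO LicencePlate VALUES ('XYZ-999', 1)")
    second_plate_rejected = False
except sqlite3.IntegrityError:
    second_plate_rejected = True
```

Dropping the UNIQUE keyword gives the mothers-and-children shape: many rows on one side may then point at the same row on the other.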
I'm building a site similar to Yelp (Recommendation Engine, on a smaller scale though), so there will be three main entities in the system: User, Place (includes businesses), and Event.
Now what I'm wondering about is how to store information such as photos, comments, and 'compliments' (similar to Facebook's "Like") for each of these type of entity, and also for each object they can be applied to (e.g. comment on a recommendation, photo, etc). Right now the way I was doing it was a single table for each i.e.
Photo (id, type, owner_id, is_main, etc...)
where type represents: 1=user, 2=place, 3=event
Comment (id, object_type, object_id, user_id, content, etc, etc...)
where object_type can be a few different objects like photos, recommendations, etc
Compliment (object_id, object_type, compliment_type, user_id)
where object_type can be a few different objects like photos, recommendations, etc
Activity (id, source, source_type, source_id, etc..) //for "activity feed"
where source_type is a user, place, or event
Notification (id, recipient, sender, activity_type, object_type, object_id, etc...)
where object_type & object_id will be used to provide a direct link to the object of the notification e.g. a user's photo that was complimented
But after reading a few posts on SO, I realized I can't maintain referential integrity with a foreign key since that requires a 1:1 relationship and my source_id/object_id fields can relate to an ID in more than one table. So I decided to go with the method of keeping the main entity, but then breaking it into subsets, i.e.
User_Photo (photo_id, user_id) | Place_Photo(photo_id, place_id) | etc...
Photo_Comment (comment_id, photo_id) | Recommendation_Comment(comment_id, rec_id) | etc...
Compliment (id, ...) //would need to add a surrogate key to Compliment table now
Photo_Compliment(compliment_id, photo_id) | Comment_Compliment(compliment_id, comment_id) | etc...
User_Activity(activity_id, user_id) | Place_Activity(activity_id, place_id) | etc...
I was thinking I could just create views joining each sub-table to the main table to get the results I want. Plus I'm thinking it would fit into my object models in Code Igniter as well.
The only table I think I could leave is the notifications table, since there are many object types (forum post, photo, recommendation, etc, etc), and this table will only hold notifications for a week anyway so any ref integrity issues shouldn't be much of a problem (I think).
So am I going about this in a sensible way? Any performance, reliability, or other issues that I may have overlooked?
The only "problem" I can see is that I would end up with a lot of tables (as it is right now I have about 72, so I guess i would end up with a little under 90 tables after I add the extras), and that's not an issue as far as I can tell.
Really grateful for any kind of feedback. Thanks in advance.
EDIT: Just to be clear, I'm not concerned if i end up with another 10 or so tables. From what I know, the number of tables isn't too much of an issue (once they're being used)... unless you had say 200 or so :/
Some propositions for this UoD (universe of discourse)
User named Bob logged in.
User named Bob uploaded photo number 56.
There is a place named London.
Photo number 56 is of place named London.
User named Joe created comment "very nice" on photo number 56.
To introduce object IDs
User (UserID) logged in.
User (UserID) uploaded Photo (PhotoID).
There is Place (PlaceID).
Photo (PhotoID) is of Place (PlaceID).
User (UserID) created Comment (CommentID) on Photo (PhotoID).
Just Fact Types
User logged in.
User uploaded Photo.
Place exists.
Photo is of Place.
User created Comment on Photo.
Now to extract predicates
Predicate                  Predicate Arity
---------------------------------------------
... logged in              1 (Unary predicate)
... uploaded ...           2 (Binary)
... exists                 1 (Unary)
... is of ...              2 (Binary)
... created ... on ...     3 (Ternary)
It looks like each proposition in this UoD can be stated with at most a ternary predicate,
so I would suggest something like
Predicate role (Role_1_ID, Role_2_ID, Role_3_ID) is a part that an object plays in a predicate. Substitute the ... in a predicate from left to right with each Role_ID.
Note that only Role_1_ID is mandatory (at least unary predicate), the other two may be NULL.
In this simple model, it is possible to propose anything.
Hence, you would need to implement constraints on the application layer.
For example, you have to make sure that it is possible to create Comment on Place, but not create Place on Place.
Not all predicates represent an action; for example, ... logged in is an action while ... is of ... is not.
So, your activity feed would list all Propositions with Predicate.IsAction = True.
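A rough sketch of how such a predicate/proposition store and its activity feed query could look, using Python's sqlite3. The exact table layout is an assumption (only Role_1_ID to Role_3_ID and Predicate.IsAction come from the text), and the role columns hold plain object IDs with no further constraints, which matches the remark that anything can be proposed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Predicate (
    PredicateID INTEGER PRIMARY KEY,
    Template    TEXT NOT NULL,                               -- e.g. '... uploaded ...'
    Arity       INTEGER NOT NULL CHECK (Arity BETWEEN 1 AND 3),
    IsAction    INTEGER NOT NULL CHECK (IsAction IN (0, 1))
);
CREATE TABLE Proposition (
    PropositionID INTEGER PRIMARY KEY,
    PredicateID   INTEGER NOT NULL REFERENCES Predicate (PredicateID),
    Role_1_ID     INTEGER NOT NULL,   -- only the first role is mandatory
    Role_2_ID     INTEGER,
    Role_3_ID     INTEGER
);
INSERT INTO Predicate VALUES
    (1, '... logged in', 1, 1),
    (2, '... uploaded ...', 2, 1),
    (3, '... is of ...', 2, 0),
    (4, '... created ... on ...', 3, 1);
-- Hypothetical object IDs: user 1, photo 56, place 3, comment 9.
INSERT INTO Proposition (PredicateID, Role_1_ID, Role_2_ID, Role_3_ID) VALUES
    (1, 1, NULL, NULL),
    (2, 1, 56, NULL),
    (3, 56, 3, NULL),
    (4, 1, 9, 56);
""")

# The activity feed lists only propositions whose predicate is an action.
activity_feed = conn.execute("""
    SELECT pr.Template, p.Role_1_ID, p.Role_2_ID, p.Role_3_ID
    FROM Proposition p
    JOIN Predicate pr ON pr.PredicateID = p.PredicateID
    WHERE pr.IsAction = 1
    ORDER BY p.PropositionID
""").fetchall()
feed_templates = [row[0] for row in activity_feed]
```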
If you rearrange things slightly, you can simplify your comments and compliments. Essentially you want to have a single store of comments and another one of compliments. Your problem is that this won't let you use declarative referential integrity (foreign key constraints).
The way to solve this is to make sure that the objects that can attract comments and compliments are all logical sub-types of one supertype. From a logical perspective, it means you have a "THING_OF_INTEREST" entity (I'm not making a naming convention recommendation here!) and each of the various specific things which attract comments and compliments will be a sub-type of THING_OF_INTEREST. Therefore your comments table will have a "thing_of_interest_id" FK column and similarly for your compliments table. You will still have the sub-type tables, but they will have a 1:1 FK with THING_OF_INTEREST. In other words, THING_OF_INTEREST does the job of giving you a single primary key domain, whereas all of the sub-type tables contain the type-specific attributes. In this way, you can still use declarative referential integrity to enforce your comment and compliment relationships without having to have separate tables for different types of comments and compliments.
From a physical implementation perspective, the most important thing is that your various things of interest all share a common primary key domain. That's what lets your comment table have a single FK value that can be easily joined with whatever that thing of interest happens to be.
Depending on how you go after your comments and recommendations, you probably will (but may not) need to physically implement THING_OF_INTEREST - which will have at least two attributes, the primary key (usually an int) plus a partitioning attribute that tells you which sub-type of thing it is.
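The supertype idea might be sketched like this, with Python's sqlite3 and SQLite as a stand-in database. The thing_type discriminator plays the role of the partitioning attribute. Repeating it in the sub-type table so that a composite foreign key can check it is an extra assumption, not something the answer prescribes, and every name below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE thing_of_interest (
    thing_id   INTEGER PRIMARY KEY,
    thing_type TEXT NOT NULL CHECK (thing_type IN ('photo', 'recommendation')),
    UNIQUE (thing_id, thing_type)
);
-- Sub-type table: shares the supertype's key and holds type-specific columns.
CREATE TABLE photo (
    thing_id   INTEGER PRIMARY KEY,
    thing_type TEXT NOT NULL DEFAULT 'photo' CHECK (thing_type = 'photo'),
    caption    TEXT,
    FOREIGN KEY (thing_id, thing_type)
        REFERENCES thing_of_interest (thing_id, thing_type)
);
-- One comment table with one ordinary foreign key, whatever is commented on.
CREATE TABLE comment (
    comment_id INTEGER PRIMARY KEY,
    thing_id   INTEGER NOT NULL REFERENCES thing_of_interest (thing_id),
    user_id    INTEGER NOT NULL,
    content    TEXT NOT NULL
);
INSERT INTO thing_of_interest VALUES (1, 'photo'), (2, 'recommendation');
INSERT INTO photo (thing_id, caption) VALUES (1, 'sunset');
INSERT INTO comment (thing_id, user_id, content) VALUES
    (1, 7, 'very nice'), (2, 7, 'agreed');
""")


def rejected(sql, params):
    """Return True if the statement violates a constraint."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


orphan_comment = rejected(
    "INSERT INTO comment (thing_id, user_id, content) VALUES (?, ?, ?)",
    (99, 7, "lost"))
wrong_subtype = rejected(
    "INSERT INTO photo (thing_id, caption) VALUES (?, ?)",
    (2, "not a photo"))
comment_count = conn.execute("SELECT COUNT(*) FROM comment").fetchone()[0]
```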
If you need referential integrity (RI) there is no better way to do it than to use many-to-many junction tables. True, you end up having a lot of tables in the system, but that's the cost you need to pay. It also has some other benefits going this route, for instance you get some sort of partitioning for free: you get the data partitioned by their relation type, each in its own table. This offers RI but it is not 100% safe either, for instance there's nothing to guarantee you that a comment belongs to a photo and to that photo alone, you'd need to enforce this kind of constraints manually should you need them.
On the other hand, going with a generic solution like you already did gets you faster off the ground and it's way easier to extend in the future but there'll be no RI unless you'll code it manually (which is very complex and a lot harder to deal with than the alternative M:M for every relation type).
Just to mention another alternative, similar to your existing implementation, you could use a custom M:M junction table to handle all your relations regardless of their type: object1_type, object1_id, object2_type, object2_id. Simple but no other benefit beside very easy to implement and extend. I'd only recommend it if you don't need RI and you got yourself a lot of tables, all interlinked.
Say we have this scenario:
Artist ==< Album ==< Track
//ie, One Artist can have many albums, and one album can have many tracks
In this case, all 3 entities have basically the same fields:
ID
Name
A foreign key for the one-to-many relationship to the corresponding children (Artist to Album and Album to Track)
A typical solution to this scenario would be three tables with the same fields (ArtistID, AlbumID, etc.) and foreign key constraints on the one-to-many relationship field.
But, can we in this case, incorporate a form of inheritance to avoid the repetition of the same field ? I'm talking something of the sort:
Table: EntityType(EntityTypeID, EntityName)
This table would hold 3 entities (1. Artist, 2. Album, 3. Track)
Table: Entities(EntityID, Name, RelField, EntityTypeID)
This table will hold the name of the entity (like the name of
an artist for example), the one-many field (foreign-key
of EntityID) and EntityTypeID holding 1 for Artist, 2 for Album
and so on.
What do you think about the above design? Does it make sense to incorporate "OOP concepts" in this DB scenario?
And finally, would you prefer having the foreign-key constraints of the first scenario, or the more generic approach (with the risk of linking an Artist with a Track, for example, since there is no check that the inputted foreign-key value really belongs to an Album)?
..btw, come to think of it, I think you can actually check if an inputted value of the RelField of an Artist corresponds to an Album, with triggers maybe?
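For what it's worth, a trigger along those lines is possible. The sketch below uses Python's sqlite3, with SQLite trigger syntax standing in for MySQL's, and assumes that RelField is stored on the child row and points at its parent (Album to Artist, Track to Album). That direction is a guess about the intended design, and the trigger name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EntityType (EntityTypeID INTEGER PRIMARY KEY, EntityName TEXT NOT NULL);
CREATE TABLE Entities (
    EntityID     INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    RelField     INTEGER REFERENCES Entities (EntityID),   -- assumed: child -> parent
    EntityTypeID INTEGER NOT NULL REFERENCES EntityType (EntityTypeID)
);
INSERT INTO EntityType VALUES (1, 'Artist'), (2, 'Album'), (3, 'Track');

-- Reject rows whose parent is not exactly one level up (INSERT only;
-- a matching BEFORE UPDATE trigger would be needed as well).
CREATE TRIGGER entities_parent_type BEFORE INSERT ON Entities
BEGIN
    SELECT RAISE(ABORT, 'RelField must point at the parent entity type')
    WHERE (NEW.EntityTypeID = 1 AND NEW.RelField IS NOT NULL)
       OR (NEW.EntityTypeID > 1 AND NOT EXISTS (
               SELECT 1 FROM Entities p
               WHERE p.EntityID = NEW.RelField
                 AND p.EntityTypeID = NEW.EntityTypeID - 1));
END;

INSERT INTO Entities VALUES (1, 'Artist A', NULL, 1);
INSERT INTO Entities VALUES (2, 'Album B', 1, 2);
INSERT INTO Entities VALUES (3, 'Track C', 2, 3);
""")

# A Track that points straight at an Artist is refused by the trigger.
try:
    with conn:
        conn.execute("INSERT INTO Entities VALUES (4, 'Track D', 1, 3)")
    track_on_artist_rejected = False
except sqlite3.DatabaseError:
    track_on_artist_rejected = True

entity_count = conn.execute("SELECT COUNT(*) FROM Entities").fetchone()[0]
```

The trigger does by hand what three separate tables with ordinary foreign keys would give for free.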
I have recently seen this very idea of abstraction implemented consistently, and the application and its database became a monster to maintain and troubleshoot. I will stay away from this technique. The simpler, the better, is my mantra.
There's very little chance that the additional fields that will inevitably accumulate on the various entities will be as obliging. Nothing to be gained by not reflecting reality in a reasonably close fashion.
I don't imagine you'd even likely conflate these entities in your regular OO design.
This reminds me (but only slightly) of an attempt I saw once to implement everything in a single table (named "Entity") with another table (named "Attributes") and a junction table between them.
By sticking all three together, you make your queries less readable (unless you then decompose the three categories as views) and you make searching and indexing more difficult.
Plus, at some point you'll want to add attributes to one category, which aren't attributes for the others. Sticking all three together gives you no room for change without ripping out chunks of your system.
Don't get so clever you trip yourself up.
The only advantage I can see to doing it in your OOP way is if there are other element types added in future (i.e., other than artist, album and track). In that case, you wouldn't need a schema change.
However, I'd tend to opt for the non-OOP way and just change the schema in that case. Some problems you have with the OOP solution are:
what if you want to add the birthdate of artist?
what if you want to store duration of albums and tracks?
what if you want to store track type?
Basically, what if you want to store something that's specific only to one or two of the element types?
If you're in to this sort of thing, then take a look at table inheritance in PostgreSQL.
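-- PostgreSQL: children inherit the id and name columns, but not the primary key,
-- so each child table declares its own.
create table Artist (id integer not null primary key, name varchar(50));
create table Album (parent integer references Artist (id), primary key (id)) inherits (Artist);
create table Track (parent integer references Album (id), primary key (id)) inherits (Artist);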
I agree with le dorfier, you might get some reuse out of the notion of a base entity (ID, Name) but beyond that point the concepts of Artist, Album, and Track will diverge.
And a more realistic model would probably have to deal with the fact that multiple artists may contribute to a single track on an album...
I'm designing a mySQL DB and I'm having the following issue:
Say I have a wall_posts table. Walls can belong to either an event or a user.
Hence the wall_posts table must references either event_id or user_id (foreign key constraint).
What is the best way to build such a relationship, considering I must always be able to know who the walls belong to ... ?
I've been considering using 2 tables, such as event_wall_posts and user_wall_posts so one got an event_id field and the other a user_id one, but I believe there must be something much better than such a redundant workaround ...
Is there a way to use an intermediate table to link the wall_posts to either an event_id or a user_id ?
Thanks in advance,
Edit: it seems there is no clear-cut design for this and both approaches seem okay, so
which one will be the fastest if there is a lot of data?
Is it preferable to have 2 separate tables (so queries might be faster, since each table will hold half as much data), or is it still preferable to have a better OO approach with a single wall_posts table referencing a wall table (and then both users and events will have a unique wall_id)?
Why is it redundant? You won't write code twice to handle them, you will use the same code, just change the name of the table in the SQL.
Another reason to take this approach is that some time in the future you will discover you need new different fields for each entity.
What you're talking about is called an exclusive arc and it's not a good practice. It's hard to enforce referential integrity. You're probably better off using, in an object sense, a common supertype.
That can be modelled in a couple of ways:
A supertype table called, say, wall. Two subtype tables (user_wall and event_wall) that link to a user and event respectively as the owner. The wall_posts table links to the supertype table; or
Putting both entity types into one table and having a type column. That way you're not linking to two separate tables.
Go for the simplest solution: add both an event_id and a user_id column to the wall_posts table. Use constraints to enforce that one of them is null, and the other is not.
Anything more complex smells like overnormalization to me :)
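A minimal sketch of the two-nullable-columns approach, using Python's sqlite3 with SQLite standing in for MySQL. Column names other than event_id and user_id are assumptions. Note that MySQL only enforces CHECK constraints from version 8.0.16 onward, so on older versions the rule would need a trigger or application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users  (user_id  INTEGER PRIMARY KEY);
CREATE TABLE events (event_id INTEGER PRIMARY KEY);
CREATE TABLE wall_posts (
    post_id  INTEGER PRIMARY KEY,
    user_id  INTEGER REFERENCES users (user_id),
    event_id INTEGER REFERENCES events (event_id),
    body     TEXT NOT NULL,
    -- Exactly one of the two owner columns must be set.
    CHECK ((user_id IS NULL) <> (event_id IS NULL))
);
INSERT INTO users  VALUES (1);
INSERT INTO events VALUES (5);
""")


def try_post(user_id, event_id, body):
    """Return True if the post is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO wall_posts (user_id, event_id, body) VALUES (?, ?, ?)",
                (user_id, event_id, body),
            )
        return True
    except sqlite3.IntegrityError:
        return False


user_post_ok = try_post(1, None, "on a user's wall")
event_post_ok = try_post(None, 5, "on an event's wall")
both_owners_ok = try_post(1, 5, "two owners")
no_owner_ok = try_post(None, None, "no owner")
```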
A classical approach to this problem is:
Create a table called wall_container and keep properties common to both users and events in it
Reference both users and events to wall_container
Reference wall_posts to wall_container
However, this is not very efficient, and there is no guarantee that wall_container won't contain records that are neither a user nor an event.
SQL is not particularly good in handling multiple inheritance.
Your wall and event have their own unique IDs, right? Then there is no need for another table. Let the wall_post table have an attribute, origin, that points to the owning record, whether that record is an event's or a user's.
If the wall and event may share the same ID, then make a table with three attributes: origin (primary key), ID number, and type. The ID number is the one you already assign, type defines what kind of entity the ID represents, and origin is a new ID that you generate, perhaps by adding a different prefix. When IDs can collide, the origin table will also help you immensely with things other than wall posts.