Abstract example: If I have a system with domains of "Fleet" containing a "Vehicle" class, and "Customers" containing a "Driver" class, where would you place a joining class (which would detail lifecycle, insurance claims, and other information about the relationship)? Fleet and Customer concerns are equally important to the system and views on the relationship from both directions will be made.
Fleet.DriverHistory?
Customers.VehicleHistory?
MyVagueGeneralRelationshipNamespace.VehicleDriverHistory?
Other?
I don't think it needs to be a Vague relationship. The vehicle allocations may be "abstract" in the sense that you can't touch them, but in a business sense they are "real", in fact they are pretty much the whole reason for the business. So I'd have a domain "Rental" or some such, which can have your two histories.
I think some kind of record class independent of both classes is definitely the way to go, i.e.
MyVagueGeneralRelationshipNamespace.VehicleDriverHistory
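A minimal Python sketch of the idea in the answers above: the joining class lives in its own domain and can be viewed from either side. The names Allocation and RentalHistory, and all field names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Vehicle:                 # would live in the Fleet domain
    registration: str


@dataclass(frozen=True)
class Driver:                  # would live in the Customers domain
    name: str


@dataclass
class Allocation:              # hypothetical joining class, in a "Rental" domain
    vehicle: Vehicle
    driver: Driver
    start: date
    end: Optional[date] = None
    insurance_claims: List[str] = field(default_factory=list)


class RentalHistory:
    """Owns the allocations and answers questions from either direction."""

    def __init__(self) -> None:
        self._allocations: List[Allocation] = []

    def record(self, allocation: Allocation) -> None:
        self._allocations.append(allocation)

    def drivers_of(self, vehicle: Vehicle) -> List[Driver]:
        # The "Fleet" view: who has driven this vehicle?
        return [a.driver for a in self._allocations if a.vehicle == vehicle]

    def vehicles_of(self, driver: Driver) -> List[Vehicle]:
        # The "Customers" view: what has this driver driven?
        return [a.vehicle for a in self._allocations if a.driver == driver]
```

Neither Vehicle nor Driver needs to know about the other here; both views are derived from the same allocation records.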
Related
If I have Book and Author, and the relationship between them can be either MainAuthor or AssistantAuthor (with possibly multiple instances of each), what are the tradeoffs between making two separate join tables and adding a field to a single join table that records whether the relationship is MainAuthor or AssistantAuthor?
It depends on how much you think the business model will change in the future.
The benefit of two separate relationships is that it looks cleaner and simpler. If you want the main authors you use one relationship; if you want the assistants you use the other one. Another benefit is that this solution is cheaper to adopt.
The benefit of a single relationship with a "type" field is flexibility over time. What happens if next month you want to add a third or fourth type of author? For example: a legal adviser, or a collaborator? The same single relationship should be able to accommodate more types easily.
Bottom line: if you think the relationship is stable and won't change, go for the first one; if you prefer to add flexibility, you could adopt the second one that is a little bit more expensive.
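For illustration, the single-relationship option might look like the sketch below, using Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample rows are all assumptions, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books   (book_id   INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);

-- One join table; the role column distinguishes the kinds of authorship.
CREATE TABLE book_authors (
    book_id   INTEGER NOT NULL REFERENCES books(book_id),
    author_id INTEGER NOT NULL REFERENCES authors(author_id),
    role      TEXT    NOT NULL CHECK (role IN ('main', 'assistant')),
    PRIMARY KEY (book_id, author_id, role)
);

INSERT INTO books   VALUES (1, 'Example Book');
INSERT INTO authors VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
INSERT INTO book_authors VALUES (1, 1, 'main'), (1, 2, 'main'), (1, 3, 'assistant');
""")


def authors_with_role(book_id: int, role: str) -> list:
    """Names of the authors who hold the given role on the given book."""
    rows = conn.execute(
        """SELECT a.name
           FROM book_authors ba
           JOIN authors a ON a.author_id = ba.author_id
           WHERE ba.book_id = ? AND ba.role = ?
           ORDER BY a.name""",
        (book_id, role),
    )
    return [name for (name,) in rows]
```

In this sketch, adding a third kind of author means widening the CHECK constraint (or replacing it with a lookup table of roles) rather than creating another join table.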
I have the following entities: Books, Authors, and Stores.
Each of them can have a comments section. Should I store the comments in a separate table OR have a subtype/supertype design? Is it technically wrong if I use separate tables? Either way, it may require the same amount of work, OR the subtype design may require more work if the supertype hierarchy changes for any subtype.
Supertypes and subtypes attack the issue "These things are not exactly alike, but they're also not utterly different."
A supertype/subtype design requires that some attributes be stored in the supertype, and some be stored in the subtype. The attributes for each thing in the real world are split between two tables.
How do you decide how to split up the attributes? The attributes common to all subtypes move "up" into the supertype. So if you were starting with companies and individuals, they're not exactly alike, because
one is an individual, the other is (conceptually) a bunch of individuals,
one can have children, the other can't,
one can get married, the other can't
and so on.
They're not utterly different, because
both can have multiple addresses, phone numbers, email addresses, web sites, etc.,
both can enter into contracts with other companies,
both can enter into contracts with other individuals,
both are required to file tax returns
and so on.
The attributes common to both (legal name, at the very least) bubble "up" into the supertype.
In your case, though, it's not clear how books, authors, and stores fit into that analysis. It's clear that they're not exactly alike. But are they utterly different? I think so.
If you're talking about something like web site posts about books, persons, and stores, that's a different story. The answer to a similar SO question includes code.
I'm setting up a database that will have 'business_owners' and 'customers'. I could set this up in a couple days but wanted to see what your opinion is on best practice.
I could have two tables, 'business_owners' and 'customers', each with name, email etc. or...
I could do one table 'Users' and have a user_type as 'business_owner' or 'customer' and just use that type to determine what to show.
I'm thinking the second option is best, any feedback?
Rule of thumb:
If you have more than one table with identical (or near identical) columns, they should be condensed into a single table. Use a type code/etc to distinguish between as necessary, and work out the business rules for columns that depend on the type code.
Answer:
The second option is the best approach. It's the most scalable, and will be the easiest to work with if you ever need to use resultsets that include both business owners & customers.
It depends on the difference between the two types. If they share exactly the same attributes aside from their role as either a 'customer' or a 'business owner', I would suggest going for the second option, to avoid the overkill of having identical columns in two separate tables.
How would you model this in an object model? Would you set up a single superclass, call it "stakeholders", that captures the properties of both business-owners and customers? Would you then set up specialized subclasses, "business-owner" and "customer" that extend the definition of stakeholders? If so, read on.
Your case looks like an instance of the Gen-Spec design pattern. Gen-spec is familiar to object oriented programmers through the superclass-subclass hierarchy. Unfortunately, introductions to relational database design tend to skip over how to design tables for the Gen-Spec situation. Fortunately, it’s well understood. A web search on “Relational database generalization specialization” will yield several articles on the subject. Some of your hits will be previous questions here on SO. Here is one article that discusses Gen-Spec in terms of Object Relational Mapping.
The trick is in the way the PK for the subclass (specialized) tables gets assigned. It’s not generated by some sort of autonumber feature. Instead, it’s a copy of the PK in the superclass (generalized) table, and is therefore an FK reference to it.
Thus, if the case were vehicles, trucks and sedans, every truck or sedan would have an entry in the vehicles table, trucks would also have an entry in the trucks table, with a PK that’s a copy of the corresponding PK in the vehicles table. Similarly for sedans and the sedan table. It’s easy to figure out whether a vehicle is a truck or a sedan by just doing joins, and you usually want to join the data in that kind of query anyway.
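The shared-key arrangement described above can be sketched with Python's built-in sqlite3 module. The columns make, payload_kg and seats are hypothetical; only the vehicles, trucks and sedans tables come from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE vehicles (
    vehicle_id INTEGER PRIMARY KEY,        -- generated here, and only here
    make       TEXT NOT NULL
);
CREATE TABLE trucks (
    -- The PK is a copy of the vehicles PK, so it is also an FK to it.
    vehicle_id INTEGER PRIMARY KEY REFERENCES vehicles(vehicle_id),
    payload_kg INTEGER NOT NULL
);
CREATE TABLE sedans (
    vehicle_id INTEGER PRIMARY KEY REFERENCES vehicles(vehicle_id),
    seats      INTEGER NOT NULL
);
""")


def add_truck(make: str, payload_kg: int) -> int:
    vehicle_id = conn.execute(
        "INSERT INTO vehicles (make) VALUES (?)", (make,)
    ).lastrowid
    conn.execute("INSERT INTO trucks VALUES (?, ?)", (vehicle_id, payload_kg))
    return vehicle_id


def add_sedan(make: str, seats: int) -> int:
    vehicle_id = conn.execute(
        "INSERT INTO vehicles (make) VALUES (?)", (make,)
    ).lastrowid
    conn.execute("INSERT INTO sedans VALUES (?, ?)", (vehicle_id, seats))
    return vehicle_id


def describe(vehicle_id: int) -> tuple:
    """Return (make, kind), working out the kind purely by joining."""
    return conn.execute(
        """SELECT v.make,
                  CASE WHEN t.vehicle_id IS NOT NULL THEN 'truck'
                       WHEN s.vehicle_id IS NOT NULL THEN 'sedan'
                       ELSE 'unknown' END
           FROM vehicles v
           LEFT JOIN trucks t ON t.vehicle_id = v.vehicle_id
           LEFT JOIN sedans s ON s.vehicle_id = v.vehicle_id
           WHERE v.vehicle_id = ?""",
        (vehicle_id,),
    ).fetchone()
```

Because the subtype key references the supertype, a truck row cannot exist without its vehicles row; an insert into trucks with an unknown vehicle_id is rejected by the foreign key.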
How do you know when to create a new table for very similar object types?
Example:
To learn mysql I'm building a model solar system. For the purposes of my project, planets have many similar attributes to dwarf planets, centaurs, and comets. Dwarf planets are almost completely identical to planets. Centaurs and comets are only different from planets because their orbital path has more variation. Should I have a separate table for each type of object, or should they share tables?
The example is probably too simple, but I'm also interested in best practices. For example, should I use separate tables just in case I want to make planets and dwarf planets different in the future, or are there any efficiency reasons for keeping them in the same table?
Normal forms are what you should be interested in. They are pretty much the convention for designing tables.
Any design that doesn't break the first, second or third normal form is fine by me. That's a pretty long list of requirements though, so I suggest you go read them off the Wikipedia links above.
It depends on what type of information you want to store about the objects. If the information for all of them is the same, say orbit radius, mass and name, then you can use the same table. However, if there are different properties for each (say atmosphere composition for planets, etc.) then you can either use separate tables for each (not very normalized) or have one table for basic properties like orbit, mass and name and a second table for just the properties that are unique to planets (and a similar table for comets, etc. if needed). All objects would be in the first table but only planets would be in the second table and linked through a foreign key to the first table.
It's called Database Normalization
There are many normal forms. By applying normalization you will go through the metadata (tables) and study the relationships between the data more clearly. By using the normalization techniques you will optimize the tables to prevent redundancy. This process will help you understand which entities to create based on the relationships between the different fields.
You should most likely split the data about a planet etc so that the shared (common) information is in another table.
E.g.
Common (Table)
    Diameter (Column)
    Mass (Column)
Planet
    Population
Comet
    Speed
Poor columns I know. Have the Planet and Comet tables link to the Common data with a key.
This is definitely a subjective question. It sounds like you are already on the right lines of thinking. I would ask:
Do these objects share many attributes? If so, it's probably worth considering at the very least a base table to list them all in.
Does one object "extend" another - it has all the attributes of the other, plus some extras? If so, it might be worth adding another table with the extra attributes and a one-to-one mapping back to the base object.
Do both objects have many shared attributes and unshared attributes? If this is the case, maybe you need a single table plus a "data extension" system where each object can have a type or category that specifies any amount of extra attributes that may be associated with it.
Do the objects only share one or two attributes? In this case, they are probably dissimilar enough to separate into multiple tables.
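One possible reading of the "data extension" idea in the third question above is a single base table plus a table of per-object extra attributes. The sketch below uses Python's built-in sqlite3 module; the table names, column names, and sample bodies are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bodies (
    body_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    kind    TEXT NOT NULL            -- 'planet', 'comet', ...
);
-- Extra attributes that only some bodies have, one row per attribute.
CREATE TABLE body_attributes (
    body_id INTEGER NOT NULL REFERENCES bodies(body_id),
    attr    TEXT    NOT NULL,
    value   TEXT    NOT NULL,
    PRIMARY KEY (body_id, attr)
);
""")


def add_body(name: str, kind: str, **extras: str) -> int:
    """Insert a body and any extra attributes it carries."""
    body_id = conn.execute(
        "INSERT INTO bodies (name, kind) VALUES (?, ?)", (name, kind)
    ).lastrowid
    conn.executemany(
        "INSERT INTO body_attributes VALUES (?, ?, ?)",
        [(body_id, attr, value) for attr, value in extras.items()],
    )
    return body_id


def extras_of(body_id: int) -> dict:
    """Extra attributes of one body as an {attr: value} dictionary."""
    rows = conn.execute(
        "SELECT attr, value FROM body_attributes WHERE body_id = ?", (body_id,)
    )
    return dict(rows.fetchall())
```

All bodies can still be listed from the one bodies table, while the extras stay optional. The cost is that extension values are untyped text, so this is best kept for attributes that genuinely vary by type.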
You may also ask yourself how you are going to query the data. Will you ever want to get them all in the same list? It's always a good idea to combine data into tables with other data they will commonly be queried with. For example, an "attachments" table where the file can be an image or a video, instead of images and video tables, if you commonly want to query for all attachments. Don't split into multiple tables unless there is a really good reason.
If you will ever want to get planets and comets in one single query, they will pretty much have to be in the same table if you want the database to work efficiently. Inheritance should be handled inside your app itself :)
Here's my answer to a similar question, which I think applies here as well:
How do you store business activities in a SQL database?
There are many different ways to express inheritance in your relational model. For example, you can try to squish everything into one table with a field that distinguishes the different types, or have one table for the shared attributes with relationships to child tables holding the specific attributes. Either way, you're still storing the same information. When going from a domain model to a relational model, this is what is called an impedance mismatch. The choices have different trade-offs: for example, one table will be easier to query, but multiple tables will have higher data density.
In my experience it's best not to try to answer these questions from a database perspective, but let your domain model, and sometimes your application framework of choice, drive the table structure. Of course this isn't always a viable choice, especially when performance is concerned.
I recommend you start by drawing on paper the relationships you want to express and then go from there. Does the table structure you've chosen represent the domain accurately? Is it possible to query to extract the information you want to report on? Are the queries you've written complicated or slow? Answering these questions and others like them will hopefully guide you towards creating a good relational model.
I'd also suggest reading up on database normalization if you're serious about learning good relational modeling principles.
I'd probably have a table called [HeavenlyBodies] or some such thing. Then have a lookup table with the type of body, i.e. planet, comet, asteroid, star, etc. All will share similar things such as name, size, weight. Most of the answers I've read so far have good advice. Normalization is good, but I feel you can take it too far sometimes. Third normal form is a good goal.
Something keeps showing up in my programming, and it is that two things are the same from some viewpoint, but different from another. Like, imagine you build a graph of rail stations, connected by trains, then the classes Vertex and RailStation are sometimes the same, other times not.
So, imagine I have a graph that very much represents rail stations and trains. Then I hand this graph to another object, which deletes some vertices, and then I want the corresponding rail stations to be gone.
I don't want to make rail stations "properties" of vertices; they're not. Also, the problem is symmetrical: if I erase a rail station, I want the corresponding vertex to be gone. What is the proper OO way to model such correspondences? I'm willing to go a few extra miles by writing some support methods or classes, if in the end the overall usage is simple and easy.
I'm currently using the Smalltalk programming language, but the question isn't really smalltalk-specific, I think. I just mention it because in Smalltalk, you can do cool tricks like examining the call stack, which might be helpful in this context.
Update:
Well, RailStations aren't Vertices! Are they?
Ok, let us consider real code, as demanded in the answers. Let me model a person with children. That's the easiest thing, right? Children should also know their parents, so we have like a doubly linked tree. To make disbanding parents from children easier, I model the link between parent and child as a Relationship, with properties parent and child.
So, I could implement parent>>removeChild: perhaps like this
removeChild: aChild
    (parent relationshipWith: aChild) disband.
So, a parent has a collection of relationships, not of children. But each relationship corresponds to a child. Now I want to do things like this:
parent children removeAllSuchThat: [:e | e age < 12]
which should remove the relationship and the child.
Here, relationships and children correspond in some sense. So, what do I do now? Don't get me wrong, I'm fully aware that I could solve the problem without introducing Relationship classes. But indeed, parents and children actually do share a relationship, so why not model that and use it to help disbanding double links less imperatively?
In your problem domain, aren't stations a kind of vertex? In which case, why not derive Station from Vertex?
Notice the use of the phrase "in your problem domain". Your problem appears to be about railway stations appearing in a graph. So yes, in that domain, stations are vertexes. If it were a different problem domain, say a database on railway station architecture, they may well not be. Most modern languages support some idea of namespaces to allow you to have different kinds of entity with the same names in different domains.
Regarding your parent/child problem, once again you are being too general. If I were modelling mathematical expressions and subexpressions, then on removing a parent I would want to remove and delete/free all its subexpressions. OTOH, if I were modelling legal responsibility relationships in the UK population, then when a responsibility is dissolved (say because of a divorce), I only want to remove the relationship, and NOT delete/free the child, which has its own independent existence.
It seems like you just want RailStation to inherit from Vertex (is-a relationship). See this smalltalk tutorial on inheritance. That way, if you have a graph of RailStations, an object used to dealing (generically) with graphs of Vertexes would handle things right naturally.
If this approach won't work, be more specific (preferably with real code).
From your description of the problem, you have a one-to-one correspondence of stations to vertices and deleting a station should automatically delete the corresponding vertex (and vice-versa). You also mentioned building "a graph of rail stations, connected by trains", by which you apparently mean a graph in which stations are vertices and trains are edges.
So, in what way is a station not a vertex? If the station does not exist except as a vertex, and if a vertex does not exist except as a station, then what benefit do you see in maintaining them as two distinct-but-linked entities?
As I understand your situation, station-isa-vertex and inheritance is the way to model that.
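A small Python sketch of the station-isa-vertex approach (the question uses Smalltalk, but the idea carries over). The Graph class, the prune helper, and the platforms attribute are hypothetical and exist only to show generic graph code operating on stations.

```python
class Vertex:
    def __init__(self, label: str) -> None:
        self.label = label


class RailStation(Vertex):
    """A station is-a vertex, so there is no second object to keep in sync."""

    def __init__(self, label: str, platforms: int) -> None:
        super().__init__(label)
        self.platforms = platforms      # hypothetical station-only attribute


class Graph:
    def __init__(self) -> None:
        self.vertices = set()
        self.edges = set()              # each edge stored as frozenset({a, b})

    def connect(self, a: Vertex, b: Vertex) -> None:
        self.vertices.update((a, b))
        self.edges.add(frozenset((a, b)))

    def remove(self, v: Vertex) -> None:
        # Removing a vertex also removes every edge that touches it.
        self.vertices.discard(v)
        self.edges = {e for e in self.edges if v not in e}


def prune(graph: Graph, keep) -> None:
    """Generic graph code: it knows about Vertex only, not RailStation."""
    for v in list(graph.vertices):
        if not keep(v):
            graph.remove(v)
```

When prune removes a vertex, the station is gone from the graph too, because they are the same object. This only fits when the one-to-one correspondence described above really holds.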
Having a Relationship object is a good idea.
I think the appropriate question here is "which use should be made of it?".
Probably Parent and Child classes are extending the same Person superclass, so they'll have some attributes in common, age for example.
In my idea, I can see the following: Parent and Child objects have to know each other, so both classes have to keep a link to the same Relationship.
The Relationship object keeps a one-to-many relation between a single parent and a certain number of children, and it'll keep a reference to each Person object.
This way you can implement the whole disbanding logic within the Relationship object, as sophisticated as you wish. You can query the Relationship object to find which members of the family match your requirements. You can have the relationship disband (and be destroyed) safely: it knows all its members, so it can ask each of them to drop its reference before it is destroyed. Or you can ask a single member to leave the family, keeping the Relationship object alive.
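As a rough Python illustration of that idea (not the asker's Smalltalk code), the sketch below puts all linking and unlinking in one relationship object. The class name FamilyRelationship and its methods are assumptions, and each person holds at most one family link, which is a simplification.

```python
class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age
        self.family = None      # the FamilyRelationship this person is in, if any


class FamilyRelationship:
    """One parent, many children; all link/unlink logic lives here."""

    def __init__(self, parent: Person) -> None:
        self.parent = parent
        self.children = []
        parent.family = self

    def add_child(self, child: Person) -> None:
        self.children.append(child)
        child.family = self

    def remove_children_where(self, predicate) -> list:
        """Ask matching children to leave; the relationship stays alive."""
        kept, removed = [], []
        for child in self.children:
            (removed if predicate(child) else kept).append(child)
        self.children = kept
        for child in removed:
            child.family = None
        return removed

    def disband(self) -> None:
        """Break every member's reference so the relationship can be dropped."""
        for member in [self.parent, *self.children]:
            member.family = None
        self.children = []
```

With this shape, something like family.remove_children_where(lambda c: c.age < 12) plays the role of the removeAllSuchThat: call in the question, and neither side is left holding a dangling link.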
But that's not all. Relationship should be really a superclass, extended by HierarchicalRelationship and PeerRelationship (or FriendRelationship).
This specialization lets Parent(s) and Child(ren) link to other hierarchies in a completely traversable way.
The true concept behind this is that your Relationship objects are the key to query and organize the whole bunch of Person objects (or Vertex objects) in a scalable and structured way, so the whole data domain you end up with is usable in any sense you like, whether you want to disband groups or walk a certain path (or railroad) between them.
Sorry for the huge amount of metaphors.
Take a look at Fame, see http://www.squeaksource.com/Fame.html
We use a specialized subclass of Collection that updates the opposite end when you add or remove elements. Also, you can annotate your classes with pragmas to declare relations. These pragmas are used by the Fame framework to do all kinds of nice stuff.
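The general technique of a collection that updates the opposite end can be sketched in Python as follows. This is not Fame's API; the LinkedChildren class and its method names are invented for illustration.

```python
class LinkedChildren:
    """Collection of children that keeps each child's parent link in sync."""

    def __init__(self, owner) -> None:
        self._owner = owner
        self._items = []

    def add(self, child) -> None:
        if child.parent is self._owner:
            return                                  # already linked
        if child.parent is not None:
            child.parent.children.remove(child)     # detach from old parent
        self._items.append(child)
        child.parent = self._owner                  # update the opposite end

    def remove(self, child) -> None:
        self._items.remove(child)
        child.parent = None                         # update the opposite end

    def remove_all_such_that(self, predicate) -> None:
        # Iterate over a copy, since remove() mutates the underlying list.
        for child in [c for c in self._items if predicate(c)]:
            self.remove(child)

    def __iter__(self):
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)


class Person:
    def __init__(self, name: str, age: int) -> None:
        self.name = name
        self.age = age
        self.parent = None
        self.children = LinkedChildren(self)
```

Under these assumptions, mum.children.remove_all_such_that(lambda c: c.age < 12) corresponds to the removeAllSuchThat: expression in the question: the child leaves the collection and its parent reference is cleared in the same step.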