Circular References in Database Design - Should they be avoided?

Circular References in Database Design - Should they be avoided? - ms-access

I am currently developing a database via MS Access 2003 and got stuck at a circular reference problem. Basically, it comes down to the following relationship triangle (it is a simplified form of my relationship table):
Positions
oo oo
/ \
/ \
/ \
/ \
/ \
/ \
/ \
/ \
/ \
/ \
oo oo
Employees oo -------------------- oo Software,
where Positions, Employees and Software are the tables, and "oo-------...-------oo" displays the many-to-many relationships between them.
In short, all of the employees in a company are assigned to specific positions (some of them are assigned to more than one), and have permissions to use specific piece(s) of software based on their position(s). However, there are exceptions, and some of the employees are granted to use a few number of other software packages, in addition to what they are allowed to according to their position(s).
The question is, is it OK to allow a circular relationship in this kind of database? Are there any workarounds that do not require denormalization?
Thanks in advance,
VS.

Your diagram is elliptical in the sense that you've left out the N:N join tables between all your entities. Those make a HUGE difference in regard to the side effects of circular relationships. Direct 1:N relationships with CASCADE DELETE on can cause real problems, and potential deadlocks. But with the N:N tables in between, you shouldn't have that problem, as CASCADE DELETE would run only "downhill" from the 1 table to the N, and not back up the chain from the N:N table to the other parent table.
It seems to me that this is a common problem, isomorphic with the address problem, i.e., a person can have a personal address and inherit an address from the employer, and #Saif Khan's solution of eliminating the software inheritance from the position is a form of denormalization, in that you've collapsed two complex entity relationships into a single one. I never know how to model this, not because of potential circular relationships, but because of the performance issues (and non-editibility) that come from assembling a single result set of all software/addresses, which requires a UNION. I would be tempted to use a trigger to duplicate the software inherited from the position with a record linking the person to the software.
Prior to A2010, this was not possible at the engine level in Access/Jet/ACE, but A2010 added table-level data macros which can be used to implement the equivalent of triggers. This could be a case where that new feature could allow you to implement this structure with triggers.
But I'm not sure I'm comfortable with duplicating data, even though triggers would allow you to keep the duplicated data in synch at the engine level.

You could avoid it by generating new position for each of the exceptions. A boolean flag could then be added to the position to differentiate between real and exception generating positions, if required.

You need to properly normalize the DB. IMHO - I'd not use a relationship in the positions table. Here is what I'd do
Tables
Employee
Software
EmployeeSoftware
Position
The "POSITIONS" table in your case, i assume, is your roles. Note the DB should be used as storage and very minimal business logic should be placed there. That being said, ...let me continue
There will be a relationship between Employee and EmployeeSoftware (empid present as foreign key in EmployeeSoftware. The same for Software and EmployeeSoftware (softid present as foreign key in the EmployeeSoftware.
The application first checks if a person is in a proper position (POSITIONS) table before inserting a record. For an additional DB check you can add a check contraint on the EmployeeSoftware to check the POSITIONS DB before...there then need to be a Relationship between Software and Positions.

I think this database design is getting too complicated because of the way of handling the exception,
"some of the employees are granted to
use a few number of other software
packages, in addition to what they are
allowed to according to their
position(s).
Don't try to directly link an Employee to software.
I would just create another position because the main purpose of position in this case is to determine software access. Even if one person has a unique list of software, they will get replaced in the future and that person can just be assigned the same position(s).
Querying will be easier. As David-W-Fenton pointed out, you're going to have to use a lot of unions to find out who can use what software or vice versa.

Related

How can I implement a dual relationship (Null, Not Null) in MySQL without creating two classes?

If I have a TABLE that consists of a LIST of attributes of one TYPE and another TABLE that is also composed by the same TYPE of LIST, so how can I implements them into MySQL without having to create two TABLES of the same TYPE?
For example:
Like the EMPLOYEE table, the COMPANY table has a list of ADDRESSES.
And I want to implement without having to make one table ADDRESS for COMPANY and another for EMPLOYEE, as in this case:
To me the solution seems to be a dual relationship where one of the foreign keys must be null while the other may not be, but I don't even know how to do it.

I believe your underlying hyopthesis is flawed: a one-person Company, founded by an Employee, could be registered at the founder's personal adress. Therefore an Employee may have the same address as a Company.
Likewise, two Employees may share the same address (a husband and a wife could be coworkers). Therefore I would define the relationship between adress and each of the other two entities a regular many-to-many, without any further condition.
You might be worried that changing the address of (eg.) a Company would wrongly alter the Employee's address. But instead of updating the Address entity, treat this case as creating a new Address and linking the (eg.) Company to this new Address.
Now, if you really need to implement the constraint, regardless of the (oh so dull ;) reality, I see no other option but implementing a form of inheritance as decribed here.
Notice that:
your initial design actually implements the Single Table Inheritance pattern
your alternative proposal (based on two separate address tables) actually implements the Concrete Table Inheritance pattern1
In your initial design, the constraint can be implemented with a simple CHECK constraint2 on the address table, the condition being company_id IS NULL XOR employee_id IS NULL
1 Migrating to the Class Table Inheritance pattern is "left as an exercise". However this pattern does not help in enforcing this constraint (in fact, this elegant pattern has serious additional limitations with enforcing integrity constraints in general). Nevertheless the constraint could still be enforced with a CHECK constraint.
2 MySQL does not enforce CHECK constraints, but a similar effect can be achieved through the use of triggers.

Database Design - structure

I'm designing a website with courses and jobs.
I have a jobs table and courses table, and each job or course is offered by a 'body', which is either an institution(offering courses) or a company(offering jobs). I am deciding between these two options:
option1: use a 'Bodies' table, with a body_type column for both insitutions and companies.
option2: use separate 'institution' and 'company' tables.
My main problem is that there is also a post table where all adverts for courses and jobs are displayed from. Therefore if I go with the first option, I would just need to put a body_id as a record for each post, whereas if I choose the second option, I would need to have an extra join somewhere when displaying posts.
Which option is best? or is there an alternative design?

Don't think so much in terms of SQL syntax and "extra joins", think more in terms of models, entities, attributes, and relations.
At the highest level, your model's central entity is a Post. What are the attributes of a post?
Who posted it
When it was posted
Its contents
Some additional metadata for search purposes
(Others?)
Each of these attributes is either unique to that post and therefore should be in the post table directly, or is not and should be in a table which is related; one obvious example is "who posted it" - this should simply be a PostedBy field with an ID which relates another table for poster/body entities. (NB: Your poster entity does not necessarily have to be your body entity ...)
Your poster/body entity has its own attributes that are either unique to each poster/body, or again, should be in some normalized entity of their own.
Are job posts and course posts substantially different? Perhaps you should consider CoursePosts and JobPosts subset tables with job- and course-specific data, and then join these to your Posts table.
The key thing is to get your model in such a state that all of the entity attributes and relationships make sense where they are. Correctly modeling your actual entities will prevent both performance and logic issues down the line.
For your specific question, if your bodies are generally identical in terms of attributes (name, contact info, etc) then you want to put them in the same table. If they are substantially different, then they should probably be in different tables. And if they are substantially different, and your jobs and courses are substantially different, then definitely consider creating two entirely different data models for JobPosts versus CoursePosts and then simply linking them in some superset table of Posts. But as you can tell, from an object-oriented perspective, if your Posts have nothing in common but perhaps a unique key identifier and some administrative metadata, you might even ask why you're mixing these two entities in your application.

When resolving hierarchies there are usually 3 options:
Kill children: Your option 1
Kill parent: Your option 2
Keep both
I get the issue you're talking about when you kill the parent. Basically, you don't know to what table you have to create a foreign key. So unless you also create a post hierarchy where you have a post related to institution and a separate post table relating to company (horrible solution!) that is a no go. You could also solve this outside the design itself adding metadata in each post stating which table they should join against (not a good option either as your schema will not be self documentation and the data will determine how to join tables... which is error prone).
So I would discard killing the parent. Killing the children works good if you don't have too many different fields between the different tables. Also you should bear in mind that that approach is not good to solve issues wether the children can be both: institution and companies but it doesn't seem to be the case. Killing the children is also the most efficient one.
The third option that you haven't evaluated is the keeping both approach. This way you keep a dummy table containing the shared values between the bodies and each of the bodies have a FK to this "abstract" table (if you know what I mean). This is usually the least efficient way but most likely the most flexible. This way you can easily handle bodies that are of both types, and also that are only of type "body" but not a company nor an institution themselves (if that is even possible or might be possible in the future). You should note that in order to join a post to an institution you should always reference the parent table and then join the parent with the children.
This question might also be useful for you:
What is the best database schema to support values that are only appropriate to specific rows?

Guaranteeing a FK relationship through multiple tables

I'm using MySQL / InnoDB, and using foreign keys to preserve relationships across tables. In the following scenaro (depicted below), a 'manager' is associated with a 'recordLabel', and an 'artist' is also associated with a 'recordLabel'. When an 'album' is created, it is associated with an 'artist' and a 'manager', but both the artist and the manager need to be associated with the same recordLabel. How can I guarantee that relationship with the current table setup, or do I need to redesign the tables?

You cannot achieve this result using pure DRI - Declarative Referential Integrity, or the linking of foreign keys to ensure the schema's referential integrity.
There are 2 ways to solve this problem:
Consider the requirement a database problem, and use a trigger on INSERT and UPDATE to validate the requirements, and fail otherwise.
Consider the nested link a business logic requirement, and implement it in your business logic in PHP/C#/whatever.
As a sidenote, I think the structure is rather strange from a practical perspective - as far as I know an Artist is signed to a RecordLabel, and assigned a Manager separately (either from the label or individually, many artists retain their own manager when switching to another label). Linking the Manager also to the Album only makes sense to record historic managers, enabling you to retrieve who was the manager to the artist when the album was released, but that automatically means your requirement is invalid if the artist switches labels and/or manages later on. I think therefore it is wrong from a practical data view to enforce this link.

What you do is add recordLabel id to the albums table. Then you put two, two column indexes on albumns (recordLabel_id, artist_id) and (recordLabel_id, managers_id).
Because the record_id can only have one value in each row of the albumns table you will have insured integrity.

Multiple relationships between two entities, is this good practice?

I have the following typical scenario regarding an office and its staff:
Each staff member belongs to one office
Each office has only one manager (a staff member)
ER Model
As you can see, this results in the relationship being recorded twice, once for the foreign key in the office table to point to the manager, and also in the staff table pointing to the office that staff member works for.
I have looked into alternative ways of modelling this but am still a bit lost. Please could someone advice a suitable way of modelling this, or if my method is acceptable for the scenario.
Many thanks

It's not that "the relationship [is] recorded twice", but that you actually have two relationships between these tables — which is perfectly fine. My only concern is, can a manager belong to the same office that (s)he's the manager of? (And relatedly: is it really true that every staff member has an office and every office has a manager who is a staff member?) If so, you have a circular dependency: you can't set the manager's office until the office exists, but you can't set the office's manager until the manager exists. As long as one or the other field is nullable, you can work around this by application logic (INSERT one, then INSERT the other, then UPDATE the first one), but it's a bit ugly. But if those are the relationships that exist, then there's not much you can do about it.

A circular relationship like this is valid, from an SQL perspective, but it causes some things to become complex.
For instance, when you back up and restore the data, you have to defer creation of one of the foreign key constraints until after you restore the data. Because if you create the constraints before you fill the tables, you can't restore the manager of the office before you restore his office, and you can't restore the office before you restore its manager.
Another way of solving this instead of using a Office.Manager foreign key column is to use a boolean column Staff.IsManager which is true for the manager but false for all other staff in the given office.

It is ok by me because regardless of the fact the the tables in the relationships are the same, the relationships are actually very different. It is correct that you can deduct that a manager works in the the office she manages, however this is rather a domain rule and doesn't de-normalize the this part of the database design.
if you want to get rid of the double relationship you can always create a Managers table which will contain foreign key to Staff and Office.

This seems awkward when you want to add a new office. An office must have a manager, so you need to make the manager first. But a manager is a staff member, which must have an office, so you need to make the office first.
To break this cycle, you need to allow one to be temporarily NULL or some other untrue value, and then modify whichever you create first to refer to the second. Not impossible, just awkward.
If I were designing this, I would probably have a separate "manages" table, relating offices to mangers.

This is ok, just to show the relationship. But be aware that this may lead to an infinite loop.
staffid ->officeid -> manager -> staffid
This loop can happen When retrieving data (I faced the same situation once), so it's better to normalize this and avoid this 2-relationships issue.

Database design: How should I add an information which can apply to several tables

I am constructing a database System using Mysql, this will be an application of about 20 tables. The system contains information on farmers, we work with organic certification and need to record a lot of info for that.
In my system, there are related parent-child tables for farmers, producing years and fields/areas - it's a simple representation of the real world in which farmers farm crops on their fields.
I now need to add several status flags for each one of these levels: a farmer can be certified, or his field can be, or the specific year can be; each of these flags has several states and can occur a number of times.
The obvious solution to this would be to add a child table to every one of these tables, and define the states there.
What I wonder if there is an easier way to do this to avoid getting to many tables? Where/how would be best practise to keep that data?

What about an indicator on every table that contains data that may or may not be certified? It's easier than adding new tables.
Or, if "certification" is actually a combination of several pieces/fields of data, then have a single "certification" table, and the other tables can reference it through a foreign key (something like "certification_id", which is the key of the "certification" table).

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008