ERD design pros and cons - MySQL

I have a question regarding database design.
Here is the first example:
A user may have multiple websites, and a user can request a specific resource for each of his websites. All requests are saved in the RequestForResource table.
Now, if I want to see the name of a user who requested a resource, I have to join the tables RequestForResource, Website and User.
To avoid this, I can add a foreign key between the RequestForResource and User tables, as demonstrated here:
Now, in order to get a user name, I only have to join RequestForResource and User, which is probably easier for the SQL server, but on the other hand I have one more foreign key.
Which approach is better and/or faster, and why?

You can always duplicate information to gain execution speed. This is called denormalisation. Yes, it will probably speed up the queries by lowering the number of index seeks required.
BUT
you have to write your code so that the data stays consistent:
With the second design it is possible to insert a Website.User_idUser and a RequestForResource.User_idUser with different IDs for the same site! According to the design this is valid (but it probably will not satisfy your business rules).
Consider changing the foreign key constraint (or adding a second one) so that it refers only to the Website table (User_idUser, Website_idWebsite), and remove the User-RequestForResource one.
Also consider building a view to query your data with all the required info (probably with a clustered index).
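A minimal sketch of the composite foreign key suggested above. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server; the DDL is close to, but not identical to, what MySQL would need. The table and column names follow the question, while idRequest, name and the UNIQUE (idWebsite, User_idUser) constraint are assumptions added for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; idRequest, name and the UNIQUE
# constraint are assumptions, the other names come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE User (idUser INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Website (
    idWebsite   INTEGER PRIMARY KEY,
    User_idUser INTEGER NOT NULL REFERENCES User(idUser),
    UNIQUE (idWebsite, User_idUser)  -- target for the composite foreign key
);
CREATE TABLE RequestForResource (
    idRequest         INTEGER PRIMARY KEY,
    Website_idWebsite INTEGER NOT NULL,
    User_idUser       INTEGER NOT NULL,
    FOREIGN KEY (Website_idWebsite, User_idUser)
        REFERENCES Website (idWebsite, User_idUser)
);
INSERT INTO User VALUES (1, 'alice'), (2, 'bob');
INSERT INTO Website VALUES (10, 1);
""")

# A request whose user matches the owner of website 10 is accepted.
conn.execute("INSERT INTO RequestForResource VALUES (100, 10, 1)")

# A request that pairs website 10 with a different user is rejected.
try:
    conn.execute("INSERT INTO RequestForResource VALUES (101, 10, 2)")
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

# The user name is still reachable with a single join.
names = conn.execute("""
    SELECT u.name
    FROM RequestForResource r
    JOIN User u ON u.idUser = r.User_idUser
""").fetchall()
```

With this constraint the denormalised User_idUser column can no longer disagree with the owner recorded in Website, and the short two-table join from the question still works.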

Related

Is it proper to make a grand-parent key, a primary key, in its grand-child, in a multi-level identifying relationship?

Asked this here a couple of days ago, but haven't gotten many views, let alone a response, so I'm reposting to stackoverflow.
I'm modeling a DB for a conference ticketing system. In this system attendees are members of an attendee group, which belongs to a conference. These relationships are identifying, and therefore FKs must be PKs in the respective children.
My current model:
Q: Is it proper to have attendeeGroupConferenceId FK, as a PK, in the attendee table, as MySQL Workbench has automatically set up for me?
On the one hand, one would get a performance boost by keeping it in there for quick association at "check in". However, it is not strictly necessary, since the combination of id and attendeeGroupId, plus a lookup of conferenceId in the respective attendeeGroup table, is enough. (It therefore becomes redundant data.)
To me, it feels like it might violate some form of normalization, but I plan on keeping it in for the speed boost as described. I'm just curious about what proper design says about giving it PK status or not.
You definitely don't need the attendeeGroupConferenceId in your attendee table. It's redundant, and note that the candidate key is the combination (attendeeGroupId, personId), not the attendeeGroupConferenceId alone.
The attendee table also seems to violate the second normal form (2NF) as it is.
My suggestion is to remove the attribute attendeeGroupConferenceId. In any case, you can just join the tables in your queries to get the extra info rather than keeping an extra attribute.
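As a rough sketch of the "just join" suggestion, the following looks up the conference of an attendee without storing attendeeGroupConferenceId. The original diagram is not available, so the schema is a hypothetical minimal one: only attendee, attendeeGroup, attendeeGroupId, personId and conferenceId are named in the thread, everything else is assumed. SQLite (via Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

# Hypothetical minimal schema; the conference table and the id/name columns
# are assumptions, since the original model diagram is not available.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conference (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attendeeGroup (
    id           INTEGER PRIMARY KEY,
    conferenceId INTEGER NOT NULL REFERENCES conference(id)
);
CREATE TABLE attendee (
    attendeeGroupId INTEGER NOT NULL REFERENCES attendeeGroup(id),
    personId        INTEGER NOT NULL,
    PRIMARY KEY (attendeeGroupId, personId)  -- the candidate key from the answer
);
INSERT INTO conference VALUES (1, 'DBConf');
INSERT INTO attendeeGroup VALUES (20, 1);
INSERT INTO attendee VALUES (20, 300);
""")

# The conference is derived through the group instead of being stored twice.
conference_id = conn.execute("""
    SELECT g.conferenceId
    FROM attendee a
    JOIN attendeeGroup g ON g.id = a.attendeeGroupId
    WHERE a.attendeeGroupId = ? AND a.personId = ?
""", (20, 300)).fetchone()[0]
```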

Duplicating MySql Column Values VS Foreign Key - Best Practice

We are currently in the process of developing our own e-commerce solution. As part of our research we have been examining the ZenCart database schema and found that data is quite frequently duplicated between various tables where it would seem that a foreign key would have been sufficient to link the two or more tables in question. For example:
Given that there is table "Products" that has the following columns
PRODUCT_ID
PRODUCT_NAME
PRODUCT_PRICE
PRODUCT_SKU
Then if there is a "Sales_Item" table, a product (and all its constituent columns) may of course be referenced by simply doing something like:
SALES_ITEM_ID
Products_PRODUCT_ID //This is the foreign key that relates a specific product to a sale item.
SALE_TIME
REST_OF_SALE_SPECIFIC_DATA......
However, it instead seems that the Sales table COPIES many of the field values defined in the Products table, so it in fact looks as follows:
SALES_ITEM_ID
PRODUCT_ID
PRODUCT_NAME
PRODUCT_PRICE
PRODUCT_SKU
SALE_TIME
My question is which approach would generally be considered best practice when attempting to build a scalable, efficient solution. Using foreign keys means data is not duplicated, but the caveat is that database or application-level JOINs would be needed in order to query the entire dataset. That being said, for some reason the foreign key approach seems cleaner and more correct somehow.

Are foreign keys used in a "link" table?

Quick question about DB design! In this example there are users and schedules. Each user can have many schedules and each schedule can belong to many users.
I have two tables, 'user' and 'schedule', that each have a unique identifier/primary key (user_id and schedule_id): these tables have a many-to-many relationship.
This is where I am unsure/inexperienced: In order to connect them together and adhere to good db design, I want to create a link table that has two columns, user_id and schedule_id. I plan to make these both primary keys (therefore a composite key). However, do I also add two foreign keys, one on user_id linked to the 'user' table and one on schedule_id linked to the 'schedule' table?
TLDR: I plan to use a composite key in 2-column 'link' table that connects two tables. Should/Do I also need to make those into foreign keys?
PKs and FKs serve different purposes. In a link table, you need the PK to preserve uniqueness of the data. However, if you do not also create the FKs then you may end up with data integrity problems because the ID could be deleted from the original table and not the link table.
Sometimes people think they can get away without the FKs because they will enforce data integrity through the application. Almost always this is because they find it annoying when the constraints won't let them do something they want to do. Of course that is the purpose of the constraint, to prevent users and developers from doing things they should not. Data integrity must be preserved through the database; it is too important to risk letting the application handle it. I have seen a lot of data from hundreds of databases and the ones with the worst data are invariably the ones where the devs thought they could manage stuff like table relationships through the application. There are always holes when you do this and eventually they come back to bite you and then they can be very difficult to fix properly.
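A small sketch of the link table described in the question, with both the composite primary key and the two foreign keys. The name user_schedule is an assumption; user, schedule, user_id and schedule_id come from the question. SQLite (Python's sqlite3 module) stands in for MySQL so the sketch can run on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE schedule (schedule_id INTEGER PRIMARY KEY);
CREATE TABLE user_schedule (          -- hypothetical name for the link table
    user_id     INTEGER NOT NULL REFERENCES user(user_id),
    schedule_id INTEGER NOT NULL REFERENCES schedule(schedule_id),
    PRIMARY KEY (user_id, schedule_id)
);
INSERT INTO user VALUES (1);
INSERT INTO schedule VALUES (7);
INSERT INTO user_schedule VALUES (1, 7);
""")

def fails(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# The composite PK stops the same pair from being linked twice.
duplicate_blocked = fails("INSERT INTO user_schedule VALUES (1, 7)")
# The FKs stop links to rows that do not exist ...
orphan_blocked = fails("INSERT INTO user_schedule VALUES (1, 99)")
# ... and stop a referenced schedule from being deleted out from under the link.
delete_blocked = fails("DELETE FROM schedule WHERE schedule_id = 7")
```

The primary key and the foreign keys reject different mistakes, which is why the link table wants both.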

join, foreign key and other relational DB planning

I am new to database planning and to programming in general.
I need to develop a desktop app for realtors.
It needs to have at least 2 tables:
property_table - id, license #, address, city, bedrooms, baths, laundry, etc, etc.
image_table - id, picture_name, path, size (image related DB)
(It will probably need an agent_table, but let's keep things simple.)
Property_table will have only one address per ID. A new entry with same address has to generate new ID (a person re-selling same house).
But image_table may have 10 entries for the same property address.
I am using the PHP session to carry address, city and zip code between tables to avoid user mistakes (therefore image_table is actually id, picture_name, path, size, address, city, zip code, username).
QUESTION: should I use a foreign key? Or just join in my searches? There are many questions about this, like here, good tutorials on joins, etc. It seems I have to use a join query. What about the foreign key?
WHY: I need to show the listings as coming from different DB tables. An address (table 1) has several pictures (table 2).
PLANNING AHEAD. In the long term, same address will have more than one entry (same address, same zip code).
Just confused with so much new information and trying to plan ahead.
Thank you so much for your time.
property_table - Property_table_id(PK), license #, address, city, bedrooms, baths, laundry, etc, etc.
image_table - image_table_id(PK),Property_table_id(FK), picture_name, path, size (image related DB)
SELECT *
FROM property_table PropTable
INNER JOIN image_table imgTable ON PropTable.Property_table_id = imgTable.Property_table_id
In general I would say 'yes' to using the foreign key (FK). I am assuming that, since you are using PHP, you are probably using one of the many popular free or open source databases such as MySQL as your relational back-end. Having the FK will allow you to set constraints on your data, to prevent you from making a mistake that may cause your program to error out.
For example, in your scenario you have the property_table table, which will have several addresses, some of which may use the same images. In this case, you would want a column in your property_table, maybe property_table.image_id, that is an FK to your image_table table, referencing the column image_table.id.
If you set up your constraints properly on the FK, you will prevent yourself from accidentally entering the id of an image that does not exist in image_table. You can also use constraints to automatically manage the references for you in case you do something with the data in the referenced (image_table) table. For example, if you delete an image from image_table, you could have all references to that image inside property_table automatically set to NULL (an empty value).
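The behaviour described in this answer can be sketched as follows. The layout (property_table.image_id referencing image_table.id) is the one proposed in this answer; the sample rows are made up, and SQLite (Python's sqlite3 module) stands in for MySQL. ON DELETE SET NULL is standard SQL that InnoDB also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE image_table (id INTEGER PRIMARY KEY, picture_name TEXT);
CREATE TABLE property_table (
    id       INTEGER PRIMARY KEY,
    address  TEXT,
    image_id INTEGER REFERENCES image_table(id) ON DELETE SET NULL
);
INSERT INTO image_table VALUES (1, 'front.jpg');
INSERT INTO property_table VALUES (5, '12 Oak St', 1);
""")

# Pointing a property at an image id that does not exist is rejected.
try:
    conn.execute("INSERT INTO property_table VALUES (6, '9 Elm St', 999)")
    missing_image_rejected = False
except sqlite3.IntegrityError:
    missing_image_rejected = True

# Deleting the image clears the reference instead of leaving it dangling.
conn.execute("DELETE FROM image_table WHERE id = 1")
image_ref = conn.execute(
    "SELECT image_id FROM property_table WHERE id = 5"
).fetchone()[0]
```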
Foreign key is a constraint, JOIN is a query method. While it is true that JOINs are often (but not always) done "on top" of foreign keys, they are not the same thing.
So, if you need to ensure there are no "dangling pointers" (as you do between image_table and property_table), use a foreign key. Always enforce data integrity at the database level, even if you also enforce it in the UI. [1]
If you need to get the related data from two (or more) tables using a single query, use JOIN.
[1] This will protect your data in case of bugs, especially subtle concurrency bugs that almost certainly exist in your code unless you employed locking very carefully. Furthermore, if you ever have to create another application that accesses the same database, it will benefit from integrity constraints that are already there. And if you ever modify the data ad hoc through the generic UI provided by your DBMS, it will be more difficult to "break" the data.
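To make the constraint/query distinction concrete, here is a sketch using the layout from the first answer (image_table carries Property_table_id as an FK). The sample rows are invented, including a re-listed house that gets a new ID for the same address; SQLite (Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE property_table (
    property_table_id INTEGER PRIMARY KEY,
    address           TEXT
);
CREATE TABLE image_table (
    image_table_id    INTEGER PRIMARY KEY,
    property_table_id INTEGER NOT NULL
        REFERENCES property_table(property_table_id),
    picture_name      TEXT
);
-- Same address listed twice (a re-sale), so it gets two ids.
INSERT INTO property_table VALUES (1, '12 Oak St'), (2, '12 Oak St');
INSERT INTO image_table VALUES
    (1, 1, 'front.jpg'), (2, 1, 'kitchen.jpg'), (3, 2, 'front_new.jpg');
""")

# The foreign key is the constraint: an image for a missing property is refused.
try:
    conn.execute("INSERT INTO image_table VALUES (4, 42, 'ghost.jpg')")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# The JOIN is the query: it pairs each listing with its pictures.
rows = conn.execute("""
    SELECT p.property_table_id, i.picture_name
    FROM property_table p
    JOIN image_table i ON i.property_table_id = p.property_table_id
    ORDER BY p.property_table_id, i.image_table_id
""").fetchall()
```

The join would run just as well without the constraint; the constraint only guarantees that every image row it returns belongs to a real listing.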

Mysql deduce foreign key relationship for random queries

I am a MySQL novice and am looking for a solution to the following problem:
I would like to create a CMS with cppcms that is capable of having modules. Since I want to reduce the chance of (accidental) access to private data, I want a module which handles data access and rights. Since this module is supposed to be unaware of data structures created by other modules, I would like it to deduce the data owner through foreign key relations. My idea would be to search for a path (over foreign keys) which links a row to a user id.
Sum up:
What I am trying to do
Taking a random query, determine the affected rows
for the affected rows determine a relationship/path (via foreign keys) to a user/userid (a column in an existing table)
return only the rows for which a relationship could be determined and a condition holds (e.g. the userid found in the related query matches a fixed user id, such as the user currently accessing the system)
(As far as I know foreign keys only enforce the existence of a key in another table, however the precondition I assume is, that every row is linked to a user over a path of foreign key relations)
My Problem/Question:
Is there an existing solution/better approach to the problem? Prepared statements won't do the trick since I don't know all data structures/queries in advance.
How do I get the foreign key relations? Is there another way besides "SHOW CREATE TABLE" and then parsing the result string?
How can I determine the rows that would be affected, without modifying them? I would like to filter this set afterwards by determining whether I can link it to the current user (not the MySQL user but the system user).
Could I try executing the query, then select the affected rows, and simply do a rollback if I determine an access violation? Problem with this: how do I apply the changes to the subset of rows for which it is legal (e.g. I attempt to change 5 rows but may only change 2; how do I change only those 2)? One idea was to look for a way to create a temporary table with the result set; this solution has several drawbacks: foreign key relations are not possible for temporary tables, so they are 'lost'.
P.S.: I am coding in C++, therefore I would prefer C++-compatible library recommendations; however, I am open to other suggestions. While googling I stumbled over Doctrine and I am currently researching it.
P.P.S.: Database engine is InnoDB (has to because of the foreign keys)
UPDATE: Explanation Attempt of Part 2:
I am trying to filter which columns of a table a user is allowed to see. To do so I would like to find a connection in the database over foreign keys (with foreign keys I ensure that I can get to all data over joins, and they are a hint on which columns I have to join). Since I plan on a more complex system (e.g. a forum) I don't want to join all data in a temporary table and run the user query on that. I would rather evaluate the user query and check for the result whether I can map it with a join to the user's id. For example, I could use this to enforce that an edit button is only enabled for the posts created by the user. (I know there are easier ways to do this, but I basically want to allow programmers to write their own queries without giving them the chance to edit or view data that they are not allowed to see. My assumption is that the programmer is not an evildoer but simply forgets constraints, so I want to enforce them in software.)
Getting here would be pretty good, but I have a little more complex need.
First a basic example. Let's say its like facebook and all the friends of a person are allowed to see his pictures.
pictures = id **userid** file (bool)visibleForFriends album
friendship = **userid1** **userid2**
users = userid
What I want to happen is:
1. Programmer input "SELECT * FROM pictures WHERE album=2"
2. System gets all matching records (e.g. set of ids)
3. System sees foreign key userid, tries to match the current userid against the pictures userid, and adds all matches to the returned result
4. System notices the special column visibleForFriends
5. System tries to determine all friends (SELECT userid1 FROM friendship WHERE userid2=currentUserID join (have to read up on joins) SELECT userid2 FROM friendship WHERE userid1=currentUserID)
6. System adds all rows where visibleForFriends is true and pictures.userid=Result from 5.
While the friendship part is some extra code (I think it is doable once I get started on the first bit), I still need to figure out how to automatically follow the foreign keys to see the connection. Ignoring the friendship special case, I would like the system to work on this as well:
pictures = id **albumid** file (bool)visibleForFriends album
albums = id **userid**
users = userid
Now the system should go pictures.albumid ==> albums.id -> albums.userid ==> users.userid.
I hope the examples clarified the question a bit. One problem is that in point one of the example (programmer query input) I don't want to let "DELETE *" take effect on anything not owned by the user. So I have to filter which rows to actually delete.
In response to part of your question (part 1): provided the MySQL user you access the database with has access rights to information_schema, you can use the following query to list the existing foreign key relations within a specific database:
SELECT
    TABLE_NAME,
    COLUMN_NAME,
    REFERENCED_TABLE_NAME,
    REFERENCED_COLUMN_NAME
FROM
    information_schema.KEY_COLUMN_USAGE
WHERE
    TABLE_SCHEMA = 'dbname'
    AND REFERENCED_COLUMN_NAME IS NOT NULL;
I am slightly confused by part 2 and am unsure how to give an appropriate response to it. I hope you find the above query helpful in your project, though!
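One possible way to use rows like the ones that query returns is a breadth-first search over the foreign key graph, which is roughly the "follow the foreign keys" idea from the question. This is only a sketch: fk_path and join_clause are hypothetical helper names, the rows are hard-coded from the pictures/albums/users example instead of being read from information_schema, and nothing here checks row-level ownership yet.

```python
from collections import deque

def fk_path(fk_rows, start_table, target_table):
    """Return the chain of FK rows leading from start_table to target_table.

    Each row is (table, column, referenced_table, referenced_column).
    Returns None when no chain of foreign keys connects the two tables.
    """
    outgoing = {}
    for row in fk_rows:
        outgoing.setdefault(row[0], []).append(row)

    queue = deque([(start_table, [])])
    seen = {start_table}
    while queue:
        table, path = queue.popleft()
        if table == target_table:
            return path
        for row in outgoing.get(table, []):
            referenced = row[2]
            if referenced not in seen:
                seen.add(referenced)
                queue.append((referenced, path + [row]))
    return None

def join_clause(path):
    """Turn an FK chain into the JOIN clauses that follow it."""
    return " ".join(
        f"JOIN {ref_table} ON {table}.{column} = {ref_table}.{ref_column}"
        for table, column, ref_table, ref_column in path
    )

# Rows as they might come back for the second example in the question.
fk_rows = [
    ("pictures", "albumid", "albums", "id"),
    ("albums", "userid", "users", "userid"),
]

path = fk_path(fk_rows, "pictures", "users")
joins = join_clause(path)
no_path = fk_path(fk_rows, "users", "pictures")  # FKs are followed one way only
```

Here joins comes out as the two JOIN clauses for pictures.albumid ==> albums.id and albums.userid ==> users.userid, which could then be appended to the programmer's query together with a filter on users.userid.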
Is there an existing solution/Better approach to the problem?
Yes, I think so. You're describing a multi-tenant database. In a multi-tenant database in which the users share tables (also known as "shared everything"), each table should have a column for the user id. In effect, each row knows its owner.
This will vastly simplify your SQL, since you need no joins to determine who a row belongs to. It will probably speed up your SQL a lot, too.
This SO answer has a decent summary of the issues and alternatives.
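A rough sketch of what the owner-column approach could look like for the pictures example. The helper select_owned is hypothetical, the sample rows are invented, and it assumes every shared table has a userid column; SQLite (Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pictures (
    id     INTEGER PRIMARY KEY,
    userid INTEGER NOT NULL,   -- every row knows its owner
    album  INTEGER,
    file   TEXT
);
INSERT INTO pictures VALUES
    (1, 1, 2, 'a.jpg'), (2, 2, 2, 'b.jpg'), (3, 1, 3, 'c.jpg');
""")

def select_owned(conn, table, where_sql, params, current_user):
    """Run a programmer-supplied filter, restricted to the current user's rows.

    The table name and where_sql are interpolated into the statement, so they
    must come from trusted module code, never from end-user input.
    """
    sql = f"SELECT * FROM {table} WHERE userid = ? AND ({where_sql})"
    return conn.execute(sql, (current_user, *params)).fetchall()

# The programmer asks for album 2; user 1 only gets back his own picture.
rows = select_owned(conn, "pictures", "album = ?", (2,), current_user=1)

# The same owner filter keeps a broad DELETE from touching other users' rows.
deleted = conn.execute(
    "DELETE FROM pictures WHERE userid = ? AND album = ?", (1, 2)
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM pictures").fetchone()[0]
```

No foreign key path has to be discovered at query time, because the ownership check is a single predicate on the table being queried.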