Suppose I have 3 database tables: Countries, Provinces and Cities.
Countries has id (PK) and name.
Provinces has id (PK), name and country_id (FK).
Cities has id (PK), name and province_id (FK).
My question is: would it be good to have country_id as an FK in the Cities table too? I mean that this country_id would reference the Provinces table's FK (Provinces_Countries_id), not the Countries PK directly. My mate says it is better for performance. But carrying the FKs of all the previous tables may be tedious when you have a lot of tables. For example, with 8 tables in a chain, the last one could have 8+ FKs instead of just the previous table's PK as an FK.
Countries table:
+----+-----------+
| id | name |
+----+-----------+
| 1 | France |
+----+-----------+
Provinces table:
+----+-----------+------------+
| id | name | country_id |
+----+-----------+------------+
| 1 | Languedoc | 1 |
+----+-----------+------------+
Cities table:
+----+-----------+-------------+---------------------+
| id | name | province_id | province_country_id |
+----+-----------+-------------+---------------------+
| 1 | Toulouse | 1 | 1 |
+----+-----------+-------------+---------------------+
Can I have an explanation about this?
EDIT: Maybe the answer can be about identifying and non-identifying relationships? (I don't know.)
You should keep the structure as simple as possible until it actually affects performance.
A big effort is made at the engine level to make use of indexes when running queries, so the best bet would be to create indexes that include all the fields you'll filter by.
Otherwise, countries, provinces and cities look like small tables, so the total performance impact wouldn't be noticeable anyway.
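To illustrate the "keep it simple" advice, here is a minimal sketch, assuming SQLite through Python's standard sqlite3 module (chosen only so it runs self-contained; the sample rows and index names are made up). It filters cities by country purely through the Provinces join, with the FK columns indexed, so Cities needs no country_id of its own.

```python
import sqlite3

# In-memory SQLite database; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE provinces (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES countries(id)
);
CREATE TABLE cities (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    province_id INTEGER NOT NULL REFERENCES provinces(id)
);
-- Index the FK columns used by the joins below (hypothetical index names).
CREATE INDEX idx_provinces_country ON provinces(country_id);
CREATE INDEX idx_cities_province ON cities(province_id);

INSERT INTO countries VALUES (1, 'France'), (2, 'Spain');
INSERT INTO provinces VALUES (1, 'Languedoc', 1), (2, 'Catalonia', 2);
INSERT INTO cities VALUES (1, 'Toulouse', 1), (2, 'Barcelona', 2);
""")

def cities_in_country(country_name):
    """Filter cities by country without a redundant cities.country_id column."""
    rows = conn.execute("""
        SELECT ci.name
        FROM cities AS ci
        JOIN provinces AS p ON p.id = ci.province_id
        JOIN countries AS co ON co.id = p.country_id
        WHERE co.name = ?
        ORDER BY ci.name
    """, (country_name,))
    return [name for (name,) in rows]

print(cities_in_country("France"))  # ['Toulouse']
```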
This makes your inserts and updates more complicated, and updates should definitely be handled with a trigger on all the associated tables. That way, if a province's country changes, the country of the cities in that province would also change automatically. You would also have to be careful that the source of the countryid value is consistent between the insert into province and the insert into city. You really have to be careful to maintain data integrity.
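If you do keep a denormalized country_id in Cities, the triggers could look something like the sketch below. It assumes SQLite via Python's sqlite3 module; the trigger names and sample rows are hypothetical, and a third trigger (not shown) would be needed for cities whose province_id changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE provinces (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    country_id INTEGER NOT NULL
);
CREATE TABLE cities (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    province_id INTEGER NOT NULL REFERENCES provinces(id),
    country_id INTEGER  -- denormalized copy, maintained only by the triggers below
);

-- New city: copy the country from its province, so the value has one source.
CREATE TRIGGER trg_city_fill_country
AFTER INSERT ON cities
BEGIN
    UPDATE cities
    SET country_id = (SELECT country_id FROM provinces WHERE id = NEW.province_id)
    WHERE id = NEW.id;
END;

-- Province moved to another country: propagate the change to its cities.
CREATE TRIGGER trg_province_country_change
AFTER UPDATE OF country_id ON provinces
BEGIN
    UPDATE cities SET country_id = NEW.country_id WHERE province_id = NEW.id;
END;
""")

def city_country(city_id):
    """Read the denormalized country_id stored on a city."""
    return conn.execute(
        "SELECT country_id FROM cities WHERE id = ?", (city_id,)).fetchone()[0]

conn.execute("INSERT INTO provinces VALUES (1, 'Languedoc', 1)")
conn.execute("INSERT INTO cities (id, name, province_id) VALUES (1, 'Toulouse', 1)")
print(city_country(1))  # 1
conn.execute("UPDATE provinces SET country_id = 2 WHERE id = 1")
print(city_country(1))  # 2
```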
However, there is a business case where it makes sense to do this.
That is:
when the id is unlikely to change frequently (so little extra work would normally happen on update);
when a significant portion of the queries against the child tables do not need any information from the intermediate tables;
and preferably when the descriptor information in the parent table is not usually needed.
For instance, we do this for clientid, as the client never changes in our system. Almost all queries of subordinate tables need to be filtered by client, but the client name is rarely needed in the final query. By cutting out several layers of joins to things we don't need in most queries, denormalizing clientid made sense.
However, in your case, while the first condition is likely met, are you really going to filter by country and not need province or city? I would find this a less likely scenario. Of course, I am not familiar with how your data is used; it could be that I am mistaken. I think in this case the risk to data integrity would be higher than the gain from doing this.
Related
I'm trying to design a model which allows a user to be a buyer and a seller with a single account, but some teachers told me that this diagram is wrong because it has redundancy.
I have reviewed the diagram, but I haven't found a way to remove this redundancy. In the orders table I need to know who the buyer is, so for this reason I didn't delete it from the table. Any ideas?
The only thing that is "redundant" (not normalized, to be exact) in your schema is this:
You don't need to make a special ID, a composite PK is enough.
-------------------
| ORDERPRODUCT |
-------------------
| PK | PRODUCT_ID |
| PK | ORDER_ID |
-------------------
ALTER TABLE ORDERPRODUCT
ADD CONSTRAINT pk PRIMARY KEY (PRODUCT_ID, ORDER_ID);
On top of what @Blag has said, for Categories you have 2 fields that might do the same thing: categoryname and description. You already have an identifier with PK_IdCategory, so one of those might be unnecessary.
Let's assume I have a very large database with tons of tables in it.
Some of these tables contain datasets that are connected to each other, like:
table: album
table: artist
--> connected by table: album_artist
table: company
table: product
--> connected by table: company_product
The tables album_artist and company_product each contain 3 columns: a primary key, plus albumID/artistID or companyID/productID respectively.
Is it good practice to create a single "assoc" table structured like this?
---------------------------------------------------------
| id int(11) primary | leftID | assocType | rightID |
|---------------------------------------------------------|
| 1 | 10 | company:product | 4 |
| 2 | 6 | company:product | 5 |
| 3 | 4 | album:artist | 10 |
---------------------------------------------------------
I'm not sure whether this is the way to go, or whether there is any alternative to creating multiple connection tables.
No, it is not a good practice. It is a terrible practice, because referential integrity goes out the window. Referential integrity is the guarantee provided by the RDBMS that a foreign key in one row refers to a valid row in another table. In order for the database to be able to enforce referential integrity, each referring column must refer to one and only one referred column of one and only one referred table.
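A small sketch of the problem, assuming SQLite via Python's sqlite3 module (which only enforces foreign keys after PRAGMA foreign_keys = ON) and made-up rows: the generic assoc table happily stores IDs that point nowhere, while a dedicated company_product table with real foreign keys rejects them. The try_insert helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);

-- The generic table from the question: leftID/rightID cannot declare a FK,
-- because the table they point to depends on assocType.
CREATE TABLE assoc (
    id INTEGER PRIMARY KEY,
    leftID INTEGER NOT NULL,
    assocType TEXT NOT NULL,
    rightID INTEGER NOT NULL
);

-- A dedicated junction table: each column refers to exactly one table.
CREATE TABLE company_product (
    company_id INTEGER NOT NULL REFERENCES company(company_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    PRIMARY KEY (company_id, product_id)
);

INSERT INTO company VALUES (1, 'Acme');
INSERT INTO product VALUES (1, 'Widget');
""")

def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# Neither company 999 nor product 999 exists.
print(try_insert("INSERT INTO assoc (leftID, assocType, rightID) "
                 "VALUES (999, 'company:product', 999)"))   # accepted
print(try_insert("INSERT INTO company_product VALUES (999, 999)"))  # rejected
```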
No, no, a thousand times no. Don't overthink your many-to-many relationships. Just keep them simple. There's nothing to gain and a lot to lose by trying to consolidate all your relationships in a single table.
If you have a many-to-many relationship between, say, guitarist and drummer, then you need a guitarist_drummer table with two columns in it: guitarist_id and drummer_id. That table's primary key should be composed of both columns. And you should have another index made of the two columns in the opposite order. Don't add a third column with an autoincrementing id to those join tables. That's a waste, and it allows duplicated pairs in those tables, which is generally confusing.
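The join table described above might look like this in SQLite, driven from Python's sqlite3 module; the band-member names and the index name are invented for illustration. The composite primary key rejects a duplicated pair, and the reverse-order index serves drummer-first lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE guitarist (guitarist_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE drummer   (drummer_id   INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Join table: two columns, composite PK, no surrogate id.
CREATE TABLE guitarist_drummer (
    guitarist_id INTEGER NOT NULL REFERENCES guitarist(guitarist_id),
    drummer_id   INTEGER NOT NULL REFERENCES drummer(drummer_id),
    PRIMARY KEY (guitarist_id, drummer_id)
);
-- Second index with the columns in the opposite order, for drummer-first lookups.
CREATE INDEX idx_drummer_guitarist ON guitarist_drummer (drummer_id, guitarist_id);

INSERT INTO guitarist VALUES (1, 'Jimmy'), (2, 'Keith');
INSERT INTO drummer VALUES (1, 'John'), (2, 'Charlie');
INSERT INTO guitarist_drummer VALUES (1, 1), (2, 2), (2, 1);
""")

try:
    conn.execute("INSERT INTO guitarist_drummer VALUES (1, 1)")  # same pair again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Guitarists who played with drummer 1; this lookup can use idx_drummer_guitarist.
guitarists_with_drummer_1 = [row[0] for row in conn.execute(
    "SELECT g.name FROM guitarist_drummer AS gd "
    "JOIN guitarist AS g ON g.guitarist_id = gd.guitarist_id "
    "WHERE gd.drummer_id = 1 ORDER BY g.name")]
print(duplicate_rejected, guitarists_with_drummer_1)  # True ['Jimmy', 'Keith']
```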
People who took the RDBMS class in school will immediately recognize how these tables work. That's good, because it means you don't have to be the only programmer on this project for the rest of your life.
Pro tip: Use the same column name everywhere. Make your guitarist table contain a primary key called guitarist_id rather than id. It makes your relationship tables easier to understand. And if you use a reverse-engineering tool like SQL Developer, that tool will have an easier time with your schema.
The answer is that it "depends" on the situation. In your case and most others, no, it does not make sense. It does make sense for a many-to-many relationship, where the constraints can be enforced by the link table with foreign keys and a unique constraint. Probably the best use case would be if you had numerous tables pointing to a single table. Each table could have a link table with indexes on it. This would be beneficial if one of the tables is large and you need to fetch the linked records separately.
While designing a database with the MySQL Workbench designer, I noticed that the relationship tool uses a third table to form a many-to-many relationship between two tables.
I have 3 tables
TABLE1
TABLE2
TABLE3
TABLE2 has a foreign key referencing the primary key of TABLE1, a many-to-one relationship.
TABLE2 and TABLE3 are related by a many-to-many relationship.
As soon as I create that relationship, a new table TABLE3_has_TABLE2 is created with all the keys from TABLE2 (the primary key of TABLE2 and its foreign key to TABLE1) and from TABLE3 (the primary key of TABLE3).
Now, why is there a foreign key to TABLE1?
Even if I remove it, I will still be able to query data from TABLE1 and TABLE3 using TABLE2 as an intermediate, so is it good to have this kind of relationship, or should it be avoided?
For example, in the diagram below:
This is a geographical distribution of locations; the right side shows the hierarchy.
Now:
Table1 (zone) is the primary table.
Table2 (state) is related to table1 (zone) using zone_id.
Table3 (division) is related to table2 (state) using state_id and the zone_id of table1 (zone).
Question: should this zone_id column be in table3 or not?
Similarly, table4 contains all the previous key columns of table3.
Strictly from a normalization point of view, DIVISION.STATE_ZONE_ID isn't required, since you can get the ZONE_ID from DIVISION by joining STATE on the state_id.
And it's the same with the division_state_state_id & division_state_zone_id in DISTRICT.
Having the division_division_id is enough to join DIVISION, then STATE, then ZONE.
However, what if you removed those 'extra' fields?
Then every query would need to go through that cascade of joined tables to get ZONE.zone_name.
So there's an advantage to having those 'extra' fields: it becomes possible to JOIN directly to the ZONE table, which can simplify or speed up certain popular queries.
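To make the trade-off concrete, here is a sketch assuming SQLite via Python's sqlite3 module, reduced to ZONE, STATE and DIVISION (DISTRICT works the same way) with made-up rows. Both queries return the same zone name; the second skips the STATE join by using the 'extra' state_zone_id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zone (zone_id INTEGER PRIMARY KEY, zone_name TEXT NOT NULL);
CREATE TABLE state (
    state_id INTEGER PRIMARY KEY,
    zone_id  INTEGER NOT NULL REFERENCES zone(zone_id)
);
CREATE TABLE division (
    division_id    INTEGER PRIMARY KEY,
    state_state_id INTEGER NOT NULL REFERENCES state(state_id),
    state_zone_id  INTEGER NOT NULL REFERENCES zone(zone_id)  -- the 'extra' field
);
INSERT INTO zone VALUES (1, 'North');
INSERT INTO state VALUES (10, 1);
INSERT INTO division VALUES (100, 10, 1);
""")

# Normalized path: DIVISION -> STATE -> ZONE.
cascaded = conn.execute("""
    SELECT z.zone_name
    FROM division AS d
    JOIN state AS s ON s.state_id = d.state_state_id
    JOIN zone  AS z ON z.zone_id  = s.zone_id
    WHERE d.division_id = 100
""").fetchone()

# Shortcut path using the denormalized column: DIVISION -> ZONE directly.
direct = conn.execute("""
    SELECT z.zone_name
    FROM division AS d
    JOIN zone AS z ON z.zone_id = d.state_zone_id
    WHERE d.division_id = 100
""").fetchone()

print(cascaded, direct)  # ('North',) ('North',)
```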
The disadvantage is that it becomes harder to assure referential integrity.
Because for example, you could assign a different zone_id to a DIVISION.state_zone_id than the STATE.zone_id you can get via DIVISION.state_state_id.
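A sketch of that integrity gap, under the same kind of assumptions (SQLite via Python's sqlite3 module, invented data): both foreign keys are individually valid, so the inconsistent division is accepted, and a join against STATE is one way to find such rows after the fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE zone (zone_id INTEGER PRIMARY KEY, zone_name TEXT NOT NULL);
CREATE TABLE state (
    state_id INTEGER PRIMARY KEY,
    zone_id  INTEGER NOT NULL REFERENCES zone(zone_id)
);
CREATE TABLE division (
    division_id    INTEGER PRIMARY KEY,
    state_state_id INTEGER NOT NULL REFERENCES state(state_id),
    state_zone_id  INTEGER NOT NULL REFERENCES zone(zone_id)
);
INSERT INTO zone VALUES (1, 'North'), (2, 'South');
INSERT INTO state VALUES (10, 1);          -- state 10 is in zone 1
INSERT INTO division VALUES (100, 10, 1);  -- consistent
INSERT INTO division VALUES (101, 10, 2);  -- both FKs valid, but zone disagrees with state
""")

# Divisions whose copied zone differs from the zone reached through their state.
mismatches = [row[0] for row in conn.execute("""
    SELECT d.division_id
    FROM division AS d
    JOIN state AS s ON s.state_id = d.state_state_id
    WHERE d.state_zone_id <> s.zone_id
""")]
print(mismatches)  # [101]
```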
It is best practice in relational models to avoid many-to-many relationships. Workbench usually compensates for users trying to do that, as you have seen.
Let us use an example (or check the tl;dr) where there are two identified entities: buyers and hardware items. Some people buy 1 item, others buy more than one. The thing is, the same item can be bought by many people. So the buyer table has Mr. A buying nails. Simple enough to record in one row. But lo and behold, he ups and gets another item! How do we show that he buys another item?
One way is by adding another attribute to the table (say "item_number_two"). But then he gets another! We can't keep adding attributes like that. Databases were designed more for vertical addition of records than for horizontal addition of attributes (to give a visual picture). There is a longer explanation; you should read up on it, or you may figure it out after reading this.
Another way is to re-enter a record for Mr. A and then put the ID of another item in that column, showing that he bought two items (not really "he" from a database stand-point, it's two different people!).
A better method would be to create a table that consists of the unique identifiers found in the original tables (just one per table may be necessary). This is called an intermediary table. The original tables themselves do not have foreign keys from the other table.
This is where the concept of a composite key comes in. It means that two or more columns (here, the two IDs) are used together to uniquely identify a record, rather than just one. This is how it works:
Person Table:
| person_ID | person_Name |
| P0001 | Mr. A |
| P0002 | Mr. B |
| P0003 | Mr. C |
| P0004 | Mr. D |
Cat Table
| item_ID | cat_Name |
| I0001 | Nails |
| I0002 | Screws |
| I0003 | Hammers |
| I0004 | Power-Saw |
Intermediary table
| person_ID | item_ID |
| P0001 | I0001 |
| P0001 | I0002 |
| P0001 | I0003 | //Shows that person 1 bought more than one item
| P0002 | I0004 |
| P0002 | I0001 | //Shows that an item has been bought by more than one person
So this new table matches a record of one table (through the use of a primary key) to a record of another. The only thing that will ever be repeated is one of the two IDs. A record is unique as long as no combination is repeated.
tl;dr - Having tables mapped in a many to many relationship inevitably wastes space in the DB when entering records, as new records of the same data have to be made to show a small difference (adding no real value in proportion to the space). Another issue is that it causes more calculations than necessary when a query is made, wasting time and space. Or the results returned may just be plain wrong...
EDIT:
If you have tables A and B with a many-to-many relationship, do the following instead. Create a table C. Take the primary keys from tables A and B and place them in table C. In table C they both act as primary and foreign keys. This creates the following relationship:
| Table A |-----------<| Table C |>------------|Table B|
Table A and B are linked through C.
Sample query:
SELECT C.itemID FROM A JOIN C ON A.personID = C.personID WHERE A.personID = 'P0001';
This query returns the IDs of all items bought by the person with the ID P0001. A record must have a personID of P0001, and that ID must have matching rows in table C (the intermediary table). An extended query could also fetch the item names from table B. Each attribute in C holds a value that corresponds to a key value in either table A or table B, so a query can pull other information wherever the value in table C equals the value in table A or B (depending on which one you want).
My database has several categories to which I want to attach user-authored text "notes". For instance, an entry in a high level table named jobs may have several notes written by the user about it, but so might a lower level entry in sub_projects. Since these notes would all be of the same format, I'm wondering if I could simplify things by having only one notes table rather than a series of tables like job_notes or project_notes, and then use multiple many-to-many relationships to link it to several other tables at once.
If this isn't a deeply flawed idea from the get go (let me know if it is!), I'm wondering what the best way to do this might be. As I see it, I could do it in two ways:
Have a many-to-many junction table for each larger category, like job_notes_mapping and project_notes_mapping, and manage the MtM relationships individually
Have a single junction table linked to either an enum or separate table for table_type, which specifies what table the MtM relationship is mapping to:
+-------------+-------------+---------------+
| note_id | table_id | table_type_id |
+-------------+-------------+---------------+
| 1 | 1 | jobs |
| 2 | 2 | jobs |
| 3 | 1 | project |
| 4 | 2 | subproject |
| ........... | ........... | ........ |
+-------------+-------------+---------------+
Forgive me if any of these are completely horrible ideas, but I thought it might be an interesting question at least conceptually.
The ideal way, IMO, would be to have a supertype of jobs, projects and subprojects - let's call it activities - on which you could define any common fact types.
For example (I'm assuming jobs, projects and subprojects form a containment hierarchy):
activities (activity PK, activity_name, begin_date, ...)
jobs (job_activity PK/FK, ...)
projects (project_activity PK/FK, job_activity FK, ...)
subprojects (subproject_activity PK/FK, project_activity FK, ...)
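A minimal sketch of that supertype layout, assuming SQLite via Python's sqlite3 module; column names follow the outline above where it gives them, while the notes table, its columns, the notes_for helper and the sample rows are hypothetical. Because jobs, projects and subprojects all draw their keys from activities, a single notes table can carry a real foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Supertype: every job, project and subproject is first an activity.
CREATE TABLE activities (
    activity INTEGER PRIMARY KEY,
    activity_name TEXT NOT NULL
);
CREATE TABLE jobs (
    job_activity INTEGER PRIMARY KEY REFERENCES activities(activity)
);
CREATE TABLE projects (
    project_activity INTEGER PRIMARY KEY REFERENCES activities(activity),
    job_activity INTEGER NOT NULL REFERENCES jobs(job_activity)
);
CREATE TABLE subprojects (
    subproject_activity INTEGER PRIMARY KEY REFERENCES activities(activity),
    project_activity INTEGER NOT NULL REFERENCES projects(project_activity)
);
-- One notes table, with a real FK to the supertype.
CREATE TABLE notes (
    note_id INTEGER PRIMARY KEY,
    activity INTEGER NOT NULL REFERENCES activities(activity),
    body TEXT NOT NULL
);
INSERT INTO activities VALUES (1, 'Job A'), (2, 'Project A1'), (3, 'Subproject A1a');
INSERT INTO jobs VALUES (1);
INSERT INTO projects VALUES (2, 1);
INSERT INTO subprojects VALUES (3, 2);
INSERT INTO notes (activity, body) VALUES (1, 'note on the job'), (3, 'note on the subproject');
""")

def notes_for(activity_id):
    """All notes attached to one job, project or subproject."""
    return [body for (body,) in conn.execute(
        "SELECT body FROM notes WHERE activity = ? ORDER BY note_id", (activity_id,))]

try:
    conn.execute("INSERT INTO notes (activity, body) VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(notes_for(3), orphan_rejected)  # ['note on the subproject'] True
```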
Unfortunately, most database schemas define unique auto-incrementing identifiers PER TABLE, which makes it very difficult to implement supertyping after data has been loaded. PostgreSQL allows sequences to be reused across tables, which is great, while some other DBMSs (like MySQL) don't make it easy at all.
My second choice would be your option 1, since it allows foreign key constraints to be defined. I don't like option 2 at all.
Unfortunately, we have ended up going with the ugliest answer to this, which is to have a notes table for every different type of entry - job_notes, project_notes, and subproject_notes. Our reasons for this were as follows:
A single junction table with a column containing the "type" of junction has poor performance since none of the foreign keys are "real" and must be manually searched. This is compounded by the fact that the Notes field contains a lot of text per entry.
A junction table per entry adds an additional table over simply having separate notes tables for every table type, and while it seems slightly prettier, it does not create substantial performance gains.
I'm not satisfied with this answer, because it seems so wasteful to effectively be duplicating the same Notes table for every job/project/subproject table that is being described. However, we haven't been able to come up with an answer that would hold up performance wise in the long term. I'll leave this open in case anyone has better recommendations for how to do this!
I've created a database with three tables in it:
Restaurant
restaurant_id (autoincrement, PK)
Owner
owner_id (autoincrement, PK)
restaurant_id (FK to Restaurant)
Deal
deal_id (autoincrement)
owner_id (FK to Owner)
restaurant_id (FK to Restaurant)
(PK: deal_id, owner_id, restaurant_id)
There can be many owners for each restaurant. I chose two foreign keys for Deal so I can reference the deal by either the owner or the restaurant. The deal table would have three primary keys, two being foreign keys. And it would have two one-to-many relationships pointing to it. All of my foreign keys are primary keys and I don't know if I'll regret doing it like this later on down the road. Does this design make sense, and seem good for what I'm trying to achieve?
Edit: What I really need to accomplish here is this: when an owner is logged in and viewing their account, I want them to be able to see and edit all the deals associated with their restaurant. Because there can be more than one owner per restaurant, I need to be able to run a query like select * from deals where restaurant_id = <my restaurant's id>. In other words, if I'm an owner and I'm logged in, I need a query that gets all of the deals related not just to me, the owner, but to all of the owners associated with this restaurant.
You're having some trouble with terminology.
A table can only ever have one primary key. It is not possible to create a table with two different primary keys. You can create a table with two different unique indexes (which are much like a primary key), but only one primary key can exist.
What you're asking about is whether you should have a composite or compound primary key; a primary key using more than one column.
Your design is okay, but as written you probably have no need for the column deal_id. It seems to me that restaurant_id and owner_id together are enough to uniquely identify a row in Deal. (This may not be true if one owner can have two different ownership stakes in a single restaurant as the result of recapitalization or buying out another owner, but you don't mention anything like that in your problem statement).
In this case, deal_id is largely wasted storage. There might be an argument to be made for using the deal_id column if you have many tables that have foreign keys pointing to Deal, or if you have instances in which you want to display to the user Deals for multiple restaurants and owners at the same time.
If one of those arguments sways you to adopt the deal_id column, then it, and only it, should be the primary key. There would be nothing added by including the other two columns since the autoincrement value itself would be unique.
If you have a unique field, it should be the PK; here that would be the autoincrement field.
In this specific case, adding more fields to this key gives you nothing at all, and it actually somewhat impacts performance (don't ask me how much; benchmark it).
If you want to create 2 foreign keys in the Deal table, one to the restaurant and one to the owner, the logic is that a restaurant could exist in a deal even without an owner, or an owner could exist in a deal without the restaurant being identified on it. You could still identify the restaurant, though, because it is used as a foreign key in the Owner table. If you are going to put values in every column you defined as a foreign key, then I think it's going to be redundant. I'm not sure how you will use the Deal table later on, but judging by its name it sounds like it would be used to identify whether a restaurant table has been reserved by a customer. Given how you have designed your database, you could already identify the table they reserved without specifying it as a foreign key in the Deal table, because the Owner table already carries it as a foreign key. You just have to be careful when defining relationships between your tables and avoid redundancy as much as possible. :)
I don't think it is the best design.
First of all, the Deal table PK should be the deal_id. There is no reason to add additional columns to it, and if you did want to refer to the deal_id from another table, you'd have to include the restaurant_id and owner_id too, which is not good. Whether deal_id should also be the clustered index (a.k.a. the table being index-organized on this column) depends on the data access pattern. Will lookups most often be by deal_id, or will you primarily be looking deals up by owner_id or restaurant_id?
Also, using two separate FKs the way you have described (as far as I can tell!) would allow a deal to have an owner and restaurant combination that is not valid (an owner combined with a restaurant they do not belong to). In the Deal table, instead of one FK to Owner and one FK to Restaurant, if you must have both columns, there should be a single composite FK to the Owner table on (OwnerID, RestaurantID), with a corresponding unique key in the Owner table to make this link possible.
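A sketch of the composite foreign key, assuming SQLite via Python's sqlite3 module and lowercase column names (owner_id, restaurant_id) in place of OwnerID/RestaurantID; the add_deal helper and the rows are hypothetical. SQLite requires the referenced column pair to be covered by a UNIQUE constraint, which plays the role of the "corresponding unique key" in Owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE restaurant (restaurant_id INTEGER PRIMARY KEY);
CREATE TABLE owner (
    owner_id      INTEGER PRIMARY KEY,
    restaurant_id INTEGER NOT NULL REFERENCES restaurant(restaurant_id),
    UNIQUE (owner_id, restaurant_id)  -- target for the composite FK below
);
CREATE TABLE deal (
    deal_id       INTEGER PRIMARY KEY,
    owner_id      INTEGER NOT NULL,
    restaurant_id INTEGER NOT NULL,
    -- One composite FK instead of two independent ones:
    FOREIGN KEY (owner_id, restaurant_id) REFERENCES owner(owner_id, restaurant_id)
);
INSERT INTO restaurant VALUES (1), (2);
INSERT INTO owner VALUES (10, 1), (20, 2);
""")

def add_deal(owner_id, restaurant_id):
    """Try to insert a deal; the composite FK rejects mismatched pairs."""
    try:
        conn.execute("INSERT INTO deal (owner_id, restaurant_id) VALUES (?, ?)",
                     (owner_id, restaurant_id))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

print(add_deal(10, 1))  # accepted: owner 10 belongs to restaurant 1
print(add_deal(10, 2))  # rejected: owner 10 does not belong to restaurant 2
```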
However, with such a simple table structure I don't really see the problem in leaving RestaurantID out of the Deal table, since the OwnerID always fully implies the RestaurantID. Obviously your deals cannot be linked only with the restaurant, because that would imply a 1:M relationship on Deal:Owner. The cost of searching based on Restaurant through the Owner table shouldn't really be that bad.
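Searching through Owner could look like the sketch below, which also covers the asker's edit (a logged-in owner seeing the deals of every co-owner of the same restaurant). It assumes SQLite via Python's sqlite3 module, a Deal table without restaurant_id, and invented rows; the function and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE restaurant (restaurant_id INTEGER PRIMARY KEY);
CREATE TABLE owner (
    owner_id      INTEGER PRIMARY KEY,
    restaurant_id INTEGER NOT NULL REFERENCES restaurant(restaurant_id)
);
CREATE TABLE deal (
    deal_id     INTEGER PRIMARY KEY,
    owner_id    INTEGER NOT NULL REFERENCES owner(owner_id),
    description TEXT NOT NULL
);
CREATE INDEX idx_owner_restaurant ON owner(restaurant_id);
CREATE INDEX idx_deal_owner ON deal(owner_id);

INSERT INTO restaurant VALUES (1), (2);
INSERT INTO owner VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO deal VALUES (1, 10, 'Two-for-one burgers'), (2, 11, 'Free soda'), (3, 20, 'Happy hour');
""")

def deals_for_logged_in_owner(owner_id):
    """All deals of every owner who shares this owner's restaurant."""
    rows = conn.execute("""
        SELECT d.deal_id
        FROM owner AS me
        JOIN owner AS co ON co.restaurant_id = me.restaurant_id
        JOIN deal  AS d  ON d.owner_id = co.owner_id
        WHERE me.owner_id = ?
        ORDER BY d.deal_id
    """, (owner_id,))
    return [deal_id for (deal_id,) in rows]

print(deals_for_logged_in_owner(10))  # [1, 2]
```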
It's not wrong; it works. But it's not recommended.
Autoincrement primary keys work without foreign keys (or master keys).
In some databases, you cannot use several fields as a single primary key.
Compound (composite) primary keys are more difficult to handle in a query.
Compound Primary Key Query Example:
SELECT
D.*
FROM
Restaurant AS R,
Owner AS O,
Deal AS D
WHERE
(1=1) AND
(D.RestaurantKey = R.RestaurantKey) AND
(D.OwnerKey = O.OwnerKey)
Versus
Single Primary Key Query Example:
SELECT
D.*
FROM
Owner AS O,
Deal AS D
WHERE
(D.OwnerKey = O.OwnerKey)
Sometimes you have to change a record's foreign key so that it points to another record. For example, your customers have already ordered, the deal record is registered, and then they decide to change from one restaurant table to another. The data must then be updated in both the "Owner" and "Deal" tables.
+-----------+-------------+
| OwnerKey | OwnerName |
+-----------+-------------+
| 1 | Anne Smith |
+-----------+-------------+
| 2 | John Connor |
+-----------+-------------+
| 3 | Mike Doe |
+-----------+-------------+
+-----------+-------------+-------------+
| OwnerKey | DealKey | Food |
+-----------+-------------+-------------+
| 1 | 1 | Hamburger |
+-----------+-------------+-------------+
| 2 | 2 | Hot-Dog |
+-----------+-------------+-------------+
| 3 | 3 | Hamburger |
+-----------+-------------+-------------+
| 1 | 3 | Soda |
+-----------+-------------+-------------+
| 2 | 1 | Apple Pie |
+-----------+-------------+-------------+
| 3 | 3 | Chips |
+-----------+-------------+-------------+
If you use compound primary keys, you have to create a new record for "Owner", and new records for "Deals", copy the other fields, and delete the previous records.
If you use single keys, you just have to change the foreign key of the table, without inserting or deleting records.
Cheers.