If I have two tables, beer and distributor, each with its own primary key, and a third table called beer_distributor that holds the foreign keys:
Is it adequate to add a new field (a primary key) to this table? The other way is with joins, correct? For example, to obtain DUVEL together with its distributor De Vrolijke Drinker?
You've definitely got the right idea. Your beer_distributor table is what's known as a junction table. JOINs and keys/indexes are used together. The database system uses keys to make JOINs work quickly and efficiently. You use this junction table by JOINing both beer and distributor tables to it.
And, your junction table should have a primary key that spans both columns (a multiple-column index / "composite index"), which it looks like it does if I understand that diagram correctly. In that case, it looks good to me. Nicely done.
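A minimal sketch of that query (assuming beer(id, name), distributor(id, name), and beer_distributor(beer_id, distributor_id); the actual column names may differ):
SELECT b.name AS beer, d.name AS distributor
FROM beer AS b
JOIN beer_distributor AS bd ON bd.beer_id = b.id
JOIN distributor AS d ON d.id = bd.distributor_id
WHERE b.name = 'DUVEL';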
I would put a dedicated primary key column in the join table beer_distributor, rather than a dual primary key made of the two foreign keys. IMO, it makes life easier when maintaining the relationship.
UPDATE
To emphasize this point, consider having to change the distributor AC009 for beer 163. With the dual primary key, you'd have to delete and reinsert the row, OR know both existing values to update the record. With a separate primary key, you'd simply update the record using that value. This comes in handy when building applications on top of this data. If this is strictly a data warehouse, then a dual primary key might make more sense from the DBA perspective.
UPDATE beer_distributor SET distributor_id = 'XXXXX' WHERE beer_id = 163 AND distributor_id = 'AC009';
versus
UPDATE beer_distributor SET distributor_id = 'XXXXX' WHERE id = 1234;
Related
I need the advice of someone who has greater experience.
I have an associative entity in my database, like this:
Table2-> CustomerID, ServiceID, DateSub
Since the same customer (with PK, for example, 1111) can request the same service (with PK, for example, 3) more than once, but never on the same date, the composite PK of Table2 can't be just (CustomerID, ServiceID).
Now I have 2 options:
1- Also "DateSub" will be a primary key, so the PK of table 2 will be (CustomerID, ServiceID, DateSub)
2- Create a specific PK for the associative entity (for example, Table2ID, and so CustomerID and Service ID will be FK)
Which of the 2 approach would you follow and why? Thank you
First of all, you need to decide whether your requirement is that the combination of the CustomerID, ServiceID, and DateSub columns be unique. If so, then you should go for the first option.
Otherwise I would go for the second option.
With the first option, if DateSub is of the DATE data type, you will not be able to insert the same service for a customer twice on the same day. If it's DATETIME, then it's doable, though.
If you want to reference this composite primary key from any other table as a foreign key, then you need to carry all three columns there too.
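As a minimal sketch of both options (assuming MySQL/InnoDB, the column names from the question, and that Customer and Service parent tables exist; the type choices are illustrative):
-- Option 1: natural, composite primary key
CREATE TABLE Table2_option1 (
    CustomerID INT NOT NULL,
    ServiceID  INT NOT NULL,
    DateSub    DATE NOT NULL,
    PRIMARY KEY (CustomerID, ServiceID, DateSub),
    FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID),
    FOREIGN KEY (ServiceID)  REFERENCES Service(ServiceID)
) ENGINE=InnoDB;
-- Option 2: surrogate primary key, with the natural combination kept unique
CREATE TABLE Table2_option2 (
    Table2ID   INT AUTO_INCREMENT PRIMARY KEY,
    CustomerID INT NOT NULL,
    ServiceID  INT NOT NULL,
    DateSub    DATE NOT NULL,
    UNIQUE KEY (CustomerID, ServiceID, DateSub),
    FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID),
    FOREIGN KEY (ServiceID)  REFERENCES Service(ServiceID)
) ENGINE=InnoDB;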
I tend to prefer the PK be "natural". You have 3 columns that, together, can uniquely define each row. I would consider using that combination.
The next question is what order to put the 3 columns in. This depends on the common queries. Please provide them.
An index (including the PK) is used only leftmost first. It may be desirable to have some secondary key(s), for efficient access to other columns. Again, let's see the queries.
If you have a lot of secondary indexes, it may be better to have a surrogate, AUTO_INCREMENT "id" as the PK. Again, let's see the queries.
If you ever use a date range, then it is probably best to have DateSub last in any index. (There are rare exceptions.)
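To illustrate why (a sketch, assuming the common query pins CustomerID and ServiceID and ranges over DateSub; the dates are made up):
-- With PRIMARY KEY (CustomerID, ServiceID, DateSub), the first two columns are
-- pinned by equality, so the range on DateSub (the last key part) still uses the index:
SELECT DateSub
FROM Table2
WHERE CustomerID = 1111
  AND ServiceID = 3
  AND DateSub BETWEEN '2024-01-01' AND '2024-06-30';
-- If DateSub came earlier in the index, the key parts after it could not be used for this query.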
How many rows in the table?
The table is ENGINE=InnoDB, correct?
Reminder: The PRIMARY KEY is a Unique key, which is an INDEX.
DateSub is of datatype DATE, correct?
Let's say we have quite a few tables (T1, T2, ... T50), and we would like to have n-to-n relations between all of them.
What would be a proper way of implementing that?
Having a relations table for each pair of Tx and Ty would not be practical if the number of tables goes up to 100 or more.
The current solution I have is
relationships_table
id_x, table_name_x, id_y, table_name_y
for storing all the relationships. This way adding new tables is trivial, but what are the disadvantages?
1) What is a better way of supporting such a use case, if we're limited to sql?
2) How to efficiently solve this if we're not limited to sql?
The solution you proposed is the most reasonable solution to the stated problem. But the problem seems somewhat unreasonable.
If you need a graph, then you only need two tables, one for the nodes and another one for the edges.
If some nodes are of specific types then you can have extra specialization tables for them.
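A minimal sketch of that graph model (MySQL syntax assumed; the table and column names are illustrative):
CREATE TABLE node (
    node_id   INT AUTO_INCREMENT PRIMARY KEY,
    node_type VARCHAR(50) NOT NULL    -- e.g. which of T1 ... T50 this node represents
) ENGINE=InnoDB;
CREATE TABLE edge (
    from_node_id INT NOT NULL,
    to_node_id   INT NOT NULL,
    PRIMARY KEY (from_node_id, to_node_id),
    FOREIGN KEY (from_node_id) REFERENCES node(node_id),
    FOREIGN KEY (to_node_id)   REFERENCES node(node_id)
) ENGINE=InnoDB;
-- An optional specialization table carries the type-specific columns and shares the node's id:
CREATE TABLE t1_detail (
    node_id INT PRIMARY KEY,
    -- type-specific columns of T1 would go here
    FOREIGN KEY (node_id) REFERENCES node(node_id)
) ENGINE=InnoDB;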
Add only the essential Relation tables. tblA relates to tblB, and tblB relates to tblC. So, usually that implies that you can get from A to C via
FROM tblA
JOIN tblB ON ...
JOIN tblC ON ...
Won't this do? And need not much more than 50 extra tables? And be a lot cleaner?
I ran into the same problem and took a slightly different approach. I added a table called relationable, storing only an id, and all tables appearing in the graph have a reference to this table. I make sure on my own that only one element references a relationable entry in the whole database (this is actually what bothers me the most, but in practice it is not such a problem, it just doesn't look nice). Then there is a relation table for the n-to-n relationship between relationable entries.
To make my point, I'll add an example I made in MySQL.
CREATE TABLE relationable
(
relationable_id INT AUTO_INCREMENT PRIMARY KEY
) ENGINE=INNODB;
In the relation table I added a name, because my edges have a name; there might even be multiple edges between two nodes, with different names.
CREATE TABLE relation
(
from_id INT NOT NULL,
to_id INT NOT NULL,
name VARCHAR(255) NOT NULL,
FOREIGN KEY (from_id) REFERENCES relationable(relationable_id) ON DELETE CASCADE,
FOREIGN KEY (to_id) REFERENCES relationable(relationable_id) ON DELETE CASCADE
) ENGINE=INNODB;
Finally, a table which appears in the graph would look like the following:
CREATE TABLE place
(
place_id INT NOT NULL PRIMARY KEY,
name VARCHAR(255),
FOREIGN KEY (place_id) REFERENCES relationable(relationable_id)
ON DELETE CASCADE
) ENGINE=INNODB;
Now obviously this has pros and cons,
cons
You need to make sure yourself that a relationable entry is only referenced once. Inside one table this is taken care of by the PRIMARY KEY, but across all tables it is not.
You might need a large integer type (e.g. BIGINT) for the id of relationable.
The table relation might get quite big.
pros
To erase an entry and all its relations, deleting the relationable entry suffices; all entries in relation and in the respective table will be deleted.
When joining two tables there is no need for the relationable table.
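A short usage sketch under this scheme (the place names, the edge name, and the use of session variables to carry the generated ids are just illustrative):
-- Allocate a relationable id, then reuse it as the place's id
INSERT INTO relationable (relationable_id) VALUES (NULL);
SET @brussels := LAST_INSERT_ID();
INSERT INTO place (place_id, name) VALUES (@brussels, 'Brussels');
INSERT INTO relationable (relationable_id) VALUES (NULL);
SET @antwerp := LAST_INSERT_ID();
INSERT INTO place (place_id, name) VALUES (@antwerp, 'Antwerp');
-- Link the two nodes with a named edge
INSERT INTO relation (from_id, to_id, name) VALUES (@brussels, @antwerp, 'twinned_with');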
I am in a situation where I have to store key -> value pairs in a table which records users who have voted for certain products.
UserId ProductID
1 2345
1 1786
6 657
2 1254
1 2187
As you can see, userId keeps repeating, and so can productId. I wanted to know what the best way to represent this data would be. Also, is there a need for a primary key here? I've searched a lot but am not able to find anything that exactly matches my problem. Any help would be appreciated. Thank you.
If you want to enforce that a given user can vote for a given product at most once, create a unique constraint over both columns:
ALTER TABLE mytable ADD UNIQUE INDEX (UserId, ProductID);
Although you can use these two columns together as a key, your app code is often simpler if you define a separate, typically auto-increment, key column; but the decision to do this depends on which app code language/library you use.
If you have any tables that hold a foreign key reference to this table, and you intend to use referential integrity, those tables and the SQL used to define the relationship will also be simpler if you create a separate key column - otherwise you end up carting multiple columns around instead of just one.
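A sketch of the full table definition combining both suggestions (reusing the mytable name from the statement above; the column types and the optional surrogate id are assumptions):
CREATE TABLE mytable (
    id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,   -- optional surrogate key
    UserId    INT UNSIGNED NOT NULL,
    ProductID INT UNSIGNED NOT NULL,
    UNIQUE KEY (UserId, ProductID)                       -- at most one vote per user per product
) ENGINE=InnoDB;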
I'm designing a db table that will save a list of a user's favorited food items.
I created favorite table with the following schema
id, user_id, food_id
user_id and food_id will be foreign keys linking to other tables.
I'm just wondering if this is efficient and scalable, because if a user has multiple favorite things then it would need multiple rows of data.
i.e. if a user has 5 favorited food items, then it will take five rows to save the list for that user.
Is this efficient and scalable? What's the best way to optimize this schema?
Thanks in advance!
tldr; This is called a "join table" and is the correct and scalable approach to model M-M relationships in a relational database. (Depending upon the constraints used it can also model 1-M/1-1 relationships in a "no NULL FK" schema.)
However, I contend that the id column should be omitted here so that the table is only user_id, food_id. The PK will be (user_id, food_id) in this case.
Unlike other tables, where surrogate (aka auto-increment) PKs are sometimes argued for, a surrogate PK generally only adds clutter in a join table as it has a very natural compound PK.
While the PK itself is compound in this case, each "joined" table only relates back by part of the PK. Depending upon queries performed it might also be beneficial to add covering indices on food_id or (food_id, user_id).
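A minimal sketch of that layout (names follow the question; the referenced users and food tables, and their id columns, are assumed):
CREATE TABLE favorite (
    user_id INT UNSIGNED NOT NULL,
    food_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, food_id),
    KEY (food_id),                               -- helps "which users favorited this food" queries
    FOREIGN KEY (user_id) REFERENCES users(id),
    FOREIGN KEY (food_id) REFERENCES food(id)
) ENGINE=InnoDB;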
Eliminate Surrogate Key: Unless you have a specific reason for the surrogate key id, exclude it from the table.
Fine-tune Indexing: At this point, you just have a composite primary key that is the combination of the two foreign keys. In which order should the PK fields be?
If your application(s) predominantly execute queries such as: "for given user, give me foods", then PK should be {user_id, food_id}.
If the predominant query is "for given food, give me users", then the PK should be {food_id, user_id}.
If both query "directions" are common, add a UNIQUE INDEX that has the same fields as PK, but in opposite directions. So you'll have PK on {user_id, food_id} and index on {food_id, user_id}.
Note that InnoDB tables are clustered, which eliminates (in this case "unnecessary") table heap. Yet, the secondary index discussed above will not cause a double-lookup (since it fully covers the query), nor have a hidden overhead of PK fields (since it indexes the same fields as PK, just in opposite order).
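For that "both directions" case, the extra index could be added like this (assuming the favorite table from the question, with PRIMARY KEY (user_id, food_id)):
ALTER TABLE favorite ADD UNIQUE INDEX (food_id, user_id);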
For more on designing a junction table, take a look at this post.
In my opinion, you can optimize your table in the following ways:
As a relation table with 2 foreign keys, you don't have to use an "id" field.
Use the InnoDB engine for your table.
Name your relation table "user_2_food", which will make it clearer.
Try to use datatypes as small as possible, e.g. "smallint" is better than "int", and don't forget the "UNSIGNED" attribute.
Creating the three tables below will result in an efficient design.
users : userId, username, userdesc
foods : foodId, foodname, fooddesc
userfoodmapping : ufid, userid, foodid, rowstate
The significance of rowstate is that if the user stops liking that food in the future, its state will become -1.
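A sketch of that mapping table (the column types, and rowstate defaulting to 1 for an active favorite, are assumptions):
CREATE TABLE userfoodmapping (
    ufid     INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    userid   INT UNSIGNED NOT NULL,
    foodid   INT UNSIGNED NOT NULL,
    rowstate TINYINT NOT NULL DEFAULT 1     -- set to -1 when the user no longer likes the food
) ENGINE=InnoDB;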
You have 2 options, in my opinion:
Get rid of the ID field, but in that case, make both your other keys (combined) your primary key
Keep your ID key as the primary key for your table.
In either case, I think this is a proper approach. If you ever run into an inefficiency problem, then you can look at loading only part of the table, or other techniques. This would do for now.
I am aware of the fact that if I use the ORM of Django, every table has to have a primary key column. So if you have a many_to_many table which links two tables (let's call them authors and books), you would get something like:
id author_id book_id
1 1 1
2 1 2
3 2 3
etc.
I have encountered a book in which it is proposed to avoid the "id" column and to create a compound primary key instead. Does this work with Django?
You could create a compound primary key on your through table (with ALTER TABLE). You could also drop the id column from the table. None of this would harm Django in any way, since the way ManyToMany fields work in the backend, they wouldn't use the id column anyway.
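A sketch of those two changes in SQL (the through-table name appname_book_authors is hypothetical; Django derives the real name from your app and model names):
ALTER TABLE appname_book_authors
    DROP COLUMN id,
    ADD PRIMARY KEY (author_id, book_id);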
However, you should note that getting a compound PK to work in Django is basically a non-starter. This shouldn't be an issue for you, as no table should have a ForeignKey to your through table (for any reason that I can think of, at least).
So in summary: compound primary keys don't work with Django. If you ever need to have a ForeignKey to a table with a compound PK, you are basically SOL. Finally, there is no real pro to using a compound PK here, but no real con either (in this one and only case). So why are you spending your time worrying about this?