Understanding Weak Entities and Weak Relationships - mysql

If I have the following ERD:
  -------
  | Inv |
  -------
     1
     |
<<contains>>
     |
     m
 ----------
 ----------                         --------
 || Line || 1 --- <<has a>> --- 1   | prod |
 ----------                         --------
 ----------
Where Line is a weak entity, and contains and has a are weak relationships, what would the primary key for Line look like?
I've been looking online and I'd like to think it would be a composite primary key consisting of:
PK = (ID from line, Primary Key from Inv, Primary Key from Prod)
Can anyone help me out? Am I right? Where'd I go wrong? etc.

I've been looking online and I'd like to think it would be a composite primary key consisting of:
PK = (ID from line, Primary Key from Inv, Primary Key from Prod)
No, the primary key from Inv and the line item number are sufficient to identify a row in the table "Line". If you want to implement a further business requirement--that each product can appear only once per invoice--you can make an additional unique constraint on the pair of columns {value from Inv, value from Prod}.
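A minimal sketch of that key layout, using Python's built-in sqlite3 as a stand-in for MySQL so it can be run anywhere. The table and column names (inv_no, line_no, prod_no) are assumptions made up for the example; they are not from the question.

```python
import sqlite3

# SQLite stands in for MySQL here; all names below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inv  (inv_no  INTEGER PRIMARY KEY);
    CREATE TABLE prod (prod_no INTEGER PRIMARY KEY);
    CREATE TABLE line (
        inv_no  INTEGER NOT NULL REFERENCES inv (inv_no),
        line_no INTEGER NOT NULL,
        prod_no INTEGER NOT NULL REFERENCES prod (prod_no),
        PRIMARY KEY (inv_no, line_no),  -- owner key + line number identifies the weak entity
        UNIQUE (inv_no, prod_no)        -- optional rule: a product appears once per invoice
    );
    INSERT INTO inv  VALUES (1), (2);
    INSERT INTO prod VALUES (10), (20);
    INSERT INTO line VALUES (1, 1, 10), (1, 2, 20), (2, 1, 10);
""")

def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO line VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

same_line_number = try_insert((1, 2, 10))  # invoice 1 already has a line 2
repeated_product = try_insert((1, 3, 20))  # new line number, but product 20 is already on invoice 1
new_line = try_insert((2, 2, 20))          # new line on invoice 2 with a product not yet on it
```

Note that line number 1 appears on both invoices without conflict, which is why the product key is not needed in the primary key.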
As a practical matter, I wouldn't use autoincrementing id numbers in "Inv" or in "Line". Autoincrementing id numbers can leave gaps, and accountants hate gaps. By extension, database people hate gaps in these kinds of numbers, too. (We're the ones who get blamed for "missing" rows.)
You need to be careful about storing the id number for a product, too. If the product name changes, it will appear to change on all past invoices. That's a good way to get on the bad side of a judge in court.

Related

MySQL two-column table as primary key

I have an extremely simple idea: a table that keeps user "achievements". It is as simple as this:
user_id | achievement_id
1 | 1
1 | 2
1 | 5
2 | 2
2 | 3
All I need is the user id and the id of each achievement the user already has. All I need to SELECT is SELECT achievement_id WHERE user_id=x. So there is no need for an artificial autoincrement column that I'll never use or know the contents of. But setting a primary key is required, so the question is: is it a good idea to make such a 2-column table and set both columns as a multi-column primary key? I already have a set of 3-column tables where 2 columns are the primary key, because it is logical... Well, logical for me, but for the database?
These types of tables are common for n-n relationships, multivalued attributes, and weak entities. The right choice depends a lot on your model, but yes, it is a good solution in some cases. The primary key is usually the combination of the columns. In your case it would be (user_id, achievement_id).
Yes, since the rule for such a set of n keys is: "I only want one record with this set (a, b) of keys".
-> therefore you won't be able to add "Mario, achievement1" twice.
The primary key will then be (PlayerID, AchievementID).
If you want to add some information about the achievement (for example, when the player got it), simply use (PlayerID, AchievementID, Date) with (PlayerID, AchievementID) as the primary key.
I hope this will help you.
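Here is a rough sketch of the two-column key plus one extra attribute, run through Python's sqlite3 module as a stand-in for MySQL. The table name user_achievement, the earned_on column, and the dates are invented for illustration; the id pairs are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_achievement (
        user_id        INTEGER NOT NULL,
        achievement_id INTEGER NOT NULL,
        earned_on      TEXT,                    -- hypothetical extra attribute
        PRIMARY KEY (user_id, achievement_id)   -- no surrogate id column needed
    )
""")
rows = [(1, 1, "2024-01-05"), (1, 2, "2024-01-09"), (1, 5, "2024-02-01"),
        (2, 2, "2024-01-07"), (2, 3, "2024-03-12")]  # dates are made up
conn.executemany("INSERT INTO user_achievement VALUES (?, ?, ?)", rows)

# The lookup from the question: every achievement one user already has.
earned = [r[0] for r in conn.execute(
    "SELECT achievement_id FROM user_achievement "
    "WHERE user_id = ? ORDER BY achievement_id", (1,))]

# Awarding the same achievement twice is rejected by the composite key.
try:
    conn.execute("INSERT INTO user_achievement VALUES (1, 1, '2024-04-01')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Because user_id is the leading column of the key, the WHERE user_id = x lookup can use the primary key index directly.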

Database Smell - Improve current design with multiple tables

I am in the process of creating a second version of my technical wiki site and one of the things I want to improve is the database design. The problem (or so I think) is that to display each document, I need to join upwards of 15 tables. I have a bunch of lookup tables that contain descriptive data associated with each wiki entry such as programmer used, cpu, tags, peripherals, PCB layout software, difficulty level, etc.
Here is an example of the layout:
doc
--------------
id | author_id | doc_type_id .....
1 | 8 | 1
2 | 11 | 3
3 | 13 | 3
_
lookup_programmer
--------------
doc_id | programmer_id
1 | 1
1 | 3
2 | 2
_
programmer
--------------
programmer_id | programmer
1 | USBtinyISP
2 | PICkit
3 | .....
Since some doc IDs may have multiple entries for a single attribute (such as programmer), I have designed the DB to accommodate this. The other 10 attributes have a similar layout to the 2 programmer tables above. To display a single document article, approx 20 tables are joined.
I used the Sphinx Search engine for finding articles with certain characteristics. Essentially Sphinx indexes all of the data (does not store it) and returns the wiki doc ID of interest based on the filters presented. If I want to find articles that use a certain programmer and then sort by date, MySQL has to first join ALL documents with the 2 programmer tables, then filter, and finally sort the remaining rows by insert time. No index can help with ordering the filtered results (it takes a LONG time with 150k doc IDs) since the sort is done in a temporary table. As you can imagine, it gets worse really quickly as more parameters need to be filtered.
It is because I have to rely on Sphinx to return, say, all wiki entries that use a certain CPU AND programmer that I believe there is a DB smell in my current setup....
edit: Looks like I have implemented an Entity–attribute–value model.
I don't see anything here that suggests you've implemented EAV. Instead, it looks like you've assigned every row in every table an ID number. That's a guaranteed way to increase the number of joins, and it has nothing to do with normalization. (There is no "I've now added an id number" normal form.)
Pick one lookup table. (I'll use "programmer" in my example.) Don't build it like this.
create table programmer (
  programmer_id integer not null,
  programmer varchar(20) not null,
  -- declare the primary key once; repeating it inline on the column is an error
  primary key (programmer_id),
  unique key (programmer)
);
Instead, build it like this.
create table programmer (
  programmer varchar(20) not null,
  primary key (programmer)
);
And in the tables that reference it, consider cascading updates and deletes.
create table lookup_programmer (
  doc_id integer not null,
  programmer varchar(20) not null,
  primary key (doc_id, programmer),
  foreign key (doc_id) references doc (id)
    on delete cascade,
  foreign key (programmer) references programmer (programmer)
    on update cascade on delete cascade
);
What have you gained? You keep all the data integrity that foreign key references give you, your rows are more readable, and you've eliminated a join. Build all your "lookup" tables that way, and you eliminate one join per lookup table. (And unless you have many millions of rows, you're not likely to see any degradation in performance.)
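To make the "one join fewer" point concrete, here is a small sketch using Python's sqlite3 as a stand-in for MySQL. It assumes a cut-down doc table with only two columns, and the rename to 'PICkit 2' is a hypothetical change used only to show the cascade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch; InnoDB enforces FKs without it
conn.executescript("""
    CREATE TABLE doc (id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL);
    CREATE TABLE programmer (programmer VARCHAR(20) NOT NULL PRIMARY KEY);
    CREATE TABLE lookup_programmer (
        doc_id     INTEGER     NOT NULL REFERENCES doc (id) ON DELETE CASCADE,
        programmer VARCHAR(20) NOT NULL REFERENCES programmer (programmer)
                   ON UPDATE CASCADE ON DELETE CASCADE,
        PRIMARY KEY (doc_id, programmer)
    );
    INSERT INTO doc VALUES (1, 8), (2, 11), (3, 13);
    INSERT INTO programmer VALUES ('USBtinyISP'), ('PICkit');
    INSERT INTO lookup_programmer VALUES (1, 'USBtinyISP'), (2, 'PICkit');
""")

# Filtering by programmer name needs no join to the programmer table,
# because the readable value is stored in the linking table itself.
docs_using_pickit = [row[0] for row in conn.execute(
    "SELECT doc_id FROM lookup_programmer WHERE programmer = ?", ("PICkit",))]

# A rename in the lookup table propagates through ON UPDATE CASCADE.
conn.execute("UPDATE programmer SET programmer = 'PICkit 2' WHERE programmer = 'PICkit'")
after_rename = conn.execute(
    "SELECT doc_id, programmer FROM lookup_programmer WHERE doc_id = 2").fetchall()
```

The trade-off is that the referencing rows are wider and a rename touches every referencing row, which is what the cascade handles for you.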

Does it make sense to have three primary keys, two of which are foreign keys, in one table?

I've created a database with three tables in it:
Restaurant
restaurant_id (autoincrement, PK)
Owner
owner_id (autoincrement, PK)
restaurant_id (FK to Restaurant)
Deal
deal_id (autoincrement)
owner_id (FK to Owner)
restaurant_id (FK to Restaurant)
(PK: deal_id, owner_id, restaurant_id)
There can be many owners for each restaurant. I chose two foreign keys for Deal so I can reference the deal by either the owner or the restaurant. The deal table would have three primary keys, two being foreign keys. And it would have two one-to-many relationships pointing to it. All of my foreign keys are primary keys and I don't know if I'll regret doing it like this later on down the road. Does this design make sense, and seem good for what I'm trying to achieve?
Edit: What I really need to accomplish here is this: when an owner is logged in and viewing their account, I want them to be able to see and edit all the deals that are associated with that particular restaurant. And because there can be more than one owner per restaurant, I need to be able to perform a query something like: select * from deals where restaurant_id = restaurant_id. In other words, if I'm an owner and I'm logged in, I need to be able to query for all of the deals that are related not just to me, the owner, but to all of the owners associated with this restaurant.
You're having some trouble with terminology.
A table can only ever have one primary key. It is not possible to create a table with two different primary keys. You can create a table with two different unique indexes (which are much like a primary key) but only one primary key can exist.
What you're asking about is whether you should have a composite or compound primary key; a primary key using more than one column.
Your design is okay, but as written you probably have no need for the column deal_id. It seems to me that restaurant_id and owner_id together are enough to uniquely identify a row in Deal. (This may not be true if one owner can have two different ownership stakes in a single restaurant as the result of recapitalization or buying out another owner, but you don't mention anything like that in your problem statement).
In this case, deal_id is largely wasted storage. There might be an argument to be made for using the deal_id column if you have many tables that have foreign keys pointing to Deal, or if you have instances in which you want to display to the user Deals for multiple restaurants and owners at the same time.
If one of those arguments sways you to adopt the deal_id column, then it, and only it, should be the primary key. There would be nothing added by including the other two columns since the autoincrement value itself would be unique.
If you have a unique field, it should be the PK; here that would be the autoincremented field.
In this specific case adding more fields to the key gains you nothing at all, and it actually somewhat impacts performance (don't ask me how much, you benchmark it).
If you want to create two foreign keys in the Deal table, one to the restaurant and one to the owner, the logic is something like this: a table could exist in the deal even without an owner, or an owner could exist in the deal even without identifying the table on it. You could still identify the table, because it is being used as a foreign key in the Owner table. But if you are going to put values in every column that you defined as a foreign key, then I think it is going to be redundant. I'm not sure how you would use the Deal table later on, but by its name I think it would be used to identify whether a restaurant table is being reserved by a customer. Given how you have designed your database, you could already identify the table they have reserved even without specifying the table as a foreign key in the Deal table, since the Owner table already uses it as a foreign key. You just have to be wise in defining relationships between your tables and avoid redundancy as much as possible. :)
I think it is not best.
First of all, the Deal table PK should be the deal_id. There is no reason to add additional columns to it--and if you did want to refer to the deal_id in another table, you'd have to include the restaurant_id and owner_id, which is not good. Whether deal_id should also be the clustered index (a.k.a. index organized on this column) depends on the data access pattern. Will your database mostly be looking deals up by deal_id, or will you primarily be looking deals up by owner_id or restaurant_id?
Also, using two separate FKs the way you have described it (as far as I can tell!) would allow a deal to have an owner and restaurant combination that is not valid (combining an owner with a restaurant that owner does not belong to). In the Deal table, instead of one FK to Owner and one FK to Restaurant, if you must have both columns, there should be a composite FK to only the Owner table on (OwnerID, RestaurantID), with a corresponding unique key in the Owner table to allow this link up.
However, with such a simple table structure I don't really see the problem in leaving RestaurantID out of the Deal table, since the OwnerID always fully implies the RestaurantID. Obviously your deals cannot be linked only with the restaurant, because that would imply a 1:M relationship on Deal:Owner. The cost of searching based on Restaurant through the Owner table shouldn't really be that bad.
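The composite-FK idea can be sketched like this, using Python's sqlite3 as a stand-in for MySQL. The lowercase table names and the sample ids are assumptions for the example; the column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific; needed for FK enforcement
conn.executescript("""
    CREATE TABLE restaurant (restaurant_id INTEGER PRIMARY KEY);
    CREATE TABLE owner (
        owner_id      INTEGER PRIMARY KEY,
        restaurant_id INTEGER NOT NULL REFERENCES restaurant (restaurant_id),
        UNIQUE (owner_id, restaurant_id)          -- target of the composite FK below
    );
    CREATE TABLE deal (
        deal_id       INTEGER PRIMARY KEY,        -- the surrogate alone is the PK
        owner_id      INTEGER NOT NULL,
        restaurant_id INTEGER NOT NULL,
        FOREIGN KEY (owner_id, restaurant_id)
            REFERENCES owner (owner_id, restaurant_id)
    );
    INSERT INTO restaurant VALUES (1), (2);
    INSERT INTO owner VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO deal VALUES (100, 10, 1), (101, 11, 1), (102, 20, 2);
""")

# Owner 10 belongs to restaurant 1, so pairing it with restaurant 2 is rejected.
try:
    conn.execute("INSERT INTO deal VALUES (103, 10, 2)")
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

# The query from the question's edit: all deals for a restaurant, across all its owners.
deals_for_restaurant_1 = [r[0] for r in conn.execute(
    "SELECT deal_id FROM deal WHERE restaurant_id = 1 ORDER BY deal_id")]
```

With two independent single-column FKs, the rejected row would have been accepted, since owner 10 and restaurant 2 both exist.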
It's not wrong, it works. But it's not recommended.
Autoincrement primary keys work without foreign keys (or master keys).
In some databases, you cannot use several fields as a single primary key.
Compound (composite) primary keys are more difficult to handle in a query.
Compound Primary Key Query Example:
SELECT
  D.*
FROM
  Restaurant AS R,
  Owner AS O,
  Deal AS D
WHERE
  (D.RestaurantKey = R.RestaurantKey) AND
  (D.OwnerKey = O.OwnerKey)
Versus
Single Primary Key Query Example:
SELECT
  D.*
FROM
  Owner AS O,
  Deal AS D
WHERE
  (D.OwnerKey = O.OwnerKey)
Sometimes you have to change the foreign key value of a record so that it points to another record. For example, your customers have already ordered, the deal record is registered, and they decide to change from one restaurant table to another. So the data must be updated in the "Owner" and "Deal" tables.
+-----------+-------------+
| OwnerKey | OwnerName |
+-----------+-------------+
| 1 | Anne Smith |
+-----------+-------------+
| 2 | John Connor |
+-----------+-------------+
| 3 | Mike Doe |
+-----------+-------------+
+-----------+-------------+-------------+
| OwnerKey | DealKey | Food |
+-----------+-------------+-------------+
| 1 | 1 | Hamburguer |
+-----------+-------------+-------------+
| 2 | 2 | Hot-Dog |
+-----------+-------------+-------------+
| 3 | 3 | Hamburguer |
+-----------+-------------+-------------+
| 1 | 3 | Soda |
+-----------+-------------+-------------+
| 2 | 1 | Apple Pie |
+-----------+-------------+-------------+
| 3 | 3 | Chips |
+-----------+-------------+-------------+
If you use compound primary keys, you have to create a new record for "Owner", and new records for "Deals", copy the other fields, and delete the previous records.
If you use single keys, you just have to change the foreign key of Table, without inserting or deleting new records.
Cheers.

Allow/require only one record with common FK to have "primary" flag

Firstly, I apologise if this is a dupe - I suspect it may be but I can't find it.
Say I have a table of companies:
id | company_name
----+--------------
1 | Someone
2 | Someone else
...and a table of contacts:
id | company_id | contact_name | is_primary
----+------------+--------------+------------
1 | 1 | Tom | 1
2 | 2 | Dick | 1
3 | 1 | Harry | 0
4 | 1 | Bob | 0
Is it possible to set up the contacts table in such a way that it requires that one and only one record has the is_primary flag set for each common company_id?
So if I tried to do:
UPDATE contacts
SET is_primary = 1
WHERE id = 4
...the query would fail, because Tom (id = 1) is already flagged as the primary contact for company_id = 1. Or even better, would it be possible to construct a trigger so that the query would succeed, but Tom's is_primary flag would be cleared by the same operation?
I am not too bothered about checking whether company_id exists in the companies table, my PHP code would already have performed this check before I got to this stage (although if there is a way to do this in the same operation it would be nice, I suppose).
When I initially thought about this I thought "that will be easy, I'll just add a unique index across the company_id and is_primary columns" but obviously that won't work as it would restrict me to one primary and one non-primary contact - any attempt to add a third contact would fail. But I can't help feeling there would be a way to configure a unique index that gives me the minimum functionality I require - to reject an attempt to add a second primary contact, or reject an attempt to leave a company with no primary contact.
I am aware that I could just add a primary_contact field to the companies table with an FK to the contacts table but it feels messy. I don't like the idea of both tables having an FK to the other - it seems to me that the one table should rely on the other, not both tables relying on each other. I guess I just think that over time there is more chance of something going wrong.
To sum up:
How can I restrict the contacts table so that one and only one record with a given company_id has the is_primary flag set?
Anyone have any thoughts on whether two tables having FKs to each other is a good/bad idea?
Circular references between tables are indeed messy. See this (decade-old) article: SQL By Design: The Circular Reference
The cleanest way to make such a constraint is to add another table:
Company_PrimaryContact
----------------------
company_id
contact_id
PRIMARY KEY (company_id)
FOREIGN KEY (company_id, contact_id)
REFERENCES Contact (company_id, id)
This will also require a UNIQUE constraint in table Contact on (company_id, id)
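A sketch of that extra table, run through Python's sqlite3 as a stand-in for MySQL. It assumes the is_primary column has been dropped from contacts, and it uses a lowercase company_primary_contact table name; the sample rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific; needed for FK enforcement
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, company_name TEXT NOT NULL);
    CREATE TABLE contacts (
        id           INTEGER PRIMARY KEY,
        company_id   INTEGER NOT NULL REFERENCES companies (id),
        contact_name TEXT NOT NULL,
        UNIQUE (company_id, id)                 -- required as the composite FK target
    );
    CREATE TABLE company_primary_contact (
        company_id INTEGER PRIMARY KEY,         -- at most one primary per company
        contact_id INTEGER NOT NULL,
        FOREIGN KEY (company_id, contact_id) REFERENCES contacts (company_id, id)
    );
    INSERT INTO companies VALUES (1, 'Someone'), (2, 'Someone else');
    INSERT INTO contacts VALUES (1, 1, 'Tom'), (2, 2, 'Dick'), (3, 1, 'Harry'), (4, 1, 'Bob');
    INSERT INTO company_primary_contact VALUES (1, 1), (2, 2);
""")

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A second primary contact for company 1 violates the primary key.
second_primary = rejected("INSERT INTO company_primary_contact VALUES (1, 4)")
# Dick (id 2) works for company 2, so he cannot be company 1's primary contact.
wrong_company = rejected(
    "UPDATE company_primary_contact SET contact_id = 2 WHERE company_id = 1")

# Switching the primary contact from Tom to Bob is a single-row update.
with conn:
    conn.execute("UPDATE company_primary_contact SET contact_id = 4 WHERE company_id = 1")
primary_name = conn.execute("""
    SELECT c.contact_name
    FROM contacts AS c
    JOIN company_primary_contact AS p
      ON p.company_id = c.company_id AND p.contact_id = c.id
    WHERE c.company_id = 1
""").fetchone()[0]
```

This enforces "at most one" primary contact per company. A company with no row in the extra table has no primary contact, so the "exactly one" half of the requirement still has to be checked elsewhere.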
You could just run a query before that one, clearing the flag for the company:
UPDATE contacts SET is_primary = 0 WHERE company_id = .....
or even
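-- MySQL rejects a subquery that selects from the table being updated (error 1093),
-- so look up the company first.
SELECT company_id INTO @company_id FROM contacts WHERE id = [USERID];
UPDATE contacts
SET is_primary = IF(id = [USERID], 1, 0)
WHERE company_id = @company_id;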
Just putting an alternative out there - personally I'd probably look to the FK approach though instead of this type of workaround i.e. have a field in the companies table with a primary_user_id field.
EDIT method w/o relying on a contact.is_primary field
Alternative method: first of all, remove is_primary from contacts. Secondly, add a "primary_contact_id" INT field to companies. Thirdly, when changing the primary contact, just change that primary_contact_id, which prevents any possibility of there being more than 1 primary contact at any time, all without the need for triggers etc. in the background.
This option would work fine in any engine as it's simply updating an INT field. Any reliance on FKs etc. could be added or removed as required, but at its simplest it's just changing an INT field's value.
This option is viable as long as you need one and precisely one link from companies to contacts flagging a primary.

Multiple foreign keys per record in sql?

I'm creating an application (using PHP / Codeigniter / MYSQL) for tracking volunteers at events. I'd like multiple volunteers to be able to sign on to each event. I plan on doing this using a table called signup which looks something like this:
TABLE SIGNUP
============
VolunteerId EventId
----------- -------
12 223
13 223
15 223
12 235
13 235
19 235
Both columns are foreign keys (to the primary keys of the volunteer table and event table respectively).
Is there a better way to do this?
Should I use a compound-key as the primary key?
Honestly, I don't see a problem with the way you've set it up. Tables like this are commonly used to establish many-to-many relationships between different objects. I'm doing something similar in a table that references counties and cities in a given state. (Some cities span multiple counties.)
Database design best practices state that you should declare a primary key for a table. You don't have to do this; you can technically declare a table without a primary key. However, note that many DB engines will simply create a primary key for you behind the scenes if you don't specifically declare a key; this, however, may not be ideal for every situation (and generally isn't). Specifying a primary key of your choice is good for database optimization and organization.
Due to this, I'd say that you might as well use a compound key as your primary key for your many-to-many table instead of creating a separate index column. In this situation, this will satisfy the table requirements (as a db engine will make a primary key for you regardless) and it will prevent multiple occurrences of the same pair, which won't do you any good in a many-to-many reference table.
Short answer: Go with the compound primary key - primary key(VolunteerID, EventID). You shouldn't go wrong.
One use for a compound UNIQUE key would be to prevent the same volunteer/event pair from appearing twice in the table. There's no need for a primary key for this.
A good discussion on why compound primary keys should be avoided: What are the down sides of using a composite/compound primary key?
Given the table you've described you have three choices
1 - lunchmeat317
SIGNUP
-------
VolunteerId (PK)
EventId (PK)
2 - Ted Hopp
SIGNUP
-------
VolunteerId (AK1)
EventId (AK1)
3 - ic3b3rg
SIGNUP
-------
SignUpID (PK)
VolunteerId (AK1)
EventId (AK1)
As Thomas pointed out the main difference between 1 and 2 is that Unique doesn't stop the following.
VolunteerId EventId
----------- -------
null null
null null
However, if these fields don't allow nulls to begin with (and they shouldn't) then they're exactly the same.
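The NULL behaviour is easy to check. This sketch uses Python's sqlite3 as a stand-in for MySQL; the two table names are made up so the nullable and NOT NULL variants can sit side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE signup_nullable (
        volunteer_id INTEGER,
        event_id     INTEGER,
        UNIQUE (volunteer_id, event_id)
    );
    CREATE TABLE signup_not_null (
        volunteer_id INTEGER NOT NULL,
        event_id     INTEGER NOT NULL,
        UNIQUE (volunteer_id, event_id)
    );
""")

# NULLs never compare equal, so the unique key lets both rows in.
conn.execute("INSERT INTO signup_nullable VALUES (NULL, NULL)")
conn.execute("INSERT INTO signup_nullable VALUES (NULL, NULL)")
null_rows = conn.execute("SELECT COUNT(*) FROM signup_nullable").fetchone()[0]

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# With NOT NULL columns, the same unique key behaves like a primary key.
null_rejected = rejected("INSERT INTO signup_not_null VALUES (NULL, NULL)")
conn.execute("INSERT INTO signup_not_null VALUES (12, 223)")
duplicate_rejected = rejected("INSERT INTO signup_not_null VALUES (12, 223)")
```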
You could also add, as ic3b3rg suggests, a surrogate key (SignUpID). But as CJ Date notes (and I'm paraphrasing), introducing an artificial, surrogate, nonvolatile key will often be a good idea, but since it's often difficult to determine volatility, there's no formal way to know when you really need it.
That said, as long as this table is ...
Tracking that volunteers have signed up for events
There won't be any other attributes that have a functional or join dependency to R(VolunteerId, EventID)
... then in the immortal words of Yogi Berra "When you come to a fork in the road, take it" Meaning all three choices are valid and the choice probably won't impact your system one way or another.
Personally this is how I typically do it.
SIGNUP
-------
SignUpID (PK)
VolunteerId (AK1) (Not Null)
EventId (AK1) (Not Null)