Replace primary key that is foreign key in other tables - mysql

I am currently rebuilding a database which is used to store patient records. In the current database, the primary key for a patient is their name and date of birth, stored in a single column (e.g. "John Smith 1970-01-01"; it is not a composite key). This column is also a foreign key in many other tables that reference the patients table. I am planning to replace this key with an auto-generated integer key (since there will obviously be duplicate keys one day under the current system). How can I add a new primary key to this table and add appropriate foreign keys on all the other tables? Keep in mind that there is already a very large amount of data (~500,000 records) and these data references cannot be broken.
Thanks!

If it were up to me, I would:
1- Add a new future-PK column with auto_increment, backed by a non-null unique index (an auto_increment column must be a KEY, but not necessarily the PK).
2- Add the appropriate new-FK columns to all the related tables. These should initially be nullable.
3- Set each new-FK value to the matching future-PK value based on the current PK/FK relationships. Use an "UPDATE .. JOIN" for this step.
4- Enable the referential integrity constraints (DRI) on the relevant tables. A foreign key only needs to reference a KEY, not the PK, which is why the future-PK can be used. Every existing DRI constraint that uses the current PK should likely be updated during this step.
5- Make the new-FK columns NOT NULL where the model requires it.
6- Remove any leftover old-FK columns, as they are now redundant data.
7- Switch the old PK and the new/future-PK (this can be done in one command and may take a while, since all the rows are physically reorganized). Remove the old PK column as applicable, or perhaps simply remove its KEY status.
I would also take the database offline during the process, review and test the process (use a testing database for dry runs), and maintain backups.
The Data-Access Layer and any Views/etc will also need to be updated. These should be done at the same time, again through a review and testing process.
Also, even when adding an auto-increment PK, the table should generally still have an appropriate covering natural key enforced with unique constraints.
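The first three steps can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and the table and column names (patients, visits, patient_key, patient_id) are hypothetical. SQLite cannot add an auto_increment column to an existing table, so the sketch fills the new column from rowid; in MySQL the backfill in step 3 would be the "UPDATE .. JOIN" mentioned above rather than a correlated subquery.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_key TEXT PRIMARY KEY);
    CREATE TABLE visits (visit_no INTEGER, patient_key TEXT);
    INSERT INTO patients VALUES ('John Smith 1970-01-01'), ('Jane Doe 1985-06-15');
    INSERT INTO visits VALUES (1, 'Jane Doe 1985-06-15'),
                              (2, 'John Smith 1970-01-01'),
                              (3, 'Jane Doe 1985-06-15');
""")

# Step 1: add the future-PK column, fill it, and back it with a unique index.
# (SQLite stand-in: the values come from rowid instead of auto_increment.)
conn.execute("ALTER TABLE patients ADD COLUMN id INTEGER")
conn.execute("UPDATE patients SET id = rowid")
conn.execute("CREATE UNIQUE INDEX ux_patients_id ON patients (id)")

# Step 2: add the new-FK column to the child table, nullable at first.
conn.execute("ALTER TABLE visits ADD COLUMN patient_id INTEGER")

# Step 3: copy the matching future-PK value into each child row,
# using the old key relationship to find it.
conn.execute("""
    UPDATE visits
       SET patient_id = (SELECT p.id FROM patients p
                          WHERE p.patient_key = visits.patient_key)
""")

# Sanity checks to run before tightening any constraints:
# no child row left without a new key, and old and new keys agree.
unmapped = conn.execute(
    "SELECT COUNT(*) FROM visits WHERE patient_id IS NULL").fetchone()[0]
mismatched = conn.execute("""
    SELECT COUNT(*) FROM visits v JOIN patients p ON p.id = v.patient_id
     WHERE p.patient_key <> v.patient_key
""").fetchone()[0]
print(unmapped, mismatched)
```

Both counts should be zero before the old columns are dropped; a non-zero count of unmapped rows would point at child rows whose old key has no parent.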

I solved the problem using the following method:
1- Added a new primary key to the patients table and assigned unique values to all existing records
2- Created materialized views (without triggers) for each of the referencing tables that included all fields in the referencing table as well as the newly created id field in the patients table (via a join).
3- Deleted the source referencing tables
4- Renamed the materialized views to the names of the original source tables
The materialized views are now the dependent tables.
A reference for materialized views: http://www.fromdual.com/mysql-materialized-views
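A rough sketch of steps 2 to 4, assuming the "materialized view" is simply a table built from a join (MySQL has no native materialized views). It uses Python's sqlite3 module as a stand-in for MySQL, and the names visits, visits_new, patient_key and patient_id are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical starting point: patients already has its new integer id.
    CREATE TABLE patients (id INTEGER PRIMARY KEY, patient_key TEXT UNIQUE);
    CREATE TABLE visits (visit_no INTEGER, patient_key TEXT, notes TEXT);
    INSERT INTO patients (patient_key)
        VALUES ('John Smith 1970-01-01'), ('Jane Doe 1985-06-15');
    INSERT INTO visits VALUES (1, 'Jane Doe 1985-06-15', 'checkup'),
                              (2, 'John Smith 1970-01-01', 'x-ray');

    -- Step 2: snapshot table with every referencing column plus the new id.
    CREATE TABLE visits_new AS
        SELECT v.*, p.id AS patient_id
          FROM visits v JOIN patients p ON p.patient_key = v.patient_key;

    -- Steps 3 and 4: drop the source table and give the snapshot its name.
    DROP TABLE visits;
    ALTER TABLE visits_new RENAME TO visits;
""")

columns = [row[1] for row in conn.execute("PRAGMA table_info(visits)")]
rows = conn.execute(
    "SELECT visit_no, patient_id FROM visits ORDER BY visit_no").fetchall()
print(columns)
print(rows)
```

Two caveats with this approach: an inner join silently drops any referencing row whose old key has no match in patients, so row counts are worth comparing before the drop, and a table created from a SELECT does not carry over the indexes or constraints of the source table, so those have to be recreated afterwards.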

Related

What is the use of Composite key in SQL Server 2008?

I have three tables:
master table
transaction table
master_transaction_link table
My question is about the link table, which has id, mstrid and transid, where mstrid is the id of the master table and transid is the id of the transaction table.
Why should I set mstrid and transid as a composite key in the link table?
What is the use of a composite key in a link table?
A composite key on (mstrid, transid) can be considered a logical join of the two tables: each link row is identified by the pair of rows it connects, so you can save a column in the link table by dropping the separate id.
If you are considering some kind of ORM in your software, I would suggest using a surrogate primary key. Many ORMs support composite keys, but they are sometimes harder to handle.
Also, storage is cheap nowadays, and saving one column usually isn't worth it.
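One practical effect of a composite key on the link table, beyond saving a column, is that the same master/transaction pair cannot be linked twice. A minimal sketch, using Python's sqlite3 module as a stand-in for SQL Server (the table and column names follow the question; the sample values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE master_transaction_link (
        mstrid  INTEGER NOT NULL,
        transid INTEGER NOT NULL,
        PRIMARY KEY (mstrid, transid)   -- composite key: the pair must be unique
    )
""")
conn.execute("INSERT INTO master_transaction_link VALUES (1, 10)")
# Same master, different transaction: a different pair, so this is accepted.
conn.execute("INSERT INTO master_transaction_link VALUES (1, 11)")

try:
    # Repeating an existing pair violates the composite key.
    conn.execute("INSERT INTO master_transaction_link VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

link_count = conn.execute(
    "SELECT COUNT(*) FROM master_transaction_link").fetchone()[0]
print(duplicate_rejected, link_count)
```

If a surrogate id is kept as the primary key instead, a unique constraint on (mstrid, transid) would give the same protection.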

In MySQL, will changing the schema of a primary key affect the schema of foreign keys in other tables?

We are in the midst of trying to clean up the database and debating about whether or not to put in place foreign key constraints on our tables. It would be a very convincing argument for using them if changing the schema of a primary key in one table affected the schema of foreign keys in other tables. But is this the case?
For example, let's say I have a USER table with primary key id and I have another table BLOGGERS whose blogger_id is a foreign key tied to id. Let's say that id is initially declared as a SMALLINT, but then I have hordes of users signing up and we need to increase the range available for ids. If I alter id to be an INT, will that automatically alter blogger_id in the BLOGGERS table to be an INT also?
Regardless of the answer to my primary question, does anyone know of any compelling reasons to formally declare foreign key constraints, other than to limit the data that can be placed in that field? Thanks!
No, MySQL does not change the data type in child tables if you change the data type in the parent table.
I had to help one of my consulting customers who had reached the maximum INT value in their Users table. But as you can imagine, there were 30 other tables that referenced Users. We had to ALTER TABLE on each of those other 30 tables before we could change the primary key data type in the Users table, because it wouldn't work if new user id's could not be referenced by the child tables.
As for your question about foreign keys, yes, I do recommend them for the sake of enforcing data integrity. In every database I have analyzed that tried to do without foreign keys, they had a lot of orphaned rows in child tables, with no automatic way of detecting them.
That said, it's surprisingly common for sites to forego foreign keys, assuming they will "just do the right thing" in their application code to avoid orphaned data.
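For a schema that has no declared foreign keys, orphaned child rows can at least be found by hand with an anti-join. A minimal sketch, using Python's sqlite3 module as a stand-in for MySQL; the users and bloggers tables and their sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    -- No foreign key is declared here, so nothing stops a bad blogger_id.
    CREATE TABLE bloggers (blogger_id INTEGER, blog_name TEXT);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO bloggers VALUES (1, 'alpha'), (2, 'beta'), (7, 'gamma');
""")

# Anti-join: keep only child rows for which no parent row was matched.
orphans = conn.execute("""
    SELECT b.blogger_id, b.blog_name
      FROM bloggers b
      LEFT JOIN users u ON u.id = b.blogger_id
     WHERE u.id IS NULL
""").fetchall()
print(orphans)
```

A query like this has to be written and run per relationship, which is the manual work a declared foreign key makes unnecessary.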
One of the arguments against foreign keys is that their presence creates some cases of locking that you may not expect. If you UPDATE a child row, you expect it to lock that row for the duration of your transaction. But if you have a foreign key, this also locks the parent row, in the table referenced by the foreign key.
Example: suppose you have a parent table ShoppingCart and a child table LineItems. If you UPDATE the quantity of a row in LineItems, your transaction takes an exclusive lock (X-lock) on that row. But it also takes a shared lock (S-lock) on the parent row in ShoppingCart. It makes sense that you wouldn't want the row you're depending on to be DELETEd, for example, while you're in the middle of working on one of the rows that references it.
This is a shared lock, so multiple transactions can have this kind of lock at the same time, but then if you need to update the parent row directly while one or more clients have those implicit shared locks, you are blocked.

Primary Key field with merged tables

Apologies for the noob question (I'm keenly learning as I go). I'd be grateful for some advice on the Primary Key.
I have 5 separate (unrelated) tables (Access 2003) containing similar fields that I will be merging (using Append queries) into a single new table. Each record is unique across the tables (no duplicates).
Each separate table already has a primary key field using the default autonumber method (1-n). This means (I'm thinking) that there will be many duplicate primary key numbers between tables.
Is it standard practice (and OK to do) to delete the existing primary key field and create a new one (autonumber; 1-n) upon merging? Should I do this before the merge (for each separate table) or after the merge (on the single new table)?
Create your new table with the table structure, primary key and any other necessary metadata defined. Then run an INSERT INTO ... SELECT statement (an Append query) from each of the five tables, specifying the columns to copy into the new table. (A SELECT ... INTO statement would instead try to create a new table.) Since you already have your identity (autonumber) column defined on the new table and you are not selecting the identity column from the old tables, the data should copy over and each insert will be assigned a new primary key value.
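A small sketch of that merge, using Python's sqlite3 module as a stand-in for Access, with two source tables instead of five. The table names (source_a, source_b, merged) and the single name column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two source tables whose autonumber ids overlap (1 and 2 in both).
    CREATE TABLE source_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE source_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO source_a VALUES (1, 'apple'), (2, 'pear');
    INSERT INTO source_b VALUES (1, 'plum'), (2, 'fig');

    -- New table with its own auto-generated key, defined before the merge.
    CREATE TABLE merged (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT);

    -- Append each source, copying every column except the old id.
    INSERT INTO merged (name) SELECT name FROM source_a ORDER BY id;
    INSERT INTO merged (name) SELECT name FROM source_b ORDER BY id;
""")

merged = conn.execute("SELECT id, name FROM merged ORDER BY id").fetchall()
print(merged)
```

Because the old id columns are never selected, their overlapping values do not matter; the new table hands out fresh keys as rows arrive. If other tables reference the old ids, those values would need to be carried along in a separate column so the references can be remapped.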

Database Design - Custom attributes table - Table that "relate" entities

I'm designing a database (for use in mysql) that permits new user-defined attributes to an entity called nodes.
To accomplish this I have created 2 other tables: a customvars table that holds all custom attributes, and a *nodes_customvars* table that defines the relationship between nodes and customvars, creating a 1..n and n..1 relationship.
Here is the link to the model I drew: Sketched database model
So far so good... But I'm not able to properly handle INSERTs and UPDATEs using separate IDs for each table.
For example, if I have a custom attribute called color in the *nodes_customvars* table inserted for a specific node, if I try to "INSERT ... ON DUPLICATE KEY UPDATE" either it will always insert or always update.
I've thought about removing the "ID" field from the *nodes_customvars* table and making the nodes id and customvars id a composite key, but I'm not sure if this is the best solution...
I've read this article, and the comments, as well: http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx
What is the best solution to this?
EDIT:
To add to the above: I don't know the *nodes_customvars* id, only the nodes id and customvars id. Analysing the *nodes_customvars* table:
1- If I make nodes id and/or customvars id individually UNIQUE in this table, "INSERT ... ON DUPLICATE KEY UPDATE" will always UPDATE. Since multiple nodes can share the same customvar, this is wrong;
2- If I don't define any UNIQUE key, "INSERT ... ON DUPLICATE KEY UPDATE" will always INSERT, since there is no UNIQUE key for the statement to collide with...
You have two options for solving your specific problem of the "INSERT...ON DUPLICATE KEY" either always inserting or updating as you describe.
1- Change the primary key to be a composite key using nodeId and customvarId (as suggested by SyntaxGoonoo and in your question as a possible option).
2- Add a composite unique index using nodeId and customvarId.
CREATE UNIQUE INDEX IX_NODES_CUSTOMVARS ON NODES_CUSTOMVARS(nodeId, customvarId);
Both of the options would allow for the "INSERT...ON DUPLICATE KEY" functionality to work as you require (INSERT if a unique combination of nodeId and customvarId doesn't exist; update if it does).
As for the question about whether to have a composite primary key or a separate primary key column with an additional unique index, there are many things to consider in the design: the 1NF considerations, the physical characteristics of the database platform you're on, and the preference of the ORM you happen to be using (if any). Given how InnoDB secondary indexes work (see last paragraph at: http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html), I would suggest that you keep the design as you currently have it and add in the additional unique index.
HTH,
-Dipin
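The behaviour described above (insert when the nodeId/customvarId pair is new, update when it already exists) can be tried out with the sketch below. It is a stand-in, not MySQL: it uses Python's sqlite3 module, and SQLite's "ON CONFLICT ... DO UPDATE" clause (available in SQLite 3.24 and later) plays the role of MySQL's "INSERT ... ON DUPLICATE KEY UPDATE". The value column is hypothetical, since the question does not show where the attribute value is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes_customvars (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key is kept
        nodeId      INTEGER NOT NULL,
        customvarId INTEGER NOT NULL,
        value       TEXT                                -- hypothetical column
    );
    CREATE UNIQUE INDEX IX_NODES_CUSTOMVARS
        ON nodes_customvars (nodeId, customvarId);
""")

# SQLite upsert, standing in for MySQL's ON DUPLICATE KEY UPDATE.
UPSERT = """
    INSERT INTO nodes_customvars (nodeId, customvarId, value)
    VALUES (?, ?, ?)
    ON CONFLICT (nodeId, customvarId) DO UPDATE SET value = excluded.value
"""
conn.execute(UPSERT, (1, 5, "red"))    # new pair: inserted
conn.execute(UPSERT, (2, 5, "green"))  # same customvar, other node: inserted
conn.execute(UPSERT, (1, 5, "blue"))   # existing pair: updated in place

rows = conn.execute("""
    SELECT nodeId, customvarId, value
      FROM nodes_customvars ORDER BY nodeId
""").fetchall()
print(rows)
```

The uniqueness that drives the insert-or-update decision is on the pair of columns, which is why making either column unique on its own gives the wrong result.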
Your current entity design breaks 1NF. This means that your schema can erroneously store duplicate data.
nodes_customvars describes the many-to-many relationship between nodes and customvars. This type of table is sometimes referred to as an auxiliary table, because its contents are purely derived from base tables (in this case nodes and customvars).
The PK for an auxiliary table describing a many-to-many relationship should be a composite key in order to prevent duplication. Basically 1NF.
Any PK on a table is inherently UNIQUE, regardless of whether it is a single or a composite key. So in some ways your question doesn't make sense, because you are talking about turning the UNIQUE constraint on/off on id for nodes and customvars, which you can't do if your id is actually a PK.
So what are you actually trying to achieve here???

unable to enforce referential integrity in Access

I've checked everything for errors: primary key, uniqueness, and type. Access just doesn't seem to be able to link the 2 fields I have in my database. Can someone please take a look?
http://www.jpegtown.com/pictures/jf5WKxKRqehz.jpg
Thanks.
Your relationship diagram shows that you've made the ID fields your primary key in all your tables, but you're not using them for your joins. Thus, they serve absolutely no purpose. If you're not going to use "surrogate keys" (i.e., a meaningless ID number that is generated by the database and is unique to each record, but has absolutely no meaning in regard to the data in your table), then eliminate them. But if you're going to use "natural keys" (i.e., a primary key constructed from a set of real data fields that together are going to be unique for each record), you must have a unique compound index on those fields.
However, there are issues with both approaches:
1- Surrogate Keys: a surrogate PK makes each record unique. That is, you could have a record for David Fenton with ID 1 and a record for David Fenton with ID 2. If it's the same David Fenton, you've got duplicate data, but as far as your database knows, they are unique.
2- Natural Keys: some types of entities work very well with natural keys. The best such are where there's a single field that identifies the record uniquely. An example would be "employee type," where values might be "associate, manager, etc." In that case, it's a very good candidate for using the natural key instead of adding a surrogate key. The only argument against the natural key in that case is if the data in the candidate natural key is highly volatile (i.e., it changes frequently). While every modern database engine provides "CASCADE UPDATE" functionality (i.e., if the value in the PK field changes, all the tables where that field is a Foreign Key are automatically updated), this imposes a certain amount of overhead and can be problematic. For single-column keys, it's unlikely to be an issue.
Now, except for lookup tables, there are very few entities for which a natural key will be a single column. Instead, you have to create a compound index, i.e., an index that spans multiple data fields. In the index dialog in Access table design, you create a compound key by giving it a name in the first column, and then adding multiple rows in the second column (from the dropdown list of fields in your table).
The drawback of this is that if any of the fields in your compound unique index are unknown, you won't get uniqueness. That is, if a field has a Null in two records, and the rest of the fields are identical, this won't be counted as a conflict of uniqueness because Null never equals Null. This is because Null doesn't mean "empty" -- it means "Unknown."
Allen Browne has explained everything you need to know about Nulls:
Nulls: Do I Need Them?
Common Errors with Null
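The Null caveat for compound unique indexes can be seen in a small experiment. The sketch below uses Python's sqlite3 module as a stand-in for Access; SQLite also treats Nulls as distinct from one another in unique indexes. The people table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        id         INTEGER PRIMARY KEY,   -- surrogate key
        first_name TEXT,
        last_name  TEXT,
        birth_date TEXT
    );
    -- Compound unique index over the natural-key fields.
    CREATE UNIQUE INDEX ux_people_natural
        ON people (first_name, last_name, birth_date);
""")
INSERT = "INSERT INTO people (first_name, last_name, birth_date) VALUES (?, ?, ?)"

# Two rows identical except that birth_date is unknown in both:
# neither insert conflicts, because Null never equals Null.
conn.execute(INSERT, ("David", "Fenton", None))
conn.execute(INSERT, ("David", "Fenton", None))

# With every field known, the second identical row is rejected.
conn.execute(INSERT, ("David", "Fenton", "1970-01-01"))
try:
    conn.execute(INSERT, ("David", "Fenton", "1970-01-01"))
    complete_duplicate_rejected = False
except sqlite3.IntegrityError:
    complete_duplicate_rejected = True

null_rows = conn.execute(
    "SELECT COUNT(*) FROM people WHERE birth_date IS NULL").fetchone()[0]
print(null_rows, complete_duplicate_rejected)
```

Declaring the natural-key fields as required (NOT NULL) is one way to make the unique index reliable, where the data allows it.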
In your graphic, you show that you are trying to link the Company table with the PManager table. The latter table has a CompanyID field, and your Company table has a unique index on its ID field, so all you need is a link from the ID field of the Company table to the CompanyID field of the PManager table. For your example to work (which would be useless, since you already have a unique index on the ID field), you'd need to create a unique compound key spanning both ID and ShortName in the Company table.
Additionally, if ShortName is a field that you want to be unique (i.e., you don't want two company records to have the same ShortName), you should add a unique index to it, whether or not you still use the ID field as your primary key. This brings me back to item #1 above, where I described a situation where a surrogate key could lead you to enter duplicate records, because uniqueness is established by the surrogate key alone. Any time you choose to use a surrogate key, you must also add a unique compound index on any combination of data fields that needs to be unique (with the caveat about Null fields as outlined in item #2).
If you're thinking "surrogate keys mean more indexes" you're correct, in that you have two unique indexes on the same table (assuming you don't have the Null problem). But you do get substantial ease of use in joining tables in SQL, as well as substantially less duplication of data. Likewise, you avoid the overhead of CASCADE UPDATE. On the other hand, if you're viewing a child table with a natural foreign key, you don't need to join to the parent table to be able to identify the parent record, because the data that identifies that record is right there in the foreign-key fields. That lack of a need for a join can be a major performance gain in certain scenarios (especially for the case where you'd need an outer join because the foreign key can be Null).
This is actually quite a huge topic, and it's something of a religious argument. I'm firmly in the surrogate key camp, but I use natural keys for lookup tables where the key is a single column. I don't use natural keys for any other purpose. That said, where possible (i.e., no Null problems) I also have a unique index on the natural key.
Hope this helps.
Actually you need an index on the name fields, on both sides
However, may I suggest that you have way too many joins? In general there should only be one join from one table to the next. It is rare to have more than one join between tables, and exceedingly rare to have more than two.
Have a look at this link:
http://weblogs.asp.net/scottgu/archive/2006/07/12/Tip_2F00_Trick_3A00_-Online-Database-Schema-Samples-Library.aspx
Notice how all of the tables are joined together by a single relationship?
Each of the fields labeled PK are primary keys. These are AUTONUMBER fields. Each of the fields labeled FK are foreign keys. These are indexed Number fields of type Integer. The Primary Keys are connected to the Foreign Keys in a 1 to many relationship (in most cases).
99% of the time, you won't need any other kind of joins. The trick is to create tables with unique information. There is a lot of repeated information in your database.
A database that is reorganized in this manner is called a "normalized" database. There are lots of good examples of these at http://www.databaseanswers.org/data_models/
Just join on the CompanyID. You could also get rid of the Company field in PManager.
I faced the same referential integrity problem in Access and solved it as follows:
1- I exported the data from both tables in Access to Excel. Table1 contained Cust Code (the primary key) and basic information about the company. Table2 contained all the information about the customers associated with that company.
2- I removed all duplicates from the Table2 data exported to Excel.
3- Using VLOOKUP, I checked and found 11 customer codes that were not present in Table1.
4- I added those codes to the Access table, linked the tables with referential integrity enforced, and the problem was solved.
Also check the foreign key field if it still does not work.
You need to create an INDEX. Perhaps look for some kind of create index button and create an index on CompanyID