I recently started using MySQL and came across the topic of composite primary keys, in particular how they are useful and what their pros and cons are, from this site.
I wanted to experiment with them, so I created three tables like this:
CREATE TABLE person(
personId INT(11) NOT NULL,
personName VARCHAR(20) NOT NULL,
PRIMARY KEY(personId)
);
CREATE TABLE language(
languageId INT(11) NOT NULL,
languageName VARCHAR(20) NOT NULL,
PRIMARY KEY(languageId)
);
CREATE TABLE personLanguage(
personId INT(11) NOT NULL,
languageId INT(11) NOT NULL,
description VARCHAR(20) NOT NULL,
PRIMARY KEY(personId, languageId),
FOREIGN KEY (personId) REFERENCES person(personId) ON UPDATE CASCADE ON DELETE CASCADE,
FOREIGN KEY (languageId) REFERENCES language(languageId) ON UPDATE CASCADE ON DELETE CASCADE
);
Inserting data into the person and language tables is straightforward. My questions are:
For the personLanguage table, do I need to insert only the description column while the other columns are referenced automatically, or do I need to insert values for the other two columns in personLanguage as well?
Is it possible to have personId and languageId in the personLanguage table filled in automatically as soon as data is inserted into the other two tables? As far as I know, when an update or delete is done in either the person or language table, it is reflected in the two columns of the personLanguage table.
How do I fetch data relating the three tables? For example, I need to know which languages the person with personId=1 speaks. Is it also a straightforward query using joins, or is there some other way to do it since I use composite primary keys?
Lots of questions are bugging me, and I could not find a complete working example to check the exact pros and cons of using composite primary keys. If somebody could elaborate on this using my example, it would be really helpful.
I know some of these questions are basic and some may not make much sense, but please bear with me and shed some light on this topic.
For the personLanguage table do I need to insert only description
column, while the other columns are automatically referenced, or do I
need to insert the values for the other two columns in personLanguage
table as well
Yes, you will need to insert all three of the columns to be completely valid. Otherwise the DB won't know what person or language you are trying to tie this record to.
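To see this in action, here is a minimal sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for MySQL, so error messages and some defaults will differ from a real MySQL server, and the sample rows (Alice, French) and the try_insert helper are made up for illustration. It tries four inserts into personLanguage and records which ones the database accepts.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the schema mirrors the question's tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE person (
    personId   INT NOT NULL,
    personName VARCHAR(20) NOT NULL,
    PRIMARY KEY (personId)
);
CREATE TABLE language (
    languageId   INT NOT NULL,
    languageName VARCHAR(20) NOT NULL,
    PRIMARY KEY (languageId)
);
CREATE TABLE personLanguage (
    personId    INT NOT NULL,
    languageId  INT NOT NULL,
    description VARCHAR(20) NOT NULL,
    PRIMARY KEY (personId, languageId),
    FOREIGN KEY (personId) REFERENCES person (personId)
        ON UPDATE CASCADE ON DELETE CASCADE,
    FOREIGN KEY (languageId) REFERENCES language (languageId)
        ON UPDATE CASCADE ON DELETE CASCADE
);
INSERT INTO person VALUES (1, 'Alice');      -- hypothetical sample rows
INSERT INTO language VALUES (10, 'French');
""")


def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    # description only: the NOT NULL key columns are missing
    try_insert("INSERT INTO personLanguage (description) VALUES ('native')"),
    # all three columns, both keys exist in the parent tables
    try_insert("INSERT INTO personLanguage VALUES (1, 10, 'native')"),
    # languageId 99 does not exist in language: foreign key violation
    try_insert("INSERT INTO personLanguage VALUES (1, 99, 'basic')"),
    # same (personId, languageId) pair again: composite primary key violation
    try_insert("INSERT INTO personLanguage VALUES (1, 10, 'again')"),
]
print(results)
```

Only the second insert should go through: it supplies all three columns and both key values already exist in the parent tables.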
Is there a possibility to update the personId and languageId in
personLanguage table automatically as soon as the data in other two
tables are inserted, as far as I know when some update/delete is done
in either of person or language tables it reflects the same on the two
columns in personLanguage table
You could do this via an insert trigger, but it might not make any sense. So, let's say that you just entered a new language - say French. You shouldn't need to enter any values at all into the personLanguage table because your existing users might not want to get information in French. The same situation would be for creating a new person. You might have many languages. Most people won't speak most of the languages, so again, you wouldn't want to enter a record into the personLanguage table automatically.
As for updating the records in person and language, the keys shouldn't change. That stability is the reason to assign an ID in the first place. Once Bob or Alice is assigned a personId, they are that ID. Once French is assigned a languageId, it should always be that languageId.
How to fetch the data relating the three tables, for example I need to
know which language does the person with personId=1 speaks? Is it also
straight forward query using joins or is there some other way to do
since I use composite primary keys
Well, this is the tricky question. If you are trying to get ALL the languages personId=1 speaks, the join is pretty easy.
select pl.personId, l.languageId, l.languageName
from personLanguage pl
join language l on l.languageId = pl.languageId
where pl.personId = 1
It gets more complicated if you are trying to figure out which language you should communicate with the person, since there is a chance that the person might not have any personLanguages defined. If you can accept null values, you can use an outer join, but you would want to define the query so that you only return a single language.
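Both cases can be sketched as follows. This is only an illustration under assumptions: SQLite (via Python's sqlite3) stands in for MySQL, the sample people and languages are invented, and picking MIN(languageName) is just one arbitrary way to reduce the result to a single language per person. The outer join keeps a person who has no personLanguage rows and returns NULL (None in Python) for the language.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; sample data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    personId   INT NOT NULL,
    personName VARCHAR(20) NOT NULL,
    PRIMARY KEY (personId)
);
CREATE TABLE language (
    languageId   INT NOT NULL,
    languageName VARCHAR(20) NOT NULL,
    PRIMARY KEY (languageId)
);
CREATE TABLE personLanguage (
    personId    INT NOT NULL,
    languageId  INT NOT NULL,
    description VARCHAR(20) NOT NULL,
    PRIMARY KEY (personId, languageId)
);
INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO language VALUES (10, 'French'), (20, 'English');
INSERT INTO personLanguage VALUES (1, 10, 'native'), (1, 20, 'fluent');
""")

# All languages spoken by personId = 1 (same join as above, plus an ORDER BY).
all_languages = conn.execute("""
    SELECT pl.personId, l.languageId, l.languageName
    FROM personLanguage pl
    JOIN language l ON l.languageId = pl.languageId
    WHERE pl.personId = 1
    ORDER BY l.languageId
""").fetchall()

# One language per person, NULL when the person has none defined.
# MIN() is an arbitrary tie-breaker chosen for this sketch.
one_language_each = conn.execute("""
    SELECT p.personId, p.personName, MIN(l.languageName) AS languageName
    FROM person p
    LEFT JOIN personLanguage pl ON pl.personId = p.personId
    LEFT JOIN language l ON l.languageId = pl.languageId
    GROUP BY p.personId, p.personName
    ORDER BY p.personId
""").fetchall()

print(all_languages)
print(one_language_each)
```

In a real application you would probably replace MIN() with a rule that reflects a preference, for example a priority column on personLanguage.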
For the personLanguage table do I need to insert only description column, while the other columns are automatically referenced, or do I need to insert the values for the other two columns in personLanguage table as well
You can only insert values into personLanguage if matching keys already exist in the referenced tables. This means you must populate person and language before inserting into personLanguage, and each insert must supply both personId and languageId. You cannot leave them out or set them to NULL: both are declared NOT NULL, and primary key columns can never be NULL.
Is there a possibility to update the personId and languageId in personLanguage table automatically as soon as the data in other two tables are inserted, as far as I know when some update/delete is done in either of person or language tables it reflects the same on the two columns in personLanguage table
The constraint that you have specified (ON UPDATE CASCADE) means that when a referenced value changes in either person or language, the matching values in personLanguage are updated automatically. It does not add rows to personLanguage when rows are inserted into person or language. Also, a cascaded update cannot violate the PRIMARY KEY constraint on personLanguage.
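A small sketch of the cascade behaviour, assuming SQLite (via Python's sqlite3) in place of MySQL and some invented sample rows. It changes a languageId in the parent table, then deletes a person, and looks at personLanguage after each step.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; sample data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for cascades in SQLite
conn.executescript("""
CREATE TABLE person (
    personId   INT NOT NULL,
    personName VARCHAR(20) NOT NULL,
    PRIMARY KEY (personId)
);
CREATE TABLE language (
    languageId   INT NOT NULL,
    languageName VARCHAR(20) NOT NULL,
    PRIMARY KEY (languageId)
);
CREATE TABLE personLanguage (
    personId    INT NOT NULL,
    languageId  INT NOT NULL,
    description VARCHAR(20) NOT NULL,
    PRIMARY KEY (personId, languageId),
    FOREIGN KEY (personId) REFERENCES person (personId)
        ON UPDATE CASCADE ON DELETE CASCADE,
    FOREIGN KEY (languageId) REFERENCES language (languageId)
        ON UPDATE CASCADE ON DELETE CASCADE
);
INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO language VALUES (10, 'French');
INSERT INTO personLanguage VALUES (1, 10, 'native'), (2, 10, 'basic');
""")

PAIRS = "SELECT personId, languageId FROM personLanguage ORDER BY personId, languageId"

# Inserting into a parent table adds nothing to personLanguage.
conn.execute("INSERT INTO language VALUES (20, 'English')")
after_parent_insert = conn.execute(PAIRS).fetchall()

# ON UPDATE CASCADE: changing the parent key rewrites the child rows.
conn.execute("UPDATE language SET languageId = 11 WHERE languageId = 10")
after_update = conn.execute(PAIRS).fetchall()

# ON DELETE CASCADE: deleting a person removes that person's child rows.
conn.execute("DELETE FROM person WHERE personId = 1")
after_delete = conn.execute(PAIRS).fetchall()

print(after_parent_insert, after_update, after_delete)
```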
How to fetch the data relating the three tables, for example I need to know which language does the person with personId=1 speaks? Is it also straight forward query using joins or is there some other way to do since I use composite primary keys
Since this is a basic example, there wouldn't really be a need for this. In an extended form, you could use explicit JOINs to fetch data between the tables.
Just a few more thoughts...
Composite keys are generally used for referencing in tuples (or sets). This means that when you have a composite key (col1, col2) on table1, this references a composite key (col1, col2) on table2.
Related
Here's what's confusing me. I often have composite primary keys in database tables. The bad side of that approach is that I have quite a bit of extra work when I delete or edit entries. However, I feel that this approach is in the spirit of database design.
On the other side, there are friends of mine, who never use composite keys, but rather introduce another 'id' column in a table, and all other keys are just FKs. They have much less work while coding delete and edit procedures. However, I do not know how they preserve uniqueness of data entries.
For example:
Way 1
create table ProxUsingDept (
fkProx int references Prox(ProxID) NOT NULL,
fkDept int references Department(DeptID) NOT NULL,
Value int,
PRIMARY KEY(fkProx,fkDept)
)
Way 2
create table ProxUsingDept (
ID int NOT NULL IDENTITY PRIMARY KEY,
fkProx int references Prox(ProxID) NOT NULL,
fkDept int references Department(DeptID) NOT NULL,
Value int
)
Which way is better? What are the bad sides of using the 2nd approach? Any suggestions?
I personally prefer your 2nd approach (and would use it almost 100% of the time) - introduce a surrogate ID field.
Why?
makes life a lot easier for any tables referencing your table - the JOIN conditions are much simpler with just a single ID column (rather than 2, 3, or even more columns that you need to join on, all the time)
makes life a lot easier since any table referencing your table only needs to carry a single ID as foreign key field - not several columns from your compound key
makes life a lot easier since the database can handle the creation of unique ID column (using INT IDENTITY)
However, I do not know how they
preserve uniqueness of data entries.
Very simple: put a UNIQUE INDEX on the compound columns that you would otherwise use as your primary key!
CREATE UNIQUE INDEX UIX_WhateverNameYouWant
ON dbo.ProxUsingDept(fkProx, fkDept)
Now, your table guarantees there will never be a duplicate pair of (fkProx, fkDept) in your table - problem solved!
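If you want to try this out quickly, here is a rough sketch. It assumes SQLite (via Python's sqlite3) rather than SQL Server, so INTEGER PRIMARY KEY AUTOINCREMENT stands in for INT IDENTITY, and the parent tables Prox and Department are left out to keep it short. The index name is just a placeholder, as above.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; parent tables are omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProxUsingDept (
    ID     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    fkProx INT NOT NULL,
    fkDept INT NOT NULL,
    Value  INT
);
CREATE UNIQUE INDEX UIX_WhateverNameYouWant
    ON ProxUsingDept (fkProx, fkDept);
""")

conn.execute("INSERT INTO ProxUsingDept (fkProx, fkDept, Value) VALUES (1, 1, 100)")

# A second row with the same (fkProx, fkDept) pair gets a fresh ID,
# but the unique index still refuses it.
try:
    conn.execute("INSERT INTO ProxUsingDept (fkProx, fkDept, Value) VALUES (1, 1, 200)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

row_count = conn.execute("SELECT COUNT(*) FROM ProxUsingDept").fetchone()[0]
print(duplicate_accepted, row_count)
```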
You ask the following questions:
However, I do not know how they
preserve uniqueness of data entries.
Uniqueness can be preserved by declaring a separate composite UNIQUE index on columns that would otherwise form the natural primary key.
Which way is better?
Different people have different opinions, sometimes strongly held. I think you will find that more people use surrogate integer keys (not that that makes it the "right" solution).
What are the bad sides of using the
2nd approach?
Here are some of the disadvantages to using a surrogate key:
You require an additional index to maintain the unique-ness of the natural primary key.
You sometimes require additional JOINs when selecting data to get the results you want (this happens when the query could be satisfied using only the columns in the composite natural key; with a natural key you can use the foreign key columns directly rather than JOINing back to the original table).
There are cases like M:N join tables where composite keys make most sense (and if the nature of the M:N link changes, you'll have to rework this table anyway).
I know it is a very long time since this post was made. But I had to come across a similar situation regarding the composite key so I am posting my thoughts.
Let's say we have two tables T1 and T2.
T1 has the columns C1 and C2.
T2 has the columns C1, C2 and C3
C1 and C2 form the composite primary key of table T1 and are foreign keys in table T2.
Let's assume we used a surrogate key for table T1 (T1_ID) and used it as a foreign key in table T2. If the values of C1 and C2 in T1 change, it is additional work to enforce the referential integrity constraint on T2, because T2 looks only at the surrogate key from T1, whose value didn't change. This could be one issue with the second approach.
I have 2 tables, each with their own auto incremented IDs, which are of course primary keys.
When I try to create a third table to establish the relation between these two tables, I always get an error.
The first error says that a table can have only one auto-incremented column. The second occurs when I remove the auto_increment attribute from those two columns: SQL then doesn't allow me to make them foreign keys because of a type mismatch.
Is there a way that I can create a relational table without losing auto increment features?
Another possible (but not preferred) solution would be to use another primary key in the first table, namely the username of the user, without auto_increment of course. Is it inevitable?
Thanks in advance.
1 Concept
You have misunderstood some basic concepts, and the difficulties result from that. We have to address the concepts first, not the problem as you perceive it, and consequently, your problem will disappear.
auto incremented IDs, which are of course primary keys.
No, they are not. That is a common misconception. And problems are guaranteed to ensue.
An ID field cannot be a Primary Key in the English or technical or Relational senses.
Sure, in SQL, you can declare any field to be a PRIMARY KEY, but that doesn't magically transform it into a Primary Key in the English, technical, or Relational senses. You can name a chihuahua "Rottweiler", but that doesn't transform it into a Rottweiler, it remains a chihuahua. Like any language, SQL simply executes the commands that you give it, it does not understand PRIMARY KEY to mean something Relational, it just whacks a unique index on the column (or field).
The problem is, since you have declared the ID to be a PRIMARY KEY, you think of it as a Primary Key, and you may expect that it has some of the qualities of a Primary Key. Except for the uniqueness of the ID value, it provides no benefit. It has none of the qualities of a Primary Key, or any sort of Relational Key for that matter. It is not a Key in the English, technical, or Relational senses. By declaring a non-key to be a key, you will only confuse yourself, and you will find out that there is something terribly wrong only when the user complains about duplicates in the table.
2 Relational Model
2.1 Relational tables must have row uniqueness
A PRIMARY KEY on an ID field does not provide row uniqueness. Therefore it is not a Relational table containing rows, and if it isn't that, then it is a file containing records. It doesn't have any of the integrity, or power (at this stage you will be aware of join power only), or speed, that a table in a Relational database has.
Execute this code (MS SQL) and prove it to yourself. Please do not simply read this and understand it, and then proceed to read the rest of this Answer, this code must be executed before reading further. It has curative value.
-- [1] Dumb, broken file
-- Ensures unique RECORDS, allows duplicate ROWS
CREATE TABLE dumb_file (
id INT IDENTITY PRIMARY KEY,
name_first CHAR(30),
name_last CHAR(30)
)
INSERT dumb_file VALUES
( 'Mickey', 'Mouse' ),
( 'Mickey', 'Mouse' ),
( 'Mickey', 'Mouse' )
SELECT *
FROM dumb_file
Notice that you have duplicate rows. Relational tables are required to have unique rows. Further proof that you do not have a relational table, or any of the qualities of one.
Notice that in your report, the only thing that is unique is the ID field, which no user cares about, no user sees, because it is not data, it is some additional nonsense that some very stupid "teacher" told you to put in every file. You have record uniqueness but not row uniqueness.
In terms of the data (the real data minus the extraneous additions), the data name_last and name_first can exist without the ID field. A person has a first name and last name without an ID being stamped on their forehead.
The second thing that you are using that confuses you is the AUTOINCREMENT. If you are implementing a record filing system with no Relational capability, sure, it is helpful, you don't have to code the increment when inserting records. But if you are implementing a Relational Database, it serves no purpose at all, because you will never use it. There are many features in SQL that most people never use.
2.2 Corrective Action
So how do you upgrade, elevate, that dumb_file that is full of duplicate rows to a Relational table, in order to get some of the qualities and benefits of a Relational table ? There are three steps to this.
You need to understand Keys
And since we have progressed from ISAM files of the 1970's, to the Relational Model, you need to understand Relational Keys. That is, if you wish to obtain the benefits (integrity, power, speed) of a Relational Database.
In Codd's Relational Model:
a key is made up from the data
and
the rows in a table must be unique
Your "key" is not made up from the data. It is some additional, non-data parasite, caused by your being infected with the disease of your "teacher". Recognise it as such, and allow yourself the full mental capacity that God gave you (notice that I do not ask you to think in isolated or fragmented or abstract terms, all the elements in a database must be integrated with each other).
Make up a real key from the data, and only from the data. In this case, there is only one possible Key: (name_last, name_first).
Try this code, declare an unique constraint on the data:
-- [2] dumb_file fixed, elevated to table, prevents duplicate rows
-- still dumb
CREATE TABLE dumb_table (
id INT IDENTITY PRIMARY KEY,
name_first CHAR(30),
name_last CHAR(30),
CONSTRAINT UK
UNIQUE ( name_last, name_first )
)
INSERT dumb_table VALUES
( 'Mickey', 'Mouse' ),
( 'Minnie', 'Mouse' )
SELECT *
FROM dumb_table
INSERT dumb_table VALUES
( 'Mickey', 'Mouse' )
Now we have row uniqueness. That is the sequence that happens to most people: they create a file which allows dupes; they have no idea why dupes are appearing in the drop-downs; the user screams; they tweak the file and add an index to prevent dupes; they go to the next bug fix. (They may do so correctly or not, that is a different story.)
The second level. For thinking people who think beyond the fix-its. Since we have now row uniqueness, what in Heaven's name is the purpose of the ID field, why do we even have it ??? Oh, because the chihuahua is named Rotty and we are afraid to touch it.
The declaration that it is a PRIMARY KEY is false, but it remains, causing confusion and false expectations. The only genuine Key there is, is the (name_last, name_first), and it is an Alternate Key at this point.
Therefore the ID field is totally superfluous; and so is the index that supports it; and so is the stupid AUTOINCREMENT; and so is the false declaration that it is a PRIMARY KEY; and any expectations you may have of it are false.
Therefore remove the superfluous ID field. Try this code:
-- [3] Relational Table
-- Now that we have prevented duplicate data, the id field
-- AND its additional index serves no purpose, it is superfluous,
-- like an udder on a bull. If we remove the field AND the
-- supporting index, we obtain a Relational table.
CREATE TABLE relational_table (
name_first CHAR(30),
name_last CHAR(30),
CONSTRAINT PK
PRIMARY KEY ( name_last, name_first )
)
INSERT relational_table VALUES
( 'Mickey', 'Mouse' ),
( 'Minnie', 'Mouse' )
SELECT *
FROM relational_table
INSERT relational_table VALUES
( 'Mickey', 'Mouse' )
Works just fine, works as intended, without the extraneous fields and indices.
Please remember this, and do it right, every single time.
2.3 False Teachers
In these end times, as advised, we will have many of them. Note well, the "teachers" who propagate ID columns, by virtue of the detailed evidence in this post, simply do not understand the Relational Model or Relational Databases. Especially those who write books about it.
As evidenced, they are stuck in pre-1970 ISAM technology. That is all they understand, and that is all that they can teach. They use an SQL database container, for the ease of Access, recovery, backup, etc, but the content is pure Record Filing System with no Relational Integrity, Power, or speed. AFAIC, it is a serious fraud.
In addition to ID fields, of course, there are several items that are key Relational-or-not concepts, that taken together, cause me to form such a grave conclusion. Those other items are beyond the scope of this post.
One particular pair of idiots is currently mounting an assault on First Normal Form. They belong in the asylum.
3 Solution
Now for the rest of your question.
3.1 Answers
Is there a way that I can create a relational table without losing auto increment features?
That is a self-contradicting sentence. I trust you will understand from my explanation, Relational tables have no need for AUTOINCREMENT "features"; if the file has AUTOINCREMENT, it is not a Relational table.
AUTOINCREMENT or IDENTITY is good for one thing only: if, and only if, you want to create an Excel spreadsheet in the SQL database container, replete with fields named A, B, and C, across the top, and record numbers down the left side. In database terms, that is the result of a SELECT, a flattened view of the data, that is not the source of data, which is organised (Normalised).
Another possible (but not preferred) solution may be there is another primary key in the first table, which is the username of the user, not with an auto increment statement, of course. Is it inevitable?
In technical work, we don't care about preferences, because that is subjective, and it changes all the time. We care about technical correctness, because that is objective, and it does not change.
Yes, it is inevitable. Because it is just a matter of time; number of bugs; number of "can't dos"; number of user screams, until you face the facts, overcome your false declarations, and realise that:
the only way to ensure that user rows are unique, that user_names are unique, is to declare an UNIQUE constraint on it
and get rid of user_id or id in the user file
which promotes user_name to PRIMARY KEY
Yes, because your entire problem with the third table, not coincidentally, is then eliminated.
That third table is an Associative Table. The only Key required (Primary Key) is a composite of the two parent Primary Keys. That ensures uniqueness of the rows, which are identified by their Keys, not by their IDs.
I am warning you about that because the same "teachers" who taught you the error of implementing ID fields, teach the error of implementing ID fields in the Associative Table, where, just as with an ordinary table, it is superfluous, serves no purpose, introduces duplicates, and causes confusion. And it is doubly superfluous because the two keys that provide uniqueness are already there, staring us in the face.
Since they do not understand the RM, or Relational terms, they call Associative Tables "link" or "map" tables. If they have an ID field, they are in fact, files.
3.2 Lookup Tables
ID fields are a particularly Stupid Thing to Do for Lookup or Reference tables. Most of them have recognisable codes, there is no need to enumerate the list of codes in them, because the codes are (should be) unique.
ENUM is just as stupid, but for a different reason: it locks you into an anti-SQL method, a "feature" in that non-compliant "SQL".
Further, having the codes in the child tables as FKs, is a Good Thing: the code is much more meaningful, and it often saves an unnecessary join:
SELECT ...
FROM child_table -- not the lookup table
WHERE gender_code = 'M' -- FK in the child, PK in the lookup
instead of:
SELECT ...
FROM child_table
WHERE gender_id = 6 -- meaningless to the maintainer
or worse:
SELECT ...
FROM child_table C -- that you are trying to determine
JOIN lookup_table L
ON C.gender_id = L.gender_id
WHERE L.gender_code = 'M' -- meaningful, known
Note that this is something one cannot avoid: you need uniqueness on the lookup code and uniqueness on the description. That is the only method to prevent duplicates in each of the two columns:
CREATE TABLE gender (
gender_code CHAR(2) NOT NULL,
name CHAR(30) NOT NULL,
CONSTRAINT PK
PRIMARY KEY ( gender_code ),
CONSTRAINT AK
UNIQUE ( name )
)
3.3 Full Example
From the details in your question, I suspect that you have SQL syntax and FK definition issues, so I will give the entire solution you need as an example (since you have not given file definitions):
CREATE TABLE user ( -- Typical Identifying Table
user_name CHAR(16) NOT NULL, -- Short PK
name_first CHAR(30) NOT NULL, -- Alt Key.1
name_last CHAR(30) NOT NULL, -- Alt Key.2
birth_date DATE NOT NULL, -- Alt Key.3
CONSTRAINT PK -- unique user_name
PRIMARY KEY ( user_name ),
CONSTRAINT AK -- unique person identification
UNIQUE ( name_last, name_first, birth_date )
);
CREATE TABLE sport ( -- Typical Lookup Table
sport_code CHAR(4) NOT NULL, -- PK Short code
name CHAR(30) NOT NULL, -- AK
CONSTRAINT PK
PRIMARY KEY ( sport_code ),
CONSTRAINT AK
UNIQUE ( name )
);
CREATE TABLE user_sport ( -- Typical Associative Table
user_name CHAR(16) NOT NULL, -- PK.1, FK
sport_code CHAR(4) NOT NULL, -- PK.2, FK
start_date DATE NOT NULL,
CONSTRAINT PK
PRIMARY KEY ( user_name, sport_code ),
CONSTRAINT user_plays_sport_fk
FOREIGN KEY ( user_name )
REFERENCES user ( user_name ),
CONSTRAINT sport_occupies_user_fk
FOREIGN KEY ( sport_code )
REFERENCES sport ( sport_code )
);
There, the PRIMARY KEY declaration is honest, it is a Primary Key; no ID; no AUTOINCREMENT; no extra indices; no duplicate rows; no erroneous expectations; no consequential problems.
3.4 Relational Data Model
Here is the Data Model to go with the definitions.
As a PDF
If you are not used to the Notation, please be advised that every little tick, notch, and mark, the solid vs dashed lines, the square vs round corners, means something very specific. Refer to the IDEF1X Notation.
A picture is worth a thousand words; in this case a standard-compliant picture is worth more than that; a bad one is not worth the paper it is drawn on.
Please check the Verb Phrases carefully, they comprise a set of Predicates. The remainder of the Predicates can be determined directly from the model. If this is not clear, please ask.
I am implementing a friends list for users in my database, where the list will store the friends accountID.
I already have a similar structure in my database for achievements, where a separate table holds pairs of accountID and achievementID. My concern with this approach is that it is inefficient: if there are 1 million users with 100 achievements each, there are 100 million entries in this table. Then getting every achievement for a user with a certain accountID would be a linear scan of the table (I think).
I am considering having a comma separated string of accountIDs for my friends list table, I realize how annoying it will be to deal with the data as a string, but at least it would be guaranteed to be log(n) search time for a user with accountID as the primary key and the second column being the list string.
Am I wrong about the search time for these two different structures?
MySQL can make effective use of appropriate indexes, for queries designed to use those indexes, avoiding a "scan" operation on the table.
If you are ALWAYS dealing with the complete set of achievements for a user, retrieving the entire set, and storing the entire set, then a comma separated list in a single column can be a workable approach.
HOWEVER... that design breaks down when you want to deal with individual achievements. For example, if you want to retrieve a list of users that have a particular achievement. Now, you're doing expensive full scans of all achievements for all users, doing "string searches", dependent on properly formatted strings, and MySQL is unable to use an index scan to efficiently retrieve that set.
So, the rule of thumb, if you NEVER need to individually access an achievement, and NEVER need to remove an achievement from user in the database, and NEVER need to add an individual achievement for a user, and you will ONLY EVER pull the achievements as an entire set, and only store them as an entire set, in and out of the database, the comma separated list is workable.
I hesitate to recommend that approach, because it never turns out that way. Inevitably, you'll want a query to get a list of users that have a particular achievement.
With the comma separated list column, you're into some ugly SQL:
SELECT a.user_id
FROM user_achievement_list a
WHERE CONCAT(',',a.list,',') LIKE '%,123,%'
ugly in the sense that MySQL can't use an index range scan to satisfy the predicate; MySQL has to look at EVERY SINGLE list of achievements, and then do a string scan on each and every one of them, from the beginning to the end, to find out if a row matches or not.
And it's downright excruciating if you want to use the individual values in that list to do a join operation, to "lookup" a row in another table. That SQL just gets horrendously ugly.
And declarative enforcement of data integrity is impossible; you can't define any foreign key constraints that restrict the values that are added to the list, or remove all occurrences of a particular achievement_id from every list it occurs in.
Basically, you're "giving up" the advantages of a relational data store; so don't expect the database to be able to do any work with that type of column. As far as the database is concerned, it's just a blob of data, might as well be a .jpg image stored in that column; MySQL isn't going to help with retrieving or maintaining the contents of that list.
On the other hand, if you go with a design that stores the individual rows, each achievement for each user as a separate row, and you have an appropriate index available, the database can be MUCH more efficient at returning the list, and the SQL is more straightforward:
SELECT a.user_id
FROM user_achievements a
WHERE a.achievement_id = 123
A covering index would be appropriate for that query:
... ON user_achievements (achievement_id, user_id)
An index with user_id as the leading column would be suitable for other queries:
... ON user_achievements (user_id, achievement_id)
FOLLOWUP
Use EXPLAIN SELECT ... to see the access plan that MySQL generates.
For your example, retrieving all achievements for a given user, MySQL can do a range scan on the index to quickly locate the set of rows for the one user. MySQL doesn't need to look at every page in the index: the index is structured as a tree (at least, in the case of B-Tree indexes), so it can eliminate a whole boatload of pages where it "knows" the rows you are looking for can't be. And with the achievement_id also in the index, MySQL can return the resultset right from the index, without a need to visit the pages in the underlying table. (For the InnoDB engine, the PRIMARY KEY is the cluster key for the table, so the table itself is effectively an index.)
With a two column InnoDB table (user_id, achievement_id), with those two columns as the composite PRIMARY KEY, you would only need to add one secondary index, on (achievement_id, user_id).
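Here is a rough sketch of that two-column design. It assumes SQLite (via Python's sqlite3) instead of MySQL/InnoDB, so the plan output comes from SQLite's EXPLAIN QUERY PLAN and will not look like MySQL's EXPLAIN; the index name ix_achievement_user and the sample rows are hypothetical. It runs both lookups and checks that the lookup by achievement uses the secondary index.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL/InnoDB; names and data are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_achievements (
    user_id        INT NOT NULL,
    achievement_id INT NOT NULL,
    PRIMARY KEY (user_id, achievement_id)
);
-- the one secondary index, with the columns in the opposite order
CREATE INDEX ix_achievement_user
    ON user_achievements (achievement_id, user_id);
INSERT INTO user_achievements VALUES (1, 123), (1, 456), (2, 123);
""")

# Served by the composite primary key (leading column user_id).
achievements_of_user_1 = [row[0] for row in conn.execute(
    "SELECT achievement_id FROM user_achievements "
    "WHERE user_id = 1 ORDER BY achievement_id")]

# Served by the secondary index (leading column achievement_id).
users_with_123 = [row[0] for row in conn.execute(
    "SELECT user_id FROM user_achievements "
    "WHERE achievement_id = 123 ORDER BY user_id")]

# The last column of each EXPLAIN QUERY PLAN row is the readable detail text.
plan = " ".join(row[-1] for row in conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT user_id FROM user_achievements WHERE achievement_id = 123"))

print(achievements_of_user_1, users_with_123)
print(plan)
```

On MySQL you would run EXPLAIN SELECT ... on your own tables and read the key column of the output instead.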
FOLLOWUP
Q: By secondary index, do you mean a 3rd column that contains the key for the composite (userID, achievementID) table. My create table query looks like this
CREATE TABLE `UserFriends`
(`AccountID` BIGINT(20) UNSIGNED NOT NULL
,`FriendAccountID` BIGINT(20) UNSIGNED NOT NULL
,`Key` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT
, PRIMARY KEY (`Key`)
, UNIQUE KEY `AccountID` (`AccountID`, `FriendAccountID`)
);
A: No, I don't mean the addition of a third column. If the only two columns in the table are foreign keys to another table (it looks like they refer to the same table), both columns are NOT NULL, there is a UNIQUE constraint on the combination of the columns, and there are no other attributes on the table, I would consider not using a surrogate as the primary key at all. I would make the UNIQUE KEY the PRIMARY KEY.
Personally, I would be using InnoDB, with the innodb_file_per_table option enabled. And my table definition would look something like this:
CREATE TABLE user_friend
( account_id BIGINT(20) UNSIGNED NOT NULL COMMENT 'PK, FK ref account.id'
, friend_account_id BIGINT(20) UNSIGNED NOT NULL COMMENT 'PK, FK ref account.id'
, PRIMARY KEY (account_id, friend_account_id)
, UNIQUE KEY user_friend_UX1 (friend_account_id, account_id)
, CONSTRAINT FK_user_friend_user FOREIGN KEY (account_id)
REFERENCES account (id) ON UPDATE CASCADE ON DELETE CASCADE
, CONSTRAINT FK_user_friend_friend FOREIGN KEY (friend_account_id)
REFERENCES account (id) ON UPDATE CASCADE ON DELETE CASCADE
) Engine=InnoDB;
I am using a MySQL InnoDB database and have many tables in it. What I want to be able to do is enforce (from within the database) a constraint such that a key may exist in one of two columns (in two separate tables) but not both. I'll try to make this more clear.
Say I have two tables, TableA and TableB. Both of these tables have many columns, but they have one column in common, called SpecialID (int 255).
Now, both of these tables have many rows, and from the PHP side of the web app, the SpecialID column in TableA should never contain an integer that is in the SpecialID column of TableB, and the same goes the other way around. In other words, an integer should never be able to be found in the SpecialID column of TableA and TableB at any one time.
I'm fairly confident that I've enforced this from the PHP side, however I want to be able to enforce this relationship from within the database, just to be extra careful, as if I ever ended up with the same value in both tables, it would be catastrophic.
This may not even be possible, but I thought I'd throw it out there cos it seems like it could be. It would be sort of like a "foreign uniqueness constraint". I have done a bit of research but haven't turned up anything at all, not even people asking for something like this, so perhaps I could just be searching for the wrong thing?
Here's a solution:
CREATE TABLE Specials (
specialid INT AUTO_INCREMENT PRIMARY KEY,
type CHAR(1) NOT NULL,
UNIQUE KEY (specialid, type)
);
CREATE TABLE TableA (
id INT AUTO_INCREMENT PRIMARY KEY,
specialid INT NOT NULL,
type CHAR(1) NOT NULL DEFAULT 'A',
FOREIGN KEY (specialid, type) REFERENCES Specials(specialid, type)
);
CREATE TABLE TableB (
id INT AUTO_INCREMENT PRIMARY KEY,
specialid INT NOT NULL,
type CHAR(1) NOT NULL DEFAULT 'B',
FOREIGN KEY (specialid, type) REFERENCES Specials(specialid, type)
);
Now you need to make sure TableA.type is always 'A' and TableB.type is always 'B'. You can do this with a trigger, or else a foreign key to a lookup table of one row for each case.
The result is that Specials.type can be any letter, but only one letter for a given specialid. The rows in TableA and TableB can reference only a specialid with a type that matches their own type. This means that any given specialid can be referenced by only one table or the other, but never both.
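A minimal sketch of how this plays out, under some assumptions: SQLite (via Python's sqlite3) stands in for MySQL, and a CHECK constraint pins the type column in each table, which is a swapped-in substitute for the trigger or lookup-table approach mentioned above (MySQL only enforces CHECK constraints from 8.0.16 on). The try_insert helper and the sample rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; CHECK replaces the trigger/lookup table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE Specials (
    specialid INTEGER PRIMARY KEY,
    type      CHAR(1) NOT NULL,
    UNIQUE (specialid, type)
);
CREATE TABLE TableA (
    id        INTEGER PRIMARY KEY,
    specialid INT NOT NULL,
    type      CHAR(1) NOT NULL DEFAULT 'A' CHECK (type = 'A'),
    FOREIGN KEY (specialid, type) REFERENCES Specials (specialid, type)
);
CREATE TABLE TableB (
    id        INTEGER PRIMARY KEY,
    specialid INT NOT NULL,
    type      CHAR(1) NOT NULL DEFAULT 'B' CHECK (type = 'B'),
    FOREIGN KEY (specialid, type) REFERENCES Specials (specialid, type)
);
INSERT INTO Specials (specialid, type) VALUES (1, 'A'), (2, 'B');
""")


def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    # specialid 1 is typed 'A', so TableA may reference it
    try_insert("INSERT INTO TableA (specialid) VALUES (1)"),
    # TableB rows carry type 'B'; (1, 'B') is not in Specials
    try_insert("INSERT INTO TableB (specialid) VALUES (1)"),
    # specialid 2 is typed 'B', so TableB may reference it
    try_insert("INSERT INTO TableB (specialid) VALUES (2)"),
    # trying to smuggle type 'B' into TableA fails the CHECK
    try_insert("INSERT INTO TableA (specialid, type) VALUES (2, 'B')"),
]
print(results)
```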
From what I've found, there is no easy way to do this. As I cannot find anything and no one has provided a solution to the issue, I will assume that it is in fact not possible (at least, not very easily).
For my own project, I ended up going along a different route to achieve my goal.
The discussion at this SO question may be of use for anyone searching for something along these lines:
Enforce unique values across two tables
I have not tried it myself, though.
If I ever come across anything better (or someone posts a better answer here) then I shall update my response and/or mark someone else's answer as correct as necessary.
I am a real beginner when it comes to databases, and I am facing a problem.
I have a table with many rows. Each row has a primary key called MY_KEY. MY_KEY is CHAR(20).
I have another table. In each row, one of the fields holds many MY_KEY values separated by spaces and stored as TEXT, but never the same MY_KEY twice.
I am not sure I explained this well, but how can I design those two tables to perform better?
My program will take the TEXT, add it to a vector, and then binary search it. This will be slow if there are 1000 20-character MY_KEY values.
Don't store delimited values in the database! Normalize your data by introducing a many-to-many table.
Can you tell me the way to improve this design, please?
Your schema might look something like
CREATE TABLE table1
(
table1_key CHAR(20) NOT NULL PRIMARY KEY,
-- other columns
);
CREATE TABLE table2
(
table2_key CHAR(20) NOT NULL PRIMARY KEY,
-- other columns
);
CREATE TABLE table2_table1
(
table2_key CHAR(20),
table1_key CHAR(20),
PRIMARY KEY (table2_key, table1_key),
FOREIGN KEY(table2_key) REFERENCES table2 (table2_key),
FOREIGN KEY(table1_key) REFERENCES table1 (table1_key)
);
Here is SQLFiddle demo
Check out some readings on database normalization. The basic idea is that you don't want to have any column that stores more than one piece of data. While this isn't an absolute rule, it's a good rule of thumb, and will probably be more performant than what you're describing.
Instead of one row with a bunch of associated keys, consider having a bunch of rows with the pairs of associated keys. This is a superior way to represent a many to many relation in a relational database. You can do a join to retrieve the data.
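As a rough illustration of that idea, the sketch below splits one space-separated TEXT value into pair rows and then reads them back with a join. It assumes SQLite (via Python's sqlite3) as the database, reuses the table1 / table2 / table2_table1 names from the schema suggested above, and uses invented key values such as KEY_A and ROW_1.

```python
import sqlite3

# Assumption: SQLite stands in for the real database; key values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE table1 (table1_key CHAR(20) NOT NULL PRIMARY KEY);
CREATE TABLE table2 (table2_key CHAR(20) NOT NULL PRIMARY KEY);
CREATE TABLE table2_table1 (
    table2_key CHAR(20) NOT NULL,
    table1_key CHAR(20) NOT NULL,
    PRIMARY KEY (table2_key, table1_key),
    FOREIGN KEY (table2_key) REFERENCES table2 (table2_key),
    FOREIGN KEY (table1_key) REFERENCES table1 (table1_key)
);
INSERT INTO table1 VALUES ('KEY_A'), ('KEY_B'), ('KEY_C');
INSERT INTO table2 VALUES ('ROW_1');
""")

# What used to live in one TEXT field: keys separated by spaces.
legacy_text = "KEY_A KEY_C"

# One pair row per key instead of one delimited string.
with conn:
    conn.executemany(
        "INSERT INTO table2_table1 (table2_key, table1_key) VALUES (?, ?)",
        [("ROW_1", key) for key in legacy_text.split()],
    )

# Retrieve every associated key with a join.
keys_for_row_1 = [row[0] for row in conn.execute("""
    SELECT t1.table1_key
    FROM table2_table1 AS m
    JOIN table1 AS t1 ON t1.table1_key = m.table1_key
    WHERE m.table2_key = ?
    ORDER BY t1.table1_key
""", ("ROW_1",))]

# The membership test the program did with a binary search becomes a key lookup.
has_key_c = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM table2_table1 "
    "WHERE table2_key = ? AND table1_key = ?)",
    ("ROW_1", "KEY_C"),
).fetchone()[0] == 1

print(keys_for_row_1, has_key_c)
```

The composite primary key also takes care of the "never the same MY_KEY twice" rule, since a repeated pair would be rejected.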