SQL: Creating a Relational table with 2 different auto_increment - mysql
I have 2 tables, each with their own auto incremented IDs, which are of course primary keys.
When I want to create a 3rd table to establish the relation between these 2 tables, I always have an error.
The first is that you can have only 1 automatically-incremented column. The second occurs when I delete the auto_increment statement from those 2: SQL then doesn't allow me to make them foreign keys, because of the type matching failure.
Is there a way that I can create a relational table without losing auto increment features?
Another possible (but not preferred) solution may be there is another primary key in the first table, which is the username of the user, not with an auto_increment statement, of course. Is it inevitable?
Thanks in advance.
1 Concept
You have misunderstood some basic concepts, and the difficulties result from that. We have to address the concepts first, not the problem as you perceive it, and consequently, your problem will disappear.
auto incremented IDs, which are of course primary keys.
No, they are not. That is a common misconception. And problems are guaranteed to ensue.
An ID field cannot be a Primary Key in the English or technical or Relational senses.
Sure, in SQL, you can declare any field to be a PRIMARY KEY, but that doesn't magically transform it into a Primary Key in the English, technical, or Relational senses. You can name a chihuahua "Rottweiler", but that doesn't transform it into a Rottweiler; it remains a chihuahua. Like any language, SQL simply executes the commands that you give it. It does not understand PRIMARY KEY to mean something Relational; it just whacks a unique index on the column (or field).
The problem is, since you have declared the ID to be a PRIMARY KEY, you think of it as a Primary Key, and you may expect that it has some of the qualities of a Primary Key. Except for the uniqueness of the ID value, it provides no benefit. It has none of the qualities of a Primary Key, or any sort of Relational Key for that matter. It is not a Key in the English, technical, or Relational senses. By declaring a non-key to be a key, you will only confuse yourself, and you will find out that there is something terribly wrong only when the user complains about duplicates in the table.
2 Relational Model
2.1 Relational tables must have row uniqueness
A PRIMARY KEY on an ID field does not provide row uniqueness. Therefore it is not a Relational table containing rows, and if it isn't that, then it is a file containing records. It doesn't have any of the integrity, or power (at this stage you will be aware of join power only), or speed, that a table in a Relational database has.
Execute this code (MS SQL) and prove it to yourself. Please do not simply read this and understand it, and then proceed to read the rest of this Answer, this code must be executed before reading further. It has curative value.
-- [1] Dumb, broken file
-- Ensures unique RECORDS, allows duplicate ROWS
CREATE TABLE dumb_file (
id INT IDENTITY PRIMARY KEY,
name_first CHAR(30),
name_last CHAR(30)
)
INSERT dumb_file VALUES
( 'Mickey', 'Mouse' ),
( 'Mickey', 'Mouse' ),
( 'Mickey', 'Mouse' )
SELECT *
FROM dumb_file
Notice that you have duplicate rows. Relational tables are required to have unique rows. Further proof that you do not have a relational table, or any of the qualities of one.
Notice that in your report, the only thing that is unique is the ID field, which no user cares about, no user sees, because it is not data, it is some additional nonsense that some very stupid "teacher" told you to put in every file. You have record uniqueness but not row uniqueness.
In terms of the data (the real data minus the extraneous additions), the data name_last and name_first can exist without the ID field. A person has a first name and last name without an ID being stamped on their forehead.
The second thing that you are using that confuses you is the AUTOINCREMENT. If you are implementing a record filing system with no Relational capability, sure, it is helpful, you don't have to code the increment when inserting records. But if you are implementing a Relational Database, it serves no purpose at all, because you will never use it. There are many features in SQL that most people never use.
2.2 Corrective Action
So how do you upgrade, elevate, that dumb_file that is full of duplicate rows to a Relational table, in order to get some of the qualities and benefits of a Relational table ? There are three steps to this.
You need to understand Keys
And since we have progressed from ISAM files of the 1970's, to the Relational Model, you need to understand Relational Keys. That is, if you wish to obtain the benefits (integrity, power, speed) of a Relational Database.
In Codd's Relational Model:
a key is made up from the data
and
the rows in a table must be unique
Your "key" is not made up from the data. It is some additional, non-data parasite, caused by your being infected with the disease of your "teacher". Recognise it as such, and allow yourself the full mental capacity that God gave you (notice that I do not ask you to think in isolated or fragmented or abstract terms, all the elements in a database must be integrated with each other).
Make up a real key from the data, and only from the data. In this case, there is only one possible Key: (name_last, name_first).
Try this code, declare an unique constraint on the data:
-- [2] dumb_file fixed, elevated to table, prevents duplicate rows
-- still dumb
CREATE TABLE dumb_table (
id INT IDENTITY PRIMARY KEY,
name_first CHAR(30),
name_last CHAR(30),
CONSTRAINT UK
UNIQUE ( name_last, name_first )
)
INSERT dumb_table VALUES
( 'Mickey', 'Mouse' ),
( 'Minnie', 'Mouse' )
SELECT *
FROM dumb_table
INSERT dumb_table VALUES
( 'Mickey', 'Mouse' )
Now we have row uniqueness. That is the sequence that happens to most people: they create a file which allows dupes; they have no idea why dupes are appearing in the drop-downs; the user screams; they tweak the file and add an index to prevent dupes; they go to the next bug fix. (They may do so correctly or not, that is a different story.)
The second level. For thinking people who think beyond the fix-its. Since we have now row uniqueness, what in Heaven's name is the purpose of the ID field, why do we even have it ??? Oh, because the chihuahua is named Rotty and we are afraid to touch it.
The declaration that it is a PRIMARY KEY is false, but it remains, causing confusion and false expectations. The only genuine Key there is, is the (name_last, name_first), and it is an Alternate Key at this point.
Therefore the ID field is totally superfluous; and so is the index that supports it; and so is the stupid AUTOINCREMENT; and so is the false declaration that it is a PRIMARY KEY; and any expectations you may have of it are false.
Therefore remove the superfluous ID field. Try this code:
-- [3] Relational Table
-- Now that we have prevented duplicate data, the id field
-- AND its additional index serves no purpose, it is superfluous,
-- like an udder on a bull. If we remove the field AND the
-- supporting index, we obtain a Relational table.
CREATE TABLE relational_table (
name_first CHAR(30),
name_last CHAR(30),
CONSTRAINT PK
PRIMARY KEY ( name_last, name_first )
)
INSERT relational_table VALUES
( 'Mickey', 'Mouse' ),
( 'Minnie', 'Mouse' )
SELECT *
FROM relational_table
INSERT relational_table VALUES
( 'Mickey', 'Mouse' )
Works just fine, works as intended, without the extraneous fields and indices.
Please remember this, and do it right, every single time.
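For readers without an MS SQL instance to hand, the contrast between step [1] and step [3] can be sketched as a small runnable script. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module in place of MS SQL, the column types are SQLite's, and the helper try_insert is a hypothetical name introduced here.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for the MS SQL used in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- [1] surrogate id only: duplicate rows are accepted
    CREATE TABLE dumb_file (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        name_first TEXT NOT NULL,
        name_last  TEXT NOT NULL
    );
    -- [3] key made from the data: duplicate rows are rejected
    CREATE TABLE relational_table (
        name_first TEXT NOT NULL,
        name_last  TEXT NOT NULL,
        PRIMARY KEY (name_last, name_first)
    );
""")


def try_insert(table, first, last):
    """Hypothetical helper: True if the row was stored, False if a constraint rejected it."""
    try:
        conn.execute(
            f"INSERT INTO {table} (name_first, name_last) VALUES (?, ?)",
            (first, last),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Insert the same person three times into each table.
file_results = [try_insert("dumb_file", "Mickey", "Mouse") for _ in range(3)]
table_results = [try_insert("relational_table", "Mickey", "Mouse") for _ in range(3)]
```

With the surrogate id alone, all three identical inserts go through; with the key declared on the data, only the first one does.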
2.3 False Teachers
In these end times, as advised, we will have many of them. Note well, the "teachers" who propagate ID columns, by virtue of the detailed evidence in this post, simply do not understand the Relational Model or Relational Databases. Especially those who write books about it.
As evidenced, they are stuck in pre-1970 ISAM technology. That is all they understand, and that is all that they can teach. They use an SQL database container, for the ease of Access, recovery, backup, etc, but the content is pure Record Filing System with no Relational Integrity, Power, or speed. AFAIC, it is a serious fraud.
In addition to ID fields, of course, there are several items that are key Relational-or-not concepts, that taken together, cause me to form such a grave conclusion. Those other items are beyond the scope of this post.
One particular pair of idiots is currently mounting an assault on First Normal Form. They belong in the asylum.
3 Solution
Now for the rest of your question.
3.1 Answers
Is there a way that I can create a relational table without losing auto increment features?
That is a self-contradicting sentence. I trust you will understand from my explanation, Relational tables have no need for AUTOINCREMENT "features"; if the file has AUTOINCREMENT, it is not a Relational table.
AUTOINCREMENT or IDENTITY is good for one thing only: if, and only if, you want to create an Excel spreadsheet in the SQL database container, replete with fields named A, B, and C, across the top, and record numbers down the left side. In database terms, that is the result of a SELECT, a flattened view of the data, that is not the source of data, which is organised (Normalised).
Another possible (but not preferred) solution may be there is another primary key in the first table, which is the username of the user, not with an auto increment statement, of course. Is it inevitable?
In technical work, we don't care about preferences, because that is subjective, and it changes all the time. We care about technical correctness, because that is objective, and it does not change.
Yes, it is inevitable. Because it is just a matter of time; number of bugs; number of "can't dos"; number of user screams, until you face the facts, overcome your false declarations, and realise that:
the only way to ensure that user rows are unique, that user_names are unique, is to declare an UNIQUE constraint on it
and get rid of user_id or id in the user file
which promotes user_name to PRIMARY KEY
Yes, because your entire problem with the third table, not coincidentally, is then eliminated.
That third table is an Associative Table. The only Key required (Primary Key) is a composite of the two parent Primary Keys. That ensures uniqueness of the rows, which are identified by their Keys, not by their IDs.
I am warning you about that because the same "teachers" who taught you the error of implementing ID fields, teach the error of implementing ID fields in the Associative Table, where, just as with an ordinary table, it is superfluous, serves no purpose, introduces duplicates, and causes confusion. And it is doubly superfluous because the two keys that provide uniqueness are already there, staring us in the face.
Since they do not understand the RM, or Relational terms, they call Associative Tables "link" or "map" tables. If they have an ID field, they are in fact, files.
3.2 Lookup Tables
ID fields are a particularly Stupid Thing to Do for Lookup or Reference tables. Most of them have recognisable codes, there is no need to enumerate the list of codes in them, because the codes are (should be) unique.
ENUM is just as stupid, but for a different reason: it locks you into an anti-SQL method, a "feature" in that non-compliant "SQL".
Further, having the codes in the child tables as FKs, is a Good Thing: the code is much more meaningful, and it often saves an unnecessary join:
SELECT ...
FROM child_table -- not the lookup table
WHERE gender_code = 'M' -- FK in the child, PK in the lookup
instead of:
SELECT ...
FROM child_table
WHERE gender_id = 6 -- meaningless to the maintainer
or worse:
SELECT ...
FROM child_table C -- that you are trying to determine
JOIN lookup_table L
ON C.gender_id = L.gender_id
WHERE L.gender_code = 'M' -- meaningful, known
Note that this is something one cannot avoid: you need uniqueness on the lookup code and uniqueness on the description. That is the only method to prevent duplicates in each of the two columns:
CREATE TABLE gender (
gender_code CHAR(2) NOT NULL,
name CHAR(30) NOT NULL,
CONSTRAINT PK
PRIMARY KEY ( gender_code ),
CONSTRAINT AK
UNIQUE ( name )
)
3.3 Full Example
From the details in your question, I suspect that you have SQL syntax and FK definition issues, so I will give the entire solution you need as an example (since you have not given file definitions):
CREATE TABLE user ( -- Typical Identifying Table
user_name CHAR(16) NOT NULL, -- Short PK
name_first CHAR(30) NOT NULL, -- Alt Key.1
name_last CHAR(30) NOT NULL, -- Alt Key.2
birth_date DATE NOT NULL, -- Alt Key.3
CONSTRAINT PK -- unique user_name
PRIMARY KEY ( user_name ),
CONSTRAINT AK -- unique person identification
UNIQUE ( name_last, name_first, birth_date )
)
CREATE TABLE sport ( -- Typical Lookup Table
sport_code CHAR(4) NOT NULL, -- PK Short code
name CHAR(30) NOT NULL, -- AK
CONSTRAINT PK
PRIMARY KEY ( sport_code ),
CONSTRAINT AK
UNIQUE ( name )
)
CREATE TABLE user_sport ( -- Typical Associative Table
user_name CHAR(16) NOT NULL, -- PK.1, FK
sport_code CHAR(4) NOT NULL, -- PK.2, FK
start_date DATE NOT NULL,
CONSTRAINT PK
PRIMARY KEY ( user_name, sport_code ),
CONSTRAINT user_plays_sport_fk
FOREIGN KEY ( user_name )
REFERENCES user ( user_name ),
CONSTRAINT sport_occupies_user_fk
FOREIGN KEY ( sport_code )
REFERENCES sport ( sport_code )
)
There, the PRIMARY KEY declaration is honest, it is a Primary Key; no ID; no AUTOINCREMENT; no extra indices; no duplicate rows; no erroneous expectations; no consequential problems.
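A quick way to see what the composite Primary Key and the two Foreign Keys of user_sport buy you is to run the three tables and try a few inserts. The sketch below is an approximation, not the answer's own code: it uses Python's sqlite3 module instead of MySQL, TEXT columns instead of CHAR/DATE, and the sample rows and the helper accepted are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; types are simplified to TEXT.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE user (
        user_name  TEXT NOT NULL PRIMARY KEY,
        name_first TEXT NOT NULL,
        name_last  TEXT NOT NULL,
        birth_date TEXT NOT NULL,
        UNIQUE (name_last, name_first, birth_date)
    );
    CREATE TABLE sport (
        sport_code TEXT NOT NULL PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE user_sport (
        user_name  TEXT NOT NULL REFERENCES user (user_name),
        sport_code TEXT NOT NULL REFERENCES sport (sport_code),
        start_date TEXT NOT NULL,
        PRIMARY KEY (user_name, sport_code)
    );
    -- made-up sample rows
    INSERT INTO user  VALUES ('mmouse', 'Mickey', 'Mouse', '1928-11-18');
    INSERT INTO sport VALUES ('GOLF', 'Golf');
""")


def accepted(sql, params):
    """Hypothetical helper: True if the statement succeeded, False on a constraint error."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


insert = "INSERT INTO user_sport VALUES (?, ?, ?)"
first_link = accepted(insert, ("mmouse", "GOLF", "2020-01-01"))      # valid pair
duplicate_link = accepted(insert, ("mmouse", "GOLF", "2021-01-01"))  # same pair again
orphan_link = accepted(insert, ("mmouse", "POLO", "2020-01-01"))     # unknown sport
```

The first association is stored, the repeated pair is stopped by the composite key, and the unknown sport code is stopped by the foreign key, all without any ID column.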
3.4 Relational Data Model
Here is the Data Model to go with the definitions.
As a PDF
If you are not used to the Notation, please be advised that every little tick, notch, and mark, the solid vs dashed lines, the square vs round corners, means something very specific. Refer to the IDEF1X Notation.
A picture is worth a thousand words; in this case a standard-compliant picture is worth more than that; a bad one is not worth the paper it is drawn on.
Please check the Verb Phrases carefully, they comprise a set of Predicates. The remainder of the Predicates can be determined directly from the model. If this is not clear, please ask.
Related
What would be the best table structure for variable amount of combination?
I need some advice for the choice of my table structure. I am working on a project where I need to save values that are a combination of a variable amount of other values. For example: A = b,c,d B = z,r I was thinking on saving the combinations in a json object inside a column but I am afraid it can be long for big requests and not easy for filtering. There was also the solution of having a multiple amount of columns (containing null when not necessary), but this will not be a good representation of the data, also filtering will be hard. Finally I thought the best would be many to many relations, but the joins might be too heavy, are they ? Do you see any other alternative (besides switching to nosql) ?
This shows the use of Junction tables to avoid saving data in comma separated lists, json, or other mechanisms that would be problematic in at least these areas:
Table scans (slowness, non-use of fast indexes)
Maintenance of data
Data integrity
Schema
create table cat ( -- categories
id int auto_increment primary key,
code varchar(20) not null,
description varchar(255) not null
);
create table subcat ( -- sub categories
id int auto_increment primary key,
code varchar(20) not null,
description varchar(255) not null
);
create table csJunction ( -- JUNCTION table for cat / sub categories
-- Note: you could ditch the id below, and go with composite PK on (catId,subCatId)
-- but this makes the PK (primary key) thinner when used elsewhere
id int auto_increment primary key,
catId int not null,
subCatId int not null,
CONSTRAINT fk_csj_cat FOREIGN KEY (catId) REFERENCES cat(id),
CONSTRAINT fk_csj_subcat FOREIGN KEY (subCatId) REFERENCES subcat(id),
unique key (catId,subCatId) -- prevents duplicates
);
insert cat(code,description) values('A','descr for A'),('B','descr for B'); -- id's 1,2 respectively
insert subcat(code,description) values('b','descr for b'),('c','descr for c'),('d','descr for d'); -- id's 1,2,3
insert subcat(code,description) values('r','descr for r'),('z','descr for z'); -- id's 4,5
-- Note that due to the thinness of PK's, chosen for performance, the below is by ID
insert csJunction(catId,subCatId) values(1,1),(1,2),(1,3); -- A gets b,c,d
insert csJunction(catId,subCatId) values(2,4),(2,5); -- B gets r,z
Good Errors
The following errors are good and expected, data is kept clean
insert csJunction(catId,subCatId) values(2,4); -- duplicates not allowed (Error: 1062)
insert csJunction(catId,subCatId) values(13,4); -- junk data violates FK constraint (Error: 1452)
Other comments
In response to your comments, data is cached only in so far as mysql has a Most Recently Used (MRU) strategy, no more or less than any data cached in memory versus physical lookup.
The fact that B may contain not only z,r at the moment, but it could also contain c as does A, does not mean there is a repeat. And as seen in the schema, no parent can duplicate its containment (or repeat) of a child, which would be a data problem anyway.
Note that one could easily go the route of PK's in cat and subcat using the code column. That would unfortunately cause wide indexes, and even wider composite indexes for the junction table. That would slow operations down considerably. Though the data maintenance could be visually more appealing, I lean toward performance over appearance any day.
I will add to this Answer when time permits to show such things as "What categories contain a certain subcategory", deletes, etc.
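The "What categories contain a certain subcategory" query that this answer mentions but does not show could look roughly like the following. It is a sketch under assumptions: SQLite via Python's sqlite3 replaces MySQL, the description columns are dropped for brevity, an extra junction row (2,2) is added so that B also contains c, and the function name categories_containing is invented here.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; description columns omitted for brevity.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE cat (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        code TEXT NOT NULL
    );
    CREATE TABLE subcat (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        code TEXT NOT NULL
    );
    CREATE TABLE csJunction (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        catId    INTEGER NOT NULL REFERENCES cat (id),
        subCatId INTEGER NOT NULL REFERENCES subcat (id),
        UNIQUE (catId, subCatId)
    );
    INSERT INTO cat (code) VALUES ('A'), ('B');                          -- ids 1, 2
    INSERT INTO subcat (code) VALUES ('b'), ('c'), ('d'), ('r'), ('z');  -- ids 1..5
    INSERT INTO csJunction (catId, subCatId)
        VALUES (1,1), (1,2), (1,3),   -- A gets b, c, d
               (2,4), (2,5),          -- B gets r, z
               (2,2);                 -- assumption: B also gets c
""")


def categories_containing(subcat_code):
    """Return the codes of all categories linked to the given subcategory code."""
    rows = conn.execute(
        """
        SELECT c.code
        FROM cat AS c
        JOIN csJunction AS j ON j.catId = c.id
        JOIN subcat AS s ON s.id = j.subCatId
        WHERE s.code = ?
        ORDER BY c.code
        """,
        (subcat_code,),
    ).fetchall()
    return [code for (code,) in rows]


shared = categories_containing("c")
only_b = categories_containing("z")
```

The lookup starts from the subcategory code and walks through the junction table, so sharing c between A and B needs no repeated data anywhere.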
Insert Data into multiple tables in MySQL
Consider two tables, User and UserDetails:
User (UserID, Name, Password)
UserDetails (UserID, FullName, Mobile Number, EMail)
First I will enter details into the User table. Afterwards I wish to enter details into the UserDetails table with respect to the primary key of the first table, i.e. UserID, which is auto-incremented. Consider this scenario:
User: (101, abc, xyz), (102, asd, war)
Now I want to store details in the second table with respect to the primary key where UserID = 102. How can I accomplish this?
Start over with the design. Here is a start that runs through and doesn't blow up. Do the same for email. Keep data normalized and don't cause unnecessary lookups. When you have a lot of constraints, it is a sign that you care about the quality of your data. Not that you don't without constraints, if they are un-constrainable.
We all read on the internet how we should keep main info in one table and details in another. Nice as a broad brush stroke. But yours does not rise to that level. Yours would have way too many tables. See Note1 at bottom about Entities. See Note2 at bottom about performance. See any of us with any broad or specific question you may have.
create table user (
userId int auto_increment primary key,
fullName varchar(100) not null
-- other columns
);
create table phoneType (
phoneType int auto_increment primary key, -- here is the code
long_description varchar(100) not null
-- other columns
);
create table userPhone (
id int auto_increment primary key,
userId int not null,
phone varchar(20) not null,
phoneType int not null,
-- other columns
CONSTRAINT fk_up_2_user FOREIGN KEY (userId) REFERENCES user(userId),
CONSTRAINT fk_up_2_phoneType FOREIGN KEY (phoneType) REFERENCES phoneType(phoneType)
);
Note1: I suspect that your second table as you call it is really a third table, as you try to bring in missing information that really belongs in the Entity.
Entities
Many have come before you crafting our ideas as we slug it out in design. Many bad choices have been made and by yours truly. A good read is third normal form (3NF) for data normalization techniques.
Note2: Performance. Performance needs to be measured both in real-time user and in developer problem solving of data that has run amok. Many developers spend significant time doing data patches for schemas that did not enforce data integrity. So factor that into performance, because those hours add up in those split seconds of User Experience (UX).
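You can try this ('Mob_No' and 'EMail' are placeholders for the real values):
INSERT INTO UserDetails (UserID, FullName, `Mobile Number`, EMail)
SELECT UserID, Name, 'Mob_No', 'EMail'
FROM User
WHERE UserID = 102;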
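When the detail row is written right after the parent row, the usual pattern is to read back the generated key and reuse it; in MySQL that value is available through LAST_INSERT_ID(). The sketch below shows the same idea with Python's sqlite3, where the cursor exposes it as lastrowid. The SQLite stand-in, the simplified column name Mobile, the sample values, and the function add_user are all assumptions made for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; "Mobile Number" is shortened to Mobile.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE User (
        UserID   INTEGER PRIMARY KEY AUTOINCREMENT,
        Name     TEXT NOT NULL,
        Password TEXT NOT NULL
    );
    CREATE TABLE UserDetails (
        UserID   INTEGER PRIMARY KEY REFERENCES User (UserID),
        FullName TEXT NOT NULL,
        Mobile   TEXT NOT NULL,
        EMail    TEXT NOT NULL
    );
""")


def add_user(name, password, full_name, mobile, email):
    """Insert the parent row, then the detail row keyed by the generated UserID."""
    with conn:  # both inserts commit together, or roll back together on error
        cur = conn.execute(
            "INSERT INTO User (Name, Password) VALUES (?, ?)",
            (name, password),
        )
        user_id = cur.lastrowid  # key generated for the row just inserted
        conn.execute(
            "INSERT INTO UserDetails (UserID, FullName, Mobile, EMail) "
            "VALUES (?, ?, ?, ?)",
            (user_id, full_name, mobile, email),
        )
    return user_id


# Made-up sample data.
first_id = add_user("abc", "xyz", "A B C", "555-0100", "abc@example.com")
second_id = add_user("asd", "war", "A S D", "555-0101", "asd@example.com")
detail_row = conn.execute(
    "SELECT UserID, FullName FROM UserDetails WHERE UserID = ?", (second_id,)
).fetchone()
```

Wrapping both inserts in one transaction means a failed detail insert does not leave a User row without its details.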
Multiple foreign keys from the same table
I'm building a DB for a graduation projects management system. Students are divided into groups. There is a groups table and a faculty table. Each group has an advisor and two examiners. I'm confused here. Should I create 3 FKs from the faculty table, 2 for examiners and 1 for advisor? Here is the SQL code:
create table groups (
groupID NUMBER not null,
nbStudents NUMBER not null,
avgGPA DOUBLE NOT NULL,
projectName varchar(50) not null,
advisorID NUMBER,
examiner1ID NUMBER,
examiner2ID NUMBER,
primary key (groupID)
);
create table faculty (
name varchar(30) not null,
facultyID NUMBER(10) not null,
email varchar(30) not null,
mobile NUMBER(15) not null,
type varchar,
primary key (facultyID)
);
alter table groups add constraint FK_EX1 foreign key (examiner1ID) references faculty (facultyID);
alter table groups add constraint FK_EX2 foreign key (examiner2ID) references faculty (facultyID);
alter table groups add constraint FK_ADV foreign key (advisorID) references faculty (facultyID);
EDIT PENDING... see my first comment.
Just state the foreign keys as you find them.
A foreign key says that a value in a column in a table must appear as a value of a column in another (possibly the same) table where corresponding columns form a key. So in the given design just declare the FKs as you find them.
Although these aren't really FKs. First, in SQL a FK declaration actually declares a foreign superkey. Second, because those columns can be NULL. SQL says how it's going to check them, and it doesn't check when columns are NULL, and that's what you want. But that constraint just isn't a foreign (super)key. We just call it that in an SQL database.
Find statements that describe your application situations then normalize.
It is not non-normalized to have multiple columns per se. That is a common misconception. However it's generally contraindicated for at least an ideal design.
Just find a parameterized statement parameterized by column names for every thing you need to say about a situation. Each statement gets a table.
// group [groupID] contains [nbStudents] students .... and has advisor [advisorID] and ...
groups(groupID,nbStudents,...,advisorID,examinerID)
The rows that make the statement true go in the table. Find all the statements you need to describe your application situations. Fill the tables with the rows that make their statements true.
Find simple statements and rearrange for NULL later.
Notice that the above statement is only true for rows with no NULLs. But you want to say sometimes that no faculty are in those roles. Ideally you just want
// group [groupID] contains [nbStudents] students ... [projectName]
groups(groupID,nbStudents,...,projectName)
// [facultyID] advises [groupID]
advises(facultyID,groupID)
// [facultyID] examines [groupID]
examines(facultyID,groupID)
With constraints about numbers of faculty per group. If you properly write a relational design without nulls then normalize you will get this sort of simple thing.
Don't worry about the number of statements/tables. They just reflect the complexity of the application. But SQL DBMSs generally don't support constraints easily. So for certain reasons to do with SQL or performance we might want to rearrange. But design null-free first. Ie pick straightforward statements & then normalize. Then rearrange soundly. (Some rearranging might de-normalize, but not this particular case.)
Nulls complicate.
One problem with nulls is they complicate table meanings. Your design with nulls has table group holding the rows that make this statement true:
//* group [groupID] contains [nbStudents] students ....
AND ( [advisorID] IS NULL and they have no advisor
OR [advisorID] IS NOT NULL AND advisor [advisorID] advises them)
AND ( [examiner1ID] IS NULL AND [examiner2ID] IS NULL and they have no examiner
OR [examiner1ID] IS NOT NULL AND [examiner2ID] IS NULL AND [examiner1ID] examines them
OR [examiner1ID] IS NULL AND [examiner2ID] IS NOT NULL AND [examiner2ID] examines them
OR [examiner1ID] IS NOT NULL AND [examiner2ID] IS NOT NULL AND [examiner1ID] examines them AND [examiner2ID] examines them)
*//
groups(groupID,nbStudents,...,advisorID,examinerID)
Unless you cut out nulls back to the simple tables above when querying, your query meanings are complicated like this too. Ie queries give rows that make statements like that true.
On top of that when nulls are left in SQL gives you complex answers that do not mean "... and faculty unknown". People have intuitive understanding of such nulls in base tables. But design first simply and soundly. Rearrange later. Be sure you properly cut out null-free parts and leave in null-free parts when you query.
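The null-free arrangement described above (groups, advises, examines) can be sketched as actual tables to show how "no advisor yet" becomes an absent row rather than a NULL. Treat this as one possible reading, not the answer's own schema: it uses SQLite via Python's sqlite3, renames groups to project_group to avoid keyword clashes, assumes one advisor per group (so groupID alone keys advises), uses made-up sample rows, and leaves the "at most two examiners" rule unenforced.

```python
import sqlite3

# Assumption: SQLite stands in for the asker's DBMS; names and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE faculty (
        facultyID INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE project_group (          -- "groups" in the question
        groupID     INTEGER PRIMARY KEY,
        projectName TEXT NOT NULL
    );
    -- [facultyID] advises [groupID]; assumed: at most one advisor per group
    CREATE TABLE advises (
        groupID   INTEGER PRIMARY KEY REFERENCES project_group (groupID),
        facultyID INTEGER NOT NULL REFERENCES faculty (facultyID)
    );
    -- [facultyID] examines [groupID]
    CREATE TABLE examines (
        facultyID INTEGER NOT NULL REFERENCES faculty (facultyID),
        groupID   INTEGER NOT NULL REFERENCES project_group (groupID),
        PRIMARY KEY (facultyID, groupID)
    );
    INSERT INTO faculty VALUES (10, 'Dr. A'), (20, 'Dr. B'), (30, 'Dr. C');
    INSERT INTO project_group VALUES (1, 'Robot arm'), (2, 'Compiler');
    INSERT INTO advises VALUES (1, 10);
    INSERT INTO examines VALUES (20, 1), (30, 1);
""")

# Groups with no advisor are simply groups with no row in advises.
groups_without_advisor = [g for (g,) in conn.execute("""
    SELECT g.groupID
    FROM project_group AS g
    WHERE NOT EXISTS (SELECT 1 FROM advises AS a WHERE a.groupID = g.groupID)
    ORDER BY g.groupID
""")]

# A second advisor for group 1 violates the key on advises.
try:
    conn.execute("INSERT INTO advises VALUES (1, 20)")
    second_advisor_accepted = True
except sqlite3.IntegrityError:
    second_advisor_accepted = False
```

Each table holds only rows that make its statement true, so the query for unadvised groups needs no IS NULL logic at all.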
MySQL - autoincrement + compound primary key - performance & integrity
I have a database design that makes use of compound primary keys to ensure uniqueness and which are also foreign keys. These tables are then linked to other tables in the same way, so that in the end the compound key can get up to 4 or 5 columns. This led to some rather large JOINs, so I thought a simple solution would be to use an autoincrement column which is not part of the primary key but which is used as part of the primary key of other table(s).
Here is some pseudo code showing the general layout:
CREATE TABLE Item (
id AUTO_INCREMENT,
...
PRIMARY KEY (id)
) ENGINE = InnoDB;
CREATE TABLE PriceCategory (
id AUTO_INCREMENT,
...
PRIMARY KEY (id)
)
CREATE TABLE ItemPriceCategory (
itemId,
priceCategoryId,
id AUTO_INCREMENT,
...
UNIQUE INDEX id,
PRIMARY KEY (eventId, priceCategoryId)
)
CREATE TABLE ClientType (
id AUTO_INCREMENT,
...
PRIMARY KEY (id)
)
CREATE TABLE Price (
itemPriceCategoryId,
clientTypeId,
id AUTO_INCREMENT,
...
UNIQUE INDEX id,
PRIMARY KEY (itemPriceCategoryId, clientTypeId)
)
table Purchase (
priceId,
userId,
amount,
PRIMARY KEY (priceId, userId)
)
The names of tables have been changed to protect the innocent ;-) Also the actual layout is a little deeper in terms of references.
So, my question is, is this a viable strategy, from a performance and data integrity point of view? Is it better to have all keys from all the referenced tables in the Purchase table?
Thanks in advance.
Generally, the advice on primary keys is to have "meaningless", immutable primary keys with a single column. Auto incrementing integers are nice.
So, I would reverse your design - your join tables should also have meaningless primary keys. For instance:
CREATE TABLE ItemPriceCategory (
itemId,
priceCategoryId,
id AUTO_INCREMENT,
...
PRIMARY KEY id,
UNIQUE INDEX (eventId, priceCategoryId)
)
That way, the itemPriceCategoryId column in price is a proper foreign key, linking to the primary key of the ItemPriceCategory table.
You can then use foreign keys (http://dev.mysql.com/doc/refman/5.5/en/innodb-foreign-key-constraints.html) to ensure the consistency of your database.
In terms of performance, broadly speaking, this strategy should be faster than querying compound keys in a join, but with a well-indexed database, you may not actually notice the difference...
I think that something has been lost in translation over here, but I did my best to make an ER diagram of this.
In general, there are two approaches. The first one is to propagate keys and the second one is to have an auto-increment integer as a PK for each table. The second approach is often driven by ORM tools which use a DB as object-persistence storage, while the first one (using key propagation) is more common for hand-crafted DB design.
In general, the model with key propagation offers better performance for "random queries", mostly because you can "skip tables" in joins. For example, in the model with key propagation you can join the Purchase table directly to the Item table to report purchases by ItemName. In the other model you would have to join Price and ItemPriceCategory tables too -- just to get to the ItemID.
Basically, the model with key propagation is essentially relational -- while the other one is object-driven. ORM tools either prefer or enforce the model with separate ID (second case), but offer other advantages for development.
Your example seems to be trying to use some kind of a combination of these two -- not necessarily bad, it would help if you could talk to original designer.
[Diagram: With key propagation]
[Diagram: Independent keys for each table]
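The "skip tables" point can be made concrete with a small propagated-key schema. The following is a rough sketch, not the diagram from this answer: it assumes SQLite via Python's sqlite3, simplified column lists, and made-up rows, and every table carries its parents' key columns so that Purchase holds itemId directly.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL/InnoDB; columns and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Item (itemId INTEGER PRIMARY KEY, itemName TEXT NOT NULL UNIQUE);
    CREATE TABLE PriceCategory (priceCategoryId INTEGER PRIMARY KEY);
    CREATE TABLE ClientType (clientTypeId INTEGER PRIMARY KEY);
    CREATE TABLE ItemPriceCategory (
        itemId          INTEGER NOT NULL REFERENCES Item (itemId),
        priceCategoryId INTEGER NOT NULL REFERENCES PriceCategory (priceCategoryId),
        PRIMARY KEY (itemId, priceCategoryId)
    );
    CREATE TABLE Price (
        itemId          INTEGER NOT NULL,
        priceCategoryId INTEGER NOT NULL,
        clientTypeId    INTEGER NOT NULL REFERENCES ClientType (clientTypeId),
        PRIMARY KEY (itemId, priceCategoryId, clientTypeId),
        FOREIGN KEY (itemId, priceCategoryId)
            REFERENCES ItemPriceCategory (itemId, priceCategoryId)
    );
    CREATE TABLE Purchase (
        itemId          INTEGER NOT NULL,
        priceCategoryId INTEGER NOT NULL,
        clientTypeId    INTEGER NOT NULL,
        userId          INTEGER NOT NULL,
        amount          INTEGER NOT NULL,
        PRIMARY KEY (itemId, priceCategoryId, clientTypeId, userId),
        FOREIGN KEY (itemId, priceCategoryId, clientTypeId)
            REFERENCES Price (itemId, priceCategoryId, clientTypeId)
    );
    INSERT INTO Item VALUES (1, 'Concert'), (2, 'Museum');
    INSERT INTO PriceCategory VALUES (1);
    INSERT INTO ClientType VALUES (1);
    INSERT INTO ItemPriceCategory VALUES (1, 1), (2, 1);
    INSERT INTO Price VALUES (1, 1, 1), (2, 1, 1);
    INSERT INTO Purchase VALUES (1, 1, 1, 100, 2), (1, 1, 1, 101, 3), (2, 1, 1, 100, 1);
""")

# Purchase carries itemId, so Price and ItemPriceCategory are skipped entirely.
purchases_by_item = conn.execute("""
    SELECT i.itemName, SUM(p.amount)
    FROM Purchase AS p
    JOIN Item AS i ON i.itemId = p.itemId
    GROUP BY i.itemName
    ORDER BY i.itemName
""").fetchall()
```

With independent surrogate keys the same report would need two more joins just to reach itemId; the trade-off is the wider key carried in Price and Purchase.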
Is string or int preferred for foreign keys?
I have a user table with userid and username columns, and both are unique. Between userid and username, which would be better to use as a foreign key and why? My Boss wants to use string, is that ok?
Is string or int preferred for foreign keys?
It depends
There are many existing discussions on the trade-offs between Natural and Surrogate Keys - you will need to decide on what works for you, and what the 'standard' is within your organisation.
In the OP's case, there is both a surrogate key (int userId) and a natural key (char or varchar username). Either column can be used as a Primary key for the table, and either way, you will still be able to enforce uniqueness of the other key.
Here are some considerations when choosing one way or the other:
The case for using Surrogate Keys (e.g. UserId INT AUTO_INCREMENT)
If you use a surrogate (e.g. UserId INT AUTO_INCREMENT) as the Primary Key, then all tables referencing table MyUsers should then use UserId as the Foreign Key.
You can still however enforce uniqueness of the username column through use of an additional unique index, e.g.:
CREATE TABLE `MyUsers` (
`userId` int NOT NULL AUTO_INCREMENT,
`username` varchar(100) NOT NULL,
-- ... other columns
PRIMARY KEY(`userId`),
UNIQUE KEY UQ_UserName (`username`)
);
As per @Dagon, using a narrow primary key (like an int) has performance and storage benefits over using a wider (and variable length) value like varchar. This benefit also impacts further tables which reference MyUsers, as the foreign key to userid will be narrower (fewer bytes to fetch).
Another benefit of the surrogate integer key is that the username can be changed easily without affecting tables referencing MyUsers. If the username was used as a natural key, and other tables are coupled to MyUsers via username, it makes it very inconvenient to change a username (since the Foreign Key relationship would otherwise be violated). If updating usernames was required on tables using username as the foreign key, a technique like ON UPDATE CASCADE is needed to retain data integrity.
The case for using Natural Keys (i.e. username)
One downside of using Surrogate Keys is that other tables which reference MyUsers via a surrogate key will need to be JOINed back to the MyUsers table if the Username column is required. One of the potential benefits of Natural keys is that if a query requires only the Username column from a table referencing MyUsers, it need not join back to MyUsers to retrieve the user name, which will save some I/O overhead.
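Both natural-key points, reading the username without a join and renaming it with ON UPDATE CASCADE, can be tried out in a few lines. This sketch rests on assumptions: SQLite via Python's sqlite3 stands in for MySQL (InnoDB also supports ON UPDATE CASCADE), and the Posts table and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; Posts is a hypothetical referencing table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for the cascade to fire in SQLite
conn.executescript("""
    CREATE TABLE MyUsers (
        username TEXT NOT NULL PRIMARY KEY
    );
    CREATE TABLE Posts (
        postId   INTEGER PRIMARY KEY,
        username TEXT NOT NULL
                 REFERENCES MyUsers (username) ON UPDATE CASCADE,
        title    TEXT NOT NULL
    );
    INSERT INTO MyUsers VALUES ('alice');
    INSERT INTO Posts VALUES (1, 'alice', 'Hello');
""")

# Natural key: the username is readable from Posts with no join to MyUsers.
authors_before = conn.execute("SELECT username FROM Posts").fetchall()

# Renaming the user in the parent table propagates to every referencing row.
conn.execute("UPDATE MyUsers SET username = 'alice2' WHERE username = 'alice'")
authors_after = conn.execute("SELECT username FROM Posts").fetchall()
```

The cascade keeps the data consistent, but every referencing row is rewritten on a rename, which is the cost the surrogate-key side of the argument points at.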
An int is 4 bytes, a string can be as many bytes as you like. Because of that, an int will always perform better. Unless of course you stick with usernames that are less than 4 characters long :) Besides, you should never use a column as PK/FK if the data within the column itself can change. Users tend to change their usernames, and even if that functionality doesn't exist in your app right now, maybe it will in a few years. When that day comes, you might have 1000 tables that reference that user-table, and then you'll have to update all 1000 tables within a transaction, and that's just bad.
int will index faster, may or may not be an issue, hard to say based on what you have provided
It depends on the foreign key: If your company has control over it, then I recommend using an Int if there is an ID field for it. However, sometimes an ID field is not on a table because another key makes sense as an alternate unique key. So, the ID field might be a surrogate key in that case. Rule of thumb: Your foreign key data type should match your primary key data type. Here's an exception: what about foreign keys that don't belong to your company? What about foreign keys to databases and APIs that you have no control over? Those IDs should always be strings IMO. To convince you, I ask these questions: Are you doing math on it? Are you incrementing it? Do you have control over it? APIs are notorious for change, even data types CAN be changed in someone else's database... so how much will it mess you up when an int ID becomes a hex?