I'm not really a DBA, but I was tasked with designing a couple of new tables for a new feature in a web app. The DB is MySQL, using NHibernate as the ORM (though that's probably irrelevant to the question).
I'm going to be modelling various "scenarios" which represent different variations of several designs in the app. Aside from the first scenario and "unstarted" scenarios, each scenario will have a parent scenario it builds from. As a result, we'll end up with a sort of "no-loop / no-merge" tree structure as scenarios are branched from one another.
CREATE TABLE `scenarios` (
`ScenarioID` INT NOT NULL AUTO_INCREMENT,
`DesignID` INT DEFAULT NULL,
`ScenarioTreeID` INT NOT NULL,
`ParentScenarioID` INT DEFAULT NULL,
`Title` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
...
In addition to the scenarios themselves, there's information that's best related to the entire "tree" of scenarios (e.g. what structure the scenarios relate to, etc.). I've tried to factor this data out into another table called scenariotree and reference it from scenarios via ScenarioTreeID. The issue I ran into, from a querying perspective, is that it's important to know what the "root scenario" is when I query the tree (I can't just use WHERE ParentScenarioID IS NULL, as that also matches "unstarted" scenarios). So I tried to set up the table like this:
CREATE TABLE `scenariotree` (
`ScenarioTreeID` INT NOT NULL AUTO_INCREMENT,
`StructureID` INT NOT NULL,
`RootScenario` INT DEFAULT NULL,
...
But then I couldn't create either table due to the circular foreign key references. I realise I can create the tables first and then add the foreign keys (or just turn FK checks off and on again when I'm finished), but should I be doing this? Poking around online, I'm finding conflicting opinions. Basically, what I want to ask is:
"Is this acceptable schema design, or am I going to run into issues down the road? If so, what issues am I likely to have & how might I restructure these tables to avoid them?"
It's fine to have circular references. They are less common than designs without cycles, but they are a legitimate way to model some data structures.
They do require some special handling, as you discovered. That's okay and it's necessary.
You already identified two ways of handling them:
SET FOREIGN_KEY_CHECKS=0; temporarily while you insert the mutually-dependent data. One problem with this is that some people forget to re-enable the checks, and then weeks later discover that their data is full of references that point to non-existent rows.
Create the tables first, then use ALTER TABLE to add the foreign keys after you populate the data. The problem here is that if you need to add new rows to existing tables, you'd have to drop the foreign keys and re-add them every time, and this affects all clients, not just your session.
A couple of other options:
Make one or the other foreign key nullable. When you need to insert mutually-dependent rows in the two tables, insert the one with the nullable FK first, using NULL. Then insert into the other table. Then UPDATE the first table to assign the non-NULL value it should reference (a sketch of this pattern follows below).
Finally, don't use FOREIGN KEY constraints at all. You will still have columns that reference other columns, but it's on the "honor system" instead of being an RDBMS-enforced constraint. This comes with its own risks, of course, because data that is supposed to be a foreign key has no assurance that it is correct. But it gives you total freedom to insert in whatever order you need. You can use a transaction to make sure inserts to both tables happen together.
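For the nullable-FK option, a rough sketch against your two tables might look like the following. It uses only the columns shown above, with made-up values for StructureID, DesignID and Title, wrapped in a transaction so other sessions never see the half-linked state:
START TRANSACTION;
-- 1. Insert the tree row first, leaving the circular reference empty for now
INSERT INTO scenariotree (StructureID, RootScenario) VALUES (42, NULL);
SET @tree_id = LAST_INSERT_ID();
-- 2. Insert the root scenario pointing at the new tree
INSERT INTO scenarios (DesignID, ScenarioTreeID, ParentScenarioID, Title)
VALUES (7, @tree_id, NULL, 'Initial scenario');
SET @root_id = LAST_INSERT_ID();
-- 3. Close the loop
UPDATE scenariotree SET RootScenario = @root_id WHERE ScenarioTreeID = @tree_id;
COMMIT;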
Related
If I create two tables and I want to set one column as a foreign key to another table's column, why the hell am I allowed to set the foreign key column's datatype?
It just doesn't make any sense, or am I missing something? Is there any scenario where a foreign key column has a different datatype on purpose?
To go a little deeper into my concerns: I tried to use pgAdmin to build a simple Postgres DB. I made the first table with a serial primary key. Then I tried to make the foreign key, but with what datatype? I have seen somewhere that serial is bigint unsigned, but that option doesn't even exist in pgAdmin. Of course I could use SQL, but then why am I using a GUI? So I tried Navicat instead; same problem. I feel like with every choice I make another mistake in my DB design...
EDIT:
Perhaps I asked the question the wrong way.
I was allowed to build this structure:
CREATE TABLE "user"
(
id bigint NOT NULL,
CONSTRAINT user_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE TABLE book
(
"user" integer,
CONSTRAINT dependent_user_fkey FOREIGN KEY ("user")
REFERENCES "user" (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
I insert some data into the user table:
INSERT INTO "user"(id)
VALUES (5000000000);
But the following insert fails:
INSERT INTO book("user")
VALUES (5000000000);
with ERROR: integer out of range, which is understandable but an obvious design error.
And my question is: why, when we set the CONSTRAINT, are the data types not validated? If I'm wrong, the answer should contain a scenario where it is useful to have different data types.
Actually it does make sense, and here is why:
In a table, you can in fact set any column as its primary key, so it could be an integer, a double, a string, etc., even though nowadays we mostly use either integers or, more recently, strings as the primary key of a table.
Since the foreign key points to another table's primary key, this is why you need to specify the foreign key's datatype. And it obviously needs to be the same datatype.
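Applied to the example in the question, the fix is simply to give the referencing column the same type as the key it points to. A rough sketch (quoting user, since it is a reserved word in PostgreSQL):
-- make the foreign key column match the bigint primary key it references
ALTER TABLE book ALTER COLUMN "user" TYPE bigint;
-- the insert that previously failed now succeeds
INSERT INTO book ("user") VALUES (5000000000);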
EDIT:
SQL implementations are lax in this case, as we can see: they do allow compatible types (INT and BIGINT, FLOAT or DECIMAL and DOUBLE), but at your own risk, as your example below shows.
However, SQL norms do specify that both datatypes must be the same.
If the datatype is character, they must have the same length; if it is integer, they must have the same size and must both be signed or both unsigned.
You can see for yourself over here, in a chapter from a MySQL book published in 2003.
Hope this answers your question.
To answer your question of why you'd ever want a different type for a foreign vs. primary key... here is one scenario:
I'm in a situation where an extremely large postgres table is running out of integer values for its id sequence. Lots of other, equally large tables have a foreign key to that parent table.
We are upsizing the ID from integer to bigint, both in the parent table and all the child tables. This requires a full table rewrite. Due to the size of the tables and our uptime commitments and maintenance window size, we cannot rewrite all these tables in one window. We have about three months before it blows up.
So between maintenance windows, we will have primary keys and foreign keys with the same numeric value, but different size columns. This works just fine in our experience.
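A rough sketch of what those staged windows might look like (table and column names are made up; each ALTER rewrites the whole table, so each one gets its own maintenance window):
-- Window 1: widen the parent table's primary key
ALTER TABLE parent ALTER COLUMN id TYPE bigint;
-- Windows 2..n: widen each child table's foreign key column, one table per window
ALTER TABLE child_a ALTER COLUMN parent_id TYPE bigint;
ALTER TABLE child_b ALTER COLUMN parent_id TYPE bigint;
In between those windows, an integer foreign key column keeps referencing the bigint primary key, which is exactly the mismatched-size situation described above.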
Even outside an active migration strategy like this, I could see creating a new child table with a bigint foreign key, with the anticipation that "someday" the parent table will get its primary key upsized from integer to bigint.
I don't know if there is any performance penalty with mismatched column sizes. That question is actually what brought me to this page, as I've been unable to find guidance on it online.
(Tangent: Never create any table with an integer id. Go with bigint, no matter what you think your data will look like in ten years. You're welcome.)
I have a number of tables in which I need to reference scene IDs, which could be a SET. The problem is that I need to be able to update a SET in a table that contains my login information for the app. This set needs to expand (or potentially shrink) based on the number of scenes that exist in the DB. Is it possible to do in phpMyAdmin?
From what I've seen in the web interface, I must pre-define the SET values. But I cannot find any info on how to edit the SET's possible values after the column has been created.
What you have is a many-to-many relationship between logins and scenes.
The correct way to implement this is with three tables, for example:
CREATE TABLE logins (login_id INT PRIMARY KEY ...);
CREATE TABLE scenes (scene_id INT PRIMARY KEY ...);
CREATE TABLE login_has_scene (
login_id INT NOT NULL,
scene_id INT NOT NULL,
PRIMARY KEY (login_id, scene_id),
FOREIGN KEY (login_id) REFERENCES logins (login_id),
FOREIGN KEY (scene_id) REFERENCES scenes (scene_id)
);
This way you can add new scenes anytime, and you can reference any scene from any login by adding one row per login-scene pair.
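Usage is then straightforward (the ids here are made up): granting a login access to a scene is one INSERT, and listing a login's scenes is a simple JOIN:
-- give login 1 access to scene 7
INSERT INTO login_has_scene (login_id, scene_id) VALUES (1, 7);
-- list all scenes for login 1
SELECT s.*
FROM scenes AS s
JOIN login_has_scene AS ls ON ls.scene_id = s.scene_id
WHERE ls.login_id = 1;
-- revoke it again
DELETE FROM login_has_scene WHERE login_id = 1 AND scene_id = 7;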
This is better than using a SET because SET requires you to redefine the list of values using ALTER TABLE every time you add a scene, and this will become quite a chore.
Also, a SET column only allows up to 64 distinct values. If you ever want these tables to support more scenes than that, you'd have to add more SET columns, or start recycling scene ids, or something.
The many-to-many table is a much better solution.
Frankly, I have been using MySQL for nearly 20 years, and I've never found a good use for the SET data type.
Hypothetically, I have an ENUM column named Category, and an ENUM column named Subcategory. I will sometimes want to SELECT on Category alone, which is why they are split out.
CREATE TABLE `Bonza` (
`EventId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`Category` ENUM("a", "b", "c") NOT NULL,
`Subcategory` ENUM("x", "y", "z") NOT NULL,
PRIMARY KEY(`EventId`)
) ENGINE=InnoDB;
But not all subcategories are valid for all categories (say, "z" is only valid with "a" and "b"), and it irks me that this constraint isn't baked into the design of the table. If MySQL had some sort of "pair" type (where a column of that type were indexable on a leading subsequence of the value) then this wouldn't be such an issue.
I'm stuck with writing long conditionals in a trigger if I want to maintain integrity between category and subcategory. Or am I better off just leaving it? What would you do?
I suppose the most relationally-oriented approach would be storing an EventCategoryId instead, mapping it to a table containing all valid event type pairs, and joining on that table every time I want to look up the meaning of an event category.
CREATE TABLE `Bonza` (
`EventId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`EventCategoryId` INT UNSIGNED NOT NULL,
PRIMARY KEY(`EventId`),
FOREIGN KEY (`EventCategoryId`) REFERENCES `EventCategories` (`EventCategoryId`)
ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB;
CREATE TABLE `EventCategories` (
`EventCategoryId` INT UNSIGNED NOT NULL,
`Category` ENUM("a", "b", "c") NOT NULL,
`Subcategory` ENUM("x", "y", "z") NOT NULL,
PRIMARY KEY(`EventCategoryId`)
) ENGINE=InnoDB;
-- Now populate this table with valid category/subcategory pairs at installation
Can I do anything simpler? This lookup will potentially cost me complexity and performance in calling code, for INSERTs into Bonza, no?
Assuming that your categories and subcategories don't change that often, and assuming that you're willing to live with a big update when they do, you can do the following:
Use an EventCategories table to control the hierarchical constraint between categories and subcategories. The primary key for that table should be a compound key containing both Category and Subcategory. Reference this table in your Bonza table. The foreign key in Bonza happens to contain both of the columns that you want to filter by, so you don't need to join to get what you're after. It will also be impossible to assign an invalid combination.
-- the referenced table has to exist first, so the FK below can be created
CREATE TABLE `EventCategories` (
`EventCategoryId` INT UNSIGNED NOT NULL,
`Category` CHAR(1) NOT NULL,
`Subcategory` CHAR(1) NOT NULL,
PRIMARY KEY(`Category`, `Subcategory`)
) ENGINE=InnoDB;
CREATE TABLE `Bonza` (
`EventId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`Category` CHAR(1) NOT NULL,
`Subcategory` CHAR(1) NOT NULL,
PRIMARY KEY(`EventId`),
FOREIGN KEY (`Category`, `Subcategory`)
REFERENCES `EventCategories` (`Category`, `Subcategory`)
ON DELETE RESTRICT ON UPDATE CASCADE
) ENGINE=InnoDB;
My thought is: "best" is almost always opinion-based, but still there are some common things that can be said.
Using relational structure
Once you have the issue that not all pairs are valid, you have information that you must store: either which pairs are invalid, or which pairs are valid. Your sample with an additional table is completely valid in terms of a relational DBMS; in fact, if we face such an issue, it is nearly the only way to resolve it at the database-design level. With it:
You're storing the valid pairs. As I've said, you have to store this information somewhere, and here we are, creating a new table.
You're maintaining referential integrity via the FOREIGN KEY, so your data will always be correct and point to a valid pair.
What bad things may happen and how could this impact the performance?
To reconstruct a full row, you'll need a simple JOIN:
SELECT
Bonza.EventId,
EventCategories.Subcategory,
EventCategories.Category
FROM
Bonza
LEFT JOIN EventCategories
ON Bonza.EventCategoryId = EventCategories.EventCategoryId
Performance of this JOIN will be good: you'll do it by the FK, so, by definition, you'll get an index scan. It comes down to index quality (i.e. its cardinality), but in general it will be fast.
How complex is one JOIN? It's a simple operation, but it may add some overhead to complex queries. However, in my opinion that's ok; there's nothing complex in it.
You are able to change the pairs by simply changing the EventCategories data. That is, you can easily remove restrictions on prohibited pairs and this will affect nothing. I see this as a great benefit of this structure. Adding a new restriction, however, isn't so simple, because it requires a DELETE. You've chosen the ON DELETE RESTRICT action for your FK, and that means you'll have to handle all conflicting records before adding the new restriction. This depends, of course, on your application's logic, but think of it another way: if you add a new restriction, shouldn't all conflicting records be removed (because the logic says they should)? If so, then change your FK to ON DELETE CASCADE.
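To make that concrete, a rough sketch of what changing the rules looks like against the question's schema (the EventCategoryId value is made up):
-- allow a new pair: nothing else has to change
INSERT INTO EventCategories (EventCategoryId, Category, Subcategory) VALUES (9, 'c', 'z');
-- forbid it again: with ON DELETE RESTRICT this fails while Bonza rows still use the pair;
-- with ON DELETE CASCADE the referencing Bonza rows are removed along with it
DELETE FROM EventCategories WHERE Category = 'c' AND Subcategory = 'z';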
So: having an additional table is a simple, flexible and actually easy way to resolve your issue.
Storing in one table
You've mentioned that you can use a trigger for this, and that is indeed applicable, so I'll show that it has its weaknesses (along with some benefits). Let's say we create the trigger:
DELIMITER //
CREATE TRIGGER catCheck BEFORE INSERT ON Bonza
FOR EACH ROW
BEGIN
IF NEW.Subcategory = 'z' AND NEW.Category = 'c' THEN
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Invalid category pair';
END IF;
END;//
DELIMITER ;
Obviously, we still have to store the information about how to validate our pairs, but in this case we store the invalid combinations. Once we get invalid data, we catch it inside the trigger and abort the insert, returning a user-defined errno (45000) together with some explanation text. Now, what about complexity and performance?
This way allows you to store your data as it is, in one table. This is a benefit: you get rid of the JOIN, and integrity is maintained by another tool. You may forget about storing pairs and handling them, hiding this logic in the trigger.
So you'll win on SELECT statements: your data always contains valid pairs, and no JOIN is needed.
But, yes, you'll lose on INSERT/UPDATE statements: they will invoke the trigger and, within it, some checking condition. It may be complex (many IF parts), and MySQL will check the conditions one by one. Making it one single condition wouldn't help a lot, because in the worst case MySQL will still check it to the end.
Scalability of this method is poor. Every time you need to add or remove a pair restriction, you'll have to redefine the trigger. Even worse, unlike the JOIN case, you won't be able to do any cascading actions; you'll have to handle everything manually instead.
What to choose?
For the common case, if you don't know for certain what your application's conditions will be, I recommend the JOIN option. It's simple, readable and scalable, and it fits relational DB principles.
For some special cases, you may want to choose the second option. Those conditions would be:
Allowed pairs will never be changed (or will change very rarely)
SELECT statements will be done much, much more often than INSERT/UPDATE statements, and SELECT performance has the highest priority for your application.
I liked this problem, but with this information I would define the set of valid pairs in just one ENUM column:
CategorySubcategory ENUM("ax", "ay", "az", "bx", "by", "bz", "cx", "cy")
I think this will only be useful with a limited set of values; when the set gets bigger, I would personally choose your second option rather than the trigger-based one.
The first reason is purely an opinion: I don't like triggers too much, and they don't like me.
The second reason is that a well-indexed and properly sized reference from one table to another has really high performance.
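A minimal sketch of what the single-column variant might look like (reusing the Bonza table name from the question and the pairs listed above), where selecting on Category alone becomes a prefix match:
CREATE TABLE `Bonza` (
`EventId` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`CategorySubcategory` ENUM('ax','ay','az','bx','by','bz','cx','cy') NOT NULL,
PRIMARY KEY(`EventId`)
) ENGINE=InnoDB;
-- select on Category alone: ENUM values compare as strings, so a prefix match works
SELECT * FROM `Bonza` WHERE `CategorySubcategory` LIKE 'a%';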
I am creating a database for my company that will store many different types of information. The categories are Brightness, Contrast, Chromaticity, etc. Each category has a number of data points which my company would like to start storing.
Normally, I would create a table for each category, which would store the corresponding data (this is how I learned to do it). However, sometimes these categories have "sub-data" which would change the number of fields required in each table.
My question is then how do people handle the inconsistency of data when structuring their databases? Do they just keep adding more tables for extra data or is it something else altogether?
There are a few (and thank goodness only a few) unbendable rules about relational database models. One of them is that if you don't know what to store, you'll have a hard time storing it. Chances are you'll have an even harder time retrieving it.
That said, the reality of business rules is often less clear cut than the ivory tower of database design. Most importantly, you might want or even need a way to introduce a new property without changing the schema.
Here are two feasible ways to go at this:
Use a datastore that specializes in loose or nonexistent schemas (NoSQL and friends). Explaining this in detail is the subject of a CS thesis, not a Stack Overflow answer.
My recommendation: use a separate properties table. Here is how this goes:
Assuming for the sake of argument that your products always have a (unique string) name, an (integer) id, brightness, contrast and chromaticity, plus sometimes an (integer) foo and a (string) bar, consider these tables:
CREATE TABLE products (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50) NOT NULL,
brightness INT,
contrast INT,
chromaticity INT,
UNIQUE INDEX(name)
);
CREATE TABLE properties (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50) NOT NULL,
proptype ENUM('null','int','string') NOT NULL default 'null',
UNIQUE INDEX(name)
);
INSERT INTO properties VALUES
(0,'foo','int'),
(0,'bar','string');
CREATE TABLE product_properties (
id INT PRIMARY KEY AUTO_INCREMENT,
product_id INT NOT NULL,
property_id INT NOT NULL,
intvalue INT NOT NULL,
stringvalue VARCHAR(250) NOT NULL,
UNIQUE INDEX(product_id,property_id)
);
now your "standard" properties would be in the products table as usual, while the "optional" properties would be stored in a row of product_properties, that references the product id and property id, with the value being in intvalue or stringvalue.
Selecting products including their foo if any would look like
SELECT
products.*,
product_properties.intvalue AS foo
FROM products
LEFT JOIN product_properties
ON products.id=product_properties.product_id
AND product_properties.property_id=1
or even
SELECT
products.*,
product_properties.intvalue AS foo
FROM products
LEFT JOIN product_properties
ON products.id=product_properties.product_id
LEFT JOIN properties
ON product_properties.property_id=properties.id
WHERE properties.name='foo' OR properties.name IS NULL
Please understand that this incurs a performance penalty; in fact you trade performance against flexibility: adding another property is nothing more than INSERTing a row in properties, and the schema stays the same.
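For example, introducing a new optional integer property and attaching a value to an existing product is just two INSERTs (the property name gamma and the ids are made up; the second insert assumes the new property received properties.id = 3 and the product has products.id = 1):
INSERT INTO properties VALUES
(0,'gamma','int');
INSERT INTO product_properties (product_id, property_id, intvalue, stringvalue)
VALUES (1, 3, 22, '');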
If you're not MySQL-bound, other databases have table inheritance or arrays to solve some of those niche cases. PostgreSQL is a very nice database that you can use as easily and freely as MySQL.
With MySQL you could:
Change your tables: add the extra columns and allow NULL in the subcategory columns you don't need. This way integrity can still be checked, since you can still put constraints on the columns. Unless you really have a lot of subcategory columns, I'd recommend this; otherwise, option 3.
Store subcategory data dynamically in a separate table that has a category_id, a category_row_id, a subcategory identifier (= type of subcategory) and a value column. That way you can retrieve your data by linking via the category_id (determines the table) and the category_row_id (links to the PK of the original category table row). The bad part: you can't use foreign keys or constraints properly to enforce integrity, and you'd need to write hairy insert/update triggers to keep some control, which pushes the burden of integrity and referential checking solely onto the client (in which case you'd probably be better off going the NoSQL route). In short, I wouldn't recommend this.
You can make a separate subcategory table per category table. Columns can be fixed or variable via value column(s) plus an optional subcategory identifier; foreign keys can still be used, and integrity is easiest to maintain with fixed columns, since you'll have the full range of constraints at your disposal. If you have a lot of subcategory columns that would otherwise hopelessly clutter your regular category table, then I'd recommend this with fixed columns (a sketch of this option follows after this list). Like the previous option, I'd never recommend going dynamic for anything but throwaway data.
Alternatively, if your subcategory data is very variable and volatile: use NoSQL with a document database such as MongoDB. Mind you, you can keep all your regular data in a proper RDBMS and just store the side-data in the document database, though that's probably not recommended.
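As a rough illustration of option 3, a per-category subcategory table with a proper foreign key might look like this (table and column names are invented for the example):
CREATE TABLE brightness (
brightness_id INT PRIMARY KEY AUTO_INCREMENT,
product_id INT NOT NULL,
value INT NOT NULL
);
CREATE TABLE brightness_sub (
brightness_sub_id INT PRIMARY KEY AUTO_INCREMENT,
brightness_id INT NOT NULL,
sub_type VARCHAR(50) NOT NULL, -- which kind of sub-measurement this is
sub_value INT NOT NULL,
FOREIGN KEY (brightness_id) REFERENCES brightness (brightness_id)
);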
If your subcategory data is in a known fixed state and not prone to change, I'd just add the extra columns to the specific category table. Keep in mind that the major feature of a proper DBMS is safeguarding the integrity of your data via checks and constraints; doing away with that is never really a good idea.
If you are not limited to MySQL, you can consider Microsoft SQL Server and its Sparse Columns feature. This will allow you to expand your schema to include however many columns you want, without incurring the storage penalty for columns that are not pertinent to a given row.
Okay, I've been asked to prepare a university database and I am required to store certain data in a certain way.
For example, I need to store a course code that has a letter followed by two integers, e.g. I45, D61, etc.
So it should be VARCHAR(3), am I right? But I am still unsure whether this is the right path. I am also unsure how I am going to enforce this in the SQL script.
I can't seem to find any answer for it in my notes, and I am currently writing the data dictionary for this question before I get into the script.
Any tips?
As much as possible, make your primary key one with no business meaning. You can then easily change your database design without dearly affecting the application layer. With a dumb primary key, users don't associate meaning with the identifier of a record.
What you are inquiring about is termed an intelligent key, which most often is user-visible. A non-user-visible key is called a dumb or surrogate key. Sometimes a non-user-visible key becomes visible, but that's not a problem, as most dumb keys aren't interpreted by the user. As an example, however much you change the title of this question, its id will remain the same: https://stackoverflow.com/questions/10412621/
With an intelligent primary key, sometimes for aesthetic reasons, users want to dictate how the key should be formatted and look. And this can easily end up being updated as often as users feel like it. That will be a problem on the application side, as it entails cascading the changes to related tables, and on the database side too, as cascaded updating of keys on related tables is time-consuming.
Read details here:
http://www.bcarter.com/intsurr1.htm
Advantages of surrogate keys: http://en.wikipedia.org/wiki/Surrogate_key
You can implement a natural key (aka intelligent key) alongside the surrogate key (aka dumb key):
-- Postgresql has text type, it's a character type that doesn't need length,
-- it can be upto 1 GB
-- On Sql Server use varchar(max), this is upto 2 GB
create table course
(
course_id serial primary key, -- surrogate key, aka dumb key
course_code text unique, -- natural key. what's seen by users e.g. 'D61'
course_name text unique, -- e.g. 'Database Structure'
date_offered date
);
The advantage of that approach is that when, at some point in the future, the school expands and decides to offer a Spanish-language-catered Database Structure course, your database is insulated from the user-interpreted values introduced by the users.
Let's say your database started out using an intelligent key:
create table course
(
course_code text primary key, -- natural key. what's seen by users e.g. 'D61'
course_name text unique, -- e.g. 'Database Structure'
date_offered date
);
Then came the Spanish-language-catered Database Structure course. If users introduce their own rules to your system, they might be tempted to enter the course_code value as:
D61/ESP, while others will do it as ESP-D61 or ESP:D61. Things can get out of control if users decide their own rules for primary keys; later they will tell you to query the data based on the arbitrary rules they created for the format of the primary key, e.g. "List me all the Spanish language courses we offer in this school". Epic requirement, isn't it? So what will a good developer do to fit those changes into the database design? They will formalize the data structure and re-design the table to this:
create table course
(
course_code text, -- primary key
course_language text, -- primary key
course_name text unique,
date_offered date,
constraint pk_course primary key(course_code, course_language)
);
Did you see the problem with that? It incurs downtime, as you need to propagate the changes to the foreign keys of the table(s) that depend on the course table, which of course means you first need to adjust those dependent tables. See the trouble it could cause, not only for the DBA but for the dev too.
If you start with a dumb primary key from the get-go, then even if users introduce rules to the system without your knowing, this won't entail massive data changes or schema changes to your database design, and it buys you time to adjust your application accordingly. Whereas if you put intelligence in your primary key, a user requirement such as the above can make your primary key devolve naturally into a composite primary key. And that is hard not only in terms of database restructuring and massive updating of data; it will also be hard for you to quickly adapt your application to the new database design.
create table course
(
course_id serial primary key,
course_code text unique, -- natural key. what's seen by users e.g. 'D61'
course_name text unique, -- e.g. 'Database Structure'
date_offered date
);
So with a surrogate key, even if users stash new rules or information into course_code, you can safely introduce changes to your table without being compelled to quickly adapt your application to the new design. Your application can keep running and won't necessitate downtime; it really buys you time to adjust your app accordingly, any time. These would be the changes for the language-specific courses:
create table course
(
course_id serial primary key,
course_code text, -- natural key. what's seen by users e.g. 'D61'
course_language text, -- natural key. what's seen by users e.g. 'SPANISH'
course_name text unique, -- e.g. 'Database Structure in Spanish'
date_offered date,
constraint uk_course unique (course_code, course_language)
);
As you can see, you can still perform a massive UPDATE statement to split the user-imposed rules on course_code into two fields, which doesn't necessitate changes to the dependent tables. If you use an intelligent composite primary key, restructuring your data will compel you to cascade the changes on the composite primary key to the dependent tables' composite foreign keys. With a dumb primary key, your application will still operate as usual, and you can amend your app to the new design (e.g. a new textbox for the course language) later on, any time. With a dumb primary key, the dependent tables don't need a composite foreign key to point to the course table; they can still use the same old dumb/surrogate primary key.
Also, with a dumb primary key, the size of your primary key and foreign keys won't expand.
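A rough sketch of that massive UPDATE, assuming users had been cramming values like 'ESP-D61' into course_code:
-- split 'ESP-D61'-style values into the two new columns
UPDATE course
SET course_language = split_part(course_code, '-', 1),
    course_code     = split_part(course_code, '-', 2)
WHERE course_code LIKE '%-%';
-- everything else is assumed to be the default-language offering
UPDATE course
SET course_language = 'ENGLISH'
WHERE course_language IS NULL;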
This is the domain solution. Still not perfect, check can be improved, etc.
set search_path='tmp';
DROP DOMAIN IF EXISTS coursename CASCADE;
CREATE DOMAIN coursename AS varchar NOT NULL
CHECK (length(value) = 3
AND SUBSTR(value,1,1) >= 'A' AND SUBSTR(value,1,1) <= 'Z'
AND SUBSTR(value,2,1) >= '0' AND SUBSTR(value,2,1) <= '9'
AND SUBSTR(value,3,1) >= '0' AND SUBSTR(value,3,1) <= '9' )
;
DROP TABLE IF EXISTS course CASCADE;
CREATE TABLE course
( cname coursename PRIMARY KEY
, ztext varchar
, UNIQUE (ztext)
);
INSERT INTO course(cname,ztext)
VALUES ('A11', 'A 11' ), ('B12', 'B 12' ); -- Ok
INSERT INTO course(cname,ztext)
VALUES ('3','Three' ), ('198', 'Butter' ); -- Will fail
BTW: For the "actual" PK, I would probably use a surrogate ID. But the domain above (with UNIQUE constraint) could serve as a "logical" candidate key.
That is basically the result of the Table is Domain paradigm.
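A sketch of that combined version, keeping the domain to validate the code while a surrogate id serves as the actual primary key:
CREATE TABLE course
( course_id serial PRIMARY KEY -- surrogate "actual" PK
, cname coursename UNIQUE      -- domain-validated "logical" candidate key
, ztext varchar
, UNIQUE (ztext)
);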
I strongly recommend you not get too specific about the datatype, so something like VARCHAR(8) would be fine. The reasons are:
Next year there might be four characters in the code. Business needs change all the time, so don't lock down field lengths too much
Let the application layer handle validation - after all, it has to communicate the validation problem to the user
You're adding little or no business value by limiting it to 3 chars
With MySQL (before version 8.0.16), although you can define check constraints on columns (in the hope of "validating" the values), they are parsed but ignored, and are allowed for compatibility reasons only
Of all the components of your system, the database schema is always the hardest thing to change, so allow some flexibility in your data types to avoid changes as much as possible.