Need database design advice - MySQL

I have a Questions table which looks like this:
As you can see, there are two id columns that are nearly the same: id and question_id. id is the auto-incremented, unique id of each question. question_id numbers the questions within a lesson: for example, course 1 lesson 1 has 5 questions (1, 2, 3, 4, 5) and course 1 lesson 2 has 3 questions (1, 2, 3), and so on. In other words, it is like an auto-increment field for each unique (course_id, lesson_id) combination. I decided to set question_id manually, based on course_id and lesson_id, to keep the database structure readable and to avoid cluttering the database with a bunch of association tables between unique course, lesson and question ids.
First question is
What do you think: is this a good solution or not? How would you design this database?
The problem is, when I want to insert new question with given course_id, lesson_id the id field will auto-increment, but I got
Second question
How can I get the last question_id value for a given (course_id, lesson_id) combination? For example, if course 1 lesson 2 has 3 questions (1, 2, 3) and I want to add a 4th question (as I said, in the current db design I insert the question_id value manually), how do I know that the last question of course 1 lesson 2 is the 3rd, so that I can insert the new question with question_id = 3 + 1 = 4?
Update
Situation is a bit complicated but I will try to explain:
It's an online tutorials website. There are different courses, and each course has a bunch of lessons. Now I'm designing the question-answer part, in which a teacher posts questions and users get dedicated questions.
Now, the table questions is dedicated for course>lesson based questions.
question_from_to_assoc - this creates associations between the question author and the receiving user. For example, admin (id=.) sends a question to some user with id=5.
qa_assoc - question-answer associations table.

First of all, this is not an optimal database design. Your schema is denormalized, which is not really good.
To answer your first question. I would split Lesson, Course, Question and Author into separate tables. Then I would add a number field beside the Primary Key for Course, Lesson and Question. The PK will only ensure uniqueness of a row, but the number field will be your course number, question number, etc.
Using the PK to represent a question number for instance is not a good idea in my opinion, because it should be kept unchanged. For instance, if your questions are changed to letters instead of numbers, your PK would have to be changed and this might break referential integrity.
After that, I would add a unique constraint on question numbers and FK like [question_no, lesson_id] to ensure that you cannot have two question 1 for the same lesson. Same thing for Lesson. Course would have a unique constraint on course_no.
Finally, to automatically increment question numbers depending on lesson, I would use a trigger which would do something like :
CREATE TRIGGER tr_question_number BEFORE INSERT ON questions
FOR EACH ROW BEGIN
-- COALESCE covers the first question of a lesson, where MAX() returns NULL
SET NEW.question_no = (SELECT COALESCE(MAX(question_no), 0) + 1 FROM questions WHERE lesson_id = NEW.lesson_id FOR UPDATE);
END;
This trigger sets the question number to the latest value + 1 for the lesson. The FOR UPDATE clause is very important, because it locks the rows it reads, so that concurrent insertions do not get the same number.
The trigger is just a draft, but that's just a general idea of what I would have done.
I hope this will help you.
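To make the numbering idea from the trigger above concrete, here is a small runnable sketch. It is only an illustration under assumptions: it uses Python's built-in sqlite3 module instead of MySQL, so there is no trigger and no FOR UPDATE, and the table layout and the add_question helper are made up for the example. The next number is computed inside the INSERT itself, and the UNIQUE constraint on (lesson_id, question_no) acts as the safety net.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE questions (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        lesson_id   INTEGER NOT NULL,
        question_no INTEGER NOT NULL,                   -- number within the lesson
        body        TEXT    NOT NULL,
        UNIQUE (lesson_id, question_no)                 -- no two "question 1" per lesson
    )
""")

def add_question(conn, lesson_id, body):
    """Hypothetical helper: insert a question numbered max+1 within its lesson."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """
            INSERT INTO questions (lesson_id, question_no, body)
            SELECT ?, COALESCE(MAX(question_no), 0) + 1, ?
            FROM questions
            WHERE lesson_id = ?
            """,
            (lesson_id, body, lesson_id),
        )
    return cur.lastrowid

for lesson, text in [(1, "a"), (1, "b"), (2, "c"), (1, "d")]:
    add_question(conn, lesson, text)

numbers = conn.execute(
    "SELECT lesson_id, question_no FROM questions ORDER BY id"
).fetchall()
print(numbers)
```

Each lesson gets its own sequence, independent of the auto-incremented id. In MySQL the same statement would still need the locking discussed above to be safe under concurrent inserts.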

Question 1: No. I would just use that ID column in that questions table as your unique identifier and drop that question_id field. My design:
create table author (
id int(11) NOT NULL auto_increment,
primary key(id), -- an auto_increment column must be a key, and question.author_id references it
name varchar(256)
) engine=innodb;
create table course (
id int(11) not null auto_increment,
primary key(id),
name varchar(256)
) engine=innodb;
create table lesson (
id int(11) not null auto_increment,
primary key(id),
name varchar(256),
course_id int(11) NOT NULL,
FOREIGN KEY(course_id) references course(id)
) engine=innodb;
create table question(
id int(11) not null auto_increment,
primary key(id),
question_text text,
correct_answer text,
lesson_id int(11) NOT NULL,
foreign key(lesson_id) references lesson(id),
author_id int(11) not null,
FOREIGN KEY(author_id) REFERENCES author(id)
) engine=innodb;
Question 2: Don't do that. What if I have course_id 1, lesson 2, and question 11? That ID column would be identical to course_id 1, lesson 21, question 1.
And as an aside, I really hope you're using foreign keys. Since it says you're using mysql, be sure to use the Innodb storage engine with these tables so you can use foreign keys to enforce referential integrity.
The key to querying this database efficiently, avoiding collisions in supposedly unique values, avoiding serious performance issues in the future, and data duplication is to design your database in a normalized manner. My example above is normalized and avoids data duplication, as well as the composite key scheme that would not result in unique keys that you defined above. It's best to work with the features built into MySQL rather than try to reinvent the wheel.
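As a rough illustration of what the foreign keys in the design above buy you, here is a runnable sketch. Assumptions: it uses Python's built-in sqlite3 module rather than MySQL/InnoDB, it leaves out the author table to stay short, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys after this pragma is set on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lesson (
        id        INTEGER PRIMARY KEY,
        name      TEXT,
        course_id INTEGER NOT NULL REFERENCES course(id)
    );
    CREATE TABLE question (
        id            INTEGER PRIMARY KEY,
        question_text TEXT,
        lesson_id     INTEGER NOT NULL REFERENCES lesson(id)
    );
    INSERT INTO course (id, name) VALUES (1, 'SQL basics');
    INSERT INTO lesson (id, name, course_id) VALUES (1, 'Keys', 1);
    INSERT INTO question (question_text, lesson_id) VALUES ('What is a PK?', 1);
""")

def try_insert_orphan(conn):
    """Try to add a question for a lesson that does not exist."""
    try:
        conn.execute(
            "INSERT INTO question (question_text, lesson_id) VALUES ('orphan', 99)"
        )
    except sqlite3.IntegrityError:
        return False  # rejected by the foreign key
    return True

orphan_accepted = try_insert_orphan(conn)

# The course of a question is found through the lesson; question has no course_id.
rows = conn.execute("""
    SELECT c.name, l.name, q.question_text
    FROM question q
    JOIN lesson l ON l.id = q.lesson_id
    JOIN course c ON c.id = l.course_id
""").fetchall()
print(orphan_accepted, rows)
```

The question row never stores course_id, so the course cannot disagree with the lesson it belongs to.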

I personally would set this up differently:
Question table:
id int, -- PK auto-increment
content varchar(50),
answer varchar(50),
authorid int
Course table:
id int, -- PK auto-increment
name varchar(50)
Lesson Table:
id int, -- PK auto-increment
name varchar(50)
Question_Course_Lesson join table:
questionid int, -- PK
courseid int, -- PK
lessonid int -- PK

1.
You should try to get rid of question_id and only leave the autoincrement id as a primary key. Otherwise inserting will get messy.
The problem is, now when I want to insert new question with given course_id, lesson_id the id field will auto-increment.
I don't understand the problem - auto_increment fields usually do that :).
2.
You can use
SELECT MAX(question_id) FROM questions WHERE course_id = <something> AND lesson_id = <some_other_thing>
To get the current max id for a given course_id and lesson_id. But you will have to lock the table (or use FOR UPDATE and a transaction if the table is InnoDB) and unlock it after you insert the new record to make sure it remains consistent.

First question.
Actually, I can see why you might need the "redundant" id.
For example, it may play a role in the process of presenting your questions to test takers.
But it surely does not need to be tied to the values of other columns in the row; the autogenerated id guarantees uniqueness anyway.
Second question.
Use an insert trigger (BEFORE INSERT in MySQL). It is the best way to prevent collisions.

Related

What decides what constraints to use when creating table from a physical schema

I am learning SQL and going through some lab exercises when I got to a question that asks to create a table from a physical schema. No problem there, it is simple enough to create a table, but I got it wrong because I didn't use the NOT NULL, NULL, and FK constraints. So what in this schema tells me what constraints to use? Here is the correct answer according to the exercise. (The auto increment was provided in the question.)
CREATE TABLE customerorder (DonutOrderID INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
CustomerID INT(11) NOT NULL,
DonutOrderTimestamp TIMESTAMP DEFAULT NOW(),
SpecialNotes VARCHAR(500) NULL,
FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID));
A customerorder without a customer makes no sense, so its customer ID column should not be nullable.
A customerorder must refer to an existing customer, so you should make the customer ID a foreign key to the customer table.
SpecialNotes are only special when they are optional in my opinion, so they should be nullable.
If the table is called customerorder, its ID should not be called DonutOrderID, as this name looks somewhat unrelated and I'd expect some additional DonutOrder table in the database. The customerorder's ID should be called id or customerorder_id or the like instead.
As a customer order seems to be a donut order in that database, the DonutOrderTimestamp should probably be obligatory (i.e. not nullable), as every order is placed at some point in time.
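If it helps to see those rules in action, here is a small runnable sketch. Assumptions: it uses Python's built-in sqlite3 module instead of MySQL, the column types are simplified, the customer table is reduced to its key, and the accepted helper is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
    CREATE TABLE customer (CustomerID INTEGER PRIMARY KEY);
    CREATE TABLE customerorder (
        DonutOrderID        INTEGER PRIMARY KEY AUTOINCREMENT,
        CustomerID          INTEGER NOT NULL REFERENCES customer(CustomerID),
        DonutOrderTimestamp TEXT DEFAULT CURRENT_TIMESTAMP,
        SpecialNotes        TEXT  -- nullable: notes are optional
    );
    INSERT INTO customer (CustomerID) VALUES (1);
""")

def accepted(sql):
    """Hypothetical helper: True if the statement passes all constraints."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    # valid customer, no notes: allowed because SpecialNotes is nullable
    "no notes": accepted("INSERT INTO customerorder (CustomerID) VALUES (1)"),
    # order without a customer: rejected by NOT NULL
    "no customer": accepted("INSERT INTO customerorder (CustomerID) VALUES (NULL)"),
    # order for a customer that does not exist: rejected by the foreign key
    "unknown customer": accepted("INSERT INTO customerorder (CustomerID) VALUES (42)"),
}
print(results)
```

Each constraint maps to one business rule from the list above, which is how you would read them off a schema.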

Insert Data into multiple tables in MySQL

Consider two tables User and UserDetails
User (UserID,Name,Password)
UserDetails(UserID,FullName, Mobile Number,EMail)
First I will enter details into User table
Then Afterwards I wish to enter details into UserDetails Table with respect to primary key of first table i.e., UserID which is autoincremented.
consider this scenario..
User: (101, abc, xyz), (102,asd,war)
Now i want to store details in second table with respect to Primary key where UserID= 102
How can I accomplish this?
Start over with the design. Here is a start that runs through and doesn't blow up. Do the same for email. Keep data normalized and don't cause unnecessary lookups. When you have a lot of constraints, it is a sign that you care about the quality of your data. Not that you don't without constraints, if they are un-constrainable.
We all read on the internet how we should keep main info in one table and details in another. Nice as a broad brush stroke. But yours does not rise to that level. Yours would have way too many tables. See Note1 at bottom about Entities. See Note2 at bottom about performance. See any of us with any broad or specific question you may have.
create table user
( userId int auto_increment primary key,
fullName varchar(100) not null
-- other columns
);
create table phoneType
( phoneType int auto_increment primary key, -- here is the code
long_description varchar(100) not null
-- other columns
);
create table userPhone
( id int auto_increment primary key,
userId int not null,
phone varchar(20) not null,
phoneType int not null,
-- other columns
CONSTRAINT fk_up_2_user FOREIGN KEY (userId) REFERENCES user(userId),
CONSTRAINT fk_up_2_phoneType FOREIGN KEY (phoneType) REFERENCES phoneType(phoneType)
);
Note1:
I suspect that your second table as you call it is really a third table, as you try to bring in missing information that really belongs in the Entity.
Entities
Many have come before you crafting our ideas as we slug it out in design. Many bad choices have been made and by yours truly. A good read is third normal form (3NF) for data normalization techniques.
Note2:
Performance. Performance needs to be measured both in real-time user and in developer problem solving of data that has run amok. Many developers spend significant time doing data patches for schemas that did not enforce data integrity. So factor that into performance, because those hours add up in those split seconds of User Experience (UX).
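You can try this (the 'Mob_No' and 'EMail' literals are placeholders for the real values):
INSERT INTO UserDetails (UserID, FullName, `Mobile Number`, EMail)
SELECT UserID, Name, 'Mob_No', 'EMail'
FROM User WHERE UserID = 102;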
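Another common way to do this, not spelled out in the answers above, is to read back the id the database just generated and reuse it for the second insert. In MySQL that value is available through LAST_INSERT_ID(). The sketch below shows the same pattern with Python's built-in sqlite3 module, where the cursor exposes it as lastrowid. The create_user helper, the MobileNumber column name (written without the space) and the sample values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE User (
        UserID   INTEGER PRIMARY KEY AUTOINCREMENT,
        Name     TEXT NOT NULL,
        Password TEXT NOT NULL
    );
    CREATE TABLE UserDetails (
        UserID       INTEGER PRIMARY KEY REFERENCES User(UserID),
        FullName     TEXT,
        MobileNumber TEXT,
        EMail        TEXT
    );
""")

def create_user(conn, name, password, full_name, mobile, email):
    """Hypothetical helper: insert into both tables in one transaction."""
    with conn:  # both inserts commit together or not at all
        cur = conn.execute(
            "INSERT INTO User (Name, Password) VALUES (?, ?)", (name, password)
        )
        user_id = cur.lastrowid  # the auto-generated UserID of the row just inserted
        conn.execute(
            "INSERT INTO UserDetails (UserID, FullName, MobileNumber, EMail) "
            "VALUES (?, ?, ?, ?)",
            (user_id, full_name, mobile, email),
        )
    return user_id

first_id = create_user(conn, "abc", "xyz", "A B C", "555-0100", "abc@example.com")
second_id = create_user(conn, "asd", "war", "A S D", "555-0101", "asd@example.com")
details = conn.execute(
    "SELECT UserID, FullName FROM UserDetails ORDER BY UserID"
).fetchall()
print(first_id, second_id, details)
```

Wrapping both inserts in one transaction means a user row never exists without its details row if the second insert fails.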

SQL: Creating a Relational table with 2 different auto_increment

I have 2 tables, each with their own auto incremented IDs, which are of course primary keys.
When I want to create a 3rd table to establish the relation between these 2 tables, I always have an error.
The first error is that a table can have only 1 automatically incremented column. The second one occurs when I remove the auto_increment statement from those 2 columns: SQL then doesn't allow me to make them foreign keys, because of a type mismatch.
Is there a way that I can create a relational table without losing auto increment features?
Another possible (but not preferred) solution may be there is another primary key in the first table, which is the username of the user, not with an auto_increment statement, of course. Is it inevitable?
Thanks in advance.
1 Concept
You have misunderstood some basic concepts, and the difficulties result from that. We have to address the concepts first, not the problem as you perceive it, and consequently, your problem will disappear.
auto incremented IDs, which are of course primary keys.
No, they are not. That is a common misconception. And problems are guaranteed to ensue.
An ID field cannot be a Primary Key in the English or technical or Relational senses.
Sure, in SQL, you can declare any field to be a PRIMARY KEY, but that doesn't magically transform it into a Primary Key in the English, technical, or Relational senses. You can name a chihuahua "Rottweiler", but that doesn't transform it into a Rottweiler, it remains a chihuahua. Like any language, SQL simply executes the commands that you give it, it does not understand PRIMARY KEY to mean something Relational, it just whacks a unique index on the column (or field).
The problem is, since you have declared the ID to be a PRIMARY KEY, you think of it as a Primary Key, and you may expect that it has some of qualities of a Primary Key. Except for the uniqueness of the ID value, it provides no benefit. It has none of the qualities of a Primary Key, or any sort of Relational Key for that matter. It is not a Key in the English, technical, or Relational senses. By declaring a non-key to be a key, you will only confuse yourself, and you will find out that there is something terribly wrong only when the user complains about duplicates in the table.
2 Relational Model
2.1  Relational tables must have row uniqueness
A PRIMARY KEY on an ID field does not provide row uniqueness. Therefore it is not a Relational table containing rows, and if it isn't that, then it is a file containing records. It doesn't have any of the integrity, or power (at this stage you will be aware of join power only), or speed, that a table in a Relational database has.
Execute this code (MS SQL) and prove it to yourself. Please do not simply read this and understand it, and then proceed to read the rest of this Answer, this code must be executed before reading further. It has curative value.
-- [1] Dumb, broken file
-- Ensures unique RECORDS, allows duplicate ROWS
CREATE TABLE dumb_file (
id INT IDENTITY PRIMARY KEY,
name_first CHAR(30),
name_last CHAR(30)
)
INSERT dumb_file VALUES
( 'Mickey', 'Mouse' ),
( 'Mickey', 'Mouse' ),
( 'Mickey', 'Mouse' )
SELECT *
FROM dumb_file
Notice that you have duplicate rows. Relational tables are required to have unique rows. Further proof that you do not have a relational table, or any of the qualities of one.
Notice that in your report, the only thing that is unique is the ID field, which no user cares about, no user sees, because it is not data, it is some additional nonsense that some very stupid "teacher" told you to put in every file. You have record uniqueness but not row uniqueness.
In terms of the data (the real data minus the extraneous additions), the data name_last and name_first can exist without the ID field. A person has a first name and last name without an ID being stamped on their forehead.
The second thing that you are using that confuses you is the AUTOINCREMENT. If you are implementing a record filing system with no Relational capability, sure, it is helpful, you don't have to code the increment when inserting records. But if you are implementing a Relational Database, it serves no purpose at all, because you will never use it. There are many features in SQL that most people never use.
2.2  Corrective Action
So how do you upgrade, elevate, that dumb_file that is full of duplicate rows to a Relational table, in order to get some of the qualities and benefits of a Relational table ? There are three steps to this.
You need to understand Keys
And since we have progressed from ISAM files of the 1970's, to the Relational Model, you need to understand Relational Keys. That is, if you wish to obtain the benefits (integrity, power, speed) of a Relational Database.
In Codd's Relational Model:
a key is made up from the data
and
the rows in a table must be unique
Your "key" is not made up from the data. It is some additional, non-data parasite, caused by your being infected with the disease of your "teacher". Recognise it as such, and allow yourself the full mental capacity that God gave you (notice that I do not ask you to think in isolated or fragmented or abstract terms, all the elements in a database must be integrated with each other).
Make up a real key from the data, and only from the data. In this case, there is only one possible Key: (name_last, name_first).
Try this code, declare an unique constraint on the data:
-- [2] dumb_file fixed, elevated to table, prevents duplicate rows
-- still dumb
CREATE TABLE dumb_table (
id INT IDENTITY PRIMARY KEY,
name_first CHAR(30),
name_last CHAR(30),
CONSTRAINT UK
UNIQUE ( name_last, name_first )
)
INSERT dumb_table VALUES
( 'Mickey', 'Mouse' ),
( 'Minnie', 'Mouse' )
SELECT *
FROM dumb_table
INSERT dumb_table VALUES
( 'Mickey', 'Mouse' )
Now we have row uniqueness. That is the sequence that happens to most people: they create a file which allows dupes; they have no idea why dupes are appearing in the drop-downs; the user screams; they tweak the file and add an index to prevent dupes; they go to the next bug fix. (They may do so correctly or not, that is a different story.)
The second level. For thinking people who think beyond the fix-its. Since we have now row uniqueness, what in Heaven's name is the purpose of the ID field, why do we even have it ??? Oh, because the chihuahua is named Rotty and we are afraid to touch it.
The declaration that it is a PRIMARY KEY is false, but it remains, causing confusion and false expectations. The only genuine Key there is, is the (name_last, name_first), and it is an Alternate Key at this point.
Therefore the ID field is totally superfluous; and so is the index that supports it; and so is the stupid AUTOINCREMENT; and so is the false declaration that it is a PRIMARY KEY; and any expectations you may have of it are false.
Therefore remove the superfluous ID field. Try this code:
-- [3] Relational Table
-- Now that we have prevented duplicate data, the id field
-- AND its additional index serves no purpose, it is superfluous,
-- like an udder on a bull. If we remove the field AND the
-- supporting index, we obtain a Relational table.
CREATE TABLE relational_table (
name_first CHAR(30),
name_last CHAR(30),
CONSTRAINT PK
PRIMARY KEY ( name_last, name_first )
)
INSERT relational_table VALUES
( 'Mickey', 'Mouse' ),
( 'Minnie', 'Mouse' )
SELECT *
FROM relational_table
INSERT relational_table VALUES
( 'Mickey', 'Mouse' )
Works just fine, works as intended, without the extraneous fields and indices.
Please remember this, and do it right, every single time.
2.3  False Teachers
In these end times, as advised, we will have many of them. Note well, the "teachers" who propagate ID columns, by virtue of the detailed evidence in this post, simply do not understand the Relational Model or Relational Databases. Especially those who write books about it.
As evidenced, they are stuck in pre-1970 ISAM technology. That is all they understand, and that is all that they can teach. They use an SQL database container, for the ease of Access, recovery, backup, etc, but the content is pure Record Filing System with no Relational Integrity, Power, or speed. AFAIC, it is a serious fraud.
In addition to ID fields, of course, there are several items that are key Relational-or-not concepts, that taken together, cause me to form such a grave conclusion. Those other items are beyond the scope of this post.
One particular pair of idiots is currently mounting an assault on First Normal Form. They belong in the asylum.
3  Solution
Now for the rest of your question.
3.1  Answers
Is there a way that I can create a relational table without losing auto increment features?
That is a self-contradicting sentence. I trust you will understand from my explanation, Relational tables have no need for AUTOINCREMENT "features"; if the file has AUTOINCREMENT, it is not a Relational table.
AUTOINCREMENT or IDENTITY is good for one thing only: if, and only if, you want to create an Excel spreadsheet in the SQL database container, replete with fields named A, B, and C, across the top, and record numbers down the left side. In database terms, that is the result of a SELECT, a flattened view of the data, that is not the source of data, which is organised (Normalised).
Another possible (but not preferred) solution may be there is another primary key in the first table, which is the username of the user, not with an auto increment statement, of course. Is it inevitable?
In technical work, we don't care about preferences, because that is subjective, and it changes all the time. We care about technical correctness, because that is objective, and it does not change.
Yes, it is inevitable. Because it is just a matter of time; number of bugs; number of "can't dos"; number of user screams, until you face the facts, overcome your false declarations, and realise that:
the only way to ensure that user rows are unique, that user_names are unique, is to declare an UNIQUE constraint on it
and get rid of user_id or id in the user file
which promotes user_name to PRIMARY KEY
Yes, because your entire problem with the third table, not coincidentally, is then eliminated.
That third table is an Associative Table. The only Key required (Primary Key) is a composite of the two parent Primary Keys. That ensures uniqueness of the rows, which are identified by their Keys, not by their IDs.
I am warning you about that because the same "teachers" who taught you the error of implementing ID fields, teach the error of implementing ID fields in the Associative Table, where, just as with an ordinary table, it is superfluous, serves no purpose, introduces duplicates, and causes confusion. And it is doubly superfluous because the two keys that provide the uniqueness are already there, staring us in the face.
Since they do not understand the RM, or Relational terms, they call Associative Tables "link" or "map" tables. If they have an ID field, they are in fact, files.
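For the mechanics of the third table, here is a runnable sketch of an associative table whose only key is the composite of the two parent keys. Assumptions: it uses Python's built-in sqlite3 module, the student and club tables are invented because the question gives no definitions, and the parents keep their auto-incremented ids as in the question. Note that the two columns in the associative table are plain integers with no auto-increment; they only have to match the type of the parent keys they reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE club (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL UNIQUE
    );
    -- Associative table: no id column, no auto-increment.
    CREATE TABLE student_club (
        student_id INTEGER NOT NULL REFERENCES student(id),
        club_id    INTEGER NOT NULL REFERENCES club(id),
        PRIMARY KEY (student_id, club_id)
    );
""")

conn.execute("INSERT INTO student (name) VALUES ('ann')")   # gets id 1
conn.execute("INSERT INTO club (name) VALUES ('chess')")    # gets id 1
conn.execute("INSERT INTO student_club VALUES (1, 1)")

try:
    # Same pair again: the composite primary key refuses the duplicate row.
    conn.execute("INSERT INTO student_club VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

pair_count = conn.execute("SELECT COUNT(*) FROM student_club").fetchone()[0]
print(duplicate_rejected, pair_count)
```

With an extra id column as the primary key instead, the second insert would have succeeded and the same pair would be stored twice.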
3.2  Lookup Tables
ID fields are a particularly Stupid Thing to Do for Lookup or Reference tables. Most of them have recognisable codes, there is no need to enumerate the list of codes in them, because the codes are (should be) unique.
ENUM is just as stupid, but for a different reason: it locks you into an anti-SQL method, a "feature" in that non-compliant "SQL".
Further, having the codes in the child tables as FKs, is a Good Thing: the code is much more meaningful, and it often saves an unnecessary join:
SELECT ...
FROM child_table -- not the lookup table
WHERE gender_code = 'M' -- FK in the child, PK in the lookup
instead of:
SELECT ...
FROM child_table
WHERE gender_id = 6 -- meaningless to the maintainer
or worse:
SELECT ...
FROM child_table C -- that you are trying to determine
JOIN lookup_table L
ON C.gender_id = L.gender_id
WHERE L.gender_code = 'M' -- meaningful, known
Note that this is something one cannot avoid: you need uniqueness on the lookup code and uniqueness on the description. That is the only method to prevent duplicates in each of the two columns:
CREATE TABLE gender (
gender_code CHAR(2) NOT NULL,
name CHAR(30) NOT NULL,
CONSTRAINT PK
PRIMARY KEY ( gender_code ),
CONSTRAINT AK
UNIQUE ( name )
)
3.3  Full Example
From the details in your question, I suspect that you have SQL syntax and FK definition issues, so I will give the entire solution you need as an example (since you have not given file definitions):
CREATE TABLE user ( -- Typical Identifying Table
user_name CHAR(16) NOT NULL, -- Short PK
name_first CHAR(30) NOT NULL, -- Alt Key.1
name_last CHAR(30) NOT NULL, -- Alt Key.2
birth_date DATE NOT NULL, -- Alt Key.3
CONSTRAINT PK -- unique user_name
PRIMARY KEY ( user_name ),
CONSTRAINT AK -- unique person identification
UNIQUE ( name_last, name_first, birth_date )
);
CREATE TABLE sport ( -- Typical Lookup Table
sport_code CHAR(4) NOT NULL, -- PK Short code
name CHAR(30) NOT NULL, -- AK
CONSTRAINT PK
PRIMARY KEY ( sport_code ),
CONSTRAINT AK
UNIQUE ( name )
);
CREATE TABLE user_sport ( -- Typical Associative Table
user_name CHAR(16) NOT NULL, -- PK.1, FK
sport_code CHAR(4) NOT NULL, -- PK.2, FK
start_date DATE NOT NULL,
CONSTRAINT PK
PRIMARY KEY ( user_name, sport_code ),
CONSTRAINT user_plays_sport_fk
FOREIGN KEY ( user_name )
REFERENCES user ( user_name ),
CONSTRAINT sport_occupies_user_fk
FOREIGN KEY ( sport_code )
REFERENCES sport ( sport_code )
);
There, the PRIMARY KEY declaration is honest, it is a Primary Key; no ID; no AUTOINCREMENT; no extra indices; no duplicate rows; no erroneous expectations; no consequential problems.
3.4  Relational Data Model
Here is the Data Model to go with the definitions.
If you are not used to the Notation, please be advised that every little tick, notch, and mark, the solid vs dashed lines, the square vs round corners, means something very specific. Refer to the IDEF1X Notation.
A picture is worth a thousand words; in this case a standard-compliant picture is worth more than that; a bad one is not worth the paper it is drawn on.
Please check the Verb Phrases carefully, they comprise a set of Predicates. The remainder of the Predicates can be determined directly from the model. If this is not clear, please ask.

Multitable constraints for a column value in sql database

I have a three tables, a project table, there may be many projects, a subjects table, where each project will have many subjects and a condition table where each subject will have a condition and a project may have many conditions.
How to restrict the condition that the subjects can have based on the conditions that the project is linked to given that the subject must be in one of the projects.
Hope that makes sense. Also, I am thinking of using sqlite, but if it is not possible to do something like this with that database system, does there exist one that can? Preferably free and SQL based, i.e. mysql or postgresql.
Thanks.
edit: some examples;
project A has conditions 1, 2 and 3. All are drawn from the condition table which has conditions 1,2,3,4,5. Now subject X is part of project A so should only be allowed to assume conditions 1,2,3 NOT 4 or 5.
Is this possible?
Looks like you need something similar to this:
The key aspect of this design is the usage of identifying relationships and the resulting composite keys. This allows us to migrate PROJECT.PROJECT_ID:
not just directly to SUBJECT
but also through CONDITION and then to SUBJECT.
Both of these "paths" of migration eventually get merged into the same field (note FK1,FK2 in front of SUBJECT.PROJECT_ID), which ensures that when a subject is connected to a condition, they both must be connected to the same project.
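Since the question mentions sqlite, here is a runnable sketch of that idea using Python's built-in sqlite3 module. It is a simplified reading of the design under assumptions: the table and column names are invented, subject uses a plain subject_id key, and the helper assign is hypothetical. The important part is the composite foreign key from subject to project_condition, which shares project_id with the direct foreign key to project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE project (project_id INTEGER PRIMARY KEY);
    CREATE TABLE study_condition (condition_id INTEGER PRIMARY KEY);
    -- Which conditions each project is linked to.
    CREATE TABLE project_condition (
        project_id   INTEGER NOT NULL REFERENCES project(project_id),
        condition_id INTEGER NOT NULL REFERENCES study_condition(condition_id),
        PRIMARY KEY (project_id, condition_id)
    );
    -- project_id takes part in both foreign keys, so the subject's condition
    -- must be one that is linked to the subject's own project.
    CREATE TABLE subject (
        subject_id   INTEGER PRIMARY KEY,
        project_id   INTEGER NOT NULL REFERENCES project(project_id),
        condition_id INTEGER NOT NULL,
        FOREIGN KEY (project_id, condition_id)
            REFERENCES project_condition (project_id, condition_id)
    );
    INSERT INTO project VALUES (1);                           -- project A
    INSERT INTO study_condition VALUES (1), (2), (3), (4), (5);
    INSERT INTO project_condition VALUES (1, 1), (1, 2), (1, 3);
""")

def assign(subject_id, project_id, condition_id):
    """Hypothetical helper: True if the subject row is accepted."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO subject VALUES (?, ?, ?)",
                (subject_id, project_id, condition_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

allowed = assign(1, 1, 2)  # condition 2 belongs to project A
blocked = assign(2, 1, 4)  # condition 4 is not linked to project A
print(allowed, blocked)
```

This mirrors the example in the question: a subject in project A can take conditions 1, 2 or 3, but not 4 or 5.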
create table Condition(
Id int not null, --PK
Description varchar(50)
)
create table ProjectCondition(
Id int not null, --PK
ProjectId int not null, -- FK to Project PK
ConditionId int not null -- FK to Condition PK
)
create table ProjectSubject(
Id int not null, --PK
ProjectId int not null, -- FK to Project PK
SubjectId int not null -- FK to Subject PK
)
create table ProjectSubjectCondition(
Id int not null, -- PK
ProjectConditionId int not null -- FK to ProjectCondition PK
)
Assumptions:
Subject has an existence separate from Project (i.e. there is a
Subject table somewhere)
Condition is the same
(Doesn't make much difference if they're wrong.)
By linking the ProjectSubjectCondition to the ProjectConditions the condition of a subject for a project must be a condition of the project.
Cheers -

MYSQL: Two fields as PRIMARY KEYs + 1 Field as UNIQUE, a question

I have two primary (composite) keys that refer to a shop and a branch.
I thought I should have used a corresponding ID for each row, so I added a UNIQUE + AUTO_INCREMENT named ID.
So I had on the table a column named ID (AUTO INCREMENT), but it was declared PRIMARY - which was done automatically, and I don't want the ID to be PRIMARY. Just the shop and branch.
I have learnt how to trick MYSQL to accept the ID field as UNIQUE and AUTO INCREMENT, as it was not extremely trivial to make the AUTO_INCREMENT (it wanted to make it PRIMARY).
I had to ERASE the ID Field (for some reason it didn't let me erase its PRIMARY index), then declare it INDEX, and only then AUTO INCREMENT.
Is that a good approach ?
Could there be something I am doing wrong going with this design ?
Thanks !!!
The prevailing wisdom is that every table should have a unique autonumbered column named Id.
In classical data modeling, as developed by Codd and Date, the ID field is not necessary for a complete logical model of the data.
What good does the ID field do you? Do you ever reference a row in this table by its ID? If never, then just leave the field out. (shop, branch) provides a perfectly good candidate to be the PK.
What did your create table statement look like? Because I imagine this:
CREATE TABLE foo (
IDCol int not null auto_increment,
shop int not null,
branch int not null,
/* ... */
UNIQUE KEY IDCol (IDCol),
PRIMARY KEY (shop, branch)
);