Multitable constraints for a column value in sql database - mysql

I have three tables: a project table (there may be many projects), a subjects table (each project has many subjects), and a condition table (each subject has one condition, and a project may have many conditions).
How can I restrict the condition a subject can have to the conditions its project is linked to, given that the subject must belong to one of the projects?
Hope that makes sense. Also, I am thinking of using sqlite, but if this is not possible with that database system, is there one that can do it? Preferably free and SQL-based, i.e. mysql or postgresql.
Thanks.
edit: some examples;
project A has conditions 1, 2 and 3. All are drawn from the condition table which has conditions 1,2,3,4,5. Now subject X is part of project A so should only be allowed to assume conditions 1,2,3 NOT 4 or 5.
Is this possible?

Looks like you need something similar to this:
The key aspect of this design is the use of identifying relationships and the resulting composite keys. This allows us to migrate PROJECT.PROJECT_ID:
not just directly to SUBJECT
but also through CONDITION and then to SUBJECT.
Both of these "paths" of migration eventually get merged into the same field (note FK1,FK2 in front of SUBJECT.PROJECT_ID), which ensures that when a subject is connected to a condition, they both must be connected to the same project.
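The composite-key idea can be sketched as follows. This is a minimal illustration, not the original diagram: the table and column names (project, condition, project_condition, subject, subject_no) and the helper assign are hypothetical, a shared condition catalog is assumed as in the question, and SQLite is driven through Python's sqlite3 module only so the sketch runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE project (project_id INTEGER PRIMARY KEY);
CREATE TABLE condition (condition_id INTEGER PRIMARY KEY);
-- which conditions each project is linked to
CREATE TABLE project_condition (
    project_id   INTEGER NOT NULL REFERENCES project(project_id),
    condition_id INTEGER NOT NULL REFERENCES condition(condition_id),
    PRIMARY KEY (project_id, condition_id)
);
CREATE TABLE subject (
    project_id   INTEGER NOT NULL REFERENCES project(project_id),
    subject_no   INTEGER NOT NULL,
    condition_id INTEGER NOT NULL,
    PRIMARY KEY (project_id, subject_no),
    -- the same project_id column takes part in both foreign keys,
    -- so the subject's condition must belong to the subject's project
    FOREIGN KEY (project_id, condition_id)
        REFERENCES project_condition(project_id, condition_id)
);
INSERT INTO project VALUES (1), (2);                    -- project A = 1, project B = 2
INSERT INTO condition VALUES (1), (2), (3), (4), (5);
INSERT INTO project_condition VALUES (1, 1), (1, 2), (1, 3), (2, 4);
""")

def assign(project_id, subject_no, condition_id):
    """Try to store a subject; return True if the database accepted the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO subject VALUES (?, ?, ?)",
                (project_id, subject_no, condition_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = assign(1, 1, 2)  # subject X in project A, condition 2
rejected = assign(1, 2, 4)  # condition 4 is not linked to project A
```

With the data above, the first insert goes through and the second fails on the composite foreign key, which is the behavior the question asks for.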

create table `Condition`( -- CONDITION is a reserved word in MySQL, hence the backticks
Id int not null, -- PK
Description varchar(50)
);
create table ProjectCondition(
Id int not null, -- PK
ProjectId int not null, -- FK to Project PK
ConditionId int not null -- FK to Condition PK
);
create table ProjectSubject(
Id int not null, -- PK
ProjectId int not null, -- FK to Project PK
SubjectId int not null -- FK to Subject PK
);
create table ProjectSubjectCondition(
Id int not null, -- PK
ProjectConditionId int not null -- FK to ProjectCondition PK
);
Assumptions:
Subject has an existence separate from Project (i.e. there is a
Subject table somewhere)
Condition is the same
(Doesn't make much difference if they're wrong.)
By linking the ProjectSubjectCondition to the ProjectConditions the condition of a subject for a project must be a condition of the project.
Cheers -

Related

What database technique should I use?

I am new to databases. This is for a class. I am using MySQL. I will be accessing the database with PHP. I have two tables already. TableA is for products. TableB is for US States. I want to have data about sales of each product in each state. I have considered this is a many to many relationship.
Technique idea #1:
I have considered making a third table, TableC, that has a column for the state names and a column for each product. My issue with this is that I don't know how to create a relationship between the product rows in TableA and the product columns in TableC. If I add a product to TableA, I want it to automatically add the column in TableC.
Technique idea #2:
Add the product columns to TableB. Same issue as above. Also, seems like a worse design.
Is one of these techniques the right way to do this, or is there another technique?
The art and science of making a good schema revolves around finding the best place to put something, or in many cases, the least inconvenient.
Putting sales data in a table that's intended for geographic information is almost always a mistake. Keep your tables focused on one entity, and where information there exists only when related to other tables, make a "join table" that connects the two and put that data there.
For example:
CREATE TABLE products (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255)
);
CREATE TABLE states (
id INT PRIMARY KEY AUTO_INCREMENT,
code CHAR(2),
name VARCHAR(255)
);
CREATE TABLE state_sales(
id INT PRIMARY KEY AUTO_INCREMENT,
state_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT
);
You can extend that later to include things like the date/month/quarter the sales happened in and so forth.
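To show how the join table answers the "sales of each product in each state" question, here is a small sketch. It assumes SQLite through Python's sqlite3 module in place of MySQL, adds foreign keys that the schema above leaves out, and uses made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE states (id INTEGER PRIMARY KEY, code TEXT, name TEXT);
CREATE TABLE state_sales (
    id         INTEGER PRIMARY KEY,
    state_id   INTEGER NOT NULL REFERENCES states(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity   INTEGER
);
-- sample data, invented for illustration
INSERT INTO products (id, name) VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO states (id, code, name) VALUES (1, 'OH', 'Ohio'), (2, 'TX', 'Texas');
INSERT INTO state_sales (state_id, product_id, quantity)
VALUES (1, 1, 10), (1, 1, 5), (2, 2, 7);
""")

# Total quantity per state and product, resolved through the join table.
totals = conn.execute("""
    SELECT s.code, p.name, SUM(ss.quantity)
    FROM state_sales AS ss
    JOIN states   AS s ON s.id = ss.state_id
    JOIN products AS p ON p.id = ss.product_id
    GROUP BY s.code, p.name
    ORDER BY s.code, p.name
""").fetchall()
```

Adding a product is then just a new row in products; no table needs a new column.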

What would be the best table structure for variable amount of combination?

I need some advice for the choice of my table structure.
I am working on a project where I need to save values that are a combination of a variable amount of other values.
For example:
A = b,c,d
B = z,r
I was thinking of saving the combinations in a JSON object inside a column, but I am afraid it would be slow for big requests and hard to filter.
Another option is a fixed set of columns (containing null when not needed), but this would not represent the data well, and filtering would also be hard.
Finally, I thought the best option would be a many-to-many relation, but might the joins be too heavy?
Do you see any other alternative (besides switching to NoSQL)?
This shows the use of Junction tables to avoid saving data in comma separated lists, json, or other mechanisms that would be problematic in at least these areas:
Table scans (slowness, non-use of fast indexes)
Maintenance of data
Data integrity
Schema
create table cat
( -- categories
id int auto_increment primary key,
code varchar(20) not null,
description varchar(255) not null
);
create table subcat
( -- sub categories
id int auto_increment primary key,
code varchar(20) not null,
description varchar(255) not null
);
create table csJunction
( -- JUNCTION table for cat / sub categories
-- Note: you could ditch the id below, and go with composite PK on (catId,subCatId)
-- but this makes the PK (primary key) thinner when used elsewhere
id int auto_increment primary key,
catId int not null,
subCatId int not null,
CONSTRAINT fk_csj_cat FOREIGN KEY (catId) REFERENCES cat(id),
CONSTRAINT fk_csj_subcat FOREIGN KEY (subCatId) REFERENCES subcat(id),
unique key (catId,subCatId) -- prevents duplicates
);
insert cat(code,description) values('A','descr for A'),('B','descr for B'); -- id's 1,2 respectively
insert subcat(code,description) values('b','descr for b'),('c','descr for c'),('d','descr for d'); -- id's 1,2,3
insert subcat(code,description) values('r','descr for r'),('z','descr for z'); -- id's 4,5
-- Note that due to the thinness of PK's, chosen for performance, the below is by ID
insert csJunction(catId,subCatId) values(1,1),(1,2),(1,3); -- A gets b,c,d
insert csJunction(catId,subCatId) values(2,4),(2,5); -- B gets r,z
Good Errors
The following errors are good and expected, data is kept clean
insert csJunction(catId,subCatId) values(2,4); -- duplicates not allowed (Error: 1062)
insert csJunction(catId,subCatId) values(13,4); -- junk data violates FK constraint (Error: 1452)
Other comments
In response to your comments, data is cached only in so far as mysql has a Most Recently Used (MRU) strategy, no more or less than any data cached in memory versus physical lookup.
The fact that B may contain not only z,r at the moment, but it could also contain c as does A, does not mean there is a repeat. And as seen in the schema, no parent can duplicate its containment (or repeat) of a child, which would be a data problem anyway.
Note that one could easily go the route of PK's in cat and subcat using the code column. That would unfortunately cause wide indexes, and even wider composite indexes for the junction table. That would slow operations down considerably. Though the data maintenance could be visually more appealing, I lean toward performance over appearance any day.
I will add to this Answer when time permits to show such things as "What categories contain a certain subcategory", deletes, etc.
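As a rough sketch of the "what categories contain a certain subcategory" lookup and of a delete, the snippet below rebuilds a trimmed version of the schema in SQLite via Python's sqlite3 module. The helper name categories_containing is hypothetical, and the extra junction row (2,2), which lets B also contain c, is added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE cat (id INTEGER PRIMARY KEY, code TEXT NOT NULL);
CREATE TABLE subcat (id INTEGER PRIMARY KEY, code TEXT NOT NULL);
CREATE TABLE csJunction (
    id       INTEGER PRIMARY KEY,
    catId    INTEGER NOT NULL REFERENCES cat(id),
    subCatId INTEGER NOT NULL REFERENCES subcat(id),
    UNIQUE (catId, subCatId)  -- prevents duplicates
);
INSERT INTO cat (id, code) VALUES (1, 'A'), (2, 'B');
INSERT INTO subcat (id, code) VALUES (1, 'b'), (2, 'c'), (3, 'd'), (4, 'r'), (5, 'z');
-- A gets b,c,d; B gets r,z and, for this illustration, also c
INSERT INTO csJunction (catId, subCatId)
VALUES (1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (2, 2);
""")

def categories_containing(subcat_code):
    """Return the codes of all categories linked to the given subcategory code."""
    rows = conn.execute(
        """
        SELECT c.code
        FROM cat AS c
        JOIN csJunction AS j ON j.catId = c.id
        JOIN subcat AS s ON s.id = j.subCatId
        WHERE s.code = ?
        ORDER BY c.code
        """,
        (subcat_code,),
    )
    return [row[0] for row in rows]

before_delete = categories_containing("c")

# Removing a link is a delete on the junction table only; cat and subcat stay untouched.
conn.execute("DELETE FROM csJunction WHERE catId = ? AND subCatId = ?", (2, 2))
after_delete = categories_containing("c")
```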

Need database design advice

I have a Questions table which looks like that
As you can see, there are two id columns that are nearly the same: id and question_id. id is the auto-incremented, unique id of each question. question_id numbers the questions within a lesson: for example, course 1 lesson 1 has 5 questions (1, 2, 3, 4, 5), course 1 lesson 2 has 3 questions (1, 2, 3), and so on. In other words, it is like an auto-increment field for each unique course_id, lesson_id combination. I decided to set question_id manually based on course_id and lesson_id to keep the database structure readable and to avoid cluttering the database with a bunch of association tables between course, lesson, and question unique id values.
First question is
What do you think: is this a good solution or not? How would you design this database?
The problem is, when I want to insert new question with given course_id, lesson_id the id field will auto-increment, but I got
Second question
How can I get the last question_id value for a given course_id, lesson_id combination? For example, if course 1 lesson 2 has 3 questions (1, 2, 3) and I want to add a 4th question (as I said, in the current db design I'm inserting the question_id value manually), how do I know that the last question of course 1 lesson 2 is the 3rd, so that I can insert the new question with last_number + 1 = 4?
Update
Situation is a bit complicated but I will try to explain:
It's an online tutorials website. There are different courses, and each course has a bunch of lessons. Now I'm designing the question-answer part, in which a teacher posts questions and users get dedicated questions.
Now, the table questions is dedicated for course>lesson based questions.
question_from_to_assoc - creates associations between the question author and the receiving user. For example, admin (id=.) sends a question to the user with id=5.
qa_assoc - question-answer associations table.
First of all, this is not an optimal database design. Your schema is denormalized, which is not really good.
To answer your first question. I would split Lesson, Course, Question and Author into separate tables. Then I would add a number field beside the Primary Key for Course, Lesson and Question. The PK will only ensure uniqueness of a row, but the number field will be your course number, question number, etc.
Using the PK to represent a question number for instance is not a good idea in my opinion, because it should be kept unchanged. For instance, if your questions are changed to letters instead of numbers, your PK would have to be changed and this might break referential integrity.
After that, I would add a unique constraint on question numbers and FK like [question_no, lesson_id] to ensure that you cannot have two question 1 for the same lesson. Same thing for Lesson. Course would have a unique constraint on course_no.
Finally, to automatically increment question numbers depending on lesson, I would use a trigger which would do something like :
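DELIMITER //
CREATE TRIGGER tr_question_number BEFORE INSERT ON questions
FOR EACH ROW BEGIN
-- COALESCE covers the first question of a lesson, where MAX() returns NULL
SET NEW.question_no = (SELECT COALESCE(MAX(question_no), 0) + 1 FROM questions WHERE lesson_id = NEW.lesson_id FOR UPDATE);
END//
DELIMITER ;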
This trigger will set the question number field with the latest value + 1. The FOR UPDATE clause is very important, because it will lock the row to avoid concurrent insertion to get the same number.
The trigger is just a draft, but that's just a general idea of what I would have done.
I hope this will help you.
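For readers who want to try the numbering scheme without a MySQL server, here is a minimal sketch of the same idea done in application code against SQLite (Python's sqlite3 module) instead of in a trigger. The column body and the helper add_question are hypothetical. The unique constraint on (lesson_id, question_no) is the safety net; the explicit transaction plays the role that FOR UPDATE plays above.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so transactions are controlled explicitly with BEGIN / COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
CREATE TABLE questions (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    lesson_id   INTEGER NOT NULL,
    question_no INTEGER NOT NULL,
    body        TEXT,
    UNIQUE (lesson_id, question_no)  -- no two "question 1" rows in one lesson
)
""")

def add_question(lesson_id, body):
    """Insert a question and return the per-lesson number it was given."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX()
    try:
        (next_no,) = conn.execute(
            "SELECT COALESCE(MAX(question_no), 0) + 1 FROM questions WHERE lesson_id = ?",
            (lesson_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO questions (lesson_id, question_no, body) VALUES (?, ?, ?)",
            (lesson_id, next_no, body),
        )
        conn.execute("COMMIT")
        return next_no
    except Exception:
        conn.execute("ROLLBACK")
        raise

numbers = [
    add_question(1, "first question of lesson 1"),
    add_question(1, "second question of lesson 1"),
    add_question(2, "first question of lesson 2"),
    add_question(1, "third question of lesson 1"),
]
```

Each lesson gets its own sequence starting at 1, while id keeps increasing across the whole table.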
Question 1: No. I would just use that ID column in that questions table as your unique identifier and drop that question_id field. My design:
create table author (
id int(11) NOT NULL auto_increment,
primary key(id),
name varchar(256)
) engine=innodb;
create table course (
id int(11) not null auto_increment,
primary key(id),
name varchar(256)
) engine=innodb;
create table lesson (
id int(11) not null auto_increment,
primary key(id),
name varchar(256),
course_id int(11) NOT NULL,
FOREIGN KEY(course_id) references course(id)
) engine=innodb;
create table question(
id int(11) not null auto_increment,
primary key(id),
question_text text,
correct_answer text,
lesson_id int(11) NOT NULL,
foreign key(lesson_id) references lesson(id),
author_id int(11) not null,
FOREIGN KEY(author_id) REFERENCES author(id)
) engine=innodb;
Question 2: Don't do that. What if I have course_id 1, lesson 2, and question 11? That ID column would be identical to course_id 1, lesson 21, question 1.
And as an aside, I really hope you're using foreign keys. Since it says you're using mysql, be sure to use the Innodb storage engine with these tables so you can use foreign keys to enforce referential integrity.
The key to querying this database efficiently, avoiding collisions in supposedly unique values, avoiding serious performance issues in the future, and data duplication is to design your database in a normalized manner. My example above is normalized and avoids data duplication, as well as the composite key scheme that would not result in unique keys that you defined above. It's best to work with the features built into MySQL rather than try to reinvent the wheel.
I personally would set this up differently:
Question table:
id int, -- PK auto-increment
content varchar(50),
answer varchar(50),
authorid int
Course table:
id int, -- PK auto-increment
name varchar(50)
Lesson Table:
id int, -- PK auto-increment
name varchar(50)
Question_Course_Lesson join table:
questionid int, -- PK
courseid int, -- PK
lessonid int -- PK
1.
You should try to get rid of question_id and only leave the autoincrement id as a primary key. Otherwise inserting will get messy.
The problem is, now when I want to insert new question with given course_id, lesson_id the id field will auto-increment.
I don't understand the problem - auto_increment fields usually do that :).
2.
You can use
SELECT MAX(question_id) FROM questions WHERE course_id = <something> AND lesson_id = <some_other_thing>
To get the current max id for a given course_id and lesson_id. But you will have to lock the table (or use FOR UPDATE and a transaction if the table is InnoDB) and unlock it after you insert the new record to make sure it remains consistent.
First question.
Actually, I can see why you might need the "redundant" id.
For example, it may play a role in the process of presenting your questions to test takers.
But it surely does not need to be tied to the values of other columns in the row; the autogenerated id guarantees uniqueness anyway.
Second question.
Use the onInsert trigger. It is the best way to prevent collisions.

MYSQL: Two fields as PRIMARY KEYs + 1 Field as UNIQUE, a question

I have two primary (composite) keys that refer to a shop and a branch.
I thought I should have used a corresponding ID for each row, so I added a UNIQUE + AUTO_INCREMENT named ID.
So I had on the table a column named ID (AUTO INCREMENT), but it was declared PRIMARY - which was done automatically, and I don't want the ID to be PRIMARY. Just the shop and branch.
I have learnt how to trick MYSQL to accept the ID field as UNIQUE and AUTO INCREMENT, as it was not extremely trivial to make the AUTO_INCREMENT (it wanted to make it PRIMARY).
I had to ERASE the ID Field (for some reason it didn't let me erase its PRIMARY index), then declare it INDEX, and only then AUTO INCREMENT.
Is that a good approach ?
Could there be something I am doing wrong going with this design ?
Thanks !!!
The prevailing wisdom is that every table should have a unique autonumbered column named Id.
In classical data modeling, as developed by Codd and Date, the ID field is not necessary for a complete logical model of the data.
What good does the ID field do you? Do you ever reference a row in this table by its ID? If never, then just leave the field out. (shop, branch) provides a perfectly good candidate to be the PK.
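A minimal sketch of the "leave the ID out" option, assuming SQLite through Python's sqlite3 module rather than MySQL; the table name shop_branch, the address column, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE shop_branch (
    shop    INTEGER NOT NULL,
    branch  INTEGER NOT NULL,
    address TEXT,
    PRIMARY KEY (shop, branch)  -- the natural key is the only key
)
""")
conn.execute("INSERT INTO shop_branch VALUES (1, 1, 'Main St')")
conn.execute("INSERT INTO shop_branch VALUES (1, 2, 'High St')")

# A second row for the same (shop, branch) pair is refused by the primary key.
try:
    conn.execute("INSERT INTO shop_branch VALUES (1, 1, 'Elm St')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Rows are addressed by the composite key, so no surrogate ID is needed.
row = conn.execute(
    "SELECT address FROM shop_branch WHERE shop = ? AND branch = ?", (1, 2)
).fetchone()
```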
What did your create table statement look like? Because I imagine this:
CREATE TABLE foo (
IDCol int not null auto_increment,
shop int not null,
branch int not null,
/* ... */
UNIQUE KEY IDCol (IDCol),
PRIMARY KEY (shop, branch)
);

Multiple possible relationships on table

I have a table that contains account information for various entities in the database. Currently the table design is something like:
CREATE TABLE account (id int(11) NOT NULL auto_increment,
account_id int(11) NOT NULL,
account_type varchar(15) NOT NULL,
balance decimal(12,2) NOT NULL,
PRIMARY KEY (id))
The account_id column references (not database enforced) one of 3 tables. The account_type column tells the programmer which table to reference. I do not like this approach, because I cannot enforce the relationship and the programmers can accidentally corrupt the data. I have considered doing one of the following:
Adding a nullable foreign key for each type, or dropping the account_id column and adding a cross reference table to link the account to the entities. The account_type column would be used to tell the programmers which cross reference table to access. Are there any other options? What is the best practice for something like this?
You could try having a master identity table from which the three shared-identity tables draw their primary keys. Your account table in the question would then link to the master table. Loosely described:
MasterIdentity
Id (autoincrement)
IdentityType (string, maybe FK to a type lookup table, whatever you want)
Table1
Id (PK, FK to MasterIdentity)
other data
Table2
Id (PK, FK to MasterIdentity)
other data
Table3
Id (PK, FK to MasterIdentity)
other data
Account
Id (its own identifier as you already have)
AccountID (FK to MasterIdentity)
other data
Inserting into any of the three tables would involve inserting into MasterIdentity, grabbing the scope identity value from the insert, and inserting into the desired table directly specifying the Id. (This would all have to be atomic within a transaction, of course.) Note that the Id on the three tables are not auto-increment values, you'd provide them.
Then any table which needs to refer to those three (non-overlapping, I assume) tables would have a single table to refer to which has the identity and the type, the latter of which tells you which sub-table has the rest of that record's data.
(I'm pretty sure this is called a supertype/subtype table relationship, but I can't say for certain.)
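The outline above could look roughly like this. It is a sketch under several assumptions: SQLite through Python's sqlite3 module stands in for MySQL, only two subtype tables are shown, and the names customer, vendor, create_entity, and owner_type are hypothetical. The cursor's lastrowid is used here where the text speaks of grabbing the identity value from the insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE master_identity (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    identity_type TEXT NOT NULL
);
-- subtype tables take their primary key from master_identity
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY REFERENCES master_identity(id),
    name TEXT
);
CREATE TABLE vendor (
    id   INTEGER PRIMARY KEY REFERENCES master_identity(id),
    name TEXT
);
CREATE TABLE account (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    account_id INTEGER NOT NULL REFERENCES master_identity(id),
    balance    NUMERIC NOT NULL
);
""")

SUBTYPE_TABLES = ("customer", "vendor")

def create_entity(kind, name):
    """Insert the master row and the subtype row in one transaction."""
    if kind not in SUBTYPE_TABLES:  # whitelist, since the table name is interpolated
        raise ValueError(kind)
    with conn:  # both inserts commit together or not at all
        cur = conn.execute(
            "INSERT INTO master_identity (identity_type) VALUES (?)", (kind,)
        )
        new_id = cur.lastrowid
        conn.execute(f"INSERT INTO {kind} (id, name) VALUES (?, ?)", (new_id, name))
    return new_id

def owner_type(account_row_id):
    """Tell which subtype table holds the owner of the given account row."""
    row = conn.execute(
        """
        SELECT m.identity_type
        FROM account AS a
        JOIN master_identity AS m ON m.id = a.account_id
        WHERE a.id = ?
        """,
        (account_row_id,),
    ).fetchone()
    return row[0] if row else None

customer_id = create_entity("customer", "Acme")
vendor_id = create_entity("vendor", "Bolt")
with conn:
    conn.execute(
        "INSERT INTO account (account_id, balance) VALUES (?, ?)", (customer_id, 100)
    )
```

Because account.account_id is now a real foreign key, an account pointing at a non-existent entity is rejected by the database rather than relying on the programmer.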