I have three simple entities:
Course is like a Book, a sellable product. The Course entity represents a course and has various properties, such as Duration, Fees, Author, Type and so on.
Course
{
int Id;
string Title;
}
A topic is like an individual page in a Book, it has the actual learning content. A topic may appear in multiple courses.
Topic
{
int Id;
string Title;
}
In the context of a book, a Quiz is also an individual page, one that holds questions instead of learning content. Again, a Quiz may appear in multiple courses.
Quiz
{
int Id;
string Title;
}
Now that I have individual Topics and Quizzes, I wish to have a table that assembles Topics and Quizzes into a Book. Think of this table as the Table of Contents of a book. Below is an outline of what I expect it to look like:
CourseContents
{
int CourseId; // Foreign-Key to Courses.Id
int Page; // Foreign-Key to either Topic.Id or Quiz.Id
int SNo; // Sequence of this page (topic/quiz) in the course, much like page number in a book.
int Type; // Type of the page, i.e. Quiz or Topic.
}
Is there any way to achieve this in an RDBMS?
Attempted Solution
One approach I am looking at is a table that creates a unique identifier for a given course item, which is then used in the mapping tables Courses-Topics and Courses-Quizzes. Please refer to the outline below:
CourseContents
{
int Id; // CourseContentId Primary-Key for this table
int CourseId; // Foreign key to Course.Id
int SNo; // Serial number of an item in this course;
}
CourseTopics
{
int TopicId; // Foreign-Key to Topics.Id
int CourseContentsId; // Foreign-Key to CourseContents.Id
}
CourseQuizzes
{
int QuizId; // Foreign-Key to Quizzes.Id
int CourseContentsId; // Foreign-Key to CourseContents.Id
}
Problem: The CourseContentId represents a particular position (of a Topic or Quiz) in a particular course. Two items cannot occupy the same position in a course sequence, hence one CourseContentId must be associated with just one item in either CourseTopics or CourseQuizzes. How can we put a unique constraint on CourseContentsId across two tables?
Further Addition
The problem above can be solved by adding a ContentType column to the CourseContents, CourseTopics and CourseQuizzes tables, then applying constraints to make sure that:
CourseContents has a unique combination of CourseContentId and ContentType.
CourseTopics and CourseQuizzes each allow only their own ContentType value (a Check constraint).
CourseTopics and CourseQuizzes each have a foreign key referencing CourseContents(CourseContentId, ContentType).
This ensures that a CourseContentId cannot appear in both tables.
The CourseContentId represent a particular position ( of Topic/Quiz ) in a particular course.
CourseTopics
{
int TopicId; // Foreign-Key to Topics.Id
int CourseContentsId; -- first of 3-part FK
int Page; -- added
int SNo; -- added
PRIMARY KEY(TopicId, CourseContentsId, Page, SNo), -- for JOINing one way
INDEX (CourseContentsId, Page, SNo, TopicId) -- for JOINing the other way
}
Meanwhile, ...
I guess that your main Problem is embodied in this one line:
int Page; // Foreign-Key to either Topic.Id or Quiz.Id
That is impractical. The solution is to have a single table for Topic and Quiz and differentiate from there.
CREATE TABLE CourseContents (
CourseContentsId INTEGER NOT NULL PRIMARY KEY,
CourseContentType CHAR(1) CHECK (CourseContentType IN ('T', 'Q')),
CourseId INTEGER REFERENCES Courses(Id),
SNo INTEGER NOT NULL,
CONSTRAINT UniqueCourseContent UNIQUE (CourseId, SNo),
CONSTRAINT UniqueCourseContentMapping UNIQUE (CourseContentsId, CourseContentType)
);
The CourseContents table generates a unique id (CourseContentsId) for each CourseId and SNo combination, which can then be referenced from the CourseTopics and CourseQuizzes tables. Because there are two different content tables (Topics and Quizzes), we introduce another column that identifies the type of content (Topic or Quiz) the entry is linked to. The composite UNIQUE constraint on CourseContentsId and CourseContentType gives the mapping tables a target for a composite foreign key, which is how we make sure that each entry can be linked to only one content type.
CREATE TABLE CourseTopics (
CourseContentsId INTEGER NOT NULL,
CourseContentType CHAR(1) DEFAULT 'T' CHECK (CourseContentType = 'T'),
TopicId INTEGER REFERENCES Topics(Id),
PRIMARY KEY (CourseContentsId, CourseContentType),
FOREIGN KEY (CourseContentsId, CourseContentType) REFERENCES CourseContents (CourseContentsId, CourseContentType)
);
The CourseTopics table is the mapping table between Topics and Courses (there is a many-to-many relationship between the Courses and Topics tables). The primary key, together with the foreign key to CourseContents, ensures that there is at most one entry for each CourseContent (in other words, for each Course and SNo). The table restricts CourseContentType to 'T', which means a given CourseContentsId must have the content type Topic in order to be linked with a Topic.
CREATE TABLE CourseQuizzes (
CourseContentsId INTEGER NOT NULL,
CourseContentType CHAR(1) DEFAULT 'Q' CHECK (CourseContentType = 'Q'),
QuizId INTEGER REFERENCES Quizzes(Id),
PRIMARY KEY (CourseContentsId, CourseContentType),
FOREIGN KEY (CourseContentsId, CourseContentType) REFERENCES CourseContents (CourseContentsId, CourseContentType)
);
The CourseQuizzes table is built the same way as CourseTopics. The only difference is that CourseContentType is restricted to 'Q'.
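To check that these constraints behave as described, here is a minimal sketch that loads the same three tables into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for whichever RDBMS is actually used, and the sample rows and the helper name is_rejected are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Courses (Id INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Topics  (Id INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Quizzes (Id INTEGER PRIMARY KEY, Title TEXT);

CREATE TABLE CourseContents (
    CourseContentsId  INTEGER NOT NULL PRIMARY KEY,
    CourseContentType CHAR(1) NOT NULL CHECK (CourseContentType IN ('T', 'Q')),
    CourseId          INTEGER NOT NULL REFERENCES Courses(Id),
    SNo               INTEGER NOT NULL,
    UNIQUE (CourseId, SNo),
    UNIQUE (CourseContentsId, CourseContentType)
);
CREATE TABLE CourseTopics (
    CourseContentsId  INTEGER NOT NULL,
    CourseContentType CHAR(1) NOT NULL DEFAULT 'T' CHECK (CourseContentType = 'T'),
    TopicId           INTEGER NOT NULL REFERENCES Topics(Id),
    PRIMARY KEY (CourseContentsId, CourseContentType),
    FOREIGN KEY (CourseContentsId, CourseContentType)
        REFERENCES CourseContents (CourseContentsId, CourseContentType)
);
CREATE TABLE CourseQuizzes (
    CourseContentsId  INTEGER NOT NULL,
    CourseContentType CHAR(1) NOT NULL DEFAULT 'Q' CHECK (CourseContentType = 'Q'),
    QuizId            INTEGER NOT NULL REFERENCES Quizzes(Id),
    PRIMARY KEY (CourseContentsId, CourseContentType),
    FOREIGN KEY (CourseContentsId, CourseContentType)
        REFERENCES CourseContents (CourseContentsId, CourseContentType)
);
"""


def build_db():
    """Create the schema and one course with a topic at SNo 1 and a quiz at SNo 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Courses VALUES (1, 'SQL Basics')")
    conn.execute("INSERT INTO Topics VALUES (10, 'Joins')")
    conn.execute("INSERT INTO Quizzes VALUES (20, 'Joins quiz')")
    conn.execute("INSERT INTO CourseContents VALUES (100, 'T', 1, 1)")
    conn.execute("INSERT INTO CourseContents VALUES (101, 'Q', 1, 2)")
    conn.execute("INSERT INTO CourseTopics (CourseContentsId, TopicId) VALUES (100, 10)")
    conn.execute("INSERT INTO CourseQuizzes (CourseContentsId, QuizId) VALUES (101, 20)")
    return conn


def is_rejected(conn, sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_db()
# Slot 100 is typed 'T', so no quiz can be attached to it (foreign key fails).
quiz_in_topic_slot = is_rejected(
    conn, "INSERT INTO CourseQuizzes (CourseContentsId, QuizId) VALUES (100, 20)")
# Lying about the type is stopped by the CHECK on CourseQuizzes.
quiz_with_wrong_type = is_rejected(
    conn, "INSERT INTO CourseQuizzes VALUES (100, 'T', 20)")
# A second item at the same position of the same course hits UNIQUE (CourseId, SNo).
duplicate_position = is_rejected(
    conn, "INSERT INTO CourseContents VALUES (102, 'T', 1, 2)")
```

All three statements are expected to be rejected, which is the cross-table uniqueness the question asked for.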
Finally, to simplify querying, we can create a view that joins these tables together. For example, the view below lists CourseId, SNo, ContentType, TopicId and QuizId. In the context of a book, this view tells you what is on a particular page number (SNo) of a given book (Course): the type of content on the page (Topic or Quiz) and the id of that content.
CREATE VIEW CourseContents_All AS
SELECT CourseContents.CourseId, CourseContents.SNo, CourseContents.CourseContentType, CourseTopics.TopicId, CourseQuizzes.QuizId
FROM CourseContents
LEFT JOIN CourseTopics ON (CourseContents.CourseContentsId = CourseTopics.CourseContentsId)
LEFT JOIN CourseQuizzes ON (CourseContents.CourseContentsId = CourseQuizzes.CourseContentsId);
The advantages I see in this approach are:
This structure follows an inheritance pattern, which means we can support more content types by just adding another table and extending the CourseContentType Check constraint in the CourseContents table.
For a given CourseId and SNo I also know the content type, which will certainly help in the application code.
Note: MySQL versions before 8.0.16 parse Check constraints but do not enforce them. On those versions, use triggers instead.
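As a rough illustration of the trigger alternative, the sketch below enforces the 'T'-only rule on CourseTopics with triggers instead of a Check constraint. It uses SQLite trigger syntax so that it runs with Python's standard library; this is an assumption made for the example, and a MySQL trigger would differ in detail (there the body would typically raise the error with SIGNAL SQLSTATE '45000'). The trigger names and the helper try_insert are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Same columns as CourseTopics above, but without the CHECK constraint.
    CREATE TABLE CourseTopics (
        CourseContentsId  INTEGER NOT NULL,
        CourseContentType CHAR(1) NOT NULL DEFAULT 'T',
        TopicId           INTEGER NOT NULL,
        PRIMARY KEY (CourseContentsId, CourseContentType)
    );

    -- Reject new rows whose type is not 'T'.
    CREATE TRIGGER CourseTopics_type_insert
    BEFORE INSERT ON CourseTopics
    FOR EACH ROW WHEN NEW.CourseContentType <> 'T'
    BEGIN
        SELECT RAISE(ABORT, 'CourseContentType must be T');
    END;

    -- Reject updates that would change the type away from 'T'.
    CREATE TRIGGER CourseTopics_type_update
    BEFORE UPDATE OF CourseContentType ON CourseTopics
    FOR EACH ROW WHEN NEW.CourseContentType <> 'T'
    BEGIN
        SELECT RAISE(ABORT, 'CourseContentType must be T');
    END;
""")


def try_insert(conn, content_id, content_type, topic_id):
    """Return True if the row was accepted, False if a trigger or key rejected it."""
    try:
        conn.execute(
            "INSERT INTO CourseTopics VALUES (?, ?, ?)",
            (content_id, content_type, topic_id),
        )
    except sqlite3.DatabaseError:
        return False
    return True


topic_row_accepted = try_insert(conn, 1, "T", 10)
quiz_row_accepted = try_insert(conn, 2, "Q", 10)
```

Both an insert and an update trigger are needed, since a Check constraint would cover both kinds of change.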
TL;DR: Why do we have to add ON table1.column = table2.column?
A related question asks, roughly, why we need foreign keys at all if joining works just fine without them. Here, I'd like to ask the reverse. Given the simplest possible database, like this:
CREATE TABLE class (
class_id INT PRIMARY KEY,
class_name VARCHAR(40)
);
CREATE TABLE student (
student_id INT PRIMARY KEY,
student_name VARCHAR(40),
class_id INT,
FOREIGN KEY(class_id) REFERENCES class(class_id) ON DELETE SET NULL
);
… and a simple join, like this:
SELECT student_id, student_name, class_name
FROM student
JOIN class
ON student.class_id = class.class_id;
… why can't we just omit the ON clause?
SELECT student_id, student_name, class_name
FROM student
JOIN class;
To me, the line FOREIGN KEY(class_id) REFERENCES class(class_id) … in the definition of student already includes all the necessary information for the FROM student JOIN class to have an implicit ON student.class_id = class.class_id condition; but we still have to add it. Why is that?
For this you must consider how the JOIN operation works. It does not check whether your two tables are related. So a simple join without a condition (ON) produces a big result containing every possible combination of rows.
The ON condition filters that result down to the rows you expect.
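The difference is easy to see on a small data set. The sketch below runs both queries from the question against an in-memory SQLite database, which, like MySQL, accepts JOIN without ON and treats it as a cross join (some other databases reject that syntax outright). The sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class (
        class_id INT PRIMARY KEY,
        class_name VARCHAR(40)
    );
    CREATE TABLE student (
        student_id INT PRIMARY KEY,
        student_name VARCHAR(40),
        class_id INT,
        FOREIGN KEY (class_id) REFERENCES class(class_id) ON DELETE SET NULL
    );
    INSERT INTO class VALUES (1, 'Math'), (2, 'Art');
    INSERT INTO student VALUES (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cy', 1);
""")

# No ON clause: every student is paired with every class (3 x 2 = 6 rows).
without_on = conn.execute(
    "SELECT student_name, class_name FROM student JOIN class"
).fetchall()

# With ON: each student is paired only with their own class (3 rows).
with_on = conn.execute(
    "SELECT student_name, class_name FROM student "
    "JOIN class ON student.class_id = class.class_id "
    "ORDER BY student_id"
).fetchall()

print(len(without_on), len(with_on))
```

The foreign key is declared in both cases, yet it has no effect on what the join returns.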
Reposting Damien_The_Unbeliever's comment as an answer:
you don't have to join on foreign keys;
sometimes multiple foreign keys exist between the same pair of tables.
Also, SQL is a crusty language without many shortcuts for the most common use case.
A JOIN condition is an expression that specifies the matching criteria, and it is evaluated during the JOIN process. It can cause a failure only if a syntax error occurs.
A FOREIGN KEY is a rule for the data consistency checking subsystem, and it is checked when data changes. It will cause a failure if the data state (intermediate and/or final) does not match the rule.
In other words, there is nothing in common between them, they are completely different and unrelated things.
I feel like I have to reiterate parts of the question. Please, give it a second read - Dima Parzhitsky
Imagine that your proposal is accepted. I have these tables:
CREATE TABLE users (userid INT PRIMARY KEY);
CREATE TABLE messages (sender INT REFERENCES users (userid),
receiver INT REFERENCES users (userid));
I write SELECT * FROM users JOIN messages.
What reference must be used for joining condition? And justify your assumption...
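To make the ambiguity concrete, here is a small sketch using Python's sqlite3 module with the two tables above and three made-up messages. Both foreign keys are equally valid join conditions, and they give different answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INT PRIMARY KEY);
    CREATE TABLE messages (sender INT REFERENCES users (userid),
                           receiver INT REFERENCES users (userid));
    INSERT INTO users VALUES (1), (2), (3);
    -- user 1 writes to users 2 and 3; user 2 replies to user 1
    INSERT INTO messages VALUES (1, 2), (1, 3), (2, 1);
""")

# Messages per user when joining on the sender foreign key.
sent_counts = conn.execute(
    "SELECT userid, COUNT(*) FROM users "
    "JOIN messages ON users.userid = messages.sender "
    "GROUP BY userid ORDER BY userid"
).fetchall()

# Messages per user when joining on the receiver foreign key.
received_counts = conn.execute(
    "SELECT userid, COUNT(*) FROM users "
    "JOIN messages ON users.userid = messages.receiver "
    "GROUP BY userid ORDER BY userid"
).fetchall()

print(sent_counts)
print(received_counts)
```

An implicit ON clause would have to pick one of these silently, which is why the condition has to be written out.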
I have tables called userAccounts, userProfiles and userSearches.
Each userAccount may have multiple profiles. Each user may have many searches.
I have the db set up and working with this. However, each search may involve several user profiles.
That is, each user account may have a profile for each member of their family.
They then want to search and include all or some of their family members in their search. The way I would like it to work is to have a column in user searches called profiles that holds a list of the profileIDs included in that search. (But as far as I know, you can't do this in SQL.)
The only way I can think of is to have 10 columns called profile1, profile2 ... profile10, place each profileID into a column, and put 0 or NULL in the unused ones. (But this is clearly messy.)
Creating columns of the form name1...nameN is a clear violation of the Zero, One or Infinity Rule of database normalization. Arbitrarily having ten of them is not the right approach; that's an assumption that will prove to be either wildly generous or too constrained most of the time. Since you're using a relational database, try to store your data relationally.
Consider the schema:
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
name VARCHAR(255),
UNIQUE KEY index_on_name (name)
);
CREATE TABLE profiles (
id INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
user_id INT NOT NULL,
name VARCHAR(255),
email VARCHAR(255),
KEY index_on_user_id (user_id)
);
With that you can create zero or more profile records as required. You can also add or remove fields from the profile records without impacting the main user records.
If you ever want to search for all profiles associated with a user:
SELECT ... FROM profiles
LEFT JOIN users ON
users.id=profiles.user_id
WHERE users.name=?
Using a simple JOIN or subquery you can easily exercise this relationship.
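The same idea covers the "list of profile ids per search" from the question: a link table with one row per (search, profile) pair replaces the profile1...profile10 columns. The sketch below is one possible layout, run in SQLite through Python's sqlite3 module; the searches and search_profiles tables, their columns, and the helper profiles_in_search are hypothetical names that do not appear in the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE profiles (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        name TEXT
    );
    -- Hypothetical table: one row per saved search.
    CREATE TABLE searches (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        label TEXT
    );
    -- Hypothetical link table: one row per profile included in a search.
    CREATE TABLE search_profiles (
        search_id  INTEGER NOT NULL REFERENCES searches(id),
        profile_id INTEGER NOT NULL REFERENCES profiles(id),
        PRIMARY KEY (search_id, profile_id)
    );

    INSERT INTO users VALUES (1, 'smith');
    INSERT INTO profiles VALUES (1, 1, 'Alice'), (2, 1, 'Bob'), (3, 1, 'Carol');
    INSERT INTO searches VALUES (1, 1, 'holiday'), (2, 1, 'school');
    INSERT INTO search_profiles VALUES (1, 1), (1, 3), (2, 1), (2, 2), (2, 3);
""")


def profiles_in_search(conn, search_id):
    """Return the names of all profiles included in one search."""
    rows = conn.execute(
        "SELECT p.name FROM search_profiles sp "
        "JOIN profiles p ON p.id = sp.profile_id "
        "WHERE sp.search_id = ? ORDER BY p.name",
        (search_id,),
    )
    return [name for (name,) in rows]


print(profiles_in_search(conn, 1))
print(profiles_in_search(conn, 2))
```

A search can then include zero, one, or any number of profiles without changing the schema.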
For my RSS aggregator, there are four tables that represent RSS and Atom feeds and their articles. Each feed type and entry type will have zero or more categories. In the interest of not duplicating data, I'd like to have only one table for categories.
How can I accomplish this?
One way is to keep categories in one single table - e.g. category - and define an X table for each entity/table that needs 0 or more category associations:
rssFeedXCategory
rssFeedId INT FK -> rssFeed (id)
categoryId INT FK -> category (id)
atomFeedXCategory
atomFeedId INT FK -> atomFeed (id)
categoryId INT FK -> category (id)
and so on.
You can define a composite PK over both columns in each table, but an extra identity column may also be used. When working with an ORM, I also add an extra identity/autoincrement column (e.g. XId INT), so that a single column can be used to identify a row.
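A minimal sketch of this layout, run in SQLite through Python's sqlite3 module, might look as follows. Only the two feed tables are shown; the title and name columns, the sample rows, and the helper feeds_in_category are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE rssFeed  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE atomFeed (id INTEGER PRIMARY KEY, title TEXT);

    -- One X table per entity that needs category associations.
    CREATE TABLE rssFeedXCategory (
        rssFeedId  INTEGER NOT NULL REFERENCES rssFeed(id),
        categoryId INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (rssFeedId, categoryId)
    );
    CREATE TABLE atomFeedXCategory (
        atomFeedId INTEGER NOT NULL REFERENCES atomFeed(id),
        categoryId INTEGER NOT NULL REFERENCES category(id),
        PRIMARY KEY (atomFeedId, categoryId)
    );

    INSERT INTO category VALUES (1, 'tech'), (2, 'news');
    INSERT INTO rssFeed  VALUES (1, 'Feed A');
    INSERT INTO atomFeed VALUES (1, 'Feed B');
    INSERT INTO rssFeedXCategory  VALUES (1, 1), (1, 2);
    INSERT INTO atomFeedXCategory VALUES (1, 1);
""")


def feeds_in_category(conn, name):
    """Return (kind, title) for every feed of either type in the named category."""
    return conn.execute(
        "SELECT 'rss', f.title FROM rssFeed f "
        "JOIN rssFeedXCategory x ON x.rssFeedId = f.id "
        "JOIN category c ON c.id = x.categoryId WHERE c.name = ? "
        "UNION ALL "
        "SELECT 'atom', f.title FROM atomFeed f "
        "JOIN atomFeedXCategory x ON x.atomFeedId = f.id "
        "JOIN category c ON c.id = x.categoryId WHERE c.name = ? "
        "ORDER BY 1",
        (name, name),
    ).fetchall()


print(feeds_in_category(conn, "tech"))
```

Each category is stored once in category, no matter how many feed or entry types refer to it.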
Let’s assume there are some rows in a table cars, and each of these rows has an owner. If this owner were always a person (conveniently situated in a table persons), this would be your standard one-to-many relation.
However, what if the owner could not only be a person, but also a company (in a table companies)? How would this relationship be modeled and how would it be handled in PHP?
My first idea was to create a column person and a column company and check that one of them always stays NULL, while the other is filled – however, that seems somewhat inelegant and becomes impractical once there is a higher number of possible related tables.
My current assumption would be to not simply create the foreign key as an integer column person in the table, but to create a further table called tables, which gives IDs to the tables, and then split the foreign key into two integer columns: owner_table, containing the ID of the table (e.g. 0 for persons and 1 for companies), and owner_id, containing the owner ID.
Is this a viable and practical solution or is there some standard design pattern regarding such issues? Is there a name for this type of problem? And are there any PHP frameworks supporting such relations?
EDIT: Found a solution: Such structures are called polymorphic relations, and Laravel supports them.
There are multiple ways to do it.
You can go with two nullable foreign keys: one referencing company and the other referencing person. Then you can add a check constraint which ensures that exactly one of them is null. With PostgreSQL:
CREATE TABLE car (
<your car fields>
company_id INT REFERENCES company,
person_id INT REFERENCES person,
CHECK ((company_id IS NULL AND person_id IS NOT NULL)
OR (company_id IS NOT NULL AND person_id IS NULL))
);
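To see the check constraint at work, the sketch below builds the same two-foreign-key layout in an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for PostgreSQL here only so that the example is self-contained; the id and name columns, the sample rows, and the helper try_insert_car are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (
        id INTEGER PRIMARY KEY,
        company_id INT REFERENCES company(id),
        person_id  INT REFERENCES person(id),
        -- exactly one of the two owner columns must be set
        CHECK ((company_id IS NULL AND person_id IS NOT NULL)
            OR (company_id IS NOT NULL AND person_id IS NULL))
    );
    INSERT INTO company VALUES (1, 'Acme');
    INSERT INTO person  VALUES (1, 'Ann');
""")


def try_insert_car(conn, company_id, person_id):
    """Return True if the car row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO car (company_id, person_id) VALUES (?, ?)",
            (company_id, person_id),
        )
    except sqlite3.IntegrityError:
        return False
    return True


owned_by_person = try_insert_car(conn, None, 1)    # one owner: accepted
owned_by_company = try_insert_car(conn, 1, None)   # one owner: accepted
owned_by_both = try_insert_car(conn, 1, 1)         # two owners: rejected
owned_by_nobody = try_insert_car(conn, None, None) # no owner: rejected
```

With more than two possible owner tables the CHECK expression grows quickly, which is the impracticality the question mentions.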
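Or you can use table inheritance (beware of its limitations: in PostgreSQL a foreign key that references the parent table does not see rows stored in the child tables, so the reference from car below will not accept owners that were inserted into company or person):
CREATE TABLE car_owner (
car_owner_id SERIAL PRIMARY KEY
);
CREATE TABLE company (
<company fields>
) INHERITS (car_owner);
CREATE TABLE person (
<person fields>
) INHERITS (car_owner);
CREATE TABLE car (
<car fields>
car_owner_id INT REFERENCES car_owner
);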
I have the following data:
CREATE TABLE `groups` (
`bookID` INT NOT NULL,
`groupID` INT NOT NULL,
PRIMARY KEY(`bookID`),
KEY( `groupID`)
);
and a book table which basically has books( bookID, name, ... ), but WITHOUT groupID. There is no way for me to determine what the groupID is at the time of the insert for books.
I want to do this in sqlalchemy. Hence I tried mapping Book to the books joined with groups on book.bookID=groups.bookID.
I made the following:
tb_groups = Table( 'groups', metadata,
Column('bookID', Integer, ForeignKey('books.bookID'), primary_key=True ),
Column('groupID', Integer),
)
tb_books = Table( 'books', metadata,
Column('bookID', Integer, primary_key=True),
)
tb_joinedBookGroup = sql.join( tb_books, tb_groups,
tb_books.c.bookID == tb_groups.c.bookID)
and defined the following mapper:
mapper( Group, tb_groups, properties={
'books': relation(Book, backref='group')
})
mapper( Book, tb_joinedBookGroup )
...
However, when I executed this piece of code, I realized that each book object has a field group, which is a list, and each group object has a books field, which is a single value. I think my definition here must be confusing sqlalchemy about the many-to-one vs one-to-many relationship.
Can someone help me sort this out?
My desired goal is:
g.books = [b, b, b, .. ]
book.group = g
where g is an instance of group, and b is an instance of book.
Pass uselist=False to relation() to say that the property should represent a scalar value, not a collection. This works regardless of whether there is a primary key on this column, but you probably want to define a unique constraint anyway.