Comments on many tables database design issue - mysql

I have tables:
Articles{...}
Recipes{...}
Notifications{...}
Photos{...}
And I need to implement 'user comments' feature (like facebook).
Should I make tables: ArticleComments, RecipesComments etc. with 1:n relationship?
Or create one Comments table for all (but I have no idea how to design this)?

You could create another table CommentableEntity (although call it something better). Each of the rows in your tables (Articles, Recipes etc.) would have a reference to a unique row in this table. The entity table might have a type field to indicate the type of entity (to aid reverse joining).
You can then have a Comment table that references CommentableEntity, in a generic fashion.
So for example you'll end up with the following tables:
Articles
-----------------
Article_id
CommentableEntity_id (fk, unique)
Content
....
Recipes
-----------------
Recipe_id
CommentableEntity_id (fk, unique)
Content
....
CommentableEntity
-----------------
CommentableEntity_id (pk)
EntityType (e.g. 'Recipe', 'Article')
Comment
-------
Comment_id (pk)
CommentableEntity_id (fk)
User_id (fk)
DateAdded
Comment
...etc...
You can add the CommentableEntity record every time you add an Article/Recipe etc. All your comment-handling code has to know is the CommentableEntity_id - it doesn't care what type of thing it is.
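The layout above can be tried out with a minimal sketch. It uses Python's built-in sqlite3 so it runs standalone; the column types, the helper names (add_article, add_comment, comments_for) and the omitted Users table are assumptions made for illustration, and the DDL would need small adjustments for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE CommentableEntity (
    CommentableEntity_id INTEGER PRIMARY KEY,
    EntityType TEXT NOT NULL              -- e.g. 'Recipe', 'Article'
);
CREATE TABLE Articles (
    Article_id INTEGER PRIMARY KEY,
    CommentableEntity_id INTEGER NOT NULL UNIQUE
        REFERENCES CommentableEntity (CommentableEntity_id),
    Content TEXT
);
CREATE TABLE Comment (
    Comment_id INTEGER PRIMARY KEY,
    CommentableEntity_id INTEGER NOT NULL
        REFERENCES CommentableEntity (CommentableEntity_id),
    User_id INTEGER NOT NULL,             -- would reference a Users table (omitted here)
    DateAdded TEXT DEFAULT CURRENT_TIMESTAMP,
    Comment TEXT NOT NULL
);
""")


def add_article(content):
    """Create the CommentableEntity row first, then the article that points at it."""
    entity_id = conn.execute(
        "INSERT INTO CommentableEntity (EntityType) VALUES ('Article')"
    ).lastrowid
    conn.execute(
        "INSERT INTO Articles (CommentableEntity_id, Content) VALUES (?, ?)",
        (entity_id, content),
    )
    return entity_id


def add_comment(entity_id, user_id, text):
    """Comment code only needs the entity id, not the kind of thing it belongs to."""
    conn.execute(
        "INSERT INTO Comment (CommentableEntity_id, User_id, Comment) VALUES (?, ?, ?)",
        (entity_id, user_id, text),
    )


def comments_for(entity_id):
    return conn.execute(
        "SELECT User_id, Comment FROM Comment "
        "WHERE CommentableEntity_id = ? ORDER BY Comment_id",
        (entity_id,),
    ).fetchall()


article_entity = add_article("How to bake bread")
add_comment(article_entity, 42, "Nice article")
```

A Recipes or Photos table would follow the same pattern as Articles, each row pointing at its own CommentableEntity row.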

That depends on how your application will be using comments.
My guess is that you'll frequently want to pull up all the comments a user has created regardless of the entity that they are commenting on. That is, I assume you'll frequently want a query that returns rows indicating that user JohnDoe commented on Article 1, then Photo 12, then Recipe 171. If that's the case, then it would make far more sense to have a single Comments table with a structure similar to what Steve Mayne has suggested with the CommentableEntity table.
On the other hand, if you would only be accessing the comments for a particular item (i.e. all comments for Article 1), separate ArticleComments and PhotoComments tables may be more appropriate. That makes it easier to have foreign keys between the entity table and the comment table and is potentially a bit more efficient since it's a poor man's partitioning. Of course, as soon as you start having to combine data from multiple comment tables, this efficiency goes away so you'd need to be reasonably confident about the use cases.

The easiest way would be to have a 'polymorphic' comments table with columns for both the id and the type of the object that it refers to.
Then you could do the following:
SELECT * FROM Comments WHERE type = 'Articles' AND type_id = 1;
SELECT * FROM Comments WHERE type IN ('Recipes', 'Photos');
Putting a compound index on (type, type_id) would also improve the performance of the lookups. The index should not be unique, since one object can have many comments.
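As a rough sketch of this polymorphic table, here it is in Python's built-in sqlite3 so it can be run as-is. The body column and the index name idx_comments_target are made up for the example, and the index is deliberately non-unique on the assumption that one object can receive several comments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Comments (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL,        -- name of the table the comment belongs to
    type_id INTEGER NOT NULL,  -- primary key of the row in that table
    body TEXT NOT NULL         -- hypothetical column for the comment text
);
-- Non-unique compound index: many comments may share one (type, type_id).
CREATE INDEX idx_comments_target ON Comments (type, type_id);
""")

conn.executemany(
    "INSERT INTO Comments (type, type_id, body) VALUES (?, ?, ?)",
    [
        ("Articles", 1, "First"),
        ("Articles", 1, "Second"),
        ("Recipes", 7, "Tasty"),
    ],
)

# All comments for Article 1, oldest first.
article_comments = conn.execute(
    "SELECT body FROM Comments WHERE type = ? AND type_id = ? ORDER BY id",
    ("Articles", 1),
).fetchall()
```

Note that nothing here stops type_id from pointing at a row that does not exist; with this design the database cannot enforce a foreign key, so the application has to.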

SELECT TOP 1000 [Comments_Id]
,[Comments_Text]
,[Comments_IsApproved]
,[Comments_IsVisible]
,[Comments_DateStamp]
,[Type_Id]
,[Entity_Id] -- From Entity Table, listing Articles, Recipes etc.
,[EntityItem_Id] -- One of the PK from table of Articles, Recipes etc.
,[User_Id]
FROM [tbl_Comments]

To get an idea of how to create a single Comments table for all objects, you can take a look at the Django comment model ( http://docs.djangoproject.com/en/dev/ref/contrib/comments/models/ )

Related

How to store many to many relationship with extra column data

In the above scenario, 'signs and symptoms' is a multi-selection, and if 'others' is selected, the 'specify-others' field must be filled in. How should I store this?
What is the best table structure for performance and querying?
Should I provide 15 columns in a single table and store NULL where there is no value, or store foreign keys to the symptoms in another table? (With the second strategy, how do I store the 'other symptoms' description, i.e. the specify-other field data?)
There is no universal answer; your choice may depend on multiple factors, including external ones such as the coding framework you use to access the database (if any). The "classic" way to do it:
1. Patient table:
id (PK)
name
2. Symptom table:
id (PK)
symptom
3. Patient to Symptom table:
id (PK)
patient_id (FK)
symptom_id (FK)
other_symptoms (text)
But once again, any approach (including this one) has its own pros and cons and this is not a universal solution.
I would definitely exclude the 15-columns-in-one-table option, because whenever a new symptom needs to be added (and that will happen sooner rather than later), you'll have to change:
the table schema
the code that displays the symptoms
the code that inserts/updates patient records
who knows what else.
I'd go with a classic many to many relationship, with tables similar to:
patients: patient_id, name, etc
symptoms: symptom_id, name, description, etc
patient_symptoms: patient_id, symptom_id
Even better would be an extra table:
visits: visit_id, doctor_id, patient_id, date, other_symptoms
And then, your patient_symptoms table can be related to an actual visit to a doctor:
patient_symptoms: visit_id, symptom_id
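A minimal sketch of the visit-based layout, written with Python's built-in sqlite3 so it runs standalone. The visit_id key, the visit_date column name, the sample data and the omission of doctor_id are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE patients (
    patient_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE symptoms (
    symptom_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE visits (
    visit_id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patients (patient_id),
    visit_date TEXT NOT NULL,
    other_symptoms TEXT            -- free text for the 'specify others' field
);
CREATE TABLE patient_symptoms (
    visit_id INTEGER NOT NULL REFERENCES visits (visit_id),
    symptom_id INTEGER NOT NULL REFERENCES symptoms (symptom_id),
    PRIMARY KEY (visit_id, symptom_id)
);
""")

conn.execute("INSERT INTO patients (patient_id, name) VALUES (1, 'Jane')")
conn.executemany(
    "INSERT INTO symptoms (symptom_id, name) VALUES (?, ?)",
    [(1, "fever"), (2, "cough"), (3, "rash")],
)
conn.execute(
    "INSERT INTO visits (visit_id, patient_id, visit_date, other_symptoms) "
    "VALUES (1, 1, '2024-01-15', 'itchy eyes')"
)
conn.executemany(
    "INSERT INTO patient_symptoms (visit_id, symptom_id) VALUES (?, ?)",
    [(1, 1), (1, 2)],
)

# Symptoms ticked during visit 1, plus the free-text remainder.
ticked = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM patient_symptoms ps "
        "JOIN symptoms s ON s.symptom_id = ps.symptom_id "
        "WHERE ps.visit_id = ? ORDER BY s.name",
        (1,),
    )
]
other = conn.execute(
    "SELECT other_symptoms FROM visits WHERE visit_id = ?", (1,)
).fetchone()[0]
```

Adding a new symptom is then just an INSERT into symptoms, with no schema change.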

Uniqueness constraint on secondary relation

I'm trying to model a simple poll system, I have 4 tables
Election
id, title, description
Candidate
id, electionId, name
User
id, (other user details)...
Vote
userId, candidateId
There is a 1-n relation from Election to Candidate. If someone runs in multiple elections, they are listed as multiple candidates.
I'm having trouble figuring out how to constrain each user to one vote in each election at the database level. If I create an electionId column in Vote I create inconsistent or redundant data, but I can't think of any other way to constrain the data like that otherwise.
I feel like this has to be a common problem but I don't know what to call it so my last half an hour of searching hasn't been fruitful. What's the correct approach here?
You could change Candidate's PK to be a composite of electionId, name or at least make that combination a unique constraint in Candidate.
Then you would change Vote to be userId, electionId, name where the PK is userId, electionId and there is a FK pointing to Candidate's electionId, name which is now unique.
This means that userId and electionId are unique for the vote table and there is no redundancy left.
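A minimal sketch of this composite-key approach, using Python's built-in sqlite3 so it can be run directly. The column types, the sample data and the try_vote helper are assumptions for illustration; the same PRIMARY KEY and FOREIGN KEY clauses should carry over to MySQL with InnoDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Election (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE Candidate (
    electionId INTEGER NOT NULL REFERENCES Election (id),
    name TEXT NOT NULL,
    PRIMARY KEY (electionId, name)
);
CREATE TABLE Vote (
    userId INTEGER NOT NULL,
    electionId INTEGER NOT NULL,
    name TEXT NOT NULL,
    PRIMARY KEY (userId, electionId),          -- one vote per user per election
    FOREIGN KEY (electionId, name)
        REFERENCES Candidate (electionId, name)
);
""")

conn.execute("INSERT INTO Election (id, title) VALUES (1, 'Board 2024')")
conn.executemany(
    "INSERT INTO Candidate (electionId, name) VALUES (?, ?)",
    [(1, "Alice"), (1, "Bob")],
)


def try_vote(user_id, election_id, name):
    """Return True if the vote was stored, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO Vote (userId, electionId, name) VALUES (?, ?, ?)",
            (user_id, election_id, name),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first_vote = try_vote(7, 1, "Alice")     # accepted
second_vote = try_vote(7, 1, "Bob")      # rejected: same user, same election
unknown_vote = try_vote(8, 1, "Zed")     # rejected: no such candidate in election 1
```

The electionId in Vote is not redundant here, because it is part of the foreign key that identifies the candidate.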
You can do this with your current schema by adding validation before the insert into Vote (in MySQL this is done with a BEFORE INSERT trigger). You'd select all votes by that particular user, joined with Candidate on candidateId, and make sure none of their electionIds match the electionId of the candidate the vote is for.
This is completely normalized but expensive. Sometimes it's worth adding redundant fields for the sake of performance. I'd add electionId to Vote in this schema so that inserts don't need such an expensive validation.
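The trigger-based validation on the original schema could look roughly like the sketch below. It is written for SQLite (via Python's built-in sqlite3) so it runs standalone, which means the trigger body uses SQLite's RAISE(ABORT, ...); a MySQL BEFORE INSERT trigger would use SIGNAL instead. The trigger name, sample data and try_vote helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Election (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE Candidate (
    id INTEGER PRIMARY KEY,
    electionId INTEGER NOT NULL REFERENCES Election (id),
    name TEXT NOT NULL
);
CREATE TABLE Vote (
    userId INTEGER NOT NULL,
    candidateId INTEGER NOT NULL REFERENCES Candidate (id),
    PRIMARY KEY (userId, candidateId)
);
-- Reject the insert if this user already voted for any candidate
-- running in the same election as the new candidate.
CREATE TRIGGER one_vote_per_election
BEFORE INSERT ON Vote
FOR EACH ROW
BEGIN
    SELECT RAISE(ABORT, 'user already voted in this election')
    WHERE EXISTS (
        SELECT 1
        FROM Vote v
        JOIN Candidate c ON c.id = v.candidateId
        JOIN Candidate new_c ON new_c.id = NEW.candidateId
        WHERE v.userId = NEW.userId
          AND c.electionId = new_c.electionId
    );
END;
""")

conn.executemany("INSERT INTO Election (id, title) VALUES (?, ?)",
                 [(1, "Board"), (2, "Treasurer")])
conn.executemany("INSERT INTO Candidate (id, electionId, name) VALUES (?, ?, ?)",
                 [(1, 1, "Alice"), (2, 1, "Bob"), (3, 2, "Alice")])


def try_vote(user_id, candidate_id):
    """Return True if the vote was stored, False if the trigger or a key rejected it."""
    try:
        conn.execute("INSERT INTO Vote (userId, candidateId) VALUES (?, ?)",
                     (user_id, candidate_id))
        return True
    except sqlite3.DatabaseError:
        return False


vote_alice_board = try_vote(7, 1)      # first vote in election 1
vote_bob_board = try_vote(7, 2)        # second vote in election 1, rejected
vote_alice_treasurer = try_vote(7, 3)  # different election, accepted
```

The join inside the trigger runs on every insert, which is the cost the answer above refers to.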

mysql - It is ok to have n amount of columns?

I know it's possible to have n amount of columns, but is it proper mysql "coding standard"?
Here is what I'm doing:
I have a table student which includes all the student's info, including testScores:
student
-------
studId
name
age
gender
testId
Instead of putting each individual test answer within the student table, I made a separate table called testAnswers that will hold each student's test results:
testAnswers
-----------
testId
ques1
ques2
.
.
.
quesN
Each entry in the testAnswers table corresponds to a specific student in the table student.
Of course, there will be an admin who can add and remove questions, as the test questions may change each year. So, if the admin were to remove a question, then one of the columns would be removed.
Just to reiterate, I know it is possible to edit and remove columns in a table in MySQL, but is it good "coding standard"?
The answer is a simple and clear: No. That's just not how you should do it except for very few corner cases.
The usual way to approach this is to normalize your database. Normalization follows a standard procedure that (among other things) avoids having a table with columns names ques1, ques2, ques3 ....
This process will lead you to a database with three tables:
students - id, name, and other stuff that applies to one student each
questions - id and question text for each question
answers - this is an N:M relation between students and questions: student_id, question_id, answer_value
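The three tables can be sketched as follows, using Python's built-in sqlite3 so the example runs standalone. The column types, sample rows and the ON DELETE CASCADE rule are assumptions for illustration. The point of the sketch is that removing a question becomes a DELETE on a row rather than dropping a column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE questions (
    id INTEGER PRIMARY KEY,
    question_text TEXT NOT NULL
);
CREATE TABLE answers (
    student_id INTEGER NOT NULL REFERENCES students (id) ON DELETE CASCADE,
    question_id INTEGER NOT NULL REFERENCES questions (id) ON DELETE CASCADE,
    answer_value TEXT,
    PRIMARY KEY (student_id, question_id)   -- one answer per student per question
);
""")

conn.execute("INSERT INTO students (id, name) VALUES (1, 'Ann')")
conn.executemany(
    "INSERT INTO questions (id, question_text) VALUES (?, ?)",
    [(1, "Capital of France?"), (2, "2 + 2?")],
)
conn.executemany(
    "INSERT INTO answers (student_id, question_id, answer_value) VALUES (?, ?, ?)",
    [(1, 1, "Paris"), (1, 2, "4")],
)

# The admin removes question 2: no ALTER TABLE needed, and the
# cascade rule cleans up the answers that referred to it.
conn.execute("DELETE FROM questions WHERE id = ?", (2,))

remaining = conn.execute(
    "SELECT question_id, answer_value FROM answers "
    "WHERE student_id = ? ORDER BY question_id",
    (1,),
).fetchall()
```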
Use two tables!
What you are describing is a one to many relationship as there can be one student to many test scores. You would need to have some id as a foreign key to the student_id and put this id in the testAnswers table. You can then set constraints, which tell the database how to handle removal of data.
As one commenter has mentioned, using one table would break 1NF (first normal form), which basically says that you cannot have multiple values for a single column in a given record. You can't have multiple test scores for the same user in a given table; instead, break the data up into two tables.
...of course two tables (you could also use three). Just remember to add a studId column to the testAnswers table as well (with a REFERENCES constraint to the student table) and use INNER JOIN testAnswers ON student.studId = testAnswers.studId in the SELECT query to read the data.

Boolean and String Values in the same table

I'm designing my first good sized project and I want to be sure I'm on the right path here so I thought I would run it by the community.
I have vendors that submit products to companies. The vendors choose which company they want to submit to and that brings up a page of questions chosen by the company. So far I have a Table of companies, a table of vendors, and table of products. Each with their own primary key, easy enough. My issue is with my table called submissions that starts to tie them all together for each new submission. I am trying to get away from having a submission table with a thousand columns because the companies all want to ask different questions. If I have
Table Submissions
submission_id
date
product_id FK
vendor_id FK
company_id FK
and
Table Questions
question_id
question
and to bridge the many to many
Table Questions_Submissions
questions_submissions_id
submission_id FK
question_id FK
answer
Would this be the recommended path for normalization and if so is there any harm having the column answer contain boolean and string results or should I somehow break the boolean questions into another table? I'm expecting millions of rows of data over the next few years and want to be sure I don't design this wrong from the beginning. Thanks for any feedback if you see a glaring error or red flag in this design.
So far I have a Table of companies, a table of vendors, and table of products. Each with their own primary key, easy enough.
Each row has its own id number. That's not quite the same thing as you'd get by normalizing a relation. In a relational database, the important thing is not identifying a row, it's identifying what the row represents.
So, for example, this table
Table Questions
question_id
question
could quite easily end up with data that looks like this.
question_id question
--
1 What is your name?
2 What is your name?
3 What is your name?
4 What is your name?
5 What is your name?
Each row is uniquely identified, but each question (the important thing) is not. You need a unique constraint on {question}.
I have vendors that submit products to companies.
Table Submissions
submission_id
date
product_id FK
vendor_id FK
company_id FK
You need a unique constraint on either {product_id, vendor_id, company_id} or {date, product_id, vendor_id, company_id}.
You also need a table of vendor products. Your table allows a vendor to submit any product--including every product they don't sell--to a company.
The vendors choose which company they want to submit to and that brings up a page of questions chosen by the company. (Emphasis added)
Nothing in your schema stores the questions a company has chosen.
is there any harm having the column answer contain boolean and string results
You can express just about any common data type as a string. But with this structure, you can't constrain boolean values to just two values. If you add the possibility of numeric results, you can't constrain them to sane values, either.
This is certainly one way to go about it, and it looks pretty good.
You can do some clever things with the answer and some if statements in the query to handle the different types of answers, but it does add some complexity to the solution, so you should think about what you are trying to do with the answers.
For Boolean, you can just as easily get away with "true" or "false" in the varchar field, and do a count on them. If you needed to get answers that are numeric or dates, for sums or averages directly in the query, you could split the answer into types.
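Counting boolean answers stored as strings could look like the sketch below, run through Python's built-in sqlite3. The sample rows and question ids are invented for the example, the foreign keys are left out for brevity, and it assumes every boolean answer is stored as exactly 'true' or 'false', which the table itself does not enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Questions_Submissions (
    questions_submissions_id INTEGER PRIMARY KEY,
    submission_id INTEGER NOT NULL,
    question_id INTEGER NOT NULL,
    answer TEXT              -- holds 'true'/'false' as well as free text
);
""")

conn.executemany(
    "INSERT INTO Questions_Submissions (submission_id, question_id, answer) "
    "VALUES (?, ?, ?)",
    [
        (1, 10, "true"),      # question 10 is a yes/no question
        (2, 10, "false"),
        (3, 10, "true"),
        (1, 11, "Acme Ltd"),  # question 11 is free text
    ],
)

# Conditional aggregation: count each boolean value for one question.
yes_count, no_count = conn.execute(
    "SELECT SUM(CASE WHEN answer = 'true' THEN 1 ELSE 0 END), "
    "       SUM(CASE WHEN answer = 'false' THEN 1 ELSE 0 END) "
    "FROM Questions_Submissions WHERE question_id = ?",
    (10,),
).fetchone()
```

Any value that is misspelled or differently capitalized silently falls out of both counts, which is the constraint problem raised in the other answer.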

Best way to design a table in which a cell has multiple references?

My question regards how to design a database.
I have one table, called posts, with columns:
ID, subject, keywords, (and a few other columns)
and another table called keywords with:
kw_id, keyword.
Now, each "post" has several keywords and, sometimes, keywords are deleted because they don't make sense or are duplicates.
My question is:
Can keywords column in table posts be a foreign key? (each row will have multiple keywords)
If I can't, what is the best way to ensure data integrity (especially when a keyword is deleted)?
Thanks in advance.
EDIT: Can you point me to any books or documents I should read about database design? It seems I'm lacking key knowledge about database design.
You must flip the dependencies: table KEYWORD should reference back to POST. You may want something in between to find all posts for a given keyword (pseudo code):
POST
POST_ID
...
KEYWORD
KEYWORD_ID
NAME /* the keyword */
POSTKEYWORDREL /* relationship */
POST_ID /* foreign key to POST */
KEYWORD_ID /* foreign key to KEYWORD */
Now you can easily delete a keyword from a given post by just removing the relationship in POSTKEYWORDREL.
EDIT: As always, for documentation let me point you to Wikipedia. You should also have a look at normalization (in my opinion the most important concept when it comes to database design).
You need a many-many table in the "middle" with foreign keys
Posts:
ID (PK)
Subject
(and a few other columns)
Keywords:
kw_id (PK)
Keyword (UQ)
PostsKeywords
PostID (PK, FK to Posts.ID)
kw_id (PK, FK to Keywords.kw_id)
Why have different conventions for your "ID" columns though? Personally I'd use PostID and KeywordID throughout.
Edit: database design link
To follow your design, remove the keywords column from the posts table.
Then create another table to hold the many-to-many relationship, something like
PostKeywords
That table will contain at minimum post_Id and kw_id, and both are foreign keys to their own tables.
Personally, in these cases I also create a local PK column that is not involved in the many-to-many relationship, for example a PostKeywords_ID which is an auto-increment local to that table only.
Instead of putting a keyword column in your post table, you'll be wanting a separate table post_keyword with two columns: post_id and keyword_id. The presence of a row in this table indicates that a particular post has been assigned a particular keyword. The primary key of this table is (post_id, keyword_id). Both columns are foreign keys to their master tables.
This is standard design practice for many-to-many relationships.
If your keyword items are to be presented in a particular order, add an order column.
For best results, by the way, name the id column post_id in your post table and all the other tables in which it appears. That way various schema-engineering tools will be able to figure out what you're doing.
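A minimal sketch of the post_keyword design, using Python's built-in sqlite3 so it runs standalone. The ON DELETE CASCADE rule is one possible answer to the data integrity question (an assumption, not something the answers above prescribe), and the sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE post (
    post_id INTEGER PRIMARY KEY,
    subject TEXT NOT NULL
);
CREATE TABLE keyword (
    keyword_id INTEGER PRIMARY KEY,
    keyword TEXT NOT NULL UNIQUE
);
CREATE TABLE post_keyword (
    post_id INTEGER NOT NULL REFERENCES post (post_id) ON DELETE CASCADE,
    keyword_id INTEGER NOT NULL REFERENCES keyword (keyword_id) ON DELETE CASCADE,
    PRIMARY KEY (post_id, keyword_id)
);
""")

conn.execute("INSERT INTO post (post_id, subject) VALUES (1, 'Schema design')")
conn.executemany(
    "INSERT INTO keyword (keyword_id, keyword) VALUES (?, ?)",
    [(1, "mysql"), (2, "sql"), (3, "duplicate-tag")],
)
conn.executemany(
    "INSERT INTO post_keyword (post_id, keyword_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3)],
)


def keywords_of(post_id):
    return [
        row[0]
        for row in conn.execute(
            "SELECT k.keyword FROM post_keyword pk "
            "JOIN keyword k ON k.keyword_id = pk.keyword_id "
            "WHERE pk.post_id = ? ORDER BY k.keyword",
            (post_id,),
        )
    ]


# Deleting a keyword removes its links to every post automatically.
conn.execute("DELETE FROM keyword WHERE keyword_id = ?", (3,))
after_delete = keywords_of(1)
```

Without the cascade rule, the same DELETE would be rejected while any post still used the keyword, which is the other common way to protect integrity.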