I'm designing my first good-sized project, and I want to be sure I'm on the right path, so I thought I would run it by the community.
I have vendors that submit products to companies. A vendor chooses which company to submit to, and that brings up a page of questions chosen by the company. So far I have a table of companies, a table of vendors, and a table of products, each with its own primary key; easy enough. My issue is with my table called Submissions, which ties them all together for each new submission. I am trying to avoid a submission table with a thousand columns, because the companies all want to ask different questions. If I have
Table Submissions
submission_id
date
product_id FK
vendor_id FK
company_id FK
and
Table Questions
question_id
question
and to bridge the many to many
Table Questions_Submissions
questions_submissions_id
submission_id FK
question_id FK
answer
Would this be the recommended path for normalization? If so, is there any harm in having the answer column contain both boolean and string results, or should I somehow break the boolean questions out into another table? I'm expecting millions of rows of data over the next few years and want to be sure I don't design this wrong from the beginning. Thanks for any feedback if you see a glaring error or red flag in this design.
So far I have a table of companies, a table of vendors, and a table of products, each with its own primary key; easy enough.
Each row has its own id number. That's not quite the same thing as you'd get by normalizing a relation. In a relational database, the important thing is not identifying a row, it's identifying what the row represents.
So, for example, this table
Table Questions
question_id
question
could quite easily end up with data that looks like this.
question_id  question
-----------  ------------------
1            What is your name?
2            What is your name?
3            What is your name?
4            What is your name?
5            What is your name?
Each row is uniquely identified, but each question (the important thing) is not. You need a unique constraint on {question}.
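A minimal sketch of that constraint, assuming SQLite (driven from Python's built-in sqlite3 module) as a stand-in for whatever DBMS you actually use. Each duplicate would get its own question_id, but UNIQUE on the question text rejects it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for illustration
conn.execute("""
    CREATE TABLE Questions (
        question_id INTEGER PRIMARY KEY,
        question    TEXT NOT NULL UNIQUE  -- identifies the question itself, not just the row
    )
""")

conn.execute("INSERT INTO Questions (question) VALUES (?)", ("What is your name?",))

try:
    # Same text again: a new question_id would be generated, but UNIQUE rejects the row.
    conn.execute("INSERT INTO Questions (question) VALUES (?)", ("What is your name?",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Questions").fetchone()[0]
print(duplicate_rejected, row_count)  # True 1
```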
I have vendors that submit products to companies.
Table Submissions
submission_id
date
product_id FK
vendor_id FK
company_id FK
You need a unique constraint on either {product_id, vendor_id, company_id} or {date, product_id, vendor_id, company_id}.
You also need a table of vendor products. Your table allows a vendor to submit any product--including every product they don't sell--to a company.
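One way to sketch both points, again assuming SQLite via Python's sqlite3. The table name Vendor_Products is hypothetical, and the date-based variant of the unique constraint is the one chosen here. Submissions references the (vendor_id, product_id) pair through a composite foreign key, so a vendor can only submit products it actually sells:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
    CREATE TABLE Vendors   (vendor_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Products  (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Companies (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- Hypothetical table: which vendor sells which product.
    CREATE TABLE Vendor_Products (
        vendor_id  INTEGER NOT NULL REFERENCES Vendors (vendor_id),
        product_id INTEGER NOT NULL REFERENCES Products (product_id),
        PRIMARY KEY (vendor_id, product_id)
    );

    CREATE TABLE Submissions (
        submission_id INTEGER PRIMARY KEY,
        date          TEXT NOT NULL,
        product_id    INTEGER NOT NULL,
        vendor_id     INTEGER NOT NULL,
        company_id    INTEGER NOT NULL REFERENCES Companies (company_id),
        UNIQUE (date, product_id, vendor_id, company_id),
        -- The (vendor, product) pair must exist in Vendor_Products.
        FOREIGN KEY (vendor_id, product_id)
            REFERENCES Vendor_Products (vendor_id, product_id)
    );

    INSERT INTO Vendors   VALUES (1, 'Acme');
    INSERT INTO Products  VALUES (10, 'Widget'), (20, 'Gadget');
    INSERT INTO Companies VALUES (100, 'BigCo');
    INSERT INTO Vendor_Products VALUES (1, 10);  -- Acme sells Widgets only
""")

# Acme submits a Widget: allowed.
conn.execute("INSERT INTO Submissions VALUES (1, '2024-01-01', 10, 1, 100)")

try:
    # Acme does not sell product 20, so the composite FK rejects this row.
    conn.execute("INSERT INTO Submissions VALUES (2, '2024-01-01', 20, 1, 100)")
    unsold_rejected = False
except sqlite3.IntegrityError:
    unsold_rejected = True

sub_count = conn.execute("SELECT COUNT(*) FROM Submissions").fetchone()[0]
print(unsold_rejected, sub_count)  # True 1
```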
The vendors choose which company they want to submit to and that brings up a page of questions chosen by the company.
Nothing in your schema stores the questions a company has chosen.
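A sketch of what could store those choices, assuming SQLite via Python's sqlite3; Company_Questions and the sample questions are hypothetical. It is another many-to-many bridge, this time between companies and questions, and it is what the submission page would read from:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Companies (company_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Questions (question_id INTEGER PRIMARY KEY, question TEXT NOT NULL UNIQUE);

    -- Hypothetical bridge table: the questions each company has chosen.
    CREATE TABLE Company_Questions (
        company_id  INTEGER NOT NULL REFERENCES Companies (company_id),
        question_id INTEGER NOT NULL REFERENCES Questions (question_id),
        PRIMARY KEY (company_id, question_id)
    );

    INSERT INTO Companies VALUES (1, 'BigCo'), (2, 'SmallCo');
    INSERT INTO Questions VALUES (1, 'Is it organic?'), (2, 'Unit price?'),
                                 (3, 'Country of origin?');
    INSERT INTO Company_Questions VALUES (1, 1), (1, 2), (2, 3);
""")

def questions_for(company_id):
    """Return the question texts a company chose, i.e. what its submission page shows."""
    rows = conn.execute("""
        SELECT q.question
        FROM Company_Questions cq
        JOIN Questions q ON q.question_id = cq.question_id
        WHERE cq.company_id = ?
        ORDER BY q.question_id
    """, (company_id,))
    return [r[0] for r in rows]

print(questions_for(1))  # ['Is it organic?', 'Unit price?']
```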
is there any harm having the column answer contain boolean and string results
You can express just about any common data type as a string. But with this structure, you can't constrain boolean values to just two values. If you add the possibility of numeric results, you can't constrain them to sane values, either.
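If that matters, one alternative (swapped in here, not something this answer prescribes) is one nullable column per answer type, each with its own CHECK. A sketch assuming SQLite via Python's sqlite3; the column names answer_boolean/answer_text and the composite primary key replacing questions_submissions_id are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Questions (
        question_id INTEGER PRIMARY KEY,
        question    TEXT NOT NULL UNIQUE
    );

    -- Hypothetical typed layout: one nullable column per answer type.
    CREATE TABLE Questions_Submissions (
        submission_id  INTEGER NOT NULL,
        question_id    INTEGER NOT NULL REFERENCES Questions (question_id),
        answer_boolean INTEGER CHECK (answer_boolean IN (0, 1)),  -- only 0 or 1 allowed
        answer_text    TEXT,
        PRIMARY KEY (submission_id, question_id),
        -- exactly one of the two answer columns must be filled
        CHECK ((answer_boolean IS NULL) <> (answer_text IS NULL))
    );

    INSERT INTO Questions VALUES (1, 'Is it organic?'), (2, 'Describe the packaging.');
""")

def try_answer(submission_id, question_id, answer_boolean, answer_text):
    """Insert one answer; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Questions_Submissions VALUES (?, ?, ?, ?)",
                     (submission_id, question_id, answer_boolean, answer_text))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_answer(1, 1, 1, None),          # a real boolean
    try_answer(1, 2, None, "Cardboard"), # a text answer
    try_answer(2, 1, 2, None),          # 2 is not a boolean
    try_answer(2, 2, 1, "Cardboard"),   # both columns filled
]
print(results)  # [True, True, False, False]
```

This constrains values per type, but it does not yet tie a question's expected type to the column used; that would take more machinery (for example a trigger).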
This is certainly one way to go about it, and it looks pretty good.
You can do some clever things with the answer and some if statements in the query to handle the different types of answers, but it does add some complexity to the solution, so you should think about what you are trying to do with the answers.
For Boolean, you can just as easily get away with "true" or "false" in the varchar field, and do a count on them. If you needed to get answers that are numeric or dates, for sums or averages directly in the query, you could split the answer into types.
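A minimal sketch of both ideas against a single string answer column, assuming SQLite via Python's sqlite3 and made-up data. The "if statements" are CASE expressions for counting, and a CAST is needed before a numeric answer can be averaged:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Questions_Submissions (
        submission_id INTEGER NOT NULL,
        question_id   INTEGER NOT NULL,
        answer        TEXT,              -- every answer stored as a string
        PRIMARY KEY (submission_id, question_id)
    );
    -- Hypothetical data: question 1 is yes/no, question 2 is numeric.
    INSERT INTO Questions_Submissions VALUES
        (1, 1, 'true'),  (1, 2, '12.50'),
        (2, 1, 'false'), (2, 2, '7.50'),
        (3, 1, 'true'),  (3, 2, '10');
""")

# Boolean question: count the 'true'/'false' strings with CASE expressions.
true_count, false_count = conn.execute("""
    SELECT SUM(CASE WHEN answer = 'true'  THEN 1 ELSE 0 END),
           SUM(CASE WHEN answer = 'false' THEN 1 ELSE 0 END)
    FROM Questions_Submissions
    WHERE question_id = 1
""").fetchone()

# Numeric question: the string has to be cast before it can be averaged.
avg_value = conn.execute("""
    SELECT AVG(CAST(answer AS REAL))
    FROM Questions_Submissions
    WHERE question_id = 2
""").fetchone()[0]

print(true_count, false_count, avg_value)  # 2 1 10.0
```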
In the above scenario, 'signs and symptoms' is a multi-selection field, and if 'others' is selected, the 'specify-others' field must be filled in. How should I store this?
What is the best table structure for performance and querying?
Should I use 15 columns in a single table and store NULL where there is no value, or store foreign keys to symptoms in another table? (With the second strategy, how do I store the 'other symptoms' description, i.e. the specify-others field data?)
There is no universal answer; your choice may depend on multiple factors, including external ones such as the coding framework you use with the database (if any). The "classic" way to do it:
1. Patient table:
id (PK)
name
2. Symptom table:
id (PK)
symptom
3. Patient to Symptom table:
id (PK)
patient_id (FK)
symptom_id (FK)
other_symptoms (text)
But once again, any approach (including this one) has its own pros and cons and this is not a universal solution.
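A sketch of the classic layout above, assuming SQLite via Python's sqlite3; the sample rows and the UNIQUE (patient_id, symptom_id) constraint are additions for illustration. The multi-selection becomes one row per ticked symptom, and the specify-others text rides on the 'Others' row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Patient (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Symptom (id INTEGER PRIMARY KEY, symptom TEXT NOT NULL UNIQUE);
    CREATE TABLE Patient_Symptom (
        id             INTEGER PRIMARY KEY,
        patient_id     INTEGER NOT NULL REFERENCES Patient (id),
        symptom_id     INTEGER NOT NULL REFERENCES Symptom (id),
        other_symptoms TEXT,                  -- only meaningful on the 'Others' row
        UNIQUE (patient_id, symptom_id)       -- added: no symptom ticked twice
    );

    INSERT INTO Patient VALUES (1, 'Alice');
    INSERT INTO Symptom VALUES (1, 'Fever'), (2, 'Cough'), (3, 'Others');

    -- Multi-selection: one row per ticked symptom.
    INSERT INTO Patient_Symptom (patient_id, symptom_id, other_symptoms) VALUES
        (1, 1, NULL),
        (1, 2, NULL),
        (1, 3, 'Dizziness');  -- 'Others' ticked, so the specify-others text goes here
""")

rows = conn.execute("""
    SELECT s.symptom, ps.other_symptoms
    FROM Patient_Symptom ps
    JOIN Symptom s ON s.id = ps.symptom_id
    WHERE ps.patient_id = 1
    ORDER BY s.id
""").fetchall()
print(rows)  # [('Fever', None), ('Cough', None), ('Others', 'Dizziness')]
```

Adding a symptom here is just a new row in Symptom. The rule "specify-others must be filled when 'Others' is selected" is not enforced by this sketch; that would be left to application code or a trigger.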
I would definitely exclude the 15-columns-in-one-table option, because whenever a new symptom needs to be added (and that will happen sooner rather than later), you'll have to change:
the table schema
the code that displays the symptoms
the code that inserts/updates patient records
who knows what else.
I'd go with a classic many to many relationship, with tables similar to:
patients: patient_id, name, etc
symptoms: symptom_id, name, description, etc
patient_symptoms: patient_id, symptom_id
Even better would be an extra table:
visits: doctor_id, patient_id, date, other_symptoms
And then, your patient_symptoms table can be related to an actual visit to a doctor:
patient_symptoms: visit_id, symptom_id
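A sketch of the visit-based variant, assuming SQLite via Python's sqlite3. The visit_id primary key on visits and the doctors table are implied by the text but not spelled out there, so they are assumptions, as is the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE doctors  (doctor_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE symptoms (symptom_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE visits (
        visit_id       INTEGER PRIMARY KEY,   -- assumed surrogate key
        doctor_id      INTEGER NOT NULL REFERENCES doctors (doctor_id),
        patient_id     INTEGER NOT NULL REFERENCES patients (patient_id),
        date           TEXT NOT NULL,
        other_symptoms TEXT
    );
    CREATE TABLE patient_symptoms (
        visit_id   INTEGER NOT NULL REFERENCES visits (visit_id),
        symptom_id INTEGER NOT NULL REFERENCES symptoms (symptom_id),
        PRIMARY KEY (visit_id, symptom_id)
    );

    INSERT INTO patients VALUES (1, 'Alice');
    INSERT INTO doctors  VALUES (1, 'Dr. Smith');
    INSERT INTO symptoms VALUES (1, 'Fever'), (2, 'Cough');
    INSERT INTO visits   VALUES (1, 1, 1, '2024-03-01', NULL),
                                (2, 1, 1, '2024-03-15', 'Dizziness');
    INSERT INTO patient_symptoms VALUES (1, 1), (1, 2), (2, 2);
""")

# Symptom history for one patient, visit by visit.
history = conn.execute("""
    SELECT v.date, s.name
    FROM visits v
    JOIN patient_symptoms ps ON ps.visit_id = v.visit_id
    JOIN symptoms s          ON s.symptom_id = ps.symptom_id
    WHERE v.patient_id = 1
    ORDER BY v.date, s.name
""").fetchall()
print(history)
```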
I'm trying to model a simple poll system, I have 4 tables
Election
id, title, description
Candidate
id, electionId, name
User
id, (other user details)...
Vote
userId, candidateId
There is a 1-n relation from Election to Candidate. If someone runs in multiple elections, they are listed as multiple candidates.
I'm having trouble figuring out how to constrain each user to one vote in each election at the database level. If I create an electionId column in Vote I create inconsistent or redundant data, but I can't think of any other way to constrain the data like that otherwise.
I feel like this has to be a common problem, but I don't know what to call it, so my last half hour of searching hasn't been fruitful. What's the correct approach here?
You could change Candidate's PK to be a composite of electionId, name or at least make that combination a unique constraint in Candidate.
Then you would change Vote to have the columns userId, electionId, and name, with the PK on {userId, electionId} and a FK pointing to Candidate's {electionId, name}, which is now unique.
This means each {userId, electionId} pair is unique in the Vote table, and there is no redundancy left.
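A minimal sketch of that design, assuming SQLite via Python's sqlite3 (the question doesn't fix a DBMS) and made-up sample data. The primary key enforces one vote per user per election, and the composite FK ensures the chosen name really is a candidate in that election:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs with this on
conn.executescript("""
    CREATE TABLE Election (id INTEGER PRIMARY KEY, title TEXT, description TEXT);
    CREATE TABLE User     (id INTEGER PRIMARY KEY);
    CREATE TABLE Candidate (
        id         INTEGER PRIMARY KEY,
        electionId INTEGER NOT NULL REFERENCES Election (id),
        name       TEXT NOT NULL,
        UNIQUE (electionId, name)            -- makes {electionId, name} a valid FK target
    );
    CREATE TABLE Vote (
        userId     INTEGER NOT NULL REFERENCES User (id),
        electionId INTEGER NOT NULL,
        name       TEXT NOT NULL,
        PRIMARY KEY (userId, electionId),    -- one vote per user per election
        FOREIGN KEY (electionId, name) REFERENCES Candidate (electionId, name)
    );

    INSERT INTO Election  (id, title) VALUES (1, 'Board'), (2, 'Treasurer');
    INSERT INTO User      VALUES (1), (2);
    INSERT INTO Candidate VALUES (1, 1, 'Ann'), (2, 1, 'Bob'), (3, 2, 'Ann');
""")

def cast_vote(user_id, election_id, name):
    """Try to record a vote; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO Vote VALUES (?, ?, ?)", (user_id, election_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    cast_vote(1, 1, "Ann"),  # first vote in election 1
    cast_vote(1, 1, "Bob"),  # second vote in election 1: PK violation
    cast_vote(1, 2, "Ann"),  # election 2 is a separate election
    cast_vote(2, 2, "Bob"),  # Bob is not a candidate in election 2: FK violation
]
print(results)  # [True, False, True, False]
```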
You can do this with your current schema by adding validation before the insert into Vote (in MySQL this is done with a BEFORE INSERT trigger). You'd select all votes by that particular user, joined with Candidate on candidateId, and make sure none of their electionIds match the electionId of the candidate the new vote is for.
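The same check sketched as a trigger, assuming SQLite via Python's sqlite3. SQLite's syntax (a WHEN clause plus RAISE(ABORT, ...)) differs from MySQL's, where you would typically use SIGNAL inside the trigger body instead; the trigger name and sample data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Election  (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Candidate (id INTEGER PRIMARY KEY, electionId INTEGER NOT NULL, name TEXT);
    CREATE TABLE Vote      (userId INTEGER NOT NULL, candidateId INTEGER NOT NULL);

    -- Before each insert, look for an existing vote by the same user for any
    -- candidate in the same election as the new candidate; abort if one exists.
    CREATE TRIGGER one_vote_per_election
    BEFORE INSERT ON Vote
    WHEN EXISTS (
        SELECT 1
        FROM Vote v
        JOIN Candidate c ON c.id = v.candidateId
        WHERE v.userId = NEW.userId
          AND c.electionId = (SELECT electionId FROM Candidate WHERE id = NEW.candidateId)
    )
    BEGIN
        SELECT RAISE(ABORT, 'user has already voted in this election');
    END;

    INSERT INTO Election  VALUES (1, 'Board'), (2, 'Treasurer');
    INSERT INTO Candidate VALUES (1, 1, 'Ann'), (2, 1, 'Bob'), (3, 2, 'Ann');
""")

def cast_vote(user_id, candidate_id):
    """Try to record a vote; return False if the trigger aborts it."""
    try:
        conn.execute("INSERT INTO Vote VALUES (?, ?)", (user_id, candidate_id))
        return True
    except sqlite3.DatabaseError:  # base class, so any error the abort raises is caught
        return False

results = [cast_vote(1, 1), cast_vote(1, 2), cast_vote(1, 3), cast_vote(2, 2)]
print(results)  # [True, False, True, True]
```

Note the join on every insert; that per-insert lookup is the cost the next paragraph refers to.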
This is completely normalized but expensive. Sometimes it's worth adding redundant fields for the sake of performance. I'd add electionId to Vote in this schema so that inserts don't need such an expensive validation.
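If you do add the redundant electionId, one common way to stop it from going inconsistent (a technique added here, not part of this answer) is a composite FK from Vote (candidateId, electionId) to a UNIQUE (id, electionId) on Candidate. A sketch assuming SQLite via Python's sqlite3 and made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Election (id INTEGER PRIMARY KEY);
    CREATE TABLE User     (id INTEGER PRIMARY KEY);
    CREATE TABLE Candidate (
        id         INTEGER PRIMARY KEY,
        electionId INTEGER NOT NULL REFERENCES Election (id),
        name       TEXT NOT NULL,
        UNIQUE (id, electionId)           -- lets Vote reference the pair
    );
    CREATE TABLE Vote (
        userId      INTEGER NOT NULL REFERENCES User (id),
        candidateId INTEGER NOT NULL,
        electionId  INTEGER NOT NULL,     -- the redundant column
        PRIMARY KEY (userId, electionId), -- cheap one-vote-per-election check
        -- electionId must match the candidate's real election
        FOREIGN KEY (candidateId, electionId) REFERENCES Candidate (id, electionId)
    );

    INSERT INTO Election  VALUES (1), (2);
    INSERT INTO User      VALUES (1), (2);
    INSERT INTO Candidate VALUES (1, 1, 'Ann'), (2, 1, 'Bob'), (3, 2, 'Ann');
""")

def vote(user_id, candidate_id, election_id):
    """Try to record a vote; return False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO Vote (userId, candidateId, electionId) VALUES (?, ?, ?)",
                     (user_id, candidate_id, election_id))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    vote(1, 1, 1),  # fine
    vote(1, 2, 1),  # second vote in election 1: PK violation
    vote(2, 3, 1),  # candidate 3 belongs to election 2: FK violation
    vote(2, 3, 2),  # fine
]
print(results)  # [True, False, False, True]
```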
I am building a library database and I am stuck on one particular thing.
I have three tables: BookCopy, BookLoan, and Members. It is not clear to me how to set up the relationships between them so that a member can borrow a book (or books) and all of this is correctly reflected in my database.
My idea was to have two many-to-many tables, so I added BoakLoansMembers and BookCopiesBookLoans. I am not sure if this is correct, and even if it is, I have no idea how to script so many tables.
So, now I am wondering what would be the best thing to be done in this case and why?
I'm guessing your BookCopy is to account for having X copies of book Y, and in that sense "books" are not loaned, "copies" of them are, right?
I think the best course of action is probably to realize that the BookLoan table should be the many-to-many table. A copy is loaned to one member at a time and then returned. BookLoan should have the IDs of the copy and the member, the date loaned (as you have now, though it should be a datetime field, NOT a varchar one), and the date returned (like the loaned date, it should be a datetime, but it should also be nullable to represent unreturned copies). You should also keep the unique (presumably auto-increment) ID of the loan, as it is very possible a member might check out the same copy multiple times.
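A sketch of that BookLoan, assuming SQLite via Python's sqlite3. SQLite has no true datetime type, so ISO-8601 text stands in for the datetime columns here; in MySQL you would use DATETIME. The title column and sample data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Members  (idMember   INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE BookCopy (idBookCopy INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE BookLoan (
        idBookLoan   INTEGER PRIMARY KEY,   -- auto-assigned by SQLite
        idBookCopy   INTEGER NOT NULL REFERENCES BookCopy (idBookCopy),
        idMember     INTEGER NOT NULL REFERENCES Members (idMember),
        dateLoaned   TEXT NOT NULL,         -- ISO-8601 text in place of DATETIME
        dateReturned TEXT                   -- NULL while the copy is still out
    );
    INSERT INTO Members  VALUES (1, 'Sam');
    INSERT INTO BookCopy VALUES (1, 'Dune');
""")

# The same member borrows the same copy twice; the surrogate key keeps the loans distinct.
conn.execute("INSERT INTO BookLoan (idBookCopy, idMember, dateLoaned, dateReturned) "
             "VALUES (1, 1, '2024-01-02 10:00', '2024-01-20 16:30')")
conn.execute("INSERT INTO BookLoan (idBookCopy, idMember, dateLoaned) "
             "VALUES (1, 1, '2024-02-05 09:15')")

loan_count = conn.execute("SELECT COUNT(*) FROM BookLoan").fetchone()[0]
open_loans = conn.execute(
    "SELECT COUNT(*) FROM BookLoan WHERE dateReturned IS NULL").fetchone()[0]
print(loan_count, open_loans)  # 2 1
```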
I am guessing that perhaps you were originally conceptualizing the "loan" similar to a sales transaction, which could work; but you would want a loanCopies table, and wouldn't want the dateReturned on the loan then since different copies could be returned independently.
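A sketch of that sales-transaction style, assuming SQLite via Python's sqlite3 and made-up data. BookLoan becomes the header (no dateReturned), and loanCopies holds one row per copy with its own dateReturned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Members  (idMember   INTEGER PRIMARY KEY);
    CREATE TABLE BookCopy (idBookCopy INTEGER PRIMARY KEY);

    -- The "transaction": one checkout by one member (no dateReturned here).
    CREATE TABLE BookLoan (
        idBookLoan INTEGER PRIMARY KEY,
        idMember   INTEGER NOT NULL REFERENCES Members (idMember),
        dateLoaned TEXT NOT NULL
    );
    -- The "line items": each copy in that checkout, returned independently.
    CREATE TABLE loanCopies (
        idBookLoan   INTEGER NOT NULL REFERENCES BookLoan (idBookLoan),
        idBookCopy   INTEGER NOT NULL REFERENCES BookCopy (idBookCopy),
        dateReturned TEXT,
        PRIMARY KEY (idBookLoan, idBookCopy)
    );

    INSERT INTO Members    VALUES (1);
    INSERT INTO BookCopy   VALUES (1), (2);
    INSERT INTO BookLoan   VALUES (1, 1, '2024-03-01');
    INSERT INTO loanCopies VALUES (1, 1, NULL), (1, 2, NULL);
""")

# Copy 1 comes back early; copy 2 is still out.
conn.execute("UPDATE loanCopies SET dateReturned = '2024-03-10' "
             "WHERE idBookLoan = 1 AND idBookCopy = 1")
still_out = [r[0] for r in conn.execute(
    "SELECT idBookCopy FROM loanCopies WHERE dateReturned IS NULL ORDER BY idBookCopy")]
print(still_out)  # [2]
```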
Edit (additional observations):
isAvailable may be redundant if it is only based on whether the copy is checked out (if you want to withhold the book from circulation it might be appropriate though)
ISBN maxes out at 13 characters according to Wikipedia (char can be a little more efficient than varchar under some circumstances)
you might want to consider a languages table that the copy can reference rather than using a string type field.
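A sketch of the last two points, assuming SQLite via Python's sqlite3; the Language table, the isbn/idLanguage column names, and the placeholder ISBN strings are hypothetical. SQLite ignores the length in CHAR(13), so a CHECK does the enforcing here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical lookup table instead of a free-text language field.
    CREATE TABLE Language (
        idLanguage INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE BookCopy (
        idBookCopy INTEGER PRIMARY KEY,
        isbn       CHAR(13) NOT NULL CHECK (length(isbn) = 13),  -- SQLite ignores the 13 itself
        idLanguage INTEGER NOT NULL REFERENCES Language (idLanguage)
    );
    INSERT INTO Language VALUES (1, 'English'), (2, 'French');
""")

def add_copy(copy_id, isbn, language_id):
    """Insert a copy; return False if the ISBN length or the language FK is wrong."""
    try:
        conn.execute("INSERT INTO BookCopy VALUES (?, ?, ?)", (copy_id, isbn, language_id))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    add_copy(1, "9780000000002", 1),   # 13 characters, known language
    add_copy(2, "12345", 1),           # too short
    add_copy(3, "9780000000019", 99),  # no such language
]
print(results)  # [True, False, False]
```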
Edit (re: isAvailable):
If you just need to find the copies not loaned out, a simple query like this is all you need.
SELECT *
FROM BookCopy
WHERE idBookCopy NOT IN (
SELECT idBookCopy
FROM BookLoan
WHERE dateReturned IS NULL
);
The subquery gets the list of copies loaned out, and the NOT IN makes sure the copies in the results are not in that list.
If you want to prevent a copy from being loaned out (damaged, vandalized, etc.), an isAvailable "flag" could be a simple way to add such functionality; just add AND isAvailable = 1 to the outer query's WHERE conditions.
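The query above with that extra condition, run against a tiny made-up data set in SQLite via Python's sqlite3 (the trimmed-down column sets are assumptions). Copy 2 is on loan and copy 3 is flagged unavailable, so only copies 1 and 4 come back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BookCopy (
        idBookCopy  INTEGER PRIMARY KEY,
        isAvailable INTEGER NOT NULL DEFAULT 1   -- 0 = withdrawn (damaged, etc.)
    );
    CREATE TABLE BookLoan (
        idBookLoan   INTEGER PRIMARY KEY,
        idBookCopy   INTEGER NOT NULL,  -- NOT NULL matters: a NULL here would make NOT IN match nothing
        dateLoaned   TEXT NOT NULL,
        dateReturned TEXT
    );
    INSERT INTO BookCopy VALUES (1, 1), (2, 1), (3, 0), (4, 1);
    INSERT INTO BookLoan VALUES
        (1, 1, '2024-01-01', '2024-01-10'),  -- copy 1: returned
        (2, 2, '2024-02-01', NULL);          -- copy 2: still out
""")

rows = conn.execute("""
    SELECT idBookCopy
    FROM BookCopy
    WHERE idBookCopy NOT IN (
        SELECT idBookCopy
        FROM BookLoan
        WHERE dateReturned IS NULL
    )
    AND isAvailable = 1
    ORDER BY idBookCopy
""").fetchall()
on_shelf = [r[0] for r in rows]
print(on_shelf)  # [1, 4]
```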
You can simply have an m:m relationship between Members and BookCopy and use your BookLoan table as the junction table. So you basically just have to add references to the Members and BookCopy tables in the BookLoan table:
BookLoan
---------------
idBookLoan
dateLoaned
dateReturned
idBookCopy FK -- add these two
idMember FK
And also consider making idBookCopy, idMember, and dateLoaned the composite primary key of your BookLoan table.
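A sketch of that composite key, assuming SQLite via Python's sqlite3 and dropping idBookLoan (an assumption; the answer doesn't say whether to keep it). The same member can borrow the same copy again later, but the exact same loan row can't be entered twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Members  (idMember   INTEGER PRIMARY KEY);
    CREATE TABLE BookCopy (idBookCopy INTEGER PRIMARY KEY);
    CREATE TABLE BookLoan (
        idBookCopy   INTEGER NOT NULL REFERENCES BookCopy (idBookCopy),
        idMember     INTEGER NOT NULL REFERENCES Members (idMember),
        dateLoaned   TEXT NOT NULL,
        dateReturned TEXT,
        PRIMARY KEY (idBookCopy, idMember, dateLoaned)  -- composite key
    );
    INSERT INTO Members  VALUES (1);
    INSERT INTO BookCopy VALUES (1);
""")

def loan(copy_id, member_id, date_loaned):
    """Record a loan; return False if the composite key already exists."""
    try:
        conn.execute("INSERT INTO BookLoan (idBookCopy, idMember, dateLoaned) VALUES (?, ?, ?)",
                     (copy_id, member_id, date_loaned))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    loan(1, 1, "2024-01-01 10:00"),
    loan(1, 1, "2024-01-01 10:00"),  # exact duplicate: rejected
    loan(1, 1, "2024-02-01 10:00"),  # same member and copy, later date: fine
]
print(results)  # [True, False, True]
```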
My UNF is
database(
  manager_id,
  manager_name,
  {supplier_id,
   supplier_name,
   {order_id,
    order_quantity}},
  {purchase_id,
   purchase_date})
Here manager_name, supplier_id, order_id, and purchase_id are the primary keys.
During normalization there will be one table called purchase. Is it necessary to make manager_name a foreign key?
How can I normalize this database?
This is a part of my college project on database. Normalization is really confusing.
First consider splitting things out by things that naturally go together. In this case you have manager information, supplier information, order information and purchase information. I personally would want to know the difference between an order and a purchase because that is not clear to me.
So you have at least four tables for those separate pieces of information (although, depending on the other fields you might need, suppliers and managers could share a table with an additional field such as person_type to distinguish them; in that case you would want a lookup table holding the valid person type values). Then you need to see how these things relate to each other. Are they in a one-to-one, a one-to-many, or a many-to-many relationship? In a one-to-one relationship, the FK also needs a unique constraint or index to maintain the uniqueness. In a many-to-many relationship you will need an additional junction table that contains both IDs.
Otherwise, in the simplest case, the child table purchase would have FKs to the manager, supplier, and order tables.
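A sketch of that simplest case, assuming SQLite via Python's sqlite3, the column names from the UNF, and one-to-many links only. The order table is called orders because ORDER is an SQL keyword. Keying managers by manager_id means two managers can share a name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE manager  (manager_id  INTEGER PRIMARY KEY, manager_name  TEXT NOT NULL);
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id       INTEGER PRIMARY KEY,
        order_quantity INTEGER NOT NULL
    );
    -- Child table: FKs to the three parents, referencing managers by id, not name.
    CREATE TABLE purchase (
        purchase_id   INTEGER PRIMARY KEY,
        purchase_date TEXT NOT NULL,
        manager_id    INTEGER NOT NULL REFERENCES manager (manager_id),
        supplier_id   INTEGER NOT NULL REFERENCES supplier (supplier_id),
        order_id      INTEGER NOT NULL REFERENCES orders (order_id)
    );

    -- Two different managers who happen to share a name.
    INSERT INTO manager  VALUES (1, 'Pat Lee'), (2, 'Pat Lee');
    INSERT INTO supplier VALUES (1, 'Acme Supply');
    INSERT INTO orders   VALUES (1, 50), (2, 20);
    INSERT INTO purchase VALUES (1, '2024-04-01', 1, 1, 1),
                                (2, '2024-04-02', 2, 1, 2);
""")

purchases = conn.execute("""
    SELECT p.purchase_id, m.manager_id, m.manager_name
    FROM purchase p
    JOIN manager m ON m.manager_id = p.manager_id
    ORDER BY p.purchase_id
""").fetchall()

try:
    # No manager 99 exists, so the FK rejects this purchase.
    conn.execute("INSERT INTO purchase VALUES (3, '2024-04-03', 99, 1, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(purchases, orphan_rejected)
```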
Manager name should under no circumstances be a primary key. Many people have the same name. Use manager ID as the key, because it is unique where the name is not. In general I prefer to separate names into first, middle, and last so that you can sort on last name easily. However, in some cultures this doesn't work so well.
I am converting a spreadsheet to a database, but how do I accommodate multiple values for a field?
This is a database tracking orders with factories.
Import PO# is the unique key. Sometimes one order will have 0, 1, 2, 3, 4, or more customers requiring that we place their price tickets on the product in the factory. Every order is different. What's the proper way to accommodate multiple values in one field?
Generally, having multiple values in a field is bad database design. Maybe a one to many relationship will work in this scenario.
So you will have an Order table with PO# as the primary key.
Then you will have an OrderDetails table with PO# as a foreign key, i.e. in OrderDetails, PO# will not be designated as the primary key.
For each row in the Order table you will have a unique PO# that will not repeat across rows.
In the OrderDetails table you will have a customer per row and because the PO# is not a primary key, it can repeat across rows. This will allow you to designate multiple customers per order. Therefore each row will have its own PriceTicketsOrdered field so you can know per customer what the price is.
Note that each customer can repeat across rows in the OrderDetails table as long as it's for a different PO# and/or product.
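A sketch of the two tables, assuming SQLite via Python's sqlite3. The Customer table, the factory column, and the sample data are hypothetical, the product dimension is left out, and PriceTicketsOrdered is assumed to be a ticket count. PO-1001 gets three customers and PO-1002 none, which a spreadsheet cell can't express cleanly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE "Order" (                      -- quoted: ORDER is an SQL keyword
        po_number TEXT NOT NULL PRIMARY KEY,    -- the Import PO#
        factory   TEXT
    );
    CREATE TABLE OrderDetails (
        po_number           TEXT NOT NULL REFERENCES "Order" (po_number),
        customer_id         INTEGER NOT NULL REFERENCES Customer (customer_id),
        PriceTicketsOrdered INTEGER NOT NULL,   -- assumed: number of tickets for this customer
        PRIMARY KEY (po_number, customer_id)
    );

    INSERT INTO "Order"  VALUES ('PO-1001', 'Factory A'), ('PO-1002', 'Factory B');
    INSERT INTO Customer VALUES (1, 'Store X'), (2, 'Store Y'), (3, 'Store Z');
    INSERT INTO OrderDetails VALUES ('PO-1001', 1, 100),
                                    ('PO-1001', 2, 250),
                                    ('PO-1001', 3, 75);
""")

# LEFT JOIN so that orders with zero customers still show up.
per_order = conn.execute("""
    SELECT o.po_number, COUNT(d.customer_id)
    FROM "Order" o
    LEFT JOIN OrderDetails d ON d.po_number = o.po_number
    GROUP BY o.po_number
    ORDER BY o.po_number
""").fetchall()
print(per_order)  # [('PO-1001', 3), ('PO-1002', 0)]
```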
This is the best I can tell you based on the clarity of your question.
Personally, I normally spend time designing my database on paper or with drawing software like Visio before I start implementing it in a specific DBMS like MySQL or PostgreSQL.
Reading up on ER diagrams (entity-relationship diagrams) might help you.
You should also read up on database normalization; in fact, you should probably read up on it first.
Here is a link that might help:
http://code.tutsplus.com/articles/sql-for-beginners-part-3-database-relationships--net-8561