Beginner Database architecture - mysql

I am converting a spreadsheet to a database, but how do I accommodate multiple values for a field?
This is a database tracking orders with factories.
Import PO# is the unique key. Sometimes one order will have 0, 1, 2, 3, 4 or more customers requiring that we place their price tickets on the product in the factory. Every order is different. What's the proper way to accommodate multiple values in one field?

Generally, having multiple values in a field is bad database design. A one-to-many relationship should work in this scenario.
You will have an Order table with PO# as the primary key.
Then you will have an OrderDetails table with PO# as a foreign key, i.e. it is not designated as the primary key of that table.
Each row in the Order table has a unique PO# that does not repeat across rows.
In the OrderDetails table you will have one customer per row, and because PO# is not a primary key there, it can repeat across rows. This allows you to designate multiple customers per order. Each row also has its own PriceTicketsOrdered field, so you can know, per customer, what the price is.
Note that a customer can repeat across rows in the OrderDetails table as long as it is for a different PO# and/or product.
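The Order / OrderDetails layout described above can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names, column names, sample PO numbers, the composite key on order_details, and the reading of PriceTicketsOrdered as a quantity are all assumptions, not part of the original question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all names and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE orders (
    po_number TEXT PRIMARY KEY            -- the Import PO#, unique per order
);
CREATE TABLE order_details (
    po_number TEXT NOT NULL REFERENCES orders(po_number),
    customer  TEXT NOT NULL,
    price_tickets_ordered INTEGER NOT NULL,  -- assumed here to be a quantity
    PRIMARY KEY (po_number, customer)     -- assumption: one row per customer per order
);
""")
conn.execute("INSERT INTO orders VALUES ('PO-1001')")
conn.execute("INSERT INTO orders VALUES ('PO-1002')")  # an order with no ticket customers
conn.executemany(
    "INSERT INTO order_details VALUES (?, ?, ?)",
    [("PO-1001", "Customer A", 500), ("PO-1001", "Customer B", 250)],
)

def customers_for(po_number):
    """Return the customers whose price tickets go on the given order."""
    rows = conn.execute(
        "SELECT customer FROM order_details WHERE po_number = ? ORDER BY customer",
        (po_number,),
    )
    return [row[0] for row in rows]
```

An order with zero customers simply has no rows in order_details, so no field ever needs to hold more than one value.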
This is the best I can tell you based on the clarity of your question.
Personally, I normally spend time designing my database on paper or in drawing software like Visio before I start implementing it in a specific product like MySQL or PostgreSQL.
Reading up on ER diagrams (entity-relationship diagrams) might help you.
You should also read up on database normalization, probably before anything else.
Here is a link that might help:
http://code.tutsplus.com/articles/sql-for-beginners-part-3-database-relationships--net-8561

Related

Sql creating table without using foreign key

Consider that we have one SQL table, customers.
Now consider a table with two columns, customer_name and orders_name. One customer may have multiple orders (a one-to-many relationship), so we choose customer_name as the foreign key. But if one customer_name has 100 orders, we have to write the same customer_name 100 times, which is a waste of memory.
The customer_name, customer_orders table is
So I was thinking: can't we just make a table named after the customer? For example, if we have a customer_name bill, we can create a table named bill's orders and write all his orders in it. Now we are not using any foreign key.
The bill's orders table is
We can create more tables like this for other users. How is it then possible to delete the table when we delete that customer_name from the main table? Any ideas?

You solve the issue of wasted space by using surrogate keys. Instead of copying a huge alphanumeric field (names) to child tables, you would create an ID of sorts using a more compact data type (byteint, smallint, int, etc.). In the approach you propose where you create a separate table for each customer, you will run into the following issues:
cannot run aggregates across customers, i.e., you cannot simply do a sum, avg, min, etc. for sets of customers slicing the data different ways
SQL will be far more complex with each extra customer added to the queries
your data dictionary is going to grow huge and at some point you will incur major performance issues that are not easy to fix
The point of using a relational database is to allow for users to dynamically slice and dice the data. The method that you are proposing would not be useful for querying.
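A minimal sketch of the surrogate-key approach, assuming hypothetical customers and orders tables and sample rows (none of which come from the question). It uses Python's sqlite3 as a stand-in for a server database; the ON DELETE CASCADE clause is one common way to handle the "delete the customer's orders too" requirement and is an addition here, not something the answer above prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,    -- compact surrogate key
    customer_name TEXT NOT NULL           -- the name is stored exactly once
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customers(customer_id) ON DELETE CASCADE,
    order_name  TEXT NOT NULL
);
INSERT INTO customers VALUES (1, 'bill'), (2, 'anna');
INSERT INTO orders (customer_id, order_name) VALUES
    (1, 'keyboard'), (1, 'mouse'), (1, 'monitor'), (2, 'laptop');
""")

# One query aggregates across all customers; no per-customer tables are needed.
counts = conn.execute("""
    SELECT c.customer_name, COUNT(*)
    FROM orders AS o JOIN customers AS c USING (customer_id)
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()

# Deleting the customer removes that customer's orders through the cascade.
conn.execute("DELETE FROM customers WHERE customer_name = 'bill'")
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Each order row repeats only a small integer rather than the full name, and the per-customer grouping that would need one query per table in the proposed design is a single GROUP BY.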

Database design: same table - mixed data VS several tables - same schema

I would like to store information about people (who have a person_id) that is quite similar to each other, such as:
profession
nationality
tags
etc. (a limited number of characteristics, which is not expected to grow)
Since one person can have more than one tag (or profession, for example), it makes sense to normalise the database. All of this information requires a simple table design: primary key (id) + varchar.
I am wondering what makes more sense:
Store mixed information in one table = one schema
Store information in distinct tables, but tables have the same schema
Edit
This information and the people are connected in a third table: primary key | person_id | property_id

1. Store the information in distinct tables with the same schema if your database is OLTP (online transaction processing). You can then use joins to retrieve the data.
2. Keep the mixed information in one table if your database serves a data mart, data warehouse or data mining purpose, where performance is not an issue and MIS-related information carries more weight.

Is it necessary to bring the primary key of non repeating table while normalizing the database from UNF to 1NF

My UNF is
database(
  manager_id,
  manager_name,
  {supplier_id,
   supplier_name,
   {order_id,
    order_quantity}}
  {purchase_id,
   purchase_date}
)
Here manager_name, supplier_id, order_id and purchase_id are the primary keys.
During normalization there will be one table called purchase. Is it necessary to make manager_name a foreign key?
How can I normalize this database?
This is a part of my college project on database. Normalization is really confusing.

First consider splitting things out by things that naturally go together. In this case you have manager information, supplier information, order information and purchase information. I personally would want to know the difference between an order and a purchase because that is not clear to me.
So you have at least four tables for those separate pieces of information. (Depending on the other fields you might need, suppliers and managers could be in the same table with an additional field such as person_type to distinguish them. In that case you would want a lookup table to supply the valid person type values.)
Then you need to see how these things relate to each other. Are they in a one-to-one, a one-to-many or a many-to-many relationship? In a one-to-one relationship, the FK also needs a unique constraint or index to maintain the uniqueness. In a many-to-many relationship you will need an additional junction table that contains both ids.
Otherwise, in the simplest case, the child table purchase would have FKs to the manager, supplier and order tables.
Manager name should under no circumstances be a primary key. Many people have the same name. Use Manager ID as the key because it is unique where name is not. In general I prefer to separate out the names into First, middle and last so that you can sort on last name easily. However in some cultures this doesn't work so well.
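The "simplest case" above might look roughly like this. It is a sketch under several assumptions: the relationships between the four entities are guessed (the question does not state them), the sample rows are invented, the order table is called orders because ORDER is a reserved word, and Python's sqlite3 stands in for whatever database the project uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE manager (
    manager_id   INTEGER PRIMARY KEY,
    manager_name TEXT NOT NULL            -- names may repeat, so not a key
);
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,
    supplier_name TEXT NOT NULL
);
CREATE TABLE orders (
    order_id       INTEGER PRIMARY KEY,
    supplier_id    INTEGER NOT NULL REFERENCES supplier(supplier_id),
    order_quantity INTEGER NOT NULL
);
CREATE TABLE purchase (
    purchase_id   INTEGER PRIMARY KEY,
    purchase_date TEXT NOT NULL,
    manager_id    INTEGER NOT NULL REFERENCES manager(manager_id),
    supplier_id   INTEGER NOT NULL REFERENCES supplier(supplier_id),
    order_id      INTEGER NOT NULL REFERENCES orders(order_id)
);
-- Two different managers who happen to share a name.
INSERT INTO manager VALUES (1, 'Sam Lee'), (2, 'Sam Lee');
INSERT INTO supplier VALUES (10, 'Acme');
INSERT INTO orders VALUES (100, 10, 25);
INSERT INTO purchase VALUES (1000, '2024-01-15', 2, 10, 100);
""")

def purchase_with_unknown_manager_is_rejected():
    """Try to reference manager 99, which does not exist."""
    try:
        conn.execute("INSERT INTO purchase VALUES (1001, '2024-01-16', 99, 10, 100)")
    except sqlite3.IntegrityError:
        return True
    return False
```

The two 'Sam Lee' rows show why the name cannot serve as a key, while the foreign keys keep a purchase from pointing at a manager that does not exist.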

Boolean and String Values in the same table

I'm designing my first good sized project and I want to be sure I'm on the right path here so I thought I would run it by the community.
I have vendors that submit products to companies. The vendors choose which company they want to submit to and that brings up a page of questions chosen by the company. So far I have a Table of companies, a table of vendors, and table of products. Each with their own primary key, easy enough. My issue is with my table called submissions that starts to tie them all together for each new submission. I am trying to get away from having a submission table with a thousand columns because the companies all want to ask different questions. If I have
Table Submissions
submission_id
date
product_id FK
vendor_id FK
company_id FK
and
Table Questions
question_id
question
and to bridge the many to many
Table Questions_Submissions
questions_submissions_id
submission_id FK
question_id FK
answer
Would this be the recommended path for normalization? If so, is there any harm in having the answer column contain both boolean and string results, or should I somehow break the boolean questions into another table? I'm expecting millions of rows of data over the next few years and want to be sure I don't design this wrong from the beginning. Thanks for any feedback if you see a glaring error or red flag in this design.

"So far I have a Table of companies, a table of vendors, and table of products. Each with their own primary key, easy enough."
Each row has its own id number. That's not quite the same thing as you'd get by normalizing a relation. In a relational database, the important thing is not identifying a row, it's identifying what the row represents.
So, for example, this table
Table Questions
question_id
question
could quite easily end up with data that looks like this.
question_id  question
-----------  ------------------
1            What is your name?
2            What is your name?
3            What is your name?
4            What is your name?
5            What is your name?
Each row is uniquely identified, but each question (the important thing) is not. You need a unique constraint on {question}.
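A small sketch of that unique constraint in action, using Python's sqlite3 as a stand-in database. The add_question helper and the second sample question are hypothetical, added only to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE questions (
        question_id INTEGER PRIMARY KEY,
        question    TEXT NOT NULL UNIQUE   -- each question text may appear once
    )
""")
conn.execute("INSERT INTO questions (question) VALUES ('What is your name?')")

def add_question(text):
    """Return True if the question was stored, False if it was a duplicate."""
    try:
        conn.execute("INSERT INTO questions (question) VALUES (?)", (text,))
        return True
    except sqlite3.IntegrityError:
        return False
```

With the constraint in place, a second 'What is your name?' is rejected by the database itself instead of silently getting a new id.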
"I have vendors that submit products to companies."
Table Submissions
submission_id
date
product_id FK
vendor_id FK
company_id FK
You need a unique constraint on either {product_id, vendor_id, company_id} or {date, product_id, vendor_id, company_id}.
You also need a table of vendor products. Your table allows a vendor to submit any product--including every product they don't sell--to a company.
"The vendors choose which company they want to submit to and that brings up a page of questions chosen by the company." (Emphasis added)
Nothing in your schema stores the questions a company has chosen.
"is there any harm having the column answer contain boolean and string results"
You can express just about any common data type as a string. But with this structure, you can't constrain boolean values to just two values. If you add the possibility of numeric results, you can't constrain them to sane values, either.

This is certainly one way to go about it, and it looks pretty good.
You can do some clever things with the answer and some if statements in the query to handle the different types of answers, but it does add some complexity to the solution, so you should think about what you are trying to do with the answers.
For Boolean, you can just as easily get away with "true" or "false" in the varchar field, and do a count on them. If you needed to get answers that are numeric or dates, for sums or averages directly in the query, you could split the answer into types.
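One possible reading of "split the answer into types" is a set of typed answer columns on the bridge table, sketched below with Python's sqlite3. The column names answer_bool, answer_number and answer_text, the CHECK constraints, the composite key (in place of the question's questions_submissions_id), and the sample rows are all assumptions made for illustration. MySQL only enforces CHECK constraints from version 8.0.16 onward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE questions_submissions (
        submission_id  INTEGER NOT NULL,
        question_id    INTEGER NOT NULL,
        answer_bool    INTEGER CHECK (answer_bool IN (0, 1)),
        answer_number  REAL,
        answer_text    TEXT,
        PRIMARY KEY (submission_id, question_id),
        -- exactly one of the typed columns must be filled in
        CHECK ((answer_bool IS NOT NULL) + (answer_number IS NOT NULL)
               + (answer_text IS NOT NULL) = 1)
    )
""")
conn.executemany(
    "INSERT INTO questions_submissions VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, None, None),                 # boolean answer: yes
        (2, 1, 0, None, None),                 # boolean answer: no
        (1, 2, None, 12.5, None),              # numeric answers
        (2, 2, None, 7.5, None),
        (1, 3, None, None, "Organic cotton"),  # free-text answer
    ],
)

# Typed columns can be aggregated directly, without casting strings.
yes_count = conn.execute(
    "SELECT SUM(answer_bool) FROM questions_submissions WHERE question_id = 1"
).fetchone()[0]
avg_number = conn.execute(
    "SELECT AVG(answer_number) FROM questions_submissions WHERE question_id = 2"
).fetchone()[0]
```

The trade-off is a few mostly-empty columns per row in exchange for constraints and aggregates that a single varchar answer column cannot offer.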

Should order_products table be denormalized?

The order_products table holds data of products with the product name and price. It has a list of records what customers have bought.
There are also two fields called product_name and price which are duplicate data from the products table.
Is it worth it to normalize the order_products table and create a history (audit) table for product name and price? Then I wouldn't need product_name and price in the order_products table anymore?

I assume you need to store product name and price at the time of the order. Both will change in the course of time. If that happens a lot, your current approach may be good enough.
I would consider a normalized approach, especially if you have many rows in order_products per (product name, price). Have an additional table that stores the volatile states of a product every time they change. It could be called product_history, as you already hinted. Just save the date (or timestamp) with every new state. Have a foreign key link to the table product to preserve referential integrity. Like this:
create table product_history
(product_id integer
,valid_from date -- or timestamp
,product_name varchar
,price decimal
,PRIMARY KEY (product_id, valid_from)
,FOREIGN KEY (product_id) REFERENCES product(product_id)
ON DELETE CASCADE
ON UPDATE CASCADE)
A fast query to look up the applicable volatile attributes:
SELECT *
FROM product_history
WHERE product_id = $my_product_id
AND valid_from <= $my_date
ORDER BY valid_from DESC
LIMIT 1;
You definitely need an index on (product_id, valid_from) to speed up this query. The primary key in my example will probably do.
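To see the lookup behave as intended, here is a small runnable sketch using Python's sqlite3. The sample rows and the state_at helper are invented for illustration, dates are stored as ISO text because SQLite has no date type, and the foreign key to product is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product_history (
        product_id   INTEGER NOT NULL,
        valid_from   TEXT NOT NULL,       -- ISO dates sort correctly as text
        product_name TEXT NOT NULL,
        price        REAL NOT NULL,
        PRIMARY KEY (product_id, valid_from)
    )
""")
conn.executemany(
    "INSERT INTO product_history VALUES (?, ?, ?, ?)",
    [
        (1, "2023-01-01", "Widget", 9.99),
        (1, "2023-06-01", "Widget", 11.49),
        (1, "2024-01-01", "Widget Pro", 12.99),
    ],
)

def state_at(product_id, on_date):
    """Return (product_name, price) in effect for the product on the given date."""
    return conn.execute(
        """
        SELECT product_name, price
        FROM product_history
        WHERE product_id = ? AND valid_from <= ?
        ORDER BY valid_from DESC
        LIMIT 1
        """,
        (product_id, on_date),
    ).fetchone()
```

An order row then only needs product_id and the order date; the name and price that applied at the time follow from the history table. A date before the first history row returns no state at all, which the calling code has to handle.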

That depends. What is the purpose of that table?
In general, tables like that can be used for statistical analysis of market trends, so it's important to have both product_name and price: the product price today may be different from what it was one month ago, and you may want to know at which prices products were bought most.
However, if the price is present in that table because it is part of the product's primary key, then that is just bad practice and the key should be reduced.

It's not possible to make this judgement knowing just the database structure. It depends on how you use your database (i.e. inserts, selects, updates and deletes, and how frequently each happens).
At one end, if your solution is a reporting solution on a read-only database, you should keep those duplicates! At the other end, if your solution is a logging solution that only logs information but never retrieves it, I'd go for the normalized model you're suggesting.
Fully normalized databases are not optimized for performance. You often have to denormalize your database design.
Very often a model that has a certain degree of redundant data is the fastest one. When denormalizing, you just have to keep a steady eye on the balance between faster queries and slower insertions/updates!
The answers to the question "When to Denormalize a Database Design" may also help you make your decision.

Yes, that's a good idea, but a better idea is to create one field in the order_products table and dump all your order info there after serializing it. With this approach you don't have to create two new tables (maybe more if you want to do the same for gift coupon info, shipping info, etc.).
The rationale behind the approach is that order_products rows are placed orders, which means they are "published records". Published records don't change much and shouldn't be modified, and they should be kept for future audits.
