Database Approach - Redundant data - mysql

I have 3 tables:
products (id, name, price, etc)
orders (id, date, payment_method, etc)
shipments (id, order_id, product_id, address, etc)
My question is: is it correct to keep product_id in the shipments table? I keep it there so I can look up information about a shipped product without going through the orders table.

I would suggest:
products (product_id, name, price, etc)
orders (order_id, date, payment_method, etc)
orderitem (orderitem_id, order_id, product_id, ...)
shipment (shipment_id, order_id, ... )
shipment is kind of redundant - I'd add the address etc into orders...
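A minimal sketch of how that layout could answer "which products were in this shipment?" without a copied product_id. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the column types and sample rows are assumptions made up for illustration:

```python
import sqlite3

# SQLite stands in for MySQL here; column lists are trimmed to the essentials.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products  (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, date TEXT, payment_method TEXT);
CREATE TABLE orderitem (orderitem_id INTEGER PRIMARY KEY,
                        order_id INTEGER REFERENCES orders(order_id),
                        product_id INTEGER REFERENCES products(product_id));
CREATE TABLE shipment  (shipment_id INTEGER PRIMARY KEY,
                        order_id INTEGER REFERENCES orders(order_id),
                        address TEXT);
-- Sample rows are invented for this example.
INSERT INTO products  VALUES (1, 'Keyboard', 49.0), (2, 'Mouse', 19.0);
INSERT INTO orders    VALUES (10, '2015-05-02', 'card');
INSERT INTO orderitem VALUES (100, 10, 1), (101, 10, 2);
INSERT INTO shipment  VALUES (1000, 10, '1 Example Street');
""")

# Products in a shipment are reached through the order, not a copied product_id.
shipped = conn.execute("""
    SELECT p.name
    FROM shipment s
    JOIN orderitem oi ON oi.order_id = s.order_id
    JOIN products p   ON p.product_id = oi.product_id
    WHERE s.shipment_id = ?
    ORDER BY p.name
""", (1000,)).fetchall()
print(shipped)
```

If an order can be split across several shipments, the shipment would need to reference individual order items instead of the whole order.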

You can do it, but be careful: if the information in the orders table can change, you have a problem. For example, if the corresponding record in orders changes its product_id, the copy in shipments becomes stale and the database is inconsistent.
I do use redundant columns e.g. in static dictionaries.
Also check the normal forms (NF) of database design; I'm not sure whether this redundancy violates one of them. But it is up to you whether you adhere to a given NF or not.

Following the principles of truth and beauty, you should not store redundant data - it's a great opportunity for bugs to occur, it's ugly, it causes confusion in the minds of future developers.
You're allowed to break the principles of truth and beauty, but only if you run into a problem you cannot solve in any other way. For instance, if you find your queries are just too slow by joining to the orders table, denormalizing the data (which is the technical name for what you're doing) is okay - if you document it and make sure all developers understand.
Just avoiding an extra join in a query doesn't seem like a good enough reason....

Related

During normalization of data, can I add my own attributes?

Can we add our own attributes to the main table during the normalization process?
For example, we have
custid, custname, invoice_date, invoice amount, prod_code, prod_description.
Can I add invoice_ID to the table ?
You have left out a lot of key details, which makes this a very poor question. Assuming you have the correct permission/role in your DBMS, you can alter any table's attributes. Normalisation normally comes after you have a draft of the information you want to represent in each table; you then follow the steps of the normalisation process.
http://www.studytonight.com/dbms/database-normalization.php
(Also some advice: you haven't posted any of your DB design, or even which attributes belong to which table. If all of the attributes listed above belong to a single table, that breaks the second/third normal form rules.)
An example of a normalised environment:
customers
customer_id*, customer_name
invoices
invoice_id*,invoice_date, customer_id,invoice amount
products
product_id*,product_code, product_description
invoice_detail
invoice_id*,product_id*,quantity
* = (component of) primary key
This assumes that there is a 1:1 relationship between orders and invoices, and that invoices are generated on the date an order is placed.
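As a rough illustration, the normalised tables above can be joined back into the flat row shape from the question, this time including invoice_id. The sketch uses Python's sqlite3 in place of MySQL; the column types and the sample data are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE invoices  (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT,
                        customer_id INTEGER REFERENCES customers(customer_id),
                        invoice_amount REAL);
CREATE TABLE products  (product_id INTEGER PRIMARY KEY, product_code TEXT,
                        product_description TEXT);
CREATE TABLE invoice_detail (invoice_id INTEGER REFERENCES invoices(invoice_id),
                             product_id INTEGER REFERENCES products(product_id),
                             quantity INTEGER,
                             PRIMARY KEY (invoice_id, product_id));
-- Sample rows are invented for this example.
INSERT INTO customers VALUES (1, 'Ada');
INSERT INTO invoices  VALUES (500, '2015-05-02', 1, 87.0);
INSERT INTO products  VALUES (7, 'KB-01', 'Keyboard'), (8, 'MS-01', 'Mouse');
INSERT INTO invoice_detail VALUES (500, 7, 1), (500, 8, 2);
""")

# Rebuild the flat view (customer, invoice, product) from the separate tables.
flat = conn.execute("""
    SELECT c.customer_name, i.invoice_id, i.invoice_date, p.product_code, d.quantity
    FROM invoices i
    JOIN customers c      ON c.customer_id = i.customer_id
    JOIN invoice_detail d ON d.invoice_id = i.invoice_id
    JOIN products p       ON p.product_id = d.product_id
    ORDER BY p.product_code
""").fetchall()
print(flat)
```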

View collective data from all tables using a single query

I have 20 tables and want to see the data of all tables with one query. Is there any way I could do this?
The tables join on 2 columns. Also, is it possible to view the combined data directly in XAMPP?
It depends entirely on how your tables are connected to each other, and you need a little knowledge of how to get your MySQL server to do it for you.
In practice, let's say you have a table for customers, a table for products, and a relation between them indicating which customer bought which product (which is also a table in this case):
Customers (id, username, password, fullname, email)
Products (id, name, sku, price)
Customers_Products (customer_id, product_id, created_at)
And we want to get a list of all products sold since the start of May 2015:
SELECT c.fullname, p.name, p.sku, cp.created_at
FROM Products AS p
INNER JOIN Customers_Products AS cp ON p.id = cp.product_id
INNER JOIN Customers AS c ON c.id = cp.customer_id
WHERE cp.created_at > '2015-05-01 00:00:00';
I wrote this example to show that how you do this depends entirely on your requirements, which can range from very simple, like selecting data from just one table, to very complicated, like joining multiple tables while filtering, grouping and ordering them.
As for your second question: yes, you can see what's in your tables from phpMyAdmin, but this is usually not a good solution, since a user in phpMyAdmin can see the plumbing of your system. The approach most people use is to write some code that fetches the data and shows it to the user.

Database design: Associating tables of variable types

I am looking for a solution to (something I imagine to be) a common and trivial problem, but I couldn't find the correct words to find solutions on Google.
Starting situation
I have a table orders that is associated to a product and a customer:
orders (id, product_id, customer_id)
The problem
Each order must have a payment associated. These payments come from different payment processors (e.g. Paypal, stripe, Google Wallet, Amazon Payments) and thus have different types of data and data fields associated to them.
I'm looking to find a clean and normalized database design for this.
My own attempt/idea
I could create separate tables for the different types of payments and associate the order from there:
paypal_payment (id, order_id, currency, amount, [custom paypal fields])
stripe_payment (id, order_id, currency, amount, [custom stripe info])
direct_debit_payment (id, order_id, currency, amount, [custom direct debit info])
The problem: With this approach I would need to SELECT from each table for every payment type to find an associated payment to an order, so that doesn't seem very efficient to me.
What is a clean solution to this problem? I'm using MySQL if that is relevant at all.
Thanks for your help! :)
Your Payment table should have all fields that are common to all payments (amount, type of payment, etc) as well as a unique ID.
Variable fields would then be stored in a second table with three columns:
Payment UID (foreign key to the Payment table)
Type (what kind of data this current row is storing, i.e. the name of a custom field for the payment type)
Value
This allows you to associate any arbitrary number of custom fields with each payment record.
Obviously, each payment could have any number of entries in this secondary table (or none if none are needed).
This works quite well, as most of the time you won't need the payment-type-specific info and can run queries that ignore this table. But the data will still be there when you need it.
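A small sketch of this two-table layout, using Python's sqlite3 as a stand-in for MySQL. The table name payment_field, the field names (payer_email, transaction_ref) and the sample rows are hypothetical, chosen only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, order_id INTEGER,
                      type TEXT, currency TEXT, amount REAL);
-- One row per custom field: (payment, field name, field value).
CREATE TABLE payment_field (payment_id INTEGER REFERENCES payment(payment_id),
                            type TEXT, value TEXT,
                            PRIMARY KEY (payment_id, type));
-- Sample rows are invented; payment 2 has no custom fields at all.
INSERT INTO payment VALUES (1, 10, 'paypal', 'USD', 68.0),
                           (2, 11, 'direct_debit', 'EUR', 20.0);
INSERT INTO payment_field VALUES (1, 'payer_email', 'ada@example.com'),
                                 (1, 'transaction_ref', 'ABC123');
""")

# Most queries only need the common columns and never touch payment_field.
common = conn.execute(
    "SELECT type, amount FROM payment WHERE order_id = ?", (10,)).fetchone()

# The type-specific fields are fetched only when needed, as name/value pairs.
extra = dict(conn.execute(
    "SELECT type, value FROM payment_field WHERE payment_id = ?", (1,)).fetchall())
print(common, extra)
```

The trade-off is that every custom value is stored as text, so the database cannot enforce per-field types or required fields.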
If you're never going to use those fields' contents in a WHERE or JOIN clause (which is usually the case):
Add a payment method field (an enum or varchar)
Serialize the paypal, stripe, or whatever the client used as json
Store the result using the most appropriate database type: text (or the native JSON type available since MySQL 5.7) in MySQL, json in Postgres.
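Sketched with Python's sqlite3 and json modules, the serialize-and-store steps might look like this. The details column name and the contents of processor_response are assumptions; a real processor payload will look different:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# A plain TEXT column holds the serialized, processor-specific data.
conn.execute("""CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, order_id INTEGER,
                                      method TEXT, currency TEXT, amount REAL,
                                      details TEXT)""")

# Hypothetical processor-specific payload.
processor_response = {"payer_email": "ada@example.com", "transaction_ref": "ABC123"}
conn.execute("INSERT INTO payment VALUES (?, ?, ?, ?, ?, ?)",
             (1, 10, "paypal", "USD", 68.0, json.dumps(processor_response)))

# The common columns stay queryable; the blob is decoded in application code.
method, raw = conn.execute(
    "SELECT method, details FROM payment WHERE order_id = ?", (10,)).fetchone()
details = json.loads(raw)
print(method, details)
```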
A normalized way you could do this is by having a base payment table and extension tables for the other payment types.
All common payment information would go in your payment_base table.
payment_base(payment_id, order_id, currency, amount)
paypal_payment (paypal_payment_id, payment_id, [custom paypal fields])
stripe_payment (stripe_payment_id, payment_id, [custom stripe info])
direct_debit_payment (direct_debit_payment_id, payment_id, [custom direct debit info])
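One way to read this layout back with a single query is to LEFT JOIN each extension table onto the base table. The sketch below uses Python's sqlite3 in place of MySQL and shows only two processors; the custom columns (payer_email, charge_ref), the UNIQUE constraint on payment_id and the sample rows are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payment_base (payment_id INTEGER PRIMARY KEY, order_id INTEGER,
                           currency TEXT, amount REAL);
-- UNIQUE keeps each base payment to at most one row per extension table.
CREATE TABLE paypal_payment (paypal_payment_id INTEGER PRIMARY KEY,
                             payment_id INTEGER UNIQUE REFERENCES payment_base(payment_id),
                             payer_email TEXT);
CREATE TABLE stripe_payment (stripe_payment_id INTEGER PRIMARY KEY,
                             payment_id INTEGER UNIQUE REFERENCES payment_base(payment_id),
                             charge_ref TEXT);
-- Sample rows are invented for this example.
INSERT INTO payment_base VALUES (1, 10, 'USD', 68.0), (2, 11, 'EUR', 20.0);
INSERT INTO paypal_payment VALUES (1, 1, 'ada@example.com');
INSERT INTO stripe_payment VALUES (1, 2, 'ch_example');
""")

# Common data comes from payment_base; the extension tables are optional joins.
rows = conn.execute("""
    SELECT b.order_id, b.amount,
           CASE WHEN pp.payment_id IS NOT NULL THEN 'paypal'
                WHEN sp.payment_id IS NOT NULL THEN 'stripe' END AS processor
    FROM payment_base b
    LEFT JOIN paypal_payment pp ON pp.payment_id = b.payment_id
    LEFT JOIN stripe_payment sp ON sp.payment_id = b.payment_id
    ORDER BY b.order_id
""").fetchall()
print(rows)
```

Queries that only need the amount or currency can skip the joins and read payment_base alone.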

Can I create a table structure with dynamic columns in MySQL?

I've created a stock control database which contains two tables (actually more than two, but these are the two that are relevant to my question): Stock, and Receipts
I would like the link between the stock in the Stock table and the stock in the Receipts table to be clearer. This would be fine if a customer could only order one item of stock per receipt: I'd simply have a StockID column and a Quantity column in the Receipts table, with StockID as an FK to the ID in the Stock table. However, a customer can make a receipt with any number of stock items on it, which would mean I'd need a large number of columns in the Receipts table (i.e. StockID_1, Quantity_1, StockID_2, Quantity_2 etc.).
Is there a way around this within MySQL (something like a dynamically expanding set of columns), other than what I've done at the moment, which is to have an OrderContents column with the following structure (not enforced by the database in any way): StockID1xQuantity,StockID2xQuantity and so on?
I would post an image of the DB structure, but I don't have enough reputation yet. My lecturer mentioned that it could be done by normalising the database into 4th or 5th normal form?
I'd suggest having 3 tables:
Stock (StockID) + stock specific fields
Receipt (ReceiptID) + receipt specific fields.
StockReceipt (ReceiptID, StockID, Quantity) (could have a StockReceiptID, or use StockID+ReceiptID as Primary Key)
A solution including prices could look like:
Stock (StockID, Price)
PriceHistory (StockID, Price, Date) or (DateFrom, DateTo)
Receipt (ReceiptID, ReceiptDate)
StockReceipt (ReceiptID, StockID, Quantity)
That way you can calculate TotalStockReceiptPrice and TotalReceiptPrice for any receipt in the past.
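A sketch of how the total for a past receipt could be computed from such a price history, picking the price that was valid on the receipt date. It uses Python's sqlite3 in place of MySQL and assumes the (DateFrom, DateTo) variant with an exclusive DateTo, where NULL means "still current"; the sample rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stock (StockID INTEGER PRIMARY KEY, Price REAL);
CREATE TABLE PriceHistory (StockID INTEGER, Price REAL, DateFrom TEXT, DateTo TEXT);
CREATE TABLE Receipt (ReceiptID INTEGER PRIMARY KEY, ReceiptDate TEXT);
CREATE TABLE StockReceipt (ReceiptID INTEGER, StockID INTEGER, Quantity INTEGER,
                           PRIMARY KEY (ReceiptID, StockID));
INSERT INTO Stock VALUES (99, 12.0);
-- DateTo is exclusive; NULL means the price is still current (an assumption).
INSERT INTO PriceHistory VALUES (99, 10.0, '2015-01-01', '2015-06-01'),
                                (99, 12.0, '2015-06-01', NULL);
INSERT INTO Receipt VALUES (1, '2015-03-15'), (2, '2015-07-01');
INSERT INTO StockReceipt VALUES (1, 99, 2), (2, 99, 2);
""")

# Join each receipt line to the price row whose validity window covers ReceiptDate.
totals = conn.execute("""
    SELECT r.ReceiptID, SUM(ph.Price * sr.Quantity) AS TotalReceiptPrice
    FROM Receipt r
    JOIN StockReceipt sr ON sr.ReceiptID = r.ReceiptID
    JOIN PriceHistory ph ON ph.StockID = sr.StockID
                        AND r.ReceiptDate >= ph.DateFrom
                        AND (ph.DateTo IS NULL OR r.ReceiptDate < ph.DateTo)
    GROUP BY r.ReceiptID
    ORDER BY r.ReceiptID
""").fetchall()
print(totals)
```

The dates are stored as ISO-8601 strings so that plain string comparison orders them correctly; in MySQL a DATE column would do the same job.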
I suspect this might be what you're looking for:
Stock (StockID, StockPrice)
Receipt (ReceiptID)
StockReceipt (ReceiptID, StockID, Quantity)
SELECT r.ReceiptID, SUM(s.StockPrice * sr.Quantity) AS ReceiptPrice
FROM Receipt r
INNER JOIN StockReceipt sr ON r.ReceiptID = sr.ReceiptID
INNER JOIN Stock s ON sr.StockID = s.StockID
GROUP BY r.ReceiptID
This is all very normalised (again, no idea to what normal form - 3rd?). However it only works if the StockPrice on the Stock record NEVER changes. As soon as it changes your ReceiptPrices would all reflect the new price instead of what the customer actually paid.
If the price can change, you'd need to either keep a price history table (ItemID, Price, DateTo, DateFrom) or record the StockPrice on the StockReceipt record (and then get rid of the JOIN to the Stock record in the above query and make it use sr.StockPrice instead of s.StockPrice)
To do the INSERT you posted below, you'd have to do:
INSERT INTO StockReceipt (ReceiptID, StockID, Quantity, TotalStockPrice)
SELECT 1, 99, 2, s.StockPrice
FROM Stock s
WHERE s.StockID = 99
However it's quite likely that whatever is issuing this receipt (and triggers the INSERT) already knows the price so could just insert the value.
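To illustrate the second option (recording the price on the receipt line), here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL. It assumes the copied column is named StockPrice and holds the unit price, and the sample values are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stock (StockID INTEGER PRIMARY KEY, StockPrice REAL);
CREATE TABLE StockReceipt (ReceiptID INTEGER, StockID INTEGER, Quantity INTEGER,
                           StockPrice REAL, PRIMARY KEY (ReceiptID, StockID));
INSERT INTO Stock VALUES (99, 10.0);
-- Copy the current unit price onto the receipt line at the time of sale.
INSERT INTO StockReceipt (ReceiptID, StockID, Quantity, StockPrice)
SELECT 1, 99, 2, s.StockPrice FROM Stock s WHERE s.StockID = 99;
-- A later price change must not affect the receipt.
UPDATE Stock SET StockPrice = 15.0 WHERE StockID = 99;
""")

# The total is computed from the price stored on the line, with no join to Stock.
receipt_price = conn.execute("""
    SELECT SUM(sr.StockPrice * sr.Quantity)
    FROM StockReceipt sr WHERE sr.ReceiptID = 1
""").fetchone()[0]
print(receipt_price)
```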
No, relational databases do not allow dynamic columns. The definition of a relational table is that it has a header that names the columns, and every row has the same columns.
Your technique of repeating the groups of stock columns is a violation of First Normal Form, and it also has a lot of practical problems, for instance:
How do you know how many extra columns to create?
How do you search for a given value when you don't know which column it's in?
How do you enforce uniqueness?
The simplest solution is the one @OGHaza described: store the extra stock/quantity data as rows in another table. That way the problems above are solved.
You don't need to create extra columns, just extra rows, which is easy with INSERT.
You can search for a given value over one column to find it.
You can put constraints on the column.
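These three points can be sketched with Python's sqlite3 as a stand-in for MySQL. The composite primary key is one possible uniqueness rule, assumed here for illustration, and the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StockReceipt (ReceiptID INTEGER, StockID INTEGER, Quantity INTEGER,
                           PRIMARY KEY (ReceiptID, StockID));
-- Receipt 1 has two items, receipt 2 has one: extra rows, not extra columns.
INSERT INTO StockReceipt VALUES (1, 99, 2), (1, 42, 1), (2, 99, 5);
""")

# One column to search, however many items a receipt has.
receipts_with_99 = [row[0] for row in conn.execute(
    "SELECT ReceiptID FROM StockReceipt WHERE StockID = ? ORDER BY ReceiptID", (99,))]

# The composite primary key rejects the same stock item twice on one receipt.
try:
    conn.execute("INSERT INTO StockReceipt VALUES (1, 99, 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(receipts_with_99, duplicate_rejected)
```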
If you really want to understand relational concepts, a nice book that is easy to read is: SQL and Relational Theory: How to Write Accurate SQL Code by C. J. Date.
There are also situations where you want to expand a table definition with dynamic columns that aren't repeating -- they're just new attributes. This is not relational, but it doesn't mean that we don't need some data modeling techniques to handle the scenario you describe.
For this type of problem, you might like to read my presentation Extensible Data Modeling with MySQL, for an overview of different solutions, and their pros and cons.
PS: Fourth and Fifth normal form have nothing to do with this scenario. Your lecturer obviously doesn't understand them.

How to expand these tables to support revision history

I have 3 tables:
pricelists (pricelist_id, name)
prices (price_id, pricelist_id, price, note)
tickets (ticket_id, price_id, name, time)
The main reason for versioning prices is that prices can change in the future. I want to keep information about past prices for statistics, and I want each ticket to keep the price it was actually sold at, not a later changed price.
Can you please give me some example of queries?
I suppose one possible approach is making two price tables instead of one: the first one will store some generic price-related data (like 'note' and the 'pricelist_id' link, as these won't change with time), and the second one will store the actual prices (along with, probably, timestamps of their activation, but that's not necessary):
prices (price_id, pricelist_id, note)
price_versions (price_ver_id, price_id, price, started_at, ended_at)
tickets (ticket_id, price_ver_id, name, issued_at)
As you see, you refer to price_versions in your tickets to get the specific price. But you can easily collect the generic price-related information as well (by joining the prices table from there).
This approach lets you add a constraint checking that issued_at is not before started_at and not after ended_at (if that is NOT NULL in the corresponding row). But that's an addition, not a requirement, I suppose.
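Since the question asked for example queries, here is a rough sketch against the three tables above, using Python's sqlite3 in place of MySQL. The sample rows are invented, and the validity check is written as a query here instead of a database constraint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prices (price_id INTEGER PRIMARY KEY, pricelist_id INTEGER, note TEXT);
CREATE TABLE price_versions (price_ver_id INTEGER PRIMARY KEY,
                             price_id INTEGER REFERENCES prices(price_id),
                             price REAL, started_at TEXT, ended_at TEXT);
CREATE TABLE tickets (ticket_id INTEGER PRIMARY KEY,
                      price_ver_id INTEGER REFERENCES price_versions(price_ver_id),
                      name TEXT, issued_at TEXT);
-- Sample rows are invented; a NULL ended_at marks the current version.
INSERT INTO prices VALUES (1, 1, 'adult');
INSERT INTO price_versions VALUES (1, 1, 8.0, '2015-01-01', '2015-06-01'),
                                  (2, 1, 9.5, '2015-06-01', NULL);
INSERT INTO tickets VALUES (1, 1, 'Ann', '2015-03-15'), (2, 2, 'Bob', '2015-07-01');
""")

# Each ticket keeps the price that applied when it was issued.
paid = conn.execute("""
    SELECT t.name, p.note, v.price
    FROM tickets t
    JOIN price_versions v ON v.price_ver_id = t.price_ver_id
    JOIN prices p         ON p.price_id = v.price_id
    ORDER BY t.ticket_id
""").fetchall()

# Optional consistency check: tickets issued outside their version's validity window.
out_of_window = conn.execute("""
    SELECT t.ticket_id
    FROM tickets t
    JOIN price_versions v ON v.price_ver_id = t.price_ver_id
    WHERE t.issued_at < v.started_at
       OR (v.ended_at IS NOT NULL AND t.issued_at > v.ended_at)
""").fetchall()
print(paid, out_of_window)
```

Changing a price then means closing the current price_versions row (setting ended_at) and inserting a new one, so existing tickets keep pointing at the old version.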