how to design the tables for customer orders and bills - mysql

I am designing a database schema (orders and bills) for a hotel system.
The attached image shows the tables in the database schema.
The question is how do I design the bills table, so that I can calculate the customer bill from orders the customer has made?
My assumption is that a bill is calculated after the order is made, and not the other way round, e.g. creating a bill before we make an order.
I have looked at this answer; however, it does not solve my problem, since I want to calculate bills from customer orders.
The red rectangle shows the relationship between the orders and bills tables. This is where I am stuck: I don't know how to design the tables.

Some language standardization first:
By Order you mean Sales Order, OrderDetails are called Line Items, and a Bill is usually called a Sales Invoice.
A sales invoice is a request for payment. You issue one when you think someone owes you money.
Depending on the terms of the sales order, someone owes you money:
after the order is completed
after the service is first delivered
after the service is completely delivered
after some period of time based on the terms of the sales order
For a hotel, usually you ask for money after the service has been completely delivered, but perhaps with a deposit, or intermediate payments for a long stay.
An invoice is not necessarily for one sales order. You can combine multiple sales orders into one invoice.
An invoice has line items referencing the sales order line items that you are requesting payment for.
You may have to issue multiple invoices for the same person/sales order.

EDIT 3 added design + cleanup
Every base table is the rows satisfying some statement. Find the statement.
Customer is the rows satisfying: customer [CustomerId] named [CustomerName] lives at ...
Product is the rows satisfying: product [ProductId] is named [Productname] costing [ProductPrice]
OrderDetail is the rows satisfying: orderDetail [OrderDetailId] of order [OrderId] is quantity [quantity] of product [ProductId]
Order is the rows satisfying: customer [CustomerId] ordered [OrderId] on [dateOfOrder]
What rows do you want in Bill? I'll guess...
Bill is the rows satisfying:
Bill [BillId] is for order [OrderId] on [dateOfBill] ... ???
You can find out some things about a bill by using its order. You must determine what else, besides its date and order, you want to know about a bill (e.g. to write one) and then what statement bill rows satisfy (i.e. finish the "...") that gives you that info directly (as with its date) or indirectly (as with its order).
I asked
what else, besides its date and order, you want to know about a bill (e.g. to write one)
BillId
dateOfBill
OrderId
order OrderId's customer's CustomerId, CustomerName, CustomerAddress ...
order OrderId's dateOfOrder
for every orderDetail whose OrderId is the bill's OrderId:
quantity, ProductId, ProductName, ProductPrice, (quantity * ProductPrice) as productProduct
sum(quantity * ProductPrice) as total
over every orderDetail whose OrderId is the bill's OrderId
I asked
what statement bill rows satisfy that gives you that info directly or indirectly
You suggested
For the bills table I intend to have the following fields
Bill: BillId (PK), CustomerId (FK), OrderId (FK), dateOfBill
Bill has to directly give us a BillId, dateOfBill and OrderId; they're nowhere else. But everything else can be got indirectly.
Bill is the rows satisfying:
bill [BillId] is for order [OrderId] and was billed on [dateOfBill]
The reason I mention statements is that one needs them to query and to determine FDs, keys, uniqueness, FKs, and other constraints (rather than relying on one's vague intuitions). This is explicit in the design methods ORM2, NIAM and FCO-IM.
I determined the content of a bill above by finding what statement its rows will satisfy:
customer [CustomerId] named [CustomerName] at [CustomerAddress] ...
owes us $[total] for order [OrderId]
ordering [quantity] of product [ProductId] named [ProductName] @ price $[ProductPrice] = [productProduct]
as recorded in bill [BillId]
This is a statement made from the statements given for each table, except that I need some statements not in any table yet, namely the stuff that (therefore) Bill needs to give. By replacing the statements by their tables we will get the query whose values are the rows we want.
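As a rough sketch of "replacing the statements by their tables", the bill lines and the total can be derived with joins from Bill through Order and OrderDetail to Product. The example below uses Python's built-in sqlite3 module in place of MySQL so that it runs on its own; the sample rows, the integer prices and the column types are assumptions made only for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, CustomerName TEXT);
CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, ProductName TEXT,
                      ProductPrice INTEGER);
CREATE TABLE "Order" (OrderId INTEGER PRIMARY KEY,
                      CustomerId INTEGER REFERENCES Customer,
                      dateOfOrder TEXT);
CREATE TABLE OrderDetail (OrderDetailId INTEGER PRIMARY KEY,
                          OrderId INTEGER REFERENCES "Order",
                          ProductId INTEGER REFERENCES Product,
                          quantity INTEGER);
-- Bill stores only what is found nowhere else: its id, its order, its date.
CREATE TABLE Bill (BillId INTEGER PRIMARY KEY,
                   OrderId INTEGER REFERENCES "Order",
                   dateOfBill TEXT);

-- Made-up sample data.
INSERT INTO Customer VALUES (1, 'Ann');
INSERT INTO Product VALUES (10, 'Room night', 80), (11, 'Breakfast', 12);
INSERT INTO "Order" VALUES (100, 1, '2024-05-01');
INSERT INTO OrderDetail VALUES (1000, 100, 10, 2), (1001, 100, 11, 3);
INSERT INTO Bill VALUES (500, 100, '2024-05-03');
""")

# One row per line of the bill; everything except BillId comes from joins.
bill_lines = conn.execute("""
    SELECT b.BillId, c.CustomerName, p.ProductName, od.quantity,
           p.ProductPrice, od.quantity * p.ProductPrice AS productProduct
    FROM Bill b
    JOIN "Order" o      ON o.OrderId = b.OrderId
    JOIN Customer c     ON c.CustomerId = o.CustomerId
    JOIN OrderDetail od ON od.OrderId = o.OrderId
    JOIN Product p      ON p.ProductId = od.ProductId
    WHERE b.BillId = ?
    ORDER BY od.OrderDetailId
""", (500,)).fetchall()

# The bill total is the sum over the same order details.
bill_total = conn.execute("""
    SELECT SUM(od.quantity * p.ProductPrice)
    FROM Bill b
    JOIN OrderDetail od ON od.OrderId = b.OrderId
    JOIN Product p      ON p.ProductId = od.ProductId
    WHERE b.BillId = ?
""", (500,)).fetchone()[0]

print(bill_lines)
print(bill_total)
```

Note that CustomerId is not stored in Bill here, because it can be reached through the order.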

Related

MySQL - When shouldn't I Join tables? Combinatorial Explosion of values

I am working on a database called classicmodels, which I found at: https://www.mysqltutorial.org/mysql-sample-database.aspx/
I realized that when I executed an Inner Join between 'payments' and 'orders' tables, a 'cartesian explosion' occurred. I understand that these two tables are not meant to be joined. However, I would like to know if it is possible to identify this just by looking at the relational schema or if I should check the tables one by one.
For instance, the customer number '141' appears 26 times in the 'orders table', which I found by using the following code:
SELECT
customerNumber,
COUNT(customerNumber)
FROM
orders
WHERE customerNumber=141
GROUP BY customerNumber;
And the same customer number (141) appears 13 times in the payments table:
SELECT
customerNumber,
COUNT(customerNumber)
FROM
payments
WHERE customerNumber=141
GROUP BY customerNumber;
Finally, I executed an Inner Join between 'payments' and 'orders' tables, and selected only the rows with customer number '141'. MySQL returned 338 rows, which is the result of 26*13. So, my query is multiplying the number of times this 'customer n°' appears in 'orders' table by the number of times it appears in 'payments'.
SELECT
o.customernumber,
py.amount
FROM
customers c
JOIN
orders o ON c.customerNumber=o.customerNumber
JOIN
payments py ON c.customerNumber=py.customerNumber
WHERE o.customernumber=141;
My question is the following:
1 ) Is there a way to look at the relational schema and identify if a Join can be executed (without generating a combinatorial explosion)? Or should I check table by table to understand how the relationship between them is?
Important Note: I realized that there are two asterisks in the payments table's representation in the relational schema below. Maybe this means that this table has a composite primary key (customerNumber+checkNumber). The problem is that 'checkNumber' does not appear in any other table.
This is the database's relational schema provided by the 'MySQL Tutorial' website:
Thank you for your attention!
This is called "combinatorial explosion" and it happens when rows in one table each join to multiple rows in other tables.
(It's not "overestimation" or any sort of estimation. It's counting data items multiple times when it should only count them once.)
It's a notorious pitfall of summarizing data in one-to-many relationships. In your example each customer may have no orders, one order, or more than one. Independently, they may have no payments, one, or many.
The trick is this: use subqueries so your top-level query avoids joining one-to-many relationships serially. The query you showed us does exactly that serial join.
You can use this subquery to get a result set with just one row per customer. (Try it.)
SELECT customernumber,
SUM(amount) amount
FROM payments
GROUP BY customernumber
Likewise you can get the value of all orders for each customer with this
SELECT o.customernumber,
SUM(od.quantityOrdered * od.priceEach) amount
FROM orders o
JOIN orderdetails od ON o.orderNumber = od.orderNumber
GROUP BY o.customernumber
This JOIN won't explode in your face because a customer can have multiple orders, and each order can have multiple details. So it's a strict hierarchical rollup.
Now, we can use these subqueries in the main query.
SELECT c.customernumber, p.payments, o.orders
FROM customers c
LEFT JOIN (
SELECT o.customernumber,
SUM(od.quantityOrdered * od.priceEach) orders
FROM orders o
JOIN orderdetails od ON o.orderNumber = od.orderNumber
GROUP BY o.customernumber
) o ON c.customernumber = o.customernumber
LEFT JOIN (
SELECT customernumber,
SUM(amount) payments
FROM payments
GROUP BY customernumber
) p ON c.customernumber = p.customernumber
Takehome tricks:
A subquery IS a table (a virtual table) that can be used wherever you might mention a table or a view.
The GROUP BY stuff in this query happens separately in two subqueries, so no combinatorial explosions.
All three participants in the toplevel JOIN have either one or zero rows per customernumber.
The LEFT JOINs are there so we can still see customers with (importantly for a business) no orders or no payments. With the ordinary inner JOIN, rows have to match both sides of the ON conditions or they're omitted from the resultset.
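The difference between the serial join and the pre-aggregated subqueries can be checked on a tiny data set. This is only a sketch: it uses Python's built-in sqlite3 module rather than MySQL, the tables are cut down to the columns needed, and the rows (three orders and two payments for customer 141) are invented.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER);
CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                           quantityOrdered INTEGER, priceEach INTEGER);
CREATE TABLE payments (customerNumber INTEGER, checkNumber TEXT, amount INTEGER);

-- Invented rows: customer 141 has 3 orders and 2 payments, 125 has neither.
INSERT INTO customers VALUES (125), (141);
INSERT INTO orders VALUES (1, 141), (2, 141), (3, 141);
INSERT INTO orderdetails VALUES (1, 'A', 2, 10), (2, 'A', 1, 10),
                                (2, 'B', 1, 30), (3, 'B', 1, 40);
INSERT INTO payments VALUES (141, 'X1', 50), (141, 'X2', 25);
""")

# Serial join of two one-to-many relationships: 3 orders x 2 payments = 6 rows,
# so every payment is counted three times.
naive_rows, naive_paid = conn.execute("""
    SELECT COUNT(*), SUM(py.amount)
    FROM customers c
    JOIN orders o    ON c.customerNumber = o.customerNumber
    JOIN payments py ON c.customerNumber = py.customerNumber
    WHERE c.customerNumber = 141
""").fetchone()

# Each subquery is aggregated to at most one row per customer before joining.
summary = conn.execute("""
    SELECT c.customerNumber, ot.order_total, pt.payment_total
    FROM customers c
    LEFT JOIN (
        SELECT o.customerNumber,
               SUM(od.quantityOrdered * od.priceEach) AS order_total
        FROM orders o
        JOIN orderdetails od ON o.orderNumber = od.orderNumber
        GROUP BY o.customerNumber
    ) ot ON c.customerNumber = ot.customerNumber
    LEFT JOIN (
        SELECT customerNumber, SUM(amount) AS payment_total
        FROM payments
        GROUP BY customerNumber
    ) pt ON c.customerNumber = pt.customerNumber
    ORDER BY c.customerNumber
""").fetchall()

print(naive_rows, naive_paid)
print(summary)
```

The naive join reports 225 paid although only 75 was paid, while the pre-aggregated version keeps one row per customer, including customer 125 with NULL totals.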
Pro tip: Format your SQL queries fanatically carefully. They are really verbose (Adm. Grace Hopper would be proud), which means they get quite long and nested, putting the Structured in Structured Query Language. If you, or anybody else, are going to reason about them in the future, you must be able to grasp the structure easily.
Pro tip 2 The data engineer who designed this database did a really good job thinking it through and documenting it. Aspire to this level of quality. (Rarely reached in the real world.)
In this particular case, what you do should depend on the accounting style the database supports, and this does not appear to be "open item" accounting; that is, when an order is raised for 1000, there does not need to be a payment of 1000 against it. That may seem unusual, because most consumers know open item ordering from Amazon: you buy a 500 dollar TV and a 500 dollar games console, the order is a thousand dollars, and you pay for it, the payment going against the order. However, you are also familiar with "balance forward" accounting if you paid for that order with your credit card: you make similar purchases every day for a month, then you get a statement from your bank saying you owe 31000, and you pay a lump of money, which doesn't even have to be 31k. You aren't expected to make 31 payments of 1000 to your bank at the end of the month. Your bank allocates the payment to the oldest items on the account (if they're nice, or the newest items if they're not) and may eventually charge you interest on unpaid transactions.
1 ) Is there a way to look at the relational schema and identify if a Join can be executed
Yes, you can tell by looking at the schema: a customer has many orders and a customer makes many payments, but there is no relation between the order and payment tables at all, so we can see there is no attempt to attach a payment directly to an order. Customer is a parent table of both payment and order, and therefore has a relationship with each of them, but they do not relate to each other. If you had Person, Car and Address tables, a person has many addresses during their life, and many cars, but that doesn't mean there is a relationship between cars and addresses.
In such a case it simply doesn't make sense to join payments to customers to orders because they do not relate that way. If you want to make such a join and not suffer a Cartesian explosion then you absolutely have to sum one side or the other (or both) to ensure that your joins are 1:1 and 1:M (or 1:1 and 1:1). You cannot arrange a join that is a pair of 1:M.
Going back to the car/person/address example to make any meaningful joins, you have to build more information into the question and arrange the join to create the answer. Perhaps the question is "what cars did they own while they lived at" - this flattens the Person:Address relationship to 1:1 but leaves Person:Car as 1:M so they might have owned many cars during their time in that house. "What was the newest car they owned while living at..." might be 1:1 on both sides if there is a clear winner for "newest" (though if they bought two cars manufactured at identical times...)
Which side you sum in your orders case will depend on what you want to know, but here I'd say you usually want to know "which orders haven't been paid for". That means summing all payments, computing a rolling sum over the orders, and finding the point at which the rolling sum exceeds the sum of payments; the orders from that point on are the unpaid ones.
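A minimal sketch of that rolling-sum idea, under several assumptions: it uses Python's sqlite3 module instead of MySQL, a hypothetical order_totals table that already holds one amount per order, order numbers that increase chronologically, and invented rows. The running total is computed with a correlated subquery; a window function would be another option.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one pre-summed amount per order.
CREATE TABLE order_totals (orderNumber INTEGER PRIMARY KEY,
                           customerNumber INTEGER, amount INTEGER);
CREATE TABLE payments (customerNumber INTEGER, checkNumber TEXT, amount INTEGER);

-- Invented rows: customer 141 ordered 100 in total and paid 75;
-- customer 125 ordered 15 and paid nothing.
INSERT INTO order_totals VALUES (1, 141, 20), (2, 141, 40), (3, 141, 40),
                                (4, 125, 15);
INSERT INTO payments VALUES (141, 'X1', 50), (141, 'X2', 25);
""")

# An order counts as not fully paid when the customer's running order total,
# up to and including that order, exceeds everything the customer has paid.
unpaid = [row[0] for row in conn.execute("""
    SELECT o.orderNumber
    FROM order_totals o
    LEFT JOIN (
        SELECT customerNumber, SUM(amount) AS paid
        FROM payments
        GROUP BY customerNumber
    ) p ON p.customerNumber = o.customerNumber
    WHERE (
        SELECT SUM(o2.amount)
        FROM order_totals o2
        WHERE o2.customerNumber = o.customerNumber
          AND o2.orderNumber <= o.orderNumber
    ) > COALESCE(p.paid, 0)
    ORDER BY o.orderNumber
""")]

print(unpaid)
```

This allocates payments to the oldest orders first, so order 3 shows up because it is only partly covered by the 75 paid.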
Take a look again at your database graph (the one that was present in the first iteration of your question). See the lines between tables have 3 angled legs on one end - that's the many end. You can start at any table in the graph and join to other tables by walking along the relationship. If you're going from the many end to the one end, and assuming you've picked out a single row in the start table (a single order) you can always walk to any other table in the many->one direction and not increase your row count. If you walk the other way you potentially increase your row count. If you split and walk two ways that both increase row count you get a Cartesian explosion. Of course, also you don't have to only join on relation lines, but that's out of scope for the question
ps: this is easier to see on the db diagram than the ERD in the question because the database purely concerns itself with the columns that are foreign keyed. The ERD is saying a customer has zero or one payments with a particular check number but the database will only be concerned with "the customer ID appears once in the customer table and multiple times in the payment table" because only part of the compound primary key of payment is keyed to the customer table. In other words, the ERD is concerned with business logic relations too, but the db diagram is purely how tables relate and they aren't necessarily aligned. For this reason the db diagrams are probably easier to read when walking round for join strategies
After seeing the answers of Caius Jard and O.Jones (please, check their replies), which kindly helped me to clarify this doubt, I decided to create a table to identify which customers paid for all orders they made and which ones did not. This creates a pertinent reason to join 'orders', 'orderdetails', 'payments' and 'customers' tables, because some orders may have been cancelled or still may be 'On Hold', as we can see in their corresponding 'status' in the 'orders' table. Also, this enables us to execute this join without generating a 'combinatorial explosion'.
I did this by using the CASE statement, which registers when py.amount and amount_in_orders match, don't match or when they are NULL (customers which did not make orders or payments):
SELECT
c.customerNumber,
py.amount,
amount_in_orders,
CASE
WHEN py.amount=amount_in_orders THEN 'Match'
WHEN py.amount IS NULL AND amount_in_orders IS NULL THEN 'NULL'
ELSE 'Don''t Match'
END AS `Match`
FROM
customers c
LEFT JOIN(
SELECT
o.customerNumber, SUM(od.quantityOrdered*od.priceEach) AS amount_in_orders
FROM
orders o
JOIN orderdetails od ON o.orderNumber=od.orderNumber
GROUP BY o.customerNumber
) o ON c.customerNumber=o.customerNumber
LEFT JOIN(
SELECT customernumber, SUM(amount) AS amount
FROM payments
GROUP BY customerNumber
) py ON c.customerNumber=py.customerNumber
ORDER BY py.amount DESC;
The query returned 122 rows. The images below are fractions of the generated output, so you can visualize what happened:
For instance, we can see that the customers identified by the numbers '141', '124', '119' and '496' did not pay for all the orders they made. Maybe some of the orders were cancelled, or maybe the customers simply have not paid for them yet.
And this image shows some of the columns (not all of them) that are NULL:

Database queries for an assignment

I am working on a project and I am new to databases. I need help answering the questions below, given the scenario and the database tables listed.
Database tables:
Product (pid:integer, timestamp:integer, name: string, price:real, location:string)
Customer (cid:integer, email: string)
Purchases (pid:integer, cid:integer, orderid:integer, amount:integer)
Totals (orderid:integer, cid:integer, totalprice:real, timestamp:integer)
Scenario:
A product ID can occur multiple times in the schema. Each time the location or price is updated, another line is added to the database with a timestamp that indicates the time of change. The name does not get changed, so the same pid will always imply the same name.
Totals is a summary of the purchases table which shows when the purchases were made, and what the combined price of all products were.
Whenever possible, try to do your projections as early as possible.
Use the above database and provide queries for the following problems:
Find the names of products that at some point in time cost more than €20.00 and the names of products that have at some point cost less than €0.10.
Find the email addresses of customers that have spent more than €200 at once.
Find the pids of products that have had at least one price change.
Find the names of products that have both been displayed at location ’5-12’ and ’A3’
Find the cid of customers that have bought each product that at some point cost less than €1.00
Find the cid of customers that are registered with the store but have made no purchases.
Find the cid of the customer(s) that have made the largest total purchase.
Find the most expensive product that has been purchased at least once by each registered customer.
Find the pids of products that have not been sold since timestamp 20150625 but have been sold at least once before that date.
The grocery store wants to improve its database. Write a query that returns a table that is basically the Purchases table plus the price of the product at the time of purchase.
Let's try to provide you a bit of help to start your project ❤
In query one, we need to work exclusively with the Product table: you want the names of products whose price has been above 20.00 or below 0.10.
First, the columns we want to get FROM the table:
SELECT name, price FROM Product
Then, as you need a query that gets products more expensive than 20 and products less expensive than 0.10, you could use the BETWEEN condition negated with NOT:
SELECT name, price FROM Product
WHERE price NOT BETWEEN 0.10 AND 20
And you can order the result to make it more readable:
SELECT name, price FROM Product
WHERE price NOT BETWEEN 0.10 AND 20
ORDER BY name ASC;
I'm not sure if this is what you need, but I hope it helps a bit!
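To try the NOT BETWEEN approach on data shaped like the scenario, here is a small sketch using Python's sqlite3 module with invented rows. Since the same pid can appear once per price change, DISTINCT is added here (an addition to the query above) so that each name is listed only once.

```python
import sqlite3

# In-memory SQLite database with made-up rows matching the Product schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (pid INTEGER, timestamp INTEGER, name TEXT,
                      price REAL, location TEXT);
-- pid 1 appears twice because its price changed.
INSERT INTO Product VALUES
    (1, 20150101, 'Caviar', 25.0, 'A3'),
    (1, 20150201, 'Caviar', 30.0, 'A3'),
    (2, 20150101, 'Gum',     0.05, '5-12'),
    (3, 20150101, 'Bread',   2.0, 'A3');
""")

# NOT BETWEEN 0.10 AND 20 keeps rows with price < 0.10 or price > 20.
names = [row[0] for row in conn.execute("""
    SELECT DISTINCT name
    FROM Product
    WHERE price NOT BETWEEN 0.10 AND 20
    ORDER BY name ASC
""")]

print(names)
```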

SQL Database design - three tables: product, sales order, and purchase order - how to store product quantity?

I have three tables: product, sales_order (where I sell products) and purchase_order (where I buy products). Now I can think of two ways of keeping the quantity of each product:
Have a column in the product table called quantity; when inserting into sales_order, I subtract the quantity; when inserting into purchase_order, I add the quantity
Instead of storing the quantity in the product table, I calculate the quantity from the sales_order and the purchase_order table each time I need to get the product table
I am wondering if the second approach is preferable to the first one? I like the second one more because it doesn't store any redundant data; however, I am not so sure if calculating the quantity every time is a bit too much calculation. I am wondering what is the convention and best practice here? Thank you!
I would use the first one. Add a quantity column to the product table; in your code, subtract the ordered amount when an order is placed, and you can then display this alongside the order. You could write a script that emails you when a product's stock falls to a certain level and tells you to replenish it. However, the second approach would also work, and SQL is very powerful, so I wouldn't worry about it being too demanding; it will probably work it out faster than we can.
I prefer the first one because in-memory calculations are faster than issuing select statements to check the sales orders and purchase orders assuming that the number of times the quantity value is retrieved is significantly more than the number of times the quantity value is updated.
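For comparison, the second approach (deriving the quantity on demand) might look like the sketch below. It uses Python's sqlite3 module in place of the real database, and the column names product_id and quantity as well as the sample rows are assumptions. Purchases and sales are summed in separate subqueries so that joining both to product does not multiply rows.

```python
import sqlite3

# In-memory SQLite database; table and column names are assumed for the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchase_order (product_id INTEGER, quantity INTEGER);
CREATE TABLE sales_order (product_id INTEGER, quantity INTEGER);

INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO purchase_order VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO sales_order VALUES (1, 3), (1, 4);
""")

# Quantity on hand = everything bought minus everything sold, per product.
stock = conn.execute("""
    SELECT p.id,
           COALESCE(po.bought, 0) - COALESCE(so.sold, 0) AS on_hand
    FROM product p
    LEFT JOIN (
        SELECT product_id, SUM(quantity) AS bought
        FROM purchase_order
        GROUP BY product_id
    ) po ON po.product_id = p.id
    LEFT JOIN (
        SELECT product_id, SUM(quantity) AS sold
        FROM sales_order
        GROUP BY product_id
    ) so ON so.product_id = p.id
    ORDER BY p.id
""").fetchall()

print(stock)
```

Whether this is fast enough depends on table sizes and indexes, which is the trade-off the answers above are weighing.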

Pulling different records from multiple tables as one transaction history list

I am working on an employee management/reward system and need to be able to show a single "transaction history" page that lists, in chronological order, the different events that the employee has experienced. (Sort of like how in Facebook you can go to your history/action section and see a chronological list of everything that you have done and that affects you, even though the items are unrelated to each other and just have you as a common user.)
I have different tables for the different events, each table has an employee_id key and an "occured" timestamp, some table examples:
bonuses
customers
raise
complaints
feedback
So whenever an event occurs (i.e. a new customer is assigned to the employee, or the employee gets a complaint or raise), a new row is added to the appropriate table with the ID of the employee it affects and a timestamp of when it occurred.
I need a single query to pull all records (up to 50, for example) that include the employee and return a history view of that employee. The field names are different in each table (i.e. the bonus includes an amount with a note, the customer includes customer info, etc.).
I need the output to be a summary view using column names such as:
event_type = (new customer, bonus, feedback etc)
date
title (a brief worded title of the type of event, specified in sql based on the table its referencing)
description (text about the action: if the event_type is bonus, display the bonus amount here; if it's a complaint, show the first 50 characters of the complaint message or the ID of the user that filed the complaint, from the complaints table. All of this should be done in SQL, using IF statements to build the value of this field based on which table the row comes from. For example, for the customers table: IF current_table=customers description='A customer was assigned to you by'.customers.assigner_id).
Ideally,
Is there any way to do this?
Another option I have considered, is I could do 5-6 different queries pulling the records each from their own table, then use a mysql command to "mesh/interleave" the results from all the queries into one list by chronological order. That would be acceptable too
You could use a UNION query to merge all the information together and use the ORDER BY clause to order the actions chronologically. Each query must have the same number of fields. Your ORDER BY clause should be last.
The examples below assume you have a field called customer_name in the customers table and bonus_amount in the bonuses table.
It would look something like this:
SELECT 'New Customer' as event_type, occured as date,
'New customer was assigned' as title,
CONCAT('New Customer: ', customer_name, ' was assigned') as description
FROM customers
WHERE employee_id = 1
UNION
SELECT 'Bonus' as event_type, occured as date,
'Received a bonus' as title,
CONCAT('Received a bonus of $', FORMAT(bonus_amount, 2), '.') as description
FROM bonuses
WHERE employee_id = 1
UNION
...
ORDER BY date DESC;
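A runnable sketch of the same idea, using Python's sqlite3 module rather than MySQL. Because SQLite has no CONCAT or FORMAT in older versions, the description is built with the || operator instead; the sample rows are invented, and only two of the event tables are included. UNION ALL is used here because rows from different event tables do not need de-duplicating, and LIMIT caps the list at 50 entries.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (employee_id INTEGER, occured TEXT, customer_name TEXT);
CREATE TABLE bonuses (employee_id INTEGER, occured TEXT, bonus_amount INTEGER);

-- Invented events; the second customer belongs to a different employee.
INSERT INTO customers VALUES (1, '2024-01-10', 'Acme'), (2, '2024-01-11', 'Other');
INSERT INTO bonuses VALUES (1, '2024-02-01', 250);
""")

employee_id = 1

# Every branch returns the same four columns, so the rows can be interleaved.
history = conn.execute("""
    SELECT 'New Customer' AS event_type, occured AS date,
           'New customer was assigned' AS title,
           'New Customer: ' || customer_name || ' was assigned' AS description
    FROM customers
    WHERE employee_id = ?
    UNION ALL
    SELECT 'Bonus', occured,
           'Received a bonus',
           'Received a bonus of $' || bonus_amount || '.'
    FROM bonuses
    WHERE employee_id = ?
    ORDER BY date DESC
    LIMIT 50
""", (employee_id, employee_id)).fetchall()

for event in history:
    print(event)
```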

E-Commerce Database Design for Orders Table

I am designing a schema for an e-commerce app, in which I would have three tables: Orders, Products and Customers.
Should we store customer_id and product_id in the Orders table directly?
The limitation is that when a product's or customer's attributes are updated (e.g. product price or customer name), the Orders table no longer reflects the values at the time of the order.
For example: a customer bought a product at $10, but later the product price is updated to $20. Now, when we look up this order by product id, it appears the product was bought at $20 instead of $10.
SOLUTION 1:
One solution would be to insert a new row into the products table whenever an update occurs and soft-delete the old product row, so that it can still be referenced from the orders table.
SOLUTION 2:
Store most of the product and customer details in the orders table.
SOLUTION 3:
Create a temporary table of customers and products whenever there is an update to these tables.
I am very much open to any other suggestions.
One thing that you seem to be missing is an orderLineItem table, which you need for anything beyond the simplest case of one product per order.
Now, that being said, you can do the products table in several ways.
Assuming that price is your only variable in the products table that you want to change, you can have a separate pricePoints table, that would store the price for any item at any given time. You would then use the ID from this table in your orders table and use that to get to the productID from the products table. A slightly more inefficient way to store this (but faster for retrieval) would be to store both the productId and the pricePointId in the orders table.
You could also do this by simply storing the price paid in the orders table. This gives you a little more flexibility to add discounts and pricing rules. You do need to be concerned about auditing the price if you do it this way, though: "Why was this price charged for this line at this time?" is going to be a common question.
You need to know how much the customer paid for the product at any time. It's not so important to know how much the customer would have paid for the order if they bought it today.
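A small sketch of storing the price paid on the order line, using Python's sqlite3 module. The table and column names (products, order_line_items, price_paid) are hypothetical, and prices are kept as integers for simplicity. The price is copied from products at the moment the line is created, so a later price change does not alter the order.

```python
import sqlite3

# In-memory SQLite database; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
CREATE TABLE order_line_items (order_id INTEGER, product_id INTEGER,
                               quantity INTEGER, price_paid INTEGER);
INSERT INTO products VALUES (1, 'Mug', 10);
""")

# Create a line for order 1: copy the current product price into price_paid.
conn.execute("""
    INSERT INTO order_line_items (order_id, product_id, quantity, price_paid)
    SELECT ?, id, ?, price FROM products WHERE id = ?
""", (1, 1, 1))

# Later the catalogue price changes.
conn.execute("UPDATE products SET price = 20 WHERE id = 1")

price_paid = conn.execute(
    "SELECT price_paid FROM order_line_items WHERE order_id = 1"
).fetchone()[0]
current_price = conn.execute(
    "SELECT price FROM products WHERE id = 1"
).fetchone()[0]

print(price_paid, current_price)
```

The order still shows the $10 that was charged, while the catalogue shows the new $20 price.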
Customers are a slightly different issue. Some of the information in a customer table is transient. Some of it has to be fixed for the order. Let's say that the customer has a name, address, billing address and shipping address. At the time of the order, the shipping and billing addresses have to be absolutely fixed. You don't want to go back in three weeks and discover that the shipping address was changed. But, by the same token, you might like the name to be updated if a customer changes their maiden name, for example.
Now, all that being said, we aren't going to design your schema for you. There are a lot of good resources out there for how to design a simple e-commerce database.