inner join three tables results in multiplied values - mysql

I'm trying to (let's say) gather a report on customers.
In that report I want to include sum of orders and ticket number for each client.
Tables:
Customer(id, name)
Order(id, customer_id, amount)
support_ticket(id, customer_id)
query:
select
c.id as 'Customer',
count(distinct t.id) as "Ticket count",
count(distinct o.id) as "Order count",
sum(o.amount) as 'Order Amount'
from customer as c
inner join `order` as o on c.id = o.customer_id
inner join support_ticket as t on c.id = t.customer_id
group by c.id
Since I join with customer.id on the two tables, I get all the rows "duplicated", since I get all possible combinations, so if the client as multiple tickets, the sum(o.amount) will we multiplied because of "duplicated rows"
sqlFiddle (mysql): http://sqlfiddle.com/#!9/ba39ba/13
sqlFiddle (pg): http://sqlfiddle.com/#!17/bc32e/7
It seems like a simple case but I've been looking at it too much I think, I can't find the proper way to do that report.
What am I doing wrong?

your best bet is to re-write the Aggregation off the Order table as as Derived Table;
EG
select
c.id as 'Customer',
count(distinct t.id) as "Ticket count",
o.amount as 'Order Amount' ,
o.[Order count]
from customer as c
inner join
(SELECT
o.customer_id,
sum(amount) as amount ,
count(distinct o.id) as "Order count"
from [order]
group by o.customer_id)
as o on c.id = o.customer_id
inner join support_ticket as t on c.id = t.customer_id
group by
c.id ,
o.amount ,
o.[Order count]
Note that the Derived Table Columns then are added to the group by clause at the bottom.
Cheers!

Just calculate order values in a sub-query and join it.
SELECT
c.id as 'Customer'
,count(DISTINCT st.id) as 'Ticket Count'
,o.`Order Count`
,o.amount as `Order Amount`
FROM customer c
INNER JOIN support_ticket st
on c.id = st.customer_id
INNER JOIN (
SELECT
customer_id
,SUM(amount) as 'amount'
,count(distinct id) as 'Order Count'
FROM `order`
group by customer_id
) o
on c.id = o.customer_id
GROUP BY c.id;

select c.id as 'Customer'
,t2.count_ticket as "Ticket count"
,t1.count_order as "Order count"
,t1.amount as 'Order Amount'
from customer as c
inner join (select customer_id
,count(id) as count_order
,sum(amount) as amount
from Order group by customer_id) t1
on c.id = t1.customer_id
inner join (select customer_id
,count(id) as count_ticket
from support_ticket group by customer_id) t2
on c.id = t2.customer_id

In cases like yours, when I think the solution of my problem should be fairly simple but I cant wrap my head around it, I tend to use a WITH clause.
Not because its better, but because it helps me to understand my code better by splitting up complexity. First I create a relatively simple temp. Solving the first part of my problem.
WITH temp AS (
SELECT
c.id AS "customer",
COUNT(DISTINCT o.id) AS "order_count",
SUM(o.amount) AS "order_amount"
FROM customer AS c
INNER JOIN "order" AS o on c.id = o.customer_id
GROUP BY c.id
)
Then I simply select the first half of my solution from temp, adding this way all intermediate results, and solve the second part of my initial sql.
SELECT
temp.customer,
COUNT(DISTINCT t.id) as "ticket_count",
temp.order_count,
temp.order_amount
FROM temp
INNER JOIN support_ticket as t on temp.customer = t.customer_id
GROUP BY temp.customer, temp.order_count, temp.order_amount
The principle is the same like in all previous answers, but SELECTS are separated and I can check them fast, and continue on if I'm happy with parts of the solution.

Related

using MYSQL Group By to get the most popular value

I am practicing MYSQL using https://www.w3schools.com/mysql/trymysql.asp?filename=trysql_func_mysql_concat which has a mock database for me to practice with an I am experimenting using the GROUP BY command I am attempting to group all employees up with all of their sales and determine, their name, their amount of sales and the product that they sold the most. I have managed to get their name and sales but not the product name. I know that extracting information with a group by is difficult and I have tried using a sub query. Is there a way to get the information.
My query is below.
SELECT
CONCAT_WS(' ',
Employees.FirstName,
Employees.LastName) AS 'Employee name',
COUNT(*) AS 'Num of sales'
FROM
Orders
INNER JOIN
Employees ON Orders.EmployeeID = Employees.EmployeeID
INNER JOIN
OrderDetails ON OrderDetails.OrderID = Orders.OrderID
INNER JOIN
Products ON Products.ProductID = OrderDetails.ProductID
GROUP BY Orders.EmployeeID
ORDER BY COUNT(*) DESC;
What this says is get orders, join employees based on orders employeeid, join the order details based on order id and join products information based on product id in the order details, then it groups them by the employee id and orders them by the number of sales an employee has made.
SELECT
concat_ws(' ',
Employees.FirstName,
Employees.LastName) as 'Employee name',
count(*) as 'Num of sales',
(
SELECT Products.ProductName
FROM Orders
INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
INNER JOIN OrderDetails ON OrderDetails.OrderID = Orders.OrderID
INNER JOIN Products ON Products.ProductID = OrderDetails.ProductID
GROUP BY Orders.EmployeeID
ORDER BY count(Products.ProductName) desc
LIMIT 1
) as 'Product Name'
FROM Orders
INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
INNER JOIN OrderDetails ON OrderDetails.OrderID = Orders.OrderID
INNER JOIN Products ON Products.ProductID = OrderDetails.ProductID
GROUP BY Orders.EmployeeID
ORDER BY count(*) desc;
Above is my attempt at using a sub query for the solution.
It is quite ugly, as the w3school uses still mysql 5.7
On a personal note, you should install your own server grab somewhere a database and test it there, in mysql workbench you can have many query tabs in which you can test queries , till you het the "right" result.
SELECT
CONCAT_WS(' ',
Employees.FirstName,
Employees.LastName) AS 'Employee name',
COUNT(*) AS 'Num of sales',
tn.ProductName
FROM
Orders
INNER JOIN
Employees ON Orders.EmployeeID = Employees.EmployeeID
INNER JOIN
OrderDetails ON OrderDetails.OrderID = Orders.OrderID
INNER JOIN
Products ON Products.ProductID = OrderDetails.ProductID
INNEr JOIN
(SELECT EmployeeID, p.ProductName
FROM (SELECT IF (#Eid = EmployeeID ,#rn := #rn +1, #rn := 1) rn,ProductID, sumamount
, #Eid := EmployeeID as EmployeeID
FROM
(
SELECT
EmployeeID,ProductID, SUM(Quantity) sumamount
FROM Orders o INNER JOIN OrderDetails od ON od.OrderID = o.OrderID,(SELECT #Eid := 0, #rn := 0) t1
GROUP BY EmployeeID,ProductID
ORDER BY EmployeeID,sumamount DESC ) t2 ) t3
INNER JOIN Products p ON t3.ProductID = p.ProductID
WHERE rn= 1) tn
ON Orders.EmployeeID = tn.EmployeeID
GROUP BY Orders.EmployeeID
ORDER BY COUNT(*) DESC;
In your second query you are trying to get an employee's most often sold product. But there are two mistakes in that subquery:
The subquery is invalid. You group by employee, but select a product. Which product? An employee can sell many different products. MySQL should raise a syntax error here, as all other DBMS I know of do. But you are in cheat mode. MySQL allows incorrect aggregation queries and silently applies ANY_VALUE on all columns that cannot be selected otherwise. Thus you are selecting ANY_VALUE(Products.ProductName), i.e. a product arbitrarily chosen by the DBMS. To get out of cheat mode SET sql_mode = 'ONLY_FULL_GROUP_BY';.
Then, you don't relate the subquery to your main query. So when selecting the row for, say, employee #123, your subquery still selects data for all employees in order to pick one of their products. And as this is independent from the employee in the main query, it will probably pick the same product for every other employee you are selecting, too.
Here is what the query should look like instead:
SELECT
concat_ws(' ', e.FirstName, e.LastName) as "Employee name",
count(*) as "Num of sales",
(
SELECT p2.ProductName
FROM Orders o2
INNER JOIN OrderDetails od2 ON od2.OrderID = o2.OrderID
INNER JOIN Products p2 ON p2.ProductID = od2.ProductID
WHERE o2.EmployeeID = o.EmployeeID
GROUP BY p2.ProductID
ORDER BY count(*) DESC
LIMIT 1
) as "Product Name"
FROM Orders o
INNER JOIN Employees e ON o.EmployeeID = e.EmployeeID
INNER JOIN OrderDetails od ON od.OrderID = o.OrderID
GROUP BY o.EmployeeID
ORDER BY count(*) desc;
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=f35e96764d454a4032d7778b550fc6b4
Disclaimer: When an employee sold more than one product most often (e.g. 500 x product A, 500 x product B, 200 x product C), then one of them (A or B in the example) gets picked arbitrarily for the employee.

Find the youngest customer with AT LEAST 1 purchase order

I need to write a query to find the youngest customer who bought atleast 1 product
Here is the data:
CUSTOMER:
ORDER_DETAIL:
This is my query so far:
SELECT c.CUSTOMERID, c.age, c.name
from (
SELECT CUSTOMERID, COUNT(ORDERID) as "totalOrder"
FROM FACEBOOK_ORDER_DETAIL
GROUP BY CUSTOMERID
HAVING COUNT(ORDERID) >=1) AS tbl
LEFT JOIN FACEBOOK_CUSTOMER c on c.CUSTOMERID = tbl.CUSTOMERID
order by c.age ;
However, above query gives me
But I need the list of customers with the minimum age.
If you really only want a single youngest customer, even should there be a tie, then use LIMIT:
SELECT c.CUSTOMERID, c.age, c.name
FROM CUSTOMER c
INNER JOIN FACEBOOK_ORDER_DETAIL o
ON c.CUSTOMERID = c.CUSTOMERID
ORDER BY
c.age
LIMIT 1;
This should work because if a customer joins to the order details table, it implies that he had at least one order.
If instead you want to find all youngest customers, including all ties, then a nice way to handle this uses the RANK analytic function:
SELECT DISTINCT CUSTOMERID, age, name
FROM
(
SELECT c.CUSTOMERID, c.age, c.name, RANK() OVER (ORDER BY c.age) rnk
FROM CUSTOMER c
INNER JOIN FACEBOOK_ORDER_DETAIL o
ON c.CUSTOMERID = o.CUSTOMERID
) t
WHERE rnk = 1;
Demo
For earlier versions of MySQL, we can use a subquery as a workaround for not having RANK:
SELECT DISTINCT c.CUSTOMERID, c.age, c.name
FROM CUSTOMER c
INNER JOIN FACEBOOK_ORDER_DETAIL o
ON c.CUSTOMERID = c.CUSTOMERID
WHERE c.age = (SELECT MIN(t1.age)
FROM CUSTOMER t1
INNER JOIN FACEBOOK_ORDER_DETAIL t2
ON t1.CUSTOMERID = t2.CUSTOMERID);
Demo
You only want columns from customers, so I would phrase this as:
select c.*
from (select c.*,
rank() over (order by age) as seqnum
from customers c
where exists (select 1
from facebook_order_detail fod
where fod.customerid = c.customerid
)
) c
where seqnum = 1;
In particular, this requires no duplicate elimination or aggregation, so it should be faster. And it can use an index on face_book_details(customerid) and also perhaps on customers(age, customerid).

Using SUM with Joins in MySQL returns unexpected result

I have these tables : customers, customer_invoices, customer_invoice_details, each customer has many invoices, and each invoice has many details.
The customer with the ID 574413 has these invoices :
select customer_invoices.customer_id,
customer_invoices.id,
customer_invoices.total_price
from customer_invoices
where customer_invoices.customer_id = 574413;
result :
customer_id invoice_id total_price
574413 662146 700.00
574413 662147 250.00
each invoice here has two details (or invoice lines) :
first invoice 662146:
select customer_invoice_details.id as detail_id,
customer_invoice_details.customer_invoice_id as invoice_id,
customer_invoice_details.total_price as detail_total_price
from customer_invoice_details
where customer_invoice_details.customer_invoice_id = 662146;
result :
detail_id invoice_id detail_total_price
722291 662146 500.00
722292 662146 200.00
second invoice 662147 :
select customer_invoice_details.id as detail_id,
customer_invoice_details.customer_invoice_id as invoice_id,
customer_invoice_details.total_price as detail_total_price
from customer_invoice_details
where customer_invoice_details.customer_invoice_id = 662147;
result :
detail_id invoice_id detail_total_price
722293 662147 100.00
722294 662147 150.00
I have a problem with this query :
select customers.id as customerID,
customers.last_name,
customers.first_name,
SUM(customer_invoices.total_price) as invoice_total,
SUM(customer_invoice_details.total_price) as details_total
from customers
join customer_invoices
on customer_invoices.customer_id = customers.id
join customer_invoice_details
on customer_invoice_details.customer_invoice_id = customer_invoices.id
where customer_id = 574413;
unexpected result :
customerID last_name first_name invoice_total details_total
574413 terry amine 1900.00 950.00
I need to have the SUM of the total_price of the invoices, and the SUM of the total_price of the details for each customer. In this case I'm supposed to get 950 as total_price for both columns (invoice_total& details_total) but it's not the case. what am I doing wrong & how can I get the correct result please. The answers in similar topics don't have the solution for this case.
When you mix normal columns with aggregate functions (for example SUM), you need to use GROUP BY where you list the normal columns from the SELECT.
The reason for the excessive amount in total_price for invoices is that the SUM is also calculated over each detail row as it is part of the join. Use this:
select c.id as customerID,
c.last_name,
c.first_name,
SUM(ci.total_price) as invoice_total,
SUM((select SUM(d.total_price)
from customer_invoice_details d
where d.customer_invoice_id = ci.id)) as 'detail_total_price'
from customers c
join customer_invoices ci on ci.customer_id = c.id
where c.id = 574413
group by c.id, c.last_name, c.first_name
db-fiddle
I used join against sub queries and then did a sum on the sums
SELECT c.id as customerID,
c.last_name,
c.first_name
SUM(i.sum) as invoice_total,
SUM(d.sum) AS details_total
FROM customers c
JOIN (SELECT id, customer_id, SUM(total_price) AS sum
FROM customer_invoices
GROUP BY id, customer_id) AS i ON i.customer_id = c.id
JOIN (SELECT customer_invoice_id as id, SUM(total_price) AS sum
FROM customer_invoice_details
GROUP BY customer_invoice_id) AS d ON d.id = i.id
WHERE c.id = 574413
GROUP BY c.id, c.name
The issue is in the joining logic. The table customers is used as the driving table in the joins. But in the second join, you are using a derivative key column from the first join, to join with the third tables. This is resulting in a Cartesian output doubling the records from the result from the nth-1 join, which is leading to customer_invoices.total_price getting repeated twice, hence the rolled up value of this field is doubled.
At a high level I feel that the purpose of rolling up the prices is already achieved in SUM(customer_invoice_details.total_price).
But if you have a specific project requirement that SUM(customer_invoices.total_price) should also be obtained and must match with SUM(customer_invoice_details.total_price), then you can do this:
In a separate query, Join customer_invoice_details and customer_invoices. Roll up the pricing fields, and have a result such that you have only one record for one customer ID.
Then use this as a sub-query and join it with the customers table.
You are aggregating along multiple dimensions. This is challenging. I would suggest doing the aggregation along each dimension independently:
select c.id as customerID, c.last_name, c.first_name,
ci.invoice_total,
cid.details_total
from customers c join
(select ci.sum(ci.total_price) as invoice_total
from customer_invoices ci
group by ci.customer_id
) ci
on ci.customer_id = c.id join
(select ci.sum(cid.total_price) as details_total
from customer_invoices ci join
customer_invoice_details cid
on cid.customer_invoice_id = ci.id
group by ci.customer_id
) cid
on cid.customer_id = c.id
where c.id = 574413;
A faster version (for one customer) uses correlated subqueries:
select c.id as customerID, c.last_name, c.first_name,
(select ci.customer_id, sum(ci.total_price) as invoice_total
from customer_invoices ci
where ci.customer_id = c.id
) as invoice_total,
(select ci.customer_id, sum(cid.total_price) as details_total
from customer_invoices ci join
customer_invoice_details cid
on cid.customer_invoice_id = ci.id
where ci.customer_id = c.id
) as details_total
from customers c
where c.id = 574413;

MySQL: Ranking from multiple tables, sub queries?

This is a MySQL question. I have three tables with the following columns:
transactions (table): transact_id, customer_id, transact_amt, product_id,
products (table): product_id, product_cost, product_name, product_category
customers (table): customer_id, joined_at, last_login_at, state, name, email
I'd like a query that finds out the most popular item in every state and the state. One of the tricky parts is that some product_name have multiple product_id. Therefore I though joining the three tables that generate an output with two columns: state and product_name. Until here that worked fine doing this:
SELECT p.product_name, c.state
FROM products p
INNER JOIN transactions t
ON p.product_id = t.product_id
INNER JOIN customers c
ON c.customer_id = t.customer_id
This selects all the products, and the states from where the customer is. The problem is that I can't find the way to rank the mos popular product per state. I tried different group by, order by and using subqueries without success. I suspect I need to do subqueries, but I can't find the way to resolve it. The expected outcome should look like this:
most_popular_product | state
Bamboo | WA
Walnut | MO
Any help will be greatly appreciated.
Thank you!
You need a subquery that gets the count of transactions for each product in each state.
SELECT p.product_name, c.state, COUNT(*) AS count
FROM products p
INNER JOIN transactions t
ON p.product_id = t.product_id
INNER JOIN customers c
ON c.customer_id = t.customer_id
GROUP BY p.product_name, c.state
Then write another query that has this as a subquery, and gets the highest count for each state.
SELECT state, MAX(count) AS maxcount
FROM (
SELECT p.product_name, c.state, COUNT(*) AS count
FROM products p
INNER JOIN transactions t
ON p.product_id = t.product_id
INNER JOIN customers c
ON c.customer_id = t.customer_id
GROUP BY p.product_name, c.state
) AS t
GROUP BY state
Finally, join them together:
SELECT t1.product_name AS most_popular_product, t1.state
FROM (
SELECT p.product_name, c.state, COUNT(*) AS count
FROM products p
INNER JOIN transactions t
ON p.product_id = t.product_id
INNER JOIN customers c
ON c.customer_id = t.customer_id
GROUP BY p.product_name, c.state
) AS t1
JOIN (
SELECT state, MAX(count) AS maxcount
FROM (
SELECT p.product_name, c.state, COUNT(*) AS count
FROM products p
INNER JOIN transactions t
ON p.product_id = t.product_id
INNER JOIN customers c
ON c.customer_id = t.customer_id
GROUP BY p.product_name, c.state
) AS t
GROUP BY state
) AS t2 ON t1.state = t2.state AND t1.count = t2.maxcount
This is basically the same pattern as SQL select only rows with max value on a column, just using the first grouped query as the table you're trying to group.

SQL join tables, total spent by all customers

I'm a newbie at SQL and this is the question I've been struggling with:
I have these tables:
customers(custID, firstname, familyname)
items(itemID, unitcost)
lineitems(quantity, orderID, itemID)
orders(orderID, custID, date)
I need to find the names and total spend of all customers that made more than one order.
SELECT SUM(items.unitcost*lineitems.quantity) AS "total_spent"
FROM orders
INNER JOIN customers
ON orders.custID=customers.custID
GROUP BY firstname
HAVING COUNT(DISTINCT orderID)>1
LIMIT 0,30
I think you just need to continue your joins:
SELECT c.custId, c.firstname, SUM(i.unitcost*li.quantity) total_spent
FROM customers c
JOIN orders o ON c.custId = o.custId
JOIN lineitems li ON o.orderId = li.orderId
JOIN items i ON li.itemId = i.itemId
GROUP BY c.custId, c.firstname
HAVING COUNT(DISTINCT o.orderID)>1
LIMIT 0,30