MySQL: Ranking from multiple tables, sub queries? - mysql

This is a MySQL question. I have three tables with the following columns:
transactions (table): transact_id, customer_id, transact_amt, product_id,
products (table): product_id, product_cost, product_name, product_category
customers (table): customer_id, joined_at, last_login_at, state, name, email
I'd like a query that finds out the most popular item in every state and the state. One of the tricky parts is that some product_name have multiple product_id. Therefore I though joining the three tables that generate an output with two columns: state and product_name. Until here that worked fine doing this:
SELECT p.product_name, c.state
FROM products p
INNER JOIN transactions t
ON p.product_id = t.product_id
INNER JOIN customers c
ON c.customer_id = t.customer_id
This selects all the products, and the states from where the customer is. The problem is that I can't find the way to rank the mos popular product per state. I tried different group by, order by and using subqueries without success. I suspect I need to do subqueries, but I can't find the way to resolve it. The expected outcome should look like this:
most_popular_product | state
Bamboo | WA
Walnut | MO
Any help will be greatly appreciated.
Thank you!

You need a subquery that gets the count of transactions for each product in each state.
SELECT p.product_name, c.state, COUNT(*) AS count
FROM products p
INNER JOIN transactions t
ON p.product_id = t.product_id
INNER JOIN customers c
ON c.customer_id = t.customer_id
GROUP BY p.product_name, c.state
Then write another query that has this as a subquery, and gets the highest count for each state.
SELECT state, MAX(count) AS maxcount
FROM (
SELECT p.product_name, c.state, COUNT(*) AS count
FROM products p
INNER JOIN transactions t
ON p.product_id = t.product_id
INNER JOIN customers c
ON c.customer_id = t.customer_id
GROUP BY p.product_name, c.state
) AS t
GROUP BY state
Finally, join them together:
SELECT t1.product_name AS most_popular_product, t1.state
FROM (
SELECT p.product_name, c.state, COUNT(*) AS count
FROM products p
INNER JOIN transactions t
ON p.product_id = t.product_id
INNER JOIN customers c
ON c.customer_id = t.customer_id
GROUP BY p.product_name, c.state
) AS t1
JOIN (
SELECT state, MAX(count) AS maxcount
FROM (
SELECT p.product_name, c.state, COUNT(*) AS count
FROM products p
INNER JOIN transactions t
ON p.product_id = t.product_id
INNER JOIN customers c
ON c.customer_id = t.customer_id
GROUP BY p.product_name, c.state
) AS t
GROUP BY state
) AS t2 ON t1.state = t2.state AND t1.count = t2.maxcount
This is basically the same pattern as SQL select only rows with max value on a column, just using the first grouped query as the table you're trying to group.

Related

using MYSQL Group By to get the most popular value

I am practicing MYSQL using https://www.w3schools.com/mysql/trymysql.asp?filename=trysql_func_mysql_concat which has a mock database for me to practice with an I am experimenting using the GROUP BY command I am attempting to group all employees up with all of their sales and determine, their name, their amount of sales and the product that they sold the most. I have managed to get their name and sales but not the product name. I know that extracting information with a group by is difficult and I have tried using a sub query. Is there a way to get the information.
My query is below.
SELECT
CONCAT_WS(' ',
Employees.FirstName,
Employees.LastName) AS 'Employee name',
COUNT(*) AS 'Num of sales'
FROM
Orders
INNER JOIN
Employees ON Orders.EmployeeID = Employees.EmployeeID
INNER JOIN
OrderDetails ON OrderDetails.OrderID = Orders.OrderID
INNER JOIN
Products ON Products.ProductID = OrderDetails.ProductID
GROUP BY Orders.EmployeeID
ORDER BY COUNT(*) DESC;
What this says is get orders, join employees based on orders employeeid, join the order details based on order id and join products information based on product id in the order details, then it groups them by the employee id and orders them by the number of sales an employee has made.
SELECT
concat_ws(' ',
Employees.FirstName,
Employees.LastName) as 'Employee name',
count(*) as 'Num of sales',
(
SELECT Products.ProductName
FROM Orders
INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
INNER JOIN OrderDetails ON OrderDetails.OrderID = Orders.OrderID
INNER JOIN Products ON Products.ProductID = OrderDetails.ProductID
GROUP BY Orders.EmployeeID
ORDER BY count(Products.ProductName) desc
LIMIT 1
) as 'Product Name'
FROM Orders
INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
INNER JOIN OrderDetails ON OrderDetails.OrderID = Orders.OrderID
INNER JOIN Products ON Products.ProductID = OrderDetails.ProductID
GROUP BY Orders.EmployeeID
ORDER BY count(*) desc;
Above is my attempt at using a sub query for the solution.
It is quite ugly, as the w3school uses still mysql 5.7
On a personal note, you should install your own server grab somewhere a database and test it there, in mysql workbench you can have many query tabs in which you can test queries , till you het the "right" result.
SELECT
CONCAT_WS(' ',
Employees.FirstName,
Employees.LastName) AS 'Employee name',
COUNT(*) AS 'Num of sales',
tn.ProductName
FROM
Orders
INNER JOIN
Employees ON Orders.EmployeeID = Employees.EmployeeID
INNER JOIN
OrderDetails ON OrderDetails.OrderID = Orders.OrderID
INNER JOIN
Products ON Products.ProductID = OrderDetails.ProductID
INNEr JOIN
(SELECT EmployeeID, p.ProductName
FROM (SELECT IF (#Eid = EmployeeID ,#rn := #rn +1, #rn := 1) rn,ProductID, sumamount
, #Eid := EmployeeID as EmployeeID
FROM
(
SELECT
EmployeeID,ProductID, SUM(Quantity) sumamount
FROM Orders o INNER JOIN OrderDetails od ON od.OrderID = o.OrderID,(SELECT #Eid := 0, #rn := 0) t1
GROUP BY EmployeeID,ProductID
ORDER BY EmployeeID,sumamount DESC ) t2 ) t3
INNER JOIN Products p ON t3.ProductID = p.ProductID
WHERE rn= 1) tn
ON Orders.EmployeeID = tn.EmployeeID
GROUP BY Orders.EmployeeID
ORDER BY COUNT(*) DESC;
In your second query you are trying to get an employee's most often sold product. But there are two mistakes in that subquery:
The subquery is invalid. You group by employee, but select a product. Which product? An employee can sell many different products. MySQL should raise a syntax error here, as all other DBMS I know of do. But you are in cheat mode. MySQL allows incorrect aggregation queries and silently applies ANY_VALUE on all columns that cannot be selected otherwise. Thus you are selecting ANY_VALUE(Products.ProductName), i.e. a product arbitrarily chosen by the DBMS. To get out of cheat mode SET sql_mode = 'ONLY_FULL_GROUP_BY';.
Then, you don't relate the subquery to your main query. So when selecting the row for, say, employee #123, your subquery still selects data for all employees in order to pick one of their products. And as this is independent from the employee in the main query, it will probably pick the same product for every other employee you are selecting, too.
Here is what the query should look like instead:
SELECT
concat_ws(' ', e.FirstName, e.LastName) as "Employee name",
count(*) as "Num of sales",
(
SELECT p2.ProductName
FROM Orders o2
INNER JOIN OrderDetails od2 ON od2.OrderID = o2.OrderID
INNER JOIN Products p2 ON p2.ProductID = od2.ProductID
WHERE o2.EmployeeID = o.EmployeeID
GROUP BY p2.ProductID
ORDER BY count(*) DESC
LIMIT 1
) as "Product Name"
FROM Orders o
INNER JOIN Employees e ON o.EmployeeID = e.EmployeeID
INNER JOIN OrderDetails od ON od.OrderID = o.OrderID
GROUP BY o.EmployeeID
ORDER BY count(*) desc;
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=f35e96764d454a4032d7778b550fc6b4
Disclaimer: When an employee sold more than one product most often (e.g. 500 x product A, 500 x product B, 200 x product C), then one of them (A or B in the example) gets picked arbitrarily for the employee.

Using SUM with Joins in MySQL returns unexpected result

I have these tables : customers, customer_invoices, customer_invoice_details, each customer has many invoices, and each invoice has many details.
The customer with the ID 574413 has these invoices :
select customer_invoices.customer_id,
customer_invoices.id,
customer_invoices.total_price
from customer_invoices
where customer_invoices.customer_id = 574413;
result :
customer_id invoice_id total_price
574413 662146 700.00
574413 662147 250.00
each invoice here has two details (or invoice lines) :
first invoice 662146:
select customer_invoice_details.id as detail_id,
customer_invoice_details.customer_invoice_id as invoice_id,
customer_invoice_details.total_price as detail_total_price
from customer_invoice_details
where customer_invoice_details.customer_invoice_id = 662146;
result :
detail_id invoice_id detail_total_price
722291 662146 500.00
722292 662146 200.00
second invoice 662147 :
select customer_invoice_details.id as detail_id,
customer_invoice_details.customer_invoice_id as invoice_id,
customer_invoice_details.total_price as detail_total_price
from customer_invoice_details
where customer_invoice_details.customer_invoice_id = 662147;
result :
detail_id invoice_id detail_total_price
722293 662147 100.00
722294 662147 150.00
I have a problem with this query :
select customers.id as customerID,
customers.last_name,
customers.first_name,
SUM(customer_invoices.total_price) as invoice_total,
SUM(customer_invoice_details.total_price) as details_total
from customers
join customer_invoices
on customer_invoices.customer_id = customers.id
join customer_invoice_details
on customer_invoice_details.customer_invoice_id = customer_invoices.id
where customer_id = 574413;
unexpected result :
customerID last_name first_name invoice_total details_total
574413 terry amine 1900.00 950.00
I need to have the SUM of the total_price of the invoices, and the SUM of the total_price of the details for each customer. In this case I'm supposed to get 950 as total_price for both columns (invoice_total& details_total) but it's not the case. what am I doing wrong & how can I get the correct result please. The answers in similar topics don't have the solution for this case.
When you mix normal columns with aggregate functions (for example SUM), you need to use GROUP BY where you list the normal columns from the SELECT.
The reason for the excessive amount in total_price for invoices is that the SUM is also calculated over each detail row as it is part of the join. Use this:
select c.id as customerID,
c.last_name,
c.first_name,
SUM(ci.total_price) as invoice_total,
SUM((select SUM(d.total_price)
from customer_invoice_details d
where d.customer_invoice_id = ci.id)) as 'detail_total_price'
from customers c
join customer_invoices ci on ci.customer_id = c.id
where c.id = 574413
group by c.id, c.last_name, c.first_name
db-fiddle
I used join against sub queries and then did a sum on the sums
SELECT c.id as customerID,
c.last_name,
c.first_name
SUM(i.sum) as invoice_total,
SUM(d.sum) AS details_total
FROM customers c
JOIN (SELECT id, customer_id, SUM(total_price) AS sum
FROM customer_invoices
GROUP BY id, customer_id) AS i ON i.customer_id = c.id
JOIN (SELECT customer_invoice_id as id, SUM(total_price) AS sum
FROM customer_invoice_details
GROUP BY customer_invoice_id) AS d ON d.id = i.id
WHERE c.id = 574413
GROUP BY c.id, c.name
The issue is in the joining logic. The table customers is used as the driving table in the joins. But in the second join, you are using a derivative key column from the first join, to join with the third tables. This is resulting in a Cartesian output doubling the records from the result from the nth-1 join, which is leading to customer_invoices.total_price getting repeated twice, hence the rolled up value of this field is doubled.
At a high level I feel that the purpose of rolling up the prices is already achieved in SUM(customer_invoice_details.total_price).
But if you have a specific project requirement that SUM(customer_invoices.total_price) should also be obtained and must match with SUM(customer_invoice_details.total_price), then you can do this:
In a separate query, Join customer_invoice_details and customer_invoices. Roll up the pricing fields, and have a result such that you have only one record for one customer ID.
Then use this as a sub-query and join it with the customers table.
You are aggregating along multiple dimensions. This is challenging. I would suggest doing the aggregation along each dimension independently:
select c.id as customerID, c.last_name, c.first_name,
ci.invoice_total,
cid.details_total
from customers c join
(select ci.sum(ci.total_price) as invoice_total
from customer_invoices ci
group by ci.customer_id
) ci
on ci.customer_id = c.id join
(select ci.sum(cid.total_price) as details_total
from customer_invoices ci join
customer_invoice_details cid
on cid.customer_invoice_id = ci.id
group by ci.customer_id
) cid
on cid.customer_id = c.id
where c.id = 574413;
A faster version (for one customer) uses correlated subqueries:
select c.id as customerID, c.last_name, c.first_name,
(select ci.customer_id, sum(ci.total_price) as invoice_total
from customer_invoices ci
where ci.customer_id = c.id
) as invoice_total,
(select ci.customer_id, sum(cid.total_price) as details_total
from customer_invoices ci join
customer_invoice_details cid
on cid.customer_invoice_id = ci.id
where ci.customer_id = c.id
) as details_total
from customers c
where c.id = 574413;

inner join three tables results in multiplied values

I'm trying to (let's say) gather a report on customers.
In that report I want to include sum of orders and ticket number for each client.
Tables:
Customer(id, name)
Order(id, customer_id, amount)
support_ticket(id, customer_id)
query:
select
c.id as 'Customer',
count(distinct t.id) as "Ticket count",
count(distinct o.id) as "Order count",
sum(o.amount) as 'Order Amount'
from customer as c
inner join `order` as o on c.id = o.customer_id
inner join support_ticket as t on c.id = t.customer_id
group by c.id
Since I join with customer.id on the two tables, I get all the rows "duplicated", since I get all possible combinations, so if the client as multiple tickets, the sum(o.amount) will we multiplied because of "duplicated rows"
sqlFiddle (mysql): http://sqlfiddle.com/#!9/ba39ba/13
sqlFiddle (pg): http://sqlfiddle.com/#!17/bc32e/7
It seems like a simple case but I've been looking at it too much I think, I can't find the proper way to do that report.
What am I doing wrong?
your best bet is to re-write the Aggregation off the Order table as as Derived Table;
EG
select
c.id as 'Customer',
count(distinct t.id) as "Ticket count",
o.amount as 'Order Amount' ,
o.[Order count]
from customer as c
inner join
(SELECT
o.customer_id,
sum(amount) as amount ,
count(distinct o.id) as "Order count"
from [order]
group by o.customer_id)
as o on c.id = o.customer_id
inner join support_ticket as t on c.id = t.customer_id
group by
c.id ,
o.amount ,
o.[Order count]
Note that the Derived Table Columns then are added to the group by clause at the bottom.
Cheers!
Just calculate order values in a sub-query and join it.
SELECT
c.id as 'Customer'
,count(DISTINCT st.id) as 'Ticket Count'
,o.`Order Count`
,o.amount as `Order Amount`
FROM customer c
INNER JOIN support_ticket st
on c.id = st.customer_id
INNER JOIN (
SELECT
customer_id
,SUM(amount) as 'amount'
,count(distinct id) as 'Order Count'
FROM `order`
group by customer_id
) o
on c.id = o.customer_id
GROUP BY c.id;
select c.id as 'Customer'
,t2.count_ticket as "Ticket count"
,t1.count_order as "Order count"
,t1.amount as 'Order Amount'
from customer as c
inner join (select customer_id
,count(id) as count_order
,sum(amount) as amount
from Order group by customer_id) t1
on c.id = t1.customer_id
inner join (select customer_id
,count(id) as count_ticket
from support_ticket group by customer_id) t2
on c.id = t2.customer_id
In cases like yours, when I think the solution of my problem should be fairly simple but I cant wrap my head around it, I tend to use a WITH clause.
Not because its better, but because it helps me to understand my code better by splitting up complexity. First I create a relatively simple temp. Solving the first part of my problem.
WITH temp AS (
SELECT
c.id AS "customer",
COUNT(DISTINCT o.id) AS "order_count",
SUM(o.amount) AS "order_amount"
FROM customer AS c
INNER JOIN "order" AS o on c.id = o.customer_id
GROUP BY c.id
)
Then I simply select the first half of my solution from temp, adding this way all intermediate results, and solve the second part of my initial sql.
SELECT
temp.customer,
COUNT(DISTINCT t.id) as "ticket_count",
temp.order_count,
temp.order_amount
FROM temp
INNER JOIN support_ticket as t on temp.customer = t.customer_id
GROUP BY temp.customer, temp.order_count, temp.order_amount
The principle is the same like in all previous answers, but SELECTS are separated and I can check them fast, and continue on if I'm happy with parts of the solution.

SELECT a column and SUM() of values from another table in SQL

I'm pretty new with SQL, and this is giving me trouble. The idea is that I have several tables. Here are the relevant tables and columns:
customers:
customer_id, customer_name
orders:
order_id, customer_id
orderline:
order_id, item_id, order_qty
items:
item_id, unit_price
I need to return customer_name as well as total revenue from that customer (calculated as item_price * order_qty * 2).
Here's what I have written:
SELECT customers.customer_name, sum(revenue)
FROM SELECT orderline.order_qty * items.unit_value * 2 AS revenue
FROM orderline
INNER JOIN orders
ON orderline.order_id = orders.order_id
INNER JOIN customers
ON revenue.customer_id = customers.customer_id;
This throws a syntax error and I'm not really sure how to proceed.
This is only one example of this type of problem that I need to work out, so more generalized answers would be helpful.
Thanks in advance!
EDIT:
With help from answers I ended up with this code, which just gets total revenue and puts it next to the first person in the DB's name. What did I get wrong here?
SELECT customers.customer_name, sum(revenue)
FROM(SELECT orderline.order_qty * items.unit_price * 2 AS revenue, orders.customer_id AS CustomerID
FROM( orderline
INNER JOIN orders
ON orderline.order_id = orders.order_id
INNER JOIN items
ON orderline.item_id = items.item_id)) CustomerOrders
INNER JOIN customers
ON CustomerOrders.CustomerID = customers.customer_id;
A couple issues with your query.
First, you need to scope your subquery and alias it:
(SELECT orderline.order_qty * items.unit_value * 2 AS revenue
FROM orderline
INNER JOIN orders
ON orderline.order_id = orders.order_id) CustomerOrders
Secondly, you need to select more than the revenue in the subquery since you are joining it to your customers table
(SELECT
orderline.order_qty * items.unit_value * 2 AS revenue,
orders.customer_id AS CustomerId
FROM
orderline
INNER JOIN orders ON orderline.order_id = orders.order_id) CustomerOrders
Then you need to use the subquery alias in the join to the customers table and wrap it all up in a group by customer_id and CustomerOrders.Revenue
I would tend to do it differently. I'd start with selecting from the customer table, because that is the base of what you are looking for. Then I'd do a cross apply on the orders that would all aggregating the order revenue in the subquery. It would look like this (tsql, you could do the same in mysql with a join with some aggregation):
SELECT
customers.customer_name,
ISNULL(customerOrders.Revenue, 0) AS Revenue
FROM
customers
OUTER APPLY (
SELECT
SUM (orderline.order_qty * items.unit_value * 2) AS Revenue
FROM
orders
INNER JOIN
orderline ON orders.order_id = orderline.order_id
INNER JOIN
items on orderline.item_id = items.item_id
WHERE
orders.customer_id = customers.customer_id
) CustomerOrders
In this case, the subquery aggregates all your orders for you and only returns one row per customer, so no extraneous returned data. Since it's an outer apply, it will also return null for customers with no orders. You could change it to a CROSS APPLY and it will filter out customers with no orders (like an INNER JOIN).
SELECT c.customer_name,
sum(COALESCE(ol.order_qty,0) * COALESCE(i.unit_value,0) * 2)
FROM customers c
INNER JOIN orders o
ON o.customer_id = c.customer_id;
INNER JOIN orderline ol
ON ol.order_id = o.order_id
INNER JOIN items i
ON i.item_id = ol.item_id
GROUP BY c.customer_id
select customer_name, sum(item_price * order_qty * 2) as total_revenue
from (
select * from customers
inner join orders using(customer_id)
inner join orderline using(order_id)
inner join items using(item_id)
)
group by customer_name
select
c.customer_name,
r.revenue
from
customers c
inner join
orders ord on
ord.customer_id = c.customer_id
inner join
(select i.item_id, o.order_id, sum(o.order_qty * items.unit_value * 2) as revenue
from orderline o
inner join items i on
i.item_id = o.item_id
group by o.order_id, i.item_id) as r on r.order_id = o.order_id

inner-join mysql x3

I need to display cust_id, the customer forename and surname, the product name<-(from products table) and date of sale<--(from sales table), also I need to display in order of the most recent dates first.
This is what I have got so far:
SELECT
customer.cust_id,
customer.forename,
customer.surname,
products.prod_name,
sales.Date_of_sale
FROM customers c
INNER JOIN sales s ON c.cust_id = s.cust_id
INNER JOIN products p ON s.product_id = p.product_id
ORDER BY s.Date_of_sale DESC
Any help would be appreciated.
This
SELECT
c.cust_id,
c.forename,
c.surname,
p.prod_name,
s.Date_of_sale
FROM customers c
INNER JOIN sales s ON c.cust_id = s.cust_id
INNER JOIN products p ON s.product_id = p.product_id
ORDER BY s.Date_of_sale DESC
will work