Why is an inner join necessary when all the information is located in ONE table? Please explain; I'm just getting the hang of joins.
If you were interviewing for an entry level analyst job (which I am), given the following sample query and table description, what other observations/questions would you have on this information?
Table: sales_data
| Field | Description |
| --- | --- |
| date | Date of order |
| customer_id | Unique ID for Customers |
| brand | Unique ID for the brand of a purchased item |
| quantity | Quantity of item purchased |
| amount | Item Price |
select customer_id, brand, sum(quantity*amount) as sales
from sales_data a
inner join (
select distinct customer_id
from sales_data
where lower(brand) = 'chanel'
and quantity > 0
and date >= to_date('01/01/2017','MM/DD/YYYY')
) b
on a.customer_id = b.customer_id
where date >= to_date('01/01/2017','MM/DD/YYYY')
and quantity > 0
group by customer_id, brand
An inner join on the same table is sometimes required when all your data sits in one single table with redundant information. For the table you described, the data could easily have been split into three tables (customer, brand, and sales_details) linked by appropriate relational keys, and in that case no self join (an inner join of a table with itself) would be required. But because the data is in a single table, the requirement leads you to a self join. From the output of the query, it is clear that it follows these steps:
First, you need a distinct list of customers who have purchased a product with brand = 'chanel'.
Second, you need all the purchase details for the customers selected in step one.
To get results for this requirement, a self join is required.
Other question: How can you rewrite the query without the inner join and still get the same result?
Ans: You can simply use an inner query in the WHERE clause, as below, to get the same result:
select customer_id, brand, sum(quantity*amount) as sales
from sales_data a
where date >= to_date('01/01/2017','MM/DD/YYYY')
and quantity > 0
and a.customer_id IN (
select distinct customer_id
from sales_data
where lower(brand) = 'chanel'
and quantity > 0
and date >= to_date('01/01/2017','MM/DD/YYYY')
)
group by customer_id, brand
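To check that the join form and the IN form really return the same rows, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, the dates are stored as ISO strings instead of going through to_date (SQLite has no such function), and the columns in the join version are qualified with the alias a because customer_id exists on both sides of the join.

```python
import sqlite3

# Hypothetical sample rows; the real sales_data contents are not shown in the question.
rows = [
    ("2017-02-01", 1, "chanel", 1, 100.0),
    ("2017-03-01", 1, "gucci", 2, 50.0),
    ("2017-02-15", 2, "gucci", 1, 70.0),   # customer 2 never bought chanel
    ("2016-12-31", 3, "Chanel", 1, 90.0),  # chanel purchase before the cutoff date
    ("2017-05-01", 3, "prada", 1, 30.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sales_data "
    "(date text, customer_id int, brand text, quantity int, amount real)"
)
conn.executemany("insert into sales_data values (?, ?, ?, ?, ?)", rows)

# Step one: customers with a chanel purchase since 2017-01-01.
chanel_buyers = """
    select distinct customer_id from sales_data
    where lower(brand) = 'chanel' and quantity > 0 and date >= '2017-01-01'
"""

# Step two, written as a self join.
join_sql = f"""
    select a.customer_id, a.brand, sum(a.quantity * a.amount) as sales
    from sales_data a
    inner join ({chanel_buyers}) b on a.customer_id = b.customer_id
    where a.date >= '2017-01-01' and a.quantity > 0
    group by a.customer_id, a.brand
    order by a.customer_id, a.brand
"""

# Step two, written with IN.
in_sql = f"""
    select customer_id, brand, sum(quantity * amount) as sales
    from sales_data
    where date >= '2017-01-01' and quantity > 0
      and customer_id in ({chanel_buyers})
    group by customer_id, brand
    order by customer_id, brand
"""

join_result = conn.execute(join_sql).fetchall()
in_result = conn.execute(in_sql).fetchall()
print(join_result == in_result, join_result)
```

With this data only customer 1 qualifies, and both queries return that customer's sales for every brand, not just chanel.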
Your sales_data table presumably contains data for all of the sales of customers. So let's say you wanted to ask a question such as:
What is the total sales revenue for customers that have purchased the 'chanel' brand?
The first thing you need to do is identify all of the customers that have purchased 'chanel'; then you need to go and get all of their other sales to determine the total revenue.
The inner/sub query does exactly that: it identifies the customers that have purchased the 'chanel' brand since 1/1/2017.
select distinct customer_id
from sales_data
where lower(brand) = 'chanel'
and quantity > 0
and date >= to_date('01/01/2017','MM/DD/YYYY')
The outer query then joins to those customers and gets all of the sales data by brand by customer since 1/1/2017 that have a positive quantity.
select customer_id, brand, sum(quantity*amount) as sales
from sales_data a
inner join (
-- placeholder: the subquery shown earlier (customers that purchased 'chanel')
) b
on a.customer_id = b.customer_id
where date >= to_date('01/01/2017','MM/DD/YYYY')
and quantity > 0
group by customer_id, brand
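There are also ways to accomplish the same thing with EXISTS or IN. However, be very careful with IN if customer_id could be NULL: a NULL in the list never matches anything, and the negated form, NOT IN, returns no rows at all when the list contains a NULL.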
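The NULL caveat is easiest to see in a tiny experiment. This is a sketch in Python with the built-in sqlite3 module; the table t and its three rows are made up for the illustration, and the same three-valued logic applies in other SQL databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (customer_id int)")
conn.executemany("insert into t values (?)", [(1,), (2,), (None,)])

# IN: the NULL row is simply never matched, 1 is still found.
in_rows = conn.execute(
    "select 1 where 1 in (select customer_id from t)"
).fetchall()

# NOT IN: comparing 3 with NULL gives "unknown", so the whole predicate
# is not true and no row comes back, even though 3 is not in the table.
not_in_rows = conn.execute(
    "select 3 where 3 not in (select customer_id from t)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs do no harm.
not_exists_rows = conn.execute(
    "select 3 where not exists (select 1 from t where customer_id = 3)"
).fetchall()

print(in_rows, not_in_rows, not_exists_rows)
```

This is why EXISTS / NOT EXISTS is often the safer choice when the column is nullable.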
Yes, you are right: you can do this with a single reference to the table. The inner join picks out particular rows and then calculates their sales value. You can do this directly:
select customer_id, brand, sum(CASE WHEN lower(brand) = 'chanel' THEN quantity*amount ELSE 0 END) as sales
from sales_data
where date >= to_date('01/01/2017','MM/DD/YYYY')
and quantity > 0
group by customer_id, brand
having MAX(CASE WHEN lower(brand) = 'chanel' THEN 1 ELSE 0 END) = 1;
I have two tables, let's say OrderPlaced and OrderDelivered.
The OrderPlaced table looks like this:
A single order can contain multiple products (identified by sku in the table), and each product can have a quantity greater than one.
The OrderDelivered table looks like this:
So technically 3 products have not been fully delivered: Orderid 1000 - product S101, Orderid 1001 - product S102 (a quantity of 3 was required, but only 2 were delivered), and Orderid 1002 - product S100.
I am trying to write a SQL query that gives me the OrderId and sku of the products that have not been delivered. For now I have written something like:
select OrderPlaced.orderid,OrderPlaced.sku
from OrderPlaced
left join OrderDelivered
on OrderPlaced.Orderid = OrderDelivered.orderid and OrderPlaced.sku = OrderDelivered.sku
where OrderDelivered.sku is NULL;
This is giving me Orderid 1000 - product S101 and Orderid 1002 - product S100, but Orderid 1001 - product S102 is missing. I understand I have to do a check on qty as well, but couldn't think how to do that. I would really appreciate it if someone can help me with that part.
Add up the deliveries per order and sku and then outer join the delivered quantities to the order table so you can compare the quantities.
select
p.orderid,
p.sku,
p.qty as ordered,
coalesce(d.sum_qty, 0) as delivered
from orderplaced p
left join
(
select orderid, sku, sum(qty) as sum_qty
from orderdelivered
group by orderid, sku
) d on d.orderid = p.orderid and d.sku = p.sku
where p.qty > coalesce(d.sum_qty, 0)
order by p.orderid, p.sku;
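To see this query at work, here is a small sketch in Python with the built-in sqlite3 module. The question's tables are not reproduced here, so the rows below are hypothetical, chosen only to match the three cases the question describes (two products never delivered, one delivered in two partial shipments).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table OrderPlaced (orderid int, sku text, qty int);
    create table OrderDelivered (orderid int, sku text, qty int);
    -- Hypothetical rows matching the cases described in the question.
    insert into OrderPlaced values
        (1000,'S100',1),(1000,'S101',1),(1001,'S102',3),(1002,'S100',2);
    insert into OrderDelivered values
        (1000,'S100',1),(1001,'S102',1),(1001,'S102',1);
""")

undelivered = conn.execute("""
    select p.orderid, p.sku, p.qty as ordered, coalesce(d.sum_qty, 0) as delivered
    from OrderPlaced p
    left join (
        -- one row per order and sku, with the total delivered so far
        select orderid, sku, sum(qty) as sum_qty
        from OrderDelivered
        group by orderid, sku
    ) d on d.orderid = p.orderid and d.sku = p.sku
    where p.qty > coalesce(d.sum_qty, 0)
    order by p.orderid, p.sku
""").fetchall()

for row in undelivered:
    print(row)
```

Order 1001 / S102 now shows up with 3 ordered and 2 delivered, which the IS NULL check alone could not find.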
Your query works for any items that have not been delivered at all; this is your WHERE OrderDelivered.sku IS NULL. But you can also have a scenario in which fewer items are delivered than ordered, and importantly, you can have multiple records related to your deliveries even if they refer to the same order and sku (two rows with 1 qty each).
In this case you need to sum up all the deliveries per placed order id, sku and quantity (the GROUP BY clause in the query below) and then check whether that sum (or 0 if nothing is found) differs from the placed quantity (the HAVING clause). You could use a query like this:
SELECT OrderPlaced.orderid, OrderPlaced.sku,
OrderPlaced.qty - COALESCE(SUM(OrderDelivered.qty), 0) AS qty_missing,
CASE
WHEN SUM(OrderDelivered.qty) IS NULL
THEN 'Yes'
ELSE 'No'
END AS is_missing_completely
FROM OrderPlaced
LEFT JOIN OrderDelivered
ON OrderPlaced.Orderid = OrderDelivered.orderid
AND OrderPlaced.sku = OrderDelivered.sku
GROUP BY OrderPlaced.orderid, OrderPlaced.sku, OrderPlaced.qty
HAVING OrderPlaced.qty != COALESCE(SUM(OrderDelivered.qty), 0)
I would create two aggregated representations of your ordered and delivered products, and then outer join them to get the differences. If you are using MySQL 8 you can write these as CTEs; otherwise just use two equivalent sub-queries:
with op as (
select OrderId, Sku, Sum(qty) Qty
from OrderPlaced
group by OrderId, Sku
), od as (
select OrderId, Sku, Sum(qty) Qty
from OrderDelivered
group by OrderId, Sku
)
select op.OrderId, op.Sku, op.Qty - Coalesce(od.qty,0) notDelivered
from op
left join od on od.orderid = op.orderid and od.sku = op.sku
where op.Qty - Coalesce(od.qty,0)>0;
I want to write a SQL query to compute which customers have purchased more than 4 products on the same day.
Here are my tables:
Sales
(date, customer_id, product_id, units_sold)
Products
(id, name, price)
Customers
(id, name)
& here's what I have so far:
SELECT COUNT(s.product_id) as total_customers
FROM Sales s1
WHERE DATEDIFF(s1.date, s2.date)=0
INNER JOIN Sales s2
ON s1.product_id = s2.product_id
HAVING COUNT(s.product_id) > 4;
If you want the customers who have purchased more than 4 products on the same date:
SELECT DISTINCT s.customer_id
FROM Sales s
GROUP BY s.customer_id, date(s.date)
HAVING COUNT(*) > 4;
This is one of the few cases where SELECT DISTINCT is used with GROUP BY. If you want to know the dates as well, then include date(s.date) in the SELECT.
Note that this assumes that any given product is purchased by a customer only once on each date. If a customer can have multiple records for a single product on one date, use COUNT(DISTINCT product_id) instead of COUNT(*).
To get the total number of customers, use a subquery:
SELECT COUNT(*)
FROM (SELECT DISTINCT s.customer_id
FROM Sales s
GROUP BY s.customer_id, date(s.date)
HAVING COUNT(*) > 4
) c
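The difference between COUNT(*) and COUNT(DISTINCT product_id) only shows up when a customer has several rows for the same product on one day. A quick sketch in Python with the built-in sqlite3 module, using made-up rows (the helper name customers is just for this illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table Sales (date text, customer_id int, product_id int, units_sold int)"
)
# Hypothetical rows: customer 1 buys five different products in one day,
# customer 2 buys the same product five times in one day.
rows = [("2020-01-01", 1, p, 1) for p in range(1, 6)]
rows += [("2020-01-01", 2, 1, 1) for _ in range(5)]
conn.executemany("insert into Sales values (?, ?, ?, ?)", rows)


def customers(count_expr):
    """Return customer ids having more than 4 of count_expr on a single day."""
    sql = f"""
        select distinct s.customer_id
        from Sales s
        group by s.customer_id, date(s.date)
        having {count_expr} > 4
        order by s.customer_id
    """
    return [r[0] for r in conn.execute(sql)]


by_rows = customers("count(*)")
by_products = customers("count(distinct s.product_id)")
print(by_rows, by_products)
```

Counting rows picks up both customers, while counting distinct products keeps only customer 1.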
There are too many mistakes in your query; try this:
SELECT customer_id,cast(s1.date as date),COUNT(s1.product_id) as total_products
FROM Sales s1
Group by customer_id,cast(s1.date as date)
HAVING COUNT(s1.product_id) > 4;
I have 3 tables:
products (sku, price, priceOffer, etc)
stock (sku, branch, items)
sales_provider (sku, items, date)
The table products holds all the information about a product, except for stock or sales. Current stock is stored in the table stock, and has information about how many items are available on each branch. The table "sales_provider" stores how many items were sold each day for every product. The product ID is "sku".
Now, I'm trying to get with one query the product that:
Has generated the best profit (number of sales * offered price)
Is still available on stock
And, of course, I want to know how many items are still on stock and how many items were sold.
I'm trying a query like this:
select
*
from
(
select
p.*,
sum(s.items) stock,
sum(sp.items) sales,
case when
p.priceOffer < p.price
and
p.priceOffer > 0
then
p.priceOffer
else
p.price
end finalPrice
from
products p
join
stock s
on
s.sku = p.sku
join
sales_provider sp
on
sp.sku = p.sku
group by
sku
) temp
where
stock > 0
order by
(finalPrice * sales) desc
limit 1;
But I'm having problems with that. Basically, I'm getting a huge sum of stock items and sales_provider items, not the real amounts. Also, it's a slow query (it's taking about half a second with only 9,500 products).
I've been trying to modify it and I'm having doubts about the subquery being necessary, but I just can't nail it.
If someone can help me improve it and get the correct result, I'll really appreciate it.
Thanks in advance for any helpful comment.
Francisco
For this type of query, you want to do the aggregations separately on stock and sales_provider. Otherwise, you will generate a cartesian product between the two tables for a given item.
Try this:
select p.sku, (sp.salesitems * p.priceOffer) as profit, s.stockitems, sp.salesitems
from products p left join
(select sku, SUM(items) as stockitems
from stock
group by sku
) s
on p.sku = s.sku left join
(select sku, SUM(items) as salesitems
from sales_provider sp
group by sku
) sp
on p.sku = sp.sku
where s.stockitems > 0
order by profit desc
This assumes that products.sku is unique.
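The inflated sums are easy to reproduce. Below is a minimal sketch in Python with the built-in sqlite3 module; the single product, two stock rows, and three sales rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table stock (sku text, branch text, items int);
    create table sales_provider (sku text, items int, date text);
    -- Hypothetical data: one product, two branches, three days of sales.
    insert into stock values ('A','b1',5),('A','b2',3);
    insert into sales_provider values
        ('A',2,'2020-01-01'),('A',4,'2020-01-02'),('A',1,'2020-01-03');
""")

# Joining both detail tables first: every stock row pairs with every sales row,
# so each amount is counted once per row of the other table.
naive = conn.execute("""
    select sum(s.items), sum(sp.items)
    from stock s join sales_provider sp on sp.sku = s.sku
    group by s.sku
""").fetchone()

# Aggregating each table on its own before joining keeps the real totals.
separate = conn.execute("""
    select s.stockitems, sp.salesitems
    from (select sku, sum(items) as stockitems from stock group by sku) s
    join (select sku, sum(items) as salesitems from sales_provider group by sku) sp
      on sp.sku = s.sku
""").fetchone()

print(naive, separate)
```

Here the real totals are 8 in stock and 7 sold, but the joined-then-summed version reports 24 and 14 because the 2 x 3 row combinations are all added up.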
I want to get the order IDs of all orders for a selected customer that have not been fully paid yet. My data looks like this:
What I want is a SELECT statement that answers this question:
select orderID
from order
where customer id = #custID
and Total cashmovementValue
for current order id
is less than total (sold quantity * salePrice )
for current order id
How to do it?
Thanks.
You need to compare the sum of the order lines with the sum of the payments for each order. GROUP BY and a few sub-queries are what you need to get the job done.
Something like this should work:
SELECT
O.OrderID
FROM [Order] O
INNER JOIN (
-- Add up cost per order
SELECT
OrderID,
SUM(SoldQuantity * P.SalePrice) AS Total
FROM OrderLine
INNER JOIN Product P ON P.ProductID = OrderLine.ProductID
GROUP BY OrderID
) OL ON OL.OrderID = O.OrderID
LEFT JOIN (
-- Add up total amount paid per order
SELECT
OrderID,
SUM(CashMovementValue) AS Total
FROM CashMovement
GROUP BY OrderID
) C ON C.OrderID = O.OrderID
WHERE
O.CustomerID = #custID
AND ( C.OrderID IS NULL OR C.Total < OL.Total )
EDIT
I've just noticed you're not storing the sale price on each order line. I've updated my answer accordingly, but this is a very bad idea. What will happen to your old orders if the price of an item changes? It is okay (and actually best practice) to denormalise the data by storing the price at the time of sale on each order line.
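For anyone who wants to try the unpaid-orders query, here is a sketch in Python with the built-in sqlite3 module. The table and column names follow the query above, but the column types and all the rows are assumptions, and the customer id is passed as a named parameter (:custID) rather than the #custID placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table "Order" (OrderID int, CustomerID int);
    create table Product (ProductID int, SalePrice real);
    create table OrderLine (OrderID int, ProductID int, SoldQuantity int);
    create table CashMovement (OrderID int, CashMovementValue real);
    insert into "Order" values (1,7),(2,7),(3,7),(4,8);
    insert into Product values (1,10.0),(2,5.0);
    insert into OrderLine values (1,1,2),(2,2,4),(3,1,1),(4,1,1);
    -- Order 1 is paid in full, order 2 only partly, orders 3 and 4 not at all.
    insert into CashMovement values (1,20.0),(2,5.0),(2,5.0);
""")


def unpaid_orders(cust_id):
    """Order ids of the customer whose payments are below the order total."""
    sql = """
        select O.OrderID
        from "Order" O
        join (select L.OrderID, sum(L.SoldQuantity * P.SalePrice) as Total
              from OrderLine L join Product P on P.ProductID = L.ProductID
              group by L.OrderID) OL on OL.OrderID = O.OrderID
        left join (select OrderID, sum(CashMovementValue) as Total
                   from CashMovement group by OrderID) C on C.OrderID = O.OrderID
        where O.CustomerID = :custID
          and (C.OrderID is null or C.Total < OL.Total)
        order by O.OrderID
    """
    return [r[0] for r in conn.execute(sql, {"custID": cust_id})]


print(unpaid_orders(7), unpaid_orders(8))
```

The LEFT JOIN plus the IS NULL check is what keeps orders with no payment rows at all in the result.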
I have three tables: customer, customer_account, and account_transaction.
The table structure is as follows:
Customer
Id,
Branch,
Name
..
customer_account
Id,
Cust_id,
Loanamout,
EMI
account_transaction
ID,
account_id,
amount,
date
I need branch-wise details: the count of loans given, the sum of loans given, and the sum of EMI received for a particular branch. Below is my current query:
SELECT
count(s.id) as cntloan,
SUM(s.Loanamout),
(
SELECT SUM(amount)
FROM account_transaction i
WHERE s.id = i.account_id
) AS curbal
From
customer as c,
customer_account as s
where c.branch = 1 and s.cust_id = c.id
It gives me the desired result for the count of loans and the sum of loans given, but not the right sum of EMI paid by customers.
Can anyone help me with this?
Thank you very much.
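This SQL mixes aggregate functions such as COUNT and SUM with a subquery that is correlated to a single account, so without a GROUP BY it will not work as intended. Sum the transactions per account first, then join that to the accounts and group by branch:
SELECT c.branch,
COUNT(a.id) AS cntloan,
SUM(a.Loanamout) AS loan,
SUM(COALESCE(t.paid, 0)) AS emi_received
FROM customer c
JOIN customer_account a ON a.cust_id = c.id
LEFT JOIN (
-- total received per account, summed first so loans are not counted twice
SELECT account_id, SUM(amount) AS paid
FROM account_transaction
GROUP BY account_id
) t ON t.account_id = a.id
GROUP BY c.branch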
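One thing to watch with the loan totals: joining customer_account straight to account_transaction repeats each loan once per transaction. The sketch below (Python with the built-in sqlite3 module, made-up rows, column types assumed) compares that direct join with a version that sums the transactions per account before joining, for branch 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (id int, branch int, name text);
    create table customer_account (id int, cust_id int, Loanamout int, EMI int);
    create table account_transaction (id int, account_id int, amount int, date text);
    -- Hypothetical data: branch 1 has two loans, one of them with two payments.
    insert into customer values (1,1,'a'),(2,1,'b'),(3,2,'c');
    insert into customer_account values (10,1,1000,100),(11,2,2000,200),(12,3,500,50);
    insert into account_transaction values
        (1,10,100,'2020-01-01'),(2,10,100,'2020-02-01'),
        (3,11,200,'2020-01-01'),(4,12,50,'2020-01-01');
""")

# Direct join: account 10 appears twice, once per transaction.
direct = conn.execute("""
    select count(a.id), sum(a.Loanamout), sum(t.amount)
    from customer c
    join customer_account a on a.cust_id = c.id
    join account_transaction t on t.account_id = a.id
    where c.branch = 1
""").fetchone()

# Transactions summed per account first: one row per loan.
pre_aggregated = conn.execute("""
    select count(a.id), sum(a.Loanamout), sum(coalesce(t.paid, 0))
    from customer c
    join customer_account a on a.cust_id = c.id
    left join (select account_id, sum(amount) as paid
               from account_transaction group by account_id) t
           on t.account_id = a.id
    where c.branch = 1
""").fetchone()

print(direct, pre_aggregated)
```

Both versions agree on the amount received (400), but only the pre-aggregated one reports the real 2 loans worth 3000; the direct join reports 3 loans worth 4000.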