MySQL nested select query performance - mysql

Is there a difference between these two queries? Like performance issues, etc?
Query 1:
select i.invoice_id,
i.total_price
from ( select invoice_id,
sum(price) as total_price
from orders
group by
invoice_id
) as i
inner join invoice
ON i.invoice_id = invoice.invoice_id
Query 2:
select invoice.invoice_id,
orders.total_price
from invoice
inner join ( select invoice_id,
sum(price) as total_price
from orders
group by
invoice_id
) orders
ON orders.invoice_id = invoice.invoice_id
Thanks!

Let me rewrite your queries without any significant changes:
Query 1
SELECT i.invoice_id,
i.total_price
FROM invoice INNER JOIN (
SELECT invoice_id,
sum(price) AS total_price
FROM orders
GROUP BY
invoice_id
) AS i
ON i.invoice_id = invoice.invoice_id;
Query 2:
SELECT invoice.invoice_id,
i.total_price
FROM invoice INNER JOIN (
SELECT invoice_id,
sum(price) AS total_price
FROM orders
GROUP BY
invoice_id
) AS i
ON i.invoice_id = invoice.invoice_id;
Things I changed:
the order of the JOIN (which doesn't matter, since it is an INNER JOIN)
the table alias (orders to i; I really don't understand why you wanted to name it differently)
Now it is obvious that the only difference between them is the first column in the main SELECT. Your question could have made sense if one of the columns were indexed and the other weren't, and the query didn't always use both orders.invoice_id and invoice.invoice_id. But since the INNER JOIN already reads both columns, it doesn't matter which one you select.
Furthermore, these queries are redundant. As @valex already mentioned, your query (actually, both of them) could (and should) be simplified to this, provided every invoice_id in orders also exists in invoice:
SELECT invoice_id,
sum(price) AS total_price
FROM orders
GROUP BY
invoice_id;
So, no, there is no difference in performance. And, surely, there is no difference in the result set.
Also, I'd like you to know that you can always use EXPLAIN for performance questions.

Your first query
select i.invoice_id,
i.total_price
from ( select invoice_id,
sum(price) as total_price
from orders
group by
invoice_id
) as i
inner join invoice
ON i.invoice_id = invoice.invoice_id
is equivalent by its result to:
select invoice_id,
sum(price) as total_price
from orders
group by invoice_id

To get the same result (provided every invoice_id in orders exists in the invoice table), you don't need to JOIN the invoice table at all. Just use this query:
select invoice_id,
sum(price) as total_price
from orders
group by invoice_id
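A quick way to check the equivalence claim is to run both forms against a small data set. The sketch below uses Python's built-in sqlite3 module rather than MySQL, and the sample rows are made up for illustration. It only shows that the join version and the plain GROUP BY agree while every invoice_id in orders also exists in invoice, and that they diverge once an orphaned order appears.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY);
    CREATE TABLE orders  (invoice_id INTEGER, price INTEGER);
    INSERT INTO invoice VALUES (1), (2);
    INSERT INTO orders  VALUES (1, 10), (1, 15), (2, 7);
""")

JOINED = """
    SELECT i.invoice_id, i.total_price
    FROM invoice
    INNER JOIN (SELECT invoice_id, SUM(price) AS total_price
                FROM orders
                GROUP BY invoice_id) AS i
            ON i.invoice_id = invoice.invoice_id
    ORDER BY i.invoice_id
"""
PLAIN = """
    SELECT invoice_id, SUM(price) AS total_price
    FROM orders
    GROUP BY invoice_id
    ORDER BY invoice_id
"""

joined_before = conn.execute(JOINED).fetchall()
plain_before = conn.execute(PLAIN).fetchall()

# Add an order whose invoice_id has no matching row in invoice.
conn.execute("INSERT INTO orders VALUES (99, 5)")
joined_after = conn.execute(JOINED).fetchall()
plain_after = conn.execute(PLAIN).fetchall()

print(joined_before == plain_before)  # True: same rows while all invoices exist
print(joined_after == plain_after)    # False: the join drops the orphaned order
```

If orders.invoice_id is protected by a foreign key to invoice, the orphan case cannot occur and the simplified query is always safe.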

Related

How to count rows grouped by an attribute, where that attribute has not appeared before a specific date?

Assume table Orders
CREATE TABLE Orders (
order_id int(11),
customer_id int(11),
purchase_date datetime
)
We need the following report: For customers who did not place an order prior to this month, how many orders were placed per customer?
Here is the very slow sql that I am currently using:
SELECT count(order_id) num_of_orders, customer_id
FROM (SELECT order_id, customer_id
FROM orders
WHERE customer_id NOT IN (SELECT DISTINCT customer_id
FROM orders
WHERE purchase_date < '2019-03-01')) a
GROUP BY customer_id;
Is there a faster/more efficient way to write this query?
I would rewrite it as:
SELECT COUNT(order_id), customer_id
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM orders o1 WHERE o1.customer_id = o.customer_id AND o1.purchase_date < '2019-03-01')
GROUP BY customer_id;
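This can be considerably faster if orders has an index on customer_id (ideally a composite index on (customer_id, purchase_date)), because the subquery can then stop at the first matching row.
Note also that NOT IN and NOT EXISTS differ when customer_id can be NULL: if the NOT IN subquery returns even one NULL, the NOT IN condition is never true and the query returns no rows.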
I'm not sure this will be faster, but it is simpler:
SELECT customer_id, count(*) as num_of_orders
FROM orders o
GROUP BY customer_id
HAVING MIN(purchase_date) >= '2019-03-01';
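The NULL behaviour is easy to get wrong, so here is a small sketch comparing the three forms. It runs on Python's sqlite3 module instead of MySQL, and the sample rows (including one order with a NULL customer_id dated before March) are assumptions chosen to trigger the difference. The three queries agree only when customer_id is never NULL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; dates are ISO strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, purchase_date TEXT);
    INSERT INTO orders VALUES
        (1, 10,   '2019-02-10 09:00:00'),  -- customer 10 ordered before March
        (2, 10,   '2019-03-05 12:00:00'),
        (3, 20,   '2019-03-02 08:30:00'),  -- customer 20 is new in March
        (4, 20,   '2019-03-09 17:45:00'),
        (5, NULL, '2019-01-15 10:00:00');  -- old order without a customer_id
""")

NOT_IN = """
    SELECT customer_id, COUNT(order_id)
    FROM orders
    WHERE customer_id NOT IN (SELECT DISTINCT customer_id
                              FROM orders
                              WHERE purchase_date < '2019-03-01')
    GROUP BY customer_id
    ORDER BY customer_id
"""
NOT_EXISTS = """
    SELECT o.customer_id, COUNT(o.order_id)
    FROM orders o
    WHERE NOT EXISTS (SELECT 1
                      FROM orders o1
                      WHERE o1.customer_id = o.customer_id
                        AND o1.purchase_date < '2019-03-01')
    GROUP BY o.customer_id
    ORDER BY o.customer_id
"""
HAVING_MIN = """
    SELECT customer_id, COUNT(*)
    FROM orders
    GROUP BY customer_id
    HAVING MIN(purchase_date) >= '2019-03-01'
    ORDER BY customer_id
"""

not_in = conn.execute(NOT_IN).fetchall()
not_exists = conn.execute(NOT_EXISTS).fetchall()
having_min = conn.execute(HAVING_MIN).fetchall()

print(not_in)      # [] : the NULL in the subquery makes NOT IN never true
print(not_exists)  # [(None, 1), (20, 2)] : the NULL-customer row slips through
print(having_min)  # [(20, 2)]
```

With this data only the HAVING version returns exactly the new customer. Adding a customer_id IS NOT NULL filter to the other two should bring all three into line.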

How to match up dates on 2 different tables and join to an ID table?

I have a table full of product ids and their attributes. I want to join sales data and receipt data from 2 different tables, with a separate row for each id and date combination. So I want the output to look like this:
I tried joining the product id table and sales table to the receipt table but I'm not sure how to get the dates from the sales and receipts table to match up. Not sure of how to approach this. Thanks!
Calculate the counts for each table and combine them using UNION ALL:
select
product_id
,sales_date
-- combine the counts from both tables
,sum(sales_count)
,sum(receipt_count)
from
(
-- get the counts for the sales table
select
product_id
,sales_date
,count(*) as sales_count
-- UNION needs the same number of columns in both Select -> adding a dummy column
-- 1st Select determines the datatype
,cast(0 as int) as receipt_count
from sales
group by product_id, sales_date
UNION ALL
-- get the counts for the receipts table
select
product_id
,receipt_date
,0
,count(*)
from receipts
group by product_id, receipt_date
) as dt
group by product_id, sales_date;
select p.product_id, s.sales_date, s.sales_count, r.receipt_count
from
products p,
(select count(*) sales_count, sales_date, product_id from sales group by 2,3) s,
(select count(*) receipt_count, receipt_date, product_id from receipts group by 2,3) r
where
p.product_id = s.product_id
and p.product_id = r.product_id
and s.sales_date=r.receipt_date
;
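To see what the UNION ALL approach produces, here is a minimal sketch on Python's sqlite3 module (not the asker's database, which is not stated). The table layouts and sample rows are assumptions. Each branch counts one table and pads the other count with 0, and the outer query sums the two per product and date.

```python
import sqlite3

# Assumption: minimal sales/receipts tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales    (product_id INTEGER, sales_date   TEXT);
    CREATE TABLE receipts (product_id INTEGER, receipt_date TEXT);
    INSERT INTO sales VALUES
        (1, '2019-01-01'), (1, '2019-01-01'), (1, '2019-01-02'), (2, '2019-01-01');
    INSERT INTO receipts VALUES
        (1, '2019-01-01'), (2, '2019-01-03');
""")

COMBINED = """
    SELECT product_id,
           sales_date,
           SUM(sales_count)   AS sales_count,
           SUM(receipt_count) AS receipt_count
    FROM (
        -- counts from the sales table, with a dummy receipt count
        SELECT product_id, sales_date, COUNT(*) AS sales_count, 0 AS receipt_count
        FROM sales
        GROUP BY product_id, sales_date
        UNION ALL
        -- counts from the receipts table, with a dummy sales count
        SELECT product_id, receipt_date, 0, COUNT(*)
        FROM receipts
        GROUP BY product_id, receipt_date
    ) AS dt
    GROUP BY product_id, sales_date
    ORDER BY product_id, sales_date
"""

rows = conn.execute(COMBINED).fetchall()
for row in rows:
    print(row)
```

Unlike an inner join on the date, this keeps product/date combinations that appear in only one of the two tables, with 0 for the missing side.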

Mysql query to get lowest value from multiple conditions using UNION operator

I have price table with three columns:
id, price, product_id.
product_id can contain multiple prices.
I need to get the lowest price for each product_id in MySQL.
I am combining multiple conditions with the UNION operator. Data comes back, but the results are wrong.
In this table product_id 101 has a price of 4999, but I am getting 5000, even though I have set ORDER BY price ASC.
This is very basic SQL.
select product_id, min(price) as price
from price
group by product_id;
To fetch the minimum price per product within given price ranges, add a CASE expression for the ranges to the GROUP BY clause:
select product_id, min(price) as price
from price
group by product_id, case when price between 100 and 5000 then 1 else 2 end;
Assuming you need the ID of each record with the min price.
Select p.* from price p
INNER JOIN (Select product_ID, Min(Price) as price from `price` group by Product_ID) sub
on sub.product_Id = p.product_Id
and sub.price = p.price
Otherwise...
Select product_ID, Min(Price) as price from `price` group by Product_ID
---- UPDATE----
Still not sure I understand the question, which is why I asked for the expected output given your data. I may be able to infer the requirements I'm not getting. As it stands, I had NO idea why a union was needed, nor how this "grouping" came into play.
SELECT p.* from price p
INNER JOIN (Select product_ID, Min(Price) as price from `price` group by Product_ID) sub
on sub.product_Id = p.product_Id
and sub.price = p.price
Assuming no overlap of product_ID in ranges...
SELECT pr.id, MIN(pr.price), pr.product_id
FROM price pr
WHERE pr.price >= 100 AND pr.price <= 5000
GROUP BY pr.product_id
UNION
SELECT pr.id, MIN(pr.price), pr.product_id
FROM price pr
WHERE pr.price >= 5001 AND pr.price <= 10000
GROUP BY pr.product_id
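For reference, here is a small sketch of the join-back approach from the answer above (group to find the minimum, then join to recover the full row). It runs on Python's sqlite3 module rather than MySQL, and the sample prices are assumptions loosely based on the 4999/5000 example in the question.

```python
import sqlite3

# Assumption: a price table shaped like the one in the question, invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE price (id INTEGER PRIMARY KEY, price INTEGER, product_id INTEGER);
    INSERT INTO price VALUES
        (1, 5000, 101),
        (2, 4999, 101),
        (3, 7000, 102),
        (4, 6500, 102);
""")

CHEAPEST_ROWS = """
    SELECT p.id, p.price, p.product_id
    FROM price p
    INNER JOIN (SELECT product_id, MIN(price) AS price
                FROM price
                GROUP BY product_id) sub
            ON sub.product_id = p.product_id
           AND sub.price = p.price
    ORDER BY p.product_id
"""

cheapest = conn.execute(CHEAPEST_ROWS).fetchall()
print(cheapest)  # [(2, 4999, 101), (4, 6500, 102)]
```

Because the id comes from the joined row instead of a non-aggregated column in a GROUP BY query, it is guaranteed to belong to the cheapest row. If two rows of a product tie for the minimum price, both are returned.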

MySQL subquery with group by in left join - optimisation

MySQL seems to be unable to optimise a SELECT with a GROUP BY subquery and ends up with long execution times. There must be a known optimisation for such a common scenario.
Let's assume that we're trying to return all orders from the database, with a flag indicating if it is the first order for the customer.
CREATE TABLE orders (`order` int, customer int, date date);
Retrieving the first orders by customer is superfast.
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;
However, it becomes very slow once we join this with the full order set using a subquery
SELECT `order`, first_order FROM orders LEFT JOIN (
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders ON orders.`order`=first_orders.first_order;
I hope there is a simple trick that we're missing, because otherwise it would be about 1000x faster to do
CREATE TEMPORARY TABLE tmp_first_order AS
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer;
CREATE INDEX tmp_boost ON tmp_first_order (first_order);
SELECT `order`, first_order FROM orders LEFT JOIN tmp_first_order
ON orders.`order`=tmp_first_order.first_order;
EDIT:
Inspired by option 3 proposed by @ruakh, there is indeed a less ugly workaround using INNER JOIN and UNION, which has acceptable performance yet does not require temporary tables. However, it is a bit specific to our case, and I am wondering if a more generic optimisation exists.
SELECT `order`, "YES" as first FROM orders INNER JOIN (
SELECT min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders_1 ON orders.`order`=first_orders_1.first_order
UNION
SELECT `order`, "NO" as first FROM orders INNER JOIN (
SELECT customer, min(`order`) as first_order FROM orders GROUP BY customer
) AS first_orders_2 ON first_orders_2.customer = orders.customer
AND orders.`order` > first_orders_2.first_order;
Here are a few things you might try:
Removing customer from the subquery's field-list, since it's not doing anything anyway:
SELECT `order`,
first_order
FROM orders
LEFT
JOIN ( SELECT MIN(`order`) AS first_order
FROM orders
GROUP
BY customer
) AS first_orders
ON orders.`order` = first_orders.first_order
;
Conversely, adding customer to the ON clause, so it actually does something for you:
SELECT `order`,
first_order
FROM orders
LEFT
JOIN ( SELECT customer,
MIN(`order`) AS first_order
FROM orders
GROUP
BY customer
) AS first_orders
ON orders.customer = first_orders.customer
AND orders.`order` = first_orders.first_order
;
Same as previous, but using an INNER JOIN instead of a LEFT JOIN, and converting your original ON clause into a CASE expression:
SELECT `order`,
CASE WHEN first_order = `order` THEN first_order END AS first_order
FROM orders
INNER
JOIN ( SELECT customer,
MIN(`order`) AS first_order
FROM orders
GROUP
BY customer
) AS first_orders
ON orders.customer = first_orders.customer
;
Replacing the whole JOIN approach with an uncorrelated IN-subquery in a CASE expression:
SELECT `order`,
CASE WHEN `order` IN
( SELECT MIN(`order`)
FROM orders
GROUP
BY customer
)
THEN `order`
END AS first_order
FROM orders
;
Replacing the whole JOIN approach with a correlated EXISTS-subquery in a CASE expression:
SELECT `order`,
CASE WHEN NOT EXISTS
( SELECT 1
FROM orders AS o2
WHERE o2.customer = o1.customer
AND o2.`order` < o1.`order`
)
THEN `order`
END AS first_order
FROM orders AS o1
;
(It's very likely that some of the above will actually perform worse, but I think they're all worth trying.)
I would expect this to be faster when using a variable instead of the LEFT JOIN:
SELECT
`order`,
If(@previous_customer<>(@previous_customer:=`customer`),
`order`,
NULL
) AS first_order
FROM orders
JOIN ( SELECT @previous_customer := -1 ) x
ORDER BY customer, `order`;
That's what my example on SQL Fiddle returns:
CUSTOMER ORDER FIRST_ORDER
1 1 1
1 2 (null)
1 3 (null)
2 4 4
2 5 (null)
3 6 6
4 7 7
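As a sanity check that the rewrites return the same flags, the sketch below runs two of the suggested forms (the LEFT JOIN with customer in the ON clause, and the correlated NOT EXISTS) on the same seven rows as the fiddle output above. It uses Python's sqlite3 module, so it says nothing about MySQL timings; it only confirms that the results match. SQLite accepts MySQL-style backticks around the reserved word order.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; rows mirror the fiddle output.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (`order` INTEGER, customer INTEGER);
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3), (7, 4);
""")

LEFT_JOIN = """
    SELECT o.`order`, f.first_order
    FROM orders AS o
    LEFT JOIN (SELECT customer, MIN(`order`) AS first_order
               FROM orders
               GROUP BY customer) AS f
           ON o.customer = f.customer
          AND o.`order` = f.first_order
    ORDER BY o.`order`
"""
NOT_EXISTS = """
    SELECT o1.`order`,
           CASE WHEN NOT EXISTS (SELECT 1
                                 FROM orders AS o2
                                 WHERE o2.customer = o1.customer
                                   AND o2.`order` < o1.`order`)
                THEN o1.`order`
           END AS first_order
    FROM orders AS o1
    ORDER BY o1.`order`
"""

via_join = conn.execute(LEFT_JOIN).fetchall()
via_exists = conn.execute(NOT_EXISTS).fetchall()

print(via_join == via_exists)  # True
print(via_join)
```

Which form is fastest on a real MySQL server depends on the version and the available indexes, so EXPLAIN on the actual data is still the way to decide.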

Joining two mysql resultset using left outer join

Below is the actual invoice table
After grouping it based on invoiceID, the resultset is
And actual payment table is
and its payment resultset after grouping based on invoiceID is
Now I want to join these two result sets (from the Payment and Invoice tables) and find the balance amount by subtracting Total from Amount based on InvoiceID; for non-matching records the balance column should be zero. I tried this, but didn't get the expected result.
Try something like this,
SELECT a.InvoiceID,
a.totalSum InvoiceAmount,
b.totalSum PaymentAmount,
a.totalSum - COALESCE(b.totalSum, 0) TotalBalance
FROM
(
SELECT InvoiceID, SUM(Total) totalSum
FROM InvoiceTB
GROUP BY InvoiceID
) a LEFT JOIN
(
SELECT InvoiceID, SUM(Total) totalSum
FROM paymentTB
GROUP BY InvoiceID
) b
ON a.InvoiceID = b.InvoiceID
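Here is a minimal sketch of what that query returns, run on Python's sqlite3 module instead of MySQL. The column layout (InvoiceID, Total in both tables) follows the query above, and the sample amounts are assumptions, since the original tables were posted as images.

```python
import sqlite3

# Assumption: both tables have (InvoiceID, Total); the amounts are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InvoiceTB (InvoiceID INTEGER, Total INTEGER);
    CREATE TABLE paymentTB (InvoiceID INTEGER, Total INTEGER);
    INSERT INTO InvoiceTB VALUES (1, 100), (1, 50), (2, 200);
    INSERT INTO paymentTB VALUES (1, 120);  -- invoice 2 has no payment yet
""")

BALANCE = """
    SELECT a.InvoiceID,
           a.totalSum InvoiceAmount,
           b.totalSum PaymentAmount,
           a.totalSum - COALESCE(b.totalSum, 0) TotalBalance
    FROM (SELECT InvoiceID, SUM(Total) totalSum
          FROM InvoiceTB
          GROUP BY InvoiceID) a
    LEFT JOIN (SELECT InvoiceID, SUM(Total) totalSum
               FROM paymentTB
               GROUP BY InvoiceID) b
           ON a.InvoiceID = b.InvoiceID
    ORDER BY a.InvoiceID
"""

balances = conn.execute(BALANCE).fetchall()
for row in balances:
    print(row)
# (1, 150, 120, 30)
# (2, 200, None, 200)
```

Note that an invoice without any payment keeps its full amount as the balance here, and PaymentAmount stays NULL. If the balance really should be zero for unmatched invoices, as the question words it, the last expression would need a CASE on b.InvoiceID IS NULL instead.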