I have 4 tables, Customers, Products, Sales & Sale_Items. I pull data from it using the query below.
SELECT (
SELECT c.name
FROM Customers c
WHERE s.customer_id=c.id
) customer
,(
Select group_concat(description)
FROM (
SELECT si.id
,si.sale_id
,concat("x", si.Qty, " ", p.name, " ",(si.total)) description
FROM Sale_Items si
LEFT JOIN Products p ON p.id = si.product_id
) p
where s.id = Sale_ID
GROUP BY Sale_ID
) detail,
s.total
FROM Sales s
The query produces the result, but it becomes slow with just 2000 records (takes 114 seconds to finish)
Customer Product Total
--------------------------------------
James x1 ItemA 10.00 75.00
x3 ItemB 15.00
x1 ItemC 20.00
Mark x2 ItemA 10.00 50.00
x2 ItemB 15.00
Bisi x1 ItemC 20.00 30.00
x2 ItemA 10.00
how can i make this faster?
An attempt has been made here
https://www.db-fiddle.com/f/pkL2HtsT659EXgRSevFSAm/4
If we want to stick with correlated subqueries, we can eliminate the inline view p.
That's going to get materialized for every row retrieved from Sales. The predicate in the WHERE clause in the outer query doesn't get "pushed" down into the view. So the materialized view (or "derived table" in the MySQL parlance) is going to be a full set, and from that, we're going to pick out just a few rows. And we're going to repeat that for each row from Sales.
Unwinding that derived table should give us some performance benefit. This would be reasonable approach for a small number of rows returned from Sales, with suitable indexes defined. That is, if we were limiting the number of rows examined by the outer query with a WHERE clause. With a large number of rows, those correlated subqueries are going to drag down performance.
SELECT ( SELECT c.name
FROM Customers c
WHERE c.id = s.customer_id
) AS customer
, ( SELECT GROUP_CONCAT(CONCAT('x',si.Qty,' ',p.name,' ',si.total) ORDER BY p.name SEPARATOR '\r\n')
FROM Sale_Items si
LEFT
JOIN Products p
ON p.id = si.product_id
WHERE si.sale_id = s.id
) AS detail
, s.total
FROM Sales s
WHERE ...
ORDER
BY ...
If the query is returning all rows from Sales and we are doing the whole bloomin' set, then I'd tend to avoid the correlated subqueries. (That's because the subqueries gets executed for each and every row returned by the outer query. Those subqueries are going to eat our lunch, in terms of performance, with a large number of rows returned.)
Assuming id is unique in customers, we're usually much better off with a join operation.
SELECT c.name AS customer
, d.detail
, s.total
FROM Sales s
LEFT
JOIN Customers c
ON c.id = s.customer_id
LEFT
JOIN ( SELECT si.sale_id
, GROUP_CONCAT(CONCAT('x',si.Qty,' ',p.name,' ',si.total) ORDER BY p.name SEPARATOR '\r\n') AS detail
FROM Sale_Items si
LEFT
JOIN Products p
ON p.id = si.product_id
GROUP
BY si.sale_id
) d
ON d.sale_id = s.id
ORDER
BY ...
The inline view d is going to be expensive with large sets; but at least we're only doing that query one time, materializing the results into a "derived table". Then the outer query can run, and retrieve rows from the derived table.
Related
I have 5 SQL tables
store
staff
departments
sold_items
staff_rating
I created a view that JOINs this four of the tables together. The last table (staff_rating),I want to get the rating column at a time close to when items was sold (sold_items.date) for the view rows.
I have tried the following SQL Queries which works but have performance issues.
SQL QUERY 1
SELECT s.name,
s.country,
d.name,
si.item,
si.date,
(SELECT rating
FROM staff_ratings
WHERE staff_id = s.id
ORDER BY DATEDIFF(date, si.date) LIMIT 1) AS rating,
st.name,
st.owner
FROM store st
LEFT OUTER JOIN staff s ON s.store_id = st.id
LFET JOIN departments d ON d.store_id = st.id
LEFT JOIN sold_items si ON si.store_id = st.id
SQL QUERY 2
SELECT s.name,
s.country,
d.name,
si.item,
si.date,
si.rating ,
st.name,
st.owner
FROM store st
LEFT OUTER JOIN staff s ON s.store_id = st.id
LFET JOIN departments d ON d.store_id = st.id
LEFT JOIN (SELECT *,
(SELECT rating
FROM staff_ratings
WHERE staff_id = si.staff_id
ORDER BY DATEDIFF(date, si.date) LIMIT 1) AS rating
FROM sold_items) si ON si.store_id = st.id
SQL Query 2 is faster than SQL Query 1. But Both still have performance issue. Appreciate help for a query with better performance. Thanks in advance.
Your query doesn't look right to me (as mentioned in a comment on the original post; lacking staff_id in the join on the sales, etc)
Ignoring that, one of your biggest performance hits is likely to be this...
ORDER BY DATEDIFF(date, si.date) LIMIT 1
That order by can only be answered by comparing EVERY record for that staff member to the current sales record.
What you ideally want to be able to do is find the appropriate staff rating from an index, and not to have to run computations that involve dates from both the ratings table and the sales table.
If, for example, you wanted "the most recent rating BEFORE the sale", the query can be substantially improved...
SELECT
s.name,
s.country,
d.name,
si.item,
si.date,
(
SELECT sr.rating
FROM staff_ratings sr
WHERE sr.staff_id = s.id
AND sr.date <= si.date
ORDER BY sr.date DESC
LIMIT 1
)
AS rating,
st.name,
st.owner
FROM store st
LEFT JOIN staff s ON s.store_id = st.id
LFET JOIN departments d ON d.store_id = st.id
LEFT JOIN sold_items si ON si.store_id = st.id
Then, with an index for staff_ratings(staff_id, date, rating) the optimiser can very quickly look up which rating to use, without having to scan Every Single Rating for that staff member.
Why DATEDIFF? Would something like this work better? If so, the given index will make it work much faster.
WHERE staff_id = s.id
AND s.date >= s1.date
ORDER BY s.date
LIMIT 1
And INDEX(staff_id, date)
Do you need LEFT JOIN? Perhaps plain JOIN?
d may benefit from INDEX(store_id, name)
I have the following three tables:
I have Following Query to Join Above 3 Tables
customer.customer_id,
customer.name,
SUM(sales.total),
sales.created_at,
SUM(sales_payments.amount)
FROM
sales INNER JOIN customer ON customer.customer_id = sales.customer_id
INNER JOIN sales_payments ON sales.customer_id = sales_payments.customer_id
WHERE sales.created_at ='2020-04-03'
GROUP By customer.name
Result for Above Query is given below
Sum of sales.total is double of the actual sum of sales.total column which has 2-row count, I need to have the actual SUM of that column, without doubling the SUM of those rows, Thank you, for your help in advance..
PROBLEM
The problem here is that there are consecutive inner joins and the number of rows getting fetched in the second inner join is not restricted. So, as we have not added a condition on sales_payment_id in the join between the sales and sales_payment tables, one row in sales table(for customer_id 2, in this case) would be mapped to 2 rows in the payment table. This causes the same values to be reconsidered.
In other words, the mapping for customer_id 2 between the 3 tables is 1:1:2 rather than 1:1:1.
SOLUTION
Solution 1 : As mentioned by Gordon, you could first aggregate the amount values of the sales_payments table and then aggregate the values in sales table.
Solution 2 : Alternatively (IMHO a better approach), you could add a foreign key between sales and sales_payment tables. For example, the sales_payment_id column of sales_payment table can be introduced in the sales table as well. This would facilitate the join between these tables and reduce additional overheads while querying data.
The query would then look like:
`SELECT c.customer_id,
c.name,
SUM(s.total),
s.created_at,
SUM(sp.amount)
FROM customer c
INNER JOIN sales s
ON c.customer_id = s.customer_id
INNER JOIN sales_payments sp
ON c.customer_id = sp.customer_id
AND s.sales_payments_id = sp.sales_payments_id
WHERE s.created_at ='2020-04-03'
GROUP BY c.customer_id,
c.name,
s.created_at ;`
Hope that helps!
You have multiple rows for sales_payments and sales per customer. You need to pre-aggregate to get the right value:
SELECT c.customer_id, c.name, s.created_at, s.total, sp.amount
FROM customer c JOIN
(SELECT s.customer_id, s.created_at, SUM(s.total) as total
FROM sales s
WHERE s.created_at ='2020-04-03'
GROUP BY s.customer_id, s.created_at
) s
ON c.customer_id = s.customer_id JOIN
(SELECT sp.customer_id, SUM(sp.amount) as amount
FROM sales_payments sp
GROUP BY sp.customer_id
) sp
ON s.customer_id = sp.customer_id
I have a query to show customers and the total dollar value of all their orders. The query takes about 100 seconds to execute.
I'm querying on an ExpressionEngine CMS database. ExpressionEngine uses one table exp_channel_data, for all content. Therefore, I have to join on that table for both customer and order data. I have about 14,000 customers, 30,000 orders and 160,000 total records in that table.
Can I change this query to speed it up?
SELECT link.author_id AS customer_id,
customers.field_id_122 AS company,
Sum(orders.field_id_22) AS total_orders
FROM exp_channel_data customers
JOIN exp_channel_titles link
ON link.author_id = customers.field_id_117
AND customers.channel_id = 7
JOIN exp_channel_data orders
ON orders.entry_id = link.entry_id
AND orders.channel_id = 3
GROUP BY customer_id
Thanks, and please let me know if I should include other information.
UPDATE SOLUTION
My apologies. I noticed that entry_id for the exp_channel_data table customers corresponds to author_id for the exp_channel_titles table. So I don't have to use field_id_117 in the join. field_id_117 duplicates entry_id, but in a TEXT field. JOINING on that text field slowed things down. The query is now 3 seconds
However, the inner join solution posted by #DRapp is 1.5 seconds. Here is his sql with a minor edit:
SELECT
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
FROM
( SELECT
t.author_id
SUM( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
JOIN
exp_channel_titles t ON t.author_id = o.entry_id AND o.channel_id = 3
GROUP BY
t.author_id ) PQ
JOIN
exp_channel_data c ON PQ.author_id = c.entry_id AND c.channel_id = 7
ORDER BY CustomerID
If this is the same table, then the same columns across the board for all alias instances.
I would ensure an index on (channel_id, entry_id, field_id_117 ) if possible. Another index on (author_id) for the prequery of order totals
Then, start first with what will become an inner query doing nothing but a per customer sum of order amounts.. Since the join is the "author_id" as the customer ID, just query/sum that first. Not completely understanding the (what I would consider) poor design of the structure, knowing what the "Channel_ID" really indicates, you don't want to duplicate summation values because of these other things in the mix.
select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data customers o
where
o.channel_id = 3
group by
o.author_id
If that is correct on the per customer (via author_id column), then that can be wrapped as follows
select
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
from
( select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data customers o
where
o.channel_id = 3
group by
o.author_id ) PQ
JOIN exp_channel_data c
on PQ.author_id = c.field_id_117
AND c.channel_id = 7
Can you post the results of an EXPLAIN query?
I'm guessing that your tables are not indexed well for this operation. All of the columns that you join on should probably be indexed. As a first guess I'd look at indexing exp_channel_data.field_id_117
Try something like this. Possibly you have error in joins. also check whether joins on columns are correct in your databases. Cross join may takes time to fetch large data, by mistake if your joins are not proper on columns.
select
link.author_id as customer_id,
customers.field_id_122 as company,
sum(orders.field_id_22) as total_or_orders
from exp_channel_data customers
join exp_channel_titles link on (link.author_id = customers.field_id_117 and
link.author_id = customer.channel_id = 7)
join exp_channel_data orders on (orders.entry_id = link.entry_id and orders.entry_id = orders.channel_id = 3)
group by customer_id
This is my query:
SELECT `products`.*, SUM(orders.total_count) AS revenue,
SUM(orders.quantity) AS qty, ROUND(AVG(product_reviews.stars)) as avg_stars
FROM `products`
LEFT JOIN `orders`
ON (`products`.`id` = `orders`.`product_id`) AND
(`orders`.`status` = 'delivered' OR `orders`.`status` = 'new')
LEFT JOIN product_reviews
ON (products.id = product_reviews.product_id)
GROUP BY `products`.`ID`
ORDER BY products.ID DESC
LIMIT 10
OFFSET 0
When I have this second left join, my first left joins data, revenue and qty from orders table gives me values that are not true at all (way too high, many doubles?)
From this question.
I got the direction that I am getting a semi-cartesian product, so two reviews for a product is doubling the quantities, and I believe this is my problem.
How can this be solved?
The problem is that the product_reviews and orders table can have more that one row per product id. One way you can fix this is to use a subquery:
SELECT `products`.*,
o.revenue,
o.qty,
ROUND(avg_stars) as avg_stars
FROM `products`
LEFT JOIN
(
select `product_id`,
sum(total_count) revenue,
sum(quantity) qty
from `orders`
where `status` in ('delivered', 'new')
group by `product_id`
) o
ON `products`.`id` = o.`product_id`
LEFT JOIN
(
select product_id, avg(stars) avg_stars
from product_reviews
group by product_id
) pr
ON (products.id = pr.product_id)
ORDER BY products.ID DESC
LIMIT 10
OFFSET 0
Its not easy to solve this without seeing your table schemas,
I would suggest you look at your Aggregations and Group By statements first, then look at your column default values, how are you handling empty values, also look at DISTINCT in the Aggregation functions.
If all else fails and a "optimized" solution is not vital and your data volumes are low do a Sub Select only on the tables for which you require the values, within the Sub Select on 1 table you have a much narrower row scope and it will yield the correct result.
I would suggest that you supply your table schemas here.
One approach to avoid that problem is to use correlated subquery in the SELECT list, rather than a left join.
SELECT p.*
, SUM(o.total_count) AS revenue
, SUM(o.quantity) AS qty
, ( SELECT ROUND(AVG(r.stars))
FROM `product_reviews` r
WHERE r.product_id = p.id
) AS avg_stars
FROM `products` p
LEFT
JOIN `orders` o
ON o.product_id = p.id
AND o.status IN ('delivered','new')
GROUP BY p.id
ORDER BY p.id DESC
LIMIT 10
OFFSET 0
This isn't the only approach, and it's not necessarily the best approach, especially with large sets But given that the subquery will run a maximum of 10 times (given the LIMIT clause), performance should be reasonable (given an appropriate index on product_reviews(product_id,stars).
If you were returning all product ids, or a significant percentage of them, then using an inline view might give better performance (avoiding the nested loops execution of the correlated subquery in the select list)
SELECT p.*
, SUM(o.total_count) AS revenue
, SUM(o.quantity) AS qty
, s.avg_stars
FROM `products` p
LEFT
JOIN `orders` o
ON o.product_id = p.id
AND o.status IN ('delivered','new')
LEFT
JOIN ( SELECT ROUND(AVG(r.stars)) AS avg_stars
, r.product_id
FROM `product_reviews` r
GROUP BY r.product_id
) s
ON s.product_id = p.id
GROUP BY p.id
ORDER BY p.id DESC
LIMIT 10
OFFSET 0
Just to be clear: the issue with the original query is that every order for a product is getting matched to every review for the product.
I apologize if my use of the term "semi-cartesian" was misleading or confusing.
The idea that I meant to convey by that was that you had two distinct sets (the set of orders for a product, and the set of reviews for a product), and that your query was generating a "cross product" of those two distinct sets, basically "matching" every order to every review (for a particular product).
For example, given three rows in reviews for product_id 101, and two rows in orders for product_id 101, e.g.:
REVIEWS
pid stars text
--- ----- --------------
101 4.5 woo hoo perfect
101 3 ehh
101 1 totally sucked
ORDERS
pid date qty
--- ----- ---
101 1/13 100
101 1/22 7
Your original query is essentially forming a result set with six rows in it, each row from order being matched to all three rows from reviews:
id date qty stars text
--- ---- ---- ---- ------------
101 1/13 100 4.5 woo hoo perfect
101 1/13 100 3 ehh
101 1/13 100 1 totally sucked
101 1/22 7 4.5 woo hoo perfect
101 1/22 7 3 ehh
101 1/22 7 1 totally sucked
Then, when the SUM aggregate on qty gets applied, the values returned are way bigger than you expect.
I have a query that I feel is very bulky and can do with optimisation. First thing would obviously be replacing not in subquery with join but it affects the sub-sub query that I have. I'd appreciate suggestions/workaround on it.
This is the query
SELECT *
FROM lastweeksales
WHERE productID = 1234
AND retailer NOT
IN (
SELECT retailer
FROM sales
WHERE productID
IN (
SELECT productID
FROM products
WHERE publisher = 123
)
AND DATE = date(now())
)
Basically, I want to get rows from lastweek's sales on a product where retailers are not present that did sales today but sales should only be on products by a certain publisher.
:S:S:S
You can group together 2 inner subqueries easily via INNER JOIN. For the outer one you should use LEFT OUTER join and then filter on retailer IS NULL, like this:
SELECT lws.*
FROM lastweeksales lws
LEFT JOIN (SELECT s.retailer
FROM sales s
JOIN products p USING (productID)
WHERE p.publisher = 123
AND s.date = date(now())) AS r
ON lws.retailer = r.retailer
WHERE r.retailer IS NULL;