MYSQL View Query Performance Issue - mysql

I have 5 SQL tables
store
staff
departments
sold_items
staff_rating
I created a view that JOINs this four of the tables together. The last table (staff_rating),I want to get the rating column at a time close to when items was sold (sold_items.date) for the view rows.
I have tried the following SQL Queries which works but have performance issues.
SQL QUERY 1
SELECT s.name,
s.country,
d.name,
si.item,
si.date,
(SELECT rating
FROM staff_ratings
WHERE staff_id = s.id
ORDER BY DATEDIFF(date, si.date) LIMIT 1) AS rating,
st.name,
st.owner
FROM store st
LEFT OUTER JOIN staff s ON s.store_id = st.id
LFET JOIN departments d ON d.store_id = st.id
LEFT JOIN sold_items si ON si.store_id = st.id
SQL QUERY 2
SELECT s.name,
s.country,
d.name,
si.item,
si.date,
si.rating ,
st.name,
st.owner
FROM store st
LEFT OUTER JOIN staff s ON s.store_id = st.id
LFET JOIN departments d ON d.store_id = st.id
LEFT JOIN (SELECT *,
(SELECT rating
FROM staff_ratings
WHERE staff_id = si.staff_id
ORDER BY DATEDIFF(date, si.date) LIMIT 1) AS rating
FROM sold_items) si ON si.store_id = st.id
SQL Query 2 is faster than SQL Query 1. But Both still have performance issue. Appreciate help for a query with better performance. Thanks in advance.

Your query doesn't look right to me (as mentioned in a comment on the original post; lacking staff_id in the join on the sales, etc)
Ignoring that, one of your biggest performance hits is likely to be this...
ORDER BY DATEDIFF(date, si.date) LIMIT 1
That order by can only be answered by comparing EVERY record for that staff member to the current sales record.
What you ideally want to be able to do is find the appropriate staff rating from an index, and not to have to run computations that involve dates from both the ratings table and the sales table.
If, for example, you wanted "the most recent rating BEFORE the sale", the query can be substantially improved...
SELECT
s.name,
s.country,
d.name,
si.item,
si.date,
(
SELECT sr.rating
FROM staff_ratings sr
WHERE sr.staff_id = s.id
AND sr.date <= si.date
ORDER BY sr.date DESC
LIMIT 1
)
AS rating,
st.name,
st.owner
FROM store st
LEFT JOIN staff s ON s.store_id = st.id
LFET JOIN departments d ON d.store_id = st.id
LEFT JOIN sold_items si ON si.store_id = st.id
Then, with an index for staff_ratings(staff_id, date, rating) the optimiser can very quickly look up which rating to use, without having to scan Every Single Rating for that staff member.

Why DATEDIFF? Would something like this work better? If so, the given index will make it work much faster.
WHERE staff_id = s.id
AND s.date >= s1.date
ORDER BY s.date
LIMIT 1
And INDEX(staff_id, date)
Do you need LEFT JOIN? Perhaps plain JOIN?
d may benefit from INDEX(store_id, name)

Related

Ordering rows in a parent table by SUM of column in a child table

I have 3 tables in my database
companies{
id,
name,
address
}
stores{
id,
name,
address,
company_id
}
invoices{
id,
total,
date_time,
store_id
}
As you can see, each store is connected to a company via foreign key, also each invoice is connected to a store.
My question is, how can i write a SQL query which will return all stores by a company and order them by their turnover?
If i use the query:
SELECT s.*,
sum(i.total) as turnover FROM store s
JOIN invoices i
ON i.store_id = s.id
WHERE YEAR(i.date_time) = 2019;
I can see the turnover for one store for a year 2019 for example, but i'm struggling to find a way to get a list of store ordered by their turnover for a certain period.
You're going to need to join all 3 tables:
SELECT *
FROM
companies c
INNER JOIN stores s on s.company_id = c.id
INNER JOIN invoices i ON i.store_id = s.id
That's your entire raw data in detailed list. Then you say you want it for a certain company only:
SELECT *
FROM
companies c
INNER JOIN stores s on s.company_id = c.id
INNER JOIN invoices i ON i.store_id = s.id
WHERE c.name = 'Acme Rubber Co'
Then you only want the stores and the invoices amounts:
SELECT s.name, i.total
FROM
companies c
INNER JOIN stores s on s.company_id = c.id
INNER JOIN invoices i ON i.store_id = s.id
WHERE c.name = 'Acme Rubber Co'
Then you want a row set where each line is a single store and the sum of all invoices for that store:
SELECT s.name, SUM(i.total)
FROM
companies c
INNER JOIN stores s on s.company_id = c.id
INNER JOIN invoices i ON i.store_id = s.id
WHERE c.name = 'Acme Rubber Co'
GROUP BY s.name
Lastly you want them in descending order, highest total first:
SELECT s.name as storename, SUM(i.total) as turnover
FROM
companies c
INNER JOIN stores s on s.company_id = c.id
INNER JOIN invoices i ON i.store_id = s.id
WHERE c.name = 'Acme Rubber Co'
GROUP BY s.name
ORDER BY turnover DESC
The order of evaluation in sql is FROM(with joins), WHERE, GROUP BY, SELECT, ORDER BY which is why I use different names in eg the order by than I do in the group by. Conceptually your db only sees the names of things as output by the immediately previous operation. Mysql isn't actually too picky but some db are - you couldn't say GROUP BY storename in sql server because the SELECT that creates the storename alias hasn't been run at the time the group by is done
Note: I wasn't really sure on what you were looking for in a WHERE - you started by saying "all stores turnover for a certain company" and finished saying you were "struggling to get turnover for a period"
If you want a period, use eg WHERE somedatecolumn BETWEEN '2000-01-01' AND '2000-12-31' (Between is inclusive) or WHERE somedatecolumn >= '2000-01-01' AND somedatecolumn < '2001-01-01' (A good pattern to use if the date includes a time too). It is almost never wise to call a function on a column you're searching with, ie do not do WHERE YEAR(somedatecolumn) = 2000 because it disables indexing on the column and makes the search very slow

Subquery not outputting the correct result?

I hope someone can help me with this problem, I been trying different combinations but can't seems to get the correct result.
First of all, I have 3 tables, category, department, and master, master has the FK. I want to get sum of total time for each department and sum of total time for category group by department so that I can see which department has how much time for that category, all should be group by department and in the correct order?
I can get the result but not coming out right, here is my code...
SELECT a.department, time_spent , COUNT(DISTINCT (a.department)), sum_quarantine FROM
(SELECT d.department, SUM(time_spent) as time_spent
FROM master as m
INNER JOIN department as d ON d.dept_id = m.dept_id GROUP BY d.department) AS a,
(SELECT d.department, c.category, SUM(time_spent) as sum_quarantine
FROM master as m
INNER JOIN category as c ON c.cat_id = m.cat_id
INNER JOIN department as d ON d.dept_id = m.dept_id
WHERE category = 'Quarantine' GROUP BY d.department) as b
GROUP BY b.department
What you want can be done without sub-query, making use of a CASE WHEN in a SUM, so that it only sums what you are interested in ('Quarantine'):
SELECT d.department,
SUM(CASE c.category
WHEN 'Quarantine' THEN time_spent
END) as sum_quarantine,
SUM(time_spent) time_spent,
COUNT(DISTINCT (c.category)) category_count
FROM master as m
INNER JOIN category as c ON c.cat_id = m.cat_id
INNER JOIN department as d ON d.dept_id = m.dept_id
GROUP BY d.department
ORDER BY 1
Here is fiddle.
Note that I added a count as well, since there was one in your attempt. But yours would always return 1, so I am not sure what you were looking for. I have added the number of categories linked to the department.
Also you did not specify the order you want, but it is easy to adapt the ORDER BY clause. The above query orders results by department, but if for instance you want to order by the second column in descending order, then do:
...
ORDER BY 2 DESC
If your results includes NULL values, and you prefer to have zeroes instead, then make use of the COALESCE function. For example:
COALESCE(SUM(time_spent), 0) time_spent,

Getting many shipping addresses for one order number

I'm using 4 tables
CUSTOMER
CUSTOMER_ORDER
CUST_ORDER_LINE
CUST_ADDRESS
I used Inner joins to link the tables. CUSTOMER is linked to CUSTOMER_ORDER and CUST_ADDRESS by customer_ID, and CUSTOMER_ORDER_LINE is linked to CUSTOMER_ORDER by order_ID. Order_ID does not appear in the CUSTOMER or CUST_ADDRESS tables.
When I run the query below, I get every shipping address on record for that particular customer and order number.
For example, a distributor has 25 possible shipping addresses, but they only ship one order to one shipping address at a time. My query is bringing back one order number 25 times for every address. Any advice would be wonderful. Thank you.
SELECT DISTINCT TOP (100) PERCENT dbo.CUSTOMER_ORDER.ID,
dbo.CUSTOMER.NAME,
dbo.CUST_ORDER_LINE.PART_ID,
dbo.CUST_ORDER_LINE.ORDER_QTY,
dbo.CUSTOMER_ORDER.STATUS,
dbo.CUSTOMER_ORDER.SHIPTO_ID,
dbo.CUST_ADDRESS.NAME AS Expr1
FROM dbo.CUSTOMER
INNER JOIN dbo.CUSTOMER_ORDER
ON dbo.CUSTOMER.ID = dbo.CUSTOMER_ORDER.CUSTOMER_ID
INNER JOIN dbo.CUST_ORDER_LINE
ON dbo.CUSTOMER_ORDER.ID = dbo.CUST_ORDER_LINE.CUST_ORDER_ID
INNER JOIN dbo.CUST_ADDRESS
ON dbo.CUSTOMER.ID = dbo.CUST_ADDRESS.CUSTOMER_ID
WHERE (dbo.CUSTOMER_ORDER.ORDER_DATE > '1/1/2014')
AND (dbo.CUSTOMER_ORDER.ID NOT LIKE 'RMA%')
GROUP BY dbo.CUSTOMER_ORDER.ID,
dbo.CUSTOMER.NAME,
dbo.CUST_ORDER_LINE.PART_ID,
dbo.CUST_ORDER_LINE.ORDER_QTY,
dbo.CUSTOMER_ORDER.STATUS,
dbo.CUSTOMER_ORDER.SHIPTO_ID,
dbo.CUST_ADDRESS.NAME
ORDER BY dbo.CUSTOMER_ORDER.ID
As a shot in the dark it seems your query should be something like this.
SELECT
co.ID,
c.NAME,
col.PART_ID,
col.ORDER_QTY,
co.STATUS,
co.SHIPTO_ID,
ca.NAME AS Expr1
FROM dbo.CUSTOMER c
INNER JOIN dbo.CUSTOMER_ORDER co ON c.ID = co.CUSTOMER_ID
INNER JOIN dbo.CUST_ORDER_LINE col ON co.ID = col.CUST_ORDER_ID
INNER JOIN dbo.CUST_ADDRESS ca ON co.SHIPTO_ID = ca.CUSTOMER_ID --this is now joining to the order table.
WHERE co.ORDER_DATE > '2014-01-01'
AND co.ID NOT LIKE 'RMA%'
GROUP BY co.ID,
c.NAME,
col.PART_ID,
col.ORDER_QTY,
co.STATUS,
co.SHIPTO_ID,
ca.NAME
ORDER BY co.ID
Notice how using aliases makes this look a lot cleaner. I also changed up the string date to use the generally accepted format. This will work regardless of your DATEFORMAT setting.

Slow aggregate query with join on same table

I have a query to show customers and the total dollar value of all their orders. The query takes about 100 seconds to execute.
I'm querying on an ExpressionEngine CMS database. ExpressionEngine uses one table exp_channel_data, for all content. Therefore, I have to join on that table for both customer and order data. I have about 14,000 customers, 30,000 orders and 160,000 total records in that table.
Can I change this query to speed it up?
SELECT link.author_id AS customer_id,
customers.field_id_122 AS company,
Sum(orders.field_id_22) AS total_orders
FROM exp_channel_data customers
JOIN exp_channel_titles link
ON link.author_id = customers.field_id_117
AND customers.channel_id = 7
JOIN exp_channel_data orders
ON orders.entry_id = link.entry_id
AND orders.channel_id = 3
GROUP BY customer_id
Thanks, and please let me know if I should include other information.
UPDATE SOLUTION
My apologies. I noticed that entry_id for the exp_channel_data table customers corresponds to author_id for the exp_channel_titles table. So I don't have to use field_id_117 in the join. field_id_117 duplicates entry_id, but in a TEXT field. JOINING on that text field slowed things down. The query is now 3 seconds
However, the inner join solution posted by #DRapp is 1.5 seconds. Here is his sql with a minor edit:
SELECT
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
FROM
( SELECT
t.author_id
SUM( o.field_id_22 ) as totalOrders
FROM
exp_channel_data o
JOIN
exp_channel_titles t ON t.author_id = o.entry_id AND o.channel_id = 3
GROUP BY
t.author_id ) PQ
JOIN
exp_channel_data c ON PQ.author_id = c.entry_id AND c.channel_id = 7
ORDER BY CustomerID
If this is the same table, then the same columns across the board for all alias instances.
I would ensure an index on (channel_id, entry_id, field_id_117 ) if possible. Another index on (author_id) for the prequery of order totals
Then, start first with what will become an inner query doing nothing but a per customer sum of order amounts.. Since the join is the "author_id" as the customer ID, just query/sum that first. Not completely understanding the (what I would consider) poor design of the structure, knowing what the "Channel_ID" really indicates, you don't want to duplicate summation values because of these other things in the mix.
select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data customers o
where
o.channel_id = 3
group by
o.author_id
If that is correct on the per customer (via author_id column), then that can be wrapped as follows
select
PQ.author_id CustomerID,
c.field_id_122 CompanyName,
PQ.totalOrders
from
( select
o.author_id,
sum( o.field_id_22 ) as totalOrders
FROM
exp_channel_data customers o
where
o.channel_id = 3
group by
o.author_id ) PQ
JOIN exp_channel_data c
on PQ.author_id = c.field_id_117
AND c.channel_id = 7
Can you post the results of an EXPLAIN query?
I'm guessing that your tables are not indexed well for this operation. All of the columns that you join on should probably be indexed. As a first guess I'd look at indexing exp_channel_data.field_id_117
Try something like this. Possibly you have error in joins. also check whether joins on columns are correct in your databases. Cross join may takes time to fetch large data, by mistake if your joins are not proper on columns.
select
link.author_id as customer_id,
customers.field_id_122 as company,
sum(orders.field_id_22) as total_or_orders
from exp_channel_data customers
join exp_channel_titles link on (link.author_id = customers.field_id_117 and
link.author_id = customer.channel_id = 7)
join exp_channel_data orders on (orders.entry_id = link.entry_id and orders.entry_id = orders.channel_id = 3)
group by customer_id

MySQL "Distinct" join super slow

I have the following query which gives me the right results. But it's super slow.
What makes it slow is the
AND a.id IN (SELECT id FROM st_address GROUP BY element_id)
part. The query should show from which countries we get how many orders.
A person can have multiple addresses, but in this case, we only only want one.
Cause otherwise it will count the order multiple times. Maybe there is a better way to achieve this? A distinct join on the person or something?
SELECT cou.title_en, COUNT(co.id), SUM(co.price) AS amount
FROM customer_order co
JOIN st_person p ON (co.person_id = p.id)
JOIN st_address a ON (co.person_id = a.element_id AND a.element_type_id = 1)
JOIN st_country cou ON (a.country_id = cou.id)
WHERE order_status_id != 7 AND a.id IN (SELECT id FROM st_address GROUP BY element_id)
GROUP BY cou.id
Have you tried to replace the IN with an EXISTS?
AND EXISTS (SELECT 1 FROM st_address b WHERE a.id = b.id)
The EXISTS part should stop the subquery as soon as the first row matching the condition is found. I have read conflicting comments on if this is actually happening though so you might throw a limit 1 in there to see if you get any gain.
I found a faster solution. The trick is a join with a sub query:
JOIN (SELECT element_id, country_id, id FROM st_address WHERE element_type_id = 1 GROUP BY
This is the complete query:
SELECT cou.title_en, COUNT(o.id), SUM(o.price) AS amount
FROM customer_order o
JOIN (SELECT element_id, country_id, id FROM st_address WHERE element_type_id = 1 GROUP BY element_id) AS a ON (o.person_id = a.element_id)
JOIN st_country cou ON (a.country_id = cou.id)
WHERE o.order_status_id != 7
GROUP BY cou.id