Optimizing one SQL query - mysql

I need to run a query to get some data from the DB. The query I use works, but it takes a very long time.
SELECT SHH1.CUST_NO,
SHH1.CUST_NAME,
ADDR.BVADDREMAIL
FROM SALES_HISTORY_HEADER SHH1
INNER JOIN ADDRESS ADDR ON (SHH1.CUST_NO=ADDR.CEV_NO)
INNER JOIN CUSTOMER CUST ON (SHH1.CUST_NO=CUST.CUS_NO)
WHERE CUST.HOLD = 0
AND SHH1.CUST_NO IN (SELECT SHH2.CUST_NO
FROM SALES_HISTORY_HEADER SHH2
GROUP BY SHH2.CUST_NO
HAVING Max(SHH2.IN_DATE) < '20120101')
GROUP BY SHH1.CUST_NO,
SHH1.CUST_NAME,
ADDR.BVADDREMAIL
I am not very good at this so was wondering if any of you guys could help me? Thanks.

I don't think you need the sub-select. The following should give you the same results:
SELECT SHH1.CUST_NO,SHH1.CUST_NAME,ADDR.BVADDREMAIL
FROM SALES_HISTORY_HEADER SHH1
INNER JOIN ADDRESS ADDR ON (SHH1.CUST_NO=ADDR.CEV_NO)
INNER JOIN CUSTOMER CUST ON (SHH1.CUST_NO=CUST.CUS_NO)
WHERE CUST.HOLD = 0
GROUP BY SHH1.CUST_NO,SHH1.CUST_NAME,ADDR.BVADDREMAIL
HAVING Max(SHH1.IN_DATE) < '20120101'

Assuming CUST_NAME is on CUSTOMER, try:
SELECT CUST.CUS_NO CUST_NO,
CUST.CUST_NAME,
ADDR.BVADDREMAIL
FROM CUSTOMER CUST
JOIN ADDRESS ADDR ON CUST.CUS_NO=ADDR.CEV_NO
LEFT JOIN SALES_HISTORY_HEADER SHH
ON CUST.CUS_NO=SHH.CUST_NO AND SHH.IN_DATE >= '20120101'
WHERE CUST.HOLD = 0 AND SHH.CUST_NO IS NULL
GROUP BY CUST.CUS_NO, ADDR.BVADDREMAIL

Use the execution plan to see which part of the query has the highest cost. That will likely indicate where the problem is. The next step I would take is to review the indexes. Recall that a clustered index is created by default on the primary key, but it is also good practice to create non-clustered indexes on the foreign keys and on any other columns you filter or join on. Hope it helps.
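As a minimal sketch of that advice in MySQL: EXPLAIN shows the plan for the expensive sub-select, and a composite index gives the optimizer something to use. The table below is a stripped-down stand-in with assumed column types, and the index name idx_shh_cust_date is hypothetical. Run it in a scratch schema.

```sql
-- Hypothetical, stripped-down copy of the table from the question;
-- the column types are assumptions.
CREATE TABLE SALES_HISTORY_HEADER (
    CUST_NO   INT          NOT NULL,
    CUST_NAME VARCHAR(100) NULL,
    IN_DATE   DATE         NOT NULL
);

-- Secondary (non-clustered) index covering the grouping column and the
-- column being aggregated. The index name is made up for this example.
CREATE INDEX idx_shh_cust_date
    ON SALES_HISTORY_HEADER (CUST_NO, IN_DATE);

-- Inspect the plan for the sub-select from the question.
-- Check the "key" and "Extra" columns to see whether the index is used.
EXPLAIN
SELECT CUST_NO
FROM SALES_HISTORY_HEADER
GROUP BY CUST_NO
HAVING MAX(IN_DATE) < '20120101';
```

Whether the optimizer actually picks the index depends on the data and the MySQL version, so compare the EXPLAIN output before and after creating it.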

I agree with Mark Bannister's answer that the OP should do FROM CUSTOMER instead of FROM SALES_HISTORY_HEADER.
So, assuming CUST_NAME is on CUSTOMER, I would do the following:
SELECT CUST.CUS_NO,
CUST.CUST_NAME,
ADDR.BVADDREMAIL
FROM CUSTOMER CUST
JOIN ADDRESS ADDR ON CUST.CUS_NO = ADDR.CEV_NO
WHERE CUST.HOLD = 0
AND CUST.CUS_NO IN (SELECT SHH.CUST_NO
FROM SALES_HISTORY_HEADER SHH
GROUP BY SHH.CUST_NO
HAVING Max(SHH.IN_DATE) < '20120101')
Note that subquery materialization must be active for MySQL to optimize the sub-select.
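A short sketch of how to check that setting, assuming MySQL 5.6 or later, where subquery materialization is controlled through the optimizer_switch system variable:

```sql
-- Show the current optimizer flags; look for materialization=on
-- and semijoin=on in the returned string.
SELECT @@optimizer_switch;

-- Turn both flags on for the current session only.
-- (They are on by default in MySQL 5.6 and later.)
SET SESSION optimizer_switch = 'materialization=on,semijoin=on';
```

Running EXPLAIN on the query afterwards should show whether the sub-select is materialized rather than re-evaluated for each outer row.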

What about removing some rows before joining?
SELECT S.CUST_NO
,S.CUST_NAME
,A.BVADDREMAIL
FROM CUSTOMER C
INNER JOIN SALES_HISTORY_HEADER S ON (C.HOLD = 0 AND S.CUST_NO=C.CUS_NO)
INNER JOIN ADDRESS A ON (S.CUST_NO=A.CEV_NO)
GROUP BY S.CUST_NO,S.CUST_NAME,A.BVADDREMAIL
HAVING Max(S.IN_DATE) < '20120101'
My update on StevieG's answer

Related

Speeding up mysql query

I have a MySQL query that joins four tables. I thought joining the tables was the best approach, but now that the data is getting bigger, the query seems to cause the application to stop executing.
SELECT
`purchase_order`.`id`,
`purchase_order`.`po_date` AS po_date,
`purchase_order`.`po_number`,
`purchase_order`.`customer_id` AS customer_id ,
`customer`.`name` AS customer_name,
`purchase_order`.`status` AS po_status,
`purchase_order_items`.`product_id`,
`purchase_order_items`.`po_item_name`,
`product`.`weight` as product_weight,
`product`.`pending` as product_pending,
`product`.`company_owner` as company_owner,
`purchase_order_items`.`uom`,
`purchase_order_items`.`po_item_type`,
`purchase_order_items`.`order_sequence`,
`purchase_order_items`.`pending_balance`,
`purchase_order_items`.`quantity`,
`purchase_order_items`.`notes`,
`purchase_order_items`.`status` AS po_item_status,
`purchase_order_items`.`id` AS po_item_id
FROM `purchase_order`
INNER JOIN customer ON `customer`.`id` = `purchase_order`.`customer_id`
INNER JOIN purchase_order_items ON `purchase_order_items`.`po_id` = `purchase_order`.`id`
INNER JOIN product ON `purchase_order_items`.`product_id` = `product`.`id`
GROUP BY id ORDER BY `purchase_order`.`po_date` DESC LIMIT 0, 20
My problem is that the query takes a long time to finish. Is there a way to speed it up, or to change it so the data is retrieved faster?
Here is the EXPLAIN EXTENDED output, as requested in the comments.
Thanks in advance, I really hope this is the right channel for me to ask. If not please let me know.
Will this give you the correct list of ids?
SELECT id
FROM purchase_order
ORDER BY `po_date` DESC
LIMIT 0, 20
If so, then start with that before launching into the JOIN. You can also (I think) get rid of the GROUP BY that is causing an "explode-implode" of rows.
SELECT ...
FROM ( SELECT id ... (as above) ...) AS ids
JOIN purchase_order po ON po.id = ids.id
JOIN ... (the other tables)
GROUP BY ... -- (this may be problematic, especially with the LIMIT)
ORDER BY po.po_date DESC -- yes, this needs repeating
-- no LIMIT
Something like this
SELECT
`purchase_order`.`id`,
`purchase_order`.`po_date` AS po_date,
`purchase_order`.`po_number`,
`purchase_order`.`customer_id` AS customer_id ,
`customer`.`name` AS customer_name,
`purchase_order`.`status` AS po_status,
`purchase_order_items`.`product_id`,
`purchase_order_items`.`po_item_name`,
`product`.`weight` as product_weight,
`product`.`pending` as product_pending,
`product`.`company_owner` as company_owner,
`purchase_order_items`.`uom`,
`purchase_order_items`.`po_item_type`,
`purchase_order_items`.`order_sequence`,
`purchase_order_items`.`pending_balance`,
`purchase_order_items`.`quantity`,
`purchase_order_items`.`notes`,
`purchase_order_items`.`status` AS po_item_status,
`purchase_order_items`.`id` AS po_item_id
FROM (SELECT id, po_date, po_number, customer_id, status
FROM purchase_order
ORDER BY `po_date` DESC
LIMIT 0, 5) as purchase_order
INNER JOIN customer ON `customer`.`id` = `purchase_order`.`customer_id`
INNER JOIN purchase_order_items
ON `purchase_order_items`.`po_id` = `purchase_order`.`id`
INNER JOIN product ON `purchase_order_items`.`product_id` = `product`.`id`
GROUP BY purchase_order.id
ORDER BY purchase_order.po_date DESC
LIMIT 0, 5
You need to be sure that purchase_order.po_date and all the id columns are indexed. You can check with the query below.
SHOW INDEX FROM yourtable;
Since you mentioned that the data is getting bigger, I would suggest sharding, which lets you run multiple queries in parallel. Please refer to the following article:
Parallel Query for MySQL with Shard-Query
First, I cleaned up readability a bit. You don't need tick marks around every table.column reference. Also, for short-hand, using aliases works well. Ex: "po" instead of "purchase_order", "poi" instead of "purchase_order_items". The only time I would use tick marks is around reserved words that might cause a problem.
Second, you don't have any aggregations (sum, min, max, count, avg, etc.) in your query so you should be able to strip the GROUP BY clause.
As for indexes, I would have to assume you have an index on your reference tables on their respective "id" key columns.
For your purchase_order table, I would have an index with "po_date" in the first field position (check whether an existing index already has it there). Since your ORDER BY is on that column, the engine can jump directly to those dated records first, and your descending order is resolved.
SELECT
po.id,
po.po_date,
po.po_number,
po.customer_id,
c.`name` AS customer_name,
po.`status` AS po_status,
poi.product_id,
poi.po_item_name,
p.weight as product_weight,
p.pending as product_pending,
p.company_owner,
poi.uom,
poi.po_item_type,
poi.order_sequence,
poi.pending_balance,
poi.quantity,
poi.notes,
poi.`status` AS po_item_status,
poi.id AS po_item_id
FROM
purchase_order po
INNER JOIN customer c
ON po.customer_id = c.id
INNER JOIN purchase_order_items poi
ON po.id = poi.po_id
INNER JOIN product p
ON poi.product_id = p.id
ORDER BY
po.po_date DESC
LIMIT
0, 20
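The index advice above could look like the following sketch. The two-column table is a hypothetical stand-in for purchase_order with assumed types, and idx_po_date is a made-up index name. Run it in a scratch schema.

```sql
-- Hypothetical, stripped-down purchase_order table; types are assumptions.
CREATE TABLE purchase_order (
    id      INT  NOT NULL PRIMARY KEY,
    po_date DATE NOT NULL
);

-- Index with po_date in the first position, so ORDER BY po_date
-- can read rows in index order instead of sorting them.
CREATE INDEX idx_po_date ON purchase_order (po_date);

-- If the index is used for the ordering, the Extra column of the plan
-- should not show "Using filesort".
EXPLAIN
SELECT id
FROM purchase_order
ORDER BY po_date DESC
LIMIT 0, 20;
```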

Optimizing a MySQL NOT IN( query

I am trying to optimize this MySQL query. I want to get a count of the number of customers that do not have an appointment prior to the current appointment being looked at. In other words, if they have an appointment (which is what the NOT IN( subquery is checking for), then exclude them.
However, this query is absolutely killing performance. I know that MySQL is not very good with NOT IN( queries, but I am not sure of the best way to optimize this one. It takes anywhere from 15 to 30 seconds to run. I have created indexes on CustNo, AptStatus, and AptNum.
SELECT
COUNT(*) AS NumOfCustomersWithPriorAppointment
FROM
transaction_log AS tl
LEFT JOIN
appointment AS a
ON
a.AptNum = tl.AptNum
INNER JOIN
customer AS c
ON
c.CustNo = tl.CustNo
WHERE
a.AptStatus IN (2)
AND a.CustNo NOT IN
(
SELECT
a2.CustNo
FROM
appointment a2
WHERE
a2.AptDateTime < a.AptDateTime)
AND a.AptDateTime > BEGIN_QUERY_DATE
AND a.AptDateTime < END_QUERY_DATE
Thank you in advance.
Try the following:
SELECT
COUNT(*) AS NumOfCustomersWithPriorAppointment
FROM
transaction_log AS tl
INNER JOIN
appointment AS a
ON
a.AptNum = tl.AptNum
LEFT OUTER JOIN appointment AS earlier_a
ON earlier_a.CustNo = a.CustNo
AND earlier_a.AptDateTime < a.AptDateTime
INNER JOIN
customer AS c
ON
c.CustNo = tl.CustNo
WHERE
a.AptStatus IN (2)
AND earlier_a.AptNum IS NULL
AND a.AptDateTime > BEGIN_QUERY_DATE
AND a.AptDateTime < END_QUERY_DATE
This will benefit from a composite index on (CustNo,AptDateTime). Make it unique if that fits your business model (logically it seems like it should, but practically it may not, depending on how you handle conflicts in your application.)
Provide SHOW CREATE TABLE statements for all tables if this does not create a sufficient performance improvement.
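To make the anti-join and the composite index concrete, here is a small self-contained sketch. The appointment table is reduced to the columns used in the thread, the column types and sample rows are assumptions, and idx_apt_cust_date is a hypothetical index name.

```sql
-- Stripped-down appointment table; types are assumptions.
CREATE TABLE appointment (
    AptNum      INT      NOT NULL PRIMARY KEY,
    CustNo      INT      NOT NULL,
    AptDateTime DATETIME NOT NULL,
    AptStatus   INT      NOT NULL
);

-- Composite index suggested in the answer (name is made up).
-- Add UNIQUE only if a customer can never have two appointments
-- at the same date and time.
CREATE INDEX idx_apt_cust_date ON appointment (CustNo, AptDateTime);

INSERT INTO appointment (AptNum, CustNo, AptDateTime, AptStatus) VALUES
    (1, 100, '2024-01-05 09:00:00', 2),
    (2, 100, '2024-02-01 09:00:00', 2),
    (3, 200, '2024-01-20 14:00:00', 2);

-- Anti-join: keep an appointment only when no earlier appointment
-- exists for the same customer.
SELECT a.AptNum, a.CustNo
FROM appointment AS a
LEFT OUTER JOIN appointment AS earlier_a
    ON earlier_a.CustNo = a.CustNo
   AND earlier_a.AptDateTime < a.AptDateTime
WHERE a.AptStatus IN (2)
  AND earlier_a.AptNum IS NULL;
-- Returns AptNum 1 (customer 100) and AptNum 3 (customer 200);
-- AptNum 2 is dropped because AptNum 1 precedes it.
```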

How to improve SELECT performance joining multiple tables

I have the following MySQL SELECT statement that worked fine on a small data set but ground to a halt when the volume increased:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId, Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate,Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1, Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country, BookingAccommodation.AccomId, BookingAccommodation.ShareType, BookingProgram.ProgramId, Programs.ProgramDesc
FROM Bookings, Clients, BookingProgram, BookingAccommodation, Countries, ClientType, Programs
WHERE Bookings.BookingId = BookingProgram.BookingId
AND Bookings.BookingId = BookingAccommodation.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND BookingAccommodation.Nights > 0
AND Clients.ClientId = Bookings.ClientId
AND Clients.Email <> ''
AND Clients.CountryId = Countries.CountryId
AND Programs.ProgramId = BookingProgram.ProgramId
With around 10K records in Bookings and 25K records in each of BookingAccommodation and BookingProgram, the volume isn't huge, but the query ran in 950 seconds. I'm running the query in the SQL window of phpMyAdmin on a local MAMP server.
Splitting it into 3 queries the result comes back in a fraction of a second for each:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId, Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate, Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1, Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country
FROM Bookings, Clients, Countries, ClientType
WHERE Bookings.WeekBeginning >= '2016-10-01'
AND Clients.ClientId = Bookings.ClientId
AND Clients.Email <> ''
AND Clients.CountryId = Countries.CountryId
SELECT DISTINCT Bookings.BookingId, BookingAccommodation.AccomId, BookingAccommodation.ShareType
FROM Bookings, BookingAccommodation
WHERE Bookings.BookingId = BookingAccommodation.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND BookingAccommodation.Nights > 0
SELECT DISTINCT Bookings.BookingId, BookingProgram.ProgramId, Programs.ProgramDesc
FROM Bookings, BookingProgram, Programs
WHERE Bookings.BookingId = BookingProgram.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND Programs.ProgramId = BookingProgram.ProgramId
There are multiple records in BookingAccommodation and BookingProgram for each record in Bookings but I only require one record from each hence the SELECT DISTINCT.
The primary key on Bookings is BookingId.
The primary key on BookingAccommodation is BookingId, AccomDate, AccomId
The primary key on BookingProgram is BookingId, ProgramId, AccomType
I've tried to rewrite the query with joins and sub queries but I'm obviously not doing it right. How can I join these 3 queries back into a single query that will perform well?
These are the basics of using subqueries instead of joins (MySQL assumed FWIW). Apologies for pseudocode, I thought it important to answer ASAP as this is one of the top hits on this issue I faced just now.
A client makes a booking to go on a cruise ship. The client should also specify their diet (eg. vegetarian, vegan, no soy, etc). We thus have three tables:
Bookings
Booking_Id, Booking_Date, Booking_Time, Client_Id
Clients
Client_Id, Client_Name, Client_Phone, Client_DietId
Diets
Diet_Id, Diet_Name
We now want to present to the concierge a full booking view.
Using "JOINS":
SELECT Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time, Clients.Client_Name, Diets.Diet_Name
FROM Bookings
INNER JOIN Clients
ON Bookings.Client_Id = Clients.Client_Id
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id
Using "SUBQUERIES":
I think of it as creating "temp tables" out of the separate JOINs. Temp tables may or may not be the accurate low-level implementation, but anecdotally subqueries can be faster than huge joins (there are other threads on this).
I want to split the joins from the above example into separate steps:
First I join the Clients with their Diets, then I join that "table" with Bookings.
Thus I end up with this (note the table (re)naming when referring to the subquery):
SELECT [RELEVANT FIELDS HERE ETC]
FROM
(SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
FROM Clients
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id)
AS ClientDetailsWithDiets
INNER JOIN Bookings
ON Bookings.Client_Id = ClientDetailsWithDiets.Client_Id
Now, if another table is to be joined, say Staff assigned to a particular Booking, then the whole thing above would be nested, and so on, e.g.:
SELECT [RELEVANT FIELDS HERE ETC]
FROM
(SELECT [RELEVANT FIELDS HERE ETC]
FROM
(SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
FROM Clients
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id)
AS ClientDetailsWithDiets
INNER JOIN Bookings
ON Bookings.Client_Id = ClientDetailsWithDiets.Client_Id)
AS BookingDetailsFull
INNER JOIN Staff
ON BookingDetailsFull.Booking_Id = Staff.Booking_Id_Assigned
Try changing it to:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId,
Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate,
Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1,
Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country,
BookingAccommodation.AccomId, BookingAccommodation.ShareType, BookingProgram.ProgramId,
Programs.ProgramDesc
FROM Bookings
JOIN Clients ON Clients.ClientId = Bookings.ClientId AND Bookings.WeekBeginning >= '2016-10-01' AND Clients.Email <> ''
JOIN BookingProgram ON Bookings.BookingId = BookingProgram.BookingId
JOIN BookingAccommodation ON Bookings.BookingId = BookingAccommodation.BookingId AND BookingAccommodation.Nights > 0
JOIN Countries ON Clients.CountryId = Countries.CountryId
JOIN Programs ON Programs.ProgramId = BookingProgram.ProgramId
WHERE Bookings.WeekBeginning >= '2016-10-01';
If this is not getting you the results you wanted, try EXPLAIN and see the query plan.
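Please note: I didn't see the ClientType table being used anywhere, so I did not include it in the JOINs. Listing it in FROM without a join condition, as the original query does, multiplies the intermediate result by the number of rows in ClientType before DISTINCT collapses it.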
Rather than spend more time trying to improve the SELECT statement, which hits so many tables, I opted to split it into the separate queries outlined in the original question.
In the end this was the quickest practical solution.

How to convert code written in INNER JOIN to Subquery

I need some help converting code from a JOIN statement into a subquery.
I need to remove the GROUP BY from it somehow when it is converted into a subquery, and I don't know how.
I managed to put a small subquery at the end of the code, but I don't know how to do the rest.
Need some help, thank you.
Here is the sample of the code: (need to convert into SQL Server syntax)
SELECT
b.Number, t.IDTyre, SUM(c.Price)
FROM Tyre AS t
INNER JOIN Bill AS b ON t.BillID = b.IDBill
INNER JOIN Customer AS c ON c.TyreID = t.IDTyre
GROUP BY b.Number, t.IDTyre
HAVING SUM(c.Price) < 3000 OR t.IDTyre NOT IN (SELECT c.TyreID FROM Customer AS c)
Check if the below query works:
SELECT
(Select b.Number From Bill AS b Where b.IDBill = t.BillID) as Number,
t.IDTyre as TyreID,
(Select SUM(c.Price) From Customer AS c Having SUM(c.Price) < 3000 OR t.IDTyre NOT IN (SELECT Distinct c.TyreID FROM Customer AS c) And c.TyreID = t.IDTyre) as Price
FROM Tyre AS t
Why are you trying to convert this to a subquery?
JOINs are the best option when linking tables.
Also, the "NOT IN" at the end is not a good choice; you should use "NOT EXISTS". Change it to: OR NOT EXISTS (SELECT * FROM Customer AS c WHERE t.IDTyre=c.TyreID)
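One concrete reason to prefer NOT EXISTS is how the two forms treat NULLs. The sketch below uses the Tyre and Customer names from the question, but the reduced column lists, the IDCustomer column, the types, and the sample rows are all assumptions made for illustration.

```sql
-- Stripped-down tables; IDCustomer and all column types are assumptions.
CREATE TABLE Tyre (
    IDTyre INT NOT NULL PRIMARY KEY
);

CREATE TABLE Customer (
    IDCustomer INT NOT NULL PRIMARY KEY,
    TyreID     INT NULL,
    Price      DECIMAL(10, 2) NOT NULL
);

INSERT INTO Tyre (IDTyre) VALUES (1), (2), (3);

-- One customer row has no tyre (TyreID is NULL).
INSERT INTO Customer (IDCustomer, TyreID, Price) VALUES
    (10, 1,    100.00),
    (11, NULL,  50.00);

-- NOT IN: the NULL in the subquery result makes the comparison UNKNOWN
-- for tyres 2 and 3, so this returns no rows at all.
SELECT t.IDTyre
FROM Tyre AS t
WHERE t.IDTyre NOT IN (SELECT c.TyreID FROM Customer AS c);

-- NOT EXISTS: only checks whether a matching row exists,
-- so this returns tyres 2 and 3.
SELECT t.IDTyre
FROM Tyre AS t
WHERE NOT EXISTS (SELECT * FROM Customer AS c WHERE c.TyreID = t.IDTyre);
```

If Customer.TyreID is declared NOT NULL, the two forms return the same rows and the choice comes down to the query plan.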

Query optimization

I'm having a problem with this slow query:
SELECT c.*, csc1.changed_status
FROM contract c
LEFT
JOIN contract_status_change csc1
ON csc1.contract_status_change_id =
( SELECT csc2.contract_status_change_id
FROM contract_status_change csc2
WHERE csc2.contract_id = c.contract_id
ORDER
BY csc2.date_changed DESC
LIMIT 1
)
;
I have a contract table and a contract_status_change table, which records statuses against the contract. This query joins the latest status to the contract so you can get its current status.
Please can you help me tidy it up?
-edit-
my apologies. I have updated the query to include selecting the actual latest status out. Sorry for the confusion!
After formatting your query for readability (consistent whitespace and capitalization, removing unnecessary backticks and parentheses, more sensible aliases):
SELECT c.*
FROM contract c
LEFT
JOIN contract_status_change csc1
ON csc1.contract_status_change_id =
( SELECT csc2.contract_status_change_id
FROM contract_status_change csc2
WHERE csc2.contract_id = c.contract_id
ORDER
BY csc2.date_changed DESC
LIMIT 1
)
;
and assuming that contract_status_change.contract_status_change_id is a unique identifier, I'm forced to conclude that your query is equivalent to this, much more efficient one:
SELECT c.*
FROM contract c
;
You say that it "is joining on the latest status with the contract so you can get its current status", but it doesn't do anything with the current status — doesn't order by it, doesn't filter by it, doesn't include it in the query results — so there's no need for that.
This should help a bit.
SELECT c.*, csc1.changed_status
FROM contract c
LEFT JOIN
(
SELECT contract_id, MAX(date_changed) AS max_date
FROM contract_status_change
GROUP BY contract_id
) csc2 ON csc2.contract_id = c.contract_id
LEFT JOIN contract_status_change csc1 ON csc1.contract_id = csc2.contract_id AND csc1.date_changed = csc2.max_date
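For reference, here is a self-contained sketch of the "latest row per group" pattern on sample data. The table and column names come from the question; the reduced column lists, the column types, the sample rows, and the index name idx_csc_contract_date are assumptions.

```sql
-- Stripped-down tables; types are assumptions.
CREATE TABLE contract (
    contract_id INT NOT NULL PRIMARY KEY
);

CREATE TABLE contract_status_change (
    contract_status_change_id INT         NOT NULL PRIMARY KEY,
    contract_id               INT         NOT NULL,
    changed_status            VARCHAR(20) NOT NULL,
    date_changed              DATETIME    NOT NULL
);

-- Hypothetical index supporting the MAX(date_changed) per contract lookup.
CREATE INDEX idx_csc_contract_date
    ON contract_status_change (contract_id, date_changed);

INSERT INTO contract (contract_id) VALUES (1), (2), (3);

INSERT INTO contract_status_change
    (contract_status_change_id, contract_id, changed_status, date_changed)
VALUES
    (1, 1, 'draft',  '2024-01-01 10:00:00'),
    (2, 1, 'signed', '2024-02-01 10:00:00'),
    (3, 2, 'draft',  '2024-01-15 10:00:00');

-- Find the latest change date per contract, then join back to fetch
-- the status recorded on that date. Both joins are LEFT joins so that
-- contracts without any status change are kept.
SELECT c.contract_id, csc1.changed_status
FROM contract c
LEFT JOIN (
    SELECT contract_id, MAX(date_changed) AS max_date
    FROM contract_status_change
    GROUP BY contract_id
) latest ON latest.contract_id = c.contract_id
LEFT JOIN contract_status_change csc1
    ON csc1.contract_id = latest.contract_id
   AND csc1.date_changed = latest.max_date
ORDER BY c.contract_id;
-- Returns: (1, 'signed'), (2, 'draft'), (3, NULL)
```

If two changes for the same contract share the same date_changed, this pattern returns both rows, so a tie-breaker on contract_status_change_id may be needed.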