How to improve SELECT performance joining multiple tables


I have the following MySQL SELECT statement, which worked fine on a small data set but became extremely slow when the volume increased:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId, Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate,Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1, Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country, BookingAccommodation.AccomId, BookingAccommodation.ShareType, BookingProgram.ProgramId, Programs.ProgramDesc
FROM Bookings, Clients, BookingProgram, BookingAccommodation, Countries, ClientType, Programs
WHERE Bookings.BookingId = BookingProgram.BookingId
AND Bookings.BookingId = BookingAccommodation.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND BookingAccommodation.Nights > 0
AND Clients.ClientId = Bookings.ClientId
AND Clients.Email <> ''
AND Clients.CountryId = Countries.CountryId
AND Programs.ProgramId = BookingProgram.ProgramId
With around 10K records in Bookings and 25K records in each of BookingAccommodation and BookingProgram, the volume isn't huge, but the query took 950 seconds. I'm running the query in the SQL window of phpMyAdmin on a local MAMP server.
When I split it into three queries, each one returns in a fraction of a second:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId, Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate, Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1, Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country
FROM Bookings, Clients, Countries, ClientType
WHERE Bookings.WeekBeginning >= '2016-10-01'
AND Clients.ClientId = Bookings.ClientId
AND Clients.Email <> ''
AND Clients.CountryId = Countries.CountryId;
SELECT DISTINCT Bookings.BookingId, BookingAccommodation.AccomId, BookingAccommodation.ShareType
FROM Bookings, BookingAccommodation
WHERE Bookings.BookingId = BookingAccommodation.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND BookingAccommodation.Nights > 0;
SELECT DISTINCT Bookings.BookingId, BookingProgram.ProgramId, Programs.ProgramDesc
FROM Bookings, BookingProgram, Programs
WHERE Bookings.BookingId = BookingProgram.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND Programs.ProgramId = BookingProgram.ProgramId;
There are multiple records in BookingAccommodation and BookingProgram for each record in Bookings, but I only require one record from each, hence the SELECT DISTINCT.
The primary key on Bookings is BookingId.
The primary key on BookingAccommodation is (BookingId, AccomDate, AccomId).
The primary key on BookingProgram is (BookingId, ProgramId, AccomType).
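One likely contributor to the slowdown, shown here as a sketch under assumptions: when Bookings is joined to two child tables that each hold several rows per booking, every booking produces (its BookingAccommodation rows) × (its BookingProgram rows) before DISTINCT collapses them. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL; the table names follow the question, but the reduced columns and the data are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption: same join semantics).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bookings (BookingId INTEGER PRIMARY KEY);
CREATE TABLE BookingAccommodation (BookingId INTEGER, AccomDate TEXT, AccomId INTEGER);
CREATE TABLE BookingProgram (BookingId INTEGER, ProgramId INTEGER);
""")

# Hypothetical data: one booking with 7 accommodation nights and 3 program rows.
conn.execute("INSERT INTO Bookings VALUES (1)")
conn.executemany(
    "INSERT INTO BookingAccommodation VALUES (1, ?, 10)",
    [(f"2016-10-0{d}",) for d in range(1, 8)],
)
conn.executemany("INSERT INTO BookingProgram VALUES (1, ?)", [(100,), (101,), (102,)])

join_sql = """
FROM Bookings b
JOIN BookingAccommodation ba ON ba.BookingId = b.BookingId
JOIN BookingProgram bp ON bp.BookingId = b.BookingId
"""

# Rows the join produces before DISTINCT: 7 x 3 = 21 for a single booking.
raw_rows = conn.execute("SELECT COUNT(*) " + join_sql).fetchone()[0]

# DISTINCT then has to reduce all of them to the unique combinations.
distinct_rows = len(conn.execute(
    "SELECT DISTINCT b.BookingId, ba.AccomId, bp.ProgramId " + join_sql
).fetchall())

print(raw_rows, distinct_rows)  # 21 3
```

Note that DISTINCT still returns one row per distinct (AccomId, ProgramId) combination, not one row per booking. Splitting into separate queries avoids building the multiplied intermediate result at all.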
I've tried to rewrite the query with joins and subqueries, but I'm obviously not doing it right. How can I combine these three queries back into a single query that performs well?

These are the basics of using subqueries instead of joins (MySQL assumed, FWIW). Apologies for the pseudocode; I thought it important to answer quickly, as this is one of the top search hits for an issue I just ran into.
A client books a cruise. The client should also specify their diet (e.g. vegetarian, vegan, no soy). We thus have three tables:
Bookings
Booking_Id, Booking_Date, Booking_Time, Client_Id
Clients
Client_Id, Client_Name, Client_Phone, Client_DietId
Diets
Diet_Id, Diet_Name
We now want to present to the concierge a full booking view.
Using "JOINS":
SELECT Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time, Clients.Client_Name, Diets.Diet_Name
FROM Bookings
INNER JOIN Clients
ON Bookings.Client_Id = Clients.Client_Id
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id
Using "SUBQUERIES":
I think of each subquery as creating a "temp table" that takes part in a separate JOIN. Of course, "temp table" may or may not describe the actual low-level implementation, but anecdotally subqueries can be faster than huge joins (other threads discuss this).
From the example above, I want to do two separate joins:
First I join Clients with their Diets, then I join that "table" with Bookings.
This gives the following (note the alias on the subquery; MySQL requires every derived table to have one):
SELECT Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time, ClientDetailsWithDiets.Client_Name, ClientDetailsWithDiets.Diet_Name
FROM
(SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
FROM Clients
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id)
AS ClientDetailsWithDiets
INNER JOIN Bookings
ON Bookings.Client_Id = ClientDetailsWithDiets.Client_Id
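To check that the derived-table form returns the same rows as the plain JOIN form, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. Table and column names follow the example; the sample rows are invented. The derived table is joined on Bookings.Client_Id, the column that links a booking to its client (booking IDs are deliberately different from client IDs here, so joining on Booking_Id would not match).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Diets (Diet_Id INTEGER PRIMARY KEY, Diet_Name TEXT);
CREATE TABLE Clients (Client_Id INTEGER PRIMARY KEY, Client_Name TEXT,
                      Client_Phone TEXT, Client_DietId INTEGER);
CREATE TABLE Bookings (Booking_Id INTEGER PRIMARY KEY, Booking_Date TEXT,
                       Booking_Time TEXT, Client_Id INTEGER);
INSERT INTO Diets VALUES (1, 'vegetarian'), (2, 'vegan');
INSERT INTO Clients VALUES (10, 'Ann', '555-0100', 1), (20, 'Bob', '555-0101', 2);
INSERT INTO Bookings VALUES (100, '2024-05-01', '09:00', 20),
                            (101, '2024-05-02', '10:00', 10);
""")

# Plain multi-table join.
join_rows = conn.execute("""
SELECT Bookings.Booking_Id, Clients.Client_Name, Diets.Diet_Name
FROM Bookings
INNER JOIN Clients ON Bookings.Client_Id = Clients.Client_Id
INNER JOIN Diets ON Clients.Client_DietId = Diets.Diet_Id
ORDER BY Bookings.Booking_Id
""").fetchall()

# Same result via a derived table that pre-joins clients with their diets.
subquery_rows = conn.execute("""
SELECT Bookings.Booking_Id, ClientDetailsWithDiets.Client_Name, ClientDetailsWithDiets.Diet_Name
FROM (SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
      FROM Clients
      INNER JOIN Diets ON Clients.Client_DietId = Diets.Diet_Id) AS ClientDetailsWithDiets
INNER JOIN Bookings ON Bookings.Client_Id = ClientDetailsWithDiets.Client_Id
ORDER BY Bookings.Booking_Id
""").fetchall()

print(join_rows)  # [(100, 'Bob', 'vegan'), (101, 'Ann', 'vegetarian')]
```

Whether the derived-table form is actually faster depends on the optimizer: MySQL 5.7 and later can merge a simple derived table like this into the outer query, so compare the two with EXPLAIN before assuming either wins.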
Now, if another table is to be joined, say the Staff assigned to a particular booking, the whole query above gets nested inside another one, and so on, e.g.:
SELECT BookingDetailsFull.Booking_Id, BookingDetailsFull.Client_Name, BookingDetailsFull.Diet_Name, Staff.*
FROM
(SELECT Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time, ClientDetailsWithDiets.Client_Name, ClientDetailsWithDiets.Diet_Name
FROM
(SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
FROM Clients
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id)
AS ClientDetailsWithDiets
INNER JOIN Bookings
ON Bookings.Client_Id = ClientDetailsWithDiets.Client_Id)
AS BookingDetailsFull
INNER JOIN Staff
ON BookingDetailsFull.Booking_Id = Staff.Booking_Id_Assigned

Try rewriting it with explicit JOINs:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId,
Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate,
Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1,
Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country,
BookingAccommodation.AccomId, BookingAccommodation.ShareType, BookingProgram.ProgramId,
Programs.ProgramDesc
FROM Bookings
JOIN Clients ON Clients.ClientId = Bookings.ClientId AND Clients.Email <> ''
JOIN BookingProgram ON Bookings.BookingId = BookingProgram.BookingId
JOIN BookingAccommodation ON Bookings.BookingId = BookingAccommodation.BookingId AND BookingAccommodation.Nights > 0
JOIN Countries ON Clients.CountryId = Countries.CountryId
JOIN Programs ON Programs.ProgramId = BookingProgram.ProgramId
WHERE Bookings.WeekBeginning >= '2016-10-01';
If this still doesn't perform well enough, run EXPLAIN on the query and check the query plan.
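As a rough illustration of what to look for in a plan, the sketch below uses SQLite's EXPLAIN QUERY PLAN (SQLite's counterpart to MySQL's EXPLAIN; the output format differs) to show the WeekBeginning filter going from a full table scan to an index search once an index exists. The index name idx_bookings_week is hypothetical, and whether MySQL picks the same plan depends on its own statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bookings (BookingId INTEGER PRIMARY KEY, WeekBeginning TEXT)")

query = "SELECT BookingId, WeekBeginning FROM Bookings WHERE WeekBeginning >= '2016-10-01'"

def plan(sql):
    """Return the human-readable 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)  # expected: a full scan of Bookings

# Hypothetical index on the filtered column.
conn.execute("CREATE INDEX idx_bookings_week ON Bookings (WeekBeginning)")

after = plan(query)   # expected: a range search using idx_bookings_week

print(before)
print(after)
```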
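Please note: I didn't see table ClientType being used anywhere, so I did not include it in the JOINs. Listed in FROM without any join condition, it is cross-joined with the other tables, which multiplies the number of rows.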
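To show why an unreferenced table in a comma-separated FROM list matters, here is a minimal sketch (Python's sqlite3 as a stand-in for MySQL; the row counts are hypothetical): with no join condition, every row is paired with every ClientType row, and DISTINCT later has to remove the copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bookings (BookingId INTEGER PRIMARY KEY);
CREATE TABLE ClientType (ClientTypeId INTEGER PRIMARY KEY);
""")
conn.executemany("INSERT INTO Bookings VALUES (?)", [(i,) for i in range(1, 101)])  # 100 bookings
conn.executemany("INSERT INTO ClientType VALUES (?)", [(i,) for i in range(1, 6)])  # 5 client types

# ClientType is listed but never joined: a cross join of 100 x 5 rows.
with_unused = conn.execute("SELECT COUNT(*) FROM Bookings, ClientType").fetchone()[0]

# Dropping the unused table gives one row per booking.
without_unused = conn.execute("SELECT COUNT(*) FROM Bookings").fetchone()[0]

print(with_unused, without_unused)  # 500 100
```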

Rather than spend more time trying to improve a SELECT statement that hits so many tables, I opted to split it into the separate queries outlined in the original question.
In the end, this was the quickest practical solution.

Related

Conditional JOIN based on column value

I've looked all over, and unfortunately, I can't seem to figure out what I'm doing wrong. I'm developing a personal financial management application that uses a MySQL server. For this problem, I have 4 tables I'm working with.
The TRANSACTIONS table contains columns CATID and BILLID, which refer to primary keys in the SECONDARYCATEGORIES and BILLS tables. Both the SECONDARYCATEGORIES and BILLS tables have a column PCATID, which refers to a primary key in the PRIMARYCATEGORIES table.
I'm building a SQL query that sums an AMOUNT column in the TRANSACTIONS table and returns each PCATID together with the sum over all records associated with that value. If BILLID is -1 (which indicates the transaction is NOT a bill), it should find the PCATID in SECONDARYCATEGORIES where SECONDARYCATEGORIES.ID = TRANSACTIONS.CATID; otherwise, it should find the PCATID from the BILLS record where BILLS.ID matches TRANSACTIONS.BILLID.
I'm looking for something like this (not valid SQL, obviously):
SELECT
SECONDARYCATEGORIES.PCATID,
SUM(TRANSACTIONS.AMOUNT)
FROM
TRANSACTIONS
IF (BILLID = -1) JOIN SECONDARYCATEGORIES ON SECONDARYCATEGORIES.ID = TRANSACTIONS.CATID
ELSE JOIN SECONDARYCATEGORIES ON SECONDARYCATEGORIES.ID = BILLS.CATID WHERE BILLS.ID = TRANSACTIONS.BILLID
I have tried a myriad of different JOINs, IF statements, etc., and I just can't seem to make this work. I had thought of breaking this up into separate SQL queries based on the value of BILLID and summing the values, but I'd really like to do this all in one SQL query if possible.
I know I'm missing something obvious here; any help is very much appreciated.
Edit: I forgot to describe the BILLS table. It contains a primary category (PCATID) and an ID, as well as some descriptive data.

You can use OR in your JOIN, like this:
SELECT S.PCATID,
SUM(T.AMOUNT)
FROM TRANSACTIONS T
LEFT JOIN BILLS ON BILLS.ID = T.BILLID
JOIN SECONDARYCATEGORIES S ON (S.ID = T.CATID AND T.BILLID = -1)
OR (S.ID = BILLS.CATID AND BILLS.ID = T.BILLID)
GROUP BY S.PCATID

You can also use COALESCE and CASE in your JOINs.
SELECT COALESCE(s.PCATID, b.PCATID) AS ID,
SUM(t.AMOUNT) AS Total
FROM TRANSACTIONS t
LEFT JOIN BILLS b ON b.ID = CASE WHEN t.BILLID <> -1 THEN t.BILLID END
LEFT JOIN SECONDARYCATEGORIES s ON s.ID = CASE WHEN t.BILLID = -1 THEN t.CATID END
GROUP BY COALESCE(s.PCATID, b.PCATID)
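Here is a runnable sketch of the CASE/COALESCE approach, with Python's sqlite3 module standing in for MySQL. It assumes BILLS stores the primary category directly in a PCATID column (the question's edit says BILLS contains a primary category); the exact column layout and all sample rows are hypothetical. Each CASE yields NULL for the branch that does not apply, so that LEFT JOIN finds no row, and COALESCE picks whichever category was found.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SECONDARYCATEGORIES (ID INTEGER PRIMARY KEY, PCATID INTEGER);
CREATE TABLE BILLS (ID INTEGER PRIMARY KEY, PCATID INTEGER);  -- assumed layout
CREATE TABLE TRANSACTIONS (ID INTEGER PRIMARY KEY, CATID INTEGER,
                           BILLID INTEGER, AMOUNT INTEGER);
INSERT INTO SECONDARYCATEGORIES VALUES (1, 100), (2, 200);
INSERT INTO BILLS VALUES (7, 300), (8, 100);
INSERT INTO TRANSACTIONS VALUES
    (1, 1, -1, 10),  -- not a bill: SECONDARYCATEGORIES 1 -> PCATID 100
    (2, 2, -1, 5),   -- not a bill: SECONDARYCATEGORIES 2 -> PCATID 200
    (3, 1, 7, 20),   -- bill 7 -> PCATID 300 (CATID is ignored)
    (4, 2, 8, 4);    -- bill 8 -> PCATID 100
""")

totals = conn.execute("""
SELECT COALESCE(s.PCATID, b.PCATID) AS PCATID, SUM(t.AMOUNT) AS Total
FROM TRANSACTIONS t
-- Each CASE returns NULL when its branch does not apply, so that LEFT JOIN matches nothing.
LEFT JOIN BILLS b ON b.ID = CASE WHEN t.BILLID <> -1 THEN t.BILLID END
LEFT JOIN SECONDARYCATEGORIES s ON s.ID = CASE WHEN t.BILLID = -1 THEN t.CATID END
GROUP BY COALESCE(s.PCATID, b.PCATID)
ORDER BY 1
""").fetchall()

print(totals)  # [(100, 14), (200, 5), (300, 20)]
```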

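I use UNION to pick either query. The second branch also needs the BILLS table joined in, and since both branches can return the same PCATID, the rows are combined with UNION ALL (plain UNION would drop identical rows) and summed in an outer query:
SELECT PCATID, SUM(AMOUNT) AS TOTAL
FROM (
SELECT SECONDARYCATEGORIES.PCATID, TRANSACTIONS.AMOUNT
FROM TRANSACTIONS
JOIN SECONDARYCATEGORIES ON SECONDARYCATEGORIES.ID = TRANSACTIONS.CATID
WHERE TRANSACTIONS.BILLID = -1
UNION ALL
SELECT SECONDARYCATEGORIES.PCATID, TRANSACTIONS.AMOUNT
FROM TRANSACTIONS
JOIN BILLS ON BILLS.ID = TRANSACTIONS.BILLID
JOIN SECONDARYCATEGORIES ON SECONDARYCATEGORIES.ID = BILLS.CATID
WHERE TRANSACTIONS.BILLID <> -1
) AS combined
GROUP BY PCATID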

Optimizing a MySQL NOT IN( query

I am trying to optimize this MySQL query. I want to count the customers who do not have an appointment prior to the current appointment being looked at. In other words, if they have an earlier appointment (which is what the NOT IN subquery checks for), exclude them.
However, this query is killing performance. I know that MySQL is not very good with NOT IN queries, but I am not sure of the best way to optimize this one. It takes anywhere from 15 to 30 seconds to run. I have created indexes on CustNo, AptStatus, and AptNum.
SELECT
COUNT(*) AS NumOfCustomersWithPriorAppointment
FROM
transaction_log AS tl
LEFT JOIN
appointment AS a
ON
a.AptNum = tl.AptNum
INNER JOIN
customer AS c
ON
c.CustNo = tl.CustNo
WHERE
a.AptStatus IN (2)
AND a.CustNo NOT IN
(
SELECT
a2.CustNo
FROM
appointment a2
WHERE
a2.AptDateTime < a.AptDateTime)
AND a.AptDateTime > BEGIN_QUERY_DATE
AND a.AptDateTime < END_QUERY_DATE
Thank you in advance.

Try the following:
SELECT
COUNT(*) AS NumOfCustomersWithPriorAppointment
FROM
transaction_log AS tl
INNER JOIN
appointment AS a
ON
a.AptNum = tl.AptNum
LEFT OUTER JOIN appointment AS earlier_a
ON earlier_a.CustNo = a.CustNo
AND earlier_a.AptDateTime < a.AptDateTime
INNER JOIN
customer AS c
ON
c.CustNo = tl.CustNo
WHERE
a.AptStatus IN (2)
AND earlier_a.AptNum IS NULL
AND a.AptDateTime > BEGIN_QUERY_DATE
AND a.AptDateTime < END_QUERY_DATE
This will benefit from a composite index on (CustNo, AptDateTime). Make it unique if that fits your business model (logically it seems like it should, but in practice it may not, depending on how you handle conflicts in your application).
If this does not improve performance enough, please provide the SHOW CREATE TABLE statements for all tables.
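The rewrite above is an anti-join: a row is kept only when the LEFT JOIN finds no earlier appointment for the same customer. Below is a minimal sketch, with Python's sqlite3 standing in for MySQL, checking that it counts the same rows as the NOT IN form on a small invented dataset. It is simplified to the appointment table alone (transaction_log and customer are left out), and the index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE appointment (
    AptNum INTEGER PRIMARY KEY, CustNo TEXT, AptDateTime TEXT, AptStatus INTEGER);
-- Hypothetical composite index matching the anti-join's lookup columns.
CREATE INDEX idx_appointment_cust_time ON appointment (CustNo, AptDateTime);
INSERT INTO appointment VALUES
    (1, 'c1', '2024-01-05 09:00', 2),
    (2, 'c1', '2024-02-10 09:00', 2),  -- c1 already had appointment 1
    (3, 'c2', '2024-02-12 09:00', 2),  -- c2's first appointment
    (4, 'c3', '2024-01-20 09:00', 1),
    (5, 'c3', '2024-02-15 09:00', 2);  -- c3 already had appointment 4
""")

window = ("2024-02-01", "2024-03-01")  # stand-ins for BEGIN_QUERY_DATE / END_QUERY_DATE

not_in_count = conn.execute("""
SELECT COUNT(*) FROM appointment AS a
WHERE a.AptStatus = 2
  AND a.AptDateTime > ? AND a.AptDateTime < ?
  AND a.CustNo NOT IN (SELECT a2.CustNo FROM appointment AS a2
                       WHERE a2.AptDateTime < a.AptDateTime)
""", window).fetchone()[0]

anti_join_count = conn.execute("""
SELECT COUNT(*) FROM appointment AS a
LEFT OUTER JOIN appointment AS earlier_a
  ON earlier_a.CustNo = a.CustNo
 AND earlier_a.AptDateTime < a.AptDateTime
WHERE a.AptStatus = 2
  AND a.AptDateTime > ? AND a.AptDateTime < ?
  AND earlier_a.AptNum IS NULL
""", window).fetchone()[0]

print(not_in_count, anti_join_count)  # 1 1
```

One behavioral difference to keep in mind: if the NOT IN subquery can return a NULL CustNo, the NOT IN condition is never true and the row is excluded, whereas the LEFT JOIN version is unaffected.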

Understanding why this query is slow

The query below is very slow (around 1 second), even though it only searches about 2,500 records (plus the inner-joined tables).
If I remove the ORDER BY, the query runs much faster (0.05 seconds or less).
Likewise, if I remove the nested SELECT below "# used to select where no ProfilePhoto specified", it also runs fast, but I need both of them.
I have indexes (or primary keys) on: tPhoto_PhotoID, PhotoID, p.Enabled, CustomerID, tCustomer_CustomerID, ProfilePhoto (bool), u.UserName, e.PrivateEmail, m.tUser_UserID, Enabled, Active, m.tMemberStatuses_MemberStatusID, e.tCustomerMembership_MembershipID, e.DateCreated
(Do I have too many indexes? My understanding is that I should add them on any column I use in a WHERE or ON clause.)
The query:
SELECT e.CustomerID,
e.CustomerName,
e.Location,
SUBSTRING_INDEX(e.CustomerProfile,' ', 25) AS Description,
IFNULL(p.PhotoURL, PhotoTable.PhotoURL) AS PhotoURL
FROM tCustomer e
LEFT JOIN (tCustomerPhoto ep INNER JOIN tPhoto p ON (ep.tPhoto_PhotoID = p.PhotoID AND p.Enabled=1))
ON e.CustomerID = ep.tCustomer_CustomerID AND ep.ProfilePhoto = 1
# used to select where no ProfilePhoto specified
LEFT JOIN ((SELECT pp.PhotoURL, epp.tCustomer_CustomerID
FROM tPhoto pp
LEFT JOIN tCustomerPhoto epp ON epp.tPhoto_PhotoID = pp.PhotoID
GROUP BY epp.tCustomer_CustomerID) AS PhotoTable) ON e.CustomerID = PhotoTable.tCustomer_CustomerID
INNER JOIN tUser u ON u.UserName = e.PrivateEmail
INNER JOIN tmembers m ON m.tUser_UserID = u.UserID
WHERE e.Enabled=1
AND e.Active=1
AND m.tMemberStatuses_MemberStatusID = 2
AND e.tCustomerMembership_MembershipID != 6
ORDER BY e.DateCreated DESC
LIMIT 12
I have similar queries that run much faster.
Any opinions would be appreciated.

Until there is more clarity on how this query differs from your other, faster queries, run EXPLAIN {YourSelectQuery} in the MySQL client and use the plan it shows to find ways to improve performance.
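One thing EXPLAIN can reveal for a query that is fast without its ORDER BY is an explicit sort step (MySQL reports it as "Using filesort" in the Extra column). The sketch below uses Python's sqlite3 and SQLite's EXPLAIN QUERY PLAN as a stand-in, simplified to tCustomer alone: with a composite index whose equality columns come first and the sort column last, the rows can be read in DateCreated order and the sort step disappears. The index name is hypothetical, and whether a similar index helps the real multi-table query is an assumption to verify with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tCustomer (
    CustomerID INTEGER PRIMARY KEY,
    Enabled INTEGER, Active INTEGER, DateCreated TEXT)""")

query = """SELECT CustomerID FROM tCustomer
WHERE Enabled = 1 AND Active = 1
ORDER BY DateCreated DESC LIMIT 12"""

def plan(sql):
    """Return the human-readable 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

before = plan(query)  # expected to include a 'USE TEMP B-TREE FOR ORDER BY' sort step

# Hypothetical composite index: equality columns first, then the sort column.
conn.execute(
    "CREATE INDEX idx_customer_enabled_active_created "
    "ON tCustomer (Enabled, Active, DateCreated)"
)

after = plan(query)  # rows can now be read from the index already in DateCreated order

print(before)
print(after)
```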

Combine four queries into one

I have inherited a legacy code base, along with a database (which I cannot modify), and I got stuck on these queries while refactoring the code. There are four separate queries that I am trying to combine into one, if possible. I will supply the table schemas if needed. Also, if you think this cannot be solved in one query, please explain why.
These are the queries (it is about sports). The final purpose is to get all cups that the given user has created and/or joined.
The cups table holds the created cups, their creator, etc.
The joined table holds the users that have joined a specific cup and the place (rank) they finished in.
The registered_cups table holds the names of the cups.
The first one gets all cups that the user has created:
SELECT cup_id FROM cups WHERE user_id = 'givenUser' AND cup_type <> 6
The second one gets all cups that the user has joined:
SELECT joined.cup_id, joined.cup_rank FROM joined LEFT JOIN cups USING (cup_id) WHERE joined.user_id = 'givenUser' AND cups.cup_type <> 6
Next, the results from the first one are put into a comma-separated string, which is then supplied to the remaining queries. The third one reads information for the selected cups:
SELECT cup_id, user_id, status FROM joined WHERE cup_id IN (cupList)
And the last one gets the names of the cups and their editions, and orders them:
SELECT name, cup_id, edition, cups.user_id FROM cups LEFT JOIN registered_cups USING(register_id) WHERE cup_id IN (cupList) ORDER BY name ASC, edition DESC
As you can see, there is a lot of repetition and unnecessary work, so this is what I came up with:
SELECT cups.cup_id, cups.edition, cups.user_id, joined.cup_rank, registered_cups.name
FROM cups, joined, registered_cups
WHERE cups.cup_id = joined.cup_id
AND cups.register_id = registered_cups.register_id
AND joined.user_id = '308288'
AND cups.cup_type <>6
ORDER BY registered_cups.name ASC , cups.edition DESC
The problem with my query is that I only get the cups that the user joined, but a user may also have created a cup without taking part in it. That's why the first two queries exist, but I don't know how to combine them successfully. I hope you understand what I'm trying to achieve.
Update:
Here is a fiddle of the schema with some sample data. The user in question is user_id = 133. The query should select all of the rows (21 in total): one of them is a cup that the user organized but did not join, and the user both organized and joined all of the others.

There are a couple of ways you could do it. You could use an outer join like this:
SELECT cups.cup_id, cups.edition, cups.user_id, joined.cup_rank, registered_cups.name
FROM cups INNER JOIN registered_cups ON cups.register_id = registered_cups.register_id
LEFT OUTER JOIN joined ON cups.cup_id = joined.cup_id AND joined.user_id = '308288'
WHERE cups.cup_type <>6
AND (joined.user_id IS NOT NULL
OR cups.user_id = '308288')
ORDER BY registered_cups.name ASC , cups.edition DESC
or a union like this:
SELECT cups.cup_id, cups.edition, cups.user_id, joined.cup_rank, registered_cups.name
FROM cups, joined, registered_cups
WHERE cups.cup_id = joined.cup_id
AND cups.register_id = registered_cups.register_id
AND joined.user_id = '308288'
AND cups.cup_type <>6
UNION
SELECT cups.cup_id, cups.edition, cups.user_id, null, registered_cups.name
FROM cups, registered_cups
WHERE cups.register_id = registered_cups.register_id
AND cups.user_id = '308288'
AND cups.cup_type <>6
AND NOT EXISTS
(SELECT 1
FROM joined
WHERE cups.user_id = joined.user_id AND cups.cup_id = joined.cup_id)
ORDER BY name ASC, edition DESC
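To check that the LEFT OUTER JOIN version and the UNION version return the same cups, here is a minimal sketch with Python's sqlite3 standing in for MySQL. The table and column names come from the question; the column types and sample rows are invented. The first SELECT of the UNION gives its columns aliases so that the UNION's ORDER BY can refer to them by name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE registered_cups (register_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cups (cup_id INTEGER PRIMARY KEY, user_id INTEGER, cup_type INTEGER,
                   register_id INTEGER, edition INTEGER);
CREATE TABLE joined (cup_id INTEGER, user_id INTEGER, cup_rank INTEGER);
INSERT INTO registered_cups VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO cups VALUES
    (10, 133, 1, 1, 1),  -- created and joined by 133
    (11, 133, 1, 2, 1),  -- created by 133, not joined
    (12, 200, 1, 2, 2),  -- created by someone else, joined by 133
    (13, 200, 1, 1, 2),  -- unrelated to 133
    (14, 133, 6, 1, 3);  -- cup_type 6 is excluded
INSERT INTO joined VALUES (10, 133, 1), (12, 133, 3), (13, 300, 1);
""")

user = 133

# Variant 1: LEFT OUTER JOIN, keep cups the user joined OR created.
outer_rows = conn.execute("""
SELECT cups.cup_id, cups.edition, cups.user_id, joined.cup_rank, registered_cups.name
FROM cups
INNER JOIN registered_cups ON cups.register_id = registered_cups.register_id
LEFT OUTER JOIN joined ON cups.cup_id = joined.cup_id AND joined.user_id = ?
WHERE cups.cup_type <> 6
  AND (joined.user_id IS NOT NULL OR cups.user_id = ?)
ORDER BY registered_cups.name ASC, cups.edition DESC
""", (user, user)).fetchall()

# Variant 2: UNION of "joined" cups and "created but not joined" cups.
union_rows = conn.execute("""
SELECT cups.cup_id AS cup_id, cups.edition AS edition, cups.user_id AS user_id,
       joined.cup_rank AS cup_rank, registered_cups.name AS name
FROM cups, joined, registered_cups
WHERE cups.cup_id = joined.cup_id
  AND cups.register_id = registered_cups.register_id
  AND joined.user_id = ? AND cups.cup_type <> 6
UNION
SELECT cups.cup_id, cups.edition, cups.user_id, NULL, registered_cups.name
FROM cups, registered_cups
WHERE cups.register_id = registered_cups.register_id
  AND cups.user_id = ? AND cups.cup_type <> 6
  AND NOT EXISTS (SELECT 1 FROM joined
                  WHERE joined.user_id = cups.user_id AND joined.cup_id = cups.cup_id)
ORDER BY name ASC, edition DESC
""", (user, user)).fetchall()

print(outer_rows)
```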

I assume you have a users table and that cups are created before they can be joined (which seems logical).
SELECT
c.cup_id,
u.user_id,
c.edition,
j.status,
j.cup_rank,
cr.user_id IS NOT NULL created,
r.name
FROM
(SELECT 133 user_id) u
/* semi cross join the cups table */
JOIN
cups c
ON u.user_id = 133 AND c.cup_type <> 6
JOIN
registered_cups r
ON c.register_id = r.register_id
LEFT JOIN
joined j
ON c.cup_id = j.cup_id AND u.user_id = j.user_id
LEFT JOIN
cups cr
ON c.cup_id = cr.cup_id AND u.user_id = cr.user_id
WHERE j.user_id IS NOT NULL OR cr.user_id IS NOT NULL
ORDER BY r.name, c.edition DESC
Maybe I have some column names wrong or from the wrong table, but the idea stays the same.

Convert SQL WHERE IN to JOIN

I have a database storing various information about fictional people. There is a table person with general information, such as name and address, and some more specific tables holding the health history and education of each person.
What I'm trying to do now is find possible connections for one person based on similarities, such as being at the same school at the same time, having the same doctor, or being treated in the same hospital at the same time.
The following query works (:id is the id of the person in question), but it is horribly slow (it takes about 6 seconds to return a result).
SELECT person.p_id as id, fname, lname, image FROM person WHERE
(person.p_id IN (
SELECT patient from health_case WHERE
doctor IN (SELECT doctor FROM health_case WHERE patient =:id )
OR center IN (SELECT hc2.center FROM health_case as hc1, health_case as hc2 WHERE hc1.patient = :id AND hc2.center = hc1.center AND (hc1.start <= hc2.end AND hc1.end >= hc2.start)))
OR person.p_id IN (
SELECT ed2.pupil FROM education as ed1, education as ed2 WHERE
ed1.school IN (SELECT school FROM education WHERE pupil = :id) AND ed2.school = ed1.school AND (ed2.start <= ed1.end AND ed2.end >= ed1.start)
))
AND person.p_id != :id
What would be the best approach to convert it to use JOIN clauses? I somehow seem unable to wrap my head around these...

I think I understand what you're trying to do. There is more than one way to skin a cat, but may I suggest splitting your query into two separate queries and replacing the complicated WHERE clause with a couple of inner joins? Something like this:
/* Find connections based on health care */
SELECT DISTINCT p2.p_id as id, p2.fname, p2.lname, p2.image
FROM person p
JOIN health_case hc on hc.patient = p.p_id
JOIN health_case hc2 on hc2.patient <> hc.patient and (hc2.doctor = hc.doctor or (hc2.center = hc.center and hc.start <= hc2.end and hc.end >= hc2.start))
JOIN person p2 on p2.p_id = hc2.patient and p2.p_id <> p.p_id
WHERE p.p_id = :id
Then, create a separate query to get connections based on education:
/* Find connections based on education */
SELECT DISTINCT p2.p_id as id, p2.fname, p2.lname, p2.image
FROM person p
JOIN education e on e.pupil = p.p_id
JOIN education e2 on e2.school = e.school and e2.start <= e.end AND e2.end >= e.start and e.pupil <> e2.pupil
JOIN person p2 on p2.p_id = e2.pupil and p2.p_id <> p.p_id
WHERE p.p_id = :id
If you want the results combined, you can use UNION, since both queries return the same columns from the person table; UNION also removes people who appear in both result sets.
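The date condition in these self-joins is the usual interval-overlap test: two periods [start1, end1] and [start2, end2] overlap exactly when start1 <= end2 AND end1 >= start2. A minimal sketch of the education self-join, with Python's sqlite3 standing in for MySQL (backtick quoting works in both) and invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE education (pupil INTEGER, school TEXT, `start` TEXT, `end` TEXT);
INSERT INTO education VALUES
    (1, 'North High', '2001-09-01', '2005-06-30'),  -- the person we query for
    (2, 'North High', '2003-09-01', '2007-06-30'),  -- overlaps with pupil 1
    (3, 'North High', '2006-09-01', '2010-06-30'),  -- same school, later years
    (4, 'South High', '2001-09-01', '2005-06-30');  -- same years, other school
""")

connected = [row[0] for row in conn.execute("""
SELECT DISTINCT e2.pupil
FROM education e
JOIN education e2
  ON e2.school = e.school
 AND e2.`start` <= e.`end` AND e2.`end` >= e.`start`  -- periods overlap
 AND e2.pupil <> e.pupil
WHERE e.pupil = ?
ORDER BY e2.pupil
""", (1,))]

print(connected)  # [2]
```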

It depends on your SQL engine. Newer engines with reasonable query optimizers will most likely produce the same plan for the IN and JOIN versions; typically, an IN subquery is rewritten internally as a join.
In simpler SQL engines without good query optimizers, the join should be faster, because they may materialize the subquery into a temporary in-memory table before running the outer query.
In some SQL engines with a limited memory footprint, however, the subquery may be faster, because it avoids a join, which produces more intermediate data.
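One practical difference to keep in mind when converting IN to JOIN: IN behaves as a semi-join, so each outer row appears at most once, whereas a plain JOIN repeats the outer row for every matching inner row. A minimal sketch with Python's sqlite3 as a stand-in and invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (p_id INTEGER PRIMARY KEY, fname TEXT);
CREATE TABLE health_case (patient INTEGER, doctor TEXT);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO health_case VALUES (1, 'Dr A'), (2, 'Dr A'), (2, 'Dr A'), (3, 'Dr B');
""")

def ids(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

# IN: each person appears once, however many matching cases they have.
in_ids = ids("""SELECT p_id FROM person
WHERE p_id IN (SELECT patient FROM health_case WHERE doctor = 'Dr A')
ORDER BY p_id""")

# Plain JOIN: Bob appears once per matching health_case row.
join_ids = ids("""SELECT p.p_id FROM person p
JOIN health_case hc ON hc.patient = p.p_id
WHERE hc.doctor = 'Dr A'
ORDER BY p.p_id""")

# DISTINCT makes the JOIN return the same set as IN.
distinct_ids = ids("""SELECT DISTINCT p.p_id FROM person p
JOIN health_case hc ON hc.patient = p.p_id
WHERE hc.doctor = 'Dr A'
ORDER BY p.p_id""")

print(in_ids, join_ids, distinct_ids)  # [1, 2] [1, 2, 2] [1, 2]
```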
