I've looked all over, and unfortunately, I can't seem to figure out what I'm doing wrong. I'm developing a personal financial management application that uses a MySQL server. For this problem, I have 4 tables I'm working with.
The TRANSACTIONS table contains columns CATID and BILLID which refer to primary keys in the SECONDARYCATEGORIES and BILLS tables. Both the TRANSACTIONS and BILLS tables have a column PCATID which refers to a primary key in the PRIMARYCATEGORIES table.
I'm building a SQL query that sums an "amount" column in the TRANSACTIONS table and returns the primary key from PCATID and the sum from all records that are associated with that value. If the BILLID is set to -1, it should find the PCATID in SECONDARYCATEGORIES where SECONDARYCATEGORIES.ID = TRANSACTIONS.CATID, otherwise (since -1 indicates this is NOT a bill), it should find the PCATID from the BILL record where BILLS.ID matches TRANSACTIONS.BILLID.
I'm looking for something like this (not valid SQL, obviously):
SELECT
SECONDARYCATEGORIES.PCATID,
SUM(TRANSACTIONS.AMOUNT)
FROM
TRANSACTIONS
IF (BILLID = -1) JOIN SECONDARYCATEGORIES ON SECONDARYCATEGORIES.ID = TRANSACTIONS.CATID
ELSE JOIN SECONDARYCATEGORIES ON SECONDARYCATEGORIES.ID = BILLS.CATID WHERE BILLS.ID = TRANSACTIONS.BILLID
I have tried a myriad of different JOINs, IF statements, etc, and I just can't seem to make this work. I had thought of breaking this up into different SQL queries based on the value of BILLID, and summing the values, but I'd really like to do this all in one SQL query if possible.
I know I'm missing something obvious here; any help is very much appreciated.
Edit: I forgot to describe the BILLS table. It contains a primary key (ID), a primary category (PCATID), and some descriptive data.
You can use OR in your JOIN, like this:
SELECT S.PCATID,
SUM(T.AMOUNT)
FROM TRANSACTIONS T
LEFT JOIN BILLS ON BILLS.ID = T.BILLID
JOIN SECONDARYCATEGORIES S ON (S.ID = T.CATID AND T.BILLID = -1)
OR (S.ID = BILLS.CATID AND BILLS.ID = T.BILLID)
GROUP BY S.PCATID
You can also use COALESCE and CASE in your JOINs.
SELECT COALESCE(s.PCATID, b.PCATID) AS ID
,SUM(t.AMOUNT) AS Total
FROM TRANSACTIONS t
LEFT JOIN BILLS b ON b.ID = CASE WHEN t.BILLID <> -1 THEN t.BILLID END
LEFT JOIN SECONDARYCATEGORIES s ON s.ID = CASE WHEN t.BILLID = -1 THEN t.CATID END
GROUP BY COALESCE(s.PCATID, b.PCATID)
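To sanity-check the COALESCE/CASE approach, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The tables are cut down to the columns named in the question, it assumes (per the question) that BILLS carries its own PCATID, and all sample rows are made up.

```python
import sqlite3

# Hypothetical, minimal versions of the question's tables; sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SECONDARYCATEGORIES (ID INTEGER PRIMARY KEY, PCATID INTEGER);
CREATE TABLE BILLS (ID INTEGER PRIMARY KEY, PCATID INTEGER);
CREATE TABLE TRANSACTIONS (ID INTEGER PRIMARY KEY, CATID INTEGER,
                           BILLID INTEGER, AMOUNT REAL);
INSERT INTO SECONDARYCATEGORIES VALUES (10, 1), (11, 2);
INSERT INTO BILLS VALUES (100, 2), (101, 3);
INSERT INTO TRANSACTIONS VALUES
    (1, 10, -1, 5.0),    -- not a bill: category 10 -> PCATID 1
    (2, 11, -1, 7.0),    -- not a bill: category 11 -> PCATID 2
    (3, 10, 100, 20.0),  -- bill 100 -> PCATID 2 (CATID is ignored)
    (4, 10, 101, 30.0);  -- bill 101 -> PCATID 3
""")

# Each LEFT JOIN only matches when its CASE yields a non-NULL key,
# so exactly one of s / b is populated for each transaction.
QUERY = """
SELECT COALESCE(s.PCATID, b.PCATID) AS PCATID, SUM(t.AMOUNT) AS Total
FROM TRANSACTIONS t
LEFT JOIN BILLS b
       ON b.ID = CASE WHEN t.BILLID <> -1 THEN t.BILLID END
LEFT JOIN SECONDARYCATEGORIES s
       ON s.ID = CASE WHEN t.BILLID = -1 THEN t.CATID END
GROUP BY COALESCE(s.PCATID, b.PCATID)
ORDER BY 1
"""
totals = conn.execute(QUERY).fetchall()
print(totals)  # [(1, 5.0), (2, 27.0), (3, 30.0)]
```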
I'd use UNION ALL to combine the two cases: the first branch handles non-bill transactions, the second joins BILLS to reach the category, and the outer query sums per PCATID.
SELECT X.PCATID
, SUM(X.AMOUNT)
FROM (
SELECT SECONDARYCATEGORIES.PCATID, TRANSACTIONS.AMOUNT
FROM TRANSACTIONS
JOIN SECONDARYCATEGORIES ON SECONDARYCATEGORIES.ID = TRANSACTIONS.CATID
WHERE TRANSACTIONS.BILLID = -1
UNION ALL
SELECT SECONDARYCATEGORIES.PCATID, TRANSACTIONS.AMOUNT
FROM TRANSACTIONS
JOIN BILLS ON BILLS.ID = TRANSACTIONS.BILLID
JOIN SECONDARYCATEGORIES ON SECONDARYCATEGORIES.ID = BILLS.CATID
WHERE TRANSACTIONS.BILLID <> -1
) AS X
GROUP BY X.PCATID
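One thing to watch when combining the two branches: plain UNION removes duplicate rows, so a bill transaction and a non-bill transaction that share a PCATID and an AMOUNT would be counted only once. A small sketch with SQLite standing in for MySQL; the tables nonbill_rows and bill_rows are hypothetical placeholders for the output of each branch.

```python
import sqlite3

# Made-up rows: one non-bill and one bill transaction that both land in
# primary category 1 with the same amount.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nonbill_rows (PCATID INTEGER, AMOUNT REAL);
CREATE TABLE bill_rows    (PCATID INTEGER, AMOUNT REAL);
INSERT INTO nonbill_rows VALUES (1, 5.0);
INSERT INTO bill_rows    VALUES (1, 5.0);
""")

def category_totals(set_operator: str) -> list:
    """Combine both branches with the given set operator, then sum per PCATID."""
    sql = f"""
        SELECT PCATID, SUM(AMOUNT)
        FROM (SELECT PCATID, AMOUNT FROM nonbill_rows
              {set_operator}
              SELECT PCATID, AMOUNT FROM bill_rows) AS combined
        GROUP BY PCATID
    """
    return conn.execute(sql).fetchall()

print(category_totals("UNION"))      # [(1, 5.0)]  one row silently dropped
print(category_totals("UNION ALL"))  # [(1, 10.0)]
```

UNION ALL keeps every row, which is what a per-category SUM needs.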
When I am running a query on MySQL database, it is taking around 3 sec. When we execute the performance testing for 50 concurrent users, then the same query is taking 120 sec.
The query joins multiple tables with an order by clause and a limit condition.
We are using RDS instance (16 GB memory, 4 vCPU).
Can anyone suggest how to improve the performance in this case?
Query:
SELECT
person0_.person_id AS person_i1_131_,
person0_.uuid AS uuid2_131_,
person0_.gender AS gender3_131_,
CASE
WHEN
EXISTS( SELECT * FROM patient p WHERE p.patient_id = person0_.person_id)
THEN 1
ELSE 0
END AS formula1_,
CASE
WHEN person0_1_.patient_id IS NOT NULL THEN 1
WHEN person0_.person_id IS NOT NULL THEN 0
END AS clazz_
FROM
person person0_
LEFT OUTER JOIN
patient person0_1_ ON person0_.person_id = person0_1_.patient_id
INNER JOIN
person_attribute attributes1_ ON person0_.person_id = attributes1_.person_id
CROSS JOIN
person_attribute_type personattr2_
WHERE
attributes1_.person_attribute_type_id = personattr2_.person_attribute_type_id
AND personattr2_.name = 'PersonImageAttribute'
AND (person0_.person_id IN (SELECT
person3_.person_id
FROM
person person3_
INNER JOIN
person_attribute attributes4_ ON person3_.person_id = attributes4_.person_id
CROSS JOIN
person_attribute_type personattr5_
WHERE
attributes4_.person_attribute_type_id = personattr5_.person_attribute_type_id
AND personattr5_.name = 'LocationAttribute'
AND (attributes4_.value IN ('d31fe20e-6736-42ff-a3ed-b3e622e80842'))))
ORDER BY person0_1_.date_changed , person0_1_.patient_id
LIMIT 25
There appear to be some redundant query components, and the CROSS JOIN does not seem appropriate when you are relating specific patient and/or attribute info.
Your query derives "clazz_" from patient_id being NOT NULL, but then also tests person_id for NOT NULL. Under what condition would the person_id coming from the person table EVER be null? That sounds like a KEY ID and would NEVER be null, so why test for it? It seems to be a duplicate field that, in essence, just indicates whether a person is actually a patient or not.
The following query SHOULD return the same results. I'd also make sure the following indexes are available:
table index
person ( person_id )
person_attribute ( person_id, person_attribute_type_id )
person_attribute_type ( person_attribute_type_id, name )
patient ( patient_id )
select
p1.person_id AS person_i1_131_,
p1.uuid AS uuid2_131_,
p1.gender AS gender3_131_,
CASE WHEN p2.patient_id IS NULL
then 0 else 1 end formula1_,
-- appears to be a redundant result, just trying to qualify
-- some specific column value for later calculations.
CASE WHEN p2.patient_id IS NULL
THEN 0 else 1 end clazz_
from
-- pre-get only those people based on the P4 attribute in question
-- and attribute type of location. Get small list vs everything else
( SELECT distinct
pa.person_id
FROM
person_attribute pa
JOIN person_attribute_type pat
on pa.person_attribute_type_id = pat.person_attribute_type_id
AND pat.name = 'LocationAttribute'
WHERE
pa.value = 'd31fe20e-6736-42ff-a3ed-b3e622e80842' ) PQ
join person p1
on PQ.person_id = p1.person_id
LEFT JOIN patient p2
ON p1.person_id = p2.patient_id
JOIN person_attribute pa1
ON p1.person_id = pa1.person_id
JOIN person_attribute_type pat1
on pa1.person_attribute_type_id = pat1.person_attribute_type_id
AND pat1.name = 'PersonImageAttribute'
order by
p2.date_changed,
p2.patient_id
LIMIT
25
Finally, your query orders by date_changed and patient_id, which come from the PATIENT table. Because that table is LEFT JOINed, you may have many PERSON records that are not patients (NULL in both of those columns), so ORDER BY ... LIMIT 25 may not return the records you really intend. So, these are just some personal review notes on what is presented in the question.
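To illustrate the ordering concern: in MySQL (and in SQLite, used here as a stand-in), NULLs sort first in an ascending ORDER BY, so persons with no patient row fill the LIMIT before any real patient does. The two-table schema and rows below are a made-up minimal example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (person_id INTEGER PRIMARY KEY);
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, date_changed TEXT);
INSERT INTO person VALUES (1), (2), (3), (4);
-- Only persons 3 and 4 are patients.
INSERT INTO patient VALUES (3, '2020-01-01'), (4, '2020-02-01');
""")

rows = conn.execute("""
    SELECT p.person_id, pt.date_changed
    FROM person p
    LEFT JOIN patient pt ON p.person_id = pt.patient_id
    ORDER BY pt.date_changed, pt.patient_id
    LIMIT 2
""").fetchall()
# Both returned rows are non-patients (date_changed is None); the order
# between persons 1 and 2 is unspecified because their sort keys tie.
print(rows)
```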
Speeding up the query is the best hope for handling more connections.
A simplification (but no speed difference), since in MySQL TRUE=1 and FALSE=0:
CASE WHEN (boolean_expression) THEN 1 ELSE 0 END
-->
(boolean_expression)
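A quick check of the simplification, with SQLite standing in for MySQL (both return 1 or 0 for EXISTS). The person/patient rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (person_id INTEGER PRIMARY KEY);
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY);
INSERT INTO person VALUES (1), (2);
INSERT INTO patient VALUES (2);
""")

rows = conn.execute("""
    SELECT person_id,
           CASE WHEN EXISTS (SELECT 1 FROM patient pt
                             WHERE pt.patient_id = p.person_id)
                THEN 1 ELSE 0 END AS with_case,
           EXISTS (SELECT 1 FROM patient pt
                   WHERE pt.patient_id = p.person_id) AS bare
    FROM person p
    ORDER BY person_id
""").fetchall()
print(rows)  # [(1, 0, 0), (2, 1, 1)]
```

One caveat: for an expression that can evaluate to NULL (for example a comparison against a nullable column), CASE ... ELSE 0 yields 0 while the bare expression yields NULL. EXISTS is never NULL, so here the two forms are interchangeable.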
Index suggestions:
person: INDEX(patient_id, date_changed)
person_attribute: INDEX(person_attribute_type_id, person_id)
person_attribute: INDEX(person_attribute_type_id, value, person_id)
person_attribute_type: INDEX(person_attribute_type_id, name)
If value is of type TEXT, it can only be indexed with a prefix length, so the composite index containing it would not work as written.
Assuming that person has PRIMARY KEY(person_id) and patient has PRIMARY KEY(patient_id), I have no extra recommendations for them.
The Entity-Attribute-Value schema pattern, which this seems to be, is hard to optimize when there are a large number of rows. Sorry.
The CROSS JOIN seems to be just an INNER JOIN, but with the condition in the WHERE instead of in ON, where it belongs.
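A small sketch showing that the CROSS JOIN with the condition in WHERE returns the same rows as an INNER JOIN with the condition in ON. In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents; SQLite is used here only so the example runs anywhere, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_attribute_type (person_attribute_type_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person_attribute (person_id INTEGER, person_attribute_type_id INTEGER, value TEXT);
INSERT INTO person_attribute_type VALUES (1, 'PersonImageAttribute'), (2, 'LocationAttribute');
INSERT INTO person_attribute VALUES (10, 1, 'img.png'), (10, 2, 'loc-a'), (11, 2, 'loc-b');
""")

# Condition in WHERE, as in the question.
cross_sql = """
    SELECT pa.person_id, pa.value
    FROM person_attribute pa
    CROSS JOIN person_attribute_type pat
    WHERE pa.person_attribute_type_id = pat.person_attribute_type_id
      AND pat.name = 'LocationAttribute'
    ORDER BY pa.person_id
"""
# Same condition moved into ON, where it belongs.
inner_sql = """
    SELECT pa.person_id, pa.value
    FROM person_attribute pa
    INNER JOIN person_attribute_type pat
            ON pa.person_attribute_type_id = pat.person_attribute_type_id
           AND pat.name = 'LocationAttribute'
    ORDER BY pa.person_id
"""
cross_rows = conn.execute(cross_sql).fetchall()
inner_rows = conn.execute(inner_sql).fetchall()
print(cross_rows == inner_rows, inner_rows)  # True [(10, 'loc-a'), (11, 'loc-b')]
```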
person0_1_.patient_id can be NULL because of the LEFT JOIN, but I don't see how person0_.person_id can be NULL. Please check your logic.
I have the following MySQL SELECT statement that was working OK on a small data set but died when the volume was increased:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId, Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate,Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1, Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country, BookingAccommodation.AccomId, BookingAccommodation.ShareType, BookingProgram.ProgramId, Programs.ProgramDesc
FROM Bookings, Clients, BookingProgram, BookingAccommodation, Countries, ClientType, Programs
WHERE Bookings.BookingId = BookingProgram.BookingId
AND Bookings.BookingId = BookingAccommodation.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND BookingAccommodation.Nights > 0
AND Clients.ClientId = Bookings.ClientId
AND Clients.Email <> ''
AND Clients.CountryId = Countries.CountryId
AND Programs.ProgramId = BookingProgram.ProgramId
With around 10K records in Bookings and 25K records in each of BookingAccommodation and BookingProgram, the volume isn't huge, but the query ran in 950 seconds. I'm running the query in the SQL window of phpMyAdmin on a local MAMP server.
Splitting it into 3 queries the result comes back in a fraction of a second for each:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId, Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate, Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1, Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country
FROM Bookings, Clients, Countries, ClientType
WHERE Bookings.WeekBeginning >= '2016-10-01'
AND Clients.ClientId = Bookings.ClientId
AND Clients.Email <> ''
AND Clients.CountryId = Countries.CountryId
SELECT DISTINCT Bookings.BookingId, BookingAccommodation.AccomId, BookingAccommodation.ShareType
FROM Bookings, BookingAccommodation
WHERE Bookings.BookingId = BookingAccommodation.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND BookingAccommodation.Nights > 0
SELECT DISTINCT Bookings.BookingId, BookingProgram.ProgramId, Programs.ProgramDesc
FROM Bookings, BookingProgram, Programs
WHERE Bookings.BookingId = BookingProgram.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND Programs.ProgramId = BookingProgram.ProgramId
There are multiple records in BookingAccommodation and BookingProgram for each record in Bookings but I only require one record from each hence the SELECT DISTINCT.
The primary key on Bookings is BookingId.
The primary key on BookingAccommodation is BookingId, AccomDate, AccomId
The primary key on BookingProgram is BookingId, ProgramId, AccomType
I've tried to rewrite the query with joins and sub queries but I'm obviously not doing it right. How can I join these 3 queries back into a single query that will perform well?
These are the basics of using subqueries instead of joins (MySQL assumed FWIW). Apologies for pseudocode, I thought it important to answer ASAP as this is one of the top hits on this issue I faced just now.
A client makes a booking to go on a cruise ship. The client should also specify their diet (eg. vegetarian, vegan, no soy, etc). We thus have three tables:
Bookings
Booking_Id, Booking_Date, Booking_Time, Client_Id
Clients
Client_Id, Client_Name, Client_Phone, Client_DietId
Diets
Diet_Id, Diet_Name
We now want to present to the concierge a full booking view.
Using "JOINS":
SELECT Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time, Clients.Client_Name, Diets.Diet_Name
FROM Bookings
INNER JOIN Clients
ON Bookings.Client_Id = Clients.Client_Id
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id
Using "SUBQUERIES":
How I think of it is that each subquery creates a "temp table" for a separate JOIN. Of course, "temp table" may or may not be the accurate low-level implementation, but anecdotally subqueries may be faster than huge joins (there are other threads on this).
I have separate joins I want to do from the above example:
First I need to join the Clients with their Diets, then I join that "table" with Bookings.
Thus I end up with this (note the table (re)naming when referring to the subquery):
SELECT ClientDetailsWithDiets.Client_Name, ClientDetailsWithDiets.Diet_Name,
Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time
FROM
(SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
FROM Clients
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id)
AS ClientDetailsWithDiets
INNER JOIN Bookings
ON Bookings.Client_Id = ClientDetailsWithDiets.Client_Id
Now if another table is to be joined, say the Staff assigned to a particular booking, then the whole thing above would be nested, and so on, e.g.:
SELECT BookingDetailsFull.*, Staff.*
FROM
(SELECT ClientDetailsWithDiets.Client_Name, ClientDetailsWithDiets.Diet_Name,
Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time
FROM
(SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
FROM Clients
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id)
AS ClientDetailsWithDiets
INNER JOIN Bookings
ON Bookings.Client_Id = ClientDetailsWithDiets.Client_Id)
AS BookingDetailsFull
INNER JOIN Staff
ON BookingDetailsFull.Booking_Id = Staff.Booking_Id_Assigned
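A minimal sketch, using SQLite as a stand-in for MySQL and made-up rows, checking that the derived-table form returns the same rows as the plain JOIN form. It joins Bookings on Client_Id; the booking ids are deliberately different from the client ids, so joining on Booking_Id instead returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Diets    (Diet_Id INTEGER PRIMARY KEY, Diet_Name TEXT);
CREATE TABLE Clients  (Client_Id INTEGER PRIMARY KEY, Client_Name TEXT,
                       Client_Phone TEXT, Client_DietId INTEGER);
CREATE TABLE Bookings (Booking_Id INTEGER PRIMARY KEY, Booking_Date TEXT,
                       Booking_Time TEXT, Client_Id INTEGER);
INSERT INTO Diets VALUES (1, 'vegan'), (2, 'no soy');
INSERT INTO Clients VALUES (7, 'Ann', '555-0100', 1), (8, 'Bob', '555-0101', 2);
-- Booking ids deliberately differ from client ids.
INSERT INTO Bookings VALUES (100, '2024-05-01', '10:00', 7),
                            (101, '2024-05-02', '11:00', 8);
""")

join_rows = conn.execute("""
    SELECT b.Booking_Id, c.Client_Name, d.Diet_Name
    FROM Bookings b
    INNER JOIN Clients c ON b.Client_Id = c.Client_Id
    INNER JOIN Diets d   ON c.Client_DietId = d.Diet_Id
    ORDER BY b.Booking_Id
""").fetchall()

def via_subquery(booking_key: str) -> list:
    """Derived-table form; booking_key picks the Bookings column used in ON."""
    return conn.execute(f"""
        SELECT Bookings.Booking_Id, ClientDetailsWithDiets.Client_Name,
               ClientDetailsWithDiets.Diet_Name
        FROM (SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
              FROM Clients
              INNER JOIN Diets ON Clients.Client_DietId = Diets.Diet_Id)
             AS ClientDetailsWithDiets
        INNER JOIN Bookings
                ON Bookings.{booking_key} = ClientDetailsWithDiets.Client_Id
        ORDER BY Bookings.Booking_Id
    """).fetchall()

subquery_rows = via_subquery("Client_Id")
wrong_key_rows = via_subquery("Booking_Id")
print(join_rows == subquery_rows, subquery_rows)
print(wrong_key_rows)  # []
```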
Try changing it to:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId,
Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate,
Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1,
Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country,
BookingAccommodation.AccomId, BookingAccommodation.ShareType, BookingProgram.ProgramId,
Programs.ProgramDesc
FROM Bookings
JOIN Clients ON Clients.ClientId = Bookings.ClientId AND Bookings.WeekBeginning >= '2016-10-01' AND Clients.Email <> ''
JOIN BookingProgram ON Bookings.BookingId = BookingProgram.BookingId
JOIN BookingAccommodation ON Bookings.BookingId = BookingAccommodation.BookingId AND BookingAccommodation.Nights > 0
JOIN Countries ON Clients.CountryId = Countries.CountryId
JOIN Programs ON Programs.ProgramId = BookingProgram.ProgramId
WHERE Bookings.WeekBeginning >= '2016-10-01';
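If this does not get you the results you want, run EXPLAIN and look at the query plan.
Please note: I didn't see table ClientType being used anywhere, so I did not include it in the JOINs. Listed in FROM with no join condition, it multiplies every row by the number of rows in ClientType.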
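To show why a table listed in FROM without a join condition matters, here is a sketch with SQLite standing in for MySQL. The schemas are cut down to join keys (ClientTypeId is a hypothetical column) and the rows are made up: every booking row is repeated once per ClientType row. In the original query that multiplication stacks on top of the one-to-many joins to BookingAccommodation and BookingProgram, and DISTINCT then has to collapse it all back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Bookings   (BookingId INTEGER PRIMARY KEY, ClientId INTEGER);
CREATE TABLE Clients    (ClientId INTEGER PRIMARY KEY);
CREATE TABLE ClientType (ClientTypeId INTEGER PRIMARY KEY);
INSERT INTO Clients VALUES (1), (2);
INSERT INTO Bookings VALUES (10, 1), (11, 2), (12, 1);
INSERT INTO ClientType VALUES (1), (2), (3), (4), (5);
""")

def row_count(sql: str) -> int:
    """Count the rows a query produces before any DISTINCT is applied."""
    return conn.execute(f"SELECT COUNT(*) FROM ({sql})").fetchone()[0]

# ClientType listed in FROM with no join condition: a cartesian product.
with_stray = row_count("""
    SELECT Bookings.BookingId FROM Bookings, Clients, ClientType
    WHERE Clients.ClientId = Bookings.ClientId
""")
without = row_count("""
    SELECT Bookings.BookingId FROM Bookings
    JOIN Clients ON Clients.ClientId = Bookings.ClientId
""")
print(with_stray, without)  # 15 3
```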
Rather than spend more time trying to improve the select statement as it hits so many tables I opted to split it into the separate queries as I outlined in the original question.
In the end this was the quickest practical solution.
I have this MySQL query to get the total amount of only the first invoice for each client on a given month:
SELECT SUM(InvoiceProductTotal)
FROM tblinvoiceproduct
WHERE InvoiceID IN (
SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
FROM tblinvoice
WHERE tblinvoice.ClientID IN (
SELECT tblclient.ClientID
FROM tblclient
LEFT JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE NOT tblclient.EnquiryID IS NULL
AND YEAR(EnquiryDate) = 2014
AND MONTH(EnquiryDate) = 9
)
GROUP BY tblinvoice.ClientID
);
When I run it, it seems to loop forever. If I remove the first part, it gives me the list of invoices instantly. I'm sure it is a small syntax detail, but I haven't been able to figure out what the problem is after nearly an hour of trying to fix it.
Your assistance is appreciated.
This query can probably be done in a better way without all the subqueries as well; I'm just not very experienced with subqueries. :)
A solution was given, but I should have included the full query rather than just the part I was having trouble with. The full query is:
SELECT AdvertisingID, AdvertisingTitle, AdvertisingYear,
AdvertisingMonth, AdvertisingTotal, AdvertisingVisitors,
IFNULL(
(SELECT SUM(InvoiceProductTotal)
FROM tblinvoiceproduct
JOIN
(SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
FROM tblinvoice
JOIN
(SELECT DISTINCT tblclient.ClientID
FROM tblclient
JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE YEAR(tblenquiry.EnquiryDate)=tbladvertising.AdvertisingYear
AND MONTH(tblenquiry.EnquiryDate)=tbladvertising.AdvertisingMonth)
AS inq
ON tblinvoice.ClientID = inq.ClientID
GROUP BY tblinvoice.ClientID) AS inq2
ON tblinvoiceproduct.InvoiceID = inq2.InvoiceID)
, 0)
FROM tbladvertising
ORDER BY AdvertisingYear DESC, AdvertisingMonth DESC, AdvertisingTitle;
Now the problem is that the column with the sub query has no access to "tbladvertising.AdvertisingYear" or "tbladvertising.AdvertisingMonth"
A commenter mentioned that it's hard to understand what you're trying to do here. I agree. But I will take the risk of trying to puzzle it out.
As usual with this sort of query, it's helpful to take advantage of the structured part of structured query language, and try to build this up piece by piece. That's the secret to creating complex queries that actually do what you want them to do.
Your innermost query is this:
SELECT tblclient.ClientID
FROM tblclient
LEFT JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE NOT tblclient.EnquiryID IS NULL
AND YEAR(EnquiryDate) = 2014
AND MONTH(EnquiryDate) = 9
It is saying, "give me the list of ClientID values which have enquiries in September 2014." There's a more efficient way to do this:
SELECT DISTINCT tblclient.ClientID
FROM tblclient
JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE tblenquiry.EnquiryDate >= '2014-09-01'
AND tblenquiry.EnquiryDate < '2014-09-01' + INTERVAL 1 MONTH
Two changes here: First, the NOT ... IS NULL search is unnecessary because if the item you're searching on is null, there's no way for your EnquiryDate to be valid. So we just change the LEFT JOIN to an ordinary inner JOIN and get rid of the otherwise expensive NULL scan.
Second, we recast the date matching as a range scan, so it can use an index on tblenquiry.EnquiryDate.
Cool.
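The effect of the range rewrite on index use can be seen with SQLite's EXPLAIN QUERY PLAN, used here as a stand-in for MySQL's EXPLAIN (strftime replaces YEAR()/MONTH(), and date(..., '+1 month') replaces + INTERVAL 1 MONTH). This is a sketch on a made-up table: the function-wrapped filter forces a scan, while the range filter lets the planner search the index.

```python
import sqlite3

# Hypothetical two-column stand-in for tblenquiry with an index on EnquiryDate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblenquiry (EnquiryID INTEGER PRIMARY KEY, EnquiryDate TEXT);
CREATE INDEX idx_enquiry_date ON tblenquiry (EnquiryDate);
INSERT INTO tblenquiry (EnquiryDate) VALUES
    ('2014-08-31'), ('2014-09-01'), ('2014-09-30'), ('2014-10-01');
""")

# Column wrapped in functions: the SQLite analogue of YEAR()/MONTH().
FUNCTION_SQL = """
    SELECT EnquiryID FROM tblenquiry
    WHERE strftime('%Y', EnquiryDate) = '2014'
      AND strftime('%m', EnquiryDate) = '09'
"""
# Half-open range on the bare column: [2014-09-01, 2014-10-01).
RANGE_SQL = """
    SELECT EnquiryID FROM tblenquiry
    WHERE EnquiryDate >= '2014-09-01'
      AND EnquiryDate < date('2014-09-01', '+1 month')
"""

def plan(sql: str) -> list:
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

function_rows = sorted(conn.execute(FUNCTION_SQL).fetchall())
range_rows = sorted(conn.execute(RANGE_SQL).fetchall())
function_plan, range_plan = plan(FUNCTION_SQL), plan(RANGE_SQL)
print(function_rows, function_plan)  # rows 2 and 3, plan starts with SCAN
print(range_rows, range_plan)        # rows 2 and 3, plan starts with SEARCH
```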
Next, we have this query level.
SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
FROM tblinvoice
WHERE tblinvoice.ClientID IN (
/* that list of Client IDs from the innermost query */
)
GROUP BY tblinvoice.ClientID
That is pretty straightforward. But MySQL isn't too swift with IN () clauses, so let's recast it in the form of a JOIN as follows:
SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
FROM tblinvoice
JOIN (
/* that list of Client IDs from the innermost query */
) AS inq ON tblinvoice.ClientID = inq.ClientID
GROUP BY tblinvoice.ClientID
This gets us the first (lowest) invoice ID for each distinct ClientID that had an enquiry in the month. (It's hard for me to figure out the business meaning of this, but I don't understand your business.)
Finally, we come to your outermost query. We can also recast that as a JOIN, like so.
SELECT SUM(InvoiceProductTotal)
FROM tblinvoiceproduct
JOIN (
/* that list of first-in-month invoices */
) AS inq2 ON tblinvoiceproduct.InvoiceID = inq2.InvoiceID
So, this all expands to:
SELECT SUM(InvoiceProductTotal)
FROM tblinvoiceproduct
JOIN (
SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
FROM tblinvoice
JOIN (
SELECT DISTINCT tblclient.ClientID
FROM tblclient
JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE tblenquiry.EnquiryDate >= '2014-09-01'
AND tblenquiry.EnquiryDate < '2014-09-01' + INTERVAL 1 MONTH
) AS inq ON tblinvoice.ClientID = inq.ClientID
GROUP BY tblinvoice.ClientID
) AS inq2 ON tblinvoiceproduct.InvoiceID = inq2.InvoiceID
That should do the trick for you. In summary, the big optimizing changes are
using a date range scan.
eliminating the NOT ... IS NULL criterion.
recasting your IN clauses as JOIN clauses.
The next step will be to create useful indexes. A compound index (EnquiryDate, EnquiryID) on your tblenquiry is very likely to help a lot. But to be sure you'll need to do some EXPLAIN analysis.
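A self-contained check of the expanded query against the original IN version, with SQLite standing in for MySQL (strftime and date() replace YEAR()/MONTH() and INTERVAL). Table shapes are reduced to the columns the query touches, and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblenquiry (EnquiryID INTEGER PRIMARY KEY, EnquiryDate TEXT);
CREATE TABLE tblclient  (ClientID INTEGER PRIMARY KEY, EnquiryID INTEGER);
CREATE TABLE tblinvoice (InvoiceID INTEGER PRIMARY KEY, ClientID INTEGER);
CREATE TABLE tblinvoiceproduct (InvoiceID INTEGER, InvoiceProductTotal REAL);
INSERT INTO tblenquiry VALUES (1, '2014-09-05'), (2, '2014-10-02');
INSERT INTO tblclient  VALUES (1, 1), (2, 2), (3, NULL);
-- Client 1 (September enquiry) has two invoices; only the first (10) counts.
INSERT INTO tblinvoice VALUES (10, 1), (11, 1), (12, 2), (13, 3);
INSERT INTO tblinvoiceproduct VALUES (10, 100.0), (10, 50.0), (11, 999.0),
                                     (12, 7.0), (13, 8.0);
""")

# The JOIN-based rewrite with a date range.
join_total = conn.execute("""
    SELECT SUM(InvoiceProductTotal)
    FROM tblinvoiceproduct
    JOIN (
        SELECT MIN(tblinvoice.InvoiceID) AS InvoiceID
        FROM tblinvoice
        JOIN (
            SELECT DISTINCT tblclient.ClientID
            FROM tblclient
            JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
            WHERE tblenquiry.EnquiryDate >= '2014-09-01'
              AND tblenquiry.EnquiryDate < date('2014-09-01', '+1 month')
        ) AS inq ON tblinvoice.ClientID = inq.ClientID
        GROUP BY tblinvoice.ClientID
    ) AS inq2 ON tblinvoiceproduct.InvoiceID = inq2.InvoiceID
""").fetchone()[0]

# The original nested-IN shape, for comparison.
in_total = conn.execute("""
    SELECT SUM(InvoiceProductTotal) FROM tblinvoiceproduct
    WHERE InvoiceID IN (
        SELECT MIN(InvoiceID) FROM tblinvoice
        WHERE ClientID IN (
            SELECT tblclient.ClientID FROM tblclient
            LEFT JOIN tblenquiry ON tblclient.EnquiryID = tblenquiry.EnquiryID
            WHERE NOT tblclient.EnquiryID IS NULL
              AND strftime('%Y', EnquiryDate) = '2014'
              AND strftime('%m', EnquiryDate) = '09')
        GROUP BY ClientID)
""").fetchone()[0]
print(join_total, in_total)  # 150.0 150.0
```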
What if you modify the query you posted to replace the IN subqueries with JOINs (INNER JOIN), as below? Give it a try.
SELECT SUM(InvoiceProductTotal)
FROM tblinvoiceproduct
JOIN
(
SELECT MIN(ti.InvoiceID) as MinInvoice
FROM tblinvoice ti
JOIN
(
SELECT tblclient.ClientID
FROM tblclient
LEFT JOIN tblenquiry
ON tblclient.EnquiryID = tblenquiry.EnquiryID
WHERE NOT tblclient.EnquiryID IS NULL
AND YEAR(EnquiryDate) = 2014
AND MONTH(EnquiryDate) = 9
) tab
on ti.ClientID = tab.ClientID
GROUP BY ti.ClientID
) tab1
on tblinvoiceproduct.InvoiceID = tab1.MinInvoice
Ok, I tried to simplify my question by abstracting away the details but I'm afraid I wasn't clear and didn't meet moderator requirements. So I will post the full query with my problem in more detail and the actual query I am struggling with. If the question is still inadequate, could you please comment with specifics about what is unclear and I will do my best to clarify.
First, here is the current query that returns all assignment rows for each bed:
SELECT
beds.bed_id,
beds.bedstatus,
beds.position as bed_position,
rooms.room_id,
rooms.room,
wings.wing_id,
wings.name as wing_name,
buildings.building_id,
buildings.name as building_name,
assignments.assignment_id,
assignments.student_id,
assignments.assign_dt,
assignments.assigned_by,
assignments.assignment_status,
assignments.expected_arrival_dt as arrival_dt,
assignments.room_charge_type,
students.first_name,
students.last_name,
meal_plans.name as meal_plan_name,
room_rates.rate_name
FROM
beds
LEFT JOIN
rooms ON (beds.room_id = rooms.room_id)
LEFT JOIN
wings ON (rooms.wing_id = wings.wing_id)
LEFT JOIN
buildings ON (wings.building_id = buildings.building_id)
LEFT JOIN assignments ON
((beds.bed_id=assignments.bed_id) AND (term_id = #term_id))
LEFT JOIN
students ON (assignments.student_id = students.student_id)
LEFT JOIN
meal_plans ON (assignments.meal_plan_id = meal_plans.meal_plan_id)
LEFT JOIN
room_rates ON (assignments.room_rate_id = room_rates.room_rate_id)
WHERE
(
(rooms.room IS NOT NULL) AND
(rooms.assignable = 1) AND
(buildings.active = 1) AND
(buildings.building_id = #building_id)
)
ORDER BY rooms.room;
The problem is that there may be multiple rows in the "assignments" table for each room distinguished by the "assignment_status" field and I want a single row for each assignment. I want to determine which assignment row to select based on the value in assignment_status. That is if the assignment status is "active", I want that row, otherwise, if there is a row with status "waiting approval" then I want that row, etc...
Barmar's suggestion is given here:
LEFT JOIN (SELECT *
FROM OtherTable
WHERE <criteria>
ORDER BY CASE status
WHEN 'Active' THEN 1
WHEN 'Waiting Approval' THEN 2
WHEN 'Canceled' THEN 3
...
END
LIMIT 1) other
This was very helpful and I attempted this approach:
SELECT
beds.bed_id,
beds.bedstatus,
beds.position as bed_position,
rooms.room_id,
rooms.room,
wings.wing_id,
wings.name as wing_name,
buildings.building_id,
buildings.name as building_name,
assign.assignment_id,
assign.student_id,
assign.assign_dt,
assign.assigned_by,
assign.assignment_status,
assign.expected_arrival_dt as arrival_dt,
assign.room_charge_type,
students.first_name,
students.last_name,
meal_plans.name as meal_plan_name,
room_rates.rate_name
FROM
beds
LEFT JOIN
rooms ON (beds.room_id = rooms.room_id)
LEFT JOIN
wings ON (rooms.wing_id = wings.wing_id)
LEFT JOIN
buildings ON (wings.building_id = buildings.building_id)
LEFT JOIN (SELECT *
FROM assignments
WHERE ((assignments.bed_id=beds.bed_id) AND (term_id = #term_id))
ORDER BY CASE assignment_status
WHEN 'Active' THEN 1
WHEN 'Waiting Approval' THEN 2
WHEN 'Canceled' THEN 3
END
LIMIT 1) assign
LEFT JOIN
students ON (assign.student_id = students.student_id)
LEFT JOIN
meal_plans ON (assign.meal_plan_id = meal_plans.meal_plan_id)
LEFT JOIN
room_rates ON (assign.room_rate_id = room_rates.room_rate_id)
WHERE
(
(rooms.room IS NOT NULL) AND
(rooms.assignable = 1) AND
(buildings.active = 1) AND
(buildings.building_id = #building_id)
)
ORDER BY rooms.room;
But I realized, the problem here is that OtherTable (assignments) is joined to the parent query based on a FK:
((beds.bed_id=assignments.bed_id) AND (term_id = #term_id))
So I can't do the subselect as the beds.bed_id isn't in scope for the subselect. So as Barmar's comment indicates the join criteria needs to be outside the subselect--but I'm having trouble figuring out how to both restrict the results to a single row per room and move the join outside the subselect. I'm wondering if travelboy's suggestion to use GROUP BY may be more fruitful, but haven't been able to determine how the grouping should be done.
Let me know if I can provide additional clarification.
Original Question:
I need from Table A to do a LEFT JOIN on a SINGLE row in another table, Table B meeting certain criteria (there may be multiple or no rows in Table B that meet the criteria). If there are multiple rows I want to select which row in B to join based on the value of a field in Table B. For example, if there is a row in B with status column='Active', I want that row, if not, if there is a row with status='Waiting Approval', I want that row, if there is a row with status='Canceled', I want that row, etc... Can I do this without a sub select? With a sub select?
Use:
LEFT JOIN (SELECT *
FROM OtherTable
WHERE <criteria>
ORDER BY CASE status
WHEN 'Active' THEN 1
WHEN 'Waiting Approval' THEN 2
WHEN 'Canceled' THEN 3
...
END
LIMIT 1) other
In some cases (but not in all cases) you can do it without a sub-select. You would need to GROUP BY a unique field in table A, typically an ID. This ensures that you get only one (or none) row from table B. However, selecting the row you want is the tricky part. You need an aggregating function such as MAX(). If the field in B is a number, that's easy to do. If not, you can apply some SQL functions on the fields in B to calculate something like a score to sort by. For example, Active could correspond to a higher value than Cancelled etc. That will work without a sub-select and likely be faster on big data sets.
With a sub-select it's easy to do. You can either use Barmar's solution, or, if you only need one specific field from B, you can also put the sub-select within the SELECT clause of the outer query.
I need to follow up with some additional testing to make sure this is accomplishing my goal--but I think I've done this using travelboy's suggestion of a group by query combined with barmar's case logic (wish I could split the answer). Here's the query:
SELECT
beds.bed_id,
beds.bedstatus,
beds.position as bed_position,
rooms.room_id,
rooms.room,
wings.wing_id,
wings.name as wing_name,
buildings.building_id,
buildings.name as building_name,
assignments.assignment_id,
assignments.student_id,
assignments.assign_dt,
assignments.assigned_by,
assignments.assignment_status,
assignments.expected_arrival_dt as arrival_dt,
assignments.room_charge_type,
MIN(CASE assignments.assignment_status
WHEN 'Active' THEN 1
WHEN 'Waiting Approval' THEN 2
WHEN 'Canceled' THEN 3
END),
students.first_name,
students.last_name,
meal_plans.name as meal_plan_name,
room_rates.rate_name
FROM
beds
LEFT JOIN
rooms ON (beds.room_id = rooms.room_id)
LEFT JOIN
wings ON (rooms.wing_id = wings.wing_id)
LEFT JOIN
buildings ON (wings.building_id = buildings.building_id)
LEFT JOIN assignments
ON ((assignments.bed_id=beds.bed_id) AND (term_id = 28))
LEFT JOIN
students ON (assignments.student_id = students.student_id)
LEFT JOIN
meal_plans ON (assignments.meal_plan_id = meal_plans.meal_plan_id)
LEFT JOIN
room_rates ON (assignments.room_rate_id = room_rates.room_rate_id)
WHERE
(
(rooms.room IS NOT NULL) AND
(rooms.assignable = 1) AND
(buildings.active = 1)
)
GROUP BY
bed_id
ORDER BY rooms.room;
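A caveat on the GROUP BY version: in MySQL, columns in the SELECT list that are neither grouped nor aggregated come from an indeterminate row of each group (and with ONLY_FULL_GROUP_BY, the default since 5.7.5, the query is rejected). MIN(CASE ...) computes the best status score but does not make the other assignment columns come from that row. One alternative that addresses the scope problem is to move the correlation into a scalar subquery inside the ON clause, where beds.bed_id is in scope. A reduced sketch with SQLite standing in for MySQL; the tables keep only the relevant columns, the ? parameter plays the role of #term_id, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE beds (bed_id INTEGER PRIMARY KEY);
CREATE TABLE assignments (assignment_id INTEGER PRIMARY KEY, bed_id INTEGER,
                          term_id INTEGER, assignment_status TEXT);
INSERT INTO beds VALUES (1), (2), (3);
INSERT INTO assignments VALUES
    (10, 1, 28, 'Canceled'),
    (11, 1, 28, 'Active'),
    (12, 2, 28, 'Canceled'),
    (13, 2, 28, 'Waiting Approval'),
    (14, 3, 27, 'Active');   -- different term: bed 3 gets no assignment
""")

# The scalar subquery picks at most one assignment_id per bed; the LEFT JOIN
# then keeps beds whose subquery returns NULL (no matching assignment).
rows = conn.execute("""
    SELECT beds.bed_id, a.assignment_id, a.assignment_status
    FROM beds
    LEFT JOIN assignments a
           ON a.assignment_id = (
                SELECT a2.assignment_id
                FROM assignments a2
                WHERE a2.bed_id = beds.bed_id
                  AND a2.term_id = ?
                ORDER BY CASE a2.assignment_status
                             WHEN 'Active' THEN 1
                             WHEN 'Waiting Approval' THEN 2
                             WHEN 'Canceled' THEN 3
                             ELSE 4
                         END,
                         a2.assignment_id
                LIMIT 1)
    ORDER BY beds.bed_id
""", (28,)).fetchall()
print(rows)  # [(1, 11, 'Active'), (2, 13, 'Waiting Approval'), (3, None, None)]
```

The ELSE 4 and the assignment_id tie-breaker are additions so that unlisted statuses rank last and the choice is deterministic. The remaining LEFT JOINs (students, meal_plans, room_rates) can hang off the alias a as before.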
The following query hangs (although the subquery performs fine when run separately):
I don't know how to make the explain table look ok. If someone tells me, I'll clean it up.
select
sum(grades.points) as p
from assignments
left join grades using (assignmentID)
where gradeID IN
(select grades.gradeID
from assignments
left join grades using (assignmentID)
where ... grades.date <= '1255503600' AND grades.date >= '984902400'
group by assignmentID order by grades.date DESC);
I think the problem is with the first grades table: the type ALL with that many rows seems to be the cause. Everything is indexed.
I uploaded the table as an image. Couldn't get the formatting right:
http://imgur.com/AjX34.png
A commenter wanted the full where clause:
explain extended select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t
from assignments left join grades using (assignmentID)
where gradeID IN
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type
where assignments.classID = '7815'
and (assignments.type = 30170 )
and grades.contactID = 7141
and grades.points REGEXP '^[-]?[0-9]+[-]?'
and grades.points != '-'
and grades.points != ''
and (grades.pointsposs IS NULL or grades.pointsposs = '')
and grades.date <= '1255503600'
AND grades.date >= '984902400'
group by assignmentID
order by grades.date DESC);
See "The unbearable slowness of IN":
http://www.artfulsoftware.com/infotree/queries.php#568
Super messy, but: (thanks for everyone's help)
SELECT *
FROM grades
LEFT JOIN assignments ON grades.assignmentID = assignments.assignmentID
RIGHT JOIN (
SELECT g.gradeID
FROM assignments a
LEFT JOIN grades g
USING ( assignmentID )
WHERE a.classID = '7815'
AND (
a.type =30170
)
AND g.contactID =7141
AND g.points
REGEXP '^[-]?[0-9]+[-]?'
AND g.points != '-'
AND g.points != ''
AND (
g.pointsposs IS NULL
OR g.pointsposs = ''
)
AND g.date <= '1255503600'
AND g.date >= '984902400'
GROUP BY assignmentID
ORDER BY g.date DESC
) AS t1 ON t1.gradeID = grades.gradeID
Suppose you use a Real Database (i.e., any database except MySQL, but I'll use Postgres as an example) to do this query:
SELECT * FROM ta WHERE aid IN (SELECT subquery)
a Real Database would look at the subquery and estimate its rowcount:
If the rowcount is small (say, less than a few millions)
It would run the subquery, then build an in-memory hash of ids, which also makes them unique, which is a feature of IN().
Then, if the number of rows pulled from ta is a small part of ta, it would use a suitable index to pull the rows. Or, if a major part of the table is selected, it would just scan it entirely, and lookup each id in the hash, which is very fast.
If however the subquery rowcount is quite large
The database would probably rewrite it as a merge JOIN, adding a Sort+Unique to the subquery.
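The small-subquery strategy above (run the subquery once, hash the ids) versus re-running the subquery per outer row can be illustrated in plain Python. This is a conceptual sketch, not how any database engine is implemented: lists stand in for tables, and a counter records how often the hypothetical subquery function runs.

```python
from typing import Callable, Iterable

def in_by_reexecution(outer: Iterable[int], subquery: Callable[[], list]) -> list:
    """Naive plan: re-run the subquery for every outer row."""
    return [x for x in outer if x in subquery()]

def in_by_hash(outer: Iterable[int], subquery: Callable[[], list]) -> list:
    """Hash semi-join: run the subquery once and keep its ids in a set."""
    ids = set(subquery())  # the set also removes duplicate ids, as IN() requires
    return [x for x in outer if x in ids]

calls = {"n": 0}

def subquery() -> list:
    """Hypothetical subquery; counts its own executions."""
    calls["n"] += 1
    return [3, 5, 5, 7]

outer_ids = list(range(1, 11))
naive = in_by_reexecution(outer_ids, subquery)
naive_calls = calls["n"]
calls["n"] = 0
hashed = in_by_hash(outer_ids, subquery)
hashed_calls = calls["n"]
print(naive, naive_calls)    # [3, 5, 7] 10
print(hashed, hashed_calls)  # [3, 5, 7] 1
```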
However, you are using MySQL. In this case (at least before MySQL 5.6, which added semi-join optimizations for IN subqueries), it will not do any of this: it is going to re-execute the subquery for each row of your table, so it will take 1000 years. Sorry.
If your subquery performs fine when it is executed separately, then try using a JOIN rather than IN, like this:
select count(assignments.assignmentID) as asscount, sum(TRIM(TRAILING '-' FROM grades.points)) as p, sum(assignments.points) as t
from assignments left join grades using (assignmentID)
join
(select grades.gradeID from assignments left join grades using (assignmentID) left join as_types on as_types.ID = assignments.type
where assignments.classID = '7815'
and (assignments.type = 30170 )
and grades.contactID = 7141
and grades.points REGEXP '^[-]?[0-9]+[-]?'
and grades.points != '-'
and grades.points != ''
and (grades.pointsposs IS NULL or grades.pointsposs = '')
and grades.date <= '1255503600'
AND grades.date >= '984902400'
group by assignmentID
order by grades.date DESC) AS g2 using (gradeID);
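When replacing IN with a JOIN, note that the two are only equivalent if the subquery returns each id once: IN ignores duplicates, while a JOIN repeats the outer row for every match. In this query the subquery returns one gradeID per assignment, so it is probably fine, but a DISTINCT in the derived table makes the rewrite safe in general. A sketch with SQLite standing in for MySQL; the picked table is a hypothetical stand-in for the subquery result, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE grades (gradeID INTEGER PRIMARY KEY, points INTEGER);
INSERT INTO grades VALUES (1, 10), (2, 20), (3, 30);
CREATE TABLE picked (gradeID INTEGER);    -- stands in for the subquery result
INSERT INTO picked VALUES (1), (1), (2);  -- note the duplicate id
""")

in_sum = conn.execute(
    "SELECT SUM(points) FROM grades WHERE gradeID IN (SELECT gradeID FROM picked)"
).fetchone()[0]
join_sum = conn.execute(
    "SELECT SUM(points) FROM grades JOIN picked p USING (gradeID)"
).fetchone()[0]
distinct_join_sum = conn.execute(
    "SELECT SUM(points) FROM grades"
    " JOIN (SELECT DISTINCT gradeID FROM picked) AS p USING (gradeID)"
).fetchone()[0]
print(in_sum, join_sum, distinct_join_sum)  # 30 40 30
```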
There really isn't enough information to answer your question, and you've put a ... in the middle of the where clause which is weird. How big are the tables involved and what are the indexes?
Having said that, if there are too many terms in an IN clause, you can see seriously degraded performance. Replace the use of IN with a RIGHT JOIN.
For starters, the table as_types in the IN clause is not used. Left joining it serves no purpose, so get rid of it.
That leaves the IN clause with only the assignments and grades tables from the outer query. Clearly the conditions that filter assignments belong in the WHERE clause of the outer query. You should move all of the grades = whatever conditions into the ON clause of the LEFT JOIN to grades.
The query is a little tough to follow, but I suspect that the subquery isn't necessary at all.
It seems like your query is basically thus:
SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE gradeID IN
(
SELECT grades.gradeID
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE your_conditions = TRUE
);
But, you're not doing anything really fancy in the where clause in the subquery.
I suspect something more like
SELECT FOO()
FROM assignments LEFT JOIN grades USING (assignmentID)
WHERE your_conditions_with_some_tweaks = TRUE
GROUP BY groupings;
would work just as well.
If I'm missing some key logic here please comment back and I'll edit/delete this post.