Convert SQL WHERE IN to JOIN

Convert SQL WHERE IN to JOIN - mysql

I have a database storing various information about fictional people. There is a table person with general information, such as name, adress etc and some more specific tables holding health history and education for everyone.
What I'm trying to do now, is getting possible connections for one person based on similarities like being at the same school for the same time or having the same doctor or being treated in the same hospital at the same time.
Following Query works fine for this (:id being the id of the person in question), however it is horribly slow (takes about 6secs to get a result).
SELECT person.p_id as id, fname, lname, image FROM person WHERE
(person.p_id IN (
SELECT patient from health_case WHERE
doctor IN (SELECT doctor FROM health_case WHERE patient =:id )
OR center IN (SELECT hc2.center FROM health_case as hc1, health_case as hc2 WHERE hc1.patient = :id AND hc2.center = hc1.center AND (hc1.start <= hc2.end AND hc1.end >= hc2.start)))
OR person.p_id IN (
SELECT ed2.pupil FROM education as ed1, education as ed2 WHERE
ed1.school IN (SELECT school FROM education WHERE pupil = :id) AND ed2.school = ed1.school AND (ed2.start <= ed1.end AND ed2.end >= ed1.start)
))
AND person.p_id != :id
What would be the best approach to convert it to use JOIN clauses? I somehow seem unable to wrap my head around these...

I think I understand what you're trying to do. There is more than one way to skin a cat, but may I suggest splitting your query into two separate queries, and then replacing the complicated WHERE clause with a couple inner joins? So, something like this:
/* Find connections based on health care */
SELECT p2.p_id as id, p2.fname, p2.lname, p2.image
FROM person p
JOIN health_case hc on hc.patient = p.p_id
JOIN health_case hc2 on hc2.doctor = hc.doctor and hc2.healthcenter = hc.healthcenter and hc.start <= hc2.end and hc.end >= hc2.start and hc2.patient <> hc.patient
JOIN person p2 on p2.p_id = hc2.patient and p2.p_id <> p.p_id
WHERE p.p_id = :id
Then, create a separate query to get connections based on education:
/* Find connections based on education */
SELECT p2.p_id as id, p2.fname, p2.lname, p2.image
FROM person p
JOIN education e on e.pupil = p.p_id
JOIN education e2 on e2.school = e.school and e2.start <= e.end AND e2.end >= e.start and e.pupil <> e2.pupil
JOIN person p2 on p2.p_id = e2.pupil and p2.p_id <> p.p_id
WHERE p.p_id = :id
If you really want the data results to be combined, you can use UNION since both queries return the same columns from the person table.

Depends on your SQL engine. Newer SQL systems that have reasonable query optimizers will most likely rewrite both IN and JOIN queries to the same plan. Typically, a sub-query (IN Clause) is rewritten using a join.
In simple SQL engines that may not have great query optimizers, the join should be faster because they may run sub-queries into a temporary in-memory table before running the outer query.
In some SQL engines that have limited memory footprint, however, the sub-query may be faster because it doesn't require joining -- which produces more data.

Related

MySQL Max() with group by and multiple tables

for my studies i need to get a code working. I do have two tables:
Training:
UserID
Date
Uebung
Gewicht
Wiederholungen
Mitglied:
UserID
Name
Vorname
and i need to display the max power which you get if you multiply 'Wiederholungen' with 'Gewicht' from the 'Training' table for EACH User with the date and name.
I know there is a "problem" with max() and group by. But i'm kinda new to MySQL and i was only able to find fixes with one table and also every column already existing. I have to join two tables AND create the power column.
I tried a lot and i think this may be my best chance
select name, vorname, x.power from
(SELECT mitglied.UserID,max( Wiederholungen*Gewicht) as Power
FROM training join mitglied
where Uebung = 'Beinstrecker'
and training.UserID = mitglied.UserID
group by training.UserID) as x
inner join (training, mitglied)
on (training.UserID = mitglied.UserID)
and x.Power = Power;
'''
I get way too many results. I know the last statement is wrong (x.power = power) but i have no clue how to solve it.

This is actually a fairly typical question here, but I am bad a searching for previous answers so....
You "start" in a subquery, finding those max values:
SELECT UserID, Uebung, MAX(Gewicht*Wiederholugen) AS Power
FROM training
WHERE Uebung = 'Beinstrecker'
GROUP BY UserID, Uebung
Then, you join that back to the table it came from to find the date(s) those maxes occurred:
( SELECT UserID, Uebung, MAX(Gewicht*Wiederholugen) AS Power
FROM training
WHERE Uebung = 'Beinstrecker'
GROUP BY UserID, Uebung
) AS maxes
INNER JOIN training AS t
ON maxes.UserID = t.UserID
AND maxes.Uebeng = t.Uebeng
AND maxes.Power = (t.Gewicht*t.Wiederholugen)
Finally, you join to mitglied to get information for the user:
SELECT m.name, m.vorname, maxes.Power
FROM ( SELECT UserID, Uebung, MAX(Gewicht*Wiederholugen) AS Power
FROM training
WHERE Uebung = 'Beinstrecker'
GROUP BY UserID, Uebung
) AS maxes
INNER JOIN training AS t
ON maxes.UserID = t.UserID
AND maxes.Uebeng = t.Uebeng
AND maxes.Power = (t.Gewicht*t.Wiederholugen)
INNER JOIN mitglied AS m ON t.UserID = m.UserID
;
Note: t.Uebung = 'Beinstrecker' could be used as a join condition instead, and might be faster; but as a matter of style I try to prevent redundant literals like that unless there is a worthwhile performance difference.

MySQL SELECT queries without LIMIT

I am doing a course on Relational Databases, MySQL to be more especific. We need to create some SELECT queries for a project. The project is related to music. It has tables to represent musicians (musician), bands (band) and the musician ability to do a certain task, like singing or playing the guitar (act).
Table musician contains :
id
name
stagename
startyear
Table band contains :
code
name
type ("band" or "solo")
startyear
And finally, table act contains :
band (foreign key to code of "band" table)
musician (foreign key to id of "musician" table)
hability (guitarist, singer, like that... and a foreign key to another table)
earnings
I have doubts in two exercises, the first one asks to select musicians id and stagename who participate with more acts in bands whose type is solo.
My solution for the first one is this:
SELECT ma.id, ma.stagename
FROM musician ma, act d, band ba
WHERE ma.id = d.musician
AND ba.code = d.band
AND ba.type = "solo"
GROUP BY ma.id, ma.stagename
HAVING COUNT(ma.id) = (SELECT COUNT(d2.musician) AS count
FROM act d2, band ba2
WHERE d2.band = ba2.code
AND ba2.type = "solo"
GROUP BY d2.musician
ORDER BY count DESC
LIMIT 1);
The second one is very similar to the last one. We need to select, for every startyear, the id and stagename of a musician who can do more acts, with the corresponding number of acts and the maximum and minimum of his cachet. This is my solution:
SELECT ma.startyear, ma.id, ma.stagename, COUNT(ma.id) AS NumActs, MIN(d.earnings), MAX(d.earnings)
FROM musician ma, act d, band ba
WHERE ma.id = d.musician
AND ba.code = d.band
AND ba.type = "solo"
GROUP BY ma.year, ma.id, ma.stagename
HAVING COUNT(ma.id) = (SELECT COUNT(d2.musician) AS count
FROM act d2, band ba2
WHERE d2.band = ba2.code
AND ba2.type = "solo"
GROUP BY d2.musician
ORDER BY count DESC
LIMIT 1);
The results with my dummy data are perfect but my teacher told us we should avoid using the LIMIT option, but that's the only way we can get the highest number, at least with what we know right now.
I've seen a lot of subqueries after the FROM statement to solve this problem, however, for this project we can't use subqueries inside FROM. Is this really possible without LIMIT ?
Thanks in advance.

It is possible, but much worse than with sub-query in from or limit. So I'd never use it in real life :)
Well, long story short, you can do something like this:
SELECT
m.id
, m.stagename
FROM
musician m
INNER JOIN act a ON (
a.musician = m.id
)
INNER JOIN band b ON (
b.code = a.band
AND b.type = 'solo'
)
GROUP BY
m.id
, m.stagename
HAVING
NOT EXISTS (
SELECT
*
FROM
act a2
INNER JOIN band b2 ON (
b2.code = a2.band
AND b2.type = 'solo'
)
WHERE
a2.musician != a.musician
GROUP BY
a2.musician
HAVING
COUNT(a2.musician) > COUNT(a.musician)
)
;
I think you can understand the idea from the query itself as it's pretty straightforward. However, let me know if you need an explanation.
It is possible that your restriction was slightly different and you were not allowing to use subquery in your main FROM part only.
P.S. I'm also use INNER JOIN ... ON syntax as it is easier to see what are table join conditions and what are where conditions.
P.P.S. It might be mistakes in query as I do not have your data structure so cannot execute the query and check. I only checked if the idea works with my test table.

EDIT I just re-read the question; my initial reading missed that inline views are disallowed.
We can avoid the ORDER BY ... DESC LIMIT 1 construct by making the subquery into an inline view (or, a "derived table" in the MySQL parlance), and using a MAX() aggregate.
As a trivial demonstration, this query:
SELECT b.foo
FROM bar b
ORDER
BY b.foo DESC
LIMIT 1
can be emulated with this query:
SELECT MAX(c.foo) AS foo
FROM (
SELECT b.foo
FROM bar b
) c
An example re-write of the first query in the question
SELECT ma.id
, ma.stagename
FROM musician ma
JOIN act d
ON d.musician = ma.id
JOIN band ba
ON ba.code = d.band
WHERE ba.type = 'solo'
GROUP
BY ma.id
, ma.stagename
HAVING COUNT(ma.id)
= ( SELECT MAX(c.count)
FROM (
SELECT COUNT(d2.musician) AS count
FROM act d2
JOIN band ba2
ON ba2.code = d2.band
WHERE ba2.type = 'solo'
GROUP
BY d2.musician
) c
)
NOTE: this is a demonstration of a rewrite of the query in the question; this makes no guarantee that this query (or the query in the question) are guaranteed to return a result that satisfies any particular specification. And the specification given in the question is not at all clear.

Optimizing a MySQL NOT IN( query

I am trying to optimize this MySQL query. I want to get a count of the number of customers that do not have an appointment prior to the current appointment being looked at. In other words, if they have an appointment (which is what the NOT IN( subquery is checking for), then exclude them.
However, this query is absolutely killing performance. I know that MySQL is not very good with NOT IN( queries, but I am not sure on the best way to go about optimizing this query. It takes anywhere from 15 to 30 seconds to run. I have created indexes on CustNo, AptStatus, and AptNum.
SELECT
COUNT(*) AS NumOfCustomersWithPriorAppointment,
FROM
transaction_log AS tl
LEFT JOIN
appointment AS a
ON
a.AptNum = tl.AptNum
INNER JOIN
customer AS c
ON
c.CustNo = tl.CustNo
WHERE
a.AptStatus IN (2)
AND a.CustNo NOT IN
(
SELECT
a2.CustNo
FROM
appointment a2
WHERE
a2.AptDateTime < a.AptDateTime)
AND a.AptDateTime > BEGIN_QUERY_DATE
AND a.AptDateTime < END_QUERY_DATE
Thank you in advance.

Try the following:
SELECT
COUNT(*) AS NumOfCustomersWithPriorAppointment,
FROM
transaction_log AS tl
INNER JOIN
appointment AS a
ON
a.AptNum = tl.AptNum
LEFT OUTER JOIN appointment AS earlier_a
ON earlier_a.CustNo = a.CustNo
AND earlier_a.AptDateTime < a.AptDateTime
INNER JOIN
customer AS c
ON
c.CustNo = tl.CustNo
WHERE
a.AptStatus IN (2)
AND earlier_a.AptNum IS NULL
AND a.AptDateTime > BEGIN_QUERY_DATE
AND a.AptDateTime < END_QUERY_DATE
This will benefit from a composite index on (CustNo,AptDateTime). Make it unique if that fits your business model (logically it seems like it should, but practically it may not, depending on how you handle conflicts in your application.)
Provide SHOW CREATE TABLE statements for all tables if this does not create a sufficient performance improvement.

How to improve SELECT performance joining multiple tables

I have the following mySQL SELECT statement that was working ok on a small data set but died when the volume was increased:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId, Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate,Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1, Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country, BookingAccommodation.AccomId, BookingAccommodation.ShareType, BookingProgram.ProgramId, Programs.ProgramDesc
FROM Bookings, Clients, BookingProgram, BookingAccommodation, Countries, ClientType, Programs
WHERE Bookings.BookingId = BookingProgram.BookingId
AND Bookings.BookingId = BookingAccommodation.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND BookingAccommodation.Nights > 0
AND Clients.ClientId = Bookings.ClientId
AND Clients.Email <> ''
AND Clients.CountryId = Countries.CountryId
AND Programs.ProgramId = BookingProgram.ProgramId
With around 10K records in Bookings and 25K records in each of BookingAccommodation and BookingPrograms the volume isn't huge but the query ran in 950 seconds. I'm running the query in the SQL window of phpAdmin on a local MAMP server.
Splitting it into 3 queries the result comes back in a fraction of a second for each:
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId, Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate, Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1, Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country
FROM Bookings, Clients, Countries, ClientType
WHERE Bookings.WeekBeginning >= '2016-10-01'
AND Clients.ClientId = Bookings.ClientId
AND Clients.Email <> ''
AND Clients.CountryId = Countries.CountryId
SELECT DISTINCT Bookings.BookingId, BookingAccommodation.AccomId, BookingAccommodation.ShareType
FROM Bookings, BookingAccommodation
WHERE Bookings.BookingId = BookingAccommodation.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND BookingAccommodation.Nights > 0
SELECT DISTINCT Bookings.BookingId, BookingProgram.ProgramId, Programs.ProgramDesc
FROM Bookings, BookingProgram, Programs
WHERE Bookings.BookingId = BookingProgram.BookingId
AND Bookings.WeekBeginning >= '2016-10-01'
AND Programs.ProgramId = BookingProgram.ProgramId
There are multiple records in BookingAccommodation and BookingProgram for each record in Bookings but I only require one record from each hence the SELECT DISTINCT.
The primary key on Bookings is BookingId.
The primary key on BookingAccommodation is BookingId, AccomDate, AccomId
The primary key on BookingProgram is BookingId, ProgramId, AccomType
I've tried to rewrite the query with joins and sub queries but I'm obviously not doing it right. How can I join these 3 queries back into a single query that will perform well?

These are the basics of using subqueries instead of joins (MySQL assumed FWIW). Apologies for pseudocode, I thought it important to answer ASAP as this is one of the top hits on this issue I faced just now.
A client makes a booking to go on a cruise ship. The client should also specify their diet (eg. vegetarian, vegan, no soy, etc). We thus have three tables:
Bookings
Booking_Id, Booking_Date, Booking_Time, Client_Id
Clients
Client_Id, Client_Name, Client_Phone, Client_DietId
Diets
Diet_Id, Diet_Name
We now want to present to the concierge a full booking view.
Using "JOINS":
SELECT Bookings.Booking_Id, Bookings.Booking_Date, Bookings.Booking_Time, Clients.Client_Name, Diets.Diet_Name
FROM Bookings
INNER JOIN Clients
ON Bookings.Client_Id = Clients.Client_Id
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id
Using "SUBQUERIES":
How I think of it is creating "temp tables" in those separate JOINs - of course "temp tables" may or may not be the accurate low-level implementation, etc. but anecdotally subqueries may be faster than huge joins (other threads on this).
I have separate joins I want to do from the above example:
First I need to join the Clients with their Diets, then I join that "table" with Bookings.
Thus I end up with this (note the table (re)naming when referring to the subquery):
SELECT [RELEVANT FIELDS HERE ETC]
FROM
(SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
FROM Clients
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id)
AS ClientDetailsWithDiets
INNER JOIN Bookings
ON Bookings.Booking_Id = ClientDetailsWithDiets.Client_Id
Now if another table is to be joined say Staff assigned to a particular Booking, then the whole thing above would be nested, and so on eg:
SELECT [RELEVANT FIELDS HERE ETC]
FROM
(SELECT [RELEVANT FIELDS HERE ETC]
FROM
(SELECT Clients.Client_Id, Clients.Client_Name, Diets.Diet_Name
FROM Clients
INNER JOIN Diets
ON Clients.Client_DietId = Diets.Diet_Id)
AS ClientDetailsWithDiets
INNER JOIN Bookings
ON Bookings.Booking_Id = ClientDetailsWithDiets.Client_Id)
AS BookingDetailsFull
INNER JOIN Staff
ON BookingDetailsFull.Booking_Id = Staff.Booking_Id_Assigned

Try changing it as
SELECT DISTINCT Bookings.BookingId, Bookings.ResortId,
Bookings.WeekBeginning, Bookings.DepartDate, Bookings.CancelledDate,
Clients.FirstName, Clients.LastName, Clients.Email, Clients.Address1,
Clients.City, Clients.State, Clients.CountryId, Clients.ClientType, Countries.Country,
BookingAccommodation.AccomId, BookingAccommodation.ShareType, BookingProgram.ProgramId,
Programs.ProgramDesc
FROM Bookings
JOIN Clients ON Clients.ClientId = Bookings.ClientId AND Bookings.WeekBeginning >= '2016-10-01' AND Clients.Email <> ''
JOIN BookingProgram ON Bookings.BookingId = BookingProgram.BookingId
JOIN BookingAccommodation ON Bookings.BookingId = BookingAccommodation.BookingId AND BookingAccommodation.Nights > 0
JOIN Countries ON Clients.CountryId = Countries.CountryId
JOIN Programs ON Programs.ProgramId = BookingProgram.ProgramId
WHERE Bookings.WeekBeginning >= '2016-10-01';
If this is not getting you the results you wanted, try EXPLAIN and see the query plan.
Please Note: I didn't see table ClientType is being used anywhere so I did not include it in JOINs

Rather than spend more time trying to improve the select statement as it hits so many tables I opted to split it into the separate queries as I outlined in the original question.
In the end this was the quickest practical solution.

Grouping method

I am working on a query with the following format:
I require all the columns from the Database 'A', while I only require the summed amount (sum(amount)) from the Database 'B'.
SELECT A.*, sum(B.CURTRXAM) as 'Current Transaction Amt'
FROM A
LEFT JOIN C
ON A.Schedule_Number = C.Schedule_Number
LEFT JOIN B
ON A.DOCNUMBR = B.DOCNUMBR
ON A.CUSTNMBR = B.CUSTNMBR
GROUP BY A
ORDER BY A.CUSTNMBR
My question is regarding the grouping statement, database A has about 12 columns and to group by each individually is tedious, is there a cleaner way to do this such as:
GROUP BY A
I am not sure if a simpler way exists as I am new to SQL, I have previously investigated GROUPING_ID statements but thats about it.
Any help on lumped methods of grouping would be helpful

Since the docnumber is the primary key - just use the following SQL:
SELECT A.*, sum(B.CURTRXAM) as 'Current Transaction Amt'
FROM A
LEFT JOIN C
ON A.Schedule_Number = C.Schedule_Number
LEFT JOIN B
ON A.DOCNUMBR = B.DOCNUMBR
ORDER BY RM20401.CUSTNMBR
GROUP BY A.DOCNUMBR

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Convert SQL WHERE IN to JOIN - mysql

Related

MySQL Max() with group by and multiple tables

MySQL SELECT queries without LIMIT

Optimizing a MySQL NOT IN( query

How to improve SELECT performance joining multiple tables

Grouping method

Categories

Resources