Multiple LEFT JOIN in SQL Running Slow. How do I optimize it? - mysql

I am combining three tables - persons, properties, totals - using LEFT JOIN. I find the following query to be really fast but it does not give me all rows from table-1 for which there is no corresponding data in table-2 or table-3. Basically, it gives me only rows where there is data in table-2 and table-3.
SELECT a.*, b.propery_address, c.person_asset_total
FROM persons AS a
LEFT JOIN properties AS b ON a.id = b.person_id
LEFT JOIN totals AS c ON a.id = c.person_id
WHERE a.city = 'New York' AND
c.description = 'Total Immovable'
Whereas the following query gives me the correct result by including all rows from table-1 irrespective of whether there is corresponding data or no data from table-2 and table-3. However, this query is taking a really long processing time.
SELECT a.*, b.propery_address, c.person_asset_total
FROM persons AS a
LEFT JOIN
properties AS b ON a.id = b.person_id
LEFT JOIN
(SELECT person_id, person_asset_total
FROM totals
WHERE description = 'Total Immovable'
) AS c ON a.id = c.person_id
WHERE a.city = 'New York'
Is there a better way to write a query that will give data equivalent to second query but with speed of execution equivalent to the first query?

Don't use a subquery:
SELECT p.*, pr.propery_address, t.person_asset_total
FROM persons p LEFT JOIN
properties pr
ON p.id = pr.person_id LEFT JOIN
totals t
ON p.id = t.person_id AND t.description = 'Total Immovable'
WHERE p.city = 'New York';
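Your approach would be fine in almost any other database. However, MySQL (at least in older versions) materializes "derived tables", which makes them much harder to optimize. The query above returns the same rows as your second query because the filter on description sits in the ON clause rather than in the WHERE clause.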
You will also notice that I changed the table aliases to be abbreviations for the table names. This makes the query much easier to follow.
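To see why the placement of the description filter matters, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the tiny tables and their rows are made up for illustration. Only the row-preserving behaviour of LEFT JOIN is shown, not MySQL's performance.

```python
import sqlite3

# Hypothetical miniature data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE totals  (person_id INTEGER, description TEXT,
                          person_asset_total INTEGER);
    INSERT INTO persons VALUES (1, 'New York'), (2, 'New York');
    INSERT INTO totals  VALUES (1, 'Total Immovable', 500);
""")

# Filter in WHERE: person 2 has no totals row, so t.description is NULL
# and the row is discarded after the join.
where_rows = conn.execute("""
    SELECT p.id, t.person_asset_total
    FROM persons p
    LEFT JOIN totals t ON p.id = t.person_id
    WHERE p.city = 'New York' AND t.description = 'Total Immovable'
    ORDER BY p.id
""").fetchall()

# Filter in ON: unmatched persons are kept, with NULL in the totals columns.
on_rows = conn.execute("""
    SELECT p.id, t.person_asset_total
    FROM persons p
    LEFT JOIN totals t
        ON p.id = t.person_id AND t.description = 'Total Immovable'
    WHERE p.city = 'New York'
    ORDER BY p.id
""").fetchall()

print(where_rows)  # [(1, 500)]
print(on_rows)     # [(1, 500), (2, None)]
```

A condition on the right-hand table in WHERE is evaluated after the join, so it rejects the NULL-extended rows and the LEFT JOIN behaves like an inner join.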

Related

How is joining with a subquery different from joining without a subquery? Looking for difference between two similar queries

I want to see which user created floor equipment for which customer -- both of these queries do what I want. The second query, however, results with 700 more rows than the first. Could you please explain the difference?
I ran another query that found the difference between the two sets, and sure enough, it yielded 700 rows. So the overlapping output is the same, but somehow the second query catches more results. I tried looking at the additional 700 rows, but they all seemed normal and similar to the other results. I can't find the difference by looking at the code, which is what I'm hoping someone can help me with.
First query
SELECT customer.name, user.name, floor_equipment.id
FROM customer, user, floor_equipment, floor, building, site
WHERE (floor_equipment.floorID = floor.ID AND floor.buildingID = building.id AND
building.siteID = site.id AND floor_equipment.created_by = user.id)
Second Query
SELECT newTable.custName, newTable.userName, newTable.equipID
FROM (SELECT customer.name as "custName", user.name as "userName",
floor_equipment.id as "equipID", floor_equipment.created_by as "creatorID"
FROM customer, floor_equipment, floor, building, site
WHERE (floor_equipment.floorID = floor.ID AND floor.buildingID = building.id AND
building.siteID = site.id AND site.customerID = customer.ID)) as newTable, user
WHERE user.id = newTable.creatorID
I would expect both of these queries to have the same result, however the second query yields 700 more rows than the first. Aside from the extra rows, both queries result in the same data. The 700 additional rows seem to be normal and similar to the other rows.
NOTE: There is a seemingly pointless subquery in the second query. The purpose of this was for optimization. I am running these queries within Domo, a business intelligence webapp. I wrote the subquery in hopes that it would run faster. Because of the way Domo works, the former took 2 hours whereas the latter took 45 seconds.
Ignoring (or perhaps rectifying) the syntax errors, your first query can be written as follows:
SELECT c.name
, u.name
, fe.id
FROM customer c
CROSS
JOIN user u
JOIN floor_equipment fe
ON fe.created_by = u.id
JOIN floor f
ON f.ID = fe.floorID
JOIN building b
ON b.id = f.buildingID
JOIN site s
ON s.id = b.siteID
Likewise, written a little more coherently, your second query is as follows:
SELECT x.custName
, u.name userName
, x.equipID
FROM
( SELECT c.name custName
, fe.id equipID
, fe.created_by creatorID
FROM customer c
JOIN site s
ON s.customerID = c.ID
JOIN building b
ON b.siteID = s.id
JOIN floor f
ON f.buildingID = b.id
JOIN floor_equipment fe
ON fe.floorID = f.ID
) x
JOIN user u
ON u.id = x.creatorID
Again, we can omit the subquery and write it thus...
SELECT c.name custName
, u.name userName
, fe.id equipID
, fe.created_by creatorID
FROM customer c
JOIN site s
ON s.customerID = c.ID
JOIN building b
ON b.siteID = s.id
JOIN floor f
ON f.buildingID = b.id
JOIN floor_equipment fe
ON fe.floorID = f.ID
JOIN user u
ON u.id = fe.created_by
...so we can see that the first query has a Cartesian product (CROSS JOIN), because customer is not tied to any other table, whereas the second query does not.
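As a small illustration of what a missing join predicate does, the sketch below uses Python's sqlite3 module in place of MySQL. The two tables are cut-down, hypothetical versions of customer and site, and the rows are invented.

```python
import sqlite3

# Hypothetical cut-down tables; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE site (id INTEGER PRIMARY KEY, customerID INTEGER);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO site VALUES (10, 1), (11, 2);
""")

# No predicate ties customer to site: every customer pairs with every site.
unjoined = conn.execute(
    "SELECT c.name, s.id FROM customer c, site s"
).fetchall()

# With the predicate, each site pairs only with its own customer.
joined = conn.execute("""
    SELECT c.name, s.id
    FROM customer c
    JOIN site s ON s.customerID = c.id
    ORDER BY s.id
""").fetchall()

print(len(unjoined))  # 6
print(joined)         # [('Acme', 10), ('Globex', 11)]
```

Three customers and two sites give six unjoined rows, which is why a table listed in FROM without any condition changes the row count of the whole query.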
Your code is a Cartesian product between the tables:
customer, user, floor_equipment, floor, building, site
and your WHERE condition is not a join condition but just a tuple of Boolean values:
floor_equipment.floorID = floor.ID,
floor.buildingID = building.id,
building.siteID = site.id,
floor_equipment.created_by = user.id
( boolean, boolean, boolean, boolean)
Each Boolean is the result of the corresponding comparison, e.g.:
floor_equipment.floorID = floor.ID
so in practice the query returns all the rows, because they have no matching counterpart.
In the second query, your first Cartesian product is expanded by the join between the first result and the rows where user.id matches newTable.creatorID. Looking at your code, it could be that you need explicit join syntax and proper ON conditions.

Sorting results from joins

While running this query:
SELECT
a.id,
pub.name AS publisher_name,
pc.name AS placement_name,
b.name AS banner_name,
a.lead_id,
a.partner_id,
a.type,
l.status,
s.correctness,
a.landing_page,
t.name AS tracker_name,
a.date_view,
a.date_action
FROM actions AS a
LEFT JOIN publishers AS pub ON a.publisher_id = pub.id
LEFT JOIN placements AS pc ON pc.publisher_id = pub.id
LEFT JOIN banners AS b ON b.campaign_id = a.campaign_id
LEFT JOIN leads l ON
l.lead_id = a.lead_id
AND l.created = (
SELECT MAX(created) from leads l2 where l2.lead_id = l.lead_id
)
LEFT JOIN statuses AS s ON l.status = s.status
LEFT JOIN trackers AS t ON t.id = a.tracker_id
LIMIT 10
I am able to sort by every column from the actions table. However, when I try to, for example, ORDER BY b.name (from the banners table, joined on actions.banner_id) or ORDER BY l.lead_id (joined from leads on a more complex condition, as seen above), MySQL runs the query for a loooong time (most tables have tens of thousands of records). Is it possible, performance-wise, to sort by joined columns?
You should rewrite the query with an inner join on the table that holds the column you want to sort on.
For example, if you sort on actions.banner_id:
SELECT ...
FROM actions AS a
JOIN banners AS b ON b.campaign_id = a.campaign_id
LEFT JOIN *rest of the query*
You will get the same results unless there are not enough banners that can be joined to actions to produce a total of 10 rows.
I'm guessing that's not the case, otherwise you wouldn't be sorting on banner_id.
You could first filter (order by, where, etc.) your records in a subquery and then join the result with the rest of the tables.
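A rough sketch of the "filter first, then join" idea, run through Python's sqlite3 module rather than MySQL. The reduced column lists and sample rows are assumptions. It only helps when the sort key lives in the table being pre-filtered (here date_view in actions); sorting by b.name would still need the join first.

```python
import sqlite3

# Hypothetical reduced schema; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (id INTEGER PRIMARY KEY, campaign_id INTEGER,
                          date_view TEXT);
    CREATE TABLE banners (id INTEGER PRIMARY KEY, campaign_id INTEGER,
                          name TEXT);
    INSERT INTO actions VALUES (1, 100, '2020-01-01'),
                               (2, 200, '2020-01-03'),
                               (3, 100, '2020-01-02');
    INSERT INTO banners VALUES (1, 100, 'spring'), (2, 200, 'autumn');
""")

# Sort and limit actions on their own, then join only the surviving rows.
rows = conn.execute("""
    SELECT a.id, b.name
    FROM (
        SELECT id, campaign_id, date_view
        FROM actions
        ORDER BY date_view DESC
        LIMIT 2
    ) AS a
    LEFT JOIN banners AS b ON b.campaign_id = a.campaign_id
    ORDER BY a.date_view DESC
""").fetchall()

print(rows)  # [(2, 'autumn'), (3, 'spring')]
```

The joins then touch two rows instead of the whole actions table. If a join can match several rows per action, the final result can exceed the inner LIMIT.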

My-Sql JOIN two tables error

I tried to combine two tables' data.
I got an error like this. Can you see why?
Every derived table must have its own alias
SELECT a.title, number
FROM store a
JOIN
( SELECT count(b.code) as number
FROM redeem_codes b
WHERE product = a.title
AND available = "Available")
It's a little hard to tell without knowing more about your table structures. I'll give it a try anyway:
SELECT a.title, count(b.code) AS number FROM store a
LEFT JOIN redeem_codes b ON b.product = a.title
AND b.available = "Available"
GROUP BY a.title;
You need to have an alias on your subquery.
SELECT a.title, number 
FROM store a  
JOIN (subquery) b -- b is the `ALIAS`
-- and this query will not give you the result you want
But here's a more efficient query that does not use a subquery:
SELECT a.title, count(b.code) number
FROM store a
INNER JOIN redeem_codes b -- or use LEFT JOIN to show 0
-- for those who have no product
ON b.product = a.title
AND b.available = 'Available'
GROUP BY a.title
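If you do want to keep a subquery, one way to sketch it is an aliased derived table that counts per product before the join. The example below runs in Python's sqlite3 module as a stand-in for MySQL. The column types and sample rows are guesses, and the COALESCE is an addition so that titles without codes show 0.

```python
import sqlite3

# Hypothetical table layouts and rows; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (title TEXT);
    CREATE TABLE redeem_codes (code TEXT, product TEXT, available TEXT);
    INSERT INTO store VALUES ('Alpha'), ('Beta');
    INSERT INTO redeem_codes VALUES
        ('A1', 'Alpha', 'Available'),
        ('A2', 'Alpha', 'Available'),
        ('A3', 'Alpha', 'Used');
""")

# The derived table is given the alias b and does not refer to the
# outer table a; the link between the two is made in the ON clause.
rows = conn.execute("""
    SELECT a.title, COALESCE(b.number, 0) AS number
    FROM store a
    LEFT JOIN (
        SELECT product, COUNT(code) AS number
        FROM redeem_codes
        WHERE available = 'Available'
        GROUP BY product
    ) AS b ON b.product = a.title
    ORDER BY a.title
""").fetchall()

print(rows)  # [('Alpha', 2), ('Beta', 0)]
```

Whether this performs better or worse than the plain join with GROUP BY on MySQL depends on the version and the data, so treat it as an alternative shape rather than a recommendation.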

mysql subquery inside a LEFT JOIN

I have a query that needs the most recent record from a secondary table called tbl_emails_sent.
That table holds all the emails sent to clients. And most clients have several to hundreds of emails recorded. I want to pull a query that displays the most recent.
Example:
SELECT c.name, c.email, e.datesent
FROM `tbl_customers` c
LEFT JOIN `tbl_emails_sent` e ON c.customerid = e.customerid
I'm guessing a LEFT JOIN with a subquery would be used, but I don't delve into subqueries much. Am I going the right direction?
Currently the query above isn't optimized for specifying the most recent record in the table, so I need a little assistance.
It should be like this. You need a separate query to get the maximum date (that is, the latest date) on which an email was sent:
SELECT a.*, b.*
FROM tbl_customers a
INNER JOIN tbl_emails_sent b
ON a.customerid = b.customerid
INNER JOIN
(
SELECT customerid, MAX(datesent) maxSent
FROM tbl_emails_sent
GROUP BY customerid
) c ON c.customerid = b.customerid AND
c.maxSent = b.datesent
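If only the latest date is needed, as in the question's SELECT list, a variation on the query above is to LEFT JOIN the per-customer maximum directly, so that customers with no emails are kept. This is a sketch run through Python's sqlite3 module instead of MySQL, with invented rows and dates stored as ISO text.

```python
import sqlite3

# Hypothetical sample rows; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_customers (customerid INTEGER PRIMARY KEY,
                                name TEXT, email TEXT);
    CREATE TABLE tbl_emails_sent (customerid INTEGER, datesent TEXT);
    INSERT INTO tbl_customers VALUES (1, 'Ann', 'ann@example.com'),
                                     (2, 'Bob', 'bob@example.com');
    INSERT INTO tbl_emails_sent VALUES (1, '2012-01-05'),
                                       (1, '2012-03-09'),
                                       (1, '2012-02-01');
""")

# One row per customer from the derived table; LEFT JOIN keeps Bob,
# who has no emails, with NULL as his latest date.
rows = conn.execute("""
    SELECT c.name, c.email, m.maxSent
    FROM tbl_customers c
    LEFT JOIN (
        SELECT customerid, MAX(datesent) AS maxSent
        FROM tbl_emails_sent
        GROUP BY customerid
    ) AS m ON m.customerid = c.customerid
    ORDER BY c.customerid
""").fetchall()

print(rows)
# [('Ann', 'ann@example.com', '2012-03-09'), ('Bob', 'bob@example.com', None)]
```

Joining back to tbl_emails_sent, as the answer above does, is only needed when other columns of the latest email row are wanted.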
Would this not work?
SELECT t1.datesent,t1.customerid,t2.email,t2.name
FROM
(SELECT max(datesent) AS datesent,customerid
FROM `tbl_emails_sent`
GROUP BY customerid
) as t1
INNER JOIN `tbl_customers` as t2
ON t1.customerid=t2.customerid
The only issue you have then is ties: if two datesent values are the same, what is the deciding factor in which one gets picked?

Problem using MySQL Join

I have a MySQL SELECT query which fetches data from 6 tables using MySQL JOIN. Here is the query I am using.
SELECT
u.id,u.password,
u.registerDate,
u.lastVisitDate,
u.lastVisitIp,
u.activationString,
u.active,
u.block,
u.gender,
u.contact_id,
c.name,
c.email,
c.pPhone,
c.sPhone,
c.area_id,
a.name as areaName,
a.city_id,
ct.name as cityName,
ct.state_id,
s.name as stateName,
s.country_id,
cn.name as countryName
FROM users u
LEFT JOIN contacts c ON (u.contact_id = c.id)
LEFT JOIN areas a ON (c.area_id = a.id)
LEFT JOIN cities ct ON (a.city_id = ct.id)
LEFT JOIN states s ON (ct.state_id = s.id)
LEFT JOIN countries cn ON (s.country_id = c.id)
Although the query works perfectly fine, it sometimes returns duplicate results if it finds any duplicate values when using LEFT JOIN. For example, in the contacts table there are two rows with area id '2', which results in another duplicated row being returned. How do I write a query that selects only the required result without any duplicate rows? Is there a different type of MySQL join I should be using?
Thank you.
UPDATE :
Here is the contacts table; the column area_id may have several duplicate values.
ANSWER :
There was an error in the condition of my last LEFT JOIN: I had used (s.country_id = c.id) where it should have been (s.country_id = cn.id). After splitting the query and testing the parts individually, I was able to track down the error. Thank you for your response. It works perfectly fine now.
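To illustrate how that one-letter slip multiplies rows, here is a reduced sketch using Python's sqlite3 module in place of MySQL. The chain is shortened to contacts, states and countries, the state_id column on contacts is a simplification that is not in the original schema, and all rows are invented.

```python
import sqlite3

# Hypothetical shortened chain (contacts -> states -> countries);
# SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts  (id INTEGER PRIMARY KEY, name TEXT,
                            state_id INTEGER);
    CREATE TABLE states    (id INTEGER PRIMARY KEY, name TEXT,
                            country_id INTEGER);
    CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO contacts  VALUES (1, 'Ann', 5);
    INSERT INTO states    VALUES (5, 'New York', 1);
    INSERT INTO countries VALUES (1, 'USA'), (2, 'Canada');
""")

# Wrong: the ON clause never mentions cn, so whenever s.country_id
# happens to equal c.id, every row of countries matches.
wrong = conn.execute("""
    SELECT c.name, cn.name
    FROM contacts c
    LEFT JOIN states s     ON c.state_id = s.id
    LEFT JOIN countries cn ON s.country_id = c.id
    ORDER BY cn.id
""").fetchall()

# Right: the condition ties states to countries.
right = conn.execute("""
    SELECT c.name, cn.name
    FROM contacts c
    LEFT JOIN states s     ON c.state_id = s.id
    LEFT JOIN countries cn ON s.country_id = cn.id
""").fetchall()

print(wrong)  # [('Ann', 'USA'), ('Ann', 'Canada')]
print(right)  # [('Ann', 'USA')]
```

An ON clause that does not reference the table being joined is a useful thing to look for whenever a chain of joins returns unexpected duplicates.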
Duplicating the rows like you mentioned seems to indicate a data problem.
If users is your most granular table this shouldn't happen.
I'd guess, then, that it's possible for a single user to have multiple entries in contacts.
You could use DISTINCT as mentioned by @dxprog, but I think that GROUP BY is more appropriate here. GROUP BY whichever data point could potentially be duplicated.
After all, if a user has corresponding contact records, which one are you intending to JOIN to?
You must specify this if you want to remove "duplicates" because, as far as the RDBMS is concerned, the two rows matching
LEFT JOIN contacts c ON (u.contact_id = c.id)
are, in fact, already distinct.
I think a DISTINCT may be what you're looking for:
SELECT DISTINCT
u.id,u.password,
u.registerDate,
u.lastVisitDate,
u.lastVisitIp,
u.activationString,
u.active,
u.block,
u.gender,
u.contact_id,
c.name,
c.email,
c.pPhone,
c.sPhone,
c.area_id,
a.name as areaName,
a.city_id,
ct.name as cityName,
ct.state_id,
s.name as stateName,
s.country_id,
cn.name as countryName
FROM users u
LEFT JOIN contacts c ON (u.contact_id = c.id)
LEFT JOIN areas a ON (c.area_id = a.id)
LEFT JOIN cities ct ON (a.city_id = ct.id)
LEFT JOIN states s ON (ct.state_id = s.id)
LEFT JOIN countries cn ON (s.country_id = cn.id)
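This removes rows that are identical in every selected column. It does not guarantee one row per user ID, so rows that differ in any joined column will still appear, and you may not get exactly the joined data you'd hoped for.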