Filtering rows by date in a full outer join query -> missing some results - mysql

Background
I've got two tables with different types of feedback items in MySQL. I've built a query that combines these tables with a FULL OUTER JOIN (which in MySQL is actually written as two joins and a union) and computes some average grades. This query seems to work perfectly:
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
LEFT JOIN feedback_service AS s USING(name)
GROUP BY name)
UNION ALL
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
RIGHT JOIN feedback_service AS s USING(name)
WHERE l.id IS NULL
GROUP BY name)
ORDER BY name;
(This is somewhat simplified for readability but it doesn't make a difference here)
Problem
Next I tried adding filtering by date (i.e. only feedback items created after a certain date are taken into account). With my SQL skills and the research I did, I was able to come up with this:
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
LEFT JOIN feedback_service AS s USING(name)
WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL)
GROUP BY name)
UNION ALL
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic AS l
RIGHT JOIN feedback_service AS s USING(name)
WHERE l.id IS NULL
AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
GROUP BY name)
ORDER BY name;
This almost works: the results I get look about right. However, a couple of feedback items are missing. For example, setting the date one month ago, I counted feedback for 21 different people in the database, but this query only returns 19 people. The worst thing is that I can't seem to find any similarities between the missing items.
Am I doing something wrong in this query? I think that the WHERE clause does the date filtering after the JOIN and ideally I would probably be doing it before. Then again, I don't know if this causes my problem and I also have no idea how to write this query differently.

I accepted Johan's answer as he did a good job explaining this stuff to me and the answer is useful even in a more generic sense. However, I thought I'd also post the first solution I arrived at. It uses subqueries:
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM (
SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
) AS l
LEFT JOIN (
SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
) AS s USING(name)
GROUP BY name)
UNION ALL
(SELECT name, AVG(l.overallQuality) AS avgLingQual,
AVG(s.overallSatisfaction) AS avgSvcQual
FROM (
SELECT * FROM feedback_linguistic WHERE createdTime >= '" & date & "'
) AS l
RIGHT JOIN (
SELECT * FROM feedback_service WHERE createdTime >= '" & date & "'
) AS s USING(name)
WHERE l.id IS NULL
GROUP BY name)
ORDER BY name;
The results are correct with this query. However, the solution doesn't really look optimal, as subqueries are sometimes slow in my experience. Then again, I haven't done any performance analysis, so maybe using subqueries here is not a bottleneck. In any case it worked fast enough in my application.

A full outer join is a combination of 3 joins:
1- inner join between A and B
2- left exclusion join between A and B
3- right exclusion join between A and B
Note that the combination of an inner and a left exclusion join is a left outer join, so you normally rewrite the query as a left outer join + right exclusion join.
However for debugging purposes it can be useful to union all 3 joins and to add some marker as to which join does what:
/*inner join*/
(SELECT
'inner' as join_type
, COALESCE(s.name, l.name) as listname
, AVG(l.overallQuality) AS avgLingQual
, AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic l
INNER JOIN feedback_service s ON (l.name = s.name)
WHERE (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL)
GROUP BY l.name)
UNION ALL
(SELECT
'left exclusion' as join_type
, COALESCE(s.name, l.name) as listname
, AVG(l.overallQuality) AS avgLingQual
, AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic l
LEFT JOIN feedback_service s ON (l.name = s.name)
WHERE s.id IS NULL
/*AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL) */
AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL)
GROUP BY l.name)
UNION ALL
(SELECT
'right exclusion' as join_type
, COALESCE(s.name, l.name) as listname
, AVG(l.overallQuality) AS avgLingQual
, AVG(s.overallSatisfaction) AS avgSvcQual
FROM feedback_linguistic l
RIGHT JOIN feedback_service s ON (s.name = l.name)
WHERE l.id IS NULL
AND (s.createdTime >= '" & date & "' OR s.createdTime IS NULL)
/*AND (l.createdTime >= '" & date & "' OR l.createdTime IS NULL) */
GROUP BY s.name)
ORDER BY listname;
"I think that the WHERE clause does the date filtering after the JOIN and ideally I would probably be doing it before."
If you want the filtering to happen before the join, put the date condition for the outer-joined table in the ON clause of the join instead of in the WHERE clause.
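The difference between filtering in WHERE and filtering in the join condition can be sketched as follows. This is only an illustration, not the asker's real setup: it runs against an in-memory SQLite database from Python instead of MySQL so that it is self-contained, and the sample rows, the cutoff date and the names CUTOFF, FILTER_IN_WHERE, FILTER_IN_ON and run are all made up. Only the left-join half of the emulated full outer join is shown.

```python
import sqlite3

# Hypothetical sample data: alice has recent linguistic feedback but only old
# service feedback; bob has recent feedback in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feedback_linguistic (id INTEGER PRIMARY KEY, name TEXT,
                                  overallQuality REAL, createdTime TEXT);
CREATE TABLE feedback_service (id INTEGER PRIMARY KEY, name TEXT,
                               overallSatisfaction REAL, createdTime TEXT);
INSERT INTO feedback_linguistic (name, overallQuality, createdTime) VALUES
    ('alice', 4, '2024-06-10'), ('bob', 3, '2024-06-12');
INSERT INTO feedback_service (name, overallSatisfaction, createdTime) VALUES
    ('alice', 5, '2024-01-05'), ('bob', 2, '2024-06-15');
""")
CUTOFF = "2024-06-01"

# Date filter on the outer-joined table in WHERE: it runs after the join, so
# alice's row (matched to an old service item) is thrown away entirely.
FILTER_IN_WHERE = """
SELECT l.name, AVG(l.overallQuality), AVG(s.overallSatisfaction)
FROM feedback_linguistic AS l
LEFT JOIN feedback_service AS s ON s.name = l.name
WHERE l.createdTime >= :cutoff
  AND (s.createdTime >= :cutoff OR s.createdTime IS NULL)
GROUP BY l.name
ORDER BY l.name
"""

# Same filter moved into ON: old service items simply do not match, and the
# LEFT JOIN keeps alice with a NULL service average.
FILTER_IN_ON = """
SELECT l.name, AVG(l.overallQuality), AVG(s.overallSatisfaction)
FROM feedback_linguistic AS l
LEFT JOIN feedback_service AS s
       ON s.name = l.name AND s.createdTime >= :cutoff
WHERE l.createdTime >= :cutoff
GROUP BY l.name
ORDER BY l.name
"""


def run(sql):
    """Execute one of the queries above with the cutoff bound as a parameter."""
    return conn.execute(sql, {"cutoff": CUTOFF}).fetchall()


rows_where = run(FILTER_IN_WHERE)
rows_on = run(FILTER_IN_ON)
print(rows_where)  # [('bob', 3.0, 2.0)]
print(rows_on)     # [('alice', 4.0, None), ('bob', 3.0, 2.0)]
```

The condition on the preserved (left) table stays in WHERE; only the condition on the outer-joined table moves into ON. The same reasoning applies to the right exclusion branch, where the date condition on feedback_linguistic would go into ON so that a person with only old linguistic items still counts as having none.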

Related

Is there a way to optimize this query

I have written a query but it's taking a lot of time. I want to know if there is any way to optimize it without creating a temp table in MySQL. Is there a way to optimize the subquery part? (AccessLog2019 is huge, so it's taking forever.)
Here is my query
SELECT distinct l.ListingID,l.City,l.ListingStatus,l.Price,l.Bedrooms,l.FullBathrooms, gc.Latitude,gc.Longitude , count(distinct s.AccessLogID) AS access_count, s.LBID , lb.CurrentListingID
from lockbox.Listings l
JOIN lockbox.GeoCoordinates gc ON l.ListingID = gc.ID
LEFT JOIN lockbox.LockBox lb ON l.ListingID = lb.CurrentListingID
LEFT JOIN
(SELECT * FROM lockbox.AccessLog2019 ac where ac.AccessType not in('1DayCodeGen','BluCodeGen','SmartMACGen') AND DATEDIFF(NOW(), ac.UTCAccessedDT ) < 1 ) s
ON lb.LBID = s.LBID
WHERE l.AssocID = 'AS00000000CC' AND (gc.Confidence <> '5 - Unmatchable' OR gc.Confidence IS NULL OR gc.Confidence = ' ')
group BY l.ListingID
Thanks
If you can avoid the outer group by, that is a big win. I am thinking:
SELECT l.ListingID, l.City, l.ListingStatus, l.Price, l.Bedrooms, l.FullBathrooms,
gc.Latitude, gc.Longitude,
(select count(*)
from lockbox.LockBox lb join
lockbox.AccessLog2019 ac
on lb.LBID = ac.LBID
where l.ListingID = lb.CurrentListingID and
ac.AccessType not in ('1DayCodeGen', 'BluCodeGen', 'SmartMACGen') and
DATEDIFF(NOW(), ac.UTCAccessedDT) < 1
) as cnt
from lockbox.Listings l JOIN
lockbox.GeoCoordinates gc
ON l.ListingID = gc.ID
WHERE l.AssocID = 'AS00000000CC' AND
(gc.Confidence <> '5 - Unmatchable' OR
gc.Confidence IS NULL OR
gc.Confidence = ' '
)
Note: This does not select s.LBID or lb.CurrentListingID because these don't make sense in your query. If I understand correctly, these could have different values on different rows.
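As a rough illustration of that idea, the sketch below compares a correlated scalar subquery with the LEFT JOIN plus GROUP BY form on a tiny made-up data set. It uses an in-memory SQLite database from Python rather than MySQL, simplified hypothetical table and column names (listings, lockbox, access_log), and leaves out the date condition; it only shows that both forms return the same counts, not how either performs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE lockbox (lbid INTEGER PRIMARY KEY, current_listing_id INTEGER);
CREATE TABLE access_log (access_log_id INTEGER PRIMARY KEY,
                         lbid INTEGER, access_type TEXT);
INSERT INTO listings VALUES (1, 'Austin'), (2, 'Dallas');
INSERT INTO lockbox VALUES (10, 1);
INSERT INTO access_log (lbid, access_type) VALUES
    (10, 'Open'), (10, 'Open'), (10, 'BluCodeGen');
""")

# Correlated scalar subquery: one count per listing, no outer GROUP BY needed.
CORRELATED = """
SELECT l.listing_id,
       (SELECT COUNT(*)
          FROM lockbox AS lb
          JOIN access_log AS ac ON ac.lbid = lb.lbid
         WHERE lb.current_listing_id = l.listing_id
           AND ac.access_type NOT IN ('1DayCodeGen', 'BluCodeGen', 'SmartMACGen')
       ) AS cnt
FROM listings AS l
ORDER BY l.listing_id
"""

# Equivalent LEFT JOIN + GROUP BY form, with the filter kept in the ON clause.
GROUPED = """
SELECT l.listing_id, COUNT(DISTINCT s.access_log_id) AS cnt
FROM listings AS l
LEFT JOIN lockbox AS lb ON lb.current_listing_id = l.listing_id
LEFT JOIN access_log AS s
       ON s.lbid = lb.lbid
      AND s.access_type NOT IN ('1DayCodeGen', 'BluCodeGen', 'SmartMACGen')
GROUP BY l.listing_id
ORDER BY l.listing_id
"""

counts_correlated = conn.execute(CORRELATED).fetchall()
counts_grouped = conn.execute(GROUPED).fetchall()
print(counts_correlated)  # [(1, 2), (2, 0)]
print(counts_grouped)     # [(1, 2), (2, 0)]
```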
You could try breaking out the subquery to the JOIN clause.
It might give a hint to the optimizer that it can use the LBID field first, and then test the AccessType later (in case the optimizer doesn't figure that out when you have the sub-select).
SELECT distinct l.ListingID,l.City,l.ListingStatus,l.Price,l.Bedrooms,l.FullBathrooms, gc.Latitude,gc.Longitude , count(distinct s.AccessLogID) AS access_count, s.LBID , lb.CurrentListingID
from lockbox.Listings l
JOIN lockbox.GeoCoordinates gc ON l.ListingID = gc.ID
LEFT JOIN lockbox.LockBox lb ON l.ListingID = lb.CurrentListingID
LEFT JOIN lockbox.AccessLog2019 s
ON lb.LBID = s.LBID
AND s.AccessType not in('1DayCodeGen','BluCodeGen','SmartMACGen')
AND DATEDIFF(NOW(), s.UTCAccessedDT ) < 1
WHERE l.AssocID = 'AS00000000CC' AND (gc.Confidence <> '5 - Unmatchable' OR gc.Confidence IS NULL OR gc.Confidence = ' ')
group BY l.ListingID
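Note that this is one of those cases where putting conditions in the JOIN clause gives different behavior than putting them in a WHERE clause. If you only had lb.LBID = s.LBID in the ON clause and moved the other conditions into the WHERE of the outer query, the results would be different: the WHERE would discard every row without a qualifying access-log entry, including rows with no match at all (where the s columns are NULL), which effectively turns the LEFT JOIN into an inner join. In the JOIN clause, the conditions are part of the outer join itself, so unmatched listings are kept.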
SELECT * --> Select only the columns needed.
SELECT DISTINCT ... GROUP BY -- Do one or the other, not both.
Need composite INDEX(AssocID, ListingID) (in that order)
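DATEDIFF(NOW(), ac.UTCAccessedDT ) < 1 --> ac.UTCAccessedDT > NOW() - INTERVAL 1 DAY (or whatever your intent was; DATEDIFF compares calendar dates only, so the exact equivalent of the original test is ac.UTCAccessedDT >= CURDATE()). Then add INDEX(UTCAccessedDT)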
OR is hard to optimize; consider cleansing the data so that Confidence does not have 3 values that mean the same thing.
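The DATEDIFF point can be illustrated with a small sketch. It uses SQLite from Python, not MySQL, so the date functions (julianday, datetime) and the EXPLAIN QUERY PLAN output are SQLite's own, and the table, index and helper names are hypothetical. The principle it demonstrates is the same one the advice relies on: a column wrapped in a function cannot be matched against an ordinary index range, while a bare column compared with a computed constant can.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE access_log (access_log_id INTEGER PRIMARY KEY, accessed_at TEXT);
CREATE INDEX idx_accessed_at ON access_log (accessed_at);
""")


def plan(sql):
    """Return the detail text of SQLite's query plan for the given statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Column hidden inside a function call: every row has to be examined.
plan_wrapped = plan(
    "SELECT COUNT(*) FROM access_log "
    "WHERE julianday('now') - julianday(accessed_at) < 1"
)

# Bare column compared with a constant expression: an index range search.
plan_bare = plan(
    "SELECT COUNT(*) FROM access_log "
    "WHERE accessed_at > datetime('now', '-1 day')"
)

print(plan_wrapped)
print(plan_bare)
```

In SQLite the first plan is reported as a SCAN and the second as a SEARCH using idx_accessed_at; the exact wording varies between SQLite versions. In MySQL you would check the same thing with EXPLAIN.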

Inner SELECT can't use `u.id` which is in outer SELECT

I have the following MySQL query:
SELECT
COUNT(b.`id`) AS todayOverdue,
DATE_FORMAT(t.`created_time`, "%Y%m%d") AS days
FROM
`Bill` b
LEFT JOIN `Order` o ON b.`order_id` = o.`id`
LEFT JOIN `Trade` t ON o.`trade_id` = t.`id`
LEFT JOIN `User` u ON b.`user_id` = u.`id`
WHERE
b. `deadline` <= "' . $todayTime . '"
AND b. `deadline` >= "' . $todayDate . '"
AND b.`is_paid` = 0
AND (
SELECT
COUNT(b2.`id`)
FROM
`Bill` b2
WHERE
b2.`deadline` <= "' . $todayTime . '"
AND b2.`user_id` = u.`id`
AND b2.`is_paid` = 0
OR (
b2.`deadline` <= b2.`paid_time`
AND b2.`is_paid` = 1
)
) < 2
GROUP BY
days
Why can't the inner SELECT use u.id which is in outer SELECT?
The inner select is completely independent of the outer select. The u table is being joined in the outer select in ways that are unknown to the inner select.
When you have
AND b2.`user_id` = u.`id`
Which row in u is being compared to which row in b2? The server has no way of knowing, so you need to define a table u2 and join it in the inner select.

mysql query group by in group_concat

I have the following query:
SELECT files.file_name, files.locked, projects.project_name,
group_concat( versions.version, versions.language SEPARATOR ' & ')
FROM files
JOIN `projects` ON (files.project_id = projects.project_id)
JOIN `versions` ON (files.file_id = versions.file_id)
WHERE files.file_id = '1'
ORDER BY projects.project_name ASC
This gives me this result:
filename - 1 - projectname - 0.1EN & 0.2FR & 0.3DE & 0.1IT
What I want is for the query to output something like this:
filename - 1 - projectname - 0.1-EN,IT & 0.2-FR & 0.3-DE
So I tried this:
group_concat( versions.version, versions.language GROUP BY versions.version SEPARATOR ' & ')
but MySQL did not like that.
How can I get the result I want? Thanks.
Edit: sample tables!
Give this a try,
SELECT a.file_name,
a.locked,
b.project_name,
GROUP_CONCAT(c.version, '-', c.language SEPARATOR ' & ')
FROM files a
INNER JOIN projects b
ON a.project_id = b.project_ID
INNER JOIN
(
SELECT file_ID, `version`, GROUP_CONCAT(language) language
FROM versions
GROUP BY file_ID, `version`
) c ON a.file_ID = c.file_ID
WHERE a.file_ID = 1
GROUP BY a.file_name,
a.locked,
b.project_name
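The two-level aggregation used here (first collect the languages per version, then concatenate the versions) can be tried out with the sketch below. It runs on an in-memory SQLite database from Python, so it uses SQLite's GROUP_CONCAT(expr, separator) form and the || operator instead of MySQL's SEPARATOR keyword and multi-argument GROUP_CONCAT. The sample rows and the function name version_summary are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE versions (file_id INTEGER, version TEXT, language TEXT);
INSERT INTO versions VALUES
    (1, '0.1', 'EN'), (1, '0.2', 'FR'), (1, '0.3', 'DE'), (1, '0.1', 'IT');
""")


def version_summary(file_id):
    """Build a 'version-lang,lang & version-lang' string for one file."""
    sql = """
    SELECT GROUP_CONCAT(v.version || '-' || v.languages, ' & ')
    FROM (
        -- inner level: one row per version with its languages joined by ','
        SELECT file_id, version, GROUP_CONCAT(language) AS languages
        FROM versions
        GROUP BY file_id, version
    ) AS v
    WHERE v.file_id = ?
    """
    return conn.execute(sql, (file_id,)).fetchone()[0]


summary = version_summary(1)
print(summary)
```

The result has the shape 0.1-EN,IT & 0.2-FR & 0.3-DE, but the order of the items inside each GROUP_CONCAT is not guaranteed unless you ask for one. In MySQL you can add ORDER BY inside GROUP_CONCAT to fix it.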

SUM() on all rows of a LEFT JOIN?

I've been playing with LEFT JOINs and I'm wondering if it's possible to get the SUM of all the ratings so far from all users in the query below. The query below gets me the information on whether the logged-in user has rated or not, but I also want to show the total ratings from other users.
$query = mysql_query("
SELECT com.comment, com.comment_id, r.rate_up, r.rate_down
FROM comments com
LEFT JOIN ratings r
ON com.comment_id = r.comment_id
AND r.user_id = '" . $user_id_var . "'
WHERE page_id = '" . $category_id_var. "'");
I've tried the following but I only get one comment/row returned for some reason.
$query = mysql_query("
SELECT com.comment, com.comment_id,
r.rate_up, r.rate_down,
SUM(r.rate_up) AS total_up_ratings,
SUM(r.rate_down) AS total_down_ratings
FROM comments com
LEFT JOIN ratings r
ON com.comment_id = r.comment_id
AND r.user_id = '" . $user_id_var . "'
WHERE page_id = '" . $category_id_var. "'");
Any help appreciated. Do I need a different kind of JOIN?
Have you tried using GROUP BY page_id on the end of your SQL?
You could do it something like this:
SELECT
com.comment,
com.comment_id,
Total.total_down_ratings,
Total.total_up_ratings
FROM comments com
LEFT JOIN
(
SELECT
SUM(r.rate_up) AS total_up_ratings,
SUM(r.rate_down) AS total_down_ratings,
r.comment_id
FROM
ratings r
GROUP BY
r.comment_id
) AS Total
ON com.comment_id = Total.comment_id
WHERE page_id = '" . $category_id_var. "'"
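If you use an aggregate function in SQL (like SUM()) without a GROUP BY clause, the whole result is collapsed into a single row, which is why you only get one comment back. You will need a corresponding GROUP BY clause.
In your case the most likely one would be com.comment_id; this will give you the sum of all ratings per comment_id.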
I don't know if you are duplicating comment rows on purpose, but if not you can try writing a subselect for the ratings like this:
select comment_id, user_id, sum(rate_up) as sum_rate_up,
sum(rate_down) as sum_rate_down from ratings
group by comment_id, user_id;
and then include it in your join query:
select com.comment, com.comment_id, r.user_id, r.sum_rate_up,
r.sum_rate_down from comments com
left join (select comment_id, user_id, sum(rate_up) as sum_rate_up,
sum(rate_down) as sum_rate_down from ratings
group by comment_id, user_id) as r
on com.comment_id = r.comment_id where page_id = '".$category_id_var."'
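One way to get both the logged-in user's own rating and the totals from all users in a single result is to join ratings twice: once filtered to the user, and once pre-aggregated per comment. The sketch below is only an illustration under assumptions: it runs on an in-memory SQLite database from Python instead of MySQL with PHP, the sample rows are invented, and the aliases mine and totals and the function comments_with_ratings are hypothetical names. It also binds the values as parameters instead of concatenating them into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, page_id INTEGER,
                       comment TEXT);
CREATE TABLE ratings (comment_id INTEGER, user_id INTEGER,
                      rate_up INTEGER, rate_down INTEGER);
INSERT INTO comments VALUES (1, 10, 'first'), (2, 10, 'second'),
                            (3, 11, 'other page');
INSERT INTO ratings VALUES (1, 7, 1, 0), (1, 8, 1, 0), (1, 9, 0, 1),
                           (3, 7, 1, 0);
""")

QUERY = """
SELECT com.comment, com.comment_id,
       mine.rate_up, mine.rate_down,                 -- this user's own rating
       COALESCE(totals.total_up_ratings, 0),         -- totals over all users
       COALESCE(totals.total_down_ratings, 0)
FROM comments AS com
LEFT JOIN ratings AS mine
       ON mine.comment_id = com.comment_id AND mine.user_id = :user_id
LEFT JOIN (
    SELECT comment_id,
           SUM(rate_up) AS total_up_ratings,
           SUM(rate_down) AS total_down_ratings
    FROM ratings
    GROUP BY comment_id
) AS totals ON totals.comment_id = com.comment_id
WHERE com.page_id = :page_id
ORDER BY com.comment_id
"""


def comments_with_ratings(user_id, page_id):
    """Return one row per comment on the page, rated or not."""
    params = {"user_id": user_id, "page_id": page_id}
    return conn.execute(QUERY, params).fetchall()


rows = comments_with_ratings(user_id=7, page_id=10)
print(rows)
# [('first', 1, 1, 0, 2, 1), ('second', 2, None, None, 0, 0)]
```

Because the totals are aggregated inside the derived table, the outer query needs no GROUP BY and every comment on the page keeps its own row.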

Mysql query - JOINS and WHERE clause, beginner here :)

I'm having difficulties with a form and MySQL. There are 3 tables, plus a sum of one table's values. The form provides the value to search for, but the query does not work because the "WHERE >= '$search_total_rating'" is in the wrong place. I am doing something very wrong here.
$result = mysql_query("SELECT coffeeshops.*, services.*, ratings.*, sum(temp.total) as final_total FROM coffeeshops inner join services on coffeeshops.shop_id=services.shop_id
inner join ratings on coffeeshops.shop_id=ratings.shop_id
inner join (select SUM(comfort + service + ambience + friendliness + spacious)/(5) / COUNT(shop_id) AS total, shop_id FROM ratings GROUP BY shop_id) as temp on coffeeshops.shop_id=temp.shop_id WHERE >= '$search_total_rating'");
I do not fully understand this, but what I am trying to do is filter with WHERE so that the total rating sum is >= the selected rating. I am trying to access final_total, which is not an actual column in my database; that is why SUM is being used to get the total rating for each shop. Hopefully it is a minor shuffle of the code. Thanks
You should use HAVING instead of WHERE:
SELECT coffeeshops.*, services.*, ratings.*, sum(temp.total) as final_total
FROM coffeeshops inner join services on coffeeshops.shop_id=services.shop_id
inner join ratings on coffeeshops.shop_id=ratings.shop_id
inner join (
select SUM(comfort + service + ambience + friendliness + spacious)/5/ COUNT(shop_id) AS total, shop_id
FROM ratings GROUP BY shop_id)
as temp on coffeeshops.shop_id=temp.shop_id
having final_total >= '$search_total_rating'
You have already calculated your total in the subquery. No need for a second SUM().
SELECT coffeeshops.*
, services.*
, ratings.*
, temp.total as final_total
FROM coffeeshops
inner join services
on coffeeshops.shop_id = services.shop_id
inner join ratings
on coffeeshops.shop_id = ratings.shop_id
inner join
( select SUM(comfort + service + ambience + friendliness + spacious) / 5
/ COUNT(shop_id) AS total
, shop_id
FROM ratings
GROUP BY shop_id
) as temp
on coffeeshops.shop_id = temp.shop_id
WHERE temp.total >= '$search_total_rating'
You could also use a HAVING in the subquery:
SELECT coffeeshops.*
, services.*
, ratings.*
, temp.total as final_total
FROM coffeeshops
inner join services
on coffeeshops.shop_id = services.shop_id
inner join ratings
on coffeeshops.shop_id = ratings.shop_id
inner join
( select SUM(comfort + service + ambience + friendliness + spacious) / 5
/ COUNT(shop_id) AS total
, shop_id
FROM ratings
GROUP BY shop_id
HAVING total >= '$search_total_rating'
) as temp
on coffeeshops.shop_id = temp.shop_id
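The two variants above (WHERE on the derived column in the outer query, or HAVING inside the subquery) can be compared with the following sketch. It is an illustration only: it runs on an in-memory SQLite database from Python rather than MySQL with PHP, the tables are reduced to the columns needed, and the sample shops and the function shops_rated_at_least are made up. The per-shop score is written as AVG(.../5.0), which is the same value as SUM(...)/5/COUNT(shop_id); the 5.0 is there because SQLite, unlike MySQL, does integer division on integers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coffeeshops (shop_id INTEGER PRIMARY KEY, shop_name TEXT);
CREATE TABLE ratings (shop_id INTEGER, comfort INTEGER, service INTEGER,
                      ambience INTEGER, friendliness INTEGER, spacious INTEGER);
INSERT INTO coffeeshops VALUES (1, 'Bean There'), (2, 'Daily Grind');
INSERT INTO ratings VALUES (1, 5, 5, 5, 5, 5), (1, 3, 3, 3, 3, 3),
                           (2, 2, 2, 2, 2, 2);
""")

SCORE = "AVG((comfort + service + ambience + friendliness + spacious) / 5.0)"

# Variant 1: aggregate in a derived table, filter its column with WHERE.
FILTER_OUTSIDE = f"""
SELECT c.shop_id, c.shop_name, temp.total
FROM coffeeshops AS c
JOIN (SELECT shop_id, {SCORE} AS total
      FROM ratings
      GROUP BY shop_id) AS temp ON temp.shop_id = c.shop_id
WHERE temp.total >= :minimum
ORDER BY c.shop_id
"""

# Variant 2: filter the groups with HAVING inside the derived table.
FILTER_INSIDE = f"""
SELECT c.shop_id, c.shop_name, temp.total
FROM coffeeshops AS c
JOIN (SELECT shop_id, {SCORE} AS total
      FROM ratings
      GROUP BY shop_id
      HAVING {SCORE} >= :minimum) AS temp ON temp.shop_id = c.shop_id
ORDER BY c.shop_id
"""


def shops_rated_at_least(sql, minimum):
    """Run one of the two variants with the threshold bound as a parameter."""
    return conn.execute(sql, {"minimum": minimum}).fetchall()


outside = shops_rated_at_least(FILTER_OUTSIDE, 3)
inside = shops_rated_at_least(FILTER_INSIDE, 3)
print(outside)  # [(1, 'Bean There', 4.0)]
print(inside)   # [(1, 'Bean There', 4.0)]
```

Binding the threshold as a parameter also avoids putting the form value directly into the SQL string.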