MySQL - add clauses to left join


I have a table called properties (p) and another table called certificates (c). There can be more than one certificate allocated against each property or no certificate at all. I need to produce a query that uses a join and only displays one certificate from the certificates table per property. The one certificate that is shown needs to be the one with the most recent expiry date. There is a field in the certificates table named 'certificate_expiry_date'. The simple join would be p.property_id = c.certificate_property but this currently outputs all certificates.
My Query Attempt
Here's my query so far:
SELECT DISTINCT t.tenancy_property, t.*, p.*, c.* FROM tenancy t
INNER JOIN property p
on t.tenancy_property = p.property_id
LEFT JOIN
(
SELECT *
FROM certificate
WHERE certificate_expiry_date > CURDATE()
ORDER BY certificate_expiry_date DESC
LIMIT 1
) c ON p.property_id = c.certificate_property
WHERE t.tenancy_type='1'
AND p.property_mains_gas_supply='1'
AND p.property_availability='2'
ORDER BY t.tenancy_id DESC
LIMIT {$startpoint} , {$per_page}
This query executes fine but doesn't seem to take into account the left join on the certificates table.
Table structure for table certificate
CREATE TABLE IF NOT EXISTS `certificate` (
`certificate_id` int(11) NOT NULL AUTO_INCREMENT,
`certificate_property` int(11) DEFAULT NULL,
`certificate_type` tinyint(4) DEFAULT NULL,
`certificate_reference` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`certificate_start_date` date DEFAULT NULL,
`certificate_expiry_date` date DEFAULT NULL,
`certificate_notes` text COLLATE utf8_bin,
`certificate_renewal_instructed` tinyint(4) DEFAULT NULL,
`certificate_renewal_contractor` int(11) DEFAULT NULL,
PRIMARY KEY (`certificate_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=219 ;

If we only need to return one or two columns from the certificates table, we can sometimes use correlated subqueries in the SELECT list.
This approach has some performance implications for large tables; but for some use cases, with appropriate indexes available, this can be a workable approach.
SELECT p.id
, p.somecol
, ( SELECT c.col
FROM certificate c
WHERE c.property_id = p.id
ORDER BY c.date_added DESC, c.id DESC
LIMIT 1
) AS most_recent_cert_col
, ( SELECT c.date_added
FROM certificate c
WHERE c.property_id = p.id
ORDER BY c.date_added DESC, c.id DESC
LIMIT 1
) AS most_recent_cert_date_added
FROM property p
WHERE ...
ORDER BY ...

Updated answer with your updated information
Something like this?
(Note: for a property with no matching certificate, the correlated sub-query qMostRecentExpire returns NULL instead of a certificate id.)
select
p.property_id
, p.*
, ( select
c.certificate_id
from
certificates as c
where
c.certificate_property = p.property_id -- all the cert of this property
and c.certificate_expiry_date < CURDATE() -- cert has expired
order by c.certificate_expiry_date desc
limit 1 -- most recent one
) as qMostRecentExpire
from
properties as p
Updated answer after knowing that some properties may have no certificates
select
p.property_id
, p.*
, ( select
c.certificate_id
from
certificates as c
where
c.certificate_property = p.property_id -- all the cert of this property
and c.certificate_expiry_date < CURDATE() -- cert has expired
order by c.certificate_expiry_date desc
limit 1 -- most recent one
) as qMostRecentExpire
from
properties as p
, certificates as c -- inner join: properties that
where -- have no certificate will be dropped
p.property_id = c.certificate_property
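The question's derived table returns a single certificate overall, because its LIMIT 1 applies to the whole subquery rather than to each property. One common alternative is to LEFT JOIN to a derived table holding each property's latest expiry date, then join back to the certificate row. Below is a minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for MySQL. Table and column names follow the question; the sample rows are made up for illustration. If two certificates of one property share the same latest expiry date, both rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE property (property_id INTEGER PRIMARY KEY);
CREATE TABLE certificate (
    certificate_id INTEGER PRIMARY KEY,
    certificate_property INTEGER,
    certificate_expiry_date TEXT
);
-- Hypothetical sample data: property 3 has no certificate at all.
INSERT INTO property VALUES (1), (2), (3);
INSERT INTO certificate VALUES
    (10, 1, '2030-01-01'),
    (11, 1, '2031-06-30'),
    (12, 2, '2029-12-31');
""")

latest_cert_sql = """
SELECT p.property_id, c.certificate_id, c.certificate_expiry_date
FROM property p
LEFT JOIN (
    -- latest expiry date per property
    SELECT certificate_property, MAX(certificate_expiry_date) AS max_expiry
    FROM certificate
    GROUP BY certificate_property
) latest ON latest.certificate_property = p.property_id
LEFT JOIN certificate c
       ON c.certificate_property = latest.certificate_property
      AND c.certificate_expiry_date = latest.max_expiry
ORDER BY p.property_id
"""
rows = conn.execute(latest_cert_sql).fetchall()
for row in rows:
    print(row)
```

Properties without a certificate are kept, with NULL (None in Python) in the certificate columns, which is what the LEFT JOIN in the question was aiming for.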

Related

Optimize table to avoid using temporary and using filesort

I have a messages table
CREATE TABLE `messages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`author` int(11) DEFAULT NULL,
`time` int(10) unsigned DEFAULT NULL,
`text` text CHARACTER SET latin1,
`dest` int(11) unsigned DEFAULT NULL,
`type` tinyint(4) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `author` (`author`),
KEY `dest` (`dest`)
) ENGINE=InnoDB AUTO_INCREMENT=2758 DEFAULT CHARSET=utf8;
I need to get messages between two users
SELECT
...
FROM
`messages` m
LEFT JOIN `people` p ON m.author = p.id
WHERE
(author = 1 AND dest = 2)
OR (author = 2 AND dest = 1)
ORDER BY
m.id DESC
LIMIT 0, 25
When I EXPLAIN this query I get
Please excuse any ignorance, but is there a way I could optimize this table to avoid using a temporary table and filesort for this query? For now it is not causing a problem, but I'm pretty sure it is going to be troublesome in the future.

First, I'm guessing the left join is not necessary. Second, consider using union all instead. Then one approach is:
(SELECT ...
FROM messages m JOIN
people p
ON m.author = p.id
WHERE m.author = 1 AND m.dest = 2
ORDER BY m.id DESC
LIMIT 25
)
UNION ALL
(SELECT ...
FROM messages m JOIN
people p
ON m.author = p.id
WHERE m.author = 2 AND m.dest = 1
ORDER BY m.id DESC
LIMIT 25
)
ORDER BY id DESC
LIMIT 0, 25
With this query, an index on messages(author, dest, id) should make it fast. (Note: the final ORDER BY refers to the id column of the UNION result, not to a table alias, so m.id needs to be in the SELECT list.)

To build on Gordon's answer:
SELECT m2..., p...
FROM
(
( SELECT id
FROM messages
WHERE author = 1
AND dest = 2
ORDER BY id DESC
LIMIT 75
)
UNION ALL
(
SELECT id
FROM messages
WHERE author = 2
AND dest = 1
ORDER BY id DESC
LIMIT 75
)
ORDER BY id DESC
LIMIT 50, 25 ) AS m1
JOIN messages AS m2 ON m2.id = m1.id
JOIN people p ON p.id = m2.author
ORDER BY m1.id DESC
Notes:
Gordon's index is now "covering". (This adds efficiency, thereby masking some of the other stuff I added.)
Lazy evaluation means that it does not need to shovel all the bulky fields of more than 25 rows around. Instead, only 25 need to be handled. Also, I avoid touching people to start with.
The code shows what "page 3" should look like. Note LIMIT 75 versus LIMIT 50,25.
"Pagination via OFFSET" has several problems. See my blog.
This formulation still will not avoid "filesort" and "using temp". But speed is the real goal, correct? ("Filesort" is a misnomer -- if you don't include that TEXT column, the sort will be done in RAM.)
When you add INDEX(author, dest, id), INDEX(author) becomes redundant; drop it.
The ALL after UNION is not the default for UNION, but it avoids an extra pass (and temp table) to de-duplicate the data.
There will still be 2 or 3 temp tables involved. See EXPLAIN FORMAT=JSON SELECT ... for details.
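To check that the UNION ALL rewrite returns the same page as the original OR query, here is a small sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The data is made up. SQLite needs each limited branch wrapped in a subquery, whereas MySQL accepts the parenthesized form used in the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, author INTEGER, dest INTEGER)"
)
# Composite index suggested in the answers above.
conn.execute("CREATE INDEX author_dest_id ON messages (author, dest, id)")
# Hypothetical data: 200 messages cycling over three (author, dest) pairs.
pairs = [(1, 2), (2, 1), (1, 3)]
conn.executemany(
    "INSERT INTO messages (id, author, dest) VALUES (?, ?, ?)",
    [(i, *pairs[i % 3]) for i in range(1, 201)],
)

or_sql = """
SELECT id FROM messages
WHERE (author = 1 AND dest = 2) OR (author = 2 AND dest = 1)
ORDER BY id DESC LIMIT 25
"""

# Each branch keeps only its own newest 25 rows before the final sort.
union_sql = """
SELECT id FROM (
    SELECT id FROM (SELECT id FROM messages WHERE author = 1 AND dest = 2
                    ORDER BY id DESC LIMIT 25)
    UNION ALL
    SELECT id FROM (SELECT id FROM messages WHERE author = 2 AND dest = 1
                    ORDER BY id DESC LIMIT 25)
)
ORDER BY id DESC LIMIT 25
"""

or_ids = [r[0] for r in conn.execute(or_sql)]
union_ids = [r[0] for r in conn.execute(union_sql)]
print(or_ids == union_ids)
```

Limiting each branch to 25 rows is enough for the first page, because the newest 25 rows overall can never include a row that falls outside the newest 25 of its own branch.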

How to improve my friend list MySQL query?

I have a big MySQL query which returns the correct result set, but it is quite slow.
SELECT DISTINCT
username,
user.id as uid,
email,
ui.gender,
ui.country,
ui.birthday,
IF( last_activity_date >= now() - INTERVAL 1 HOUR, 1, 0) as is_online,
p.thumb_url,
friend_request.id as sr_id,
IF( ul.id IS NOT NULL, 1, 0) as st_love,
DATE_FORMAT( NOW(), '%Y') - DATE_FORMAT( birthday, '%Y') - (DATE_FORMAT( NOW(), '00-%m-%d') < DATE_FORMAT( birthday, '00-%m-%d')) AS age
FROM friend_request
JOIN user ON (user.`id` = friend_request.`to_user_id`
OR
user.`id` = friend_request.`from_user_id`)
AND user.`id` != '$user_id'
AND friend_request.`status` = '1'
JOIN user_info ui ON user.`id` = ui.`user_id`
JOIN photo p ON ui.`main_photo` = p.`id`
LEFT JOIN user_love ul ON ul.`to_user_id` = user.`id`
AND ul.`from_user_id` = $user_id
WHERE (friend_request.`to_user_id` = '$user_id'
OR friend_request.`from_user_id` = '$user_id')
ORDER BY friend_request.id DESC
LIMIT 30
"$user_id" is the id of the logged-in user.
Here is the table structure of "friend_request" :
CREATE TABLE `friend_request` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`from_user_id` int(11) DEFAULT NULL,
`to_user_id` int(11) DEFAULT NULL,
`date` datetime DEFAULT NULL,
`seen` int(11) DEFAULT '0',
`status` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `from_user_id` (`from_user_id`),
KEY `to_user_id` (`to_user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Can you help me to improve this query?
I have not copied the table structure of the other tables because, after some tests, the "optimization issue" seems to come from the friend_request table.
Thanks!
EDIT :
Here is what "EXPLAIN" gives me :

You should look at the query plan, for example with the EXPLAIN keyword (https://dev.mysql.com/doc/refman/5.5/en/using-explain.html), so that you can find which parts of your query take the longest and optimize them.
Things that jump out to me:
1) You may need to reduce your joins, or optimize them
2) You might need some indexes
3) You have an OR in your WHERE clause, which may affect the cached query plan. I have seen an OR cause issues with query caching in T-SQL; I'm not sure whether that affects MySQL. https://dev.mysql.com/doc/refman/5.1/en/query-cache.html
edit: formatting, found out the photo table join was necessary for the data that was being selected

I'd write the query like this. (I'd also qualify the references to all columns in the query, username, email, last_activity_date). I'd also be doing this as a prepared statement with bind placeholders, rather than including the $user_id variable into the SQL text. We're assuming that the contents of $user_id is known to be "safe", or has been properly escaped, to avoid SQL injection.
SELECT username
, user.id AS uid
, email
, ui.gender
, ui.country
, ui.birthday
, IFNULL(( last_activity_date >= NOW() - INTERVAL 1 HOUR),0) AS is_online
, p.thumb_url
, sr.id AS sr_id
, ul.id IS NOT NULL AS st_love
, TIMESTAMPDIFF(YEAR,ui.birthday,DATE(NOW())) AS age
FROM (
( SELECT frf.id
, frf.to_user_id AS user_id
FROM friend_request frf
WHERE frf.to_user_id <> '$user_id'
AND frf.from_user_id = '$user_id'
AND frf.status = '1'
ORDER BY frf.id DESC
LIMIT 30
)
UNION ALL
( SELECT frt.id
, frt.from_user_id AS user_id
FROM friend_request frt
WHERE frt.from_user_id <> '$user_id'
AND frt.to_user_id = '$user_id'
AND frt.status = '1'
ORDER BY frt.id DESC
LIMIT 30
)
ORDER BY id DESC
LIMIT 30
) sr
JOIN user ON user.id = sr.user_id
JOIN user_info ui ON ui.user_id = user.id
JOIN photo p ON p.id = ui.main_photo
LEFT
JOIN user_love ul
ON ul.to_user_id = user.id
AND ul.from_user_id = '$user_id'
ORDER BY sr.id DESC
LIMIT 30
NOTES:
The inline view sr is intended to get at most 30 rows from friend_request, ordered by id descending (like the original query.)
It looks like the original query is intending to find rows where the specified $user_id is in either the from_ or to_ column.
Older versions of MySQL can generate some pretty obnoxious execution plans for queries involving OR predicates for JOIN operations. The usual workaround for that is to use a UNION ALL or UNION operation of the return from two separate SELECT statements; each SELECT can be optimized to use an appropriate index.
Once we have that resultset from sr, the rest of the query is pretty straightforward.

Get response time per day of week

I have 3 tables:
CREATE TABLE `ticket` (
`tid` int(11) NOT NULL AUTO_INCREMENT,
`sid` varchar(50) NOT NULL,
`open_date` datetime NOT NULL,
PRIMARY KEY (`tid`),
KEY `sid` (`sid`,`open_date`),
KEY `open_date` (`open_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `ticket_reply` (
`rid` int(11) NOT NULL AUTO_INCREMENT,
`tid` int(11) NOT NULL,
`reply_date` datetime NOT NULL,
PRIMARY KEY (`rid`),
KEY `tid` (`tid`,`reply_date`),
KEY `reply_date` (`reply_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `subscription` (
`sid` varchar(50) NOT NULL,
`response_time` int(11) NOT NULL DEFAULT '24',
PRIMARY KEY (`sid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I'm trying to get the sum of response times, where the response time is the interval between when a ticket was opened and its first reply, and group it by DAYNAME (maybe by MONTH also). Currently I have this SQL:
SELECT
t.tid,
DAYNAME(t.open_date) AS day_opened,
SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min,
SUM(s.response_time * 60) AS response_time_min
FROM ticket t
INNER JOIN ticket_reply tr ON tr.tid = t.tid
INNER JOIN subscription s ON s.sid = t.sid
GROUP BY
t.tid #group by tid as ticket_reply may return many
ORDER BY t.open_date DESC;
So the first challenge I had was getting the first ticket_reply row, which I solved with GROUP BY. I tried to use a subquery in the join, but it was still returning a row per ticket_reply row.
So now I want to start grouping by DAYNAME and maybe MONTH but if I add it to the GROUP BY it doesn't group:
GROUP BY
t.tid,
DAYNAME(t.open_date)
Have tried DAYNAME before tid but that didn't make any difference.
So I have a couple of questions: is there a better way to get the first row in ticket_reply, and then group by the DAYNAME? I have a feeling that getting the first row in a subquery may fix the grouping.

It is grouping. But because you have t.tid in the GROUP BY clause, and the tid column is unique in the ticket table, multiple rows from ticket are not getting collapsed; each is going to be on its own row.
It's not exactly clear what result you want to return.
(The SUM(s.response_time) expression in the SELECT list seems a bit odd, given that you could be matching multiple rows from ticket_reply.)
Given your existing statement, it looks like you might want to use an inline view to return the "earliest" reply_date for each ticket, in place of the reference to the ticket_reply table.
JOIN /*ticket_reply*/
( SELECT r.tid
, MIN(r.reply_date) AS reply_date
FROM ticket_reply r
GROUP BY r.tid
) tr ON tr.tid = t.tid
(Unfortunately, materializing the inline view (populating and accessing an intermediate "derived table") can be the source of a performance issue.)
As another option, rather than performing a JOIN operation, you could consider using a correlated subquery in the SELECT list. That is, in place of the reference to tr.reply_date, you could do something like:
(SELECT MIN(r.reply_date) FROM ticket_reply r WHERE r.tid = t.tid)
and remove the JOIN to the ticket_reply table.
But, repeated execution of that subquery (once for each row returned), can also be a performance issue for large sets.
But the "big" question is whether you need to add up s.response_time for each occurrence of a matching ticket_reply (as your current query is doing), or whether you just need to include the s.response_time once for each ticket?
That is, if there are three ticket_reply for a given ticket, do we need to "triple" the value of response_time that we add to the row?
If you need to include the response_time in the total for each ticket_reply, then this:
SELECT DAYNAME(t.open_date) AS day_opened
, SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min
, SUM(s.response_time * tr.cnt_replies) * 60 AS response_time_min
FROM ticket t
JOIN ( SELECT r.tid
, MIN(r.reply_date) AS reply_date
, COUNT(1) AS cnt_replies
FROM ticket_reply r
GROUP BY r.tid
) tr
ON tr.tid = t.tid
JOIN subscription s
ON s.sid = t.sid
GROUP
BY day_opened
If you only need to include the response_time in the total once for each ticket, remove the references to cnt_replies:
SELECT DAYNAME(t.open_date) AS day_opened
, SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min
, SUM(s.response_time) * 60 AS response_time_min
FROM ticket t
JOIN ( SELECT r.tid
, MIN(r.reply_date) AS reply_date
FROM ticket_reply r
GROUP BY r.tid
) tr
ON tr.tid = t.tid
JOIN subscription s
ON s.sid = t.sid
GROUP
BY day_opened
To GROUP BY month, just change the first expression in the SELECT list, and the reference in the GROUP BY clause.
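As a rough check of the "count each ticket once" variant, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no DAYNAME or TIMESTAMPDIFF, so strftime('%w', ...) (weekday number, '0' = Sunday) and a difference of epoch seconds are used instead; that substitution and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket (tid INTEGER PRIMARY KEY, sid TEXT, open_date TEXT);
CREATE TABLE ticket_reply (rid INTEGER PRIMARY KEY, tid INTEGER, reply_date TEXT);
CREATE TABLE subscription (sid TEXT PRIMARY KEY, response_time INTEGER);
-- Hypothetical sample data; 2024-01-01 was a Monday.
INSERT INTO subscription VALUES ('a', 24);
INSERT INTO ticket VALUES
    (1, 'a', '2024-01-01 09:00:00'),
    (2, 'a', '2024-01-08 10:00:00'),
    (3, 'a', '2024-01-02 12:00:00');
INSERT INTO ticket_reply VALUES
    (1, 1, '2024-01-01 09:30:00'),
    (2, 1, '2024-01-01 11:00:00'),
    (3, 2, '2024-01-08 10:45:00'),
    (4, 3, '2024-01-02 12:10:00');
""")

sql = """
SELECT strftime('%w', t.open_date) AS weekday_opened,
       SUM((strftime('%s', tr.reply_date) - strftime('%s', t.open_date)) / 60)
           AS num_min,
       SUM(s.response_time) * 60 AS response_time_min
FROM ticket t
JOIN ( -- earliest reply per ticket, so each ticket contributes one row
       SELECT r.tid, MIN(r.reply_date) AS reply_date
       FROM ticket_reply r
       GROUP BY r.tid
     ) tr ON tr.tid = t.tid
JOIN subscription s ON s.sid = t.sid
GROUP BY weekday_opened
ORDER BY weekday_opened
"""
by_weekday = conn.execute(sql).fetchall()
print(by_weekday)
```

Ticket 1 has two replies, but only the first (30 minutes after opening) is counted, so Monday totals 30 + 45 = 75 minutes and its response_time is added once per ticket.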

How to return an ID for the row that has MIN/MAX value within a group?

SELECT
MAX(`client_id`) `client_id`
FROM
`phrases`
WHERE
`language_id` = 1 AND
`client_id` = 1 OR
`client_id` IS NULL
GROUP BY
`language_phrase_id`
How do I get the id for the row that holds MAX(`client_id`) value?
I need this in the context of derived table, e.g.
SELECT
`p2`.`phrase`
FROM
(SELECT `language_phrase_id`, MAX(`client_id`) `client_id` FROM `phrases` WHERE `language_id` = 1 AND `client_id` = 1 OR `client_id` IS NULL GROUP BY `language_phrase_id`) `p1`
INNER JOIN
`phrases` `p2`
ON
`p2`.`language_id` = 1 AND
`p1`.`language_phrase_id` = `p2`.`language_phrase_id` AND
`p1`.`client_id` = `p2`.`client_id`;

Use window functions to find the max for each group and then a where clause to select the row with the maximum:
select p.*
from (SELECT p.*, max(client_id) over (partition by language_phrase_id) as maxci
from phrases p
WHERE (`language_id` = 1 AND `client_id`= 1) or
`client_id` IS NULL
) p
where client_id = maxci
I also added parentheses to clarify your where statement. When mixing and and or, I always use parentheses to avoid possible confusion and mistakes.
Now that you've added the mysql tag to your question, note that this won't work on MySQL versions before 8.0, which have no window functions. So, here is a MySQL-specific solution:
select language_phrase_id,
substring_index(group_concat(id order by client_id desc), ',', 1) as max_id
from phrases
group by language_phrase_id
Note that id will be converted to a character string in this process. If it has a different type, you can convert it back.

I had a bit of trouble understanding the requirements, but this seems to be what you're looking for.
Not the most beautiful SQL and I'm sure it can be simplified, but a starting point;
SELECT p1.id, p1.phrase
FROM phrases p1
LEFT JOIN `phrases` p2
ON p1.language_id=p2.language_id
AND p1.language_phrase_id=p2.language_phrase_id
AND p1.client_id IS NULL and p2.client_id = 1
WHERE p2.id IS NULL AND p1.language_id=1
AND (p1.client_id=1 or p1.client_id IS NULL)
GROUP BY p1.language_phrase_id
An SQLfiddle for testing.
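Another way to pick one row id per group is a correlated subquery that sorts the candidates and keeps the first. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL and assumes the intent is "prefer the client 1 row, otherwise fall back to the row with a NULL client_id". The sample rows and that reading of the requirement are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phrases (
    id INTEGER PRIMARY KEY,
    language_id INTEGER,
    language_phrase_id INTEGER,
    client_id INTEGER,
    phrase TEXT
);
-- Hypothetical rows: phrase 100 has a default and a client-1 override.
INSERT INTO phrases VALUES
    (1, 1, 100, NULL, 'hello'),
    (2, 1, 100, 1,    'hello, client 1'),
    (3, 1, 101, NULL, 'bye'),
    (4, 1, 102, 2,    'other client only');
""")

sql = """
SELECT p.id, p.phrase
FROM phrases p
WHERE p.id = (
    SELECT p2.id
    FROM phrases p2
    WHERE p2.language_id = 1
      AND p2.language_phrase_id = p.language_phrase_id
      AND (p2.client_id = 1 OR p2.client_id IS NULL)
    ORDER BY p2.client_id IS NULL, p2.id   -- client rows sort before defaults
    LIMIT 1
)
ORDER BY p.id
"""
chosen = conn.execute(sql).fetchall()
print(chosen)
```

Phrase 100 resolves to the client-specific row, phrase 101 falls back to the default, and phrase 102 is skipped because it only exists for another client.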

mysql having... > avg() doesn't work as expected

I've created two views to help calculate user_diary_number and then select the users whose diary count is greater than the average diary count across all users.
The two views are below:
create view user_diary_number as
(
select user_id,count( distinct diary_id ) as diary_num
from user_diary
group by user_id
);
and second using having and avg:
create view hw_diary as
(
select u.user_id, u.realname, ud.diary_num, school.school_name
from (user as u cross join user_diary_number as ud on u.user_id = ud.user_id )cross join school on u.school_id = school.school_id
having diary_num > avg(diary_num)
);
The problem is that the second view returns only 1 row, although we certainly have more than one user whose diary count is above the average diary_num. I have 251 diaries in total and 103 users; some users have 9, 4, or 5 diaries.
But the result contains only 1 user, who has 3 diaries.
My relevant tables are:
CREATE TABLE IF NOT EXISTS `school` (
`school_id` int(11) NOT NULL,
`school_name` varchar(45) NOT NULL,
`location` varchar(45) NOT NULL,
`master` varchar(45) NOT NULL,
`numbers_of_student` int(11) NOT NULL,
PRIMARY KEY (`school_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `user_diary` (
`diary_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`title` varchar(45) NOT NULL,
`content` varchar(255) NOT NULL,
`addtime` DATETIME NOT NULL,
PRIMARY KEY (`diary_id`,`user_id`),
KEY `fk_diary_user_id_idx` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
Is there a problem with the cross join, or something else?
Thanks a lot!

You can't use avg that way. In my personal movie database,
select * from movie having year > avg(year);
produces nothing, and
select * from movie having year > (select avg(year) from movie);
produces the expected result.

You must calculate the average in a separate subquery.
Something like:
select ...
from ...
group by ...
having diary_num > (
select avg(diary_num)
from ...)
You can fill in the blanks with what makes sense
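One way to fill in those blanks for the diary tables is sketched below, using Python's built-in sqlite3 module as a stand-in for MySQL. The table is reduced to the two columns that matter, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_diary (diary_id INTEGER PRIMARY KEY, user_id INTEGER);
-- Hypothetical data: user 1 has 4 diaries, users 2 and 3 have 1 each.
INSERT INTO user_diary (user_id) VALUES (1), (1), (1), (1), (2), (3);
""")

sql = """
SELECT user_id, COUNT(DISTINCT diary_id) AS diary_num
FROM user_diary
GROUP BY user_id
HAVING COUNT(DISTINCT diary_id) > (
    -- average of the per-user counts, computed in its own subquery
    SELECT AVG(per_user.diary_num)
    FROM ( SELECT COUNT(DISTINCT diary_id) AS diary_num
           FROM user_diary
           GROUP BY user_id ) per_user
)
"""
above_avg = conn.execute(sql).fetchall()
print(above_avg)
```

The per-user counts are 4, 1 and 1, so the average is 2 and only user 1 passes the HAVING filter.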

Something like this should return the resultset you are looking for:
SELECT u.user_id
, u.realname
, c.diary_num
, s.school_name
-- , a.diary_avg
FROM ( SELECT d.user_id
, COUNT(DISTINCT d.diary_id) AS diary_num
FROM user_diary d
GROUP BY d.user_id
) c
JOIN user u
ON u.user_id = c.user_id
JOIN school s
ON s.school_id = u.school_id
JOIN ( SELECT AVG(v.diary_num) AS diary_avg
FROM ( SELECT t.user_id
, COUNT(DISTINCT t.diary_id) AS diary_num
FROM user_diary t
GROUP BY t.user_id
) v
) a
ON a.diary_avg < c.diary_num
ORDER BY 1
The inline view aliased as c gets us the diary_num (count) for each user.
The inline view aliased as a gets us the average of all the diary_num for all users. That is getting us an "average" of the counts, which is what it looks like your original query was intending to do.
As an alternative, we could get the "average" number of diaries per user as ... the total count of all diaries divided by the total count of all users. To do that, replace that inline view aliased as a with something like this:
( SELECT COUNT(DISTINCT t.diary_id)
/ NULLIF(COUNT(DISTINCT v.user_id),0) AS diary_avg
FROM user v
LEFT
JOIN user_diary t
ON t.user_id = v.user_id
) a
This yields slightly different results, as it's a calculation on total counts, rather than an average of a calculation.
NOTE
The CROSS keyword has no influence on the MySQL optimizer.
We do typically include the CROSS keyword as documentation for future reviewers. It indicates that we have purposefully omitted the usual ON clause. (As a reviewer, when we see a JOIN without an ON clause, our minds race to a "possible unintended Cartesian product"; the author's inclusion of the CROSS keyword alerts us that the omission of the ON clause was purposeful.)
But the MySQL optimizer doesn't care one whit whether the CROSS keyword is included or omitted.
One more question: does MySQL support a view whose SELECT contains a subquery in the FROM clause?
A: Really old versions (3.x ?) of MySQL did not support subqueries. But certainly, MySQL 5.1 and later do support subqueries.
To answer your question, yes, a SELECT statement can be used as an inline view as a rowsource for another query, e.g.
SELECT v.*
FROM (
SELECT 1 AS foo
) v
