Get response time per day of week - mysql

I have 3 tables:
CREATE TABLE `ticket` (
`tid` int(11) NOT NULL AUTO_INCREMENT,
`sid` varchar(50) NOT NULL,
`open_date` datetime NOT NULL,
PRIMARY KEY (`tid`),
KEY `sid` (`sid`,`open_date`),
KEY `open_date` (`open_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `ticket_reply` (
`rid` int(11) NOT NULL AUTO_INCREMENT,
`tid` int(11) NOT NULL,
`reply_date` datetime NOT NULL,
PRIMARY KEY (`rid`),
KEY `tid` (`tid`,`reply_date`),
KEY `reply_date` (`reply_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `subscription` (
`sid` varchar(50) NOT NULL,
`response_time` int(11) NOT NULL DEFAULT '24',
PRIMARY KEY (`sid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I'm trying to get the sum of response times, where a response time is the interval between when a ticket was opened and its first reply, and group it by DAYNAME (maybe by MONTH also). Currently I have this SQL:
SELECT
t.tid,
DAYNAME(t.open_date) AS day_opened,
SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min,
SUM(s.response_time * 60) AS response_time_min
FROM ticket t
INNER JOIN ticket_reply tr ON tr.tid = t.tid
INNER JOIN subscription s ON s.sid = t.sid
GROUP BY
t.tid #group by tid as ticket_reply may return many
ORDER BY t.open_date DESC;
So the first challenge I have is getting the first ticket_reply row, which I solved with GROUP BY. I tried using a subquery in the join, but it still returned a row per ticket_reply row.
Now I want to start grouping by DAYNAME and maybe MONTH, but if I add it to the GROUP BY it doesn't group:
GROUP BY
t.tid,
DAYNAME(t.open_date)
I have tried putting DAYNAME before tid, but that didn't make any difference.
So I have a couple of questions: is there a better way to get the first row in ticket_reply and then group by the DAYNAME? I have a feeling that getting the first row in a subquery may fix the grouping.

It is grouping. But because you have t.tid in the GROUP BY clause, and the tid column is unique in the ticket table, multiple rows from ticket are not collapsed; each ticket ends up on its own row.
It's not exactly clear what result you want to return.
(The SUM(s.response_time) expression in the SELECT list seems a bit odd, given that you could be matching multiple rows from ticket_reply.)
Given your existing statement, it looks like you might want to use an inline view to return the "earliest" reply_date for each ticket, in place of the reference to the ticket_reply table.
JOIN /*ticket_reply*/
( SELECT r.tid
, MIN(r.reply_date) AS reply_date
FROM ticket_reply r
GROUP BY r.tid
) tr ON tr.tid = t.tid
(Unfortunately, materializing the inline view (populating and accessing an intermediate "derived table") can be the source of a performance issue.)
As another option, rather than performing a JOIN operation, you could consider using a correlated subquery in the SELECT list. That is, in place of the reference to tr.reply_date, you could do something like:
(SELECT MIN(r.reply_date) FROM ticket_reply r WHERE r.tid = t.tid)
and remove the JOIN to the ticket_reply table.
But, repeated execution of that subquery (once for each row returned), can also be a performance issue for large sets.
But the "big" question is whether you need to add up s.response_time for each occurrence of a matching ticket_reply (as your current query is doing), or whether you just need to include the s.response_time once for each ticket?
That is, if there are three ticket_reply for a given ticket, do we need to "triple" the value of response_time that we add to the row?
If you need to include the response_time in the total for each ticket_reply, then this:
SELECT DAYNAME(t.open_date) AS day_opened
, SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min
, SUM(s.response_time * 60 * tr.cnt_replies) AS response_time_min
FROM ticket t
JOIN ( SELECT r.tid
, MIN(r.reply_date) AS reply_date
, COUNT(1) AS cnt_replies
FROM ticket_reply r
GROUP BY r.tid
) tr
ON tr.tid = t.tid
JOIN subscription s
ON s.sid = t.sid
GROUP
BY day_opened
If you only need to include the response_time in the total once for each ticket, remove the references to cnt_replies:
SELECT DAYNAME(t.open_date) AS day_opened
, SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min
, SUM(s.response_time) * 60 AS response_time_min
FROM ticket t
JOIN ( SELECT r.tid
, MIN(r.reply_date) AS reply_date
FROM ticket_reply r
GROUP BY r.tid
) tr
ON tr.tid = t.tid
JOIN subscription s
ON s.sid = t.sid
GROUP
BY day_opened
To GROUP BY month, just change the first expression in the SELECT list, and the reference in the GROUP BY clause.
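The second query above (response_time counted once per ticket) can be tried end to end on a toy data set. The following is only a sketch, using Python's sqlite3 module as a stand-in for MySQL: SQLite has no DAYNAME or TIMESTAMPDIFF, so strftime('%w') and julianday arithmetic are substituted here, and the sample rows are invented for illustration.

```python
import sqlite3

# strftime('%w') returns 0 for Sunday through 6 for Saturday
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket (tid INTEGER PRIMARY KEY, sid TEXT NOT NULL, open_date TEXT NOT NULL);
CREATE TABLE ticket_reply (rid INTEGER PRIMARY KEY, tid INTEGER NOT NULL, reply_date TEXT NOT NULL);
CREATE TABLE subscription (sid TEXT PRIMARY KEY, response_time INTEGER NOT NULL DEFAULT 24);

-- invented sample data: 2024-01-01 and 2024-01-08 are Mondays, 2024-01-02 is a Tuesday
INSERT INTO subscription VALUES ('a', 24), ('b', 4);
INSERT INTO ticket VALUES
  (1, 'a', '2024-01-01 09:00:00'),
  (2, 'b', '2024-01-08 10:00:00'),
  (3, 'a', '2024-01-02 12:00:00');
INSERT INTO ticket_reply (tid, reply_date) VALUES
  (1, '2024-01-01 09:30:00'),
  (1, '2024-01-01 11:00:00'),
  (2, '2024-01-08 11:00:00'),
  (3, '2024-01-02 12:45:00'),
  (3, '2024-01-03 08:00:00');
""")

rows = conn.execute("""
SELECT CAST(strftime('%w', t.open_date) AS INTEGER) AS dow
     , SUM(CAST(ROUND((julianday(tr.reply_date) - julianday(t.open_date)) * 1440) AS INTEGER)) AS num_min
     , SUM(s.response_time) * 60 AS response_time_min
  FROM ticket t
  JOIN ( SELECT r.tid
              , MIN(r.reply_date) AS reply_date   -- earliest reply per ticket
           FROM ticket_reply r
          GROUP BY r.tid
       ) tr
    ON tr.tid = t.tid
  JOIN subscription s
    ON s.sid = t.sid
 GROUP BY dow
 ORDER BY dow
""").fetchall()

per_day = [(DAY_NAMES[dow], num_min, resp_min) for dow, num_min, resp_min in rows]
print(per_day)
```

Because the derived table returns exactly one row per ticket, later replies no longer inflate either sum.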

Related

Trying to join 2 tables based on the most recent timestamp *before* a specific date

Here is my SQL statement, which seemed to function perfectly well before we created a new database. This approach seems to work just fine on another, similarly structured, pair of tables.
SELECT *
FROM tele2_details AS d
INNER JOIN
tele2_usage AS u
ON
d.iccid = u.iccid
AND
u.timestamp = (
SELECT MAX(u.timestamp)
FROM tele2_usage
WHERE u.timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE
accountCustom1='Horizon'
What I'm attempting to do here is join the details table with the usage table. The usage rows just contain the iccid of a SIM card, a timestamp, and the current usage in bytes. The query should find the most recent usage record at or before the specified date (2022-01-08 09:30:00), giving me a set of SIMs, each joined with its most recent usage record before the specified time. However, I usually get zero results on this particular combination of tables.
Specifically though, it does match any records where the date is exactly the same as specified in the query, but not dates that are before the specified date. Can anybody help me with where I'm going wrong? This query worked fine in a previous database; we're rebuilding our systems and this has now appeared as an issue.
Thanks in advance for any help.
Edit
Here's some more information, that I hope will make the question make more sense. So here is an outline of the details table, I've removed some of the columns but this is at least illustrative of table.
CREATE TABLE `tele2_details` (
`iccid` VARCHAR(255) NOT NULL,
`msisdn` VARCHAR(255) NOT NULL,
`status` VARCHAR(255) NOT NULL,
`ratePlan` VARCHAR(255) NOT NULL,
`communicationPlan` VARCHAR(255) NOT NULL,
PRIMARY KEY (`iccid`)
)
COLLATE='utf8mb4_0900_ai_ci'
ENGINE=InnoDB
;
Then we also have a usages table, which stores sample of the data usage of sim cards, along with timestamps...
CREATE TABLE `tele2_usage` (
`id` INT NOT NULL AUTO_INCREMENT,
`iccid` VARCHAR(255) NOT NULL COLLATE 'latin1_swedish_ci',
`msisdn` VARCHAR(255) NOT NULL COLLATE 'latin1_swedish_ci',
`timestamp` DATETIME NOT NULL,
`ctd_data_usage` BIGINT NOT NULL,
`ctd_sms_usage` BIGINT NOT NULL,
`ctd_voice_usage` BIGINT NOT NULL,
`session_count` INT NOT NULL,
PRIMARY KEY (`id`)
)
COLLATE='utf8mb4_hr_0900_ai_ci'
ENGINE=InnoDB
AUTO_INCREMENT=10116319
;
The query I'm trying to create should return a set of sim details, joined with a usage record which is closest to, but not after, a particular time.
So if you look at the original query at the top, I'm trying to join the details onto the usage record which is closest to, but not after, 2022-01-08 09:30:00.
I hope that makes sense.
Let's have a look at a particular sim
SELECT iccid, msisdn, status, ratePlan, communicationPlan FROM tele2_details WHERE iccid='xxxx203605100034xxxx'
results in 1 match
"xxxx203605100034xxxx" "xxxx9120012xxxx" "ACTIVATED" "Pay as use - Existing Business" "Data LTE"
And if I look in the usage table for that same sim I can see many records that should satisfy my conditions
SELECT id, iccid, TIMESTAMP, ctd_data_usage FROM tele2_usage WHERE iccid='xxxx203605100034xxxx' AND TIMESTAMP <= '2022-01-08 09:30:00'
results in
"10096279" "xxxx203605100034xxxx" "2022-01-08 09:01:00" "77517560"
"10092271" "xxxx203605100034xxxx" "2022-01-08 08:01:03" "77002733"
"10088263" "xxxx203605100034xxxx" "2022-01-08 07:01:11" "76270445"
"10084255" "xxxx203605100034xxxx" "2022-01-08 06:01:05" "76270445"
of which I would like to select the first record (with the 09:01 timestamp) for joining on to the details record. I can get that timestamp with the following query
SELECT MAX(timestamp)
FROM tele2_usage
WHERE TIMESTAMP <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s') AND iccid='xxxx203605100034xxxx'
which results in '2022-01-08 09:01:00', which is exactly what I want. So now I put it all together...
SELECT *
FROM tele2_details AS d
INNER JOIN
tele2_usage AS u
ON
d.iccid = u.iccid
AND
u.timestamp = (
SELECT MAX(timestamp)
FROM tele2_usage
WHERE timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE
accountCustom1='Horizon' AND iccid='xxxx203605100034xxxx'
And I get nothing! I would expect to get back the details record joined with that particular SIM's usage record, but actually I get zero results and I don't understand why.
Ideally I would remove the final AND for the iccid, and I would expect to receive a set of all the SIMs for that client, each with the usage record closest to but not after the specified date.
So with that explanation, does anyone know why I'm not getting any records? I have a similar table for another SIM provider that is structured exactly the same, with a details table and a usage table, and this query works just fine on that table. I simply can't understand why this doesn't work.
Edit 2
@Serg suggested trying to alias the subquery (I think that's what it's called) which resulted in the following code...
SELECT *
FROM tele2_details AS d
INNER JOIN
tele2_usage AS u
ON
d.iccid = u.iccid
AND
u.timestamp = (
SELECT MAX(u2.timestamp)
FROM tele2_usage AS u2
WHERE u2.timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE
accountCustom1='Horizon'
Unfortunately this still resulted in zero results.
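The subquery is not correlated: it returns the single latest timestamp across all SIMs at or before the cutoff, so only usage rows carrying exactly that timestamp can join. You should correlate the subquery so that the maximum is computed per iccid: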
SELECT *
FROM tele2_details AS d INNER JOIN tele2_usage AS u
ON d.iccid = u.iccid
AND u.timestamp = (
SELECT MAX(u2.timestamp)
FROM tele2_usage u2
WHERE d.iccid = u2.iccid AND u2.timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE d.accountCustom1='Horizon';
Or, with a join to an aggregation query:
SELECT *
FROM tele2_details AS d
INNER JOIN tele2_usage AS u ON d.iccid = u.iccid
INNER JOIN (
SELECT iccid, MAX(timestamp) timestamp
FROM tele2_usage
WHERE timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
GROUP BY iccid
) m ON m.iccid = u.iccid AND m.timestamp = u.timestamp
WHERE d.accountCustom1='Horizon';
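To see the difference between the uncorrelated and the correlated subquery, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The tables are cut down to the columns that matter, and the iccid values and usage rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tele2_details (iccid TEXT PRIMARY KEY, accountCustom1 TEXT NOT NULL);
CREATE TABLE tele2_usage (
  id INTEGER PRIMARY KEY,
  iccid TEXT NOT NULL,
  timestamp TEXT NOT NULL,
  ctd_data_usage INTEGER NOT NULL
);
-- invented sample data: two SIMs sampled at different times
INSERT INTO tele2_details VALUES ('sim-a', 'Horizon'), ('sim-b', 'Horizon');
INSERT INTO tele2_usage (iccid, timestamp, ctd_data_usage) VALUES
  ('sim-a', '2022-01-08 08:01:03', 100),
  ('sim-a', '2022-01-08 09:01:00', 200),
  ('sim-b', '2022-01-08 09:15:00', 300),
  ('sim-b', '2022-01-08 10:00:00', 400);
""")

CUTOFF = "2022-01-08 09:30:00"

# Uncorrelated: one global maximum (09:15:00, which belongs to sim-b),
# so sim-a has no usage row with that exact timestamp and drops out.
uncorrelated = conn.execute("""
SELECT d.iccid, u.timestamp
  FROM tele2_details d
  JOIN tele2_usage u
    ON d.iccid = u.iccid
   AND u.timestamp = ( SELECT MAX(u2.timestamp)
                         FROM tele2_usage u2
                        WHERE u2.timestamp <= ? )
 WHERE d.accountCustom1 = 'Horizon'
 ORDER BY d.iccid
""", (CUTOFF,)).fetchall()

# Correlated: the maximum is computed separately for each iccid.
correlated = conn.execute("""
SELECT d.iccid, u.timestamp
  FROM tele2_details d
  JOIN tele2_usage u
    ON d.iccid = u.iccid
   AND u.timestamp = ( SELECT MAX(u2.timestamp)
                         FROM tele2_usage u2
                        WHERE u2.iccid = d.iccid
                          AND u2.timestamp <= ? )
 WHERE d.accountCustom1 = 'Horizon'
 ORDER BY d.iccid
""", (CUTOFF,)).fetchall()

print(uncorrelated)
print(correlated)
```

This would also explain why the original query can appear to work on another pair of tables: if all SIMs there are sampled at the same instant, the global maximum happens to match a row for every SIM.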

JOIN query taking long time and creating issue "converting HEAP to MyISAM"

My query is below. I use a join to fetch the data. Can you please suggest how I can solve the "converting HEAP to MyISAM" issue?
Can I use a subquery here instead? Please suggest how.
I have joined the users table to check whether the user exists. Can I refine the query without the join so that the "converting HEAP to MyISAM" issue goes away?
One more thing: sometimes I will not filter on a specific user_id, as I have done here with user_id = 16082.
SELECT `user_point_logs`.`id`,
`user_point_logs`.`user_id`,
`user_point_logs`.`point_get_id`,
`user_point_logs`.`point`,
`user_point_logs`.`expire_date`,
`user_point_logs`.`point` AS `sum_point`,
IF(sum(`user_point_used_logs`.`point`) IS NULL, 0, sum(`user_point_used_logs`.`point`)) AS `minus`
FROM `user_point_logs`
JOIN `users` ON ( `users`.`id` = `user_point_logs`.`user_id` )
LEFT JOIN (SELECT *
FROM user_point_used_logs
WHERE user_point_log_id NOT IN (
SELECT DISTINCT return_id
FROM user_point_logs
WHERE return_id IS NOT NULL
AND user_id = 16082
)
)
AS user_point_used_logs
ON ( `user_point_logs`.`id` = `user_point_used_logs`.`user_point_log_used_id` )
WHERE expire_date >= 1563980400
AND `user_point_logs`.`point` >= 0
AND `users`.`id` IS NOT NULL
AND ( `user_point_logs`.`return_id` = 0
OR `user_point_logs`.`return_id` IS NULL )
AND `user_point_logs`.`user_id` = '16082'
GROUP BY `user_point_logs`.`id`
ORDER BY `user_point_logs`.`expire_date` ASC
DB FIDDLE HERE WITH STRUCTURE
Kindly try this. If it works, it can be optimized further by adding a composite index.
SELECT
upl.id,
upl.user_id,
upl.point_get_id,
upl.point,
upl.expire_date,
upl.point AS sum_point,
coalesce(SUM(upul.point),0) AS minus -- changed from complex to readable
FROM user_point_logs upl
JOIN users u ON upl.user_id = u.id
LEFT JOIN (select supul.user_point_log_used_id, supul.point from user_point_used_logs supul
left join user_point_logs supl on supul.user_point_log_id=supl.return_id and supl.user_id = 16082
where supl.return_id is null) AS upul -- anti-join: keep only used-log rows with no matching return
ON upl.id=upul.user_point_log_used_id
WHERE
upl.user_id = 16082 and coalesce(upl.return_id,0)= 0
and upl.expire_date >= 1563980400 -- tip: if its unix timestamp change the datatype and if possible use range between dates
#AND upl.point >= 0 -- since its NN by default removing this condition
#AND u.id IS NOT NULL -- removed since the inner join matches not null
GROUP BY upl.id
ORDER BY upl.expire_date ASC;
Edit:
Try adding an index on the return_id column of the user_point_logs table, since this column is used in the join inside the derived query.
Or use a composite index on user_id and return_id.
Indexes:
user_point_logs: (user_id, expire_date)
user_point_logs: (user_id, return_id)
OR is hard to optimize. Decide on only one way to say whatever is being said here, then get rid of the OR:
AND ( `user_point_logs`.`return_id` = 0
OR `user_point_logs`.`return_id` IS NULL )
DISTINCT is redundant:
NOT IN ( SELECT DISTINCT ... )
Change
IF(sum(`user_point_used_logs`.`point`) IS NULL, 0,
sum(`user_point_used_logs`.`point`)) AS `minus`
to
COALESCE( ( SELECT SUM(point) FROM user_point_used_logs ... ), 0) AS minus
and toss LEFT JOIN (SELECT * FROM user_point_used_logs ... )
Since a PRIMARY KEY is a key, the second of these is redundant and can be DROPped:
ADD PRIMARY KEY (`id`),
ADD KEY `id` (`id`) USING BTREE;
After all that, we may need another pass to further simplify and optimize it.

MySQL - add clauses to left join

I have a table called properties (p) and another table called certificates (c). There can be more than one certificate allocated against each property or no certificate at all. I need to produce a query that uses a join and only displays one certificate from the certificates table per property. The one certificate that is shown needs to be the one with the most recent expiry date. There is a field in the certificates table named 'certificate_expiry_date'. The simple join would be p.property_id = c.certificate_property but this currently outputs all certificates.
My Query Attempt
Here's my query so far;
SELECT DISTINCT t.tenancy_property, t.*, p.*, c.* FROM tenancy t
INNER JOIN property p
on t.tenancy_property = p.property_id
LEFT JOIN
(
SELECT *
FROM certificate
WHERE certificate_expiry_date > CURDATE()
ORDER BY certificate_expiry_date DESC
LIMIT 1
) c ON p.property_id = c.certificate_property
WHERE t.tenancy_type='1' AND p.property_mains_gas_supply='1' AND p.property_availability='2' ORDER BY t.tenancy_id DESC LIMIT {$startpoint} , {$per_page}
This query executes fine but doesn't seem to take into account the left join on the certificates table.
Table structure for table certificate
CREATE TABLE IF NOT EXISTS `certificate` (
`certificate_id` int(11) NOT NULL AUTO_INCREMENT,
`certificate_property` int(11) DEFAULT NULL,
`certificate_type` tinyint(4) DEFAULT NULL,
`certificate_reference` varchar(255) COLLATE utf8_bin DEFAULT NULL,
`certificate_start_date` date DEFAULT NULL,
`certificate_expiry_date` date DEFAULT NULL,
`certificate_notes` text COLLATE utf8_bin,
`certificate_renewal_instructed` tinyint(4) DEFAULT NULL,
`certificate_renewal_contractor` int(11) DEFAULT NULL,
PRIMARY KEY (`certificate_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin AUTO_INCREMENT=219 ;
If we only need to return one or two columns from the certificates table, we can sometimes use correlated subqueries in the SELECT list.
This approach has some performance implications for large tables; but for some use cases, with appropriate indexes available, this can be a workable approach.
SELECT p.id
, p.somecol
, ( SELECT c.col
FROM certificate c
WHERE c.property_id = p.id
ORDER BY c.date_added DESC, c.id DESC
LIMIT 1
) AS most_recent_cert_col
, ( SELECT c.date_added
FROM certificate c
WHERE c.property_id = p.id
ORDER BY c.date_added DESC, c.id DESC
LIMIT 1
) AS most_recent_cert_date_added
FROM property p
WHERE ...
ORDER BY ...
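As a concrete version of the pattern above, here is a sketch that uses the column names from the question's certificate table (certificate_property, certificate_expiry_date). It runs against SQLite through Python's sqlite3 module as a stand-in for MySQL; the property table is reduced to two columns, and the name column and all sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE property (property_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE certificate (
  certificate_id INTEGER PRIMARY KEY,
  certificate_property INTEGER,
  certificate_expiry_date TEXT
);
-- invented sample data: property 3 has no certificate at all
INSERT INTO property VALUES (1, 'Flat A'), (2, 'Flat B'), (3, 'Flat C');
INSERT INTO certificate VALUES
  (10, 1, '2030-01-01'),
  (11, 1, '2031-06-30'),
  (12, 2, '2029-12-31');
""")

rows = conn.execute("""
SELECT p.property_id
     , ( SELECT c.certificate_id
           FROM certificate c
          WHERE c.certificate_property = p.property_id
          ORDER BY c.certificate_expiry_date DESC, c.certificate_id DESC
          LIMIT 1
       ) AS latest_cert_id
     , ( SELECT c.certificate_expiry_date
           FROM certificate c
          WHERE c.certificate_property = p.property_id
          ORDER BY c.certificate_expiry_date DESC, c.certificate_id DESC
          LIMIT 1
       ) AS latest_expiry
  FROM property p
 ORDER BY p.property_id
""").fetchall()

print(rows)
```

Each scalar subquery is evaluated per property, so the LIMIT 1 applies per property rather than to the certificate table as a whole. A property without certificates still appears, with NULL (None in Python) in the certificate columns.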
Updated answer with your updated information
Something like this?
(Note: for a property with no expired certificate, the sub-query qMostRecentExpire returns NULL.)
select
p.property_id
, p.*
, ( select
c.certificate_id
from
certificates as c
where
c.certificate_property = p.property_id -- all the cert of this property
and c.certificate_expiry_date < CURDATE() -- cert has expired
order by c.certificate_expiry_date desc
limit 1 -- most recent one
) as qMostRecentExpire
from
properties as p
Updated answer after knowing that some properties may have no certificates
select
p.property_id
, p.*
, ( select
c.certificate_id
from
certificates as c
where
c.certificate_property = p.property_id -- all the cert of this property
and c.certificate_expiry_date < CURDATE() -- cert has expired
order by c.certificate_expiry_date desc
limit 1 -- most recent one
) as qMostRecentExpire
from
properties as p
, certificates as c -- inner join : properties that
where -- has not cert will be dropped
p.property_id = c.certificate_property

MySQL is losing rows on a join

I have two tables, and I'm trying to join them together in a specific way. The results I'm looking for would be:
site statusname total
2 Follow-Up 0
2 Off Study 0
2 Screening 1
2 Treatment 0
1 Follow-Up 0
1 Off Study 0
1 Screening 2
1 Treatment 0
However, this is what's being returned:
site statusname total
1 Follow-Up 0
1 Off Study 0
1 Screening 2
2 Screening 1
1 Treatment 0
My actual query (the one that returns the wrong results) looks like:
SELECT
sitestatus.site AS site,
sitestatus.statusname AS statusname,
count(participant.id) AS total
FROM
(SELECT DISTINCT
participant.`site` AS site,
participant_status.`name` AS statusname,
participant_status.`id` AS status
FROM
participant_status
CROSS JOIN
participant) AS sitestatus
LEFT JOIN
participant
ON
participant.`site` = sitestatus.`site` AND
participant.`status` = sitestatus.`status`
GROUP BY
sitestatus.`statusname`,
participant.`site`
However, if I make a slight (but unacceptable) modification, adding a WHERE clause to the subselect and using a UNION, I get my desired results. Here's the query:
SELECT
sitestatus.site AS site,
sitestatus.statusname AS statusname,
count(participant.id) AS total
FROM
(SELECT DISTINCT
participant.`site` AS site,
participant_status.`name` AS statusname,
participant_status.`id` AS status
FROM
participant_status
CROSS JOIN
participant
WHERE site=1) AS sitestatus
LEFT JOIN
participant
ON
participant.`site` = sitestatus.`site` AND
participant.`status` = sitestatus.`status`
GROUP BY
sitestatus.`statusname`,
participant.`site`
UNION
SELECT
sitestatus.site AS site,
sitestatus.statusname AS statusname,
count(participant.id) AS total
FROM
(SELECT DISTINCT
participant.`site` AS site,
participant_status.`name` AS statusname,
participant_status.`id` AS status
FROM
participant_status
CROSS JOIN
participant
WHERE site=2) AS sitestatus
LEFT JOIN
participant
ON
participant.`site` = sitestatus.`site` AND
participant.`status` = sitestatus.`status`
GROUP BY
sitestatus.`statusname`,
participant.`site`;
I cannot figure out where my missing rows are going.
Here are the relevant schemas:
CREATE TABLE `participant` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`site` int(10) unsigned NOT NULL,
`status` int(10) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
)
and
CREATE TABLE `participant_status` (
`id` int(10) unsigned NOT NULL,
`name` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Thanks for any help you can provide.
(EDIT: Now using CROSS JOIN as suggested by Tim.)
The UNION operator has a default behavior of removing duplicate records which occur in both result sets which are being aggregated. If you want to retain all records from both of your queries, you should use the UNION ALL operator:
query1
UNION ALL
query2
Here is my attempt at what a correct approach to this query might be:
SELECT t2.site, t2.name AS statusname, t1.total
FROM
(
SELECT site, status, COUNT(*) AS total
FROM participant
GROUP BY site, status
) t1
INNER JOIN
(
(SELECT DISTINCT site FROM participant)
CROSS JOIN
participant_status
) t2
ON t1.site = t2.site AND t1.status = t2.id
With the help of @Tim, I was able to arrive at an answer:
SELECT t2.site, t2.statusname AS statusname, COALESCE(t1.total,0) AS total
FROM
(
SELECT site, status, COUNT(*) AS total
FROM participant
GROUP BY site, status
) AS t1
RIGHT JOIN
(
SELECT DISTINCT participant_status.id AS status, participant_status.name AS statusname, participant.site FROM participant
CROSS JOIN
participant_status
ORDER BY status, site
) AS t2
ON t1.site = t2.site AND t1.status = t2.status
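The idea behind this answer is to build every (site, status) combination first and then attach the counts, defaulting to 0 where no count exists. Below is a sketch of that idea in Python with sqlite3 as a stand-in for MySQL. It is written as a LEFT JOIN from the combinations to the counts, which is equivalent to the RIGHT JOIN above but also runs on older SQLite versions. The status ids and participant rows are invented to reproduce the totals in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participant (
  id INTEGER PRIMARY KEY,
  site INTEGER NOT NULL,
  status INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE participant_status (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- invented sample data: status ids are assumptions
INSERT INTO participant_status VALUES
  (1, 'Screening'), (2, 'Treatment'), (3, 'Follow-Up'), (4, 'Off Study');
INSERT INTO participant (site, status) VALUES (1, 1), (1, 1), (2, 1);
""")

rows = conn.execute("""
SELECT g.site, g.statusname, COALESCE(c.total, 0) AS total
  FROM ( SELECT DISTINCT p.site, ps.id AS status, ps.name AS statusname
           FROM participant p
          CROSS JOIN participant_status ps     -- every site paired with every status
       ) g
  LEFT JOIN ( SELECT site, status, COUNT(*) AS total
                FROM participant
               GROUP BY site, status
            ) c
    ON c.site = g.site AND c.status = g.status
 ORDER BY g.site, g.statusname
""").fetchall()

for row in rows:
    print(row)
```

All eight combinations come back, with 0 for the statuses that have no participants at a site.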

mysql having... > avg() doesn't work as expected

I've created two views: one to calculate each user's diary count (user_diary_number), and one to select the users whose diary count is greater than the average diary count across all users.
The two views are below:
create view user_diary_number as
(
select user_id,count( distinct diary_id ) as diary_num
from user_diary
group by user_id
);
and the second, using HAVING and AVG:
create view hw_diary as
(
select u.user_id, u.realname, ud.diary_num, school.school_name
from (user as u cross join user_diary_number as ud on u.user_id = ud.user_id )cross join school on u.school_id = school.school_id
having diary_num > avg(diary_num)
);
The problem is that the second view returns only 1 row, and we certainly have more than 1 user whose diary number is greater than the average diary_num. I have 251 diaries in total and 103 users; some users have 9, 4, or 5 diaries.
But the result contains only 1 user, who has 3 diaries.
My related tables are:
CREATE TABLE IF NOT EXISTS `school` (
`school_id` int(11) NOT NULL,
`school_name` varchar(45) NOT NULL,
`location` varchar(45) NOT NULL,
`master` varchar(45) NOT NULL,
`numbers_of_student` int(11) NOT NULL,
PRIMARY KEY (`school_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `user_diary` (
`diary_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`title` varchar(45) NOT NULL,
`content` varchar(255) NOT NULL,
`addtime` DATETIME NOT NULL,
PRIMARY KEY (`diary_id`,`user_id`),
KEY `fk_diary_user_id_idx` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
Is there a problem with the cross join, or is it something else?
Thanks a lot!
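You can't use avg that way. Without a GROUP BY, an aggregate in the HAVING clause turns the whole result into a single group, so at most one row comes back. In my personal movie database,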
select * from movie having year > avg(year);
produces nothing, and
select * from movie having year > (select avg(year) from movie);
produces the expected result.
You must calculate the average in a separate subquery.
Something like:
select ...
from ...
group by ...
having diary_num > (
select avg(diary_num)
from ...)
You can fill in the blanks with what makes sense
Something like this should return the resultset you are looking for:
SELECT u.user_id
, u.realname
, c.diary_num
, s.school_name
-- , a.diary_avg
FROM ( SELECT d.user_id
, COUNT(DISTINCT d.diary_id) AS diary_num
FROM user_diary d
GROUP BY d.user_id
) c
JOIN user u
ON u.user_id = c.user_id
JOIN school s
ON s.school_id = u.school_id
JOIN ( SELECT AVG(v.diary_num) AS diary_avg
FROM ( SELECT t.user_id
, COUNT(DISTINCT t.diary_id) AS diary_num
FROM user_diary t
GROUP BY t.user_id
) v
) a
ON a.diary_avg < c.diary_num
ORDER BY 1
The inline view aliased as c gets us the diary_num (count) for each user.
The inline view aliased as a gets us the average of all the diary_num for all users. That is getting us an "average" of the counts, which is what it looks like your original query was intending to do.
As an alternative, we could get the "average" number of diaries per user as ... the total count of all diaries divided by the total count of all users. To do that, replace that inline view aliased as a with something like this:
( SELECT COUNT(DISTINCT t.diary_id)
/ NULLIF(COUNT(DISTINCT v.user_id),0) AS diary_avg
FROM user v
LEFT
JOIN user_diary t
ON t.user_id = v.user_id
) a
This yields slightly different results, as it's a calculation on total counts, rather than an average of a calculation.
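The two notions of "average" can be compared on a toy data set. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with invented rows and tables cut down to the relevant columns. One SQLite-specific adjustment is assumed: the counts are multiplied by 1.0 because SQLite would otherwise perform integer division, whereas MySQL's / operator does not truncate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "user" (user_id INTEGER PRIMARY KEY, realname TEXT NOT NULL);
CREATE TABLE user_diary (diary_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);

-- invented sample data: user 4 has no diaries
INSERT INTO "user" VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy'), (4, 'Di');
INSERT INTO user_diary VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3);
""")

# Per-user diary counts (only users that have at least one diary)
counts_sql = """
SELECT user_id, COUNT(DISTINCT diary_id) AS diary_num
  FROM user_diary
 GROUP BY user_id
"""

# Average of the per-user counts
avg_of_counts = conn.execute(
    f"SELECT AVG(v.diary_num) FROM ({counts_sql}) v"
).fetchone()[0]

# Total diaries divided by total users, including users without diaries
ratio = conn.execute("""
SELECT COUNT(DISTINCT t.diary_id) * 1.0
       / NULLIF(COUNT(DISTINCT v.user_id), 0)
  FROM "user" v
  LEFT JOIN user_diary t
    ON t.user_id = v.user_id
""").fetchone()[0]

def users_above(threshold):
    """Return (user_id, diary_num) rows whose count exceeds the threshold."""
    return conn.execute(
        f"SELECT c.user_id, c.diary_num FROM ({counts_sql}) c "
        "WHERE c.diary_num > ? ORDER BY c.user_id",
        (threshold,),
    ).fetchall()

print(avg_of_counts, users_above(avg_of_counts))
print(ratio, users_above(ratio))
```

With this data the average of counts ignores the user without diaries, so the threshold is higher and fewer users qualify than with the total-diaries-per-user ratio.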
NOTE
The CROSS keyword has no influence on the MySQL optimizer.
We do typically include the CROSS keyword as documentation for future reviewers. It indicates that we have purposefully omitted the usual ON clause. (As a reviewer, when we see a JOIN without an ON clause, our minds race to a "possible unintended Cartesian product"... the author's inclusion of the CROSS keyword alerts us (the reviewer) that the omission of the ON clause was purposeful.)
But the MySQL optimizer doesn't care one whit whether the CROSS keyword is included or omitted.
One more question: does MySQL support views when it reports "View's SELECT contains a subquery in the FROM clause"?
A: Really old versions of MySQL (before 4.1) did not support subqueries at all. MySQL 5.1 and later certainly do. Inside a view definition, however, a subquery in the FROM clause was not allowed before MySQL 5.7.7; on older versions, put the inner query in its own view.
In an ordinary query, yes, a SELECT statement can be used as an inline view as a rowsource for another query, e.g.
SELECT v.*
FROM (
SELECT 1 AS foo
) v