MySQL is losing rows on a join

I have two tables, and I'm trying to join them together in a specific way. The results I'm looking for would be:
site statusname total
2 Follow-Up 0
2 Off Study 0
2 Screening 1
2 Treatment 0
1 Follow-Up 0
1 Off Study 0
1 Screening 2
1 Treatment 0
However, this is what's being returned:
site statusname total
1 Follow-Up 0
1 Off Study 0
1 Screening 2
2 Screening 1
1 Treatment 0
My actual query (the one that returns the wrong results) looks like:
SELECT
sitestatus.site AS site,
sitestatus.statusname AS statusname,
count(participant.id) AS total
FROM
(SELECT DISTINCT
participant.`site` AS site,
participant_status.`name` AS statusname,
participant_status.`id` AS status
FROM
participant_status
CROSS JOIN
participant) AS sitestatus
LEFT JOIN
participant
ON
participant.`site` = sitestatus.`site` AND
participant.`status` = sitestatus.`status`
GROUP BY
sitestatus.`statusname`,
participant.`site`
However, if I make a slight (but unacceptable) modification, adding a WHERE clause to the subselect and using a UNION, I get my desired results. Here's the query:
SELECT
sitestatus.site AS site,
sitestatus.statusname AS statusname,
count(participant.id) AS total
FROM
(SELECT DISTINCT
participant.`site` AS site,
participant_status.`name` AS statusname,
participant_status.`id` AS status
FROM
participant_status
CROSS JOIN
participant
WHERE site=1) AS sitestatus
LEFT JOIN
participant
ON
participant.`site` = sitestatus.`site` AND
participant.`status` = sitestatus.`status`
GROUP BY
sitestatus.`statusname`,
participant.`site`
UNION
SELECT
sitestatus.site AS site,
sitestatus.statusname AS statusname,
count(participant.id) AS total
FROM
(SELECT DISTINCT
participant.`site` AS site,
participant_status.`name` AS statusname,
participant_status.`id` AS status
FROM
participant_status
CROSS JOIN
participant
WHERE site=2) AS sitestatus
LEFT JOIN
participant
ON
participant.`site` = sitestatus.`site` AND
participant.`status` = sitestatus.`status`
GROUP BY
sitestatus.`statusname`,
participant.`site`;
I cannot figure out where my missing rows are going.
Here are the relevant schemas:
CREATE TABLE `participant` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`site` int(10) unsigned NOT NULL,
`status` int(10) unsigned NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
)
and
CREATE TABLE `participant_status` (
`id` int(10) unsigned NOT NULL,
`name` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Thanks for any help you can provide.
(EDIT: Now using CROSS JOIN as suggested by Tim.)

By default, the UNION operator removes duplicate rows from the combined result of the two queries. If you want to retain all rows from both of your queries, you should use the UNION ALL operator:
query1
UNION ALL
query2
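A minimal illustration of the difference, using literal rows rather than the tables from the question (the values are made up for the example):

```sql
-- UNION removes duplicate rows: this returns a single row.
SELECT 1 AS site, 'Screening' AS statusname
UNION
SELECT 1, 'Screening';

-- UNION ALL keeps every row: this returns two identical rows.
SELECT 1 AS site, 'Screening' AS statusname
UNION ALL
SELECT 1, 'Screening';
```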
Here is my attempt at what a correct approach to this query might be:
SELECT t2.site, t2.name AS statusname, t1.total
FROM
(
SELECT site, status, COUNT(*) AS total
FROM participant
GROUP BY site, status
) t1
INNER JOIN
(
SELECT s.site, ps.id, ps.name
FROM (SELECT DISTINCT site FROM participant) s -- MySQL requires an alias on every derived table
CROSS JOIN
participant_status ps
) t2
ON t1.site = t2.site AND t1.status = t2.id

With the help of @Tim, I was able to arrive at an answer:
SELECT t2.site, t2.statusname AS statusname, COALESCE(t1.total,0) AS total
FROM
(
SELECT site, status, COUNT(*) AS total
FROM participant
GROUP BY site, status
) AS t1
RIGHT JOIN
(
SELECT DISTINCT participant_status.id AS status, participant_status.name AS statusname, participant.site FROM participant
CROSS JOIN
participant_status
ORDER BY status, site
) AS t2
ON t1.site = t2.site AND t1.status = t2.status
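As for where the missing rows went in the original query, one likely explanation is the GROUP BY on participant.`site`. That column comes from the LEFT JOINed table, so it is NULL for every site/status combination with no participants, and all such combinations for a given status collapse into a single group. A sketch of the original query with only the grouping columns changed, assuming the schema from the question:

```sql
-- Same query as in the question, except that it groups by the site from the
-- derived table (never NULL) instead of the LEFT JOINed participant.site.
SELECT
    sitestatus.site AS site,
    sitestatus.statusname AS statusname,
    COUNT(participant.id) AS total
FROM
    (SELECT DISTINCT
        participant.site AS site,
        participant_status.name AS statusname,
        participant_status.id AS status
    FROM participant_status
    CROSS JOIN participant) AS sitestatus
LEFT JOIN participant
    ON participant.site = sitestatus.site
    AND participant.status = sitestatus.status
GROUP BY
    sitestatus.site,
    sitestatus.statusname;
```

This would also explain why the per-site UNION variant appeared to work: with a single site per branch, there is nothing for the NULL group to merge.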

Related

JOIN query taking long time and creating issue "converting HEAP to MyISAM"

My query is below. I use a join to fetch the data. Can you please suggest how I can solve the "converting HEAP to MyISAM" issue?
Can I use a subquery here instead? Please suggest how.
I joined the users table to check whether the user exists. Can I rewrite the query without the join so that the "converting HEAP to MyISAM" issue goes away?
One more thing: sometimes I will not filter on a specific user_id, as I have done here with user_id = 16082.
SELECT `user_point_logs`.`id`,
`user_point_logs`.`user_id`,
`user_point_logs`.`point_get_id`,
`user_point_logs`.`point`,
`user_point_logs`.`expire_date`,
`user_point_logs`.`point` AS `sum_point`,
IF(sum(`user_point_used_logs`.`point`) IS NULL, 0, sum(`user_point_used_logs`.`point`)) AS `minus`
FROM `user_point_logs`
JOIN `users` ON ( `users`.`id` = `user_point_logs`.`user_id` )
LEFT JOIN (SELECT *
FROM user_point_used_logs
WHERE user_point_log_id NOT IN (
SELECT DISTINCT return_id
FROM user_point_logs
WHERE return_id IS NOT NULL
AND user_id = 16082
)
)
AS user_point_used_logs
ON ( `user_point_logs`.`id` = `user_point_used_logs`.`user_point_log_used_id` )
WHERE expire_date >= 1563980400
AND `user_point_logs`.`point` >= 0
AND `users`.`id` IS NOT NULL
AND ( `user_point_logs`.`return_id` = 0
OR `user_point_logs`.`return_id` IS NULL )
AND `user_point_logs`.`user_id` = '16082'
GROUP BY `user_point_logs`.`id`
ORDER BY `user_point_logs`.`expire_date` ASC
DB FIDDLE HERE WITH STRUCTURE
Kindly try this. If it works, we can optimize further by adding a composite index.
SELECT
upl.id,
upl.user_id,
upl.point_get_id,
upl.point,
upl.expire_date,
upl.point AS sum_point,
coalesce(SUM(upul.point),0) AS minus -- changed from complex to readable; sums the used points, as in the original query
FROM user_point_logs upl
JOIN users u ON upl.user_id = u.id
LEFT JOIN (select supul.user_point_log_used_id, supul.point from user_point_used_logs supul
left join user_point_logs supl on supul.user_point_log_id=supl.return_id and supl.user_id = 16082
where supl.return_id is null) AS upul -- anti-join in place of the NOT IN subquery
ON upl.id=upul.user_point_log_used_id
WHERE
upl.user_id = 16082 and coalesce(upl.return_id,0)= 0
and upl.expire_date >= 1563980400 -- tip: if this is a Unix timestamp, consider changing the datatype and, if possible, use a range between dates
#AND upl.point >= 0 -- since its NN by default removing this condition
#AND u.id IS NOT NULL -- removed since the inner join matches not null
GROUP BY upl.id
ORDER BY upl.expire_date ASC;
Edit:
Try adding an index on the return_id column of the user_point_logs table, since this column is used in the join inside the derived query.
Or use a composite index on user_id and return_id.
Indexes:
user_point_logs: (user_id, expire_date)
user_point_logs: (user_id, return_id)
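The two suggested indexes could be created roughly as follows. The index names are placeholders that do not come from the question, and the EXPLAIN is only a quick way to see whether the optimizer uses them:

```sql
-- Index names are hypothetical; pick names that fit your conventions.
ALTER TABLE user_point_logs
    ADD INDEX idx_user_expire (user_id, expire_date),
    ADD INDEX idx_user_return (user_id, return_id);

-- Inspect the plan for the main filter from the question.
EXPLAIN
SELECT id
FROM user_point_logs
WHERE user_id = 16082
  AND expire_date >= 1563980400;
```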
OR is hard to optimize. Decide on only one way to say whatever is being said here, then get rid of the OR:
AND ( `user_point_logs`.`return_id` = 0
OR `user_point_logs`.`return_id` IS NULL )
DISTINCT is redundant:
NOT IN ( SELECT DISTINCT ... )
Change
IF(sum(`user_point_used_logs`.`point`) IS NULL, 0,
sum(`user_point_used_logs`.`point`)) AS `minus`
to
COALESCE( ( SELECT SUM(point) FROM user_point_used_logs ... ), 0) AS minus
and toss LEFT JOIN (SELECT * FROM user_point_used_logs ... )
Since a PRIMARY KEY is a key, the second of these is redundant and can be DROPped:
ADD PRIMARY KEY (`id`),
ADD KEY `id` (`id`) USING BTREE;
After all that, we may need another pass to further simplify and optimize it.

Get response time per day of week

I have 3 tables:
CREATE TABLE `ticket` (
`tid` int(11) NOT NULL AUTO_INCREMENT,
`sid` varchar(50) NOT NULL,
`open_date` datetime NOT NULL,
PRIMARY KEY (`tid`),
KEY `sid` (`sid`,`open_date`),
KEY `open_date` (`open_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `ticket_reply` (
`rid` int(11) NOT NULL AUTO_INCREMENT,
`tid` int(11) NOT NULL,
`reply_date` datetime NOT NULL,
PRIMARY KEY (`rid`),
KEY `tid` (`tid`,`reply_date`),
KEY `reply_date` (`reply_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `subscription` (
`sid` varchar(50) NOT NULL,
`response_time` int(11) NOT NULL DEFAULT '24',
PRIMARY KEY (`sid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I'm trying to get, for each ticket, the time between when the ticket was opened and its first reply, and then sum those response times grouped by DAYNAME (and maybe also by MONTH). Currently I have this SQL:
SELECT
t.tid,
DAYNAME(t.open_date) AS day_opened,
SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min,
SUM(s.response_time * 60) AS response_time_min
FROM ticket t
INNER JOIN ticket_reply tr ON tr.tid = t.tid
INNER JOIN subscription s ON s.sid = t.sid
GROUP BY
t.tid #group by tid as ticket_reply may return many
ORDER BY t.open_date DESC;
So the first challenge I have is getting the first ticket_reply row, which I solved with GROUP BY. I tried to use a subquery in the join, but it was still returning a row per ticket_reply row.
So now I want to start grouping by DAYNAME and maybe MONTH but if I add it to the GROUP BY it doesn't group:
GROUP BY
t.tid,
DAYNAME(t.open_date)
Have tried DAYNAME before tid but that didn't make any difference.
So I have a couple of questions: is there a better way to get the first row in ticket_reply and then group by the DAYNAME? I have a feeling that getting the first row in a subquery may fix the grouping.
It is grouping. But because you have t.tid in the GROUP BY clause, and the tid column is unique in the ticket table, multiple rows from ticket are not collapsed; each one ends up on its own row.
It's not exactly clear what result you want to return.
(The SUM(s.response_time) expression in the SELECT list seems a bit odd, given that you could be matching multiple rows from ticket_reply.)
Given your existing statement, it looks like you might want to use an inline view to return the "earliest" reply_date for each ticket, in place of the reference to the ticket_reply table.
JOIN /*ticket_reply*/
( SELECT r.tid
, MIN(r.reply_date) AS reply_date
FROM ticket_reply r
GROUP BY r.tid
) tr ON tr.tid = t.tid
(Unfortunately, materializing the inline view (populating and accessing an intermediate "derived table") can be the source of a performance issue.)
As another option, rather than performing a JOIN operation, you could consider using a correlated subquery in the SELECT list. That is, in place of the reference to tr.reply_date, you could do something like:
(SELECT MIN(r.reply_date) FROM ticket_reply r WHERE r.tid = t.tid)
and remove the JOIN to the ticket_reply table.
But, repeated execution of that subquery (once for each row returned), can also be a performance issue for large sets.
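For illustration, the whole statement with the correlated subquery might look something like the sketch below, using the tables from the question. One difference from the JOIN version, noted in the comments, is that tickets without any reply are no longer filtered out:

```sql
SELECT DAYNAME(t.open_date) AS day_opened
     , SUM(TIMESTAMPDIFF(MINUTE, t.open_date,
           ( SELECT MIN(r.reply_date)      -- earliest reply for this ticket
               FROM ticket_reply r
              WHERE r.tid = t.tid ))) AS num_min
     , SUM(s.response_time) * 60 AS response_time_min
  FROM ticket t
  JOIN subscription s
    ON s.sid = t.sid
 -- Tickets with no reply yield NULL from the subquery; SUM() skips those
 -- NULLs in num_min, but their response_time is still counted.
 GROUP
    BY day_opened;
```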
But the "big" question is whether you need to add up s.response_time for each occurrence of a matching ticket_reply (as your current query is doing), or whether you just need to include the s.response_time once for each ticket?
That is, if there are three ticket_reply for a given ticket, do we need to "triple" the value of response_time that we add to the row?
If you need to include the response_time in the total for each ticket_reply, then this:
SELECT DAYNAME(t.open_date) AS day_opened
, SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min
, SUM(s.response_time * 60 * tr.cnt_replies) AS response_time_min
FROM ticket t
JOIN ( SELECT r.tid
, MIN(r.reply_date) AS reply_date
, COUNT(1) AS cnt_replies
FROM ticket_reply r
GROUP BY r.tid
) tr
ON tr.tid = t.tid
JOIN subscription s
ON s.sid = t.sid
GROUP
BY day_opened
If you only need to include the response_time in the total once for each ticket, remove the references to cnt_replies:
SELECT DAYNAME(t.open_date) AS day_opened
, SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min
, SUM(s.response_time) * 60 AS response_time_min
FROM ticket t
JOIN ( SELECT r.tid
, MIN(r.reply_date) AS reply_date
FROM ticket_reply r
GROUP BY r.tid
) tr
ON tr.tid = t.tid
JOIN subscription s
ON s.sid = t.sid
GROUP
BY day_opened
To GROUP BY month, just change the first expression in the SELECT list, and the reference in the GROUP BY clause.
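As a sketch of that change, here is the second query grouped by month name. MONTHNAME() is one possible choice of expression; it merges the same month from different years, so DATE_FORMAT(t.open_date, '%Y-%m') may be preferable if the data spans more than one year:

```sql
SELECT MONTHNAME(t.open_date) AS month_opened
     , SUM(TIMESTAMPDIFF(MINUTE, t.open_date, tr.reply_date)) AS num_min
     , SUM(s.response_time) * 60 AS response_time_min
  FROM ticket t
  JOIN ( SELECT r.tid
              , MIN(r.reply_date) AS reply_date
           FROM ticket_reply r
          GROUP BY r.tid
       ) tr
    ON tr.tid = t.tid
  JOIN subscription s
    ON s.sid = t.sid
 GROUP
    BY month_opened;
```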

Fixing SQL Query so it will become more Efficient

I've got 3 tables:
1. mobile_users - with id, phone_type, ...
2+3. iphone_purchases and android_purchases - with id, status, user_id, ...
I am trying to get all of the users who made 2 or more purchases.
A successful purchase is identified by status > 0.
I am also trying to get the total number of users in the mobile_users table in the same query.
This is the query I came up with:
SELECT COUNT(*) AS `users`,
( SELECT COUNT(*)
FROM `mobile_users`
) AS `total`
FROM `mobile_users`
WHERE `mobile_users`.`phone_type` = 'iphone'
AND ( SELECT COUNT(*)
FROM ( SELECT `status`,
`user_id`
FROM `iphone_purchases`
UNION
SELECT `status`,
`user_id`
FROM `android_purchases`
) AS `purchase_list`
WHERE `purchase_list`.`status` > 0
AND `purchase_list`.`user_id` = `mobile_users`.`id`
) >= 2
It's very slow, and I have to find a way to improve it.
Any help would be appreciated!
Edit:
Also, you should take into consideration that I'm building this query with sub-queries in PHP.
I'm building it with more conditions in the WHERE clause.
Your query is just returning counts of users, not each user.
The following restructures your query. It counts the number of purchases for iphones and androids separately, and then combines them using left outer join. The where clause simply combines the counts:
select mu.*, i.cnt as iphones, a.cnt as androids
from mobile_users mu left outer join
(SELECT `user_id`, count(*) as cnt
FROM `iphone_purchases`
where `status` > 0
group by user_id
) i
on i.user_id = mu.id left outer join
(SELECT `user_id`, count(*) as cnt
FROM `android_purchases`
where `status` > 0
group by user_id
) a
on a.user_id = mu.id
where coalesce(i.cnt, 0) + coalesce(a.cnt, 0) >= 2;
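If only the two counts from the question are needed, rather than one row per user, something along these lines might work. It is a sketch under the same assumptions about the tables. It uses UNION ALL, because plain UNION would collapse purchases that share the same status and user_id, and it filters on status before grouping:

```sql
SELECT COUNT(*) AS users
     , ( SELECT COUNT(*) FROM mobile_users ) AS total
  FROM mobile_users mu
  JOIN ( SELECT p.user_id
           FROM ( SELECT user_id FROM iphone_purchases WHERE status > 0
                  UNION ALL
                  SELECT user_id FROM android_purchases WHERE status > 0
                ) p
          GROUP BY p.user_id
         HAVING COUNT(*) >= 2          -- two or more successful purchases
       ) buyers
    ON buyers.user_id = mu.id
 WHERE mu.phone_type = 'iphone';
```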

How to return an ID for the row that has MIN/MAX value within a group?

SELECT
MAX(`client_id`) `client_id`
FROM
`phrases`
WHERE
`language_id` = 1 AND
`client_id` = 1 OR
`client_id` IS NULL
GROUP BY
`language_phrase_id`
How do I get the id for the row that holds MAX(`client_id`) value?
I need this in the context of derived table, e.g.
SELECT
`p2`.`phrase`
FROM
(SELECT `language_phrase_id`, MAX(`client_id`) `client_id` FROM `phrases` WHERE `language_id` = 1 AND `client_id` = 1 OR `client_id` IS NULL GROUP BY `language_phrase_id`) `p1`
INNER JOIN
`phrases` `p2`
ON
`p2`.`language_id` = 1 AND
`p1`.`language_phrase_id` = `p2`.`language_phrase_id` AND
`p1`.`client_id` = `p2`.`client_id`;
Use window functions to find the max for each group and then a where clause to select the row with the maximum:
select p.*
from (SELECT p.*, max(client_id) over (partition by language_phrase_id) as maxci
from phrases p
WHERE (`language_id` = 1 AND `client_id`= 1) or
`client_id` IS NULL
) p
where client_id = maxci
I also added parentheses to clarify your where statement. When mixing and and or, I always use parentheses to avoid possible confusion and mistakes.
Now that you've added the mysql tag to your question, the window-function query above won't work on MySQL versions before 8.0, which do not support window functions. So, here is a MySQL-specific solution:
select language_phrase_id,
substring_index(group_concat(id order by client_id desc), ',', 1) as max_id
from phrases p
group by language_phrase_id
Note that id will get converted to a character string in this process. If it has a different type, you can convert it back.
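A sketch of the conversion back, assuming id is an unsigned integer column (the question does not show the table definition):

```sql
SELECT p.language_phrase_id
     , CAST(
         SUBSTRING_INDEX(GROUP_CONCAT(p.id ORDER BY p.client_id DESC), ',', 1)
         AS UNSIGNED
       ) AS max_id                      -- assumes id is an unsigned integer
  FROM phrases p
 -- Add the language_id / client_id filter from the question here as needed.
 GROUP BY p.language_phrase_id;
```

MySQL sorts NULL values last under ORDER BY ... DESC, so a row with a non-NULL client_id is picked ahead of a row where client_id IS NULL.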
I had a bit of trouble understanding the requirements, but this seems to be what you're looking for.
Not the most beautiful SQL and I'm sure it can be simplified, but a starting point;
SELECT p1.id, p1.phrase
FROM phrases p1
LEFT JOIN `phrases` p2
ON p1.language_id=p2.language_id
AND p1.language_phrase_id=p2.language_phrase_id
AND p1.client_id IS NULL and p2.client_id = 1
WHERE p2.id IS NULL AND p1.language_id=1
AND (p1.client_id=1 or p1.client_id IS NULL)
GROUP BY p1.language_phrase_id
An SQLfiddle for testing.

mysql having... > avg() doesn't work as expected

I've created two views: one to calculate each user's diary count (user_diary_number), and one to select the users whose diary count is greater than the average diary count across all users.
The two views are below:
create view user_diary_number as
(
select user_id,count( distinct diary_id ) as diary_num
from user_diary
group by user_id
);
and second using having and avg:
create view hw_diary as
(
select u.user_id, u.realname, ud.diary_num, school.school_name
from (user as u cross join user_diary_number as ud on u.user_id = ud.user_id )cross join school on u.school_id = school.school_id
having diary_num > avg(diary_num)
);
The problem is that the second view returns only 1 row, yet we definitely have more than 1 user whose diary count is above the average diary_num. I have 251 diaries in total and 103 users; some users have 9, 4, or 5 diaries.
But the result contains only 1 user, who has 3 diaries.
My relevant tables are:
CREATE TABLE IF NOT EXISTS `school` (
`school_id` int(11) NOT NULL,
`school_name` varchar(45) NOT NULL,
`location` varchar(45) NOT NULL,
`master` varchar(45) NOT NULL,
`numbers_of_student` int(11) NOT NULL,
PRIMARY KEY (`school_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `user_diary` (
`diary_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`title` varchar(45) NOT NULL,
`content` varchar(255) NOT NULL,
`addtime` DATETIME NOT NULL,
PRIMARY KEY (`diary_id`,`user_id`),
KEY `fk_diary_user_id_idx` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
Is there a problem with the cross join, or is it something else?
Thanks a lot!
You can't use avg that way. In my personal movie database,
select * from movie having year > avg(year);
produces nothing, and
select * from movie having year > (select avg(year) from movie);
produces the expected result.
You must calculate the average in a separate subquery.
Something like:
select ...
from ...
group by ...
having diary_num > (
select avg(diary_num)
from ...)
You can fill in the blanks with what makes sense
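Filled in for the views from the question, this might look like the sketch below. It assumes the user table has the user_id, realname and school_id columns that the question's own query references, and it reuses the user_diary_number view. Without a GROUP BY, an aggregate such as AVG() in HAVING turns the whole result into a single group, which is presumably why only one row came back.

```sql
SELECT u.user_id, u.realname, ud.diary_num, s.school_name
FROM user AS u
JOIN user_diary_number AS ud ON u.user_id = ud.user_id
JOIN school AS s ON u.school_id = s.school_id
-- Compare each user's count against the average over all users in the view.
WHERE ud.diary_num > (SELECT AVG(diary_num) FROM user_diary_number);
```

Since diary_num is an ordinary column of the view here, a plain WHERE is enough; HAVING is not required.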
Something like this should return the resultset you are looking for:
SELECT u.user_id
, u.realname
, c.diary_num
, s.school_name
-- , a.diary_avg
FROM ( SELECT d.user_id
, COUNT(DISTINCT d.diary_id) AS diary_num
FROM user_diary d
GROUP BY d.user_id
) c
JOIN user u
ON u.user_id = c.user_id
JOIN school s
ON s.school_id = u.school_id
JOIN ( SELECT AVG(v.diary_num) AS diary_avg
FROM ( SELECT t.user_id
, COUNT(DISTINCT t.diary_id) AS diary_num
FROM user_diary t
GROUP BY t.user_id
) v
) a
ON a.diary_avg < c.diary_num
ORDER BY 1
The inline view aliased as c gets us the diary_num (count) for each user.
The inline view aliased as a gets us the average of all the diary_num for all users. That is getting us an "average" of the counts, which is what it looks like your original query was intending to do.
As an alternative, we could get the "average" number of diaries per user as ... the total count of all diaries divided by the total count of all users. To do that, replace that inline view aliased as a with something like this:
( SELECT COUNT(DISTINCT t.diary_id)
/ NULLIF(COUNT(DISTINCT v.user_id),0) AS diary_avg
FROM user v
LEFT
JOIN user_diary t
ON t.user_id = v.user_id
) a
This yields slightly different results, as it's a calculation on total counts, rather than an average of a calculation.
NOTE
The CROSS keyword has no influence on the MySQL optimizer.
We do typically include the CROSS keyword as documentation for future reviewers. It indicates that we have purposefully omitted the usual ON clause. (As reviewers, when we see a JOIN without an ON clause, our minds race to a "possible unintended Cartesian product". The author's inclusion of the CROSS keyword alerts us that the omission of the ON clause was purposeful.)
But the MySQL optimizer doesn't care one whit whether the CROSS keyword is included or omitted.
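One more question: Does MySQL support a view whose SELECT contains a subquery in the FROM clause?
A: Really old versions (3.x ?) of MySQL did not support subqueries at all, but MySQL 5.1 and later certainly do. In a view definition, however, versions before MySQL 5.7.7 reject a subquery in the FROM clause (that is the "View's SELECT contains a subquery in the FROM clause" error); on those versions, move the subquery into a separate view.
In an ordinary query, yes, a SELECT statement can be used as an inline view as a rowsource for another query, e.g.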
SELECT v.*
FROM (
SELECT 1 AS foo
) v