Retrieving count from nested select statement - mysql

I have the following query, which works exactly as intended (thanks to the helpful people at Stack Overflow). However, I realized that in addition to using the count to check whether the number of messages is <= 3, I also want to retrieve the actual count for each row that is returned in the results.
This is so that I can customize the logic depending on how many messages each of the returned users has.
I have tried a few different options but it's not quite working.
SELECT u.*
FROM users u
WHERE
NOT EXISTS (
SELECT 1
FROM events e
WHERE e.user_id = u.id AND e.type = 'collection'
)
AND (
SELECT COUNT(*)
FROM messages m
WHERE m.user_id = u.id AND m.message_type = 'collection_reminder'
) <= 3
AND u.admin IS NULL

Move that subquery to the SELECT list and put the condition in a HAVING clause at the end. This is a MySQL-specific trick: MySQL lets HAVING refer to a select-list alias such as cnt, even without a GROUP BY.
SELECT u.*,
(
SELECT COUNT(*)
FROM messages m
WHERE m.user_id = u.id AND m.message_type = 'collection_reminder'
) as cnt
FROM users u
WHERE
NOT EXISTS (
SELECT 1
FROM events e
WHERE e.user_id = u.id AND e.type = 'collection'
)
AND u.admin IS NULL
HAVING cnt <= 3
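If you'd rather not depend on MySQL's HAVING-on-alias extension, the same idea can be written portably: compute cnt in a derived table and filter it in the outer WHERE. The sketch below runs that form through Python's built-in sqlite3 module against an in-memory database, only so it can be executed. SQLite stands in for MySQL here, and the table contents (users ann, bob, cid, dee, eve) are made up.

```python
import sqlite3

# Hypothetical schema and rows standing in for the question's tables.
SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, admin INTEGER);
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, message_type TEXT);
INSERT INTO users VALUES (1, 'ann', NULL), (2, 'bob', NULL), (3, 'cid', 1),
                         (4, 'dee', NULL), (5, 'eve', NULL);
INSERT INTO events (user_id, type) VALUES (2, 'collection');
INSERT INTO messages (user_id, message_type) VALUES
    (1, 'collection_reminder'), (1, 'collection_reminder'),
    (4, 'collection_reminder'), (4, 'collection_reminder'),
    (4, 'collection_reminder'), (4, 'collection_reminder');
"""

# cnt is computed once in the inner SELECT list and filtered in the outer
# WHERE, instead of relying on MySQL's HAVING-on-alias extension.
QUERY = """
SELECT t.id, t.name, t.cnt
FROM (
    SELECT u.id, u.name,
           (SELECT COUNT(*)
              FROM messages m
             WHERE m.user_id = u.id
               AND m.message_type = 'collection_reminder') AS cnt
    FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM events e
                       WHERE e.user_id = u.id AND e.type = 'collection')
      AND u.admin IS NULL
) AS t
WHERE t.cnt <= 3
ORDER BY t.id
"""

def run():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    for row in run():
        print(row)
```

Users with zero reminders come back with cnt = 0, so the count is available per row for the application logic.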

A slight modification of my answer from the original question:
SELECT u.*, COUNT(DISTINCT m.message_id)
FROM users AS u
LEFT JOIN events AS e ON u.user_id = e.user_id AND e.event_type = 'Collection'
LEFT JOIN messages AS m ON u.user_id = m.user_id AND m.msg_type = 'Collection Reminder'
WHERE u.admin = 0
AND e.event_id IS NULL -- No event of type collection
GROUP BY u.user_id -- Note: you should group on all selected fields, and
-- some configurations of MySQL will require you to do so.
HAVING COUNT(DISTINCT m.message_id) < 3 -- Less than 3 collection reminder messages
-- distinct is optional, but
-- if you were to remove the "no event" condition,
-- multiple events could multiply the message count.
;
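To see what the DISTINCT protects against, here is a small sketch that drops the "no event" filter and compares COUNT(m.id) with COUNT(DISTINCT m.id). It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, and the rows are hypothetical: user 1 has two non-collection events and two reminders.

```python
import sqlite3

# Hypothetical rows: user 1 has two non-collection events and two reminders,
# user 2 has one reminder and no events, user 3 has nothing.
SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, message_type TEXT);
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO events (user_id, type) VALUES (1, 'meeting'), (1, 'call');
INSERT INTO messages (user_id, message_type) VALUES
    (1, 'collection_reminder'), (1, 'collection_reminder'),
    (2, 'collection_reminder');
"""

# No "no event" filter: each reminder row is repeated once per event row
# of the same user before GROUP BY aggregates it.
QUERY = """
SELECT u.id,
       COUNT(m.id)          AS naive_cnt,
       COUNT(DISTINCT m.id) AS distinct_cnt
FROM users u
LEFT JOIN events e   ON e.user_id = u.id
LEFT JOIN messages m ON m.user_id = u.id
                    AND m.message_type = 'collection_reminder'
GROUP BY u.id
ORDER BY u.id
"""

def run():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(QUERY).fetchall()

if __name__ == "__main__":
    print(run())
```

User 1's two reminders are counted four times by the plain COUNT, because each one is paired with both event rows.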

Related

Joining 3 Tables based on specific conditions

I have the following 3 tables:
users: [id, name, admin ...]
events: [id, user_id, type ...]
messages: [id, user_id, ...]
I want to construct a query that does the following:
-> Select all users from the table users who have not scheduled an event of the type "collection"
-> And who have less than 3 messages of the type "collection_reminder"
-> And who are not admin
I've managed to figure out the first part of this query, but it all goes a bit pear-shaped when I try to add the third table, do the count, etc.
Here is a query that might get the job done. Each requirement is represented as a condition in the WHERE clause, using correlated subqueries where needed:
SELECT u.*
FROM users u
WHERE
NOT EXISTS (
SELECT 1
FROM events e
WHERE e.user_id = u.id AND e.type = 'collection'
)
AND (
SELECT COUNT(*)
FROM messages m
WHERE m.user_id = u.id AND m.type = 'collection_reminder'
) <= 3
AND u.admin IS NULL
I'll write this off the top of my head, so expect some syntax issues, but the idea is the following.
You can find the users who have no collection event scheduled using a LEFT JOIN: when a user has no matching row, the columns from the joined table come back as NULL.
select * from users u
left join events e on e.user_id = u.id and e.type = 'collection'
where e.user_id is null
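The LEFT JOIN ... IS NULL pattern is an anti-join, and where the event-type condition goes matters: inside the ON clause it restricts which events can match, so users whose only events are of another type still get a NULL-extended row. The sketch below (Python's sqlite3 as a stand-in for MySQL, hypothetical rows) compares a type-restricted anti-join, the equivalent NOT EXISTS, and an unrestricted LEFT JOIN that drops anyone with any event at all.

```python
import sqlite3

# Hypothetical rows: user 1 has no events, user 2 only a 'meeting',
# user 3 a 'collection' event.
SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT);
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO events (user_id, type) VALUES (2, 'meeting'), (3, 'collection');
"""

QUERIES = {
    # Type filter in ON: non-collection events never match, so user 2
    # still gets a NULL-extended row and passes the IS NULL test.
    "left_join_collection": """
        SELECT u.id FROM users u
        LEFT JOIN events e ON e.user_id = u.id AND e.type = 'collection'
        WHERE e.user_id IS NULL
        ORDER BY u.id""",
    # Equivalent formulation with a correlated NOT EXISTS.
    "not_exists_collection": """
        SELECT u.id FROM users u
        WHERE NOT EXISTS (SELECT 1 FROM events e
                           WHERE e.user_id = u.id AND e.type = 'collection')
        ORDER BY u.id""",
    # No type filter: excludes users who have any event at all.
    "left_join_any_event": """
        SELECT u.id FROM users u
        LEFT JOIN events e ON e.user_id = u.id
        WHERE e.user_id IS NULL
        ORDER BY u.id""",
}

def run():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in QUERIES.items()}

if __name__ == "__main__":
    print(run())
```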
Now, I don't think this is the most performant way, but here is a simple way to find everyone who has 3 or fewer collection reminder messages, by excluding the users who have more than 3:
select * from users u
left join events e on e.user_id = u.id and e.type = 'collection'
where u.id not in (
select m.user_id from messages m where m.type = 'collection_reminder' group by m.user_id having COUNT(*) > 3
)
and e.user_id is null
Then filtering out admins is the easiest part :D
select * from users u
left join events e on e.user_id = u.id and e.type = 'collection'
where u.id not in (
select m.user_id from messages m where m.type = 'collection_reminder' group by m.user_id having COUNT(*) > 3
)
and e.user_id is null
and u.admin = false
Hope it helps.
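This is pretty much a direct translation of your requirements, in the order you listed them. The message condition excludes users with 3 or more collection reminders (rather than including users with fewer than 3), so that users who have no reminder messages at all are still returned:
SELECT u.*
FROM users AS u
WHERE u.user_id NOT IN (SELECT user_id FROM events WHERE event_type = 'Collection')
AND u.user_id NOT IN (
SELECT user_id
FROM messages
WHERE msg_type = 'Collection Reminder'
GROUP BY user_id
HAVING COUNT(*) >= 3
)
AND u.admin = 0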
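A pitfall with IN (... GROUP BY user_id HAVING COUNT(*) < 3): a user with no matching messages never forms a group in the subquery, so that user can't be returned. Flipping it to NOT IN (... HAVING COUNT(*) >= 3) keeps them, as long as the subquery can't yield a NULL user_id (NOT IN against a list containing NULL matches nothing). A small sketch with Python's sqlite3 as a stand-in for MySQL, using hypothetical rows where user 1 has no reminders, user 2 has two and user 3 has three:

```python
import sqlite3

SETUP = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE messages (message_id INTEGER PRIMARY KEY, user_id INTEGER, msg_type TEXT);
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO messages (user_id, msg_type) VALUES
    (2, 'Collection Reminder'), (2, 'Collection Reminder'),
    (3, 'Collection Reminder'), (3, 'Collection Reminder'), (3, 'Collection Reminder');
"""

# A user with zero reminders has no group in the subquery at all,
# so "IN (... HAVING COUNT(*) < 3)" cannot return them.
IN_LESS_THAN = """
SELECT u.user_id FROM users u
WHERE u.user_id IN (SELECT user_id FROM messages
                    WHERE msg_type = 'Collection Reminder'
                    GROUP BY user_id HAVING COUNT(*) < 3)
ORDER BY u.user_id"""

# Excluding users with 3 or more reminders keeps the zero-reminder users.
# The IS NOT NULL guard keeps a NULL out of the NOT IN list.
NOT_IN_AT_LEAST = """
SELECT u.user_id FROM users u
WHERE u.user_id NOT IN (SELECT user_id FROM messages
                        WHERE msg_type = 'Collection Reminder'
                          AND user_id IS NOT NULL
                        GROUP BY user_id HAVING COUNT(*) >= 3)
ORDER BY u.user_id"""

def run():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    in_ids = [r[0] for r in conn.execute(IN_LESS_THAN)]
    not_in_ids = [r[0] for r in conn.execute(NOT_IN_AT_LEAST)]
    return in_ids, not_in_ids

if __name__ == "__main__":
    print(run())
```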
Alternatively, this can be done entirely with joins:
SELECT u.*
FROM users AS u
LEFT JOIN events AS e ON u.user_id = e.user_id AND e.event_type = 'Collection'
LEFT JOIN messages AS m ON u.user_id = m.user_id AND m.msg_type = 'Collection Reminder'
WHERE u.admin = 0
AND e.event_id IS NULL -- No event of type collection
GROUP BY u.user_id -- Note: you should group on all selected fields, and
-- some configurations of MySQL will require you to do so.
HAVING COUNT(DISTINCT m.message_id) < 3 -- Less than 3 collection reminder messages
-- distinct is optional, but
-- if you were to remove the "no event" condition,
-- multiple events could multiply the message count.
;
This query uses joins to link the 3 tables, filters the rows in the WHERE clause, and then uses GROUP BY with HAVING to keep only the users who satisfy the count conditions.
SELECT a.id,
SUM(CASE WHEN b.type = 'collection' THEN 1 ELSE 0 END) AS collection_events,
COUNT(DISTINCT CASE WHEN c.type = 'collection_reminder' THEN c.id END) AS reminders -- DISTINCT: each message repeats once per event row
FROM users a
left join events b on (b.user_id = a.id)
left join messages c on (c.user_id = a.id)
WHERE a.admin = false
GROUP BY a.id
HAVING SUM(CASE WHEN b.type = 'collection' THEN 1 ELSE 0 END) = 0
AND COUNT(DISTINCT CASE WHEN c.type = 'collection_reminder' THEN c.id END) < 3
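Because this version LEFT JOINs events and messages in the same query, every message row is repeated once per event row of the same user, so a SUM(CASE ...) over messages can overcount. The sketch below (Python's sqlite3 standing in for MySQL; hypothetical rows) shows the conditional aggregates without the HAVING so the intermediate values are visible: user 1 has two non-collection events and two reminders.

```python
import sqlite3

# Hypothetical rows: user 1 has two non-collection events and two reminders,
# user 2 has a collection event, user 3 is an admin.
SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY, admin INTEGER);
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT);
INSERT INTO users VALUES (1, 0), (2, 0), (3, 1);
INSERT INTO events (user_id, type) VALUES (1, 'meeting'), (1, 'call'), (2, 'collection');
INSERT INTO messages (user_id, type) VALUES
    (1, 'collection_reminder'), (1, 'collection_reminder');
"""

# Conditional aggregates computed in one pass, shown without HAVING.
QUERY = """
SELECT a.id,
       SUM(CASE WHEN b.type = 'collection' THEN 1 ELSE 0 END) AS collection_events,
       SUM(CASE WHEN c.type = 'collection_reminder' THEN 1 ELSE 0 END) AS summed_reminders,
       COUNT(DISTINCT CASE WHEN c.type = 'collection_reminder' THEN c.id END) AS distinct_reminders
FROM users a
LEFT JOIN events b ON b.user_id = a.id
LEFT JOIN messages c ON c.user_id = a.id
WHERE a.admin = 0
GROUP BY a.id
ORDER BY a.id
"""

def run():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(QUERY).fetchall()

def kept(rows):
    """Apply the HAVING conditions in Python: (with DISTINCT count, with SUM)."""
    by_distinct = [r[0] for r in rows if r[1] == 0 and r[3] < 3]
    by_sum = [r[0] for r in rows if r[1] == 0 and r[2] < 3]
    return by_distinct, by_sum

if __name__ == "__main__":
    rows = run()
    print(rows, kept(rows))
```

SUM(CASE ...) reports 4 reminders for user 1 while COUNT(DISTINCT CASE ... THEN c.id END) reports 2, so only the latter keeps user 1 under a < 3 condition.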

Sub query within SELECT statement always returning NULL

I am trying to write an SQL SELECT statement with a sub query. There is no error returned but I don't get the results I am expecting. The value for r.related is always NULL.
SELECT
l.id,
u.id as user_id,
u.name,
r.related
FROM
list l
INNER JOIN user u ON u.id = l.user_id
LEFT JOIN (
SELECT COUNT(u.id) AS related, b.group_id
FROM user u
INNER JOIN booking b ON b.user_id = u.id
WHERE u.id != l.user_id
AND b. = 0) AS r ON r.group_id = l.group_id
WHERE
l.group_id = 22
GROUP BY l.id, u.id
ORDER BY l.id
Am I writing the sub query correctly?
Here's the problem:
SELECT COUNT(u.id) AS related, b.group_id
FROM user u
INNER JOIN booking b ON b.user_id = u.id
WHERE u.id != b.user_id
AND b. = 0
Look, you are joining the user and booking tables on booking.user_id = user.id, and then, in your WHERE condition (WHERE user.id != booking.user_id), you discard exactly the rows that matched between the two tables.
It's like looking for elements of A that are not in B, but only within A intersection B: there can't be any. Nothing in the subquery can match l.group_id, so the LEFT JOIN leaves r.related as NULL.
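The contradiction is easy to reproduce. The sketch below (Python's sqlite3 as a stand-in for MySQL, hypothetical rows) counts the joined rows with and without the != filter, then shows one possible alternative: a correlated subquery in the SELECT list that counts the other users with a booking in the same group. That alternative assumes this is what "related" is meant to count, and it leaves out the question's b. = 0 condition because its column name is missing.

```python
import sqlite3

# Hypothetical rows: users 1 and 3 have bookings in group 22;
# the list rows belong to users 1 and 2.
SETUP = """
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE booking (id INTEGER PRIMARY KEY, user_id INTEGER, group_id INTEGER);
CREATE TABLE list (id INTEGER PRIMARY KEY, user_id INTEGER, group_id INTEGER);
INSERT INTO user VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO booking (user_id, group_id) VALUES (1, 22), (3, 22);
INSERT INTO list VALUES (10, 1, 22), (11, 2, 22);
"""

JOINED = "SELECT COUNT(*) FROM user u INNER JOIN booking b ON b.user_id = u.id"

# Same join plus the contradictory filter: no joined row can pass it.
CONTRADICTORY = JOINED + " WHERE u.id != b.user_id"

# Assumed intent: per list row, count the other users who have a booking
# in the same group. The question's "b. = 0" condition is omitted.
CORRELATED = """
SELECT l.id,
       (SELECT COUNT(DISTINCT b.user_id)
          FROM booking b
         WHERE b.group_id = l.group_id
           AND b.user_id != l.user_id) AS related
FROM list l
WHERE l.group_id = 22
ORDER BY l.id
"""

def run():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    joined = conn.execute(JOINED).fetchone()[0]
    contradictory = conn.execute(CONTRADICTORY).fetchone()[0]
    related = conn.execute(CORRELATED).fetchall()
    return joined, contradictory, related

if __name__ == "__main__":
    print(run())
```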

How can I optimise an extremely slow MySQL query that uses COUNT DISTINCT

I have a very slow MySQL query that I would like to optimise.
The query is taking 66.2070 seconds to return 5 results from tables containing around 200 rows.
The database tables store users, experiments (A/B tests), goals (page URLs), visits (page visits) and conversions (clicks a goal's URL). The visit and conversion tables both have a combination column that records if version A or B of a page was visited or a conversion came from version A or B. Combinations are stored in the db as 1 or 2.
I'm trying to get a list of a user's experiments with the number of visits and conversions for each combination.
For some relationships I'm using composite primary keys, which does make the joins more complicated. I doubt it, but could this be the cause of the problem?
How can I rewrite this query to make it run in a reasonable time, at least less than a second?
Here's my database schema:
And here's my query:
SELECT e.id AS id,
e.name AS name,
e.status AS status,
e.created AS created,
Count(DISTINCT v1.id) AS visits1,
Count(DISTINCT v2.id) AS visits2,
Count(DISTINCT c1.id) AS conversions1,
Count(DISTINCT c2.id) AS conversions2
FROM experiment e
LEFT JOIN visit v1
ON ( v1.experiment_id = e.id
AND v1.user_id = e.user_id
AND v1.combination = 1 )
LEFT JOIN visit v2
ON ( v2.experiment_id = e.id
AND v2.user_id = e.user_id
AND v2.combination = 2 )
LEFT JOIN goal g
ON ( g.experiment_id = e.id
AND g.user_id = e.user_id
AND g.principal = 1 )
LEFT JOIN conversion c1
ON ( c1.experiment_id = e.id
AND c1.user_id = e.user_id
AND c1.goal_id = g.id
AND c1.combination = 1 )
LEFT JOIN conversion c2
ON ( c2.experiment_id = e.id
AND c2.user_id = e.user_id
AND c2.goal_id = g.id
AND c2.combination = 2 )
WHERE e.user_id = 25
GROUP BY e.id
ORDER BY e.created DESC
LIMIT 5
The resulting table should look something like this:
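You should do the aggregations before doing the joins, to avoid getting large intermediate results: every extra LEFT JOIN pairs each matching row with every row from the other joins, so the row count per experiment grows multiplicatively before COUNT(DISTINCT ...) collapses it again. I think the logic is: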
SELECT e.id, e.name, e.status, e.created,
v.visits1, v.visits2, g.conversions1, g.conversions2
FROM experiment e LEFT JOIN
(SELECT experiment_id, user_id,
SUM(combination = 1) as visits1,
SUM(combination = 2) as visits2
FROM visit
WHERE combination IN (1, 2)
GROUP BY experiment_id, user_id
) v
ON v.experiment_id = e.id AND
v.user_id = e.user_id LEFT JOIN
(SELECT g.experiment_id, g.user_id,
SUM(c.combination = 1) as conversions1,
SUM(c.combination = 2) as conversions2
FROM goal g LEFT JOIN
conversion c
ON c.experiment_id = g.experiment_id AND
c.user_id = g.user_id AND
c.goal_id = g.id
WHERE g.principal = 1
GROUP BY g.experiment_id, g.user_id
) g
ON g.experiment_id = e.id AND
g.user_id = e.user_id
WHERE e.user_id = 25
ORDER BY e.created DESC
LIMIT 5 ;
Further optimizations are possible, for instance an index on experiment(user_id, created, id).
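A small sketch of why pre-aggregating helps, using Python's sqlite3 as a stand-in for MySQL and hypothetical rows (experiment 1 has 3 visits for combination 1 and 4 for combination 2; goals and conversions are left out to keep it short). Joining visit twice produces 3 x 4 intermediate rows for that experiment, whereas the aggregated derived table contributes one row per experiment.

```python
import sqlite3

SETUP = """
CREATE TABLE experiment (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
CREATE TABLE visit (id INTEGER PRIMARY KEY, experiment_id INTEGER,
                    user_id INTEGER, combination INTEGER);
INSERT INTO experiment VALUES (1, 25, 'first'), (2, 25, 'second');
"""

# Joining visit twice multiplies rows: 3 x 4 = 12 for experiment 1,
# plus one NULL-extended row for experiment 2.
JOINED_ROWS = """
SELECT COUNT(*)
FROM experiment e
LEFT JOIN visit v1 ON v1.experiment_id = e.id AND v1.user_id = e.user_id AND v1.combination = 1
LEFT JOIN visit v2 ON v2.experiment_id = e.id AND v2.user_id = e.user_id AND v2.combination = 2
WHERE e.user_id = 25"""

# Aggregating first yields one row per (experiment_id, user_id) before the join.
PRE_AGGREGATED = """
SELECT e.id, COALESCE(v.visits1, 0), COALESCE(v.visits2, 0)
FROM experiment e
LEFT JOIN (SELECT experiment_id, user_id,
                  SUM(combination = 1) AS visits1,
                  SUM(combination = 2) AS visits2
             FROM visit
            WHERE combination IN (1, 2)
            GROUP BY experiment_id, user_id) v
  ON v.experiment_id = e.id AND v.user_id = e.user_id
WHERE e.user_id = 25
ORDER BY e.id"""

def run():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    # Hypothetical visits, all on experiment 1 for user 25.
    rows = [(1, 25, 1)] * 3 + [(1, 25, 2)] * 4
    conn.executemany(
        "INSERT INTO visit (experiment_id, user_id, combination) VALUES (?, ?, ?)", rows)
    joined = conn.execute(JOINED_ROWS).fetchone()[0]
    return joined, conn.execute(PRE_AGGREGATED).fetchall()

if __name__ == "__main__":
    print(run())
```

With the question's five joins (v1, v2, g, c1, c2) the intermediate row count keeps multiplying with each join, which is the work the pre-aggregated version avoids.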
For your question about the drawbacks of using composite keys, I found this:
Drawback of composite keys
I can't test your database right now, but you can use EXPLAIN in MySQL to find out what is wrong with the performance of your query:
MySQL docs about EXPLAIN and optimizing your query with EXPLAIN

Nested Join in Subquery and failing correlation

I have 3 tables sc_user, sc_cube, sc_cube_sent
I want to join to a user query (sc_user) one distinct random message/cube (from sc_cube) that has not been sent to that user before (sc_cube_sent), so that each row in the result set has a distinct user id and a random cube id from sc_cube that does not appear in sc_cube_sent for that user id.
The problem is that I can't seem to use a correlated reference when I need the outer query's u.id in the inner ON clause. I would need the commented-out section to make it work.
# get one random idcube per user not already sent to that user
SELECT u.id, sub.idcube
FROM sc_user as u
LEFT JOIN (
SELECT c.idcube, sent.idreceiver FROM sc_cube c
LEFT JOIN sc_cube_sent sent ON ( c.idcube = sent.idcube /* AND sent.idreceiver = u.id <-- "unknown column u.id in on clause" */ )
WHERE sent.idcube IS NULL
ORDER BY RAND()
LIMIT 1
) as sub
ON 1
I added a fiddle with some data : http://sqlfiddle.com/#!9/7b0bc/1
The new cube ids (from sc_cube) that should show up for user 1 are: 2150, 2151, 2152, 2153
Edit:
I could do it with another subquery instead of a join, but that has a huge performance impact and is not feasible (30+ seconds on a couple of thousand rows in each table, with reasonably implemented keys), so I am still looking for a way to make the JOIN solution work.
SELECT
u.id,
(SELECT sc_cube.idcube
FROM sc_cube
WHERE NOT EXISTS(
SELECT sc_cube.idcube FROM sc_cube_sent WHERE sc_cube_sent.idcube = sc_cube.idcube AND sc_cube_sent.idreceiver = u.id
)
ORDER BY RAND() LIMIT 0,1
) as idcube
FROM sc_user u
Without being able to test this, I would say you need to include sc_user inside the subquery, because u is out of scope there:
LEFT JOIN
( SELECT c.idcube, sent.idreceiver
FROM sc_user u
JOIN sc_cube c ON c.whatever_your_join_column_is = u.whatever_your_join_column_is
LEFT JOIN sc_cube_sent sent ON ( c.idcube = sent.idcube AND sent.idreceiver = u.id )
WHERE sent.idcube IS NULL
ORDER BY RAND()
LIMIT 1
) sub
If you want to get the message ids that have not been sent to a particular user, why use a join or left join at all?
Just do:
SELECT sent.idcube
FROM sc_cube_sent sent
WHERE sent.idreceiver <> u.id
Then the query may look like this:
SELECT u.id,
/* sub.idcube */
( SELECT sent.idcube
FROM sc_cube_sent sent
WHERE sent.idreceiver <> u.id
ORDER BY RAND()
LIMIT 1
) as idcube
FROM sc_user as u
I got it working with a NOT IN subselect in the ON clause. Although the correlated reference u.id is not available inside the LEFT JOIN's derived table, it is available in the ON clause. Here is how it works:
SELECT u.id, sub.idcube
FROM sc_user as u
LEFT JOIN (
SELECT idcube FROM sc_cube c ORDER BY RAND()
) sub ON (
sub.idcube NOT IN (
SELECT s.idcube FROM sc_cube_sent s WHERE s.idreceiver = u.id
)
)
GROUP BY u.id
Fiddle : http://sqlfiddle.com/#!9/7b0bc/48
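In MySQL, GROUP BY u.id with a non-aggregated sub.idcube returns an indeterminate value, and an ORDER BY inside a derived table is not guaranteed to survive, so the "random" pick above is not guaranteed to be random. The sketch below keeps the same correlated NOT IN in the ON clause but makes the random choice in application code instead; that split is a swapped-in design choice, not the author's method. It uses Python's sqlite3 as a stand-in for MySQL, and the fiddle's data is not reproduced: the rows are hypothetical, chosen so that user 1's unsent cubes are 2150 to 2153 as described in the question.

```python
import random
import sqlite3

SETUP = """
CREATE TABLE sc_user (id INTEGER PRIMARY KEY);
CREATE TABLE sc_cube (idcube INTEGER PRIMARY KEY);
CREATE TABLE sc_cube_sent (idcube INTEGER NOT NULL, idreceiver INTEGER NOT NULL);
INSERT INTO sc_user VALUES (1), (2);
INSERT INTO sc_cube VALUES (2148), (2149), (2150), (2151), (2152), (2153);
INSERT INTO sc_cube_sent VALUES (2148, 1), (2149, 1), (2150, 2);
"""

# The subquery in ON can see u.id because sc_user is already in scope there.
# NOT NULL on sc_cube_sent.idcube matters: a NULL in a NOT IN list matches nothing.
QUERY = """
SELECT u.id, c.idcube
FROM sc_user AS u
LEFT JOIN sc_cube AS c
  ON c.idcube NOT IN (SELECT s.idcube FROM sc_cube_sent AS s
                       WHERE s.idreceiver = u.id)
ORDER BY u.id, c.idcube
"""

def unsent_by_user(conn):
    """Map each user id to the sorted list of cube ids never sent to them."""
    result = {}
    for user_id, idcube in conn.execute(QUERY):
        result.setdefault(user_id, [])
        if idcube is not None:  # LEFT JOIN yields NULL when nothing is left to send
            result[user_id].append(idcube)
    return result

def pick_random(unsent, rng=random):
    """Choose one unsent cube per user (None if every cube was already sent)."""
    return {uid: (rng.choice(cubes) if cubes else None) for uid, cubes in unsent.items()}

def run(seed=0):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    unsent = unsent_by_user(conn)
    return unsent, pick_random(unsent, random.Random(seed))

if __name__ == "__main__":
    print(run())
```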

SQL with 2 INNER JOINs on the same column not working

I am trying to use the following code to get the 6 users the current user has most recently chatted with. I have two problems. First, if the current user has received a message from another user but has not sent one back, that other user isn't fetched. Second, the ORDER BY clause is causing an error. I'm a beginner in SQL, so I have no idea what's going on.
Thanks in advance!
Here's the code:
SELECT users.*
FROM users INNER JOIN
messages fromuser
ON (fromuser.fromid = users.id) INNER JOIN
messages touser
ON (touser.toid = users.id)
WHERE fromuser.toid = :userid OR touser.fromid = :meid
GROUP BY users.id
ORDER BY MAX(messages.datetime)
LIMIT 6;
This should do the job, and it relies less on MySQL extensions than the other answer so far. I estimate that it would perform about the same, but it's certainly wordier.
SELECT u.*
FROM (
SELECT otherid, MAX(maxts) AS lastts
FROM (
SELECT
m.fromid AS otherid,
MAX(m.datetime) as maxts
FROM messages m
WHERE m.toid = :userid
GROUP BY m.fromid
UNION ALL
SELECT
m.toid AS otherid,
MAX(m.datetime) as maxts
FROM messages m
WHERE m.fromid = :userid
GROUP BY m.toid
) um
GROUP BY otherid -- a partner can appear in both branches of the UNION
ORDER BY lastts DESC
LIMIT 6
) otheru
INNER JOIN users u
ON u.id = otheru.otherid
ORDER BY otheru.lastts DESC
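A sketch of the UNION ALL approach with Python's sqlite3 as a stand-in for MySQL and hypothetical users and messages (timestamps stored as ISO-8601 text so they sort correctly). It takes the per-partner maximum with GROUP BY otherid rather than DISTINCT, so a partner who both sent and received messages appears once, and it repeats the ORDER BY after the join because the outer query's row order is otherwise unspecified.

```python
import sqlite3

# Hypothetical data: user 1 is the current user.
SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, fromid INTEGER, toid INTEGER, datetime TEXT);
INSERT INTO users VALUES (1, 'me'), (2, 'bo'), (3, 'cy'), (4, 'di');
INSERT INTO messages (fromid, toid, datetime) VALUES
    (2, 1, '2024-01-01 10:00'), (1, 3, '2024-01-03 10:00'),
    (4, 1, '2024-01-02 10:00'), (1, 2, '2024-01-04 10:00');
"""

# One branch per direction; GROUP BY otherid collapses a partner who
# appears in both branches into one row with the latest timestamp.
QUERY = """
SELECT u.id, u.name
FROM (
    SELECT otherid, MAX(maxts) AS lastts
    FROM (
        SELECT m.fromid AS otherid, MAX(m.datetime) AS maxts
        FROM messages m WHERE m.toid = :userid GROUP BY m.fromid
        UNION ALL
        SELECT m.toid AS otherid, MAX(m.datetime) AS maxts
        FROM messages m WHERE m.fromid = :userid GROUP BY m.toid
    ) um
    GROUP BY otherid
    ORDER BY lastts DESC
    LIMIT 6
) otheru
INNER JOIN users u ON u.id = otheru.otherid
ORDER BY otheru.lastts DESC
"""

def recent_partners(userid):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(QUERY, {"userid": userid}).fetchall()

if __name__ == "__main__":
    print(recent_partners(1))
```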
Your logic is doomed to fail, because one users.id cannot be two different values at the same time. I think this query does what you want:
SELECT u.*
FROM messages m INNER JOIN
users u
ON (m.fromid = u.id AND m.toid = :userid) OR
(m.toid = u.id AND m.fromid = :userid )
GROUP BY u.id
ORDER BY MAX(m.datetime) DESC
LIMIT 6;
Notice that it joins to the users table by the id that is not the current user.
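A sketch of this OR-join version on hypothetical data, again with Python's sqlite3 standing in for MySQL. Partner 3 has only received messages from the current user and partner 4 has only sent them, which is the case the question's double INNER JOIN missed; here both are returned, ordered by the latest message in either direction.

```python
import sqlite3

# Hypothetical data: user 1 is the current user; timestamps are ISO-8601 text.
SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE messages (id INTEGER PRIMARY KEY, fromid INTEGER, toid INTEGER, datetime TEXT);
INSERT INTO users VALUES (1, 'me'), (2, 'bo'), (3, 'cy'), (4, 'di');
INSERT INTO messages (fromid, toid, datetime) VALUES
    (2, 1, '2024-01-01 10:00'), (1, 3, '2024-01-03 10:00'),
    (4, 1, '2024-01-02 10:00'), (1, 2, '2024-01-04 10:00');
"""

# Each message joins to whichever side is NOT the current user.
QUERY = """
SELECT u.id, u.name
FROM messages m
INNER JOIN users u
  ON (m.fromid = u.id AND m.toid = :userid)
  OR (m.toid = u.id AND m.fromid = :userid)
GROUP BY u.id
ORDER BY MAX(m.datetime) DESC
LIMIT 6
"""

def recent_partners(userid):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(QUERY, {"userid": userid}).fetchall()

if __name__ == "__main__":
    print(recent_partners(1))
```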