Optimizing MySQL Join and Group By with intermediate table - mysql

Simplifying a bit, I have three tables:
users (user_id, team_id)
results (user_id, result)
user_signups (user_id, team_id, event_id)
results.user_id is a foreign key.
The tables have a large number of rows. If I run
select sum(result)
from results
inner join users on users.id = results.user_id
group by team_id
It is fast. EXPLAIN shows results with 150k rows and users with 1 row.
If I do
select sum(result)
from results
inner join user_signups on user_signups.user_id = results.user_id
where event_id = 1
group by team_id
It is very slow (the run time goes from 1 second to 14 seconds). EXPLAIN shows results with 28 rows and user_signups with 5345 rows.
Things I have tried:
A unique index on event_id and user_id on user_signups.
An index on event_id, user_id, team_id on user_signups.
Rewriting as
select sum(result)
from results
inner join (select * from user_signups where event_id = 1) user_signups on user_signups.user_id = results.user_id
group by team_id
Rewriting as
select sum(result)
from results
inner join users on users.id = results.user_id
inner join user_signups on user_signups.user_id = users.id
where event_id = 1
group by user_signups.team_id
Any other suggestions?

By grouping on the team_id, I assume that you want one row for each record in results.
Is this what you're looking for?
SELECT *, sum(result) FROM results
LEFT JOIN users ON (users.user_id=results.user_id)
LEFT JOIN user_signups ON (user_signups.user_id=users.user_id)
GROUP BY table.field
From here, you can group on whatever you like. This structure assumes that most of your data will be present in the results table and will join users to the results table and user_signups to the users table.

Create a multicolumn index on (event_id, user_id, team_id) in the user_signups table and try running the following query.
If this doesn't work, post your EXPLAIN output here.
select sum(result)
from results
inner join (select event_id, user_id, team_id
from user_signups
where event_id = 1) user_signups on user_signups.user_id = results.user_id
group by team_id
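To make the suggestion above concrete, here is a minimal sketch that builds the covering index and runs the derived-table query. It uses Python's sqlite3 module as a stand-in for MySQL, so it only checks that the query is correct, not how MySQL would plan it. The sample rows, the index names and the extra index on results.user_id are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (user_id INTEGER, result INTEGER);
    CREATE TABLE user_signups (user_id INTEGER, team_id INTEGER, event_id INTEGER);

    -- Covering index: filter column first, then the join and group columns.
    CREATE INDEX idx_signups_event_user_team
        ON user_signups (event_id, user_id, team_id);
    -- Assumed index; in MySQL/InnoDB the foreign key would already provide one.
    CREATE INDEX idx_results_user ON results (user_id);

    -- Hypothetical sample data.
    INSERT INTO results VALUES (1, 10), (1, 5), (2, 7), (3, 4);
    INSERT INTO user_signups VALUES
        (1, 100, 1), (2, 100, 1), (3, 200, 1), (3, 300, 2);
""")

# Restrict user_signups to one event first, then join and sum per team.
team_totals = conn.execute("""
    SELECT s.team_id, SUM(r.result)
    FROM results AS r
    INNER JOIN (
        SELECT event_id, user_id, team_id
        FROM user_signups
        WHERE event_id = 1
    ) AS s ON s.user_id = r.user_id
    GROUP BY s.team_id
    ORDER BY s.team_id
""").fetchall()

print(team_totals)
```

On a real server, comparing the EXPLAIN output before and after adding the index is still the way to see whether the optimizer actually uses it.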

Related

How to join a table with specific conditions

I would like to select data from a table like this (the table name is conversations_users):
I would like to retrieve the ID of a conversation that includes only two given users. For instance, if I search for a conversation specific to users 1 and 3, conversation 6 should be the only result, because conversation 5 also includes user 2.
I have tried to perform a request like
SELECT * FROM conversations_users AS table1 JOIN
conversations_users AS table2 ON
table1.conversation_ID = table2.conversationID
WHERE table1.userID = 3 AND
table2.userID = 1
But it returns both conversations 5 and 6. How can I fix that?
Thank you in advance,
Pierre
Add the ON clause:
SELECT * FROM conversations_users AS table1 JOIN
conversations_users AS table2
ON table1.conversation_ID = table2.conversation_ID
WHERE table1.userID = 3 AND
table2.userID = 1
Update:
To get only conversations where only users 1 and 3 are involved, you can group by conversation and use a HAVING clause. The self-join above yields a single row per matching conversation, so counting its rows cannot tell conversation 5 from conversation 6. Counting the participants of each conversation can:
SELECT conversation_ID
FROM conversations_users
GROUP BY conversation_ID
HAVING COUNT(DISTINCT userID) = 2
AND SUM(userID NOT IN (1, 3)) = 0
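As a rough check of the HAVING idea, here is a small sketch run through Python's sqlite3 module (SQLite standing in for MySQL). The sample rows are assumptions reconstructed from the question's description: conversation 5 has users 1, 2 and 3, conversation 6 has users 1 and 3, and a hypothetical conversation 7 has only user 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversations_users (conversation_ID INTEGER, userID INTEGER);
    -- Hypothetical sample data based on the question's description.
    INSERT INTO conversations_users VALUES
        (5, 1), (5, 2), (5, 3),
        (6, 1), (6, 3),
        (7, 1);
""")

# Keep conversations with exactly two distinct participants,
# none of whom falls outside the set {1, 3}.
exact_match = conn.execute("""
    SELECT conversation_ID
    FROM conversations_users
    GROUP BY conversation_ID
    HAVING COUNT(DISTINCT userID) = 2
       AND SUM(userID NOT IN (1, 3)) = 0
""").fetchall()

print(exact_match)
```

The two conditions together pin the participant set: two distinct users, and nobody who is not user 1 or user 3.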
The query you need looks like:
SELECT conversation_ID, GROUP_CONCAT(DISTINCT userID ORDER BY userID) as users
FROM conversations_users
GROUP BY conversation_ID
HAVING users = '1,3'
The GROUP BY clause groups the rows having the same conversation_ID and from each group it generates a new record that contains the conversation_ID and the distinct values of userID, in ascending order, concatenated with comma (,).
The HAVING clause keeps only those records that have '1,3' in the column users computed by the GROUP BY clause.
The query produces the output you need but it is not efficient because it reads the entire table. It could be more efficient by picking first the conversations of users 1 and 3 and then applying the above only to them.
It looks like this:
SELECT conversation_ID, GROUP_CONCAT(DISTINCT userID ORDER BY userID) as users
FROM conversations_users
WHERE conversation_ID IN (
SELECT conversation_ID
FROM conversations_users
WHERE userID IN (1, 3)
)
GROUP BY conversation_ID
HAVING users = '1,3'
For this to run faster than the previous query, the conversations_users table must have an index on the userID column.
If you want to restrict the result to conversations that involve exactly n users, I think the generic query below should work. Replace the 2 in the HAVING clause with the value of n you need.
select *
from conversations_users
where conversation_id IN (select conversation_id
from conversations_users
group by conversation_id
having count(userid) = 2)
Thanks,
Amitabh
The inner select grabs all conversationIDs that include users other than 1 or 3.
The outer select (with DISTINCT) collects all conversations which are NOT in this subset.
SELECT DISTINCT conversationID
FROM conversations_users t1
WHERE conversationID NOT IN ( SELECT conversationID
FROM conversations_users
WHERE userID NOT in (1, 3)
)
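One thing worth checking with the NOT IN approach is what it returns for a conversation that contains only one of the two users. The sketch below (Python's sqlite3 as a stand-in for MySQL, with hypothetical sample rows, including an assumed conversation 7 that holds only user 1) runs the query as written and then a stricter variant that also requires two distinct participants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversations_users (conversationID INTEGER, userID INTEGER);
    -- Hypothetical sample data; conversation 7 is an assumed edge case.
    INSERT INTO conversations_users VALUES
        (5, 1), (5, 2), (5, 3),
        (6, 1), (6, 3),
        (7, 1);
""")

# As written: conversations with no participant outside {1, 3}.
no_outsiders = conn.execute("""
    SELECT DISTINCT conversationID
    FROM conversations_users
    WHERE conversationID NOT IN (SELECT conversationID
                                 FROM conversations_users
                                 WHERE userID NOT IN (1, 3))
    ORDER BY conversationID
""").fetchall()

# Stricter variant: additionally require both users to be present.
both_users_only = conn.execute("""
    SELECT conversationID
    FROM conversations_users
    WHERE conversationID NOT IN (SELECT conversationID
                                 FROM conversations_users
                                 WHERE userID NOT IN (1, 3))
    GROUP BY conversationID
    HAVING COUNT(DISTINCT userID) = 2
    ORDER BY conversationID
""").fetchall()

print(no_outsiders)
print(both_users_only)
```

If one-user conversations cannot exist in your data, the original query is enough; otherwise the extra HAVING condition filters them out.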
You can use join with where condition in this case.
SELECT table1.conversation_ID FROM conversations_users AS table1 JOIN
conversations_users AS table2
ON table1.conversation_ID = table2.conversation_ID
WHERE table1.userID = 3 AND
table2.userID = 1
GROUP BY table1.conversation_ID
You can apply a suitable condition in the WHERE clause instead of the GROUP BY.

MYSQL AVG Score Of First Attempt Of 3 Different Tests - IN GROUP BY

So I have 2 tables
Users: id, name
Results: id, test_id, user_id, score
I need to get the average of the user's first attempt over all three test_id's.
The user may not have completed all 3 tests.
The query I have here does work but is extremely slow, is there a way of speeding this up?
SELECT AVG(score)
FROM results
WHERE id IN(SELECT MIN(id)
FROM results
WHERE complete = 1
GROUP
BY test_id)
This approach uses a subquery to get the minimum id for each user/test combination. It then joins back to results to get the score and uses that for the aggregation:
SELECT u.*, AVG(r.score)
FROM users u LEFT JOIN
(SELECT user_id, test_id, MIN(id) as minid
FROM results r
WHERE complete = 1
GROUP BY user_id, test_id
) ut
ON ut.user_id = u.id LEFT JOIN
results r
ON r.id = ut.minid
GROUP BY u.id;
This produces the average for each user (which is how I interpret the question). If you want the average over all users of the first of each test, then remove the group by and user table from the query.
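Here is a small sketch of the per-user version, run through Python's sqlite3 module as a stand-in for MySQL. The sample rows and the complete column's values are assumptions for illustration. A user with no completed attempt gets NULL (None) as the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE results (
        id INTEGER PRIMARY KEY, test_id INTEGER, user_id INTEGER,
        score INTEGER, complete INTEGER
    );
    -- Hypothetical sample data.
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO results VALUES
        (1, 1, 1, 50, 1),  -- ann, test 1, first attempt
        (2, 1, 1, 90, 1),  -- ann, test 1, second attempt (ignored)
        (3, 2, 1, 70, 1),  -- ann, test 2, first attempt
        (4, 1, 2, 40, 0),  -- bob, test 1, not completed (ignored)
        (5, 1, 2, 60, 1);  -- bob, test 1, first completed attempt
""")

# The derived table picks the lowest result id per (user, test);
# joining back to results fetches the score of that first attempt.
first_attempt_avg = conn.execute("""
    SELECT u.id, u.name, AVG(r.score)
    FROM users AS u
    LEFT JOIN (
        SELECT user_id, test_id, MIN(id) AS minid
        FROM results
        WHERE complete = 1
        GROUP BY user_id, test_id
    ) AS ut ON ut.user_id = u.id
    LEFT JOIN results AS r ON r.id = ut.minid
    GROUP BY u.id, u.name
    ORDER BY u.id
""").fetchall()

print(first_attempt_avg)
```

This relies on the assumption that a lower id means an earlier attempt, which is how the answer above interprets "first".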
Try joining against a subquery instead of using the IN subquery in the WHERE clause:
SELECT
r1.id,
r2.test_id,
r2.AvgScore
FROM results AS r1
INNER JOIN
(
SELECT test_id, MIN(id) AS MinId, AVG(score) AS AvgScore
FROM results
WHERE complete = 1
GROUP BY test_id
) AS r2 ON r1.id = r2.MinId AND r1.test_id = r2.test_id;

Mysql most outer table visibility in nested loops

Let's assume I have 4 tables:
'users' (id, username),
'photos' (id, user_id, name),
'photos_comments' (id, photo_id, user_id, text),
'photos_likes' (id, photo_id, user_id, test).
I want to calculate the sum of all comments and likes for every user across all of their uploaded photos. For that I'm trying to build a query:
SELECT users.*,
(SELECT SUM(count) as rating FROM(
SELECT COUNT(*) as count FROM photos_likes
WHERE photos_likes.photo_id IN (SELECT photos.id FROM photos WHERE photos.user_id = users.id)
UNION
SELECT COUNT(*) as count FROM photos_comments
WHERE photos_comments.photo_id IN (SELECT photos.id FROM photos WHERE photos.user_id = users.id)
) as total_rating) as rating FROM users
It returns an 'Unknown users.id column in WHERE clause' error. So it looks like it can't see the users table in the innermost query. I can't understand why this happens, because another similar query works fine:
SELECT users.*,
(SELECT COUNT(*) as count FROM photos_likes
WHERE photos_likes.photo_id IN (SELECT photos.id FROM photos WHERE photos.user_id = users.id)) as likes_count,
(SELECT COUNT(*) as count FROM photos_comments
WHERE photos_comments.photo_id IN (SELECT photos.id FROM photos WHERE photos.user_id = users.id)) as comments_count FROM users
In this query it can read the id from the users table row in the innermost query. Why does it work like that? Thanks for the help.
Look into Subqueries in the FROM Clause:
Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.
In your second example, you use the subquery in a where clause. That's the difference.
See also Correlated Subqueries.
select
photos.user_id,
photos.id,
count(distinct photos_comments.id),
count(distinct photos_likes.id),
count(distinct photos_comments.id) + count(distinct photos_likes.id) as total
from
photos
left join photos_comments on photos.id=photos_comments.photo_id
left join photos_likes on photos.id=photos_likes.photo_id
group by photos.user_id, photos.id
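The join-based query avoids the correlated derived table entirely. Below is a minimal sketch of the same idea rolled up to one row per user, run through Python's sqlite3 module as a stand-in for MySQL. The sample rows are assumptions, and the tables carry only the columns the query needs. COUNT(DISTINCT ...) matters here because joining both child tables multiplies rows (two comments times three likes gives six joined rows for one photo).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
    CREATE TABLE photos_comments (
        id INTEGER PRIMARY KEY, photo_id INTEGER, user_id INTEGER, text TEXT
    );
    CREATE TABLE photos_likes (
        id INTEGER PRIMARY KEY, photo_id INTEGER, user_id INTEGER
    );
    -- Hypothetical sample data.
    INSERT INTO photos VALUES (1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c');
    INSERT INTO photos_comments VALUES (1, 1, 2, 'x'), (2, 1, 2, 'y'), (3, 2, 2, 'z');
    INSERT INTO photos_likes VALUES (1, 1, 2), (2, 1, 3), (3, 1, 4);
""")

# One row per photo owner: distinct comments plus distinct likes
# across all of that user's photos.
ratings = conn.execute("""
    SELECT p.user_id,
           COUNT(DISTINCT c.id) + COUNT(DISTINCT l.id) AS rating
    FROM photos AS p
    LEFT JOIN photos_comments AS c ON c.photo_id = p.id
    LEFT JOIN photos_likes AS l ON l.photo_id = p.id
    GROUP BY p.user_id
    ORDER BY p.user_id
""").fetchall()

print(ratings)
```

Users who have uploaded no photos do not appear in this result; starting the FROM clause at users with a LEFT JOIN to photos would include them with a rating of 0.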

mysql combine two selects?

The first select is
select user_id, count(*) as count
from users
where referrer IS NOT NULL
group by referrer
order by count DESC
Then based off the records returned by that query I need to get the date for the user who referred the users in the above query.
select user_id from users where token = IDS_FROM_LAST_QUERY
I know I could use a sub query and say where IN (subquery) but I'm getting tripped up trying to keep the count from the subquery.
So in the end I need the following info
user_id, count
select o.user_id user_id, count(*) count
from users o
join users i on o.token = i.referrer
where i.referrer is not null
group by o.user_id
order by count desc
I would use a CTE (common table expression). A CTE is handy when you want to compute one population and then query the same, or a slightly different, population against it.
WITH Referrer (user_id, count) AS
(
select user_id, count(*) as count
from users
where referrer IS NOT NULL
group by referrer
order by count DESC
)
select
users.user_id
,Referrer.count
from users
inner join Referrer on Referrer.user_id = users.user_id
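A runnable sketch of the CTE idea follows, using Python's sqlite3 module as a stand-in (MySQL itself supports WITH from version 8.0). It rests on one loud assumption that the question only implies: the referrer column stores the token of the user who made the referral. The CTE name referral_counts and the sample rows are also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, token TEXT, referrer TEXT);
    -- Hypothetical data; referrer is ASSUMED to hold the referring user's token.
    INSERT INTO users VALUES
        (1, 'tokA', NULL),
        (2, 'tokB', 'tokA'),
        (3, 'tokC', 'tokA'),
        (4, 'tokD', 'tokB');
""")

# The CTE counts referred users per referrer token; the outer query
# maps each token back to the user who owns it and keeps the count.
referrers = conn.execute("""
    WITH referral_counts (referrer, cnt) AS (
        SELECT referrer, COUNT(*)
        FROM users
        WHERE referrer IS NOT NULL
        GROUP BY referrer
    )
    SELECT u.user_id, rc.cnt
    FROM users AS u
    INNER JOIN referral_counts AS rc ON rc.referrer = u.token
    ORDER BY rc.cnt DESC, u.user_id
""").fetchall()

print(referrers)
```

The ORDER BY sits in the outer query because ordering inside a CTE is not guaranteed to carry through to the final result.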

MySQL query help on joins and counting most occurance

I have two tables, client_table and order_table
Entries in the client_table are unique with id as primary key
In the order_table, order_id is the primary key and there is also a client_id field. The client_id refers to the primary key of the client_table.
There can be multiple entries in the order_table that contain the same client_id. So it is a one-to-many relationship.
I'm trying to come up with a query that returns the most frequently occurring client_id in the order_table.
I have tried the following queries.
SELECT a.id FROM `client_table` as a inner join order_table as b GROUP BY a.id ORDER BY count(b.client_id)
So I'm looking for the client_id which has the most orders. I suppose I only need the order_table and don't need the client_table at all, right?
SELECT b.client_id, count(*) AS Clients
FROM order_table AS b
GROUP BY b.client_id
ORDER BY Clients DESC
There is no need for a join if you only want the client's id, as you said. Also, if you do want to join, as in your example query, you need to specify the join column with an ON clause.
... inner join order_table as b on a.id=b.client_id ...
Yes, you don't need the client table at all:
select
client_id,
count(order_id) order_count
from
order_table
group by
client_id
order by
count(order_id) DESC
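As a quick check of the grouping approach, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL, with hypothetical sample rows. The LIMIT 1 is an addition to return just the top client; on a tie it keeps only one of the tied clients.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_table (order_id INTEGER PRIMARY KEY, client_id INTEGER);
    -- Hypothetical sample data: client 10 has three orders.
    INSERT INTO order_table VALUES
        (1, 10), (2, 20), (3, 10), (4, 30), (5, 10), (6, 20);
""")

# Count orders per client, sort by that count, keep the top row.
top_client = conn.execute("""
    SELECT client_id, COUNT(order_id) AS order_count
    FROM order_table
    GROUP BY client_id
    ORDER BY order_count DESC
    LIMIT 1
""").fetchall()

print(top_client)
```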
You can try:
SELECT a.id FROM `client_table` as a left join order_table as b ON a.id = b.client_id GROUP BY a.id ORDER BY count(b.client_id) desc
Use DESC in the ORDER BY to get the most occurrences. Right now you only get the lowest occurrences. Also try changing INNER JOIN to LEFT JOIN.