Can someone help me optimize this MySQL statement?

I have one table that I'm using to build groups in my database. The table contains a list of group names and ids. I have another table that has users, and a third table showing the relationships (userid, groupid).
The situation is this: I need to create a list of userids that belong to a specific subset of groups. So, for instance, I want all users that are in group 1, 3, and 8. That is straightforward enough. It gets more complicated, though: I may need a list of all users that are in groups 1, 3, and 8, or 1, 2, and 8. Then I might need to exclude users that fit those criteria but are also in group 27.
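The criteria described above (a user must be in at least one group of every OR set, and in none of the excluded groups) can be pinned down with a small pure-Python reference model before any SQL is written. The function name `matches` and the sample memberships are hypothetical, not from the question; "1, 3, and 8, or 1, 2, and 8, but not 27" is encoded as the AND of the OR sets `[1]`, `[2, 3]` and `[8]`.

```python
def matches(user_groups, and_of_ors, excluded):
    """Hypothetical helper: True if the user belongs to at least one group
    in every OR set and to none of the excluded groups."""
    groups = set(user_groups)
    in_every_set = all(groups & set(or_set) for or_set in and_of_ors)
    in_excluded = bool(groups & set(excluded))
    return in_every_set and not in_excluded


# Hypothetical sample data: user id -> groups the user belongs to
memberships = {
    10: {1, 3, 8},       # matches via 1, 3 and 8
    11: {1, 2, 8, 27},   # would match, but is in excluded group 27
    12: {1, 8},          # in neither 2 nor 3
    13: {1, 2, 8},       # matches via 1, 2 and 8
}

# (1 AND 3 AND 8) OR (1 AND 2 AND 8) is the same as 1 AND (2 OR 3) AND 8
criteria = [[1], [2, 3], [8]]
selected = sorted(u for u, g in memberships.items() if matches(g, criteria, [27]))
print(selected)  # [10, 13]
```

A model like this can serve as an oracle when testing whatever SQL the script generates.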
So I've got a script dynamically creating a query, using subqueries, that works to a point. I have two problems with it. I don't think I'm handling the NOT IN part properly, because as I add criteria, it eventually just hangs. (I think this is a result of me using sub-selects instead of joins, but I could not figure out how to build this with joins.)
Here is an example of a query with 4 ANDed OR groups, and 2 NOT clauses.
Please let me know if there is a better way to optimize this statement. (I can handle building it dynamically in PHP.)
If I need to clarify anything or provide more details, let me know.
select * from users_table where username IN
(
select user_id from
(
select distinct user_id from group_user_map where user_id in
(
select user_id from
(
select * from
(
select count(*) as counter, user_id from
(
(
select distinct(user_id) from group_user_map where group_id in (2601,119)
)
union all
(
select distinct(user_id) from group_user_map where group_id in (58,226)
)
union all
(
select distinct(user_id) from group_user_map where group_id in (1299,525)
)
union all
(
select distinct(user_id) from group_user_map where group_id in (2524,128)
)
)
thegroups group by user_id
)
getall where counter = 4
)
getuserids
)
and user_id not in
(
select user_id from group_user_map where group_id in (2572)
)
)
biggergroup
);
Note: the first part of the query compares an id to a username. This is because I have the usernames stored as ids from the other table. (This whole thing is a link between two completely different databases.)
(Also, if it looks like I have any extra subqueries, that was to try to force MySQL to evaluate the inner queries first.)
Thanks.
Aaron.

Avoiding the subselects used for the IN clauses:
SELECT *
FROM users_table
INNER JOIN
(
SELECT Sub1.user_id
FROM (
SELECT COUNT(*) AS counter, user_id
FROM (
SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)
UNION ALL
SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)
UNION ALL
SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)
UNION ALL
SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)
) thegroups
GROUP BY user_id
HAVING counter = 4
) Sub1
LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2
ON Sub1.user_id = Sub2.user_id
WHERE Sub2.user_id IS NULL
) Sub3
ON users_table.username = Sub3.user_id
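To check the COUNT/HAVING approach above on a toy dataset, here is a minimal sketch that runs it through Python's built-in sqlite3 module. SQLite stands in for MySQL (an assumption: the syntax used here is common to both), the sample rows are invented, and the LEFT JOIN condition uses Sub1.user_id, the column actually in scope at that point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_table (id INTEGER PRIMARY KEY, username INTEGER);
CREATE TABLE group_user_map (user_id INTEGER, group_id INTEGER);
INSERT INTO users_table (username) VALUES (1), (2), (3), (4);
INSERT INTO group_user_map VALUES
  (1, 2601), (1, 58), (1, 1299), (1, 2524),          -- in all four sets
  (2, 119), (2, 226), (2, 525), (2, 128), (2, 2572), -- all four, but excluded
  (3, 2601), (3, 119), (3, 58), (3, 1299),           -- only three sets
  (4, 2601), (4, 226), (4, 1299), (4, 128);          -- in all four sets
""")

QUERY = """
SELECT users_table.username
FROM users_table
INNER JOIN (
  SELECT Sub1.user_id
  FROM (
    SELECT COUNT(*) AS counter, user_id
    FROM (
      SELECT DISTINCT user_id FROM group_user_map WHERE group_id IN (2601, 119)
      UNION ALL
      SELECT DISTINCT user_id FROM group_user_map WHERE group_id IN (58, 226)
      UNION ALL
      SELECT DISTINCT user_id FROM group_user_map WHERE group_id IN (1299, 525)
      UNION ALL
      SELECT DISTINCT user_id FROM group_user_map WHERE group_id IN (2524, 128)
    ) thegroups
    GROUP BY user_id
    HAVING counter = 4          -- present in all four sets
  ) Sub1
  LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2
    ON Sub1.user_id = Sub2.user_id
  WHERE Sub2.user_id IS NULL    -- anti-join: drop anyone in 2572
) Sub3 ON users_table.username = Sub3.user_id
ORDER BY users_table.username
"""
matched = [row[0] for row in conn.execute(QUERY)]
print(matched)  # [1, 4]
```

The DISTINCT in each branch matters: user 3 is in both 2601 and 119, and without it would be counted twice and wrongly reach 4.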
Or, instead of using COUNT to check that the user id exists in all 4 derived tables, use inner joins:
SELECT *
FROM users_table
INNER JOIN
(
SELECT Sub1.user_id
FROM (
SELECT z.user_id
FROM (
SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)) z
INNER JOIN
(SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)) y ON z.user_id = y.user_id
INNER JOIN
(SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)) x ON z.user_id = x.user_id
INNER JOIN
(SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)) w ON z.user_id = w.user_id
) Sub1
LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2
ON Sub1.user_id = Sub2.user_id
WHERE Sub2.user_id IS NULL
) Sub3
ON users_table.username = Sub3.user_id
Cleaning up that second query a bit:
SELECT *
FROM users_table
INNER JOIN
(
SELECT z.user_id
FROM (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)) z
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)) y
ON z.user_id = y.user_id
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)) x
ON z.user_id = x.user_id
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)) w
ON z.user_id = w.user_id
LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2
ON z.user_id = Sub2.user_id
WHERE Sub2.user_id IS NULL
) Sub3
ON users_table.username = Sub3.user_id
Using the SQL from your comment below, it can be cleaned up to:
select SQL_NO_CACHE id
from users_table
INNER JOIN ( SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (0, 67) ) ij1
ON users_table.username = ij1.user_id
LEFT OUTER JOIN ( SELECT user_id FROM group_user_map WHERE group_id IN (0) ) Sub2
ON users_table.username = Sub2.user_id
WHERE Sub2.user_id IS NULL
Cleaning up my SQL in the same way:
SELECT users_table.*
FROM users_table
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2601,119)) z ON users_table.username = z.user_id
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (58,226)) y ON users_table.username = y.user_id
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (1299,525)) x ON users_table.username = x.user_id
INNER JOIN (SELECT distinct(user_id) FROM group_user_map WHERE group_id IN (2524,128)) w ON users_table.username = w.user_id
LEFT OUTER JOIN (SELECT user_id FROM group_user_map WHERE group_id IN (2572)) Sub2 ON users_table.username = Sub2.user_id
WHERE Sub2.user_id IS NULL
Removing the subselects and doing the joins directly (this might help or hinder; I suspect it will depend on how many duplicate user_id rows there are for each set of group_id values):
SELECT DISTINCT users_table.*
FROM users_table
INNER JOIN group_user_map z ON users_table.username = z.user_id AND z.group_id IN (2601,119)
INNER JOIN group_user_map y ON users_table.username = y.user_id AND y.group_id IN (58,226)
INNER JOIN group_user_map x ON users_table.username = x.user_id AND x.group_id IN (1299,525)
INNER JOIN group_user_map w ON users_table.username = w.user_id AND w.group_id IN (2524,128)
LEFT OUTER JOIN group_user_map Sub2 ON users_table.username = Sub2.user_id AND Sub2.group_id IN (2572)
WHERE Sub2.user_id IS NULL
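Since the question mentions generating the SQL dynamically (in PHP), here is a sketch of generating this last, direct-join form from a list of OR sets plus a list of excluded groups. It is written in Python against SQLite rather than PHP/MySQL, `build_group_query` is a hypothetical helper, and the group ids are bound as `?` parameters rather than pasted into the SQL string.

```python
import sqlite3


def build_group_query(and_sets, excluded):
    """Hypothetical helper: one INNER JOIN per OR set and one LEFT JOIN for
    the excluded groups. Returns (sql, params) with ? placeholders."""
    joins, params = [], []
    for i, or_set in enumerate(and_sets):
        alias = f"g{i}"
        marks = ", ".join("?" * len(or_set))  # e.g. "?, ?"
        joins.append(
            f"INNER JOIN group_user_map {alias} "
            f"ON users_table.username = {alias}.user_id "
            f"AND {alias}.group_id IN ({marks})"
        )
        params.extend(or_set)
    where = ""
    if excluded:
        marks = ", ".join("?" * len(excluded))
        joins.append(
            "LEFT OUTER JOIN group_user_map ex "
            "ON users_table.username = ex.user_id "
            f"AND ex.group_id IN ({marks})"
        )
        params.extend(excluded)
        where = "WHERE ex.user_id IS NULL"
    # DISTINCT removes duplicates caused by a user matching several
    # groups of the same OR set (e.g. both 2601 and 119).
    sql = " ".join(["SELECT DISTINCT users_table.username FROM users_table", *joins, where])
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_table (id INTEGER PRIMARY KEY, username INTEGER);
CREATE TABLE group_user_map (user_id INTEGER, group_id INTEGER);
INSERT INTO users_table (username) VALUES (1), (2), (3);
INSERT INTO group_user_map VALUES
  (1, 2601), (1, 119), (1, 58),   -- matches both sets (set 1 twice)
  (2, 119), (2, 226), (2, 2572),  -- matches, but excluded
  (3, 2601);                      -- only the first set
""")

sql, params = build_group_query([[2601, 119], [58, 226]], [2572])
result = sorted(row[0] for row in conn.execute(sql, params))
print(result)  # [1]

# Without DISTINCT, user 1 comes back once per matching group in set 1.
raw = conn.execute(sql.replace("SELECT DISTINCT", "SELECT", 1), params).fetchall()
print(raw)  # [(1,), (1,)]
```

Binding the ids as parameters keeps the generated statement safe to build from user input, and the number of joins grows linearly with the number of OR sets.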

It would be easier to understand your problem if you post the table structure and some sample data. But here are a few suggestions based on your current query that you might be able to use.
These queries reduce the number of subqueries that you are using. One obvious change is the way they get the list of user_ids for each set of groups: instead of using DISTINCT, each branch is tagged with a set number:
select user_id, 1 as set_no
from group_user_map
where group_id in (2601,119)
union all
select user_id, 2
from group_user_map
where group_id in (58,226)
union all
select user_id, 3
from group_user_map
where group_id in (1299,525)
union all
select user_id, 4
from group_user_map
where group_id in (2524,128);
This uses a UNION ALL, which will list every user_id even if it is duplicated (a user in both 2601 and 119 appears twice with set_no 1). Once you have this list, group by user_id and use a HAVING clause with count(distinct set_no) = 4 to find the users that appear in all four sets.
First, you could consolidate your current query into the following version, which filters in a WHERE clause:
select *
from users_table
where username IN (select user_id
from
(
select user_id, 1 as set_no
from group_user_map
where group_id in (2601,119)
union all
select user_id, 2
from group_user_map
where group_id in (58,226)
union all
select user_id, 3
from group_user_map
where group_id in (1299,525)
union all
select user_id, 4
from group_user_map
where group_id in (2524,128)
) thegroups
where user_id not in (select user_id
from group_user_map
where group_id in (2572))
group by user_id
having count(distinct set_no) = 4);
Or you could use the query in the WHERE clause in a subquery that you JOIN to:
select ut.*
from users_table ut
inner join
(
select user_id
from
(
select user_id, 1 as set_no
from group_user_map
where group_id in (2601,119)
union all
select user_id, 2
from group_user_map
where group_id in (58,226)
union all
select user_id, 3
from group_user_map
where group_id in (1299,525)
union all
select user_id, 4
from group_user_map
where group_id in (2524,128)
) thegroups
where user_id not in (select user_id
from group_user_map
where group_id in (2572))
group by user_id
having count(distinct set_no) = 4
) biggergroup
on ut.username = biggergroup.user_id;
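Because the branches of a UNION ALL are not de-duplicated, the HAVING clause has to count distinct sets rather than rows or user_ids. One way to do that is to tag each branch with a set number (`set_no` below, a name chosen for this sketch). The sketch runs both HAVING variants in SQLite via Python's sqlite3 module as a stand-in for MySQL, with invented rows: user 2 is in two groups of the first set and in only three of the four sets, so a plain COUNT(*) = 4 wrongly accepts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_user_map (user_id INTEGER, group_id INTEGER);
INSERT INTO group_user_map VALUES
  (1, 2601), (1, 58), (1, 1299), (1, 2524),           -- all four sets
  (2, 2601), (2, 119), (2, 58), (2, 1299),            -- three sets, four rows
  (3, 119), (3, 226), (3, 525), (3, 128), (3, 2572);  -- all four, excluded
""")

TAGGED = """
SELECT user_id FROM (
  SELECT user_id, 1 AS set_no FROM group_user_map WHERE group_id IN (2601, 119)
  UNION ALL
  SELECT user_id, 2 FROM group_user_map WHERE group_id IN (58, 226)
  UNION ALL
  SELECT user_id, 3 FROM group_user_map WHERE group_id IN (1299, 525)
  UNION ALL
  SELECT user_id, 4 FROM group_user_map WHERE group_id IN (2524, 128)
) thegroups
WHERE user_id NOT IN (SELECT user_id FROM group_user_map WHERE group_id IN (2572))
GROUP BY user_id
HAVING {having}
ORDER BY user_id
"""


def run(having):
    """Run the tagged query with the given HAVING condition."""
    return [r[0] for r in conn.execute(TAGGED.format(having=having))]


print(run("COUNT(DISTINCT set_no) = 4"))  # [1]
print(run("COUNT(*) = 4"))                # [1, 2]  (user 2 is a false positive)
```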

It's not exactly clear what you mean when you say "I want all users that are in group 1, 3, and 8" and then write
select distinct(user_id) from group_user_map where group_id in (58,226)
because the English suggests you want a user who is in all three groups, but the SQL gives you users who are in any one of the groups. So you need to be clearer about what exactly you want.
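The difference is easy to see on a toy table. IN (...) returns users in any of the listed groups; to get users in all of them, one common pattern (relational division) groups by user and requires the number of distinct matching groups to equal the number of groups listed. This sketch runs in SQLite via Python's sqlite3 module as a stand-in for MySQL; the group ids 1, 3 and 8 come from the question and the memberships are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_user_map (user_id INTEGER, group_id INTEGER);
INSERT INTO group_user_map VALUES
  (10, 1), (10, 3), (10, 8),   -- in all three groups
  (11, 1),                     -- in one of them
  (12, 3), (12, 8);            -- in two of them
""")

# "any of": a plain IN list
any_of = [r[0] for r in conn.execute(
    "SELECT DISTINCT user_id FROM group_user_map "
    "WHERE group_id IN (1, 3, 8) ORDER BY user_id")]

# "all of": count the distinct matching groups per user
all_of = [r[0] for r in conn.execute(
    "SELECT user_id FROM group_user_map WHERE group_id IN (1, 3, 8) "
    "GROUP BY user_id HAVING COUNT(DISTINCT group_id) = 3 ORDER BY user_id")]

print(any_of)  # [10, 11, 12]
print(all_of)  # [10]
```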
It's a little hard to believe that you are trying to find users that are in all four supergroups, with each supergroup made up of exactly two groups. It makes me question what you are doing and why.
There are a few different approaches I can think of, depending on what you are really going to run into. Obviously the simplest is to break it into multiple queries and combine the results in your code. You can self-join the group table if it's not too big, but it is probably too big to join three times. You might get better performance with NOT EXISTS than with NOT IN, but probably not. You can try to further leverage aggregate functions with CASE expressions to compute success values in an intermediate table, but that's getting pretty crazy. More likely you'd be better off reworking your data structure.
The main problem I see with your existing solution is the large number of temporary tables you create. In general you are going to need a temporary table of some kind to do something this complicated so I would focus on limiting it to two tables, each of which is smaller than the relationships table.

Is this the right query?
select * from users_table where username IN
(
select a.user_id from
(select distinct(user_id) from group_user_map where group_id in (2601,119)) a
inner join
(select distinct(user_id) from group_user_map where group_id in (58,226)) b
on a.user_id = b.user_id inner join
(select distinct(user_id) from group_user_map where group_id in (1299,525)) c
on a.user_id = c.user_id inner join
(select distinct(user_id) from group_user_map where group_id in (2524,128)) d
on a.user_id = d.user_id
) and username not in (select user_id from group_user_map where group_id in (2572))
Instead of UNION ALL followed by a HAVING filter on a counter of 4, I replaced it with an intersection (inner joins). Kindly check whether the result is correct and whether it runs fast.
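On servers that support set operators directly, the same intersection can be written without joins or counters: MySQL added INTERSECT and EXCEPT in 8.0.31, so check your server version before relying on this. The sketch runs the idea in SQLite (which also supports both operators) through Python's sqlite3 module, with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_table (id INTEGER PRIMARY KEY, username INTEGER);
CREATE TABLE group_user_map (user_id INTEGER, group_id INTEGER);
INSERT INTO users_table (username) VALUES (1), (2), (3), (4);
INSERT INTO group_user_map VALUES
  (1, 2601), (1, 58), (1, 1299), (1, 2524),
  (2, 119), (2, 226), (2, 525), (2, 128), (2, 2572),
  (3, 2601), (3, 119), (3, 58), (3, 1299),
  (4, 2601), (4, 226), (4, 1299), (4, 128);
""")

QUERY = """
SELECT username FROM users_table
WHERE username IN (
  SELECT user_id FROM group_user_map WHERE group_id IN (2601, 119)
  INTERSECT
  SELECT user_id FROM group_user_map WHERE group_id IN (58, 226)
  INTERSECT
  SELECT user_id FROM group_user_map WHERE group_id IN (1299, 525)
  INTERSECT
  SELECT user_id FROM group_user_map WHERE group_id IN (2524, 128)
  EXCEPT
  SELECT user_id FROM group_user_map WHERE group_id IN (2572)
)
ORDER BY username
"""
matched = [r[0] for r in conn.execute(QUERY)]
print(matched)  # [1, 4]
```

Because the EXCEPT comes last, the result is the same whether INTERSECT binds tighter than EXCEPT (as in standard SQL) or the operators are applied left to right (as in SQLite).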
Vinit

Related

MySQL, get sum of two queries

I have three different product tables with different columns and structure; assume
Product1, Product2, Product3
So, I'm trying to get the sum of count(*) over the three tables for the same user_id, i.e. for each user_id.
Table - Product1
select P.user_id, count(*)
from Product1 P
group by P.user_id
Table - Product2
select P.user_id, count(*)
from Product2 P
group by P.user_id
Table - Product3
select P.user_id, count(*)
from Product3 P
group by P.user_id
They give me the user_id field and count(*).
Can I add up the count(*) results for each user_id? Thanks in advance.
Having three tables with the same structure is usually a sign of poor database design. You should figure out ways to combine the tables into a single table.
In any case, you can aggregate the results. One way is:
select user_id, sum(cnt)
from ((select user_id, count(*) as cnt
from product1
group by user_id
) union all
(select user_id, count(*) as cnt
from product2
group by user_id
) union all
(select user_id, count(*) as cnt
from product3
group by user_id
)
) up
group by user_id;
You want to use union all rather than a join because MySQL does not support full outer join. Union all ensures that users from all three tables are included.
Aggregating twice (in the subqueries and the outer query) allows MySQL to use indexes for the inner aggregations. That can be a performance advantage.
Also, if you are looking for a particular user or set of users, use a where clause in the subqueries. That is more efficient (in MySQL) than bringing all the data together in subqueries and then doing the filtering.
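Two details in this answer are easy to check on a toy dataset: UNION ALL keeps users who appear in only some of the tables, and it must be UNION ALL rather than UNION, because UNION would collapse two identical (user_id, cnt) rows coming from different tables. The sketch uses SQLite via Python's sqlite3 module as a stand-in for MySQL, with single-column product tables invented for the example; SQLite does not accept parenthesized SELECTs inside a UNION, so the parentheses are dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product1 (user_id INTEGER);
CREATE TABLE product2 (user_id INTEGER);
CREATE TABLE product3 (user_id INTEGER);
INSERT INTO product1 VALUES (1), (1);   -- user 1: 2 rows
INSERT INTO product2 VALUES (1), (1);   -- user 1: 2 more rows (same count)
INSERT INTO product3 VALUES (2);        -- user 2 only appears here
""")

TEMPLATE = """
SELECT user_id, SUM(cnt) FROM (
  SELECT user_id, COUNT(*) AS cnt FROM product1 GROUP BY user_id
  {op}
  SELECT user_id, COUNT(*) AS cnt FROM product2 GROUP BY user_id
  {op}
  SELECT user_id, COUNT(*) AS cnt FROM product3 GROUP BY user_id
) up
GROUP BY user_id
ORDER BY user_id
"""

with_union_all = conn.execute(TEMPLATE.format(op="UNION ALL")).fetchall()
with_union = conn.execute(TEMPLATE.format(op="UNION")).fetchall()
print(with_union_all)  # [(1, 4), (2, 1)]
print(with_union)      # [(1, 2), (2, 1)]  (the duplicate (1, 2) row was dropped)
```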
Combine the results using UNION ALL and then do the addition.
Query
select t.`user_id`, sum(`count`) as `total` from(
select `user_id`, count(*) as `count`
from `Product1`
group by `user_id`
union all
select `user_id`, count(*)
from `Product2`
group by `user_id`
union all
select `user_id`, count(*)
from `Product3`
group by `user_id`
) t
group by t.`user_id`;
You could sum the result of a UNION ALL:
select user_id, sum(my_count)
from (
select P.user_id, count(*) my_count
from Product1 P
group by P.user_id
UNION ALL
select P.user_id, count(*)
from Product2 P
group by P.user_id
UNION ALL
select P.user_id, count(*)
from Product3 P
group by P.user_id ) t
group by user_id
Yes you can :)
SELECT SUM(userProducts) userProducts
FROM (
SELECT count(user_id) userProducts FROM Product1 WHERE user_id = your_user_id
UNION ALL
SELECT count(user_id) userProducts FROM Product2 WHERE user_id = your_user_id
UNION ALL
SELECT count(user_id) userProducts FROM Product3 WHERE user_id = your_user_id
) s
Please try the query below. I haven't run it against a database, so it may still contain a syntax error.
select a.user_id, sum(total) from (
select p.user_id, count(*) total from product1 p group by p.user_id
union all
select p.user_id, count(*) total from product2 p group by p.user_id
union all
select p.user_id, count(*) total from product3 p group by p.user_id
) a
group by a.user_id
Yes, we can aggregate results from different tables using JOIN or UNION, depending on the requirement. In your case UNION ALL works well. You can write count(1) instead of count(*), but don't expect a speedup: InnoDB handles COUNT(*) and COUNT(1) the same way, so there is no performance difference.
select user_id, sum(cnt)
from ((select user_id, count(1) as cnt
from product1
group by user_id
) union all
(select user_id, count(1) as cnt
from product2
group by user_id
) union all
(select user_id, count(1) as cnt
from product3
group by user_id
)
) a
group by user_id;

Making a select for a two select sql/oracle

I want to join two SELECTs in a single query:
Here are the two queries.
SELECT player_id, SUM(score) score
FROM (
SELECT id_p1 player_id, score_p1 score
FROM matchs
UNION ALL
SELECT id_p2, score_p2
FROM matchs
) q
GROUP BY player_id
AND
SELECT player_id, SUM(score) score
FROM (
SELECT id_p1 player_id, score_p2 score
FROM matchs
UNION ALL
SELECT id_p2, score_p1
FROM matchs
) q
GROUP BY player_id
Thank you !
Try this
SELECT table1.player_id, table1.score score1, table2.score score2,
abs(table1.score - table2.score) difference
FROM (
SELECT player_id, SUM(score) score
FROM (
SELECT id_p1 player_id, score_p1 score FROM matchs
UNION ALL
SELECT id_p2, score_p2 FROM matchs
) q GROUP BY player_id
) table1
INNER JOIN
(
SELECT player_id, SUM(score) score
FROM (
SELECT id_p1 player_id, score_p2 score FROM matchs
UNION ALL
SELECT id_p2, score_p1 FROM matchs
) q GROUP BY player_id
) table2 ON table1.player_id = table2.player_id
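As an alternative to joining two aggregated subqueries, both totals can be computed in one pass: each match contributes one row per player carrying both the points scored and the points conceded, and a single GROUP BY sums both columns. This is a different technique from the answer above, shown as a sketch in SQLite via Python's sqlite3 module with invented matches; the column names follow the question (id_p1, id_p2, score_p1, score_p2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matchs (id_p1 INTEGER, id_p2 INTEGER, score_p1 INTEGER, score_p2 INTEGER);
INSERT INTO matchs VALUES (1, 2, 3, 1), (2, 3, 2, 2), (1, 3, 0, 4);
""")

QUERY = """
SELECT player_id,
       SUM(scored)   AS score1,   -- what the first query in the question sums
       SUM(conceded) AS score2    -- what the second query in the question sums
FROM (
  SELECT id_p1 AS player_id, score_p1 AS scored, score_p2 AS conceded FROM matchs
  UNION ALL
  SELECT id_p2, score_p2, score_p1 FROM matchs
) q
GROUP BY player_id
ORDER BY player_id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 3, 5), (2, 3, 5), (3, 6, 2)]
```

This scans matchs twice instead of four times and needs no join, at the cost of a slightly less obvious shape.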
Ideally, one would perform this operation using a FULL JOIN:
SELECT COALESCE(t1.player1_id, t2.player2_id),
SUM(COALESCE(t1.score_p1,0) + COALESCE(t2.score_p2,0))
FROM table_name t1
FULL JOIN table_name t2 ON t1.player1_id = t2.player2_id
GROUP BY COALESCE(t1.player1_id, t2.player2_id)
However, sadly MySQL does not have native support for such a join operation. Instead, one can simulate it by making a UNION between a LEFT JOIN and a RIGHT JOIN, then aggregating:
SELECT p, SUM(s) FROM (
SELECT t1.player1_id p, SUM(t1.score_p1 + IFNULL(t2.score_p2,0)) s
FROM table_name t1
LEFT JOIN table_name t2 ON t1.player1_id = t2.player2_id
GROUP BY t1.player1_id
UNION
SELECT t2.player2_id, SUM(IFNULL(t1.score_p1,0) + t2.score_p2)
FROM table_name t1
RIGHT JOIN table_name t2 ON t1.player1_id = t2.player2_id
GROUP BY t2.player2_id
) t GROUP BY p

Optimize MySQL Query on Multiple NOT INs

Is there any way to optimize this sql statement?
Maybe joins or something?
SELECT id, name
FROM item
WHERE id NOT IN (
SELECT id
FROM itemlock
) AND id NOT IN (
SELECT id
FROM itemlog
)
Thanks
Use a LEFT JOIN:
SELECT a.*
FROM item a
LEFT JOIN itemlock b
ON a.ID = b.ID
LEFT JOIN itemlog c
ON a.ID = c.ID
WHERE b.ID IS NULL AND
c.ID IS NULL
You can also use UNION, like below:
SELECT id, name FROM item
WHERE id NOT IN (
SELECT id FROM itemlock UNION SELECT id FROM itemlog
)
Try this:
SELECT i.id, i.name FROM item i
LEFT JOIN (SELECT id FROM itemlock
UNION
SELECT id FROM itemlog) AS A ON i.id = A.id
WHERE A.id IS NULL;
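The NOT IN and LEFT JOIN ... IS NULL forms return the same rows only as long as itemlock.id and itemlog.id never contain NULL. If they can, NOT IN behaves differently: x NOT IN (..., NULL) is never true, so the query returns nothing, while the LEFT JOIN ... IS NULL and NOT EXISTS forms are unaffected. A small sketch in SQLite via Python's sqlite3 module (a stand-in for MySQL, with invented rows) shows this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE itemlock (id INTEGER);
CREATE TABLE itemlog (id INTEGER);
INSERT INTO item VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO itemlock VALUES (1);
INSERT INTO itemlog VALUES (2);
""")

QUERIES = {
    "not_in": """SELECT id FROM item
                 WHERE id NOT IN (SELECT id FROM itemlock)
                   AND id NOT IN (SELECT id FROM itemlog) ORDER BY id""",
    "left_join": """SELECT a.id FROM item a
                    LEFT JOIN itemlock b ON a.id = b.id
                    LEFT JOIN itemlog c ON a.id = c.id
                    WHERE b.id IS NULL AND c.id IS NULL ORDER BY a.id""",
    "not_exists": """SELECT i.id FROM item i
                     WHERE NOT EXISTS (SELECT 1 FROM itemlock l WHERE l.id = i.id)
                       AND NOT EXISTS (SELECT 1 FROM itemlog g WHERE g.id = i.id)
                     ORDER BY i.id""",
}


def run_all():
    """Run every anti-join variant and collect the matching item ids."""
    return {name: [r[0] for r in conn.execute(sql)] for name, sql in QUERIES.items()}


before = run_all()
conn.execute("INSERT INTO itemlock VALUES (NULL)")  # a NULL sneaks in
after = run_all()
print(before)  # every variant returns [3, 4]
print(after)   # not_in now returns [], the other two still return [3, 4]
```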

Way to improve this multiple UNION query?

I need to find the users who have either
a shared video credit;
a shared production credit; or
a shared group.
This is currently the query I came up with:
SELECT profile_id FROM productions_productionmember WHERE production_id in
(SELECT production_id FROM productions_productionmember WHERE profile_id=?)
UNION
SELECT profile_id FROM groups_groupmember WHERE group_id in
(SELECT group_id FROM groups_groupmember WHERE profile_id=?)
UNION
SELECT profile_id FROM videos_videocredit WHERE video_id in
(SELECT video_id FROM videos_videocredit WHERE profile_id=?)
Relevant tables:
groups_groupmember
- profile_id
- group_id
videos_videocredit
- profile_id
- video_id
productions_productionmember
- profile_id
- production_id
How can I improve on this query?
SELECT DISTINCT p.profile_id
FROM productions_productionmember p
left outer join groups_groupmember g on g.profile_id = p.profile_id
left outer join videos_videocredit v on v.profile_id = p.profile_id
WHERE v.video_id is not null or g.group_id is not null
or p.production_id in (SELECT production_id
FROM productions_productionmember
WHERE profile_id=?)
You need a list of profiles that are present in any of the three tables:
SELECT profile_id FROM productions_productionmember
UNION
SELECT profile_id FROM groups_groupmember
UNION
SELECT profile_id FROM videos_videocredit
If production_id, group_id or video_id can be NULL, then an IS NOT NULL check can be added to these queries.
What profile_id are you passing into your query? If you need to check whether a profile is present in any one of the tables, then you can put the WHERE clause on the result returned by the above query:
SELECT count(*)
from (SELECT profile_id FROM productions_productionmember
UNION
SELECT profile_id FROM groups_groupmember
UNION
SELECT profile_id FROM videos_videocredit ) subq
where subq.profile_id = ?
SELECT pp.profile_id
FROM productions_productionmember pp
JOIN productions_productionmember pp2 ON pp.production_id=pp2.production_id
AND pp2.profile_id=?
UNION
SELECT gg.profile_id
FROM groups_groupmember gg
JOIN groups_groupmember gg2 ON gg.group_id=gg2.group_id
AND gg2.profile_id=?
UNION
SELECT vv.profile_id
FROM videos_videocredit vv
JOIN videos_videocredit vv2 ON vv.video_id=vv2.video_id
AND vv2.profile_id=?
This eliminates inner queries. I can't tell you if this will increase query speed.
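To check that the join rewrite returns the same profiles as the original IN (subquery) version, here is a sketch with invented rows, run in SQLite via Python's sqlite3 module as a stand-in for MySQL. Profile 1 is the one being looked up; both forms also return profile 1 itself, because it shares its own credits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE productions_productionmember (profile_id INTEGER, production_id INTEGER);
CREATE TABLE groups_groupmember (profile_id INTEGER, group_id INTEGER);
CREATE TABLE videos_videocredit (profile_id INTEGER, video_id INTEGER);
INSERT INTO productions_productionmember VALUES (1, 100), (2, 100), (3, 101);
INSERT INTO groups_groupmember VALUES (1, 200), (4, 200), (5, 201);
INSERT INTO videos_videocredit VALUES (6, 300), (7, 300);
""")

IN_FORM = """
SELECT profile_id FROM productions_productionmember WHERE production_id IN
  (SELECT production_id FROM productions_productionmember WHERE profile_id = ?)
UNION
SELECT profile_id FROM groups_groupmember WHERE group_id IN
  (SELECT group_id FROM groups_groupmember WHERE profile_id = ?)
UNION
SELECT profile_id FROM videos_videocredit WHERE video_id IN
  (SELECT video_id FROM videos_videocredit WHERE profile_id = ?)
"""

JOIN_FORM = """
SELECT pp.profile_id FROM productions_productionmember pp
JOIN productions_productionmember pp2
  ON pp.production_id = pp2.production_id AND pp2.profile_id = ?
UNION
SELECT gg.profile_id FROM groups_groupmember gg
JOIN groups_groupmember gg2
  ON gg.group_id = gg2.group_id AND gg2.profile_id = ?
UNION
SELECT vv.profile_id FROM videos_videocredit vv
JOIN videos_videocredit vv2
  ON vv.video_id = vv2.video_id AND vv2.profile_id = ?
"""

profile = 1
in_result = sorted(r[0] for r in conn.execute(IN_FORM, (profile,) * 3))
join_result = sorted(r[0] for r in conn.execute(JOIN_FORM, (profile,) * 3))
print(in_result, join_result)  # [1, 2, 4] [1, 2, 4]
```

Whether the join form is faster depends on the optimizer and on indexes covering (profile_id, production_id) and the other pairs; running EXPLAIN on both forms is the reliable way to compare.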
This query returns profile_ids that can be found in two or more of the tables.
Select *
From productions_productionmember as a,
groups_groupmember as b,
videos_videocredit as c
Where a.profile_id in (select ...) or
b.group_id in (select ...) or
c.video_id in (select...)
Hope it helps...

MySQL COUNT of multiple left joins - optimization

I have a query that is getting counts from multiple tables by using a LEFT JOIN and subqueries. The idea is to get a count of the various activities a member has participated in.
The schema looks like this:
member
PK member_id
table1
PK tbl1_id
FK member_id
table2
PK tbl2_id
FK member_id
table3
PK tbl3_id
FK member_id
My query looks like this:
SELECT t1.num1,t2.num2,t3.num3
FROM member m
LEFT JOIN
(
SELECT member_id,COUNT(*) as num1
FROM table1
GROUP BY member_id
) t1 ON t1.member_id = m.member_id
LEFT JOIN
(
SELECT member_id,COUNT(*) as num2
FROM table2
GROUP BY member_id
) t2 ON t2.member_id = m.member_id
LEFT JOIN
(
SELECT member_id,COUNT(*) as num3
FROM table3
GROUP BY member_id
) t3 ON t3.member_id = m.member_id
WHERE m.member_id = 27
Where 27 is a test id. The actual query joins more than three tables and the query is run multiple times with the member_id being changed. The problem is this query runs pretty slow. I get the info I need but I am wondering if anyone could suggest a way to optimize this. Any advice is very much appreciated. Thanks much.
You should refactor your query. You can do this by reordering the way the query collects the data. How?
Apply the WHERE clause first
Apply JOINs last
Here is your original query:
SELECT t1.num1,t2.num2,t3.num3
FROM member m
LEFT JOIN
(
SELECT member_id,COUNT(*) as num1
FROM table1
GROUP BY member_id
) t1 ON t1.member_id = m.member_id
LEFT JOIN
(
SELECT member_id,COUNT(*) as num2
FROM table2
GROUP BY member_id
) t2 ON t2.member_id = m.member_id
LEFT JOIN
(
SELECT member_id,COUNT(*) as num3
FROM table3
GROUP BY member_id
) t3 ON t3.member_id = m.member_id
WHERE m.member_id = 27
Here is your new query:
SELECT
IFNULL(t1.num1,0) num1,
IFNULL(t2.num2,0) num2,
IFNULL(t3.num3,0) num3
FROM
(
SELECT * FROM member m
WHERE member_id = 27
) m
LEFT JOIN
(
SELECT member_id,COUNT(*) as num1
FROM table1
WHERE member_id = 27
GROUP BY member_id
) t1 ON t1.member_id = m.member_id
LEFT JOIN
(
SELECT member_id,COUNT(*) as num2
FROM table2
WHERE member_id = 27
GROUP BY member_id
) t2 ON t2.member_id = m.member_id
LEFT JOIN
(
SELECT member_id,COUNT(*) as num3
FROM table3
WHERE member_id = 27
GROUP BY member_id
) t3 ON t3.member_id = m.member_id
;
BTW, I changed member m into (SELECT * FROM member m WHERE member_id = 27) in case you need any information about member 27. I also wrapped each result in IFNULL so that it shows 0 instead of NULL when the member has no rows in that table.
You need to make absolutely sure
member_id is the primary key of the member table
member_id is indexed in table1, table2, and table3
Give it a try!
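Here is a sketch of the filter-first query run against a toy dataset in SQLite (via Python's sqlite3 module, standing in for MySQL). The table and column names follow the question's schema, the rows are invented, the CREATE INDEX statements correspond to the indexing advice above, and the hard-coded 27 is replaced by a named parameter `:id` (an assumption, since the query is run repeatedly with different member ids). Member 27 has no rows in table2, which is where IFNULL turns NULL into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (member_id INTEGER PRIMARY KEY);
CREATE TABLE table1 (tbl1_id INTEGER PRIMARY KEY, member_id INTEGER);
CREATE TABLE table2 (tbl2_id INTEGER PRIMARY KEY, member_id INTEGER);
CREATE TABLE table3 (tbl3_id INTEGER PRIMARY KEY, member_id INTEGER);
-- secondary indexes on the foreign keys, as recommended above
CREATE INDEX idx_t1_member ON table1 (member_id);
CREATE INDEX idx_t2_member ON table2 (member_id);
CREATE INDEX idx_t3_member ON table3 (member_id);
INSERT INTO member VALUES (27), (28);
INSERT INTO table1 (member_id) VALUES (27), (27), (28);
INSERT INTO table2 (member_id) VALUES (28);
INSERT INTO table3 (member_id) VALUES (27);
""")

QUERY = """
SELECT IFNULL(t1.num1, 0) num1, IFNULL(t2.num2, 0) num2, IFNULL(t3.num3, 0) num3
FROM (SELECT * FROM member WHERE member_id = :id) m
LEFT JOIN (SELECT member_id, COUNT(*) AS num1 FROM table1
           WHERE member_id = :id GROUP BY member_id) t1 ON t1.member_id = m.member_id
LEFT JOIN (SELECT member_id, COUNT(*) AS num2 FROM table2
           WHERE member_id = :id GROUP BY member_id) t2 ON t2.member_id = m.member_id
LEFT JOIN (SELECT member_id, COUNT(*) AS num3 FROM table3
           WHERE member_id = :id GROUP BY member_id) t3 ON t3.member_id = m.member_id
"""
print(conn.execute(QUERY, {"id": 27}).fetchone())  # (2, 0, 1)
```

Pushing the member filter into each derived table means each one aggregates only that member's rows (via the index) instead of grouping the whole table before the join.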
Without knowing your schema and what you've done for indexes, one POSSIBLE way to make this faster is:
SELECT (select ifnull(count(*),0) from table1 where table1.member_id = m.member_id) as num1,
(select ifnull(count(*),0) from table2 where table2.member_id = m.member_id) as num2,
(select ifnull(count(*),0) from table3 where table3.member_id = m.member_id) as num3
from member m
WHERE m.member_id = 27
Now, this is a slightly risky recommendation, simply because I don't know anything about your DB or what else is running, or where the bottlenecks are.
In general, it would be a good idea to post an explain plan with your query to get a better answer.
SELECT num1, num2, count(table3.tbl3_id) as num3
FROM (
SELECT member_id, num1, count(table2.tbl2_id) as num2
FROM (
SELECT member_id, count(table1.tbl1_id) as num1
FROM member
LEFT JOIN table1 USING (member_id)
WHERE member_id = 27
GROUP BY member_id) as m1
LEFT JOIN table2 USING (member_id)
GROUP BY member_id, num1) as m2
LEFT JOIN table3 USING (member_id)
GROUP BY member_id, num1, num2;
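One pitfall with chaining LEFT JOINs and counting: when a member has no rows in a joined table, the LEFT JOIN still produces one row (with NULLs), so COUNT(*) reports 1 instead of 0. Counting a column from the joined table, such as its primary key, avoids this because COUNT(column) skips NULLs. A minimal check in SQLite via Python's sqlite3 module, with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (member_id INTEGER PRIMARY KEY);
CREATE TABLE table1 (tbl1_id INTEGER PRIMARY KEY, member_id INTEGER);
INSERT INTO member VALUES (27);   -- member 27 has no rows in table1
""")

QUERY = """
SELECT COUNT(*), COUNT(table1.tbl1_id)
FROM member
LEFT JOIN table1 USING (member_id)
WHERE member.member_id = 27
"""
print(conn.execute(QUERY).fetchone())  # (1, 0)
```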