I have students that are associated many-to-many with groups via a join table groups_students. Each group has a group_type, which can either be a permission_group or not (boolean on group_types table).
I also have users, which are also associated many-to-many with groups via groups_users.
I want to return all students for which a particular user is associated with ALL the student's permission groups.
I've been led to believe this requires relational division, and here's where I am with it:
SELECT DISTINCT gs.student_id
FROM groups_students AS gs
INNER JOIN groups ON groups.id = gs.group_id
INNER JOIN groups_users gu ON gu.group_id = groups.id
INNER JOIN group_types ON group_types.id = groups.group_type_id
WHERE group_types.permission_group = 1
AND gu.user_id = 37
AND NOT EXISTS (
SELECT * FROM groups_students AS gs2
WHERE gs2.student_id = gs.student_id
AND NOT EXISTS (
SELECT gu2.group_id
FROM groups_users AS gu2
WHERE gu2.group_id = gs2.group_id AND gu2.user_id = gu.user_id
)
)
This works, but on my live database with over 20,000 rows in groups_students, it takes over 3 seconds.
Can I make it faster? I read about doing relational division with COUNT but I couldn't relate it to my scenario. Am I able to make cheap gains to bring this query well under half a second execution time or am I looking at a major restructure?
Edit - English language description: Students belong to classes (groups), and users have permission to view certain classes. I need to find the students for whom a particular user has permission to view all of their (permission) classes.
EXPLAIN for the slow query:
+----+--------------------+-------------+--------+--------------------------------------------------------------+--------------------------------------------------+---------+-----------------------------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------+--------+--------------------------------------------------------------+--------------------------------------------------+---------+-----------------------------+------+--------------------------------+
| 1 | PRIMARY | gu | ref | index_groups_users_on_user_id,index_groups_users_on_group_id | index_groups_users_on_user_id | 5 | const | 1181 | Using where; Using temporary |
| 1 | PRIMARY | groups | eq_ref | PRIMARY | PRIMARY | 4 | my_db.gu.group_id | 1 | |
| 1 | PRIMARY | group_types | ALL | PRIMARY | NULL | NULL | NULL | 3 | Using where; Using join buffer |
| 1 | PRIMARY | gs | ref | index_groups_students_on_group_id_and_student_id | index_groups_students_on_group_id_and_student_id | 4 | my_db.groups.id | 9 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | gs2 | ref | index_groups_students_on_student_id_and_group_id | index_groups_students_on_student_id_and_group_id | 4 | my_db.gs.student_id | 8 | Using where; Using index |
| 3 | DEPENDENT SUBQUERY | gu2 | ref | index_groups_users_on_user_id,index_groups_users_on_group_id | index_groups_users_on_group_id | 5 | my_db.gs2.group_id | 99 | Using where |
+----+--------------------+-------------+--------+--------------------------------------------------------------+--------------------------------------------------+---------+-----------------------------+------+--------------------------------+
SQL Fiddle
"I want to return all students for which a particular user is associated with ALL the student's permission groups."
I don't really follow your query; it seems so complicated for this purpose. Instead, I think of it as follows:
Generate all students and their permissions
Generate all permissions for user 37
(outer) Join these together on permissions
Be sure that all permissions for a particular student are in the u37 group
The resulting query is:
select student_id
from (SELECT gs.student_id, g.id as group_id
FROM groups_students gs INNER JOIN
groups g
ON g.id = gs.group_id INNER JOIN
group_types gt
ON gt.id = g.group_type_id
where gt.permission_group = 1
) s left outer join
(select g.id as group_id
from groups_users gu INNER JOIN
groups g
on gu.group_id = g.id INNER JOIN
group_types gt
ON gt.id = g.group_type_id
where gu.user_id = 37 and gt.permission_group = 1
) u37
on s.group_id = u37.group_id
group by s.student_id
having count(*) = count(u37.group_id);
Note: You can do this without the subqueries. Despite their overhead, I think they make the query much more understandable.
A simpler version of Gordon's idea...
SELECT gs.student_id
FROM groups_students gs
JOIN groups g
ON g.id = gs.group_id
JOIN group_types gt
ON gt.id = g.group_type_id
LEFT
JOIN groups_users gu
ON gu.group_id = gs.group_id
AND gu.user_id = 37
WHERE gt.permission_group
GROUP
BY student_id
HAVING COUNT(student_id) = COUNT(user_id)
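To check the counting idea on a small data set, here is a minimal sketch that runs the LEFT JOIN plus HAVING form against an in-memory SQLite database from Python. The sample rows are invented for illustration, and SQLite stands in for MySQL here, so it says nothing about timings on the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_types (id INTEGER PRIMARY KEY, permission_group INTEGER);
-- "groups" is quoted because GROUPS is a keyword in some SQL dialects.
CREATE TABLE "groups" (id INTEGER PRIMARY KEY, group_type_id INTEGER);
CREATE TABLE groups_students (group_id INTEGER, student_id INTEGER);
CREATE TABLE groups_users (group_id INTEGER, user_id INTEGER);

-- Invented sample data: type 1 is a permission type, type 2 is not.
INSERT INTO group_types VALUES (1, 1), (2, 0);
INSERT INTO "groups" VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO groups_students VALUES
  (10, 1), (11, 1), (10, 2), (12, 2), (11, 3), (10, 4);
INSERT INTO groups_users VALUES (10, 37), (11, 99);
""")

sql = """
SELECT gs.student_id
FROM groups_students gs
JOIN "groups" g ON g.id = gs.group_id
JOIN group_types gt ON gt.id = g.group_type_id
LEFT JOIN groups_users gu
       ON gu.group_id = gs.group_id AND gu.user_id = ?
WHERE gt.permission_group = 1
GROUP BY gs.student_id
-- every permission group of the student must have matched a row for the user
HAVING COUNT(*) = COUNT(gu.user_id)
ORDER BY gs.student_id
"""
allowed = [row[0] for row in conn.execute(sql, (37,))]
print(allowed)  # [2, 4]
```

Student 1 is rejected because group 11 has no row for user 37, and student 2 passes because group 12 is not a permission group. Note that a student with no permission groups at all never appears in the result of this form.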
I don't understand why you use subqueries. Dependent subqueries like these are re-evaluated for each outer row in MySQL, which is often slow, so avoid them where possible. Maybe I have not understood your requirements correctly, but I would come up with something like this:
SELECT DISTINCT gs.student_id
FROM groups_students AS gs
INNER JOIN groups ON groups.id = gs.group_id
INNER JOIN groups_users gu ON gu.group_id = groups.id
INNER JOIN group_types ON group_types.id = groups.group_type_id
LEFT JOIN groups_students AS gs2 ON gs2.student_id = gs.student_id
LEFT JOIN groups_users AS gu2 ON gu2.group_id = gs2.group_id AND gu2.user_id = gu.user_id
WHERE group_types.permission_group = 1
AND gu.user_id = 37
AND gs2.student_id IS NULL
AND gu2.group_id IS NULL
You can require that something does not exist by using a left join and checking that a column of the right-hand table (use the primary key) is NULL.
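As a small illustration of the general LEFT JOIN ... IS NULL pattern only (not of the full query above), the sketch below lists the student/group memberships for which user 37 has no matching groups_users row. It uses Python with an in-memory SQLite database and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE groups_students (group_id INTEGER, student_id INTEGER);
CREATE TABLE groups_users (group_id INTEGER, user_id INTEGER);
-- Invented sample data.
INSERT INTO groups_students VALUES (10, 1), (11, 1), (10, 2), (11, 3);
INSERT INTO groups_users VALUES (10, 37), (11, 99);
""")

# Anti-join: keep only the left rows for which the LEFT JOIN found no partner.
missing = conn.execute("""
SELECT gs.student_id, gs.group_id
FROM groups_students gs
LEFT JOIN groups_users gu
       ON gu.group_id = gs.group_id AND gu.user_id = ?
WHERE gu.group_id IS NULL
ORDER BY gs.student_id, gs.group_id
""", (37,)).fetchall()
print(missing)  # [(1, 11), (3, 11)]
```

The user filter has to sit in the ON clause; placed in WHERE it would remove exactly the NULL rows the pattern is looking for.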
Related
Suppose I have two tables, people and emails. emails has a person_id, an address, and an is_primary:
people:
id
emails:
person_id
address
is_primary
To get all email addresses per person, I can do a simple join:
select * from people join emails on people.id = emails.person_id
What if I only want (at most) one row from the right table for each row in the left table? And, if a particular person has multiple emails and one is marked as is_primary, is there a way to prefer which row to use when joining?
So, if I have
people: emails:
------ -----------------------------------------
| id | | id | person_id | address | is_primary |
------ -----------------------------------------
| 1 | | 1 | 1 | a#b.c | true |
| 2 | | 2 | 1 | b#b.c | false |
| 3 | | 3 | 2 | c#b.c | true |
| 4 | | 4 | 4 | d#b.c | false |
------ -----------------------------------------
is there a way to get this result:
------------------------------------------------
| people.id | emails.id | address | is_primary |
------------------------------------------------
| 1 | 1 | a#b.c | true | // chosen over b#b.c because it's primary
| 2 | 3 | c#b.c | true |
| 3 | null | null | null | // no email for person 3
| 4 | 4 | d#b.c | false | // no primary email for person 4
------------------------------------------------
You have slightly misunderstood how left/right joins work.
This join
select * from people join emails on people.id = emails.person_id
will get you every column from both tables for all records that match your ON condition.
The left join
select * from people left join emails on people.id = emails.person_id
will give you every record from people, regardless of whether there's a corresponding record in emails. When there isn't one, the columns from the emails table will just be NULL.
If a person has multiple emails, the result will contain multiple records for this person. Beginners often wonder why the data is duplicated.
If you want to restrict the data to the rows where is_primary has the value 1, you can do so in the WHERE clause when you're doing an inner join (your first query, although you omitted the inner keyword).
When you have a left/right join query, you have to put this filter in the ON clause. If you put it in the WHERE clause, you would implicitly turn the left/right join into an inner join, because the WHERE clause would filter out the NULL rows that I mentioned above. Or you could write the query like this:
select * from people left join emails on people.id = emails.person_id
where (emails.is_primary = 1 or emails.is_primary is null)
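The difference between filtering in ON and filtering in WHERE can be seen on the sample rows from the question. This is a minimal sketch using Python and an in-memory SQLite database; is_primary is stored as 1/0, which is an assumption about the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY);
CREATE TABLE emails (id INTEGER PRIMARY KEY, person_id INTEGER,
                     address TEXT, is_primary INTEGER);
INSERT INTO people VALUES (1), (2), (3), (4);
INSERT INTO emails VALUES
  (1, 1, 'a#b.c', 1), (2, 1, 'b#b.c', 0),
  (3, 2, 'c#b.c', 1), (4, 4, 'd#b.c', 0);
""")

base = ("SELECT p.id, e.address FROM people p "
        "LEFT JOIN emails e ON p.id = e.person_id")

# Filter in ON: every person survives, non-primary emails become NULL.
in_on = conn.execute(base + " AND e.is_primary = 1 ORDER BY p.id").fetchall()
# Filter in WHERE: the NULL rows are removed, so this acts like an inner join.
in_where = conn.execute(base + " WHERE e.is_primary = 1 ORDER BY p.id").fetchall()
# WHERE with an explicit IS NULL escape hatch.
or_null = conn.execute(
    base + " WHERE (e.is_primary = 1 OR e.is_primary IS NULL) ORDER BY p.id"
).fetchall()

print(in_on)     # [(1, 'a#b.c'), (2, 'c#b.c'), (3, None), (4, None)]
print(in_where)  # [(1, 'a#b.c'), (2, 'c#b.c')]
print(or_null)   # [(1, 'a#b.c'), (2, 'c#b.c'), (3, None)]
```

On this data the OR ... IS NULL form keeps person 3 (no emails) but drops person 4, whose only email exists and is not primary, so it is not a drop-in replacement for the ON filter.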
EDIT after clarification:
Paul Spiegel's answer is good, therefore my upvote, but I'm not sure if it performs well, since it has a dependent subquery. So I created this query. It may depend on your data though. Try both answers.
select
p.*,
coalesce(e1.address, e2.address) AS address
from people p
left join emails e1 on p.id = e1.person_id and e1.is_primary = 1
left join (
select person_id, address
from emails e
where id = (select min(id) from emails where emails.is_primary = 0 and emails.person_id = e.person_id)
) e2 on p.id = e2.person_id
Use a correlated subquery with LIMIT 1 in the ON clause of the LEFT JOIN:
select *
from people p
left join emails e
on e.person_id = p.id
and e.id = (
select e1.id
from emails e1
where e1.person_id = e.person_id
order by e1.is_primary desc, -- true first
e1.id -- If e1.is_primary is ambiguous
limit 1
)
order by p.id
sqlfiddle
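For a quick local check of the correlated subquery approach, the sketch below loads the question's sample rows into an in-memory SQLite database from Python. SQLite is only a stand-in for the real database, and is_primary is stored as 1/0 by assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY);
CREATE TABLE emails (id INTEGER PRIMARY KEY, person_id INTEGER,
                     address TEXT, is_primary INTEGER);
INSERT INTO people VALUES (1), (2), (3), (4);
INSERT INTO emails VALUES
  (1, 1, 'a#b.c', 1), (2, 1, 'b#b.c', 0),
  (3, 2, 'c#b.c', 1), (4, 4, 'd#b.c', 0);
""")

rows = conn.execute("""
SELECT p.id, e.id, e.address, e.is_primary
FROM people p
LEFT JOIN emails e
       ON e.person_id = p.id
      AND e.id = (
            SELECT e1.id
            FROM emails e1
            WHERE e1.person_id = e.person_id
            ORDER BY e1.is_primary DESC,  -- primary email first
                     e1.id                -- tie-breaker
            LIMIT 1
          )
ORDER BY p.id
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 'a#b.c', 1)
# (2, 3, 'c#b.c', 1)
# (3, None, None, None)
# (4, 4, 'd#b.c', 0)
```

Each person appears exactly once: the primary email wins when there is one, the lowest-id email is used otherwise, and people without emails get NULLs.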
First, my tables:
game
+----+--------------+
| id | game |
+----+--------------+
| 1 | Game1 |
| 2 | Game2 |
| 4 | Game4 |
+----+--------------+
group_game
+---------+----------+
| game_id | group_id |
+---------+----------+
| 1 | 33 |
| 1 | 45 |
| 4 | 33 |
+---------+----------+
groups
+----+------------+----
| id | group_name | ...
+----+------------+----
| 33 | Group33 | ...
| 45 | Group45 | ...
+----+------------+----
users
+---------+----------+----
| user_id | username | ...
+---------+----------+----
| 1 | User1 | ...
| 2 | User2 | ...
+---------+----------+----
users_groups
+---------+----------+
| user_id | group_id |
+---------+----------+
| 1 | 33 |
| 1 | 45 |
| 2 | 45 |
+---------+----------+
What I want to do
Now I want to check whether the current user is in a group which plays "Game4" and, if so, output the id and the name of the group.
the current user is "User1" with the ID 1 (table users)
"User1" is in a group with the ID 33 (table users_groups)
The Group-ID 33 belongs to "Group33" (table groups)
The Group with the ID 33 plays the Game with the ID 4 (table group_game)
The Game with the ID 4 is "Game4" (table game)
CONCLUSION: Yes, the user is in a group which plays Game4, so output the name of the group ("Group33")
My current code for that (which gives me no rows)
$user_id = $_SESSION["user_id"];
$Game4= "Game4";
$gruppen_dayz = $db->prepare("
SELECT g.group_id, g.group_name
FROM groups g
LEFT JOIN users_groups ug
ON g.group_id = ug.group_id
LEFT JOIN group_game gg
ON g.group_id = gg.group_id
LEFT JOIN game ga
ON ga.id = gg.game_id
WHERE ga.game = ? AND ug.user_id = ?
");
$gruppen_dayz->bind_param('ii', $Game4, $user_id);
I don't know exactly how I should build this query :/
Your where clause is applied after the joins generate the dataset, which negates the left join aspect of your joins. To solve this, move the criteria into the join conditions so the limits are applied as part of the joins. Otherwise, the NULL values generated by your left joins are excluded by your where clause.
There may be other elements as well, this is just the first component I saw.
$user_id = $_SESSION["user_id"];
$Game4= "Game4";
$gruppen_dayz = $db->prepare("
SELECT g.group_id, g.group_name
FROM groups g
LEFT JOIN users_groups ug
ON g.group_id = ug.group_id
AND ug.user_id = ?
LEFT JOIN group_game gg
ON g.group_id = gg.group_id
LEFT JOIN game ga
ON ga.id = gg.game_id
AND ga.game = ?
");
$gruppen_dayz->bind_param('is', $user_id, $Game4); // placeholder order: user_id (int), then game (string)
---UPDATE ----
Upon further investigation I believe your joins are wrong. Groups doesn't have a group_ID field according to your table structure. Walking through the rest now...
SELECT g.id, g.group_name
FROM groups g
LEFT JOIN users_groups ug
ON g.id = ug.group_id
LEFT JOIN group_game gg
ON g.id = gg.group_id
LEFT JOIN game ga
ON ga.id = gg.game_id
WHERE ga.game = ? AND ug.user_id = ?
I might rewrite it as...
SELECT G.Id, G.Group_name
FROM USERS_GROUPS UG
INNER JOIN GROUPS G
on UG.Group_ID = G.ID
INNER JOIN GROUP_GAME GG
on GG.Group_ID = G.ID
INNER JOIN Game GA
on GA.ID = GG.Game_ID
WHERE ga.game = ? AND ug.user_id = ?
I see no value or need for left joins based on your criteria.
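The inner join chain can be tried against the sample tables from the question. This is a minimal sketch in Python with an in-memory SQLite database; the helper name groups_playing is hypothetical and not part of the original code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE game (id INTEGER PRIMARY KEY, game TEXT);
CREATE TABLE group_game (game_id INTEGER, group_id INTEGER);
-- "groups" is quoted because GROUPS is a keyword in some SQL dialects.
CREATE TABLE "groups" (id INTEGER PRIMARY KEY, group_name TEXT);
CREATE TABLE users_groups (user_id INTEGER, group_id INTEGER);
INSERT INTO game VALUES (1, 'Game1'), (2, 'Game2'), (4, 'Game4');
INSERT INTO group_game VALUES (1, 33), (1, 45), (4, 33);
INSERT INTO "groups" VALUES (33, 'Group33'), (45, 'Group45');
INSERT INTO users_groups VALUES (1, 33), (1, 45), (2, 45);
""")

def groups_playing(user_id, game_name):
    """Return (id, group_name) for the user's groups that play the game."""
    sql = """
    SELECT g.id, g.group_name
    FROM users_groups ug
    INNER JOIN "groups" g ON ug.group_id = g.id
    INNER JOIN group_game gg ON gg.group_id = g.id
    INNER JOIN game ga ON ga.id = gg.game_id
    WHERE ga.game = ? AND ug.user_id = ?
    """
    # Parameters are passed in the same order as the placeholders.
    return conn.execute(sql, (game_name, user_id)).fetchall()

print(groups_playing(1, "Game4"))  # [(33, 'Group33')]
print(groups_playing(2, "Game4"))  # []
```

The point to carry over to the PHP code is that the bound values must follow the order of the placeholders, and the game name must be bound as a string.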
I have tables departments, employees, and emails in MySQL 5.6.17 (for a Rails app). Each department has many employees, and both departments and employees have many emails. I want to sort departments by the number of emails to the entire department and individual employees within the department. My attempt:
SELECT departments.*, COUNT(DISTINCT employees.id) AS employees_count, COUNT(DISTINCT emails.id) AS emails_count
FROM departments
LEFT OUTER JOIN employees
ON employees.department_id = departments.id AND employees.is_employed = true
LEFT OUTER JOIN emails
ON (emails.emailable_id = departments.id AND emails.emailable_type = 'department')
OR (emails.emailable_id = employees.id AND emails.emailable_type = 'employee')
GROUP BY departments.id
ORDER BY emails_count DESC
LIMIT 20;
Unfortunately, this query takes over 3 minutes to complete. Since this query will be used in a web interface, that's not a workable timeframe. An EXPLAIN gives:
+----+-------------+-------------+-------+-------------------------------------------------+----------------------------------+---------+-------------------------------+-------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+-------------------------------------------------+----------------------------------+---------+-------------------------------+-------+------------------------------------------------+
| 1 | SIMPLE | departments | index | PRIMARY | PRIMARY | 4 | NULL | 37468 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | employees | ref | index_employees_on_department_id | index_employees_on_department_id | 5 | development_db.departments.id | 5 | Using where |
| 1 | SIMPLE | emails | ALL | index_emails_on_emailable_id_and_emailable_type | NULL | NULL | NULL | 10278 | Range checked for each record (index map: 0x2) |
+----+-------------+-------------+-------+-------------------------------------------------+----------------------------------+---------+-------------------------------+-------+------------------------------------------------+
The index on emails is, then, not being used. This index is used successfully when I join emails only to departments or only to employees, but not with both at once.
Why is this? What can I do about this? Is there a more efficient way to query the desired data?
It might help to do the aggregation first before the joins:
SELECT d.*, e.employees_count, em.emails_count
FROM departments d LEFT OUTER JOIN
(SELECT e.department_id, count(*) as employees_count
FROM employees e
WHERE e.is_employed = true
GROUP BY e.department_id
) e
ON e.department_id = d.id LEFT OUTER JOIN
(SELECT department_id, count(distinct id) as emails_count
FROM (SELECT em.emailable_id as department_id, em.id
FROM emails em
WHERE em.emailable_type = 'department'
UNION ALL
SELECT e.department_id, em.id
FROM emails em JOIN
employees e
ON em.emailable_id = e.id AND em.emailable_type = 'employee'
) ee
GROUP BY department_id
) em
ON em.department_id = d.id
ORDER BY emails_count DESC
LIMIT 20;
You also want an index on emails(emailable_id, emailable_type, id) and on emails(emailable_type, emailable_id, id).
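Here is a minimal sketch of the aggregate-before-join shape on invented sample rows, run from Python against an in-memory SQLite database. The trimmed-down column lists and the COALESCE wrappers (so departments without rows show 0 instead of NULL) are assumptions added for the illustration, and SQLite timings say nothing about MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, department_id INTEGER,
                        is_employed INTEGER);
CREATE TABLE emails (id INTEGER PRIMARY KEY, emailable_id INTEGER,
                     emailable_type TEXT);
-- Invented sample data.
INSERT INTO departments VALUES (1, 'Sales'), (2, 'Ops'), (3, 'Empty');
INSERT INTO employees VALUES (1, 1, 1), (2, 1, 1), (3, 2, 1);
INSERT INTO emails VALUES
  (1, 1, 'department'), (2, 1, 'employee'), (3, 2, 'employee'),
  (4, 3, 'employee'), (5, 2, 'department');
""")

rows = conn.execute("""
SELECT d.id,
       COALESCE(ec.employees_count, 0) AS employees_count,
       COALESCE(em.emails_count, 0) AS emails_count
FROM departments d
LEFT JOIN (SELECT department_id, COUNT(*) AS employees_count
           FROM employees
           WHERE is_employed = 1
           GROUP BY department_id) ec
       ON ec.department_id = d.id
LEFT JOIN (SELECT department_id, COUNT(DISTINCT id) AS emails_count
           FROM (SELECT emailable_id AS department_id, id
                 FROM emails
                 WHERE emailable_type = 'department'
                 UNION ALL
                 SELECT e.department_id, m.id
                 FROM emails m
                 JOIN employees e
                   ON m.emailable_id = e.id AND m.emailable_type = 'employee'
                ) all_emails
           GROUP BY department_id) em
       ON em.department_id = d.id
ORDER BY emails_count DESC, d.id
LIMIT 20
""").fetchall()
print(rows)  # [(1, 2, 3), (2, 1, 2), (3, 0, 0)]
```

Each branch of the UNION ALL joins on a single, simple condition, which avoids the OR in the join condition that the original query relied on.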
This query takes more than 12 seconds to run, even though all tables are relatively small, about 2 thousand rows.
SELECT attr_73206_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, t0.attr_73102_ AS attr_73206_ FROM object_73200_ o
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = o.id
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_
ORDER BY o.id_order
) AS o GROUP BY o.id_field
) AS o ON f1.id = o.id_field
Both tables have an id field as the primary key. In addition, id_field, id_order, attr_73206_ and all fields in master_slave are indexed. As for the logic, this is a master-detail query. Table object_73130_ is the master table and table object_73200_ is the detail table; they are linked by the master_slave table. object_73101_ is an ad-hoc lookup table used to get the real value of the field attr_73206_ from its id. For each row in the master table, the query returns a field from the very first row of its detail table.
The query originally looked different, but here at stackoverflow I was advised to use this more optimized structure instead of the subquery I used previously (and, by the way, the query started to run much faster). I observe that the subquery in the first JOIN block runs very fast but returns a number of rows comparable to the number of rows in the master table. In any case, I do not know how to optimize it. I just wonder why a simple, fast-running join causes so much trouble.
The main observation is that if I remove the ad-hoc table object_73101_ from the query and return just an id rather than the real value, the query runs as quick as a flash. So all attention should be focused on this part of the query:
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_
Why does it slow down the whole query so terribly?
EDIT
In this way it runs super-fast
SELECT t0.attr_73102_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, attr_73206_ FROM object_73200_ o
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = o.id
ORDER BY o.id_order
) AS o GROUP BY o.id_field
) AS o ON f1.id = o.id_field
LEFT JOIN object_73101_ t0 ON t0.id = o.attr_73206_
So, as you can see, I just moved the ad-hoc join outside of the subquery. The problem is that the subquery is generated automatically: I have access to the part of the algorithm that creates it and can modify that, but I do not have access to the part that builds the whole query, so the only thing I can do is fix the subquery somehow. In any case, I still can't understand why an INNER JOIN inside a subquery can slow the whole query down by a factor of hundreds.
EDIT
A new version of query with different aliases for each table. This has no effect on the performance:
SELECT attr_73206_ AS attr_73270_
FROM object_73130_ f1
LEFT OUTER JOIN (
SELECT id_field, attr_73206_ FROM (
SELECT m.id_field, t0.attr_73102_ AS attr_73206_ FROM object_73200_ a
INNER JOIN master_slave m ON (m.id_object = 73130 OR m.id_object = 73290) AND (m.id_master = 73200 OR m.id_master = 73354) AND m.id_slave_field = a.id
INNER JOIN object_73101_ t0 ON t0.id = a.attr_73206_
ORDER BY a.id_order
) AS b GROUP BY b.id_field
) AS c ON f1.id = c.id_field
EDIT
This is the result of EXPLAIN command:
| id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | ROWS | Extra |
| 1 | PRIMARY | f1 | INDEX | NULL | PRIMARY | 4 | 1570 | USING INDEX
| 1 | PRIMARY | derived2| ALL | NULL | NULL | NULL | 1564 |
| 2 | DERIVED | derived3| ALL | NULL | NULL | NULL | 1575 | USING TEMPORARY; USING filesort
| 3 | DERIVED | m | RANGE | id_object,id_master,..| id_object | 4 | 1356 | USING WHERE; USING TEMPORARY; USING filesort
| 3 | DERIVED | a | eq_ref | PRIMARY,attr_73206_ | PRIMARY | 4 | 1 |
| 3 | DERIVED | t0 | eq_ref | PRIMARY | PRIMARY | 4 | 1 |
What is wrong with that?
EDIT
Here is the result of EXPLAIN command for the "super-fast" query
| id | select_type | TABLE | TYPE | possible_keys | KEY | key_len | ROWS | Extra
| 1 | PRIMARY | f1 | INDEX | NULL | PRIMARY | 4 | 1570 | USING INDEX
| 1 | PRIMARY | derived2| ALL | NULL | NULL | NULL | 1570 |
| 1 | PRIMARY | t0 | eq_ref| PRIMARY | PRIMARY | 4 | 1 |
| 2 | DERIVED | derived3| ALL | NULL | NULL | NULL | 1581 | USING TEMPORARY; USING filesort
| 3 | DERIVED | m | RANGE | id_object,id_master,| id_object | 4 | 1356 | USING WHERE; USING TEMPORARY; USING filesort
| 3 | DERIVED | a | eq_ref | PRIMARY | PRIMARY | 4 | 1 |
CLOSED
I will use my own "super-fast" query, which I presented above. I think it is impossible to optimize it any further.
Without knowing the exact nature of the data/query, there are a couple things that I'm seeing:
MySQL is notoriously bad at handling sub-selects in the FROM clause, because it has to materialize them as derived tables. In fact, some versions of MySQL build those derived tables without any indexes, so joins against them cannot use one. Typically, it's better to use JOINs instead of sub-selects, but if you need to use sub-selects, it's best to make each sub-select as lean as possible.
Unless you have a very specific reason for putting the ORDER BY in the sub-select, it may be a good idea to move it to the "main" query portion because the result set may be smaller (allowing for quicker sorting).
So, all that being said, I tried to rewrite your query using JOIN logic, but I was wondering which table the final value (attr_73102_) is coming from. Is it the result of the sub-select, or is it coming from table object_73130_? If it's coming from the sub-select, then I don't see why you're bothering with the original LEFT JOIN, as you will only be returning the list of values from the sub-select, and NULL for any non-matching rows from object_73130_.
Regardless, not knowing the answer, I think the query below MAY be logically equivalent:
SELECT t0.attr_73102_ AS attr_73270_
FROM object_73130_ f1
LEFT JOIN (object_73200_ o
INNER JOIN master_slave m ON m.id_slave_field = o.id
INNER JOIN object_73101_ t0 ON t0.id = o.attr_73206_)
ON f1.id = o.id_field
WHERE m.id_object IN (73130,73290)
AND m.id_master IN (73200,73354)
GROUP BY o.id_field
ORDER BY o.id_order;
I am new to relational database design, and I have just designed the database in this manner. However, I am quite confused about JOIN in MySQL. What query should I use to join all these tables? As you can see, the users table is the reference point for all the other tables.
users
+----------+----------------+-----------------+
| users_id | users_level_id | users_status_id |
+----------+----------------+-----------------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
+----------+----------------+-----------------+
users_credentials
+----------+---------------------------+-----------------------------+----------------------------+
| users_id | users_credential_username | users_credential_email | users_credential_password |
+----------+---------------------------+-----------------------------+----------------------------+
| 1 | super | super#gmail.com | $5$e94e9e$vptscyHjm8rdX0j6 |
| 2 | admin | admin#gmail.com | $5$fVuOmySyC0PttbiMn8in0k7 |
+----------+---------------------------+-----------------------------+----------------------------+
users_level
+----------------+-------------------------+
| users_level_id | users_level_description |
+----------------+-------------------------+
| 1 | Super Administrator |
| 2 | Administrator |
+----------------+-------------------------+
users_status
+-----------------+--------------------------+
| users_status_id | users_status_description |
+-----------------+--------------------------+
| 0 | Disabled |
| 1 | Enabled |
+-----------------+--------------------------+
Try this
SELECT u.*, uc.*, ul.*, us.*
FROM users u
INNER JOIN users_credentials uc
ON u.users_id = uc.users_id
INNER JOIN users_level ul
ON u.users_level_id = ul.users_level_id
INNER JOIN users_status us
ON u.users_status_id = us.users_status_id
Note the use of INNER JOIN: this means that if a user does not have a corresponding record in a joined table, that user won't be shown. If you need to return every user even without matching records in the related tables, replace INNER JOIN with LEFT JOIN.
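The effect of swapping INNER JOIN for LEFT JOIN shows up as soon as one user has no credentials row. Below is a minimal sketch in Python with an in-memory SQLite database; user 3 is a hypothetical extra row added for the illustration, and the credentials table is cut down to the username column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (users_id INTEGER PRIMARY KEY,
                    users_level_id INTEGER, users_status_id INTEGER);
CREATE TABLE users_credentials (users_id INTEGER,
                                users_credential_username TEXT);
CREATE TABLE users_level (users_level_id INTEGER PRIMARY KEY,
                          users_level_description TEXT);
CREATE TABLE users_status (users_status_id INTEGER PRIMARY KEY,
                           users_status_description TEXT);
-- User 3 is hypothetical and deliberately has no credentials row.
INSERT INTO users VALUES (1, 1, 1), (2, 2, 1), (3, 2, 0);
INSERT INTO users_credentials VALUES (1, 'super'), (2, 'admin');
INSERT INTO users_level VALUES (1, 'Super Administrator'), (2, 'Administrator');
INSERT INTO users_status VALUES (0, 'Disabled'), (1, 'Enabled');
""")

# Same query text for both runs; only the join keyword changes.
TEMPLATE = """
SELECT u.users_id, uc.users_credential_username,
       ul.users_level_description, us.users_status_description
FROM users u
{join} users_credentials uc ON u.users_id = uc.users_id
{join} users_level ul ON u.users_level_id = ul.users_level_id
{join} users_status us ON u.users_status_id = us.users_status_id
ORDER BY u.users_id
"""
inner_rows = conn.execute(TEMPLATE.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(TEMPLATE.format(join="LEFT JOIN")).fetchall()

print(len(inner_rows))  # 2
print(left_rows[-1])    # (3, None, 'Administrator', 'Disabled')
```

With INNER JOIN user 3 disappears from the result; with LEFT JOIN the user is kept and the missing username comes back as NULL.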
EDITED after user comment:
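If you want to return just some columns, list them as in this example:
SELECT uc.users_credential_username AS username,
uc.users_credential_email AS email,
uc.users_credential_password AS pwd,
ul.users_level_description AS level,
us.users_status_description AS status
FROM users u
INNER JOIN users_credentials uc ON u.users_id = uc.users_id
INNER JOIN users_level ul ON u.users_level_id = ul.users_level_id
INNER JOIN users_status us ON u.users_status_id = us.users_status_id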
This is a simple query that will join all of them
select *
from users
left join users_credentials
on users_credentials.users_id = users.users_id
left join users_level
on users_level.users_level_id = users.users_level_id
left join users_status
on users_status.users_status_id = users.users_status_id
EDIT
If you want to fetch data from the different tables, use this:
select users.* , users_credentials.* , users_level.* , users_status.*
from users
left join users_credentials
on users_credentials.users_id = users.users_id
left join users_level
on users_level.users_level_id = users.users_level_id
left join users_status
on users_status.users_status_id = users.users_status_id
I think it should look like this:
SELECT * FROM users
LEFT JOIN users_credentials ON users.users_id = users_credentials.users_id
LEFT JOIN users_level ON users.users_level_id = users_level.users_level_id
and so on..
Use this type of query....
SELECT c.*, l.*, s.*
FROM users AS u
INNER JOIN users_credentials AS c ON (u.users_id = c.users_id)
INNER JOIN users_level AS l ON (u.users_level_id= l.users_level_id)
INNER JOIN users_status AS s ON (u.users_status_id= s.users_status_id)
You can list the specific fields you want in place of .* ...
A join is used to fetch data from normalized tables that have a foreign key relationship with a reference table.
With the tables above, a join lets you fetch data from two tables by way of the reference table.
For example
Select * from users a JOIN users_credentials b
ON a.users_id=b.users_id JOIN users_level c
ON c.users_level_id=a.users_level_id
where users_credential_username='super';
The result of this query will give you details such as users_level_description for the user whose users_credential_username is 'super'.