How to optimize this MySQL query? (CROSS JOIN, subquery) - mysql

I have a challenging question for MySQL experts.
I have a users permissions system with 4 tables:
users (id | email | created_at)
permissions (id | responsibility_id | key | weight)
permission_user (id | permission_id | user_id)
responsibilities (id | key | weight)
Users can have any number of permissions assigned and any permission can be granted to any number of users (many to many). Responsibilities are like groups for permissions, each permission belongs to exactly one responsibility. For example, one permission is called update with responsibility of customers. Another one would be delete with orders responsibility.
I need to get a full map of permissions per user, but only for those who have at least one permission granted. Results should be ordered by:
User's number of permissions from most to least
User's created_at column, oldest first
Responsibility's weight
Permission's weight
Example result set:
user_id | responsibility | permission | granted
-----------------------------------------------
5 | customers | create | 1
5 | customers | update | 1
5 | orders | create | 1
5 | orders | update | 1
2 | customers | create | 0
2 | customers | delete | 0
2 | orders | create | 1
2 | orders | update | 0
Let's say I have 10 users in database, but only two of them have any permissions granted. There are 4 permissions in total:
create of customers responsibility
update of customers responsibility
create of orders responsibility
update of orders responsibility.
That's why we have 8 records in results (2 users with any permission × 4 permissions). User with id = 5 is displayed first, because he's got more permissions. If there were any draws, the ones with older created_at date would go first. Permissions are always sorted by the weight of their responsibility and then by their own weight.
My question is: how do I write an optimal query for this case? I have already written one myself and it works well:
SELECT `users`.`id` AS `user_id`,
`responsibilities`.`key` AS `responsibility`,
`permissions`.`key` AS `permission`,
!ISNULL(`permission_user`.`id`) AS `granted`
FROM `users`
CROSS JOIN `permissions`
JOIN `responsibilities`
ON `responsibilities`.`id` = `permissions`.`responsibility_id`
LEFT JOIN `permission_user`
ON `permission_user`.`user_id` = `users`.`id`
AND `permission_user`.`permission_id` = `permissions`.`id`
WHERE (
SELECT COUNT(*)
FROM `permission_user`
WHERE `user_id` = `users`.`id`
) > 0
ORDER BY (
SELECT COUNT(*)
FROM `permission_user`
WHERE `user_id` = `users`.`id`
) DESC,
`users`.`created_at` ASC,
`responsibilities`.`weight` ASC,
`permissions`.`weight` ASC
The problem is that I'm using the same subquery twice.
Can I do better? I count on you, MySQL experts!
--- EDIT ---
Thanks to Gordon Linoff's comment, I changed it to use a HAVING clause:
SELECT `users`.`email`,
`responsibilities`.`key`,
`permissions`.`key`,
!ISNULL(`permission_user`.`id`) as `granted`,
(
SELECT COUNT(*)
FROM `permission_user`
WHERE `user_id` = `users`.`id`
) AS `total_permissions`
FROM `users`
CROSS JOIN `permissions`
JOIN `responsibilities`
ON `responsibilities`.`id` = `permissions`.`responsibility_id`
LEFT JOIN `permission_user`
ON `permission_user`.`user_id` = `users`.`id`
AND `permission_user`.`permission_id` = `permissions`.`id`
HAVING `total_permissions` > 0
ORDER BY `total_permissions` DESC,
`users`.`created_at` ASC,
`responsibilities`.`weight` ASC,
`permissions`.`weight` ASC
I was surprised to discover that HAVING can go alone without GROUP BY.
Can it now be improved for better performance?

Probably the most efficient way to do this is:
SELECT u.email, r.`key`, p.`key`,
!ISNULL(pu.id) as `granted`
FROM (SELECT u.*,
(SELECT COUNT(*) FROM `permission_user` pu WHERE pu.user_id = u.id
) AS `total_permissions`
FROM `users` u
) u CROSS JOIN
permissions p JOIN
responsibilities r
ON r.id = p.responsibility_id LEFT JOIN
permission_user pu
ON pu.user_id = u.id AND
pu.permission_id = p.id
WHERE u.total_permissions > 0
ORDER BY u.total_permissions DESC,
u.created_at ASC,
r.weight ASC,
p.weight ASC;
This will run the subquery once per user, rather than once per user/permission combination (as both the modified query and the original query were doing). This has two costs. The first is the materialization of the subquery, so the data in the users table has to be read and written again. Probably not a big deal, given everything else in the query. The second is the loss of indexes on the users table. Once again, with a cross join, indexes are (probably) not being used, so this is also minor.
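As a variation on the same idea (a sketch, not the answerer's query), the per-user count can also be computed once in a grouped derived table and inner-joined to users. The inner join itself drops users with no permissions, so no WHERE or HAVING filter is needed. It assumes the four tables from the question; the alias c is introduced here purely for illustration.

```sql
SELECT u.id   AS user_id,
       r.`key` AS responsibility,
       p.`key` AS permission,
       pu.id IS NOT NULL AS granted          -- 1 if the user holds the permission, else 0
FROM users u
JOIN (
    -- One row per user that has at least one permission
    SELECT user_id, COUNT(*) AS total_permissions
    FROM permission_user
    GROUP BY user_id
) c ON c.user_id = u.id
CROSS JOIN permissions p
JOIN responsibilities r
  ON r.id = p.responsibility_id
LEFT JOIN permission_user pu
  ON pu.user_id = u.id
 AND pu.permission_id = p.id
ORDER BY c.total_permissions DESC,
         u.created_at ASC,
         r.weight ASC,
         p.weight ASC;
```

A composite index on permission_user (user_id, permission_id) would likely help both the grouped count and the LEFT JOIN; the question does not say whether one exists.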

Related

MySQL LEFT JOIN using OR is very slow

table users has about 80,000 records
table friends has about 900,000 records
There are 104 records with firstname = 'verena'
This query (its original purpose is lost because it's heavily simplified) is very slow (> 20 seconds):
SELECT users.id FROM users
LEFT JOIN friends ON (
users.id = friends.user_id OR
users.id = friends.friend_id
)
WHERE users.firstname = 'verena';
However, if I remove the OR inside the JOIN, the query is instant, so either:
SELECT users.id FROM users
LEFT JOIN friends ON (
users.id = friends.user_id
)
WHERE users.firstname = 'verena';
returning 1487 results or
SELECT users.id FROM users
LEFT JOIN friends ON (
users.id = friends.friend_id
)
WHERE users.firstname = 'verena';
returning 2849 results
execute instantly (0.001s)
If I remove everything else and go straight for
SELECT 1 FROM friends WHERE user_id = xxx OR friend_id = xxx
or
SELECT id FROM users WHERE firstname = 'verena';
these queries are also instant.
Indexes for friends.friend_id, friends.user_id and users.firstname are set.
I don't understand why the top query is slow when each of its parts, executed in isolation, is blazing fast.
My only suspicion now is that MariaDB is first joining ALL users with friends and only after that filters for WHERE firstname = 'verena', instead of the wanted behavior of first filtering for firstname = 'verena' and then joining the results with the friends table, but even then I don't see why removing the OR inside the JOIN condition would make it fast.
I tested this on 2 different machines, one running MariaDB 10.3.22 with Galera cluster and one with MariaDB 10.4.12 without Galera cluster
What is the technical reason why the top query is having such a huge slowdown and how do I fix this without having to split the SQL into several statements?
Edit:
Here is the EXPLAIN output for it, telling it's not using any index for the friends table and scanning through all records as correctly stated in Barmar's comment:
+------+-------------+---------+------+-------------------+-----------+---------+-------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+-------------------+-----------+---------+-------+--------+------------------------------------------------+
| 1 | SIMPLE | users | ref | firstname | firstname | 768 | const | 104 | Using where; Using index |
| 1 | SIMPLE | friends | ALL | user_id,friend_id | NULL | NULL | NULL | 902853 | Range checked for each record (index map: 0x6) |
+------+-------------+---------+------+-------------------+-----------+---------+-------+--------+------------------------------------------------+
Is there any way to make SQL use both indexes or do I just have to accept this limitation and work around it using for example Barmar's suggestion?
MySQL is not usually able to use an index when you use OR to join with different columns. It can only use one index per table in a join, so if it uses the friends.user_id index, it won't use friends.friend_id, and vice versa.
The solution is to do the two fast queries and combine them with UNION.
SELECT users.id FROM users
LEFT JOIN friends ON (
users.id = friends.user_id
)
WHERE users.firstname = 'verena'
UNION
SELECT users.id FROM users
LEFT JOIN friends ON (
users.id = friends.friend_id
)
WHERE users.firstname = 'verena';
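To check whether the rewrite avoids the full scan of friends, the same statement can be prefixed with EXPLAIN. This is only a way to inspect the plan. One would hope to see an index lookup on friends in each branch instead of the "ALL" / "Range checked for each record" row from the question, but the optimizer's actual choice depends on the server version and table statistics.

```sql
-- Inspect the plan of the UNION rewrite; no rows from the tables are returned
EXPLAIN
SELECT users.id FROM users
LEFT JOIN friends ON users.id = friends.user_id
WHERE users.firstname = 'verena'
UNION
SELECT users.id FROM users
LEFT JOIN friends ON users.id = friends.friend_id
WHERE users.firstname = 'verena';
```

Note that UNION removes duplicate rows from the combined result, while UNION ALL keeps them. Which one matches the original OR join depends on what the non-simplified query actually selects.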

GROUP_CONCAT in sub-query based on specified values

User Table:
ID InstructionSets
1 123,124
Instruction Set Table:
ID Name
123 Learning SQL
124 Learning More SQL
Desired Query Result:
UserID SetID SetNames
1 123,124 Learning SQL,Learning More SQL
Current SQL:
SELECT U1.ID AS UserID, U1.InstructionSets AS SetID, (
SELECT GROUP_CONCAT(Name ORDER BY FIELD(I1.ID, U1.InstructionSets))
FROM Instructions I1
WHERE I1.ID IN (U1.InstructionSets)
) AS SetName
FROM Users U1
WHERE `ID` = 1
RESULT
UserID SetID SetNames
1 123,124 Learning SQL
As expected, if I remove the WHERE clause in the sub-query, all of the SetNames appear; but if I specify the required IDs, I only get the name associated with the first ID. Obviously, I also need to fetch the SetNames in the same order as the IDs. Hence ORDER BY in GROUP_CONCAT.
Also:
Is there a better approach (other than storing the user instruction set assignments in a separate table, which seems overkill for this application)? I couldn't see how to use JOIN in this situation.
Is there a better title for this question?
Thanks.
Instead of IN use LIKE operator like this:
SELECT U1.ID AS UserID, U1.InstructionSets AS SetID, (
SELECT GROUP_CONCAT(Name ORDER BY (I1.ID))
FROM Instructions I1
WHERE CONCAT(',', U1.InstructionSets, ',') LIKE concat('%,', I1.ID, ',%')
) AS SetName
FROM Users U1
WHERE `ID` = 1
See the demo.
Results:
| UserID | SetID | SetName |
| ------ | ------- | ------------------------------ |
| 1 | 123,124 | Learning SQL,Learning More SQL |
We can use FIND_IN_SET(). In this context, using FIELD() function doesn't make sense.
We can also use FIND_IN_SET() in the WHERE clause. (Function returns 0 when the string isn't found in the string list.)
e.g.
SELECT u.id AS userid
, u.instructionsets AS setid
, ( SELECT GROUP_CONCAT(i.name ORDER BY FIND_IN_SET(i.id, u.instructionsets))
FROM `Instructions` i
WHERE FIND_IN_SET(i.id, u.instructionsets)
) AS setname
FROM `Users` u
WHERE u.id = 1
Storing comma separated lists is an anti-pattern; a separate table isn't overkill.
Assuming id is unique in Users table, we could do a join operation with a GROUP BY
SELECT u.id AS userid
, MIN(u.instructionsets) AS setid
, GROUP_CONCAT(i.name ORDER BY FIND_IN_SET(i.id, u.instructionsets)) AS setname
FROM `Users` u
LEFT
JOIN `Instructions` i
ON FIND_IN_SET(i.id, u.instructionsets)
WHERE u.id = 1
GROUP BY u.id
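The remark above that a separate table isn't overkill can be made concrete. The following is a minimal sketch of a normalized layout; the table name UserInstructions and its Position column are hypothetical and are not part of the original schema. Position stores the ordering that the comma-separated list used to encode.

```sql
-- Hypothetical junction table replacing the comma-separated Users.InstructionSets column
CREATE TABLE UserInstructions (
    UserID        INT NOT NULL,
    InstructionID INT NOT NULL,
    Position      INT NOT NULL,   -- order of the set within the user's list
    PRIMARY KEY (UserID, InstructionID)
);

INSERT INTO UserInstructions (UserID, InstructionID, Position) VALUES
    (1, 123, 1),
    (1, 124, 2);

-- Rebuild both comma-separated strings with a plain, index-friendly join
SELECT ui.UserID,
       GROUP_CONCAT(ui.InstructionID ORDER BY ui.Position) AS SetID,
       GROUP_CONCAT(i.Name ORDER BY ui.Position)           AS SetNames
FROM UserInstructions ui
JOIN Instructions i
  ON i.ID = ui.InstructionID
WHERE ui.UserID = 1
GROUP BY ui.UserID;
```

Unlike FIND_IN_SET() or LIKE on a string column, the join condition here is a simple equality, so it can use the primary key of Instructions.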

Selecting a count of rows having a max value

Working example: http://sqlfiddle.com/#!9/80995/20
I have three tables, a user table, a user_group table, and a link table.
The link table contains the dates that users were added to user groups. I need a query that returns the count of users currently in each group. The most recent date determines the group that the user is currently in.
SELECT
user_groups.name,
COUNT(l.name) AS ct,
GROUP_CONCAT(l.`name` separator ", ") AS members
FROM user_groups
LEFT JOIN
(SELECT MAX(added), group_id, name FROM link LEFT JOIN users ON users.id = link.user_id GROUP BY user_id) l
ON l.group_id = user_groups.id
GROUP BY user_groups.id
My question is if the query I have written could be optimized, or written better.
Thanks!
Ben
Your actual query is not giving you the answer you want, at least as far as I understand your question. John actually joined group 2 on 2017-01-05, yet he appears in group 1 (which he joined on 2017-01-01) in your results. Note also that Group 4 is missing from your output.
Using standard SQL, I think the next query is what you're looking for. The comments in the query should clarify what each part is doing:
SELECT
user_groups.name AS group_name,
COUNT(u.name) AS member_count,
group_concat(u.name separator ', ') AS members
FROM
user_groups
LEFT JOIN
(
SELECT * FROM
(-- For each user, find most recent date s/he got into a group
SELECT
user_id AS the_user_id, MAX(added) AS last_added
FROM
link
GROUP BY
the_user_id
) AS u_a
-- Join back to the link table, so that the `group_id` can be retrieved
JOIN link l2 ON l2.user_id = u_a.the_user_id AND l2.added = u_a.last_added
) AS most_recent_group ON most_recent_group.group_id = user_groups.id
-- And get the users...
LEFT JOIN users u ON u.id = most_recent_group.the_user_id
GROUP BY
user_groups.id, user_groups.name
ORDER BY
user_groups.name ;
This can be written in a more compact way in MySQL (abusing the fact that, in older versions of MySQL, it doesn't follow the SQL standard for the GROUP BY restrictions).
That's what you'll get:
group_name | member_count | members
:--------- | -----------: | :-------------
Group 1 | 2 | Mikie, Dominic
Group 2 | 2 | John, Paddy
Group 3 | 0 | null
Group 4 | 1 | Nellie
dbfiddle here
Note that this query can be simplified if you use a database with window functions (such as MariaDB 10.2). Then, you can use:
SELECT
user_groups.name AS group_name,
COUNT(u.name) AS member_count,
group_concat(u.name separator ', ') AS members
FROM
user_groups
LEFT JOIN
(
SELECT DISTINCT
user_id AS the_user_id,
-- Most recent group per user: the first row when ordered by `added` descending
first_value(group_id) OVER (PARTITION BY user_id ORDER BY added DESC) AS group_id
FROM
link
) AS most_recent_group ON most_recent_group.group_id = user_groups.id
-- And get the users...
LEFT JOIN users u ON u.id = most_recent_group.the_user_id
GROUP BY
user_groups.id, user_groups.name
ORDER BY
user_groups.name ;
dbfiddle here

MySQL - Fetching users who sent the most recent messages

I have a messages table as follows:
Basically what I want is, I want to fetch n users who sent the most recent messages to a group. So it has to be grouped by from_user_id and sorted by id in descending order. I have the following query:
SELECT `users`.`id` AS `user_id`, `users`.`username`, `users`.`image`
FROM `group_messages`
JOIN `users` ON `users`.`id` = `group_messages`.`from_user_id`
WHERE `group_messages`.`to_group_id` = 31
GROUP BY `users`.`id`
ORDER BY `group_messages`.`id` DESC;
The problem with this is that when I group by users.id, the row with the smallest id field is taken into account. Therefore, what I get is not ordered by descending id.
So is there a way to group by, taking the greatest id into account ? Or should I approach it another way ?
Thanks in advance.
Edit: I think I got it.
SELECT `x`.`id`, `users`.`id` AS `user_id`, `users`.`username`, `users`.`image`
FROM (SELECT * FROM `group_messages` ORDER BY `group_messages`.`id` DESC) `x`
JOIN `users` ON `users`.`id` = `x`.`from_user_id`
WHERE `x`.`to_group_id` = 31
GROUP BY `users`.`id`
ORDER BY `x`.`id` DESC;
Just had to make a select from an already ordered list.
Using ORDER BY inside a subquery is not very efficient. Try this:
users table
| uid | name | ---------- |
messages table
|id | from_uid | to_group_id | ----------- |
SELECT u.* FROM users as u JOIN (
SELECT g1.*
FROM messages g1 LEFT JOIN messages g2
ON g1.from_uid = g2.from_uid AND g1.to_group_id = g2.to_group_id
AND g1.id<g2.id
WHERE g2.id is NULL) as lastmessage
ON lastmessage.from_uid = u.uid
WHERE lastmessage.to_group_id = 1
ORDER BY lastmessage.id DESC;
It's not good to have ORDER BY in subqueries because it makes the queries slow, and in your case you were running it on the whole table.
See also: Retrieving the last record in each group.
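A different technique from the self-join above is to aggregate with MAX(id) per sender and sort on that aggregate. The sketch below uses the table and column names from the question (group_messages, from_user_id, to_group_id). The alias last_message_id and the LIMIT value are arbitrary choices for illustration.

```sql
SELECT u.id        AS user_id,
       u.username,
       u.image,
       MAX(gm.id)  AS last_message_id    -- newest message each user sent to the group
FROM group_messages gm
JOIN users u
  ON u.id = gm.from_user_id
WHERE gm.to_group_id = 31
GROUP BY u.id, u.username, u.image
ORDER BY last_message_id DESC
LIMIT 10;                                -- n = 10 is only an example value
```

Because the ordering uses an aggregate instead of an arbitrary row from each group, the result does not depend on which row the server happens to pick for GROUP BY.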

MySQL Statement extremely slow even with indexes

The following query takes around 200 seconds to complete. What I'm trying to achieve is to get users who have made 6 or more payments and who have not made any orders yet (there are 2 orders tables for different marketplaces).
u.id, ju.id are both primary keys.
I've indexed user_id and order_status combined into one index on both orders tables. If I remove the join and COUNT() on the mp_orders table, the query takes 8 seconds to complete, but with it, it takes too long. I think I've indexed everything that I could have, but I don't understand why it takes so long to complete. Any ideas?
SELECT
u.id,
ju.name,
COUNT(p.id) as payment_count,
COUNT(o.id) as order_count,
COUNT(mi.id) as marketplace_order_count
FROM users as u
INNER JOIN users2 as ju
ON u.id = ju.id
INNER JOIN payments as p
ON u.id = p.user_id
LEFT OUTER JOIN orders as o
ON u.id = o.user_id
AND o.order_status = 1
LEFT OUTER JOIN mp_orders as mi
ON u.id = mi.producer
AND mi.order_status = 1
WHERE u.package != 1
AND u.enabled = 1
AND u.chart_ban = 0
GROUP BY u.id
HAVING COUNT(p.id) >= 6
AND COUNT(o.id) = 0
AND COUNT(mi.id) = 0
LIMIT 10
payments table
+-----------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| id | bigint(255) | NO | PRI | NULL | auto_increment |
| user_id | bigint(255) | NO | | NULL | |
+-----------------+---------------+------+-----+---------+----------------+
orders table (mp_orders table pretty much the same)
+-----------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+----------------+
| id | int(255) | NO | PRI | NULL | auto_increment |
| order_number | varchar(1024) | NO | MUL | NULL | |
| user_id | int(255) | NO | MUL | NULL | |
+-----------------+---------------+------+-----+---------+----------------+
You don't need to COUNT the rows of your orders; you need to retrieve the users who don't have orders, which is not really the same thing.
Instead of counting, filter for the users who have no orders:
SELECT
u.id,
ju.name,
COUNT(p.id) as payment_count
FROM users as u
INNER JOIN users2 as ju
ON u.id = ju.id
INNER JOIN payments as p
ON u.id = p.user_id
LEFT OUTER JOIN orders as o
ON u.id = o.user_id
AND o.order_status = 1
LEFT OUTER JOIN mp_orders as mi
ON u.id = mi.producer
AND mi.order_status = 1
WHERE u.package != 1
AND u.enabled = 1
AND u.chart_ban = 0
AND o.id IS NULL -- filter happens here
AND mi.id IS NULL -- and here
GROUP BY u.id
HAVING COUNT(p.id) >= 6
LIMIT 10
This will prevent the engine from counting each of the orders for each of your users, and you will save a lot of time.
One might think that the engine should use the index for the count, and so the count should be fast enough.
I will quote from a different site: InnoDB COUNT(id) - Why so slow?
It may be to do with the buffering, InnoDb does not cache the index it
caches into memory the actual data rows, because of this for what
seems to be a simple scan it is not loading the primary key index but
all the data into RAM and then running your query on it. This may take
some time to work - hopefully if you were running queries after this
on the same table then they would run much faster.
MyIsam loads the indexes into RAM and then runs its calculations over
this space and then returns a result, as an index is generally much
much smaller than all the data in the table you should see an
immediate difference there.
Another option may be the way that innodb stores the data on the disk
- the innodb files are a virtual tablespace and as such are not necessarily ordered by the data in your table, if you have a
fragmented data file then this could be creating problems for your
disk IO and as a result running slower. MyIsam generally are
sequential files, and as such if you are using an index to access data
the system knows exactly in what location on disk the row is located -
you do not have this luxury with innodb, but I do not think this
particular issue comes into play with just a simple count(*)
http://dev.mysql.com/doc/refman/5.0/en/innodb-restrictions.html explains this:
InnoDB does not keep an internal count of rows in a table. (In
practice, this would be somewhat complicated due to multi-versioning.)
To process a SELECT COUNT(*) FROM t statement, InnoDB must scan an
index of the table, which takes some time if the index is not entirely
in the buffer pool. To get a fast count, you have to use a counter
table you create yourself and let your application update it according
to the inserts and deletes it does. If your table does not change
often, using the MySQL query cache is a good solution. SHOW TABLE
STATUS also can be used if an approximate row count is sufficient. See
Section 14.2.11, “InnoDB Performance Tuning Tips”.
todd_farmer: It actually does explain the difference - MyISAM understands that COUNT(ID) where ID is a PK column
is the same as COUNT(*), which MyISAM keeps precalculated while InnoDB
does not.
Try replacing the COUNT() = 0 checks with IS NULL checks instead:
SELECT
u.id,
ju.name,
COUNT(p.id) as payment_count,
0 as order_count,
0 as marketplace_order_count
FROM users as u
INNER JOIN users2 as ju
ON u.id = ju.id
INNER JOIN payments as p
ON u.id = p.user_id
LEFT OUTER JOIN orders as o
ON u.id = o.user_id
AND o.order_status = 1
LEFT OUTER JOIN mp_orders as mi
ON u.id = mi.producer
AND mi.order_status = 1
WHERE
u.package != 1
AND u.enabled = 1
AND u.chart_ban = 0
AND mi.id IS NULL
AND o.id IS NULL
GROUP BY u.id
HAVING COUNT(p.id) >= 6
LIMIT 10
But I think 8 seconds is still too much for the plain query. You should post the EXPLAIN plan of the main query without the OUTER JOINs to see what's wrong; for example, the package, enabled and chart_ban filters could be totally ruining it.
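The "users with no orders" condition in both answers can also be written with NOT EXISTS, which states the intent directly and lets the server stop at the first matching order. This is a sketch under the question's schema, not a measured improvement. In addition, the DESCRIBE output for payments above shows no key on user_id; if that reflects the real table, an index there may be worth testing. The index name idx_payments_user_id is hypothetical.

```sql
-- Only relevant if payments.user_id really has no index (name is illustrative)
CREATE INDEX idx_payments_user_id ON payments (user_id);

SELECT u.id,
       ju.name,
       COUNT(p.id) AS payment_count
FROM users AS u
INNER JOIN users2 AS ju
   ON u.id = ju.id
INNER JOIN payments AS p
   ON u.id = p.user_id
WHERE u.package != 1
  AND u.enabled = 1
  AND u.chart_ban = 0
  -- No active order in either marketplace
  AND NOT EXISTS (SELECT 1 FROM orders o
                  WHERE o.user_id = u.id AND o.order_status = 1)
  AND NOT EXISTS (SELECT 1 FROM mp_orders mi
                  WHERE mi.producer = u.id AND mi.order_status = 1)
GROUP BY u.id, ju.name
HAVING COUNT(p.id) >= 6
LIMIT 10;
```

Running EXPLAIN on this statement before and after adding the index would show whether the payments join switches from a full scan to an index lookup.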