The purpose of this query is to list the distinct users someone has connections to (ie. users that are followed by or are following the user with id 256 but excludes users who are either blocking or are blocked by the current user making the request (user with id 2)
The relationships table is pretty simple. The status column can be one of two values: "following" or "blocked":
mysql> describe relationships;
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| follower_id | int(11) | YES | MUL | NULL | |
| followee_id | int(11) | YES | MUL | NULL | |
| created_at | datetime | YES | | NULL | |
| updated_at | datetime | YES | | NULL | |
| status | varchar(191) | YES | MUL | NULL | |
+-------------+--------------+------+-----+---------+----------------+
This query currently takes about 58 seconds to complete! User 256 only has 1500 connections. To put this is context, there are roughly 10,000 user rows, 5500 relationships rows.
SELECT DISTINCT `users`.*,
-- "followed" is just a flag indicating if user #2 is currently following a given user
(
SELECT COUNT(*) FROM `relationships`
WHERE `relationships`.`followee_id` = `users`.`id`
AND `relationships`.`follower_id` = 2
) AS 'followed'
FROM `users`
INNER JOIN `relationships`
ON (
(`users`.`id` = `relationships`.`follower_id`
AND `relationships`.`followee_id` = 256
)
OR (`users`.`id` = `relationships`.`followee_id`
AND `relationships`.`follower_id` = 256
)
)
WHERE `relationships`.`status` = 'following'
AND (
-- Ensure we don't return users who are blocked by user #2
`users`.`id` NOT IN (
SELECT `relationships`.`followee_id`
FROM `relationships`
WHERE `relationships`.`follower_id` = 2
AND `relationships`.`status` = 'blocked'
)
)
AND (
-- Ensure we don't return users who are blocking user #2
`users`.`id` NOT IN (
SELECT `relationships`.`follower_id`
FROM `relationships`
WHERE `relationships`.`followee_id` = 2
AND `relationships`.`status` = 'blocked'
)
)
ORDER BY `users`.`id` ASC
LIMIT 10
Here's are the current indexes on relationships:
mysql> show index from relationships;
+---------------+------------+---------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+---------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| relationships | 0 | PRIMARY | 1 | id | A | 3002 | NULL | NULL | | BTREE | | |
| relationships | 0 | index_relationships_on_status_and_follower_id_and_followee_id | 1 | status | A | 2 | NULL | NULL | YES | BTREE | | |
| relationships | 0 | index_relationships_on_status_and_follower_id_and_followee_id | 2 | follower_id | A | 3002 | NULL | NULL | YES | BTREE | | |
| relationships | 0 | index_relationships_on_status_and_follower_id_and_followee_id | 3 | followee_id | A | 3002 | NULL | NULL | YES | BTREE | | |
| relationships | 1 | index_relationships_on_followee_id | 1 | followee_id | A | 3002 | NULL | NULL | YES | BTREE | | |
| relationships | 1 | index_relationships_on_follower_id | 1 | follower_id | A | 3002 | NULL | NULL | YES | BTREE | | |
| relationships | 1 | index_relationships_on_status_and_followee_id_and_follower_id | 1 | status | A | 2 | NULL | NULL | YES | BTREE | | |
| relationships | 1 | index_relationships_on_status_and_followee_id_and_follower_id | 2 | followee_id | A | 3002 | NULL | NULL | YES | BTREE | | |
| relationships | 1 | index_relationships_on_status_and_followee_id_and_follower_id | 3 | follower_id | A | 3002 | NULL | NULL | YES | BTREE | | |
+---------------+------------+---------------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
explain results:
mysql> EXPLAIN SELECT DISTINCT `users`.*, (SELECT COUNT(*) FROM `relationships` WHERE `relationships`.`followee_id` = `users`.`id` AND `relationships`.`follower_id` = 2) AS 'followed' FROM `users` INNER JOIN `relationships` ON(`users`.`id` = `relationships`.`follower_id` AND `relationships`.`followee_id` = 256) OR (`users`.`id` = `relationships`.`followee_id` AND `relationships`.`follower_id` = 256) WHERE `relationships`.`status` = 'following' AND (`users`.`id` NOT IN (SELECT `relationships`.`followee_id` FROM `relationships` WHERE `relationships`.`follower_id` = 2 AND `relationships`.`status` = 'blocked')) AND (`users`.`id` NOT IN (SELECT `relationships`.`follower_id` FROM `relationships` WHERE `relationships`.`followee_id` = 2 AND `relationships`.`status` = 'blocked')) ORDER BY `users`.`id` ASC LIMIT 10;
+----+--------------------+---------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+---------+-------------------------------+------+----------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+---------+-------------------------------+------+----------------------------------------------------------------------------------------------------------------------------------+
| 1 | PRIMARY | relationships | index_merge | index_relationships_on_status_and_follower_id_and_followee_id,index_relationships_on_followee_id,index_relationships_on_follower_id,index_relationships_on_status_and_followee_id_and_follower_id | index_relationships_on_followee_id,index_relationships_on_follower_id | 5,5 | NULL | 2 | Using union(index_relationships_on_followee_id,index_relationships_on_follower_id); Using where; Using temporary; Using filesort |
| 1 | PRIMARY | users | ALL | PRIMARY | NULL | NULL | NULL | 1534 | Range checked for each record (index map: 0x1) |
| 4 | SUBQUERY | relationships | ref | index_relationships_on_status_and_follower_id_and_followee_id,index_relationships_on_followee_id,index_relationships_on_follower_id,index_relationships_on_status_and_followee_id_and_follower_id | index_relationships_on_status_and_follower_id_and_followee_id | 767 | const | 1 | Using where; Using index |
| 3 | SUBQUERY | relationships | ref | index_relationships_on_status_and_follower_id_and_followee_id,index_relationships_on_followee_id,index_relationships_on_follower_id,index_relationships_on_status_and_followee_id_and_follower_id | index_relationships_on_status_and_follower_id_and_followee_id | 772 | const,const | 1 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | relationships | ref | index_relationships_on_followee_id,index_relationships_on_follower_id | index_relationships_on_followee_id | 5 | development.users.id | 1 | Using where |
+----+--------------------+---------------+-------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+---------+-------------------------------+------+----------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)
It's hard to give you a concrete answer without testing this but I think this part of the query is the problem
SELECT DISTINCT `users`.*, (
SELECT COUNT(*) FROM `relationships`
WHERE `relationships`.`followee_id` = `users`.`id`
AND `relationships`.`follower_id` = 2
) AS 'followed'
You're also using order by. Remove DISTINCT and order by and see if things speed up. I know it changes the query but I suspect that distinct is basically building a bunch of temporary tables and throwing them away for every row that it needs to check. Have a look here
http://dev.mysql.com/doc/refman/5.7/en/distinct-optimization.html
Counts can be slow. make sure that the count is working from the fastest column. See this...
https://www.percona.com/blog/2007/04/10/count-vs-countcol/
A good way to think about SQL is in SETS. Luckily MySQL supports sub queries.
https://dev.mysql.com/doc/refman/5.7/en/from-clause-subqueries.html
Some pseudo SQL follows...
select user_id
from relationships as follower, relationships as followee
where ...
In the above we have two sets that we can then manipulate. Using sub queries this gets really interesting
select user_id
from (select user_id as f1 from relationships where ...) as follower,
(select user_id as f2 from relationships where ...) as followee
where ...
I've always found something like the above an easy way to think about self referencing tables.
It is hard to tell exactly how you should optimize your query and structure, first general hints:
use integers/bits/enums instead of varchars
use not null columns as much as possible
usually it makes sense to have unsigned columns (at least to have bigger range)
try different approaches to build query (check below)
distinct is quite expensive operation
sub-queries sometimes much faster neither joins
Anyway, I've prepared sample fiddle with proposed optimizations, I've changed names of the columns to reduce confusion
final query could look like this:
select *
from users a
where
(
id in (select follower_id as id from relationships USE INDEX (user_id) where user_id = 256 and status = 'following')
or id in (select user_id from relationships USE INDEX (follower_id) where follower_id = 256 and status = 'following')
)
and id not in (select follower_id from relationships USE INDEX (user_id) where user_id = 2 and status = 'blocked')
and id not in (select user_id from relationships USE INDEX (follower_id) where follower_id = 2 and status = 'blocked')
though, it could be rewritten as follows:
select *
from users a
where
id in (select follower_id as id from relationships USE INDEX (user_id) where user_id = 256 and status = 'following'
union all
select user_id from relationships USE INDEX (follower_id) where follower_id = 256 and status = 'following')
and id not in (select follower_id from relationships USE INDEX (user_id) where user_id = 2 and status = 'blocked'
union all
select user_id from relationships USE INDEX (follower_id) where follower_id = 2 and status = 'blocked')
benchmark both, despite execution plan - actual performance may be different on real database
Don't use IN ( SELECT ... ), it optimizes poorly. Instead, either use a JOIN, or EXISTS ( SELECT ... ).
The OR to UNION trick is good, but not if it is still inside IN(...).
(To aid in readability, please omit the table name when there is only one table. And rename followee_id and/or follower_id; they are too close to each other in spelling.)
Related
I'd like to update followers in profile table by counting the followed_id on follow table.
mysql> explain follow;
+-------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------+------+-----+---------+-------+
| id | int(11) | NO | | NULL | |
| followed_id | int(11) | NO | | NULL | |
| follower_id | int(11) | NO | | NULL | |
+-------------+---------+------+-----+---------+-------+
And
mysql> explain profile;
+----------------+---------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------------+------+-----+-------------------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_id | int(10) | NO | MUL | 0 | |
| followers | int(7) | NO | | 0 | |
| following | int(7) | NO | | 0 | |
+----------------+---------------+------+-----+-------------------+----------------+
Here is the query that I came up with:
UPDATE profile A
INNER JOIN (SELECT id,COUNT(*) idcount FROM follow GROUP BY id) as B
ON B.id = A.user_id
SET A.followers = B.idcount
But the query does not work as it should. It adds only 1 when profile has followers.
How can I fix this?
You are currently counting the number of rows for each id value in follow, which is always going to be 1. What you need to do is count the number of follower_id values for each followed_id. Also, as #juergend pointed out, you should use a LEFT JOIN so that you can get 0 values for users with no followers. Change your query to this:
UPDATE profile A
LEFT JOIN (SELECT followed_id, COUNT(DISTINCT follower_id) AS idcount
FROM follow
GROUP BY followed_id) as B ON B.followed_id = A.user_id
SET A.followers = COALESCE(B.idcount, 0)
You can use a similar query to update following:
UPDATE profile A
LEFT JOIN (SELECT follower_id, COUNT(DISTINCT followed_id) AS idcount
FROM follow
GROUP BY follower_id) as B ON B.follower_id = A.user_id
SET A.following = COALESCE(B.idcount, 0)
I have a database which consists of three tables, with the following structure:
restaurant table: restaurant_id, location_id, rating. Example: 1325, 77, 4.5
restaurant_name table: restaurant_id, language, name. Example: 1325, 'en', 'Pizza Express'
location_name table: location_id, language, name. Example: 77, 'en', 'New York'
I would like to get the restaurant info in English, sorted by location name and restaurant name, and use the LIMIT clause to paginate the result. So my SQL is:
SELECT ln.name, rn.name
FROM restaurant r
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
LIMIT 0, 50
This is terribly slow - so I refined my SQL with deferred JOIN, which make things a lot faster (from over 10 seconds to 2 seconds):
SELECT ln.name, rn.name
FROM restaurant r
INNER JOIN (
SELECT r.restaurant_id
FROM restaurant r
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
LIMIT 0, 50
) r1
ON r.restaurant_id = r1.restaurant_id
INNER JOIN location_name ln
ON r.location_id = ln.location_id
AND ln.language = 'en'
INNER JOIN restaurant_name rn
ON r.restaurant_id = rn.restaurant_id
AND rn.language = 'en'
ORDER BY ln.name, rn.name
2 seconds is unfortunately still not very acceptable to the user, so I go and check the EXPLAIN of the my query, and it appears that the slow part is on the ORDER BY clause, which I see "Using temporary; Using filesort". I checked the official reference manual about ORDER BY optimization and I come across this statement:
In some cases, MySQL cannot use indexes to resolve the ORDER BY,
although it may still use indexes to find the rows that match the
WHERE clause. Examples:
The query joins many tables, and the columns in the ORDER BY are not
all from the first nonconstant table that is used to retrieve rows.
(This is the first table in the EXPLAIN output that does not have a
const join type.)
So for my case, given that the two columns I'm ordering by are from the nonconstant joined tables, index cannot be used. My question is, is there any other approach I can take to speed things up, or what I've done so far is already the best I can achieve?
Thanks in advance for your help!
EDIT 1
Below is the EXPLAIN output with the ORDER BY clause:
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | |
| 1 | PRIMARY | rn | ref | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | r1.restaurant_id,const,const | 1 | Using where |
| 1 | PRIMARY | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | r1.restaurant_id | 1 | |
| 1 | PRIMARY | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id,const,const | 1 | Using where |
| 2 | DERIVED | rn | ALL | idx_restaurant_name_1 | NULL | NULL | NULL | 8484 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | test.rn.restaurant_id | 1 | |
| 2 | DERIVED | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id | 1 | Using where |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+----------------------------------------------+
Below is the EXPLAIN output without the ORDER BY clause:
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 50 | |
| 1 | PRIMARY | rn | ref | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | r1.restaurant_id,const,const | 1 | Using where |
| 1 | PRIMARY | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | r1.restaurant_id | 1 | |
| 1 | PRIMARY | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id,const,const | 1 | Using where |
| 2 | DERIVED | rn | index | idx_restaurant_name_1 | idx_restaurant_name_1 | 1538 | NULL | 8484 | Using where; Using index |
| 2 | DERIVED | r | eq_ref | PRIMARY,idx_restaurant_1 | PRIMARY | 4 | test.rn.restaurant_id | 1 | |
| 2 | DERIVED | ln | ref | idx_location_name_1 | idx_location_name_1 | 1538 | test.r.location_id | 1 | Using where; Using index |
+----+-------------+------------+--------+--------------------------+-----------------------+---------+--------------------------------+------+--------------------------+
EDIT 2
Below are the DDL of the table. I built them for illustrating this problem only, the real table has much more columns.
CREATE TABLE restaurant (
restaurant_id INT NOT NULL AUTO_INCREMENT,
location_id INT NOT NULL,
rating INT NOT NULL,
PRIMARY KEY (restaurant_id),
INDEX idx_restaurant_1 (location_id)
);
CREATE TABLE restaurant_name (
restaurant_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
INDEX idx_restaurant_name_1 (restaurant_id, language),
INDEX idx_restaurant_name_2 (name)
);
CREATE TABLE location_name (
location_id INT NOT NULL,
language VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
INDEX idx_location_name_1 (location_id, language),
INDEX idx_location_name_2 (name)
);
Based on the EXPLAIN numbers, there could be about 170 "pages" of restaurants (8484/50)? I suggest that that is impractical for paging through. I strongly recommend you rethink the UI. In doing so, the performance problem you state will probably vanish.
For example, the UI could be 2 steps instead of 170 to get to the restaurants in Zimbabwe. Step 1, pick a country. (OK, that might be page 5 of the countries.) Step 2, view the list of restaurants in that country; it would be only a few pages to flip through. Much better for the user; much better for the database.
Addenda
In order to optimize the pagination, get the paginated list of pages from a single table (so that you can 'remember where you left off'). Then join the language table(s) to look up the translations. Note that this only looks up on page's worth of translations, not thousands.
I have:
_sms_users_
id
_join_smsuser_campaigns_
sms_user_id
I'd like to pull out the sms_users that don't have a record in the join_smsuser_campaigns table (via the join_smsuser_campaigns.sms_user_id = sms_users.id relationships)
I have the sql:
select * from sms_users where id not in (select sms_user_id from join_smsuser_campaigns);
edit:
here's the explain select results:
mysql> explain select u.* from sms_users u left join join_smsuser_campaigns c on u.id = c.sms_user_id where c.sms_user_id is null;
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
| 1 | SIMPLE | u | ALL | NULL | NULL | NULL | NULL | 42303 | |
| 1 | SIMPLE | c | index | NULL | index_join_smsuser_campaigns_on_campaign_id_and_sms_user_id | 8 | NULL | 30722 | Using where; Using index; Not exists |
+----+-------------+-------+-------+---------------+-------------------------------------------------------------+---------+------+-------+--------------------------------------+
mysql> describe sms_users;
+------------------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
mysql> describe join_smsuser_campaigns;
+---------------------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+------------------+------+-----+---------+-------+
| sms_user_id | int(11) | NO | | NULL | |
Looks like the problm is that I don't have an index on sms_user_id on the join?
This takes about 5 minutes to run and I see that there is one record. Is there a more efficient way to do this via a join? My sql skills are pretty basic.
Queries can get slow when having many items in the IN clause. Use a left join instead
select u.*
from sms_users u
left join join_smsuser_campaigns c on u.id = c.sms_user_id
where c.sms_user_id is null
See this great join explanation
I am making a message board and I trying to retrieve regular topics (ie, topics that are not stickied) and sort them by the date of the last posted message. I am able to accomplish this however when I have about 10,000 messages and 1500 topics the query time is >60 seconds.
My question is, is there anything I can do to my query to increase performance or is my design fundamentally flawed?
Here is the query that I am using.
SELECT Messages.topic_id,
Messages.posted,
Topics.title,
Topics.user_id,
Users.username
FROM Messages
LEFT JOIN
Topics USING(topic_id)
LEFT JOIN
Users on Users.user_id = Topics.user_id
WHERE Messages.message_id IN (
SELECT MAX(message_id)
FROM Messages
GROUP BY topic_id)
AND Messages.topic_id
NOT IN (
SELECT topic_id
FROM StickiedTopics)
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages
GROUP BY message_id)
AND Topics.board_id=1
ORDER BY Messages.posted DESC LIMIT 50
Edit Here is the Explain Plan
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
| 1 | PRIMARY | Topics | ref | PRIMARY,board_id | board_id | 4 | const | 641 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | Users | eq_ref | PRIMARY | PRIMARY | 4 | spergs3.Topics.user_id | 1 | |
| 1 | PRIMARY | Messages | ref | topic_id | topic_id | 4 | spergs3.Topics.topic_id | 3 | Using where |
| 4 | DEPENDENT SUBQUERY | Messages | index | NULL | PRIMARY | 8 | NULL | 1 | |
| 3 | DEPENDENT SUBQUERY | StickiedTopics | index_subquery | topic_id | topic_id | 4 | func | 1 | Using index |
| 2 | DEPENDENT SUBQUERY | Messages | index | NULL | topic_id | 4 | NULL | 3 | Using index |
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
Indexes
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Messages | 0 | PRIMARY | 1 | message_id | A | 9956 | NULL | NULL | | BTREE | |
| Messages | 0 | PRIMARY | 2 | revision_no | A | 9956 | NULL | NULL | | BTREE | |
| Messages | 1 | user_id | 1 | user_id | A | 432 | NULL | NULL | | BTREE | |
| Messages | 1 | topic_id | 1 | topic_id | A | 3318 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Topics | 0 | PRIMARY | 1 | topic_id | A | 1205 | NULL | NULL | | BTREE | |
| Topics | 1 | user_id | 1 | user_id | A | 133 | NULL | NULL | | BTREE | |
| Topics | 1 | board_id | 1 | board_id | A | 1 | NULL | NULL | | BTREE | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Users | 0 | PRIMARY | 1 | user_id | A | 2051 | NULL | NULL | | BTREE | |
| Users | 0 | username_UNIQUE | 1 | username | A | 2051 | NULL | NULL | | BTREE | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
I would start with the first basis of qualified topics, get those IDs, then join out after.
My inner first query does a pre-qualify grouped by topic_id and max message to just get distinct IDs pre-qualified. I've also applied a LEFT JOIN to the stickiesTopics too. Why? By doing a left-join, I can look for those that are FOUND (those you want to exclude). So I've applied a WHERE clause for Stickies topic ID is NULL (ie: NOT found). So by doing this, we've ALREADY paired down the list SIGNIFICANTLY without doing several nested sub-queries. From THAT result, we can join to the messages, topics (including qualifier of board_id = 1), users and get parts as needed. Finally, apply a single WHERE IN sub-select for your MIN(posted) qualifier. Don't understand the basis of that, but left it in as part of your original query. Then the order by and limit.
SELECT STRAIGHT_JOIN
M.topic_id,
M.posted,
T.title,
T.user_id,
U.username
FROM
( select
M1.Topic_ID,
MAX( M1.Message_id ) MaxMsgPerTopic
from
Messages M1
LEFT Join StickiedTopics ST
ON M1.Topic_ID = ST.Topic_ID
where
ST.Topic_ID IS NULL
group by
M1.Topic_ID ) PreQuery
JOIN Messages M
ON PreQuery.MaxMsgPerTopic = M.Message_ID
JOIN Topics T
ON M.Topic_ID = T.Topic_ID
AND T.Board_ID = 1
LEFT JOIN Users U
on T.User_ID = U.user_id
WHERE
M.posted IN ( SELECT MIN(posted)
FROM Messages
GROUP BY message_id)
ORDER BY
M.posted DESC
LIMIT 50
I would guess that a big part of your problem lies in your subqueries. Try something like this:
SELECT Messages.topic_id,
Messages.posted,
Topics.title,
Topics.user_id,
Users.username
FROM Messages
LEFT JOIN
Topics USING(topic_id)
LEFT JOIN
StickiedTopics ON StickiedTopics.topic_id = Topics.topic_id
AND StickedTopics.topic_id IS NULL
LEFT JOIN
Users on Users.user_id = Topics.user_id
WHERE Messages.message_id IN (
SELECT MAX(message_id)
FROM Messages m1
WHERE m1.topic_id = Messages.topic_id)
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages m2
GROUP BY message_id)
AND Topics.board_id=1
ORDER BY Messages.posted DESC LIMIT 50
I optimized the first subquery by removing the grouping. The second subquery was unnecessary because it can be replaced with a JOIN.
I'm not quite sure what this third subquery is supposed to do:
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages m2
GROUP BY message_id)
I might be able to help optimize this if I know what it's supposed to do. What exactly is posted - a date, integer, etc? What does it represent?
how to make this select statement more faster?
the first left join with the subselect is making it slower...
mysql> SELECT COUNT(DISTINCT w1.id) AS AMOUNT FROM tblWerbemittel w1
JOIN tblVorgang v1 ON w1.object_group = v1.werbemittel_id
INNER JOIN ( SELECT wmax.object_group, MAX( wmax.object_revision ) wmaxobjrev FROM tblWerbemittel wmax GROUP BY wmax.object_group ) AS wmaxselect ON w1.object_group = wmaxselect.object_group AND w1.object_revision = wmaxselect.wmaxobjrev
LEFT JOIN ( SELECT vmax.object_group, MAX( vmax.object_revision ) vmaxobjrev FROM tblVorgang vmax GROUP BY vmax.object_group ) AS vmaxselect ON v1.object_group = vmaxselect.object_group AND v1.object_revision = vmaxselect.vmaxobjrev
LEFT JOIN tblWerbemittel_has_tblAngebot wha ON wha.werbemittel_id = w1.object_group
LEFT JOIN tblAngebot ta ON ta.id = wha.angebot_id
LEFT JOIN tblLieferanten tl ON tl.id = ta.lieferant_id AND wha.zuschlag = (SELECT MAX(zuschlag) FROM tblWerbemittel_has_tblAngebot WHERE werbemittel_id = w1.object_group)
WHERE w1.flags =0 AND v1.flags=0;
+--------+
| AMOUNT |
+--------+
| 1982 |
+--------+
1 row in set (1.30 sec)
Some indexes has been already set and as EXPLAIN shows they were used.
+----+--------------------+-------------------------------+--------+----------------------------------------+----------------------+---------+-----------------------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------------------------+--------+----------------------------------------+----------------------+---------+-----------------------------------------------+------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2072 | |
| 1 | PRIMARY | v1 | ref | werbemittel_group,werbemittel_id_index | werbemittel_group | 4 | wmaxselect.object_group | 2 | Using where |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 3376 | |
| 1 | PRIMARY | w1 | eq_ref | object_revision,or_og_index | object_revision | 8 | wmaxselect.wmaxobjrev,wmaxselect.object_group | 1 | Using where |
| 1 | PRIMARY | wha | ref | PRIMARY,werbemittel_id_index | werbemittel_id_index | 4 | dpd.w1.object_group | 1 | |
| 1 | PRIMARY | ta | eq_ref | PRIMARY | PRIMARY | 4 | dpd.wha.angebot_id | 1 | |
| 1 | PRIMARY | tl | eq_ref | PRIMARY | PRIMARY | 4 | dpd.ta.lieferant_id | 1 | Using index |
| 4 | DEPENDENT SUBQUERY | tblWerbemittel_has_tblAngebot | ref | PRIMARY,werbemittel_id_index | werbemittel_id_index | 4 | dpd.w1.object_group | 1 | |
| 3 | DERIVED | vmax | index | NULL | object_revision_uq | 8 | NULL | 4668 | Using index; Using temporary; Using filesort |
| 2 | DERIVED | wmax | range | NULL | or_og_index | 4 | NULL | 2168 | Using index for group-by |
+----+--------------------+-------------------------------+--------+----------------------------------------+----------------------+---------+-----------------------------------------------+------+----------------------------------------------+
10 rows in set (0.01 sec)
The main problem while the statement above takes about 2 seconds seems to be the subselect where no index can be used.
How to write the statement even more faster?
Thanks for help. MT
Do you have the following indexes?
for tblWerbemittel - object_group, object_revision
for tblVorgang - object_group, object_revision
for tblWerbemittel_has_tblAngebot - werbemittel_id, zuschlag
Let me know if that helps, there are a few more that I can see might help but try those first.
EDIT
Can you try these two queries and see if they run fast?
SELECT w1.id AS AMOUNT
FROM tblWerbemittel w1 INNER JOIN
(SELECT wmax.object_group,
MAX( wmax.object_revision ) AS wmaxobjrev
FROM tblWerbemittel AS wmax
GROUP BY wmax.object_group ) AS wmaxselect ON w1.object_group = wmaxselect.object_group AND
w1.object_revision = wmaxselect.wmaxobjrev
WHERE w1.flags = 0
SELECT v1.werbemittel_id
FROM tblVorgang v1 LEFT JOIN
(SELECT vmax.object_group,
MAX( vmax.object_revision ) AS vmaxobjrev
FROM tblVorgang AS vmax
GROUP BY vmax.object_group ) AS vmaxselect ON v1.object_group = vmaxselect.object_group AND
v1.object_revision = vmaxselect.vmaxobjrev LEFT JOIN
WHERE v1.flags = 0
While I consider I don't have sufficient data to provide a 100% correct answer, but I can throw in a handful of tips.
Forst of all, MYSQL is stupid. Bear that in mind and always rearrange your queries so that the most data is excluded at the beginning. For instance, if the last join reduced the number of results from 10k to 2k while the others don't, try swapping their positions so that each subsequent join operates on the smallest subset of data possible.
Same applies to the WHERE clause.
Also, joins tend to be slower than subqueries. I don't know if that's a rule or just something that I'm observing in my case, but you can always try to substitute a join or two with a subquery.
While I suppose this doesn't really answer your question, I hope it at least gives you an idea about where to start looking for optimisations.