Is there a more efficent way to do this? I keep thinking I'm missing something. Thanks
SELECT DISTINCT eventId
FROM event_tags_map
WHERE tagId in (
SELECT tagId FROM event_tags_map WHERE eventId=114778
) ORDER BY RAND() LIMIT 5;
I'm hitting the same table twice and I'm wondering if I can get the same results faster.
Table structure:
mysql> describe event_tags_map;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| eventId | int(10) unsigned | NO | PRI | NULL | |
| tagId | int(10) unsigned | NO | PRI | NULL | |
+---------+------------------+------+-----+---------+-------+
Indexes:
mysql> show index from event_tags_map;
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| event_tags_map | 0 | PRIMARY | 1 | eventId | A | 302032 | NULL | NULL | | BTREE | | |
| event_tags_map | 0 | PRIMARY | 2 | tagId | A | 604065 | NULL | NULL | | BTREE | | |
+----------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.02 sec)
I looks like you will need to refer to the original table twice, one way or another.
I would recommend not using the IN condition, which is not very scalable and has various conter-intuitive behaviors.
My first option to use a correlated subquery with an EXISTS condition. This is usually the most efficient way to check that something exists...
SELECT DISTINCT eventId
FROM event_tags_map m
WHERE EXISTS (
SELECT 1 FROM event_tags_map m1 WHERE m1.eventId = 114778 AND m1.tagId = m.tagId
)
ORDER BY RAND() LIMIT 5;
An alternative option is to use a self-INNER JOIN:
SELECT DISTINCT eventId
FROM event_tags_map m
INNER JOIN event_tags_map m1 ON m1.eventId = 114778 AND m1.tagId = m.tagId
ORDER BY RAND() LIMIT 5;
Both solutions should be able to take advantage of a composite index on event_tags_map(eventId, tagId).
Related
I have a nested query that deletes a row in table terms only if exactly one row in definitions.term_id is found. It works but it takes like 9 seconds on my system. Im looking to optimize the query.
DELETE FROM terms
WHERE id
IN(
SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP BY term_id
HAVING COUNT(term_id) = 1
)
The database is only about 4000 rows. If I separate the query into 2 independent queries, it takes about 0.1 each
terms
+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term | varchar(50) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
definitions
+----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| term_id | int(11) | YES | | NULL | |
| definition | varchar(500) | YES | | NULL | |
| example | varchar(500) | YES | | NULL | |
| submitter_name | varchar(50) | YES | | NULL | |
| approved | int(1) | YES | MUL | 0 | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
| votos | int(3) | NO | | NULL | |
+----------------+------------------+------+-----+---------+----------------+
To speed up the process, please consider creating an index on the relevant field:
CREATE INDEX term_id ON terms (term_id)
How about using correlated sub query using exists and try,
DELETE FROM terms t
WHERE id = 1234
AND EXISTS (SELECT 1
FROM definitions d
WHERE d.term_id = t.term_id
GROUP BY term_id
HAVING COUNT(term_id) = 1)
It's often quicker to create a new table retaining only the rows you wish to keep. That said, I'd probably write this as follows, and provide indexes as appropriate.
DELETE
FROM terms t
JOIN
( SELECT term_id
FROM definitions
WHERE term_id = 1234
GROUP
BY term_id
HAVING COUNT(*) = 1
) x
ON x.term_id = t.id
Hehe; this may be a kludgy way to do it:
DELETE ... WHERE id = ( SELECT ... )
but without any LIMIT or other constraints.
I'm depending on getting an error something like "subquery returned more than one row" in order to prevent the DELETE being performed if multiple rows match.
I have this big table (ca million records) and I'm trying to retrieve the last record of each type.
The table, the index and the query are very simple, and the fact that MySQL is not using the index means I must be overlooking something.
The table looks like this:
CREATE TABLE `MyTable001` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`TypeField` int(11) NOT NULL,
`Value` bigint(20) NOT NULL,
`Timestamp` bigint(20) NOT NULL,
`AnotherField1` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_MyTable001_TypeField` (`TypeField`),
KEY `idx_MyTable001_Timestamp` (`Timestamp`)
) ENGINE=MyISAM
Show Index gives this:
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| MyTable001 | 0 | PRIMARY | 1 | id | A | 626141 | NULL | NULL | | BTREE | | |
| MyTable001 | 1 | idx_MyTable001_TypeField | 1 | TypeField | A | 458 | NULL | NULL | | BTREE | | |
| MyTable001 | 1 | idx_MyTable001_Timestamp | 1 | Timestamp | A | 156535 | NULL | NULL | | BTREE | | |
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
But when I execute EXPLAIN for the following query:
SELECT *
FROM MyTable001
GROUP BY TypeField
ORDER BY id DESC
The result is this:
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | MyTable001 | ALL | NULL | NULL | NULL | NULL | 626141 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
Why won't MySQL use idx_MyTable001_TypeField?
Thanks in advance.
The problem is that the content of the fields not in the group by are still being inspected. Therefore, all rows must be read, and it's better to do a full table scan. This is clearly seen with the following examples:
SELECT TypeField, COUNT(*) FROM MyTable001 GROUP BY TypeField uses the index.
SELECT TypeField, COUNT(id) FROM MyTable001 GROUP BY TypeField does not.
The original query was incorrect. The correct query is:
SELECT l.*
FROM MyTable001 l
JOIN (
SELECT MAX(id) m_id
FROM MyTable001 l
GROUP BY l.TypeField) l_id ON l_id.m_id = l.id;
It takes 260ms in a table with 630k records. Joachim Isaksson's and fancyPants' alternatives took several minutes in my tests.
My application is similar to Facebook, and I'm trying to optimize the query that get user records. The user records are that he as src ou dst. The src is in usermuralentry directly, the dst list are in usermuralentry_user.
So, a entry can have one src and many dst.
I have those tables:
mysql> desc usermuralentry ;
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| user_src_id | int(11) | NO | MUL | NULL | |
| private | tinyint(1) | NO | | NULL | |
| content | longtext | NO | | NULL | |
| date | datetime | NO | | NULL | |
| last_update | datetime | NO | | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
10 rows in set (0.10 sec)
mysql> desc usermuralentry_user ;
+-------------------+---------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| usermuralentry_id | int(11) | NO | MUL | NULL | |
| userinfo_id | int(11) | NO | MUL | NULL | |
+-------------------+---------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
And the following query to retrieve information from two users.
mysql> explain
SELECT *
FROM usermuralentry AS a
, usermuralentry_user AS b
WHERE a.user_src_id IN ( 1, 2 )
OR
(
a.id = b.usermuralentry_id
AND b.userinfo_id IN ( 1, 2 )
);
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------+---------+------+---------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------+---------+------+---------+------------------------------------------------+
| 1 | SIMPLE | b | ALL | usermuralentry_id,usermuralentry_user_bcd7114e,usermuralentry_user_6b192ca7 | NULL | NULL | NULL | 147188 | |
| 1 | SIMPLE | a | ALL | PRIMARY | NULL | NULL | NULL | 1371289 | Range checked for each record (index map: 0x1) |
+----+-------------+-------+------+-------------------------------------------------------------------------------------------+------+---------+------+---------+------------------------------------------------+
2 rows in set (0.00 sec)
but it is taking A LOT of time...
Some tips to optimize? Can the table schema be better in my application?
Use this query
SELECT *
FROM usermuralentry AS a
left join usermuralentry_user AS b
on b.usermuralentry_id = a.id
WHERE a.user_src_id IN(1, 2)
OR (a.id = b.usermuralentry_id
AND b.userinfo_id IN(1, 2));
And for some tips here are
You are using two tables in from clause which is a cartision product and will take a lot of time as well as undesired results. Always use joins in this situation.
I think your join isn't properly formed, and you need to change the query to use UNION. The OR condition in the where clause is killing performance as well:
SELECT *
FROM usermuralentry AS a
JOIN usermuralentry_user AS b ON a.id = b.usermuralentry_id /* use explicit JOIN! */
WHERE a.user_src_id IN (1 , 2)
UNION
SELECT *
FROM usermuralentry AS a
JOIN usermuralentry_user AS b ON a.id = b.usermuralentry_id
WHERE b.usermuralentry_id IN ( 1, 2 )
You also need an index: ALTER TABLE usermuralentry_user ADD INDEX (usermuralentry_id)
MySQL Server version: 5.0.95
Tables All: InnoDB
I am having an issue with a MySQL db query. Basically I am finding that if I index a particular varchar(50) field tag.name, my queries take longer (x10) than not indexing the field. I am trying to speed this query up, however my efforts seem to be counter productive.
The culprit line and field seems to be:
WHERE `t`.`name` IN ('news','home')
I have noticed that if i query the tag table directly without a join using the same criteria and with the name field indexed, i do not have the issue.. It actually works faster as expected.
EXAMPLE Query **
SELECT `a`.*, `u`.`pen_name`
FROM `tag_link` `tl`
INNER JOIN `tag` `t`
ON `t`.`tag_id` = `tl`.`tag_id`
INNER JOIN `article` `a`
ON `a`.`article_id` = `tl`.`link_id`
INNER JOIN `user` `u`
ON `a`.`user_id` = `u`.`user_id`
WHERE `t`.`name` IN ('news','home')
AND `tl`.`type` = 'article'
AND `a`.`featured` = 'featured'
GROUP BY `article_id`
LIMIT 0 , 5
EXPLAIN with index **
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | t | range | PRIMARY,name | name | 152 | NULL | 4 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | tag_id | 4 | portal.t.tag_id | 10 | Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY,fk_article_user1 | PRIMARY | 4 | portal.tl.link_id | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
+----+-------------+-------+--------+--------------------------+---------+---------+-------------------+------+-----------------------------------------------------------+
EXPLAIN without index **
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
| 1 | SIMPLE | a | index | PRIMARY,fk_article_user1 | PRIMARY | 4 | NULL | 8742 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | portal.a.user_id | 1 | |
| 1 | SIMPLE | tl | ref | tag_id,link_id,link_id_2 | link_id | 4 | portal.a.article_id | 3 | Using where |
| 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | portal.tl.tag_id | 1 | Using where |
+----+-------------+-------+--------+--------------------------+---------+---------+---------------------+------+-------------+
TABLE CREATE
CREATE TABLE `tag` (
`tag_id` int(11) NOT NULL auto_increment,
`name` varchar(50) NOT NULL,
`type` enum('layout','image') NOT NULL,
`create_dttm` datetime default NULL,
PRIMARY KEY (`tag_id`)
) ENGINE=InnoDB AUTO_INCREMENT=43077 DEFAULT CHARSET=utf8
INDEXS
SHOW INDEX FROM tag_link;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tag_link | 0 | PRIMARY | 1 | tag_link_id | A | 42023 | NULL | NULL | | BTREE | |
| tag_link | 1 | tag_id | 1 | tag_id | A | 10505 | NULL | NULL | | BTREE | |
| tag_link | 1 | link_id | 1 | link_id | A | 14007 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
SHOW INDEX FROM article;
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| article | 0 | PRIMARY | 1 | article_id | A | 5723 | NULL | NULL | | BTREE | |
| article | 1 | fk_article_user1 | 1 | user_id | A | 1 | NULL | NULL | | BTREE | |
| article | 1 | create_dttm | 1 | create_dttm | A | 5723 | NULL | NULL | YES | BTREE | |
+---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Final Solution
It seems that MySQL is just sorted the data incorrectly. In the end it turned out faster to look at the tag table as a sub query returning the ids.
It seems that article_id is the primary key for the article table.
Since you're grouping by article_id, MySQL needs to return the records in order by that column, in order to perform the GROUP BY.
You can see that without the index, it scans all records in the article table, but they're at least in order by article_id, so no later sort is required. The LIMIT optimization can be applied here, since it's already in order, it can just stop after it gets five rows.
In the query with the index on tag.name, instead of scanning the entire articles table, it utilizes the index, but against the tag table, and starts there. Unfortunately, when doing this, the records must later be sorted by article.article_id in order to complete the GROUP BY clause. The LIMIT optimization can't be applied since it must return the entire result set, then order it, in order to get the first 5 rows.
In this case, MySQL just guesses wrongly.
Without the LIMIT clause, I'm guessing that using the index is faster, which is maybe what MySQL was guessing.
How big are your tables?
I noticed in the first explain you have a "Using temporary; Using filesort" which is bad. Your query is likely being dumped to disc which makes it way slower than in memory queries.
Also try to avoid using "select *" and instead query the minimum fields needed.
I am making a message board and I trying to retrieve regular topics (ie, topics that are not stickied) and sort them by the date of the last posted message. I am able to accomplish this however when I have about 10,000 messages and 1500 topics the query time is >60 seconds.
My question is, is there anything I can do to my query to increase performance or is my design fundamentally flawed?
Here is the query that I am using.
SELECT Messages.topic_id,
Messages.posted,
Topics.title,
Topics.user_id,
Users.username
FROM Messages
LEFT JOIN
Topics USING(topic_id)
LEFT JOIN
Users on Users.user_id = Topics.user_id
WHERE Messages.message_id IN (
SELECT MAX(message_id)
FROM Messages
GROUP BY topic_id)
AND Messages.topic_id
NOT IN (
SELECT topic_id
FROM StickiedTopics)
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages
GROUP BY message_id)
AND Topics.board_id=1
ORDER BY Messages.posted DESC LIMIT 50
Edit Here is the Explain Plan
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
| 1 | PRIMARY | Topics | ref | PRIMARY,board_id | board_id | 4 | const | 641 | Using where; Using temporary; Using filesort |
| 1 | PRIMARY | Users | eq_ref | PRIMARY | PRIMARY | 4 | spergs3.Topics.user_id | 1 | |
| 1 | PRIMARY | Messages | ref | topic_id | topic_id | 4 | spergs3.Topics.topic_id | 3 | Using where |
| 4 | DEPENDENT SUBQUERY | Messages | index | NULL | PRIMARY | 8 | NULL | 1 | |
| 3 | DEPENDENT SUBQUERY | StickiedTopics | index_subquery | topic_id | topic_id | 4 | func | 1 | Using index |
| 2 | DEPENDENT SUBQUERY | Messages | index | NULL | topic_id | 4 | NULL | 3 | Using index |
+----+--------------------+----------------+----------------+------------------+----------+---------+-------------------------+------+----------------------------------------------+
Indexes
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Messages | 0 | PRIMARY | 1 | message_id | A | 9956 | NULL | NULL | | BTREE | |
| Messages | 0 | PRIMARY | 2 | revision_no | A | 9956 | NULL | NULL | | BTREE | |
| Messages | 1 | user_id | 1 | user_id | A | 432 | NULL | NULL | | BTREE | |
| Messages | 1 | topic_id | 1 | topic_id | A | 3318 | NULL | NULL | | BTREE | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Topics | 0 | PRIMARY | 1 | topic_id | A | 1205 | NULL | NULL | | BTREE | |
| Topics | 1 | user_id | 1 | user_id | A | 133 | NULL | NULL | | BTREE | |
| Topics | 1 | board_id | 1 | board_id | A | 1 | NULL | NULL | | BTREE | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Users | 0 | PRIMARY | 1 | user_id | A | 2051 | NULL | NULL | | BTREE | |
| Users | 0 | username_UNIQUE | 1 | username | A | 2051 | NULL | NULL | | BTREE | |
+-------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
I would start with the first basis of qualified topics, get those IDs, then join out after.
My inner first query does a pre-qualify grouped by topic_id and max message to just get distinct IDs pre-qualified. I've also applied a LEFT JOIN to the stickiesTopics too. Why? By doing a left-join, I can look for those that are FOUND (those you want to exclude). So I've applied a WHERE clause for Stickies topic ID is NULL (ie: NOT found). So by doing this, we've ALREADY paired down the list SIGNIFICANTLY without doing several nested sub-queries. From THAT result, we can join to the messages, topics (including qualifier of board_id = 1), users and get parts as needed. Finally, apply a single WHERE IN sub-select for your MIN(posted) qualifier. Don't understand the basis of that, but left it in as part of your original query. Then the order by and limit.
SELECT STRAIGHT_JOIN
M.topic_id,
M.posted,
T.title,
T.user_id,
U.username
FROM
( select
M1.Topic_ID,
MAX( M1.Message_id ) MaxMsgPerTopic
from
Messages M1
LEFT Join StickiedTopics ST
ON M1.Topic_ID = ST.Topic_ID
where
ST.Topic_ID IS NULL
group by
M1.Topic_ID ) PreQuery
JOIN Messages M
ON PreQuery.MaxMsgPerTopic = M.Message_ID
JOIN Topics T
ON M.Topic_ID = T.Topic_ID
AND T.Board_ID = 1
LEFT JOIN Users U
on T.User_ID = U.user_id
WHERE
M.posted IN ( SELECT MIN(posted)
FROM Messages
GROUP BY message_id)
ORDER BY
M.posted DESC
LIMIT 50
I would guess that a big part of your problem lies in your subqueries. Try something like this:
SELECT Messages.topic_id,
Messages.posted,
Topics.title,
Topics.user_id,
Users.username
FROM Messages
LEFT JOIN
Topics USING(topic_id)
LEFT JOIN
StickiedTopics ON StickiedTopics.topic_id = Topics.topic_id
AND StickedTopics.topic_id IS NULL
LEFT JOIN
Users on Users.user_id = Topics.user_id
WHERE Messages.message_id IN (
SELECT MAX(message_id)
FROM Messages m1
WHERE m1.topic_id = Messages.topic_id)
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages m2
GROUP BY message_id)
AND Topics.board_id=1
ORDER BY Messages.posted DESC LIMIT 50
I optimized the first subquery by removing the grouping. The second subquery was unnecessary because it can be replaced with a JOIN.
I'm not quite sure what this third subquery is supposed to do:
AND Messages.posted IN (
SELECT MIN(posted)
FROM Messages m2
GROUP BY message_id)
I might be able to help optimize this if I know what it's supposed to do. What exactly is posted - a date, integer, etc? What does it represent?