MySQL JOIN extremely poor performance - mysql

I've been messing around all day trying to find why my query performance is terrible. It is extremely simple, yet can take over 15 minutes to execute (I abort the query at that stage). I am joining a table with over 2 million records.
This is the select:
SELECT
audit.MessageID, alerts.AlertCount
FROM
audit
LEFT JOIN (
SELECT MessageID, COUNT(ID) AS 'AlertCount'
FROM alerts
GROUP BY MessageID
) AS alerts ON alerts.MessageID = audit.MessageID
This is the EXPLAIN
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| 1 | PRIMARY | AL | index | NULL | IDX_audit_MessageID | 4 | NULL | 2330944 | 100.00 | Using index |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 124140 | 100.00 | |
| 2 | DERIVED | alerts | index | NULL | IDX_alerts_MessageID | 5 | NULL | 124675 | 100.00 | Using index |
This is the schema:
# Not joining, just showing types
CREATE TABLE messages (
ID int NOT NULL AUTO_INCREMENT,
MessageID varchar(255) NOT NULL,
PRIMARY KEY (ID),
INDEX IDX_messages_MessageID (MessageID)
);
# 2,324,931 records
CREATE TABLE audit (
ID int NOT NULL AUTO_INCREMENT,
MessageID int NOT NULL,
LogTimestamp timestamp NOT NULL,
PRIMARY KEY (ID),
INDEX IDX_audit_MessageID (MessageID),
CONSTRAINT FK_audit_MessageID FOREIGN KEY(MessageID) REFERENCES messages(ID)
);
# 124,140
CREATE TABLE alerts (
ID int NOT NULL AUTO_INCREMENT,
AlertLevel int NOT NULL,
Text nvarchar(4096) DEFAULT NULL,
MessageID int DEFAULT 0,
PRIMARY KEY (ID),
INDEX IDX_alert_MessageID (MessageID),
CONSTRAINT FK_alert_MessageID FOREIGN KEY(MessageID) REFERENCES messages(ID)
);
A few very important things to note: MessageID is not 1:1 in either 'audit' or 'alerts'. A MessageID can exist in one table but not the other, or in both (which is the purpose of my join). In my test DB, no MessageID exists in both tables. In other words, my query will return 2.3 million records with 0 as the count.
Another thing to note is that the 'audit' and 'alert' tables used to use MessageID as varchar(255). I created the 'messages' table expecting that it would fix the join. It actually made it worse. Previously, it would take 78 seconds, now, it never returns.
What am I missing about MySQL?

Subqueries in the FROM clause (derived tables) are very hard for the MySQL optimizer to handle. Try a plain join instead:
SELECT
audit.MessageID, COUNT(alerts.ID) AS AlertCount
FROM
audit
LEFT JOIN alerts ON alerts.MessageID = audit.MessageID
GROUP BY audit.MessageID
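One caveat worth checking before swapping the queries: since MessageID is not 1:1 in 'audit', the flat join with GROUP BY returns one row per distinct MessageID and counts join rows rather than alerts. The sketch below is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module as a stand-in for MySQL, with trimmed-down hypothetical tables and made-up rows, and it compares results only. It says nothing about MySQL performance.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tables are cut down to the join columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audit  (ID INTEGER PRIMARY KEY, MessageID INTEGER NOT NULL);
CREATE TABLE alerts (ID INTEGER PRIMARY KEY, MessageID INTEGER);
-- MessageID 1 appears twice in audit and has three alerts; MessageID 2 has none.
INSERT INTO audit  (MessageID) VALUES (1), (1), (2);
INSERT INTO alerts (MessageID) VALUES (1), (1), (1);
""")

# Original shape: join audit to a pre-aggregated derived table.
derived = conn.execute("""
SELECT audit.MessageID, alerts.AlertCount
FROM audit
LEFT JOIN (
    SELECT MessageID, COUNT(ID) AS AlertCount
    FROM alerts
    GROUP BY MessageID
) AS alerts ON alerts.MessageID = audit.MessageID
ORDER BY audit.ID
""").fetchall()

# Suggested shape: flat join, aggregate on the outside.
flat = conn.execute("""
SELECT audit.MessageID, COUNT(alerts.ID) AS AlertCount
FROM audit
LEFT JOIN alerts ON alerts.MessageID = audit.MessageID
GROUP BY audit.MessageID
ORDER BY audit.MessageID
""").fetchall()

# Same flat join, but counting each alert only once per MessageID.
flat_distinct = conn.execute("""
SELECT audit.MessageID, COUNT(DISTINCT alerts.ID) AS AlertCount
FROM audit
LEFT JOIN alerts ON alerts.MessageID = audit.MessageID
GROUP BY audit.MessageID
ORDER BY audit.MessageID
""").fetchall()

print(derived)        # one row per audit row, NULL where there are no alerts
print(flat)           # one row per MessageID, count inflated by duplicate audit rows
print(flat_distinct)  # one row per MessageID, true alert count
```

If you need one row per audit record, or the exact alert count, pick the variant accordingly; COUNT(DISTINCT alerts.ID) is one way to keep the flat join while avoiding the inflated count.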

You're joining to a subquery.
The subquery results are effectively a temporary table - note the <derived2> in the query execution plan. As you can see there, they're not indexed, since they're ephemeral.
You should execute the query as a single unit with a join, rather than joining to the results of a second query.
EDIT: Andrew has posted an answer with one example of how to do your work in a normal join query, instead of in two steps.

Related

Single INNER JOIN of two well-indexed tables takes more than a minute to run

I have a query that takes about 90 seconds to run even though the tables should have the right indexes. I don't understand why.
I am using MySQL and the tables are InnoDB.
This is the query:
SELECT count(*)
FROM `following_lists` fl INNER JOIN users u
ON fl.user_uuid = u.user_uuid
WHERE fl.following_query_id = 1000010 AND u.status <= 2
I expect this query to start on the table following_lists, grab about 4K records as per the WHERE condition, join these records to the table users by its primary key, check the value of a field in the users table, and return the count of the resulting records. Why does it take so long? Could it be because the two fields I'm joining the tables by are CHAR(40) and not integers?
These are the tables involved and their indexes:
CREATE TABLE `users` (
`user_uuid` CHAR(40) NOT NULL,
`status` TINYINT UNSIGNED NOT NULL,
...
PRIMARY KEY (`user_uuid`),
...
)
CREATE TABLE `following_lists` (
`following_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`following_query_id` INT UNSIGNED NOT NULL,
`user_uuid` CHAR(40) NOT NULL,
PRIMARY KEY (`following_id`),
KEY `query_id` (`following_query_id`),
KEY `user_uuid` (`user_uuid`)
)
And this is the output of the explain query:
+----+-------------+-------+--------+--------------------+----------+---------+--------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+--------------------+----------+---------+--------------+------+-------------+
| 1 | SIMPLE | fl | ref | query_id,user_uuid | query_id | 4 | const | 3718 | |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 160 | fl.user_uuid | 1 | Using index |
+----+-------------+-------+--------+--------------------+----------+---------+--------------+------+-------------+
Further details:
The table following_lists has about 25k rows, but only 3718 have fl.following_query_id = 1000010.
The table users has about 160k rows, but only 3718 should be selected in the join. Only 40 records meet both conditions fl.following_query_id = 1000010 AND u.status <= 2.
The query is slow even if I remove the condition AND u.status <= 2.
"have the right indexes" -- dead give away.
If you are using MyISAM, don't. Instead, switch to InnoDB.
Do you need following_lists.following_id for anything? Is (following_query_id, user_uuid) unique? If so, make that pair the PRIMARY KEY.
If you can't do the above, change
KEY `query_id` (`following_query_id`)
to
INDEX(following_query_id, user_uuid)
UUIDs are terribly inefficient, especially when unnecessarily declared utf8mb4, or CHAR with a larger than necessary size. Change to CHAR(36) CHARACTER SET ascii. (Notice the "160" in the EXPLAIN shrink significantly.)
More on why UUIDs are bad for performance: http://mysql.rjweb.org/doc.php/uuid
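To see where the "160" comes from: for a NOT NULL CHAR column, key_len is the declared length times the maximum bytes per character of the column's character set. The small sketch below just does that arithmetic. The helper name char_key_len is made up for illustration, and utf8mb4 is an inference from 160 = 40 x 4, since the question does not show the table's charset.

```python
def char_key_len(chars: int, max_bytes_per_char: int) -> int:
    """Bytes EXPLAIN reports as key_len for a NOT NULL CHAR(chars) column."""
    return chars * max_bytes_per_char

# CHAR(40) with a 4-byte charset (assumed utf8mb4) matches the 160 in the EXPLAIN.
current = char_key_len(40, 4)

# CHAR(36) CHARACTER SET ascii, as suggested: 1 byte per character.
proposed = char_key_len(36, 1)

print(current, proposed)
```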
How much RAM do you have? What is the setting for innodb_buffer_pool_size? (Sounds like it is too low.)
More on indexing: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Select columns from table1 join table2, taking very long time

I have 2 tables. From these two tables I am trying to insert records into a third table using a select query with a join. However, I found that the select query with the join is not using indexes and takes a lot of time, hence the insertion is very slow.
I tried to create multiple indexes as suggested in a few posts, but to no avail.
MySQL with JOIN not using index
MySQL query with JOIN not using INDEX
Here is my table structure:
CREATE TABLE master_table (
id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
field1 VARCHAR(50) DEFAULT NULL,
field2 VARCHAR(50) DEFAULT NULL,
field3 VARCHAR(50) DEFAULT NULL,
field4 VARCHAR(50) DEFAULT NULL,
PRIMARY KEY (id),
KEY mt_field1_index (field1)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
CREATE TABLE child_table (
c_id BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
m_id BIGINT(20) UNSIGNED NOT NULL ,
group_id BIGINT(20) UNSIGNED NOT NULL ,
status ENUM('Status1','Status2','Status3') NOT NULL,
job_id VARCHAR(50) DEFAULT NULL,
PRIMARY KEY (c_id),
UNIQUE KEY ct_mid_gid (m_id,group_id),
KEY Index_ct_status (status),
KEY index_ct_jobid (job_id),
KEY index_ct_mid (m_id),
KEY index_ct_cid_sts_tsk (group_id,status,job_id)
) ENGINE=INNODB DEFAULT CHARSET=utf8;
Query:
SELECT m.id
, NULLIF(TRIM(m.field1),'')
FROM master_table m
JOIN child_table c
ON m.id = c.m_id
WHERE c.group_id = 2
AND c.status = 'Status3'
AND c.job_id = 0
ORDER
BY m.id
LIMIT 0, 1000;
Explain:
+-------+-------------+-------+------------+----------+------------------------------------------------------------------------------+-----------------+---------+--------------+-------+----------+------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+-------+-------------+-------+------------+----------+------------------------------------------------------------------------------+-----------------+---------+--------------+-------+----------+------------------------------------------------+
| 1 | SIMPLE | c | (NULL) | ref | ct_mid_gid,Index_ct_status,index_ct_jobid,index_ct_mid,index_ct_cid_sts_tsk | Index_ct_status | 1 | const | 65689 | 0.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | m | (NULL) | eq_ref | PRIMARY | PRIMARY | 8 | r_n_d.c.m_id | 1 | 100.00 | (NULL) |
+-------+-------------+-------+------------+----------+------------------------------------------------------------------------------+-----------------+---------+--------------+-------+----------+------------------------------------------------+
WHERE c.group_id = 2
AND c.status = 'Status3'
AND c.job_id = 0
ORDER BY c.m_id -- Note the change
Needs
INDEX(group_id, status, job_id, -- in any order
m_id) -- last
What you have (separate indexes) is not the same.
To stop early at the LIMIT, the index must handle the entire WHERE and the ORDER BY. Otherwise MySQL has to compute all the matching rows, sort them, and only then apply the LIMIT.
So, you get 4 speedups:
Index efficiently fetching the desired rows (from c)
No need for a sort pass (since ORDER BY delivers them in order)
The index is "covering" (hence, no bouncing back and forth between index BTree and data BTree for c)
Get to stop at 1000.
While you are at it, consider getting rid of the AUTO_INCREMENT. Toss c_id and change
PRIMARY KEY (c_id),
UNIQUE KEY ct_mid_gid (m_id, group_id)
-->
PRIMARY KEY(m_id, group_id)
Coincidentally, if you had done this, your KEY index_ct_cid_sts_tsk (group_id,status,job_id) would have stumbled into the perfect index. This is because the PK is implicitly tacked onto any secondary index, but you need m_id, not c_id. Anyway, I prefer to be explicit.
And when making changes, toss any redundant indexes. For example, KEY index_ct_mid (m_id) is useless since it is the beginning of another index.
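A quick way to convince yourself that switching ORDER BY m.id to ORDER BY c.m_id does not change the output: the join condition makes the two columns equal on every returned row. The sketch below checks that with the composite index in place. It is only a sketch under assumptions: SQLite via Python's sqlite3 stands in for MySQL, ENUM and BIGINT UNSIGNED are replaced by TEXT and INTEGER, the index name and sample rows are made up, and job_id is compared to the string '0' because SQLite and MySQL treat a string-versus-number comparison differently.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; types are simplified, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE master_table (id INTEGER PRIMARY KEY, field1 TEXT);
CREATE TABLE child_table (
    c_id     INTEGER PRIMARY KEY,
    m_id     INTEGER NOT NULL,
    group_id INTEGER NOT NULL,
    status   TEXT NOT NULL,
    job_id   TEXT,
    UNIQUE (m_id, group_id)
);
-- Composite index from the answer: the WHERE columns first, the ORDER BY column last.
CREATE INDEX ct_gid_sts_job_mid ON child_table (group_id, status, job_id, m_id);
INSERT INTO master_table (id, field1) VALUES (1, ' a '), (2, ''), (3, 'c'), (4, 'd');
INSERT INTO child_table (m_id, group_id, status, job_id) VALUES
    (3, 2, 'Status3', '0'),
    (1, 2, 'Status3', '0'),
    (2, 2, 'Status3', '0'),
    (4, 2, 'Status1', '0'),
    (4, 1, 'Status3', '0');
""")

QUERY = """
SELECT m.id, NULLIF(TRIM(m.field1), '')
FROM master_table m
JOIN child_table c ON m.id = c.m_id
WHERE c.group_id = 2 AND c.status = 'Status3' AND c.job_id = '0'
ORDER BY {order_col}
LIMIT 1000
"""

# Only the ORDER BY column differs between the two runs.
by_master = conn.execute(QUERY.format(order_col="m.id")).fetchall()
by_child = conn.execute(QUERY.format(order_col="c.m_id")).fetchall()

print(by_master)
print(by_child)
```

Whether MySQL then walks the index in order and stops at the LIMIT is something to confirm with EXPLAIN on the real server; this only shows the rewrite is safe for the results.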
Have you created an index on the columns used in the WHERE clause?
Check the query again after adding an index on those columns.
But do remember that if you add an index, insert, update and delete operations on the table might get slower.

MySql Record Matching Criteria With Latest Date

I have a MySQL table where all status changes are recorded. I want to be able to query the status of all items on a specific date, or the last date for all items. The table I have now is:
CREATE TABLE `tra_rel_sta` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tra_id` int(11) DEFAULT NULL,
`sta_id` int(11) DEFAULT NULL,
`changed_on` datetime DEFAULT NULL,
`changed_by` int(11) DEFAULT NULL,
`comments` text,
PRIMARY KEY (`id`),
KEY `tra_id` (`tra_id`),
KEY `rel` (`tra_id`,`sta_id`,`changed_on`),
KEY `sta_id` (`sta_id`),
KEY `changed_on` (`changed_on`),
KEY `tra_changed` (`tra_id`,`changed_on`)
) ENGINE=InnoDB AUTO_INCREMENT=51734 DEFAULT CHARSET=utf8;
(I know I'm probably overdoing the indexes, but I haven't exactly figured out how to optimize indexes yet).
The query I'm using now, which works is:
SELECT rel.changed_on, rel.changed_by, rel.tra_id, sta.id AS sta_id, sta.status, sta.description, sta.onHold, sta.awaitingApproval, sta.approved, sta.complete, sta.locked
FROM (
SELECT tra_id, MAX(changed_on) AS lst
FROM tra_rel_sta
GROUP BY tra_id
) AS rec
LEFT JOIN tra_rel_sta AS rel ON rel.changed_on = rec.lst AND rel.tra_id = rec.tra_id
LEFT JOIN tra_status AS sta ON sta.id = rel.sta_id
If I want to use a specific date, I insert a WHERE statement in the sub-query.
This works, but it takes about 0.65 seconds to run in PHP with about 51,733 records in the table. This query is used as a sub query in several others when I need to know the last status of an object, and as a result, it is slowing down many parts of the application.
I've tried to use a sub query in the WHERE statement as described in MySQL: how to select record with latest date before a certain date but it takes almost twice as long. I've tried using a JOIN statement as described in MySQL select of record with latest date but I'm getting about the same or just slightly slower results.
How can I optimize this query or fix my indexes to make this more effective?
Thanks!!
As requested, EXPLAIN of query:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
---|-------------|-------------|--------|-----------------------------------|---------|---------|-------------------|-------|-------------
1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 49931 | NULL
1 | PRIMARY | rel | ref | tra_id,rel,changed_on,tra_changed | tra_id | 5 | rec.tra_id | 1 | Using where
1 | PRIMARY | sta | eq_ref | PRIMARY | PRIMARY | 4 | csinfo.rel.sta_id | 1 | NULL
2 | DERIVED | tra_rel_sta | index | tra_id,rel,tra_changed | tra_id | 5 | NULL | 49931 | NULL

mysql difference in index usage between MyISAM and InnoDB

I have these small tables, item and category:
CREATE TABLE `item` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) NOT NULL,
`category_id` mediumint(8) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `category_id` (`category_id`)
) CHARSET=utf8
CREATE TABLE `category` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(150) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`)
) CHARSET=utf8
I have inserted 100 categories and 1000 items.
If I run this:
EXPLAIN SELECT item.id,category.name AS category_name FROM item JOIN category ON item.category_id=category.id;
Then, if the tables' engine is InnoDB I get:
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
| 1 | SIMPLE | category | index | PRIMARY | name | 452 | NULL | 103 | Using index |
| 1 | SIMPLE | item | ref | category_id | category_id | 3 | dbname.category.id | 5 | Using index |
+----+-------------+----------+-------+---------------+-------------+---------+--------------------+------+-------------+
Whereas, if I switch to MyISAM (with alter table engine=myisam) I get:
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
| 1 | SIMPLE | item | ALL | category_id | NULL | NULL | NULL | 1003 | |
| 1 | SIMPLE | category | eq_ref | PRIMARY | PRIMARY | 3 | dbname.item.category_id | 1 | |
+----+-------------+----------+--------+---------------+---------+---------+-------------------------+------+-------+
My question is, why this difference in the way indexes are handled?
In InnoDB, any secondary index internally contains the primary key column of the table. So the index name on column (name) is implicitly on columns (name, id).
This means that EXPLAIN shows your access to the category table as an "index-scan" (this is shown in the type column as "index"). By scanning the index, it also has access to the id column, which it uses to look up rows in the second table, item.
Then it also takes advantage of the item index on (category_id) which is really (category_id, id), and it is able to fetch item.id for your select-list simply by reading the index. No need to read the table at all (this is shown in the Extra column as "Using index").
MyISAM doesn't store primary keys with the secondary key in this way, so it can't get the same optimizations. The access to the category table is type "ALL" which means a table-scan.
I would expect the access to the MyISAM table item would be "ref" as it looks up rows using the index on (category_id). But the optimizer may get skewed results if you have very few rows in the table, or if you haven't done ANALYZE TABLE item since creating the index.
Re your update:
It looks like the optimizer prefers an index-scan over a table-scan, so it takes the opportunity to do an index-scan in InnoDB, and puts the category table first. The optimizer decides to re-order the tables instead of using the tables in the order you gave them in your query.
In the MyISAM tables, there will be one table-scan whichever table it chooses to access first, but by putting the category table second, it joins to category's PRIMARY key index instead of item's secondary index. The optimizer prefers lookups to a unique or primary key (type "eq_ref").

mysql does not use primary bigint index

I am fighting with some performance problems on a very simple table, which seems to be very slow when fetching data by its primary key (bigint).
I have this table with 124 million entries:
CREATE TABLE `nodes` (
`id` bigint(20) NOT NULL,
`lat` float(13,7) NOT NULL,
`lon` float(13,7) NOT NULL,
PRIMARY KEY (`id`),
KEY `lat_index` (`lat`),
KEY `lon_index` (`lon`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
and a simple query which takes some ids from another table using the IN clause to fetch data from the nodes table, but it takes about 1 hour just to fetch a few rows from this table.
EXPLAIN shows me it's not using the PRIMARY key as an index, it's simply scanning the whole table. Why is that? id and the id column from the other table are both of type bigint(20).
mysql> EXPLAIN SELECT lat, lon FROM nodes WHERE id IN (SELECT node_id FROM ways_elements WHERE way_id = '4962890');
+----+--------------------+-------------------+------+---------------+--------+---------+-------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------------------+------+---------------+--------+---------+-------+-----------+-------------+
| 1 | PRIMARY | nodes | ALL | NULL | NULL | NULL | NULL | 124035228 | Using where |
| 2 | DEPENDENT SUBQUERY | ways_elements | ref | way_id | way_id | 8 | const | 2 | Using where |
+----+--------------------+-------------------+------+---------------+--------+---------+-------+-----------+-------------+
The query SELECT node_id FROM ways_elements WHERE way_id = '4962890' simply returns two node ids, so the whole query should only return two rows, but it takes more or less 1 hour.
Using "force index (PRIMARY)" didn't help. Even if it had helped, why does MySQL not take that index, since it's a primary key? EXPLAIN doesn't even mention anything in the possible_keys column, but select_type shows PRIMARY.
Am I doing something wrong?
How does this perform?
SELECT lat, lon FROM nodes t1 join ways_elements t2 on (t1.id=t2.node_id) WHERE t2.way_id = '4962890'
I suspect that your query is checking each row in nodes against each item in the "IN" clause.
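This is what is called a correlated subquery, which is why EXPLAIN shows DEPENDENT SUBQUERY: the inner query is evaluated against each row of nodes. You can see this as reference or this popular question posted on Stack Overflow. A better query to use is: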
SELECT lat,
lon
FROM nodes n
JOIN ways_elements w ON n.id = w.node_id
WHERE way_id = '4962890'
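One thing to verify when replacing IN with a JOIN: the two forms return the same nodes, but the JOIN returns a node once per matching ways_elements row, so a way that lists the same node twice yields a duplicate. The sketch below illustrates that under assumptions: SQLite via Python's sqlite3 stands in for MySQL, the ways_elements columns beyond way_id and node_id (here a hypothetical seq) and all rows are invented, and it compares results only, not speed.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; ways_elements layout and data are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL NOT NULL, lon REAL NOT NULL);
CREATE TABLE ways_elements (
    way_id  INTEGER NOT NULL,
    node_id INTEGER NOT NULL,
    seq     INTEGER NOT NULL
);
CREATE INDEX idx_way_id ON ways_elements (way_id);
INSERT INTO nodes VALUES (10, 1.0, 2.0), (11, 3.0, 4.0), (12, 5.0, 6.0);
-- Way 4962890 lists node 10 twice (a hypothetical closed way); way 1 is unrelated.
INSERT INTO ways_elements VALUES
    (4962890, 10, 1), (4962890, 11, 2), (4962890, 10, 3), (1, 12, 1);
""")

# Original form: IN with a subquery.
with_in = conn.execute("""
SELECT lat, lon
FROM nodes
WHERE id IN (SELECT node_id FROM ways_elements WHERE way_id = 4962890)
ORDER BY id
""").fetchall()

# Suggested form: plain join.
with_join = conn.execute("""
SELECT n.lat, n.lon
FROM nodes n
JOIN ways_elements w ON n.id = w.node_id
WHERE w.way_id = 4962890
ORDER BY n.id
""").fetchall()

print(with_in)    # each matching node once
print(with_join)  # node 10 appears twice, once per ways_elements row
```

If the duplicates matter, adding DISTINCT to the join query is one option; if you actually want the nodes in way order, the duplicates are probably what you need.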