Trying to understand EXPLAIN results of a query in MySQL

Below is my EXPLAIN query and its output. I'm very much a beginner (please forgive my SQL syntax... unless that's my problem!) - can anyone explain the order of the tables here? I've played around with the order in the query itself, yet the artists table is always first in the EXPLAIN output. I gather the order relates to when the tables are accessed - if so, why artists first?
EXPLAIN
SELECT album_name, artist_name, genre_name
FROM albums
JOIN genres USING (genre_pk)
JOIN artists USING (artist_pk)
ORDER BY album_name;
+----+-------------+---------+--------+--------------------+-----------+---------+-------------------------+------+---------------------------------+
| id | select_type | table   | type   | possible_keys      | key       | key_len | ref                     | rows | Extra                           |
+----+-------------+---------+--------+--------------------+-----------+---------+-------------------------+------+---------------------------------+
|  1 | SIMPLE      | artists | ALL    | PRIMARY            | NULL      | NULL    | NULL                    |    5 | Using temporary; Using filesort |
|  1 | SIMPLE      | albums  | ref    | genre_pk,artist_pk | artist_pk | 2       | music.artists.artist_pk |    1 | NULL                            |
|  1 | SIMPLE      | genres  | eq_ref | PRIMARY            | PRIMARY   | 1       | music.albums.genre_pk   |    1 | NULL                            |
+----+-------------+---------+--------+--------------------+-----------+---------+-------------------------+------+---------------------------------+
SHOW CREATE TABLE info:
CREATE TABLE `artists` (
`artist_pk` smallint(4) unsigned NOT NULL AUTO_INCREMENT,
`artist_name` varchar(57) NOT NULL,
`artist_origin` enum('UK','US','OTHER') DEFAULT NULL,
PRIMARY KEY (`artist_pk`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;
CREATE TABLE `genres` (
`genre_pk` tinyint(2) unsigned NOT NULL AUTO_INCREMENT,
`genre_name` varchar(30) NOT NULL,
PRIMARY KEY (`genre_pk`),
UNIQUE KEY `genre_name` (`genre_name`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;
CREATE TABLE `albums` (
`album_pk` smallint(4) unsigned NOT NULL AUTO_INCREMENT,
`genre_pk` tinyint(2) unsigned NOT NULL,
`artist_pk` smallint(4) unsigned NOT NULL,
`album_name` varchar(57) NOT NULL,
`album_year` year(4) DEFAULT NULL,
`album_track_qty` tinyint(2) unsigned NOT NULL,
`album_disc_num` char(6) NOT NULL DEFAULT '1 of 1',
PRIMARY KEY (`album_pk`),
KEY `genre_pk` (`genre_pk`),
KEY `artist_pk` (`artist_pk`),
FULLTEXT KEY `album_name` (`album_name`),
CONSTRAINT `albums_ibfk_1` FOREIGN KEY (`genre_pk`) REFERENCES `genres` (`genre_pk`),
CONSTRAINT `albums_ibfk_2` FOREIGN KEY (`artist_pk`) REFERENCES `artists` (`artist_pk`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;

The order in which your tables are joined is decided by the SQL optimizer. The optimizer internally rewrites your query so it can deliver results quickly and efficiently (read this page for more details). To bypass this join reordering you can use SELECT STRAIGHT_JOIN.
In your particular case, the order depends on the number of rows in each table and on the indexes that are available. Have a look at these slides, starting at page 25, for a small example.
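For example, a minimal sketch of the same query with the hint added, just to compare the plans (STRAIGHT_JOIN forces the tables to be joined in the order they are written, so treat it as an experiment rather than a fix):
EXPLAIN
SELECT STRAIGHT_JOIN album_name, artist_name, genre_name
FROM albums
JOIN genres USING (genre_pk)
JOIN artists USING (artist_pk)
ORDER BY album_name;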

Here is the fiddle for you: http://sqlfiddle.com/#!2/a6224/2/0
As @Daniel already said, MySQL takes into account not only indexes, but also the number of rows in each table. The number of rows is low both in my fiddle and in your database - so it is hard to blame MySQL.
Note that even though STRAIGHT_JOIN will make the order of joins seem logical to you, it will not make the execution plan any prettier (I mean the Using temporary; Using filesort red flags).

mysql group by joined column too slow

I have two tables, events and event_params.
The first table stores the events with these columns:
events | CREATE TABLE `events` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`project` varchar(24) NOT NULL,
`event` varchar(24) NOT NULL,
`date` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `project` (`project`,`event`)
) ENGINE=InnoDB AUTO_INCREMENT=2915335 DEFAULT CHARSET=latin1
and the second stores parameters for each event with these columns:
event_params | CREATE TABLE `event_params` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`event_id` int(10) unsigned NOT NULL,
`name` varchar(24) NOT NULL,
`value` varchar(524) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `event_id` (`event_id`),
KEY `value` (`value`)
) ENGINE=InnoDB AUTO_INCREMENT=20789391 DEFAULT CHARSET=latin1
Now I want to get the count of events that have various values for a specified parameter.
I wrote this query for the campaign parameter, but it is too slow (15 seconds to respond):
SELECT
event_params.value as campaign,
count(*) as count
FROM `events`
left join event_params on event_params.event_id = events.id
and event_params.name = 'campaign'
WHERE events.project = 'foo'
GROUP by event_params.value
and here is the EXPLAIN query result:
+----+-------------+--------------+------------+------+---------------------+----------+---------+------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+---------------------+----------+---------+------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | events | NULL | ref | project | project | 26 | const | 1 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | event_params | NULL | ref | name,event_id,value | event_id | 4 | events.events.id | 4 | 100.00 | Using where |
+----+-------------+--------------+------------+------+---------------------+----------+---------+------------------+------+----------+----------------------------------------------+
Can I speed up this query?
You may try adding the following index on the event_params table, which might speed up the join:
CREATE INDEX idx1 ON event_params (event_id, name, value);
The aggregation step probably can't be optimized much because the COUNT operation involves counting each record.
Move the "campaign value" into the main table, with a suitable length for VARCHAR and then
SELECT
campaign,
count(*) as count
FROM `events`
WHERE project = 'foo'
GROUP by campaign
And have
INDEX(project, campaign)
A bit of advice when tempted to use EAV: Move the 'important' values into the main table; leave only the rarely used or rarely set 'values' in the other table. Also (assuming there are no dups), have
PRIMARY KEY(event_id, name)
More discussion: http://mysql.rjweb.org/doc.php/eav
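A rough sketch of that schema change follows; the column name campaign, the VARCHAR(64) length, and the backfill statement are illustrative assumptions, not part of the original schema:
-- Add the campaign value to the main table and index it together with project.
ALTER TABLE events
  ADD COLUMN campaign VARCHAR(64) NULL,
  ADD INDEX project_campaign (project, campaign);
-- Backfill from the EAV table (assumes at most one 'campaign' row per event).
UPDATE events e
JOIN event_params p ON p.event_id = e.id AND p.name = 'campaign'
SET e.campaign = p.value;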

Partition Benefit on Indexed Column

CREATE TABLE ofRoster (
`rosterID` bigint(20) NOT NULL,
`username` varchar(64) NOT NULL,
`jid` varchar(1024) NOT NULL,
`sub` tinyint(4) NOT NULL,
`ask` tinyint(4) NOT NULL,
`recv` tinyint(4) NOT NULL,
`nick` varchar(255) DEFAULT NULL,
PRIMARY KEY (`rosterID`),
KEY `ofRoster_unameid_idx` (`username`),
KEY `ofRoster_jid_idx` (`jid`(255))
) ENGINE=InnoDB;
CREATE TABLE `ofRoster_par` (
`rosterID` bigint(20) NOT NULL AUTO_INCREMENT,
`username` int(64) NOT NULL,
`jid` varchar(1024) NOT NULL,
`sub` tinyint(4) NOT NULL,
`ask` tinyint(4) NOT NULL,
`recv` tinyint(4) NOT NULL,
`nick` varchar(255) DEFAULT NULL,
UNIQUE KEY `rosterID` (`rosterID`,`username`),
KEY `ofRoster_unameid_idx` (`username`),
KEY `ofRoster_jid_idx` (`jid`(255))
) ENGINE=InnoDB AUTO_INCREMENT=412595 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (username)
PARTITIONS 10 */ ;
I created the partition on username so that when I run a SELECT it only needs to search one partition.
But I am not sure whether this will be beneficial, as there is already an index on username.
explain SELECT count(*) FROM ofRoster_par WHERE username='1';
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
| 1 | SIMPLE | ofRoster_par | ref | ofRoster_unameid_idx | ofRoster_unameid_idx | 4 | const | 120 | Using index |
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
explain SELECT count(*) FROM ofRoster WHERE username='1';
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
| 1 | SIMPLE | ofRoster | ref | ofRoster_unameid_idx | ofRoster_unameid_idx | 66 | const | 120 | Using where; Using index |
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
Right now there are just 400,000 records in the table, but in production there will be around 80 million.
The time taken by both queries is also the same :-(
PARTITION BY HASH is, in my opinion, useless.
In your example, INDEX(username) on a non-partitioned table would probably be faster than using PARTITION BY HASH(username).
You already have such an index. How fast was it?
Here's what is happening:
With partitioning:
1. pick partition
2. use KEY(username) (and not the data) to do the COUNT(*) inside the index (note "Using index")
Without partitioning:
1. use KEY(username) (and not the data) to do the COUNT(*) inside the index (note "Using index")
Other comments:
If username is unique, consider making it the PRIMARY KEY and get rid of rosterID. (You may want to keep rosterID because it is smaller and used for JOINing to several other tables.)
Bug: You say INT(64) where you meant VARCHAR(64). This may have impacted your timing test.
"Prefix indexes" (jid(255)) are rarely useful. Let's see how you are using it.
80M rows does not warrant BIGINT (8 bytes); INT UNSIGNED (4 bytes) can handle 400 crore.
You understand that latin1 limits you to western European languages?
When using EXPLAIN with partitioned tables, use EXPLAIN PARTITIONS SELECT .... You may get some surprises.
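For example (a sketch using your own test query; the partitions column in the output tells you whether one partition or all ten are scanned):
EXPLAIN PARTITIONS SELECT COUNT(*) FROM ofRoster_par WHERE username = '1';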

MySQL Slow Query - Even with all the indices

mysql> explain
select c.userEmail,f.customerId
from comments c
inner join flows f
on (f.id = c.typeId)
inner join users u
on (u.email = c.userEmail)
where c.addTime >= 1372617000
and c.addTime <= 1374776940
and c.type = 'flow'
and c.automated = 0;
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| 1 | SIMPLE | f | index | PRIMARY | customerId | 4 | NULL | 144443 | Using index |
| 1 | SIMPLE | c | ref | userEmail_idx,addTime,automated,typeId | typeId | 198 | f.id,const | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | email | email | 386 | c.userEmail | 1 | Using index |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
How do I make the above query faster - it constantly shows up in the slow query logs.
Indexes present :
id is the auto incremented primary key of the flows table.
customerId of flows table.
userEmail of comments table.
composite index (typeId,type) on comments table.
email of users table (unique)
automated of comments table.
addTime of comments table.
Number of rows :
1. flows - 150k
2. comments - 500k (half of them have automated = 1 and others have automated = 0) (also value of type is 'flow' for all the rows except 500)
3. users - 50
Table schemas :
users | CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=56 DEFAULT CHARSET=utf8
comments | CREATE TABLE `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`userEmail` varchar(128) DEFAULT NULL,
`content` mediumtext NOT NULL,
`addTime` int(11) NOT NULL,
`typeId` int(11) NOT NULL,
`automated` tinyint(4) NOT NULL,
`type` varchar(64) NOT NULL,
PRIMARY KEY (`id`),
KEY `userEmail_idx` (`userEmail`),
KEY `addTime` (`addTime`),
KEY `automated` (`automated`),
KEY `typeId` (`typeId`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=572410 DEFAULT CHARSET=utf8 |
flows | CREATE TABLE `flows` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(32) NOT NULL,
`status` varchar(128) NOT NULL,
`customerId` int(11) NOT NULL,
`createTime` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `flowType_idx` (`type`),
KEY `customerId` (`customerId`),
KEY `status` (`status`),
KEY `createTime` (`createTime`)
) ENGINE=InnoDB AUTO_INCREMENT=134127 DEFAULT CHARSET=utf8 |
You have the required indexes to perform the joins efficiently. However, it looks like MySQL is joining the tables in a less efficient manner. The EXPLAIN output shows that it is doing a full index scan of the flows table then joining the comments table.
It will probably be more efficient to read the comments table first and then join - that is, in the order you specified in your query - so that the rows considered are restricted by the predicates you supplied (probably what you intended).
Running OPTIMIZE TABLE or ANALYZE TABLE can improve the decisions the query optimizer makes, particularly on tables that have had extensive changes.
If the query optimiser still gets it wrong you can force tables to be read in the order you specify in the query by beginning your statement with SELECT STRAIGHT_JOIN or by changing the INNER JOIN to STRAIGHT_JOIN.
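For instance, a sketch of the same query with the join order pinned to the way it is written (comments read first); treat it as an experiment rather than a permanent fix:
SELECT STRAIGHT_JOIN c.userEmail, f.customerId
FROM comments c
INNER JOIN flows f ON (f.id = c.typeId)
INNER JOIN users u ON (u.email = c.userEmail)
WHERE c.addTime >= 1372617000
  AND c.addTime <= 1374776940
  AND c.type = 'flow'
  AND c.automated = 0;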

Optimise MySQL query using temporary and filesort

I have this query (shown below) which currently uses temporary and filesort to produce a grouped, ordered result set. I would like to get rid of both if possible. I have looked at the indexes used in this query and I just can't see what is missing.
SELECT
b.institutionid AS b__institutionid,
b.name AS b__name,
COUNT(DISTINCT f2.facebook_id) AS f2__0
FROM education_institutions b
LEFT JOIN facebook_education_matches f ON b.institutionid = f.institutionid
LEFT JOIN facebook_education f2 ON f.school_uid = f2.school_uid
WHERE
(
b.approved = '1'
AND f2.facebook_id IN ( [lots of facebook ids here ])
)
GROUP BY b__institutionid
ORDER BY f2__0 DESC
LIMIT 10
Here is the output of EXPLAIN EXTENDED:
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | f | index | PRIMARY,institutionId | institutionId | 4 | NULL | 308 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | f2 | ref | facebook_id_idx,school_uid_idx | school_uid_idx | 9 | f.school_uid | 1 | 100.00 | Using where |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | f.institutionId | 1 | 100.00 | Using where |
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
The CREATE TABLE statements for each table are shown below so you know the schema.
CREATE TABLE facebook_education (
education_id int(11) NOT NULL AUTO_INCREMENT,
name varchar(255) DEFAULT NULL,
school_uid bigint(20) DEFAULT NULL,
school_type varchar(255) DEFAULT NULL,
year smallint(6) DEFAULT NULL,
facebook_id bigint(20) DEFAULT NULL,
degree varchar(255) DEFAULT NULL,
PRIMARY KEY (education_id),
KEY facebook_id_idx (facebook_id),
KEY school_uid_idx (school_uid),
CONSTRAINT facebook_education_facebook_id_facebook_user_facebook_id FOREIGN KEY (facebook_id) REFERENCES facebook_user (facebook_id)
) ENGINE=InnoDB AUTO_INCREMENT=484 DEFAULT CHARSET=utf8;
CREATE TABLE facebook_education_matches (
school_uid bigint(20) NOT NULL,
institutionId int(11) NOT NULL,
created_at timestamp NULL DEFAULT NULL,
updated_at timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (school_uid),
KEY institutionId (institutionId),
CONSTRAINT fk_facebook_education FOREIGN KEY (school_uid) REFERENCES facebook_education (school_uid) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT fk_education_institutions FOREIGN KEY (institutionId) REFERENCES education_institutions (institutionId) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE education_institutions (
institutionId int(11) NOT NULL AUTO_INCREMENT,
name varchar(100) NOT NULL,
type enum('School','Degree') DEFAULT NULL,
approved tinyint(1) NOT NULL DEFAULT '0',
deleted tinyint(1) NOT NULL DEFAULT '0',
normalisedName varchar(100) NOT NULL,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (institutionId)
) ENGINE=InnoDB AUTO_INCREMENT=101327 DEFAULT CHARSET=utf8;
Any guidance would be greatly appreciated.
The filesort probably happens because you have no suitable index for the ORDER BY.
It's mentioned in the MySQL "ORDER BY Optimization" docs.
What you can do is load a temporary table and select from it afterwards. When you load the temp table, use ORDER BY NULL; when you select from the temp table, use ORDER BY ... LIMIT.
The issue is that GROUP BY adds an implicit ORDER BY <group by columns> ASC unless you disable that behaviour by adding ORDER BY NULL.
It's one of those MySQL-specific gotchas.
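A minimal sketch of that two-step approach, reusing the tables and columns from the question (the temp table name tmp_counts and the placeholder id list are made up for illustration):
CREATE TEMPORARY TABLE tmp_counts
SELECT
  b.institutionid,
  b.name,
  COUNT(DISTINCT f2.facebook_id) AS f2__0
FROM education_institutions b
LEFT JOIN facebook_education_matches f ON b.institutionid = f.institutionid
LEFT JOIN facebook_education f2 ON f.school_uid = f2.school_uid
WHERE b.approved = '1'
  AND f2.facebook_id IN (1, 2, 3)   -- the real list of facebook ids goes here
GROUP BY b.institutionid
ORDER BY NULL;                      -- suppress the implicit GROUP BY sort
SELECT * FROM tmp_counts ORDER BY f2__0 DESC LIMIT 10;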
I can see two possible optimizations:
b.approved = '1' - you definitely need an index on the approved column for quick filtering.
f2.facebook_id IN ( [lots of facebook ids here ] ) - store the facebook ids in a temp table, create an index on that temp table, and then join with it instead of using the IN clause, as sketched below.
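A rough sketch of that second point (the temp table name tmp_fb_ids is illustrative):
CREATE TEMPORARY TABLE tmp_fb_ids (
  facebook_id BIGINT NOT NULL,
  PRIMARY KEY (facebook_id)
);
INSERT INTO tmp_fb_ids (facebook_id) VALUES (1), (2), (3);   -- the real id list goes here
-- ...then replace "f2.facebook_id IN ( ... )" in the query with
-- JOIN tmp_fb_ids t ON t.facebook_id = f2.facebook_id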

MySQL won't use index for query?

I have this table:
CREATE TABLE `point` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`siteid` INT(11) NOT NULL,
`lft` INT(11) DEFAULT NULL,
`rgt` INT(11) DEFAULT NULL,
`level` SMALLINT(6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `point_siteid_site_id` (`siteid`),
CONSTRAINT `point_siteid_site_id` FOREIGN KEY (`siteid`) REFERENCES `site` (`id`) ON DELETE CASCADE
) ENGINE=INNODB AUTO_INCREMENT=35 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
And this query:
SELECT * FROM `point` WHERE siteid = 1;
Which results in this EXPLAIN information:
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | point | ALL | point_siteid_site_id | NULL | NULL | NULL | 6 | Using where |
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
Question is, why isn't the query using the point_siteid_site_id index?
I haven't used MySQL much, but in PostgreSQL, if the number of records in the table is small, it can decide not to use the index. This is not a problem, because it chooses the best query plan for the situation. When the number of records is larger, it will use the index. Maybe the same thing is happening here with MySQL.
Edit: Are you sure the foreign key is indexed?
It's probably because you don't have enough sample data, or the cardinality (diversity) of the siteid values isn't very high. These are the most common reasons the optimizer finds it quicker just to read all the rows.
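If you want to double-check, something along these lines (a sketch) shows what the optimizer is working with and confirms the index can be used at all:
SHOW INDEX FROM `point`;   -- look at the Cardinality value for point_siteid_site_id
EXPLAIN SELECT * FROM `point` FORCE INDEX (point_siteid_site_id) WHERE siteid = 1;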