mysql> explain
select c.userEmail,f.customerId
from comments c
inner join flows f
on (f.id = c.typeId)
inner join users u
on (u.email = c.userEmail)
where c.addTime >= 1372617000
and c.addTime <= 1374776940
and c.type = 'flow'
and c.automated = 0;
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| 1 | SIMPLE | f | index | PRIMARY | customerId | 4 | NULL | 144443 | Using index |
| 1 | SIMPLE | c | ref | userEmail_idx,addTime,automated,typeId | typeId | 198 | f.id,const | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | email | email | 386 | c.userEmail | 1 | Using index |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
How do I make the above query faster - it constantly shows up in the slow query logs.
Indexes present :
id is the auto incremented primary key of the flows table.
customerId of flows table.
userEmail of comments table.
composite index (typeId,type) on comments table.
email of users table (unique)
automated of comments table.
addTime of comments table.
Number of rows :
1. flows - 150k
2. comments - 500k (half of them have automated = 1 and others have automated = 0) (also value of type is 'flow' for all the rows except 500)
3. users - 50
Table schemas :
users | CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=56 DEFAULT CHARSET=utf8
comments | CREATE TABLE `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`userEmail` varchar(128) DEFAULT NULL,
`content` mediumtext NOT NULL,
`addTime` int(11) NOT NULL,
`typeId` int(11) NOT NULL,
`automated` tinyint(4) NOT NULL,
`type` varchar(64) NOT NULL,
PRIMARY KEY (`id`),
KEY `userEmail_idx` (`userEmail`),
KEY `addTime` (`addTime`),
KEY `automated` (`automated`),
KEY `typeId` (`typeId`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=572410 DEFAULT CHARSET=utf8 |
flows | CREATE TABLE `flows` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(32) NOT NULL,
`status` varchar(128) NOT NULL,
`customerId` int(11) NOT NULL,
`createTime` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `flowType_idx` (`type`),
KEY `customerId` (`customerId`),
KEY `status` (`status`),
KEY `createTime` (`createTime`),
) ENGINE=InnoDB AUTO_INCREMENT=134127 DEFAULT CHARSET=utf8 |
You have the required indexes to perform the joins efficiently. However, it looks like MySQL is joining the tables in a less efficient manner. The EXPLAIN output shows that it is doing a full index scan of the flows table then joining the comments table.
It will probably be more efficient to read the comments table first before joining. That is, in the order you have specified in your query so that the comment set is restricted by the predicates you have supplied (probably what you intended).
Running OPTIMISE TABLE or ANALYZE TABLE can improve the decision that the query optimiser makes. Particularly on tables that have had extensive changes.
If the query optimiser still gets it wrong you can force tables to be read in the order you specify in the query by beginning your statement with SELECT STRAIGHT_JOIN or by changing the INNER JOIN to STRAIGHT_JOIN.
Related
I have two tables events and event_params
the first table stores the events with these columns
events | CREATE TABLE `events` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`project` varchar(24) NOT NULL,
`event` varchar(24) NOT NULL,
`date` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `project` (`project`,`event`)
) ENGINE=InnoDB AUTO_INCREMENT=2915335 DEFAULT CHARSET=latin1
and second stores parameters for each event with these columns
event_params | CREATE TABLE `event_params` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`event_id` int(10) unsigned NOT NULL,
`name` varchar(24) NOT NULL,
`value` varchar(524) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `event_id` (`event_id`),
KEY `value` (`value`),
) ENGINE=InnoDB AUTO_INCREMENT=20789391 DEFAULT CHARSET=latin1
now I want to get count of events those have various values on a specified parameter
I wrote this query for campaign parameter but this is too slow (15 secs to respond)
SELECT
event_params.value as campaign,
count(*) as count
FROM `events`
left join event_params on event_params.event_id = events.id
and event_params.name = 'campaign'
WHERE events.project = 'foo'
GROUP by event_params.value
and here is the EXPLAIN query result:
+----+-------------+--------------+------------+------+---------------------+----------+---------+------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------------+------------+------+---------------------+----------+---------+------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | events | NULL | ref | project | project | 26 | const | 1 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | event_params | NULL | ref | name,event_id,value | event_id | 4 | events.events.id | 4 | 100.00 | Using where |
+----+-------------+--------------+------------+------+---------------------+----------+---------+------------------+------+----------+----------------------------------------------+
can i speed up this query ?
You may try adding the following index on the event_params table, which might speed up the join:
CREATE INDEX idx1 ON event_params (event_id, name, value);
The aggregation step probably can't be optimized much because the COUNT operation involves counting each record.
Move the "campaign value" into the main table, with a suitable length for VARCHAR and then
SELECT
campaign,
count(*) as count
FROM `events`
WHERE project = 'foo'
GROUP by campaign
And have
INDEX(project, campaign)
A bit of advice when tempted to use EAV: Move the 'important' values into the main table; leave only the rarely used or rarely set 'values' in the other table. Also (assuming there are no dups), have
PRIMARY KEY(event_id, name)
More discussion: http://mysql.rjweb.org/doc.php/eav
I currently join 5 tables to select 20 objects to show the user, unfortunately if I use GROUP BY and ORDER BY it gets really slow.
An example query looks Like this:
SELECT r.name, l.name, o.typ, o.id, persons, children, description, rating, totalratings, minprice, picture FROM angebote as a
JOIN objekte as o ON a.fid_objekt = o.id
JOIN regionen as r ON a.fid_region = r.id
JOIN laender as l ON a.fid_land = l.id
WHERE l.slug="aegypten" AND a.letztes_angebot >= 1
GROUP BY a.fid_objekt ORDER BY rating DESC LIMIT 0,20
The EXPLAIN of the Query shows this:
+------+-------------+-------+--------+----------------------------+------------+---------+---------------------------------------+--------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+--------+----------------------------+------------+---------+---------------------------------------+--------+--------------------------------------------------------+
| 1 | SIMPLE | l | ref | PRIMARY,slug | slug | 767 | const | 1 | Using index condition; Using temporary; Using filesort |
| 1 | SIMPLE | o | ALL | PRIMARY | NULL | NULL | NULL | 186779 | Using join buffer (flat, BNL join) |
| 1 | SIMPLE | a | ref | unique_key,letztes_angebot | unique_key | 8 | ferienhaeuser.o.id,ferienhaeuser.l.id | 1 | Using where |
| 1 | SIMPLE | r | eq_ref | PRIMARY | PRIMARY | 4 | ferienhaeuser.a.fid_region | 1 | |
+------+-------------+-------+--------+----------------------------+------------+---------+---------------------------------------+--------+--------------------------------------------------------+
So it looks like it doesn't use a key for the table objekte, the Profiling says it uses 2.7s for Copying to tmp table.
Instead of FROM angebote or JOIN objekte I tried it with (SELECT * GROUP BY id) but unfortunately this doesn't improve.
The fields used for WHERE, ORDER BY and GROUP BY are also indexed.
I think I missed some basic concept here and any help will be appreciated.
Since it's most probable I made a mistake with the Tables, here the description of them:
Objekte
+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| objekte | CREATE TABLE `objekte` (
`id` int(11) NOT NULL,
`typ` varchar(50) NOT NULL,
`persons` int(11) NOT NULL,
`children` int(11) NOT NULL,
`description` text NOT NULL,
`rating` float NOT NULL,
`totalratings` int(11) NOT NULL,
`minprice` float NOT NULL,
`picture` varchar(255) NOT NULL,
`last_offer` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `minprice` (`minprice`),
KEY `rating` (`rating`),
KEY `last_offer` (`last_offer`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Angebote
+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| angebote | CREATE TABLE `angebote` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fid_objekt` int(11) NOT NULL,
`fid_land` int(11) NOT NULL,
`fid_region` int(11) NOT NULL,
`fid_subregion` int(11) NOT NULL,
`letztes_angebot` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_key` (`fid_objekt`,`fid_land`,`fid_region`,`fid_subregion`),
KEY `letztes_angebot` (`letztes_angebot`),
KEY `fid_objekt` (`fid_objekt`),
KEY `fid_land` (`fid_land`),
KEY `fid_region` (`fid_region`),
KEY `fid_subregion` (`fid_subregion`)
) ENGINE=InnoDB AUTO_INCREMENT=2433073 DEFAULT CHARSET=utf8 |
+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
laender, regionen, subregionen (same structure)
+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| laender | CREATE TABLE `laender` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`iso` varchar(2) NOT NULL,
`name` varchar(255) NOT NULL,
`slug` varchar(255) NOT NULL,
`letztes_angebot` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `iso` (`iso`),
KEY `slug` (`slug`),
KEY `letztes_angebot` (`letztes_angebot`)
) ENGINE=InnoDB AUTO_INCREMENT=107 DEFAULT CHARSET=utf8 |
+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
First of all this is a non standard group by. As such it will stop working when you upgrade to mysql 5.7.
The biggest problem comes from the fact that no index is used on the objekte table. To make matters worse you are ordering on the ratings field on that table but the index is still not being used. A possible solution is to create a composite index like this:
CREATE INDEX objekte_idx ON objekte(id,rating);
You do not need to use GROUP BY here. You have not use aggregrate functions. So remove GROUP BY from query. Remove the Group By will increase query performance. Also no need to define 0 for limit.
SELECT r.name, l.name, o.typ, o.id, persons, children, description, rating, totalratings, minprice, picture FROM angebote as a
JOIN objekte as o ON a.fid_objekt = o.id
JOIN regionen as r ON a.fid_region = r.id
JOIN laender as l ON a.fid_land = l.id
WHERE l.slug="aegypten" AND a.letztes_angebot >= 1
ORDER BY rating DESC LIMIT 20
Below is my EXPLAIN query and the output. I'm very much a beginner (please forgive my SQL syntax...unless that's my problem!) - can anyone explain the order of the tables here please? I've played around with the order (in the query itself) and yet the TABLE artists is always top in the EXPLAIN output? I gather the order relates to when the tables are accessed - if so, why artists first?
EXPLAIN
SELECT album_name, artist_name, genre_name
FROM albums
JOIN genres USING (genre_pk)
JOIN artists USING (artist_pk)
ORDER BY album_name;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+--------------------+-----------+---------+-------------------------+------+---------------------------------+
| 1 | SIMPLE | artists | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using temporary; Using filesort |
| 1 | SIMPLE | albums | ref | genre_pk,artist_pk | artist_pk | 2 | music.artists.artist_pk | 1 | NULL |
| 1 | SIMPLE | genres | eq_ref | PRIMARY | PRIMARY | 1 | music.albums.genre_pk | 1 | NULL |
SHOW CREATE TABLE info:
CREATE TABLE `artists` (
`artist_pk` smallint(4) unsigned NOT NULL AUTO_INCREMENT,
`artist_name` varchar(57) NOT NULL,
`artist_origin` enum('UK','US','OTHER') DEFAULT NULL,
PRIMARY KEY (`artist_pk`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;
CREATE TABLE `genres` (
`genre_pk` tinyint(2) unsigned NOT NULL AUTO_INCREMENT,
`genre_name` varchar(30) NOT NULL,
PRIMARY KEY (`genre_pk`),
UNIQUE KEY `genre_name` (`genre_name`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;
CREATE TABLE `albums` (
`album_pk` smallint(4) unsigned NOT NULL AUTO_INCREMENT,
`genre_pk` tinyint(2) unsigned NOT NULL,
`artist_pk` smallint(4) unsigned NOT NULL,
`album_name` varchar(57) NOT NULL,
`album_year` year(4) DEFAULT NULL,
`album_track_qty` tinyint(2) unsigned NOT NULL,
`album_disc_num` char(6) NOT NULL DEFAULT '1 of 1',
PRIMARY KEY (`album_pk`),
KEY `genre_pk` (`genre_pk`),
KEY `artist_pk` (`artist_pk`),
FULLTEXT KEY `album_name` (`album_name`),
CONSTRAINT `albums_ibfk_1` FOREIGN KEY (`genre_pk`) REFERENCES `genres` (`genre_pk`),
CONSTRAINT `albums_ibfk_2` FOREIGN KEY (`artist_pk`) REFERENCES `artists` (`artist_pk`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
The order of joining your tables is depending on the SQL optimizer. The optimizer internally modifies your query to deliver results in a fast and efficient way (read this page for more details). To avoid internal join optimization you can use SELECT STRAIGHT_JOIN.
In your special case, the order is depending on the number of rows in each table and the availability of indexes. Have a look at these slides starting with page 25 for a little example.
Here is the fiddle for you: http://sqlfiddle.com/#!2/a6224/2/0
As #Daniel already said, MySQL takes into account not only indices, but also the number of rows in each table. The number of rows is low both in my fiddle and in your database - so it is hard to blame MySQL.
Note that even though STRAIGHT_JOIN will make the order of joins seem logical to you, it will not however make the execution plan prettier (I mean Using temporary; Using filesort red flags)
I have this query (shown below) which currently uses temporary and filesort in order to generate a grouped by set of ordered results. I would like to get rid of their usage if possible. I have looked into the underlying indexes used in this query and I just can't see what is missing.
SELECT
b.institutionid AS b__institutionid,
b.name AS b__name,
COUNT(DISTINCT f2.facebook_id) AS f2__0
FROM education_institutions b
LEFT JOIN facebook_education_matches f ON b.institutionid = f.institutionid
LEFT JOIN facebook_education f2 ON f.school_uid = f2.school_uid
WHERE
(
b.approved = '1'
AND f2.facebook_id IN ( [lots of facebook ids here ])
)
GROUP BY b__institutionid
ORDER BY f2__0 DESC
LIMIT 10
Here is the output for EXPLAIN EXTENDED :
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | f | index | PRIMARY,institutionId | institutionId | 4 | NULL | 308 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | f2 | ref | facebook_id_idx,school_uid_idx | school_uid_idx | 9 | f.school_uid | 1 | 100.00 | Using where |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | f.institutionId | 1 | 100.00 | Using where |
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
The CREATE TABLE statements for each table are shown below so you know the schema.
CREATE TABLE facebook_education (
education_id int(11) NOT NULL AUTO_INCREMENT,
name varchar(255) DEFAULT NULL,
school_uid bigint(20) DEFAULT NULL,
school_type varchar(255) DEFAULT NULL,
year smallint(6) DEFAULT NULL,
facebook_id bigint(20) DEFAULT NULL,
degree varchar(255) DEFAULT NULL,
PRIMARY KEY (education_id),
KEY facebook_id_idx (facebook_id),
KEY school_uid_idx (school_uid),
CONSTRAINT facebook_education_facebook_id_facebook_user_facebook_id FOREIGN KEY (facebook_id) REFERENCES facebook_user (facebook_id)
) ENGINE=InnoDB AUTO_INCREMENT=484 DEFAULT CHARSET=utf8;
CREATE TABLE facebook_education_matches (
school_uid bigint(20) NOT NULL,
institutionId int(11) NOT NULL,
created_at timestamp NULL DEFAULT NULL,
updated_at timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (school_uid),
KEY institutionId (institutionId),
CONSTRAINT fk_facebook_education FOREIGN KEY (school_uid) REFERENCES facebook_education (school_uid) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT fk_education_institutions FOREIGN KEY (institutionId) REFERENCES education_institutions (institutionId) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT;
CREATE TABLE education_institutions (
institutionId int(11) NOT NULL AUTO_INCREMENT,
name varchar(100) NOT NULL,
type enum('School','Degree') DEFAULT NULL,
approved tinyint(1) NOT NULL DEFAULT '0',
deleted tinyint(1) NOT NULL DEFAULT '0',
normalisedName varchar(100) NOT NULL,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (institutionId)
) ENGINE=InnoDB AUTO_INCREMENT=101327 DEFAULT CHARSET=utf8;
Any guidance would be greatly appreciated.
The filesort probably happens because you have no suitable index for the ORDER BY
It's mentioned in the MySQL "ORDER BY Optimization" docs.
What you can do is load a temp table, select from that afterwards. When you load the temp table, use ORDER BY NULL. When you select from the temp table, use ORDER BY .. LIMIT
The issue is that group by adds an implicit order by <group by clause> ASC unless you disable that behavior by adding a order by null.
It's one of those MySQL specific gotcha's.
I can see two possible optimizations,
b.approved = '1' - You definitely need an index on approved column for quick filtering.
f2.facebook_id IN ( [lots of facebook ids here ]) ) - Store the facebook ids in a temp table,. Then create an index on the temp table and then join with the temp table instead of using IN clause.
I have this table:
CREATE TABLE `point` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`siteid` INT(11) NOT NULL,
`lft` INT(11) DEFAULT NULL,
`rgt` INT(11) DEFAULT NULL,
`level` SMALLINT(6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `point_siteid_site_id` (`siteid`),
CONSTRAINT `point_siteid_site_id` FOREIGN KEY (`siteid`) REFERENCES `site` (`id`) ON DELETE CASCADE
) ENGINE=INNODB AUTO_INCREMENT=35 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
And this query:
SELECT * FROM `point` WHERE siteid = 1;
Which results in this EXPLAIN information:
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | point | ALL | point_siteid_site_id | NULL | NULL | NULL | 6 | Using where |
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
Question is, why isn't the query using the point_siteid_site_id index?
I haven't used MySQL much, but in PostgreSQL if the number of records is small in the table, it can decide not to use the index. This is not a problem, because it chooses the best query plan for the situation. When the number of records is bigger, it will use the index. Maybe this is the same case here with MySQL.
Edit: Are you sure the foreign key is indexed?
It's probably because you don't have enough sample data, or the cardinality (diversity) of the siteid values isn't very large. These are the most common reasons the optimizer will find it quicker just to read all the values.