I have this table:
CREATE TABLE `point` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`siteid` INT(11) NOT NULL,
`lft` INT(11) DEFAULT NULL,
`rgt` INT(11) DEFAULT NULL,
`level` SMALLINT(6) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `point_siteid_site_id` (`siteid`),
CONSTRAINT `point_siteid_site_id` FOREIGN KEY (`siteid`) REFERENCES `site` (`id`) ON DELETE CASCADE
) ENGINE=INNODB AUTO_INCREMENT=35 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
And this query:
SELECT * FROM `point` WHERE siteid = 1;
Which results in this EXPLAIN information:
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | point | ALL | point_siteid_site_id | NULL | NULL | NULL | 6 | Using where |
+----+-------------+-------+------+----------------------+------+---------+------+------+-------------+
Question is, why isn't the query using the point_siteid_site_id index?
I haven't used MySQL much, but in PostgreSQL if the number of records is small in the table, it can decide not to use the index. This is not a problem, because it chooses the best query plan for the situation. When the number of records is bigger, it will use the index. Maybe this is the same case here with MySQL.
Edit: Are you sure the foreign key is indexed?
It's probably because you don't have enough sample data, or the cardinality (diversity) of the siteid values isn't very large. These are the most common reasons the optimizer will find it quicker just to read all the values.
Related
I am trying to prevent mysql from creating a temporary table in this query
SELECT `vendor_id`,SUM(`qty`) AS `qty`
FROM `inventory_transactions`
WHERE `inventory_transactions`.`date`
BETWEEN '2018-10-21 00:00:00' AND '2018-10-22 23:59:59'
GROUP BY `vendor_id`
I've tried re-arranging the indexes, using SELECT DISTINCT, MIN(vendor_id),MAX(vendor_id), adding a COUNT(*) Column in an attempt to see if it would use the index to sort.
I've had ix(date,type,vendor_id) in every variation as well as individual indexes.
I just can't seem to figure out why mysql keeps trying to sort from a temporary table, the only way it doesn't use a temp table is if i group by the date column, which is not what I want.
Anyone have any insight as to how to fix it?
Table Schema
CREATE TABLE `transactions` (
`id` int(11) NOT NULL,
`vendor_id` int(11) DEFAULT NULL,
`type` varchar(50) DEFAULT NULL,
`unit_cost` decimal(10,2) NOT NULL,
`qty` decimal(11,2) DEFAULT NULL,
`location_id` int(11) NOT NULL,
`date` timestamp NOT NULL,
PRIMARY KEY (`id`),
KEY `location_id` (`location_id`),
KEY `type` (`type`),
KEY `vendor_id` (`vendor_id`) USING BTREE,
KEY `date` (`date`,`vendor_id`,`type`) USING BTREE,
CONSTRAINT `transactions_ibfk_1` FOREIGN KEY (`location_id`) REFERENCES `locations` (`id`),
CONSTRAINT `transactions_ibfk_3` FOREIGN KEY (`vendor_id`) REFERENCES `vendors` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
EXPLAIN
+----+-------------+------------------------+------------+-------+----------------------+------------+---------+------+------+----------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------------+------------+-------+----------------------+------------+---------+------+------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | inventory_transactions | NULL | range | vendor_id,date | date | 4 | NULL | 1196 | 100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+------------------------+------------+-------+----------------------+------------+---------+------+------+----------+-----------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
CREATE TABLE ofRoster (
`rosterID` bigint(20) NOT NULL,
`username` varchar(64) NOT NULL,
`jid` varchar(1024) NOT NULL,
`sub` tinyint(4) NOT NULL,
`ask` tinyint(4) NOT NULL,
`recv` tinyint(4) NOT NULL,
`nick` varchar(255) DEFAULT NULL,
PRIMARY KEY (`rosterID`),
KEY `ofRoster_unameid_idx` (`username`),
KEY `ofRoster_jid_idx` (`jid`(255))
) ENGINE=InnoDB;
CREATE TABLE `ofRoster_par` (
`rosterID` bigint(20) NOT NULL AUTO_INCREMENT,
`username` int(64) NOT NULL,
`jid` varchar(1024) NOT NULL,
`sub` tinyint(4) NOT NULL,
`ask` tinyint(4) NOT NULL,
`recv` tinyint(4) NOT NULL,
`nick` varchar(255) DEFAULT NULL,
UNIQUE KEY `rosterID` (`rosterID`,`username`),
KEY `ofRoster_unameid_idx` (`username`),
KEY `ofRoster_jid_idx` (`jid`(255))
) ENGINE=InnoDB AUTO_INCREMENT=412595 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (username)
PARTITIONS 10 */ ;
I created partition on username so that when i use select command it need to search on one partition only.
But i am not sure if this will be benifitial as there is already a index on username.
explain SELECT count(*) FROM ofRoster_par WHERE username='1';
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
| 1 | SIMPLE | ofRoster_par | ref | ofRoster_unameid_idx | ofRoster_unameid_idx | 4 | const | 120 | Using index |
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
explain SELECT count(*) FROM ofRoster WHERE username='1';
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
| 1 | SIMPLE | ofRoster | ref | ofRoster_unameid_idx | ofRoster_unameid_idx | 66 | const | 120 | Using where; Using index |
Right now there are just 400 000 records on the table but on the production records will be around 80 million.
Time taken by both query is also the same :-(
PARTITION BY HASH is, in my opinion, useless.
In your example, INDEX(username) on a non-partitioned table would probably be faster than using PARTITION BY HASH(username).
You already have such an index. How fast was it?
Here's what is happening:
With partitioning:
pick partition
use KEY(username) (and not the data) to do the COUNT(*) inside the index (note "Using index")
Without partitioning:
use KEY(username) (and not the data) to do the COUNT(*) inside the index (note "Using index")
Other comments:
If username is unique, consider making it the PRIMARY KEY and get rid of rosterID. (You may want to keep rosterID because it is smaller and used for JOINing to several other tables.)
Bug: You say INT(64) where you meant VARCHAR(64). This may have impacted your timing test.
"Prefix indexes" (jid(255)) are rarely useful. Let's see how you are using it.
80M rows does not warrant BIGINT (8 bytes); INT UNSIGNED (4 bytes) can handle 400 crore.
You understand that latin1 limits you to western European languages?
When using EXPLAIN with partitioned tables, use EXPLAIN PARTITIONS SELECT .... You may get some surprises.
Below is my EXPLAIN query and the output. I'm very much a beginner (please forgive my SQL syntax...unless that's my problem!) - can anyone explain the order of the tables here please? I've played around with the order (in the query itself) and yet the TABLE artists is always top in the EXPLAIN output? I gather the order relates to when the tables are accessed - if so, why artists first?
EXPLAIN
SELECT album_name, artist_name, genre_name
FROM albums
JOIN genres USING (genre_pk)
JOIN artists USING (artist_pk)
ORDER BY album_name;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+--------------------+-----------+---------+-------------------------+------+---------------------------------+
| 1 | SIMPLE | artists | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using temporary; Using filesort |
| 1 | SIMPLE | albums | ref | genre_pk,artist_pk | artist_pk | 2 | music.artists.artist_pk | 1 | NULL |
| 1 | SIMPLE | genres | eq_ref | PRIMARY | PRIMARY | 1 | music.albums.genre_pk | 1 | NULL |
SHOW CREATE TABLE info:
CREATE TABLE `artists` (
`artist_pk` smallint(4) unsigned NOT NULL AUTO_INCREMENT,
`artist_name` varchar(57) NOT NULL,
`artist_origin` enum('UK','US','OTHER') DEFAULT NULL,
PRIMARY KEY (`artist_pk`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;
CREATE TABLE `genres` (
`genre_pk` tinyint(2) unsigned NOT NULL AUTO_INCREMENT,
`genre_name` varchar(30) NOT NULL,
PRIMARY KEY (`genre_pk`),
UNIQUE KEY `genre_name` (`genre_name`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;
CREATE TABLE `albums` (
`album_pk` smallint(4) unsigned NOT NULL AUTO_INCREMENT,
`genre_pk` tinyint(2) unsigned NOT NULL,
`artist_pk` smallint(4) unsigned NOT NULL,
`album_name` varchar(57) NOT NULL,
`album_year` year(4) DEFAULT NULL,
`album_track_qty` tinyint(2) unsigned NOT NULL,
`album_disc_num` char(6) NOT NULL DEFAULT '1 of 1',
PRIMARY KEY (`album_pk`),
KEY `genre_pk` (`genre_pk`),
KEY `artist_pk` (`artist_pk`),
FULLTEXT KEY `album_name` (`album_name`),
CONSTRAINT `albums_ibfk_1` FOREIGN KEY (`genre_pk`) REFERENCES `genres` (`genre_pk`),
CONSTRAINT `albums_ibfk_2` FOREIGN KEY (`artist_pk`) REFERENCES `artists` (`artist_pk`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
The order of joining your tables is depending on the SQL optimizer. The optimizer internally modifies your query to deliver results in a fast and efficient way (read this page for more details). To avoid internal join optimization you can use SELECT STRAIGHT_JOIN.
In your special case, the order is depending on the number of rows in each table and the availability of indexes. Have a look at these slides starting with page 25 for a little example.
Here is the fiddle for you: http://sqlfiddle.com/#!2/a6224/2/0
As #Daniel already said, MySQL takes into account not only indices, but also the number of rows in each table. The number of rows is low both in my fiddle and in your database - so it is hard to blame MySQL.
Note that even though STRAIGHT_JOIN will make the order of joins seem logical to you, it will not however make the execution plan prettier (I mean Using temporary; Using filesort red flags)
mysql> explain
select c.userEmail,f.customerId
from comments c
inner join flows f
on (f.id = c.typeId)
inner join users u
on (u.email = c.userEmail)
where c.addTime >= 1372617000
and c.addTime <= 1374776940
and c.type = 'flow'
and c.automated = 0;
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| 1 | SIMPLE | f | index | PRIMARY | customerId | 4 | NULL | 144443 | Using index |
| 1 | SIMPLE | c | ref | userEmail_idx,addTime,automated,typeId | typeId | 198 | f.id,const | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | email | email | 386 | c.userEmail | 1 | Using index |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
How do I make the above query faster - it constantly shows up in the slow query logs.
Indexes present :
id is the auto incremented primary key of the flows table.
customerId of flows table.
userEmail of comments table.
composite index (typeId,type) on comments table.
email of users table (unique)
automated of comments table.
addTime of comments table.
Number of rows :
1. flows - 150k
2. comments - 500k (half of them have automated = 1 and others have automated = 0) (also value of type is 'flow' for all the rows except 500)
3. users - 50
Table schemas :
users | CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=56 DEFAULT CHARSET=utf8
comments | CREATE TABLE `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`userEmail` varchar(128) DEFAULT NULL,
`content` mediumtext NOT NULL,
`addTime` int(11) NOT NULL,
`typeId` int(11) NOT NULL,
`automated` tinyint(4) NOT NULL,
`type` varchar(64) NOT NULL,
PRIMARY KEY (`id`),
KEY `userEmail_idx` (`userEmail`),
KEY `addTime` (`addTime`),
KEY `automated` (`automated`),
KEY `typeId` (`typeId`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=572410 DEFAULT CHARSET=utf8 |
flows | CREATE TABLE `flows` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(32) NOT NULL,
`status` varchar(128) NOT NULL,
`customerId` int(11) NOT NULL,
`createTime` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `flowType_idx` (`type`),
KEY `customerId` (`customerId`),
KEY `status` (`status`),
KEY `createTime` (`createTime`),
) ENGINE=InnoDB AUTO_INCREMENT=134127 DEFAULT CHARSET=utf8 |
You have the required indexes to perform the joins efficiently. However, it looks like MySQL is joining the tables in a less efficient manner. The EXPLAIN output shows that it is doing a full index scan of the flows table then joining the comments table.
It will probably be more efficient to read the comments table first before joining. That is, in the order you have specified in your query so that the comment set is restricted by the predicates you have supplied (probably what you intended).
Running OPTIMISE TABLE or ANALYZE TABLE can improve the decision that the query optimiser makes. Particularly on tables that have had extensive changes.
If the query optimiser still gets it wrong you can force tables to be read in the order you specify in the query by beginning your statement with SELECT STRAIGHT_JOIN or by changing the INNER JOIN to STRAIGHT_JOIN.
I have this query (shown below) which currently uses temporary and filesort in order to generate a grouped by set of ordered results. I would like to get rid of their usage if possible. I have looked into the underlying indexes used in this query and I just can't see what is missing.
SELECT
b.institutionid AS b__institutionid,
b.name AS b__name,
COUNT(DISTINCT f2.facebook_id) AS f2__0
FROM education_institutions b
LEFT JOIN facebook_education_matches f ON b.institutionid = f.institutionid
LEFT JOIN facebook_education f2 ON f.school_uid = f2.school_uid
WHERE
(
b.approved = '1'
AND f2.facebook_id IN ( [lots of facebook ids here ])
)
GROUP BY b__institutionid
ORDER BY f2__0 DESC
LIMIT 10
Here is the output for EXPLAIN EXTENDED :
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | f | index | PRIMARY,institutionId | institutionId | 4 | NULL | 308 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | f2 | ref | facebook_id_idx,school_uid_idx | school_uid_idx | 9 | f.school_uid | 1 | 100.00 | Using where |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4 | f.institutionId | 1 | 100.00 | Using where |
+----+-------------+-------+--------+--------------------------------+----------------+---------+----------------------------------+------+----------+----------------------------------------------+
The CREATE TABLE statements for each table are shown below so you know the schema.
CREATE TABLE facebook_education (
education_id int(11) NOT NULL AUTO_INCREMENT,
name varchar(255) DEFAULT NULL,
school_uid bigint(20) DEFAULT NULL,
school_type varchar(255) DEFAULT NULL,
year smallint(6) DEFAULT NULL,
facebook_id bigint(20) DEFAULT NULL,
degree varchar(255) DEFAULT NULL,
PRIMARY KEY (education_id),
KEY facebook_id_idx (facebook_id),
KEY school_uid_idx (school_uid),
CONSTRAINT facebook_education_facebook_id_facebook_user_facebook_id FOREIGN KEY (facebook_id) REFERENCES facebook_user (facebook_id)
) ENGINE=InnoDB AUTO_INCREMENT=484 DEFAULT CHARSET=utf8;
CREATE TABLE facebook_education_matches (
school_uid bigint(20) NOT NULL,
institutionId int(11) NOT NULL,
created_at timestamp NULL DEFAULT NULL,
updated_at timestamp NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (school_uid),
KEY institutionId (institutionId),
CONSTRAINT fk_facebook_education FOREIGN KEY (school_uid) REFERENCES facebook_education (school_uid) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT fk_education_institutions FOREIGN KEY (institutionId) REFERENCES education_institutions (institutionId) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT;
CREATE TABLE education_institutions (
institutionId int(11) NOT NULL AUTO_INCREMENT,
name varchar(100) NOT NULL,
type enum('School','Degree') DEFAULT NULL,
approved tinyint(1) NOT NULL DEFAULT '0',
deleted tinyint(1) NOT NULL DEFAULT '0',
normalisedName varchar(100) NOT NULL,
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (institutionId)
) ENGINE=InnoDB AUTO_INCREMENT=101327 DEFAULT CHARSET=utf8;
Any guidance would be greatly appreciated.
The filesort probably happens because you have no suitable index for the ORDER BY
It's mentioned in the MySQL "ORDER BY Optimization" docs.
What you can do is load a temp table, select from that afterwards. When you load the temp table, use ORDER BY NULL. When you select from the temp table, use ORDER BY .. LIMIT
The issue is that group by adds an implicit order by <group by clause> ASC unless you disable that behavior by adding a order by null.
It's one of those MySQL specific gotcha's.
I can see two possible optimizations,
b.approved = '1' - You definitely need an index on approved column for quick filtering.
f2.facebook_id IN ( [lots of facebook ids here ]) ) - Store the facebook ids in a temp table,. Then create an index on the temp table and then join with the temp table instead of using IN clause.