I've been trying to wrap my head around this for a good while, but had no luck. I have a simple queue system implemented on my small site and a cron job to check if there are any items in the queue. It's supposed to fetch several items ordered by priority and process them, but for some reason the priority index gets ignored. My create table syntax is
CREATE TABLE `site_queue` (
`row_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`task` tinyint(3) unsigned NOT NULL COMMENT '0 - email',
`priority` int(10) unsigned DEFAULT NULL,
`commands` text NOT NULL,
`added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`row_id`),
KEY `task` (`task`),
KEY `priority` (`priority`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
The query to fetch queued items is
SELECT `row_id`, `task`, `commands` FROM `site_queue` ORDER BY `priority` DESC LIMIT 5;
The EXPLAIN query returns the following:
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | site_queue | ALL | NULL | NULL | NULL | NULL | 1269 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
Can anyone offer some insight on what might be causing this?
Because with only a few rows (originally 4, then increased to about 1k) there is no reason to use the index: it would be slower, since MySQL would have to read both index and data pages too many times.
So the rule of thumb for MySQL query optimization: test against a reasonably large amount of data, ideally comparable in size to your real production data.
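One way to check this on a small test table is to compare the optimizer's default plan with a plan where the index is forced. This is only a sketch: FORCE INDEX is used here purely for diagnosis, with the table and index names taken from the question, and the resulting plan will depend on your data and MySQL version.

```sql
-- Default plan: the optimizer is free to pick a full scan plus filesort.
EXPLAIN SELECT `row_id`, `task`, `commands`
FROM `site_queue`
ORDER BY `priority` DESC
LIMIT 5;

-- Diagnostic only: FORCE INDEX makes a table scan look very expensive,
-- so the optimizer prefers the `priority` index if it can use it here.
EXPLAIN SELECT `row_id`, `task`, `commands`
FROM `site_queue` FORCE INDEX (`priority`)
ORDER BY `priority` DESC
LIMIT 5;
```

If the forced plan is not measurably faster on production-sized data, the optimizer's original choice was probably reasonable.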
The problem occurs with MySQL version 5.7.18. Earlier versions of MySQL behave as expected.
Here are two tables. Table 1:
CREATE TABLE `test_events` (
`id` int(11) NOT NULL,
`event` int(11) DEFAULT '0',
`manager` int(11) DEFAULT '0',
`base_id` int(11) DEFAULT '0',
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`client` int(11) DEFAULT '0',
`event_time` datetime DEFAULT '0000-00-00 00:00:00'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `test_events`
ADD PRIMARY KEY (`id`),
ADD KEY `client` (`client`),
ADD KEY `event_time` (`event_time`),
ADD KEY `manager` (`manager`),
ADD KEY `base_id` (`base_id`),
ADD KEY `create_time` (`create_time`);
And the second table:
CREATE TABLE `test_event_types` (
`id` int(11) NOT NULL,
`name` varchar(255) DEFAULT NULL,
`create_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`base` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `test_event_types`
ADD PRIMARY KEY (`id`);
Let's try to select the last event from base "314":
EXPLAIN SELECT `test_events`.`create_time`
FROM `test_events`
LEFT JOIN `test_event_types`
ON ( `test_events`.`event` = `test_event_types`.`id` )
WHERE base = 314
ORDER BY `test_events`.`create_time` DESC
LIMIT 1;
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
| 1 | SIMPLE | test_events | NULL | ALL | NULL | NULL | NULL | NULL | 434928 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | test_event_types | NULL | ALL | PRIMARY | NULL | NULL | NULL | 44 | 2.27 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+------------------+------------+------+---------------+------+---------+------+--------+----------+----------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
MySQL is not using an index and reads the whole table.
Without the WHERE clause:
EXPLAIN SELECT `test_events`.`create_time`
FROM `test_events`
LEFT JOIN `test_event_types`
ON ( `test_events`.`event` = `test_event_types`.`id` )
ORDER BY `test_events`.`create_time` DESC
LIMIT 1;
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
| 1 | SIMPLE | test_events | NULL | index | NULL | create_time | 4 | NULL | 1 | 100.00 | NULL |
| 1 | SIMPLE | test_event_types | NULL | eq_ref | PRIMARY | PRIMARY | 4 | m16.test_events.event | 1 | 100.00 | Using index |
+----+-------------+------------------+------------+--------+---------------+-------------+---------+-----------------------+------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
Now it uses the index.
MySQL 5.5.55 uses the index in both cases. Why is that, and what can I do about it?
I don't know what difference you are seeing between your previous and current installations, but the server's behaviour makes sense.
SELECT test_events.create_time FROM test_events LEFT JOIN test_event_types ON ( test_events.event = test_event_types.id ) ORDER BY test_events.create_time DESC LIMIT 1;
In this query you do not have a WHERE clause, but you are fetching only one row, and that is after sorting by create_time, which happens to have an index that can be used for sorting. But let's look at the second query.
SELECT test_events.create_time FROM test_events LEFT JOIN test_event_types ON ( test_events.event = test_event_types.id ) WHERE base = 314 ORDER BY test_events.create_time DESC LIMIT 1
You don't have an index on the base column, so no index can be used for that condition. To find the relevant records MySQL has to do a table scan. Having identified the relevant rows, it then needs to sort them. In this case the query planner has decided that it's just not worth using the index on create_time.
I see several problems with your setup, the first being the lack of an index on base, as already mentioned. But why is base a varchar? You appear to be storing integers in it.
ALTER TABLE test_events
ADD PRIMARY KEY (id),
ADD KEY client (client),
ADD KEY event_time (event_time),
ADD KEY manager (manager),
ADD KEY base_id (base_id),
ADD KEY create_time (create_time);
And creating many single-column indexes like this doesn't make much sense in MySQL. That's because MySQL generally uses only one index per table in a query (index merge is the main exception). You would be far better off with one or two indexes, possibly multi-column ones.
I think your ideal index would contain both the create_time and event fields.
base = 314 with base VARCHAR... is a performance problem. Either put quotes around 314 or make base some integer type.
You appear not to need LEFT. If so, use a plain JOIN so that the optimizer has the freedom to start with test_event_types using an INDEX(base), which is currently missing and needed.
As for the differences between 5.5 and 5.6 and 5.7, there have been a number of Optimization changes; you may have encountered a regression. But I don't want to chase that until you have improved the query and indexes.
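The suggestions above (an index on base, a quoted string literal, a plain JOIN, and a composite index on the events table) could be combined roughly as follows. This is a sketch under assumptions: the index names `base` and `event_create_time` are made up for illustration, the column order of the composite index is one plausible choice rather than a confirmed optimum, and it presumes that dropping LEFT does not change the intended result.

```sql
-- Hypothetical index names; adjust to your own conventions.
ALTER TABLE `test_event_types` ADD KEY `base` (`base`);
ALTER TABLE `test_events` ADD KEY `event_create_time` (`event`, `create_time`);

-- Plain JOIN, and '314' quoted because base is a VARCHAR column.
EXPLAIN SELECT e.`create_time`
FROM `test_events` AS e
JOIN `test_event_types` AS t ON e.`event` = t.`id`
WHERE t.`base` = '314'
ORDER BY e.`create_time` DESC
LIMIT 1;
```

Comparing this EXPLAIN output with the original one should show whether the full scan of test_events has gone away on your version.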
I stumbled upon the same scenario, where MySQL was using a table scan instead of an index lookup.
This could be due to one of the reasons mentioned in the MySQL docs:
The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length.
mysql docs link
And when I checked the EXPLAIN output for the query on the production server, which has a large number of rows, it used the index as expected.
It's one of MySQL's optimizations under the hood :)
Greetings.
Let me show my table schema first:
CREATE TABLE `log_table` (
`rid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`dataId` int(10) unsigned NOT NULL DEFAULT '0',
`memberId` int(10) unsigned NOT NULL DEFAULT '0',
`clientId` int(10) unsigned NOT NULL DEFAULT '0',
`qty` int(11) NOT NULL DEFAULT '0',
`timestamp` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`typeA` tinyint(2) DEFAULT NULL,
`typeB` int(11) DEFAULT '0',
PRIMARY KEY (`rid`,`timestamp`),
KEY `idx_report1` (`timestamp`,`memberId`,`dataId`),
KEY `idx_report2` (`memberId`,`timestamp`),
KEY `idx_report3` (`dataId`,`timestamp`,`rid`),
KEY `idx_report4` (`timestamp`,`typeB`,`typeA`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE (year(`timestamp`))
(PARTITION p2014 VALUES LESS THAN (2015),
PARTITION p2015 VALUES LESS THAN (2016)
);
I'm using MariaDB 5.5 and this table contains 25 million records, so I decided to partition the table to prevent performance issues that may occur in the near future.
As you can see, it's time-series log data, and there are 4 views on it. For example, one of the views uses the following query:
select typeB, typeA, count(*) as number from log_table where timestamp between '2015-1-1' and '2015-2-1' group by typeB, typeA;
AFAIK, thanks to partition pruning this query loads data from p2015 only. But I saw little difference in query execution time between the original table and the partitioned version (avg 1.94 sec vs 1.95 sec).
Hm, I thought it might be influenced by the number of rows in each partition. Then how about smaller partitions, using to_days()?
PARTITION BY RANGE (to_days(`timestamp`))
(
...
PARTITION p_2015_01 VALUES LESS THAN (to_days('2015-2-1')),
PARTITION p_2015_02 VALUES LESS THAN (to_days('2015-3-1'))
...
)
Well, it has no effect. Could you tell me what I'm missing?
EDIT: sorry for the error in my query. By the way, EXPLAIN PARTITIONS doesn't help me.
The EXPLAIN results for both tables are:
// original
+------+-------------+-----------+-------+-------------------------+-------------+---------+------+---------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------+-------+-------------------------+-------------+---------+------+---------+-----------------------------------------------------------+
| 1 | SIMPLE | org_table | range | idx_report1,idx_report4 | idx_report4 | 8 | NULL | 8828000 | Using where; Using index; Using temporary; Using filesort |
+------+-------------+-----------+-------+-------------------------+-------------+---------+------+---------+-----------------------------------------------------------+
//partition
+------+-------------+-----------+-------+-------------------------+-------------+---------+------+---------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------+-------+-------------------------+-------------+---------+------+---------+-----------------------------------------------------------+
| 1 | SIMPLE | log_table | range | idx_report1,idx_report4 | idx_report4 | 8 | NULL | 7902646 | Using where; Using index; Using temporary; Using filesort |
+------+-------------+-----------+-------+-------------------------+-------------+---------+------+---------+-----------------------------------------------------------+
PARTITIONing does not help performance nearly as often as users think it will.
KEY `idx_report4` (`timestamp`,`typeB`,`typeA`)
without partitioning is optimal for the SELECT you provided. PARTITIONing will not speed it up any.
Since BETWEEN is "inclusive", where timestamp between '2015-1-1' and '2015-2-1' actually hits two partitions in the monthly TO_DAYS layout (January and February). Use EXPLAIN PARTITIONS SELECT ... to see that.
BY RANGE (TO_DAYS(...)) is probably better than BY RANGE (YEAR(...)), but still not useful for the given query.
Here is my discussion of the only 4 use cases where PARTITIONing helps performance: http://mysql.rjweb.org/doc.php/partitionmaint
If this type of query is important, consider "Summary Tables" as a way of greatly speeding up the application: http://mysql.rjweb.org/doc.php/datawarehouse and http://mysql.rjweb.org/doc.php/summarytables
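To illustrate the point about BETWEEN being inclusive: a half-open range keeps the query within January only, and EXPLAIN PARTITIONS shows which partitions are actually touched. This is a minimal sketch against the table from the question; it demonstrates the pruning behaviour and is not expected to make this particular query faster.

```sql
-- Half-open range: covers all of January 2015 and excludes
-- 2015-02-01 00:00:00, which belongs to the next monthly partition.
EXPLAIN PARTITIONS
SELECT typeB, typeA, COUNT(*) AS number
FROM log_table
WHERE `timestamp` >= '2015-01-01'
  AND `timestamp` <  '2015-02-01'
GROUP BY typeB, typeA;
```

The partitions column of the output lists the partitions the optimizer could not prune.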
Below is my EXPLAIN query and the output. I'm very much a beginner (please forgive my SQL syntax...unless that's my problem!) - can anyone explain the order of the tables here please? I've played around with the order (in the query itself) and yet the TABLE artists is always top in the EXPLAIN output? I gather the order relates to when the tables are accessed - if so, why artists first?
EXPLAIN
SELECT album_name, artist_name, genre_name
FROM albums
JOIN genres USING (genre_pk)
JOIN artists USING (artist_pk)
ORDER BY album_name;
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+--------------------+-----------+---------+-------------------------+------+---------------------------------+
| 1 | SIMPLE | artists | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using temporary; Using filesort |
| 1 | SIMPLE | albums | ref | genre_pk,artist_pk | artist_pk | 2 | music.artists.artist_pk | 1 | NULL |
| 1 | SIMPLE | genres | eq_ref | PRIMARY | PRIMARY | 1 | music.albums.genre_pk | 1 | NULL |
SHOW CREATE TABLE info:
CREATE TABLE `artists` (
`artist_pk` smallint(4) unsigned NOT NULL AUTO_INCREMENT,
`artist_name` varchar(57) NOT NULL,
`artist_origin` enum('UK','US','OTHER') DEFAULT NULL,
PRIMARY KEY (`artist_pk`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1;
CREATE TABLE `genres` (
`genre_pk` tinyint(2) unsigned NOT NULL AUTO_INCREMENT,
`genre_name` varchar(30) NOT NULL,
PRIMARY KEY (`genre_pk`),
UNIQUE KEY `genre_name` (`genre_name`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=latin1;
CREATE TABLE `albums` (
`album_pk` smallint(4) unsigned NOT NULL AUTO_INCREMENT,
`genre_pk` tinyint(2) unsigned NOT NULL,
`artist_pk` smallint(4) unsigned NOT NULL,
`album_name` varchar(57) NOT NULL,
`album_year` year(4) DEFAULT NULL,
`album_track_qty` tinyint(2) unsigned NOT NULL,
`album_disc_num` char(6) NOT NULL DEFAULT '1 of 1',
PRIMARY KEY (`album_pk`),
KEY `genre_pk` (`genre_pk`),
KEY `artist_pk` (`artist_pk`),
FULLTEXT KEY `album_name` (`album_name`),
CONSTRAINT `albums_ibfk_1` FOREIGN KEY (`genre_pk`) REFERENCES `genres` (`genre_pk`),
CONSTRAINT `albums_ibfk_2` FOREIGN KEY (`artist_pk`) REFERENCES `artists` (`artist_pk`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
The order in which your tables are joined is decided by the SQL optimizer. The optimizer internally rewrites your query to deliver results quickly and efficiently (read this page for more details). To override the optimizer's join ordering you can use SELECT STRAIGHT_JOIN.
In your particular case, the order depends on the number of rows in each table and the availability of indexes. Have a look at these slides, starting with page 25, for a small example.
Here is the fiddle for you: http://sqlfiddle.com/#!2/a6224/2/0
As @Daniel already said, MySQL takes into account not only indexes but also the number of rows in each table. The number of rows is low both in my fiddle and in your database, so it is hard to blame MySQL.
Note that even though STRAIGHT_JOIN will make the join order look logical to you, it will not make the execution plan any prettier (I mean the Using temporary; Using filesort red flags).
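For reference, applying STRAIGHT_JOIN to the query from the question would look like the sketch below. It forces the tables to be joined in the order they are written (albums, then genres, then artists); whether that is any faster on such small tables is not something this example claims.

```sql
EXPLAIN
SELECT STRAIGHT_JOIN album_name, artist_name, genre_name
FROM albums
JOIN genres USING (genre_pk)
JOIN artists USING (artist_pk)
ORDER BY album_name;
```

With this form, the first row of the EXPLAIN output should be albums rather than artists.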
I have a table with about 30 million records which I need to perform queries upon. From my reading, I thought that a composite index using leftmost prefixing with all the fields I need to select would be the correct way to do it, but when I run an explain on the query, it's not even using the index.
This is the query:
select distinct email FROM my_table
WHERE `customer_id` IN(278,428,186,40,208,247,59,79,376,73,38,52,68,227)
AND `company_id` = 4
AND `active` = 1
AND `date` > '2012-04-15';
The explain looks like this
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
| 1 | SIMPLE | emails | index | customer_id | email | 772 | NULL | 29296705 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
These are the fields
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL DEFAULT '',
`customer_id` int(10) unsigned DEFAULT NULL,
`company_id` int(10) unsigned NOT NULL,
`active` tinyint(1) unsigned NOT NULL DEFAULT '1',
`date` date DEFAULT NULL
The indexes look like this
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`,`customer_id`),
KEY `customer_id` (`customer_id`,`company_id`,`active`,`date`)
I'm not quite sure what the best way to optimize this is.
MySQL is often fussy about IN on the leftmost column of an index. Try one query per customer_id and see whether that uses your index; you can use UNION to combine them. The other possibility is that MySQL figures it's faster to sift through everything for 10% of the rows than to use the index for them.
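A sketch of the UNION approach, shown for the first three customer_id values only (the remaining ids would each get their own SELECT in the same pattern). Plain UNION removes duplicate rows, so it takes over the role of DISTINCT from the original query. Whether each branch actually uses the customer_id index should be confirmed with EXPLAIN.

```sql
SELECT email FROM my_table
WHERE `customer_id` = 278 AND `company_id` = 4
  AND `active` = 1 AND `date` > '2012-04-15'
UNION
SELECT email FROM my_table
WHERE `customer_id` = 428 AND `company_id` = 4
  AND `active` = 1 AND `date` > '2012-04-15'
UNION
SELECT email FROM my_table
WHERE `customer_id` = 186 AND `company_id` = 4
  AND `active` = 1 AND `date` > '2012-04-15';
```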
I have a number of reports involving joins on large datasets. These tables are being written to many times per second. My cronjobs run the queries at the least impactful times but still I am concerned about harming performance by locking tables with them.
Here is a simple example they requested as a one off today. It shows playtimes for a RIIA report:
SELECT
date_format(p.`played`, '%Y-%m') as `month`,
SUM(TIME_TO_SEC(s.`length`))/3600 as `playtime`
INTO OUTFILE "/tmp/120313_playtime.csv"
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
FROM
`plays` p,
`songs` s
GROUP BY `month`
How do I construct this to avoid causing issues for the radio app writing to the plays table while the query is running? Should I create temp tables and copy the live ones over?
// EDIT per request EXPLAIN output
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
| 1 | SIMPLE | s | ALL | NULL | NULL | NULL | NULL | 3909 | Using temporary; Using filesort |
| 1 | SIMPLE | p | ALL | NULL | NULL | NULL | NULL | 4040933 | Using join buffer |
+----+-------------+-------+------+---------------+------+---------+------+---------+---------------------------------+
CREATE TABLE `plays` (
`play_id` int(11) NOT NULL auto_increment,
`song_id` int(11) NOT NULL,
`played` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
PRIMARY KEY (`play_id`)
) ENGINE=MyISAM AUTO_INCREMENT=4040992 DEFAULT CHARSET=latin1 COMMENT='play counts for songs' AUTO_INCREMENT=4040992 ;
CREATE TABLE `songs` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(255) NOT NULL,
`artist_id` int(11) NOT NULL,
`length` time NOT NULL,
`album_id` int(11) NOT NULL,
`active` tinyint(4) NOT NULL,
`tracknum` varchar(16) NOT NULL,
`bitrate` varchar(32) NOT NULL,
`date_created` datetime NOT NULL,
`date_modified` timestamp NOT NULL default '0000-00-00 00:00:00' on update CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=4136 DEFAULT CHARSET=latin1 AUTO_INCREMENT=4136 ;
Just two immediate things come to mind. First, there is no join condition between plays and songs, which results in a Cartesian product. Second, add a WHERE clause: I would expect the "played" column to be a date/time, so you could query for all records < NOW(), and any rows added while the query is running would be excluded. Since it appears you are reporting monthly, you might even create a separate table that holds nothing but the running totals per time period, grouped by month and year; then you don't have to worry about a super long query and can just run it for the current month in question, still restricted to less than NOW().
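The two points above could be sketched as follows. The join condition is an assumption based on the column names in the posted schema (plays.song_id referring to songs.id); the INTO OUTFILE clause from the original query is left out to keep the example short.

```sql
SELECT
  DATE_FORMAT(p.`played`, '%Y-%m') AS `month`,
  SUM(TIME_TO_SEC(s.`length`)) / 3600 AS `playtime`
FROM `plays` AS p
JOIN `songs` AS s
  ON s.`id` = p.`song_id`      -- assumed relationship between the tables
WHERE p.`played` < NOW()       -- ignore rows inserted after the query starts
GROUP BY `month`;
```

With an explicit join, each play is matched to a single song instead of to every row in songs, which by itself should shrink the work considerably.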