I am trying to optimize this query:
select id_store from receipt where receiptDate between '20151109' and '20151116'
I ran this query with EXPLAIN. It appears that no key is used: the index on receiptDate is ignored. What's wrong?
Here's the structure of the table receipt :
CREATE TABLE receipt (
id_store tinyint(3) unsigned NOT NULL default '0',
id_receipt int(7) unsigned NOT NULL default '0',
id_product smallint(6) unsigned NOT NULL default '0',
receiptDate char(8) NOT NULL default '',
qty float NOT NULL default '0',
turnover float NOT NULL default '0',
PRIMARY KEY (id_store,id_receipt,id_product,receiptDate),
KEY NDX_1 (receiptDate)
) ENGINE=MEMORY;
Here's the result of the command EXPLAIN :
+----+-------------+---------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table   | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+---------+------+---------------+------+---------+------+----------+-------------+
| 1  | SIMPLE      | receipt | ALL  | NDX_1         |      |         |      | 24789225 | Using where |
+----+-------------+---------+------+---------------+------+---------+------+----------+-------------+
The table receipt contains 24,789,225 rows, with an average of 15,000 rows per day (receiptDate).
The following query returns a count of 120,295 rows:
select count(*) from receipt where receiptDate between '20151109' and '20151116'
Thanks in advance for your help.
Since you indexed receiptDate, the database engine will use this index if the optimizer thinks it will improve performance. The optimizer bases its decisions on statistics about your table. It gathers these statistics in the background, in a mostly transparent process.
Now you are using a MEMORY engine. Because memory tables are supposed to be short lived, these special engines have very limited optimizer capabilities. You might want to force the query to use your index with the FORCE INDEX keyword.
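As a sketch of the hint mentioned above, applied to the query from the question. Whether it actually changes the plan depends on the index type, so it is worth comparing the EXPLAIN output with and without the hint:

```sql
-- Ask the optimizer to prefer NDX_1 over a full table scan.
EXPLAIN
SELECT id_store
FROM receipt FORCE INDEX (NDX_1)
WHERE receiptDate BETWEEN '20151109' AND '20151116';
```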
Your date is stored as a CHAR(8), which is slow because the engine has to process all eight characters. You will get better performance with an INT (store the date as YYYYMMDD). Your queries should still work since the engine converts string inputs to integers automatically.
If you switch to the InnoDB engine, put the date first in the primary key if possible, because the primary key is also the clustered index with this engine, meaning the rows will be physically ordered by date on storage.
Instead of an 8-byte (or 24-byte if utf8) CHAR(8) for receiptDate, use a 3-byte DATE datatype.
While you are at it, you could save another byte by making id_receipt MEDIUMINT UNSIGNED if it is only 7 digits.
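A minimal sketch of the DATE conversion, assuming every existing receiptDate value is a valid YYYYMMDD string. The column name receiptDay and the index name NDX_day are made up for illustration. Because receiptDate is part of the primary key, swapping the new column into the key is a separate step that is not shown here:

```sql
-- Add a DATE column alongside the existing CHAR(8) one (receiptDay is a hypothetical name).
ALTER TABLE receipt ADD COLUMN receiptDay DATE NULL;

-- Fill it from the existing strings; this format parses 'YYYYMMDD'.
UPDATE receipt
SET receiptDay = STR_TO_DATE(receiptDate, '%Y%m%d');

-- Index the new column with a type that supports range scans.
ALTER TABLE receipt ADD INDEX NDX_day (receiptDay) USING BTREE;

-- Range queries can then use real date literals.
SELECT id_store
FROM receipt
WHERE receiptDay BETWEEN '2015-11-09' AND '2015-11-16';
```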
Specify BTREE:
KEY `NDX_1` (`receiptDate`) USING BTREE
since MEMORY tables default to HASH indexes. BTREE can handle ranges; with HASH, a range query has to do a table scan.
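One way to apply this to the existing table, as a sketch. It reuses the index name NDX_1 from the question, and the hoped-for result is a range access on NDX_1 in the final EXPLAIN:

```sql
-- Rebuild NDX_1 as a BTREE index so range predicates can use it.
ALTER TABLE receipt
  DROP INDEX NDX_1,
  ADD INDEX NDX_1 (receiptDate) USING BTREE;

-- Check which index type is in place (see the Index_type column).
SHOW INDEX FROM receipt;

-- Re-check the plan for the original query.
EXPLAIN
SELECT id_store
FROM receipt
WHERE receiptDate BETWEEN '20151109' AND '20151116';
```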
I agree that InnoDB is likely to be better overall.
I have a MySQL table structured like this:
CREATE TABLE `messages` (
`id` int NOT NULL AUTO_INCREMENT,
`author` varchar(250) COLLATE utf8mb4_unicode_ci NOT NULL,
`message` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL,
`serverid` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL,
`date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`guildname` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
PRIMARY KEY (`id`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=27769461 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
I need to query this table for various statistics using date ranges for Grafana graphs, however all of those queries are extremely slow, despite the table being indexed using a composite key of id and date.
"id" is auto-incrementing and date is also always increasing.
The queries generated by Grafana look like this:
SELECT
UNIX_TIMESTAMP(date) DIV 120 * 120 AS "time",
count(DISTINCT(serverid)) AS "servercount"
FROM messages
WHERE
date BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1
ORDER BY UNIX_TIMESTAMP(date) DIV 120 * 120
This query takes over 30 seconds to complete with 27 million records in the table.
Explaining the query results in this output:
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
| 1 | SIMPLE | messages | NULL | ALL | PRIMARY | NULL | NULL | NULL | 26952821 | 11.11 | Using where; Using filesort |
+----+-------------+----------+------------+------+---------------+------+---------+------+----------+----------+-----------------------------+
This indicates that MySQL is indeed using the composite primary key I created for indexing the data, but still has to scan almost the entire table, which I do not understand. How can I optimize this table for date range queries?
Plan A:
PRIMARY KEY(date, id), -- to cluster by date
INDEX(id) -- needed to keep AUTO_INCREMENT happy
Assuming the table is quite big, having date at the beginning of the PK puts the rows in the given date range all next to each other. This minimizes (somewhat) the I/O.
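Plan A could be applied along these lines. This is a sketch: the index name idx_id is made up, and the statement rebuilds the whole table, which will take a while with 27 million rows:

```sql
-- Re-cluster the table by date; keep a plain index on id for AUTO_INCREMENT.
ALTER TABLE messages
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`date`, id),
  ADD INDEX idx_id (id);
```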
Plan B:
PRIMARY KEY(id),
INDEX(date, serverid)
Now the secondary index is exactly what is needed for the one query you have provided. It is optimized for searching by date, and it is smaller than the whole table, hence even faster (I/O-wise) than Plan A.
But, if you have a lot of different queries like this, adding a lot more indexes gets impractical.
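A sketch of Plan B, with a made-up index name (idx_date_serverid). It assumes the table's row format allows an index key of this length, since serverid is a varchar(200) in utf8mb4. If the optimizer picks up the new index, the EXPLAIN should show it in the key column, ideally with "Using index" in Extra:

```sql
-- Make id the whole primary key and add a secondary index that covers the query.
ALTER TABLE messages
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id),
  ADD INDEX idx_date_serverid (`date`, serverid);

-- Re-check the plan for the Grafana query.
EXPLAIN
SELECT
  UNIX_TIMESTAMP(`date`) DIV 120 * 120 AS "time",
  COUNT(DISTINCT serverid) AS "servercount"
FROM messages
WHERE `date` BETWEEN FROM_UNIXTIME(1615930154) AND FROM_UNIXTIME(1616016554)
GROUP BY 1
ORDER BY 1;
```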
Plan C: There may be a still better way:
PRIMARY KEY(id),
INDEX(serverid, date)
In theory, it can hop through that secondary index checking each serverid. But I am not sure that such an optimization exists.
Plan D: Do you need id for anything other than providing a unique PRIMARY KEY? If not, there may be other options.
The index on (id, date) doesn't help because its first column is id, not date.
You can either
(a) drop the current index and index (date, id) instead -- when date is in the first place this can be used to filter for date regardless of the following columns -- or
(b) just create an additional index only on (date) to support the query.
I am trying to optimize the following mysql query:
SELECT events.id, events.tracking_id, events.event_time, events.event_type_id
FROM events
WHERE events.event_time >= 1564617600000000 AND events.event_time <= 1567295999000000
Here are the events table details:
CREATE TABLE `events` (
`id` char(36) NOT NULL,
`tracking_id` char(72) NOT NULL,
`event_time` bigint(16) NOT NULL,
`server_id` char(36) NOT NULL,
`project_id` char(36) NOT NULL,
`data_type_id` char(36) NOT NULL,
`event_type_id` char(36) NOT NULL,
PRIMARY KEY (`tracking_id`,`event_time`),
KEY `id_idx` (`id`),
KEY `server_id_idx` (`server_id`),
KEY `event_type_id_idx` (`event_type_id`),
KEY `event_time_idx` (`event_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
And the Explain output:
+----+-------------+--------+------------+------+----------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+----------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | events | NULL | ALL | event_time_idx | NULL | NULL | NULL | 2877592 | 37.48 | Using where |
+----+-------------+--------+------------+------+----------------+------+---------+------+---------+----------+-------------+
The query takes about 30 seconds to run. And adding an index on event_time doesn't seem to have made any difference to the execution time - It doesn't look like the index is being used?
event_time was originally a char(36), but then I was getting the following warning: 'Cannot use range access on index 'event_time_idx' due to type or collation conversion on field 'event_time''. That warning has disappeared since I converted event_time to a bigint, but the index is still not being used.
What can I do to improve the performance of this query (which is actually a subquery in a much larger query)?
Do all the rows in your table, or at least a majority of them, match the condition? In other words, the timestamps you give are from 2019-08-01 00:00:00 to 2019-08-31 23:59:59, so one full month. Are most of the rows currently in your table from this month?
MySQL does cost-based optimization. It estimates the cost of reading an index entry, then using that to look up a row. This means two lookups per index entry, plus some overhead.
MySQL is correct to estimate that a table-scan might be better than using an index in certain cases. The threshold is not documented, but in my experience, if it estimates that the number of matching rows is over 20% of the table, it tends to do a table-scan. YMMV.
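To see where this query stands relative to that rule of thumb, the actual fraction of matching rows can be measured directly. A minimal sketch follows; note that it scans the table itself, so it is a one-off check rather than something to run routinely. The EXPLAIN in the question estimates filtered at 37.48, which already hints at a large fraction:

```sql
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matching rows.
SELECT
  COUNT(*) AS total_rows,
  SUM(event_time BETWEEN 1564617600000000 AND 1567295999000000) AS matching_rows,
  SUM(event_time BETWEEN 1564617600000000 AND 1567295999000000) / COUNT(*) AS matching_fraction
FROM events;
```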
You can use an index hint to tell MySQL that it should treat a table-scan as infinitely costly, so if the index can be used at all, it should prefer it.
SELECT events.id, events.tracking_id, events.event_time, events.event_type_id
FROM events FORCE INDEX (event_time_idx)
WHERE events.event_time >= 1564617600000000 AND events.event_time <= 1567295999000000
But keep in mind MySQL's cost-based optimizer might have been right. It might in fact be less costly to do the table-scan, depending on your data.
I'm trying to create a new table by joining four existing ones. My database is static, so making one large preprocessed table will simplify programming and save lots of time in future queries. My query works fine when limited with a WHERE, but without it the query seems to either hang or run too slowly to show any progress.
Here's the working query. The result only takes a few seconds.
SELECT `group`.group_id, MIN(application.date), person.person_name, pers_appln.sequence
FROM `group`
JOIN application ON `group`.appln_id=application.appln_id
JOIN pers_appln ON pers_appln.appln_id=application.appln_id
JOIN person ON person.person_id=pers_appln.person_id
WHERE group_id="24601"
GROUP BY `group`.group_id, pers_appln.sequence
;
If I simply remove the WHERE line, it will run for days with nothing to show. Adding a CREATE TABLE newtable AS at the beginning does the same thing. It never moves beyond 0% progress.
The group, application, and person tables all use the MyISAM engine, while pers_appln uses InnoDB. The columns are all indexed. The table sizes range from about 40 million to 150 million rows. I know it's rather large, but I wouldn't think it would pose this much of a problem. The computer currently has 4GB of ram.
Any ideas how to make this work?
Here's the SHOW CREATE TABLE info. There are no views or virtual tables:
CREATE TABLE `group` (
`APPLN_ID` int(10) unsigned NOT NULL,
`GROUP_ID` int(10) unsigned NOT NULL,
KEY `idx_appln` (`APPLN_ID`),
KEY `idx_group` (`GROUP_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `application` (
`APPLN_ID` int(10) unsigned NOT NULL,
`APPLN_AUTH` char(2) NOT NULL DEFAULT '',
`APPLN_NR` varchar(20) NOT NULL DEFAULT '',
`APPLN_KIND` char(2) DEFAULT '',
`DATE` date DEFAULT NULL,
`IPR_TYPE` char(2) DEFAULT '',
PRIMARY KEY (`APPLN_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `person` (
`PERSON_ID` int(10) unsigned NOT NULL,
`PERSON_CTRY_CODE` char(2) NOT NULL,
`PERSON_NAME` varchar(300) DEFAULT NULL,
`PERSON_ADDRESS` varchar(500) DEFAULT NULL,
KEY `idx_person` (`PERSON_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MAX_ROWS=30000000 AVG_ROW_LENGTH=100
CREATE TABLE `pers_appln` (
`PERSON_ID` int(10) unsigned NOT NULL,
`APPLN_ID` int(10) unsigned NOT NULL,
`SEQUENCE` smallint(4) unsigned DEFAULT NULL,
`PLACE` smallint(4) unsigned DEFAULT NULL,
KEY `idx_pers_appln` (`APPLN_ID`),
KEY `idx_person` (`PERSON_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (appln_id)
PARTITIONS 20 */
Here's the EXPLAIN of my query:
+----+-------------+-------------+--------+----------------------------+-----------------+---------+--------------------------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+----------------------------+-----------------+---------+--------------------------+----------+---------------------------------+
| 1 | SIMPLE | person | ALL | idx_person | NULL | NULL | NULL | 47827690 | Using temporary; Using filesort |
| 1 | SIMPLE | pers_appln | ref | idx_application,idx_person | idx_person | 4 | mydb.person.PERSON_ID | 1 | |
| 1 | SIMPLE | application | eq_ref | PRIMARY | PRIMARY | 4 | mydb.pers_appln.APPLN_ID | 1 | |
| 1 | SIMPLE | group | ref | idx_application | idx_application | 4 | mydb.pers_appln.APPLN_ID | 1 | |
+----+-------------+-------------+--------+----------------------------+-----------------+---------+--------------------------+----------+---------------------------------+
Verify that key_buffer_size is about 200M and innodb_buffer_pool_size is about 1200M. Perhaps they could be bigger, but make sure you are not swapping.
group should have PRIMARY KEY(appln_id, group_id) and INDEX(group_id, appln_id) instead of the two KEYs it has.
pers_appln should have INDEX(person_id, appln_id) and INDEX(appln_id, person_id) instead of the two keys it has. If possible, one of those should be PRIMARY KEY, but watch out for the PARTITIONing.
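The two index changes above might look like this. It is a sketch that uses the index names from the SHOW CREATE TABLE output; the new index names are made up, and the PRIMARY KEY on `group` assumes that (APPLN_ID, GROUP_ID) pairs are unique:

```sql
-- Fails if duplicate (APPLN_ID, GROUP_ID) pairs exist.
ALTER TABLE `group`
  DROP INDEX idx_appln,
  DROP INDEX idx_group,
  ADD PRIMARY KEY (APPLN_ID, GROUP_ID),
  ADD INDEX idx_group_appln (GROUP_ID, APPLN_ID);

-- Two composite indexes so the join can go in either direction without touching the rows.
ALTER TABLE pers_appln
  DROP INDEX idx_pers_appln,
  DROP INDEX idx_person,
  ADD INDEX idx_person_appln (PERSON_ID, APPLN_ID),
  ADD INDEX idx_appln_person (APPLN_ID, PERSON_ID);
```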
A minor improvement would be to change those CHAR(2) fields to be CHARACTER SET ascii -- assuming you don't really need utf8. That would shrink the field from 6 bytes to 2 bytes per row.
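For the CHAR(2) columns, a sketch of the character set change. It assumes the stored values are plain ASCII codes, and it keeps the NULL/DEFAULT attributes shown in the SHOW CREATE TABLE output:

```sql
ALTER TABLE application
  MODIFY APPLN_AUTH char(2) CHARACTER SET ascii NOT NULL DEFAULT '',
  MODIFY APPLN_KIND char(2) CHARACTER SET ascii DEFAULT '',
  MODIFY IPR_TYPE   char(2) CHARACTER SET ascii DEFAULT '';

ALTER TABLE person
  MODIFY PERSON_CTRY_CODE char(2) CHARACTER SET ascii NOT NULL;
```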
The PARTITIONing is probably not helping at all. (No, I can't say that removing the PARTITIONing will speed it up much.)
If these suggestions do not help enough, please provide the output from EXPLAIN SELECT ...
Edit
Converting to InnoDB and specifying PRIMARY KEYs for all tables will help. This is because InnoDB "clusters" the PRIMARY KEY with the data. What you have now is a lot of bouncing between a MyISAM index and its data -- literally hundreds of millions of times. Assuming not everything can be cached in your small 4GB, that means a lot of disk I/O. I would not be surprised if the non-WHERE version would take a week to run. Even with InnoDB, there will be I/O, but some of it will be avoided because:
1. reaching into a table with the PK gets the data without another disk hit.
2. the extra indexes I proposed will avoid hitting the data, again avoiding an extra disk hit.
(Millions of references * "an extra disk hit" = days of time.)
If you switch all of your tables to InnoDB, you should lower key_buffer_size to 20M and raise innodb_buffer_pool_size to 1500M. (These are approximate; do not raise them so high that there is any swapping.)
Please show us the CREATE TABLEs with InnoDB -- I want to make sure each table has a PRIMARY KEY and which column(s) that is. The PRIMARY KEY makes a big difference in this particular situation.
For person, the MyISAM version has just a KEY(person_id). If you did not change the keys in the conversion, InnoDB will invent a PRIMARY KEY. When the JOIN to that table occurs, InnoDB will (1) drill down the BTree for the key to find that invented PK value, then (2) drill down the PK+data BTree to find the row. If, instead, person_id could be the PK, that JOIN would run twice as fast. Possibly even faster, depending on how big the table is and how much it needs to jump around in the index / data. That is, the two BTree lookups are adding to the pressure on the cache (buffer_pool).
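A sketch of making person_id the primary key during the conversion, assuming PERSON_ID is unique in person:

```sql
-- Fails if PERSON_ID contains duplicates.
ALTER TABLE person
  DROP INDEX idx_person,
  ADD PRIMARY KEY (PERSON_ID),
  ENGINE=InnoDB;
```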
How big is each table? What was the final value for innodb_buffer_pool_size? Once you have changed everything from MyISAM to InnoDB, set key_buffer_size to 40M or less, and set innodb_buffer_pool_size to about 70% of available RAM. If the Data + Index sizes for all the tables are less than the buffer_pool, then (once cache is primed) the query won't have to do any I/O. This is easily a 10x speedup.
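The sizes and current settings can be read from the server itself. A sketch, assuming the schema is named mydb as in the EXPLAIN output; table_rows is only an estimate for InnoDB tables:

```sql
-- Data and index size per table, largest first.
SELECT
  table_name,
  engine,
  table_rows,
  ROUND(data_length  / 1024 / 1024) AS data_mb,
  ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = 'mydb'
ORDER BY data_length + index_length DESC;

-- Current cache settings, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'key_buffer_size';
```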
pers_appln is a many-to-many relationship? Then, probably
PRIMARY KEY(appln_id, person_id),
INDEX(person_id, appln_id) -- if you need to go the other direction, too.
I found the solution: switching to an SSD. My table creation time went from an estimated 45 days to 16 hours. Previously, the database spent all its time with hard drive I/O, barely even using 5% of the CPU or RAM.
Thanks everyone.
CREATE TABLE `TEST` (
`ID1` mediumint(8) NOT NULL default '0',
`ID2` mediumint(8) NOT NULL default '0',
`DATE` datetime NOT NULL default '0000-00-00 00:00:00',
UNIQUE KEY `COMBO_INDEX` (`ID1`,`ID2`),
KEY `ID2` (`ID2`)
) ENGINE=InnoDB
This table has approx 16196496 records
EXPLAIN SELECT * FROM TEST WHERE ID1 IN ('8518582', '5398912', '6120243', '6841316', '7580078', '7671953', '7775737', '7792470', '7887985', '7888375', '7946516', '8008760', '8111722', '8211235', '8262746', '8365675', '8396853', '8399818', '8410062', '8459079', '8490683')
I am getting output as
+----+-------------+------------------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | TEST | ALL | ID1 | NULL | NULL | NULL | 16196496 | Using where |
+----+-------------+------------------------+------+---------------+------+---------+------+----------+-------------+
I don't understand why the query is not using any key.
Also, when I run this query on this InnoDB table, it takes a huge amount of time: 329 seconds (MySQL version 5.0.45-log).
If I run the same query on a MyISAM table, it takes just 2 seconds (though EXPLAIN shows the same result). I am using MySQL version 5.5.
Why is the query not taking any key?
InnoDB needs a primary key to seek quickly to the row found in a secondary index. As long as you don't have one, MySQL cannot do that, so it prefers a full scan.
http://dev.mysql.com/doc/refman/5.6/en/innodb-table-and-index.html
Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
So the obvious solution is to replace the unique key with a primary key (though personally I don't like natural primary keys, composite natural primary keys especially).
PS: it seems my guess in the comments about using numbers instead of strings helped. The advice about adding a primary key still stands, though: do that to get even better performance.
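Both suggestions together, as a sketch. The IN list is shortened to a few of the values from the question for readability:

```sql
-- The existing UNIQUE KEY already guarantees that (ID1, ID2) has no duplicates.
ALTER TABLE TEST
  DROP INDEX COMBO_INDEX,
  ADD PRIMARY KEY (ID1, ID2);

-- Numeric literals match the mediumint column type, so no string-to-number conversion is needed.
EXPLAIN
SELECT * FROM TEST
WHERE ID1 IN (5398912, 6120243, 6841316);
```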
I am not sure, but another possible reason is that there is insufficient memory available to load the indexes, hence the full scan.
I've got a table that refuses to use index, and it always uses filesort.
The table is:
CREATE TABLE `article` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`Category_ID` int(11) DEFAULT NULL,
`Subcategory` int(11) DEFAULT NULL,
`CTimestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`Publish` tinyint(4) DEFAULT NULL,
`Administrator_ID` int(11) DEFAULT NULL,
`Position` tinyint(4) DEFAULT '0',
PRIMARY KEY (`ID`),
KEY `Subcategory` (`Subcategory`,`Position`,`CTimestamp`,`Publish`),
KEY `Category_ID` (`Category_ID`,`CTimestamp`,`Publish`),
KEY `Position` (`Position`,`Category_ID`,`Publish`),
KEY `CTimestamp` (`CTimestamp`),
CONSTRAINT `article_ibfk_1` FOREIGN KEY (`Category_ID`) REFERENCES `category` (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=94290 DEFAULT CHARSET=utf8
The query is:
SELECT * FROM article ORDER BY `CTimestamp`;
The explain is:
+----+-------------+---------+------+---------------+------+---------+------+-------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+------+---------+------+-------+----------------+
| 1 | SIMPLE | article | ALL | NULL | NULL | NULL | NULL | 63568 | Using filesort |
+----+-------------+---------+------+---------------+------+---------+------+-------+----------------+
When I remove the "ORDER BY", everything works properly. All other indices (Subcategory, Position, etc.) work fine in other queries. Unfortunately, the timestamp index refuses to be used, even with my simple select query. I'm sure I'm missing something important here.
How can I make MySQL use the timestamp index?
Thank you.
In this case, MySQL is not using your index for sorting, and it is a GOOD thing.
Why? Your table contains just 64k rows and the average row width is about 26 bytes (if I added up the column sizes right), so the total table size on disk should be around 2MB.
It is very cheap to read just 2MB of data from disk into memory (probably in just 1-2 disk operations or seeks) and then simply perform the filesort in memory (probably a variation of quicksort).
If MySQL retrieved rows in index order as you wish, it would have to perform 64,000 disk seek operations, one record after another! That would be very, very slow.
Indexes are good when you can use them to jump quickly to a known location in a huge file and read just a small amount of data, as in a WHERE clause. But in this case it is not a good idea, and MySQL is not stupid!
If your table were very big (larger than RAM), then MySQL would certainly start using your index, and that is also a good thing.
Well, you can always hint the index. Change your query to
SELECT * FROM article use index (CTimestamp);
This forces MySQL to use the index for the query. The EXPLAIN:
1, 'SIMPLE', 'article', 'ALL', '', '', '', '', 1, 100.00, ''
No filesort appears in the plan. Keep in mind that without an ORDER BY clause MySQL does not guarantee any particular row order, even when an index is hinted.
Alternatively, you can keep your order by clause, but force the index usage:
SELECT * FROM article force index (CTimestamp) order by CTimestamp;
The problem is still strange, though. Have you considered posting it to the official MySQL help forums?
Edit: You seem to be in good company.
Edit: Forcing the index seems to work out well.