I have a table named 'activities' with 50M+ rows.
CREATE TABLE `activities` (
`activity_id` bigint(20) NOT NULL AUTO_INCREMENT,
`id_of_contract` bigint(20) DEFAULT NULL,
`duration_s` int(11) DEFAULT NULL,
`timestamp_end` bigint(20) DEFAULT NULL,
`timestamp_start` bigint(20) DEFAULT NULL,
`id_of_room` bigint(20) DEFAULT NULL,
PRIMARY KEY (`activity_id`),
KEY `analyse` (`id_of_room`,`timestamp_end`,`timestamp_start`,`duration_s`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I have this query:
select *
from activities
where id_of_room=3263
and timestamp_end>1471491882747
and timestamp_start<1479267882747
and duration_s>900
order by duration_s desc;
The EXPLAIN returns this:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE activities NULL range analyse analyse 18 NULL 1 5.00 Using index condition; Using filesort
The query returns in 1.5s. How can I optimize this?
Thanks for your help.
This construct: timestamp_end > 1471491882747 AND timestamp_start < 1479267882747 is essentially impossible to optimize, primarily because the Optimizer does not know whether there could be overlapping rows.
INDEX(id_of_room, duration_s) may make it run faster. If used, it would filter on id_of_room and duration_s but, more importantly, it would avoid the filesort. Not knowing the distribution of the values, I (and the Optimizer) cannot predict whether this index will be better; it is likely to be better for some values and worse for others.
One slight benefit would be to change BIGINT to INT UNSIGNED, or maybe even MEDIUMINT UNSIGNED, where appropriate. With 50M rows, shrinking the data will decrease I/O.
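As a sketch (assuming the id values fit the smaller ranges; the millisecond timestamps such as 1471491882747 exceed INT range, so those must stay BIGINT):

ALTER TABLE activities
    MODIFY id_of_contract INT UNSIGNED,        -- assumes ids stay below ~4.3 billion
    MODIFY id_of_room     INT UNSIGNED,
    MODIFY duration_s     MEDIUMINT UNSIGNED;  -- allows durations up to ~194 days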
innodb_buffer_pool_size should be set to about 70% of RAM.
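For example, on a dedicated server with 32GB of RAM (a hypothetical figure):

-- Dynamic since MySQL 5.7; set the equivalent innodb_buffer_pool_size = 22G
-- in my.cnf so it survives a restart
SET GLOBAL innodb_buffer_pool_size = 22 * 1024 * 1024 * 1024;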
A potentially big help is to avoid SELECT *. List only the columns you need. If that list is short enough, then devise a composite, covering, index.
One final way to speed up the query is with 'lazy eval':
SELECT a.*
FROM (
    SELECT activity_id
    FROM activities
    WHERE id_of_room = 3263
      AND timestamp_end > 1471491882747
      AND timestamp_start < 1479267882747
      AND duration_s > 900
) AS x
JOIN activities AS a USING(activity_id)
ORDER BY a.duration_s DESC;
This will be beneficial if a covering index is used for the derived table and lots of rows are filtered out. In this case, it is worth trying this ordering of index columns:
INDEX(id_of_room, duration_s, timestamp_start, timestamp_end, activity_id)
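As DDL, that would be something like (index name is mine):

ALTER TABLE activities
    ADD INDEX room_dur_covering (id_of_room, duration_s, timestamp_start, timestamp_end, activity_id);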
I encountered a very puzzling optimization case. I'm no SQL expert, but this case seems to defy my understanding of clustered-key principles.
I have the below table schema:
CREATE TABLE `orders` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`chargeQuote` tinyint(1) NOT NULL,
`features` int(11) NOT NULL,
`sequenceIndex` int(11) NOT NULL,
`createdAt` bigint(20) NOT NULL,
`previousSeqId` bigint(20) NOT NULL,
`refOrderId` bigint(20) NOT NULL,
`refSeqId` bigint(20) NOT NULL,
`seqId` bigint(20) NOT NULL,
`updatedAt` bigint(20) NOT NULL,
`userId` bigint(20) NOT NULL,
`version` bigint(20) NOT NULL,
`amount` decimal(36,18) NOT NULL,
`fee` decimal(36,18) NOT NULL,
`filledAmount` decimal(36,18) NOT NULL,
`makerFeeRate` decimal(36,18) NOT NULL,
`price` decimal(36,18) NOT NULL,
`takerFeeRate` decimal(36,18) NOT NULL,
`triggerOn` decimal(36,18) NOT NULL,
`source` varchar(32) NOT NULL,
`status` varchar(50) NOT NULL,
`symbol` varchar(32) NOT NULL,
`type` varchar(50) NOT NULL,
PRIMARY KEY (`id`),
KEY `IDX_STATUS` (`status`) USING BTREE,
KEY `IDX_USERID_SYMBOL_STATUS_TYPE` (`userId`,`symbol`,`status`,`type`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=7937243 DEFAULT CHARSET=utf8mb4;
This is a big table: 100 million rows. It's already sharded by createdAt, so 100 million rows is one month's worth of orders.
I have the below slow query. The query is pretty straightforward:
select id,chargeQuote,features,sequenceIndex,createdAt,previousSeqId,refOrderId,refSeqId,seqId,updatedAt,userId,version,amount,fee,filledAmount,makerFeeRate,price,takerFeeRate,triggerOn,source,`status`,symbol,type
from orders where 1=1
and userId=100000
and createdAt >= '1567775174000' and createdAt <= '1567947974000'
and symbol in ( 'BTC_USDT' )
and status in ( 'FULLY_FILLED' , 'PARTIAL_CANCELLED' , 'FULLY_CANCELLED' )
and type in ( 'BUY_LIMIT' , 'BUY_MARKET' , 'SELL_LIMIT' , 'SELL_MARKET' )
order by id desc limit 0,20;
This query takes 24 seconds. The number of rows that satisfy userId=100000 is very small, around 100, and the number of rows that satisfy the entire WHERE clause is 0.
But when I did a small tweak, that is, I changed the order by clause:
order by id desc limit 0,20; -- before
order by createdAt desc, id desc limit 0,20; -- after
It became very fast, 0.03 seconds.
I can see it made a big difference in the MySQL engine: EXPLAIN shows that before the change it was using key: PRIMARY, and after it finally uses key: IDX_USERID_SYMBOL_STATUS_TYPE, as expected, which I guess is why it is so fast. Here's the explain plan:
select_type table partitions type possible_keys key key_len ref rows filtered Extra
-- before:
SIMPLE orders NULL index IDX_STATUS,IDX_USERID_SYMBOL_STATUS_TYPE PRIMARY 8 NULL 20360 0.02 Using where
-- after:
SIMPLE orders NULL range IDX_STATUS,IDX_USERID_SYMBOL_STATUS_TYPE IDX_USERID_SYMBOL_STATUS_TYPE 542 NULL 26220 11.11 Using index condition; Using where; Using filesort
So what gives? Actually, I was very surprised that the result is not naturally sorted by id (which is the PRIMARY KEY). Isn't that the clustered key in MySQL? And why did it choose not to use the index when sorting by id?
I'm very puzzled: a more demanding query (sorting by 2 conditions) is super fast, but a more lenient one is slow.
And no, I tried ANALYZE TABLE orders; and nothing happened.
MySQL has two alternative query plans for queries with ORDER BY ... LIMIT n:
1. Read all qualifying rows, sort them, and pick the n top rows.
2. Read the rows in sorted order and stop when n qualifying rows have been found.
In order to decide which is the better option, the optimizer needs to estimate the filtering effect of your WHERE condition. This is not straightforward, especially for columns that are not indexed, or for columns where values are correlated. In your case, the MySQL optimizer evidently thinks that the second strategy is the best. In other words, it does not see that the WHERE clause will not be satisfied by any rows, but thinks that 2% of the rows will satisfy the WHERE clause, and that it will be able to find 20 rows by only scanning part of the table backwards in PRIMARY key order.
How the filtering effect of a WHERE clause is estimated varies quite a bit between 5.6, 5.7, and 8.0. If you are using MySQL 8.0, you can try to create histograms for the columns involved to see if that can improve the estimation. If not, I think your only option is to use a FORCE INDEX hint to make the optimizer choose the desired index.
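As a sketch of both options (the column choices and bucket count are my own assumptions):

-- MySQL 8.0+: build histograms so the optimizer can estimate the filtering
ANALYZE TABLE orders UPDATE HISTOGRAM ON createdAt, symbol WITH 64 BUCKETS;

-- Otherwise, steer the optimizer at the composite index directly:
SELECT id, createdAt, status
FROM orders FORCE INDEX (IDX_USERID_SYMBOL_STATUS_TYPE)
WHERE userId = 100000
  AND symbol IN ('BTC_USDT')
ORDER BY id DESC
LIMIT 0, 20;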
For your fast query, the second strategy is not an option since there is no index on createdAt that can be used to avoid sorting.
Update:
Reading Rick's answer, I realized that an index on only userId should speed up your ORDER BY id query. In such an index, the entries for a given userId will be sorted on primary key. Hence, using this index will both make it possible to only access the rows of the requested userId, and access the rows in the requested sort order (by id).
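A minimal sketch (index name is mine):

-- A secondary index implicitly carries the primary key (id) at the end of
-- each entry, so within one userId the entries are already in id order.
ALTER TABLE orders ADD INDEX idx_userid (userId);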
The main filters work well with the cardinality estimator. When ORDER BY is combined with LIMIT, that acts as yet another filter, since the data needs to be filtered further. This can push the cardinality estimator into inaccurate estimates, which eventually results in a poor plan being selected. To prove this, run the 24-second query without the LIMIT clause; it should respond in about 0.03 seconds, just like your tweaked query.
To solve this, if you already get very good performance with just the main filters, run that part first, then order and limit in a second pass, where the result set will be significantly smaller than the whole table. Use something like:

select * from ( select ...main select statement ) as t
order by x limit y;

...or...

insert into temp select ...main select statement;
select * from temp order by x limit y;
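A concrete sketch of the temporary-table variant for this question's query (the table name tmp_orders and the shortened column list are mine):

CREATE TEMPORARY TABLE tmp_orders AS
SELECT id, userId, symbol, status, type, createdAt
FROM orders
WHERE userId = 100000
  AND createdAt >= 1567775174000 AND createdAt <= 1567947974000
  AND symbol IN ('BTC_USDT')
  AND status IN ('FULLY_FILLED', 'PARTIAL_CANCELLED', 'FULLY_CANCELLED')
  AND type IN ('BUY_LIMIT', 'BUY_MARKET', 'SELL_LIMIT', 'SELL_MARKET');

SELECT * FROM tmp_orders ORDER BY id DESC LIMIT 20;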
Given
and userId=100000
and createdAt >= '1567775174000' and createdAt <= '1567947974000'
and ... -- I am not making use of the other items
order by createdAt DESC, id desc -- I am assuming this change
limit 0,20;
I would try
INDEX(userId, createdAt, id) -- in this order
1. userId, tested by =, comes first, thereby narrowing down the part of the index to look at.
2. Leave out the columns tested by IN. If there are multiple values in an IN, we can't make use of step 4.
3. createdAt filters further, by range.
4. createdAt and id are compared in the same direction (DESC). (Yes, I know 8.0 has an improvement, but I don't think you wanted (ASC, DESC).)
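In DDL form (index name is mine; since id is the PRIMARY KEY, InnoDB would implicitly append it to the index anyway):

ALTER TABLE orders
    ADD INDEX idx_user_created_id (userId, createdAt, id);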
I have a very simple query on a large table (about 37 million rows). This query takes over 10 minutes to run, and it should be fast, as I believe the indexes are built correctly. I do not understand why this query is taking so long. I am hoping someone can guide me in the right direction:
Query:
select type_id, sub_type_id, max(settlement_date_time) as max_dt
from transaction_history
group by type_id, sub_type_id;
Create Statement:
CREATE TABLE `transaction_history` (
`transaction_history_id` int(11) NOT NULL AUTO_INCREMENT,
`type_id` int(11) NOT NULL,
`sub_type_id` int(11) DEFAULT NULL,
`settlement_date_time` datetime DEFAULT NULL,
PRIMARY KEY (`transaction_history_id`),
KEY `sub_type_id_idx` (`sub_type_id`),
KEY `settlement_date` (`settlement_date_time`),
KEY `type_sub_type` (`type_id`,`sub_type_id`)
) ENGINE=InnoDB AUTO_INCREMENT=36832823 DEFAULT CHARSET=latin1;
Result from Explain:
id -> 1
select_type -> SIMPLE
table -> transaction_history
type -> index
possible_keys -> NULL
key -> type_sub_type
key_len -> 9
ref -> NULL
rows -> 37025337
filtered -> 100.00
Extra ->
Why is possible_keys NULL? It says it is using an index, but it does not seem like it is. Why is ref NULL? How can I make this query more efficient? Is there something wrong with the indexes? Do I have to change any values in the MySQL config file?
Thank you
(Apologies to the two commenters who already gave the necessary INDEX; I'll try to say enough more to justify giving an 'Answer'.)
Use the 'composite' (and 'covering') index:
INDEX(type_id, sub_type_id, settlement_date_time)
There is no WHERE, so no need to worry about such columns. First come the columns in the order listed in GROUP BY, then comes the other column. The Optimizer will probably hop through the index very efficiently.
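As DDL (index name is mine); being covering, the query can then be answered from the index alone:

ALTER TABLE transaction_history
    ADD INDEX type_sub_settle (type_id, sub_type_id, settlement_date_time);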
Why is possible_keys NULL? Well, the 2-column index is useless here. In general, if more than about 20% of the table needs to be looked at, it is better to simply scan the table rather than bounce between the index BTree and the data BTree.
More tips: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
I'm trying to optimize a very basic MySQL example, and I can't seem to figure out how to prevent the below query from doing a table scan when referencing the column uid. Using EXPLAIN, it shows a possible key correctly but doesn't actually use the key, and it scans all the rows.
CREATE TABLE `Foo` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`barId` int(10) unsigned NOT NULL,
`uid` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `barId` (`barId`),
KEY `uid` (`uid`)
)
explain
select count(uid) as userCount
FROM Foo
WHERE barId = 1
GROUP BY barId
id select_type table type possible_keys key rows Extra
1 SIMPLE Foo ALL barId NULL 4 Using where
Sample data
id,barId,uid
1,1,1
2,1,2
3,1,3
4,2,4
It looks like MySQL is being smart and realizing it would take more time to use the index with a table that small?
When I EXPLAIN it empty, the key is "barId". With 4 rows (your sample data), the key is NULL. With 4096 rows (I ran INSERT ... SELECT into itself a handful of times), the key returns to "barId".
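The row-doubling trick was along these lines (a sketch):

-- Each run doubles the row count: 4 -> 8 -> 16 -> ... -> 4096 after ten runs
INSERT INTO Foo (barId, uid)
SELECT barId, uid FROM Foo;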
From the Manual:

Indexes are less important for queries on small tables, or big tables where report queries process most or all of the rows. When a query needs to access most of the rows, reading sequentially is faster than working through an index. Sequential reads minimize disk seeks, even if not all the rows are needed for the query.
I have the following query:
SELECT * FROM `alltrackers`
WHERE `deviceid`='FT_99000083401624'
AND `locprovider`!='none'
ORDER BY `id` DESC
This is the show create table:
CREATE TABLE `alltrackers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`deviceid` varchar(50) NOT NULL,
`gpsdatetime` int(11) NOT NULL,
`locprovider` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
KEY `statename` (`statename`),
KEY `gpsdatetime` (`gpsdatetime`),
KEY `locprovider` (`locprovider`),
KEY `deviceid` (`deviceid`(18))
) ENGINE=MyISAM AUTO_INCREMENT=8665045 DEFAULT CHARSET=utf8;
I've removed the columns which I thought were unnecessary for this question.
This is the EXPLAIN output for this query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE alltrackers ref locprovider,deviceid deviceid 56 const 156416 Using where; Using filesort
This particular query is showing as taking several seconds in mytop (mtop). I'm a bit confused though, as the same query but with a different "deviceid" doesn't take as long. Although I only need the last row, I've already removed LIMIT 1 as that makes it take even longer. This table currently contains 3 million rows.
It is used for storing the locations from different GPS devices. Each GPS device has a unique device ID. Locations come in and are added to the table. For statistics I'm running the above query to find the time of the last received location from a certain device.
I'm open to advice on ways to further optimize the query or even the tables.
Many thanks in advance.
If you only need the last row, add an index on (deviceid, id, locprovider). It would be even faster with an index on (deviceid, id, locprovider, gpsdatetime):
ALTER TABLE alltrackers
ADD INDEX special_covering_IDX
(deviceid, id, locprovider, gpsdatetime) ;
Then try this out:
SELECT id, locprovider, gpsdatetime
FROM alltrackers
WHERE deviceid = 'FT_99000083401624'
AND locprovider <> 'none'
ORDER BY id DESC
LIMIT 1 ;
Here is a question that I should be able to answer myself, but I can't, and I also can't find an answer on Google:
I have a table that contains 5 million rows with this structure:
CREATE TABLE IF NOT EXISTS `files_history2` (
`FILES_ID` int(10) unsigned DEFAULT NULL,
`DATE_FROM` date DEFAULT NULL,
`DATE_TO` date DEFAULT NULL,
`CAMPAIGN_ID` int(10) unsigned DEFAULT NULL,
`CAMPAIGN_STATUS_ID` int(10) unsigned DEFAULT NULL,
`ON_HOLD` decimal(1,0) DEFAULT NULL,
`DIVISION_ID` int(11) DEFAULT NULL,
KEY `DATE_FROM` (`DATE_FROM`),
KEY `FILES_ID` (`FILES_ID`),
KEY `CAMPAIGN_ID` (`CAMPAIGN_ID`),
KEY `CAMP_DATE` (`CAMPAIGN_ID`,`DATE_FROM`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
When I execute
SELECT files_id, min( date_from )
FROM files_history2
WHERE campaign_id IS NOT NULL
GROUP BY files_id
the query stays in status "Sending data" for more than eight hours (at which point I killed the process).
Here is the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE files_history2 ALL CAMPAIGN_ID,CAMP_DATE NULL NULL NULL 5073254 Using where; Using temporary; Using filesort
I assume that I created the necessary keys, but then the query shouldn't take that long, should it?
I would suggest a different index... Index on (Files_ID, Date_From, Campaign_ID)...
Since your GROUP BY is on Files_ID, you want THOSE grouped. Then the MIN(Date_From), so that goes in second position... Then FINALLY the Campaign_ID to qualify for NOT NULL, and here's why...
If you put all your campaign IDs first, great, you get all the NULLs out of the way... but now you have 1,000 campaigns, and the Files_IDs span MANY campaigns and also span many dates; you are going to choke.
With the index I'm projecting, by Files_ID first, you have each files_id already ordered to match your GROUP BY. Then, within that, all the earliest dates are at the top of the indexed list... great, almost there. Then, by campaign ID, skip over whatever NULLs may be there and you are done, on to the next Files_ID.
Hope this makes sense -- unless you have TONs of entries with NULL-value campaigns.
Also, by having all 3 parts of the index matching the criteria and output columns of your query, it never has to go back to the raw data file for the data, it gets it all from the index directly.
I'd create a covering index (CAMPAIGN_ID, files_id, date_from) and check the performance. I suspect your issue is due to the grouping and date_from not being able to use the same index.
CREATE INDEX your_index_name ON files_history2 (CAMPAIGN_ID, files_id, date_from);
If this works, you could drop the single-column index CAMPAIGN_ID, as it's included in the composite index.
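Dropping the now-redundant index:

ALTER TABLE files_history2 DROP INDEX CAMPAIGN_ID;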
Well, the query is slow due to the aggregation (the MIN function) along with the grouping.
One solution is to alter your query by moving the aggregation into a derived table in the FROM clause, which will be a lot faster than the approach you are using.
Try the following:
SELECT f.files_id, f1.datefrom
FROM files_history2 AS f
JOIN (
    SELECT files_id, MIN(date_from) AS datefrom
    FROM files_history2
    WHERE campaign_id IS NOT NULL
    GROUP BY files_id
) AS f1 ON f.files_id = f1.files_id AND f.date_from = f1.datefrom;
This should give a lot better performance; if it doesn't, a temporary table would be the only other choice to go with.