I have a very simple query on a large table (about 37 million rows). This query takes over 10 minutes to run, yet it should be fast, as the indexes are built correctly (I think). I do not understand why it is taking so long. I am hoping someone can guide me in the right direction:
Query:
select type_id, sub_type_id, max(settlement_date_time) as max_dt
from transaction_history group by type_id, sub_type_id
Create Statement:
CREATE TABLE `transaction_history` (
`transaction_history_id` int(11) NOT NULL AUTO_INCREMENT,
`type_id` int(11) NOT NULL,
`sub_type_id` int(11) DEFAULT NULL,
`settlement_date_time` datetime DEFAULT NULL,
PRIMARY KEY (`transaction_history_id`),
KEY `sub_type_id_idx` (`sub_type_id`),
KEY `settlement_date` (`settlement_date_time`),
KEY `type_sub_type` (`type_id`,`sub_type_id`)
) ENGINE=InnoDB AUTO_INCREMENT=36832823 DEFAULT CHARSET=latin1;
Result from Explain:
id -> 1
select_type -> SIMPLE
table -> transaction_history
type -> index
possible_keys -> NULL
key -> type_sub_type
key_len -> 9
ref -> NULL
rows -> 37025337
filtered -> 100.00
Extra ->
Why is possible_keys NULL? It says it is using an index, but it does not seem like it is. Why is ref NULL? How can I make this query more efficient? Is there something wrong with the indexes? Do I have to change any values in the MySQL config file?
Thank you
(Apologies to the two commenters who already gave the necessary INDEX; I'll try to say enough more to justify giving an 'Answer'.)
Use the 'composite' (and 'covering') index:
INDEX(type_id, sub_type_id, settlement_date_time)
There is no WHERE clause, so there are no filter columns to worry about. First come the columns in the order listed in the GROUP BY, then the remaining column. The Optimizer will probably hop through the index very efficiently.
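As a concrete sketch (the index name is just illustrative), assuming the table from the question:

-- Illustrative name; the column order follows the advice above.
ALTER TABLE transaction_history
  ADD INDEX type_sub_settle (type_id, sub_type_id, settlement_date_time);

With that index in place, EXPLAIN should show Using index in Extra (a covering-index scan), and possibly Using index for group-by if the loose index scan optimization kicks in.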
Why is possible_keys NULL? Well, the 2-column index is useless for this query. In general, if more than about 20% of the table needs to be looked at, it is better to simply scan the table than to bounce between the index BTree and the data BTree.
More tips: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Related
I have a table, and when running this query,
EXPLAIN SELECT `id`
FROM `tblsender`
WHERE `userid` = '6'
AND `astatus` = '1'
AND `sender` = 'ABCDEF'
I am getting Using where even after indexing in all possible ways. Here is my final table structure:
CREATE TABLE IF NOT EXISTS `tblsender` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`sender` varchar(6) NOT NULL,
`astatus` tinyint(1) NOT NULL DEFAULT '0',
`userid` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `astatus` (`astatus`),
KEY `userid` (`userid`),
KEY `sender` (`sender`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=22975 ;
I even tried full text for sender column but still no luck and I also tried indexing on all where clause columns.
ALTER TABLE `tblsender` ADD INDEX ( `sender` , `astatus` , `userid` ) ;
I am still getting Using where. How can I properly index this table?
Edit: Explain output for above structure.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tblsender ref astatus,userid,sender astatus 1 const 1 Using where
and Explain output for all 3 columns together
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tblsender ref astatus,userid,sender,sender_2 astatus 1 const 1 Using where
You can't effectively predict optimizer behavior on large data sets when testing with small data sets.
As illustrated by the query plans, the multi-column index is seen as a candidate, but the optimizer is choosing not to use it in this case. That doesn't mean it won't use it later, when it is considered more beneficial.
I can only speculate without seeing your actual data set and perhaps using optimizer tracing, but I'll offer a reasonable speculation.
The optimizer in MySQL is cost-based. It tries to resolve your query in the least costly way possible. Note that rows = 1. This means that the optimizer has concluded that -- statistically, at least -- it expects that only 1 row is going to match in the index on astatus. With key_len = 1, meaning that astatus is only 1 byte wide -- as opposed to the multicolumn index, which is 11 bytes wide (1 + 6 + 4) -- the astatus index looks like a really inexpensive solution, so it decides to go with that index. Using the longer index theoretically means more I/O, therefore more costly, though in this case (because of a small data set) we humans recognize that the cost difference isn't particularly meaningful.
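If you want to see how the plan changes when the wider index is used, you can force it purely as an experiment (sender_2 is the name MySQL gave the unnamed 3-column index from the ALTER TABLE above; FORCE INDEX is for comparing plans, not something to leave in production code):

EXPLAIN SELECT `id`
FROM `tblsender` FORCE INDEX (`sender_2`)
WHERE `userid` = '6'
  AND `astatus` = '1'
  AND `sender` = 'ABCDEF';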
Using where means that for each row actually returned via the astatus index, the server will need to verify that the row matches the remainder of the WHERE clause, but if we're only expecting approximately 1 row to match, it's no big deal.
I suggest, then, that you do not have cause for concern, because the small size of the current data set is not going to be able to give you useful information in predicting future behavior. In this specific case, Using where is an artifact of the small number of rows in the table.
You need more data. But yes, you do want a multicolumn index here.
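For completeness, a named sketch of such an index; appending id is my own assumption rather than part of the advice above, but on MyISAM it makes the index cover this particular SELECT `id` query, so Extra can show Using index once the optimizer picks it:

-- The three equality columns can go in any order for this query;
-- `id` at the end makes the index covering for SELECT `id` on MyISAM.
ALTER TABLE `tblsender`
  ADD INDEX `userid_astatus_sender_id` (`userid`, `astatus`, `sender`, `id`);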
I have a table named 'activities' with 50M+ rows.
CREATE TABLE `activities` (
`activity_id` bigint(20) NOT NULL AUTO_INCREMENT,
`id_of_contract` bigint(20) DEFAULT NULL,
`duration_s` int(11) DEFAULT NULL,
`timestamp_end` bigint(20) DEFAULT NULL,
`timestamp_start` bigint(20) DEFAULT NULL,
`id_of_room` bigint(20) DEFAULT NULL,
PRIMARY KEY (`activity_id`),
KEY `analyse` (`id_of_room`,`timestamp_end`,`timestamp_start`,`duration_s`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I have this request:
select *
from activities
where id_of_room=3263
and timestamp_end>1471491882747
and timestamp_start<1479267882747
and duration_s>900
order by duration_s desc;
The explain return this:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE activities NULL range analyse analyse 18 NULL 1 5.00 Using index condition; Using filesort
The query returns in 1.5s. How can I optimize this?
Thanks for your help.
This construct: timestamp_end > 1471491882747 AND timestamp_start < 1479267882747 is essentially impossible to optimize, primarily because the Optimizer does not know whether there could be overlapping rows.
INDEX(id_of_room, duration_s) may make it run faster. If used, it would filter on id_of_room and duration_s, but more importantly, it would avoid the filesort. Not knowing the distribution of the values I (and the Optimizer) cannot predict whether this index will be better. And it is likely to be better for some values and worse for others.
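In ALTER TABLE form (the index name is illustrative):

ALTER TABLE activities
  ADD INDEX room_duration (id_of_room, duration_s);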
One slight benefit would be to change BIGINT to INT UNSIGNED or maybe even MEDIUMINT UNSIGNED where appropriate. With 50M rows, shrinking the data will decrease I/O.
innodb_buffer_pool_size should be set to about 70% of RAM.
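As an illustration only (the 16 GB figure is an assumption, not something from the question), on a dedicated 16 GB server that works out to roughly:

-- MySQL 5.7+ can resize the buffer pool at runtime; older versions
-- need innodb_buffer_pool_size in my.cnf and a restart. ~70% of 16 GB:
SET GLOBAL innodb_buffer_pool_size = 11 * 1024 * 1024 * 1024;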
A potentially big help is to avoid SELECT *. List only the columns you need. If that list is short enough, then devise a composite, covering, index.
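For instance, if the application really only needed duration_s and timestamp_start (a hypothetical column list, purely to illustrate), the query would shrink to the following, and the index suggested at the end of this answer would then cover it:

-- Hypothetical column list; adjust to what the application actually needs.
SELECT duration_s, timestamp_start
FROM activities
WHERE id_of_room = 3263
  AND timestamp_end > 1471491882747
  AND timestamp_start < 1479267882747
  AND duration_s > 900
ORDER BY duration_s DESC;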
One final way to speed up the query is with 'lazy eval':
SELECT a.*
FROM ( SELECT activity_id
FROM activities
where id_of_room=3263
and timestamp_end>1471491882747
and timestamp_start<1479267882747
and duration_s>900
) AS x
JOIN activities AS a USING(activity_id)
ORDER BY a.duration_s desc;
This will be beneficial if using a covering index for the derived table and lots of rows are filtered out. In this case, it is worth trying this ordering of index columns:
INDEX(id_of_room, duration_s, timestamp_start, timestamp_end, activity_id)
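In ALTER TABLE form (the name is illustrative):

ALTER TABLE activities
  ADD INDEX lazy_eval_cover (id_of_room, duration_s, timestamp_start, timestamp_end, activity_id);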
I'm trying to optimize a very basic MySQL example and I can't seem to figure out how to prevent the query below from doing a table scan when referencing the column uid. Using EXPLAIN, it shows a possible key correctly but doesn't actually use the key and scans all the rows.
CREATE TABLE `Foo` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`barId` int(10) unsigned NOT NULL,
`uid` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `barId` (`barId`),
KEY `uid` (`uid`)
)
explain
select count(uid) as userCount
FROM Foo
WHERE barId = 1
GROUP BY barId
id select_type table type possible_keys key rows Extra
1 SIMPLE Foo ALL barId NULL 4 Using where
Sample data
id,barId,uid
1,1,1
2,1,2
3,1,3
4,2,4
It looks like MySQL is being smart and realizing it would take more time to use the index with a table that small?
When I EXPLAIN it with the table empty, the key is "barId". With 4 rows (your sample data), key is NULL. With 4096 rows (I ran INSERT ... SELECT into the table itself a handful of times), key returns to "barId".
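The row-multiplying trick mentioned above is just this, repeated until the table is big enough (column list taken from the Foo definition in the question):

-- Each run doubles the row count; id is auto-generated.
INSERT INTO Foo (barId, uid)
SELECT barId, uid FROM Foo;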
From the Manual:
Indexes are less important for queries on small tables, or big tables where report queries process most or all of the rows. When a query needs to access most of the rows, reading sequentially is faster than working through an index. Sequential reads minimize disk seeks, even if not all the rows are needed for the query.
I have the following query:
SELECT * FROM `alltrackers`
WHERE `deviceid`='FT_99000083401624'
AND `locprovider`!='none'
ORDER BY `id` DESC
This is the show create table:
CREATE TABLE `alltrackers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`deviceid` varchar(50) NOT NULL,
`gpsdatetime` int(11) NOT NULL,
`locprovider` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
KEY `statename` (`statename`),
KEY `gpsdatetime` (`gpsdatetime`),
KEY `locprovider` (`locprovider`),
KEY `deviceid` (`deviceid`(18))
) ENGINE=MyISAM AUTO_INCREMENT=8665045 DEFAULT CHARSET=utf8;
I've removed the columns which I thought were unnecessary for this question.
This is the EXPLAIN output for this query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE alltrackers ref locprovider,deviceid deviceid 56 const 156416 Using where; Using filesort
This particular query is showing as taking several seconds in mytop (mtop). I'm a bit confused though, as the same query but with a different "deviceid" doesn't take as long. Although I only need the last row, I've already removed LIMIT 1 as that makes it take even longer. This table currently contains 3 million rows.
It is used for storing the locations from different GPS devices. Each GPS device has a unique device ID. Locations come in and are added to the table. For statistics I'm running the above query to find the time of the last received location from a certain device.
I'm open to advice on ways to further optimize the query or even the tables.
Many thanks in advance.
If you only need the last row, add an index on (deviceid, id, locprovider). It would be even faster with an index on (deviceid, id, locprovider, gpsdatetime):
ALTER TABLE alltrackers
ADD INDEX special_covering_IDX
(deviceid, id, locprovider, gpsdatetime) ;
Then try this out:
SELECT id, locprovider, gpsdatetime
FROM alltrackers
WHERE deviceid = 'FT_99000083401624'
AND locprovider <> 'none'
ORDER BY id DESC
LIMIT 1 ;
I have MySql table and query that I'm trying to optimize and have some questions.
SELECT value FROM table WHERE userid=?userid AND date <= ?date AND deleted='False' ORDER BY date DESC LIMIT 1
The table:
CREATE TABLE `table` (
`tableid` int(11) NOT NULL AUTO_INCREMENT,
`userid` int(11) DEFAULT NULL,
`value` double DEFAULT '0',
`date` date DEFAULT NULL,
`deleted` enum('False','True') DEFAULT 'False',
PRIMARY KEY (`tableid`),
KEY `userid_date` (`userid`)
) ENGINE=InnoDB;
I get the following EXPLAIN result for the query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table ref userid_date userid_date 5 const 4 Using where; Using filesort
If I change the userid_date key to also include date (KEY `userid_date` (`userid`,`date`)), I get the following EXPLAIN result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table range userid_date userid_date 9 NULL 4 Using where
This is better as it is not using filesort, but it seems like the type is not as good as when only using userid as the key. How would you set up the index for a table and query like this? Is it good to use a date column in an index?
This is better as it is not using filesort
Yes, and also because one more WHERE filter is available in the index.
You could also think about adding deleted to the index: (userid, deleted, date). However, make sure that the equality comparisons go before the column you use for the range and the ORDER BY; otherwise your filesort will come back.
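A sketch of that index (the name is illustrative; whether to keep or drop the old userid_date index is a separate decision, though the new one makes it redundant):

-- Equality columns (userid, deleted) first, then date for the
-- range and ORDER BY, so no filesort is needed.
ALTER TABLE `table`
  ADD INDEX `userid_deleted_date` (`userid`, `deleted`, `date`);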
but it seems like the type is not as good as when only using userid
Why? I think your index will be good enough. The deleted column might not be worth it and may just be wasting space, but that depends on whether other queries use the column as well and what the 'False' rate is in your database.
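To check that 'False' rate, i.e. whether deleted is selective enough to earn its place in the index, a quick count is enough:

SELECT deleted, COUNT(*) AS cnt
FROM `table`
GROUP BY deleted;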