Using Where during Explain for MySQL Query

I have a table, and when running the following query:
EXPLAIN SELECT `id`
FROM `tblsender`
WHERE `userid` = '6'
AND `astatus` = '1'
AND `sender` = 'ABCDEF'
I am getting Using where even after indexing in all possible ways. Here is my final table structure:
CREATE TABLE IF NOT EXISTS `tblsender` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`sender` varchar(6) NOT NULL,
`astatus` tinyint(1) NOT NULL DEFAULT '0',
`userid` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `astatus` (`astatus`),
KEY `userid` (`userid`),
KEY `sender` (`sender`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=22975 ;
I even tried a FULLTEXT index on the sender column, with no luck, and I also tried a composite index on all of the WHERE clause columns:
ALTER TABLE `tblsender` ADD INDEX ( `sender` , `astatus` , `userid` ) ;
I am still getting Using where. How can I properly index this table?
Edit: EXPLAIN output for the above structure:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tblsender ref astatus,userid,sender astatus 1 const 1 Using where
and the EXPLAIN output with the 3-column composite index added:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tblsender ref astatus,userid,sender,sender_2 astatus 1 const 1 Using where

You can't effectively predict optimizer behavior on large data sets when testing with small data sets.
As illustrated by the query plans, the multi-column index is seen as a candidate, but the optimizer is choosing not to use it in this case. That doesn't mean it won't use it once it is considered more beneficial.
I can only speculate without seeing your actual data set and perhaps using optimizer tracing, but I'll offer a reasonable speculation.
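(As a minimal sketch, assuming MySQL 5.6 or later where optimizer tracing is available, you could capture the trace yourself like this:
SET optimizer_trace = 'enabled=on';
SELECT `id` FROM `tblsender`
WHERE `userid` = '6' AND `astatus` = '1' AND `sender` = 'ABCDEF';
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';
The TRACE column shows, step by step, why each candidate index was kept or rejected.)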
The optimizer in MySQL is cost-based. It tries to resolve your query in the least costly way possible. Note that rows = 1. This means that the optimizer has concluded that -- statistically, at least -- it expects that only 1 row is going to match in the index on astatus. With key_len = 1, meaning that astatus is only 1 byte wide -- as opposed to the multicolumn index, which is 11 bytes wide (1 + 6 + 4) -- the astatus index looks like a really inexpensive solution, so it decides to go with that index. Using the longer index theoretically means more I/O, therefore more costly, though in this case (because of a small data set) we humans recognize that the cost difference isn't particularly meaningful.
Using where means that for each row actually returned by using that index, the server will need to verify that the rows match the remainder of the WHERE clause, but if we're only expecting approximately 1 row to match, it's no big deal.
I suggest, then, that you do not have cause for concern, because the small size of the current data set is not going to be able to give you useful information in predicting future behavior. In this specific case, Using where is an artifact of the small number of rows in the table.
You need more data. But yes, you do want a multicolumn index here.
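Since all three predicates are equality tests, the multicolumn index you already added can serve the query. If you also want it to be covering for this exact SELECT, one hedged sketch (the index name is my own) is to append id, because MyISAM secondary indexes do not implicitly include the primary key:
ALTER TABLE `tblsender`
  ADD INDEX `user_status_sender_id` (`userid`, `astatus`, `sender`, `id`);
On a larger data set, EXPLAIN should then be able to show Using index for this query.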

Related

MySQL indexes not being used in large database

I have a very simple query on a large table (about 37 million rows). This query takes over 10 mins to run and should be fast as the indexes are built correctly (I think). I do not understand why this query is taking so long. I am hoping someone can guide me in the right direction:
Query:
select type_id, sub_type_id, max(settlement_date_time) as max_dt
from transaction_history group by type_id, sub_type_id
Create Statement:
CREATE TABLE `transaction_history` (
`transaction_history_id` int(11) NOT NULL AUTO_INCREMENT,
`type_id` int(11) NOT NULL,
`sub_type_id` int(11) DEFAULT NULL,
`settlement_date_time` datetime DEFAULT NULL,
PRIMARY KEY (`transaction_history_id`),
KEY `sub_type_id_idx` (`sub_type_id`),
KEY `settlement_date` (`settlement_date_time`),
KEY `type_sub_type` (`type_id`,`sub_type_id`)
) ENGINE=InnoDB AUTO_INCREMENT=36832823 DEFAULT CHARSET=latin1;
Result from Explain:
id -> 1
select_type -> SIMPLE
table -> transaction_history
type -> index
possible_keys -> NULL
key -> type_sub_type
key_len -> 9
ref -> NULL
rows -> 37025337
filtered -> 100.00
Extra ->
Why is possible_keys NULL? It says it is using an index, but it does not seem like it is. Why is ref NULL? How can I make this query more efficient? Is there something wrong with the indexes? Do I have to change any values in the MySQL config file?
Thank you
(Apologies to the two commenters who already gave the necessary INDEX; I'll try to say enough more to justify giving an 'Answer'.)
Use the 'composite' (and 'covering') index:
INDEX(type_id, sub_type_id, settlement_date_time)
There is no WHERE, so no need to worry about such columns. First come the columns in the order listed in GROUP BY, then comes the other column. The Optimizer will probably hop through the index very efficiently.
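Spelled out as DDL against the table from the question, that suggestion would look something like this (the index name is my own):
ALTER TABLE transaction_history
  ADD INDEX type_subtype_settlement (type_id, sub_type_id, settlement_date_time);
Because the GROUP BY columns lead the index and MAX(settlement_date_time) is the next column, the grouped maximums can be read straight out of the index.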
Why is possible_keys NULL? Well, the 2-column index is useless for this query. In general, if more than 20% of the table needs to be looked at, it is better to simply scan the table rather than bounce between the index BTree and the data BTree.
More tips: http://mysql.rjweb.org/doc.php/index_cookbook_mysql

Optimize Indexes for Particular Query in mySQL

I have a fairly simple query that is taking about 14 seconds to complete and I would like to speed it up. I think I have the correct indexes in place, but I'm not sure...
Here is the query
SELECT *
FROM opportunities
WHERE cid = 7785
AND STATUS != 4
AND otype != 200
AND links > 0
AND ontopic != 'F'
ORDER BY links DESC
LIMIT 0, 100;
Here is the table schema
CREATE TABLE `opportunities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`cid` int(11) NOT NULL,
`url` varchar(900) CHARACTER SET utf8 NOT NULL,
`status` tinyint(4) NOT NULL,
`links` int(11) NOT NULL,
`otype` int(11) NOT NULL,
`reserved` tinyint(4) NOT NULL,
`ontopic` varchar(3) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `cid` (`cid`,`url`),
KEY `cid1` (`cid`),
KEY `url` (`url`),
KEY `otype` (`otype`),
KEY `reserved` (`reserved`),
KEY `ontopic` (`ontopic`),
KEY `status` (`status`),
KEY `links` (`links`),
KEY `ontopic_links` (`ontopic`,`links`),
KEY `cid_status_otype_links_ontopic` (`cid`,`status`,`otype`,`links`,`ontopic`)
) ENGINE=InnoDB AUTO_INCREMENT=13022832 DEFAULT CHARSET=latin1
Here is the result of the EXPLAIN command
id: 1
select_type: Simple
table: opportunities
partitions: null
type: range
possible_keys: cid,cid1,otype,ontopic,status,links,ontopic_links,cid_status_otype_links_ontopic
key: links
key_len: 4
ref: null
rows: 1531552
filtered: 0.33
Extra: Using index condition; Using where
Thoughts / Questions
Am I reading it correctly that it is using the "links" key to do the query? Why wouldn't it use a more complete index, like the cid_status_otype_links_ontopic which covers all the conditions of my query?
Thanks in advance!
As requested
There are 30,961 results that match the query when you remove the LIMIT 0,100. Interestingly, the "count()" command returns almost instantaneously.
It's a funny thing about inequality comparisons: they count as range conditions.
That is, equality matches one value, but anything other than equality (!=, >, <, IN, BETWEEN) matches a range of values.
Once a range condition is involved, the index can only be used up to and including the first column tested with a range condition. You'd think that your index cid_status_otype_links_ontopic has all the columns mentioned in the conditions of your query, but only the first two will be used: the first because you have an equality comparison for cid, the second because the next column is used in an inequality comparison, and that's where it stops using columns from the index.*
Evidence: if you can force that index to be used, you should see the key_len field of the EXPLAIN result show only 5, which is the size of cid (4 bytes) + status (1 byte).
The MySQL optimizer apparently has predicted that it would be more beneficial to use your links index, because that allows it to access the rows in index order, which is the same as the sort order you requested with your ORDER BY.
Evidence: you don't see "Using filesort" in your EXPLAIN notes.
Is that really better than using one of the other indexes? Maybe, maybe not. The optimizer's predictions aren't always perfect.
You can use an index hint to override the optimizer's choice:
SELECT * FROM opportunities USE INDEX (cid_status_otype_links_ontopic) WHERE ...
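Spelled out with the WHERE clause from your question, that is:
SELECT *
FROM opportunities USE INDEX (cid_status_otype_links_ontopic)
WHERE cid = 7785
  AND status != 4
  AND otype != 200
  AND links > 0
  AND ontopic != 'F'
ORDER BY links DESC
LIMIT 0, 100;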
Try that out, do the EXPLAIN of that query and compare it to your other EXPLAIN. Then execute both queries and see which is reliably faster.
(* Actually, I have to add a footnote about the index column usage. MySQL 5.6 and later can do a little bit better than just the two columns, when you see the note "Using Index Condition" in the EXPLAIN. But it's not quite the same. You can read more about that here: https://dev.mysql.com/doc/refman/5.6/en/index-condition-pushdown-optimization.html)
What you have must plow through all of the rows, using your 5-column index, then sort the results and deliver 100 rows.
The only index likely to be useful is INDEX(cid, links). This is because cid is the only column being tested with =, then having links might be useful for the ORDER BY and LIMIT. There is still the risk that the != tests will require filtering a lot of rows.
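As DDL, that would be something like (the index name is my own):
ALTER TABLE opportunities ADD INDEX cid_links (cid, links);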
Are status and otype multi-valued? If either has only 2 values, then turning the != into = and adding it to the index would be beneficial.
Do you really need all the columns (SELECT *)? If not, and if you don't need any big columns (url), then you could go with a 'covering' index.
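For instance, if the columns you actually needed were only id, cid, and links (an assumption; substitute your real list), a covering sketch might be:
ALTER TABLE opportunities
  ADD INDEX cid_links_covering (cid, links, status, otype, ontopic);
InnoDB secondary indexes implicitly carry the primary key (id), and the remaining filtered columns are included here, so such a query could be answered from the index alone (shown as Using index in EXPLAIN).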
More on writing indexes.

How can I optimize this Mysql query with bad performance?

I have a table named 'activities' with 50M+ rows.
CREATE TABLE `activities` (
`activity_id` bigint(20) NOT NULL AUTO_INCREMENT,
`id_of_contract` bigint(20) DEFAULT NULL,
`duration_s` int(11) DEFAULT NULL,
`timestamp_end` bigint(20) DEFAULT NULL,
`timestamp_start` bigint(20) DEFAULT NULL,
`id_of_room` bigint(20) DEFAULT NULL,
PRIMARY KEY (`activity_id`),
KEY `analyse` (`id_of_room`,`timestamp_end`,`timestamp_start`,`duration_s`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I have this request:
select *
from activities
where id_of_room=3263
and timestamp_end>1471491882747
and timestamp_start<1479267882747
and duration_s>900
order by duration_s desc;
The explain return this:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE activities NULL range analyse analyse 18 NULL 1 5.00 Using index condition; Using filesort
The query returns in 1.5s. How can I optimize this?
Thanks for your help.
This construct: timestamp_end > 1471491882747 AND timestamp_start < 1479267882747 is essentially impossible to optimize, primarily because the Optimizer does not know whether there could be overlapping rows.
INDEX(id_of_room, duration_s) may make it run faster. If used, it would filter on id_of_room and duration_s, but more importantly, it would avoid the filesort. Not knowing the distribution of the values, I (and the Optimizer) cannot predict whether this index will be better; it is likely to be better for some values and worse for others.
One slight benefit would be to change BIGINT to INT UNSIGNED, or maybe even MEDIUMINT UNSIGNED, where appropriate. With 50M rows, shrinking the data will decrease I/O.
innodb_buffer_pool_size should be set to about 70% of RAM.
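For illustration only (the numbers are assumptions, not a recommendation for your hardware): on a server with 16 GB of RAM dedicated to MySQL, that would mean something like this in my.cnf:
[mysqld]
innodb_buffer_pool_size = 11G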
A potentially big help is to avoid SELECT *. List only the columns you need. If that list is short enough, then devise a composite, covering, index.
One final way to speed up the query is with 'lazy eval':
SELECT a.*
FROM ( SELECT activity_id
FROM activities
where id_of_room=3263
and timestamp_end>1471491882747
and timestamp_start<1479267882747
and duration_s>900
) AS x
JOIN activities AS a USING(activity_id)
ORDER BY a.duration_s desc;
This will be beneficial if the derived table uses a covering index and lots of rows are filtered out. In that case, it is worth trying this ordering of index columns:
INDEX(id_of_room, duration_s, timestamp_start, timestamp_end, activity_id)
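As DDL (the index name is my own):
ALTER TABLE activities
  ADD INDEX room_dur_covering (id_of_room, duration_s, timestamp_start, timestamp_end, activity_id);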

MySQL Query Optimization for GPS Tracking system

I have the following query:
SELECT * FROM `alltrackers`
WHERE `deviceid`='FT_99000083401624'
AND `locprovider`!='none'
ORDER BY `id` DESC
This is the show create table:
CREATE TABLE `alltrackers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`deviceid` varchar(50) NOT NULL,
`gpsdatetime` int(11) NOT NULL,
`locprovider` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
KEY `gpsdatetime` (`gpsdatetime`),
KEY `locprovider` (`locprovider`),
KEY `deviceid` (`deviceid`(18))
) ENGINE=MyISAM AUTO_INCREMENT=8665045 DEFAULT CHARSET=utf8;
I've removed the columns which I thought were unnecessary for this question.
This is the EXPLAIN output for this query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE alltrackers ref locprovider,deviceid deviceid 56 const 156416 Using where; Using filesort
This particular query is showing as taking several seconds in mytop (mtop). I'm a bit confused though, as the same query but with a different "deviceid" doesn't take as long. Although I only need the last row, I've already removed LIMIT 1 as that makes it take even longer. This table currently contains 3 million rows.
It is used for storing the locations from different GPS devices. Each GPS device has a unique device ID. Locations come in and are added to the table. For statistics I'm running the above query to find the time of the last received location from a certain device.
I'm open to advice on ways to further optimize the query or even the tables.
Many thanks in advance.
If you only need the last row, add an index on (deviceid, id, locprovider). It would be even faster with an index on (deviceid, id, locprovider, gpsdatetime):
ALTER TABLE alltrackers
ADD INDEX special_covering_IDX
(deviceid, id, locprovider, gpsdatetime) ;
Then try this out:
SELECT id, locprovider, gpsdatetime
FROM alltrackers
WHERE deviceid = 'FT_99000083401624'
AND locprovider <> 'none'
ORDER BY id DESC
LIMIT 1 ;

MySQL slow performance on big table

I have the following table with millions of rows:
CREATE TABLE `points` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`DateNumber` int(10) unsigned DEFAULT NULL,
`Count` int(10) unsigned DEFAULT NULL,
`FPTKeyId` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
KEY `index3` (`FPTKeyId`,`DateNumber`) USING HASH
) ENGINE=InnoDB AUTO_INCREMENT=16755134 DEFAULT CHARSET=utf8$$
As you can see, I have created indexes, but I don't know whether I did it right; maybe not. The problem is that queries execute super slowly.
Let's take a simple query
SELECT fptkeyid, count FROM points group by fptkeyid
I can't get a result because the query aborts with a timeout (10 min). What am I doing wrong?
Beware MySQL's stupid behaviour: GROUP BYing implicitly executes ORDER BY.
To prevent this, explicitly add ORDER BY NULL, which prevents unnecessary ordering.
http://dev.mysql.com/doc/refman/5.0/en/select.html says:
If you use GROUP BY, output rows are sorted according to the GROUP BY
columns as if you had an ORDER BY for the same columns. To avoid the
overhead of sorting that GROUP BY produces, add ORDER BY NULL:
SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
Plus:
http://dev.mysql.com/doc/refman/5.6/en/group-by-optimization.html says:
The most important preconditions for using indexes for GROUP BY are
that all GROUP BY columns reference attributes from the same index,
and that the index stores its keys in order (for example, this is a
BTREE index and not a HASH index).
Your query does not make sense:
SELECT fptkeyid, count FROM points group by fptkeyid
You group by fptkeyid, so a bare count column is not useful here; there should be an aggregate function, not a count field. Also, COUNT is a MySQL function, which makes it inadvisable to use the same name for a field.
Don't you need something like:
SELECT fptkeyid, SUM(`count`) FROM points group by fptkeyid
If not please explain what result you expect from the query.
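Putting that together with the ORDER BY NULL tip from the other answer, a hedged rewrite would be:
SELECT `FPTKeyId`, SUM(`Count`) AS total
FROM points
GROUP BY `FPTKeyId`
ORDER BY NULL;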
I created a database with test data, half a million records, to see if I could find something similar to your issue. This is what the EXPLAIN tells me:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE points index NULL index3 10 NULL 433756
And on the SUM query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE points index NULL index3 10 NULL 491781
Both queries finish on a laptop (MacBook Air) within a second; nothing takes long. Inserting, though, took some time, a few minutes to get half a million records, but retrieving and aggregating does not.
We need more information to answer your question completely. Maybe the configuration of the database is wrong, for example almost no memory allocated?
I would personally start with your AUTO_INCREMENT value. You have set it to increase by 16,755,134 for each new record. Your field value is set to INT UNSIGNED which means that the range of values is 0 to 4,294,967,295 (or almost 4.3 billion). This means that you would have only 256 values before the field goes beyond the data type limits thereby compromising the purpose of the PRIMARY KEY INDEX.
You could change the data type to BIGINT UNSIGNED and you would have a value range of 0 to 18,446,744,073,709,551,615 (or slightly more than 18.4 quintillion), which would allow you to have up to 1,100,960,700,983 (or slightly more than 1.1 trillion) unique values with this AUTO_INCREMENT value.
I would first ask if you really need to have your AUTO_INCREMENT value set to such a large number, and if not then I would suggest changing it to 1 (or at least some lower number), as storing the field values as INT vs BIGINT will save considerable disk space within larger tables such as this. Either way, you should get a more stable PRIMARY KEY INDEX, which should help improve queries.
I think the problem is your server bandwidth. Retrieving millions of rows would probably need at least high-megabyte bandwidth.