Why won't my query use the index on my datetime field? - mysql

I have the following table:
CREATE TABLE `tableA` (
`col1` int(11) NOT NULL AUTO_INCREMENT,
`col2` varchar(20) NOT NULL,
`col3` int(11) NOT NULL,
`col4` varchar(200) NOT NULL,
`col5` varchar(15) NOT NULL,
`col6` datetime NOT NULL,
PRIMARY KEY (`col1`),
UNIQUE KEY `col2,col3` (`col2`,`col3`),
KEY `col6` (`col6`)
) ENGINE=InnoDB AUTO_INCREMENT=1881208 DEFAULT CHARSET=utf8
I have an index on col6, a datetime column. I have almost 2M rows in the table, and the dates range from 1/1/2007 to 11/27/2012.
When I run the following, it doesn't use my index:
EXPLAIN SELECT * FROM tableA ORDER BY col6 ASC
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
| 1 | SIMPLE | tableA | ALL | NULL | NULL | NULL | NULL | 1933765 | Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
I tried converting the datetime field to an integer, and converting the datetime to a Unix timestamp. However, it still won't use my index. What am I missing? Why does the optimizer insist on sorting through lots of rows (in this case 1,933,765 rows) rather than using the index?

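Since there is no WHERE clause (and no LIMIT) to narrow the result set, using the index would only add work. The index on col6 stores just col6 and the primary key. To return SELECT *, MySQL would therefore have to do a separate point lookup into the primary table for every single row, all ~1.9M of them. A sequential full scan followed by a filesort is cheaper, so the optimizer chooses that.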
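A toy cost model makes this trade-off concrete. It is a sketch under made-up assumptions, not MySQL's real cost formulas. The constants ROWS_PER_PAGE, SEQ_PAGE_COST, RANDOM_LOOKUP_COST and SORT_COST_PER_CMP are hypothetical, and the function names are illustrative. The model compares one sequential pass plus a sort against one random primary-key lookup per index entry.

```python
import math

# Toy cost model, NOT MySQL's real formulas. Every constant below is a
# made-up assumption chosen only to show the shape of the trade-off.
ROWS_PER_PAGE = 50        # hypothetical rows per 16 KB InnoDB page
SEQ_PAGE_COST = 1.0       # reading the next page during a full scan
RANDOM_LOOKUP_COST = 4.0  # one primary-key dive from a secondary-index entry
SORT_COST_PER_CMP = 1e-4  # weight of one comparison in the filesort


def full_scan_then_sort(total_rows: int) -> float:
    """Read every page in primary-key order, then sort all rows by col6."""
    pages = math.ceil(total_rows / ROWS_PER_PAGE)
    comparisons = total_rows * math.log2(total_rows)
    return pages * SEQ_PAGE_COST + comparisons * SORT_COST_PER_CMP


def walk_col6_index(rows_needed: int) -> float:
    """Read col6 in order; each entry costs a lookup to fetch the SELECT * columns.
    (Ignores the comparatively small cost of reading the col6 index pages.)"""
    return rows_needed * RANDOM_LOOKUP_COST


if __name__ == "__main__":
    n = 1_933_765  # row estimate from the EXPLAIN above
    print(f"full scan + sort: {full_scan_then_sort(n):,.0f}")
    print(f"index, all rows:  {walk_col6_index(n):,.0f}")
    print(f"index, LIMIT 10:  {walk_col6_index(10):,.0f}")
```

With these assumed numbers, the index path loses badly when every row is needed. It wins easily when only 10 rows are needed. That is why a small LIMIT, or a WHERE clause that narrows the rows, is typically what makes MySQL willing to walk the col6 index for ORDER BY.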

Related

mysql unique index on char column not getting used with IN clause

MySQL version: 5.7.22
Table definition
CREATE TABLE `books` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`uuid` char(32) COLLATE utf8mb4_unicode_ci NOT NULL,
`title` varchar(254) COLLATE utf8mb4_unicode_ci NOT NULL,
`created` datetime(6) NOT NULL,
`modified` datetime(6) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `book_uuid` (`uuid`)
) ENGINE=InnoDB AUTO_INCREMENT=115 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Query
SELECT DISTINCT `books`.`id`,
`books`.`uuid`,
`books`.`title`
FROM `books`
WHERE (`books`.`uuid` IN ("334222a0e99b4a3e97f577665055208e",
"979c059840964934816280ba85c67221",
"4e2978c765dd435998666ea3083666e5",
"535aa78ba80e4215bbf75fb1e20cc5f3",
"f969fb10c72b4875aabdf75c1b493524",
"1daa0015055444a4b1c0821618a7a4d9",
"04f34ede284a4b86b0adddb405d30a75",
"513cad12c88c44c6ab248d43643459b9",
"de2bde6d016f4381ad0ba714234386fa",
"f645c2c9f1594a199a960b97b7015986",
"3ce02c072f24447a8a7b269a19ec554f",
"75450daf9d024d9d9c0df038437ae2c2",
"0e822042b50b4f79bb38304e0acde6f0",
"38d808fb3f9a4f57b4f7b30a141e7169",
"ecd424abd3a94a339383f6f8e668655e"))
ORDER BY `books`.`id` DESC
LIMIT 15;
When I run EXPLAIN on this query, it doesn't pick the index:
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
| 1 | SIMPLE | books | NULL | ALL | book_uuid | NULL | NULL | NULL | 107 | 12.15 | Using where; Using filesort |
+----+-------------+-----------------+------------+------+---------------+------+---------+------+------+----------+-----------------------------+
Strangely, the index is used when 12 or fewer values are passed to the IN clause (without forcing the index).
Forcing the index works, but I can't force it because this query is generated by the Django ORM, and I can't use django-mysql's force_index because I am on InnoDB.
Note: I know DISTINCT and LIMIT could be avoided in this query, but it's part of a bigger query, so I have kept it as is.

Partition Benefit on Indexed Column

CREATE TABLE ofRoster (
`rosterID` bigint(20) NOT NULL,
`username` varchar(64) NOT NULL,
`jid` varchar(1024) NOT NULL,
`sub` tinyint(4) NOT NULL,
`ask` tinyint(4) NOT NULL,
`recv` tinyint(4) NOT NULL,
`nick` varchar(255) DEFAULT NULL,
PRIMARY KEY (`rosterID`),
KEY `ofRoster_unameid_idx` (`username`),
KEY `ofRoster_jid_idx` (`jid`(255))
) ENGINE=InnoDB;
CREATE TABLE `ofRoster_par` (
`rosterID` bigint(20) NOT NULL AUTO_INCREMENT,
`username` int(64) NOT NULL,
`jid` varchar(1024) NOT NULL,
`sub` tinyint(4) NOT NULL,
`ask` tinyint(4) NOT NULL,
`recv` tinyint(4) NOT NULL,
`nick` varchar(255) DEFAULT NULL,
UNIQUE KEY `rosterID` (`rosterID`,`username`),
KEY `ofRoster_unameid_idx` (`username`),
KEY `ofRoster_jid_idx` (`jid`(255))
) ENGINE=InnoDB AUTO_INCREMENT=412595 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (username)
PARTITIONS 10 */ ;
I partitioned the table on username so that a SELECT only needs to search one partition.
But I'm not sure this is beneficial, since there is already an index on username.
explain SELECT count(*) FROM ofRoster_par WHERE username='1';
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
| 1 | SIMPLE | ofRoster_par | ref | ofRoster_unameid_idx | ofRoster_unameid_idx | 4 | const | 120 | Using index |
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
explain SELECT count(*) FROM ofRoster WHERE username='1';
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
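| 1 | SIMPLE | ofRoster | ref | ofRoster_unameid_idx | ofRoster_unameid_idx | 66 | const | 120 | Using where; Using index |
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+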
Right now there are only 400,000 records in the table, but in production there will be around 80 million.
Both queries also take the same time. :-(
PARTITION BY HASH is, in my opinion, useless.
In your example, INDEX(username) on a non-partitioned table would probably be faster than using PARTITION BY HASH(username).
You already have such an index. How fast was it?
Here's what is happening:
With partitioning: pick the partition, then use KEY(username) (and not the data) to do the COUNT(*) inside the index (note "Using index").
Without partitioning: use KEY(username) (and not the data) to do the COUNT(*) inside the index (note "Using index").
So partitioning only adds a partition-pruning step in front of the same index-only count.
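A toy model shows why the extra "pick partition" step buys little. MySQL's PARTITION BY HASH(expr) PARTITIONS N places a row in partition MOD(expr, N). Everything else here is an assumption:
- usernames are integers, as in ofRoster_par;
- rows are spread evenly across the 10 partitions;
- the index has a hypothetical fanout of 500 entries per page.

The helper names are illustrative.

```python
# Toy model of PARTITION BY HASH(username) PARTITIONS 10 plus a B-tree lookup.
# Assumptions: integer usernames (as in ofRoster_par), an even spread of rows
# across partitions, and a made-up index fanout of 500 entries per page.
PARTITIONS = 10
FANOUT = 500


def partition_for(username: int, partitions: int = PARTITIONS) -> int:
    """HASH partitioning picks MOD(expr, N); for non-negative ints that is %."""
    return username % partitions


def btree_levels(rows: int, fanout: int = FANOUT) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers all rows."""
    levels, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    total = 80_000_000
    print("username=1 lives in partition", partition_for(1))
    print("levels, unpartitioned index:", btree_levels(total))
    print("levels, one of 10 partitions:", btree_levels(total // PARTITIONS))
```

Under these assumptions, splitting 80M rows into 10 partitions of 8M does not remove a single B-tree level. Each lookup therefore descends about as many levels either way.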
Other comments:
If username is unique, consider making it the PRIMARY KEY and get rid of rosterID. (You may want to keep rosterID because it is smaller and used for JOINing to several other tables.)
Bug: You say INT(64) where you meant VARCHAR(64). This may have impacted your timing test.
"Prefix indexes" (jid(255)) are rarely useful. Let's see how you are using it.
80M rows does not warrant BIGINT (8 bytes); INT UNSIGNED (4 bytes) can handle about 4.29 billion (roughly 429 crore).
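Here is the arithmetic behind that comment, as a quick sketch. It assumes the 80M production estimate from the question and the two secondary indexes on ofRoster. InnoDB stores the primary-key value in every secondary-index entry. The saving therefore applies once in the clustered index and once per secondary index, ignoring page overhead.

```python
# Back-of-the-envelope arithmetic for BIGINT vs INT UNSIGNED on rosterID.
# Assumptions: 80M rows (the production estimate from the question) and the two
# secondary indexes on ofRoster; InnoDB copies the primary key into each of them.
INT_UNSIGNED_MAX = 2**32 - 1
ROWS = 80_000_000
BIGINT_BYTES, INT_BYTES = 8, 4
SECONDARY_INDEXES = 2


def bytes_saved(rows: int, copies: int) -> int:
    """Bytes saved by shrinking one key column from 8 to 4 bytes, over `copies` stored copies."""
    return rows * (BIGINT_BYTES - INT_BYTES) * copies


if __name__ == "__main__":
    print(f"INT UNSIGNED max: {INT_UNSIGNED_MAX:,}")  # 4,294,967,295
    # one copy in the clustered index + one per secondary index
    print(f"saved: {bytes_saved(ROWS, 1 + SECONDARY_INDEXES):,} bytes")
```

That works out to roughly 320 MB per stored copy, or about 960 MB across the three copies, before any page or row overhead.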
You understand that latin1 limits you to western European languages?
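When using EXPLAIN with partitioned tables, use EXPLAIN PARTITIONS SELECT ... (on MySQL 5.7 and later, plain EXPLAIN already includes a partitions column; the PARTITIONS keyword is deprecated in 5.7 and removed in 8.0). You may get some surprises.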

MySQL optimization query

I have a MySQL issue. I need to optimize some queries on my website. I have already fixed one of them, but there are still some that I can't resolve without your help.
I have a table called "news":
CREATE TABLE IF NOT EXISTS `news` (
`id` int(10) NOT NULL auto_increment,
`edited` smallint(1) NOT NULL default '0',
`site` varchar(30) default NULL,
`foreign_id` varchar(25) default NULL,
`title` varchar(255) NOT NULL,
`text` text NOT NULL,
`image` varchar(255) default NULL,
`horizontal` smallint(1) NOT NULL,
`image_author` varchar(255) default NULL,
`text_author` varchar(255) default NULL,
`lang` varchar(3) NOT NULL,
`link` varchar(255) NOT NULL,
`date` date NOT NULL,
`redirect` smallint(1) NOT NULL,
`parent` int(10) NOT NULL,
`views` int(5) NOT NULL,
`status` smallint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `lang` (`lang`,`status`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=47122 ;
As you can see, I have two indexes: "lang" and "date".
I have tried several combinations of indexes, and this one gave me the best results... unfortunately only on my local computer. On the server I still get bad results, even though the database is the same.
query:
SELECT id FROM news WHERE lang = 'en' AND STATUS =1 ORDER BY DATE DESC LIMIT 0, 10
localhost explain:
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | news | index | lang | date | 3 | NULL | 23 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
server explain:
+----+-------------+-------+------+---------------+--------+---------+-------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------------+-------+-----------------------------+
| 1 | SIMPLE | news | ref | status | status | 13 | const,const | 15840 | Using where; Using filesort |
+----+-------------+-------+------+---------------+--------+---------+-------------+-------+-----------------------------+
I have looked at a lot of similar topics, but unfortunately I can't find a solution that works on my server. I would be very glad to hear a solution, with an explanation, so that I can optimize my other queries as well.
Thanks!
This is your query:
SELECT id
FROM news
WHERE lang = 'en' AND STATUS =1
ORDER BY DATE DESC
LIMIT 0, 10
The best index is one that contains all the columns used in the query (four in all). Order the columns in the index as follows: first the equality conditions from the WHERE clause, then the ORDER BY column, then any other columns in the SELECT list.
So, try this index: news(lang, status, date, id).
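To see why that column order helps, here is a toy model that treats the proposed index as a sorted list of (lang, status, date, id) tuples. It is an illustration, not MySQL internals. The function top_n_by_date_desc is hypothetical, and status is assumed to be an integer, as in the table definition.

```python
from bisect import bisect_left
from datetime import date

# Toy model: a composite B-tree index on (lang, status, date, id) behaves like a
# sorted list of tuples. This is an illustration, not how MySQL is implemented.


def top_n_by_date_desc(index, lang, status, n):
    """Emulate: SELECT id ... WHERE lang=? AND status=? ORDER BY date DESC LIMIT n."""
    # All entries sharing the equality prefix (lang, status) are contiguous.
    lo = bisect_left(index, (lang, status))
    hi = bisect_left(index, (lang, status + 1))  # assumes integer status
    # Inside that slice entries are already ordered by date, so reading it
    # backwards yields DATE DESC with no separate sort step (no filesort).
    # id is part of each entry, so no row lookup is needed (covering index).
    stop = max(lo, hi - n)
    return [index[i][3] for i in range(hi - 1, stop - 1, -1)]


if __name__ == "__main__":
    rows = [
        ("en", 1, date(2012, 1, 3), 7),
        ("en", 1, date(2012, 1, 1), 3),
        ("de", 1, date(2012, 1, 5), 9),
        ("en", 0, date(2012, 1, 9), 4),
        ("en", 1, date(2012, 1, 2), 5),
    ]
    print(top_n_by_date_desc(sorted(rows), "en", 1, 2))  # [7, 5]
```

The equality prefix narrows the search to one contiguous slice. That slice is already in date order, so no filesort is needed. Because id is stored in each entry, the table rows are never touched.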

What is the best way to optimize this query with indexes?

I have a table with about 30 million records which I need to perform queries upon. From my reading, I thought that a composite index using leftmost prefixing with all the fields I need to select would be the correct way to do it, but when I run an explain on the query, it's not even using the index.
This is the query:
select distinct email FROM my_table
WHERE `customer_id` IN(278,428,186,40,208,247,59,79,376,73,38,52,68,227)
AND `company_id` = 4
AND `active` = 1
AND `date` > '2012-04-15';
The EXPLAIN output looks like this:
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
| 1 | SIMPLE | emails | index | customer_id | email | 772 | NULL | 29296705 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
These are the fields
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL DEFAULT '',
`customer_id` int(10) unsigned DEFAULT NULL,
`company_id` int(10) unsigned NOT NULL,
`active` tinyint(1) unsigned NOT NULL DEFAULT '1',
`date` date DEFAULT NULL
The indexes look like this:
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`,`customer_id`),
KEY `customer_id` (`customer_id`,`company_id`,`active`,`date`)
I'm not quite sure what the best way to optimize this is.
MySQL is often fussy about an IN list on the leftmost column of an index. Try one query for each customer_id and see whether each of those uses your index; you can then combine them with UNION. The other possibility is that MySQL estimates it is faster to sift through everything than to use the index when the query would match around 10% of the rows.
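The toy model below treats the composite index (customer_id, company_id, active, date) as a sorted list of tuples. It shows what "one query per customer_id" means: each IN value maps to its own contiguous range, and the UNION just concatenates those ranges. This is an illustration, not MySQL internals. It makes these assumptions:
- matching_emails is a hypothetical name;
- each entry carries the email to stand in for the row lookup MySQL would need (the real index does not contain email);
- dates are whole days;
- NULL customer_id or date values are not modelled.

```python
from bisect import bisect_left
from datetime import date, timedelta

# Toy model (not MySQL internals) of KEY customer_id (customer_id, company_id,
# active, date) as a sorted list of tuples. The email appended to each entry
# stands in for the row lookup MySQL would need; the real index lacks email.
# NULL customer_id / date values are not modelled.


def matching_emails(index, customer_ids, company_id, active, since):
    """Emulate SELECT DISTINCT email ... WHERE customer_id IN (...)
    AND company_id = ? AND active = ? AND date > ?"""
    emails = set()
    for cid in customer_ids:  # one index range per IN value, like one UNION branch
        # first entry whose date is strictly after `since` (dates are whole days)
        lo = bisect_left(index, (cid, company_id, active, since + timedelta(days=1)))
        # first entry past the (cid, company_id, active) prefix; assumes integer active
        hi = bisect_left(index, (cid, company_id, active + 1))
        emails.update(entry[4] for entry in index[lo:hi])
    return sorted(emails)  # DISTINCT; sorted only to make the output deterministic


if __name__ == "__main__":
    sample = sorted([
        (40, 4, 1, date(2012, 5, 1), "a@example.com"),
        (40, 4, 1, date(2012, 4, 15), "b@example.com"),  # not after 2012-04-15
        (59, 4, 0, date(2012, 6, 1), "c@example.com"),   # inactive
        (59, 4, 1, date(2012, 6, 1), "a@example.com"),   # duplicate email
        (73, 3, 1, date(2012, 6, 1), "e@example.com"),   # other company
    ])
    ids = [278, 428, 186, 40, 208, 247, 59, 79, 376, 73, 38, 52, 68, 227]
    print(matching_emails(sample, ids, 4, 1, date(2012, 4, 15)))  # ['a@example.com']
```

Each range costs only a couple of binary searches. In the real table, however, every matching entry also needs a row lookup to read email. When many rows match, that lookup cost is what can make a full scan look cheaper to the optimizer.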

Performance disparities between two almost identical tables

I have two tables that are identical, except that one has a TIMESTAMP value column and the other has a DATETIME value column. The indexes are the same, and so are the values.
But when I run SELECT station, MAX(timestamp) AS max_timestamp FROM stations GROUP BY station; against the table with the TIMESTAMP column, it executes really fast. Against the DATETIME one, I have never seen the query finish. In both cases the timestamp column is indexed; only the type changes.
Where should I start looking? Or is DATETIME just not suitable for searching and indexing?
Here is what EXPLAIN gives:
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| 1 | SIMPLE | stations | range | NULL | stamp | 33 | NULL | 1511 | Using index for group-by |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------+
| 1 | SIMPLE |stations2 | index | NULL | station | 2 | NULL | 3025467 | |
+----+-------------+--------+-------+---------------+---------+---------+------+---------+-------+
And the SHOW CREATE TABLE output:
CREATE TABLE `stations` (
`station` varchar(10) COLLATE utf8_bin DEFAULT NULL,
`available` smallint(6) DEFAULT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
UNIQUE KEY `stamp` (`station`,`timestamp`),
KEY `time` (`timestamp`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
CREATE TABLE `stations2` (
`station` smallint(5) unsigned NOT NULL,
`available` smallint(5) unsigned DEFAULT NULL,
`timestamp` datetime DEFAULT NULL,
KEY `station` (`station`),
KEY `timestamp` (`timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
You can see from the EXPLAIN that there is no key being used for selection (NULL for possible_keys). You don't have a WHERE clause, so this makes sense.
MySQL can use an index to find a MAX(), and it can use an index to optimize GROUP BY. However, to optimize both at once, you need a compound index whose leading column is the GROUP BY column, followed by the column inside MAX(), i.e. (station, timestamp). In the first table, you have exactly this compound index as the unique key called 'stamp'. The EXPLAIN result ("Using index for group-by") shows that MySQL is using that index.
The second table doesn't have this compound index, so MySQL has to do far more work. Per the EXPLAIN, it walks the entire `station` index (about 3M entries), fetches each row's timestamp, and keeps the running MAX for each station itself. If you add the same compound index to the second table, the two should perform similarly.
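Here is a toy model of the two plans, not MySQL internals. With entries sorted by (station, timestamp), like the stamp key, the maximum for each station is simply the last entry of its group, so the scan can jump from group to group. Without that index, every row has to be visited. The model assumes integer station ids (as in stations2) and comparable timestamp values. The function names are hypothetical.

```python
from bisect import bisect_left

# Toy model (not MySQL internals) of SELECT station, MAX(timestamp) ... GROUP BY station.
# Index entries are (station, timestamp) tuples kept sorted, like KEY stamp.
# Assumes integer station ids (as in stations2) and comparable timestamps.


def loose_index_scan_max(index):
    """Jump from group to group: one probe per station, not one visit per row."""
    result, probes, i = {}, 0, 0
    while i < len(index):
        station = index[i][0]
        end = bisect_left(index, (station + 1,))  # first entry of the next station
        result[station] = index[end - 1][1]       # last entry in the group = MAX
        probes += 1
        i = end
    return result, probes


def full_scan_max(rows):
    """Without the compound index: visit every row and track MAX per station."""
    result = {}
    for station, ts in rows:
        if station not in result or ts > result[station]:
            result[station] = ts
    return result, len(rows)


if __name__ == "__main__":
    rows = [(1, 10), (1, 30), (2, 5), (2, 7), (2, 6), (3, 1)]
    print(loose_index_scan_max(sorted(rows)))  # ({1: 30, 2: 7, 3: 1}, 3)
    print(full_scan_max(rows))                 # ({1: 30, 2: 7, 3: 1}, 6)
```

Both plans return the same answer. The first touches one group per station, while the second touches every row. The EXPLAIN rows estimates above (1,511 versus 3,025,467) roughly reflect the same gap.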
However, TIMESTAMP may still slightly outperform DATETIME. TIMESTAMP is stored as a single 4-byte integer. DATETIME took 8 bytes in older MySQL versions, and since MySQL 5.6.4 it takes 5 bytes plus any fractional-seconds storage. The larger the data set, the larger the difference you will see.
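Here is a quick back-of-the-envelope sketch of the size difference, per stored copy of the column. It assumes the 3,025,467-row estimate from the stations2 EXPLAIN, DATETIME without fractional seconds, and no page or row overhead.

```python
# Back-of-the-envelope arithmetic only; ignores page and row overhead.
ROWS = 3_025_467          # row estimate from the stations2 EXPLAIN output
TIMESTAMP_BYTES = 4
DATETIME_BYTES_OLD = 8    # storage format before MySQL 5.6.4
DATETIME_BYTES_NEW = 5    # MySQL 5.6.4+, without fractional seconds


def extra_bytes(rows: int, datetime_bytes: int) -> int:
    """Extra bytes DATETIME costs over TIMESTAMP for one stored copy of the column."""
    return rows * (datetime_bytes - TIMESTAMP_BYTES)


if __name__ == "__main__":
    print(f"old format: {extra_bytes(ROWS, DATETIME_BYTES_OLD):,} bytes")  # 12,101,868
    print(f"new format: {extra_bytes(ROWS, DATETIME_BYTES_NEW):,} bytes")  # 3,025,467
```

Roughly 12 MB, or about 3 MB with the newer format, is small next to visiting every row. This is consistent with the point above that the missing compound index explains most of the gap, with the column type contributing only a slight difference.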