What is the best way to optimze this query with indexes? - mysql

I have a table with about 30 million records which I need to perform queries upon. From my reading, I thought that a composite index using leftmost prefixing with all the fields I need to select would be the correct way to do it, but when I run an explain on the query, it's not even using the index.
This is the query:
select distinct email FROM my_table
WHERE `customer_id` IN(278,428,186,40,208,247,59,79,376,73,38,52,68,227)
AND `company_id` = 4
AND `active` = 1
AND `date` > '2012-04-15';
The explain looks like this
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
| 1 | SIMPLE | emails | index | customer_id | email | 772 | NULL | 29296705 | Using where |
+----+-------------+--------+-------+---------------+-------+---------+------+----------+-------------+
These are the fields
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL DEFAULT '',
`customer_id` int(10) unsigned DEFAULT NULL,
`company_id` int(10) unsigned NOT NULL,
`active` tinyint(1) unsigned NOT NULL DEFAULT '1',
`date` date DEFAULT NULL
Indexes looks like this
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`,`customer_id`),
KEY `customer_id` (`customer_id`,`company_id`,`active`,`date`)
I'm not quite sure what the best way to optimize this is.

MySQL is often fussy about IN on the left side of the index. Try one query for each customer_id and see if that's using your index. You can use the UNION syntax to join them together The other possibility is that MySQL figures it's faster to sift through everything for 10% of rows than to try to use indexes for them.

Related

[Partition Benifit on Indexed Column]

CREATE TABLE ofRoster (
`rosterID` bigint(20) NOT NULL,
`username` varchar(64) NOT NULL,
`jid` varchar(1024) NOT NULL,
`sub` tinyint(4) NOT NULL,
`ask` tinyint(4) NOT NULL,
`recv` tinyint(4) NOT NULL,
`nick` varchar(255) DEFAULT NULL,
PRIMARY KEY (`rosterID`),
KEY `ofRoster_unameid_idx` (`username`),
KEY `ofRoster_jid_idx` (`jid`(255))
) ENGINE=InnoDB;
CREATE TABLE `ofRoster_par` (
`rosterID` bigint(20) NOT NULL AUTO_INCREMENT,
`username` int(64) NOT NULL,
`jid` varchar(1024) NOT NULL,
`sub` tinyint(4) NOT NULL,
`ask` tinyint(4) NOT NULL,
`recv` tinyint(4) NOT NULL,
`nick` varchar(255) DEFAULT NULL,
UNIQUE KEY `rosterID` (`rosterID`,`username`),
KEY `ofRoster_unameid_idx` (`username`),
KEY `ofRoster_jid_idx` (`jid`(255))
) ENGINE=InnoDB AUTO_INCREMENT=412595 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (username)
PARTITIONS 10 */ ;
I created partition on username so that when i use select command it need to search on one partition only.
But i am not sure if this will be benifitial as there is already a index on username.
explain SELECT count(*) FROM ofRoster_par WHERE username='1';
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
| 1 | SIMPLE | ofRoster_par | ref | ofRoster_unameid_idx | ofRoster_unameid_idx | 4 | const | 120 | Using index |
+----+-------------+--------------+------+----------------------+----------------------+---------+-------+------+-------------+
explain SELECT count(*) FROM ofRoster WHERE username='1';
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+----------------------+----------------------+---------+-------+------+--------------------------+
| 1 | SIMPLE | ofRoster | ref | ofRoster_unameid_idx | ofRoster_unameid_idx | 66 | const | 120 | Using where; Using index |
Right now there are just 400 000 records on the table but on the production records will be around 80 million.
Time taken by both query is also the same :-(
PARTITION BY HASH is, in my opinion, useless.
In your example, INDEX(username) on a non-partitioned table would probably be faster than using PARTITION BY HASH(username).
You already have such an index. How fast was it?
Here's what is happening:
With partitioning:
pick partition
use KEY(username) (and not the data) to do the COUNT(*) inside the index (note "Using index")
Without partitioning:
use KEY(username) (and not the data) to do the COUNT(*) inside the index (note "Using index")
Other comments:
If username is unique, consider making it the PRIMARY KEY and get rid of rosterID. (You may want to keep rosterID because it is smaller and used for JOINing to several other tables.)
Bug: You say INT(64) where you meant VARCHAR(64). This may have impacted your timing test.
"Prefix indexes" (jid(255)) are rarely useful. Let's see how you are using it.
80M rows does not warrant BIGINT (8 bytes); INT UNSIGNED (4 bytes) can handle 400 crore.
You understand that latin1 limits you to western European languages?
When using EXPLAIN with partitioned tables, use EXPLAIN PARTITIONS SELECT .... You may get some surprises.

Why does MySQL ignore index on ORDER BY

I've been trying to wrap my head around this for a good while, but had no luck. I have a simple queue system implemented on my small site and a cron job to check if there are any items in the queue. It's supposed to fetch several items ordered by priority and process them, but for some reason the priority index gets ignored. My create table syntax is
CREATE TABLE `site_queue` (
`row_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`task` tinyint(3) unsigned NOT NULL COMMENT '0 - email',
`priority` int(10) unsigned DEFAULT NULL,
`commands` text NOT NULL,
`added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`row_id`),
KEY `task` (`task`),
KEY `priority` (`priority`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
The query to fetch queued items is
SELECT `row_id`, `task`, `commands` FROM `site_queue` ORDER BY `priority` DESC LIMIT 5;
The EXPLAIN query returns the following:
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
| 1 | SIMPLE | site_queue | ALL | NULL | NULL | NULL | NULL | 1269 | Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------+
Can anyone offer some insight on what might be causing this?
Because when it's only few rows (originally 4, then increased to 1k) there is no reason to use index, since it will be slower (mysql will have to read both index and data pages too many times).
So the rule of thumb of mysql query optimizations: use reasonably big amount of data when you do so. It would be good if size was comparable to real production data size.

Mysql Slow Query - Even with all the indices

mysql> explain
select c.userEmail,f.customerId
from comments c
inner join flows f
on (f.id = c.typeId)
inner join users u
on (u.email = c.userEmail)
where c.addTime >= 1372617000
and c.addTime <= 1374776940
and c.type = 'flow'
and c.automated = 0;
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
| 1 | SIMPLE | f | index | PRIMARY | customerId | 4 | NULL | 144443 | Using index |
| 1 | SIMPLE | c | ref | userEmail_idx,addTime,automated,typeId | typeId | 198 | f.id,const | 1 | Using where |
| 1 | SIMPLE | u | eq_ref | email | email | 386 | c.userEmail | 1 | Using index |
+----+-------------+-------+--------+----------------------------------------+------------+---------+---------------------+--------+-------------+
How do I make the above query faster - it constantly shows up in the slow query logs.
Indexes present :
id is the auto incremented primary key of the flows table.
customerId of flows table.
userEmail of comments table.
composite index (typeId,type) on comments table.
email of users table (unique)
automated of comments table.
addTime of comments table.
Number of rows :
1. flows - 150k
2. comments - 500k (half of them have automated = 1 and others have automated = 0) (also value of type is 'flow' for all the rows except 500)
3. users - 50
Table schemas :
users | CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(128) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=56 DEFAULT CHARSET=utf8
comments | CREATE TABLE `comments` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`userEmail` varchar(128) DEFAULT NULL,
`content` mediumtext NOT NULL,
`addTime` int(11) NOT NULL,
`typeId` int(11) NOT NULL,
`automated` tinyint(4) NOT NULL,
`type` varchar(64) NOT NULL,
PRIMARY KEY (`id`),
KEY `userEmail_idx` (`userEmail`),
KEY `addTime` (`addTime`),
KEY `automated` (`automated`),
KEY `typeId` (`typeId`,`type`)
) ENGINE=InnoDB AUTO_INCREMENT=572410 DEFAULT CHARSET=utf8 |
flows | CREATE TABLE `flows` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(32) NOT NULL,
`status` varchar(128) NOT NULL,
`customerId` int(11) NOT NULL,
`createTime` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `flowType_idx` (`type`),
KEY `customerId` (`customerId`),
KEY `status` (`status`),
KEY `createTime` (`createTime`),
) ENGINE=InnoDB AUTO_INCREMENT=134127 DEFAULT CHARSET=utf8 |
You have the required indexes to perform the joins efficiently. However, it looks like MySQL is joining the tables in a less efficient manner. The EXPLAIN output shows that it is doing a full index scan of the flows table then joining the comments table.
It will probably be more efficient to read the comments table first before joining. That is, in the order you have specified in your query so that the comment set is restricted by the predicates you have supplied (probably what you intended).
Running OPTIMISE TABLE or ANALYZE TABLE can improve the decision that the query optimiser makes. Particularly on tables that have had extensive changes.
If the query optimiser still gets it wrong you can force tables to be read in the order you specify in the query by beginning your statement with SELECT STRAIGHT_JOIN or by changing the INNER JOIN to STRAIGHT_JOIN.

MySQL optimization query

i have one MySQL issue. I have to optimize some queries on my website. One of them i have already done, but there are still some which i cannot resolve without your help.
I have a table called "news":
CREATE TABLE IF NOT EXISTS `news` (
`id` int(10) NOT NULL auto_increment,
`edited` smallint(1) NOT NULL default '0',
`site` varchar(30) default NULL,
`foreign_id` varchar(25) default NULL,
`title` varchar(255) NOT NULL,
`text` text NOT NULL,
`image` varchar(255) default NULL,
`horizontal` smallint(1) NOT NULL,
`image_author` varchar(255) default NULL,
`text_author` varchar(255) default NULL,
`lang` varchar(3) NOT NULL,
`link` varchar(255) NOT NULL,
`date` date NOT NULL,
`redirect` smallint(1) NOT NULL,
`parent` int(10) NOT NULL,
`views` int(5) NOT NULL,
`status` smallint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `lang` (`lang`,`status`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=47122 ;
as you can see i have two indexes: "lang" and "date"
I have tried some combinations of different indexes and this one has produced me the best results ... unfortunately only on my local computer. On the server i still have bad results. I want to say that the database is the same.
query:
SELECT id FROM news WHERE lang = 'en' AND STATUS =1 ORDER BY DATE DESC LIMIT 0, 10
localhost explain:
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | news | index | lang | date | 3 | NULL | 23 | Using where |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------------+
server explain:
+----+-------------+-------+------+---------------+--------+---------+-------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+--------+---------+-------------+-------+-----------------------------+
| 1 | SIMPLE | news | ref | status | status | 13 | const,const | 15840 | Using where; Using filesort |
+----+-------------+-------+------+---------------+--------+---------+-------------+-------+-----------------------------+
I have looked a lot of other similar topics, but unfortunately i cannot find any solution to work on my server. I will be very glad to here from you some solution with some explanation for that so i can optimize my other queries.
Thanks !
This is your query:
SELECT id
FROM news
WHERE lang = 'en' AND STATUS =1
ORDER BY DATE DESC
LIMIT 0, 10
The best index is one that contains all the fields used in the query (four fields in all). The ordering in the index is by equality conditions in the where clause followed by the order by clause followed by other columns in the select clause.
So, try this index: ndws(leng, status, date, id).

Why won't my query use the index on my datetime field?

I have the following table:
CREATE TABLE 'tableA'(
`col1` int(11) NOT NULL AUTO_INCREMENT,
`col2` varchar(20) NOT NULL,
`col3` int(11) NOT NULL,
`col4` varchar(200) NOT NULL,
`col5` varchar(15) NOT NULL,
`col6` datetime NOT NULL,
PRIMARY KEY (`col1`),
UNIQUE KEY `col2,col3` (`col2`,`col3`),
KEY `col6` (`col6`)
) ENGINE=InnoDB AUTO_INCREMENT=1881208 DEFAULT CHARSET=utf8
I have an index on col6, a datetime column. I have almost 2M rows in the table, and the dates range from 1/1/2007 to 11/27/2012.
When I run the following, it doesn't use my index:
EXPLAIN SELECT * FROM tableA ORDER BY col6 ASC
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
| 1 | SIMPLE | tableA | ALL | NULL | NULL | NULL | NULL | 1933765 | Using filesort |
+----+-------------+----------+------+---------------+------+---------+------+---------+----------------+
I tried converting the datetime field to an integer and converting the datetime to a unix timestamp. However, it still won't use my index. What am I missing? Why does the optimizer insist on sorting through lots of rows (in this case 1,933,765 rows) rather than use the index?
Since you are not selecting on anything based on the index to narrow the result set, using it would only incur additional work to lookup via point-lookup every each row in the primary table.