I have the following MySQL (MyISAM) table with about 3 Million rows.
CREATE TABLE `tasks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`node` smallint(6) NOT NULL,
`pid` int(11) NOT NULL,
`job` int(11) NOT NULL,
`a_id` int(11) DEFAULT NULL,
`user_id` int(11) NOT NULL,
`state` int(11) NOT NULL,
`start_time` int(11) NOT NULL,
`end_time` int(11) NOT NULL,
`stop_time` int(11) NOT NULL,
`end_stream` int(11) NOT NULL,
`message` varchar(255) DEFAULT NULL,
`rate` float NOT NULL,
`exiting` int(11) NOT NULL DEFAULT '0',
`bytes` int(11) NOT NULL,
`motion` tinyint(4) NOT NULL,
PRIMARY KEY (`id`),
KEY `a_id` (`a_id`),
KEY `job` (`job`),
KEY `state` (`state`),
KEY `end_time` (`end_time`),
KEY `start_time` (`start_time`),
) ENGINE=MyISAM AUTO_INCREMENT=100 DEFAULT CHARSET=utf8;
Now when I run the following query, MySQL is only using the a_id index and needs to scan a few thousand rows.
SELECT count(id) AS tries FROM `tasks` WHERE ( job='1' OR job='3' )
AND a_id='614' AND state >'80' AND state < '100' AND start_time >='1386538013';
When I add an additional index KEY newkey (a_id,state,start_time), MySQL is still trying to use a_id only and not newkey. Only when using the hint / force index in the query, it's been used. Changing the fields in the query around does not help.
Any ideas? I don't necessarily want hints in my statements. The fact that MySQL is not doing this automatically indicates to me that there is an issue with my table, keys or query somewhere. Any help is highly appreciated.
Additional info:
mysql> show index in tasks;
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| tasks | 0 | PRIMARY | 1 | id | A | 3130554 | NULL | NULL | | BTREE | | |
| tasks | 1 | a_id | 1 | a_id | A | 2992 | NULL | NULL | YES | BTREE | | |
| tasks | 1 | job | 1 | job | A | 5 | NULL | NULL | | BTREE | | |
| tasks | 1 | state | 1 | state | A | 9 | NULL | NULL | | BTREE | | |
| tasks | 1 | end_time | 1 | end_time | A | 1565277 | NULL | NULL | | BTREE | | |
| tasks | 1 | newkey | 1 | a_id | A | 2992 | NULL | NULL | YES | BTREE | | |
| tasks | 1 | newkey | 2 | state | A | 8506 | NULL | NULL | | BTREE | | |
| tasks | 1 | newkey | 3 | start_time | A | 3130554 | NULL | NULL | | BTREE | | |
+-------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EXPLAIN with and without quotes:
mysql> DESCRIBE SELECT count(id) AS tries FROM `tasks` WHERE ( job='1' OR job='3' ) AND a_id='614' AND state >'80' AND state < '100' AND start_time >='1386538013';
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | tasks | ref | a_id,job,state,newkey | a_id | 5 | const | 740 | Using where |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
1 row in set (0.10 sec)
mysql> DESCRIBE SELECT count(id) AS tries FROM `tasks` WHERE ( job=1 OR job=3 ) AND a_id = 614 AND state > 80 AND state < 100 AND start_time >= 1386538013;
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
| 1 | SIMPLE | tasks | ref | a_id,job,state,newkey | a_id | 5 | const | 740 | Using where |
+----+-------------+-------+------+----------------------------+-----------+---------+-------+------+-------------+
1 row in set (0.01 sec)
A few things... I would have a SINGLE compound index on
( a_id, job, state, start_time )
This to help optimize the query on all the criteria, in what I believe is the best tuned sequence. A single "A_ID", then two jobs, a small state range, then time based. Next, notice no quotes... It appears you were converting numeric to string comparisons, leave them as numeric for compare -- faster than strings.
Also, by having them all as part of the index, it is a COVERING index meaning it does NOT have to go to the raw page data to get the other values to test the qualifying records to include or not.
SELECT
count(*) AS tries
FROM
tasks
WHERE
a_id = 614
AND job IN ( 1, 3 )
AND state > 80 AND state < 100
AND start_time >= 1386538013;
Now, the why the index... consider the following scenario. You have two rooms that have boxes... In the first room, each box is an "a_id", within that are the jobs in order, within each job are the state ranges, and finally by start time.
In another room, your boxes are sorted by start time, within that a_id are sorted, and finally state.
Which would be easier to find what you need. That is how you should think on the indexes. I would rather go to one box for "A_ID = 614", then jump to Job 1 and another for Job 3. Within each Job 1, Job 3, grab 80-100, then time. You however know better your data and volume in each criteria consideration and may adjust.
Finally, the count(ID) vs count(*). All I care about is a record qualified. I don't need to know the actual ID as the filtering criteria already qualified as include or not, why look (in this case) for the actual "ID".
Probably mysql thinks that using the a_id key will using less IO.
Probably the cardinality of the key a_id is good enough.
What explains of the hinted/hintless queries say?
Most of a_id=614's state has > 80 and < 100, then it could be happened. Have you tried one of below indexes?
INDEX(a_id, start_time, state)
INDEX(start_time, a_id, state)
Related
Why does this query not use indexes here? The table uses the InnoDB engine.
explain SELECT null as id,
-> up_time,
-> reg_date,
-> refer,
-> MAX(IFNULL(visits_count,0)) as visits_count,
-> MAX(IFNULL(register_count,0)) as register_count,
-> MAX(IFNULL(players_count,0)) as players_count,
-> MAX(IFNULL(activity_count,0)) as activity_count,
-> MAX(IFNULL(payment_users_count,0)) as payment_users_count,
-> MAX(IFNULL(payment_count,0)) as payment_count,
-> MAX(IFNULL(payment_sum,0)) as payment_sum FROM stats_refers
->
-> WHERE
->
-> stats_refers.reg_date < 1435006800
-> AND stats_refers.up_time < 1435006800
->
-> GROUP BY stats_refers.refer, stats_refers.reg_date;
And the explain:
+----+-------------+--------------+------+----------------------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | stats_refers | ALL | reg_date,stat,up_reg_index | NULL | NULL | NULL | 2983126 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+----------------------------+------+---------+------+---------+----------------------------------------------+
And the keys that can be used:
+--------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| stats_refers | 0 | PRIMARY | 1 | id | A | 2983126 | NULL | NULL | | BTREE | | |
| stats_refers | 0 | reg_date | 1 | reg_date | A | 13317 | NULL | NULL | | BTREE | | |
| stats_refers | 0 | reg_date | 2 | up_time | A | 1491563 | NULL | NULL | | BTREE | | |
| stats_refers | 0 | reg_date | 3 | refer | A | 2983126 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | stat | 1 | reg_date | A | 15142 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | stat | 2 | refer | A | 28683 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | refer_uptime | 1 | refer | A | 2307 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | refer_uptime | 2 | up_time | A | 1491563 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | up_reg_index | 1 | reg_date | A | 2314 | NULL | NULL | | BTREE | | |
| stats_refers | 1 | up_reg_index | 2 | up_time | A | 1491563 | NULL | NULL | | BTREE | | |
+--------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
And here is the table description:
CREATE TABLE `stats_refers` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`reg_date` int(10) unsigned NOT NULL,
`up_time` int(10) unsigned NOT NULL,
`refer` varchar(16) NOT NULL DEFAULT '',
`visits_count` int(10) unsigned NOT NULL DEFAULT '0',
`register_count` int(10) unsigned NOT NULL DEFAULT '0',
`players_count` int(10) unsigned NOT NULL DEFAULT '0',
`activity_count` int(10) unsigned NOT NULL DEFAULT '0',
`payment_users_count` int(10) unsigned NOT NULL DEFAULT '0',
`payment_count` int(10) unsigned NOT NULL DEFAULT '0',
`payment_sum` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `reg_date` (`reg_date`,`up_time`,`refer`),
KEY `stat` (`reg_date`,`refer`),
KEY `refer_uptime` (`refer`,`up_time`),
KEY `up_reg_index` (`reg_date`,`up_time`)
) ENGINE=InnoDB AUTO_INCREMENT=4136504 DEFAULT CHARSET=utf8 |
+----+-------------+--------------+-------+-----------------------------------+-------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-----------------------------------+-------------+---------+------+---------+-------------+
| 1 | SIMPLE | stats_refers | index | reg_date,up_time,reg,search_index | group_index | 54 | NULL | 3011896 | Using where |
+----+-------------+--------------+-------+-----------------------------------+-------------+---------+------+---------+-------------+
The compound index stat, which I assume was created for use for this grouping operation, is in the reverse order for doing the grouping operation that you are trying to perform. To have an index available for this grouping operation, you would need to do one of the following:
reverse the column order on the stat index
add an individual index on refer
add another compound index on refer, reg_date
What decision you make would ultimately need to consider other query operations on the table.
You may want to think more broadly about your index usage here. While using compound indexes can improve performance vs. using individual indexes for each field that might need to be used across the range of queries on the table, in your case, you are replicating indexes on the same fields in different combinations within your table description. Without understanding all query use cases you might have it would be hard to look at overall indexing recommendations, but I just wanted to point out that you may want to think about this.
For example, there would be no apparent need at all for your up_reg_index as that indexing is already covered by the reg_date unique index. You might be best served with a set of indexes like this:
PRIMARY KEY (`id`),
UNIQUE KEY `regdate_uptime_refer` (`reg_date`,`up_time`,`refer`),
KEY `reg_date` (`reg_date`),
KEY `refer` (`refer`),
KEY `up_time` (`up_time`),
This would certainly require less space for your indexes than what you currently have and would allow for more flexibility in how to filter/join/group on these columns, but don't take this as a firm recommendation. Test the performance of different indexing scenarios against your different query scenarios (particularly insert performance if that is a use case).
INDEX(refer, reg_date)
in that order might get the optimizer to use that for the GROUP BY.
INDEX(up_time)
might be useful for the part of the WHERE that references up_time.
reg_date < 1435006800
AND up_time < 1435006800
The optimizer will think about using INDEX(reg_date, ...) or INDEX(up_time, ...), but only if the 'range' is less than, say, 20% of the table. It is very unlikely to try to use two separate indexes and do a "index merge".
Because of the "ranges" in the WHERE, it is not possible to have an index that handles both the WHERE and the ORDER BY.
No other indexes are useful for that query.
This 20% is imprecise. It exists because if too much of the table needs to be looked at, it is actually faster to do a table scan than to use the index.
You are selecting up_time, but not grouping by it. That's naughty. The query can deliver whatever up_time it feels like.
Why have an id, since you have a UNIQUE key that could be the PRIMARY KEY? I ask because the "clustering" of the PK could lead to this being the best option:
PRIMARY KEY(refer, reg_date, uptime)
(Of course, changing just that might mess up other queries.)
I am using MySQL 5.6 on Linux (RHEL). The database client is a Java program. The table in question (MyISAM or InnoDB, have tried both) has a multicolumn index comprising two integers (id's from other tables) and a timestamp.
I want to delete records which have timestamps before a given date. I have found that this operation is relatively slow (on the order of 30 seconds in a table which has a few million records). But I've also found that if the other two fields in the index are specified, the operation is much faster. No big surprise there.
I believe I could query the two non-timestamp tables for their index values and then loop over the delete operation, specifying one value of each id each time. I hope that wouldn't take too long; I haven't tried it yet. But it seems like I should be able to get MySQL to do the looping for me. I tried a query of the form
delete from mytable where timestamp < '2013-08-17'
and index1 in (select id from foo)
and index2 in (select id from bar);
but that's actually slower than
delete from mytable where timestamp < '2013-08-17';
Two questions. (1) Is there something I can do to speed up delete operations which depend only on timestamp? (2) Failing that, is there something I can do to get MySQL to loop over two other two index columns (and do it quickly)?
I actually tried this operation with both MyISAM and InnoDB tables with the same data -- they are approximately equally slow.
Thanks in advance for any light you can shed on this problem.
EDIT: More info about the table structure. Here is the output of show create table mytable:
CREATE TABLE `mytable` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`timestamp` datetime NOT NULL,
`fooId` int(10) unsigned NOT NULL,
`barId` int(10) unsigned NOT NULL,
`baz` double DEFAULT NULL,
`quux` varchar(16) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `fooId` (`fooId`,`barId`,`timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=14221944 DEFAULT CHARSET=latin1 COMMENT='stuff'
Here is the output of show indexes from mytable:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
|mytable| 0 | PRIMARY | 1 | id | A | 2612681 | NULL | NULL | | BTREE | | |
|mytable| 0 | fooId | 1 | fooId | A | 20 | NULL | NULL | | BTREE | | |
|mytable| 0 | fooId | 2 | barId | A | 3294 | NULL | NULL | | BTREE | | |
|mytable| 0 | fooId | 3 | timestamp | A | 2612681 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EDIT: More info -- output from "explain".
mysql> explain delete from mytable using mytable inner join foo inner join bar where mytable.fooId=foo.id and mytable.barId=bar.id and timestamp<'2012-08-27';
+----+-------------+-------+-------+---------------+---------+---------+-------------------------------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+-------------------------------+------+----------------------------------------------------+
| 1 | SIMPLE | foo | index | PRIMARY | name | 257 | NULL | 26 | Using index |
| 1 | SIMPLE | bar | index | PRIMARY | name | 257 | NULL | 38 | Using index; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE |mytable| ref | fooId | fooId | 8 | foo.foo.id,foo.bar.id | 211 | Using where |
+----+-------------+-------+-------+---------------+---------+---------+-------------------------------+------+----------------------------------------------------+
Use the multiple-table DELETE syntax to join the tables:
DELETE mytable
FROM mytable
JOIN foo ON foo.id = mytable.index1
JOIN bar ON bar.id = mytable.index2
WHERE timestamp < '2013-08-17'
I think that this should perform particularly well if mytable has a composite index over (index1, index2, timestamp) (and both foo and bar have indexes on their id columns, which will of course be the case if those columns are PK).
Forget about the other two ids. Add an index just on the time stamp. Otherwise you may be traversing the whole table.
I have this big table (ca million records) and I'm trying to retrieve the last record of each type.
The table, the index and the query are very simple, and the fact that MySQL is not using the index means I must be overlooking something.
The table looks like this:
CREATE TABLE `MyTable001` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`TypeField` int(11) NOT NULL,
`Value` bigint(20) NOT NULL,
`Timestamp` bigint(20) NOT NULL,
`AnotherField1` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_MyTable001_TypeField` (`TypeField`),
KEY `idx_MyTable001_Timestamp` (`Timestamp`)
) ENGINE=MyISAM
Show Index gives this:
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| MyTable001 | 0 | PRIMARY | 1 | id | A | 626141 | NULL | NULL | | BTREE | | |
| MyTable001 | 1 | idx_MyTable001_TypeField | 1 | TypeField | A | 458 | NULL | NULL | | BTREE | | |
| MyTable001 | 1 | idx_MyTable001_Timestamp | 1 | Timestamp | A | 156535 | NULL | NULL | | BTREE | | |
+------------+------------+--------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
But when I execute EXPLAIN for the following query:
SELECT *
FROM MyTable001
GROUP BY TypeField
ORDER BY id DESC
The result is this:
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | MyTable001 | ALL | NULL | NULL | NULL | NULL | 626141 | Using temporary; Using filesort |
+----+-------------+------------+------+---------------+------+---------+------+--------+---------------------------------+
Why won't MySQL use idx_MyTable001_TypeField?
Thanks in advance.
The problem is that the content of the fields not in the group by are still being inspected. Therefore, all rows must be read, and it's better to do a full table scan. This is clearly seen with the following examples:
SELECT TypeField, COUNT(*) FROM MyTable001 GROUP BY TypeField uses the index.
SELECT TypeField, COUNT(id) FROM MyTable001 GROUP BY TypeField does not.
The original query was incorrect. The correct query is:
SELECT l.*
FROM MyTable001 l
JOIN (
SELECT MAX(id) m_id
FROM MyTable001 l
GROUP BY l.TypeField) l_id ON l_id.m_id = l.id;
It takes 260ms in a table with 630k records. Joachim Isaksson's and fancyPants' alternatives took several minutes in my tests.
I have a table with 18,310,298 records right now.
And next query
SELECT COUNT(obj_id) AS cnt
FROM
`common`.`logs`
WHERE
`event` = '11' AND
`obj_type` = '2' AND
`region` = 'us' AND
DATE(`date`) = DATE('20120213010502');
With next structure
CREATE TABLE `logs` (
`log_id` int(11) NOT NULL AUTO_INCREMENT,
`event` tinyint(4) NOT NULL,
`obj_type` tinyint(1) NOT NULL DEFAULT '0',
`obj_id` int(11) unsigned NOT NULL DEFAULT '0',
`region` varchar(3) NOT NULL DEFAULT '',
`date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`log_id`),
KEY `event` (`event`),
KEY `obj_type` (`obj_type`),
KEY `region` (`region`),
KEY `for_stat` (`event`,`obj_type`,`obj_id`,`region`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=83126347 DEFAULT CHARSET=utf8 COMMENT='Logs table' |
and MySQL explain show the next
+----+-------------+-------+------+--------------------------------+----------+---------+-------------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------+--------------------------------+----------+---------+-------------+--------+----------+--------------------------+
| 1 | SIMPLE | logs | ref | event,obj_type,region,for_stat | for_stat | 2 | const,const | 837216 | 100.00 | Using where; Using index |
+----+-------------+-------+------+--------------------------------+----------+---------+-------------+--------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
Running such query in daily peak usage time take about 5 seconds.
What can I do to make it faster ?
UPDATED: Regarding all comments I modified INDEX and take off DATE function in WHERE clause
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| logs | 0 | PRIMARY | 1 | log_id | A | 15379109 | NULL | NULL | | BTREE | |
| logs | 1 | event | 1 | event | A | 14 | NULL | NULL | | BTREE | |
| logs | 1 | obj_type | 1 | obj_type | A | 14 | NULL | NULL | | BTREE | |
| logs | 1 | region | 1 | region | A | 14 | NULL | NULL | | BTREE | |
| logs | 1 | for_stat | 1 | event | A | 157 | NULL | NULL | | BTREE | |
| logs | 1 | for_stat | 2 | obj_type | A | 157 | NULL | NULL | | BTREE | |
| logs | 1 | for_stat | 3 | region | A | 157 | NULL | NULL | | BTREE | |
| logs | 1 | for_stat | 4 | date | A | 157 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> explain extended SELECT COUNT(obj_id) as cnt
-> FROM `common`.`logs`
-> WHERE `event`= '11' AND
-> `obj_type` = '2' AND
-> `region`= 'est' AND
-> date between '2012-11-25 00:00:00' and '2012-11-25 23:59:59';
+----+-------------+-------+-------+--------------------------------+----------+---------+------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+-------+--------------------------------+----------+---------+------+------+----------+-------------+
| 1 | SIMPLE | logs | range | event,obj_type,region,for_stat | for_stat | 21 | NULL | 9674 | 75.01 | Using where |
+----+-------------+-------+-------+--------------------------------+----------+---------+------+------+----------+-------------+
It seems it's running faster. Thanks everyone.
The EXPLAIN output shows that the query is using only the first two columns of the for_stat index.
This is because the query doesn't use obj_id in the WHERE clause. If you create a new key without obj_id (or modify the existing key to reorder the columns), more of the key can be used and you may see better performance:
KEY `for_stat2` (`event`,`obj_type`,`region`,`date`)
If it's still too slow, changing the last condition, where you use DATE(), as said by Salman and Sashi, might improve things.
#Joni already explained what is wrong with your index. For query, I assume that your example query selects all records for 2012-02-13 regardless of time. You can change the where clause to use >= and < instead of DATE cast:
SELECT COUNT(obj_id) AS cnt
FROM
`common`.`logs`
WHERE
`event` = 11 AND
`obj_type` = 2 AND
`region` = 'us' AND
`date` >= DATE('20120213010502') AND
`date` < DATE('20120213010502') + INTERVAL 1 DAY
The date function on the date column is making the full table scan.
Try this ::
SELECT COUNT(obj_id) as cnt
FROM
`common`.`logs`
WHERE
`event` = 11
AND
`obj_type` = 2
AND
`region` = 'us'
AND
`date` = DATE('20120213010502')
As logging (inserts) needs to be fast too, use as less indices as possible.
Evaluation may take long as that is admin, not necessarily needing indices.
CREATE TABLE `logs` (
`log_id` int(11) NOT NULL AUTO_INCREMENT,
`event` tinyint(4) NOT NULL,
`obj_type` tinyint(1) NOT NULL DEFAULT '0',
`obj_id` int(11) unsigned NOT NULL DEFAULT '0',
`region` varchar(3) NOT NULL DEFAULT '',
`date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`log_id`),
KEY `for_stat` (`event`,`obj_type`,`region`,`date`)
) ENGINE=InnoDB AUTO_INCREMENT=83126347 DEFAULT CHARSET=utf8 COMMENT='Logs table' |
And about the date search #SashiKant and #SalmanA already answered.
Is Mysql you should place index columns by collation count; less possible values in table - placed closer to the left.
Also you can try to change column region to enum() and try to search date with BETWEEN clause.
Mysql is not using third column in the index because it's usage takes more efforts then just filtering (it's a common thing in Mysql).
I have the following MySQL query:
SELECT pool.username
FROM pool
LEFT JOIN sent ON pool.username = sent.username
AND sent.campid = 'YA1LGfh9'
WHERE sent.username IS NULL
AND pool.gender = 'f'
AND (`location` = 'united states' OR `location` = 'us' OR `location` = 'usa');
The problem is that the pool table contains millions of rows and this query takes over 12 minutes to complete. I realize that in this query, the entire left table (pool) is being scanned. The pool table has an auto incremented id row.
I would like to split this query into multiple queries so that rather than scanning the entire pool table I scan 1000 rows at a time and in the next query I would pick up where I left off (1000-2000,2000-3000) and so on using the id column to keep track.
How can I specify this in my query? Please show examples if you know the answer. Thank you.
here are my indexes if it helps:
mysql> show index from main.pool;
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| pool | 0 | PRIMARY | 1 | id | A | 9275039 | NULL | NULL | | BTREE | |
| pool | 1 | username | 1 | username | A | 9275039 | NULL | NULL | | BTREE | |
| pool | 1 | source | 1 | source | A | 1 | NULL | NULL | | BTREE | |
| pool | 1 | location | 1 | location | A | 38168 | NULL | NULL | | BTREE | |
| pool | 1 | pdex | 1 | gender | A | 2 | NULL | NULL | | BTREE | |
| pool | 1 | pdex | 2 | username | A | 9275039 | NULL | NULL | | BTREE | |
| pool | 1 | pdex | 3 | id | A | 9275039 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
8 rows in set (0.00 sec)
mysql> show index from main.sent;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| sent | 0 | PRIMARY | 1 | primary_key | A | 351 | NULL | NULL | | BTREE | |
| sent | 1 | username | 1 | username | A | 175 | NULL | NULL | | BTREE | |
| sent | 1 | sdex | 1 | campid | A | 7 | NULL | NULL | | BTREE | |
| sent | 1 | sdex | 2 | username | A | 351 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
and here is the explain for my query:
----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+-------+---------+--------------------------------------+
| 1 | SIMPLE | pool | ref | location,pdex | pdex | 5 | const | 6084332 | Using where |
| 1 | SIMPLE | sent | index | sdex | sdex | 309 | NULL | 351 | Using where; Using index; Not exists |
+----+-------------+-------+-------+---------------+------+---------+-------+---------+--------------------------------------+
here is the structure of the pool table:
| pool | CREATE TABLE `pool` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`username` varchar(50) CHARACTER SET utf8 NOT NULL,
`source` varchar(10) CHARACTER SET utf8 NOT NULL,
`gender` varchar(1) CHARACTER SET utf8 NOT NULL,
`location` varchar(50) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
KEY `username` (`username`),
KEY `source` (`source`),
KEY `location` (`location`),
KEY `pdex` (`gender`,`username`,`id`)
) ENGINE=MyISAM AUTO_INCREMENT=9327026 DEFAULT CHARSET=latin1 |
here is the structure of the sent table:
| sent | CREATE TABLE `sent` (
`primary_key` int(50) NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL,
`from` varchar(50) NOT NULL,
`campid` varchar(255) NOT NULL,
`timestamp` int(20) NOT NULL,
PRIMARY KEY (`primary_key`),
KEY `username` (`username`),
KEY `sdex` (`campid`,`username`)
) ENGINE=MyISAM AUTO_INCREMENT=352 DEFAULT CHARSET=latin1 |
This produces a syntax error but this WHERE clause in the beginning is what im after:
SELECT pool.username
FROM pool
WHERE id < 1000
LEFT JOIN sent ON pool.username = sent.username
AND sent.campid = 'YA1LGfh9'
WHERE sent.username IS NULL
AND pool.gender = 'f'
AND (location = 'united states' OR location = 'us' OR location = 'usa');
Splitting your query does not sound like the right approach.
The better way would be to fetch some records from your existing query, send your messages and then continue fetching.
Your query could benefit from another compound index on
pool( location, gender, username )
This should allow to run your complete query from sdex and your new index.
If you really want to split the query, an easy approach could be to
SELECT MIN(id), MAX(id) FROM pool
and then loop from min to max in steps of 1000, and add id >= r AND id < r+1000 to your query.
This could return 0 rows if you have gaps, but it would never return more than 1000 rows at once. A different compound index on pool including (id, location, gender and maybe username) could help for this query.
Looks like its using pool.location
Could try adding an index on gender, might no be a lot of help though.
Rationalising location to a country code in your data, and indexing that would probably be useful.
But the first index to add looks to be campid to me, that could make a serious dent in the number of records it has to test.