MySQL SELECT COUNT with GROUP and ORDER performance issue

The Facts:
Dedicated Server, 4 Cores, 16GB
MySQL 5.5.29-0ubuntu0.12.10.1-log - (Ubuntu)
One Table, 1.9M rows and growing
I need all sorted rows for an export, or just the top 5. The query takes 25 seconds, of which 23.3 s is spent in "Copying to tmp table".
I tried InnoDB and MyISAM, changing the index column order, using an MD5 hash of some_text for the GROUP BY, and partitioning the table by day.
day is a Unix timestamp and is always present.
lang, some_bool, some_filter, ano_filter and rel_id may appear in the WHERE clause but are not required.
Here is the MyISAM example:
The table
mysql> SHOW CREATE TABLE data \G;
*************************** 1. row ***************************
Table: data
Create Table: CREATE TABLE `data` (
`data_id` bigint(20) NOT NULL AUTO_INCREMENT,
`rel_id` int(11) NOT NULL,
`some_text` varchar(255) DEFAULT NULL,
`lang` varchar(3) DEFAULT NULL,
`some_bool` tinyint(1) DEFAULT NULL,
`some_filter` varchar(40) DEFAULT NULL,
`ano_filter` varchar(10) DEFAULT NULL,
`day` int(11) DEFAULT NULL,
PRIMARY KEY (`data_id`),
KEY `cnt_idx` (`some_filter`,`ano_filter`,`rel_id`,`lang`,`some_bool`,`some_text`,`day`)
) ENGINE=MyISAM AUTO_INCREMENT=1900099 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
The query
mysql> EXPLAIN SELECT `some_text` , COUNT(*) AS `num` FROM `data`
WHERE `lang` = 'en' AND `day` BETWEEN '1364342400' AND
'1366934399' GROUP BY `some_text` ORDER BY `num` DESC \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: data
type: index
possible_keys: NULL
key: cnt_idx
key_len: 947
ref: NULL
rows: 1900098
Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)
mysql> SELECT `some_text` , COUNT(*) AS `num` FROM `data`
WHERE `lang` = 'en' AND `day` BETWEEN '1364342400' AND '1366934399'
GROUP BY `some_text` ORDER BY `num` DESC LIMIT 5 \G;
...
*************************** 5. row ***************************
5 rows in set (24.26 sec)
Any idea how to speed up that thing?

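The index cannot be used to narrow down the rows because of its column order. Indexes work left to right, and cnt_idx starts with some_filter and ano_filter, which the query does not filter on, so MySQL falls back to scanning the entire index (type: index, rows: 1900098 in the EXPLAIN). For this query to seek into an index, you would need an index that starts with lang, day.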
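One way to write that index, as a sketch only: the index name lang_day_text_idx is made up here, and appending some_text is an assumption (it should let the query be answered from the index alone, as cnt_idx does, but it was not part of the suggestion above). Because day is a range condition and the ORDER BY is on the count, the plan will probably still show a temporary table and a filesort, only over the rows that match the WHERE clause.

```sql
-- Hypothetical index name; the leading columns match the WHERE clause.
ALTER TABLE `data`
  ADD INDEX `lang_day_text_idx` (`lang`, `day`, `some_text`);

-- Re-check the plan: look for type: range and key: lang_day_text_idx.
EXPLAIN SELECT `some_text`, COUNT(*) AS `num`
FROM `data`
WHERE `lang` = 'en'
  AND `day` BETWEEN 1364342400 AND 1366934399
GROUP BY `some_text`
ORDER BY `num` DESC
LIMIT 5\G
```

If the plan still shows type: index with the full row count, the optimizer is not seeking into the new index and the column order is worth revisiting.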

Related

Mysql query not using all indexed column for searching

For faster searching I have indexed two columns, client_id and batch_id, with a composite index.
Below is the index listing for my table:
show indexes from authentication_codes
*************************** 3. row ***************************
Table: authentication_codes
Non_unique: 1
Key_name: client_id
Seq_in_index: 1
Column_name: client_id
Collation: A
Cardinality: 18
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
*************************** 4. row ***************************
Table: authentication_codes
Non_unique: 1
Key_name: client_id
Seq_in_index: 2
Column_name: batch_id
Collation: A
Cardinality: 18
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
4 rows in set (0.02 sec)
ERROR:
No query specified
When I use EXPLAIN to check whether the index is used by the query, I get the output below.
mysql> explain select * from authentication_codes where client_id=6 and batch_id="101" \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: authentication_codes
type: ref
possible_keys: client_id
key: client_id
key_len: 773
ref: const,const
rows: 1044778
Extra: Using where
1 row in set (0.00 sec)
ERROR:
No query specified
********************EDIT***************************
The output of SHOW CREATE TABLE authentication_codes is below:
mysql> show create table authentication_codes \G;
*************************** 1. row ***************************
Table: authentication_codes
Create Table: CREATE TABLE `authentication_codes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`batch_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`serial_num` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`client_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_authentication_codes_on_code` (`code`),
KEY `client_id_batch_id` (`client_id`,`batch_id`)
) ENGINE=InnoDB AUTO_INCREMENT=48406205 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)
My question is: why is the batch_id column not used for searching? Why is only the client_id column used?

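To use an index on both columns you need a two-column (composite) index. MySQL normally uses only one index per table in a query (index merge is the exception), so two separate single-column indexes are not a substitute.
This query adds a multi-column index on client_id and batch_id: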
alter table authentication_codes add index client_id_batch_id (client_id,batch_id)
http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html

The EXPLAIN does not match the CREATE TABLE, at least in the name of the relevant index.
Explaining the EXPLAIN (as displayed at the moment):
select_type: SIMPLE
table: authentication_codes
type: ref
possible_keys: client_id
key: client_id -- The index named "client_id" was used
key_len: 773 -- (explained below)
ref: const,const -- 2 constants were used for the first two columns in that index
rows: 1044778 -- About this many rows (2% of table) match those two constants
Extra: Using where
773 = 2 + 3 * 255 + 1 + 4 + 1
2 = length for VARCHAR
3 = max width of a utf8 character -- do you really need utf8?
255 = max length provided in VARCHAR(255) -- do you really need that much?
1 = extra length for NULL -- perhaps your columns could/should be NOT NULL?
4 = length of INT for client_id -- if you don't need 4 billion ids, maybe a smaller INT would work? and maybe UNSIGNED, too?
So, yes, it is using both parts of client_id=6 and batch_id="101". But there are a million rows in that batch for that client, so the query takes time.
If you want to discuss how to further speed up the use of this table, please provide the other common queries. (I don't want to tweak the schema to make this query faster, only to find that other queries are made slower.)
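The key_len arithmetic above can be checked directly in the mysql client. This is only a restatement of the same breakdown, split by column; the column aliases are made up for illustration.

```sql
SELECT
  4 + 1           AS client_id_bytes,  -- INT (4) + NULL flag (1)
  2 + 3 * 255 + 1 AS batch_id_bytes,   -- length prefix (2) + 255 utf8 chars at up to 3 bytes each + NULL flag (1)
  (4 + 1) + (2 + 3 * 255 + 1) AS key_len;  -- 5 + 768 = 773
```

Had only client_id been used, key_len would have been 5, so a reported 773 accounts for both columns of the index.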

How to optimize MySQL query for a large database

I noticed a serious problem recently, when my table grew to over 620,000 records. The following query:
SELECT *,UNIX_TIMESTAMP(`time`) AS `time` FROM `log` WHERE (`projectname`="test" OR `projectname` IS NULL) ORDER BY `time` DESC LIMIT 0, 20
takes about 2.5 s on a local database. How can I speed it up?
EXPLAIN produces the following output:
ID: 1
select type: SIMPLE
TABLE: log
type: ref_or_null
possible_keys: projectname
key: projectname
key_len: 387
ref: const
rows: 310661
Extra: Using where; using filesort
I have indexes on the projectname and time columns.
Any help?
EDIT: Thanks to ypercube's response, I was able to decrease the query execution time. But when I add just one more condition to the WHERE clause (AND severity="changes"), it takes 2 s again. Is it a good solution to include all of the possible WHERE columns in my composite index?
ID: 1
select type: SIMPLE
TABLE: log
type: ref_or_null
possible_keys: projectname
key: projectname
key_len: 419
ref: const, const
rows: 315554
Extra: Using where; using filesort
Table structure:
CREATE TABLE `log` (
`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`projectname` VARCHAR(128) DEFAULT NULL,
`time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`master` VARCHAR(128) NOT NULL,
`itemName` VARCHAR(128) NOT NULL,
`severity` VARCHAR(10) NOT NULL DEFAULT 'info',
`message` VARCHAR(255) NOT NULL,
`more` TEXT NOT NULL,
PRIMARY KEY (`id`),
KEY `projectname` (`severity`,`projectname`,`time`)
) ENGINE=INNODB AUTO_INCREMENT=621691 DEFAULT CHARSET=utf8

Add an index on (projectname, time):
ALTER TABLE log
ADD INDEX projectname_time_IX -- choose a name for the index
(projectname, time) ;
And then use the original column for the ORDER BY
SELECT *, UNIX_TIMESTAMP(time) AS unix_time
FROM log
WHERE (projectname = 'test' OR projectname IS NULL)
ORDER BY time DESC
LIMIT 0, 20 ;
or this variation - to make sure that the index is used effectively:
( SELECT *, UNIX_TIMESTAMP(time) AS unix_time
FROM log
WHERE projectname = 'test'
ORDER BY time DESC
LIMIT 20
)
UNION ALL
( SELECT *, UNIX_TIMESTAMP(time) AS unix_time
FROM log
WHERE projectname IS NULL
ORDER BY time DESC
LIMIT 20
)
ORDER BY time DESC
LIMIT 20 ;

MySQL Refusing to Use Index for Simple Query

I have a table that I'm running a very simple query against. I've added an index to the table on a high cardinality column, so MySQL should be able to narrow the result almost instantly, but it's doing a full table scan every time. Why isn't MySQL using my index?
mysql> select count(*) FROM eventHistory;
+----------+
| count(*) |
+----------+
| 247514 |
+----------+
1 row in set (0.15 sec)
CREATE TABLE `eventHistory` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`whatID` varchar(255) DEFAULT NULL,
`whatType` varchar(255) DEFAULT NULL,
`whoID` varchar(255) DEFAULT NULL,
`createTimestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `whoID` (`whoID`,`whatID`)
) ENGINE=InnoDB;
mysql> explain SELECT * FROM eventHistory where whoID = 12551\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: eventHistory
type: ALL
possible_keys: whoID
key: NULL
key_len: NULL
ref: NULL
rows: 254481
Extra: Using where
1 row in set (0.00 sec)
I have tried adding FORCE INDEX to the query as well, and it still seems to be doing a full table scan. The performance of the query is also poor. It's currently taking about 0.65 seconds to find the appropriate row.

The above answers led me to realize two things.
1) When the indexed column is a VARCHAR, the value in the query must be quoted or MySQL will not use the index. With an unquoted number, MySQL converts the column values to numbers for the comparison, which rules out an index lookup.
SELECT * FROM foo WHERE column = '123'; # do this
SELECT * FROM foo WHERE column = 123; # don't do this
2) You're better off using/indexing an INT if at all possible.
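Applied to the eventHistory table from the question, the only difference is the quotes around the value. A sketch for comparing the two plans; the exact EXPLAIN output depends on the data and server version, so none is shown here.

```sql
-- whoID is VARCHAR(255): compare it with a string literal so the whoID index can be used.
EXPLAIN SELECT * FROM eventHistory WHERE whoID = '12551'\G

-- Numeric literal: MySQL compares the values as numbers, so every whoID has to be
-- converted and the index cannot be used for the lookup (the plan in the question, type: ALL).
EXPLAIN SELECT * FROM eventHistory WHERE whoID = 12551\G
```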

Mysql Query run faster

Table structure:
CREATE TABLE IF NOT EXISTS `logs` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user` bigint(20) unsigned NOT NULL,
`type` tinyint(1) unsigned NOT NULL,
`date` int(11) unsigned NOT NULL,
`plus` decimal(10,2) unsigned NOT NULL,
`minus` decimal(10,2) unsigned NOT NULL,
`tax` decimal(10,2) unsigned NOT NULL,
`item` bigint(20) unsigned NOT NULL,
`info` char(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `item` (`item`),
KEY `user` (`user`),
KEY `type` (`type`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 PACK_KEYS=0 ROW_FORMAT=FIXED;
Query:
SELECT logs.item, COUNT(logs.item) AS total FROM logs WHERE logs.type = 4 GROUP BY logs.item;
The table holds 110k records, of which 50k are type 4.
Execution time: 0.13 seconds
I know this is fast, but can I make it faster?
I am expecting 1 million records, so the time would grow quite a bit.

Analyze queries with EXPLAIN:
mysql> EXPLAIN SELECT logs.item, COUNT(logs.item) AS total FROM logs
WHERE logs.type = 4 GROUP BY logs.item\G
id: 1
select_type: SIMPLE
table: logs
type: ref
possible_keys: type
key: type
key_len: 1
ref: const
rows: 1
Extra: Using where; Using temporary; Using filesort
The "Using temporary; Using filesort" indicates some costly operations. Because the optimizer knows it can't rely on the rows with each value of item being stored together, it needs to read every matching row and collect the count per distinct item in a temporary table, then sort the resulting temp table to produce the result.
You need an index on the logs table on columns (type, item) in that order. Then the optimizer knows it can leverage the index tree to scan each value of logs.item fully before moving on to the next value. By doing this, it can skip the temporary table to collect values, and skip the implicit sorting of the result.
mysql> CREATE INDEX logs_type_item ON logs (type,item);
mysql> EXPLAIN SELECT logs.item, COUNT(logs.item) AS total FROM logs
WHERE logs.type = 4 GROUP BY logs.item\G
id: 1
select_type: SIMPLE
table: logs
type: ref
possible_keys: type,logs_type_item
key: logs_type_item
key_len: 1
ref: const
rows: 1
Extra: Using where

Mysql big table multiple dates in where clause query performance

I have the following table with 2 million rows in it.
CREATE TABLE `gen_fmt_lookup` (
`episode_id` varchar(30) DEFAULT NULL,
`media_type` enum('Audio','Video') NOT NULL DEFAULT 'Video',
`service_id` varchar(50) DEFAULT NULL,
`genre_id` varchar(30) DEFAULT NULL,
`format_id` varchar(30) DEFAULT NULL,
`masterbrand_id` varchar(30) DEFAULT NULL,
`signed` int(11) DEFAULT NULL,
`actual_start` datetime DEFAULT NULL,
`scheduled_start` datetime DEFAULT NULL,
`scheduled_end` datetime DEFAULT NULL,
`discoverable_end` datetime DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
KEY `idx_discoverability_gn` (`media_type`,`service_id`,`genre_id`,`actual_start`,`scheduled_end`,`scheduled_start`,`episode_id`),
KEY `idx_discoverability_fmt` (`media_type`,`service_id`,`format_id`,`actual_start`,`scheduled_end`,`scheduled_start`,`episode_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Below is the query, with its EXPLAIN output, that I am running against this table:
mysql> EXPLAIN select episode_id,scheduled_start
from gen_fmt_lookup
where media_type='video'
and service_id in ('mobile_streaming_100','mobile_streaming_200','iplayer_streaming_h264_flv_vlo','mobile_streaming_500','iplayer_stb_uk_stream_aac_concrete','captions','iplayer_uk_stream_aac_rtmp_concrete','iplayer_streaming_n95_3g','iplayer_uk_download_oma_wifi','iplayer_uk_stream_aac_3gp_concrete')
and genre_id in ('100001','100002','100003','100004','100005','100006','100007','100008','100009','100010')
and NOW() BETWEEN actual_start and scheduled_end
group by episode_id order by min(scheduled_start) limit 1 offset 100\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: nitro_episodes_gen_fmt_lookup
type: range
possible_keys: idx_discoverability_gn,idx_discoverability_fmt
key: idx_discoverability_gn
key_len: 96
ref: NULL
rows: 31719
Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.16 sec)
So my questions are:
Is the index used in the query execution the best one? If not, can someone please suggest a better index?
Can MySQL use a composite index with 2 dates in the WHERE clause? The query above has the condition "and NOW() BETWEEN actual_start and scheduled_end", but MySQL is using index 'idx_discoverability_gn' with a key length of only 96, which means it is using the index only up to (media_type,service_id,genre_id,actual_start). Why can't it use the index up to (media_type,service_id,genre_id,actual_start,scheduled_end)?
What else can I do to improve performance?

You have a range check, so a clustered index on (scheduled_start, actual_start, scheduled_end) might help. Your current indexes are not very useful. You can get rid of them and build one primary key (episode_id) and another index (service_id, genre_id, episode_id).
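On the question of why key_len is only 96: the figure is consistent with the asker's reading that the index is used up to actual_start. A worked breakdown, assuming a MySQL version where DATETIME takes 8 bytes (before 5.6.4), the table's latin1 charset (1 byte per character) and a 1-byte ENUM; the column aliases are made up for illustration.

```sql
SELECT
  1          AS media_type_bytes,    -- ENUM with two values, NOT NULL
  50 + 2 + 1 AS service_id_bytes,    -- VARCHAR(50) latin1 + length prefix (2) + NULL flag (1)
  30 + 2 + 1 AS genre_id_bytes,      -- VARCHAR(30) latin1 + length prefix (2) + NULL flag (1)
  8 + 1      AS actual_start_bytes,  -- DATETIME (8, assumed) + NULL flag (1)
  1 + (50 + 2 + 1) + (30 + 2 + 1) + (8 + 1) AS key_len;  -- 1 + 53 + 33 + 9 = 96
```

The condition on actual_start is a range (actual_start <= NOW()), and MySQL generally stops using further key parts for range access after the first range column, so scheduled_end is only checked as a filter on the index entries that were read.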
