I have one table that contains documents; in production it holds about 1.2 million records. When I run SELECT COUNT(*) FROM <table> on it, it takes so long that I eventually have to restart the DB. On the other hand, I also have many other tables containing 10-12 million rows, and those tables do not have this issue.
These are the indexes on that table:
mysql> show index from candidates_resume\G
*************************** 1. row ***************************
Table: candidates_resume
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 843657
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 2. row ***************************
Table: candidates_resume
Non_unique: 0
Key_name: candidate_id
Seq_in_index: 1
Column_name: candidate_id
Collation: A
Cardinality: 844009
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 3. row ***************************
Table: candidates_resume
Non_unique: 1
Key_name: candidates_resume_uploaded_on_e4c78158b8c18f_uniq
Seq_in_index: 1
Column_name: uploaded_on
Collation: A
Cardinality: 844009
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 4. row ***************************
Table: candidates_resume
Non_unique: 1
Key_name: candidates_resume_pdf_file_5b052603240d1d43_uniq
Seq_in_index: 1
Column_name: pdf_file
Collation: A
Cardinality: 844009
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 5. row ***************************
Table: candidates_resume
Non_unique: 1
Key_name: candidates_resume_watermark_file_68fd6000f27d4f8d_uniq
Seq_in_index: 1
Column_name: watermark_file
Collation: A
Cardinality: 844009
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
And this is the result of SHOW CREATE TABLE:
Create Table: CREATE TABLE `candidates_resume` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(50) NOT NULL,
`uploaded_on` datetime NOT NULL,
`candidate_id` int(11) NOT NULL,
`file` varchar(100) NOT NULL,
`hash` varchar(10) NOT NULL,
`pdf_file` varchar(100) DEFAULT NULL,
`resume_text` longtext NOT NULL,
`watermark_file` varchar(100) DEFAULT NULL,
`html_file` varchar(100) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `candidate_id` (`candidate_id`),
KEY `candidates_resume_uploaded_on_e4c78158b8c18f_uniq` (`uploaded_on`),
KEY `candidates_resume_pdf_file_88ec1f31_uniq` (`pdf_file`),
KEY `candidates_resume_watermark_file_23af2d43_uniq` (`watermark_file`),
CONSTRAINT `candidate_id_refs_id_88f99c34` FOREIGN KEY (`candidate_id`) REFERENCES `candidates_candidate` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=591098 DEFAULT CHARSET=utf8
Can anyone guide me on how to track down the issue with this table?
SELECT COUNT(*) FROM ... without any filtering (WHERE) must scan the entire table or an index. This takes time.
Do EXPLAIN SELECT ... to see how it is handled. I think it will use your UNIQUE(candidate_id). (Please provide SHOW CREATE TABLE.)
Assuming that candidate_id is INT or BIGINT, the query can't be run much faster.
Why do you need to count the number of rows? Would an estimate be "good enough"? If so, see SHOW TABLE STATUS or the equivalent query against information_schema.
If the count from midnight this morning would be "good enough", then perform that and save it somewhere.
If you can't figure out how to avoid the timeout, see wait_timeout. Caution: there are several flavors of it.
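A sketch of the estimate route, using the table from the question (TABLE_ROWS is InnoDB's statistics-based estimate and can be off by ten percent or more):

```sql
-- Instant, but approximate: reads index statistics rather than scanning rows
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'candidates_resume';
```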
With a Summary Table
Build and maintain a table that keeps, say, the hourly counts of rows:
CREATE TABLE counts (
hr MEDIUMINT UNSIGNED NOT NULL,
ct SMALLINT UNSIGNED NOT NULL,
PRIMARY KEY(hr)
) ENGINE=InnoDB;
Initialize (one-time task):
INSERT INTO counts (hr, ct)
SELECT FLOOR(UNIX_TIMESTAMP(uploaded_on) / 3600),
COUNT(*)
FROM candidates_resume
GROUP BY 1;
As a new row is inserted into candidates_resume:
INSERT INTO counts
    (hr, ct)
VALUES
    (FLOOR(UNIX_TIMESTAMP(uploaded_on) / 3600), 1)
ON DUPLICATE KEY UPDATE ct = ct + 1;
When wanting the count:
SELECT SUM(ct) FROM counts;
That gives the count up to the start of the current hour. If you need the count up to the current second, add on a second query to count just the rows since the start of the hour.
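That second query could be sketched like this (assuming the counts table is kept current by the insert above; restricting the SUM to full hours avoids double-counting the current hour, and the uploaded_on index makes the trailing COUNT cheap):

```sql
-- Rows through the last full hour (from the summary) plus rows in the current hour
SELECT ( SELECT IFNULL(SUM(ct), 0)
         FROM counts
         WHERE hr < FLOOR(UNIX_TIMESTAMP(NOW()) / 3600) )
     + ( SELECT COUNT(*)
         FROM candidates_resume
         WHERE uploaded_on >= FROM_UNIXTIME(3600 * FLOOR(UNIX_TIMESTAMP(NOW()) / 3600)) )
       AS total_rows;
```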
(There are a few loose ends to fix.)
More discussion: http://mysql.rjweb.org/doc.php/summarytables
Related
An error keeps popping up, like:
IntegrityError: (999, "Duplicate entry 'XXXXX' for key 'constraint_name_here_uniq'")
So I have the constraint name; is there an easy way, from the mysql command line, to find out which table and columns it references? It's a very large database, and I tried poking around a few tables with SHOW CREATE TABLE with no luck. I also tried DESC <constraint name>, but that didn't work either.
This should work:
select *
from information_schema.KEY_COLUMN_USAGE
where CONSTRAINT_NAME ='constraint_name_here_uniq';
Example:
mysql> use information_schema;
Database changed
mysql> select * from KEY_COLUMN_USAGE where CONSTRAINT_NAME ='user_has_notification_types_user_idx' \G
*************************** 1. row ***************************
CONSTRAINT_CATALOG: def
CONSTRAINT_SCHEMA: kanboard
CONSTRAINT_NAME: user_has_notification_types_user_idx
TABLE_CATALOG: def
TABLE_SCHEMA: kanboard
TABLE_NAME: user_has_notification_types
COLUMN_NAME: user_id
ORDINAL_POSITION: 1
POSITION_IN_UNIQUE_CONSTRAINT: NULL
REFERENCED_TABLE_SCHEMA: NULL
REFERENCED_TABLE_NAME: NULL
REFERENCED_COLUMN_NAME: NULL
*************************** 2. row ***************************
CONSTRAINT_CATALOG: def
CONSTRAINT_SCHEMA: kanboard
CONSTRAINT_NAME: user_has_notification_types_user_idx
TABLE_CATALOG: def
TABLE_SCHEMA: kanboard
TABLE_NAME: user_has_notification_types
COLUMN_NAME: notification_type
ORDINAL_POSITION: 2
POSITION_IN_UNIQUE_CONSTRAINT: NULL
REFERENCED_TABLE_SCHEMA: NULL
REFERENCED_TABLE_NAME: NULL
REFERENCED_COLUMN_NAME: NULL
2 rows in set (1.70 sec)
And the table using the index:
mysql> use kanboard;
Database changed
mysql> show create table user_has_notification_types;
+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| user_has_notification_types | CREATE TABLE `user_has_notification_types` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`notification_type` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_has_notification_types_user_idx` (`user_id`,`notification_type`),
CONSTRAINT `user_has_notification_types_ibfk_1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=34 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
+-----------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)
For faster searches I have indexed two columns (a composite index): client_id and batch_id.
Below is the output of the indexes on my table:
show indexes from authentication_codes
*************************** 3. row ***************************
Table: authentication_codes
Non_unique: 1
Key_name: client_id
Seq_in_index: 1
Column_name: client_id
Collation: A
Cardinality: 18
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
*************************** 4. row ***************************
Table: authentication_codes
Non_unique: 1
Key_name: client_id
Seq_in_index: 2
Column_name: batch_id
Collation: A
Cardinality: 18
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
4 rows in set (0.02 sec)
When I use EXPLAIN to check whether the index is used in the query, it gives me the output below:
mysql> explain select * from authentication_codes where client_id=6 and batch_id="101" \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: authentication_codes
type: ref
possible_keys: client_id
key: client_id
key_len: 773
ref: const,const
rows: 1044778
Extra: Using where
1 row in set (0.00 sec)
********************EDIT***************************
The output of show create table authentication_codes is below:
mysql> show create table authentication_codes \G;
*************************** 1. row ***************************
Table: authentication_codes
Create Table: CREATE TABLE `authentication_codes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`batch_id` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`serial_num` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`client_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_authentication_codes_on_code` (`code`),
KEY `client_id_batch_id` (`client_id`,`batch_id`)
) ENGINE=InnoDB AUTO_INCREMENT=48406205 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)
My question is: why is the batch_id column not used for searching? Why is only the client_id column used?
To use an index for a lookup on two columns, you need a single two-column (composite) index; MySQL will not combine two separate single-column indexes efficiently for this.
This statement adds a multi-column index on client_id and batch_id:
alter table authentication_codes add index client_id_batch_id (client_id, batch_id);
http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
The EXPLAIN does not match the CREATE TABLE, at least in the name of the relevant index.
Explaining the EXPLAIN (as displayed at the moment):
select_type: SIMPLE
table: authentication_codes
type: ref
possible_keys: client_id
key: client_id -- The index named "client_id" was used
key_len: 773 -- (explained below)
ref: const,const -- 2 constants were used for the first two columns in that index
rows: 1044778 -- About this many rows (2% of table) matches those two constants
Extra: Using where
773 = 2 + 3 * 255 + 1 + 4 + 1
2 = length for VARCHAR
3 = max width of a utf8 character -- do you really need utf8?
255 = max length provided in VARCHAR(255) -- do you really need that much?
1 = extra length for NULL -- perhaps your columns could/should be NOT NULL?
4 = length of INT for client_id -- if you don't need 4 billion ids, maybe a smaller INT would work? and maybe UNSIGNED, too?
So, yes, it is using both parts of client_id=6 and batch_id="101". But there are a million rows in that batch for that client, so the query takes time.
If you want to discuss how to further speed up the use of this table, please provide the other common queries. (I don't want to tweak the schema to make this query faster, only to find that other queries are made slower.)
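For illustration, the shrinking suggested in the breakdown above could be sketched like this (a hypothetical change only; first verify that no batch_id value exceeds the new length, that neither column is ever NULL, and that client ids fit in the smaller type, or the ALTER will fail or truncate):

```sql
-- Hypothetical narrowing: key_len drops from 773 to roughly 196
-- (batch_id: 2 length bytes + 3*64 = 194; client_id SMALLINT UNSIGNED: 2)
ALTER TABLE authentication_codes
    MODIFY client_id SMALLINT UNSIGNED NOT NULL,  -- 2 bytes instead of 4 + 1 NULL byte
    MODIFY batch_id  VARCHAR(64) NOT NULL;        -- 64 chars instead of 255, no NULL byte
```

A narrower key means more index entries per page, hence fewer page reads for the same lookup.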
The Facts:
Dedicated Server, 4 Cores, 16GB
MySQL 5.5.29-0ubuntu0.12.10.1-log - (Ubuntu)
One Table, 1.9M rows and growing
I need all rows, sorted, for export, or a chunk of 5. The query takes 25 seconds, of which 23.3 s is Copying To Tmp Table.
I tried InnoDB and MyISAM, changing the index order, using an MD5 hash of some_text for the GROUP BY, and partitioning the table by day.
day is a Unix timestamp and is always present.
lang, some_bool, some_filter, ano_filter, and rel_id may appear in the WHERE clause, but need not.
Here is the MyISAM example:
The table
mysql> SHOW CREATE TABLE data \G;
*************************** 1. row ***************************
Table: data
Create Table: CREATE TABLE `data` (
`data_id` bigint(20) NOT NULL AUTO_INCREMENT,
`rel_id` int(11) NOT NULL,
`some_text` varchar(255) DEFAULT NULL,
`lang` varchar(3) DEFAULT NULL,
`some_bool` tinyint(1) DEFAULT NULL,
`some_filter` varchar(40) DEFAULT NULL,
`ano_filter` varchar(10) DEFAULT NULL,
`day` int(11) DEFAULT NULL,
PRIMARY KEY (`data_id`),
KEY `cnt_idx` (`some_filter`,`ano_filter`,`rel_id`,`lang`,`some_bool`,`some_text`,`day`)
) ENGINE=MyISAM AUTO_INCREMENT=1900099 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
The query
mysql> EXPLAIN SELECT `some_text` , COUNT(*) AS `num` FROM `data`
WHERE `lang` = 'en' AND `day` BETWEEN '1364342400' AND
'1366934399' GROUP BY `some_text` ORDER BY `num` DESC \G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: data
type: index
possible_keys: NULL
key: cnt_idx
key_len: 947
ref: NULL
rows: 1900098
Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.00 sec)
mysql> SELECT `some_text` , COUNT(*) AS `num` FROM `data`
WHERE `lang` = 'en' AND `day` BETWEEN '1364342400' AND '1366934399'
GROUP BY `some_text` ORDER BY `num` DESC LIMIT 5 \G;
...
*************************** 5. row ***************************
5 rows in set (24.26 sec)
Any idea how to speed up that thing?
No index is being used effectively because of the column order in the index: indexes work left to right. For this query to use an index, you would need an index on (lang, day).
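A sketch of that index (the name lang_day is arbitrary; on a 1.9M-row MyISAM table the ALTER locks the table while the index builds):

```sql
-- Equality column first, range column second: lang = 'en' narrows the tree,
-- then the day BETWEEN range scans within that narrowed slice
ALTER TABLE data ADD INDEX lang_day (lang, day);
```

The GROUP BY some_text with ORDER BY num will still need a temporary table and filesort, but over far fewer rows than a full scan of cnt_idx.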
Table structure:
CREATE TABLE IF NOT EXISTS `logs` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user` bigint(20) unsigned NOT NULL,
`type` tinyint(1) unsigned NOT NULL,
`date` int(11) unsigned NOT NULL,
`plus` decimal(10,2) unsigned NOT NULL,
`minus` decimal(10,2) unsigned NOT NULL,
`tax` decimal(10,2) unsigned NOT NULL,
`item` bigint(20) unsigned NOT NULL,
`info` char(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `item` (`item`),
KEY `user` (`user`),
KEY `type` (`type`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 PACK_KEYS=0 ROW_FORMAT=FIXED;
Query:
SELECT logs.item, COUNT(logs.item) AS total FROM logs WHERE logs.type = 4 GROUP BY logs.item;
The table holds 110k records, of which 50k are type-4 records.
Execution time: 0.13 seconds
I know this is fast, but can I make it faster?
I am expecting 1 million records and thus the time would grow quite a bit.
Analyze queries with EXPLAIN:
mysql> EXPLAIN SELECT logs.item, COUNT(logs.item) AS total FROM logs
WHERE logs.type = 4 GROUP BY logs.item\G
id: 1
select_type: SIMPLE
table: logs
type: ref
possible_keys: type
key: type
key_len: 1
ref: const
rows: 1
Extra: Using where; Using temporary; Using filesort
The "Using temporary; Using filesort" indicates some costly operations. Because the optimizer knows it can't rely on the rows with each value of item being stored together, it needs to scan the whole table and collect the count per distinct item in a temporary table. Then sort the resulting temp table to produce the result.
You need an index on the logs table on columns (type, item) in that order. Then the optimizer knows it can leverage the index tree to scan each value of logs.item fully before moving on to the next value. By doing this, it can skip the temporary table to collect values, and skip the implicit sorting of the result.
mysql> CREATE INDEX logs_type_item ON logs (type,item);
mysql> EXPLAIN SELECT logs.item, COUNT(logs.item) AS total FROM logs
WHERE logs.type = 4 GROUP BY logs.item\G
id: 1
select_type: SIMPLE
table: logs
type: ref
possible_keys: type,logs_type_item
key: logs_type_item
key_len: 1
ref: const
rows: 1
Extra: Using where
I have following table with 2 million rows in it.
CREATE TABLE `gen_fmt_lookup` (
`episode_id` varchar(30) DEFAULT NULL,
`media_type` enum('Audio','Video') NOT NULL DEFAULT 'Video',
`service_id` varchar(50) DEFAULT NULL,
`genre_id` varchar(30) DEFAULT NULL,
`format_id` varchar(30) DEFAULT NULL,
`masterbrand_id` varchar(30) DEFAULT NULL,
`signed` int(11) DEFAULT NULL,
`actual_start` datetime DEFAULT NULL,
`scheduled_start` datetime DEFAULT NULL,
`scheduled_end` datetime DEFAULT NULL,
`discoverable_end` datetime DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
KEY `idx_discoverability_gn` (`media_type`,`service_id`,`genre_id`,`actual_start`,`scheduled_end`,`scheduled_start`,`episode_id`),
KEY `idx_discoverability_fmt` (`media_type`,`service_id`,`format_id`,`actual_start`,`scheduled_end`,`scheduled_start`,`episode_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Below is the query, with its EXPLAIN output, that I am running against this table:
mysql> EXPLAIN select episode_id,scheduled_start
from gen_fmt_lookup
where media_type='video'
and service_id in ('mobile_streaming_100','mobile_streaming_200','iplayer_streaming_h264_flv_vlo','mobile_streaming_500','iplayer_stb_uk_stream_aac_concrete','captions','iplayer_uk_stream_aac_rtmp_concrete','iplayer_streaming_n95_3g','iplayer_uk_download_oma_wifi','iplayer_uk_stream_aac_3gp_concrete')
and genre_id in ('100001','100002','100003','100004','100005','100006','100007','100008','100009','100010')
and NOW() BETWEEN actual_start and scheduled_end
group by episode_id order by min(scheduled_start) limit 1 offset 100\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: nitro_episodes_gen_fmt_lookup
type: range
possible_keys: idx_discoverability_gn,idx_discoverability_fmt
key: idx_discoverability_gn
key_len: 96
ref: NULL
rows: 31719
Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.16 sec)
So my questions are:
Is the index used in this query execution the best one? If not, can someone please suggest a better index?
Can MySQL use a composite index with two dates in the WHERE clause? The WHERE clause above has the condition "AND NOW() BETWEEN actual_start AND scheduled_end", but MySQL is using index 'idx_discoverability_gn' with a key length of only 96, which means it is using the index only up to (media_type, service_id, genre_id, actual_start). Why can't it use the index up to (media_type, service_id, genre_id, actual_start, scheduled_end)?
What else can I do to improve performance?
You have a range check, so a clustered index on (scheduled_start, actual_start, scheduled_end) might help. Your current indexes are not very useful. You can get rid of them, build a primary key on (episode_id), and add another index on (service_id, genre_id, episode_id).
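A sketch of that restructuring, assuming episode_id really is unique per row (the current schema does not enforce this, so verify first; the index name svc_genre is made up):

```sql
ALTER TABLE gen_fmt_lookup
    MODIFY episode_id VARCHAR(30) NOT NULL,          -- a PRIMARY KEY column cannot be NULL
    DROP INDEX idx_discoverability_gn,
    DROP INDEX idx_discoverability_fmt,
    ADD PRIMARY KEY (episode_id),
    ADD INDEX svc_genre (service_id, genre_id, episode_id);
```

If duplicate episode_id values exist, the ADD PRIMARY KEY step will fail, which is a useful safety check in itself.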