I have a straightforward table which currently has ~10M rows.
Here is the definition:
CREATE TABLE `train_run_messages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`train_id` int(10) unsigned NOT NULL,
`customer_id` int(10) unsigned NOT NULL,
`station_id` int(10) unsigned NOT NULL,
`train_run_id` int(10) unsigned NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`type` tinyint(4) NOT NULL,
`customer_station_track_id` int(10) unsigned DEFAULT NULL,
`lateness_type` tinyint(3) unsigned NOT NULL,
`lateness_amount` mediumint(9) NOT NULL,
`lateness_code` tinyint(3) unsigned DEFAULT '0',
`info_text` varchar(32) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `timestamp` (`timestamp`),
KEY `lateness_amount` (`lateness_amount`),
KEY `customer_timestamp` (`customer_id`,`timestamp`),
KEY `trm_customer` (`customer_id`),
KEY `trm_train` (`train_id`),
KEY `trm_station` (`station_id`),
KEY `trm_trainrun` (`train_run_id`),
KEY `FI_trm_customer_station_tracks` (`customer_station_track_id`),
CONSTRAINT `FK_trm_customer_station_tracks` FOREIGN KEY (`customer_station_track_id`) REFERENCES `customer_station_tracks` (`id`),
CONSTRAINT `trm_customer` FOREIGN KEY (`customer_id`) REFERENCES `customers` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `trm_station` FOREIGN KEY (`station_id`) REFERENCES `stations` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `trm_train` FOREIGN KEY (`train_id`) REFERENCES `trains` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `trm_trainrun` FOREIGN KEY (`train_run_id`) REFERENCES `train_runs` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=9928724 DEFAULT CHARSET=utf8;
We have lots of queries that filter by customer_id and timestamp so we have created a combined index for that.
Now I have this simple query:
SELECT * FROM `train_run_messages` WHERE `customer_id` = '5' AND `timestamp` >= '2013-12-01 00:00:57' AND `timestamp` <= '2013-12-31 23:59:59' LIMIT 0, 100
On our current machine with ~10M entries this query takes ~16 seconds, which is way too long for my taste, since there is an index for queries like this.
So let's look at the output of EXPLAIN for this query:
+----+-------------+--------------------+------+-------------------------------------------+--------------------+---------+-------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+------+-------------------------------------------+--------------------+---------+-------+--------+-------------+
| 1 | SIMPLE | train_run_messages | ref | timestamp,customer_timestamp,trm_customer | customer_timestamp | 4 | const | 551405 | Using where |
+----+-------------+--------------------+------+-------------------------------------------+--------------------+---------+-------+--------+-------------+
So MySQL is telling me that it would use the customer_timestamp index, fine! Why does the query still take ~16 seconds?
Since I don't always trust the MySQL query analyzer, let's try it with a forced index:
SELECT * FROM `train_run_messages` USE INDEX (customer_timestamp) WHERE `customer_id` = '5' AND `timestamp` >= '2013-12-01 00:00:57' AND `timestamp` <= '2013-12-31 23:59:59' LIMIT 0, 100
Query Time: 0.079s!!
Me: puzzled!
So can anyone explain why MySQL is obviously not using the index that it says it would use in the EXPLAIN output? And is there any way to prove which index it actually used when executing the real query?
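One way to see what a query actually did is to compare the session Handler counters before and after running it (a sketch; the Handler_read_% names are standard MySQL status variables): an index lookup plus range scan drives Handler_read_key/Handler_read_next up, while a full table scan drives Handler_read_rnd_next into the millions.

```sql
-- Reset the session counters, run the query, then see what the engine did.
FLUSH STATUS;

SELECT * FROM train_run_messages
WHERE customer_id = 5
  AND `timestamp` >= '2013-12-01 00:00:57'
  AND `timestamp` <= '2013-12-31 23:59:59'
LIMIT 0, 100;

-- Handler_read_key / Handler_read_next high: index lookup + range scan.
-- Handler_read_rnd_next in the millions: full table scan.
SHOW SESSION STATUS LIKE 'Handler_read%';
```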
Btw: Here is the output from the slow-log:
# Time: 131217 11:18:04
# User@Host: root[root] @ localhost [127.0.0.1]
# Query_time: 16.252878 Lock_time: 0.000168 Rows_sent: 100 Rows_examined: 9830711
SET timestamp=1387275484;
SELECT * FROM `train_run_messages` WHERE `customer_id` = '5' AND `timestamp` >= '2013-12-01 00:00:57' AND `timestamp` <= '2013-12-31 23:59:59' LIMIT 0, 100;
Although it does not explicitly say that it is not using any index, the Rows_examined value suggests that it is doing a full table scan.
So is this fixable without using USE INDEX? We are using Propel as our ORM, and there is currently no way to use the MySQL-specific USE INDEX hint without writing the query manually.
Edit:
Here is the output of EXPLAIN and USE INDEX:
+----+-------------+--------------------+-------+--------------------+--------------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+-------+--------------------+--------------------+---------+------+--------+-------------+
| 1 | SIMPLE | train_run_messages | range | customer_timestamp | customer_timestamp | 8 | NULL | 191264 | Using where |
+----+-------------+--------------------+-------+--------------------+--------------------+---------+------+--------+-------------+
MySQL has three candidate indexes
(timestamp)
(customer_id, timestamp)
(customer_id)
and you are asking:
`customer_id` = '5' AND `timestamp` BETWEEN ? AND ?
The optimizer has chosen (customer_id, timestamp) based on statistics.
The InnoDB engine's optimizer depends on statistics, which are collected by sampling when the table is opened. By default, the sampling reads 8 pages of the index file.
So, I suggest three things:
1. Increase innodb_stats_sample_pages to 64. The default value of innodb_stats_sample_pages is 8 pages.
See http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_stats_sample_pages
2. Remove redundant indexes. The following indexes are just fine, since currently there is only customer_id = 5 (as you said):
(timestamp)
(customer_id)
3. Run OPTIMIZE TABLE train_run_messages to reorganize the table.
This reduces the table and index size, and sometimes makes the optimizer smarter.
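A sketch of those three steps in SQL (which indexes are truly redundant depends on your full query mix; trm_customer is shown because (customer_id) is a leftmost prefix of (customer_id, timestamp), so the foreign key on customer_id can be backed by the combined index):

```sql
-- 1. Sample more index pages when gathering statistics (GLOBAL in MySQL 5.5).
SET GLOBAL innodb_stats_sample_pages = 64;

-- 2. Drop a redundant index: (customer_id) is a prefix of
--    (customer_id, timestamp), so trm_customer adds nothing.
ALTER TABLE train_run_messages DROP INDEX trm_customer;

-- 3. Rebuild the table and refresh its statistics.
OPTIMIZE TABLE train_run_messages;
```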
To me, the biggest issue is that you are wrapping the customer ID in quotes, as in = '5'. Written that way, the value has to be converted to match the integer column, which can keep the customer/timestamp index from being used. Use just = 5 and you should be good to go.
Related
Below are the four tables' structures:
Calendar:
CREATE TABLE `calender` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`HospitalID` int(11) NOT NULL,
`ColorCode` int(11) DEFAULT NULL,
`RecurrID` int(11) NOT NULL,
`IsActive` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`ID`),
UNIQUE KEY `ID_UNIQUE` (`ID`),
KEY `idxHospital` (`ID`,`StaffID`,`HospitalID`,`ColorCode`,`RecurrID`,`IsActive`)
) ENGINE=InnoDB AUTO_INCREMENT=4638 DEFAULT CHARSET=latin1;
CalendarAttendee:
CREATE TABLE `calenderattendee` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`CalenderID` int(11) NOT NULL,
`StaffID` int(11) NOT NULL,
`IsActive` tinyint(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`ID`),
KEY `idxCalStaffID` (`StaffID`,`CalenderID`)
) ENGINE=InnoDB AUTO_INCREMENT=20436 DEFAULT CHARSET=latin1;
CallPlanStaff:
CREATE TABLE `callplanstaff` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`Staffname` varchar(45) NOT NULL,
`IsActive` tinyint(4) NOT NULL DEFAULT '1',
PRIMARY KEY (`ID`),
UNIQUE KEY `ID_UNIQUE` (`ID`),
KEY `idx_IsActive` (`Staffname`,`IsActive`),
KEY `idx_staffName` (`Staffname`,`ID`) USING BTREE KEY_BLOCK_SIZE=100
) ENGINE=InnoDB AUTO_INCREMENT=13 DEFAULT CHARSET=latin1;
Users:
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL DEFAULT '',
`name` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_users_on_email` (`email`),
UNIQUE KEY `index_users_on_name` (`name`),
KEY `idx_email` (`email`) USING BTREE KEY_BLOCK_SIZE=100
) ENGINE=InnoDB AUTO_INCREMENT=33 DEFAULT CHARSET=utf8;
What I'm trying to do is fetch calender.ID and users.name using the query below:
SELECT a.ID, h.name
FROM `stjude`.`calender` a
left join calenderattendee e on a.ID = e.calenderID
left join callplanstaff f on e.StaffID = f.ID
left join users h on f.Staffname = h.email
The relations between those tables are:
It took about 4 seconds to fetch 13,000 records, which I bet could be faster.
When I look at the tabular explain of the query, here's the result:
Why isn't MySQL using an index on the callplanstaff and users tables?
Also, in my case, should I use multiple single-column indexes instead of a multi-column index?
And are there any indexes I'm missing, making my query slow?
=======================================================================
Updated:
As zedfoxus and spencer7593 recommended changing idxCalStaffID's column order and idx_staffname's column order, below is the execution plan:
It took 0.063 seconds to fetch, far less time than before. How does the ordering of the index columns affect the fetch time?
You're misinterpreting the EXPLAIN report.
type: index is not such a good thing. It means it's doing an "index scan", which examines every entry of an index. It's almost as bad as a table scan. Notice the values rows: 4562 and rows: 13451. These are the estimated numbers of index entries it will examine for each of those tables.
Having two tables doing an index scan is even worse. The total number of rows examined for this join is 4562 x 13451 = 61,363,462.
Using join buffer is not a good thing either. It's something the optimizer does as a consolation when it can't use an index for the join.
type: eq_ref is a good thing. It means it's using a PRIMARY KEY or UNIQUE KEY index to look up exactly one row. Notice the column rows: 1. So for each of the rows from the previous join, it does only one index lookup.
You should create an index on calenderattendee for the columns (CalenderID, StaffID), in that order (@spencer7593 posted this suggestion while I was writing my post).
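A sketch of that index (column names taken from the calenderattendee DDL above; the existing idxCalStaffID on (StaffID, CalenderID) can stay for lookups in the other direction):

```sql
-- CalenderID first, so the join ON a.ID = e.CalenderID can do an
-- index lookup for each calendar row instead of scanning the index.
ALTER TABLE calenderattendee
  ADD INDEX idx_cal_staff (CalenderID, StaffID);
```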
By using LEFT [OUTER] JOIN in this query, you're preventing MySQL from optimizing the order of the table joins. And since your query fetches h.name, I infer that you really just want results where the calendar event has an attendee and the attendee has a corresponding user record, so there is no reason not to use an INNER JOIN.
Here's the EXPLAIN with the new index and the joins changed to INNER JOIN (though my row counts are meaningless because I didn't create test data):
+----+-------------+-------+------------+--------+--------------------------------+----------------------+---------+----------------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+--------------------------------+----------------------+---------+----------------+------+----------+-----------------------+
| 1 | SIMPLE | a | NULL | index | PRIMARY,ID_UNIQUE,idxHospital | ID_UNIQUE | 4 | NULL | 1 | 100.00 | Using index |
| 1 | SIMPLE | e | NULL | ref | idxCalStaffID,CalenderID | CalenderID | 4 | test.a.ID | 1 | 100.00 | Using index |
| 1 | SIMPLE | f | NULL | eq_ref | PRIMARY,ID_UNIQUE | PRIMARY | 4 | test.e.StaffID | 1 | 100.00 | NULL |
| 1 | SIMPLE | h | NULL | eq_ref | index_users_on_email,idx_email | index_users_on_email | 767 | func | 1 | 100.00 | Using index condition |
+----+-------------+-------+------------+--------+--------------------------------+----------------------+---------+----------------+------+----------+-----------------------+
The type: index for the calenderattendee table has been changed to type: ref which means an index lookup against a non-unique index. And the note about Using join buffer is gone.
That should run better.
How does the ordering of the index columns affect the fetch time?
Think of a telephone book, which is ordered by last name first, then by first name. This helps you look up people by last name very quickly. But it does not help you look up people by first name.
The position of columns in an index matters!
You might like my presentation How to Design Indexes, Really.
Slides: http://www.slideshare.net/billkarwin/how-to-design-indexes-really
Video of me presenting this talk: https://www.youtube.com/watch?v=ELR7-RdU9XU
Q: Are there any indexes I'm missing, making my query slow?
A: Yes. A suitable index on calenderattendee is missing.
We probably want an index on calenderattendee with CalenderID as the leading column, for example:
... ON calenderattendee (CalenderID, StaffID)
This seems like a situation where inner join might be a better option than a left join.
SELECT a.ID, h.name
FROM `stjude`.`calender` a
INNER JOIN calenderattendee e on a.ID = e.calenderID
INNER JOIN callplanstaff f on e.StaffID = f.ID
INNER JOIN users h on f.Staffname = h.email
Then let's get onto the indexes. The Calendar table has
PRIMARY KEY (`ID`),
UNIQUE KEY `ID_UNIQUE` (`ID`),
The second one, ID_UNIQUE, is redundant: a primary key is already a unique index. Having too many indexes slows down insert/update/delete operations.
Then the users table has
UNIQUE KEY `index_users_on_email` (`email`),
UNIQUE KEY `index_users_on_name` (`name`),
KEY `idx_email` (`email`) USING BTREE KEY_BLOCK_SIZE=100
The idx_email index is redundant here, since email is already covered by the unique index index_users_on_email. Other than that, there isn't much to do by way of tweaking the indexes. Your EXPLAIN shows that an index is being used on each table.
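A sketch of dropping both redundant indexes:

```sql
-- ID_UNIQUE duplicates the primary key on calender.
ALTER TABLE calender DROP INDEX ID_UNIQUE;

-- idx_email duplicates the unique index index_users_on_email on users.
ALTER TABLE users DROP INDEX idx_email;
```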
Why isn't MySQL using an index on the callplanstaff and users tables?
Your EXPLAIN shows that it does: it's using the PRIMARY KEY index on callplanstaff and the index_users_on_email index on users.
Also, in my case, should I use multiple indexes instead of a multi-column index?
As a rule of thumb, MySQL uses only one index per table. So a multi-column index is the way to go rather than multiple single-column indexes.
And are there any indexes I'm missing, making my query slow?
As I mentioned in the comments, you are fetching (and probably displaying) 13,000 records. That may be where your bottleneck is.
I have a MySQL table with the following structure:
mysql> show create table logs \G;
Create Table: CREATE TABLE `logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`request` text,
`response` longtext,
`msisdn` varchar(255) DEFAULT NULL,
`username` varchar(255) DEFAULT NULL,
`shortcode` varchar(255) DEFAULT NULL,
`response_code` varchar(255) DEFAULT NULL,
`response_description` text,
`transaction_name` varchar(250) DEFAULT NULL,
`system_owner` varchar(250) DEFAULT NULL,
`request_date_time` datetime DEFAULT NULL,
`response_date_time` datetime DEFAULT NULL,
`comments` text,
`user_type` varchar(255) DEFAULT NULL,
`channel` varchar(20) DEFAULT 'WEB',
/**
other columns here....
other 18 columns here, with Type varchar and Text
**/
PRIMARY KEY (`id`),
KEY `transaction_name` (`transaction_name`) USING BTREE,
KEY `msisdn` (`msisdn`) USING BTREE,
KEY `username` (`username`) USING BTREE,
KEY `request_date_time` (`request_date_time`) USING BTREE,
KEY `system_owner` (`system_owner`) USING BTREE,
KEY `shortcode` (`shortcode`) USING BTREE,
KEY `response_code` (`response_code`) USING BTREE,
KEY `channel` (`channel`) USING BTREE,
KEY `request_date_time_2` (`request_date_time`),
KEY `response_date_time` (`response_date_time`)
) ENGINE=InnoDB AUTO_INCREMENT=59582405 DEFAULT CHARSET=utf8
and it has more than 30000000 records in it.
mysql> select count(*) from logs;
+----------+
| count(*) |
+----------+
| 38962312 |
+----------+
1 row in set (1 min 17.77 sec)
Now the problem is that it is very slow; the SELECT takes ages to fetch records from the table.
The following query takes almost 30 minutes to fetch the records of a single day:
SELECT
COUNT(sub.id) AS count,
DATE(sub.REQUEST_DATE_TIME) AS transaction_date,
sub.SYSTEM_OWNER,
sub.transaction_name,
sub.response,
MIN(sub.response_time),
MAX(sub.response_time),
AVG(sub.response_time),
sub.channel
FROM
(SELECT
id,
REQUEST_DATE_TIME,
RESPONSE_DATE_TIME,
TIMESTAMPDIFF(SECOND, REQUEST_DATE_TIME, RESPONSE_DATE_TIME) AS response_time,
SYSTEM_OWNER,
transaction_name,
(CASE
WHEN response_code IN ('0' , '00', 'EIL000') THEN 'Success'
ELSE 'Failure'
END) AS response,
channel
FROM
logs
WHERE
response_code != ''
AND DATE(REQUEST_DATE_TIME) BETWEEN '2016-10-26 00:00:00' AND '2016-10-27 00:00:00'
AND SYSTEM_OWNER != '') sub
GROUP BY DATE(sub.REQUEST_DATE_TIME) , sub.channel , sub.SYSTEM_OWNER , sub.transaction_name , sub.response
ORDER BY DATE(sub.REQUEST_DATE_TIME) DESC , sub.SYSTEM_OWNER , sub.transaction_name , sub.response DESC;
I have also added indexes to my table, but it is still very slow.
Any help on how I can make it fast?
EDIT:
Ran the above query using EXPLAIN
+----+-------------+------------+------+----------------------------+------+---------+------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+----------------------------+------+---------+------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 16053297 | Using temporary; Using filesort |
| 2 | DERIVED | logs | ALL | system_owner,response_code | NULL | NULL | NULL | 32106592 | Using where |
+----+-------------+------------+------+----------------------------+------+---------+------+----------+---------------------------------+
As it stands, the query must scan the entire table.
But first, let's air a possible bug:
AND DATE(REQUEST_DATE_TIME) BETWEEN '2016-10-26 00:00:00'
AND '2016-10-27 00:00:00'
Gives you the logs for two days -- all of the 26th and all of the 27th. Or is that what you really wanted? (BETWEEN is inclusive.)
But the performance problem is that the index cannot be used, because request_date_time is hidden inside a function (DATE()).
Jump forward to a better way to phrase it:
AND REQUEST_DATE_TIME >= '2016-10-26'
AND REQUEST_DATE_TIME < '2016-10-26' + INTERVAL 1 DAY
A DATETIME can be compared against a date.
Midnight of the morning of the 26th is included, but midnight of the 27th is not.
You can easily change 1 to however many days you wish -- without having to deal with leap days, etc.
This formulation allows the use of the index on request_date_time, thereby cutting back severely on amount of data to be scanned.
As for other tempting areas:
!= does not optimize well, so no 'composite' index is likely to be beneficial.
Since we can't really get past the WHERE, no index is useful for GROUP BY or ORDER BY.
My comments about DATE() in WHERE do not apply to GROUP BY; no change needed.
Why have the subquery? I think it can be done in a single layer. This will eliminate a rather large temp table. (Yeah, it means 3 uses of TIMESTAMPDIFF(), but that is probably a lot cheaper than the temp table.)
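A sketch of the single-layer version (untested; it relies on MySQL allowing select-list aliases in GROUP BY and ORDER BY, and uses the rewritten date range from above):

```sql
SELECT
    COUNT(id) AS `count`,
    DATE(request_date_time) AS transaction_date,
    system_owner,
    transaction_name,
    (CASE WHEN response_code IN ('0', '00', 'EIL000')
          THEN 'Success' ELSE 'Failure' END) AS response,
    MIN(TIMESTAMPDIFF(SECOND, request_date_time, response_date_time)),
    MAX(TIMESTAMPDIFF(SECOND, request_date_time, response_date_time)),
    AVG(TIMESTAMPDIFF(SECOND, request_date_time, response_date_time)),
    channel
FROM logs
WHERE response_code != ''
  AND system_owner != ''
  AND request_date_time >= '2016-10-26'
  AND request_date_time <  '2016-10-26' + INTERVAL 1 DAY
GROUP BY transaction_date, channel, system_owner, transaction_name, response
ORDER BY transaction_date DESC, system_owner, transaction_name, response DESC;
```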
How much RAM? What is the value of innodb_buffer_pool_size?
If my comments are not enough, and if you frequently run a query like this (over a day or over a date range), then we can talk about building and maintaining a Summary table, which might give you a 10x speedup.
I have a monitoring system where my customers can register their terminals; each terminal sends a periodic (5 min) keepalive signal to my website to report that it is online. Customers can also access a monitoring page that shows all their terminals and updates their status using Ajax at a 20-second interval.
Additional information: a terminal is an Android device; the customer has to install an app from Google Play.
THE PROBLEM IS:
With the growing number of customers, many people access the monitoring page at the same time, which almost floods the server with requests; on the other side, more and more terminals are coming online and flooding it further with their keepalive signals. So besides the common pages (login, various CRUDs, etc.) I have dozens of physical terminals sending keepalive signals over the internet into my database, and many users accessing the monitoring pages to check that their terminals are online. It feels like a time bomb, because I don't know whether MySQL will cope when the number of terminals reaches hundreds and keeps growing.
PLUS, we have already noticed that our server's performance degrades over time. When we restart it, it is very fast, but the longer it runs, the more performance it loses.
SOLUTION
What can I do to improve performance or make the model more scalable? Is there a design pattern for this kind of monitoring system that scales better?
Is there any gain in separating into two MySQL databases, one for common use (access pages, CRUDs, etc.) and another for the monitoring system?
Is there any gain in using MongoDB just for the monitoring part of the system?
additional information:
mysql Ver 14.14 Distrib 5.5.43, for Linux (x86_64) using readline 5.1
PHP 5.4.40 (cli) (built: Apr 15 2015 15:55:28)
Jetty 8.1.14 (for the Java server side that communicates with the Android app)
Server Mon
Free memory ........: 17.84 Gb
Total memory........: 20 Gb
Used memory.........: 2.16 Gb
RAM.................: 20 Kb
JVM Free memory.....: 1.56 Gb
JVM Maximum memory..: 3.93 Gb
JVM Total available.: 1.93 Gb
**************************************
Total (cores).: 10
CPU idle......: 4.9%
CPU nice......: 0.0%
CPU system....: 4183000.0%
CPU total.....: 5.0%
CPU user......: 2.6%
**************************************
Total space (bytes)..: 600 Gb
Free space (bytes)...: 595.64 Gb
Usable space (bytes).: 595.64 Gb
PART OF MODEL AND MONITORING PAGE'S QUERY
This is the terminals table:
CREATE TABLE IF NOT EXISTS `GM_PLAYER` (
`ID_PLAYER` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`DS_GCM_ID` VARCHAR(250) NULL,
`DT_CRIACAO` DATETIME NOT NULL,
`DS_PLAYER` VARCHAR(100) NOT NULL,
`DS_JANELA_HEIGHT` INT(11) NOT NULL DEFAULT '1024',
`DS_JANELA_WIDTH` INT(11) NOT NULL DEFAULT '768',
`DS_JANELA_POS_X` INT(11) NOT NULL DEFAULT '0',
`DS_JANELA_POS_Y` INT(11) NOT NULL DEFAULT '0',
`DS_WALLPAPER` VARCHAR(255) NULL DEFAULT NULL,
`FL_ATIVO` CHAR(1) NOT NULL DEFAULT 'N',
`FL_FULL_SCREEN` CHAR(1) NOT NULL DEFAULT 'S',
`FL_MOUSE_VISIBLE` CHAR(1) NOT NULL DEFAULT 'N',
`DS_SERIAL` VARCHAR(50) NULL DEFAULT NULL,
`VERSAO_APP` VARCHAR(20) NULL DEFAULT NULL,
`VERSAO_OS` VARCHAR(20) NULL DEFAULT NULL,
`FL_EXIBIR_STATUS_BAR` CHAR(1) NOT NULL DEFAULT 'S',
`ID_GRADE_PROGRAMACAO` BIGINT UNSIGNED NULL DEFAULT NULL,
`ID_CLIENTE` BIGINT UNSIGNED NULL,
`ID_PONTO` BIGINT UNSIGNED NULL,
`FL_ATIVO_SISTEMA` CHAR(1) NOT NULL DEFAULT 'S',
`FL_DEBUG` CHAR(1) NOT NULL DEFAULT 'N',
`VERSAO_APP_UPDATE` VARCHAR(20) NULL,
`FL_ESTADO_MONITOR` CHAR(1) NOT NULL DEFAULT 'L',
`FL_DEVICE_ROOTED` CHAR(1) DEFAULT 'N',
`DT_ATIVACAO` DATETIME ,
`DT_EXPIRA` DATETIME ,
`FL_EXCLUIDO` CHAR(1) DEFAULT 'N' ,
`ID_USUARIO` BIGINT UNSIGNED NOT NULL,
`ID_PACOTE` BIGINT UNSIGNED ,
`DS_IMG_BARRA` VARCHAR(255),
`FL_EXIBIR_HORA` CHAR(1),
`DS_TEXTO_BARRA` TEXT,
PRIMARY KEY (`ID_PLAYER`),
UNIQUE INDEX `UQ_GM_PLAYER_ID_PLAYER` (`ID_PLAYER` ASC),
INDEX `ID_GRADE_PROGRAMACAO` (`ID_GRADE_PROGRAMACAO` ASC),
INDEX `FK_GM_PLAYER_GM_CLIENTE_idx` (`ID_CLIENTE` ASC),
CONSTRAINT `FK_GM_PLAYER_GM_USUARIO` FOREIGN KEY (`ID_USUARIO`) REFERENCES `GM_USUARIO` (`ID_USUARIO`) ON DELETE RESTRICT,
CONSTRAINT `FK_GM_PLAYER_GM_GRADE_PROGRAMACAO` FOREIGN KEY (`ID_GRADE_PROGRAMACAO`) REFERENCES `GM_GRADE_PROGRAMACAO` (`ID_GRADE_PROGRAMACAO`) ON DELETE RESTRICT,
CONSTRAINT `FK_GM_PLAYER_GM_CLIENTE` FOREIGN KEY (`ID_CLIENTE`) REFERENCES `GM_CLIENTE` (`ID_CLIENTE`) ON DELETE RESTRICT
)
ENGINE = InnoDB
AUTO_INCREMENT = 5
DEFAULT CHARACTER SET = latin1;
Other tables used:
CREATE TABLE IF NOT EXISTS `GM_CLIENTE` (
`ID_CLIENTE` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`DT_CRIACAO` DATETIME NOT NULL,
`DS_CLIENTE` VARCHAR(50) NOT NULL,
`FL_ATIVO` ENUM('S','N') NULL DEFAULT 'S',
`ID_CONTATO` BIGINT UNSIGNED NOT NULL,
`ID_ENDERECO` BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (`ID_CLIENTE`),
UNIQUE INDEX `UQ_Cliente_ID_CLIENTE` (`ID_CLIENTE` ASC),
INDEX `fk_GM_CLIENTE_GM_CONTATO1_idx` (`ID_CONTATO` ASC),
INDEX `fk_GM_CLIENTE_GM_ENDERECO1_idx` (`ID_ENDERECO` ASC),
CONSTRAINT `fk_GM_CLIENTE_GM_CONTATO1`
FOREIGN KEY (`ID_CONTATO`)
REFERENCES `GM_CONTATO` (`ID_CONTATO`)
ON DELETE RESTRICT,
CONSTRAINT `fk_GM_CLIENTE_GM_ENDERECO1`
FOREIGN KEY (`ID_ENDERECO`)
REFERENCES `GM_ENDERECO` (`ID_ENDERECO`)
ON DELETE RESTRICT)
ENGINE = InnoDB
AUTO_INCREMENT = 2
DEFAULT CHARACTER SET = latin1;
CREATE TABLE GM_USUARIO_CLIENTE (
ID_USUARIO_CLIENTE INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
ID_CLIENTE BIGINT UNSIGNED ,
ID_USUARIO BIGINT UNSIGNED
);
This is the table that I update every time I receive a new terminal keepalive signal:
CREATE TABLE IF NOT EXISTS `GM_LOG_PLAYER` (
`id_log_player` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`dt_criacao` DATETIME NOT NULL,
`id_player` BIGINT UNSIGNED NULL,
`qtd_midias_exibidas` INT(11) NULL,
`id_ultima_midia_exibida` BIGINT UNSIGNED NULL,
`up_time_android` bigint(20) unsigned default '0',
`up_time_app` bigint(20) unsigned default '0',
`mem_utilizada` BIGINT(20) NULL,
`mem_disponivel` BIGINT(20) NULL,
`hd_disponivel` BIGINT(20) NULL,
`hd_utilizado` BIGINT(20) NULL,
PRIMARY KEY (`id_log_player`),
UNIQUE INDEX `UQ_id_log_player` (`id_log_player` ASC),
INDEX `FK_GM_LOG_PLAYER_GM_PLAYER_idx` (`id_player` ASC),
INDEX `FK_GM_LOG_PLAYER_GM_MIDIA_idx` (`id_ultima_midia_exibida` ASC),
CONSTRAINT `FK_GM_LOG_PLAYER_GM_PLAYER`
FOREIGN KEY (`id_player`)
REFERENCES `GM_PLAYER` (`ID_PLAYER`)
ON DELETE CASCADE,
CONSTRAINT `FK_GM_LOG_PLAYER_GM_MIDIA`
FOREIGN KEY (`id_ultima_midia_exibida`)
REFERENCES `GM_MIDIA` (`ID_MIDIA`))
ENGINE = InnoDB
AUTO_INCREMENT = 3799
DEFAULT CHARACTER SET = latin1;
CREATE TABLE IF NOT EXISTS `GM_GRADE_PROGRAMACAO` (
`ID_GRADE_PROGRAMACAO` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`DT_CRIACAO` DATETIME NOT NULL,
`DS_GRADE_PROGRAMACAO` VARCHAR(100) NULL DEFAULT NULL,
`ID_USUARIO` BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (`ID_GRADE_PROGRAMACAO`),
UNIQUE INDEX `UQ_GM_GRADE_PROGRAMACAO_ID_GRADE_PROGRAMACAO` (`ID_GRADE_PROGRAMACAO` ASC),
INDEX `fk_GM_GRADE_PROGRAMACAO_GM_USUARIO1_idx` (`ID_USUARIO` ASC),
CONSTRAINT `fk_GM_GRADE_PROGRAMACAO_GM_USUARIO1`
FOREIGN KEY (`ID_USUARIO`)
REFERENCES `GM_USUARIO` (`ID_USUARIO`)
ON DELETE RESTRICT)
ENGINE = InnoDB
AUTO_INCREMENT = 3
DEFAULT CHARACTER SET = latin1;
This is the query executed periodically through Ajax requests to update the monitoring page:
SELECT * FROM (
SELECT
LOG.id_log_player ,
LOG.dt_criacao ,
DATE_FORMAT (LOG.DT_CRIACAO , '%d/%m/%Y %H:%i:%s') F_DT_CRIACAO ,
(CURRENT_TIMESTAMP - LOG.DT_CRIACAO) AS IDADE_REGISTRO ,
LOG.qtd_midias_exibidas ,
LOG.id_ultima_midia_exibida ,
LOG.up_time_android ,
LOG.up_time_app ,
LOG.mem_utilizada ,
LOG.mem_disponivel ,
LOG.hd_disponivel ,
LOG.hd_utilizado ,
PLA.FL_MONITOR_LIGADO,
CLI.DS_CLIENTE ,
PLA.ID_PLAYER id_player ,
PLA.DS_PLAYER ,
PLA.ID_CLIENTE ,
PLA.VERSAO_APP ,
PLA.FL_ATIVO PLA_FL_ATIVO ,
PLA.ID_GRADE_PROGRAMACAO ,
PLA.FL_DEVICE_ROOTED ,
PLA.DS_GCM_ID ,
PLA.FL_HDMI_LIGADO ,
-- IF(PLA.FL_ATIVO='N',0,IF(PLA.ID_GRADE_PROGRAMACAO IS NULL,0,IF(PLA.ID_GRADE_PROGRAMACAO='0',0,1))) ATIVO,
IF(PLA.FL_ATIVO='N',0,1) ATIVO,
DATE_FORMAT (LOG.DT_CRIACAO , '%Y%m%d%H%i%s') TIME_STAMP_CRIACAO ,
DATE_FORMAT (LOG.DT_CRIACAO , '%d/%m às %H:%i') F_DT_CRIACAO_MIN ,
-- (CURRENT_TIMESTAMP - LOG.DT_CRIACAO) ESPERA_NOVA_COMUNICACAO ,
--GRA.ID_GRADE_PROGRAMACAO GRA_ID_GRADE ,
GRA.DS_GRADE_PROGRAMACAO GRA_DS_GRADE_PROGRAMACAO,
MID.DS_PATH_THUMB THUMB_ULTMID
FROM GM_PLAYER PLA
LEFT JOIN GM_CLIENTE CLI USING ( ID_CLIENTE )
LEFT JOIN GM_USUARIO_CLIENTE GUC USING ( ID_CLIENTE )
LEFT JOIN GM_LOG_PLAYER LOG USING ( ID_PLAYER )
LEFT JOIN GM_GRADE_PROGRAMACAO GRA USING ( ID_GRADE_PROGRAMACAO )
-- LEFT JOIN GM_GRADE_PROGRAMACAO GRA ON ( PLA.ID_GRADE_PROGRAMACAO = GRA.ID_GRADE_PROGRAMACAO )
LEFT JOIN GM_MIDIA MID ON ( LOG.ID_ULTIMA_MIDIA_EXIBIDA = MID.ID_MIDIA )
WHERE PLA.ID_USUARIO = ?
AND PLA.FL_EXCLUIDO = 'N'
AND PLA.FL_ATIVO = 'S'
ORDER BY LOG.DT_CRIACAO DESC
) TBALL
GROUP BY ID_PLAYER
ORDER BY PLA_FL_ATIVO DESC , DT_CRIACAO DESC
EXPLAIN QUERY ABOVE (taken from development database)
+----+-------------+------------+--------+------------------------------------------------------+----------------------------------------------+---------+--------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+------------------------------------------------------+----------------------------------------------+---------+--------------------------------------+-------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 37752 | Using temporary; Using filesort |
| 2 | DERIVED | PLA | ALL | NULL | NULL | NULL | NULL | 44 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | CLI | eq_ref | PRIMARY,UQ_Cliente_ID_CLIENTE | PRIMARY | 8 | imidiatv.PLA.ID_CLIENTE | 1 | NULL |
| 2 | DERIVED | GUC | ref | fk_GM_CLIENTE_has_GM_USUARIO_GM_CLIENTE1_idx | fk_GM_CLIENTE_has_GM_USUARIO_GM_CLIENTE1_idx | 8 | imidiatv.PLA.ID_CLIENTE | 1 | Using index |
| 2 | DERIVED | LOG | ref | FK_GM_LOG_PLAYER_GM_PLAYER_idx | FK_GM_LOG_PLAYER_GM_PLAYER_idx | 9 | imidiatv.PLA.ID_PLAYER | 858 | NULL |
| 2 | DERIVED | GRA | eq_ref | PRIMARY,UQ_GM_GRADE_PROGRAMACAO_ID_GRADE_PROGRAMACAO | PRIMARY | 8 | imidiatv.PLA.ID_GRADE_PROGRAMACAO | 1 | NULL |
| 2 | DERIVED | MID | eq_ref | PRIMARY,UQ_GM_MIDIA_ID_MIDIA | PRIMARY | 8 | imidiatv.LOG.id_ultima_midia_exibida | 1 | NULL |
+----+-------------+------------+--------+------------------------------------------------------+----------------------------------------------+---------+--------------------------------------+-------+----------------------------------------------+
Thanks in advance
Partial answer...
One aspect of scaling is to minimize the disk footprint so that caching will be more effective. Toward that end, here are some suggestions:
PRIMARY KEY (`id_log_player`),
UNIQUE INDEX `UQ_id_log_player` (`id_log_player` ASC),
A PRIMARY KEY is a UNIQUE key, so the latter is redundant and wasteful of disk space and INSERT time. DROP it.
INT is 4 bytes; BIGINT is 8 bytes. ID_xx INT UNSIGNED can handle up to 4 billion values; do you really need to go beyond 4 billion? In InnoDB, each secondary key contains a copy of the PRIMARY KEY, meaning that an unnecessary BIGINT PK consumes a lot more space.
Your tables are latin1; are you limiting the App to western languages? If you change to utf8 (or utf8mb4), I will point out wasted space for CHAR(1).
Please perform EXPLAIN SELECT ... with the tables as they stand; then make some of the changes below and do the EXPLAIN again. I'm thinking that the difference may be dramatic. I expect the part dealing with
LEFT JOIN GM_USUARIO_CLIENTE GUC USING ( ID_CLIENTE )
to be quite 'dramatic'.
If GM_USUARIO_CLIENTE is a "many-to-many" mapping, ...
Get rid of the AUTO_INCREMENT; instead use PRIMARY KEY(ID_CLIENTE, ID_USUARIO) to save some space and make it more efficient. (And if you do go beyond 4 billion CLIENTEs, etc, the INT would not suffice!)
Add two indexes so that lookups will be much faster. (1) the PK (above), and (2) the other direction: INDEX(ID_USUARIO, ID_CLIENTE). Without those, JOINs involving that table will be slower and slower as you scale.
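A sketch of the mapping table restructured that way (types copied from the original DDL; whether you truly need BIGINT is the question raised above):

```sql
CREATE TABLE GM_USUARIO_CLIENTE (
  ID_CLIENTE BIGINT UNSIGNED NOT NULL,
  ID_USUARIO BIGINT UNSIGNED NOT NULL,
  -- Clustered PK: fast lookups by cliente, no surrogate id needed.
  PRIMARY KEY (ID_CLIENTE, ID_USUARIO),
  -- The other direction: fast lookups by usuario.
  INDEX idx_usuario_cliente (ID_USUARIO, ID_CLIENTE)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
```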
Date arithmetic is not this simple:
(CURRENT_TIMESTAMP - LOG.DT_CRIACAO)
Study the manual page on date functions; subtracting TIMESTAMP - DATETIME with the - operator is more complex than it looks. And if you will be spanning time zones, be careful about which datatype you use for what.
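A sketch of the explicit alternative (TIMESTAMPDIFF returns the difference in the unit you name, instead of the numeric subtraction, which treats the datetimes as numbers like 20161026100000):

```sql
-- Age of each log row in seconds.
SELECT TIMESTAMPDIFF(SECOND, LOG.DT_CRIACAO, CURRENT_TIMESTAMP) AS IDADE_REGISTRO
FROM GM_LOG_PLAYER LOG;
```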
I see this pattern:
SELECT * FROM (
SELECT ...
ORDER BY ... -- ??
) x
GROUP BY ...
What were you hoping to achieve? The optimizer is free to ignore the ORDER BY in the subquery. (Although, it may actually be performing it.)
Don't use LEFT unless you have a reason for it.
This clause
WHERE PLA.ID_USUARIO = ?
AND PLA.FL_EXCLUIDO = 'N'
AND PLA.FL_ATIVO = 'S'
would benefit (greatly?) from INDEX(ID_USUARIO, FL_EXCLUIDO, FL_ATIVO). The order (in this case) of the columns in the index does not matter. If those two flags are changing frequently, do not include them in the INDEX -- UPDATEs might be slowed down more than SELECTs would benefit.
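A sketch of that index (the name idx_usuario_flags is made up):

```sql
ALTER TABLE GM_PLAYER
  ADD INDEX idx_usuario_flags (ID_USUARIO, FL_EXCLUIDO, FL_ATIVO);
```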
Those were the easy-to-spot suggestions. EXPLAIN may help spot more suggestions. And do you have other SELECTs?
I said "partial solution". Was that SELECT the "monitoring select"? Let's also check the periodic UPDATEs.
I have a table with 25 million rows, indexed appropriately.
But adding the clause AND status IS NULL turns a super fast query into a crazy slow query.
Please help me speed it up.
Query:
SELECT
student_id,
grade,
status
FROM
grades
WHERE
class_id = 1
AND status IS NULL -- This line delays results from <200ms to 40-70s!
AND grade BETWEEN 0 AND 0.7
LIMIT 25;
Table:
CREATE TABLE IF NOT EXISTS `grades` (
`student_id` BIGINT(20) NOT NULL,
`class_id` INT(11) NOT NULL,
`grade` FLOAT(10,6) DEFAULT NULL,
`status` INT(11) DEFAULT NULL,
UNIQUE KEY `unique_key` (`student_id`,`class_id`),
KEY `class_id` (`class_id`),
KEY `status` (`status`),
KEY `grade` (`grade`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Local development returns results instantly (<200ms); the production server shows a huge slowdown (40-70 seconds!).
Can you point me in the right direction to debug?
Explain:
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
| 1 | SIMPLE | grades | index_merge | class_id,status,grade | status,class_id | 5,4 | NULL | 26811 | Using intersect(status,class_id); Using where |
+----+-------------+--------+-------------+-----------------------+-----------------+---------+------+-------+--------------------------------------------------------+
A SELECT statement generally uses only one index per table (index merge, as seen in your EXPLAIN, is the exception).
Presumably the earlier query simply scanned using the single index class_id for your condition class_id = 1, which probably filters the result set nicely before the other conditions are checked.
The optimiser is 'incorrectly' choosing an index merge on class_id and status for the second query and examining 26811 rows, which is probably not optimal. You could steer it towards the class_id index by adding USE INDEX (class_id) after the table name in the FROM clause.
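That is, the hinted query would look like:

```sql
-- Same query, restricted to the class_id index so the optimizer
-- cannot pick the index-merge plan.
SELECT student_id, grade, status
FROM grades USE INDEX (class_id)
WHERE class_id = 1
  AND status IS NULL
  AND grade BETWEEN 0 AND 0.7
LIMIT 25;
```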
You may get some joy with a composite index on (class_id, status, grade), which may run the query faster: it can match the first two columns and then range-scan on grade. (MySQL can use an index for IS NULL lookups, so the NULL condition is not a problem here.)
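For example:

```sql
-- Composite index covering the whole WHERE clause: equality on class_id,
-- IS NULL on status, then a range scan on grade.
ALTER TABLE grades
  ADD INDEX idx_class_status_grade (class_id, status, grade);
```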
I'm guessing the ORDER BY pushed the optimiser to choose the class_id index again and returned your query to its original speed.
A query that used to work just fine on a production server has started becoming extremely slow (in a matter of hours).
This is it:
SELECT * FROM news_articles WHERE published = '1' AND news_category_id = '4' ORDER BY date_edited DESC LIMIT 1;
This takes up to 20-30 seconds to execute (the table has ~200,000 rows)
This is the output of EXPLAIN:
+----+-------------+---------------+-------------+----------------------------+----------------------------+---------+------+------+--------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------------+----------------------------+----------------------------+---------+------+------+--------------------------------------------------------------------------+
| 1 | SIMPLE | news_articles | index_merge | news_category_id,published | news_category_id,published | 5,5 | NULL | 8409 | Using intersect(news_category_id,published); Using where; Using filesort |
+----+-------------+---------------+-------------+----------------------------+----------------------------+---------+------+------+--------------------------------------------------------------------------+
Playing around with it, I found that hinting a specific index (date_edited) makes it much faster:
SELECT * FROM news_articles USE INDEX (date_edited) WHERE published = '1' AND news_category_id = '4' ORDER BY date_edited DESC LIMIT 1;
This one takes milliseconds to execute.
EXPLAIN output for this one is:
+----+-------------+---------------+-------+---------------+-------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+-------------+---------+------+------+-------------+
| 1 | SIMPLE | news_articles | index | NULL | date_edited | 8 | NULL | 1 | Using where |
+----+-------------+---------------+-------+---------------+-------------+---------+------+------+-------------+
Columns news_category_id, published and date_edited are all indexed.
The storage engine is InnoDB.
This is the table structure:
CREATE TABLE `news_articles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`title` text NOT NULL,
`subtitle` text NOT NULL,
`summary` text NOT NULL,
`keywords` varchar(500) DEFAULT NULL,
`body` mediumtext NOT NULL,
`source` varchar(255) DEFAULT NULL,
`source_visible` int(11) DEFAULT NULL,
`author_information` enum('none','name','signature') NOT NULL DEFAULT 'name',
`date_added` datetime NOT NULL,
`date_edited` datetime NOT NULL,
`views` int(11) DEFAULT '0',
`news_category_id` int(11) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`c_forwarded` int(11) DEFAULT '0',
`published` int(11) DEFAULT '0',
`deleted` int(11) DEFAULT '0',
`permalink` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `news_category_id` (`news_category_id`),
KEY `published` (`published`),
KEY `deleted` (`deleted`),
KEY `date_edited` (`date_edited`),
CONSTRAINT `news_articles_ibfk_3` FOREIGN KEY (`news_category_id`) REFERENCES `news_categories` (`id`) ON DELETE SET NULL ON UPDATE CASCADE,
CONSTRAINT `news_articles_ibfk_4` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=192588 DEFAULT CHARSET=utf8
I could change all the queries my web application issues to hint at that index, but that is considerable work.
Is there some way to tune MySQL so that the first query becomes efficient without actually rewriting all the queries?
Just a few tips:
1 - It seems to me the fields published and news_category_id are INTEGER. If so, remove the single quotes from your query; passing values of the matching type can make a difference when it comes to performance;
2 - Also, I'd guess your field published has few distinct values (probably just 1 = yes and 0 = no, or something like that). If so, it is not a good field to index on its own: the engine still has to wade through a large share of the records to find what it is looking for. In that case, put news_category_id first in your WHERE clause;
3 - "Don't forget about the left-most index column". This holds for your SELECT, JOINs, WHERE, and ORDER BY. Indexes are your friend as long as you know how to play with them.
Hope this helps somehow.
SELECT * FROM news_articles WHERE published = '1' AND news_category_id = '4' ORDER BY date_edited DESC LIMIT 1;
Original:
SELECT * FROM news_articles
WHERE published = 1 AND news_category_id = 4
ORDER BY date_edited DESC LIMIT 1;
Since you have LIMIT 1, you're only selecting the latest row. ORDER BY date_edited tells MySQL to sort all the matching rows and then take one off the top, which is slow here, and is why USE INDEX helps.
Try matching MAX(date_edited) in the WHERE clause instead. That should get the query planner to use the date_edited index automatically.
Choose MAX(date_edited):
SELECT * FROM news_articles
WHERE published = 1 AND news_category_id = 4
  AND date_edited = (SELECT MAX(date_edited) FROM news_articles
                     WHERE published = 1 AND news_category_id = 4);
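Note that the subquery must repeat the published and news_category_id filters; otherwise it finds the newest date_edited across the whole table, which likely belongs to a different category, and the outer query returns nothing. An alternative worth considering (an assumption on my part, not something tested against this schema) is a composite index matching both the WHERE clause and the ORDER BY, which lets the original ORDER BY ... LIMIT 1 read exactly one index entry:

```sql
-- Equality columns first, sort column last, so the optimizer can
-- satisfy WHERE + ORDER BY DESC + LIMIT 1 from the index alone.
ALTER TABLE news_articles
  ADD INDEX idx_cat_pub_edited (news_category_id, published, date_edited);
```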
Please change your query to:
SELECT * FROM news_articles WHERE published = 1 AND news_category_id = 4 ORDER BY date_edited DESC LIMIT 1;
Note that I have removed the quotes around the values 1 and 4 in the query.
When the datatype of the value passed does not match the column's datatype, MySQL may be unable to use the indexes on these two columns.