I have an InnoDB table in MySQL 5.5.53 where simple updates like
UPDATE mytable SET acol = 'value' WHERE id = 42;
hang for several seconds. id is the primary key of the table.
If I enable query profiling using
SET profiling = 1;
then run the query and look at the profile, I see something like:
show profile;
+------------------------------+----------+
| Status | Duration |
+------------------------------+----------+
| starting | 0.000077 |
| checking permissions | 0.000008 |
| Opening tables | 0.000024 |
| System lock | 0.000008 |
| init | 0.000346 |
| Updating | 0.000108 |
| end | 0.000004 |
| Waiting for query cache lock | 0.000002 |
| end | 3.616845 |
| query end | 0.000016 |
| closing tables | 0.000015 |
| freeing items | 0.000023 |
| logging slow query | 0.000003 |
| logging slow query | 0.000048 |
| cleaning up | 0.000004 |
+------------------------------+----------+
That is, all the time is spent in end.
The documentation says:
end
This occurs at the end but before the cleanup of ALTER TABLE, CREATE VIEW, DELETE, INSERT, SELECT, or UPDATE statements.
How can such a simple statement spend such a long time in this state?
It turns out that the problem is the query cache.
If I disable it with
SET GLOBAL query_cache_size = 0;
SET GLOBAL query_cache_type = 0;
the problem goes away.
It must be invalidating query cache entries that causes the query to hang for such a long time.
According to the slow query log, the following query (and similar queries) occasionally takes around 2s to execute:
INSERT INTO incoming_gprs_data (data,type) VALUES ('3782379837891273|890128398120983891823881abcabc','GT100');
Table structure:
CREATE TABLE `incoming_gprs_data` (
`id` int(200) NOT NULL AUTO_INCREMENT,
`dt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`data` text NOT NULL,
`type` char(10) NOT NULL,
`test_udp_id` int(20) NOT NULL,
`parse_result` text NOT NULL,
`completed` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `completed` (`completed`)
) ENGINE=InnoDB AUTO_INCREMENT=5478246 DEFAULT CHARSET=latin1
Activities related to this table:
1. Around 200 rows are inserted into this table every second. The incoming data originates from different sources (so the inserts do not come from one process but from multiple processes every second).
2. A cron process handles these rows: it fetches them with SELECT * FROM incoming_gprs_data WHERE completed = 0, processes them, and sets completed = 1.
3. Another cron process (run every 15 minutes) deletes the completed rows (i.e. completed = 1) to keep the table slim.
4. The slow query log does not show any slow SELECT query related to the table.
5. The table is relatively small: less than 200K rows.
We do #2 and #3 because we previously found that deleting completed rows took time, since the index needed to be rebuilt. We therefore added the completed flag and now delete less frequently. These changes helped reduce the number of slow queries.
Here are the innodb_settings that we have:
+---------------------------------+------------------------+
| Variable_name | Value |
+---------------------------------+------------------------+
| have_innodb | YES |
| ignore_builtin_innodb | OFF |
| innodb_adaptive_flushing | ON |
| innodb_adaptive_hash_index | ON |
| innodb_additional_mem_pool_size | 8388608 |
| innodb_autoextend_increment | 8 |
| innodb_autoinc_lock_mode | 1 |
| innodb_buffer_pool_instances | 2 |
| innodb_buffer_pool_size | 6442450944 |
| innodb_change_buffering | all |
| innodb_checksums | ON |
| innodb_commit_concurrency | 0 |
| innodb_concurrency_tickets | 500 |
| innodb_data_file_path | ibdata1:10M:autoextend |
| innodb_data_home_dir | |
| innodb_doublewrite | OFF |
| innodb_fast_shutdown | 1 |
| innodb_file_format | Antelope |
| innodb_file_format_check | ON |
| innodb_file_format_max | Antelope |
| innodb_file_per_table | ON |
| innodb_flush_log_at_trx_commit | 2 |
| innodb_flush_method | O_DIRECT |
| innodb_force_load_corrupted | OFF |
| innodb_force_recovery | 0 |
| innodb_io_capacity | 200 |
| innodb_large_prefix | OFF |
| innodb_lock_wait_timeout | 50 |
| innodb_locks_unsafe_for_binlog | OFF |
| innodb_log_buffer_size | 67108864 |
| innodb_log_file_size | 536870912 |
| innodb_log_files_in_group | 2 |
| innodb_log_group_home_dir | ./ |
| innodb_max_dirty_pages_pct | 75 |
| innodb_max_purge_lag | 0 |
| innodb_mirrored_log_groups | 1 |
| innodb_old_blocks_pct | 37 |
| innodb_old_blocks_time | 0 |
| innodb_open_files | 300 |
| innodb_purge_batch_size | 20 |
| innodb_purge_threads | 0 |
| innodb_random_read_ahead | OFF |
| innodb_read_ahead_threshold | 56 |
| innodb_read_io_threads | 4 |
| innodb_replication_delay | 0 |
| innodb_rollback_on_timeout | OFF |
| innodb_rollback_segments | 128 |
| innodb_spin_wait_delay | 6 |
| innodb_stats_method | nulls_equal |
| innodb_stats_on_metadata | OFF |
| innodb_stats_sample_pages | 8 |
| innodb_strict_mode | OFF |
| innodb_support_xa | ON |
| innodb_sync_spin_loops | 30 |
| innodb_table_locks | ON |
| innodb_thread_concurrency | 0 |
| innodb_thread_sleep_delay | 10000 |
| innodb_use_native_aio | OFF |
| innodb_use_sys_malloc | ON |
| innodb_version | 1.1.8 |
| innodb_write_io_threads | 4 |
+---------------------------------+------------------------+
We set innodb_buffer_pool_size to 6G after calculating it with the following SQL query:
SELECT CEILING(Total_InnoDB_Bytes*1.6/POWER(1024,3)) RIBPS FROM (SELECT SUM(data_length+index_length) Total_InnoDB_Bytes FROM information_schema.tables WHERE engine='InnoDB') A;
And it generates the result of 5GB. We estimated that it won't exceed this size for our InnoDB tables.
Our primary concern at the moment is how to speed up inserts into this table, and what causes the occasional slow inserts.
As you know, 200 rows a second of insertion is a lot. It is worth your trouble to try to optimize this data flow on an application of this scale.
InnoDB uses database transactions on all insertions. That is, every insert looks like this:
START TRANSACTION;
INSERT something...;
COMMIT;
If you don't specify these transactions, you get autocommit behavior.
The secret to doing insertions at high volume is to do many of them in each transaction, like so:
START TRANSACTION;
INSERT something...;
INSERT something...;
...
INSERT something...;
INSERT something...;
INSERT something...;
COMMIT;
START TRANSACTION;
INSERT something...;
INSERT something...;
...
INSERT something...;
INSERT something...;
INSERT something...;
COMMIT;
START TRANSACTION;
INSERT something...;
INSERT something...;
...
INSERT something...;
INSERT something...;
INSERT something...;
COMMIT;
I have had good success with up to one hundred INSERT commands before each COMMIT.
Do not forget the final COMMIT! Don't ask me how I know to give this advice. :-)
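From application code, the same batching might look roughly like the following. This is only a sketch in Python, assuming a DB-API style connection with autocommit turned off; insert_in_transactions and its placeholder argument are names invented for this illustration, and the in-memory SQLite database is just a stand-in so the example runs without a MySQL server.

```python
import sqlite3  # stand-in database for the demo only


def insert_in_transactions(conn, rows, batch_size=100, placeholder="%s"):
    """Insert (data, type) rows, committing once per batch_size rows.

    Returns the number of commits issued. placeholder is "%s" for the
    usual MySQL drivers and "?" for sqlite3 (an assumption of this sketch).
    """
    sql = "INSERT INTO incoming_gprs_data (data, type) VALUES ({p}, {p})".format(
        p=placeholder
    )
    cur = conn.cursor()
    commits = 0
    pending = 0
    for row in rows:
        cur.execute(sql, row)
        pending += 1
        if pending == batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # the final COMMIT for the last, partial batch
        commits += 1
    return commits


# Demo: 250 rows in batches of 100 gives three commits (100 + 100 + 50).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE incoming_gprs_data (data TEXT, type TEXT)")
rows = [("payload-%d" % i, "GT100") for i in range(250)]
commits = insert_in_transactions(demo, rows, batch_size=100, placeholder="?")
stored = demo.execute("SELECT COUNT(*) FROM incoming_gprs_data").fetchone()[0]
```

The trailing commit after the loop is the "final COMMIT" mentioned above: without it, the last partial batch would be lost when the connection closes.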
Another way to do this in MySQL is with multiple-row INSERT commands. In your case they might look like this.
INSERT INTO incoming_gprs_data (data,type) VALUES
('3782379837891273|890128398120983891823881abcabc','GT100'),
('3782379837891273|890128398120983891823881abcabd','GT101'),
('3782379837891273|890128398120983891823881abcabe','GT102'),
...
('3782379837891273|890128398120983891823881abcabf','GT103'),
('3782379837891273|890128398120983891823881abcac0','GT104');
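If the rows are collected in application code, a batch can be turned into one such statement along these lines. This is a sketch in Python, assuming a DB-API driver that uses %s placeholders (as MySQL drivers commonly do); build_multirow_insert is a made-up helper name, not part of any library.

```python
def build_multirow_insert(table, columns, rows):
    """Return (sql, params) for one multi-row INSERT covering all rows.

    table and columns are interpolated into the SQL text, so they must be
    identifiers chosen by the programmer, never user input. The values
    themselves stay as bound parameters.
    """
    if not rows:
        raise ValueError("need at least one row")
    width = len(columns)
    for row in rows:
        if len(row) != width:
            raise ValueError("row length does not match column count")
    one_row = "(" + ", ".join(["%s"] * width) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([one_row] * len(rows))
    )
    params = [value for row in rows for value in row]
    return sql, params


sql, params = build_multirow_insert(
    "incoming_gprs_data",
    ["data", "type"],
    [
        ("3782379837891273|890128398120983891823881abcabc", "GT100"),
        ("3782379837891273|890128398120983891823881abcabd", "GT101"),
    ],
)
# cursor.execute(sql, params) would then send both rows in one statement.
```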
A third way, the hardest and the highest performance way, to get a very high insert rate is to store your batches of data in text files, and then use the LOAD DATA INFILE command to put the data into your table. This technique can be very fast indeed, especially if the file can be loaded directly from the file system of your MySQL server.
I suggest you try the transaction stuff first to see if you get the performance you need.
Another thing: if you have a quiet time of day or night, you can delete the completed rows then, rather than every fifteen minutes. In any case, when you read back these rows to process or to delete, you should use a transaction-batch process like this:
done = false /* pseudocode for your programming language */
while not done {
DELETE FROM table WHERE completed = 1 LIMIT 50;
if that query handled zero rows {
done = true
}
}
This will do your deletion operation in reasonably sized transactional batches. Your occasional two-second insertion delay is probably a result of a very large transactional batch on your processing or deletion.
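The pseudocode loop above could be written in Python roughly as follows. It is a sketch that assumes a DB-API style connection; delete_in_batches is a name made up here. The demo runs against in-memory SQLite, which usually lacks DELETE ... LIMIT, so it uses an equivalent subquery form; on MySQL the statement from the pseudocode can be passed unchanged.

```python
import sqlite3  # stand-in database for the demo only


def delete_in_batches(conn, delete_sql):
    """Run delete_sql repeatedly until it affects no rows.

    Each batch is committed on its own, so no single transaction grows
    large. Returns the total number of rows deleted.
    """
    total = 0
    cur = conn.cursor()
    while True:
        cur.execute(delete_sql)
        deleted = cur.rowcount
        conn.commit()
        if deleted == 0:
            break
        total += deleted
    return total


# Statement for MySQL, as in the pseudocode above:
MYSQL_DELETE = "DELETE FROM incoming_gprs_data WHERE completed = 1 LIMIT 50"

# Equivalent form for the SQLite demo:
SQLITE_DELETE = (
    "DELETE FROM incoming_gprs_data WHERE id IN "
    "(SELECT id FROM incoming_gprs_data WHERE completed = 1 LIMIT 50)"
)

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE incoming_gprs_data (id INTEGER PRIMARY KEY, completed INTEGER)"
)
demo.executemany(
    "INSERT INTO incoming_gprs_data (completed) VALUES (?)",
    [(1,)] * 120 + [(0,)] * 30,
)
demo.commit()
removed = delete_in_batches(demo, SQLITE_DELETE)
remaining = demo.execute("SELECT COUNT(*) FROM incoming_gprs_data").fetchone()[0]
```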
"Around 200 rows are inserted to this table every second." One at a time? Much better is to do a single multi-row INSERT.
"Slow log query does not indicate any slow SELECT query related to the table." Lower long_query_time; the default of 10 seconds is virtually useless.
Cron processes do SELECT * FROM incoming_gprs_data WHERE completed = 0.
Don't scan the entire table all at once. Walk through the table, preferably via the PRIMARY KEY, doing say 1000 rows at a time. More details on chunking.
The index is not "rebuilt", it is always incrementally updated. (I hope you are not explicitly rebuilding it!)
I assume you have at least 8GB of RAM? (The buffer_pool is my clue, which should be about 70% of available RAM.)
int(200) -- the (200) means nothing. An INT is 4 bytes regardless.
Don't do two cron processes; go ahead and delete on the first pass. The UPDATE to set completed is about as costly as the DELETE.
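The advice above about walking the table via the PRIMARY KEY, a chunk at a time, can be sketched like this in Python. It assumes a DB-API style connection; iter_pending_chunks and its placeholder argument are invented for this illustration, and the SQLite database only stands in for MySQL so the example is runnable.

```python
import sqlite3  # stand-in database for the demo only


def iter_pending_chunks(conn, chunk_size=1000, placeholder="%s"):
    """Yield lists of (id, data) rows with completed = 0, in id order.

    Each query resumes just after the last id already seen, so no chunk
    has to skip over rows that earlier chunks already returned.
    """
    sql = (
        "SELECT id, data FROM incoming_gprs_data "
        "WHERE id > {p} AND completed = 0 ORDER BY id LIMIT {p}"
    ).format(p=placeholder)
    last_id = 0
    cur = conn.cursor()
    while True:
        cur.execute(sql, (last_id, chunk_size))
        rows = cur.fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


# Demo: ids 1..30, every sixth row already completed, chunks of 10.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE incoming_gprs_data "
    "(id INTEGER PRIMARY KEY, data TEXT, completed INTEGER)"
)
demo.executemany(
    "INSERT INTO incoming_gprs_data (id, data, completed) VALUES (?, ?, ?)",
    [(i, "row%d" % i, 1 if i % 6 == 0 else 0) for i in range(1, 31)],
)
demo.commit()
sizes = [len(chunk) for chunk in iter_pending_chunks(demo, 10, placeholder="?")]
```

Each chunk can then be processed and deleted (or marked completed) in its own short transaction.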
More
If you cannot "batch" the inserts, can you at least put them in a single "transaction" (BEGIN...COMMIT)? Ditto for the DELETEs. For data integrity, there is at least one disk hit per transaction. So, doing several operations in a single transaction decreases I/O, thereby speeding up the query. But... Don't get carried away; if you do a million inserts/deletes/updates in a single transaction, there are other issues.
Another thing that can be done to decrease the I/O overhead: innodb_flush_log_at_trx_commit = 2, which is faster, but less safe than the default of 1. If your "200 inserts/sec" is 200 transactions (such as with autocommit=1), this setting change can make a big difference.
You posted in your own answer:
To resolve this, we alter the incoming_gprs_data table to use MEMORY
engine. This table acts like a temporary table to gather all the
incoming data from different sources. A cron job then processes
these data, inserts them into another table processed_data_xxx,
and finally deletes them. This removes all the slow insert queries.
You should use a message queue for this, not a database. If you have a workflow that processes data and then deletes it, this sounds perfect for a message queue. There are many message queues that can handle 200 entries per second easily.
Instead of a cron job to update and delete records from a database, you could just have an application listening to a topic on the message queue, process an item, and then... nothing. No need to store that item, just move on to the next item from the queue.
We use Apache ActiveMQ at my current company. I know other developers who recommend RabbitMQ as well.
In the end, we settled on a different solution, mainly because the incoming data that we are inserting comes from different sources (hence, different processes). Therefore, we cannot use multiple-row INSERT or START TRANSACTION and COMMIT in this case.
To resolve this, we alter the incoming_gprs_data table to use MEMORY engine. This table acts like a temporary table to gather all the incoming data from different sources. A cron job then processes these data, inserts them into another table processed_data_xxx, and finally deletes them. This removes all the slow insert queries.
We do understand the cons of the MEMORY engine (such as high volatility, and its default hash indexes, which cannot be used for sorting). But the write and read speed of the MEMORY engine suits this situation.
When inserting the processed data into the table processed_data_xxx, we followed the suggestion from @Ollie Jones to use START TRANSACTION and COMMIT instead of autocommitting each insert query.
One fairly pragmatic solution is to not do direct inserts, but to write to a Redis queue and then consume that queue once per second in order to do batch inserts. These processes require only a few lines of code (in any language).
Something like: in a loop, read all records from the queue and insert them into MySQL. Sleep x times 100 ms in the loop until the wall clock is one second further, and then start the loop again.
It is very fast and pragmatic, but you lose real-time confirmation of successful inserts into the database. With this method I was able to achieve up to 40k inserts per second on a single machine.
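The consumer loop described above might be shaped like this. It is a sketch in Python in which a collections.deque stands in for the Redis list, and insert_batch is a hypothetical callback that would do the actual batch insert; drain and run_consumer are names made up for this example.

```python
import time
from collections import deque


def drain(queue, max_items=10000):
    """Pop up to max_items records from the left of the queue."""
    batch = []
    while queue and len(batch) < max_items:
        batch.append(queue.popleft())
    return batch


def run_consumer(queue, insert_batch, cycles, interval=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Once per interval, drain the queue and pass the batch to insert_batch."""
    for _ in range(cycles):
        started = clock()
        batch = drain(queue)
        if batch:
            insert_batch(batch)  # e.g. one multi-row INSERT or one transaction
        # Sleep in 100 ms steps until the interval has elapsed.
        while clock() - started < interval:
            sleep(0.1)


# Demo with interval 0 so it finishes immediately; written collects batches.
queue = deque(["a", "b", "c"])
written = []
run_consumer(queue, written.append, cycles=2, interval=0.0)
```

A real consumer would loop indefinitely rather than for a fixed number of cycles; the cycles argument is there only so the demo terminates.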
I have two databases A and B on MySQL server.
A is the original database and B is derived from A changing the format of some tables. So for each table_A in db A there's a respective table table_B in db B and for each row in table_A there is a respective row in table_B representing the exact same table entry, in a different format.
I'm pretty sure that explaining this "format difference" between A and B is irrelevant to what I'm going to ask.
I use Java, JDBC actually, to interface with MySQL server.
I have a number of "SELECT" queries for db A and the equivalent queries for db B. I want to execute them repeatedly and calculate some metrics, like so:
execute SELECT query on db A and calculate metrics;
execute equivalent SELECT query on db B and calculate metrics;
UPDATE data stored in db A and db B by a percentage
loop
The final goal is to compare the performance of the "same" queries on the two twin dbs, to see what effect the "format difference" has in query performance.
My questions:
How can I calculate CPU time of the query execution? Currently what I do is:
long startTime = System.currentTimeMillis();
ResultSet rs = stmt.executeQuery(QUERY);
long time = System.currentTimeMillis() - startTime;
Is this accurate?
How can I calculate other metrics such as memory usage, cache usage, disk reads, disk writes, and buffer gets?
Could anyone suggest any other metrics to compare the performance of the "same" queries on the two databases?
There are a lot of metrics you cannot get. But here is a set I like to get:
FLUSH STATUS;
SELECT ...; -- or whatever query
SHOW SESSION STATUS LIKE 'Handler%';
The last command might give something like
mysql> SHOW SESSION STATUS LIKE 'Handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 2 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 1 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 5484 | -- rows in the table; so it did a table scan
| Handler_read_prev | 0 |
| Handler_read_rnd | 7 |
| Handler_read_rnd_next | 14 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 13 | -- wrote to a tmp table 13 rows after a GROUP BY
+----------------------------+-------+
18 rows in set (0.00 sec)
Caching comes and goes, so timings can vary even by a factor of 10. Handlers, on the other hand, are very consistent. They give me insight into what is happening.
If you are running through JDBC, run the FLUSH like you would a non-SELECT; run the SHOW like a SELECT that gives you 2 columns.
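The same three-step sequence can be wrapped in a small helper. The sketch below uses Python with a DB-API style cursor rather than JDBC; handler_counters is a made-up name, and StubCursor is only a stand-in so the example runs without a server.

```python
def handler_counters(cursor, query):
    """Reset session counters, run query, and return the Handler_% values."""
    cursor.execute("FLUSH STATUS")
    cursor.execute(query)
    cursor.fetchall()  # read the whole result so all the row reads happen
    cursor.execute("SHOW SESSION STATUS LIKE 'Handler%'")
    # SHOW returns two columns: Variable_name and Value.
    return {name: int(value) for name, value in cursor.fetchall()}


class StubCursor:
    """Stand-in for a real cursor; returns canned rows for the SHOW command."""

    def __init__(self, status_rows):
        self.status_rows = status_rows
        self.executed = []
        self._rows = []

    def execute(self, sql):
        self.executed.append(sql)
        self._rows = self.status_rows if sql.startswith("SHOW") else []

    def fetchall(self):
        return self._rows


stub = StubCursor([("Handler_read_key", "1"), ("Handler_read_next", "5484")])
counters = handler_counters(stub, "SELECT 1")
```

Running the helper once per query on db A and db B gives two dictionaries that can be compared key by key.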
I have a large database (approx 50GB). It is on a server I have little control over, but I know they are using mysqldump to do backups nightly.
I have a query that takes hours to finish. I set it to run, but it never actually finishes.
I've noticed that after the backup time, all the tables have a lock request (SHOW OPEN TABLES WHERE in_use > 0; lists all tables).
The tables from my query have in_use = 2, all other tables have in_use = 1.
So... what is happening here?
a) my query is running normally, blocking the dump from happening. I should just wait?
b) the dump is causing the server to hang (maybe lack of memory/disk space?)
c) something else?
EDIT: using MyISAM tables
There is a server admin who is not very competent, but if I ask him specific things he does them. What should I get him to check?
EDIT: adding query
SELECT citing.article_id as citing, citing.year, r.id_when_cited, cited_issue.country
FROM isi_lac_authored_articles as citing # 1M records
JOIN isi_citation_references r ON (citing.article_id = r.article_id) # 400M records
JOIN isi_articles cited ON (cited.id_when_cited = r.id_when_cited) # 25M records
JOIN isi_issues cited_issue ON (cited.issue_id = cited_issue.issue_id) # 1M records
This is what EXPLAIN has to say:
+----+-------------+-------------+------+--------------------------------------------------------------------------+---------------------------------------+---------+-------------------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+--------------------------------------------------------------------------+---------------------------------------+---------+-------------------------------+---------+-------------+
| 1 | SIMPLE | cited_issue | ALL | NULL | NULL | NULL | NULL | 1156856 | |
| 1 | SIMPLE | cited | ref | isi_articles_id_when_cited,isi_articles_issue_id | isi_articles_issue_id | 49 | func | 19 | Using where |
| 1 | SIMPLE | r | ref | isi_citation_references_article_id,isi_citation_references_id_when_cited | isi_citation_references_id_when_cited | 17 | mimir_dev.cited.id_when_cited | 4 | Using where |
| 1 | SIMPLE | citing | ref | isi_lac_authored_articles_article_id | isi_lac_authored_articles_article_id | 16 | mimir_dev.r.article_id | 1 | |
+----+-------------+-------------+------+--------------------------------------------------------------------------+---------------------------------------+---------+-------------------------------+---------+-------------+
I actually don't understand why it needs to look at all the records in isi_issues table. Shouldn't it just be matching up by the isi_articles (cited) on issue_id? Both fields are indexed.
For a MySQL database of that size, you may want to consider setting up replication to a slave node, and then have your nightly database backups performed on the slave.
Yes -- some options to mysqldump will have the effect of locking all MyISAM tables while the backup is in progress, so that the backup is a consistent "snapshot" of a point in time.
InnoDB supports transactions, which make this unnecessary. It's also generally faster than MyISAM. You should use it. :)
I have a MySQL query that is copying data from one table to another for processing. For some reason, this query that normally takes a few seconds locked up overnight and ran for several hours. When I logged in this morning, I tried to kill the query, but it is still listed in the process list.
| Id | User | Host | db | Command | Time | State | Info |
+---------+----------+-----------+------+---------+-------+--------------+--------------------------------------------------------------------------------------+
| 1061763 | tb_admin | localhost | dw | Killed | 45299 | Sending data | INSERT INTO email_data_inno_stage SELECT * FROM email_data_test LIMIT 4480000, 10000 |
| 1062614 | tb_admin | localhost | dw | Killed | 863 | Sending data | INSERT INTO email_data_inno_stage SELECT * FROM email_data_test LIMIT 4480000, 10000 |
What could have caused this, and how can I kill this process so I can get on with my work?
If the table email_data_test is MyISAM and it was locked, that would have held up the INSERT.
If the table email_data_test is InnoDB, then a lot of MVCC data was being written in ib_logfiles, which may not have occurred yet.
In both cases, you had the LIMIT clause scroll through 4,480,000 rows just to get to 10,000 rows you actually needed to INSERT.
Killing the query only causes the InnoDB table email_data_inno_stage to execute a rollback.