First, a bit of necessary background (please, bear with me). I work as a developer on a web application that uses MySQL for persistence. We have implemented audit logging by creating an audit trail table for each data table. For example, we might have the following table definitions for a Customer entity:
-- Data table definition.
CREATE TABLE my_database.customers (
CustomerId INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
FirstName VARCHAR(255) NOT NULL,
LastName VARCHAR(255) NOT NULL,
-- More data columns, removed for simplicity.
...
);
-- Audit table definition in separate schema.
CREATE TABLE my_database_audittrail.customers (
CustomerId INT(11) DEFAULT NULL,
FirstName VARCHAR(255) DEFAULT NULL,
LastName VARCHAR(255) DEFAULT NULL,
-- More data columns.
...
-- Audit meta data columns.
ChangeTime DATETIME NOT NULL,
ChangeByUser VARCHAR(255) NOT NULL
);
As you can see, the audit table is simply a copy of the data table plus some metadata. Note that the audit table doesn't have any keys. When, for example, we update a customer, our ORM generates SQL similar to the following:
-- Insert a copy of the customer entity, before the update, into the audit table.
INSERT INTO my_database_audittrail.customers (
CustomerId,
FirstName,
LastName,
...
ChangeTime,
ChangeByUser
)
SELECT
CustomerId,
FirstName,
LastName,
...
NOW(),
#ChangeByUser
FROM my_database.customers
WHERE CustomerId = #CustomerId;
-- Then update the data table.
UPDATE
my_database.customers
SET
FirstName = #FirstName,
LastName = #LastName,
...
WHERE CustomerId = #CustomerId;
This has worked well enough. Recently, however, we needed to add a primary key column to the audit tables for various reasons, changing the audit table definition to something similar to the following:
CREATE TABLE my_database_audittrail.customers (
__auditId INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
CustomerId INT(11) DEFAULT NULL,
FirstName VARCHAR(255) DEFAULT NULL,
LastName VARCHAR(255) DEFAULT NULL,
...
ChangeTime DATETIME NOT NULL,
ChangeByUser VARCHAR(255) NOT NULL
);
The SQL generated by our ORM when updating data tables has not been modified. This change seems to have greatly increased the risk of deadlocks. The system in question is a web application with a number of nightly batch jobs. The increase in deadlocks doesn't show up in the day-to-day use of the system by our web users. The nightly batch jobs, however, suffer badly from the deadlocks, as they do intense work on a few database tables. Our "solution" has been to add a retry-upon-deadlock strategy (hardly controversial), and while this seems to work fine, I would very much like to understand why the above change has increased the risk of deadlocks that much (and whether we can somehow remedy the problem).
Further information:
Our nightly batch jobs do INSERTs, UPDATEs, and DELETEs on our data tables. Only INSERTs are performed on the audit tables.
We use the REPEATABLE READ isolation level on our database transactions.
Before this change, we hadn't seen a single deadlock when running our nightly batch jobs.
UPDATE: Checked SHOW ENGINE INNODB STATUS to determine the cause of the deadlocks and found this:
*** WAITING FOR THIS LOCK TO BE GRANTED:
TABLE LOCK table `my_database_audittrail`.`customers` trx id 24972756464 lock mode AUTO-INC waiting
I was under the impression that auto-increment values were handled outside of any transaction in order to avoid different transactions using the same auto-increment value? But I guess the AUTO_INCREMENT property on the primary key we introduced is the problem?
This is speculation.
Inserting into or updating a table with indexes locks not only the data pages but also the index pages, including the higher levels of the index. When multiple threads are affecting records at the same time, they may lock different portions of the index.
This would not generally show up with single-record inserts. However, two statements that are each updating multiple records might start acquiring locks on the index and find that they deadlock each other. Retrying may be sufficient to work around this problem. Alternatively, it may be that "too much" is running at one time, and you may want to reconsider how the nightly update work is laid out.
When inserting into tables with auto-increment columns, MySQL uses different strategies to acquire values for the auto-increment column(s). Depending on which type of insert is made and how MySQL is configured to handle auto-increment columns, an insert may result in a complete table lock.
With "simple inserts", i.e inserts where MySQL can determine before hand the number of rows which will be inserted into a table (e.g INSERT INTO table (col1, col2) VALUES (val1, val2);) auto increment column values are acquired using a light weight lock on the auto increment counter. This light weight lock is released as soon as the auto increment values are acquired so one won't have to wait until the actual insert to complete. I.e no table lock.
However, with "bulk inserts", where MySQL cannot determine the number of inserted rows before hand (e.g INSERT INTO table (col1, col2) SELECT col1, col2 FROM table2 WHERE ...;) a table lock is created to acquire auto increment column values and not relinquished until the insert is completed.
The above is per MySQL's default configuration. MySQL can be configured not to use table locks on bulk inserts, but this may cause auto-increment columns to have different values on masters and slaves (if statement-based replication is set up), and thus may or may not be a viable option.
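If that trade-off is acceptable (for example, when using row-based binary logging), the behaviour is controlled by innodb_autoinc_lock_mode. A minimal my.cnf sketch; note this variable is read-only at runtime, so changing it requires a server restart:
[mysqld]
# 0 = traditional, 1 = consecutive (the 5.x default), 2 = interleaved.
# Mode 2 takes no AUTO-INC table lock for bulk inserts, but is only
# safe for replication together with row-based binary logging.
innodb_autoinc_lock_mode = 2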
Related
I recently encountered an error in my application with concurrent transactions. Previously, auto-incrementing for a compound key was implemented in the application itself, using PHP. However, as I mentioned, the id got duplicated, and all sorts of issues happened, which I painstakingly fixed manually afterward.
Now I have read about related issues and found suggestions to use a trigger.
So I am planning on implementing a trigger somewhat like this:
DELIMITER $$
CREATE TRIGGER auto_increment_my_table
BEFORE INSERT ON my_table FOR EACH ROW
BEGIN
SET NEW.id = (SELECT IFNULL(MAX(id), 0) + 1 FROM my_table WHERE type = NEW.type);
END $$
DELIMITER ;
But my doubt regarding concurrency still remains: what if this trigger is executed concurrently and both executions get the same MAX(id) when querying?
Is this the correct way to handle my issue or is there any better way?
An example of how to solve auto-incrementing on a compound index.
CREATE TABLE test ( id INT,
type VARCHAR(192),
value INT,
PRIMARY KEY (id, type) );
-- create additional service table which will help
CREATE TABLE test_service ( type VARCHAR(192),
id INT AUTO_INCREMENT,
PRIMARY KEY (type, id) ) ENGINE = MyISAM;
-- create trigger which will generate id value for new row
DELIMITER $$
CREATE TRIGGER tr_bi_test_autoincrement
BEFORE INSERT
ON test
FOR EACH ROW
BEGIN
INSERT INTO test_service (type) VALUES (NEW.type);
SET NEW.id = LAST_INSERT_ID();
END $$
DELIMITER ;
creating a service table just to auto increment a value seems less than ideal for me. – Mohamed Mufeed
This table is extremely tiny - you may at any time delete all records except the one with the largest auto-incremented value in each group. – Akina
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=61f0dc36db25dd5f0cf4647d8970cdee
You may schedule the removal of excess rows (for example, daily) with an EVENT procedure.
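A sketch of such a scheduled cleanup, assuming the tables above (the event name is made up, and the event scheduler must be enabled):
-- Requires SET GLOBAL event_scheduler = ON;
-- Deletes every row that has a higher id in the same group,
-- leaving only the row with the largest id per type.
CREATE EVENT ev_cleanup_test_service
ON SCHEDULE EVERY 1 DAY
DO
DELETE t1
FROM test_service AS t1
JOIN test_service AS t2
ON t2.type = t1.type AND t2.id > t1.id;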
I have managed to solve this issue.
The answer was somewhat in the direction of Akina's Answer. But not quite exactly.
The way I solved it did indeed involve an additional table, but not in the way he suggested.
I created an additional table to store meta data about transactions.
E.g. I had a journals table like this:
CREATE TABLE `journals` (
`id` bigint NOT NULL AUTO_INCREMENT,
`type` smallint NOT NULL DEFAULT '0',
`trans_no` bigint NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `transaction` (`type`,`trans_no`)
)
So I created a meta_journals table like this
CREATE TABLE `meta_journals` (
`type` smallint NOT NULL,
`next_trans_no` bigint NOT NULL,
PRIMARY KEY (`type`)
)
and seeded it with all the different types of journals and the next sequence number.
And whenever I insert a new transaction into journals, I make sure to increment the next_trans_no of the corresponding type in the meta_journals table. This increment operation is issued inside the same database TRANSACTION, i.e. between BEGIN and COMMIT.
This allowed me to use the exclusive lock acquired by the UPDATE statement on the row of the meta_journals table. So when two inserts are issued for the same journal type concurrently, one has to wait until the lock acquired by the other transaction is released by its COMMIT.
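Something like the following sketch illustrates the pattern (this is not the poster's exact code; the type value 1 is just an example, and LAST_INSERT_ID(expr) is a standard MySQL trick for reading back the value just written):
START TRANSACTION;
-- The UPDATE takes an exclusive row lock on this type's row,
-- serializing concurrent writers for the same journal type.
UPDATE meta_journals
SET next_trans_no = LAST_INSERT_ID(next_trans_no + 1)
WHERE type = 1;
-- LAST_INSERT_ID() now returns, per connection, the number reserved above.
INSERT INTO journals (type, trans_no)
VALUES (1, LAST_INSERT_ID());
COMMIT;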
I want to update a statistics count in MySQL.
The SQL is as follow:
REPLACE INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?)
Schema :
CREATE TABLE `record_amount` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
`owner` varchar(50) NOT NULL ,
`source` varchar(50) NOT NULL ,
`day_time` varchar(10) NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `src_time` (`owner`,`source`,`day_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
However, it caused a DEADLOCK exception when running in multiple processes (i.e. Map-Reduce).
I've read some materials online and am confused about those locks. I know InnoDB uses row-level locking. I could just use a table lock to solve the business problem, but that is a little extreme. I found some possible solutions:
change REPLACE INTO to a transaction with SELECT id FOR UPDATE and UPDATE
change REPLACE INTO to INSERT ... ON DUPLICATE KEY UPDATE (sketched below)
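For reference, the second option would look something like this (a sketch mirroring the REPLACE statement above):
INSERT INTO `record_amount` (`source`, `owner`, `day_time`, `count`)
VALUES (?, ?, ?, ?)
ON DUPLICATE KEY UPDATE `count` = VALUES(`count`);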
I have no idea which one is practical and better. Can someone explain, or offer some links for me to read and study? Thank you!
Are you building a summary table, one source row at a time? And effectively doing UPDATE ... count = count+1? Throw away the code and start over. Map-Reduce on that is like using a sledgehammer on a thumbtack.
INSERT INTO summary (source, owner, day_time, count)
SELECT source, owner, day_time, COUNT(*)
FROM raw
GROUP BY source, owner, day_time
ON DUPLICATE KEY UPDATE count = count + VALUES(count);
A single statement approximately like that will do all the work at virtually disk I/O speed. No SELECT ... FOR UPDATE. No deadlocks. No multiple threads. Etc.
Further improvements:
Get rid of the AUTO_INCREMENT; turn the UNIQUE into PRIMARY KEY.
day_time -- is that a DATETIME truncated to an hour? (Or something like that.) Use DATETIME, you will have much more flexibility in querying.
To discuss further, please elaborate on the source data (CREATE TABLE, number of rows, frequency of processing, etc.) and other details. If this is really a Data Warehouse application with a Summary table, I may have more suggestions.
If the data is coming from a file, do LOAD DATA to shovel it into a temp table raw so that the above INSERT..SELECT can work. If it is of manageable size, make raw Engine=MEMORY to avoid any I/O for it.
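That path could look roughly like this (a sketch; the file name, field terminator, and column list are assumptions):
CREATE TABLE raw (
source VARCHAR(50) NOT NULL,
owner VARCHAR(50) NOT NULL,
day_time VARCHAR(10) NOT NULL
) ENGINE=MEMORY;
LOAD DATA LOCAL INFILE 'feed.csv'
INTO TABLE raw
FIELDS TERMINATED BY ','
(source, owner, day_time);
-- then run the INSERT ... SELECT ... ON DUPLICATE KEY UPDATE above, and TRUNCATE raw.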
If you have multiple feeds, my high-speed-ingestion blog discusses how to have multiple threads without any deadlocks.
I am struggling with INSERT ... ON DUPLICATE KEY UPDATE for file loads into a big InnoDB table.
My values table saves the details of each entity belonging to a client. An entity can have only one value for a particular key, so when a change happens we update the existing row. The table looks something like this:
CREATE TABLE `key_values` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`client_id` int(11) NOT NULL COMMENT 'customer/tenant id',
`key_id` int(11) NOT NULL COMMENT 'reference to the keys',
`entity_id` bigint(20) NOT NULL,
`value` text,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `client_id` (`client_id`,`entity_id`,`key_id`),
KEY `client_id_2` (`client_id`,`key_id`)
) ;
All write queries are of the form:
INSERT INTO `key_values`
(client_id, key_id, entity_id,value)
values
(23, 47, 147, 'myValue'), (...), (...)...
ON DUPLICATE KEY UPDATE value = values(value);
The table is around 350M records by now and is growing pretty fast.
Writes to the table can happen from real-time integrations, often inserting fewer than 10 rows, or as bulks of 25K rows from offline sources.
For a given client, only one bulk operation can run at a time. This is to reduce row-lock contention between inserts.
The lock wait timeout period is set at 50 seconds.
Currently, when the offline activities are happening, we sometimes (not always) get a lock wait timeout. What changes could help avoid the timeout?
A design change at the moment is not possible ( sharding/partitioning/cluster).
REPLACE is another candidate, but I don't want to give DELETE privileges in production to anything from code.
INSERT IGNORE followed by UPDATE is a good candidate, but will it give much improvement?
What other options do I have?
Thanks in advance for all suggestion and answers.
Regarding the lock wait timeout: this can be changed via the MySQL configuration setting innodb_lock_wait_timeout, which can be modified dynamically (without restarting MySQL), in addition to changing it in your my.cnf.
Regarding the lock waits, one thing to consider with MySQL is the default transaction isolation level, which is REPEATABLE READ. The side effect of this setting is that much more locking occurs for reads than you might expect (especially if you have a SQL Server background, where the default transaction isolation level is READ COMMITTED). If you don't need REPEATABLE READ, you can change your transaction isolation level, either per query, using the SET TRANSACTION ISOLATION LEVEL syntax, or for the whole server, using the config setting transaction-isolation. I recommend READ COMMITTED, and consider whether there are other places in your application where even 'dirtier' reads are acceptable (in which case you can use READ UNCOMMITTED).
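For example (the 120-second value is arbitrary; pick one that suits your workload):
-- Raise the lock wait timeout dynamically (value in seconds):
SET GLOBAL innodb_lock_wait_timeout = 120;
-- Relax the isolation level for the current session only:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;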
I am working on a large MySQL database and I need to improve INSERT performance on a specific table. This one contains about 200 million rows and its structure is as follows:
(A little premise: I am not a database expert, so the code I've written could be based on wrong foundations. Please help me understand my mistakes. :) )
CREATE TABLE IF NOT EXISTS items (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(200) NOT NULL,
`key` VARCHAR(10) NOT NULL,
busy TINYINT(1) NOT NULL DEFAULT 1,
created_at DATETIME NOT NULL,
updated_at DATETIME NOT NULL,
PRIMARY KEY (id, name),
UNIQUE KEY name_key_unique_key (name, `key`),
INDEX name_index (name)
) ENGINE=MyISAM
PARTITION BY LINEAR KEY(name)
PARTITIONS 25;
Every day I receive many CSV files in which each line is composed of the pair "name;key", so I have to parse these files (adding the values created_at and updated_at for each row) and insert the values into my table. The combination of "name" and "key" MUST be UNIQUE, so I implemented the insert procedure as follows:
CREATE TEMPORARY TABLE temp_items (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(200) NOT NULL,
`key` VARCHAR(10) NOT NULL,
busy TINYINT(1) NOT NULL DEFAULT 1,
created_at DATETIME NOT NULL,
updated_at DATETIME NOT NULL,
PRIMARY KEY (id)
)
ENGINE=MyISAM;
LOAD DATA LOCAL INFILE 'file_to_process.csv'
INTO TABLE temp_items
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
(name, `key`, created_at, updated_at);
INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
FROM temp_items
)
ON DUPLICATE KEY UPDATE busy=1, updated_at=NOW();
DROP TEMPORARY TABLE temp_items;
The code just shown allows me to reach my goal, but it takes about 48 hours to complete the execution, and this is a problem.
I think this poor performance is caused by the fact that the script must check, against a very large table (200 million rows), that each inserted pair "name;key" is unique.
How can I improve the performance of my script?
Thanks to all in advance.
You can use the following methods to speed up inserts:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time (see the example below). This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.
Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.
Reference: MySQL.com: 8.2.4.1 Optimizing INSERT Statements
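For instance, the multiple-VALUES form mentioned above looks like this (the rows are made-up samples using the question's columns):
INSERT INTO items (name, `key`, busy, created_at, updated_at)
VALUES
('name1', 'k1', 1, NOW(), NOW()),
('name2', 'k2', 1, NOW(), NOW()),
('name3', 'k3', 1, NOW(), NOW());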
Your linear key on name and the large indexes slow things down.
The LINEAR KEY needs to be calculated on every insert.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-linear-hash.html
Can you show us some example data from file_to_process.csv? Maybe a better schema should be built.
Edit: looked more closely.
INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
FROM temp_items
)
This will probably create an on-disk temporary table, which is very, very slow, so you should not use it if you want more performance. Alternatively, check some MySQL config settings like tmp_table_size and max_heap_table_size; maybe these are misconfigured.
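To inspect those settings (standard MySQL syntax):
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';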
There is a piece of documentation I would like to point out, Speed of INSERT Statements.
Thinking in Java:
Divide the object list into partitions and generate a batch insert statement for each partition.
Utilize CPU cores and available DB connections efficiently. Nice new Java features can help to achieve parallelism easily (e.g. parallel streams, fork/join), or you can create your own custom thread pool optimized for the number of CPU cores you have and feed your threads from a centralized blocking queue in order to invoke batch insert prepared statements.
Decrease the number of indexes on the target table if possible. If a foreign key is not really needed, just drop it. Fewer indexes mean faster inserts.
Avoid using Hibernate for anything except CRUD operations; always write SQL for complex selects.
Decrease the number of joins in your query; instead of forcing the DB to do the work, use Java streams for filtering, aggregating, and transformation.
If you do not have to, do not combine selects and inserts in one SQL statement.
Add rewriteBatchedStatements=true to your JDBC string (example below); it will help to decrease TCP-level communication between app and DB.
Use @Transactional for the methods that carry out the insert batch, and write rollback methods yourself.
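For example, a JDBC URL with that flag enabled (host, port, and database name are placeholders):
jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true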
You could use
LOAD DATA LOCAL INFILE 'file_to_process.csv'
REPLACE
INTO TABLE items
etc...
The REPLACE ensures that any duplicate row is overwritten with the new values.
Add a SET updated_at=now() at the end and you're done.
There is no need for the temporary table.
I've run into an InnoDB locking issue for transactions on a table with both a primary key and a separate unique index. It seems that if a transaction deletes a record using the unique key and then re-inserts that same record, this results in a next-key lock instead of the expected record lock (since the key is unique). See below for a test case as well as a breakdown of which records I expect to hold which locks:
DROP TABLE IF EXISTS foo;
CREATE TABLE `foo` (
`i` INT(11) NOT NULL,
`j` INT(11) DEFAULT NULL,
PRIMARY KEY (`i`),
UNIQUE KEY `jk` (`j`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
INSERT INTO foo VALUES (5,5), (8,8), (11,11);
(Note: just run the TX2 SQL after the TX1 SQL, in a separate connection.)
TX1
START TRANSACTION;
DELETE FROM foo WHERE i=8;
-- results in an exclusive lock on i=8 (no gap lock since i is primary key and unique)
INSERT INTO foo VALUES(8,8);
-- results in exclusive locks for i=8 & j=8, and shared intention locks on i=6 & i=7, as well as j=6 & j=7
TX2
START TRANSACTION;
INSERT INTO foo VALUES(7,7);
-- results in exclusive locks for i=7 & j=7, as well as shared intention locks on i=6 & j=6
I would expect TX2 not to be blocked by TX1; however, it is. Oddly, the blocking seems to be related to the insert by TX1: if TX1's insert statement is not run after the delete, TX2's insert is not blocked. It's almost as if TX1's re-insertion of (8,8) causes a next-key lock on index j for (6,8].
Any insight would be much appreciated.
The problem you are experiencing happens because MySQL doesn't just lock the table row for the value you're going to insert; it locks all possible values between the previous id and the next id in order. So, reusing your example below:
DROP TABLE IF EXISTS foo;
CREATE TABLE `foo` (
`i` INT(11) NOT NULL,
`j` INT(11) DEFAULT NULL,
PRIMARY KEY (`i`),
UNIQUE KEY `jk` (`j`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
INSERT INTO foo VALUES (5,5), (8,8), (11,11);
Suppose you start with transaction TX1:
START TRANSACTION;
REPLACE INTO foo VALUES(8,8);
Then if you start a transaction TX2, any INSERT or REPLACE using an id between 5 and 11 will be blocked:
START TRANSACTION;
REPLACE INTO foo VALUES(11,11);
It looks like MySQL uses this kind of locking to avoid the "phantom problem" described here: http://dev.mysql.com/doc/refman/5.0/en/innodb-next-key-locking.html. MySQL uses "next-key locking", which combines index-row locking with gap locking. For us this means that it will lock many possible ids between the previous and next ids, and will lock the previous and next ids as well.
To avoid this, try to design your insert workload so that records inserted in different transactions don't overlap, or at least don't execute all your transactions at the same time, so the transactions don't have to wait on each other.
It seems as if the problem might lie in the fact that InnoDB indexes are weird.
The primary key (clustered) is i and there would be a rowid associated with it.
The unique key on j (nonclustered) has the rowid of i associated with the value of j in the index.
Doing a DELETE followed by an INSERT on the same key value for i should produce a different new rowid for the primary key (clustered) and, likewise, a different new rowid to associate with the value of j (nonclustered).
This would require some bizarre internal locking within MVCC mechanism.
You may need to change your transaction isolation level to allow less strict reads (i.e., not have repeatable reads).
Play some games with the tx_isolation variable within a session.
Try READ_COMMITTED and READ_UNCOMMITTED.
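For example, within a session (tx_isolation is the MySQL 5.x variable name; newer versions use transaction_isolation):
SET SESSION tx_isolation = 'READ-COMMITTED';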
Otherwise, just permanently set the following in /etc/my.cnf (example):
[mysqld]
transaction_isolation=read-committed
Give it a try !!!
https://bugs.mysql.com/bug.php?id=68021
This bug report answers your question.
This is a design flaw in InnoDB. Upstream once fixed this issue, avoiding the gap lock in row_ins_scan_sec_index_for_duplicate under READ COMMITTED isolation. However, the fix brought out another issue: it silently caused secondary-index unique-key violations, so upstream reverted it.