MySQL InnoDB Gap Lock on Update with where clause by PK?

I'm getting locks on update operations that don't seem to be related to each other.
This is the DB Context:
MySQL 5.7
InnoDB engine
Read Committed Isolation Level
Optimistic Locking concurrency control in the application
The table structure is something like this:
CREATE TABLE `external_user` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) NOT NULL,
`status` varchar(30) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_user_status` (`status`),
KEY `idx_user_id` (`user_id`) USING BTREE
);
The structure's been simplified. The real one has more attributes and some FKs to other tables.
The process is something like this:
Process 1
BEGIN;
update external_user
set user_id=33333
where (id in (400000, 400002, 400028............., 420000))
and user_id = 22222;
This is a long-running query that modifies around 20k rows. Using BETWEEN is not an option because we don't update all of the consecutive records.
At the same time a second process starts.
Process 2
BEGIN;
update external_user
set status='disabled', user_id = 44444
where id = 10000;
It turns out that this second update waits for the first one to complete, so the first query is holding a lock that blocks it.
I've been reading a lot about locking in MySQL, but I couldn't find anything about updates whose WHERE clause has a PK filter with the IN operator plus a filter on a column with a non-unique index (a column that is also being changed in the SET clause).
Is the first query taking a gap lock because of the non-unique index filter? Is that possible even though the PK is provided as a filter?
Note: I don't have access to the engine in order to obtain more detailed information.
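For anyone who does have access, a quick way to confirm what is blocking what on MySQL 5.7 is to join the stock InnoDB transaction and lock tables in information_schema (in 8.0 these moved to performance_schema.data_locks); the query below is a diagnostic sketch:
SELECT r.trx_id      AS waiting_trx,
       r.trx_query   AS waiting_query,
       b.trx_id      AS blocking_trx,
       b.trx_query   AS blocking_query,
       l.lock_index, l.lock_mode, l.lock_data
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx   r ON r.trx_id  = w.requesting_trx_id
JOIN information_schema.innodb_trx   b ON b.trx_id  = w.blocking_trx_id
JOIN information_schema.innodb_locks l ON l.lock_id = w.blocking_lock_id;
The lock_index column shows whether the blocking lock sits on PRIMARY or on idx_user_id, which would answer the secondary-index question directly.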

Related

MySQL InnoDB: locking the destination of foreign keys

When I add a row that references another table (in a transaction), MySQL seems to lock the whole row that's being referenced. This prevents updates of other columns in the destination table that should be able to run concurrently without any problem.
Simplified example:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`num` int(11) DEFAULT NULL,
UNIQUE KEY `id` (`id`)
);
CREATE TABLE `bar` (
`foo` int(11) NOT NULL,
KEY `foo` (`foo`),
CONSTRAINT `bar_ibfk_1` FOREIGN KEY (`foo`) REFERENCES `t1` (`id`)
);
INSERT INTO `t1` VALUES (1,1),(2,0),(3,4);
Task A:
BEGIN;
insert into bar(foo) values(2);
-- now we ask task B to do some work for us
Task B:
-- when triggered by Task A, tries to do this:
update t1 set num=num+1 where id=2;
-- does not complete because it waits for the lock
Any ideas how to avoid this deadlock? Task A should only read-lock the single value which it actually refers to, so Task B couldn't renumber or delete t1[id=2].id but would otherwise be free to update that row. Is it possible to convince MySQL to do this?
Splitting t1 into two linked tables (one for Task A to refer to and one for task B to update) would result in a heap of fairly intrusive refactoring.
Joining the tasks is not an option because B's work changes global state, thus must be able to commit even if A fails.
Switching to Postgres (which supports this; I checked) is not an easily-executed option.
This is a behavior of MySQL foreign keys that frankly convinces many projects to avoid using foreign key constraints, even though their database logically has foreign key references.
You can't lock just one column of a row. InnoDB effectively locks the whole row against update or delete if an exclusive lock exists on a child row that references it. The idea is that while a child row that depends on the parent row has an insert/update/delete in progress, the parent row shouldn't be deleted or have its key modified. But you can't lock only the key column that the child row references.
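A rough way to picture it (a sketch of the effect, not InnoDB's literal implementation): the foreign-key check behaves much like an explicit shared lock on the parent row.
BEGIN;
-- roughly what Task A's insert into bar does to the parent row:
SELECT id FROM t1 WHERE id = 2 LOCK IN SHARE MODE;
-- Task B's "update t1 set num=num+1 where id=2" now blocks, because the
-- exclusive record lock it needs conflicts with this shared lock.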
The best solution is for the transaction against the child table to finish and commit promptly. The fact that you tried to update the parent row and it timed out (the lock wait timeout is 50 seconds by default) indicates that you have left the transaction running too long.
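In code terms, and assuming the application can tolerate Task A committing before Task B starts, that advice amounts to moving the COMMIT before the hand-off:
BEGIN;
insert into bar(foo) values(2);
COMMIT; -- releases the shared lock on the t1 row with id = 2
-- only now ask Task B to do its work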
P.S. What you described is simply a lock-wait, not a deadlock. A deadlock is when both transactions end up blocked waiting for each other to release locks, so neither can proceed. A lock-wait is unidirectional; a deadlock is a cycle of mutual lock-waits.

MySQL Slow INSERT on related big tables, 100% CPU use

I am building a website (LAMP stack) with an Amazon RDS MySQL instance as the back end (type db.m3.medium).
I am happy with database integrity, and it works perfectly with regards to SELECT/JOIN/ETC queries (everything is normalized, indexed, and foreign keyed, all tables have id primary keys and relevant secondary keys / unique keys).
I have a table 'df_products' with approx half a million products in it. The products need to be updated nightly. The process involves a PHP script reading over a large products data-file and inserting data into several tables (products table, product_colours table, brands table, etc), calling either INSERT or UPDATE depending on whether or not a row already exists. This is done as one giant transaction.
What I am seeing is that the UPDATE commands are sufficiently fast (50/sec; not exactly lightning, but it should do); however, the INSERT commands are super slow (1/sec) and appear to be consuming 100% of the CPU. On a dual-core instance we see 50% CPU use (i.e. one full core).
I assume that this is because the indexes (1x PRIMARY + 5x INDEX + 1x UNIQUE + 1x FULLTEXT) are updated after every INSERT. However, I thought that putting the entire process into one transaction would stop the indexes being rebuilt until the transaction is committed.
I have tried setting the following params via PHP but there is negligible performance improvement:
$this->db->query('SET unique_checks=0');
$this->db->query('SET foreign_key_checks=0;');
The process will take weeks to complete at this rate so we must improve performance. Google appears to suggest using LOAD DATA. However:
I would have to generate five files in order to populate five tables
The process would have to use UPDATE commands as opposed to INSERT since the tables already exist
I would still need to loop over the products and scan the database for what values already do and don't exist
The database is entirely InnoDB and I don't plan to move to MyISAM (I want transactions, foreign keys, etc). This means that I cannot disable indexes. Even if I did it would probably be a big performance drain as we need to check if a row already exists before we insert it, and without an index this will be super slow.
I have provided the products table definition below for information. Can you please advise on what process we should use to achieve faster INSERTs/UPDATEs on multiple large related tables, or what optimisations we can make to our existing process?
Thank you,
CREATE TABLE `df_products` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_brand` int(11) NOT NULL,
`title` varchar(255) NOT NULL,
`id_gender` int(11) NOT NULL,
`id_colourSet` int(11) DEFAULT NULL,
`id_category` int(11) DEFAULT NULL,
`desc` varchar(500) DEFAULT NULL,
`seoAlias` varchar(255) CHARACTER SET ascii NOT NULL,
`runTimestamp` timestamp NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `seoAlias_UNIQUE` (`seoAlias`),
KEY `idx_brand` (`id_brand`),
KEY `idx_category` (`id_category`),
KEY `idx_seoAlias` (`seoAlias`),
KEY `idx_colourSetId` (`id_colourSet`),
KEY `idx_timestamp` (`runTimestamp`),
KEY `idx_gender` (`id_gender`),
FULLTEXT KEY `fulltext_title` (`title`),
CONSTRAINT `fk_id_colourSet` FOREIGN KEY (`id_colourSet`) REFERENCES `df_productcolours` (`id_colourSet`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_id_gender` FOREIGN KEY (`id_gender`) REFERENCES `df_lu_genders` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=285743 DEFAULT CHARSET=utf8
How many "genders" are there? If the usual 2, don't normalize it, don't index it, don't us a 4-byte INT to store it, use a CHAR(1) CHARACTER SET ascii (only 1 byte) or an ENUM (1 byte).
Each unnecessary index is a performance drain on the load, regardless of how it is done.
For INSERT vs UPDATE, look into using INSERT ... ON DUPLICATE KEY UPDATE.
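A sketch of what that looks like against the df_products table (the specific column subset and values here are illustrative):
INSERT INTO df_products (id_brand, title, id_gender, seoAlias, runTimestamp)
VALUES (12, 'Example product', 1, 'example-product', NOW())
ON DUPLICATE KEY UPDATE
    id_brand     = VALUES(id_brand),
    title        = VALUES(title),
    runTimestamp = VALUES(runTimestamp);
The duplicate is detected via the PRIMARY KEY or any UNIQUE key, here seoAlias_UNIQUE.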
Load the nightly data into a separate table (this could be MyISAM with no indexes). Then run one query to update existing rows and one to insert new rows. (Each needs a JOIN.) See http://mysql.rjweb.org/doc.php/staging_table, especially the 2 SQLs used for "normalizing". They can be adapted to your situation.
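Roughly, those two statements look like this (a sketch; df_products_stage is a hypothetical staging table carrying the same columns and joined on seoAlias):
UPDATE df_products p
JOIN df_products_stage s ON s.seoAlias = p.seoAlias
SET p.title        = s.title,
    p.id_brand     = s.id_brand,
    p.runTimestamp = s.runTimestamp;

INSERT INTO df_products (id_brand, title, id_gender, seoAlias, runTimestamp)
SELECT s.id_brand, s.title, s.id_gender, s.seoAlias, s.runTimestamp
FROM df_products_stage s
LEFT JOIN df_products p ON p.seoAlias = s.seoAlias
WHERE p.id IS NULL;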
Any kind of multi-row query runs noticeably faster than 1-row at a time. (A 100-row INSERT runs 10 times as fast as 100 1-row inserts.)
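For example (illustrative values):
INSERT INTO df_products (id_brand, title, id_gender, seoAlias, runTimestamp)
VALUES (12, 'Product A', 1, 'product-a', NOW()),
       (12, 'Product B', 1, 'product-b', NOW()),
       (13, 'Product C', 2, 'product-c', NOW());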
innodb_flush_log_at_trx_commit = 2 will let the individual write statements run much faster. (Batching them as I suggest won't speed up much.)
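It is a server variable, settable at runtime or in my.cnf. Note the durability trade-off: with a value of 2, up to about a second of committed transactions can be lost if the OS or hardware crashes.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
-- or in my.cnf under [mysqld]:
-- innodb_flush_log_at_trx_commit = 2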

Emulate MyISAM's composite primary key with an autoincrement behavior in InnoDB

In MySQL, if you have a MyISAM table that looks something like:
CREATE TABLE `table1` (
`col1` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`col2` INT(10) UNSIGNED NOT NULL,
PRIMARY KEY (`col2`, `col1`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
if you insert rows, the autoincrement counter is maintained separately for every distinct col2 value. If my explanation isn't clear enough, this answer should explain it better. InnoDB, however, doesn't follow this behavior. In fact, InnoDB won't even let you put col2 first in the primary key definition.
My question is: is it possible to model this behavior in InnoDB somehow, without resorting to methods like MAX(id)+1 or the like? The closest I could find is this, but it's for PostgreSQL.
It's a neat feature of MyISAM that I have used before, but you can't do it with InnoDB. InnoDB determines the highest number on startup, then keeps the number in RAM and increments it when needed.
Since InnoDB handles simultaneous inserts/updates, it has to reserve the number at the start of a transaction. On a transaction rollback, the number is still "used" but never saved. Your MAX(id) solution could get you in trouble because of this: a transaction starts and a number is reserved; you pull the highest "saved" number + 1 in a separate transaction, which is the same as the number reserved by the first transaction; the first transaction finishes, and its reserved number is now saved, conflicting with yours.
MAX(id) returns the highest saved number, not the highest used number. You could have a MyISAM table whose sole purpose is to generate the numbers you want. It's the same number of queries as your MAX(id) solution; it's just that one is a SELECT, the other an INSERT.
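A sketch of that side table, reusing the MyISAM per-group autoincrement behavior from the question (the name table1_seq is hypothetical):
CREATE TABLE `table1_seq` (
`col2` INT(10) UNSIGNED NOT NULL,
`col1` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`col2`, `col1`)
) ENGINE=MyISAM;

-- reserve the next number within the group col2 = 5 ...
INSERT INTO `table1_seq` (`col2`) VALUES (5);
-- ... then use it for the real InnoDB row
INSERT INTO `table1` (`col2`, `col1`) VALUES (5, LAST_INSERT_ID());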

A simple INSERT query on InnoDB taking too long

I have this simple query:
INSERT IGNORE INTO beststat (bestid,period,rawView) VALUES ( 4510724 , 201205 , 1 )
On the table:
CREATE TABLE `beststat` (
`bestid` int(11) unsigned NOT NULL,
`period` mediumint(8) unsigned NOT NULL,
`view` mediumint(8) unsigned NOT NULL DEFAULT '0',
`rawView` mediumint(8) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`bestid`,`period`)
) ENGINE=InnoDB AUTO_INCREMENT=2020577 DEFAULT CHARSET=utf8
And it takes 1 sec to complete.
Side note: it doesn't always take 1 sec; sometimes it completes in 0.05 sec, but often it takes 1 sec.
This table (beststat) currently has ~500,000 records and its size is 40MB. I have 4GB RAM and innodb_buffer_pool_size = 104,857,600, on MySQL 5.1.49-3.
This is the only InnoDB table in my database (others are MyISAM)
ANALYZE TABLE beststat shows: OK
Maybe there is something wrong with InnoDB settings?
I ran some simulations about 3 years ago as part of an evaluation project for a customer. They had a requirement to be able to search a table into which data was constantly being added, and they wanted results that were at most a minute stale.
InnoDB showed much better results in the beginning, but deteriorated quickly (well before 1 million records), until I removed all indexes (including the primary key). At that point InnoDB became superior to MyISAM for inserts/updates. (I had much worse hardware than you, running the tests only on my laptop.)
Conclusion: inserts will always suffer if you have indexes, especially unique ones.
I would suggest the following optimizations:
Remove all indexes from your beststat table and use it as a simple dump.
If you really need these unique indexes, consider some programmable solution (such as remembering the max bestid at all times and insisting that each new record is above that number, then immediately increasing it). (But do you really need so many unique fields? They all sound to me just like indexes.)
Have a background thread move new records from InnoDB to another table (which can be MyISAM) where they would be indexed (see the sketch after this answer).
Consider dropping indexes temporarily and re-indexing the table after the bulk update, possibly switching two tables so that querying is never interrupted.
These are theoretical solutions, I admit, but it's the best I can offer given your question.
Oh, and if your table is planned to grow to many millions, consider a NoSQL solution.
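A sketch of suggestion 3 above, with hypothetical names: beststat_indexed as the indexed MyISAM copy, and a migrated flag column on the unindexed dump table. (Real code would need more careful bookkeeping; the MyISAM side of this is not transactional.)
BEGIN;
INSERT INTO beststat_indexed (bestid, period, `view`, rawView)
SELECT bestid, period, `view`, rawView
FROM beststat
WHERE migrated = 0;   -- hypothetical flag marking rows not yet moved
UPDATE beststat SET migrated = 1 WHERE migrated = 0;
COMMIT;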
So you have two unique indexes on the table. Your primary key is an autonumber. Since it is not really part of the data you are adding, it is what you'd call an artificial primary key. You also have a unique index on bestid and period. If bestid and period are supposed to be unique, they would be a good candidate for the primary key.
InnoDB stores the table as a tree (the clustered index), ordered by the primary key; if you don't define a primary key, it clusters on the first unique NOT NULL index or, failing that, on a hidden internal row ID. So in your case the tree on disk is ordered by the autonumber key. When you create the second index, it actually creates a second tree on disk containing the bestid and period values; a secondary index does not contain the other columns of the table, only bestid, period and your primary key value.
OK, so now you insert the data. The first thing InnoDB does is ensure the unique index stays unique, so it reads that index to see whether you are trying to insert a duplicate value. This is where the slowdown comes into play: it first has to check uniqueness, then, if the row passes the test, write the data, and then also insert the bestid, period and primary key values into the unique index. The total operation is: 1 index read to check the value, 1 row insert into the table, 1 insert of bestid and period into the index. A total of three operations. If you removed the autonumber and used the unique index as the primary key, it would read the table to check uniqueness and, if unique, insert into the table: 1 read plus 1 insert, two operations instead of three. So you do 33% less work by removing the redundant autonumber.
I hope this is clear as I am typing from my Android and autocorrect keeps on changing innodb to inborn. Wish I was at a computer.
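For concreteness, the change suggested above would look something like this (a sketch, assuming the real table has a surrogate id AUTO_INCREMENT primary key plus a unique key on (bestid, period) as described; the index name uq_bestid_period is hypothetical):
ALTER TABLE beststat
DROP PRIMARY KEY,
DROP COLUMN id,
DROP INDEX uq_bestid_period,  -- now redundant with the new primary key
ADD PRIMARY KEY (bestid, period);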

mysql table with a 2-column unique key

I have a table, called tablen.
It has this structure:
CREATE TABLE `tablen` (
`a` int(11) unsigned not null,
`b` int(11) unsigned not null,
unique key(a,b)
);
This table has one use: I hit it with a row of data. If it is a new unique row not already in the table, it gets added; otherwise I get back the duplicate-key error code.
The main thing I guess is speed. I don't have the facility to stress test the setup at the moment and so...
What would you say is the best format for this table?
InnoDB or MyISAM?
If you have a lot of inserts and updates, go for InnoDB, because it has row-level locking. MyISAM has table-level locking, which means the whole table gets locked when a record is inserted, queuing all other inserts. If you have far more selects than inserts/updates, then use MyISAM, which is usually faster there (if you also don't care about foreign keys).
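For the "insert if new, otherwise get an error" usage itself, either form works (a sketch against the tablen definition above):
-- a plain INSERT returns error 1062 (ER_DUP_ENTRY) on a duplicate:
INSERT INTO tablen (a, b) VALUES (1, 2);
-- INSERT IGNORE downgrades the duplicate error to a warning;
-- an affected-row count of 0 then means the row already existed:
INSERT IGNORE INTO tablen (a, b) VALUES (1, 2);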