MySQL InnoDB deadlock problems on REPLACE INTO

I want to update a statistics count in MySQL.
The SQL is as follows:
REPLACE INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?)
Schema:
CREATE TABLE `record_amount` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
`owner` varchar(50) NOT NULL ,
`source` varchar(50) NOT NULL ,
`day_time` varchar(10) NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `src_time` (`owner`,`source`,`day_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
However, it causes a DEADLOCK exception when run from multiple processes (e.g. under Map-Reduce).
I've read some material online and am confused about the locks involved. I know InnoDB uses row-level locking. I could fall back to a table lock to solve the business problem, but that seems extreme. I found some possible solutions:
change REPLACE INTO to a transaction doing SELECT id ... FOR UPDATE followed by UPDATE
change REPLACE INTO to INSERT ... ON DUPLICATE KEY UPDATE
I'm not sure which is practical and better. Can someone explain it or offer some links for me to read and study? Thank you!

Are you building a summary table, one source row at a time, effectively doing UPDATE ... count = count+1? Throw away the code and start over. Map-Reduce for that is like using a sledgehammer on a thumbtack.
INSERT INTO summary (source, owner, day_time, count)
SELECT source, owner, day_time, COUNT(*)
FROM raw
GROUP BY source, owner, day_time
ON DUPLICATE KEY UPDATE count = count + VALUES(count);
A single statement approximately like that will do all the work at virtually disk I/O speed. No SELECT ... FOR UPDATE. No deadlocks. No multiple threads. Etc.
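A runnable sketch of that single-statement upsert, using sqlite3 as a stand-in for MySQL (SQLite's ON CONFLICT ... DO UPDATE is the analogue of MySQL's ON DUPLICATE KEY UPDATE, with `excluded.count` playing the role of `VALUES(count)`; table contents are invented for the demo):

```python
import sqlite3

# In-memory stand-ins for the `raw` detail table and the summary table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw (source TEXT, owner TEXT, day_time TEXT);
CREATE TABLE summary (
  source TEXT, owner TEXT, day_time TEXT, count INTEGER,
  PRIMARY KEY (owner, source, day_time)
);
""")
conn.executemany("INSERT INTO raw VALUES (?,?,?)",
                 [("a", "u1", "2020-01-01"),
                  ("a", "u1", "2020-01-01"),
                  ("b", "u2", "2020-01-01")])

# One set-based statement aggregates and upserts; no per-row SELECT ... FOR UPDATE.
conn.execute("""
INSERT INTO summary (source, owner, day_time, count)
SELECT source, owner, day_time, COUNT(*)
FROM raw
GROUP BY source, owner, day_time
ON CONFLICT(owner, source, day_time) DO UPDATE SET count = count + excluded.count
""")
conn.commit()
print(conn.execute("SELECT source, owner, count FROM summary ORDER BY source").fetchall())
# → [('a', 'u1', 2), ('b', 'u2', 1)]
```

Running the same statement again adds the new batch's counts onto the existing rows, which is exactly the incremental-summary behavior the answer describes.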
Further improvements:
Get rid of the AUTO_INCREMENT; turn the UNIQUE into PRIMARY KEY.
day_time -- is that a DATETIME truncated to an hour? (Or something like that.) Use DATETIME, you will have much more flexibility in querying.
To discuss further, please elaborate on the source data (CREATE TABLE, number of rows, frequency of processing, etc.) and other details. If this is really a Data Warehouse application with a Summary table, I may have more suggestions.
If the data is coming from a file, do LOAD DATA to shovel it into a temp table raw so that the above INSERT..SELECT can work. If it is of manageable size, make raw Engine=MEMORY to avoid any I/O for it.
If you have multiple feeds, my high-speed-ingestion blog discusses how to have multiple threads without any deadlocks.

Related

How to efficiently update values without a primary key in MySQL?

I am currently facing an issue with designing a database table and updating/inserting values into it.
The table is used to collect and aggregate statistics that are identified by:
the source
the user
the statistic
an optional material (e.g. item type)
an optional entity (e.g. animal)
My main issue is that my proposed primary key is too large because of the VARCHARs that are used to identify a statistic.
My current table is created like this:
CREATE TABLE `Statistics` (
`server_id` varchar(255) NOT NULL,
`player_id` binary(16) NOT NULL,
`statistic` varchar(255) NOT NULL,
`material` varchar(255) DEFAULT NULL,
`entity` varchar(255) DEFAULT NULL,
`value` bigint(20) NOT NULL)
In particular, the server_id is configurable, the player_id is a UUID, statistic is the representation of an enumeration that may change, material and entity likewise. The value is then aggregated using SUM() to calculate the overall statistic.
So far it works, but I have to use DELETE and INSERT statements whenever I want to update a value, because I have no primary key, and I can't figure out how to create one within the constraints of MySQL.
My main question is: How can I efficiently update values in this table and insert them when they are not currently present without resorting to deleting all the rows and inserting new ones?
The main issue seems to be the restriction MySQL puts on the primary key. I don't think adding an id column would solve this.
Simply add an auto-incremented id:
CREATE TABLE `Statistics` (
statistics_id int auto_increment primary key,
`server_id` varchar(255) NOT NULL,
`player_id` binary(16) NOT NULL,
`statistic` varchar(255) NOT NULL,
`material` varchar(255) DEFAULT NULL,
`entity` varchar(255) DEFAULT NULL,
`value` bigint(20) NOT NULL
);
Voila! A primary key. But you probably want an index. One that comes to mind:
create index idx_statistics_server_player_statistic on statistics(server_id, player_id, statistic);
Depending on what your code looks like, you might want additional or different keys in the index, or more than one index.
Follow the steps below; hopefully it will solve your problem:
- First, add a numeric column, say "detailed", to your table to act as a row identifier.
- In your code, before running an INSERT, get the current maximum of detailed (SELECT MAX(detailed)+1 AS maxid FROM TABLE_NAME) and use that number as the identifier, which will let you FETCH and DELETE the record later.
- You can also UPDATE with it, but during an UPDATE the maximum of detailed is not required.
Hope this helps.
I have dug a bit more through the internet and optimized my code a lot.
I asked this question because of bad performance, which I assumed was because of the DELETE and INSERT statements following each other.
I was thinking that I could reduce the load by doing INSERT IGNORE statements followed by UPDATE statements, or INSERT ... ON DUPLICATE KEY UPDATE statements. But those require keys to be useful, which I didn't have access to because of the constraints in MySQL.
I have fixed the performance issues though:
By reducing the number of statements generated asynchronously (I know JDBC is blocking, but it worked; it just blocked thousands of threads) and disabling auto-commit, I was able to improve the performance by a factor of 600 (from 60 seconds down to 0.1 seconds).
Next steps are improving the connection string and gaining even more performance.
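The two fixes described above, batching statements and committing once instead of per row, can be sketched like this (sqlite3 stands in for JDBC/MySQL; the table and values are illustrative, not from the original code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Statistics (
    statistics_id INTEGER PRIMARY KEY AUTOINCREMENT,
    server_id TEXT NOT NULL,
    player_id BLOB NOT NULL,
    statistic TEXT NOT NULL,
    value INTEGER NOT NULL)""")

rows = [("srv1", b"\x00" * 16, "kills", i) for i in range(1000)]

# One executemany inside one explicit transaction: the rows go over as a
# batch and the database flushes once at COMMIT instead of once per row,
# which is where the large speedup comes from.
with conn:  # opens a transaction, commits on exit
    conn.executemany(
        "INSERT INTO Statistics (server_id, player_id, statistic, value) "
        "VALUES (?,?,?,?)",
        rows)

print(conn.execute("SELECT COUNT(*) FROM Statistics").fetchone()[0])  # → 1000
```

With auto-commit enabled, the same loop would pay a full commit (and disk sync) per row, which matches the 60-second behavior the poster started from.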

MySQL performance - Selecting and deleting from a large table

I have a large table called "queue". It has 12 million records right now.
CREATE TABLE `queue` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`userid` varchar(64) DEFAULT NULL,
`action` varchar(32) DEFAULT NULL,
`target` varchar(64) DEFAULT NULL,
`name` varchar(64) DEFAULT NULL,
`state` int(11) DEFAULT '0',
`timestamp` int(11) DEFAULT '0',
`errors` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_unique` (`userid`,`action`,`target`),
KEY `idx_userid` (`userid`),
KEY `idx_state` (`state`)
) ENGINE=InnoDB;
Multiple PHP workers (150) use this table simultaneously.
They select a record, perform a network request using the selected data and then delete the record.
I get mixed execution times from the select and delete queries. Is the delete command locking the table?
What would be the best approach for this scenario?
SELECT record + NETWORK request + DELETE the record
SELECT record + NETWORK request + MARK record as completed + DELETE completed records using a cron from time to time (I don't want an even bigger table).
Note: The queue gets new records every minute but the INSERT query is not the issue here.
Any help is appreciated.
"Don't queue it, just do it". That is, if the tasks are rather fast, it is better to simply perform the action and not queue it. Databases don't make good queuing mechanisms.
DELETE does not lock an InnoDB table. However, you can write a DELETE that behaves as if it does. Let's see your actual SQL so we can work on improving it.
12M records? That's a huge backlog; what's up?
Shrink the datatypes so that the table is not gigabytes:
action is only a small set of possible values? Normalize it down to a 1-byte ENUM or TINYINT UNSIGNED.
Ditto for state -- surely it does not need a 4-byte code?
There is no need for INDEX(userid) since there is already an index (UNIQUE) starting with userid.
If state has only a few values, the index won't be used. Let's see your enqueue and dequeue queries so we can discuss how to either get rid of that index or make it 'composite' (and useful).
What's the current value of MAX(id)? Is it threatening to exceed your current limit of about 4 billion for INT UNSIGNED?
How does PHP use the queue? Does it hang onto an item via an InnoDB transaction? That defeats any parallelism! Or does it change state? Show us the code; perhaps the lock and unlock can be made less invasive. It should be possible to run a single autocommitted UPDATE to grab a row and its id. Then, later, do an autocommitted DELETE with very little impact.
I do not see a good index for grabbing a pending item. Again, let's see the code.
150 seems like a lot -- have you experimented with fewer? They may be stumbling over each other.
Is the Slowlog turned on (with a low value for long_query_time)? If so, I wonder what is the 'worst' query. In situations like this, the answer may be surprising.
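The grab-with-a-short-UPDATE pattern suggested above can be sketched like this (sqlite3 stands in for MySQL, and the `worker` column, `grab_one`, and `finish` names are invented for the illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    userid TEXT, action TEXT,
    state INTEGER DEFAULT 0,   -- 0 = pending, 1 = claimed
    worker TEXT)""")
conn.executemany("INSERT INTO queue (userid, action) VALUES (?,?)",
                 [("u%d" % i, "sync") for i in range(3)])
conn.commit()

def grab_one(conn, worker):
    # A short autocommitted UPDATE claims exactly one pending row for this
    # worker; no transaction is held open across the network request.
    with conn:
        conn.execute("""UPDATE queue SET state = 1, worker = ?
                        WHERE id = (SELECT id FROM queue WHERE state = 0 LIMIT 1)
                          AND state = 0""", (worker,))
    return conn.execute(
        "SELECT id, userid FROM queue WHERE worker = ? AND state = 1",
        (worker,)).fetchone()

def finish(conn, task_id):
    # A second, separate short transaction deletes the completed row.
    with conn:
        conn.execute("DELETE FROM queue WHERE id = ?", (task_id,))

task = grab_one(conn, "worker-1")
# ... perform the network request here ...
finish(conn, task[0])
print(conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0])  # → 2
```

The point is that the claim and the delete are each a tiny, independently committed statement, so 150 workers contend only for an instant rather than holding row locks for the duration of a network call.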

Innodb autoincrement per custom column, bad concurrency

We have this table on MySQL 5.5:
CREATE TABLE IF NOT EXISTS `invoices` (
`id` varchar(36) NOT NULL,
`client_id` smallint(4) NOT NULL,
`invoice_number` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `client_id_2` (`client_id`,`invoice_number`),
KEY `client_id` (`client_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
We insert data into that table like this:
INSERT INTO `invoices` ( `id` , `client_id` , `invoice_number` )
VALUES (
UUID(),
10 ,
( SELECT (MAX(`invoice_number`) +1) as next_invoice_number FROM `invoices` WHERE `client_id` = 10 )
);
"10" is client_id value.
It works, but it has bad concurrency. How can I get a working solution with good concurrency?
A composite-primary-key auto-increment is not a solution: it increments across the whole table, but we need an auto-increment per client_id value.
Not sure what you mean by bad concurrency here. Although every DML operation runs in an implicit transaction, you can also wrap it in an explicit transaction block using the begin transaction ... end construct.
It seems that MySQL really does lock the whole table on INSERT INTO ... SELECT.
Below a strategy which is working for us (for a similar problem) in pseudocode
function insert_invoice(){
    begin_transaction
    next_invoice_number = select_max_invoice_number + 1
    insert_row(next_invoice_number)
    end_transaction
}
function perform_insert(){
    try
        insert_invoice()
    catch RecordNotUniqueError
        perform_insert()  // retry
    end
}
This requires performing the queries from a higher-level programming language.
You basically start a transaction in which you first read the next invoice number for the client, then perform the insert with next_invoice_number and hope for the best. If a concurrent process tries to insert the same invoice number for the same client, the transaction will fail for one of the processes; it can then simply retry. In the end there is no concurrent operation for the same invoice number and every transaction succeeds.
I see a number of issues here.
First, for each invoice registration you are scanning the same table to find the next invoice number to use for this particular customer.
A far faster solution is to have a table with two columns: Customer_ID (key) and Last invoice ID.
Whenever you need to register a new invoice, you simply get-and-update the new invoice number from this new table and use it in the insert.
Second, what makes you think that the operation shown in your example should not lock the table?
Since collisions happen only sometimes, the best solution is to minimize the probability of a collision, and the approach presented here will certainly do that.
Reformulate the query this way. This is probably simpler and faster.
INSERT INTO `invoices` ( `id` , `client_id` , `invoice_number` )
SELECT UUID(),
10 ,
MAX(`invoice_number`) +1
FROM `invoices`
WHERE `client_id` = 10;
Is this in a transaction by itself? With autocommit=1?
Or is this a part of a much larger set of commands? And possibly they are part of what is leading to the error?
How will you subsequently get the UUID and/or invoice_number for the client? Doesn't the application need to display them and/or store them in some other table?

INSERT ... ON DUPLICATE KEY UPDATE - Lock wait timeout

I have been struggling with INSERT ... ON DUPLICATE KEY UPDATE for a while on a big InnoDB table.
My values table saves the details for each entity belonging to a client. An entity can have only one value for a particular key, so when a change happens we update the existing row. The table looks something like this:
CREATE TABLE `key_values` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`client_id` int(11) NOT NULL COMMENT 'customer/tenant id',
`key_id` int(11) NOT NULL COMMENT 'reference to the keys',
`entity_id` bigint(20) NOT NULL,
`value` text,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `client_id` (`client_id`,`entity_id`,`key_id`),
KEY `client_id_2` (`client_id`,`key_id`)
) ;
All writes queries are of the form:
INSERT INTO `key_values`
(client_id, key_id, entity_id,value)
values
(23, 47, 147, 'myValue'), (...), (...)...
ON DUPLICATE KEY UPDATE value = values(value);
The table is around 350M records by now and is growing pretty fast.
Writes to the table come from real-time integrations, which typically insert fewer than 10 rows, or as bulks of 25K rows from offline sources.
For a given client, only one bulk operation can run at a time; this is to reduce row-lock contention between inserts.
The lock wait timeout period is set at 50 seconds.
Currently, when the offline activities are running, we sometimes (not always) get a lock wait timeout. What changes could help avoid the timeouts?
A design change is not possible at the moment (sharding/partitioning/clustering).
REPLACE is another candidate, but I don't want to grant DELETE privileges in production to anything run from code.
INSERT IGNORE followed by UPDATE is a good candidate, but will it give much improvement?
What other options do I have?
Thanks in advance for all suggestion and answers.
Regarding the lock wait timeout, this can be changed via the MySQL configuration setting innodb_lock_wait_timeout, which can be modified dynamically (without restarting MySQL), in addition to changing it in your my.cnf.
Regarding the lock waits, one thing to consider with MySQL is the default transaction isolation level, which is REPEATABLE READ. The side effect of this setting is that much more locking occurs for reads than you might expect (especially if you have a SQL Server background, where the default isolation level is READ COMMITTED). Now, if you don't need REPEATABLE READ, you can change your isolation level, either per query, using the SET TRANSACTION ISOLATION LEVEL syntax, or for the whole server, using the config setting transaction-isolation. I recommend using READ COMMITTED, and consider whether there are other places in your application where even 'dirtier' reads are acceptable (in which case you can use READ UNCOMMITTED).
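Beyond the timeout and isolation settings, another lever is splitting the 25K-row bulk into smaller chunks so each statement holds its row locks for less time. A sketch of that chunking (sqlite3 stands in for MySQL, with ON CONFLICT ... DO UPDATE playing the role of ON DUPLICATE KEY UPDATE; the chunk size and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE key_values (
    client_id INTEGER, key_id INTEGER, entity_id INTEGER, value TEXT,
    PRIMARY KEY (client_id, entity_id, key_id))""")

# A 1000-row "bulk" standing in for the 25K offline batch.
rows = [(23, k, e, "v") for k in range(10) for e in range(100)]

CHUNK = 100  # smaller batches hold locks far more briefly than one giant statement
for i in range(0, len(rows), CHUNK):
    with conn:  # each chunk commits independently, releasing its row locks
        conn.executemany(
            """INSERT INTO key_values (client_id, key_id, entity_id, value)
               VALUES (?,?,?,?)
               ON CONFLICT(client_id, entity_id, key_id)
               DO UPDATE SET value = excluded.value""",
            rows[i:i + CHUNK])

print(conn.execute("SELECT COUNT(*) FROM key_values").fetchone()[0])  # → 1000
```

Sorting each bulk by the unique key before inserting also helps in InnoDB, since concurrent batches then acquire row locks in a consistent order, which reduces deadlocks as well as lock waits.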

How to improve INSERT performance on a very large MySQL table

I am working on a large MySQL database and I need to improve INSERT performance on a specific table. It contains about 200 million rows and its structure is as follows:
(A little premise: I am not a database expert, so the code I've written could be based on wrong foundations. Please help me understand my mistakes :) )
CREATE TABLE IF NOT EXISTS items (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(200) NOT NULL,
`key` VARCHAR(10) NOT NULL,
busy TINYINT(1) NOT NULL DEFAULT 1,
created_at DATETIME NOT NULL,
updated_at DATETIME NOT NULL,
PRIMARY KEY (id, name),
UNIQUE KEY name_key_unique_key (name, `key`),
INDEX name_index (name)
) ENGINE=MyISAM
PARTITION BY LINEAR KEY(name)
PARTITIONS 25;
Every day I receive many CSV files in which each line is composed of the pair "name;key", so I have to parse these files (adding created_at and updated_at values for each row) and insert the values into my table. The combination of "name" and "key" MUST be UNIQUE, so I implemented the insert procedure as follows:
CREATE TEMPORARY TABLE temp_items (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(200) NOT NULL,
`key` VARCHAR(10) NOT NULL,
busy TINYINT(1) NOT NULL DEFAULT 1,
created_at DATETIME NOT NULL,
updated_at DATETIME NOT NULL,
PRIMARY KEY (id)
)
ENGINE=MyISAM;
LOAD DATA LOCAL INFILE 'file_to_process.csv'
INTO TABLE temp_items
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '\"'
(name, `key`, created_at, updated_at);
INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
FROM temp_items
)
ON DUPLICATE KEY UPDATE busy=1, updated_at=NOW();
DROP TEMPORARY TABLE temp_items;
The code just shown achieves my goal, but it takes about 48 hours to complete, and that is a problem.
I think this poor performance is caused by the fact that, for each insertion, the script must check against a very large table (200 million rows) that the pair "name;key" is unique.
How can I improve the performance of my script?
Thanks to all in advance.
You can use the following methods to speed up inserts:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster.
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.
Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL must do and improves the insert speed.
Reference: MySQL.com: 8.2.4.1 Optimizing INSERT Statements
Your linear key on name and the large indexes slow things down.
The LINEAR KEY needs to be calculated on every insert.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-linear-hash.html
Can you show us some example data from file_to_process.csv? Maybe a better schema could be built.
Edit: I looked more closely.
INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
FROM temp_items
)
This will probably create an on-disk temporary table, which is very slow, so you should not use it if you want more performance. Alternatively, check some MySQL config settings like tmp-table-size and max-heap-table-size; maybe these are misconfigured.
There is a piece of documentation I would like to point out, Speed of INSERT Statements.
Thinking in Java:
Divide the object list into partitions and generate a batch insert statement for each partition.
Utilize CPU cores and available DB connections efficiently; newer Java features can help to achieve parallelism easily (e.g. parallel streams, ForkJoin), or you can create your own thread pool sized to the number of CPU cores you have and feed your threads from a centralized blocking queue in order to invoke batch-insert prepared statements.
Decrease the number of indexes on the target table if possible. If a foreign key is not really needed, just drop it. Fewer indexes means faster inserts.
Avoid using Hibernate except for CRUD operations; always write SQL for complex selects.
Decrease the number of joins in your query; instead of forcing the DB to do the work, use Java streams for filtering, aggregation and transformation.
If you don't have to, do not combine selects and inserts into one SQL statement.
Add rewriteBatchedStatements=true to your JDBC string; it will help to decrease TCP-level communication between the app and the DB.
Use @Transactional for the methods that carry out the insert batch, and write the rollback methods yourself.
You could use
load data local infile ''
REPLACE
into table
etc...
The REPLACE ensures that any duplicate row is overwritten with the new values.
Add a SET updated_at=now() at the end and you're done.
There is no need for the temporary table.