I am struggling with INSERT .. ON DUPLICATE KEY UPDATE for a file on a big InnoDB table.
My values table saves the details for each entity belonging to an client. An entity can have only one value for a particular key. So when a change is happening we are updating the same. The table looks something like below:
CREATE TABLE `key_values` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`client_id` int(11) NOT NULL COMMENT 'customer/tenant id',
`key_id` int(11) NOT NULL COMMENT 'reference to the keys',
`entity_id` bigint(20) NOT NULL,
`value` text,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
UNIQUE KEY `client_id` (`client_id`,`entity_id`,`key_id`),
KEY `client_id_2` (`client_id`,`key_id`)
) ;
All writes queries are of the form:
INSERT INTO `key_values`
(client_id, key_id, entity_id,value)
values
(23, 47, 147, 'myValue'), (...), (...)...
ON DUPLICATE KEY UPDATE value = values(value);
The table is around 350M records by now and is growing pretty fast.
The writes to table can happen from real time integration often
inserting less than 10 rows or as a bulk of 25K from offline sources.
For a given client, only one bulk operation can run at a time. This is reduce the row locks between insert
Lock wait time out period is set at 50 seconds
Currently, when the offline activities are happening sometimes(not always) we are getting an lock wait time-out. What could be possible changes without to avoid the time out ?
A design change at the moment is not possible ( sharding/partitioning/cluster).
REPLACE is another candidate, but I dont want to give delete privilege in production to anything from code.
INSERT IGNORE and then UPDATE is a good candidate, but will it give much improvement?
What other options do I have?
Thanks in advance for all suggestion and answers.
Regarding the lock wait timeout, this can be changed via the mysql configuration setting innodb_lock_wait_timeout which can be modified dynamically (without restarting mysql), in addition to changing it in your my.cnf.
Regarding the lock waits, one thing to consider with mysql is the default transaction isolation level, which is REPEATABLE READ. The side effect of this setting is that much more locking occurs for reads that you might expect (especially if you had a SQL Server background, which has a default tran iso level of READ COMMITTED). Now, if you don't need REPEATABLE READ, you can change your tran iso level, either in a query, using the SET TRANSACTION ISOLATION LEVEL syntax, or for the whole server, using the config setting transaction-isolation. I recommend using READ COMMITTED, and consider if there are other places in your application where even 'dirtier' reads are acceptable (in which case you can use READ UNCOMMITTED.
Related
I have a large table called "queue". It has 12 million records right now.
CREATE TABLE `queue` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`userid` varchar(64) DEFAULT NULL,
`action` varchar(32) DEFAULT NULL,
`target` varchar(64) DEFAULT NULL,
`name` varchar(64) DEFAULT NULL,
`state` int(11) DEFAULT '0',
`timestamp` int(11) DEFAULT '0',
`errors` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_unique` (`userid`,`action`,`target`),
KEY `idx_userid` (`userid`),
KEY `idx_state` (`state`)
) ENGINE=InnoDB;
Multiple PHP workers (150) use this table simultaneously.
They select a record, perform a network request using the selected data and then delete the record.
I get mixed execution times from the select and delete queries. Is the delete command locking the table?
What would be the best approach for this scenario?
SELECT record + NETWORK request + DELETE the record
SELECT record + NETWORK request + MARK record as completed + DELETE completed records using a cron from time to time (I don't want an even bigger table).
Note: The queue gets new records every minute but the INSERT query is not the issue here.
Any help is appreciated.
"Don't queue it, just do it". That is, if the tasks are rather fast, it is better to simply perform the action and not queue it. Databases don't make good queuing mechanisms.
DELETE does not lock an InnoDB table. However, you can write a DELETE that seems that naughty. Let's see your actual SQL so we can work in improving it.
12M records? That's a huge backlog; what's up?
Shrink the datatypes so that the table is not gigabytes:
action is only a small set of possible values? Normalize it down to a 1-byte ENUM or TINYINT UNSIGNED.
Ditto for state -- surely it does not need a 4-byte code?
There is no need for INDEX(userid) since there is already an index (UNIQUE) starting with userid.
If state has only a few value, the index won't be used. Let's see your enqueue and dequeue queries so we can discuss how to either get rid of that index or make it 'composite' (and useful).
What's the current value of MAX(id)? Is it threatening to exceed your current limit of about 4 billion for INT UNSIGNED?
How does PHP use the queue? Does it hang onto an item via an InnoDB transaction? That defeats any parallelism! Or does it change state. Show us the code; perhaps the lock & unlock can be made less invasive. It should be possible to run a single autocommitted UPDATE to grab a row and its id. Then, later, do an autocommitted DELETE with very little impact.
I do not see a good index for grabbing a pending item. Again, let's see the code.
150 seems like a lot -- have you experimented with fewer? They may be stumbling over each other.
Is the Slowlog turned on (with a low value for long_query_time)? If so, I wonder what is the 'worst' query. In situations like this, the answer may be surprising.
i count page view statistics in Mysql and sometimes get deat lock.
How can resolve this problem? Maybe i need remove one of key?
But what will happen with reading performance? Or is it not affect?
Table:
CREATE TABLE `pt_stat` (
`stat_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`post_id` int(11) unsigned NOT NULL,
`stat_name` varchar(50) NOT NULL,
`stat_value` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`stat_id`),
KEY `post_id` (`post_id`),
KEY `stat_name` (`stat_name`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
Error: "Deadlock found when trying to get lock; try restarting transaction".
UPDATE pt_stat SET stat_value = stat_value + 1 WHERE post_id = "21500" AND stat_name = 'day_20170111';
When dealing with deadlocks, the first thing to do, always, is to see whether you have complex transactions deadlocking against eachother. This is the normal case. I assume based on your question that the update statement, however, is in its own transaction and therefore there are no complex interdependencies among writes from a logical database perspective.
Certain multi-threaded databases (including MySQL) can have single statements deadlock against themselves due to write dependencies within threads on the same query. MySQL is not alone here btw. MS SQL Server has been known to have similar problems in some cases and workloads. The problem (as you seem to grasp) is that a thread updating an index can deadlock against another thread that updates an index (and remember, InnoDB tables are indexes with leaf-nodes containing the row data).
In these cases there are three things you can look at doing:
If the problem is not severe, then the best option is generally to retry the transaction in case of deadlock.
You could reduce the number of background threads but this will affect both read and write performance, or
You could try removing an index (key). However, keep in mind that unindexed scans on MySQL are slow.
I want to update the statistic count in mysql.
The SQL is as follow:
REPLACE INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?)
Schema :
CREATE TABLE `record_amount` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
`owner` varchar(50) NOT NULL ,
`source` varchar(50) NOT NULL ,
`day_time` varchar(10) NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `src_time` (`owner`,`source`,`day_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
However, it caused a DEADLOCK exception in multi-processes running (i.e. Map-Reduce).
I've read some materials online and confused about those locks. I know innodb uses row-level lock. I can just use the table-lock to solve the business problem but it is a little extreme. I found some possible solutions:
change REPLACE INTO to transaction with SELECT id FOR UPDATE and UPDATE
change REPLACE INTO to INSERT ... ON DUPLICATE KEY UPDATE
I have no idea that which is practical and better. Can someone explain it or offer some links for me to read and study? Thank you!
Are you building a summary table, one source row at a time? And effectively doing UPDATE ... count = count+1? Throw away the code and start over. MAP-REDUCE on that is like using a sledge hammer on a thumbtack.
INSERT INTO summary (source, owner, day_time, count)
SELECT source, owner, day_time, COUNT(*)
FROM raw
GROUP BY source, owner, day_time
ON DUPLICATE KEY UPDATE count = count + VALUES(count);
A single statement approximately like that will do all the work at virtually disk I/O speed. No SELECT ... FOR UPDATE. No deadlocks. No multiple threads. Etc.
Further improvements:
Get rid of the AUTO_INCREMENT; turn the UNIQUE into PRIMARY KEY.
day_time -- is that a DATETIME truncated to an hour? (Or something like that.) Use DATETIME, you will have much more flexibility in querying.
To discuss further, please elaborate on the source data (`CREATE TABLE, number of rows, frequency of processing, etc) and other details. If this is really a Data Warehouse application with a Summary table, I may have more suggestions.
If the data is coming from a file, do LOAD DATA to shovel it into a temp table raw so that the above INSERT..SELECT can work. If it is of manageable size, make raw Engine=MEMORY to avoid any I/O for it.
If you have multiple feeds, my high-speed-ingestion blog discusses how to have multiple threads without any deadlocks.
I have a database table that is giving me those headaches, errors when inserting lots of data. Let me break down what exactly happens and I'm hoping someone will have some insight into how I can get this figured out.
Basically I have a table that has 11+ million records in it and it's growing everyday. We track how times a user is viewing a video and their progress in that video. You can see below what the structure is like. Our setup is a master db with two slaves attached to it. Nightly we run a cron script to compile some statistical data out of this table and compile them into a couple other tables we use just for reporting. These cron scripts only do SELECT statements on the slave and will do the insert into our statistical tables on the master (so it'll propagate down). Like clockwork every time we run this script it will lock up our production table. I thought moving the SELECT to a slave would fix this issue and since we aren't even writing into the main table with the cron but rather other tables, I'm now perplexed what could possibly cause this locking up.
It's almost as if it seems that every time a large read on the main table (master or slave) it locks up the master. As soon as the cron is complete, the table goes back to normal performance.
My question is several levels about INNODB. I've had thoughts that it might be indexing that would cause this issue but maybe it's other variables on INNODB settings that I'm not fully understanding. As you would imagine, I want to keep the master from getting this lockup. I don't really care if the slave is pegged out during this script run as long as it won't effect my master db. Is this something that can happen with Slave/Master relationships in MYSQL?
The tables that are getting the compiled information to are stats_daily, stats_grouped for reference.
The biggest issue I have here, to restate a little, is that I don't understand what can cause the locking like this. Taking the reads off the master and just doing inserts into another table doesn't seem like it should do anything on the master original table. I can watch the errors start streaming in, however, 3 minutes after the script starts and it will end immediately when the script stops.
The table I'm working with is below.
CREATE TABLE IF NOT EXISTS `stats` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`VID` int(10) unsigned NOT NULL DEFAULT '0',
`UID` int(10) NOT NULL DEFAULT '0',
`Position` smallint(10) unsigned NOT NULL DEFAULT '0',
`Progress` decimal(3,2) NOT NULL DEFAULT '0.00',
`ViewCount` int(10) unsigned NOT NULL DEFAULT '0',
`DateFirstView` int(10) unsigned NOT NULL DEFAULT '0', // Use unixtimestamps
`DateLastView` int(10) unsigned NOT NULL DEFAULT '0', // Use unixtimestamps
PRIMARY KEY (`ID`),
KEY `VID` (`VID`,`UID`),
KEY `UID` (`UID`),
KEY `DateLastView` (`DateLastView`),
KEY `ViewCount` (`ViewCount`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=15004624 ;
Does anyone have any thoughts or ideas on this?
UPDATE:
The errors I get from the master DB
MysqlError: Lock wait timeout exceeded; try restarting transaction
Uncaught exception 'Exception' with message 'invalid query UPDATE stats SET VID = '13156', UID = '73859', Position = '0', Progress = '0.8', ViewCount = '1', DateFirstView = '1375789950', DateLastView = '1375790530' WHERE ID = 14752456
The update query fails because of the locking. The query is actually valid. I'll get 100s of these and afterwards I can randomly copy/paste these queries and they will work.
UPDATE 2
Queries and Explains from Cron Script
Query Ran on the Slave (leaving php variables in curly brackets for reference):
SELECT
VID,
COUNT(ID) as ViewCount,
DATE_FORMAT(FROM_UNIXTIME(DateLastView), '%Y-%m-%d') AS YearMonthDay,
{$today} as DateModified
FROM stats
WHERE DateLastView >= {$start_date} AND DateLastView <= {$end_date}
GROUP BY YearMonthDay, VID
EXPLAIN of the SELECT Stat
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE stats range DateLastView DateLastView 4 NULL 25242 Using where; Using temporary; Using filesort
That result set is looped and inserted into the compiled table. Unfortunately I don't have support for batched inserts with this (I tried) so I have to loop through these one at a time instead of sending a batch of 100 or 500 to the server at a time. This is inserted into the master DB.
foreach ($results as $result)
{
$query = "INSERT INTO stats_daily (VID, ViewCount, YearMonthDay, DateModified) VALUES ({$result->VID}, {$result->ViewCount}, '{$result->YearMonthDay}', {$today} );
DoQuery($query);
}
The GROUP BY is the culprit. Apparently MySQL decides to use a temporary table in this case (perhaps because the table has exceeded some limit) which is very inefficient.
I ran into similar problems, but no clear solution. You could consider splitting your stats table into two tables, a 'daily' and a 'history' table. Run your query on the 'daily' table which only contains entries from the latest 24 hours or whatever your interval is, then clean up the table.
To get the info into your permanent 'history' table, either write your stats into both tables from code, or copy them over from daily into history before cleanup.
See this previous question for some background. I'm trying to renumber a corrupted MPTT tree using SQL. The script is working fine logically, it is just much too slow.
I repeatedly need to execute these two queries:
UPDATE `tree`
SET `rght` = `rght` + 2
WHERE `rght` > currentLeft;
UPDATE `tree`
SET `lft` = `lft` + 2
WHERE `lft` > currentLeft;
The table is defined as such:
CREATE TABLE `tree` (
`id` char(36) NOT NULL DEFAULT '',
`parent_id` char(36) DEFAULT NULL,
`lft` int(11) unsigned DEFAULT NULL,
`rght` int(11) unsigned DEFAULT NULL,
... (a couple of more columns) ...,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`),
KEY `lft` (`lft`),
KEY `rght` (`rght`),
... (a few more indexes) ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The database is MySQL 5.1.37. There are currently ~120,000 records in the table. Each of the two UPDATE queries takes roughly 15 - 20 seconds to execute. The WHERE condition may apply to a majority of the records, so that almost all records need to be updated each time. In the worst case both queries are executed as many times as there are records in the database.
Is there a way to optimize this query by keeping the values in memory, delaying writing to disk, delaying index updates or something along these lines? The bottleneck seems to be hard disk throughput right now, as MySQL seems to be writing everything back to disk immediately.
Any suggestion appreciated.
I never used it, but if your have enough memory, try the memory table.
Create a table with the same structure as tree, insert into .. select from .., run your scripts against the memory table, and write it back.
Expanding on some ideas from comment as requested:
The default is to flush to disk after every commit. You can wrap multiple updates in a commit or change this parameter:
http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
The isolation level is simple to change. Just make sure the level fits your design. This probably won't help because a range update is being used. It's nice to know though when looking for some more concurrency:
http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html
Ultimately, after noticing the range update in the query, your best bet is the MEMORY table that andrem pointed out. Also, you'll probably be able to find some performance by using a btree indexes instead of the default of hash:
http://www.mysqlperformanceblog.com/2008/02/01/performance-gotcha-of-mysql-memory-tables/
You're updating indexed columns - indexes negatively impact (read: slow down) INSERT/UPDATEs.
If this is a one time need to get things correct:
Drop/delete the indexes on the columns being updated (lft, rght)
Run the update statements
Re-create the indexes (this can take time, possibly equivalent to what you already experience in total)