I am trying to run a multi-table update in MySQL (Amazon RDS) and it is extremely slow.
What am I trying to do?
Remove all duplicate rows based on a 1-hour time frame.
Below, I created a temp table to identify the duplicate rows. This query runs in 2 seconds.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;
CREATE TEMPORARY TABLE tmpIds (id int primary key);
INSERT into tmpIds
SELECT distinct
d.id
FROM api d INNER JOIN api orig
on d.domain_id = orig.domain_id and d.user_id = orig.user_id
WHERE
orig.created_at < d.created_at
AND d.created_at <= DATE_ADD(orig.created_at, Interval 1 hour)
AND d.type = 'api/check-end'
AND d.created_at >= '2016-08-01';
SET TRANSACTION ISOLATION LEVEL READ COMMITTED ;
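If the self-join itself ever becomes slow as the table grows, a composite index matching the join and filter columns should help; a hedged sketch (the index name is arbitrary, and you may already have an equivalent index):

```sql
-- Covers the join columns (domain_id, user_id) and the created_at range scan
ALTER TABLE api ADD INDEX ix_dedup (domain_id, user_id, created_at);
```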
The problem is the UPDATE query: it takes way too long to run on the production server. It also locks the api table.
SET @TRIGGER_DISABLED = 1;
UPDATE
api
SET
deleted_at = now()
WHERE type = 'api/check-end' AND created_at >= '2016-08-01'
AND id IN (SELECT id FROM tmpIds);
SET @TRIGGER_DISABLED = 0;
I also tried this version:
SET @TRIGGER_DISABLED = 1;
UPDATE
api a,
tmpIds ti
SET
a.deleted_at = now()
WHERE
a.type = 'api/check-end' AND a.created_at >= '2016-08-01' AND a.domain_id < 10 AND a.id = ti.id;
SET @TRIGGER_DISABLED = 0;
STATS
Temp table: 32,000 rows
api table: 250,000 rows total; 200,000 rows after the WHERE clause (type, created_at)
The api table has costly triggers, which is why I turned them off.
A sample run of 1,000 updates took 6 minutes.
There is an index on the api table's primary key.
The problem was that the following statement was not disabling the triggers:
SET @TRIGGER_DISABLED = 1;
I had to delete the UPDATE trigger on the api table, after which the UPDATE ran in 1.3 seconds.
Any help on the best way to disable triggers while running a query?
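For the session-variable approach to work, the trigger body itself has to check the variable; SET @TRIGGER_DISABLED = 1 does nothing on its own. A sketch of the guard pattern (the trigger name and body are assumptions, not your actual trigger):

```sql
DELIMITER ;;
CREATE TRIGGER api_before_update
BEFORE UPDATE ON api
FOR EACH ROW
BEGIN
  -- Only run the costly logic when the session variable is unset or 0
  IF @TRIGGER_DISABLED IS NULL OR @TRIGGER_DISABLED = 0 THEN
    -- ... costly trigger logic goes here ...
    SET NEW.updated_at = NOW();  -- placeholder for the real work
  END IF;
END;;
DELIMITER ;
```

With this guard in place, SET @TRIGGER_DISABLED = 1 before the bulk UPDATE skips the trigger work for that session only, without having to drop and recreate the trigger.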
Related
I was doing the following operation on a table:
UPDATE tname
SET cname = cname + INTERVAL 8 HOUR;
In the table, the cname column holds timestamps and is the primary key. The operation adds 8 hours to every value in the cname column.
But the operation fails with a duplicate-key error. I don't know exactly how this could happen, but my guess is that the column already contains two values that are exactly 8 hours apart.
So when the query adds 8 hours to the earlier value and writes it, it gets the duplicate-key error.
I have two questions:
If the operation gets the error, is the table left inconsistent? I mean, do some rows get the 8 hours added while others do not?
How can I complete this operation without duplicate key error?
The update itself is creating the duplicate; depending on the order in which rows are processed, you might end up with a new value that conflicts with an existing one.
A common workaround is to use order by:
UPDATE tname
SET cname = cname + INTERVAL 8 HOUR
ORDER BY cname DESC;
The query starts by updating the greatest date, then processes the rows in descending order, which prevents conflicts from happening.
Yes, unless you execute the query in a transaction and issue ROLLBACK if there's an error. (With InnoDB, a single failed statement is rolled back automatically; with MyISAM the update can stop partway through.)
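Wrapping the statement in an explicit transaction makes the outcome all-or-nothing regardless of where the error occurs; a minimal sketch:

```sql
START TRANSACTION;

UPDATE tname
SET cname = cname + INTERVAL 8 HOUR
ORDER BY cname DESC;

-- If the UPDATE raised an error, undo everything instead:
-- ROLLBACK;
COMMIT;
```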
You can check first whether there are any duplicates:
UPDATE tname AS t
JOIN (
SELECT COUNT(*) AS count
FROM tname AS t1
JOIN tname AS t2 ON t1.cname = t2.cname + INTERVAL 8 HOUR) as dups
SET t.cname = t.cname + INTERVAL 8 HOUR
WHERE dups.count = 0;
I have gone through almost all of the existing questions that are similar to this one, but did not find an answer to my question. Sorry if I missed an already-posted question that answers this.
I have a MySQL table which I am using as a job queue. There are multiple workers that read jobs from this table.
The challenge is how to achieve this using MySQL queries on the table.
I have to select rows and simultaneously update the job status. This should be atomic, so that no worker gets a job that is already being processed.
I want to run the following automatically (pseudocode):
select name, job_type from jobs where job_status = "created" limit 10;
foreach row {
update jobs set job_status = "processing" where id = '$id';
}
Is this possible in MySQL using queries/stored procedures/cursor?
I believe that this is a solution for your case:
update jobs set job_status = "processing"
where id = '$id'
and job_status = "created"
These techniques rely on transactions, so be sure autocommit is off. You shouldn't be using autocommit anyway for performance and data integrity reasons.
This also relies on using the InnoDB storage engine. MyISAM does not support row-level locking.
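As a quick sanity check before adopting this pattern, you can verify the engine and turn autocommit off for the session (the table name is assumed from the question):

```sql
-- Confirm the table uses InnoDB (MyISAM lacks row-level locking)
SHOW TABLE STATUS LIKE 'jobs';

-- Disable autocommit for this session so row locks persist until COMMIT
SET autocommit = 0;
```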
Use SELECT ... FOR UPDATE to put an exclusive lock on the returned rows. The lock will remain until the transaction is committed or rolled back.
select id, name, job_type
from jobs
where job_status = "created"
limit 10
for update;
foreach row {
update jobs set job_status = "processing" where id = '$id';
...process...
delete from jobs where id = '$id';
}
commit
Unless you have a good reason to do so, you're better off just grabbing one row at a time. It's a simple, fast query and there's no reason to hold onto 10. In fact, if you hold onto 10 you can't commit after each successful job.
select id, name, job_type
from jobs
where job_status = "created"
limit 1
for update;
update jobs set job_status = "processing" where id = '$id';
...process the job...
delete from jobs where id = '$id';
commit
We can cut this down even further. There's no need to set the job to processing, other transactions won't see the change anyway. We can rely on the exclusive lock.
select id, name, job_type
from jobs
where job_status = "created"
limit 1
for update;
...process $id...
delete from jobs where id = '$id';
commit
This is robust: if your worker crashes, its lock will be released and another worker can try the job again.
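One caveat with plain FOR UPDATE is that a second worker blocks on the row the first worker has locked. On MySQL 8.0 and later, SKIP LOCKED sidesteps this: each worker claims a different unlocked row without waiting. A sketch under the same schema:

```sql
START TRANSACTION;

SELECT id, name, job_type
FROM jobs
WHERE job_status = "created"
LIMIT 1
FOR UPDATE SKIP LOCKED;  -- skip rows other workers currently hold locks on

-- ...process the job, then delete its row...
COMMIT;
```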
Alternatively, you can update one at a time. This requires getting the ID of the row you just updated.
SET @update_id := 0;
UPDATE jobs
SET job_status = "processing", id = (SELECT @update_id := id)
WHERE job_status = "created"
LIMIT 1;
SELECT @update_id;
...do work on @update_id...
DELETE FROM jobs WHERE id = @update_id;
COMMIT;
This relies on UPDATE setting an exclusive lock on each updated row, other transactions will not be able to update that row. This is why it's a good idea to work on one at a time.
Alternatively, add a "queued" job status and a column recording which process owns the job.
UPDATE jobs
SET job_status = "queued", job_owner = "$pid"
WHERE job_status = "created" limit 10;
COMMIT;
SELECT name, job_type
FROM jobs
WHERE job_status = "queued"
AND job_owner = "$pid"
foreach row {
UPDATE jobs SET job_status = "processing" where id = '$id';
... process ...
DELETE FROM jobs WHERE id = '$id';
COMMIT;
}
The downside of this approach is that if a worker dies, it will still own jobs. I would only recommend this approach if you have a long-lived master which controls the queue and assigns jobs to workers.
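One mitigation is a periodic sweep that returns stale jobs to the pool. This sketch assumes a hypothetical queued_at timestamp column recording when the job was claimed; the 10-minute timeout is arbitrary:

```sql
-- Reclaim jobs whose owner has apparently died
UPDATE jobs
SET job_status = "created", job_owner = NULL
WHERE job_status = "queued"
  AND queued_at < NOW() - INTERVAL 10 MINUTE;
```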
Finally, there are subtle problems with using MySQL as a queue. Consider getting a real queuing service.
Is there a way in MySQL to find the number of rows that get locked when a certain query runs? E.g., for the following query, what is the number of rows locked?
UPDATE xyz SET ARCHIVE = 1 , LAST_MODIFIED = CURRENT_TIMESTAMP WHERE ID = '123' AND ARCHIVE = 0;
Assume in this case that there is an index on ID and that ARCHIVE is part of the primary key.
BEGIN;
# lock
UPDATE xyz SET ARCHIVE = 1 , LAST_MODIFIED = CURRENT_TIMESTAMP WHERE ID = '123' AND ARCHIVE = 0;
# returns locked rows (X)
SELECT trx_rows_locked FROM information_schema.innodb_trx;
# release
COMMIT;
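On MySQL 8.0, per-lock detail (which table, which lock mode, which row values) is also exposed through performance_schema; a sketch, queried while the transaction is still open:

```sql
-- Inspect individual locks held by open transactions
SELECT object_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks;
```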
We have a system that has a database-based queue for processing items in threads instead of in real time. It's currently implemented in MyBatis, calling this stored procedure in MySQL:
DROP PROCEDURE IF EXISTS pop_invoice_queue;
DELIMITER ;;
CREATE PROCEDURE pop_invoice_queue(IN compId int(11), IN limitRet int(11)) BEGIN
SELECT LAST_INSERT_ID(id) as value, InvoiceQueue.* FROM InvoiceQueue
WHERE companyid = compId
AND (lastPopDate is null OR lastPopDate < DATE_SUB(NOW(), INTERVAL 3 MINUTE)) LIMIT limitRet FOR UPDATE;
UPDATE InvoiceQueue SET lastPopDate=NOW() WHERE id=LAST_INSERT_ID();
END;;
DELIMITER ;
The problem is that this pops N items from the queue but only updates the lastPopDate value for the last item popped. So if we call this stored procedure with limitRet = 5, it will pop five items off the queue and start working on them, but only the fifth item will have lastPopDate set; when the next thread comes and pops off the queue, it will get items 1-4 and item 6.
How can we get this to update all N records 'popped' off the database?
If you are willing to add a BIGINT field to the table via:
ALTER TABLE InvoiceQueue
ADD uuid BIGINT NULL DEFAULT NULL,
INDEX ix_uuid (uuid);
then you can do the update first, and select the records updated, via:
CREATE PROCEDURE pop_invoice_queue(IN compId int(11), IN limitRet int(11))
BEGIN
SET @uuid = UUID_SHORT();
UPDATE InvoiceQueue
SET uuid = @uuid,
lastPopDate = NOW()
WHERE companyid = compId
AND uuid IS NULL
AND (lastPopDate IS NULL OR lastPopDate < NOW() - INTERVAL 3 MINUTE)
ORDER BY
id
LIMIT limitRet;
SELECT *
FROM InvoiceQueue
WHERE uuid = @uuid
FOR UPDATE;
END;;
For the UUID_SHORT() function to return unique values, it should be called no more than 16 million times per second per machine; see the MySQL documentation on UUID_SHORT() for details.
For performance, you may want to alter the lastPopDate field to be NOT NULL, as the OR clause will prevent your query from using an index even if one is available:
ALTER TABLE InvoiceQueue
MODIFY lastPopDate DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00';
Then, if you do not already have one, you could add an index on the companyid/lastPopDate/uuid fields, as follows:
ALTER TABLE InvoiceQueue
ADD INDEX ix_company_lastpop (companyid, lastPopDate, uuid);
Then you can remove the OR clause from your UPDATE query:
UPDATE InvoiceQueue
SET uuid = @uuid,
lastPopDate = NOW()
WHERE companyid = compId
AND lastPopDate < NOW() - INTERVAL 3 MINUTE
ORDER BY
id
LIMIT limitRet;
which will use the index you just created.
Since MySQL has neither collections nor an OUTPUT/RETURNING clause, my suggestion is to use temporary tables. Something like:
CREATE TEMPORARY TABLE temp_data
SELECT LAST_INSERT_ID(id) as value, InvoiceQueue.* FROM InvoiceQueue
WHERE companyid = compId
AND (lastPopDate is null OR lastPopDate < DATE_SUB(NOW(), INTERVAL 3 MINUTE)) LIMIT limitRet FOR UPDATE;
UPDATE InvoiceQueue
INNER JOIN temp_data ON (InvoiceQueue.PKColumn = temp_data.PKColumn)
SET lastPopDate=NOW();
SELECT * FROM temp_data ;
DROP TEMPORARY TABLE temp_data;
Also, I suspect such SELECT ... FOR UPDATE statements can cause deadlocks if the procedure is called from different sessions: as far as I know, the order in which rows get locked is not guaranteed (even with ORDER BY, rows might be locked in a different order). I'd recommend double-checking the documentation.
I'm trying to come up with an elegant solution for a voting system like SO's. If there's a way to do it elegantly with triggers, I couldn't figure it out, so I'm trying stored procedures. This is what I've come up with. It's not pretty, so I'm asking for ideas. I'll probably end up with one query rather than a query plus a stored procedure, but I'd really like to know a clean way to update a user's points and insert/update votes. Points are in a separate table, to be updated by the procedure.
Upvote
INSERT INTO votes
ON DUPLICATE KEY
UPDATE votes
SET v.weight = v.weight + 1
WHERE v.weight = 0 OR v.weight = -1
AND v.userid = {$uid}
AND v.itemid = {$itemid}
//call procedure to +1 user points
Downvote
INSERT INTO votes
ON DUPLICATE KEY
UPDATE votes
SET v.weight = v.weight - 1
WHERE v.weight = 1 OR v.weight = 0
AND v.userid = {$uid}
AND v.itemid = {$itemid}
//call procedure to -1
Flipdown (when user changes vote from up to down)
INSERT INTO votes
ON DUPLICATE KEY
UPDATE votes
SET v.weight = -1
WHERE v.weight = 1
AND v.userid = {$uid}
AND v.itemid = {$itemid}
//call procedure to -2
Flipup
INSERT INTO votes
ON DUPLICATE KEY
UPDATE votes
SET v.weight = 0
WHERE v.weight = -1
AND v.userid = {$uid}
AND v.itemid = {$itemid}
//call procedure to +2
I assume that the votes table has 3 columns (post_id, user_id, weight). You can use the following query:
insert into votes(post_id, user_id, weight)
values(post_id_in, user_id_in, weight_in)
on duplicate key update
weight = weight_in;
You should have a unique index on (post_id, user_id).
If you denormalize the data and the posts table has a column for total votes, you will need to recalculate it.
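If you do keep such a denormalized total, the recalculation might look like this sketch (posts.total_votes is an assumed column name):

```sql
-- Recompute the cached total for one post from the votes table
UPDATE posts p
SET p.total_votes = (
  SELECT COALESCE(SUM(v.weight), 0)
  FROM votes v
  WHERE v.post_id = p.id
)
WHERE p.id = post_id_in;
```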
I personally do not see a need for a stored procedure in your case. If you are tracking user votes, the user id is available to you. I would suggest opening a MySQL transaction, performing your insert into the votes table, and then performing an update to keep track of the user's score. If both calls succeed, commit the transaction; this will ensure data integrity.
Maybe you could share the specific reasoning why you want to use procedures?
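The transaction-based approach described above might look like this sketch (user_points is an assumed table name, and the literal ids are placeholders for application values):

```sql
START TRANSACTION;

-- Record or change the vote; assumes a unique key on (post_id, user_id)
INSERT INTO votes (post_id, user_id, weight)
VALUES (1, 42, 1)
ON DUPLICATE KEY UPDATE weight = 1;

-- Keep the user's score in step, inside the same transaction
UPDATE user_points SET points = points + 1 WHERE user_id = 42;

COMMIT;
```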