Prevent race conditions across multiple rows - sql-server-2008

I have read a lot about preventing race conditions, but typically with one record in an upsert scenario. For example:
Atomic UPSERT in SQL Server 2005
I have a different requirement, and it is to prevent race conditions across multiple rows. For example, say I have the following table structure:
GiftCards:
GiftCardId int primary key not null,
OriginalAmount money not null
GiftCardTransactions:
TransactionId int primary key not null,
GiftCardId int (foreign key to GiftCards.GiftCardId),
Amount money not null
There could be multiple processes inserting into GiftCardTransactions and I need to prevent inserting if SUM(GiftCardTransactions.Amount) + insertingAmount would go over GiftCards.OriginalAmount.
I know I could use TABLOCKX on GiftCardTransactions, but obviously this would not be feasible for lots of transactions. Another way would be to add a GiftCards.RemainingAmount column and then I only need to lock one row (though with possibility of lock escalation), but unfortunately this isn't an option for me at this time (would this have been the best option?).
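For illustration, the RemainingAmount approach would boil down to a single-row check-and-reserve along these lines (purely hypothetical here, since my schema doesn't have that column):
-- Hypothetical sketch only: assumes a GiftCards.RemainingAmount column, which my schema doesn't have.
DECLARE @GiftCardId int = 1, @Amount money = 25.00;

BEGIN TRAN;

-- The single-row UPDATE both checks and reserves the funds atomically.
UPDATE dbo.GiftCards
SET RemainingAmount = RemainingAmount - @Amount
WHERE GiftCardId = @GiftCardId
  AND RemainingAmount >= @Amount;

IF @@ROWCOUNT = 1
BEGIN
    INSERT INTO dbo.GiftCardTransactions (GiftCardId, Amount)
    VALUES (@GiftCardId, @Amount);
    COMMIT TRAN;
END
ELSE
BEGIN
    ROLLBACK TRAN;  -- insufficient remaining balance; nothing was changed
END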
Instead of trying to prevent inserting in the first place, maybe the answer is to just insert, then select SUM(GiftCardTransactions.Amount), and rollback if necessary. This is an edge case, so I'm not worried about unnecessarily using up PK values, etc.
So the question is, without modifying the table structure and using any combination of transactions, isolation levels and hints, how can I achieve this with a minimal amount of locking?

I have run into this exact situation in the past and ended up using SP_GetAppLock to create a semaphore on a key to prevent a race condition. I wrote up an article several years ago discussing various methods. The article is here:
http://www.sqlservercentral.com/articles/Miscellaneous/2649/
The basic idea is that you acquire a lock on a constructed key that is separate from the table. In this way, you can be very precise and only block spids that would potentially create a race condition and not block other consumers of the table.
I've left the meat of the article below but I would apply this technique by acquiring a lock on a constructed key such as
@Key = 'GiftCardTransaction' + CAST(@GiftCardId AS nvarchar(20))
Acquiring a lock on this key (and ensuring you consistently apply this approach) would prevent any potential race condition, as the first caller to acquire the lock would do its work with all other requests waiting for the lock to be released (or timing out, depending on how you want your app to behave).
The meat of the article is here:
SP_getapplock is a wrapper for the extended procedure XP_USERLOCK. It allows you to use SQL Server's locking mechanism to manage concurrency outside the scope of tables and rows. It can be used to marshal proc calls in the same way as the solutions above, with some additional features.
First, sp_getapplock adds locks directly in server memory, which keeps your overhead low.
Second, you can specify a lock timeout without needing to change session settings. In cases where you only want one call for a particular key to run, a quick timeout would ensure the proc doesn't hold up execution of the application for very long.
Third, sp_getapplock returns a status which can be useful in determining if the code should run at all. Again, in cases where you only want one call for a particular key, a return code of 1 would tell you that the lock was granted successfully after waiting for other incompatible locks to be released, thus you can exit without running any more code (like an existence check, for example).
The syntax is as follows:
sp_getapplock [ @Resource = ] 'resource_name',
              [ @LockMode = ] 'lock_mode'
              [ , [ @LockOwner = ] 'lock_owner' ]
              [ , [ @LockTimeout = ] 'value' ]
An example using sp_getapplock
/************** Proc Code **************/
CREATE PROC dbo.GetAppLockTest
    @key nvarchar(255)   -- the constructed resource key, passed in by the caller
AS
BEGIN TRAN
    EXEC sp_getapplock @Resource = @key, @LockMode = 'Exclusive'
    /* Code goes here */
    EXEC sp_releaseapplock @Resource = @key
COMMIT
I know it goes without saying, but since the scope of sp_getapplock's locks is an explicit transaction, be sure to SET XACT_ABORT ON, or include checks in code to ensure a ROLLBACK happens where required.
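Applied to the gift card case in the question, the pattern might look roughly like this. It is only a sketch: the procedure name, the 5-second timeout, and the assumption that TransactionId is an identity column are mine, not the question's.
CREATE PROC dbo.AddGiftCardTransactionWithAppLock
    @GiftCardId int,
    @Amount money,
    @NewTransactionId int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    DECLARE @key nvarchar(255) = 'GiftCardTransaction' + CAST(@GiftCardId AS nvarchar(20));
    DECLARE @rc int;

    BEGIN TRAN;

    -- Serializes writers for this gift card only; other cards are not blocked.
    EXEC @rc = sp_getapplock @Resource = @key, @LockMode = 'Exclusive', @LockTimeout = 5000;
    IF @rc < 0
    BEGIN
        ROLLBACK TRAN;
        RAISERROR('Could not acquire the gift card lock.', 16, 1);
        RETURN;
    END

    IF (SELECT ISNULL(SUM(Amount), 0)
        FROM dbo.GiftCardTransactions
        WHERE GiftCardId = @GiftCardId) + @Amount
       <= (SELECT OriginalAmount FROM dbo.GiftCards WHERE GiftCardId = @GiftCardId)
    BEGIN
        INSERT INTO dbo.GiftCardTransactions (GiftCardId, Amount)
        VALUES (@GiftCardId, @Amount);
        SET @NewTransactionId = SCOPE_IDENTITY();
    END
    ELSE
    BEGIN
        SET @NewTransactionId = NULL;   -- would exceed the card's original amount
    END

    COMMIT;   -- the transaction-owned app lock is released here
END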

My T-SQL is a little rusty, but here is my shot at a solution. The trick is to take an update lock on all transactions for that gift card at the beginning of the transaction, so that as long as all procedures don't read uncommitted data (which is the default behavior), this will effectively lock only the transactions of the targeted gift card.
CREATE PROC dbo.AddGiftCardTransaction
    (@GiftCardID int,
     @TransactionAmount money,
     @id int OUT)
AS
BEGIN
    BEGIN TRAN;

    DECLARE @TotalPriorTransAmount money;
    SET @TotalPriorTransAmount = (SELECT ISNULL(SUM(Amount), 0)
                                  FROM dbo.GiftCardTransactions WITH (UPDLOCK)
                                  WHERE GiftCardId = @GiftCardID);

    IF @TotalPriorTransAmount + @TransactionAmount >
       (SELECT TOP 1 OriginalAmount FROM dbo.GiftCards WHERE GiftCardId = @GiftCardID)
    BEGIN
        PRINT 'Transaction would exceed GiftCard value';
        SET @id = NULL;
        ROLLBACK TRAN;
        RETURN;
    END
    ELSE
    BEGIN
        INSERT INTO dbo.GiftCardTransactions (GiftCardId, Amount)
        VALUES (@GiftCardID, @TransactionAmount);
        SET @id = SCOPE_IDENTITY();
        COMMIT TRAN;
        RETURN;
    END
END
While this is very explicit, I think it would be more efficient, and more T-SQL-friendly, to use a rollback statement like:
BEGIN
    BEGIN TRAN;

    INSERT INTO dbo.GiftCardTransactions (GiftCardId, Amount)
    VALUES (@GiftCardID, @TransactionAmount);

    IF (SELECT SUM(Amount)
        FROM dbo.GiftCardTransactions WITH (UPDLOCK)
        WHERE GiftCardId = @GiftCardID)
       >
       (SELECT TOP 1 OriginalAmount FROM dbo.GiftCards
        WHERE GiftCardId = @GiftCardID)
    BEGIN
        PRINT 'Transaction would exceed GiftCard value';
        SET @id = NULL;
        ROLLBACK TRAN;
    END
    ELSE
    BEGIN
        SET @id = SCOPE_IDENTITY();
        COMMIT TRAN;
    END
END
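Either way, calling the procedure would look something like this (the gift card id and amount here are made-up values):
DECLARE @newTransactionId int;

EXEC dbo.AddGiftCardTransaction
    @GiftCardID = 1,              -- hypothetical gift card
    @TransactionAmount = 25.00,
    @id = @newTransactionId OUTPUT;

SELECT @newTransactionId AS NewTransactionId;  -- NULL means the insert was rejected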

Related

SQLAlchemy bulk update strategies

I am currently writing a web app (Flask) using SQLAlchemy (on GAE, connecting to Google's cloud MySQL) and needing to do bulk updates of a table. In short, a number of calculations are done resulting in a single value needing to be updated on 1000's of objects. At the moment I'm doing it all in a transaction, but still at the end, the flush/commit is taking ages.
The table has an index on id and this is all carried out in a single transaction. So I believe I've avoided the usual mistakes, but it is still very slow.
INFO 2017-01-26 00:45:46,412 log.py:109] UPDATE wallet SET balance=%(balance)s WHERE wallet.id = %(wallet_id)s
2017-01-26 00:45:46,418 INFO sqlalchemy.engine.base.Engine ({'wallet_id': u'3c291a05-e2ed-11e6-9b55-19626d8c7624', 'balance': 1.8711760000000002}, {'wallet_id': u'3c352035-e2ed-11e6-a64c-19626d8c7624', 'balance': 1.5875759999999999}, {'wallet_id': u'3c52c047-e2ed-11e6-a903-19626d8c7624', 'balance': 1.441656}
From my understanding, there is no way to do a bulk update like this in plain SQL, and the statement above ends up as multiple UPDATE statements being sent to the server.
I've tried using Session.bulk_update_mappings(), but that doesn't seem to actually do anything :( Not sure why, but the updates never actually happen. I can't see any examples of this method actually being used (including in the performance suite), so I'm not sure whether it is intended to be used here.
One technique I've seen discussed is doing a bulk insert into another table and then doing an UPDATE JOIN. I've given it a test, like below, and it seems to be significantly faster.
wallets = db_session.query(Wallet).all()
ledgers = [ Ledger(id=w.id, amount=w._balance) for w in wallets ]
db_session.bulk_save_objects(ledgers)
db_session.execute('UPDATE wallet w JOIN ledger l on w.id = l.id SET w.balance = l.amount')
db_session.execute('TRUNCATE ledger')
But the problem now is how to structure my code. I'm using the ORM and I need to somehow not 'dirty' the original Wallet objects so that they don't get committed in the old way. I could just create these Ledger objects instead and keep a list of them about and then manually insert them at the end of my bulk operation. But that almost smells like I'm replicating some of the work of the ORM mechanism.
Is there a smarter way to do this? So far my brain is going down something like:
class Wallet(Base):
    ...
    _balance = Column(Float)
    ...

    @property
    def balance(self):
        # first check if we have a ledger of the same id
        # and return the amount in that, otherwise...
        return self._balance

    @balance.setter
    def balance(self, amount):
        l = Ledger(id=self.id, amount=amount)
        # add l to a list somewhere then process later
        # At the end of the transaction, do a bulk insert of Ledgers
        # and then do an UPDATE JOIN and TRUNCATE
As I said, this all seems to be fighting against the tools I (may) have. Is there a better way to be handling this? Can I tap into the ORM mechanism to be doing this? Or is there an even better way to do the bulk updates?
EDIT: Or is there maybe something clever with events and sessions? Maybe before_flush?
EDIT 2: So I have tried to tap into the event machinery and now have this:
@event.listens_for(SignallingSession, 'before_flush')
def before_flush(session, flush_context, instances):
    ledgers = []
    if session.dirty:
        for elem in session.dirty:
            if session.is_modified(elem, include_collections=False):
                if isinstance(elem, Wallet):
                    session.expunge(elem)
                    ledgers.append(Ledger(id=elem.id, amount=elem.balance))
    if ledgers:
        session.bulk_save_objects(ledgers)
        session.execute('UPDATE wallet w JOIN ledger l on w.id = l.id SET w.balance = l.amount')
        session.execute('TRUNCATE ledger')
Which seems pretty hacky and evil to me, but appears to work OK. Any pitfalls, or better approaches?
-Matt
What you're essentially doing is bypassing the ORM in order to optimize the performance. Therefore, don't be surprised that you're "replicating the work the ORM is doing" because that's exactly what you need to do.
Unless you have a lot of places where you need to do bulk updates like this, I would recommend against the magical event approach; simply writing the explicit queries is much more straightforward.
What I recommend doing is using SQLAlchemy Core instead of the ORM to do the update:
ledger = Table("ledger", db.metadata,
Column("wallet_id", Integer, primary_key=True),
Column("new_balance", Float),
prefixes=["TEMPORARY"],
)
wallets = db_session.query(Wallet).all()
# figure out new balances
balance_map = {}
for w in wallets:
balance_map[w.id] = calculate_new_balance(w)
# create temp table with balances we need to update
ledger.create(bind=db.session.get_bind())
# insert update data
db.session.execute(ledger.insert().values([{"wallet_id": k, "new_balance": v}
for k, v in balance_map.items()])
# perform update
db.session.execute(Wallet.__table__
.update()
.values(balance=ledger.c.new_balance)
.where(Wallet.__table__.c.id == ledger.c.wallet_id))
# drop temp table
ledger.drop(bind=db.session.get_bind())
# commit changes
db.session.commit()
Generally it is poor schema design to need to update thousands of rows frequently. That aside...
Plan A: Write ORM code that generates
START TRANSACTION;
UPDATE wallet SET balance = ... WHERE id = ...;
UPDATE wallet SET balance = ... WHERE id = ...;
UPDATE wallet SET balance = ... WHERE id = ...;
...
COMMIT;
Plan B: Write ORM code that generates
CREATE TEMPORARY TABLE ToDo (
id ...,
new_balance ...
);
INSERT INTO ToDo -- either one row at a time, or a bulk insert
UPDATE wallet
JOIN ToDo USING(id)
SET wallet.balance = ToDo.new_balance; -- bulk update
(Check the syntax; test; etc.)
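A filled-in version of Plan B might look roughly like this; the column types are assumptions based on the wallet ids and balances visible in the question's log output.
CREATE TEMPORARY TABLE ToDo (
    id          CHAR(36) PRIMARY KEY,   -- wallet ids in the log look like UUIDs
    new_balance DOUBLE NOT NULL
);

INSERT INTO ToDo (id, new_balance) VALUES
    ('3c291a05-e2ed-11e6-9b55-19626d8c7624', 1.8711760000000002),
    ('3c352035-e2ed-11e6-a64c-19626d8c7624', 1.5875759999999999);
    -- ...one row per wallet, or a bulk insert

UPDATE wallet
JOIN ToDo USING(id)
SET wallet.balance = ToDo.new_balance;   -- bulk update

DROP TEMPORARY TABLE ToDo;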

Can I avoid two MySQL events running at the same time?

I have an event in MySQL that I want to run very frequently, at least every 30 seconds.
It is processing data from a queue table that contains recently updated records. Sometimes I receive large batches of updates. When this occurs, the event may take longer to run than the usual 2-3 seconds. If it is still running at the time of the next schedule, I want the next run to skip execution.
The best way I could think of doing this is to create a 'state' table in which I set a specific key to 1 when the process starts and set it back to 0 when it is complete.
I'd then alter the event to check the current status.
I'd prefer to do something nicer than that. Is there a feature I am missing completely?
I've looked into global variables but based on the documentation these only seem permissible for system variables.
Current Example Code
Here is the example code I'm currently testing.
acca_sync: BEGIN
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        GET DIAGNOSTICS CONDITION 1 @sqlstate = RETURNED_SQLSTATE,
            @errno = MYSQL_ERRNO, @text = MESSAGE_TEXT;
        SET @full_error = CONCAT("ERROR ", @errno, " (", @sqlstate, "): ", @text);
        CALL pa.log(CONCAT("acca-acc_sync", " - Error - ", IFNULL(@full_error, "no error message")));

        UPDATE `acca`.`processing_state`
        SET `value` = 0
        WHERE `key` = 'acca_sync';
    END;

    CALL pa.log(CONCAT("Started acca_sync @ ", NOW()));

    SELECT `value`
    INTO @is_locked
    FROM `acca`.`processing_state`
    WHERE `key` = 'acca_sync';

    IF @is_locked = 0 THEN
        UPDATE `acca`.`processing_state`
        SET `value` = 1
        WHERE `key` = 'acca_sync';
    ELSE
        CALL pa.log(CONCAT("acca_sync deferred due to active sync. @ ", NOW()));
        LEAVE acca_sync;
    END IF;

    CALL acca.event_sync();

    CALL pa.log(CONCAT("Completed acca_sync @ ", NOW()));

    UPDATE `acca`.`processing_state`
    SET `value` = 0
    WHERE `key` = 'acca_sync';
END
Table Locking
Based on a comment I want to explain why I am not using a table lock. My experience with table locks is limited so I hope the below is correct and makes sense.
Source Data Tables
I have triggers that notify updates in queue tables. These are my source data tables from which I read data to process it.
My understanding is that a READ lock on these tables would not lock any other events that just read.
If I were to use a WRITE lock, I would block updates to any of the rows that I am not currently accessing.
Target Data Tables
I have multiple data sources that process data in different events, and these amend rows in the target tables. There may be two different events running at the same time writing to the same table, so I don't want to artificially block the target tables.
Other Tables
I could create a fake table that exists only for the purpose of setting and then checking the existence of a lock. That seems absurd; I'd much rather create a single locks table with a lock_key and an is_locked column that I query each time.
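For what it's worth, that locks table and an atomic claim on it would look something like this (table and column names are illustrative, not an existing schema):
-- Illustrative only: a generic locks table and an atomic claim on it.
CREATE TABLE locks (
    lock_key  VARCHAR(64) PRIMARY KEY,
    is_locked TINYINT NOT NULL DEFAULT 0
);

INSERT INTO locks (lock_key, is_locked) VALUES ('acca_sync', 0);

-- Claim the lock atomically: the UPDATE only succeeds while it is free,
-- so ROW_COUNT() = 1 means this caller owns the lock and may proceed.
UPDATE locks SET is_locked = 1 WHERE lock_key = 'acca_sync' AND is_locked = 0;
SELECT ROW_COUNT() INTO @got_lock;

-- ...do the work only if @got_lock = 1, then release:
UPDATE locks SET is_locked = 0 WHERE lock_key = 'acca_sync';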
You can check whether any event is currently running:
SELECT *
FROM performance_schema.threads
WHERE NAME LIKE '%event_worker%'
AND TYPE='FOREGROUND'
Another interesting table to watch:
SELECT * FROM information_schema.processlist

Attempt to fetch logical page in database 2 failed. It belongs to allocation unit X not to Y

I started to get the following error when executing a certain SP. The code related to this error is pretty simple: it joins a #temp table to a real table.
Full text of error:
Msg 605, Level 21, State 3, Procedure spSSRSRPTIncorrectRevenue, Line 123
Attempt to fetch logical page (1:558552) in database 2 failed. It belongs to allocation unit 2089673263876079616 not to 4179358581172469760.
Here is what I found:
https://support.microsoft.com/en-us/kb/2015739
This suggests some kind of issue with the database. I ran DBCC CHECKDB on the user database and on tempdb - both pass.
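For reference, those checks were along these lines (the database name is a placeholder):
DBCC CHECKDB ('MyUserDatabase') WITH NO_INFOMSGS;
DBCC CHECKDB ('tempdb') WITH NO_INFOMSGS;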
The second thing I'm doing is trying to find which tables those allocation units belong to:
SELECT au.allocation_unit_id, OBJECT_NAME(p.object_id) AS table_name, fg.name AS filegroup_name,
au.type_desc AS allocation_type, au.data_pages, partition_number
FROM sys.allocation_units AS au
JOIN sys.partitions AS p ON au.container_id = p.partition_id
JOIN sys.filegroups AS fg ON fg.data_space_id = au.data_space_id
WHERE au.allocation_unit_id in(2089673263876079616, 4179358581172469760)
ORDER BY au.allocation_unit_id
This returns 2 objects in tempdb, not in the user db. So it makes me think it's some kind of data corruption in tempdb? I'm a developer, not a DBA. Any suggestions on what I should check next?
Also, when I run the query above, how can I tell the REAL object name, something I can recognize, like #myTempTable______... instead of #07C650CE?
I was able to resolve this by clearing the SQL caches:
DBCC FREEPROCCACHE
GO
DBCC DROPCLEANBUFFERS
GO
Apparently restarting the SQL Server service would have had the same effect.
(via Made By SQL, reproduced here to help others!)
I have run into errors like yours too.
First, back up the table or object so you don't have to panic later. I tried the steps below on my database.
Step 1:
Back up the table (move the data to another table manually, or however you prefer).
I used the code below to move my table's rows into another table:
SET NOCOUNT ON;
DECLARE @Counter INT = 1;
DECLARE @LastRecord INT = 10000000; -- your table's row count
WHILE @Counter < @LastRecord
BEGIN
    BEGIN TRY
        -- don't forget to create your_table_new first
        INSERT INTO your_table_new SELECT * FROM your_table WHERE your_column = @Counter;
    END TRY
    BEGIN CATCH
        -- don't forget to create the error_code table first
        INSERT INTO error_code SELECT @Counter, 'error_number';
    END CATCH
    SET @Counter += 1;
END;
Step 2:
DBCC CHECKTABLE(your_table, REPAIR_REBUILD)
GO
Check your table. If you still get an error, go to step 3.
Step 3:
Attention: you can lose some data in your table with this option, but don't worry, you backed up the table in step 1.
DBCC CHECKTABLE(your_table, REPAIR_ALLOW_DATA_LOSS)
GO
Good luck!
~~pektas
In my case, truncating and re-populating data in the concerned tables was the solution.
Most probably the data inside the tables was corrupted.
Database ID 2 means your tempdb is corrupted. Fixing tempdb is easy: restart the SQL Server service and you are good to go.
This could be an instance of a bug Microsoft fixed on SQL Server 2008 with queries on temporary tables that self reference (for example we have experienced it when loading data from a real table to a temporary table while filtering any rows we already have populated in the temp table in a previous step).
It seems to happen only on temporary tables with no identity/primary key, so a workaround is to add one, although if you patch to CU3 or later you can also enable the hotfix by turning on a trace flag.
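A minimal illustration of that workaround, with made-up table and column names; the point is just the identity primary key on the temp table and the self-referencing load:
-- Hypothetical example: the identity primary key on the temp table avoids the bug.
CREATE TABLE #IncorrectRevenue
(
    RowId     int IDENTITY(1,1) PRIMARY KEY,
    AccountId int NOT NULL,
    Revenue   money NOT NULL
);

-- Self-referencing load of the kind described above: filter out rows already staged.
INSERT INTO #IncorrectRevenue (AccountId, Revenue)
SELECT r.AccountId, r.Revenue
FROM dbo.Revenue AS r
WHERE NOT EXISTS (SELECT 1 FROM #IncorrectRevenue AS s WHERE s.AccountId = r.AccountId);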
For more details on the bug/fixes: https://support.microsoft.com/en-us/help/960770/fix-you-receive-error-605-and-error-824-when-you-run-a-query-that-inse

Update multiple mysql rows with 1 query?

I am porting a client's DB to a new one with different post titles and row IDs, but he wants to keep the hits from the old website.
He has over 500 articles in the new DB, and updating one is not an issue with this query:
UPDATE blog_posts
SET hits=8523 WHERE title LIKE '%slim charger%' AND category = 2
but how would I go about doing this for all 500 articles with 1 query? I already have an export from the old DB with post titles and hits, so we can find the new ones more easily:
INSERT INTO `news_items` (`title`, `hits`) VALUES
('Slim charger- your new friend', 8523 )...
The only common reference between the two tables is the product name within the title; everything else is different: id, full title, and so on.
Make a temporary table old_posts for the old data, then:
UPDATE new_posts LEFT JOIN old_posts ON new_posts.title = old_posts.title SET new_posts.hits = old_posts.hits;
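Fleshed out, that might look like the sketch below. The table names follow this answer (new_posts standing in for the new blog_posts table), and it assumes exact title matches between old and new posts; per the question, the join condition would in practice need to match on the product name instead.
-- Hypothetical sketch: stage the exported (title, hits) pairs, then update in one statement.
CREATE TEMPORARY TABLE old_posts (
    title VARCHAR(255) NOT NULL,
    hits  INT NOT NULL
);

INSERT INTO old_posts (title, hits) VALUES
    ('Slim charger- your new friend', 8523);
    -- ...remaining exported rows

UPDATE new_posts
JOIN old_posts ON new_posts.title = old_posts.title   -- inner join, so unmatched posts keep their hits
SET new_posts.hits = old_posts.hits;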
Unfortunately that's not how it works, you will have to write a script/program that does a loop.
articles cursor;
selection articlesTable%rowtype;
WHILE(FETCH(cursor into selection)%hasNext)
Insert into newTable selection;
END WHILE
How you bridge it is up to you, but that's the basic pseudo code/PLSQL.
The APIs for selecting from one DB and putting into another vary by DBMS, so you will need a common intermediate format. Basically, take the record from the first DB, stick it into a struct in the programming language of your choice, and perform an insert using those struct values with the APIs for the other DBMS.
I'm not 100% sure that you can update multiple records at once, but I think what you want to do is use a loop in combination with the update query.
However, if you have 2 tables with absolutely no relationship or common identifiers between them, you are kind of in a hard place. The hard place in this instance would mean you have to do them all manually :(
The last possible idea to save you is that the IDs might be different, but they might still be in the same order. If that is the case, you can still loop through the old table and update the new table as I described above.
You can build a procedure that'll do it for you:
CREATE PROCEDURE insert_news_items()
BEGIN
    DECLARE done INT DEFAULT FALSE;
    DECLARE v_title VARCHAR(255);
    DECLARE v_hits INT;
    DECLARE news_items_cur CURSOR FOR
        SELECT title, hits
        FROM blog_posts
        WHERE title LIKE '%slim charger%' AND category = 2;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    OPEN news_items_cur;

    read_loop: LOOP
        FETCH news_items_cur INTO v_title, v_hits;
        IF done THEN
            LEAVE read_loop;
        END IF;
        INSERT INTO `news_items` (`title`, `hits`) VALUES (v_title, v_hits);
    END LOOP;

    CLOSE news_items_cur;
END;

processing data with perl - selecting for update usage with mysql

I have a table that is storing data that needs to be processed. I have id, status, data in the table. I'm currently going through and selecting id, data where status = #. I'm then doing an update immediately after the select, changing the status # so that it won't be selected again.
My program is multithreaded, and sometimes two threads grab the same id because they query the table at nearly the same time. I looked into SELECT ... FOR UPDATE; however, I either wrote the query wrong or I'm not understanding what it is used for.
My goal is to find a way of grabbing the id and data that I need and setting the status so that no other thread tries to grab and process the same data. Here is the code I tried. (I've written it all together here for show purposes; in the real program my prepares are set up at the beginning so a prepare isn't done on every run, in case anyone was concerned.)
my $select = $db->prepare("SELECT id, data FROM `TestTable` WHERE _status=4 LIMIT ? FOR UPDATE") or die $DBI::errstr;
if ($select->execute($limit))
{
    while ($data = $select->fetchrow_hashref())
    {
        my $update_status = $db->prepare("UPDATE `TestTable` SET _status = ?, data = ? WHERE _id=?");
        $update_status->execute(10, "", $data->{_id});
        push(@array_hash, $data);
    }
}
When I run this with multiple threads, I get many duplicate inserts when I later try to insert the processed transaction data.
I'm not terribly familiar with MySQL, and in the research I've done I haven't found anything that really cleared this up for me.
Thanks.
As a sanity check, are you using InnoDB? MyISAM has zero transactional support, aside from faking it with full table locking.
I don't see where you're starting a transaction. MySQL's autocommit option is on by default, so starting a transaction and later committing would be necessary unless you turned off autocommit.
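To illustrate, the claim-and-mark pattern inside an explicit transaction might look like this in plain SQL. The table and column names and the status values 4 and 10 come from the question's code; the batch size of 5 and the ORDER BY are assumptions added for the sketch.
-- Requires InnoDB. With autocommit on, wrap the work in an explicit transaction,
-- otherwise the FOR UPDATE locks are released as soon as the SELECT finishes.
START TRANSACTION;

SELECT id, data
FROM TestTable
WHERE _status = 4
ORDER BY id
LIMIT 5
FOR UPDATE;              -- these rows stay locked until COMMIT

UPDATE TestTable
SET _status = 10
WHERE _status = 4
ORDER BY id
LIMIT 5;                 -- marks the same rows, since they are still locked by this transaction

COMMIT;                  -- another worker's identical SELECT ... FOR UPDATE was blocked until now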
It looks like you simply rely on the database locking mechanisms. I googled perl dbi locking and found this:
$dbh->do("LOCK TABLES foo WRITE, bar READ");
$sth->prepare("SELECT x,y,z FROM bar");
$sth2->prepare("INSERT INTO foo SET a = ?");
while (#ary = $sth->fetchrow_array()) {
$sth2->$execute($ary[0]);
}
$sth2->finish();
$sth->finish();
$dbh->do("UNLOCK TABLES");
Not really saying GIYF as I am also fairly novice at both MySQL and DBI, but perhaps you can find other answers that way.
Another option might be as follows, and it only works if you control all the code accessing the data. You can create a lock column in the table. When your code accesses the table, it does the following (pseudocode):
if row.lock != 1
    row.lock = 1
    read row
    update row
    row.lock = 0
    next
else
    sleep 1
    redo
Again though, this trusts that all users/scripts that access this data will agree to follow this policy. If you cannot ensure that, then this won't work.
Anyway, that's all the knowledge I have on the topic. Good luck!