Simulating the execution of a stored procedure by multiple users - sql-server-2008

I have this trigger in SQL Server
ALTER TRIGGER [dbo].[myTrigger]
ON [dbo].[Data]
AFTER INSERT
AS
BEGIN
    declare @number int
    begin transaction
    select top 1 @number = NextNumber FROM settings
    Update Settings
    set NextNumber = NextNumber + 1
    UPDATE Data
    set number = @number, currentDate = GetDate(), IdUser = user_id(current_user)
    FROM Data
    INNER JOIN inserted on inserted.IdData = Data.IdData
    commit transaction
END
It works as expected, but will it still work correctly when multiple users insert rows into the Data table at the same time?

Let's analyze this code for a minute:
begin transaction
You begin a transaction using the default READ COMMITTED isolation level.
select top 1 @number = NextNumber FROM settings
You're selecting the highest number from the Settings table (btw: you should by all means add an ORDER BY clause - otherwise, no ordering is guaranteed! You might get unexpected results here).
This operation, however, isn't blocking: two or more threads can read the same value of e.g. 100 at the same time. The SELECT only takes a shared lock for a very brief period of time, and shared locks are compatible - multiple readers can read the value simultaneously.
Update Settings
set NextNumber = NextNumber + 1
Now here, one thread gets the green light and writes back the new value - 101 in our example - to the table. The updated row holds an update (U) lock, converted to an exclusive (X) lock for the actual write - only one thread can write at a time.
UPDATE Data
set number = @number, currentDate = GetDate(), IdUser = user_id(current_user)
FROM Data
INNER JOIN inserted on inserted.IdData = Data.IdData
Same thing - that one lucky thread gets to update the Data table, sets number to 100, and the row(s) it's updating stay locked until the end of the transaction.
commit transaction
Now that lucky thread commits its transaction and is done.
HOWEVER: that second (and possibly third, fourth, fifth .....) thread that read the same original value of 100 is still in flight. Now that thread #1 has completed, one of those threads gets to do its thing. It updates the Settings table correctly, to a new value of 102, and goes on to do its second update to the Data table - still using the stale value of 100 that it had read into its @number variable.
In the end, you might have multiple threads that all read the same original value (100) from the Settings table, and each of them will stamp that same number (100) onto the Data rows it handles - you end up with duplicate numbers.
This method you're using here is not safe under load.
Possible solutions:
first and foremost - the recommended way to do this: let the database handle it itself, by using an INT IDENTITY column in your table (or, if you're already on SQL Server 2012, use a SEQUENCE object to handle all the synchronization)
if you cannot do this - for whatever reason - then at least make sure your code works even on a busy system! You need to e.g. use SELECT .... WITH (UPDLOCK) to put an update (U) lock on the Settings row when the first thread reads the current value - that'll block all other threads from even reading the "current" value until the first thread has completed. Or you can update and assign the old value in a single UPDATE operation - see the sketch below.
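A rough sketch of those last two approaches, reusing the Settings table from the question (the variable names are mine):
-- Option 1: take an update lock while reading, so concurrent readers queue up
declare @number int
begin transaction
    select top 1 @number = NextNumber from Settings with (UPDLOCK)
    update Settings set NextNumber = NextNumber + 1
commit transaction

-- Option 2: read the old value and increment it in one statement;
-- @next receives the pre-update value of NextNumber
declare @next int
update Settings set @next = NextNumber, NextNumber = NextNumber + 1
Option 2 needs no explicit transaction, because a single UPDATE statement is already atomic.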

Simulating the execution of a stored procedure by multiple users
You can use two (or more) query windows in SQL Server Management Studio and execute something like this simultaneously in each window.
insert into Data(ColName) values ('Value')
go 10000
go 10000 will execute the batch 10000 times. Adjust that to whatever value you think is appropriate.

Related

How do MySQL Transactions work under the hood? Where does MySQL store the temporary value of a field?

I understand that a MySQL transaction will let you perform multiple inserts/updates at once where either all or none of them will succeed and I understand how to use them.
What I am not sure about is how MySQL manages to hold all the data over long periods of time, and what effect this might have on performance.
Selecting before another transaction has committed
I have a table of 100 people named "John" and I update every name to "Jane" in a loop inside a transaction, each update takes 1 second meaning it takes 100 seconds to finish the transaction.
If I make a select from another process during that 100 seconds the result will be "John" rather than "Jane". If I make a select after the transaction is committed it will return "Jane".
This is all fine and I'm not really confused about how this works.
Selecting within a transaction
This is the more confusing bit.
I have a table of 100 people named "John" and I start a transaction in which I loop through and select each row one by one. Each select query takes 1 second so this takes 100 seconds.
After 50 seconds another process, not within a transaction, updates every row to "Jane".
In my first process, within the transaction, I will still receive "John" as a result even after the update to "Jane" has completed.
To be clear the timing would be like so:
12:00:00 - All rows say John and a select begins in a transaction that takes 1 second per row
12:00:30 - All rows are updated to Jane
12:00:31 - Row 31 is selected from the first transaction and still returns "John" rather than "Jane".
How does it work under the hood
So now I could execute SELECT name FROM names WHERE id = 31 at the exact same time and have one return "John" and one return "Jane" depending on whether I was in a transaction, or when the transaction started.
MySQL must then be storing the value of this field twice in some way.
Does it take a copy?
I don't think it takes a copy of the database or table, since when you begin a transaction it doesn't know what tables you're going to touch. You may not touch a table until 10 minutes into the transaction, and yet the data is as it was 10 minutes ago, no matter how many modifications other processes made in the meantime.
I've also experimented with databases and tables that are GBs in size and take minutes to dump; there's no way it's making entire copies.
Temporary hold somewhere?
Perhaps it temporarily holds the value of the field somewhere waiting for the transaction to finish?
It would then need to check if there's a pending value when performing a select.
Therefore doing SELECT name FROM names WHERE id = 31 would be the equivalent of something like:
// John
if (pending_value('names', 'name', 31)) {
    // Jane
    name = get_pending_value('names', 'name', 31);
} else {
    // John
    name = get_db_value('names', 'name', 31);
}
That is obviously very dumb pseudo code, but it's essentially saying "is there a pending update? If yes, use that instead"
This would presumably be held in memory somewhere? Or perhaps a file? Or one of the system databases?
How does it affect performance
If my names table had 1 billion rows and we performed the same queries then MySQL would simultaneously know that 1 billion rows held the value "John" and that 1 billion rows held the value "Jane". This must surely impact performance.
But is it the query within the transaction that is impacted or the query outside the transaction?
e.g.
1. Process 1 = Begin transaction
2. Process 2 = UPDATE names SET name = "Jane"
3. Process 1 = SELECT name FROM names WHERE id = 31 //John
4. Process 2 = SELECT name FROM names WHERE id = 31 //Jane
Does the query in step (3) or step (4) have a performance impact or both?
Some clues:
Read about "MVCC" -- MultiVersion Concurrency Control
A tentatively-changed row is kept until COMMIT or ROLLBACK. (See "history list" in the documentation.) This is row-by-row, not whole table or database. It will not "escalate the row locks to a table lock".
Each row of each table has a transaction_id. Each new transaction has a new, higher, id.
That xaction id, together with the "transaction isolation mode", determine which copy of each row your transaction can "see". So, yes, there can briefly be multiple "rows" WHERE id = 31.
Rows are locked, not tables. In some of your examples, transactions ran for a while, then stumbled over the 'same' row.
In some cases, the "gap" between rows is locked. (I did not notice that in your examples.)
Whole tables are locked only for DDL (Drop, Alter, etc), not DML (Select, Update, etc)
When a conflict occurs, a "deadlock" might occur. This is when each transaction is waiting for the other one to release a lock. One transaction is automatically rolled back.
When a conflict occurs, a "lock wait" might occur. This is when the transaction with a lock will eventually let go, letting the waiting transaction continue.
When a conflict occurs and "lock wait" occurs, innodb_lock_wait_timeout controls how long before giving up.
Every statement is inside a transaction. When autocommit=ON, each statement is a transaction unto itself. (Your last example is missing a BEGIN, in which case Process 2 has 2 separate transactions.)
In your first example, the isolation mode of read_uncommitted would let you see the other transaction's changes as they happened. That is a rarely used mode. The other modes won't let you see the changes until they are COMMITted, and it would never see the changes if it were ROLLBACK'd. (Yes, there was a copy of each changed row.)
repeatable_read mode (and others) effectively limit you to seeing only the rows with your transaction_id or older. Hence, even at 12:00:31, you still see "John".
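To make that repeatable_read point concrete, here is a minimal two-session sketch, assuming the names table from the question on InnoDB with the default REPEATABLE READ mode:
-- Session 1
START TRANSACTION;
SELECT name FROM names WHERE id = 31;  -- returns 'John'

-- Session 2 (autocommit on)
UPDATE names SET name = 'Jane' WHERE id = 31;

-- Session 1, still inside its transaction
SELECT name FROM names WHERE id = 31;  -- still 'John' (reads its own snapshot)
COMMIT;
SELECT name FROM names WHERE id = 31;  -- now 'Jane'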
General advice:
Don't write a transaction that runs longer than a few seconds
Remember to use SELECT ... FOR UPDATE where appropriate -- this adds a stronger lock on the rows in the SELECT just in case they will be updated or deleted in the transaction.
Where practical, it is better to have one INSERT adding 100 rows; that will be 10 times as fast as 100 single-row INSERTs. (Similarly for UPDATE and DELETE; see the sketch after this list.)
Use SHOW ENGINE INNODB STATUS; (I find it useful in dealing with deadlocks, but cryptic for other purposes.)
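As a concrete illustration of the batching advice, reusing the names table from the earlier examples:
-- One statement, one transaction, one trip through the engine:
INSERT INTO names (name) VALUES ('John'), ('Jane'), ('Jim');
-- ...instead of three separate single-row INSERTs.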

How does getting mysql's last insert ID work with transactions? + transaction questions

A two part question:
In my CodeIgniter script, I'm starting a transaction, then inserting a row, setting the insert_id() to a php variable, inserting more rows into another table using the new ID as a foreign key, and then I commit everything.
So my question is: if everything does not commit before ending the transaction, how is mysql able to return the last insert ID, if nothing was even inserted? My script works (almost) perfectly, with the new ID being used in subsequent queries.
(I say "almost" because, using the PDO mysql driver, sometimes the first insert that is supposed to return the insert_id() is duplicated - it gets inserted twice. Any idea why that would be? Is that related to getting the last ID? It never happens with the mysqli or mysql driver.)
I first wrote the script without transactions, so I have code that checks for mysql errors along the way, such as:
if ( ! $this->db->insert($table, $data)) {
    //log message here
}
How does this affect the mysql process once I wrapped all my mysql code in a transaction? It's not causing any visible errors (hopefully unrelated to the problem stated above), but should it be removed?
Thank you.
To answer your first question...
When using transactions, your queries are executed normally as far as your connection is concerned - the row really is inserted, and LAST_INSERT_ID() is tracked per connection, so it is available before the COMMIT. You can choose to commit, saving those changes, or roll back, reverting all of them. Consider the following pseudo-code:
insert into number (Random_number) values (rand());
select Random_number from number where Number_id = last_insert_id();

//php
if ($num < 1)
    $this->db->query('rollback;'); // This number is too depressing.
else
    $this->db->query('commit;'); // This number is just right.
The random number that was generated can be read prior to commit to ensure that it is suitable before saving it for everyone to see (e.g. commit and unlock the row).
If the PDO driver is not working, consider using the mysqli driver. If that is not an option, you can always use the query 'select last_insert_id() as id;' rather than the $this->db->insert_id() function.
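To make the first point concrete, here is a sketch of the question's insert-then-use-the-ID pattern; the orders and order_items tables are hypothetical:
START TRANSACTION;
INSERT INTO orders (total) VALUES (9.99);
SET @order_id = LAST_INSERT_ID();  -- per-connection value, available before COMMIT
INSERT INTO order_items (order_id, qty) VALUES (@order_id, 2);
INSERT INTO order_items (order_id, qty) VALUES (@order_id, 5);
COMMIT;  -- only now do other connections see any of these rows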
To answer your second question, if you are inserting or updating data that other models will be updating or reading, be sure to use transactions. For example, if a column 'Number_remaining' is set to 1 the following problem can occur.
Person A reads 1
Person B reads 1
Person A wins $1000!
Person A updates 1 to be 0
Person B wins $1000!
Person B updates 0 to be 0
Using transactions in the same situation would yield this result:
Person A starts transaction
Person A reads '1' from Number_remaining (the row is now locked if SELECT ... FOR UPDATE is used)
Person B attempts to read Number_remaining - forced to wait
Person A wins $1000
Person A updates 1 to be 0
Person A commits
Person B reads 0
Person B does not win $1000
Person B cries
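A sketch of that locking version in SQL; the prizes table and its key are hypothetical:
START TRANSACTION;
-- an identical SELECT ... FOR UPDATE from Person B now blocks until we commit
SELECT Number_remaining FROM prizes WHERE prize_id = 1 FOR UPDATE;
-- the application checks the value and awards the prize only if it is still 1
UPDATE prizes SET Number_remaining = Number_remaining - 1 WHERE prize_id = 1;
COMMIT;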
You may want to read up on transaction isolation levels as well.
Be careful of deadlock, which can occur in this case:
Person A reads row 1 (select ... for update)
Person B reads row 2 (select ... for update)
Person A attempts to read row 2, forced to wait
Person B attempts to read row 1, forced to wait
Person A reaches innodb_lock_wait_timeout (default 50 sec) and is disconnected
Person B reads row 1 and continues normally
At the end, since Person B has probably reached PHP's max_execution_time, the current query will finish executing independently of PHP, but no further queries will be received. If this was a transaction with autocommit=0, the transaction will automatically roll back when the connection to your PHP server is severed.

MySQL transactions vs locking

Quick question/clarification required here. I have a DB table that will quite possibly have simultaneous updates to a record. I am using Zend Framework for the application, and I have read about two ways to handle this. The first is table locking (LOCK TABLES test WRITE, or something like that - I will go back and re-read how to do it exactly if that is the best solution). The second is transactions: $db->beginTransaction(); ... $db->commit();
Now, assuming I am using a transactional storage engine such as InnoDB, transactions seem like the more common solution. However, does that avoid the following scenario?
User A is on a webpage -> submits data -> begin transaction -> read row -> calculate new value -> update row -> save -> commit
User B is on the same webpage and submits data at almost the same time, such that User B calls the update function at a point between begin transaction and commit of User A's transaction. User B relies on the committed data from User A's transaction to calculate the correct new value for the record.
IE:
Opening value in database row: 5
User A submits a value of 5. (begin transaction -> read value (5) -> add submitted value (5+5=10) -> write the updated value -> save -> commit)
User B submits the value of 7. I need to make sure that the value User B's transaction reads is 10, and not 5 (if the update isn't done before the read).
I know this is a long winded explanation, I apologize, I am not exactly sure of the correct terminology to simplify the question.
Thanks
Transactions don't ensure locking by themselves. The whole block in a transaction is treated as an atomic update to the DB (if anything fails in between, all previous changes of the block are rolled back). So two transactions running in parallel can still end up updating the same row.
You need to use both.
Transaction do
    row.lock
    update row
end
See if row-level locking can make it easier for you.
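In MySQL terms, a minimal sketch of using both, assuming an InnoDB table test with columns id and value:
START TRANSACTION;
SELECT value FROM test WHERE id = 1 FOR UPDATE;  -- User B blocks here until User A commits
-- the application computes the new value (e.g. 5 + 5 = 10)
UPDATE test SET value = 10 WHERE id = 1;
COMMIT;
For a pure addition, a single statement like UPDATE test SET value = value + 7 WHERE id = 1; is atomic on its own and avoids the read-then-write race entirely.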

SQL begin transaction with no commit

If I begin a transaction, but never get to call COMMIT. What happens to the data?
I have a smallish database (say a million or so SKUs). I am exporting it in small, evenly sized chunks of 1024 rows (third-party constraints limit my file size).
I have to flag the records that have been exported..
e.g. Update products set exported = 1 where sku = '1234';
Now every once and a while I have a problem that crashes the third party file writing tool.
But this happens before the file for the given record was created.
So I was thinking if I call begin transaction before I update the records, and commit only after I've confirmed the file was built.
This may result in a few begin transactions that don't have their twin.
So two questions.. Is there a better way? (apart from getting rid of the buggy third party)
Or what happens to records that were part of a transaction that was never committed?
Your transaction stays open, with its locks, until the connection is fully closed (not just returned to the connection pool). This is bad.
To do an UPDATE without an explicit transaction and manage 1024-row chunks, do something like this:
UPDATE TOP (1024) Products
SET    exported = 1
OUTPUT INSERTED.*
WHERE  exported = 0;
You can modify this to use a status column with values like 'Processing' and 'Exported', so you know when stuff was read but not yet exported.
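For instance, a sketch of that status-column variant, assuming a single exporter process and a new status column with illustrative values:
-- Claim the next chunk atomically and get the claimed rows back:
UPDATE TOP (1024) Products
SET    status = 'Processing'
OUTPUT INSERTED.sku
WHERE  status = 'Pending';

-- Only after the file is confirmed written:
UPDATE Products
SET    status = 'Exported'
WHERE  status = 'Processing';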

Row lock for update status

I have a table of "commands to do" with a status ('toprocess', 'processing', 'done')
I have several instances (amazon ec2) with a daemon asking for "commands to do".
The daemon asks for rows with status 'toprocess', then it processes, and at the end of each loop it changes the status to 'done'.
The thing is that, before starting that loop, I need to change all rows 'toprocess' to status 'processing', so other instances will not take the same rows, avoiding conflict.
I've read about innodb row locks, but I don't understand them very well ...
SELECT * from commands where status = 'toprocess'
then I need to take the IDs of these rows and update their status to 'processing', locking the rows until they are updated.
How can i do it ?
Thank you
You'd use a transaction and read the data with FOR UPDATE, which will block other SELECT ... FOR UPDATE statements on the rows that get selected:
start transaction;
select * from commands where status = 'toprocess' for update;
for each row in the result:
    add the data to an array/list for processing later
    update commands set status = 'processing' where id = row.id;
commit;
process all the data
Read a bit about FOR UPDATE and InnoDB isolation levels.
A possible (yet not very elegant) solution may be to first UPDATE the record, then read its data:
Each daemon will have a unique ID, and the table will get a new column named 'owner' for that ID.
The daemon will then run something like "UPDATE table SET status='processing', owner='theDaemonId' WHERE status='toprocess' ... LIMIT 1".
While the update runs, the row is locked, so no other daemon can grab it.
After the update, this row is owned by a specific daemon, which can then run a SELECT to fetch all necessary data from that row (WHERE status='processing' AND owner='theDaemonId').
Finally, the last UPDATE sets the row to 'done' and may (or may not) clear the owner field. Keeping it there will also enable some statistics about the daemons' work.
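A sketch of that claim-then-read pattern; the owner value is illustrative:
SET @me = 'daemon-42';

-- Claim one command; UPDATE ... LIMIT is valid in MySQL
UPDATE commands
SET    status = 'processing', owner = @me
WHERE  status = 'toprocess'
LIMIT  1;

-- Read back only what this daemon claimed
SELECT * FROM commands WHERE status = 'processing' AND owner = @me;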
As far as I know you can't use MySQL to lock a row (using a built-in method). You have two options though:
If your table should not be read by any other process until the locks are released then you can use table level locking as described here
You can implement your own basic row locking by updating a value in each row you're processing, and then having all your other daemons check whether this property is set (a BIT data type would suffice).
InnoDB locks at a row level for reading and updating anyway, but if you want to lock the rows for an arbitrary period then you may have to go with the second option.