SQL begin transaction with no commit - sql-server-2008

If I begin a transaction but never get around to calling COMMIT, what happens to the data?
I have a smallish database (say a million or so SKUs). I am exporting it in small, evenly sized chunks of 1024 records (third-party constraints limit my file size).
I have to flag the records that have been exported,
e.g. Update products set exported = 1 where sku = '1234';
Every once in a while I have a problem that crashes the third-party file-writing tool,
and it happens before the file for the given records was created.
So I was thinking I could call BEGIN TRANSACTION before I update the records, and COMMIT only after I've confirmed the file was built.
This may result in a few BEGIN TRANSACTIONs that never get their twin.
So, two questions: is there a better way (apart from getting rid of the buggy third party)?
And what happens to records that were part of a transaction that was never committed?

Your transactions stay open, holding their locks, until the connection is fully closed (not just returned to the connection pool). This is bad.
To do the UPDATE without an explicit transaction and manage 1024-row chunks, do something like this:
UPDATE TOP (1024)
Products
SET
exported = 1
OUTPUT
INSERTED.*
WHERE
exported = 0;
You can modify this to use a status column with values like "Processing" and "Exported", so you know which rows have been read but not yet exported.
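For example, here is a minimal sketch of that two-step idea. It assumes a hypothetical ExportStatus column with 'Pending'/'Processing'/'Exported' values and a varchar sku; those names and types are illustrative, not part of the original schema.
-- Step 1: claim a batch of 1024 rows and remember which SKUs were claimed.
DECLARE @Batch TABLE (sku varchar(50) PRIMARY KEY);

UPDATE TOP (1024) Products
SET    ExportStatus = 'Processing'
OUTPUT INSERTED.sku INTO @Batch
WHERE  ExportStatus = 'Pending';

-- ... build the export file from the rows listed in @Batch ...

-- Step 2: only after the file has been confirmed, mark the batch as done.
UPDATE p
SET    p.ExportStatus = 'Exported'
FROM   Products AS p
JOIN   @Batch   AS b ON b.sku = p.sku;

-- If the third-party tool crashes before the file exists, there is no open
-- transaction to worry about: just flip the claimed rows back to 'Pending'.
No transaction is left open at any point, so a crash in the file-writing tool never leaves locks behind.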

Related

Getting stale results in multiprocessing environment

I am using 2 separate processes via multiprocessing in my application. Both have access to a MySQL database via SQLAlchemy Core (not the ORM). One process reads data from various sources and writes it to the database. The other process just reads the data from the database.
I have a query which gets the latest record from a table and displays its id. However, it always displays the first id, created when I started the program, rather than the latest inserted id (new rows are created every few seconds).
If I use a separate MySQL tool and run the query manually I get correct results, but SQLAlchemy always gives me stale results.
Since you can see the changes your writer process is making with another MySQL tool, your writer process is indeed committing the data (at least it is if you are using InnoDB).
InnoDB shows you the state of the database as of when you started your transaction. Whatever other tools you are using probably have an autocommit feature turned on, where a new transaction is implicitly started after each query.
To see the changes in SQLAlchemy, do as zzzeek suggests and change your monitoring/reader process to begin a new transaction.
One technique I've used to do this myself is to add autocommit=True to the execution_options of my queries, e.g.:
result = conn.execute( select( [table] ).where( table.c.id == 123 ).execution_options( autocommit=True ) )
Assuming you're using InnoDB, the data on your connection will appear "stale" for as long as you keep the current transaction running, or until you commit the other transaction. In order for one process to see the data from the other process, two things need to happen: 1. the transaction that created the new data needs to be committed, and 2. the current transaction, assuming it has already read some of that data, needs to be rolled back or committed and started again. See The InnoDB Transaction Model and Locking.
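You can see the same behaviour directly in SQL. The readings table and the ids below are made up purely for illustration; InnoDB's default isolation level is REPEATABLE READ.
-- Session 1 (the reader)
START TRANSACTION;
SELECT MAX(id) FROM readings;   -- suppose this returns 100

-- Session 2 (the writer)
INSERT INTO readings (value) VALUES (42);
COMMIT;                         -- the new row (id 101) is now committed

-- Session 1 again, still inside its original transaction
SELECT MAX(id) FROM readings;   -- still 100: same snapshot as before
COMMIT;                         -- end the reader's transaction
SELECT MAX(id) FROM readings;   -- now 101: a fresh snapshot sees the new row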

How does getting mysql's last insert ID work with transactions? + transaction questions

A two part question:
In my CodeIgniter script, I'm starting a transaction, then inserting a row, assigning insert_id() to a PHP variable, inserting more rows into another table using the new ID as a foreign key, and then committing everything.
So my question is: if nothing is committed before the transaction ends, how is MySQL able to return the last insert ID when nothing has actually been inserted yet? My script works (almost) perfectly, with the new ID being used in subsequent queries.
(I say "almost" because, using the PDO mysql driver, sometimes the first insert that is supposed to return the insert_id() is duplicated - it gets inserted twice. Any idea why that would be? Is it related to getting the last ID? It never happens when using the mysqli or mysql driver.)
I first wrote the script without transactions, so I have code that checks for mysql errors along the way, such as:
if (!$this->db->insert($table, $data)) {
    // log message here
}
How does this affect the MySQL process now that I've wrapped all my MySQL code in a transaction? It's not causing any visible errors (hopefully unrelated to the problem stated above), but should it be removed?
Thank you.
To answer your first question...
When using transactions, your queries are executed normally as far as your own connection is concerned. You can then choose to commit, saving those changes, or roll back, reverting all of them. Consider the following pseudo-code:
insert into number(Random_number) values (rand());
select Random_number from number where Number_id = last_insert_id();

// php
if ($num < 1)
    $this->db->query('rollback;'); // This number is too depressing.
else
    $this->db->query('commit;');   // This number is just right.
The random number that was generated can be read prior to commit to ensure that it is suitable before saving it for everyone to see (e.g. commit and unlock the row).
If the PDO driver is not working, consider using the mysqli driver. If that is not an option, you can always use the query 'select last_insert_id() as id;' rather than the $this->db->insert_id() function.
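To make the first point concrete, here is a minimal SQL-only sketch of the pattern from the question; the orders and order_items tables are invented for the example. LAST_INSERT_ID() is tracked per connection and is set as soon as the INSERT executes, whether or not the transaction has been committed yet.
START TRANSACTION;

INSERT INTO orders (customer_id) VALUES (7);

-- Available right away on this connection, even though nothing is committed:
SET @new_order_id = LAST_INSERT_ID();

INSERT INTO order_items (order_id, sku) VALUES (@new_order_id, 'ABC-123');

COMMIT;   -- a ROLLBACK here would undo both inserts, but this connection
          -- would still have seen the id while the transaction was open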
To answer your second question: if you are inserting or updating data that other models will be updating or reading, be sure to use transactions. For example, if a column 'Number_remaining' is set to 1, the following problem can occur.
Person A reads 1
Person B reads 1
Person A wins $1000!
Person A updates 1 to be 0
Person B wins $1000!
Person B updates 0 to be 0
Using transactions in the same situation would yield this result:
Person A starts a transaction
Person A reads '1' from Number_remaining (the row is now locked if SELECT ... FOR UPDATE is used)
Person B attempts to read Number_remaining - forced to wait
Person A wins $1000!
Person A updates 1 to be 0
Person A commits
Person B reads 0
Person B does not win $1000
Person B cries
You may want to read up on transaction isolation levels as well.
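For reference, the locked-read version of the scenario above could look roughly like this; the prizes table and its columns are invented for the sketch.
START TRANSACTION;

-- FOR UPDATE locks the row: a concurrent transaction doing the same locked
-- read (or an UPDATE) on this row has to wait until we commit or roll back.
SELECT Number_remaining FROM prizes WHERE prize_id = 1 FOR UPDATE;

-- The application awards the prize only if Number_remaining was > 0, then:
UPDATE prizes SET Number_remaining = Number_remaining - 1 WHERE prize_id = 1;

COMMIT;   -- releases the lock; the waiting transaction now reads 0 and does not pay out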
Be careful of deadlock, which can occur in this case:
Person A reads row 1 (SELECT ... FOR UPDATE)
Person B reads row 2 (SELECT ... FOR UPDATE)
Person A attempts to read row 2, forced to wait
Person B attempts to read row 1, forced to wait
Person A reaches innodb_lock_wait_timeout (default 50 sec) and is disconnected
Person B reads row 1 and continues normally
At the end, since Person B has probably reached PHP's max_execution_time, the current query will finish executing independently of PHP, but no further queries will be received. If this was a transaction with autocommit=0, the query will automatically roll back when the connection to your PHP server is severed.

Simulating the execution of a stored procedure by multiple users

I have this trigger in SQL Server
ALTER TRIGGER [dbo].[myTrigger]
ON [dbo].[Data]
AFTER INSERT
AS
BEGIN
    declare @number int
    begin transaction

    select top 1 @number = NextNumber FROM Settings

    Update Settings
    set NextNumber = NextNumber + 1

    UPDATE Data
    set number = @number, currentDate = GetDate(), IdUser = user_id(current_user)
    FROM Data
    INNER JOIN inserted on inserted.IdData = Data.IdData

    commit transaction
END
It works as expected, but I wonder whether it will still work correctly when multiple users add new rows to the Data table at the same time.
Let's analyze this code for a minute:
begin transaction
You begin a transaction using the default READ COMMITTED isolation level.
select top 1 @number = NextNumber FROM Settings
You're selecting a NextNumber value from the Settings table (by the way, you should by all means add an ORDER BY clause - otherwise no ordering is guaranteed and you might get unexpected results here).
This operation, however, isn't blocking - two or more threads can read the same value, e.g. 100, at the same time. The SELECT only takes a shared lock for a very brief period of time, and shared locks are compatible - multiple readers can read the value simultaneously.
Update Settings
set NextNumber = NextNumber + 1
Now here, one thread gets the green light and writes back the new value - 101 in our example - to the table. The update takes an update lock (converted to an exclusive lock for the actual write), so only one thread can write at a time.
UPDATE Data
set number = @number, currentDate = GetDate(), IdUser = user_id(current_user)
FROM Data
INNER JOIN inserted on inserted.IdData = Data.IdData
Same thing - that one lucky thread gets to update the Data table, setting number to 100, and the row(s) it's updating are locked until the end of the transaction.
commit transaction
Now that lucky thread commits his transaction and is done.
HOWEVER: that second (and possibly third, fourth, fifth...) thread that had read the same original value of 100 is still in the game - now that thread #1 has completed, one of those threads gets to do its thing. It updates the Settings table correctly, to a new value of 102, and then performs its second update against the Data table, still using the "current" value of 100 that it had read into its @number variable....
In the end, you might have multiple threads that all read the same original value (100) from the Settings table, and every one of them will stamp that same value (100) into the Data table.
This method you're using here is not safe under load.
Possible solutions:
first and foremost - the recommended way to do this: let the database handle it for you, by using an INT IDENTITY column in your table (or, if you're already on SQL Server 2012, a SEQUENCE object that handles all the synchronization)
if you cannot do this - for whatever reason - then at least make sure your code works even on a busy system: e.g. use SELECT ... WITH (UPDLOCK) to put an update lock on the Settings table when the first thread reads the current value, which blocks all other threads from even reading the "current" value until the first thread has completed; or use an alternative such as updating and capturing the old value in a single UPDATE operation. Both variants are sketched below.
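A rough sketch of both variants, assuming the Settings table holds a single counter row (untested against the original schema):
DECLARE @number int;

-- Variant 1: lock the row on the read so concurrent callers queue up.
BEGIN TRANSACTION;

    SELECT TOP 1 @number = NextNumber
    FROM Settings WITH (UPDLOCK, HOLDLOCK);

    UPDATE Settings
    SET NextNumber = NextNumber + 1;

COMMIT TRANSACTION;

-- Variant 2: read and increment in one atomic statement; deleted.NextNumber
-- is the value as it was before this UPDATE ran.
DECLARE @old TABLE (OldNumber int);

UPDATE Settings
SET NextNumber = NextNumber + 1
OUTPUT deleted.NextNumber INTO @old;

SELECT @number = OldNumber FROM @old;
Variant 2 keeps the read and the increment in a single statement, so there is no window between them for another thread to sneak in.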
Simulating the execution of a stored procedure by multiple users
You can use two (or more) edit windows in SQL Server Management Studio and execute something like this simultaneously in each window.
insert into Data(ColName) values ('Value')
go 10000
go 10000 will execute the batch 10000 times. Adjust that to whatever value you think is appropriate.

sql log file growing too big

I have a table with 10 million records and no indexes, and I am trying to dedupe it. I tried the inserts with a select using either a left join or a where-not-exists clause, but each time I get a violation-of-key error. The other problem is that the log file grows too large and the transaction will not complete. I tried setting the recovery model to simple as recommended online, but that does not help. Here are the queries I used:
insert into temp(profile,feed,photo,dateadded)
select distinct profile,feed,photo,dateadded from original as s
where not exists(select 1 from temp as t where t.profile=s.profile)
This just produces the violation of key error. I tried using the following:
insert into temp(profile,feed,photo,dateadded)
select distinct profile,feed,photo,dateadded from original as s
left outer join temp t on t.profile=s.profile where t.profile is null
In both instances the log file now fills up before the transaction completes. So my main question is about the log file, and I can figure out the deduping with the queries.
You may need to work in batches. Write a loop to go through 5000 records or so at a time (you can experiment with the number; I've had to go as low as 500 or as high as 50,000 depending on the database and how busy it was).
What is your key? Most likely your query will need to pick one row per key using an aggregate function on dateadded (use the MIN or the MAX function).
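One way to express that is sketched below, using ROW_NUMBER() rather than a bare MIN() so that exactly one row per profile survives even if dateadded values tie; this assumes profile is the key you are deduping on.
INSERT INTO temp (profile, feed, photo, dateadded)
SELECT profile, feed, photo, dateadded
FROM (
    SELECT profile, feed, photo, dateadded,
           ROW_NUMBER() OVER (PARTITION BY profile ORDER BY dateadded) AS rn
    FROM original
) AS s
WHERE s.rn = 1;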
The bigger the transaction, the bigger the transaction log will be.
The log is used to roll back an open transaction if something fails, so if you're not committing frequently and you're executing a very large transaction, the log file will grow substantially. Once the transaction commits, that space in the file becomes free again. This is to safeguard the data in case something fails and a rollback is needed.
My suggestion would be to run the insert in batches, committing after each batch.
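Building on the query above, a batched version might look something like the sketch below; the batch size and the CHECKPOINT are illustrative, and with the SIMPLE recovery model the log space freed by each committed batch can be reused by the next one.
DECLARE @batch int;
SET @batch = 5000;

WHILE 1 = 1
BEGIN
    INSERT INTO temp (profile, feed, photo, dateadded)
    SELECT TOP (@batch) s.profile, s.feed, s.photo, s.dateadded
    FROM (
        SELECT profile, feed, photo, dateadded,
               ROW_NUMBER() OVER (PARTITION BY profile ORDER BY dateadded) AS rn
        FROM original
    ) AS s
    WHERE s.rn = 1
      AND NOT EXISTS (SELECT 1 FROM temp AS t WHERE t.profile = s.profile);

    IF @@ROWCOUNT = 0 BREAK;    -- nothing left to copy

    CHECKPOINT;                 -- let SIMPLE recovery reuse the log between batches
END
Each INSERT here is its own implicit transaction, so the log never has to hold the entire 10-million-row operation at once.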

Enqueue each row in a ssb queue from a large table

I have a table that contains 2.5 million rows; each row has one column of type xml. All records should be deleted and enqueued in a SQL Server Service Broker queue when a message arrives in another queue (triggerqueue). Performance is very important, and right now it's too slow. What would be the best way to achieve this?
Currently we use an activated stored procedure on the triggerqueue which does the following in a while (@message is not null) loop:
begin transaction
delete top (1) from table output deleted.* into #tempTable
select top 1 @message = message from #tempTable
send on conversation @message
commit transaction
Are there faster ways to tackle this problem?
By the way, before someone asks: we need to start from the table because it is filled with the output of an earlier calculated MERGE statement.
So your performance problem is on the send side rather than the receive side, right? (It's a bit unclear from your question.) In that case, you'll want to start with trying the following:
Batch many operations in a single transaction. You're most likely getting hit the most by synchronous log flushes at commit time.
Try processing the table more efficiently (e.g. select more rows at once into the temp table and then use a cursor to iterate over it and send messages) - a rough sketch of this follows.
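A rough sketch combining both suggestions; the table, column, and message type names are placeholders, and the conversation handle is assumed to be opened elsewhere (e.g. via BEGIN DIALOG CONVERSATION).
DECLARE @handle uniqueidentifier;   -- assumed: set to an open conversation handle
DECLARE @batchRows TABLE (Payload xml);
DECLARE @msg xml;

BEGIN TRANSACTION;

    -- Pull a whole batch out of the source table in one statement.
    DELETE TOP (100) FROM SourceTable
    OUTPUT deleted.Payload INTO @batchRows;

    -- Send the batch one message at a time inside the same transaction,
    -- so the log is flushed once at COMMIT instead of once per row.
    DECLARE batch_cursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT Payload FROM @batchRows;

    OPEN batch_cursor;
    FETCH NEXT FROM batch_cursor INTO @msg;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SEND ON CONVERSATION @handle
            MESSAGE TYPE [//example/RowMessage] (@msg);
        FETCH NEXT FROM batch_cursor INTO @msg;
    END

    CLOSE batch_cursor;
    DEALLOCATE batch_cursor;

COMMIT TRANSACTION;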
In case you're experiencing problems on the receive side, take a look at this great article by Remus.