Two Phase Commit on Couchbase

So I was reading about this two-phase commit pattern:
http://docs.couchbase.com/couchbase-devguide-2.5/#performing-two-phase-commits
AFAIK, each individual Couchbase operation is atomic.
The article explains that when performing a two-phase commit I should create a Trans:1 document, which holds the state of the transaction's progress.
1) It starts in the init state.
2) When the transaction begins processing, I should switch Trans:1's state to pend.
By switching to the pend state, we prevent other processes from picking up the same transaction.
3) Then update the target documents' content (in the example, dipti and karen) to reference the transaction at the same time.
*If anything fails during the update of either document, we can roll back by checking that the trans:1 document's state equals pend.*
So here are my questions:
a) Since Couchbase operations are atomic, if multiple processes try to pick up the transaction trans:1, there is a chance that process A reads trans:1 with state = init, and before process A updates trans:1's state to pend, process B also reads state = init and tries to perform the same update.
b) For the same reason, once a process has updated trans:1's state to pend, it then updates dipti and karen. Because each operation is atomic, they cannot really be updated at the same time, only one after another.
So how can we prevent other processes from reading/updating dipti and karen while their values are not yet completely updated?
c) Couchbase does not keep document versions like CouchDB, so how do we actually do a rollback?
d) What is the point of keeping trans:1 and setting it to state:done? Why not just delete the whole document once we know it is done?
e) Last question: how should the transaction documents actually get picked up? Should I run a service on Linux that constantly checks all the transaction documents, and start it every time the machine reboots?
(The example code there uses the cas() method; I have no idea what the equivalent is in the PHP SDK. I imagine it maps to get()?)
Currently my approach is:
When trans:1 is state:init, the process getAndLocks all necessary documents (dipti and karen) and creates a copy of each of them (trans:1::dipti etc.).
When trans:1 is state:pend, the process updates each document with state:Processed, so I know which documents were updated and need to be rolled back (from the copies) if anything fails.
A rollback then removes all the copy documents as well as trans:1.
But this still doesn't prevent another process from reading dipti and karen while both are being updated (e.g. to get the total amount across all people).
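To make question (a) concrete, here is roughly how I understand the CAS-guarded claim would work, sketched with the Couchbase Python SDK (2.x API) because I couldn't find the PHP equivalent; the key name and the document shape are my own assumptions:

from couchbase.bucket import Bucket
from couchbase.exceptions import KeyExistsError

cb = Bucket('couchbase://localhost/default')

def try_claim_transaction(key='trans:1'):
    # Read the transaction document together with its CAS value (its version).
    rv = cb.get(key)
    doc = rv.value
    if doc.get('state') != 'init':
        return False                      # someone else already picked it up
    doc['state'] = 'pend'
    try:
        # The replace only succeeds if nobody changed the document since our get().
        cb.replace(key, doc, cas=rv.cas)
        return True                       # we won the race and own the transaction
    except KeyExistsError:
        return False                      # another process claimed it first

If my understanding is right, only one of the competing processes gets True back, which would answer (a); but it doesn't help with (b), since readers of dipti and karen still see intermediate values.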

Related

MySQL/MariaDB InnoDB Simultaneous Transactions & Locking Behaviour

As part of the persistence process in one of my models, an md5 check_sum of the entire record is generated and stored with the record. The check_sum is built from a flattened representation of the entire record, including all EAV attributes, etc. This makes preventing absolute duplicates very easy and efficient.
I am not using a unique index on this check_sum for a specific reason: I want this all to be silent, i.e. if a user submits a duplicate then the app just silently ignores it and returns the already existing record. This ensures backwards compatibility with legacy apps and APIs.
I am using Laravel's Eloquent. So once a record has been created, and before committing, the application does the following:
$taxonRecords = TaxonRecord::where('check_sum', $taxonRecord->check_sum)->get();
if ($taxonRecords->count() > 0) {
    DB::rollBack();
    return $taxonRecords->first();
}
However, I recently encountered a 60,000-to-1 incident (odds based on record counts at the time). A single duplicate ended up in the database with the same check_sum. When I reviewed the logs I noticed that the creation times were identical down to the second. Further investigation of the Apache logs showed a valid POST, but the POST was duplicated. I presume the user's browser malfunctioned or something, but both POSTs arrived simultaneously, resulting in two simultaneous transactions.
My question is: how can I ensure that a transaction and its contained SELECT for the existing check_sum are atomic and isolated? Based upon my reading, the answer lies in https://dev.mysql.com/doc/refman/8.0/en/innodb-locking-reads.html and isolation levels.
If transaction A and transaction B arrive at the server at the same time, they should not run side by side; the second should wait for the first to complete.
You created a classic race condition. Both transactions calculate the checksum while they're both in progress, not yet committed. Neither can read the other's data, since it is uncommitted. So each concludes that it is the only one with that checksum, and they both go through and commit.
To solve this, you need to run such transactions serially, to be sure that there aren't other concurrent transactions submitting the same data.
You may have to use GET_LOCK() before starting the transaction that calculates the checksum, then RELEASE_LOCK() after you commit. That will make sure other concurrent requests wait for your data to be committed, so they will see it when they try to calculate their checksum.
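A minimal sketch of that approach, written in Python with the MySQLdb driver rather than through Laravel (the lock name, table and column names are assumptions of mine):

import MySQLdb

def insert_if_new(conn, check_sum, payload):
    cur = conn.cursor()
    # Serialize the check-then-insert across all requests with a named lock.
    cur.execute("SELECT GET_LOCK(%s, 10)", ("taxon_records_check_sum",))
    if cur.fetchone()[0] != 1:
        raise RuntimeError("could not acquire lock within 10 seconds")
    try:
        cur.execute("SELECT id FROM taxon_records WHERE check_sum = %s", (check_sum,))
        row = cur.fetchone()
        if row is not None:
            return row[0]                       # silently return the existing record
        cur.execute(
            "INSERT INTO taxon_records (check_sum, payload) VALUES (%s, %s)",
            (check_sum, payload),
        )
        conn.commit()                           # commit before releasing the lock
        return cur.lastrowid
    finally:
        cur.execute("SELECT RELEASE_LOCK(%s)", ("taxon_records_check_sum",))

Because the lock is only released after the commit, the next request that acquires it is guaranteed to see the freshly committed row when it runs its checksum SELECT.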

Is every SQL statement undone when it never reaches the point of COMMIT?

I think I've got the principle right; I just want to make sure.
So when autocommit is disabled, every statement I execute won't be committed directly, except those that trigger a commit themselves.
So if I have, for example, a basic script running like:
statement.executeUpdate("SET autocommit = 0;")
//some code
//SQL Queries
//SQL DELETEs
//SQL INSERTs
statement.executeUpdate("COMMIT;")
Then what would happen is: if the script runs through without any problem, it reaches the point where every SQL statement has been executed and everything is COMMITted at the end. If an error or exception happens instead, the script breaks at that point, never reaches the COMMIT, and every change made before that point is undone, so every deleted piece of information is still there and every insertion is thrown away.
Is it that simple or did I get something wrong?
Assuming you are using a decent database, the data of the current transaction is typically not stored in the table heap itself but in a "redo log".
That is, it isn't even in the table until the commit is executed. The commit, and other processing that happens later, places it in the main table at some point.
In general, if the database engine crashes, the data may still be on disk somewhere, but not in any "official" table area, so it will be discarded when the database engine is restarted. The uncommitted transaction did not modify the actual data.
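The same flow sketched in Python with the MySQLdb driver (the table and statements are made up for illustration); the explicit rollback in the except branch is what undoes the uncommitted work, rather than relying on the connection simply being dropped:

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="shop")
conn.autocommit(False)              # equivalent of SET autocommit = 0
cur = conn.cursor()

try:
    cur.execute("DELETE FROM orders WHERE status = 'stale'")
    cur.execute("INSERT INTO orders (status) VALUES ('new')")
    conn.commit()                   # only now do the changes become permanent
except MySQLdb.Error:
    conn.rollback()                 # undo every change since the last commit
    raise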

Percona XtraDB Cluster multi-node writing and unexpected deadlocks outside of transaction?

I am having trouble finding an answer to this using Google or Stack Overflow, so perhaps people familiar with Percona XtraDB can help answer this. I fully understand how unexpected deadlocks can occur as outlined in this article, and the solution is to make sure you wrap your transactions in retry logic so you can restart them if they fail. We already do that.
https://www.percona.com/blog/2012/08/17/percona-xtradb-cluster-multi-node-writing-and-unexpected-deadlocks/
My question is about normal updates that occur outside of an explicit transaction, in autocommit mode. Normally, if you are writing to a single SQL DB and perform an update, you get a last-in-wins scenario, so whoever executes the statement last is golden. Any other data is lost, so if two updates occur at the same time, one of them takes hold and the other's data is essentially lost.
Now what happens in a multi-master environment with the same thing? The difference in cluster mode with multi-master is that the deadlock can occur at the point where the commit happens, as opposed to when the lock is first taken on the table. So in autocommit mode the data will get written to the local DB, but it could then fail when the cluster tries to commit it to the other nodes, if something else modified the exact same record at the same time. Clearly the simple solution is to re-execute the update, and it would seem to me that the database itself should be able to handle this, since it is a single statement in autocommit mode?
So is that what happens in this scenario, or do I need to start wrapping all my update code in retry handling as well and retry it myself when this fails?
Autocommit is still a transaction: a single-statement transaction. Your single statement is just wrapped in BEGIN/COMMIT for you. I believe your logic is inverted. In PXC, the rule is "commit first wins". If you start a manual transaction on node1 (i.e. autocommit=0; BEGIN;), UPDATE the row with id=1, and don't commit, and then on node2 you autocommit an update to the same row, that update will succeed on node2 and be applied on node1. When you then commit the manual UPDATE, you will get a deadlock error. This is correct behavior.
It doesn't matter whether autocommit is on or not; whichever transaction commits first wins, and the other must retry. This is the reason why we don't recommend writing to multiple nodes in PXC.
Yes, if you want to write to multiple nodes, you need to adjust your code to handle this error case with try/catch/retry.
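A hedged retry sketch of what that could look like (my own illustration, in Python with the MySQLdb driver; the UPDATE statement and table are invented). MySQL reports both classic deadlocks and these cluster conflicts as error 1213:

import MySQLdb

DEADLOCK_ERRNO = 1213   # ER_LOCK_DEADLOCK

def update_with_retry(conn, account_id, amount, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            cur = conn.cursor()
            cur.execute(
                "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                (amount, account_id),
            )
            conn.commit()
            return
        except MySQLdb.OperationalError as exc:
            conn.rollback()
            if exc.args[0] != DEADLOCK_ERRNO or attempt == max_attempts - 1:
                raise
            # else: another node committed first; loop and re-execute the statement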

Getting stale results in multiprocessing environment

I am using 2 separate processes via multiprocessing in my application. Both have access to a MySQL database via SQLAlchemy Core (not the ORM). One process reads data from various sources and writes it to the database. The other process just reads the data from the database.
I have a query which gets the latest record from a table and displays its id. However, it always displays the first id, created when I started the program, rather than the latest inserted id (new rows are created every few seconds).
If I use a separate MySQL tool and run the query manually I get correct results, but SQLAlchemy keeps giving me stale results.
Since you can see the changes your writer process is making when using another MySQL tool, that means your writer process is indeed committing the data (at least it does if you are using InnoDB).
InnoDB shows you the state of the database as of when you started your transaction. Whatever other tool you are using probably has an autocommit feature turned on, where a new transaction is implicitly started after each query.
To see the changes in SQLAlchemy, do as zzzeek suggests and change your monitoring/reader process to begin a new transaction.
One technique I've used to do this myself is to add autocommit=True to the execution_options of my queries, e.g.:
result = conn.execute( select( [table] ).where( table.c.id == 123 ).execution_options( autocommit=True ) )
Assuming you're using InnoDB, the data on your connection will appear "stale" for as long as you keep the current transaction running, or until you commit the other transaction. In order for one process to see the data from the other process, two things need to happen: (1) the transaction that created the new data needs to be committed, and (2) the current transaction, assuming it has already read some of that data, needs to be rolled back, or committed and started again. See The InnoDB Transaction Model and Locking.
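For instance, a possible shape for the reader process (SQLAlchemy Core, 1.x-style syntax to match the snippet above; the table name is invented). Getting a fresh connection for each poll means the previous implicit transaction is rolled back when the connection goes back to the pool, so every SELECT sees a new snapshot:

import time
from sqlalchemy import create_engine, MetaData, Table, select, func

engine = create_engine("mysql://app:secret@localhost/app")
metadata = MetaData()
readings = Table("readings", metadata, autoload=True, autoload_with=engine)

while True:
    # A fresh connection (and therefore a fresh transaction) per poll.
    with engine.connect() as conn:
        latest_id = conn.execute(select([func.max(readings.c.id)])).scalar()
    print("latest id:", latest_id)
    time.sleep(5)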

Should I commit after a single select

I am working with MySQL 5.0 from python using the MySQLdb module.
Consider a simple function to load and return the contents of an entire database table:
def load_items(connection):
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM MyTable")
    return cursor.fetchall()
This query is intended to be a simple data load and not have any transactional behaviour beyond that single SELECT statement.
After this query is run, it may be some time before the same connection is used again to perform other tasks, though other connections can still be operating on the database in the meantime.
Should I be calling connection.commit() soon after the cursor.execute(...) call to ensure that the operation hasn't left an unfinished transaction on the connection?
There are two things you need to take into account:
the isolation level in effect
what kind of state you want to "see" in your transaction
The default isolation level in MySQL is REPEATABLE READ which means that if you run a SELECT twice inside a transaction you will see exactly the same data even if other transactions have committed changes.
Most of the time people expect to see committed changes when running the second select statement - which is the behaviour of the READ COMMITTED isolation level.
If you did not change the default level in MySQL and you do expect to see changes in the database when you run a SELECT twice in the same transaction, then you can't do it in the "same" transaction: you need to commit after your first SELECT statement.
If you actually want to see a consistent state of the data throughout your transaction, then of course you should not commit.
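For the "load and move on" case in the question, a minimal sketch of committing right after the read (same MySQLdb-style connection and table as in the question), so the connection isn't left holding a REPEATABLE READ snapshot:

def load_items(connection):
    cursor = connection.cursor()
    cursor.execute("SELECT * FROM MyTable")
    rows = cursor.fetchall()
    # End the implicit transaction that the SELECT opened; otherwise this
    # connection keeps its REPEATABLE READ snapshot until it is used again.
    connection.commit()
    return rows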
A follow-up question: "then after several minutes, the first process carries out an operation which is transactional and attempts to commit. Would this commit fail?"
That totally depends on your definition of "is transactional". Anything you do in a relational database "is transactional" (that's not entirely true for MySQL actually, but for the sake of argument you can assume it if you are only using InnoDB as your storage engine).
If that "first process" only selects data (i.e. a "read-only transaction"), then of course the commit will work. If it tried to modify data that another transaction has already committed and you are running with REPEATABLE READ, you probably get an error (after waiting until any locks have been released). I'm not 100% sure about MySQL's behaviour in that case.
You should really try this manually with two different sessions using your favorite SQL client to understand the behaviour. Change your isolation level as well to see the effects of the different levels.
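A possible two-session experiment, sketched with two MySQLdb connections (table and column names invented); under the default REPEATABLE READ, session A keeps seeing its original snapshot until it commits:

import MySQLdb

def connect():
    return MySQLdb.connect(host="localhost", user="app", passwd="secret", db="app")

a, b = connect(), connect()
cur_a, cur_b = a.cursor(), b.cursor()

cur_a.execute("SELECT COUNT(*) FROM MyTable")      # A's first SELECT opens its snapshot
print("A sees", cur_a.fetchone()[0], "rows")

cur_b.execute("INSERT INTO MyTable (name) VALUES ('new row')")
b.commit()                                          # B's change is now committed

cur_a.execute("SELECT COUNT(*) FROM MyTable")      # still the old count under REPEATABLE READ
print("A sees", cur_a.fetchone()[0], "rows")

a.commit()                                          # end A's snapshot
cur_a.execute("SELECT COUNT(*) FROM MyTable")
print("A sees", cur_a.fetchone()[0], "rows")        # now includes B's insert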