Couchbase AttemptContext transaction not working as expected

When there are multiple requests updating the same document inside an AttemptContext transaction block, the data becomes inconsistent because concurrent threads are trying to update the document. A warning message is also logged:
[com.couchbase.core][IllegalDocumentState] Tried committing document <document_id>, but found that it has been modified by another party in-between staging and committing. The application must ensure that non-transactional writes cannot happen at the same time as transactional writes on a document. The change will be committed with CAS=0, which will overwrite the other change. This document may need manual review to verify that no changes have been lost. Last document state=cas=<CAS>,seqno=<SEQ_NO>,vbucket
This leaves data inconsistent.
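That warning means something outside the transaction wrote to the same document with the plain KV API between staging and commit. One way to avoid it is to route every read and write of that document through the transaction context, so racing attempts are serialized by the transactions library via CAS. Below is a minimal sketch of that idea; the connection details, bucket name, document id and "counter" field are placeholders, and the exact class and method names depend on whether you use the standalone couchbase-transactions library or the transactions API built into SDK 3.3+, so treat them as an approximation rather than the exact API.
```java
import com.couchbase.client.java.Cluster;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.transactions.TransactionGetResult;
import com.couchbase.transactions.Transactions;

public class TransactionalUpdate {
    public static void main(String[] args) {
        // Placeholder connection details and names, for illustration only.
        Cluster cluster = Cluster.connect("couchbase://127.0.0.1", "user", "password");
        Collection collection = cluster.bucket("my-bucket").defaultCollection();
        Transactions transactions = Transactions.create(cluster);

        transactions.run(ctx -> {
            // Read and write the document only through the transaction context.
            // A plain collection.upsert()/replace() on the same document while this
            // attempt is in flight is exactly the "non-transactional write" the
            // IllegalDocumentState warning complains about.
            TransactionGetResult doc = ctx.get(collection, "document_id");
            JsonObject content = doc.contentAs(JsonObject.class);
            content.put("counter", content.getInt("counter") + 1);
            ctx.replace(doc, content);
        });
    }
}
```
The key point is that ctx.get()/ctx.replace() carry the document's CAS, so two racing attempts cannot both commit a stale version; mixing in non-transactional writes on the same document defeats that protection.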

Related

Read after write consistency with mysql and multiple concurrent connections

I'm trying to understand whether it is possible to achieve the following:
I have multiple instances of an application server running behind a round-robin load balancer. The client expects GET-after-POST/PUT semantics; in particular, the client will make a POST request, wait for the response, and immediately make a GET request, expecting the response to reflect the change made by the POST request, e.g.:
> Request: POST /some/endpoint
< Response: 201 CREATED
< Location: /some/endpoint/123
> Request: GET /some/endpoint/123
< Response must not be 404 Not Found
It is not guaranteed that both requests are handled by the same application server. Each application server has a pool of connections to the DB. Each request will commit a transaction before responding to the client.
Thus the database will, on one connection, see an INSERT statement followed by a COMMIT. On another connection, it will see a SELECT statement. Temporally, the SELECT will be strictly after the commit, though the delay may be tiny, on the order of milliseconds.
The application server I have in mind uses Java, Spring, and Hibernate. The database is MySQL 5.7.11 managed by Amazon RDS in a multiple availability zone setup.
I'm trying to understand whether this behavior can be achieved and, if so, how. There is a similar question, but the answer suggesting to lock the table does not seem right for an application that must handle concurrent requests.
Under ordinary circumstances, you will not have any issue with this sequence of requests, since MySQL will have committed the changes to the database by the time the 201 response has been sent back. Therefore, any subsequent statements will see the created / updated record.
What could be the extraordinary circumstances under which the subsequent select will not find the updated / inserted record?
Another process commits an update or delete statement that changes or removes the given record. There is not much you can do about this, since it is part of normal operation. If you do not want such a thing to happen, you have to implement application-level locking of the data.
The subsequent GET request is routed not only to a different application server, but to one that uses (or is forced to use) a different database instance, which does not have the most up-to-date state of that record. I would only envisage this happening if there were a severe failure at the application or database server level, or if routing of the request went really badly (routed to a data center in a different geographical location). These should not happen too frequently.
If you're using MyISAM tables, you might be seeing the effects of 'concurrent inserts' (see 8.11.3 in the mysql manual). You can avoid them by either setting the concurrent_insert system variable to 0, or by using the HIGH_PRIORITY keyword on the INSERT.
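For reference, a sketch of where the commit boundary sits in a Spring/Hibernate setup like the one described: the @Transactional service method commits when it returns, which is before the controller starts writing the 201 response, so by the time the client issues its follow-up GET the row is already committed (as long as every instance reads from the same authoritative primary). The Widget entity, repository and endpoint names below are made up for illustration.
```java
import java.net.URI;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// "Widget" and its repository are hypothetical names; the point is where the commit sits.
@Entity
class Widget {
    @Id @GeneratedValue
    private Long id;
    public Long getId() { return id; }
}

interface WidgetRepository extends JpaRepository<Widget, Long> {}

@Service
class WidgetService {
    private final WidgetRepository repository;
    WidgetService(WidgetRepository repository) { this.repository = repository; }

    // The INSERT is committed when this method returns ...
    @Transactional
    public Widget create(Widget widget) { return repository.save(widget); }
}

@RestController
@RequestMapping("/some/endpoint")
class WidgetController {
    private final WidgetService service;
    WidgetController(WidgetService service) { this.service = service; }

    // ... which is before this handler builds the 201 response, so a GET issued
    // after the client has seen the 201 reads an already-committed row.
    @PostMapping
    public ResponseEntity<Void> create(@RequestBody Widget widget) {
        Widget saved = service.create(widget);
        return ResponseEntity.created(URI.create("/some/endpoint/" + saved.getId())).build();
    }
}
```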

Very frequent couchbase document updates

I'm new to couchbase and was wondering if very frequent updates to a single document (possibly every second) will cause all updates to pass through the disk write queue, or only the last update made to the document?
In other words, does Couchbase optimize disk writes by only writing the document to disk once, even if it was updated multiple times between writes?
Based on the docs, http://docs.couchbase.com/admin/admin/Monitoring/monitor-diskqueue.html, it sounds like all updates are processed. If anyone can confirm this, I'd be grateful.
thanks
Updates are held in a disk queue before being written to disk. If a write to a document occurs and a previous write is still in the disk queue, then the two writes will be coalesced, and only the more recent version will actually be written to disk.
Exactly how fast the disk queue drains will depend on the storage subsystem, so whether writes to the same key get coalesced will depend on how quick the writes come in compared to the storage subsystem speed / node load.
Jako, you should worry more about updates happening in the millisecond time frame, or more than one update happening within a single millisecond. The disk write isn't the problem; Couchbase solves that intelligently itself. The issue is that you will run into concurrency problems when you operate in the millisecond time frame.
I ran into them fairly easily when I tested my application and at first couldn't understand why Node.js (in my case) would sometimes write data to Couchbase and sometimes not, usually failing for the first record.
More problems arose when I first checked whether a document with a specific key existed; if it did not, I would try to write it to Couchbase, only to find that in the meantime an earlier callback had finished and there was now indeed a document under that key.
In those cases you have to work with the CAS value and program the update iteratively, so that your app keeps pulling the current document for that key and re-applying the update until it succeeds. Keep this in mind especially when running tests where updates are made to the same document!
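The question above is about Node.js, but as a sketch of the CAS retry loop just described, here is what the same pattern looks like with the Couchbase Java SDK 3.x; the "counter" field and the helper class are illustrative only.
```java
import com.couchbase.client.core.error.CasMismatchException;
import com.couchbase.client.java.Collection;
import com.couchbase.client.java.json.JsonObject;
import com.couchbase.client.java.kv.GetResult;
import com.couchbase.client.java.kv.ReplaceOptions;

public class CasRetryUpdate {
    // Keep re-reading and re-applying the change until the CAS check passes.
    public static void incrementCounter(Collection collection, String id) {
        while (true) {
            GetResult current = collection.get(id);
            JsonObject content = current.contentAsObject();
            content.put("counter", content.getInt("counter") + 1); // "counter" is a placeholder field
            try {
                collection.replace(id, content,
                        ReplaceOptions.replaceOptions().cas(current.cas()));
                return; // no one else modified the document in between
            } catch (CasMismatchException e) {
                // Another writer got in first: loop, re-read, and try again.
            }
        }
    }
}
```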

How to cache popular queries to avoid both stampedes and blank results

On the customizable front page of our web site, we offer users the option of adding modules that show recently updated content, choosing from well over 100 modules.
All of the data is generated by MySQL queries, the results of which are cached via memcached. Our current system works like this: when a user loads a page containing modules, they are immediately served the data from cache, and the query is added to a queue to be refreshed by a separate Gearman process (so that the page load does not wait for the MySQL query). Each queued query is then run once every 15 minutes to refresh the data in the cache. The queue of queries itself is periodically purged so that we do not continually refresh data that has not been requested recently.
The problem is what to do when the cache is empty, for some reason. This doesn't happen often, but when it does, the user is currently shown an empty module, and the data is refreshed in the gearman process so that a bit later, when the same (or a different) user reloads the page, there is data to show.
Our traffic is such that, if we were to try to run the query live for the user when the cache is empty, we would have a serious stampede problem: we'd be running the same (possibly slow) query many times over as users loaded the page. Is there any way to solve the "blank module" problem without opening up the risk of stampedes?
This is an interesting implementation, though it varies a bit from the way memcached is most typically used in front of MySQL.
In most cases, queries are first checked against memcached to see if there is an available entry. If so, the value is served from memcached and the database is never queried at all. If there is a cache miss, the query is made against the database, the result is added to memcached, and the information is returned to the caller. This is how you would typically build up your cache for read queries.
In cases where data is being updated, the update would be made against the database, and then the appropriate data in memcached invalidated and/or updated. Similarly for inserts, you could either do nothing regarding the cache (and let the next read on that record populate the cache), or you could actively add the data related to the insert into the cache, depending on your application needs.
In this way you wouldn't need to take the extra step of calling the database to get authoritative data after getting initial data from memcached. The data in memcached would be a copy of the authoritative data which is just updated/invalidated upon updates/inserts.
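As a sketch of that read path (cache-aside) using the spymemcached Java client; runModuleQuery() is a placeholder for whatever MySQL query actually backs the module, and the TTLs are assumptions.
```java
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class ModuleCache {
    private final MemcachedClient cache;

    public ModuleCache(String host) throws java.io.IOException {
        this.cache = new MemcachedClient(new InetSocketAddress(host, 11211));
    }

    // Serve from memcached when possible; otherwise run the query,
    // repopulate the cache, and return the fresh result.
    public String fetchModuleData(String cacheKey) {
        Object cached = cache.get(cacheKey);
        if (cached != null) {
            return (String) cached;              // cache hit: MySQL is never touched
        }
        String fresh = runModuleQuery(cacheKey); // cache miss: go to the database
        cache.set(cacheKey, 900, fresh);         // keep it for ~15 minutes
        return fresh;
    }

    // Placeholder for the real (possibly slow) MySQL query behind the module.
    private String runModuleQuery(String cacheKey) {
        return "module data for " + cacheKey;
    }
}
```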
Based on your comments, one thing you might want to try in order to prevent a flood of queries against your database on cache misses is to use a mutex of sorts. For example, when the first client hits memcached and gets a cache miss for that lookup, it could insert a temporary value into memcached indicating that the data is pending, then make the query against the database, and then update the memcached data with the result.
On the client side, when you get a cache miss or a "pending" result, you simply retry the cache lookup after a certain period of time (which you may want to increase exponentially). So perhaps clients first wait 1 second, then try again in 2 seconds if they still get a "pending" result, then retry in 4 seconds, and so on.
This could result in more requests against the memcached server, but it should resolve any problems at the database layer.
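Putting the "pending" mutex and the exponential backoff together, a hedged sketch with spymemcached might look like the following. add() only succeeds when the key does not already exist, which is what makes it usable as a lock; the PENDING marker, the TTLs and runModuleQuery() are all placeholders.
```java
import net.spy.memcached.MemcachedClient;

public class StampedeSafeCache {
    private static final String PENDING = "__PENDING__"; // placeholder marker value

    private final MemcachedClient cache;

    public StampedeSafeCache(MemcachedClient cache) {
        this.cache = cache;
    }

    public String fetch(String cacheKey) throws Exception {
        long backoffMillis = 1000;
        while (true) {
            Object cached = cache.get(cacheKey);
            if (cached != null && !PENDING.equals(cached)) {
                return (String) cached;                         // real data in the cache
            }
            // add() only succeeds if the key is absent, so exactly one caller
            // wins the "mutex" and runs the (possibly slow) query.
            if (cached == null && cache.add(cacheKey, 60, PENDING).get()) {
                String fresh = runModuleQuery(cacheKey);
                cache.set(cacheKey, 900, fresh);
                return fresh;
            }
            Thread.sleep(backoffMillis);                        // someone else is refreshing
            backoffMillis = Math.min(backoffMillis * 2, 8000);  // 1s, 2s, 4s, ... capped
        }
    }

    // Placeholder for the real MySQL query behind the module.
    private String runModuleQuery(String cacheKey) {
        return "module data for " + cacheKey;
    }
}
```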

How to avoid 'Transaction managed block ended with pending COMMIT/ROLLBACK' error across methods

I have a situation where I have applied the @transaction.commit_manually decorator to a method in which I import information passed back in an HTTP response. I need to control committing and rolling back depending on whether business validation rules pass or fail.
Now, when there is some sort of validation failure, I have a separate method in which I log an error to the database. This action should always commit immediately, while leaving the primary transaction in its current state. However, if I apply the @transaction.commit_on_success decorator to the error-capturing routine, my primary transaction commits automatically as well. If I don't apply the @transaction.commit_on_success decorator, then I receive the 'Transaction managed block ended with pending COMMIT/ROLLBACK' error as soon as a call is made to the error-capturing routine.
I am using MySQL 5.1.49 with the InnoDB storage engine.
Is there a way to persist the open transaction in the calling routine while committing the transaction in the second routine?
Django's default transaction management doesn't support nested transactions. In general, transactions can't be nested: everything that's done in the midst of a transaction is either committed or rolled back. So when you commit the transaction, no matter where you commit it, it's atomic.
Looking around online, I found a snippet that might be a good starting point for you. It essentially overrides the commit_on_success decorator, adding a form of reference counting: in effect, it forgoes committing unless it is the outermost decorated call.

how to solve lock_wait_timeout, subsequent rollback and data disappearance from mysql 5.1.38

I am using TopLink with Struts 2 for a high-usage app; the app constantly accesses a single table with multiple reads and writes per second. This causes a lock_wait_timeout error and the transaction rolls back, causing the data just entered to disappear from the front end. (MySQL's autocommit has been set to 1.) The exception has been caught and sent to an error page in the app, but a rollback still occurs (it has to be a TopLink exception, as MySQL does not have the rollback feature turned on). The raw data files (ibdata01) show the entry when opened in an editor. As this happens infrequently, I have not been able to replicate it under test conditions.
Can anyone be kind enough to provide some sort of way out of this dilemma? What sort of approach suits such a high-access pattern (constant reads and writes to the same table all the time)? Any help would be greatly appreciated.
What is the nature of your concurrent reads/updates? Are you updating the same rows constantly from different sessions? What do you expect to happen when two sessions update the same row at the same time?
If it is just reads conflicting with updates, consider reducing your transaction isolation on your database.
If you have multiple writes conflicting, then you may consider using pessimistic locking to ensure each transaction succeeds. Either way, you will have a lot of contention, so you may want to reconsider your data model or your application's usage of the data.
See http://en.wikibooks.org/wiki/Java_Persistence/Locking
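Since TopLink implements JPA, pessimistic locking can be expressed with a JPA 2.0 lock mode (assuming your provider version supports it), which on MySQL/InnoDB translates to SELECT ... FOR UPDATE. A sketch with a made-up Account entity:
```java
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityTransaction;
import javax.persistence.Id;
import javax.persistence.LockModeType;

// "Account" is a hypothetical entity used only to show the lock mode.
@Entity
class Account {
    @Id
    private Long id;
    private long balance;

    public long getBalance() { return balance; }
    public void setBalance(long balance) { this.balance = balance; }
}

public class PessimisticUpdate {
    // LockModeType.PESSIMISTIC_WRITE maps to SELECT ... FOR UPDATE on MySQL/InnoDB,
    // so competing writers block on the row lock instead of overwriting each other.
    public static void debit(EntityManager em, long accountId, long amount) {
        EntityTransaction tx = em.getTransaction();
        tx.begin();
        try {
            Account account = em.find(Account.class, accountId, LockModeType.PESSIMISTIC_WRITE);
            account.setBalance(account.getBalance() - amount);
            tx.commit();                  // row lock is released here
        } catch (RuntimeException e) {
            if (tx.isActive()) tx.rollback();
            throw e;
        }
    }
}
```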
Lock wait timeouts are a fact of life for transactional databases. The normal response should usually be to trap the error and attempt to re-run the transaction. Not many developers seem to understand this, so it bears repeating: if you get a lock_wait_timeout error and you still want to commit the transaction, then run it again.
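As a sketch of "trap the error and run it again" with plain JDBC against MySQL (the table, column and retry limit are made up; MySQL reports a lock wait timeout as vendor error code 1205):
```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class RetryOnLockWaitTimeout {
    private static final int ER_LOCK_WAIT_TIMEOUT = 1205; // MySQL error code for lock wait timeout
    private static final int MAX_ATTEMPTS = 3;

    // Hypothetical update used for illustration; the pattern is what matters:
    // trap the timeout, roll back, and re-run the whole transaction.
    public static void updateWithRetry(DataSource ds, long id, String value) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try (Connection conn = ds.getConnection()) {
                conn.setAutoCommit(false);
                try (PreparedStatement ps =
                         conn.prepareStatement("UPDATE my_table SET my_column = ? WHERE id = ?")) {
                    ps.setString(1, value);
                    ps.setLong(2, id);
                    ps.executeUpdate();
                    conn.commit();
                    return;
                } catch (SQLException e) {
                    conn.rollback();
                    if (e.getErrorCode() != ER_LOCK_WAIT_TIMEOUT || attempt >= MAX_ATTEMPTS) {
                        throw e; // not a lock wait timeout, or out of retries
                    }
                    // otherwise fall through and re-run the transaction
                }
            }
        }
    }
}
```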
Other things to look out for:
- Persistent connections combined with not explicitly COMMITting your transactions lead to long-running transactions that result in unnecessary locks.
- Since you have autocommit off, if you log in from the mysql CLI (or any other interactive query tool) and start running queries, you stand a significant chance of locking rows and not releasing them in a timely manner.