Ehcache non-transactional context with transactional cache - configuration

I have an Ehcache cache instance configured with transactionalMode="local".
Now, when I try to put an element in said cache outside of a transaction, I get
net.sf.ehcache.transaction.TransactionException: transaction not started.
Does this mean that every call on transactional cache instance needs to be in a transaction context?
I'm doing some custom cache pre-loading on startup, and I don't want Ehcache transaction (and copyOnRead/Write) overhead. Also, since I'll be dealing with logically immutable objects, I'd like to be able to read them from cache without transaction scope, if possible.

Do you really need to use local transactions in the first place? That is, do you need to put multiple cache entries atomically in a single operation?
In any case, if you use transactionalMode="local", you're pretty much stuck performing all your cache operations (even reads) within a transaction boundary.
But if you need more granularity, I'd recommend you look at Ehcache's explicit locking, which can be used as a custom alternative to XA or local transactions (without having to specify transactionalMode in your Ehcache config). More at http://ehcache.org/documentation/apis/explicitlocking
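To make that concrete, here's a minimal sketch of both options (assuming Ehcache 2.x; the key and value are made up, and cacheManager/cache are your existing CacheManager and Ehcache instances):
import net.sf.ehcache.Element;
import net.sf.ehcache.TransactionController;

// Option 1: local transactions. With transactionalMode="local", every
// access (reads included) must happen between begin() and commit()/rollback().
TransactionController tx = cacheManager.getTransactionController();
tx.begin();
try {
    cache.put(new Element("product:42", product));
    tx.commit();
} catch (RuntimeException e) {
    tx.rollback();
    throw e;
}

// Option 2: explicit locking. Works on a plain (non-transactional) cache,
// so your pre-loading avoids the transaction and copyOnRead/Write overhead.
cache.acquireWriteLockOnKey("product:42");
try {
    cache.put(new Element("product:42", product));
} finally {
    cache.releaseWriteLockOnKey("product:42");
}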
Hope that helps.

Related

ActiveRecord::StaleObjectError on opening each result in a new tab

Recently we added functionality to our RoR application that allows users to open a particular record, say, each in its own browser tab. Since then, we've started seeing frequent ActiveRecord::StaleObjectError exceptions. On investigating the issue, I found that Rails is indeed trying to update the session store first whenever a resource is opened in a tab, and that is where the exception is raised.
We have a lock_version column in our Active Record session store, so Rails applies optimistic locking by default. Is there any way to solve this issue without introducing much complexity and without affecting any session data stored in our session store DB? The application is already live on the client's machine.
Any suggestions would be much appreciated. Thanks
It sounds like you're using optimistic locking on a DB session record and updating that session record whenever you process an update to other records. I'm not sure what you'd need to update in the session, but if you're worried about possibly conflicting updates to the session object (and need the locking), then these errors might be desired.
If you don't, you can reload the session object before saving the session (or disable its optimistic locking) to avoid this error for these session updates.
You might also look into what about the session is being updated and whether it's strictly necessary. If you're updating something like last_active_on, you might be better off sending the work to a background job and/or using the update_column method, which bypasses the rather heavyweight ActiveRecord save callback chain.
--- UPDATE ---
Pattern: Putting side-effects in background jobs
There are several common Rails patterns that start to break down as your app usage grows. One of the most common I've run into is when a controller endpoint for a specific record also updates a common/shared record (for example, when creating a 'message' also updates the user's messages_count via a counter cache, or updates last_active_at on a session). These patterns create bottlenecks in your application because many different types of requests end up competing for write locks on the same database rows unnecessarily.
These tend to creep into your app over time and become hard to refactor later. I'd recommend always handling side-effects of a request in an asynchronous job (using something like Sidekiq). Something like:
class Message < ActiveRecord::Base
  # Enqueue the side-effect after commit, so the job never sees uncommitted data.
  after_commit :enqueue_update_messages_count_job

  def enqueue_update_messages_count_job
    # Hand the counting off to an async worker (e.g. a Sidekiq job).
    Jobs::UpdateUserMessageCountJob.enqueue(self.id)
  end
end
While this may seem like overkill at first, it creates an architecture that is significantly more scalable. If counting the messages becomes slow, that makes the job slower but doesn't impact the usability of the product. In addition, if certain activities create lots of objects with the same side-effects (let's say you have a "signup" controller that creates a bunch of objects for a user, all of which trigger an update of user.updated_at), it becomes easy to throw out duplicate jobs and avoid updating the same field 20 times.
Pattern: Skipping the activerecord callback chain
Calling save on an ActiveRecord object runs validations and all the before and after callbacks. These can be slow and (at times) unnecessary. For example, updating a cached message_count value doesn't necessarily care about whether the user's email address is valid (or any other validation), and you may not want other callbacks running either. The same goes for merely touching a user's updated_at value to clear a cache. You can bypass validations and the ActiveRecord callback chain by calling user.update_column(:message_count, ..) to write that field directly to the database (note that update_attribute skips validations but still runs callbacks; update_column skips both). In theory this shouldn't be necessary in a well-designed application, but in practice some larger/legacy codebases make significant use of the ActiveRecord callback chain for business logic that you may not want to invoke.
--- Update #2 ---
On Deadlocks
One reason to avoid updating (or generally locking) a common/shared object from a concurrent request is that it can introduce Deadlock errors.
Generally speaking, a "deadlock" in a database occurs when two processes each need a lock that the other one holds. Neither thread can continue, so one must error out instead. In practice, detecting this precisely is hard, so some databases (like Postgres) raise a deadlock error after a thread has waited for an exclusive/write lock for a configured amount of time. While contention for locks is common (e.g. two updates both touching a 'session' object), a true deadlock is rare (thread A holds a lock on the session that thread B needs, while thread B holds a lock on a different object that thread A needs), so you may be able to partially address the problem by examining or extending your deadlock timeout. While that may reduce the errors, it doesn't fix the underlying issue that threads may wait for up to the deadlock timeout. An alternative approach is to keep a short deadlock timeout and rescue/retry a few times, as sketched below.
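In JDBC terms, for instance, a rescue/retry loop might look like this (illustrative only; SQLSTATE 40001 is the standard code for deadlock/serialization failure, MySQL also reports error 1213, and doWork is a hypothetical helper holding your transactional work):
import java.sql.Connection;
import java.sql.SQLException;

int attempts = 0;
while (true) {
    try (Connection conn = dataSource.getConnection()) {
        conn.setAutoCommit(false);
        doWork(conn);        // hypothetical: your inserts/updates
        conn.commit();
        break;
    } catch (SQLException e) {
        // The deadlock victim's transaction is already rolled back by the server.
        if ("40001".equals(e.getSQLState()) && ++attempts < 3) {
            continue;        // retry a few times, ideally with a short backoff
        }
        throw e;
    }
}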

Preventing query caching in MySQL

I'm using the tomcat connection pool via JNDI resources.
In the context.xml:
<Resource name="jdbc/mydb" auth="Container" type="javax.sql.DataSource"
username="myusr" password="mypwd" driverClassName="com.mysql.jdbc.Driver"
maxActive="1000" maxIdle="100" maxWait="10000"
url="jdbc:mysql://localhost:3306/mydatabase"
factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" />
In web.xml:
<resource-ref>
<description>DB Connection</description>
<res-ref-name>jdbc/mydb</res-ref-name>
<res-type>javax.sql.DataSource</res-type>
<res-auth>Container</res-auth>
</resource-ref>
The database is a MySQL one.
When I select some information, for example a product list, the same stale list is displayed even after a product insertion or deletion.
How can I prevent this? I want to see the updated list.
EDIT
The query_cache_size is 0 and query_cache_type is ON.
So where could the issue be? Why does query caching happen?
EDIT
I read about "RESET QUERY CACHE" and "FLUSH TABLES".
What is the difference between them?
Could using one of them cause issues in an auction/e-commerce scenario?
As documented under Consistent Nonlocking Reads:
If the transaction isolation level is REPEATABLE READ (the default level), all consistent reads within the same transaction read the snapshot established by the first such read in that transaction. You can get a fresher snapshot for your queries by committing the current transaction and after that issuing new queries.
[ deletia ]
If you want to see the “freshest” state of the database, use either the READ COMMITTED isolation level or a locking read:
SELECT * FROM t LOCK IN SHARE MODE;
You can set the default transaction isolation level in Tomcat via its Resource#defaultTransactionIsolation attribute.
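If you can't change the pool config, you can also request it per connection (a sketch; dataSource is the javax.sql.DataSource looked up from java:comp/env/jdbc/mydb):
import java.sql.Connection;

try (Connection conn = dataSource.getConnection()) {
    // READ COMMITTED: each consistent read sees a fresh snapshot.
    conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
    // ... run the product-list query as usual ...
}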
The connection pool does not have anything to do with data caching (unless you specifically configure it that way). It's best practice to use a connection pool for database access to prevent runaway connections (e.g. hitting the database with too many simultaneous connections) and to reuse connections that have already been opened (establishing a connection is typically quite expensive, so reuse pays off). You'll also want the statements themselves (as PreparedStatement) to be cached, as the next most expensive operation for a database is determining the execution plan. (This is independent of the actual result caching.)
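For instance, using PreparedStatement so the server can reuse the execution plan (a sketch; the product table and column names are made up, and whether statements are cached across pool checkouts depends on the pool's own settings):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

try (Connection conn = dataSource.getConnection();
     PreparedStatement ps = conn.prepareStatement(
         "SELECT id, name FROM product WHERE category = ?")) {
    ps.setString(1, "books");   // only the parameter changes between calls
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            // ... map the row ...
        }
    }
}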
Have you analyzed whether your stale data actually comes from MySQL, or whether you're caching at the application level?
Also, make sure that your insert and update transactions are actually committed; otherwise there obviously won't be any change, and the data merely looks like it's cached.
RESET QUERY CACHE only clears the query cache.
FLUSH TABLES closes all tables (after flushing any unwritten data) and also clears the query cache.
Clearing the cache cannot cause anything like the problem you are having. All it does is force subsequent queries to actually fetch the data from the tables (until those results are cached again).
Please note, the query cache is guaranteed to never show outdated data. Any committed write to any table referred to by a cached query removes that query from the cache. If you see outdated data, then another, external mechanism must be at work. For example, many ORMs do some row caching at some stage, and such a mechanism may be broken, or may produce unexpected results if not used exactly as intended.
In any case, if either query_cache_size = 0 or query_cache_type = OFF (or 0), then the query cache is disabled.

HandlerSocket transactions

In Redis, a transaction can be performed this way:
redis.watch('powerlevel')              # abort EXEC if 'powerlevel' changes meanwhile
current = redis.get('powerlevel').to_i # GET returns a string, so convert it
redis.multi()                          # start queuing commands
redis.set('powerlevel', current + 1)
redis.exec()                           # returns nil if the watched key was modified
Is it possible to perform this operation using HandlerSocket?
In general, what transaction-related features does HandlerSocket provide?
Comparing Redis "transactions" to a general purpose transactional engine is always a bit misleading. A Redis WATCH/MULTI/EXEC block is:
Not atomic (no rollback in case of error)
Consistent (there are not many consistency rules anyway with Redis)
Fully isolated (everything is serialized)
Possibly durable if AOF+fsync strategy is selected
So the full ACID properties which are commonly used to define a transaction are not completely provided by Redis. Contrary to most transactional engines, Redis provides very strong isolation, and does not attempt to provide any rollback capabilities.
The example provided in the question is not really representative IMO, since the same behavior can be achieved in a simpler way by just using:
redis.incr('powerlevel')
because Redis single operations are always atomic and isolated.
WATCH/MULTI/EXEC blocks are typically used when consistency between various keys must be enforced, or to implement optimistic locking patterns. In other words, if your purpose is just to increment isolated counters, there is no need to use a WATCH/MULTI/EXEC block.
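For completeness, here's what that optimistic-locking pattern looks like in Java with the Jedis client (a sketch; in classic Jedis versions exec() returns null when the watched key was modified, so you loop and retry):
import java.util.List;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

Jedis jedis = new Jedis("localhost");
List<Object> result;
do {
    jedis.watch("powerlevel");            // EXEC will abort if the key changes
    int current = Integer.parseInt(jedis.get("powerlevel"));  // assumes the key exists
    Transaction t = jedis.multi();        // start queuing commands
    t.set("powerlevel", String.valueOf(current + 1));
    result = t.exec();                    // null => watched key changed, retry
} while (result == null);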
HandlerSocket is a completely different beast. It is built on top of MySQL's generic handler interface, and the transactional behavior depends on the underlying storage engine. For instance, when it is used with MyISAM, there are no ACID transactions, but consistency is ensured by a R/W lock at the table level. With InnoDB, ACID transactions are used with the default isolation level (which, AFAIK, can be set in the InnoDB configuration). InnoDB implements MVCC (multi-versioning concurrency control), so locking is much more complex than with MyISAM.
HandlerSocket works with two pools of worker threads (one for read-only connections, one for write-oriented connections). You are expected to use several read worker threads but only one write thread (probably to decrease locking contention). So in the base configuration, write operations are serialized but read operations are not. AFAIK, the only way to get the same isolation semantics as Redis is to use the write-oriented socket for both read and write operations and keep a single write thread (full serialization of all operations). That will hurt scalability, though.
The HandlerSocket protocol gives no access to transactional capabilities. At each event-loop iteration, it collects all the pending operations (coming from all the sockets) and performs a single transaction (only relevant with InnoDB) covering all of them. AFAIK, the user has no way to alter the scope of this transaction.
The conclusion is that it is generally not possible to emulate the behavior of a Redis WATCH/MULTI/EXEC block with HandlerSocket.
Now, back to the example: if the purpose is just to increment counters in a consistent way, this is fully supported by the HandlerSocket protocol. For instance, the +/- (increment/decrement) operations are available, as are the U? operation (similar to the Redis GETSET command) and +?/-? (increment/decrement returning the previous value).

Is a MySQL procedure thread safe?

I am developing some websites that need to interact with a database. I won't bring up a complicated example here; my question actually comes down to: is a MySQL procedure thread safe? If one client on my site triggers a procedure, can I assume it is atomic, or could it interfere with another request from another user?
It depends on whether you're using SQL transactions. Without the appropriate use of transactions and the right isolation level, it's possible for a procedure to expose data from a write partway through its execution, visible to other queries/procedures before the whole procedure has completed.
In short: a given procedure will only be atomic if you wrap it in a transaction with an appropriate isolation level.
The database will handle concurrency for you. This is normally done via transactions - any set of statements within a transaction is considered atomic and isolated from other processes. In some databases, a stored procedure will be in an implicit transaction (so you don't need to declare one) - read the documentation for your RDBMS.
Sometimes this will mean that records are locked while another process tries to use them.
You will need to write your application so it can detect such occurrences and retry.
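For instance, wrapping a procedure call in an explicit transaction from JDBC looks roughly like this (a sketch; the procedure name and parameters are made up):
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

try (Connection conn = dataSource.getConnection()) {
    conn.setAutoCommit(false);            // start an explicit transaction
    try (CallableStatement cs = conn.prepareCall("{call transfer_points(?, ?)}")) {
        cs.setLong(1, fromUserId);
        cs.setLong(2, toUserId);
        cs.execute();
        conn.commit();                    // all the procedure's writes appear at once
    } catch (SQLException e) {
        conn.rollback();                  // e.g. deadlock or lock wait timeout: retry
        throw e;
    }
}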
It really depends on how your server is configured to use transactions. There are tradeoffs to consider depending on how your data is used and whether dirty, non-repeatable, or phantom reads are acceptable for your application.
Yes.
It's the DB's job to ensure thread safety among its worker threads, and it's your job to ensure thread safety among your application threads. Since there's a separation between the DB server and your application, you don't need to worry about thread safety in this case. MySQL's data-locking mechanisms will prevent you from corrupting the data in the DB due to simultaneous access from multiple threads in your own app.
Thread safety is more about modifying in-memory data that is shared among multiple threads within your app. Since the DB server is its own separate application, it essentially protects you from the scenario you've outlined above.

MySQL: Transactions across multiple threads

Preliminary:
I have an application which maintains a thread pool of about 100 threads. Each thread can last about 1-30 seconds before a new task replaces it. When a thread ends, it will almost always insert 1-3 records into a table that is shared by all of the threads. Right now, no transactional support exists, but I am trying to add it. Also, the table in question is InnoDB. So...
Goal
I want to implement a transaction for this. The rules for whether this transaction commits or rolls back reside in the main thread. Basically, there is a simple function that returns a boolean.
Can I implement a transaction across multiple connections?
If not, can multiple threads share the same connection? (Note: there are a LOT of inserts going on here, and that is a requirement).
1) No, a transaction is limited to a single DB connection.
2) Yes, a connection (and transaction) can be shared across multiple threads.
Well, as stated in another answer, you can't create a transaction across multiple connections. And you can share a single connection across threads. However, you need to be very careful with that: make sure that only one thread is writing to the connection at a time. You can't just have multiple threads talking over the same connection without synchronizing their activities in some way; bad things will likely happen if you allow two threads to talk at once (memory corruption in the client library, etc.). Using a mutex or critical section to protect the connection conversations is probably the way to go, as sketched below.
-Don
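A minimal sketch of that approach (the class name and SQL are made up; the point is simply that the lock serializes every use of the shared connection):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.locks.ReentrantLock;

class SharedConnection {
    private final Connection conn;                     // one connection, many threads
    private final ReentrantLock lock = new ReentrantLock();

    SharedConnection(String url, String user, String pass) throws SQLException {
        conn = DriverManager.getConnection(url, user, pass);
        conn.setAutoCommit(false);                     // one transaction spanning all threads
    }

    // MySQL client connections are not safe for concurrent use,
    // so every conversation on the connection takes the lock first.
    void insert(String sql) throws SQLException {
        lock.lock();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.executeUpdate();
        } finally {
            lock.unlock();
        }
    }

    void finish(boolean commit) throws SQLException {  // decided by the main thread
        lock.lock();
        try {
            if (commit) conn.commit(); else conn.rollback();
        } finally {
            lock.unlock();
        }
    }
}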
Sharing connections between lots of threads is usually implemented by using a connection pool. Every thread can request a connection from the pool, use it for its purposes (one or more transactions, committed or rolled back) and hand it back to the pool once the task is finished.
This is what application servers offer you. They will take care of transactions, too: when the method that requested the transaction finishes normally, changes are committed; if it throws an exception, the database transaction is rolled back.
I suggest you have a look at Java EE 5 or 6 - it is very easy to use and can even be employed in embedded systems. For easy start, have a look at Netbeans and the Glassfish application server. However the general concepts apply to all application servers alike.
As for InnoDB, it will not have any problems handling lots of transactions. Under the supervision of the app server you can concentrate on the business logic and do not have to worry about half-written updates or anyone seeing updates/inserts before the transaction they originate from has been committed.
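Outside an app server, the per-thread pattern with a plain pool looks roughly like this (a sketch; pool is any javax.sql.DataSource, e.g. the Tomcat pool from the earlier question, and the table is made up):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Each worker thread checks out its own connection, runs its own
// transaction, and returns the connection to the pool when done.
try (Connection conn = pool.getConnection()) {
    conn.setAutoCommit(false);
    try (PreparedStatement ps = conn.prepareStatement(
            "INSERT INTO results (task_id, value) VALUES (?, ?)")) {
        ps.setLong(1, taskId);
        ps.setString(2, value);
        ps.executeUpdate();
        conn.commit();
    } catch (SQLException e) {
        conn.rollback();
        throw e;
    }
}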
InnoDB uses MVCC (multi version concurrency control), effectively presenting each transaction with a snapshot of the whole database as of the time when it was started. You can read more about MVCC here in a related question: Question 812512