Preventing query caching in MySQL

I'm using the Tomcat connection pool via JNDI resources.
In the context.xml:
<Resource name="jdbc/mydb" auth="Container" type="javax.sql.DataSource"
username="myusr" password="mypwd" driverClassName="com.mysql.jdbc.Driver"
maxActive="1000" maxIdle="100" maxWait="10000"
url="jdbc:mysql://localhost:3306/mydatabase"
factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" />
In web.xml:
<resource-ref>
<description>DB Connection</description>
<res-ref-name>jdbc/mydb</res-ref-name>
<res-type>javax.sql.DataSource</res-type>
<res-auth>Container</res-auth>
</resource-ref>
The database is a MySQL one.
When I select some information, for example a product list, the same list is still displayed after a product is inserted or deleted.
How can I prevent this? I would like to see the updated list.
EDIT
The query_cache_size is 0 and query_cache_type is ON.
So where could the issue be? Why does the query caching happen?
EDIT
I read about "RESET QUERY CACHE" and "FLUSH TABLES".
What is the difference between them?
By using one of them, could there be issues in an auction/e-commerce scenario?

As documented under Consistent Nonlocking Reads:
If the transaction isolation level is REPEATABLE READ (the default level), all consistent reads within the same transaction read the snapshot established by the first such read in that transaction. You can get a fresher snapshot for your queries by committing the current transaction and after that issuing new queries.
[ deletia ]
If you want to see the “freshest” state of the database, use either the READ COMMITTED isolation level or a locking read:
SELECT * FROM t LOCK IN SHARE MODE;
You can set the default transaction isolation level in Tomcat via its Resource#defaultTransactionIsolation attribute.
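For illustration, a minimal JDBC sketch of the per-connection alternative (the product table and its columns are hypothetical; the JNDI name comes from the config in the question). Requesting READ COMMITTED means each query reads the latest committed data instead of the transaction's first snapshot:
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class FreshReadSketch {
    public static void main(String[] args) throws Exception {
        DataSource ds = (DataSource) new InitialContext()
                .lookup("java:comp/env/jdbc/mydb");
        try (Connection con = ds.getConnection()) {
            // Per-connection alternative to the pool-wide
            // defaultTransactionIsolation attribute mentioned above.
            con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, name FROM product")) {
                while (rs.next()) {
                    System.out.println(rs.getLong("id") + ": " + rs.getString("name"));
                }
            }
        }
    }
}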

The connection pool has nothing to do with data caching (unless you specifically configure it that way). It is best practice to use a connection pool for database access, both to prevent runaway connections (e.g. hitting the database with too many simultaneous connections) and to reuse connections once they have been opened, since establishing a connection is typically quite expensive. You'll also want the statements themselves (as PreparedStatement) to be cached, because the next most expensive operation for a database is determining the execution plan. (This is independent of any result caching.)
Have you analyzed whether your cached data actually comes from MySQL, or whether you are caching at the application level?
Also, make sure that your insert and update transactions are actually committed; otherwise there obviously won't be any change, and the data will look as if it were cached.
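As a quick sanity check, a sketch of an explicitly committed insert (DataSource obtained as in the question; the table and column names are made up):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class CommitCheckSketch {
    static void insertProduct(DataSource ds, String name) throws SQLException {
        try (Connection con = ds.getConnection()) {
            con.setAutoCommit(false); // take explicit control of the transaction
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO product (name) VALUES (?)")) {
                ps.setString(1, name);
                ps.executeUpdate();
                con.commit(); // without this, other sessions never see the row
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }
}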

RESET QUERY CACHE only clears the query cache.
FLUSH TABLES closes all tables (after flushing any unwritten data) and also clears the query cache.
Clearing the cache cannot cause anything like the problem you are having. All it does is force subsequent queries to actually fetch the data from the tables (until their results are cached again).
Please note that the query cache is guaranteed never to show outdated data. Any committed write to any table referred to by a query in the cache removes that query from the cache. If you see outdated data, then some other, external mechanism must be in action. For example, many ORMs do some row caching at some stage, and such a mechanism may be broken, or may produce unexpected results if not used exactly as intended.
And anyway, if either query_cache_size = 0 or query_cache_type = OFF (or 0), then the query cache is disabled.
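If you want to verify what the server is actually doing, you can inspect the cache settings and hit counters; a sketch using the same pool (the variable names are standard MySQL ones):
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

public class QueryCacheCheckSketch {
    static void printCacheState(DataSource ds) throws Exception {
        try (Connection con = ds.getConnection();
             Statement st = con.createStatement()) {
            // query_cache_size / query_cache_type: is the cache enabled at all?
            try (ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'query_cache%'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
                }
            }
            // Qcache_hits and friends: is the cache actually being used?
            try (ResultSet rs = st.executeQuery("SHOW STATUS LIKE 'Qcache%'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
                }
            }
        }
    }
}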

Related

MySQL Debezium connector for RDS in production caused deadlocks

We are creating a data pipeline from MySQL in RDS to Elasticsearch for building search indexes,
using Debezium CDC with its MySQL source and Elasticsearch sink connectors.
Since the MySQL instance is in RDS, we had to give the MySQL user the LOCK TABLES permission for the two tables we wanted CDC on, as mentioned in the docs.
We also have various other MySQL users performing transactions that may touch either of those two tables.
As soon as we connected the MySQL connector to our production database, a lock was created and our whole system went down. After realising this we quickly stopped Kafka and removed the connector, but the locks kept increasing; the situation was only resolved after we stopped all new queries by halting our production code and manually killing the processes.
What could be the potential cause of this, and how can we prevent it?
I'm only guessing because I don't know your query traffic. I would assume the locks you saw increasing were the backlog of queries that had been waiting for the table locks to be released.
Here is the sequence of events I believe happened:
1. Debezium takes table locks on your two tables.
2. The application keeps working and tries to execute queries that access those locked tables. The queries begin waiting for the locks to be released, and will wait for up to 1 year (the default lock_wait_timeout value).
3. While you spend some minutes trying to figure out why your site is not responding, a large number of blocked queries accumulates, potentially as many as max_connections. Once all the allowed connections are occupied by blocked queries, the application cannot connect to MySQL at all.
4. Finally you stop the Debezium process that is trying to read its initial snapshot of data, and it releases its table locks.
5. As soon as the table locks are released, the waiting queries can proceed.
But many of them do need to acquire locks too, if they are INSERT/UPDATE/DELETE/REPLACE or if they are SELECT ... FOR UPDATE or other locking statements.
Since there are so many of these queries queued up, it's more likely for them to be requesting locks that overlap, which means they have to wait for each other to finish and release their locks.
Also, because there are hundreds of queries executing at the same time, they overtax system resources like CPU, causing high load, which slows them all down further. Queries take longer to complete, so queries that block each other have to wait even longer.
Meanwhile the application is still trying to accept requests, and therefore is adding more queries to execute. They are also subject to the queueing and resource exhaustion.
Eventually you stop the application, which at least allows the queue of waiting queries to gradually drain. As the system load goes down, MySQL is able to process the remaining queries more efficiently and soon finishes them all.
The suggestion by the other answer to use a read replica for your Debezium snapshot is a good one. If your application can read from the master MySQL instance for a while, then no query will be blocked on the replica while Debezium has it locked. Eventually Debezium will finish reading all the data, and release the locks, and then go on to read only the binlog. Then the app can resume using the replica as a read instance.
If your binlog uses GTID, you should be able to make a CDC tool like Debezium read the snapshot from the replica, then when that's done, switch to the master to read the binlog. But if you don't use GTID, that's a little more tricky. The tool would have to know the binlog position on the master corresponding to the snapshot on the replica.
If the locking is a problem and you can accept the trade-off between locking and consistency, then take a look at the snapshot.locking.mode config option.
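For example, with Debezium's embedded engine the option can be set like this (a sketch; the connection values are placeholders, and whether a weaker mode is safe depends on your write traffic during the snapshot):
import java.util.Properties;

public class DebeziumSnapshotConfigSketch {
    static Properties connectorProps() {
        Properties props = new Properties();
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        props.setProperty("database.hostname", "mydb.example.com"); // placeholder
        props.setProperty("database.user", "cdcuser");              // placeholder
        props.setProperty("database.password", "secret");           // placeholder
        // "minimal" holds the global read lock only while reading the schema;
        // "none" takes no locks at all, at the risk of an inconsistent snapshot
        // if writes occur while it is being taken.
        props.setProperty("snapshot.locking.mode", "minimal");
        return props;
    }
}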
Use a replica to prevent the LOCK TABLES statement from being executed in the first place. Why does Debezium need to lock tables at all? All CDC tools fetch events from the binlogs.
The reason is that Debezium does not behave as its documentation says (as of version 1.5): once acquisition of the global read lock (FLUSH TABLES WITH READ LOCK) fails, it executes LOCK TABLES instead, and that lock is only released after the snapshot has been read. If you see "Unable to refresh and obtain the global read lock, the table read lock will be used after reading the table name" in the log, congratulations, you are the lucky one.

Ehcache non-transactional context with transactional cache

I have an Ehcache cache instance configured with transactionalMode="local".
Now, when I try to put an element in said cache outside of a transaction, I get
net.sf.ehcache.transaction.TransactionException: transaction not started.
Does this mean that every call on transactional cache instance needs to be in a transaction context?
I'm doing some custom cache pre-loading on startup, and I don't want Ehcache transaction (and copyOnRead/Write) overhead. Also, since I'll be dealing with logically immutable objects, I'd like to be able to read them from cache without transaction scope, if possible.
Do you really need to use local transactions in the first place? I.e., do you need to put multiple cache entries atomically in a single operation?
In any case, if you use transactionalMode="local", you're kind of stuck having to perform all your cache operations within a transaction boundary (even reads).
But if you need more granularity, I'd recommend looking at Ehcache explicit locking, which can be used as a custom alternative to XA or local transactions (without having to specify transactionalMode in your Ehcache config). More at http://ehcache.org/documentation/apis/explicitlocking
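A sketch of what that can look like (the cache name and the loader are made up):
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class ExplicitLockingSketch {
    static void refresh(String key) {
        Cache cache = CacheManager.getInstance().getCache("products"); // hypothetical cache
        cache.acquireWriteLockOnKey(key);
        try {
            // Only one thread at a time can rebuild this entry.
            cache.put(new Element(key, loadProduct(key)));
        } finally {
            cache.releaseWriteLockOnKey(key);
        }
    }

    static Object loadProduct(String key) {
        return key; // stand-in for the real loader
    }
}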
Hope that helps.

How to cache popular queries to avoid both stampedes and blank results

On the customizable front page of our web site, we offer users the option of showing modules showing recently updated content, choosing from well over 100 modules.
All of the data is generated by MySQL queries, the results of which are cached via memcached. Our current system works like this: when a user loads a page containing modules, they are immediately served the data from cache, and the query is added to a queue to be updated by a separate Gearman process (so that the page load does not wait for the MySQL query). Each queued query is then run once every 15 minutes to refresh the data in the cache. The queue of queries itself is periodically purged so that we do not continually refresh data that has not been requested recently.
The problem is what to do when the cache is empty, for some reason. This doesn't happen often, but when it does, the user is currently shown an empty module, and the data is refreshed by the Gearman process so that a bit later, when the same (or a different) user reloads the page, there is data to show.
Our traffic is such that, if we were to run the query live for the user when the cache is empty, we would have a serious stampede problem: we'd be running the same (possibly slow) query many times as many users loaded the page. Is there any way to solve the "blank module" problem without opening up the risk of a stampede?
This is an interesting implementation, though it varies a bit from the way most people typically put memcached in front of MySQL.
In most cases, the application first checks memcached to see if there is an available entry. If so, it serves it from memcached and never queries the database at all. If there is a cache miss, the query is made against the database, the results are added to memcached, and the information is returned to the caller. This is how you would typically build up your cache for read queries.
In cases where data is being updated, the update would be made against the database, and then the appropriate data in memcached invalidated and/or updated. Similarly for inserts: you could either do nothing to the cache (and let the next read on that record populate it), or you could actively add the data for the new record to the cache, depending on your application's needs.
In this way you wouldn't need to take the extra step of calling the database to get authoritative data after getting initial data from memcached. The data in memcached would be a copy of the authoritative data which is just updated/invalidated upon updates/inserts.
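A read-through sketch of that pattern using the spymemcached client (the key scheme and the queryProduct helper are hypothetical):
import javax.sql.DataSource;
import net.spy.memcached.MemcachedClient;

public class ReadThroughSketch {
    static Object getProduct(MemcachedClient mc, DataSource ds, long id) throws Exception {
        String key = "product:" + id;
        Object cached = mc.get(key);
        if (cached != null) {
            return cached; // cache hit: the database is never touched
        }
        Object fromDb = queryProduct(ds, id); // hypothetical DB lookup
        mc.set(key, 900, fromDb);             // cache for 15 minutes
        return fromDb;
    }

    static Object queryProduct(DataSource ds, long id) {
        return null; // stand-in for the real query
    }
}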
Based on your comments, one thing you might want to try in order to prevent a flood of queries against your database on a cache miss is to use a mutex of sorts. For example, when the first client hits memcached and gets a cache miss for a lookup, it could insert a temporary value into memcached indicating that the data is pending, then make the query against the database, and then update the memcached entry with the result.
On the client side, when you get a cache miss or a "pending" result, you could simply retry the cache after a certain period of time (which you may want to increase exponentially). So perhaps they first wait 1 second, then try again in 2 seconds if they still get a "pending" result, then retry in 4 seconds, and so on.
This could mean somewhat more requests against the memcached server, but it should resolve the problems at the database layer.
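Putting the mutex and back-off together, a sketch (again with spymemcached; runQuery is a hypothetical stand-in for the slow MySQL query):
import net.spy.memcached.MemcachedClient;

public class StampedeGuardSketch {
    static Object fetchModule(MemcachedClient mc, String key) throws Exception {
        Object value = mc.get(key);
        if (value != null) {
            return value;
        }
        // add() succeeds for exactly one caller; that caller runs the query.
        if (mc.add(key + ":lock", 60, 1).get()) {
            Object fresh = runQuery(key);   // the slow query runs only once
            mc.set(key, 900, fresh);
            mc.delete(key + ":lock");
            return fresh;
        }
        // Someone else is refreshing: back off with increasing delays.
        for (int delayMs = 1000; delayMs <= 8000; delayMs *= 2) {
            Thread.sleep(delayMs);
            value = mc.get(key);
            if (value != null) {
                return value;
            }
        }
        return null; // still nothing: show the empty module as a last resort
    }

    static Object runQuery(String key) {
        return key; // stand-in for the real MySQL query
    }
}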

How to solve lock_wait_timeout, subsequent rollback, and data disappearance in MySQL 5.1.38

I am using TopLink with Struts 2 for a high-usage app; the app constantly accesses a single table with multiple reads and writes per second. This causes a lock_wait_timeout error, and the transaction rolls back, making the data just entered disappear from the front end (MySQL's autocommit has been set to 1). The exception has been caught and sent to an error page in the app, but a rollback still occurs (it must be TopLink rolling back, as MySQL's rollback feature has not been turned on). The raw data file, ibdata1, shows the entry when opened in an editor. As this happens infrequently, I have not been able to replicate it under test conditions.
Can anyone be kind enough to provide some way out of this dilemma? What sort of approach suits such a high-access pattern (constant reads and writes to the same table)? Any help would be greatly appreciated.
What is the nature of your concurrent reads/updates? Are you updating the same rows constantly from different sessions? What do you expect to happen when two sessions update the same row at the same time?
If it is just reads conflicting with updates, consider reducing your transaction isolation on your database.
If you have multiple writes conflicting, then you may consider using pessimistic locking to ensure each transaction succeeds. Either way you will have a lot of contention, so you may want to reconsider your data model or your application's usage of the data.
See,
http://en.wikibooks.org/wiki/Java_Persistence/Locking
Lock wait timeouts are a fact of life for transactional databases. The normal response should usually be to trap the error and re-run the transaction. Not many developers seem to understand this, so it bears repeating: if you get a lock_wait_timeout error and you still want to commit the transaction, run it again, as in the sketch below.
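A sketch of such a retry loop in JDBC (the statement inside the transaction is a placeholder; error code 1205 is MySQL's lock-wait-timeout error):
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;

public class RetryOnLockTimeoutSketch {
    static final int LOCK_WAIT_TIMEOUT = 1205; // MySQL ER_LOCK_WAIT_TIMEOUT

    static void runWithRetry(DataSource ds) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try (Connection con = ds.getConnection()) {
                con.setAutoCommit(false);
                try (Statement st = con.createStatement()) {
                    st.executeUpdate("UPDATE t SET n = n + 1 WHERE id = 1"); // placeholder
                    con.commit();
                    return; // success
                } catch (SQLException e) {
                    con.rollback();
                    if (e.getErrorCode() != LOCK_WAIT_TIMEOUT || attempt >= 3) {
                        throw e; // not a lock timeout, or too many retries
                    }
                }
            }
        }
    }
}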
Other things to look out for:
- Persistent connections combined with not explicitly COMMITting your transactions lead to long-running transactions that hold unnecessary locks.
- Since you have auto-commit off, if you log in from the mysql CLI (or any other interactive query tool) and start running queries, you stand a significant chance of locking rows and not releasing them in a timely manner.

Multiple processes accessing Django db backend; records not showing up until manually calling _commit

I have a Django project in which multiple processes access the backend MySQL db. One process creates records while a second process tries to read them. I am having an issue where the second process can't actually find the records until I manually call connection._commit().
This question has been asked before:
caching issues in MySQL response with MySQLdb in Django
The OP stated that he solved the problem, but didn't quite explain how. Can anyone shed some light on this? I'd like to be able to access the records without manually calling _commit().
Thanks,
Asif
He said:
Django's autocommit isn't an actual autocommit in the db.
So, you have to ensure that autocommit is set at the db level. Otherwise, because of transaction isolation, processes will not see changes made by a different process (different connection), until a commit is done. AFAIK this is not especially a Django issue, other than the lack of clarity in the docs about Django autocommit != db autocommit.
Update: Paraphrasing slightly from the MySQL docs:
REPEATABLE READ is the default isolation level for InnoDB. For consistent reads, there is an important difference from the READ COMMITTED isolation level: all consistent reads within the same transaction read the snapshot established by the first read. (My emphasis.)
So, with REPEATABLE READ you only get, on subsequent reads, what was read the first time. With READ COMMITTED, each read creates and reads its own fresh snapshot, so you see subsequent commits from other transactions. So, in answer to your comment: your change to the transaction isolation level is correct.
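To illustrate the difference (sketched here in JDBC because the behavior belongs to MySQL, not to the client library; the record table is hypothetical):
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.DataSource;

public class IsolationDemoSketch {
    static void countTwice(DataSource ds) throws Exception {
        try (Connection con = ds.getConnection();
             Statement st = con.createStatement()) {
            // Under READ COMMITTED each SELECT sees rows committed by other
            // connections since the previous read; under REPEATABLE READ both
            // SELECTs would return the first read's snapshot.
            con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            con.setAutoCommit(false);
            printCount(st);     // first read
            Thread.sleep(5000); // suppose another process commits rows meanwhile
            printCount(st);     // second read reflects the new commits
            con.commit();
        }
    }

    static void printCount(Statement st) throws Exception {
        try (ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM record")) {
            rs.next();
            System.out.println(rs.getLong(1));
        }
    }
}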
Are you running the processes as views? If so, they are probably committing when the request finishes processing, but it sounds like you're running these processes concurrently. If you run the processes outside of a view, they should commit on each save.