Which Hibernate Queries Hit the Second-Level Cache? - mysql

I just added Memcached as my second-level cache for Hibernate. Performance actually took a significant hit after installing the cache: all queries are slower. I realized that the reason is probably that most of my queries aren't based on ID, so the second-level cache is not being used.
My question is: shouldn't non-ID-based queries just go straight to the database without ever hitting the cache? In other words, shouldn't the decision about whether a query is "cache appropriate" be made before hitting the cache? If so, shouldn't performance be faster?

When I was reading the Hibernate code, it looked like Hibernate cannot reuse the cache for HQL queries (it has no compiler from HQL to its caching mechanism).
I'd recommend using fjorm instead of Hibernate. Disclaimer: I'm a founder of fjorm.
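For readers who land here: entity loads by ID (session.get(...)) hit the second-level cache automatically, while an HQL query consults the separate query cache only when it is explicitly marked cacheable. A minimal sketch of that, assuming a Hibernate 5-era setup; the Person entity and its mapping are made up:

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.Session;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;
import org.hibernate.query.Query;

public class QueryCacheSketch {

    // Hypothetical entity; the @Cache annotation lets ID-based loads
    // (session.get) be served from the second-level cache.
    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    public static class Person {
        @Id Long id;
        String city;
    }

    // Assumes hibernate.cache.use_second_level_cache=true and
    // hibernate.cache.use_query_cache=true are set in the configuration.
    static List<Person> findByCity(Session session, String city) {
        Query<Person> query = session.createQuery(
                "from Person p where p.city = :city", Person.class);
        query.setParameter("city", city);
        // Without this call, the HQL query never consults the query cache:
        query.setCacheable(true);
        return query.list();
    }
}
```

Note that even a cacheable query stores only the matching IDs; the entity data itself is still fetched from the second-level cache, so both caches need to be enabled for this to pay off.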

Related

Should I implement my own caching or rely on read-replicas?

We have an enterprise application that uses an SQL database. The database access characteristics are about 90% reads. The data that does get updated or created needs to be up-to-date immediately. The cache needs to be correctly invalidated with high certainty. The entities are referred to by their primary key for 98% of the cases.
The application is based on Node.js and is AWS-native. Since the application is AWS-native, I'd like to rely on managed services from AWS rather than hosting my own. One option is to implement our own read-through Redis-based cache: upon retrieving entities, we'd check the cache, and if the data is not cached, we'd put it into the cache before returning it to the user. The parts of the code that update those entities would invalidate the cache by primary key.
Generally speaking, cache coherency is one of the most challenging problems in computer science to get right. I am of the opinion that rather than implementing a Redis cache and thinking through all of the possible scenarios for correctly invalidating it, it is wiser to configure an Aurora read-replica specifically for reading frequently accessed entities. The RDBMS will do a much better job at caching than anything we can build ourselves.
So, I am facing two options -- go through the effort of implementing my own caching, or use read replicas. My personal opinion is to use a read replica.
Any advice is greatly appreciated, as always.
Yes, you're right: cache invalidation is a tough problem. The simplest solution is to add code to your data writes that replaces the cached values, so they're always current. But this is easy only if the cached values have a pretty much 1-to-1 correlation with rows in your database.
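As a minimal sketch of that write-through replacement, paired with the usual read path (Java with Jedis here; the key scheme, TTL, and DB helpers are placeholders):

```java
import redis.clients.jedis.Jedis;

public class UserCache {
    private final Jedis jedis = new Jedis("localhost", 6379); // assumed Redis endpoint

    // Read path: check the cache first, fall back to the database on a miss.
    public String getUser(long id) {
        String key = "user:" + id;                 // hypothetical key scheme
        String cached = jedis.get(key);
        if (cached != null) return cached;
        String fresh = loadUserFromDb(id);         // hypothetical DB call
        jedis.setex(key, 3600, fresh);             // arbitrary 1-hour TTL as a safety net
        return fresh;
    }

    // Write path: update the database, then replace the cached value in place,
    // so readers never see stale data.
    public void updateUser(long id, String userJson) {
        writeUserToDb(id, userJson);               // hypothetical DB call
        jedis.setex("user:" + id, 3600, userJson);
    }

    private String loadUserFromDb(long id) { return "{\"id\":" + id + "}"; } // stub
    private void writeUserToDb(long id, String json) { /* stub */ }
}
```

This stays simple precisely because each cached value corresponds to one row; the moment a cache entry aggregates several queries, the write path no longer knows which keys to replace.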
An advantage of your own cache is that you can cache data that is not 1-to-1 with rows in the database. You might, for example, cache an entire HTML fragment for a drop-down menu, which could be the result of several SQL queries. It can be quite an advantage to cache data that is higher up the "food chain," so to speak, but cache invalidation becomes less straightforward; this works best for the results of queries that don't change often.
Using a read-replica is not a substitute for using a cache. Querying a read-replica still carries the overhead of making a database connection, authentication, SQL query parsing and optimization, locking, and everything else that goes into the workings of an RDBMS.
Querying data from a cache can be orders of magnitude faster.
Both have their place. It's best to use both a cache and a read-replica for different tasks. I would also add message queues as an important technology. I believe database, cache, and queue form a three-legged stool.
But you must have experience and judgment to know when each is the best tool for a given case.

Where to use MySQL query caching

Problem:
I am developing a system and picked query caching for faster response times. Now I want to find out which kinds of traffic on a web application are a good fit for query caching and which are not, and what the downsides of query caching are.
Whether the Query Cache is good for you depends on:
which MySQL version you are using
the scale of your application
what kind of queries you want to cache
How it works
If the MySQL Query Cache is enabled, MySQL won't go to the trouble of parsing and executing an identical query every time it arrives. Whenever a query comes in, MySQL looks for a byte-for-byte identical statement in the query cache; if it finds one, it skips parsing and execution entirely and returns the stored result set straight to the client.
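If you want to verify whether the cache is actually being hit (on MySQL 5.7 or earlier, where the feature still exists), you can watch the Qcache_hits status counter around a repeated statement. A quick sketch with plain JDBC; the connection details and the city table are placeholders, and the server must have query_cache_type enabled:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QcacheProbe {
    static long qcacheHits(Statement st) throws Exception {
        try (ResultSet rs = st.executeQuery("SHOW STATUS LIKE 'Qcache_hits'")) {
            rs.next();
            return rs.getLong(2); // column 2 holds the counter's value
        }
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             Statement st = conn.createStatement()) {
            long before = qcacheHits(st);
            // The statement text must be byte-for-byte identical to get a hit:
            st.executeQuery("SELECT name FROM city WHERE id = 1").close();
            st.executeQuery("SELECT name FROM city WHERE id = 1").close();
            System.out.println("Query cache hits gained: " + (qcacheHits(st) - before));
        }
    }
}
```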
Issues & Limitations
A point to be clear on: the query cache stores the text of a SELECT together with its complete result set, yet you will still never receive old/stale data from it, because invalidation is aggressive. If any underlying table of a cached query undergoes a change, every cached query that uses that table is invalidated.
Among other things, there are serious limitations to the Query Cache. It is not used for queries issued from within stored procedures, functions, and triggers, nor for subqueries of an outer query.
It was once considered a great tool for speeding up queries, but the MySQL development team has since decided to retire the feature because of the scalability issues they found with it.
Do read this article on the MySQL Server Team's blog about retiring the Query Cache in MySQL 8.0.

Using Redis to cache SQL result

I have a SQL-based application, and I'd like to cache the results using Redis. You can think of the application as an address book with multiple SQL tables. The application performs the following tasks:
40% of the time:
Create a new record / Update an existing record
Bulk update multiple records
Review an existing record
60% of the time:
Search records based on user's criteria
This is my current approach:
The system caches a record when the record is created or updated.
When a user performs a search, the system caches the query result.
On top of that, I have a Redis look-up table (a Redis Set) which stores the MySQL record ID and the Redis cache key. That way I can delete the Redis cache entries when a MySQL record changes (e.g., in a bulk update).
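Sketched in Java with Jedis (the key names are invented; any Redis client works the same way), that look-up-table approach is roughly:

```java
import java.util.Set;
import redis.clients.jedis.Jedis;

public class SearchCacheIndex {
    private final Jedis jedis = new Jedis("localhost", 6379); // assumed endpoint

    // When a search result is cached, index its cache key under every record it contains.
    public void cacheSearchResult(String cacheKey, String resultJson, Iterable<Long> recordIds) {
        jedis.setex(cacheKey, 3600, resultJson);         // arbitrary 1-hour TTL
        for (long id : recordIds) {
            jedis.sadd("record-index:" + id, cacheKey);  // record ID -> cache keys
        }
    }

    // When a record changes (including bulk updates), drop every cached search
    // that contained it.
    public void invalidateRecord(long id) {
        String indexKey = "record-index:" + id;
        Set<String> keys = jedis.smembers(indexKey);
        if (!keys.isEmpty()) {
            jedis.del(keys.toArray(new String[0]));
        }
        jedis.del(indexKey);
    }
}
```

As the next paragraph explains, the weakness is that this index can only invalidate searches that already contained a record, never searches that a brand-new record should now match.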
What if a new record is created after the system has cached a search result? If the new record matches the search criteria, the system will keep returning the old cached result (which does not include the new record) until that cache entry is deleted, and that won't happen until an existing record in the cached result is updated.
The search is driven by the users, and the possible combinations of search conditions are countless. It is not feasible to work out which cached results should be deleted when a new record is created.
So far, the only solution I see is to remove all cached results for a MySQL table whenever a record is created. However, this is not a good choice, because lots of records are created daily.
In this situation, what's the best way to implement Redis on top of MySQL?
Here's a surprising thing when it comes to PHP and MySQL (I am not sure about other languages): not caching stuff in Memcached or Redis is actually faster. Much faster. Basically, if you just built your app and queried MySQL directly, you'd get more out of it.
Now for the "why" part.
InnoDB, the default engine, is a superb engine. Specifically, its memory management (allocation and so on) is superior to any in-memory storage solution. That's a fact; you can look it up or take my word for it. It will, at the very least, perform as well as Redis.
Now, here's what happens in your app: you query MySQL and cache the result in Redis. However, MySQL is also smart enough to keep results cached. All you've done is create an additional file descriptor for the connection to Redis, and use extra RAM to cache a result that MySQL has already cached.
Here comes another interesting part: the preferred way of serving PHP scripts is php-fpm, which is much quicker than any mod_* crap out there. Down to the core, php-fpm is a supervisor process that spawns child processes, and those don't shut down after a script is served, which means they keep their MySQL connections alive: connect once, use multiple times. If you serve scripts using php-fpm, they reuse the already established connection to MySQL, so you won't be opening and closing connections for each request. This is extremely resource-friendly and gives you a lightning-fast path to MySQL. And MySQL, being memory-efficient and already holding the cached result, is much quicker than Redis.
Now, what does all of this mean for you? A proper setup gives you code that is small, simple, and easy to maintain; it doesn't involve Redis, it eliminates all the problems you might have with cache invalidation, and you won't waste memory holding the same data twice.
Ingredients you need for this to work:
php-fpm
MySQL with InnoDB-based tables and, most of all, sufficient RAM and a tuned innodb_buffer_pool_size variable. That variable controls how much RAM InnoDB is allowed to allocate for its purposes: the larger, the better (see the sketch below for how to check it).
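If it helps, here is one way to inspect (and, on MySQL 5.7.5 and later, resize at runtime) the buffer pool, sketched with plain JDBC; the URL and credentials are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BufferPoolCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details:
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password");
             Statement st = conn.createStatement()) {
            ResultSet rs = st.executeQuery(
                "SHOW VARIABLES LIKE 'innodb_buffer_pool_size'");
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
            // Resizable at runtime since MySQL 5.7.5, given enough free RAM:
            // st.execute("SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024");
        }
    }
}
```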
You've eliminated Redis from the game, kept your code simple and easy to maintain, avoided duplicating data, avoided introducing an additional system into play, and let the software that's meant to take care of data do its job. A pretty cheap trade-off for maximum usefulness: even if you compile all the software from scratch, it won't take more than an hour or so to get it up and running.
Or, you can just ignore what I wrote and look for a solution using Redis.
We hit the same problem and chose to do the same thing you are considering: remove all cached queries affected by the table. It is not ideal, as you said, but fortunately our write ratio is not as high as 40%, so it has been OK so far.
That's the nature of query-based caching. As an alternative, you can add entity-based caching: instead of caching only the search results, cache the entire table and do the search in memory. We use C# LINQ, so we can run fairly common queries in memory, but if the search is too complicated, you are out of luck.
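A rough Java analogue of that entity-based approach: cache the whole table in memory and run the search there (the Row type and the refresh policy are made up; Java 16+ for the record syntax):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class EntityCache {
    // Hypothetical row type mirroring the MySQL table.
    public record Row(long id, String name, String city) {}

    private final List<Row> all = new CopyOnWriteArrayList<>();

    // Reload from MySQL on startup and after writes (or on a timer).
    // clear + addAll is not atomic; good enough for a sketch.
    public void reload(List<Row> freshFromDb) {
        all.clear();
        all.addAll(freshFromDb);
    }

    // Searches run entirely in memory, so a reloaded record is immediately
    // visible to every search, sidestepping the stale-search-result problem.
    public List<Row> search(Predicate<Row> criteria) {
        return all.stream().filter(criteria).collect(Collectors.toList());
    }
}
```

Usage would be along the lines of cache.search(r -> r.city().equals("Boston")); arbitrary predicates work, which is what makes this attractive when the search criteria are open-ended.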

Ruby on Rails: why does the first ActiveRecord query take longer?

If I execute an ActiveRecord query after some time gap, it takes longer.
Say Item.all takes 0.11 s on the first query and 0.003 s later on. What could be the reason for this behaviour?
Edited: the ActiveRecord query cache's scope is a single controller action. In my case, the ActiveRecord query in a subsequent HTTP request is also faster.
Possible explanations:
ActiveRecord Caching
Connection pooling (it doesn't have to re-establish the connection)
Load on the web server or db server.
ActiveRecord caches the results from queries. The first query is actually hitting the database - ActiveRecord then waits for the operation to complete and parses the results into its objects. The next time an identical query is made, it has the results cached so that they are returned to you immediately, instead of going all the way back to the database.
Check the API for the QueryCache: it seems like you can clear the query cache (connection.clear_query_cache) if you want to wipe out cached queries.
This SO question also suggests self.class.uncached do ... end to bypass the cache, but I am not sure whether this still applies in Rails 3.
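For illustration only, the per-request memoization idea behind ActiveRecord's query cache fits in a few lines. Here it is sketched generically in Java (ActiveRecord's real implementation is Ruby, so treat this purely as the principle, not its API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Identical query text within one request returns the memoized result;
// the map is cleared when the request (or controller action) finishes.
public class RequestQueryCache {
    private final Map<String, Object> cache = new HashMap<>();

    @SuppressWarnings("unchecked")
    public <T> T fetch(String sql, Supplier<T> runQuery) {
        return (T) cache.computeIfAbsent(sql, k -> runQuery.get()); // hits the DB only on a miss
    }

    public void clear() { // analogous to connection.clear_query_cache
        cache.clear();
    }
}
```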
It's definitely ActiveRecord's caching you're looking at. See doc.
All of the methods are built on a simple caching principle that will keep the result of the last query around unless specifically instructed not to. The cache is even shared across methods to make it even cheaper to use the macro-added methods without worrying too much about performance at the first go.

SQL caching disadvantages?

I have a web server hosting a lot of websites with many database operations, and I am trying SQL caching as a way to improve the performance of my server.
In general, are there any disadvantages to SQL caching in a common environment?
Thanks
Well, caching consumes RAM, so you'll need plenty of that.
I'm not sure what caching mechanism your SQL server employs, but it's possible that your queries will return stale data for some time.
Your best option for performance improvement is to load as much data as possible into RAM instead of caching.
The main problem with caching in a normal environment is cache expiration and stale data.
If you invalidate your cache every time data changes, you could end up rarely or never hitting the cache.
If you try to invalidate just the part of the cache that changed, you incur extra processing time to determine what to invalidate.
If you do not invalidate the cache, or rely only on cache timers, you may end up with stale data.
Depending on your environment and your requirements, you need to pick the solution that best meets your needs. In some applications it is OK to serve slightly stale data; in others it is not.
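To make those trade-offs concrete, here is a minimal TTL-based cache in Java, i.e. the "cache timers" option above: no invalidation logic at all, at the price of reads being stale for at most the TTL (Java 16+ for the record syntax):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class TtlCache<K, V> {
    private record Entry<T>(T value, long expiresAtNanos) {}

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlNanos;

    public TtlCache(long ttlMillis) {
        this.ttlNanos = ttlMillis * 1_000_000L;
    }

    // Returns the cached value if still fresh, otherwise reloads from the source;
    // staleness is bounded by the TTL, never eliminated.
    public V get(K key, Function<K, V> loader) {
        Entry<V> e = map.get(key);
        if (e != null && System.nanoTime() < e.expiresAtNanos) {
            return e.value();
        }
        V fresh = loader.apply(key);
        map.put(key, new Entry<>(fresh, System.nanoTime() + ttlNanos));
        return fresh;
    }
}
```

Whether a bounded window of staleness is acceptable is exactly the per-application judgment described above.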
All the above points are valid. Invalidation of stale cache entries is a key concern, as is keeping a local cache in sync across multiple servers. You may want to look into a grid cache (e.g., Hazelcast, Memcached) and Heimdall Data, which acts as a transparent cache with invalidation logic built in.
In summary, SQL caching itself is a good thing to do: it increases performance and can buffer SQL traffic away from the database, which brings scaling benefits.