Memo is an elegant solution to implement a cache.
Is there a way to use it to implement an updatable cache?
We have an enterprise application that uses an SQL database. About 90% of database accesses are reads. Data that is updated or created must be visible immediately, so the cache needs to be invalidated correctly with high certainty. Entities are looked up by primary key in 98% of cases.
The application is based on Node.js and is AWS-native, so I'd like to rely on managed AWS services rather than hosting my own. One option is to implement our own read-through Redis-based cache. When retrieving an entity, we'd check the cache first; if the data is not cached, we'd load it from the database and put it into the cache before returning it to the user. The parts of the code that update those entities would invalidate the cache by primary key.
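The read-through flow described above can be sketched roughly as follows. This is a minimal illustration, not the application's code: a plain dict stands in for Redis, and `ReadThroughCache`, `load_user`, and `update_user` are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, Optional

Entity = Dict[str, object]


class ReadThroughCache:
    """Read-through cache keyed by primary key (a dict stands in for Redis)."""

    def __init__(self, load_from_db: Callable[[int], Optional[Entity]]) -> None:
        self._load_from_db = load_from_db
        self._store: Dict[int, Entity] = {}

    def get(self, pk: int) -> Optional[Entity]:
        cached = self._store.get(pk)
        if cached is not None:
            return cached  # cache hit: no database round trip
        entity = self._load_from_db(pk)
        if entity is not None:
            self._store[pk] = entity  # populate before returning to the caller
        return entity

    def invalidate(self, pk: int) -> None:
        # Must be called by every code path that updates or deletes the entity.
        self._store.pop(pk, None)


# Hypothetical stand-in for the SQL database, with a read counter.
db: Dict[int, Entity] = {1: {"id": 1, "name": "Alice"}}
stats = {"db_reads": 0}


def load_user(pk: int) -> Optional[Entity]:
    stats["db_reads"] += 1
    row = db.get(pk)
    return dict(row) if row is not None else None


def update_user(cache: ReadThroughCache, pk: int, name: str) -> None:
    db[pk] = {"id": pk, "name": name}  # write to the database first
    cache.invalidate(pk)               # then drop the stale cache entry


cache = ReadThroughCache(load_user)
first = cache.get(1)           # miss: loaded from the database
second = cache.get(1)          # hit: served from the cache
update_user(cache, 1, "Bob")
third = cache.get(1)           # miss again: reloaded after invalidation
```

The correctness of this approach rests entirely on `invalidate` being called on every write path, which is the part that is hard to guarantee in a large codebase.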
Generally speaking, in computer science cache coherency is one of the most challenging problems to get right. I am of the opinion that rather than implementing a Redis cache and thinking through all of the possible scenarios for correctly invalidating it, it is wiser to instead configure an Aurora read-replica specifically for reading frequently accessed entities. The RDBMS will do a much better job at caching than anything we can build ourselves.
So, I am facing two options -- go through the effort of implementing my own caching, or use read replicas. My personal opinion is to use a read replica.
Any advice is greatly appreciated, as always.
Yes, you're right, cache invalidation is a tough problem. The simplest solution is to add code to your data writes that replaces the cached values, so they're always current. But this is easy only if the cached values have a roughly 1-to-1 correspondence with rows in your database.
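A minimal sketch of that idea, assuming cached values map 1-to-1 to rows: every write goes through one function that updates the database and replaces the cached copy. The dicts `table` and `cache` are hypothetical stand-ins for a database table and a cache server.

```python
from typing import Dict, Optional

Row = Dict[str, object]

# Hypothetical stand-ins: `table` for a database table, `cache` for Redis/Memcached.
table: Dict[int, Row] = {}
cache: Dict[int, Row] = {}


def save_row(pk: int, row: Row) -> None:
    table[pk] = dict(row)   # 1. write to the database
    cache[pk] = dict(row)   # 2. replace the cached copy in the same code path


def delete_row(pk: int) -> None:
    table.pop(pk, None)
    cache.pop(pk, None)     # a deleted row must not linger in the cache


def read_row(pk: int) -> Optional[Row]:
    if pk in cache:
        return cache[pk]
    row = table.get(pk)
    if row is not None:
        cache[pk] = dict(row)
    return row


save_row(7, {"id": 7, "status": "new"})
save_row(7, {"id": 7, "status": "shipped"})
current = read_row(7)       # reflects the latest write without touching `table`
```

In a real system the two steps in `save_row` are not atomic, so a failure between them can still leave the cache stale; the sketch ignores that case.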
An advantage of your own cache is that you can cache data that is not 1-to-1 with rows in the database. You might cache an entire HTML fragment for a drop-down menu, for example, which could be the result of several SQL queries. It can be quite an advantage to cache data that is higher up the "food chain", so to speak, but cache invalidation becomes less straightforward. This kind of caching works best for results of queries that don't change often.
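As an illustration of caching a derived value, here is a small sketch that caches a rendered drop-down fragment with a time-to-live, so a rarely changing result is rebuilt only occasionally. `TtlCache`, `build_country_menu`, and the sample data are assumptions made up for the example; the two "queries" are hard-coded placeholders.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Caches built strings until they expire or are invalidated explicitly."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_build(self, key: str, build: Callable[[], str]) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # still fresh
        value = build()                          # expensive: several queries
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        self._entries.pop(key, None)


stats = {"builds": 0}


def build_country_menu() -> str:
    stats["builds"] += 1
    countries = ["Canada", "Mexico"]   # placeholder for SQL query 1
    default = "Canada"                 # placeholder for SQL query 2
    options = "".join(
        "<option{}>{}</option>".format(" selected" if c == default else "", c)
        for c in countries
    )
    return "<select>" + options + "</select>"


fake_now = [0.0]                       # injectable clock so expiry is testable
menu_cache = TtlCache(ttl_seconds=60, clock=lambda: fake_now[0])
html_a = menu_cache.get_or_build("country-menu", build_country_menu)
html_b = menu_cache.get_or_build("country-menu", build_country_menu)  # cached
fake_now[0] = 61.0
html_c = menu_cache.get_or_build("country-menu", build_country_menu)  # rebuilt
```

A time-to-live trades freshness for simplicity: the fragment may be up to 60 seconds stale unless a write path also calls `invalidate`.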
Using a read-replica is not a substitute for using a cache. Querying a read-replica still incurs the overhead of making a database connection, authentication, SQL query parsing and optimization, locking, and everything else that goes into the workings of an RDBMS.
Querying data from a cache can be orders of magnitude faster.
Both have their place. It's best to use both a cache and a read-replica for different tasks. I would also add message queues as an important technology. I believe database, cache, and queue form a three-legged stool.
But you must have experience and judgment to know when each is the best tool for a given case.
I just added Memcached as my second-level cache for Hibernate. Performance actually took a significant hit after installing the cache: all queries are slower. I think the reason is that most of my queries aren't based on id, so the second-level cache is not being used.
My question is: shouldn't non-id-based queries go straight to the database without ever hitting the cache? In other words, shouldn't the decision about whether a query is "cache appropriate" be made before hitting the cache? If so, shouldn't performance be faster?
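The behavior the question expects can be sketched as below: only lookups by id consult the cache, and every other query bypasses it. This is an illustration of the idea only and makes no claim about how Hibernate actually routes queries; `find_by_id`, `find_where`, and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]

# Hypothetical table contents and counters.
rows: List[Row] = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Lima"},
    {"id": 3, "city": "Oslo"},
]
entity_cache: Dict[int, Row] = {}
stats = {"cache_lookups": 0, "db_queries": 0}


def find_by_id(pk: int) -> Optional[Row]:
    """Id-based lookup: the only path that touches the cache."""
    stats["cache_lookups"] += 1
    if pk in entity_cache:
        return entity_cache[pk]
    stats["db_queries"] += 1
    row = next((r for r in rows if r["id"] == pk), None)
    if row is not None:
        entity_cache[pk] = row
    return row


def find_where(predicate: Callable[[Row], bool]) -> List[Row]:
    """Non-id query: goes straight to the database, no cache round trip."""
    stats["db_queries"] += 1
    return [r for r in rows if predicate(r)]


find_by_id(2)                                  # miss, one database query
find_by_id(2)                                  # hit, no database query
oslo = find_where(lambda r: r["city"] == "Oslo")
```

With this routing, a non-id query costs exactly one database query and zero cache lookups, so adding the cache should not slow it down.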
When I checked the Hibernate code, it looked like Hibernate cannot reuse the cache for HQL queries (it had no compiler from HQL to its caching mechanism).
I'd recommend using fjorm instead of Hibernate. Disclaimer: I'm a founder of fjorm.
I query data from several heterogeneous databases and join it into 1k to 1m rows of results each time. To improve performance, I want to add a cache for the result data. Because the user may also sort or filter the result data, I would like to cache the results in a relational-database-like system rather than a key-value cache (Memcached).
Are there any good tools for this kind of usage? IMHO, MySQL has good read performance, but its write performance is not suited to a cache system. Is that true?
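To make the requirement concrete, here is a rough sketch of a relational result cache using an in-memory SQLite database from Python's standard library. SQLite is an assumption chosen for illustration and is not named anywhere in this thread; the `orders` and `customers` data are made-up placeholders for results fetched from two separate sources.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical results already fetched from two different source databases.
orders: List[Tuple[int, int, float]] = [(1, 10, 25.0), (2, 11, 40.0), (3, 10, 5.0)]
customers: Dict[int, str] = {10: "Acme", 11: "Globex"}

# Join in application code, then store the joined rows in the cache table.
joined = [(oid, customers[cid], total) for oid, cid, total in orders]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result_cache ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany("INSERT INTO result_cache VALUES (?, ?, ?)", joined)
# An index on the column users filter by keeps repeated filtering cheap.
conn.execute("CREATE INDEX idx_result_customer ON result_cache (customer)")
conn.commit()


def query_cached(customer: str) -> List[Tuple[int, float]]:
    """Filter and sort against the cached join instead of the source databases."""
    cur = conn.execute(
        "SELECT order_id, total FROM result_cache "
        "WHERE customer = ? ORDER BY total DESC",
        (customer,),
    )
    return cur.fetchall()


acme_orders = query_cached("Acme")
```

The expensive cross-database join runs once; sorting and filtering afterwards are ordinary SQL against the cached table. Whether this scales to a million rows per result set would need to be measured.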
Hibernate caches automatically, but I've been in this situation before and caching isn't the way to go: It adds more problems than it solves, and usually only briefly delays the inevitable change of technology to... a NoSQL database approach.
Try Lucene or MongoDB, and store the data fully denormalized in there. You won't be sorry.
I am using LINQ with LINQ to SQL for data validation in my app.
How can I be positive that when querying my data context, the query won't hit the database?
I want to access only the data that has been pre-loaded and validate against that.
Let's say that concurrency is not an issue here.
If you want to be 100% sure, you need to get the data in memory with e.g. a ToList() after your query.
Dispose the original datacontext after that and you can be sure your entities in the List<> will not hit the database anymore. (They just give you an exception instead...)
However, you will no longer be querying the datacontext, so this is not a complete answer to your question. If you perform a new query against your datacontext, as far as I know it will always hit the database. LINQ has no built-in cache.
Can this be cached?
No, you would need to enumerate it (e.g. using ToList()) and cache the results of the enumeration. Caching the IQueryable object itself is effectively just caching the query: in that scenario it will still be re-executed every time you try to use the results.
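The difference between caching a query and caching its results can be shown with a small Python analogue. This is not LINQ to SQL code: `run_query` is a hypothetical function that plays the role of a deferred query, and calling `list(...)` plays the role of `ToList()`.

```python
from typing import Callable, Iterator, List

stats = {"database_hits": 0}
table: List[int] = [3, 8, 15]


def run_query() -> Iterator[int]:
    """Each call simulates one round trip to the database."""
    stats["database_hits"] += 1
    return (value for value in table if value > 5)


# Caching the query itself: every use re-executes it.
cached_query: Callable[[], Iterator[int]] = run_query
list(cached_query())
list(cached_query())
hits_with_cached_query = stats["database_hits"]

# Caching the materialized results: one execution, then memory only.
stats["database_hits"] = 0
cached_results: List[int] = list(run_query())
total = sum(cached_results)      # works on the in-memory list
largest = max(cached_results)    # no further database access
hits_with_cached_results = stats["database_hits"]
```

Holding on to `cached_query` saves nothing, because each enumeration runs the query again, while `cached_results` can be reused any number of times after a single execution.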