Storing MySQL writes in Memory - mysql

This is a bit of an oddball question. I'm aware of using memcached to cache "read heavy" data in memory, but is it possible to do the same for writes?
For example: You have a chunk of data in memory (in memcached) and if you have to make any changes to that data, you make them in memory. At the end of a certain time period (an hour, or a day) you replicate all those changes into MySQL. So you are storing things in memory rather than on disk, and at the end of the time period those changes become permanent when they are copied over to MySQL.
Is there a piece of software that can accomplish this? Sample code, maybe?

Not possible for critical data, as memcached does not guarantee data consistency: entries can be evicted at any time and are lost on restart.
You can, however, use this approach for session data, and no special software is needed. Just retrieve your data, alter it, and save it back.

Yes, it can be done that way to reduce I/O load.
I believe what you described is a "write-behind" scenario for cached data.
It can be implemented by creating task queues and setting up workers to do the job.
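To make the write-behind idea concrete, here is a minimal sketch, not a production implementation. It assumes a single process, uses a plain dict as a stand-in for memcached and the standard-library sqlite3 module as a stand-in for MySQL; the class name, the kv table, and run_flusher are made up for illustration.

```python
import queue
import sqlite3
import threading


class WriteBehindCache:
    """Serve reads and writes from memory; persist writes later in batches."""

    def __init__(self, db_path):
        self._data = {}                # in-memory copy (stand-in for memcached)
        self._pending = queue.Queue()  # task queue of writes not yet persisted
        self._lock = threading.Lock()
        # sqlite3 is only a stand-in for MySQL so the sketch runs anywhere.
        self._db = sqlite3.connect(db_path, check_same_thread=False)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)"
        )
        self._db.commit()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value
        self._pending.put((key, value))  # picked up by the next flush

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def flush(self):
        """Drain the queue and write the latest value per key in one transaction."""
        latest = {}
        while True:
            try:
                key, value = self._pending.get_nowait()
            except queue.Empty:
                break
            latest[key] = value  # later writes to the same key win
        with self._db:  # commits on success, rolls back on error
            self._db.executemany(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
                list(latest.items()),
            )
        return len(latest)


def run_flusher(cache, interval_seconds, stop_event):
    """Worker loop: flush every interval until stop_event is set."""
    while not stop_event.wait(interval_seconds):
        cache.flush()
    cache.flush()  # final flush on shutdown
```

A worker would be started with something like threading.Thread(target=run_flusher, args=(cache, 3600, stop_event)). Anything written since the last flush is lost if the process dies, which is why this only suits data you can afford to lose.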

Perhaps MySQL Proxy can be of use for you? I haven't tried it so I don't know that it does what you require.

Related

Caching the data result of complex computation

I have a Spring Boot server application. Clients of this server ask for statistics about different things all the time. These statistics can be shared among clients, and need not be real time.
It's good enough if these statistics are refreshed every 15-30 mins.
Also, computing these statistics requires reading the whole database.
So, I'd like to cache these computed statistics and update them now and then.
What is your suggestion, what tool or pattern should I use?
I have the following ideas so far:
using memcached
upgrading to MySQL 5.7 which has JSON store, and store the data there
Please keep in mind that the hardware of my server is not too powerful: 512MB RAM and 1 CPU (cheapest option in DigitalOcean).
Thank you in advance!
Edit 1:
These statistics are composed of quite simple data structures (int-to-int maps, lists, etc.) and they do NOT fit well in a relational database.
Edit 2:
The whole data set is only a few megabytes. The crucial point is that creating this data requires a lot of database reads, and a lot of clients are asking for it.
I also want to keep my server application stateless. I think it's important to mention.
A simple solution to the problem is to save the data in JSON format to a file, and that's it.
Additionally, this file can be on a ram disk partition, so it will be blazing fast.
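A rough sketch of that file-based approach follows. The function name, the compute callback, and the 15-minute default are assumptions for illustration; the file could live on a RAM disk path such as a tmpfs mount, which is also an assumption about the host.

```python
import json
import os
import tempfile
import time


def load_or_refresh(path, compute, max_age_seconds=15 * 60):
    """Return cached stats from `path`, recomputing when the file is too old."""
    try:
        age = time.time() - os.path.getmtime(path)
        if age < max_age_seconds:
            with open(path, "r", encoding="utf-8") as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # missing or unreadable cache file: fall through and recompute

    stats = compute()  # the expensive part that reads the whole database
    # Write to a temp file, then rename, so readers never see a partial file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(stats, f)
    os.replace(tmp_path, path)
    return stats
```

The application itself stays stateless because the state lives in the file. One caveat: JSON object keys are always strings, so an int-to-int map comes back with string keys and has to be converted on load.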

MySQL DB replication hook to clean local cache

I have an app whose MySQL DB is a slave of a remote master DB, and I use memcached to cache some of the DB data.
My slave DB can be updated whenever there are updates in the master DB. So in my application I want to know when my local (slave) DB is updated, so that I can invalidate the related cached data and display the fresh data I got from the master.
Is there any way to run some program when the slave MySQL DB is updated? I would then filter the query and work out whether I need to clear the cache or not.
Thanks
First of all, you are looking for a solution similar to what Facebook did in their DB architecture (as I remember, they patched MySQL for this).
You can build your own solution based on one of these techniques:
Parse the replication log on the slave side and remove the cache entry when you see an update of that data in the log
Load a UDF (user-defined function) for memcached and attach triggers (which call the UDF's remove function) to the tables of interest on the replica side.
Please note that this configuration is complicated to support and maintain. If you can tolerate stale data in the cache, a small TTL may be enough.
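Both ideas (a short TTL, and dropping entries when a replicated change touches the row they were built from) can be sketched together. This is a toy in-process cache, not memcached, and on_row_changed is a hypothetical hook: how you obtain the (table, primary key) pairs from the replication log or a trigger is left out.

```python
import time
from collections import defaultdict


class InvalidatingCache:
    """Local cache with a short TTL plus explicit invalidation by source row."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}               # cache key -> (expires_at, value)
        self._by_row = defaultdict(set)  # (table, pk) -> cache keys built from it

    def put(self, key, value, source_rows):
        """Store a value and remember which rows it was derived from."""
        self._entries[key] = (self._clock() + self._ttl, value)
        for row in source_rows:
            self._by_row[row].add(key)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def on_row_changed(self, table, pk):
        """Call once per replicated change; drops every entry built from that row."""
        keys = self._by_row.pop((table, pk), set())
        for key in keys:
            self._entries.pop(key, None)
        return len(keys)
```

The TTL bounds how stale an entry can get even if a change event is missed, which is the fallback suggested above.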
As Kirugan says, it's as simple as writing your own SQL parser, and ensuring that you also provide an indexed lookup keyed to the underlying data for anything you insert into the cache, then cross-referencing the datasets for any DML you apply to the database. Of course, this will be a lot simpler if you create a simplified, abstract syntax to represent the DML, though you thereby lose the flexibility of SQL and, of course, have to re-implement any legacy code using your new syntax. Apart from fixing the existing code, it should only take a year or two to get this working right. Basing your syntax on MySQL's handler API rather than SQL will probably save a lot of pain later in the project.
Of course, if you need full cache consistency then you need to ensure that a logical transaction now spans all the relevant datacentres which will have something of an adverse impact on your performance (certainly much slower than just referencing the master directly).
For a company like Facebook, with hundreds of thousands of servers and terabytes of data (and no requirement for cache consistency), such an approach to solving the problem leads to massive savings. If you only have 2 servers, a better solution would be to switch to multi-master replication, possibly add another database node, optimize the storage (e.g. switching to SSDs or adding fast bcache), make sure you have session affinity to the DBMS from the application (but not sticky sessions), and spend some time tuning your DBMS, particularly its cache performance.

Achieve a strong consistent view

I just started using couchbase and hoping to use it as my data store.
One of my requirements is performing a query that will return a certain field from all the documents in the store. This query is done once at server startup.
For this purpose I need all the documents that exist and can't miss any of them.
I understand that views in couchbase are eventually consistent but I still hope this query can be done (at the cost of performance).
Notes about my configuration:
I have only one couchbase server instance (I don't need sharding or replication)
I am using the java client (1.4.1)
What I have tried to do is saving my documents this way:
client.set(key, value, PersistTo.ONE).get();
And querying using:
query.setStale(Stale.FALSE);
Adding the PersistTo parameter caused the following exception:
Caused by: net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node: <unknown>
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:167)
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:140)
So I guess I am actually asking 3 questions:
Is it possible to get the consistent results I need?
If so, is what I suggested the correct way of doing that?
How can I prevent those exceptions?
The mapping I'm using:
function (doc, meta) {
  if (doc.doc_type && doc.doc_type == "MyType" && doc.myField) {
    emit(meta.id, null);
  }
}
Thank you
Is it possible to get the consistent results I need?
Yes, it is possible to make Couchbase views consistent by setting the stale flag to false, as you've done. However, this has a performance cost, so depending on your data size the query may be slow. If you are only going to run it once a day, it should be OK.
Couchbase is designed to be a distributed system comprising more than one node; it's not really suited to single-node deployments. I have read (but can't find the link) that view performance is much better in larger clusters.
You are also forcing more of a synchronous processing model onto a system that shines with async requests. PersistTo is OK to use for some requests but not system-wide on every call (personal opinion); it will definitely throttle throughput and performance.
If so, is what I suggested the correct way of doing that?
You say the query is done after your application server is running, is this once per day or more? If once a day then your application should work (I'd consider upping the nodes ;)), if you have to do this query a lot and you are 'hammering' the node over and over with sets then I'd expect to see what you are currently experiencing.
How can I prevent those exceptions?
It could be for a variety of reasons. What are the specs of your computer: RAM, CPU, disk? How much RAM is allocated to Couchbase, how much to your bucket, and what percentage of the bucket RAM is used?
I've personally seen this when I've hammered some lower-end AWS instances on some not-so-amazing networks. What version of Couchbase are you using? It could be down to a whole variety of factors, and it deserves to be a separate question.
Hope that helps!
EDIT regarding more information on the Stale = false parameter (from official docs)
http://docs.couchbase.com/couchbase-manual-2.2/#couchbase-views-writing-stale
The index is updated before the query is executed. This ensures that any documents updated (and persisted to disk) are included in the view. The client will wait until the index has been updated before the query has executed, and therefore the response will be delayed until the updated index is available.

Reduce database writes with memcached

I would like to convert my stats tracking system not to write to the database directly, as we're hitting bottlenecks.
We're currently using memcached for certain aspects of the site, and I wanted to use it for storing stats and committing them to mysql DB periodically.
The issue lies however in the number of items (which is in the millions) for which potentially there could be stats collected between the cronjob runs that would commit them into the database. Other than running a SELECT * FROM data and checking for existence of every single memcache key, and then updating the table.... is there any other way to do this?
(I'm not saying below is gospel, this is just my gut feeling. As said later on, I don't have the specifics of your system :) And obviously no offence meant etc :) )
I would advise against using memcached for this. Memcached is built to quickly retrieve values that you've fetched before, not to store values. The big difference is that if your cache gets full, you'll lose your data.
Normally, you'd just have no data in your cache and re-collect the data from the source, which is impossible in this case. That alone would be a reason for me to try and dissuade you from this.
Now you say the major problem is the mysql connection limit you are hitting. If you do simple stuff (like what we talked about in the comments: the insert delayed), it's just a case of increasing the limit. You should probably have enough power to have your scripts/users go to the database once and say "this should eventually be added", and then go away. If your users can't even open 1 connection for that, there's a serious resource problem you probably won't fix by adding extra layers of cache?
Obviously hard to say without any specs of the system, soft and hardware, but my suggestion would be to see if you can just let them open their connections by increasing the limit, and fiddle with the server variables a bit, instead of monkey-patching your system by using a memcached as an in-between layer.
I had a similar issue with statistics data. But please don't use memcached for it. You can't be sure that ALL your items will be moved to the DB. You can lose data and/or process data twice.
You should analyse your bottleneck in terms of how much data you are writing/reading and how many connections you need, and then switch to something scalable like Hadoop, Cassandra, Scribe, or similar systems.
You need to provide additional information on the platform that you are running: O/S, database (version), storage engine, RAM, CPU (if possible)?
Are you inserting into a single table or more than one table?
Can you disable the indexes on the tables you are inserting into? Indexes slow down inserts.
Are you running any triggers or stored procedures to compute values as you insert the raw data?
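On the original question of avoiding a scan over millions of rows: if increments are buffered at all, one option is to keep only the keys that actually changed and flush just those. The sketch below assumes a single process and buffers in that process's memory rather than in memcached, so it carries the same data-loss risk the answers above warn about. sqlite3 stands in for MySQL, and the stats table, StatsBuffer, and create_schema are made-up names.

```python
import sqlite3
import threading
from collections import Counter


def create_schema(db):
    """Create the hypothetical stats table used by this sketch."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS stats ("
        "item_id INTEGER PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    db.commit()


class StatsBuffer:
    """Accumulate counter increments in memory; flush only keys that changed."""

    def __init__(self):
        self._deltas = Counter()  # item id -> increments since the last flush
        self._lock = threading.Lock()

    def incr(self, item_id, amount=1):
        with self._lock:
            self._deltas[item_id] += amount

    def flush(self, db):
        # Swap the buffer out under the lock so writers are blocked only briefly.
        with self._lock:
            deltas, self._deltas = self._deltas, Counter()
        rows = list(deltas.items())
        try:
            with db:  # one transaction for the whole batch
                db.executemany(
                    "INSERT OR IGNORE INTO stats (item_id, hits) VALUES (?, 0)",
                    [(item_id,) for item_id, _ in rows],
                )
                db.executemany(
                    "UPDATE stats SET hits = hits + ? WHERE item_id = ?",
                    [(count, item_id) for item_id, count in rows],
                )
        except sqlite3.Error:
            with self._lock:
                self._deltas.update(deltas)  # put the batch back and retry later
            raise
        return len(rows)
```

The cron job then touches only as many rows as there were distinct items updated since the last run. On MySQL the two statements could be collapsed into a single INSERT ... ON DUPLICATE KEY UPDATE.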

mysql high read & write

I've been reading about handling high read or high write loads, but what about both? What would be your advice? In my case a high number of people are writing data and another set of people are reading it straight after. This is all web based and there is no pattern to determine who is writing or reading. Also, because the data changes several times per second it can't be cached, but the "cache size" could be increased to keep MySQL from issuing I/O all the time.
One idea would be to use MySQL Cluster, but the data would have to be written to all nodes at the same time. What would be the impact in terms of performance?
Seeking advice.
Another approach is to create a replication setup, where data is written to the master, replicated to the slaves, and read from the slaves.
You can read more about it here http://dev.mysql.com/doc/refman/5.0/en/replication-solutions-scaleout.html
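On the application side, a scale-out setup like this needs something that decides where each statement goes. Here is a minimal sketch of such a router; the class and its read_your_writes flag are hypothetical, and the connections are treated as opaque objects.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread SELECTs across the slaves."""

    def __init__(self, master, slaves):
        self._master = master
        # Round-robin over the slaves; with no slaves, everything uses the master.
        self._slaves = itertools.cycle(slaves) if slaves else None

    def connection_for(self, sql, read_your_writes=False):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        # Only plain SELECTs may go to a slave; anything else goes to the master.
        if verb != "SELECT" or read_your_writes or self._slaves is None:
            return self._master
        return next(self._slaves)
```

Because replication is asynchronous, a slave can lag behind the master. Since the data in the question is read straight after it is written, the read_your_writes flag is there to force those reads onto the master when a stale result would be unacceptable.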