MySQL DB replication hook to clean local cache - mysql

My app uses a MySQL DB that is a slave of a remote master DB, and I use memcache to cache some of the DB data.
The slave is updated whenever the master changes. In my application I want to know when my local (slave) DB has been updated, so that I can invalidate the related cached data and display the fresh data received from the master.
Is there any way to run a program when the slave MySQL DB is updated? I would then filter the query and work out whether I need to clear the cache or not.
Thanks

First of all, you are looking for a solution similar to what Facebook did in their DB architecture (as I remember, they patched MySQL for this).
You can build your own solution based on one of these techniques:
Parse the replication log on the slave side and remove the cache entry when you see an update to that data in the log.
Load a UDF (user-defined function) for memcached and attach triggers on the replica side (they will call the UDF's remove function) to the tables you are interested in.
Please note that this configuration is complicated to support and maintain. If you can tolerate some stale data in the cache, a small TTL may be enough.
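A minimal sketch of the first technique, in Python, assuming some replication-log reader already turns row changes into (table, primary key) pairs. The `RowChange` type, the `CACHED_TABLES` set, the `table:id` key scheme, and the plain dict standing in for memcached are all hypothetical; a real setup would call the memcached client's delete instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowChange:
    """Hypothetical record produced by whatever reads the replication log."""
    table: str
    primary_key: int


# Assumption: only rows from these tables are ever cached.
CACHED_TABLES = {"users", "products"}


def cache_key(table, primary_key):
    # Assumed key scheme; it must match whatever the application uses on write.
    return f"{table}:{primary_key}"


def invalidate(cache, changes):
    """Delete cache entries for changed rows and return the keys removed."""
    removed = []
    for change in changes:
        if change.table not in CACHED_TABLES:
            continue  # this table is never cached, nothing to do
        key = cache_key(change.table, change.primary_key)
        if cache.pop(key, None) is not None:
            removed.append(key)
    return removed
```

This only works if every cached value can be traced back to the rows it was built from, which is the hard part in practice.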

As Kirugan says, it's as simple as writing your own SQL parser, ensuring that you also provide an indexed lookup keyed to the underlying data for anything you insert into the cache, and then cross-referencing the datasets for any DML you apply to the database. Of course, this will be a lot simpler if you create a simplified, abstract syntax to represent the DML, but you thereby lose the flexibility of SQL and, of course, have to re-implement any legacy code using your new syntax. Apart from fixing the existing code, it should only take a year or two to get this working right. Basing your syntax on MySQL's handler API rather than SQL will probably save a lot of pain later in the project.
Of course, if you need full cache consistency then you need to ensure that a logical transaction now spans all the relevant datacentres which will have something of an adverse impact on your performance (certainly much slower than just referencing the master directly).
For a company like Facebook, with hundreds of thousands of servers and terabytes of data (and no requirement for cache consistency), such an approach to solving the problem leads to massive savings. If you only have 2 servers, a better solution would be to switch to multi-master replication, possibly add another database node, optimize the storage (e.g. switching to SSDs / adding fast bcache), make sure you have session affinity to the DBMS from the application (but not sticky sessions), and spend some time tuning your DBMS, particularly its cache performance.

Related

Rails: Issues while using Octopus Gem for database sharding

I am using Octopus gem to handle database sharding in my application. I have a master and a slave. The insert query always hits the master and the read goes to slave.
But I am facing a weird issue: after inserting a record, when I try to fetch it, the record is not found. This is affecting my whole application.
I tried to resolve this issue by the following code.
Model.using(:master).where(id: 250)
This forces the model to fetch the record from the master rather than from the slave. But if we add this everywhere in the application, there is no point in sharding.
Any solution for this?
Thanks in advance.
Welcome to the fun world of asynchronous replication.
Generally, when you update data on your master database, the data is replicated asynchronously to the slaves, meaning it will arrive there at some later point in time. Unfortunately, you can't know when that will happen, as the only thing typically guaranteed is the order of updates on the slave, not when they will happen.
Often, you'll try to keep the replication delay rather small, but you can't ignore it. Generally, when using asynchronous replication, you have to think critically about your data access strategies to avoid presenting unwanted stale data.
Sharding and replication definitely don't come for free. Database systems try hard to implement strongly defined levels of atomicity via transactions, but due to the CAP theorem, things get more complicated (or sometimes impossible) when introducing a distributed system.
There isn't a generally correct answer for this issue, as it is not immediately clear which data can be stale and which can't. Think about your access patterns and choose the appropriate server. Often, the simplest answer is to get rid of sharding and replication completely and to simply use a bigger server.
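One common access strategy is to send a session's reads to the master for a short window after that session writes, and to the slave otherwise. The sketch below is in Python and is not Octopus API; `ReplicaRouter`, `pin_seconds`, and the session ids are hypothetical names. The window is a guess at the replication delay, not a guarantee, so reads that must never be stale should still go to the master explicitly.

```python
import time


class ReplicaRouter:
    """Route reads to the master for a short time after a session writes."""

    def __init__(self, pin_seconds=2.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds  # assumed upper bound on typical lag
        self.clock = clock              # injectable so it can be tested
        self._last_write = {}           # session id -> time of last write

    def record_write(self, session_id):
        self._last_write[session_id] = self.clock()

    def choose(self, session_id):
        """Return "master" while the session is pinned, otherwise "slave"."""
        last = self._last_write.get(session_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "master"
        return "slave"
```

Only sessions that have just written pay the cost of reading from the master; everyone else keeps reading from the slave.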

MySQL and Hibernate Simultaneous read write

I have a web application which has the following parts:
Commentators continuously doing match commentary through a browser-based tool. The comments are inserted into the DB using Hibernate.
Lots of users are accessing a URL to read commentary. Hibernate is reading data from the table being updated by commentators in step #1.
There are some stored procedures as well which are set to run every 1 hour. Few of them access the same table (used in step #1 and #2) for reading and writing/updating purpose.
Now my problem is that whenever the site has 100+ concurrent users watching a particular match commentary, my MySQL goes down. It shows lots of queries stuck in the processlist. Many of them are in the "Copying to temp table" state. This makes JBoss restart frequently.
I am using transactions in Hibernate for both reading and writing. Please help, because I lose big matches because of these crashes.
You have a performance problem. It is difficult to give solutions which always work. Here is what you can consider:
1) Revise the HQL (Hibernate) statements. The best way to do this is to log the SQL with <property name="show_sql">true</property> in the config file (or with a tool like log4jdbc if you want to see the actual parameters) and analyse the output. There you can see which SQL requests occur most often. In many cases a better strategy for reading and writing DB data can significantly reduce the database traffic. Also check that you have good indexes on your table.
2) Consider using a second-level cache. (Normally Hibernate only uses the first-level cache, which is of no use in your case because it is bound to one session.) Then at least the requests for reading current commentaries can be served by the cache and don't need to go to the database. (Pay attention: the cache might interfere with the stored procedures. Check whether the cache product you want to use supports MySQL stored procedures. In the worst case you have to remove the stored procedures for the critical tables and let your application server do the job, so that it goes through the cache.)
3) If only a few tables are heavily used, you can consider caching them in your application. That's more work, but you can tailor it exactly to the demands of your application, so it might be faster than a general second-level cache.
4) If nothing helps and the traffic is really too heavy, then perhaps you have to invest in more hardware.
Good luck ;-)
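Point 3 above (caching a few hot tables in the application) could look roughly like the sketch below. It is written in Python to illustrate the idea only, not Hibernate or JBoss code. `TtlCache` and `loader` are hypothetical names, and the sketch is not thread-safe, so a real application server would need locking around it or a concurrent map.

```python
import time


class TtlCache:
    """Serve repeated reads from memory and reload after ttl_seconds."""

    def __init__(self, ttl_seconds, loader, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.loader = loader    # function that actually queries the database
        self.clock = clock      # injectable so it can be tested
        self._entries = {}      # key -> (expiry time, value)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]     # still fresh, no database query
        value = self.loader(key)
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

With a TTL of a few seconds, 100+ readers of the same match would share one database query per interval instead of issuing one each.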

Reduce database writes with memcached

I would like to convert my stats tracking system not to write to the database directly, as we're hitting bottlenecks.
We're currently using memcached for certain aspects of the site, and I wanted to use it for storing stats and committing them to mysql DB periodically.
The issue lies however in the number of items (which is in the millions) for which potentially there could be stats collected between the cronjob runs that would commit them into the database. Other than running a SELECT * FROM data and checking for existence of every single memcache key, and then updating the table.... is there any other way to do this?
(I'm not saying below is gospel, this is just my gut feeling. As said later on, I don't have the specifics of your system :) And obviously no offence meant etc :) )
I would advise against using memcached for this. Memcached is built to quickly retrieve values that you've fetched before, not to store values. The big difference is that if your cache gets full, you'll lose your data.
Normally, you'd just have no data in your cache and would re-collect the data from the source, which is impossible in this case. That alone would be a reason for me to try to dissuade you from this.
Now you say the major problem is the mysql connection limit you are hitting. If you do simple stuff (like what we talked about in the comments: the insert delayed), it's just a case of increasing the limit. You should probably have enough power to have your scripts/users go to the database once and say "this should eventually be added", and then go away. If your users can't even open 1 connection for that, there's a serious resource problem you probably won't fix by adding extra layers of cache?
Obviously hard to say without any specs of the system, soft and hardware, but my suggestion would be to see if you can just let them open their connections by increasing the limit, and fiddle with the server variables a bit, instead of monkey-patching your system by using a memcached as an in-between layer.
I had a similar issue with statistics data. But please don't use memcached for it. You can't be sure that ALL your items will be moved to the DB. You can lose data and/or process data twice.
You should analyse your bottleneck against how much data you are writing/reading and how many connections you need. Then switch to something scalable like Hadoop, Cassandra, Scribe or similar systems.
You need to provide additional information on the platform that you are running: O/S, database (version), storage engine, RAM, CPU (if possible)?
Are you inserting into a single table or more than one table?
Can you disable the indexes on the tables you are inserting into? They slow down the inserts.
Are you running any triggers or stored procedures to compute values as you insert the raw data?

MySQL dual master replication -- is this scenario safe?

I currently have a MySQL dual master replication (A<->B) set up and everything seems to be running swimmingly. I drew on the basic ideas from here and here.
Server A is my web server (a VPS). User interaction with the application leads to updates to several fields in table X (which are replicated to server B). Server B is the heavy-lifter, where all the big calculations are done. A cron job on server B regularly adds rows to table X (which are replicated to server A).
So server A can update (but never add) rows, and server B can add rows. Server B can also update fields in X, but only after the user no longer has the ability to update that row.
What kinds of potential disasters can I expect with this scenario if I go to production with it? Or does this seem OK? I'm asking mostly because I'm ignorant about whether any simultaneous operation on the table (from either the A copy or the B copy) can cause problems or if it's just operations on the same row that get hairy.
Dual master replication is messy if you attempt to write to the same database on both masters.
One of the biggest points of contention (and high blood pressure) is the use of autoincrement keys.
As long as you remember to set auto_increment_increment and auto_increment_offset, you can look up any data you want and retrieve auto-incremented ids.
You just have to remember this rule: if you read an id from serverX, you must look up the needed data from serverX using the same id.
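To see why those two settings avoid key collisions, here is a simplified Python model of the ids each master would hand out on an empty table. The function name and the concrete values (increment 2, offsets 1 and 2) are illustrative assumptions; real MySQL also takes existing rows into account.

```python
def auto_increment_ids(offset, increment, count):
    """Simplified model: ids are offset, offset + increment, offset + 2*increment, ..."""
    return [offset + i * increment for i in range(count)]


# Two masters sharing auto_increment_increment = 2, with different offsets.
server_a = auto_increment_ids(offset=1, increment=2, count=5)  # odd ids
server_b = auto_increment_ids(offset=2, increment=2, count=5)  # even ids

# The two id series never overlap, so inserts on both masters cannot collide.
overlap = set(server_a) & set(server_b)
```

The increment must be at least the number of masters, and each master needs its own offset; otherwise the series overlap again.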
Here is one saving grace for using dual master replication.
Suppose you have
two databases (db1 and db2)
two DB servers (serverA and serverB)
If you impose the following restrictions
all writes of db1 to serverA
all writes of db2 to serverB
then you are not required to set auto_increment_increment and auto_increment_offset.
I hope my answer clarifies the good, the bad, and the ugly of using dual master replication.
Here is a pictorial example of 4 masters using auto increment settings
Nice article from Percona on this subject
Master-master replication can be very tricky. Are you sure that this is the best solution for you? Usually it is used for load-balancing purposes (e.g. round-robin connections to your DB servers) and sometimes when you want to avoid the replication lag effect. A big known issue is the auto_increment problem, which is supposedly solved by using different offsets and increment values.
I think you should modify your configuration to simple master-slave by making A the master and B the slave, unless I am mistaken about the requirements of your system.
I think you can depend on Percona XtraDB Cluster (Feature 2: Multi-Master replication) rather than regular MySQL replication.
They promise the following:
By Multi-Master I mean the ability to write to any node in your cluster and do not worry that eventually you get out-of-sync situation, as it regularly happens with regular MySQL replication if you imprudently write to the wrong server.
With Cluster you can write to any node, and the Cluster guarantees consistency of writes. That is the write is either committed on all nodes or not committed at all.
The two important consequences of Multi-master architecture.
First: we can have several appliers working in parallel. This gives us true parallel replication. Slave can have many parallel threads, and you can tune it by variable wsrep_slave_threads.
Second: There might be a small period of time when the slave is out-of-sync from master. This happens because the master may apply event faster than a slave. And if you do read from the slave, you may read data that has not changed yet. You can see that from diagram. However you can change this behavior by using variable wsrep_causal_reads=ON. In this case the read on the slave will wait until event is applied (this however will increase the response time of the read). This gap between slave and master is the reason why this replication is named “virtually synchronous replication”, not real “synchronous replication”.
The described behavior of COMMIT also has the second serious implication.
If you run write transactions to two different nodes, the cluster will use an optimistic locking model.
That means a transaction will not check on possible locking conflicts during individual queries, but rather on the COMMIT stage. And you may get ERROR response on COMMIT. I am highlighting this, as this is one of incompatibilities with regular InnoDB, that you may experience. In InnoDB usually DEADLOCK and LOCK TIMEOUT errors happen in response on particular query, but not on COMMIT. Well, if you follow a good practice, you still check errors code after “COMMIT” query, but I saw many applications that do not do that.
So, if you plan to use Multi-Master capabilities of XtraDB Cluster, and run write transactions on several nodes, you may need to make sure you handle response on “COMMIT” query.
You can find it here, along with a pictorial explanation.
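The advice to handle the response to COMMIT can be sketched as a retry loop. This is a Python illustration only: `CommitConflict` is a hypothetical stand-in for whatever error your driver raises when a commit fails, and `transaction` is assumed to be a callable that runs the whole transaction, including the commit, from the start.

```python
class CommitConflict(Exception):
    """Hypothetical stand-in for a driver error raised at COMMIT time."""


def run_with_retry(transaction, max_attempts=3):
    """Run the whole transaction again if its commit is rejected."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except CommitConflict:
            if attempt == max_attempts:
                raise  # give up and let the caller see the failure
```

The whole transaction is retried rather than just the COMMIT, because the work done before a failed commit has been rolled back.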
From my rather extensive experience on this topic I can say you will regret writing to more than one master someday. It may be soon, it may not be for a long time, but it will happen. You will have two servers that each have some correct data and some wrong data, and you will either pick one as the authoritative source and throw the other away (probably without really knowing what you're throwing away) or you'll reconcile the two. No matter how you design it, you cannot eliminate the possibility of this happening, so it's a mathematical certainty that it will happen someday.
Percona (my employer) has handled probably several hundred cases of recovery after doing what you're attempting. Some of them take hours, some take weeks, one I helped with took a few months -- and that's with excellent tools to help.
Use a different replication technology or find a different way to do what you want to do. MMM will not help -- it will bring catastrophe sooner. You cannot do this with standard MySQL replication, with or without external tools. You need a replacement replication technology such as Continuent Tungsten or Percona XtraDB Cluster.
It's often easier to just solve the real need in some other fashion and give up multi-master writes, if you want to use vanilla MySQL replication.
Thanks for sharing my Master-Master MySQL cluster article. As Rolando clarified, this configuration is not suitable for most production environments due to the limitations of auto-increment support.
The most adequate way to get a MySQL cluster is to use NDB, which requires at least 4 servers (2 management and 2 data nodes).
I have written a detailed article to get this running on two servers only, which is very similar to my previous article but using NDB instead.
http://www.hbyconsultancy.com/blog/mysql-cluster-ndb-up-and-running-7-4-and-6-3-on-ubuntu-server-trusty-14-04.html
Note that I always recommend analysing your needs first and finding the most adequate solution; don't just look at the available solutions and try to figure out whether they fit your needs.
-Hatem
I would highly recommend looking into a tool that will manage this for you. Multi-master replication can be very troublesome if things go wrong.
I would suggest something like Percona XtraDB Cluster. I've been following this project, and it looks very cool. I definitely think it will be a game changer in the MySQL world. It's still in beta though.

Mysql database sync between two databases

We are running a Java PoS (Point of Sale) application at various shops, with a MySql backend. I want to keep the databases in the shops synchronised with a database on a host server.
When some changes happen in a shop, they should get updated on the host server. How do I achieve this?
Replication is not very hard to create.
Here are some good tutorials:
http://www.ghacks.net/2009/04/09/set-up-mysql-database-replication/
http://dev.mysql.com/doc/refman/5.5/en/replication-howto.html
http://www.lassosoft.com/Beginners-Guide-to-MySQL-Replication
Here are some simple rules you will have to keep in mind (there are more, of course, but this is the main concept):
Set up 1 server (master) for writing data.
Set up 1 or more servers (slaves) for reading data.
This way, you will avoid errors.
For example:
If your script inserts into the same tables on both master and slave, you will get duplicate primary key conflicts.
You can view the "slave" as a "backup" server which holds the same information as the master but cannot add data directly; it only follows the master server's instructions.
NOTE: Of course you can read from the master and you can write to the slave, but make sure you don't write to the same tables (master to slave and slave to master).
I would recommend to monitor your servers to make sure everything is fine.
Let me know if you need additional help
Three different approaches:
Classic client/server approach: don't put any database in the shops; simply have the applications access your server. Of course it's better if you set up a VPN, but simply wrapping the connection in SSL or ssh is reasonable. Pro: it's the way databases were originally meant to be used. Con: if you have high latency, complex operations could get slow, and you might have to use stored procedures to reduce the number of round trips.
Replicated master/master: as @Book Of Zeus suggested. Cons: somewhat more complex to set up (especially if you have several shops), and a break-in on any shop machine could potentially compromise the whole system. Pros: better responsiveness, as read operations are totally local and write operations are propagated asynchronously.
Offline operations + sync step: do all work locally and from time to time (might be once an hour, daily, weekly, whatever) write a summary with all new/modified records since the last sync operation and send it to the server. Pros: can work without network, fast, easy to check (if the summary is readable). Cons: you don't have real-time information.
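The third approach (offline operations plus a sync step) can be sketched as follows, in Python for brevity even though the PoS application is Java. Everything here is an assumption for illustration: each record carries an `id` and a `modified_at` timestamp, the shop remembers the time of its last sync, and the server keys rows by (shop id, record id) so shops cannot overwrite each other.

```python
def build_summary(records, last_sync):
    """Collect copies of all records modified after the previous sync."""
    return [dict(r) for r in records if r["modified_at"] > last_sync]


def apply_summary(server_rows, shop_id, summary):
    """Upsert a shop's summary into the server's rows, keyed by (shop, id)."""
    for row in summary:
        server_rows[(shop_id, row["id"])] = row
    return server_rows
```

A shop would call `build_summary` on its schedule, send the result to the host, and advance `last_sync` only after the server confirms it has applied the summary.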
SymmetricDS is the answer. It supports multiple subscribers with one direction or bi-directional asynchronous data replication. It uses web and database technologies to replicate tables between relational databases, in near real time if desired.
Comprehensive and robust Java API to suit your needs.
Have a look at the Schema and Data Comparison tools in dbForge Studio for MySQL. These tools will help you compare two databases, see the differences, generate a synchronization script, and synchronize them.