How to Handle Eventual Consistency Issues in MySQL Read/Write Splitting - mysql

I've been looking into solutions to scale MySQL. One that often comes up beyond adding a Memcached layer is read/write splitting -- all writes go to the master and all reads go to a set of load balanced slaves.
The one issue that obviously comes up with this approach is "eventual consistency." When I run a write on the master, replication to the read slaves takes a certain amount of time. Thus, if I make a request for a newly created row, it may not be there.
Does anyone know of specific strategies to handle this issue? I've read about a conceptual partial solution of the ability to "read-what-you-writes". But, does anyone have anyone have any ideas how to implement such a solution -- whether it be conceptually, or specifically in a Spring/Hibernate stack?

I've not done this, but here's an idea. You could have a memcache server on your write database that you connect to before each read query. When you do a write, add a key of some kind to your memcache, and when you replicate1, remove the key.
When you do the memcache read and you're reading a single record, if the key of a record is found, you should read it from the master only. If you're selecting several records, then read them from a slave, and then query each found ID against the memcache keys. If any found in memcache, re-read only those records from the master database.
You may find that there are some (write-heavy) use cases where this strategy would negate the benefits of having a read/write split. But I would wager that in most cases, the extra checking of memcache and the occasional master re-reads will still make it worthwhile.
1 If you are using standard replication and cannot track whether a particular record has replicated fully, just timestamp all your keys, and remove/expire them after a worst-case scenario delay. For example if your slaves lag behind your master by two minutes, ignore (and delete) any keys that are older than two minutes, since they are sure to be replicated.
That all said: don't forget there are lots of cases where lag is acceptable. For example if you have a website at which users update their profiles, if their changes do not fully propagate for five minutes, this is in most cases fine. The key is, imo, not to over-engineer something to get instant propagation if it is not necessary.

Related

AWS RDS MySQL Read Replica - do I need to update my API to point to the replica for any URIs that are only reading?

This is the first time I have worked with a read replica so please bare with me. I tried multiple searches on here and google to no avail.
I know how a primary database instance and a read replica on RDS. Replication is working great, I can connect to both instances with no problem. My question is two fold.
Do I need to update my API models so that read only operations are directed to the replica?
Are there any tips to get the best performance from this setup? Can I adjust indexes on the master so that write commands have less indexes than tables I need for read? Lastly, some of my models read a dataset, return the value, then log the activity in an audit log table. Would I need to split this functionality out so the logging activity occurs on the main write database?
Appreciate any guidance here.
It depends. The replica will always be slightly behind its source instance, so if you have queries that have a requirement to read current data always, they need to read from the source instance.
Different queries even in your single app will have different needs in that respect. So you really need to decide on a case-by-case basis which query can read from the read-replica instance.
I wrote a presentation about this: Read Write Splitting in MySQL or the video.
Regarding performance, you may have different secondary indexes on the same tables on each instance. Indexes depend on which queries you will run, so you have to decide first which queries you will run on the source versus the replica.
But consider that if the read-replica goes offline for some reason, or falls too far behind or something, you might want to be able to run the same queries on the source instance until the replica is ready again. In that case, you'd want the same indexes on both, or else your query will run very slowly.
Another possibility is that the source instance dies for some reason. In the cloud, you must always have a plan for resources disappearing on short notice, including databases! It happens. So if the source instance dies, the read-replica could take over as your new source. But if it has different indexes, different tables, different instance size, or whatever, then it may not be prepared to be that substitute.
It's simpler to just make the source and replica the same in as many ways as you can. Less details to track, and you're prepared for failover.

MySQL DB replication hook to clean local cache

I have the app a MySQL DB is a slave for other remote Master DB. And i use memcache to do caching of some DB data.
My slave DB can be updated if there are updates in a Master DB. So in my application i want to know when my local (slave) DB is updated to invalidate related cached data and display fresh data i got from master.
Is there any way to run some program when slave mysql DB is updated ? i would then filter q query and understand if i need to clean a cache or not.
Thanks
First of all you are looking for solution similar to what Facebook did in their db architecture (As I remember they patched MySQL for this).
You can build your own solution based on one of these techniques:
Parse replication log on slave side, remove cache entry when you see update of data in the log
Load UDF (user defined function) for memcached, attach trigger on replica side (it will call UDF remove function) to interested tables inside MySQL.
Please note that this configuration is complicated during the support and maintenance. If you can sacrifice stale data in the cache maybe small ttl will help you.
As Kirugan says, it's as simple as writing your own SQL parser, and ensuring that you also provide an indexed lookup keyed to the underlying data for anything you insert into the cache, then cross reference the datasets for any DML you apply to the database. Of course, this will be a lot simpler if you create a simplified, abstract syntax to represent the DML, but thereby losing the flexibilty of SQL and of course, having to re-implement any legacy code using your new syntax. Apart from fixing the existing code, it should only take a year or two to get this working right. Basing your syntax on MySQL's handler API rather than SQL will probably save a lot of pain later in the project.
Of course, if you need full cache consistency then you need to ensure that a logical transaction now spans all the relevant datacentres which will have something of an adverse impact on your performance (certainly much slower than just referencing the master directly).
For a company like facebook, with hundreds of thousands of servers and terrabytes of data (and no requirement for cache consistency) such an approach to solving the problem leads to massive savings. If you only have 2 servers, a better solution would be to switch to multi-master replication, possibly add another database node, optimize the storage (e.g. switching to ssds / adding fast bcache) make sure you have session affinity to the dbms from the aplication (but not stcky sessions) and spend some time tuning your dbms, particularly its cache performance.

multiple rails engines talking to one mySQL server for horizontally scaling application servers

I've seen pictures like this where multiple rails engines write to a single mySQL server.
1) Is this possible? Or does Rails want each application server to write to one database server?
2) If this is possible, how is it accomplished? Are there queues and a scheduler between the application servers and the write database server?
Scaling a mysql db is a pretty difficult thing to do, but its certainly been done plenty of times and there are a lot of best practices out there for you to take advantage of. The first thing you should know is that before you worry about scaling writes for a while yet, you probably need to scale your reads first.
Scaling reads can be done fairly easily using replication. There are several tools out there that make managing replication a lot easier such as Amazon RDS. Generally speaking many web severs can connect to many databases (as suggested by others), however you quickly run into scale issues once you have a lot of traffic, connections or whatever other action you are performing that generates load on the server.
As replicated severs are read only, you need to manage which sever you connect to depending on the action you're performing. I.e. if you had a users table, when creating, updating or deleting users you need to use the "write" database (the primary "source" sever) but when reading the user table, you can use one of the read replicas. This reduces the load on the primary write sever (allowing it to deal with even more writes) and as you can have multiple read databases behind a load balancer, you can get away with this structure for a very long time and scale reads across tens of database severs before you'll hit any significant issues (however most apps get away with 1-3).
There are situations where you will need to use your write database for read actions (although you should avoid it as much as possible) as the read replicas can be slightly behind the write dbs due to latency in replicating the write db queries, however most of the time you should be able to code knowing that there is the possibility that the read db is delayed (i.e. queue actions a reasonable period of time such that the updates will propagate across all the read severs) and simply use one of your read dbs rather than the write db.
Beyond this the key items to work on are ensuring you have efficient indexes and applying other best practices around maintaining a sensible data structure. You might also want to consider having 3 distinct "groups" of database servers. I generally like to have write, read and "stats" db groups. The write group for create, update and delete operations (as well as select for update), the read for general read items that must return their results quickly, and stats for anything that is going to be under high load and that you do not rely on for a prompt response (this keeps heavy queries that are not time sensitive away from your read db that you need quick responses from for general reads)
Once you get into a situation where you can no longer buy larger hardware and you're near maxing out your write capacity, you'll need to look into sharding, however that will take a lot of traffic / data (so dont worry about it unless you've done all of the above already).

MySQL dual master replication -- is this scenario safe?

I currently have a MySQL dual master replication (A<->B) set up and everything seems to be running swimmingly. I drew on the basic ideas from here and here.
Server A is my web server (a VPS). User interaction with the application leads to updates to several fields in table X (which are replicated to server B). Server B is the heavy-lifter, where all the big calculations are done. A cron job on server B regularly adds rows to table X (which are replicated to server A).
So server A can update (but never add) rows, and server B can add rows. Server B can also update fields in X, but only after the user no longer has the ability to update that row.
What kinds of potential disasters can I expect with this scenario if I go to production with it? Or does this seem OK? I'm asking mostly because I'm ignorant about whether any simultaneous operation on the table (from either the A copy or the B copy) can cause problems or if it's just operations on the same row that get hairy.
Dual master replication is messy if you attempt to write to the same database on both masters.
One of the biggest points of contention (and high blood pressure) is the use of autoincrement keys.
As long as you remember to set auto_increment_increment and auto_increment_offset, you can lookup any data you want and retrieve auto_incremented ids.
You just have to remember this rule: If you read an id from serverX, you must lookup needed data from serverX using the same id.
Here is one saving grace for using dual master replication.
Suppose you have
two databases (db1 and db2)
two DB servers (serverA and serverB)
If you impose the following restrictions
all writes of db1 to serverA
all writes of db2 to serverB
then you are not required to set auto_increment_increment and auto_increment_offset.
I hope my answer clarifies the good, the bad, and the ugly of using dual master replication.
Here is a pictorial example of 4 masters using auto increment settings
Nice article from Percona on this subject
Master-master replication can be very tricky, are you sure that this is the best solution for you ? Usually it is used for load-balancing purposes (e.g. round-robin connect to your db servers) and sometimes when you want to avoid the replication lag effect. A big known issue is the auto_increment problem which is supposedly solved using different offsets and increment value.
I think you should modify your configuration to simple master-slave by making A the master and B the slave, unless I am mistaken about the requirements of your system.
I think you can depend on
Percona XtraDB Cluster Feature 2: Multi-Master replication than regular MySQL replication
They promise the foll:
By Multi-Master I mean the ability to write to any node in your cluster and do not worry that eventually you get out-of-sync situation, as it regularly happens with regular MySQL replication if you imprudently write to the wrong server.
With Cluster you can write to any node, and the Cluster guarantees consistency of writes. That is the write is either committed on all nodes or not committed at all.
The two important consequences of Muti-master architecture.
First: we can have several appliers working in parallel. This gives us true parallel replication. Slave can have many parallel threads, and you can tune it by variable wsrep_slave_threads
Second: There might be a small period of time when the slave is out-of-sync from master. This happens because the master may apply event faster than a slave. And if you do read from the slave, you may read data, that has not changes yet. You can see that from diagram. However you can change this behavior by using variable wsrep_causal_reads=ON. In this case the read on the slave will wait until event is applied (this however will increase the response time of the read. This gap between slave and master is the reason why this replication named “virtually synchronous replication”, not real “synchronous replication”
The described behavior of COMMIT also has the second serious implication.
If you run write transactions to two different nodes, the cluster will use an optimistic locking model.
That means a transaction will not check on possible locking conflicts during individual queries, but rather on the COMMIT stage. And you may get ERROR response on COMMIT. I am highlighting this, as this is one of incompatibilities with regular InnoDB, that you may experience. In InnoDB usually DEADLOCK and LOCK TIMEOUT errors happen in response on particular query, but not on COMMIT. Well, if you follow a good practice, you still check errors code after “COMMIT” query, but I saw many applications that do not do that.
So, if you plan to use Multi-Master capabilities of XtraDB Cluster, and run write transactions on several nodes, you may need to make sure you handle response on “COMMIT” query.
You can find it here along with pictorial expln
From my rather extensive experience on this topic I can say you will regret writing to more than one master someday. It may be soon, it may not be for a long time, but it will happen. You will have two servers that each have some correct data and some wrong data, and you will either pick one as the authoritative source and throw the other away (probably without really knowing what you're throwing away) or you'll reconcile the two. No matter how you design it, you cannot eliminate the possibility of this happening, so it's a mathematical certainty that it will happen someday.
Percona (my employer) has handled probably several hundred cases of recovery after doing what you're attempting. Some of them take hours, some take weeks, one I helped with took a few months -- and that's with excellent tools to help.
Use a different replication technology or find a different way to do what you want to do. MMM will not help -- it will bring catastrophe sooner. You cannot do this with standard MySQL replication, with or without external tools. You need a replacement replication technology such as Continuent Tungsten or Percona XtraDB Cluster.
It's often easier to just solve the real need in some other fashion and give up multi-master writes, if you want to use vanilla MySQL replication.
and thanks for sharing my Master-Master Mysql cluster article. As Rolando clarified this configuration is not suitable for most production environment due to the limitation of autoincrement support.
The most adequate way to get a MySQL cluster is using NDB, which require at least 4 servers (2 management and 2 data nodes).
I have written a detailed article to get this running on two servers only, which is very similar to my previous article but using NDB instead.
http://www.hbyconsultancy.com/blog/mysql-cluster-ndb-up-and-running-7-4-and-6-3-on-ubuntu-server-trusty-14-04.html
Notice that I always recommend to analyse your needs and find out the most adequate solution, don't just look for available solutions and try to figure out if they fit with your needs or not.
-Hatem
I would highly recommend looking into a tool that will manage this for you. Multi-master replication can be very troublesome if things go wrong.
I would suggest something like Percona XtraDB Cluster. I've been following this project, and it looks very cool. I definitely think it will be a game changer in the MySQL world. It's still in beta though.

MySQL dual master

For my current project we are thinking of setting up a dual master replication topology for a geographically separated setup; one db on the us east coast and the other db in japan.
I am curious if anyone has tried this and what there experience has been.
Also, I am curious what my other options are for solving this problem; we are considering message queues.
Thanks!
Just a note on the technical aspects of your plan: You have to know that MySQL does not officially support multi-master replication (only MySQL Cluster provides support for synchronous replication).
But there is at least one "hack" that makes multi-master-replication possible even with a normal MySQL replication setup. Please see Patrick Galbraith's "MySQL Multi-Master Replication" for a possible solution. I don't have any experience with this setup, so I don't dare to judge on how feasible this approach would be.
There are several things to consider when replicating databases geographically. If you are doing this for performance reasons, be sure your replication model supports your data being "eventually consistent" as it can take time to bring the replication current in both, or many, locations. If your throughput or response times between locations is not good, active replication may not be the best option.
Setting up mysql as dual master does actually work fine in the right scenario done correctly. But I am not sure it fits very well in your scenario.
First of all, dual master setup in mysql is really a ring-setup. Server A is defined as master of B, while B is at the same time defined as the master of A, so both servers act as both master and slave. The replication works by shipping a binary log containing the sql statements which the slave inserts when it sees fit, which is usually right away. But if you're hammering it with local insertions, it will take a while to catch up. The slave insertions are sequential by the way, so you won't get any benefit of multiple cores etc.
The primary use of dual master mysql is to have redundancy on the server level with automatic fail-over (often using hearbeat on linux). Excluding mysql-cluster (for various reasons), this is the only usable automatic failover for mysql. The setup for basic dual master is easily found on google. The heartbeat stuff is a bit more work. But this is not really what you were asking about, since this really behaves as a single database server.
If you want the dual master setup because you always want to write to a local database (write to both of them at the same time), you'll need to write your application with this in mind. You can never have auto-incrementing values in the database, and when you have unique values, you must make sure that the two locations never write the same value. For example location A could write odd unique numbers and location B could write even unique numbers. The reason is that you're not guaranteed that the servers are in sync at any given time, so if you've inserted a unique row in A, and then an overlapping unique row in B before the second server catches up, you'll have a broken system. And if something first breaks, the entire system stops.
To sum it up: it's possible, but you'll need to tip-toe very carefully if you're building business software on top of this.
Because of the one-to-many architecture of MySQL replication, you have to have a replication ring with multiple masters: that is, each replicates from the next in a loop. For two, they replicate off each other. This has been supported from as far back as v3.23.
In a previous place I worked, we did it with v3.23 with quite a number of customers as a way of providing exactly what you're asking. We used SSH tunnels over the Internet to do the replication. It took us some time to get it reliable and several times we had to do a binary copy of one database to another (fortunately, none of them were over 2Gb nor needed 24-hour access). Also the replication in v3 was not nearly as stable as in v4 but even in v5, it will just stop if it detects any sort of error.
To accomodate the inevitable replication lag, we re-structured the application so that it didn't rely on AUTOINCREMENT fields (and removed that attribute from the tables). This was reasonably straightforward due to the data-access layer we had developed; instead of it using mysql_insert_id() for new objects, it created the new ID first and inserted it along with the rest of the row. We also implemented site IDs that we stored in the top half of the ID, because they were BIGINTs. This also meant we didn't have to change the application when we had a client who wanted the database in three locations. :-)
It wasn't 100% robust. InnoDB was just gaining some visibility so we couldn't easily use transactions, although we considered it. So there were race conditions occasionally when two objects tried to be created with the same ID. This meant one failed and we tried to report that in the app. But it was still a significant part of someone's job to watch over the replication and fix things when it broke. Importantly, to fix it before we got too far out of sync, because in a few cases the databases were being used in both sites and would quickly become difficult to re-integrate if we had to rebuild one.
It was a good exercise to be a part of, but I wouldn't do it again. Not in MySQL.