How to set binary log retention in AWS RDS MySQL

I want to set the binary log retention in AWS RDS for MySQL. It looks like there are two places I can do this:
via a procedure call
CALL mysql.rds_set_configuration(name,value);
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/mysql_rds_set_configuration.html
Or via the parameter group.
If the values are different, which takes precedence?
Side question: the value in the parameter group is the default one AWS RDS sets (30 days). Is there some way I can know/measure how long I would need the binary logs for?

I don't know which one takes precedence, but you should be able to check that yourself easily by using mysql.rds_show_configuration(). Change the parameter group setting, and then check if that affects the value when you show configuration using the procedure.
I have an educated guess that the parameter group is like changing the persistent setting in the my.cnf file, and changing the setting using the procedure is like using SET GLOBAL, which only lasts until you restart the instance. Then it reads the persistent setting from the parameter group and forgets that you changed it with the procedure.
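For example, something along these lines would show you quickly (the 24-hour value is just a placeholder; the setting name is the one used in the AWS doc linked above):

CALL mysql.rds_set_configuration('binlog retention hours', 24);
-- Show what the instance is actually using right now:
CALL mysql.rds_show_configuration;

Then change the retention in the parameter group, reboot if required, and run mysql.rds_show_configuration again to see which value wins.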
Is there some way I can know/measure how long I would need the binary logs for?
You can't know for certain how much binlog retention you need.
That statement needs some explanation.
Binary logs are used for:
Point-in-time recovery (PITR). For this to work, you need a full backup, and enough binlogs to replay events since that backup. So you need binlog retention only back to the most recent backup. The frequency of backups is up to you. If too much time passes and there isn't a continuous set of binlogs to do PITR, then no problem: just create a new full backup.
Replication. Normally, a replica downloads binlogs nearly immediately, so the binlog retention can be very low. But if a replica is offline for some time, it's good that the binlogs stay on the source instance. So the binlog retention has to be long enough so there aren't any binlogs missed when the replica comes back online. If the replica misses any binlogs because they expired, then the replica cannot catch up. It must be wiped and reinitialized from a new backup.
Change data capture (CDC) using tools like Debezium. Similar pattern to replication. If the CDC client runs periodically, there needs to be enough binlog retention to cover the interval between runs. This could be seconds, or days. It's up to you to determine how often this runs.
So how can the source instance know if a replica or a CDC client has disconnected, when it might come back to download more? It can't know. Perhaps those clients were decommissioned and will never reconnect. The source instance storing the binlogs has no way of knowing.
So it's up to you to know things like how long your replica will be offline.
Or stated another way, if you have X days of binlog retention, it's up to you to make sure the replica is back up before the binlogs it needs start expiring.
If you can't do that, then the replica needs to be reinitialized. At my last DBA job, we had so many replicas offline for days due to server failures that we had to reinitialize replicas at least once a week.
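One concrete check you can do when a replica comes back online is to compare the binlog file it still needs against what the source still retains; a rough sketch:

-- On the replica:
SHOW SLAVE STATUS;
-- note the Relay_Master_Log_File / Exec_Master_Log_Pos fields
-- On the source:
SHOW BINARY LOGS;
-- lists the binlog files still retained
-- If the file the replica needs is no longer in that list, the binlogs have
-- expired and the replica must be wiped and reinitialized from a fresh backup.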

MySQL/MariaDB read preference from slave with max staleness

I am using MySQL/MariaDB with the InnoDB storage engine, version 10.x.
I want to set up a cluster with a master-slave configuration. There is an option to read data from the slave using --innodb-read-only or --read-only.
However, in addition to the above, the client needs to read data from the slave if and only if the maximum slave lag is less than x seconds.
Slaves can lag behind the primary due to network congestion, low disk throughput, long-running operations, etc. A read preference with a max allowed staleness option should let the application specify a maximum replication lag, or “staleness”, for reads from slaves. When a slave's estimated staleness exceeds that limit, the client stops using it for read operations and starts reading from the master.
I would like to know if there is such an option in MySQL/InnoDB.
There's no automatic option for switching the query to the master. This is handled by application logic.
You can run the query SHOW SLAVE STATUS, and one of the fields returned is Seconds_Behind_Master. You would have to write application code to check this, and if the lag is greater than your threshold, query the master instead.
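The check the application (or a monitoring job) runs against the slave might look roughly like this; the 10-second threshold is only an example:

SHOW SLAVE STATUS;
-- (newer versions also accept SHOW REPLICA STATUS)
-- Inspect the Seconds_Behind_Master column in the result:
--   NULL means replication is not running, so treat the slave as unusable;
--   a value above your threshold (say 10 seconds) means send the read to the master.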
You might find some type of proxy that can do this logic for you. See https://mydbops.wordpress.com/2018/02/19/proxysql-series-mysql-replication-read-write-split-up/
It's not always the best option to treat a replica with X seconds of lag as unusable. Some queries are perfectly okay regardless of the lag. I wrote a presentation about this some years ago, and it includes some example queries. Read / Write Splitting with MySQL and PHP (Percona webinar 2013)
There are many proxy products that may have code for such logic.
If you automatically switch to the Master, then it may get overwhelmed, thereby leading to a worse system problem.
If you try to switch to another Slave, it is too easy to get into a flapping situation.
Galera has a way to deal with "critical read", if you wanted to go to a Cluster setup instead of Master + Slaves.
If part of the problem is the distance between Master and Slave, and if you switch to the Master, where is the Client? If it is near the Slave, won't the added time to reach the master cancel out some of the benefit?
Avoid long-running queries, beef up the Slave to avoid slow disks, speed up queries that are hitting the disk a lot, look into network improvements.
In summary, I don't like the idea of attempting to move a query to the Master; I would work on dealing with the underlying problem.
MariaDB MaxScale has multiple ways of dealing with replication lag.
The simplest method is to limit the maximum allowed replication lag with the max_slave_replication_lag parameter. This works exactly the way you described: if a slave is too many seconds behind the master, other slaves are used and, as a last resort, the master. This is the most common method of dealing with replication lag in MaxScale.
Another option is to use the causal_reads feature, which leverages MASTER_GTID_WAIT and other features found in MariaDB 10.2 and newer versions. This allows read consistency without adding additional load on the master. It does come at the cost of latency: if the server is lagging several seconds behind, the read could take longer. This option is used when data consistency is critical but request latency is not as important.
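Under the hood, a causal read roughly boils down to waiting on the slave for a GTID the master has already written, before running the read. A minimal manual sketch (the GTID value and the 2-second timeout are just examples):

-- On the master, right after the write, capture the current GTID position:
SELECT @@gtid_binlog_pos;   -- e.g. returns '0-1-42'
-- On the slave, before reading, wait until that position has been applied:
SELECT MASTER_GTID_WAIT('0-1-42', 2);
-- Returns 0 once the slave has caught up, or -1 if the timeout expires;
-- only run the read after a successful wait.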
The third option is to use the CCRFilter to force reads to the master after a write happens. This is a simpler approach compared to causal_reads but it provides data consistency at the cost of increased load on the master.

RDS Read Replica Considerations

We hired an intern and want to let him play around with our data to generate useful reports. Currently we just took a database snapshot and created a new RDS instance that we gave him access to. But that is out of date almost immediately due to changes on the production database.
What we'd like is a live (or close-to-live) mirror of our actual database that we can give him access to without worrying about him modifying any real data or accidentally bringing down our production database (eg by running a silly query like SELECT (*) FROM ourbigtable or a really slow join).
Would a read replica be suitable for this purpose? It looks like it would at least be staying up to date but I'm not clear what would happen if a read replica went down or if data was accidentally changed on it or any other potential liabilities.
The only thing I could find related to this was this SO question and this has me a bit worried (emphasis mine):
If you're trying to pre-calculate a lot of data and otherwise modify what's on the read replica you need to be really careful you're not changing data -- if the read is no longer consistent then you're in trouble :)
TL;DR Don't do it unless you really know what you're doing and you understand all the ramifications.
And bluntly, MySQL replication can be quirky in my experience, so even knowing what is supposed to happen and what does happen if there's as the master tries to write updated data to slave you've also updated.... who knows.
Is there any risk to the production database if we let an intern have at it on an unreferenced read replica?
We've been running read-replicas of our production databases for a couple years now without any significant issues. All of our sales, marketing, etc. people who need the ability to run queries are provided access to the replica. It's worked quite well and has been stable for the most part. The production databases are locked down so that only our applications can connect to it, and the read-replicas are accessible only via SSL from our office. Setting up the security is pretty important since you would be creating all the user accounts on the master database and they'd then get replicated to the read-replica.
I think we once saw a read-replica get into a bad state due to a hardware-related issue. The great thing about read-replicas though is that you can simply terminate one and create a new one any time you want/need to. As long as the new replica has the exact same instance name as the old one its DNS, etc. will remain unchanged, so aside from being briefly unavailable everything should be pretty much transparent to the end users. Once or twice we've also simply rebooted a stuck read-replica and it was able to eventually catch up on its own as well.
There's no way that data on the read-replica can be updated by any method other than processing commands sent from the master database. RDS simply won't allow you to run something like an insert, update, etc. on a read-replica no matter what permissions the user has. So you don't need to worry about data changing on the read-replica causing things to get out of sync with the master.
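For example, trying a write on a read-replica fails immediately with something along these lines (the table name here is made up):

INSERT INTO reports.sandbox (id) VALUES (1);
-- ERROR 1290 (HY000): The MySQL server is running with the --read-only option
-- so it cannot execute this statement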
Occasionally the replica can get a bit behind the production database if somebody submits a long running query, but it typically catches back up fairly quickly once the query completes. In all our production environments we have a few monitors set up to keep an eye on replication and to also check for long running queries. We make use of the pmp-check-mysql-replication-delay command in the Percona Toolkit for MySQL to keep an eye on replication. It's run every few minutes via Nagios. We also have a custom script that's run via cron that checks for long running queries. It basically parses the output of the "SHOW FULL PROCESSLIST" command and sends out an e-mail if a query has been running for a long period of time along with the username of the person running it and the command to kill the query if we decide we need to.
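If you would rather not parse SHOW FULL PROCESSLIST output, a rough equivalent of that long-running-query check can be done in plain SQL (the 300-second threshold is just an example):

SELECT id, user, host, time, info
FROM information_schema.processlist
WHERE command = 'Query' AND time > 300;
-- Each row is a query that has been running for more than 5 minutes;
-- KILL <id>; would terminate it if you decide you need to.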
With those checks in place we've had very little problem with the read-replicas.
MySQL replication works in such a way that what happens on the slave has no effect on the master.
A replication slave asks for a history of events that happened on the master and applies them locally. The master never writes anything on the slaves: the slaves read from the master and do the writing themselves. If the slave fails to apply the events it read from the master, it will stop with an error.
The problematic part of this style of data replication is that if you modify the slave and later modify the master, you might have a different value on the slave than on the master. This can be avoided by turning on the global read_only variable.
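For example, on a self-managed slave (RDS read replicas already have this set for you):

SET GLOBAL read_only = ON;
-- Now only users with the SUPER privilege (and the replication SQL thread)
-- can still write; everyone else gets an error on INSERT/UPDATE/DELETE.
-- To keep it across restarts, also set read_only in my.cnf.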

Reliability of MySQL master-slave replication

I have an application that requires a master catalogue of about 30 tables, which needs to be copied out to many (100+) slave copies of the application. Slaves may be in their own DB instance, or there may be multiple slaves in a single DB instance. Any changes to the master catalogue need to be copied out to the slaves within a reasonable time, around 5 minutes. Our infrastructure is all AWS EC2 and we use MySQL. Master and slaves will all reside within a single AWS region.
I had planned to use Master-Slave replication but I see reports of MySQL replication being sometimes unreliable and I am not sure if this is due to failings inherent in the particular implementations or failings in MySQL itself. We need a highly automated and reliable system and it may be that we have to develop monitoring scripts that allow a slave to continuously monitor its catalogue relative to the master.
Any observations?
When I was taking dance lessons before my wedding, the instructor said, "You don't have to do every step perfectly, you just have to learn to recover gracefully when missteps happen. If you can do that quickly, with a smile on your face, no one will notice."
If you have 100+ replicas, expect that you will be reinitializing replicas frequently, probably at least one or two every day. This is normal.
All software has bugs. Expecting anything different is, frankly, naive. Don't expect software to be flawless and continue operating 24/7 indefinitely without errors, because you will be disappointed. You should not seek a perfect solution, you should think like a dancer and recover gracefully.
MySQL replication is reasonably stable, and no less so than other solutions. But there are a variety of failures that can happen, without it being MySQL's fault.
Binlogs can develop corrupted packets in transit due to network glitches. MySQL 5.6 introduced binlog checksums to detect this.
The master instance can crash and fail to write an event to the binlog. sync_binlog can help to ensure all transactions are written to the binlog on commit (though with overhead for transactions).
Replica data can fall out of sync due to non-deterministic SQL statements, or packet corruption, or log corruption on disk, or some user can change data directly on a replica. Percona's pt-table-checksum can detect this, and pt-table-sync can correct errors. Using binlog_format=ROW reduces the chance of non-deterministic changes. Setting the replicas read-only can help, and don't let users have the SUPER privilege (see the settings sketch after this list).
Resources can run out. For example, you could fill up the disk on the master or the replica.
Replicas can fall behind if they can't keep up with the changes on the master. Make sure your replica instances are not under-powered. Use binlog_format=ROW. Write fewer changes to an individual MySQL master. MySQL 5.6 introduced multi-threaded replicas, but so far I've seen some cases where this is still a bit buggy, so test carefully.
Replicas can be offline for an extended time, and when they come back online, some of the master's binlogs have been expired so the replica can't replay a continuous stream of events from where it left off. In that case, you should trash the replica and reinitialize it.
Bugs happen in any software project, and MySQL's replication has had its share. You should keep reading the release notes of MySQL, and be prepared to upgrade to take advantage of bug fixes.
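A rough sketch of the settings mentioned above, shown as SET GLOBAL statements; for anything you want to survive a restart, put the same values in my.cnf (or the RDS parameter group):

SET GLOBAL sync_binlog = 1;            -- flush the binlog to disk on every commit
SET GLOBAL binlog_checksum = 'CRC32';  -- detect binlog corruption in transit (MySQL 5.6+)
SET GLOBAL binlog_format = 'ROW';      -- avoid non-deterministic statement-based replication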
Managing a big collection of database servers in continuous operation takes a significant amount of full-time work, no matter what brand of database you use. But data has become the lifeblood of most businesses, so it's necessary to manage this resource. MySQL is no better and no worse than any other brand of database, and if anyone tells you something different, they're selling something.
P.S.: I'd like to hear why you think you need 100+ replicas in a single AWS region, because that is probably overkill by an order of magnitude for any goal of high availability or scaling.

What is a good way to show the effect of replication in MySQL?

We have to show a difference that demonstrates the advantages of using replication. We have two computers, linked by TeamViewer so we can show our class what we are doing exactly.
Is it possible to show a difference in performance? (How long it takes to execute certain queries?)
What sort of queries should we test? (In other words, where is the difference between using/not using replication the biggest?)
How should we fill our database? How much data should be there?
Thanks a lot!
I guess the answer to the above questions depends on factors such as which storage engine you are using, size of the database, as well as your chosen replication architecture.
I don't think replication will have much of an impact on query execution for a simple master->slave architecture. If, however, you have an architecture where there are two masters: one handling writes, replicating to another master which exclusively handles reads, and then replicating to a slave which handles backups, then you are far more likely to be able to present some of the more positive scenarios. Have a read up on locks and storage engines, as this might influence your choices.
One simple way to show how replication can be positive is to demonstrate a simple backup strategy. E.g. taking hourly backups on the master server itself can bring the underlying application to a complete halt for the duration of the backup (taking backups using mysqldump locks the tables so that no read/write operations can occur), whereas replicating to a slave and then taking backups from there negates this effect.
If you want to show detailed statistics, it's probably better to look into some benchmarking/profiling tools (sysbench, mysqlslap, sql-bench, to name a few). This can become quite complex though.
Also might be worth looking at the Percona Toolkit and the Percona monitoring plugins here: http://www.percona.com/software/
Replication has several advantages:
Robustness is increased with a master/slave setup. In the event of problems with the master, you can switch to the slave as a backup
Better response time for clients can be achieved by splitting the load for processing client queries between the master and slave servers
Another benefit of using replication is that you can perform database backups using a slave server without disturbing the master.
Using replication is always a safe thing to do; you should be replicating your production server anyway, so that in case of failure it will be helpful.
You can show the Seconds_Behind_Master value while demonstrating replication performance; it gives an indication of how “late” the slave is. This value should not be more than 600-800 seconds, but network latency does matter here.
Make sure that the master and slave servers are configured correctly, then:
You can stop the slave server, let the master server take some updates/inserts (bulk inserts), and then start the slave server. You will see a large Seconds_Behind_Master value, and it should keep decreasing until it reaches 0.
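A minimal way to stage that demonstration; the bulk load itself is whatever you prefer:

-- On the slave:
STOP SLAVE;
-- On the master: generate a burst of writes, e.g. run a bulk INSERT into a
-- demo table a few thousand times.
-- On the slave:
START SLAVE;
SHOW SLAVE STATUS;   -- watch Seconds_Behind_Master shrink back toward 0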
There is a tool called MONyog - MySQL Monitor and Advisor which shows Replication status in real-time.
Also, which kind of replication to use, whether statement-based or row-based, is explained here:
http://dev.mysql.com/doc/refman/5.1/en/replication-sbr-rbr.html

Best way to migrate servers without losing any data and with no downtime(?)

This is a methodology question from a freelancer, with a corollary on MySQL. Is there a way to migrate from an old dedicated server to a new one without losing any data in between, and with no downtime? In the past, I've had to lose MySQL data between the time when the new server goes up (i.e., all files transferred, system up and ready) and when I take the old server down (data is still being written to the old one until the new one takes over). There is also a short period where both are down for DNS, etc., to refresh.
Is there a way for MySQL/root to easily transfer all data that was updated/inserted between a certain time frame?
I'd make a sorry page, put it up on the old server, transfer all data to the new one and then switch DNS. Though there will be downtime.
What I like to do is close the site and start moving the DB to the other server, then move all files (PHP, etc.) to the other server (if you have stored data or files that change every hour, like image uploads), and point the old server to the new DB server while the DNS is changing over to the new server.
The longest downtime is from the DNS switch: it can take several hours and even days till all clients' caches are expired.
To avoid it:
Set up the application on the new server to access the DB on the old one, or just proxy HTTP requests with nginx to the old one, depending on what is more acceptable.
Then comes the DNS switch: some clients go to the old server, some to the new. Here you can wait 24+ hours to make sure all requests go to the new server.
While DNS switches over, rehearse the MySQL transition:
Make a 'sorry/maintenance page'; there are plenty of guides on how to do that using rewrites. You'll need it anyway.
Measure how fast you can dump, transfer, and restore the DB. If the time is acceptable, this is the simplest approach, but remember to give yourself some margin.
If the previous approach is too slow, you can try the binlog method suggested in the other answer.
Minimal downtime can be achieved by making the new server a MySQL slave of the old one; under the hood it just downloads the binlog from the master on the fly, so you save time on transferring the whole log. Most probably, during minimal load, the slave will be just several seconds behind the master and will catch up very quickly once the app is taken down; see how to force a slave to catch up (a minimal sketch of this setup follows this answer).
Write a script that does the whole transition for you: enables maintenance mode, locks the master DB, waits till the slave catches up, makes the slave the new master, replaces the app config with the new DB, disables maintenance, switches the app, etc. This way you save time over typing commands yourself; test it on a staging environment to avoid possible errors (also remember to set a larger MySQL timeout, just in case the slave is a lot behind).
Then comes the transition itself, by running the script from the previous step.
Also, if you use file uploads to a local filesystem, these need to be synced too, and with lots of files this is more painful than the DB, because even an rsync scan for changes can take a lot of time.
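A minimal sketch of the replication step from the list above; the host name, credentials, and binlog coordinates are placeholders, and the initial dump/restore of the data is assumed to have been done already:

-- On the old server: create a replication user
CREATE USER 'repl'@'%' IDENTIFIED BY 'secret';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
-- On the new server: point it at the old one, using the binlog coordinates
-- recorded when the initial dump was taken
CHANGE MASTER TO
  MASTER_HOST = 'old-server.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000123',
  MASTER_LOG_POS = 4;
START SLAVE;
-- At cutover: stop writes on the old server, wait for Seconds_Behind_Master
-- to reach 0, then STOP SLAVE; RESET SLAVE ALL; and point the app at the new server.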
Check out the MySQL binary log.
Sure. Enable binary logging on the source server. After that is started, make a DB dump, transfer it to the new server, and import it. Then, when you're ready to make the switch, change DNS (let the change propagate while you're working), then take the site down on both servers. Copy the binlogs to the new server and replay them starting at the date/time of the dump.