I have a running RDS Aurora MySQL 8.0.23 cluster running in production. The database is unencrypted and I need to enable encryption for it. As far as I understand, this is not possible to do directly. The procedure I am evaluating is:
Create a read replica on the current cluster.
Stop replication on replica and annotate binlog filename and
position.
Promote the read replica to a new encrypted cluster (maybe it
requires to do a snapshot before).
Set up back replication with the original cluster using binlog file
and position annotated before.
Wait until replication lag is zero.
Redirect production traffic to the new cluster.
Stop replication.
[Optional] Delete old cluster.
I have two issues with the above procedure:
Once created the replica, running commands like SHOW SLAVE STATUS
or SHOW REPLICA STATUS return empty set, so I can't annotate
binlog file and position. Please note that replication is enabled on
the original cluster (binlog_format is set to ROW).
It seems I can't promote the Aurora read replica to a new cluster,
the option is missing on the available actions. But according to the documentation it should be possible.
Has anyone have feedback about the issues above? What is the current up-to-date procedure to encrypt an Aurora MySQL cluster with minimum downtime and no data loss?
I added a reader replica to my RDS Aurora MySQL cluster. The instance is running with minor cpu usage but it does not show connections on the monitoring page. I have enabled detailed monitoring. Access groups are the same as the writer instance.
How to I ensure that traffic is going to my reader instance?
AWS RDS Aurora does not support splitting of read/write transactions,
In order to forward read only queries to read replica endpoint and read/write queries to your master endpoint you need to add function or a proxy to your application to inspect the query then forward it to read replica or to the master, in other word the application logic should manage this process.
I am trying to deploy two MySQL pods with the same PVC, but I am getting CrashLoopBackoff state when I create the second pod with the error in logs: "innoDB check that you do not already have another mysqld process using the same innodb log files". How to resolve this error?
There are different options to solve high availability. If you are running kubernetes with an infrastructure that can provision the volume to different nodes (f.e. in the cloud) and your pod/node crashes, kubernetes will restart the database on a different node with the same volume. Aside from a short downtime you will have the database back up running in a relatively short time.
The volume will be mounted to a single running mysql pod to prevent data corruption from concurrent access. (This is what mysql notices in your scenario as well, since it is no designed for shared storage as HA solution)
If you need more you can use the built in replication of mysql to create a mysql 'cluster' which can be used even if one node/pod should fail. Each instance of the mysql cluster will have an individual volume in that case. Look at the kubernetes stateful set example for this scenario: https://kubernetes.io/docs/tasks/run-application/run-replicated-stateful-application/
I have setup MYSQLHA as per https://kublr.com/blog/setting-up-mysql-replication-clusters-in-kubernetes-2/ have two nodes up and ready able to deploy pods on each of them and replicate data from master to slave within seconds.
1 Master node
2 Slave nodes
VMWARE ESXi setup 3 VM's on seperate subnets
I also have NFS shared setup just in case required.
Ref:- https://kublr.com/blog/setting-up-mysql-replication-clusters-in-kubernetes-2/
How to perform auto fail-overs and scaling?
Async master-slave replication of MySQL is not the best fit for this. I would go for something like Galera replication where all the nodes are active in the cluster, can act as seed nodes for new joiners when you scale up and a simple readiness probe is enough to exclude faulty node / include new ones in the Galera cluster.
Asynchronous replication with master-slave is a good choice for cases that are ie. geographically distributed, so that the latency does not affect your workloads.
I have created an Amazon Aurora Database cluster runing MySQL with three instances: the main instance that backs the cluster and two read replicas for balancing. However, the cluster does not seem to be balancing the reads at all. I have one replica managing 700+ Selects/sec maximizing the CPU at 99.75% or higher while the other replica is doing virtually nothing with a CPU usage of 4% at 1 select per second, if that. The main cluster instance itself is at 33% CPU usage as it is being written to simultaneously while the replicas should are being read from. The lag time between the replicas is under 20 milliseconds. My application is querying the read only endpoint of the cluster but no balancing appears to be happening. Does anyone have any insight into why this may be happening or why the replica is at such a high CPU usage? The queries being ran against it are not complex by any means.
Aurora Cluster endpoints are DNS records and they only do DNS round robin during resolution. This means that when your client application opens connections to a cluster endpoint, you end up resolving the endpoint to different instances (different IPs basically), there by striping your connections across multiple replicas. Past that point, there is no load balancing. Connections are striped across instances, and queries run on each of those connections go to the corresponding instance backing it.
Now consider the scenario where your connection pool was already created to the cluster endpoint when you have one instance behind it. Now, if you add more instances, there will be no impact to your application, unless you terminate your connection and reestablish them. You would do a DNS round robin again, and this time some of your connections would land on the new instance that you provisioned.
Few callouts:
In Aurora, you have 2 cluster endpoints. One (RW) endpoint always points to the current writer and one (RO) does the DNS round robin between your read replicas.
Also, DNS propagation might take a few seconds when failovers happen, so that occasional errors are quite natural when failovers occur.
Hope this helps.
We've implemented a driver to try to mitigate this problem, with some visible gains: https://github.com/DiceTechnology/dice-fairlink
It regularly discovers the read-replicas to catch up with cluster changes and round-robins connections among them.
Despite not measuring any CPU utilisation, we've observed a better load distribution than with the native DNS based round-robin of the cluster reader endpoint
The Aurora's DNS based load balancing works at the connection level (not the individual query level). You must keep resolving the endpoint without caching DNS to get a different instance IP on each resolution. If you only resolve the endpoint once and then keep the connection in your pool, every query on that connection goes to the same instance. If you cache DNS, you receive the same instance IP each time you resolve the endpoint.
Unless you use a smart database driver, you depend on DNS record updates and DNS propagation for failovers, instance scaling, and load balancing across Aurora Replicas. Currently, Aurora DNS zones use a short Time-To-Live (TTL) of 5 seconds. Ensure that your network and client configurations don’t further increase the DNS cache TTL. Remember that DNS caching can occur anywhere from your network layer, through the operating system, to the application container. For example, Java virtual machines (JVMs) are notorious for caching DNS indefinitely unless configured otherwise. Here are AWS documentation and Aurora whitepaper on configuring DNS cache ttl.
My guess is that you are not connecting to the cluster endpoint.
Load Balancing – Connecting to the cluster endpoint allows Aurora to load-balance connections across the replicas in the DB cluster. This helps to spread the read workload around and can lead to better performance and more equitable use of the resources available to each replica. In the event of a failover, if the replica that you are connected to is promoted to the primary instance, the connection will be dropped. You can then reconnect to the reader endpoint in order to send your read queries to the other replicas in the cluster.
New Reader Endpoint for Amazon Aurora – Load Balancing & Higher Availability
[EDIT]
To load balance within a single application, you will need to reconnect to the endpoint. If you use the same connection for all queries only one replica will be responding. However, opening connections is expensive so this might not provide much benefit unless your queries take some time to run.