Google Cloud SQL master failure - MySQL

The master instance of my Google Cloud SQL deployment keeps failing randomly, and when it fails over I get "The MySQL server is running with the --read-only option". I changed the failover (replica) MySQL instance to read-only: false. That removed the read-only error, but now the master and replica are out of sync.
It also keeps randomly pointing to the replica, suggesting that the master is down.
How do I get the master and replica in sync again?
Why does the master keep failing?
Thank you team!
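For reference, a minimal sketch of how the sync state is usually checked and how a broken replica gets rebuilt (my-master and my-replica are placeholder names; the exact create flags depend on your Cloud SQL generation and HA setup):
mysql -u root -p -h REPLICA_IP -e "SHOW SLAVE STATUS\G"    # check Slave_IO_Running, Slave_SQL_Running, Seconds_Behind_Master
gcloud sql instances delete my-replica                     # if the replica can no longer catch up, drop it...
gcloud sql instances create my-replica --master-instance-name=my-master   # ...and recreate it from the master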

Related

RDS blue green switch failing with replication status

I have a database cluster A in AWS RDS (Aurora MySQL).
I created a blue/green deployment, which generated a cluster A-green. I chose "Switch" in the AWS UI. The switchover fails and the logs show:
Switchover from DB cluster A to A-green was canceled due to external replication on A. Stop replication from an external database to A before you switch over.
Where can I find the replication that the error is talking about? I don't see anything in the console.
Also, SHOW SLAVE STATUS returns an empty result.
Cheers
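For anyone hitting the same error, a minimal sketch of where to look, run against the writer endpoint of cluster A (the mysql.rds_* procedures are an assumption here; they are available on RDS/Aurora MySQL engines that support external binlog replication):
SHOW SLAVE STATUS\G                     -- or SHOW REPLICA STATUS\G on newer engine versions
CALL mysql.rds_stop_replication;        -- stop any external replication into the cluster
CALL mysql.rds_reset_external_master;   -- clear a stale external-master configuration if one is left behind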

Unexpected behavior of RDS snapshots in a replication array with an external master

RDS snapshots don't seem to work as I would expect when set up with replication. I'd like some guidance on whether I'm making incorrect assumptions or just doing something wrong.
Here's what happened:
I set up an RDS instance as a slave to an external mysql instance (outside of AWS)
I let the instance catch up; replication ran successfully for a few days, and I took nightly snapshots of the slave on RDS.
Some queries were accidentally run on the slave, causing replication errors and leaving the databases completely out of sync.
I restored the slave from a snapshot.
What I expected:
After the snapshot was restored, replication on the new slave database would be able to catch back up to the master's position.
What actually happened:
After the snapshot was restored, the data was there, but the replication settings were not: show slave status returned an empty result.
TL;DR: The AWS documentation states that RDS snapshots back up the entire database instance, so I would expect all of its settings to be backed up as well, including the settings for an external master, but that doesn't seem to be the case. What are the limitations of RDS snapshots, and how should replication with an external master be handled if the slave gets too far out of sync?
Thanks!
If the replication errors you mention left replication stopped for an extended period, Amazon RDS stops replication entirely. This is done to prevent excessive storage requirements on the source side, which would otherwise have to retain binary logs for the stalled replica. When the RDS replica is then restored from a snapshot, it can never catch up, because the binary logs have also been purged on the source. This is mentioned in the AWS documentation, which also states that this only happens if the replication error persists for about a month.
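If the source still has the required binlogs, the usual way to point a restored RDS instance back at an external master is the RDS replication stored procedures, roughly like this (a sketch; the host, replication user, password and binlog file/position are placeholders you take from the external master):
CALL mysql.rds_set_external_master ('external-master-host', 3306, 'repl_user', 'repl_password',
                                    'mysql-bin.000123', 4, 0);   -- binlog file, binlog position, SSL (0 = off)
CALL mysql.rds_start_replication;
SHOW SLAVE STATUS\G   -- verify Slave_IO_Running and Slave_SQL_Running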

Google Cloud SQL instance always in Maintenance status & Binary logs issue

I have several Google Cloud SQL MySQL 2nd Gen 5.7 instances with failover replicas. Recently I noticed that one of the instances had its storage overloaded with binlogs; for some reason the old binlogs are not being deleted.
I tried restarting this instance, but it won't start and has been stuck since 17 March.
Normal binlog rotation on another server:
The problem server: binlogs are not being cleared, the server won't start, and it is always shown as under maintenance in the gcloud console.
I also created another server with the same configuration, and its binlogs are never cleared either. I already have 5326 binlogs there, whereas on the normal server I have 1273 binlogs and they are cleared every day.
What I tried with the problem server:
1 - Deleting it from the Google Cloud Platform frontend. Response: The instance id is currently unavailable.
2 - Restarting it with the gcloud command. Response: ERROR: (gcloud.sql.instances.restart) HTTPError 409: The instance or operation is not in an appropriate state to handle the request. I get the same response for any other command I send with gcloud.
I also tried to deal with the binlogs by configuring the expire_logs_days option, but it seems this flag is not supported on Google Cloud SQL instances.
After 3 days of digging I found an answer: binlogs should be cleared automatically once 7 days have passed, so on day 8 they should be removed. They still haven't been deleted for me and storage is still climbing, but I trust it will clear shortly (today, I guess).
As I said, the SQL instance is always in maintenance and can't be deleted from the gcloud command line or the frontend. But interestingly, I can still connect to the instance with the mysql client, e.g. mysql -u root -p -h 123.123.123.123. So I simply connected to the instance and deleted an unused database (or you can first use mysqldump to save the current live database and then delete it). In the MySQL logs (I'm using Stackdriver for this) I got a lot of messages like this: 2018-03-25T09:28:06.033206Z 25 [ERROR] Disk is full writing '/mysql/binlog/mysql-bin.034311' (Errcode: -255699248 - No space left on device). Waiting for someone to free space.... Let me be that "someone".
When I deleted the database, the instance restarted and came back up. Voilà, now we have a live instance again. Now we can delete it, restore a database onto it, or change its storage.
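Condensed, the rescue sequence looks roughly like this (a sketch: the IP address is the one from the question, while unused_db and the dump file name are placeholders):
mysqldump -u root -p -h 123.123.123.123 unused_db > unused_db.sql   # optional: save a copy first
mysql -u root -p -h 123.123.123.123                                 # the instance still accepts direct connections
-- then, inside the mysql session:
SHOW BINARY LOGS;          -- confirm the binlogs are what is eating the disk
DROP DATABASE unused_db;   -- free enough space for the instance to recover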

Can't delete google cloud sql replication master instance

I decided to play around with Google Cloud SQL, so I set up a test SQL instance, loaded it with some data, and then set up replication on it in the Google dev console. I did my testing and found that it all works great: the master/slave setup works as it should and my little POC was a success. So now I want to delete the POC SQL instances, but that's not going so well.
I deleted the replica instance (aka the 'slave') fine, but for some reason the master instance still thinks there is a slave and therefore will not let me delete it. For example, I run the following command in the gcloud shell:
gcloud sql instances delete MY-INSTANCE-NAME
I get the following message:
ERROR: (gcloud.sql.instances.delete) The requested operation is not valid for a replication master instance.
This screenshot also shows that the Google dev console clearly thinks there are no replicas attached to this instance (because I deleted them), but when I run:
gcloud sql instances describe MY-INSTANCE-NAME
It shows that there is a replica name still attached to the instance.
Any ideas on how to delete this for good? Kinda lame to keep on paying for this when it was just a POC that I want to delete (glad I didn't pick a high memory machine!)
The issue was on Google's side and they fixed it. Here is the sequence of events that led to the issue:
1) Change master's tier
2) Promote replica to master while the master tier change is in progress
Just had the same problem using GCloud. Deleting the failover replica first and then the master instance worked for me.
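In gcloud terms, the order that worked is roughly this (a sketch; MY-INSTANCE-NAME and MY-FAILOVER-REPLICA are placeholders, and the describe step just confirms which replica name the master still references):
gcloud sql instances describe MY-INSTANCE-NAME    # check the replicaNames field
gcloud sql instances delete MY-FAILOVER-REPLICA   # remove the failover replica first
gcloud sql instances delete MY-INSTANCE-NAME      # now the master can be deleted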

Create instance from backup on google cloud sql

I have two questions related to Cloud SQL backups:
Are backups removed together with the instance, or are they kept for some days?
If not, is it possible to create a new instance from a backup of an instance that is already gone?
I would expect this to be possible, but it looks like backups are only listable under a specific instance, and there is no option to start a new instance from an existing backup.
Regarding (2): it's actually possible to recover them if you are quick enough. They should still be there, even when Google says they're deleted.
If you know the name of the deleted DB, run the following command to check whether they are still there:
gcloud sql backups list --instance=deleted-db-name --project your-project-name
If you can see any results, you are lucky. Restore them ASAP!
gcloud sql backups restore <backup-ID> --restore-instance=new-db-from-scratch-name --project your-project
And that's it!
Further info: https://geko.cloud/gcp-cloud-sql-how-to-recover-an-accidentally-deleted-database/
Extracted from Google Cloud SQL - Backups and recovery
Restoring from a backup restores to the instance from which the backup was taken.
So the answer to (1) is that they're gone, and with regard to (2), if you didn't export a copy of the DB to Cloud Storage, then no, you can't recover the content of your deleted Cloud SQL instance.
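For completeness, the export route that the quote alludes to looks roughly like this (a sketch; the instance, bucket and database names are placeholders, and the instance's service account needs write access to the bucket):
gcloud sql export sql my-instance gs://my-backup-bucket/my-instance-dump.sql.gz --database=my_database
gcloud sql import sql my-new-instance gs://my-backup-bucket/my-instance-dump.sql.gz --database=my_database
Because the dump lives in Cloud Storage, it survives deletion of the original instance.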
I noticed a change in this behavior recently (July 28, 2022). Part of our application update process was to run an on-demand backup on the existing deployment, tear down our stack, create a new stack, and then populate the NEW database from the contents of the backup.
Until now, this worked perfectly.
However, as of today, I'm unable to restore from the backup since the original database (dummy-db-19e2df4f) was deleted when we destroyed the old stack. Obviously the workaround is to not delete our original database until the new one has been populated, but this apparent change in behavior was unexpected.
Since the backup is listed, it seems like there are some "mixed messages" below.
List the backups for my old instance:
$ gcloud sql backups list --instance=- | grep dummy-db-19e2df4f
1659019144744 2022-07-28T14:39:04.744+00:00 - SUCCESSFUL dummy-db-19e2df4f
1658959200000 2022-07-27T22:00:00.000+00:00 - SUCCESSFUL dummy-db-19e2df4f
1658872800000 2022-07-26T22:00:00.000+00:00 - SUCCESSFUL dummy-db-19e2df4f
1658786400000 2022-07-25T22:00:00.000+00:00 - SUCCESSFUL dummy-db-19e2df4f
Attempt a restore to a new instance (that is, replacing the contents of new-db-13d63593 with that of the backup/snapshot 1659019144744). Until now this worked:
$ gcloud sql backups restore 1659019144744 --restore-instance=new-db-13d63593
All current data on the instance will be lost when the backup is
restored.
Do you want to continue (Y/n)? y
ERROR: (gcloud.sql.backups.restore) HTTPError 400: Invalid request: Backup run does not exist..
(uh oh...)
Out of curiosity, ask it to describe the backup:
$ gcloud sql backups describe 1659019144744 --instance=dummy-db-19e2df4f
ERROR: (gcloud.sql.backups.describe) HTTPError 400: Invalid request: Invalid request since instance is deleted.
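Given that, the workaround mentioned above boils down to keeping the source instance alive until the restore has finished, roughly (a sketch reusing the placeholder-style names from the transcript):
$ gcloud sql backups create --instance=dummy-db-19e2df4f
$ gcloud sql backups restore <backup-ID> --restore-instance=new-db-13d63593
$ gcloud sql instances delete dummy-db-19e2df4f   # only after the restore has succeeded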