Google Cloud SQL instance always in Maintenance status & Binary logs issue - mysql

I've had some of Google Cloud SQL MySQL 2nd Gen 5.7 instances with failover replications. Recently I noticed that the one of the instance overloaded with the storage overloaded with binlogs and old binlogs not deleted for some reason.
I tried restart this instance but it wont start since 17 March.
Normal process with binlogs on other server:
Problem server. Binlogs not clearing and server wont start and always under maintenance in the gcloud console.
Also I created one other server with same configuration and not binlogs never clearing. I have already 5326 binlogs here when on normal server I have 1273 binlogs and they are clearing each day.
What I tried with the problem server:
1 - delete it from the Google Cloud Platform frontend. Response: The instance id is currently unavailable.
2 - restart it with the gcloud command. Response: ERROR: (gcloud.sql.instances.restart) HTTPError 409: The instance or operation is not in an appropriate state to handle the request. Same response on any other command which I sent with the gcloud.
Also I tried to solve problem with binlogs to configure with expire_logs_days option, but it seems this option not support by google cloud sql instance.

After 3 days of digging I found a solution. Binlogs must cleared automatically when 7 days past. In 8 day it must clear binlogs. It still not deleted for me and still storage still climbing, but I trust it must clear shortly (today I guess)
As I told - SQL instance always in maintenance and can't be deleted from the gcloud console command or frontend. But this is interesting because I still can connect to the instance with the mysql command like mysql -u root -p -h 123.123.123.123. So, I just connected to the instance, deleted database which unused (or we can just use mysqldump to save current live database) and then I just deleted it. In the mysql logs (I'm using Stackdriver for this) I got a lot of messages like this: 2018-03-25T09:28:06.033206Z 25 [ERROR] Disk is full writing '/mysql/binlog/mysql-bin.034311' (Errcode: -255699248 - No space left on device). Waiting for someone to free space.... Let's me be this "someone".
When I deleted database it restarted and then it up. Viola. And now we have live instance. Now we can delete it/restore database on it/change storage for it.

Related

Google compute engine, instance dead? How to reach?

I have a small instance running in GCE, had some troubles with the MongoDb so after some tries decided to reset the instance. But... it didn't seem to come back online. So i stopped the instance and restarted it.
It is an Bitnami MEAN stack which starts apache and stuff at startup.
But... i can't reach the instance! No SCP, no SSH, no webservice running. When i try to connect via SSH (in GCE) it times out, cant make connection on port 22. In the information it says 'The instance is booting up and sshd is not running yet', which is possible of course.... But i cant reach the instance in no possible manner not even after an hour wait :) Not sure what's happening if i cant connect to it somehow :(
There is some activity in the console... some CPU usage, mostly 0%, some incomming traffic but no outgoing...
I hope someone can give me a hint here!
Update 1
After the helpfull tip form Serhii... if found this in the logs...
Booting from Hard Disk 0...
[ 0.872447] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
/dev/sda1 contains a file system with errors, check forced.
/dev/sda1: Inodes that were part of a corrupted orphan linked list found.
/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
fsck exited with status code 4
The root filesystem on /dev/sda1 requires a manual fsck
Update 2...
So, i need to fsck the drive...
Created a snapshot, made a new disk from that snapshot, added the new disk as an extra disk to another instance. Now that instance wont boot with the same problem... removing the extra disk fixed it again. So adding the disk makes it crash even though it isn't the boot-disk?
First, have a look at the Compute Engine -> VM instances -> NAME_OF_YOUR_VM -> Logs -> Serial port 1 (console) and try to find errors and warnings that could be connected to lack of free space or SSH. It'll be helpful if you updated your post by providing this information. In case if your instance run out of free space follow this instructions.
You can try to connect to your VM via Serial console by following this guide, but keep in mind that:
The interactive serial console does not support IP-based access
restrictions such as IP whitelists. If you enable the interactive
serial console on an instance, clients can attempt to connect to that
instance from any IP address.
more details you can find in the documentation.
Have a look at the Troubleshooting SSH guide and Known issues for SSH in browser. In addition, Google provides a troubleshooting script for Compute Engine to identify issues with SSH login/accessibility of your Linux based instance.
If you still have a problem try to use your disk on a new instance.
EDIT It looks like your test VM is trying to boot from the disk that you created from the snapshot. Try to follow this guide.
If you still have a problem, you can try to recreate the boot disk from a snapshot to resize it.

Good idea to use SQS to move thousands of databases?

We want to move from using MySQL on an EC2 instance to RDS and setup replication. Seems like a no-brainer, right? Well, I've got 30,000 databases to move (don't ask). While setting up replication seems to work well, the process of getting the 30,000 databases into RDS is a royal pain; it takes forever and something almost alway happens.
The nightly backup takes about two hours. I end up with a multi-GB SQL dump file. When I try to restore it, something almost always goes wrong: the RDS instance wasn't big enough memory-wise and crashed, the localhost ran out of swap space, the network connection went flaky. Whatever! I did get it to restore once; IIRC it took 23 hours (30K MySQL DBs are a ton of file IO).
So today, I decided to use mydumper. It generated 30,000 schema files for the database in about two hours, then suddenly, the source MySQL went into uninterruptible sleep according to top, I lost my client connections, strace showed it was still trying to read files, and the mydumper process crashed. I restarted the whole process and just checked the status; mysqld restarted 2.5 hours into it for some reason.
So here's what I'm thinking and I'd like your input: I write two python scripts: firstScript.py will run mydumper on a single database, update a status table, package up the SQL, put it onto an AWS SQS queue, repeating until no more databases are found; the secondScript.py reads from the queue, runs the SQL and updates the status table, repeating until no more messages are found.
I think this can work. Do you? The main thing I'm not sure of is this: can I simply run multiple secondScript.py by Ctrl-Z-ing them into the background?
Or does someone have a better way of moving 30,000 databases?
I would not use mysqldump or mydumper to make a logical dump. Loading the resulting SQL-format dump takes too long.
Instead, use Percona XtraBackup to make a physical backup of your EC2 instance, and upload the backup to S3. Then restore to the RDS instance from S3, setup replication on the RDS instance to your EC2 instance, and let it catch up.
The feature of restoring a physical MySQL backup to RDS was announced in November 2017.
See also:
https://www.percona.com/blog/2018/04/02/migrate-to-amazon-rds-with-percona-xtrabackup/
https://aws.amazon.com/about-aws/whats-new/2017/11/easily-restore-an-amazon-rds-mysql-database-from-your-mysql-backup/
You should try it out with a smaller instance than your 30k databases just so you get some practice with the steps. See the steps in the Percona blog I linked to above.

Create instance from backup on google cloud sql

I would have two questions related to cloud sql backups:
Are backups removed together with instance or maybe they are left for some days?
If no, is it possible to create new instance from backup of already gone instance?
I would expect it possible but looks like backups are only listable under the specific instance and there is no option to start new instance from existing backup.
Regarding to (2): It's actually possible to recover them if you are quick enough. They should still be there, even when Google says they're deleted.
If you know the name of the deleted DB run the following command to check if they are still there
gcloud sql backups list --instance=deleted-db-name --project your-project-name
If you can see any results, you are lucky. Restore them ASAP!
gcloud sql backups restore <backup-ID> --restore-instance=new-db-from-scratch-name --project your-project
And that's it!
Further info: https://geko.cloud/gcp-cloud-sql-how-to-recover-an-accidentally-deleted-database/
Extracted from Google Cloud SQL - Backups and recovery
Restoring from a backup restores to the instance from which the backup
was taken.
So the answer to (1) is they're gone and with regards to (2) if you didn't export a copy of the DB to your Cloud Storage, then No, you can't recover your deleted Cloud sQL instance content.
I noticed a change in this behavior recently (July 28, 2022). Part of our application update process was to run an on-demand backup on the existing deployment, tear down our stack, create a new stack, and then populate the NEW database from the contents of the backup.
Until now, this worked perfectly.
However, as of today, I'm unable to restore from the backup since the original database (dummy-db-19e2df4f) was deleted when we destroyed the old stack. Obviously the workaround is to not delete our original database until the new one has been populated, but this apparent change in behavior was unexpected.
Since the backup is listed, it seems like there are some "mixed messages" below.
List the backups for my old instance:
$ gcloud sql backups list --instance=- | grep dummy-db-19e2df4f
1659019144744 2022-07-28T14:39:04.744+00:00 - SUCCESSFUL dummy-db-19e2df4f
1658959200000 2022-07-27T22:00:00.000+00:00 - SUCCESSFUL dummy-db-19e2df4f
1658872800000 2022-07-26T22:00:00.000+00:00 - SUCCESSFUL dummy-db-19e2df4f
1658786400000 2022-07-25T22:00:00.000+00:00 - SUCCESSFUL dummy-db-19e2df4f
Attempt a restore to a new instance (that is, replacing the contents of new-db-13d63593 with that of the backup/snapshot 1659019144744). Until now this worked:
$ gcloud sql backups restore 1659019144744 --restore-instance=new-db-13d63593
All current data on the instance will be lost when the backup is
restored.
Do you want to continue (Y/n)? y
ERROR: (gcloud.sql.backups.restore) HTTPError 400: Invalid request: Backup run does not exist..
(uh oh...)
Out of curiosity, ask it to describe the backup:
$ gcloud sql backups describe 1659019144744 --instance=dummy-db-19e2df4f
ERROR: (gcloud.sql.backups.describe) HTTPError 400: Invalid request: Invalid request since instance is deleted.

RHEL5 rSync Mysql server

Objective: to be able to synchronize 2 linux server realtime.
My concern is after using Rsync to mirror the mysql server. The only thing it wasnt able to synchronize is the entries (ie. inserting data to the database using the insert query). How will I be able to solve this?
Things I've done:
scp the keys of the 2 server so that password wont be asked for each transaction
I used
rsync -avc /var/lib/mysql/ root#10.1.99.XXX:/var/lib/mysql/
to sync the database/tables, but wasn't able to sync the entries.
Isubaki,
It's not quite as simple as just using rsync, as mysql may have the files open at the time you are pushing them across. Linux will do the file copy ok, but using this technique, the table is locked in memory until the database is restarted.
I do have a script that will do the sync part, but it does require a database restart, which may not be what you want (you mention realtime sync)

Mysql have suddenly started regularly opening unsuccessful sockets

I've desperately tried to figure out what's happened here, but haven't seen this particular problem anywhere. I've 'inherited' (as in, not built any of it myself) management of a database server (remote, in a data warehouse, accessed by ssh) where some php daemons are running on a Linux server acting as data crawlers, inserting and processing information in a relatively steady stream into mysql.
A couple of days ago, the server crashed and came back on again. I logged in an restarted the mysql server and the crawlers, thinking no more of it. A day and a half later, the mysql server stopped working, and I couldn't diagnose it since I couldn't log into it, nor did it respond to "/etc/init.d/mysql stop" or varieties thereof. According to the log file, it kept throwing errors very regularly (once every four minutes and 16 seconds) and said that it had too many file handlers open. When I shut down the crawlers, however, I could log in again, but mysql kept throwing the errors. I checked lsof and it showed a lot of open sockets with "can't identify protocol" error.
mysqld 28843 mysql 1990u sock 0,4 2856488 can't identify protocol
mysqld 28843 mysql 1989u sock 0,4 2857220 can't identify protocol
^Thousands of these rows
I thought it was something the crawlers had done, and I restarted mysql and the failed sockets disappeared. But I was surprised to see that mysql kept opening new ones, even when the crawlers weren't running. It did this very regularly, about two new failed sockets a minute, regardless of whether the crawlers were active or not. I increased the maximum amount of filehandlers allowed for mysql to buy some time, but I'm obviously looking for a diagnosis and permanent solution.
All descriptions of such errors (socket leaks) that I've found on forums seems to be about your own software leaking, not closing its sockets. But this seems to be mysql itself that does it, and there has been no change in any of the code from when it worked fine, just a server crash and restart.
Any ideas?