google compute engine -- mount old system disk on new disk built from snapshot?

I have a GCE ubuntu 18.04 system that overflowed the / partition. Consequently, I can't start a remote shell to fix it. I created a new system from a snapshot, and stopped the old system.
I want to attach the old system disk to the new system, mount it, and delete a bunch of stuff so I can restart it.
However, when I edit the new system disk in the cloud console, the old system disk does not show up as a possible disk to attach. What do I need to do to the old system disk to both preserve it as a system disk, and allow attaching it to the new system?

What do I need to do to the old system disk to both preserve it as a
system disk, and allow attaching it to the new system?
Either the disk is still attached or the new instance is located in a different zone. Follow the steps below.
STEP 1:
Shut down your instance with the disk space problem. Log in to the Google Cloud Console. Go to Compute Engine -> VM instances. Click on your instance and make note of the “Boot disk” name. This will be the first disk under “Boot disk and local disks”.
STEP 2:
Create a snapshot of the boot disk before doing anything further. While still in Compute Engine, go to Disks, click on your boot disk, then click “CREATE SNAPSHOT”.
STEP 3:
Create a new instance in the same zone. A micro instance will work.
STEP 4:
Open a Cloud Shell prompt (this also works from your desktop if gcloud is set up). Execute the following command, replacing NAME with your instance name (the broken system), DISK with the boot disk name, and ZONE with the zone that the system is in:
gcloud compute instances detach-disk NAME --disk=DISK --zone=ZONE
Make sure that the command did not report an error.
STEP 5:
Now we will attach this disk to the new instance that you created.
Make sure that the repair instance is running before attaching the second disk. Sometimes an instance can get confused about which disk to boot from if more than one disk is bootable.
Go to Compute Engine -> VM instances. Click on your instance. Click Edit. Under “Additional disks” click “Add item”. For the name, enter/select the disk that you detached from your broken instance. Click Save.
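The same attach can also be done with gcloud; a minimal sketch, assuming the repair instance is named REPAIR and lives in the same ZONE (both placeholders):
gcloud compute instances attach-disk REPAIR --disk=DISK --zone=ZONE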
STEP 6:
SSH into your new instance with both disks attached.
STEP 7:
Mount the second disk's root file system.
Become superuser. Execute sudo -s
Execute df. Make sure that /dev/sdb1 is not mounted.
Create a directory for the mount point: mkdir /mnt/repair
Mount the second disk: mount /dev/sdb1 /mnt/repair
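Putting steps 6 and 7 together, a minimal repair session might look like this (a sketch; which files are safe to delete under /mnt/repair depends on what filled your disk):
sudo -s                        # become superuser
df -h                          # confirm /dev/sdb1 is not already mounted
mkdir /mnt/repair              # create a mount point
mount /dev/sdb1 /mnt/repair    # mount the broken system's root partition
du -sh /mnt/repair/*           # find what is eating the space
# ...delete whatever is safe to delete under /mnt/repair...
umount /mnt/repair             # unmount before detaching the disk
Afterwards, detach the disk from the repair instance and start the original instance again.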

Related

How to track disk usage on Container-Optimized OS

I have an application running on Container-Optimized OS based Compute Engine.
My application runs every 20min, fetches and writes data to a local file, then deletes the file after some processing. Note that each file is less than 100KB.
My boot disk size is the default 10GB.
I run into a "no space left on device" error every month or so while attempting to write the file locally.
How can I track disk usage?
I manually checked the size of the folders and it seems that the bulk of the space is taken by /mnt/stateful_partition/var/lib/docker/overlay2.
my-vm / # sudo du -sh /mnt/stateful_partition/var/lib/docker/*
20K /mnt/stateful_partition/var/lib/docker/builder
72K /mnt/stateful_partition/var/lib/docker/buildkit
208K /mnt/stateful_partition/var/lib/docker/containers
4.4M /mnt/stateful_partition/var/lib/docker/image
52K /mnt/stateful_partition/var/lib/docker/network
1.6G /mnt/stateful_partition/var/lib/docker/overlay2
20K /mnt/stateful_partition/var/lib/docker/plugins
4.0K /mnt/stateful_partition/var/lib/docker/runtimes
4.0K /mnt/stateful_partition/var/lib/docker/swarm
4.0K /mnt/stateful_partition/var/lib/docker/tmp
4.0K /mnt/stateful_partition/var/lib/docker/trust
28K /mnt/stateful_partition/var/lib/docker/volumes
TL;DR: Use Stackdriver Monitoring and create an alert for DISK usage.
Since you are using COS images, you can enable the Stackdriver Monitoring agent by simply adding the “google-monitoring-enabled” metadata key set to “true” on the GCE instance. To do so, run the command:
gcloud compute instances add-metadata instance-name --metadata=google-monitoring-enabled=true
Replace instance-name with the name of your instance. Remember to restart your instance for the change to take effect. You don't need to install the Stackdriver Monitoring agent since it is already installed by default in COS images.
Then, you can use the disk usage metric to get the usage of your disk.
You can create an alert to get a notification each time the usage of the partition reaches a certain threshold.
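As a sketch of what such an alert could look like (the 80% threshold, display names, and file name here are just examples; the agent.googleapis.com/disk/percent_used metric is reported by the monitoring agent), a policy can be defined in JSON and created with gcloud:
{
  "displayName": "Disk usage above 80%",
  "combiner": "OR",
  "conditions": [{
    "displayName": "disk percent_used > 80",
    "conditionThreshold": {
      "filter": "metric.type=\"agent.googleapis.com/disk/percent_used\" AND resource.type=\"gce_instance\"",
      "comparison": "COMPARISON_GT",
      "thresholdValue": 80,
      "duration": "300s"
    }
  }]
}
gcloud alpha monitoring policies create --policy-from-file=disk-alert.json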
Since you are running in the cloud, it is usually best to use the cloud's own resources to solve cloud issues.
Docker uses /var/lib/docker to store your images, containers, and local named volumes. Deleting this can result in data loss and possibly stop the engine from running. The overlay2 subdirectory specifically contains the various filesystem layers for images and containers.
To clean up unused containers and images, run:
docker system prune
Monitor it with the watch command:
sudo watch "du -sh /mnt/stateful_partition/var/lib/docker/*"
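If the overlay2 growth comes from accumulating old images, a time-based prune may help; a sketch (the 24-hour window is arbitrary):
# remove stopped containers, unused networks, dangling images, and build cache
docker system prune -f
# remove all unused images older than 24 hours
docker image prune -a -f --filter "until=24h"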

Google compute engine, instance dead? How to reach?

I have a small instance running in GCE. I had some trouble with MongoDB, so after some tries I decided to reset the instance. But... it didn't seem to come back online. So I stopped the instance and restarted it.
It is a Bitnami MEAN stack which starts Apache and other services at startup.
But... I can't reach the instance! No SCP, no SSH, no web service running. When I try to connect via SSH (in GCE) it times out; no connection can be made on port 22. The information panel says 'The instance is booting up and sshd is not running yet', which is possible of course... But I can't reach the instance in any possible manner, not even after an hour's wait :) Not sure what's happening if I can't connect to it somehow :(
There is some activity in the console... some CPU usage, mostly 0%, some incoming traffic but no outgoing...
I hope someone can give me a hint here!
Update 1
After the helpful tip from Serhii... I found this in the logs...
Booting from Hard Disk 0...
[ 0.872447] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
/dev/sda1 contains a file system with errors, check forced.
/dev/sda1: Inodes that were part of a corrupted orphan linked list found.
/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
fsck exited with status code 4
The root filesystem on /dev/sda1 requires a manual fsck
Update 2...
So, I need to fsck the drive...
Created a snapshot, made a new disk from that snapshot, and added the new disk as an extra disk to another instance. Now that instance won't boot, showing the same problem... removing the extra disk fixed it again. So adding the disk makes it crash even though it isn't the boot disk?
First, have a look at Compute Engine -> VM instances -> NAME_OF_YOUR_VM -> Logs -> Serial port 1 (console) and try to find errors and warnings that could be connected to a lack of free space or SSH. It would be helpful if you updated your post with this information. If your instance ran out of free space, follow these instructions.
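The same serial log can also be pulled from the command line, which is handy when the console UI is unavailable (NAME and ZONE are placeholders):
gcloud compute instances get-serial-port-output NAME --zone=ZONE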
You can try to connect to your VM via Serial console by following this guide, but keep in mind that:
The interactive serial console does not support IP-based access
restrictions such as IP whitelists. If you enable the interactive
serial console on an instance, clients can attempt to connect to that
instance from any IP address.
More details can be found in the documentation.
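A sketch of enabling and using the interactive serial console with gcloud (NAME and ZONE are placeholders):
# enable the interactive serial console on the instance
gcloud compute instances add-metadata NAME --zone=ZONE --metadata=serial-port-enable=TRUE
# connect to serial port 1
gcloud compute connect-to-serial-port NAME --zone=ZONE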
Have a look at the Troubleshooting SSH guide and Known issues for SSH in browser. In addition, Google provides a troubleshooting script for Compute Engine to identify issues with SSH login/accessibility of your Linux-based instance.
If you still have a problem try to use your disk on a new instance.
EDIT: It looks like your test VM is trying to boot from the disk that you created from the snapshot. Try to follow this guide.
If you still have a problem, you can try to recreate the boot disk from a snapshot to resize it.
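For the fsck itself, once the damaged disk is attached to a rescue instance as a secondary (non-boot) disk, the repair might look like this (a sketch; the partition is assumed to show up as /dev/sdb1, so check with lsblk first, and make sure it is not mounted):
lsblk                      # identify the damaged partition (assumed /dev/sdb1 here)
sudo fsck -y /dev/sdb1     # repair, answering yes to all fix prompts
Afterwards, detach the disk and attach it back to the original instance as its boot disk.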

Why restored VM from Google cloud snapshot doesn't have data in its database?

I'm running a web application in Google Compute Engine and have scheduled a snapshot for the VM [Ubuntu 16.04].
I tried restoring the VM from the last available snapshot. I'm able to bring up the web application from the restored VM. But the problem is that there is no data in the database [MongoDB]. All the collections created by the application and the default data [data seeded during deployment] are present in MongoDB on the restored VM, but other than that, there is no data.
Is this how Google snapshots work? Isn't the new restored VM supposed to have all the data till the time of snapshot creation?
Creating a snapshot while all the apps are running may not prove 100% accurate because some data is still in buffers, caches, etc.
Your missing data might not yet have been written to the disk when the snapshot was being created.
Google documentation about creating snapshots is quite clear about it:
You can create a snapshot of a persistent disk even while your apps
write data to the disk. However, you can improve snapshot consistency
if you flush the disk buffers and sync your file system before you
create a snapshot.
Pause apps or operating system processes that write data to that
persistent disk. Then flush the disk buffers before you create the
snapshot.
Try following these instructions and test the results.
If for some reason you can't completely stop the database, try to at least flush the buffers to disk, freeze the file system (if possible), and then create the snapshot.
You can freeze the file system by logging into the instance and typing sudo fsfreeze -f [example-disk_location], and unfreeze it with sudo fsfreeze -u [example-disk_location].
The perfect way (with guaranteed data integrity) is to either stop the VM or unmount the disk.
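For MongoDB specifically, a consistent-snapshot sequence might look like the following sketch (assuming a single-node mongod whose data directory lives on the snapshotted disk; db.fsyncLock() flushes pending writes and blocks new ones):
mongo --eval "db.fsyncLock()"     # flush writes and block further writes
sudo sync                         # flush OS buffers
sudo fsfreeze -f /MOUNT_POINT     # optional: freeze the file system (placeholder mount point)
# create the snapshot while frozen, e.g.:
# gcloud compute disks snapshot DISK --zone=ZONE --snapshot-names=pre-restore
sudo fsfreeze -u /MOUNT_POINT     # unfreeze
mongo --eval "db.fsyncUnlock()"   # resume writes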

Access hard drive for VM that won't boot

The hard drive on my VM filled up (it was under an SSH login attack, I believe) and now the VM won't boot.
How can I access the boot hard drive so I can clear out space and get it booting again?
You'll need to delete the instance without deleting the boot disk, attach the disk to a temporary instance, and then mount it there. The instructions to do this can be found here: https://cloud.google.com/compute/docs/troubleshooting#ssherrors (Mount your disk on a temporary instance)
You'll then be able to access the disk through the /mnt/myinstance directory, and delete the files. Once the disk is no longer full, you can detach the disk from the temporary instance, and then use it to recreate your original instance.
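A sketch of that flow with gcloud (NAME, TEMP_NAME, DISK, and ZONE are placeholders; --keep-disks=boot preserves the boot disk when the instance is deleted):
# delete the instance but keep its boot disk
gcloud compute instances delete NAME --zone=ZONE --keep-disks=boot
# attach the orphaned disk to a temporary instance
gcloud compute instances attach-disk TEMP_NAME --disk=DISK --zone=ZONE
# then, on the temporary instance:
sudo mkdir -p /mnt/myinstance
sudo mount /dev/sdb1 /mnt/myinstance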

Compute Engine Instance

I have created a Google Compute Engine instance with CentOS and added some stuff there, such as Apache, Webmin, ActiveCollab, Gitolite, etc.
The problem is that the VM is always running out of memory because the RAM is too low.
How do I change the assigned RAM in Google Compute Engine?
Do I have to copy the VM to another one with more RAM? If so, will it copy all the contents of my CentOS installation?
Can anyone give me some advice on how to get more RAM without having to reinstall everything?
Thanks
The recommended approach for manually managed instances is to boot from a Persistent root Disk. When your instance has been booted from Persistent Disk, you can delete the instance and immediately create a new instance from the same disk with a larger machine type. This is similar to shutting down a physical machine, installing faster processors and more RAM, and starting it back up again. This doesn't work with scratch disks because they come and go with the instance.
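A sketch of that delete-and-recreate flow with gcloud (NAME, BOOT_DISK, ZONE, and the machine type are placeholders):
# delete the instance but keep the boot disk
gcloud compute instances delete NAME --zone=ZONE --keep-disks=boot
# recreate it from the same disk with a larger machine type
gcloud compute instances create NAME --zone=ZONE \
    --machine-type=n1-standard-4 \
    --disk=name=BOOT_DISK,boot=yes
Newer gcloud releases also offer gcloud compute instances set-machine-type, which changes the machine type of a stopped instance in place.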
Using Persistent Disks also enables snapshots, which allow you to take a point-in-time snapshot of the exact state of the disk and create new disks from it. You can use them as backups. Snapshots are also global resources, so you can use them to create Persistent Disks in any zone. This makes it easy to migrate your instance between zones (to prepare for a maintenance window in your current zone, for example).
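A sketch of that snapshot-based migration (disk, snapshot, and zone names are placeholders; the new disk can land in any zone because snapshots are global resources):
# snapshot the disk in its current zone
gcloud compute disks snapshot DISK --zone=us-central1-a --snapshot-names=my-snap
# create a new disk from the snapshot in a different zone
gcloud compute disks create DISK-copy --source-snapshot=my-snap --zone=europe-west1-b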
Never store state on scratch disks. If the instance stops for any reason, you've lost that data. For manually configured instances, boot them from a Persistent Disk. For application data, store it on Persistent Disk, or consider using a managed service for state, like Google Cloud SQL or Google Cloud Datastore.