Access hard drive for VM that won't boot - google-compute-engine

The hard drive on my VM filled up (it was under an SSH login attack, I believe) and now the VM won't boot.
How can I access the boot hard drive so I can clear out space and get it booting again?

You'll need to delete the instance without deleting the boot disk, attach the disk to a temporary instance, and then mount it there. The instructions to do this can be found here: https://cloud.google.com/compute/docs/troubleshooting#ssherrors (Mount your disk on a temporary instance)
You'll then be able to access the disk through the /mnt/myinstance directory, and delete the files. Once the disk is no longer full, you can detach the disk from the temporary instance, and then use it to recreate your original instance.
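A rough gcloud sketch of that flow (the names my-instance, rescue-instance, the zone, and the device path are placeholders; by default the boot disk shares the instance's name, which you can confirm with gcloud compute disks list):
gcloud compute instances delete my-instance --zone=us-central1-a --keep-disks=boot
gcloud compute instances create rescue-instance --zone=us-central1-a
gcloud compute instances attach-disk rescue-instance --disk=my-instance --zone=us-central1-a
Then SSH into rescue-instance and mount the disk (check lsblk first, since the partition may not be sdb1 on your system):
sudo mkdir -p /mnt/myinstance
sudo mount /dev/sdb1 /mnt/myinstance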

Related

google compute engine -- mount old system disk on new disk built from snapshot?

I have a GCE Ubuntu 18.04 system that overflowed the / partition. Consequently, I can't start a remote shell to fix it. I created a new system from a snapshot, and stopped the old system.
I want to attach the old system disk to the new system, mount it, and delete a bunch of stuff so I can restart it.
However, when I edit the new system disk in the cloud console, the old system disk does not show up as a possible disk to attach. What do I need to do to the old system disk to both preserve it as a system disk, and allow attaching it to the new system?
What do I need to do to the old system disk to both preserve it as a system disk, and allow attaching it to the new system?
Either the disk is still attached or the new instance is located in a different zone. Follow the steps below.
STEP 1:
Shut down your instance with the disk space problem. Log in to the Google Cloud Console. Go to Compute Engine -> VM instances. Click on your instance and make note of the “Boot disk” name. This will be the first disk under “Boot disk and local disks”.
STEP 2:
Create a snapshot of the boot disk before doing anything further. Still in Compute Engine, go to Disks, click on your boot disk, then click “CREATE SNAPSHOT”.
STEP 3:
Create a new instance in the same zone. A micro instance will work.
STEP 4:
Open a Cloud Shell prompt (this also works from your desktop if gcloud is set up). Execute this command, replacing NAME with your instance name (the broken system), DISK with the boot disk name, and ZONE with the zone that the system is in:
gcloud compute instances detach-disk NAME --disk=DISK --zone=ZONE
Make sure that the command did not report an error.
STEP 5:
Now we will attach this disk to the new instance that you created.
Make sure that the repair instance is running before attaching the second disk. Sometimes an instance can get confused about which disk to boot from if more than one disk is bootable.
Go to Compute Engine -> VM instances. Click on your instance. Click Edit. Under “Additional disks” click “Add item”. For the name, enter/select the disk that you detached from your broken instance. Click Save.
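If you prefer the command line to the console, the same attach can be done with gcloud (repair-instance stands in for whatever you named the new instance in STEP 3):
gcloud compute instances attach-disk repair-instance --disk=DISK --zone=ZONE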
STEP 6:
SSH into your new instance with both disks attached.
STEP 7:
Mount the second disk to the root file system.
Become superuser. Execute sudo -s
Execute df. Make sure that /dev/sdb1 is not mounted.
Create a directory for the mount point: mkdir /mnt/repair
Mount the second disk: mount /dev/sdb1 /mnt/repair
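From there, a typical cleanup pass looks roughly like this (the paths are examples only; delete only files you recognize, such as rotated logs):
df -h /mnt/repair                                  # confirm the disk really is full
du -xh /mnt/repair --max-depth=2 | sort -h | tail  # locate the biggest directories
rm /mnt/repair/var/log/*.gz                        # example: remove rotated logs
umount /mnt/repair                                 # unmount before detaching
Once the disk has free space again, detach it from the repair instance (gcloud compute instances detach-disk), reattach it to your original instance as the boot disk, and start the instance.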

What if log file size exceeds my claimed Persistent volume?

I am working on logging of my application on Persistent Volume.
I am using OpenShift. I created storage (a Persistent Volume under the nas-thin class) and allocated 64GiB to it. I added a mount path for this PV to one of my pods, where my application is running and generating logs in a folder named "logs".
My mount path is "/logs", so anything inside this folder is at the root of my PVC.
I am appending my logs to a single file inside the logs folder.
I tried to read about expanding PV but couldn't understand much.
What would happen if my log file size exceeds the allocated PV size (64GiB)?
That will depend on the persistent storage actually being used by the cluster. Some persistent volume providers will let you write more data than you actually defined. So you'll have to test how your storage and your application actually behave on your particular cluster.
That being said, it is generally a bad idea to have container workloads log to a persistent volume. I would strongly recommend logging to STDOUT and then using an external logging system to manage your logs instead of writing to a file.
How will you deal with multiple replicas of your application running? Do you really want to go into each container to get the log files? How will you correlate logs between different replicas?
Applications running on OpenShift / Kubernetes should not manage their logs in files but write to STDOUT.
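To see how close you actually are to the limit, and whether your storage class even allows growing the claim, something like the following can help (my-app-pod and my-logs-pvc are placeholder names):
oc exec my-app-pod -- df -h /logs      # actual usage inside the pod
oc get pvc my-logs-pvc -o yaml         # requested size and storage class of the claim
oc get storageclass nas-thin -o yaml   # expansion is only possible if allowVolumeExpansion is true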

How to track disk usage on Container-Optimized OS

I have an application running on Container-Optimized OS based Compute Engine.
My application runs every 20min, fetches and writes data to a local file, then deletes the file after some processing. Note that each file is less than 100KB.
My boot disk size is the default 10GB.
I run into "no space left on device" error every month or so while attempting to write the file locally.
How can I track disk usage?
I manually checked the size of the folders and it seems that the bulk of the space is taken by /mnt/stateful_partition/var/lib/docker/overlay2.
my-vm / # sudo du -sh /mnt/stateful_partition/var/lib/docker/*
20K /mnt/stateful_partition/var/lib/docker/builder
72K /mnt/stateful_partition/var/lib/docker/buildkit
208K /mnt/stateful_partition/var/lib/docker/containers
4.4M /mnt/stateful_partition/var/lib/docker/image
52K /mnt/stateful_partition/var/lib/docker/network
1.6G /mnt/stateful_partition/var/lib/docker/overlay2
20K /mnt/stateful_partition/var/lib/docker/plugins
4.0K /mnt/stateful_partition/var/lib/docker/runtimes
4.0K /mnt/stateful_partition/var/lib/docker/swarm
4.0K /mnt/stateful_partition/var/lib/docker/tmp
4.0K /mnt/stateful_partition/var/lib/docker/trust
28K /mnt/stateful_partition/var/lib/docker/volumes
TL;DR: Use Stackdriver Monitoring and create an alert for DISK usage.
Since you are using COS images, you can enable the Stackdriver Monitoring agent by simply setting the “google-monitoring-enabled” key to “true” in the GCE instance metadata. To do so, run the command:
gcloud compute instances add-metadata instance-name --metadata=google-monitoring-enabled=true
Replace instance-name with the name of your instance. Remember to restart your instance to apply the change. You don't need to install the Stackdriver Monitoring agent since it is already installed by default in COS images.
Then, you can use the disk usage metric to get the usage of your disk.
You can create an alert to get a notification each time the usage of the partition reaches a certain threshold.
Since you are in the cloud, it is usually a good idea to use cloud resources to solve cloud issues.
Docker uses /var/lib/docker to store your images, containers, and local named volumes. Deleting this can result in data loss and possibly stop the engine from running. The overlay2 subdirectory specifically contains the various filesystem layers for images and containers.
To clean up unused containers and images, run:
docker system prune
Monitor it with the watch command:
sudo watch "du -sh /mnt/stateful_partition/var/lib/docker/*"
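Before pruning, it can also help to see what Docker itself thinks is using the space (standard Docker commands, no placeholders here):
docker system df     # summary of space used by images, containers, local volumes and build cache
docker system df -v  # per-image / per-container breakdown
Note that docker system prune -a also removes unused (not only dangling) images, so make sure your containers can re-pull whatever they need.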

Why restored VM from Google cloud snapshot doesn't have data in its database?

I'm running an web application in google compute engine and have scheduled a snapshot for the VM [Ubuntu 16.04].
I tried restoring the VM from the last available snapshot. I'm able to bring up the web application from the restored VM. But the problem is that there is hardly any data in the database [MongoDB]. All the collections created by the application and the default data [data seeded during deployment] are present in MongoDB in the restored VM, but other than that, there is no data.
Is this how Google snapshots work? Isn't the new restored VM supposed to have all the data till the time of snapshot creation?
Creating a snapshot while all the apps are running may not be 100% accurate because some data is still in buffers/caches, etc.
Your missing data might not have been written to the disk yet when the snapshot was being created.
Google documentation about creating snapshots is quite clear about it:
You can create a snapshot of a persistent disk even while your apps write data to the disk. However, you can improve snapshot consistency if you flush the disk buffers and sync your file system before you create a snapshot.
Pause apps or operating system processes that write data to that persistent disk. Then flush the disk buffers before you create the snapshot.
Try following instructions and test the results.
If for some reason you can't completely stop the database, try to just flush the buffers to disk, freeze the file system (if possible), and then create the snapshot.
You can freeze the file system by logging into the instance and typing sudo fsfreeze -f [example-disk_location], and unfreeze it with sudo fsfreeze -u [example-disk_location].
The perfect way (with guaranteed data integrity) is to either stop the VM or unmount the disk.
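A rough sketch of that flush-freeze-snapshot sequence (the mount point, disk name, zone and snapshot name are placeholders; this works best on a secondary data disk, since freezing the root file system will also stall your own session):
sudo sync                   # flush file system buffers to disk
sudo fsfreeze -f /mnt/data  # freeze the mounted file system
gcloud compute disks snapshot my-data-disk --zone=us-central1-a --snapshot-names=my-consistent-snap
sudo fsfreeze -u /mnt/data  # unfreeze as soon as the snapshot has been created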

Compute Engine Instance

I have created a Google Compute Engine instance with CentOS and added some software there, such as Apache, Webmin, ActiveCollab, Gitolite, etc.
The problem is that the VM is always running out of memory because the RAM is too low.
How do I change the assigned RAM in Google Compute Engine?
Should I have to copy the VM to another with bigger RAM? If so will it copy all the contents from my CentOS installation?
Can anyone give me some advice on how to get more RAM without having to reinstall everything?
Thanks
The recommended approach for manually managed instances is to boot from a Persistent root Disk. When your instance has been booted from Persistent Disk, you can delete the instance and immediately create a new instance from the same disk with a larger machine type. This is similar to shutting down a physical machine, installing faster processors and more RAM, and starting it back up again. This doesn't work with scratch disks because they come and go with the instance.
Using Persistent Disks also enables snapshots, which allow you to take a point-in-time snapshot of the exact state of the disk and create new disks from it. You can use them as backups. Snapshots are also global resources, so you can use them to create Persistent Disks in any zone. This makes it easy to migrate your instance between zones (to prepare for a maintenance window in your current zone, for example).
Never store state on scratch disks. If the instance stops for any reason, you've lost that data. For manually configured instances, boot them from a Persistent Disk. For application data, store it on Persistent Disk, or consider using a managed service for state, like Google Cloud SQL or Google Cloud Datastore.
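A minimal gcloud sketch of that delete-and-recreate flow, assuming the instance boots from a Persistent Disk (the instance/disk name, zone and machine type are placeholders):
gcloud compute instances delete my-instance --zone=us-central1-a --keep-disks=boot   # keep the boot Persistent Disk
gcloud compute instances create my-instance --zone=us-central1-a --machine-type=n1-standard-4 --disk=name=my-instance,boot=yes   # recreate from the same disk with more RAM
Alternatively, you can stop the instance and change its machine type in place with gcloud compute instances set-machine-type, then start it again; either way, the data on the Persistent Disk is preserved.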