I have a strange problem whereby instances in an instance group reboot themselves when I give them a sudo poweroff command (I'm doing this in a startup-script, if that makes any difference...).
I've also tried the more elaborate gcloud compute instances delete -q --zone europe-west1-c $HOSTNAME to no avail.
What is the correct way to do this?
Managed instance groups spawn and restart instances on demand, as required by their management policy. When an instance goes down, the policy will bring it back up; when an instance is deleted, another one will be created in its place.
Removing an instance from an instance group requires modifying the instance group as described here. How you resize the instance group depends on the management policy:
For replica-pool-managed instance groups, check here
For autoscaler-managed instance groups, check here
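For illustration, assuming a managed instance group named my-group in europe-west1-c (the names are placeholders), you could either abandon the instance so the group stops managing it, or shrink the group so the policy no longer maintains the old size:

# Remove the instance from the managed group without it being recreated
gcloud compute instance-groups managed abandon-instances my-group \
    --instances $HOSTNAME --zone europe-west1-c

# Or resize the group down so no replacement is spawned
gcloud compute instance-groups managed resize my-group \
    --size 0 --zone europe-west1-c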
Hope this helps.
Related
I'm trying to set up a small blog server on Google Cloud Platform using the free-tier f1-micro instance. I'm using Ubuntu 20.04 LTS as the base image (Ubuntu is the only Linux distro that I'm at all familiar with), though I also tried 20.10. Everything works normally until I install MySQL. This is the guide that I'm following. After each failure, I deleted the VM and started with a fresh one.
These are the VM settings:
In addition to the steps listed in the guide, I also tried adding ssh to ufw, just in case.
sudo ufw allow ssh
sudo ufw enable
I also tried running this prior to installing MySQL, based on this article after failing the first couple of times.
sudo apt-get purge mysql*
sudo apt-get autoremove
sudo apt-get autoclean
sudo apt-get dist-upgrade
Once I try installing mysql-server the ssh prompt hangs here:
I've tried reconnecting immediately and I've tried waiting overnight, but I always get stuck here when I try to connect again (it stays like this for a very long time before failing):
I experienced a similar issue with a MySQL instance in GCP. The first problem was related to the machine type of the VM instance: I had an f1-micro, and suddenly I wasn't able to access it over SSH. As this machine type has only 0.6 GB of memory, it soon ran out of memory. I changed it to an e2-medium, which is the default, and that resolved my problem that time.
Because the instance was out of memory, the services on it started to fail, which was the reason I couldn't access it.
On another occasion I ran into similar issues, but this time the problem was the disk: I only had 10 GB, and a process was filling it. Once a partition ran out of space, the instance started to fail again.
I simply resized the disk; it is now 20 GB and the instance is working fine.
Having said that, I suggest increasing your resources, because the problems you describe are a good indicator that your existing machine type is not a good fit for the workloads you run on that instance.
So, I suggest changing the machine type to adjust your memory. You can follow the steps below for these tasks; please visit the following link for further information.
Changing a machine type
1.- Go to the VM Instances page.
2.- In the Name column, click your instance.
From the instance details page, complete the following steps:
a) Click the Stop button to stop the instance, if you have not stopped it yet.
b) After the instance stops, click the Edit button at the top of the page.
c) Under the Machine configuration section, select the machine type you want to use, or create a custom machine type to increase only the memory.
d) Save your changes and start your VM instance again.
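If you prefer the command line, the same change can be made with gcloud; a minimal sketch, assuming an instance named my-instance in europe-west1-c (names are placeholders):

# The instance must be stopped before its machine type can be changed
gcloud compute instances stop my-instance --zone europe-west1-c
# Switch to a machine type with more memory
gcloud compute instances set-machine-type my-instance \
    --machine-type e2-medium --zone europe-west1-c
gcloud compute instances start my-instance --zone europe-west1-c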
You can resize your disk following this guide or with the following command:
gcloud compute disks resize DISK_NAME --size DISK_SIZE
Or with the Console:
Go to the Disks page to see a list of zonal persistent disks in your project.
Click the name of the disk that you want to resize.
On the disk details page, click Edit.
In the Size field, enter the new size for your disk.
Click Save to apply your changes to the disk.
After you resize the disk, you must resize the file system so that the operating system can access the additional space.
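As an example, on a typical Debian/Ubuntu image whose root filesystem is ext4 on the first partition of /dev/sda (device and partition names are assumptions; verify with lsblk), resizing the filesystem might look like this:

# Check the current partition layout
lsblk
# Grow partition 1 of /dev/sda to use the new disk space (growpart ships with cloud-guest-utils)
sudo growpart /dev/sda 1
# Grow the ext4 filesystem to fill the partition
sudo resize2fs /dev/sda1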
Note: Do not resize boot disks beyond 2 TB, as that is the limit.
As per the installation guide, you need a server with at least 1 GB of memory, while your selected VM instance has only 614 MB. If I understand correctly, when the MySQL service is installed it occupies all available memory, which is likely why you got stuck at that point and could no longer SSH into the instance.
I have a small instance running in GCE. I had some trouble with MongoDB, so after a few tries I decided to reset the instance. But... it didn't seem to come back online. So I stopped the instance and restarted it.
It is a Bitnami MEAN stack image, which starts Apache and other services at startup.
But... I can't reach the instance! No SCP, no SSH, no web service running. When I try to connect via SSH (in GCE) it times out; I can't make a connection on port 22. The information panel says 'The instance is booting up and sshd is not running yet', which is possible of course... but I can't reach the instance in any manner, not even after an hour's wait :) Not sure what's happening if I can't connect to it somehow :(
There is some activity in the console... some CPU usage, mostly 0%, some incoming traffic but no outgoing...
I hope someone can give me a hint here!
Update 1
After the helpful tip from Serhii... I found this in the logs...
Booting from Hard Disk 0...
[ 0.872447] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
/dev/sda1 contains a file system with errors, check forced.
/dev/sda1: Inodes that were part of a corrupted orphan linked list found.
/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
fsck exited with status code 4
The root filesystem on /dev/sda1 requires a manual fsck
Update 2...
So, I need to fsck the drive...
I created a snapshot, made a new disk from that snapshot, and added the new disk as an extra disk to another instance. Now that instance won't boot, with the same problem... removing the extra disk fixed it again. So adding the disk makes it crash even though it isn't the boot disk?
First, have a look at Compute Engine -> VM instances -> NAME_OF_YOUR_VM -> Logs -> Serial port 1 (console) and try to find errors and warnings that could be connected to a lack of free space or to SSH. It would be helpful if you updated your post with this information. In case your instance ran out of free space, follow these instructions.
You can try to connect to your VM via Serial console by following this guide, but keep in mind that:
The interactive serial console does not support IP-based access
restrictions such as IP whitelists. If you enable the interactive
serial console on an instance, clients can attempt to connect to that
instance from any IP address.
More details can be found in the documentation.
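For reference, enabling and connecting to the interactive serial console with gcloud might look like this (instance name and zone are placeholders):

# Enable the interactive serial console on the instance
gcloud compute instances add-metadata my-instance --zone europe-west1-c \
    --metadata serial-port-enable=TRUE
# Connect to serial port 1
gcloud compute connect-to-serial-port my-instance --zone europe-west1-c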
Have a look at the Troubleshooting SSH guide and Known issues for SSH in browser. In addition, Google provides a troubleshooting script for Compute Engine to identify issues with SSH login/accessibility of your Linux based instance.
If you still have a problem, try to use your disk on a new instance.
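Given the fsck error in your logs, here is a minimal repair sketch from a rescue instance, assuming the damaged disk shows up as the second device /dev/sdb (verify with lsblk, and do not mount the partition first):

# Identify the attached disk and its partitions
lsblk
# Run fsck manually on the damaged partition and fix errors
sudo fsck -y /dev/sdb1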
EDIT: It looks like your test VM is trying to boot from the disk that you created from the snapshot. Try to follow this guide.
If you still have a problem, you can try to recreate the boot disk from a snapshot to resize it.
How do I automatically restart a preemptible Google Compute Engine instance? I only have one instance that doesn't need 100% uptime but that I would like to restart once the data center becomes unloaded again. The instance/server that I'm trying to automatically restart has its own boot disk that I'd like to use each time it restarts.
You could try using Instance Group Manager to set up a pool of size 1. It will then try to re-create instances after they are preempted.
You should be aware that there is no guarantee that there is going to be capacity for your instance. As the docs say:
Preemptible instances are available from a finite amount of Compute Engine resources, and might not always be available.
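A sketch of such a setup with gcloud, using illustrative names (note that a managed instance group recreates instances from the template's image, not from an existing boot disk):

# Create a template for a preemptible instance
gcloud compute instance-templates create my-preemptible-template \
    --machine-type f1-micro --preemptible
# Create a managed instance group of size 1; the manager recreates the instance after preemption
gcloud compute instance-groups managed create my-group \
    --size 1 --template my-preemptible-template --zone europe-west1-c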
You could create an f1-micro instance, which is free for one instance per month in several data centers, and create a cron job:
*/10 * * * * /snap/bin/gcloud beta compute instances start --zone "yourzone" "yourinstance" --project "yourproject"
after you have run gcloud auth login once.
This will try to restart your instance every 10 minutes. Of course, you can also set this to an hour or more. With a bit more scripting, things like exponential backoff can be done as well.
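For instance, a rough bash sketch of such an exponential backoff (zone, instance, and project are placeholders, as above):

#!/bin/bash
# Retry starting the instance, doubling the wait after each failed attempt
delay=60
until /snap/bin/gcloud compute instances start --zone "yourzone" "yourinstance" --project "yourproject"; do
    sleep "$delay"
    delay=$((delay * 2))
done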
If you'd like to restart it less frequently, you can use Instance schedules, which are built into the Google Cloud Dashboard.
https://cloud.google.com/compute/docs/instances/schedule-instance-start-stop
When I run Couchbase server in a Docker container on GCE, using the ncolomer/couchbase image, I'm getting this error:
The maximum number of open files for the couchbase user is set too low.
It must be at least 10240. Normally this can be increased by adding
the following lines to /etc/security/limits.conf:
couchbase soft nofile <value>
couchbase hard nofile <value>
Where <value> is greater than 10240.
The docs for ncolomer/couchbase recommend updating /etc/init/docker.conf to add limit nofile 262144, but I'm not sure that's even available when using Docker on GCE.
I see a few options:
In the Dockerfile, run a script to modify /etc/security/limits.conf as suggested by the couchbase error.
Call ulimit -n 64000 in the Dockerfile
Any suggestions?
The problem with ulimit is that a container's limits are bounded by the limits of the Docker host process.
This is related to the known Docker issue #4717 (and to a lesser extent #1916).
As I understand it, the two options you mentioned will not work, since they only set the ulimit on the child process (i.e., the container). From there, I see no choice but to set the correct ulimit on the host before trying to increase it in your container.
The documented procedure should work fine, provided you are able to apply it.
I don't know the GCE platform very well, but if you have root access to your instance, you should just apply the changes to the /etc/init/docker.conf file, restart the Docker service, and fire up the Couchbase container.
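A minimal sketch of those host-side steps, assuming an Upstart-based host where /etc/init/docker.conf exists (Upstart's limit stanza takes a soft and a hard value):

# Raise the Docker daemon's open-file limit
echo 'limit nofile 262144 262144' | sudo tee -a /etc/init/docker.conf
# Restart Docker so the new limit applies to containers
sudo service docker restart
# Start the Couchbase container
sudo docker run -d ncolomer/couchbase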
Several months ago, I followed http://aws.amazon.com/articles/1663 and got it all running. Then, my PC crashed and I lost the keypair (http://stackoverflow.com/questions/7949835/accessing-ec2-instance-after-losing-keypair) and could no longer access the instance.
I want to now launch a new instance and mount this MySQL/DB volume which is left over from before and see if I can get to the data on it. How can I go about doing that?
You outlined the correct approach to this problem already, and the author of the article you referenced, Eric Hammond, has written another one detailing this very process, see Fixing Files on the Root EBS Volume of an EC2 Instance - it boils down to:
start another EC2 instance
stop the EC2 instance you can't access anymore
detach the EBS volume from the stopped instance
attach the EBS volume to the running instance
SSH into the running instance
mount the EBS volume in the running instance
perform whatever fixes are necessary, i.e. adjust the /var permissions in your case
Please see Eric's instructions for details on how to do this from the command line; obviously, you can achieve all steps up to the SSH access via the AWS Management Console as well, removing the need to install the Amazon EC2 API Tools in case they aren't readily available already.
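For reference, here is a sketch of the detach/attach/mount steps using the modern AWS CLI (volume and instance IDs are placeholders; the article predates the unified CLI and used the EC2 API Tools instead):

# Stop the inaccessible instance and move its volume to a rescue instance
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 detach-volume --volume-id vol-0123456789abcdef0
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0fedcba9876543210 --device /dev/sdf
# Then, on the rescue instance (the device often appears as /dev/xvdf on Linux):
sudo mkdir -p /mnt/rescue
sudo mount /dev/xvdf /mnt/rescue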