I set up a 1 node cluster on google container engine which I just intend to use for testing, so I want to be able keep it shutdown while I am not using it to keep my costs low. I can not however figure out why the VM continually restarts after I shut it down through the console. I have set the "Automatic Restarts" option to false on the VM.
The VM is a n1-standard-2 (2 vCPUs, 7.5 GB memory) with 2 standard persistent disks attached.
Has anyone else faced this issue, or have experience with how to set up GCE so that you can keep it offline while not in use? Thanks in advance for any help.
The VMs in GKE clusters are managed by what's called a Managed Instance Group, which ensures that there's always the expected number of nodes in your cluster. I'd guess that it's seeing that there isn't a VM running in your project and assuming that something's gone wrong, so it recreates it.
You could stop it from doing so by explicitly resizing the instance group down to 0. You can change the number of nodes in the cluster either via the Container Engine UI or by running gcloud container clusters resize $CLUSTERNAME --size=0.
Related
My GCP Compute engine is down. In GCP console it is up and running. It is an Ubuntu 18.04 server with 0.6GB memory(an always free tier compute engine). It was not restarted for more than couple of months. The system usage was around 60% last checked. I have already checked this answer. And none of them seems valid for me(seems so).
Free disk stands at 54%.
SSH perfectly configured.
No firewall issue.
It just seemed the VM stopped responding putting the hosted url down. When I checked Compute engine monitoring tabs, all the graphs were normal without any visible changes. I even checked the logging but no system crash kind of logs were present. I stopped and restarted the compute engine, and it started working perfectly, as nothing happened. In AWS, the VM instance failed System Reachability tests in such scenarios.
Does GCP has something similar like AWS system reachability test?
Any possible logs or something, by which I can understand the reason why the Compute Engine stopped responding?
There are different kinds of test with custom scenarios in your project , please find it on reference link [1]
[1] https://cloud.google.com/network-intelligence-center/docs/connectivity-tests/how-to/running-connectivity-tests
Help! Help! Help!
It is really annoying and I almost cannot bear it anymore! I'm using google cloud compute engine instances but they often unexpectedly restart without any notification in advance. The restart of instances seems to happen randomly and I have no idea what's going wrong there! I'm pretty sure that the instances are been occupied (usage of CPUs > 50% and all GPUs are in use) when restart happens. Could anyone please tell me how to solve this problem? Thanks in advance!
The issue is right here:
all GPUs are in use
If you check the official documentation about GPU:
GPU instances must terminate for host maintenance events, but can automatically restart. These maintenance events typically occur once per week, but can occur more frequently when necessary. You must configure your workloads to handle these maintenance events cleanly. Specifically, long-running workloads like machine learning and high-performance computing (HPC) must handle the interruption of host maintenance events. Learn how to handle host maintenance events on instances with GPUs.
This is because an instance that has a GPU attached cannot be migrated to another host for maintenance as it happens for the rest of the virtual machines. To get a physical GPU attached to the instance and bare metal performance you are using GPU passthrough , which sadly means if the host has to go through maintenance the VM is going down with it.
This sounds like Preemptible VM instance.
Preemptible instances function like normal instances, but have the following limitations:
Compute Engine might terminate preemptible instances at any time due to system events. The probability that Compute Engine will terminate a preemptible instance for a system event is generally low, but might vary from day to day and from zone to zone depending on current conditions.
Compute Engine always terminates preemptible instances after they run for 24 hours.
To check if your instance is preemptible using gcloud cli, just run
gcloud compute instances describe instance-name --format="(scheduling.preemptible)"
Result
scheduling:
preemptible: false
change "instance-name" to real name.
Or simply via UI, click on compute instance and scroll down:
To check for system operations performed on your instance, you can review it using following command:
gcloud compute operations list
I am running n1-standard-1 (1 vCPU, 3.75 GB memory) Compute Instance , In my android app around 80 users are online write now and cpu Utilisation of instance is 99% and my app became less responsive. Kindly suggest me the workaround and If i need to upgrade , can I do that with same instance or new instance needs to be created.
Since your app is running already and users are connecting to it, you don't want to do the following process:
shut down the VM instance, keeping the boot disk and other disks
boot a more powerful instance, using the boot disk from step (1)
attach and mount any additional disks, if applicable
Instead, you might want to do the following:
create an additional VM instance with similar software/configuration
create a load balancer and add both the original and new VM to it as a backend
change your DNS name to point to the load balancer IP instead of the original VM instance
Now, your users will be randomly sent to a VM that's least-loaded to see the application, and you can add more VMs if your traffic increases.
You did not describe your application in detail, so it's unclear if each VM has local state (e.g., runs a database) or there's a database running externally. You will still need to figure out how to manage the stateful systems such as database or user-uploaded data from all the VM instances, which is hard to advise on given the little information in your quest.
Yesterday I tried to delete an Instance by invoking the "halt" command through SSH. Unlike AWS, GCE does not allow us to choose the behavior of the VM shutdown and stop the instance by default (the instance status is TERMINATED).
Today I was browsing the Google Compute Engine REST API documentation and I found the following description :
status : [Output Only] The status of the instance. One of the following values: PROVISIONING, STAGING, RUNNING, STOPPING, STOPPED, TERMINATED.
What is this "STOPPPED" status ? Both the instances stopped through the Web console or the "halt" command have the "TERMINATED" status.
Any ideas ?
This STOPPED state is a new feature added a few weeks ago which you can reach via the compute engine API.
This method stops a running instance, shutting it down cleanly, and allows you to restart the instance at a later time. Stopped instances do not incur per-minute, virtual machine usage charges while they are stopped, but any resources that the virtual machine is using, such as persistent disks and static IP addresses,will continue to be charged until they are deleted. For more information, see Stopping an instance.
I think this is similar to the AWS option you mention.
For anyone stumbling on this question years later, a detailed lifecycle diagram of instances can be found here
There is no STOPPED status anymore, instances are going from STOPPING to TERMINATED, whatever the stopping method is.
However a new state, that may be closer to what halt does, has been introduced since: SUSPENDED. It's still in beta though, and not sure that invoking halt would induce this state or simply terminates the instance.
See here for more details
I'm new to GCE and want to migrate my web site there. I created a VM instance group hoping. I installed all the packages and set it up a couple days ago. But today I noticed my VM instance group has a different name (postfix, to be exact), and the disk is flushed empty. Is it possible to restore its status, or at least make sure it won't get wiped out again? I'm so surprised that GCE wiped out everything and I wonder if I'm missing something during setup.
A few details in case they are related:
I'm using a trusty image for the VM.
The cloud storage is chosen to be a regular persistent disk.
It was working with emphemeral IP, and yesterday I started to use Cloud DNS to host my domain. I should have used a static IP, but that mistake shouldn't cause the VM instance group to be flushed...
I'm using cloud sql as the database service.
Maybe I should just use VM instance, given I don't have much traffic now?
Any help will be greatly appreciated~