I'm trying to boot an instance on GCE through libcloud.
When I boot through the libcloud function, ex_create_multiple_nodes (with 1 machine specified), the instance and the disk are created successfully, and the disk is attached. I verify this through the developer console. No exceptions are thrown by the function call.
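For reference, the call looks roughly like this (the credentials, project, and names below are placeholders, and the disk parameters are simplified; the real call builds the boot disk from the snapshot):

    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    # Placeholder service account and project -- not the real values
    ComputeEngine = get_driver(Provider.GCE)
    driver = ComputeEngine('sa-email@my-project.iam.gserviceaccount.com',
                           '/path/to/key.json',
                           project='my-project',
                           datacenter='us-central1-a')

    # base_name, size, image, number of machines (1), plus the zone
    nodes = driver.ex_create_multiple_nodes('worker', 'n1-standard-1',
                                            'my-base-image', 1,
                                            location='us-central1-a')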
Unfortunately, the instance never boots successfully:
...
Booting from Hard Disk...
Boot failed: not a bootable disk
...
Full log: https://gist.github.com/danwinkler/dcf1351675eb8c744220
(This repeats again and again)
I've tested booting with the same parameters (snapshot, zone, size, etc.) through the developers console and it works fine.
A colleague pointed out that the error looks similar to errors caused by virt-manager, but I don't see anything related to that in the docs or the console.
Thanks!
This error normally happens when you're trying to boot from an empty disk. You can attach the disk to another VM instance and check the disk contents to ensure that it has a valid and bootable partition.
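With libcloud that check could look something like this (a rough sketch; the node and volume names are placeholders, and driver is the GCE driver object from the question):

    # Attach the suspect disk to a throw-away debug VM as a secondary device
    debug_node = driver.ex_get_node('debug-vm', zone='us-central1-a')
    bad_disk = driver.ex_get_volume('disk-that-will-not-boot', zone='us-central1-a')
    driver.attach_volume(debug_node, bad_disk, device='inspectme')

    # Then, over SSH on debug-vm:
    #   lsblk                   # the disk should show up as an extra device
    #   sudo fdisk -l /dev/sdb  # check for a partition table / bootable partition

If the disk turns out to be empty, the problem is in how the boot disk is being created rather than in the instance itself.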
My Windows Server instance on GCE is shut down from time to time. Based on the GCP logging, we can tell that failing the lateBootReportEvent check only triggers a reboot some of the time. I am wondering why?
logs screenshot
I am aware that the auto-shutdown is caused by integrity monitoring (settings shown below), and I understand that my boot integrity might fail here. I am just trying to understand why there is a "probability" involved here.
Shielded-VM settings
Integrity monitoring and the Shielded VM features have no direct relation to a VM restart or shutdown.
Integrity monitoring only compares the most recent boot measurements to the integrity policy baseline and returns a pair of pass/fail results depending on whether they match or not, one for the early boot sequence and one for the late boot sequence.
Early boot is the boot sequence from the start of the UEFI firmware until it passes control to the bootloader. Late boot is the boot sequence from the bootloader until it passes control to the operating system kernel. If either part of the most recent boot sequence doesn't match the baseline, you get an integrity validation failure.
If the failure is expected, for example because you applied a system update on that VM instance, you should update the integrity policy baseline. If it is not expected, you should stop that VM instance and investigate the reason for the failure; either way, the VM is never shut down by integrity monitoring.
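For example, re-learning the baseline after an expected change can be done with gcloud compute instances update --shielded-learn-integrity-policy, or through the API; a rough sketch with the Python discovery client (project, zone and instance names are placeholders):

    from googleapiclient.discovery import build

    compute = build('compute', 'v1')

    # Re-learn the integrity baseline from the current boot measurements
    compute.instances().setShieldedInstanceIntegrityPolicy(
        project='my-project',
        zone='us-central1-a',
        instance='my-windows-vm',
        body={'updateAutoLearnPolicy': True},
    ).execute()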
In order to determine what actually caused the VM to restart, you will need to look at the Windows Event Viewer logs for the instance around the time of the shutdown, then match the shutdown reason against Microsoft's reason codes to determine what caused the VM to stop.
It is possible that the instance restarted to complete installation of updates, or encountered an internal error. However only the event viewer logs will determine the true cause.
If you find any useful internal logs, please share them on this post so we can check further.
My GCP Compute Engine instance is down, although in the GCP console it shows as up and running. It is an Ubuntu 18.04 server with 0.6 GB of memory (an always-free-tier instance). It had not been restarted for more than a couple of months, and system usage was around 60% when last checked. I have already checked this answer, and none of the suggestions there seem to apply to my case.
Free disk space stands at 54%.
SSH is correctly configured.
No firewall issues.
It just seemed that the VM stopped responding, taking the hosted URL down. When I checked the Compute Engine monitoring tabs, all the graphs looked normal without any visible changes. I even checked the logging, but there were no system-crash-type logs. I stopped and restarted the instance, and it started working perfectly, as if nothing had happened. In AWS, a VM instance would fail the system reachability checks in such a scenario.
Does GCP have something similar to the AWS system reachability test?
Are there any logs or similar that would help me understand why the Compute Engine instance stopped responding?
There are different kinds of connectivity tests with custom scenarios that you can run in your project; please see the reference link [1].
[1] https://cloud.google.com/network-intelligence-center/docs/connectivity-tests/how-to/running-connectivity-tests
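As a rough sketch (project, zone, instance and source IP below are placeholders), a connectivity test towards your VM can also be created programmatically through the Network Management API:

    from googleapiclient.discovery import build

    nm = build('networkmanagement', 'v1')

    # Checks whether TCP traffic on port 443 from an external client
    # can reach the instance
    nm.projects().locations().global_().connectivityTests().create(
        parent='projects/my-project/locations/global',
        testId='reach-my-vm',
        body={
            'protocol': 'TCP',
            'source': {'ipAddress': '203.0.113.10'},
            'destination': {
                'instance': 'projects/my-project/zones/us-central1-a/instances/my-vm',
                'port': 443,
            },
        },
    ).execute()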
I'm creating a managed instance group with autoscaling in GCE. When a lot of work is queued up new instances will be created which start doing work.
Let's say each chunk of work takes 10 minutes, could it happen that GCE decides to shut down an instance that still has work in progress?
Yes, the autoscaler can terminate an instance as soon as its scale-in condition is met, regardless of whether work is still in progress.
However, you can use a shutdown script to control the termination. A shutdown script will run, on a best-effort basis, in the brief period between when the termination request is made and when the instance is actually terminated. During this period, Compute Engine will attempt to run your shutdown script to perform any tasks you provide in the script. You can read more about the autoscaler decision in this document, and about using a shutdown script and its limitations at this link.
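As a rough sketch (project and instance names are placeholders; for a managed instance group you would normally set this on the instance template rather than on a single instance), a shutdown-script can be attached as metadata through the API:

    from googleapiclient.discovery import build

    compute = build('compute', 'v1')

    # setMetadata needs the current fingerprint, so fetch the existing metadata first
    inst = compute.instances().get(project='my-project', zone='us-central1-a',
                                   instance='worker-1').execute()
    metadata = inst['metadata']
    metadata.setdefault('items', []).append({
        'key': 'shutdown-script',
        'value': '#!/bin/bash\n# requeue or checkpoint the in-flight chunk here\n',
    })
    compute.instances().setMetadata(project='my-project', zone='us-central1-a',
                                    instance='worker-1', body=metadata).execute()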
Also, if these instances serve as backends for a load balancer, it is a good idea to enable connection draining. You can enable connection draining on backend services to ensure minimal interruption for your users when an instance is deleted automatically by an autoscaler or removed manually from an instance group. You can find more about enabling connection draining at this link.
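A minimal sketch of enabling draining on a backend service (the service name and the 300-second timeout are placeholder choices):

    from googleapiclient.discovery import build

    compute = build('compute', 'v1')

    # Give in-flight requests up to 300 seconds to finish before
    # a backend instance is removed
    compute.backendServices().patch(
        project='my-project',
        backendService='my-backend-service',
        body={'connectionDraining': {'drainingTimeoutSec': 300}},
    ).execute()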
I am running an n1-standard-1 (1 vCPU, 3.75 GB memory) Compute Engine instance. Around 80 users of my Android app are online right now, CPU utilisation of the instance is at 99%, and the app has become less responsive. Kindly suggest a workaround, and if I need to upgrade, can I do that with the same instance or does a new instance need to be created?
Since your app is running already and users are connecting to it, you don't want to do the following process:
shut down the VM instance, keeping the boot disk and other disks
boot a more powerful instance, using the boot disk from step (1)
attach and mount any additional disks, if applicable
Instead, you might want to do the following:
create an additional VM instance with similar software/configuration
create a load balancer and add both the original and new VM to it as a backend
change your DNS name to point to the load balancer IP instead of the original VM instance
Now your users will be sent to whichever VM is less loaded, and you can add more VMs if your traffic increases.
You did not describe your application in detail, so it's unclear whether each VM has local state (e.g., runs a database) or there's a database running externally. You will still need to figure out how to manage stateful systems such as the database or user-uploaded data across all the VM instances, which is hard to advise on given the little information in your question.
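For completeness: the in-place upgrade you asked about is possible on the same instance, but only with downtime, which is why it is not recommended here. It boils down to stop, change machine type, start, for example (placeholder names and target machine type):

    from googleapiclient.discovery import build

    compute = build('compute', 'v1')
    project, zone, instance = 'my-project', 'us-central1-a', 'my-app-vm'

    # The machine type can only be changed while the VM is stopped
    compute.instances().stop(project=project, zone=zone, instance=instance).execute()
    # ... wait for the stop operation to finish ...
    compute.instances().setMachineType(
        project=project, zone=zone, instance=instance,
        body={'machineType': f'zones/{zone}/machineTypes/n1-standard-2'},
    ).execute()
    compute.instances().start(project=project, zone=zone, instance=instance).execute()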
Yesterday I tried to delete an instance by invoking the "halt" command through SSH. Unlike AWS, GCE does not let us choose the behaviour of a VM shutdown; by default it stops the instance (the instance status becomes TERMINATED).
Today I was browsing the Google Compute Engine REST API documentation and I found the following description :
status : [Output Only] The status of the instance. One of the following values: PROVISIONING, STAGING, RUNNING, STOPPING, STOPPED, TERMINATED.
What is this STOPPED status? Instances stopped through the web console and instances stopped with the "halt" command both end up with the TERMINATED status.
Any ideas?
This STOPPED state is a new feature added a few weeks ago, which you can reach via the Compute Engine API.
This method stops a running instance, shutting it down cleanly, and allows you to restart the instance at a later time. Stopped instances do not incur per-minute virtual machine usage charges while they are stopped, but any resources that the virtual machine is using, such as persistent disks and static IP addresses, will continue to be charged until they are deleted. For more information, see Stopping an instance.
I think this is similar to the AWS option you mention.
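For reference, stopping via the API is just the instances.stop method, e.g. (placeholder project, zone and instance):

    from googleapiclient.discovery import build

    compute = build('compute', 'v1')

    # Cleanly shuts the guest down without deleting its disks;
    # the instance can later be started again with instances().start(...)
    compute.instances().stop(project='my-project', zone='us-central1-a',
                             instance='my-instance').execute()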
For anyone stumbling on this question years later, a detailed lifecycle diagram of instances can be found here.
There is no STOPPED status anymore; instances go from STOPPING to TERMINATED, whatever the stopping method is.
However, a new state that may be closer to what halt does has been introduced since then: SUSPENDED. It's still in beta though, and I'm not sure whether invoking halt would induce this state or simply terminate the instance.
See here for more details.