My GCP Compute Engine instance is down, although the GCP console shows it as up and running. It is an Ubuntu 18.04 server with 0.6 GB of memory (an always-free-tier instance). It had not been restarted for more than a couple of months. System usage was around 60% when last checked. I have already checked this answer, and none of its suggestions seems to apply to my case.
Free disk stands at 54%.
SSH perfectly configured.
No firewall issue.
It seems the VM simply stopped responding, which took the hosted URL down. When I checked the Compute Engine monitoring tabs, all the graphs were normal without any visible changes. I also checked the logs, but there were no entries indicating a system crash. I stopped and restarted the instance, and it started working perfectly, as if nothing had happened. In AWS, a VM instance fails the System Reachability check in such scenarios.
Does GCP have something similar to the AWS system reachability test?
Are there any logs, or anything else, from which I can understand why the Compute Engine instance stopped responding?
Connectivity Tests let you run different kinds of tests with custom scenarios in your project; see the reference link [1].
[1] https://cloud.google.com/network-intelligence-center/docs/connectivity-tests/how-to/running-connectivity-tests
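As a rough sketch of the linked approach, a Connectivity Test can also be created from the gcloud CLI. The helper name, test name, instance path, IP address and port are placeholders chosen for illustration, not values from the question. Keep in mind that Connectivity Tests analyse the network path to the VM, so they may not reveal a guest OS that has hung.

```shell
# Sketch only: run_reachability_test is a hypothetical helper name.
# Arguments: test name, source instance URI, destination IP, destination TCP port.
run_reachability_test() {
  local test_name="$1" source_instance="$2" dest_ip="$3" dest_port="$4"

  # Create a Connectivity Test from the source instance to the destination IP/port.
  gcloud network-management connectivity-tests create "$test_name" \
    --source-instance="$source_instance" \
    --destination-ip-address="$dest_ip" \
    --destination-port="$dest_port" \
    --protocol=TCP

  # Print the stored test, including its reachability result.
  gcloud network-management connectivity-tests describe "$test_name"
}
```

It could be called, for example, as `run_reachability_test web-check projects/my-project/zones/us-central1-a/instances/probe-vm 203.0.113.10 80`, where every argument is made up.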
Related
I'm trying to set up a staging VM for a production site that I have just inherited. The site runs WordPress/WooCommerce and has not been updated in a while. The VM it's hosted on is running an old version of PHP. Obviously, this all needs to be fixed, but I'm unfamiliar with GCP Compute Engine. Also, any attempt to run backup/clone plugins crashes the site and requires a restore from the daily snapshot, which is very annoying.
Is it possible to clone the VM/disk to a new instance, point that at a temporary domain, and test/update the site there? I have been trying to do this for a while now without much luck. Any suggestions would be much appreciated. Thanks.
Creating a clone of an existing VM is possible and quite easy.
Create a snapshot of the VM. If possible, stop the VM before doing this to ensure 100% accuracy - this way you will have an exact snapshot of the drive without any errors. You can also do it while the VM is running if stopping it is out of the question.
Create a VM from the snapshot - select the snapshot you've just created as the boot disk. Remember to assign a static public IP to this VM; otherwise the IP can change after a VM restart, which is likely to happen since you're going to do some configuration. You can change the VM's specs at this time too - nothing stops you from adding or removing CPUs, RAM etc. It may well be that your VM is underutilised and you can use something smaller to save costs. Or the opposite.
Start the machine. Now you can modify your WP configuration to point to the new domain. For the SSL certificate, you can use either an external one or the one provided by GCP (the most convenient solution).
If you already own a domain you want to use for staging, you can host it in Cloud DNS or at some other provider - just point it to the external IP you just reserved.
If you will be hosting your domain in Cloud DNS, you will find the necessary information in the documentation about managed zones (domains).
You can also consider creating a new VM and using it as a template for a group of VMs (a managed autoscaled instance group) with an external HTTPS load balancer in front of it. But this adds some complexity, so it's just an idea in case you need to handle a lot more traffic.
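The snapshot-and-clone steps above can be sketched with the gcloud CLI roughly as follows. The function name, the resource naming scheme (`NAME-snap`, `NAME-disk`, `NAME-ip`) and the default machine type are assumptions for illustration only; the region passed in must be the one that contains the zone.

```shell
# Sketch only: clone_vm_from_snapshot is a hypothetical helper name.
# Arguments: source disk, zone, region, new VM name, optional machine type.
clone_vm_from_snapshot() {
  local source_disk="$1" zone="$2" region="$3" new_name="$4"
  local machine_type="${5:-e2-small}"   # assumed default, pick what fits the site

  # 1. Snapshot the production boot disk (stop the VM first if you can).
  gcloud compute disks snapshot "$source_disk" \
    --zone="$zone" --snapshot-names="${new_name}-snap"

  # 2. Reserve a static external IP so the address survives restarts.
  gcloud compute addresses create "${new_name}-ip" --region="$region"

  # 3. Create a new disk from the snapshot.
  gcloud compute disks create "${new_name}-disk" \
    --source-snapshot="${new_name}-snap" --zone="$zone"

  # 4. Create the staging VM that boots from that disk and uses the reserved IP.
  gcloud compute instances create "$new_name" \
    --zone="$zone" --machine-type="$machine_type" \
    --disk="name=${new_name}-disk,boot=yes" \
    --address="${new_name}-ip"
}
```

For example, `clone_vm_from_snapshot prod-disk us-central1-a us-central1 staging` would create a VM called `staging`; all of these names are made up.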
I am running an EC2 instance with Ubuntu Server. Tomcat and MySQL are installed, and a Java web application has been deployed on it for about a month. It ran well, with great performance, for almost a month, but now my application responds very slowly.
Another point to note: earlier, when I logged into my Ubuntu server through PuTTY, it was quick, but now it takes time even when I enter the Ubuntu password.
Is there any solution?
I would start by checking memory, CPU, and network availability to see whether any of them is the bottleneck.
Try following commands:
To check memory availability:
free -m
To check CPU usage:
top
To check network usage:
ntop
To check disk usage:
df -h
To check disk io operations:
iotop
Please also check whether you can log in to the machine quickly while your application is disabled. If login is still slow, you should contact EC2 support to report the poor performance and ask for more resources to be assigned to that machine.
You can use the WAIT tool to diagnose what is wrong with your server or your application. The tool gathers information about CPU and memory utilization, running threads etc.
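The memory and disk checks listed above can also be scripted, so the numbers can be logged while the server is slow. This is only a sketch: the function names are made up, and it assumes a Linux kernel that reports `MemAvailable` in `/proc/meminfo`.

```shell
# Percentage of RAM still available, read from a meminfo-style file
# (defaults to /proc/meminfo; requires a MemAvailable line).
mem_available_pct() {
  awk '/^MemTotal:/ {total=$2} /^MemAvailable:/ {avail=$2}
       END { if (total > 0) printf "%d\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# Percentage of disk space used on the filesystem holding the given path.
disk_used_pct() {
  df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Print both figures on one line each.
report_resources() {
  echo "memory available: $(mem_available_pct)%"
  echo "disk used on /: $(disk_used_pct /)%"
}
```

Calling `report_resources` from cron or a `screen` session gives a quick record to compare against what `free -m` and `df -h` show interactively.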
In addition, I would definitely check Tomcat application server with VisualVM or some other profiler. For configuring JMX for Tomcat you can check article here.
For network monitoring, the nload tool is worth your attention. You can launch it in screen so you can always check network utilization stats when the server is slow.
First, check whether any application is using too much CPU or memory. This can be checked with the top command. Two simple shortcut keys are helpful while using top: pressing M sorts processes by memory usage, from highest to lowest, and pressing P sorts them by CPU usage, from highest to lowest.
If you are unable to find any suspicious application using top, you can use iotop, which shows disk I/O usage details.
I was facing the same issue. The solution that worked for me was:
Restart the EC2 instance
Edit
Later, I figured out that this issue happens because too few resources (memory, CPU) are available to the EC2 machine. So check the resources available to the EC2 machine.
Yesterday I tried to delete an instance by invoking the "halt" command through SSH. Unlike AWS, GCE does not allow us to choose the behavior of the VM on shutdown and stops the instance by default (the instance status is TERMINATED).
Today I was browsing the Google Compute Engine REST API documentation and I found the following description :
status : [Output Only] The status of the instance. One of the following values: PROVISIONING, STAGING, RUNNING, STOPPING, STOPPED, TERMINATED.
What is this "STOPPED" status? Instances stopped through either the web console or the "halt" command have the "TERMINATED" status.
Any ideas?
This STOPPED state is a new feature added a few weeks ago which you can reach via the compute engine API.
This method stops a running instance, shutting it down cleanly, and allows you to restart the instance at a later time. Stopped instances do not incur per-minute, virtual machine usage charges while they are stopped, but any resources that the virtual machine is using, such as persistent disks and static IP addresses, will continue to be charged until they are deleted. For more information, see Stopping an instance.
I think this is similar to the AWS option you mention.
For anyone stumbling on this question years later, a detailed lifecycle diagram of instances can be found here
There is no STOPPED status anymore; instances go from STOPPING to TERMINATED, whatever the stopping method is.
However, a new state that may be closer to what halt does has been introduced since: SUSPENDED. It is still in beta, though, and I am not sure whether invoking halt would induce this state or simply terminate the instance.
See here for more details
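To see which state an instance actually ends up in after `halt` or a console stop, the status field can be read with the gcloud CLI. A minimal sketch, with made-up function names; the instance name and zone are whatever applies to your project.

```shell
# Print the lifecycle status of an instance (e.g. RUNNING, STOPPING, TERMINATED).
instance_status() {
  gcloud compute instances describe "$1" --zone="$2" --format='value(status)'
}

# Start the instance again only if it is currently in the TERMINATED state.
restart_if_terminated() {
  local name="$1" zone="$2"
  if [ "$(instance_status "$name" "$zone")" = "TERMINATED" ]; then
    gcloud compute instances start "$name" --zone="$zone"
  fi
}
```

For example, `instance_status my-vm us-central1-a` prints the current status of a hypothetical instance `my-vm`.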
I'm new to GCE and want to migrate my web site there. I created a VM instance group hoping. I installed all the packages and set it up a couple days ago. But today I noticed my VM instance group has a different name (postfix, to be exact), and the disk is flushed empty. Is it possible to restore its status, or at least make sure it won't get wiped out again? I'm so surprised that GCE wiped out everything and I wonder if I'm missing something during setup.
A few details in case they are related:
I'm using a trusty image for the VM.
The cloud storage is chosen to be a regular persistent disk.
It was working with an ephemeral IP, and yesterday I started to use Cloud DNS to host my domain. I should have used a static IP, but that mistake shouldn't cause the VM instance group to be flushed...
I'm using Cloud SQL as the database service.
Maybe I should just use VM instance, given I don't have much traffic now?
Any help will be greatly appreciated~
In the operations history of my compute engine project, my machines all have an operation listed as "Automatically migrate an instance (compute.instances.automaticRestart)". They are all on the same zone, using the same debian template.
I suppose that there was some maintenance on the platform, which is fine for me if the OS doesn't reboot.
Unfortunately two machines suffered a reboot. The operations history listed the operation as "compute.instances.hostError an instance" (compute.instances.hostError).
In addition Syslog doesn't suggest a clean shutdown.
Is there anything I should/can do to prevent such a problem?
Edit: We are in europe-west1-b, and all servers have the "On host maintenance" setting set to "Migrate VM instance".
Doesn't look like this was ever answered.
compute.instances.hostError means there was a hardware or software failure on the physical machine that was hosting your VM.
The FAQ has a description -- https://cloud.google.com/compute/docs/faq#hosterror
As per this article, all the zones in GCE except "europe-west1-a" have transparent maintenance, where the instance is live migrated without being rebooted. If your instance is in a zone with transparent maintenance, you can set the "On host maintenance" option to "Migrate VM instance" in the developer console. Once this option is set, your instance will be live migrated without a reboot.
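The same "On host maintenance" setting can also be applied from the gcloud CLI. A minimal sketch, with a made-up function name; note that live migration covers planned maintenance and does not prevent a reboot caused by a hostError, which is why automatic restart is enabled here as well.

```shell
# Sketch only: enable_live_migration is a hypothetical helper name.
# Arguments: instance name, zone.
enable_live_migration() {
  # Migrate the VM during host maintenance and restart it automatically
  # if it is terminated by a failure.
  gcloud compute instances set-scheduling "$1" --zone="$2" \
    --maintenance-policy=MIGRATE --restart-on-failure
}
```

For example, `enable_live_migration web-1 europe-west1-b`, where `web-1` is a placeholder instance name.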