TLDR: What is the upper bound on how long I should wait to guarantee that a GCE instance has been removed from the load-balancing path and can be safely deleted?
Details: I have a relatively standard setup: GCE instances in a managed instance group, and a global HTTPS load balancer in front of them, pointed at a backend service containing only that one managed instance group. Health checks are fairly standard: 5-second timeout, 5-second interval, marked unhealthy after 2 consecutive failures and healthy after 2 consecutive successes.
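For concreteness, that health check corresponds to something like the following sketch (the name is a placeholder, and I'm assuming the second 5-second figure is the check interval):

    gcloud compute health-checks create http my-health-check \
        --timeout 5s --check-interval 5s \
        --unhealthy-threshold 2 --healthy-threshold 2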
I deploy some new instances, add them to the instance group, and remove the old ones. After many minutes (10-15 min usually), I delete the old instances.
Every once in a while, I notice that deleting the old instances (which I believe are no longer receiving traffic) correlates with a sporadic 502 response to a client, which can be seen only in the load-balancer-level logs.
I've done a bunch of log correlation, tcpdumping, and load testing to be fairly confident that this 502 is not being served by one of the new, healthy instances. In any case, my question is:
What is the upper bound on how long I should wait to guarantee that a GCE instance has been removed from the load-balancing path and can be safely deleted?
I think what you are looking for is the connection draining feature. https://cloud.google.com/compute/docs/load-balancing/enabling-connection-draining
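If that is the fix, it is a one-line change on the backend service. A minimal sketch, assuming a global backend service whose name and drain timeout are placeholders:

    # Give in-flight requests up to 300 seconds to finish before an
    # instance is fully removed from the serving path.
    gcloud compute backend-services update my-backend-service \
        --global --connection-draining-timeout 300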
To answer my own question: it turns out that these 502s were not related to shutting down an instance; 10 minutes was plenty of time to remove an instance from the serving path. The 502s were caused by a race condition between NGINX timeouts and GCP's HTTP(S) Load Balancer timeouts. I've written up a full blog post on it here: Tuning NGINX behind Google Cloud Platform HTTP(S) Load Balancer
Related
I'm creating a managed instance group with autoscaling in GCE. When a lot of work is queued up, new instances are created and start doing work.
Let's say each chunk of work takes 10 minutes. Could it happen that GCE decides to shut down an instance that still has work in progress?
The autoscaler will terminate an instance as soon as its scaling conditions are met.
However, you can use a shutdown script to control the termination. A shutdown script runs, on a best-effort basis, in the brief period between when the termination request is made and when the instance is actually terminated. During this period, Compute Engine will attempt to run your shutdown script to perform any tasks you provide in the script. You can read more about the autoscaler's decisions in this document, and about using shutdown scripts and their limitations at this link.
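For illustration, a minimal sketch of such a shutdown script, assuming a hypothetical marker file (/var/run/worker.busy) that your worker creates while a chunk is in progress:

    #!/bin/bash
    # Best-effort: wait for the current chunk of work to finish before
    # the instance terminates. The marker file is an assumption of this
    # sketch; your worker would have to create and remove it itself.
    while [ -f /var/run/worker.busy ]; do
        sleep 1
    done

Keep in mind the shutdown window is brief, so a 10-minute chunk of work cannot be fully protected this way.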
Also, if these instances are serving a backend service, it is a good idea to enable connection draining. You can enable connection draining on backend services to ensure minimal interruption to your users when an instance is deleted automatically by an autoscaler or manually removed from an instance group. You can find more about enabling connection draining at this link.
Our load balancer is returning 502 errors for some requests. It is just a very low percentage of the total: we have around 36,000 requests per hour and about 40 errors per hour, so roughly 0.1% of the requests return an error.
The instances are healthy when the error occurs, and we have added this firewall rule for the load balancer: allow 130.211.0.0/22 on tcp:1-5000, applied to all targets.
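(For reference, that rule corresponds to something like the following command; the rule name is a placeholder:)

    # Allow the load balancer's address range on ports 1-5000.
    gcloud compute firewall-rules create allow-glb \
        --source-ranges 130.211.0.0/22 --allow tcp:1-5000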
It is not a very serious problem because the application tolerates such errors, but I would like to know why they occur.
Any help will be appreciated.
It seems that there is no easy solution for this.
As Mike Fotinakis explains in this blog post (thank you for this info, JasonG :)):
It turns out that there is a race condition between the Google Cloud HTTP(S) Load Balancer and NGINX’s default keep-alive timeout of 65 seconds. The NGINX timeout might be reached at the same time the load balancer tries to re-use the connection for another HTTP request, which breaks the connection and results in a 502 Bad Gateway response from the load balancer.
In my case I'm using Apache with the mpm_prefork module. The proposed solution is to increase the connection keep-alive timeout to 650s, but this is not possible because under mpm_prefork each open connection holds a whole process (so this would waste a lot of resources).
UPDATE:
It seems that there is some new documentation about this problem on the official load balancer documentation page (search for "Timeouts and retries"): https://cloud.google.com/compute/docs/load-balancing/http/
They recommend setting the keep-alive timeout to 620 seconds in both cases (KeepAliveTimeout in Apache, keepalive_timeout in NGINX).
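As a sketch, the corresponding directives look like this (the 620-second value comes from the recommendation above; the point is simply to exceed the load balancer's own keep-alive timeout):

    # nginx (inside the http block):
    keepalive_timeout 620s;

    # Apache (httpd.conf):
    KeepAliveTimeout 620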
I had an issue with 502s that I couldn't explain after recreating a load balancer and backend config. I recreated my backend and instance group for unmanaged instances, and this seemed to fix the issue for me. I wasn't able to identify any issues in my configuration in GCP :(
But I had a lot more errors, about 1 in 10. There are load balancer logs that will tell you what the cause is, and the docs explain the causes.
For example, mine were:
    jsonPayload: {
      statusDetails: "failed_to_pick_backend"
      @type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
    }
If you're using NGINX, the errors occur on POSTs, and the cause is reported as "backend_connection_closed_before_data_sent_to_client", the problem may be fixed by changing your NGINX timeouts. See this excellent blog post:
https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340#.btzyusgi6
I have a single GCE VM running behind an HTTP load balancer. It's too early to have multiple VMs; I'm saving money.
What happens is that this VM seems to be deleted and re-inserted randomly by GCE. My tiny web app dies for that duration. I see these in my VM logs:
    compute.instances.delete
    compute.instances.insert
Seems like GCE maintenance activity. Is there a way to avoid this?
Yesterday I tried to delete an instance by invoking the "halt" command through SSH. Unlike AWS, GCE does not let you choose the behavior of the VM on shutdown and stops the instance by default (the instance status is TERMINATED).
Today I was browsing the Google Compute Engine REST API documentation and I found the following description:
    status: [Output Only] The status of the instance. One of the following values: PROVISIONING, STAGING, RUNNING, STOPPING, STOPPED, TERMINATED.
What is this "STOPPED" status? Instances stopped through the web console and instances stopped with the "halt" command both end up with the "TERMINATED" status.
Any ideas?
The STOPPED state is a new feature, added a few weeks ago, which you can reach via the Compute Engine API.
This method stops a running instance, shutting it down cleanly, and allows you to restart the instance at a later time. Stopped instances do not incur per-minute virtual machine usage charges while they are stopped, but any resources that the virtual machine is using, such as persistent disks and static IP addresses, will continue to be charged until they are deleted. For more information, see Stopping an instance.
I think this is similar to the AWS option you mention.
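For example, with the gcloud CLI (the instance name and zone are placeholders):

    # Cleanly stop the instance; persistent disks and static IPs keep
    # accruing charges while it is stopped.
    gcloud compute instances stop my-instance --zone us-central1-a

    # Restart it later from the stopped state.
    gcloud compute instances start my-instance --zone us-central1-a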
For anyone stumbling on this question years later, a detailed lifecycle diagram of instances can be found here
There is no STOPPED status anymore; instances go from STOPPING to TERMINATED, whatever the stopping method is.
However, a new state that may be closer to what halt does has been introduced since: SUSPENDED. It's still in beta though, and I'm not sure whether invoking halt would induce this state or simply terminate the instance.
See here for more details
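Either way, you can check which state an instance actually lands in after halting it (a sketch; the instance name and zone are placeholders):

    # Print just the instance's current status, e.g. RUNNING or TERMINATED.
    gcloud compute instances describe my-instance \
        --zone us-central1-a --format="value(status)"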
OK, I have my server built on EC2. My stack is NGINX as a load balancer, supervisord managing the Node.js processes (one process per CPU), and Redis with master and slave on separate boxes. I have stress tested it by simulating failover and taking services offline. Using Apache ab on the server itself, I can get up to 6500 QPS.
Now I need to load test remotely. What are the best open source tools to accomplish this, or even the most cost-effective SaaS method? I expect 6500 QPS per server in production and need to extend the isolated Apache ab testing to remote testing. For example, I will have servers in Singapore, and I need to drive 6500 QPS from Japan and measure the effect of latency. I am aware of Apache JMeter but am looking for a best-practice solution.
Thanks
I have successfully used JMeter for load testing at significant scale.
If a single load generation client cannot output enough load, you can configure JMeter with multiple load generation clients, with the load coordinated by a master instance.
Using "open source tools" implies that you have the ability to spin up servers in the regions you're interested in (e.g. Japan). If you locate a cloud provider in that region, you can spin up as many load generation instances as needed. You may, however, need quite a few instances depending on the network connectivity offered to individual instances. The nice thing about JMeter is that it can coordinate many load generation instances; a sketch of that setup is below.
You can use BlazeMeter as a SaaS solution. It's 100% JMeter compatible, and it has a Japan (Tokyo) load origin location, which is what you need.