I am getting a 502 Bad Gateway nginx error returned from a couple of my applications to one single workstation, which happens to be an Amazon Workspace workstation. These applications work flawlessly when accessed from any other machine, including a different Amazon Workspace. I can see the server-side error being returned in /var/log/nginx/access.log. I have another application, set up almost identically, that I can access fine from the Workspace in question. The 502 Bad Gateway shows up only after 4 or 5 minutes of loading. Any ideas? Thanks!
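One way to narrow this down is to pull only the 502 lines for the Workspace's IP out of the access log. A rough sketch, assuming the log uses nginx's default "combined" format and that the Workspace's public (possibly NAT) address is what nginx records as $remote_addr; the IP used in the example is a placeholder:

```python
import re

# nginx's default "combined" log_format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<addr>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
)


def bad_gateway_hits(lines, client_ip=None):
    """Yield (time, request) for every 502 response, optionally for one client IP only."""
    for line in lines:
        m = COMBINED.match(line)
        # Skip lines that don't parse or aren't 502s
        if not m or m["status"] != "502":
            continue
        if client_ip is not None and m["addr"] != client_ip:
            continue
        yield m["time"], m["request"]
```

For example, `list(bad_gateway_hits(open("/var/log/nginx/access.log"), "203.0.113.7"))` with the Workspace's real address substituted. Adding nginx's $request_time variable to the log_format would also show how long each of those requests hung before failing.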
I deployed a Flask app to an AWS ECS instance, but when the app runs it shows "502 Bad Gateway". When I check the browser console, it says "Failed to load resource: the server responded with a status of 502 (Bad Gateway)" for favicon.ico. The thing is, I haven't added a favicon to my HTML file.
What might be the issue here?
I was trying to deploy an ML model. I Dockerized the app and deployed it as an AWS ECS service. That's when this issue occurred.
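Browsers request /favicon.ico on their own even when the page never references one, so the favicon line in the console may simply be the most visible failing request. A quick check, sketched with the standard library (the base URL you pass in is a placeholder for the service's public address), is to probe both the root path and the favicon path and see whether both come back 502:

```python
import urllib.error
import urllib.request


def status_of(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a GET to url, including error statuses such as 502."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx responses; the status code is on .code
        return err.code


def probe(base_url: str, paths=("/", "/favicon.ico")) -> dict[str, int]:
    """Map each path to the status code the server returns for it."""
    base = base_url.rstrip("/")
    return {path: status_of(base + path) for path in paths}
```

If `/` also returns 502, the problem is most likely in front of the Flask code (the container or the route to it) rather than anything to do with the favicon.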
I have a single-instance (NO load balancer) Docker container (NO proxy server) that times out at exactly sixty seconds no matter what I do.
Yes, I'm aware of the many seemingly "duplicate" questions. I've been trying to solve this problem for 40+ hours. I've seen them all.
Every single answer to these questions informs the user that they must change the settings of NGINX or the load balancer.
However, I have NEITHER NGINX NOR a load balancer in the environment, yet it still times out. I am mostly convinced that this is an AWS bug.
I have an endpoint titled time_test for the mini server I created. When I make a POST request to the endpoint, I get a timeout at exactly 60 seconds (the request throws an exception on my end).
Here's the Python code to make the request.
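```python
import requests

# Elastic Beanstalk environment URL (host elided in the original)
url = "http://...us-east-1.elasticbeanstalk.com/"
time_to_sleep = 65  # ask the endpoint to take longer than 60 seconds
url += f"time_test?time_to_sleep={time_to_sleep}"

# Very large client-side timeout, so the client itself never gives up first
response = requests.post(url=url, timeout=10000)
```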
This throws an HTTPSException error, indicating that the server terminated the response, always at exactly 60 seconds.
However, the logs show a successful response.
My logs (specifically, eb-docker/containers/eb-current-app/eb-blahblah-stdouterr.log) show:
[01/Jun/2022 22:05:49] "POST /time_test?time_to_sleep=65 HTTP/1.1" 200 -
Note the 200 successful status code.
I'm going to continue to find an answer to this problem, which seemingly has none, and will report back if so. Any help with how to change the environment to accept >60 second requests would be greatly appreciated. Please don't reply, "You should have shorter request times." Not helpful or applicable.
(Platform = Docker running on 64bit Amazon Linux 2/3.4.10)
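The server side of time_test isn't shown in the question. A minimal stand-in using only Python's standard library (TimeTestHandler and make_server are hypothetical names, not the poster's code) can be run locally or in the same container to check whether a 65-second sleep succeeds outside Elastic Beanstalk:

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


class TimeTestHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the question's time_test endpoint."""

    def do_POST(self):
        url = urlparse(self.path)
        if url.path != "/time_test":
            self.send_error(404)
            return
        # Sleep for the requested number of seconds before answering
        seconds = float(parse_qs(url.query).get("time_to_sleep", ["0"])[0])
        time.sleep(seconds)
        body = f"slept {seconds:g}s\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Bind on all interfaces; each request is handled in its own thread."""
    return ThreadingHTTPServer(("0.0.0.0", port), TimeTestHandler)
```

Run it with `make_server(8000).serve_forever()` and point the request code above at `http://localhost:8000/`. If the 65-second request succeeds locally, the 60-second cutoff is coming from somewhere between the client and the container rather than from the app itself.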
Related:
How to increase FastAPI timeout in Docker to be deployed on AWS EB?
Elastic Beanstalk WebSocket Connection Dropped
PHP beanstalk application giving 504 errors
Blazor Server Side - Frequent 504 errors in AWS environment
504 error on aws elastic beanstalk
Deploying ebextensions on Elastic beanstalk and EC2
It appears to have been an AWS bug: it started working again after I reported the issue to support, with no changes on my side. Considering it stopped working just as inexplicably, that's the conclusion I've come to.
I am using OpenShift 4.5. I exposed the application with a service and a route, and it is externally accessible. The application works fine, but sometimes I get a 502 error.
I am using the HAProxy-based ingress router.
Appreciate your thoughts on resolving this issue.
Our load balancer is returning 502 errors for some requests. It is only a very small percentage of the total: we have around 36,000 requests per hour and about 40 errors per hour, so only about 0.1% of requests return an error.
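For reference, the error rate works out to roughly one failed request in 900:

$$\text{error rate} = \frac{40\ \text{errors/h}}{36\,000\ \text{requests/h}} \approx 1.1 \times 10^{-3} \approx 0.11\%$$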
The instances are healthy when the error occurs and we have added this forwarding rule to the firewall for the load balancer: 130.211.0.0/22 tcp:1-5000 Apply to all targets
It is not a very serious problem because the application tolerates such errors, but I would like to know why they occur.
Any help will be appreciated.
It seems that there is no easy solution for this.
As Mike Fotinakis explains in this blog (thank you for this info JasonG :)):
It turns out that there is a race condition between the Google Cloud HTTP(S) Load Balancer and NGINX’s default keep-alive timeout of 65 seconds. The NGINX timeout might be reached at the same time the load balancer tries to re-use the connection for another HTTP request, which breaks the connection and results in a 502 Bad Gateway response from the load balancer.
In my case I'm using Apache with the mpm_prefork module. The proposed solution is to increase the connection keepalive timeout to 650 s, but that isn't practical for me: with prefork, each connection ties up its own process, so this would be a great waste of resources.
UPDATE:
It seems that there is now some documentation about this problem on the official load balancer documentation page (search for "Timeouts and retries"): https://cloud.google.com/compute/docs/load-balancing/http/
They recommend setting the keepalive timeout to 620 seconds in both cases (KeepAliveTimeout in Apache, keepalive_timeout in Nginx).
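The race described in the quote comes down to which side gives up on an idle connection first. A simplified model (it ignores network latency, which is why the real failure shows up right around the timeout boundary rather than cleanly after it); the 600 s load balancer keepalive used in the examples is an assumption for illustration, so check the GCP documentation for the actual value:

```python
def reuse_outcome(idle_s: float, backend_keepalive_s: float, lb_keepalive_s: float) -> str:
    """Classify what happens when the load balancer wants to send a request on a
    connection that has been idle for idle_s seconds (simplified model)."""
    if idle_s >= lb_keepalive_s:
        # The load balancer already dropped the idle connection and opens a fresh one
        return "new connection"
    if idle_s >= backend_keepalive_s:
        # The backend already closed its side, but the load balancer still treats
        # the connection as reusable, so the request can fail with a 502
        return "possible 502"
    return "reused"


def race_window(backend_keepalive_s: float, lb_keepalive_s: float) -> float:
    """Seconds of idle time during which reusing a connection is unsafe."""
    return max(0.0, lb_keepalive_s - backend_keepalive_s)
```

With nginx's default keepalive_timeout of 65 s, `race_window(65, 600)` gives 535 s of idle time in which a reused connection can fail; with a backend timeout of 620 s it is 0, which is the reasoning behind setting the backend value above the load balancer's.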
I had an unexplainable issue with 502s after recreating a load balancer and backend config. Recreating my backend and instance group for unmanaged instances seemed to fix the issue for me. I wasn't able to identify any problem in my GCP configuration :(
However, my error rate was much higher, about 1 in 10 requests. The load balancer logs tell you the cause, and the docs explain each cause.
For example, mine were:
jsonPayload: { statusDetails: "failed_to_pick_backend" @type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry" }
If you're using nginx, the errors happen on POST requests, and the error is reported as "backend_connection_closed_before_data_sent_to_client", it may be fixed by changing your nginx timeouts. See this excellent blog post:
https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340#.btzyusgi6
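Tallying statusDetails across many entries makes the dominant cause obvious. A sketch, assuming the log entries have been exported as a JSON array of objects shaped like the entry above (for example with `gcloud logging read` and `--format=json`; the export step and the file name in the example are assumptions):

```python
import json
from collections import Counter


def count_status_details(entries):
    """Count HTTP(S) load balancer log entries by jsonPayload.statusDetails."""
    counts = Counter()
    for entry in entries:
        # Entries without a statusDetails field are skipped
        details = entry.get("jsonPayload", {}).get("statusDetails")
        if details:
            counts[details] += 1
    return counts


def load_entries(path):
    """Load log entries saved as a JSON array (export format assumed)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `count_status_details(load_entries("lb-logs.json")).most_common(3)` lists the three most frequent causes, which can then be looked up in the load balancer logging docs.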
Additional, unexpected HTTPS connections are being made to GCE servers.
This started 2nd October and is affecting europe-west1-b and us-central1-b.
We have the same codebase running on servers in Amazon EC2 that are not affected.
Is anyone else seeing issues with HTTPS traffic to GCE?
UPDATE: Clarification of duplicated connections:
For example, a single HTTPS request from a web browser:
GET /favicon.ico HTTP/1.1
results in 5 HTTPS connections being opened; no HTTP request is sent on any of them, and they are closed (before the timeout period).
Then a final connection is opened and the request is sent as it should.
Note:
This would usually go undetected. However, we only allow 10 SSL connections from a single IP within 1 second (a velocity restriction).
I have temporarily increased this to 20 and everything is working OK.
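The arithmetic explains why the limit started biting: if each browser request now costs 6 connections (5 empty ones plus the real one), two requests in the same second already mean 12 connections. A sketch of a per-IP sliding-window limit (VelocityLimiter and rejected are hypothetical names; this is not the actual restriction in use, and it only counts accepted connections) shows the effect of raising the limit from 10 to 20:

```python
from collections import defaultdict, deque


class VelocityLimiter:
    """Sliding-window limit on new connections per client IP (hypothetical sketch)."""

    def __init__(self, max_conns: int, window_s: float = 1.0):
        self.max_conns = max_conns
        self.window_s = window_s
        self._opened = defaultdict(deque)  # ip -> timestamps of accepted connections

    def allow(self, ip: str, now: float) -> bool:
        q = self._opened[ip]
        # Drop timestamps that have slid out of the window
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.max_conns:
            return False
        q.append(now)
        return True


def rejected(limiter: VelocityLimiter, ip: str, timestamps) -> int:
    """Number of connection attempts the limiter refuses."""
    return sum(not limiter.allow(ip, t) for t in timestamps)
```

With twelve connection attempts spread over about half a second (`[i * 0.05 for i in range(12)]`), a limit of 10 rejects 2 of them while a limit of 20 rejects none.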
What I don't understand is why this would suddenly start happening and only on GCE servers.
I will update this again when I have looked into the raw SSL traffic.