I have a single-instance (NO load balancer) Docker container (NO proxy server) that times out at exactly sixty seconds no matter what I do.
Yes, I'm aware of the many seemingly "duplicate" questions. I've been trying to solve this problem for 40+ hours. I've seen them all.
Every single answer to these questions informs the user that they must change the settings of NGINX or the load balancer.
However, I have NEITHER NGINX nor a load balancer for the environment, yet it still times out. I am mostly convinced that this is an AWS bug.
I have an endpoint titled time_test for the mini server I created. When I make a POST request to the endpoint, I get a timeout at exactly 60 seconds (the request throws an exception on my end).
Here's the Python code to make the request.
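import requests

# Elastic Beanstalk environment URL (host elided)
url = "http://...us-east-1.elasticbeanstalk.com/"
time_to_sleep = 65
url += f"time_test?time_to_sleep={time_to_sleep}"
# The client-side timeout is far above 60 seconds, so the client is not the one giving up
response = requests.post(url=url, timeout=10000)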
This throws an HTTPSException error, indicating that the server terminated the response, always at exactly 60 seconds.
However, the logs show a successful response.
My logs (specifically, "eb-docker/containers/eb-current-app/eb-blahblah-stdouterr.log") show
[01/Jun/2022 22:05:49] "POST /time_test?time_to_sleep=65 HTTP/1.1" 200 -
Note the 200 successful status code.
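To narrow down where the cut-off comes from, it can help to record how long the call ran and which exception type came back. A minimal sketch, assuming a hypothetical helper named timed_call (it is not part of requests):

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (elapsed_seconds, result, exception).

    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    try:
        result = fn(*args, **kwargs)
    except Exception as exc:  # keep the exception so its type can be inspected
        return time.monotonic() - start, None, exc
    return time.monotonic() - start, result, None


# Usage with the request from the question (needs the requests package):
#   elapsed, response, error = timed_call(requests.post, url=url, timeout=10000)
#   print(f"{elapsed:.1f}s", type(error).__name__ if error else response.status_code)
```

An elapsed time close to 60 seconds together with a connection-type exception, rather than requests.exceptions.Timeout, would suggest that something between the client and the container closed the connection.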
I'm going to continue to find an answer to this problem, which seemingly has none, and will report back if so. Any help with how to change the environment to accept >60 second requests would be greatly appreciated. Please don't reply, "You should have shorter request times." Not helpful or applicable.
(Platform = Docker running on 64bit Amazon Linux 2/3.4.10)
Related:
How to increase FastAPI timeout in Docker to be deployed on AWS EB?
Elastic Beanstalk WebSocket Connection Dropped
PHP beanstalk application giving 504 errors
Blazor Server Side - Frequent 504 errors in AWS environment
504 error on aws elastic beanstalk
Deploying ebextensions on Elastic beanstalk and EC2
AWS bug. It magically started working after I reported this issue to support. No changes. Considering it magically stopped working, that's the conclusion I've come to.
Related
I am getting a 502 Bad Gateway nginx error returned from a couple of my applications to one single workstation, which happens to be an Amazon Workspace workstation. These applications work flawlessly when accessed from any other machine, including from a different Amazon Workspace. I can see the server-side error being returned in /var/log/nginx/access.log. I have another application that is set up pretty much identically that I can access fine from the Workspace in question. The 502 Bad Gateway shows up only after 4 or 5 minutes of loading. Any ideas? Thanks!
I am trying to hit an HTTPS client API which works fine in Postman (it responds in 800 ms) and in a local Mule flow, but not on CloudHub. I am getting a Connect Timeout error. It tries connecting for 30 seconds (as per the logs) and then gives an HTTP:CONNECTIVITY error.
failed: Connect timeout.
errorType=HTTP:CONNECTIVITY
cause=org.mule.extension.http.api.error.HttpRequestFailedException
Response Timeout that I have set is 5 mins.
The flow was working fine when deployed on CloudHub before. It stopped working a few days ago, though I didn't make any changes to my code. I am unable to debug this issue as it is not reproducible in my local environment (it works perfectly there). Any help would be appreciated.
Mule HTTP calls offer four general types of timeout, and each behaves differently:
Connection Idle Timeout
Response Timeout
Max Idle Timeout
Query or Transactions Timeout ( Applies for DB Connectors)
Since you are getting an HTTP:CONNECTIVITY error, applying a 5 min Response Timeout doesn't help.
A Response Timeout (the server taking too long to respond) only comes into play after the connection handshake has been established.
Your problem is with the connection itself.
The only fix you could try is applying a Connection Idle Timeout and a Reconnection Strategy with some gap between attempts.
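The idea of a reconnection strategy with gaps between attempts can be sketched outside Mule. The following is a language-neutral illustration in Python, not Mule configuration; connect_with_retries and its parameters are hypothetical names chosen for this example.

```python
import time


def connect_with_retries(connect, attempts=3, delay_seconds=2.0):
    """Call connect() until it succeeds or the attempts run out.

    connect is any zero-argument callable that raises OSError on a
    connection failure (hypothetical; supplied by the caller).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)  # gap between reconnection attempts
    raise last_error
```

Retrying only helps with transient failures; if every attempt times out, the destination is probably unreachable from that environment.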
Since you are confident about your local tests, I suggest the two steps below:
1. Try the same HTTP connector configuration in a separate, new Mule app with a simple listener and the failing requester. Also add a call to a freely available online REST service in an extra flow. Test both and see which one works and which one fails.
This would tell you whether it's a real HTTP connectivity problem or something else, such as a Mule bug.
2. Check your configuration once again and make sure you're hitting the same endpoint in the CloudHub version.
Finally, I hope you did not accidentally put any proxy configuration in the local version.
If it was working before, there was probably a networking change on the other side that prevents access from the CloudHub application. You didn't share the URL, so it is not clear whether it is an internal host or a public host. We also don't know if there is some kind of whitelisting on the server side.
You can test connectivity to the HTTP host and port using the Network Tools application, to see if it is accessible from your CloudHub environment.
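The same kind of check, a plain TCP connection attempt with a timeout, can be sketched in a few lines of Python. This is not the Network Tools application, only an illustration of what such a test does; can_connect is a hypothetical name, and the result only reflects the network of the machine the script runs on.

```python
import socket


def can_connect(host, port, timeout_seconds=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

If this returns False from one environment and True from another for the same host and port, the difference lies in the network path (firewall, whitelisting, routing) rather than in the HTTP request itself.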
I am currently running a Django site on ec2. The site sends a csv back to the client. The CSV is of varying sizes. If it is small the site works fine and client is able to download the file. However, if the file gets large, I get an ERR_EMPTY_RESPONSE. I am guessing this is because the connection is aborting without giving adequate time for the process to run fully. Is there a way to increase this time span?
Here's what my site is returning to the client.
with open('//home/ubuntu/Fantasy-Fire/website/optimizer/lineups.csv') as myfile:
    response = HttpResponse(myfile, content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=lineups.csv'
    return response
Is there some other argument that can allow me to ignore this error and keep generating the file even if it is taking awhile or is large?
I believe you have some sort of backend proxy server that resets the connection to the Django backend, which results in ERR_EMPTY_RESPONSE. You should reconfigure the timeouts on that proxy. Usually nginx or Apache is used as the reverse proxy server.
What is Reverse Proxy Server
A reverse proxy server is an intermediate connection point positioned at a network’s edge. It receives initial HTTP connection requests, acting like the actual endpoint.
Essentially your network’s traffic cop, the reverse proxy serves as a gateway between users and your application origin server. In so doing it handles all policy management and traffic routing.
A reverse proxy operates by:
Receiving a user connection request
Completing a TCP three-way handshake, terminating the initial connection
Connecting with the origin server and forwarding the original request
More info at https://www.imperva.com/learn/performance/reverse-proxy/
One more possible case: your reverse proxy server doesn't have enough free space to process the response from Django and aborts the request. You can also check the free space on your reverse proxy.
Gunicorn has a timeout argument, -t. When you run gunicorn, the default timeout is 30 seconds. Increase that to something you're comfortable with, like 90 or 120 seconds, whatever you think fits your application.
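The same setting can also live in a Gunicorn configuration file, which is a plain Python module passed with -c. A minimal sketch, assuming the file is named gunicorn.conf.py and using the 120-second value mentioned above:

```python
# gunicorn.conf.py
# Seconds a worker may stay silent before it is killed and restarted.
# Gunicorn's default is 30.
timeout = 120
```

If a reverse proxy sits in front of Gunicorn, its own timeout applies as well, so raising only this value may not be enough.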
Our load balancer is returning 502 errors for some requests. It is a very low percentage of the total: we have around 36000 requests per hour and about 40 errors per hour, so only about 0.1% of the requests return an error.
The instances are healthy when the error occurs and we have added this forwarding rule to the firewall for the load balancer: 130.211.0.0/22 tcp:1-5000 Apply to all targets
It is not a very serious problem because the application tolerates such errors, but I would like to know why they are given.
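For reference, the error rate follows directly from the two figures quoted in the question:

```python
requests_per_hour = 36_000
errors_per_hour = 40

# Share of requests that end in a 502, as a percentage
error_rate_percent = errors_per_hour / requests_per_hour * 100
print(f"{error_rate_percent:.2f}%")  # prints 0.11%
```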
Any help will be appreciated.
It seems that there is no easy solution for this.
As Mike Fotinakis explains in this blog (thank you for this info JasonG :)):
It turns out that there is a race condition between the Google Cloud HTTP(S) Load Balancer and NGINX’s default keep-alive timeout of 65 seconds. The NGINX timeout might be reached at the same time the load balancer tries to re-use the connection for another HTTP request, which breaks the connection and results in a 502 Bad Gateway response from the load balancer.
In my case I'm using Apache with the mpm_prefork module. The solution proposed is to increase the connection keepalive timeout to 650s, but this is not possible because each connection opens one new process (so this would represent a great waste of resources).
UPDATE:
It seems that there is some new documentation about this problem on the official load balancer documentation page (search for "Timeouts and retries"): https://cloud.google.com/compute/docs/load-balancing/http/
They recommend setting the KeepAliveTimeout value to 620 in both cases (Apache and Nginx).
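The rule behind these numbers is that the web server must keep idle connections open longer than the load balancer does, so the load balancer never reuses a connection the server is about to close. A small sketch of that comparison follows; the 600-second load balancer value is an assumption not stated in this thread, and keepalive_outlasts_lb is a hypothetical name.

```python
# Assumption: the load balancer keeps idle backend connections for 600 seconds.
LB_BACKEND_KEEPALIVE_SECONDS = 600


def keepalive_outlasts_lb(server_keepalive_seconds,
                          lb_keepalive_seconds=LB_BACKEND_KEEPALIVE_SECONDS):
    """True if the web server holds idle connections longer than the load balancer."""
    return server_keepalive_seconds > lb_keepalive_seconds


# Values quoted in this thread: 65 (NGINX figure from the blog), 620 and 650.
for label, seconds in [("quoted NGINX value", 65),
                       ("documented recommendation", 620),
                       ("blog suggestion", 650)]:
    print(label, seconds, keepalive_outlasts_lb(seconds))
```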
I had an issue w/ 502s that was unexplainable after recreating a load balancer and backend config. I recreated my backend & instance group for unmanaged instances and this seemed to fix the issue for me. I wasn't able to identify any issues in my configuration in GCP :(
But I had a lot more errors - 1/10. There are load balancer logs that will tell you what the cause is and docs explain the causes.
Eg mine were:
jsonPayload: {
  statusDetails: "failed_to_pick_backend"
  @type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
}
If you're using nginx and it's on POSTS and the error is reported as "backend_connection_closed_before_data_sent_to_client" it may be fixed by changing your nginx timeouts. See this excellent blog post:
https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340#.btzyusgi6
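Once the load balancer logs are exported, counting entries by statusDetails shows which cause dominates. A minimal sketch, assuming the entries have already been loaded as Python dictionaries shaped like the snippet above; the sample data and the function name are hypothetical.

```python
from collections import Counter


def count_status_details(entries):
    """Count log entries by jsonPayload.statusDetails."""
    return Counter(
        entry.get("jsonPayload", {}).get("statusDetails", "unknown")
        for entry in entries
    )


# Hypothetical sample entries for illustration only.
sample_entries = [
    {"jsonPayload": {"statusDetails": "failed_to_pick_backend"}},
    {"jsonPayload": {"statusDetails": "backend_connection_closed_before_data_sent_to_client"}},
    {"jsonPayload": {"statusDetails": "failed_to_pick_backend"}},
]

for detail, count in count_status_details(sample_entries).most_common():
    print(count, detail)
```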
I am attempting to use purely https with my compute engine. I have a network load balancer created that forwards to a pool with my instance in it. However, the pool has constantly failing health checks because it won't let me configure a health check that uses https.
I'm using apache to redirect 80 to 443. Does anyone know how to either create an https health check or have the http health check follow the redirect?
Thanks for any help.
--edit--
I finally came across some documentation at http://googlecloudplatform.blogspot.com/2015/07/Debugging-Health-Checks-in-Load-Balancing-on-Google-Compute-Engine.html.
Failure 5: Not answering directly with a 200 response code. The web server may be configured to redirect to a page that returns an HTTP 200 response code. The health check will not follow the redirect; it expects the health check page to return a 200 directly.
This basic capability has been supported at every other hosting provider we've been on. Why can't this be done? What am I missing?
I spent the whole day trying to configure a purely https based load balancer in GCloud for a Kubernetes cluster with an ingress controller.
I finally got it working, so I'll share my experience for people who struggle with the same configuration. If the health check fails for the instances, you will usually see the following when accessing your website's URL.
Error: Server Error
The server encountered a temporary error and could not complete your request.
Please try again in 30 seconds.
1) Protocol: GCloud introduced new health checks which can be configured for TCP, SSL, HTTP, HTTPS, or HTTP/2 probing. This addresses the original problem, since an HTTPS check avoids the redirect from port 80 to port 443.
2) Path: The most common issue is that the "/" path of your application does not return a 200 OK and thus causes the health check to fail. This can be prevented by adding a path argument to your health check, e.g. "/index".
3) Ingress HTTPS: This is relatively simple. Adding a secret or a pre-shared-cert to your ingress.yaml will automatically result in an HTTPS Load Balancer instead of HTTP. Further information can be found here
Lastly, see the guide from the docs for Setting up HTTP Load Balancing with Ingress.
However, even though the new HTTPS Health checks seem to work, they are still in the beta phase and bugs are reported in the issue tracker. The documentation for the gcloud-ingress-controller can be found here.
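To see what a health check would see for a given path, you can request it without following redirects and check for a direct 200. A minimal sketch using Python's http.client, which does not follow redirects on its own; health_check_passes is a hypothetical name, and this only approximates the real health checker.

```python
import http.client


def health_check_passes(host, port, path="/", timeout_seconds=5.0):
    """Return True only if GET path answers 200 directly (redirects count as failures)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_seconds)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

For example, if "/" redirects to HTTPS while "/index" answers 200, the check passes for "/index" and fails for "/". For an HTTPS endpoint, http.client.HTTPSConnection can be used in the same way.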