Connect Timeout Error on CloudHub: Mule version 4.2.2 - esb

I am trying to call an HTTPS client API which works fine in Postman (responds in ~800 ms) and in my local Mule flow, but it does not work on CloudHub. I am getting a Connect Timeout error. It tries connecting for 30 seconds (as per the logs) and then gives an HTTP:CONNECTIVITY error.
failed: Connect timeout.
errorType=HTTP:CONNECTIVITY
cause=org.mule.extension.http.api.error.HttpRequestFailedException
The Response Timeout I have set is 5 minutes.
The flow was working fine when deployed on CloudHub before. It stopped working a few days ago even though I didn't make any changes to my code. I am unable to debug this issue as it is not reproducible in my local environment (it works perfectly). Any help would be appreciated.

Mule HTTP calls offer four general types of timeouts, each with its own purpose:
Connection Idle Timeout
Response Timeout
Max Idle Timeout
Query or Transaction Timeout (applies to DB connectors)
Since you are getting an HTTP:CONNECTIVITY error, applying a 5-minute Response Timeout doesn't help.
The Response Timeout (the server taking too long to respond) only matters after the connection handshake has been established.
Your problem is with the connection itself.
The main thing you can try is applying a Connection Idle Timeout and a Reconnection Strategy with some frequency gaps.
Since you are confident about your local tests, I suggest the two steps below:
1. Use the same HTTP connector configuration in a separate, new Mule app, with a simple listener and the failing requester. Also add a call to a freely available online REST service in an extra flow. Test both and see which one works and which one fails.
This will tell you whether it is a real HTTP connectivity problem or something else, such as a Mule bug.
2. Check your configuration again and make sure you are hitting the same endpoint in the CloudHub version.
Finally, I hope you did not accidentally put any proxy configuration in only the local version.

If it was working before, there was probably a networking change on the other side that now prevents access from the CloudHub application. You didn't share the URL, so it is not clear whether it is an internal host or a public host. We also don't know if there is some kind of whitelisting on the server side.
You can test connectivity to the HTTP host and port using the Network Tools application to see if it is accessible from your CloudHub environment.
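If you prefer a scripted check, a few lines of Python run from a machine in the relevant network context can show whether the TCP connection itself can be established (a sketch only; the host and port are placeholders, since the actual URL wasn't shared):

import socket

# Placeholder values -- replace with the actual API host and port you are calling.
HOST = "api.example.com"
PORT = 443

try:
    # A connect timeout here corresponds to the HTTP:CONNECTIVITY / "Connect timeout"
    # error in the Mule logs (no TCP connection, so no TLS or HTTP happens at all).
    with socket.create_connection((HOST, PORT), timeout=30):
        print(f"TCP connection to {HOST}:{PORT} succeeded")
except OSError as exc:
    print(f"TCP connection to {HOST}:{PORT} failed: {exc}")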

60 Second Timeout on Elastic Beanstalk

I have a single-instance (NO load balancer) Docker container (NO proxy server) that times out at exactly sixty seconds no matter what I do.
Yes, I'm aware of the many seemingly "duplicate" questions. I've been trying to solve this problem for 40+ hours. I've seen them all.
Every single answer to these questions informs the user that they must change the settings of NGINX or the load balancer.
However, I have NEITHER NGINX NOR a load balancer for the environment, yet it still times out. I am mostly convinced that this is an AWS bug.
I have an endpoint titled time_test for the mini server I created. When I make a POST request to the endpoint, I get a timeout at exactly 60 seconds (the request throws an exception on my end).
Here's the Python code to make the request.
import requests

# Elastic Beanstalk endpoint (domain truncated here)
url = f"http://...us-east-1.elasticbeanstalk.com/"
# Ask the endpoint to sleep for longer than 60 seconds
time_to_sleep = 65
url += f"time_test?time_to_sleep={time_to_sleep}"
# requests' timeout is given in seconds, so the client is not what cuts this off
response = requests.post(url=url, timeout=10000)
This throws an HTTPSException error, indicating that the server terminated the response, always at exactly 60 seconds.
However, the logs show a successful response.
My logs (specifically eb-docker/containers/eb-current-app/eb-blahblah-stdouterr.log) show:
[01/Jun/2022 22:05:49] "POST /time_test?time_to_sleep=65 HTTP/1.1" 200 -
Note the 200 successful status code.
I'm going to continue to find an answer to this problem, which seemingly has none, and will report back if so. Any help with how to change the environment to accept >60 second requests would be greatly appreciated. Please don't reply, "You should have shorter request times." Not helpful or applicable.
(Platform = Docker running on 64bit Amazon Linux 2/3.4.10)
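For reference, the time_test endpoint is essentially just a sleep followed by a response. A minimal sketch of such an endpoint (assuming Flask here; the actual mini server's code isn't shown) would be:

import time
from flask import Flask, request

app = Flask(__name__)

@app.route("/time_test", methods=["POST"])
def time_test():
    # Sleep for the requested number of seconds, then respond.
    # The container log above shows this completing with a 200,
    # even though the client sees the connection cut at exactly 60 seconds.
    time_to_sleep = int(request.args.get("time_to_sleep", 0))
    time.sleep(time_to_sleep)
    return "done", 200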
Related:
How to increase FastAPI timeout in Docker to be deployed on AWS EB?
Elastic Beanstalk WebSocket Connection Dropped
PHP beanstalk application giving 504 errors
Blazor Server Side - Frequent 504 errors in AWS environment
504 error on aws elastic beanstalk
Deploying ebextensions on Elastic beanstalk and EC2
AWS bug. It magically started working after I reported this issue to support. No changes. Considering it magically stopped working, that's the conclusion I've come to.

Zabbix Mattermost notification integrations - Timeout exceeded while connecting to 'localhost' when testing Mattermost Media Type

I am trying to integrate our Mattermost with Zabbix to receive notifications on alerts. I've followed the instructions in this link. We are using Zabbix 4.4 with MM 5.19.
After enabling the integration, no alerts are being posted to Mattermost. I tried testing the media type under Administration > Media Types > Mattermost > Test.
I've added the following parameters, but it throws the error: Connection timeout of 3 seconds exceeded when connecting to Zabbix server "localhost".
bot_token : {Token generated for the Bot in Mattermost}
mattermost_url : {https://mattermost.our-company.com}
send_mode : alarm
I tried changing {ZABBIX_URL} to both http://127.0.0.1 and http://zabbix.our-company.com (the DNS is resolved only internally, but our Mattermost is available on the public network), but neither of them works.
I checked the logs inside /var/log/zabbix but found no errors or anything. I even tried setting the Zabbix logs to debug mode, but no luck in any case; the only debug log I got is the following:
2063:20200216:090224.146 trapper got '{"request":"alert.send","sid":"74095b240dd6783618571516f029187a","data":{"parameters":{"zabbix_url":"{$ZABBIX.URL}","send_mode":"alarm","send_to":"{ALERT.SENDTO}","event_tags":"{EVENT.TAGS}","event_name":"{EVENT.NAME}","event_nseverity":"{EVENT.NSEVERITY}","event_ack_status":"{EVENT.ACK.STATUS}","event_value":"{EVENT.VALUE}","event_update_status":"{EVENT.UPDATE.STATUS}","event_date":"{EVENT.DATE}","event_time":"{EVENT.TIME}","event_severity":"{EVENT.SEVERITY}","event_opdata":"{EVENT.OPDATA}","event_id":"{EVENT.ID}","event_update_message":"{EVENT.UPDATE.MESSAGE}","trigger_id":"{TRIGGER.ID}","trigger_description":"{TRIGGER.DESCRIPTION}","host_name":"{HOST.NAME}","host_ip":"{HOST.IP}","event_update_date":"{EVENT.UPDATE.DATE}","event_update_time":"{EVENT.UPDATE.TIME}","event_recovery_date":"{EVENT.RECOVERY.DATE}","event_recovery_time":"{EVENT.RECOVERY.TIME}","bot_token":"qs3rkqdappy6i8gs3a8871phxc","mattermost_url":"https:\/\/mattermost.our-company.com"},"mediatypeid":"7"}}'
What could be the issue? Is there a way to "debug" and find the root cause of this error? Any help is appreciated! Note that right now we have Slack integrated with Zabbix and it works fine, but we are moving to Mattermost and therefore need to migrate the integrations as well.
We found the issue together with our network admin. The problem was that our Zabbix server was trying to resolve the Mattermost name through a local network route (i.e. 192.168.x.x) and kept failing, so no SSL connection could be initiated.
It seems that Zabbix integration test error messages are quite generic and sometimes misleading. Thorough investigation is needed to find the root cause.
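For anyone debugging a similar case, a short script run on the Zabbix host can show what the Mattermost hostname actually resolves to and whether a TLS connection can be made (a sketch; the hostname is the placeholder used in this question):

import socket
import ssl

# Hostname from the mattermost_url media type parameter (placeholder from this question).
HOST = "mattermost.our-company.com"
PORT = 443

# Print every address the Zabbix host resolves for Mattermost -- a private
# 192.168.x.x address here would reproduce the routing problem described above.
for info in socket.getaddrinfo(HOST, PORT, proto=socket.IPPROTO_TCP):
    print("resolved:", info[4][0])

# Then attempt a full TLS handshake, which is what the webhook needs to succeed.
context = ssl.create_default_context()
with socket.create_connection((HOST, PORT), timeout=5) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        print("TLS handshake OK:", tls.version())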

Django ERR_EMPTY_RESPONSE

I am currently running a Django site on EC2. The site sends a CSV back to the client. The CSV varies in size. If it is small, the site works fine and the client is able to download the file. However, if the file gets large, I get an ERR_EMPTY_RESPONSE. I am guessing this is because the connection aborts without giving the process adequate time to finish. Is there a way to increase this time span?
Here's what my site is returning to the client.
with open('//home/ubuntu/Fantasy-Fire/website/optimizer/lineups.csv') as myfile:
    response = HttpResponse(myfile, content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=lineups.csv'
    return response
Is there some other argument that will allow me to ignore this error and keep generating the file even if it takes a while or is large?
I believe you have some sort of proxy server in front of Django which resets the connection to the Django backend and returns ERR_EMPTY_RESPONSE in this case. You should reconfigure the timeouts on that proxy. Usually that is nginx or Apache acting as a reverse proxy server.
What is a reverse proxy server?
A reverse proxy server is an intermediate connection point positioned at a network’s edge. It receives initial HTTP connection requests, acting like the actual endpoint.
Essentially your network’s traffic cop, the reverse proxy serves as a gateway between users and your application origin server. In so doing it handles all policy management and traffic routing.
A reverse proxy operates by:
Receiving a user connection request
Completing a TCP three-way handshake, terminating the initial connection
Connecting with the origin server and forwarding the original request
More info at https://www.imperva.com/learn/performance/reverse-proxy/
One more possible case: your reverse proxy server doesn't have enough free space to buffer the response from Django and aborts the request. You can also check the free space on your reverse proxy / balancer.
Within gunicorn there is a timeout argument, -t. When you run gunicorn, the default timeout is 30 seconds. Increase it to something you're comfortable with, like 90 or 120 seconds, whatever fits your application.
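For example, if you start gunicorn with a config file instead of command-line flags, the same setting looks like this (a sketch; the module name and values are just examples):

# gunicorn.conf.py -- used as: gunicorn -c gunicorn.conf.py myproject.wsgi
# Worker timeout in seconds (gunicorn's default is 30). Raise it so a slow,
# large CSV response is not killed mid-request.
timeout = 120
workers = 3
bind = "0.0.0.0:8000"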

Some 502 errors in GCP HTTP Load Balancing

Our load balancer is returning 502 errors for some requests. It is a very low percentage of the total: we have around 36,000 requests per hour and about 40 errors per hour, so only about 0.1% of requests return an error.
The instances are healthy when the error occurs, and we have added this rule to the firewall for the load balancer: 130.211.0.0/22 tcp:1-5000, applied to all targets.
It is not a very serious problem because the application tolerates such errors, but I would like to know why they happen.
Any help will be appreciated.
It seems that there is no easy solution for this.
As Mike Fotinakis explains in this blog post (thank you for this info, JasonG :)):
It turns out that there is a race condition between the Google Cloud HTTP(S) Load Balancer and NGINX’s default keep-alive timeout of 65 seconds. The NGINX timeout might be reached at the same time the load balancer tries to re-use the connection for another HTTP request, which breaks the connection and results in a 502 Bad Gateway response from the load balancer.
In my case I'm using Apache with the mpm_prefork module. The proposed solution is to increase the connection keep-alive timeout to 650s, but this is not possible because each connection keeps one process open (so it would represent a great waste of resources).
UPDATE:
It seems that there is some new documentation about this problem on the official load balancer documentation page (search for "Timeouts and retries"): https://cloud.google.com/compute/docs/load-balancing/http/
They recommend setting the keep-alive timeout value to 620 seconds in both cases (Apache and Nginx).
I had an unexplainable issue with 502s after recreating a load balancer and backend config. I recreated my backend and instance group for unmanaged instances, and this seemed to fix the issue for me. I wasn't able to identify any issues in my configuration in GCP :(
But I had a lot more errors - about 1 in 10. The load balancer logs will tell you what the cause is, and the docs explain the possible causes.
E.g. mine were:
jsonPayload: { statusDetails: "failed_to_pick_backend" #type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry" }
If you're using nginx, the errors occur on POSTs, and the error is reported as "backend_connection_closed_before_data_sent_to_client", it may be fixed by changing your nginx timeouts. See this excellent blog post:
https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340#.btzyusgi6

WebSocket disconnects on WildFly on OpenShift

I have a web application deployed on WildFly Application Server 8.2.0.Final on OpenShift.
My application serves a WebSocket endpoint.
I connect to the WebSocket endpoint with my Java (Tyrus implementation) client application, and after a short period (a few hours) the connection is closed by the server side. I receive the close reason "Closed abnormally" and close code 1006.
The client reconnects automatically, and then exactly every hour the connection is broken again with the same close reason.
Is this a built-in mechanism on the OpenShift server side? Some sort of cleanup mechanism?
I would like to have a permanent WebSocket connection to the server.
Would buying OpenShift Bronze/Silver support solve this problem?
The problem is in your browser, not the server:
Close Code 1006 is a special code that means the connection was closed abnormally (locally) by the browser implementation.
If your browser client reports close code 1006, then you should be looking at the websocket.onerror(evt) event for details.
See this SO answer for more details:
https://stackoverflow.com/a/19305172/212224