webMethods: [ISS.0085.9281] Http Error: 500 - Timeout

We have a problem with our webMethods platform; we keep getting these errors:
[ISP.0085.9998E] Exception -> com.wm.app.b2b.server.ServiceException:
[ISS.0085.9281] Http Error: 500 - Timeout
and this on several flows.
The configured timeout is 120 seconds, yet the error occurs after 35 to 40 seconds.
We are running webMethods 9.6 and calling a backend that uses SAP.

This timeout is not raised on the wM side but by a called web service.
This web service may be hosted on your SAP backend, but it can also be internal to your wM suite (for example, a call to MWS or to another IS in the case of a cluster).
Increase the IS logging related to web service calls to obtain more details and confirm the source of this timeout.

Related

Reaching QueuePool limit overflow in FastAPI application

I have the following project on GitHub, built with FastAPI, SQLAlchemy and PostgreSQL.
During load tests with Locust I found that my server fails with the following error:
sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00
Why am I getting these requests back as HTTPError('500 Server Error: Internal Server Error') and not as timeouts? How can I avoid this by turning these requests into timeouts?
Obviously I could set the pool_size and max_overflow options in the engine constructor to greater values, but I don't think that's good practice.
Is it possible to control the number of sessions created?
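For reference, a minimal sketch of where those pool options live and of a session-per-request pattern that returns connections to the pool promptly (the DSN, route and values are illustrative, not taken from the project):
from fastapi import Depends, FastAPI
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session, sessionmaker

# Placeholder DSN; the pool values shown are SQLAlchemy's defaults, spelled out.
engine = create_engine(
    "postgresql://user:password@localhost/dbname",
    pool_size=5,       # connections kept open in the pool
    max_overflow=10,   # extra connections allowed beyond pool_size
    pool_timeout=30,   # seconds to wait for a free connection before TimeoutError
)
SessionLocal = sessionmaker(bind=engine)

app = FastAPI()

def get_db():
    # One session per request; closing it returns the connection to the pool.
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.get("/health")
def health(db: Session = Depends(get_db)):
    return {"db": db.execute(text("SELECT 1")).scalar()}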

How can I optimize my Google Cloud SQL (MySQL) database for use with an API

I created a MySQL Database into the Google Cloud Platform.
Machine type is db-n1-standard-2 with 2 vCPUs and 7.5 GB Memory.
Network throughput (MB/s) is 500 of 2000
Storage type: SSD
Disk throughput (MB/s)
Read: 4.8
Write: 4.8
IOPS
Read: 300
Write: 300
Availability: High availability
Database Flags:
max_connections: 500
I created an API with Laravel Lumen and deployed it to App Engine on Google Cloud Platform:
runtime: php72
instance_class: F2
automatic_scaling:
min_instances: 1
max_instances: 20
target_cpu_utilization: 0.7
max_concurrent_requests: 80
target_throughput_utilization: 0.8
If I send a request to my API with Postman, the first response takes 1123 ms. The size of the response is 8.59 KB.
If I send the same request with loader.io with 250 clients over 1 minute,
the test aborted because it reached the error threshold.
79,5% error rate
avg resp = 9141 ms
min/max response time: 2081/10376 ms
Response Counts success: 104
Response Counts timeout: 403
When I look at the MySQL error log, I see an enormous number of errors like this:
2020-01-07 16:29:18.670 CET
2020-01-07T15:29:18.670275Z 1507 [Note] Aborted connection 1507 to db: 'mydatabasename' user: 'mydatabaseuser' host: 'cloudsqlproxy~172.217.35.158' (Got an error reading communication packets)
Does anyone have an idea how I can solve this problem?
I've investigated a little about this error and found a useful guide on how to diagnose these types of errors at this link. I believe that as a first step we need to find the real cause of this message (it can occur for various reasons, according to the linked guide). Some suggestions that I noticed repeated both in that guide and in other posts are:
Check that the value of max_allowed_packet is high enough (this can be modified with flags in Cloud SQL).
The client connected successfully but terminated improperly.
The client slept for longer than the defined wait_timeout or interactive_timeout seconds.
What I would do is try tweaking the database flags, as described in the public documentation on the Google page, and check how the behavior changes.
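For illustration, database flags on Cloud SQL can be set with the gcloud CLI; note that --database-flags replaces the complete set of flags already configured, so existing ones such as max_connections must be repeated, and changing flags may restart the instance (instance name and values below are placeholders):
gcloud sql instances patch my-cloudsql-instance \
  --database-flags=max_connections=500,max_allowed_packet=67108864,wait_timeout=28800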
Please let us know if you find something useful when tweaking the instance.

Disable Mule flow at startup - continue or ignore startup failure

My Mule project has multiple flows, some of which have endpoints that may be offline at startup during testing. A failed endpoint in any flow causes the entire Mule project to fail to deploy. The console logs show the domain status as deployed but the application status as FAILED.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Starting app 'test' +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
. Root Exception was: Connection refused: connect. Type: class java.net.ConnectException
ERROR 2018-01-09 10:31:08,287 [main] org.mule.module.launcher.application.DefaultMuleApplication:
********************************************************************************
Message : Could not connect to broker URL: tcp://localhost:61616.
Reason: java.net.ConnectException: Connection refused: connect
JMS Code : null
*************************************************************
* Application "test" shut down normally on: 1/9/18 10:31 AM *
* Up for: 0 days, 0 hours, 0 mins, 1.449 sec *
*************************************************************
ERROR 2018-01-09 10:31:08,413 [main] org.mule.module.launcher.DefaultArchiveDeployer:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Failed to deploy artifact 'test', see below +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
org.mule.module.launcher.DeploymentStartException: ConnectException: Connection refused: connect
I have tried setting initialState="stopped" on the flows that could have startup connection issues (see the sketch after this question), but it has no effect on running the project: the project still fails to deploy and no flows run.
I added a Catch Exception Strategy to the inbound endpoints that can fail at startup, to no avail. I also tried the "Until Successful" scope in the flow.
In particular, I have some JMS and web service components which may be offline at different times during development and testing. I want to configure the flows so that the overall project continues even if a single component/flow fails to connect at startup, and to manage a single project with multiple flows where some flows may not be active.
Environment: Anypoint Studio and Mule 3.9.0 EE.
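For reference, a sketch of the kind of configuration tried above, with initialState="stopped" on a flow whose JMS endpoint may be offline at startup (flow name, queue and connector reference are illustrative):
<flow name="jmsConsumerFlow" initialState="stopped">
    <jms:inbound-endpoint queue="test.queue" connector-ref="Active_MQ" doc:name="JMS"/>
    <logger message="#[payload]" level="INFO" doc:name="Logger"/>
</flow>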
If you would like your deployment to succeed even when your service is not available, you will need to supply a reconnection strategy on the JMS Connector with blocking=false. For example:
<jms:activemq-connector name="Active_MQ" username="a" password="b" brokerURL="tcp://localhost:61616" validateConnections="true" doc:name="Active MQ">
<reconnect-forever blocking="false"/>
</jms:activemq-connector>
More information on reconnection strategies can be found in the MuleSoft documentation: https://docs.mulesoft.com/mule-user-guide/v/3.9/configuring-reconnection-strategies

nodejs runtime stops responding when testing under stress

I'm basically testing all the routes via the request module with Mocha.
https://www.npmjs.com/package/request
I'm doing a stress test by opening two console windows side by side and running the tests simultaneously. Most of the time the tests are successful, but then a moment comes when the tests fail (without a timeout error), and in Postman this specific route stops responding.
It happens about once in 7 runs, and I'm wondering what I could do to figure this out.
Edit:
I increased to 4 console windows running the tests simultaneously; they ran fine a couple of times but then started to time out.
There is not even any console output from the app.get, app.post, etc. routes.
Any suggestions?
Edit
I caught some request errors in the tests, based on the suggestion:
Uncaught AssertionError: { [Error: connect ECONNREFUSED]
code: 'ECONNREFUSED',
errno: 'ECONNREFUSED',
syscall: 'connect' } == null
The corresponding code for the above error is:
request({url: endpoint + "/SignIn?emailAddress=" + emailAddress + "&password=" + password}, function (error, response, body) {
    assert.equal(error, null);
});
Edit 2
I dug deeper with console statements and noticed that the MySQL connection callback was not called. I also noticed what looks like a connection limit; is it because of this? I'm using connection pools, though.
The logs say "forcing close of threads".
Probable Answer:
This thread helped with the issue.
https://github.com/felixge/node-mysql/issues/405
I set waitForConnections: false and then started to see this error:
[Error: No connections available.]
So it seems to me that the system was waiting for connections, but the test runner didn't wait and ended up with a timeout error.
It also seems there's a limit on the maximum number of connections, even though I was calling release() on connections after each query. I'm not sure how this works on production systems out there; do we have a limit there?
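For context, a minimal sketch of how connectionLimit, waitForConnections and release() fit together with the mysql module's pool (credentials, limit and query are placeholders):
var mysql = require("mysql");

// Placeholder credentials; connectionLimit defaults to 10 if not set.
var pool = mysql.createPool({
    host: "localhost",
    user: "dbuser",
    password: "dbpassword",
    database: "mydb",
    connectionLimit: 10,       // maximum open connections in the pool
    waitForConnections: true,  // queue callers instead of failing immediately
    queueLimit: 0              // 0 = unlimited queued requests
});

function signIn(emailAddress, callback) {
    pool.getConnection(function (err, connection) {
        if (err) return callback(err); // e.g. "No connections available." when waitForConnections is false
        connection.query("SELECT id FROM users WHERE email = ?", [emailAddress], function (queryErr, rows) {
            connection.release(); // always return the connection to the pool
            callback(queryErr, rows);
        });
    });
}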
You are running out of TCP connections. You need to make a few changes at the system and application level to make it handle more load:
1. Change your connection settings to keepAlive wherever possible (see the sketch after this list).
2. On Unix you have ulimit, i.e. the maximum number of file handles that any process can hold at any instant. Remember that in Unix every socket is also a file.
3. Tune your timeout settings based on the response time of your database server or other web servers.
You'll have to make similar changes at each level that handles requests if you have a multi-tier architecture.
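As a sketch of point 1 for the test client, assuming the tests keep using the request module (the URL and socket limit are examples):
var http = require("http");
var request = require("request");

// Reuse sockets across test requests instead of opening a new TCP connection
// per call; maxSockets caps concurrent sockets per host.
var keepAliveAgent = new http.Agent({ keepAlive: true, maxSockets: 50 });

request({
    url: "http://localhost:3000/SignIn?emailAddress=a@b.com&password=secret",
    agent: keepAliveAgent
}, function (error, response, body) {
    if (error) return console.error(error);
    console.log(response.statusCode);
});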

Configure GlassFish JDBC connection pool to handle Amazon RDS Multi-AZ failover

I have a Java EE application running in GlassFish on EC2, with a MySQL database on Amazon RDS.
I am trying to configure the JDBC connection pool in order to minimize downtime in case of a database failover.
My current configuration isn't working correctly during a Multi-AZ failover: the standby database instance appears to become available within a couple of minutes (according to the AWS console), while my GlassFish instance remains stuck for a long time (about 15 minutes) before resuming work.
The connection pool is configured like this:
asadmin create-jdbc-connection-pool --restype javax.sql.ConnectionPoolDataSource \
--datasourceclassname com.mysql.jdbc.jdbc2.optional.MysqlConnectionPoolDataSource \
--isconnectvalidatereq=true --validateatmostonceperiod=60 --validationmethod=auto-commit \
--property user=$DBUSER:password=$DBPASS:databaseName=$DBNAME:serverName=$DBHOST:port=$DBPORT \
MyPool
If I use a Single-AZ db.m1.small instance and reboot the database from the console, GlassFish will invalidate the broken connections, throw some exceptions and then reconnect as soon as the database is available. In this setup I get less than 1 minute of downtime.
If I use a Multi-AZ db.m1.small instance and reboot with failover from the AWS console, I see no exception at all. The server halts completely, with all incoming requests timing out. After 15 minutes I finally get this:
Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.3.2.v20111125-r10461): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 940,715 milliseconds ago. The last packet sent successfully to the server was 935,598 milliseconds ago.
It appears as if each HTTP thread gets blocked on an invalid connection without getting an exception and so there's no chance to perform connection validation.
Downtime in the Multi-AZ case is always between 15 and 16 minutes, so it looks like a timeout of some sort, but I was unable to change it.
Things I have tried without success:
connection leak timeout/reclaim
statement leak timeout/reclaim
statement timeout
using a different validation method
using MysqlDataSource instead of MysqlConnectionPoolDataSource
How can I set a timeout on stuck queries so that connections in the pool are reused, validated and replaced?
Or how can I let GlassFish detect a database failover?
As I commented before, this happens because the sockets that are open and connected to the database don't realize the connection has been lost, so they stay connected until the OS socket timeout is triggered, which I have read is usually around 30 minutes.
To solve the issue you need to override the socket timeout in your JDBC connection string, or in the JNDI connection configuration/properties, by setting the socketTimeout parameter to a smaller value.
Keep in mind that any connection that takes longer than the defined value will be killed, even if it is being used (I haven't been able to confirm this; it is just what I read).
The other two parameters I mention in my comment are connectTimeout and autoReconnect.
Here's my JDBC Connection String:
jdbc:(...)&connectTimeout=15000&socketTimeout=60000&autoReconnect=true
I also disabled Java's DNS cache by doing
java.security.Security.setProperty("networkaddress.cache.ttl" , "0");
java.security.Security.setProperty("networkaddress.cache.negative.ttl" , "0");
I do this because Java doesn't honor DNS TTLs, and when the failover takes place the DNS name stays the same but the IP changes.
Since you are using an application server, the parameters to disable the DNS cache must be passed to the JVM when starting GlassFish (with -Dnet) and not set in the application itself.
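As an illustration of the socketTimeout/connectTimeout part applied to the pool from the question: Connector/J exposes these as data source properties, so they could be appended to the --property list (the values are examples and not verified on this exact setup):
asadmin create-jdbc-connection-pool --restype javax.sql.ConnectionPoolDataSource \
--datasourceclassname com.mysql.jdbc.jdbc2.optional.MysqlConnectionPoolDataSource \
--isconnectvalidatereq=true --validateatmostonceperiod=60 --validationmethod=auto-commit \
--property user=$DBUSER:password=$DBPASS:databaseName=$DBNAME:serverName=$DBHOST:port=$DBPORT:connectTimeout=15000:socketTimeout=60000:autoReconnect=true \
MyPool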