API Management retry policy - azure-api-management

I have configured an API Management policy to retry in the case of 500 errors:
<retry condition="@(context.Response.StatusCode == 500)" count="10" interval="10" max-interval="100" delta="10" first-fast-retry="false">
    <forward-request buffer-request-body="true" />
</retry>
I can see via Application Insights that it only ever appears to retry a maximum of four times. Sometimes it tries once and sometimes twice.
I would expect to see 10 attempts.
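One thing that may be worth checking is how long ten retries would take in total with these settings. The sketch below is only an estimate: it assumes the exponential back-off follows interval + (2^n - 1) * delta, capped at max-interval, which is my reading of the retry policy documentation and is not confirmed anywhere in this thread. It also ignores the random jitter applied to delta. The function name is made up for illustration.

```python
def retry_wait_schedule(count, interval, delta, max_interval):
    """Nominal wait in seconds before each retry.

    ASSUMPTION: wait_n = min(interval + (2**n - 1) * delta, max_interval)
    for n = 1..count. Jitter on delta is ignored.
    """
    return [min(interval + (2 ** n - 1) * delta, max_interval)
            for n in range(1, count + 1)]


# Values taken from the policy in the question.
waits = retry_wait_schedule(count=10, interval=10, delta=10, max_interval=100)
total_wait = sum(waits)
print(waits)       # [20, 40, 80, 100, 100, 100, 100, 100, 100, 100]
print(total_wait)  # 840
```

Under that assumption the full ten retries would need about 14 minutes of waiting alone. If the caller, or any timeout in front of API Management, gives up earlier than that, fewer attempts would show up in Application Insights. This is a possible explanation to rule out, not a confirmed cause.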

Related

unable to get status code of oci-cli command

I need to get the response code to use in scripts.
For example, if I run a command like
oci compute instance update --instance-id ocid.of.instance --shape-config '{"OCPU":"2"}' --force
I get this message:
ServiceError:
{
    "code": "InternalError",
    "message": "Out of host capacity.",
    "opc-request-id": "3FF4337F4ECE43BBB4B8E52524E80247/37CB970D371A9C6BB01DFB23E754FE5B/18DFE9AE75B88A77AB3A1FBEBD3B191B",
    "status": 500
}
In this case, I got the error message and a status code of 500.
But if the command works, it outputs the full JSON of my instance's parameters, and I can only see a line with response code 200 in debug mode.
Is there a way to show only the response code?
Currently OCI CLI does not provide the HTTP response code directly in the response. The response would either contain the service response in case of success or a service error message in case of error.
Can you explain how you are using the HTTP response code in your script? Could you not use the command error code (non-zero on error) to determine the error case?
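As a rough sketch of the exit-code approach, a wrapper script could check the process return code first and only then try to pull "status" out of the error text. This assumes the failure output looks like the "ServiceError:" block shown in the question (a single JSON object with a "status" field). The helper names are hypothetical, and a stand-in command is used so the example runs without the CLI installed.

```python
import json
import subprocess
import sys


def extract_status(text):
    """Return the "status" field of the first {...} block in text, or None."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        return json.loads(text[start:end + 1]).get("status")
    except json.JSONDecodeError:
        return None  # output was not the JSON shape we assumed


def run_and_get_status(cmd):
    """Run a command and return (exit_code, http_status_or_None)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return 0, None  # success: the CLI printed the service response
    # ASSUMPTION: the error block may be on stderr or stdout; try both.
    status = extract_status(proc.stderr)
    if status is None:
        status = extract_status(proc.stdout)
    return proc.returncode, status


# Stand-in for a failing `oci ...` call, mimicking the error in the question.
payload = json.dumps({"code": "InternalError", "status": 500})
fake_failure = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('ServiceError:\\n' + sys.argv[1]); sys.exit(1)",
    payload,
]
exit_code, status = run_and_get_status(fake_failure)
print(exit_code, status)  # 1 500
```

In a real script the list passed to run_and_get_status would be the actual oci command and its arguments. A zero exit code is treated as success without looking for an HTTP status at all.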
The error "Out of host capacity" means the selected shape does not have any available servers in the selected region and Availability Domain (AD). Virtual Machines (VMs) are dynamically provisioned. If an AD has reached a minimum threshold, new hypervisors (physical servers) will be automatically provisioned.
There may be some occasions where the additional capacity has not finished provisioning before the existing capacity is exhausted, but when retrying in 15 minutes the customer may find the shape they want is available.
Alternatively, selecting a different shape, AD or region will almost certainly have the capacity needed.
Bare metal instances: Host capacity is ordered on a proactive basis guided by the growth rate of a region. Specialized shapes such as DenseIO do not have as much spare overhead and may be more likely to run out of capacity. Customers may need to try another AD or region.

Is there any way to retry 'BackendConnectionFailure at transfer-response' errors in Azure API Management

I am having intermittent connectivity problems with an old legacy API that sometimes cause a 'BackendConnectionFailure at transfer-response' error to be returned from Azure API Management. In my experience, retrying the request to the legacy API is usually successful. I have a retry policy similar to the one below that checks for a 500 status code; however, the retries do not seem to take place.
<retry
    condition="@(context.Response.StatusCode == 500)"
    count="10"
    interval="10"
    max-interval="100"
    delta="10"
    first-fast-retry="false">
    <forward-request buffer-request-body="true" />
</retry>
Upon further research Application Insights seems to indicate that the Backend Dependency has a call status = false, but a Result Code = 200.
Is there any way to detect this condition so that a retry takes place, or any other policies that can be used?
In your policy above, retry covers only the receipt of the response status code and headers from the backend. The response body is not proactively read by APIM; instead it is transferred directly from the backend to the client piece by piece. That is what "transfer-response" means. By that time all your policies have already completed.
One way to avoid that is to proactively buffer the response from the backend on the APIM side. Try adding this as the first thing in outbound:
<set-body>@(context.Response.Body.As<byte[]>())</set-body>
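The general idea (a retry only helps if the whole body is read while the retry is still in effect) can be illustrated outside APIM. The following is an analogy in Python, not APIM policy code: TransferError, fetch_buffered_with_retry and flaky_backend are all hypothetical names, and the "backend" is simulated so the example is runnable.

```python
class TransferError(Exception):
    """Raised when the body stream breaks after the headers were received."""


def fetch_buffered_with_retry(open_response, max_attempts=3):
    """Retry until the whole body has been read, not just the headers.

    open_response is a hypothetical callable returning (status, body_chunks);
    iterating body_chunks may raise TransferError mid-stream.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            status, chunks = open_response()
            body = b"".join(chunks)  # buffer fully while still inside the retry loop
            return status, body
        except TransferError as exc:
            last_error = exc
    raise last_error


attempts = {"n": 0}


def flaky_backend():
    """Simulated backend: headers are fine, first body transfer breaks."""
    attempts["n"] += 1

    def chunks():
        yield b"part1-"
        if attempts["n"] == 1:
            raise TransferError("connection dropped mid-body")
        yield b"part2"

    return 200, chunks()


result = fetch_buffered_with_retry(flaky_backend)
print(result, attempts["n"])  # (200, b'part1-part2') 2
```

Note that the first attempt reports status 200 and still fails, which matches the "Result Code = 200 but call status = false" symptom described in the question. If the body were streamed to the caller outside the loop, that failure could not be retried.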

Google Cloud SQL No Response

We are running a Sails.js API on Google Container Engine with a Cloud SQL database and recently we've been finding some of our endpoints have been stalling, never sending a response.
I had a health check monitoring /v1/status and it registered 100% uptime when I had the following simple response;
status: function( req, res ){
    res.ok('Welcome to the API');
}
As soon as we added a database query, the endpoint started timing out. It doesn't happen all the time, but seemingly at random intervals, sometimes for hours on end. This is what we have changed the query to;
status: function( req, res ){
    Email.findOne({ value: "someone@example.com" }).then(function( email ){
        res.ok('Welcome to the API');
    }).fail(function(err){
        res.serverError(err);
    });
}
Rather suspiciously, this all works fine in our staging and development environments, it's only when the code is deployed in production that the timeout occurs and it only occurs some of the time. The only thing that changes between staging and production is the database we are connecting to and the load on the server.
As I mentioned earlier we are using Google Cloud SQL and the Sails-MySQL adapter. We have the following error stacks from the production server;
AdapterError: Invalid connection name specified
at getConnectionObject (/app/node_modules/sails-mysql/lib/adapter.js:1182:35)
at spawnConnection (/app/node_modules/sails-mysql/lib/adapter.js:1097:7)
at Object.module.exports.adapter.find (/app/node_modules/sails-mysql/lib/adapter.js:801:16)
at module.exports.find (/app/node_modules/sails/node_modules/waterline/lib/waterline/adapter/dql.js:120:13)
at module.exports.findOne (/app/node_modules/sails/node_modules/waterline/lib/waterline/adapter/dql.js:163:10)
at _runOperation (/app/node_modules/sails/node_modules/waterline/lib/waterline/query/finders/operations.js:408:29)
at run (/app/node_modules/sails/node_modules/waterline/lib/waterline/query/finders/operations.js:69:8)
at bound.module.exports.findOne (/app/node_modules/sails/node_modules/waterline/lib/waterline/query/finders/basic.js:78:16)
at bound [as findOne] (/app/node_modules/sails/node_modules/lodash/dist/lodash.js:729:21)
at Deferred.exec (/app/node_modules/sails/node_modules/waterline/lib/waterline/query/deferred.js:501:16)
at tryCatcher (/app/node_modules/sails/node_modules/waterline/node_modules/bluebird/js/main/util.js:26:23)
at ret (eval at <anonymous> (/app/node_modules/sails/node_modules/waterline/node_modules/bluebird/js/main/promisify.js:163:12), <anonymous>:13:39)
at Deferred.toPromise (/app/node_modules/sails/node_modules/waterline/lib/waterline/query/deferred.js:510:61)
at Deferred.then (/app/node_modules/sails/node_modules/waterline/lib/waterline/query/deferred.js:521:15)
at Strategy._verify (/app/api/services/passport.js:31:7)
at Strategy.authenticate (/app/node_modules/passport-local/lib/strategy.js:90:12)
at attempt (/app/node_modules/passport/lib/middleware/authenticate.js:341:16)
at authenticate (/app/node_modules/passport/lib/middleware/authenticate.js:342:7)
at Object.AuthController.login (/app/api/controllers/AuthController.js:119:5)
at bound (/app/node_modules/sails/node_modules/lodash/dist/lodash.js:729:21)
at routeTargetFnWrapper (/app/node_modules/sails/lib/router/bind.js:179:5)
at callbacks (/app/node_modules/sails/node_modules/express/lib/router/index.js:164:37)
Error (E_UNKNOWN) :: Encountered an unexpected error :
Could not connect to MySQL: Error: Pool is closed.
at afterwards (/app/node_modules/sails-mysql/lib/connections/spawn.js:72:13)
at /app/node_modules/sails-mysql/lib/connections/spawn.js:40:7
at process._tickDomainCallback (node.js:381:11)
Looking at the errors alone, I'd be tempted to say that we have something misconfigured. But the fact that it works some of the time (and has previously been working fine!) leads me to believe that there's some other black magic at work here. Our Cloud SQL instance is D0 (though we've tried upping the size to D4) and our activation policy is "Always On".
EDIT: I had seen others complain about Google Cloud SQL (e.g. this SO post) and I was suspicious, but we have since moved our database to Amazon RDS and we are still seeing the same issues, so it must be a problem with Sails and the MySQL adapter.
This issue is leading to hours of downtime a day and we need it resolved. Any help is much appreciated!
This appears to be a sails issue, and not necessarily related to Cloud SQL.
Is there any way the QPS limit for Google Cloud SQL is being reached? See here: https://cloud.google.com/sql/faq#sizeqps
Why is my database instance sometimes slow to respond?
In order to minimize the amount you are charged for instances on per use billing plans, by default your instance becomes passive if it is not accessed for 15 minutes. The next time it is accessed there will be a short delay while it is activated. You can change this behavior by configuring the activation policy of the instance. For an example, see Editing an Instance Using the Cloud SDK.
It might be related to your policy setting. If you set it to ON_DEMAND, the instance will sleep to save your budget so that the first query to activate the instance is slow. This might cause the timeout.
https://cloud.google.com/sql/faq?hl=en

HttpError: <HttpError 502 when requesting https://www.googleapis.com/oauth2/v2/userinfo?alt=json returned "Bad Gateway">

We're seeing issues with our test and production applications starting at approximately 6:30 PM PST on 4/30/2013, and requests are failing with the following error.
HttpError: <HttpError 502 when requesting https://www.googleapis.com/oauth2/v2/userinfo?alt=json returned "Bad Gateway">
The API console seems to be having issues as well, the Drive API url below only loads up partially.
https://developers.google.com/apis-explorer/#p/drive/v2/
Indeed, the error was due to a temporary outage.
I would like to add that, for future issues like this (5XX errors), it might be useful/informative to keep an eye on Google's Apps Status Dashboard.

multiple calls to WCF service method in a loop (using the same proxy object) causing timeout

I am calling a WCF service method repeatedly in a loop (with different params on each iteration) and it causes a timeout after around 40 minutes. I am using the same proxy object and closing it only once the loop has completed, as shown below. How can I avoid this timeout error? Do I need to instantiate a new proxy for each call? (I am actually calling a SQL Server Reporting Services web service here, passing different params to generate different reports, and I am not using a new proxy for each iteration because I thought that could slow down report generation.) The client here is also a WCF service, and it is hosted in a Windows service.
(this is just an example for illustration, not the actual code that is failing)
using (var proxy = new serviceclient())
{
    for (var i = 0; i < 50; i++)
    {
        proxy.methodName(i);
    }
}
The error message is something like this
System.TimeoutException: The request channel timed out while waiting for a reply after 00:01:00. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout.
---> System.TimeoutException: The HTTP request to 'http://localhost/ReportServer/ReportExecution2005.asmx' has exceeded the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout.
---> System.Net.WebException: The operation has timed out
Here is the client WCF config (only the part related to Reporting Services, not the entire WCF config):
<bindings>
    <basicHttpBinding>
        <binding name="ReportExecutionServiceSoap" closeTimeout="00:01:00"
            openTimeout="00:01:00" receiveTimeout="00:10:00" sendTimeout="00:01:00"
            allowCookies="false" bypassProxyOnLocal="false" hostNameComparisonMode="StrongWildcard"
            maxBufferSize="2147483647" maxBufferPoolSize="2147483647" maxReceivedMessageSize="2147483647"
            messageEncoding="Text" textEncoding="utf-8" transferMode="Buffered"
            useDefaultWebProxy="true">
            <readerQuotas maxDepth="32" maxStringContentLength="2147483647" maxArrayLength="2147483647"
                maxBytesPerRead="2147483647" maxNameTableCharCount="2147483647" />
            <security mode="TransportCredentialOnly">
                <transport clientCredentialType="Ntlm" proxyCredentialType="None"
                    realm="" />
                <message clientCredentialType="UserName" algorithmSuite="Default" />
            </security>
        </binding>
    </basicHttpBinding>
</bindings>
<client>
    <endpoint address="http://localhost/ReportServer/ReportExecution2005.asmx"
        binding="basicHttpBinding" bindingConfiguration="ReportExecutionServiceSoap"
        contract="ReportExecutionServiceReference.ReportExecutionServiceSoap"
        name="ReportExecutionServiceSoap" />
</client>
This issue is resolved now. One of the reports (generated by calling the report server ASMX web service) was taking longer than usual and causing the timeout; it was NOT due to the number of calls in the loop (each web service call is synchronous and not queued up). To resolve this, I used the standard ASP.NET web service API instead of WCF to call the report execution web service and set the timeout to infinite, like this:
var webServiceProxy = new ReportExecutionServiceReference.ReportExecutionService()
{
    Url = ConfigurationManager.AppSettings["ReportExecutionServiceUrl"],
    Credentials = System.Net.CredentialCache.DefaultCredentials
};
webServiceProxy.Timeout = Timeout.Infinite;
The timeout could also have been set to a larger finite value instead of infinite. This web service is called in a loop for each report, and it takes about two hours to generate all the user-selected reports in one go. The client is a WCF service hosted in a Windows service instead of IIS to avoid a timeout on the client. Thanks for all the replies.
If you are making asynchronous calls from the client and the server is not a web farm, all the calls will be queued on the server, and that could make calls time out. Your code doesn't really show which is the case.
Let's say you are going through a list of 10 items and each response takes 10 seconds to process on the server. Since you are using the same proxy, all calls will be dispatched quickly from the client code, but it will take around 100 seconds for all the answers to return (note that I am not taking network latency, object serialization, etc. into consideration).
With a 60-second timeout, that means all calls after number 6 will time out.
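The arithmetic behind that claim can be checked with a few lines. This is only a sketch of the scenario described in this answer: it assumes all calls are dispatched at t=0, the server processes them strictly one at a time, and the timeout is the 60 seconds from the sendTimeout in the question's config. The function name is made up for illustration.

```python
def timed_out_calls(n_calls, seconds_per_call, timeout_seconds):
    """Return 1-based indices of calls whose reply arrives after the timeout.

    Assumes all calls start waiting at t=0 and are served sequentially,
    so call k completes at k * seconds_per_call.
    """
    return [k for k in range(1, n_calls + 1)
            if k * seconds_per_call > timeout_seconds]


late = timed_out_calls(n_calls=10, seconds_per_call=10, timeout_seconds=60)
print(late)  # [7, 8, 9, 10]
```

Call 6 finishes right at the 60-second mark, so in practice it would be borderline once latency is added.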
If the server had more threads available to process data this could be avoided, but the problem could pop up somewhere else. You should be able to retry the same call, since a timeout could occur for other reasons as well: network problems, temporary server overload, etc.
I would suggest building some sort of queuing system that dispatches all the server calls, so that you can make the same call again. How that would be implemented depends on your scenario:
Do they need to be sent in a specific order?
Do you need to know when the last call has returned?
etc.
This simply means your server can't deal with the load, or is taking too long for some other reason. Ask the server why it's taking too long; don't be surprised when the client times out.
' Set the send timeout on the client binding before making calls (2 seconds in this example)
Dim ws As WCFService.ServiceClient = New WCFService.ServiceClient()
ws.Endpoint.Binding.SendTimeout = TimeSpan.FromSeconds(2)