What happens to new events when one is retrying in the same partition in Azure Event Hubs? - partitioning

I'm trying to understand how partitions process events when a retry policy is in place for the Event Hub, and I can't find an answer to what happens to new events when one event gets an error and is being retried in the same partition.
I'm guessing that the event that got an error shouldn't block new ones from executing, and that when it retries it should be put at the end of the partition, so any other events that arrived in the partition after that event failed should be executed in order without any blockage.
Can someone explain what is actually happening in a scenario like that?
Thanks.

It's difficult to answer precisely without some understanding of the application context. The answer below assumes the current generation of the Azure SDK for .NET, though conceptually it will be similar for others.
Retries during publishing are performed within the client, which treats each publishing operation as independent and isolated. When your application calls SendAsync, the client will attempt to publish the events in that call and will apply its retry policy within the scope of that call. When the SendAsync call completes, you'll have a deterministic answer of whether the call succeeded or failed.
If the SendAsync call throws, the retry policy has already been applied and either the exception was fatal or all retries were exhausted. The operation is complete and the client is no longer trying to publish those events.
If your application makes a single SendAsync call then, in the majority of cases, it will know the outcome of the publishing operation and the order of events is preserved. If your application is calling SendAsync concurrently, then it is possible that events will arrive out of order - either due to network latency or retries.
While the majority of the time, the outcome of a call is fully deterministic, some corner cases do exist. For example, if the SendAsync call encounters a timeout, it is ambiguous whether or not the service received the events. The client will retry, which may produce duplicates. If your application sees a TimeoutException surface, then it cannot be sure whether or not the events were successfully published.
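To make that concrete, here is a minimal sketch using the Azure.Messaging.EventHubs producer from the current .NET SDK; the connection string and hub name are placeholders and the surrounding class is purely illustrative:
using System;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;
public static class Publisher
{
    // "<connection-string>" and "<hub-name>" are placeholders.
    public static async Task PublishAsync(string body)
    {
        await using var producer = new EventHubProducerClient("<connection-string>", "<hub-name>");
        using EventDataBatch batch = await producer.CreateBatchAsync();
        batch.TryAdd(new EventData(BinaryData.FromString(body)));
        try
        {
            // The client applies its retry policy inside this call; when it
            // returns without throwing, the batch was accepted by the service.
            await producer.SendAsync(batch);
        }
        catch (TimeoutException)
        {
            // Ambiguous outcome: the service may or may not have received the
            // events. Retrying here can produce duplicates.
        }
        catch (EventHubsException ex) when (!ex.IsTransient)
        {
            // The exception was fatal or retries were exhausted; the client is
            // no longer trying to publish these events.
        }
    }
}
Awaiting each such call before sending the next one is what preserves ordering; firing several SendAsync calls concurrently gives up that guarantee, as noted above.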

Related

Sawtooth - Remove Pending Transactions

We have often run into problems with custom TransactionProcessors: when the TP crashes or is unable to connect to the Sawtooth nodes we get a QUEUE_FULL error, and from there on all transactions go into the PENDING state, including intkey / settings.
Is there a way to remove PENDING transactions and clean up the queue, or any CLI that can clean up the batches / transactions that are in the queue?
The Hyperledger Sawtooth validator attempts to execute transactions in the order they arrive, when there is a call from the consensus engine. The question is discussing two distinct features; happy to help further.
Feature 1: The solution for a Transaction Processor crash. The Transaction Processor is expected to execute a transaction from the queue when the consensus engine asks the validator to build a block. If for some reason the Transaction Processor is unable to process the message, the result is still unknown to the validator, so the validator keeps the transaction in the pending state for as long as it can be scheduled for execution. The right way to de-queue it is by executing it: either put it in a block if it's valid or remove it from the queue if it is invalid.
Solution: Check why the Transaction Processor is crashing. This is code you own. The validator expects one of the following responses: the transaction is valid, the transaction is invalid, or the transaction couldn't be evaluated and needs a retry.
Feature 2: Removing pending batches from the queue deliberately, without telling Hyperledger Sawtooth about it. The pending queue is in memory; it is not saved on disk. The drastic solution is therefore to restart that particular validator node instance.
Note: This may not be possible in certain cases because of the deployment model chosen. Ensure your network and deployment can handle node restart scenarios before doing it. There could be bad consequences if the TP crashed on only one of the nodes instead of all of them: that validator could send the wrong result when reaching consensus, and depending on the consensus algorithm and the network size this error may be handled differently. The cleaner solution, however, is to restart the Transaction Processor.
Hope this answer helps! Happy blockchaining..

Axon4 - Standard exception handling mechanism

What is the standard exception handling mechanism for Axon 4?
Additionally, how do you ensure that if an exception occurs during one of the events in a Saga (an in-between state), the previous states get rolled back?
Any example would help.
This is quite a broad question you're asking here Prashant, with a couple of answers to it. Additionally, you're asking two questions: one, what the exception handling approach is, and two, how to deal with exceptions in Sagas.
So, as you're talking about Sagas and events, I am going to take the stance that you want to know the exception handling process around events. Let me first answer your first question here.
As you might have read in the Reference Guide, Axon uses what's called an EventProcessor as the technical mechanism to get events to your 'event handling components' and Saga instances.
The EventProcessors allow two different levels of exception handling:
1. When an exception occurs in the @EventHandler/@SagaEventHandler annotated function, this can be caught in the ListenerInvocationErrorHandler.
2. When an exception occurs in the EventProcessor, this can be caught in the ErrorHandler.
Both of these can be set in Axon 4 by dealing with the EventProcessingConfigurer and calling their respective register functions, which allow you to set a global default or adjust these per Event Processor.
The defaults are respectively the LoggingErrorHandler and the PropagatingErrorHandler.
The second question you're asking is about how to roll back state in a Saga when an exception occurs. The suggestion I'd like to give you here is that, upon receiving an event, you change the state first and only after that perform other operations.
Other operations like calling a third party service or publishing a command.
These operations might, as you've noticed, fail with an exception. This should, however, not roll back the state of the Saga at all.
The event already happened. The fact that the operation after that failed does not change the fact of that event having occurred.
Thus what I do suggest is that you perform a compensating action if such an exception occurs.
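To illustrate the idea (Axon itself is Java, so this is only a framework-agnostic sketch in C#, the language used elsewhere on this page; every type and member name below is hypothetical): update the Saga's own state first, perform the side effect afterwards, and dispatch a compensating action if the side effect fails rather than rolling the state back.
using System;
using System.Threading.Tasks;
// Hypothetical types, only to illustrate the "state first, compensate on failure" pattern.
public record OrderPlacedEvent(Guid OrderId, decimal Amount);
public record CancelOrderCommand(Guid OrderId);
public interface IPaymentService { Task ChargeAsync(Guid orderId, decimal amount); }
public interface ICommandDispatcher { Task SendAsync(object command); }
public class OrderSaga
{
    private readonly IPaymentService _payments;
    private readonly ICommandDispatcher _commands;
    private bool _paymentRequested; // the Saga's own state
    public OrderSaga(IPaymentService payments, ICommandDispatcher commands)
    {
        _payments = payments;
        _commands = commands;
    }
    public async Task On(OrderPlacedEvent evt)
    {
        // 1. Change the Saga's own state first; the event has already happened.
        _paymentRequested = true;
        try
        {
            // 2. Only then perform the external operation.
            await _payments.ChargeAsync(evt.OrderId, evt.Amount);
        }
        catch (Exception)
        {
            // 3. Don't roll the Saga state back; dispatch a compensating
            //    action for the operation that failed instead.
            await _commands.SendAsync(new CancelOrderCommand(evt.OrderId));
        }
    }
}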

Put message works in spite of catching MQException with MQ Code 2009

I have a strange issue which is causing a serious double-booking problem for us.
We have MQ.NET code written in C# running on a Windows box that has MQ Client v7.5. The code is placing messages on the MQ queue. Once in a while the "put" operation works and the message is placed on the queue, but an MQException is still thrown with error code 2009.
In this case, the program assumes that the put operation failed and places the same message on the queue again, which is not a desirable scenario. The assumption is that if the "put" resulted in an MQException, the operation has failed. Any idea how to prevent this issue from happening? See the client code below.
queue = queueManager.AccessQueue(queueName, MQC.MQOO_OUTPUT + MQC.MQOO_FAIL_IF_QUIESCING);
queueMessage = new MQMessage();
queueMessage.CharacterSet = 1208; // UTF-8
byte[] utf8String = Encoding.UTF8.GetBytes(strInputMsg);
queueMessage.WriteBytes(Encoding.UTF8.GetString(utf8String));
queuePutMessageOptions = new MQPutMessageOptions();
queue.Put(queueMessage, queuePutMessageOptions);
Exception:
MQ Reason code: 2009, Exception: Error in the application.
StackTrace: at IBM.WMQ.MQBase.throwNewMQException()
at IBM.WMQ.MQDestination.Open(MQObjectDescriptor od)
at IBM.WMQ.MQQueue..ctor(MQQueueManager qMgr, String queueName, Int32 openOptions, String queueManagerName, String dynamicQueueName, String alternateUserId)
at IBM.WMQ.MQQueueManager.AccessQueue(String queueName, Int32 openOptions, String queueManagerName, String dynamicQueueName, String alternateUserId)
at IBM.WMQ.MQQueueManager.AccessQueue(String queueName, Int32 openOptions)
There is always an ambiguity of outcomes when using any async messaging over the network. Consider the following steps in the API call:
1. The client sends the API call to the server.
2. The server executes the API call.
3. The result is returned to the client.
Let's say the connection is lost prior to or during #1 above. The application gets the 2009 and the message is never sent.
But what if the connection is lost after #1? The outcome of #2 cannot possibly be returned to the calling application. Whether the PUT succeeded or failed, the application always gets back a 2009. Maybe the message was sent and maybe it wasn't. The application probably should take the conservative option, assume it wasn't sent, and resend it. This can result in duplicate messages.
Worse is when the application is getting a message: if the channel agent successfully gets the message but can't return it to the client, that message is irretrievably lost. Since the application didn't specify syncpoint, it wasn't MQ that lost the message but rather the application.
This is intrinsic to all types of async messaging. So much so that the JMS 1.1 specification specifically addresses it in section 4.4.13, "Duplicate Production of Messages", which states that:
If a failure occurs between the time a client commits its work on a Session and the commit method returns, the client cannot determine if the transaction was committed or rolled back. The same ambiguity exists when a failure occurs between the non-transactional send of a PERSISTENT message and the return from the sending method.
It is up to a JMS application to deal with this ambiguity. In some cases, this may cause a client to produce functionally duplicate messages.
A message that is redelivered due to session recovery is not considered a duplicate message.
This can be addressed in part by using syncpoint. Any PUT or GET under syncpoint will be rolled back if the call fails. The application can safely assume that it needs to PUT or GET the message again and no dupes or lost messages will result.
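As an illustration, here is a hedged sketch of a put under syncpoint using the same MQ .NET classes as the code in the question (queue, queueManager and strInputMsg are the variables from that snippet):
// Put under syncpoint so that a failed or ambiguous call can be backed out
// and the whole unit of work safely retried.
var pmo = new MQPutMessageOptions();
pmo.Options = MQC.MQPMO_SYNCPOINT;
var message = new MQMessage();
message.CharacterSet = 1208;
message.WriteString(strInputMsg);
try
{
    queue.Put(message, pmo);
    queueManager.Commit();   // the message only becomes visible here
}
catch (MQException)
{
    queueManager.Backout();  // undo the uncommitted put, then retry the unit of work
}
If the connection itself is gone, the queue manager rolls back the uncommitted unit of work once it notices, so nothing is left half-done.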
However, there is still the possibility that 2009 will be returned on the COMMIT. At this point you do not know whether the transaction completed or not. If it is 2-phase commit (XA) the transaction manager will reconcile the outcome correctly. But if it is 1-Phase commit, then you are back to not knowing whether the call succeeded or failed.
In the case that the app got a message under syncpoint, that message will at least have either been processed or rolled back. This completely eliminates the possibility of losing persistent messages due to ambiguous outcomes. However, if the app received a message and gets 2009 on the COMMIT, then it may receive the same message again, depending on whether the connection failure occurred in #1 or #3 of the steps above. Similarly, a 2009 when committing a PUT can only be dealt with by retrying the PUT. This also potentially results in dupe messages.
So, short of using XA, any async messaging faces the possibility of duplicate messages due to connection exception and recovery. TCP/IP has become so reliable since MQ was invented that most applications ignore this architectural constraint without detrimental effects. Although that increased reliability in the network makes it less risky to design apps that don't gracefully handle dupes, it doesn't actually address the underlying architectural constraint. That can only be done in code, XA being one example of that. Many apps are written to gracefully handle dupe messages and do not need XA to address this problem.
Note: Paul Clarke (the guy who wrote much of the MQ channel code) is quick to point out that the ambiguity exists even when using bindings mode connections. In 20 years of using WMQ I have yet to see a 2009 on a bindings mode connection, but he says the shorter path to the QMgr doesn't eliminate the underlying architectural constraint any more than the reliable network does.

BizTalk exception - self-healing orchestration

We have a main orchestration that has multiple sub-orchestrations. The root orchestration has transaction type None, hence all the sub-orchestrations are of the same nature. Any exception is caught in a parent scope of the main orchestration, where we have some steps like logging. The orchestration is activated by a message from the application's SQL database. So every time an exception occurs, say due to something intermittent like being unable to connect to a web service, we later go and manually re-trigger it.
I'm looking at modifying the orchestration to be self-healing: from the exception catch block, reinitialize the messages based on conditions that indicate the issue was intermittent. For something like an application issue, e.g. a null reference, we would not want to resend the message, because the orchestration is never going to work.
There is a concept called compensation, but that is for transaction-based orchestrations: do n steps, and if any one fails, do m other steps (which take an alternate action or clean up).
The only idea I have is to do a look-up based on keywords in the exception and decide whether to resend messages. But I want someone to challenge this or suggest a better approach.
I have always thought that it's better to handle failures offline. So if the orchestration fails, terminate it. But before you terminate, send a message out. This message will contain all the information necessary to recover the message processing if it turns out that there was a temporary problem which caused the failure. The message can be consumed by a "caretaker" process which is responsible for recovery.
This is similar to how the Erlang OTP framework approaches high availability. Processes fail quickly and caretaker processes make sure recovery happens.
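One way to sketch the keyword-lookup idea from the question in plain C# (this is not a BizTalk API; the class and keyword list are purely illustrative) is a small classifier that the catch block, or the caretaker process, could use to decide whether a failure looks transient and is worth resubmitting:
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;
public static class FailureClassifier
{
    // Illustrative keyword list; tune it to the exceptions you actually see.
    private static readonly string[] TransientKeywords =
    {
        "timeout", "timed out", "unavailable", "connection was closed", "could not connect"
    };
    public static bool IsTransient(Exception ex)
    {
        // Well-known transient exception types first...
        if (ex is TimeoutException || ex is WebException || ex is SocketException)
            return true;
        // ...then fall back to the keyword lookup suggested in the question.
        var text = ex.ToString().ToLowerInvariant();
        return TransientKeywords.Any(k => text.Contains(k));
    }
}
A NullReferenceException falls through both checks, so that message would be terminated (or handed to the caretaker) rather than resubmitted.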

Implementing rollback transaction in WP7

How can I implement a rollback transaction in WP7? Presently my issue is that after an insertion or deletion I call SubmitChanges, and if the app is tombstoned at that moment it exits. How can I handle this situation? I am planning to use try/catch, and if any exception is caught I need to roll back the changes. Can anyone please help me implement this in WP7?
Why do you need to roll back when the application becomes tombstoned? Technically your application is not aware of when it is tombstoned; you are only aware of when it becomes de-activated. See the following lifecycle diagram:
(The image above is from the blog post http://www.scottlogic.co.uk/blog/colin/2011/10/a-windows-phone-7-1-mango-mvvm-tombstoning-example/ which describes the lifecycle in detail)
Whenever your application is de-activated, you can handle the Deactivated event. From MSDN:
Applications are given 10 seconds to complete the Deactivated handler
This gives you the opportunity to clean up, save state and perform other activities before your application becomes de-activated.
I presume you are committing your transaction when your application state changes? Does the commit run on the UI thread? i.e. is it blocking? If so, you do not need to do anything else (other than ensure it does not take more than 10 seconds). If your commit is running on a background thread, you will have to ensure that your Deactivated event handler blocks until the commit is complete.
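For example, a hedged sketch of the handler in App.xaml.cs (App.Database is an illustrative, app-wide System.Data.Linq DataContext that still holds the pending changes; you also need using System.Data.Linq;):
// Runs on the UI thread and must complete within roughly 10 seconds.
private void Application_Deactivated(object sender, DeactivatedEventArgs e)
{
    try
    {
        // SubmitChanges is synchronous, so this blocks until the pending
        // inserts/deletions are committed to the local database.
        App.Database.SubmitChanges();
    }
    catch (ChangeConflictException)
    {
        // The commit did not go through; the pending changes are still tracked
        // by the DataContext, so resolve or discard them here rather than
        // leaving the database half-updated.
    }
}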