From the 0.8 documentation, under producer config: the request.required.acks property controls when the producer receives an acknowledgement from the broker.
Typical values are:
(1) 0, which means that the producer never waits for an acknowledgement from the broker
(2) 1, which means that the producer gets an acknowledgement after the leader replica has received the data
(3) -1, which means that the producer gets an acknowledgement after all in-sync replicas have received the data
How do I receive this acknowledgement in the producer when request.required.acks is 1? Since producer.send(KeyedMessage) returns void, I couldn't find any way to retrieve it.
The API for the producer send leaves much to be desired, particularly in the async mode. Those acks are hidden from the user of the producer object. If they fail, you will eventually see an exception.
The problem in the async case is that you will not know where the failed batch began, so some guessing will be involved if you want to retry the sends later.
It seems that there are plans to improve this in future releases (> 0.8.0).
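To make the three quoted acks settings concrete, here is a toy Python model of how many replicas must hold a message before a synchronous send returns to the caller. It is not Kafka client code: the names RequiredAcks, confirmations_to_wait_for and send_returns are made up for illustration, and timeouts and failures are ignored.

```python
from enum import IntEnum


class RequiredAcks(IntEnum):
    """The three request.required.acks values quoted from the 0.8 docs."""
    NO_WAIT = 0    # producer never waits for an acknowledgement
    LEADER = 1     # acknowledged once the leader replica has the data
    ALL_ISR = -1   # acknowledged once all in-sync replicas have the data


def confirmations_to_wait_for(acks: int, isr_size: int) -> int:
    """How many replicas must hold the data before the send returns."""
    if acks == RequiredAcks.NO_WAIT:
        return 0
    if acks == RequiredAcks.LEADER:
        return 1
    return isr_size  # ALL_ISR: the leader plus every in-sync follower


def send_returns(acks: int, replicas_with_data: int, isr_size: int) -> bool:
    """True once a synchronous send with this setting would return."""
    return replicas_with_data >= confirmations_to_wait_for(acks, isr_size)


# A partition with 3 in-sync replicas where only the leader has the data so far.
for setting in RequiredAcks:
    print(setting.name, send_returns(setting, replicas_with_data=1, isr_size=3))
```

With acks=0 the send "returns" immediately precisely because nothing is awaited, which is also why there is no acknowledgement to inspect in that mode.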
Related
I have implemented a Kafka Streams application. Let's say one of the fields of the object the stream is currently processing contains a number instead of a string value. Currently, when an exception is thrown in the processing logic, e.g. in the .transform() method, the whole stream is killed and my application stops processing data.
I would like to skip such an invalid record and keep processing the next records available on the input topic. Additionally, I don't want to implement any try-catch statements in my stream processing code.
To achieve this, I implemented a StreamsUncaughtExceptionHandler that returns the StreamThreadExceptionResponse.REPLACE_THREAD enum, in order to spawn a new thread and keep processing the next records waiting on the input topic. However, it turned out that the stream's consumer offset is not committed, so when a new thread is started it picks up the same record that just killed the previous stream thread. Since the logic is the same, the new thread also fails on that record, resulting in a loop that spawns a new thread and fails on the same record every time.
Is there a clean way to skip the failing record and keep the stream processing the next records?
Please note, I am not asking about DeserializationExceptionHandler or ProductionExceptionHandler.
When it comes to the application-level code, it is mostly up to the application how the exception is handled. This use case has come up before. See these previous Stack Overflow threads.
Example on handling processing exception in Spring Cloud Streams with Kafka Streams Binder and the functional style processor
How to stop sending to kafka topic when control goes to catch block Functional kafka spring
Try to see if those answers can be applied to your scenario.
You can filter out events that don't match a pattern, or validate the events before you transform them.
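The validate-then-filter idea is language-agnostic; this Python sketch (not Kafka Streams code) shows its shape: a predicate rejects records whose field has the wrong type, so the transformation never sees them and needs no try-catch. In the Java DSL the corresponding step would be a KStream filter(...) call placed before .transform(...). The field name "name" and the validation rule are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]


def is_valid(record: Record) -> bool:
    """Hypothetical rule: the 'name' field must be a string, not a number."""
    return isinstance(record.get("name"), str)


def transform(record: Record) -> Record:
    """Stand-in for the real processing; raises AttributeError on a numeric 'name'."""
    return {**record, "name": record["name"].upper()}


def process(records: Iterable[Record]) -> Iterator[Record]:
    """Validate first, then transform only the records that passed."""
    return (transform(r) for r in records if is_valid(r))


incoming = [{"name": "alice"}, {"name": 42}, {"name": "bob"}]
print(list(process(incoming)))  # [{'name': 'ALICE'}, {'name': 'BOB'}]
```

Because the invalid record is consumed and discarded rather than throwing, its offset should advance like any other record's, so the replace-thread loop described in the question does not arise.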
I'm working with version 0.27.0 of the Context Broker. I'm using the Cygnus generic enabler, and I have set up an MQTT agent that connects external devices to the Context Broker.
My main concern right now is how to prevent data loss. I set up the Context Broker and Cygnus MongoDB databases as replica sets, but that won't ensure that all data will be persisted into the databases. I have seen that Cygnus uses Apache Flume. Looking at its configuration, the re-injection retries can be configured:
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = -1
Is it a good idea to set the retries value to -1? I have read about events being re-injected into the channel forever.
What can be done to ensure that all the data will be persisted?
Is there any functionality in the FIWARE ecosystem aimed at that purpose?
Regarding Cygnus, the TTL is indeed the way of controlling persistence retries after an error. A retry means the data is reinjected into the internal channel that connects the source (which receives Orion notifications) and the sink (which persists the data in the final storage), for future persistence attempts.
Possible values for this TTL are:
TTL = 0: there are no retries, i.e. if notified data cannot be persisted in the final storage on the first attempt (because of a network failure, a storage error, or whatever), then the data is dropped.
TTL > 0: there are as many retries as the configured TTL. Once the TTL is exhausted, the data is dropped.
TTL = -1: infinite retries, i.e. the data is reinjected into the channel forever, until it is persisted or the channel gets full.
As mentioned, a TTL of -1 may consume the channel capacity if the final storage never recovers, preventing newly received data from being put into the channel. Nevertheless, if the final storage never recovers, such a drawback does not matter, right? :)
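To see how the TTL trades dropped data for a filling channel, here is a toy Python simulation. It is not Cygnus or Flume code: the function names, the one-new-event and one-sink-attempt per tick schedule, and the per-event retry counter are simplifying assumptions.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def simulate(ttl: int, capacity: int, ticks: int,
             sink_up: Callable[[int], bool]) -> Tuple[List[str], List[str], List[str]]:
    """Toy model of TTL-based re-injection into a bounded channel.

    Each tick one new event arrives, then the sink makes one persistence
    attempt on the oldest event in the channel.
    Returns (persisted, dropped_after_ttl, rejected_because_channel_full).
    """
    channel: Deque[Tuple[str, int]] = deque()  # (event id, retries left)
    persisted: List[str] = []
    dropped: List[str] = []
    rejected: List[str] = []
    for tick in range(ticks):
        event = f"e{tick}"
        if len(channel) < capacity:
            channel.append((event, ttl))
        else:
            rejected.append(event)  # channel full: new data cannot get in
        if not channel:
            continue
        event_id, retries_left = channel.popleft()
        if sink_up(tick):
            persisted.append(event_id)
        elif retries_left == -1:
            channel.append((event_id, -1))  # TTL = -1: re-inject forever
        elif retries_left > 0:
            channel.append((event_id, retries_left - 1))  # one retry used up
        else:
            dropped.append(event_id)  # TTL exhausted (or TTL = 0)
    return persisted, dropped, rejected


def storage_down(tick: int) -> bool:
    return False  # the final storage never comes back in this run


print(simulate(ttl=0, capacity=3, ticks=5, sink_up=storage_down))
print(simulate(ttl=-1, capacity=3, ticks=5, sink_up=storage_down))
```

With TTL 0 every event is dropped but the channel stays empty; with TTL -1 nothing is dropped, yet once the channel holds three events the newly arriving ones are rejected.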
Thus, we could say the rules for choosing a TTL are:
If you don't want retries, simply configure 0.
If you want retries but you don't mind losing data after a certain number of retries, then configure a positive value.
If you want retries but you don't want to lose data, then configure -1 and a large channel capacity, since the final storage may be down for an unknown time.
In any case, the TTL feature is changing during this sprint. The behaviour will be the same, but instead of being applied to single events, it will be applied to batches of events (a batch may contain a single event, of course). You'll see this change in the next release of Cygnus (0.13.0), which will be available at the end of February 2016 (at the time of writing, next week :)). My recommendation is to wait for that release if you want to use the TTL feature intensively.
What is the difference between a message channel and a message queue itself?
They're different things. The queue actually holds messages which will be processed (pushed to the listener) in FIFO manner.
A channel is a medium through which messages are transmitted.
What does that mean exactly? The book "Enterprise Integration Patterns" says:
Connect the applications using a Message Channel, where one application writes information to the channel and the other one reads that information from the channel.
Does this mean that the message channel actually abstracts the queue away from the producer and consumer of the message? But it really doesn't, right? When a producer has to place a message into a queue, it actually specifies the names of the queue manager and queue it wants to connect to.
There's also the concept of different protocols and different data formats in channels, where you might have a separate channel for each protocol you're using, and maybe a separate channel for each data format (XML, JSON, etc.).
This would allow different queues to pick up from different channels. But why not directly use different queues for different data formats? What exactly is the role of the channel? Is it just a connection?
I'm completely new to MQ. I've just been assigned to a project which involves producing and consuming messages, and I'm trying to wrap my mind around this.
To expand a bit on Shashi's answer, please keep in mind that the EIP book referenced talks about high level messaging patterns. In that context the authors needed a generic term for the medium by which messages are transferred between two points and chose the word "channel".
For the purposes of the book, a channel connects any two endpoints that move messages, for any message transport vendor. In this sense a channel has attributes that are classes of service and that support the various patterns. It may be 1-1, 1-many, many-1, many-many, etc.
So for example if it is ZeroMQ, the endpoints are two peer-to-peer nodes and there's no messaging engine between them. For IBM MQ one endpoint is always the queue manager (a type of messaging engine) and the other is an application or another queue manager.
Based on this example, it should be obvious that channel as used in the book and channel as defined by any messaging transport are at different levels of abstraction. As used by MQ, a channel is a specific set of configuration parameters that define a communication path and includes such attributes as CONNAME, MAXMSGL, tuning parameters, SSL parameters, etc. Once an MQ channel is successfully started, you can see a running instance of it by displaying the channel status. In the case of CLUSRCVR, SVRCONN, and (less commonly) RCVR or RQSTR channels, you may see multiple instances of the same channel active simultaneously.
If you are still with me, you may have noticed that the MQ usage of the term channel always describes one or more point-to-point network connections whereas the EIP book's usage of the term channel is roughly translated as "the thing that moves messages between application endpoints." Consider that two applications connected directly to the queue manager using shared memory would be using a channel as defined in EIP but not as the term is defined by IBM MQ.
Based on that example, it should be clear that the EIP version of the term channel includes the queue manager and any MQ connections between the queue manager and application endpoints.
To sum up:
The EIP book's channel is all messaging infrastructure that isn't one of the application endpoints, and in an MQ context it includes the queue manager and any MQ channels.
The IBM MQ channel is a specific configuration defining network connectivity between the queue manager and another queue manager or a client application.
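To picture the EIP sense of the word, here is a minimal Python sketch in which producer and consumer code only against a send/receive interface; whatever actually carries the messages sits behind it (an in-memory queue here, a queue manager plus MQ channels in a real deployment). The class and function names are illustrative and not part of any MQ API.

```python
import queue
from abc import ABC, abstractmethod
from typing import Optional


class MessageChannel(ABC):
    """EIP-style channel: endpoints see send/receive, not the transport behind it."""

    @abstractmethod
    def send(self, message: bytes) -> None: ...

    @abstractmethod
    def receive(self, timeout: Optional[float] = None) -> bytes: ...


class InMemoryQueueChannel(MessageChannel):
    """One possible implementation, backed by an in-process FIFO queue."""

    def __init__(self) -> None:
        self._q: "queue.Queue[bytes]" = queue.Queue()

    def send(self, message: bytes) -> None:
        self._q.put(message)

    def receive(self, timeout: Optional[float] = None) -> bytes:
        return self._q.get(timeout=timeout)


def producer(channel: MessageChannel) -> None:
    channel.send(b'{"order": 1}')  # no queue manager or queue name in sight


def consumer(channel: MessageChannel) -> bytes:
    return channel.receive(timeout=1.0)


ch = InMemoryQueueChannel()
producer(ch)
print(consumer(ch))
```

Swapping in an implementation that talks to a queue manager would change only the class behind the interface, which is the level of abstraction the EIP book is working at; at the MQ level, the queue manager and queue names still have to be configured somewhere.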
I hope this clarifies the terminology rather than confusing things further. I will update based on any comments if needed.
A message queue stores messages sent by producers so that they can be delivered to consumers.
A channel is the medium or communication link for transmitting messages from producer to queue, from queue to consumer, or from a queue in one queue manager to a queue in another queue manager.
There are two types of channels:
1) A Message channel is a unidirectional communications link between two queue managers.
Message channels are used to transfer messages between the two queue managers.
2) An MQI channel connects an application (producer or consumer) to a queue manager on a server machine.
MQI channels are required to transfer MQ API calls and responses between MQ client applications and queue managers.
So, in simple terms, a channel is a communication medium between a client application and a queue manager (or between queue managers) for sending and/or receiving messages.
MQ uses a proprietary protocol to transmit messages from client applications to queue managers and between queue managers.
The format of the data contained in the message does not matter; it can be anything, including bytes, XML, or JSON. Any type of data can be sent over the same channel.
Hope this helped.
WebSphere MQ queues
A queue is a container for messages. Business applications that are connected to the queue manager that hosts the queue can retrieve messages from the queue or can put messages on the queue. A queue has a limited capacity in terms of both the maximum number of messages that it can hold and the maximum length of those messages.
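As a toy illustration of those two limits (not MQ code), this Python sketch rejects a put when the queue already holds its maximum number of messages or when the message is longer than allowed. In IBM MQ these limits are the queue attributes MAXDEPTH and MAXMSGL; the class and exception names here are made up.

```python
from collections import deque
from typing import Deque


class QueueFullError(Exception):
    """Raised when the queue already holds max_depth messages."""


class MessageTooLongError(Exception):
    """Raised when a message exceeds max_msg_length bytes."""


class BoundedQueue:
    """Toy queue limited by message count and by message length."""

    def __init__(self, max_depth: int, max_msg_length: int) -> None:
        self.max_depth = max_depth            # cf. MAXDEPTH
        self.max_msg_length = max_msg_length  # cf. MAXMSGL
        self._messages: Deque[bytes] = deque()

    def put(self, message: bytes) -> None:
        if len(message) > self.max_msg_length:
            raise MessageTooLongError(f"{len(message)} > {self.max_msg_length} bytes")
        if len(self._messages) >= self.max_depth:
            raise QueueFullError(f"queue already holds {self.max_depth} messages")
        self._messages.append(message)

    def get(self) -> bytes:
        return self._messages.popleft()  # FIFO; raises IndexError if empty

    @property
    def depth(self) -> int:
        return len(self._messages)


q = BoundedQueue(max_depth=2, max_msg_length=8)
q.put(b"order-1")
q.put(b"order-2")
try:
    q.put(b"order-3")
except QueueFullError as exc:
    print("rejected:", exc)
```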
Reference
Channels
WebSphere® MQ uses two different types of channels:
A message channel, which is a unidirectional communications link between two queue managers. WebSphere MQ uses message channels to transfer messages between the queue managers. To send messages in both directions, you must define a channel for each direction.
An MQI (Message Queue Interface) channel, which is bidirectional and connects an application (MQI client) to a queue manager on a server machine. WebSphere MQ uses MQI channels to transfer MQI calls and responses between MQI clients and queue managers.
Reference
We have a requirement for an API that allows asynchronous updates via an MSMQ message queue. The API I'm putting together must allow the developer consuming it to specify different retry policies per message. For example, a high-priority client system, e.g. for sales, will submit all messages with 5 delivery attempts (retries) and 15 minutes between each attempt, whereas a low-priority client system, e.g. a back-end mail shot system that allows users to update their marketing preferences, will submit messages with 3 retries and an hour between each attempt.
Is there a way in the System.Messaging MSMQ (version 3 or 4) implementation to specify the number of retries, the retry delay, and things like whether messages are sent to a dead-letter queue or just deleted? (And if so, how?)
I would be open to using other messaging frameworks if they fulfilled this requirement.
Is there a way in the System.Messaging MSMQ (version 3 or 4) implementation to specify number of retries
Depending on which operating system/MSMQ version you're using, WCF supports quite sophisticated retry semantics. The following applies to Windows 2008 and MSMQ 4 using a transactional queue.
The main setting on the binding is called MaxRetryCycles. One retry cycle is an attempt to successfully read a message from a queue and process it inside the handling method. This "attempt" can actually be made up of multiple attempts, as defined by the msmq binding property ReceiveRetryCount. ReceiveRetryCount is the number of times an application will try to read the message and process it before rolling back the de-queue transaction. This marks the end of one retry cycle.
You can also introduce a delay in between cycles with the RetryCycleDelay property.
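A small Python model of how those three properties combine; it is not WCF code. It assumes that ReceiveRetryCount counts retries after the first attempt (so one cycle makes ReceiveRetryCount + 1 attempts) and that MaxRetryCycles counts cycles after the first one, with RetryCycleDelay between cycles. The sleep function is injected so the delays can be observed without waiting.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetrySettings:
    receive_retry_count: int     # models ReceiveRetryCount: immediate retries per cycle
    max_retry_cycles: int        # models MaxRetryCycles: cycles after the first one
    retry_cycle_delay_s: float   # models RetryCycleDelay, in seconds


def deliver(settings: RetrySettings,
            handle: Callable[[int], bool],
            sleep: Callable[[float], None]) -> bool:
    """Try to process one message; True on success, False if every cycle fails."""
    attempt = 0
    for cycle in range(settings.max_retry_cycles + 1):
        if cycle > 0:
            sleep(settings.retry_cycle_delay_s)  # pause between retry cycles
        for _ in range(settings.receive_retry_count + 1):
            if handle(attempt):
                return True
            attempt += 1  # failed attempt; retry immediately within this cycle
    return False  # all cycles exhausted: the message still has to be dealt with


attempts = []
delays = []
ok = deliver(RetrySettings(2, 3, 600.0),
             handle=lambda n: attempts.append(n) or False,  # always fails
             sleep=delays.append)
print(ok, len(attempts), delays)  # False 12 [600.0, 600.0, 600.0]
```

Under these assumptions a message that always fails is attempted (2 + 1) * (3 + 1) = 12 times before the "what to do with it" question below comes up.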
A more complicated consideration is what to do with the messages which fail even after multiple retry cycles.
allow the developer consuming the API to specify different retry policies per message
I am not sure how you could do this with MSMQ - as far as I'm aware it's only possible to set retry semantics on a per-endpoint basis. If you're using transactions then you can't even allow API users to set the priority of the messages being sent (transactional queues guarantee delivery in order).
The only thing you could do is host one instance of your API as high-priority and another as low-priority. These could be hosted in different environments, which has the added benefit that low-priority messages won't compete for system resources with high-priority messages.
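A Python sketch of the two-instance idea: the retry policy becomes a property of the endpoint, and each client system picks an endpoint instead of setting a policy per message. The queue paths, client names and endpoint_for function are hypothetical; the retry numbers come from the question. How each instance then enforces its policy (for example through its binding settings) is outside this sketch.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Endpoint:
    queue_path: str           # hypothetical MSMQ queue path for this API instance
    retries: int
    delay_between: timedelta


# Retry values from the question; the queue paths are made-up examples.
ENDPOINTS = {
    "high": Endpoint(r".\private$\updates-high", retries=5, delay_between=timedelta(minutes=15)),
    "low": Endpoint(r".\private$\updates-low", retries=3, delay_between=timedelta(hours=1)),
}

# Hypothetical mapping of client systems to a priority tier.
CLIENT_TIER = {"sales": "high", "mailshot": "low"}


def endpoint_for(client_system: str) -> Endpoint:
    """Choose the API instance (and therefore the retry policy) a client sends to."""
    return ENDPOINTS[CLIENT_TIER[client_system]]


print(endpoint_for("sales").queue_path)
```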
Hope this helps.
I have studied message queue systems in my class, but I still don't understand how they work in real-world scenarios. Is there a tutorial that can help me get the complete picture?
Can someone explain to me how these systems work?
An example: My thread or process can send a message to your message queue, and having sent it, my code goes on to do something else. Your code, when it gets around to it, reads the next message from the message queue, and then decides what to do about that message. Message queues avoid needing to have a critical section or mutex shared between the two threads, or processes. The underlying message queue layer itself takes care of making sure that messages get into the queue without race conditions affecting the integrity of the queue.
Message queues can be used for both one-way and two-way asynchronous messaging. For one-way use, my thread can use it to keep your thread apprised of key events in my thread, without acknowledgement back from your thread. For two-way use, after my thread sends a message to your thread, your thread may need to send data back to my thread via my message queue.
The message queue layer uses lower-level synchronization schemes to ensure that no two writers to the queue can write at the same time. It ensures that all writes to the queue are atomic. It also ensures that a reader of the queue cannot read a partially written message from the queue.
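In Python, queue.Queue does this locking internally, so a minimal sketch of the one-way and two-way patterns between two threads needs no explicit mutex. The thread and queue names are illustrative; this is an in-process queue, not an inter-process one such as a UNIX System V message queue.

```python
import queue
import threading
from typing import List, Optional, Tuple

# A message is (payload, reply_queue); reply_queue is None for one-way messages.
Message = Tuple[int, Optional["queue.Queue[int]"]]


def your_thread(inbox: "queue.Queue[Optional[Message]]", events: List[int]) -> None:
    while True:
        msg = inbox.get()          # blocks until a message is available
        if msg is None:            # sentinel: time to stop
            return
        payload, reply_to = msg
        if reply_to is None:
            events.append(payload)     # one-way: just note the event
        else:
            reply_to.put(payload * 2)  # two-way: answer on the sender's own queue


your_inbox: "queue.Queue[Optional[Message]]" = queue.Queue()
my_inbox: "queue.Queue[int]" = queue.Queue()
events: List[int] = []
worker = threading.Thread(target=your_thread, args=(your_inbox, events))
worker.start()

your_inbox.put((1, None))        # one-way: send and carry on immediately
your_inbox.put((21, my_inbox))   # two-way: say where the reply should go
reply = my_inbox.get(timeout=5)  # wait for the answer only when it is needed
your_inbox.put(None)
worker.join()
print(events, reply)  # [1] 42
```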
Most message queue APIs also offer support for reading messages from the queue based on a filter that you designate. Say, for instance, that you consider messages from a time-critical thread to be more important than other messages. Each time you check your queue for messages, you can first check for messages from the critical thread and service those first. Your thread would then go on to process the rest of the messages as normal, provided no more messages from the critical thread are found.
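A toy Python version of that filtered read; the class and function names are made up. With UNIX System V queues the same effect comes from the msgtyp argument of msgrcv, which selects messages of a given type; here the filter is an arbitrary predicate on (sender, body) pairs.

```python
import threading
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]  # (sender, body)


class FilteringQueue:
    """Toy message queue that can hand out the first message matching a filter."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._messages: List[Message] = []

    def send(self, sender: str, body: str) -> None:
        with self._lock:
            self._messages.append((sender, body))

    def receive(self, match: Callable[[Message], bool] = lambda m: True) -> Optional[Message]:
        """Remove and return the oldest matching message, or None if there is none."""
        with self._lock:
            for i, msg in enumerate(self._messages):
                if match(msg):
                    return self._messages.pop(i)
        return None


def service(q: FilteringQueue, critical_sender: str) -> List[Message]:
    """On every check, serve the critical sender first, then everything else in order."""
    handled: List[Message] = []
    while True:
        msg = q.receive(lambda m: m[0] == critical_sender)
        if msg is None:
            msg = q.receive()  # no critical messages waiting: take the next one
        if msg is None:
            return handled
        handled.append(msg)


q = FilteringQueue()
for sender, body in [("logger", "a"), ("critical", "x"), ("logger", "b"), ("critical", "y")]:
    q.send(sender, body)
print(service(q, "critical"))
```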
A C tutorial of the UNIX message queues
That's a complex topic but to put it simply:
Message Queues are one of the best ways, if not the best, to implement distributed systems.
Now you might ask, what is a distributed system? It is an integrated system that spans multiple machines, clients or nodes which execute their tasks in parallel in a non-disruptive way. A distributed system should be robust enough to continue to operate when one or more nodes fail, stop working, lag or are taken down for maintenance.
Then you might ask, what is a message queue? It is a message-oriented middleware that enables the development of a distributed system by using asynchronous messages for inter-node communication through the network.
And finally you might ask, what is all that good for? This is good for implementing applications with a lot of moving parts, called nodes, which need real-time monitoring and real-time reaction capabilities. To summarize, message queues provide: parallelism (nodes can truly run in parallel), tight integration (all nodes see the same messages in the same order), decoupling (nodes can evolve independently), failover/redundancy (when a node fails, another one can be running and building state to take over immediately), scalability/load balancing (just add more nodes), elasticity (nodes can lag during activity peaks without affecting the system as a whole) and resiliency (nodes can fail or stop working without taking the whole system down).
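The load-balancing point ("just add more nodes") is usually realised with the competing-consumers pattern: several nodes take work from the same queue, and each message goes to exactly one of them. The in-process Python sketch that follows uses threads as stand-ins for nodes, and which node gets which job varies from run to run. It does not model the other properties listed, such as every node seeing the same messages in the same order, which is a publish-subscribe arrangement rather than a shared work queue.

```python
import queue
import threading
from typing import List, Tuple


def run_nodes(n_nodes: int, jobs: List[int]) -> List[Tuple[str, int]]:
    """Competing consumers: several nodes pull jobs from one shared queue."""
    work: "queue.Queue[int]" = queue.Queue()
    for job in jobs:
        work.put(job)

    done: List[Tuple[str, int]] = []
    done_lock = threading.Lock()

    def node(name: str) -> None:
        while True:
            try:
                job = work.get_nowait()
            except queue.Empty:
                return  # no work left for this node
            with done_lock:
                done.append((name, job))

    threads = [threading.Thread(target=node, args=(f"node-{i}",)) for i in range(n_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


results = run_nodes(3, list(range(30)))
print(sorted(job for _, job in results) == list(range(30)))  # True: each job handled once
```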
Check this article which discusses a message queue infrastructure in detail.