Ordering of the messages - message-queue

Does ZeroMQ guarantee the order of messages( FIFO ).
Is there an option for persistence.
Is it the best fit for IPC communications.
Does it allow prioritizing the messages.
Does it allow prioritizing the receivers.
Does it allow both synchronous as well asynchronous way of communication?

https://lists.zeromq.org/pipermail/zeromq-dev/2015-January/027748.html
The author said:"Messages carried over TCP or IPC will be delivered in order if they pass through the same network paths. This is guaranteed and it's a TCP
guarantee, nothing to do with ZeroMQ. ZeroMQ does not reorder
messages, ever. However if you pass messages through two or more
paths, and then merge those flows again, you will in effect shuffle
the messages."

Zeromq is best understood as a udp like messaging system. Thus is does not intrinsically guarantee any of that. It DOES guarantee that parts of a single message are received atomically and in order, since ZMQ allows for sending a message consisting of several parts. All communication is always asynchronous by design.
see http://zguide.zeromq.org/ for more advanced patterns.
that being said, all the features requested would by definition make the transmission slower and more complicated. If they are needed you should implement or use one of the available patterns of the guide.

Related

Guidance as to when querying the database for read operation should be done using mass transit request/response for timebound operation

For Create operations it is clear that putting the message in the queue is a good idea in case the processing or creation of that entity takes longer than expected and other the other benefits queues bring.
However, for read operations that are timebound (must return to the UI in less than 3 seconds) it is not entirely clear if a queue is a good idea.
http://masstransit-project.com/MassTransit/usage/request-response.html provides a nice abstraction but it goes through the queue.
Can someone provide some suggestions as to why or why not I would use mass transit or that effect any technology like nservicebus etc for database read operation that are UI timebound?
Should I only use mass transit only for long running processes?
Request/Reply is a perfectly valid pattern for timebound operations. Transport costs in case of, for example, RabbitMQ, are very low. I measured performance of request/response using ServiceStack (which is very fast) and MassTransit. There is an initial delay with MassTransit to cache the endpoints, but apart from that the speed is pretty much the same.
Benefits here are:
Retries
Fine tuning of timeouts
Easy scaling with competing consumers
just to name the most obvious ones.
And with error handling you get your requests ending up in the error queue so there is no data loss and you can always look there to find out what and why went wrong.
Update: There is a SOA pattern that describes this (or rather similar) approach. It is called Decoupled Invocation.

How to understand a role of a queue in a distributed system?

I am trying to understand what is the use case of a queue in distributed system.
And also how it scales and how it makes sure it's not a single point of failure in the system?
Any direct answer or a reference to a document is appreciated.
Use case:
I understand that queue is a messaging system. And it decouples the systems that communicate between each other. But, is that the only point of using a queue?
Scalability:
How does the queue scale for high volumes of data? Both read and write.
Reliability:
How does the queue not becoming a single point of failure in the system? Does the queue do a replication, similar to data-storage?
My question is not specified to any particular queue server like Kafka or JMS. Just in general.
Queue is a mental concept, the implementation decides about 1 + 2 + 3
A1: No, it is not the only role -- a messaging seems to be main one, but a distributed-system signalling is another one, by no means any less important. Hoare's seminal CSP-paper is a flagship in this field. Recent decades gave many more options and "smart-behaviours" to work with in designing a distributed-system signalling / messaging services' infrastructures.
A2: Scaling envelopes depend a lot on implementation. It seems obvious that a broker-less queues can work way faster, that a centralised, broker-based, infrastructure. Transport-classes and transport-links account for additional latency + performance degradation as the data-flow volumes grow. BLOB-handling is another level of a performance cliff, as the inefficiencies are accumulating down the distributed processing chain. Zero-copy ( almost ) zero-latency smart-Queue implementations are still victims of the operating systems and similar resources limitations.
A3: Oh sure it is, if left on its own, the SPOF. However, Theoretical Cybernetics makes us safe, as we can create reliable systems, while still using error-prone components. ( M + N )-failure-resilient schemes are thus achievable, however the budget + creativity + design-discipline is the ceiling any such Project has to survive with.
my take:
I would be careful with "decouple" term - if service A calls api on service B, there is coupling since there is a contract between services; this is true even if the communication is happening over a queue, file or fax. The key with queues is that the communication between services is asynchronous. Which means their runtimes are decoupled - from practical point of view, either of systems may go down without affecting the other.
Queues can scale for large volumes of data by partitioning. From clients point of view, there is one queue, but in reality there are many queues/shards and number of shards helps to support more data. Of course sharding a queue is not "free" - you will lose global ordering of events, which may need to be addressed in you application.
A good queue based solution is reliable based on replication/consensus/etc - depends on set of desired properties. Queues are not very different from databases in this regard.
To give you more direction to dig into:
there an interesting feature of queues: deliver-exactly-once, deliver-at-most-once, etc
may I recommend Enterprise Architecture Patterns - https://www.enterpriseintegrationpatterns.com/patterns/messaging/Messaging.html this is a good "system design" level of information
queues may participate in distributed transactions, e.g. you could build something like delete a record from database and write it into queue, and that will be either done/committed or rolledback - another interesting topic to explore

How can I model this usage scenario?

I want to create a fairly simple mathematical model that describes usage patterns and performance trade-offs in a system.
The system behaves as follows:
clients periodically issue multi-cast packets to a network of hosts
any host that receives the packet, responds with a unicast answer directly
the initiating host caches the responses for some given time period, then discards them
if the cache is full the next time a request is required, data is pulled from the cache not the network
packets are of a fixed size and always contain the same information
hosts are symmetic - any host can issue a request and respond to requests
I want to produce some simple mathematical models (and graphs) that describe the trade-offs available given some changes to the above system:
What happens where you vary the amount of time a host caches responses? How much data does this save? How many calls to the network do you avoid? (clearly depends on activity)
Suppose responses are also multi-cast, and any host that overhears another client's request can cache all the responses it hears - thereby saving itself potentially making a network request - how would this affect the overall state of the system?
Now, this one gets a bit more complicated - each request-response cycle alters the state of one other host in the network, so the more activity the quicker caches become invalid. How do I model the trade off between the number of hosts, the rate of activity, the "dirtyness" of the caches (assuming hosts listen in to other's responses) and how this changes with cache validity period? Not sure where to begin.
I don't really know what sort of mathematical model I need, or how I construct it. Clearly it's easier to just vary two parameters, but particularly with the last one, I've got maybe four variables changing that I want to explore.
Help and advice appreciated.
Investigate tokenised Petri nets. These seem to be an appropriate tool as they:
provide a graphical representation of the models
provide substantial mathematical analysis
have a large body of prior work and underlying analysis
are (relatively) simple mathematical models
seem to be directly tied to your problem in that they deal with constraint dependent networks that pass tokens only under specified conditions
I found a number of references (quality not assessed) by a search on "token Petri net"

Is message priority inherently unimportant in message queue systems?

It seems like most of the messaging systems I've looked at have basic, if any, support for priority message queues. For example, the AMQP only specifies a minimum of 2 priorities. RabbitMQ, an AMQP implementation, doesn't support any priorities. ActiveMQ will be getting support for 10 message priorities in version 5.4 in a couple days. 10 priority levels is the specified by the JMS spec.
A priority queue in the non-messaging sense of the word orders its contents based on an arbitrary field with an unconstrained range of priorities. Why does an implementation like this not exist as part of a messaging system? As I asked in the title, is priority an inherently non-messaging concept?
I realize that one answer might be that the concept of priority introduces the possibility of messages infinitely languishing in the queue while higher priority messages are processed. Are there other reasons?
BTW ActiveMQ now supports priority messaging in 5.4.x via JMSPriority headers.
Rather than getting the message broker to re-order messages within some buffer as they arrive, there are often better techniques to implement priority consumption, such as having a dedicated consumer pool for high priority messages. Then irrespective of how much noise there is from low priority messages, high priority messages will always get though.
Given the asynchronous nature of messaging its easy to fill up buffers, network pipes and prefetch queues with low priority messages if using things like JMSPriority headers etc.
In general, message queue systems are used to ensure delivery of messages between disparate systems.
Usually, there is some sort of once-and-only-once guarantee, and often a further promise that the messages will come in order.
By and large, that then informs the design of the system(s) that you are building and hooking together.
The concepts of priority between decoupled systems often don't make that much sense.
That said, one common workaround is to have two queues, one high priority and one background priority. The inherent problem is then made clear however, because of course the receiving system probably cannot halt processing the low-level request when a higher priority request comes in, so they are generally done sequentially at that level of granularity.
It seems to me that the idea is probably more akin to "process priority" than the priority values in a priority queue. Certainly that is consistent with the sentence-or-two about it in the JMS spec, and evidently with the AMQP spec as well.
One has to be careful in that too many priorities aren't used to the point where using the program becomes more burdensome than going through each message.
Messaging systems are designed and optimized for chronological ordering. File systems are optimized for appending files and not inserting data at the beginning or in the middle. Queue-like data structures are usually optimized for append at the end and removal from the head. For file systems, this means appending to file (adding) and appending to a transaction log (removal), and deleting message files once they are consumed (removal).
Introducing priorities to a processing queue effectively turns the queue into a data structure that has both chronological and priority sorting. Basically, when it comes to working with file storage, it's quite sub-optimal as you have to create some sort of indexing strategy.

In a FIFO Qeueing system, what's the best way the to implement priority messaging

For message-oriented middleware that does not consistently support priority messages (such as AMQP) what is the best way to implement priority consumption when queues have only FIFO semantics? The general use case would be a system in which consumers receive messages of a higher priority before messages of a lower priority when a large backlog of messages exists in Queue(s).
Given only FIFO support for a given single queue, you will of course have to introduce either multiple queues, an intermediary, or have a more complex consumer.
Multiple queues could be handled in a couple of ways. The producer and consumer could agree to have two queues between them, one for high-priority, and one for background tasks.
If your producer is constrained to a single queue, but you have control over the consumer, consider introducing a fan-out router in the path. So producer->Router is a single queue, and the router then has two queues to the consumer.
Another way to tackle it, which is likely less than ideal, would be to have your consumer spin a thread to front the queue, then dispatch the work internally. Something like the router version above, but inside a single app. This has the downside of having multiple messages in flight inside your app, which may complicate recovery in the event of a failure.
Don't forget to consider starvation of the effectively low-priority events, whatever they are, if some of them should be processed even if there are higher-priority events still hanging about.