ZeroMQ Messaging Queues - message-queue

I am working for a while with ZeroMQ. I read already some whitepapers and a lot from the guide, but one question remained open to me:
Lets say we use PUB-SUB.
Where or on which system are the messaging queues? On the publisher system side, on the subscriber system side or something in between?
Sorry for the maybe stupid question.
This picture is from the Broker vs. Brokerless whitepaper.

Every zeromq socket has both send and recv queues (The limits are set via high water mark).
In the case of a brokerless PUB/SUB the messages could be queued on the sender or the receiver.
Example:
If there is network congestion the sender may queue on its send queue
If the receiver is consuming messages slower than they arrive they will be queued on the receiver recv queue
If the queues reach the high water mark the messages will be lost.

Q : Where or on which system are the messaging queues?
The concept of the Zen-of-Zero, as built-into ZeroMQ uses smart and right enough approches, on a case by case principle.
There cases, where there are no-Queues, as you know them, at all :
for example, the inproc:// transport-class has no Queue, as one may know 'em, as there are just memory-regions, across which the messages are put and read-from. The whole magic thus appears inside the special component - the Context()-instance. There the message-memory-mapping takes place. The inproc:// case is a special case, where no I/O-thread(s) are needed at all, as the whole process is purely memory-mapping based and the Context()-instance manipulates its internal states, so as to emulate both an externally provided abstraction of the Queue-alike behaviour and also the internal "Queue"-management.
Others, similarly, operate localhost-located internal Queue(s) endpoints :
for obvious reason, as the ZeroMQ is a broker-less system, there is no "central"-place and all the functionality is spread across the participating ( coordinated and cooperating ) nodes.
Either one of the Queue-ends reside inside the Context()-instances, one in the one-side's localhost-itself, the other in the "remote"-host located Context()-instance, as was declared and coordinated during building of the ZMTP/RFC-specified socket-archetype .bind()/.connect() construction. Since a positive acknowledgement was made for such a ZMTP-specific-socket, the Queue-abstraction ( implemented in a distributed-system-manner ) holds till such socket does not get .close()-ed or the Context()-instance did not get forcefully or by a chance .term()-ed.
For these reasons, the proper capacity sizings are needed, as the messages may reside inside the Context()-operated memory before it gets transported across the { network | ipc-channel }-mediated connection ( on the .send()-side ) or before being .recv()-ed ( from inside the remote Context()-instance ) by the remote application-code for any further use of the message-payload data ( a Zero-copy message-data management is possible, yet not all use-cases indeed use this mode for avoiding replicated memory-allocations and data-transfer costs ).

Related

Are there any Python 2.7 alternatives to ZeroMQ that are released under the BSD or MIT license?

I am seeking Python 2.7 alternatives to ZeroMQ that are released under the BSD or MIT license. I am looking for something that supports request-reply and pub-sub messaging patterns. I can serialize the data myself if necessary. I found Twisted from Twisted Matrix Labs but it appears to require a blocking event loop, i.e. reactor.run(). I need a library that will run in the background and let my application check messages upon certain events. Are there any other alternatives?
Give nanomsg, a ZeroMQ younger sister, a try - same father, same beauty
Yes, it is licensed under MIT/X11 license.
Yes, REQ/REP - allows to build clusters of stateless services to process user requests
Yes, PUB/SUB - distributes messages to large sets of interested subscribers
Has several Python bindings available
https://github.com/tonysimpson/nanomsg-python (recommended)
https://github.com/sdiehl/pynanomsg
https://github.com/djc/nnpy
Differences between nanomsg and ZeroMQ
( state as of 2014/11 v0.5-beta - courtesy nanomsg.org >>> a-click-thru to the original HyperDoc )
Licensing
nanomsg library is MIT-licensed. What it means is that, unlike with ZeroMQ, you can modify the source code and re-release it under a different license, as a proprietary product, etc. More reasoning about the licensing can be found here.
POSIX Compliance
ZeroMQ API, while modeled on BSD socket API, doesn't match the API fully. nanomsg aims for full POSIX compliance.
Sockets are represented as ints, not void pointers.
Contexts, as known in ZeroMQ, don't exist in nanomsg. This means simpler API (sockets can be created in a single step) as well as the possibility of using the library for communication between different modules in a single process (think of plugins implemented in different languages speaking each to another). More discussion can be found here.
Sending and receiving functions ( nn_send, nn_sendmsg, nn_recv and nn_recvmsg ) fully match POSIX syntax and semantics.
Implementation Language
The library is implemented in C instead of C++.
From user's point of view it means that there's no dependency on C++ runtime (libstdc++ or similar) which may be handy in constrained and embedded environments.
From nanomsg developer's point of view it makes life easier.
Number of memory allocations is drastically reduced as intrusive containers are used instead of C++ STL containers.
The above also means less memory fragmentation, less cache misses, etc.
More discussion on the C vs. C++ topic can be found here and here.
Pluggable Transports and Protocols
In ZeroMQ there was no formal API for plugging in new transports (think WebSockets, DCCP, SCTP) and new protocols (counterparts to REQ/REP, PUB/SUB, etc.) As a consequence there were no new transports added since 2008. No new protocols were implemented either. The formal internal transport API (see transport.h and protocol.h) are meant to mitigate the problem and serve as a base for creating and experimenting with new transports and protocols.
Please, be aware that the two APIs are still new and may experience some tweaking in the future to make them usable in wide variety of scenarios.
nanomsg implements a new SURVEY protocol. The idea is to send a message ("survey") to multiple peers and wait for responses from all of them. For more details check the article here. Also look here.
In financial services it is quite common to use "deliver messages from anyone to everyone else" kind of messaging. To address this use case, there's a new BUS protocol implemented in nanomsg. Check the details here.
Threading Model
One of the big architectural blunders I've done in ZeroMQ is its threading model. Each individual object is managed exclusively by a single thread. That works well for async objects handled by worker threads, however, it becomes a trouble for objects managed by user threads. The thread may be used to do unrelated work for arbitrary time span, e.g. an hour, and during that time the object being managed by it is completely stuck. Some unfortunate consequences are: inability to implement request resending in REQ/REP protocol, PUB/SUB subscriptions not being applied while application is doing other work, and similar. In nanomsg the objects are not tightly bound to particular threads and thus these problems don't exist.
REQ socket in ZeroMQ cannot be really used in real-world environments, as they get stuck if message is lost due to service failure or similar. Users have to use XREQ instead and implement the request re-trying themselves. With nanomsg, the re-try functionality is built into REQ socket.
In nanomsg, both REQ and REP support cancelling the ongoing processing. Simply send a new request without waiting for a reply (in the case of REQ socket) or grab a new request without replying to the previous one (in the case of REP socket).
In ZeroMQ, due to its threading model, bind-first-then-connect-second scenario doesn't work for inproc transport. It is fixed in nanomsg.
For similar reasons auto-reconnect doesn't work for inproc transport in ZeroMQ. This problem is fixed in nanomsg as well.
Finally, nanomsg attempts to make nanomsg sockets thread-safe. While using a single socket from multiple threads in parallel is still discouraged, the way in which ZeroMQ sockets failed randomly in such circumstances proved to be painful and hard to debug.
State Machines
Internal interactions inside the nanomsg library are modeled as a set of state machines. The goal is to avoid the incomprehensible shutdown mechanism as seen in ZeroMQ and thus make the development of the library easier.
For more discussion see here and here.
IOCP Support
One of the long-standing problems in ZeroMQ was that internally it uses BSD socket API even on Windows platform where it is a second class citizen. Using IOCP instead, as appropriate, would require major rewrite of the codebase and thus, in spite of multiple attempts, was never implemented. IOCP is supposed to have better performance characteristics and, even more importantly, it allows to use additional transport mechanisms such as NamedPipes which are not accessible via BSD socket API. For these reasons nanomsg uses IOCP internally on Windows platforms.
Level-triggered Polling
One of the aspects of ZeroMQ that proved really confusing for users was the ability to integrate ZeroMQ sockets into an external event loops by using ZMQ_FD file descriptor. The main source of confusion was that the descriptor is edge-triggered, i.e. it signals only when there were no messages before and a new one arrived. nanomsg uses level-triggered file descriptors instead that simply signal when there's a message available irrespective of whether it was available in the past.
Routing Priorities
nanomsg implements priorities for outbound traffic. You may decide that messages are to be routed to a particular destination in preference, and fall back to an alternative destination only if the primary one is not available.
For more discussion see here.
TCP Transport Enhancements
There's a minor enhancement to TCP transport. When connecting, you can optionally specify the local interface to use for the connection, like this:
nn_connect (s, "tcp://eth0;192.168.0.111:5555").
Asynchronous DNS
DNS queries (e.g. converting hostnames to IP addresses) are done in asynchronous manner. In ZeroMQ such queries were done synchronously, which meant that when DNS was unavailable, the whole library, including the sockets that haven't used DNS, just hung.
Zero-Copy
While ZeroMQ offers a "zero-copy" API, it's not true zero-copy. Rather it's "zero-copy till the message gets to the kernel boundary". From that point on data is copied as with standard TCP. nanomsg, on the other hand, aims at supporting true zero-copy mechanisms such as RDMA (CPU bypass, direct memory-to-memory copying) and shmem (transfer of data between processes on the same box by using shared memory). The API entry points for zero-copy messaging are nn_allocmsg and nn_freemsg functions in combination with NN_MSG option passed to send/recv functions.
Efficient Subscription Matching
In ZeroMQ, simple tries are used to store and match PUB/SUB subscriptions. The subscription mechanism was intended for up to 10,000 subscriptions where simple trie works well. However, there are users who use as much as 150,000,000 subscriptions. In such cases there's a need for a more efficient data structure. Thus, nanomsg uses memory-efficient version of Patricia trie instead of simple trie.
For more details check this article.
Unified Buffer Model
ZeroMQ has a strange double-buffering behaviour. Both the outgoing and incoming data is stored in a message queue and in TCP's tx/rx buffers. What it means, for example, is that if you want to limit the amount of outgoing data, you have to set both ZMQ_SNDBUF and ZMQ_SNDHWM socket options. Given that there's no semantic difference between the two, nanomsg uses only TCP's (or equivalent's) buffers to store the data.
Scalability Protocols
Finally, on philosophical level, nanomsg aims at implementing different "scalability protocols" rather than being a generic networking library. Specifically:
Different protocols are fully separated, you cannot connect REQ socket to SUB socket or similar.
Each protocol embodies a distributed algorithm with well-defined prerequisites (e.g. "the service has to be stateless" in case of REQ/REP) and guarantees (if REQ socket stays alive request will be ultimately processed).
Partial failure is handled by the protocol, not by the user. In fact, it is transparent to the user.
The specifications of the protocols are in /rfc subdirectory.
The goal is to standardise the protocols via IETF.
There's no generic UDP-like socket (ZMQ_ROUTER), you should use L4 protocols for that kind of functionality.

Realtime synchronization of live data over network

How do you sync data between two processes (say client and server) in real time over network?
I have various documents/datasets constructed on the server, which are downloaded and displayed by clients. Once downloaded, the document receives continuous updates in order to remain fresh.
It seems to be a simple and commonly occurring concept, but I cannot find any tools that provide this level of abstraction. I am not even sure what I am looking for. Perhaps there is a similar concept with solid tool support? Perhaps there is a chain of different tools that must be put together? Here's what I have considered so far:
I am required to propagate every change in a single hop (0.5 RTT), which rules out polling (typically >10 RTT) and cache invalidation techniques (1.5 RTT).
Data replication and simple notification broadcasts are not an option, because there is too much data and too many changes. Clients must be able to select specific documents to download and monitor for changes.
I am currently using message passing pattern, which does the job, but it is hopelessly unproductive. It works at way too low level of abstraction. It is laborious, error-prone, and it doesn't scale well with increasing application complexity.
HTTP and other RPC-like techniques are good for the initial fetch, but they encourage polling for subsequent synchronization. When performing reverse requests (from data source to data consumer), change notifications are possible, but it's even more complicated than message passing.
Combining RPC (for the initial fetch) with message passing (for updates) turned out to be a nightmare due to the complexity involved in coordinating communication over the two parallel connections as well as due to the impedance mismatch between the two paradigms. I need something unified.
WebSocket & Comet are popular methods to implement change notification, but they need additional libraries to be productive and I am not aware of any libraries suitable for my application.
Message queues merely put an intermediary on the network while maintaining the basic message passing pattern. Custom message filters/routers allow me to get closer to the live document concept, but I feel like I am implementing custom middleware layer on top of the MQ.
I have tons of additional requirements (native observable data structure API on both ends, incremental updates, custom message filters, custom connection routing, cross-platform, robustness & scalability), but before considering those requirements, I need to find some tools that at least attempt to do what I need. I am trying to avoid in-house frameworks for the standard reasons - cost, time to market, long-term maintenance, and keeping developers happy.
My conclusion at the moment is that there is no such live document synchronization framework. In-house solution is the way to go, but many existing components can be used as part of the solution.
It is pretty simple to layer live document logic on top of WebSocket or any other message passing platform. Server just sends the document as a separate message when the connection is initiated and then after every change. Automated reconnection and some connection monitoring must be added to handle network failures.
Serialization at both ends is a separate problem targeted by many existing libraries. Detecting changes in server-side data structures (needed to initiate push) is yet another separate problem that has its own set of patterns and tools. Incremental updates and many other issues can be solved by intermediaries intercepting the connection.
This approach will work with current technology at the cost of extensive in-house glue code. It can be incrementally substituted with standard components as they become available.
WebSocket already includes resource URIs, routing, and a few other nice features. Useful intermediaries and libraries will likely emerge in the future. HTTP with text/event-stream MIME type is a possible future alternative to WebSocket. The advantage of HTTP is that existing tools can be reused with little modification.
I've completely thrown away the pattern of combining RPC pull with separate push channel despite rich tool support. Pushing everything in 0.5 RTT requires the push channel to use exactly the same technology as the pull channel, i.e. reverse RPC. Reverse RPC is like message passing except it introduces redundant returns, throws away useful connection semantics, and makes it hard to insert content-agnostic intermediaries into the stream.

Message queuing solution for millions of topics

I'm thinking about system that will notify multiple consumers about events happening to a population of objects. Every subscriber should be able to subscribe to events happening to zero or more of the objects, multiple subscribers should be able to receive information about events happening to a single object.
I think that some message queuing system will be appropriate in this case but I'm not sure how to handle the fact that I'll have millions of the objects - using separate topic for every of the objects does not sound good [or is it just fine?].
Can you please suggest approach I should should take and maybe even some open source message queuing system that would be reasonable?
Few more details:
there will be thousands of subscribers [meaning not plenty of them],
subscribers will subscribe to tens or hundreds of objects each,
there will be ~5-20 million of the objects,
events themselves dont have to carry any message. just information that that object was changed is enough,
vast majority of objects will never be subscribed to,
events occur at the maximum rate of few hundreds per second,
ideally the server should run under linux, be able to integrate with the rest of the ecosystem via http long-poll [using node js? continuations under jetty?].
Thanks in advance for your feedback and sorry for somewhat vague question!
I can highly recommend RabbitMQ. I have used it in a couple of projects before and from my experience, I think it is very reliable and offers a wide range of configuraions. Basically, RabbitMQ is an open-source ( Mozilla Public License (MPL) ) message broker that implements the Advanced Message Queuing Protocol (AMQP) standard.
As documented on the RabbitMQ web-site:
RabbitMQ can potentially run on any platform that Erlang supports, from embedded systems to multi-core clusters and cloud-based servers.
... meaning that an operating system like Linux is supported.
There is a library for node.js here: https://github.com/squaremo/rabbit.js
It comes with an HTTP based API for management and monitoring of the RabbitMQ server - including a command-line tool and a browser-based user-interface as well - see: http://www.rabbitmq.com/management.html.
In the projects I have been working with, I have communicated with RabbitMQ using C# and two different wrappers, EasyNetQ and Burrow.NET. Both are excellent wrappers for RabbitMQ but I ended up being most fan of Burrow.NET as it is easier and more obvious to work with ( doesn't do a lot of magic under the hood ) and provides good flexibility to inject loggers, serializers, etc.
I have never worked with the amount of amount of objects that you are going to work with - I have worked with thousands ( not millions ). However, no matter how many objects I have been playing around with, RabbitMQ has always worked really stable and has never been the source to errors in the system.
So to sum up - RabbitMQ is simple to use and setup, supports AMQP, can be managed via HTTP and what I like the most - it's rock solid.
Break up the topics to carry specific events for e.g. "Object updated topic" "Object deleted"...So clients need to only have to subscribe to the "finite no:" of event based topics they are interested in.
Inject headers into your messages when you publish them and put intelligence into the clients to use these headers as message selectors. For eg, client knows the list of objects he is interested in - and say you identify the object by an "id" - the id can be the header, and the client will use the "id header" to determine if he is interested in the message.
Depending on whether you want, you may also want to consider ensuring guaranteed delivery to make sure that the client will receive the message even if it goes off-line and comes back later.
The options that I would recommend top of the head are ActiveMQ, RabbitMQ and Redis PUB SUB ( Havent really worked on redis pub-sub, please use your due diligance)
Finally here are some performance benchmarks for RabbitMQ and Redis
Just saw that you only have few 100 messages getting pushed out / sec, this is not a big deal for activemq, I have been using Amq on a system that processes 240 messages per second , and it just works fine. I use a thread pool of workers to asynchronously process the messages though . Look at a framework like akka if you are in the java land, if not stick with nodejs and the cool Eco system around it.
If it has to be open source i'd go for ActiveMQ, and an application server to provide the JMS functionality for topics and it has Ajax Support so you can access them from your client
So, you would use the JMS infrastructure to publish the topics for the objects, and you can create topis as you need them
Besides, by using an java application server you may be able to take advantages from clustering, load balancing and other high availability features (obviously based on the selected product)
Hope that helps!!!
Since your messages are very small might want to consider MQTT, which is designed for small devices, although it works fine on powerful devices as well. Key consideration is the low overhead - basically a 2 byte header for a small message. You probably can't use any simple or open source MQTT server, due to your volume. You probably need a heavy duty dedicated appliance like a MessageSight to handle your volume.
Some more details on your application would certainly help. Also you don't mention security at all. I assume you must have some needs in this area.
Though not sure about your work environment but here are my bits. Can you identify each object with unique ID in your system. If so, you can have a topic per each event type. for e.g. you want to track object deletion event, object updation event and so on. So you can have topic for each event type. These topics would be published with Ids of object whenever corresponding event happened to the object. This will limit the no of topics you needed.
Second part of your problem is different subscribers want to subscribe to different objects. So not all subscribers are interested in knowing events of all objects. This problem statement scoped to message selector(filtering) mechanism provided by messaging framework. So basically you need to seek on what basis a subscriber interested in particular object. Have that basis as a message filtering mechanism. It could be anything: object type, object state etc. So ultimately your system would consists of one topic for each event type with someone publishing event messages : {object-type:object-id} information. Subscribers could subscribe to any topic and with an filtering criteria.
If above solution satisfy, you can use any messaging solution: activeMQ, WMQ, RabbitMQ.

Does RabbitMq do round-robin from the exchange to the queues

I am currently evaluating message queue systems and RabbitMq seems like a good candidate, so I'm digging a little more into it.
To give a little context I'm looking to have something like one exchange load balancing the message publishing to multiple queues. I don't want to replicate the messages, so a fanout exchange is not an option.
Also the reason I'm thinking of having multiple queues vs one queue handling the round-robin w/ the consumers, is that I don't want our single point of failure to be at the queue level.
Sounds like I could add some logic on the publisher side to simulate that behavior by editing the routing key and having the appropriate bindings in place. But that's kind of a passive approach that wouldn't take the pace of the message consumption on each queue into account, potentially leading to fill up one queue if the consumer applications for that queue are dead.
I was looking for a more pro-active way from the exchange entity side, that would decide where to send the next message based on each queue size or something of that nature.
I read about Alice and the available RESTful APIs but that seems kind of a heavy duty solution to implement fast routing decisions.
Anyone knows if round-robin between the exchange the queues is feasible w/ RabbitMQ then? Thanks.
Exchanges are generally stateless in the AMQP model, though there have been some recent experiments in stateful exchanges now that there's both a system for managing RabbitMQ plugins and for providing new experimental exchange types.
There's nothing that does quite what you want, I don't think, though I'm not completely sure I understand the requirement. Aside from the single-point-of-failure point, would having a single queue with workers reading from it solve your problem? If so, then your problem reduces to configuring RabbitMQ in an HA configuration that permits you to use that solution. There are a couple of approaches to doing that: either use HALinux and a shared store to get active/passive HA with quick failover, or set up more than one parallel broker and deduplicate on the client, perhaps using redis or similar to do so.
I suggest asking your question again on the rabbitmq-discuss mailing list, where more people will be able to offer suggestions, and where the discussion can be archived for posterity.
Agree with Tony on the approach.
Here is a 'mashup' of RabbitMQ, Redis that you could use instead of rolling your own -
http://xing.github.com/beetle/
One built in way you can do a form of sharing a form exchange to queues, but not exactly round robin, is Consistent Hashing. rabbitmq_consistent_hash_exchange
How too
https://medium.com/#eranda/rabbitmq-x-consistent-hashing-with-wso2-esb-27479b8d1d21
Paper to explain, it puts queues at a weighted distribution on a circle and then by sending random routing key it will send to the closest queue.
http://www8.org/w8-papers/2a-webserver/caching/paper2.html

How Does Listening to a Multicast Hurt Me?

I'm receiving a recovery feed from an exchange for recovering data missed from their primary feed.
The exchange strongly recommends listening to the recovery feed only when data is needed, and leaving the multicast once I have recovered the data I need.
My question is, if I am using asio, and not reading from the NIC when I don't need it, what is the harm? The messages have sequence numbers, so I can't accidentally process an old message "left" on the card.
Is this really harming my application?
It's likely not harming your application so much as harming your machine - since the nic is still configured into the multicast group, it's still listening to those messages and passing them up, before your software ignores them and they get discarded. That's a lot of extra work that your network stack and kernel are doing, and therefore a lot of extra load on the machine in general, not just your app.
Listening to your recovery feed could also have a potential impact on a network level. As pjz mentioned, your NIC and IP stack will have more frames/packets to process. In addition, more of your available bandwidth is being used up by data that is not being used by your application; this could lead to dropped frames if there is congestion on your link. Whether congestion is likely to occur depends on whether your server is 100Mb or 1Gb attached, how much other traffic your host is sending/receiving, etc.
Another potential concern is the impact on other hosts. If the switch your host is attached to does not have IGMP snooping enabled, then all hosts on the same VLAN will receive the additional multicast traffic, which could lead them to experience the same problems as mentioned above.
If you have a networking team administering your network for you, it may be worth seeking out some recommendations from them? If you feel it is necessary to subscribe to a redundant feed, it would seem prudent to work out what level of redundancy already exists in your network and how likely it is that messages on the primary feed will be lost.
An addition to muz's comment...
It's unlikely that this will make any difference to your system, but it's worth being aware that there is an overhead associated with maintaining a multicast membership (assuming that you're using IGMP - which is probably reasonable given the restriction about "leaving the multicast")
IGMP requires the sending and processing of multicast group memberships at regular intervals. And (as alluded to in muz's comment) if you have any switches or routers between you and the multicast source that are capable of igmp snooping then they are able to disable the multicast for a given network.