Lightweight Message Bus with broadcasting and routing capabilities? - message-queue

I am trying to find the most light-weight message bus (queue?) that can handle the following:
Producer A subscribes to the bus. The bus is specified via a well known form of identification (like a name, a socket or something).
Consumer B subscribes to the same bus and registers to only a certain kind of messages.
Consumer C subscribes to the same bus and registers to another kind of messages that overlaps with that of B.
Producer A puts a message in the bus such that both B and C are interested in. Both B and C receive the message (not just one of them, but both of them).
A, B, C and the bus are residing in different machines.

You are probably best using a queue (such as ActiveMQ) for this and for the lightweight requirement, you can use MQTT as the underlying protocol which is extremely lightweight.
See:
http://activemq.apache.org/mqtt.html
http://mqtt.org/
For your scenario:
Your producer A will connect to the queue
Your consumer B will connect to the same queue and listens to a topic (for instance /topic/onlyBisInterested). It will also subscribe to /topic/everyoneIsInterested
Your consumer C will connect to the queue and listens to /topic/everyoneIsInterested
Your producer will publish a message to /topic/everyoneIsInterested and both B and C will receive this message.
You can play around with the topics a lot. You can have topic matching rules on a wildcard basis, on an exact basis or any combination thereof. There are also levels of hierarchy (such as /a/b/*) that can be setup which will accept (/a/b/c or /a/b/d or anything else).
You can also transfer your data over SSL and use authentication if required and use message persistence and guaranteed delivery in case one/many of your node(s) go down.
I recommend reading the official documentation and deciding if it's good enough for you. There are plenty of other free or commercial queues available, most/all of which will satisfy your requirements since they are very standard.

We've written a peer-to-peer message bus, Zebus, based on ZeroMq (transport), Cassandra (peer discovery and persistence) and Protobuf (serialisation). It supports routing of messages based on properties of the messages.
It is open source and production tested https://github.com/Abc-Arbitrage/Zebus
Zebus is being actively developed and it is in heavy in-house production use. Currently there is only a .NET language binding.

Related

Message bus vs. Service bus vs. Event hub vs Event grid

I'm learning the messaging system and got confused by those terminology.
All the messaging system below provides loose coupling between services with different sets of features.
queue - FIFO, pulling mechanism, 1 consumer each queue but any number of producers?
message bus - pub/sub model, any number of consumers with any number of producers processing messages? Is Azure Service Bus an implementation of message bus?
event bus - pub/sub model, any number of consumers with any number of producers processing events?
Do people use message bus and event bus interchangeably as far as terminology goes?
What are the difference between events and messages? Are those just synonyms in this context?
event hub - pub/sub model, partition, replay, consumers can store events in external storage or close to real-time data analysis. What exactly is event hub?
event grid - it can be used as a downstream service of event hub. What does it exactly do that event hub doesn't do?
Can someone provide some historical context as how each technology evolve to another each tied with some practical use cases?
I've found message bus vs. message queue helpful
Even thou all these services deal with the transfer of data from source to target and might seem similar falling under the umbrella messaging services they do differ in their intent.
High-level definition:
Azure Event Grids – Event-driven publish-subscribe model (think reactive programming)
Azure Event Hubs – Multiple source big data streaming pipeline (think telemetry data)
Azure Service Bus- Traditional enterprise broker messaging system (Similar to Azure Queue but provide many advanced features depending on use case full comparison)
Difference between Event Grids & Event Hubs
Event Grids doesn’t guarantee the order of the events, but Event Hubs use partitions which are ordered sequences, so it can maintain the order of the events in the same partition.
Event Hubs are accepting only endpoints for the ingestion of data and they don’t provide a mechanism for sending data back to publishers. On the other hand, Event Grids sends HTTP requests to notify events that happen in publishers.
Event Grid can trigger an Azure Function. In the case of Event Hubs, the Azure Function needs to pull and process an event.
Event Grids is a distribution system, not a queueing mechanism. If an event is pushed in, it gets pushed out immediately and if it doesn’t get handled, it’s gone forever. Unless we send the undelivered events to a storage account. This process is known as dead-lettering.
In Event Hubs the data can be kept for up to seven days and then replayed. This gives us the ability to resume from a certain point or to restart from an older point in time and reprocess events when we need it.
Difference between Event Hubs & Service Bus
To the external publisher or the receiver Service Bus and Event Hubs can look very similar and this is what makes it difficult to understand the differences between the two and when to use what.
Event Hubs focuses on event streaming where Service Bus is more of a traditional messaging broker.
Service Bus is used as the backbone to connects applications running in the cloud to other applications or services and transfers data between them whereas Event Hubs is more concerned about receiving massive volume of data with high throughout and low latency.
Event Hubs decouples multiple event-producers from event-receivers whereas Service Bus aims to decouple applications.
Service Bus messaging supports a message property ‘Time to Live’ whereas Event Hubs has a default retention period of 7 days.
Service Bus has the concept of message session. It allows relating messages based on their session-id property whereas Event Hubs does not.
Service Bus the messages are pulled out by the receiver & cannot be processed again whereas Event Hubs message can be ingested by multiple receivers.
Service Bus uses the terminology of queues and topics whereas Event Hubs partitions terminology is used.
Use this loose general rule of thumb.
SOMETHING HAS HAPPENED – Event Hubs
DO SOMETHING or GIVE ME SOMETHING – Service Bus
As #Louie Almeda stated you may find this link to the official Azure documentation useful.
I found this comparison from Azure docs extremely helpful. Here's the key distinction between events and messages.
Event vs. message services
There's an important distinction to note
between services that deliver an event and services that deliver a
message.
Event
An event is a lightweight notification of a condition or a state
change. The publisher of the event has no expectation about how the
event is handled. The consumer of the event decides what to do with
the notification. Events can be discrete units or part of a series.
Discrete events report state change and are actionable. To take the
next step, the consumer only needs to know that something happened.
The event data has information about what happened but doesn't have
the data that triggered the event. For example, an event notifies
consumers that a file was created. It may have general information
about the file, but it doesn't have the file itself. Discrete events
are ideal for serverless solutions that need to scale.
Series events
report a condition and are analyzable. The events are time-ordered and
interrelated. The consumer needs the sequenced series of events to
analyze what happened.
Message
A message is raw data produced by a service to be consumed or stored
elsewhere. The message contains the data that triggered the message
pipeline. The publisher of the message has an expectation about how
the consumer handles the message. A contract exists between the two
sides. For example, the publisher sends a message with the raw data,
and expects the consumer to create a file from that data and send a
response when the work is done.
Comparison of those different services were also discussed, so be sure to check it out.
I agree with your remarks about overloaded terms, especially with cloud-service marketing jargon....
Historically, I events and messages had more distinct meanings
- event was term used to refer to communication within the same process whereas
- message referred to communication across different processes.
Regarding the "bus", I can give you some "historical" information, as I learned to be a sound engineer. In a music mixer, you also have a "bus" and "routing" for mixing signals. In the case of a mixer, we are talking about electrical signals, either being in the mix or not!
Regarding the messaging system, think of "bus", "hub" and "grid" to be synonyms! They are all fancy words for the same thing. They are trying to express some kind of transportation system that includes some kind of routing, because you always have producers and consumers - and this can be an N:M relation. Depending on the use case.
A queue is typically a bit different, but its effect can be the same. A queue means something where things are in line, like a queue of people to buy something! (Theatre tickets....)
Nowadays, everything is digital, which in its essence means it can be countable. That's how "messages" came into existance! A music mixer would traditionally mix analogous signals, which are not countable but continuous, so the information would be f.ex. spoken voices or any kind of sound. Today, a "message" means some kind of information package, which is unique and countable. So it is a "thing" you can add to and remove from a queue, or send it to a hub for consumers to consume it.
Don't worry, you'll get used to those terms! I hope I was able to give you an idea.

Queue data publish/duplicate

I am using IBM webSphere MQ 7.5 server as queue manager for my applications.
Already I am receiving data through single queue.
On the other hand there are 3 applications that want to process data.
I have 3 solutions to duplicate/distribute data between them.
Use broker to duplicate 1 to 3 queue - I don't have broker so it is not accessible for me.
Write an application to get from queue and put them in other 3 queues on same machine
Define publish/subscribe definitions to publish input queue to 3 queues on same machine.
I want to know which methods (2 & 3) is preferred and have higher performance and acceptable operational management effort.
Based on the description I would say that going PubSub would achieve the goal; try to thinking in pure PubSub terms rather than thinking about the queues. i.e. You have an application that publishes to a topic with then 3 applications each having their own subscription to get copies of the message.
You then have the flexibility to define durable/nondurable subscriptons for example.
For option # 2, there are (at least) 2 solutions available:
There is an open source application called MMX (Message Multiplexer). It will do exactly what you describe. The only issue is that you will need to manage the application. i.e. If you stop the queue manager then the application will need to be manually restarted.
There is a commercial solution called MQ Message Replication. It is an API Exit that runs within the queue manager and does exactly what you want. Note: There is nothing external to manage as it runs within the queue manager.
I think there is another solution, with MQ only, to define a Namelist which will mirror queue1 to queue2 and queue3
Should be defined like: Source, Destination, QueueManager.
Hope it is useful.
Biruk.

Are there any Python 2.7 alternatives to ZeroMQ that are released under the BSD or MIT license?

I am seeking Python 2.7 alternatives to ZeroMQ that are released under the BSD or MIT license. I am looking for something that supports request-reply and pub-sub messaging patterns. I can serialize the data myself if necessary. I found Twisted from Twisted Matrix Labs but it appears to require a blocking event loop, i.e. reactor.run(). I need a library that will run in the background and let my application check messages upon certain events. Are there any other alternatives?
Give nanomsg, a ZeroMQ younger sister, a try - same father, same beauty
Yes, it is licensed under MIT/X11 license.
Yes, REQ/REP - allows to build clusters of stateless services to process user requests
Yes, PUB/SUB - distributes messages to large sets of interested subscribers
Has several Python bindings available
https://github.com/tonysimpson/nanomsg-python (recommended)
https://github.com/sdiehl/pynanomsg
https://github.com/djc/nnpy
Differences between nanomsg and ZeroMQ
( state as of 2014/11 v0.5-beta - courtesy nanomsg.org >>> a-click-thru to the original HyperDoc )
Licensing
nanomsg library is MIT-licensed. What it means is that, unlike with ZeroMQ, you can modify the source code and re-release it under a different license, as a proprietary product, etc. More reasoning about the licensing can be found here.
POSIX Compliance
ZeroMQ API, while modeled on BSD socket API, doesn't match the API fully. nanomsg aims for full POSIX compliance.
Sockets are represented as ints, not void pointers.
Contexts, as known in ZeroMQ, don't exist in nanomsg. This means simpler API (sockets can be created in a single step) as well as the possibility of using the library for communication between different modules in a single process (think of plugins implemented in different languages speaking each to another). More discussion can be found here.
Sending and receiving functions ( nn_send, nn_sendmsg, nn_recv and nn_recvmsg ) fully match POSIX syntax and semantics.
Implementation Language
The library is implemented in C instead of C++.
From user's point of view it means that there's no dependency on C++ runtime (libstdc++ or similar) which may be handy in constrained and embedded environments.
From nanomsg developer's point of view it makes life easier.
Number of memory allocations is drastically reduced as intrusive containers are used instead of C++ STL containers.
The above also means less memory fragmentation, less cache misses, etc.
More discussion on the C vs. C++ topic can be found here and here.
Pluggable Transports and Protocols
In ZeroMQ there was no formal API for plugging in new transports (think WebSockets, DCCP, SCTP) and new protocols (counterparts to REQ/REP, PUB/SUB, etc.) As a consequence there were no new transports added since 2008. No new protocols were implemented either. The formal internal transport API (see transport.h and protocol.h) are meant to mitigate the problem and serve as a base for creating and experimenting with new transports and protocols.
Please, be aware that the two APIs are still new and may experience some tweaking in the future to make them usable in wide variety of scenarios.
nanomsg implements a new SURVEY protocol. The idea is to send a message ("survey") to multiple peers and wait for responses from all of them. For more details check the article here. Also look here.
In financial services it is quite common to use "deliver messages from anyone to everyone else" kind of messaging. To address this use case, there's a new BUS protocol implemented in nanomsg. Check the details here.
Threading Model
One of the big architectural blunders I've done in ZeroMQ is its threading model. Each individual object is managed exclusively by a single thread. That works well for async objects handled by worker threads, however, it becomes a trouble for objects managed by user threads. The thread may be used to do unrelated work for arbitrary time span, e.g. an hour, and during that time the object being managed by it is completely stuck. Some unfortunate consequences are: inability to implement request resending in REQ/REP protocol, PUB/SUB subscriptions not being applied while application is doing other work, and similar. In nanomsg the objects are not tightly bound to particular threads and thus these problems don't exist.
REQ socket in ZeroMQ cannot be really used in real-world environments, as they get stuck if message is lost due to service failure or similar. Users have to use XREQ instead and implement the request re-trying themselves. With nanomsg, the re-try functionality is built into REQ socket.
In nanomsg, both REQ and REP support cancelling the ongoing processing. Simply send a new request without waiting for a reply (in the case of REQ socket) or grab a new request without replying to the previous one (in the case of REP socket).
In ZeroMQ, due to its threading model, bind-first-then-connect-second scenario doesn't work for inproc transport. It is fixed in nanomsg.
For similar reasons auto-reconnect doesn't work for inproc transport in ZeroMQ. This problem is fixed in nanomsg as well.
Finally, nanomsg attempts to make nanomsg sockets thread-safe. While using a single socket from multiple threads in parallel is still discouraged, the way in which ZeroMQ sockets failed randomly in such circumstances proved to be painful and hard to debug.
State Machines
Internal interactions inside the nanomsg library are modeled as a set of state machines. The goal is to avoid the incomprehensible shutdown mechanism as seen in ZeroMQ and thus make the development of the library easier.
For more discussion see here and here.
IOCP Support
One of the long-standing problems in ZeroMQ was that internally it uses BSD socket API even on Windows platform where it is a second class citizen. Using IOCP instead, as appropriate, would require major rewrite of the codebase and thus, in spite of multiple attempts, was never implemented. IOCP is supposed to have better performance characteristics and, even more importantly, it allows to use additional transport mechanisms such as NamedPipes which are not accessible via BSD socket API. For these reasons nanomsg uses IOCP internally on Windows platforms.
Level-triggered Polling
One of the aspects of ZeroMQ that proved really confusing for users was the ability to integrate ZeroMQ sockets into an external event loops by using ZMQ_FD file descriptor. The main source of confusion was that the descriptor is edge-triggered, i.e. it signals only when there were no messages before and a new one arrived. nanomsg uses level-triggered file descriptors instead that simply signal when there's a message available irrespective of whether it was available in the past.
Routing Priorities
nanomsg implements priorities for outbound traffic. You may decide that messages are to be routed to a particular destination in preference, and fall back to an alternative destination only if the primary one is not available.
For more discussion see here.
TCP Transport Enhancements
There's a minor enhancement to TCP transport. When connecting, you can optionally specify the local interface to use for the connection, like this:
nn_connect (s, "tcp://eth0;192.168.0.111:5555").
Asynchronous DNS
DNS queries (e.g. converting hostnames to IP addresses) are done in asynchronous manner. In ZeroMQ such queries were done synchronously, which meant that when DNS was unavailable, the whole library, including the sockets that haven't used DNS, just hung.
Zero-Copy
While ZeroMQ offers a "zero-copy" API, it's not true zero-copy. Rather it's "zero-copy till the message gets to the kernel boundary". From that point on data is copied as with standard TCP. nanomsg, on the other hand, aims at supporting true zero-copy mechanisms such as RDMA (CPU bypass, direct memory-to-memory copying) and shmem (transfer of data between processes on the same box by using shared memory). The API entry points for zero-copy messaging are nn_allocmsg and nn_freemsg functions in combination with NN_MSG option passed to send/recv functions.
Efficient Subscription Matching
In ZeroMQ, simple tries are used to store and match PUB/SUB subscriptions. The subscription mechanism was intended for up to 10,000 subscriptions where simple trie works well. However, there are users who use as much as 150,000,000 subscriptions. In such cases there's a need for a more efficient data structure. Thus, nanomsg uses memory-efficient version of Patricia trie instead of simple trie.
For more details check this article.
Unified Buffer Model
ZeroMQ has a strange double-buffering behaviour. Both the outgoing and incoming data is stored in a message queue and in TCP's tx/rx buffers. What it means, for example, is that if you want to limit the amount of outgoing data, you have to set both ZMQ_SNDBUF and ZMQ_SNDHWM socket options. Given that there's no semantic difference between the two, nanomsg uses only TCP's (or equivalent's) buffers to store the data.
Scalability Protocols
Finally, on philosophical level, nanomsg aims at implementing different "scalability protocols" rather than being a generic networking library. Specifically:
Different protocols are fully separated, you cannot connect REQ socket to SUB socket or similar.
Each protocol embodies a distributed algorithm with well-defined prerequisites (e.g. "the service has to be stateless" in case of REQ/REP) and guarantees (if REQ socket stays alive request will be ultimately processed).
Partial failure is handled by the protocol, not by the user. In fact, it is transparent to the user.
The specifications of the protocols are in /rfc subdirectory.
The goal is to standardise the protocols via IETF.
There's no generic UDP-like socket (ZMQ_ROUTER), you should use L4 protocols for that kind of functionality.

Any links to Service Broker best practices?

I am looking for any authoritative articles on Service Broker best practices.
In particular, I am looking for the following topics (I know the answers, but have to find documents that support the knowledge):
queues in the same database
message
size
systems where the message is just a pointer and data is retrieved from tables
instrumentation - auditing Service Broker applications
TIA
systems where the message is just a pointer and data is retrieved from tables
This is not a Service Broker application, is just a queueing application. Service Broker was designed primarily for distributed applications, communication (networking, security, routing, retries) is a major component. If you only send messages as pointers and the data is in tables the distributed nature of SSB falls apart. The litmus test is "can I move my service onto another server and the application continues to work after I fix the routing?". If the answer is Yes then you're using SSB the way it was designed. If is No it meas you're only interested in queues.
The problem with using SSB as a 'dumb queue' is that is a very expensive queue (just think at the extra writes required on each message due to conversations and conversation groups). RECEIVE statement is expensive and basically a black box from optimization pov. You could optimize a table used as a queue a lot better than what you can do with an SSB service/queue. I reckon that SSB has an ace up its sleeves which makes it attractive even when used as a local queue, namely the internal activation capabilities. One may say that activation cannot be replaced with anything else (I agree, it cannot), but one must be aware of the cost and balance the pros and cons.

Messaging Confusion: Pub/Sub vs Multicast vs Fan Out

I've been evaluating messaging technologies for my company but I've become very confused by the conceptual differences between a few terms:
Pub/Sub vs Multicast vs Fan Out
I am working with the following definitions:
Pub/Sub has publishers delivering a separate copy of each message to
each subscriber which means that the opportunity to guarantee delivery exists
Fan Out has a single queue pushing to all listening
clients.
Multicast just spams out data and if someone is listening
then fine, if not, it doesn't matter. No possibility to guarantee a client definitely gets a message.
Are these definitions right? Or is Pub/Sub the pattern and multicast, direct, fanout etc. ways to acheive the pattern?
I'm trying to work the out-of-the-box RabbitMQ definitions into our architecture but I'm just going around in circles at the moment trying to write the specs for our app.
Please could someone advise me whether I am right?
I'm confused by your choice of three terms to compare. Within RabbitMQ, Fanout and Direct are exchange types. Pub-Sub is a generic messaging pattern but not an exchange type. And you didn't even mention the 3rd and most important Exchange type, namely Topic. In fact, you can implement Fanout behavior on a Topic exchange just by declaring multiple queues with the same binding key. And you can define Direct behavior on a Topic exchange by declaring a Queue with * as the wildcard binding key.
Pub-Sub is generally understood as a pattern in which an application publishes messages which are consumed by several subscribers.
With RabbitMQ/AMQP it is important to remember that messages are always published to exchanges. Then exchanges route to queues. And queues deliver messages to subscribers. The behavior of the exchange is important. In Topic exchanges, the routing key from the publisher is matched up to the binding key from the subscriber in order to make the routing decision. Binding keys can have wildcard patterns which further influences the routing decision. More complicated routing can be done based on the content of message headers using a headers exchange type
RabbitMQ doesn't guarantee delivery of messages but you can get guaranteed delivery by choosing the right options(delivery mode = 2 for persistent msgs), and declaring exchanges and queues in advance of running your application so that messages are not discarded.
Your definitions are pretty much correct. Note that guaranteed delivery is not limited to pub/sub only, and it can be done with fanout too. And yes, pub/sub is a very basic description which can be realized with specific methods like fanout, direct and so on.
There are more messaging patterns which you might find useful. Have a look at Enterprise Integration Patterns for more details.
From an electronic exchange point of view the term “Multicast” means “the message is placed on the wire once” and all client applications that are listening can read the message off the “wire”. Any solution that makes N copies of the message for the N clients is not multicast. In addition to examining the source code one can also use a “sniffer” to determine how many copies of the message is sent over the wire from the messaging system. And yes, multicast messages are a form the UDP protocol message. See: http://en.wikipedia.org/wiki/Multicast for a general description. About ten years ago, we used the messaging system from TIBCO that supported multicast. See: https://docs.tibco.com/pub/ems_openvms_c_client/8.0.0-june-2013/docs/html/tib_ems_users_guide/wwhelp/wwhimpl/common/html/wwhelp.htm#context=tib_ems_users_guide&file=EMS.5.091.htm